From blk at skynet.be Thu Jul 13 12:30:27 2000
From: blk at skynet.be (Brad Knowles)
Date: Thu, 13 Jul 2000 12:30:27 +0200
Subject: [RETRANSMISSION] DNS problems with nameservers for .be and 193.in-addr.arpa
Message-ID:

[ The previous version of this message (with attachments included) was too large to be accepted by certain recipient addresses, so I am retransmitting the message with the attachments instead available at the URL . I also mistakenly left off a few recipients on the original version, which are included here. -Brad ]

Folks,

A couple of problems with some pretty important nameservers have been brought to my attention in the last couple of weeks. I would not normally bring this issue directly to such a wide group of people, but I believe that this matter is sufficiently serious to warrant such an approach.

In particular, we have discovered that holem.belnet.be [193.190.198.10] (one of the two machines acting as ns.belnet.be) is acting both as an authoritative nameserver for .be (among many others, I'm sure) and as a general caching nameserver.

This has hit us a few times when we've changed machines on our own networks, made the changes in our authoritative nameservers and then reloaded the zones, and then finally restarted our caching nameservers -- only to have the data in the caching nameservers poisoned by incorrect information from holem.belnet.be [193.190.198.10].

Earlier we had the same problem with vivaldi.belnet.be [193.190.198.2] (the other machine acting as ns.belnet.be). However, the problem with vivaldi.belnet.be appears to have since been corrected.

I note, though, that DNS.CS.KULEUVEN.AC.be, NS.EU.NET, SPARKY.ARL.MIL, and SUNIC.SUNET.SE appear to have the same problem as holem.belnet.be [193.190.198.10].

This is an extremely dangerous situation, one that has long been recognized by experienced nameserver administrators.
This is why section 2.5 of RFC 2870 "Root Nameserver Operational Requirements" says:

    2.5 Servers MUST provide authoritative responses only from the zones they serve. The servers MUST disable recursive lookup, forwarding, or any other function that may allow them to provide cached answers. They also MUST NOT provide secondary service for any zones other than the root and root-servers.net zones. These restrictions help prevent undue load on the root servers and reduce the chance of their caching incorrect data.

While these machines are not root nameservers (to the best of my knowledge), the same rules and regulations should be applied to their maintenance and operation, for the same reasons.

Therefore, I recommend that these machines be fixed ASAP, and if they cannot be fixed within a reasonable period of time, they should be removed from the list of authoritative nameservers for .be.

Furthermore, I recommend that if these machines cannot be fixed within a reasonable period of time, they be removed from all nameserver duties for all upper-level domains, including root nameservice, gTLD and ccTLD duties, service for any of the in-addr.arpa zones, etc....

I have also noticed that while vivaldi.belnet.be [193.190.198.2] and AUTH02.NS.UU.NET do not appear to be caching data directly, they do appear to somehow be secondaries for the .com zone, as the referrals they give for www.aol.com go directly to the AOL nameservers, and not through the root or .com gTLD nameservers.

Try as I might, I cannot find the IP addresses 193.190.198.2 and 198.6.1.82 on the list of published nameservers for the .com gTLD. This concerns me, but without additional information, I can't call this an "error" per se.

You can confirm my test results by executing the following Bourne shell script:

    #!/bin/sh

    # Nameservers for .be to be tested (names and addresses as listed above)
    BENS="DNS.CS.KULEUVEN.AC.be. SECDNS.EUNET.be. SPARKY.ARL.MIL. SUNIC.SUNET.SE. AUTH02.NS.UU.NET. NS.EU.NET. 193.190.198.10 193.190.198.2 NS.DNS.be."

    # Ask each server directly for the A record of www.aol.com.
    for NS in $BENS
    do
        dig @$NS www.aol.com. a +aa
    done

Note that because of the lossy nature of sending UDP packets via the Internet, you may need to run this script multiple times in order to get complete output. I know I did.

The results I get can be found at . I would also refer you to the output of running "doc -d be" at .

I have also discovered that NS.EU.NET is not properly handling DNS queries for 193.in-addr.arpa. In particular, I was trying to do a reverse lookup on 193.74.108.3, and was getting SERVFAIL responses, something that I should not be getting -- especially not from nameservers that are supposed to be serving 193.in-addr.arpa.

I wrote the following script to check out the nameservers for this top-level reverse zone:

    #!/bin/sh

    # Nameservers listed for 193.in-addr.arpa
    REVNS="NS.RIPE.NET. NS.EU.NET. AUTH03.NS.UU.NET. NS2.NIC.FR. SUNIC.SUNET.SE. MUNNARI.OZ.AU. NS.APNIC.NET."

    # Ask each server directly for the reverse (PTR) lookup of 193.74.108.3
    for NS in $REVNS
    do
        dig @$NS -x 193.74.108.3
    done

The results I get are at . I would recommend that you also look at the output of running "doc -d 3.108.74.193.in-addr.arpa" and "doc -d 74.193.in-addr.arpa" at .

You will note that NS.EU.NET is not the only nameserver here that is giving me SERVFAIL. I also get it from SUNIC.SUNET.SE and MUNNARI.OZ.AU.

Again, if these machines cannot be fixed in a reasonable period of time, I suggest that they should be removed from the list of authoritative nameservers for 193.in-addr.arpa.

Also note some of the interesting and different answers given back by the various machines for the IP address of ns.ripe.net in particular -- some list the IPv6 address and some don't. Is this an error? I honestly don't know....

With this information, I hope that you will understand the issues, and share my very deep concern that this is a systemic problem that affects everyone on the Internet, and goes far beyond the operational problems that any one ISP may be currently experiencing.

I don't want to come across as a self-styled DNS expert, although I do believe that I know more on this topic than the average system administrator.
I hope that you won't hold against me the fact that I am the current maintainer of the DNS debugging tool "doc", and that I have used the output of this tool to help document my position.

If I wanted to be really nasty, I'd ask that you take away the high-level authoritative nameserver privileges for the above named machines, and that you instead give them to us, thus increasing the prestige of my employer.

However, all I really want to do is get these problems fixed, and stop having so many complaints being delivered to me which ultimately result from the issues I have raised here -- and therefore, problems that I cannot do anything about, at least not by myself.

That said, if it would help for Skynet to provide a nameserver to fill in for these various different roles, I'm sure that I and my employers would be more than happy to move Heaven and Earth to make sure that we got a suitable machine up and running as quickly as humanly possible.

ADDENDUM:

Please note that we have previously brought up these issues regarding ns.belnet.be with Marc Roger at Belnet. So far as I can tell, he didn't seem to care.

We also brought these issues up with DNS.BE, and again we haven't seen any results. Also note that the information at appears to be out-of-date, as I keep getting bounces back from the supposedly official address "marc.vanwezemael at dns.be" as the Administrative POC for .be.

I know that the management of EUnet is aware of their reverse DNS problems, because we have been having running problems with them for at least the last couple of weeks, and our servers don't accept mail from any of theirs, because they don't have reverse DNS. Of course, the reason they don't have reverse DNS is that the same servers are used for serving various important zones within Europe as well as serving those local to EUnet, and these machines are the ones that are screwed-up.
Unfortunately, while they have been aware of the problems for a while and we've been trying to get them to fix their local problems in that time, it wasn't until more recently that I realized just how severe the issue with their nameservers really is. Again, I have yet to see anything useful at all come from them on this issue.

Frankly, I just don't know what else to do. If I have not used the proper escalation procedure, then I would appreciate it if someone would tell me what the proper escalation procedure is. I would also like to see where this is documented on the appropriate web sites, because I looked everywhere I could think of, and still haven't found it.

--
These are my opinions -- not to be taken as official Skynet policy
======================================================================
Brad Knowles,                                || Belgacom Skynet SA/NV
Systems Architect, Mail/News/FTP/Proxy Admin || Rue Colonel Bourg, 124
Phone/Fax: +32-2-706.13.11/12.49             || B-1140 Brussels
http://www.skynet.be                         || Belgium

From blk at skynet.be Wed Jul 12 22:32:07 2000
From: blk at skynet.be (Brad Knowles)
Date: Wed, 12 Jul 2000 22:32:07 +0200
Subject: DNS problems with nameservers for .be and [ 194.in-addr.arpa ]
Message-ID:

On Wed, 12 Jul 2000 20:11:26 +0200 Brad Knowles said:

> A couple of problems with some pretty important nameservers
> have been brought to my attention in the last couple of weeks. I
> would not normally bring this issue directly to such a wide group
> of people, but I believe that this matter is sufficiently serious
> to warrant such an approach.

Sorry, I meant to mention that we had previously brought up these issues with Marc Roger at Belnet, and so far as I am able to determine, he simply didn't seem to care.

I believe that these issues have also been brought up with the management of DNS.BE, and again we haven't seen any results.
Likewise, I believe that the administrators of EUnet are well aware of their DNS problems -- they are certainly aware of the problems of getting e-mail from any of their servers to many of ours, since our more recently configured and installed machines refuse to accept mail from hosts that do not have proper reverse DNS, in accordance with Internet standard practice going back at least two years. Again, we have yet to see any results.

If I have not followed the proper escalation procedure, I would appreciate it if you would please let me know what the proper escalation procedure is. I would also ask that you tell me where this escalation procedure is documented, because I'm certainly not aware of anything of this sort.

--
These are my opinions -- not to be taken as official Skynet policy
======================================================================
Brad Knowles,                                || Belgacom Skynet SA/NV
Systems Architect, Mail/News/FTP/Proxy Admin || Rue Colonel Bourg, 124
Phone/Fax: +32-2-706.13.11/12.49             || B-1140 Brussels
http://www.skynet.be                         || Belgium

From Daniel.Karrenberg at ripe.net Thu Jul 13 17:46:48 2000
From: Daniel.Karrenberg at ripe.net (Daniel Karrenberg)
Date: Thu, 13 Jul 2000 17:46:48 +0200
Subject: [RETRANSMISSION] DNS problems with nameservers for .be and 193.in-addr.arpa
In-Reply-To:
Message-ID: <4.3.2.7.2.20000713173314.00d2f810@localhost.ripe.net>

Brad,

if you detect problems it is best to alert the people concerned rather than broadcasting as widely and indiscriminately as you have done. DNS SOA RRs provide excellent reference points here.

There are no problems we can detect with 193.in-addr.arpa name service at this point. We will follow up to a smaller audience.
Regards

Daniel

From blk at skynet.be Thu Jul 13 20:47:05 2000
From: blk at skynet.be (Brad Knowles)
Date: Thu, 13 Jul 2000 20:47:05 +0200
Subject: [RETRANSMISSION] DNS problems with nameservers for .be and 193.in-addr.arpa
In-Reply-To: <4.3.2.7.2.20000713173314.00d2f810@localhost.ripe.net>
References: <4.3.2.7.2.20000713173314.00d2f810@localhost.ripe.net>
Message-ID:

At 5:46 PM +0200 2000/7/13, Daniel Karrenberg wrote:

> if you detect problems it is best to alert the people concerned rather than
> broadcasting as widely and indiscriminately as you have done. DNS SOA RRs
> provide excellent reference points here.

Unfortunately, it's been my experience that these labels typically age and are not kept up-to-date in many places. However, perhaps this is my memory of the multitude of problems I had when I was at AOL and posting daily "lamers" reports to comp.protocols.tcp-ip.domains, and mailing off a copy of the notice to the address claimed in the SOA records.

Maybe this experience with the information in the SOA records is less applicable to the higher-level domains, although I can say that in the case of the .be ccTLD, this would not have done anything more than what we had already done in the past.

> There are no problems we can
> detect with 193.in-addr.arpa name service at this point. We will follow up to
> a smaller audience.

I'm still very concerned about the number of SERVFAIL errors that I previously saw which have since been mysteriously fixed, and I am very, very concerned about the safety of any of the zones served by any of these machines that are both authoritative and caching/recursive.
--
These are my opinions -- not to be taken as official Skynet policy
======================================================================
Brad Knowles,                                || Belgacom Skynet SA/NV
Systems Architect, Mail/News/FTP/Proxy Admin || Rue Colonel Bourg, 124
Phone/Fax: +32-2-706.13.11/12.49             || B-1140 Brussels
http://www.skynet.be                         || Belgium

From Daniel.Karrenberg at ripe.net Fri Jul 14 08:53:23 2000
From: Daniel.Karrenberg at ripe.net (Daniel Karrenberg)
Date: Fri, 14 Jul 2000 08:53:23 +0200
Subject: [RETRANSMISSION] DNS problems with nameservers for .be and 193.in-addr.arpa
In-Reply-To:
References: <4.3.2.7.2.20000713173314.00d2f810@localhost.ripe.net> <4.3.2.7.2.20000713173314.00d2f810@localhost.ripe.net>
Message-ID: <4.3.2.7.2.20000714082959.00d38390@localhost.ripe.net>

At 08:47 PM 7/13/00, Brad Knowles wrote:

>> There are no problems we can
>> detect with 193.in-addr.arpa name service at this point. We will follow up to
>> a smaller audience.
>
> I'm still very concerned about the number of SERVFAIL errors that I previously saw which have since been mysteriously fixed,

As I told you privately, SERVFAIL does not tell you very much about the server you test if you use a recursive query. The problem may very well be on another server down the tree towards the name you are trying to resolve, or in the connectivity between the server you test and such other server. You should use a non-recursive query for such tests (+norecurse if you use dig).

>and I am very, very concerned about the safety of any of the zones served by any of these machines that are both authoritative and caching/recursive.

This is not a problem per se. The main concern with recursion is the load induced by lots of resolvers just pointing at a high-level server, and of course all those people digging away with recursive queries too ;-). This is the main reason for para 2.5 in RFC2870. This is why we recommend turning recursion off on high level servers.
Caching incorrect data is a separate issue which should be addressed by running good server software, which is readily available. Caching correct data is not a problem. Yes, it can be inconvenient for those who neglect to reduce TTL before making changes. But turning caching off on just some servers is not going to help here.

So can we now please take all those innocent people off this thread once and for all.

Daniel

From randy at psg.com Fri Jul 14 18:27:23 2000
From: randy at psg.com (Randy Bush)
Date: Fri, 14 Jul 2000 09:27:23 -0700
Subject: [RETRANSMISSION] DNS problems with nameservers for .be and 193.in-addr.arpa
References: <4.3.2.7.2.20000713173314.00d2f810@localhost.ripe.net> <4.3.2.7.2.20000714082959.00d38390@localhost.ripe.net>
Message-ID:

> So can we now please take all those innocent
> people off this thread once and for all.

puhleeze!

randy
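
The non-recursive test recommended in this thread can be sketched in Bourne shell, in the same style as the scripts in the first message. This is only a rough sketch under stated assumptions: it assumes a BIND 9 style dig that accepts +norecurse, +time and +tries and prints a ";; flags:" header line, and the helper names (flags_of, has_flag, classify, check_server) are made up for illustration and are not part of dig or doc. It reports whether each reply carries the "aa" (authoritative answer) and "ra" (recursion available) header flags, which is one way to spot a server that is both authoritative and willing to recurse.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and a BIND 9 style dig is assumed.

# flags_of: read dig output on stdin and print the header flags,
# e.g. ";; flags: qr aa ra; QUERY: 1, ..." becomes "qr aa ra".
flags_of() {
    sed -n 's/^;; flags: \([^;]*\);.*/\1/p' | head -n 1
}

# has_flag FLAGS NAME: succeed if NAME is one of the space-separated FLAGS.
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# classify FLAGS: summarize the aa and ra bits of one reply.
classify() {
    if [ -z "$1" ]; then
        echo "no-response"
        return
    fi
    if has_flag "$1" aa; then auth=authoritative; else auth=non-authoritative; fi
    if has_flag "$1" ra; then rec=recursion-offered; else rec=recursion-not-offered; fi
    echo "$auth $rec"
}

# check_server NS NAME TYPE: send one non-recursive query to NS and report.
check_server() {
    flags=$(dig @"$1" "$2" "$3" +norecurse +time=3 +tries=2 2>/dev/null | flags_of)
    echo "$1: $(classify "$flags")"
}
```

For example, "check_server NS.RIPE.NET. 193.in-addr.arpa. SOA" would print one summary line for that server. Because the query is non-recursive, a failure points at the server being tested rather than at some other server further down the tree, which is the point made in the thread.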