[dns-wg] Action Item 48.1: Lame Delegations -- first draft
Edward Lewis Ed.Lewis at neustar.biz
Thu May 5 11:54:39 CEST 2005
At 23:55 +0200 5/4/05, Peter Koch wrote:
>Dear all,
>
>here is a first draft addressing action item 48.1 on lame delegation problems
>on large scale name servers. It's a -00 kind of document mainly issuing the
>problem statement. Although there are some hours left, I don't expect
>anyone to have read it by the WG meeting. An HTML version may be made
>available later and depending on how the PDP evolves, we might want or need
>to inject it into the policy developing engine. Comments are welcome!

Okay, "comments are welcome." ;) Here they come...

# RIPE DNS WG 48.1                                           P. Koch
#                                                           DENIC eG
#                                                        May 4, 2005
#
#
#      DNS lame delegations caused by AXFR source unavailability
...
# RIPE DNS WG DRAFT      Large Scale DNS Lame Dels          May 2005
#
#
# 1. Introduction
#
# This document analyses causes for DNS lame delegations seen on large
# (thousands of zones) name servers and investigates and assesses
# countermeasures.
#
# First we will define the term "lame delegation" and similar
# operational problems. In the third section we will address various
# reasons that lead to lame delegations. The fourth paragraph will
# summarize mechanisms server administrators currently do or could
# apply to lower the impact of lame delegations.
#
# 2. Signs of Lame Delegations

Please, no more jokes about my picture appearing here. ;)

# A lame delegation is a DNS delegation where the target of the NS RR
# does not respond authoritatively to queries for the domain so
# delegated.
#
# Lame delegations show different symptoms, which are sometimes given
# separate names:
#
# 1. The server's responses do not have the AA bit set
#
# 2. The server responses with an (upward) referral
#
# 3. The server responses SERVFAIL
#
# 4. The server responses REFUSED
#
# 5. The server refuses the query packet (giving either ICMP port
#    unreachable or TCP RST)
#
# 6. The server does not respond at all
#
# 7. The server's name does not exist (NXDOMAIN)
#
# 8. The server's name does not own any A (or AAAA) RRs

Some others...
First, let's assume that the query is (akin to):

    dig <zone.name> soa +norec

There can be "no error/no data" - a problem unique to the reverse map is
that registrants who have a /17 worth of IPv4 address space might have
mistakenly configured the entire /16 and not the 128 /24s they really
have. (This is one of a few places where I can isolate a diagnosis from
the symptoms that are observable.)

Another failure scenario to consider is that, just like the query for
the SOA, you can have no response for the address lookup for the domain
name of the name server. I.e., akin to #7 and #8, there's a "name query
is not answered."

#6 is a problem. There is a name server implementation that will answer
only for what it is authoritative for - and not answer at all for other
queries. So, imagine a registrant has a /22 and reserves the first 256
addresses (a /24) for later use. The registrant may not configure the
zone because it's not being used - meaning the first zone goes
response-less on the server, but the other /24s are good. (This assumes
you group the zones by some registration record.)

I believe it would be good to document the query you will use and what
the set of acceptable answers will be. I considered the return code,
authority bit, answer and authority counts. (An answer count of 1 and
authority records could indicate a CNAME in the answer and a referral to
another server. Yes, it happens.)

# Cases (1) and (2) above are classical signs of a zone that has been
# forgotten by its server, either by expiry or due to syntax errors.
#
# Cases (1) through (4) are common lame delegations, cases (5) and (6)
# often just appear as temporary operational problems and cases (7) and
# (8) are sometimes called stale delegations. The latter may result in
# a significant increase of the query volume at the servers serving the
# domain the non existing name server is expected to reside in.

Okay, now I'm going to start down a path that may lead to a rathole.
(Just a warning to regular participants of mailing lists.)

First, trying to guess the reason for symptoms is a slippery slope.
Over the years I have found such a divergence in operational practices
that divining what's in the configuration file via the network protocol
is nearly impossible. There are some common meltdowns - like configuring
the /16 instead of 128 /24s for a /17 - but there aren't enough "common"
meltdowns to say that we can efficiently send a diagnosis to all the
problem cases. I no longer have an idea of numbers, but I think that
something like 10-30% of problems (depending on how you count problems)
fall into "easy to diagnose"; the remaining majority quickly falls into
"other problems."

Second, to begin the pathway to the home of the rat, you have to ask
"what do you want to do here?" What is a lame delegation? A lame server
is defined a few times in RFCs. Is the purpose to prune off lame servers
or clean up DNS operations?

E.g., what if you see a server answering correctly for a zone and the
other server is running recursively and answers non-authoritatively?
This could be because the second server has learned the answer from the
first via something like forwarding. In this case, both servers answer
correctly and won't cause an operational problem (which is where the
'problem' of lame delegations originated). However, such a configuration
could be considered a problem registration (not truly meeting the need
for 2+ servers), and may even be an indication of subscriber
(registrant) fraud. (I.e., hijacking by changing the name server
registrations, etc.)

The rathole is "what's the target of stamping out lame delegations?"

My work in the field (no longer being pursued, at least by me) resulted
in just stating "observations." I.e., no diagnosis, just a report.
Servers that did not answer gave "never heard from" or "last heard from"
dates.
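
To illustrate the "observations only" idea, here is a rough sketch of how a tester might log symptoms without attempting a diagnosis. It is not the code used in the work mentioned above; the names `Observation`, `ServerRecord` and `describe` are hypothetical, and the recorded fields (return code, AA bit, answer and authority counts) are simply the ones listed earlier in this message. Sending the actual query is left out on purpose.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Observation:
    """Raw symptoms of one SOA query; all fields None means no response."""
    rcode: Optional[str] = None    # e.g. "NOERROR", "SERVFAIL", "REFUSED"
    aa: Optional[bool] = None      # authoritative-answer bit
    ancount: Optional[int] = None  # records in the answer section
    nscount: Optional[int] = None  # records in the authority section

    @property
    def answered(self) -> bool:
        return self.rcode is not None


def describe(obs: Observation) -> str:
    """Render the symptoms as text. Deliberately no guess at the cause."""
    if not obs.answered:
        return "no response"
    return (f"rcode={obs.rcode} aa={int(bool(obs.aa))} "
            f"answers={obs.ancount} authority={obs.nscount}")


@dataclass
class ServerRecord:
    """History of one (zone, name server) pair across repeated tests."""
    last_heard_from: Optional[datetime] = None
    log: List[Tuple[datetime, str]] = field(default_factory=list)

    def record(self, when: datetime, obs: Observation) -> None:
        # Any response at all, even a lame one, counts as "heard from".
        if obs.answered:
            self.last_heard_from = when
        self.log.append((when, describe(obs)))

    def status(self) -> str:
        if self.last_heard_from is None:
            return "never heard from"
        return "last heard from " + self.last_heard_from.date().isoformat()
```

A report would then print `status()` and the logged symptom strings for each server and leave the interpretation to whoever reads it.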

The cautionary tale here is that, because you need to repeat tests (I'm
just stipulating that here) to properly test non-answering servers,
sometimes a server that passed a test will slip into a failure mode. My
tests ignored "once good" results, meaning fat fingering post haste
could introduce lameness. (No test can really stop that.) But the test
code has to be able to "deal with it."

# While operational guidelines suggest that the NS RRSet of a zone and
# the corresponding delegation in the parent zone should match, there
# are sometimes inconsistencies. Without acknowledging or endorsing
# this practice the union of both NS RRSets shall be eligible for a
# lame delegation assessment. In other words, even an NS RR that is
# only present in the delegated (child) zone may constitute a lame
# delegation as well.

I firmly believe that you cannot require that the parent copy of the NS
set be the same as the child NS set. The parent copy should be a subset
(improper or proper) at all times. As a lame tester, you should be
testing against the parent set only - as that's the data registered.

I'll stop with just comments on lame delegations for now.

One thing to keep in mind is how to discuss this. It would be good to
document the testing done to explore the situation. It would be good to
document the symptoms observed. It is also good to document diagnosis,
and also remediation.

I "repeat" that because "good to document" depends on the purpose of the
document. A tool to perform the testing ought not get caught up in
diagnosis; I found having it just "log" symptoms much more useful than
having it try to "think" about the cause. Remediation is tool dependent,
so a "parent" trying to help the "children" needs to judge if it is
willing to handhold completely, for preferred DNS implementations, or be
fair by not offering the teaching service at all. (Not helpful, but
unbiased on tool choice.)

PS - There was some other rathole I wanted to step in, but I've
forgotten it.
;)

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis                                          +1-571-434-5468
NeuStar

If you knew what I was thinking, you'd understand what I was saying.