This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/dns-wg@ripe.net/
[dns-wg] Action Item 48.1: Lame Delegations -- first draft

Previous message (by thread): [dns-wg] Action Item 48.1: Lame Delegations -- first draft
Next message (by thread): [dns-wg] WG Agenda for RIPE50
Edward Lewis Ed.Lewis at neustar.biz
Thu May 5 11:54:39 CEST 2005
At 23:55 +0200 5/4/05, Peter Koch wrote:
>Dear all,
>
>here is a first draft addressing action item 48.1 on lame delegation problems
>on large scale name servers. It's a -00 kind of document mainly issuing the
>problem statement. Although there are some hours left, I don't expect
>anyone to have read it by the WG meeting. An HTML version may be made
>available later and depending on how the PDP evolves, we might want or need
>to inject it into the policy developing engine. Comments are welcome!

Okay, "comments are welcome." ;) Here they come...

# RIPE DNS WG 48.1                                                 P. Koch
#                                                                 DENIC eG
#                                                              May 4, 2005
#
#
#        DNS lame delegations caused by AXFR source unavailability

...

# RIPE DNS WG DRAFT       Large Scale DNS Lame Dels               May 2005
#
#
# 1.  Introduction
#
#    This document analyses causes for DNS lame delegations seen on large
#    (thousands of zones) name servers and investigates and assesses
#    countermeasures.
#
#    First we will define the term "lame delegation" and similar
#    operational problems.  In the third section we will address various
#    reasons that lead to lame delegations.  The fourth paragraph will
#    summarize mechanisms server administrators currently do or could
#    apply to lower the impact of lame delegations.
#
# 2.  Signs of Lame Delegations

Please, no more jokes about my picture appearing here. ;)

#    A lame delegation is a DNS delegation where the target of the NS RR
#    does not respond authoritatively to queries for the domain so
#    delegated.
#
#    Lame delegations show different symptoms, which are sometimes given
#    separate names:
#
#    1.  The server's responses do not have the AA bit set
#
#    2.  The server responses with an (upward) referral
#
#    3.  The server responses SERVFAIL
#
#    4.  The server responses REFUSED
#
#    5.  The server refuses the query packet (giving either ICMP port
#        unreachable or TCP RST)
#
#    6.  The server does not respond at all
#
#    7.  The server's name does not exist (NXDOMAIN)
#
#    8.  The server's name does not own any A (or AAAA) RRs

Some others...

First, let's assume that the query is (akin to):

       dig <zone.name> soa +norec

There can be "no error/no data" - a problem unique to the reverse map 
is that registrants who have a /17 worth of IPv4 address space might 
have mistakenly configured the entire /16 and not the 128 /24's they 
really have.  (This is one of a few places where I can isolate a 
diagnosis from the symptoms that are observable.)
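
To make the /17 arithmetic concrete, here is a rough sketch that lists the /24 reverse zones a registrant would actually need.  The function name and the 10.1.0.0/17 prefix are just illustrations of mine, not anything from the draft.

```python
import ipaddress


def reverse_zones(prefix: str) -> list[str]:
    """List the /24 in-addr.arpa zone names that cover an IPv4 prefix.

    Assumes the prefix is a /24 or shorter; longer prefixes raise ValueError.
    """
    net = ipaddress.ip_network(prefix)
    zones = []
    for sub in net.subnets(new_prefix=24):
        a, b, c, _ = str(sub.network_address).split(".")
        zones.append(f"{c}.{b}.{a}.in-addr.arpa")
    return zones


zones = reverse_zones("10.1.0.0/17")
print(len(zones), zones[0], zones[-1])
# prints: 128 0.1.10.in-addr.arpa 127.1.10.in-addr.arpa
```

Configuring the single zone 1.10.in-addr.arpa instead would claim all 256 /24's of the /16, half of which belong to someone else.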

Another failure scenario to consider is that, just like the query for 
the SOA, you can have no response for the address lookup for the 
domain name of the name server.  I.e., akin to #7 and #8, there's a 
"name query is not answered."

#6 is a problem.  There is a name server implementation that will 
answer only for what it is authoritative for - and not answer at all 
for other queries.  So, imagine a registrant has a /22 and reserves 
the first 256 addresses (a /24) for later use.  The registrant may 
not configure the zone because it's not being used - meaning the 
first zone goes response-less on the server, but the other /24's are 
good.  (This assumes you group the zones by some registration record.)

I believe it would be good to document the query you will use and 
what the set of acceptable answers will be.  I considered the return 
code, authority bit, answer and authority counts.  (An answer count 
of 1 and authority records could indicate a CNAME in the answer and a 
referral to another server.  Yes, it happens.)
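
As a minimal sketch of what "document the acceptable answers" could look like, the following maps the four fields I mentioned to a plain symptom label.  The class, the function and the label strings are hypothetical; it assumes some other code has already sent the SOA query and extracted the fields from the reply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What one non-recursive SOA query showed; symptoms only, no diagnosis."""
    rcode: Optional[str]  # e.g. "NOERROR", "SERVFAIL", "REFUSED"; None if no reply
    aa: bool = False      # AA bit set in the reply header
    answers: int = 0      # record count in the answer section
    authority: int = 0    # record count in the authority section


def describe(obs: Observation) -> str:
    """Turn the observed fields into a symptom label (labels are illustrative)."""
    if obs.rcode is None:
        return "no response"
    if obs.rcode != "NOERROR":
        return "rcode " + obs.rcode
    if obs.answers == 0:
        if obs.aa:
            return "no error/no data"
        return "referral" if obs.authority > 0 else "empty non-authoritative reply"
    if obs.aa:
        return "authoritative answer"
    return "non-authoritative answer"


print(describe(Observation("NOERROR", aa=True, answers=1, authority=2)))
print(describe(Observation("NOERROR", aa=False, answers=0, authority=2)))
```

Counts alone cannot tell a CNAME from an SOA in the answer section, so a real tool would also have to record the type of the answer.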

#    Cases (1) and (2) above are classical signs of a zone that has been
#    forgotten by its server, either by expiry or due to syntax errors.
#
#    Cases (1) through (4) are common lame delegations, cases (5) and (6)
#    often just appear as temporary operational problems and cases (7) and
#    (8) are sometimes called stale delegations.  The latter may result in
#    a significant increase of the query volume at the servers serving the
#    domain the non existing name server is expected to reside in.

Okay, now I'm going to start down a path that may lead to a rathole. 
(Just a warning to regular participants of mailing lists.)

First, trying to guess the reason for symptoms is a slippery slope. 
Over the years I have found such a divergence in operational 
practices that divining what's in the configuration file via the 
network protocol is nearly impossible.  There are some common 
meltdowns - like configuring the /16 instead of 128 /24's for a /17 - 
but there aren't enough "common" meltdowns to say that we can 
efficiently send diagnoses to all the problem cases.  I no longer 
have an idea of numbers, but I think that something like 10-30% of 
problems (depending on how you count problems) fall into "easy to 
diagnose"; the remaining majority quickly falls into "other problems."

Second, to begin the pathway to the home of the rat, you have to ask 
"what do you want to do here?"  What is a lame delegation?  A lame 
server is defined a few times in RFCs.  Is the purpose to prune off 
lame servers or clean up DNS operations?

E.g., what if you see a server answering correctly for a zone and the 
other server is running recursively and answers non-authoritatively? 
This could be because the second server has learned the answer from 
the first via something like forwarding.  In this case, both servers 
answer correctly and won't cause an operational problem (which is 
where the 'problem' of lame delegations originated).  However, such a 
configuration could be considered a problem registration (not truly 
meeting the need for 2+ servers), and may even be an indication of 
subscriber (registrant) fraud.  (I.e., hijacking by changing the name 
server registrations, etc.)

The rathole is "what's the target of stamping out lame delegations?"

My work in the field (no longer being pursued, at least by me) 
resulted in just stating "observations."  I.e., no diagnosis, just a 
report.  Servers that did not answer gave "never heard from" or "last 
heard from" dates.

The cautionary tale here is that - because you need to repeat tests 
(I'm just stipulating that here) to properly test non-answering 
servers, sometimes a server that passed a test will slip into a 
failure mode.  My tests ignored "once good" results, meaning fat 
fingering post haste could introduce lameness.  (No test can really 
stop that.)  But the test code has to be able to "deal with it."
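
A rough sketch of that kind of bookkeeping, assuming each test run notes per server only whether it answered that day.  The class and method names are made up for illustration and are not the code I actually ran.

```python
from datetime import date
from typing import Dict


class ServerHistory:
    """Remember the most recent day each server answered; report, don't diagnose."""

    def __init__(self) -> None:
        self._last_heard: Dict[str, date] = {}

    def record(self, server: str, day: date, answered: bool) -> None:
        """Store one test result; a failed test leaves the last-heard date alone."""
        if answered:
            prev = self._last_heard.get(server)
            if prev is None or day > prev:
                self._last_heard[server] = day

    def report(self, server: str) -> str:
        """Return the observation string for one server."""
        last = self._last_heard.get(server)
        if last is None:
            return "never heard from"
        return "last heard from " + last.isoformat()


history = ServerHistory()
history.record("ns1.example.net", date(2005, 5, 1), answered=True)
history.record("ns1.example.net", date(2005, 5, 4), answered=False)
print(history.report("ns1.example.net"))  # last heard from 2005-05-01
print(history.report("ns2.example.net"))  # never heard from
```

Because every run is recorded, a server that was good once and later stops answering shows up as an aging "last heard from" date.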

#    While operational guidelines suggest that the NS RRSet of a zone and
#    the corresponding delegation in the parent zone should match, there
#    are sometimes inconsistencies.  Without acknowledging or endorsing
#    this practice the union of both NS RRSets shall be eligible for a
#    lame delegation assessment.  In other words, even an NS RR that is
#    only present in the delegated (child) zone may constitute a lame
#    delegation as well.

I firmly believe that you cannot require that the parent copy of the 
NS set be the same as the child NS set.  The parent copy should be a 
subset (improper or proper) at all times.  As a lame tester, you 
should be testing against the parent set only - as that's the data 
registered.
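
The subset relation is easy to check mechanically.  A minimal sketch, assuming the two NS sets have already been fetched as lists of host names; the function name and the example.net names are placeholders.

```python
def compare_ns_sets(parent, child):
    """Compare the parent (delegation) NS names with the child (zone apex) NS names."""

    def norm(names):
        # DNS names are case-insensitive; ignore a trailing root dot.
        return {n.rstrip(".").lower() for n in names}

    p, c = norm(parent), norm(child)
    return {
        "parent_is_subset": p <= c,      # proper or improper subset
        "parent_only": sorted(p - c),    # registered, but not listed by the child
        "child_only": sorted(c - p),     # listed by the child, not registered
    }


result = compare_ns_sets(
    parent=["NS1.example.net.", "ns2.example.net."],
    child=["ns1.example.net.", "ns2.example.net.", "ns3.example.net."],
)
print(result["parent_is_subset"], result["child_only"])
# prints: True ['ns3.example.net']
```

A lame tester following my suggestion would then query only the names in the parent set.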

I'll stop with just comments on lame delegations for now.  One thing 
to keep in mind is how to discuss this.  It would be good to document 
the testing done to explore the situation.  It would be good to 
document the symptoms observed.  It is also good to document 
diagnosis, and also remediation.

I "repeat" that because "good to document" depends on the purpose of 
the document.  A tool to perform the testing ought not get caught up 
in diagnosis; I found having it just "log" symptoms much more useful 
than having it try to "think" about the cause.

Remediation is tool dependent, so a "parent" trying to help the 
"children" needs to judge if it is willing to handhold completely, 
for preferred DNS implementations, or be fair by not offering the 
teaching service at all.  (Not helpful, but unbiased on tool choice.)

PS - There was some other rathole I wanted to step in, but I've 
forgotten it. ;)
-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Edward Lewis                                                +1-571-434-5468
NeuStar

If you knew what I was thinking, you'd understand what I was saying.