DNS Working Group

[dns-wg] Lower TTLs for NS and DS records in reverse DNS delegations

Anand Buddhdev

2021-11-29 12:59:18 CET

RIPE NCC staff member

Dear colleagues,

Users may request reverse DNS delegation by creating "domain" objects in 
the RIPE Database. Such domain objects must contain "nserver" attributes 
to specify the name servers for a reverse DNS zone, and may contain 
"ds-rdata" attributes, to specify delegation signer (DS) records.

When the RIPE NCC publishes these records in the appropriate parent 
zones, the Time to Live (TTL) of all these records is set at 172800 (two 
days).

The TTL of delegation NS records may be overridden by the TTL of NS 
records from a zone's apex. Alternatively, many large resolvers ignore 
the TTL values of NS records and cap them at much lower values such as 
21600. Finally, there is no way for a zone operator to change the TTL of 
a DS record, which is only present in a parent zone.
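The capping behaviour described above can be sketched as follows. This is only an illustration, not any resolver's actual code: the function name `effective_ttl` and the idea of a single per-resolver cap are assumptions, and the override by the child zone's apex NS TTL is not modelled.

```python
from typing import Optional


def effective_ttl(published_ttl: int, resolver_cap: Optional[int] = None) -> int:
    """TTL a resolver would honour for a record, given an optional local cap.

    Hypothetical helper: real resolvers differ in how (and whether) they cap.
    """
    if resolver_cap is None:
        return published_ttl
    return min(published_ttl, resolver_cap)


# Delegation NS at the current 172800 on a resolver that caps at 21600.
print(effective_ttl(172800, 21600))  # 21600
# DS at the current 172800 on a resolver with no cap: the full two days apply.
print(effective_ttl(172800))         # 172800
```

On a resolver without a cap, the parent-side TTL is the only thing that bounds how long an old DS record stays cached.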

Long TTLs can cause problems for users when they want to change their 
name servers or perform DNSSEC key roll-overs. A long TTL on a DS record 
is especially harmful when a user needs to do a key roll-over in an 
emergency.

We propose to lower, in the first quarter of 2022, the TTL on NS records 
to 86400 and on DS records to 3600.

We welcome feedback or discussion about this, ideally via the DNS 
Working Group mailing list. If you prefer to send your feedback directly 
to us, you can email dns _at_ ripe _dot_ net.

Regards,
Anand Buddhdev
RIPE NCC

Ralf Weber

2021-11-29 17:07:45 CET

Moin!

On 29 Nov 2021, at 12:59, Anand Buddhdev wrote:

> We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
I very much support that and would go even lower for NS records. Maybe consider 21600 there.

So long
-Ralf
——-
Ralf Weber

Dave Lawrence

2021-11-29 22:55:26 CET

Anand Buddhdev writes:
> We propose to lower, in the first quarter of 2022, the TTL on NS records 
> to 86400 and on DS records to 3600.

I am in favor of this change.  I'd also like if the change was
accompanied by measurements of the effect on the relevant
authoritative nameservers to determine whether it would be reasonable
to reduce the NS TTL even further.

Peter van Dijk

2021-12-02 12:35:08 CET

On Mon, 2021-11-29 at 17:07 +0100, Ralf Weber wrote:
> Moin!
> 
> On 29 Nov 2021, at 12:59, Anand Buddhdev wrote:
> 
> > We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
> I very much support that and would go even lower for NS records. Maybe consider 21600 there.

Same. I support this, and I also support lowering NS even further, even
to 3600.

Kind regards,
-- 
Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/


Tim Wicinski

2021-12-02 12:49:11 CET

I support lowering the TTL on the DS records to 3600.

I support lowering the TTL on the NS records - I was going to put my hat in
for 21600, but Mr van Dijk's suggestion of 3600 is very enticing.

But I liked Mr. Lawrence's suggestion on gathering data on lowering the NS
records TTL. Perhaps the TTL can be lowered from 172800 to 86400 to 21600,
then to 3600, collecting data along the way.
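As a starting hypothesis for such measurements, the staged values can be compared with a deliberately crude model. It assumes a resolver that keeps a delegation permanently cached and re-fetches it exactly once per TTL, which gives an upper bound on the growth in delegation queries from that resolver. The names `STAGES` and `refetch_factor` are illustrative, and resolvers that already cap TTLs would show a smaller increase than this.

```python
# TTL stages mentioned in this thread, in seconds.
STAGES = [172800, 86400, 21600, 3600]


def refetch_factor(old_ttl: int, new_ttl: int) -> float:
    """Upper bound on query growth if each cached record is re-fetched once per TTL."""
    return old_ttl / new_ttl


for ttl in STAGES[1:]:
    # Compare every stage against the current 172800 baseline.
    print(f"{ttl}: {refetch_factor(STAGES[0], ttl):g}x")
# 86400: 2x
# 21600: 8x
# 3600: 48x
```

Measured load after each stage could then be compared against these bounds.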

tim



Gregory Brzeski

2021-12-02 13:03:36 CET

Maybe allow users to set the TTL on the "domain" object in the RIPE Database?

Allowed values can be constrained to a few common lengths of time? The 
default would be the one decided here.

This way users would have a choice.
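A minimal sketch of what such a constrained choice could look like. Everything here is hypothetical: "domain" objects have no TTL attribute today, and the menu of values and the default below are just numbers taken from this thread.

```python
from typing import Optional

# Hypothetical menu of permitted TTLs (seconds) and a hypothetical default.
ALLOWED_TTLS = (3600, 21600, 86400)
DEFAULT_TTL = 86400


def is_allowed(ttl: int) -> bool:
    """True if the requested TTL is one of the permitted values."""
    return ttl in ALLOWED_TTLS


def resolve_ttl(requested: Optional[int] = None) -> int:
    """Return the TTL to publish: the default unless a permitted value is requested."""
    if requested is None:
        return DEFAULT_TTL
    if not is_allowed(requested):
        raise ValueError(f"TTL {requested} is not one of {ALLOWED_TTLS}")
    return requested


print(resolve_ttl())      # 86400
print(resolve_ttl(3600))  # 3600
```

Rejecting anything outside the menu keeps the number of distinct TTLs in the parent zones small.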

Gregory


Jeroen Massar

2021-12-02 13:11:17 CET

On Thu, Dec 2, 2021 at 6:35 AM Peter van Dijk <peter.van.dijk _at_ powerdns _dot_ com> wrote:
> On Mon, 2021-11-29 at 17:07 +0100, Ralf Weber wrote:
> > Moin!
> > 
> > On 29 Nov 2021, at 12:59, Anand Buddhdev wrote:
> > 
> > > We propose to lower, in the first quarter of 2022, the TTL on NS records to 86400 and on DS records to 3600.
> > I very much support that and would go even lower for NS records. Maybe consider 21600 there.
> 
> Same. I support this, and I also support lowering NS even further, even
> to 3600.

Another Aye from me on DS & NS to TTL 3600.

I think this will definitely help DNSSEC deployment, as a mistake is then much more easily corrected, which in turn means more people might deploy DNSSEC.

For reverse zones the risk is of course low, until one realizes that most SMTP servers check reverse DNS strictly, and a missing reverse record is typically considered a misconfiguration. But especially for the SMTP case, an hour's outage is tolerable: mail will be delayed but will be retried.

Greets,
 Jeroen


Petr Špaček

2021-12-02 14:46:24 CET

On 29. 11. 21 12:59, Anand Buddhdev wrote:
> Dear colleagues,
> 
> Users may request reverse DNS delegation by creating "domain" objects in 
> the RIPE Database. Such domain objects must contain "nserver" attributes 
> to specify the name servers for a reverse DNS zone, and may contain 
> "ds-rdata" attributes, to specify delegation signer (DS) records.
> 
> When the RIPE NCC publishes these records in the appropriate parent 
> zones, the Time to Live (TTL) of all these records is set at 172800 (two 
> days).
> 
> The TTL of delegation NS records may be overridden by the TTL of NS 
> records from a zone's apex. Alternatively, many large resolvers ignore 
> the TTL values of NS records and cap them at much lower values such as 
> 21600. Finally, there is no way for a zone operator to change the TTL of 
> a DS record, which is only present in a parent zone.
> 
> Long TTLs can cause problems for users when they want to change their 
> name servers or perform DNSSEC key roll-overs. A long TTL on a DS record 
> is especially harmful when a user needs to do a key roll-over in an 
> emergency.
> 
> We propose to lower, in the first quarter of 2022, the TTL on NS records 
> to 86400 and on DS records to 3600.
> 
> We welcome feedback or discussion about this, ideally via the DNS 
> Working Group mailing list. If you prefer to send your feedback directly 
> to us, you can email dns _at_ ripe _dot_ net.

I think lowering both TTLs is a step in the right direction, but let me ask 
a provocative question:

Why not make the TTL _dynamic_, based on time of last change in the RIPE 
database?

Here is a wild example how it could work - all constants are made up, 
feel free to substitute your own:

Step 1: Define upper bound for NS & DS TTLs which are "stable". Say 1 
day for both NS and DS.

Step 2:
At the moment when someone updates NS or DS, lower respective TTL to 1 
minute.

Step 3: Cycle:
Step 3a: If there was no update to the record in the last 1 hour, double 
the respective TTL. Repeat until the defined upper bound is reached. -> Go 
to Step 3
Step 3b: If there _was_ another update, reset TTL to 1 minute and reset 
the timer. -> Go to Step 3

If the upper bound was 1 hour then the maximum would be reached in ~ 6 
steps (6 hours since the change was introduced). 1 day TTL would be 
reached in 11 steps, i.e. 11 hours.
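The step counts above can be checked with a short sketch of the cycle. The function names are illustrative, the constants are the made-up ones from the steps (1 minute after an update, one check per hour), and this is not a proposal for how any registry would implement it.

```python
from typing import List


def next_ttl(current_s: int, upper_bound_s: int,
             updated_in_last_hour: bool, start_s: int = 60) -> int:
    """One pass of Step 3: reset on an update, otherwise double up to the bound."""
    if updated_in_last_hour:
        return start_s
    return min(current_s * 2, upper_bound_s)


def quiet_schedule(upper_bound_s: int, start_s: int = 60) -> List[int]:
    """TTLs in effect hour by hour after a single update with no further changes."""
    ttls = [start_s]
    while ttls[-1] < upper_bound_s:
        ttls.append(next_ttl(ttls[-1], upper_bound_s, updated_in_last_hour=False))
    return ttls


print(quiet_schedule(3600))            # [60, 120, 240, 480, 960, 1920, 3600]
print(len(quiet_schedule(3600)) - 1)   # 6 steps to reach 1 hour
print(len(quiet_schedule(86400)) - 1)  # 11 steps to reach 1 day
```

The last doubling overshoots the bound in both cases (3840 and 122880 seconds), so the result is clamped to the upper bound.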

I think something like this would provide best of both worlds:
- Quick turnaround around changes and potential problems. Most problems 
happen right after change, in which case even 1 hour is PITA.

- Automatic TTL adjustment of "stable" records lowers load on servers 
and improves reliability when outages in the DNS infrastructure happen.

- Even if the delegation was hijacked (unlikely for reverse zone, so 
here just to illustrate) the lower TTL would help fixing it/pointing it 
back to the rightful owner.

What do you think? It seems so simple that I now have to wonder why 
registries are not doing it?

-- 
Petr Špaček  @  Internet Systems Consortium

Peter Thomassen

2021-12-02 14:58:27 CET

On 12/2/21 2:46 PM, Petr Špaček wrote:
> Why not make the TTL _dynamic_, based on time of last change in the RIPE database?
> 
> Here is a wild example how it could work - all constants are made up, feel free to substitute your own:
> 
> Step 1: Define upper bound for NS & DS TTLs which are "stable". Say 1 day for both NS and DS.
> 
> Step 2:
> At the moment when someone updates NS or DS, lower respective TTL to 1 minute.
> 
> Step 3: Cycle:
[...]
> What do you think? It seems so simple that I now have to wonder why registries are not doing it?

One problem I see is that if you change or add NS/DS records, and the TTL is set to a low value without your active participation, you can no longer figure out for how long old values (pre-change) are cached somewhere, so you don't know when stale stuff will globally expire.

But knowing this may be relevant in some recovery scenarios. For example, if you remove a DS record and throw away the corresponding key, and later realize that this was an error, you will see a DS TTL on the order of a minute. That may make you think that it would not be worth recovering the old key from the backup, and that it would be better to create a new key pair and deploy it (including the DS).

Unfortunately, that won't work, because resolvers may have cached the old values for a time period that you can't determine in hindsight. Only if modifying the TTL were an explicit step could you know this (by first looking, then changing).*

So it seems to me that explicit is better than implicit (as usual). If communication channels for that are missing (e.g. in EPP), perhaps that's what the actual problem is?

* One could keep a history of TTL values somewhere, but that seems overengineered.
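To make the point concrete: what bounds the lifetime of stale data is the TTL that was published before the change, not the TTL visible afterwards. A minimal sketch, assuming resolvers honour TTLs exactly; the function name and the example timestamps are made up.

```python
from datetime import datetime, timedelta, timezone


def stale_until(change_time: datetime, ttl_before_change_s: int) -> datetime:
    """Latest moment a TTL-honouring resolver may still serve the pre-change record."""
    return change_time + timedelta(seconds=ttl_before_change_s)


# Hypothetical change made while the old record was published with a 1 day TTL.
change = datetime(2021, 12, 2, 14, 0, tzinfo=timezone.utc)
print(stale_until(change, 86400))  # 2021-12-03 14:00:00+00:00
# Reading the post-change TTL (60 s) instead would suggest 14:01 on the same day.
print(stale_until(change, 60))     # 2021-12-02 14:01:00+00:00
```

Without a record of the pre-change TTL, only the scheme's upper bound can safely be used as the first argument.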

Thanks,
Peter

Gregory Brzeski

2021-12-02 15:33:15 CET

On 02/12/2021 14:46, Petr Špaček wrote:
>
> Why not make the TTL _dynamic_, based on time of last change in the 
> RIPE database?
>
I believe this is an interesting approach; however, it requires that the same 
logic be applied to all users on the server side.

The question arises whether the same logic fits all users.

When the TTL can be set by a user on the "domain" object in the RIPE Database, 
this logic is controlled by the user, which allows different users to have 
different strategies and makes for a flexible approach.

Gregory Brzeski



Michiel Klaver

2021-12-02 15:34:25 CET

> - Quick turnaround around changes and potential problems. Most
> problems happen right after change, in which case even 1 hour is PITA.

One hour should then be your upper (stable) limit. From experience I know DNS problems can occur anytime anywhere unplanned, not just after a change in the RIPE DB.



Jim Reid

2021-12-02 15:37:44 CET

> On 2 Dec 2021, at 13:46, Petr Špaček <pspacek _at_ isc _dot_ org> wrote:
> 
> Why not make the TTL _dynamic_, based on time of last change in the RIPE database?

Because it’s a very bad idea?

1) The RIPE database and its reverse zone DNS data are orthogonal things (modulo the nameserver objects for bits of the reverse tree). These two different things shouldn’t get linked in this way. There needs to be a clean and clear separation between the two. If they get entangled, the outcome will be painful for everyone.

2) It imposes (IMO unwanted) operational requirements on the database -- uptime, availability, extra tooling, new processes, opportunities for adding cruft, etc -- unrelated to the database's prime function.

3) Changes to the RIPE database for some reverse zone do not necessarily mean changes to that zone’s DS TTLs or the LIR’s DNSSEC policies.


Anyways, to get back on topic I think it would be better to discuss TTL values for NS and DS records based on solid engineering. At present, we seem to be plucking numbers out of the air based on gut feel. Simply saying “I think the TTL should be X” is not helpful without some justification for choosing X - or why X is better than Y - or an explanation of the operational impacts.

Anand and his colleagues have identified an issue. But I’m not convinced his proposal is the right one. LIRs may well have good reasons for choosing TTLs for NS and DNSKEY RRs that are higher or lower than the defaults that are being proposed. I think this needs careful WG consideration: unintended consequences and all that.

Peter Koch

2021-12-02 15:57:54 CET

On Thu, Dec 02, 2021 at 01:11:17PM +0100, Jeroen Massar via dns-wg wrote:

> > Same. I support this, and I also support lowering NS even further, even
> > to 3600.
> 
> Another Aye from me on DS & NS to TTL 3600.

I'm slightly reminded of the solar activity cycle by another instance of
a race to low TTLs, to be followed by another train of thought recommending
high (infrastructure RRSet) TTLs in favour of resilience.

No objection to Anand's proposal at all, but maybe there are limits to committees
finding "optimum" numbers, especially under the impression of a prominent incident.

-Peter

Paul Ebersman

2021-12-02 17:28:20 CET

gregory> Maybe allow users to set TTL on "domain" object in RIPE
gregory> database?

Be nice if we could expand to a more sane/standard set of TTLs for NS/DS
in TLD/SLDs, which makes this non-functional for a slew of such zones.

gregory> Allowed values can be constrained to few common lengths of
gregory> time? The default would be one decided here.

I made this same comment during the recent DNS-OARC. There is value in a
somewhat longer TTL for steady state, to weather temporary glitches in
routing (though current values are way too long for even that). When
doing provider changes and/or KSK rollovers, a shorter value makes more
sense.

I think with some testing and actual data, we can come up with a short
list of useful values without having to allow unconstrained, random
user-picked values that will just create support issues and not really
improve the situation.

Michael Richardson

2021-12-02 18:08:13 CET

Tim Wicinski <tjw.ietf _at_ gmail _dot_ com> wrote:
    > I support lowering the TTL on the DS records to 3600.

    > I support lowering the TTL on the NS records - I was going to put my
    > hat in for 21600, but Mr van Dijk's suggestion of 3600 is very
    > enticing.

    > But I liked Mr. Lawrence's suggestion on gathering data on lowering the
    > NS records TTL.  Perhaps the TTL can be lowered from 172800 to 86400 to
    > 21600, then to 3600, collecting data along the way.

Aside from performance/load on the distributing name servers, what else would
you collect?

Or perhaps I am asking: What is your hypothesis, and what kind of data do you
need to prove/disprove it?

--
Michael Richardson <mcr+IETF _at_ sandelman _dot_ ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-

Sander Steffann

2021-12-02 23:25:05 CET

Hi Petr,

> I think lowering both TTLs is a step in the right direction, but let me ask a provocative question:
> 
> Why not make the TTL _dynamic_, based on time of last change in the RIPE database?

Because explicit is better than implicit. Magically calculated dynamic values rarely match operational expectations :)

Cheers,
Sander


Petr Špaček

2021-12-03 08:31:03 CET

Disclaimer:
I agree with everyone in this thread that explicit is better than 
implicit, and that auto-magic is much worse than operators lowering 
their TTL in time and then setting it back when they are done.

Of course, RIPE NCC can be a pioneer among registries and expose TTL to 
domain admins. In that case I will sit silently and watch how it goes.


The rest of this e-mail applies only to the situation where explicit TTL 
configuration is not possible or practical.

---- Further musing about dynamic TTL below: ----
---- Ignore if explicit TTL control is introduced ----

This ideal IMHO has several practical problems:

- In my (admittedly limited, anecdotal) experience most operators do not 
lower their TTLs before doing changes, and then when problems happen 
they are in a trap. Maybe RIPE NCC's audience would be significantly 
better in that respect, who knows. We cannot have data about that 
without exposing the explicit TTL knob.

- It does not work at all with CDS/CDNSKEY automation because AFAIK there is 
no way for the child to signal a desired TTL to the parent. One could argue 
that CDS/CDNSKEY should have lower risks so it might not be necessary.

- In my (again admittedly limited, anecdotal) experience registries do 
not _want_ to expose interface to change TTLs (for various reasons).

Another way to look at this is that explicit manual configuration, 
while theoretically the best, very much resembles the way DNS was 
done in the 1980s and not the operational reality of the 2020s. Manual and 
error-prone processes are being replaced with automation everywhere, and DNS 
should not be an exception.

In other words, I agree with purists on the theoretical level: Static 
and explicit TTLs are perfect for world full of cooperating DNS experts 
and registries, but I don't believe we are in this ideal world. And if 
the "explicit" option is not practical for any reason, we are left either 
with static or dynamic "defaults" imposed by the registry. Pick your 
poison then.


On 02. 12. 21 15:37, Jim Reid wrote:
>> On 2 Dec 2021, at 13:46, Petr Špaček <pspacek _at_ isc _dot_ org> wrote:
>> Why not make the TTL _dynamic_, based on time of last change in the RIPE database?
> 
> Because it’s a very bad idea?
> 
> 1) The RIPE database and its reverse zone DNS data are orthogonal things (modulo the nameserver objects for bits of the reverse tree). These two different things shouldn’t get linked in this way. There needs to be a clean and clear separation between the two. If they get entangled, the outcome will be painful for everyone.

Except that they already are entangled. You cannot plausibly claim they 
are orthogonal if DS & NS records are read from the database and used to 
generate zone data. (I'm not a database expert of course, but that's my 
understanding.)


> 2) It imposes (IMO unwanted) operational requirements on the database -- uptime, availability, extra tooling, new processes, opportunities for adding cruft, etc -- unrelated to the database's prime function.

I don't think so. The database already has a "changelog", and there 
already has to be a component which generates zone data from the 
relevant fields in the database. Any logic for dynamic TTLs would 
belong to this "database->zone translation layer".


> 3) Changes to the RIPE database for some reverse zone do not necessarily mean changes to that zone’s DS TTLs or the LIR’s DNSSEC policies.

Agreed. I'm theorizing about the case where "registry" does not want to 
expose TTL configuration directly.


> Anyways, to get back on topic I think it would be better to discuss TTL values for NS and DS records based on solid engineering. At present, we seem to be plucking numbers out of the air based on gut feel. Simply saying “I think the TTL should be X” is not helpful without some justification for choosing X - or why X is better than Y - or an explanation of the operational impacts.
> 
> Anand and his colleagues have identified an issue. But I’m not convinced his proposal is the right one. LIRs may well have good reasons for choosing TTLs for NS and DNSKEY RRs that are higher or lower than the defaults that are being proposed. I think this needs careful WG consideration: unintended consequences and all that.

Let's be honest here. TTLs are _always_ wrong:
Either too long when you need to do a change, or too short when there is 
an outage and long TTLs would have helped to paper over it :-)

-- 
Petr Špaček

Giovane Moura

2021-12-03 09:12:45 CET

On 11/29/21 10:55 PM, Dave Lawrence wrote:
> I am in favor of this change.  I'd also like if the change was
> accompanied by measurements of the effect on the relevant
> authoritative nameservers to determine whether it would be reasonable
> to reduce the NS TTL even further.

For folks interested in measuring the impact of TTL changes, we did
two studies in the past:

In [0] we look into the trade-offs between long and short TTLs. You can
skip the measurement details and go to Section 6 for the discussion on
pros and cons of longer/shorter TTLs. (closely related to what folks are
posting to this thread).

In [1] we look into the impact of TTLs and caching while auth servers
suffer DDoS.

-- 
/giovane
SIDN Labs


[0] https://www.isi.edu/~johnh/PAPERS/Moura19b.pdf
[1] https://www.isi.edu/~johnh/PAPERS/Moura18b.pdf



Shane Kerr

2021-12-03 10:03:20 CET

Anand,

On 29/11/2021 12.59, Anand Buddhdev wrote:

> We propose to lower, in the first quarter of 2022, the TTL on NS records 
> to 86400 and on DS records to 3600.

I support this change.


I would also support:

- Setting the TTL on NS records even lower. As a data point, at least 
one ccTLD (.CL) already has a 3600 TTL on NS records.

- Allowing LIR to set their own TTL on NS/DS records explicitly.

- Allowing LIR to request that the TTL on NS/DS records get set from the 
child (either from the TTL on the NS in the child's servers or from the 
CDS/CDNSKEY TTL).

- Allowing LIR to choose from a set of pre-defined TTL, as suggested by 
Gregory Brzeski.

- Adopting a back-off algorithm as suggested by Petr Špaček.


Cheers,

--
Shane