Missing signature for current KSK
On Tuesday, 15 February 2011 at around 13:00 UTC, our DNSSEC signer system produced an e164.arpa zone that was missing a signature for the current KSK.
We resolved the error at approximately 16:30 UTC by producing a new zone (serial 1297787668).
We apologise for the outage and will provide more details after we've analysed the incident.
Update: 04 March 2011
Together with our vendor, we have analysed the DNSSEC outage that occurred in e164.arpa on 15 February 2011 and have concluded that it was caused by a previously unknown bug in the signer system.
The publication of the new DS record for e164.arpa (key tag 33067) occurred on the same day that we began rolling out our new DNS provisioning system. During that migration we had to change the format of our zone serial numbers from YYYYMMDDNN to a Unix timestamp. Because the new values are numerically smaller than the old ones (e.g. 2011021500 -> 1297728000), we had to increment the serials in all of our zones twice to roll over to the new values. This unusual volume of zone re-transfers appears to have caused the system that verifies the publication of new DS records to skip this signature.
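The two-step serial change follows from DNS serial number arithmetic (RFC 1982): a serial may be increased by at most 2^31 - 1 in a single step, and once the 32-bit wrap-around is taken into account, 1297728000 lies further "ahead" of 2011021500 than that. The minimal sketch below uses hypothetical function names (`serial_gt`, `plan_rollover`) to compute one valid sequence of intermediate serials. The intermediate value it picks is only an illustration, not necessarily the one used in our migration.

```python
from datetime import datetime, timezone

SERIAL_BITS = 32
HALF = 2 ** (SERIAL_BITS - 1)   # 2^31
MOD = 2 ** SERIAL_BITS          # serials wrap around modulo 2^32
MAX_INCREMENT = HALF - 1        # largest single increment RFC 1982 allows


def serial_gt(s1: int, s2: int) -> bool:
    """RFC 1982 comparison: True if serial s1 is 'greater than' s2."""
    return (s1 < s2 and s2 - s1 > HALF) or (s1 > s2 and s1 - s2 < HALF)


def plan_rollover(old: int, new: int) -> list[int]:
    """Return the serials to publish, in order, to move from old to new
    using only increments that secondaries will accept as 'newer'."""
    steps = []
    current = old
    while (distance := (new - current) % MOD) != 0:
        if distance <= MAX_INCREMENT:
            steps.append(new)   # target is reachable in one legal step
            break
        # Jump as far as allowed, wrapping modulo 2^32 if necessary.
        current = (current + MAX_INCREMENT) % MOD
        steps.append(current)
    return steps


# Hypothetical example: 15 February 2011, 00:00 UTC as a Unix timestamp.
new_serial = int(datetime(2011, 2, 15, tzinfo=timezone.utc).timestamp())
print(plan_rollover(2011021500, new_serial))  # prints [4158505147, 1297728000]
```

Publishing the new serial directly would look like a decrease to secondaries (`serial_gt(1297728000, 2011021500)` is False), so they would not transfer it. Each step in the returned plan compares as "newer" than the one before.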
As this re-transfer behaviour does not occur during normal operations, we do not expect it to become a recurring problem. However, we have decided that we need to increase the sanity checks performed before a zone is published. We are now in contact with NLnet Labs and are investigating the possibility of a zone transfer proxy that validates each incoming zone against a configured set of trust anchors and only passes the zone on if it validates.
For more information:
We believe this type of sanity check would benefit the community by reducing the occurrence of DNSSEC-related incidents.
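To illustrate the kind of check such a proxy could perform, here is a minimal sketch of one test that would have caught the 15 February failure: before passing a zone on, confirm that the apex DNSKEY RRset carries a currently valid RRSIG from every trust-anchor key (here, key tag 33067 from the new DS record). The `RRSig` model and the `missing_anchor_signatures` and `forward_if_valid` functions are hypothetical, as are the other key tags and timestamps in the usage example. The sketch only checks that the signatures are present and within their validity window; it does not perform the cryptographic verification a real validating proxy would need.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RRSig:
    """Hypothetical, simplified view of an RRSIG record at the zone apex."""
    type_covered: str  # RR type the signature covers, e.g. "DNSKEY" or "SOA"
    key_tag: int       # key tag of the DNSKEY that made the signature
    inception: int     # signature inception, Unix time
    expiration: int    # signature expiration, Unix time


def missing_anchor_signatures(rrsigs: Iterable[RRSig],
                              anchor_tags: Iterable[int],
                              now: int) -> set[int]:
    """Return trust-anchor key tags with no RRSIG over the DNSKEY RRset
    whose validity window includes `now`.

    Only presence and timing are checked. Key tags can collide, so a real
    validator must match the full key and verify the signature itself.
    """
    covered = {
        sig.key_tag
        for sig in rrsigs
        if sig.type_covered == "DNSKEY" and sig.inception <= now <= sig.expiration
    }
    return set(anchor_tags) - covered


def forward_if_valid(rrsigs: list[RRSig], anchor_tags: set[int], now: int,
                     publish: Callable[[], None]) -> bool:
    """Call `publish` only if every trust anchor signs the DNSKEY RRset."""
    missing = missing_anchor_signatures(rrsigs, anchor_tags, now)
    if missing:
        # Hold the zone back; the previously published version stays live.
        print(f"not publishing: no DNSKEY signature for key tag(s) {sorted(missing)}")
        return False
    publish()
    return True


# Usage: a zone whose DNSKEY RRset is signed only by a hypothetical
# key 12345, while the parent's DS record points at key tag 33067.
published: list[str] = []
zone = [RRSig("DNSKEY", 12345, 0, 2_000_000_000),
        RRSig("SOA", 54321, 0, 2_000_000_000)]
ok = forward_if_valid(zone, {33067}, 1_297_728_000,
                      lambda: published.append("e164.arpa"))
# prints: not publishing: no DNSKEY signature for key tag(s) [33067]
```

Placing the check in a separate proxy, rather than inside the signer, means a signer bug cannot also disable the check that is supposed to catch it.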
Update: 21 April 2011
As some of you have noticed, we had another DNSSEC outage last week. The zones affected were:
ripe.net: 11:29 - 16:00 UTC on 14 April
0.a.2.ip6.arpa: 02:31 - 10:00 UTC on 15 April
After analysing the incident with our vendor, we determined that the cause of this outage was the same bug that caused the outage in e164.arpa on 15 February 2011.
Our vendor had originally concluded that the 15 February failure was triggered by unusually high load on the signer system. This time, however, the system was in normal day-to-day operation, so high load cannot explain the failure.
We've collected enough data from this incident to reproduce the circumstances, and together with our vendor we have found the bug in the system. We will receive an updated version of the software within the coming weeks. We have agreed to this timeline because the bug is only triggered under specific circumstances during a Key Signing Key (KSK) rollover.
We apologise for this outage. I would like to take this opportunity to point out that our long-term mitigation plan is to have a DNSSEC verification proxy in place. I am happy to say that our efforts have been well received and a group of other interested parties has formed to work on it.
If you would like to join the mailing list, please see: