Sapphire/Slammer Worm Impact on Internet Performance
James Aldridge, Daniel Karrenberg,
Henk Uijterwaal and René Wilhelm
New Projects Group / RIPE NCC
On the morning of 25 January 2003, the Sapphire worm (also known as "SQL Slammer") was released on the Internet. It exploited a vulnerability in Microsoft SQL Server to replicate rapidly, and it soon spread over networks worldwide. News headlines and activity on mailing lists made it clear that the attack had affected the Internet's performance.
The RIPE NCC runs several projects that monitor key elements of the Internet infrastructure around the clock. It is therefore in a unique position to give some quantitative insight into the impact of this worm. These projects are:
The Test Traffic Measurement Service. This project measures packet delays and losses between hosts on the Internet.
The Routing Information Service. This project monitors BGP announcements from approximately 220 sites worldwide.
Root Server Monitoring. This project monitors the responses to DNS queries sent to each of the thirteen root servers from 60 points on the Internet.
In this note, we summarize the effects seen by these three projects on the morning of January 25. Detailed information about the effects seen by each of the projects can be found in the appendices.
2. The experimental data
All measurements show that the event started extremely rapidly at 0530 UTC. This bears out earlier analyses of the worm itself, which concluded that it spread very aggressively.
All measurements also show that the attack did not affect all parts of the Internet equally. In some areas, service degraded significantly as soon as the attack began. In others, there were no noticeable effects at all.
The affected areas also coped differently, and the effects of the attack subsided in different patterns and at different rates. Broadly, the most serious effects were over between 0900 and 1000 UTC, and little trace of service degradation remained after 1400 UTC.
At the time of the attack, the Test Traffic Measurement Service was measuring approximately 2,350 relations (one-way paths from a sending host to a receiving host) between 49 hosts on the Internet. We selected the relations that showed significantly higher delays or losses than in the 6-hour period before the attack. The worm affected 922 relations (40%) and left the remaining 1,430 (60%) unaffected.
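The selection step can be sketched as follows. This is a minimal illustration only. The report says just that affected relations had "significantly higher" delays or losses than in the preceding six hours, so the function `is_affected` and its two thresholds are hypothetical choices, not the RIPE NCC's actual criteria. The sketch also checks that 49 hosts give 49 × 48 = 2,352 ordered sender-receiver relations, which matches both the approximate total and the sum 922 + 1,430.

```python
from statistics import median

N_HOSTS = 49
# Each relation is an ordered (sender, receiver) pair of distinct hosts.
N_RELATIONS = N_HOSTS * (N_HOSTS - 1)  # 2352

# Hypothetical thresholds; the report only says "significantly higher".
DELAY_FACTOR = 1.5   # attack median delay must exceed 1.5x the baseline median
LOSS_MARGIN = 0.05   # loss fraction must rise by more than 5 percentage points


def is_affected(baseline_delays_ms, baseline_sent, attack_delays_ms, attack_sent):
    """Flag one relation as affected by comparing the attack period with a
    baseline window (e.g. the 6 hours before 0530 UTC).

    Delays are listed only for packets that arrived, so each loss fraction
    is computed as 1 - received/sent.
    """
    baseline_loss = 1 - len(baseline_delays_ms) / baseline_sent
    attack_loss = 1 - len(attack_delays_ms) / attack_sent
    if attack_loss - baseline_loss > LOSS_MARGIN:
        return True
    if not baseline_delays_ms or not attack_delays_ms:
        return False  # no delay samples to compare
    return median(attack_delays_ms) > DELAY_FACTOR * median(baseline_delays_ms)


# Example: a path whose median delay quadrupled, with no extra loss.
print(is_affected([20.0] * 100, 100, [80.0] * 100, 100))  # True
```

Comparing medians rather than means keeps a handful of outlying delay samples from flagging a relation on their own. Any real threshold would have to be tuned against the data.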
A visual inspection of the data revealed a range of problems. The appendix discusses these in more detail.
Next, we looked at how the problems were distributed over the test-boxes, i.e. which hosts were the sender and which the receiver in the affected relations. Surprisingly, 20% of the test-boxes accounted for 86% of the affected relations. Eight boxes had problems with all other test-boxes, for both incoming and outgoing traffic. This indicates trouble at or close to the hosting site. Two further boxes had all of their incoming traffic affected. These sites may have been safe from infection but suffered from overloaded input queues. Alternatively, asymmetric routing may have meant that their outgoing traffic simply bypassed the trouble spots that affected the test traffic sent to them. The remaining 14% (130 of the 922 affected relations) were mixed cases.
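These figures are consistent with one another if we assume that the 20% of boxes are exactly the eight boxes with all traffic affected plus the two with only incoming traffic affected. The text does not say so explicitly. Ten of 49 boxes is about 20%, and the count below shows that they account for 792 relations. That is 86% of the 922, leaving the 130 mixed cases. The box numbering is hypothetical; only the counts matter.

```python
from itertools import permutations

N_BOXES = 49
# Hypothetical box IDs: 0-7 stand for the eight boxes with all traffic
# affected, 8-9 for the two boxes with only incoming traffic affected.
BOTH_WAYS = set(range(8))
INCOMING_ONLY = {8, 9}


def explained_by_boxes(src, dst):
    """True if relation src->dst touches one of the problem boxes in a
    direction in which that box's traffic was affected."""
    return src in BOTH_WAYS or dst in BOTH_WAYS or dst in INCOMING_ONLY


relations = list(permutations(range(N_BOXES), 2))  # ordered (sender, receiver)
explained = sum(explained_by_boxes(s, d) for s, d in relations)
print(len(relations), explained)                     # 2352 792
print(round(100 * explained / 922), 922 - explained)  # 86 130
```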
The test-traffic measurements therefore show no sign of a global "meltdown". Sixty percent of the measured relations show no deterioration at all. This indicates that most backbone links were unaffected and that the problems were localized at edge sites or their immediate upstream providers.
We also observed considerable asymmetry in the data. This again shows that one-way measurements are essential for understanding Internet performance.
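Given the set of affected relations, one way to quantify this asymmetry is to list the host pairs for which only one direction was affected. A round-trip measurement cannot separate the two directions. It would therefore report such a pair as degraded without showing which direction carried the problem. The function `asymmetric_pairs` and the host labels below are illustrative assumptions.

```python
def asymmetric_pairs(affected):
    """Return the unordered host pairs for which exactly one of the two
    directions appears in `affected`, a set of (sender, receiver) tuples."""
    return {frozenset((src, dst)) for src, dst in affected
            if (dst, src) not in affected}


# Hypothetical example: A<->B is affected both ways; A->C and D->B one way only.
affected = {("A", "B"), ("B", "A"), ("A", "C"), ("D", "B")}
print(sorted(sorted(pair) for pair in asymmetric_pairs(affected)))
# [['A', 'C'], ['B', 'D']]
```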
In particular, the RIS measurements show that BGP activity increased sharply, by a factor of 30 to 60, at the onset of the attack and subsided only very slowly. This suggests that the BGP routing system is easily disturbed but takes much longer to settle down again. Details can be found in the appendix.
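A factor such as "30 to 60" can be obtained by dividing the number of BGP updates in each time bin after the onset by the average per-bin count before it. Tracking this factor bin by bin also shows how slowly the activity subsides. The sketch below does this for a list of update timestamps in UNIX seconds. The 5-minute bin width, the function name `rate_factors` and the window boundaries are assumptions for illustration, not the RIS processing itself.

```python
from collections import Counter

BIN_SECONDS = 300  # assumed 5-minute bins


def rate_factors(update_times, baseline_start, onset, end):
    """Return (bin_start, factor) for every bin in [onset, end).

    `factor` is the bin's update count divided by the mean per-bin count in
    the baseline window [baseline_start, onset). Times are UNIX seconds, and
    the window boundaries are assumed to be multiples of BIN_SECONDS.
    """
    counts = Counter(int(t) // BIN_SECONDS for t in update_times)
    baseline_bins = range(baseline_start // BIN_SECONDS, onset // BIN_SECONDS)
    if len(baseline_bins) == 0:
        raise ValueError("empty baseline window")
    baseline_mean = sum(counts[b] for b in baseline_bins) / len(baseline_bins)
    if baseline_mean == 0:
        raise ValueError("no updates in the baseline window")
    return [(b * BIN_SECONDS, counts[b] / baseline_mean)
            for b in range(onset // BIN_SECONDS, end // BIN_SECONDS)]


# Synthetic data: 2 updates per bin for an hour, then a burst that decays.
times = [b * BIN_SECONDS + i for b in range(12) for i in range(2)]
times += [3600 + i for i in range(100)] + [3900 + i for i in range(60)]
print(rate_factors(times, 0, 3600, 4200))  # [(3600, 50.0), (3900, 30.0)]
```

Missing bins count as zero because `Counter` returns 0 for absent keys. A quiet bin after the onset therefore shows up as a factor of 0.0 rather than being skipped.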
The root server measurements show that the attack significantly affected the connectivity of two of the thirteen root servers. This did not cause any degradation of DNS service, because the other eleven servers remained reachable. Details can be found in the appendix.
Taking all the data together, we conclude that the Internet did not suffer a global "meltdown", even though this worm severely affected some individual sites. Sixty percent of the measured relations show no sign of deterioration. This indicates that most backbone links were fine and that the problems were localized at edge sites or their immediate upstream providers. Also, eleven of the thirteen root servers remained accessible.
This data clearly shows that many of the routine measurements taken by the RIPE NCC can be used to detect widespread problems in the Internet infrastructure and to differentiate them from local problems. This can be crucial information to NOCs at the time of a problem. We are investigating how we can combine this data and make it available in real time.