Sapphire/Slammer Worm Impact on Internet Performance
James Aldridge, Daniel Karrenberg,
Henk Uijterwaal and René Wilhelm
New Projects Group / RIPE NCC
10-February-2003
1. Introduction
On the morning of 25 January 2003, the Sapphire worm (also known as "SQL
Slammer") was released on the Internet. Exploiting a vulnerability in
Microsoft SQL Server, it replicated rapidly and soon spread across
networks worldwide. From news headlines and activity on mailing lists it
was clear the attack had an impact on the Internet's performance.
The RIPE NCC runs several projects that monitor elements of the
key infrastructure of the Internet on a 24/7 basis and is therefore in a
unique position to provide some quantitative insight into the impact of this
worm. These projects are:
- The Test Traffic Measurement Service. This project measures packet
  delays and losses between hosts on the Internet.
- The Routing Information Service. This project monitors BGP
  announcements from approximately 220 sites worldwide.
- Root Server Monitoring. This project monitors the response to
  DNS queries to each of the thirteen root servers from 60 points on
  the Internet.
In this note, we summarize the effects seen by these three projects on
the morning of 25 January. Detailed information about the effects seen by
each of the projects can be found in the appendices.
2. The experimental data
All measurements show that the event started extremely rapidly at 0530
UTC. This bears out earlier analysis of the worm itself, which concluded
that it tried to spread very aggressively.
All measurements show that not all parts of the Internet were affected
equally. In some areas service was degraded significantly immediately
at the onset of the attack. In other areas there were no noticeable effects
at all.
The affected areas also coped differently and the effects of the attack
reduced in different patterns and at different rates. A somewhat discernible
pattern is that the most serious effects were over between 0900 and 1000 UTC
and little trace of service degradations remained after 1400 UTC.
At the time of the attack, the Test Traffic Measurements were measuring
approximately 2,350 relations between 49 hosts on the Internet. We selected
the relations with significantly higher delays or losses than in the 6-hour
period before the attack. This showed that 922 (40%) of the relations were
affected by the worm, while the remaining 1430 (60%) were not.
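
The selection step described above can be illustrated with a minimal
sketch. It is not the procedure used for this note: the function name
is_affected, the use of the median, and the thresholds delay_factor and
loss_margin are assumptions chosen purely for illustration. The last lines
only redo the arithmetic on the counts quoted in the text.

```python
from statistics import median

def is_affected(baseline_delays, baseline_loss, event_delays, event_loss,
                delay_factor=2.0, loss_margin=0.05):
    """Flag a relation whose delay or loss rose clearly above its baseline.

    baseline_* describe the 6 hours before the attack, event_* the attack
    period. Delays are in milliseconds; loss is a fraction between 0 and 1.
    baseline_delays must not be empty.
    delay_factor and loss_margin are hypothetical thresholds.
    """
    if not event_delays:
        # No packet arrived at all during the event: treat as affected.
        return True
    higher_delay = median(event_delays) > delay_factor * median(baseline_delays)
    higher_loss = event_loss > baseline_loss + loss_margin
    return higher_delay or higher_loss

# Arithmetic on the counts reported in the text.
affected, unaffected = 922, 1430
total = affected + unaffected                       # 2352 relations
share_affected = round(100 * affected / total, 1)   # 39.2, quoted as 40%
```

The total of 2352 equals 49 x 48, which is consistent with one measurement
relation per ordered pair of test-boxes.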
A visual inspection of the data revealed a range of problems, which
are discussed in more detail in the appendix.
Next, we looked at the distribution of problems over the test-boxes;
i.e. which hosts were the sender and which the receiver in the affected
measurement relations. Surprisingly, we found that 20% of the test-boxes
accounted for 86% of the affected measurement relations. Eight boxes had
problems in relation to all other test-boxes both for incoming and
outgoing traffic, which indicates trouble at or close to the hosting site.
An additional two boxes had all of their incoming traffic affected.
These sites were possibly safe from infection but suffered from overloaded
input queues. It is also possible that, because of asymmetric routing, their
outgoing traffic simply bypassed the trouble spots that affected the test
traffic being sent to the host.
The 14% (130 of the initial 922) remaining cases were mixed situations.
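
The grouping by test-box can be sketched as follows. This is an
illustration under assumptions, not the analysis code behind this note: the
function name classify_boxes, the category labels, and the representation
of relations as (sender, receiver) pairs are all hypothetical.

```python
from collections import Counter

def classify_boxes(boxes, affected):
    """Classify each test-box by the direction of its affected relations.

    boxes: iterable of box names.
    affected: set of (sender, receiver) pairs flagged as affected.
    Returns a dict mapping box -> "both" (all incoming and all outgoing
    relations affected), "incoming" (all incoming affected), or "other".
    """
    boxes = list(boxes)
    peers = len(boxes) - 1                  # relations per direction per box
    out_count = Counter(sender for sender, _ in affected)
    in_count = Counter(receiver for _, receiver in affected)
    result = {}
    for box in boxes:
        all_in = in_count[box] == peers
        all_out = out_count[box] == peers
        if all_in and all_out:
            result[box] = "both"
        elif all_in:
            result[box] = "incoming"
        else:
            result[box] = "other"
    return result

# Arithmetic on the figures reported in the text.
box_share = round(100 * (8 + 2) / 49, 1)             # 20.4, quoted as 20%
relation_share = round(100 * (922 - 130) / 922, 1)   # 85.9, quoted as 86%
```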
Looking at all data we can conclude that the Internet did not come to
a global "meltdown" as 60% of the measured relations do not show any sign
of deterioration. This indicates most backbone links were fine. The
problems were localized in edge sites or their immediate upstream
provider.
We also observed a lot of asymmetry in the data, which again proves
that one-way measurements are crucial for understanding Internet
performance.
The RIS measurements show in particular that the amount of BGP activity
increased sharply (by a factor of 30-60) at the onset of the attack and
subsided only very slowly. This suggests that the BGP routing system can
be easily disturbed but takes a longer time to settle down. Details
can be found in the appendix.
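
An increase factor of this kind can be estimated by counting BGP updates
per time bin and comparing the busiest bin after the onset with a typical
bin before it. The sketch below is only an illustration: the function name
activity_factor, the 5-minute bin width, and the choice of the median as
the baseline are assumptions, not the method used by the RIS analysis.

```python
from collections import Counter
from statistics import median

def activity_factor(timestamps, onset, bin_seconds=300):
    """Ratio of the busiest post-onset bin to the median pre-onset bin.

    timestamps: arrival times of BGP updates, in seconds.
    onset: start of the event, in the same time base.
    Bins without any update are not counted in the baseline.
    """
    bins = Counter(int(t // bin_seconds) for t in timestamps)
    onset_bin = int(onset // bin_seconds)
    before = [n for b, n in bins.items() if b < onset_bin]
    after = [n for b, n in bins.items() if b >= onset_bin]
    if not before or not after:
        raise ValueError("need updates both before and after the onset")
    return max(after) / median(before)
```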
The root server measurements show that the attack significantly affected
the connectivity of two of the thirteen servers. This did not cause any
degradation in DNS service. Details can be found in the appendix.
3. Conclusions
Looking at all data we can conclude that the Internet did not come to
a global "meltdown" even though some individual sites were highly affected
by this worm. Sixty percent of the measured relations do not show any sign
of deterioration. This indicates most backbone links were fine and the
problems were localized in edge sites or their immediate upstream
provider. Also, eleven of the thirteen root servers remained
accessible.
This data clearly shows that many of the routine measurements taken by
the RIPE NCC can be used to detect widespread problems in the Internet
infrastructure and to differentiate them from local problems. This can be
crucial information to NOCs at the time of a problem. We are investigating
how we can combine this data and make it available in real time.
References
Full TTM Report.
Full RIS Report.
Full Root Server Report.