Experiences Monitoring Backbone IP Networks
Chuck Fraleigh*, Christophe Diot+, Sue Moon+, Philippe Owezarski+, Dina
Papagiannaki#, Fouad Tobagi*
* Department of Electrical Engineering, Stanford University
+ Advanced Technology Labs, Sprint
# Department of Computer Science, University College London
Network traffic measurements provide essential data for networking
research and operation. Obtaining such data from an operational IP
network, however, is not an easy task. The traffic volume on commercial
backbone networks ranges from tens of Mbit/sec on OC-3 access links to
several Gbit/sec on backbone OC-48 and OC-192 links. Furthermore, the
network contains hundreds of links, making exhaustive monitoring of the
network impractical. Finally, the traffic carried on the network can
exhibit strange phenomena resulting from routing loops, incorrect
protocol implementations, and even malicious attacks.
We present our experiences with monitoring traffic on the Sprint IP
backbone network. We have installed passive packet monitors on selected
OC-3 and OC-12 links at several locations in Sprint's network. Using the
DAG OC-3/OC-12 card developed at the University of Waikato, these systems
capture the first 44 bytes of each packet that is transmitted on the
links. Once the trace collection is completed, the data is transferred
back to our lab for offline analysis.
We first describe the system we have developed. This system differs
from other packet monitoring systems in three basic aspects. First, it is
deployed in a commercial backbone ISP. Most other traffic monitors are
installed in either research networks or in access networks. Second, our
monitoring systems are deployed at a much larger scale than other passive
measurement efforts. We currently have 11 monitoring systems installed at
one location in the Sprint network, and we are in the process of
equipping two additional locations, each with another 10 systems. We hope
these will be operational by the time of publication. Third, all of the
systems are synchronized using a GPS clock so we are able to correlate the
traces and measure one-way delays.
Managing a monitoring system of this scale is a challenging task. Each
monitored link can generate up to 100 GB of data each day. We discuss
techniques for transferring the data to the lab, storing the data, and
efficiently processing the data on a dedicated 16 node computing cluster
used for data analysis. We also present the techniques we use to
synchronize the measurements conducted at different points in the network
so that we can identify individual packets as they flow over multiple links in
the network.
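As an illustration of how synchronized traces might be correlated, the following is a minimal sketch and not the authors' actual method. It assumes each trace is a list of (timestamp in seconds, captured bytes) pairs, that the captured bytes begin at the IPv4 header, and that a packet can be recognized on a second link by its captured bytes once the fields that routers rewrite (TTL and header checksum) are zeroed. The function names are hypothetical.

```python
def invariant_key(captured: bytes) -> bytes:
    """Return the captured bytes with hop-variant IPv4 fields zeroed.

    Assumes the capture starts at the IPv4 header: byte 8 is the TTL and
    bytes 10-11 are the header checksum, both rewritten at every hop.
    """
    masked = bytearray(captured)
    masked[8] = 0
    masked[10:12] = b"\x00\x00"
    return bytes(masked)


def unique_index(trace):
    """Map key -> timestamp, keeping only keys that occur exactly once."""
    index, duplicates = {}, set()
    for timestamp, captured in trace:
        key = invariant_key(captured)
        if key in index or key in duplicates:
            # Ambiguous key: drop it so it is never matched.
            index.pop(key, None)
            duplicates.add(key)
        else:
            index[key] = timestamp
    return index


def one_way_delays(trace_a, trace_b):
    """Return key -> (time on link B - time on link A) for matched packets.

    Only meaningful if both monitors share a common clock, as with the
    GPS synchronization described in the text.
    """
    first = unique_index(trace_a)
    second = unique_index(trace_b)
    return {key: second[key] - first[key] for key in first.keys() & second.keys()}
```

Discarding keys that appear more than once in a trace trades coverage for safety: a duplicated key would otherwise produce an arbitrary, possibly wrong, delay sample.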
Finally, we present results that demonstrate the capabilities of our
system and provide information that is useful in developing future
measurement systems. Two traffic parameters greatly influenced our system
design. The first was the packet size distribution. Earlier measurement work
indicated that the average packet size in the network was around 400
bytes. Our systems record 64 bytes for every packet, so the systems were
designed to handle an average data rate of approximately 16% of the total
traffic volume. For example, if an OC-3 link were running at 100 Mbit/sec,
we would expect to record about 2 MBytes/sec (7.2 GByte/hour) of trace
data. However, we find the packet size distribution varies slightly from
link to link due to asymmetric traffic patterns. Some links tend to carry
many small ACK packets, reducing the average packet size, while other links
tend to carry large data packets, increasing the average packet size. The
second traffic parameter that influenced our system design was bursts of
minimum-size packets. These bursts represent the worst-case traffic our
system must handle. We present information on how long these bursts may
last and what impact this has on our system design. We also present
results on the number of flows/sec that are observed in the network, link
utilizations, and the delays packets incur as they travel through the
backbone.
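The sizing arithmetic above (64 bytes recorded per packet, a 400-byte average packet, and a 100 Mbit/sec link) can be reproduced with a short sketch. The function name is hypothetical, link-layer framing overhead is ignored, and the 40-byte minimum packet size used for the burst case is an assumption made here for illustration, not a figure from the text.

```python
def trace_rate_bytes_per_sec(link_bits_per_sec, avg_packet_bytes, record_bytes=64):
    """Estimate trace output rate when a fixed-size record is kept per packet."""
    packets_per_sec = link_bits_per_sec / 8 / avg_packet_bytes
    return packets_per_sec * record_bytes


# Typical case from the text: 100 Mbit/sec of traffic, 400-byte average packet.
typical = trace_rate_bytes_per_sec(100e6, 400)   # 2 MBytes/sec
typical_per_hour = typical * 3600                # 7.2 GBytes/hour
fraction_of_traffic = 64 / 400                   # 0.16, i.e. 16%

# Burst case, ASSUMING 40-byte minimum-size packets at the same bit rate.
burst = trace_rate_bytes_per_sec(100e6, 40)      # 20 MBytes/sec
```

Under that assumption, a burst of minimum-size packets produces ten times the trace data rate of the 400-byte average case at the same link rate, which is why such bursts set the worst-case load on the capture and storage path.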