

Next: 3 Experimental Setup Up: Internet Delay Measurements Previous: 1 Introduction

2 Measurements

 

There are (at least) two ways to monitor the characteristics of the Internet:

  • Passive: by monitoring the amount of traffic that passes a certain point and,
  • Active: by generating test traffic and measuring how much time it takes to ship the test traffic over the network.
The advantage of active measurements over passive measurements is that the test traffic can be generated under well-defined and controlled conditions, whereas a passive measurement depends on whatever traffic happens to pass a certain point. Another important advantage of active measurements is that they avoid the privacy concerns associated with monitoring user traffic. As we want to deliver objective measurements of the performance of the Internet, we have decided to focus on active measurements only.

[Figure 2: Experimental Setup]

The principle of our measurements is shown in figure 2. A test-box is connected 0 hops away from (or, if that is not feasible, as close as possible to) the border router of each participating ISP. By connecting the test-box 0 hops away from the border router, we exclude effects of the internal network from our measurements.

If the ISP has more than one border router, we will either connect the test-box in such a way that the delays to each border router are similar, or we will install multiple test-boxes. This way, we avoid a bias towards traffic in certain directions in our measurements.

The box at ISP-A generates a pre-defined pattern of test messages. The test messages are sent through the border router and the network to a similar box at ISP-B. Each test-box is connected to a clock. By reading these clocks when a test message is sent and when it arrives, one can determine the delay between these two providers. Other measurements can also be done with this setup; these are discussed in section 2.1.

The test-boxes at both ISPs are identical, so the process can be reversed to measure the delay from ISP-B to ISP-A. The boxes can also be used to send test traffic to similar boxes at other ISPs.

It follows from Figure 2 that the clocks at the two ISPs must be synchronized (in other words, the time offset between the two clocks must be known and remain constant) and must provide the time with an accuracy that is high compared to the time difference that we want to measure. This is discussed in more detail in section 2.3.

In the extreme case where all O(1000) ISPs in the RIPE-NCC (geographical) area participate in the project, each test-box could be receiving test traffic from and sending it to 1000 other boxes, thus testing all 1000² possible connections. However, the topology of the Internet is such that it is likely that reliable measurements of the performance of the Internet can still be made without sending test traffic over all 10⁶ possible connections. Whether, and how, a suitable subset of connections can be chosen will be studied.

Although this method could also be used to measure the performance of the internal network of each provider, we want to limit our work to the external networks only. For this reason, the test-boxes should be located as close as possible to the border routers of the ISPs, in order to eliminate any effects caused by the internal networks.

2.1 Data Collection

  There are several possibilities for the test traffic that we can generate:

  • One-way: A packet is sent from ISP-A to ISP-B, as described in the previous section.
  • Two-way: A packet is sent from ISP-A to ISP-B and then returned to the sender. This principle is used in "ping" and other tools that determine the round-trip time.

    However, assuming that the paths are known, the result of one two-way measurement will simply be the sum of the two corresponding one-way measurements. For this reason, we plan to use two-way traffic only for independent checks of the one-way results.

  • Real-life applications: measuring the performance that a user sees in a real application, such as fetching a WWW page or downloading a file by FTP, using a well-defined TCP benchmark connection.
In the first instance (phase 1, section 2.1.1), we plan to do one-way and two-way measurements. At a later stage (phase 2, section 2.1.2), this will be expanded to performance measurements for real-life applications. Already in phase 1, our measurements will produce several observables; these are discussed in section 2.1.3.

2.1.1 Phase 1

 

The one-way test traffic will consist of UDP data packets of three different sizes: small (56 bytes, as used by ping and similar tools), medium (576 bytes, the datagram size that every IP host must be prepared to accept, see RFC 791) and large (say, 2048 bytes, to see the effect of possible fragmentation of packets into smaller units and related effects).

The packets will contain:

  • The address of the sender.
  • A time-stamp that shows when the packet was sent, together with the dispersion of the clock in the sending machine. The dispersion is needed to determine the overall error in the measurement.
  • A reference number, in order to keep track of the number of packets sent and lost.
  • A hop-count (number of routers that the packet passed between sender and receiver).
  • Administrative information, such as the version number of the software and a checksum.
  • Padding to give the packet the desired size.
The receiving process will remove the padding and then add the following information to the packet:
  • The address of the receiver.
  • A time-stamp that shows when the packet was received, together with the dispersion of the clock in the receiving machine.
The result is a raw-data record that contains all information about this particular delay measurement.
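The packet contents listed above can be illustrated with a minimal sketch of one possible encoding. The design note does not define a wire format, so the field widths, the byte order, the 16-bit byte-sum checksum and the names `HEADER`, `TRAILER`, `build_packet`, `receive` and `one_way_delay` are all hypothetical choices made for illustration only.

```python
import socket
import struct
import time

# Hypothetical layout in network byte order (no alignment padding):
# sender IPv4 address, send time [s], sender clock dispersion [s],
# reference number, hop count, software version, checksum.
HEADER = struct.Struct("!4sdfIBBH")
# Hypothetical fields appended by the receiver: address, receive time, dispersion.
TRAILER = struct.Struct("!4sdf")
VERSION = 1
SIZES = {"small": 56, "medium": 576, "large": 2048}


def checksum16(data):
    """Placeholder 16-bit checksum (sum of all bytes), not a chosen algorithm."""
    return sum(data) & 0xFFFF


def build_packet(sender, seq, dispersion, size, hops=0):
    """Sender side: pack the header fields and pad the packet to `size` bytes."""
    fields = (socket.inet_aton(sender), time.time(), dispersion, seq, hops, VERSION)
    # The checksum is computed with the checksum field itself set to zero.
    csum = checksum16(HEADER.pack(*fields, 0))
    return HEADER.pack(*fields, csum) + bytes(size - HEADER.size)


def receive(packet, receiver, dispersion):
    """Receiver side: verify the checksum, strip the padding, append receiver data."""
    header = packet[:HEADER.size]
    *fields, csum = HEADER.unpack(header)
    if checksum16(HEADER.pack(*fields, 0)) != csum:
        raise ValueError("checksum mismatch")
    stamp = TRAILER.pack(socket.inet_aton(receiver), time.time(), dispersion)
    return header + stamp


def one_way_delay(record):
    """Delay in seconds from a raw-data record, assuming synchronized clocks."""
    t_sent = HEADER.unpack_from(record)[1]
    t_recv = TRAILER.unpack_from(record, HEADER.size)[1]
    return t_recv - t_sent


pkt = build_packet("192.0.2.1", seq=7, dispersion=0.001, size=SIZES["medium"])
record = receive(pkt, "198.51.100.2", dispersion=0.001)
print(len(pkt), len(record))  # 576 40
```

Because the padding is discarded on arrival, every raw-data record has the same 40-byte length in this sketch, regardless of the size of the test packet.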

The path, or routing vector, between ISP-A and ISP-B is defined as the collection of machines and networks between the border routers of these two ISPs. This path may vary over time. A tool like "traceroute" will be used to determine the path for the test traffic at any given time, and this path information will be made available.

There are several potential problems while doing these measurements:

  • Set-up effects: even for Internet traffic, the routers have to be set up before a connection between two points is established. If this is not taken into account, the measured delay will be larger than the delay that can be attributed to the network. Two possible ways to circumvent this problem are:
    • Precede any measurement by a "traceroute". This determines the path and provides the routing information that we are interested in anyway.
    • Run the test more than once and compare the results. If the first result is significantly different from the other measurements, discard it.
    Set-up effects are interesting in themselves and this data will be recorded and studied.
  • No connectivity: the fact that there is, contrary to what one expects, no connectivity between two points is interesting information in itself. However, our software must be written so that it survives this case without operator intervention.
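The second remedy for set-up effects (run the test more than once and discard a deviating first result) could look roughly like the sketch below. The 50% relative tolerance, the use of the median of the later repeats as the reference, and the function name `drop_setup_effect` are assumptions; the text only says "significantly different". The excess of the discarded first result is returned so that set-up effects can still be recorded and studied.

```python
from statistics import median


def drop_setup_effect(delays, rel_tol=0.5):
    """Discard the first delay if it deviates from the median of the later
    repeats by more than `rel_tol` (a hypothetical relative threshold).

    Returns (kept_delays, setup_excess), where setup_excess is the first
    delay minus the reference when it was dropped, and None otherwise.
    """
    if len(delays) < 3:
        return list(delays), None  # too few repeats to judge
    first, rest = delays[0], delays[1:]
    reference = median(rest)
    if abs(first - reference) > rel_tol * reference:
        return list(rest), first - reference
    return list(delays), None


kept, excess = drop_setup_effect([0.180, 0.052, 0.050, 0.051])
print(kept, excess)
```

A robust reference such as the median is used so that a single slow later repeat does not mask the set-up effect.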

For the two-way measurements, we plan to use data packets similar to those of the one-way measurements. These results will be used for consistency checks only.
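A consistency check of this kind compares a round-trip time with the sum of the two one-way delays, which should agree within the measurement errors if both directions used the same paths. The sketch below assumes independent errors (added in quadrature) and a hypothetical tolerance of `k = 3` combined errors; it ignores the echo processing time at the remote box, which a real check would have to subtract.

```python
import math


def consistent(rtt, d_ab, d_ba, err_rtt, err_ab, err_ba, k=3.0):
    """Check a round-trip time against the sum of the two one-way delays.

    All values are in seconds. Assumes the errors are independent, so they
    add in quadrature; `k` is a hypothetical tolerance in units of the
    combined error.
    """
    combined = math.sqrt(err_rtt**2 + err_ab**2 + err_ba**2)
    return abs(rtt - (d_ab + d_ba)) <= k * combined


print(consistent(0.101, 0.048, 0.050, 0.001, 0.002, 0.002))
print(consistent(0.140, 0.048, 0.050, 0.001, 0.002, 0.002))
```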

2.1.2 Phase 2

 

In the second phase of the project, we plan to do measurements with:

  • TCP-streams.
  • Simulations of applications.
  • Packet Trains. A packet train is a number of test messages that are sent in quick succession. Packet trains provide a way to study whether packets are delivered out of order and/or merged together into larger packets along the way.
The implementation of this will be discussed in a future design note.
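Both the reference numbers of phase 1 and the packet trains of phase 2 come down to analysing the sequence of reference numbers seen at the receiver. A minimal sketch, assuming packets are numbered 0, 1, 2, ... by the sender and using the hypothetical function name `sequence_stats`, counts lost packets and packets that arrive after a packet with a higher number:

```python
def sequence_stats(sent_count, received_seqs):
    """Count lost and out-of-order packets from received reference numbers.

    Assumes the sender numbered its packets 0 .. sent_count - 1. A packet is
    counted as out of order when it arrives after a packet with a higher
    number; duplicates are counted only once when computing losses.
    """
    lost = sent_count - len(set(received_seqs))
    out_of_order = 0
    highest = -1
    for seq in received_seqs:
        if seq < highest:
            out_of_order += 1
        highest = max(highest, seq)
    return lost, out_of_order


print(sequence_stats(6, [0, 1, 3, 2, 5]))  # (1, 1)
```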

2.1.3 Observables

 

These measurements will provide several observables:

  • Delay Information: The results of the delay measurements can be stored in an $N \times N$ matrix $D(t,b)$, where each element $D_{sd}(t,b)$ represents the time that a packet needed to travel from a source $s$ to a destination $d$. If there is no connection between two points, then $D_{sd}(t,b)$ will be $\infty$.

    The elements of $D$ are a function of the time $t$, as the network characteristics will change over time, and of the packet size $b$.

    The elements $D_{sd}(t,b)$ all have an error $\delta D_{sd}(t,b)$, which gives the total error in this delay measurement.

  • Routing Vector or Path Information: Each delay measurement is accompanied by a determination of the path between the two locations, using a tool like "traceroute". These results can be written as an $N \times N$ matrix $P(t)$ of vectors.

    Assuming that our test-boxes are installed at a significant fraction of the ISPs, this matrix provides an up-to-date map of the Internet as well as the history of the network.

    If there is no connectivity between two points $i$ and $j$, then the element $P_{ij}$ will be 0. The number of zeroes therefore gives a measure of the total connectivity. Also, the two elements $P_{ij}$ and $P_{ji}$ should either both be zero (no connection between these two points) or both be non-zero. If only one of them is zero, this indicates a network configuration error.

    Finally, the elements $P_{ij}$ can be used to detect routing loops: a router that appears more than once in the same path typically indicates a loop.
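The three checks on the path matrix described above (total connectivity, pairs that are connected in one direction only, and routing loops) can be sketched as follows. The representation is an assumption: `P[i][j]` is a list of router names for the path from box i to box j, with `None` standing in for the zero elements; the diagonal is ignored.

```python
from itertools import combinations


def check_paths(P):
    """Return (connectivity, one_way_pairs, loops) for an N x N path matrix.

    P[i][j] is the hypothetical list of routers from box i to box j, or None
    when there is no connectivity.
    """
    n = len(P)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    connected = sum(P[i][j] is not None for i, j in pairs)
    # Exactly one of P[i][j] and P[j][i] empty: possible configuration error.
    one_way = [(i, j) for i, j in combinations(range(n), 2)
               if (P[i][j] is None) != (P[j][i] is None)]
    # A router that occurs twice in one path points to a routing loop.
    loops = [(i, j) for i, j in pairs
             if P[i][j] is not None and len(set(P[i][j])) < len(P[i][j])]
    return connected / len(pairs), one_way, loops


P = [
    [None, ["r1", "r2"], ["r1", "r3"]],
    [["r2", "r1"], None, None],
    [["r3", "r4", "r3", "r1"], ["r5"], None],
]
print(check_paths(P))
```

In this example five of the six directed pairs are connected, the pair (1, 2) is reachable in one direction only, and the path from box 2 to box 0 contains a loop.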

It should be noted that on any particular test-box, only the results of delay measurements to that box and the path information from that box are available (in other words, the columns of the matrix D and the rows of the matrix P): the results of a delay measurement from ISP-A to ISP-B are collected at ISP-B, whereas the traceroute information is only available at ISP-A. Running the traceroute at ISP-B instead is undesirable, as the path from A to B is not guaranteed to be the same as the path from B to A.

In order to get the full matrices, the results have to be transferred to a central point. This is one of the reasons for collecting all test results on a single machine.
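At the central point, the full delay matrix is built from the columns reported by the individual test-boxes. In the minimal sketch below, the report format (a mapping from each destination box to `{source: delay}`), the function name and the choice of 0 on the diagonal are assumptions; pairs without a result are set to infinity to mark a missing connection.

```python
import math


def assemble_delay_matrix(boxes, reports):
    """Build the full delay matrix D from per-box columns.

    `boxes` lists the test-box names; `reports` maps each destination box to
    the column it collected locally, {source: delay in seconds}.
    Missing results are set to infinity.
    """
    index = {name: k for k, name in enumerate(boxes)}
    n = len(boxes)
    D = [[math.inf] * n for _ in range(n)]
    for k in range(n):
        D[k][k] = 0.0  # assumption: no delay from a box to itself
    for dest, column in reports.items():
        for src, delay in column.items():
            D[index[src]][index[dest]] = delay  # row = source, column = destination
    return D


D = assemble_delay_matrix(["A", "B", "C"],
                          {"B": {"A": 0.031}, "A": {"B": 0.029, "C": 0.045}})
for row in D:
    print(row)
```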

From these observables we can determine if there are trends in the transfer times as a function of time and isolate special or suspicious events. During the initial phase of the project we will concentrate on developing (statistical) tools to analyze the matrices. In a later phase we, or other researchers, may develop tools to correlate the data in the matrices, analyze the effect of the size of the packets or perform analysis of the routing vector matrix itself.

Once enough data has been collected, we should be able to define meaningful metrics, and we may then tune the measurements to sample these metrics optimally.

2.2 Frequency of the test traffic

There are 5 basic requirements:

  1. The test traffic should be small compared to the load on the connection under consideration. If not, then the test traffic will affect the performance and the measurement becomes useless.
  2. The intervals should be small enough to study (interesting) fluctuations in the performance of the network.
  3. As the network changes over time, the amount and type of test traffic should be easily configurable.
  4. The interval between two measurements should be small enough to detect and eliminate set-up effects.
  5. As suggested by the IPPM [2], the measurement times should be randomly distributed to prevent synchronization of events due to weak coupling, as demonstrated in [3]. The IPPM suggests using so-called Poisson sampling, in which the intervals between successive measurements are drawn from an exponential distribution. We will investigate the IPPM suggestion and other possibilities to prevent weak coupling.
The first two requirements are contradictory: smaller time intervals mean more test traffic, but more test traffic means a higher load on the network.
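Requirement 5 can be met with Poisson sampling while keeping the average rate fixed, which addresses the trade-off between the first two requirements through a single configurable parameter (requirement 3). The sketch below draws exponentially distributed intervals with a given mean; the function name and the seed parameter are assumptions, and the final choice of sampling method is still to be investigated, as stated above.

```python
import random


def poisson_schedule(mean_interval, duration, seed=None):
    """Send times in [0, duration) seconds for Poisson sampling.

    Successive intervals are exponentially distributed with the given mean,
    so the send times are not periodic and are unlikely to synchronize with
    periodic events in the network. A fixed seed makes a run reproducible.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(1.0 / mean_interval)
    while t < duration:
        times.append(t)
        t += rng.expovariate(1.0 / mean_interval)
    return times


# One small packet per minute on average, over one day (about 1440 packets).
day = poisson_schedule(60.0, 86400, seed=1)
print(len(day), day[:3])
```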

For the initial settings, suppose that we generate the following amount of test traffic:

  • 1 small packet per minute.
  • 1 medium sized packet per minute.
  • 1 large packet every 10 minutes. These packets are used to see the effects of splitting large packets into smaller ones; presumably this effect is constant over time, so a lower rate suffices.
This then generates a data volume of approximately 14 bytes/s for each connection (one direction between a pair of test-boxes). This number does not include protocol overheads or the data for the "traceroute" program.
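The figure of approximately 14 bytes/s follows directly from the initial settings: 56/60 + 576/60 + 2048/600 bytes/s. The short calculation below reproduces it; the schedule table and the function name `traffic_rate` are illustrative only.

```python
# (packet size in bytes, interval in seconds) from the initial settings.
SCHEDULE = [
    (56, 60),     # small packet, once per minute
    (576, 60),    # medium packet, once per minute
    (2048, 600),  # large packet, once every 10 minutes
]


def traffic_rate(schedule):
    """Average payload rate in bytes/s for one connection, excluding overheads."""
    return sum(size / interval for size, interval in schedule)


print(f"{traffic_rate(SCHEDULE):.2f} bytes/s")  # 13.95 bytes/s
```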

If this amount of data is too high, then one might consider increasing the interval between two packets and adding a burst mode (a short period with a higher test traffic volume) for occasional measurements with a smaller interval between test packets.

This leads to a number of questions that have to be answered:

  1. Can we generate this amount of data without the ISPs objecting? For a single connection they probably will not object, but what if we have installed test-boxes at 10, 100 or even 1000 sites, with 10², 100² or 1000² possible connections?
  2. Are we sure that the test traffic does not affect the performance of the network?
  3. How do we check that a packet has left the test-box before the next one is sent out? We do not want packets to sit in a buffer if the local network is at its maximum load.
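To put the first question in numbers, the sketch below scales the per-connection rate of the initial settings to 10, 100 and 1000 test-boxes. It assumes every box sends the same schedule to every other box and counts ordered pairs of distinct boxes, n(n-1), which the text rounds to n²; the function name `load` is hypothetical.

```python
# Bytes/s per connection for the initial settings (overheads excluded).
RATE = 56 / 60 + 576 / 60 + 2048 / 600


def load(n_boxes, rate=RATE):
    """Return (connections, outgoing bytes/s per box, total bytes/s).

    Assumes a full mesh: each box sends the same schedule to every other
    box (and therefore also receives the same amount).
    """
    connections = n_boxes * (n_boxes - 1)
    per_box = (n_boxes - 1) * rate
    return connections, per_box, connections * rate


for n in (10, 100, 1000):
    conns, per_box, total = load(n)
    print(f"{n:5d} boxes: {conns:7d} connections, "
          f"{per_box / 1000:6.2f} kB/s per box, {total / 1e6:7.3f} MB/s in total")
```

For 1000 boxes, roughly 14 kB/s would leave each box, which is why the number of tested connections (and the burst mode mentioned above) matters.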

2.3 Error Analysis, Required Clock Accuracy

In this section, we want to estimate the errors on the final results. As we mentioned before, our plan is to do both one and two way measurements.

One way measurements involve sending a time-stamped message from one machine to another. The second machine compares its arrival time with its local clock and determines the transfer delays from that. This requires that the local clocks are synchronized and provide the time with a high enough accuracy. The synchronization is the difference between the clock on this machine and an absolute time standard. The accuracy is the error in the time measurement. The errors in synchronization and accuracy determine the overall error in the delay measurement.

Two-way measurements are a bit simpler in this respect: a message is sent to another machine and echoed there. All time measurements can be done on the same machine, so the clock has to be accurate and the resolution of the clock has to be high enough. The resolution is the smallest time interval that can be measured by this clock.

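In both cases, we will be subtracting two times, $t_1$ and $t_2$. If we call the total error on each of these times $\delta t$, then, assuming the two errors are independent, the error $\delta(\Delta t)$ in the final result $\Delta t = t_2 - t_1$ is equal to:

$$\delta(\Delta t) = \sqrt{(\delta t)^2 + (\delta t)^2} = \sqrt{2}\,\delta t$$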

It should be noted that, even with a high-resolution clock, the error in the result can become relatively large for small time intervals. For example, if we measure a 50 ms time interval with a clock with a resolution of 10 ms, then the error in the result will be about 14 ms ($\sqrt{2} \times 10$ ms), or roughly 28% of the interval.
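The numbers in this example can be checked with a one-line error propagation: independent errors on the two times add in quadrature. As in the text, the 10 ms clock resolution is used as the error on each time; the function name `interval_error` is illustrative only.

```python
import math


def interval_error(err_t1, err_t2):
    """Error on t2 - t1 for independent errors on the two times (quadrature sum)."""
    return math.hypot(err_t1, err_t2)


# Example from the text: 10 ms resolution as the error on each time stamp.
err = interval_error(0.010, 0.010)
print(f"{err * 1000:.1f} ms, {err / 0.050:.0%} of a 50 ms interval")  # 14.1 ms, 28% of a 50 ms interval
```

For one-way measurements the two errors come from different clocks, so `err_t1` and `err_t2` may differ; the same function applies.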

Systematic effects of the equipment on the results have to be studied and should be understood.



Henk Uijterwaal
Fri May 30 15:42:21 MET DST 1997
 
