
Re: New Document available: RIPE-271 (fwd)

  • To: Daniel Karrenberg < >
    Roberto Percacci < >
  • From: Roberto Percacci < >
  • Date: Tue, 02 Sep 2003 21:33:03 +0200
  • Cc: "Henk Uijterwaal (RIPE-NCC)" < >

RIPE46 looks very interesting but I am attending a conference in Rome
and cannot participate in the discussion.
Let me make a few comments on what has been said on this list about this subject.

On Sun, 27 Jul 2003, Fotis Georgatos wrote:

> > It seems to me that publishing a mountain of raw data would not
> > really be of much use to the general RIPE community.
>
> Considering those who don't mind having their metrics published and those who
> do mind, we have two different groups. RIPE NCC might provide solutions for
> both, either by maintaining two different service levels with an adequate
> pricing model ('because privacy has a cost'), or by allowing members to
> self-organize their own TTM projects using the source code on self-managed
> custom setups, with RIPE NCC picking either side as the default for the
> standard service.
>
> Making a shoe of a single size might be hard to live with...

Private experiments aside, I think there should be only one TTM.
There are already too many non-interoperable measurement systems.

> > My main suggestion is that RIPE publish on the web site only some
> > statistical summaries of the TTM data. This would be of general usefulness
> > insofar as it would give everybody a general feeling of what to expect
> > from their Internet connection.
>
> Any form of public "summary" or "review" that could draw conclusions about
> ISPs' service quality will be considered potentially harmful by commercial
> providers.

The summary would be a set of quality indicators for the "internet as a whole"
(whatever that means) that do not differentiate between various networks.

> A few providers compete on quality and others on price. Both are useful.
>
> > Another way in which this could benefit the community at large is to also
> > give non-TTM participants a way of gathering their own performance
> > statistics, while at the same time contributing much larger statistics.
> > This could work as follows: a user downloads software that pings the TTM
> > hosts for some time and then extracts quality statistics.
> > In addition, these data would also be sent to the central database and
> > used in forming the "grand average".
>
> Currently, there is no better way to do this other than downloading... TTM itself.
> TTM follows very specific measurement standards, so only a handful of
> similarly-built tools can be used for data proof and correlation.
>
> I'd advise against accepting such a "grand average", since it puts into the
> measurement loop parties who may be considered "mutually untrusted", in the
> commercial sense of the word. Confidentiality issues will raise eyebrows.

Confidentiality issues should not stop us from investigating, measuring and
publishing general properties of the Internet. I think this activity is useful to the
industry and no one would object as long as they know their own data are
not shown separately.
Of course the data analysis has to be done by a trusted party.


On Wed, 13 Aug 2003, Henk Uijterwaal (RIPE-NCC) wrote:

> Roberto, others,
>
> I think your mail can be split up in several issues:
>
>
> 1. The RIPE NCC should publish a statistical summary number for the
> delays seen by a TB. This number should be independent of the
> location of the TB. You propose RTT/great-circle distance.

Perhaps a distribution rather than a single number.
People could then extract their favourite number from this distribution
(the average? the 74th percentile?).
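To make this concrete, here is a minimal sketch (the sample values are invented) of readers extracting different summary numbers from one published distribution of normalized delays:

```python
# Sketch with invented data: a published distribution of normalized delays;
# each reader extracts whatever summary number they prefer.
from statistics import mean, quantiles

samples = [18.4, 21.0, 19.7, 35.2, 22.8, 20.1, 54.3, 19.0, 23.5, 21.9]

avg = mean(samples)               # one reader's favourite number: the average
pct = quantiles(samples, n=100)   # pct[k-1] approximates the k-th percentile
p74 = pct[73]                     # another favourite: the 74th percentile
```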

> I agree, this is indeed a useful number, though I think we should
> create them for both incoming and outgoing traffic (and then add them
> up for RTT).
>
> Generating these numbers isn't a problem as such for us and I'm happy
> to take this up as a work item.
>
> I'd like to know in advance though if people feel this is a useful
> number and if not, what they'd like to see instead.

I got used to thinking of RTT/distance, but others may prefer distance/RTT,
which is the average speed at which the data travel from source to destination.
The information conveyed is equivalent.
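For illustration, a small sketch of both normalizations. The coordinates are approximate, the RTT value is invented, and the haversine formula here merely stands in for whatever great-circle computation TTM actually uses:

```python
# Hypothetical example: RTT/distance vs distance/RTT for one source-destination
# pair. Coordinates are approximate; the RTT value is invented.
from math import radians, sin, cos, asin, sqrt

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres (Earth radius 6371 km)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

# Amsterdam to Rome, roughly.
dist_km = great_circle_km(52.37, 4.90, 41.90, 12.50)
rtt_ms = 30.0                           # invented round-trip time

norm_delay = rtt_ms / dist_km           # RTT/distance: ms per km
avg_speed = dist_km / (rtt_ms / 1000)   # distance/RTT: km per second
```

Either number can be recovered from the other, which is the sense in which the information conveyed is equivalent.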

> 2. Disclosure policy. You suggest to split it up in several levels:
>
> 1. Everybody can see the summary/benchmark
> 2. A TB owner can see the results related to his box.
> 3. One cannot see data between third parties.
>
> With the exception of (1), this is essentially the current policy.
>
> I don't think that adding (1) is sufficient. There is interest in the
> raw data; sites want to play with other metrics. I would much rather make
> all data available (plus code and instructions on how to read them),
> with an AUP to avoid abuse, as this will allow everybody to run any
> analysis they like. It also opens up the possibility for researchers
> to analyze the data and come up with useful quantities that we haven't
> thought of.

The AUP is the key. It must address privacy issues in a satisfactory way
for the parties involved.


On Sun, 31 Aug 2003, Daniel Karrenberg wrote:

> As Henk has already said, the disclosure policy that you propose is
> essentially the current one with those general statistics added. I
> share your concerns but I do not see a sure way to solve both those and
> the issue we are addressing with RIPE-271: the RIPE NCC cannot develop
> information for closed user groups. I would like to go with publishing
> everything and lowering the service charges for now. My expectation is
> that we will *gain* paying test box hosts this way.

As INFN we are perfectly happy to be able to access all the data.
Depending on the AUP, what you say may also end up not being very
different from the current policy, which gives access to the data to
some research groups.

> I have also often thought about a multi-tier measurement network that
> could develop into a flatter peer-to-peer network. There are two
> fundamental problems with that. One is aggregation of the data; unless
> you do that at many places this will not scale. You simply cannot send
> all raw results to a central place for analysis.

I agree. My answer to this would be not to keep all the data.
Each box in the outer tier could do some local analysis, transmit the resulting
"pre-digested" data to the central location and throw away the rest.
Depending on how many boxes and how much data are kept,
one may still need a hierarchical structure.

> The second, more
> fundamental, problem is that of trusting the validity of observations
> taken by someone else and taking responsibility to publish results based
> on them; this becomes even more complicated when you cannot identify
> those taking the observations and there are parties interested in
> inserting certain observations favorable to themselves. Essentially I
> agree with Fotis here: this is hard! However I also agree with you that
> we should work in this direction. A valuable step might be to add a
> second tier of test boxes without GPS but still with a formal agreement
> identifying the host and requiring them to operate the box to TTM standards.

I agree here too. However, restricting ourselves to trusted parties only
would greatly limit the size of the system and hence the statistics.
There could be a third tier of measurements by less trusted parties ("TTM ultralite")
who are not bound by contracts etc.
The problem of fake data injection should not really arise if the data
from untrusted parties are not published separately, but are only used
to form the grand average.
The untrusted party would still have access to its own detailed statistics,
of course.
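As a sketch of this publication policy (the box names, numbers and interface are all hypothetical, not an existing TTM feature):

```python
# Hypothetical policy: untrusted "ultralite" contributions enter only the
# grand average; per-box results are published solely for trusted boxes.
def publish(reports):
    """reports: list of (box_id, trusted, mean_delay_ms, sample_count)."""
    total = sum(m * c for _, _, m, c in reports)
    count = sum(c for _, _, _, c in reports)
    grand_average = total / count
    per_box = {box: m for box, trusted, m, _ in reports if trusted}
    return grand_average, per_box

reports = [
    ("tt01", True, 20.0, 100),   # trusted TTM box: shown separately
    ("ul07", False, 25.0, 50),   # untrusted ultralite box: aggregate only
]
grand, shown = publish(reports)  # grand = (20*100 + 25*50) / 150
```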

Roberto



