

Note: Please be advised that this is an edited version of the real-time captioning that was used during the RIPE 56 Meeting. In some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid to understanding the proceedings at the session, but it should not be treated as an authoritative record.

RIPE 56 Test Traffic Working Group 4:00pm, Thursday, 8 May 2008

CHAIR: Good afternoon, ladies and gentlemen. We're about to start the Test Traffic Working Group, and the first item on the agenda is, well, the first item, apparently, is the logistics. This is the usual stuff: the web page, the mailing list, which everybody is free to join. Because the session is being recorded and broadcast, we'd ask everybody who's using the mike to please state your name when asking a question.

We have the usual administrative stuff: the agenda, blue sheets... no, we don't have blue sheets. We have the scribe, and is there anybody else I need to credit? Sorry, this is my first time doing this.

The first item on the agenda is me giving a brief introduction of who I am and why I'm here. Then we have a presentation, given over video conference from Russia, on available bandwidth measurements. Then we have an update from Henk on the IETF IPPM working group. I'll then give a brief talk about the way we use DNSMON, built on top of the test traffic measurement system, at Nominet. [unclear] there's going to be an update on the TTM network and an update on where the test traffic measurement system is going and, if we have time, any other business.

There are no slides for my general introduction. My name is Ian Meikle. I'm the systems manager at Nominet. I wanted to become the Chair of this working group when Henk asked for a volunteer last meeting, mainly because it's the sort of thing that interests me. I need to know how our systems are getting along, and test traffic is one very good system for doing that. I'm interested primarily in the applications built on top of it; DNSMON is the one I'm going to talk a little bit about later. But it's the sort of thing that excites me generally, to know how systems operate and how they connect.

I've already forgotten what the next item on the agenda was.

HENK UIJTERWAAL: The next item is Andre, except he's disappeared. He was on the video conference until I told him he would be up in two minutes. So I can do my talk and then see if he comes back.

CHAIR: Okay.

HENK UIJTERWAAL: Welcome everybody. For those of you who don't know me, I'm Henk Uijterwaal. I'm still with the RIPE NCC, but this afternoon I'm talking to you with my IPPM co-chair hat on. I'm talking about developments in this working group. So, the IPPM working group, what's it all about? IPPM stands for IP Performance Metrics, and it's a working group established 10, 11 years ago with this charter. The charter has been changed a bit over time, but the two paragraphs I'm showing here have been in there for, like, ten years, and they pretty much summarize it. What the group is chartered for is to develop a set of metrics that can be applied to measure the quality, performance or reliability of internet services, and these metrics are meant to be run by everybody. The group just focuses on the metrics, so: measure something. It's not specifically chartered to say what's good or bad, so we can tell you how to measure delay between boxes, we measure ten milliseconds and we tell you how to do this; whether ten milliseconds is good or bad is up to you.

So this is the charter.

Some practical information about the group: we're in the transport area, and we have a mailing list and archives where you can see what we've been discussing over the last years. Of course we do a lot of work on the mailing list. We also meet three times a year at the IETF meetings, last time in March in Philadelphia, next time late July in Dublin. The group has two chairs; currently those are Matt from Internet2 in the US and myself. We've been around for ten years. What have we been doing in those ten years? We've worked on a lot of documents and they've all been published as RFCs. First is a set of framework documents, basically the stuff you have to define in order to do any measurements at all: there's a general framework just describing the background and some terminology, a framework for bulk transfer capacity metrics (so, transferring a lot of data), and another framework that sets the definitions for network capacity. On those frameworks we've built so far six metrics: connectivity; one-way and round-trip delay, those are two metrics but similar; loss metrics; something called IP delay variation, also known as jitter, where instead of looking at absolute delays you look at differences in delay between different packets; metrics for packet reordering, where the packets are sent in one order and come out in a different order, and this quantifies that; and a loss pattern sample metric, where we try to quantify not just that packets are lost but which ones, for example, every third packet in a stream is lost. If you can detect that, your application might be able to adjust to the situation. So we have a bunch of metrics and frameworks.

We have developed two protocols: one is a metrics registry, which is a way to describe which metrics you're taking so you can put them in all kinds of databases in a standard way, and the other is an active measurement protocol, which is essentially a generalization of the work that is done by TTM. You have a mechanism to set up measurements between two generic implementations of the metrics. We also have a document on network performance metrics for periodic streams, that is, data that is sent at known, regular intervals.

So that is the work over the last ten years.

Currently we have five things on our plate on which the group is working. First of all, TWAMP, the two-way active measurement protocol: you can set up two-way measurements from one box to another box and back, between any implementations of our metrics. We're looking at composition of metrics; that's essentially a discussion where you have a case where you measure the delay between box A and box B and then the delay between box B and box C: can you add them up, and if so, how should you add them up?

XML is something that came up; we want to store some data, such as traceroutes, in XML. And there are two things I'd like to say more about: the first is packet duplication and the second is reporting.

Turning to packet duplication. Well, how did we get to this? Essentially this is something we missed ten years ago. Ten years ago we looked at loss: somebody sends a packet from one box to the other and the packet doesn't arrive, for whatever reason. That's something that happens quite often, but what also can happen is the opposite: somebody sends a packet and more than one copy arrives. One way you can actually generate this in your lab is by making some loops in your switches and things like that. It's often a network configuration issue, but it doesn't have to be. People have seen this happen, and, yeah, you want to quantify it. So this metric, which is work done in the last year, year and a half, is essentially a way to quantify duplication. How is it built up? It's a two-step thing. First you look at a single packet and you define something called the Type-P one-way packet duplication. This may sound like a long name, but it's part of the framework. So what does it have? It's a metric with a couple of parameters: a source box, a destination box, a time when the packet is sent, and a timeout. The source box sends a packet, and then the destination box looks for it, for a time up to the timeout T0, and counts the number of copies received. If it receives one, then it says one; if it receives two, then it says two. Very simple. If it doesn't receive anything, the metric is undefined, as there was no duplication to count. The other thing is that this is about packets that arrive and will be processed by the application, so we only count uncorrupted packets. If there's some lower-level error mechanism in there such that a corrupted packet triggers a "please send it again" and a second packet is sent, that is something that isn't counted.

So this is for a single packet, which by itself is of course not that interesting.

The next thing you want to do is look at some statistics, so the document also specifies ways to summarize the results. There are two: the packet duplication average, which shows the number of additional packets received in a stream, and the packet duplication rate, the fraction of packets that were sent and resulted in duplicates. These complement each other, as I'll show.

So here we have some examples. I start by sending four packets. In the normal case exactly those four packets arrive, and then obviously both rates will be zero, or 0%. Now look at the slightly more interesting case: every packet arrives twice for some reason. Now the duplication average is 100%, and the duplication rate is also 100%, since every original packet was duplicated. If I look at the third case, where instead of two copies of each packet, three copies arrive, I get a duplication average of 200%; that makes sense. The duplication rate is still 100%, since each packet that was sent arrived multiple times.

In the other case, case no. 4, I sent four packets and eight arrive, but three copies of packet 1, one of packet 2, three of packet 3 and one of packet 4. The duplication rate is only 50%, since only two of the four packets sent resulted in multiple copies. So those two statistics complement each other, and in some situations one makes more sense than the other. Of course, in these examples all the copies arrive in sequence; that doesn't have to be the case. In example number 5, if I send 1, 2, 3, 4 and then they all arrive again in the same order, the numbers will be the same as in case number two, and if you're interested in those differences, look at the reordering metrics.
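
As a rough illustration, and not part of the talk, here is a minimal sketch in Python of the two summary statistics described above, assuming the per-packet copy counts from the singleton metric are already known; the function names are illustrative.

```python
# Sketch of the two duplication statistics described in the talk.
# copies_per_packet[i] is how many copies of sent packet i arrived.

def duplication_average(copies_per_packet):
    """Additional (duplicate) packets received, as a percentage of packets sent."""
    sent = len(copies_per_packet)
    extra = sum(max(c - 1, 0) for c in copies_per_packet)
    return 100.0 * extra / sent

def duplication_rate(copies_per_packet):
    """Percentage of sent packets that arrived more than once."""
    sent = len(copies_per_packet)
    duplicated = sum(1 for c in copies_per_packet if c > 1)
    return 100.0 * duplicated / sent

# The cases from the talk, four packets sent each time:
for case in ([1, 1, 1, 1],    # normal case: average 0%, rate 0%
             [2, 2, 2, 2],    # every packet arrives twice: 100%, 100%
             [3, 3, 3, 3],    # three copies of each: 200%, 100%
             [3, 1, 3, 1]):   # eight arrive, two packets duplicated: 100%, 50%
    print(duplication_average(case), duplication_rate(case))
```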

So this draft was discussed over the last year, year and a half in the group. It's now ready to be moved to a standard, and it's something that people interested in this can implement. Yeah, even my colleagues at the NCC can do this in TTM: since you have the packets, all you have to do is some statistics to count them.

So this is work that's almost finished. Another one is at a more premature stage: the reporting draft, or the document on reporting. If you go back, you see that we define metrics for all kinds of everything. If you bother to look them up and start reading them, there are lots of parameters everybody can select: packet size, rate, stream type, time intervals and lots and lots more. Now, this is done intentionally. For some applications you want to measure some things differently than for others, so we defined a generic thing and said good engineering will sort out what the parameters should be. Good engineering is slightly harder than people thought. And we've been getting questions, things people always wanted to know, and the basic question is: how good is my connection? There you want to have an integrated view for short periods, and you also want to be able to compare your results with other measurements. If you want to do this, then you have to select a set of parameters for packet size, rate, whatever, that is, A, sensible, and B, the same as everybody else uses.

So we came up with the idea of what we call the reporting 5-tuple: report delay, loss, reordering, duplication and jitter, measured at the same time with the same packets, and you show this set of numbers for short intervals: ten seconds, one minute, intervals like this. Now, this sounds like a good idea, I think, but it's actually a question to operators. There's a draft, and that draft should be back on the IETF site soon; something went wrong there. It was there a couple of days ago and mysteriously disappeared, and it wasn't back this morning. There's a draft describing it, and we as a working group would like to get some comments from operators and users on whether this is a good idea and whether we have made reasonable assumptions.
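
As an illustration only (the draft itself is not quoted in the talk), a reporting 5-tuple record for one short interval might look something like this sketch; the field names are invented for the example.

```python
# Hypothetical shape of one reporting 5-tuple record: the five metrics,
# measured on the same packets, reported per short interval.
from dataclasses import dataclass

@dataclass
class Report5Tuple:
    interval_s: int         # e.g. a 10-second or one-minute bin
    delay_ms: float         # one-way delay
    loss_pct: float         # packet loss
    reordering_pct: float   # packet reordering
    duplication_pct: float  # packet duplication
    jitter_ms: float        # IP delay variation

print(Report5Tuple(10, 12.3, 0.1, 0.0, 0.0, 1.7))
```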

So we have a couple of other things on our plate, but if you look at this, it isn't much work. And we're also reaching the end of our current charter. So the second question I would like to ask you is basically: is there anything else we have to do? More specifically, do you think, as users, as network operators, that there is anything in daily operations that you would like to see measured, and see a standard for? This is also a motivation for my talk: I need input from the community.

Now, we have some ideas. Passive measurements: so far we've always been looking at active measurements, that is, you generate the data that you then send out. Passive measurement is looking at packets that fly by on a cable. There are some issues; the biggest ones are privacy and also statistics, since you don't have control over how the packets are sent, so you don't have much control over the statistics.

Another one is SLA monitoring. People are using the IPPM metrics to monitor SLAs. So far the question has always been: do we need to standardize this, or are people happy to use the metrics as is?

So how do we want to continue this? I said the next meeting of the group is in Dublin. We want to discuss it with the whole group there. So before that, all suggestions for work that this group should be picking up are welcome. You can either do this by talking to the chairs or by posting to the mailing list. One thing you can safely do is ignore the charter. The charter will be outdated as soon as we're finished with our current work, so we have to update it. If it's measurement but you're not sure it's in the charter, ignore the charter and post the idea, and hopefully we'll have a fruitful discussion in Dublin.

So to conclude: we did a lot of work, and I gave an overview of the last ten years of it. We're now at the phase where we're considering what to do next, and that's something where we welcome community input. With that I'd like to open the floor for questions and comments. No? Then thank you.

(Applause)

CHAIR: I think we have to do the next one.

I mentioned before that we at Nominet run a [unclear] global network of authoritative name servers. We're particularly interested in their reachability, and DNSMON is a very good source for this. There was some debate offline about whether this fitted within this working group or the DNS working group. I can see arguments for both, but in my opinion it's a talk about the reachability of name servers, nothing to do with the actual DNS data that's reported by them.

What I want to talk about is what we need to know about this global network of name servers, how we measure it at the moment using DNSMON, and what we want to do with DNSMON in the future to help us get a better overview of our network.

We have a series of key performance indicators, essentially SLAs. This is something that we used to publish so that people could see how our services were performing. We don't publish it anymore; it's something that we still use internally to make sure we know how our network is performing, and it's based around the idea of planned and unplanned downtime. All our name servers are hosted in other people's networks, so here "planned" is something that either we planned, or somebody else planned and told us about; "unplanned" downtime could actually have been planned, just not notified to us. To make it easy to see how we're actually performing, we decided to collate this into one single metric. The thresholds that we allow ourselves are 60 minutes of unplanned downtime per month, 120 minutes total per name server per month and, of course, zero downtime for the whole constellation. To make it easier to recognise whether we were meeting this kind of performance requirement, we decided to define this as being 100% performance, so if we drop below it we can see how far off we are. What do I mean by that? We define performance in terms of the total minutes within a single month, minus the actual planned and unplanned downtime that we had within that month, against the total minutes minus the total allowed downtime, and express that as a percentage.

In terms of how we monitor it, we have our own monitoring system within Nominet, which uses Nagios, to see how reachable our services are. In terms of differentiating between planned and unplanned, the onus is on us to do that in our system, and to catch everything we don't see through Nagios we use DNSMON. The way we tend to use it is that somebody, usually me, sits down at the beginning of the month and looks at the previous month's statistics using DNSMON; if you're not familiar with it, it changes with context. What we're looking at are all the name servers, as they're identified further down. This is from October 2006. I picked it because there are a couple of interesting things. For instance, there's a problem with our name server in Manchester in the UK and another with our name server in Amsterdam. Looking at the Manchester outage, this was a planned outage, moving our name server to a different data center. A couple of interesting things I noted when I was preparing for this talk: one is the obvious phase shift, the time shift, for TT26, which seems to be running about three hours earlier than the other ones, which is, well

AUDIENCE: Embarrassing.

CHAIR: Yes, it foresaw our outage by three hours. The other thing is that the Y-axis refers to test traffic boxes, and there are only ten reported here. I don't know if this is an issue with data archiving at the time, if anybody knows about that.

The second example was the name server in Amsterdam. We got this one down as unplanned, but given that it's got, well, you can see a fairly tight boundary around an hour, with slight shifts either way, it suggests to me that it was probably a planned outage that we didn't know about. The second outage, though, I'm almost sure is down to some sort of routing instability. What was interesting about this from our point of view is that, as you can see, for some of the test traffic machines this was quite visible, and it's not something that we actually detected at Nominet. So the thing that we're missing, which DNSMON gives us, is this kind of global overview.

So what future plans do we have for the use of DNSMON? We're aware that you can get hold of the DNSMON raw data, and this is something we want to incorporate. The format has been defined; it took a little bit of digging when we asked for somebody to actually find out where it was, but it's something we want to incorporate into our monitoring system, which we're going to upgrade in a major way later this year. What we hope to get from that is a global overview of our name server constellation and hopefully much more rapid notification of, and therefore response to, any kind of outages that we notice.
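
The transcript of the KPI arithmetic earlier in this talk is garbled, so the following sketch is only one plausible reading of it, not Nominet's published formula: performance is the month's total minutes minus actual downtime, over total minutes minus allowed downtime, so that using exactly the allowed downtime scores 100%.

```python
# One plausible reconstruction of the KPI described above (an assumption):
# hitting exactly the allowed downtime scores 100%; more downtime drops below it.

def performance_pct(total_min, actual_down_min, allowed_down_min=120):
    return 100.0 * (total_min - actual_down_min) / (total_min - allowed_down_min)

total = 30 * 24 * 60                # minutes in a 30-day month
print(performance_pct(total, 120))  # exactly the 120-minute allowance -> 100.0
print(performance_pct(total, 300))  # a worse month -> below 100%
```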

Are there any questions?

AUDIENCE: RIPE NCC. One comment. It looks like the graphs you showed were for events that were very far in the past, maybe months in the past. Since this is done with RRD databases behind it, the resolution changes, but the numbers shouldn't have changed. That was definitely a bug; there should be 50 or 60 of these things. But there is, I think, a threshold, do you know that, at like six months or so, when it becomes coarser?

CHAIR: Yes, for 30 days we keep something like one-minute resolution. But I can look this up

AUDIENCE: 30 days at one minute, and then it gets coarser. That's why it looked strange. I don't know why there are not enough probes there. Now, I have a question. In the raw data you can find, among other things that are not presented in the plots, the hostname.bind or id.server responses of the name server, so you could find out which instance of an anycasted or load-shared group of servers actually responded. Would that be interesting to you? Do you use these identifying tags? Do you know?

CHAIR: We don't. We don't have anycast name servers that we manage, though we do use NeuStar's anycast name server system. It would be very, very interesting to us, yes.

AUDIENCE: We have this data, and the question is about prioritizing whether we show it.

CHAIR: I'm intrigued. You said you want to keep the fine granularity for 30 days; I ask because I tend to look at this after possibly 31 days

AUDIENCE: Is it really 30 days?

AUDIENCE: 30 days at one minute, then 100 days at five minutes and then 400 days at an hour.

CHAIR: So if we carry on doing this I have to look more frequently than once a month.

AUDIENCE: If it's a requirement you can tell us to change it but it won't change data in the past. It's going to be quite an operation to change it, but it can be done. That was my question.

CHAIR: Thank you. Thank you. Is Andre online?

ANDREI SHUKHOV: Yes.

CHAIR: In that case we'll get a presentation from

(Applause)

CHAIR: Thank you.

Hi Andre. We're trying to set you up right now. Andre, can you say something.

Hello. Hello Andre. We're having some feedback problems here. Do you have a headset?

ANDREI SHUKHOV: Can you see my slides?

HENK UIJTERWAAL: Yeah, we have them up here.

ANDREI SHUKHOV: How can I see?

HENK UIJTERWAAL: Okay, you can just start talking.

ANDREI SHUKHOV: The quality of the picture from me is not good or...

HENK UIJTERWAAL: This is readable.

Sorry, I didn't quite get that.

Yes, we now have everything on the screen. So you can start talking.

(COMPUTER: Master, I have mail for you)

Andre we're ready. And now the speaker disappears.

Go ahead.

ANDREI SHUKHOV: Yes. Good afternoon. I am a professor at an aerospace university. My presentation is dedicated to.... The next slide. Let's start with a definition: the available bandwidth is the maximum amount of data that can be transmitted. It's an important metric for several applications such as grid, video and voice streaming, overlay routing, file transfers, server selection and interdomain path monitoring. Tools previously developed for available bandwidth monitoring and diagnosis often have high overhead and are difficult to use. Fortunately, although the RIPE test box doesn't measure the available bandwidth, it collects numerical values characterizing the network health, like delay, jitter, routing path, et cetera. This data allows us to investigate the basic dependence of available bandwidth on significant network parameters. Our aim is to estimate the available bandwidth from the delay values received at one point of the path.

The next slide. This formula may be used for computation between two network points. Here W is the size of the transmitted packet and D is the packet delay. Part of the delay value is caused by such constant network factors as propagation delay, transmission delay, per-packet router processing time, et cetera; call this constant component A.

The next slide. To validate this assumption, they check the minimum delay of packets of the same size for each path and plot the minimum delay against the packet size. But the parameters of their linear equation will not be a simple function of the link capacities and the propagation delays. On this slide the graphic shows the computed minimal delay. Next slide. Our model. The delay D consists of two or more components: a constant part A, related to the distance between the sites plus the propagation delay and the per-packet router processing time at each hop along the path between the sites, and a part that depends on the packet size. The formula should therefore contain D minus A: the available bandwidth is B = W / (D - A).

The next slide. Our model supposes varying the packet size on the same path for the calculation of available bandwidth. If the testing process between two [unclear] points is organized with packets of two different sizes, then the delay D also takes two different values. Our model must give an identical value of available bandwidth independently of the size of the testing packets. The resulting system of two equations, D1 = A + W1/B and D2 = A + W2/B, is easily solved for B; you can see this solution for B: B = (W2 - W1) / (D2 - D1). It is easy to use the linear approach for the calculation of the parameter A; here N is the number of hops (routers) that the traceroute command gives, and the sum runs over the individual links of the routing path.

The next slide. The measurement process. There are several ways to measure network delay between two remote points in global networks. Please remember that the field of application of our method is the range from tens of kbps to several Mbps. The basic problem is the precise measurement of delay; high precision of measurement is necessary for an accurate result. The results for delay could be retrieved from the RIPE test box database. The sizes of the testing packets should differ by several times; it is reasonable to choose 100 bytes and 1124 bytes, correspondingly. Unfortunately, the current version of the RIPE measurement system doesn't provide a simple interface for such measurements.

The next slide. Routinely, special utilities can be used for delay measurements. We tried to test [unclear], the new UDP PING and other utilities. As a result of the tests, the simplest utility, ping, was found to be the best choice for delay measurements. The measurements are shown on this slide. For example, the ADSL connection at my home gives D1 equal to 18 milliseconds and D2 equal to 42 milliseconds. That corresponds to 350 kbps of available bandwidth. During an FTP session the delay grows, corresponding to approximately 60 kbps of available bandwidth. This is a very rough computation, but it can be made quickly and independently.
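
A minimal sketch, not part of the talk, of the calculation just described, treating the quoted ping results as the delays D1 and D2 for the two packet sizes; the function name is illustrative.

```python
# Available bandwidth from delays of two packet sizes, per the model above:
# D1 = A + W1/B and D2 = A + W2/B, so B = (W2 - W1) / (D2 - D1),
# which eliminates the constant delay component A.

def available_bandwidth_bps(w1_bytes, d1_s, w2_bytes, d2_s):
    return 8 * (w2_bytes - w1_bytes) / (d2_s - d1_s)

# The values from the talk: 100-byte and 1124-byte pings on a home ADSL line.
b = available_bandwidth_bps(100, 0.018, 1124, 0.042)
print(f"{b / 1000:.0f} kbps")  # ~341 kbps, i.e. roughly the 350 kbps quoted
```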

The last slide. Additional experiments in different directions are necessary for verifying our model. During the experiments, the received data should be compared with the results of the IPERF utility from NLANR.org; we have made contact with the University of Missouri. It will be interesting to test remote directions like Australia and New Zealand. Unfortunately, there is no simple way to receive delay data for different packet sizes in the current configuration of the RIPE test box. After finishing this research, our mechanisms could be incorporated into the RIPE test box and additional graphics could be added to the RIPE Database.

CHAIR: Thank you very much Andre. Does anybody have any questions?

AUDIENCE: This is very interesting. So if I understand this correctly, you get delay measurements, round-trip delay measurements, for different-sized packets, and you estimate the available bandwidth from those measurements?

ANDREI SHUKHOV: Yes. If you measure the delay for different packet sizes, then on the basis of this data you get the available bandwidth.

AUDIENCE: So my question is about verification of the results. Have you done verification of the style where you tried your measurement

ANDREI SHUKHOV: Yes, but we tested only with the ping utility. Unfortunately we cannot yet get results from the RIPE test box. But our measurements with IPERF, our measurements give the same results. It's different but...

AUDIENCE: So you did measurements where you tried this on an otherwise unloaded link, a free link: you tried it with your method, and with IPERF by saturating the link, where it said it's saturated now, and this is around 10% error. That's what you're saying?

ANDREI SHUKHOV: Please speak louder and more slowly. I cannot follow

AUDIENCE: You did verification by using your method on an otherwise unloaded link, a free link that had no other traffic on it; you tried your method, then you tried saturating the link, you tried IPERF. Is that right?

ANDREI SHUKHOV: Yes, we compared the data. There is a traditional method of measurement with the help of the IPERF utility, but the IPERF utility demands installation on the... testing measurements... but our method allows estimating the available bandwidth from delay data, which can be received from testing at only one site, and we verified it. For example, I measured at my home, in the USA direction: the IPERF available bandwidth and the available bandwidth calculated by our method. They compared, compared now to within the small

AUDIENCE: The small differences.

ANDREI SHUKHOV: The small mistake.

AUDIENCE: Okay. Thank you.

ANDREI SHUKHOV: From the Missouri measurements we will get results from precise measurements. We need... quality of delay measurements. This quality could be received from the RIPE box; right now our measurement method gives only rough approximations for the delays. We need experimental research. But I'm sure of the theoretical model; our American colleagues made significant experiments, as shown on slide 4. You can see the experimental testing there, with the available bandwidth shown.

CHAIR: Thank you very much. Any further questions for Andre? Okay. Thank you very much.

(Applause)

CHAIR: Next we have Ruben talking about TTM.

RUBEN VAN STAVEREN: Good afternoon everyone. After this interesting talk, I'll bring you the TTM update, what we did in the last half year.

Well, the status of the grid: we currently have 97 boxes enlisted in our measurement grid, and 74, that is 76% of them, are active and running measurements. We also did a couple of installations since the last update. We have ten kits in total: five were complete kits and five were antenna kits, where hosts supplied their own hardware and got the GPS card and our special clock card from us.

The new boxes since our last update: there are a bunch of them. There are boxes deployed for internal measurements at the sponsor of this meeting, DECIX. The box in Russia, near Moscow, has also become active, and Spacenet has bought a second box for internal measurements. Unfortunately we also have a couple of decommissioned boxes: the box in Geneva, and the boxes in Sweden are also being decommissioned.

The currently running projects: well, TTM in the Russian Federation. There were some installation problems delaying the project; at least one of the boxes is fully active, and for the other one a possible hardware replacement is being discussed. The other project we have running is TTM in South America. Just before the RIPE meeting started, I polled our contact there, and they have placed at least one box, which needs network configuration and can then be installed; for the other two locations they have had a site survey, so those should be coming up next.

Pending new installs: we have a new box for Easynet, which should be in London if I remember correctly, and additional Scandinavian coverage. Our colleagues at APNIC have also obtained a box, and it is awaiting installation. The box at LINX broke down; it was one of the very earliest boxes around, and it will get a reinstallation one of these days. And our sponsor DECIX will get a box to complete the measurement grid.

How does it look in the graph? The circle marks RIPE 55, and we're now in the right corner of the graph, so you see a continuing upward trend for the boxes that are in "on" status, while the number of boxes in "setup" status even goes down a bit. So I think those are really good figures that we've achieved over the last six months.

Furthermore, we saw in TTM the deployment of ad hoc measurements. My colleagues Mark and Franz will tell you more about that in the next presentation. It's easier to use, easier to contribute to, gives faster results and measures what you want, not what we want, so to say. Also in the works are new, improved alarms, not only for TTM but as a generic platform, which will replace the legacy TTM alerting. We're also looking at dedicated operations handling in our department, because we have as many assets as our IT department, perhaps even more (who can tell?), and there are the other projects like RIS, DNSMON, et cetera, too. And we'll also be looking to transfer some of these tasks to our NCC customer services.

What is in the pipeline? Given that we frequently hand out these new antenna kits, we need to revamp the installation procedures for them, which requires clear requirements for [unclear] kit users and support for other controllers than just your plain ordinary serial or plain out [unclear]. In the beginning of TTM the boxes were only assembled by us, and we chose a very simple and straightforward configuration where only those kinds of controllers were used. So we should add support for an arbitrary number of controllers.

We'll also start upgrading boxes in the field, to reduce the maintenance overhead caused by having too many versions around, and to ease the development of new features that we can deploy, like alarms and ad hoc measurements. We will also be devising some kind of automatic in-the-field upgrade mechanism, so we won't need as much support, if possible.

In the last six months we also got a couple of sales inquiries, and we're going to rekindle interest in them and explain the new pricing scheme; Mark will tell you more about that.

That brings us to our question slide.

CHAIR: Any questions for Ruben?

AUDIENCE: University of Washington. I was wondering if you could talk a little bit more about the alarms and what data, presumably people will be able to specify what events they care about, but can you say what data they'll have access to?

RUBEN VAN STAVEREN: Do you mean the existing alarm system or the one that's going to replace it?

AUDIENCE: The new one.

RUBEN VAN STAVEREN: You have to wait for the next speaker and he will explain everything about it.

CHAIR: Any more questions? Thanks.

(Applause)

CHAIR: Next we have Franz from the RIPE NCC and Mark to talk about the future plans.

MARK DRANSE: Good afternoon. I'm Mark Dranse. I'm the Information Services manager at the NCC, and I'm going to do the update on the work we've been doing with TTM. We've been working on this for quite some time. Back at RIPE 52 there was a presentation, the first one, about the future of the TTM service. That was followed up by Henk talking about TTM futures at the Amsterdam RIPE 53 meeting back in 2006, at which point the test traffic task force was formed. We continued on at RIPE 54, where I presented a proposal based on feedback; it looked a little bit like that. At RIPE 55 in Amsterdam last time, we displayed prototypes and got consensus from the task force, and they were disbanded. I think I quite rudely forgot to thank the task force. We've done a lot of coding and...

(Applause)

MARK DRANSE: Without your participation, or you being here today, we wouldn't have done any of this. Thank you for that. Here we are now at RIPE 56, and here's what we have to show you.

The current status: we said we'd make a new alarms framework, and we've done that, and ad hoc testing, and there's interesting work that's gone on there. I'm very excited about what we've done, and I think you will be too. We have here today the lead developer on this, who is going to talk about those two things for you in a moment. Before that, there are a couple more things I want to cover. The alarms framework and ad hoc testing are two of the elements we wanted to change with the TTM futures work, in terms of making the system more useful to subscribers. There were two other things we wanted to do, namely making the system more reliable, more robust, safer and stronger and larger, and the ways we wanted to do this involved changing the pricing and making the network bigger, in our own way, with some sponsored nodes. We've done some work on that and I'll be talking about it now, quite briefly.

So, on pricing: we announced this last time. It is now live and effective as of Tuesday (Monday is a bank holiday in the Netherlands). There are two options, changing from what the structure used to be. We now have a single annual fee of 1,800 euro that includes hardware, setup, support, configuration and service fees, for a minimum three-year commitment: 1,800 per year for a minimum of three years. After that point you can go away or stay; you can extend for two years, but at that point you only pay a 1,000 euro service fee. The reason we limit that is that once boxes are five years old we begin to worry about how reliable they are and whether they will drop off the network; we would rather put new hardware in at that point.

The second option is to make a one-time up-front payment of 5,000, which works out at a discount (5,000 against three years at 1,800, which is 5,400). You can extend that again for two years. These are available for new subscribers now, as of Tuesday, essentially. For everybody who's got a live box in the field, we're going to work on migrating it. We've got a list of all of you and we're putting together a plan to migrate everybody. We're going to focus on the older boxes first; some of these have been around for quite some time and need attention. If you aren't contacted and you feel that you are someone whose box needs to be urgently replaced, contact us and we can look at where you sit on the list of migrations.

Secondly, sponsored nodes are something that we agreed we would do. The network coverage map, which I think was displayed, shows that the TTM network is quite Euro-centric. We have a number of nodes in the northern USA and some interesting ones in Asia and down towards Australia, but there are some big holes, so we agreed we'd put in some sponsored nodes so we have more probe sites to measure data from. We want to formalise the way we do that so we can be transparent. We've published a criteria document which outlines the criteria for sponsored sites, so we're not just going to give them away to mates or put one in my mom's garage or anything like that. The requirements are in the document. We favour noncommercial sites, not-for-profits, other organizations like ourselves that have an interest in the [unclear] and the growth, and we want to make sure they go in places where they're going to see a lot of traffic and routes and interesting stuff for you. That's published and online at that address. We've currently got formal agreements for two of these, one at the London internet exchange and one with APNIC in Brisbane, and that's on its way to coming online soon. And there are more sites under discussion. That concludes what I have to say, and I'll pass on to Franz. We can leave questions till the end, unless there's any burning topic now.

(Applause)

FRANZ SCHWARZINGER: Good afternoon everybody. I also work for the NCC Information Services department. For those of you who may be newcomers or don't know too much about TTM, I want to give you an introduction. It's a distributed measurement system. We have a map here that shows all the nodes that are currently online. They are mostly in Europe, but we have some efforts underway to distribute them more evenly around the globe. What do these boxes look like? Not very exciting, just a simple server. The exciting thing about it is the green cable that you see on the right side: it goes to a GPS antenna, which looks like this. The benefit of this is that we get very, very accurate timekeeping on those boxes, up to 10 microseconds accuracy. To give you a feel: a typical camera flash lasts one thousand microseconds, so within one camera flash we can measure one hundred times.

What I really want to talk to you about today is TTM futures. We proposed this at the last RIPE meeting, and there are two big things that we've now developed. The first is an alarms framework, and the other is ad hoc testing. I'm going to start off with the alarms framework. The old system, already mentioned, looks a little like this. We have this cloud of TTM boxes, and each one has its own alarm system; only the test box owner can access that system and configure an alarm there. Some of you might have multiple test boxes, and you have to configure the alarm on each test box and get multiple emails from each [unclear]. A little messy. We proposed a fancier system. We still have all the TTM boxes, but what we want to do is get all the data centrally and put it through data quality management, so that we, the engineers, can be informed instantly if something goes wrong, and only if the data quality management approves the data will it go on to the graphing systems and to the alarm system, which will then notify you in case a problem really occurs.

That system looked a bit complicated, and it also is a bit complicated, so we tried to simplify it a little, and we came up with a three-stage system. It starts with an input. We thought maybe not only TTM would benefit from such a system; maybe RIS or DNSMON would benefit from it too. So we have the input stage, which is basically a small module in the whole system that allows data to be streamed into the alarm system. The next stage is the filters; I'll show you how these work. In this stage you define what your alarm conditions should look like. And the last stage is the output. Right now you can only be notified by email, but you might want to have something written to your log, and maybe SMS in the future, and this output sends a notification to the hopefully happy engineer.

So how does this all work? I put together a very simple example. As I said earlier, we have these three stages: input, filters and output. Since we're in the Test Traffic Working Group, we'll have a look at TTM. A very simple TTM input connector, as we call it, might put a delay and a timestamp into the alarm system. For instance, I might want to be informed when the path from my test box to another test box experiences a delay of more than 30 milliseconds, so I put that filter in there and connect it to the delay output of the TTM module. And I want to be informed by email, so I connect the email output to my filter. I might also want to be informed by text message, so we put the SMS output module in as well and connect it to the filter. When I go home from work I usually don't like to be bothered so much with work stuff, so maybe I only want to get an SMS between nine a.m. and five p.m., and I can do that: I put a nine-a.m.-to-five-p.m. filter in there, and I will only get an SMS if it's between nine and five and the delay is more than 30 milliseconds. It's a rather simple example, but you can probably imagine how much more is possible with this system. You can stack more filters next to each other, or more in parallel. It really is quite powerful, and we're also planning to have a very nice user interface where you can drag these things around, rearrange them and see them.
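
As a hypothetical sketch only (the NCC's actual implementation and API are not shown in the talk), the three-stage example above might be wired together like this; all names are invented.

```python
# Hypothetical input -> filters -> outputs pipeline, as in the example above.
from datetime import datetime

def ttm_input():
    # An input connector streams (delay, timestamp) samples into the system.
    yield {"delay_ms": 42.0, "timestamp": datetime(2008, 5, 8, 14, 30)}
    yield {"delay_ms": 12.0, "timestamp": datetime(2008, 5, 8, 20, 15)}

def delay_filter(samples, threshold_ms=30.0):
    # Pass only samples whose delay exceeds the threshold.
    return [s for s in samples if s["delay_ms"] > threshold_ms]

def office_hours_filter(samples, start=9, end=17):
    # Pass only samples timestamped between nine a.m. and five p.m.
    return [s for s in samples if start <= s["timestamp"].hour < end]

def email_output(sample):
    print("EMAIL:", sample)

def sms_output(sample):
    print("SMS:  ", sample)

# Email on every threshold breach; SMS only during office hours.
breaches = delay_filter(ttm_input())
for s in breaches:
    email_output(s)
for s in office_hours_filter(breaches):
    sms_output(s)
```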

Where are we now? We have the basic underlying system finished, and we're going online with the beta version right now; here's the URL, and you can try it out. We were thinking about what we should use as a first input module, and with the YouTube incident we decided we would do MyASN, so if you use MyASN you can try it out. We will have TTM functionality in the next two months, offering centralized real-time monitoring of delay and loss, and test box hosts will get accounts. We'll announce this on the mailing list once we're there.

The bigger roadmap: what do we want to do with this? Short term, we want to add more plug-ins, more modules: TTM, and DNSMON would be nice to have, plus the new ad hoc testing I'm going to show you. Long term, there are system improvements: better user management than we have now, and user-defined presets. And I already said that we would like to have a user interface where you can drag those filters around. Yes, that was the alarms framework: a flexible system, a simple user interface, extensible to other services.

The next big thing is ad hoc testing. Test traffic measurement, as it is right now, does one-way delay and loss measurements between all the nodes. You, the Test Traffic Working Group, asked us to do a little bit more: you wanted us to make arbitrary ad hoc experiments possible, with the ability to measure any network in the entire world. So we went ahead and did this, and this is what we came up with. We developed an API which allows you to describe a user interface and the data presentation in XML and to write the test procedures in Perl. These get distributed over the TTM network, and the data presentation is generated from them. Everything is stored in a database on the other side. And all of this is finished: two weeks ago we did the final step, finished the project, and it's already running. Those of you who visited me at the demo stand might have already seen it; for those of you who haven't, I'm going to demo it now quickly again, and I hope we have enough time. Yes, it looks like it. If you go to the site, this is what it looks like. You have to log in with your user account. These are a few tests we ran. I'm going to create a new one and call it "test traffic working group test". We have two of these plug-ins, PING and HTTP, and we're going to extend this. PING: we'll have a look at ROSIE. The nice thing is this is all real-time, so I'm going to schedule a test that will run in one minute, with an interval of ten seconds, so that we can see something nicely already. And down here the plug-in-specific parameters come up; these are generated from those XML files. I'm going to put in ROSIE here. We'll stick to IPv4 for now; IPv6 works as well, but... then we submit the test. Unfortunately it just jumped to 521. We can wait for these results. We could also edit the test here and change it if we changed our mind: I wanted it at some other time, I wanted another region, I wanted it to be run on other test boxes. That is all possible. Before we wait too long, maybe I'll show you a test we ran earlier. This is how these pages look. The user interface is fully generated. You have the graph at the bottom and the raw data in table form; you can sort it by test box or retrieval time. It should be time now, so let's have a look at the live data. You see the scheduled test, it's down here now under the previous test, and it's running now. The first three results are coming in, from Japan, from the US and from Dublin. This is live data, currently running on the test boxes, coming in live and available to you, and if we refresh now we'll have even more measurements.

That was the demo on this and I'll go back to my presentation.
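
The real plug-ins are written in Perl with their interfaces described in XML, per the talk; as a language-neutral illustration only, a minimal test procedure for a PING-style plug-in might look like this Python sketch, with all names invented.

```python
# Hypothetical shape of an ad hoc test plug-in's measurement procedure:
# take parameters, run the probe, return rows for the central database
# and the generated data presentation.
import re
import subprocess

def run_ping_test(target, count=3):
    out = subprocess.run(["ping", "-c", str(count), target],
                         capture_output=True, text=True, timeout=30).stdout
    rows = []
    for match in re.finditer(r"time=([\d.]+) ms", out):
        rows.append({"target": target, "rtt_ms": float(match.group(1))})
    return rows

if __name__ == "__main__":
    print(run_ping_test("example.net"))  # e.g. [{'target': ..., 'rtt_ms': 11.2}]
```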

Again, the roadmap: what do we want to do? Short term, have a few more plug-ins: SMTP, POP3, that's what we came up with, and we want to add multicast monitoring; that's what I was talking about at the last RIPE meetings, and I think it would be a nice thing to integrate. More suggestions from you would also be welcome. Long term plans would be integration with alarms, or better, making an input module for alarms so it works with this. And we need to work on some scalability issues, not really because of the system but because of the graphs: we tried this with more boxes and the graphs got very messy, so we need to think of new visualization techniques to expand this to more test boxes. How to get access: you have to be a test box host for now. We hope that nobody will use it for malicious purposes, but that's why we only make it available to test box hosts for now. You can request an account by sending us an email; if you do that now, we'll generate those accounts for you on Tuesday. This is the URL where you can access it. We'll also send out an announcement on the test traffic mailing list so everybody knows about it. So that was the ad hoc testing: you can measure any network, it's easy to extend, and it has a unified interface, so it all looks the same and will be familiar after a few uses. That's basically it. Before I go to the questions, I would like to invite you to the demo stand. I will be outside after this working group ends for any further questions, to show you this ad hoc testing or any of our other information services once again, and a bit slower. And we also have these nice brochures with quite a lot of information about test traffic inside; if you're interested, grab one. They're also inside this room.

CHAIR: Any questions?

AUDIENCE: You just said that you are developing, or the short-term plan is to develop, a plug-in based interface, the possibility to have plug-ins in the software. Why wouldn't you just make an API description of the software and perhaps run it as open source, so you get more input from the community?

FRANZ SCHWARZINGER: I've heard this suggestion a few times today. Currently we don't have any real plans to open this up and give it out to the community, but if the Test Traffic Working Group wants this, I guess it would technically be possible. I would also say you'd probably have to think about some restrictions on what you would actually want to run, because it's going to run on your test boxes. I guess there should be some kind of documentation

AUDIENCE: You can still restrict what is being run on the test boxes, but you can open up the development. You can still be the one who decides what is going to run on the test boxes, probably by voting for this or that plug-in or this or that extension; but the development itself, which is the resource-consuming part, can be done separately.

FRANZ SCHWARZINGER: But what would be the criteria that we use to select which ones we take and which ones we don't?

AUDIENCE: Make an open vote on which ones you take and which ones you don't, based on the features that this or that plug-in has.

FRANZ SCHWARZINGER: We would need some set criteria, but it could be possible.

CHAIR: Any other questions?

AUDIENCE: Niall O'Reilly UCD. Do you have one of these boxes here at the meeting to show us? Then I should stop by.

FRANZ SCHWARZINGER: We have one box; I guess it's in the room, I don't know, you would have to talk to him. We also have the GPS antenna here: you can see it from outside the hotel, above the entrance somewhere; there's a small white box. I showed the picture earlier.

MARK DRANSE: As long as we don't get an enormous queue we can probably show the box to a couple people.

CHAIR: Okay. Any other questions? I've got one. In the demo you showed the results coming back from four test boxes?

FRANZ SCHWARZINGER: Yes.

CHAIR: How were those selected?

FRANZ SCHWARZINGER: We have regions defined: Europe, Asia, North America, South America, and the boxes are selected randomly within those. We want to add an interface where you can select which boxes to use. We wanted to have it ready for this RIPE meeting, but we didn't manage; it will be ready in the next two weeks or so.

CHAIR: Okay. Thank you very much.

(Applause)

CHAIR: That leaves with one agenda item which is any other business? Anything anyone wants to raise? I'd like to thank our scribe and stenographer and all our speakers. Thank you.

(Applause)