Routing Working Group Minutes RIPE 75

Wednesday, 25 October 2017, 9:00-10:30
WG Chairs: Joao Damas, Paolo Moroni
Status: Draft


A. Administrative Matters


The presentation is available at:
https://ripe75.ripe.net/archives/video/147

Joao Damas, Routing WG Co-Chair, welcomed everyone to the session and declared the minutes from RIPE 74 approved.

B. Workshop/Feedback About the Routing WG Future Evolution


The presentation is available at:
https://ripe75.ripe.net/archives/video/204

Joao asked for input on the WG’s charter.

Ignas Bagdonas, Equinix, recommended a focus on close interaction and alignment with standards organisations such as the IETF, adding that communication between the operations community and the IETF isn't optimal. He also advised that the WG focus on Internet Exchange operators.

Rudiger Volk, Deutsche Telekom, agreed that a regular IETF update would be beneficial to the working group. He added that he didn't feel many operational requirements had been brought up in the working group in the past, and he wasn't sure that this would be fixed in a substantial way.

Geoff Huston, APNIC, added that small changes in BGP have been made solely on the basis of operator input, rather than IETF invention. He emphasised the relevance of discussing route leaks, BGP security and alternatives to inter-domain routing as part of the working group's charter.

Joao thanked everyone for their input and said that he had heard general consensus on enhancing this back-and-forth communication, especially between operators and vendors, protocol definition, etc. He added that the chairs would work on finalising the charter.

C. Working Group Chair Selection


The presentation is available at:
https://ripe75.ripe.net/archives/video/149

Joao reiterated that he was going to step down as WG Co-Chair. He had been doing it for a long time and thought it was time for someone else to come in with fresh ideas.

Joao added that there was a volunteer when the call went out, and so he confirmed Ignas Bagdonas as the new Routing Working Group Co-Chair.

D. [RACI] Smarter BGP Convergence - Juan Brenes


The presentation is available at:
https://ripe75.ripe.net/archives/video/150

Peter Hessler, Hostserver GmbH, asked whether Juan had looked at how the algorithm would apply if an operator were to mark an AS neighbour or prefix with a so-called low packet-per-second rate as extremely valuable.

Juan replied that he hadn't checked that yet. They found that the traffic was mostly regular during the day, so the prefix ranks didn't change much. He added that the ranking is therefore not affected by the time at which you do the measurements or compute the ranks.

Geoff Huston commented that this work relies on a common misimplementation of BGP. With MRAI (Minimum Route Advertisement Interval) timers as commonly implemented, when you have an update to send to your peer, you place a pointer to that prefix in a queue and wait for a per-peer timer to expire; at that point, you drain the queue and send all the queued updates to your peer. The effect is that you send a burst of updates roughly every 30 seconds, plus or minus 3 seconds. But that is a Cisco-ism that is not in the standard, because it is certainly possible to put the timer on the prefix rather than on the peer. As soon as you do that, when you've got something to say to your neighbour, you say it, modulo the MRAI timer applied to that prefix, and there is no longer any prioritisation: once a prefix is due for an update, the update is sent. So what Juan was really talking about is altering the behaviour of the misimplemented MRAI timer and wait queue commonly used, or abused, by Cisco; if you did something different, i.e. used per-prefix timers and spent a bit more memory, you would actually get better outcomes anyway.
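To make the contrast Geoff described concrete, here is a minimal toy model, not taken from the presentation: a single peer, a fixed 30-second MRAI with no jitter, and updates given as time-sorted (seconds, prefix) pairs. The function names are hypothetical, and the per-peer timer is simplified to fire at fixed multiples of the MRAI.

```python
import math

MRAI = 30.0  # seconds; assumed fixed value, jitter ignored in this toy model


def per_peer_delivery(events, mrai=MRAI):
    """Per-peer timer: queued updates go out only when the peer's timer fires.

    Simplification: the timer is assumed to fire at every multiple of `mrai`
    since t=0, so each update waits for the next boundary (bursty output).
    """
    return [math.ceil(t / mrai) * mrai for t, _prefix in events]


def per_prefix_delivery(events, mrai=MRAI):
    """Per-prefix timer: an update is sent at once unless that prefix was
    advertised less than `mrai` seconds ago; changes that arrive while a send
    for the same prefix is still pending are coalesced into that send.

    `events` must be sorted by time.
    """
    last_or_pending = {}  # prefix -> time of its most recent (or pending) send
    out = []
    for t, prefix in events:
        prev = last_or_pending.get(prefix)
        if prev is None or t >= prev + mrai:
            send_at = t              # this prefix's timer has expired: send now
        elif prev >= t:
            send_at = prev           # coalesce into the already-pending send
        else:
            send_at = prev + mrai    # wait out this prefix's own timer only
        last_or_pending[prefix] = send_at
        out.append(send_at)
    return out
```

With the per-prefix variant, an unrelated prefix is never held back by another prefix's timer, which is the point Geoff made about prioritisation becoming unnecessary.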

Juan agreed and added that they know Cisco and Juniper do different things related to MRAI. He mentioned an experiment they have to analyse how these differences affect the Internet. They found that it takes a lot of time to analyse 500,000 prefixes, and that this is not completely related to MRAI: sometimes it is affected by MRAI, but that is not the whole story. He asked whether, given that nothing like this currently exists, it would be possible to follow up on this work and push it through as an RFC, and asked for suggestions.

Rudiger commented that if he had something to offer to the IETF, he should just write a draft.

Juan clarified that he was wondering if it was useful to bring this kind of approach to the IETF.

Rudiger replied that they hadn't heard much about how the prefixes are prioritised. It sounded more as if Juan was making the statement that every BGP failure creates a problem, and that is entirely false. He added that there are two kinds of statements: one is that the Internet is designed and operated as a largely resilient infrastructure; the other is that the Internet is broken all the time by design. The two go hand-in-hand. He added that the statement about BGP failures resulting in a loss of money was wrong; what matters is the kind of consequences that some types of BGP failure have, and the differences between them. He commented that Juan's metric for defining what was important was extremely interesting.

Juan replied that they are not pushing for any particular algorithm to rank the prefixes. They just thought it would be a good idea to use traffic to rank them. Their main argument is that it would be useful to have a way of establishing the order of BGP updates, which network operators could use to determine what happens in their networks.

He added that Rudiger was probably right that the Internet is designed to fail all the time, but we have to make the most of the routing protocol to avoid eventually losing money.

Peter Hessler referred to Greg Mounier's presentation in the plenary, which showed that micro blackholes are a concern for many operators. He praised the work going into minimising their impact and into how operators can change their implementations to limit the damage. He said he'd like to see more of this discussed, and that the IETF is a good place to do that. He suggested that Juan send the IETF a draft describing his algorithm and how it is used.

Peter added that his network doesn't send a lot of traffic to (for example) Amazon S3, but their customers consider it extremely important. Being able to mark certain types of prefixes as "converge fast" would help with micro blackholes from their perspective as well.

Juan replied that someone had asked this on the mailing list, and that they focus on traffic because it is what they can measure. But of course, if you want to put your DNS servers at the top of the list, that's fine; if you want to put your mail servers there, that's fine too. They are not pushing for any particular implementation.
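The minutes do not describe the RACI algorithm itself, so the following is only a hypothetical sketch of the general idea discussed: order queued updates by measured traffic, while letting the operator pin chosen prefixes (for example DNS or mail servers) to the top. The function name, the traffic metric and the tie-breaking rule are all assumptions.

```python
def order_updates(pending, traffic, pinned=()):
    """Return pending prefixes in the order their updates would be sent.

    Hypothetical sketch, not the presented algorithm.
    pending: prefixes with queued updates
    traffic: dict prefix -> measured traffic (e.g. bytes over some window);
             prefixes with no measurement count as zero
    pinned:  prefixes the operator always wants first, kept in the order given
    """
    pinned_rank = {prefix: i for i, prefix in enumerate(pinned)}

    def sort_key(prefix):
        if prefix in pinned_rank:
            return (0, pinned_rank[prefix], 0)      # operator-chosen first
        return (1, 0, -traffic.get(prefix, 0))      # then busiest first

    # sorted() is stable, so prefixes with equal keys keep their queue order.
    return sorted(pending, key=sort_key)


pending = ["203.0.113.0/24", "192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]
traffic = {"192.0.2.0/24": 500, "198.51.100.0/24": 9000, "2001:db8::/32": 100}
print(order_updates(pending, traffic, pinned=["203.0.113.0/24"]))
```

Keeping the operator's pinned list separate from the traffic ranking reflects Juan's point that traffic is simply what they can measure, not a mandated ordering.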

Peter replied that it's not a generic algorithm and one of the better methods.

Juan thanked Peter.

Geoff commented that he didn't think it was a good idea, adding that there are two kinds of issues that BGP is trying to recover from. One is the normal cut and thrust of updates. If one were to implement BGP diligently, one would use per-prefix MRAI timers, allowing an update to be sent immediately, modulo an MRAI timer applied to the prefix rather than to the session. If you want faster updates, drop the MRAI timer. Why is it 30 seconds? There is no science behind that number; it is arbitrary.

The second issue is one of start-up: you have gone down and you are coming back. If both you and your peer were dead and you are both new, you have to transfer a full routing table. If one of you stayed alive and only the other died, the surviving end could hold on to that session state, and what it was told, for a period of time; it's just memory. What you really want to do when you come back up is say “Hi, it's me again, remember what I told you last time? That still holds, let me resume from where I was”. That's about as fast as you can possibly get. This whole idea of prioritising, of telling the neighbour what the neighbour already knew, is kind of backwards: you are just correcting an amnesia problem at the other end, which, if it chose to, could simply remember it anyway. He thought this was warped and twisted in terms of what Juan was trying to do. What you really should do is take advantage of the fact that a one-ended failure doesn't mean loss of information at the other end: get the other end to hold that information, then come back live and reassemble from that known state.
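The "hold the state and resume" idea is close in spirit to BGP Graceful Restart (RFC 4724), where the surviving router keeps a restarting peer's routes marked as stale, and purges whatever was not re-advertised once the peer signals End-of-RIB. The minutes do not say whether Geoff meant that mechanism; the toy model below (class and method names hypothetical, no timers or address families) only illustrates the retain, refresh and purge steps.

```python
class PeerRib:
    """Routes learned from one peer, with graceful-restart-style stale marking.

    Toy model: routes are prefix -> attributes; no restart timers, one AFI.
    """

    def __init__(self):
        self.routes = {}    # prefix -> attributes
        self.stale = set()  # prefixes retained across the peer's restart

    def peer_went_down(self):
        # Keep the routes (and keep forwarding on them) instead of flushing;
        # just mark them as stale.
        self.stale = set(self.routes)

    def receive_update(self, prefix, attrs):
        # A re-advertised route becomes fresh again.
        self.routes[prefix] = attrs
        self.stale.discard(prefix)

    def receive_end_of_rib(self):
        # The restarted peer has resent everything it still has, so any
        # route still marked stale was not re-advertised: remove it.
        for prefix in self.stale:
            del self.routes[prefix]
        self.stale.clear()
```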

Juan replied that in their experiments they found that Quagga takes 15 seconds (in the best case) between the first update and the last update. They played with the MRAI timer, changed the topology and changed the timer. He added that he thought this was related to processing the number of prefixes they have. They are running experiments to determine MRAI values in the wild, and he looked forward to presenting the results.

Joao added that, before assuming that state was lost, one should check whether it really was lost.

E. More Specific Announcements in BGP - Geoff Huston


The presentation is available at:
https://ripe75.ripe.net/archives/video/153

Owen DeLong, ARIN, suggested that the RIRs are creating plenty of new prefixes by splitting old ones through the transfer process, and that this is why there is no visible slowdown in the BGP growth rate even though there has been a slowdown in the allocation of fresh addresses.

Geoff replied that he had given a presentation in the past in which he analysed all the new prefixes that appear each year and tried to figure out whether they were in the transfer registries or whether something else was going on. Back in 2010, most of each year's new prefixes had been allocated around six months before. Today, most new prefixes were allocated somewhere back in the mists of time, but only 10% of them appear in the transfer logs. He added that either the transfer logs aren't capturing all transfers or he has no idea what is happening. There is a certain amount of churn going on, but the overt transfer logs don't fully describe what is going into the routing table.

Owen then suggested that the reason IPv6 is so much more unstable is that many of the unstable prefixes Geoff is seeing are probably /48s. These probably come from a large number of home users messing with tunnels, and from other people experimenting with BGP and IPv6 who turn it on, do their little lab thing and turn it off when they go home, because they are afraid it will mess up their production network.

Geoff replied that with IPv6, it was common to blame everything that's bad on Teredo.

Owen replied that he was thinking more of tunnel brokers.

Geoff said he would look at it from that angle and look at the prefix size.

Owen added that the reason for more disaggregation in the category 3 announcements in IPv4 compared to IPv6 is probably that a high percentage of them come from end sites, and there are many more end sites with more than one prefix in IPv4 than in IPv6.

Geoff replied that there is a mythology out there that if you announce more specifics, you can't get hijacked; another is that if you advertise a more specific, no one else anywhere on the planet will advertise the same more specific. That is wrong too. He added that he suspects there is a certain amount of "If I do this, nothing bad will happen to me", which is also wrong.

Peter Hessler, Hostserver GmbH, commented that they have noticed a lot of instability in their IPv6 routing updates, and RIPEstat's BGPlay view of their AS is essentially unusable unless they filter out their IPv6 announcements. He said he is the one doing those announcements: he announces only the allocation, and it is static; he hasn't changed it in months. He added that RIPE's data shows some of their neighbours flip-flopping every three or five minutes, which could indicate a problem. Peter asked Geoff whether he could look for such flip-flops, filter them out, see whether v6 is then stable or unstable, and check whether only a certain number of players are involved.

Geoff replied that some time ago he did some work playing updates through a big cache, and found that most updates are things you already heard a few seconds or even a few minutes ago: a lot of updates just repeat information, and you need a microscope to find the genuinely new information. This behaviour is visible all around the world. Geoff wondered why it happens in v6 but not in v4; v6 is not behaving as it should.

Peter agreed, adding that he announces their v4 allocations to the same neighbours and those announcements aren't flapping. He asked what people are doing with v6.

Geoff said that perhaps they had switched on the "I hate v6" setting on the router; he didn't know either.

Jen Linkova, Google, commented that from her operational experience, broken things get fixed when someone complains. She asked if he sees any users in those unstable prefixes.

Geoff replied he can't see users, only the routing table. He doesn't run an operational network so he doesn't see traffic.

Jen asked whether he saw users in his other experiments, because she suspected that the 20% of users on v6 might complain about the v6 network being unstable.

Geoff said that the thing about v6 is that v4 makes it all better. Happy Eyeballs is a testament to that, so it could be that users complained but the problem went away because it happened again and got better.

Jen wondered what the probability was that some of the prefixes Geoff was seeing as Type III were actually Type II or Type I and that he was just not seeing another route.

Geoff said she was right, because BGP masks what is localised, but the real issue was why, in their tools, BGP local policy becomes public noise: why don't more specifics stop doing what they're doing once they cease to be effective?

Bernd Spies, DE-CIX, said he is also confronted with this type of thing. He added that they see more specifics from people who think it's cool to have a larger prefix space, but it just increases their chance of getting some larger peerings. Maybe they can help those people realise that they don't need many prefixes in the routing table to have great networks. He added that people often don't understand what they are doing, so they think they need /24s; perhaps we could explain that this kind of fragmentation is not needed.

Geoff replied that it's always hard to tell a paying customer that you're going to refuse what they're asking of you. He said that the other thing they can do is identify the people who do this because they believe that announcing /24s makes them invulnerable to attack.

Ruediger Volk, Deutsche Telekom, commented that Geoff's analysis of routing vandalism was done with safety belts on, as the aggregate is actually announced as well. He asked whether Geoff still had statistics on the vandalism without the safety belts.

Geoff replied that all /24s look the same. He published it all on the CIDR Report but didn't put it through the update engine because of the CPU load. He added that another test would be to extend it over a period of years.

Ruediger added that in the databases he was seeing really threatening examples of vandalism in v6, and wondered why they didn't publish route objects for 48 Ks of /48s out of their /32.

Peter replied that he created route objects for their /24s in IPv4. He had forgotten what it is called for v6, but there is an attribute that says he's allowed to announce /48s, and it's specifically for DDoS mitigation. He said he normally only announces the allocation, but if he's under attack he needs the route objects to exist so that his neighbours and other entities accept the announcements.

Geoff wondered whether it was the right thing to respond to a DDoS by changing routing prefixes, which forces operators to advertise more and more specifics so that the response matches the DDoS target. He said he understood why Peter was doing it, because he thinks there's no other way, but wondered whether the routing system was the place to apply a Band-Aid. He added that they should talk about it more in the WG.

F. Pseudowires and Control Words – Fixing It Once and for All - Ignas Bagdonas


The presentation is available at:
https://ripe75.ripe.net/archives/video/155

There were no questions.


G. Foreign ROUTE objects in RIPE Database - William Sylvester


The presentation is available at:
https://ripe75.ripe.net/archives/video/156

Shane Kerr, speaking as himself, asked if he meant that he was just moving origin authorisation.

William said yes.

Ruediger asked if this was supposed to close an NWI.

William said yes.

Ruediger asked whether a solution had been published for the NWI. He had seen the email but didn't see a clear proposal in the last call.

William said he'd talk about it with him offline.


H. Internet Routing Health Measurement BoF - Kevin Meynell


The presentation is available at:
https://ripe75.ripe.net/archives/video/157

Ruediger said he strongly objected to the third bullet. The phrasing reads as if everybody working on or interested in the topic should be ignored.

Joao said that the wording was meant more along the lines of the IETF taking a formal, academic approach to designing this, which may be a little detached from operational reality.

Ruediger said it translates into "we abandon the hope that the IETF gets connected to reality".

Kevin replied that they shouldn't get too hung up on the words. He's happy to hear other suggestions.

Peter said the intention of the third bullet wasn't to exclude anyone who has ever thought about BGPsec or RPKI, but to exclude those who are extremely close to it (such as the authors), because emotions are likely to come into play. They are looking for a dispassionate analysis.

Tim Bruijnzeels, RIPE NCC, commented that he was aware of deployment issues around BGPsec, but that it is important to differentiate between origin validation and BGPsec.

Kevin said he took the points on board. He added that it wasn't for him to rewrite it, but it had been circulated in advance to the people who were at the BoF for comment. Kevin said it wasn't set in stone and they can soften the language.

Joao added that an external third-party would have a more objective look.

Z. AOB
