RIPE 26

RIPE Meeting: 26
Working Group: Routing
Status: Final
Revision Number: 2

Please mail comments/suggestions on:

  • content to the Chair of the working group.
  • format to webmaster _at_ ripe _dot_ net.

Report of Meeting, 20th January 1997

A. Administrative Issues
Joachim Schmitz, Chairman, presided and welcomed people to the meeting. There were 95 attendees. Chris Fletcher took minutes.

The draft agenda circulated in the previous week was agreed. A copy is displayed at the end of this summary. There were no additions or changes to the minutes of the RIPE 25 Routing WG session. A short overview of open actions was given.

 

Action 22.10 on Joachim Schmitz
To trigger a discussion on the Routing WG mailing list about which focus to choose for a future tool development project, and to come to consensus on it
This action is still open. It will be pursued after the current meeting.
Action 24.4 on Joachim Schmitz
To investigate the status of the CIDR FAQ and see whether additions are needed, probably by triggering a discussion on the mailing list
According to the result of the investigation, no official additions by the Routing WG are needed. Therefore, the WG agreed to close this action. However, it was felt that further distribution of knowledge about CIDR is needed, and therefore a pointer to the CIDR FAQ location should be included on the Routing WG pages on the RIPE NCC web server.
New Action 26.R1 on the RIPE NCC
To add a link on the RIPE web server from the Routing WG pages to the CIDR FAQ location
B. Hierarchical authorisation for route objects (J.Schmitz)
One major topic of this session was the discussion on hierarchical authorisation for route objects. There had already been some discussion on the mailing list regarding the various issues involved. To continue the discussion, J.Schmitz compiled the current state in a presentation. The transparencies of the presentation are available by ftp. On this basis, several of the open issues were discussed in the WG session.

Last year, improvements to authorisation during interaction with the database were discussed. One of the elements is hierarchical authorisation, and the first implementations of it were done for inetnum objects and domains. Up to now there is no hierarchical authorisation scheme for route objects. Following the same reasoning as for inetnum objects (protecting the objects against unauthorised changes), there is a definite need to apply hierarchical authorisation to route objects as well: route objects in the RIPE database are already used by some ISPs to build configuration elements for their routers. This calls for stronger protection.

Route objects do not stand alone. They have relationships to other objects in the database:

 

  • Relation to the autnum object
    AS numbers constitute routing entities defined in the autnum object. They are closely related to routing and, from a logical point of view, are hierarchically higher than route objects. However, they form a flat space of numbers, and a hierarchy among themselves is difficult to apply. Moreover, autnum objects in the current version of the database do not point to route objects, making it difficult to implement a top-down search mechanism from autnum objects to route objects. Therefore, a hierarchical authorisation scheme starting from the autnum objects seems inapplicable at first sight.
    Yet route objects point to two other objects at the same time, both pointers being mandatory: first, they point to the corresponding autnum object via the "origin" attribute, and in addition a maintainer must be specified. Interestingly, this maintainer need not be the same as the maintainer specified in the autnum object, which allows route objects to be created for an AS using a completely independent maintainer. There should be a way to prevent this by introducing a hierarchical scheme: the proposal is to allow "mnt-lower" attributes within autnum objects, defining which maintainers may create route objects for the AS of the corresponding autnum object.
  • Relation to inetnum objects
    Several people are unhappy that route objects carry no reference to address allocation, because address space and its routing are related. There are proposals to make route objects dependent on inetnum objects. The big advantage of such a scheme is that there would be no routes without allocated address space, which is a very appealing approach. However, it is not at all applicable to pure routing registries, so a dependency of route objects on inetnum objects seems to make the overall handling too complicated. Merely linking address space ownership and route objects might be easier. This may be (and in most cases already is) done through the maintainer object. However, it is again not applicable to pure routing registries. Moreover, ownership and routing often differ, and changes in routing may demand changes in inetnum objects. In general, there are also many more inetnum objects than route objects (because of CIDR), which complicates the relation further. There is also the opinion that the philosophy of the database should be to mirror the real world. In the real world, advertised routes are completely controlled by an AS, making other authorisation irrelevant. No policy should be enforced which does not exist in the real world. Consistency might instead be achieved through notifications that flag discrepancies.
    Obviously, the relationship is difficult and there was not yet any consensus on how to deal with it. The problems might be solved in a unified distributed global registry but there is no immediate solution. More discussion is needed.
  • A prefix based hierarchical scheme
    With inetnum objects in the database, a prefix based hierarchical scheme is used in which shorter prefixes are hierarchically higher than longer ones and control the longer prefixes below them. This scheme could easily be applied to route objects as well. Its big advantage is that a working mechanism already exists. However, there are also several problems with it:
    • To make authorisation work, some starting point in the form of top level route objects must be created in each registry, to prevent anybody from gaining control of the whole tree of route objects. These top level route objects are somewhat artificial because they do not reflect any routing at all. Starting points are especially difficult to apply to the swamp 192.x.
    • If a hierarchical authorisation based on prefix length is enforced, only one route object per IP-range would be allowed. This is different from how the database is used today. It will cause difficulties in several cases:
      • Multihoming today is expressed by several route objects of differing origin. If only one route object is allowed multihoming could not be expressed in the database. This again might be circumvented by introducing multiple origins per route object which again raises the question of authorisation by maintainers.
      • Relatively often, specific IP-subranges are routed by a different AS than the enclosing IP-range. If the maintainer of the IP-range allows handling of IP-subranges in general (by specifying a corresponding "mnt-lower" attribute), any IP-subrange may be captured by the maintainers given in the mnt-lower attribute. This may not be intended. A possible solution might be to extend the usage of the "hole" attribute, which is already an optional attribute of the route object: holes could be excluded from hierarchical authorisation down to the next longer prefix specified.
      • If a customer changes from one ISP to another, the origin of the corresponding route object changes, too. To facilitate the change, both ISPs currently keep a route object for the same IP range, each with their own AS as origin, for a few weeks. This is especially important if they generate elements of their router configurations from route objects in the database. The problem is similar to multihoming, but with the focus on the transition between maintainers.
    • Even though route objects could be secured by hierarchical authorisation in one registry they are not necessarily protected in another registry because data in different registries do not depend on each other. As a consequence duplication not only of the data but also of the specific hierarchy is indispensable.
    Clearly, enforcing a prefix based hierarchical authorisation causes problems which cannot be solved within a short time.
  • A temporary suggestion
    The prefix based scheme is very valuable and should be applied in some form. A temporary suggestion is to apply it without enforcing it, and only to notify. The advantages are that it is built upon a working mechanism and that little changes. A starting point in the form of top level route objects is not needed, and duplication in several routing registries has no consequences. Several route objects of differing origin per IP range are still allowed, so conflicts with current practice in using the database do not occur. Yet a simple notification does not resolve conflicts. The "owner" cannot remove conflicting records, but a conflict involves two parties and both must resolve the problem together. The notification does not solve the problem in itself, but it flags that the problem exists and that a solution is needed.
    However, objects in one registry remain insecure if hierarchical authorisation is applied in another, and in the end this is no real solution. Nonetheless, it is a slight improvement on the current situation, and it is upwards compatible because notification is also needed if authorisation is enforced in the future.
    There was a general consensus that pure notification should be applied as a temporary solution. Later, on the way to enforcing authorisation, some kind of approval mechanism could probably be installed for errors that come from insufficient authorisation for the action requested. With notification several new questions arise:
    • To keep notification traffic low it might be useful to notify only if it is requested by an object hierarchically higher. Currently, in the database notification only occurs if requested. Because of the importance of route objects one might choose to notify of overlapping routes in all cases.
    • It might be useful to trigger notification by route objects only. To take the importance of inetnum objects into account one might think of notifications in cases of inetnum objects of overlapping address space.
    • It might be useful to notify the creator of route objects of the other notifications for coordination purposes.
    • Up to now notification is done only for the creation of objects, not for changes or deletions. To prevent floods of mail this should be kept.
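
The notify-only variant of the prefix based scheme can be sketched in a few lines. The following is an illustration under stated assumptions, not the database implementation: the RouteObject record, its field names, and the rule "mail the contact of every covering route object that requested notification and has a different maintainer" are hypothetical choices made for this example, and the prefixes, AS numbers and addresses are placeholders.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteObject:
    """Hypothetical, simplified stand-in for a registry route object."""
    prefix: str        # e.g. "10.1.0.0/16"
    origin: str        # originating AS, e.g. "AS64501"
    mnt_by: str        # maintainer of this route object
    notify: str = ""   # contact to mail; empty means no notification requested


def covering_routes(new, registry):
    """Return existing route objects whose prefix equals or covers new.prefix."""
    new_net = ipaddress.ip_network(new.prefix)
    covering = []
    for existing in registry:
        net = ipaddress.ip_network(existing.prefix)
        # subnet_of() requires both networks to share the same IP version.
        if net.version == new_net.version and new_net.subnet_of(net):
            covering.append(existing)
    return covering


def notifications_for(new, registry):
    """Notify-only policy: creation is never refused; return contacts to mail."""
    return sorted({
        r.notify
        for r in covering_routes(new, registry)
        if r.notify and r.mnt_by != new.mnt_by
    })


registry = [
    RouteObject("10.0.0.0/8", "AS64500", "MNT-A", "a@example.net"),
    RouteObject("10.1.0.0/16", "AS64501", "MNT-B", "b@example.net"),
    RouteObject("172.16.0.0/12", "AS64502", "MNT-D", "d@example.net"),
]
new_route = RouteObject("10.1.2.0/24", "AS64503", "MNT-C")
print(notifications_for(new_route, registry))
```

Because nothing is rejected, a second route object for the same range with a different origin (multihoming, or a customer moving between ISPs) is still accepted; it merely produces mail to the maintainers of the covering objects.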

In general, several issues could be clarified but there remains quite a lot to do. The items where consensus was found should be implemented soon and discussion on the other items (still the majority) must continue.

 

New Action 26.R2 on Joachim Schmitz
To trigger database implementation of first discussion results from hierarchical authorization for route objects
New Action 26.R3 on Joachim Schmitz
To finalize the hierarchical authorization for route objects together with the Routing WG
C. Report on route aggregation by the RIPE NCC (D.Karrenberg, RIPE NCC)
The report on route aggregation could not be given because the statistics mechanism was offline for two months. Therefore, nothing new can be reported. DK was asked to fix it and to report again at the next RIPE meeting.

 

Action 25.R1 on Daniel Karrenberg/RIPE NCC
To report on the results from the route aggregation analysis at the next RIPE meeting

There is other data available from SURFnet showing growth of the Internet:

 

New Action 26.R4 on Eric-Jan Bos
To circulate the URL of his analysis of routing table size on the mailing list

This has already been done - see ftp://ftp.surfnet.nl/surfnet/net-management/ip/nets.ps - it contains data on the growth of the global routing table since 1 January 1994.
D. New Developments of RATools (D.Kessens, ISI)
The RATools, part of the Routing Arbiter Project of Merit and ISI, are a valuable means to make use of registry data and to compare it to the real world. David Kessens (formerly RIPE NCC, now at ISI) gave an overview of new developments in versions 3.4.x and 3.5.1 of the RAToolSet. There were noticeable enhancements to RtConfig, a tool which allows automatic generation of router configuration elements from registry data. Moreover, the aoe (autnum object editor) is now in production. The overview of the RAToolSet news is available as ftp://ftp.ripe.net/ripe/presentations/ripe-m26-davidk-ratools.ps.gz and contains information about where to find the software (WWW, ftp).

In a second presentation David Kessens introduced the aoe (autnum object editor). With this new tool, autnum objects can be generated automatically for registries. Data on neighbors and policies can be taken from existing databases, from real-life BGP dumps, or entered manually, including heuristics. It has a user-friendly interface including a GUI (based on X11 and Tcl/Tk) and on-line help, and it makes updates to IRRs easy. The functionality of aoe was explained with various examples, and it turns out that aoe can make work with autnum objects much easier. In the future, RPSL will also be supported (up to now aoe generates RIPE-181++ syntax), cooperation with other tools will be implemented, and more and better heuristic methods will be included. The aoe shares the requirements of the RAToolSet (of which it is part), which in version 3.5.1 are:

  • gcc 2.7.2 or later
  • libg++ 2.7.2 or later
  • Tcl 7.5/Tk 4.1 or later
The presentation of aoe is available as ftp://ftp.ripe.net/ripe/presentations/ripe-m26-davidk-aoe.ps.gz.

During the RIPE meeting a test installation of the RAToolSet was accessible to the participants.

E. Report on routing stability (G.Winters, Merit)
Since routing stability has become a major issue in the Internet, it was very interesting to have a presentation of recent measurements done by Merit. Gerald Winters of Merit showed results from Craig Labovitz, which Craig had presented at the May '96 NANOG meeting, supplemented by some newer studies. The presentation of Craig Labovitz is available as ftp://home.merit.edu/pub/users/labovit/talks/nanog-9605/instability.ps and further information may be found at http://nic.merit.edu/~ipma/.

The presentation was very interesting and caused a lot of discussion. Topics from the presentation and the discussion are summarised below. There are peaks of close to 10 million BGP announcements and withdrawals in a given day, with more than 100 BGP updates per second. Major providers are not major causes of instability, while individual ISPs and end-sites can have a disproportionate effect on routing stability. Interestingly, BGP traffic varies between weekdays and weekends and even with the time of day. A correlation with the amount of traffic is speculative; however, there might be an indication of some correlation with maintenance work. Up to now there has been no analysis of whether long holidays show in the statistics as well.
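
As a quick consistency check on the two peak figures quoted above, the daily total can be converted to an average rate. This is only arithmetic on the numbers in the minutes; whether the "per second" figure was meant as a daily average or as a short-term peak is not stated, so the function name and the interpretation are assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_updates_per_second(updates_per_day):
    """Average rate implied by a daily total, assuming an even spread over the day."""
    return updates_per_day / SECONDS_PER_DAY


# Close to 10 million announcements and withdrawals on a peak day.
print(round(average_updates_per_second(10_000_000), 1))  # 115.7
```

An even spread over the day already gives more than 100 updates per second, so the two figures are consistent with each other.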

For really big instability incidents (abnormal events), Merit staff called the originators of the instability to find out the reason. From a relatively small sample of 36 incidents it turned out that

  • links are nearly no problem
  • hardware is rarely a problem (but approx 3 times as often as link problems)
  • software and configuration problems cause more than 50% of BGP updates
However, this does not give much indication on the reasons for the general instability below major incidents.

A graph showing the number of BGP announcements (normalized to the number of routes) seemed to indicate that instability is growing, because a linear regression of the data showed an increase. However, in the discussion it was noted that there was a very large variance in the samples taken, so a linear regression may not be justified and may therefore be misleading. Moreover, the growth in complexity of the Internet may introduce another effect: more routes are seen because of an increasing number of peerings. Therefore, merely normalising the number of BGP announcements to the number of routes may not be sufficient. In the end (because the increase was not very steep), there might be no indication of growing instability at all.

A more elaborate analysis of the BGP announcements shows that most of the BGP updates are redundant or unnecessary, with a large percentage of duplicates. 99% of the BGP traffic is withdrawals. One reason for this might be that withdrawals are always sent to all peers and accepted there regardless of outgoing or incoming filters. It is suspected that this behaviour is actually wanted because, especially if a new filter is set up, previously valid routes should be withdrawn anyway.
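
One way to make "redundant" concrete is to replay an update stream against per-peer state and count updates that change nothing. The sketch below is an illustration only: the tuple layout, the category names and the classification rule are assumptions made for this example and are not taken from the Merit study.

```python
from collections import Counter


def classify_updates(updates):
    """Count useful and redundant updates in a stream.

    updates: iterable of (peer, prefix, kind, attrs), where kind is "A"
    (announcement) or "W" (withdrawal) and attrs is any hashable summary
    of the path attributes (None for withdrawals).
    """
    announced = {}      # (peer, prefix) -> attrs currently announced
    counts = Counter()
    for peer, prefix, kind, attrs in updates:
        key = (peer, prefix)
        if kind == "W":
            if key in announced:
                del announced[key]
                counts["withdrawal"] += 1
            else:
                # Withdrawing something this peer never announced (or already withdrew).
                counts["redundant_withdrawal"] += 1
        else:
            if key in announced and announced[key] == attrs:
                # Same prefix, same attributes: nothing changed.
                counts["duplicate_announcement"] += 1
            else:
                counts["announcement"] += 1
            announced[key] = attrs
    return counts


stream = [
    ("peer1", "10.0.0.0/8", "A", ("AS64500",)),
    ("peer1", "10.0.0.0/8", "A", ("AS64500",)),
    ("peer1", "10.0.0.0/8", "W", None),
    ("peer1", "10.0.0.0/8", "W", None),
    ("peer2", "10.0.0.0/8", "W", None),
]
print(classify_updates(stream))
```

In this placeholder stream only two of the five updates carry new information; the remaining three are a duplicate announcement and two withdrawals for routes that were not announced at the time.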

Another interesting analysis showed the frequency of BGP updates for the same prefix and origin. There were pronounced peaks at 30 second intervals. A close relation to IGP updates is suspected. However, in the discussion it was also mentioned that with Cisco routers the default keepalive on lines is 10 seconds and the line protocol goes down after three missed keepalives, which is 30 seconds. If immediate BGP update is configured (which is the default), the corresponding routes are then immediately withdrawn. Evidently, there are other possible sources for this specific frequency, and a more detailed analysis should be performed.
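
The kind of analysis described here, looking at the time between successive updates for the same prefix and origin, can be sketched as a simple histogram. The event layout (timestamp in seconds, prefix, origin), the function name and the 10 second bucket width are assumptions for illustration; the sample events are made up and are not measurement data.

```python
from collections import Counter


def interval_histogram(events, bucket_seconds=10):
    """Histogram of gaps between consecutive updates per (prefix, origin).

    events: iterable of (timestamp_seconds, prefix, origin).
    Returns a Counter mapping the start of each bucket (in seconds)
    to the number of gaps that fall into it.
    """
    last_seen = {}
    histogram = Counter()
    for timestamp, prefix, origin in sorted(events):
        key = (prefix, origin)
        if key in last_seen:
            gap = timestamp - last_seen[key]
            bucket = int(gap // bucket_seconds) * bucket_seconds
            histogram[bucket] += 1
        last_seen[key] = timestamp
    return histogram


events = [
    (0, "10.1.0.0/16", "AS64500"),
    (30, "10.1.0.0/16", "AS64500"),
    (60, "10.1.0.0/16", "AS64500"),
    (75, "10.1.0.0/16", "AS64500"),
    (5, "10.2.0.0/16", "AS64501"),
    (35, "10.2.0.0/16", "AS64501"),
]
print(sorted(interval_histogram(events).items()))
```

A pronounced peak in the 30 second bucket would be compatible both with a periodic timer and with the keepalive explanation given in the discussion (3 missed keepalives of 10 seconds each), so a histogram alone cannot distinguish between the two.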

It was a bit disappointing to have no statistics on the instability depending on the prefix length. With all the data collected an analysis like this should be possible.

Recommendations to improve routing stability were:

  • to use BGP route flap dampening
  • to aggregate as much as possible using CIDR
  • to filter
As seen at previous RIPE meetings, route flap dampening is an important topic. A BOF on this topic was announced (see below).
Y. General input from other WGs
There was no current input from other WGs.
Z. AOB
Christian Panigl announced that a BOF session on route flap dampening was planned. The minutes of this session are included here.

Route Flap Dampening BOF, RIPE 26, 22.1.97, 14:00

 

Chairman: Christian Panigl (CP)
Scribe: Joachim Schmitz (JS)
Attendees: approx 30

In the Routing WG session Christian Panigl asked whether people were interested in participating in a BOF on route flap dampening. The BOF session was held after the plenary session of the RIPE meeting on Wednesday.

CP experienced quite severe reachability problems for customer networks because route flap dampening became active at various AS borders following scheduled maintenance on a core router. Had the default dampening parameters been used everywhere, it would not have hurt that much, because dampening would have lasted only about 20-30 minutes for all prefixes.

Some backbone ISPs, however, have started to implement "progressive route flap dampening" typically using different parameters. The common effect is that longer prefixes are dampened more aggressively than shorter prefixes.

In the observed case all /24 customer networks were cut off from parts of the Internet for more than 2 hours and were no longer able to reach, for instance, the root nameservers. Incidentally, many nameservers, even top- and second-level ones, sit in /24 (192/TWD) prefixes themselves and could easily be "victims" of such a progressive dampening policy! This also applies to PI address space and multi-homed site prefixes.

CP was not condemning route flap dampening itself, but the aggressiveness of some of the implemented "progressive" parameters, and he questioned whether progressive dampening is really useful at all.
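
To see why the choice of parameters matters so much, the suppression time can be sketched with the usual exponential-decay model of dampening. The numbers below (a half-life of 15 minutes and a reuse limit of 750) are assumptions for illustration, chosen because they are commonly quoted as vendor defaults; they do not come from the minutes, and real implementations also cap the suppression time, which this sketch ignores.

```python
import math

# Assumed parameters, not taken from the minutes.
REUSE_LIMIT = 750.0     # route is re-advertised once the penalty decays below this
HALF_LIFE_MIN = 15.0    # minutes for the penalty to halve


def decayed_penalty(penalty, minutes, half_life_min=HALF_LIFE_MIN):
    """Penalty remaining after the given number of flap-free minutes."""
    return penalty * 0.5 ** (minutes / half_life_min)


def minutes_until_reuse(penalty, reuse_limit=REUSE_LIMIT, half_life_min=HALF_LIFE_MIN):
    """Flap-free minutes needed until the penalty decays to the reuse limit."""
    if penalty <= reuse_limit:
        return 0.0
    return half_life_min * math.log2(penalty / reuse_limit)


# Example accumulated penalties for a route that has just stopped flapping.
for penalty in (2000.0, 3000.0):
    print(penalty, round(minutes_until_reuse(penalty), 1))
```

Under these assumed values the two example penalties decay to the reuse limit after roughly 21 and 30 minutes, the same order of magnitude as the 20-30 minutes mentioned above. The time scales linearly with the half-life: with a hypothetical half-life of 60 minutes, minutes_until_reuse(3000.0, half_life_min=60.0) gives 120 minutes, which shows how a more aggressive setting for long prefixes can stretch an outage to hours.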

Following CP's introduction, a lively discussion on route flap dampening followed:

 

  • Does flapping really depend on the prefix length?
    • To the knowledge of people attending the BOF session no measurements exist. Although several items were already measured by Merit on the stability of routes (as seen in the presentation by G.Winters in the Routing WG) they did not include a stability analysis with regard to the prefix length. If flapping does not necessarily depend on the prefix length longer prefixes should not be punished by more aggressive dampening.
    • However, the number of longer prefixes in the routing tables is much bigger than the number of shorter ones. As a consequence, if the percentage of flapping routes is the same for all prefix lengths, the absolute number of flaps will definitely be higher for longer prefixes. As each flap consumes the same router resources (regardless of the prefix length), longer prefixes should be dampened more aggressively to get the best CPU saving factor.
    • Further justification for the latter was primarily based on the assumption that longer prefixes serve fewer users, which did not go unchallenged (e.g. think of important servers sitting in a /24).
  • Which networks or prefixes are "important"?
    • Stating that shorter prefixes are more important because they cover more users doesn't hold in general. On the one hand this may be valuable and motivate ISPs to CIDRize and customers to renumber, on the other hand it may lead to the situation that organisations try to get (or keep, think of Class A/B recycling) as short a prefix as possible, wasting address space without having to care for stability. In this case instability would be moved to shorter prefixes which is far from desirable.
    • Long prefixes need not be unstable. There are discussions about using long prefix routes ("golden networks") for root nameservers or for other Internet infrastructure servers (even for application servers such as news). It can be assumed that these routes are more stable than others, and they must not be dampened too aggressively, so as not to impair the functionality of the Internet itself.

Throughout the discussion the general consensus was clear: for routers with large BGP tables (notably with full routing) the CPU load caused by flapping would kill any existing router. To survive instabilities, route flap dampening should be applied by everybody. However, it was obvious that dampening parameters need to be coordinated throughout the Internet in order to
  • allow efficient dampening and easy clearing after repair
  • dampen flaps at their source by keeping them from spreading in the network
This will significantly increase overall stability and manageability.

The group split into two major camps with regard to how dampening should be done:

  • progressive dampening: needs to be accompanied by means to explicitly exclude "golden networks" from "hostility acts"
  • flat dampening: because it's very hard to make a distinction between less and more "important", not to say "golden" networks, all prefixes could be treated equally.
Nonetheless, efforts should be focussed on the propagation of equal dampening throughout the Internet.

The default values for dampening parameters as found in Cisco routers are based upon some experiments approximately one year ago. These experiments led to recommendations by the IETF last year.

Nevertheless, some ISPs have moved away from the default values and are using their own parameters. Because of the urgent need to coordinate these values, CP will try to collect related recommendations and the outcome of similar discussions. This is an activity of the RIPE Routing WG, so everybody who is aware of related efforts (IETF, NANOG, ...) should come back to the Routing WG list with hints and pointers!

 

New Action 26.R5 on Christian Panigl
To collect reasonable route flap dampening parameter values and to present them at the next RIPE meeting in the Routing WG

Further reading:
ftp://ftp.ripe.net/ripe/minutes/ripe-m-24.ps
ftp://ftp.ripe.net/ripe/minutes/ripe-m-25.ps
http://www.ripe.net/wg/routing/r25-routing.html
ftp://ftp.ripe.net/ripe/presentations/ripe-m25-tbarber-bgp-damp.html

 

Agenda

- Routing Working Group Meeting -

Agenda for RIPE-26, Jan 1997, Amsterdam

 

A. Administrative issues (J.Schmitz)
  • volunteering of the scribe
  • agenda bashing
  • minutes from last meeting
  • open actions
B. Hierarchical authorisation for route objects (J.Schmitz)
  • current state
  • discussion
C. Report on route aggregation by the RIPE NCC (D.Karrenberg, RIPE NCC)
  • in general
  • route aggregation
D. New Developments of RATools (D.Kessens, ISI)
  • new tools
  • aoe
E. Report on routing stability (G.Winters, Merit)
  • measurements by Merit
  • route flap dampening
Y. General input from other WGs

 

Z. AOB

 

Summary of Actions

Action 22.10 on Joachim Schmitz
To trigger a discussion on the Routing WG mailing list about which focus to choose for a future tool development project, and to come to consensus on it
Action 25.R1 on Daniel Karrenberg/RIPE NCC
To report on the results from the route aggregation analysis at the next RIPE meeting
Action 26.R1 on the RIPE NCC
To add a link on the RIPE web server from the Routing WG pages to the CIDR FAQ location
Action 26.R2 on Joachim Schmitz
To trigger database implementation of first discussion results from hierarchical authorization for route objects
Action 26.R3 on Joachim Schmitz
To finalize the hierarchical authorization for route objects together with the Routing WG
Action 26.R4 on Eric-Jan Bos
To circulate the URL of his analysis of routing table size on the mailing list
Action 26.R5 on Christian Panigl
To collect reasonable route flap dampening parameter values and to present them at the next RIPE meeting in the Routing WG

Chris Fletcher, Joachim Schmitz, Christian Panigl
21/5/97