Routing WG minutes from RIPE 26
| RIPE Meeting: | 26 |
| Working Group: | Routing |
| Status: | Final |
| Revision Number: | 2 |
Please mail comments/suggestions on:
Report of Meeting, 20th January 1997
- A. Administrative Issues
- Joachim Schmitz, Chairman, presided and welcomed people to the meeting.
There were 95 attendees. Chris Fletcher took minutes.
The draft agenda circulated in the previous week was agreed. A copy is displayed at the end of
this summary. There were no additions or changes to the minutes of the RIPE 25
Routing WG session. A short overview of open actions was given.
- Action 22.10 on Joachim Schmitz
- To trigger a discussion on the Routing WG mailing list about which focus to
choose for a future tool development project, and to reach consensus on it
This action is still open. It will be pursued after the current meeting.
- Action 24.4 on Joachim Schmitz
- To investigate the status of the CIDR FAQ and see whether additions are
needed, probably by triggering a discussion on the mailing list
According to the result of the investigation no official additions by the
Routing WG are needed. Therefore, the WG agreed to close this action. However,
it was felt that further distribution of knowledge about CIDR is needed and
therefore a pointer to the CIDR FAQ location should be included on the Routing
WG pages at the RIPE NCC web server.
- New Action 26.R1 on the RIPE NCC
- To add a link on the RIPE web server from the Routing WG pages to the CIDR
FAQ location
- B. Hierarchical authorisation for route objects (J.Schmitz)
- One major topic of this session was the discussion on hierarchical
authorisation for route objects. There was already some discussion on the
mailing list regarding various issues involved. To continue with the discussion
J.Schmitz compiled the current state in a presentation. The transparencies of
the presentation are available by ftp. Based upon this, several of the open
issues were discussed in the WG session.
Last year improvements to authorisation during the interaction with
the database were discussed. One of the elements is hierarchical
authorisation and the first implementations of it were done for inetnum objects
and domains. Up to now there is no hierarchical authorization scheme for route
objects. Following the same reasoning as for inetnum objects - to protect the
objects against unauthorized changes - there definitely is a need to apply
hierarchical authorisation also to route objects; route objects in the RIPE
database are already used by some ISPs to build configuration elements for
their routers. Obviously, this calls for stronger protection.
The route objects do not stand alone. They have relationships to other
objects in the database:
- Relation to the autnum object
AS numbers constitute routing entities defined in the autnum object. They are
closely related to routing and from a logical point of view are hierarchically
higher than route objects. However, they form a flat space of numbers and a
hierarchy among themselves is difficult to apply. Moreover, autnum objects in
the current version of the database do not point to route objects making it
difficult to implement a top-to-down search mechanism from autnum objects to
route objects. Therefore, some hierarchical authorisation scheme starting from
the autnum objects seems to be inapplicable at first sight.
Yet, route objects point to two other objects at the same time, both pointers
being mandatory: first they point to corresponding autnum objects via the
"origin" attribute, and in addition a maintainer must be specified.
Interestingly enough, this maintainer need not be the same as the maintainer
specified in the autnum object, allowing creation of route objects for some AS
using a completely independent maintainer. Obviously, it should be
possible to prevent this by introducing a hierarchical scheme: the proposal is to
allow "mnt-lower" attributes within autnum objects defining which
maintainers may create route objects for the AS of the corresponding autnum
object.
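The proposed check can be sketched as follows. This is only an illustration, not the database software: objects are modelled as plain Python dictionaries, the function name `may_create_route` and all maintainer names are hypothetical, and two rules are assumptions made for the sketch (an autnum object without "mnt-lower" leaves creation unrestricted, and a route whose origin AS is not registered is refused).

```python
def may_create_route(route, autnums):
    """Return True if the route's maintainer is permitted by the origin AS.

    Assumption: an autnum object without "mnt-lower" does not restrict
    who may create route objects for that AS.
    """
    autnum = autnums.get(route["origin"])
    if autnum is None:
        return False  # assumption: no route objects for an unregistered AS
    allowed = autnum.get("mnt-lower")
    if not allowed:
        return True
    return route["mnt-by"] in allowed


# Hypothetical registry content (AS numbers from the documentation range).
autnums = {
    "AS64500": {"mnt-by": "EXAMPLE-MNT",
                "mnt-lower": ["EXAMPLE-MNT", "CUSTOMER-MNT"]},
    "AS64501": {"mnt-by": "OTHER-MNT"},
}

new_route = {"route": "192.0.2.0/24", "origin": "AS64500",
             "mnt-by": "STRANGER-MNT"}
print(may_create_route(new_route, autnums))
```

With this data the creation is refused, because STRANGER-MNT is not listed in the "mnt-lower" attribute of AS64500.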
- Relation to inetnum objects
Several people are not happy about the fact that there is no reference to
address allocation for route objects because address space and its routing are
somehow related. There are proposals to make route objects dependent on inetnum
objects. The big advantage of such a scheme is that there will be no routes
without allocation of address space which is a very appealing approach!
However, it is not at all applicable to pure routing registries. Therefore, a
dependency of route objects on inetnum objects seems to make the overall
handling too complicated. Establishing a mere combination of address space
ownership and route objects might be easier. This may be (and in most cases
already is) done somehow by the maintainer object. However, it is again not
applicable to pure routing registries. Moreover, ownership and routing often
differ, and changes in routing may demand changes in inetnum objects. In
general, there are also many more inetnum objects than route objects (because
of CIDR) which makes the relation again more complicated. There also is the
opinion that the philosophy of the database should be to mirror the real world.
In the real world advertised routes are completely controlled by an AS making
other authorisation irrelevant. No policy should be enforced which does not
exist in the real world. Maybe, consistency can be achieved through
notification to flag discrepancies.
Obviously, the relationship is difficult and there was not yet any consensus on
how to deal with it. The problems might be solved in a unified distributed
global registry but there is no immediate solution. More discussion is needed.
- A prefix based hierarchical scheme
With inetnum objects in the database a prefix based hierarchical scheme is used
making shorter prefixes hierarchically higher than longer ones, controlling
longer prefixes below them. This scheme could also easily be applied to route
objects. Its big advantage is that already a working mechanism exists. However,
there are also several problems in it:
- To make authorisation work, some starting point in the form of top level route
objects must be created in each registry to prevent anybody from gaining
control of the whole tree of route objects. These top level route objects
are somewhat artificial because they do not reflect any routing at all. Starting
points are especially difficult to apply to the swamp 192.x.
- If a hierarchical authorisation based on prefix length is enforced
only one route object per IP-range would be allowed. This is different from
what can be found in usage of the database today. It will cause difficulties in
several cases:
- Multihoming today is expressed by several route objects of differing
origin. If only one route object is allowed multihoming could not be expressed
in the database. This again might be circumvented by introducing multiple
origins per route object which again raises the question of authorisation by
maintainers.
- Relatively often specific IP-subranges are routed by another AS than a
given IP-range. If the maintainer of the IP-range allows handling of
IP-subranges in general (by specifying a corresponding "mnt-lower"
attribute), any IP-subrange may be captured by the maintainers given in the
mnt-lower attribute. This may not be intended. A possible solution might be in
extending the usage of the "hole" attribute being already an optional
attribute of the route object: holes could be excluded from hierarchical
authorisation down to the next longer prefix specified.
- If a customer changes from one ISP to another the origin of the
corresponding route object changes, too. To facilitate the change, currently
for a few weeks both ISPs keep a route object of the same IP range with their
AS as origin. This is especially important if they generate elements of their
router configurations from route objects in the database. The problem is
similar to multihoming however with the focus on transition of maintainers.
- Even though route objects could be secured by hierarchical authorisation in
one registry they are not necessarily protected in another registry because
data in different registries do not depend on each other. As a consequence
duplication not only of the data but also of the specific hierarchy is
indispensable.
Obviously, enforcement of a prefix based hierarchical authorisation causes
problems which cannot be solved within a short time.
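A minimal sketch of the prefix based scheme, including the suggested use of the "hole" attribute, might look like this. It uses Python's standard `ipaddress` module; the function names, the dictionary layout and the maintainer names are hypothetical, and the rule that a prefix inside a hole escapes the parent's "mnt-lower" is only one possible reading of the suggestion above.

```python
import ipaddress


def closest_less_specific(prefix, routes):
    """Return the longest registered route that strictly covers `prefix`."""
    net = ipaddress.ip_network(prefix)
    covering = []
    for obj in routes:
        parent = ipaddress.ip_network(obj["route"])
        if parent.prefixlen < net.prefixlen and net.subnet_of(parent):
            covering.append(obj)
    return max(covering,
               key=lambda o: ipaddress.ip_network(o["route"]).prefixlen,
               default=None)


def authorised(prefix, maintainer, routes):
    """Decide whether `maintainer` may register a route object for `prefix`."""
    parent = closest_less_specific(prefix, routes)
    if parent is None:
        return True  # no parent at all: the "starting point" problem
    net = ipaddress.ip_network(prefix)
    for hole in parent.get("hole", []):
        if net.subnet_of(ipaddress.ip_network(hole)):
            return True  # assumption: holes are exempt from the parent
    allowed = parent.get("mnt-lower")
    if allowed is None:
        return True  # assumption: no mnt-lower means no restriction
    return maintainer in allowed


# Hypothetical route objects using documentation address space.
routes = [
    {"route": "198.51.0.0/16", "mnt-lower": ["RIR-MNT"]},
    {"route": "198.51.100.0/24", "mnt-lower": ["ISP-MNT"],
     "hole": ["198.51.100.128/25"]},
]

print(authorised("198.51.100.0/26", "OTHER-MNT", routes))
print(authorised("198.51.100.128/26", "OTHER-MNT", routes))
```

The first call is refused by the /24 parent, while the second succeeds because the prefix lies inside the declared hole. The `parent is None` branch shows why top level objects would be needed: without them anybody could register the first, shortest prefix.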
- A temporary suggestion
The prefix based scheme is very valuable and should be applied somehow. A
temporary suggestion is to apply it but not to enforce it, just to notify. The
advantages are that it is built upon a working mechanism and nothing much
changes. A starting point in form of top level route objects is not needed and
duplication in several routing registries has no consequences. Still several
route objects of differing origin per IP range are allowed and conflicts with
current practice in using the database do not occur. Yet, a simple notification
does not solve conflicts. The "owner" cannot remove conflicting
records - but with conflicts two parties exist and both must resolve the
problem together. The notification does not solve the problem in itself but it
will flag it to be there and that a solution is needed.
However, objects in one registry remain insecure if hierarchical authorisation
is applied in another and in the end this is no real solution. Nonetheless,
this is a slight improvement to the current situation which is upwards
compatible because notification is also needed if authorisation is enforced in
the future.
There was a general consensus that pure notification as a temporary solution
should be applied. Later, on the way to enforcing authorisation, some kind of
approval mechanism could probably be installed if errors occur which come from
insufficient authorisation for the action requested. With notification several
new questions arise:
- To keep notification traffic low it might be useful to notify only if it is
requested by an object hierarchically higher. Currently, in the database
notification only occurs if requested. Because of the importance of route
objects one might choose to notify of overlapping routes in all cases.
- It might be useful to trigger notification by route objects only. To take
the importance of inetnum objects into account one might think of notifications
in cases of inetnum objects of overlapping address space.
- It might be useful to notify the creator of route objects of the other
notifications for coordination purposes.
- Up to now notification is done only for the creation of objects, not for
changes or deletions. To prevent floods of mail this should be kept.
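The notification-only variant can be sketched in a few lines. This is an illustration under stated assumptions, not a description of the database software: route objects are plain dictionaries, the function name and mail addresses are hypothetical, only creation triggers mail, and only objects that requested notification (here via a "notify" list) receive it.

```python
import ipaddress


def notifications_on_create(new_route, existing):
    """Return (recipient, message) pairs; the creation is never refused."""
    new_net = ipaddress.ip_network(new_route["route"])
    mails = []
    for obj in existing:
        if not new_net.overlaps(ipaddress.ip_network(obj["route"])):
            continue
        message = "{} ({}) overlaps {} ({})".format(
            new_route["route"], new_route["origin"],
            obj["route"], obj["origin"])
        # Only objects that asked for notification are informed.
        for recipient in obj.get("notify", []):
            mails.append((recipient, message))
    return mails


# Hypothetical registry content.
existing = [
    {"route": "198.51.100.0/24", "origin": "AS64500",
     "notify": ["noc@example.net"]},
    {"route": "198.51.0.0/16", "origin": "AS64502"},
    {"route": "203.0.113.0/24", "origin": "AS64501",
     "notify": ["ops@example.org"]},
]

new_route = {"route": "198.51.100.0/25", "origin": "AS64510"}
for recipient, message in notifications_on_create(new_route, existing):
    print(recipient, "-", message)
```

Notifying in all cases, as discussed in the first item above, would amount to replacing the "notify" lookup with the contact of every overlapping object.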
In general, several issues could be clarified but there remains quite a lot to
do. The items where consensus was found should be implemented soon and
discussion on the other items (still the majority) must continue.
- New Action 26.R2 on Joachim Schmitz
- To trigger database implementation of first discussion results from
hierarchical authorization for route objects
- New Action 26.R3 on Joachim Schmitz
- To finalize the hierarchical authorization for route objects together with
the Routing WG
- C. Report on route aggregation by the RIPE NCC (D.Karrenberg, RIPE
NCC)
- The report on route aggregation could not be given because the statistics
mechanism was offline for two months. Therefore, nothing new can be reported.
DK was asked to fix it and to report again at the next RIPE meeting.
- Action 25.R1 on Daniel Karrenberg/RIPE NCC
- To report on the results from the route aggregation analysis at the next
RIPE meeting
There is other data available from SURFnet showing growth of the Internet:
- New Action 26.R4 on Eric-Jan Bos
- To circulate the URL of his analysis of routing table size on the mailing
list
This has already been done; see
ftp://ftp.surfnet.nl/surfnet/net-management/ip/nets.ps
which contains data on the growth of the global routing table since 1 January
1994.
- D. New Developments of RATools (D.Kessens, ISI)
- The RATools as part of the Routing Arbiter Project of Merit and ISI are a
valuable means to make use of registry data and to compare it to the real
world. David Kessens (formerly RIPE NCC, now at ISI) gave an overview of new
developments in version 3.4.x and 3.5.1 of the RAToolSet. There were noticeable
enhancements for RtConfig, a tool which allows automatic generation of router
configuration elements from registry data. Moreover, the aoe (autnum object
editor) is now in production. The overview of the RAToolSet news is available
as
ftp://ftp.ripe.net/ripe/presentations/ripe-m26-davidk-ratools.ps.gz
and contains information about where to find the software (WWW, ftp).
In a second presentation David Kessens introduced the aoe (autnum object editor).
With this new tool autnum objects can be automatically generated for
registries. Data of neighbors and for the policies can be taken from existing
databases, from real life BGP dumps, or entered manually including heuristics.
It has a user friendly interface including a GUI (based on X11 Tcl/Tk), an
"on-line" help, and it makes updates to IRRs easy. The functionality
of aoe was explained by various examples and it turns out that aoe can make
work with autnum objects much easier. In the future, RPSL will also be
supported (up to now aoe generates RIPE-181++ syntax), cooperation with other
tools will be implemented, and more and better heuristic methods will be
included. The aoe shares the same requirements with the RAToolSet (which it is
part of) being
- gcc 2.7.2 or later
- libg++ 2.7.2 or later
- Tcl 7.5/Tk 4.1 or later
in version 3.5.1 of the RAToolSet. The presentation of aoe is available as
ftp://ftp.ripe.net/ripe/presentations/ripe-m26-davidk-aoe.ps.gz.
During the RIPE meeting a test installation of the RA ToolSet was
accessible to the participants.
- E. Report on routing stability (G.Winters, Merit)
- Since routing stability has become a major issue in the Internet it was
very interesting to have a presentation of recent measurements done by Merit.
Gerald Winters of Merit showed results from Craig Labovitz which he (Craig) had
presented at the May '96 Nanog meeting supplemented by some newer studies. The
presentation of Craig Labovitz is available as
ftp://home.merit.edu/pub/users/labovit/talks/nanog-9605/instability.ps
and further information may be found at http://nic.merit.edu/~ipma/.
The presentation was very interesting and caused a lot of discussion. Topics from
the presentation and the discussion are summarised below. As it turns out there
are peaks of close to 10 million BGP announcements and withdrawals in a given
day with more than 100 BGP updates per second. Major providers are not major
causes of instability while individual ISPs and end-sites can have a
disproportionate effect on routing stability. It is interesting that BGP
traffic is a function of weekday/weekend and even of the time of the day. A
correlation to the amount of traffic is speculative, however there might be an
indication of some correlation to maintenance work. Up to now there has been no
analysis whether long holidays show in the statistics as well.
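As a quick consistency check of the two figures quoted above, 10 million updates spread evenly over one day already give an average above 100 per second. The minutes do not say whether the per-second figure is an average or a peak, so this is only a plausibility calculation.

```python
SECONDS_PER_DAY = 24 * 60 * 60      # 86400
updates_per_day = 10_000_000        # peak daily figure quoted above

average_rate = updates_per_day / SECONDS_PER_DAY
print(f"{average_rate:.1f} updates per second on average")
```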
For really big instability incidents (abnormal events) Merit people called
the originators of the instability to find out the reason. From a relatively
low number of 36 incidents it turned out that
- links are nearly no problem
- hardware is rarely a problem (but approx 3 times as often as link problems)
- software and configuration problems cause more than 50% of BGP updates
However, this does not give much indication on the reasons for the general
instability below major incidents. A graph showing the number of BGP
announcements (normalized to the number of routes) seemed to indicate that the
instability grows because a linear regression of the data showed an increase.
However, in the discussion it was noted that there was a very large variance in
the samples taken and a linear regression may not be justified and therefore
misleading. Moreover, the growth in complexity of the Internet may introduce
another effect: more routes are seen because of an increasing number of
peerings. Therefore, a mere normalisation of the number of BGP announcements to
the number of routes may not be sufficient. In the end (because the increase
was not very steep), there might be no indication for a growing instability at
all.
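The objection to the linear regression can be illustrated with an ordinary least squares fit that also reports the coefficient of determination. The sample values below are invented purely for illustration and are not Merit's measurements; the function name `linear_fit` is likewise hypothetical.

```python
def linear_fit(xs, ys):
    """Least squares line through (xs, ys); returns slope, intercept, R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Invented, strongly scattered samples (NOT measured data).
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
updates_per_route = [5, 1, 6, 2, 7, 1, 6, 4]

slope, intercept, r_squared = linear_fit(weeks, updates_per_route)
print(f"slope={slope:.3f}  R^2={r_squared:.3f}")
```

For these invented samples the fitted slope is positive, yet the line explains only about 1% of the variance, which is the kind of situation in which a fitted "increase" says very little about a real trend.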
A more elaborate analysis of the BGP announcements shows that most of the
BGP updates are redundant or unnecessary with a large percentage of duplicates.
99% of the BGP traffic is withdrawals. One reason for this might be that
withdrawals are always sent to all peers and accepted there regardless
of outgoing or incoming filters. It is suspected that this behaviour is
actually wanted because especially if a new filter is set up, previously valid
routes should be withdrawn anyway.
Another interesting analysis showed the frequency of BGP updates for the
same prefix and origin. There were pronounced peaks at 30sec intervals. A close
relation to IGP updates is suspected. However, in the discussion it was also
mentioned that with Cisco routers the default keepalive on lines is 10sec and
the line protocols go down after three missed keepalives which is 30sec. If
immediate BGP update is configured (which is the default) then the
corresponding routes are immediately withdrawn. Obviously, there are other
sources for this specific frequency and a more detailed analysis should be
performed.
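The keepalive explanation amounts to 3 missed keepalives of 10sec each, i.e. 30sec. A more detailed analysis of this kind could start from a histogram of the time between successive updates for the same prefix and origin. The sketch below assumes the updates are available as (timestamp in seconds, prefix, origin) tuples; that input format and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def interarrival_histogram(updates, bin_seconds=10):
    """Count gaps between successive updates of the same (prefix, origin).

    Each gap is assigned to the bin starting at the largest multiple of
    `bin_seconds` not exceeding it.
    """
    stamps_by_key = defaultdict(list)
    for timestamp, prefix, origin in updates:
        stamps_by_key[(prefix, origin)].append(timestamp)

    histogram = Counter()
    for stamps in stamps_by_key.values():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            gap = later - earlier
            histogram[int(gap // bin_seconds) * bin_seconds] += 1
    return histogram


# Invented sample log (NOT measured data).
sample = [
    (0, "192.0.2.0/24", "AS64500"),
    (30, "192.0.2.0/24", "AS64500"),
    (60, "192.0.2.0/24", "AS64500"),
    (95, "192.0.2.0/24", "AS64500"),
    (5, "198.51.100.0/24", "AS64501"),
    (12, "198.51.100.0/24", "AS64501"),
]
print(sorted(interarrival_histogram(sample).items()))
```

A pronounced peak in the bin starting at 30 would correspond to the periodicity described above; splitting the input by prefix length would give the per-length statistics that were missed in the presentation.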
It was a bit disappointing to have no statistics on the instability
depending on the prefix length. With all the data collected an analysis like
this should be possible.
Recommendations to improve routing stability were
- to use BGP route flap dampening
- to aggregate as much as possible using CIDR
- to filter
As already seen at previous RIPE meetings route flap dampening is an important
topic which has been dealt with before. A BOF on this topic was announced (see
below).
- Y. General input from other WGs
- There was no current input from other WGs.
- Z. AOB
- Christian Panigl announced that a BOF session on route flap dampening was
planned. The minutes of this session are included here.
Route Flap Dampening
BOF, RIPE 26, 22.1.97, 14:00
| Chairman: | Christian Panigl (CP) |
| Scribe: | Joachim Schmitz (JS) |
| Attendees: | approx 30 |
In the Routing WG session Christian Panigl asked whether people were
interested in participating in a BOF on route flap dampening. The BOF session was
held after the plenary session of the RIPE meeting on Wednesday.
CP experienced quite severe reachability problems of customer networks
because route flap dampening became active at various AS borders following
scheduled maintenance actions on a core router. If the default dampening
parameters had been used everywhere, it wouldn't have hurt that much, because
dampening would have lasted for only ~20-30 minutes for all prefixes.
Some backbone ISPs, however, have started to implement "progressive
route flap dampening" typically using different parameters. The common
effect is that longer prefixes are dampened more aggressively than shorter
prefixes.
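The difference between flat and progressive dampening can be made concrete with the usual exponential decay model, in which a route's penalty halves every half-life and the route is reusable once the penalty falls to a reuse limit. All numbers below (penalty of 3000, reuse limit of 750, half-lives of 15 and 60 minutes, the /24 threshold) are assumptions chosen for illustration; the minutes do not state the parameters actually in use, and `half_life_for` is a hypothetical policy.

```python
import math


def minutes_until_reuse(penalty, reuse_limit, half_life_min):
    """Time for an exponentially decaying penalty to reach the reuse limit."""
    if penalty <= reuse_limit:
        return 0.0
    return half_life_min * math.log2(penalty / reuse_limit)


def half_life_for(prefix_len):
    """Hypothetical progressive policy: longer prefixes decay more slowly."""
    return 60 if prefix_len >= 24 else 15


# Assumed penalty at the moment the flapping stops (illustrative only).
PENALTY = 3000
REUSE_LIMIT = 750

for prefix_len in (16, 24):
    wait = minutes_until_reuse(PENALTY, REUSE_LIMIT, half_life_for(prefix_len))
    print(f"/{prefix_len}: reusable after {wait:.0f} minutes")
```

With these assumed values a /16 becomes reusable after 30 minutes, while a /24 stays suppressed for 120 minutes. The sketch ignores the decay that takes place between individual flaps.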
In the observed case all /24 customer networks were cut off from parts of
the Internet for more than 2 hours and were no longer able to reach for
instance the root nameservers. By the way, many, even top- and second-level
nameservers are sitting in /24 (192/TWD) prefixes themselves and could easily
be "victims" of such a progressive dampening policy! This also
applies to PI address space and multi-homed site prefixes.
CP was not condemning route flap dampening itself, but the aggressiveness of
some of the implemented "progressive" parameters, and was questioning
whether progressive dampening is useful at all.
Following CP's introduction a lively discussion on route flap dampening
ensued:
- Does flapping really depend on the prefix length?
- To the knowledge of people attending the BOF session no measurements exist.
Although several items were already measured by Merit on the stability of
routes (as seen in the presentation by G.Winters in the Routing WG) they did
not include a stability analysis with regard to the prefix length. If flapping
does not necessarily depend on the prefix length longer prefixes should not be
punished by more aggressive dampening.
- However, the number of longer prefixes in the routing tables is much bigger
than the number of shorter ones. As a consequence, if the percentage of
flapping routes is the same for all prefix lengths the absolute number of flaps
will be definitely higher for longer prefixes. As each flap consumes the same
performance on the router (regardless of the prefix length), longer prefixes
should be dampened more aggressively to get the best CPU saving factor.
- Further justification for the latter was primarily based on the assumption
that longer prefixes serve fewer users, which did not go unchallenged
(e.g. think of important servers sitting in a /24).
- Which networks or prefixes are "important"?
- Stating that shorter prefixes are more important because they cover more
users doesn't hold in general. On the one hand this may be valuable and
motivate ISPs to CIDRize and customers to renumber, on the other hand it may
lead to the situation that organisations try to get (or keep, think of Class
A/B recycling) as short a prefix as possible, wasting address space without
having to care for stability. In this case instability would be moved to
shorter prefixes which is far from desirable.
- Long prefixes need not be unstable. There are discussions to use long
prefix routes ("golden networks") for root nameservers or for other
Internet structure servers (even for application servers such as news, etc). It can
be safely assumed that these routes are more stable than others and they must not
be dampened too aggressively so as not to impair the functionality of the
Internet itself.
During all the discussion the general consensus was clear: for routers with
large BGP tables (notably with full routing) the CPU load caused by flapping
would kill any existing router. To survive instabilities route flap dampening
should be applied by everybody. However, it was obvious that dampening
parameters need to be coordinated throughout the Internet in order to
- allow efficient dampening and easy clearing after repair
- dampen flaps at their source by keeping them from spreading in the network
This will significantly increase overall stability and manageability. The
group split into two major camps with regard to how dampening should be
done:
- progressive dampening: needs to be accompanied by means to explicitly
exclude "golden networks" from "hostility acts"
- flat dampening: because it's very hard to make a distinction between less
and more "important", not to say "golden" networks, all
prefixes could be treated equally.
Nonetheless, efforts should be focussed on the propagation of equal dampening
throughout the Internet. The default values for dampening parameters as they
are found in Cisco routers are based upon some experiments approx one year ago.
These experiments led to recommendations by the IETF last year.
Nevertheless, some ISPs have moved away from the default values and are
using their own parameters. Because of the urgent need of coordination of these
values CP will try to collect related recommendations and the outcome of
similar discussions. This is an activity of the RIPE Routing WG, therefore
everybody who is aware of related efforts (IETF, NANOG, ...) should come back
to the Routing WG list with hints and pointers!
- New Action 26.R5 on Christian Panigl
- To collect reasonable route flap dampening parameter values and to present
them at the next RIPE meeting in the Routing WG
Further reading:
ftp://ftp.ripe.net/ripe/minutes/ripe-m-24.ps
ftp://ftp.ripe.net/ripe/minutes/ripe-m-25.ps
http://www.ripe.net/wg/routing/r25-routing.html
ftp://ftp.ripe.net/ripe/presentations/ripe-m25-tbarber-bgp-damp.html
Agenda
- Routing Working Group Meeting -
Agenda for RIPE-26, Jan 1997, Amsterdam
- A. Administrative issues (J.Schmitz)
- volunteering of the scribe
- agenda bashing
- minutes from last meeting
- open actions
- B. Hierarchical authorisation for route objects (J.Schmitz)
- C. Report on route aggregation by the RIPE NCC (D.Karrenberg, RIPE NCC)
- in general
- route aggregation
- D. New Developments of RATools (D.Kessens, ISI)
- E. Report on routing stability (G.Winters, Merit)
- measurements by Merit
- route flap dampening
- Y. General input from other WGs
- Z. AOB
Summary of Actions
- Action 22.10 on Joachim Schmitz
- To trigger a discussion on the Routing WG mailing list about which focus to
choose for a future tool development project, and to reach consensus on it
- Action 25.R1 on Daniel Karrenberg/RIPE NCC
- To report on the results from the route aggregation analysis at the next
RIPE meeting
- Action 26.R1 on the RIPE NCC
- To add a link on the RIPE web server from the Routing WG pages to the CIDR
FAQ location
- Action 26.R2 on Joachim Schmitz
- To trigger database implementation of first discussion results from
hierarchical authorization for route objects
- Action 26.R3 on Joachim Schmitz
- To finalize the hierarchical authorization for route objects together with
the Routing WG
- Action 26.R4 on Eric-Jan Bos
- To circulate the URL of his analysis of routing table size on the mailing
list
- Action 26.R5 on Christian Panigl
- To collect reasonable route flap dampening parameter values and to present
them at the next RIPE meeting in the Routing WG
Chris Fletcher, Joachim Schmitz, Christian Panigl
21/5/97