Draft of Route-Flap Dampening Paper
- Date: Mon, 22 Sep 1997 18:05:56 MET
Dear members of the Routing-WG,
here is the final draft version of the "Route Flap Dampening" paper as
it came out of our task-force. Please, if possible, have a look at it
before the WG meeting on Wednesday morning and bring your comments to
the meeting and/or post them to the list.
See you
Christian
--- ---------------------------------------------------------------------- ---
--- Christian Panigl : Vienna University Computer Center - ACOnet ---
--- VUCC - ACOnet - VIX : -------------------------------------------- ---
--- Universitaetsstrasse 7 : Mail: Panigl@localhost (CP8-RIPE) ---
--- A-1010 Vienna / Austria : Tel: +43 1 4277-14032 (Fax: -9140) ---
--- ---------------------------------------------------------------------- ---
===============================================================================
RIPE Routing-WG
Recommendation for coordinated route-flap dampening parameters
Tony Barber
Sean Doran
Daniel Karrenberg
Christian Panigl
Joachim Schmitz
Document status: DRAFT 1.2 22-SEP-1997
ABSTRACT
This paper recommends a set of route-flap dampening parameters which
should be applied by all ISPs in the Internet and should be deployed as
new default values by BGP router vendors.
Table of Contents
ABSTRACT
1. Introduction
1.1 Motivation for route-flap dampening
1.2 What is route-flap dampening ?
1.3 "Progressive" versus "flat&gentle" approach
1.4 Motivation for coordinated parameters
2. Recommended dampening parameters
2.1 Motivation for recommendation
2.2 Description of recommended dampening parameters
2.3 Example configuration for Cisco IOS
2.4 No BGP fast-external-fallover (Cisco IOS)
2.5 Clear IP BGP soft (Cisco IOS)
3. Open problems
3.1 Multiplication of flaps through multiply interconnected ASes
3.2 Is dampening of customer route-flaps a good idea ?
4. References
1. Introduction
In the Routing WG session of RIPE26 Christian Panigl asked whether
people were interested in participating in a BOF on route-flap dampening.
The BOF session was held after the plenary session of RIPE26.
The discussion continued in the Routing WG session of RIPE27 and led
to a task force charged with writing a proposal document for coordinated
route-flap dampening parameters.
1.1 Motivation for route-flap dampening
Around 1993/94 the massive growth of the Internet, in terms of the
number of announced prefixes (often due to inadequate
prefix aggregation), multiple paths and instabilities, started to do
significant harm to the efficiency of the core routers of the Internet.
Every single line-flap at the periphery which makes a routing prefix
unreachable has to be advertised to the whole core Internet and has to
be dealt with by every single router by means of updates to the
routing table.
To overcome this situation a route-flap dampening mechanism was
"invented" in 1993 and has been integrated into several routing
implementations since 1995 (Cisco, ISI/RSd, GateD Consortium). It now
helps significantly to keep severe instabilities local. There is a second
benefit: it raises awareness of the existence of instabilities,
because severe route/line-flapping problems lead to permanent
suppression of the unstable area by means of holding down the flapping
prefixes.
1.2 What is route-flap dampening ?
Route-flap dampening is a mechanism for (BGP) routers which is aimed at
improving the overall stability of the Internet routing table and
reducing the load on the CPUs of core routers.
When BGP route-flap dampening is enabled in a router the router starts
to collect statistics about the announcement and withdrawal of prefixes.
Route-flap dampening is governed by a set of parameters with
vendor-supplied default values which may be modified by the router
manager. The names, semantics and syntax of these parameters differ
between the various implementations; however, the behaviour of the
dampening mechanism is basically the same:
Every flap (= a withdrawal/announcement pair) adds to a penalty kept
for the prefix. If the flaps within a given timeframe push the penalty
over a suppress threshold, the prefix is held down, and every
subsequent flap increases the penalty further. The penalty then decays
according to a half-life parameter while the prefix is visible, until it
falls below a reuse threshold. Therefore, once the prefix has been stable
for a certain period, the hold-down is released and the prefix is re-used
and re-advertised.
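The behaviour described above can be illustrated with a small model. This is only a sketch under stated assumptions, not any vendor's implementation: it assumes a fixed penalty of 1000 per flap (as in the Cisco comments in section 2.3), pure exponential decay sampled once per minute, and it ignores the maximum suppress time. The names decay and simulate are made up for this illustration.

```python
def decay(penalty, minutes, half_life):
    """Exponentially decay a penalty over `minutes` with the given half-life."""
    return penalty * 0.5 ** (minutes / half_life)


def simulate(flap_times, half_life, reuse, suppress, per_flap=1000.0, horizon=180):
    """Return one (minute, penalty, suppressed) sample per minute.

    flap_times: minutes at which a flap is recorded (assumed input).
    The maximum suppress time is deliberately NOT modelled here.
    """
    flaps = set(flap_times)
    penalty = 0.0
    suppressed = False
    history = []
    for minute in range(horizon + 1):
        if minute > 0:
            penalty = decay(penalty, 1, half_life)  # one minute of decay
        if minute in flaps:
            penalty += per_flap                     # each flap adds a fixed penalty
        if not suppressed and penalty > suppress:
            suppressed = True                       # hold the prefix down
        elif suppressed and penalty < reuse:
            suppressed = False                      # release and re-advertise
        history.append((minute, penalty, suppressed))
    return history


# Four flaps, two minutes apart, with the /24 values from section 2.3.
history = simulate([0, 2, 4, 6], half_life=30, reuse=750, suppress=3000)
first_suppressed = next(m for m, _, s in history if s)
released = next(m for m, _, s in history if m > first_suppressed and not s)
print("suppressed at minute", first_suppressed, "- released at minute", released)
```

With these inputs the model starts suppressing at the fourth flap and not earlier, since three flaps of 1000 each cannot exceed a suppress limit of 3000.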
Pointers to some more detailed and vendor specific documents:
Cisco BGP Case Studies: Route Flap Dampening
http://www.cisco.com/warp/public/459/16.html
ISI/RSd Configuration: Route Flap Dampening
http://www.isi.edu/div7/ra/RSd/doc/dampen.html
GateD Configuration: Weighted Route Dampening Statement
http://www.gated.org/new_web/code/doc/gated-uni/config_guide/wrd.html
See also "4. References"
1.3 "Progressive" versus "flat&gentle" approach
One easy approach would be to apply the current default parameters,
which treat all prefixes equally ("flat&gentle"), everywhere.
However, there is a strong case for penalising longer prefixes (= smaller
aggregates) more than well-aggregated short prefixes ("progressive"),
because the number of short prefixes in the routing table is
significantly lower and they generally tend to be
much more stable.
Another aspect is that progressive dampening might increase the
awareness of aggregation needs. However, it has to be accompanied by a
careful design which doesn't force a rush for larger than required IP
address-range allocations.
Because a significant number of important services sit in long
prefixes (e.g. root nameservers), the progressive approach has to exempt
those long but "golden" prefixes from strong penalisation.
With this recommendation we are trying to strike a compromise and
therefore call it "graded dampening".
1.4 Motivation for coordinated parameters
There is a strong need for the coordinated use of dampening parameters
for several reasons:
Coordination of "progressiveness":
If the boundaries for different treatment of longer prefixes and the
penalties are not coordinated throughout the Internet, route-flap
dampening could even lead to additional flapping or temporary
routing-loops because longer prefixes might already be re-announced
through some parts of the Internet where shorter prefixes are still held
down through other paths.
Coordination of "aggressiveness":
If an upstream or peering provider dampens more aggressively
(e.g. triggered by fewer flaps or applying longer hold-down timers) than
an access provider does towards its customers, the result is a very
inconsistent situation in which a flapping network might still be able to
reach "near-line" parts of the Internet. Debugging such
instabilities is then much harder, because the effect leads the customer
to assume that there is a problem "somewhere" in the
"upstream" Internet instead of simply calling his ISP's hotline to
complain that he can't get out any longer.
Further, after successful repair of the problem the access-provider can
easily clear the flap-dampening for his customer on his local router
instead of needing to contact upstream NOCs all over the Internet to get
the dampening cleared.
2. Recommended dampening parameters
2.1 Motivation for recommendation
At RIPE26 and 27 Christian Panigl presented the following network
backbone maintenance example from his own experience, which
triggered flap dampening in some upstream and peering ISPs' routers for
all his and his customers' /24 prefixes for more than 3 hours because of
too "aggressive" parameters:
scheduled SW upgrade of backbone router failed:
- reload after SW upgrade 1 flap
- new SW crashed 1 flap
- reload with old SW 1 flap
------
3 flaps within 10 minutes
which resulted in the following dampening scenario at some boundaries
with progressive route-flap dampening enabled:
Prefix length:   /24    /19      /16
Suppress time:   ~3h    45-60'   <30'
Therefore, in the Routing-WG session at RIPE27, it was agreed that
suppression should not start until the 4th flap in a row and that the
maximum suppression should in no case last longer than 1 hour from the
last flap.
It was agreed that a recommendation from RIPE would be desirable. Given
that the current allocation policies are expected to hold for the
foreseeable future, it was suggested that /19 and shorter prefixes
should not be penalised more harshly than current Cisco default
dampening does.
With those suggestions in mind Tony Barber designed the following set of
route-flap dampening parameters, which have proved to work smoothly in
his environment for a couple of months.
2.2 Description of recommended dampening parameters
Basically, the recommended values do the following, with harsher treatment
for /24 and longer prefixes:
- don't start dampening before the 4th flap in a row
- /24 and longer prefixes: max = min outage of 60 minutes
- /22 and /23 prefixes: max outage of 45 minutes, but potentially less
because of the half-life value; min outage of 30 minutes
- all other prefixes: max outage of 30 minutes, min outage of 10 minutes
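The minimum outages listed above can be cross-checked against the "set dampening" values of section 2.3 with a short calculation. This is a sketch that assumes pure exponential decay and a prefix that is suppressed with its penalty just above the suppress limit and then stays stable; the shortest hold-down is then half-life * log2(suppress / reuse). The names PARAMS and min_outage are introduced only for this illustration.

```python
import math

# (half-life [min], reuse, suppress, max suppress time [min]) as in section 2.3
PARAMS = {
    "/24 and longer": (30, 750, 3000, 60),
    "/22 and /23": (15, 750, 3000, 45),
    "all others": (10, 1500, 3000, 30),
}


def min_outage(half_life, reuse, suppress):
    """Minutes needed to decay from the suppress limit down to the reuse limit."""
    return half_life * math.log2(suppress / reuse)


for name, (half_life, reuse, suppress, max_suppress) in PARAMS.items():
    shortest = min_outage(half_life, reuse, suppress)
    print(f"{name}: min outage {shortest:.0f} min, max outage {max_suppress} min")
```

Under these assumptions the calculation yields 60, 30 and 10 minutes, which matches the minimum outages given in the list; the maximum outage is simply the configured maximum suppress time.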
2.3 Example configuration for Cisco IOS
! Parameters are :
! set dampening <half life> <reuse-at> <suppress-at> <max suppress time>
! There is a 1000 penalty for each flap
! Penalty decays at granularity of 5 seconds
! Unsuppressed at granularity of 10 seconds
! Dampening info kept until penalty becomes < half of reuse limit.
!
router bgp 65500
!no bgp dampening
bgp dampening route-map graded-flap-dampening
!
! don't dampen candidate default routes ! OPTIONAL (not part of recommendation)
! access-list 189 is the candidate default routes
!
no route-map graded-flap-dampening deny 5
route-map graded-flap-dampening deny 5
match ip address 189
!
! don't dampen root nameserver nets
!
no route-map graded-flap-dampening deny 7
route-map graded-flap-dampening deny 7
match ip address 180
!
! Heavy dampening of all networks which have a mask of
! /24 and above. These are suppressed into a data structure
! with a half life of 30 minutes and only re-used when the penalty drops to 750.
! Max outage of 60 minutes.
!
no route-map graded-flap-dampening permit 10
route-map graded-flap-dampening permit 10
match ip address 181
set dampening 30 750 3000 60
!
! dampen /23 /22
! half life is now 15 minutes and reuse at 750
!
no route-map graded-flap-dampening permit 20
route-map graded-flap-dampening permit 20
match ip address 182
set dampening 15 750 3000 45
!
! default dampening on all less than /22 defaults to this
! different to CISCO defaults which are 15 750 2000 30
! bgp dampening command
!
no route-map graded-flap-dampening permit 40
route-map graded-flap-dampening permit 40
set dampening 10 1500 3000 30
!
!-----------------------------------------------------------------------
! ACCESS LISTS 180-189 GO BELOW
!-----------------------------------------------------------------------
! access-lists 180 to 189 used or reserved for progressive route flap dampening
!
! 180 - BGP dampening - root-nameservers.net networks are NOT dampened
! This filter stops these networks being dampened.
! Also DON'T dampen routes used to derive default (see list 189)
! but this is handled in a separate route-map statement.
! in the file dampening-confg.
! Route map uses DENY to drop out of map on matching.
!
no access-list 180
!
! A.ROOT-SERVERS.NET.
access-list 180 permit ip 198.41.0.0 0.0.0.0 255.255.252.0 0.0.0.0
!
! B.ROOT-SERVERS.NET.
access-list 180 permit ip 128.9.0.0 0.0.0.0 255.255.0.0 0.0.0.0
!
! C.ROOT-SERVERS.NET.
access-list 180 permit ip 192.33.4.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! D.ROOT-SERVERS.NET.
access-list 180 permit ip 128.8.0.0 0.0.0.0 255.255.0.0 0.0.0.0
!
! E.ROOT-SERVERS.NET.
access-list 180 permit ip 192.203.230.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! F.ROOT-SERVERS.NET.
access-list 180 permit ip 192.5.4.0 0.0.0.0 255.255.254.0 0.0.0.0
!
! G.ROOT-SERVERS.NET.
access-list 180 permit ip 192.112.36.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! H.ROOT-SERVERS.NET.
access-list 180 permit ip 128.63.0.0 0.0.0.0 255.255.0.0 0.0.0.0
!
! I.ROOT-SERVERS.NET.
access-list 180 permit ip 192.36.148.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! J.ROOT-SERVERS.NET. 198.41.0.10 same net as A
!
! K.ROOT-SERVERS.NET.
access-list 180 permit ip 193.0.14.0 0.0.0.0 255.255.255.0 0.0.0.0
!
! L.ROOT-SERVERS.NET. 198.32.64.12
access-list 180 permit ip 198.32.64.0 0.0.0.255 255.255.255.0 0.0.0.255
!
! M.ROOT-SERVERS.NET. 198.32.65.12
access-list 180 permit ip 198.32.65.0 0.0.0.255 255.255.255.0 0.0.0.255
!
!
! - 181 - dampens /24 and greater prefixes
!
no access-list 181
!
access-list 181 permit ip 0.0.0.0 255.255.255.255 255.255.255.0 0.0.0.255
access-list 181 deny ip 0.0.0.0 255.255.255.255 0.0.0.0 255.255.255.255
!
! - 182 - dampens /23 /22 and above
!
no access-list 182
!
access-list 182 permit ip 0.0.0.0 255.255.255.255 255.255.252.0 0.0.3.255
access-list 182 deny ip 0.0.0.0 255.255.255.255 0.0.0.0 255.255.255.255
!
! - 189 - Candidate default networks used in some customer bgp implementations
!
no access-list 189
!
access-list 189 permit ip !!! put your defaults in here
access-list 189 deny ip any any
!
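How access-lists 181 and 182 select prefixes by mask length can be checked with a few lines of Python. This is a sketch only: it assumes that in these extended access-list entries the first address/wildcard pair is matched against the network address and the second pair against the netmask, with wildcard bits set to 1 meaning "don't care". It ignores the "deny 5" and "deny 7" entries for default candidates and root nameserver networks, and all function names are invented for this illustration.

```python
import ipaddress


def ip_to_int(text):
    """Convert a dotted-quad string to an integer."""
    return int(ipaddress.IPv4Address(text))


def field_matches(value, pattern, wildcard):
    """True if value equals pattern on every bit where the wildcard bit is 0."""
    care = ~ip_to_int(wildcard) & 0xFFFFFFFF
    return (ip_to_int(value) & care) == (ip_to_int(pattern) & care)


def acl_entry_matches(prefix, net, net_wild, mask, mask_wild):
    """Match a prefix against one 'permit ip net net_wild mask mask_wild' entry."""
    network = ipaddress.IPv4Network(prefix)
    return (field_matches(str(network.network_address), net, net_wild)
            and field_matches(str(network.netmask), mask, mask_wild))


# The permit entries of access-lists 181 and 182 from the configuration above.
ACL_181 = ("0.0.0.0", "255.255.255.255", "255.255.255.0", "0.0.0.255")
ACL_182 = ("0.0.0.0", "255.255.255.255", "255.255.252.0", "0.0.3.255")


def dampening_class(prefix):
    """Return the route-map entry that would apply, in route-map order."""
    if acl_entry_matches(prefix, *ACL_181):
        return "permit 10 (/24 and longer)"
    if acl_entry_matches(prefix, *ACL_182):
        return "permit 20 (/22 and /23)"
    return "permit 40 (all others)"


for example in ("192.0.2.0/24", "198.51.100.128/25", "10.0.0.0/23", "10.0.0.0/19"):
    print(example, "->", dampening_class(example))
```

Note that access-list 182 on its own also matches /24 and longer prefixes; they never reach it because the route-map evaluates entry 10 (access-list 181) first.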
2.4 No BGP fast-external-fallover (Cisco IOS)
In Cisco IOS there is a BGP configuration parameter
"fast-external-fallover" which, when on (the default), leads to an immediate
clearing of a BGP neighbor whenever the line protocol to this external
neighbor goes down. If it is turned off, the BGP sessions will survive
short line-flaps as they will use the longer BGP keepalive/hold timers
(default 60/180 seconds). The drawback of turning it off is that the
switchover to an alternative path will take longer. It is recommended
to turn off fast-external-fallover:
!
router bgp 65501
no bgp fast-external-fallover
!
2.5 Clear IP BGP soft (Cisco IOS)
There is a new "soft" mechanism for the clearing of BGP sessions
available with Cisco IOS. To be able to make use of the
"clear ip bgp x.x.x.x soft" command both sides must support it and need
to be configured to accept it from the neighbor:
!
router bgp 65501
neighbor 10.0.0.2 remote-as 65502
neighbor 10.0.0.2 soft-reconfiguration inbound
!
Without the keyword "soft" a "clear ip bgp x.x.x.x" will always withdraw
all announced prefixes from/to neighbor x.x.x.x and re-advertise them (=
route-flap for all prefixes which are available before and after the
clear). With "clear ip bgp x.x.x.x soft" only those prefixes will be
withdrawn which are no longer available with the new update and only
those prefixes will be re-advertised which haven't been known before (no
route-flap).
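The difference between the two kinds of clear can be expressed with plain set operations. This is only an illustration of the behaviour as described in the paragraph above, not of how IOS works internally; the function names and the example prefixes are hypothetical.

```python
def hard_clear(before, after):
    """Without "soft": everything is withdrawn, then everything is re-advertised."""
    withdrawn = set(before)
    advertised = set(after)
    flapped = withdrawn & advertised   # withdrawn and announced again = flap
    return withdrawn, advertised, flapped


def soft_clear(before, after):
    """With "soft": only the differences between old and new are sent."""
    withdrawn = set(before) - set(after)    # no longer available
    advertised = set(after) - set(before)   # not known before
    flapped = withdrawn & advertised        # empty by construction
    return withdrawn, advertised, flapped


# Hypothetical prefix sets before and after the clear.
before = {"192.0.2.0/24", "198.51.100.0/24"}
after = {"198.51.100.0/24", "203.0.113.0/24"}

print("hard clear flaps:", sorted(hard_clear(before, after)[2]))
print("soft clear flaps:", sorted(soft_clear(before, after)[2]))
```

In this model the prefix that is present both before and after the clear flaps once under a hard clear and not at all under a soft clear.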
3. Open problems
3.1 Multiplication of flaps through multiply interconnected ASes
Christian Panigl recently had the following experience with a line
upgrade of an Ebone customer:
- It is absolutely certain that the upgrade process generated just ONE
flap (disconnect router-port from modem A, reconnect to
modem B); nevertheless the customer's prefix was dampened in all ICM
routers (ICM/AS1800 is the US upstream for Ebone).
- The flap statistics in the ICM routers stated *4* flaps !!!
- The only explanation would be that the multiple interconnections
between Ebone/AS1755 and ICM/AS1800 multiplied the flaps
(advertisements/withdrawals arrived time-shifted at the ICM routers
through the multiple paths).
- This would then potentially hold true for any meshed topology because
of the propagation delays of advertisements/withdrawals.
- Workaround for scheduled actions such as the one in the given example:
Schedule a downtime of at least 3-5 minutes, which should be enough
for the prefix withdrawals to have propagated through all paths before
reconnection and re-advertisement of the prefix. Avoid clearing BGP
sessions, as this usually generates a 30" outage which might easily
give the same result.
- A solution has to be provided by the vendors !
3.2 Is dampening of customer route-flaps a good idea ?
As already explained in section 1.4, flap-dampening is most valuable,
consistent and helpful when applied as near to the source of
the problem as possible. Therefore flap-dampening should not only be
applied at peering boundaries but even more so at customer boundaries!
4. References
RIPE/Routing-WG Minutes dealing with Route Flap Dampening:
ftp://ftp.ripe.net/ripe/minutes/ripe-m-24.ps
ftp://ftp.ripe.net/ripe/minutes/ripe-m-25.ps
http://www.ripe.net/wg/routing/r25-routing.html
http://www.ripe.net/wg/routing/r26-routing.html
http://www.ripe.net/wg/routing/r27-routing.html
Curtis Villamizar, ANS: Controlling BGP/IDRP Routing Overhead
http://figaro.ans.net/route-dampen/
NANOG-Feb-1995 Route Flap Dampening Presentation (slides):
ftp://engr.ans.net/pub/papers/slides/nanog-feb-1995-route-dampen.ps
Merit/IPMA: Internet Routing Recommendations
http://www.merit.edu/~ipma/docs/help.html
Cisco BGP Case Studies: Route Flap Dampening
http://www.cisco.com/warp/public/459/16.html
ISI/RSd Configuration: Route Flap Dampening
http://www.isi.edu/div7/ra/RSd/doc/dampen.html
GateD Configuration: Weighted Route Dampening Statement
http://www.gated.org/new_web/code/doc/gated-uni/config_guide/wrd.html