Changes to RIPE Routing Working Group Recommendations On Route-flap Damping
This document discusses Route-flap Damping and recommends acceptable practices for ISPs who are considering deploying Route-flap Damping.

1.0 Introduction
1.1 Background
1.2 Coordination of Flap Damping Parameters
2.0 Current Status of Route-flap Damping
2.1 Impeded Convergence
2.2 Updates transiting the network
3.0 Solutions
4.0 Recommendation
5.0 Conclusion
6.0 Acknowledgements
7.0 References

1.0 Introduction

Route-flap Damping (RFD) [1] is a mechanism for BGP-speaking routers intended to improve the overall stability of the Internet routing table and reduce the load on the CPUs of the core routers. Unfortunately, due to the dynamics of the protocol, common simple configurations can do more harm than good; see [3,4].

1.1 Background

In the early 1990s, the accelerating growth in the number of prefixes being announced to the Internet (often due to inadequate prefix aggregation), the denser meshing through multiple inter-provider paths, and increased instabilities started to have a significant impact on the performance and efficiency of the Internet backbone routers.
Every time a routing prefix became unreachable because of a single line-flap, the withdrawal was advertised to the whole core Internet and handled by every single router that carried the full Internet routing table.

It was soon realized that the increasing routing churn created significant processing load on routing engines, sometimes high enough to cause router crashes.

To overcome this situation, RFD was developed in 1993 and has since been integrated into most router BGP software implementations. RFD is described in detail in RFC 2439. RFD is now used in many service provider networks in the Internet.

1.2 Coordination of flap damping parameters

When RFD was first implemented in commercial routers, vendor implementations had different default values and different characteristics. As inconsistency would result in different rates of flap damping, and therefore introduce inconsistent path selection and behavior that was very hard to diagnose, the ISP community introduced a consistent set of recommendations for flap damping parameters, so that ISPs deploying RFD would treat flapping prefixes in the same way.

This call for consistency resulted in the RIPE Routing Working Group producing first ripe-178, then ripe-210, and finally ripe-229 [2], following consensus of the Routing Working Group. The parameters documented in ripe-229 were considered, at the time of publication in 2001, the best current practice.
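
As a rough illustration of why differing parameters lead to differing treatment of the same flapping prefix, the sketch below models the exponentially decaying penalty described in RFC 2439. The numeric values (a penalty of 1000 per flap, a suppress threshold of 2000, a reuse threshold of 750, a half-life of 15 minutes) and the function names are assumptions chosen for illustration; they are not the ripe-229 recommendations.

```python
import math

# Illustrative values only (assumptions, not taken from ripe-229).
PENALTY_PER_FLAP = 1000.0
SUPPRESS_THRESHOLD = 2000.0
REUSE_THRESHOLD = 750.0
HALF_LIFE_S = 15 * 60.0  # 15 minutes, in seconds


def decayed(penalty, elapsed_s, half_life_s=HALF_LIFE_S):
    """Penalty remaining after elapsed_s seconds of exponential decay."""
    return penalty * 0.5 ** (elapsed_s / half_life_s)


def seconds_until_reuse(penalty, reuse=REUSE_THRESHOLD, half_life_s=HALF_LIFE_S):
    """Seconds until the penalty decays to the reuse threshold (0 if already at or below it)."""
    if penalty <= reuse:
        return 0.0
    return half_life_s * math.log2(penalty / reuse)


if __name__ == "__main__":
    penalty = 3 * PENALTY_PER_FLAP  # three flaps in quick succession, decay ignored
    for half_life_min in (15, 30):  # two hypothetical routers with different settings
        wait = seconds_until_reuse(penalty, half_life_s=half_life_min * 60.0)
        print(f"half-life {half_life_min} min: reusable after {wait / 60:.0f} min")
```

Under these assumed values, a prefix with an accumulated penalty of 3000 becomes usable again after 30 minutes on a router with a 15-minute half-life, but only after 60 minutes on a router with a 30-minute half-life. This is the kind of inconsistency the coordinated parameters were meant to avoid.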

2.0 Current Status of Route-flap Damping

Research in the years following the introduction of RFD into BGP implementations, and the publication of the RIPE Routing Working Group recommendations, has demonstrated that there are real and significant problems with RFD as deployed on the Internet today.

2.1 Impeded Convergence

Perhaps the best-known work highlighting major problems with RFD is that by Zhuoqing Mao and colleagues, presented at SIGCOMM in 2002. Subsequent presentations by Randy Bush and colleagues explain the research work more accessibly.

The major issue is that if one path is withdrawn, all BGP speakers will use best path selection to pick the next best path, and advertise this best path to all their neighbours. These neighbours will see a change in path; a change in path is a change in attribute, so the prefix as seen on a neighbouring router will attract a flap penalty, even though that path is perfectly valid and the prefix has not disappeared from the routing table [5].

This path "hunting" goes on throughout the Internet: a simple prefix withdrawal can appear as a major flap event a few AS hops away, with the result that routers using vendor default parameters, and even the ripe-229 recommended flap damping parameters, will suppress the prefix. While the operator can see this is an error, the routers are simply reacting to the circumstances presented to them.

2.2 Updates transiting the network

Problems are not just caused by path "hunting".
Each implementation of BGP either has a different value for the Minimum Route Advertisement Interval (MRAI) timer (the amount of time a router waits before passing on a route update) or does not implement MRAI at all, in favour of the vendor's own throttling algorithm.

Some implementations pass on the update without waiting at all, others wait for 30 seconds, and so on. These differences mean that update messages transiting different ASNs using different vendor equipment will arrive at the target router at different times. This router will see these different messages and will consider each one during best path selection. This will more than likely result in a different best path being offered to its neighbours for each update message that arrives.
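
The effect described in sections 2.1 and 2.2 can be sketched as a small simulation: one withdrawal upstream reaches a distant router as several updates at staggered times, each attracting a penalty. The arrival times, the penalty of 500 per update, the suppress threshold of 2000 and the 15-minute half-life are all hypothetical values chosen for illustration, and `simulate` is an invented helper, not part of any BGP implementation.

```python
def simulate(update_times_s, penalty_per_update=500.0,
             suppress=2000.0, half_life_s=900.0):
    """Return a list of (time_s, penalty, suppressed) after each update.

    The penalty decays exponentially between updates and grows by a fixed
    amount on each one. Reuse after suppression is not modelled here.
    """
    penalty = 0.0
    last_t = None
    suppressed = False
    history = []
    for t in sorted(update_times_s):
        if last_t is not None:
            # Decay the accumulated penalty for the time since the last update.
            penalty *= 0.5 ** ((t - last_t) / half_life_s)
        penalty += penalty_per_update
        last_t = t
        if penalty > suppress:
            suppressed = True
        history.append((t, penalty, suppressed))
    return history


if __name__ == "__main__":
    # Hypothetical arrival times (seconds) of updates triggered by a single
    # withdrawal, staggered by different MRAI behaviour along each path.
    arrivals = [0, 5, 30, 35, 60]
    for t, penalty, suppressed in simulate(arrivals):
        state = "SUPPRESSED" if suppressed else "ok"
        print(f"t={t:>3}s penalty={penalty:7.1f} {state}")
```

With these assumed numbers the prefix crosses the suppress threshold on the fifth update, about a minute after a single upstream event, because the penalty barely decays between closely spaced arrivals.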