[routing-wg] How BGP routes can get 'stuck' in the Default-Free Zone

Job Snijders job at fastly.com
Wed Apr 21 14:00:52 CEST 2021

Dear group,

I'd like to draw your attention to an excellent article on an intricate
interaction between BGP and TCP which can result in 'zombie routes' in
the BGP Default-Free Zone.

    https://blog.benjojo.co.uk/post/bgp-stuck-routes-tcp-zero-window

My current running theory on the root cause of some mishaps in the
global routing system is that certain BGP implementations can end up in
a broken state where such systems will still generate and send out
KEEPALIVE messages, but are unable to process other BGP messages (and
such a system instructs all its peers to not send new data by signalling
a zero TCP receive window). This is "Problem #1".

"Problem #2" is that almost all BGP implementations are unable to
robustly deal with systems suffering from Problem #1. Almost all BGP
implementations assume that when KEEPALIVE messages don't make it across
the wire, the remote system will initiate the session tear down. But of
course, if the remote system is in such a broken state that it can't
issue session tear downs ... the combined system state is perpetually
broken.

The Security Considerations section of https://datatracker.ietf.org/doc/html/draft-spaghetti-idr-bgp-sendholdtimer
elaborates on three detrimental facets of the above situation.

It is quite rare for systems to end up in the "Problem #1" state, but
when it happens, all systems connected to the broken node probably are
better off disconnecting from such a system than perpetually forwarding
(and potentially blackholing) Internet traffic into the broken system.
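
To make the idea of disconnecting a bit more concrete, here is a minimal
Python sketch of a send-side hold timer. The class name, the method names
and the timeout value are assumptions made purely for illustration; they
are not taken from the draft or from any BGP implementation. The premise
is that if data queued for a peer makes no progress for too long (for
example because the peer keeps advertising a zero TCP receive window),
the local system could treat the session as dead, even though KEEPALIVE
messages from that peer still arrive.

```python
import time


class SendHoldTimer:
    """Hypothetical helper: detects a peer that stopped accepting data."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self.hold_seconds = hold_seconds  # assumed, illustrative timeout
        self.clock = clock
        self.pending = 0           # bytes queued but not yet accepted
        self.stalled_since = None  # None while nothing is waiting to be sent

    def queued(self, nbytes):
        """Record nbytes placed on the send queue for this peer."""
        if self.pending == 0 and nbytes > 0:
            # The queue was empty: start timing from now.
            self.stalled_since = self.clock()
        self.pending += nbytes

    def sent(self, nbytes):
        """Record that nbytes were accepted (the peer's window was open)."""
        self.pending = max(0, self.pending - nbytes)
        # Any progress restarts the timer; an empty queue stops it.
        self.stalled_since = self.clock() if self.pending else None

    def expired(self):
        """True if queued data made no progress for hold_seconds."""
        if self.stalled_since is None:
            return False
        return self.clock() - self.stalled_since >= self.hold_seconds
```

In this sketch a BGP speaker would call queued() when it has a message
to send, sent() whenever a write towards the peer succeeds, and tear the
session down once expired() returns True.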

Kind regards,

Job
