[routing-wg] Subject: RPKI ROA Deletion: Post-mortem
- Previous message (by thread): [routing-wg] Subject: RPKI ROA Deletion: Post-mortem
- Next message (by thread): [routing-wg] Subject: RPKI ROA Deletion: Post-mortem
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Danny McPherson
danny at tcb.net
Mon Apr 6 15:54:09 CEST 2020
[top post only]

Thanks for this, Job; interesting analysis.

Another question here: at what interval is data from a given RIR
repository ingested / operationalized by a given network operator?  Or,
put differently, any idea how much lag there is today between a change
in an RIR RPKI repository and that change becoming OV policy in _your_
routers?

I'm sure this varies, but I'm not sure by how much within a given
operator, or across operators.

-danny

On 2020-04-05 14:29, Job Snijders wrote:
> Dear Danny, others,
>
> On Fri, Apr 03, 2020 at 04:56:41PM -0400, Danny McPherson wrote:
>> I also look forward to [your] analysis of the Rostelecom incident that
>> occurred in the same timeframe.
>
> I've taken a look at the incident. 2,666 VRPs disappeared around
> 2020-04-01T16:32Z. For the purpose of this analysis, the list of
> affected VRPs is
> http://instituut.net/~job/deleted-vrps-ripe-2020-04-01-16-32.txt
>
> Andree Toonk (BGPMon) was so kind as to compile a list of prefixes
> which were wrongly originated by Rostelecom during the incident at
> 2020-04-01T19:27Z:
> https://portal.bgpmon.net/data/12389_apr2020.txt
>
> The above list is not the full list of prefixes affected by this leak.
> The leak appears to have included route announcements that 12389
> received from some customers and some peers, in addition to 'bgp
> optimiser'-style more-specific hijacks. The full list is available here:
> https://map.internetintel.oracle.com/api/leak_prefixes/20764_12389_1585768500.pfxs
> I'm leaving the 'merely leaked, otherwise untouched' routes out of this
> analysis, as those are outside the scope of Origin Validation: the
> fabricated routes in relation to missing RPKI VRPs are what matters
> for this analysis.
>
> If we take the intersection of Andree's list with the list of missing
> VRPs, we have the IP addresses that were affected by both the RIPE NCC
> RPKI Deletion incident and the Rostelecom BGP incident.
> The following 12 prefixes (4352 IP addresses):
>
> peer_count  start_time           alert_type          base_prefix       base_as  announced_prefix  src_AS  Affected_ASname  example_ASPath
> 49          2020-04-01 19:30:34  more_spec_by_other  91.195.240.0/23   47846    91.195.240.0/24   12389   SEDO-AS, DE      24751 20764 12389
> 12          2020-04-01 19:29:55  more_spec_by_other  62.122.168.0/21   50245    62.122.170.0/24   12389   SERVEREL-AS, NL  18356 38794 4651 4651 20764 12389
> 11          2020-04-01 19:30:34  more_spec_by_other  91.203.184.0/22   41064    91.203.187.0/24   12389   SKYROCK, FR      29430 13030 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.164.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.174.0/23  12389   SERVEREL-AS, NL  49515 197595 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.178.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.168.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.180.0/23  12389   SERVEREL-AS, NL  43317 20764 12389
> 5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.161.0/24  12389   SERVEREL-AS, NL  49515 197595 20764 12389
> 5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.170.0/24  12389   SERVEREL-AS, NL  49673 24811 20764 12389
> 5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.187.0/24  12389   SERVEREL-AS, NL  1126 24785 20562 20764 12389
> 5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.166.0/24  12389   SERVEREL-AS, NL  51514 20562 20764 12389
>
> If we look at the list of ASNs which were most impacted, the top ten
> seem mostly anchored in the US (thus under the ARIN TAL), and almost
> all of them seem to be heavyweights in the cloud / CDN space.
> https://portal.bgpmon.net/data/12389_apr2020_affected_asns.txt
>
> The incorrect routing information covering the above-listed prefixes
> was observed by a limited number of BGPMon peers; for other affected
> routes the peer_count was around 170. While the RPKI incident lasted a
> number of hours, the Rostelecom routing incident lasted ten minutes
> or so. (source:
> https://map.internetintel.oracle.com/leaks#/id/20764_12389_1585768500)
>
> If we assume the generation & propagation of these hijacks was the
> result of operator error, I imagine the change could've been reverted
> almost immediately, but we'd still see a bit of sloshing through the
> routing system for a few minutes. Or perhaps the 'waves' we can see in
> Oracle's 3D rendering of the incident are the effects of Maximum Prefix
> limits kicking in and various timers firing off at different times.
>
> Were these prefixes just unlucky because some BGP optimiser algorithm
> had chosen them for the purpose of traffic engineering? Was this the
> result of sophisticated planning? In any case, I can't judge the impact
> this routing incident had on the three above-listed ASNs. I don't know
> what the victim IPs are used for.
>
> We have to keep in mind that a large portion of RIPE NCC's RPKI
> repository, and of course the RPKI repositories of the other RIRs, were
> *not* affected. ISPs with 'invalid == reject' policies had a lot of RPKI
> data (~134,516 VRPs) available, and those VRPs did help limit the scope
> and reach of the hijacks. RPKI Invalid BGP announcements don't
> propagate as well as Not-Found announcements.
>
> It appears the 'peer_count' for RPKI-protected prefixes was
> significantly lower (~140) than for prefixes not covered by RPKI ROAs
> (~160). The 'peer_count' value can be considered a proxy metric for a
> hijack's reach and impact. The RPKI Invalids in this leak propagated
> through ASNs which we know have not yet deployed RPKI OV.
>
> To me, the above confirms what most of us already suspected:
> unavailability of RPKI services during routing incidents, or lack of
> deployment of Origin Validation, is inconvenient.
>
> RIPE NCC's service interruption appears to have affected 4,352 out of
> the total of 5,945,764 misrouted IPs, and the 'peer_count' for the
> illegitimate announcements was much lower (better) compared to other
> prefixes.
>
> This leads me to believe this was not a deliberate plan dependent on a
> process failure inside RIPE NCC; the incident's BGP data just doesn't
> seem to show that the incident maximally capitalised on the RPKI outage.
>
> Kind regards,
>
> Job
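As a quick cross-check of the figures in the quoted analysis, the address count of the 12 prefixes in the table, and their share of the 5,945,764 misrouted IPs, can be recomputed with Python's standard ipaddress module. This is a minimal sketch: the prefix list is copied from the announced_prefix column, and the names AFFECTED, TOTAL_MISROUTED and address_count are illustrative only.

```python
from ipaddress import collapse_addresses, ip_network

# announced_prefix column of the 12-row table in the quoted message
AFFECTED = [
    "91.195.240.0/24", "62.122.170.0/24", "91.203.187.0/24",
    "109.206.164.0/23", "109.206.174.0/23", "109.206.178.0/23",
    "109.206.168.0/23", "109.206.180.0/23", "109.206.161.0/24",
    "109.206.170.0/24", "109.206.187.0/24", "109.206.166.0/24",
]

# Total number of misrouted IPs, as stated in the quoted message.
TOTAL_MISROUTED = 5_945_764


def address_count(prefixes):
    """Count unique IPv4 addresses; collapsing first avoids double-counting overlaps."""
    nets = collapse_addresses(ip_network(p) for p in prefixes)
    return sum(net.num_addresses for net in nets)


count = address_count(AFFECTED)
share = count / TOTAL_MISROUTED
# prints: 12 prefixes, 4352 addresses, 0.0732% of all misrouted IPs
print(f"{len(AFFECTED)} prefixes, {count} addresses, {share:.4%} of all misrouted IPs")
```

Three /24s, five /23s and four more /24s give 768 + 2,560 + 1,024 = 4,352 addresses. None of the 12 prefixes overlap, so a plain sum would give the same result; collapse_addresses is only a guard against double counting.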
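The observation that Invalid announcements propagate less widely than Not-Found ones follows from how route origin validation classifies a route. Below is a minimal sketch of the RFC 6811 decision rule in Python. It is not how any particular validator or router implements it. The VRP used (109.206.160.0/19, maxLength 19, AS50245) is a hypothetical example modelled on the table's base_prefix/base_as columns, because the actual contents of the deleted ROAs are not given in the thread. The sketch shows how removing a covering VRP turns a more-specific announcement from AS12389 from Invalid (dropped under 'invalid == reject') into NotFound (accepted).

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Iterable, Union

Prefix = Union[IPv4Network, IPv6Network]


@dataclass(frozen=True)
class VRP:
    """Validated ROA Payload: (prefix, maxLength, origin ASN)."""
    prefix: Prefix
    max_length: int
    asn: int


def validate_origin(route: Prefix, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify a route as Valid, Invalid or NotFound (RFC 6811 semantics)."""
    covered = False
    for vrp in vrps:
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so skip other families.
        if vrp.prefix.version != route.version or not route.subnet_of(vrp.prefix):
            continue
        covered = True  # at least one VRP covers this route
        # AS0 VRPs never match; otherwise the origin ASN and prefix length must fit.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "Valid"
    return "Invalid" if covered else "NotFound"


def accepted(route: Prefix, origin_asn: int, vrps: Iterable[VRP]) -> bool:
    """An 'invalid == reject' policy: drop only routes classified Invalid."""
    return validate_origin(route, origin_asn, vrps) != "Invalid"


# Hypothetical VRP, modelled on the table's base_prefix/base_as columns.
vrps = [VRP(ip_network("109.206.160.0/19"), 19, 50245)]
hijack = ip_network("109.206.164.0/23")  # announced with origin AS12389

print(validate_origin(hijack, 12389, vrps))  # Invalid: covered, wrong origin
print(validate_origin(hijack, 12389, []))    # NotFound: covering VRP deleted
print(accepted(hijack, 12389, vrps), accepted(hijack, 12389, []))  # False True
```

Note that an 'invalid == reject' policy still accepts NotFound routes. For the prefixes in the table, that is why losing the covering VRPs made the more-specific announcements acceptable to networks that would otherwise have dropped them.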