[routing-wg] RPKI Quarterly Planning
Ties de Kock
tdekock at ripe.net
Fri Jul 16 14:28:58 CEST 2021
Hi Job,

> On 13 Jul 2021, at 12:57, Job Snijders via routing-wg <routing-wg at ripe.net> wrote:
>
> Hi,
>
> On Mon, Jul 12, 2021 at 10:23:20AM +0200, Daniel Karrenberg wrote:
>> Natanlie pointed us to
>> https://www.ripe.net/manage-ips-and-asns/resource-management/rpki/rpki-planning-and-roadmap
>> a while ago. Among other things this says:
>>
>> “In preparation for the improved RPKI repository architecture, the
>> distributed nature of the RRDP repository is going to be implemented using
>> containers and krill-sync that pulls data from the centralised on-premise
>> repository. This greatly simplifies smooth transitioning between publication
>> servers without any downtime.
>>
>> NOTE: We are not referring to cloud technologies here, just to our internal
>> deployment technologies.”
>>
>> The silence here worries me.
>
> What silence?!
>
> Over the last few months there have been quite some mail threads in this
> working group about RPKI and RPKI outage incidents, and NCC staff have
> provided updates during the virtual RIPE meetings in the Routing WG
> slot.
>
> To me the roadmap seems to reflect the sentiment that reliability is the
> key objective at this moment in time.
>
>> I would like to see some feedback from this group whether this is what
>> you want to see happening. The RIPE Routing WG is the forum for giving
>> guidance to the RIPE NCC about RPKI. I know other channels exist too
>> and that is fine. I also know that individuals here seem to be happy
>> with what is happening. However private channels and conversations are
>> not the way RIPE does this. This group is where the RIPE NCC looks
>> for guidance and where that guidance gets properly archived and
>> responded to.
>
> To be honest I am not sure what the purpose of krill-sync is.
>
> In May 2021 [1] extensive testing was conducted with the help of the
> NLNOG RING to see if krill-sync could be used to power the RSYNC
> service, but it turned out there were multiple issues with krill-sync
> making it a suboptimal choice. I believe RIPE NCC ended up deploying a
> different solution to serve RSYNC - and my hope is that the
> recently-achieved stability is here to stay, because the current setup
> seems to work quite nicely.

We are evaluating krill-sync [1] as a tool to build rsync servers that
are independent of NFS and can use cached IO. The reason for this is
rsync fallback. We see ~139 RPs using the rsync repository (as well as
the majority of the NLNOG RING nodes) and >1600 RPs using the RRDP
repository [2]. When rsync fallback happens for many RPs, the current
infrastructure will likely not scale, even when each RP starts from the
last RRDP state. We are evaluating krill-sync because it allows us to
build an rsync repository from RRDP and is available as an open-source
project.

I recall that while evaluating the krill-sync-based environment we
found three issues:

  * Repository versions need to be available for two hours _after they
    last were the current version_ to give slow clients the chance to
    retrieve them [3].
  * The modification time of objects needs to be the same (between nodes
    and between copies for a serial) to prevent additional IOs for RPs.
  * There are very slow outliers reading repositories, but keeping
    versions available for two hours is long enough in practice.

Finding these issues was good: it ensured that they were accounted for
in our implementation that writes to NFS. After we reported the relevant
issues upstream, they were fixed in krill-sync. The use of NLNOG RING
helped verify the current NFS-based setup, which I agree is working
nicely.

Kind regards,
Ties

[1]: https://www.ripe.net/ripe/mail/archives/routing-wg/2021-June/004351.html
[2]: rsync: number of unique IPs reading from /repository yesterday in
     one hour. Hour-to-hour variance is minimal.
     RRDP: number of unique IPs retrieving notification.xml >24 times/day
     in early July.
[3]: Example: revision 0 gets published at 0h0m, revision 1 at 1h59m,
     revision 2 at 2h01m (and revision 0 is deleted). The files being
     read by clients that connected at 1h58m get deleted.
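
The retention rule from the first issue and the example in [3] can be illustrated with a minimal Python sketch. This is not krill-sync code or the RIPE NCC implementation; the function names and the "naive" rule (delete two hours after publication) are assumptions made only for illustration. It compares the naive rule with counting the two hours from the moment a revision stops being the current one.

```python
from datetime import timedelta

# Retention window taken from the message: two hours.
RETENTION = timedelta(hours=2)


def naive_deletion_times(published_at):
    """Hypothetical naive rule: delete each revision two hours after it
    was published, regardless of how long it stayed current."""
    return [t + RETENTION for t in published_at]


def safe_deletion_times(published_at):
    """Rule described in the message: keep each revision for two hours
    after it last was the current version, i.e. two hours after the
    next revision was published. The newest revision is still current,
    so it has no deletion time (None)."""
    deletions = []
    for index in range(len(published_at)):
        if index + 1 < len(published_at):
            superseded_at = published_at[index + 1]
            deletions.append(superseded_at + RETENTION)
        else:
            deletions.append(None)
    return deletions


# Publication times from footnote [3], as offsets from the first revision.
published = [
    timedelta(0),                    # revision 0
    timedelta(hours=1, minutes=59),  # revision 1
    timedelta(hours=2, minutes=1),   # revision 2
]

naive = naive_deletion_times(published)
safe = safe_deletion_times(published)

# A client that connects at 1h58m still sees revision 0 as current.
client_start = timedelta(hours=1, minutes=58)
time_left_naive = naive[0] - client_start
time_left_safe = safe[0] - client_start
```

Under the naive rule the client that connected at 1h58m has only two minutes before revision 0 becomes eligible for deletion; under the rule from the message it has two hours and one minute.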