[atlas] Processing RIPE Atlas data as Big Data
- Previous message (by thread): [atlas] Processing RIPE Atlas data as Big Data
- Next message (by thread): [atlas] The three generations of the Atlas probes : technical details
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Stephen D. Strowes
sds at ripe.net
Tue Jul 24 12:05:33 CEST 2018
Hi,

I assume you're referring to the daily dumps that we release here:

https://data-store.ripe.net/datasets/atlas-daily-dumps/

There are a couple of things that I find relatively slow to deal with on the command line: standard bzip2 tooling, and jq for JSON parsing. So I lean on a couple of other tools to speed things up:

- the lbzip2 suite parallelises parts of the compress/decompress pipeline
- GNU parallel can split data in a pipe onto one process per core

So, for example, on my laptop I can reasonably quickly pull out all of the traceroutes my own probe ran:

    lbzcat traceroute-2018-07-23T0700.bz2 | parallel -q --pipe jq '. | select(.prb_id == 14277)'

Stéphane has also written about using jq to parse Atlas results on labs.ripe.net:

https://labs.ripe.net/Members/stephane_bortzmeyer/processing-ripe-atlas-results-with-jq

Happy to hear from others what tools they use for data processing!

Cheers,
S.

On 21/07/2018 19:09, BELLAFKIH hayat wrote:
> Dear RIPE Atlas users,
>
> I am studying the processing of the data collected by the probes as a
> Big Data problem. For instance, one hour of traceroute data amounts to
> 500 MB (bzip2), so 7 GB of data in text format. Can you share with me
> how you deal with these data in practice?
> Are you using a super machine, Big Data tools?
>
> Best regards,
> Hayat
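
Where lbzip2, GNU parallel, or jq are not available, the same per-probe filter can be approximated with Python's standard library alone. The sketch below is illustrative, not part of the original message: it assumes the dump holds one JSON object per line (which the line-based splitting of `parallel --pipe` also relies on), and the function name `filter_by_probe` is hypothetical. It runs in a single process, so it will likely be slower than the pipeline above on a full hourly dump.

```python
import bz2
import json
from typing import Iterator


def filter_by_probe(path: str, probe_id: int) -> Iterator[dict]:
    """Yield each result in a bzip2-compressed dump whose prb_id matches.

    Assumption: the file contains one JSON object per line.
    """
    # bz2.open streams and decompresses incrementally, so the multi-gigabyte
    # uncompressed text is never held in memory all at once.
    with bz2.open(path, mode="rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            result = json.loads(line)
            # Same test as the jq expression: select(.prb_id == 14277)
            if result.get("prb_id") == probe_id:
                yield result
```

Iterating over `filter_by_probe("traceroute-2018-07-23T0700.bz2", 14277)` and printing each result with `json.dumps` would give roughly the same records as the jq command, one per line.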