The architecture of the CoralReef Internet Traffic monitoring software suite
Ken Keys, David Moore, Ryan Koga, Edouard Lagache, Michael Tesch, and K. Claffy
Cooperative Association for Internet Data Analysis - CAIDA
San Diego Supercomputer Center, University of California, San Diego
Passive data collection tools have traditionally been designed for specific tasks such as accounting (NeTraMet) or packet capture (tcpdump). The CoralReef suite (http://www.caida.org/tools/measurement/coralreef/) was designed to provide a uniform interface to passive data for a wide range of applications, from raw capture to real-time report generation. CoralReef provides a convenient set of passive data tools for a diverse audience, from network administrators to researchers.
The CoralReef architecture is based on a toolbox paradigm. The base programming interface is implemented in a C library named libcoral. So that users can write a single program that accesses many data sources, libcoral provides a consistent API for ATM, POS, and 10/100/1000 Ethernet capture cards from multiple vendors, as well as pcap interfaces. In addition, libcoral provides a consistent interface to many packet capture file formats, including all coral formats (NLANR formats included), pcap, DAG ATM, and DAG POS. An important design goal is to provide multiple ways to approach the same problem: the libcoral API can operate on ATM cells, blocks of cells, or network packets, one at a time or via callbacks, and the application developer can use whichever is most convenient. NeTraMet and Vern Paxson's Bro are two applications that have been adapted to use the Coral API. To facilitate rapid prototyping and development, another design goal is to provide the same interface in C/C++ and Perl. Because the Perl module CRL.pm directly calls the C routines, Perl scripts using CRL.pm perform well enough for many data analysis applications.
Additional libraries and modules perform more complex tasks. There are libraries and Perl modules for AS lookups from BGP routing tables, and others for determining geographic locations via NetGeo. CoralReef includes modules for storing and manipulating frequently collected data, including source and destination hosts, IP protocols, ports, and amounts of traffic in bytes, packets, and flows. These modules provide methods to automatically aggregate data into other table types and to efficiently select the entries that generate the most traffic. For example, it is possible to convert an AS table of byte/packet counts into a country table with a single method call.
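To make the table aggregation idea concrete, the following is a minimal Python sketch of collapsing an AS table of byte/packet counts into a country table and then selecting the heaviest entries. It is not CoralReef code and does not use the CoralReef API: the function names (aggregate_by_country, top_entries), the hard-coded AS_TO_COUNTRY mapping, and the sample counts are all hypothetical, chosen only for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical AS-number -> country mapping (assumption for this sketch);
# a real tool would obtain this from a lookup service, not a literal dict.
AS_TO_COUNTRY = {64496: "US", 64497: "US", 64498: "JP"}

def aggregate_by_country(as_table, as_to_country):
    """Collapse {asn: (bytes, packets)} into {country: (bytes, packets)}."""
    totals = defaultdict(lambda: [0, 0])
    for asn, (nbytes, npkts) in as_table.items():
        # ASes with no known country are grouped under "??".
        country = as_to_country.get(asn, "??")
        totals[country][0] += nbytes
        totals[country][1] += npkts
    return {country: tuple(counts) for country, counts in totals.items()}

def top_entries(table, n):
    """Return the n entries with the largest byte counts, largest first."""
    return heapq.nlargest(n, table.items(), key=lambda kv: kv[1][0])

# Made-up sample data: AS number -> (bytes, packets).
as_table = {64496: (1500, 3), 64497: (500, 1), 64498: (9000, 6), 64499: (40, 1)}
country_table = aggregate_by_country(as_table, AS_TO_COUNTRY)
heaviest = top_entries(country_table, 2)
```

Using heapq.nlargest avoids sorting the whole table when only a few top entries are wanted, which is one plausible way to keep "top talkers" selection cheap on large tables.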
Higher-level applications are written using these building blocks. For example, creating an IP or AS matrix takes just a small Perl program, while larger programs (such as the real-time report generator t2_report) are more complex arrangements of the same building blocks.
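As a rough illustration of how small such a program can be, the Python sketch below builds an IP matrix (bytes per source/destination pair) in two ways, once by reading packets one at a time and once via a callback. It is a sketch under stated assumptions, not the CoralReef interface: the Packet type, read_packets, for_each_packet, and the sample trace are hypothetical stand-ins for a capture source.

```python
from collections import Counter, namedtuple

# Hypothetical packet record; real capture data carries far more fields.
Packet = namedtuple("Packet", "src dst length")

def read_packets(source):
    """Stand-in for a capture source: yields packets one at a time."""
    yield from source

def for_each_packet(source, callback):
    """Callback-style access layered on the same packet source."""
    for pkt in read_packets(source):
        callback(pkt)

def ip_matrix_iter(source):
    """Build the matrix by pulling packets one at a time."""
    matrix = Counter()
    for pkt in read_packets(source):
        matrix[(pkt.src, pkt.dst)] += pkt.length
    return matrix

def ip_matrix_callback(source):
    """Build the same matrix by registering a per-packet callback."""
    matrix = Counter()

    def on_packet(pkt):
        matrix[(pkt.src, pkt.dst)] += pkt.length

    for_each_packet(source, on_packet)
    return matrix

# Made-up trace using documentation address ranges.
trace = [
    Packet("192.0.2.1", "198.51.100.7", 1500),
    Packet("192.0.2.1", "198.51.100.7", 40),
    Packet("203.0.113.9", "192.0.2.1", 576),
]
matrix = ip_matrix_iter(trace)
```

Both styles produce the same matrix; which one is more convenient depends on whether the application wants to drive the read loop itself or hand control to the library.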
In this paper, we will present the CoralReef design philosophy, overall architecture, and capabilities. CoralReef is a package of libraries, device drivers, classes, and applications written in, and for use with, several programming languages. By highlighting design and architectural decisions at all levels in CoralReef, we will show how CoralReef is a powerful, extensible, efficient, and convenient package for passive data collection and analysis.