RIPE NCC: Test Box Alarms/Internals
Internals of the alarm system.
Who should read this?
This page contains details about the alarm setup and is primarily intended
for the test-box operators at the NCC.
Which programs run on the test-box?
Two processes are run by the cron daemon at regular intervals. Each
process consists of a perl-script to set up links and such, and a piece of
C-code that does the actual work. Output is written to the GENE-file of
the test-box. The perl-scripts are called lta.pl and sta.pl, the
corresponding C-programs lta and sta.
LTA.PL
lta.pl is run once a day at 5 minutes past midnight. It should be run
once a day; the 5-minute offset is just to avoid conflicts with other
things that might happen at the turn of the day.
Lta.pl calls lta. Lta first reads a 2-dimensional histogram from the
lta_history file, with the delay in ms-bins along one axis and the day the
data was taken along the other. Lta then reads all RCDP files on the
system, checks which ones have been produced since the last time the
program was run, and adds that data to the histogram. Bins with data older
than 30 days are discarded. The histogram is then written back to the
lta_history file.
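The update step can be pictured with a minimal Python sketch. This is not
the lta code itself, only an illustration under assumptions: the histogram
is held as a nested dictionary, days are plain integers, bins are 1 ms
wide, and a day is kept while its age is at most 30 days. The function name
update_history and the sample values are hypothetical.

```python
MAX_AGE_DAYS = 30  # from the text: bins with data older than 30 days are discarded


def update_history(history, new_samples, today):
    """Add new samples to the histogram and drop days that are too old.

    history     -- dict: day number -> {delay bin in ms -> count} (assumed layout)
    new_samples -- iterable of (day number, delay in ms) pairs
    today       -- current day number
    """
    for day, delay_ms in new_samples:
        bins = history.setdefault(day, {})
        bin_ms = int(delay_ms)  # assumption: 1 ms wide bins
        bins[bin_ms] = bins.get(bin_ms, 0) + 1
    # Keep only the days that are at most MAX_AGE_DAYS old.
    return {day: bins for day, bins in history.items()
            if today - day <= MAX_AGE_DAYS}


# Hypothetical data: day 0 is 31 days old on day 31 and is dropped.
history = {0: {12: 3}, 25: {12: 1, 40: 2}}
history = update_history(history, [(31, 12.7), (31, 12.2), (31, 55.0)], today=31)
```

Keeping the day as one axis is what makes discarding old data cheap: a
whole day is removed at once, without touching the other days.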
STA.PL
sta.pl is run every 15 minutes, at 7, 22, 37 and 52 minutes past the hour.
There is no fundamental reason for these times; it should just not be run
exactly on the hour or when lta.pl is run, to avoid conflicts with those
programs.
Sta.pl is used to set up sym-links and such. It then calls the C-code.
Sta reads the lta_history file and creates, for each connection, a
histogram of the delay distribution over the last 30 days, using only data
from the corresponding period of the day (0-6 hours, 6-12, etc). It
determines the median and percentiles. Sta then reads the 2 most recent
RCDP files and creates short-term histograms for each connection with the
data taken during the last 30 minutes. Again, percentiles are determined.
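As an illustration of how a median and percentiles can be read off a
binned delay distribution, here is a small Python sketch. It is not the sta
code: the dictionary layout, the function name percentile_from_histogram
and the choice of the 50% and 95% points are assumptions made for the
example, since the text does not say which percentiles sta uses.

```python
def percentile_from_histogram(bins, fraction):
    """Return the smallest delay bin at which the cumulative count
    reaches `fraction` of all entries (0 < fraction <= 1).

    bins -- dict: delay bin in ms -> count (assumed layout)
    """
    total = sum(bins.values())
    if total == 0:
        return None  # no data for this connection
    threshold = fraction * total
    running = 0
    for delay_ms in sorted(bins):
        running += bins[delay_ms]
        if running >= threshold:
            return delay_ms
    return max(bins)  # guard against rounding in the threshold


# Hypothetical long-term histogram for one connection (100 entries).
long_term = {10: 50, 11: 30, 12: 15, 40: 5}
median = percentile_from_histogram(long_term, 0.50)
p95 = percentile_from_histogram(long_term, 0.95)
```

The same function can be applied to the short-term histogram, so that the
two sets of percentiles are computed in exactly the same way.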
The program then reads the alarm status that was saved in the sta_history
file the last time the program was run. It calculates the alarm code for
the current status and, if this differs from the saved one, it prints a
message. Finally, the current alarm status is written back to the
sta_history file.
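The "only report when something changed" logic can be sketched in Python
as follows. This is a hedged illustration, not the actual sta code: the
integer alarm codes, the connection labels, the message wording and the
function name report_changes are all hypothetical, and the way the alarm
code itself is calculated is deliberately left out.

```python
def report_changes(previous, current):
    """Return one message per connection whose alarm code changed.

    previous -- dict: connection -> alarm code from the last run
    current  -- dict: connection -> alarm code from this run
    A connection missing from `previous` is treated as code 0 (assumption).
    """
    messages = []
    for connection in sorted(current):
        old = previous.get(connection, 0)
        new = current[connection]
        if new != old:
            messages.append(
                f"{connection}: alarm code changed from {old} to {new}")
    return messages


# Hypothetical status from the last run and from this run.
previous = {"tt01->tt02": 0, "tt02->tt01": 2}
current = {"tt01->tt02": 1, "tt02->tt01": 2}
messages = report_changes(previous, current)
```

Because only changes produce output, a connection that stays in the same
alarm state does not generate a new message every 15 minutes.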
Control is then passed back to sta.pl. Sta.pl takes the output of
sta and sends emails to ttXX@ripe.net, where ttXX@ripe.net is defined
in /etc/aliases as an alias for the host of ttXX. At the moment, this
file is created by hand from Fotis' notes.
Updating the mail aliases
In case a host asks for mails to be sent to another address, or not at all:
- Go to /ncc/test-traffic/SETUP/INSTALL/rdist_trees/proto_test_box/etc
- Edit the file aliases and look for a line like:
- ttXY: somebody@somewhere.domain
- Then:
- For a new box: add a line ttXX: somebody@somewhere.domain
- To change the addresses for an existing box: change the
appropriate line. Use commas to separate entries.
- To switch off mail to a host: replace the existing line
by: ttXX: /dev/null
- Save the file, go to /ncc/ttpro/config/
- Type: ./cfengine.conf -DSyncAliases
This is a new and fast, but not completely tested, procedure.
The old procedure was:
- Log in to tt01 and make yourself root.
- Edit the file /etc/aliases and look for a line like:
- ttXY: somebody@somewhere.domain
- Then:
- For a new box: add a line ttXX: somebody@somewhere.domain
- To change the addresses for an existing box: change the
appropriate line. Use commas to separate entries.
- To switch off mail to a host: replace the existing line by:
ttXX: /dev/null
(Yes, this can be automated. In fact, I already have a script for it, but
it needs a bit of work for the "special" cases.)
- Save the file.
- Run: ./newaliases
- Log out
- Copy the files /etc/aliases and /etc/aliases.db to the rdist trees
on office.
- Use update_testbox to distribute the file to all other boxes.
- Copy the file by hand to tt02.
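
The alias edit that both procedures describe (add a line, change a line, or
point a box at /dev/null) can be sketched in Python. This is not the script
mentioned above, only a minimal illustration: the function name set_alias
is hypothetical, and it handles only plain "ttXX: address" lines, not any
"special" cases.

```python
def set_alias(aliases_text, box, recipients):
    """Return aliases_text with the entry for `box` set to `recipients`.

    recipients -- list of addresses; an empty list switches mail off by
                  pointing the alias at /dev/null, as described in the text.
    """
    target = ", ".join(recipients) if recipients else "/dev/null"
    new_line = f"{box}: {target}"
    lines = aliases_text.splitlines()
    for i, line in enumerate(lines):
        # Match on the alias name before the first colon.
        if line.split(":", 1)[0].strip() == box:
            lines[i] = new_line  # existing box: replace the line
            break
    else:
        lines.append(new_line)  # new box: add a line
    return "\n".join(lines) + "\n"


# Placeholder addresses in the style used in this page.
text = "tt01: somebody@somewhere.domain\n"
switched_off = set_alias(text, "tt01", [])
added = set_alias(text, "tt02",
                  ["a@somewhere.domain", "b@somewhere.domain"])
```

The sketch only rewrites the text of the file. Distributing the result and
rebuilding the alias database still has to be done as in the steps above.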