This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/ripe-atlas@ripe.net/

[atlas] some thoughts and question regrding probe "stability"


Philip Homburg philip.homburg at ripe.net
Thu Jul 17 18:03:15 CEST 2014

Hi Wilfried,

> Let's compare the most recent dis/connection logs for my 3 pets:

Here is what I found in our logs:

> ID 6009
> 2014-07-14 03:58:03	3d 8h 16m	 Still Connected

Upgrade to firmware 4650
	
> 2014-05-27 03:03:54	48d 0h 46m	 2014-07-14 03:50:47	0h 7m

Hard to say, some network glitch

> 2014-05-20 15:19:02	6d 11h 37m	 2014-05-27 02:57:00	0h 6m

Anchor was rebooted

> 2014-05-14 21:16:56	5d 17h 59m	 2014-05-20 15:16:22	0h 2m

Network glitch

https://atlas.ripe.net/atlas/udm.html?1026358.increase_type=rel&1026358.current_shift=150&1026358.current_clip=250&1026358.group_by=cc&1026358.show_me_filter=max,pls&msm_id=1026358&1026358.start_timestamp=1400098401&1026358.end_timestamp=1400102942&1026358.selected_probes=6001,6002,6003,6019,6022,6031,6040,6052#tab-seismograph1026358

> 2014-04-08 16:03:21	36d 5h 1m	 2014-05-14 21:05:17	0h 11m

Anchor was rebooted

> ID 0466
> 2014-07-13 23:31:05	3d 12h 45m	 Still Connected	

Some network glitch, unclear what

> 2014-07-09 23:05:40	3d 23h 54m	 2014-07-13 22:59:49	0h 31m

Probe upgraded firmware, reason for disconnect got lost

> 2014-06-16 10:53:21	23d 11h 55m	 2014-07-09 22:49:04	0h 16m

Network problem

> 2014-05-25 09:03:06	22d 1h 38m	 2014-06-16 10:42:00	0h 11m

Some network problem.

> 2014-05-24 20:34:50	11h 54m	 2014-05-25 08:29:12	0h 33m

Unclear

> ID 0414
> 2014-07-07 23:41:23	9d 12h 35m	 Still Connected	

Some network problem

> 2014-07-02 03:58:45	5d 19h 31m	 2014-07-07 23:29:54	0h 11m

Power cycled?

> 2014-06-13 09:37:50	18d 18h 7m	 2014-07-02 03:45:08	0h 13m

Some network problem. High RTTs

> 2014-06-08 13:22:14	4d 20h 7m	 2014-06-13 09:29:38	0h 8m

Power cycled?

> 2014-05-21 08:29:23	18d 4h 45m	 2014-06-08 13:15:11	0h 7m

Same.

> Again, I fail to see some obvious correlation, what am I missing?
> 
> Does anyone else see a similar pattern?
> 
> How to start debugging, if there's anything that needs debugging?

A couple of points:
1) The connection between a probe (or anchor) and its controller doesn't
have to be perfectly stable. It has to be good enough that probes can
report results in a timely fashion and receive commands, but nothing
beyond that.
2) For a single probe to see a network failure (with measurements using
the default parameters), the failure has to last for at least 10 minutes.
That way a couple of measurements will have a chance to report on the
failure. In contrast, the connection between a probe and the controller
is already terminated if the network is down for one minute.
3) When a target is measured by many probes, it is likely that at
least some probes will pick up an event. But from one probe on its own,
it is hard to say anything about it.
4) Version 1 probes tend to reboot after losing the connection to the
controller, due to memory fragmentation issues. That is unfortunate, but
we can't really do anything about it. Version 3 probes and anchors just
report their results a little later.
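
To illustrate point 2, here is a rough Python sketch that sorts the
disconnect durations from the logs above into the two thresholds mentioned
(one minute for the controller connection, ten minutes for default
measurements). The function and constant names are made up for this
example and are not part of any Atlas tool. It also assumes that the
logged disconnect time roughly equals the length of the network failure,
which is not always true (a reboot or a slow reconnect adds to it).

```python
from datetime import timedelta

# Thresholds taken from point 2 above; the names are illustrative only.
CONTROLLER_TIMEOUT = timedelta(minutes=1)
MEASUREMENT_VISIBILITY = timedelta(minutes=10)


def parse_duration(text):
    """Parse a duration such as '3d 8h 16m' or '0h 7m' into a timedelta."""
    units = {"d": "days", "h": "hours", "m": "minutes"}
    parts = {}
    for token in text.split():
        value, unit = token[:-1], token[-1]
        parts[units[unit]] = int(value)
    return timedelta(**parts)


def classify_outage(duration):
    """Say which of the two thresholds an outage of this length crosses."""
    if duration >= MEASUREMENT_VISIBILITY:
        return "controller disconnect, likely visible in measurements"
    if duration >= CONTROLLER_TIMEOUT:
        return "controller disconnect only"
    return "probably unnoticed"


# A few disconnect durations copied from the logs quoted above.
for text in ["0h 7m", "0h 11m", "0h 2m", "0h 31m"]:
    print(text, "->", classify_outage(parse_duration(text)))
```

Under these assumptions, most of the short gaps in the logs would only
show up as a controller disconnect, without the probe's own measurements
necessarily recording a failure.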

Philip
