This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/ripe-atlas@ripe.net/

[atlas] some thoughts and question regrding probe "stability"


Philip Homburg philip.homburg at ripe.net
Fri Jul 18 14:23:43 CEST 2014

On 2014/07/18 12:12 , Wilfried Woeber wrote:
> Hi Philip + Team,
> 
> Philip Homburg wrote:
> 
> first of all thanks for investigating!

No problem. I was also curious myself why 'normal' probes would
disconnect. Most time is spent looking at the exceptions.

> [...]
>> More like, the controller 'pings' the probe every 20 seconds and after 3
>> missed responses the connection is terminated.
>>
>> And for the Atlas system as a whole, that works. But the goal of the
>> Atlas system is not to have a probe connected as long as possible.
> 
> That's fully understood.
> 
> I'm still having a couple of questions :-)
> 
> 1) if I do understand correctly, the decision to label a probe "disconnected"
>    is made by the associated collector, based on pings? (btw. - "real" pings
>    on ICMP or internal over the channel?)

Connected/disconnected is based on whether a probe has an ssh connection
to a controller. There is a keepalive mechanism within the ssh protocol
to see if the other end is still there. That ssh mechanism is used to
abort the connection. Nothing to do with real (ICMP) pings.
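
As a rough illustration of the keepalive behaviour described in this thread (a check every 20 seconds, connection dropped after 3 missed responses), here is a minimal Python sketch. It is not the Atlas controller code or the ssh implementation. The class and function names are made up for this example, and resetting the counter on every successful reply is an assumption.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveTracker:
    """Toy model of a keepalive counter (hypothetical, for illustration)."""
    interval_s: int = 20   # from the thread: one keepalive every 20 seconds
    max_missed: int = 3    # from the thread: 3 missed responses end the connection
    missed: int = 0
    connected: bool = True

    def on_tick(self, got_reply: bool) -> bool:
        """Record one keepalive round; return True while still connected."""
        if not self.connected:
            return False
        if got_reply:
            # Assumption: any reply resets the count of missed responses.
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.connected = False
        return self.connected


def seconds_until_disconnect(replies, tracker=None):
    """Return elapsed seconds at the tick where the connection is dropped,
    or None if it survives the whole sequence of replies."""
    tracker = tracker or KeepaliveTracker()
    for tick, got_reply in enumerate(replies, start=1):
        if not tracker.on_tick(got_reply):
            return tick * tracker.interval_s
    return None
```

With these numbers, a path that goes completely silent is noticed after roughly a minute, while isolated lost keepalives do not end the connection.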

> 2) if that's the case, is there an easy way to find out to which collector a
>    probe is "assigned"? (is this static or dynamic?)

I don't know why, but that information is not shown to normal users. Of
course, if you can capture traffic, you can easily find out :-)

The assignment is dynamic.

> 3) if a probe, in particular an anchor, gets updated with a new firmware, is
>    it possible that the ethernet IF does *not* go down? (Note, the 6009 is an
>    old, big, beta box! Is there a difference with the new soekris probes?)

On regular probes a firmware upgrade always involves a reboot. On
anchors the Atlas 'firmware' is an rpm. There is no reason to reboot the
box or bring its interface down to upgrade the Atlas rpm.

> Just to be very clear, I just want to understand how to interpret things,
> 'cause I already had an issue with one of my v1 probes, and in the end it
> turned out that the USB power feed was just borderline, problem gone after
> replacement.

Yes it is good to keep an eye on those things. We can only look at
probes statistically or in response to tickets, mail, etc.

> And as an ISP and backbone operator, seeing stuff as "down" or "disconnected",
> without a good explanation, starts to itch after a while :-)

I think the best page to look at is the 'Result from Built-in
Measurements'. If those graphs look fine, then there is no real reason
to worry. Unless the probe keeps connecting and disconnecting multiple
times a day or something like that.
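
If you keep your own log of disconnect times for a probe, a quick way to spot the "multiple times a day" pattern could look like the sketch below. The function name, the ISO timestamp input format, and the threshold of 3 per day are all assumptions made for this example, not anything defined by Atlas.

```python
from collections import Counter
from datetime import datetime


def flapping_days(disconnect_times, threshold=3):
    """Return the dates (YYYY-MM-DD) with at least `threshold` disconnects.

    disconnect_times: iterable of ISO 8601 timestamps, e.g. "2014-07-18T09:30:00".
    threshold: hypothetical cut-off for "multiple times a day".
    """
    per_day = Counter(
        datetime.fromisoformat(ts).date().isoformat() for ts in disconnect_times
    )
    return sorted(day for day, count in per_day.items() if count >= threshold)
```

A single disconnect on a given day would not be flagged by this. Only days that reach the threshold are returned, which matches the idea that an occasional disconnect is nothing to worry about.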
