This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/ripe-atlas@ripe.net/

[atlas] RIPE Atlas probe status issues


Daniel AJ Sokolov listclient at sokolov.eu.org
Thu Apr 26 07:35:21 CEST 2018

Thank you. Has the problem resurfaced?

My probe #1118 is showing offline again, although it isn't. I can see
the ongoing measurement results on my profile page.

BR
Daniel AJ

On 2018-04-23 at 06:37 AM, Chris Amin wrote:
> Dear RIPE Atlas users,
> 
> There were various issues relating to the recorded status of RIPE Atlas
> probes over the weekend. This was brought to our attention by internal
> monitoring and information provided by users on the mailing list.
> 
> Throughout this period most probes did actually remain connected to
> controllers, and measurement results were collected as normal. The side
> effects included:
> 
> * the number of probes reported as connected by the system was lower
> than it should have been
> * the status (connected/disconnected) of many probes was incorrect
> * new measurements took longer than usual to start
> * fewer probes than usual were available for new measurements, leading
> in some cases to “no suitable” probes messages when trying to schedule
> new measurements
> * various system tags were incorrectly applied, including many probes
> being marked as having USB problems when this was not the case
> * temporary discrepancies with crediting/debiting of RIPE Atlas credits
> for the connected time of probes
> 
> The issues were caused by a bug fix deployment on Friday at 9 AM UTC, in
> which a package was accidentally downgraded, causing a regression to an
> old bug in the task handling of the central system. This bug caused a
> backlog of messages to build, slowing down or stopping the registering
> of various status messages in the system. Problems built up gradually as
> the backlog increased, until the root cause was identified on Sunday
> morning. The issue was then fixed and the system stabilized completely
> by about 10 AM UTC. We have identified procedural and technical
> solutions that will stop this problem happening again, and are looking
> at ways to improve our monitoring of these kinds of issues.
> 
> We apologise for any inconvenience or confusion caused by this event and
> would like to thank all of you who took the time to notify us of what
> you were seeing.
> 
> Kind regards,
> Chris Amin
> RIPE NCC
>
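
The symptom described in this thread (a probe shown as offline while its
measurement results keep arriving) can be checked mechanically. The following
is only a minimal sketch, not RIPE NCC tooling: the function name
status_looks_stale and the one-hour threshold are hypothetical, and the input
shapes (a probe object with a status.name field, and a result timestamp in
Unix seconds) are assumptions about how the data might be supplied. It makes
no network calls; fetching the probe and its latest result is left to the
reader.

```python
from datetime import datetime, timedelta, timezone


def status_looks_stale(probe, last_result_ts, now=None,
                       max_age=timedelta(hours=1)):
    """Return True if the probe is reported as not connected even though
    a measurement result arrived within max_age.

    probe          -- dict with an assumed {"status": {"name": ...}} shape
    last_result_ts -- Unix timestamp (seconds) of the newest result
    now            -- timezone-aware datetime; defaults to current UTC time
    max_age        -- hypothetical threshold for a "recent" result
    """
    if now is None:
        now = datetime.now(timezone.utc)
    reported = probe.get("status", {}).get("name", "")
    last_result = datetime.fromtimestamp(last_result_ts, tz=timezone.utc)
    has_recent_results = (now - last_result) <= max_age
    return reported != "Connected" and has_recent_results


# Illustrative data only: a probe reported as disconnected whose newest
# result is ten minutes old.
example_now = datetime(2018, 4, 26, 5, 35, 21, tzinfo=timezone.utc)
example_probe = {"id": 1118, "status": {"name": "Disconnected"}}
example_ts = int((example_now - timedelta(minutes=10)).timestamp())
mismatch = status_looks_stale(example_probe, example_ts, now=example_now)
```

A True result suggests only that the recorded status and the result stream
disagree; it does not say which of the two is wrong.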
