[atlas] RIPE Atlas probe status issues
- Previous message (by thread): [atlas] RIPE Atlas probe status issues
- Next message (by thread): [atlas] paris traceroute output
Daniel AJ Sokolov
listclient at sokolov.eu.org
Thu Apr 26 07:35:21 CEST 2018
Thank you. Has the problem resurfaced?

My probe #1118 is showing offline again, although it isn't. I can see
the ongoing measurement results on my profile page.

BR
Daniel AJ

On 2018-04-23 at 06:37 AM, Chris Amin wrote:
> Dear RIPE Atlas users,
>
> There were various issues relating to the recorded status of RIPE Atlas
> probes over the weekend. This was brought to our attention by internal
> monitoring and information provided by users on the mailing list.
>
> Throughout this period most probes did actually remain connected to
> controllers, and measurement results were collected as normal. The side
> effects included:
>
> * the number of probes reported as connected by the system was lower
>   than it should have been
> * the status (connected/disconnected) of many probes was incorrect
> * new measurements took longer than usual to start
> * fewer probes than usual were available for new measurements, leading
>   in some cases to “no suitable” probes messages when trying to schedule
>   new measurements
> * various system tags were incorrectly applied, including many probes
>   being marked as having USB problems when this was not the case
> * temporary discrepancies with crediting/debiting of RIPE Atlas credits
>   for the connected time of probes
>
> The issues were caused by a bug fix deployment at Friday 9AM UTC where a
> package was accidentally downgraded causing a regression to an old bug
> in the task handling of the central system. This bug caused a backlog of
> messages to build, slowing down or stopping the registering of various
> status messages in the system. Problems built up gradually as the
> backlog increased, until the root cause was identified on Sunday
> morning. The issue was then fixed and the system stabilized completely
> by about 10AM UTC. We have identified procedural and technical solutions
> that will stop this problem happening again, and are looking at ways to
> improve our monitoring of these kinds of issues.
>
> We apologise for any inconvenience or confusion caused by this event and
> would like to thank all of you who took the time to notify us of what
> you were seeing.
>
> Kind regards,
> Chris Amin
> RIPE NCC