[atlas] Email or SMS alert when probe goes offline/online
- Previous message (by thread): [atlas] Email or SMS alert when probe goes offline/online
- Next message (by thread): [atlas] Email or SMS alert when probe goes offline/online
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Greg B - NANOG
gwbnanog at gmail.com
Wed Dec 14 18:36:29 CET 2011
Robert,

That's great and I do hope the firmware update helps at least some of these situations. Looking at my last 25 connections list I also see downtimes of 4 hours, 6 hours, and two times for 1 hour over the last month. I'm pretty sure my internet connection wasn't actually down for these long periods since I have monitoring of it from another location (my office) which doesn't show these outages.

So I do hope a feature is added in the near future to allow the probe host to set a threshold for when to notify of probe down in minutes instead of the default of 5 days.

-Greg

On Wed, Dec 14, 2011 at 9:05 AM, Robert Kisteleki <robert at ripe.net> wrote:

> Hi,
>
> On 2011.12.14. 5:37, Greg B - NANOG wrote:
> > Hi,
> > I see there was a thread started back on September 7, 2011 with
> > subject: Email or SMS alert when probe goes offline/online
> > this was prior to me joining the mailing list.
> >
> > I'd like to voice my support for a user-configurable amount of time for the
> > Atlas system to send out an email notification that your probe is down (and
> > returned to service).
>
> Indeed, this is on our list -- but see also below.
>
> > My probe which I run on my home internet connection was apparently down for
> > 3.5 days before I just happened to log in to look at the stats. Considering I
> > was at home for much of these 3.5 days, and my Internet connection was
> > working, I assume the probe crashed because simply power-cycling it "fixed"
> > the problem.
> >
> > I know that if I got an email ~15 minutes after the probe was down, my
> > probe's downtime would probably have been closer to about 30 minutes rather
> > than 3.5 days.
>
> A little background story:
>
> We have identified a particular condition on the probes where the probe
> refuses to connect back to our infrastructure after a disconnect (which can
> be caused by a network hiccup, anywhere between the probe and our
> infrastructure, for example). This particular issue happens in low memory
> situations. The probe still does measurements happily, it just cannot
> connect to us and send the results in.
>
> After a while, the storage on the probe fills up, so as a best effort the
> probe reboots -- which fixes the low memory situation and then everything is
> back to normal again. The punch line: the probe's local storage, as with the
> current configuration, fills up in about 3.5 days...
>
> We're rolling out a new firmware (4.280) to address this. So, unless there
> are other similar conditions, after upgrading you will not see 3.5 day
> downtimes. Fingers crossed :-)
>
> Regards,
> Robert
>
> > Thanks.
> >
> > -Greg
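
The notification threshold requested in this message can be illustrated with a small sketch. This is not RIPE Atlas code and does not use any Atlas API; the function name `next_notification`, its parameters, and the 15-minute value are hypothetical, chosen only to mirror the "~15 minutes" figure from the quoted message. It assumes the caller already knows when the probe was last seen and whether a "down" notice has been sent.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_notification(last_seen: datetime, now: datetime,
                      threshold: timedelta, alerted: bool) -> Optional[str]:
    """Return "down", "up", or None for a single probe (hypothetical helper).

    last_seen: time the probe last connected (assumed known to the caller)
    now:       current time
    threshold: host-configured silence period before a "down" notice
    alerted:   True if a "down" notice is currently outstanding
    """
    silent_for = now - last_seen
    if silent_for >= threshold and not alerted:
        return "down"   # probe has been silent too long; notify once
    if silent_for < threshold and alerted:
        return "up"     # probe reconnected after a "down" notice
    return None         # nothing new to report


# Hypothetical 15-minute threshold instead of a multi-day default.
threshold = timedelta(minutes=15)
last_seen = datetime(2011, 12, 14, 9, 0)
print(next_notification(last_seen, datetime(2011, 12, 14, 9, 20), threshold, False))
```

Tracking the `alerted` flag keeps the host from receiving a new message on every check while the probe stays offline, and allows a single "returned to service" notice once it reconnects.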