[atlas] Email or SMS alert when probe goes offline/online
Daniel Karrenberg
daniel.karrenberg at ripe.net
Wed Dec 14 20:44:21 CET 2011
We also need to be quite clear in our communication on what "probe down" means and that data keeps being collected.

Daniel

On 14.12.2011, at 18:36, Greg B - NANOG wrote:

> Robert,
>
> That's great, and I do hope the firmware update helps at least some of these situations.
>
> Looking at my last 25 connections list, I also see downtimes of 4 hours, 6 hours, and twice 1 hour over the last month. I'm pretty sure my internet connection wasn't actually down for these long periods, since I monitor it from another location (my office), which doesn't show these outages. So I do hope a feature is added in the near future that lets the probe host set the probe-down notification threshold in minutes instead of the default of 5 days.
>
> -Greg
>
> On Wed, Dec 14, 2011 at 9:05 AM, Robert Kisteleki <robert at ripe.net> wrote:
>
> Hi,
>
> On 2011.12.14. 5:37, Greg B - NANOG wrote:
> > Hi,
> > I see there was a thread started back on September 7, 2011 with the
> > subject "Email or SMS alert when probe goes offline/online"; this was
> > prior to my joining the mailing list.
> >
> > I'd like to voice my support for a user-configurable amount of time for the
> > Atlas system to send out an email notification that your probe is down (and
> > returned to service).
>
> Indeed, this is on our list -- but see also below.
>
> > My probe, which I run on my home internet connection, was apparently down
> > for 3.5 days before I just happened to log in to look at the stats.
> > Considering I was at home for much of these 3.5 days, and my Internet
> > connection was working, I assume the probe crashed, because simply
> > power-cycling it "fixed" the problem.
> >
> > I know that if I had received an email ~15 minutes after the probe went
> > down, my probe's downtime would probably have been closer to 30 minutes
> > rather than 3.5 days.
>
> A little background story:
>
> We have identified a particular condition on the probes where the probe
> refuses to connect back to our infrastructure after a disconnect (which can
> be caused by a network hiccup anywhere between the probe and our
> infrastructure, for example). This particular issue happens in low-memory
> situations. The probe still does measurements happily; it just cannot
> connect to us and send the results in.
>
> After a while, the storage on the probe fills up, so as a best effort the
> probe reboots -- which fixes the low-memory situation, and then everything is
> back to normal again. The punch line: with the current configuration, the
> probe's local storage fills up in about 3.5 days...
>
> We're rolling out a new firmware (4.280) to address this. So, unless there
> are other similar conditions, after upgrading you will not see 3.5-day
> downtimes. Fingers crossed :-)
>
> Regards,
> Robert
>
> > Thanks.
> >
> > -Greg
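For illustration, the user-configurable notification Greg asks for could work roughly like the minimal Python sketch below. It assumes the only signal available is the time the probe last connected to the infrastructure; the class and method names (ProbeMonitor, heartbeat, check) are hypothetical and not part of RIPE Atlas, and only the 5-day default is taken from the thread. In line with Daniel's point, the notice says the probe is disconnected rather than that data is lost, since the probe may keep measuring and buffering results locally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical model for illustration only; not the RIPE Atlas implementation.
DEFAULT_THRESHOLD = timedelta(days=5)  # default notification delay mentioned in the thread


@dataclass
class ProbeMonitor:
    threshold: timedelta = DEFAULT_THRESHOLD  # user-configurable, e.g. minutes
    last_seen: Optional[datetime] = None      # last time the probe was connected to us
    alerted_down: bool = False                # True once a "disconnected" notice went out
    notifications: List[str] = field(default_factory=list)

    def heartbeat(self, now: datetime) -> None:
        """Record that the probe is connected; send a 'back online' notice if we had alerted."""
        if self.alerted_down:
            self.notifications.append(f"{now.isoformat()} probe back online")
            self.alerted_down = False
        self.last_seen = now

    def check(self, now: datetime) -> None:
        """Send one notice once the probe has been disconnected longer than the threshold."""
        if self.last_seen is None or self.alerted_down:
            return
        gap = now - self.last_seen
        if gap > self.threshold:
            # "Disconnected" rather than "down": results may still be buffered on the probe.
            self.notifications.append(
                f"{now.isoformat()} probe disconnected for {gap} "
                "(measurements may still be buffered on the probe)"
            )
            self.alerted_down = True


# Usage: a 15-minute threshold, as Greg suggests.
start = datetime(2011, 12, 10, 12, 0)
monitor = ProbeMonitor(threshold=timedelta(minutes=15))
monitor.heartbeat(start)
monitor.check(start + timedelta(minutes=10))    # within threshold: no notice
monitor.check(start + timedelta(minutes=20))    # past threshold: "disconnected" notice
monitor.check(start + timedelta(minutes=30))    # already alerted: no duplicate
monitor.heartbeat(start + timedelta(minutes=40))  # reconnect: "back online" notice
for line in monitor.notifications:
    print(line)
```

Keeping an explicit alerted_down flag means the host gets exactly one "disconnected" and one "back online" message per outage, no matter how often check() is polled.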