Xymon Mailing List Archive

Client interval question

Scott Walters
Tue, 13 Dec 2005 15:08:03 -0500
Message-Id: <user-68f464cf21f6@xymon.invalid>

We run pretty much all of our big brother tests every minute.  On
our new hobbit servers, we're running them at the default intervals.

BB shows us that our primary name server is going out for less than
a minute, about every 62 minutes.  Hobbit is missing most of those
outages, although the longer "xxxx events received in the last xxx
minutes" is what helped us spot the problem, as a whole bunch of
machines' services don't respond well when our primary name server
is out, and having a mass of servers go yellow then green, in
unison, is sort of eye catching.

So Hobbit, with the "xxx events" display (running every 5m), did provide enough information to indicate an intermittent problem with DNS?

Tests running every 5m will collide with a problem that lasts a minute frequently enough to 'show up on the radar'.
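
As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes the outage lasts a full 60 seconds, recurs exactly every 62 minutes, and that each poll is an instantaneous check; the function name and parameters are hypothetical and not part of Hobbit or Big Brother.

```python
def outages_caught(poll_interval_s, outage_period_s=62 * 60, outage_len_s=60,
                   horizon_s=24 * 3600, poll_offset_s=0):
    """Count outages hit by at least one poll within the horizon.

    Assumptions (illustrative only): outage k covers the half-open window
    [k * outage_period_s, k * outage_period_s + outage_len_s), and polls
    happen at poll_offset_s + n * poll_interval_s for n = 0, 1, 2, ...
    Returns a (caught, total) tuple.
    """
    caught = 0
    total = 0
    start = 0
    while start + outage_len_s <= horizon_s:
        total += 1
        # Index of the first poll at or after the outage start
        # (ceiling division, clamped so we never go before the first poll).
        n = max(0, -(-(start - poll_offset_s) // poll_interval_s))
        first_poll = poll_offset_s + n * poll_interval_s
        # The outage is seen only if that poll lands before the outage ends.
        if first_poll < start + outage_len_s:
            caught += 1
        start += outage_period_s
    return caught, total


if __name__ == "__main__":
    print(outages_caught(300))  # 5-minute polling -> (5, 24)
    print(outages_caught(60))   # 1-minute polling -> (24, 24)
```

Under these assumptions, 5-minute polling sees about one outage in five over a day, so roughly one hit every five hours, while 1-minute polling sees all of them. A shorter outage or a different poll offset would change the exact counts.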

But every site has different requirements.  It's just been my experience that sampling more frequently than every 5m hits the knee of the diminishing-returns curve.  It also increases the potential for state changes, which fill up the filesystem with history info.

ymmv

scott