Xymon Mailing List Archive

only alert if X number of hosts are already in error

Daniel J McDonald
Mon, 20 Jun 2005 08:14:59 -0500
Message-Id: <user-ed3d1076bda5@xymon.invalid>

On Fri, 2005-06-17 at 08:01 +0200, Henrik Stoerner wrote:
> Something like
>
>    HOST=%(www.*).foo.com TEST=http COLOR=red COUNT>=5
>       MAIL user-3aaf2ac8399f@xymon.invalid
>
> The "COUNT>=5" would then cause this rule to trigger only if there
> were 5 or more hosts named www.*.foo.com whose http tests are red.
> You could even combine this with other criteria, say have a threshold of
> 5 during the daytime and 10 during off-hours.
>
> I can foresee a problem in handling recovery notifications for this kind
> of alert, but that's something I'll have to think about.
>
> Would that be useful?
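Since COUNT>= is only a proposal here, the following is a minimal sketch of how such a rule might be evaluated; it is not how Xymon implements anything. It assumes the current statuses are available as a plain list of (host, test, color) records. The names Status, matching_count, threshold_for and CountRule are hypothetical, the host pattern is rewritten as the Python regex www.*\.foo\.com, and the 08:00-18:00 daytime window is an assumed value for the "5 during the daytime, 10 during off-hours" example. The rule also remembers whether it has fired, so it sends one recovery once the count falls back below the threshold.

```python
import re
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Status:
    """One current status report (hypothetical record, not a Xymon type)."""
    host: str
    test: str
    color: str


def matching_count(statuses, host_pattern: str, test: str, color: str) -> int:
    """Count reports whose host fully matches the regex and whose test and color match."""
    rx = re.compile(host_pattern)
    return sum(
        1
        for s in statuses
        if rx.fullmatch(s.host) and s.test == test and s.color == color
    )


def threshold_for(now: time, day_start: time = time(8, 0), day_end: time = time(18, 0),
                  day_limit: int = 5, night_limit: int = 10) -> int:
    """Return day_limit inside [day_start, day_end), else night_limit (assumed window)."""
    return day_limit if day_start <= now < day_end else night_limit


class CountRule:
    """Alert once when enough hosts match; send one recovery when the count drops below."""

    def __init__(self, host_pattern: str, test: str, color: str):
        self.host_pattern = host_pattern
        self.test = test
        self.color = color
        self.active = False  # True while an alert has been sent and not yet recovered

    def evaluate(self, statuses, now: time) -> Optional[str]:
        n = matching_count(statuses, self.host_pattern, self.test, self.color)
        limit = threshold_for(now)
        if not self.active and n >= limit:
            self.active = True
            return f"ALERT: {n} hosts {self.color} on {self.test} (threshold {limit})"
        if self.active and n < limit:
            self.active = False
            return f"RECOVERED: {n} hosts {self.color} on {self.test} (threshold {limit})"
        return None  # no state change, so nothing to send


# Example: six web servers with red http tests during the day.
statuses = [Status(f"www{i}.foo.com", "http", "red") for i in range(6)]
rule = CountRule(r"www.*\.foo\.com", "http", "red")
print(rule.evaluate(statuses, time(10, 0)))       # ALERT: 6 hosts red on http (threshold 5)
print(rule.evaluate(statuses, time(10, 5)))       # None
print(rule.evaluate(statuses[:3], time(10, 10)))  # RECOVERED: 3 hosts red on http (threshold 5)
```

With a time-dependent threshold, crossing into or out of off-hours can by itself raise or clear the alert (7 red hosts alert at 17:59 under a threshold of 5, then "recover" at 18:00 when the threshold becomes 10), which is one form of the recovery-notification problem raised in the message.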
The main place I would use it would be NTP alerts.  If one router loses
NTP, I'm not terribly worried.  If 10-20 of them all fail at once, then I
know something really bad is happening... Maybe both GPS clocks
lost sync and all 4 cesium backups failed, or NTP locked up on a core
router and I need to make fewer downstream nodes dependent on that one.

I would also consider using it for purple alerts.  I don't want
individual purple alerts for most of my stuff, but if there are a lot of
them (more than 100), then I know I killed MRTG and I should page on that.
-- 
Daniel J McDonald, CCIE # 2495, CNX
Austin Energy

user-290ce4e24e19@xymon.invalid