Xymon Mailing List Archive

only alert if X number of hosts are already in error

Henrik Størner
Fri, 17 Jun 2005 08:01:36 +0200
Message-Id: <user-3c012b3b86ae@xymon.invalid>

On Thu, Jun 16, 2005 at 02:28:53PM -0700, Bruce Lysik wrote:
> > My best suggestion would be to use the bbcombotest tool to define
> > a pseudo "host" with the combined status of your host "pool".
> >
> > E.g. if you're monitoring http on 5 hosts, you could define a
> > combination test like this:
> >
> > Pool1.http=(hostA.http+hostB.http+hostC.http+hostD.http+hostE.http)>3
> >
> > That would give you a red alert if 3 or fewer hosts in the pool were
> > green. And you could then trigger an alert based on that test result.
> Pretty unwieldy when you have large pools of servers, however.
Could be, yes.
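
To make the arithmetic of the combination test above concrete, here is a
small Python sketch. It is not bbcombotest itself, only an illustration,
and it assumes that each host.test term counts as 1 when the status is
green and 0 otherwise (the real tool may treat other colors as OK too, so
check its documentation). The function and variable names are made up for
the example.

```python
def pool_status(statuses, more_than):
    """Return 'green' if more than `more_than` hosts are green, else 'red'.

    Mirrors the shape of (hostA.http + ... + hostE.http) > 3, under the
    assumption that a green status counts as 1 and anything else as 0.
    """
    ok_count = sum(1 for color in statuses.values() if color == "green")
    return "green" if ok_count > more_than else "red"


# Hypothetical pool: three of the five hosts are green.
pool = {
    "hostA": "green",
    "hostB": "green",
    "hostC": "red",
    "hostD": "green",
    "hostE": "red",
}

print(pool_status(pool, 3))
```

With only three green hosts the sum is not greater than 3, so the pool
goes red; a fourth green host would turn it green again.
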
> I just started writing a smart paging script which will keep track of
> downed hosts and decide whether or not to page.
I'm interested to know if this kind of alerting is generally useful.
I suspect it might be ... if so, then we should devise a way of defining
such alerts directly in Hobbit instead of forcing you to come up with
scripts that work around this.

Perhaps one solution could be to implement a new kind of rule for the
hobbit-alerts file. Currently all of the rules are matched against a
specific host+test combination; we could define a type of rule that
could be matched against all of the host+test statuses that are in an 
alerting stage, and then have the rule trigger based on some criteria
for how many matches we get.

Something like

   HOST=%(www.*).foo.com TEST=http COLOR=red COUNT>=5
      MAIL user-3aaf2ac8399f@xymon.invalid

The "COUNT>=5" would then cause this rule to trigger only if there
were 5 or more hosts named www.*.foo.com, whose http tests are red.
You could even combine this with other criteria, say have a threshold of
5 during the daytime, and 10 during off-hours.
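
The idea behind such a COUNT rule can be sketched in Python as follows.
This is only an illustration of the proposal, not existing Hobbit code:
the function names, the status table, the whole-hostname regex match and
the 08:00-18:00 definition of "daytime" are all assumptions made for the
example.

```python
import re


def matching_hosts(statuses, host_regex, test, color):
    """Return the hosts whose `test` status equals `color` and whose name
    matches `host_regex` (matched against the whole hostname, which is an
    assumption of this sketch)."""
    pattern = re.compile(host_regex)
    return sorted(
        host
        for (host, test_name), status in statuses.items()
        if test_name == test and status == color and pattern.fullmatch(host)
    )


def threshold_for_hour(hour, day_threshold=5, night_threshold=10):
    """Pick the count threshold; 'daytime' is assumed to be 08:00-18:00."""
    return day_threshold if 8 <= hour < 18 else night_threshold


def rule_triggers(statuses, host_regex, test, color, hour):
    """True if enough matching hosts are in the given color at this hour."""
    count = len(matching_hosts(statuses, host_regex, test, color))
    return count >= threshold_for_hour(hour)


# Hypothetical status table: five red web servers plus two non-matches.
statuses = {(f"www{i}.foo.com", "http"): "red" for i in range(1, 6)}
statuses[("db1.foo.com", "http")] = "red"
statuses[("www6.foo.com", "http")] = "green"

print(rule_triggers(statuses, r"(www.*).foo.com", "http", "red", hour=14))
print(rule_triggers(statuses, r"(www.*).foo.com", "http", "red", hour=2))
```

Five red www hosts reach the daytime threshold of 5 but not the off-hours
threshold of 10, so the same situation pages at 14:00 and stays quiet at
02:00.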

I can foresee a problem in handling recovery notifications for this kind
of alert, but that's something I'll have to think about.

Would that be useful?

> One question I have so far is: Does hobbit wait for an alerting script
> to return before continuing to evaluate other rules?
Paging scripts are serialized, yes - Hobbit will wait for a paging
script to complete before continuing down the list of alert rules.
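
As an illustration of what "serialized" means here, the following Python
sketch runs a list of commands one at a time and waits for each to exit
before starting the next. It is not how Hobbit is implemented; the
function name and the two stand-in "paging scripts" are invented for the
example.

```python
import subprocess
import sys


def run_serialized(commands):
    """Run each command in order, waiting for it to exit before starting
    the next one, and return the list of exit codes."""
    exit_codes = []
    for cmd in commands:
        completed = subprocess.run(cmd)  # blocks until the command exits
        exit_codes.append(completed.returncode)
    return exit_codes


# Two stand-in paging scripts: a slow one and one that exits with code 3.
scripts = [
    [sys.executable, "-c", "import time; time.sleep(0.2)"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
]

print(run_serialized(scripts))
```

The practical consequence is that a slow paging script delays every rule
that comes after it, so a script that tracks downed hosts should do its
bookkeeping and return quickly.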


Regards,
Henrik