Xymon Mailing List Archive

disable alert by test in bb-hosts

2 messages in this thread

Sue Bauer-Lee · Thu, 8 Jun 2006 12:04:21 -0400
The background for this request is as follows:
	1. each host has a standard list of services we check 
	2. alert rules group systems by functionality (regex) rather than
	   specifying a separate alert rule for each host or page.

	HOST="%(tucwbd.*|vsiwbd.*)"
        MAIL $UNIXOPS SERVICE=conn,disk,http,procs DURATION>5m REPEAT=10m TIME=W:0800:1800 RECOVERED
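
As an illustration of how a regex-based HOST rule like the one above groups systems, here is a minimal Python sketch. It assumes the leading "%" only marks the value as a regular expression, that matching is an unanchored search, and that Python's `re` module behaves like the alert config's regex engine for this simple pattern. The hostnames `vsiwbd07` and `mailgw01` are made up for the example.

```python
import re

# Pattern copied from the HOST= rule, with the leading "%" removed.
# Assumption: "%" only flags the value as a regular expression.
HOST_PATTERN = re.compile(r"(tucwbd.*|vsiwbd.*)")


def rule_applies(hostname: str) -> bool:
    """Return True if the rule's HOST pattern selects this hostname.

    Assumption: the pattern is searched for anywhere in the hostname
    (unanchored), which may differ from the real alert module.
    """
    return HOST_PATTERN.search(hostname) is not None


# "vsiwbd07" and "mailgw01" are hypothetical hostnames.
hosts = ["tucwbd153", "vsiwbd07", "mailgw01"]
selected = [h for h in hosts if rule_applies(h)]
print(selected)
```

One rule therefore covers every host whose name fits the pattern, which is why pulling a single host out of the group means rewriting the regex.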


Request:
	1. For hostA in bb-hosts:
		10.20.33.208     tucwbd153 # telnet ssh

	I want to add a series of http checks as a dependency chain
	such that if http://tucwbd153 fails, 
		1. don't alert on http://tucwbd153/web/hobbit
		2. if http://tucwbd153/web/hobbit fails, 
		   don't alert on http://hobbit:monitor@tucwbd153/web/hobbit

Is this set up as a depends statement?
Additionally, for certain periods of time I may want to set up a test for 
a host that I don't want to receive alerts for. I'd hate to have to break out 
those individual hostnames from regex above in my alerts file. Is there a
tag I can use with the test in bb-hosts to disable that alert?

	10.20.33.208     tucwbd153 # telnet ssh noalert=http://tucwbd153/web/hobbit
	(is line continuation recognized for the bb-hosts config?)

	I always want alerts for specified time frames for telnet, conn, ssh
	but the DEV systems for the web team can have lots of stops and
	starts of web service. 

	clear would be ok to display for downtime.

Is there a better way to manage alerts for regex group and then isolate 
specific tests?

		
Sue Bauer-Lee
Henrik Størner · Fri, 9 Jun 2006 00:20:38 +0200
On Thu, Jun 08, 2006 at 12:04:21PM -0400, Sue Bauer-Lee wrote:
> Request:
> 	1. For hostA in bb-hosts:
> 		10.20.33.208     tucwbd153 # telnet ssh
>
> 	I want to add a series of http checks as a dependency chain
> 	such that if http://tucwbd153 fails,
> 		1. don't alert on http://tucwbd153/web/hobbit
> 		2. if http://tucwbd153/web/hobbit fails,
> 		   don't alert on http://hobbit:monitor@tucwbd153/web/hobbit
>
> Is this set up as a depends statement?

This one is difficult with the current way that the network tests are
done. As you know, all of the "http" tests go into a single "http"
status. So to do what you want would mean splitting up the single
"http" status, or at least being able to look at each of the individual
tests inside the http status when it comes to deciding on what alerts
to generate.
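
To make the limitation concrete, here is a small Python sketch of several URL checks collapsing into one "http" status. It is an illustrative model, not the actual network tester: the colour ranking and the "worst colour wins" rule are assumptions made for the example.

```python
from typing import Dict

# Assumed severity ranking of status colours for this sketch.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def combined_status(url_results: Dict[str, str]) -> str:
    """Collapse per-URL results into one colour (the worst one wins)."""
    return max(url_results.values(), key=lambda colour: SEVERITY[colour])


# Case A: the base URL is down.
case_a = {
    "http://tucwbd153": "red",
    "http://tucwbd153/web/hobbit": "green",
}
# Case B: only the application URL is down.
case_b = {
    "http://tucwbd153": "green",
    "http://tucwbd153/web/hobbit": "red",
}

# Both cases produce the same single "http" status, so an alert rule
# that only sees (host, "http", colour) cannot tell them apart.
print(combined_status(case_a), combined_status(case_b))
```

A per-URL dependency chain needs the individual results, and those are no longer visible once the combined status has been built.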

Now, doing that could be a good thing for many reasons - being able to
split up alerts so they go to different people depending on which URL
is failing would be nice. I've had some thoughts about this, and I can
see that the network test tool does need re-designing. I'll keep this 
in mind when I work out how to re-do the network tests.

> Additionally, for certain periods of time I may want to set up a test for
> a host that I don't want to receive alerts for. I'd hate to have to break out
> those individual hostnames from regex above in my alerts file. Is there a
> tag I can use with the test in bb-hosts to disable that alert?
>
> 	10.20.33.208     tucwbd153 # telnet ssh noalert=http://tucwbd153/web/hobbit
> 	(is line continuation recognized for the bb-hosts config?)
>
> 	I always want alerts for specified time frames for telnet, conn, ssh
> 	but the DEV systems for the web team can have lots of stops and
> 	starts of web service.
>
> 	clear would be ok to display for downtime.
>
> Is there a better way to manage alerts for regex group and then isolate
> specific tests?

First - yes, line continuation is supported for bb-hosts. If the line
ends with a backslash, it will continue on the next line.
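
The continuation rule can be sketched in Python as below. This only models the behaviour described here (a trailing backslash joins the next line) and is not the real bb-hosts parser. How the parser treats whitespace around the join is an assumption, and the two-line entry is a made-up example.

```python
from typing import Iterable, List


def join_continuations(lines: Iterable[str]) -> List[str]:
    """Merge physical lines ending in a backslash with the line after them."""
    logical: List[str] = []
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.endswith("\\"):
            # Drop the backslash and keep collecting the logical line.
            pending += line[:-1]
            continue
        logical.append(pending + line)
        pending = ""
    if pending:
        # The input ended on a continuation; keep what was collected.
        logical.append(pending)
    return logical


# Hypothetical bb-hosts entry split over two physical lines.
sample = [
    "10.20.33.208     tucwbd153 # telnet ssh \\",
    "    http://tucwbd153/ http://tucwbd153/web/hobbit",
]
entries = join_continuations(sample)
print(entries[0].split())
```

With this model the two physical lines become a single host entry carrying all four tests.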

That was the good part :-) The answer to your question is: No, this is
not possible today.

For the client-side tests (procs, msgs, ports, disk) I've implemented a
way of associating the rules in hobbit-clients.cfg that cause a red status
with the rules in hobbit-alerts.cfg that control who gets alerted. The
same mechanism could be used for network tests, so you could associate
each URL you check with some identifier, which would then have an
IGNORE rule in the alert config.
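
The idea can be pictured with a short Python sketch. Everything in it is hypothetical: the `Failure` record, the identifier names `dev-web` and `core`, and the ignore set are stand-ins for illustration, not existing configuration syntax or an existing API.

```python
from typing import List, NamedTuple, Set


class Failure(NamedTuple):
    host: str
    test: str
    ident: str  # hypothetical identifier attached to the individual check


def failures_to_alert(failures: List[Failure], ignored: Set[str]) -> List[Failure]:
    """Keep only failures whose identifier is not covered by an ignore rule."""
    return [f for f in failures if f.ident not in ignored]


# Hypothetical failures on one host, tagged with made-up identifiers.
failures = [
    Failure("tucwbd153", "http", "dev-web"),
    Failure("tucwbd153", "ssh", "core"),
]

# Hypothetical ignore rule: never alert on checks tagged "dev-web".
to_alert = failures_to_alert(failures, {"dev-web"})
print([f.test for f in to_alert])
```

The host stays inside its regex group, and only the checks carrying the ignored identifier are kept out of the alerts.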

(If anyone is wondering how that is done for the client side tests -
well, I forgot to document it in the current beta-release, but if you
pick up the patch-set for the beta it includes an update to the 
man-pages that explain what to do).


So - you've brought up some very good ideas. Unfortunately it's going to
be a while before I can make them come true.


Regards,
Henrik