I think there are two spurious alerts:
- for each 'foo.cpu' alert I got paged twice, with the same ACK code.
- I shouldn't have been paged for 'foo.procs' between 11h05 and 12h05,
nor after 13h00.
Could you try running "bbcmd hobbitd_alert --test foo cpu" ?
Of course :
$ $BBHOME/bin/bbcmd hobbitd_alert --test foo cpu
2005-02-15 14:59:22 Using default environment file ../etc/hobbitserver.cfg
Matching host:service:page 'foo:cpu:' against rule line 115:Matched
*** Match with 'HOST=foo TIME=W:0900:1800' ***
Matching host:service:page 'foo:cpu:' against rule line 116:Matched
*** Match with 'SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs' ***
Script alert with command '/tmp/alerte.sh' and recipient SERVICE=*
Matching host:service:page 'foo:cpu:' against rule line 117:Failed (min. duration)
Matching host:service:page 'foo:cpu:' against rule line 118:Failed (color)
Matching host:service:page 'foo:cpu:' against rule line 119:Failed (time criteria)
Here are lines 115 to 119 of my $BBHOME/etc/hobbit-alerts.cfg :
115 HOST=foo TIME=W:0900:1800
116 SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs
117 SCRIPT /tmp/alerte.sh SERVICE=disk DURATION>5m REPEAT=2h
118 SCRIPT /tmp/alerte.sh SERVICE=mem COLOR=yellow REPEAT=24h
119 SCRIPT /tmp/alerte.sh SERVICE=procs TIME=*:1145:1150,*:1205:1300 REPEAT=24h
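
To check by hand which TIME= windows a given moment falls into, here is a
rough Python sketch. It is not Hobbit's own code. It assumes each
comma-separated entry has the form days:HHMM:HHMM, that "*" means every
day, "W" means Monday to Friday, digits 0-6 mean Sunday to Saturday, and
that both ends of the window are inclusive. The function names are made
up for this illustration.

```python
from datetime import datetime


def _day_matches(days: str, when: datetime) -> bool:
    """Assumed day codes: '*' = every day, 'W' = Mon-Fri, digits 0-6 = Sun-Sat."""
    if days == "*":
        return True
    # Python counts Monday as 0; convert so that Sunday is 0.
    dow = (when.weekday() + 1) % 7
    if days == "W":
        return 1 <= dow <= 5
    return str(dow) in days


def time_matches(spec: str, when: datetime) -> bool:
    """Return True if 'when' falls inside any window of a TIME= value.

    'spec' is the part after 'TIME=', e.g. '*:1145:1150,*:1205:1300'.
    Window bounds are treated as inclusive (an assumption).
    """
    hhmm = when.hour * 100 + when.minute
    for part in spec.split(","):
        days, start, end = part.strip().split(":")
        if _day_matches(days, when) and int(start) <= hhmm <= int(end):
            return True
    return False


if __name__ == "__main__":
    # Timestamp taken from the --test run above (a Tuesday).
    now = datetime(2005, 2, 15, 14, 59)
    print(time_matches("W:0900:1800", now))
    print(time_matches("*:1145:1150,*:1205:1300", now))
```

Under these assumptions, the timestamp of the test run falls inside the
window of line 115 and outside both windows of line 119, which agrees
with the "Failed (time criteria)" result reported for line 119.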
Also, if you add the "--cfid" option to the hobbitd_alert
command line in hobbitlaunch.cfg, each alert will include the
line number of the hobbit-alerts.cfg rule that triggered it.
That should make it easier to track down which rules trigger an alert.
Done.
I just noticed this won't work for SCRIPT recipients, because the line
number is put in the message subject, which scripts ignore. So drop that.
Undone ;-)
Regards,
--
Frédéric Mangeant