Can't see my alert in the "info" column
list Frédéric Mangeant
Hi all
I'm playing with alerts and Hobbit 4.0-rc2, and I must say the ease of use
is fantastic !
The only problem is that I can't see my alerts in the "info" column.
My $BBHOME/etc/hobbit-alerts.cfg contains this :
HOST=foo TIME=W:0900:1800
SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs
SCRIPT /tmp/alerte.sh SERVICE=disk DURATION>5m REPEAT=2h
SCRIPT /tmp/alerte.sh SERVICE=mem COLOR=yellow REPEAT=24h
SCRIPT /tmp/alerte.sh SERVICE=procs TIME=*:1145:1150 REPEAT=24h
Alerts work fine, but I have this in the "info" column :
"No e-mail/SMS alerting defined"
Any hint ?
Thanks in advance.
Regards,
--
Frédéric Mangeant
list Henrik Størner
On Tue, Feb 15, 2005 at 11:57:03AM +0100, Frédéric Mangeant wrote:
Hi all I'm playing with alerts and Hobbit 4.0-rc2, and I must say the ease of use is fantastic !
Thanks :-)
The only problem is that I can't see my alerts in the "info" column.
Known mis-feature. The "info" generator cannot handle the Hobbit alert configuration right now.
Regards,
Henrik
list Frédéric Mangeant
The only problem is that I can't see my alerts in the "info" column.
Known mis-feature. The "info" generator cannot handle the Hobbit alert configuration right now.
Thanks for your answer. I've changed the subject of this mail because I think some alerts don't work as expected. With this $BBHOME/etc/hobbit-alerts.cfg :
HOST=foo TIME=W:0900:1800
SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs
SCRIPT /tmp/alerte.sh SERVICE=disk DURATION>5m REPEAT=2h
SCRIPT /tmp/alerte.sh SERVICE=mem COLOR=yellow REPEAT=24h
SCRIPT /tmp/alerte.sh SERVICE=procs TIME=*:1145:1150,*:1205:1300 REPEAT=24h
I received these alerts :
15/02/2005 11:38:33 foo.mem = yellow (ACK :125406)
15/02/2005 11:39:33 foo.disk = red (ACK :419182)
15/02/2005 11:45:13 foo.procs = red (ACK :992240)
15/02/2005 12:03:46 foo.procs = red (ACK :143469)
15/02/2005 13:39:50 foo.disk = red (ACK :78043)
15/02/2005 14:03:50 foo.procs = red (ACK :408423)
15/02/2005 14:09:50 foo.cpu = yellow (ACK :844373)
15/02/2005 14:09:50 foo.cpu = yellow (ACK :844373)
15/02/2005 14:11:50 foo.cpu = yellow (ACK :589672)
15/02/2005 14:11:50 foo.cpu = yellow (ACK :589672)
I think there are 2 false errors :
- for each 'foo.cpu' alert I got paged twice, with the same ACK code.
- I shouldn't have been paged between 11h05 and 12h05, nor after 13h00, for
'foo.procs'
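The second point can be checked by hand against the rule `TIME=*:1145:1150,*:1205:1300`. The following is only a sketch: `in_window` and `allowed_procs` are hypothetical helper names, and treating the window boundaries as inclusive is an assumption, not something taken from Hobbit. It tests the three 'foo.procs' alert times from the list above (11:45, 12:03 and 14:03).

```shell
#!/bin/sh
# in_window HHMM START END: succeed when START <= HHMM <= END.
# Hypothetical helper; inclusive boundaries are an assumption.
in_window() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# allowed_procs HHMM: succeed when HHMM falls inside either window of
# TIME=*:1145:1150,*:1205:1300 (the day-of-week part is ignored here).
allowed_procs() {
    in_window "$1" 1145 1150 || in_window "$1" 1205 1300
}

# Times of the three foo.procs alerts that were received.
for t in 1145 1203 1403; do
    if allowed_procs "$t"; then
        echo "$t inside window"
    else
        echo "$t outside window"
    fi
done
```

Under these assumptions only the 11:45 alert falls inside a configured window.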
Any clue ?
Thanks...
--
Frédéric Mangeant
list Henrik Størner
In <user-648014b9477f@xymon.invalid> Frédéric Mangeant <user-b6ea1d850181@xymon.invalid> writes:
some alerts don't work as expected.
[snip config and summary of sent alerts]
I think there are 2 false errors : - for each 'foo.cpu' alert I got paged twice, with the same ACK code. - I shouldn't have been paged between 11h05 and 12h05, nor after 13h00, for 'foo.procs'
Could you try running "bbcmd hobbitd_alert --test foo cpu" ? Also, if you add the option "--cfid" to the hobbitd_alert commandline in hobbitlaunch.cfg, it will include the linenumber of the hobbit-alerts.cfg file with each alert. That should make it easier to track down what rules trigger an alert.
Regards,
Henrik
list Henrik Størner
In <cusuvl$s7c$user-e356fad9864f@xymon.invalid> Henrik Storner <user-ce4a2c883f75@xymon.invalid> writes:
Also, if you add the option "--cfid" to the hobbitd_alert commandline in hobbitlaunch.cfg, it will include the linenumber of the hobbit-alerts.cfg file with each alert. That should make it easier to track down what rules trigger an alert.
I just noticed this won't work for SCRIPT recipients, because it's put in the message subject which scripts ignore. So drop that.
Henrik
list Frédéric Mangeant
I think there are 2 false errors : - for each 'foo.cpu' alert I got paged twice, with the same ACK code. - I shouldn't have been paged between 11h05 and 12h05, nor after 13h00, for 'foo.procs'
Could you try running "bbcmd hobbitd_alert --test foo cpu" ?
Of course :
$ $BBHOME/bin/bbcmd hobbitd_alert --test foo cpu
2005-02-15 14:59:22 Using default environment file ../etc/hobbitserver.cfg
Matching host:service:page 'foo:cpu:' against rule line 115:Matched
*** Match with 'HOST=foo TIME=W:0900:1800' ***
Matching host:service:page 'foo:cpu:' against rule line 116:Matched
*** Match with 'SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs' ***
Script alert with command '/tmp/alerte.sh' and recipient SERVICE=*
Matching host:service:page 'foo:cpu:' against rule line 117:Failed (min. duration)
Matching host:service:page 'foo:cpu:' against rule line 118:Failed (color)
Matching host:service:page 'foo:cpu:' against rule line 119:Failed (time criteria)
Here are lines 115 to 119 of my $BBHOME/etc/hobbit-alerts.cfg :
115 HOST=foo TIME=W:0900:1800
116 SCRIPT /tmp/alerte.sh SERVICE=* EXSERVICE=disk,mem,procs
117 SCRIPT /tmp/alerte.sh SERVICE=disk DURATION>5m REPEAT=2h
118 SCRIPT /tmp/alerte.sh SERVICE=mem COLOR=yellow REPEAT=24h
119 SCRIPT /tmp/alerte.sh SERVICE=procs TIME=*:1145:1150,*:1205:1300 REPEAT=24h
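When the configuration grows, the `--test` output can be reduced to just the rule lines that matched. This is a minimal sketch: `matched_rules` is a hypothetical helper name, and it assumes each "Matching ... against rule line N:Matched" message arrives on a single line, as in the run above.

```shell
#!/bin/sh
# matched_rules: read "hobbitd_alert --test" output on stdin and print the
# hobbit-alerts.cfg line numbers that were reported as matched.
# Hypothetical helper; assumes one "Matching ..." message per line.
matched_rules() {
    sed -n 's/^Matching .* against rule line \([0-9][0-9]*\):Matched.*$/\1/p'
}

# Sample input copied from the test run above.
matched_rules <<'EOF'
Matching host:service:page 'foo:cpu:' against rule line 115:Matched
Matching host:service:page 'foo:cpu:' against rule line 116:Matched
Matching host:service:page 'foo:cpu:' against rule line 117:Failed (min. duration)
Matching host:service:page 'foo:cpu:' against rule line 118:Failed (color)
Matching host:service:page 'foo:cpu:' against rule line 119:Failed (time criteria)
EOF
```

For a live check the same function could be fed from the command itself, for example `$BBHOME/bin/bbcmd hobbitd_alert --test foo cpu 2>&1 | matched_rules`; the `2>&1` is there only because it is not certain which stream the test output is written to.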
Also, if you add the option "--cfid" to the hobbitd_alert commandline in hobbitlaunch.cfg, it will include the linenumber of the hobbit-alerts.cfg file with each alert. That should make it easier to track down what rules trigger an alert.
Done.
I just noticed this won't work for SCRIPT recipients, because it's put in the message subject which scripts ignore. So drop that.
Undone ;-)
Regards,
--
Frédéric Mangeant
list Henrik Størner
On Tue, Feb 15, 2005 at 02:36:26PM +0100, Frédéric Mangeant wrote:
I think there are 2 false errors : - for each 'foo.cpu' alert I got paged twice, with the same ACK code. - I shouldn't have been paged between 11h05 and 12h05, nor after 13h00, for 'foo.procs'
I've tried, but I cannot make this happen on my own setup. Could you send me the script you use for alerting, and the ~hobbit/data/ack/notifications.log file ?
Regards,
Henrik
list Frédéric Mangeant
Hi Henrik
I've tried, but I cannot make this happen on my own setup. Could you send me the script you use for alerting, and the ~hobbit/data/ack/notifications.log file ?
Well, I moved to another server, on which I cleanly installed Hobbit 4.0-rc2
+ patches, and can't seem to reproduce the problem.
Anyway, here's my tiny paging script :
$ cat /tmp/alert.sh
#!/bin/sh
DATE=`date +%d/%m/%Y%t%H:%M:%S`
echo "$DATE $BBHOSTNAME.$BBSVCNAME = $BBCOLORLEVEL (ack : $ACKCODE, recovered : $RECOVERED)" >> /tmp/alert.txt
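To check the script's output format without waiting for a real alert, the same line can be built by a function and called by hand. This is only a sketch: `format_alert` is a hypothetical helper name, the values below are made up, and the timestamp is passed as an argument so the result is reproducible. The variables BBHOSTNAME, BBSVCNAME, BBCOLORLEVEL, ACKCODE and RECOVERED are the ones the script above reads.

```shell
#!/bin/sh
# format_alert TIMESTAMP: print one log line in the same layout as the
# paging script above. Hypothetical helper; it reads the alert details
# from the same variables the script uses.
format_alert() {
    printf '%s %s.%s = %s (ack : %s, recovered : %s)\n' \
        "$1" "$BBHOSTNAME" "$BBSVCNAME" "$BBCOLORLEVEL" "$ACKCODE" "$RECOVERED"
}

# Simulate one alert with made-up values. The subshell keeps the test
# values out of the calling shell.
(
    BBHOSTNAME=foo
    BBSVCNAME=cpu
    BBCOLORLEVEL=yellow
    ACKCODE=12345
    RECOVERED=0
    format_alert "16/02/2005 15:00:00"
)
```

Printing to standard output instead of appending to /tmp/alert.txt keeps the manual check free of side effects; the redirection can be added back at the call site.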
I did some more testing, and there seem to be 2 small problems :
1) Warning when the format of a script is missing
With this rule :
$ cat $BBHOME/etc/hobbit-alerts.cfg
HOST=fmangeant SERVICE=* EXSERVICE=procs,disk,mem,svcs REPEAT=24h TIME=W:0900:1800 SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=disk DURATION>2m SCRIPT /tmp/alert.sh
I get a warning :
$ $BBHOME/bin/bbcmd hobbitd_alert --test fmangeant disk
2005-02-16 15:22:03 Using default environment file /BB/hobbit/server/etc/hobbitserver.cfg
2005-02-16 15:22:03 Ignoring SCRIPT with no recipient at line 2
Matching host:service:page 'fmangeant:disk:' against rule line 1:Failed (service excluded)
Matching host:service:page 'fmangeant:disk:' against rule line 2:Failed (min. duration)
If I add the format of the script, like this :
$ cat $BBHOME/etc/hobbit-alerts.cfg
HOST=fmangeant SERVICE=* EXSERVICE=procs,disk,mem,svcs REPEAT=24h TIME=W:0900:1800 SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=disk DURATION>2m SCRIPT /tmp/alert.sh FORMAT=text
$ $BBHOME/bin/bbcmd hobbitd_alert --test fmangeant disk
2005-02-16 15:22:54 Using default environment file /BB/hobbit/server/etc/hobbitserver.cfg
Matching host:service:page 'fmangeant:disk:' against rule line 1:Failed (service excluded)
Matching host:service:page 'fmangeant:disk:' against rule line 2:Failed (min. duration)
2) Repeat interval not correctly taken into account
I tried to repeat an alert every 5 minutes :
$ cat $BBHOME/etc/hobbit-alerts.cfg
HOST=fmangeant SERVICE=* EXSERVICE=procs,disk,mem,svcs REPEAT=24h TIME=W:0900:1800 SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=disk DURATION>2m SCRIPT /tmp/alert.sh FORMAT=TEXT
HOST=fmangeant SERVICE=procs REPEAT=5m SCRIPT /tmp/alert.sh FORMAT=TEXT
$ $BBHOME/bin/bbcmd hobbitd_alert --test fmangeant procs
2005-02-16 15:23:59 Using default environment file /BB/hobbit/server/etc/hobbitserver.cfg
Matching host:service:page 'fmangeant:procs:' against rule line 1:Failed (service excluded)
Matching host:service:page 'fmangeant:procs:' against rule line 2:Failed (min. duration)
Matching host:service:page 'fmangeant:procs:' against rule line 3:Matched
*** Match with 'HOST=fmangeant SERVICE=procs REPEAT=5m SCRIPT /tmp/alert.sh FORMAT=TEXT' ***
Script alert with command '/tmp/alert.sh' and recipient FORMAT=TEXT
But I got paged every 30 minutes :
$ cat /tmp/alert.txt
16/02/2005 14:43:27 fmangeant.procs = red (ack : 145155, recovered : 0)
16/02/2005 15:13:30 fmangeant.procs = red (ack : 145155, recovered : 0)
Is it possible to use any repeat value ?
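The actual repeat interval can be read off /tmp/alert.txt rather than estimated by eye. The following is a rough sketch: `alert_gaps` is a hypothetical helper name, and it compares times of day only, so it assumes all the alerts in the input were logged on the same date.

```shell
#!/bin/sh
# alert_gaps HOST.SERVICE: read alert.txt-style lines on stdin and print
# the whole minutes elapsed between consecutive alerts for HOST.SERVICE.
# Hypothetical helper; only the time of day is compared, so every line
# is assumed to carry the same date.
alert_gaps() {
    awk -v svc="$1" '
        $3 == svc {
            split($2, t, ":")                      # HH:MM:SS
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen) printf "%d min\n", (now - prev) / 60
            prev = now
            seen = 1
        }'
}

# The two pages from /tmp/alert.txt shown above.
alert_gaps fmangeant.procs <<'EOF'
16/02/2005 14:43:27 fmangeant.procs = red (ack : 145155, recovered : 0)
16/02/2005 15:13:30 fmangeant.procs = red (ack : 145155, recovered : 0)
EOF
```

For the two lines above this prints `30 min`, matching the 30-minute spacing reported here instead of the configured REPEAT=5m.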
Thanks in advance.
Regards,
--
Frédéric Mangeant