alerts not emailing
list Bruce Lysik
Hi,
So I'm trying to get alerts working. Here's my test alerts config:
HOST=cs01 SERVICE=cpu
MAIL user-af77cde45853@xymon.invalid
The cpu alarm on this host has been red for quite a while:
-bash-2.05b$ bin/bb 127.0.0.1 "hobbitdlog cs01.cpu"
cs01|cpu|red||1106849049|1106876950|1106878750|0|0|172.16.150.1|291230||
red Thu Jan 27 17:49:09 PST 2005 up: 56 day(s), 0 users, 31 procs, load=1627
LOAD AVG on cs01 is 1627
But I fail to get any alerts sent out. I've confirmed that email is working from this machine, and nothing shows up in /var/log/hobbit/page.log. (And nothing relevant to this issue in any of the logs there.)
Any help would be appreciated.
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
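The first line of the hobbitdlog reply above is a pipe-separated record. As a rough sketch, it can be split into named fields to see how long the status has been red. The field order used here is an assumption taken from the later Xymon documentation for the equivalent command; it is not stated anywhere in this thread, and the function name is purely illustrative.

```python
from datetime import datetime, timezone

# ASSUMPTION: field order as described in later Xymon documentation,
# checked here only against the single sample line from this message.
FIELDS = [
    "hostname", "testname", "color", "testflags", "lastchange",
    "logtime", "validtime", "acktime", "disabletime", "sender",
    "cookie", "ackmsg", "dismsg",
]

def parse_hobbitdlog_header(line):
    """Split the first line of a hobbitdlog reply into named fields."""
    values = line.rstrip("\n").split("|")
    return dict(zip(FIELDS, values))

header = parse_hobbitdlog_header(
    "cs01|cpu|red||1106849049|1106876950|1106878750|0|0|172.16.150.1|291230||"
)

# Seconds between the last color change and the most recent status update.
red_for = int(header["logtime"]) - int(header["lastchange"])
since = datetime.fromtimestamp(int(header["lastchange"]), tz=timezone.utc)
print(header["hostname"], header["testname"], header["color"],
      "since", since.isoformat(), "for", red_for, "seconds")
```

If the assumed field order is right, the status had been red for several hours when the query was made, so a missing alert cannot be explained by the status having only just changed.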
list Chris Morris
On Friday, January 28, 2005 1:54 AM, Bruce Lysik wrote :-
▸
But I fail to get any alerts sent out. I've confirmed that email is working from this machine, and nothing shows up in /var/log/hobbit/page.log. (And nothing relevant to this issue in any of the logs there.) Any help would be appreciated.
I am having the same problem. A disk exceeds the Alarm threshold and goes red on the hobbit display but hobbitd_alert takes no action to send a mail.
Chris
****************************************************************************
The information contained in this email is intended only for the use of the intended recipient at the email address to which it has been addressed. If the reader of this message is not an intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination or copying of the message or associated attachments is strictly prohibited. If you have received this email in error, please contact the sender by return email or call 01793 877777 and ask for the sender and then delete it immediately from your system. Please note that neither RWE npower nor the sender accepts any responsibility for viruses and it is your responsibility to scan attachments (if any).
*****************************************************************************
list Henrik Størner
▸
On Fri, Jan 28, 2005 at 12:05:20PM -0000, Morris, Chris (Shared Services) wrote:
On Friday, January 28, 2005 1:54 AM, Bruce Lysik wrote :-
But I fail to get any alerts sent out. I've confirmed that email is working from this machine, and nothing shows up in /var/log/hobbit/page.log. (And nothing relevant to this issue in any of the logs there.) Any help would be appreciated.
I am having the same problem. A disk exceeds the Alarm threshold and goes red on the hobbit display but hobbitd_alert takes no action to send a mail.
Could you try running the following command (log in as the hobbit user):
~/server/bin/bbcmd --env=server/etc/hobbitserver.cfg hobbitd_channel --channel=page cat
Let it run for 5-10 minutes (long enough for the critical status to be updated) and let me know if there's any output.
Henrik
list Chris Morris
Henrik,
There is plenty of output from running that command - e.g.:
@@page#677|1106919385.676695|10.2.216.252|bku005|disk|10.2.48.244|1106921185|red|green|1106919385||187413
status bku005.disk red Fri 28 Jan 13:36:25 2005 - Disk on bku005 at PANIC level
&red /tmp (95%) has reached the defined disk space PANIC level (95%)
/dev/lv00 139264 15136 124128 11% /tsg
/dev/aixdoclv 319488 63000 256488 20% /aixdoc
/dev/lv10 8192 2116 6076 26% /innogy
/dev/hd4 98304 42824 55480 44% /
/dev/hd9var 49152 24656 24496 51% /var
/dev/lv01 409600 210716 198884 52% /bmc
/dev/linuxlv 614400 369204 245196 61% /linux
/dev/hd1 98304 60120 38184 62% /home
/dev/lv09 524288 392520 131768 75% /maint
/dev/lv11 917504 719512 197992 79% /downloads
/dev/hd2 1294336 1149268 145068 89% /usr
/dev/hd3 28672 27100 1572 95% /tmp
@@
But still no alerts are being emailed and the page.log is not being updated.
Chris
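Each message on the page channel starts with an `@@page#...` header line whose fields are separated by pipes. A minimal sketch of pulling out the host, test name and colors from that header follows. The field names are an assumption based on the page-channel description in later Xymon documentation, checked only against the samples in this thread. The stray space that mail wrapping left after the expire time in the sample is removed here.

```python
# ASSUMPTION: field names follow the later Xymon documentation for the
# page channel; this thread itself does not name the fields.
PAGE_FIELDS = [
    "timestamp", "sender", "hostname", "testname", "hostip",
    "expiretime", "color", "prevcolor", "changetime", "location", "cookie",
]

def parse_page_header(line):
    """Parse an '@@page#<seq>|...' header line into a dict of fields."""
    marker, _, rest = line.strip().partition("|")
    if not marker.startswith("@@page#"):
        raise ValueError("not a page-channel header: %r" % line)
    fields = dict(zip(PAGE_FIELDS, rest.split("|")))
    fields["sequence"] = int(marker[len("@@page#"):])
    return fields

page = parse_page_header(
    "@@page#677|1106919385.676695|10.2.216.252|bku005|disk|10.2.48.244"
    "|1106921185|red|green|1106919385||187413"
)
print(page["hostname"], page["testname"], page["prevcolor"], "->", page["color"])
```

The `hostname` and `testname` values are the ones an alert rule has to match, so they are the first things to compare against HOST= and SERVICE= in hobbit-alerts.cfg.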
list Henrik Størner
OK, can you mail me your hobbit-alerts.cfg file then? I'm sure you already checked that hobbitd_alert is running - it's enabled by default so it should be, but still ...
Thanks,
Henrik
--
Henrik Storner
list Tom Georgoulias
Count me in on this. I've induced a couple of process failures on a test system and the email alerts aren't coming through. What other info should I provide or look for?
Tom
-bash-2.05b$ ./bbcmd --env=/home/bb/hobbit/server/etc/hobbitserver.cfg hobbitd_channel --channel=page cat
@@page#13|1106927004.323354|152.52.2.252|rfnd204d.nandomedia.com|cpu|0.0.0.0|1106928804|yellow|yellow|1106926704|web6|885572
status rfnd204d,nandomedia,com.cpu yellow Fri Jan 28 10:43:24 EST 2005 up: 5 min, 0 users, 55 procs, load=0.12
Warning: Machine recently rebooted
LOAD AVG on rfnd204d,nandomedia,com is 0.12
@@
@@page#14|1106927078.016993|152.52.2.254|radm200p.nandomedia.com|procs|0.0.0.0|1106928878|red|red|1106925877|web1|388890
status radm200p,nandomedia,com.procs red Fri Jan 28 10:44:38 EST 2005 Some processes are in error
&red redproc >=1 - not running, requires at least 1
&yellow yellowproc >=1 - not running, requires at least 1
@@
======================
My hobbit-alerts.cfg (email addresses were changed to protect the innocent):
##############################
# Begin Nando Modifications
#############################
HOST=radm200p.nandomedia.com
MAIL user-52400382fe79@xymon.invalid SERVICE=proc COLOR=yellow REPEAT=5m
MAIL user-56294d1c8449@xymon.invalid SERVICE=proc COLOR=red REPEAT=5m
-bash-2.05b$
list Henrik Størner
▸
On Fri, Jan 28, 2005 at 10:46:52AM -0500, Tom Georgoulias wrote:
Count me in on this. I've induced a couple of process failures on a test system and the email alerts aren't coming through. What other info should I provide or look for?
The status message says:
status radm200p,nandomedia,com.procs red Fri Jan 28 10:44:38 EST 2005
so it is the "procs" column that is in error.
HOST=radm200p.nandomedia.com
MAIL user-52400382fe79@xymon.invalid SERVICE=proc COLOR=yellow REPEAT=5m
MAIL user-56294d1c8449@xymon.invalid SERVICE=proc COLOR=red REPEAT=5m
But here you have rules for the "proc" (no "s") column. If I set up a config with these rules, but with SERVICE=procs, your message triggers an alert e-mail.
Methinks it would be nice to have a "test" option for the alert module, so you can run it with a hostname + testname as input, and it will tell you which rules match and which rules do not.
Henrik
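The mismatch described above (rules written for "proc" while the status arrives for "procs") is easy to check mechanically. The following is only a sketch, not part of Hobbit: the helper names are hypothetical, and it handles plain SERVICE=name values only, not lists or patterns.

```python
import re

def column_from_status(status_line):
    """Return the column (test) name from a 'status host.test color ...' line."""
    host_dot_test = status_line.split()[1]
    # The test name is whatever follows the last dot.
    return host_dot_test.rsplit(".", 1)[1]

def services_in_rules(config_text):
    """Collect plain SERVICE= values from alert rules (patterns not handled)."""
    return re.findall(r"\bSERVICE=(\S+)", config_text)

rules = """\
HOST=radm200p.nandomedia.com
MAIL user-52400382fe79@xymon.invalid SERVICE=proc COLOR=yellow REPEAT=5m
MAIL user-56294d1c8449@xymon.invalid SERVICE=proc COLOR=red REPEAT=5m
"""
status = ("status radm200p,nandomedia,com.procs red "
          "Fri Jan 28 10:44:38 EST 2005 Some processes are in error")

column = column_from_status(status)
mismatches = [s for s in services_in_rules(rules) if s != column]
print("column:", column, "| SERVICE values that do not match:", mismatches)
```

Run against the rules and status line quoted in this message, both SERVICE values come back as mismatches, which is the typo that was found by eye.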
list Tom Georgoulias
▸
Henrik Stoerner wrote:
The status message says:
status radm200p,nandomedia,com.procs red Fri Jan 28 10:44:38 EST 2005
so it is the "procs" column that is in error.
HOST=radm200p.nandomedia.com
MAIL user-52400382fe79@xymon.invalid SERVICE=proc COLOR=yellow REPEAT=5m
MAIL user-56294d1c8449@xymon.invalid SERVICE=proc COLOR=red REPEAT=5m
But here you have rules for the "proc" (no "s") column. If I set up a config with these rules, but with SERVICE=procs, your message triggers an alert e-mail.
Yup, I'm an idiot. Bitten by a typo, once again. Sorry to bother you about that. I like your idea about the test option. That would be a nice troubleshooting feature.
Tom
list Bruce Lysik
Here's the cpu alert coming in:
@@page#2448|1106949175.057403|172.16.150.1|cs01|cpu|172.16.150.1|1106950975|red|red|1106849049|cs|370516
status cs01.cpu red Fri Jan 28 13:52:54 PST 2005 up: 57 day(s), 1 users, 35 procs, load=1761
LOAD AVG on cs01 is 1761
@@
Here's the hobbit-alerts.cfg definition:
HOST=cs01
MAIL user-4e63a10f8934@xymon.invalid SERVICE=cpu REPEAT=5m COLOR=red
hobbitd_alert shows up as running. Nothing in page.log.
▸
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Henrik Størner
▸
On Fri, Jan 28, 2005 at 02:09:59PM -0800, Bruce Lysik wrote:
Here's the cpu alert coming in:
@@page#2448|1106949175.057403|172.16.150.1|cs01|cpu|172.16.150.1|1106950975|red|red|1106849049|cs|370516
[snip]
Here's the hobbit-alerts.cfg definition:
HOST=cs01
MAIL user-4e63a10f8934@xymon.invalid SERVICE=cpu REPEAT=5m COLOR=red
This one should trigger all-right, and if I copy your alert-config
into my own config-file it does send out alerts (in fact, I'm afraid
I sent some of them to you because I forgot to change the e-mail
address in the config file).
Could you try this:
Login as hobbit
Cut-and-paste the alert message into a file "alert.msg", with the
"@@page..." as the first line, and the "@@" as the last line.
Then run these commands:
./server/bin/bbcmd --env=server/etc/hobbitserver.cfg sh
# You now have a shell with the Hobbit environment set
hobbitd_alert --debug <alert.msg
You should see messages like these:
2005-01-28 23:21:47 send_alert cs01:cpu state 0
2005-01-28 23:21:47 criteriamatch cs01:cpu cs01:(NULL):(NULL)
2005-01-28 23:21:47 Checking default color setting 70 against 5 gives 1
2005-01-28 23:21:47 Found a first matching rule
2005-01-28 23:21:47 criteriamatch cs01:cpu (NULL):(NULL):cpu
2005-01-28 23:21:47 Checking explicit color setting 10000000040 against 5 gives 1
2005-01-28 23:21:47 repeat cs01|cpu|mail|user-4e63a10f8934@xymon.invalid at 0
2005-01-28 23:21:47 Alert for cs01:cpu to user-4e63a10f8934@xymon.invalid
2005-01-28 23:21:47 No more secondary matching rule
Henrik
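The "alert.msg" file described above has to start with the `@@page...` line and end with the `@@` line. If the channel output was captured to a file or a terminal scrollback, a small helper can cut out exactly one such block. This is a hedged sketch: the function name is hypothetical, and it assumes every message is framed the way the samples in this thread are.

```python
def split_channel_messages(captured):
    """Yield each '@@page...' through '@@' block found in captured output."""
    current = None
    for line in captured.splitlines():
        if line.startswith("@@page"):
            # Header line: start collecting a new message.
            current = [line]
        elif line.strip() == "@@" and current is not None:
            # Terminator line: emit the finished message.
            current.append("@@")
            yield "\n".join(current) + "\n"
            current = None
        elif current is not None:
            current.append(line)

captured = """\
@@page#2448|1106949175.057403|172.16.150.1|cs01|cpu|172.16.150.1|1106950975|red|red|1106849049|cs|370516
status cs01.cpu red Fri Jan 28 13:52:54 PST 2005 up: 57 day(s), 1 users, 35 procs, load=1761
LOAD AVG on cs01 is 1761
@@
"""

messages = list(split_channel_messages(captured))
# Write the first message to the file that is fed to hobbitd_alert --debug.
with open("alert.msg", "w") as fh:
    fh.write(messages[0])
```

Anything outside a header/terminator pair, such as a shell prompt or a partial message at the start of the capture, is dropped rather than written to the file.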
list Bruce Lysik
▸
Then run these commands:
./server/bin/bbcmd --env=server/etc/hobbitserver.cfg sh
# You now have a shell with the Hobbit environment set
hobbitd_alert --debug <alert.msg
Sure thing:
sh-2.05b$ hobbitd_alert --debug <alerts.msg
2005-01-28 14:38:15 hobbitd_alert: Got message 2448 @@page#2448|1106949175.057403|172.16.150.1|cs01|cpu|172.16.150.1|1106950975|red|red|1106849049|cs|370516
2005-01-28 14:38:15 Got page message from cs01:cpu
2005-01-28 14:38:15 Alert status changed from 0 to 1
2005-01-28 14:38:15 Found no first matching rule
2005-01-28 14:38:15 1 alerts to go
2005-01-28 14:38:15 send_alert cs01:cpu state 0
2005-01-28 14:38:15 criteriamatch cs01:cpu cs01:(NULL):(NULL)
2005-01-28 14:38:15 criteriamatch cs01:cpu cs01:(NULL):(NULL)
2005-01-28 14:38:15 Checking default color setting 70 against 5 gives 1
2005-01-28 14:38:15 Found a first matching rule
2005-01-28 14:38:15 criteriamatch cs01:cpu (NULL):(NULL):cpu
2005-01-28 14:38:15 Checking explicit color setting 10000000040 against 5 gives 1
2005-01-28 14:38:15 No more secondary matching rule
2005-01-28 14:38:15 hobbitd_alert: Out-of-sync data in channel:
2005-01-28 14:38:15 Checking default color setting 70 against 5 gives 1
2005-01-28 14:38:15 Found a first matching rule
2005-01-28 14:38:15 criteriamatch cs01:cpu (NULL):(NULL):cpu
2005-01-28 14:38:15 Checking explicit color setting 10000000040 against 5 gives 1
2005-01-28 14:38:15 repeat cs01|cpu|mail|user-4e63a10f8934@xymon.invalid at 0
2005-01-28 14:38:15 Alert for cs01:cpu to user-4e63a10f8934@xymon.invalid
sh-2.05b$ 2005-01-28 14:38:15 No more secondary matching rule
And that actually sent me an alert email. Huh. So how come no email normally?
▸
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
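In the debug run above, the line that corresponds to the e-mail actually going out appears to be the "Alert for host:test to recipient" line. As a sketch, the debug output can be scanned for those lines to list which recipients a test message would reach. The pattern is inferred only from the log lines quoted in this thread and may not cover every variant hobbitd_alert prints.

```python
import re

# Pattern inferred from the sample debug lines in this thread.
ALERT_LINE = re.compile(r"Alert for (\S+):(\S+) to (\S+)")

def alerts_sent(debug_output):
    """Return (host, test, recipient) tuples found in --debug output."""
    return ALERT_LINE.findall(debug_output)

debug_output = """\
2005-01-28 14:38:15 Found a first matching rule
2005-01-28 14:38:15 criteriamatch cs01:cpu (NULL):(NULL):cpu
2005-01-28 14:38:15 repeat cs01|cpu|mail|user-4e63a10f8934@xymon.invalid at 0
2005-01-28 14:38:15 Alert for cs01:cpu to user-4e63a10f8934@xymon.invalid
"""

for host, test, recipient in alerts_sent(debug_output):
    print(host, test, "->", recipient)
```

An empty result for a message that should have matched points at the rules; a non-empty result with no mail from the running daemon, as in this case, points at the daemon's state rather than the configuration.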
list Bruce Lysik
▸
And that actually sent me an alert email. Huh. So how come no email normally?
Okay, after restarting hobbit, my alerts appear to be working.
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Bruce Lysik
▸
And that actually sent me an alert email. Huh. So how come no email normally?
Okay, after restarting hobbit, my alerts appear to be working.
Perhaps I posted too soon. Two alerts triggered after restart, but now one with a 5 minute repeat hasn't triggered again.
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer