Xymon Mailing List Archive

alerts not emailing

13 messages in this thread

list Bruce Lysik · Thu, 27 Jan 2005 17:54:06 -0800 ·
Hi,

So I'm trying to get alerts working.  Here's my test alerts config:

HOST=cs01 SERVICE=cpu
        MAIL user-af77cde45853@xymon.invalid

The cpu alarm on this host has been red for quite a while:

-bash-2.05b$ bin/bb 127.0.0.1 "hobbitdlog cs01.cpu"
cs01|cpu|red||1106849049|1106876950|1106878750|0|0|172.16.150.1|291230||
red Thu Jan 27 17:49:09 PST 2005 up: 56 day(s), 0 users, 31 procs, load=1627


LOAD AVG on cs01 is 1627

But I fail to get any alerts sent out.  I've confirmed that email is working from this machine, and nothing shows up in /var/log/hobbit/page.log.  (And nothing relevant to this issue in any of the logs there.)
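
For reference, the first line of the hobbitdlog response above can be split into named fields to see how long the status has been red. This is only a sketch: the field order is an assumption taken from the hobbitdlog description in the client man page (it is not stated in this thread), and parse_hobbitdlog_header is a made-up helper name.

```python
from datetime import timedelta

# ASSUMPTION: field order as described for "hobbitdlog" in the client man
# page; nothing in this thread confirms it.
FIELDS = ["hostname", "testname", "color", "testflags", "lastchange",
          "logtime", "validtime", "acktime", "disabletime", "sender",
          "cookie", "ackmsg", "dismsg"]


def parse_hobbitdlog_header(line):
    """Split the first line of a hobbitdlog response into named fields."""
    return dict(zip(FIELDS, line.rstrip("\n").split("|")))


header = parse_hobbitdlog_header(
    "cs01|cpu|red||1106849049|1106876950|1106878750|0|0|172.16.150.1|291230||")

# Time between the last colour change and the most recent status update.
red_for = timedelta(seconds=int(header["logtime"]) - int(header["lastchange"]))
print(header["color"], "for", red_for)
```

With the line quoted above this prints "red for 7:45:01", which is the gap between the last colour change and the latest update at the time of the query.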

Any help would be appreciated.

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Chris Morris · Fri, 28 Jan 2005 12:05:20 -0000 ·
On Friday, January 28, 2005 1:54 AM, Bruce Lysik wrote :-
quoted from Bruce Lysik
But I fail to get any alerts sent out.  I've confirmed that email is
working from this machine, and nothing shows up in
/var/log/hobbit/page.log.  (And nothing relevant to this issue in any of
the logs there.)

Any help would be appreciated.
I am having the same problem. A disk exceeds the Alarm threshold and goes
red on the hobbit display but hobbitd_alert takes no action to send a mail.

Chris


****************************************************************************
The information contained in this email is intended only for the use of the intended recipient at the email address to which it has been addressed. If the reader of this message is not an intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination or copying of the message or associated attachments is strictly prohibited.

If you have received this email in error, please contact the sender by return email or call 01793 877777 and ask for the sender and then delete it immediately from your system.Please note that neither RWE npower nor the sender accepts any responsibility for viruses and it is your responsibility to scan attachments (if any).
*****************************************************************************
list Henrik Størner · Fri, 28 Jan 2005 14:24:42 +0100 ·
quoted from Chris Morris
On Fri, Jan 28, 2005 at 12:05:20PM -0000, Morris, Chris (Shared Services) wrote:
On Friday, January 28, 2005 1:54 AM, Bruce Lysik wrote :-
But I fail to get any alerts sent out.  I've confirmed that email is
working from this machine, and nothing shows up in
/var/log/hobbit/page.log.  (And nothing relevant to this issue in any of
the logs there.)

Any help would be appreciated.
I am having the same problem. A disk exceeds the Alarm threshold and goes
red on the hobbit display but hobbitd_alert takes no action to send a mail.
Could you try running the following command (log in as the hobbit
user):

~/server/bin/bbcmd --env=server/etc/hobbitserver.cfg hobbitd_channel --channel=page cat

Let it run for 5-10 minutes (long enough for the critical status to be
updated) and let me know if there's any output.


Henrik
list Chris Morris · Fri, 28 Jan 2005 13:48:41 -0000 ·
Henrik,

There is plenty of output from running that command - e.g.:

@@page#677|1106919385.676695|10.2.216.252|bku005|disk|10.2.48.244|1106921185|red|green|1106919385||187413
status bku005.disk red Fri 28 Jan 13:36:25 2005 - Disk on bku005 at PANIC level
&red /tmp (95%) has reached the defined disk space PANIC level (95%)

/dev/lv00          139264     15136    124128   11% /tsg
/dev/aixdoclv      319488     63000    256488   20% /aixdoc
/dev/lv10            8192      2116      6076   26% /innogy
/dev/hd4            98304     42824     55480   44% /
/dev/hd9var         49152     24656     24496   51% /var
/dev/lv01          409600    210716    198884   52% /bmc
/dev/linuxlv       614400    369204    245196   61% /linux
/dev/hd1            98304     60120     38184   62% /home
/dev/lv09          524288    392520    131768   75% /maint
/dev/lv11          917504    719512    197992   79% /downloads
/dev/hd2          1294336   1149268    145068   89% /usr
/dev/hd3            28672     27100      1572   95% /tmp
@@

But still no alerts are being emailed and the page.log is not being updated.
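
The header line of a page-channel message like the one above can be pulled apart to check which host, test and colours the alert module is being handed. A minimal sketch follows; the field names are an assumption based on the page channel description in the hobbitd documentation (the thread does not list them), and parse_page_header is a hypothetical helper.

```python
# ASSUMPTION: field order of the "@@page" header as documented for the
# page channel; not confirmed anywhere in this thread.
PAGE_FIELDS = ["timestamp", "sender", "hostname", "testname", "hostip",
               "expiretime", "color", "prevcolor", "changetime",
               "location", "cookie"]


def parse_page_header(line):
    """Return the fields of an '@@page#N|...' header line as a dict."""
    marker, *values = line.strip().split("|")
    if not marker.startswith("@@page#"):
        raise ValueError("not a page-channel header: %r" % line)
    msg = dict(zip(PAGE_FIELDS, values))
    msg["seq"] = int(marker[len("@@page#"):])  # channel sequence number
    return msg


msg = parse_page_header(
    "@@page#677|1106919385.676695|10.2.216.252|bku005|disk|10.2.48.244"
    "|1106921185|red|green|1106919385||187413")
print(msg["hostname"], msg["testname"], msg["prevcolor"], "->", msg["color"])
```

For the message quoted above this prints "bku005 disk green -> red", so the status change is reaching the page channel and the problem lies further down the line.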

Chris
list Henrik Størner · Fri, 28 Jan 2005 15:50:32 +0100 ·
OK, can you mail me your hobbit-alerts.cfg file then?

I'm sure you already checked that hobbitd_alert is running -
it's enabled by default so it should be, but still ...


Thanks,
Henrik

-- 

Henrik Storner
list Tom Georgoulias · Fri, 28 Jan 2005 10:46:52 -0500 ·
Count me in on this.  I've induced a couple of process failures on a test system and the email alerts aren't coming through.  What other info should I provide or look for?

Tom

-bash-2.05b$ ./bbcmd --env=/home/bb/hobbit/server/etc/hobbitserver.cfg hobbitd_channel --channel=page cat


@@page#13|1106927004.323354|152.52.2.252|rfnd204d.nandomedia.com|cpu|0.0.0.0|1106928804|yellow|yellow|1106926704|web6|885572
status rfnd204d,nandomedia,com.cpu yellow Fri Jan 28 10:43:24 EST 2005 up: 5 min, 0 users, 55 procs, load=0.12

Warning: Machine recently rebooted

LOAD AVG on rfnd204d,nandomedia,com is 0.12

@@
@@page#14|1106927078.016993|152.52.2.254|radm200p.nandomedia.com|procs|0.0.0.0|1106928878|red|red|1106925877|web1|388890
status radm200p,nandomedia,com.procs red Fri Jan 28 10:44:38 EST 2005 Some processes are in error

&red redproc >=1 - not running, requires at least 1
&yellow yellowproc >=1 - not running, requires at least 1

@@

======================

My hobbit-alerts.cfg (email addresses were changed to protect the innocent):


##############################
# Begin Nando Modifications
#############################

HOST=radm200p.nandomedia.com
         MAIL user-52400382fe79@xymon.invalid SERVICE=proc COLOR=yellow REPEAT=5m
         MAIL user-56294d1c8449@xymon.invalid SERVICE=proc COLOR=red REPEAT=5m

-bash-2.05b$
list Henrik Størner · Fri, 28 Jan 2005 20:46:58 +0100 ·
quoted from Tom Georgoulias
On Fri, Jan 28, 2005 at 10:46:52AM -0500, Tom Georgoulias wrote:
Count me in on this.  I've induced a couple of process failures on a 
test system and the email alerts aren't coming through.  What other info 
should I provide or look for?
The status message says:
status radm200p,nandomedia,com.procs red Fri Jan 28 10:44:38 EST 2005 
so it is the "procs" column that is in error.
HOST=radm200p.nandomedia.com
        MAIL user-52400382fe79@xymon.invalid SERVICE=proc COLOR=yellow REPEAT=5m
        MAIL user-56294d1c8449@xymon.invalid SERVICE=proc COLOR=red REPEAT=5m
But here you have rules for the "proc" (no "s") column.

If I set up a config with these rules, but SERVICE=procs, your message
triggers an alert e-mail.


Methinks it would be nice to have a "test" option for the alert
module, so you can run it with a hostname + testname as input, and it
will tell you which rules match and which rules do not.
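
The idea of such a "test" option can be illustrated with a small stand-alone script. This is a rough sketch only, not how hobbitd_alert matches rules: it assumes plain HOST= lines followed by MAIL lines, compares HOST, SERVICE and COLOR by exact string equality, and ignores everything else the real config supports. parse_rules and explain are made-up names.

```python
CONFIG = """\
HOST=radm200p.nandomedia.com
        MAIL user-52400382fe79@xymon.invalid SERVICE=proc COLOR=yellow REPEAT=5m
        MAIL user-56294d1c8449@xymon.invalid SERVICE=proc COLOR=red REPEAT=5m
"""


def parse_rules(text):
    """Parse HOST= lines and the MAIL lines below them (simplified subset)."""
    rules, current = [], None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue
        if words[0].startswith("HOST="):
            # Criteria on the HOST line are inherited by its MAIL lines.
            current = dict(w.split("=", 1) for w in words if "=" in w)
        elif words[0] == "MAIL" and len(words) >= 2 and current is not None:
            rule = dict(current)
            rule.update(w.split("=", 1) for w in words[2:] if "=" in w)
            rule["MAIL"] = words[1]
            rules.append(rule)
    return rules


def explain(rules, host, test, color):
    """Return (recipient, matched, reasons) for every rule."""
    wanted = {"HOST": host, "SERVICE": test, "COLOR": color}
    report = []
    for rule in rules:
        misses = ["%s=%s does not match %s" % (key, rule[key], value)
                  for key, value in wanted.items()
                  if key in rule and rule[key] != value]
        report.append((rule["MAIL"], not misses, misses))
    return report


for mail, ok, misses in explain(parse_rules(CONFIG),
                                "radm200p.nandomedia.com", "procs", "red"):
    print(mail, "MATCH" if ok else "no match", "; ".join(misses))
```

Run against the config quoted above with test name "procs", both rules report that SERVICE=proc does not match procs, which is the typo pointed out in this message.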


Henrik
list Tom Georgoulias · Fri, 28 Jan 2005 15:45:11 -0500 ·
quoted from Henrik Størner
Henrik Stoerner wrote:
The status message says:
status radm200p,nandomedia,com.procs red Fri Jan 28 10:44:38 EST 2005
so it is the "procs" column that is in error.
HOST=radm200p.nandomedia.com
        MAIL user-52400382fe79@xymon.invalid SERVICE=proc COLOR=yellow REPEAT=5m
        MAIL user-56294d1c8449@xymon.invalid SERVICE=proc COLOR=red REPEAT=5m
But here you have rules for the "proc" (no "s") column.

If I set up a config with these rules, but SERVICE=procs, your message
triggers an alert e-mail.
Yup, I'm an idiot.  Bitten by a typo, once again.  Sorry to bother you about that.  I like your idea about the test option.  That would be a nice troubleshooting feature.

Tom
list Bruce Lysik · Fri, 28 Jan 2005 14:09:59 -0800 ·
Here's the cpu alert coming in:

@@page#2448|1106949175.057403|172.16.150.1|cs01|cpu|172.16.150.1|1106950975|red|red|1106849049|cs|370516
status cs01.cpu red Fri Jan 28 13:52:54 PST 2005 up: 57 day(s), 1 users, 35 procs, load=1761


LOAD AVG on cs01 is 1761

@@

Here's the hobbit-alerts.cfg definition:

HOST=cs01
        MAIL user-4e63a10f8934@xymon.invalid SERVICE=cpu REPEAT=5m COLOR=red

hobbitd_alert shows up as running.  Nothing in page.log.

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer

list Henrik Størner · Fri, 28 Jan 2005 23:34:11 +0100 ·
quoted from Bruce Lysik
On Fri, Jan 28, 2005 at 02:09:59PM -0800, Bruce Lysik wrote:
Here's the cpu alert coming in:

@@page#2448|1106949175.057403|172.16.150.1|cs01|cpu|172.16.150.1|1106950975|red|red|1106849049|cs|370516
[snip]
Here's the hobbit-alerts.cfg definition:

HOST=cs01
        MAIL user-4e63a10f8934@xymon.invalid SERVICE=cpu REPEAT=5m COLOR=red

This one should trigger all right, and if I copy your alert-config
into my own config-file it does send out alerts (in fact, I'm afraid
I sent some of them to you because I forgot to change the e-mail
address in the config file).

Could you try this:

Log in as hobbit

Cut-and-paste the alert message into a file "alert.msg", with the
"@@page..." as the first line, and the "@@" as the last line.

Then run these commands:

./server/bin/bbcmd --env=server/etc/hobbitserver.cfg sh
# You now have a shell with the Hobbit environment set
hobbitd_alert --debug <alert.msg

You should see messages like these:

2005-01-28 23:21:47 send_alert cs01:cpu state 0
2005-01-28 23:21:47 criteriamatch cs01:cpu cs01:(NULL):(NULL)
2005-01-28 23:21:47 Checking default color setting 70 against 5 gives 1
2005-01-28 23:21:47 Found a first matching rule
2005-01-28 23:21:47 criteriamatch cs01:cpu (NULL):(NULL):cpu
2005-01-28 23:21:47 Checking explicit color setting 10000000040 against 5 gives 1
2005-01-28 23:21:47   repeat cs01|cpu|mail|user-4e63a10f8934@xymon.invalid at 0
2005-01-28 23:21:47   Alert for cs01:cpu to user-4e63a10f8934@xymon.invalid
2005-01-28 23:21:47 No more secondary matching rule
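
Before feeding the saved message to hobbitd_alert, the framing described above ("@@page..." first, "@@" last) can be checked with a few lines of script. This is just a sketch; check_framing is a hypothetical helper, and the sample text is abbreviated from the message quoted earlier.

```python
def check_framing(text):
    """Return a list of problems with the framing of a saved page message."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single newline at the end of the file is fine
    problems = []
    if not lines or not lines[0].startswith("@@page"):
        problems.append("first line does not start with @@page")
    if not lines or lines[-1] != "@@":
        problems.append("last line is not a bare @@")
    return problems


# In real use the text would come from the file, e.g.
#   check_framing(open("alert.msg").read())
sample = (
    "@@page#2448|1106949175.057403|172.16.150.1|cs01|cpu|172.16.150.1"
    "|1106950975|red|red|1106849049|cs|370516\n"
    "status cs01.cpu red Fri Jan 28 13:52:54 PST 2005 up: 57 day(s)\n"
    "@@\n"
)
print(check_framing(sample))          # well-formed message
print(check_framing(sample + "\n"))   # extra blank line after the closing @@
```

An empty list means the file starts and ends the way the instructions ask; anything else names the line that needs fixing.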


Henrik
list Bruce Lysik · Fri, 28 Jan 2005 14:43:59 -0800 ·
quoted from Henrik Størner
Then run these commands:

./server/bin/bbcmd --env=server/etc/hobbitserver.cfg sh
# You now have a shell with the Hobbit environment set
hobbitd_alert --debug <alert.msg
Sure thing:

sh-2.05b$ hobbitd_alert --debug <alerts.msg
2005-01-28 14:38:15 hobbitd_alert: Got message 2448 @@page#2448|1106949175.057403|172.16.150.1|cs01|cpu|172.16.150.1|1106950975|red|red|1106849049|cs|370516
2005-01-28 14:38:15 Got page message from cs01:cpu
2005-01-28 14:38:15 Alert status changed from 0 to 1
2005-01-28 14:38:15 Found no first matching rule
2005-01-28 14:38:15 1 alerts to go
2005-01-28 14:38:15 send_alert cs01:cpu state 0
2005-01-28 14:38:15 criteriamatch cs01:cpu cs01:(NULL):(NULL)
2005-01-28 14:38:15 criteriamatch cs01:cpu cs01:(NULL):(NULL)
2005-01-28 14:38:15 Checking default color setting 70 against 5 gives 1
2005-01-28 14:38:15 Found a first matching rule
2005-01-28 14:38:15 criteriamatch cs01:cpu (NULL):(NULL):cpu
2005-01-28 14:38:15 Checking explicit color setting 10000000040 against 5 gives 1
2005-01-28 14:38:15 No more secondary matching rule
2005-01-28 14:38:15 hobbitd_alert: Out-of-sync data in channel: 

2005-01-28 14:38:15 Checking default color setting 70 against 5 gives 1
2005-01-28 14:38:15 Found a first matching rule
2005-01-28 14:38:15 criteriamatch cs01:cpu (NULL):(NULL):cpu
2005-01-28 14:38:15 Checking explicit color setting 10000000040 against 5 gives 1
2005-01-28 14:38:15   repeat cs01|cpu|mail|user-4e63a10f8934@xymon.invalid at 0
2005-01-28 14:38:15   Alert for cs01:cpu to user-4e63a10f8934@xymon.invalid
sh-2.05b$ 2005-01-28 14:38:15 No more secondary matching rule

And that actually sent me an alert email.  Huh.  So how come no email normally?

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer


list Bruce Lysik · Fri, 28 Jan 2005 17:31:22 -0800 ·
quoted from Bruce Lysik
And that actually sent me an alert email.  Huh.  So how come 
no email normally?
Okay, after restarting hobbit, my alerts appear to be working.

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Bruce Lysik · Fri, 28 Jan 2005 17:42:58 -0800 ·
quoted from Bruce Lysik
And that actually sent me an alert email.  Huh.  So how come 
no email normally?
Okay, after restarting hobbit, my alerts appear to be working.
Perhaps I posted too soon.  Two alerts triggered after the restart, but one with a 5-minute repeat hasn't triggered again.

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer