Xymon Mailing List Archive search

alerts still not alerting

7 messages in this thread

list Daniel J McDonald · Sat, 19 Mar 2005 10:33:09 -0600 ·
I'm still flummoxed by hobbit-alerts.  I'm certain I broke something,
because I am not getting any alerts from the box.

The only logs in /var/log/hobbit/page.log are 
2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument

I see a couple of those in the hobbitlaunch.log file as well.  I also see
the following error:
2005-03-19 10:14:21 Task bbdisplay started with PID 7417
2005-03-19 10:14:21 Task bbretest started with PID 7418
2005-03-19 10:14:29 Our child has failed and will not talk to us
2005-03-19 10:14:36 Our child has failed and will not talk to us

Not knowing which child makes it difficult to figure out what is going
on.  bbpage is apparently running - the logfile says process 5892 is
bbpage, and there is a process 5892 still running.

I fixed the "unmatched" syntax error I had before.

Here is a sample host that is not paging.  The info page lists:
Alerting:
Service  Recipient                            1st Delay  Stop after  Repeat  Time of Day  Colors
conn     user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
telnet   user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red

Both telnet and conn have been down on this host for over two hours.

The salient rule is:
HOST=%.
        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED

I imagine I'm doing something terribly silly, but I'm just not clear
what it might be.
list Asif Iqbal · Sat, 19 Mar 2005 13:41:53 -0500 ·
On Sat, Mar 19, 2005 at 10:33:09AM, Daniel J McDonald wrote:
[...] 
The salient rule is:
HOST=%.
        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
Try this

         MAIL user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
quoted from Daniel J McDonald
I imagine I'm doing something terribly silly, but I'm just not clear
what it might be.
-- 

Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"..there are two kinds of people: those who work and those who take the credit...try
 to be in the first group;...less competition there."  - Indira Gandhi
list Dan McDonald · Sat, 19 Mar 2005 13:49:46 -0600 ·
quoted from Asif Iqbal
The salient rule is:
HOST=%.
        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
Try this

        MAIL user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED

Made the change.  Restarted hobbit.  Added a new test - it turned red.
Stayed red for 35 minutes.  No alerts.
list Asif Iqbal · Sat, 19 Mar 2005 16:34:44 -0500 ·
quoted from Dan McDonald
On Sat, Mar 19, 2005 at 01:49:46PM, McDonald, Dan wrote:
Made the change.  Restarted hobbit.  Added a new test - it turned red.
Stayed red for 35 minutes.  No alerts.
Can you take the DURATION parameter out for a sec and run the following
test for the hostname.service that is RED and post it here

./bin/bbcmd hobbitd_alert --test hostname service

You should see a line that matches the RED status and shows who should be
alerted. I am assuming the hobbitd server can send email out.

If that shows OK I will have to request you to post the output of this

./bin/bbcmd hobbitd_alert --dump-config

Thanks
list Henrik Størner · Sun, 20 Mar 2005 14:23:16 +0100 ·
quoted from Daniel J McDonald
On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
I'm still flummoxed by hobbit-alerts.  I'm certain I broke something,
because I am not getting any alerts from the box.
It's probably a config error ... 
quoted from Daniel J McDonald
The only logs in /var/log/hobbit/page.log are 
2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument
These are harmless, and often occur when Hobbit is shut down or
restarted.
quoted from Daniel J McDonald
I see a couple of those in the hobbitlaunch.log file as well, I also see
the following error:
2005-03-19 10:14:21 Task bbdisplay started with PID 7417
2005-03-19 10:14:21 Task bbretest started with PID 7418
2005-03-19 10:14:29 Our child has failed and will not talk to us
2005-03-19 10:14:36 Our child has failed and will not talk to us
That's a first - and you're right it should be more detailed in the
error-message. I've fixed that. But it generally means that one of the
hobbitd helper tasks has stopped responding.
quoted from Daniel J McDonald
Here is a sample host that is not paging.  The info page lists:
Service Recipient 1st Delay Stop after Repeat Time of Day Colors 
conn user-290ce4e24e19@xymon.invalid (R) 30m  - 5d  - red 
telnet user-290ce4e24e19@xymon.invalid (R) 30m  - 5d  - red

Both telnet and conn have been down on this host for over two hours.

The salient rule is:
HOST=%.
        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
Your "HOST=" is wrong - it will only match hostnames with exactly one
letter (do you really have a host named "a" ?) - if you want to match
all hosts, then it's "HOST=%.*" or the simple form "HOST=*"

So some other rule must be generating the info-column output you
have, and therefore even if your HOST entry was correct, the rule
would not trigger because of the UNMATCHED restriction.
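
To illustrate the difference between these HOST patterns, here is a minimal
Python sketch.  It assumes, as the explanation above implies, that the text
after "%" is a regular expression that has to match the whole hostname.
Python's re module only stands in for the PCRE matching Hobbit uses, and
host_matches is a hypothetical helper, not part of Hobbit.

```python
import re


def host_matches(pattern: str, hostname: str) -> bool:
    """Hypothetical model of HOST= matching, not Hobbit's actual code.

    Assumptions: "*" matches every host, a leading "%" marks a regular
    expression that must cover the whole hostname, and anything else is
    compared literally.
    """
    if pattern == "*":
        return True
    if pattern.startswith("%"):
        # fullmatch() requires the regex to consume the entire hostname
        return re.fullmatch(pattern[1:], hostname) is not None
    return pattern == hostname


if __name__ == "__main__":
    # Compare the rule from the thread with the two suggested fixes
    for pattern in ("%.", "%.*", "*"):
        for host in ("a", "www.example.com"):
            print(f"HOST={pattern!s:4} {host:16} {host_matches(pattern, host)}")
```

Under these assumptions "%." only accepts a one-character hostname such as
"a", while "%.*" and "*" accept both sample hosts.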

Could you try running

   exec ~hobbit/server/bin/bbcmd
   hobbitd_alert --test HOSTNAME conn "" 120 red

That should tell you how the alert is handled, and who gets notified
using what rules.


Regards,
Henrik
list Dan McDonald · Mon, 21 Mar 2005 09:27:46 -0600 ·
I tried a couple of these, and it says it's sending mail to me, but there is
nothing in the log...

Ah wait, here's something in the log: postfix got munged when an updated
mailman rpm was loaded on the box.  But it should have still queued the
message.

I'll see if anything goes down today.  Probably will...
list Asif Iqbal · Mon, 21 Mar 2005 18:42:23 -0500 ·
McDonald,

Your message was top-posted.  Please configure your MUA to quote
correctly before sending messages to mailing lists.  If you don't know what this
means, read this: http://www.faqs.org/docs/jargon/T/top-post.html

To learn what "quote correctly" means, read this:
http://www.netmeister.org/news/learn2quote2.html

If you are using an MS MUA, these free add-on packages can apparently fix
their quoting style for you: http://home.in.tum.de/~jain/software/oe-quotefix/
http://home.in.tum.de/~jain/software/outlook-quotefix/
quoted from Dan McDonald

On Mon, Mar 21, 2005 at 09:27:46AM, McDonald, Dan wrote:
I tried a couple of these, and it says it's sending mail to me, but there is
nothing in the log...
Just try sending a message to your own address from the hobbitd server to
see if you receive it. I do not think your issue is related to hobbitd. I
would ask the postfix mailing list for more help.
quoted from Dan McDonald
Ah wait, here's something in the log: postfix got munged when an updated
mailman rpm was loaded on the box.  But it should have still queued the
message.