Xymon Mailing List Archive

alerts still not alerting

Dan McDonald
Mon, 21 Mar 2005 09:27:46 -0600
Message-Id: <user-d0d2a6275b4c@xymon.invalid>

I tried a couple of these, and it says it's sending mail to me, but there is
nothing in the log...

Ah wait, here's something in the log: postfix got munged when an updated
mailman rpm was loaded on the box.  But it should have still queued the
message.

I'll see if anything goes down today.  Probably will...
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Sunday, March 20, 2005 7:23 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] alerts still not alerting


On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
> I'm still flummoxed by hobbit-alerts.  I'm certain I broke something,
> because I am not getting any alerts from the box.

It's probably a config error ...

> The only logs in /var/log/hobbit/page.log are
> 2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
> 2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument

These are harmless, and often occur when Hobbit is shut down or
restarted.

> I see a couple of those in the hobbitlaunch.log file as well, I also see
> the following error:
> 2005-03-19 10:14:21 Task bbdisplay started with PID 7417
> 2005-03-19 10:14:21 Task bbretest started with PID 7418
> 2005-03-19 10:14:29 Our child has failed and will not talk to us
> 2005-03-19 10:14:36 Our child has failed and will not talk to us

That's a first - and you're right, it should be more detailed in the
error message. I've fixed that. But it generally means that one of the
hobbitd helper tasks has stopped responding.

> Here is a sample host that is not paging.  The info page lists:
> Service  Recipient                            1st Delay  Stop after  Repeat  Time of Day  Colors
> conn     user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
> telnet   user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
>
> Both telnet and conn have been down on this host for over two hours.
>
> The salient rule is:
> HOST=%.
>         MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
> RECOVERED COLOR="red" UNMATCHED

Your "HOST=" is wrong - it will only match hostnames with exactly one
letter (do you really have a host named "a" ?) - if you want to match
all hosts, then it's "HOST=%.*" or the simple form "HOST=*"
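
To make the difference between those patterns concrete, here is a
rough Python sketch. It is not Hobbit's own matching code: the
host_matches helper is hypothetical, Python's re module stands in for
the regex library Hobbit uses, and it assumes (as described above) that
a "%" expression has to cover the whole hostname.

```python
import re

def host_matches(pattern: str, hostname: str) -> bool:
    """Hypothetical stand-in for HOST= matching; not Hobbit's actual code."""
    if pattern == "*":
        # Simple form: matches every host.
        return True
    if pattern.startswith("%"):
        # "%" marks a regular expression. Assumption: the expression
        # must match the entire hostname, so "." means "one character".
        return re.fullmatch(pattern[1:], hostname) is not None
    # Anything else is treated as a literal hostname in this sketch.
    return pattern == hostname

hosts = ["a", "router1", "www.example.com"]
for pat in ("%.", "%.*", "*"):
    print(pat, [h for h in hosts if host_matches(pat, h)])
```

Under those assumptions "%." only picks out the one-character name,
while "%.*" and "*" pick out all three.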

So some other rule must be generating the info-column output you
have, and therefore even if your HOST entry was correct, the rule
would not trigger because of the UNMATCHED restriction.
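
The effect of UNMATCHED can be sketched roughly like this in Python.
This is only an illustration of the behaviour described above, not the
hobbitd_alert implementation; the Rule class, its fields and the
two-pass approach are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    name: str                # hypothetical label for the recipient
    applies: bool            # do the HOST/COLOR/DURATION criteria fit the alert?
    unmatched: bool = False  # does the recipient carry the UNMATCHED keyword?

def recipients_to_notify(rules):
    """Return the rule names that would fire for one alert (sketch only)."""
    # First pass: ordinary rules whose criteria fit the alert.
    fired = [r.name for r in rules if r.applies and not r.unmatched]
    if fired:
        # Something else already handled the alert, so UNMATCHED
        # rules stay silent.
        return fired
    # Second pass: nothing else fired, so UNMATCHED rules get their turn.
    return [r.name for r in rules if r.applies and r.unmatched]

print(recipients_to_notify([Rule("other-rule", True),
                            Rule("catch-all", True, unmatched=True)]))
print(recipients_to_notify([Rule("other-rule", False),
                            Rule("catch-all", True, unmatched=True)]))
```

In the first call the catch-all is skipped because another rule
already matched; in the second it is the only one left to fire.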

Could you try running

   exec ~hobbit/server/bin/bbcmd
   hobbitd_alert --test HOSTNAME conn "" 120 red

That should tell you how the alert is handled, and who gets notified
using what rules.


Regards,
Henrik