I tried a couple of these, and it says it's sending mail to me, but there is
nothing in the log...
Ah wait, here's something in the log: Postfix got munged when an updated
Mailman RPM was loaded on the box. But it should still have queued the
message.
I'll see if anything goes down today. Probably will...
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Sunday, March 20, 2005 7:23 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] alerts still not alerting
On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
I'm still flummoxed by hobbit-alerts. I'm certain I broke something,
because I am not getting any alerts from the box.
It's probably a config error ...
The only entries in /var/log/hobbit/page.log are:
2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument
These are harmless, and often occur when Hobbit is shut down or
restarted.
I see a couple of those in the hobbitlaunch.log file as well. I also see
the following error:
2005-03-19 10:14:21 Task bbdisplay started with PID 7417
2005-03-19 10:14:21 Task bbretest started with PID 7418
2005-03-19 10:14:29 Our child has failed and will not talk to us
2005-03-19 10:14:36 Our child has failed and will not talk to us
That's a first, and you're right: the error message should be more
detailed. I've fixed that. But it generally means that one of the
hobbitd helper tasks has stopped responding.
Here is a sample host that is not paging. The info page lists:
Service  Recipient                            1st Delay  Stop after  Repeat  Time of Day  Colors
conn     user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
telnet   user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
Both telnet and conn have been down on this host for over two hours.
The salient rule is:
HOST=%.
MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
Your "HOST=" setting is wrong: "%." will only match hostnames consisting
of exactly one character (do you really have a host named "a"?). If you
want to match all hosts, use "HOST=%.*" or the simple form "HOST=*".
So some other rule must be generating the info-column output you see.
And because that other rule matches, this rule would not trigger even
if your HOST entry were correct, due to the UNMATCHED restriction.
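To illustrate why "%." selects so few hosts, here is a minimal Python sketch. It assumes, as the explanation of the HOST= setting implies, that the regular expression after the "%" prefix must match the whole hostname, so re.fullmatch is used to model it. The host_matches() helper and the sample hostnames are hypothetical and are not part of Hobbit; the real matching code may differ in details.

```python
import re


def host_matches(rule_value: str, hostname: str) -> bool:
    """Hypothetical helper approximating how a HOST= value selects hosts.

    Assumption: a leading "%" marks the rest as a regex that must match
    the entire hostname; "*" means every host; anything else is treated
    here as a plain hostname compared exactly.
    """
    if rule_value.startswith("%"):
        # "." matches exactly one character, ".*" matches any hostname.
        return re.fullmatch(rule_value[1:], hostname) is not None
    if rule_value == "*":
        return True  # simple wildcard form: every host
    return rule_value == hostname  # plain name: exact match only


if __name__ == "__main__":
    hosts = ["a", "router1", "mailserver"]  # made-up sample hostnames
    for rule in ("%.", "%.*", "*"):
        print(rule, [h for h in hosts if host_matches(rule, h)])
    # %. ['a']
    # %.* ['a', 'router1', 'mailserver']
    # * ['a', 'router1', 'mailserver']
```

Only the single-character host "a" survives the "%." rule, which is why that rule never fires for a host with a normal name.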
Could you try running:
exec ~hobbit/server/bin/bbcmd
hobbitd_alert --test HOSTNAME conn "" 120 red
That should tell you how the alert is handled, and who gets notified
using what rules.
Regards,
Henrik