alerts still not alerting
list Daniel J McDonald
I'm still flummoxed by hobbit-alerts. I'm certain I broke something,
because I am not getting any alerts from the box.
The only logs in /var/log/hobbit/page.log are
2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument
I see a couple of those in the hobbitlaunch.log file as well, I also see
the following error:
2005-03-19 10:14:21 Task bbdisplay started with PID 7417
2005-03-19 10:14:21 Task bbretest started with PID 7418
2005-03-19 10:14:29 Our child has failed and will not talk to us
2005-03-19 10:14:36 Our child has failed and will not talk to us
Not knowing which child makes it difficult to figure out what is going
on. bbpage is apparently running - the logfile says process 5892 is
bbpage, and there is a process 5892 still running.
I fixed the "unmatched" syntax error I had before.
Here is a sample host that is not paging. The info page lists:
Alerting:
Service  Recipient                            1st Delay  Stop after  Repeat  Time of Day  Colors
conn     user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
telnet   user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
Both telnet and conn have been down on this host for over two hours.
The salient rule is:
HOST=%.
MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
I imagine I'm doing something terribly silly, but I'm just not clear
what it might be.
list Asif Iqbal
On Sat, Mar 19, 2005 at 10:33:09AM, Daniel J McDonald wrote:
[...]
The salient rule is:
HOST=%.
MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
Try this
MAIL user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
I imagine I'm doing something terribly silly, but I'm just not clear what it might be.
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"..there are two kinds of people: those who work and those who take the credit...try
to be in the first group;...less competition there." - Indira Gandhi
list Dan McDonald
The salient rule is:
HOST=%.
MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
Try this
MAIL user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
Made the change. Restarted hobbit. Added a new test - it turned red.
Stayed red for 35 minutes. No alerts.
list Asif Iqbal
On Sat, Mar 19, 2005 at 01:49:46PM, McDonald, Dan wrote:
Made the change. Restarted hobbit. Added a new test - it turned red. Stayed red for 35 minutes. No alerts.
Can you take the DURATION parameter out for a sec, run the following test for the hostname.service that is RED, and post the output here:
./bin/bbcmd hobbitd_alert --test hostname service
You should have a line that matches for the RED and shows who should be alerted. I am assuming the hobbitd server can send email out. If that shows OK, I will have to ask you to post the output of this:
./bin/bbcmd hobbitd_alert --dump-config
Thanks
list Henrik Størner
On Sat, Mar 19, 2005 at 10:33:09AM -0600, Daniel J McDonald wrote:
I'm still flummoxed by hobbit-alerts. I'm certain I broke something, because I am not getting any alerts from the box.
It's probably a config error ...
The only logs in /var/log/hobbit/page.log are
2005-03-11 07:49:30 Tried to down BOARDBUSY: Invalid argument
2005-03-14 17:24:21 Tried to down BOARDBUSY: Invalid argument
These are harmless, and often occur when Hobbit is shut down or restarted.
I see a couple of those in the hobbitlaunch.log file as well, I also see the following error:
2005-03-19 10:14:21 Task bbdisplay started with PID 7417
2005-03-19 10:14:21 Task bbretest started with PID 7418
2005-03-19 10:14:29 Our child has failed and will not talk to us
2005-03-19 10:14:36 Our child has failed and will not talk to us
That's a first - and you're right it should be more detailed in the error-message. I've fixed that. But it generally means that one of the hobbitd helper tasks has stopped responding.
Here is a sample host that is not paging. The info page lists:
Service  Recipient                            1st Delay  Stop after  Repeat  Time of Day  Colors
conn     user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
telnet   user-290ce4e24e19@xymon.invalid (R)  30m        -           5d      -            red
Both telnet and conn have been down on this host for over two hours.
The salient rule is:
HOST=%.
MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m
RECOVERED COLOR="red" UNMATCHED
Your "HOST=" is wrong - it will only match hostnames with exactly one letter (do you really have a host named "a" ?) - if you want to match all hosts, then it's "HOST=%.*" or the simple form "HOST=*"
So some other rule must be generating the info-column output you have, and therefore even if your HOST entry was correct, the rule would not trigger because of the UNMATCHED restriction.
Could you try running
exec ~hobbit/server/bin/bbcmd hobbitd_alert --test HOSTNAME conn "" 120 red
That should tell you how the alert is handled, and who gets notified using what rules.
Regards,
Henrik
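The difference between "HOST=%.", "HOST=%.*" and "HOST=*" can be illustrated with a small Python sketch. It is only a model, not Hobbit's own matching code: it assumes, following the description above, that a leading "%" marks a regular expression that has to cover the whole hostname, and it uses Python's re module as a stand-in for the regex library Hobbit uses. The function name host_matches and the sample hostnames are made up for illustration.

```python
import re


def host_matches(pattern: str, hostname: str) -> bool:
    """Rough model of a HOST= test (hypothetical helper, not Hobbit code).

    Assumptions: "*" matches every host, a leading "%" introduces a regex
    that must match the entire hostname, anything else is a literal name.
    """
    if pattern == "*":
        return True
    if pattern.startswith("%"):
        return re.fullmatch(pattern[1:], hostname) is not None
    return pattern == hostname


# Compare the three patterns from the thread against two sample hostnames.
for pattern in ("%.", "%.*", "*"):
    for host in ("a", "router1.example.com"):
        print(f"HOST={pattern:<4} {host:<20} {host_matches(pattern, host)}")
```

Under these assumptions "%." accepts only the one-character name, while "%.*" and "*" accept both, which is consistent with the rule never applying to a normally named host.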
list Dan McDonald
I tried a couple of these, and it says it's sending mail to me, but there is nothing in the log... Ah wait, here's something in the log: postfix got munged when an updated mailman rpm was loaded on the box. But it should have still queued the message. I'll see if anything goes down today. Probably will...
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Sunday, March 20, 2005 7:23 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] alerts still not alerting
[...]
list Asif Iqbal
McDonald,
Your message was top-posted. Please configure your MUA to quote correctly before sending messages to mailing lists.
If you don't know what this means, read this:
http://www.faqs.org/docs/jargon/T/top-post.html
To learn what "quote correctly" means, read this:
http://www.netmeister.org/news/learn2quote2.html
If you are using MS MUA, these free add-on packages can apparently fix their quoting style for you:
http://home.in.tum.de/~jain/software/oe-quotefix/
http://home.in.tum.de/~jain/software/outlook-quotefix/
On Mon, Mar 21, 2005 at 09:27:46AM, McDonald, Dan wrote:
I tried a couple of these, and it says it's sending mail to me, but there is nothing in the log...
Just try sending a message to your own address from the hobbitd server to see if you receive it. I do not think your issue is related to hobbitd. I would ask the postfix mailing list for more help.
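One way to run such a delivery test without involving hobbitd at all is a short Python script that hands a message to the local mail system. This is a minimal sketch under stated assumptions: it assumes the MTA accepts SMTP on localhost port 25, and the function names and the addresses in the usage comment are placeholders, not anything taken from this setup.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text test mail (addresses are supplied by the caller)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Test message from the hobbit server"
    msg.set_content("If this arrives, the local mail system delivers outbound mail.")
    return msg


def send_via_local_mta(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to the MTA; host and port are assumed defaults."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)


# Usage on the server (placeholder addresses):
# send_via_local_mta(build_test_message("hobbit@localhost", "you@example.com"))
```

If the message never arrives, the mail log and the mail queue are the next places to look, since the problem would then lie with the mail system and not with the alert rules.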
Ah wait, here's something in the log: postfix got munged when an updated mailman rpm was loaded on the box. But it should have still queued the message.