TIME alert problems (still)
list Mike Rowell
Fellow Hobbitiers...
Some may remember an issue I raised a few weeks back regarding the
TIME option in the hobbit-alerts config: TIME was not being
honoured, so we were getting alerted during the blackout periods
we had set. This still looks like an issue in the snapshot I
downloaded 3 days after the beta release.
Has anyone got any other ideas? Is TIME only honoured for built-in checks,
or something along those lines? It's the ext scripts that are causing
the alerting (they send their results to hobbit, and hobbit does the
actual alerting).
Some information...
notifications.log
Sun Jun 18 21:19:37 2006 xxxxx.aq (10.6.2.2) support [139] 1150661977 0
Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) sysalert [137] 1150662261 0
Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) support [139] 1150662261 0
Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) sysalert [137] 1150662321 0
Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) support [139] 1150662321 0
Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) sysalert [137] 1150665601 0
Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) support [139] 1150665601 0
hobbit-alerts.cfg
HOST=*
MAIL=sysalert SERVICE=aq FORMAT=PLAIN REPEAT=1h COLOR=yellow
MAIL=support SERVICE=aq COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h TIME=W:0900:1700 STOP
(These are lines 135 and 136. It looks like they are being ignored
entirely, although bb-hostsvc.sh shows them laid out properly, with
the correct blackout times listed against the services.) As you can see
from the information above, even though the aq service is set to alert
only on W(eekdays) between 0900 and 1700, we were still getting alerts over
the weekend.
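One way to confirm that these alerts really fell outside the TIME window is to decode the timestamps in the log. The Python sketch below assumes that the numeric field after the "[rule]" marker is a Unix epoch and that the server ran on BST (UTC+1) at the time; both assumptions agree with the human-readable dates in the log. Treating 0900 and 1700 as inclusive bounds is also an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Hobbit server's local time in June 2006 was BST (UTC+1).
BST = timezone(timedelta(hours=1))

# The distinct epoch values from the notifications.log excerpt above.
EPOCHS = [1150661977, 1150662261, 1150662321, 1150665601]


def in_weekday_window(epoch, start=900, end=1700, tz=BST):
    """Approximate TIME=W:0900:1700: Monday to Friday, HHMM between start and end.

    Treating both bounds as inclusive is an assumption, not documented Hobbit behaviour.
    """
    t = datetime.fromtimestamp(epoch, tz)
    hhmm = t.hour * 100 + t.minute
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    return t.weekday() < 5 and start <= hhmm <= end


for epoch in EPOCHS:
    local = datetime.fromtimestamp(epoch, BST)
    print(local.isoformat(), in_weekday_window(epoch))
```

All four timestamps decode to Sunday 18 June 2006 between 21:19 and 22:20, so none of them falls inside W:0900:1700.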
I have the same problem with another service; this one was just the
easiest to get the information for.
Regards,
Mike Rowell
This email has been scanned for all viruses by the MessageLabs service.
list Henrik Størner
On Mon, Jun 19, 2006 at 09:50:37AM +0100, Mike Rowell wrote:
Sun Jun 18 21:19:37 2006 xxxxx.aq (10.6.2.2) support [139] 1150661977 0
Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) sysalert [137] 1150662261 0
Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) support [139] 1150662261 0
Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) sysalert [137] 1150662321 0
Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) support [139] 1150662321 0
Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) sysalert [137] 1150665601 0
Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) support [139] 1150665601 0
HOST=*
MAIL=sysalert SERVICE=aq FORMAT=PLAIN REPEAT=1h COLOR=yellow
MAIL=support SERVICE=aq COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h TIME=W:0900:1700 STOP
(these are lines 135 and 136 so it looks like it's ignoring them
totally, although in bb-hostsvc.sh it shows them laid out properly with
the correct blackout times listed against the services).
What's on lines 137 and 139 of the hobbit-alerts.cfg file? Those are the lines that trigger these alerts, as evidenced by the "[13x]" in the log entries.
Regards,
Henrik
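Henrik's reading of the log can be checked mechanically. This small Python sketch tallies the bracketed numbers per recipient, assuming (as Henrik describes) that the number in brackets is the hobbit-alerts.cfg line of the rule that produced the alert.

```python
import re
from collections import Counter

# The seven entries from the notifications.log excerpt quoted above.
LOG = """\
Sun Jun 18 21:19:37 2006 xxxxx.aq (10.6.2.2) support [139] 1150661977 0
Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) sysalert [137] 1150662261 0
Sun Jun 18 21:24:21 2006 xxxxx.aq (10.7.2.2) support [139] 1150662261 0
Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) sysalert [137] 1150662321 0
Sun Jun 18 21:25:21 2006 xxxxx.aq (10.8.2.2) support [139] 1150662321 0
Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) sysalert [137] 1150665601 0
Sun Jun 18 22:20:01 2006 xxxxx.aq (10.6.2.2) support [139] 1150665601 0
"""


def rules_fired(log_text):
    """Count (recipient, config line) pairs from the 'recipient [NNN]' part of each entry."""
    pairs = re.findall(r"(\S+)\s+\[(\d+)\]", log_text)
    return Counter((recipient, int(line)) for recipient, line in pairs)


for (recipient, line), count in sorted(rules_fired(LOG).items()):
    print(f"line {line}: {recipient} x{count}")
```

Every entry comes from line 137 or 139; lines 135 and 136 never appear.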
list Mike Rowell
Henrik,
On lines 137 and 139 we have the catch-alls for sysalert and support (support is our red address, and sysalert is where we send both red and yellow).
Mike
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: 19 June 2006 10:33
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] TIME alert problems (still)
list Henrik Størner
On Mon, Jun 19, 2006 at 10:38:17AM +0100, Mike Rowell wrote:
Henrik, On 137 and 139 we have the catch alls for sysalert and support (support is our red address and sysalert is where we send both to).
Well, those catch-all rules are what trigger the alerts you don't want.
They probably have an "UNMATCHED" setting? But that will also cause
them to be applied when the rules above them are skipped because of
time constraints.
In other words, if you have a setup like
HOST=myhost TEST=mytest
MAIL user-c0b4a5e3f417@xymon.invalid TIME=W:0800:1700
HOST=*
MAIL user-9a4e95710e98@xymon.invalid UNMATCHED
then "user-9a4e95710e98@xymon.invalid" will get all myhost.mytest alerts
that happen outside the weekdays-0800-1700 time window.
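To make the ordering concrete, here is a deliberately simplified Python model of the example: recipients are walked top to bottom, one whose TIME does not match is skipped, and an UNMATCHED recipient is used only when nobody earlier got the alert. It ignores HOST/TEST matching, colours and REPEAT, it is not Hobbit's actual code, and the addresses are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Recipient:
    address: str
    # (weekday, hhmm) -> may this recipient be alerted now?  weekday: Monday=0 .. Sunday=6
    allowed: Callable[[int, int], bool]
    unmatched: bool = False  # only used if no earlier recipient got the alert


def always(weekday: int, hhmm: int) -> bool:
    return True


def weekdays_0800_1700(weekday: int, hhmm: int) -> bool:
    # Stand-in for TIME=W:0800:1700; inclusive bounds are an assumption.
    return weekday < 5 and 800 <= hhmm <= 1700


def recipients_for(rules: List[Recipient], weekday: int, hhmm: int) -> List[str]:
    """Walk the recipients in order and return who would be alerted."""
    alerted: List[str] = []
    for r in rules:
        if not r.allowed(weekday, hhmm):
            continue  # skipped because of the time constraint
        if r.unmatched and alerted:
            continue  # UNMATCHED: someone earlier already got this alert
        alerted.append(r.address)
    return alerted


# Hypothetical addresses standing in for the two in the example above.
RULES = [
    Recipient("first@example.com", weekdays_0800_1700),
    Recipient("catchall@example.com", always, unmatched=True),
]

print(recipients_for(RULES, 2, 1000))  # Wednesday 10:00
print(recipients_for(RULES, 6, 2119))  # Sunday 21:19
```

On Wednesday morning only the first address is used. On Sunday evening the first recipient is skipped by its TIME setting, so nothing has matched yet and the UNMATCHED catch-all picks the alert up.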
Regards,
Henrik
list Mike Rowell
Henrik,
So what you're saying is that when you have a TIME blackout window for a service, even if the last rule for that service has STOP after it, the alerts continue until it finds a rule it can send with? If that is what you are saying, it is not something I would be expecting.
Just so you can see, these are the two lines, 137 and 139:
MAIL=user-d5da4a3e59bc@xymon.invalid COLOR=red,yellow REPEAT=1h FORMAT=PLAIN
MAIL=user-fca9e44cc8cf@xymon.invalid COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h
Regards,
Mike
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: 19 June 2006 11:43
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] TIME alert problems (still)
list Henrik Størner
Hi Mike,
On Mon, Jun 19, 2006 at 11:55:53AM +0100, Mike Rowell wrote:
So what you're saying is that when you have a TIME blackout window for a service, even if the last rule for that service has STOP after it, the alerts continue until it finds a rule it can send with?
Yes.
If that is what you are saying, it is not something I would be expecting.
OK, let me try and explain why that is. From your other email I gather your alert configuration (lines 134-139) is like this:
HOST=*
MAIL=sysalert SERVICE=aq FORMAT=PLAIN REPEAT=1h COLOR=yellow
MAIL=support SERVICE=aq COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h TIME=W:0900:1700 STOP
MAIL=user-d5da4a3e59bc@xymon.invalid COLOR=red,yellow REPEAT=1h FORMAT=PLAIN
MAIL=user-fca9e44cc8cf@xymon.invalid COLOR=RED FORMAT=SMS DURATION>5 REPEAT=1h
The STOP keyword means (from the man-page):
"STOP Stop looking for more recipients after this one matches."
So STOP only applies to rules that are positively matched (i.e. rules that
did result in an alert being sent).
If STOP meant "after seeing this rule, whether it matched or not, stop
looking for any more recipients", then your last two lines (the "catch-all"
rules) would never trigger, because there's a STOP rule in front of them.
And that is not what you would expect either.
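A simplified Python model of lines 135-139 shows the effect of "STOP only on a positive match". It is a sketch, not Hobbit's code: DURATION>5, REPEAT and FORMAT are ignored, the TIME window is treated as Monday to Friday with inclusive bounds (an assumption), and weekdays use Python's numbering (Monday=0 .. Sunday=6).

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Rule:
    line: int                 # line number in hobbit-alerts.cfg
    mail: str                 # descriptive label only
    colors: Set[str]
    window: Optional[Tuple[int, int]] = None  # weekday (start, end) HHMM; None = any time
    stop: bool = False

    def matches(self, color: str, weekday: int, hhmm: int) -> bool:
        if color not in self.colors:
            return False
        if self.window is not None:
            start, end = self.window
            return weekday < 5 and start <= hhmm <= end
        return True


def evaluate(rules, color, weekday, hhmm):
    """Return the config lines whose recipients would be alerted."""
    fired = []
    for rule in rules:
        if not rule.matches(color, weekday, hhmm):
            continue  # a non-matching rule, even one with STOP, is simply skipped
        fired.append(rule.line)
        if rule.stop:
            break     # matched rule with STOP: look no further
    return fired


# Lines 135-139 as quoted above.
RULES = [
    Rule(135, "sysalert", {"yellow"}),
    Rule(136, "support", {"red"}, window=(900, 1700), stop=True),
    Rule(137, "systems catch-all", {"red", "yellow"}),
    Rule(139, "support-rightmove catch-all", {"red"}),
]

print(evaluate(RULES, "red", 2, 1000))  # Wednesday 10:00
print(evaluate(RULES, "red", 6, 2119))  # Sunday 21:19
```

On Wednesday at 10:00 a red alert stops at line 136. On Sunday evening line 136 does not match, so its STOP never takes effect and lines 137 and 139 both fire, which is the pattern in the notifications.log above.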
I *think* that what you want is to have "sysalert" and "support" alerted
on weekdays between 0900 and 1700, and the "systems at ..." and "support-rightmove at ..." addresses alerted
outside this time window. May I suggest:
TIME=W:0900:1700 SERVICE=aq
MAIL=sysalert COLOR=yellow FORMAT=PLAIN REPEAT=1h
MAIL=support COLOR=red FORMAT=SMS DURATION>5 REPEAT=1h
EXTIME=W:0900:1700
MAIL=user-d5da4a3e59bc@xymon.invalid COLOR=red,yellow REPEAT=1h FORMAT=PLAIN
MAIL=user-fca9e44cc8cf@xymon.invalid COLOR=red FORMAT=SMS DURATION>5 REPEAT=1h
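The intent of this split can be sketched in Python: one group of recipients inside the office-hours window, the other group outside it. This models EXTIME as simply "not within TIME", which is what the suggestion is meant to express, and it only looks at aq alerts (the suggested second rule has no SERVICE filter). The recipient names are placeholders for the elided addresses; inclusive bounds and Python weekday numbering (Monday=0) are assumptions.

```python
def in_office_hours(weekday, hhmm):
    """TIME=W:0900:1700, modelled as Monday-Friday with inclusive HHMM bounds."""
    return weekday < 5 and 900 <= hhmm <= 1700


# (rule-level time filter, [(recipient, colours)]) mirroring the suggestion above.
SUGGESTED = [
    (in_office_hours, [("sysalert", {"yellow"}), ("support", {"red"})]),
    # What "EXTIME=W:0900:1700" is intended to mean: the same window, inverted.
    (lambda d, t: not in_office_hours(d, t),
     [("systems", {"red", "yellow"}), ("support-rightmove", {"red"})]),
]


def route(color, weekday, hhmm, rules=SUGGESTED):
    """Return the recipients for an aq alert (DURATION and REPEAT ignored)."""
    out = []
    for time_ok, recipients in rules:
        if time_ok(weekday, hhmm):
            out += [name for name, colors in recipients if color in colors]
    return out


print(route("red", 2, 1000))     # Wednesday 10:00
print(route("red", 6, 2119))     # Sunday 21:19
print(route("yellow", 6, 2119))  # Sunday 21:19
```

Because the two rule-level filters are exact opposites in this model, every alert goes to exactly one of the two groups.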
Regards,
Henrik
list Mike Rowell
Thanks for this information, Henrik.
One small problem: I'm running the 4.2 beta snapshot from a few days after release, and I'm getting this in the log files:
2006-06-19 14:56:58 Ignored unknown/unexpected token 'EXTIME=W:0900:1700' at line 131
2006-06-19 14:56:58 Ignored unknown/unexpected token 'EXTIME=*:0200:0700' at line 137
Can you let us know if we need to run the current snapshot to use this feature?
Regards,
Mike
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: 19 June 2006 13:47
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] TIME alert problems (still)
list Henrik Størner
On Mon, Jun 19, 2006 at 03:58:31PM +0100, Mike Rowell wrote:
Thanks for this information, Henrik.
One small problem: I'm running the 4.2 beta snapshot from a few days after release, and I'm getting this in the log files:
2006-06-19 14:56:58 Ignored unknown/unexpected token 'EXTIME=W:0900:1700' at line 131
2006-06-19 14:56:58 Ignored unknown/unexpected token 'EXTIME=*:0200:0700' at line 137
Can you let us know if we need to run the current snapshot to use this feature?
Oops, sorry. There is no "EXTIME" keyword, since it's simple to do the same thing with just TIME:
EXTIME=W:0900:1700
should be
TIME=W:1700:0900,06:0000:2359
Which tells me that EXTIME is more readable, so perhaps I should go and create that one...
Regards,
Henrik
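A quick way to convince yourself that the replacement TIME setting covers everything outside W:0900:1700 is to evaluate both specs for every minute of the week. The Python sketch below is a hypothetical parser, not Hobbit's own: it assumes '*' means every day, 'W' means Monday to Friday, digits are individual days with 0 = Sunday and 6 = Saturday, bounds are inclusive, and a start later than the end (1700:0900) wraps past midnight.

```python
def parse_timespec(spec):
    """Split 'W:1700:0900,06:0000:2359' into (set of days, start, end) tuples."""
    windows = []
    for part in spec.split(","):
        days, start, end = part.split(":")
        if days == "*":
            daynums = set(range(7))
        elif days == "W":
            daynums = {1, 2, 3, 4, 5}       # Monday..Friday with 0 = Sunday
        else:
            daynums = {int(c) for c in days}  # e.g. "06" = Sunday and Saturday
        windows.append((daynums, int(start), int(end)))
    return windows


def within(spec, day, hhmm):
    """True if (day, hhmm) falls inside spec; day uses 0 = Sunday."""
    for daynums, start, end in parse_timespec(spec):
        if day not in daynums:
            continue
        if start <= end:
            if start <= hhmm <= end:
                return True
        elif hhmm >= start or hhmm <= end:  # window wraps past midnight
            return True
    return False


OFFICE = "W:0900:1700"
AFTER_HOURS = "W:1700:0900,06:0000:2359"
minutes = [(d, h * 100 + m) for d in range(7) for h in range(24) for m in range(60)]

gaps = [(d, t) for d, t in minutes if not (within(OFFICE, d, t) or within(AFTER_HOURS, d, t))]
overlap = {t for d, t in minutes if within(OFFICE, d, t) and within(AFTER_HOURS, d, t)}
print(len(gaps), sorted(overlap))
```

Under these assumptions there are no gaps, and the two specs overlap only at the boundary minutes 09:00 and 17:00 on weekdays, because both ends are treated as inclusive.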