Xymon Mailing List Archive

recovered messages/SMS missing

5 messages in this thread

Dominique Frise · Mon, 22 May 2006 11:16:00 +0200
Hi,

We use the RECOVERED keyword for all recipients defined in hobbit-alerts.cfg.

We noticed a problem for hosts where alerting for a given service is excluded during a certain time. When a problem occurs on the service outside the exclusion period, the yellow/red alerts are sent. When the problem is resolved, however, no recovery confirmation (mail or SMS) is sent. This issue does not depend on how long the service was down.


Example configuration and logs:

----hobbit-alerts.cfg----
...
...
# Do not send anything for given service(s) during period of time
HOST=test3 SERVICE=http TIME=*:0305:0315
...
...
# Rules by administrator
HOST=test3
MAIL user-5a72e5dcda3f@xymon.invalid REPEAT=24h RECOVERED
SCRIPT /usr/local/sendsms 0123456789 COLOR=red FORMAT=SMS REPEAT=24h RECOVERED
...
...

-----notification.log-----
Mon May 22 10:23:54 2006 test3.http (13.22.8.8) test.example at com 1148286234 600
Mon May 22 10:24:34 2006 test3.http (13.22.8.8) 0123456789 1148286234 600
...
...

------histfile for test3----------
Last 50 log entries (Full HTML log)
Date 	Status 	Duration
Mon May 22 10:24:15 2006 	green 	0:40:50
Mon May 22 10:23:54 2006 	red 	0:00:21
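Both notification.log lines above appear to carry the same Unix timestamp, 1148286234, in their next-to-last field. The minimal Python sketch below assumes that field is the epoch time at which the event started (this field meaning is inferred from the thread, not taken from Xymon documentation) and decodes it in the +0200 offset shown in the mail headers. The result, 10:23:54, matches the red entry in the histfile.

```python
from datetime import datetime, timedelta, timezone

# The two notification.log lines quoted above.
LOG_LINES = [
    "Mon May 22 10:23:54 2006 test3.http (13.22.8.8) test.example at com 1148286234 600",
    "Mon May 22 10:24:34 2006 test3.http (13.22.8.8) 0123456789 1148286234 600",
]

# +0200, the offset used in the thread's mail headers (CEST).
CEST = timezone(timedelta(hours=2))

def event_time(line: str) -> datetime:
    """Decode the next-to-last field, ASSUMED to be a Unix epoch timestamp.

    Indexing from the end avoids miscounting fields when the recipient
    itself contains spaces, as the obfuscated mail address does.
    """
    return datetime.fromtimestamp(int(line.split()[-2]), tz=CEST)

for line in LOG_LINES:
    print(event_time(line))  # 2006-05-22 10:23:54+02:00 for both lines
```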


Is this a bug, or is something wrong with the exclusion specification?

Thanks

Dominique
UNIL - University of Lausanne
Henrik Størner · Mon, 29 May 2006 22:32:51 +0200
quoted from Dominique Frise
On Mon, May 22, 2006 at 11:16:00AM +0200, Dominique Frise wrote:
> We use the RECOVERED keyword for all recipients defined in hobbit-alerts.cfg.
>
> We noticed a problem for hosts where alerting for a given service is excluded during a certain time. When a problem occurs on the service outside the exclusion period, the yellow/red alerts are sent. When the problem is resolved, however, no recovery confirmation (mail or SMS) is sent. This issue does not depend on how long the service was down.
>
>
> Example configuration and logs:
>
> ----hobbit-alerts.cfg----
> ...
> ...
> # Do not send anything for given service(s) during period of time
> HOST=test3 SERVICE=http TIME=*:0305:0315
> ...
> ...
> # Rules by administrator
> HOST=test3
> MAIL user-5a72e5dcda3f@xymon.invalid REPEAT=24h RECOVERED
> SCRIPT /usr/local/sendsms 0123456789 COLOR=red FORMAT=SMS REPEAT=24h RECOVERED

If I understand your configuration snippet correctly, then this is a configuration error. You shouldn't have rules with no recipients, like the first one you have shown here.

> Is this a bug, or is something wrong with the exclusion specification?

Your exclusion is wrong. It should be (notice the TIME setting):

HOST=test3 TIME=*:0315:0305
   MAIL user-5a72e5dcda3f@xymon.invalid REPEAT=24h RECOVERED
   SCRIPT /usr/local/sendsms 0123456789 COLOR=red FORMAT=SMS REPEAT=24h RECOVERED
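Henrik's fix attaches the recipients to the rule itself and inverts the window: TIME=*:0315:0305 appears to run from 03:15 round to 03:05 the next day, i.e. every time except 03:05 to 03:15. The Python sketch below models that reading of a TIME=days:HHMM:HHMM spec. The helper `time_matches` and its edge rules (inclusive start, exclusive end, wrap-around when the start is later than the end, weekday digits with 0 = Sunday) are assumptions for illustration, not Xymon's implementation.

```python
from datetime import datetime

def time_matches(spec: str, when: datetime) -> bool:
    """Return True if `when` falls inside a TIME=days:HHMM:HHMM window.

    Hypothetical model, not Xymon's code. Assumptions:
      * days is "*" (every day) or a string of digits 0-6, 0 = Sunday,
        checked against the current day only;
      * the window includes its start minute and excludes its end minute;
      * a start later than the end means the window crosses midnight.
    """
    days, start, end = spec.split(":")
    dow = (when.weekday() + 1) % 7  # Python: Monday=0; convert to Sunday=0
    if days != "*" and str(dow) not in days:
        return False
    now = when.hour * 100 + when.minute  # HHMM as an integer, e.g. 10:23 -> 1023
    s, e = int(start), int(end)
    if s <= e:
        return s <= now < e
    return now >= s or now < e  # wrapped window, e.g. 0315 -> 0305 next day

# The original exclusion vs. Henrik's inverted window, at 03:10 and 10:23.
for spec in ("*:0305:0315", "*:0315:0305"):
    for hh, mm in ((3, 10), (10, 23)):
        print(spec, f"{hh:02d}:{mm:02d}", time_matches(spec, datetime(2006, 5, 22, hh, mm)))
```

Under these assumptions the original spec matches only at 03:10, while the inverted spec matches at 10:23, when the red event in the histfile occurred.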


Regards,
Henrik
Dominique Frise · Tue, 30 May 2006 16:32:30 +0200

Thank you for these explanations.

Does that mean it is not possible to write a simple rule that excludes alerts for a given service on all hosts (HOST=*) during a period of time?
Do we really have to write the same exclude/include rules for each host?


Dominique
UNIL - University of Lausanne
Henrik Størner · Tue, 30 May 2006 18:02:28 +0200
quoted from Dominique Frise
On Tue, May 30, 2006 at 04:32:30PM +0200, Dominique Frise wrote:
>> Your exclusion is wrong. It should be (notice the TIME setting):
>>
>> HOST=test3 TIME=*:0315:0305
>>   MAIL user-5a72e5dcda3f@xymon.invalid REPEAT=24h RECOVERED
>>   SCRIPT /usr/local/sendsms 0123456789 COLOR=red FORMAT=SMS REPEAT=24h RECOVERED
> Thank you for these explanations.
>
> Does that mean it is not possible to write a simple rule that excludes alerts for a given service on all hosts (HOST=*) during a period of time?
> Do we really have to write the same exclude/include rules for each host?

Not at all. If that's what you want to do, put this at the top of your rules list:

TIME=*:0305:0315
	IGNORE
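A rule at the top of the list works for all hosts because rules are evaluated in order. The Python sketch below is a simplified model of that idea: the `Rule` class and `recipients_for` function are hypothetical, and the behaviour that an IGNORE recipient in a matching rule stops any later rules from adding recipients is an assumption implied by Henrik's advice, not a transcription of Xymon's code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

def time_matches(spec: str, when: datetime) -> bool:
    """Hypothetical TIME=days:HHMM:HHMM check (start inclusive, end exclusive,
    wrap-around past midnight when start > end); not Xymon's code."""
    days, start, end = spec.split(":")
    dow = (when.weekday() + 1) % 7  # convert Python's Monday=0 to Sunday=0
    if days != "*" and str(dow) not in days:
        return False
    now, s, e = when.hour * 100 + when.minute, int(start), int(end)
    return s <= now < e if s <= e else (now >= s or now < e)

@dataclass
class Rule:
    """One hypothetical alert rule: optional HOST= and TIME= criteria plus recipients."""
    recipients: List[str]
    host: Optional[str] = None  # None matches every host (like HOST=*)
    time: Optional[str] = None  # None matches at any time

def recipients_for(rules: List[Rule], host: str, when: datetime) -> List[str]:
    """Walk the rules top to bottom, collecting recipients from matching rules.

    ASSUMPTION: an IGNORE recipient in a matching rule stops processing,
    so no later rule adds anybody.
    """
    selected: List[str] = []
    for rule in rules:
        if rule.host is not None and rule.host != host:
            continue
        if rule.time is not None and not time_matches(rule.time, when):
            continue
        for recipient in rule.recipients:
            if recipient == "IGNORE":
                return selected
            selected.append(recipient)
    return selected

RULES = [
    Rule(time="*:0305:0315", recipients=["IGNORE"]),  # top of the rules list
    Rule(host="test3", recipients=[
        "MAIL user-5a72e5dcda3f@xymon.invalid",
        "SCRIPT /usr/local/sendsms 0123456789",
    ]),
]

print(recipients_for(RULES, "test3", datetime(2006, 5, 22, 3, 10)))   # []
print(recipients_for(RULES, "test3", datetime(2006, 5, 22, 10, 23)))  # both recipients
```

In this model the per-host rule needs no TIME setting at all, because the IGNORE rule above it already swallows everything during the window.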


Regards,
Henrik
Dominique Frise · Wed, 31 May 2006 10:43:20 +0200

Thanks again for this other tip.

I think I did not fully understand the IGNORE setting :-)

All our rules are now set up (hopefully) correctly, and recovery messages are sent when they should be :-)


Dominique
UNIL - University of Lausanne