Xymon Mailing List Archive

Alert REPEAT not working in 4.3.15.

5 messages in this thread

list Johan Sjöberg · Mon, 10 Feb 2014 07:18:00 +0000 ·
Hi.
A while ago, we upgraded to 4.3.15. It seems the alert REPEAT setting isn't working: only the first alert is sent. We have an on-call person who receives the first alert via SMS after 7 minutes; it should then repeat every 15 minutes. The rest of the team gets their first alert after 22 minutes.

Example conf (e-mail and phone numbers masked):
    SCRIPT /usr/local/xymon/server/ext/html_mail.pl user-76c7fb09b118@xymon.invalid EXSERVICE=conn REPEAT=1d RECOVERED FORMAT=PLAIN
    SCRIPT /usr/local/xymon/server/ext/html_mail.pl user-76c7fb09b118@xymon.invalid SERVICE=conn DURATION>1 REPEAT=1d RECOVERED FORMAT=PLAIN
    SCRIPT /usr/local/bin/sendsms.sh 111111 DURATION>7 FORMAT=SMS REPEAT=15
    SCRIPT /usr/local/bin/sendsms.sh 222222 DURATION>22 FORMAT=SMS REPEAT=15
    SCRIPT /usr/local/bin/sendsms.sh 333333 DURATION>22 FORMAT=SMS REPEAT=15
    SCRIPT /usr/local/bin/sendsms.sh 444444 DURATION>22 FORMAT=SMS REPEAT=15
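
As a rough illustration of what this configuration is expected to do, the sketch below computes when each SMS recipient should be notified, in minutes after the status turns red. It assumes (the thread does not confirm this) that the first alert goes out once the DURATION threshold is reached and that each repeat follows REPEAT minutes after the previous alert to the same recipient. The function name `expected_alert_offsets` and the 60-minute horizon are made up for this example.

```python
def expected_alert_offsets(duration_min, repeat_min, horizon_min):
    """Return the minutes-after-red at which alerts are expected.

    Assumption: first alert at duration_min, then one every repeat_min,
    up to and including horizon_min.
    """
    if repeat_min <= 0:
        raise ValueError("repeat_min must be positive")
    return list(range(duration_min, horizon_min + 1, repeat_min))


# (DURATION, REPEAT) pairs taken from the sendsms.sh SCRIPT lines above.
rules = {
    "111111": (7, 15),
    "222222": (22, 15),
    "333333": (22, 15),
    "444444": (22, 15),
}

for recipient, (duration, repeat) in rules.items():
    print(recipient, expected_alert_offsets(duration, repeat, horizon_min=60))
```

Under these assumptions the on-call number would be alerted at 7, 22, 37 and 52 minutes, and the rest of the team at 22, 37 and 52 minutes.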

From the notification log:

Mon Feb 10 05:43:15 2014 web01.apache2 (123.123.123.123) user-76c7fb09b118@xymon.invalid 1392007395 0
Mon Feb 10 05:51:15 2014 web01.apache2 (123.123.123.123) 111111 1392007875 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 222222 1392008717 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 333333 1392008717 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 444444 1392008717 0

Strangely though, it seems like it was working on Feb 5, which was also after the upgrade. The only change made since then is the patch for xymonnet, and I don't see how this could affect the alerts.

Regards,
Johan
list Henrik Størner · Mon, 10 Feb 2014 10:22:03 +0100 ·
There are no changes to how alerts work in either 4.3.15 or 4.3.16.

I copied your configuration into a 4.3.16 system, and REPEAT is working fine here:

$ tail -f notifications.log
Mon Feb 10 09:39:58 2014 webmail.hswn.dk.conn (0.0.0.0) root[3] 1392021598 500
Mon Feb 10 09:46:16 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392021976 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-3[6] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-4[7] 1392022917 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-3[6] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-4[7] 1392023826 500

(my "root" recipient is your first recipient, the "root-X" recipients are your "111111", "222222" etc. recipients).
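
One way to check a notification log for missing repeats is to measure the gap between consecutive alerts to the same recipient. The sketch below does that for the excerpt above. The field layout (date, host.service, IP, recipient, epoch timestamp, trailing number) is inferred only from the log lines shown in this thread, and the function name `repeat_gaps` is made up for this example.

```python
from collections import defaultdict


def repeat_gaps(log_lines):
    """Map (host.service, recipient) to the gaps in seconds between its alerts.

    Assumed layout per line, inferred from the excerpts in this thread:
    <5 date fields> <host.service> (<ip>) <recipient> <epoch> <number>
    """
    stamps = defaultdict(list)
    for line in log_lines:
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or wrapped lines
        service, recipient, epoch = fields[-5], fields[-3], fields[-2]
        if not epoch.isdigit():
            continue  # not a notification line in the assumed layout
        stamps[(service, recipient)].append(int(epoch))

    gaps = {}
    for key, times in stamps.items():
        times.sort()
        gaps[key] = [later - earlier for earlier, later in zip(times, times[1:])]
    return gaps


# Lines copied from the notifications.log excerpt above (root-1 and root-2 only).
sample = [
    "Mon Feb 10 09:46:16 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392021976 500",
    "Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392022917 500",
    "Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392022917 500",
    "Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392023826 500",
    "Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392023826 500",
]

for (service, recipient), seconds in repeat_gaps(sample).items():
    print(service, recipient, [round(s / 60, 1) for s in seconds], "minutes")
```

Gaps close to 15 minutes indicate that REPEAT=15 is taking effect. A recipient with a single entry yields an empty list, which is what the log from the original report would show.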

You didn't list the history log for the web01.apache2 service. Are you sure that it was red the whole time? Any green status will reset the REPEAT interval, which could explain why you don't see the repeats.

Running xymond_alert with the "--debug" option will log a lot of data about how alert messages are handled. It would be nice to have this if the problem re-occurs.


Regards,
Henrik
list Johan Sjöberg · Mon, 10 Feb 2014 09:47:08 +0000 ·
If it hadn't been red the whole time, the recipients with the 22-minute delay wouldn't have received any alerts. It also happened for two different alerts during the night. I will check whether I can reproduce it by forcing a red alert. Should I add the debug flag to tasks.cfg to enable it?

Regards,
Johan
list Henrik Størner · Mon, 10 Feb 2014 10:51:54 +0100 ·
Either that, or toggle it on the fly with "kill -USR2 `pidof xymond_alert`" (the alert.log file will show that debugging has been enabled).
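
For anyone who wants to script the toggle, here is a minimal Python sketch of the same kill command. It only relies on the `pidof` utility and on SIGUSR2 as mentioned above. The helper names `find_pids` and `send_usr2` are made up for this example, and the script has to run as a user allowed to signal the xymond_alert process.

```python
import os
import signal
import subprocess


def find_pids(process_name):
    """Return the PIDs reported by `pidof`, or an empty list if none are found."""
    try:
        result = subprocess.run(
            ["pidof", process_name], capture_output=True, text=True
        )
    except FileNotFoundError:
        return []  # pidof is not installed on this system
    return [int(pid) for pid in result.stdout.split()]


def send_usr2(pids):
    """Send SIGUSR2 to each PID and return the list of PIDs signalled."""
    for pid in pids:
        os.kill(pid, signal.SIGUSR2)
    return list(pids)


if __name__ == "__main__":
    signalled = send_usr2(find_pids("xymond_alert"))
    print("Sent SIGUSR2 to:", signalled or "no running xymond_alert found")
```

Since the signal toggles debugging, running the script a second time should switch it off again.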


Regards,
Henrik
list Johan Sjöberg · Mon, 10 Feb 2014 10:23:37 +0000 ·
Of course it worked now, so I couldn't get anything useful. Will keep an eye on this.

/Johan