-----Original Message-----
From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of
user-ce4a2c883f75@xymon.invalid
Sent: 10 February 2014 10:22
To: xymon at xymon.com
Subject: Re: [Xymon] Alert REPEAT not working in 4.3.15.
On 2014-02-10 8:18, Johan Sjöberg wrote:
A while ago, we upgraded to 4.3.15. It seems like the alert repeat
setting isn't working, only the first alert is sent. We have an
on-call person that receives the first alert via SMS after 7 minutes.
It should then repeat every 15 minutes. The rest of the team gets
their first alert after 22 minutes.
[snip config]
From the notification log:
Mon Feb 10 05:43:15 2014 web01.apache2 (123.123.123.123) user-76c7fb09b118@xymon.invalid 1392007395 0
Mon Feb 10 05:51:15 2014 web01.apache2 (123.123.123.123) 111111 1392007875 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 222222 1392008717 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 333333 1392008717 0
Mon Feb 10 06:05:17 2014 web01.apache2 (123.123.123.123) 444444 1392008717 0
Strangely though, it seems like it was working on Feb 5, which was
also after the upgrade. The only change made since then is the patch
for xymonnet, and I don't see how this could affect the alerts.
There are no changes to how alerts work in either 4.3.15 or 4.3.16.
I copied your configuration into a 4.3.16 system, and REPEAT is working fine
here:
$ tail -f notifications.log
Mon Feb 10 09:39:58 2014 webmail.hswn.dk.conn (0.0.0.0) root[3] 1392021598 500
Mon Feb 10 09:46:16 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392021976 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-3[6] 1392022917 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-4[7] 1392022917 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-2[5] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-3[6] 1392023826 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-4[7] 1392023826 500
(my "root" recipient is your first recipient, the "root-X" are your "111111",
"222222" etc. recipients).
You didn't list the history log for the web01.apache2 service. Are you sure
that it was red all of the time? Any green status will reset the REPEAT
interval, which could explain why you don't see the repeats.
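
One way to check whether repeats are arriving at the expected spacing is to
take the epoch timestamps from the notification log and compute the gap
between consecutive entries for each recipient. The following is only a rough
sketch, not part of Xymon: the function name notification_gaps and the regular
expression are made up for illustration, and the field layout is assumed from
the sample lines in this thread (date, host.service, IP in parentheses,
recipient, epoch time, and a trailing number that is ignored here). It also
assumes each entry is on a single line, as it is in the log file itself.

```python
import re
from collections import defaultdict

# Assumed layout of one notification log entry, inferred from the samples
# in this thread (not taken from Xymon documentation):
#   <date> <host.service> (<ip>) <recipient> <epoch> <number>
LINE_RE = re.compile(
    r"^(?P<date>\w{3} \w{3} +\d+ [\d:]{8} \d{4}) "
    r"(?P<target>\S+) \((?P<ip>[^)]*)\) "
    r"(?P<recipient>\S+) (?P<epoch>\d+) (?P<last>\d+)$"
)


def notification_gaps(lines):
    """Map (host.service, recipient) to the seconds between consecutive alerts."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore anything that does not look like a log entry
        key = (match["target"], match["recipient"])
        times[key].append(int(match["epoch"]))

    gaps = {}
    for key, stamps in times.items():
        stamps.sort()
        gaps[key] = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return gaps


# Three entries for one recipient, copied from the log excerpt in this thread.
SAMPLE = """\
Mon Feb 10 09:46:16 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392021976 500
Mon Feb 10 10:01:57 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392022917 500
Mon Feb 10 10:17:06 2014 webmail.hswn.dk.conn (0.0.0.0) root-1[4] 1392023826 500
""".splitlines()

if __name__ == "__main__":
    for (target, recipient), deltas in notification_gaps(SAMPLE).items():
        print(target, recipient, deltas)
```

For the three root-1[4] entries, the gaps are 941 and 909 seconds, i.e.
roughly the configured 15 minutes. A recipient with only one entry gets an
empty list, which is what a missing repeat would look like.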
Running xymond_alert with the "--debug" option will log a lot of data about
how alert messages are handled. It would be nice to have this if the problem
re-occurs.
Regards,
Henrik