Xymon Mailing List Archive

Alert repeat time not interrupted when going yellow->red

Japheth Cleaver
Thu, 29 Oct 2015 08:07:34 -0700
Message-Id: <user-8db0ba22b5b9@xymon.invalid>

Hmm. Any ideas on what differs between the times it happens and the times
it doesn't?

Looking over the code, the only thing that catches my eye is how recipient
interactions might differ when there are two different 'unmatched' lines in
the config. I'd expect that to *always* cause a problem, though, not just
sometimes.


Any chance you could run xymond_alert in debug mode (perhaps with a trace
file) for a while and see if you can catch it when a status gets worse?
I'm curious how far into processing it gets.


Regards,
-jc


On Thu, October 29, 2015 5:59 am, user-7adce57665bb@xymon.invalid wrote:
Hmmm.... Anyone?

Thanks,
John
Upcoming PTO:
John Rothlisberger
IT Strategy, Infrastructure & Security - Technology Growth Platform
TGP for Business Process Outsourcing
Accenture
XXX.XXX.XXXX office

From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Rothlisberger, John R.
Sent: Tuesday, October 20, 2015 11:13 AM
To: xymon at xymon.com
Subject: [Xymon] Alert repeat time not interrupted when going yellow->red

I am wondering if anyone has seen this before. To be honest, it doesn't
happen all the time.

Xymon 4.3.21 running on either Ubuntu or RHEL (happens on both).

If I have a rule for a warning (yellow) with a REPEAT interval set, and the
status then goes to alert (red), the yellow rule's REPEAT interval has to
expire before the red rule is applied.
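What I expected can be written down as a small simulation. This is a hypothetical model only, not xymond_alert's actual logic: each recipient has its own color and REPEAT interval, any color change starts a fresh alert cycle, and DURATION and UNMATCHED are ignored. The recipient names and intervals come from the alerts.cfg example; `expected_sends` and the sample timeline are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule table mirroring the two SCRIPT lines in alerts.cfg:
# recipient name -> (color it handles, REPEAT interval in seconds).
# DURATION and UNMATCHED are deliberately left out of this model.
RULES: Dict[str, Tuple[str, int]] = {
    "disk_alert": ("red", 15 * 60),
    "disk_warn": ("yellow", 8 * 3600),
}


def expected_sends(events: List[Tuple[int, str]],
                   rules: Dict[str, Tuple[str, int]]) -> List[Tuple[int, str]]:
    """Return (time, recipient) pairs under the expected behavior.

    events is a time-ordered list of (seconds, color) status reports.
    A recipient fires as soon as its color becomes current, then again
    only after its own REPEAT interval; any color change clears all timers.
    """
    last_sent: Dict[str, int] = {}
    prev_color: Optional[str] = None
    sends: List[Tuple[int, str]] = []
    for now, color in events:
        if color != prev_color:
            last_sent.clear()  # a status change starts a fresh alert cycle
            prev_color = color
        for name, (rule_color, repeat) in rules.items():
            if rule_color != color:
                continue
            last = last_sent.get(name)
            if last is None or now - last >= repeat:
                sends.append((now, name))
                last_sent[name] = now
    return sends


# t=0 is the yellow alert at 00:31; the disk turns red at 03:10 (t=9540 s).
events = [(0, "yellow"), (300, "yellow"), (9540, "red"), (9840, "red"), (10500, "red")]
print(expected_sends(events, RULES))
# [(0, 'disk_warn'), (9540, 'disk_alert'), (10500, 'disk_alert')]
```

Under this model the red recipient fires on the first red report and then every 15 minutes, independent of the 8-hour yellow timer.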

Example:

Notification.log:
Tue Oct 20 00:31:09 2015 clientbox.disk (x.x.x.x) disk_warn 1445315463 100
<8 hours pass>
Tue Oct 20 08:31:11 2015 clientbox.disk (x.x.x.x) disk_alert 1445344265 100

Alerts.cfg:
PAGE=CLIENTPAGE COLOR=red,yellow,purple
   SCRIPT /home/xymon/server/ext/pg/my_alert_script disk_alert DURATION>20 REPEAT=15m COLOR=red SERVICE=disk FORMAT=TEXT UNMATCHED
   SCRIPT /home/xymon/server/ext/pg/my_warn_script disk_warn DURATION>30 REPEAT=8h COLOR=yellow SERVICE=disk FORMAT=TEXT UNMATCHED
STOP

The disk went red at 03:10:00, but instead of sending an alert as soon as
the status changed, Xymon waited out the full yellow REPEAT interval
(8 hours) and only then sent the red alert.
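The notification.log timestamps bear this out: the red alert went out one REPEAT=8h interval (plus two seconds) after the yellow one, not at 03:10. A quick Python check, using the two log lines from the excerpt with the mail-wrapped "100" rejoined; the meaning of the numeric field before "100" is not assumed here, only its difference.

```python
from datetime import datetime

# The two notification.log lines from the report.
LOG = [
    "Tue Oct 20 00:31:09 2015 clientbox.disk (x.x.x.x) disk_warn 1445315463 100",
    "Tue Oct 20 08:31:11 2015 clientbox.disk (x.x.x.x) disk_alert 1445344265 100",
]


def log_time(line: str) -> datetime:
    # The first five whitespace-separated fields form the ctime-style timestamp.
    return datetime.strptime(" ".join(line.split()[:5]), "%a %b %d %H:%M:%S %Y")


gap = log_time(LOG[1]) - log_time(LOG[0])
print(gap)                              # 8:00:02
print(gap.total_seconds() - 8 * 3600)   # 2.0

# The numeric field before "100" differs by the same number of seconds.
print(int(LOG[1].split()[-2]) - int(LOG[0].split()[-2]))  # 28802
```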

I have seen this before, and it drives me nuts, but it doesn't happen
every time.

Ideas?
Thanks,
John

