Xymon Mailing List Archive

Alert repeat time not interrupted when going yellow->red

4 messages in this thread

John Rothlisberger · Tue, 20 Oct 2015 16:13:27 +0000
I am wondering whether anyone has seen this before. To be honest, it doesn't happen every time.

Xymon 4.3.21 running on either Ubuntu or RHEL (happens on both).

If I have a warning rule with a REPEAT interval set, and the status escalates from warning (yellow) to alert (red), the yellow rule's REPEAT interval has to expire before the rule for red is applied.

Example:

Notification.log:
Tue Oct 20 00:31:09 2015 clientbox.disk (x.x.x.x) disk_warn 1445315463 100
<8 hours pass>
Tue Oct 20 08:31:11 2015 clientbox.disk (x.x.x.x) disk_alert 1445344265 100

Alerts.cfg:
PAGE=CLIENTPAGE COLOR=red,yellow,purple
   SCRIPT /home/xymon/server/ext/pg/my_alert_script disk_alert DURATION>20 REPEAT=15m COLOR=red SERVICE=disk FORMAT=TEXT UNMATCHED
   SCRIPT /home/xymon/server/ext/pg/my_warn_script disk_warn DURATION>30 REPEAT=8h COLOR=yellow SERVICE=disk FORMAT=TEXT UNMATCHED
STOP
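The behavior John expected can be modeled in a few lines of Python: each recipient line keeps its own REPEAT timer, so a recent yellow notification should not hold back the red rule. This is a hypothetical sketch, not Xymon's implementation; the `Rule` and `due_rules` names, the minute-based clock, and treating DURATION as minutes spent in the current color are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    # Hypothetical stand-in for one recipient line in alerts.cfg.
    name: str          # recipient label, e.g. disk_alert
    color: str         # COLOR= value the line matches
    duration_min: int  # DURATION> threshold in minutes (assumed unit)
    repeat_min: int    # REPEAT= interval in minutes


RULES = [
    Rule("disk_alert", "red", 20, 15),       # REPEAT=15m
    Rule("disk_warn", "yellow", 30, 8 * 60), # REPEAT=8h
]


def due_rules(rules: List[Rule], color: str, minutes_in_color: int,
              now_min: int, last_sent: Dict[str, int]) -> List[str]:
    """Return the names of rules that should fire at now_min.

    Assumed model: every rule has its own entry in last_sent, so the
    yellow rule's REPEAT timer never delays the red rule.
    """
    due = []
    for rule in rules:
        # Skip lines whose COLOR does not match or whose DURATION is not yet met.
        if rule.color != color or minutes_in_color <= rule.duration_min:
            continue
        previous: Optional[int] = last_sent.get(rule.name)
        # Fire if this rule never sent anything, or its own REPEAT has elapsed.
        if previous is None or now_min - previous >= rule.repeat_min:
            due.append(rule.name)
    return due
```

With the times from the post (yellow notification at 00:31, i.e. minute 31; disk red from 03:10, i.e. minute 190), `due_rules(RULES, "red", 21, 211, {"disk_warn": 31})` returns `["disk_alert"]`, so under these assumptions the red rule is due at 03:31 rather than at 08:31 as in the log.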

The disk went red at 03:10:00, but instead of sending an alert as soon as the status changed, Xymon waited out the full REPEAT interval of the yellow rule (8 hours) and only then sent the red alert.
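The timing in the log excerpt can be checked with a short Python script. It assumes the first five tokens of each line are a ctime-style timestamp, the eighth token is the recipient, and the ninth is a Unix timestamp; those field meanings are assumptions, and the script relies only on the two lines quoted in the post.

```python
from datetime import datetime

# The two notification log lines quoted in the post (IP redacted as in the original).
LINES = [
    "Tue Oct 20 00:31:09 2015 clientbox.disk (x.x.x.x) disk_warn 1445315463 100",
    "Tue Oct 20 08:31:11 2015 clientbox.disk (x.x.x.x) disk_alert 1445344265 100",
]
REPEAT_SECONDS = 8 * 60 * 60  # REPEAT=8h on the yellow (disk_warn) rule


def parse(line):
    """Return (wall-clock datetime, recipient, numeric field) for one log line.

    Assumption: tokens 0-4 are a ctime-style timestamp, token 7 is the
    recipient and token 8 is a Unix timestamp. The last field is ignored.
    """
    tokens = line.split()
    stamp = datetime.strptime(" ".join(tokens[:5]), "%a %b %d %H:%M:%S %Y")
    return stamp, tokens[7], int(tokens[8])


def gaps():
    """Seconds between the warn and alert lines, by wall clock and by epoch field."""
    (warn_time, _, warn_epoch), (alert_time, _, alert_epoch) = map(parse, LINES)
    wall_gap = int((alert_time - warn_time).total_seconds())
    return wall_gap, alert_epoch - warn_epoch


if __name__ == "__main__":
    wall_gap, epoch_gap = gaps()
    print(f"wall clock: {wall_gap} s, epoch field: {epoch_gap} s, "
          f"over REPEAT by {epoch_gap - REPEAT_SECONDS} s")
```

Running it prints `wall clock: 28802 s, epoch field: 28802 s, over REPEAT by 2 s`: the red notification went out two seconds after the yellow rule's 8-hour REPEAT expired, which matches the description above.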

I have seen this before, and it drives me nuts, but it doesn't happen every time.

Ideas?
Thanks,
John
Upcoming PTO:
John Rothlisberger
IT Strategy, Infrastructure & Security - Technology Growth Platform
TGP for Business Process Outsourcing
Accenture
XXX.XXX.XXXX office


This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.

www.accenture.com
John Rothlisberger · Thu, 29 Oct 2015 12:59:01 +0000
Hmmm.... Anyone?

Thanks,
John
Japheth Cleaver · Thu, 29 Oct 2015 08:07:34 -0700
Hmm. Any ideas on what differs between the times it happens and the
times it doesn't?

Looking over the code, the only thing that catches my eye is how recipient
interactions might differ if there are two different 'unmatched' lines in
the config. I'd expect that to *always* cause a problem, though, not
just sometimes.


Any chance you could put xymond_alert into debug mode (perhaps with a
trace file) for a bit and see whether you can catch xymond_alert when
a status gets worse? I'm curious how far it gets into processing.


Regards,
-jc
John Rothlisberger · Fri, 30 Oct 2015 12:40:29 +0000
A while back I ran into problems where multiple rules were being run when only one should have been. The only way I was able to correct that was by adding UNMATCHED to every line. I didn't like it then and I don't like it now, so I'm willing to remove those to see what happens.
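The alerts.cfg man page describes UNMATCHED roughly as: send the alert to this recipient only if no other recipient received an alert for the event. The Python sketch below models that reading to show why putting UNMATCHED on every line keeps more than one line from firing for the same event. It is a simplified, hypothetical model: the `Recipient` tuple, the `fire` function, the recipient names, and strict top-to-bottom evaluation are assumptions, not Xymon's actual code.

```python
from typing import FrozenSet, List, NamedTuple


class Recipient(NamedTuple):
    # Hypothetical, simplified view of one recipient line.
    name: str
    colors: FrozenSet[str]  # COLOR= values this line matches
    unmatched: bool         # True if the line carries UNMATCHED


def fire(recipients: List[Recipient], color: str) -> List[str]:
    """Return the recipients that get an alert for an event of `color`.

    Assumed semantics: lines are checked top to bottom, and an UNMATCHED
    line only fires if no earlier line has fired for this event.
    """
    sent: List[str] = []
    for r in recipients:
        if color not in r.colors:
            continue  # COLOR does not match this line
        if r.unmatched and sent:
            continue  # someone else already got this event
        sent.append(r.name)
    return sent


# Two overlapping (hypothetical) lines, without and with UNMATCHED.
PLAIN = [Recipient("oncall", frozenset({"red"}), False),
         Recipient("team", frozenset({"red", "yellow"}), False)]
GUARDED = [r._replace(unmatched=True) for r in PLAIN]
```

Here `fire(PLAIN, "red")` alerts both `oncall` and `team`, while `fire(GUARDED, "red")` alerts only `oncall`. In this model John's two disk rules never compete for the same event, because their COLOR= values do not overlap, so the model by itself does not explain the intermittent delay.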

As for running in debug mode, that may just create too much noise.

I will start by removing UNMATCHED from each line and see where that takes me.

Thanks,
John