Xymon Mailing List Archive

DURATION not working as expected

3 messages in this thread

Johan Sjöberg · Tue, 22 Mar 2016 14:39:12 +0000
Hi,
We are having a problem with the DURATION setting in alerts.cfg: it does not work the way we expect. DURATION counts the time since the test went from an OKCOLOR to an ALERTCOLOR, not the time since the alert-triggering color started. This means that with the default setting, where yellow is an ALERTCOLOR, an SMS alert configured for red with a certain DURATION is sent to the escalation teams immediately if a test has been yellow for longer than DURATION and then goes red. If I configure an alert for red, I would expect DURATION to start counting from when the test turned red.
Is there some easy hack I can do to change this behavior? Our current workaround is to make yellow an OKCOLOR, but that means we cannot set up any alerts for yellow, and we cannot ack yellow tests.

Regards,
Johan
Japheth Cleaver · Wed, 23 Mar 2016 11:57:38 -0700
Hi Johan,

I can confirm that this is the case for DURATION. It's a reflection of the
fact that the timing is compared against the alert record as a whole
instead of the per-recipient record (where REPEAT values are stored, for
example).
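
To illustrate the difference between the two timing bases, here is a minimal Python sketch. It is a model of the behavior described in this thread, not Xymon's actual code: the function names, the list-of-events representation, and the simplified set of alert colors are all assumptions made for the example.

```python
# Illustrative model only; this is NOT Xymon source code.
# Events are (minute, color) pairs in chronological order.
ALERT_COLORS = {"yellow", "red"}  # simplified assumption: yellow and red are alert colors


def minutes_since_alert_began(events, now):
    """Behavior described in the thread: time since the test left the OK colors."""
    start = None
    for minute, color in events:
        if color in ALERT_COLORS:
            if start is None:
                start = minute  # first non-OK color starts the alert record
        else:
            start = None  # returning to an OK color clears the alert record
    return None if start is None else now - start


def minutes_in_current_color(events, now):
    """Behavior the poster expects: time since the current color began."""
    start = None
    previous = None
    for minute, color in events:
        if color != previous:
            start = minute
            previous = color
    if previous not in ALERT_COLORS:
        return None  # not currently in an alert color
    return now - start


def should_notify(elapsed, duration):
    """Hypothetical stand-in for a 'DURATION>duration' condition on a recipient."""
    return elapsed is not None and elapsed > duration


# Green at minute 0, yellow at minute 10, red at minute 50; checked at minute 51
# against a red-only recipient with a 30-minute DURATION.
events = [(0, "green"), (10, "yellow"), (50, "red")]
now = 51
duration = 30

print(should_notify(minutes_since_alert_began(events, now), duration))  # True
print(should_notify(minutes_in_current_color(events, now), duration))   # False
```

With the first function the red recipient is notified one minute after the test turns red, because 41 minutes have passed since it left green. With the second, the notification would wait until the test has been red for more than 30 minutes.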

I *think* I've got a fix for this, but I'm hesitant to put it into 4.3 as
this may be behavior that's being relied on by folks (in the same way that
REPEAT values *are* cleared when an alert escalates from yellow->red).
This should be fixed at the revision release, however.

I'm a bit surprised it hasn't been noticed before.


Regards,
-jc
Johan Sjöberg · Thu, 24 Mar 2016 10:19:43 +0000
Hi,

That sounds good. Do you have any idea when this might be available in a "stable" version?
Is it a big change, or, if you already have the code, something we might be able to change and recompile ourselves?

We actually noticed this when we started using Xymon many years ago, but then we "solved" it by making yellow an OKCOLOR. Now we wanted to change our way of working and be able to ack yellow, and then we re-discovered this forgotten problem :)

Regards,
Johan