Xymon Mailing List Archive

xymon_4.3.0-RC1: possible lost alerts

Dominique Frise
Mon, 14 Feb 2011 12:21:14 +0100
Message-Id: <user-6eb5770a86f9@xymon.invalid>

On 02/14/11 11:00 AM, Henrik Størner wrote:
> In <user-f44f191b2358@xymon.invalid> Dominique Frise <user-78ab6673b600@xymon.invalid> writes:
>> I think I found a bug in xymond_alert.c.
>> Let's say there is a page message for hostA.serviceA, and this alert is
>> not processed immediately because of this part of the code (lines
>> 816-823 of xymond_alert.c in 4.3.0-RC1):
>>
>>     /*
>>      * When a burst of alerts happen, we get lots of alert messages
>>      * coming in quickly. So lets handle them in bunches and only
>>      * do the full alert handling once every 10 secs - that lets us
>>      * combine a bunch of alerts into one transmission process.
>>      */
>>     if (nowtimer < (lastxmit+10)) continue;
>>     lastxmit = nowtimer;
>>
>> The main loop then waits for a new message from xymond ("Want msg <num>,
>> startpos ..." etc.).
>> If the next message is a page recovery for the same hostA.serviceA,
>> the next pass over the active alerts (the for loop) cleans up the
>> alert for hostA.serviceA without ever sending an alert.
> I haven't tested your diagnosis, but it is probably correct
> (from how I remember this code working).
>
> But is it a problem?
>
> If you get an alert that clears a few seconds later (that is why there
> is a recovery message), then what is the point of sending an alert?
> The notification would be for data that is no longer valid, and
> personally I would rather NOT be alerted at 3 AM if the problem no
> longer exists.
>
> So I am tempted to invoke the old "this is not a bug, it's a feature!"
> meme :-)

I think the problem is rather that the behaviour is not deterministic.
Some alert/recovery transitions get through (if the alert enters the
alert-processing loop without waiting), while others are lost (if the
alert and its recovery are handled in the same pass).

Dominique