Xymon Mailing List Archive

A different RECOVERED message problem!

4 messages in this thread

list Sebastian Auriol · Tue, 16 Jun 2009 15:40:39 +0100 ·
Hi all,

 
This seems to be a bug (or at least a flaw) in the alerting system (on
hobbit trunk from Dec 2008):  a recovery is sent for a MAIL alert that never
actually sent an alert in the first place.  The last two lines in the
notifications.log are recoveries that were sent at the exact same time.  But
only 1 alert (alarm) was actually sent out.

 
[root@IVRA1 log]# tail /var/log/hobbit/notifications.log -n 3
Tue Jun 16 14:46:49 2009 Db1.Special2 (192.168.4.xx) meATmyCompanyDOTcom[191] 1245160009 0
Tue Jun 16 14:48:43 2009 Db1.Special2 (192.168.4.xx) meATmyCompanyDOTcom[191] 1245160123 0 492
Tue Jun 16 14:48:43 2009 Db1.Special2 (192.168.4.xx) meATmyCompanyDOTcom[192] 1245160123 0 492
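
For anyone who wants to check their own notifications.log for the same pattern, here is a minimal Python sketch. It rests on assumptions inferred only from the excerpt above, not from the Hobbit source: each log entry is a single line (the excerpt may appear wrapped in the mail), the number in square brackets identifies the alert rule, and an entry with one extra trailing field is a recovery. The function name `orphan_recoveries` is made up for this example.

```python
import re

# Assumed layout, inferred from the excerpt in this thread:
#   <timestamp> <host>.<test> (<ip>) <recipient>[<rule id>] <epoch> <n> [<extra>]
# An entry carrying the trailing <extra> field is treated as a recovery.
LINE_RE = re.compile(
    r"^(?P<when>.+?) (?P<hostsvc>\S+) \((?P<ip>[^)]*)\) "
    r"(?P<rcpt>[^\[\s]+)\[(?P<rule>\d+)\] (?P<epoch>\d+) (?P<num>\d+)"
    r"(?: (?P<extra>\d+))?$"
)

def orphan_recoveries(lines):
    """Return (hostsvc, recipient, rule) for recoveries with no earlier alert."""
    alerted = set()   # keys that currently have an alert outstanding
    orphans = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        key = (m["hostsvc"], m["rcpt"], int(m["rule"]))
        if m["extra"] is None:
            alerted.add(key)        # alert entry
        elif key in alerted:
            alerted.discard(key)    # recovery that matches an earlier alert
        else:
            orphans.append(key)     # recovery with nothing to recover from
    return orphans

sample = [
    "Tue Jun 16 14:46:49 2009 Db1.Special2 (192.168.4.xx) meATmyCompanyDOTcom[191] 1245160009 0",
    "Tue Jun 16 14:48:43 2009 Db1.Special2 (192.168.4.xx) meATmyCompanyDOTcom[191] 1245160123 0 492",
    "Tue Jun 16 14:48:43 2009 Db1.Special2 (192.168.4.xx) meATmyCompanyDOTcom[192] 1245160123 0 492",
]
print(orphan_recoveries(sample))
```

On the three lines from this report, the only entry flagged is the recovery tagged [192], which has no matching alert before it.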

 
The reason why this happened may be seen from the hobbit-alerts.cfg file.
One of the alert lines triggers on persistent alarms (including yellow
alarms), while the other is immediate but for red only.  Both have RECOVERED
alerts.  The test was only red for 2 minutes, so only the first MAIL rule
fired initially.  It seems that the RECOVERED part does not check that this
line actually triggered an alert before sending the recovered message!

 
HOST=Db1 SERVICE=Special2
     MAIL meATmyCompanyDOTcom COLOR=red REPEAT=30 RECOVERED
     MAIL meATmyCompanyDOTcom COLOR=yellow,red DURATION>10 REPEAT=30 RECOVERED
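
To make the behaviour I expected concrete, here is a rough Python sketch of the check I think is missing. It is not how the Hobbit alert module is written; `RecoveryTracker` and its methods are hypothetical names, and I am assuming that the ids 191 and 192 seen in the log correspond to the first and second MAIL lines respectively.

```python
class RecoveryTracker:
    """Remember which (host, test, rule) combinations actually sent an alert."""

    def __init__(self):
        self._fired = set()

    def alert_sent(self, host, test, rule_id):
        # Call this at the moment a rule really delivers an alert.
        self._fired.add((host, test, rule_id))

    def should_send_recovery(self, host, test, rule_id):
        """True only if this rule alerted for this event; clears the record."""
        key = (host, test, rule_id)
        if key in self._fired:
            self._fired.remove(key)
            return True
        return False


tracker = RecoveryTracker()
# The immediate, red-only rule (assumed to be id 191) fired.
tracker.alert_sent("Db1", "Special2", 191)
# The DURATION>10 rule (assumed to be id 192) never fired: red lasted ~2 minutes.

sent_191 = tracker.should_send_recovery("Db1", "Special2", 191)
sent_192 = tracker.should_send_recovery("Db1", "Special2", 192)
print(sent_191, sent_192)  # True False
```

With a check like this, the second rule would stay silent on recovery because it never sent anything to recover from.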

 
There haven't been any changes to the alerting system since December, right?
Should I file this as a bug anywhere else?

 
Kind regards, 

 
SebA
list Alan Sparks · Tue, 16 Jun 2009 20:45:12 -0600 ·
SebA wrote:
> Hi all,
>
> This seems to be a bug (or at least a flaw) in the alerting system (on
> hobbit trunk from Dec 2008):  a recovery is sent for a MAIL alert that
> never actually sent an alert in the first place.  The last two lines
> in the notifications.log are recoveries that were sent at the exact
> same time.  But only 1 alert (alarm) was actually sent out.

I have a feeling you're seeing an artifact of some oddities in how the
alerts module matches up events and paging rules.  You can see my
analysis writeup here (however, I never solved the problem):
http://www.hswn.dk/hobbiton/2008/07/msg00132.html
-Alan
list Sebastian Auriol · Mon, 22 Jun 2009 15:42:56 +0100 ·
Alan Sparks <mailto:user-8f2174fd8b66@xymon.invalid> wrote:
> SebA wrote:
>> This seems to be a bug (or at least a flaw) in the alerting system
>> (on hobbit trunk from Dec 2008):  a recovery is sent for a MAIL
>> alert that never actually sent an alert in the first place.  The
>> last two lines in the notifications.log are recoveries that were
>> sent at the exact same time.  But only 1 alert (alarm) was actually
>> sent out.
>
> I have a feeling you're seeing an artifact of some oddities in how the
> alerts module matches up events and paging rules.  You can see my
> analysis writeup here (however, I never solved the problem):
> http://www.hswn.dk/hobbiton/2008/07/msg00132.html
> -Alan

Yes, it looks like you're right, Alan. I really think the alerts module needs
a bit of a rewrite... I've already posted about several flaws in it over the
past year... It just doesn't seem to be possible to get it doing what I want
(and expect) for several different tests / scenarios (at the same time).

SebA
list Alan Sparks · Thu, 25 Jun 2009 13:23:28 -0600 ·
SebA wrote:
> Alan Sparks <mailto:user-8f2174fd8b66@xymon.invalid> wrote:
>> I have a feeling you're seeing an artifact of some oddities in how the
>> alerts module matches up events and paging rules.  You can see my
>> analysis writeup here (however, I never solved the problem):
>> http://www.hswn.dk/hobbiton/2008/07/msg00132.html
>> -Alan
>
> Yes, it looks like you're right Alan. I really think the alerts module needs
> a bit of a rewrite... I've already posted about several flaws in it over the
> past year... It just doesn't seem to be possible to get it doing what I want
> (and expect) for several different tests / scenarios (at the same time).
>
> Seb

Would tend to agree, due to issues with RECOVERED and DURATION
processing, and the seeming conflict of interest between the alerts and
the flap suppression algorithms.  I have on occasion considered
outboarding the alert processing, and just having Hobbit send alerts for
anything to a custom script, where I could exert finer control.
-Alan
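
As a rough illustration of the custom-script idea, here is a minimal Python sketch of the decision logic such a script might contain. The environment variable names (`BBHOSTSVC`, `RCPT`, `RECOVERED`, `BBCOLORLEVEL`) are what I recall from the hobbit-alerts.cfg man page for SCRIPT recipients and should be checked against your installed version; the `decide` function and its return values are hypothetical. State is kept in a plain dict here, so a real script, which is started fresh for each alert, would need to persist it between runs (for example in a small file).

```python
def decide(env, state):
    """Decide what to do with one alert invocation.

    env   -- mapping of environment variables passed to the script (assumed names)
    state -- dict of alerts this script has actually delivered; updated in place
    """
    key = "{}|{}".format(env.get("BBHOSTSVC", ""), env.get("RCPT", ""))
    if env.get("RECOVERED") == "1":
        if key in state:
            del state[key]
            return "send-recovery"   # we alerted earlier, so the recovery is useful
        return "drop"                # nothing was sent, so stay silent
    state[key] = env.get("BBCOLORLEVEL", "")
    return "send-alert"


state = {}
alert = {"BBHOSTSVC": "Db1.Special2", "RCPT": "me", "RECOVERED": "0", "BBCOLORLEVEL": "red"}
recovery = dict(alert, RECOVERED="1")

print(decide(recovery, state))  # drop: no alert was delivered first
print(decide(alert, state))     # send-alert
print(decide(recovery, state))  # send-recovery
```

The point of doing it this way is that the script, rather than the alert rules, decides whether a recovery is worth sending.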