Xymon Mailing List Archive

xymon_4.3.0-RC1: possible lost alerts

Dominique Frise
Mon, 14 Feb 2011 15:17:35 +0100
Message-Id: <user-5937a478bc8f@xymon.invalid>


Best regards,

Dominique
_______________UNIL - University of Lausanne_______________
Dominique Frise             E-mail: user-78ab6673b600@xymon.invalid
UNIL, Centre Informatique   Phone:         +XX XX XXX XX XX
Quartier Sorge / Amphimax   Fax:           +XX XX XXX XX XX
1015 Lausanne, Switzerland  URL:      http://www.unil.ch/ci

On 02/14/11 02:51 PM, Henrik Størner wrote:
> In <user-fea03e92c89e@xymon.invalid> Dominique Frise <user-78ab6673b600@xymon.invalid> writes:
>> What is supposed to happen if you remove the "clear" color from OKCOLORS
>> in xymonserver.cfg?
>
> Then a "clear" status would trigger alerts, i.e. the xymond_alert
> module would begin to see alert-messages for a clear status (same
> as for yellow, red, purple).
>
> I don't think you would actually see any alerts being sent, unless
> you also change ALERTCOLORS to include the "clear" status.
>
> But that would be a bad idea, since "clear" is also used for
> e.g. "noping" hosts, or for client-side statuses (cpu, disk, ...)
> when the server is down ("conn" status is red means client-side
> tests will not go purple - they go clear).
>
>> We would expect that no recovery message should be sent when a status
>> goes from yellow/red to clear. Only the repeat interval should be reset.
>> Does this make sense?
>
> Kind of, yes. I don't recall if it was actually tested.

I don't think it was ;-)
Below are the small changes we made to xymond_alert.c (the version before
your latest changes) to achieve this:


[super@iris xymond]# diff -u xymond_alert.c.dist xymond_alert.c
--- xymond_alert.c.dist Sun Nov 14 18:21:19 2010
+++ xymond_alert.c      Mon Feb 14 15:02:24 2011
@@ -355,7 +355,7 @@
         char *msg;
         int seq;
         int argi;
-       int alertcolors, alertinterval;
+       int alertcolors, alertinterval, okcolors;
         char *configfn = NULL;
         char *checkfn = NULL;
         int checkpointinterval = 900;
@@ -377,6 +377,7 @@
         /* Load alert config */
         alertcolors = colorset(xgetenv("ALERTCOLORS"), ((1 << COL_GREEN) | (1 << COL_BLUE)));
         alertinterval = 60*atoi(xgetenv("ALERTREPEAT"));
+       okcolors = colorset(xgetenv("OKCOLORS"), (1 << COL_RED));

         /* Create our loookup-trees */
         hostnames = rbtNew(name_compare);
@@ -656,7 +657,7 @@
                                         awalk->maxcolor = newcolor;
                                 }
                         }
-                       else {
+                       else if ((okcolors & (1 << newcolor)) != 0) {
                                 /*
                                  * Send one "recovered" message out now, then go to A_DEAD.
                                  * Dont update the color here - we want recoveries to go out
@@ -663,6 +664,11 @@
                                  * only if the alert color triggered an alert
                                  */
                                 awalk->state = A_RECOVERED;
+                       } else {
+                               /*
+                                * This color should not trigger "recovered" messages.
+                                */
+                               awalk->state = A_NORECIP;
                         }


With this in place we can better support alerting for SNMP traps (see the
previous discussion with Buchan,
http://www.xymon.com/archive/2011/02/msg00062.html), but for that we want
every transition from an alert state to a clear status, however short, to
be processed by Xymon rather than ignored.
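
The effect of the patch can be illustrated with a small stand-alone sketch.
This is not the xymond_alert.c code: the color constants, parse_colors(),
on_color_change() and the outcome names are all hypothetical, and the sketch
assumes that the branch ahead of the patched "else" is the one taken when the
new color is in ALERTCOLORS. A color found in neither set ends the alert
without a recovery message, which is what the A_NORECIP branch in the diff is
meant to do.

```c
#include <string.h>

/* Hypothetical color codes for this sketch; Xymon defines its own COL_* values. */
enum { COL_GREEN, COL_CLEAR, COL_BLUE, COL_PURPLE, COL_YELLOW, COL_RED, COL_COUNT };

/* Hypothetical outcome names; they only mirror the three branches of the patch. */
enum outcome { KEEP_ALERTING, SEND_RECOVERY, DROP_SILENTLY };

static const char *color_names[COL_COUNT] = {
    "green", "clear", "blue", "purple", "yellow", "red"
};

/*
 * Turn a comma-separated color list such as "red,yellow" into a bitmask.
 * Unknown names are ignored; whitespace is not handled in this sketch.
 */
int parse_colors(const char *list)
{
    int mask = 0;

    while (*list) {
        size_t len = strcspn(list, ",");   /* length of the current token */

        for (int c = 0; c < COL_COUNT; c++) {
            if (strlen(color_names[c]) == len &&
                strncmp(list, color_names[c], len) == 0) {
                mask |= (1 << c);
            }
        }

        list += len;
        if (*list == ',') list++;          /* skip the separator */
    }

    return mask;
}

/* Decide what happens to an active alert when the status changes to newcolor. */
enum outcome on_color_change(int alertcolors, int okcolors, int newcolor)
{
    if ((alertcolors & (1 << newcolor)) != 0) {
        return KEEP_ALERTING;   /* still an alert color */
    }
    if ((okcolors & (1 << newcolor)) != 0) {
        return SEND_RECOVERY;   /* corresponds to A_RECOVERED in the diff */
    }
    return DROP_SILENTLY;       /* corresponds to A_NORECIP in the diff */
}
```

With alertcolors = parse_colors("red,yellow,purple") and
okcolors = parse_colors("green,blue"), i.e. with "clear" removed from the OK
set, on_color_change(alertcolors, okcolors, COL_CLEAR) yields DROP_SILENTLY,
while a change to COL_GREEN yields SEND_RECOVERY.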

Dominique