Xymon Mailing List Archive

Yellow alerts, but no yellow config...

2 messages in this thread

Daniel J McDonald · Mon, 14 Mar 2005 18:32:26 -0600
I'm finally trying to get alerts working with Hobbit (RC5, no post
patches, Linux 2.6.8.1-24mdksmp i686).  I'm getting paged on yellow.
The man pages don't give many examples, but I think I've got this right.
If I don't, let me know what I need to fix, or where to look.  There is
nothing in /var/log/hobbit/page.log.


Here are the hobbitlaunch.cfg parameters:

[hobbitd]
        ENVFILE /var/hobbit/server/etc/hobbitserver.cfg
        CMD hobbitd --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log --alertcolors="red" --okcolors="green,blue" --admin-senders=127.0.0.1,$BBSERVERIP
#...
[bbpage]
        ENVFILE /var/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD hobbitd_channel --channel=page --alertcolors="red" --okcolors="green,blue" --log=$BBSERVERLOGS/page.log hobbitd_alert

and the paging rules:
HOST=ae-urps.aenetad.net
        MAIL=user-290ce4e24e19@xymon.invalid,user-24a930c5e365@xymon.invalid REPEAT=24h RECOVERED

HOST=%.*ups.*.austin-energy.net
        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=2h DURATION>10m SERVICE=freq COLOR="red" RECOVERED
        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=2h SERVICE=upsmin,upssec COLOR="red" RECOVERED

HOST=%.*probe.*.austin-energy.net
        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=24h DURATION>20m COLOR="red" RECOVERED

HOST=%.
        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m RECOVERED COLOR="red" UNMATCED

Henrik Størner · Tue, 15 Mar 2005 09:35:27 +0000 (UTC)
In <user-ad9f2d417d6e@xymon.invalid> Daniel J McDonald <user-290ce4e24e19@xymon.invalid> writes:

> I'm finally trying to get alerts working with hobbit (RC5, no post
> patches, Linux 2.6.8.1-24mdksmp i686).  I'm getting paged on yellow.

First check whether those are alert messages or recovery messages. In
~hobbit/data/acks/notifications.log you should see log entries for
the messages you receive, like these:

Tue Mar 15 09:50:34 2005 backup-mx.post.tele.dk.smtp (195.41.53.68) user-ce4a2c883f75@xymon.invalid 1110876634 725
Tue Mar 15 10:05:37 2005 backup-mx.post.tele.dk.smtp (195.41.53.68) user-ce4a2c883f75@xymon.invalid 1110877537 725
Tue Mar 15 10:10:41 2005 backup-mx.post.tele.dk.smtp (195.41.53.68) user-ce4a2c883f75@xymon.invalid 1110877840 725 4220

The first two are alerts, the last one is a recovery message (you can
see that by the extra number "4220" which is how long the service was
down).
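
To sort a longer notifications.log the same way, the distinction above can be sketched in Python. This is only an illustration based on the three sample lines: it assumes every entry has a five-token timestamp, host.service, the IP in parentheses, the recipient, an epoch time and one numeric field, with recoveries carrying one extra trailing number. The function name is made up for this example and is not part of Hobbit.

```python
def classify_notification(line):
    """Classify one notifications.log line as 'alert' or 'recovery'.

    Assumed layout (taken from the sample lines only):
    5 timestamp tokens, host.service, (ip), recipient, epoch, number,
    plus an optional trailing duration on recovery messages.
    """
    fields = line.split()
    if len(fields) == 10:
        return "alert"
    # A recovery line has one extra numeric field at the end.
    if len(fields) == 11 and fields[-1].isdigit():
        return "recovery"
    return "unknown"


alert_line = ("Tue Mar 15 09:50:34 2005 backup-mx.post.tele.dk.smtp "
              "(195.41.53.68) user-ce4a2c883f75@xymon.invalid 1110876634 725")
recovery_line = ("Tue Mar 15 10:10:41 2005 backup-mx.post.tele.dk.smtp "
                 "(195.41.53.68) user-ce4a2c883f75@xymon.invalid 1110877840 725 4220")

print(classify_notification(alert_line))
print(classify_notification(recovery_line))
```

If most of the entries that line up with the yellow pages come out as recoveries, that points at the recovery-message behaviour rather than at the alert rules.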

RC5 has a known bug in the alert module, where it will send recovery
messages even if you never received an alert message.

> Here are the hobbitlaunch.cfg parameters:

Looks OK.

> and the paging rules:
> HOST=ae-urps.aenetad.net
>        MAIL=user-290ce4e24e19@xymon.invalid,user-24a930c5e365@xymon.invalid REPEAT=24h RECOVERED
>
> HOST=%.*ups.*.austin-energy.net
>        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=2h DURATION>10m SERVICE=freq COLOR="red" RECOVERED
>        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=2h SERVICE=upsmin,upssec COLOR="red" RECOVERED
>
> HOST=%.*probe.*.austin-energy.net
>        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=24h DURATION>20m COLOR="red" RECOVERED
>
> HOST=%.
>        MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m RECOVERED COLOR="red" UNMATCED

I suspect your yellow alerts are recovery messages, and hence this is
the known bug in RC5.

BTW, the "UNMATCED" in your last rule must be a typo ...
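
A typo like that is easy to catch mechanically. The sketch below checks each token of a rule line against a keyword set; the set here is an assumed, partial list written from memory for illustration (the alert configuration man page for your version is the authority), and the helper name is hypothetical.

```python
import re

# Assumed, partial keyword list for illustration only; verify against
# the alert configuration man page for the installed version.
KNOWN_KEYWORDS = {
    "HOST", "EXHOST", "PAGE", "EXPAGE", "SERVICE", "EXSERVICE",
    "COLOR", "TIME", "DURATION", "REPEAT", "RECOVERED",
    "MAIL", "SCRIPT", "FORMAT", "STOP", "UNMATCHED",
}


def unknown_keywords(rule_line):
    """Return the keyword names in a rule line that are not in KNOWN_KEYWORDS."""
    unknown = []
    for token in rule_line.split():
        # The keyword is whatever precedes the first '=', '<' or '>'.
        name = re.split(r"[=<>]", token, maxsplit=1)[0]
        if name not in KNOWN_KEYWORDS:
            unknown.append(name)
    return unknown


rule = ('MAIL=user-290ce4e24e19@xymon.invalid REPEAT=140h DURATION>30m '
        'RECOVERED COLOR="red" UNMATCED')
print(unknown_keywords(rule))
```

Run against the last rule above, this flags only the misspelled keyword.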


Regards,
Henrik