Xymon Mailing List Archive

Alerting - I'm not doing it right...

list Carl Inglis
Thu, 15 Dec 2011 12:01:04 +0000
Message-Id: <user-6e12cdbb5ca4@xymon.invalid>

 Carl Inglis
Systems Administrator

Rakon UK Limited
Dowsett House, Sadler Road, Lincoln LN6 3RS, United Kingdom
Tel: +XX (X)XXXX XXXXXX | Fax:+XX (X) XXXX XXXXXX | Mob: +44 (0) 7786 552915
user-96685bdc864b@xymon.invalid | www.rakon.com
Winner of the 2010 Lincolnshire Business of the Year Award

-----Original Message-----
From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of user-ce4a2c883f75@xymon.invalid
Sent: 15 December 2011 11:36
To: xymon at xymon.com

On Thu, 15 Dec 2011 10:02:43 +0000, Carl Inglis <user-96685bdc864b@xymon.invalid>
wrote:
alerts.cfg

$EMAIL_ALERT=user-96685bdc864b@xymon.invalid
$LIN_WINDOWS_PROBLEMS=$EMAIL_ALERT

HOST=%lin(.*) SERVICE=%win(.*)
        MAIL $LIN_WINDOWS_PROBLEMS REPEAT=24h DURATION>1d RECOVERED STOP

HOST=* EXPAGE=printers
        MAIL $EMAIL_ALERT REPEAT=1h RECOVERED UNMATCHED STOP

When the host "lin-apps-01" has a yellow alert on its "winUpdates"
service, I expect it to shout about it once every 24h. It is,
however, shouting about it once every hour.

There may be some confusion about "service" here.

When you refer to "winUpdates" - is that a status-column in Xymon, or a
Windows Service that you are monitoring with a client on the Windows
machine? The latter would typically show up in a "svcs" (services)
status column on Xymon.

It's a status column returned by a BBWin external script; it goes yellow if there are pending Windows Updates on that server.

The SERVICE=... setting in alerts.cfg refers to the status-column, not a
Windows service. So to catch a "Windows updates" service that is not
running, you would use 'SERVICE=svcs' in alerts.cfg.

What the first part of your alerts.cfg says is: "if you have a host
whose name contains 'lin', and that host has a status-column whose name
contains 'win', then send an alert after 1 day, and repeat every 24
hours".

Which is what I wanted it to do.

The second part of your configuration says: "any status that has an
error (except those on the 'printers' page, and those handled by other
rules) triggers an alert that is repeated once an hour". Pretty broad
definition, I think.
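
The two rule descriptions above can be made concrete with a minimal Python sketch of how a single status might be routed through Carl's two rules. This is a simplified, hypothetical model, not Xymon's actual implementation, and it rests on assumptions: rules are checked top to bottom, `*` matches anything, a `%` prefix marks a regular expression searched anywhere in the name (so `%lin(.*)` means "contains 'lin'", as described above), EXPAGE excludes hosts on the named page, STOP ends processing once a rule matches, and an UNMATCHED rule is used only if no earlier rule matched. DURATION and REPEAT timers are ignored here. The `Rule` class, the `route()` function and the "servers" page name are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical, simplified stand-in for one alerts.cfg rule."""
    label: str
    host: str = "*"          # HOST=
    service: str = "*"       # SERVICE= (a status-column name)
    expage: tuple = ()       # EXPAGE=: pages whose hosts are excluded
    unmatched: bool = False  # UNMATCHED: only used if no earlier rule matched
    stop: bool = False       # STOP: stop checking rules once this one matches


def matches(pattern: str, value: str) -> bool:
    # Assumption: "*" matches anything; "%regex" is searched anywhere in
    # the value; any other pattern is compared literally.
    if pattern == "*":
        return True
    if pattern.startswith("%"):
        return re.search(pattern[1:], value) is not None
    return pattern == value


def route(rules, host, service, page):
    """Return the labels of the rules that would alert for this status."""
    matched = []
    for rule in rules:
        if rule.unmatched and matched:
            continue  # the catch-all only applies when nothing else matched
        if (matches(rule.host, host) and matches(rule.service, service)
                and page not in rule.expage):
            matched.append(rule.label)
            if rule.stop:
                break  # STOP: later rules are not considered
    return matched


# The two rules from the thread (DURATION, REPEAT and RECOVERED omitted).
RULES = [
    Rule("lin/win rule", host="%lin(.*)", service="%win(.*)", stop=True),
    Rule("catch-all", expage=("printers",), unmatched=True, stop=True),
]

for status in [("lin-apps-01", "winUpdates", "servers"),
               ("lin-apps-01", "disk", "servers"),
               ("prn-01", "conn", "printers")]:
    print(status, "->", route(RULES, *status) or "no alert")
```

In this model the winUpdates status on lin-apps-01 is taken by the first rule and never reaches the catch-all, a non-matching column on the same host falls through to the catch-all, and a host on the 'printers' page raises no alert at all.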

Indeed. I'm currently in development mode, trying to finalise how we're going to do our alerting; the last rule in the configuration was intended as a "you missed one" catch-all alert for me. There are a number of other rules above the first one shown in my original email.

Hope that removes a bit of confusion.

It does indeed, thank you.

It appears that removing the "DURATION>1d" option has stopped the second rule from firing. That would make sense since (as Johan suggested) the first rule does not match until the alert has lasted more than one day; until then no other rule handles the status, so the UNMATCHED catch-all rule picks it up and repeats it every hour.

Is that interpretation correct?
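
The interpretation above can be checked against a small time-based sketch. This is a hypothetical Python model, not Xymon code, and it rests on assumptions: the status goes yellow at hour 0 and is re-evaluated once an hour; a rule whose DURATION condition is not yet met does not match at all; once the first rule matches, its STOP keeps the catch-all from being considered, even in hours when REPEAT suppresses a new mail; otherwise the UNMATCHED catch-all takes the status. The `simulate()` function and its parameters are invented for illustration.

```python
from typing import Dict, List, Optional


def simulate(hours: int, rule1_duration_h: Optional[int]) -> Dict[str, List[int]]:
    """Hours at which each rule would send a mail for lin-apps-01.winUpdates.

    rule1_duration_h models rule 1's "DURATION>N" in hours (None = no
    DURATION setting). Rule 1 repeats every 24h, the catch-all every 1h.
    """
    repeat = {"rule1": 24, "rule2": 1}
    last_sent: Dict[str, Optional[int]] = {"rule1": None, "rule2": None}
    sent: Dict[str, List[int]] = {"rule1": [], "rule2": []}

    for t in range(hours):
        # Assumption: rule 1 only matches once the status has been yellow
        # for longer than its DURATION; until then it falls to the catch-all.
        if rule1_duration_h is None or t > rule1_duration_h:
            rule = "rule1"
        else:
            rule = "rule2"
        # REPEAT: send again only when the rule's repeat interval has elapsed.
        prev = last_sent[rule]
        if prev is None or t - prev >= repeat[rule]:
            sent[rule].append(t)
            last_sent[rule] = t
    return sent


with_duration = simulate(48, rule1_duration_h=24)  # DURATION>1d
without_duration = simulate(48, rule1_duration_h=None)
print("with DURATION>1d:", {k: len(v) for k, v in with_duration.items()})
print("without DURATION:", without_duration)
```

Under these assumptions, with DURATION>1d the catch-all sends 25 hourly mails (hours 0 to 24) before the first rule takes over at hour 25; without it, the first rule matches from the start, mails at hours 0 and 24, and the catch-all never fires.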

Thanks,

Carl


