-----Original Message-----
From: Cortes, Manny [mailto:user-4d8222bd9f10@xymon.invalid]
Sent: Wednesday, 14 March 2007 23:30
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] hobbit-alerts.cfg - DURATION
We use DURATION in our case as a way to escalate notifications to another
group of recipients 15 minutes after the initial event occurred in hobbit.
The initial alert goes to our onsite Operations folks; then, after 15
minutes, a custom script fires off that informs everyone in that particular
recipient group that the event is still ongoing and is being escalated.
So: qpage pages Operations as soon as the event occurs, and they
monitor the event.
DURATION>15: the second script fires off.
Working pretty well so far....
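A rough sketch of what such a two-stage setup might look like in
hobbit-alerts.cfg. The host name, script paths, and recipient name below
are placeholders for illustration only, not the actual configuration:

```
# Hypothetical example: host, script paths and recipient are placeholders
HOST=www.example.com COLOR=red
    # Stage 1: page Operations as soon as the event occurs
    SCRIPT /usr/local/bin/qpage-ops.sh operations
    # Stage 2: escalation script, only once the event has lasted over 15 minutes
    SCRIPT /usr/local/bin/escalate.sh operations DURATION>15
```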
Could REPEAT be used for further escalation? Or will another DURATION>30
suffice?
This is from the hobbit-alerts.cfg man-page:
Rule matching an alert if the event has lasted longer/shorter than the given duration. E.g. DURATION>1h (lasted longer than 1 hour) or DURATION<30 (only sends alerts the first 30 minutes).
That's exactly the way we are using the DURATION tag.
We've specified DURATION>5 on most of the alert rules, because a test often fails and goes back to green after e.g. 2 minutes. In that case we don't want to be alerted by mail or SMS.
Only if the RED condition still holds after more than 5 minutes is an alarm sent out.
We also use the REPEAT tag: depending on the importance of the system, the alarm is resent to make the appropriate people aware that the problem is still not fixed.
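As an illustration only, a rule combining both tags could look roughly like
this. The host name, mail address, and the 30-minute repeat interval are
made-up placeholders, not our real settings:

```
# Hypothetical example: host, address and repeat interval are placeholders
HOST=www.example.com COLOR=red
    # No alert for the first 5 minutes; after that, resend every 30 minutes
    MAIL admin@example.com DURATION>5 REPEAT=30
```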
Regards
Johann