Xymon Mailing List Archive

recovery emails on alerts which don't generate pages == undesired behaviour

6 messages in this thread

list Bruce Lysik · Thu, 3 Mar 2005 15:13:31 -0800 ·
Hi,

Previously, we only wanted to alert on red and purple, and then send recovery emails when a status left the red or purple state.  This was easily accomplished by setting --alertcolors to red,purple.

Recently, however, I've gotten some requests from people who want to be alerted on a few monitors when they go yellow.  No problem, I thought.  I'll just change --alertcolors back to the default, and then add COLOR=red,purple to all the existing alert definitions to start.

This caused the problem where a monitor would go into a yellow state (not causing a page because COLOR=red,purple) and then go back to green (which would then send a recovery page).  This current behaviour doesn't make sense.  Why would I want to be alerted on a recovery of something that never generated an alert in the first place?  It would make more sense if the recovery email condition was tied to the COLOR definition.

So an alert with COLOR=red,yellow,purple (the default) would send a recovery message on leaving the red, yellow, or purple state to green, blue, or clear.  

And an alert with COLOR=red,purple would only send a recovery message if it was leaving a red or purple state.
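
The rule proposed above can be sketched as a small shell function. This is only an illustration of the proposal, not Hobbit's actual code: should_send_recovery and its three arguments are hypothetical names made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hobbit): decide whether a rule with the
# given COLOR list should send a recovery message for a status change.
should_send_recovery() {
    rule_colors=$1   # the rule's COLOR list, e.g. "red,purple"
    prev_color=$2    # color the status is leaving
    new_color=$3     # color the status is entering

    # A recovery only applies when the new color is a non-alert state.
    case "$new_color" in
        green|blue|clear) ;;
        *) return 1 ;;
    esac

    # Send the recovery only if the color being left is in the rule's list.
    case ",$rule_colors," in
        *",$prev_color,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# yellow -> green under COLOR=red,purple: no alert was sent, so no recovery.
should_send_recovery "red,purple" yellow green && echo send || echo skip   # prints: skip
# red -> green under COLOR=red,purple: an alert was sent, so recover.
should_send_recovery "red,purple" red green && echo send || echo skip      # prints: send
```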

Or maybe I'm braindead and can't see how this can be accomplished currently.

Comments?  

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Henrik Størner · Fri, 4 Mar 2005 07:40:17 +0100 ·
quoted from Bruce Lysik
On Thu, Mar 03, 2005 at 03:13:31PM -0800, Bruce Lysik wrote:
 Recently however, I've gotten some requests from people who want to
 get alerted on a few monitors when they yellow.  No problem, I
 thought.  I'll just change --alertcolors back to default, and then
 add COLOR=red,purple to all the existing alert definitions to start.

 This caused the problem where a monitor would go into a yellow state
 (not causing a page because COLOR=red,purple) and then go back to
 green (which would then send a recovery page).  This current
 behaviour doesn't make sense.
The current code - i.e. RC4 plus the post-RC4 patch, plus the fix I
sent out yesterday to stop alerts from going off every minute - 
should behave the way you want. As you say, it doesn't make sense to
get a recovery message when you didn't get the alert. I just tested it
to be absolutely certain it behaves, and it does.

If you're confused about which version you've got (quite
understandable): unpack the hobbit-4.0-RC4 archive, grab the latest
post-RC4 patch (I updated it yesterday) from http://www.hswn.dk/beta/,
and apply it with "cd hobbit-4.0-RC4; patch -p0 </tmp/post-RC4.patch".
Then copy over your Makefile from the old setup and run "make" and
"make install".


Henrik
list Bruce Lysik · Fri, 4 Mar 2005 10:02:02 -0800 ·
Sweet!  Yeah, I rolled back the RC4 patch when it started paging every minute, and didn't get a chance to try with the fix.  I'll download the latest post-RC4 patch and give it a shot.

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Kevin Hanrahan · Fri, 4 Mar 2005 13:24:13 -0500 ·
The patch for the page-a-minute problem did fix it. I haven't tried RC5
yet, but if it includes the patch, it should fix it as well.


Kevin

list Asif Iqbal · Fri, 4 Mar 2005 14:19:54 -0500 ·
quoted from Henrik Størner
On Fri, Mar 04, 2005 at 07:40:17AM, Henrik Stoerner wrote:
 As you say, it doesn't make sense to get a recovery message when you
 didn't get the alert. I just tested it to be absolutely certain it
 behaves, and it does.
I'd like to know how you test. It would add to my Hobbit debugging
skills :-)
-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"It is not the strongest of the species that survives, not the most intelligent, but
 the one most responsive to change."    - Charles Darwin
list Henrik Størner · Fri, 4 Mar 2005 23:10:12 +0100 ·
quoted from Asif Iqbal
On Fri, Mar 04, 2005 at 02:19:54PM -0500, Asif Iqbal wrote:
Like to know how you test. It will add more debug skills, for
 hobbit, in my list
I have Hobbit running on my workstation, just monitoring itself. Then
I set up bb-hosts or hobbit-alerts.cfg as needed for the test I want to
do; e.g. here I added some extra alert rules:

HOST=osiris.hswn.dk
        MAIL user-ce4a2c883f75@xymon.invalid REPEAT=1h COLOR=red RECOVERED
        MAIL user-ef2660e32166@xymon.invalid REPEAT=1h COLOR=red,yellow RECOVERED

Restarted Hobbit, and fired off a yellow and a red alert:

bb 127.0.0.1 "status osiris,hswn,dk.test1 yellow `date` Test Y"
bb 127.0.0.1 "status osiris,hswn,dk.test1 red `date` Test R"

and noted which emails were sent. I should get one message for the
yellow status (only the second rule matches yellow), and two for the
red (both rules match). Once I got these, I repeated the "bb" commands
with a green status to see which recovery messages showed up.
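
The final step, repeating the "bb" command with a green status, might look like the sketch below. The host and test names are taken from the commands above. The status_msg helper, the BB_CMD variable and the "Test G" text are assumptions added for this example. By default the script only prints the command instead of sending anything.

```shell
#!/bin/sh
# Hypothetical helper: build a status message in the same shape as the
# yellow and red commands shown in the thread.
status_msg() {
    color=$1
    text=$2
    printf 'status osiris,hswn,dk.test1 %s %s %s' "$color" "$(date)" "$text"
}

# Dry run by default: print the command rather than sending it.
# Set BB_CMD=bb in the environment to send to the local Hobbit daemon.
BB_CMD=${BB_CMD:-echo bb}

# "Test G" follows the "Test Y" / "Test R" pattern; the text is arbitrary.
$BB_CMD 127.0.0.1 "$(status_msg green 'Test G')"
```

With the two rules above, both recipients received at least one alert, so a green status sent this way should produce a recovery message for each of them.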


Henrik