recovery emails on alerts which don't generate pages == undesired behaviour
Bruce Lysik
Hi,

Previously, we've only wanted to alert on red and purple, and then send recovery emails when a status changes out of a red or purple state. This was easily accomplished by setting --alertcolors to red,purple.

Recently, however, I've gotten some requests from people who want to be alerted on a few monitors when they go yellow. No problem, I thought. I'll just change --alertcolors back to the default, and then add COLOR=red,purple to all the existing alert definitions to start. This caused a problem: a monitor would go into a yellow state (not causing a page, because of COLOR=red,purple) and then go back to green (which would then send a recovery page). This current behaviour doesn't make sense.

Why would I want to be alerted on the recovery of something that never generated an alert in the first place? It would make more sense if the recovery email condition were tied to the COLOR definition. So an alert with COLOR=red,yellow,purple (the default) would send a recovery message on leaving the red, yellow, or purple state for green, blue, or clear. And an alert with COLOR=red,purple would only send a recovery message if it was leaving a red or purple state.

Or maybe I'm braindead and can't see how this can be accomplished currently. Comments?

-- 
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
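The recovery rule Bruce proposes can be sketched as a small shell check. This illustrates the behaviour he describes and is not Hobbit's actual alerting code: the functions `in_list` and `should_send_recovery` are hypothetical, and colors are passed as the same comma-separated lists used in COLOR= settings.

```sh
#!/bin/sh
# Sketch of the proposed rule (hypothetical helpers, not Hobbit code):
# a rule sends a recovery message only when the status leaves a color
# listed in its COLOR= setting and moves to green, blue or clear.

# in_list COLOR LIST: succeed if COLOR appears in the comma-separated LIST.
in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# should_send_recovery OLDCOLOR NEWCOLOR [RULECOLORS]
# RULECOLORS defaults to red,yellow,purple, the default set named in the mail.
should_send_recovery() {
    old=$1
    new=$2
    rulecolors=${3:-red,yellow,purple}
    # The rule never alerted on the old color, so there is nothing to recover from.
    in_list "$old" "$rulecolors" || return 1
    # Only a move to a "good" color counts as a recovery.
    in_list "$new" "green,blue,clear"
}
```

With this check, `should_send_recovery yellow green red,purple` fails (no recovery message, which is exactly Bruce's complaint), while `should_send_recovery yellow green` succeeds because the default set includes yellow.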
Henrik Størner
On Thu, Mar 03, 2005 at 03:13:31PM -0800, Bruce Lysik wrote:
Recently, however, I've gotten some requests from people who want to be alerted on a few monitors when they go yellow. No problem, I thought. I'll just change --alertcolors back to the default, and then add COLOR=red,purple to all the existing alert definitions to start. This caused a problem: a monitor would go into a yellow state (not causing a page, because of COLOR=red,purple) and then go back to green (which would then send a recovery page). This current behaviour doesn't make sense.
The current code - i.e. RC4 plus the post-RC4 patch, plus the fix I sent out yesterday to stop alerts from going off every minute - should behave the way you want. As you say, it doesn't make sense to get a recovery message when you didn't get the alert. I just tested it to be absolutely certain it behaves, and it does.

If you're confused about what version you've got (quite understandable): unpack the hobbit-4.0-RC4 archive; grab the latest post-RC4 patch (I updated it yesterday) from http://www.hswn.dk/beta/ and apply it with "cd hobbit-4.0-RC4; patch -p0 </tmp/post-RC4.patch"; then copy over your Makefile from the old setup and run "make" and "make install".

Henrik
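For readers who want to script the upgrade steps Henrik lists, here is a minimal sketch. It assumes the RC4 archive is a gzipped tarball named hobbit-4.0-RC4.tar.gz, that the patch was saved as /tmp/post-RC4.patch, and that OLDSRC points at your previous build tree; the `run` helper, the DRY_RUN switch and the placeholder path are illustrative additions, not part of the Hobbit distribution.

```sh
#!/bin/sh
# Sketch of the upgrade steps from the thread. Assumptions (not from the
# thread): a gzipped archive named hobbit-4.0-RC4.tar.gz, and OLDSRC set to
# the old build tree holding your Makefile (the default is a placeholder).

# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The body is a subshell, so the "cd" does not change the caller's directory.
upgrade_hobbit() (
    archive=${ARCHIVE:-hobbit-4.0-RC4.tar.gz}
    patchfile=${PATCHFILE:-/tmp/post-RC4.patch}   # fetched from http://www.hswn.dk/beta/
    oldsrc=${OLDSRC:-/path/to/old/hobbit}         # placeholder, set OLDSRC yourself

    run tar xzf "$archive" &&
    run cd hobbit-4.0-RC4 &&
    # "-i FILE" reads the patch from FILE, same as "patch -p0 </tmp/post-RC4.patch"
    run patch -p0 -i "$patchfile" &&
    run cp "$oldsrc/Makefile" . &&
    run make &&
    run make install
)
```

Running `DRY_RUN=1 upgrade_hobbit` only prints the six commands, which is a cheap way to check the paths before running it for real with OLDSRC set; the `&&` chain stops at the first step that fails.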
Bruce Lysik
Sweet! Yeah, I rolled back the RC4 patch when it started paging every minute, and didn't get a chance to try with the fix. I'll download the latest post-RC4 patch and give it a shot.

-- 
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
Kevin Hanrahan
The patch for the page-a-minute problem did fix it. I haven't tried RC5 yet, but if it includes the patch, it should fix it as well.

Kevin

-----Original Message-----
From: Bruce Lysik [mailto:user-4e63a10f8934@xymon.invalid]
Sent: Friday, March 04, 2005 1:02 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] recovery emails on alerts which don't generate pages == undesired behaviour
NOVA Information Systems, Inc.
Asif Iqbal
On Fri, Mar 04, 2005 at 07:40:17AM, Henrik Stoerner wrote:
I just tested it to be absolutely certain it behaves, and it does.
I'd like to know how you test. It will add more Hobbit debugging skills to my list :-)
-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"It is not the strongest of the species that survives, not the most intelligent, but the one most responsive to change." - Charles Darwin
Henrik Størner
On Fri, Mar 04, 2005 at 02:19:54PM -0500, Asif Iqbal wrote:
I'd like to know how you test. It will add more Hobbit debugging skills to my list
I have Hobbit running on my workstation, just monitoring itself. Then
I set up bb-hosts or hobbit-alerts.cfg as needed for the test I want to
do; e.g. here I added some extra alert rules:
HOST=osiris.hswn.dk
MAIL user-ce4a2c883f75@xymon.invalid REPEAT=1h COLOR=red RECOVERED
MAIL user-ef2660e32166@xymon.invalid REPEAT=1h COLOR=red,yellow RECOVERED
Restarted Hobbit, and fired off a yellow and a red alert:
bb 127.0.0.1 "status osiris,hswn,dk.test1 yellow `date` Test Y"
bb 127.0.0.1 "status osiris,hswn,dk.test1 red `date` Test R"
and watched which emails were sent - I should get one message for
the yellow status, and two for the red. When I got these, I repeated the
"bb" commands with a green status and checked which recovery messages
showed up.
Henrik
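The expected messages in Henrik's test can be worked through with a small simulation. It assumes the semantics described in this thread: a rule alerts when the new color is in its COLOR= list, and sends a RECOVERED message only if it alerted since the last recovery. REPEAT= timing is not modelled, and `rule1`/`rule2` are hypothetical labels for the COLOR=red and COLOR=red,yellow recipients above. This is a sketch, not Hobbit's implementation.

```sh
#!/bin/sh
# Simulates which messages two alert rules should produce for a sequence of
# status colors, assuming "recovery only after an alert" semantics.
# rule1/rule2 are placeholder names; REPEAT= is not modelled.

# in_list COLOR LIST: succeed if COLOR appears in the comma-separated LIST.
in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# simulate RULE1COLORS RULE2COLORS COLOR...
# Prints one line per message: "<rule> alert <color>" or "<rule> recovered".
simulate() {
    r1=$1
    r2=$2
    shift 2
    a1=0 a2=0    # has each rule alerted since its last recovery?
    for color in "$@"; do
        if in_list "$color" "green,blue,clear"; then
            # Recovery goes only to rules that actually sent an alert.
            if [ "$a1" -eq 1 ]; then echo "rule1 recovered"; fi
            if [ "$a2" -eq 1 ]; then echo "rule2 recovered"; fi
            a1=0 a2=0
        else
            if in_list "$color" "$r1"; then echo "rule1 alert $color"; a1=1; fi
            if in_list "$color" "$r2"; then echo "rule2 alert $color"; a2=1; fi
        fi
    done
}
```

`simulate red red,yellow yellow red green` prints "rule2 alert yellow", "rule1 alert red", "rule2 alert red", "rule1 recovered" and "rule2 recovered": one message for yellow and two for red, as Henrik expects. `simulate red red,yellow yellow green` prints only rule2's alert and recovery, so the COLOR=red recipient gets no recovery message, which is the behaviour Bruce asked for.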