Xymon Mailing List Archive

stale alerts

Josh Luthman
Wed, 14 Nov 2007 12:18:15 -0500
Message-Id: <user-9177c1c1e19a@xymon.invalid>

This has never happened to me - are the two of you using the 4.2.0 release?

Josh

On 11/14/07, Gore, David W (David) <user-3e5761c68b56@xymon.invalid> wrote:
-----Original Message-----
From: Gary Baluha [mailto:user-ae3e15c22de1@xymon.invalid]
Sent: Wednesday, November 14, 2007 16:30
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] stale alerts

Order of events:
1am: alert went yellow, email was sent out
1:15am: alert recovered
2am, and each additional hour: email was sent out saying
alert was yellow (it was actually showing green)
10:30am: I restarted Hobbit and got the "stale alert" message,
and it finally stopped sending alerts.  A recovery email was
never sent out, even though it is in the alert rules.

On Nov 14, 2007 11:22 AM, Josh Luthman
<user-4c45a83f15cb@xymon.invalid> wrote:
You're saying it went yellow, then green.  The log tells you it sent
an alert when it was yellow.

I'm not sure I'm seeing the problem here =/  It sent an alert first
when it was yellow and another when it switched to green to inform
you it recovered, correct?


On 11/14/07, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:
Yes, the alert history shows it went yellow and then 5 minutes later
recovered.  The web page is showing everything correctly.  However,
when I check the notifications.log file, I can see that it was still
sending alerts about it being yellow, even though it was definitely
green.

On Nov 14, 2007 10:38 AM, Josh Luthman
<user-4c45a83f15cb@xymon.invalid> wrote:
Click on the host's test and click on history - was it
red at all?

Are the WWW pages updating?  Look in the top right corner of the
page once you click on the host's test link.


On 11/14/07, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:

I noticed this morning that our Hobbit server was sending out
alerts for the process check for a machine that was actually
shown as green on the Hobbit web page.  I checked on the
monitored machine, and the alert was indeed green, yet the
server was still sending out emails as though it were in a
yellow state.  I restarted the Hobbit client on the monitored
machine, and then restarted the Hobbit server on the server.
After doing this, I noticed the following in the page.log
file:

2007-11-14 10:26:13 Stale alert for host-name:procs dropped

(I changed the actual host name to "host-name" to protect the
innocent)
What exactly does this mean?  Before I restarted the Hobbit
server process, I manually edited the alert.chk temp file and
removed the erroneous alert, but that didn't correct the
problem.  It was only after I restarted the Hobbit
server process that it cleared the alert.
Is this a bug in the 4.2.0 code, or is there something else
going on here?
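
To check how often the server has dropped stale alerts, page.log can be searched for lines like the one quoted above. The following is only a minimal Python sketch. It assumes every such entry has exactly the format of that single quoted line (timestamp, "Stale alert for", host:test, "dropped"). The function names are made up for illustration and are not part of Hobbit, and the location of page.log is not stated in this thread, so the caller has to supply it.

```python
import re
from collections import Counter

# Assumed format, taken from the one log line quoted in this thread:
#   2007-11-14 10:26:13 Stale alert for host-name:procs dropped
STALE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Stale alert for (\S+):(\S+) dropped\s*$"
)


def find_stale_alerts(lines):
    """Return (timestamp, host, test) tuples for every stale-alert line."""
    found = []
    for line in lines:
        match = STALE_RE.match(line)
        if match:
            found.append(match.groups())
    return found


def count_by_host_test(entries):
    """Count dropped stale alerts per (host, test) pair."""
    return Counter((host, test) for _, host, test in entries)
```

To use it, open the log file and pass the file object to find_stale_alerts(), which iterates over it line by line. The resulting counts show whether one host and test is affected repeatedly or whether the drops are spread across many hosts.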

Gary,

We get those too, along with leftover semaphores and shared memory
segments when we stop the hobbit server.  The snapshot seems much
better, but with tooltips and host descriptions pushing our display to
the far right of the screen we cannot use it right now.

~David

P.S. Just to let you know it's not just your setup.

-- 
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer