Order of events:
1am: alert went yellow, email was sent out
1:15am: alert recovered
2am, and each additional hour: email was sent out saying alert was
yellow (it was actually showing green)
10:30am: I restarted Hobbit and got the "stale alert" message, and it
finally stopped sending alerts. The recovery email was never sent out,
even though it is in the alert rules.
On Nov 14, 2007 11:22 AM, Josh Luthman <user-4c45a83f15cb@xymon.invalid> wrote:
You're saying it went yellow, then green. The log tells you it sent an
alert when it was yellow.
I'm not sure I'm seeing the problem here =/ It sent an alert first when it
was yellow and another when it switched to green to inform you it recovered,
correct?
On 11/14/07, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:
Yes, the alert history shows it went yellow and then 5 minutes later
recovered. The web page is showing everything correct. However, when
I check the notifications.log file, I can see that it was still
sending alerts about it being yellow, even though it was definitely
green.
On Nov 14, 2007 10:38 AM, Josh Luthman <user-4c45a83f15cb@xymon.invalid> wrote:
Click on the host's test and click on history - was it red at all?
Are the WWW pages updating? Look in the top right corner of the page
once you click on the host's test link.
On 11/14/07, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:
I noticed this morning that our Hobbit server was sending out alerts
for the process check for a machine that was actually shown as green
on the hobbit web page. I checked on the monitored machine, and the
alert was indeed green, yet the server was still sending out emails as
though it were in a yellow state. I restarted the Hobbit client on
the monitored machine, and then restarted the Hobbit server on the
server. After doing this, I noticed the following in the page.log
file:
2007-11-14 10:26:13 Stale alert for host-name:procs dropped
(I changed the actual host name to "host-name" to protect the
innocent)
What exactly does this mean? Before I restarted the Hobbit server
process, I manually edited the alert.chk temp file and removed the
erroneous alert, but that didn't correct the problem. It was only
after I restarted the Hobbit server process that it cleared the alert.
Is this a bug in the 4.2.0 code, or is there something else going on
here?
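
To check whether the server has dropped stale alerts on other occasions,
the page.log file can be scanned for the message quoted above. The
following is only a rough sketch in Python: it assumes every such line
has exactly the format shown in this message (timestamp, then
"Stale alert for host:test dropped"), and the function name
find_stale_alerts is made up for illustration, not part of Hobbit.

```python
import re
from datetime import datetime

# Assumption: lines look exactly like the one quoted in this thread, e.g.
# "2007-11-14 10:26:13 Stale alert for host-name:procs dropped"
STALE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Stale alert for ([^:\s]+):(\S+) dropped\s*$"
)


def find_stale_alerts(lines):
    """Return a list of (timestamp, host, test) for each stale-alert line."""
    found = []
    for line in lines:
        match = STALE_RE.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            found.append((when, match.group(2), match.group(3)))
    return found


if __name__ == "__main__":
    # Sample input taken from the log line quoted in this message.
    sample = [
        "2007-11-14 10:26:13 Stale alert for host-name:procs dropped\n",
    ]
    for when, host, test in find_stale_alerts(sample):
        print(when.isoformat(), host, test)
```

To run it against a real log, pass an open file object instead of the
sample list, since iterating over a file yields its lines.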