Xymon Mailing List Archive

cpu alerts

Bill Perez
Thu, 10 Aug 2006 19:29:42 -0400
Message-Id: <user-7497d56a4408@xymon.invalid>

What time should Hobbit consider the start-of-event time?  Some prefer
the current arrangement where it uses the time it goes non-green; others
prefer the time it goes to a color which triggers an alert.  I've heard
arguments both ways.

Thanks Henrik.  The way it works on the server where I'm running 4.2.1RC1
is how I want it to run on the standby server, where I just installed
4.2.1P1.  On the standby I'm still getting alerts whose duration includes
the yellow time, not just the time since it went to red/panic.  Is there a
way I can change it so that it works as it does in 4.2.1RC1 and only sends
an alert after 10 minutes of red/panic, without counting the yellow time in
the duration?  I'm also finding that I don't get recovery notices if it
goes from red to yellow and then to green.


On 8/8/06, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
On Tue, Aug 01, 2006 at 01:29:00PM -0400, Bill Perez wrote:
Could you show us a copy of the cpu history log (in
~hobbit/data/hist/HOSTNAME.cpu) compared with the notifications log
from ~hobbit/server/logs/notifications.log ?
Here is the hostname.cpu and section from notifications.log for those
alerts this morning:
From /hobbit/data/hist/HOSTNAME.cpu
Tue Aug  1 10:34:30 2006 yellow 1154442870 1200
Tue Aug  1 10:54:30 2006 red 1154444070 600
Tue Aug  1 11:04:30 2006 green 1154444670 299
Tue Aug  1 11:09:29 2006 yellow 1154444969 301
Tue Aug  1 11:14:30 2006 red 1154445270 301
Tue Aug  1 11:19:31 2006 green 1154445571

Tue Aug  1 10:54:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444070 200
Tue Aug  1 11:04:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444670 200 1800
Tue Aug  1 11:19:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445570 200
Tue Aug  1 11:19:42 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445581 200 612
OK, Hobbit thinks the first event begins at 10:34 when the status
goes yellow. Even though this doesn't trigger an alert, it registers
this as the start time of the event. So when it goes red at 10:54, your
10 minute delay has already elapsed, and you get an immediate alert.
Then when it goes green at 11:04 you of course get a recovery notice.

Same thing when it goes yellow again at 11:09. No alert is sent, but
this time is registered as the start of the event. So at 11:14 when it
goes red you do not get an alert (11:09->11:14 is only 5 minutes), but
you do get the alert at 11:19:30 - and when it goes green at 11:19:31
it sends out a "recovered" message.
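
To make the arithmetic above concrete, here is a small Python replay of
the history entries quoted earlier. It is only a sketch and not Hobbit's
actual code: the function name elapsed_at_red and the DELAY constant are
made up for illustration, and it assumes the event starts at the first
non-green status and that the delay is the 10 minutes mentioned in the
thread.

```python
# Hypothetical replay of the quoted cpu history; not Hobbit's implementation.
DELAY = 600  # seconds; the 10-minute alert delay discussed in the thread

# (color, epoch start) pairs copied from the HOSTNAME.cpu history log
history = [
    ("yellow", 1154442870),  # 10:34:30
    ("red",    1154444070),  # 10:54:30
    ("green",  1154444670),  # 11:04:30
    ("yellow", 1154444969),  # 11:09:29
    ("red",    1154445270),  # 11:14:30
    ("green",  1154445571),  # 11:19:31
]

def elapsed_at_red(history):
    """For each change to red, return (timestamp, seconds since leaving green)."""
    results = []
    event_start = None
    for color, ts in history:
        if color == "green":
            event_start = None        # going green ends the event
            continue
        if event_start is None:
            event_start = ts          # first non-green status starts the event
        if color == "red":
            results.append((ts, ts - event_start))
    return results

for ts, elapsed in elapsed_at_red(history):
    if elapsed >= DELAY:
        print(ts, "delay already elapsed, alert immediately")
    else:
        print(ts, "alert due in", DELAY - elapsed, "more seconds")
```

For the first red status the event is already 1200 seconds old, so the
alert goes out at once. For the second it is only 301 seconds old, so the
alert is due 299 seconds later, at about 11:19:29, which is consistent
with the 11:19:30 entry in notifications.log.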

What time should Hobbit consider the start-of-event time?  Some prefer
the current arrangement where it uses the time it goes non-green; others
prefer the time it goes to a color which triggers an alert.  I've heard
arguments both ways.
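
The two choices can be compared on the history data from this thread with
a short Python sketch. This is an illustration under stated assumptions,
not how Hobbit is implemented: alert_times and its start_on parameter are
hypothetical names, the delay is taken to be 600 seconds, red is taken to
be the only alerting color, and an alert is assumed to fire as soon as the
status is red and the delay has elapsed since the event start.

```python
# Hypothetical comparison of the two start-of-event policies; not Hobbit code.
DELAY = 600  # seconds; assumed 10-minute alert delay

# (color, epoch start) pairs copied from the HOSTNAME.cpu history log
history = [
    ("yellow", 1154442870),
    ("red",    1154444070),
    ("green",  1154444670),
    ("yellow", 1154444969),
    ("red",    1154445270),
    ("green",  1154445571),
]

def alert_times(history, start_on):
    """Return the epoch times at which an alert would become due.

    start_on="non-green": the event starts when the status leaves green.
    start_on="red":       the event starts when the status first goes red.
    """
    alerts = []
    event_start = None
    # Pair each entry with the next one to know when the status ended.
    for (color, ts), (_, next_ts) in zip(history, history[1:]):
        if color == "green":
            event_start = None                    # going green ends the event
            continue
        if event_start is None and (start_on == "non-green" or color == "red"):
            event_start = ts
        if color == "red":
            due = max(ts, event_start + DELAY)    # earliest moment an alert can fire
            if due < next_ts:                     # still red when the delay runs out
                alerts.append(due)
    return alerts

print(alert_times(history, "non-green"))
print(alert_times(history, "red"))
```

Under these assumptions the "non-green" policy produces alerts due at
1154444070 (10:54:30) and 1154445569 (11:19:29), in line with the
notifications log, while the "red" policy produces none: the first red
period lasts exactly 600 seconds and the second only 301.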


Regards,
Henrik