Xymon Mailing List Archive

cpu alerts

6 messages in this thread

list Bill Perez · Sat, 29 Jul 2006 20:02:19 -0400 ·
Hello - I have a standby hobbit server running 4.2-alfa-20060423, with
alerts configured the same as the primary hobbit server running 4.2.1RC1.  I
have a duration of 10m set up for the cpu alert, but I have been noticing cpu
alerts arriving before the 10 minutes are up (cpu only panics for 4:59), and
only the alert is sent, with no recovery.  As I mentioned, I have the same
alert configured on another hobbit server and don't get alerts from that
one, only the standby.  Has anyone seen different behavior with alerting for
this hobbit version?  Any ideas why this might be happening?  I have a bunch
of other alerts set up that seem to be behaving properly; it just seems to
be cpu that is behaving differently.

Here's the alert config -

MAIL $UNIX HOST=%(server1|server2|server3) DURATION>10m RECOVERED SERVICE=cpu COLOR=red,purple

Thanks in advance for any info
list Bill Perez · Tue, 1 Aug 2006 11:54:33 -0400 ·
Hi - just wondering if someone could shed any light on why I'm having
duration issues with my cpu alert.
I updated my secondary hobbit server to the latest Hobbit version,
4.2RC20060712, and am still not having any luck with the duration taking
effect for my cpu alerts.  The info column shows that the delay is 10 minutes,
but an alert is sent after 5 minutes of red status.  The recovery email has
not been consistent either.

Here is my alert configuration -

PAGE=$PG_SIEBEL
        MAIL $UNIX HOST=server.domain.com SERVICE=cpu DURATION>10 RECOVERED COLOR=red,purple

Any ideas?

list Henrik Størner · Tue, 1 Aug 2006 18:00:40 +0200 ·
quoted from Bill Perez
On Tue, Aug 01, 2006 at 11:54:33AM -0400, Bill Perez wrote:
Hi - just wondering if someone could shed any light on why I'm having
duration issues with my cpu alert.
I updated my secondary hobbit server to the latest Hobbit version,
4.2RC20060712, and am still not having any luck with the duration taking
effect for my cpu alerts.  The info column shows that the delay is 10 minutes,
but an alert is sent after 5 minutes of red status.  The recovery email has
not been consistent either.
Could you show us a copy of the cpu history log (in
~hobbit/data/hist/HOSTNAME.cpu) compared with the notifications log
from ~hobbit/server/logs/notifications.log ?


Regards,
Henrik
list Bill Perez · Tue, 1 Aug 2006 13:29:00 -0400 ·
quoted from Henrik Størner
Could you show us a copy of the cpu history log (in
~hobbit/data/hist/HOSTNAME.cpu) compared with the notifications log
from ~hobbit/server/logs/notifications.log ?
Here is the hostname.cpu and section from notifications.log for those alerts
this morning:

From /hobbit/data/hist/HOSTNAME.cpu
Tue Aug  1 10:34:30 2006 yellow 1154442870 1200
Tue Aug  1 10:54:30 2006 red 1154444070 600
Tue Aug  1 11:04:30 2006 green 1154444670 299
Tue Aug  1 11:09:29 2006 yellow 1154444969 301
Tue Aug  1 11:14:30 2006 red 1154445270 301
Tue Aug  1 11:19:31 2006 green 1154445571

Tue Aug  1 10:54:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444070 200
Tue Aug  1 11:04:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444670 200 1800
Tue Aug  1 11:19:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445570 200
Tue Aug  1 11:19:42 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445581 200 612
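
For readers following along: each history line above appears to hold a
timestamp, a color, the epoch second at which that color began, and the
number of seconds it lasted (the last entry has no duration, presumably
because it was still current). A minimal Python sketch, assuming that field
layout, tabulates how long the status stayed in each color. The
parse_history helper is hypothetical and not part of Hobbit.

```python
# Sketch only: the field layout is inferred from the sample above,
# not taken from the Hobbit documentation.
HISTORY = """\
Tue Aug  1 10:34:30 2006 yellow 1154442870 1200
Tue Aug  1 10:54:30 2006 red 1154444070 600
Tue Aug  1 11:04:30 2006 green 1154444670 299
Tue Aug  1 11:09:29 2006 yellow 1154444969 301
Tue Aug  1 11:14:30 2006 red 1154445270 301
Tue Aug  1 11:19:31 2006 green 1154445571
"""


def parse_history(text):
    """Return (clock, color, start_epoch, duration_or_None) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        clock, color, start = fields[3], fields[5], int(fields[6])
        # The most recent entry has no duration field yet.
        duration = int(fields[7]) if len(fields) > 7 else None
        entries.append((clock, color, start, duration))
    return entries


entries = parse_history(HISTORY)
for clock, color, start, duration in entries:
    if duration is None:
        shown = "ongoing"
    else:
        shown = f"{duration // 60}m{duration % 60:02d}s"
    print(f"{clock} {color:<6} {shown}")
```

On this data the two red periods last 10m00s and 5m01s, and each entry's
start plus duration equals the next entry's start.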
list Henrik Størner · Tue, 8 Aug 2006 23:00:39 +0200 ·
quoted from Bill Perez
On Tue, Aug 01, 2006 at 01:29:00PM -0400, Bill Perez wrote:
Could you show us a copy of the cpu history log (in
~hobbit/data/hist/HOSTNAME.cpu) compared with the notifications log
from ~hobbit/server/logs/notifications.log ?
Here is the hostname.cpu and section from notifications.log for those alerts
this morning:
From /hobbit/data/hist/HOSTNAME.cpu
Tue Aug  1 10:34:30 2006 yellow 1154442870 1200
Tue Aug  1 10:54:30 2006 red 1154444070 600
Tue Aug  1 11:04:30 2006 green 1154444670 299
Tue Aug  1 11:09:29 2006 yellow 1154444969 301
Tue Aug  1 11:14:30 2006 red 1154445270 301
Tue Aug  1 11:19:31 2006 green 1154445571

Tue Aug  1 10:54:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444070 200
Tue Aug  1 11:04:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444670 200 1800
Tue Aug  1 11:19:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445570 200
Tue Aug  1 11:19:42 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445581 200 612
OK, Hobbit thinks the first event begins at 10:34, when the status
goes yellow. Even though this doesn't trigger an alert, Hobbit registers
it as the start time of the event. So when the status goes red at 10:54,
your 10 minute delay has already elapsed, and you get an immediate alert.
Then when it goes green at 11:04 you of course get a recovery notice.

The same thing happens when the status goes yellow again at 11:09. No alert
is sent, but this time is registered as the start of the event. So at 11:14
when it goes red you do not get an alert (11:09->11:14 is only 5 minutes),
but you do get the alert at 11:19:30 - and when it goes green at 11:19:31
it sends out a "recovered" message.

What time should Hobbit consider the start-of-event time?  Some prefer 
the current arrangement where it uses the time it goes non-green; others 
prefer the time it goes to a color which triggers an alert.  I've heard 
arguments both ways.


Regards,
Henrik
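
The two start-of-event choices described above can be compared on the posted
history with a small Python model. This is a sketch under stated assumptions,
not Hobbit's actual alert code: the first_alerts function and the policy
names are hypothetical, DURATION>10m is modelled as "elapsed time strictly
greater than 600 seconds", and the check is treated as if it ran every
second rather than on Hobbit's own schedule.

```python
ALERT_COLORS = {"red", "purple"}   # from COLOR=red,purple in the rule
DELAY = 600                        # DURATION>10m, expressed in seconds

# (color, epoch second at which the color began), from the history log
HISTORY = [
    ("yellow", 1154442870),
    ("red",    1154444070),
    ("green",  1154444670),
    ("yellow", 1154444969),
    ("red",    1154445270),
    ("green",  1154445571),
]


def first_alerts(history, policy):
    """Return the epoch second of the first alert in each event.

    policy "non-green":   the event starts when the status leaves green.
    policy "alert-color": the event starts when it enters an alert color.
    """
    alerts = []
    left_green = None      # when the status last left green
    entered_alert = None   # when it last entered an alert color
    alerted = False        # already alerted during this alert period?
    for i, (color, began) in enumerate(history):
        if color == "green":
            left_green = entered_alert = None
            alerted = False
            continue
        if left_green is None:
            left_green = began
        if color not in ALERT_COLORS:
            entered_alert = None
            alerted = False
            continue
        if entered_alert is None:
            entered_alert = began
        if alerted:
            continue
        event_start = left_green if policy == "non-green" else entered_alert
        # Earliest second at which the elapsed time is strictly above DELAY.
        due = max(began, event_start + DELAY + 1)
        ends = history[i + 1][1] if i + 1 < len(history) else None
        if ends is None or due < ends:
            alerts.append(due)
            alerted = True
    return alerts


print("non-green:  ", first_alerts(HISTORY, "non-green"))
print("alert-color:", first_alerts(HISTORY, "alert-color"))
```

Under these assumptions the "non-green" policy yields alerts at 1154444070
and 1154445570, the same epoch values that appear in the notifications log
excerpt, while the "alert-color" policy yields none, since neither red period
lasts longer than ten minutes on its own.
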
list Bill Perez · Thu, 10 Aug 2006 19:29:42 -0400 ·
quoted from Henrik Størner
What time should Hobbit consider the start-of-event time?  Some prefer
the current arrangement where it uses the time it goes non-green; others
prefer the time it goes to a color which triggers an alert.  I've heard
arguments both ways.
Thanks Henrik.  The way it works on the server running 4.2.1RC1 is how I'd
like it to work on the standby server.  I just installed 4.2.1P1 on the
standby and I'm still getting alerts where the duration includes the yellow
time, not just the time since it went to red/panic.  Is there a way I can
change it so that it works as it does in 4.2.1RC1 and only sends an alert
after 10 minutes of red panic, without counting the yellow time in the
duration?  I'm also finding I don't get recovery notices if it goes from red
to yellow and then to green.