cpu alerts
list Bill Perez
Hello - I have a standby Hobbit server running 4.2-alfa-20060423, with alerts configured the same as on the primary Hobbit server running 4.2.1RC1. I have a duration of 10m set up for the cpu alert, but I have been noticing cpu alerts arriving before the 10 minutes are up (cpu only panics for 4:59). It also only sends the alert; it doesn't send a recovery. As I mentioned, I have the same alert configured on another Hobbit server and don't get alerts from that one, only from the standby.

Has anyone seen different behavior with alerting for this Hobbit version? Any ideas why this might be happening? I have a bunch of other alerts set up that seem to be behaving properly; it just seems to be cpu that is acting differently.

Here's the alert config -

MAIL $UNIX HOST=%(server1|server2|server3) DURATION>10m RECOVERED SERVICE=cpu COLOR=red,purple

Thanks in advance for any info
list Bill Perez
Hi - just wondering if someone could shed any light on why I'm having
duration issues with my cpu alert.
I updated my secondary Hobbit server to the latest Hobbit version,
4.2RC20060712, and am still not having any luck getting the duration to
take effect for my cpu alerts. The info column shows that the delay is
10 minutes, but an alert is sent after 5 minutes of red status. The
recovery email has not been consistent either.
Here is my alert configuration -
PAGE=$PG_SIEBEL
MAIL $UNIX HOST=server.domain.com SERVICE=cpu DURATION>10 RECOVERED COLOR=red,purple
Any ideas?
---------- Forwarded message ----------
From: Bill Perez <user-3527628fa04a@xymon.invalid>
Date: Jul 29, 2006 8:02 PM
Subject: cpu alerts
To: hobbit <user-ae9b8668bcde@xymon.invalid>
Hello - I have a standby hobbit server running 4.2-alfa-20060423, with
alerts configured the same as the primary hobbit server running 4.2.1RC1. I
have a duration of 10m set up for the cpu alert and have been noticing cpu
alerts coming before the 10 minutes (cpu only panics for 4:59) and only
sends the alert doesn't send a recovery. Like I mentioned I have the same
alert configured on another hobbit server and don't get alerts from that
one, only the standby. Has anyone seen different behavior with alerting for
this hobbit version? Any ideas why this might be happening? I have a bunch
of other alerts set up that seem to be behaving properly, it just seems to
be cpu that is acting different.
Here's the alert config -
MAIL $UNIX HOST=%(server1|server2|server3) DURATION>10m RECOVERED
SERVICE=cpu COLOR=red,purple
Thanks in advance for any info
list Henrik Størner
On Tue, Aug 01, 2006 at 11:54:33AM -0400, Bill Perez wrote:
Hi - just wondering if someone could shed any light on why I'm having duration alert issues with cpu alert. I updated my secondary hobbit server to the latest Hobbit version, 4.2RC20060712 and am still not having luck with the duration taking affect for my cpu alerts. It shows in the info column that the delay is 10 minutes, but will send an alert after 5 minutes of red status. The recovery email has not been consistent either.
Could you show us a copy of the cpu history log (in ~hobbit/data/hist/HOSTNAME.cpu) compared with the notifications log from ~hobbit/server/logs/notifications.log?
Regards,
Henrik
list Bill Perez
Could you show us a copy of the cpu history log (in ~hobbit/data/hist/HOSTNAME.cpu) compared with the notifications log from ~hobbit/server/logs/notifications.log ?
Here is the hostname.cpu and section from notifications.log for those alerts this morning:

From /hobbit/data/hist/HOSTNAME.cpu
Tue Aug 1 10:34:30 2006 yellow 1154442870 1200
Tue Aug 1 10:54:30 2006 red 1154444070 600
Tue Aug 1 11:04:30 2006 green 1154444670 299
Tue Aug 1 11:09:29 2006 yellow 1154444969 301
Tue Aug 1 11:14:30 2006 red 1154445270 301
Tue Aug 1 11:19:31 2006 green 1154445571

Tue Aug 1 10:54:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444070 200
Tue Aug 1 11:04:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444670 200 1800
Tue Aug 1 11:19:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445570 200
Tue Aug 1 11:19:42 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445581 200 612
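
As a rough cross-check of the history log above, the following Python sketch derives how long each status lasted from the epoch column and compares it with the logged duration column. The column meaning (date, color, start epoch, duration in seconds) is inferred from the data in this thread, and the helper names parse_history and seconds_per_stretch are made up for illustration.

```python
# History lines copied from the HOSTNAME.cpu excerpt in this thread.
HIST_LINES = """\
Tue Aug 1 10:34:30 2006 yellow 1154442870 1200
Tue Aug 1 10:54:30 2006 red 1154444070 600
Tue Aug 1 11:04:30 2006 green 1154444670 299
Tue Aug 1 11:09:29 2006 yellow 1154444969 301
Tue Aug 1 11:14:30 2006 red 1154445270 301
Tue Aug 1 11:19:31 2006 green 1154445571
""".splitlines()


def parse_history(lines):
    """Return (color, start_epoch, logged_duration_or_None) for each line."""
    rows = []
    for line in lines:
        fields = line.split()
        # fields[0:5] is the human-readable date (assumed layout);
        # then come color, epoch and, except on the last line, duration.
        color, epoch = fields[5], int(fields[6])
        duration = int(fields[7]) if len(fields) > 7 else None
        rows.append((color, epoch, duration))
    return rows


def seconds_per_stretch(rows):
    """Length of each status stretch, from consecutive epoch values."""
    return [
        (color, next_epoch - epoch)
        for (color, epoch, _), (_, next_epoch, _) in zip(rows, rows[1:])
    ]


rows = parse_history(HIST_LINES)
for color, seconds in seconds_per_stretch(rows):
    print(f"{color:6s} {seconds:5d} s")
```

For this excerpt the derived values agree with the logged durations: the two red stretches lasted 600 and 301 seconds, and each was preceded by a yellow stretch.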
list Henrik Størner
On Tue, Aug 01, 2006 at 01:29:00PM -0400, Bill Perez wrote:
Could you show us a copy of the cpu history log (in ~hobbit/data/hist/HOSTNAME.cpu) compared with the notifications log from ~hobbit/server/logs/notifications.log ?
Here is the hostname.cpu and section from notifications.log for those alerts this morning:

From /hobbit/data/hist/HOSTNAME.cpu
Tue Aug 1 10:34:30 2006 yellow 1154442870 1200
Tue Aug 1 10:54:30 2006 red 1154444070 600
Tue Aug 1 11:04:30 2006 green 1154444670 299
Tue Aug 1 11:09:29 2006 yellow 1154444969 301
Tue Aug 1 11:14:30 2006 red 1154445270 301
Tue Aug 1 11:19:31 2006 green 1154445571

Tue Aug 1 10:54:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444070 200
Tue Aug 1 11:04:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154444670 200 1800
Tue Aug 1 11:19:30 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445570 200
Tue Aug 1 11:19:42 2006 uswosfad.domain.com.cpu (10.128.40.31) user-b9608d6c1a4c@xymon.invalid[175] 1154445581 200 612
OK, Hobbit thinks the first event begins at 10:34 when the status goes yellow. Even though this doesn't trigger an alert, it registers this as the start time of the event. So when it goes red at 10:54, your 10 minute delay has already elapsed, and you get an immediate alert. Then when it goes green at 11:04 you of course get a recovery notice.

Same thing when the status goes yellow again at 11:09. No alert is sent, but this time is registered as the start of the event. So at 11:14 when it goes red you do not get an alert (11:09->11:14 is only 5 minutes), but you do get the alert at 11:19:30 - and when it goes green at 11:19:31 it sends out a "recovered" message.

What time should Hobbit consider the start-of-event time? Some prefer the current arrangement where it uses the time it goes non-green; others prefer the time it goes to a color which triggers an alert. I've heard arguments both ways.

Regards,
Henrik
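
The timing described above can be replayed with a small Python sketch. This is not Hobbit's code: the function name eligible_alert_times and the start_on switch are made up for illustration, and the sketch only computes when an alert first becomes eligible under each start-of-event rule, using the transitions from the history log in this thread. The real alert module sends the message when it next evaluates the status, so the logged time can be slightly later.

```python
from datetime import datetime, timedelta

# Status transitions copied from the HOSTNAME.cpu history in this thread.
HISTORY = [
    ("10:34:30", "yellow"),
    ("10:54:30", "red"),
    ("11:04:30", "green"),
    ("11:09:29", "yellow"),
    ("11:14:30", "red"),
    ("11:19:31", "green"),
]


def parse(hms):
    return datetime.strptime("2006-08-01 " + hms, "%Y-%m-%d %H:%M:%S")


def eligible_alert_times(history, duration,
                         alert_colors=("red", "purple"),
                         start_on="non-green"):
    """Times at which a DURATION-delayed alert first becomes eligible.

    start_on="non-green":   the event starts when the status leaves green
                            (the behavior Henrik describes).
    start_on="alert-color": the event starts when the status enters a
                            color listed in alert_colors (hypothetical).
    """
    results = []
    event_start = None  # when the status first left green
    alert_start = None  # when the status first entered an alerting color
    alerted = False     # at most one alert per event in this sketch
    for i, (hms, color) in enumerate(history):
        t = parse(hms)
        end = parse(history[i + 1][0]) if i + 1 < len(history) else None
        if color == "green":
            event_start = alert_start = None
            alerted = False
            continue
        if event_start is None:
            event_start = t
        if color not in alert_colors:
            alert_start = None
            continue
        if alert_start is None:
            alert_start = t
        start = event_start if start_on == "non-green" else alert_start
        eligible = max(t, start + duration)
        # Only counts if the status is still alerting at that moment.
        if not alerted and (end is None or eligible < end):
            results.append(eligible.strftime("%H:%M:%S"))
            alerted = True
    return results


ten_minutes = timedelta(minutes=10)
print(eligible_alert_times(HISTORY, ten_minutes))
print(eligible_alert_times(HISTORY, ten_minutes, start_on="alert-color"))
```

With the default rule this gives 10:54:30 and 11:19:29, in line with the alerts logged at 10:54:30 and 11:19:30. With start_on="alert-color" it gives no alerts at all, because red never lasted longer than 10 minutes in this excerpt.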
list Bill Perez
What time should Hobbit consider the start-of-event time? Some prefer the current arrangement where it uses the time it goes non-green; others prefer the time it goes to a color which triggers an alert. I've heard arguments both ways.
Thanks Henrik. The way it works on the server I have running 4.2.1RC1 is how I want it to work on the standby server. I just installed 4.2.1P1 on the standby, and I'm still getting alerts where the duration includes the yellow time, not just the time since it went to red/panic. Is there a way I can change it so that it works as it does in 4.2.1RC1 and only sends an alert after 10 minutes of red (panic), without counting the yellow time toward the duration? I'm also finding that I don't get recovery notices if it goes from red to yellow and then to green.