Xymon Mailing List Archive

Alert config issue or question -- 4.2 Alpha Release (and 4.03 Rc1)

6 messages in this thread

Tom Kauffman · Wed, 5 Apr 2006 11:52:00 -0400
I've a problem with my running 4.03RC1 (yeah, I know :-) that doesn't
seem to change with 4.2 Alpha.

With this alert_config set:

HOST=%(whq-sapcon-1|whq-sapcon-2) EXSERVICE=msgs
        SCRIPT=/usr/local/hobbit/server/ext/pg/sms oracle_echelon SERVICE=telnet,svcs,procs color=RED DURATION>6m REPEAT=4h
        MAIL techsupt@localhost SERVICE=telnet,svcs,procs DURATION>6 REPEAT=4h
        MAIL techsupt@localhost UNMATCHED DURATION>6m REPEAT=4h

I get this when running the test:

00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 140
00018172 2006-04-05 11:42:36 *** Match with 'HOST=%(whq-sapcon-1|whq-sapcon-2) EXSERVICE=msgs' ***
00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 141
00018172 2006-04-05 11:42:36 Failed 'SCRIPT=/usr/local/hobbit/server/ext/pg/sms oracle_echelon SERVICE=telnet,svcs,procs color=RED DURATION>6m REPEAT=4h' (min. duration 0<360)
00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 142
00018172 2006-04-05 11:42:36 Failed 'MAIL techsupt@localhost SERVICE=telnet,svcs,procs DURATION>6 REPEAT=4h' (min. duration 0<360)
00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 143
00018172 2006-04-05 11:42:36 Failed 'MAIL techsupt@localhost UNMATCHED DURATION>6m REPEAT=4h' (min. duration 0<360)

So if I'm reading this correctly, my 'duration>6m' is being interpreted
as 'duration>6h'. The intent here is to require two failed tests before
paging. Where have I gone wrong?

Tom Kauffman
NIBCO, Inc

CONFIDENTIALITY NOTICE:  This email and any attachments are for the 
exclusive and confidential use of the intended recipient.  If you are not
the intended recipient, please do not read, distribute or take action in 
reliance upon this message. If you have received this in error, please 
notify us immediately by return email and promptly delete this message 
and its attachments from your computer system. We do not waive  
attorney-client or work product privilege by the transmission of this
message.
Charles Jones · Wed, 05 Apr 2006 09:01:52 -0700
Not sure if this could be the problem, but your DURATION parameter in 
the MAIL line just says "DURATION>6". Maybe hours is the default, and 
you need to change it to DURATION>6m (like your second use of it on the 
last line)?

-Charles

Tom Kauffman · Wed, 5 Apr 2006 12:05:06 -0400
Nope - same result both ways (and the doc says minutes is the default,
FWIW).

 
Tom
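Charles's guess and Tom's reply both turn on how a DURATION value without a unit is read. A minimal Python sketch of that conversion follows. It assumes, as Tom reports the documentation says, that a bare number means minutes, and it also assumes the suffixes m, h, d and w; the function name `duration_to_seconds` is hypothetical and not part of Hobbit.

```python
import re

# Assumed unit table (not taken from Hobbit's source): m = minutes,
# h = hours, d = days, w = weeks, each expressed in seconds.
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(value: str) -> int:
    """Convert a DURATION/REPEAT value such as '6', '6m' or '4h' to seconds."""
    match = re.fullmatch(r"(\d+)([mhdw]?)", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    number, unit = match.groups()
    # No suffix: fall back to minutes, per the documentation Tom cites.
    return int(number) * UNIT_SECONDS[unit or "m"]


if __name__ == "__main__":
    for text in ("6", "6m", "4h"):
        print(text, "->", duration_to_seconds(text), "seconds")
```

Under these assumptions `DURATION>6` and `DURATION>6m` both come out as 360 seconds, which fits Tom seeing the same result both ways.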

 
Henrik Størner · Wed, 5 Apr 2006 18:08:37 +0200
On Wed, Apr 05, 2006 at 11:52:00AM -0400, Kauffman, Tom wrote:
I've a problem with my running 4.03RC1 (yeah, I know :-) that doesn't
seem to change with 4.2 Alpha.
00018172 2006-04-05 11:42:36 Failed 'MAIL techsupt@localhost UNMATCHED DURATION>6m REPEAT=4h' (min. duration 0<360)

So if I'm reading this correctly, my 'duration>6m' is being interpreted
as 'duration>6h'. The intent here is to require two failed tests before
paging. Where have I gone wrong?
You're not reading it correctly. When you put DURATION>6m into the
config, Hobbit internally converts that into 360 seconds. The output you
see means that the current duration of the alert (0 seconds) was less
than the minimum duration required (360 seconds).

I cannot recall if the test option in 4.0.3 allows you to specify the
duration for the test-alert; with newer versions you can, and then you'd
see the "0" change accordingly.


Henrik
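Henrik's reading of the `(min. duration 0<360)` note can be sketched as a plain comparison. This is a hypothetical Python illustration, not Hobbit's code: it assumes a recipient fails whenever the current alert duration is below the rule's minimum, as the trace output suggests, and the name `min_duration_check` is made up for the example.

```python
def min_duration_check(current_s: int, min_s: int) -> tuple[bool, str]:
    """Evaluate a DURATION> filter (illustrative only, not Hobbit source).

    current_s -- seconds the status has been in an alerting state
    min_s     -- the rule's minimum duration, already converted to seconds
    """
    if current_s < min_s:
        # Same shape as the "(min. duration 0<360)" note in the --test trace.
        return False, f"min. duration {current_s}<{min_s}"
    return True, "duration ok"


if __name__ == "__main__":
    # A test alert run without a duration starts at 0 seconds.
    print(min_duration_check(0, 360))
    # A simulated 361-second outage clears a DURATION>6m (360 s) filter.
    print(min_duration_check(361, 360))
```

Under this assumption a test run that supplies no duration fails every DURATION>6m recipient, exactly as in Tom's trace, while a simulated duration such as 361 seconds lets the filter pass.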
Tom Kauffman · Wed, 5 Apr 2006 13:02:43 -0400
Well, I feel better, but then I'm still stumped. We had an instance last
week where the telnet test failed for 21 minutes -- and no alerts went
out.

This box acts as a telnet server for half our remote-site scanners, so
if we lose it, we have production people not working :-(

What's the syntax for adding the duration on the test? I tried
"bin/bbcmd hobbitd_alert --test whq-sapcon-1 telnet DURATION=361" while
I was playing around, to no apparent avail (on the 4.2 alpha).

Thanks!

Tom
Tom Kauffman · Wed, 5 Apr 2006 18:01:51 -0400
OK, I *am* dense today. Found the way to specify duration on the test
(DUH!) and am now trying to figure out wha' hoppened last week.

Time to set up controlled test environment again . . .

Tom