Alert config issue or question -- 4.2 Alpha Release (and 4.03 Rc1)
list Tom Kauffman
I've a problem with my running 4.03RC1 (yeah, I know :-) that doesn't
seem to change with 4.2 Alpha.
With this alert_config set:
HOST=%(whq-sapcon-1|whq-sapcon-2) EXSERVICE=msgs
SCRIPT=/usr/local/hobbit/server/ext/pg/sms oracle_echelon SERVICE=telnet,svcs,procs color=RED DURATION>6m REPEAT=4h
MAIL techsupt at localhost SERVICE=telnet,svcs,procs DURATION>6 REPEAT=4h
MAIL techsupt at localhost UNMATCHED DURATION>6m REPEAT=4h
I get this when running the test:
00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 140
00018172 2006-04-05 11:42:36 *** Match with 'HOST=%(whq-sapcon-1|whq-sapcon-2) EXSERVICE=msgs' ***
00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 141
00018172 2006-04-05 11:42:36 Failed 'SCRIPT=/usr/local/hobbit/server/ext/pg/sms oracle_echelon SERVICE=telnet,svcs,procs color=RED DURATION>6m REPEAT=4h' (min. duration 0<360)
00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 142
00018172 2006-04-05 11:42:36 Failed 'MAIL techsupt at localhost SERVICE=telnet,svcs,procs DURATION>6 REPEAT=4h' (min. duration 0<360)
00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 143
00018172 2006-04-05 11:42:36 Failed 'MAIL techsupt at localhost UNMATCHED DURATION>6m REPEAT=4h' (min. duration 0<360)
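A longer trace like this one is easier to read when it is tallied per rule. The Python sketch below is a hypothetical helper (`summarize` is not part of Hobbit), and it assumes each trace entry sits on a single line (a mail archive may wrap them); the regular expressions are inferred from this one sample only, so other versions may print messages it silently skips.

```python
import re

# Each trace entry starts with "<pid> <date> <time> ". These patterns are
# inferred from the sample trace only; unrecognised lines are skipped.
PREFIX = r"^\d+ \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "
MATCHING = re.compile(PREFIX + r"Matching host:service:page '([^']*)' against rule line (\d+)$")
MATCHED = re.compile(PREFIX + r"\*\*\* Match with '(.*)' \*\*\*$")
FAILED = re.compile(PREFIX + r"Failed '(.*)' \(min\. duration (\d+)<(\d+)\)$")


def summarize(lines):
    """Return (rule_line, verdict, detail) for every rule the trace reports on."""
    results = []
    rule_line = None  # set by the most recent "Matching ... against rule line N"
    for raw in lines:
        text = raw.strip()
        if m := MATCHING.match(text):
            rule_line = int(m.group(2))
        elif m := MATCHED.match(text):
            results.append((rule_line, "match", m.group(1)))
        elif m := FAILED.match(text):
            current, minimum = int(m.group(2)), int(m.group(3))
            results.append((rule_line, "failed", f"duration {current}s < required {minimum}s"))
    return results


if __name__ == "__main__":
    trace = [
        "00018172 2006-04-05 11:42:36 Matching host:service:page 'whq-sapcon-1:telnet:' against rule line 142",
        "00018172 2006-04-05 11:42:36 Failed 'MAIL techsupt at localhost SERVICE=telnet,svcs,procs DURATION>6 REPEAT=4h' (min. duration 0<360)",
    ]
    for entry in summarize(trace):
        print(entry)
```

For the two sample lines this prints `(142, 'failed', 'duration 0s < required 360s')`, which makes it plain that the rule failed on duration, not on the service match.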
So if I'm reading this correctly, my 'duration>6m' is being interpreted
as 'duration>6h'. The intent here is to require two failed tests before
paging. Where have I gone wrong?
Tom Kauffman
NIBCO, Inc
CONFIDENTIALITY NOTICE: This email and any attachments are for the
exclusive and confidential use of the intended recipient. If you are not
the intended recipient, please do not read, distribute or take action in
reliance upon this message. If you have received this in error, please
notify us immediately by return email and promptly delete this message
and its attachments from your computer system. We do not waive
attorney-client or work product privilege by the transmission of this
message.
list Charles Jones
Not sure if this could be the problem, but your DURATION parameter in the MAIL line just says "DURATION>6". Maybe hours is the default, and you need to change it to DURATION>6m (like your second use of it on the last line)? -Charles
list Tom Kauffman
Nope - same result both ways (and the doc says minutes is the default, FWIW).
Tom
list Henrik Størner
You're not reading it correctly. When you put DURATION>6m into the config, Hobbit internally converts that into 360 seconds. The output you see means that the current duration of the alert (0 seconds) was less than the minimum duration required (360 seconds).
I cannot recall whether the test option in 4.0.3 allows you to specify the duration for the test-alert; with newer versions you can, and then you'd see the "0" change accordingly.
Henrik
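The conversion and comparison behind the "(min. duration 0<360)" message can be sketched in a few lines. This is an illustration in Python, not Hobbit's code: the function names are hypothetical, a bare number is read as minutes (consistent with the doc Tom cites and with the 360 seconds logged for `DURATION>6`), the `h` suffix appears in the thread only in `REPEAT=4h`, and the `d`/`w` suffixes are assumptions.

```python
# Seconds per unit suffix. Only "6m" and a bare "6" appear as DURATION values
# in this thread (both logged as 360 s); "h" is borrowed from REPEAT=4h and
# the d/w suffixes are assumed for illustration.
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(value: str) -> int:
    """Convert a DURATION-style value such as '6m', '6' or '4h' into seconds."""
    value = value.strip().lower()
    if value[-1:] in UNIT_SECONDS:
        number, unit = value[:-1], value[-1]
    else:
        number, unit = value, "m"  # a bare number is read as minutes
    return int(number) * UNIT_SECONDS[unit]


def meets_min_duration(current_seconds: int, rule_value: str) -> bool:
    """Mirror the trace: a rule fails with "(min. duration CUR<MIN)" when CUR < MIN."""
    return current_seconds >= duration_to_seconds(rule_value)


if __name__ == "__main__":
    for cur in (0, 361):
        print(cur, meets_min_duration(cur, "6m"))
```

With the test-alert duration at 0, every DURATION>6m rule fails the check; a simulated duration of 361 seconds passes it, which is the "0" changing that Henrik describes.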
list Tom Kauffman
Well, I feel better, but I'm still stumped. We had an instance last week where the telnet test failed for 21 minutes -- and no alerts went out. This box acts as a telnet server for half our remote-site scanners, so if we lose it, we have production people not working :-(
What's the syntax for adding the duration on the test? I tried "bin/bbcmd hobbitd_alert --test whq-sapcon-1 telnet DURATION=361" while I was playing around, to no apparent avail (on the 4.2 alpha).
Thanks!
Tom
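Whether a 21-minute outage should have paged can be checked with a little arithmetic. The Python sketch below is a hypothetical model, not Hobbit's scheduler: it assumes the network tests run every 5 minutes and that the alert rules are re-evaluated only when a new red status report arrives; `alert_times` and its parameters are illustrative names.

```python
def alert_times(outage_s: int, interval_s: int, min_duration_s: int, repeat_s: int) -> list[int]:
    """Seconds after the first red report at which an alert would be sent.

    Hypothetical model: rules are re-evaluated only when a new status report
    arrives, i.e. every interval_s seconds while the service stays down.
    """
    sent: list[int] = []
    t = 0
    while t < outage_s:  # reports that arrive while the service is still down
        due_again = not sent or t - sent[-1] >= repeat_s  # REPEAT spacing
        if t >= min_duration_s and due_again:  # DURATION threshold
            sent.append(t)
        t += interval_s
    return sent


if __name__ == "__main__":
    # 21-minute telnet outage, 5-minute test interval (assumed), DURATION>6m, REPEAT=4h
    print(alert_times(21 * 60, 5 * 60, 6 * 60, 4 * 3600))
```

For the 21-minute case this prints `[600]`: under this model the first alert is due at the report 10 minutes in (the third failed test, since the second one, at 300 seconds, is still under 360), and REPEAT=4h suppresses any repeat. If the alert module instead re-checks durations on its own timer, the alert would come shortly after the 6-minute mark. Under either assumption an outage of this length crosses the DURATION>6m threshold, so the missing alert points somewhere other than the DURATION setting.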
list Tom Kauffman
OK, I *am* dense today. Found the way to specify duration on the test (DUH!) and am now trying to figure out wha' hoppened last week. Time to set up controlled test environment again . . . Tom