Xymon Mailing List Archive

disable untill ok bug ?

10 messages in this thread

list Stef Coene · Wed, 4 Mar 2009 10:27:53 +0100 ·
Hi,

Yesterday I disabled a ping test 'until OK'.
Last night, the ping test failed on the hobbit server (4.2) with status clear:
Wed Mar 4 00:05:42 2009 ping ok : System failure of the ping test

Service ping on srvs3tsm is OK
Hobbit system error

System unreachable for 1508 poll periods (455337 seconds)

This also cleared the disable setting, so 5 minutes later the status was in
error, and 10 minutes later an alert was triggered.


Is it normal that a 'system failure' of the ping test clears the 'disabled
until OK' setting?


Stef
list Henrik Størner · Wed, 4 Mar 2009 13:45:45 +0100 ·
quoted from Stef Coene
On Wed, Mar 04, 2009 at 10:27:53AM +0100, Stef Coene wrote:
Yesterday I disabled a ping test 'until OK'.
Last night, the ping test failed on the hobbit server (4.2) with status clear:
Wed Mar 4 00:05:42 2009 ping ok : System failure of the ping test

Service ping on srvs3tsm is OK
Hobbit system error

System unreachable for 1508 poll periods (455337 seconds)

This also cleared the disable setting, so 5 minutes later the status was in error, and 10 minutes later an alert was triggered.

Is it normal that a 'system failure' of the ping test clears the 'disabled until OK' setting?
A "system error" of the ping test is definitely NOT normal.
Check the network test error log to see why it failed,
it shouldn't do that. (Most likely it couldn't create the file where fping or hobbitping stores the results).

Whether a "clear" status should count as OK in the "disabled until OK" sense can be discussed.


Regards,
Henrik
list Dan McDonald · Wed, 4 Mar 2009 06:55:41 -0600 ·
quoted from Stef Coene
On Wed, 2009-03-04 at 10:27 +0100, Stef Coene wrote:
Hi,

Yesterday I disabled a ping test 'until OK'.
Last night, the ping test failed on the hobbit server (4.2) with status clear:
[...]
Is it normal that a 'system failure' of the ping test clears the 'disabled
until OK' setting?
If clear is listed in OKCOLORS, then yes, that would be normal.

The default is:
xymonserver.cfg:ALERTCOLORS="red,yellow,purple"    # Colors that may trigger an alert message
xymonserver.cfg:OKCOLORS="green,blue,clear"        # Colors that may trigger a recovery message
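
The rule described here can be sketched in a few lines of Python. This is only an illustration of the behaviour as explained in this thread, not code from the Hobbit/Xymon source; the function names `parse_colors` and `lifts_disable_until_ok` are made up for the example, and the second color list (without clear) is a hypothetical alternative setting.

```python
# Illustrative sketch only; not taken from the Hobbit/Xymon source.

def parse_colors(setting):
    """Split a comma-separated color list such as OKCOLORS into a set."""
    return {color.strip() for color in setting.split(",") if color.strip()}


def lifts_disable_until_ok(new_color, okcolors):
    """Return True if a status of new_color would end a 'disable until OK'.

    Assumption: any color listed in OKCOLORS counts as "OK" for this purpose.
    """
    return new_color in parse_colors(okcolors)


default_ok = "green,blue,clear"  # the default OKCOLORS quoted above
without_clear = "green,blue"     # hypothetical setting with clear removed

print(lifts_disable_until_ok("clear", default_ok))     # True
print(lifts_disable_until_ok("clear", without_clear))  # False
```

Under that assumption, a clear status ends the disable with the default list and leaves it in place once clear is removed from the list.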

-- 
Daniel J McDonald, CCIE #2495, CISSP #78281, CNX
Austin Energy
http://www.austinenergy.com
list Stef Coene · Wed, 4 Mar 2009 15:25:44 +0100 ·
quoted from Henrik Størner
On Wednesday 04 March 2009, Henrik Størner wrote:
A "system error" of the ping test is definitely NOT normal.
Check the network test error log to see why it failed,
it shouldn't do that. (Most likely it couldn't create the
file where fping or hobbitping stores the results).
I know, I found the culprit:
2009-03-04 00:05:47 Cannot create file /home/users/hobbit/server/tmp/ping-stdout.2056 : Permission denied
2009-03-04 00:05:47 Cannot create file /home/users/hobbit/server/tmp/ping-stderr.2056 : Permission denied
2009-03-04 00:05:50 xgetenv: Cannot find value for variable $BBTMP
2009-03-04 00:05:50 hobbitping child could not create outputfiles in (null)
2009-03-04 00:05:50 Cannot open ping output file /home/users/hobbit/server/tmp/ping-stdout.2056

This is caused by a configure script that does a chown root on all hobbit
files and then changes the required files back to owner hobbit.
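
A quick way to confirm this kind of ownership problem is to try creating a scratch file in the tmp directory as the hobbit user. The sketch below is only an illustration using the Python standard library; the helper name `can_create_file` and the `ping-check.` prefix are made up for the example, and the directory is the one from the log lines above.

```python
import os
import tempfile


def can_create_file(directory):
    """Return True if the current user can create a file in directory."""
    try:
        # Create a scratch file, much as the ping test needs to do.
        fd, path = tempfile.mkstemp(prefix="ping-check.", dir=directory)
    except OSError:
        # Covers "Permission denied" as well as a missing directory.
        return False
    os.close(fd)
    os.remove(path)
    return True


if __name__ == "__main__":
    # Directory taken from the log lines above; run this as the hobbit user.
    tmp_dir = "/home/users/hobbit/server/tmp"
    print(tmp_dir, "writable:", can_create_file(tmp_dir))
```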
quoted from Henrik Størner
Whether a "clear" status should count as OK in the
"disabled until OK" sense can be discussed.
For me, OK = green.  I didn't expect that a clear message would count as OK.


Stef
list Stef Coene · Wed, 4 Mar 2009 15:26:09 +0100 ·
quoted from Dan McDonald
On Wednesday 04 March 2009, McDonald, Dan wrote:
On Wed, 2009-03-04 at 10:27 +0100, Stef Coene wrote:
Hi,

Yesterday I disabled a ping test 'until OK'.
Last night, the ping test failed on the hobbit server (4.2) with status
clear:
[...]
Is it normal that a 'system failure' of the ping test clears the
'disabled until OK' setting?
If clear is listed in OKCOLORS, then yes, that would be normal.

The default is:
xymonserver.cfg:ALERTCOLORS="red,yellow,purple"    # Colors that may trigger an alert message
xymonserver.cfg:OKCOLORS="green,blue,clear"        # Colors that may trigger a recovery message
Thx, I will change this to green,blue only.


Stef
list D. - Gdi/snb Kip · Thu, 5 Mar 2009 09:21:00 +0100 ·
-----Original Message-----
From: Stef Coene [mailto:user-dbffe946c0f4@xymon.invalid]
Sent: Wednesday, 4 March 2009 15:26
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] disable untill ok bug ?
Thx, I will change this to green,blue only.
That will also mean any page with a 'clear' status will no longer remain
green...
Make sure you want this effect as well...

Your one-time faulty ping was obviously not a common problem, and you
have found the cause. It is unlikely that you will want to disable a
ping test 'until OK' more often than you would encounter a non-green page
due to a 'clear' test.

For example, I use the clear status as an in-between before going yellow
or red, with the 'badtest' settings. I would not want the page to go
grey just because one ping was missed, so we have 'badconn:1:3:5' in our
host settings.
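
As a rough illustration of what 'badconn:1:3:5' does, the Python sketch below maps a count of consecutive failed pings to a status color. It assumes the three numbers are the failure counts at which the status goes clear, yellow and red respectively, and that each threshold means "at least this many failures"; the function name `conn_color` is made up for the example and this is not Hobbit code.

```python
# Illustrative sketch only; threshold semantics are an assumption (see above).

def conn_color(consecutive_failures, clear_at=1, yellow_at=3, red_at=5):
    """Map consecutive ping failures to a color for badconn:1:3:5."""
    if consecutive_failures >= red_at:
        return "red"
    if consecutive_failures >= yellow_at:
        return "yellow"
    if consecutive_failures >= clear_at:
        return "clear"
    return "green"


print([conn_color(n) for n in range(6)])
# ['green', 'clear', 'clear', 'yellow', 'yellow', 'red']
```

So a single missed ping only takes the test to clear, which is why it matters whether clear is treated as an OK color.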

//Danny.
list Stef Coene · Thu, 5 Mar 2009 10:19:51 +0100 ·
quoted from D. - Gdi/snb Kip
That will also mean any page with a 'clear' status will no longer remain
green...
Make sure you want this effect as well...
A clear page is still grey, not green.
As far as I can tell, it just means that if you disable a test, a clear
status will not undo the disable, so the test will stay disabled.


Stef
list D. - Gdi/snb Kip · Thu, 5 Mar 2009 11:58:34 +0100 ·
quoted from Stef Coene
 

-----Original Message-----
From: Stef Coene [mailto:user-dbffe946c0f4@xymon.invalid]
Sent: Thursday, 5 March 2009 10:20
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] disable untill ok bug ?
That will also mean any page with a 'clear' status will no longer 
remain green...
Make sure you want this effect as well...
A clear page is still grey, not green.
As far as I can tell, it just means that if you disable a test, a
clear status will not undo the disable, so the test will stay
disabled.
A page full of hosts that is all-green (every dot is green) will display
as a green page.

If one dot (for example the 'conn' check on server X) changes to 'clear'
(grey, as you wish) the page will remain green. 

If later, the dot ('conn' for server X again) changes to yellow, the
entire page will also turn yellow (unless there is a 'NOPROPAGATEYELLOW'
setting on that check, of course)


This is the desired behaviour of hobbit, as the 'clear' setting is marked in
the configuration as 'OK' and 'yellow' is not (see previous posts in
this thread).
If you change the setting in the configuration for 'clear' to not be
'OK', pages will change color to 'clear' (yes, grey).

This last bit may not be the behaviour the original poster wants, so I
wished to warn him about it.

If you were referring to the separate page for the actual check (dot)
that turns grey, then yes, that will always be grey, as that page (the
'conn' page for server X perhaps?) only refers to the actual check you
are displaying, not a page full of hosts.

//Danny.
list Stef Coene · Thu, 5 Mar 2009 13:26:23 +0100 ·
quoted from D. - Gdi/snb Kip
On Thursday 05 March 2009, Kip, D. - GDI/SNB wrote:
If you change the setting in the configuration for 'clear' to not be
'OK', pages will change color to 'clear' (yes, grey).
Are you sure?  Because all pages are green, even if there is a clear check.
From the config file:
OKCOLORS: Colors that may trigger a recovery message
In my case this is blue and green.  I don't think this has anything to do with
determining the color of the pages.


Stef
list D. - Gdi/snb Kip · Thu, 5 Mar 2009 14:18:07 +0100 ·
I admit that I am not sure :)

But a colleague had some time back changed stuff around and we got grey
pages as a result. I never bothered to look exactly what he changed and
just put back the backup... But I know he was at least changing things
in that config file :p