Xymon Mailing List Archive

best (or any) way to remember disabled tests on the main page?

5 messages in this thread

Oliver · Tue, 29 Jul 2014 16:39:17 -0400
I'm not really sure how to explain this, but hopefully it will make sense.

Let's say the main Xymon page is divided up as follows:

prod --> 8 servers
devel --> 8 servers

One of the 'prod' servers has a known issue causing it to go red, so
we disable the test and it switches to blue.  Is there any way to
reflect this on the main page (which has now, of course, gone back to
green)?

Ideally, I'd like to see the name of the server group ("prod" in the
example) change to blue from white on the main view to remind me
there's an ignored test.  But I don't want the "main view" colour to
change from green.

Possible?  Easy?

Thanks
John Thurston · Tue, 29 Jul 2014 12:47:42 -0800
On 7/29/2014 12:39 PM, oliver wrote:
> I'm not really sure how to explain this but hopefully it will make sense
>
> Let's say the main Xymon page is divided up as follows:
>
> prod --> 8 servers
> devel --> 8 servers
>
> One of the 'prod' servers has a known issue causing it to go red, so
> we disable the test and it switches to blue.  Is there any way to
> reflect this on the main page (which has now, of course, gone back to
> green)?
>
> Ideally, I'd like to see the name of the server group ("prod" in the
> example) change to blue from white on the main view to remind me
> there's an ignored test.  But I don't want the "main view" colour to
> change from green

Don't disable the test. Acknowledge the alert.

-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska
Oliver · Wed, 30 Jul 2014 12:50:12 -0400
John Thurston wrote:
>> Ideally, I'd like to see the name of the server group ("prod" in the
>> example) change to blue from white on the main view to remind me
>> there's an ignored test.  But I don't want the "main view" colour to
>> change from green
>
> Don't disable the test. Acknowledge the alert.

Let me explain the situation a little more clearly.

We have tons of servers deployed in pairs.  Each pair consists of an
active box and a standby box and it doesn't technically matter which
one of the two is active.  For consistency reasons, we like to keep it
so the "first" box is active whenever possible.

If the first box fails over, for whatever reason, it generates a red
alarm on Xymon saying it's no longer active and (after checking
everything out) we ask someone on the night-shift to fail back over
during off-hours.  At this point, we don't want the main Xymon view to
be red so we "ignore" the test.  However, since the main view is now
green, the techs sometimes forget that there's anything to do and it
remains failed over until someone drills down and sees it.

I was trying to get to a state where they would know that there's a
disabled/ignored/ack'd box from the front page to eliminate the "I
missed the email" excuses.
John Thurston · Wed, 30 Jul 2014 09:14:27 -0800
On 7/30/2014 8:50 AM, oliver wrote:
>>> Ideally, I'd like to see the name of the server group ("prod" in the
>>> example) change to blue from white on the main view to remind me
>>> there's an ignored test.  But I don't want the "main view" colour to
>>> change from green
>>
>> Don't disable the test. Acknowledge the alert.
> Let me explain the situation a little more clearly.
>
> We have tons of servers deployed in pairs.  Each pair consists of an
> active box and a standby box and it doesn't technically matter which
> one of the two is active.  For consistency reasons, we like to keep it
> so the "first" box is active whenever possible.
>
> If the first box fails over, for whatever reason, it generates a red
> alarm on Xymon saying it's no longer active and (after checking
> everything out) we ask someone on the night-shift to fail back over
> during off-hours.  At this point, we don't want the main Xymon view to
> be red so we "ignore" the test.  However, since the main view is now
> green, the techs sometimes forget that there's anything to do and it
> remains failed over until someone drills down and sees it.

This comes back around to something I regularly tell our staff:
"Xymon (and Big Brother before that) is not a task list. It is an alerting system. Using it as a task list is an abuse of the tool and reduces its ability to meet its fundamental business goal."

We have task-list and problem-tracking processes in place, so we don't need to use Xymon to meet this need. Your business needs and available tools may be different, but I urge you to consider finding a better tool than Xymon for managing task lists.

> I was trying to get to a state where they would know that there's a
> disabled/ignored/ack'd box from the front page to eliminate the "I
> missed the email" excuses

You could define a 'combo' test which alarmed when fewer than two of the underlying tests were green. This 'combo' test could be rigged to propagate to the non-green screen while suppressing the propagation of the underlying tests.

You could then rig the underlying tests to send automated email alerts to the folks who should fix the broken half of the pair. Look at combo.cfg and alerts.cfg for options to aggregate test results and time/escalate automated email alerts.
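
To make the 'combo' rule above concrete, here is a minimal Python sketch of the logic only: the combined status alarms when fewer than two of the underlying tests are green. It is a model of the idea, not Xymon code or combo.cfg syntax. The function name, the host.test names, and the choice of "red" as the alarm colour are all hypothetical.

```python
from typing import Mapping


def combo_color(pair: Mapping[str, str], required_green: int = 2) -> str:
    """Hypothetical model of the rule: return 'red' when fewer than
    `required_green` of the underlying tests are green."""
    green = sum(1 for color in pair.values() if color == "green")
    return "green" if green >= required_green else "red"


# Hypothetical host.test names for one active/standby pair.
healthy = {"app01a.active": "green", "app01b.active": "green"}
# The first box has failed over and its test was disabled (blue).
failed_over = {"app01a.active": "blue", "app01b.active": "green"}

print(combo_color(healthy))      # green
print(combo_color(failed_over))  # red
```

In this model a disabled (blue) underlying test still counts as "not green", so the combined status keeps alarming until the pair is failed back.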

Jeremy Laidman · Wed, 6 Aug 2014 21:34:37 +1000
On 31/07/2014 3:14 AM, "John Thurston" <user-ce4d79d99bab@xymon.invalid> wrote:
> This comes back around to something I regularly tell our staff:
> "Xymon (and Big Brother before that) is not a task list. It is an
> alerting system. Using it as a task list is an abuse of the tool and
> reduces its ability to meet its fundamental business goal."

This is gold! Thank you.