Xymon Mailing List Archive

best (or any) way to remember disabled tests on the main page?

John Thurston
Wed, 30 Jul 2014 09:14:27 -0800
Message-Id: <user-02b232e60fbd@xymon.invalid>

On 7/30/2014 8:50 AM, oliver wrote:
Ideally, I'd like to see the name of the server group ("prod" in the
example) change to blue from white on the main view to remind me
there's an ignored test.  But I don't want the "main view" colour to
change from green.

Don't disable the test. Acknowledge the alert.

Let me explain the situation a little more clearly.

We have tons of servers deployed in pairs.  Each pair consists of an
active box and a standby box and it doesn't technically matter which
one of the two is active.  For consistency reasons, we like to keep it
so the "first" box is active whenever possible.

If the first box fails over, for whatever reason, it generates a red
alarm on Xymon saying it's no longer active and (after checking
everything out) we ask someone on the night-shift to fail back over
during off-hours.  At this point, we don't want the main Xymon view to
be red so we "ignore" the test.  However, since the main view is now
green, the techs sometimes forget that there's anything to do and it
remains failed over until someone drills down and sees it.

This comes back around to something I regularly tell our staff:
"Xymon (and Big Brother before that) is not a task list. It is an alerting system. Using it as a task list is an abuse of the tool and reduces its ability to meet its fundamental business goal."

We have task-list and problem-tracking processes in place, so we don't need to use Xymon to meet this need. Your business needs and available tools may be different, but I urge you to consider finding a better tool than Xymon for managing task lists.

I was trying to get to a state where they would know that there's a
disabled/ignored/ack'd box from the front page to eliminate the "I
missed the email" excuses.

You could define a 'combo' test that alarms when fewer than two of the underlying tests are green. This 'combo' test could be rigged to propagate to the non-green screen while suppressing the propagation of the underlying tests.
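
As a rough illustration of that rule only (this is not Xymon code, and the function name, host names, and test names below are hypothetical), the "fewer than two green" logic could be sketched in Python like this. The real rule would be written in combo.cfg, whose syntax is not shown here.

```python
# Hypothetical sketch of the "alarm when fewer than two tests are green" rule.
# It only models the decision; it does not talk to Xymon in any way.

def combo_color(statuses, required_green=2):
    """Return 'green' if at least `required_green` tests are green, else 'red'.

    `statuses` maps an (assumed) "host.test" name to its current color string.
    """
    green_count = sum(1 for color in statuses.values() if color == "green")
    return "green" if green_count >= required_green else "red"


# Example: the first box of a pair has failed over, so its test is red.
pair = {"box1.active": "red", "box2.active": "green"}
print(combo_color(pair))  # prints "red"
```

With both boxes green the combined result is green. Any other state of the pair, including a disabled (blue) underlying test, leaves the combined result red.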

You could then rig the underlying tests to send automated email alerts to the folks who should fix the broken half of the pair. Look at combo.cfg and alerts.cfg for options to aggregate test results and time/escalate automated email alerts.

-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska