Xymon Mailing List Archive

"Disable until change"

Japheth Cleaver
Tue, 3 Nov 2015 10:42:33 -0800
Message-Id: <user-17edf4c5c437@xymon.invalid>

I'd agree that disable is intended more as a human override of alerting for a host+service combo. The acknowledge functionality is more in line with what it seems you're looking for: "It's still yellow, so keep tracking it, but don't alert downstream unless something explicitly asks for it."

If the issue is with the nongreen page, I believe there should be a way to remove ack'd items from that page (but it might require running a second instance of xymongen just to spit out that page, potentially with a BOARDFILTER in there to limit it further).

"Disable until Change" would be possible, but we'd need to store the actual underlying color to compare the incoming report to, since disabling works by overriding the color that was sent and forcing it blue. "Unack on Change" works precisely because we still have a meaningful current color to compare an incoming message to.

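A minimal sketch of that comparison, in Python rather than anything taken from the Xymon source. The class and method names (DisableUntilChange, disable, displayed_color) are hypothetical and only illustrate the idea: remember the underlying color at disable time, keep showing blue while incoming reports match it, and lift the disable on the first report that differs.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

BLUE = "blue"  # the color shown for a disabled test


@dataclass
class DisableUntilChange:
    """Hypothetical tracker; not part of Xymon."""

    # Maps (host, test) to the color the test had when it was disabled.
    stored: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def disable(self, host: str, test: str, current_color: str) -> None:
        # Keep the real color so later reports have something to compare to.
        self.stored[(host, test)] = current_color

    def displayed_color(self, host: str, test: str, incoming_color: str) -> str:
        key = (host, test)
        underlying = self.stored.get(key)
        if underlying is None:
            # Not disabled: show the report as sent.
            return incoming_color
        if incoming_color != underlying:
            # Status changed since the disable: lift it and show the new color.
            del self.stored[key]
            return incoming_color
        # Unchanged since the disable: keep overriding to blue.
        return BLUE
```

With John's example, disabling foo/disk while it is yellow keeps it blue for further yellow reports, and the first red report comes through as red. In this sketch any change lifts the disable, including a change to green.
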
-jc


On 11/2/2015 4:21 PM, Novosielski, Ryan wrote:
I personally do not think using disable is a good idea for unplanned problems. For one, if you use the reporting features, you will be mixing planned and unplanned downtime together. Disable is really for times when you know exactly what is going on with the system, and alerting is not needed/someone is watching the system manually. That's my take on it anyway, and what I tell the people that work with me.

____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | user-46c89e614701@xymon.invalid - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'

On Nov 2, 2015, at 18:59, John Thurston <user-ce4d79d99bab@xymon.invalid> wrote:
We often use "disable until ok", but it was brought to my attention that
it has burned us from time to time. For example:

Host foo is yellow on disk. But that's ok. We're going to allocate some
new storage for it in the next service window. The test is marked
"disable until ok". But before the service window arrives, something
chews up a whole bunch of disk and the now-red test continues to be blue
because the test is not yet ok.

We sometimes use "acknowledge" for this function, but the non-green
screen can get kind of cluttered this way.

Does anyone have a good way to fake "disable while status remains
unchanged"?