Xymon Mailing List Archive

Looking for suggestions on hobbit SLA reporting

3 messages in this thread

list Tom Kauffman · Wed, 30 Aug 2006 16:10:41 -0400 ·
I can't currently use the hobbit SLA reporting, as much as I would like
to. I've got two problems with the current setup, and can't figure a way
around them both.

First -- from a management perspective, our primary application (SAP
R/3) is either available for use or down. We have a stated objective of
some % uptime. Scheduled outages are charged just like unscheduled. So
if we take the system for a scheduled 12-hour upgrade, we get charged
with 12 hours against our total availability.

This means that 'blue' status is the same as 'red' -- the guys in the
field can't use it, it's not available, and management doesn't care that
it's a scheduled outage. (Yes, if we take ALL our allocated down time
AND run our normal off-line weekly backup, we will *not* meet our SLA by
about 10 hours -- based on current backup run time; and that's not to
mention the three unscheduled outages so far this year).

The second problem -- 'red' doesn't always mean "it's dead, Jim"; it can
also mean "somebody better look at this quick, BEFORE it dies". But we
don't want to page on yellow, even on this system.

Suggestions?

Henrik, is there a (relatively) easy way to convince the sla report
function to consider 'blue' and/or 'clear' as 'red' for reporting?

For this particular application, I have a potential ugly hack. We run an
in-house test that verifies the SAP central instance can access the
Oracle database; it reports 'green' or 'red'. It also checks to see if
the off-line backup is running, and reports 'blue' with no attempt at a
test. I could always take out the test for the backup and let the test
go red, altering the alert config to not page for this test during the
scheduled backup window.

But that leaves me altering the alert config to shut paging off during a
scheduled 'down' instead of using the 'disable' function . . .

TIA

Tom Kauffman
NIBCO, Inc

(Is there any way to add 'hot pink' as a status between 'yellow' and
'red'? :-)

CONFIDENTIALITY NOTICE:  This email and any attachments are for the 
exclusive and confidential use of the intended recipient.  If you are not
the intended recipient, please do not read, distribute or take action in 
reliance upon this message. If you have received this in error, please 
notify us immediately by return email and promptly delete this message 
and its attachments from your computer system. We do not waive  
attorney-client or work product privilege by the transmission of this
message.
list Henrik Størner · Wed, 30 Aug 2006 23:03:44 +0200 ·
quoted from Tom Kauffman
On Wed, Aug 30, 2006 at 04:10:41PM -0400, Kauffman, Tom wrote:
> First -- from a management perspective, our primary application (SAP
> R/3) is either available for use or down. We have a stated objective of
> some % uptime. Scheduled outages are charged just like unscheduled. So
> if we take the system for a scheduled 12-hour upgrade, we get charged
> with 12 hours against our total availability.
>
> This means that 'blue' status is the same as 'red' -- the guys in the
> field can't use it, it's not available, and management doesn't care that
> it's a scheduled outage.
[snip]
> Henrik, is there a (relatively) easy way to convince the sla report
> function to consider 'blue' and/or 'clear' as 'red' for reporting?

You'll have to change the Hobbit source to do that, but it is fairly
simple. In the lib/availability.c file around line 500 is where it
decides what colors go into the availability percentage. That line
reads:

  repinfo->reportavailability = 
     100.0 - repinfo->reportpct[COL_RED] - repinfo->reportpct[COL_CLEAR];

As you can see, "clear" and "red" go together. So you just need to add
"blue" to the list, so it becomes

  repinfo->reportavailability = 
     100.0 - repinfo->reportpct[COL_RED] - repinfo->reportpct[COL_CLEAR] - repinfo->reportpct[COL_BLUE];
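
To see what that one-line change does to the reported number, here is a minimal standalone sketch. It is not the Hobbit source: the enum, the function names and the sample percentages are assumptions made up for illustration, and only the two formulas mirror the lines quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Hobbit's COL_* indexes; the order here is an assumption. */
enum { COL_GREEN, COL_CLEAR, COL_BLUE, COL_PURPLE, COL_YELLOW, COL_RED, COL_COUNT };

/* Stock formula: only red and clear are charged against availability. */
double availability_stock(const double pct[COL_COUNT])
{
    assert(pct != NULL);
    return 100.0 - pct[COL_RED] - pct[COL_CLEAR];
}

/* Modified formula: blue (disabled / scheduled downtime) is charged too. */
double availability_with_blue(const double pct[COL_COUNT])
{
    assert(pct != NULL);
    return 100.0 - pct[COL_RED] - pct[COL_CLEAR] - pct[COL_BLUE];
}
```

With an invented period that was 97% green, 2% blue and 1% red, the stock formula reports 99% while the modified one reports 97%, which is the figure a "scheduled outages count" SLA would expect.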

quoted from Tom Kauffman
> The second problem -- 'red' doesn't always mean "it's dead, Jim"; it can
> also mean "somebody better look at this quick, BEFORE it dies". But we
> don't want to page on yellow, even on this system.
>
> Suggestions?

Well, that *is* what the yellow color was intended for. You do know that
you can configure the alerts to go out on yellow for just a single
system or test?
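
For reference, a yellow-level alert restricted to one host and one test might look roughly like the following in hobbit-alerts.cfg. This is a sketch only: the host name, test name and mail address are placeholders, and the keywords should be checked against the hobbit-alerts.cfg man page for the installed version.

```
# Placeholder host, test and recipient; alert on yellow and red for this one test only
HOST=sapci.example.com SERVICE=sapdb
        MAIL oncall@example.com COLOR=yellow,red
```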

quoted from Tom Kauffman
> (Is there any way to add 'hot pink' as a status between 'yellow' and
> 'red'? :-)

Or "orange" ... in theory, yes - but I am not at all sure how many
places in the Hobbit code would need changing to make sure they are
all right when there's suddenly one more color to handle.


Regards,
Henrik
list Tom Kauffman · Thu, 31 Aug 2006 09:42:01 -0400 ·
Thanks, Henrik. I may well change the availability report code.

Is there any way to make this a compile/config time option in the
future? (More accurately -- would you consider doing this?)

As for the additional color -- I was just yanking your chain :-) You've
done such a great job and added so much to hobbit in 4.2, and thanks to
you and the guys out there who had the time to wring it out, there have
been no real issues.

I just need to get off my backside and tweak some of the testing and
further define what we *really* need to care about after hours. I also
need to be able to tell the management team that certain columns reflect
actual SLA, while other columns indicate percentage of time not in a
critical state.
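
One way such an option could work, purely as a sketch: let the set of colors charged as downtime be a bitmask chosen at build or config time, so one report column can show the strict SLA and another the time not in a critical state. Nothing below exists in Hobbit; the enum order, macros and function name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Hobbit's COL_* indexes; the order here is an assumption. */
enum { COL_GREEN, COL_CLEAR, COL_BLUE, COL_PURPLE, COL_YELLOW, COL_RED, COL_COUNT };

#define COLBIT(c) (1u << (c))

/* Hypothetical presets: "not critical" vs. "strict SLA" (blue counts as down). */
#define DOWN_CRITICAL (COLBIT(COL_RED) | COLBIT(COL_CLEAR))
#define DOWN_SLA      (DOWN_CRITICAL | COLBIT(COL_BLUE))

/* Subtract the percentage of every color selected in down_mask from 100. */
double availability_for(const double pct[COL_COUNT], unsigned down_mask)
{
    double avail = 100.0;
    int c;

    assert(pct != NULL);
    for (c = 0; c < COL_COUNT; c++) {
        if (down_mask & COLBIT(c))
            avail -= pct[c];
    }
    return avail;
}
```

Calling it once with DOWN_CRITICAL and once with DOWN_SLA on the same percentages would give the two columns described above without touching the alerting setup.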

Thanks again for such a flexible package!

Tom