Xymon Mailing List Archive

SLA 99.9999 question?

7 messages in this thread

list Mario Andre · Thu, 5 Jan 2006 10:22:41 -0300 ·
Hi Friends,

Is there a way to report SLA with 4 decimal digits - "99.9999" ?

Thanks in advance,

Mario.
list Paul Williamson · Thu, 05 Jan 2006 08:29:31 -0500 ·
That equates to 31.5 seconds of downtime per year.
Since I'm guessing you're doing a 5 minute test interval,  this doesn't make much sense.  Actually, 99.99% would be a good balance, since this equates to 52 minutes and 33 seconds of downtime per year.

All this is moot if management wants to see 6 nines, which is funny since the standard is 5 nines... (5:15 per year of d/t)
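The downtime figures above can be re-derived with a few lines of arithmetic. This is only a sketch: it assumes a 365-day year, and the name `allowed_downtime_seconds` is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Downtime per year permitted by an availability target given in percent."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


# Print the yearly downtime budget for four, five and six nines.
for target in (99.99, 99.999, 99.9999):
    seconds = allowed_downtime_seconds(target)
    minutes, rest = divmod(seconds, 60)
    print(f"{target}% -> {int(minutes)} min {rest:.1f} s per year")
```

Six nines leaves about 31.5 seconds per year, far less than a single 5 minute test interval.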

Paul
list Mario Andre · Thu, 5 Jan 2006 10:48:18 -0300 ·
Thanks Paul,


You're absolutely right!
But maybe in the near future we will have to work with systems that need
this kind of availability.


Regards.
list Asif Iqbal · Thu, 5 Jan 2006 22:58:40 -0800 ·
quoted from Mario Andre
On Thu, Jan 05, 2006 at 10:22:41AM, mario andre wrote:
Hi Friends,

Is there a way to report SLA with 4 decimals digits - "99.9999" ?

Thanks in advance,

Mario.
This should help

http://www.google.com/search?q=SLA+report+with+four+decimals

Basically I think you need to look for "%.2f" in bb-replog.c and
pagegen.c and replace each one with "%.4f". Then you need to recompile the hobbit server.
Please verify before putting this in production. It has been a while since I
did that.
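To see what that format change does to the printed number, here is a small sketch. It is not the Hobbit source: it uses Python's `%` operator, which follows the same printf-style conversion rules as C, and `format_availability` is a hypothetical helper.

```python
def format_availability(value: float, decimals: int) -> str:
    """Format a percentage with a printf-style precision, like "%.2f" or "%.4f"."""
    # "%.*f" takes the precision from the argument list, as in C's printf.
    return "%.*f" % (decimals, value)


availability = 99.9999

print(format_availability(availability, 2))  # two decimals: rounds up to 100.00
print(format_availability(availability, 4))  # four decimals: 99.9999
```

With two decimals a six nines figure is rounded up and shown as 100.00, so the extra precision only becomes visible after the change.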

Thanks

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"..there are two kinds of people: those who work and those who take the credit...try
 to be in the first group;...less competition there."  - Indira Gandhi
list Rob MacGregor · Fri, 6 Jan 2006 08:12:48 +0000 ·
quoted from Asif Iqbal
On 06/01/06, Asif Iqbal <user-c8222abeff59@xymon.invalid> wrote:
http://www.google.com/search?q=SLA+report+with+four+decimals

Basically I think you need to look for "%.2f" on bb-replog.c and
pagegen.c and replace them with "%.4f". Then you need to recompile the hobbit server.
And keep in mind that with a 5 minute test interval you'll at best
check often enough for meaningful 2 digit accuracy.  To meaningfully
get a 4 digit accuracy the longest interval you can have is 30
seconds, and even that isn't good enough.

--
                 Please keep list traffic on the list.
Rob MacGregor
      Whoever fights monsters should see to it that in the process he
        doesn't become a monster.                  Friedrich Nietzsche
list Mario Andre · Fri, 6 Jan 2006 10:04:56 -0300 ·
The great bbretest-net tool reduced the interval for the TCP tests, so now,
in some cases a 3 or 4 digit accuracy could be shown.


Mario.
list Scott Walters · Fri, 6 Jan 2006 10:30:47 -0500 (EST) ·
quoted from Mario Andre
On Fri, 6 Jan 2006, mario andre wrote:
The great bbretest-net tool reduced the interval for the TCP tests, so now,
in some cases a 3 or 4 digit accuracy could be shown.
I agree that bbretest-net is great.  I disagree that it increases the
accuracy of availability measurements to 3 or 4 digits.

The re-test only affects the frequency when a failure is detected.  When
tests pass, the interval is still the standard 5 minutes.

A service could be down for 2 minutes before the 're-test' kicks in.  That
two minutes of 'missed' downtime is roughly 0.0004% of the year.

With 5 minute intervals, only 4 significant digits should be used or
'rounding errors' will be compounded.
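As a rough illustration of the point about test intervals, the sketch below expresses one polling interval as a percentage of the reporting period. It assumes a 365-day year as the default period and that an outage shorter than one interval can go unnoticed; `resolution_percent` is a made-up name.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def resolution_percent(interval_seconds: float,
                       period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Percentage of the reporting period covered by one polling interval."""
    return 100 * interval_seconds / period_seconds


# One interval as a share of a year, for three polling intervals.
for label, interval in (("5 min", 300), ("2 min", 120), ("30 s", 30)):
    print(f"{label:>5}: {resolution_percent(interval):.6f}% of a year")
```

Over a shorter reporting period the resolution gets coarser: one 5 minute interval is about 0.0116% of a 30-day month.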

If it is because management wants it, fine, but mathematically you're
making it up.

And from a business perspective, I've found availability statistics an
extremely poor way of managing expectations for SLAs.  They are barely good
for measuring them.

There are two 'million dollar questions':

1)  When does the service need to be available?

2)  If it is down, what is the longest outage you can tolerate?

	* And be prepared to offer the cost differences between 1, 4, 12, and 24
hour recovery windows.  Customers will change their tune quickly when they
see the costs associated with 'zero downtime' environments.  Obviously,
stock exchanges, 911 call centers, and eBay environments are prepared to
pay . . . and charge an extra gazillion dollars if you can never get a
maintenance window.

The answers to those two questions will make the infrastructure
requirements (technical, staffing, etc.) clear.

"Three kinds of lies: Lies, damned lies, and statistics. "
- Mark Twain

-- 
Scott Walters
-PacketPusher