SLA 99.9999 question?
list Mario Andre
Hi Friends, is there a way to report SLA with 4 decimal digits, e.g. "99.9999"? Thanks in advance, Mario.
list Paul Williamson
That equates to 31.5 seconds of downtime per year. Since I'm guessing you're doing a 5 minute test interval, this doesn't make much sense. Actually, 99.99% would be a good balance, since this equates to 52 minutes and 33 seconds of downtime per year. All this is moot if management wants to see 6 nines, which is funny since the standard is 5 nines (5 minutes 15 seconds of downtime per year).
Paul
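The downtime figures in this message follow from simple arithmetic. A minimal sketch, assuming a 365-day year (which matches the numbers quoted); the function names `downtime_per_year` and `as_min_sec` are only illustrative:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap, 365-day year


def downtime_per_year(availability_pct):
    """Allowed downtime in seconds per year for an availability given in percent."""
    return (100.0 - availability_pct) / 100.0 * SECONDS_PER_YEAR


def as_min_sec(seconds):
    """Split a duration in seconds into whole minutes and the remaining seconds."""
    minutes, rest = divmod(seconds, 60)
    return int(minutes), rest


for pct in (99.99, 99.999, 99.9999):
    minutes, rest = as_min_sec(downtime_per_year(pct))
    print(f"{pct}% -> {minutes} min {rest:.1f} s per year")
```

Each extra nine divides the allowed downtime by ten, which is why six nines leaves only about half a minute per year.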
user-82c7780661a4@xymon.invalid 01/05/06 8:22 AM >>>
Hi Friends, Is there a way to report SLA with 4 decimals digits - "99.9999" ? Thanks in advance, Mario.
list Mario Andre
Thanks Paul, you're absolutely right! But in the near future we may have to work with systems that need this kind of availability. Regards.
list Asif Iqbal
On Thu, Jan 05, 2006 at 10:22:41AM, mario andre wrote:
Hi Friends, Is there a way to report SLA with 4 decimals digits - "99.9999" ? Thanks in advance, Mario.
This should help: http://www.google.com/search?q=SLA+report+with+four+decimals
Basically, I think you need to look for "%.2f" in bb-replog.c and pagegen.c and replace each occurrence with "%.4f". Then you need to recompile the Hobbit server. Please verify before putting this into production; it has been a while since I did that.
Thanks
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"..there are two kinds of people: those who work and those who take the credit...try to be in the first group;...less competition there." - Indira Gandhi
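To see what the suggested format change does to the displayed number, here is a small sketch in Python, whose `%` operator follows the same printf conventions as C. The availability value is hypothetical, and this only illustrates the format strings; it does not confirm where or how often they appear in bb-replog.c or pagegen.c:

```python
availability = 99.9962  # hypothetical measured availability, in percent

two_decimals = "%.2f" % availability   # the format the message says to look for
four_decimals = "%.4f" % availability  # the suggested replacement

print(two_decimals)
print(four_decimals)
```

With two decimals the value is rounded up and the small amount of downtime disappears from the report; with four decimals it stays visible.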
list Rob MacGregor
On 06/01/06, Asif Iqbal <user-c8222abeff59@xymon.invalid> wrote:
http://www.google.com/search?q=SLA+report+with+four+decimals Basically I think you need to look for "%.2f" on bb-replog.c and pagegen.c and replace them with "%.4f". Then you need to recompile the hobbit server.
And keep in mind that with a 5 minute test interval you'll at best
check often enough for meaningful 2 digit accuracy. To meaningfully
get a 4 digit accuracy the longest interval you can have is 30
seconds, and even that isn't good enough.
--
Please keep list traffic on the list.
Rob MacGregor
Whoever fights monsters should see to it that in the process he
doesn't become a monster. Friedrich Nietzsche
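The link between test interval and meaningful digits can be estimated by asking how much one missed test moves the availability figure. A rough sketch, assuming 30-day and 365-day reporting periods (the thread does not say which period the report covers) and a hypothetical helper `resolution_pct`:

```python
def resolution_pct(interval_s, period_s):
    """Smallest availability step, in percent, that one test interval represents."""
    return interval_s / period_s * 100.0


DAY = 24 * 60 * 60
# Assumed reporting periods; the thread does not specify one.
periods = {"30 days": 30 * DAY, "365 days": 365 * DAY}

for interval_s in (300, 30):
    for label, period_s in periods.items():
        step = resolution_pct(interval_s, period_s)
        print(f"{interval_s:>3} s interval over {label}: one test = {step:.5f}%")
```

Under these assumptions, a 5 minute interval over a 30-day report moves the figure by about 0.01%, so anything past the second decimal carries no information, and a 30 second interval still moves it by about 0.001%, which is coarser than a fourth decimal.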
list Mario Andre
The great bbretest-net tool reduced the interval for the TCP tests, so in some cases 3 or 4 digit accuracy could now be shown. Mario.
list Scott Walters
On Fri, 6 Jan 2006, mario andre wrote:
The great bbretest-net tool reduced the interval for the TCP tests, so now, in some cases a 3 or 4 digit accuracy could be shown.
I agree that bbretest-net is great. I disagree that it increases the accuracy of availability measurements to 3 or 4 digits. The re-test only affects the test frequency once a failure has been detected. While tests pass, the interval is still the standard 5 minutes, so a service could be down for 2 minutes before the 're-test' kicks in. Those two minutes of 'missed' downtime are roughly 0.0004% of the year. With 5 minute intervals, only 4 significant digits should be used, or 'rounding errors' will be compounded. If it is because management wants it, fine, but mathematically you're making it up.
And from a business perspective, I've found availability statistics an extremely poor way of managing expectations for SLAs. They are barely good for measuring them. There are two 'million dollar questions':
1) When does the service need to be available?
2) If it is down, what is the longest outage you can tolerate?
* And be prepared to offer the cost differences between 1, 4, 12, and 24 hour recovery windows.
Customers will change their tune quickly when they see the costs associated with 'zero downtime' environments. Obviously, stock exchanges, 911 call centers, and eBay environments are prepared to pay . . . . and charge an extra gazillion dollars if you can never get a maintenance window. The answers to those two questions will make clear how to build the infrastructure (technical, staffing, etc.) requirements.
"Three kinds of lies: Lies, damned lies, and statistics." - Mark Twain
--
Scott Walters
-PacketPusher
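The point about 'missed' downtime can be illustrated with a toy model of a fixed-interval poller. This is a simplified sketch, not how Hobbit or bbretest-net actually schedules tests: it assumes tests run at exact multiples of the interval, charges one full interval per failed test, and uses a hypothetical outage window:

```python
def measured_downtime(outages, interval_s, period_s):
    """Downtime in seconds as seen by a poller that tests every interval_s seconds.

    outages is a list of (start, end) times in seconds. Each failed test is
    charged one full interval of downtime (a simplifying assumption).
    """
    failed_tests = sum(
        1
        for t in range(0, period_s, interval_s)
        if any(start <= t < end for start, end in outages)
    )
    return failed_tests * interval_s


HOUR = 60 * 60
outage = [(60, 180)]  # hypothetical 2 minute outage starting 1 minute after a test

print(measured_downtime(outage, 300, HOUR))  # 5 minute tests
print(measured_downtime(outage, 30, HOUR))   # 30 second tests

# Share of a 365-day year that the unseen 2 minutes represent, in percent.
missed_pct = 120 / (365 * 24 * HOUR) * 100
print(f"{missed_pct:.5f}% of a 365-day year")
```

In this model the 5 minute poller never sees the outage at all, because no test falls inside it, while the 30 second poller accounts for the full 2 minutes.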