Tips to get custom availability metrics?

3 messages in this thread

list Shawn Maschino · Tue, 21 Feb 2006 10:37:51 -0500 ·

Hi guys, I was wondering if anyone could shed some light (or
share some code!) on processing the Hobbit data files to generate custom
availability reports for server/test pairs in Hobbit.  I'm sure there
is a way to do it, I'm just not smart enough to figure it out :-)

 
To better describe my situation - I have all my servers in
Hobbit, and then a database that has server details - like site,
criticality, ...  and I'm looking to be able to run reports like - "what
was my conn test uptime for anything at site XYZ", or "what was my http
uptime for our business critical apps".

 
I can easily query the database and load the server list into an array
as the source list to process, but when it comes to knowing how to
process the Hobbit data files to get the % uptime for that list of
servers I am lost.  I suppose in a perfect solution I'd be looking to be
able to populate an array with the list of hostnames, pass that and the
test name that I'm looking for to a routine, and get back a single
number as the % uptime for that list of hosts for that test.

 
Any pointers would be appreciated!

 
Shawn Maschino 
GE Plastics
(413)448-6375

user-518eb92a87d3@xymon.invalid

list Henrik Størner · Tue, 21 Feb 2006 17:16:36 +0100 ·

▸ quoted from Shawn Maschino

On Tue, Feb 21, 2006 at 10:37:51AM -0500, Maschino, Shawn (GE Indust, Plastics) wrote:

            To better describe my situation - I have all my servers in
Hobbit, and then a database that has server details - like site,
criticality, ...  and I'm looking to be able to run reports like - "what
was my conn test uptime for anything at site XYZ", or "what was my http
uptime for our business critical apps".  

I can easily query the database and load the server list into an array
as the source list to process, but when it comes to knowing how to
process the Hobbit data files to get the % uptime for that list of
servers I am lost.  I suppose in a perfect solution I'd be looking to be
able to populate an array with the list of hostnames, pass that and the
test name that I'm looking for to a routine, and get back a single
number as the % uptime for that list of hosts for that test.

Hobbit has some built-in availability reporting, which also lets you
output the result in a CSV format for importing into a spreadsheet.
But that only gives you availability for each of the status columns in
Hobbit - which is not very useful, when you want to combine multiple
statuses into a single availability measure.

The easiest way would be if you know what combinations you want to
measure in advance. E.g. if you have 4 webservers handling your business
app, and you run an http check on all 4, then you can make a combotest
out of those 4. For instance, if you add this to the bbcombotest.cfg file:

   app01.http=(web01.http+web02.http+web03.http+web04.http)>=2

then you'll get a status for the host "app01" where the "http" column
will be green if 2 or more of the web01-04 http columns are green.
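
To make the rule concrete, here is a small Python sketch of the same "at least 2 of 4" logic. It is only an illustration, not how Hobbit evaluates combotests internally: the function name combo_is_green, the `current` dictionary and the `up_colors` parameter are all made up for this example, and treating only green as "up" is an assumption taken from the description above.

```python
def combo_is_green(colors, hosts, column, minimum, up_colors=frozenset({"green"})):
    """Return True when at least `minimum` of the given host columns are 'up'.

    `colors` maps (host, column) to the current color string.
    Which colors count as 'up' is an assumption; here only green does.
    """
    up = sum(1 for host in hosts if colors.get((host, column)) in up_colors)
    return up >= minimum


# Hypothetical current statuses for the four webservers.
current = {
    ("web01", "http"): "green",
    ("web02", "http"): "red",
    ("web03", "http"): "green",
    ("web04", "http"): "red",
}
webservers = ["web01", "web02", "web03", "web04"]

# Mirrors app01.http=(web01.http+web02.http+web03.http+web04.http)>=2
app01_http = combo_is_green(current, webservers, "http", minimum=2)
```

With two of the four columns green, app01_http comes out true; raising the threshold to 3 would make it false.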

Then you can run the normal Hobbit availability report, and it will 
give you the percentage available for the combined test also, which
is what you want.


If that is not sufficient - e.g. you want to do ad-hoc queries that
combine hosts/tests you had not planned in advance - then it gets
complicated. The historical logs for each status are in the
~hobbit/data/hist/HOSTNAME.COLUMN files, but they only log when a status
changes color. In the Hobbit source, lib/availability.c is the file that
deals with calculating availability - you're free to have a look at how
it works. It does have a debugging utility built-in, but this won't
build in the current version (I rarely use it nowadays). If you pick up
the current snapshot you can see how it's used. For example, running

   ./lib/availability /var/lib/hobbit/hist/osiris.hobbitd \
      `date +%s --date="01 Jan 2006"` `date +%s`  

will call the routine that determines how much time the hobbitd
column has spent in each color from Jan 1st until now.
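
Since the history files only record color changes, the core of any such calculation is turning a list of changes into time spent per color inside a reporting window. The Python sketch below shows that step only. It is not the code from lib/availability.c, and it does not parse the history files: it assumes you have already extracted a sorted list of (epoch seconds, color) pairs, and the function name seconds_per_color is hypothetical.

```python
from collections import defaultdict


def seconds_per_color(changes, start, end):
    """Sum the seconds spent in each color within the window [start, end).

    `changes` is a list of (epoch_seconds, color) tuples sorted by time,
    one entry per color change (assumed input, already parsed).
    Time before the first recorded change is unknown and is not counted.
    """
    totals = defaultdict(int)
    for i, (since, color) in enumerate(changes):
        # A color lasts until the next change, or until `end` for the last entry.
        until = changes[i + 1][0] if i + 1 < len(changes) else end
        # Clip the interval to the reporting window.
        overlap = min(until, end) - max(since, start)
        if overlap > 0:
            totals[color] += overlap
    return dict(totals)


# Hypothetical history: green at t=0, red at t=600, green again at t=900.
changes = [(0, "green"), (600, "red"), (900, "green")]
whole_window = seconds_per_color(changes, 0, 1000)
late_window = seconds_per_color(changes, 500, 1000)
```

Here whole_window is 700 seconds green and 300 red, while late_window (starting at t=500) is 200 seconds green and 300 red.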


Regards,
Henrik

list Shawn Maschino · Wed, 22 Feb 2006 14:05:08 -0500 ·

Thanks Henrik, very helpful as always.  I was able to hack
together this little Perl script to do the job.  I still need to make
sure all the date and availability calculations are correct (since I
won't pretend to be good at Perl or math!) but it does what I needed so
I figured I'd share in case anyone else will find it useful.

If anyone sees problems or improves it let me know!  I'm sure it
would be much more useful with a web interface to choose the dates and
systems to report on, but this works for what I needed.
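
For readers without the attachment, the routine asked for at the start of the thread (a list of hosts and a test in, one % uptime out) might look roughly like the Python sketch below. This is not the attached Perl script. It assumes each host's history for the chosen test has already been read into a sorted list of (epoch seconds, color) pairs, that only green counts as up, and that the % is time-weighted over all hosts together; the names uptime_percent and `histories` are hypothetical.

```python
def uptime_percent(histories, hosts, start, end, up_colors=frozenset({"green"})):
    """Return one time-weighted % uptime for `hosts` within [start, end).

    `histories` maps hostname to a sorted list of (epoch_seconds, color)
    change events for a single test (assumed input, already parsed).
    Returns None when no history overlaps the window.
    """
    up = 0
    known = 0
    for host in hosts:
        changes = histories.get(host, [])
        for i, (since, color) in enumerate(changes):
            # Each color lasts until the next change, or the window end.
            until = changes[i + 1][0] if i + 1 < len(changes) else end
            overlap = min(until, end) - max(since, start)
            if overlap <= 0:
                continue
            known += overlap
            if color in up_colors:
                up += overlap
    return 100.0 * up / known if known else None


# Hypothetical "http" histories for two hosts at one site.
histories = {
    "web01": [(0, "green"), (600, "red"), (900, "green")],
    "web02": [(0, "green")],
}
site_uptime = uptime_percent(histories, ["web01", "web02"], 0, 1000)
```

In this example web01 is up 700 of 1000 seconds and web02 all 1000, so site_uptime is 85.0. Averaging per-host percentages instead would be an equally reasonable choice and gives the same number only when every host has history for the whole window.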

Attachments (1)

attachment.obj application/octet-stream · 2.1 KB
