trying to get netapp filer data into larrd graphs

20 messages in this thread

list Tom Georgoulias · Wed, 16 Feb 2005 15:52:02 -0500 ·

I'm using the filerstats2bb script from deadcat.net to get data from my
Netapp filers and display it in hobbit.  This is what is displayed on
the status page:

conn, cpu, disk, info, inode, qtree, trends, user_quota

The data displayed is accurate, but the only graph that works is conn.
The rest are severely broken.  I checked the rrd directory and the only
files that existed were:

memory.real.rrd
memory.swap.rrd
tcp.conn.rrd

Not sure why memory* exist at all, but they do and are empty.

I would like to fix this, starting with CPU.  I'm hoping that what I
learn here can be used when I attempt to create custom graphs for
user_quota & qtree with the custom RRD feature described in 
hobbitd_larrd.  The rest of this message concerns only the load 
average/CPU graph problem, since I figure this ought to work without any 
modification.

For example, this is the contents of a status summary displayed on the
CPU status page:
==
  Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on
filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1

LOAD AVG on filerA.nandomedia.com is 1
==

Underneath the status message, there is a link for "hobbit graph la"
instead of a real graph.

There is data in the usual places in hobbit/data/hist/

cat hobbit/data/hist/filerA,nandomedia,com.cpu
<snip>
Sat Feb 12 16:32:47 2005 purple 1108243967 140478
Mon Feb 14 07:34:05 2005 green 1108384445 1050
Mon Feb 14 07:51:35 2005 purple 1108385495 304
Mon Feb 14 07:56:39 2005 green 1108385799

cat hobbit/data/hist/filerA.nandomedia.com | grep cpu
<snip>
cpu 1108128065 1108073455 54610 gr pu 1
cpu 1108243967 1108128065 115902 pu gr 2
cpu 1108384445 1108243967 140478 gr pu 1
cpu 1108385495 1108384445 1050 pu gr 2
cpu 1108385799 1108385495 304 gr pu 1

So why isn't the data being pulled from the status report messages and
put into an rrd file so larrd can use it?

Tom

list Michael Lowery · Wed, 16 Feb 2005 15:01:28 -0600 ·

On a similar note, I'm trying to get the ciscocpu.pl/.sh script to also
present the data to larrd.  But I don't really have a clue as to how to
accomplish this, so if someone feels that they would like to help, let
me know.  I will be equally grateful to those who can provide links
and/or hints that can help me get this done.

**I am willing to learn how to do it...**  But I'm not yet fully versed
in scripting and know nothing about perl.

Michael Lowery

list Tom Georgoulias · Wed, 16 Feb 2005 16:32:17 -0500 ·

▸ quoted from Michael Lowery

Lowery, Michael wrote:

On a similar note, I'm trying to get the ciscocpu.pl/.sh script to also
present the data to larrd.  But I don't really have a clue as to how to
accomplish this, so if someone feels that they would like to help, let
me know.  I will be equally grateful to those who can provide links
and/or hints that can help me get this done.

Did you look at the "custom rrd data" section in the hobbitd_larrd
manpage?  It may be easier to do than you think.  There is no perl
involved, assuming your ciscocpu script is already running on the
hobbit server and displaying the data (but not the graphs), and an
example shell script is provided.

Tom

list Henrik Størner · Wed, 16 Feb 2005 21:56:36 +0000 (UTC) ·

▸ quoted from Tom Georgoulias

In <user-f78d2739487b@xymon.invalid> Tom Georgoulias <user-e7ef09aae711@xymon.invalid> writes:

I'm using the filerstats2bb script from deadcat.net to get data from my
Netapp filers and display it in hobbit.  This is what is displayed on
the status page:

conn, cpu, disk, info, inode, qtree, trends, user_quota

The data displayed is accurate, but the only graph that works is conn.
The rest are severely broken.

That figures, since the "conn" test is run by Hobbit (bbtest-net) and
reports data in a form that Hobbit knows how to handle.

▸ quoted from Tom Georgoulias

I would like to fix this, starting with CPU.  I'm hoping that what I
learn here can be used when I attempt to create custom graphs for
user_quota & qtree with the custom RRD feature described in 
hobbitd_larrd.

You can use some of it, but there is a difference between fixing an
existing handler (hobbit already handles some "cpu" data) and adding
a new handler that hobbit does not know about: to fix the cpu handler,
you have to change the existing C code.

▸ quoted from Tom Georgoulias

The rest of this message concerns only the load 
average/CPU graph problem, since I figure this ought to work without any 
modification.

For example, this is the contents of a status summary displayed on the
CPU status page:
==
 Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on
filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1

The best way of working with the RRD data that Hobbit handles is to
snoop on the data that is sent from hobbitd to the hobbitd_larrd
program. You can do that by listening on the hobbit "status" channel:

    ~/server/bin/bbcmd sh
    hobbitd_channel --channel=status cat

When the "cpu" status arrives, you'll see something like this:

@@status#121308|1108589727.548324|172.16.10.2||voodoo.hswn.dk|cpu|1108591527|green||green|1106668421|0||0|
status voodoo,hswn,dk.cpu green Wed Feb 16 22:35:27 CET 2005 up: 23 days, 2 users, 171 procs, load=11

top - 22:35:27 up 23 days, 48 min,  2 users,  load average: 0.24, 0.11, 0.09
Tasks: 170 total,   1 running, 169 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.2% us,  1.5% sy,  0.1% ni, 91.2% id,  2.8% wa,  0.1% hi,  0.1% si
Mem:    646876k total,   635204k used,    11672k free,   194116k buffers
Swap:   787176k total,    23608k used,   763568k free,   123284k cached

[lots of lines from "top" snipped]

@@

The first line with "@@status..." is the beginning of a message - it
has some information that hobbitd picks out from all messages, like
the hostname, test-name, color etc. The important thing here is to see
that hobbitd does see that it is a "cpu" status - there's "|cpu|" in
the first line. That means hobbitd_larrd will send this message
through the "cpu" handler in hobbitd/larrd/do_la.c.
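
To check which handler a message will be routed to, the header line can be split on the "|" separator. A minimal sketch, assuming the field positions seen in the sample above (hostname in field 5, test name in field 6, color in field 8). Those positions are read off that one example rather than taken from the hobbitd documentation, and status_fields is a made-up helper name:

```shell
#!/bin/sh
# status_fields: print "hostname testname color" for every "@@status" header
# line read from stdin. Field numbers 5, 6 and 8 are inferred from the sample
# message in this thread, not from documentation.
status_fields() {
    awk -F'|' '/^@@status/ { print $5, $6, $8 }'
}

# Example with the header line quoted above
printf '%s\n' '@@status#121308|1108589727.548324|172.16.10.2||voodoo.hswn.dk|cpu|1108591527|green||green|1106668421|0||0|' | status_fields
```

Inside the bbcmd shell, the output of the hobbitd_channel command shown above could probably be piped through the same function to get a running list of which tests arrive for which hosts.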

So we need to look at what the do_la.c file does.

        eoln = strchr(msg, '\n'); if (eoln) *eoln = '\0';

This finds the first new-line character, and cuts off anything after
that. So essentially, it only looks at the first line of the status
message.


        p = strstr(msg, "up: ");
        if (p) {
              .... process the message ....

This searches the message (or rather, the first line of it) for the
string "up: ". I suspect this is where it breaks for your Netapp
reports, because they have "Uptime:", not "up: ".

▸ quoted from Tom Georgoulias

 Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on
filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1

Yes, computers are picky about such details ...

So the first fix is to change those lines above to handle a report
with the keyword "Uptime:" - e.g. like this:

        p = strstr(msg, "up: ");
        if (!p) p = strstr(msg, "Uptime:");
        if (p) {


Just one line added. But in this case, I think it makes all the
difference, because the rest of the report looks like it will be
handled just fine by the current code in do_la.c.

I've added this fix to my sources.
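
The effect of the extra check can be tried out without rebuilding anything. The following is a shell imitation of the idea, not the C code itself: it looks only at the first line, accepts either keyword, and pulls out the number after "load=". The function name extract_load and the way the number is extracted are assumptions made for this sketch; do_la.c does its own parsing after the keyword is found.

```shell
#!/bin/sh
# extract_load: read a status message on stdin and look only at its first line.
# If that line contains "up: " or "Uptime:", print the digits following "load=".
# Print nothing when neither keyword is present.
extract_load() {
    first_line=$(head -n 1)
    case "$first_line" in
        *"up: "*|*"Uptime:"*)
            printf '%s\n' "$first_line" | sed -n 's/.*load=\([0-9][0-9]*\).*/\1/p'
            ;;
    esac
}

# Example with the Netapp status line from this thread
printf '%s\n' 'Wed Feb 16 14:59:46 EST 2005 - CPU Utilization on filerA.nandomedia.com is OK. Uptime: 63 days, 06:57:54.29, load=1' | extract_load
```

With only the "up: " pattern in the case statement, the Netapp line would fall through and produce no value, which matches the missing la.rrd file.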


Not much info here about doing custom graphs, I'm afraid. But if you
look over the example in the hobbitd_larrd man-page, it should get you
started. If not, feel free to ask for more help.

Henrik


PS: If you want me to look at that Netapp disk-report that isn't being
graphed, just send me an example of what such a report looks like.

H.

list Michael Lowery · Wed, 16 Feb 2005 16:09:47 -0600 ·

Thanks for the pointer!  It makes some sense to me.  But I have the
ciscocpu external script reporting its output under the cpu column, will
that have any effect on what I'm trying to accomplish?  I mean if I
create  a custom rrd script for this, will it interfere with my
windows/unix hosts cpu graphs that are already being generated?  Do I
need to configure it to report under its own column?

Here is what the status message looks like from the ciscocpu script:

@@status#28646|1108591336.263764|10.10.10.22||WAN.jvbs-router.atg.com|cpu|1108593136|green||green|1108550383|0||0|
status WAN,jvbs-router,atg,com.cpu green Wed Feb 16 16:02:16 CST 2005
<br>CPU 5 min average: 4%


Thanks!

Michael Lowery
       


list Henrik Størner · Wed, 16 Feb 2005 23:18:28 +0100 ·

▸ quoted from Michael Lowery

On Wed, Feb 16, 2005 at 04:09:47PM -0600, Lowery, Michael wrote:

Thanks for the pointer!  It makes some sense to me.  But I have the
ciscocpu external script reporting its output under the cpu column, will
that have any effect on what I'm trying to accomplish?  I mean if I
create  a custom rrd script for this, will it interfere with my
windows/unix hosts cpu graphs that are already being generated?

The way hobbitd_larrd works, external scripts can only be invoked
for status messages that it does not know how to handle. Since "cpu"
is well-known and handled internally in hobbitd_larrd, a script
cannot be used to process "cpu" status messages.

Do I need to configure it to report under its own column?

If you want to do it with a script: Yes.

▸ quoted from Michael Lowery

Here is what the status message looks like from the ciscocpu script:

@@status#28646|1108591336.263764|10.10.10.22||WAN.jvbs-router.atg.com|cpu|1108593136|green||green|1108550383|0||0|
status WAN,jvbs-router,atg,com.cpu green Wed Feb 16 16:02:16 CST 2005
<br>CPU 5 min average: 4%

Wouldn't be hard to add support for this in the standard hobbitd_larrd
handler. Is that "CPU 5 min average" on a line by itself, or is it
just your mail program that added a newline between the "status ..."
and "<br>CPU 5 min average..." ?


Henrik

list Michael Lowery · Wed, 16 Feb 2005 16:22:40 -0600 ·

It is on the same line.

Thanks for the help!
Michael

list Henrik Størner · Wed, 16 Feb 2005 23:29:51 +0100 ·

▸ quoted from Michael Lowery

On Wed, Feb 16, 2005 at 11:18:28PM +0100, Henrik Stoerner wrote:

Here is what the status message looks like from the ciscocpu script:

@@status#28646|1108591336.263764|10.10.10.22||WAN.jvbs-router.atg.com|cpu|1108593136|green||green|1108550383|0||0|
status WAN,jvbs-router,atg,com.cpu green Wed Feb 16 16:02:16 CST 2005
<br>CPU 5 min average: 4%

Wouldn't be hard to add support for this in the standard hobbitd_larrd
handler.

I've added a couple of lines to the "cpu" handler in hobbitd_larrd,
and from your message I think it should make it work. I also added
the lines that should make the Netapp cpu-reports work.

I've attached a small patch for this (also available at
http://www.hswn.dk/beta/cpu-reports.patch); apply with GNU patch using
"cd hobbit-4.0-RC2; patch -p0 </tmp/cpu-reports.patch" then rebuild
with "make" and "make install" and restart Hobbit.  If graphs appear
after 10-15 minutes, it worked.

Let me know if it's OK.


Regards,
Henrik (off to bed now)
-------------- next part --------------
--- hobbitd/larrd/do_la.c	2005/02/06 08:49:02	1.7
+++ hobbitd/larrd/do_la.c	2005/02/16 22:22:33
@@ -8,7 +8,7 @@
 /*                                                                            */
 /*----------------------------------------------------------------------------*/
 
-static char la_rcsid[] = "$Id: do_la.c,v 1.7 2005/02/06 08:49:02 henrik Exp $";
+static char la_rcsid[] = "$Id: do_la.c,v 1.8 2005/02/16 22:04:46 henrik Exp henrik $";
 
 static char *la_params[]          = { "rrdcreate", rrdfn, "DS:la:GAUGE:600:0:U", rra1, rra2, rra3, rra4, NULL };
 
@@ -20,6 +20,8 @@
 
 	eoln = strchr(msg, '\n'); if (eoln) *eoln = '\0';
 	p = strstr(msg, "up: ");
+	if (!p) p = strstr(msg, "Uptime:");	/* Netapp filerstats2bb script */
+	if (!p) p = strstr(msg, "uptime:");
 	if (p) {
 		/* First line of cpu report, contains "up: 159 days, 1 users, 169 procs, load=21" */
 		p = strchr(p, ',');
@@ -57,6 +59,17 @@
 	}
 	if (eoln) *eoln = '\n';
 
+	if (!gotload) {
+		/* See if it's a report from the ciscocpu.pl script. It has load-average on a line by itself */
+		p = strstr(msg, "<br>CPU 5 min average:");
+		if (p) {
+			/* It reports in % cpu utilization */
+			p = strchr(p, ':');
+			load = atoi(p+1);
+			gotload = 1;
+		}
+	}
+
 	if (gotload) {
 		sprintf(rrdfn, "la.rrd");
 		sprintf(rrdvalues, "%d:%d", (int)tstamp, load);
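
To see what the ciscocpu part of the patch is meant to pick up, here is a rough shell equivalent of that one check. It is only an illustration: the function name cisco_load is invented, and unlike the patch it does not first test whether a load value was already found by the "up: " branch.

```shell
#!/bin/sh
# cisco_load: print the integer that follows "<br>CPU 5 min average:" in the
# message read from stdin. It works whether that text shares a line with
# "status ..." or sits on a line of its own. Only the first hit is printed.
cisco_load() {
    sed -n 's/.*<br>CPU 5 min average:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example with the ciscocpu status text from this thread
printf '%s\n' 'status WAN,jvbs-router,atg,com.cpu green Wed Feb 16 16:02:16 CST 2005 <br>CPU 5 min average: 4%' | cisco_load
```

As in the patch, the value is a CPU utilization percentage, so the resulting la graph shows percent rather than a unix load average.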

list Henrik Størner · Wed, 16 Feb 2005 23:31:00 +0100 ·

▸ quoted from Michael Lowery

On Wed, Feb 16, 2005 at 04:22:40PM -0600, Lowery, Michael wrote:

It is on the same line.

OK; the patch I just sent should work regardless of that.


Henrik

list Michael Lowery · Wed, 16 Feb 2005 17:17:34 -0600 ·

Ok, after following the instructions, I get this on the BBTEST page:

- Program crashed
Fatal signal caught!

Now all the network tests have gone purple... What did I do wrong?

Michael Lowery

list Andy France · Thu, 17 Feb 2005 13:24:55 +1300 ·


Hi all,

Before I move on to graphing some more complex datasets (Oracle :-), I've
been trying my hand at some simple graphs for our UPS.

I'm monitoring the UPS using a modified version of the apcsnmp222.tar.gz
script from deadcat (http://www.deadcat.net.au/viewfile.php?fileid=893).
I've modified the test names for "more uniqueness", updated some of the
status message texts, and converted it back to degrees Celsius.

I've written my --extra-script (based on the sample in the hobbitd_larrd
man page) and added this, as well as my columns for the --extra-tests, in
hobbitlaunch.cfg under the larrdstatus command.

My script is getting executed OK.  I've added some debug messages which get
appended to a log file and most things are running as expected.  However,
I'm getting no actual data, no rrd files and no graphs :-(

It seems the $FNAME parameter is not giving me what it should, and this
is causing my tests to echo the DS line, the rrd file name, but an empty
data string.  I've added an extra debug line in the script to do a
"/usr/bin/cp $FNAME /tmp/$FNAME.$TESTNAME.$$" but there are no files
appearing in /tmp.  This is what leads me to believe the file at $FNAME is
not there when I grep it.

I'm running 4.0-RC2 on Solaris 9 x86.  I started as a clean 4.0-b4 install
and have done a "make install" during the b5, b6, RC1, RC2 upgrades.  I
have also recently run a "make setup" for RC2 due to issues that most
people have run across!

Does this give anyone enough info to help?  I can start posting all my
config and scripts if required.

TIA,
Andy.


#####################################################################################

This email is intended for the person to whom it is addressed
only. If you are not the intended recipient, do not read, copy
or use the contents in any way. The opinions expressed may not
necessarily reflect those of ZESPRI Group of Companies ('ZESPRI').

While every effort has been made to verify the information
contained herein, ZESPRI does not make any representations 
as to the accuracy of the information or to the performance
of any data, information or the products mentioned herein.
ZESPRI will not accept liability for any losses, damage or
consequence, however, resulting directly or indirectly from
the use of this e-mail/attachments.
#####################################################################################

list Henrik Størner · Thu, 17 Feb 2005 07:55:57 +0100 ·

▸ quoted from Andy France

On Thu, Feb 17, 2005 at 01:24:55PM +1300, Andy France wrote:

I've written my --extra-script (based on the sample in the hobbitd_larrd
man page) and added this, as well as my columns for the --extra-tests, in
hobbitlaunch.cfg under the larrdstatus command.

Brave man - you're  the first to try out this mechanism for real:-)

▸ quoted from Andy France

My script is getting executed OK.  I've added some debug messages which get
appended to a log file and most things are running as expected.  However,
I'm getting no actual data, no rrd files and no graphs :-(

It seems the $FNAME parameter is not giving me what it should, and this
is causing my tests to echo the DS line, the rrd file name, but an empty
data string.

Some shells do have slightly different syntax than what the example
script in the man-page shows.  You could try just echoing the input
parameters to some file and see what they are. Like

   echo "Input 1: $1" > /tmp/params.txt
   echo "Input 2: $2" >>/tmp/params.txt
   echo "Input 3: $3" >>/tmp/params.txt

at the top of your script (before they get put into the $FNAME etc).

I'm running 4.0-RC2 on Solaris 9 x86.

Something in the back of my head says: Could you try putting the
input params inside curly brackets? Instead of

             HOSTNAME="$1"
             TESTNAME="$2"
             FNAME="$3"

try
             HOSTNAME="${1}"
             TESTNAME="${2}"
             FNAME="${3}"


Henrik

list Andy France · Thu, 17 Feb 2005 23:11:11 +1300 ·


Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote on 17/02/2005 19:55:57:

▸ quoted from Andy France

On Thu, Feb 17, 2005 at 01:24:55PM +1300, Andy France wrote:

I've written my --extra-script (based on the sample in the hobbitd_larrd
man page) and added this, as well as my columns for the --extra-tests, in
hobbitlaunch.cfg under the larrdstatus command.

Brave man - you're  the first to try out this mechanism for real:-)


Wheeeeeeeee!  Bring on the pain!  ;-)

▸ quoted from Andy France

My script is getting executed OK.  I've added some debug messages which
get appended to a log file and most things are running as expected.
However, I'm getting no actual data, no rrd files and no graphs :-(

It seems the $FNAME parameter is not giving me what it should, and this
is causing my tests to echo the DS line, the rrd file name, but an empty
data string.


Quite the opposite it turns out.  I added some extra debug lines to cat $3
- lo and behold, there was my expected output in all its glory.  My bad.

Once I *fixed my grep* I started getting data and rrd files (I was
grepping for text at the start of the line, but the report included a
leading space).

Of course, my original "cp $FNAME /tmp/$FNAME.$TESTNAME.$$" in debug wasn't
working because I didn't basename $FNAME first.  Duh.
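
For anyone else trying this, a sketch of the overall shape of such a script, going by what is described in this thread: three input parameters (hostname, test name, message file) and three kinds of output (DS definitions, the rrd file name, the data). The test name "ups", the "Temperature:" label, the DS parameters and the /tmp debug path are all invented for the example, and the body is wrapped in a function only so it can be tried by hand. The hobbitd_larrd man page remains the authoritative reference.

```shell
#!/bin/sh
# Sketch of an --extra-script body.
# $1 = hostname, $2 = test (column) name, $3 = file holding the status message.
process_report() {
    HOSTNAME="$1"
    TESTNAME="$2"
    FNAME="$3"

    # Debug copy: strip the directory part of $FNAME first, otherwise the
    # target path contains the full source path and the copy fails.
    cp "$FNAME" "/tmp/$(basename "$FNAME").$TESTNAME.$$"

    if [ "$TESTNAME" = "ups" ]; then
        # Allow leading blanks in front of the label, since the report
        # lines may be indented.
        TEMP=$(grep "^[[:space:]]*Temperature:" "$FNAME" | awk '{ print $2 }')

        echo "DS:temperature:GAUGE:600:U:U"   # dataset definition
        echo "ups.rrd"                        # rrd file name
        echo "$TEMP"                          # the data
    fi
}

# When installed as the real script, the last line would be:
# process_report "$1" "$2" "$3"
```

An anchored pattern such as "^Temperature:" finds nothing when the report line starts with a space, which leaves the data line empty exactly as described above.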


My remaining problem is that after collecting a few data points, I still
have no graphs on the test pages or the trends page.  I'm off to bed now
and will re-check tomorrow morning.  But if you know of anything else I may
have missed please let me know!  Should bb-hostsvc.sh pick it up
automagically?

▸ quoted from Henrik Størner

Some shells do have slightly different syntax than what the example
script in the man-page shows.  You could try just echoing the input
parameters to some file and see what they are. Like

echo "Input 1: $1" > /tmp/params.txt
echo "Input 2: $2" >>/tmp/params.txt
echo "Input 3: $3" >>/tmp/params.txt

at the top of your script (before they get put into the $FNAME etc).

I'm running 4.0-RC2 on Solaris 9 x86.

Something in the back of my head says: Could you try putting the
input params inside curly brackets ? Instead of

HOSTNAME="$1"
TESTNAME="$2"
FNAME="$3"

try
HOSTNAME="${1}"
TESTNAME="${2}"
FNAME="${3}"

Henrik

This wasn't an issue of course - plain old $1, $2 and $3 are fine.

Thanks for the help (and of course the awesome tool).

On to Oracle data... and maybe some c modules to replace my script.

Cheers,

Andy.



list Henrik Størner · Thu, 17 Feb 2005 12:30:45 +0100 ·

On Thu, Feb 17, 2005 at 11:11:11PM +1300, Andy France wrote:

[success story]

Congrats - it's nice to know this mechanism works. Doing it in my "lab
setup" is often very different than what goes on in real life.

▸ quoted from Andy France

My remaining problem is that after collecting a few data points, I still
have no graphs on the test pages or the trends page.  I'm off to bed now
and will re-check tomorrow morning.  But if you know of anything else I may
have missed please let me know!  Should bb-hostsvc.sh pick it up
automagically?

No, there are a couple of tweaks you must do for that to happen.

The LARRDS and GRAPHS settings in etc/hobbitserver.cfg must be told
about this new graph. 

If the column-name for your custom test is "xyz", then you just add
"xyz" to the LARRDS setting - when bb-hostsvc.cgi then builds the
status display, it will know to include the graph on the webpage.

If you want the graph also to appear on the "trends" page with all the
other graphs, you must also add it to the GRAPHS setting.

In either case, you of course have to set up hobbitgraph.cfg with the
RRDtool definitions for this new graph, so it knows how to generate
the graph from your data.
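
As a sketch, for a column named "xyz" the two settings would end up looking something like this. The other entries shown are placeholders rather than the shipped defaults, and the comma-separated layout is assumed from the stock hobbitserver.cfg; keep whatever is already in your file and only append the new name.

```shell
# etc/hobbitserver.cfg (excerpt). Existing entries here are placeholders.

# "xyz" appended: the graph is shown on the xyz status page
LARRDS="cpu=la,disk,xyz"

# "xyz" appended: the graph is also shown on the trends page
GRAPHS="la,disk,xyz"
```

The matching graph definition in hobbitgraph.cfg is not shown here, since it depends entirely on the datasets your script creates.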


Henrik

list Charles Jones · Thu, 17 Feb 2005 07:18:46 -0700 ·

Currently I am forwarding (via BBRELAY option) all of my BB status messages to my test Hobbit server.  This allows me to fully test hobbit, configure alerts etc, without disrupting my "production" bb server.

Once I make the full switch to hobbit, what is the best way to move as much of the historical information as possible over to hobbit?  I assume I could copy the rrd files over?  What about the history logs?  I mainly don't want to lose my year of rrd info.

Thanks,

-Charles

list Henrik Størner · Thu, 17 Feb 2005 14:30:25 +0000 (UTC) ·

▸ quoted from Charles Jones

In <user-d56f428e680c@xymon.invalid> Charles Jones <user-e86b4aeade4e@xymon.invalid> writes:

Currently I am forwarding (via BBRELAY option) all of my BB status 
messages to my test Hobbit server.  This allows me to fully test hobbit, 
configure alerts etc, without disrupting my "production" bb server.

Once I make the full switch to hobbit, what is the best way to move as
much of the historical information as possible over to hobbit?  I assume
I could copy the rrd files over?  What about the history logs?  I mainly
don't want to lose my year of rrd info.

From BB's bbvar/ directory, move the "hist" and "histlogs" directories
directly to Hobbit's data/ directory. That way you'll keep all of the
historical status logs.

For the RRD files, they must be moved and renamed. There's a
"moverrd.sh" script in the "hobbit-4.0*/hobbitd/" directory, which
will help you with moving or copying the RRD files from BB to Hobbit.
Note that this script requires that your Hobbit bb-hosts file has all
of the hosts listed.

You cannot move the netstat RRD files over, since the data layout is
different. That goes for the vmstat RRD files from Linux hosts as
well. The rest of the RRD files are compatible between BB/LARRD and
Hobbit.
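
A cautious way to do the history part is to copy instead of move, so the BB server stays usable until the switch is final. A minimal sketch, assuming both directory trees are reachable from one machine; the function name and the example paths are placeholders, and it does not touch the RRD files, which still need moverrd.sh:

```shell
#!/bin/sh
# copy_bb_history BBVAR HOBBITDATA: copy BB's "hist" and "histlogs"
# directories into Hobbit's data directory, preserving timestamps and
# permissions. If the target directories already exist, files are merged
# into them and files with the same name are overwritten.
copy_bb_history() {
    bbvar="$1"
    hobbitdata="$2"
    for d in hist histlogs; do
        if [ -d "$bbvar/$d" ]; then
            cp -Rp "$bbvar/$d" "$hobbitdata/"
        else
            echo "skipping $d: not found under $bbvar" >&2
        fi
    done
}

# Example (both paths are placeholders for your own installation):
# copy_bb_history /home/bb/bbvar /home/hobbit/data
```

Stopping Hobbit while the copy runs avoids mixing old and new entries in files that are being written to at the same time.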


Henrik

list Henrik Størner · Thu, 17 Feb 2005 14:33:39 +0000 (UTC) ·

▸ quoted from Henrik Størner

In <cv29q1$2tu$user-e356fad9864f@xymon.invalid> Henrik Storner <user-ce4a2c883f75@xymon.invalid> writes:

Once I make the full switch to hobbit, what is the best way to move as
much of the historical information as possible over to hobbit?

How to do this is also described in the "Big Brother to Hobbit" guide.
There's a link to it in your own Hobbit setup ("Help -> Installing
Hobbit -> Migration guide"), or you'll find it at
http://www.hswn.dk/hobbit/help/bb-to-hobbit.html


Henrik

list Tom Georgoulias · Thu, 17 Feb 2005 13:21:42 -0500 ·

▸ quoted from Henrik Størner

Henrik Storner wrote:

The best way of working with the RRD data that Hobbit handles is to
snoop on the data that is sent from hobbitd to the hobbitd_larrd
program. You can do that by listening on the hobbit "status" channel:

    ~/server/bin/bbcmd sh
    hobbitd_channel --channel=status cat

The first line with "@@status..." is the beginning of a message - it
has some information that hobbitd picks out from all messages, like
the hostname, test-name, color etc. The important thing here is to see
that hobbitd does see that it is a "cpu" status - there's "|cpu|" in
the first line. That means hobbitd_larrd will send this message
through the "cpu" handler in hobbitd/larrd/do_la.c.

This was extremely useful to learn.  Thanks for sharing it.

▸ quoted from Henrik Størner

So the first fix is to change those lines above to handle a report
with the keyword "Uptime:" - e.g. like this:

        p = strstr(msg, "up: ");
        if (!p) p = strstr(msg, "Uptime:");
        if (p) {


Just one line added. But in this case, I think it makes all the
difference, because the rest of the report looks like it will be
handled just fine by the current code in do_la.c.

I've added this fix to my sources.

I added the line to do_la.c and an rrd file is being created for la, but the data used in the graph was being converted or truncated in some manner on its way from the status report message to the rrd file. The "load average" collected by this script is actually the %CPU utilization, not a true unix load average. I thought that it may have been getting converted by the operation that converts load averages when DISPREALLOADAVG=FALSE, so I added a line to the perl script that adds 2 digits after a decimal when returning the CPU load avg to hobbit. Now a CPU utilization of 11% is displayed as "load=11.00", which seems to be working better.

So as it stands now, the trend charting works and I've found a new problem while pulling my hair out on this one: The CPU utilization data obtained by SNMP is not always accurate (netapp bug #145119). In my experience, it seems to be about 5-10% off. That's not something that I can fix, so I'm just going to have to live with it for now. Still didn't make troubleshooting this hobbit graphing any easier! ;)

Coincidence or not, it seems that after I applied the fix above and rebuilt hobbit, sometime later a hobbitd_larrd column appeared and stayed red then purple for a very long time. The error message was "fatal signal caught" or something like that. I ended up using the bb 127.0.0.1 "drop servername hobbitd_larrd" command just to get rid of it, with the intention of adding it back later once I was sure it wasn't a bogus message. I'm beginning to regret that, since in my haste I may have thrown out perfectly good data. Was that a new feature that was added in RC2? How would I get it back? Add hobbitd_larrd to bb-hosts?

▸ quoted from Henrik Størner

PS: If you want me to look at that Netapp disk-report that isn't being
graphed, just send me an example of what such a report looks like.

Sure thing.  See below, sorry about the line wrap.  After seeing what you looked at in the CPU case, I think I know what the problem could be.   The rest of my systems use the phrase "Disk partitions" while the filer uses "NetAPP Volumes".  I poked at the do_disk.c code but was clearly out of my league when it came to fixing it.  The column ordering is different too, although I can reorder it in the perl script to match the other linux style systems if needed.


  Thu Feb 17 08:12:36 EST 2005 - NetAPP Volumes on filerA.nandomedia.com OK

Volume:	Size:	Used:	Avail:	%Used
green /vol/test01/                        382G 92915122176      296G     22.63%
green /vol/test01/.snapshot                96G 27266535424       70G     26.56%
green /vol/test01/total                   478G 120181657600      366G     23.41%
green /vol/vol0/                           96G 193298432       95G      0.19%
green /vol/vol0/.snapshot                  24G 129028096       24G      0.50%
green /vol/vol0/total                     120G 322326528      119G      0.25%
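
To make the column layout of this report concrete, here is a rough sketch of how one volume line could be split into fields with sscanf(). It is not the do_disk.c code and does not reflect how hobbit actually parses disk reports; the struct volume_usage and the function parse_volume_line are names invented for this illustration, and the field order is assumed to be exactly the one shown above (color, volume, size, used bytes, available, %used).

```c
#include <stdio.h>

/* Hypothetical record for one line of the "NetAPP Volumes" report. */
struct volume_usage {
    char color[16];                 /* e.g. "green" */
    char name[128];                 /* e.g. "/vol/test01/" */
    unsigned long long used_bytes;  /* the "Used:" column, in bytes */
    double pct_used;                /* the "%Used" column, without the '%' */
};

/*
 * Parse one volume line. Returns 0 on success, -1 if the line does not
 * have the assumed layout (the "Volume: Size: ..." header line is
 * rejected because its fourth field is not a number).
 */
int parse_volume_line(const char *line, struct volume_usage *out)
{
    char size[32], avail[32];       /* parsed but not kept in this sketch */
    int n = sscanf(line, "%15s %127s %31s %llu %31s %lf%%",
                   out->color, out->name, size,
                   &out->used_bytes, avail, &out->pct_used);
    if (n != 6) return -1;
    if (out->name[0] != '/') return -1;  /* volume paths start with '/' */
    return 0;
}
```

With this layout the percentage is the last field, which is one way a parser could cope with a column order that differs from the usual "Disk partitions" report without the script having to reorder anything.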

list Andy France · Fri, 18 Feb 2005 09:26:31 +1300 ·


Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote on 18/02/2005 00:30:45:

▸ quoted from Andy France

On Thu, Feb 17, 2005 at 11:11:11PM +1300, Andy France wrote:

[success story]

Congrats - it's nice to know this mechanism works. Doing it in my "lab
setup" is often very different than what goes on in real life.

My remaining problem is that after collecting a few data points, I still
have no graphs on the test pages or the trends page.  I'm off to bed
now and will re-check tomorrow morning.  But if you know of anything
else I may have missed please let me know!  Should bb-hostsvc.sh pick
it up automagically?

No, there are a couple of tweaks you must do for that to happen.

The LARRDS and GRAPHS settings in etc/hobbitserver.cfg must be told
about this new graph.

If the column-name for your custom test is "xyz", then you just add
"xyz" to the LARRDS setting - when bb-hostsvc.cgi then builds the
status display, it will know to include the graph on the webpage.

If you want the graph also to appear on the "trends" page with all the
other graphs, you must also add it to the GRAPHS setting.

In either case, you of course have to set up hobbitgraph.cfg with the
RRDtool definitions for this new graph, so it knows how to generate
the graph from your data.

Awesome!

I've updated etc/hobbitserver.cfg for my new columns, tweaked my entries in
etc/hobbitgraph.cfg, and now have a funky graph that shows a 10 volt
variance on my UPS input over the last 10 hours!

Thanks again,

▸ quoted from Andy France

Andy.



list Henrik Størner · Thu, 17 Feb 2005 22:52:39 +0100 ·

▸ quoted from Tom Georgoulias

On Thu, Feb 17, 2005 at 01:21:42PM -0500, Tom Georgoulias wrote:

Coincidence or not, it seems that after I applied the fix above and rebuilt hobbit, sometime later a hobbitd_larrd column appeared and stayed red then purple for a very long time.  The error message was "fatal signal caught" or something like that.

Aha - all of the hobbitd programs have a built-in feature so that if
they do crash, they'll try to let you know it happened by sending off
a status-message about themselves, like the one you saw. Since
hobbitd_larrd doesn't normally send status messages, it will
eventually go purple.
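
The general pattern behind such a crash report can be sketched as follows. This is an illustration only and not the hobbitd implementation: the names crash_msg, on_fatal and install_crash_reporter are made up, the message text is a guess based on the "fatal signal caught" wording reported above, and the sketch writes the message to stderr instead of sending it to the hobbit daemon. The "status host.column color text" layout with commas in the hostname follows the usual BB status message convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Message is prepared up front so the handler only has to call write(). */
static char crash_msg[256];
static size_t crash_len;

/* Signal handler: emit the prepared message, then exit immediately. */
static void on_fatal(int signum)
{
    (void)signum;
    ssize_t ignored = write(STDERR_FILENO, crash_msg, crash_len);
    (void)ignored;
    _exit(1);
}

/*
 * Hypothetical setup routine: build a red status message for
 * <host>.<program> and install the handler for SIGSEGV.
 * Returns 0 on success, -1 on failure.
 */
int install_crash_reporter(const char *host, const char *program)
{
    int n = snprintf(crash_msg, sizeof crash_msg,
                     "status %s.%s red Fatal signal caught\n", host, program);
    if (n < 0 || (size_t)n >= sizeof crash_msg) return -1;
    crash_len = (size_t)n;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fatal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL) == 0 ? 0 : -1;
}
```

Because the program only reports once, when it crashes, nothing refreshes that status afterwards, which fits the red-then-purple behaviour described here.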

I'll look over the code - there's probably something that needs more
thorough error-checking to withstand all kinds of input.

▸ quoted from Tom Georgoulias

PS: If you want me to look at that Netapp disk-report that isn't being
graphed, just send me an example of what such a report looks like.

Sure thing.  See below, sorry about the line wrap.  After seeing what you looked at in the CPU case, I think I know what the problem could be.  The rest of my systems use the phrase "Disk partitions" while the filer uses "NetAPP Volumes".  I poked at the do_disk.c code but was clearly out of my league when it came to fixing it.

A bit of experience with the code does help :-) The disk handler is
one of the more complicated ones.

 The column ordering is different too, although I can reorder it in the perl script to match the other linux style systems if needed.

That won't be necessary.

I think I have something now that appears to work. I'll send you the
latest source-files directly to test, and then it will be in the next
release.


Regards,
Henrik
