Xymon Mailing List Archive

urgent rrd help needed - im desperate!

list Jeff Newman
Wed, 22 Mar 2006 01:35:55 -0600
Message-Id: <user-0791d1348231@xymon.invalid>

I got things working, but now I am stuck on a slightly different problem.

host A: has cpu0,1,2,3
host B: has cpu0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Both have exactly the same RRDs, the same service names, and
everything else. The problem is:

host A: graphs each CPU just fine (I have meter::1 defined, so I get
one graph per CPU)

host B: graphs CPUs 0 and 1, then shows broken boxes (i.e. no graphs)
for 2 and 3, and doesn't even attempt anything for 4-15. All RRDs are
being updated correctly.

It's very odd, because everything is set up EXACTLY the same way. I
don't see why one would work and the other not. I'm guessing there may
be something going on with RRDIDX, but I don't know. Anyone have any
thoughts?
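One thing that may be worth checking, offered only as a hypothesis: if hobbitgraph hands out @RRDIDX@ values (0, 1, 2, ...) in the sorted order of the matching file names (an assumption, not confirmed anywhere in this thread), then the index and the CPU number stop agreeing once there are ten or more CPUs, because the names sort as strings. This sketch numbers a list of names the way a plain byte-order string sort orders them; rrd_order is a hypothetical helper name and a POSIX shell is assumed.

```shell
#!/bin/sh
# Number file names 0, 1, 2, ... in plain byte order, i.e. the order
# a string sort (not a numeric sort) produces. Reads names on stdin.
# ASSUMPTION: @RRDIDX@ follows this sorted order; not confirmed here.
rrd_order() {
    LC_ALL=C sort | awk '{ print NR - 1, $0 }'
}

# Hypothetical example: the 16 per-CPU files on host B.
n=0
while [ "$n" -le 15 ]; do
    echo "sar,$n.rrd"
    n=$((n + 1))
done | rrd_order
```

For the 16 names it prints index 2 next to sar,10.rrd and index 8 next to sar,2.rrd, while for host A's four files string order and numeric order coincide. If the DEF lines build DS names such as cpu@RRDIDX@pcntusr, index 2 would ask sar,10.rrd for cpu2pcntusr, which that file does not contain. On real files, something like `ls /usr/local/hobbit/data/rrd/HOSTNAME | grep '^sar,' | rrd_order` (HOSTNAME is a placeholder) would show the order.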


On 3/20/06, Hubbard, Greg L <user-d970b5e56ec9@xymon.invalid> wrote:
I would study the code in hobbitgraph for graphing disk partition sizes,
or for graphing usage on multi-CPU systems.  This might help you with
the "graph whatever you find" problem.  I haven't worked with this
myself, so I am no help...

GLH

-----Original Message-----
From: Jeff Newman [mailto:user-e96740e73ca8@xymon.invalid]
Sent: Monday, March 20, 2006 12:58 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] urgent rrd help needed - im desperate!

Rather than focus on sar (I'm going to have this problem with several
tests), let me approach this question slightly differently.

I have host A and host B. They both have a program that looks at queues.
There is a requirement on host A to only look at queue A, while on host B
they want to see queues A and B.

Host A sends a status message (with a test called qmeter) that has:

aqueue:#

Host B sends a status message (with a test called qmeter) that has:

aqueue:#
bqueue:#
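A minimal client-side sketch of such a message, assuming a POSIX shell and a hypothetical send_qmeter helper that takes NAME=VALUE arguments. Following the tip of defining $BB as "echo" while testing, BB defaults to echo here, so the message is printed rather than sent; BBDISP and MACHINE are placeholders.

```shell
#!/bin/sh
# Client-side "pitcher" sketch for the qmeter test (POSIX shell assumed).
# BB defaults to "echo" so the message is printed instead of sent.
# BBDISP and MACHINE are placeholders; send_qmeter is a hypothetical name.
BB=${BB:-echo}
BBDISP=${BBDISP:-127.0.0.1}
MACHINE=${MACHINE:-hostA}

# Usage: send_qmeter NAME=VALUE [NAME=VALUE ...]
send_qmeter() {
    msg="status $MACHINE.qmeter green $(date)"
    for pair in "$@"; do
        name=${pair%%=*}    # text before the first "="
        value=${pair#*=}    # text after the first "="
        # Append one "name:value" line, the format used in this thread.
        msg="$msg
$name:$value"
    done
    $BB "$BBDISP" "$msg"
}
```

Host A would call `send_qmeter aqueue=5` and host B `send_qmeter aqueue=5 bqueue=12`; only the number of data lines differs.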

Each host has an RRD with its own data sets (i.e. a DS for just
aqueue, or DSs for both aqueue and bqueue). That all works great. The
problem is the graphing. If the graph definition looks like this:

[qmeter]
       TITLE 1 Second qmeter
       YAXIS Avg. Messages per second
       DEF:aqueue=qmeter.rrd:aqueue:AVERAGE
       DEF:bqueue=qmeter.rrd:bqueue:AVERAGE
       LINE1:aqueue#CC3333:a queue
       LINE1:bqueue#FF0000:b queue
       COMMENT:\n
       GPRINT:aqueue:LAST:a Queue \: %5.1lf%s (cur)
       GPRINT:aqueue:MAX: \: %5.1lf%s (max)
       GPRINT:aqueue:MIN: \: %5.1lf%s (min)
       GPRINT:aqueue:AVERAGE: \: %5.1lf%s (avg)\n
       GPRINT:bqueue:LAST:b Queue \: %5.1lf%s (cur)
       GPRINT:bqueue:MAX: \: %5.1lf%s (max)
       GPRINT:bqueue:MIN: \: %5.1lf%s (min)
       GPRINT:bqueue:AVERAGE: \: %5.1lf%s (avg)\n

This obviously works great for host B, but on host A, where only the
A queue is defined in the RRD, it doesn't work: it won't even draw the
graph.

Using RRDIDX won't work because it relies on numbers. So I am stuck on
how to make this graph properly, and using a server-side script is all I
can think of. Would this be the correct approach, or is there a trick in
hobbitgraph.cfg that I don't know about?
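If you do go the script route, one piece it would need is finding out which data sources an RRD really has. `rrdtool info FILE` prints lines such as `ds[aqueue].type = "GAUGE"`, so a sketch along these lines (hypothetical helpers ds_names and graph_args; POSIX shell and awk assumed) could emit DEF/LINE1 arguments only for the DS names that are present. This bypasses hobbitgraph.cfg entirely; how a wrapper passes the arguments on to `rrdtool graph` is left open.

```shell
#!/bin/sh
# Read "rrdtool info" output on stdin; print one DS name per line.
ds_names() {
    sed -n 's/^ds\[\([^]]*\)]\.type = .*/\1/p'
}

# $1 = RRD file; DS names on stdin. Prints one DEF and one LINE1
# argument per DS, cycling through a few colors.
graph_args() {
    awk -v rrd="$1" '
        BEGIN { n = split("CC3333 FF0000 3333CC 33CC33", c, " ") }
        {
            printf "DEF:%s=%s:%s:AVERAGE\n", $1, rrd, $1
            printf "LINE1:%s#%s:%s\n", $1, c[(NR - 1) % n + 1], $1
        }'
}
```

Usage would be something like `rrdtool info qmeter.rrd | ds_names | graph_args qmeter.rrd`; on host A that yields only the aqueue lines, on host B both queues.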

-Jeff


On 3/20/06, Hubbard, Greg L <user-d970b5e56ec9@xymon.invalid> wrote:
Is there one sar graph per host, or multiple?

I agree with your assessment -- creating explicit custom graphs is
easier than trying to make ncv_whatever work.

Beware of the many moving parts:

A) the "pitcher" -- a custom script run by the client to send over a
status page which the server will associate with a column.  Do
yourself a favor and send the data in a format that is uniquely
recognized at the server end.  You can see what is sent by looking at
the Web page associated with the column for a node.  Henrik also
suggests defining $BB as "echo" in the early stages, so you can
see what is being sent by looking through the client log.

B) the "catcher" -- a custom script run by the server to process the
data in a status page for a custom test.  You are only allowed ONE (1)
catcher script per Hobbit server, so it must be equipped to handle all
custom tests.  Fortunately, the test/column name is a parameter for
this script so you can use a switch statement to branch to the right
code for the incoming data.  Mine is handling about 7 custom tests
right now.

C) the RRD format that you will use -- even Tobi's documentation is
hazy on whether it is better to use one file for several variables, or
one variable in each file, or what.  Experimentation, trial, and error
cannot be avoided here unless you are already the RRD guru.

D) the graph definitions in hobbitgraph.cfg -- more opportunity to
learn RRD!

E) changing the right settings in the hobbitserver.cfg file for the
TEST2RRD and GRAPHS variables.

F) Even so -- there are some limits.  First, you can only have one
graph on the status page for each custom test.  Other graphs can be
included on the trends page, but you will only get the one associated
with each custom test -- unless you set the TREND: flag in bb-hosts --
which I haven't fully explored.
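Item B describes a single catcher script that branches on the test name. A minimal sketch of that branching, assuming the script is called with the host name, the test name and the path of a file holding the status message as its three arguments (the thread only says the test name is one of the parameters; check the hobbitd_rrd documentation for the real order and for the output format it expects). parse_pairs and handle_test are hypothetical names, and printing "name value" pairs stands in for whatever the real output must be.

```shell
#!/bin/sh
# Catcher sketch. ASSUMPTION: called as
#   catcher.sh HOSTNAME TESTNAME MESSAGEFILE

# Print "name value" for each "name : value" or "name:value" line.
parse_pairs() {
    sed -n 's/^[[:space:]]*\([A-Za-z][A-Za-z0-9_]*\)[[:space:]]*:[[:space:]]*\([0-9][0-9.]*\)[[:space:]]*$/\1 \2/p' "$1"
}

handle_test() {
    hostname=$1    # unused here; available for per-host branching
    testname=$2
    msgfile=$3
    case "$testname" in
        qmeter|sar)
            parse_pairs "$msgfile"
            ;;
        *)
            # Not one of our custom tests: emit nothing.
            ;;
    esac
}

if [ $# -eq 3 ]; then
    handle_test "$@"
fi
```

Each new custom test then becomes one more branch in the case statement.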

Hope all this helps.  Even though the Hobbit documents provide a lot
of pointers, there are still many places where the innocent can go
wrong -- but debugging it will teach you a lot about the Hobbit
innards -- which helps you appreciate the hard work Henrik and others
have put into this tool...!

GLH


-----Original Message-----
From: Jeff Newman [mailto:user-e96740e73ca8@xymon.invalid]
Sent: Monday, March 20, 2006 11:45 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] urgent rrd help needed - im desperate!

Ya, I tried for about 4-6 hours to get it to work, even tried
following the steps that someone previously sent in to the hobbit
list. I could get it to send data, but I couldn't get the graphing
portion working.

As for the problem I posted, I think I figured out what I need to do.
I need to have a parsing script on the server end to parse the data
into separate RRDs. That way, I can send from the client side a test
with the name "sar" (then I will only have 1 sar column) and let the
server side create the sar#.rrd files. Hopefully I don't have problems
on the hobbitgraph.cfg end.
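The server-side split described here could start from something like the following sketch (POSIX awk assumed; split_sar is a hypothetical name). It reads one combined message with lines such as "cpu12pcntusr : 5" and prints, for each value, the per-CPU RRD file it would belong to. The sar,N.rrd naming follows the comma form used elsewhere in the thread; actually writing the RRDs is left out.

```shell
#!/bin/sh
# Split a combined "sar" status message (file args or stdin) into
# "RRDFILE METRIC VALUE" lines, one per data line.
split_sar() {
    awk '
        # Match lines such as "cpu12pcntusr : 5".
        match($1, /^cpu[0-9]+/) && $2 == ":" {
            cpu = substr($1, 4, RLENGTH - 3)    # digits after "cpu"
            metric = substr($1, RLENGTH + 1)    # e.g. "pcntusr"
            print "sar," cpu ".rrd", metric, $3
        }
    ' "$@"
}
```

The status header line ("green Fri Mar 17 ...") does not match and is skipped.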

-Jeff


On 3/17/06, Galen Johnson <user-d2ff723b6cb6@xymon.invalid> wrote:
Jeff Newman wrote:
OK, I've been working on this for 6+ hours and am totally stuck.
Here is the script on the client:


====================================

#!/bin/sh

BB=/usr/local/hobbit/client/bin/bb
BBDISP=xxx.xxx.xxx.xxx
MACHINE=xxxxxx

# Keep only the per-CPU lines of "sar -P ALL" (dropping header and "-"
# lines), cut off the 8-character time column, and save to a temp file.
sar -P ALL 1 1 | grep -E "^[0-9]|^(  *)" | grep -v \- | grep -v cpu |
    cut -c 9- > /tmp/hobbit_sar.$$ 2>&1
mv /tmp/hobbit_sar.$$ /tmp/hobbit_sar.tmp </dev/null >/dev/null

# Each line is: CPU %usr %sys %wio %idle
while read aline; do
    CPUNUM=`echo $aline | awk '{print $1}'`
    PUSR=`echo $aline | awk '{print $2}'`
    PSYS=`echo $aline | awk '{print $3}'`
    PWIO=`echo $aline | awk '{print $4}'`
    PIDL=`echo $aline | awk '{print $5}'`

    echo "cpu"$CPUNUM"pcntusr : $PUSR" >> /tmp/hobbit_sar"$CPUNUM".msg
    echo "cpu"$CPUNUM"pcntsys : $PSYS" >> /tmp/hobbit_sar"$CPUNUM".msg
    echo "cpu"$CPUNUM"pcntwio : $PWIO" >> /tmp/hobbit_sar"$CPUNUM".msg
    echo "cpu"$CPUNUM"pcntidl : $PIDL" >> /tmp/hobbit_sar"$CPUNUM".msg

    # One status message (and therefore one column, "sar,N") per CPU
    $BB $BBDISP "status $MACHINE.sar,"$CPUNUM" green `date` `cat /tmp/hobbit_sar"$CPUNUM".msg` "
    rm /tmp/hobbit_sar"$CPUNUM".msg
done < /tmp/hobbit_sar.tmp
rm /tmp/hobbit_sar.tmp

========================================

It sends the data just fine (output from sh -x):
+ BB=/usr/local/hobbit/client/bin/bb
+ BBDISP=167.76.113.220
+ MACHINE=stlfan3
+ sar -P ALL 1 1
+ grep -E ^[0-9]|^(  *)
+ grep -v -
+ grep -v cpu
+ cut -c 9-
+ 1> /tmp/hobbit_sar.20750 2>& 1
+ mv /tmp/hobbit_sar.20750 /tmp/hobbit_sar.tmp 0< /dev/null 1>
+ /dev/null 0< /tmp/hobbit_sar.tmp read aline
+ + awk {print $1}
+ echo 0 2 8 1 89
CPUNUM=0
+ + awk {print $2}
+ echo 0 2 8 1 89
PUSR=2
+ + awk {print $3}
+ echo 0 2 8 1 89
PSYS=8
+ + awk {print $4}
+ echo 0 2 8 1 89
PWIO=1
+ + awk {print $5}
+ echo 0 2 8 1 89
PIDL=89
+ echo cpu0pcntusr : 2
+ 1>> /tmp/hobbit_sar0.msg
+ echo cpu0pcntsys : 8
+ 1>> /tmp/hobbit_sar0.msg
+ echo cpu0pcntwio : 1
+ 1>> /tmp/hobbit_sar0.msg
+ echo cpu0pcntidl : 89
+ 1>> /tmp/hobbit_sar0.msg
+ date
+ cat /tmp/hobbit_sar0.msg

+ /usr/local/hobbit/client/bin/bb --debug 167.76.113.220 status stlfan3.sar,0 green Fri Mar 17 11:58:59 EST 2006

cpu0pcntusr : 2
cpu0pcntsys : 8
cpu0pcntwio : 1
cpu0pcntidl : 89

2006-03-17 11:58:59 Transport setup is:
2006-03-17 11:58:59 bbdportnumber = 1984
2006-03-17 11:58:59 bbdispproxyhost = NONE
2006-03-17 11:58:59 bbdispproxyport = 0
2006-03-17 11:58:59 Recipient listed as 'xxx.xx.xxx.xxx'
2006-03-17 11:58:59 Standard BB protocol on port 1984
2006-03-17 11:58:59 Will connect to address xxx.xx.xxx.xxx port 1984
2006-03-17 11:58:59 Connect status is 0
2006-03-17 11:58:59 Sent 121 bytes
2006-03-17 11:58:59 Closing connection
+ rm /tmp/hobbit_sar0.msg
+ read aline

<and so on, incrementing cpu numbers as expected.>

On the Hobbit server, I want this to work like "disk", where there
are multiple file systems under one disk column. I manually created
the RRDs (again, for a custom time step):

-rw-r--r--   1 hobbit hobbit 22121176 Mar 17 10:31 sar,0.rrd
-rw-r--r--   1 hobbit hobbit 22121176 Mar 17 10:31 sar,1.rrd
-rw-r--r--   1 hobbit hobbit 22121176 Mar 17 10:31 sar,2.rrd
-rw-r--r--   1 hobbit hobbit 22121176 Mar 17 10:31 sar,3.rrd
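For reference, a sketch that prints (but does not run) an `rrdtool create` command for one per-CPU file with the DS names shown in the dump below. The step actually used is not given in the thread, so STEP and ROWS are placeholders; the heartbeat of twice the step and the single AVERAGE RRA are assumptions for illustration, and create_cmd is a hypothetical name.

```shell
#!/bin/sh
# Print an "rrdtool create" command for sar,N.rrd with DS names
# cpuNpcntusr, cpuNpcntsys, cpuNpcntwio, cpuNpcntidl (0-100 GAUGEs).
# Usage: create_cmd CPU STEP ROWS   (STEP and ROWS are placeholders)
create_cmd() {
    cpu=$1
    step=$2
    rows=$3
    hb=$((step * 2))    # ASSUMPTION: heartbeat = twice the step
    cmd="rrdtool create sar,$cpu.rrd --step $step"
    for m in pcntusr pcntsys pcntwio pcntidl; do
        cmd="$cmd DS:cpu$cpu$m:GAUGE:$hb:0:100"
    done
    # ASSUMPTION: one AVERAGE RRA keeping ROWS primary data points
    echo "$cmd RRA:AVERAGE:0.5:1:$rows"
}
```

Printing rather than running keeps it safe to review the exact DS names before creating anything.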

The DS names in the rrd dump look fine, for example:
      <ds>
               <name> cpu0pcntusr </name>
               <type> GAUGE </type>

Note that none of this works if the files are named just "sar0.rrd",
"sar1.rrd", etc., without the commas.

Unfortunately, on the web page it gives me 3 columns: sar,0, sar,1 and
sar,2. That is nitpicky, but it would be great to have just one "sar"
column with the others under it (like the disk problem). Here is the
REAL problem.
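One way to get a single "sar" column, in line with the plan of sending one test named "sar" and splitting it on the server, is to put every CPU's values into one status message, since the column name comes from the test name after the host. A sketch, assuming a POSIX shell and the same trimmed input file the script above writes; build_sar_msg and send_sar are hypothetical names, BB defaults to echo for testing, and BBDISP and MACHINE are placeholders. The server side would still have to split the values into per-CPU RRDs.

```shell
#!/bin/sh
# One "sar" status for all CPUs instead of one status per CPU.
# Input file: one "CPU %usr %sys %wio %idle" line per CPU, as in
# /tmp/hobbit_sar.tmp from the script above.
BB=${BB:-echo}
BBDISP=${BBDISP:-127.0.0.1}
MACHINE=${MACHINE:-hostB}

# Turn each input line into the four "cpuNpcntXXX : value" lines.
build_sar_msg() {
    while read -r cpu usr sys wio idl; do
        [ -n "$cpu" ] || continue    # skip blank lines
        echo "cpu${cpu}pcntusr : $usr"
        echo "cpu${cpu}pcntsys : $sys"
        echo "cpu${cpu}pcntwio : $wio"
        echo "cpu${cpu}pcntidl : $idl"
    done
}

# Usage: send_sar /tmp/hobbit_sar.tmp
send_sar() {
    $BB "$BBDISP" "status $MACHINE.sar green $(date)
$(build_sar_msg < "$1")"
}
```

This also drops the per-CPU temp files and the five awk calls per line, since `read` splits the fields directly.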

Looking at the sar,0 button, for example, I see the data update
there. HOWEVER, NONE of the /usr/local/hobbit/data/rrd/xxxx/sar,#.rrd
files ever get updated!!!
In addition, there isn't even a link to a graph on the page.

No errors in /var/log/hobbit/rrd*, or any others that I have looked
at on the client or server. Here are the .cfg files. I have tried
many variations, these are just how they are now:

hobbitserver.cfg - here are the lines that I have "sar" in:

TEST2RRD="cpu=la,disk,inode,qtree,memory,$PINGCOLUMN=tcp,http=tcp,dns=tcp,dig=tcp,time=ntpstat,vmstat,iostat,netstat,temperature,apache,bind,sendmail,mailq,nmailq=mailq,sar,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,HiFlowNet="ncv",sock="ncv",qmeter="ncv",rtt="ncv"
Note: I have tried sar="ncv", sar, sar0, sar1, etc. Maybe I haven't
tried the right variation :-(
GRAPHS="la,disk,inode,qtree,memory,users,vmstat,iostat,tcp.http,tcp,netstat,mrtg::1,temperature,ntpstat,apache,bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,ncv,HiFlowNet,sock,rtt,sar,sar0,sar1,sar2,sar3"
Again, I tried many variations, and again, maybe I haven't tried the
right one.
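With lists this long, a quick mechanical check that a test name really appears in both TEST2RRD and GRAPHS can help. A sketch, assuming entries of the form "name", "name=rrdname" or "name::N" and comparing only the part before "=" or "::" (this parsing is an illustration, not Hobbit's own code); in_list is a hypothetical name.

```shell
#!/bin/sh
# Succeed if NAME is an entry in the comma-separated LIST.
# Usage: in_list NAME LIST
in_list() {
    name=$1
    list=$2
    # One entry per line, strip "=..." and "::..." suffixes, exact match.
    echo "$list" | tr ',' '\n' | sed 's/=.*//; s/::.*//' | grep -qx -- "$name"
}
```

For example, `in_list sar "$TEST2RRD" && in_list sar "$GRAPHS" && echo ok`, run after sourcing the variables from hobbitserver.cfg.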

bb-hosts:
I only tried putting "xxx.xxx.xxx.xxx  xxx   # conn sar"; that didn't
help.


hobbitgraph.cfg:

[sar]
       FNPATTERN sar(.*).rrd
       TITLE CPU sar
       YAXIS %
       DEF:p@RRDIDX@=@RRDFN@:cpu@RRDIDX@pcntusr:AVERAGE
       DEF:p@RRDIDX@=@RRDFN@:cpu@RRDIDX@pcntsys:AVERAGE
       DEF:p@RRDIDX@=@RRDFN@:cpu@RRDIDX@pcntwio:AVERAGE
       DEF:p@RRDIDX@=@RRDFN@:cpu@RRDIDX@pcntidl:AVERAGE
       LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
       -u 100
       -l 0
       GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
       GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
       GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
       GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
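In the [sar] definition above, all four DEF lines define the same variable name (p followed by the @RRDIDX@ placeholder). As far as I know, rrdtool graph expects each DEF/CDEF/VDEF vname to be defined only once, so this may be worth ruling out. A sketch that lists vnames defined more than once in a graph definition read on stdin (POSIX shell assumed; dup_vnames is a hypothetical name; placeholders are compared literally, before Hobbit substitutes them):

```shell
#!/bin/sh
# Print every DEF/CDEF/VDEF vname that is defined more than once.
dup_vnames() {
    sed -n 's/^[[:space:]]*[CV]*DEF:\([^=]*\)=.*/\1/p' | sort | uniq -d
}
```

Run against the section above, it would report that one name; giving each DEF its own vname (for instance one per metric) would avoid the clash.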

If anyone needs to see anything else, let me know.
I need to get this working quickly, and am at the end of my rope!
I've done other custom graphs with custom RRDs, and never had
this problem before.

By the way, the host is sending other custom data just fine with no
problems.

Thanks for any help!

-Jeff

Have you looked at the sar script on deadcat? It's really very nice.
It has some minor issues, but works great. It might need tweaking for
Hobbit, but I don't think it will.

=G=