vmstat graphing with CPU io wait

84 messages in this thread

list Tom Georgoulias · Mon, 24 Jan 2005 13:19:32 -0500 ·

My old BB setup has a customized vmstat-larrd.pl script which allows for variations in vmstat output based on the version of procps.  In other words, it compensates for the fact that RHEL3 and older Red Hat Linux systems have vmstat output whose column ordering doesn't match up.

So I'd like to bring some of those vmstat changes into my new hobbit setup, most notably the ability to plot CPU wait for IO (wa) alongside user, system & idle time, but poking around in hobbitgraph.cfg doesn't reveal an easy way to do it.

Any tips on how I might accomplish this?

Tom

list Henrik Størner · Mon, 24 Jan 2005 22:06:31 +0000 (UTC) ·

▸ quoted from Tom Georgoulias

In <user-1109dc7e0465@xymon.invalid> Tom Georgoulias <user-e7ef09aae711@xymon.invalid> writes:

My old BB setup has a customized vmstat-larrd.pl script which allows for
variations in vmstat output based on the version of procps.  In other
words, it compensates for the fact that RHEL3 and old red hat linux
systems have vmstat output whose column ordering doesn't match up.

Hobbit knows these two layouts of the vmstat data as "linux" and
"debian3", the latter being the one for the older linux versions
(essentially, systems running vmstat for a Linux 2.2 kernel).

▸ quoted from Tom Georgoulias

So I'd like to bring some of those vmstat changes into my new hobbit 
setup, most notably the ability to plot CPU wait for IO (wa) alongside 
user, system & idle time, but poking around in hobbitgraph.cfg doesn't 
reveal an easy way to do it.

Any tips on how I might accomplish this?

As with LARRD, there are two steps: Collecting the data, and
displaying them.

Data collection is handled by hobbitd_larrd. The interesting stuff
here is in hobbitd/do_larrd.c, and hobbitd/larrd/*.c . do_larrd.c
determines which function should parse an incoming status or data
message, by looking at the test name in the "status" or "data" message.
It also consults the LARRDS environment variable, e.g. to figure out
that "ftp" is handled by the "tcp" parser. Each type of RRD file then
has its own little routine in one of the hobbitd/larrd/*.c files to
pick out the interesting data, and put it into the RRD file.
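
The dispatch step described above can be sketched in shell. This is only an illustration of the lookup idea, not the code in do_larrd.c: the list contents, the "name=parser" syntax and the lookup_parser helper are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical LARRDS-style list; entries are "name" or "name=parser".
# The contents and the exact syntax are assumptions for illustration.
LARRDS_EXAMPLE="cpu=la,disk,memory,ftp=tcp,http=tcp,vmstat"

# lookup_parser TESTNAME LIST
# Prints the parser that should handle TESTNAME, or nothing if unlisted.
lookup_parser() {
    printf '%s\n' "$2" | tr ',' '\n' | awk -F= -v t="$1" '
        $1 == t { if (NF > 1) print $2; else print $1; exit }'
}

lookup_parser ftp "$LARRDS_EXAMPLE"     # mapped to the "tcp" parser
lookup_parser vmstat "$LARRDS_EXAMPLE"  # handled under its own name
```

A test name that is not in the list produces no output, which corresponds to a message that no parser picks up.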

Where do you get the I/O wait information from ?


Data display is handled by hobbitgraph.cgi, and the config file
hobbitgraph.cfg. This is very similar to the looong set of 
definitions in larrd-grapher.cgi, except that you need not worry about
hostnames in the RRD files, because Hobbit keeps all RRD files for a
given host in a separate directory. So e.g. the "vmstat" graph can
just get the CPU idle-time value with

     DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE

i.e. grab the "vmstat.rrd" file, and extract the current average value
of the "cpu_idl" dataset.

You can mix values from different RRD files in the same graph,
e.g. the "vmstat2" graph uses both the "vmstat.rrd" file and the
"la.rrd" file:

	DEF:avg=la.rrd:la:AVERAGE
        CDEF:la=avg,100,/
        DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE
        CDEF:cpu_idl2=cpu_idl,100,/
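
To try such a mixed definition outside Hobbit, the same DEF/CDEF lines can be handed to "rrdtool graph" by hand. The sketch below only prints the arguments. The directory argument and the two LINE2 lines are assumptions added for the example, and may not match what the real "vmstat2" graph draws. The CDEF expressions are in RPN, so "avg,100,/" means avg divided by 100.

```shell
#!/bin/sh
# vmstat2_graph_args RRDDIR
# Prints rrdtool graph arguments, one per line, mixing two RRD files.
# RRDDIR is a hypothetical per-host directory holding la.rrd and vmstat.rrd.
vmstat2_graph_args() {
    dir=$1
    printf '%s\n' \
        "DEF:avg=$dir/la.rrd:la:AVERAGE" \
        "CDEF:la=avg,100,/" \
        "DEF:cpu_idl=$dir/vmstat.rrd:cpu_idl:AVERAGE" \
        "CDEF:cpu_idl2=cpu_idl,100,/" \
        "LINE2:la#FF0000:Load average" \
        "LINE2:cpu_idl2#00FF00:CPU idle / 100"
}

# Print the arguments; to render a picture they could be appended to
# an "rrdtool graph out.png" command line.
vmstat2_graph_args /tmp/rrd/myhost
```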


If you have more questions, please ask. And if you have something that
could be of interest to others, I'll be happy to include it with
Hobbit.


Regards,
Henrik

list Charles Jones · Mon, 24 Jan 2005 15:14:54 -0700 ·

How difficult is it to add custom graphs?  For example we have several Oracle standby databases that "resync" (import binary database changelogs) for several hours every night. I have a perl script which parses one of the logfiles created by this process, and gets values like the total time it took to import each binary log. Having a graph of the average time it takes to process would be VERY handy for metric and scaling evaluations.

The long story short is that I have a script or process that outputs numbers that I want to graph with Hobbit+LARRD. Is it possible?  Or, I should be asking, how difficult would it be, and can you give any pointers on what to do.

Thanks,

-Charles

list Henrik Størner · Mon, 24 Jan 2005 23:46:01 +0100 ·

On Mon, Jan 24, 2005 at 03:14:54PM -0700, Charles Jones wrote:

How difficult is it to add custom graphs?

Well, it does require some programming - it's an advantage if you 
are familiar with C, since that is the language Hobbit is written in.
I don't think it's terribly hard, but I am biased.

▸ quoted from Charles Jones

For example we have several Oracle standby databases that "resync"
(import binary database changelogs) for several hours every night. I
have a perl script which parses one of the logfiles created by this
process, and gets values like the total time it took to import each
binary log. Having a graph of the average time it takes to process
would be VERY handy for metric and scaling evaluations.

The hard part usually is getting the data, and you have that already
with your perl script.

Next step is getting the data to the Hobbit server. This is easy;
decide on a unique name for the type of data you're handling - e.g.
"orasync" - and use the "bb" utility to send it off as a "data"
message to Hobbit. E.g. the following script runs your perl script,
stores the output in a temporary file, and uses the "bb" utility from
a Big Brother client installation to send this datafile to Hobbit
in a "data" message:

   #!/bin/sh

   /foo/perlscript >/tmp/datafile
   
   BBHOME=/usr/local/bbc
   export BBHOME
   . $BBHOME/etc/bbdef.sh

   $BB $BBDISP "data $MACHINE.orasync

   `cat /tmp/datafile`
   "

Now the fun bit starts. Hobbit will automatically pass data-messages
to all tasks monitoring the "data" channel. hobbitd_larrd is one of
them, so you can either add some code to this "worker module", or you
can create your own module from scratch using your favorite
programming language. An example of a hobbitd worker module is in the
hobbitd_sample.c application included with Hobbit.

Assuming you just add stuff to the existing hobbitd_larrd module,
you must do two things:

1) Write a routine do_orasync_larrd() similar to the other ones
   in the hobbitd/larrd/*.c files, that receives the message, 
   picks out the numbers that you want to store, and saves them in
   an RRD file;

2) Add a line to do_larrd.c at the end of the file, so when it sees
   the "orasync" message, it calls your do_orasync_larrd() routine.

You need to learn about RRDtool to really do this; the "rrdcreate"
and "rrdgraph" manpages include some tips on how to define RRD's
and how you can setup graphs.

Take a look at one of the simple ones, e.g. hobbitd/larrd/do_bbgen.c
which picks out a single value from the status message that bbgen
sends when updating the web-pages - the do_bbgen_larrd routine simply
finds the string "TIME TOTAL <some number>", picks out the number
(which is the time bbgen takes to generate the webpages), and stores it
in an RRD file.
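
The "find the string, pick out the number" step can be tried in shell before writing any C. This is a sketch of the idea only, not the do_bbgen_larrd code; the extract_total_time helper and the sample message are invented for the example.

```shell
#!/bin/sh
# extract_total_time: reads a message on stdin and prints the number that
# follows "TIME TOTAL" on the first matching line (nothing if absent).
extract_total_time() {
    awk '{
        for (i = 1; i + 2 <= NF; i++)
            if ($i == "TIME" && $(i + 1) == "TOTAL") { print $(i + 2); exit }
    }'
}

# Invented sample message, only for illustration.
printf 'status myhost.bbgen green\nTIME TOTAL 4.25\n' | extract_total_time
```

The same approach would apply to an "orasync" message: decide on a fixed marker string in the script output and pick out the number that follows it.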

The data stored in the RRD file is described in the bbgen_params
variable (or "orasync_params" in your case):

static char *bbgen_params[] = { "rrdcreate", rrdfn, 
      "DS:runtime:GAUGE:600:0:U", 
      rra1, rra2, rra3, rra4, NULL };

The first and last line of this is static; you only change the
"DS:..." line to define the data you store in the rrd.

When you have the data you want, put all of it in the "rrdvalues"
string, with the timestamp in front, and call the
create_and_update_rrd routine to do the work of saving the data.
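
For experimenting with the RRD definition by hand, the equivalent rrdtool commands can be assembled in shell. This is a sketch under assumptions: the DS line is the one quoted above, but the --step value and the single RRA are placeholders, not the values behind rra1..rra4, and the helper names are made up for the example. The functions only print the commands.

```shell
#!/bin/sh
# orasync_create_cmd RRDFILE
# Prints an "rrdtool create" command using the DS line from the thread.
# The --step and RRA values are placeholders, not Hobbit's rra1..rra4.
orasync_create_cmd() {
    printf 'rrdtool create %s --step 300 %s %s\n' \
        "$1" "DS:runtime:GAUGE:600:0:U" "RRA:AVERAGE:0.5:1:576"
}

# orasync_update_cmd RRDFILE TIMESTAMP VALUE
# Prints an "rrdtool update" command; the value string is "timestamp:value",
# i.e. the timestamp goes in front as described above.
orasync_update_cmd() {
    printf 'rrdtool update %s %s:%s\n' "$1" "$2" "$3"
}

orasync_create_cmd orasync.rrd
orasync_update_cmd orasync.rrd "$(date +%s)" 1530
```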


So now you have an RRD file. Time to put it into hobbitgraph.cfg.
This really depends on the kind of data you are handling.


The last step is to add "orasync" to the GRAPHS definition in
hobbitserver.cfg. This causes the bb-larrdcolumn tool to include the
orasync graph on the "trends" column page.


The first time you do it, it seems complex, and I admit: it isn't
exactly trivial because there are many pieces that need to fit
together. But once you get the first simple graph working you'll
see that it isn't all that hard. And I'll be happy to help you if you
run into problems along the way.


Henrik

list Charles Jones · Mon, 24 Jan 2005 16:03:31 -0700 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:

and use the "bb" utility to send it off as a "data"
message to Hobbit. E.g. the following script runs your perl script,
stores the output in a temporary file, and uses the "bb" utility from
a Big Brother client installation to send this datafile to Hobbit
in a "data" message:

  #!/bin/sh

  /foo/perlscript >/tmp/datafile

  BBHOME=/usr/local/bbc
  export BBHOME
  . $BBHOME/etc/bbdef.sh

  $BB $BBDISP "data $MACHINE.orasync

  `cat /tmp/datafile`
  "

Now the fun bit starts. Hobbit will automatically pass data-messages
to all tasks monitoring the "data" channel.

Can you tell me the difference between using "data" and "status"?  The reason I ask is because I looked at the hobbitd/larrd/do_bea.c (because it was the smallest one), and I notice in the comments that the script that feeds it is using "status" instead of "data".

Thanks,
-Charles

list Charles Jones · Mon, 24 Jan 2005 16:12:35 -0700 ·

▸ quoted from Charles Jones

Charles Jones wrote:

Henrik Stoerner wrote:

and use the "bb" utility to send it off as a "data"
message to Hobbit. E.g. the following script runs your perl script,
stores the output in a temporary file, and uses the "bb" utility from
a Big Brother client installation to send this datafile to Hobbit
in a "data" message:

  #!/bin/sh

  /foo/perlscript >/tmp/datafile

  BBHOME=/usr/local/bbc
  export BBHOME
  . $BBHOME/etc/bbdef.sh

  $BB $BBDISP "data $MACHINE.orasync

  `cat /tmp/datafile`
  "

Now the fun bit starts. Hobbit will automatically pass data-messages
to all tasks monitoring the "data" channel.

Can you tell me the difference between using "data" and "status"?  The reason I ask is because I looked at the hobbitd/larrd/do_bea.c (because it was the smallest one), and I notice in the comments that script that feeds it is using "status" instead of "data".

Thanks,
-Charles

Oops, scratch that....the do_bea.c is definitely not the smallest one, and I should have looked at the one you suggested :-)  I would still like to know the difference between, and when one should use, data vs status though.

Now that I am looking at do_bbgen.c, it does look very simple...I will give a try at making my own.  Thanks again.

-Charles

list Charles Jones · Mon, 24 Jan 2005 17:07:54 -0700 ·

Would it be possible, when the status of something goes Red, to have the code printed somewhere on the status page, so that one could use the "Acknowledge alert" option and copy/paste it in, rather than having to get the incident code from email/pager?  If we wanted to be really spiffy the acknowledge alert could even have a dropdown/list of current alerts so you wouldn't even have to type it :)

-Charles

list Daniel J McDonald · Mon, 24 Jan 2005 18:10:48 -0600 ·

▸ quoted from Charles Jones

On Mon, 2005-01-24 at 17:07 -0700, Charles Jones wrote:

Would it be possible, when the status of something goes Red, to have the code printed somewhere on the status page, so that one could use the "Acknowledge alert" option and copy/paste it in, rather than having to get the incident code from email/pager?  If we wanted to be really spiffy the acknowledge alert could even have a dropdown/list of current alerts so you wouldn't even have to type it :)

Then you wouldn't know that the person who was supposed to be notified
really was...  Making them copy it off a pager is a cheap "2-factor"
authentication....

-- 
Daniel J McDonald, CCIE # 2495, CNX
Austin Energy

user-290ce4e24e19@xymon.invalid

list Charles Jones · Mon, 24 Jan 2005 17:16:21 -0700 ·

▸ quoted from Daniel J McDonald

Daniel J McDonald wrote:

On Mon, 2005-01-24 at 17:07 -0700, Charles Jones wrote:

Would it be possible, when the status of something goes Red, to have the code printed somewhere on the status page, so that one could use the "Acknowledge alert" option and copy/paste it in, rather than having to get the incident code from email/pager?  If we wanted to be really spiffy the acknowledge alert could even have a dropdown/list of current alerts so you wouldn't even have to type it :)

Then you wouldn't know that the person who was supposed to be notified
really was...  Making them copy it off a pager is a cheap "2-factor"
authentication....

I see what you mean.  In my case the alerts go to an email alias that includes both the alert pager and an email list. So it can be any one of a number of people who actually acks the alert, and they always have to copy and paste it from the email subject. They use the explanation/cause field to indicate who acked, i.e. "Network cable came unplugged, plugged back in -cjones".  I'm just trying to figure out a way to make the acking process more efficient.

-Charles

list Adam Goryachev · Tue, 25 Jan 2005 15:26:00 +1100 ·

▸ quoted from Charles Jones

On Mon, 2005-01-24 at 16:12 -0700, Charles Jones wrote:

Charles Jones wrote:

Henrik Stoerner wrote:

and use the "bb" utility to send it off as a "data"
message to Hobbit. E.g. the following script runs your perl script,
stores the output in a temporary file, and uses the "bb" utility from
a Big Brother client installation to send this datafile to Hobbit
in a "data" message:

  #!/bin/sh

  /foo/perlscript >/tmp/datafile

  BBHOME=/usr/local/bbc
  export BBHOME
  . $BBHOME/etc/bbdef.sh

  $BB $BBDISP "data $MACHINE.orasync

  `cat /tmp/datafile`
  "

Now the fun bit starts. Hobbit will automatically pass data-messages
to all tasks monitoring the "data" channel.

Can you tell me the difference between using "data" and "status"?  The reason I ask is because I looked at the hobbitd/larrd/do_bea.c (because it was the smallest one), and I notice in the comments that script that feeds it is using "status" instead of "data".

Thanks,
-Charles

Oops, scratch that....the do_bea.c is definitely not the smallest one, and I should have looked at the one you suggested :-)  I would still like to know the difference between, and when one should use, data vs status though.

AFAIK, status is the result of a test, which should be
alarmed/displayed/etc, whereas data is not alarmed/displayed, just
handed off to something else to deal with it. (I think in BB it was just
appended to a file in bbvar/data/hostname....)

Regards,
Adam

-- 
 -- Adam Goryachev
Website Managers
Ph:  +XX X XXXX XXXX                        user-eaec2ffb4cbc@xymon.invalid
Fax: +XX X XXXX XXXX                        www.websitemanagers.com.au

list Henrik Størner · Tue, 25 Jan 2005 07:25:26 +0100 ·

▸ quoted from Charles Jones

On Mon, Jan 24, 2005 at 04:03:31PM -0700, Charles Jones wrote:

Henrik Stoerner wrote:

and use the "bb" utility to send it off as a "data"
message to Hobbit.

Can you tell me the difference between using "data" and "status"?

A "status" message results in a column on the display, and also has a
color (red, green, yellow) that might trigger an alert.

A "data" message is never displayed and cannot generate an alert,
it is just a way of collecting data.


Henrik

list Henrik Størner · Tue, 25 Jan 2005 06:36:10 +0000 (UTC) ·

▸ quoted from Daniel J McDonald

In <user-f81b191abbfc@xymon.invalid> Daniel J McDonald <user-290ce4e24e19@xymon.invalid> writes:

On Mon, 2005-01-24 at 17:07 -0700, Charles Jones wrote:

Would it be possible, when the status of something goes Red, to have the 
code printed somewhere on the status page, so that one could use the 
"Acknowledge alert" option and copy/paste it in, rather than having to 
get the incident code from email/pager?  If we wanted to be really 
spiffy the acknowledge alert could even have a dropdown/list of current 
alerts so you wouldn't even have to type it :)

Then you wouldn't know that the person who was supposed to be notified
really was...  Making them copy it off a pager is a cheap "2-factor"
authentication....

Exactly. But I understand Charles' question, because I've been wanting
to do something like that.

Our monitoring is handled by a NOC manned 24x7, and when an alert
pops up on the Hobbit NK view they raise a trouble-ticket in some
other system. The NOC people don't get an e-mail or pager alert, but it
would still be nice if they could acknowledge "yes, a TT has been
raised about this" to get the problem off their monitor. So I will
probably implement some way of putting an "acknowledge" function on
the webpages - this would have to be protected with some sort of
access control, obviously.


Henrik

list Charles Jones · Tue, 25 Jan 2005 02:07:42 -0700 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:

On Mon, Jan 24, 2005 at 04:03:31PM -0700, Charles Jones wrote:

Henrik Stoerner wrote:

and use the "bb" utility to send it off as a "data"
message to Hobbit.

Can you tell me the difference between using "data" and "status"?

A "status" message results in a column on the display, and also has a
color (red, green, yellow) that might trigger an alert.

A "data" message is never displayed and cannot generate an alert,
it is just a way of collecting data.

But data can be collected from a status message as well, right? At least I hope so, because I want to update a status, AND graph the result. For instance in my oracle resync scenario, I want to send basically a status that says "Resync completed successfully. 500 files imported. Total Resync Time:  1530 seconds." I want the 1530 to be trended.  Will I be able to do that, or will I have to send both a status and a separate data message?

Thanks,
-Charles

list Charles Jones · Tue, 25 Jan 2005 02:12:14 -0700 ·

▸ quoted from Henrik Størner

Henrik Storner wrote:

In <user-f81b191abbfc@xymon.invalid> Daniel J McDonald <user-290ce4e24e19@xymon.invalid> writes:

On Mon, 2005-01-24 at 17:07 -0700, Charles Jones wrote:

Would it be possible, when the status of something goes Red, to have the code printed somewhere on the status page, so that one could use the "Acknowledge alert" option and copy/paste it in, rather than having to get the incident code from email/pager?  If we wanted to be really spiffy the acknowledge alert could even have a dropdown/list of current alerts so you wouldn't even have to type it :)

Then you wouldn't know that the person who was supposed to be notified
really was...  Making them copy it off a pager is a cheap "2-factor"
authentication....

Exactly. But I understand Charles' question, because I've been wanting
to do something like that.

Our monitoring is handled by a NOC manned 24x7, and when an alert
pops up on the Hobbit NK view they raise a trouble-ticket in some
other system. The NOC people don't get an e-mail or pager alert, but it
would still be nice if they could acknowledge "yes, a TT has been
raised about this" to get the problem off their monitor. So I will
probably implement some way of putting an "acknowledge" function on
the webpages - this would have to be protected with some sort of
access control, obviously.

Currently I am simply using a .htaccess file to restrict access, which has been working for me so far, but built-in access control would be nice, particularly for Acks and for Maint.pl. If there were a proper permissions system, you could even define which users could see which groups of hosts!  Ahhh I feel the feature creature sneaking up on us! :-)

-Charles

list Henrik Størner · Tue, 25 Jan 2005 11:16:27 +0100 ·

▸ quoted from Charles Jones

On Tue, Jan 25, 2005 at 02:07:42AM -0700, Charles Jones wrote:

Henrik Stoerner wrote:

A "status" message results in a column on the display, and also has a
color (red, green, yellow) that might trigger an alert.

A "data" message is never displayed and cannot generate an alert,
it is just a way of collecting data.

But data can be collected from a status message as well right? At least 
I hope so, because I want to update a status, AND graph the result.

Certainly, no problem at all. Hobbit gets most of its RRD graph-data
from status messages (the "cpu", "disk", "memory" and network test
messages, for instance). That's why you'll see two hobbitd_larrd
processes running: One gets the "status" messages, and the other gets
the "data" messages.

So no, you don't need to do anything special. Whether you send your
original data as a status- or a data-message is up to you, as far as
collecting the data in an RRD and graphing them, there is no
difference.
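
For the resync scenario, a single status message can carry both the color and the number to be trended. The sketch below builds such a message and picks the number back out. The helper names are hypothetical and the message layout is an assumption modelled on the "status host.test color text" convention; a real message normally also carries a date after the color.

```shell
#!/bin/sh
# build_orasync_status HOST COLOR FILES SECONDS
# Prints a status message in the "status host.test color text" form.
build_orasync_status() {
    printf 'status %s.orasync %s Resync completed successfully. %s files imported. Total Resync Time: %s seconds.\n' \
        "$1" "$2" "$3" "$4"
}

# extract_resync_seconds: reads a message on stdin and prints the number
# that follows "Total Resync Time:".
extract_resync_seconds() {
    sed -n 's/.*Total Resync Time:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

build_orasync_status db1 green 500 1530 | extract_resync_seconds
```

On the server side the extraction would happen in the larrd routine for "orasync"; the sed expression here just shows that the number is easy to recover from the status text.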


Regards,
Henrik

list Tom Georgoulias · Tue, 25 Jan 2005 08:27:37 -0500 ·

Henrik Storner wrote:

<snip>

Thanks for the explanation of larrd.  It helped a lot.

Where do you get the I/O wait information from ?

On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of vmstat's output, labeled "wa" under "cpu", so I modified a section of larrd-0.43c's vmstat-larrd.pl so it'd recognize this value and use it when dealing with rhel3 systems.  I hacked my client's vmstat larrd bf script to make it determine whether the system is rhel3 or not, then exported the BBOSNAME as rhel3 so this array assignment would be used by vmstat-larrd.pl.

rhel3 => {  cpu_r    => 0,
	cpu_b    => 1,
	mem_swpd => 2,
	mem_free => 3,
	mem_buff => 4,
	mem_cach => 5,
	mem_si   => 6,
	mem_so   => 7,
	dsk_bi   => 8,
	dsk_bo   => 9,
	cpu_int  => 10,
	cpu_csw  => 11,
	cpu_usr  => 12,
	cpu_sys  => 13,
	cpu_wait => 14,
	cpu_idl  => 15,
},

I might try adding this to hobbitd/larrd/do_vmstat.c and see if I can make it work.
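
The index map above can be checked against real vmstat output with a one-line awk helper. This is a sketch only: vmstat_field is a hypothetical helper and the sample line is invented, laid out to follow the rhel3 map (0-based indexes, cpu_wait at 14, cpu_idl at 15).

```shell
#!/bin/sh
# vmstat_field INDEX
# Reads one vmstat data line on stdin and prints the field at the given
# 0-based INDEX, matching the numbering used in the rhel3 map.
vmstat_field() {
    awk -v idx="$1" '{ print $(idx + 1) }'
}

# Invented sample line laid out per the rhel3 map.
SAMPLE="1 0 100 20000 3000 40000 0 0 5 6 110 83 8 30 32 30"
printf '%s\n' "$SAMPLE" | vmstat_field 14   # cpu_wait
printf '%s\n' "$SAMPLE" | vmstat_field 15   # cpu_idl
```

Running the helper on a line taken from the client's own vmstat is a quick way to confirm that the indexes line up before changing the server code.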

▸ quoted from Henrik Størner

     DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE

i.e. grab the "vmstat.rrd" file, and extract the current average value
of the "cpu_idl" dataset.

You can mix values from different RRD files in the same graph,
e.g. the "vmstat2" graph uses both the "vmstat.rrd" file and the
"la.rrd" file:

This is nice.  Once I figured out what you were doing there, I thought "hey, all I've got to do is set up a def for cpu_wa|cpu_wait and I'm golden."  Then I fired up rrdtool and checked the rrd file, only to realize that I didn't have the data to begin with...

▸ quoted from Henrik Størner

If you have more questions, please ask. And if you have something that
could be of interest to others, I'll be happy to include it with
Hobbit.

I'll be happy to contribute any patches that I generate.

Tom

list Tom Georgoulias · Tue, 25 Jan 2005 09:39:03 -0500 ·

▸ quoted from Tom Georgoulias

Tom Georgoulias wrote:

rhel3 => {  cpu_r    => 0,
        cpu_b    => 1,
        mem_swpd => 2,
        mem_free => 3,
        mem_buff => 4,
        mem_cach => 5,
        mem_si   => 6,
        mem_so   => 7,
        dsk_bi   => 8,
        dsk_bo   => 9,
        cpu_int  => 10,
        cpu_csw  => 11,
        cpu_usr  => 12,
        cpu_sys  => 13,
        cpu_wait => 14,
        cpu_idl  => 15,

I might try adding this to hobbitd/larrd/do_vmstat.c and see if I can
make it work.

I was able to get this to work w/o much hassle at all--modifying hobbitd/larrd/do_vmstat.c to include the rhel3 array and lib/misc.c, lib/misc.h to define rhel3 as an os type did the trick.  Then I created a vmstat graph config (vmstat_rhel3) in hobbitgraph.cfg that uses all 4 cpu status parameters and referenced that in bb-hosts for my rhel3 systems.  Like I said in my last message, the vmstat bottom feeders on the clients have to be configured to set the BBOSTYPE to rhel3 when sending the data to the hobbit server for this to take effect, so this is more of a positive test result than a general purpose solution.  I guess the point of this email is that it works just like I wanted it to.  Getting it more generalized is my next step.

Tom

list Daniel J McDonald · Tue, 25 Jan 2005 08:46:30 -0600 ·

▸ quoted from Tom Georgoulias

On Tue, 2005-01-25 at 08:27 -0500, Tom Georgoulias wrote:

Henrik Storner wrote:

<snip>

Thanks for the explanation of larrd.  It helped a lot.

Where do you get the I/O wait information from ?

On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of vmstat's output, labeled "wa" under "cpu", so I modified a section of larrd-0.43c's vmstat-larrd.pl so it'd recognize this value and use it when dealing with rhel3 systems.  I hacked my client's vmstat larrd bf

Actually, that is present in all kernel 2.6 versions, e.g. Mandrake 10.0
and 10.1.  I'd love to be able to capture that - I beat on bb-central
for quite a while trying to track it.

Tracking wait state is great for figuring out which boxes need more ram.

-- 
Daniel J McDonald, CCIE # 2495, CNX
Austin Energy

user-290ce4e24e19@xymon.invalid

list Henrik Størner · Tue, 25 Jan 2005 18:04:24 +0100 ·

▸ quoted from Tom Georgoulias

On Tue, Jan 25, 2005 at 08:27:37AM -0500, Tom Georgoulias wrote:

Henrik Storner wrote:

Where do you get the I/O wait information from ?

On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of 
vmstat's output, labeled "wa" under "cpu"

Aha! So that's it - I had been wondering a bit why my load graphs
didn't always add up to 100% !

This is quite interesting, and definitely something that should be
tracked. So I hope you don't mind that I've tried adding it myself ...

One annoying bit with the RRD files is that changing the dataset
(e.g. adding an extra variable) is not possible. So adding the
cpu_wait data will break any existing vmstat data that has been
collected. So if we're gonna break the vmstat RRD layout for Linux
clients, we might as well do it now before the official release.  And
that should also include getting the very old layout (the one from
Linux 2.2 kernels, with the "r b w" process-counts) aligned with the
new layout - effectively creating a single vmstat RRD format
regardless of what Linux version you are running.

So: I've modified the Linux vmstat RRD layout to always include the
"cpu_w" (from the very old vmstat version) and "cpu_wait" columns
(from the latest vmstat versions). If the client doesn't report a
value for these, they are set to the special RRD-value "undefined". So
when someone upgrades a system from Linux 2.2 to 2.4, or from 2.4 to
2.6, the vmstat data will still work.
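
The "undefined" handling maps onto the "U" value that rrdtool update accepts for an unknown sample. A minimal sketch of building such an update string, where rrd_values is a hypothetical helper and the two-value layout is simplified for the example (the real vmstat RRD has many more datasets):

```shell
#!/bin/sh
# rrd_values TIMESTAMP VALUE...
# Joins values into an rrdtool update string, writing "U" (unknown)
# for any empty value so that all clients can share one RRD layout.
rrd_values() {
    out=$1
    shift
    for v in "$@"; do
        out="$out:${v:-U}"
    done
    printf '%s\n' "$out"
}

# A client that reports cpu_wait (32) but not the old cpu_w column:
rrd_values 1106600000 "" 32   # prints 1106600000:U:32
```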

I've also defined a "vmstat1" graph similar to the normal "vmstat"
graph, but with the cpu_wait data added (it stacks on top of the
"system" time, below "user" time).

Some sample graphs (they don't have any data yet, so you're probably
better off waiting a couple of hours before you view them):


Linux 2.6 host:
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=vmstat1&graph=hourly

Linux 2.4 host:
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=tyge.sslug.dk&service=vmstat1&graph=hourly

Linux 2.2 host (actually 2.4, but an old vmstat version):
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=fenris.hswn.dk&service=vmstat1&graph=hourly


Henrik

list Tom Georgoulias · Tue, 25 Jan 2005 14:56:24 -0500 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:

On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of
vmstat's output, labeled "wa" under "cpu"

Aha! So that's it - I had been wondering a bit why my load graphs
didn't always add up to 100% !

This is quite interesting, and definitely something that should be
tracked. So I hope you don't mind that I've tried adding it myself ...

Oh no, please do.  Mine is a hack, yours would be a release.  ;)

▸ quoted from Henrik Størner

So adding the cpu_wait data will break any existing vmstat data that has been
collected. So if we're gonna break the vmstat RRD layout for Linux
clients, we might as well do it now before the official release.  And
that should also include getting the very old layout (the one from
Linux 2.2 kernels, with the "r b w" process-counts) aligned with the
new layout - effectively creating a single vmstat RRD format
regardless of what Linux version you are running.

Good.  Very good.

▸ quoted from Henrik Størner

So: I've modified the Linux vmstat RRD layout to always include the
"cpu_w" (from the very old vmstat version)

Isn't that value the number of processes swapped out, the third column 
from old vmstat?  That is basically going to be ignored, unless someone 
has a custom larrd graph that uses it, right?

▸ quoted from Henrik Størner

and "cpu_wait" columns (from the latest vmstat versions). If the client doesn't report a
value for these, they are set to the special RRD-value "undefined". So
when someone upgrades a system from Linux 2.2 to 2.4, or from 2.4 to
2.6, the vmstat data will still work.

Cool. I'm looking forward to testing it out in the next beta.

Tom

list Chris Morris · Wed, 26 Jan 2005 09:29:31 -0000 ·

Henrik,

AIX also reports I/O wait in its vmstat output, in column 16 ("wa" under
"cpu"), which would be nice to have in the graphs.

kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  2 238300 239351   0   0   0  60  108   0 110   79  83  8 30 30 32
 0  2 238803 238827   0   0   0   0    0   0 498 3066 334  1  2 97  1

Chris
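
Since the column position differs between platforms, one way to locate the value is by header name rather than by a fixed index. The sketch below does that with awk on the AIX sample above. It is an illustration only, not how Hobbit parses vmstat, and vmstat_column is a hypothetical helper.

```shell
#!/bin/sh
# vmstat_column NAME
# Reads a header line followed by data lines on stdin and prints, for each
# data line, the field under the last header column called NAME.
vmstat_column() {
    awk -v name="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col     { print $col }'
}

# Header and data rows copied from the AIX sample above.
vmstat_column wa <<'EOF'
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  2 238300 239351   0   0   0  60  108   0 110   79  83  8 30 30 32
 0  2 238803 238827   0   0   0   0    0   0 498 3066 334  1  2 97  1
EOF
```

Note that "sy" appears twice in the AIX header (system calls and system CPU time), so a name lookup is only unambiguous for columns such as "wa" that occur once.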


list Henrik Størner · Wed, 26 Jan 2005 11:10:03 +0100 ·

▸ quoted from Chris Morris

On Wed, Jan 26, 2005 at 09:29:31AM -0000, Morris, Chris (Shared Services) wrote:

AIX also reports i/o wait in its vmstat output in column 16 under wa of cpu
which it would be nice to have in the graphs.

kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  2 238300 239351   0   0   0  60  108   0 110   79  83  8 30 30 32
 0  2 238803 238827   0   0   0   0    0   0 498 3066 334  1  2 97 1

Yes, I noticed that when I worked on the vmstat graphs yesterday.
This was already being collected for AIX, so I made sure the names
matched, so that the graph-definitions will work for both Linux and
AIX.

You can try it with the AIX data you have. Add this to
hobbitgraph.cfg - it's the definition for "vmstat1" I wrote
yesterday. It gives you a graph with the CPU usage split into
system, I/O wait, user and idle:

[vmstat1]
        TITLE CPU Utilization
        YAXIS % Load
        -u 100
        -r
        DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE
        DEF:cpu_usr=vmstat.rrd:cpu_usr:AVERAGE
        DEF:cpu_sys=vmstat.rrd:cpu_sys:AVERAGE
        DEF:cpu_wait=vmstat.rrd:cpu_wait:AVERAGE
        AREA:cpu_sys#FF0000:System
        STACK:cpu_wait#774400:I/O wait
        STACK:cpu_usr#FFFF00:User
        STACK:cpu_idl#00FF00:Idle
        COMMENT:\n
        GPRINT:cpu_sys:LAST:System  \: %5.1lf (cur)
        GPRINT:cpu_sys:MAX: \: %5.1lf (max)
        GPRINT:cpu_sys:MIN: \: %5.1lf (min)
        GPRINT:cpu_sys:AVERAGE: \: %5.1lf (avg)\n
        GPRINT:cpu_wait:LAST:I/O Wait\: %5.1lf (cur)
        GPRINT:cpu_wait:MAX: \: %5.1lf (max)
        GPRINT:cpu_wait:MIN: \: %5.1lf (min)
        GPRINT:cpu_wait:AVERAGE: \: %5.1lf (avg)\n
        GPRINT:cpu_usr:LAST:User    \: %5.1lf (cur)
        GPRINT:cpu_usr:MAX: \: %5.1lf (max)
        GPRINT:cpu_usr:MIN: \: %5.1lf (min)
        GPRINT:cpu_usr:AVERAGE: \: %5.1lf (avg)\n
        GPRINT:cpu_idl:LAST:Idle    \: %5.1lf (cur)
        GPRINT:cpu_idl:MAX: \: %5.1lf (max)
        GPRINT:cpu_idl:MIN: \: %5.1lf (min)
        GPRINT:cpu_idl:AVERAGE: \: %5.1lf (avg)\n

Now find one of your AIX boxes on the Hobbit webpages and look at the
vmstat graphs. Then, in the browser change the part of the URL that
says "service=vmstat" to "service=vmstat1". You should then see the
new graph.

Or put "LARRD:*,vmstat:vmstat1" in the AIX-hosts' entry in bb-hosts
and wait for bb-larrdcolumn to update the set of graphs shown by
default.


Henrik

list Chris Morris · Wed, 26 Jan 2005 11:12:37 -0000 ·

Henrik,

Re vmstat1 on AIX - that works a treat - finally a complete graph : )

Thanks

Chris

list Tom Georgoulias · Wed, 26 Jan 2005 07:44:21 -0500 ·

Henrik,

Are the vmstat patches you created ready for beta testing?  Care to 
share them so I can test them out?

Tom

list Henrik Størner · Wed, 26 Jan 2005 14:23:57 +0100 ·

▸ quoted from Tom Georgoulias

On Wed, Jan 26, 2005 at 07:44:21AM -0500, Tom Georgoulias wrote:

Are the vmstat patches you created ready for beta testing?  Care to 
share them so I can test them out?

I plan on putting out a "release candidate" tomorrow.

There is a beta6-vmstat.patch file on http://www.hswn.dk/beta/
which has the vmstat changes; applies on top of beta-6.

After patching, run "make" and "make install", then restart hobbit (or
at least hobbitd_larrd - if you just kill it, then hobbitlaunch will
restart it automatically).

Make sure you copy over the new hobbitd/etcfiles/hobbitgraph.cfg file
to ~hobbit/server/etc/

You also need to delete the existing ~hobbit/data/rrd/*/vmstat.rrd
files (at least those from Linux systems), or you will get a lot of
errors that it cannot update the vmstat.rrd file. Check the
larrd-status.log and larrd-data.log files.


Henrik

list Tom Georgoulias · Wed, 26 Jan 2005 11:47:30 -0500 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:


On Wed, Jan 26, 2005 at 07:44:21AM -0500, Tom Georgoulias wrote:

Are the vmstat patches you created ready for beta testing?  Care to
share them so I can test them out?

I plan on putting out a "release candidate" tomorrow.

There is a beta6-vmstat.patch file on http://www.hswn.dk/beta/
which has the vmstat changes; applies on top of beta-6.

Thanks for providing the patch.  I applied it and it built without any errors, but I'm still having problems getting it to work.  I did copy over the new hobbitgraph.cfg file after installing & deleted the vmstat.rrd for the linux system in question before restarting.

So, my first question:  I was looking at the patch and wasn't sure the array order is correct.  (I'm not a programmer by any means, so if I'm wrong just say so).

on RHEL3, vmstat's CPU info columns are in this order:


user - 12th
system - 13th
IO wait - 14th
idle - 15th

For example (pardon the line wrap):

-bash-2.05b$ vmstat 2
procs                      memory      swap          io     system     cpu
  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
  0  1      0  19036  27412 4370032    0    0   214     0  622   649  0   1 50 48


In the patch, you have cpu_idl = 14 and cpu_wait = 15.  Is that backwards? Or am I out of my league (disclaimer:  I hardly know anything about C programming).

static vmstat_layout_t vmstat_linux_layout[] = {
         { 0, "cpu_r" },
         { 1, "cpu_b" },
         { -1, "cpu_w" },        /* Not present for 2.4+ kernels, so log as "Undefined" */
         { 2, "mem_swpd" },
         { 3, "mem_free" },
         { 4, "mem_buff" },
         { 5, "mem_cach" },
         { 6, "mem_si" },
         { 7, "mem_so" },
         { 8, "dsk_bi" },
         { 9, "dsk_bo" },
         { 10, "cpu_int" },
         { 11, "cpu_csw" },
         { 12, "cpu_usr" },
         { 13, "cpu_sys" },
         { 14, "cpu_idl" },
         { 15, "cpu_wait"  },    /* Requires kernel 2.6, but may not be present */
         { -1, NULL }
};
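
One way to check a layout table against a particular host is to look the labels up in the vmstat header line itself. The sketch below does only that; `column_index` is a hypothetical helper, not something from the Hobbit sources, and it counts from zero, which is how the numbers in the table above are counted.

```c
#include <assert.h>
#include <string.h>

/* Return the zero-based position of `label` among the whitespace-separated
 * tokens of a vmstat header line, or -1 if the label is absent. */
int column_index(const char *header, const char *label)
{
    assert(header != NULL && label != NULL);

    size_t want = strlen(label);
    int idx = 0;
    const char *p = header;

    while (*p != '\0') {
        p += strspn(p, " \t\r\n");           /* skip leading whitespace */
        if (*p == '\0') break;
        size_t len = strcspn(p, " \t\r\n");  /* length of this token */
        if (len == want && strncmp(p, label, len) == 0) return idx;
        p += len;
        idx++;
    }
    return -1;
}
```

Fed the RHEL3 header shown above ("... us sy wa id"), this gives 14 for "wa" and 15 for "id", the reverse of the 14/15 assignment for cpu_idl and cpu_wait in the table.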

list Henrik Størner · Wed, 26 Jan 2005 18:15:14 +0100 ·

▸ quoted from Tom Georgoulias

On Wed, Jan 26, 2005 at 11:47:30AM -0500, Tom Georgoulias wrote:

So, my first question:  I was looking at the patch and wasn't sure the 
array order is correct.  (I'm not a programmer by any means, so if I'm 
wrong just say so).

on RHEL3, vmstat's CPU info columns are in this order:


user -12th
system - 13th
IO wait - 14th
idle - 15th

Argh! They swapped the order of the IO wait and idle counters!

Well, the simple way of fixing that is to just switch them around
in hobbitgraph.cfg. But cpu_idl is used in a lot of graphs, so that 
does get rather messy.

So it's probably better to define RHEL3 as a new OS type, and
set up its own table for mapping the numbers to the RRD data.

Patch - on top of the previous one - attached. It compiles, but
I haven't tested it. It assumes your vmstat data sends in
"rhel3" as the name of the OS.


Henrik
-------------- next part --------------
--- lib/misc.c	2005/01/20 22:02:23	1.23
+++ lib/misc.c	2005/01/26 17:11:14
@@ -43,6 +43,7 @@
 	else if (strcmp(osname, "debian3") == 0)     result = OS_DEBIAN3;
 	else if (strcmp(osname, "debian") == 0)      result = OS_DEBIAN;
 	else if (strcmp(osname, "linux") == 0)       result = OS_LINUX;
+	else if (strcmp(osname, "rhel3") == 0)       result = OS_RHEL3;
 	else if (strcmp(osname, "snmp") == 0)        result = OS_SNMP;
 	else if (strcmp(osname, "snmpnetstat") == 0) result = OS_SNMP;
 
--- lib/misc.h	2005/01/17 23:13:41	1.10
+++ lib/misc.h	2005/01/26 17:11:36
@@ -13,7 +13,7 @@
 
 #include <stdio.h>
 
-enum ostype_t { OS_UNKNOWN, OS_SOLARIS, OS_OSF, OS_FREEBSD, OS_LINUX, OS_REDHAT, OS_DEBIAN3, OS_DEBIAN, OS_HPUX, OS_AIX, OS_SCO, OS_SNMP, OS_WIN32 } ;
+enum ostype_t { OS_UNKNOWN, OS_SOLARIS, OS_OSF, OS_FREEBSD, OS_LINUX, OS_REDHAT, OS_DEBIAN3, OS_DEBIAN, OS_HPUX, OS_AIX, OS_SCO, OS_SNMP, OS_WIN32, OS_RHEL3 } ;
 
 extern enum ostype_t get_ostype(char *osname);
 extern int hexvalue(unsigned char c);
--- hobbitd/larrd/do_vmstat.c	2005/01/25 17:53:45	1.9
+++ hobbitd/larrd/do_vmstat.c	2005/01/26 17:10:26
@@ -119,6 +119,31 @@
 	{ -1, NULL }
 };
 
+/*
+ * This one is for Red Hat Enterprise Linux 3. Identical to the "linux" layout,
+ * except Red Hat for some reason decided to swap the cpu_wait and cpu_idle columns.
+ */
+static vmstat_layout_t vmstat_rhel3_layout[] = {
+	{ 0, "cpu_r" },
+	{ 1, "cpu_b" },
+	{ -1, "cpu_w" },
+	{ 2, "mem_swpd" },
+	{ 3, "mem_free" },
+	{ 4, "mem_buff" },
+	{ 5, "mem_cach" },
+	{ 6, "mem_si" },
+	{ 7, "mem_so" },
+	{ 8, "dsk_bi" },
+	{ 9, "dsk_bo" },
+	{ 10, "cpu_int" },
+	{ 11, "cpu_csw" },
+	{ 12, "cpu_usr" },
+	{ 13, "cpu_sys" },
+	{ 14, "cpu_wait" },
+	{ 15, "cpu_idl"  },
+	{ -1, NULL }
+};
+
 /* This one is for Debian 3.0 (Woody), and possibly others with a Linux 2.2 kernel */
 static vmstat_layout_t vmstat_debian3_layout[] = {
 	{ 0, "cpu_r" },
@@ -218,8 +243,9 @@
 	  case OS_LINUX:
 	  case OS_REDHAT:
 	  case OS_DEBIAN:
-		layout = vmstat_linux_layout;
-		break;
+		layout = vmstat_linux_layout; break;
+	  case OS_RHEL3:
+		layout = vmstat_rhel3_layout; break;
 	  case OS_DEBIAN3:
 		layout = vmstat_debian3_layout; break;
 	  case OS_FREEBSD:

list Charles Jones · Wed, 26 Jan 2005 12:32:45 -0700 ·

Will Hobbit play nice with bbfetch?  If I recall, bbfetch is run on the BBDISPLAY server, and scp's the raw status files generated by the remote clients' modified $BBHOME/bin/bb. I'm wondering if it would still work, since Hobbit has no bbvar directory. I think it would, because I think bb-fetch uses the $BBTMP variable, and from what I have seen Hobbit populates all the usual bb variables.

If bb-fetch won't work with Hobbit, it might be nice to incorporate similar functionality in, as it is quite useful for situations where bbproxy won't do the trick because of one-way firewall issues.

-Charles

list Henrik Størner · Wed, 26 Jan 2005 21:56:59 +0100 ·

▸ quoted from Charles Jones

On Wed, Jan 26, 2005 at 12:32:45PM -0700, Charles Jones wrote:

Will Hobbit play nice with bbfetch?  If I recall, bbfetch is run on the BBDISPLAY server, and scp's the raw status files generated by the remote clients modified $BBHOME/bin/bb.

It might not ... I haven't tried bbfetch myself, so I cannot say.
But it would probably be pretty easy to come up with a script that
picks up the status-files that bbfetch collects, and sends them off
to the Hobbit daemon via the normal Hobbit "bb" command.

If bb-fetch won't work with Hobbit, it might be nice to incorporate similar functionality in, as it is quite useful for situations where bbproxy won't do the trick because of one-way firewall issues.

I have some ideas for a Hobbit client, and yes - making it work in
both a "push" (normal client) and a "pull" (bbfetch style) setup
is necessary.


Henrik

list Charles Jones · Wed, 26 Jan 2005 15:17:01 -0700 ·

My production BigBrother server is running BigBrother + bbgen 2.5 (I know there is newer bbgen, I plan on replacing BB with a Hobbit server).  My current bb+bbgen setup has problems whenever a machine dies in such a way that it is pingable, but when you connect to any open TCP port you get nothing back (usually caused by a memory error or overheating).  When my current bb+bbgen setup tries to test one of these machines that has zombified, it gets hung testing that host, and eventually everything turns purple since  bb isn't updating anymore.

Does Hobbit have proper timeouts to timeout a hung TCP connection so this sort of thing does not happen?  For all I know this behavior was fixed in bbgen 3.x ,  but as I said I plan on just phasing out my BB server in favor of Hobbit.

-Charles

list Henrik Størner · Wed, 26 Jan 2005 23:29:13 +0100 ·

▸ quoted from Charles Jones

On Wed, Jan 26, 2005 at 03:17:01PM -0700, Charles Jones wrote:

My production BigBrother server is running BigBrother + bbgen 2.5 (I know there is newer bbgen, I plan on replacing BB with a Hobbit server).

Wow, that's a pretty old bbgen version - 1½ years, in fact.

▸ quoted from Charles Jones

My current bb+bbgen setup has problems whenever a machine dies in such a way that it is pingable, but when you connect to any open TCP port you get nothing back (usually caused by a memory error or overheating).  When my current bb+bbgen setup tries to test one of these machines that has zombified, it gets hung testing that host, and eventually everything turns purple since  bb isn't updating anymore.

Does Hobbit have proper timeouts to timeout a hung TCP connection so this sort of thing does not happen?

If not, then it's definitely a bug. All network tests done by Hobbit
must timeout if the other end doesn't respond. The default timeout is
10 seconds (set with the "--timeout=N" option to bbtest-net).

Looking back through the bbgen changelog, there are a couple of
bugfixes through the 2.x series that seem likely to fix it. But
without knowing exactly what's triggering this behaviour it is hard to
say for sure.


Henrik

list Charles Jones · Wed, 26 Jan 2005 15:33:23 -0700 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:

All network tests done by Hobbit
must timeout if the other end doesn't respond. The default timeout is
10 seconds (set with the "--timeout=N" option to bbtest-net).

The problem is, the ports DO respond. You can telnet to port 25, for example, and it connects, but the daemon does not respond: you can input text and you will get nothing back, and unless you break out with ^], the telnet session will stay hung and connected indefinitely.  It's these sorts of hangs I'm hoping Hobbit can sense and time out on.  Unfortunately the only way for me to test it is for a machine to lock up in that manner, and although it happens every now and then, I cannot reproduce it at will.

-Charles
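
The failure mode described here (the connect succeeds, but the service never sends anything) is the case a read timeout has to cover, separately from a connect timeout. A minimal sketch of such a read using POSIX poll() follows. It is not taken from bbtest-net, whose implementation is not shown in this thread; `read_with_timeout` and its return codes are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for data on fd, then read it.
 * Returns the number of bytes read (0 means the peer closed the connection),
 * -1 on error, or -2 if nothing arrived before the timeout. */
long read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(buf != NULL && timeout_ms >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);  /* interrupted: wait again (full timeout) */

    if (rc == -1) return -1;  /* poll itself failed */
    if (rc == 0)  return -2;  /* connected, but the peer stayed silent */

    return (long)read(fd, buf, len);
}
```

A tester built this way would report a host that accepts the connection but never sends its banner as a timeout instead of blocking on it.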

list Tom Georgoulias · Thu, 27 Jan 2005 07:50:09 -0500 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:

on RHEL3, vmstat's CPU info columns are in this order:


user -12th
system - 13th
IO wait - 14th
idle - 15th

Argh! They swapped the order of the IO wait and idle counters!

Frustrating, huh?  And I'll bet it'll match Fedora's and others when procps gets updated in a future batch of errata.  :(

▸ quoted from Henrik Størner

Well, the simple way of fixing that is to just switch them around
in hobbitgraph.cfg. But cpu_idl is used in a lot of graphs, so that
does get rather messy.

So it's probably better to define RHEL3 as a new OS type, and
set up its own table for mapping the numbers to the RRD data.

That's what I've been doing.  One problem that remains for me when doing this, and maybe for other OSes as well, is the continued use of the "vmstat" graph in the vmstat status page.  I'm going to try to adjust that so the rhel3 systems use vmstat1 and other OSes use whatever they need.

▸ quoted from Henrik Størner

Patch - on top of the previous one - attached. It compiles, but
I haven't tested it. It assumes your vmstat data sends in
"rhel3" as the name of the OS.

I was going to share the patch I created, which looks almost the same, but I went ahead and used yours instead, just to be in sync with your sources.

Tom

list Tom Georgoulias · Thu, 27 Jan 2005 10:35:11 -0500 ·

▸ quoted from Tom Georgoulias

Tom Georgoulias wrote:

Patch - on top of the previous one - attached. It compiles, but
I haven't tested it. It assumes your vmstat data sends in
"rhel3" as the name of the OS.

I was going to share the patch I created, which looks almost the same,
but   I went ahead and used yours instead, though, just to be in sync
with your sources.

I think I spoke too soon.  My Red Hat 7.1/7.3 systems need to use the same layout as debian3, so I had that in my patch.  I also created the cpu_wait column for my freebsd systems, but left it undefined so that every system could use the same vmstat graph.  For those that track IOwait, it'll use it.  For those that do not, the parameter will show up in the legend and keep the value "nan".  Not the prettiest, but much easier to maintain.  Patch is attached, which relies on yours already being in place, in case you are interested.  I hesitate to push for inclusion since RH 8.0, 9 and whatever else is out there may report their BBOSNAME as "redhat" but use a different vmstat, plus not everyone wants their graphs to include a parameter that might not exist.  It's out there for whoever wants to use it.

I also included a simple, tiny patch to add an echo statement for starthobbit.sh that tells the user hobbit is stopped, much like the message displayed when starting.  I put it there as a way to clarify what is happening when the rest of my team starts messing around with hobbit.  I'm thinking of creating a new symlink called "runhobbit.sh", just to match the old BB style and to try to avoid any confusion that may go along with a command that looks like this: "starthobbit.sh stop".

Tom

list Charles Jones · Thu, 27 Jan 2005 21:45:43 -0700 ·

I think it would be cool if Hobbit graphed the number of alerts it sent
out.  It could be included on the hobbitd status page. Trending alerts
is good for showing how many pages the on-call staff are responding to :-)

-Charles

list Charles Jones · Thu, 27 Jan 2005 22:19:29 -0700 ·

I am still unable to get the elusive apache1-apache3 graphs to display.

Here are my relevant bb-hosts entries:
page WEB Web Sites
1.2.3.4    www.mysite.com        # noconn http://www.mysite.com apache=http://1.2.3.4/server-status?auto LARRD:*,apache:apache1|apache2|apache3

I have verified that going to http://1.2.3.4/server-status?auto works, 
here is the data it returns:

Total Accesses: 237
Total kBytes: 1279
CPULoad: 9.10606
Uptime: 66
ReqPerSec: 3.59091
BytesPerSec: 19843.9
BytesPerReq: 5526.14
BusyWorkers: 2
IdleWorkers: 12

Can you see anything I'm doing wrong?  Note I also tried just having the simple keyword "apache" instead of apache=..., still no luck. I'm not even getting an "apache" column (although I wouldn't mind if the graphs just appeared in the http status info page).

Scoreboard: _C_________W__..................................................................................................................................................................................................................................................
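
For reference, the "?auto" report above is a plain list of "Key: value" lines, so pulling a single counter out of it takes very little code. The sketch below is only an illustration of reading that format; `status_value` is a hypothetical helper and is not how Hobbit's apache data collector is written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up "Key: value" in a server-status?auto report and parse the value
 * as a double. Returns 0 on success, -1 if the key is missing or the value
 * is not numeric (as is the case for the Scoreboard line). */
int status_value(const char *report, const char *key, double *out)
{
    assert(report != NULL && key != NULL && out != NULL);

    size_t klen = strlen(key);
    const char *line = report;

    while (*line != '\0') {
        /* The key must start the line and be followed directly by ':' */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lf", out) == 1 ? 0 : -1;

        const char *nl = strchr(line, '\n');
        if (nl == NULL) break;
        line = nl + 1;
    }
    return -1;
}
```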

list Henrik Størner · Fri, 28 Jan 2005 10:25:42 +0100 ·

▸ quoted from Charles Jones

On Thu, Jan 27, 2005 at 10:19:29PM -0700, Charles Jones wrote:

I am still unable to get the elusive apache1-apache3 graphs to display.

Here's my relavant bb-hosts entries"
paeg WEB Web Sites
1.2.3.4    www.mysite.com        # noconn http://www.mysite.com 
apache=http://1.2.3.4/server-status?auto 
LARRD:*,apache:apache1|apache2|apache3

Do you have an apache.rrd file in ~/data/rrd/www.mysite.com/ ?

If you do, then the graphs should show up on the "trends" page after a
while; the "trends" page is updated every 15 minutes by default so it 
may take a while after you change the bb-hosts file for the new graphs
to show up.

If not, then there's a problem with the data collection. But your
bb-hosts entry looks right, and it seems your server sends the right
data.


Henrik

list Charles Jones · Fri, 28 Jan 2005 03:01:44 -0700 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:

On Thu, Jan 27, 2005 at 10:19:29PM -0700, Charles Jones wrote:

I am still unable to get the elusive apache1-apache3 graphs to display.

Here's my relavant bb-hosts entries"
paeg WEB Web Sites
1.2.3.4    www.mysite.com        # noconn http://www.mysite.com apache=http://1.2.3.4/server-status?auto LARRD:*,apache:apache1|apache2|apache3

Do you have an apache.rrd file in ~/data/rrd/www.mysite.com/ ?

Yep:  -rw-r--r--    1 hobbit   other      114492 Jan 28 02:56 apache.rrd

▸ quoted from Henrik Størner

If you do, then the graphs should show up on the "trends" page after a
while; the "trends" page is updated every 15 minutes by default so it may take a while after you change the bb-hosts file for the new graphs
to show up.

It's been in the bb-hosts file for a few hours and still no apache column, nor extra graphs in the http column status page.

▸ quoted from Henrik Størner

If not, then there's a problem with the data collection. But your
bb-hosts entry looks right, and it seems your server sends the right
data.

How do we troubleshoot this? I checked the apache server logs and the access log shows that Hobbit is hitting the server-status url.  Is there some way to manually query data from the apache.rrd file to see if it has anything in it?

-Charles

list Charles Jones · Fri, 28 Jan 2005 03:06:15 -0700 ·

Okay, scratch my last message. I re-read what you said and looked at the *trends* page and the graphs are there.  This is what I get for working on stuff past midnight :-)

Now my question is, how can I get the apache graphs to display on the httpd page as well as in trends page?

-Charles

list Henrik Størner · Fri, 28 Jan 2005 12:22:03 +0100 ·

▸ quoted from Charles Jones

On Fri, Jan 28, 2005 at 03:06:15AM -0700, Charles Jones wrote:

Okay, scratch my last message. I re-read what you said and looked at the *trends* page and the graphs are there.  This is what I get for working on stuff past midnight :-)

I had the same feeling last night.

Now my question is, how can I get the apache graphs to display on the httpd page as well as in trends page?

Right now you cannot.

Ideally, bb-hostsvc.cgi that generates the html view of a status log
would pick up the LARRD setting from bb-hosts, and give you the same
graphs that you get on the trends page. It doesn't right now - for
historical reasons, mostly.


Henrik

list Charles Jones · Fri, 28 Jan 2005 04:47:27 -0700 ·

I'm assuming this won't work with Hobbit, since Hobbit stores the rrd files differently.  Do you think temperature-larrd.pl could be modified to run on the Hobbit server and work? Or should I instead attempt to hack the client temperature.sh to send the temp as a data message and then create a do_temp.c module?

Speaking of this, it sure would be nice to have some sort of plugin system, or something for easily creating custom graphs. I can think of many uses for simple one-element graphs (temperature, emails sent per day, etc). I've been up all night because of temperature issues in my server room, so forgive me if I'm not making much sense :-)

-Charles

list Henrik Størner · Fri, 28 Jan 2005 14:39:39 +0100 ·

▸ quoted from Charles Jones

On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:

I'm assuming this wont work with Hobbit, since Hobbit stores the rrd files differently.  Do you think temperature-larrd.pl could be modified to run on the Hobbit server and work? Or should I instead attempt to hack the client temperature.sh to send the temp as a data message and then create a do_temp.c module?

I looked at converting temperature-larrd.pl when doing the Hobbit
larrd stuff, but I couldn't find the script that feeds it - and
without some idea of what the input data looks like, it's a bit hard
to do the data collection.

Where can I find the client-side script? Or perhaps you can just send
me a sample of the status it reports.

Speaking of this, it sure would be nice to have some sort of plugin system, or something for easily creating custom graphs. I can think of many uses for simple one-element graphs (temperature, emails sent per day, etc).

You mean doing it in C is too hard :-)

The current work-around is to enable the hobbitd_filestore module to
save status- and data-reports to files, the way Big Brother does.

There's an option for hobbitd_filestore so you need not save all
status logs on disk, but only the ones you want to process with some
other tool.


Henrik

list Charles Jones · Fri, 28 Jan 2005 09:18:09 -0700 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:

On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:

I'm assuming this wont work with Hobbit, since Hobbit stores the rrd files differently.  Do you think temperature-larrd.pl could be modified to run on the Hobbit server and work? Or should I instead attempt to hack the client temperature.sh to send the temp as a data message and then create a do_temp.c module?

I looked at converting temperature-larrd.pl when doing the Hobbit
larrd stuff, but I couldn't find the script that feeds it - and
without some idea of what the input data looks like, it's a bit hard
to do the data collection.

Where can I find the client side script ? Or perhaps you can just send
me a sample of the status it reports.

The client script is on deadcat.net - http://www.deadcat.net/viewfile.php?fileid=501
Here is a sample status message, from my BigBrother server that is using it:
logs]# cat *temp
green Fri Jan 28 09:13:19 MST 2005 Temperature status:
Device             Temp(C)  Temp(F)
&green AMBIENT            24       75
&green CPU0               40      104
&green CPU1               40      104
&green CPU2               40      104
&green CPU3               40      104
Status green: All devices look okay

Status unchanged in 5.12 hours
Status message received from 1.2.3.4

Note that the output can vary depending on which kind of machine temperature.sh is run on, but I believe they all have AMBIENT, so that's the main value we want to grab and trend.
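
Picking one reading out of a status message in the format above could look roughly like this. It is a sketch under the assumption that device lines always have the shape "&color NAME celsius fahrenheit" as in the sample; `find_temperature` is a hypothetical helper, not an existing Hobbit module, and a real do_temp.c would still need to store the value in an RRD.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan a temperature status message line by line and extract the Celsius
 * reading for the named device (for example "AMBIENT").
 * Returns 0 and stores the value in *celsius on success, -1 if not found. */
int find_temperature(const char *msg, const char *device, int *celsius)
{
    assert(msg != NULL && device != NULL && celsius != NULL);

    const char *line = msg;
    while (*line != '\0') {
        char color[16], name[32];
        int c, f;

        /* Device lines look like: "&green AMBIENT            24       75" */
        if (sscanf(line, "&%15s %31s %d %d", color, name, &c, &f) == 4 &&
            strcmp(name, device) == 0) {
            *celsius = c;
            return 0;
        }

        const char *nl = strchr(line, '\n');
        if (nl == NULL) break;
        line = nl + 1;
    }
    return -1;
}
```

Lines that do not start with "&", such as the header and the "Status green" summary, are skipped, so hosts that report a different set of devices are handled as long as the AMBIENT line is present.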

▸ quoted from Henrik Størner

Speaking of this, it sure would be nice to have some sort of plugin system, or something for easily creating custom graphs. I can think of many uses for simple one-element graphs (temperature, emails sent per day, etc).

You mean doing it in C is too hard :-)

Okay ya got me there :P

▸ quoted from Henrik Størner

The current work-around is to enable the hobbitd_filestore module to
save status- and data-reports to files, the way Big Brother does.

There's an option for hobbitd_filestore so you need not save all
status logs on disk, but only the ones you want to process with some
other tool.

Blah...I'm trying to not use any of the backwards compatible features...I want new and improved all the way :-)

list Daniel J McDonald · Sat, 29 Jan 2005 15:50:12 -0600 ·

▸ quoted from Henrik Størner

On Wed, 2005-01-26 at 21:56 +0100, Henrik Stoerner wrote:

On Wed, Jan 26, 2005 at 12:32:45PM -0700, Charles Jones wrote:

Will Hobbit play nice with bbfetch?  If I recall, bbfetch is run on the 
BBDISPLAY server, and scp's the raw status files generated by the remote 
clients modified $BBHOME/bin/bb.

It might not ... I haven't tried bbfetch myself, so I cannot say.
But it would probably be pretty easy to come up with a script that
picks up the status-files that bbfetch collects, and sends them off
to the Hobbit daemon via the normal Hobbit "bb" command.

If bb-fetch won't work with Hobbit, it might be nice to incorporate 
similar functionality in, as it is quite useful for situations where 
bbproxy won't do the trick because of one-way firewall issues.

I have some ideas for a Hobbit client, and yes - making it work in
both a "push" (normal client) and a "pull" (bbfetch style) setup
is necessary.

I'm actually rather fond of the bb-central style - no clients, all of
the "client like scripts" run via ssh from the server.

bb-fetch has trouble with time - the remote client wipes the status
files on its own schedule, and the server picks them up on its own
schedule, and if the client isn't done writing you end up with lots of
purples...

I haven't tried bb-central with hobbit yet.  bbmap is still my next
priority now that I've got a really good bbmrtg.pl running.  But I've
got to get bb-central up soon.

list Charles Jones · Tue, 01 Feb 2005 17:10:38 -0700 ·

Sometimes a large spike can ruin a graph, because the graph then scales to the max size of the spike, so the rest of the graph isn't very readable as it is now scrunched down to almost a flat line.  Is it possible to have an option for Hobbit's larrding so that you can define a maximum threshold not to extend past?

This would be helpful, as I have a server whose load average shot up to over 100 because of a problem.  It only lasted a few minutes, but now my load average graph looks like a tall weed growing in the middle of a golf green :-)

-Charles

list Henrik Størner · Wed, 2 Feb 2005 07:47:32 +0100 ·

▸ quoted from Charles Jones

On Tue, Feb 01, 2005 at 05:10:38PM -0700, Charles Jones wrote:

Sometimes a large spike can ruin a graph, because the graph then scales to the max size of the spike, so the rest of the graph isn't very readable as it is now scrunched down to almost a flat line.  Is it possible to have an option for Hobbits larrding so that you can define a maximum threshold not to extend past?

Yes, but it's actually a feature that exists in RRDtool, which is used
to generate the graphs. Just change hobbitgraph.cfg's [la] definition:

[la]
        ... some lines ...
        CDEF:la=avg,100,/
        --upper-limit 3.0
        --rigid
        ... more lines ...

The --upper-limit sets the value for the top of the graph; the --rigid
makes this setting "rigid" so even if there are larger values in the
dataset, the graph will not adapt to show these.

It might be an idea to let this upper value be determined by the CGI
so you can adjust it dynamically. I'll have a look at that.


Henrik

list Gordon Thiesfeld · Thu, 3 Feb 2005 11:41:58 -0600 ·

Is this something that is going to be implemented into hobbit in the future?
If not, I'd appreciate some help modifying hobbitd_larrd.  The only thing I
can do with C is spell it:-)

 
I'm not using temperature.sh on the client side, but I can format my script
so that the output will match it for consistency.

 
Thanks,

 
Gordon

▸ quoted from Charles Jones

From: Charles Jones [mailto:user-e86b4aeade4e@xymon.invalid] 
Sent: Friday, January 28, 2005 10:18 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] temperature-larrd.pl

Henrik Stoerner wrote: 

On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:

I'm assuming this won't work with Hobbit, since Hobbit stores the rrd 
files differently.  Do you think temperature-larrd.pl could be modified 
to run on the Hobbit server and work? Or should I instead attempt to 
hack the client temperature.sh to send the temp as a data message and 
then create a do_temp.c module?

I looked at converting temperature-larrd.pl when doing the Hobbit
larrd stuff, but I couldn't find the script that feeds it - and
without some idea of what the input data looks like, it's a bit hard
to do the data collection.

Where can I find the client side script ? Or perhaps you can just send
me a sample of the status it reports.

The client script is on deadcat.net -
http://www.deadcat.net/viewfile.php?fileid=501
Here is a sample status message, from my BigBrother server that is using it:
logs]# cat *temp
green Fri Jan 28 09:13:19 MST 2005 Temperature status:
Device             Temp(C)  Temp(F)
&green AMBIENT            24       75
&green CPU0               40      104
&green CPU1               40      104
&green CPU2               40      104
&green CPU3               40      104
Status green: All devices look okay

Status unchanged in 5.12 hours
Status message received from 1.2.3.4

Note that the output can vary depending on which kind of machine
temperature.sh is run on, but I believe they all have AMBIENT, so that's the
main value we want to grab and trend.

Speaking of this, it sure would be nice to have some sort of plugin 
system, or something for easily creating custom graphs. I can think of 
many uses for simple one-element graphs (temperature, emails sent per 
day, etc).

You mean doing it in C is too hard :-)

Okay ya got me there :P

The current work-around is to enable the hobbitd_filestore module to
save status- and data-reports to files, the way Big Brother does.

There's an option for hobbitd_filestore so you need not save all
status logs on disk, but only the ones you want to process with some
other tool.

Blah...I'm trying to not use any of the backwards compatible features...I
want new and improved all the way :-)

list Henrik Størner · Fri, 4 Feb 2005 13:10:46 +0100 ·

▸ quoted from Charles Jones

On Tue, Feb 01, 2005 at 05:10:38PM -0700, Charles Jones wrote:

Sometimes a large spike can ruin a graph: the graph scales to the height of the spike, so the rest of it is scrunched down to almost a flat line and isn't very readable.  Is it possible to have an option for Hobbit's larrding that defines a maximum threshold the graph will not extend past?

I've worked on this a bit, and so far I've come up with the solution
that you can see on my site - e.g. if you go to
http://www.hswn.dk/hobbit/servers/ , pick the "cpu" display for the
first host, and click the graph to see the 4 periodic graphs.

Direct link:
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=la

At the bottom of the page you can define the lower- and upper-limit
for the graph, and which periodic graph you want to see. E.g enter
"0.5" in the upper-limit, click "View" and you'll get graphs that cut
off spikes at load 0.5.

Something like that you had in mind ?


Henrik

list Henrik Størner · Fri, 4 Feb 2005 23:51:04 +0100 ·

▸ quoted from Charles Jones

On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:

Speaking of this, it sure would be nice to have some sort of plugin 
system, or something for easily creating custom graphs. I can think of 
many uses for simple one-element graphs (temperature, emails sent per 
day, etc).

Sounds reasonable. I've found a way of doing this that keeps as much
as possible of the RRD handling in Hobbit, and makes it easy to use
custom scripts (written in your favourite scripting language) to
process a message and pick out the interesting data you want to put
into a graph.

Basically, you tell hobbitd_larrd which status- or data-messages are
handled by an external script, and what the script is. Your script
is then called when such a message arrives, and is fed the status
message in a file. In return, the script must output the RRD
definitions for the data you want to store, a filename for the RRD
file, and the values.

E.g. if you have a message like

   green Weather in Copenhagen is FAIR

   Temperature: 6
   Wind: 4
   Humidity: 72

and you want to track these, then this script would do:

   #!/bin/sh

   # Input parameters: Hostname, testname (column), and messagefile
   HOSTNAME="$1"
   TESTNAME="$2"
   FNAME="$3"

   # Analyze the message we got
   TEMP=`cat $FNAME | grep "^Temperature:" | awk '{print $2}'`
   WIND=`cat $FNAME | grep "^Wind:" | awk '{print $2}'`
   HMTY=`cat $FNAME | grep "^Humidity:" | awk '{print $2}'`

   # The RRD dataset definition
   echo "DS:temperature:GAUGE:600:-30:50"
   echo "DS:wind:GAUGE:600:0:U"
   echo "DS:humidity:GAUGE:600:0:100"

   # The filename
   echo "weather.rrd"

   # The data
   echo "$TEMP:$WIND:$HMTY"

   exit 0
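
The same output convention (DS definitions, then a filename, then the values) could be applied to the temperature status quoted earlier in the thread. The following is a minimal sketch under that assumption; the function names, the DS name `ambient`, and the file name `ambient.rrd` are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch: pick the AMBIENT reading out of a temperature status message.
# Rows look like "&green AMBIENT   24   75", so field 3 is Temp(C).
ambient_celsius() {
    awk '$2 == "AMBIENT" { print $3; exit }' "$1"
}

# Print the three parts described above: DS definition, RRD filename,
# and the value. The DS and file names here are illustrative only.
temperature_rrd_output() {
    temp=$(ambient_celsius "$1")
    [ -n "$temp" ] || return 1      # no AMBIENT row found
    echo "DS:ambient:GAUGE:600:U:U"
    echo "ambient.rrd"
    echo "$temp"
}

# When called as a script: hostname, testname, message file.
if [ -n "${3:-}" ]; then
    temperature_rrd_output "$3"
fi
```

Because only the AMBIENT row is matched, machines that report a different set of CPU rows would still produce a single, consistent data source.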


Does that seem like a usable plug-in facility ?


Henrik

list Charles Jones · Fri, 04 Feb 2005 16:47:59 -0700 ·

That would be cool.  It would require users to learn about the RRD 
definition options, but at least they wouldn't have to code and compile 
their own C module.  Would it also create a default entry for them in 
hobbitgraph.cfg, or would they have to learn those options as well?

The plugin system sounds cool...any chance it will make it into an RC 
before 4.0 final?

-Charles

▸ quoted from Henrik Størner



Henrik Stoerner wrote:

On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:

Speaking of this, it sure would be nice to have some sort of plugin 
system, or something for easily creating custom graphs. I can think of 
many uses for simple one-element graphs (temperature, emails sent per 
day, etc).

Sounds reasonable. I've found a way of doing this that keeps as much
as possible of the RRD handling in Hobbit, and makes it easy to use
custom scripts (written in your favourite scripting language) to
process a message and pick out the interesting data you want to put
into a graph.

Basically, you tell hobbitd_larrd which status- or data-messages are
handled by an external script, and what the script is. Your script
is then called when such a message arrives, and is fed the status
message in a file. In return, the script must output the RRD
definitions for the data you want to store, a filename for the RRD
file, and the values.

E.g. if you have a message like

  green Weather in Copenhagen is FAIR

  Temperature: 6
  Wind: 4
  Humidity: 72

and you want to track these, then this script would do:

  #!/bin/sh

  # Input parameters: Hostname, testname (column), and messagefile
  HOSTNAME="$1"
  TESTNAME="$2"
  FNAME="$3"

  # Analyze the message we got
  TEMP=`cat $FNAME | grep "^Temperature:" | awk '{print $2}'`
  WIND=`cat $FNAME | grep "^Wind:" | awk '{print $2}'`
  HMTY=`cat $FNAME | grep "^Humidity:" | awk '{print $2}'`

  # The RRD dataset definition
  echo "DS:temperature:GAUGE:600:-30:50"
  echo "DS:wind:GAUGE:600:0:U"
  echo "DS:humidity:GAUGE:600:0:100"

  # The filename
  echo "weather.rrd"

  # The data
  echo "$TEMP:$WIND:$HMTY"

  exit 0


Does that seem like a usable plug-in facility ?

list Bruce Lysik · Fri, 4 Feb 2005 15:57:21 -0800 ·

Does that seem like a usable plug-in facility ?

Something like that sounds awesome.  In the near future I'm going to want to graph some data from SQL queries, and this would be much simpler.

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer

list Kevin Hanrahan · Fri, 4 Feb 2005 22:26:20 -0500 ·

I like that feature a lot!

-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Friday, February 04, 2005 7:11 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Graph Limits
Importance: Low

▸ quoted from Charles Jones


On Tue, Feb 01, 2005 at 05:10:38PM -0700, Charles Jones wrote:

Sometimes a large spike can ruin a graph: the graph scales to the height of the spike, so the rest of it is scrunched down to almost a flat line and isn't very readable.  Is it possible to have an option for Hobbit's larrding that defines a maximum threshold the graph will not extend past?

I've worked on this a bit, and so far I've come up with the solution that
you can see on my site - e.g. if you go to
http://www.hswn.dk/hobbit/servers/ , pick the "cpu" display for the first
host, and click the graph to see the 4 periodic graphs.

Direct link:
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=la

At the bottom of the page you can define the lower- and upper-limit for the
graph, and which periodic graph you want to see. E.g enter "0.5" in the
upper-limit, click "View" and you'll get graphs that cut off spikes at load
0.5.

Something like that you had in mind ?


Henrik

list Kevin Hanrahan · Fri, 4 Feb 2005 22:27:44 -0500 ·

I could use this for a great number of data sets! 

-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid] 
Sent: Friday, February 04, 2005 5:51 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Custom graphs (was: temperature-larrd.pl)
Importance: Low

▸ quoted from Charles Jones


On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:

Speaking of this, it sure would be nice to have some sort of plugin 
system, or something for easily creating custom graphs. I can think of 
many uses for simple one-element graphs (temperature, emails sent per 
day, etc).

Sounds reasonable. I've found a way of doing this that keeps as much as
possible of the RRD handling in Hobbit, and makes it easy to use custom
scripts (written in your favourite scripting language) to process a message
and pick out the interesting data you want to put into a graph.

Basically, you tell hobbitd_larrd which status- or data-messages are handled
by an external script, and what the script is. Your script is then called
when such a message arrives, and is fed the status message in a file. In
return, the script must output the RRD definitions for the data you want to
store, a filename for the RRD file, and the values.

E.g. if you have a message like

   green Weather in Copenhagen is FAIR

   Temperature: 6
   Wind: 4
   Humidity: 72

and you want to track these, then this script would do:

   #!/bin/sh

   # Input parameters: Hostname, testname (column), and messagefile
   HOSTNAME="$1"
   TESTNAME="$2"
   FNAME="$3"

   # Analyze the message we got
   TEMP=`cat $FNAME | grep "^Temperature:" | awk '{print $2}'`
   WIND=`cat $FNAME | grep "^Wind:" | awk '{print $2}'`
   HMTY=`cat $FNAME | grep "^Humidity:" | awk '{print $2}'`

   # The RRD dataset definition
   echo "DS:temperature:GAUGE:600:-30:50"
   echo "DS:wind:GAUGE:600:0:U"
   echo "DS:humidity:GAUGE:600:0:100"

   # The filename
   echo "weather.rrd"

   # The data
   echo "$TEMP:$WIND:$HMTY"

   exit 0


Does that seem like a usable plug-in facility ?


Henrik

list Charles Jones · Sat, 05 Feb 2005 04:16:09 -0700 ·

I like! :-)  I like the cpu test too...is that a test that is available on the hobbit server?

-Charles

▸ quoted from Kevin Hanrahan

-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Friday, February 04, 2005 7:11 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Graph Limits
Importance: Low

On Tue, Feb 01, 2005 at 05:10:38PM -0700, Charles Jones wrote:

Sometimes a large spike can ruin a graph: the graph scales to the height of the spike, so the rest of it is scrunched down to almost a flat line and isn't very readable.  Is it possible to have an option for Hobbit's larrding that defines a maximum threshold the graph will not extend past?

I've worked on this a bit, and so far I've come up with the solution that
you can see on my site - e.g. if you go to
http://www.hswn.dk/hobbit/servers/ , pick the "cpu" display for the first
host, and click the graph to see the 4 periodic graphs.

Direct link:
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=la

At the bottom of the page you can define the lower- and upper-limit for the
graph, and which periodic graph you want to see. E.g enter "0.5" in the
upper-limit, click "View" and you'll get graphs that cut off spikes at load
0.5.

Something like that you had in mind ?

list Henrik Størner · Sat, 5 Feb 2005 12:27:04 +0100 ·

▸ quoted from Charles Jones

On Sat, Feb 05, 2005 at 04:16:09AM -0700, Charles Jones wrote:

I like! :-)  I like the cpu test too...is that a test that is available 
on the hobbit server?

The cpu (and the other client-side tests) show up if you install the
Big Brother client on the servers you're monitoring. You can get it as
part of the Big Brother package on bb4.org.

There is some work underway for a free (i.e. Open Source) client
implementation; Emanuel Dreyfus has been working on a client for
NetBSD, and I think it's designed so it will be easy to make it handle
other Unix-like operating systems.


Regards,
Henrik

list Charles Jones · Sat, 05 Feb 2005 04:32:19 -0700 ·

I'm using the bb client on remote machines, including the cpu test...Hobbit just graphs the load average but doesn't include the ps output like yours does. I'm going to guess it's because the bb clients I am using are 1.9e  :-)

-Charles

▸ quoted from Henrik Størner



Henrik Stoerner wrote:

On Sat, Feb 05, 2005 at 04:16:09AM -0700, Charles Jones wrote:

I like! :-)  I like the cpu test too...is that a test that is available on the hobbit server?

The cpu (and the other client-side tests) show up if you install the
Big Brother client on the servers you're monitoring. You can get it as
part of the Big Brother package on bb4.org.

There is some work underway for a free (i.e. Open Source) client
implementation; Emanuel Dreyfus has been working on a client for
NetBSD, and I think it's designed so it will be easy to make it handle
other Unix-like operating systems.

list Henrik Størner · Sat, 5 Feb 2005 12:52:37 +0100 ·

▸ quoted from Charles Jones

On Sat, Feb 05, 2005 at 04:32:19AM -0700, Charles Jones wrote:

I'm using the bb client on remote machines, including the cpu test...Hobbit just graphs the load average but doesn't include the ps output like yours does. I'm going to guess it's because the bb clients I am using are 1.9e  :-)

You can get that too, if you have "top" installed on the systems.
In your client's etc/bbsys.local, add this:

  TOP="/usr/bin/top"
  TOPARGS="-b -n 1"
  export TOP TOPARGS

This is for "top" on Linux, I think the Solaris parameters need a bit
of tweaking.


Henrik

list Henrik Størner · Sat, 5 Feb 2005 12:54:10 +0100 ·

▸ quoted from Gordon Thiesfeld

On Thu, Feb 03, 2005 at 11:41:58AM -0600, Thiesfeld, Gordon wrote:

Is this something that is going to be implemented into hobbit in the future?
If not, I'd appreciate some help modifying hobbitd_larrd.  The only thing I
can do with C is spell it:-)

Yes, I've added a "temperature" handler to Hobbit's RRD module.

▸ quoted from Gordon Thiesfeld

I'm not using temperature.sh on the client side, but I can format my script
so that the output will match it for consistency.

If you do, then it should pick up the data and graph it automatically.


Henrik

list Olivier Beau · Sun, 6 Feb 2005 09:57:09 +0100 ·

Hi,

Cacti does this a cool way (with php/gd); a demo site :
http://www.bigspring.k12.pa.us/cacti/graph_view.php?action=tree&tree_id=31&leaf_id=408&select_first=true
click on the magnifying glass and then select in the graph what you want to zoom
on...


Olivier


Quoting Henrik Stoerner <user-ce4a2c883f75@xymon.invalid>:

▸ quoted from Charles Jones

On Tue, Feb 01, 2005 at 05:10:38PM -0700, Charles Jones wrote:

Sometimes a large spike can ruin a graph: the graph scales to the height of the spike, so the rest of it is scrunched down to almost a flat line and isn't very readable.  Is it possible to have an option for Hobbit's larrding that defines a maximum threshold the graph will not extend past?

I've worked on this a bit, and so far I've come up with the solution
that you can see on my site - e.g. if you go to
http://www.hswn.dk/hobbit/servers/ , pick the "cpu" display for the
first host, and click the graph to see the 4 periodic graphs.

Direct link:
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=la

At the bottom of the page you can define the lower- and upper-limit
for the graph, and which periodic graph you want to see. E.g enter
"0.5" in the upper-limit, click "View" and you'll get graphs that cut
off spikes at load 0.5.

Something like that you had in mind ?


Henrik

--


Olivier Beau

list Henrik Størner · Sun, 6 Feb 2005 10:04:59 +0100 ·

▸ quoted from Olivier Beau

On Sun, Feb 06, 2005 at 09:57:09AM +0100, user-fe6e0e6a0d05@xymon.invalid wrote:

Hi,

Cacti does this a cool way (with php/gd); a demo site :
http://www.bigspring.k12.pa.us/cacti/graph_view.php?action=tree&tree_id=31&leaf_id=408&select_first=true
click on the magnifying glass and then select in the graph what you want to zoom

Agreed, that *is* cool.

It seems they are also using rrdtool as the back-end, so it shouldn't
be too hard to implement something similar in Hobbit.

For now, I'll leave it "as-is" - but this looks like one of those
"wow" features that are very nice to have when you're trying to
convince someone to use Hobbit :-) So I'll get back to it later,
unless someone else would like to contribute code for it.


Henrik

list Daniel Magnuszewski · Thu, 10 Feb 2005 11:39:37 -0500 ·

I have created this functionality with a Big Brother external script
that integrates Big Brother and Cacti. The script is called BigCactus
and it's available on deadcat. I am very new to Hobbit, and I have just
joined the list (I haven't even installed hobbit yet). With that said,
if BB external scripts work in Hobbit, then this should work with no problem.
If you already have Cacti installed, you can integrate the two, similar
to the bbmrtg mrtg scripts (with the ability to set thresholds,
etc). The only problem currently is that I haven't written many
threshold-checking tests (only for Novell servers so far). Using
this script lets you go right from the hobbit page to the
cacti page that contains the zoom functionality.

One problem I found with the magnifying glass zoom (if/when someone
begins writing this functionality) is that when you zoom in on the
graph, there is a DHTML layering over the image. This causes problems
when trying to print out the graph in IE, but apparently not firefox,
mozilla, etc. 

Daniel Magnuszewski
CCNA
M & T Bank
user-2179d46e0f82@xymon.invalid

user-ce4a2c883f75@xymon.invalid 2/6/2005 4:04:59 AM >>>

▸ quoted from Henrik Størner

On Sun, Feb 06, 2005 at 09:57:09AM +0100, user-fe6e0e6a0d05@xymon.invalid wrote:

Cacti does this a cool way (with php/gd); a demo site :

http://www.bigspring.k12.pa.us/cacti/graph_view.php?action=tree&tree_id=31&leaf_id=408&select_first=true

click on the magnifying glass and then select in the graph what you
want to zoom

It seems they are also using rrdtool as the back-end, so it shouldn't
be too hard to implement something similar in Hobbit.

For now, I'll leave it "as-is" - but this looks like one of those
"wow" features that are very nice to have when you're trying to
convince someone to use Hobbit :-) So I'll get back to it later,
unless someone else would like to contribute code for it.

list Bruce Lysik · Fri, 25 Feb 2005 17:24:01 -0800 ·

Hey,

Just wanted to say I started using the custom graphing feature.  Manager is very impressed.  Thanks Henrik.

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer

list Craig Boyce · Tue, 21 Jun 2005 14:05:21 +1200 ·

Hi,
 
I have moved from BB to hobbit and am having problems getting the memory
data graphed from all the Windows clients. There is no graph being
generated for memory, while CPU and disk are graphed.
I have also added Henrik's Citrix script, and these stats are not being
passed through.

I have checked the LARRDS and GRAPHS sections of hobbitserver.cfg and the
entries to graph these items are enabled by default.
I have enabled the debug options for larrdstatus and larrddata. In
checking the log files I can see no entries for the client memory or the
citrix stats.

Any ideas?
 
Thanks
 
Craig Boyce

#####################################################################################
Disclaimer:
The information in this electronic mail message is confidential and may be legally privileged.
It is intended solely for the Addressee.Access to this internet electronic mail message by anyone else is unauthorised.
If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be
taken in reliance on it is prohibited and may be unlawful.
If you have received this message in error please notify us immediately.

Rodney District Council accepts no responsibility for any effects this email message or attachments has on the recipient 
network or computer system.

#####################################################################################

list Andy France · Tue, 21 Jun 2005 14:24:55 +1200 ·


Hi Craig,

"Craig Boyce" wrote on 21/06/2005 14:05:21:

▸ quoted from Craig Boyce

Hi,

I have moved from BB to hobbit and am having problems getting the memory
data graphed from all the Windows clients. There is no graph being
generated for memory, while CPU and disk are graphed.
I have also added Henrik's Citrix script, and these stats are not being
passed through.

I have checked the LARRDS and GRAPHS sections of hobbitserver.cfg and the
entries to graph these items are enabled by default.
I have enabled the debug options for larrdstatus and larrddata. In
checking the log files I can see no entries for the client memory or the
citrix stats.

Any ideas?

Thanks

Craig  Boyce

CPU (load) and DISK (percent) are provided by the base NT client.  Do you
already have bb-memory script from Deadcat installed on each to pass the
memory and netstat data?  I'm guessing you had the graphs under BB since
you are asking where they have gone :-)

Are you missing the whole "memory" column, or is it just the graph that is
broken?  Is it the same for "citrix"?  If you have the columns but
broken/missing graphs try deleting any existing rrd files for these tests
on the hobbit server.
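
A cautious variant of that suggestion is to move the old files aside instead of deleting them, so they can be restored if the graph does not recover. This is a sketch only: the directory that holds a host's RRD files depends on the installation and is passed in as an argument, and the helper name `set_aside_rrds` is made up for illustration.

```shell
#!/bin/sh
# Sketch: move the RRD files of one test aside rather than deleting them.
# Assumptions: "$1" is the directory holding one host's RRD files on the
# Hobbit server, and "$2" is the test name, e.g. "memory" or "citrix".
set_aside_rrds() {
    rrd_dir="$1"
    test_name="$2"
    for f in "$rrd_dir"/"$test_name"*.rrd; do
        [ -f "$f" ] || continue      # the pattern matched nothing
        mv "$f" "$f.bak" && echo "moved $f"
    done
}
```

Files for other tests in the same directory are left untouched, since only names starting with the given test name are matched.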

I have both of these working OK, so feel free to drop me a note.  Nice to
see another kiwi on the list!

Andy.

#####################################################################################

This email is intended for the person to whom it is addressed
only. If you are not the intended recipient, do not read, copy
or use the contents in any way. The opinions expressed may not
necessarily reflect those of ZESPRI Group of Companies ('ZESPRI').

While every effort has been made to verify the information
contained herein, ZESPRI does not make any representations 
as to the accuracy of the information or to the performance
of any data, information or the products mentioned herein.
ZESPRI will not accept liability for any losses, damage or
consequence, however, resulting directly or indirectly from
the use of this e-mail/attachments.
#####################################################################################

list Fabio Flores · Tue, 21 Jun 2005 09:46:34 +0100 ·

Hi All,

I'm trying once again to get my custom graphs to work.

The rrd files are OK (I've tested them separately), but when I try to merge
them into one graph, it won't work. Here is what I have in my hobbitgraph.cfg:

[jms]
        FNPATTERN jms(.*).rrd
        TITLE JMS Queues
        YAXIS Num of Messages
        DEF:IN@RRDIDX@=@RRDFN@:IN:AVERAGE
        LINE2:IN@RRDIDX@#@COLOR@:@RRDPARAM@
        GPRINT:IN@RRDIDX@:LAST: \: %5.1lf (cur)
        GPRINT:IN@RRDIDX@:MAX: \: %5.1lf (max)
        GPRINT:IN@RRDIDX@:MIN: \: %5.1lf (min)
        GPRINT:IN@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

I might be missing something, but I've checked the mailing list archive and
the rrd documentation and couldn't figure it out.

The point is that I have several jms,foo.rrd files, each with different
DS: definitions; for a start I'm using only the "IN" definition.
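
One thing that may be worth checking here is whether every file matched by FNPATTERN really contains a DS named IN, since the DEF line refers to that name for each file. A minimal sketch, assuming `rrdtool info` prints one `ds[NAME].type = ...` line per data source; the helper names `ds_names` and `has_ds` are made up for illustration.

```shell
#!/bin/sh
# Sketch: list the DS names found in `rrdtool info` output read from stdin.
ds_names() {
    sed -n 's/^ds\[\([^]]*\)\]\.type = .*/\1/p'
}

# Succeed if the `rrdtool info` output on stdin has a DS named "$1".
has_ds() {
    ds_names | grep -qxF -- "$1"
}

# Example (needs rrdtool and the real files, so it is left commented out):
#   for f in jms*.rrd; do
#       rrdtool info "$f" | has_ds IN || echo "$f has no DS named IN"
#   done
```

Any file reported by the loop would be a candidate for exclusion from the pattern or for a separate graph definition.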


Hope someone can help.

Thanks.

list Didier Degey · Thu, 23 Jun 2005 09:03:52 +0200 ·

Hello Andy, Craig

I have exactly the same problem.
I have tried the solution that Andy proposed, but that did not change
anything.
I still have the memory graph missing (and only the graph).

Craig, did the solution change anything for you?

Didier.

▸ quoted from Andy France


-----Original Message-----
From: Andy France [mailto:user-ee2a9e4eaf57@xymon.invalid]
Sent: Tuesday, 21 June 2005 04:25
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] NT Client trend graphs


Hi Craig,

"Craig Boyce" wrote on 21/06/2005 14:05:21:

Hi,

I have moved from BB to hobbit and am having problems getting the memory
data graphed from all the Windows clients. There is no graph being
generated for memory, while CPU and disk are graphed.
I have also added Henrik's Citrix script, and these stats are not being
passed through.

I have checked the LARRDS and GRAPHS sections of hobbitserver.cfg and the
entries to graph these items are enabled by default.
I have enabled the debug options for larrdstatus and larrddata. In
checking the log files I can see no entries for the client memory or the
citrix stats.

Any ideas?

Thanks

Craig  Boyce

CPU (load) and DISK (percent) are provided by the base NT client.  Do you
already have bb-memory script from Deadcat installed on each to pass the
memory and netstat data?  I'm guessing you had the graphs under BB since you
are asking where they have gone :-)

Are you missing the whole "memory" column, or is it just the graph that is
broken?  Is it the same for "citrix"?  If you have the columns but
broken/missing graphs try deleting any existing rrd files for these tests on
the hobbit server.

I have both of these working OK, so feel free to drop me a note.  Nice to
see another kiwi on the list!

Andy.


list Craig Boyce · Thu, 23 Jun 2005 21:11:02 +1200 ·

Hi Didier,

My previous BB install was building the memory graphs from the CPU stats returned from the BB client and this was not working with Hobbit. I installed the bb-memory script from deadcat and set the saved logs location on the client and it started working straight away.


Craig

▸ quoted from Didier Degey

-----Original Message-----
From: Didier Degey [mailto:user-fe2d30acf6f7@xymon.invalid] 
Sent: Thursday, 23 June 2005 7:04 p.m.
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] NT Client trend graphs

Hello Andy, Craig

I have exactly the same problem.
I have tried the solution that Andy proposed, but that did not change
anything.
I still have the memory graph missing (and only the graph).

Craig, did the solution change anything for you?

Didier.

-----Original Message-----
From: Andy France [mailto:user-ee2a9e4eaf57@xymon.invalid]
Sent: Tuesday, 21 June 2005 04:25
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] NT Client trend graphs

Hi Craig,

"Craig Boyce" wrote on 21/06/2005 14:05:21:

Hi,

I have moved from BB to hobbit and am having problems getting the memory
data graphed from all the Windows clients. There is no graph being
generated for memory, while CPU and disk are graphed.
I have also added Henrik's Citrix script, and these stats are not being
passed through.

I have checked the LARRDS and GRAPHS sections of hobbitserver.cfg and the
entries to graph these items are enabled by default.
I have enabled the debug options for larrdstatus and larrddata. In
checking the log files I can see no entries for the client memory or the
citrix stats.

Any ideas?

Thanks

Craig  Boyce

CPU (load) and DISK (percent) are provided by the base NT client. Do you already have bb-memory script from Deadcat installed on each to pass the memory and netstat data? I'm guessing you had the graphs under BB since you are asking where they have gone :-)

Are you missing the whole "memory" column, or is it just the graph that is broken? Is it the same for "citrix"? If you have the columns but broken/missing graphs try deleting any existing rrd files for these tests on the hobbit server.

I have both of these working OK, so feel free to drop me a note. Nice to see another kiwi on the list!

Andy.


list Didier Degey · Fri, 24 Jun 2005 09:18:14 +0200 ·

Hello Craig,
I was using BBCheckMemory.vbs ... I changed to bb-memory and it works
well too.

Thanks
Didier.

▸ quoted from Craig Boyce


-----Original Message-----
From: Craig Boyce [mailto:user-e7830d35cd5f@xymon.invalid]
Sent: Thursday, 23 June 2005 11:11
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] NT Client trend graphs

Hi Didier,

My previous BB install was building the memory graphs from the CPU stats
returned from the BB client and this was not working with Hobbit. I
installed the bb-memory script from deadcat and set the saved logs location
on the client and it started working straight away.


Craig

-----Original Message-----
From: Didier Degey [mailto:user-fe2d30acf6f7@xymon.invalid]
Sent: Thursday, 23 June 2005 7:04 p.m.
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] NT Client trend graphs


Hello Andy, Craig

I have exactly the same problem.
I have tried the solution that Andy proposed, but that did not change
anything.
I still have the memory graph missing (and only the graph).

Craig, did the solution change anything for you?

Didier.

-----Original Message-----
From: Andy France [mailto:user-ee2a9e4eaf57@xymon.invalid]
Sent: Tuesday, 21 June 2005 04:25
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] NT Client trend graphs


Hi Craig,

"Craig Boyce" wrote on 21/06/2005 14:05:21:

Hi,

I have moved from BB to Hobbit and am having problems getting the
memory data graphed from all the Windows clients. No graph is
being generated for memory, while CPU and disk are graphed.
I have also added Henrik's Citrix script, and these stats are not
being passed through.

I have checked the LARRDS and GRAPHS sections of hobbitserver.cfg, and
the entries to graph these items are enabled by default.
I have enabled the debug options for larrdstatus and larrddata. In
checking the log files I can see no entries for the client memory or
the Citrix stats.

Any ideas?

Thanks

Craig Boyce

CPU (load) and DISK (percent) are provided by the base NT client.  Do you
already have bb-memory script from Deadcat installed on each to pass the
memory and netstat data?  I'm guessing you had the graphs under BB since you
are asking where they have gone :-)

Are you missing the whole "memory" column, or is it just the graph that is
broken?  Is it the same for "citrix"?  If you have the columns but
broken/missing graphs try deleting any existing rrd files for these tests on
the hobbit server.

I have both of these working OK, so feel free to drop me a note.  Nice to
see another kiwi on the list!

Andy.


list Thomas Kern · Thu, 18 Jan 2007 10:19:26 -0500 ·

I have two non-linux systems that I have a specialized client on. On one
of these systems, I have a special test for the CPU utilization of
logical partitions. I used the NCV method of getting the data
(LPARname:CPUvalue) into the RRD files supported by Hobbit. I have been
able to add a graph to the status page for the LPAR column with a graph
definition that lists each of the known LPARs in this system and the
total value that is also sent. This has been working for several weeks.
Now I am trying to add the second system. It has a different set of LPAR
names. These LPAR names must be unique across our organization. 

I asked our Hobbit admin to update the hobbitgraph.cfg with a modified
[lpar] definition that included the second system's LPAR names in the
hope that RRD would simply say that those entries from the other system
were 'Not Available' or simply not display anything for them. I had
hoped that I could use ONE graph definition for the [lpar] column no
matter which system was supplying the data. Both graphs had an error
about one of the OTHER's DS entries not being present. Is it possible to
have two graph definitions for the same column, such as [vm1.lpar] and
[vmhost.lpar] versus a single definition [lpar]?

Here are copies of the two status pages:

Thu Jan 18 10:05:06 EST 2007 General Processor LPAR utilization within
thresholds 
LEGACY:1.0
ZOSEPROD:26.7
ZOSPROD:2.4
Total:30.1
 
Thu Jan 18 10:05:13 EST 2007 IFL Processor LPAR utilization within
thresholds 
IFLPROD:1.3
Total:1.3

Here is the combined graph definition:

[lpar]
	TITLE z890 LPAR CPU Utilization
	YAXIS Percent
	FNPATTERN lpar.rrd
	-u 100
	-l 0
	DEF:LEGACY=lpar.rrd:LEGACY:AVERAGE
	DEF:IFLPROD=lpar.rrd:IFLPROD:AVERAGE
	DEF:ZOSPROD=lpar.rrd:ZOSPROD:AVERAGE
	DEF:IFLTEST=lpar.rrd:IFLTEST:AVERAGE
	DEF:ZOSEPROD=lpar.rrd:ZOSEPROD:AVERAGE
	DEF:Total=lpar.rrd:Total:AVERAGE
	LINE2:LEGACY#00cc00:Legacy
	LINE2:IFLPROD#00cc00:IFL-Production
	LINE2:ZOSPROD#ff0000:ZOSProd
	LINE2:IFLTEST#00cc00:IFL-Test
	LINE2:ZOSEPROD#0000ff:ZOSEProd
	LINE2:Total#ff00ff:Total
	COMMENT: \n
	GPRINT:LEGACY:LAST:LEGACY   \: %5.1lf (cur)
	GPRINT:LEGACY:MAX: \: %5.1lf (max)
	GPRINT:LEGACY:MIN: \: %5.1lf (min)
	GPRINT:LEGACY:AVERAGE: \: %5.1lf (avg) \n
	GPRINT:IFLPROD:LAST:IFLPROD  \: %5.1lf (cur)
	GPRINT:IFLPROD:MAX: \: %5.1lf (max)
	GPRINT:IFLPROD:MIN: \: %5.1lf (min)
	GPRINT:IFLPROD:AVERAGE: \: %5.1lf (avg) \n
	GPRINT:ZOSPROD:LAST:ZOSPROD  \: %5.1lf (cur)
	GPRINT:ZOSPROD:MAX: \: %5.1lf (max)
	GPRINT:ZOSPROD:MIN: \: %5.1lf (min)
	GPRINT:ZOSPROD:AVERAGE: \: %5.1lf (avg) \n
	GPRINT:IFLTEST:LAST:IFLTEST  \: %5.1lf (cur)
	GPRINT:IFLTEST:MAX: \: %5.1lf (max)
	GPRINT:IFLTEST:MIN: \: %5.1lf (min)
	GPRINT:IFLTEST:AVERAGE: \: %5.1lf (avg) \n
	GPRINT:ZOSEPROD:LAST:ZOSEPROD \: %5.1lf (cur)
	GPRINT:ZOSEPROD:MAX: \: %5.1lf (max)
	GPRINT:ZOSEPROD:MIN: \: %5.1lf (min)
	GPRINT:ZOSEPROD:AVERAGE: \: %5.1lf (avg) \n
	GPRINT:Total:LAST:Total    \: %5.1lf (cur)
	GPRINT:Total:MAX: \: %5.1lf (max)
	GPRINT:Total:MIN: \: %5.1lf (min)
	GPRINT:Total:AVERAGE: \: %5.1lf (avg) \n


/Thomas Kern
/XXX-XXX-XXXX
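
A combined definition fails because every DEF line must name a data source that exists in the RRD file being read. One possible workaround is to generate a separate definition per system that lists only that system's LPARs. The Python sketch below produces such definitions. The section names `lpar-gp` and `lpar-ifl`, the titles and the colour palette are hypothetical, and the thread does not confirm how Hobbit would pick a different definition per host (reporting each system under its own column name is one guess), so treat this as an illustration of the layout only.

```python
# Sketch: build one hobbitgraph.cfg-style section per system so that each
# definition references only the DS names present in that system's RRD file.
# Section names, titles and colours are illustrative assumptions.

PALETTE = ["00cc00", "ff0000", "0000ff", "ff00ff", "00cccc", "cc9900"]


def graph_definition(section, title, ds_names, rrd_file="lpar.rrd"):
    """Return the text of one graph definition for the given DS names."""
    lines = [
        f"[{section}]",
        f"\tTITLE {title}",
        "\tYAXIS Percent",
        f"\tFNPATTERN {rrd_file}",
        "\t-u 100",
        "\t-l 0",
    ]
    for name in ds_names:
        lines.append(f"\tDEF:{name}={rrd_file}:{name}:AVERAGE")
    for index, name in enumerate(ds_names):
        # Cycle through the palette so each line gets its own colour.
        colour = PALETTE[index % len(PALETTE)]
        lines.append(f"\tLINE2:{name}#{colour}:{name}")
    lines.append("\tCOMMENT: \\n")
    for name in ds_names:
        label = name.ljust(8)
        lines.append(f"\tGPRINT:{name}:LAST:{label} \\: %5.1lf (cur)")
        lines.append(f"\tGPRINT:{name}:MAX: \\: %5.1lf (max)")
        lines.append(f"\tGPRINT:{name}:MIN: \\: %5.1lf (min)")
        lines.append(f"\tGPRINT:{name}:AVERAGE: \\: %5.1lf (avg) \\n")
    return "\n".join(lines) + "\n"


general = graph_definition(
    "lpar-gp", "General Processor LPAR CPU Utilization",
    ["LEGACY", "ZOSEPROD", "ZOSPROD", "Total"])
ifl = graph_definition(
    "lpar-ifl", "IFL Processor LPAR CPU Utilization",
    ["IFLPROD", "Total"])
print(general)
print(ifl)
```

Generating the GPRINT rows in a loop also avoids the copy-and-paste slips that are easy to make when each LPAR block is typed by hand.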

list Jeremy Ruffer · Fri, 1 Feb 2008 11:15:41 -0000 ·

I'm obviously missing something here.

I have an external test that creates columns Humidity and Temperature as
follows:

Fri Feb 1 11:01:39 GMT 2008

Humidity : 21.0

and

Fri Feb 1 11:01:39 GMT 2008

Temperature : 24.7

I have added the columns in hobbitserver.cfg

TEST2RRD="cpu=la,disk,inode,qtree,memory,$PINGCOLUMN=tcp,http=tcp,dns=tc
p,dig=tcp,time=ntpstat,vmstat,iostat,netstat,temperature,apache,bind,sen
dmail,mailq,nmailq=mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy
,hobbitd,files,procs=processes,ports,clock,lines,Temperature=ncv,Humidit
y=ncv"

GRAPHS="la,disk,inode,qtree,files,processes,memory,users,vmstat,iostat,t
cp.http,tcp,ncv,netstat,ifstat,mrtg::1,ports,temperature,ntpstat,apache,
bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobb
itd,clock,lines,Temperature,Humidity"

NCV_Temperature="Temperature:GAUGE"

NCV_Humidity="Humidity:GAUGE"

I did wonder if it was the decimal point so I took it out but that
didn't make any difference.

The rrd files aren't being created.

What have I missed?

Jeremy


list James Wade · Fri, 1 Feb 2008 09:03:04 -0600 ·

Jeremy,

Try adding another column to one of the tests:

Humidity : 21.0
Test : 19.0

I had a problem where I was feeding in four values and
the fourth one would never appear in the rrd file.
I figured out that it needed a newline when I sent the status
message:

`cat $line`
"

I just put the closing quote on the line below the cat. When I had it
on the same line, `cat $line`", it didn't read in the fourth value.

I'm just wondering whether, because you only have one value, the
newline isn't sent and so it's not creating an rrd file. By
adding a value below Humidity or Test, you could check for it.

The other thing I did when I created my test was to restart
the Hobbit server.

 
James
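
The trailing-newline point can be illustrated with a small Python sketch that assembles a status message whose last value line is always terminated. The host name, column name, colour and timestamp are hypothetical examples, and this only builds the text; it does not claim to reproduce how any particular client script sends it.

```python
# Sketch: assemble an NCV-style status message that always ends in a newline.
# The host name, column name and colour below are hypothetical.


def build_status(host, column, colour, timestamp, values):
    """Return 'status HOST.COLUMN COLOUR TIMESTAMP' followed by 'Name : Value' lines."""
    # By convention, dots in the host name are written as commas in HOST.COLUMN.
    target = f"{host.replace('.', ',')}.{column}"
    header = f"status {target} {colour} {timestamp}"
    body = [f"{name} : {value}" for name, value in values.items()]
    # Appending "\n" after the join guarantees the last value line is terminated.
    return "\n".join([header, ""] + body) + "\n"


message = build_status(
    "sensor1.example.com", "Humidity", "green",
    "Fri Feb 1 11:01:39 GMT 2008", {"Humidity": 21.0})
print(repr(message))
```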


list dOCtoR MADneSs · Wed, 25 Feb 2009 14:40:36 +0100 ·

Hi,

I created some tests, and I get graphs from them using splitncv.
Almost everything works (I get values, they are stored in separate files, and the graphs are generated), but I have one small issue.
On the trends page, I get my custom graphs, followed by 2 useless lines (bearing the names of my custom graphs). I have attached a screenshot, which describes it better.
I'd like to remove those useless lines, that's all. I tried adding, removing and modifying the GRAPHS section in hobbitserver.cfg and the TRENDS definition in bb-hosts, without any result (the number of useless lines does not change).

Any help is very welcome.

list Ryan Novosielski · Mon, 24 Sep 2012 16:55:09 -0400 ·


Hi,

I searched the web a bit and found the Xymon docs and there was
mention of two ways of doing custom graphs: NCV and trends messages.
NCV doesn't really suit me as my scripts have too much formatting in
them and would require a rewrite to have the separate lines for values
as is needed by NCV. I happened to check the docs on my 4.2.3 server
however and I see that the part about sending trends messages is
missing. So, I've got two questions:

1) Was sending trends messages added in Xymon 4.3.x and not available
in 4.2.3?

2) At the end of this:
http://www.xymon.com/xymon/help/howtograph.html
...there's a description of what to put in a trends message. There is
not, however, any description of what one does with it. My expectation
is that you do the following:

xymon <display> "`cat trends.file`"

...where the trends file is like the one described in the manual (e.g.
it starts with "data hostname.test"). Essentially, similar to sending a
status message (but sending a data message instead). If this is the
case, it might be helpful to have a note there in the manual, or at
least a reference to the part of the manual I'm about to go hunt for
that I assume mentions data messages.

-- 
---- _  _ _  _ ___  _  _  _
|Y#| |  | |\/| |  \ |\ |  | |Ryan Novosielski - Sr. Systems Programmer
|$&| |__| |  | |__/ | \| _| |user-ae4522577e16@xymon.invalid - 973/972.0922 (2-0922)
\__/ Univ. of Med. and Dent.|IST/EI-Academic Svcs. - ADMC 450, Newark

list Wim Nelis · Tue, 25 Sep 2012 06:41:06 +0200 ·

Hello,

▸ quoted from Ryan Novosielski

I searched the web a bit and found the Xymon docs and there was
mention of two ways of doing custom graphs: NCV and trends messages.
NCV doesn't really suit me as my scripts have too much formatting in
them and would require a rewrite to have the separate lines for values
as is needed by NCV.

Perhaps a rewrite is not needed. If the values to be entered in the graph are still available once the status message is built, you could add an HTML comment section to the message containing the NCVs, formatted as Xymon expects them. Additionally, you could replace the ':' and the '=' in the original status message with the corresponding HTML character entities. This will prevent the NCV module from extracting values from the original message.

HTH,
  Wim Nelis.
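
A minimal Python sketch of this suggestion follows. It escapes ':' and '=' in the human-readable report and appends the graph values as NCV lines inside an HTML comment. The report text and value names are made up, and whether the NCV module reads values from inside a comment is taken from the suggestion above rather than verified here.

```python
# Sketch of the suggestion above: neutralise ':' and '=' in the human-readable
# text and append the graph values as NCV lines inside an HTML comment.
# The report text and DS names are hypothetical.


def escape_delimiters(text):
    """Replace ':' and '=' with numeric HTML entities so they still display as-is."""
    return text.replace(":", "&#58;").replace("=", "&#61;")


def with_hidden_ncv(report, values):
    """Return the escaped report followed by an HTML comment holding NCV lines."""
    ncv_lines = [f"{name} : {value}" for name, value in values.items()]
    return escape_delimiters(report) + "\n<!--\n" + "\n".join(ncv_lines) + "\n-->\n"


report = "Backup finished at 02:15, errors=0"
escaped = escape_delimiters(report)
status_body = with_hidden_ncv(report, {"errors": 0, "durationmin": 135})
print(status_body)
```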



list Ryan Novosielski · Tue, 25 Sep 2012 02:11:13 -0400 ·

▸ quoted from Wim Nelis


On 09/25/2012 12:41 AM, Nelis, Wim wrote:

Hello,

I searched the web a bit and found the Xymon docs and there was mention of two ways of doing custom graphs: NCV and trends messages. NCV doesn't really suit me as my scripts have too much formatting in them and would require a rewrite to have the separate lines for values as is needed by NCV.

Perhaps a rewrite is not needed. If the values to be entered in the graph are still available once the status message is built, you could add an HTML comment section to the message containing the NCVs, formatted as Xymon expects them. Additionally, you could replace the ':' and the '=' in the original status message with the corresponding HTML character entities. This will prevent the NCV module from extracting values from the original message.

That is actually a pretty interesting idea, though it ends up being
about the same amount of work as using a trends message when it comes
down to it (except I guess it cuts down on the number of messages). I
guess the NCV module will accept either = or : as a delimiter?


list Ryan Novosielski · Tue, 25 Sep 2012 02:15:03 -0400 ·

▸ quoted from Ryan Novosielski


On 09/24/2012 04:55 PM, Ryan Novosielski wrote:

Hi,

I searched the web a bit and found the Xymon docs and there was 
mention of two ways of doing custom graphs: NCV and trends
messages. NCV doesn't really suit me as my scripts have too much
formatting in them and would require a rewrite to have the separate
lines for values as is needed by NCV. I happened to check the docs
on my 4.2.3 server however and I see that the part about sending
trends messages is missing. So, I've got two questions:

1) Was sending trends messages added in Xymon 4.3.x and not
available in 4.2.3?

The answer to this question I found is "no, the feature was indeed
available in 4.2.3 but just missing from that section of the docs."

▸ quoted from Ryan Novosielski

2) At the end of this: 
http://www.xymon.com/xymon/help/howtograph.html ...there's a
description of what to put in a trends message. There is not,
however, any description of what one does with it. My expectation 
is that you do the following:

xymon <display> "`cat trends.file`"

...where the trends file is like the one described in the manual
(e.g. it starts with "data hostname.test"). Essentially, similar to
sending a status message (but sending a data message instead). If
this is the case, it might be helpful to have a note there in the
manual, or at least a reference to the part of the manual I'm about
to go hunt for that I assume mentions data messages.

And this was correct as well.
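
For readers who want to script this, here is a hedged Python sketch that builds such a data message and the argument list for the xymon client. The body layout (an `[file.rrd]` line followed by "DS-definition value" lines under a `trends` test name) is an assumption based on a reading of the howtograph page and should be checked against it; the host name, RRD file name and display address are invented for the example.

```python
# Sketch: build a "data" message for the trends channel and prepare the
# command that hands it to the xymon client as ONE argument.
# The section layout is an assumption to verify against howtograph.html;
# the host, RRD file and display address are made up.


def build_trends_message(host, rrd_file, datasets):
    """datasets maps a DS definition such as 'DS:temperature:GAUGE:600:0:U' to a value."""
    lines = [f"data {host.replace('.', ',')}.trends", f"[{rrd_file}]"]
    lines += [f"{definition} {value}" for definition, value in datasets.items()]
    return "\n".join(lines) + "\n"


def xymon_command(display, message, xymon_binary="xymon"):
    """Return an argv list; the whole message travels as a single argument."""
    return [xymon_binary, display, message]


message = build_trends_message(
    "weather.example.com", "weather.rrd",
    {"DS:temperature:GAUGE:600:0:U": 24.7})
argv = xymon_command("127.0.0.1", message)
print(argv)
# To send it for real: subprocess.run(argv, check=True)
```

Passing the message as one list element avoids the shell word-splitting that an unquoted `cat` substitution would cause.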

A follow-up question relates to the apparent requirement that you add
the RRD/test name to both the GRAPHS and the TEST2RRD variables in
hobbitserver.cfg. Is there any way to add the graph to just the test
page and not the trends page? Why do both of these variables exist if
it seems like you need both of them for it to work?


list Chris Morris · Tue, 25 Sep 2012 09:16:47 +0100 ·

The comments in the "xymonserver.cfg" file are quite clear:

TEST2RRD
# This is also used by the svcstatus.cgi script to determine if the detailed
# status view of a test should include a graph.

GRAPHS
# This defines which RRD files to include on the "trends" column webpage,
# and the order in which they appear.
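
To make the division of labour concrete, the Python sketch below reads the two settings the way the quoted comments describe them: TEST2RRD maps a status column to an RRD handler, and GRAPHS is an ordered list for the trends page. It is an illustration only, not Xymon's own parser, and the shortened setting values are examples.

```python
# Sketch: interpret TEST2RRD and GRAPHS as described by the quoted comments.
# This is an illustration, not Xymon's own parser.


def parse_test2rrd(value):
    """Map each status column to its RRD handler ('cpu=la' -> {'cpu': 'la'})."""
    mapping = {}
    for item in value.split(","):
        column, _, handler = item.partition("=")
        # A bare name such as 'disk' is handled by the handler of the same name.
        mapping[column] = handler or column
    return mapping


def parse_graphs(value):
    """Return the graph names in the order they would appear on the trends page."""
    return value.split(",")


test2rrd = parse_test2rrd("cpu=la,disk,memory,Temperature=ncv,Humidity=ncv")
graphs = parse_graphs("la,disk,memory,ncv")

for column in ("cpu", "Temperature", "users"):
    print(column, "-> has an RRD handler:", column in test2rrd)
print("trends page order:", graphs)
```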


list Ryan Novosielski · Tue, 25 Sep 2012 04:33:33 -0400 ·


Adding it to the GRAPHS variable didn't seem to do anything if it
wasn't also present in the TEST2RRD variable, and vice versa. I will
double-check that, but it didn't seem to be possible to do one without
the other.


list Alias · Thu, 11 Apr 2013 23:27:26 +1000 ·

Our weakness with Xymon is forecasting disk trends. With the right
thresholds I can be proactive in managing disk capacity; however, it's not
feasible to check the past 576 days for hundreds of servers. Metric
Reporting results are blank when selecting multiple hosts and disks.
It does work for CPU, but it compiles all results and servers into
one graph. I would like to see the disk trends of 200 hosts displayed in
200 graphs on one page.

 
Has anyone got any suggestions or "been there, done that" ?

 
regards

list Paul Root · Tue, 8 Jul 2014 16:20:14 +0000 ·

Hi,
This is one of my least favorite things about Xymon. It really shouldn't be this hard.

I have a script that pulls the %mem of certain processes on various machines. Right now, I have 3 separate groups of machines with different processes being watched. The purpose is to track memory leaks.

Previously, the script just put out a ps -eo 'pid,%mem,cmd' for the desired processes. Then others wanted a graph, so I've reformatted the output to be just a keyword (either the process name or a significant field of the arguments of the command).

I get something that looks like this now:
Process Memory Usage


SA: 6.5
MBIM: 2.4
OI: 4.1
 Process' alert levels are defined in /usr/local/etc/procmem.cfg
When levels are exceeded, restarting of the process to free memory is recommended

  xymProcMem,v 1.7 2014/07/07 20:37:17 ptroot Exp ptroot $


In my first iteration, I used <br> to separate the lines. That was a mistake: the HTML source showed them all on one line. I've changed to \n, which I would expect to produce the proper output.

However, looking at the rrd dump, I only get the first one:

<!-- Round Robin Database Dump --><rrd> <version> 0003 </version>
        <step> 300 </step> <!-- Seconds -->
        <lastupdate> 1404828303 </lastupdate> <!-- 2014-07-08 09:05:03 CDT -->

        <ds>
                <name> SA </name>
                <type> GAUGE </type>
                <minimal_heartbeat> 600 </minimal_heartbeat>
                <min> NaN </min>
                <max> NaN </max>

                <!-- PDP Status -->
                <last_ds> 6.5 </last_ds>
                <value> NaN </value>
                <unknown_sec> 3 </unknown_sec>
        </ds>

        <ds>
                <name> xymProcMemv16201407 </name>
                <type> DERIVE </type>
                <minimal_heartbeat> 600 </minimal_heartbeat>
                <min> NaN </min>
                <max> NaN </max>

                <!-- PDP Status -->
                <last_ds> 50 </last_ds>
                <value> NaN </value>
                <unknown_sec> 3 </unknown_sec>
        </ds>


My TEST2RRD looks like:
TEST2RRD="cpu=la,disk,inode,...,nfmsgw=ncv,hpnasnapshot=ncv,ProcMemory=ncv"

And the NCV definitions:
NCV_hpnasnapshot="TotaldevicesinDB:GAUGE,Totalactivedevicesi:GAUGE,Totalinactivedevice:GAUGE,Totaldeviceswithout:GAUGE,Pctactive:GAUGE,Pctinactive:GAUGE,Pctnodriver:GAUGE,Activeattemptedsucc:GAUGE,Activeattemptedunsu:GAUGE,Activebutnotattempt:GAUGE,PctAttemptedsuccess:GAUGE,PctAttemptedunsucce:GAUGE,PctActivebutnotatte:GAUGE"
NCV_ProcMemory="SA:GAUGE,MBIM:GAUGE,OI:GAUGE,MNS1:GAUGE,NMS1:GAUGE,NMS2:GAUGE,NMS11:GAUGE,NMS12:GAUGE,NMS13:GAUGE,NMS14:GAUGE,NMS15:GAUGE,TCMgmtEngine:GAUGE,TCTFTP:GAUGE,TCSyslog:GAUGE,SWIM:GAUGE"


The hpnasnapshot graph works. I've been using it as an example.

Obviously, without all the data in the rrd, there is no point in trying to get a graph to show.

Any ideas from anyone?

Thanks,
Paul.


Paul Root
Lead Engineer
CenturyLink Network Reliability Operations Center

600 Stinson Blvd, N.E.
Flr 2N
Minneapolis, MN 55413
Direct: (651)312-5207
user-76fdb6883669@xymon.invalid
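
The stray `xymProcMemv16201407` data source in the dump suggests that the RCS id line at the bottom of the status is being read as a name/value pair, because its timestamp contains colons. The Python sketch below approximates that behaviour. The rules it uses (split at the first ':' or '=', keep only letters and digits in the name, truncate to the 19-character limit RRD places on data-source names) are inferred from the dump and the NCV_* settings shown in this message, not taken from the Xymon source.

```python
# Sketch: approximate how an NCV-style line becomes an RRD data-source name.
# The rules are inferred from the dump and settings in this thread, not from
# the Xymon source code.
import re

MAX_DS_NAME = 19  # RRD data-source names are limited to 19 characters

# Name is everything before the first ':' or '='; value is the number after it.
PAIR = re.compile(r"([^:=]+)[:=]\s*([-+]?[0-9.]+)")


def ncv_pairs(status_text):
    """Yield (ds_name, raw_value) for every line that looks like 'name: number'."""
    for line in status_text.splitlines():
        match = PAIR.match(line)
        if match is None:
            continue
        name = re.sub(r"[^A-Za-z0-9]", "", match.group(1))[:MAX_DS_NAME]
        if name:
            yield name, match.group(2)


status = """SA: 6.5
MBIM: 2.4
OI: 4.1
 Process' alert levels are defined in /usr/local/etc/procmem.cfg
  xymProcMem,v 1.7 2014/07/07 20:37:17 ptroot Exp ptroot $
"""
pairs = list(ncv_pairs(status))
for name, value in pairs:
    print(name, value)
```

Under these assumed rules the id line yields a fourth data source whose name changes with every script revision. Dropping that line from the status text, or escaping its colons, would keep it out of the RRD.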

list Paul Root · Tue, 8 Jul 2014 17:28:43 +0000 ·

I decided to delete all the rrd files for this test, and after the next (hourly) run the data files are now filled out correctly.

On to the graph...

list Jeremy Laidman · Wed, 9 Jul 2014 13:23:32 +1000 ·

On 9 July 2014 02:20, Root, Paul T <user-76fdb6883669@xymon.invalid> wrote:

                 I get something that looks like this now:

Your status text looks OK to me, as do your configuration settings.

   xymProcMem,v 1.7 2014/07/07 20:37:17 ptroot Exp ptroot $

Except for this.  You realise that this is being interpreted as an NCV
line?  That's how you got your second DS.

▸ quoted from Paul Root

                 My first iteration, I used <br> to separate the lines.
That was a mistake, the source of the html showed them all on one line.

But it also created the second DS called "xymProcMemv16201407" (from the
version string, with non-alphanumeric characters removed and truncated to 19 chars).
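
For illustration, here is a minimal Python sketch of the behaviour described above. It is an approximation, not Xymon's actual NCV code: the function name `ncv_pairs` and the regular expression are assumptions made for this example. It splits each line at the first colon or equals sign, strips non-alphanumeric characters from the name and truncates it to 19 characters, which shows how the version line ends up as a data set.

```python
import re

def ncv_pairs(status_text):
    """Approximate NCV extraction: return (ds_name, value) pairs.

    Hypothetical helper for illustration only; the real parser lives in
    the Xymon RRD module and may differ in its details.
    """
    pairs = []
    for line in status_text.splitlines():
        # Name is everything before the first ':' or '=', value is the
        # number that immediately follows it.
        m = re.match(r"([^:=]+)[:=]\s*([-+]?\d+(?:\.\d+)?)", line)
        if not m:
            continue
        # Keep only letters and digits, then truncate to 19 characters.
        ds_name = re.sub(r"[^A-Za-z0-9]", "", m.group(1))[:19]
        if ds_name:
            pairs.append((ds_name, float(m.group(2))))
    return pairs

# Status text as posted in this thread.
status = "\n".join([
    "Process Memory Usage",
    "",
    "SA: 6.5",
    "MBIM: 2.4",
    "OI: 4.1",
    "",
    " Process' alert levels are defined in /usr/local/etc/procmem.cfg",
    "When levels are exceeded, restarting of the process to free memory is recommended",
    "",
    "  xymProcMem,v 1.7 2014/07/07 20:37:17 ptroot Exp ptroot $",
])

pairs = ncv_pairs(status)
print(pairs)
# [('SA', 6.5), ('MBIM', 2.4), ('OI', 4.1), ('xymProcMemv17201407', 37.0)]
```

In this sketch the colon in the timestamp is what makes the version line look like a name/value pair, and the "value" is simply the minutes field. Ending the status text before the version line, or writing the version without a colon or equals sign, would avoid the stray DS.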

I’ve changed to \n, and that to me should provide the proper output.

Too late, the DS names were already set.

                <name> xymProcMemv16201407 </name>

Once the RRD file is created, it won't get recreated with any new DS names.
You have to delete the RRD file, or do an export/edit/import process to get
the DS names you need.
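
As a rough sketch of the "edit" step in that export/edit/import process: `rrdtool dump` writes the RRD out as XML and `rrdtool restore` turns the XML back into an RRD file, so the edit in between can be a simple text substitution on the `<name>` element. The helper name `rename_ds_in_dump` is made up for this example, and it only covers renaming an existing DS. Adding missing data sets would also mean adding a column to every stored row, which is why deleting the file is usually the simpler route.

```python
import re

def rename_ds_in_dump(xml_text, old, new):
    """Rename one data set inside the text produced by `rrdtool dump`.

    Hypothetical helper: only the <name> element is touched; the DS type
    (GAUGE/DERIVE) and the stored values are left as they are.
    """
    pattern = re.compile(r"(<name>\s*)" + re.escape(old) + r"(\s*</name>)")
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), xml_text)

# Fragment in the same shape as the dump shown earlier in this thread.
sample = (
    "<ds>\n"
    "        <name> xymProcMemv16201407 </name>\n"
    "        <type> DERIVE </type>\n"
    "</ds>\n"
)

fixed = rename_ds_in_dump(sample, "xymProcMemv16201407", "MBIM")
print(fixed)
```

The edited XML would then be fed to `rrdtool restore` to produce a new RRD file. Note that the DS type in the example stays DERIVE, so it would have to be changed by hand as well if a GAUGE is wanted.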

J

list W.J.M. Nelis · Wed, 09 Jul 2014 08:49:02 +0200 ·

Hi,

▸ quoted from Paul Root

                I've had a script that pulls the %mem of certain processes on various machines. Right now, I have 3 separate groups of machines with different processes being watched. The purpose is to track memory leaks.

                Previously, I had the script just put out a ps -eo 'pid,%mem,cmd' for the desired processes. Then others wanted a graph, so I've reformatted the output to be just a keyword (either the process name or a significant field of the arguments of the command).

                I get something that looks like this now:

Process Memory Usage

SA: 6.5

MBIM: 2.4

OI: 4.1

You might consider using the devmon way of reporting data instead of NCV. It has the advantage that you can report data to be entered in multiple RRDs. In Devmon one RRD per interface is used; in your case one RRD per group of machines could be used. The devmon format also has the advantage that colons or equals signs in the message will not confuse the extractor of the RRD data.

Regards,
Wim Nelis.


******************************************************************************************************************

The NLR disclaimer is valid for NLR e-mail messages.

This message is only meant for providing information. Nothing in this e-mail message amounts to a contractual
or legal commitment on the part of the sender.
This message may contain information that is not intended for you. If you are not the addressee or if this
message was sent to you by mistake, you are requested to inform the sender and delete the message.
Sender accepts no liability for damage of any kind resulting from the risks inherent in the electronic
transmission of messages.
 ******************************************************************************************************************

list Paul Root · Wed, 9 Jul 2014 18:04:24 +0000 ·

Yes, I'm working on that now.  I pretty quickly came to the realization that the single RRD file wasn't going to work.

