vmstat graphing with CPU io wait
list Tom Georgoulias
My old BB setup has a customized vmstat-larrd.pl script which allows for variations in vmstat output based on the version of procps. In other words, it compensates for the fact that RHEL3 and older Red Hat Linux systems have vmstat output whose column ordering doesn't match up. So I'd like to bring some of those vmstat changes into my new hobbit setup, most notably the ability to plot CPU wait for IO (wa) alongside user, system & idle time, but poking around in hobbitgraph.cfg doesn't reveal an easy way to do it. Any tips on how I might accomplish this? Tom
list Henrik Størner
In <user-1109dc7e0465@xymon.invalid> Tom Georgoulias <user-e7ef09aae711@xymon.invalid> writes:
My old BB setup has a customized vmstat-larrd.pl script which allows for variations in vmstat output based on the version of procps. In other words, it compensates for the fact that RHEL3 and older Red Hat Linux systems have vmstat output whose column ordering doesn't match up.
Hobbit knows these two layouts of the vmstat data as "linux" and "debian3", the latter being the one for the older linux versions (essentially, systems running vmstat for a Linux 2.2 kernel).
So I'd like to bring some of those vmstat changes into my new hobbit setup, most notably the ability to plot CPU wait for IO (wa) alongside user, system & idle time, but poking around in hobbitgraph.cfg doesn't reveal an easy way to do it.
Any tips on how I might accomplish this?
As with LARRD, there are two steps: Collecting the data, and
displaying them.
Data collection is handled by hobbitd_larrd. The interesting stuff
here is in hobbitd/do_larrd.c, and hobbitd/larrd/*.c . do_larrd.c
determines which function should parse an incoming status or data
message, by looking at the test name in the "status" or "data" message.
It also consults the LARRDS environment variable, e.g. to figure out
that "ftp" is handled by the "tcp" parser. Each type of RRD file then
has its own little routine in one of the hobbitd/larrd/*.c files to
pick out the interesting data, and put it into the RRD file.
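As a rough illustration of the lookup described above, here is a minimal C sketch that maps a test name to the parser that should handle it. It is not taken from do_larrd.c: the function name find_parser() and the comma-separated "name[=parser]" syntax of the mapping string are assumptions made for the example.

```c
#include <string.h>

/*
 * Hypothetical helper, not the code in do_larrd.c: given a mapping string
 * such as "cpu=la,disk,ftp=tcp" and a test name, copy the name of the
 * parser that should handle it into 'out' (outsz must be at least 1).
 * Returns 1 if the test is listed, 0 otherwise.
 * The "name[=parser]" syntax is an assumption for this sketch.
 */
int find_parser(const char *mapping, const char *testname, char *out, size_t outsz)
{
    size_t namelen = strlen(testname);
    const char *p = mapping;

    while (p && *p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        const char *eq = memchr(p, '=', len);
        size_t keylen = eq ? (size_t)(eq - p) : len;

        if (keylen == namelen && strncmp(p, testname, namelen) == 0) {
            /* "ftp=tcp" selects "tcp"; a bare "disk" is handled by "disk" */
            const char *val = eq ? eq + 1 : p;
            size_t vallen = eq ? len - keylen - 1 : len;

            if (vallen >= outsz) vallen = outsz - 1;
            memcpy(out, val, vallen);
            out[vallen] = '\0';
            return 1;
        }
        p = end ? end + 1 : NULL;
    }
    return 0;
}
```

With a mapping of "cpu=la,disk,ftp=tcp", a lookup for "ftp" yields "tcp", and a lookup for a name that is not listed returns 0, so the message would simply be ignored.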
Where do you get the I/O wait information from?
Data display is handled by hobbitgraph.cgi, and the config file
hobbitgraph.cfg. This is very similar to the looong set of
definitions in larrd-grapher.cgi, except that you need not worry about
hostnames in the RRD files, because Hobbit keeps all RRD files for a
given host in a separate directory. So e.g. the "vmstat" graph can
just get the CPU idle-time value with
DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE
i.e. grab the "vmstat.rrd" file, and extract the current average value
of the "cpu_idl" dataset.
You can mix values from different RRD files in the same graph,
e.g. the "vmstat2" graph uses both the "vmstat.rrd" file and the
"la.rrd" file:
DEF:avg=la.rrd:la:AVERAGE
CDEF:la=avg,100,/
DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE
CDEF:cpu_idl2=cpu_idl,100,/
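The CDEF lines use RRDtool's reverse-Polish notation, so "avg,100,/" means "take avg, take 100, divide", i.e. avg / 100. The small C sketch below evaluates such an expression to make that concrete. It is an illustration only: eval_rpn() is a made-up name, it handles one variable and the four basic operators, and it is not how RRDtool itself is implemented.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative RPN evaluator for expressions like "avg,100,/".
 * 'varname' is replaced by 'varvalue'; only + - * / are supported.
 * Returns 0 on success and stores the result in *result, -1 on error.
 */
int eval_rpn(const char *expr, const char *varname, double varvalue, double *result)
{
    double stack[16];
    int depth = 0;
    char buf[128];
    char *tok;

    if (strlen(expr) >= sizeof(buf)) return -1;
    strcpy(buf, expr);

    for (tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        if (strlen(tok) == 1 && strchr("+-*/", tok[0])) {
            double a, b;

            if (depth < 2) return -1;   /* operator needs two operands */
            b = stack[--depth];
            a = stack[--depth];
            switch (tok[0]) {
            case '+': stack[depth++] = a + b; break;
            case '-': stack[depth++] = a - b; break;
            case '*': stack[depth++] = a * b; break;
            default:  stack[depth++] = a / b; break;
            }
        } else {
            if (depth >= 16) return -1; /* stack full */
            stack[depth++] = (strcmp(tok, varname) == 0) ? varvalue
                                                         : strtod(tok, NULL);
        }
    }
    if (depth != 1) return -1;
    *result = stack[0];
    return 0;
}
```

Calling eval_rpn("avg,100,/", "avg", 250.0, &r) leaves 2.5 in r, which is what the "la" CDEF above would plot for a stored value of 250.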
If you have more questions, please ask. And if you have something that
could be of interest to others, I'll be happy to include it with
Hobbit.
Regards,
Henrik
list Charles Jones
How difficult is it to add custom graphs? For example we have several Oracle standby databases that "resync" (import binary database changelogs) for several hours every night. I have a perl script which parses one of the logfiles created by this process, and gets values like the total time it took to import each binary log. Having a graph of the average time it takes to process would be VERY handy for metric and scaling evaluations. The long story short is that I have a script or process that outputs numbers that I want to graph with Hobbit+LARRD. Is it possible? Or, I should be asking, how difficult would it be, and can you give any pointers on what to do. Thanks, -Charles
list Henrik Størner
On Mon, Jan 24, 2005 at 03:14:54PM -0700, Charles Jones wrote:
How difficult is it to add custom graphs?
Well, it does require some programming - it's an advantage if you are familiar with C, since that is the language Hobbit is written in. I don't think it's terribly hard, but I am biased.
For example we have several Oracle standby databases that "resync" (import binary database changelogs) for several hours every night. I have a perl script which parses one of the logfiles created by this process, and gets values like the total time it took to import each binary log. Having a graph of the average time it takes to process would be VERY handy for metric and scaling evaluations.
The hard part usually is getting the data, and you have that already
with your perl script.
Next step is getting the data to the Hobbit server. This is easy;
decide on a unique name for the type of data you're handling - e.g.
"orasync" - and use the "bb" utility to send it off as a "data"
message to Hobbit. E.g. the following script runs your perl script,
stores the output in a temporary file, and uses the "bb" utility from
a Big Brother client installation to send this datafile to Hobbit
in a "data" message:
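#!/bin/sh
# Run the perl script and save its output
/foo/perlscript >/tmp/datafile
# Load the Big Brother client settings used below
BBHOME=/usr/local/bbc
export BBHOME
. $BBHOME/etc/bbdef.sh
# Send the file contents to Hobbit as a "data" message named "orasync"
$BB $BBDISP "data $MACHINE.orasync
`cat /tmp/datafile`
"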
Now the fun bit starts. Hobbit will automatically pass data-messages
to all tasks monitoring the "data" channel. hobbitd_larrd is one of
them, so you can either add some code to this "worker module", or you
can create your own module from scratch using your favorite
programming language. An example of a hobbitd worker module is in the
hobbitd_sample.c application included with Hobbit.
Assuming you just add stuff to the existing hobbitd_larrd module,
you must do two things:
1) Write a routine do_orasync_larrd() similar to the other ones
in the hobbitd/larrd/*.c files, that receives the message,
picks out the numbers that you want to store, and saves them in
an RRD file;
2) Add a line to do_larrd.c at the end of the file, so when it sees
the "orasync" message, it calls your do_orasync_larrd() routine.
You need to learn about RRDtool to really do this; the "rrdcreate"
and "rrdgraph" manpages include some tips on how to define RRDs
and how you can set up graphs.
Take a look at one of the simple ones, e.g. hobbitd/larrd/do_bbgen.c
which picks out a single value from the status message that bbgen
sends when updating the web-pages - the do_bbgen_larrd routine simply
finds the string "TIME TOTAL <some number>", picks out the number
(which is the time bbgen takes to generate the webpages), and stores it
in an RRD file.
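To give an idea of how little code the "find the string, pick out the number" step needs, here is a minimal C sketch. It is not the actual do_bbgen_larrd routine: parse_time_total() is a hypothetical name, and it only covers the parsing, not the RRD update.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "TIME TOTAL <number>" in a message and store the number in *runtime.
 * Returns 1 if found, 0 otherwise. A sketch of the idea only; the real
 * do_bbgen_larrd() routine may differ.
 */
int parse_time_total(const char *msg, double *runtime)
{
    const char *marker = "TIME TOTAL";
    const char *p = strstr(msg, marker);

    if (p == NULL) return 0;

    /* %lf skips the whitespace between the marker and the number */
    return (sscanf(p + strlen(marker), "%lf", runtime) == 1);
}
```

An "orasync" routine could follow the same pattern with a different marker string, e.g. whatever label the perl script prints in front of the resync time.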
The data stored in the RRD file is described in the bbgen_params
variable (or "orasync_params" in your case):
static char *bbgen_params[] = { "rrdcreate", rrdfn,
                                "DS:runtime:GAUGE:600:0:U",
                                rra1, rra2, rra3, rra4, NULL };
The first and last line of this is static; you only change the
"DS:..." line to define the data you store in the rrd.
When you have the data you want, put all of it in the "rrdvalues"
string, with the timestamp in front, and call the
create_and_update_rrd routine to do the work of saving the data.
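For a single-value RRD like the one above, the update string is just the timestamp and the value separated by a colon, which is the format RRDtool expects for updates. A minimal sketch, assuming a helper called format_rrdvalues() (a made-up name); the call to create_and_update_rrd is left out because its arguments are not shown here.

```c
#include <stdio.h>
#include <time.h>

/*
 * Build an RRD update string of the form "<timestamp>:<value>".
 * Returns 0 on success, -1 if the buffer is too small.
 * Function and parameter names are illustrative.
 */
int format_rrdvalues(char *buf, size_t bufsz, time_t tstamp, double runtime)
{
    int n = snprintf(buf, bufsz, "%ld:%.2f", (long)tstamp, runtime);

    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

With more than one dataset in the RRD file, the values are appended in the same order as the "DS:..." lines, each separated by a colon.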
So now you have an RRD file. Time to put it into hobbitgraph.cfg.
This really depends on the kind of data you are handling.
The last step is to add "orasync" to the GRAPHS definition in
hobbitserver.cfg. This causes the bb-larrdcolumn tool to include the
orasync graph on the "trends" column page.
The first time you do it, it seems complex, and I admit: it isn't
exactly trivial because there are many pieces that need to fit
together. But once you get the first simple graph working you'll
see that it isn't all that hard. And I'll be happy to help you if you
run into problems along the way.
Henrik
list Charles Jones
Henrik Stoerner wrote:
and use the "bb" utility to send it off as a "data"
message to Hobbit. E.g. the following script runs your perl script,
stores the output in a temporary file, and uses the "bb" utility from
a Big Brother client installation to send this datafile to Hobbit
in a "data" message:
#!/bin/sh
/foo/perlscript >/tmp/datafile
BBHOME=/usr/local/bbc
export BBHOME
. $BBHOME/etc/bbdef.sh
$BB $BBDISP "data $MACHINE.orasync
`cat /tmp/datafile`
"
Now the fun bit starts. Hobbit will automatically pass data-messages
to all tasks monitoring the "data" channel.
Can you tell me the difference between using "data" and "status"? The reason I ask is because I looked at the hobbitd/larrd/do_bea.c (because it was the smallest one), and I notice in the comments that the script that feeds it is using "status" instead of "data". Thanks, -Charles
list Charles Jones
Charles Jones wrote:
Can you tell me the difference between using "data" and "status"? The reason I ask is because I looked at the hobbitd/larrd/do_bea.c (because it was the smallest one), and I notice in the comments that the script that feeds it is using "status" instead of "data".
Oops, scratch that....the do_bea.c is definitely not the smallest one, and I should have looked at the one you suggested :-) I would still like to know the difference between, and when one should use, data vs status though. Now that I am looking at do_bbgen.c, it does look very simple...I will give a try at making my own. Thanks again. -Charles
list Charles Jones
Would it be possible, when the status of something goes Red, to have the code printed somewhere on the status page, so that one could use the "Acknowledge alert" option and copy/paste it in, rather than having to get the incident code from email/pager? If we wanted to be really spiffy the acknowledge alert could even have a dropdown/list of current alerts so you wouldn't even have to type it :) -Charles
list Daniel J McDonald
On Mon, 2005-01-24 at 17:07 -0700, Charles Jones wrote:
Would it be possible, when the status of something goes Red, to have the code printed somewhere on the status page, so that one could use the "Acknowledge alert" option and copy/paste it in, rather than having to get the incident code from email/pager? If we wanted to be really spiffy the acknowledge alert could even have a dropdown/list of current alerts so you wouldn't even have to type it :)
Then you wouldn't know that the person who was supposed to be notified really was... Making them copy it off a pager is a cheap "2-factor" authentication.... -- Daniel J McDonald, CCIE # 2495, CNX Austin Energy user-290ce4e24e19@xymon.invalid
list Charles Jones
Daniel J McDonald wrote:
Then you wouldn't know that the person who was supposed to be notified really was... Making them copy it off a pager is a cheap "2-factor" authentication....
I see what you mean. In my case the alerts go to an email alias that includes both the alert pager and an email list. So it can be any one of a number of people who actually Ack the alert, and they always have to copy and paste it from the email subject. They use the explanation/cause field to indicate who acked, e.g. "Network cable came unplugged, plugged back in -cjones". I'm just trying to figure out a way to make the Acking process more efficient. -Charles
list Adam Goryachev
On Mon, 2005-01-24 at 16:12 -0700, Charles Jones wrote:
I would still like to know the difference between, and when one should use, data vs status though.
AFAIK, status is the result of a test, which should be alarmed/displayed/etc, whereas data is not alarmed/displayed, just handed off to something else to deal with it. (I think in BB it was just appended to a file in bbvar/data/hostname....) Regards, Adam -- -- Adam Goryachev Website Managers Ph: +XX X XXXX XXXX user-eaec2ffb4cbc@xymon.invalid Fax: +XX X XXXX XXXX www.websitemanagers.com.au
list Henrik Størner
On Mon, Jan 24, 2005 at 04:03:31PM -0700, Charles Jones wrote:
Henrik Stoerner wrote:
and use the "bb" utility to send it off as a "data" message to Hobbit.
Can you tell me the difference between using "data" and "status"?
A "status" message results in a column on the display, and also has a color (red, green, yellow) that might trigger an alert. A "data" message is never displayed and cannot generate an alert, it is just a way of collecting data. Henrik
list Henrik Størner
In <user-f81b191abbfc@xymon.invalid> Daniel J McDonald <user-290ce4e24e19@xymon.invalid> writes:
On Mon, 2005-01-24 at 17:07 -0700, Charles Jones wrote:
Would it be possible, when the status of something goes Red, to have the code printed somewhere on the status page, so that one could use the "Acknowledge alert" option and copy/paste it in, rather than having to get the incident code from email/pager? If we wanted to be really spiffy the acknowledge alert could even have a dropdown/list of current alerts so you wouldn't even have to type it :)
Then you wouldn't know that the person who was supposed to be notified really was... Making them copy it off a pager is a cheap "2-factor" authentication....
Exactly. But I understand Charles' question, because I've been wanting to do something like that. Our monitoring is handled by a NOC manned 24x7, and when an alert pops up on the Hobbit NK view they raise a trouble-ticket in some other system. The NOC people don't get an e-mail or pager alert, but it would still be nice if they could acknowledge "yes, a TT has been raised about this" to get the problem off their monitor. So I will probably implement some way of putting an "acknowledge" function on the webpages - this would have to be protected with some sort of access control, obviously. Henrik
list Charles Jones
Henrik Stoerner wrote:
A "status" message results in a column on the display, and also has a color (red, green, yellow) that might trigger an alert. A "data" message is never displayed and cannot generate an alert, it is just a way of collecting data.
But data can be collected from a status message as well, right? At least I hope so, because I want to update a status, AND graph the result. For instance in my oracle resync scenario, I want to send basically a status that says "Resync completed successfully. 500 files imported. Total Resync Time: 1530 seconds." I want the 1530 to be trended. Will I be able to do that, or will I have to send both a status and a separate data message? Thanks, -Charles
list Charles Jones
Henrik Storner wrote:
Our monitoring is handled by a NOC manned 24x7, and when an alert pops up on the Hobbit NK view they raise a trouble-ticket in some other system. The NOC people don't get an e-mail or pager alert, but it would still be nice if they could acknowledge "yes, a TT has been raised about this" to get the problem off their monitor. So I will probably implement some way of putting an "acknowledge" function on the webpages - this would have to be protected with some sort of access control, obviously.
Currently I am simply using a .htaccess file to restrict access, which has been working for me so far, but built-in access control would be nice, particularly for Acks and for Maint.pl. If there were a proper permissions system, you could even define what users could see which groups of hosts! Ahhh I feel the feature creature sneaking up on us! :-) -Charles
list Henrik Størner
On Tue, Jan 25, 2005 at 02:07:42AM -0700, Charles Jones wrote:
Henrik Stoerner wrote:
A "status" message results in a column on the display, and also has a color (red, green, yellow) that might trigger an alert. A "data" message is never displayed and cannot generate an alert, it is just a way of collecting data.
But data can be collected from a status message as well right? At least I hope so, because I want to update a status, AND graph the result.
Certainly, no problem at all. Hobbit gets most of its RRD graph-data from status messages (the "cpu", "disk", "memory" and network test messages, for instance). That's why you'll see two hobbitd_larrd processes running: One gets the "status" messages, and the other gets the "data" messages. So no, you don't need to do anything special. Whether you send your original data as a status- or a data-message is up to you, as far as collecting the data in an RRD and graphing them, there is no difference. Regards, Henrik
list Tom Georgoulias
Henrik Storner wrote: <snip> Thanks for the explanation of larrd. It helped a lot.
Where do you get the I/O wait information from?
On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of vmstat's output, labeled "wa" under "cpu", so I modified a section of larrd-0.43c's vmstat-larrd.pl so it'd recognize this value and use it when dealing with rhel3 systems. I hacked my client's vmstat larrd bf script to make it determine whether the system is rhel3 or not, then exported the BBOSNAME as rhel3 so this array assignment would be used by vmstat-larrd.pl.
rhel3 => { cpu_r => 0,
           cpu_b => 1,
           mem_swpd => 2,
           mem_free => 3,
           mem_buff => 4,
           mem_cach => 5,
           mem_si => 6,
           mem_so => 7,
           dsk_bi => 8,
           dsk_bo => 9,
           cpu_int => 10,
           cpu_csw => 11,
           cpu_usr => 12,
           cpu_sys => 13,
           cpu_wait => 14,
           cpu_idl => 15,
         },
I might try adding this to hobbitd/larrd/do_vmstat.c and see if I can make it work.
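A column table like the one above translates fairly directly into C. The following is only a sketch of the idea and not the layout code in do_vmstat.c: the vmstat_col_t type, the rhel3_cpu_cols table and the vmstat_value() function are names invented for this example, and only the four CPU columns are listed.

```c
#include <stdlib.h>
#include <string.h>

/* Column layout entry: dataset name and its 0-based position in a vmstat line. */
typedef struct { const char *name; int index; } vmstat_col_t;

/* Same positions as the rhel3 mapping above (only the CPU columns shown). */
static const vmstat_col_t rhel3_cpu_cols[] = {
    { "cpu_usr", 12 }, { "cpu_sys", 13 }, { "cpu_wait", 14 }, { "cpu_idl", 15 },
    { NULL, -1 }
};

/*
 * Return the value of dataset 'name' from a whitespace-separated vmstat
 * line, or -1 if the dataset is unknown or the line is too short.
 */
long vmstat_value(const vmstat_col_t *layout, const char *line, const char *name)
{
    char buf[256];
    char *tok;
    int want = -1, col = 0, i;

    /* Look up which column holds the requested dataset */
    for (i = 0; layout[i].name; i++)
        if (strcmp(layout[i].name, name) == 0) want = layout[i].index;
    if (want < 0 || strlen(line) >= sizeof(buf)) return -1;

    /* Walk the line token by token until we reach that column */
    strcpy(buf, line);
    for (tok = strtok(buf, " \t\n"); tok; tok = strtok(NULL, " \t\n"), col++)
        if (col == want) return strtol(tok, NULL, 10);
    return -1;
}
```

Supporting another vmstat layout then only means adding another table with different positions; the parsing routine stays the same.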
DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE
i.e. grab the "vmstat.rrd" file, and extract the current average value of the "cpu_idl" dataset.
You can mix values from different RRD files in the same graph, e.g. the "vmstat2" graph uses both the "vmstat.rrd" file and the "la.rrd" file:
This is nice. Once I figured out what you were doing there, I thought "hey, all I've got to do is set up a def for cpu_wa|cpu_wait and I'm golden." Then I fired up rrdtool and checked the rrd file, only to realize that I didn't have the data to begin with...
If you have more questions, please ask. And if you have something that could be of interest to others, I'll be happy to include it with Hobbit.
I'll be happy to contribute any patches that I generate. Tom
list Tom Georgoulias
Tom Georgoulias wrote:
rhel3 => { cpu_r => 0,
cpu_b => 1,
mem_swpd => 2,
mem_free => 3,
mem_buff => 4,
mem_cach => 5,
mem_si => 6,
mem_so => 7,
dsk_bi => 8,
dsk_bo => 9,
cpu_int => 10,
cpu_csw => 11,
cpu_usr => 12,
cpu_sys => 13,
cpu_wait => 14,
cpu_idl => 15,
I might try adding this to hobbitd/larrd/do_vmstat.c and see if I can
make it work.
I was able to get this to work w/o much hassle at all--modifying hobbitd/larrd/do_vmstat.c to include the rhel3 array and lib/misc.c, lib/misc.h to define rhel3 as an os type did the trick. Then I created a vmstat graph config (vmstat_rhel3) in hobbitgraph.cfg that uses all 4 cpu status parameters and referenced that in bb-hosts for my rhel3 systems. Like I said in my last message, the vmstat bottom feeders on the clients have to be configured to set the BBOSTYPE to rhel3 when sending the data to the hobbit server for this to take effect, so this is more of a positive test result than a general purpose solution. I guess the point of this email is that it works just like I wanted it to. Getting it more generalized is my next step. Tom
list Daniel J McDonald
On Tue, 2005-01-25 at 08:27 -0500, Tom Georgoulias wrote:
Henrik Storner wrote: <snip>
Thanks for the explanation of larrd. It helped a lot.
Where do you get the I/O wait information from?
On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of vmstat's output, labeled "wa" under "cpu", so I modified a section of larrd-0.43c's vmstat-larrd.pl so it'd recognize this value and use it when dealing with rhel3 systems. I hacked my client's vmstat larrd bf
Actually, that is present in all kernel 2.6 versions, e.g. Mandrake 10.0 and 10.1. I'd love to be able to capture that - I beat on bb-central for quite a while trying to track it. Tracking wait state is great for figuring out which boxes need more ram. -- Daniel J McDonald, CCIE # 2495, CNX Austin Energy user-290ce4e24e19@xymon.invalid
list Henrik Størner
On Tue, Jan 25, 2005 at 08:27:37AM -0500, Tom Georgoulias wrote:
Henrik Storner wrote:
Where do you get the I/O wait information from?
On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of vmstat's output, labeled "wa" under "cpu"
Aha! So that's it - I had been wondering a bit why my load graphs didn't always add up to 100%! This is quite interesting, and definitely something that should be tracked. So I hope you don't mind that I've tried adding it myself ...
One annoying bit with the RRD files is that changing the dataset (e.g. adding an extra variable) is not possible. So adding the cpu_wait data will break any existing vmstat data that has been collected. So if we're gonna break the vmstat RRD layout for Linux clients, we might as well do it now before the official release. And that should also include getting the very old layout (the one from Linux 2.2 kernels, with the "r b w" process counts) aligned with the new layout - effectively creating a single vmstat RRD format regardless of what Linux version you are running.
So: I've modified the Linux vmstat RRD layout to always include the "cpu_w" (from the very old vmstat version) and "cpu_wait" columns (from the latest vmstat versions). If the client doesn't report a value for these, they are set to the special RRD-value "undefined". So when someone upgrades a system from Linux 2.2 to 2.4, or from 2.4 to 2.6, the vmstat data will still work.
I've also defined a "vmstat1" graph similar to the normal "vmstat" graph, but with the cpu_wait data added (it stacks on top of the "system" time, below "user" time).
Some sample graphs (they don't have any data yet, so you're probably better off waiting a couple of hours before you view them):
Linux 2.6 host:
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=vmstat1&graph=hourly
Linux 2.4 host:
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=tyge.sslug.dk&service=vmstat1&graph=hourly
Linux 2.2 host (actually 2.4, but an old vmstat version):
http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=fenris.hswn.dk&service=vmstat1&graph=hourly
Henrik
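The "undefined" handling mentioned in this message can be sketched as follows. RRDtool accepts the letter U in an update string for a value that is unknown, so a column the client did not report can be written as ":U" instead of a number. This is a sketch under that assumption and not the code from the patch; append_rrd_value() is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one value to an RRD update string, writing "U" (RRDtool's marker
 * for an unknown value) when the client did not report it.
 * 'buf' must already hold a NUL-terminated string shorter than 'bufsz';
 * 'have' is non-zero when 'value' is valid.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int append_rrd_value(char *buf, size_t bufsz, int have, long value)
{
    size_t used = strlen(buf);
    int n;

    if (have) n = snprintf(buf + used, bufsz - used, ":%ld", value);
    else      n = snprintf(buf + used, bufsz - used, ":U");

    return (n < 0 || (size_t)n >= bufsz - used) ? -1 : 0;
}
```

A client running the very old vmstat would get a number for cpu_w and U for cpu_wait, a newer one the other way around, and both end up in an RRD file with the same set of datasets.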
list Tom Georgoulias
Henrik Stoerner wrote:
On RHEL3 (procps-2.0.17-10), there is a value for it in column 14 of vmstat's output, labeled "wa" under "cpu"
Aha! So that's it - I had been wondering a bit why my load graphs didn't always add up to 100% ! This is quite interesting, and definitely something that should be tracked. So I hope you don't mind that I've tried adding it myself ...
Oh no, please do. Mine is a hack, yours would be a release. ;)
So adding the cpu_wait data will break any existing vmstat data that has been collected. So if we're gonna break the vmstat RRD layout for Linux clients, we might as well do it now before the official release. And that should also include getting the very old layout (the one from Linux 2.2 kernels, with the "r b w" process counts) aligned with the new layout - effectively creating a single vmstat RRD format regardless of what Linux version you are running.
Good. Very good.
So: I've modified the Linux vmstat RRD layout to always include the "cpu_w" (from the very old vmstat version)
Isn't that value the number of processes swapped out, the third column from old vmstat? That is basically going to be ignored, unless someone has a custom larrd graph that uses it, right?
and "cpu_wait" columns (from the latest vmstat versions). If the client doesn't report a value for these, they are set to the special RRD-value "undefined". So when someone upgrades a system from Linux 2.2 to 2.4, or from 2.4 to 2.6, the vmstat data will still work.
Cool. I'm looking forward to testing it out in the next beta. Tom
list Chris Morris
Henrik, AIX also reports i/o wait in its vmstat output in column 16 under wa of cpu which it would be nice to have in the graphs.
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  2 238300 239351   0   0   0  60  108   0 110   79  83  8 30 30 32
 0  2 238803 238827   0   0   0   0    0   0 498 3066 334  1  2 97  1
Chris
**************************************************************************** The information contained in this email is intended only for the use of the intended recipient at the email address to which it has been addressed. If the reader of this message is not an intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination or copying of the message or associated attachments is strictly prohibited. If you have received this email in error, please contact the sender by return email or call 01793 877777 and ask for the sender and then delete it immediately from your system.Please note that neither RWE npower nor the sender accepts any responsibility for viruses and it is your responsibility to scan attachments (if any). *****************************************************************************
list Henrik Størner
On Wed, Jan 26, 2005 at 09:29:31AM -0000, Morris, Chris (Shared Services) wrote:
AIX also reports i/o wait in its vmstat output in column 16 under wa of cpu which it would be nice to have in the graphs.
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  2 238300 239351   0   0   0  60  108   0 110   79  83  8 30 30 32
 0  2 238803 238827   0   0   0   0    0   0 498 3066 334  1  2 97  1
Yes, I noticed that when I worked on the vmstat graphs yesterday.
This was already being collected for AIX, so I made sure the names
matched, so that the graph-definitions will work for both Linux and
AIX.
You can try it with the AIX data you have. Add this to
hobbitgraph.cfg - it's the definition for "vmstat1" I wrote
yesterday. It gives you a graph with the CPU usage split into
system, I/O wait, user and idle:
[vmstat1]
TITLE CPU Utilization
YAXIS % Load
-u 100
-r
DEF:cpu_idl=vmstat.rrd:cpu_idl:AVERAGE
DEF:cpu_usr=vmstat.rrd:cpu_usr:AVERAGE
DEF:cpu_sys=vmstat.rrd:cpu_sys:AVERAGE
DEF:cpu_wait=vmstat.rrd:cpu_wait:AVERAGE
AREA:cpu_sys#FF0000:System
STACK:cpu_wait#774400:I/O wait
STACK:cpu_usr#FFFF00:User
STACK:cpu_idl#00FF00:Idle
COMMENT:\n
GPRINT:cpu_sys:LAST:System \: %5.1lf (cur)
GPRINT:cpu_sys:MAX: \: %5.1lf (max)
GPRINT:cpu_sys:MIN: \: %5.1lf (min)
GPRINT:cpu_sys:AVERAGE: \: %5.1lf (avg)\n
GPRINT:cpu_wait:LAST:I/O Wait\: %5.1lf (cur)
GPRINT:cpu_wait:MAX: \: %5.1lf (max)
GPRINT:cpu_wait:MIN: \: %5.1lf (min)
GPRINT:cpu_wait:AVERAGE: \: %5.1lf (avg)\n
GPRINT:cpu_usr:LAST:User \: %5.1lf (cur)
GPRINT:cpu_usr:MAX: \: %5.1lf (max)
GPRINT:cpu_usr:MIN: \: %5.1lf (min)
GPRINT:cpu_usr:AVERAGE: \: %5.1lf (avg)\n
GPRINT:cpu_idl:LAST:Idle \: %5.1lf (cur)
GPRINT:cpu_idl:MAX: \: %5.1lf (max)
GPRINT:cpu_idl:MIN: \: %5.1lf (min)
GPRINT:cpu_idl:AVERAGE: \: %5.1lf (avg)\n
Now find one of your AIX boxes on the Hobbit webpages and look at the
vmstat graphs. Then, in the browser change the part of the URL that
says "service=vmstat" to "service=vmstat1". You should then see the
new graph.
Or put "LARRD:*,vmstat:vmstat1" in the AIX-hosts' entry in bb-hosts
and wait for bb-larrdcolumn to update the set of graphs shown by
default.
Henrik
list Chris Morris
Henrik, Re vmstat1 on AIX - that works a treat - finally a complete graph : ) Thanks Chris
list Tom Georgoulias
Henrik, Are the vmstat patches you created ready for beta testing? Care to share them so I can test them out? Tom
list Henrik Størner
▸
On Wed, Jan 26, 2005 at 07:44:21AM -0500, Tom Georgoulias wrote:
Are the vmstat patches you created ready for beta testing? Care to share them so I can test them out?
I plan on putting out a "release candidate" tomorrow.
There is a beta6-vmstat.patch file on http://www.hswn.dk/beta/ which has the vmstat changes; applies on top of beta-6.
After patching, run "make" and "make install", then restart hobbit (or at least hobbitd_larrd - if you just kill it, then hobbitlaunch will restart it automatically).
Make sure you copy over the new hobbitd/etcfiles/hobbitgraph.cfg file to ~hobbit/server/etc/
You also need to delete the existing ~hobbit/data/rrd/*/vmstat.rrd files (at least those from Linux systems), or you will get a lot of errors that it cannot update the vmstat.rrd file. Check the larrd-status.log and larrd-data.log files.
Henrik
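The step of deleting the old vmstat.rrd files could be sketched as a small shell function. This is only a sketch under assumptions: the helper name `remove_vmstat_rrds` and passing the RRD directory as an argument are made up for illustration; the `<root>/<hostname>/vmstat.rrd` layout is the one given in the message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hobbit): delete the per-host vmstat.rrd
# files below an RRD root directory laid out as <root>/<hostname>/vmstat.rrd.
# Prints each file it removes; other RRD files are left alone.
remove_vmstat_rrds() {
    rrd_root="$1"
    for f in "$rrd_root"/*/vmstat.rrd; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -f "$f" ] || continue
        echo "removing $f"
        rm -f "$f"
    done
}

# Example call, using the path mentioned in the message:
# remove_vmstat_rrds ~hobbit/data/rrd
```

The example call is left commented out on purpose, since it throws away the collected vmstat history for every host under that directory.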
list Tom Georgoulias
▸
Henrik Stoerner wrote:
On Wed, Jan 26, 2005 at 07:44:21AM -0500, Tom Georgoulias wrote:
Are the vmstat patches you created ready for beta testing? Care to share them so I can test them out?
I plan on putting out a "release candidate" tomorrow. There is a beta6-vmstat.patch file on http://www.hswn.dk/beta/ which has the vmstat changes; applies on top of beta-6.
Thanks for providing the patch. I applied it and it built without any errors, but I'm still having problems getting it to work. I did copy over the new hobbitgraph.cfg file after installing & deleted the vmstat.rrd for the linux system in question before restarting.
So, my first question: I was looking at the patch and wasn't sure the array order is correct. (I'm not a programmer by any means, so if I'm wrong just say so).
on RHEL3, vmstat's CPU info columns are in this order:
user -12th
system - 13th
IO wait - 14th
idle - 15th
For example (pardon the line wrap):
-bash-2.05b$ vmstat 2
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy wa id
0 1 0 19036 27412 4370032 0 0 214 0 622 649 0 1 50 48
in the patch, you have cpu_idl =14 & cpu_wait=15. Is that backwards? Or am I out of my league (disclaimer: I hardly know anything about C programming).
static vmstat_layout_t vmstat_linux_layout[] = {
{ 0, "cpu_r" },
{ 1, "cpu_b" },
{ -1, "cpu_w" }, /* Not present for 2.4+ kernels, so log as "Undefined" */
{ 2, "mem_swpd" },
{ 3, "mem_free" },
{ 4, "mem_buff" },
{ 5, "mem_cach" },
{ 6, "mem_si" },
{ 7, "mem_so" },
{ 8, "dsk_bi" },
{ 9, "dsk_bo" },
{ 10, "cpu_int" },
{ 11, "cpu_csw" },
{ 12, "cpu_usr" },
{ 13, "cpu_sys" },
{ 14, "cpu_idl" },
{ 15, "cpu_wait" }, /* Requires kernel 2.6, but may not be present */
{ -1, NULL }
};
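One way to check the numbers in a layout table against a given system is to print the zero-based position of each CPU column from the vmstat header line. The sketch below does that for the RHEL3 header quoted above; `cpu_columns` is a hypothetical helper name, not something shipped with Hobbit.

```shell
#!/bin/sh
# Print the zero-based position of each CPU column named in a vmstat
# header line, so it can be compared with the numbers in the layout table.
# cpu_columns is a hypothetical helper name, not part of Hobbit.
cpu_columns() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "us" || $i == "sy" || $i == "wa" || $i == "id")
                printf "%s=%d\n", $i, i - 1
    }'
}

# Header line from the RHEL3 sample above
echo " r b swpd free buff cache si so bi bo in cs us sy wa id" | cpu_columns
```

For this header it reports wa at position 14 and id at position 15, the reverse of the table. A header that repeats a name (the AIX one has two "sy" columns) would print both matches.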
list Henrik Størner
▸
On Wed, Jan 26, 2005 at 11:47:30AM -0500, Tom Georgoulias wrote:
So, my first question: I was looking at the patch and wasn't sure the array order is correct. (I'm not a programmer by any means, so if I'm wrong just say so). on RHEL3, vmstat's CPU info columns are in this order: user -12th system - 13th IO wait - 14th idle - 15th
Argh! They swapped the order of the IO wait and idle counters!
Well, the simple way of fixing that is to just switch them around
in hobbitgraph.cfg. But cpu_idl is used in a lot of graphs, so that
does get rather messy.
So it's probably better to define RHEL3 as a new OS type, and
set up its own table for mapping the numbers to the RRD data.
Patch - on top of the previous one - attached. It compiles, but
I haven't tested it. It assumes your vmstat data sends in
"rhel3" as the name of the OS.
Henrik
-------------- next part --------------
--- lib/misc.c 2005/01/20 22:02:23 1.23
+++ lib/misc.c 2005/01/26 17:11:14
@@ -43,6 +43,7 @@
else if (strcmp(osname, "debian3") == 0) result = OS_DEBIAN3;
else if (strcmp(osname, "debian") == 0) result = OS_DEBIAN;
else if (strcmp(osname, "linux") == 0) result = OS_LINUX;
+ else if (strcmp(osname, "rhel3") == 0) result = OS_RHEL3;
else if (strcmp(osname, "snmp") == 0) result = OS_SNMP;
else if (strcmp(osname, "snmpnetstat") == 0) result = OS_SNMP;
--- lib/misc.h 2005/01/17 23:13:41 1.10
+++ lib/misc.h 2005/01/26 17:11:36
@@ -13,7 +13,7 @@
#include <stdio.h>
-enum ostype_t { OS_UNKNOWN, OS_SOLARIS, OS_OSF, OS_FREEBSD, OS_LINUX, OS_REDHAT, OS_DEBIAN3, OS_DEBIAN, OS_HPUX, OS_AIX, OS_SCO, OS_SNMP, OS_WIN32 } ;
+enum ostype_t { OS_UNKNOWN, OS_SOLARIS, OS_OSF, OS_FREEBSD, OS_LINUX, OS_REDHAT, OS_DEBIAN3, OS_DEBIAN, OS_HPUX, OS_AIX, OS_SCO, OS_SNMP, OS_WIN32, OS_RHEL3 } ;
extern enum ostype_t get_ostype(char *osname);
extern int hexvalue(unsigned char c);
--- hobbitd/larrd/do_vmstat.c 2005/01/25 17:53:45 1.9
+++ hobbitd/larrd/do_vmstat.c 2005/01/26 17:10:26
@@ -119,6 +119,31 @@
{ -1, NULL }
};
+/*
+ * This one is for Red Hat Enterprise Linux 3. Identical to the "linux" layout,
+ * except Red Hat for some reason decided to swap the cpu_wait and cpu_idle columns.
+ */
+static vmstat_layout_t vmstat_rhel3_layout[] = {
+ { 0, "cpu_r" },
+ { 1, "cpu_b" },
+ { -1, "cpu_w" },
+ { 2, "mem_swpd" },
+ { 3, "mem_free" },
+ { 4, "mem_buff" },
+ { 5, "mem_cach" },
+ { 6, "mem_si" },
+ { 7, "mem_so" },
+ { 8, "dsk_bi" },
+ { 9, "dsk_bo" },
+ { 10, "cpu_int" },
+ { 11, "cpu_csw" },
+ { 12, "cpu_usr" },
+ { 13, "cpu_sys" },
+ { 14, "cpu_wait" },
+ { 15, "cpu_idl" },
+ { -1, NULL }
+};
/* This one is for Debian 3.0 (Woody), and possibly others with a Linux 2.2 kernel */
static vmstat_layout_t vmstat_debian3_layout[] = {
{ 0, "cpu_r" },
@@ -218,8 +243,9 @@
case OS_LINUX:
case OS_REDHAT:
case OS_DEBIAN:
- layout = vmstat_linux_layout;
- break;
+ layout = vmstat_linux_layout; break;
+ case OS_RHEL3:
+ layout = vmstat_rhel3_layout; break;
case OS_DEBIAN3:
layout = vmstat_debian3_layout; break;
case OS_FREEBSD:
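The patch relies on the vmstat data arriving with "rhel3" as the OS name. How a client would decide that is not shown in the thread; the following is a minimal sketch, assuming the decision is made from the order of the "wa" and "id" columns in the vmstat header. `vmstat_layout_name` is a hypothetical helper, and it only tells the two layouts in this patch apart (anything else is reported as "linux", which is a simplification).

```shell
#!/bin/sh
# Hypothetical helper: read a vmstat header line on stdin and print the
# name of the layout whose table matches it. "rhel3" and "linux" are the
# OS names used in the patch; other layouts (e.g. debian3) are not handled.
vmstat_layout_name() {
    awk '{
        wa = 0; id = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "wa") wa = i
            if ($i == "id") id = i
        }
        if (wa && id && wa < id) {
            print "rhel3"    # ... us sy wa id
        } else {
            print "linux"    # ... us sy id wa, or no wa column at all
        }
    }'
}

# Usage: vmstat 1 1 | sed -n 2p | vmstat_layout_name
```

The usage line assumes the header is the second line of the vmstat output, as in the RHEL3 sample earlier in the thread.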
list Charles Jones
Will Hobbit play nice with bbfetch? If I recall, bbfetch is run on the BBDISPLAY server, and scp's the raw status files generated by the remote clients modified $BBHOME/bin/bb. I'm wondering if it would still work since Hobbit has no bbvar directory...I think it would because I think bb-fetch uses the $BBTMP variable, and from what I have seen Hobbit populates all the usual bb variables. If bb-fetch won't work with Hobbit, it might be nice to incorporate similar functionality in, as it is quite useful for situations where bbproxy won't do the trick because of one-way firewall issues. -Charles
list Henrik Størner
▸
On Wed, Jan 26, 2005 at 12:32:45PM -0700, Charles Jones wrote:
Will Hobbit play nice with bbfetch? If I recall, bbfetch is run on the BBDISPLAY server, and scp's the raw status files generated by the remote clients modified $BBHOME/bin/bb.
It might not ... I haven't tried bbfetch myself, so I cannot say. But it would probably be pretty easy to come up with a script that picks up the status-files that bbfetch collects, and sends them off to the Hobbit daemon via the normal Hobbit "bb" command.
If bb-fetch won't work with Hobbit, it might be nice to incorporate similar functionality in, as it is quite useful for situations where bbproxy won't do the trick because of one-way firewall issues.
I have some ideas for a Hobbit client, and yes - making it work in both a "push" (normal client) and a "pull" (bbfetch style) setup is necessary.
Henrik
list Charles Jones
My production BigBrother server is running BigBrother + bbgen 2.5 (I know there is newer bbgen, I plan on replacing BB with a Hobbit server). My current bb+bbgen setup has problems whenever a machine dies in such a way that it is pingable, but when you connect to any open TCP port you get nothing back (usually caused by a memory error or overheating). When my current bb+bbgen setup tries to test one of these machines that has zombified, it gets hung testing that host, and eventually everything turns purple since bb isn't updating anymore. Does Hobbit have proper timeouts to timeout a hung TCP connection so this sort of thing does not happen? For all I know this behavior was fixed in bbgen 3.x , but as I said I plan on just phasing out my BB server in favor of Hobbit. -Charles
list Henrik Størner
▸
On Wed, Jan 26, 2005 at 03:17:01PM -0700, Charles Jones wrote:
My production BigBrother server is running BigBrother + bbgen 2.5 (I know there is newer bbgen, I plan on replacing BB with a Hobbit server).
Wow, that's a pretty old bbgen version - 1½ years, in fact.
▸
My current bb+bbgen setup has problems whenever a machine dies in such a way that it is pingable, but when you connect to any open TCP port you get nothing back (usually caused by a memory error or overheating). When my current bb+bbgen setup tries to test one of these machines that has zombified, it gets hung testing that host, and eventually everything turns purple since bb isn't updating anymore. Does Hobbit have proper timeouts to timeout a hung TCP connection so this sort of thing does not happen?
If not, then it's definitely a bug. All network tests done by Hobbit must timeout if the other end doesn't respond. The default timeout is 10 seconds (set with the "--timeout=N" option to bbtest-net). Looking back through the bbgen changelog, there are a couple of bugfixes through the 2.x series that seem likely to fix it. But without knowing exactly what's triggering this behaviour it is hard to say for sure. Henrik
list Charles Jones
▸
Henrik Stoerner wrote:
All network tests done by Hobbit must timeout if the other end doesn't respond. The default timeout is 10 seconds (set with the "--timeout=N" option to bbtest-net).
The problem is, the ports DO respond, you can telnet for example to port 25, and it connects...but the daemon does not respond...you can input text you will get nothing back, and unless you ^], break, the telnet session it will stay hung and connected indefinitely. It's these sort of hangs I'm hoping Hobbit can sense and timeout on. Unfortunately the only way for me to test it, is for a machine to lock up in that manner, and although it happens every now and then, I cannot reproduce it at will. -Charles
list Tom Georgoulias
▸
Henrik Stoerner wrote:
on RHEL3, vmstat's CPU info columns are in this order: user -12th system - 13th IO wait - 14th idle - 15th
Argh! They swapped the order of the IO wait and idle counters!
Frustrating, huh? And I'll bet it'll match Fedora's and others when procps gets updated in a future batch of errata. :(
▸
Well, the simple way of fixing that is to just switch them around in hobbitgraph.cfg. But cpu_idl is used in a lot of graphs, so that does get rather messy. So it's probably better to define RHEL3 as a new OS type, and setup it's own table for mapping the numbers to the RRD data.
That's what I've been doing. One problem that remains for me when doing this, and perhaps for other OSes as well, is that the vmstat status page keeps using the "vmstat" graph. I'm going to try and adjust that so the rhel3 systems use vmstat1 and other OSes use whatever they need.
▸
Patch - on top of the previous one - attached. It compiles, but I haven't tested it. It assumes your vmstat data sends in "rhel3" as the name of the OS.
I was going to share the patch I created, which looks almost the same, but I went ahead and used yours instead, though, just to be in sync with your sources. Tom
list Tom Georgoulias
▸
Tom Georgoulias wrote:
Patch - on top of the previous one - attached. It compiles, but I haven't tested it. It assumes your vmstat data sends in "rhel3" as the name of the OS.
I was going to share the patch I created, which looks almost the same, but I went ahead and used yours instead, though, just to be in sync with your sources.
I think I spoke too soon. My Red Hat 7.1/7.3 systems need to use the same layout as debian3, so I had that in my patch. I also created the cpu_wait column for my freebsd systems, but left it undefined so that every system could use the same vmstat graph. For those that track IOwait, it'll use it. For those that do not, the parameter will show up in the legend and keep the value "nan". Not the prettiest, but much easier to maintain.
Patch is attached, which relies on yours already being in place, in case you are interested. I hesitate to push for inclusion since RH 8.0, 9 and whatever else is out there may report their BBOSNAME as "redhat" but use a different vmstat, plus not everyone wants their graphs to include a parameter that might not exist. It's out there for whoever wants to use it.
I also included a simple, tiny patch to add an echo statement for starthobbit.sh that tells the user hobbit is stopped, much like the message displayed when starting. I put it there as a way to clarify what is happening when the rest of my team starts messing around with hobbit. I'm thinking of creating a new symlink called "runhobbit.sh", just to match the old BB style and to try and avoid any confusion that may go along with a command that looks like this: "starthobbit.sh stop"
Tom
list Charles Jones
I think it would be cool if Hobbit graphed the number of alerts it sent out. It could be included on the hobbitd status page. Trending alerts is good for showing how many pages the on-call people are responding to :-) -Charles
list Charles Jones
I am still unable to get the elusive apache1-apache3 graphs to display. Here are my relevant bb-hosts entries:
paeg WEB Web Sites
1.2.3.4 www.mysite.com # noconn http://www.mysite.com apache=http://1.2.3.4/server-status?auto LARRD:*,apache:apache1|apache2|apache3
I have verified that going to http://1.2.3.4/server-status?auto works, here is the data it returns:
Total Accesses: 237
Total kBytes: 1279
CPULoad: 9.10606
Uptime: 66
ReqPerSec: 3.59091
BytesPerSec: 19843.9
BytesPerReq: 5526.14
BusyWorkers: 2
IdleWorkers: 12
Scoreboard: _C_________W__..................................................................................................................................................................................................................................................
Can you see anything I'm doing wrong? Note I also tried just having the simple keyword "apache" instead of apache=..., still no luck. I'm not even getting an "apache" column (although I wouldn't mind if the graphs just appeared in the http status info page).
list Henrik Størner
▸
On Thu, Jan 27, 2005 at 10:19:29PM -0700, Charles Jones wrote:
I am still unable to get the elusive apache1-apache3 graphs to display. Here's my relavant bb-hosts entries" paeg WEB Web Sites 1.2.3.4 www.mysite.com # noconn http://www.mysite.com apache=http://1.2.3.4/server-status?auto LARRD:*,apache:apache1|apache2|apache3
Do you have an apache.rrd file in ~/data/rrd/www.mysite.com/ ? If you do, then the graphs should show up on the "trends" page after a while; the "trends" page is updated every 15 minutes by default so it may take a while after you change the bb-hosts file for the new graphs to show up. If not, then there's a problem with the data collection. But your bb-hosts entry looks right, and it seems your server sends the right data. Henrik
list Charles Jones
▸
Henrik Stoerner wrote:
On Thu, Jan 27, 2005 at 10:19:29PM -0700, Charles Jones wrote:
I am still unable to get the elusive apache1-apache3 graphs to display. Here's my relavant bb-hosts entries" paeg WEB Web Sites 1.2.3.4 www.mysite.com # noconn http://www.mysite.com apache=http://1.2.3.4/server-status?auto LARRD:*,apache:apache1|apache2|apache3
Do you have an apache.rrd file in ~/data/rrd/www.mysite.com/ ?
Yep: -rw-r--r-- 1 hobbit other 114492 Jan 28 02:56 apache.rrd
▸
If you do, then the graphs should show up on the "trends" page after a while; the "trends" page is updated every 15 minutes by default so it may take a while after you change the bb-hosts file for the new graphs to show up.
It's been in the bb-hosts file for a few hours and still no apache column, nor extra graphs in the http column status page.
▸
If not, then there's a problem with the data collection. But your bb-hosts entry looks right, and it seems your server sends the right data.
How do we troubleshoot this? I checked the apache server logs and the access log shows that Hobbit is hitting the server-status url. Is there some way to manually query data from the apache.rrd file to see if it has anything in it? -Charles
list Charles Jones
Okay, scratch my last message. I re-read what you said and looked at the *trends* page and the graphs are there. This is what I get for working on stuff past midnight :-) Now my question is, how can I get the apache graphs to display on the httpd page as well as in trends page? -Charles
list Henrik Størner
▸
On Fri, Jan 28, 2005 at 03:06:15AM -0700, Charles Jones wrote:
Okay, scratch my last message. I re-read what you said and looked at the *trends* page and the graphs are there. This is what I get for working on stuff past midnight :-)
I had the same feeling last night.
Now my question is, how can I get the apache graphs to display on the httpd page as well as in trends page?
Right now you cannot. Ideally, bb-hostsvc.cgi that generates the html view of a status log would pick up the LARRD setting from bb-hosts, and give you the same graphs that you get on the trends page. It doesn't right now - for historical reasons, mostly. Henrik
list Charles Jones
I'm assuming temperature-larrd.pl won't work with Hobbit as-is, since Hobbit stores the rrd files differently. Do you think temperature-larrd.pl could be modified to run on the Hobbit server and work? Or should I instead attempt to hack the client temperature.sh to send the temp as a data message and then create a do_temp.c module?
Speaking of this, it sure would be nice to have some sort of plugin system, or something for easily creating custom graphs. I can think of many uses for simple one-element graphs (temperature, emails sent per day, etc).
I've been up all night because of temperature issues in my server room, so forgive me if I'm not making much sense :-)
-Charles
list Henrik Størner
▸
On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:
I'm assuming this wont work with Hobbit, since Hobbit stores the rrd files differently. Do you think temperature-larrd.pl could be modified to run on the Hobbit server and work? Or should I instead attempt to hack the client temperature.sh to send the temp as a data message and then create a do_temp.c module?
I looked at converting temperature-larrd.pl when doing the Hobbit larrd stuff, but I couldn't find the script that feeds it - and without some idea of what the input data looks like, it's a bit hard to do the data collection. Where can I find the client side script ? Or perhaps you can just send me a sample of the status it reports.
Speaking of this, it sure would be nice to have some sort of plugin system, or something for easily creating custom graphs. I can think of many uses for simple one-element graphs (temperature, emails sent per day, etc).
You mean doing it in C is too hard :-)
The current work-around is to enable the hobbitd_filestore module to save status- and data-reports to files, the way Big Brother does. There's an option for hobbitd_filestore so you need not save all status logs on disk, but only the ones you want to process with some other tool.
Henrik
list Charles Jones
▸
Henrik Stoerner wrote:
On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:I'm assuming this wont work with Hobbit, since Hobbit stores the rrd files differently. Do you think temperature-larrd.pl could be modified to run on the Hobbit server and work? Or should I instead attempt to hack the client temperature.sh to send the temp as a data message and then create a do_temp.c module?I looked at converting temperature-larrd.pl when doing the Hobbit larrd stuff, but I couldn't find the script that feeds it - and without some idea of what the input data looks like, it's a bit hard to do the data collection. Where can I find the client side script ? Or perhaps you can just send me a sample of the status it reports.
The client script is on deadcat.net - http://www.deadcat.net/viewfile.php?fileid=501
Here is a sample status message, from my BigBrother server that is using it:
logs]# cat *temp
green Fri Jan 28 09:13:19 MST 2005
Temperature status:
Device Temp(C) Temp(F)
&green AMBIENT 24 75
&green CPU0 40 104
&green CPU1 40 104
&green CPU2 40 104
&green CPU3 40 104
Status green: All devices look okay
Status unchanged in 5.12 hours
Status message received from 1.2.3.4
Note that the output can vary depending on which kind of machine temperature.sh is run on, but I believe they all have AMBIENT so that's the main value we want to grab and trend
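Picking the AMBIENT value out of such a status message could look roughly like this. It is a sketch only: `ambient_temp` is a hypothetical helper name, and it assumes the message has one device per line in the form "&<color> <device> <tempC> <tempF>", which is a guess at the original line breaks in the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print the Celsius reading of the AMBIENT device from
# a temperature status message read on stdin. Assumes one device per line
# in the form "&<color> <device> <tempC> <tempF>".
ambient_temp() {
    awk '$2 == "AMBIENT" { print $3; exit }'
}

# Try it on (part of) the sample message
ambient_temp <<'EOF'
green Fri Jan 28 09:13:19 MST 2005
Temperature status:
Device Temp(C) Temp(F)
&green AMBIENT 24 75
&green CPU0 40 104
EOF
```

If no AMBIENT line is present the function prints nothing, so a caller would need to treat an empty result as "no data".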
▸
Speaking of this, it sure would be nice to have some sort of plugin system, or something for easily creating custom graphs. I can think of many uses for simple one-element graphs (temperature, emails sent per day, etc).
You mean doing it in C is too hard :-)
Okay ya got me there :P
▸
The current work-around is to enable the hobbitd_filestore module to save status- and data-reports to files, the way Big Brother does. There's an option for hobbitd_filestore so you need not save all status logs on disk, but only the ones you want to process with some other tool.
Blah...I'm trying to not use any of the backwards compatible features...I want new and improved all the way :-)
list Daniel J McDonald
▸
On Wed, 2005-01-26 at 21:56 +0100, Henrik Stoerner wrote:
On Wed, Jan 26, 2005 at 12:32:45PM -0700, Charles Jones wrote:
Will Hobbit play nice with bbfetch? If I recall, bbfetch is run on the BBDISPLAY server, and scp's the raw status files generated by the remote clients modified $BBHOME/bin/bb.
It might not ... I haven't tried bbfetch myself, so I cannot say. But it would probably be pretty easy to come up with a script that picks up the status-files that bbfetch collects, and sends them off to the Hobbit daemon via the normal Hobbit "bb" command.
If bb-fetch won't work with Hobbit, it might be nice to incorporate similar functionality in, as it is quite useful for situations where bbproxy won't do the trick because of one-way firewall issues.
I have some ideas for a Hobbit client, and yes - making it work in both a "push" (normal client) and a "pull" (bbfetch style) setup is necessary.
I'm actually rather fond of the bb-central style - no clients, all of the "client like scripts" run via ssh from the server. bb-fetch has trouble with time - the remote client wipes the status files on its own schedule, and the server picks them up on its own schedule, and if the client isn't done writing you end up with lots of purples... I haven't tried bb-central with hobbit yet. bbmap is still my next priority now that I've got a really good bbmrtg.pl running. But I've got to get bb-central up soon
list Charles Jones
Sometimes a large spike can ruin a graph, because the graph then scales to the max size of the spike, so the rest of the graph isn't very readable as it is now scrunched down to almost a flat line. Is it possible to have an option for Hobbit's larrding so that you can define a maximum threshold not to extend past? This would be helpful, as I have a server whose load average shot up to over 100 because of a problem. Only lasted a few mins, but now my load average graph looks like a tall weed growing in the middle of a golf green :-) -Charles
list Henrik Størner
▸
On Tue, Feb 01, 2005 at 05:10:38PM -0700, Charles Jones wrote:
Sometimes a large spike can ruin a graph, because the graph then scales to the max size of the spike, so the rest of the graph isn't very readable as it is now scrunched down to almost a flat line. Is it possible to have an option for Hobbits larrding so that you can define a maximum threshold not to extend past?
Yes, but it's actually a feature that exists in RRDtool, which is used
to generate the graphs. Just change hobbitgraph.cfg's [la] definition:
[la]
... some lines ...
CDEF:la=avg,100,/
--upper-limit 3.0
--rigid
.... more lines ...
The --upper-limit sets the value for the top of the graph; the --rigid
makes this setting "rigid" so even if there are larger values in the
dataset, the graph will not adapt to show these.
It might be an idea to let this upper value be determined by the CGI
so you can adjust it dynamically. I'll have a look at that.
Henrik
list Gordon Thiesfeld
Is this something that is going to be implemented into hobbit in the future? If not, I'd appreciate some help modifying hobbitd_larrd. The only thing I can do with C is spell it:-) I'm not using temperature.sh on the client side, but I can format my script so that the output will match it for consistency. Thanks, Gordon
list Henrik Størner
▸
On Tue, Feb 01, 2005 at 05:10:38PM -0700, Charles Jones wrote:
Sometimes a large spike can ruin a graph, because the graph then scales to the max size of the spike, so the rest of the graph isn't very readable as it is now scrunched down to almost a flat line. Is it possible to have an option for Hobbits larrding so that you can define a maximum threshold not to extend past?
I've worked on this a bit, and so far I've come up with the solution that you can see on my site - e.g. if you go to http://www.hswn.dk/hobbit/servers/ , pick the "cpu" display for the first host, and click the graph to see the 4 periodic graphs. Direct link: http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=la At the bottom of the page you can define the lower- and upper-limit for the graph, and which periodic graph you want to see. E.g enter "0.5" in the upper-limit, click "View" and you'll get graphs that cut off spikes at load 0.5. Something like that you had in mind ? Henrik
list Henrik Størner
▸
On Fri, Jan 28, 2005 at 04:47:27AM -0700, Charles Jones wrote:
Speaking of this, it sure would be nice to have some sort of plugin system, or something for easily creating custom graphs. I can think of many uses for simple one-element graphs (temperature, emails sent per day, etc).
Sounds reasonable. I've found a way of doing this that keeps as much
as possible of the RRD handling in Hobbit, and makes it easy to use
custom scripts (written in your favourite scripting language) to
process a message and pick out the interesting data you want to put
into a graph.
Basically, you tell hobbitd_larrd which status- or data-messages are
handled by an external script, and what the script is. Your script
is then called when such a message arrives, and is fed the status
message in a file. In return, the script must output the RRD
definitions for the data you want to store, a filename for the RRD
file, and the values.
E.g. if you have a message like
green Weather in Copenhagen is FAIR
Temperature: 6
Wind: 4
Humidity: 72
and you want to track these, then this script would do:
#!/bin/sh
# Input parameters: Hostname, testname (column), and messagefile
HOSTNAME="$1"
TESTNAME="$2"
FNAME="$3"
# Analyze the message we got
TEMP=`cat "$FNAME" | grep "^Temperature:" | awk '{print $2}'`
WIND=`cat "$FNAME" | grep "^Wind:" | awk '{print $2}'`
HMTY=`cat "$FNAME" | grep "^Humidity:" | awk '{print $2}'`
# The RRD dataset definition
echo "DS:temperature:GAUGE:600:-30:50"
echo "DS:wind:GAUGE:600:0:U"
echo "DS:humidity:GAUGE:600:0:100"
# The filename
echo "weather.rrd"
# The data
echo "$TEMP:$WIND:$HMTY"
exit 0
Does that seem like a usable plug-in facility ?
Henrik
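To try the parsing step by hand, without hobbitd_larrd, the same logic can be wrapped in a function and fed the sample message from a temporary file. This is a sketch: `weather_rrd_output` is a hypothetical name, and only the output order (DS definitions, filename, values) is taken from the description above.

```shell
#!/bin/sh
# Sketch of the parsing step as a function, so it can be tried without
# hobbitd_larrd. weather_rrd_output is a hypothetical name; the output
# order (DS definitions, filename, values) follows the description above.
weather_rrd_output() {
    fname="$1"
    temp=$(awk '/^Temperature:/ { print $2 }' "$fname")
    wind=$(awk '/^Wind:/ { print $2 }' "$fname")
    hmty=$(awk '/^Humidity:/ { print $2 }' "$fname")
    echo "DS:temperature:GAUGE:600:-30:50"
    echo "DS:wind:GAUGE:600:0:U"
    echo "DS:humidity:GAUGE:600:0:100"
    echo "weather.rrd"
    echo "$temp:$wind:$hmty"
}

# Feed it the sample message from a temporary file and show the data line
msg=$(mktemp)
cat > "$msg" <<'EOF'
green Weather in Copenhagen is FAIR
Temperature: 6
Wind: 4
Humidity: 72
EOF
weather_rrd_output "$msg" | tail -n 1
rm -f "$msg"
```

For the sample message the data line comes out as 6:4:72, in the same order as the three DS definitions.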
list Charles Jones
That would be cool. It would require users to learn about the RRD definition options, but at least they wouldn't have to code and compile their own C module. Would it also create a default entry for them in hobbitgraph.cfg, or would they have to learn those options as well? The plugin system sounds cool...any chance it will make it into an RC before 4.0 final? -Charles
list Bruce Lysik
Does that seem like a usable plug-in facility ?
Something like that sounds awesome. In the near future I'm going to want to graph some data from SQL queries, and this would be much simpler. -- Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid> Operations Engineer
list Kevin Hanrahan
I like that feature a lot!
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Friday, February 04, 2005 7:11 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Graph Limits
Importance: Low
▸
On Tue, Feb 01, 2005 at 05:10:38PM -0700, Charles Jones wrote:
Sometimes a large spike can ruin a graph, because the graph then scales to the max size of the spike, so the rest of the graph isn't very readable as it is now scrunched down to almost a flat line. Is it possible to have an option for Hobbit's larrding so that you can define a maximum threshold not to extend past?
I've worked on this a bit, and so far I've come up with the solution that you can see on my site, e.g. if you go to http://www.hswn.dk/hobbit/servers/ , pick the "cpu" display for the first host, and click the graph to see the 4 periodic graphs.
Direct link: http://www.hswn.dk/hobbit-cgi/hobbitgraph.sh?host=voodoo.hswn.dk&service=la
At the bottom of the page you can define the lower and upper limit for the graph, and which periodic graph you want to see. E.g. enter "0.5" in the upper limit, click "View" and you'll get graphs that cut off spikes at load 0.5.
Is that something like what you had in mind?
Henrik
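For readers who want to try the same idea directly against an RRD file, here is a rough sketch using rrdtool's own axis options (--lower-limit, --upper-limit and --rigid). How hobbitgraph passes the form values on is not stated in this thread, so treat the mapping as an assumption. The helper name graph_cmd, the output file la.png and the data source name "la" are also assumptions; the function only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an rrdtool command with fixed y-axis limits.
graph_cmd() {
    rrdfile="$1"; lower="$2"; upper="$3"
    printf 'rrdtool graph la.png --start -48h'
    # --rigid keeps rrdtool from autoscaling past the given limits
    printf ' --lower-limit %s --upper-limit %s --rigid' "$lower" "$upper"
    # Assumption: the RRD file holds a data source named "la"
    printf ' DEF:la=%s:la:AVERAGE LINE2:la#ff0000:load\n' "$rrdfile"
}

# Cut the graph off at load 0.5, as in the example above
graph_cmd la.rrd 0 0.5
```

Without --rigid, rrdtool treats the limits as hints and still expands the axis when a spike exceeds them, which is the behaviour the original question complains about.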
list Kevin Hanrahan
I could use this for a great number of data sets!
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Friday, February 04, 2005 5:51 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Custom graphs (was: temperature-larrd.pl)
Importance: Low
list Charles Jones
I like! :-) I like the cpu test too... is that a test that is available on the Hobbit server? -Charles
list Henrik Størner
▸
On Sat, Feb 05, 2005 at 04:16:09AM -0700, Charles Jones wrote:
I like! :-) I like the cpu test too... is that a test that is available on the Hobbit server?
The cpu test (and the other client-side tests) show up if you install the Big Brother client on the servers you're monitoring. You can get it as part of the Big Brother package on bb4.org.
There is some work underway on a free (i.e. Open Source) client implementation; Emmanuel Dreyfus has been working on a client for NetBSD, and I think it's designed so that it will be easy to make it handle other Unix-like operating systems.
Regards,
Henrik
list Charles Jones
I'm using the bb client on remote machines, including the cpu test...Hobbit just graphs the load average but doesn't include the ps output like yours does. I'm going to guess it's because the bb clients I am using are 1.9e :-) -Charles
list Henrik Størner
▸
On Sat, Feb 05, 2005 at 04:32:19AM -0700, Charles Jones wrote:
I'm using the bb client on remote machines, including the cpu test...Hobbit just graphs the load average but doesn't include the ps output like yours does. I'm going to guess it's because the bb clients I am using are 1.9e :-)
You can get that too, if you have "top" installed on the systems. In your client's etc/bbsys.local, add this:
TOP="/usr/bin/top"
TOPARGS="-b -n 1"
export TOP TOPARGS
This is for "top" on Linux; I think the Solaris parameters need a bit of tweaking.
Henrik
list Henrik Størner
▸
On Thu, Feb 03, 2005 at 11:41:58AM -0600, Thiesfeld, Gordon wrote:
Is this something that is going to be implemented into hobbit in the future? If not, I'd appreciate some help modifying hobbitd_larrd. The only thing I can do with C is spell it:-)
Yes, I've added a "temperature" handler to Hobbit's RRD module.
▸
I'm not using temperature.sh on the client side, but I can format my script so that the output will match it for consistency.
If you do, then it should pick up the data and graph it automatically. Henrik
list Olivier Beau
Hi, Cacti does this a cool way (with php/gd); a demo site : http://www.bigspring.k12.pa.us/cacti/graph_view.php?action=tree&tree_id=31&leaf_id=408&select_first=true click on the magnifying glass and then select in the graph what you want to zoom on... Olivier
--
Olivier Beau
list Henrik Størner
▸
On Sun, Feb 06, 2005 at 09:57:09AM +0100, user-fe6e0e6a0d05@xymon.invalid wrote:
Hi, Cacti does this a cool way (with php/gd); a demo site : http://www.bigspring.k12.pa.us/cacti/graph_view.php?action=tree&tree_id=31&leaf_id=408&select_first=true click on the magnifying glass and then select in the graph what you want to zoom
Agreed, that *is* cool. It seems they are also using rrdtool as the back-end, so it shouldn't be too hard to implement something similar in Hobbit. For now, I'll leave it "as-is" - but this looks like one of those "wow" features that are very nice to have when you're trying to convince someone to use Hobbit :-) So I'll get back to it later, unless someone else would like to contribute code for it. Henrik
list Daniel Magnuszewski
I have created this functionality with a Big Brother external script that integrates Big Brother and Cacti. The script is called BigCactus and it's available on deadcat. I am very new to Hobbit, and I have just joined the list (I haven't even installed Hobbit yet). With that said, if BB external scripts work in Hobbit, then this should work with no problem.
If you already have Cacti installed, you can integrate the two, similar to the bbmrtg MRTG scripts (with the ability to set thresholds, etc). The only problem currently is that I haven't written many threshold-checking tests (only for Novell servers so far). Using this script lets you go right from the Hobbit page to the Cacti page that contains the zoom functionality.
One problem I found with the magnifying glass zoom (if/when someone begins writing this functionality) is that when you zoom in on the graph, there is a DHTML layer over the image. This causes problems when trying to print the graph in IE, but apparently not in Firefox, Mozilla, etc.
Daniel Magnuszewski
CCNA
M & T Bank
user-2179d46e0f82@xymon.invalid
list Bruce Lysik
Hey, Just wanted to say I started using the custom graphing feature. Manager is very impressed. Thanks Henrik. -- Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid> Operations Engineer
list Craig Boyce
Hi,
I have moved from BB to Hobbit and am having problems getting the memory data graphed from all the Windows clients. No graph is being generated for memory, while CPU and disk are graphed. I have also added Henrik's Citrix script, and these stats are not being passed through.
I have checked the LARRDS and GRAPHS sections of hobbitserver.cfg, and the entries to graph these items are enabled by default. I have enabled the debug options for larrdstatus and larrddata, but in the log files I can see no entries for the client memory or the Citrix stats.
Any ideas?
Thanks
Craig Boyce
#####################################################################################
Disclaimer:
The information in this electronic mail message is confidential and may be legally privileged.
It is intended solely for the Addressee. Access to this internet electronic mail message by anyone else is unauthorised.
If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it is prohibited and may be unlawful.
If you have received this message in error please notify us immediately.
Rodney District Council accepts no responsibility for any effects this email message or attachments has on the recipient network or computer system.
#####################################################################################
list Andy France
Hi Craig, "Craig Boyce" wrote on 21/06/2005 14:05:21:
▸
Hi, I have moved from BB to hobbit and am having problems getting the memory data graphed from all the windows clients, There is no graph being generated for memory while Cpu and Disk are. [...]
CPU (load) and DISK (percent) are provided by the base NT client. Do you
already have the bb-memory script from Deadcat installed on each client to
pass the memory and netstat data? I'm guessing you had the graphs under BB,
since you are asking where they have gone :-)
Are you missing the whole "memory" column, or is it just the graph that is
broken? Is it the same for "citrix"? If you have the columns but
broken/missing graphs try deleting any existing rrd files for these tests
on the hobbit server.
I have both of these working OK, so feel free to drop me a note. Nice to
see another kiwi on the list!
Andy.
#####################################################################################
This email is intended for the person to whom it is addressed
only. If you are not the intended recipient, do not read, copy
or use the contents in any way. The opinions expressed may not
necessarily reflect those of ZESPRI Group of Companies ('ZESPRI').
While every effort has been made to verify the information
contained herein, ZESPRI does not make any representations
as to the accuracy of the information or to the performance
of any data, information or the products mentioned herein.
ZESPRI will not accept liability for any losses, damage or
consequence, however, resulting directly or indirectly from
the use of this e-mail/attachments.
#####################################################################################
list Fabio Flores
Hi All,
I'm trying once again to get my custom graphs to work.
The rrd files are OK, I've tested them separately, but when I try to merge
them into one graph, it won't work. Here is what I have in my hobbitgraph.cfg:
[jms]
FNPATTERN jms(.*).rrd
TITLE JMS Queues
YAXIS Num of Messages
DEF:IN@RRDIDX@=@RRDFN@:IN:AVERAGE
LINE2:IN@RRDIDX@#@COLOR@:@RRDPARAM@
GPRINT:IN@RRDIDX@:LAST: \: %5.1lf (cur)
GPRINT:IN@RRDIDX@:MAX: \: %5.1lf (max)
GPRINT:IN@RRDIDX@:MIN: \: %5.1lf (min)
GPRINT:IN@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
I might be missing something, but I've checked the mailing list archive and
the RRD documentation and couldn't figure it out.
The point is that I have several jms,foo.rrd files, each with different
DS: definitions; for a start I'm using only the "IN" definition.
Hope someone can help.
Thanks.
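To make the placeholder mechanics in a multi-file graph definition easier to picture, here is an illustration only, not Hobbit's implementation: it repeats one template line for each RRD file and substitutes @RRDIDX@, @RRDFN@ and @RRDPARAM@. The helper name expand_line, the second file name jms,bar.rrd and the choice to start numbering at 0 are all assumptions made for the example.

```shell
#!/bin/sh
# Illustration: expand one template line per RRD file.
# expand_line is a hypothetical helper, not part of Hobbit.
expand_line() {
    template="$1"; shift
    idx=0
    for fn in "$@"; do
        # The text between "jms," and ".rrd" plays the role of @RRDPARAM@
        param=${fn#jms,}
        param=${param%.rrd}
        # Note: file names containing "|" or "&" would confuse this sed call
        printf '%s\n' "$template" |
            sed -e "s|@RRDIDX@|$idx|g" \
                -e "s|@RRDFN@|$fn|g" \
                -e "s|@RRDPARAM@|$param|g"
        idx=$((idx + 1))
    done
}

expand_line 'DEF:IN@RRDIDX@=@RRDFN@:IN:AVERAGE' jms,foo.rrd jms,bar.rrd
```

If the expansion works roughly like this, every file matched by FNPATTERN ends up in a DEF that asks for a data source named IN, so a file whose DS definitions do not include IN would make the whole graph fail.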
list Didier Degey
Hello Andy, Craig,
I have exactly the same problem. I have tried the solution that Andy proposed, but it did not change anything: the memory graph is still missing (and only the graph).
Craig, did the solution change anything for you?
Didier.
list Craig Boyce
Hi Didier, My previous BB install was building the memory graphs from the CPU stats returned from the BB client and this was not working with Hobbit. I installed the bb-memory script from deadcat and set the saved logs location on the client and it started working straight away. Craig
list Didier Degey
Hello Craig,
I was using BBCheckMemory.vbs. I changed to bb-memory and it works well too.
Thanks,
Didier.
list Thomas Kern
I have two non-Linux systems that I run a specialized client on. On one of these systems, I have a special test for the CPU utilization of logical partitions. I used the NCV method of getting the data (LPARname:CPUvalue) into the RRD files supported by Hobbit. I have been able to add a graph to the status page for the LPAR column, with a graph definition that lists each of the known LPARs in this system and the total value that is also sent. This has been working for several weeks.
Now I am trying to add the second system. It has a different set of LPAR names; these names must be unique across our organization. I asked our Hobbit admin to update hobbitgraph.cfg with a modified [lpar] definition that included the second system's LPAR names, in the hope that RRD would simply report the entries from the other system as 'Not Available' or display nothing for them. I had hoped that I could use ONE graph definition for the [lpar] column no matter which system was supplying the data. Instead, both graphs had an error about one of the OTHER system's DS entries not being present.
Is it possible to have two graph definitions for the same column, such as [vm1.lpar] and [vmhost.lpar], instead of a single [lpar] definition?
Here are copies of the two status pages:
Thu Jan 18 10:05:06 EST 2007 General Processor LPAR utilization within thresholds
LEGACY:1.0
ZOSEPROD:26.7
ZOSPROD:2.4
Total:30.1
Thu Jan 18 10:05:13 EST 2007 IFL Processor LPAR utilization within thresholds
IFLPROD:1.3
Total:1.3
Here is the combined graph definition:
[lpar]
TITLE z890 LPAR CPU Utilization
YAXIS Percent
FNPATTERN lpar.rrd
-u 100
-l 0
DEF:LEGACY=lpar.rrd:LEGACY:AVERAGE
DEF:IFLPROD=lpar.rrd:IFLPROD:AVERAGE
DEF:ZOSPROD=lpar.rrd:ZOSPROD:AVERAGE
DEF:IFLTEST=lpar.rrd:IFLTEST:AVERAGE
DEF:ZOSEPROD=lpar.rrd:ZOSEPROD:AVERAGE
DEF:Total=lpar.rrd:Total:AVERAGE
LINE2:LEGACY#00cc00:Legacy
LINE2:IFLPROD#00cc00:IFL-Production
LINE2:ZOSPROD#ff0000:ZOSProd
LINE2:IFLTEST#00cc00:IFL-Test
LINE2:ZOSEPROD#0000ff:ZOSEProd
LINE2:Total#ff00ff:Total
COMMENT: \n
GPRINT:LEGACY:LAST:LEGACY \: %5.1lf (cur)
GPRINT:LEGACY:MAX: \: %5.1lf (max)
GPRINT:LEGACY:MIN: \: %5.1lf (min)
GPRINT:LEGACY:AVERAGE: \: %5.1lf (avg) \n
GPRINT:IFLPROD:LAST:IFLPROD \: %5.1lf (cur)
GPRINT:IFLPROD:MAX: \: %5.1lf (max)
GPRINT:IFLPROD:MIN: \: %5.1lf (min)
GPRINT:IFLPROD:AVERAGE: \: %5.1lf (avg) \n
GPRINT:ZOSPROD:LAST:ZOSPROD \: %5.1lf (cur)
GPRINT:ZOSPROD:MAX: \: %5.1lf (max)
GPRINT:ZOSPROD:MIN: \: %5.1lf (min)
GPRINT:ZOSPROD:AVERAGE: \: %5.1lf (avg) \n
GPRINT:IFLTEST:LAST:IFLTEST \: %5.1lf (cur)
GPRINT:IFLTEST:MAX: \: %5.1lf (max)
GPRINT:IFLTEST:MIN: \: %5.1lf (min)
GPRINT:IFLTEST:AVERAGE: \: %5.1lf (avg) \n
GPRINT:ZOSEPROD:LAST:ZOSEPROD \: %5.1lf (cur)
GPRINT:ZOSEPROD:MAX: \: %5.1lf (max)
GPRINT:ZOSEPROD:MIN: \: %5.1lf (min)
GPRINT:ZOSEPROD:AVERAGE: \: %5.1lf (avg) \n
GPRINT:Total:LAST:Total \: %5.1lf (cur)
GPRINT:Total:MAX: \: %5.1lf (max)
GPRINT:Total:MIN: \: %5.1lf (min)
GPRINT:Total:AVERAGE: \: %5.1lf (avg) \n
/Thomas Kern
/XXX-XXX-XXXX
list Jeremy Ruffer
I'm obviously missing something here.
I have an external test that creates columns Humidity and Temperature as follows:
Fri Feb 1 11:01:39 GMT 2008
Humidity : 21.0
and
Fri Feb 1 11:01:39 GMT 2008
Temperature : 24.7
I have added the columns in hobbitserver.cfg:
TEST2RRD="cpu=la,disk,inode,qtree,memory,$PINGCOLUMN=tcp,http=tcp,dns=tcp,dig=tcp,time=ntpstat,vmstat,iostat,netstat,temperature,apache,bind,sendmail,mailq,nmailq=mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,files,procs=processes,ports,clock,lines,Temperature=ncv,Humidity=ncv"
GRAPHS="la,disk,inode,qtree,files,processes,memory,users,vmstat,iostat,tcp.http,tcp,ncv,netstat,ifstat,mrtg::1,ports,temperature,ntpstat,apache,bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,clock,lines,Temperature,Humidity"
NCV_Temperature="Temperature:GAUGE"
NCV_Humidity="Humidity:GAUGE"
I did wonder if it was the decimal point, so I took it out, but that didn't make any difference.
The rrd files aren't being created.
What have I missed?
Jeremy
This message, and any associated files, are intended only for the use of the message recipient and may contain information that is confidential, subject to copyright or constitute a trade secret. If you are not the message recipient you are hereby notified that any dissemination, copying or distribution of this message, or files associated with this message, is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to the message and then deleting it from your computer. HSS Hire Service Group Limited may monitor email traffic data and also the content of email for the purposes of security and staff training. Any views or opinions presented are solely those of user-020a2aa3cf14@xymon.invalid and do not necessarily represent those of the company. HSS Hire Service Group is a limited company registered in England and Wales. Registered number: 2103564.
Registered office: 25 Willow Lane, Mitcham, Surrey, CR4 4TS, United Kingdom.
list James Wade
Jeremy,
Try adding another column to one of the tests:
Humidity : 21.0
Test : 19.0
I had a problem where I was feeding in four values and the fourth one would never appear in the rrd file. I figured out that it needed a newline when I sent the status message:
`cat $line`
"
I just put the closing quote on the line below the cat. When I had it on the same line:
`cat $line`"
it didn't read in the fourth value.
I'm wondering whether, because you only have one value, the newline isn't sent and so the rrd file isn't being created. By adding a value below Humidity or Test, you could check for it.
The other thing I did when I created my test was to restart the Hobbit server.
James
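The trailing-newline idea above can be sketched as follows. This is only an illustration of the suggestion, not a confirmed fix: the helper name `build_ncv_status`, the host name and the green colour are assumptions, and whether a missing final newline really stops the last value from being parsed is the poster's hypothesis.

```python
def build_ncv_status(host, test, color, values, summary=""):
    """Return a status message whose "name : value" lines all end in a newline.

    Hypothetical helper; it only builds the text and sends nothing.
    """
    # Dots in the host name are written as commas in the message header
    header = f"status {host.replace('.', ',')}.{test} {color} {summary}".rstrip()
    lines = [header]
    for name, value in values.items():
        lines.append(f"{name} : {value}")
    # Join with newlines and terminate the last value line as well
    return "\n".join(lines) + "\n"


msg = build_ncv_status("sensor1.example.com", "Humidity", "green",
                       {"Humidity": 21.0})
print(msg, end="")
```

Building the whole message in one place makes it harder to lose the final newline than closing a shell quote on the same line as the last value.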
From: Jeremy Ruffer [mailto:user-020a2aa3cf14@xymon.invalid]
Sent: Friday, February 01, 2008 5:16 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Custom graphs
list dOCtoR MADneSs
Hi,
I created some tests, and I get graphs from them using splitncv. Almost everything works (I get values, they are stored in separate files, and the graphs are made), but I have one small issue. On the trends page, I get my custom graphs, followed by 2 useless lines (carrying the names of my custom graphs). I have attached a screenshot, which describes it better.
I'd like to remove those useless lines, that's all. I tried adding, removing and modifying the GRAPHS section in hobbitserver.cfg and the TRENDS definition in bb-hosts, without any result (the number of useless lines does not change).
Any help is very welcome.
list Ryan Novosielski
Hi,
I searched the web a bit and found the Xymon docs and there was mention of two ways of doing custom graphs: NCV and trends messages. NCV doesn't really suit me as my scripts have too much formatting in them and would require a rewrite to have the separate lines for values as is needed by NCV.
I happened to check the docs on my 4.2.3 server however and I see that the part about sending trends messages is missing. So, I've got two questions:
1) Was sending trends messages added in Xymon 4.3.x and not available in 4.2.3?
2) At the end of this:
http://www.xymon.com/xymon/help/howtograph.html
...there's a description of what to put in a trends message. There is not, however, any description of what one does with it. My expectation is that you do the following:
xymon <display> `cat trends.file`
...where the trends file is like the one described in the manual (e.g. starts with "data hostname.test"). Essentially, similar to sending a status message (but sending a data message instead). If this is the case, it might be helpful to have a note there in the manual, or at least a reference to the part of the manual I'm about to go hunt for that I assume mentions data messages.
--
Ryan Novosielski - Sr. Systems Programmer
user-ae4522577e16@xymon.invalid - 973/972.0922 (2-0922)
Univ. of Med. and Dent. | IST/EI-Academic Svcs. - ADMC 450, Newark
list Wim Nelis
Hello,
I searched the web a bit and found the Xymon docs and there was mention of two ways of doing custom graphs: NCV and trends messages. NCV doesn't really suit me as my scripts have too much formatting in them and would require a rewrite to have the separate lines for values as is needed by NCV.
Perhaps a rewrite is not needed. If the values to be entered in the graph are still available once the status message is built, you could add an HTML comment section to the message containing the NCVs, formatted as Xymon expects them. Additionally, you could replace the ':' and the '=' in the original status message with the corresponding HTML escape characters. This will prevent the NCV module from extracting values from the original message.
HTH,
Wim Nelis.
******************************************************************************************************************
The NLR disclaimer is valid for NLR e-mail messages.
This message is only meant for providing information. Nothing in this e-mail message amounts to a contractual or legal commitment on the part of the sender.
This message may contain information that is not intended for you. If you are not the addressee or if this message was sent to you by mistake, you are requested to inform the sender and delete the message.
Sender accepts no liability for damage of any kind resulting from the risks inherent in the electronic transmission of messages.
******************************************************************************************************************
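A minimal sketch of this suggestion follows. The helper names `hide_from_ncv` and `add_ncv_comment` are hypothetical, and the sketch assumes, as the message above suggests, that the NCV module reads name/value lines placed inside an HTML comment; that behaviour has not been verified here.

```python
def hide_from_ncv(text):
    """Replace ':' and '=' with HTML numeric character references.

    A browser still renders them as ':' and '=', but the raw text no
    longer contains the delimiters themselves.
    """
    return text.replace(":", "&#58;").replace("=", "&#61;")


def add_ncv_comment(formatted_text, values):
    """Escape the human-readable text and append the values in an HTML comment."""
    ncv_lines = "\n".join(f"{name} : {value}" for name, value in values.items())
    return f"{hide_from_ncv(formatted_text)}\n<!--\n{ncv_lines}\n-->\n"


print(add_ncv_comment("Load: 1.5 (limit=4)", {"load": 1.5}))
```

The formatted report stays unchanged for the reader, while the graphing data travels in a block that the browser does not display.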
list Ryan Novosielski
On 09/25/2012 12:41 AM, Nelis, Wim wrote:
Perhaps a rewrite is not needed. If the values to be entered in the graph are still available once the status message is built, you could add an HTML comment section to the message containing the NCVs, formatted as Xymon expects them. Additionally, you could replace the ':' and the '=' in the original status message by the corresponding HTML escape character. This will prevent the NCV module from extracting values from the original message.
That is actually a pretty interesting idea, though it ends up being about the same amount of work as using a trends message when it comes down to it (except I guess it cuts down on the number of messages). I guess the NCV will use either = or : as a delimiter?
list Ryan Novosielski
On 09/24/2012 04:55 PM, Ryan Novosielski wrote:
Hi, I searched the web a bit and found the Xymon docs and there was mention of two ways of doing custom graphs: NCV and trends messages. NCV doesn't really suit me as my scripts have too much formatting in them and would require a rewrite to have the separate lines for values as is needed by NCV. I happened to check the docs on my 4.2.3 server however and I see that the part about sending trends messages is missing. So, I've got two questions: 1) Was sending trends messages added in Xymon 4.3.x and not available in 4.2.3?
The answer to this question I found is "no, the feature was indeed available in 4.2.3 but just missing from that section of the docs."
2) At the end of this: http://www.xymon.com/xymon/help/howtograph.html ...there's a description of what to put in a trends message. There is not, however, any description of what one does with it. My expectation is that you do the following: xymon <display> `cat trends.file` ...where the trends file is like the one described in the manual (eg. starts with "data hostname.test"). Essentially, similar to sending a status message (but sending a data message instead). If this is the case, it might be helpful to have a note there in the manual, or at least a reference to the part of the manual I'm about to go hunt for that I assume mentions data messages.
And this was correct as well.
A follow-up question relates to the apparent requirement that you add the RRD/test name to both the GRAPH and the TEST2RRD variables in hobbitserver.cfg. Is there any way to add the graph to just the test page and not the trends page? Why do both of these variables exist if it seems like you need both of them for it to work?
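The data-message approach confirmed above can be sketched like this. The helper names are hypothetical, and the `[weather.rrd]` / `DS:` body is only an assumed example of the layout described on the howtograph page, so check it against the manual before relying on it. The command has the same shape as the `xymon <display> ...` invocation quoted in the question.

```python
def build_data_message(host, test, body):
    """Prefix a prepared body with a "data hostname.test" header line.

    Hypothetical helper; dots in the host name are written as commas.
    """
    header = f"data {host.replace('.', ',')}.{test}"
    return header + "\n" + body.rstrip("\n") + "\n"


def xymon_command(display, message, xymon_bin="xymon"):
    """Return the argument list for: xymon <display> <message>."""
    return [xymon_bin, display, message]


# Assumed example body: one RRD file section with a single data source
body = "[weather.rrd]\nDS:temperature:GAUGE:600:0:U 24.7\n"
message = build_data_message("sensor1.example.com", "trends", body)
print(xymon_command("127.0.0.1", message))
```

Passing the returned list to `subprocess.run` would send the message; the sketch stops short of that so it can be tried without a Xymon server.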
list Chris Morris
The comments in the "xymonserver.cfg" file are quite clear:
TEST2RRD
# This is also used by the svcstatus.cgi script to determine if the detailed
# status view of a test should include a graph.
GRAPH
# This defines which RRD files to include on the "trends" column webpage,
# and the order in which they appear.
-----Original Message-----
From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Ryan Novosielski
Sent: 25 September 2012 07:15
To: xymon at xymon.com
Subject: Re: [Xymon] Custom Graphs
**************************************************************************** The information contained in this email is intended only for the use of the intended recipient at the email address to which it has been addressed. If the reader of this message is not an intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination or copying of the message or associated attachments is strictly prohibited. If you have received this email in error, please contact the sender by return email or call 01793 877777 and ask for the sender and then delete it immediately from your system.Please note that neither the RWE Group of Companies nor the sender accepts any responsibility for viruses and it is your responsibility to scan attachments (if any).
*****************************************************************************
list Ryan Novosielski
Adding it to the GRAPH variable didn't seem to do anything if it wasn't also present in the TEST2RRD variable, and vice versa. I will double-check that, but it didn't seem to be possible to do one or the other.
On 09/25/2012 04:16 AM, user-e510f6c03e57@xymon.invalid wrote:
The comments in the "xymonserver.cfg" file are quite clear:
TEST2RRD
# This is also used by the svcstatus.cgi script to determine if the detailed
# status view of a test should include a graph.
GRAPH
# This defines which RRD files to include on the "trends" column webpage,
# and the order in which they appear.
list Alias
Our weakness with Xymon is forecasting disk trends. With the right thresholds I can be proactive in managing disk capacity; however, it's not possible to check the past 576 days for hundreds of servers. Metric Reporting results are blank when selecting multiple hosts and disks. It does work for CPU, but it compiles the results for all servers into one graph. I would like to see the disk trends of 200 hosts displayed as 200 graphs on one page.
Has anyone got any suggestions, or "been there, done that"?
Regards
list Paul Root
Hi,
This is one of my least favorite things of xymon. It really shouldn't be this hard.
I've had a script that pulls the %mem of certain processes on various machines. Right now, I have 3 separate groups of machines with different processes being watched. The purpose is to track memory leaks.
Previously, I had the script just put out a ps -eo 'pid,%mem,cmd' for the desired processes. Then others wanted a graph, so I've reformatted the output to be just a keyword (either the process name or a significant field of the arguments of the command).
I get something that looks like this now:
Process Memory Usage
SA: 6.5
MBIM: 2.4
OI: 4.1
Process' alert levels are defined in /usr/local/etc/procmem.cfg
When levels are exceeded, restarting of the process to free memory is recommended
xymProcMem,v 1.7 2014/07/07 20:37:17 ptroot Exp ptroot $
My first iteration, I used <br> to separate the lines. That was a mistake, the source of the html showed them all on one line. I've changed to \n, and that to me should provide the proper output.
However, looking at the rrd dump, I only get the first one:
<!-- Round Robin Database Dump --><rrd> <version> 0003 </version>
<step> 300 </step> <!-- Seconds -->
<lastupdate> 1404828303 </lastupdate> <!-- 2014-07-08 09:05:03 CDT -->
<ds>
<name> SA </name>
<type> GAUGE </type>
<minimal_heartbeat> 600 </minimal_heartbeat>
<min> NaN </min>
<max> NaN </max>
<!-- PDP Status -->
<last_ds> 6.5 </last_ds>
<value> NaN </value>
<unknown_sec> 3 </unknown_sec>
</ds>
<ds>
<name> xymProcMemv16201407 </name>
<type> DERIVE </type>
<minimal_heartbeat> 600 </minimal_heartbeat>
<min> NaN </min>
<max> NaN </max>
<!-- PDP Status -->
<last_ds> 50 </last_ds>
<value> NaN </value>
<unknown_sec> 3 </unknown_sec>
</ds>
My TEST2RRD looks like:
TEST2RRD="cpu=la,disk,inode,...,nfmsgw=ncv,hpnasnapshot=ncv,ProcMemory=ncv"
And the NCV definitions:
NCV_hpnasnapshot="TotaldevicesinDB:GAUGE,Totalactivedevicesi:GAUGE,Totalinactivedevice:GAUGE,Totaldeviceswithout:GAUGE,Pctactive:GAUGE,Pctinactive:GAUGE,Pctnodriver:GAUGE,Activeattemptedsucc:GAUGE,Activeattemptedunsu:GAUGE,Activebutnotattempt:GAUGE,PctAttemptedsuccess:GAUGE,PctAttemptedunsucce:GAUGE,PctActivebutnotatte:GAUGE"
NCV_ProcMemory="SA:GAUGE,MBIM:GAUGE,OI:GAUGE,MNS1:GAUGE,NMS1:GAUGE,NMS2:GAUGE,NMS11:GAUGE,NMS12:GAUGE,NMS13:GAUGE,NMS14:GAUGE,NMS15:GAUGE,TCMgmtEngine:GAUGE,TCTFTP:GAUGE,TCSyslog:GAUGE,SWIM:GAUGE"
The hpnasnapshot graph works. I've been using it as an example.
Obviously, without all the data in the rrd, there is no point in trying to get a graph to show.
Any ideas from anyone?
Thanks,
Paul.
Paul Root
Lead Engineer
CenturyLink Network Reliability Operations Center
600 Stinson Blvd, N.E.
Flr 2N
Minneapolis, MN 55413
Direct: (651)312-5207
user-76fdb6883669@xymon.invalid
list Paul Root
I decided to delete all the rrd files for this test, and after the next run (hourly) the data files are now filled out correctly. On to the graph...
From: Root, Paul T
Sent: Tuesday, July 08, 2014 11:20 AM
To: 'xymon at xymon.com'
Subject: custom graphs
list Jeremy Laidman
On 9 July 2014 02:20, Root, Paul T <user-76fdb6883669@xymon.invalid> wrote:
I get something that looks like this now:
Your status text looks OK to me, as do your configuration settings.
xymProcMem,v 1.7 2014/07/07 20:37:17 ptroot Exp ptroot $
Except for this. You realise that this is being interpreted as an NCV line? That's how you got your second DS.
My first iteration, I used <br> to separate the lines. That was a mistake, the source of the html showed them all on one line.
But it also created the second DS called "xymProcMemv16201407" (from the version string, with non-text characters removed and truncated to 19 chars).
I've changed to \n, and that to me should provide the proper output.
Too late, the DS names were already set.
<name> xymProcMemv16201407 </name>
Once the RRD file is created, it won't get recreated with any new DS names. You have to delete the RRD file, or do an export/edit/import process, to get the DS names you need.
J
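To illustrate how a version string can turn into a data-source name, here is a rough sketch of the behaviour described above: split the line at the first ':' or '=', drop non-alphanumeric characters from the name, and truncate it to 19 characters. The function names and the regular expression are assumptions made for illustration; Xymon's actual parser may differ in detail.

```python
import re


def ncv_ds_name(label, max_len=19):
    """Strip non-alphanumeric characters and truncate, as described in the thread."""
    return re.sub(r"[^A-Za-z0-9]", "", label)[:max_len]


def parse_ncv_line(line):
    """Return (ds_name, value) if the line looks like "name : number", else None."""
    match = re.match(r"\s*([^:=]+)[:=]\s*(-?\d+(?:\.\d+)?)", line)
    if match is None:
        return None
    return ncv_ds_name(match.group(1)), float(match.group(2))


for text in ["Process Memory Usage",
             "SA: 6.5",
             "xymProcMem,v 1.7 2014/07/07 20:37:17 ptroot Exp ptroot $"]:
    print(repr(text), "->", parse_ncv_line(text))
```

Under these assumptions the version line yields a name and a number taken from the timestamp, which is why keeping stray colons out of the status text (or escaping them) matters.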
list W.J.M. Nelis
Hi,
I've had a script that pulls the %mem of certain processes on various machines. Right now, I have 3 separate groups of machines with different processes being watched. The purpose is to track memory leaks.
Previously, I had the script just put out a ps -eo 'pid,%mem,cmd' for the desired processes. Then others wanted a graph, so I've reformatted the output to be just a keyword (either the process name or a significant field of the arguments of the command).
I get something that looks like this now:
Process Memory Usage
SA: 6.5
MBIM: 2.4
OI: 4.1
You might consider using the Devmon way of reporting data instead of NCV. It has the advantage that you can report data to be entered in multiple RRDs. In Devmon one RRD per interface is used; in your case one RRD per group of machines could be used. The Devmon format also has the advantage that colons or equals signs in the message will not confuse the extractor of the RRD data.
Regards,
Wim Nelis.
list Paul Root
Yes, I'm working on that now. I pretty quickly came to the realization that the single RRD file wasn't going to work.
From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of W.J.M. Nelis
Sent: Wednesday, July 09, 2014 1:49 AM
To: xymon at xymon.com
Subject: Re: [Xymon] custom graphs