Xymon Mailing List Archive

ganglia-style graph aggregation with hobbit

15 messages in this thread

Gildas le Nadan · Wed, 20 Sep 2006 16:51:25 +0100
Hello,

I'd like to completely replace the ganglia system we have here with hobbit.

Most of the needed features are there, except the ability to produce aggregated graphs with multiple hosts, as in http://monitor.millennium.berkeley.edu/?c=PSI%20Cluster&m=&r=hour&s=descending&hc=4

I suppose the following things will need to be done:
- find a neat way to group the hosts (all the hosts in a group or a page for instance)
- collect all the values from the rrd for all the hosts in a group and add them (when different from 0/NaN)
- graph the result
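Step 2 of this plan (adding up a metric across the hosts of a group while skipping missing samples) could be sketched in C as below. It assumes the per-host samples have already been pulled out of the RRD files into arrays aligned on the same time slots, for instance with rrdtool fetch; the function aggregate_sum and its data layout are hypothetical and not part of Hobbit. Only NaN (how an UNKNOWN sample shows up once fetched) is skipped, since a 0 does not change a sum anyway.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: sum one metric across a group of hosts, slot by slot.
 * values[h][t] is the value of host h at time slot t; NaN marks a missing
 * (UNKNOWN) sample. Missing samples are skipped so they do not poison the
 * total. If no host has data for a slot, the result for that slot is NaN. */
void aggregate_sum(const double *const *values, size_t nhosts,
                   size_t nslots, double *out)
{
    for (size_t t = 0; t < nslots; t++) {
        double total = 0.0;
        size_t seen = 0;

        for (size_t h = 0; h < nhosts; h++) {
            double v = values[h][t];
            if (isnan(v))
                continue;          /* unknown: leave it out of the sum */
            total += v;
            seen++;
        }
        out[t] = seen ? total : NAN;
    }
}
```

The resulting array could then be graphed directly or written to a separate RRD; which of the two fits better depends on whether the aggregate should be kept historically.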

What would be the best way to do it? I'd rather not reinvent the wheel, so if some of the pieces already exist, I'd like to use them...

Cheers,
Gildas
Henrik Størner · Wed, 20 Sep 2006 21:39:20 +0200
quoted from Gildas le Nadan
On Wed, Sep 20, 2006 at 04:51:25PM +0100, Gildas Le Nadan wrote:
I'd like to completely replace the ganglia system we have here with hobbit.

Most of the needed features are there, except the ability to produce 
aggregated graphs with multiple hosts, as in 
http://monitor.millennium.berkeley.edu/?c=PSI%20Cluster&m=&r=hour&s=descending&hc=4
I think it can be done just by putting together the right graph
definitions. RRDtool which is used to generate the graphs has all
of the necessary functions to build such aggregate graphs, and 
Hobbit already stores all of the information you're tracking. So
it should "just" be a case of putting together the right input for
the RRD graph module.

I have done it on an ad-hoc basis by hand-coding some extra graph
definitions in the hobbitgraph.cfg file, but this is not suitable
for the case where you have lots of hosts - for that you need something
a bit more flexible that lets you select a group of hosts, and generate
a graph with the type of aggregation you want.
quoted from Gildas le Nadan
What would be the best way to do it? I'd rather not reinvent the wheel,
so if some of the pieces already exist, I'd like to use them...
Check the hobbit_hostgraph.cgi module in Hobbit 4.2, and the "multi"
definitions in hobbitgraph.cfg. The current hobbitgraph tool lets you
generate a graph for multiple hosts, but just overlaid on top of each
other. I think it might be possible to just modify the "multi"
definitions in hobbitgraph.cfg to produce an aggregate graph also.
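As an illustration of what a modified "multi" definition would need to produce, the C helper below (hypothetical, not part of hobbitgraph) generates an rrdtool CDEF that adds up per-host series named la0, la1, and so on. Each term is wrapped as x,UN,0,x,IF, so a host with an UNKNOWN sample counts as 0 instead of turning the whole sum UNKNOWN. It assumes the per-host series were already defined by DEF/CDEF lines named prefix0 to prefix(n-1).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build "CDEF:<result>=<sum of prefix0..prefix(n-1)>"
 * in RPN. Each series x is written as "x,UN,0,x,IF" (UNKNOWN -> 0), and a
 * "+" follows every term after the first. Returns the string length, or -1
 * if n < 1 or the buffer is too small. */
int build_sum_cdef(char *buf, size_t len, const char *result,
                   const char *prefix, int n)
{
    size_t used;
    int w;

    if (n < 1 || len == 0)
        return -1;
    w = snprintf(buf, len, "CDEF:%s=", result);
    if (w < 0 || (size_t)w >= len)
        return -1;
    used = (size_t)w;

    for (int i = 0; i < n; i++) {
        w = snprintf(buf + used, len - used, "%s%s%d,UN,0,%s%d,IF%s",
                     i ? "," : "", prefix, i, prefix, i, i ? ",+" : "");
        if (w < 0 || (size_t)w >= len - used)
            return -1;             /* truncated: buffer too small */
        used += (size_t)w;
    }
    return (int)used;
}
```

For two hosts, build_sum_cdef(buf, sizeof buf, "total", "la", 2) produces CDEF:total=la0,UN,0,la0,IF,la1,UN,0,la1,IF,+ which could then be drawn with a single LINE or AREA.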


Regards,
Henrik
Gildas le Nadan · Thu, 21 Sep 2006 15:39:42 +0100
quoted from Henrik Størner
Henrik Stoerner wrote:
On Wed, Sep 20, 2006 at 04:51:25PM +0100, Gildas Le Nadan wrote:
I'd like to completely replace the ganglia system we have here with hobbit.

Most of the needed features are there, except the ability to produce aggregated graphs with multiple hosts, as in http://monitor.millennium.berkeley.edu/?c=PSI%20Cluster&m=&r=hour&s=descending&hc=4
I think it can be done just by putting together the right graph
definitions. RRDtool which is used to generate the graphs has all
of the necessary functions to build such aggregate graphs, and Hobbit already stores all of the information you're tracking. So
it should "just" be a case of putting together the right input for
the RRD graph module.

I have done it on an ad-hoc basis by hand-coding some extra graph
definitions in the hobbitgraph.cfg file, but this is not suitable
for the case where you have lots of hosts - for that you need something
a bit more flexible that lets you select a group of hosts, and generate
a graph with the type of aggregation you want.
What would be the best way to do it? I'd rather not reinvent the wheel, so if some of the pieces already exist, I'd like to use them...
Check the hobbit_hostgraph.cgi module in Hobbit 4.2, and the "multi"
definitions in hobbitgraph.cfg. The current hobbitgraph tool lets you
generate a graph for multiple hosts, but just overlaid on top of each
other. I think it might be possible to just modify the "multi"
definitions in hobbitgraph.cfg to produce an aggregate graph also.


Regards,
Henrik
Hmm, I'm afraid I don't understand how it works, and I can't make it work even on a simple example: I'm trying to change la-multi in hobbitgraph.cfg so that the values are added up instead of drawn on top of each other.

Are the entries in hobbitgraph.cfg used as a template to build the rrdgraph query? If so, how can I access the values from the previous RRD file and add them to those of the current one (@RRDFN@)?

I tried adding the values to a VDEF:add=add,@RRDIDX@,+ but without success.

Any clue?

Cheers,
Gildas
Tom Georgoulias · Tue, 10 Oct 2006 11:23:58 -0400
quoted from Gildas le Nadan
Gildas Le Nadan wrote:
Hmm, I'm afraid I don't understand how it works, and I can't make it work even on a simple example: I'm trying to change la-multi in hobbitgraph.cfg so that the values are added up instead of drawn on top of each other.

Are the entries in hobbitgraph.cfg used as a template to build the rrdgraph query? If so, how can I access the values from the previous RRD file and add them to those of the current one (@RRDFN@)?

I tried adding the values to a VDEF:add=add,@RRDIDX@,+ but without success.
Did you ever work out a solution for this?

I'm starting to investigate a way to take data from many rrd files and graph the average of the data in all those rrd files as a single line. For example, I'd like to average the %CPU usage (la1) for 10 different webservers, and display it as a single overall average %CPU for a web farm.

Tom
Charles Goyard · Wed, 11 Oct 2006 10:39:12 +0200
Hi,
quoted from Tom Georgoulias

Tom Georgoulias a écrit :
I'm starting to investigate a way to take data from many rrd files and graph the average of the data in all those rrd files as a single line. For example, I'd like to average the %CPU usage (la1) for 10 different webservers, and display it as a single overall average %CPU for a web farm.
I am currently writing an extension for hobbit that works much like
bbcombotest, but it can yield green, yellow or red (bbcombotest
only has green and red). It will also sum and average NCV-like data out
of aggregated statuses. I intend to use it for load-balanced pools and
limited resources, such as X25 network lines. I hope to release it by
the end of the week. If it works as I want, I'll then try to hack it
into the hobbit core.
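The extension's actual rules are not shown in the thread; a minimal sketch of one possible three-color rule for a load-balanced pool, assuming a per-pool threshold min_ok chosen by the administrator, could look like this. The enum and function names are hypothetical and not taken from Hobbit or bbcombotest.

```c
#include <stddef.h>

/* Hypothetical status colors, in increasing order of severity. */
enum color { GREEN, YELLOW, RED };

/* Combine the statuses of the members of a pool:
 *   every member green            -> GREEN
 *   at least min_ok members green -> YELLOW (degraded but still serving)
 *   fewer than min_ok green       -> RED
 * An empty pool is reported GREEN here; a real tool might treat it as RED. */
enum color combine_pool(const enum color *members, size_t n, size_t min_ok)
{
    size_t ok = 0;

    for (size_t i = 0; i < n; i++)
        if (members[i] == GREEN)
            ok++;

    if (ok == n)
        return GREEN;
    return ok >= min_ok ? YELLOW : RED;
}
```

With three web servers and min_ok = 2, losing one server gives yellow and losing two gives red, which is the kind of distinction a two-color bbcombotest expression cannot express.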


-- 
Charles Goyard - user-98f9625a7a59@xymon.invalid - (+33) 1 45 38 01 31
Gildas le Nadan · Wed, 11 Oct 2006 09:50:34 +0100
quoted from Tom Georgoulias
Tom Georgoulias wrote:
Gildas Le Nadan wrote:
Hmm, I'm afraid I don't understand how it works, and I can't make it work even on a simple example: I'm trying to change la-multi in hobbitgraph.cfg so that the values are added up instead of drawn on top of each other.

Are the entries in hobbitgraph.cfg used as a template to build the rrdgraph query? If so, how can I access the values from the previous RRD file and add them to those of the current one (@RRDFN@)?

I tried adding the values to a VDEF:add=add,@RRDIDX@,+ but without success.
Did you ever work out a solution for this?
No, not yet. I tried several other things in the [*-multi] hobbitgraph.cfg definitions, but with no luck so far (I am not an rrd expert).

Btw Henrik, I also think it would be a good idea if the multi graph menu in hobbit-hostgraphs.sh was generated automatically from the [*-multi] entries in hobbitgraph.cfg.

Things I tried so far:

- the :STACK option doesn't work, probably because it shouldn't be added to the first entry (I tried the example on the rrd page; using a graph with a constant value as the first graph doesn't work either)

- I tried a VDEF/CDEF with IF so that, when there is no entry, the value is set to 0 (because the first entry has to be handled correctly)

I was about to test the different possibilities using rrdgraph directly instead of hobbitgraph, to get more debug output when it fails.

Then, once I have a working solution, I'll see whether it can be implemented with the current hobbitgraph.cgi. If not, I'll try to write a patch or ask Henrik for the features.

(At least that's my plan)
quoted from Charles Goyard
I'm starting to investigate a way to take data from many rrd files and graph the average of the data in all those rrd files as a single line. For example, I'd like to average the %CPU usage (la1) for 10 different webservers, and display it as a single overall average %CPU for a web farm.

Tom
Yes, this is a fairly common problem I think :) There's plenty of other usage, such as adding up the bandwidth on different servers, and so on...

Cheers,
Gildas
Jason Altrincham Jones · Wed, 11 Oct 2006 10:45:40 +0100
Hi all,

I'm trying to get a setup where I am e-mailed, with the reason given, when
someone acks or disables an alert. The problem I am having is that all
alerts are being sent to me. In hobbit-alerts.cfg I have put:

HOST=%.*
	MAIL me NOTICE

Do I need to change this? Also, can an ack send an e-mail to a
specified person saying that the alert has been acked? If not, can this
be considered a feature request please :)

Thanks,
Jason.
Jason Altrincham Jones · Wed, 11 Oct 2006 11:20:38 +0100
quoted from Jason Altrincham Jones
Hi all,

I'm trying to get a setup where I am e-mailed, with the reason given, when
someone acks or disables an alert. The problem I am having is that all
alerts are being sent to me. In hobbit-alerts.cfg I have put:
quoted from Jason Altrincham Jones

HOST=%.*
	MAIL me NOTICE

Do I need to change this? Also, can an ack send an e-mail to a
specified person saying that the alert has been acked? If not, can this
be considered a feature request please :)

Thanks,
Jason.
Charles Jones · Wed, 11 Oct 2006 04:06:28 -0700
Note that as far as I know, currently only enable/disables are sent via 
NOTICE, and there are no alerts for Acks.

-Charles

Jason Altrincham Jones · Wed, 11 Oct 2006 15:43:09 +0100
Ignore this, I resent it when it wouldn't appear on the list the first
time and it seems that it took this long to appear...maybe a filter or
something?
Jason.
quoted from Charles Jones
-----Original Message-----
From: Jones, Jason (Altrincham) 
Sent: 11 October 2006 10:46
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] NOTICE tag

Gildas le Nadan · Thu, 12 Oct 2006 10:33:14 +0100
Hi,

BEWARE: this patch has been tested on a machine with rrdtool 1.0.x. The 
values/behavior for rrdtool 1.2.x were taken from the online rrd documentation 
so they are hopefully correct. If someone was able to test it for me on a server 
with rrdtool 1.2, I would be very grateful!

Cheers,
Gildas

-- 

The following patch adds support for a @STACKIT@ keyword in the graph definitions
in hobbitgraph.cfg, allowing data to be stacked.

The STACK behavior changed between rrdtool 1.0.x and 1.2.x, hence the ifdef:
- in 1.0.x, you replace the graph type (AREA|LINE) of the graph you want to
stack with the STACK keyword
- in 1.2.x, you add the STACK keyword at the end of the definition

Please note that in both cases the first entry must not contain the STACK keyword
at all, so the first rrdidx needs a different treatment.

Examples of valid hobbitgraph.cfg entries:

rrdtool 1.0.x
[la-multi]
         TITLE Multi-host CPU Load
         YAXIS Load
         FNPATTERN la.rrd
         DEF:avg@RRDIDX@=@RRDFN@:la:AVERAGE
         CDEF:la@RRDIDX@=avg@RRDIDX@,100,/
         @STACKIT@:la@RRDIDX@#@COLOR@:@RRDPARAM@
         -u 1.0
         GPRINT:la@RRDIDX@:LAST: \: %5.1lf (cur)
         GPRINT:la@RRDIDX@:MAX: \: %5.1lf (max)
         GPRINT:la@RRDIDX@:MIN: \: %5.1lf (min)
         GPRINT:la@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n

rrdtool 1.2.x
[la-multi]
         TITLE Multi-host CPU Load
         YAXIS Load
         FNPATTERN la.rrd
         DEF:avg@RRDIDX@=@RRDFN@:la:AVERAGE
         CDEF:la@RRDIDX@=avg@RRDIDX@,100,/
         AREA:la@RRDIDX@#@COLOR@:@RRDPARAM@:@STACKIT@
         -u 1.0
         GPRINT:la@RRDIDX@:LAST: \: %5.1lf (cur)
         GPRINT:la@RRDIDX@:MAX: \: %5.1lf (max)
         GPRINT:la@RRDIDX@:MIN: \: %5.1lf (min)
         GPRINT:la@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
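To make the two forms concrete, the substitution done by the patch can be modelled on its own. The sketch below is a simplified, hypothetical expander (not hobbitgraph's actual code) that only handles @RRDIDX@ and @STACKIT@. It mirrors the patch's rule: rrdidx 0 gets AREA (rrdtool 1.0.x) or nothing (rrdtool 1.2.x), and every later index gets STACK. Other keywords such as @COLOR@ are copied unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Replacement for @STACKIT@, following the patch: the first series
 * (rrdidx 0) must not be stacked at all. */
static const char *stackit_token(int rrdidx, int rrdtool12)
{
    if (rrdidx == 0)
        return rrdtool12 ? "" : "AREA";
    return "STACK";
}

/* Hypothetical, simplified expander: replace @RRDIDX@ and @STACKIT@ in one
 * template line for series rrdidx; everything else is copied verbatim.
 * out must be large enough for the expanded line. */
void expand_line(const char *inp, int rrdidx, int rrdtool12, char *out)
{
    while (*inp) {
        if (strncmp(inp, "@RRDIDX@", 8) == 0) {
            out += sprintf(out, "%d", rrdidx);
            inp += 8;
        } else if (strncmp(inp, "@STACKIT@", 9) == 0) {
            out += sprintf(out, "%s", stackit_token(rrdidx, rrdtool12));
            inp += 9;
        } else {
            *out++ = *inp++;
        }
    }
    *out = '\0';
}
```

With the 1.0.x template, index 0 expands to AREA:la0#@COLOR@:@RRDPARAM@ and index 1 to STACK:la1#@COLOR@:@RRDPARAM@; with the 1.2.x template, every index after the first ends in :STACK, while index 0 ends with an empty last field.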


--- hobbit-4.2.0/web/hobbitgraph.c      2006-08-09 21:10:13.000000000 +0100
+++ hobbit-4.2.0.ganglia/web/hobbitgraph.c      2006-10-12 10:24:09.788773551 +0100
@@ -392,6 +392,42 @@
                         }
                         inp += 10;
                 }
+               else if (strncmp(inp, "@STACKIT@", 9) == 0) {
+                       /* the STACK behavior changed between rrdtool 1.0.x
+                        * and 1.2.x, hence the ifdef:
+                        * - in 1.0.x, you replace the graph type (AREA|LINE)
+                        *  for the graph you want to stack with the  STACK
+                        *  keyword
+                        * - in 1.2.x, you add the STACK keyword at the end
+                        *  of the definition
+                        *
+                        * Please note that in both cases the first entry
+                        * mustn't contain the keyword STACK at all, so
+                        * we need a different treatment for the first rrdidx
+                        *
+                        * examples of hobbitgraph.cfg entries:
+                        *
+                        * - rrdtool 1.0.x
+                        * @STACKIT@:la@RRDIDX@#@COLOR@:@RRDPARAM@
+                        *
+                        * - rrdtool 1.2.x
+                        * AREA:la@RRDIDX@#@COLOR@:@RRDPARAM@:@STACKIT@
+                        */
+                       char numstr[10];
+                       if (rrdidx == 0) {
+#ifdef RRDTOOL12
+                               numstr[0] = '\0';
+#else
+                               sprintf(numstr, "AREA");
+#endif
+                       }
+                       else {
+                               sprintf(numstr, "STACK");
+                       }
+                       strcpy(outp, numstr);
+                       outp += strlen(outp);
+                       inp += 9;
+               }
                 else if (strncmp(inp, "@RRDIDX@", 8) == 0) {
                         char numstr[10];
Buchan Milne · Thu, 12 Oct 2006 17:38:39 +0200
quoted from Gildas le Nadan
On Thursday 12 October 2006 11:33, Gildas Le Nadan wrote:
Hi,

BEWARE: this patch has been tested on a machine with rrdtool 1.0.x. The
values/behavior for rrdtool 1.2.x were taken from the online rrd
documentation so they are hopefully correct. If someone was able to test it
for me on a server with rrdtool 1.2, I would be very grateful!
Seems to work as expected, with:

$ ldd /usr/lib/hobbit/server/bin/hobbitgraph.cgi |grep rrd
        librrd.so.2 => /usr/lib/librrd.so.2 (0x007c2000)

$ rpm -qf /usr/lib/librrd.so.2
librrdtool2-1.2.11-2.rhel4es

Only thing is, it would be nice to have both the multi-host graphs and the 
aggregated ones available. But, that is more of a hobbit-only issue (along 
with multiple graphs on the page for one custom/extension test, etc. etc.).

Regards,
Buchan

-- 
Buchan Milne
ISP Systems Specialist - Monitoring/Authentication Team Leader
B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)
Gildas le Nadan · Thu, 12 Oct 2006 16:52:29 +0100
quoted from Buchan Milne
Buchan Milne wrote:
On Thursday 12 October 2006 11:33, Gildas Le Nadan wrote:
Hi,

BEWARE: this patch has been tested on a machine with rrdtool 1.0.x. The
values/behavior for rrdtool 1.2.x were taken from the online rrd
documentation so they are hopefully correct. If someone was able to test it
for me on a server with rrdtool 1.2, I would be very grateful!
Seems to work as expected, with:

$ ldd /usr/lib/hobbit/server/bin/hobbitgraph.cgi |grep rrd
        librrd.so.2 => /usr/lib/librrd.so.2 (0x007c2000)

$ rpm -qf /usr/lib/librrd.so.2
librrdtool2-1.2.11-2.rhel4es
Thanks very much for your help!
quoted from Buchan Milne
Only thing is, it would be nice to have both the multi-host graphs and the 
aggregated ones available. But, that is more of a hobbit-only issue (along 
with multiple graphs on the page for one custom/extension test, etc. etc.).
Well, this is exactly what I intend to do in the next step :)

This is why I think we need the multi-host graph list produced by
hobbit-hostgraphs.cgi to be generated automatically from the [*-multi] entries
in hobbitgraph.cfg. That would make it possible to add entries such as [aggrla-multi].
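Generating that list would mostly mean collecting the section names in hobbitgraph.cfg that end in -multi. A minimal, hypothetical sketch of that scan in C follows; the function names are illustrative, and how hobbit-hostgraphs currently builds its menu is not covered here.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* If line is a section header such as "[la-multi]" (leading whitespace and
 * anything after the closing bracket are ignored), copy the section name
 * ("la-multi") into name and return 1; otherwise return 0. */
int multi_section_name(const char *line, char *name, size_t len)
{
    static const char suffix[] = "-multi";
    const size_t slen = sizeof suffix - 1;
    const char *start, *end;
    size_t n;

    while (isspace((unsigned char)*line))
        line++;
    if (*line != '[')
        return 0;
    start = line + 1;
    end = strchr(start, ']');
    if (end == NULL)
        return 0;
    n = (size_t)(end - start);
    if (n <= slen || n >= len)      /* need a base name, and room for '\0' */
        return 0;
    if (strncmp(end - slen, suffix, slen) != 0)
        return 0;
    memcpy(name, start, n);
    name[n] = '\0';
    return 1;
}

/* Print every [*-multi] graph defined in an open hobbitgraph.cfg, one per
 * line; a menu generator could use this list instead of a fixed one. */
void list_multi_graphs(FILE *cfg)
{
    char line[1024], name[128];

    while (fgets(line, sizeof line, cfg) != NULL)
        if (multi_section_name(line, name, sizeof name))
            puts(name);
}
```

An [aggrla-multi] entry added to the file would then show up in the list without any change to the menu code.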
Regards,
Buchan
Thanks for your help, much appreciated,
Gildas
Tom Georgoulias · Mon, 06 Nov 2006 14:53:33 -0500
quoted from Gildas le Nadan
Gildas Le Nadan wrote:
BEWARE: this patch has been tested on a machine with rrdtool 1.0.x. The values/behavior for rrdtool 1.2.x were taken from the online rrd documentation so they are hopefully correct. If someone was able to test it for me on a server with rrdtool 1.2, I would be very grateful!
I'm a little late on testing this, so hopefully this patch is still the latest version.

I have tested the patch with Hobbit 4.2.0 + all-in-one patch and rrdtool 1.2.15.  It seems to work fine, although I am not seeing data presented in the way that I had expected it to look.  I want to use the RRD stack to store values taken from the rrd for each host in a given group, then get an average of those stored values which is a single data point that represents the host group as a whole.

I've been experimenting with a version of the la1-multi definition, but I haven't gotten anything to work yet and I'm rather certain that my syntax is off in a few places.  I thought I'd email it out anyway, in case I can get some pointers or have someone let me know that it won't work.


TITLE Farm CPU Utilization
YAXIS % Used
FNPATTERN vmstat.rrd
-u 100
-r
DEF:cpu_idl@RRDIDX@=@RRDFN@:cpu_idl:AVERAGE
CDEF:hostcpu@RRDIDX@=100,cpu_idl@RRDIDX@,-
# need a way to push each hostcpu@RRDIDX@ onto the stack
CDEF:cpuavgs=hostcpu@RRDIDX@,AVERAGE
# then get an average of all the values on the stack
# using something like COUNT as the num of items in the stack
CDEF:pbusy=cpuavgs1,cpuavgs2,cpuavgs3,COUNT,AVG
# graph the final data point
LINE2:pbusy#ccccff:%CPU
GPRINT:pbusy:LAST: \: %5.1lf (cur)
GPRINT:pbusy:MAX: \: %5.1lf (max)
GPRINT:pbusy:MIN: \: %5.1lf (min)
GPRINT:pbusy:AVERAGE: \: %5.1lf (avg)\n
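In rrdtool's RPN, COUNT pushes the position of the current sample (1 for the first, 2 for the second, and so on), not the number of values on the stack, so it cannot serve as the divisor here. One workaround is to spell out the sum and the number of known values explicitly. The hypothetical C helper below writes such definitions for n hosts, assuming each host's vmstat.rrd has a cpu_idl data source as in the definition above; the names idlN, busyN, cnt, sum and farm are illustrative, and only the basic UN, IF, EQ and UNKN operators are used.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Tiny bounded string builder: ok drops to 0 on the first overflow. */
struct sbuf {
    char *p;
    size_t len, used;
    int ok;
};

static void sb_add(struct sbuf *b, const char *fmt, ...)
{
    va_list ap;
    int w;

    if (!b->ok)
        return;
    va_start(ap, fmt);
    w = vsnprintf(b->p + b->used, b->len - b->used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= b->len - b->used)
        b->ok = 0;
    else
        b->used += (size_t)w;
}

/* Write rrdgraph definitions, one per line, for the average CPU busy %
 * over n hosts. rrds[i] is the (hypothetical) path of host i's vmstat.rrd.
 * A host whose sample is UNKNOWN is left out of both the sum and the count.
 * Returns 0 on success, -1 if n < 1, a path contains ':' (it would need
 * escaping in a DEF) or the buffer is too small. */
int farm_cpu_defs(char *out, size_t len, const char *const *rrds, int n)
{
    struct sbuf b = { out, len, 0, 1 };
    int i;

    if (n < 1 || len == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if (strchr(rrds[i], ':') != NULL)
            return -1;
        sb_add(&b, "DEF:idl%d=%s:cpu_idl:AVERAGE\n", i, rrds[i]);
        sb_add(&b, "CDEF:busy%d=100,idl%d,-\n", i, i);
    }
    /* cnt: how many hosts have a known value in this time slot */
    sb_add(&b, "CDEF:cnt=");
    for (i = 0; i < n; i++)
        sb_add(&b, "%sbusy%d,UN,0,1,IF%s", i ? "," : "", i, i ? ",+" : "");
    /* sum: total busy % of those hosts (UNKNOWN counted as 0) */
    sb_add(&b, "\nCDEF:sum=");
    for (i = 0; i < n; i++)
        sb_add(&b, "%sbusy%d,UN,0,busy%d,IF%s",
               i ? "," : "", i, i, i ? ",+" : "");
    /* farm: sum / cnt, or UNKNOWN when no host reported at all */
    sb_add(&b, "\nCDEF:farm=cnt,0,EQ,UNKN,sum,cnt,/,IF\n");
    sb_add(&b, "LINE2:farm#ccccff:%%CPU\n");
    return b.ok ? 0 : -1;
}
```

The generated lines can be passed to rrdtool graph by hand first; once the graph looks right, the same pattern could be turned into a hobbitgraph.cfg entry.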


-- 
Tom Georgoulias
Systems Engineer
McClatchy Interactive
Gildas le Nadan · Tue, 07 Nov 2006 09:57:10 +0000
Hello,

As far as I understand, you want to add up all the values for the hosts in your group and display a single graph instead of a stack of multiple graphs.

The @STACKIT@ keyword was not designed for that; it is designed to show the relative contribution of each host to the resulting graph.

For your need, I see 2 different options:
1- Create an external script that stores the aggregated value in a separate rrd, and graph that rrd
2- Do the calculations each time the graph is rendered, using rrdtool 1.2. There seem to be ways to do that in rrdtool 1.2, but it seems you'll have to use the IF operator to populate the value for the first entry. I've seen examples somewhere but I don't remember where.
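Option 1 could be sketched as a small collector run periodically (from cron, for instance): read each host's latest value, average the known ones, and feed the result to rrdtool update. The C fragment below only does the averaging and the formatting of the update argument; the function name and the farm-cpu.rrd file are hypothetical, and that RRD would have to be created beforehand with rrdtool create.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: average the latest value of each host, skipping
 * hosts with no data (NaN), and format it as an rrdtool update argument.
 * "N" means "now"; if no host has data, "U" (unknown) is written instead.
 * The caller would then run something like
 *   rrdtool update farm-cpu.rrd N:42.50
 * Returns the string length, or -1 if buf is too small. */
int format_farm_update(const double *latest, size_t n, char *buf, size_t len)
{
    double sum = 0.0;
    size_t seen = 0;
    int w;

    for (size_t i = 0; i < n; i++) {
        if (!isnan(latest[i])) {
            sum += latest[i];
            seen++;
        }
    }
    if (seen == 0)
        w = snprintf(buf, len, "N:U");
    else
        w = snprintf(buf, len, "N:%.2f", sum / (double)seen);
    return (w < 0 || (size_t)w >= len) ? -1 : w;
}
```

Storing the aggregate this way keeps its history in its own RRD, so the farm graph does not need to be recomputed from every host file at each rendering.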

In case 2, I strongly recommend doing all the tests manually (i.e. not by modifying hobbitgraph.cfg), as it is far easier to figure out what the problem is. Once you have a working solution, you can figure out how to adapt it to hobbit.

Cheers,
Gildas