dbcheck, custom graphs and strange results

list Gildas le Nadan · Thu, 21 Sep 2006 10:17:37 +0100 ·

Hello,

I am trying to graph the "Opens" value produced by dbcheck when monitoring mysql.

I defined the value as DERIVE in NCV_mysql instead of GAUGE, as this value keeps growing while the mysql instance is running (if the mysql instance restarts, the counter starts again from 0).

For instance, I have the following:

time  | Opens | diff | graph'd | diff/graph'd
      | value | t'-t | value   | value
------+-------+------+---------+-------------
10:14 | 22732 |  -   |  -      |   -
10:19 | 22765 | +33  | 110m    | 3.33
10:24 | 22806 | +41  | 128m    | 3.12
10:29 | 22821 | +15  | 50m     | 3.33
10:34 | 22835 | +14  | 48m     | 3.2

I was expecting the graphed value to be equal to the diff value, but the units are wrong (the graph shows "m" instead of plain units), and the value is wrong as well (with a roughly constant 3.xx ratio).

Any idea where this comes from? (I suspect this is me being stupid and not understanding some of the underlying rrd subtleties)

Cheers,
Gildas

list Francesco Duranti · Thu, 21 Sep 2006 11:53:48 +0200 ·

With DERIVE, if I remember correctly, the difference is also divided by the
time elapsed. For example:
10:24 | 22806 | +41  | 128m    | 3.12

41/300 (5-minute interval) = 0.136, and rrd reports that number with a
readable unit prefix (m is milli), so it's 0.136*1000 = 136m.
The difference between the 128m you get here and 136m is probably
because rrd uses the time at which it actually got the data, so it is as if
it got the new value after 320 seconds instead of 300 (41/0.128 = 320). The
data graphed is therefore opens per second. The suggestion on the rrd
site is to multiply it, if needed, to get data per minute or data per hour
instead of the "pure" absolute value, which is not really useful. To do
this you can put a CDEF in hobbitgraph.cfg:

[mysqlopen]
        DEF:op=mysqlperf.rrd:Opens:AVERAGE
        CDEF:opm=op,60,*
        TITLE MySQL Opens / Minute
        YAXIS open/min
        LINE2:opm#00CCCC:Open/min.
        COMMENT:\n
        GPRINT:opm:LAST: \: %5.1lf (cur)
        GPRINT:opm:MAX: \: %5.1lf (max)
        GPRINT:opm:MIN: \: %5.1lf (min)
        GPRINT:opm:AVERAGE: \: %5.1lf (avg)\n
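
A minimal Python sketch of the arithmetic described above, using the counter readings from the table in the first message. It assumes the samples are exactly 300 seconds apart (the real update times may differ slightly), and the names `SAMPLES`, `STEP_SECONDS` and `derive_rates` are made up for the illustration. It shows why a +33 difference is drawn as about 110m (0.11 opens per second) and what the `op,60,*` CDEF turns that into.

```python
# Samples from the table in the first message: (time label, "Opens" counter).
SAMPLES = [("10:14", 22732), ("10:19", 22765), ("10:24", 22806),
           ("10:29", 22821), ("10:34", 22835)]
STEP_SECONDS = 300  # assumption: exactly 5 minutes between samples


def derive_rates(samples, step=STEP_SECONDS):
    """Return (label, delta, per_second, per_minute) for each consecutive pair."""
    rows = []
    for (_, prev), (label, cur) in zip(samples, samples[1:]):
        delta = cur - prev            # what the poster expected to see
        per_second = delta / step     # what a DERIVE data source stores
        rows.append((label, delta, per_second, per_second * 60))
    return rows


if __name__ == "__main__":
    for label, delta, per_s, per_min in derive_rates(SAMPLES):
        # per_s * 1000 is the value with the "m" (milli) prefix on the graph
        print(f"{label}  +{delta:<3} {per_s * 1000:6.1f}m/s  {per_min:5.2f}/min")
```

With a 300-second step the ratio between the milli value and the raw difference is 1000/300 = 3.33, which is the "3.xx" factor noticed in the table.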


So that I can put the right types in the next version of dbcheck.pl, do
you know the types of the other counters? I'm working with mysql on a test
db, so the perf data are not easy to interpret :D If I'm correct they
should be:
Threads                = GAUGE
Questions              = DERIVE
Slow queries           = DERIVE
Opens                  = DERIVE
Flush tables           = DERIVE
Open tables            = GAUGE
Queries per second avg = GAUGE

Is this correct?
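
For the DERIVE entries, one thing worth keeping in mind is the counter reset mentioned in the first message (the value starts again from 0 when mysql restarts). The sketch below is only an illustration in Python, not what dbcheck.pl or Hobbit does: `derive_rate` is a hypothetical helper, and the `minimum` argument mimics the rrdtool behaviour where a rate below the data source minimum is stored as unknown, which is why the rrdcreate documentation suggests DERIVE with a minimum of 0 for counters that can reset.

```python
def derive_rate(prev, cur, seconds, minimum=None):
    """Rate per second between two counter readings.

    Returns None (standing in for rrdtool's "unknown") when the rate is
    below `minimum`, e.g. right after a counter reset.
    """
    rate = (cur - prev) / seconds
    if minimum is not None and rate < minimum:
        return None
    return rate


if __name__ == "__main__":
    print(derive_rate(22732, 22765, 300))         # normal 5-minute interval
    print(derive_rate(22835, 12, 300))            # after a restart: negative rate
    print(derive_rate(22835, 12, 300, minimum=0)) # with a minimum of 0: unknown
```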


This is the explanation from the rrdcreate page on the website:

It's always a Rate
    RRDtool stores rates in amount/second for COUNTER, DERIVE and
ABSOLUTE data. When you plot the data, you will get on the y axis
amount/second which you might be tempted to convert to an absolute
amount by multiplying by the delta-time between the points. RRDtool
plots continuous data, and as such is not appropriate for plotting
absolute amounts as for example ``total bytes'' sent and received in a
router. What you probably want is plot rates that you can scale to
bytes/hour, for example, or plot absolute amounts with another tool that
draws bar-plots, where the delta-time is clear on the plot for each
point (such that when you read the graph you see for example GB on the y
axis, days on the x axis and one bar for each day).


Francesco

list Jason Altrincham Jones · Thu, 21 Sep 2006 11:54:02 +0100 ·

Hi all,

Does hobbit have the ability to monitor power supplies and hard drive
statuses? We had a hard drive failure the other day and hobbit did
not warn about it. A power supply also failed today (there were 2, so the
server didn't go down) and again there was no alert. Does anyone have an
ext script for these, or does hobbit have the facility somewhere I don't
know about?

Thanks,
Jason.

list Johann Eggers · Thu, 21 Sep 2006 13:11:11 +0200 ·

Hi,

I guess "Hobbit-Out-of-the-box" wouldn't do the job.
If the hardware/OS has tools to determine the status of PSUs,
temperature sensors, RAIDs, disks... then you can write your own
script to report states / failures / values to your hobbit server.

For a SUN-based environment (and probably others) there are scripts
available at www.deadcat.net.au

Another useful way is to monitor your devices via SNMP. You can use the
bb-xsnmp.pl script from deadcat or use the devmon tool
(http://devmon.sourceforge.net/), which takes a more general approach to
monitoring SNMP-enabled devices.
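
As a rough illustration of the "own script" route, here is a Python sketch that only builds the text of a status report. The `status HOST.TEST COLOR text` layout and the commas-for-dots host naming follow the Big Brother / Hobbit client convention as I understand it; treat both as assumptions and check the bb man page on your installation. The function name, the host `db1.example.com`, the test name `psu` and the list of colors (a subset) are all hypothetical, and actually sending the message with the client's `bb` command is left out.

```python
# Assumption: a subset of the colors the server accepts.
VALID_COLORS = {"green", "yellow", "red", "clear"}


def build_status_message(hostname, test, color, summary, details=""):
    """Build the text of a status report for one test on one host."""
    if color not in VALID_COLORS:
        raise ValueError(f"unexpected color: {color!r}")
    # Assumption: dots in the hostname are written as commas, because the
    # dot separates the host from the test name in the message.
    machine = hostname.replace(".", ",")
    message = f"status {machine}.{test} {color} {summary}"
    if details:
        message += "\n" + details
    return message


if __name__ == "__main__":
    print(build_status_message("db1.example.com", "psu", "red",
                               "PSU 2 failed", "PSU 1: ok\nPSU 2: FAILED"))
```

The part that depends on the hardware is still getting the PSU or disk state out of the vendor tool or SNMP; the report itself is just a line of text.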

Johann

list Gildas le Nadan · Thu, 21 Sep 2006 12:14:36 +0100 ·

Thanks for your answer Francesco, it is graphing the correct values now.

▸ quoted from Francesco Duranti


Francesco Duranti wrote:

With DERIVE, if I remember correctly, the difference is also divided by the
time elapsed. For example:
10:24 | 22806 | +41  | 128m    | 3.12

41/300 (5-minute interval) = 0.136, and rrd reports that number with a
readable unit prefix (m is milli), so it's 0.136*1000 = 136m.
The difference between the 128m you get here and 136m is probably
because rrd uses the time at which it actually got the data, so it is as if
it got the new value after 320 seconds instead of 300 (41/0.128 = 320).

Hmm, it is more likely because I read the values off the graph :)

▸ quoted from Francesco Duranti

The data graphed is therefore opens per second. The suggestion on the rrd
site is to multiply it, if needed, to get data per minute or data per hour
instead of the "pure" absolute value, which is not really useful. To do
this you can put a CDEF in hobbitgraph.cfg:

[mysqlopen]
        DEF:op=mysqlperf.rrd:Opens:AVERAGE
        CDEF:opm=op,60,*
        TITLE MySQL Opens / Minute
        YAXIS open/min
        LINE2:opm#00CCCC:Open/min.
        COMMENT:\n
        GPRINT:opm:LAST: \: %5.1lf (cur)
        GPRINT:opm:MAX: \: %5.1lf (max)
        GPRINT:opm:MIN: \: %5.1lf (min)
        GPRINT:opm:AVERAGE: \: %5.1lf (avg)\n

Well, actually, I renamed the test to mysql after your message on
sourceforge, and added the "Opens" value to the mysqlslow graph (I
should give it a different name).

So here is my working mysqlslow definition in hobbitgraph.cfg:

[mysqlslow]
         DEF:slow=mysql.rrd:Slowqueries:AVERAGE
         DEF:open=mysql.rrd:Opens:AVERAGE
         CDEF:slowm=slow,60,*
         CDEF:openm=open,60,*
         TITLE MySQL Slow Queries & Opens / min
         YAXIS #
         LINE2:slowm#000000:Slow Queries / min
         GPRINT:slowm:LAST: \: %5.1lf (cur)
         GPRINT:slowm:MAX: \: %5.1lf (max)
         GPRINT:slowm:MIN: \: %5.1lf (min)
         GPRINT:slowm:AVERAGE: \: %5.1lf (avg)\n
         LINE2:openm#FF0000:Open / min
         GPRINT:openm:LAST: \: %5.1lf (cur)
         GPRINT:openm:MAX: \: %5.1lf (max)
         GPRINT:openm:MIN: \: %5.1lf (min)
         GPRINT:openm:AVERAGE: \: %5.1lf (avg)\n
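
The CDEF lines use rrdtool's RPN notation, so `open,60,*` reads as "push open, push 60, multiply". A toy evaluator in Python makes that concrete. This is only a sketch for illustration: `eval_rpn` is a made-up name, and it handles just the four arithmetic operators, not the full set rrdtool supports.

```python
import operator

# Only the basic arithmetic operators; rrdtool's RPN has many more.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_rpn(expression, variables):
    """Evaluate a comma-separated RPN expression such as 'open,60,*'."""
    stack = []
    for token in expression.split(","):
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token in variables:
            stack.append(variables[token])   # a DEF/CDEF name
        else:
            stack.append(float(token))       # a numeric constant
    if len(stack) != 1:
        raise ValueError(f"malformed expression: {expression!r}")
    return stack[0]


if __name__ == "__main__":
    # 0.11 opens/second is the rate for the +33 interval in the table.
    print(eval_rpn("open,60,*", {"open": 0.11}))
```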

▸ quoted from Francesco Duranti

So that I can put the right types in the next version of dbcheck.pl, do
you know the types of the other counters? I'm working with mysql on a test
db, so the perf data are not easy to interpret :D If I'm correct they
should be:
Threads                = GAUGE
Questions              = DERIVE
Slow queries           = DERIVE
Opens                  = DERIVE
Flush tables           = DERIVE
Open tables            = GAUGE
Queries per second avg = GAUGE

Is this correct?

As far as I know, yes (this is what I have defined anyway).

BTW, I have 2 requests concerning dbcheck.pl

- Firstly, would it be possible to deactivate the test for oraclehome 
being a valid directory if no oracle test is set?

- A nice feature would be to monitor the process list on the mysql 
server (http://dev.mysql.com/doc/refman/5.0/en/show-processlist.html)

As the list may be quite long on an active server, it is probably better to
consolidate the entries by category (i.e. the number of processes in the
"Sleeping" state, in the "Sending data" state and so on).
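
To make the consolidation idea concrete, here is a small Python sketch. It assumes the process list has already been fetched into a list of dicts with `Command` and `State` keys (the column names SHOW PROCESSLIST uses); the sample rows and the name `summarize_processlist` are invented for the example, and nothing here queries a real server.

```python
from collections import Counter

# Hypothetical rows, shaped like the output of SHOW PROCESSLIST.
SAMPLE_ROWS = [
    {"Id": 1, "Command": "Sleep", "State": ""},
    {"Id": 2, "Command": "Sleep", "State": None},
    {"Id": 3, "Command": "Query", "State": "Sending data"},
    {"Id": 4, "Command": "Query", "State": "Sending data"},
    {"Id": 5, "Command": "Query", "State": "Locked"},
]


def summarize_processlist(rows):
    """Count processes per state, falling back to the command when the
    state is empty (as it is for idle connections)."""
    counts = Counter()
    for row in rows:
        category = row.get("State") or row.get("Command") or "unknown"
        counts[category] += 1
    return counts


if __name__ == "__main__":
    for category, count in sorted(summarize_processlist(SAMPLE_ROWS).items()):
        print(f"{category}: {count}")
```

Each category count could then be reported as one value, which keeps the status page short however many connections are open.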

Cheers,
Gildas

list Jason Altrincham Jones · Thu, 21 Sep 2006 12:20:39 +0100 ·

Hi,

We do have OpenManage and use the script by Dave Sobel for the power
supply failures there, and I am working on a hard drive one (though I may
have found one for that too, so I just need to edit it to work on Windows).
The only problem is that we have OpenManage on only a limited number of
our Linux boxes, so I was wondering if there are any OpenManage equivalents
for Linux, or a way to get Linux to report the status on the command line.
After that, scripting is easy; I'm just not a very advanced Linux user, if
any beyond K&R and the development community can be called one :)

Thanks,
Jason
