dbcheck, custom graphs and strange results
list Gildas le Nadan
Hello,
I am trying to graph the "Opens" value produced by dbcheck when monitoring mysql.
I defined the value as DERIVE in NCV_mysql instead of GAUGE, because it keeps growing for as long as the mysql instance is running (if the instance restarts, it starts again from 0).
For instance, I have the following:
time  | Opens | diff | graph'd | diff/graph'd
      | value | t'-t | value   | value
------+-------+------+---------+-------------
10:14 | 22732 | -    | -       | -
10:19 | 22765 | +33  | 110m    | 3.33
10:24 | 22806 | +41  | 128m    | 3.12
10:29 | 22821 | +15  | 50m     | 3.33
10:34 | 22835 | +14  | 48m     | 3.2
I was expecting the graphed value to be equal to the diff value, but the units are wrong (m instead of unit), and the value is wrong as well (with a somewhat constant 3.xx ratio).
Any idea where this comes from? (I suspect this is me being stupid and not understanding some of the underlying rrd subtleties)
Cheers,
Gildas
list Francesco Duranti
With DERIVE, if I remember correctly, the difference is also divided by
the time elapsed. For example:
10:24 | 22806 | +41 | 128m | 3.12
41/300 (5-minute interval) = 0.137, and rrd reports that number with a
readable unit prefix (m is milli), so it is 0.137*1000 = 137m.
The difference between the 128m you get here and the expected 137m is
probably because rrd uses the time at which it actually receives the
data, so it is as if it got the new value after 320 seconds instead of
300 (41/0.128 = 320). The data graphed is something like opens per
second. The suggestion on the rrd site is to multiply it to get data per
minute or data per hour, instead of plotting the "pure" absolute value,
which is not really useful. To do this you can put a CDEF in
hobbitgraph.cfg:
[mysqlopen]
DEF:op=mysqlperf.rrd:Opens:AVERAGE
CDEF:opm=op,60,*
TITLE MySQL Open / Minutes
YAXIS open/min
LINE2:opm#00CCCC:Open/min.
COMMENT:\n
GPRINT:opm:LAST: \: %5.1lf (cur)
GPRINT:opm:MAX: \: %5.1lf (max)
GPRINT:opm:MIN: \: %5.1lf (min)
GPRINT:opm:AVERAGE: \: %5.1lf (avg)\n
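The arithmetic behind the graphed values can be checked with a short Python sketch. It uses the counter readings from the table in the original question and assumes the readings are exactly 300 seconds apart, which the real rrd update times may not be. The function name `rates_per_second` is only illustrative.

```python
# Counter readings taken from the table in the original question.
samples = [22732, 22765, 22806, 22821, 22835]
STEP_SECONDS = 300  # assumption: exactly 5 minutes between readings


def rates_per_second(values, step):
    """Return the per-second rate between each pair of consecutive readings."""
    return [(b - a) / step for a, b in zip(values, values[1:])]


per_second = rates_per_second(samples, STEP_SECONDS)
# Same scaling as the CDEF "op,60,*": opens per second -> opens per minute.
per_minute = [rate * 60 for rate in per_second]

for second_rate, minute_rate in zip(per_second, per_minute):
    # The graph labels per-second rates with an SI prefix, e.g. 0.110 as "110m".
    print(f"{second_rate * 1000:6.1f}m opens/s = {minute_rate:4.1f} opens/min")
```

The first interval (+33 opens) comes out at 110m opens per second, which matches the graphed value in the table.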
Just so I can put the right values in the next version of dbcheck.pl: do
you know the types of the other counters? I'm working with mysql on a
test db, so the perf data are not easy to interpret :D If I'm correct
they should be:
Threads = GAUGE
Questions = DERIVE
Slow queries = DERIVE
Opens = DERIVE
Flush tables = DERIVE
Open tables = GAUGE
Queries per second avg = GAUGE
Is this correct?
This is the explanation from the rrdcreate documentation on the website:
It's always a Rate
RRDtool stores rates in amount/second for COUNTER, DERIVE and
ABSOLUTE data. When you plot the data, you will get on the y axis
amount/second which you might be tempted to convert to an absolute
amount by multiplying by the delta-time between the points. RRDtool
plots continuous data, and as such is not appropriate for plotting
absolute amounts as for example "total bytes" sent and received in a
router. What you probably want is plot rates that you can scale to
bytes/hour, for example, or plot absolute amounts with another tool that
draws bar-plots, where the delta-time is clear on the plot for each
point (such that when you read the graph you see for example GB on the y
axis, days on the x axis and one bar for each day).
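As a rough illustration of the difference between the two data source types discussed in this thread, here is a minimal Python sketch. It is not RRDtool's implementation: the function names are made up, and the `minimum` of 0 is an assumption about how the data source is defined. With a minimum of 0, a negative rate (the counter restarting from 0) is treated as unknown instead of being graphed.

```python
from typing import Optional


def gauge(value: float) -> float:
    """GAUGE: the reading is stored as it is (e.g. Threads, Open tables)."""
    return value


def derive(previous: float, current: float, seconds: float,
           minimum: Optional[float] = 0.0) -> Optional[float]:
    """DERIVE: store the change per second between two readings.

    A rate below `minimum` is returned as None (unknown). The default of 0
    is an assumption; pass minimum=None to keep negative rates.
    """
    rate = (current - previous) / seconds
    if minimum is not None and rate < minimum:
        return None
    return rate


print(gauge(12))                  # the value itself
print(derive(22765, 22806, 300))  # about 0.137 opens per second
print(derive(22835, 40, 300))     # None: counter restarted from 0
```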
Francesco
list Jason Altrincham Jones
Hi all,
Does hobbit have the ability to monitor power supplies and hard drive statuses? We had a hard drive failure the other day and hobbit did not warn about it. A power supply also failed today (there were 2, so the server didn't go down) and again there was no alert. Does anyone have an ext script for these, or does hobbit have a facility somewhere that I don't know about?
Thanks,
Jason.
list Johann Eggers
Hi,
I guess "Hobbit-Out-of-the-box" wouldn't do the job. If the hardware/OS has tools to determine the status of PSUs, temperature sensors, raids, disks and so on, then you can write your own script to report states, failures and values to your hobbit server. For a SUN based environment (and probably others) there are scripts available at www.deadcat.net.au
Another useful way is to monitor your devices via SNMP. You can use the bb-xsnmp.pl script from deadcat, or the devmon tool (http://devmon.sourceforge.net/), which takes a more general approach to monitoring snmp enabled devices.
Johann
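A custom script of the kind described above mostly has to turn check results into a status message. The Python sketch below only builds the message text and does not send it. The layout ("status host.test color text", dots in the hostname replaced by commas, "&green"/"&red" markers per line) is my understanding of the Hobbit status format and should be treated as an assumption. The host name, the test name "hw" and the check names are hypothetical.

```python
def build_status_message(hostname: str, test: str, checks: dict) -> str:
    """Build a status message from a dict of check name -> healthy (bool)."""
    # Red if any component has failed, green otherwise.
    color = "green" if all(checks.values()) else "red"
    # Assumed convention: dots in the hostname are written as commas.
    header = f"status {hostname.replace('.', ',')}.{test} {color} hardware check"
    lines = [header]
    for name, healthy in sorted(checks.items()):
        lines.append(f"&{'green' if healthy else 'red'} {name}")
    return "\n".join(lines)


# Hypothetical results, e.g. parsed from a vendor tool's command-line output.
message = build_status_message(
    "www.example.com", "hw", {"psu1": True, "psu2": False, "disk0": True}
)
print(message)
```

The resulting text would then be handed to whatever the client normally uses to report to the hobbit server.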
list Gildas le Nadan
Thanks for your answer Francesco, it is graphing the correct values now.
▸
Francesco Duranti wrote: With DERIVE, if I remember correctly, the difference is also divided by the time elapsed. For example: 10:24 | 22806 | +41 | 128m | 3.12 41/300 (5-minute interval) = 0.137, and rrd reports that number with a readable unit prefix (m is milli), so it is 0.137*1000 = 137m. The difference between the 128m you get here and the expected 137m is probably because rrd uses the time at which it actually receives the data, so it is as if it got the new value after 320 seconds instead of 300 (41/0.128 = 320).
Hum, it is more likely because the values were read "on the graph" :)
Well, actually, I renamed the test to mysql after your message on
sourceforge, and the "Opens" value was added to the mysqlslow graph (I
should rename it, though).
So here is my working mysqlslow definition in hobbitgraph.cfg:
[mysqlslow]
DEF:slow=mysql.rrd:Slowqueries:AVERAGE
DEF:open=mysql.rrd:Opens:AVERAGE
CDEF:slowm=slow,60,*
CDEF:openm=open,60,*
TITLE MySQL Slow Queries & Opens / min
YAXIS #
LINE2:slowm#000000:Slow Queries / min
GPRINT:slowm:LAST: \: %5.1lf (cur)
GPRINT:slowm:MAX: \: %5.1lf (max)
GPRINT:slowm:MIN: \: %5.1lf (min)
GPRINT:slowm:AVERAGE: \: %5.1lf (avg)\n
LINE2:openm#FF0000:Open / min
GPRINT:openm:LAST: \: %5.1lf (cur)
GPRINT:openm:MAX: \: %5.1lf (max)
GPRINT:openm:MIN: \: %5.1lf (min)
GPRINT:openm:AVERAGE: \: %5.1lf (avg)\n
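The CDEF lines above are written in reverse Polish notation: operands first, operator last. The Python sketch below evaluates such an expression for the four basic operators only, to show what "open,60,*" does to a stored per-second rate. It is a simplified illustration, not RRDtool's evaluator, and `eval_rpn` is a made-up name.

```python
def eval_rpn(expression: str, variables: dict) -> float:
    """Evaluate a comma-separated RPN expression using + - * / only."""
    operators = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expression.split(","):
        if token in operators:
            # The operator consumes the two most recent values on the stack.
            right = stack.pop()
            left = stack.pop()
            stack.append(operators[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    return stack.pop()


# 0.05 opens/s corresponds to the 10:29 reading (15 opens in 300 s),
# so the per-minute value should come out at about 3.
print(eval_rpn("open,60,*", {"open": 0.05}))
```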
▸
Just to put the right value in the next version of dbcheck.pl do you know the kind of the other counter? I'm working with mysql on a test db so the perf data are not so clear to understand :D If I'm correct they should be:
Threads = GAUGE
Questions = DERIVE
Slow queries = DERIVE
Opens = DERIVE
Flush tables = DERIVE
Open tables = GAUGE
Queries per second avg = GAUGE
Is this correct?
As far as I know, yes (this is what I have defined, anyway).
BTW, I have 2 requests concerning dbcheck.pl:
- Firstly, would it be possible to skip the check that oraclehome is a valid directory when no oracle test is set?
- A nice feature would be to monitor the process list on the mysql server (http://dev.mysql.com/doc/refman/5.0/en/show-processlist.html). As it may be quite long on an active server, it is probably better to consolidate the entries by category (i.e. the number of processes in the "Sleeping" state, in the "Sending data" state, and so on).
Cheers,
Gildas
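The consolidation suggested in the second request could look roughly like the Python sketch below. The rows are hypothetical sample data, reduced to four columns of what SHOW PROCESSLIST returns. Falling back to the command name when the state is empty, and printing "name : value" lines, are assumptions made for the example rather than anything dbcheck.pl does.

```python
from collections import Counter

# Hypothetical sample rows: (Id, User, Command, State).
rows = [
    (1, "app", "Sleep", ""),
    (2, "app", "Query", "Sending data"),
    (3, "app", "Sleep", ""),
    (4, "report", "Query", "Sending data"),
    (5, "app", "Query", "Locked"),
]


def count_by_state(process_rows):
    """Count processes per state, using the command when the state is empty."""
    return Counter(
        state or command for _id, _user, command, state in process_rows
    )


# One "name : value" line per category keeps the report short.
for name, count in sorted(count_by_state(rows).items()):
    print(f"{name} : {count}")
```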
list Jason Altrincham Jones
Hi,
We do have Openmanage and use the script by Dave Sobel for the power supply failures there. I am working on a hard drive one (though I may have found one for that too, so I just need to edit it to work on windows). The only problem is that we have openmanage on only a limited number of our linux boxes. Are there any openmanage equivalents for linux, or a way to get linux to report the status on the command line? After that, scripting is easy. I'm just not a very advanced linux user, if anyone beyond K&R and the development community can be called one :)
Thanks,
Jason