Xymon Mailing List Archive

graphing problem

8 messages in this thread

list Phil Crooker · Wed, 17 Apr 2013 11:48:44 +1000 ·
I'm a graphing newbie and can't get the graph to work. Could anyone please help? This is to measure internet latency, the test is called "internet". This is running over four hosts, so we should be getting four sets of graphs. The rrd file is created, but I just see NaN values, no data is being recorded. 
 
many thanks, Phil
 
I've got the following data coming in from an ext script every 10 minutes:
 
google : 0.15
businessspectator : 1.28
bloomberg : 0.05
 
I set this up as a gauge graph. Here is one of the entries:
 
	    <ds>
			    <name> google </name>
			    <type> GAUGE </type>
			    <minimal_heartbeat> 600 </minimal_heartbeat>
			    <min> NaN </min>
			    <max> NaN </max>
 
			    <!-- PDP Status -->
			    <last_ds> U </last_ds>
			    <value> 0.0000000000e+00 </value>
			    <unknown_sec> 172 </unknown_sec>
	    </ds>
 
Here are the config files.
 
 
in xymonserver.cfg:
 
I added this to the GRAPHS string: internet=ncv
 
and put in this line after that 
 
NCV_internet="google:GAUGE,businessspectator:GAUGE,bloomberg:GAUGE"
 
 
graphs.cfg:
 
[internet]
	    TITLE Internet Latency
	    YAXIS Seconds
	    DEF:google=internet.rrd:google:GAUGE
	    DEF:businessspectator=internet.rrd:businessspectator:GAUGE
	    DEF:bloomberg=internet.rrd:bloomberg:GAUGE
	    LINE2:google#00CCCC:Inode cache
	    LINE2:businessspectator#FF0000:Dentry cache
	    LINE2:bloomberg#00FF00:In 
	    COMMENT:Time to load home page in seconds.\n
list Phil Crooker · Wed, 17 Apr 2013 15:30:50 +1000 ·
>> should be getting four sets of graphs. The rrd file is created, but I just see NaN values, no data is being recorded.
> First, it sometimes takes a few cycles for RRD entries to show up. Although I suspect this is normally for non-GAUGE data sources.

This has been running for several days.

> Check your RRD file permissions and make sure that Xymon can write to them. If they contain nothing but NaNs, you could delete them and see if they get recreated.

Yes, they are recreated.

quoted from Phil Crooker
>> in xymonserver.cfg:
>> I added this to the GRAPHS string: internet=ncv
>> and put in this line after that
>> NCV_internet="google:GAUGE,businessspectator:GAUGE,bloomberg:GAUGE"
> Good. Also, you might need to add "internet=ncv" to TEST2RRD.

It is now in both GRAPHS and TEST2RRD and has gone through several cycles...

Probably something simple I missed out.

thanks.
list Adam Goryachev · Wed, 17 Apr 2013 15:35:51 +1000 ·
quoted from Phil Crooker
On 17/04/13 15:30, Phil Crooker wrote:
should be getting four sets of graphs. The rrd file is created, but I just see NaN values, no data is being recorded.
You mentioned that your data is updated every 10 minutes. Ensure that
the RRD is defined as receiving one update every 10 minutes. By default,
it expects data every 5 minutes, and only half the values can be invalid
before all data is invalid. One update every 10 minutes means half
the data is invalid/unknown....

Just something to check anyway... (or consider sending data updates more
frequently, every 5 minutes like xymon expects).

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au
list Jeremy Laidman · Wed, 17 Apr 2013 15:58:51 +1000 ·
On 17 April 2013 15:30, Phil Crooker <user-e8e31cd73303@xymon.invalid> wrote:
Probably something simple I missed out.
See if there are errors showing in your rrd-status.log file.

J
list Michael Beatty · Wed, 17 Apr 2013 08:10:41 -0400 ·
The problem is most certainly that your data is coming in at 10-minute intervals.  The RRD files are created with a step of 300 seconds and a heartbeat of 600 seconds.  This means that rrd expects a new data point every 300 seconds (5 minutes), and if it doesn't get a new data point within 600 seconds (10 minutes) it considers the data junk and disregards it.  Typically the heartbeat is set to double the step, so if you miss one update it's okay, but if you miss two it returns NaN (Not a Number).  There is nothing to stop you from going larger.  Since your data is coming in every 10 minutes, it violates the heartbeat and is being ignored.
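
To make the heartbeat rule concrete, here is a minimal Python sketch. It is not how rrdtool is implemented, only a simplified model of the rule described above: a value is usable only if it arrives within the heartbeat of the previous update. The function name and the sample arrival times (including the few seconds of drift) are assumptions for illustration.

```python
def usable_updates(timestamps, heartbeat):
    """Return one bool per interval between consecutive updates.

    True means the gap was within the heartbeat, so the value can be used;
    False means the gap was too long and the interval counts as unknown (NaN).
    """
    return [
        (later - earlier) <= heartbeat
        for earlier, later in zip(timestamps, timestamps[1:])
    ]


# Hypothetical arrival times (seconds) for a script scheduled every 10 minutes;
# real runs drift by a few seconds, which pushes some gaps past 600.
arrivals = [0, 603, 1201, 1805, 2404]

print(usable_updates(arrivals, heartbeat=600))   # default heartbeat
print(usable_updates(arrivals, heartbeat=1200))  # heartbeat raised to 1200
```

With the default heartbeat, every run that arrives even slightly late is thrown away; with a heartbeat of 1200 all of these sample intervals are accepted.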

I have asked around before and didn't get a solid answer as to whether there is a better way to do it, but I have worked out a way to fix this.  If there is a better way, please let me know!

There are two places you need to tweak.  The first is in xymonserver.cfg: you need to fully define your NCV value for that RRD.
You have:
NCV_internet="google:GAUGE,businessspectator:GAUGE,bloomberg:GAUGE"

When an rrd file is created, each data source is defined in the following format:
dataSourceName:dataSourceType:heartbeat:min_value:max_value

So, your NCV value should be fully defined as:
NCV_internet="google:GAUGE:1200:0:U,businessspectator:GAUGE:1200:0:U,bloomberg:GAUGE:1200:0:U"

This will create your 3 data sources, all of type GAUGE, with a heartbeat of 1200 seconds (double your 10-minute test interval), a minimum expected value of 0 and a "U"nlimited maximum value.
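
As a sketch of the format above, this small Python helper assembles the NCV_ line from a list of data source names. The helper name and its default values are hypothetical; the field order (name, type, heartbeat, min, max) is the one given above.

```python
def ncv_setting(test, names, ds_type="GAUGE", heartbeat=1200, minimum="0", maximum="U"):
    """Build an NCV_<test> line for xymonserver.cfg (illustrative helper only)."""
    fields = [
        f"{name}:{ds_type}:{heartbeat}:{minimum}:{maximum}"
        for name in names
    ]
    joined = ",".join(fields)
    return f'NCV_{test}="{joined}"'


print(ncv_setting("internet", ["google", "businessspectator", "bloomberg"]))
```

Generating the line this way makes it harder to drop a field or the closing quote when the list of data sources grows.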

Next, you need to set the "step" in your rrddefinition.cfg to override the default 300-second value.  Since your script runs every 10 minutes, it should be 600.  To do this, put a "-s" parameter in your rrddefinition.cfg:
[internet]
     -s 600
     RRA:AVERAGE:0.5:1:576

One thing to note: while you can change the heartbeat on the fly, the step is permanent.  Once the file is created, changing your rrddefinition.cfg won't change the RRD file.  As long as you are still in development, every time you make a change, just delete the rrd file and let xymon create a new one; it will use the step you have defined in rrddefinition.cfg.  If you aren't in development and do not want to lose the data you currently have, the only option I know of is to export the rrd to an XML file using "rrdtool dump", manually edit the step in that file, then do an "rrdtool restore" to convert that XML back into an rrd.
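
The editing part of the dump/edit/restore route can be scripted. Below is a minimal Python sketch of that one step, assuming the dump contains a `<step>` element near the top as in rrdtool's XML output. The function name and the sample XML are made up for illustration. Note that this only relabels the existing rows with the new step; it does not resample them, so check the restored file before relying on it.

```python
import re


def set_step(xml_text, new_step):
    """Replace the value of the first <step> element in an rrdtool XML dump."""
    updated, count = re.subn(
        r"<step>\s*\d+\s*</step>",
        f"<step> {new_step} </step>",
        xml_text,
        count=1,
    )
    if count != 1:
        raise ValueError("no <step> element found in dump")
    return updated


# Tiny stand-in for the top of a real dump, for demonstration only.
sample = "<rrd>\n  <version> 0003 </version>\n  <step> 300 </step>\n</rrd>\n"
print(set_step(sample, 600))
```

A typical sequence would be `rrdtool dump internet.rrd > internet.xml`, then the edit above, then `rrdtool restore internet.xml internet-new.rrd`; the file names here are placeholders.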

Here is a link to the rrd man page; it's a good read.
http://oss.oetiker.ch/rrdtool/doc/rrdgraph.en.html


Michael Beatty
list Phil Crooker · Thu, 18 Apr 2013 17:22:32 +1000 ·
First, many thanks for everyone's help, I really appreciate it and it saves me a lot of time.
 
Now, I followed Michael's directions, and the data was recorded in the rrd files. But no graphs. So, I then changed the DEF entries in graphs.cfg from GAUGE to AVERAGE as per Wim Nelis's advice and voila - they are there! I think I will change the colour scheme, though....  ;-)
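
Some background on why that change matters: in rrdtool, the last field of a DEF line is a consolidation function (AVERAGE, MIN, MAX or LAST), which has to match an RRA in the file, whereas GAUGE is a data source type. A minimal Python sketch that flags DEF lines with the wrong kind of value in that field follows. The function name is hypothetical, and the check only handles the simple four-field DEF form used in this thread.

```python
VALID_CFS = {"AVERAGE", "MIN", "MAX", "LAST"}


def bad_def_lines(graph_cfg_text):
    """Return DEF lines whose last field is not a consolidation function."""
    problems = []
    for raw in graph_cfg_text.splitlines():
        line = raw.strip()
        if not line.startswith("DEF:"):
            continue  # skips CDEF, LINE, TITLE and so on
        # DEF:<vname>=<rrdfile>:<ds-name>:<CF>
        cf = line.split(":")[-1]
        if cf not in VALID_CFS:
            problems.append(line)
    return problems


before = """
[internet]
    DEF:google=internet.rrd:google:GAUGE
    DEF:bloomberg=internet.rrd:bloomberg:AVERAGE
"""
print(bad_def_lines(before))
```

Run against the original graphs.cfg entry, a check like this would flag all three DEF lines.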
 
Many, many thanks, all, this is excellent.
 
regards, Phil
list David Welker · Mon, 1 Jul 2013 10:32:43 -0400 ·
Maybe I have a unique problem, but I'm hoping someone can help me out as I
have scoured the archives, tried all kinds of combinations, and still
haven't found the perfect solution.

Problem:
I have a column that reports a list of system and user processes (my-conn).
I have a script that gets the summary data into a file (counts) in NCV
format.
I would like to report the summary data as a graph on the same column as
the list of systems and processes (my-conn)

I initially had both my-conn and counts=ncv in my TEST2RRD variable, which
allowed my graph to show up,
but I was getting all kinds of DS errors with the my-conn rrd, even after
making it type GAUGE.
I also was sending status messages.
I tried sending data messages, since I don't want a separate column
displayed with the summary data, which didn't seem to help.
Sometimes I get links on the page, or sometimes I get the graph on the
trends page, but not the column page.

I've tried all kinds of variations on the theme...
TEST2RRD=counts=ncv, my-conn=counts
TEST2RRD=counts, my-conn
TEST2RRD=counts
TEST2RRD=my-conn=ncv

GRAPH=counts
GRAPH=my-conn
GRAPH=my-conn=ncv

Bottom line, can this be done?  If so, what should I be sending, and what
should the variable values be?

Thanks!
David
list Jeremy Laidman · Wed, 3 Jul 2013 12:28:24 +1000 ·
quoted from David Welker
On 2 July 2013 00:32, David Welker <user-04cf53598626@xymon.invalid> wrote:
Maybe I have a unique problem, but I'm hoping someone can help me out as I
have scoured the archives, tried all kinds of combinations, and still
haven't found the perfect solution.

This is doable.  Not sure if you're doing anything wrong, but one thing
comes to mind.  If you changed your DS type to/from GAUGE, then you need to
rebuild or erase your RRD file.

Regardless, it might help to see a setup that works.  For example, I graph
SMART media errors.  My script (see http://xymonton.org/monitors:xymon-smart)
presents the data values for a sample in a data "trends" message, and also
sends a status "smart" message.  The script that creates these messages is
run from a file in tasks.d/.  The data messages include various metrics
including uncorrected errors, corrected errors, drive temperature and so on.

In graphs.cfg, I append entries for [smart], [smart_uncorrected],
[smart_corrected] and [smart_temp].  Here's an example:

[smart]
        # total read/write errors
        TITLE S.M.A.R.T. Total Media Errors
        YAXIS errors per second
        FNPATTERN ^smart.(.*).rrd
        DEF:rc@RRDIDX@=@RRDFN@:err_r_c:AVERAGE
        DEF:ru@RRDIDX@=@RRDFN@:err_r_u:AVERAGE
        DEF:wc@RRDIDX@=@RRDFN@:err_w_c:AVERAGE
        DEF:wu@RRDIDX@=@RRDFN@:err_w_u:AVERAGE
        CDEF:re@RRDIDX@=rc@RRDIDX@,ru@RRDIDX@,+
        CDEF:we@RRDIDX@=wc@RRDIDX@,wu@RRDIDX@,+
        COMMENT:@RRDPARAM@\:\n
        LINE1:re@RRDIDX@#@COLOR@:Read Errors         :
        GPRINT:re@RRDIDX@:LAST:\: %5.1lf %s (cur)
        GPRINT:re@RRDIDX@:MAX: %5.1lf %s (max)
        GPRINT:re@RRDIDX@:MIN: %5.1lf %s (min)
        GPRINT:re@RRDIDX@:AVERAGE: %5.1lf %s (avg)\n
        LINE1:we@RRDIDX@#@COLOR@:Write Errors        :
        GPRINT:we@RRDIDX@:LAST:\: %5.1lf %s (cur)
        GPRINT:we@RRDIDX@:MAX: %5.1lf %s (max)
        GPRINT:we@RRDIDX@:MIN: %5.1lf %s (min)
        GPRINT:we@RRDIDX@:AVERAGE: %5.1lf %s (avg)\n

In xymonserver.cfg, I append "smart" to the TEST2RRD variable, thusly:

TEST2RRD="cpu=la,disk,...xymonproxy,xymond,smart"

This should result in the RRD file being created and populated.  It should
also result in the graph being shown on the status view of the "smart"
test, using the definition added into graphs.cfg.

Also in xymonserver.cfg, I append "smart" to the GRAPHS variable, like so:

GRAPHS="la,disk,...,xymond,ntp,smart"

This should result in a single "smart" graph being added to the trends page.

In hosts.cfg I include smart in a TRENDS parameter like so:

10.1.2.3 servername # conn dns this that TRENDS:smart:smart|smart_uncorrected|smart_corrected|smart_temp

This is not strictly necessary for what you want, but I want all possible
graphs to show on the trends page for this host. This essentially says that
on the trends page, where the "smart" graph would have gone, instead put
the four graphs smart, smart_uncorrected, smart_corrected and smart_temp.

J