Xymon Mailing List Archive

strange graph behavior - random machines & graphs

Gary Baluha
Wed, 5 Dec 2007 15:55:38 -0500
Message-Id: <user-a89491669fb9@xymon.invalid>

I wrote a script to clean up these bogus values.  Of course, if any trend
graphs have values large enough for NNNe+1NN to be valid, the script will
produce unexpected results.  To run the script, you need to "cd" into the
directory containing the rrd files to be fixed.

On Dec 5, 2007 2:05 PM, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:
cd hobbit_data_dir/host_machine
rrdtool dump clock.rrd > clock.xml

I know any value with an exponent of "e+1nn" or greater is bogus, so I
search for "e+1".

One of several bogus data lines:
<!-- 2007-11-26 19:00:00 EST / 1196121600 --> <row><v> 3.9551632477e+169</v></row>

Same line, changed to NaN (repeat for all affected lines):
<!-- 2007-11-26 19:00:00 EST / 1196121600 --> <row><v> NaN </v></row>

rrdtool restore clock.xml clock.rrd
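
The manual steps above can be sketched as a small shell helper. This is only
a sketch, not the script mentioned at the top of this thread. The function
names nan_bogus_values and fix_rrd are made up for illustration, and it
assumes the dump puts each value in a <v>...</v> element as in the sample
lines above, and that only exponents of the form e+1NN are bogus.

```shell
# Sketch only: hypothetical helpers, not part of rrdtool or Xymon.

# Reads "rrdtool dump" XML on stdin and writes it to stdout, replacing
# every <v> value whose exponent looks like e+1NN with NaN.
nan_bogus_values() {
    sed -E 's#<v>[[:space:]]*[-+]?[0-9]+(\.[0-9]+)?e\+1[0-9][0-9][[:space:]]*</v>#<v> NaN </v>#g'
}

# Dumps one .rrd file, cleans the XML, and restores it under the same name.
# "rrdtool restore" normally refuses to overwrite an existing file unless
# --force-overwrite is given, so the original is moved aside as FILE.bak.
fix_rrd() {
    rrd=$1
    xml=${rrd%.rrd}.xml
    rrdtool dump "$rrd" | nan_bogus_values > "$xml" &&
        mv "$rrd" "$rrd.bak" &&
        rrdtool restore "$xml" "$rrd"
}
```

From inside the host's data directory this would be called as
"fix_rrd clock.rrd". Values that are legitimately in the e+1NN range would
be wiped as well, so it is worth checking the .bak copy before deleting it.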


On Dec 5, 2007 11:57 AM, Kern, Thomas <user-f1ebafb19faf@xymon.invalid> wrote:
 Could you give a short example of a bogus and a changed (NaN) entry,
just in case that is also happening to some of my data files?


/Thomas Kern
/XXX-XXX-XXXX (O)
/XXX-XXX-XXXX (M)


*From:* Gary Baluha [mailto:user-ae3e15c22de1@xymon.invalid]
*Sent:* Wednesday, December 05, 2007 11:53 AM
*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] strange graph behavior - random machines & graphs

I am now completely convinced that the strange behavior of the graphs is
due to bad data being inserted into the .rrd database files.  The bad data
is always the same value: 5.1776682516e+170.  That's what the value looks
like when you run "rrdtool dump" on the .rrd database file.

I still have no idea where this value is coming from, but I have at
least determined how to fix these graphs.  I'm working on a script to do
this, but for now I manually dump the file with "rrdtool dump", change all
bogus values to NaN, and then run "rrdtool restore" on the modified xml
file.  To find the bogus values I search for "e+1": none of the values I
trend normally get that large, so any such entry must be either the
5.17... number itself or an average of correct data with it.
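
To check whether a given file is affected before editing anything, the same
search can be done as a count. This is a rough sketch under the same
assumption as above (bogus values are the ones with an e+1NN exponent), and
count_bogus_lines is a made-up helper name, not an rrdtool command.

```shell
# Sketch only: hypothetical helper, not part of rrdtool or Xymon.
# Reads "rrdtool dump" XML on stdin and prints the number of lines that
# contain at least one <v> value with an exponent of the form e+1NN.
# grep exits non-zero when nothing matches, so "|| true" keeps a count
# of 0 from looking like a failure to the calling shell.
count_bogus_lines() {
    grep -c -E '<v>[[:space:]]*[-+]?[0-9]+(\.[0-9]+)?e\+1[0-9][0-9][[:space:]]*</v>' || true
}
```

For example, "rrdtool dump clock.rrd | count_bogus_lines" prints 0 for a
clean file and a positive number for one that needs fixing.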

It would be nice to determine where this problem is coming from, though.

Attachments (1)