strange graph behavior - random machines & graphs

Gary Baluha
Fri, 30 Nov 2007 11:14:41 -0500
Message-Id: <user-ec47dfaf853a@xymon.invalid>

On Nov 30, 2007 10:53 AM, Hubbard, Greg L <user-d970b5e56ec9@xymon.invalid> wrote:
 Gary,

This is pretty hard to decipher from "afar".

I think I remember you saying that when you dump the data it is always
okay?
Actually, it turns out this is not true.  The rrd file does indeed have the
bad data.  I just didn't notice it before, but now that the problem appears
to be getting worse, the bad data is obvious.
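For anyone wanting to check their own files the same way, here is a minimal sketch of scanning the XML that `rrdtool dump file.rrd` produces for implausibly large values. The function name, the threshold, and the sample numbers are made up for illustration; they are not taken from the affected files in this thread.

```python
import math
import xml.etree.ElementTree as ET


def find_suspect_values(dump_xml, threshold):
    """Return (rra_index, row_index, value) for each stored value whose
    magnitude exceeds threshold. NaN (unknown) entries are skipped."""
    root = ET.fromstring(dump_xml)
    suspects = []
    for rra_index, rra in enumerate(root.iter("rra")):
        for row_index, row in enumerate(rra.iter("row")):
            for v in row.findall("v"):
                value = float(v.text.strip())
                if math.isnan(value):
                    continue
                if abs(value) > threshold:
                    suspects.append((rra_index, row_index, value))
    return suspects


# Hypothetical, heavily trimmed dump; a real dump has more header fields.
SAMPLE = """<rrd>
  <rra>
    <cf> AVERAGE </cf>
    <database>
      <row><v> 1.2000000000e+01 </v></row>
      <row><v> NaN </v></row>
      <row><v> 4.2949672950e+09 </v></row>
    </database>
  </rra>
</rrd>"""

print(find_suspect_values(SAMPLE, threshold=1e6))
```

Running the same scan over several affected files and comparing the flagged values would show whether they really all contain the same large number.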

Some wild thoughts:
a) could there be two different processes updating the same RRD files?
I don't believe so.  The strange thing is, all of the graphs that become
corrupted have the exact same large number that is being input into the rrd
data files.

b) are all servers using the same version of rrdtool?
No.  One is running 1.2.23, the other 1.2.26.  Both have the problem.

c) are the hobbitgraph files okay?  I have proven to my satisfaction that
hobbitgraph definition errors can make the graphs act funny.
They haven't changed since before the graphs were having this problem.

d) if this stuff is on a SAN, can it be moved to local storage?
It is on the SAN on one of the machines, and locally on the other.  I was
thinking of temporarily moving the data directory and having Hobbit regenerate
all the data from scratch.  I'm trying to avoid this, since that would mean
losing a year's worth of trend data that has proven itself very useful.
Still, if it helps me narrow down the problem, I'll consider this (and move
the data back once I get my answer).

I am just "fishing."  Sometimes, when I am at my wit's end, I just change
SOMETHING to see if it makes a difference. Even WORSE can help get me
started.

GLH

*From:* Gary Baluha [mailto:user-ae3e15c22de1@xymon.invalid]
*Sent:* Friday, November 30, 2007 9:25 AM
*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] strange graph behavior - random machines & graphs

This now appears to be becoming a more serious problem.  More and more
graphs are being affected, and I still have no explanation for what is
going on.  It also seems that almost any newly created graph (such as when I
delete/rename/move an existing .rrd file) starts off corrupted
immediately. :-(

On Nov 28, 2007 10:08 AM, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:
I have recently noticed a strange thing happening with some of the rrd
graphs generated by Hobbit.  When you look at the graph, it looks as though
the rrd data is in one format (gauge), but the graph is generating it in a
different format (derive).  I can't seem to find any pattern to the hosts or
tests that are exhibiting this strange behavior, and it is only happening on
a handful of graphs.  I have attached a picture of one of these graphs,
since I'm not really sure how to describe it.  Note the huge numbers
displayed on the curr/min/avg/max line.
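To illustrate the gauge/derive distinction being described, here is a simplified sketch of how the same samples look under each interpretation. It ignores rrdtool's interpolation and consolidation, and the sample numbers are invented, so treat it as a rough model rather than what rrdtool does internally.

```python
def as_gauge(samples):
    """GAUGE: each sample value is stored as-is."""
    return [value for _, value in samples]


def as_derive(samples):
    """DERIVE: store the per-second rate of change between consecutive samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


# Hypothetical (timestamp_seconds, value) pairs at a 300-second step.
samples = [(0, 1000), (300, 1600), (600, 2500)]

print(as_gauge(samples))   # raw values
print(as_derive(samples))  # rates per second
```

The two interpretations differ by orders of magnitude for the same input, which is why a mismatch between how data is stored and how it is read shows up as wildly wrong numbers on the curr/min/avg/max line.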

Any idea what's going on here?  When I dump the RRD file manually,
everything looks okay.  I'm running Hobbit 4.2.0 with the 2007-02-09
allinone patch (I believe the latest).  This has only happened in the past
few weeks, though when exactly it started, I don't know.  Any ideas?