Xymon Mailing List Archive

strange graph behavior - random machines & graphs

Gary Baluha
Fri, 30 Nov 2007 13:31:01 -0500
Message-Id: <user-5ba8cb6d4450@xymon.invalid>

On Nov 30, 2007 1:15 PM, Hubbard, Greg L <user-d970b5e56ec9@xymon.invalid> wrote:
> It sounds like you are zeroing in on the problem.  Based on your other
> post (and this) it seems that the data is getting logged okay in the RRD,
> and that data is being faithfully reproduced by the graphs.  The problem is
> that the data itself has unexpected values.  So whatever is providing that
> data to the RRD is either faulty, or is in turn being misled by something
> else further upstream.
Yeah, I'm fairly confident now that it is the initial data being fed into
the RRD file that is faulty.  I'm still not sure what the initial "entry
point" of this bad data is, though, or why it is happening.  I have a
feeling that once I determine where the entry point is, that will lead me to
the "why".

> I don't remember where you said that this data was coming from.  I know
> there can be a problem with "rollovers" when a signed integer is used as a
> counter and it grows to the point where the sign bit flips.  This can cause
> a big jump in a reading if the software cannot handle the switch from
> 2,147,483,647 (hex 7FFFFFFF) to the next value (hex 80000000), which flips the
> sign bit of a signed 32-bit integer.  This has been a problem in the SNMP
> world for YEARS.
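To illustrate the rollover described above, here is a minimal sketch. It is not Xymon or RRDtool code, and the function names are hypothetical. It assumes a 32-bit counter whose raw bits are read either as a signed value (showing the jump from +2,147,483,647 to a large negative number) or as an unsigned value, where the difference between two readings can be taken modulo 2^32 so that a single wrap still gives the true increment.

```python
# Hypothetical helpers illustrating a 32-bit counter rollover (not Xymon/RRDtool code).

def as_signed32(raw: int) -> int:
    """Reinterpret the low 32 bits of raw as a two's-complement signed integer."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def naive_delta(prev: int, curr: int) -> int:
    """Plain subtraction: yields a huge bogus jump when the counter wraps."""
    return curr - prev


def wrap_aware_delta(prev_raw: int, curr_raw: int, bits: int = 32) -> int:
    """Increment between two unsigned counter readings, assuming at most one wrap."""
    modulus = 1 << bits
    return (curr_raw - prev_raw) % modulus


if __name__ == "__main__":
    prev_raw = 0x7FFFFFFF        # 2,147,483,647: largest positive signed 32-bit value
    curr_raw = prev_raw + 10     # counter advances by 10 and crosses the sign bit
    print(as_signed32(prev_raw), as_signed32(curr_raw))
    print(naive_delta(as_signed32(prev_raw), as_signed32(curr_raw)))
    print(wrap_aware_delta(prev_raw, curr_raw))
```

Read as signed, the second reading comes out as -2,147,483,639, so naive subtraction reports a drop of about 4.29 billion. The modular difference recovers the real increment of 10.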
Hrm, that has been vaguely on my mind.  But I haven't really considered it
_the_ reason, since I don't know why there would be any sort of data
rollover.  We're talking about load average and disk space usage graphs
that are showing invalid data.  I'm also curious why it would have started
all of a sudden, on two separate machines.  But it does seem more and more
like an integer rollover or something similar.