Xymon Mailing List Archive

Gaps in graphs

Jeremy Laidman
Tue, 9 Mar 2021 09:29:19 +1100
Message-Id: <CACO=ejwA0=UiasSo=user-742c58df5118@xymon.invalid>

On Mon, 8 Mar 2021 at 19:21, Carl Melgaard <user-cdea55422fa4@xymon.invalid> wrote:
Are you receiving these "duplicate RRD data" messages every 5 minutes,
or only occasionally (such as when you're seeing gaps in your graphs)?
It might be helpful to see one of your graphs with gaps in it.
Also, can you provide maybe 10 sequential log messages with the
"duplicate RRD data" in them? I'd like to get a sense of their regularity
and frequency.


2021-03-03 01:24:19.002264 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

2021-03-03 02:55:15.002852 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614736515, different data

2021-03-04 10:01:17.004140 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614848477, different data

2021-03-05 14:15:25.007389 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614950125, different data

2021-03-05 14:15:25.007523 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1614950125, different data

2021-03-05 22:56:18.014486 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614981378, different data

2021-03-05 22:56:18.015006 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1614981378, different data

2021-03-06 12:30:28.002023 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1615030228, different data

2021-03-06 12:30:28.002952 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1615030228, different data
Interesting. It seems to be a rare occurrence - no more than two duplicate
data points in a day - almost too few to notice. Are your gaps more than 5
minutes (one sample) long? It might be helpful for you to include an
example gappy graph for us to see.

Some of these errors relate to netstat and others to ifstat processing.
Both parsers receive data from the same client data message. Interestingly,
only some of the errors for netstat.rrd coincide with ones for ifstat.rrd.
The matching timestamps mean this is unlikely to be a coincidence, but I'm
not sure what to make of it, TBH.
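
To make the netstat/ifstat overlap easier to see, the duplicate messages can be grouped by the RRD timestamp they report. The following is only a rough sketch: the regular expression is an assumption based on the log lines quoted in this thread, and group_by_timestamp is a hypothetical helper name, not part of Xymon.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the messages quoted in this thread:
# "<date> <time>.<usec> <rrd file>: Bug - duplicate RRD data with same timestamp <epoch>, ..."
LINE_RE = re.compile(
    r"^(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ "
    r"(?P<rrd>\S+): Bug - duplicate RRD data with same timestamp (?P<ts>\d+)"
)


def group_by_timestamp(lines):
    """Map each RRD timestamp to the sorted list of RRD files that reported a duplicate."""
    groups = defaultdict(set)
    for line in lines:
        # Collapse runs of whitespace so stray spacing does not break the match.
        m = LINE_RE.match(" ".join(line.split()))
        if m:
            groups[int(m.group("ts"))].add(m.group("rrd"))
    return {ts: sorted(files) for ts, files in sorted(groups.items())}
```

Timestamps that map to more than one file are the cases where both parsers complained about the same client data message. Each log entry needs to be on a single line before it is passed in.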
One last thing to look at. Are the gaps actual missing data points, or are
they values of zero? The way to tell is to dump the RRD file's contents
using something like "rrdtool fetch netstat.rrd AVERAGE | tail -100" (or
"less" rather than "tail -100") and look for either zero or low numbers,
or NaN (not a number) entries. [Note that the last few are usually NaN
because they're still waiting for updates, so you can ignore those.]
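
If scanning the dump by eye is tedious, a short script can separate the NaN rows from the all-zero rows. This is a minimal sketch that assumes the usual "rrdtool fetch" output shape: a header line of data-source names, then rows of "timestamp: value value ...", with missing values printed as "nan" or "-nan". The function name classify_rows is made up for illustration.

```python
import math


def classify_rows(fetch_output):
    """Split `rrdtool fetch` rows into NaN rows and all-zero rows.

    Returns (nan_timestamps, zero_timestamps).
    """
    nan_rows, zero_rows = [], []
    for line in fetch_output.splitlines():
        stamp, sep, rest = line.partition(":")
        if not sep or not stamp.strip().isdigit():
            continue  # header line with data-source names, or a blank line
        values = [float(v) for v in rest.split()]  # float() accepts "nan" and "-nan"
        if not values:
            continue
        if all(math.isnan(v) for v in values):
            nan_rows.append(int(stamp))
        elif all(v == 0.0 for v in values):
            zero_rows.append(int(stamp))
    return nan_rows, zero_rows
```

The text to pass in is whatever the fetch command above prints, for example captured with subprocess.run(["rrdtool", "fetch", "netstat.rrd", "AVERAGE"], capture_output=True, text=True).stdout. NaN rows at the very end of the output are expected and can be ignored, as noted above.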


Currently I can't actually find a graph with a gap in it. I just noticed
because it happened on the Xymon server itself. On my old setup, it never
happened.
OK. I think your best bet to diagnose is going to be correlating log
messages or other events to the gaps.

You mentioned an "old setup". Can you describe what has changed from old to
new setup? Have you upgraded hardware/OS/Xymon server/Xymon client(s)?

You said that you noticed on the Xymon server itself. Has it only happened
to graphs for the Xymon server? I'm wondering if you have the Xymon client
AND the Xymon server both running on the same host?

Also, in xymonclient.log I get these quite a lot, dunno if it's related:


mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory
Can you explain "quite a lot"? Can you give an indication of how often these
occur?

This might very well be related. The logfetch and vmstat files are created
during the construction of the client data message. It's likely that some,
if not all, of the client data message will be missing when these log
messages show up.

I'd be trying to correlate these log messages with the times that you get
gaps in your graphs. If they match, then it looks to be a problem with the
Xymon client.
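
One rough way to do that correlation, once you have two lists of epoch timestamps (gap times from the RRD dump, and the times at which the client errors appeared), is to look for events within one sample interval of each gap. This is only a sketch: correlate is a hypothetical helper, the 300-second window is an assumption based on the 5-minute sample interval, and how you obtain the event times is left open, since the client log lines quoted above carry no timestamps of their own.

```python
from bisect import bisect_left, bisect_right


def correlate(gap_times, event_times, window=300):
    """For each gap timestamp, list the events within `window` seconds of it."""
    events = sorted(event_times)
    matches = {}
    for gap in gap_times:
        lo = bisect_left(events, gap - window)   # first event >= gap - window
        hi = bisect_right(events, gap + window)  # one past last event <= gap + window
        matches[gap] = events[lo:hi]
    return matches
```

Gaps that map to an empty list had no client error nearby, which would point away from the client as the cause of those particular gaps.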