Xymon Mailing List Archive

RRD updates for hours, then stops updating for hours

2 messages in this thread

list Oliver · Wed, 06 Nov 2024 13:55:13 -0500 ·
Hopefully I can explain this properly because I don't really understand what's going on.

I'm running 4.3.30 (Terabithia RPMs) on CentOS 6.10.

I have an external server that reports the number of files waiting to be processed by our internal app.  We want this number to be 1 (0 means a problem, more than 1 means a queue is building), but for testing purposes I'm overriding the number using FILE_COUNT="$(( ( RANDOM % 31 ) ))" to give me any number between 0 and 30.

I see this reflected correctly in my XYMONSRV status page.  Every 5 minutes it shows the new value and changes colour accordingly, but the associated RRD does not always get updated.

Here's what I see if I tcpdump on the client (and grep for TTlogs.)  It is sent every 5 minutes without fail.
.status myserver.TTlogs red The count is too high. Data corruption may occur if this reaches 30. [ fileCount:20 ]

However, my associated RRD file hasn't updated in nearly 3 hours:

$ date
Wed Nov  6 18:30:04 GMT 2024
$ ls -l TTlogs.rrd
-rw-r--r-- 1 xymon xymon 217352 Nov  6 15:49 TTlogs.rrd

My troubleshooting of RRD is weak but maybe this will help.

The timestamp that 'rrdtool lastupdate' shows is 15:39 which corresponds with the timestamp on the file
$ rrdtool lastupdate TTlogs.rrd
  fileCount
1730905160: 1

Fetching the LAST readings gives (with comments by me):
$ rrdtool fetch TTlogs.rrd LAST
                       fileCount
1730831700: -nan  (time is November 5, 2024, at 20:15:00)
...same reading at 300 intervals until...
1730844300: -nan
1730844600: 1.0000000000e+00 (time is November 6, 2024, at 00:10:00)
... same at 300 intervals...
1730904900: 1.0000000000e+00 (time is November 6, 2024, at 15:35:00)
1730905200: -nan (time is November 6, 2024, at 15:40:00)
... same at 300 intervals...
1730918100: -nan (time is November 6, 2024, at 19:15:00 - in the future?)

So it was updating with 1 (the real reading) from 00:10:00 to 15:35:00, then went back to -nan, and it refuses to update even though I'm changing the reading every 5 minutes.
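
The epoch values printed by rrdtool can be converted directly to double-check the annotated times. A minimal sketch, assuming GNU date (the "-d @SECONDS" form from coreutils, as found on CentOS); epoch_to_utc is a hypothetical helper name, not part of rrdtool or Xymon:

```shell
# Convert Unix epoch seconds (as printed by rrdtool) to a UTC timestamp.
# Assumes GNU date; epoch_to_utc is an illustrative helper name.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Print the last-update time and the first -nan slot from the output above.
for ts in 1730905160 1730905200; do
    printf '%s  %s\n' "$ts" "$(epoch_to_utc "$ts")"
done
```

Using -u keeps the result in UTC, so it can be compared with the GMT output of the date command regardless of the server's local time zone.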

TTlogs.rrd was generated automatically so it should be set up with whatever the Xymon defaults are.

My xymon config for TTlogs is an external file in xymonserver.cfg.d with:
TEST2RRD+=",TTlogs=ncv"
GRAPHS+=",TTlogs"
NCV_TTlogs="fileCount:GAUGE"

I'm not exactly sure what this dump of the RRD tells me, if anything, but maybe it helps:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<!-- Round Robin Database Dump -->
<rrd>
         <version>0003</version>
         <step>300</step> <!-- Seconds -->
         <lastupdate>1730905160</lastupdate> <!-- 2024-11-06 14:59:20 GMT -->

         <ds>
                 <name> fileCount </name>
                 <type> GAUGE </type>
                 <minimal_heartbeat>600</minimal_heartbeat>
                 <min>NaN</min>
                 <max>NaN</max>

                 <!-- PDP Status -->
                 <last_ds>1</last_ds>
                 <value>2.6000000000e+02</value>
                 <unknown_sec> 0 </unknown_sec>
         </ds>

         <!-- Round Robin Archives -->

I'm kind of lost as to what is going on.  Any ideas on how to troubleshoot further?

Any help or advice would be gratefully received.
list Oliver · Wed, 06 Nov 2024 15:31:30 -0500 ·
On 2024-11-06 13:55, user-70687f024faa@xymon.invalid wrote:
Hopefully I can explain this properly because I don't really understand what's going on.
...

So... pretty sure it's because of this (from the man page):
  Note that each "NAME : value" must be on a line by itself. If you have a custom script
  generating the status- or data-message that is fed into the NCV handler, make sure it
  inserts a newline before each of the data-items you want to track.

I fixed that so fileCount is on a line by itself, and also made sure to use <!-- ncv_end --> to end collection. All has been working as expected since.
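
For anyone hitting the same problem, here is a rough sketch of a client script that puts the data item on its own line. It assumes the usual Xymon client environment variables ($XYMON, $XYMSRV and $MACHINE); the helper name build_ttlogs_status and the message wording are illustrative only, not the script actually used:

```shell
#!/bin/bash
# Sketch only: build a TTlogs status message for the NCV handler.
# build_ttlogs_status is a hypothetical helper; its arguments are the
# colour, a free-text summary, and the file count.
build_ttlogs_status() {
    local color="$1" summary="$2" count="$3"
    # Line 1 is the status header. The data item then gets a line of its
    # own in "NAME : value" form, and the ncv_end marker ends NCV collection.
    printf 'status %s.TTlogs %s %s\n\nfileCount : %s\n<!-- ncv_end -->\n' \
        "${MACHINE:-myserver}" "$color" "$summary" "$count"
}

# Show the message. To send it, pass the same text to the xymon client:
#   $XYMON $XYMSRV "$(build_ttlogs_status red 'The count is too high.' 20)"
build_ttlogs_status red 'The count is too high.' 20
```

Compared with the original message, the only structural change is that fileCount is no longer embedded in the summary line as "[ fileCount:20 ]".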