Hopefully I can explain this properly because I don't really understand what's going on.
I'm running Xymon 4.3.30 (Terabithia RPMs) on CentOS 6.10.
I have an external server that reports the number of files waiting to be processed by our internal app. We want this number to be 1 (0 means a problem, more than 1 means a queue is building). For testing purposes I'm overriding the number with FILE_COUNT="$(( ( RANDOM % 31 ) ))", which gives me an integer between 0 and 30 inclusive.
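For reference, a minimal sketch of how the test override could sit in the client script. The colour thresholds (other than 1 being the healthy value) and the colour_for_count name are assumptions for illustration, not what the real script does:

```shell
# Hypothetical helper: map a file count to a Xymon colour.
# Only "1 is healthy" comes from the description above; the yellow/red
# bands are assumed for the sake of the example.
colour_for_count() {
    count=$1
    if [ "$count" -eq 1 ]; then
        echo green
    elif [ "$count" -eq 0 ]; then
        echo red        # nothing waiting: treated as a problem
    elif [ "$count" -lt 10 ]; then
        echo yellow     # assumed warning band: a queue is building
    else
        echo red        # assumed critical band
    fi
}

# Test override: an integer from 0 to 30 inclusive.
FILE_COUNT="$(( RANDOM % 31 ))"
echo "fileCount=$FILE_COUNT colour=$(colour_for_count "$FILE_COUNT")"
```

Because the modulus is 31, the value 30 itself can occur, so the override covers the full range mentioned in the status text.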
I see this reflected correctly on my XYMONSRV status page: every 5 minutes it shows the new value and changes colour accordingly. However, the associated RRD does not always get updated.
Here's what I see when I run tcpdump on the client and grep for TTlogs. The status is sent every 5 minutes without fail:
.status myserver.TTlogs red The count is too high. Data corruption may occur if this reaches 30. [ fileCount:20 ]
However, my associated RRD file hasn't updated in nearly 3 hours:
$ date
Wed Nov 6 18:30:04 GMT 2024
$ ls -l TTlogs.rrd
-rw-r--r-- 1 xymon xymon 217352 Nov 6 15:49 TTlogs.rrd
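My RRD troubleshooting skills are weak, but maybe this will help.
The timestamp that 'rrdtool lastupdate' shows, 1730905160, is 14:59:20 GMT on November 6 (the same value as <lastupdate> in the dump further down), which is about 50 minutes earlier than the 15:49 modification time on the file: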
$ rrdtool lastupdate TTlogs.rrd
fileCount
1730905160: 1
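To double-check an epoch value like this by hand, something along these lines should work with GNU date (the version shipped with CentOS). The function name epoch_to_utc is just a label for this sketch:

```shell
# Convert an RRD epoch timestamp to a readable UTC time.
# Relies on the GNU date "@<seconds>" syntax.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# The value reported by 'rrdtool lastupdate' above.
epoch_to_utc 1730905160
```

The result can be compared against the human-readable comment that 'rrdtool dump' prints next to <lastupdate>.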
Fetching the LAST readings gives (with comments by me):
$ rrdtool fetch TTlogs.rrd LAST
fileCount
1730831700: -nan (time is November 5, 2024, at 18:35:00 GMT)
...same reading at 300-second intervals until...
1730844300: -nan
1730844600: 1.0000000000e+00 (time is November 5, 2024, at 22:10:00 GMT)
...same at 300-second intervals...
1730904900: 1.0000000000e+00 (time is November 6, 2024, at 14:55:00 GMT)
1730905200: -nan (time is November 6, 2024, at 15:00:00 GMT)
...same at 300-second intervals...
1730918100: -nan (time is November 6, 2024, at 18:35:00 GMT - slightly in the future?)
So it was updating with 1 (the real reading) from 22:10:00 on November 5 to 14:55:00 on November 6, then went back to -nan, and it refuses to update even though I'm changing the reading every 5 minutes.
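To pin down where the real readings stop without scrolling through a day of rows, the fetch output can be filtered. This is only a sketch: last_known_sample is a made-up name, and the sample rows are copied from the output above rather than read from the live file:

```shell
# Hypothetical helper: print the newest timestamp in 'rrdtool fetch' output
# that carries a real value rather than nan.
last_known_sample() {
    awk -F': ' '$1 ~ /^[0-9]+$/ && $2 !~ /nan/ { ts = $1 }
                END { if (ts != "") print ts }'
}

# Sample rows from the fetch output; on a live system the output of
# 'rrdtool fetch TTlogs.rrd LAST' would be piped into the function instead.
printf '%s\n' \
    '1730844300: -nan' \
    '1730844600: 1.0000000000e+00' \
    '1730904900: 1.0000000000e+00' \
    '1730905200: -nan' | last_known_sample
```

For the rows shown this picks out 1730904900, the last slot holding a value before the -nan run begins.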
TTlogs.rrd was generated automatically so it should be set up with whatever the Xymon defaults are.
My Xymon config for TTlogs is in a separate file under xymonserver.cfg.d containing:
TEST2RRD+=",TTlogs=ncv"
GRAPHS+=",TTlogs"
NCV_TTlogs="fileCount:GAUGE"
I'm not sure what, if anything, this tells me, but here is the start of the 'rrdtool dump' output in case it helps:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<!-- Round Robin Database Dump -->
<rrd>
<version>0003</version>
<step>300</step> <!-- Seconds -->
<lastupdate>1730905160</lastupdate> <!-- 2024-11-06 14:59:20 GMT -->
<ds>
<name> fileCount </name>
<type> GAUGE </type>
<minimal_heartbeat>600</minimal_heartbeat>
<min>NaN</min>
<max>NaN</max>
<!-- PDP Status -->
<last_ds>1</last_ds>
<value>2.6000000000e+02</value>
<unknown_sec> 0 </unknown_sec>
</ds>
<!-- Round Robin Archives -->
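The dump shows a 300-second step and a 600-second minimal_heartbeat, so any gap of more than 600 seconds between accepted updates leaves the affected slots unknown. A rough sketch for comparing the age of the last update with the heartbeat follows. The function name is hypothetical, and 1730917804 is simply the 'date' output above (18:30:04 GMT) converted to an epoch value:

```shell
# Hypothetical helper: report whether the gap since the last RRD update
# exceeds the data source heartbeat.
# Arguments: <now epoch> <lastupdate epoch> <heartbeat seconds>
update_gap_status() {
    now=$1
    last=$2
    heartbeat=$3
    gap=$(( now - last ))
    if [ "$gap" -gt "$heartbeat" ]; then
        echo "stale: ${gap}s since last update (heartbeat ${heartbeat}s)"
    else
        echo "fresh: ${gap}s since last update (heartbeat ${heartbeat}s)"
    fi
}

# On a live system the inputs could come from the tools themselves, e.g.
#   now=$(date +%s)
#   last=$(rrdtool last TTlogs.rrd)
update_gap_status 1730917804 1730905160 600
```

With the values from this post the gap is 12644 seconds, far beyond the heartbeat. That fits the long run of -nan rows, but it does not by itself explain why the updates stopped being written.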
I'm kind of lost as to what is going on. Any ideas on how to troubleshoot further?
Any help or advice would be gratefully received.