Gaps in graphs

16 messages in this thread

list Carl Melgaard · Thu, 4 Mar 2021 10:43:54 +0000 ·

Hi,

How serious are gaps in graphs - for instance disk graphs etc.? Is a gap the same as potentially missing alerts on events?

Regards,

Carl Melgaard

list Jeremy Laidman · Fri, 5 Mar 2021 12:37:45 +1100 ·

▸ quoted from Carl Melgaard

On Thu, 4 Mar 2021 at 21:44, Carl Melgaard <user-cdea55422fa4@xymon.invalid> wrote:

Hi,


How serious is gaps in graphs - for instance disk-graphs etc. Is a gap the
same as potential missing alerting on events?


Regards,


Carl Melgaard

Yes, usually gaps in graphs are caused by missing data points. In the case
of the disk graph, this is usually caused by missing client data messages
that are not being sent from host to Xymon server, for some reason - such
as stopping the Xymon client at just the wrong time. It's also possible
that client data messages are not being sent in a timely manner - if two
data points are fed into RRD within the same 5-minute interval, the second
one is ignored, and then the next 5-minute interval will have no data point.

One unlikely cause of missing graphs is that the client data message is
being truncated. If the disk stats are after the point of truncation, then
there are no data points to add to the RRD file, so you'll see a gap. I
would check your xymond.log file for messages like "Oversize data/client
msg from 10.1.1.1 truncated n=<msgsize>, limit <msglimit>)". If disk graphs
are affected by a section earlier in the message, it's likely that other
graphs are also affected by this - the [df] section is followed by [free]
(memory), then [ifconfig] and all the other sections used for network stats.
Perhaps scan down the graphs on the trends page looking for similar gaps.

I've seen client data message truncation cause missing data points, but,
it's actually unlikely this is the cause of your problem. All of the client
data sections that are likely to cause truncation are after the sections
that are used for the standard graphs (including disk). But it couldn't
hurt to check. Message limit defaults can be changed in the xymonserver.cfg
file - search the man page for MAXMSG_CLIENT for more details.

If the cause is something else, I suspect you'll still find clues in your
xymond.log file. But also check rrd-data.log and rrd-status.log.

J

list Carl Melgaard · Fri, 5 Mar 2021 08:11:31 +0000 ·

It is indeed wrong with more graphs than just disk. I'll look around - thanks for the pointers.

Regards,

Carl

list Carl Melgaard · Fri, 5 Mar 2021 10:48:41 +0000 ·

So, I looked through the logs. xymond.log doesn't point to anything besides the normal spam in there, except this one: Sending dropstate (from xymond) with xxx

But in the rrd-data.log and rrd-status.log I have this occurring (more than once):

rrd-data.log
2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

rrd-status.log
2021-03-03 02:55:15.003177 xxx/disk,tmpfs.rrd: Bug - duplicate RRD data with same timestamp 1614736515, different data

I recently updated to newest version of Xymon (from a very old version), and it seems I carried over some MAXMSG-settings:

MAXMSG_STATUS="5180590"
MAXMSG_CLIENT="5180590"
MAXMSG_DATA="5180590"
#MAXMSG_CLIENT=512              # clientdata messages (default=512k)
#MAXMSG_STATUS=256              # general "status" messages (default=256k)
#MAXMSG_DATA=256                # "data" messages, if enabled (default=256k)

And if Xymon now thinks the numbers are in kilobytes instead of bytes, I may have allocated A LOT more memory than intended?
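
To put a number on that worry, here is a rough calculation. It assumes (this is not confirmed anywhere in this thread) that the old version read the value as bytes and that the new one reads it as kB, as the commented defaults ("default=512k") suggest. It only shows the size of the limit under each reading, not what xymond actually allocates.

```python
# Value carried over into MAXMSG_STATUS, MAXMSG_CLIENT and MAXMSG_DATA above.
CARRIED_OVER = 5180590

def as_mib(value, unit_bytes):
    """Return the size in MiB when `value` is counted in units of `unit_bytes`."""
    return value * unit_bytes / (1024 * 1024)

# Assumed old interpretation: the number is a byte count.
read_as_bytes_mib = as_mib(CARRIED_OVER, 1)
# Assumed new interpretation: the number is a kB count.
read_as_kib_mib = as_mib(CARRIED_OVER, 1024)

print(f"read as bytes: {read_as_bytes_mib:.2f} MiB")
print(f"read as kB:    {read_as_kib_mib:.2f} MiB")
```

Under these assumptions the same setting goes from a limit of about 5 MiB to one of about 5 GiB per message type, a factor of 1024.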

Regards,

Carl Melgaard

list Scot Kreienkamp · Fri, 5 Mar 2021 15:04:10 +0000 ·

2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

That usually happens because graphs by default store data once every 5 minutes.  However, if it receives data more often, say every minute, then it can't store that in the RRD because it can only store one data point every 5 minutes.  Since it's receiving data more often than once in the 5-minute window, the RRD backend triggers that message.


Scot Kreienkamp | Senior Systems Engineer | La-Z-Boy Corporate

list Jeremy Laidman · Sat, 6 Mar 2021 18:31:10 +1100 ·

▸ quoted from Scot Kreienkamp

On Sat, 6 Mar 2021 at 02:04, Scot Kreienkamp <user-9678697f1438@xymon.invalid> wrote:

2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data with
same timestamp 1614731059, different data


That usually happens because graphs by default store data once every 5
minutes.  However, if it receives data more often, say every minute, then
it can't store that in the RRD because it can only store one data point
every 5 minutes.  Since it's receiving data more often than once in the 5
minute window the RRD backend triggers that message.

Yes, what Scot said. This could mean you have two different sources of
both client data messages and status messages, as if you have two copies of
the Xymon client running on the host being monitored.

However, duplicate messages would not, in themselves, lead to missing data.
They would only cause the extra data to be dropped; the first data point in
a 5-minute window would still be accepted, leaving no gaps in the data. The
only way I can see this being a symptom of a problem that also causes gaps in
graphs is if the clock on the host is jittering wildly (such as if it was
a VM on a heavily-loaded host server) and causing some sequential messages
to arrive at the Xymon server too close together. This is quite unlikely,
so I'm not sure this is related to the gaps. Instead, you might just have
two problems to solve: duplicate data sources, and as-yet unexplained
gaps in your graphs.

Are you receiving these "duplicate RRD data" messages every 5 minutes, or
only occasionally (such as when you're seeing gaps in your graphs)?

It might be helpful to see one of your graphs with gaps in it.

Also, can you provide maybe 10 sequential log messages with the "duplicate
RRD data" in them? I'd like to get a sense of their regularity and
frequency.
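
One way to get that sense of regularity without reading the log by eye is to count the warnings per day. The sketch below is only an illustration: the regular expression is modelled on the two log lines quoted earlier in this thread, and the function name is made up. For a real check, the lines would come from rrd-data.log or rrd-status.log rather than from the inline sample.

```python
import re
from collections import Counter

# Pattern modelled on the "duplicate RRD data" lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2}) \S+ (?P<rrd>\S+): "
    r"Bug - duplicate RRD data with same timestamp (?P<ts>\d+)"
)

def duplicates_per_day(lines):
    """Count 'duplicate RRD data' warnings per calendar day (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("day")] += 1
    return dict(counts)

# The two lines posted earlier in the thread.
sample = [
    "2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data "
    "with same timestamp 1614731059, different data",
    "2021-03-03 02:55:15.003177 xxx/disk,tmpfs.rrd: Bug - duplicate RRD data "
    "with same timestamp 1614736515, different data",
]

print(duplicates_per_day(sample))
```

A steady duplicate source reporting every 5 minutes would produce up to 288 warnings per day per RRD file; a handful per day points at something occasional.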

One last thing to look at. Are the gaps actual missing data points, or are
they values of zero? The way to tell this is to dump the RRD file's
contents using something like "rrdtool fetch netstat.rrd AVERAGE | tail
-100" (or "less" rather than "tail -100") and look for either zero or low
numbers, or NaN (not a number) entries. [Note that the last few are usually
NaN because they're still waiting for updates, so you can ignore those.]
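
If the dump is long, the same check can be sketched in Python. The function below is a hypothetical helper that takes the text printed by "rrdtool fetch" and separates true gaps (NaN rows) from rows that are all zero, ignoring the trailing NaN rows mentioned above. The sample values and the "ds0" data source name are invented for illustration.

```python
import math

def classify_rows(fetch_output):
    """Split `rrdtool fetch` rows into gaps (all NaN) and zero rows (all 0)."""
    rows = []
    for line in fetch_output.splitlines():
        stamp, sep, rest = line.partition(":")
        if not sep or not stamp.strip().isdigit():
            continue  # header line with data source names, or a blank line
        values = [float(v) for v in rest.split()]
        if values:
            rows.append((int(stamp), values))
    # The newest rows are usually NaN because they still await updates.
    while rows and all(math.isnan(v) for v in rows[-1][1]):
        rows.pop()
    gaps = [ts for ts, vals in rows if all(math.isnan(v) for v in vals)]
    zeros = [ts for ts, vals in rows if all(v == 0.0 for v in vals)]
    return gaps, zeros

# Invented sample in the general shape of `rrdtool fetch` output.
sample = """\
                 ds0

1614730800: 1.2000000000e+01
1614731100: nan
1614731400: 0.0000000000e+00
1614731700: 1.5000000000e+01
1614732000: nan
"""

gaps, zeros = classify_rows(sample)
print("gaps:", gaps)
print("zeros:", zeros)
```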

Cheers
Jeremy

list Carl Melgaard · Mon, 8 Mar 2021 08:13:53 +0000 ·

▸ quoted from Scot Kreienkamp

On Sat, 6 Mar 2021 at 02:04, Scot Kreienkamp <user-9678697f1438@xymon.invalid<mailto:user-9678697f1438@xymon.invalid>> wrote:
2021-03-03 01:24:19.002264 xxx/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data

That usually happens because graphs by default store data once every 5 minutes.  However, if it receives data more often, say every minute, then it can't store that in the RRD because it can only store one data point every 5 minutes.  Since it's receiving data more often than once in the 5 minute window the RRD backend triggers that message.

Are you receiving these "duplicate RRD data" messages every 5 minutes, or only occasionally (such as when you're seeing gaps in your graphs)?
It might be helpful to see one of your graphs with gaps in it.
Also, can you provide maybe 10 sequential log messages with the "duplicate RRD data" in them? I'd like to get a sense of their regularity and frequency.

2021-03-03 01:24:19.002264 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614731059, different data
2021-03-03 02:55:15.002852 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614736515, different data
2021-03-04 10:01:17.004140 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614848477, different data
2021-03-05 14:15:25.007389 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614950125, different data
2021-03-05 14:15:25.007523 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1614950125, different data
2021-03-05 22:56:18.014486 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1614981378, different data
2021-03-05 22:56:18.015006 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1614981378, different data
2021-03-06 12:30:28.002023 x/netstat.rrd: Bug - duplicate RRD data with same timestamp 1615030228, different data
2021-03-06 12:30:28.002952 x/ifstat.eno16780032.rrd: Bug - duplicate RRD data with same timestamp 1615030228, different data

One last thing to look at. Are the gaps actual missing data points, or are they values of zero? The way to tell this is to dump the RRD file's contents using something like "rrdtool fetch netstat.rrd AVERAGE | tail -100" (or "less" rather than "tail -100") and look for either zero or low numbers, or NaN (not a number) entries. [Note that the last few are usually NaN because they're still waiting for updates, so you can ignore those.]

Currently I can't actually find a graph with a gap in it. I just noticed because it happened on the Xymon server itself. On my old setup, it never happened.

Also in xymonclient.log I get these quite a lot, dunno if it's related:

mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory

Regards,

Carl

list Jeremy Laidman · Tue, 9 Mar 2021 09:29:19 +1100 ·

▸ quoted from Carl Melgaard

On Mon, 8 Mar 2021 at 19:21, Carl Melgaard <user-cdea55422fa4@xymon.invalid> wrote:

Are you receiving these "duplicate RRD data" messages every 5 minutes,
or only occasionally (such as when you're seeing gaps in your graphs)?

It might be helpful to see one of your graphs with gaps in it.

Also, can you provide maybe 10 sequential log messages with the
"duplicate RRD data" in them? I'd like to get a sense of their regularity
and frequency.


2021-03-03 01:24:19.002264 x/netstat.rrd: Bug - duplicate RRD data with
same timestamp 1614731059, different data

2021-03-03 02:55:15.002852 x/netstat.rrd: Bug - duplicate RRD data with
same timestamp 1614736515, different data

2021-03-04 10:01:17.004140 x/netstat.rrd: Bug - duplicate RRD data with
same timestamp 1614848477, different data

2021-03-05 14:15:25.007389 x/netstat.rrd: Bug - duplicate RRD data with
same timestamp 1614950125, different data

2021-03-05 14:15:25.007523 x/ifstat.eno16780032.rrd: Bug - duplicate RRD
data with same timestamp 1614950125, different data

2021-03-05 22:56:18.014486 x/netstat.rrd: Bug - duplicate RRD data with
same timestamp 1614981378, different data

2021-03-05 22:56:18.015006 x/ifstat.eno16780032.rrd: Bug - duplicate RRD
data with same timestamp 1614981378, different data

2021-03-06 12:30:28.002023 x/netstat.rrd: Bug - duplicate RRD data with
same timestamp 1615030228, different data

2021-03-06 12:30:28.002952 x/ifstat.eno16780032.rrd: Bug - duplicate RRD
data with same timestamp 1615030228, different data

Interesting. It seems to be a rare occurrence - no more than two duplicate
data points in a day - almost too few to notice. Are your gaps more than 5
minutes (one sample) long? It might be helpful for you to include an
example gappy graph for us to see.

Some of these errors relate to netstat and others to ifstat processing.
Both parsers receive data from the same client data message. Interestingly,
only some of the errors for netstat.rrd coincide with ones for ifstat.rrd.
The matching timestamps mean this is unlikely to be a coincidence, but I'm
not sure what to make of it TBH.

▸ quoted from Carl Melgaard

One last thing to look at. Are the gaps actual missing data points, or are
they values of zero? The way to tell this is to dump the RRD file's
contents using something like "rrdtool fetch netstat.rrd AVERAGE | tail
-100" (or "less" rather than "tail -100") and look for either zero or low
numbers, or NaN (not a number) entries. [Note that the last few are
usually NaN because they're still waiting for updates, so you can ignore
those.]


Currently I can't actually find a graph with a gap in it. I just noticed
because it happened on the Xymon server itself. On my old setup, it never
happened.

OK. I think your best bet to diagnose is going to be correlating log
messages or other events to the gaps.

You mentioned an "old setup". Can you describe what has changed from old to
new setup? Have you upgraded hardware/OS/Xymon server/Xymon client(s)?

You said that you noticed on the Xymon server itself. Has it only happened
to graphs for the Xymon server? I'm wondering if you have the Xymon client
AND the Xymon server both running on the same host?

▸ quoted from Carl Melgaard

Also in xymonclient.log I get these quite a lot, dunno if it's related:


mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

Can you explain "quite a lot"? Can you give an indication of how often these
occur?

This might very well be related. The logfetch and vmstat files are created
during the construction of the client data message. It's likely that some,
if not all, of the client data message will be missing, when these logs
show up.

I'd be trying to correlate these log messages with the times that you get
gaps in your graphs. If they match, then it looks to be a problem with the
Xymon client.

list Carl Melgaard · Tue, 9 Mar 2021 07:39:46 +0000 ·

▸ quoted from Carl Melgaard

On Mon, 8 Mar 2021 at 19:21, Carl Melgaard <user-cdea55422fa4@xymon.invalid<mailto:user-cdea55422fa4@xymon.invalid>> wrote:

One last thing to look at. Are the gaps actual missing data points, or are they values of zero? The way to tell this is to dump the RRD file's contents using something like "rrdtool fetch netstat.rrd AVERAGE | tail -100" (or "less" rather than "tail -100") and look for either zero or low numbers, or NaN (not a number) entries. [Note that the last few are usually NaN because they're still waiting for updates, so you can ignore those.]

Currently I can't actually find a graph with a gap in it. I just noticed because it happened on the Xymon server itself. On my old setup, it never happened.

OK. I think your best bet to diagnose is going to be correlating log messages or other events to the gaps.
You mentioned an "old setup". Can you describe what has changed from old to new setup? Have you upgraded hardware/OS/Xymon server/Xymon client(s)?

I changed OS, CentOS 5.11 -> RH 7.9 and Xymon from 4.3.7 to 4.3.30, and changed from self-compiled to Terabithia packages. So quite a big jump.
Yes, client and server both run on the same host, as they did on the old system. I want the Xymon server itself monitored. I have 2 Xymon servers, 1 primary and 1 secondary. The primary distributes to the secondary. Only the secondary has been updated so far.

You said that you noticed on the Xymon server itself. Has it only happened to graphs for the Xymon server? I'm wondering if you have the Xymon client AND the Xymon server both running on the same host?

After I noticed it on the Xymon server itself, I went looking for gaps elsewhere, and I found some on other servers as well.

▸ quoted from Jeremy Laidman


Also in xymonclient.log I get these quite a lot, dunno if it's related:

mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory

Can you explain "quite a lot"? Can you give an indication of how often these occur?

623 lines in the logfile yesterday.

Regards,

Carl

list Jeremy Laidman · Wed, 10 Mar 2021 09:53:43 +1100 ·

▸ quoted from Carl Melgaard

On Tue, 9 Mar 2021 at 18:47, Carl Melgaard <user-cdea55422fa4@xymon.invalid> wrote:

You mentioned an "old setup". Can you describe what has changed from old
to new setup? Have you upgraded hardware/OS/Xymon server/Xymon client(s)?


I changed OS, CentOS 5.11 -> RH 7.9 and Xymon from 4.3.7 to 4.3.30, and
changed from self-compiled to Terabithia packages. So quite a big jump.

Yes, client and server both run on the same host, as they did on the old
system. I want the Xymon server itself monitored.

Yep, that makes sense. My curiosity around this is the possibility that the
Xymon server is running the client scripts from its clientlaunch process,
and also a second copy of clientlaunch is running the same scripts - in
essence, a "server" instance of the client scripts, as well as a "client"
instance of the client scripts. If this is happening, you'll get two data
messages every 5 minutes instead of one.

Again, I don't think this would cause graph gaps, but it might be causing
some of your warning logs.

Interestingly, Terabithia packages for Xymon up to v4.3.18 included both
client and server components in the one "xymon" package, as well as in the
"xymon-client" package. You would only install "xymon" or "xymon-client"
but not both (or you might get duplicate clients running). However, from
v4.3.18, the client files in the xymon package were removed, requiring both
"xymon" and "xymon-client" to be installed on the Xymon server (if you
wanted the server to monitor itself). You appear to have both packages
installed on your Xymon server.

▸ quoted from Carl Melgaard


I have 2 Xymon servers, 1 primary and 1 secondary. The primary distributes
to the secondary. Only the secondary is updated as of yet.

It makes sense to have two for redundancy. Have you thought about
configuring both Xymon servers in each client? That way, if the primary
goes down, the secondary will still receive updates. (This has nothing to
do with diagnosing the gaps in your graphs, I'm just curious.)

▸ quoted from Carl Melgaard

You said that you noticed on the Xymon server itself. Has it only happened
to graphs for the Xymon server? I'm wondering if you have the Xymon
client AND the Xymon server both running on the same host?


After I noticed it on the Xymon server itself, I went looking for gaps
elsewhere, and I found some on other servers as well.

Right, so the gaps aren't likely to be caused by client and server running
together, if it's also happening for other servers not running the Xymon
server. But this might be the cause of your RRD warnings.

▸ quoted from Carl Melgaard


Also in xymonclient.log I get these quite a lot, dunno if it's related:


mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

cat: /dev/shm/xymon_vmstat.x: No such file or directory

Do you only see these on the Xymon server, or are these log messages also
showing on Xymon clients? And if so, at what frequency?

▸ quoted from Carl Melgaard

Can you explain "quite a lot"? Can you give an indication of how often
these occur?


623 lines in the logfile yesterday.

That's roughly 2 every 5-minute interval. That's significant.
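
The arithmetic behind that estimate, written out as a small sketch (the only inputs are the 623-line figure above and the 5-minute interval discussed throughout this thread):

```python
SECONDS_PER_DAY = 24 * 60 * 60
INTERVAL_SECONDS = 5 * 60        # the 5-minute interval discussed in this thread
LOG_LINES_YESTERDAY = 623        # figure reported above

# Number of 5-minute intervals in one day.
intervals_per_day = SECONDS_PER_DAY // INTERVAL_SECONDS
# Average number of error lines logged per interval.
lines_per_interval = LOG_LINES_YESTERDAY / intervals_per_day

print(intervals_per_day)
print(round(lines_per_interval, 2))
```

With 288 intervals in a day, 623 lines works out to a little over 2 per interval, so the errors are showing up in essentially every cycle rather than occasionally.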

Your symptoms (xymonclient.log messages, RRD warnings) are consistent with
two different instances of the Xymon client script running at the same
time. When this happens, each instance tries to create and populate
xymon_vmstat.<servername> (from a vmstat command) and include its contents
in the client status message before removing the file. Usually the file
only exists for a brief moment. If two instances of the client are running,
it's unlikely that both would create the file, and then try to use it, at
the same time. But if it did happen, the one instance would likely show the
"No such file or directory" message, because the other instance had removed
the file. A classic "race condition".

Similarly, the Xymon client script creates the
logfetch.<servername>.cfg.tmp file, then renames it to
logfetch.<servername>.cfg. If a second instance tries to rename the file
after the first instance has already done so, then you'll see the "No such
file or directory".
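
The rename case can be reproduced in miniature. This is only a sketch of the failure mode, not the client script itself: it runs the two "instances" one after the other in a fixed order instead of concurrently, and the file names and the publish_config helper are hypothetical.

```python
import os
import tempfile

def publish_config(tmp_path, final_path):
    """Rename the temp file into place, like the mv step described above."""
    try:
        os.rename(tmp_path, final_path)
        return "renamed"
    except FileNotFoundError:
        # The same condition that makes mv print "No such file or directory".
        return "No such file or directory"

with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical names; both instances share them because no PID is included.
    tmp_path = os.path.join(workdir, "logfetch.example.cfg.tmp")
    final_path = os.path.join(workdir, "logfetch.example.cfg")

    # Both instances write the same temp file before either renames it.
    for instance in ("A", "B"):
        with open(tmp_path, "w") as handle:
            handle.write(f"written by instance {instance}\n")

    # Instance A renames first; instance B then finds the temp file gone.
    results = [publish_config(tmp_path, final_path) for _ in ("A", "B")]

print(results)
```

The second rename fails because the first one already consumed the shared temp file, which matches the "cannot stat" line in xymonclient.log.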

Can you show me the output of the following commands. I'm running this on
one of my Xymon servers (using Terabithia RPMs) to show what you might
expect:

$ pgrep -lf xymonlaunch
16602 /usr/lib/xymon/server/bin/xymonlaunch
--config=/usr/lib/xymon/server/etc/tasks.cfg
--env=/usr/lib/xymon/server/etc/xymonserver.cfg
--log=/var/log/xymon/xymonlaunch.log
--pidfile=/var/log/xymon/xymonlaunch.pid

$ pgrep -lf vmstat
8304 sh -c vmstat 300 2
1>/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 2>&1; mv
/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252
/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>
8306 vmstat 300 2

Cheers
Jeremy

list Carl Melgaard · Wed, 10 Mar 2021 08:01:53 +0000 ·

▸ quoted from Carl Melgaard

On Tue, 9 Mar 2021 at 18:47, Carl Melgaard <user-cdea55422fa4@xymon.invalid<mailto:user-cdea55422fa4@xymon.invalid>> wrote:

I have 2 Xymon servers, 1 primary and 1 secondary. The primary distributes to the secondary. Only the secondary is updated as of yet.
It makes sense to have two for redundancy. Have you thought about configuring both Xymon servers in each client? That way, if the primary goes down, the secondary will still receive updates. (This has nothing to do with diagnosing the gaps in your graphs, I'm just curious.)


Yeah, I already do this on all my xymon clients.

▸ quoted from Jeremy Laidman


Also in xymonclient.log I get these quite a lot, dunno if it's related:

mv: cannot stat '/dev/shm/logfetch.x.cfg.tmp': No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory
cat: /dev/shm/xymon_vmstat.x: No such file or directory

Do you only see these on the Xymon server, or are these log messages also showing on Xymon clients? And if so, at what frequency?

I don't see them anywhere other than on the updated Xymon server.

▸ quoted from Jeremy Laidman


623 lines in the logfile yesterday.

That's roughly 2 every 5-minute interval. That's significant.
Can you show me the output of the following commands. I'm running this on one of my Xymon servers (using Terabithia RPMs) to show what you might expect:
$ pgrep -lf xymonlaunch
16602 /usr/lib/xymon/server/bin/xymonlaunch --config=/usr/lib/xymon/server/etc/tasks.cfg --env=/usr/lib/xymon/server/etc/xymonserver.cfg --log=/var/log/xymon/xymonlaunch.log --pidfile=/var/log/xymon/xymonlaunch.pid

$ ps -ef | grep xymonlaunch
xymon     1084     1  0 Jan19 ?        00:01:24 /usr/sbin/xymonlaunch --no-daemon --log=/var/log/xymon/xymonlaunch.log

▸ quoted from Jeremy Laidman

$ pgrep -lf vmstat
8304 sh -c vmstat 300 2 1>/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 2>&1; mv /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 /usr/lib/xymon/client/tmp/xymon_vmstat.<servername>
8306 vmstat 300 2

$ ps -ef | grep vmstat

xymon    14896 14893  0 08:50 ?        00:00:00 vmstat 300 2
xymon    14904 14898  0 08:50 ?        00:00:00 vmstat 300 2

I noticed these 2 running, and couldn't figure out how both were spawned. Maybe I should "DISABLED" the client-part in clientlaunch.cfg - I see now that there's actually a xymonclient-part in tasks.cfg... There we have the 2 instances!
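
A quick way to repeat this check later is to count the real vmstat collectors in "ps -ef" output. The sketch below is illustrative only: the helper name is made up, it assumes the standard eight-column "ps -ef" layout, and it uses the two vmstat lines above as sample input.

```python
def count_vmstat_collectors(ps_lines, command="vmstat 300 2"):
    """Count processes whose full command line is exactly `command`."""
    count = 0
    for line in ps_lines:
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[7].strip() == command:
            count += 1
    return count

# The two lines from the ps output above.
sample = [
    "xymon    14896 14893  0 08:50 ?        00:00:00 vmstat 300 2",
    "xymon    14904 14898  0 08:50 ?        00:00:00 vmstat 300 2",
]

print(count_vmstat_collectors(sample))
```

Matching on the exact command line leaves out the grep process itself and any "sh -c" wrapper, so a result of 2 means two separate collectors are running.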

I guess I'll try that :) Thanks for pointing me right at the answer! Now I just have to figure out why the new server is eating up 10 times more RAM than the old server, with the same number of hosts monitored.

Regards,

Carl

list Jeremy Laidman · Wed, 10 Mar 2021 21:02:35 +1100 ·

▸ quoted from Carl Melgaard

On Wed, 10 Mar 2021 at 19:02, Carl Melgaard <user-cdea55422fa4@xymon.invalid> wrote:

$ pgrep -lf vmstat

8304 sh -c vmstat 300 2

1>/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252 2>&1; mv
/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>.8252
/usr/lib/xymon/client/tmp/xymon_vmstat.<servername>

8306 vmstat 300 2


$ ps -ef | grep vmstat


xymon    14896 14893  0 08:50 ?        00:00:00 vmstat 300 2

xymon    14904 14898  0 08:50 ?        00:00:00 vmstat 300 2


I noticed these 2 running, and couldn't figure out how both were spawned.
Maybe I should "DISABLED" the client-part in clientlaunch.cfg - I see now
that there's actually a xymonclient-part in tasks.cfg... There we have the 2
instances!

Yes, that'd be it. Disable one of those.

The clientlaunch.cfg file comments say:

# Note: On the Xymon *server* itself, this file is normally
#       NOT used. Instead, both the client- and server-tasks
#       are controlled by the tasks.cfg file.

On a client, the clientlaunch.cfg file is loaded by
/usr/lib/xymon/*client*/bin/xymonlaunch
(note the "client" rather than the "server" in the path). The client
instance has "--config=/usr/lib/xymon/client/etc/clientlaunch.cfg" as a
parameter, to use the contents of that file. It's not usual for this
instance of xymonlaunch to run on a Xymon server.

Your symptoms suggest that you have a client instance "xymonlaunch
--config=...client/etc/clientlaunch.cfg" as well as the server instance.
However your "ps -ef|grep xymonlaunch" only shows one. So I'm puzzled how
the clientlaunch.cfg file is being processed.

▸ quoted from Carl Melgaard

I guess I'll try that :) Thanks for pointing me right at the answer! Now I
just have to figure out, why the new server is eating up 10 times more RAM
than the old server, with the same amount of hosts monitored.

I note that you've moved quite a few OS iterations from CentOS 5 to RHEL 7.
The kernel memory management is likely to be a bit different. You might
find that the extra RAM usage is simply taken up by kernel buffers and
cache, so isn't really "in use" in the traditional sense.
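
One way to check this on Linux is to split the figures in /proc/meminfo into application memory and buffers/cache, which is roughly the split the free command reports. The sketch below is a hypothetical helper working on the text of that file; the sample numbers are invented, and on a real host the input would be the contents of /proc/meminfo.

```python
def memory_breakdown_mib(meminfo_text):
    """Split total RAM into application use, buffers/cache and free (MiB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # /proc/meminfo values are in kB
    total = fields["MemTotal"]
    free = fields["MemFree"]
    cache = fields["Buffers"] + fields["Cached"]
    return {
        "total": total // 1024,
        "free": free // 1024,
        "buffers_cache": cache // 1024,
        "apps": (total - free - cache) // 1024,
    }

# Invented sample; on a real host use open("/proc/meminfo").read() instead.
sample = """\
MemTotal:        8388608 kB
MemFree:          524288 kB
Buffers:          262144 kB
Cached:          5242880 kB
"""

breakdown = memory_breakdown_mib(sample)
print(breakdown)
```

If most of the "used" memory lands in buffers_cache rather than apps, the kernel is holding it as reclaimable cache and the Xymon processes themselves are not what grew.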

Cheers
Jeremy

list Carl Melgaard · Wed, 10 Mar 2021 12:53:05 +0000 ·

▸ quoted from Carl Melgaard

On Wed, 10 Mar 2021 at 19:02, Carl Melgaard <user-cdea55422fa4@xymon.invalid<mailto:user-cdea55422fa4@xymon.invalid>> wrote:
I noticed these 2 running, and couldn't figure out how both were spawned. Maybe I should "DISABLED" the client-part in clientlaunch.cfg - I see now that there's actually a xymonclient-part in tasks.cfg... There we have the 2 instances!

Yes, that'd be it. Disable one of those.
The clientlaunch.cfg file comments say:
# Note: On the Xymon *server* itself, this file is normally
#       NOT used. Instead, both the client- and server-tasks
#       are controlled by the tasks.cfg file.

Yup, I disabled it in clientlaunch and it works.

Your symptoms suggest that you have a client instance "xymonlaunch --config=...client/etc/clientlaunch.cfg" as well as the server instance. However your "ps -ef|grep xymonlaunch" only shows one. So I'm puzzled how the clientlaunch.cfg file is being processed.

I think it runs from a service, as xymon-client is a separate package.

▸ quoted from Jeremy Laidman


I guess I'll try that :) Thanks for pointing me right at the answer! Now I just have to figure out, why the new server is eating up 10 times more RAM than the old server, with the same amount of hosts monitored.

I note that you've moved quite a few OS iterations from CentOS 5 to RHEL 7. The kernel memory management is likely to be a bit different. You might find that the extra RAM usage is simply taken up by kernel buffers and cache, so isn't really "in use" in the traditional sense.

Is there any way to verify that this is indeed the case?

Regards,

Carl

list Jeremy Laidman · Thu, 11 Mar 2021 09:48:40 +1100 ·

▸ quoted from Carl Melgaard

On Wed, 10 Mar 2021 at 23:53, Carl Melgaard <user-cdea55422fa4@xymon.invalid> wrote:

Your symptoms suggest that you have a client instance "xymonlaunch
--config=...client/etc/clientlaunch.cfg" as well as the server instance.
However your "ps -ef|grep xymonlaunch" only shows one. So I'm puzzled how
the clientlaunch.cfg file is being processed.


I think it runs from a service, as xymon-client is a separate package.

Hmm, the Terabithia package postinstall script (from output of `rpm -q
--scripts xymon-client`) shows:

# This is a hack, but we don't want to double-bounce xymonlaunch,
# so let the server package handle it if both are installed...
if [ ! -e "/etc/xymon/xymonserver.cfg" ] ; then

# add unit file or init script; restart if already running

if [ $1 -eq 1 ] ; then
        # Initial installation
        systemctl preset xymonlaunch.service >/dev/null 2>&1 || :
fi

fi


This tells me that the client package first checks for the server being
installed (by checking the existence of /etc/xymon/xymonserver.cfg) and if
not, it creates its own service, otherwise it assumes the server package
will do the needful. However, this would probably not work if you installed
the client package before the server package, and possibly cause both
services to be installed. Either way, you should be able to rectify the
situation with the appropriate "systemctl" command.
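
To see which units are actually installed before picking the systemctl command, something like the following may help. It is only a sketch: `xymon_units` is a hypothetical helper name, and the unit names on your system (for example `xymonlaunch.service`, taken from the postinstall script above) may differ by package.

```shell
# xymon_units: read `systemctl list-unit-files` output on stdin and print
# "UNIT STATE" for every service whose name starts with "xymon".
# (Hypothetical helper for illustration.)
xymon_units() {
    awk '$1 ~ /^xymon.*\.service$/ { print $1, $2 }'
}

# Typical use (not run here):
#   systemctl list-unit-files --type=service --no-legend | xymon_units
#   systemctl stop <the unit you do not want>
#   systemctl disable <the unit you do not want>
```

If two xymon units show up as enabled, stopping and disabling the one that duplicates the server-side client task should leave a single xymonlaunch running.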

▸ quoted from Carl Melgaard


I guess I'll try that :) Thanks for pointing me right at the answer! Now I
just have to figure out, why the new server is eating up 10 times more RAM
than the old server, with the same amount of hosts monitored.

I note that you've moved quite a few OS iterations from CentOS 5 to RHEL
7. The kernel memory management is likely to be a bit different. You
might find that the extra RAM usage is simply taken up by kernel buffers
and cache, so isn't really "in use" in the traditional sense.


Is there any way to verify that this is indeed the case?

Memory management and monitoring is something I don't know much about - I
know just enough to know how little I know.

The "real" memory usage graph is seemingly a good indication of actual RAM
utilisation because it excludes buffers/caches. If you're concerned about
Xymon using too much RAM, to the point where it could affect your server's
performance or stability, then I'd recommend opening a new thread to
discuss it.
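
One rough way to check on the host itself is to split the kernel's own numbers into "buffers and cache" and "everything else". This is a minimal sketch, assuming a Linux /proc/meminfo with the MemTotal, MemFree, Buffers and Cached fields (values in kB); `mem_breakdown` is a made-up helper name, and the simple subtraction ignores finer points such as shared memory and reclaimable slab.

```shell
# mem_breakdown [FILE]
# Print total, buffers+cache, and used-excluding-buffers/cache in MiB,
# reading /proc/meminfo-style input (defaults to /proc/meminfo).
mem_breakdown() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END {
            bc = buffers + cached
            printf "total_mib=%d buffcache_mib=%d used_excl_mib=%d\n",
                   total / 1024, bc / 1024, (total - free - bc) / 1024
        }
    ' "${1:-/proc/meminfo}"
}
```

If `buffcache_mib` accounts for most of the difference between the old and new server, the extra usage is probably just cache. Running `free -m` gives a similar summary.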

Cheers
Jeremy

list Carl Melgaard · Thu, 11 Mar 2021 07:43:57 +0000 ·

▸ quoted from Jeremy Laidman

I note that you've moved quite a few OS iterations from CentOS 5 to RHEL 7. The kernel memory management is likely to be a bit different. You might find that the extra RAM usage is simply taken up by kernel buffers and cache, so isn't really "in use" in the traditional sense.

 Is there any way to verify that this is indeed the case?

Memory management and monitoring is something I don't know much about - I know just enough to know how little I know.
The "real" memory usage graph is seemingly a good indication of actual RAM utilisation because it excludes buffers/caches. If you're concerned about Xymon using too much RAM, to the point where it could affect your server's performance or stability, then I'd recommend opening a new thread to discuss it.

It is indeed indicated in "real" memory. I already did open a new (older) thread, which didn't get much attention, maybe 2 replies. The mailing list is almost dead, so I know it's a long shot to get help.

Regards,

Carl

list Jeremy Laidman · Thu, 11 Mar 2021 21:52:02 +1100 ·

▸ quoted from Carl Melgaard

On Thu, 11 Mar 2021 at 18:44, Carl Melgaard <user-cdea55422fa4@xymon.invalid> wrote:

I note that you've moved quite a few OS iterations from CentOS 5 to RHEL
7. The kernel memory management is likely to be a bit different. You
might find that the extra RAM usage is simply taken up by kernel buffers
and cache, so isn't really "in use" in the traditional sense.

 Is there any way to verify that this is indeed the case?

Memory management and monitoring is something I don't know much about -
I know just enough to know how little I know.

The "real" memory usage graph is seemingly a good indication of actual
RAM utilisation because it excludes buffers/caches. If you're concerned
about Xymon using too much RAM, to the point where it could affect your
server's performance or stability, then I'd recommend opening a new
thread to discuss it.


It is indeed indicated in "real" memory. I already did open a new (older)
thread, which didn't get much attention, maybe 2 replies. The mailing list
is almost dead, so I know it's a long shot to get help.

The memory concern is probably a general Linux question rather than a
Xymon-specific one. You might find more responses from a Linux forum.

J
