Xymon Mailing List Archive

Graph lines for disk and memory showing gaps

6 messages in this thread

Stephen Barrie · Wed, 9 Aug 2017 12:13:06 +0000
Hi

We have a few Xymon servers running on Ubuntu 14.04 and 16.04 that show gaps in the graphs for disk and memory metrics. This affects only the Linux and Unix clients reporting to these servers; the graph lines for Windows clients look OK.

If the same Unix and Linux clients are pointed at another server running Red Hat, we do not see these gaps. Upgrading the client version does not resolve this, so the problem seems to lie in how the graph data is created on the Ubuntu Xymon servers. We are using the packaged version 4.3.25. Is there a known issue with this?

Regards

Stephen Barrie
This message is confidential and may contain privileged information. You should not disclose its contents to any other person. If you are not the intended recipient, please notify the sender named above immediately. It is expressly declared that this e-mail does not constitute nor form part of a contract or unilateral obligation. Opinions, conclusions and other information in this message that do not relate to the official business of brightsolid shall be understood as neither given nor endorsed by it.


This email has been checked for virus and other malicious content prior to leaving our network.
Jeremy Laidman · Wed, 9 Aug 2017 23:50:30 +1000
Can you show an example/screencap of a graph with the gaps? Do all servers
and/or disks have gaps at the same time? How big are the gaps: one or two
5-minute samples, or much larger?

The Windows disk usage message is parsed by different code to that which
parses the UNIX and Linux "df" output. That might have something to do with
it.

Is 4.3.25 in use on all Xymon servers?

Check the Xymon logs for error messages. Perhaps the RRD parser is crashing.

It's possible that the "df" output is sometimes not parseable by the RRD
parser, and so the data is ignored. Can you show the [df] section from the
client data during one of the gaps, and at another time when there are no
gaps?
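To make that comparison easier, here is a minimal shell sketch that pulls one section out of a saved client message. It assumes the raw client data has already been saved to a file (for example with `xymon localhost "clientlog hostname.example.com"`, if your Xymon version supports the clientlog command) and that each section starts with a header line such as `[df]`. The function name extract_section is hypothetical.

```shell
# extract_section FILE NAME (hypothetical helper): print the body of the
# "[NAME]" section of a saved Xymon client message, assuming every section
# begins with a header line of the form "[name]".
extract_section() {
    awk -v want="[$2]" '
        /^\[.*\]$/ { insec = ($0 == want); next }   # header line: entering the wanted section?
        insec      { print }                        # body line of the wanted section
    ' "$1"
}
```

Usage: `extract_section client-during-gap.txt df > df-gap.txt`, then `diff` the result against a copy saved while the graph was healthy.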

J

Stephen Barrie · Wed, 9 Aug 2017 15:41:25 +0000
Hello

Attached are sample memory and disk graphs. You will see that the gaps in one line do not coincide with the gaps in others. The gaps vary in length and there is no clear pattern: some last only minutes, others more than an hour. The df output from the client still reaches the server every few minutes and never seems to miss any data. It is difficult to capture the df output that coincides with a gap, as we only notice that a graph is missing data after the fact, but we can see that the gaps do not occur at the same times for all machines or all disks.

The server where we are not experiencing the problem is running 4.3.17 on Red Hat Linux. The others are 4.3.25 on Ubuntu 16.04 and 4.3.7 on Ubuntu 14.04. There are no RRD or other related errors in the logs.

A problem with the RRD parser seems a logical explanation. Is there a way of tracking what this is doing?

Stephen
Jeremy Laidman · Thu, 10 Aug 2017 11:33:24 +1000
Barrie

This is quite interesting. On the one hand, most of the gaps in the lines
don't line up with gaps in other lines. On the other hand, on the graph
with four lines, the red and pink lines appear to have gaps in the same
place 100% of the time. And the green line only appears when the blue line
appears. It's as if there are two concurrent problems that are masking each
other's symptoms.

Yes, you can run xymond_rrd with the "--debug" switch, to enable more
output in the log file.

Rather than re-configuring the parameters for xymond_rrd in tasks.cfg and
restarting, you can run a second copy of it, which runs independently of
the main one, creating and updating its own copy of the RRD files, and
logging to STDOUT for you to peruse in real time.

Typically there are two xymond_rrd processes running, one for status
messages and the other for data messages. The one we're interested in is
the one for the status messages, because the data are parsed from the
"disk" message body. You can run the following (as the xymon user) to see
what's going on with this:

$ sudo -u xymon mkdir -p /tmp/myrrd
$ sudo -u xymon xymoncmd xymond_channel --channel=status \
 --filter='^@@status.*/hostname.example.com.*\|disk\|' xymond_rrd \
 --rrddir=/tmp/myrrd --debug
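The dots in hostname.example.com in that filter are regex wildcards, so they also match any other character. A small, hypothetical helper can build the same filter with the dots escaped; it only formats the string and assumes the rest of the pattern stays exactly as given.

```shell
# host_filter HOST (hypothetical helper): print the --filter expression for
# HOST's "disk" status messages, with the dots in HOST escaped so they match
# only a literal ".".
host_filter() {
    printf '^@@status.*/%s.*\\|disk\\|\n' "$(printf '%s' "$1" | sed 's/\./\\./g')"
}
```

Pass it as `--filter="$(host_filter hostname.example.com)"` in the xymond_channel command above.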

This second copy of xymond_rrd will create RRD files in /tmp/myrrd for the
named host and for only the "disk" data. You can then use rrdtool to view
the data and look for signs of gaps. Note that xymond_rrd caches updates to
the RRD files, so the files might not be updated straight away; either wait
30 minutes before reviewing them, or add the --no-cache switch. Also note
that RRD needs more than one sample before it starts trusting the data, so
wait for at least two samples before relying on the contents of the RRD
file.
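One way to look for those gaps is to fetch recent values with `rrdtool fetch` and count the rows whose values are unknown (printed as `nan` or `-nan`). The sketch below assumes the usual fetch output layout: a header line of data-source names, a blank line, then one `timestamp: value value ...` row per sample. The helper name and the RRD path in the usage line are hypothetical.

```shell
# count_nan_rows (hypothetical helper): read "rrdtool fetch" output on stdin
# and print how many sample rows contain an unknown (NaN) value.
count_nan_rows() {
    awk -F': ' '
        NF == 2 && $1 ~ /^[0-9]+$/ && tolower($2) ~ /nan/ { n++ }   # data row with a NaN
        END { print n + 0 }
    '
}
```

Usage: `rrdtool fetch /tmp/myrrd/hostname.example.com/disk,root.rrd AVERAGE --start -6h | count_nan_rows`. A non-zero count for a period when the client was reporting points at the server side.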

If that doesn't show any hints, then a more systematic approach may be
warranted. There are several stages between the Xymon client creating the
client data message and the RRD file being created, where messages could go
missing. Here's a rough summary:

xymon-client.sh => df
  => xymon [Client message]
  => xymond
  => xymond_client -> [status message]
  => xymond
  => xymond_channel --status
  => xymond_rrd
  => rrd

What I'd be doing is checking each phase of this to determine where the
messages are being dropped/corrupted. In most cases, you can impersonate
the existing process and review the results to infer what's happening in
the actual process.
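Whichever stage you impersonate, the question is the same: are there intervals with no message for this host? Assuming you can reduce a stage's output to one Unix timestamp per line (how to extract them depends on the stage and is left open here), a hypothetical helper like this flags intervals longer than two 5-minute samples.

```shell
# find_gaps MAX (hypothetical helper): read ascending Unix timestamps, one per
# line, on stdin and print "previous next seconds" for every interval longer
# than MAX seconds.
find_gaps() {
    awk -v max="$1" '
        NR > 1 && $1 - prev > max { print prev, $1, $1 - prev }
        { prev = $1 }
    '
}
```

Usage: `... | find_gaps 600`. Comparing the gaps found at each stage narrows down where the messages are lost.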

J
Stephen Barrie · Thu, 10 Aug 2017 11:24:46 +0000
Hi

I have tried running xymond_rrd with the debug option and found some errors.

I took a sample host where the df output looks like this:
Filesystem                  1024-blocks     Used Available Capacity Mounted on
udev                            1999540        0   1999540       0% /dev
tmpfs                            403788    41084    362704      11% /run
/dev/mapper/rootvg-lv--root    57456900 16628168  37886988      31% /
tmpfs                           2018924        0   2018924       0% /dev/shm
tmpfs                              5120        0      5120       0% /run/lock
tmpfs                           2018924        0   2018924       0% /sys/fs/cgroup
/dev/sda1                        482922   249320    208668      55% /boot
tmpfs                            403788        0    403788       0% /run/user/1000

I get errors like the ones below:
31712 2017-08-10 10:56:57.362148 server/disk,root.rrd: Ignored duplicate (and identical) update timestamped 1502359017
31712 2017-08-10 10:56:57.362154 server/disk,run,lock.rrd: Ignored duplicate (and identical) update timestamped 1502359017
31712 2017-08-10 10:56:57.362158 server/disk,sys,fs,cgroup.rrd: Ignored duplicate (and identical) update timestamped 1502359017
31712 2017-08-10 10:56:57.362167 server/disk,run,user,1000.rrd: Ignored duplicate (and identical) update timestamped 1502359017
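The RRD file names in these log lines appear to follow the mount points in the df output: "/" becomes disk,root.rrd and the remaining slashes become commas (/run/lock becomes disk,run,lock.rrd). The sketch below reproduces that mapping as inferred from this thread, not from the Xymon source, so treat it as an assumption when matching df lines to RRD files.

```shell
# mount_to_rrd MOUNTPOINT (hypothetical helper): print the disk RRD file name
# that the naming seen in this thread suggests for MOUNTPOINT.
mount_to_rrd() {
    if [ "$1" = "/" ]; then
        echo "disk,root.rrd"                                        # root file system is special-cased
    else
        printf 'disk%s.rrd\n' "$(printf '%s' "$1" | tr '/' ',')"    # every "/" becomes ","
    fi
}
```

Usage: `df -P | awk 'NR > 1 { print $6 }' | while read -r m; do mount_to_rrd "$m"; done` lists the expected file for each mount point.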

The modification times of the RRD files named in these errors do not update when the errors appear. In the log excerpt above there is no error for disk,boot.rrd, which relates to the /boot file system, and that file's timestamp updated at 10:56, while those of the files named in the error messages did not. Each time xymond_rrd runs, the errors relate to a slightly different set of files.
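To see how the set of affected files shifts from run to run, the debug output can be tallied per RRD file. A minimal sketch, assuming the log line layout shown above (PID, date, time, then the RRD file name followed by a colon); summarize_dupes is a hypothetical name.

```shell
# summarize_dupes (hypothetical helper): read xymond_rrd debug output on stdin
# and print, per RRD file, how many "Ignored duplicate" messages were logged,
# most frequent first.
summarize_dupes() {
    awk '/Ignored duplicate/ { f = $4; sub(/:$/, "", f); n[f]++ }
         END { for (f in n) print n[f], f }' | sort -rn
}
```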

I also noticed the rrd_status.log file has messages like “server/memory.real.rrd: Bug - duplicate RRD data with same timestamp 1502336737, different data”.
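The epoch timestamps in these messages are easier to line up with the graph gaps once converted to wall-clock time. The sketch below assumes GNU date (for `-d @SECONDS`) and the message wording quoted above; the helper name is hypothetical.

```shell
# dup_timestamps (hypothetical helper): read rrd status log lines on stdin and,
# for each "Bug - duplicate RRD data" message, print the RRD file and the
# timestamp converted to UTC (requires GNU date).
dup_timestamps() {
    awk '/Bug - duplicate RRD data/ {
            file = ""; ts = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /\.rrd:$/) file = substr($i, 1, length($i) - 1)          # drop trailing ":"
                if ($i == "timestamp" && i < NF) { ts = $(i + 1); sub(/,$/, "", ts) }
            }
            if (file != "" && ts != "") print file, ts
         }' |
    while read -r file ts; do
        printf '%s %s\n' "$file" "$(date -u -d "@$ts" '+%F %T')"
    done
}
```

For reference, 1502359017 from the debug log above is 2017-08-10 09:56:57 UTC, i.e. 10:56:57 BST, which matches the local time printed on those log lines.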

We only see these messages in relation to servers that have missing points in the graphs. This behaviour is similar to this problem http://lists.xymon.com/archive/2016-April/043387.html

I checked what is suggested there, but rrdstatus and rrddata are running on separate channels, so I can’t see where the duplicate data is coming from.
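To double-check which channel each xymond_rrd task listens on, a hypothetical helper can list them from tasks.cfg. It assumes the stock layout, where each task is a `[name]` section and xymond_rrd is started from a `CMD` line through `xymond_channel --channel=...`; the location of tasks.cfg differs between packages.

```shell
# rrd_channels FILE (hypothetical helper): for every [section] in a
# tasks.cfg-style FILE whose CMD line runs xymond_rrd, print the section name
# and the value of its --channel= option ("?" if none is given).
rrd_channels() {
    awk '
        /^\[.*\]$/ { sec = substr($0, 2, length($0) - 2); next }   # strip the brackets
        $1 == "CMD" && /xymond_rrd/ {
            ch = "?"
            for (i = 2; i <= NF; i++)
                if ($i ~ /^--channel=/) ch = substr($i, 11)         # text after "--channel="
            print sec, ch
        }
    ' "$1"
}
```

Usage: `rrd_channels /etc/xymon/tasks.cfg | awk '{ print $2 }' | sort | uniq -d` prints any channel that more than one xymond_rrd task reads.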
Stephen Barrie · Wed, 30 Aug 2017 11:28:30 +0000
Hi

We have set the --no-cache option for rrdstatus in tasks.cfg, which has resolved the problem with missing graph data. So it looks as though the cache does not behave correctly with the Ubuntu packages of the Xymon server.
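After editing tasks.cfg, a quick check confirms that the switch ended up on the rrdstatus task's command line rather than on another task. A minimal sketch, assuming the same `[section]`/`CMD` layout as the stock tasks.cfg; section_cmd and has_no_cache are hypothetical helpers.

```shell
# section_cmd FILE SECTION (hypothetical helper): print the CMD line of
# [SECTION] in a tasks.cfg-style FILE, without the leading "CMD".
section_cmd() {
    awk -v want="[$2]" '
        /^\[.*\]$/ { insec = ($0 == want); next }
        insec && $1 == "CMD" { sub(/^[ \t]*CMD[ \t]+/, ""); print }
    ' "$1"
}

# has_no_cache FILE SECTION: succeed only if that CMD line contains --no-cache.
has_no_cache() {
    section_cmd "$1" "$2" | grep -q -- '--no-cache'
}
```

Usage: `has_no_cache /etc/xymon/tasks.cfg rrdstatus && echo "rrdstatus runs without the update cache"`.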