Xymon Mailing List Archive

Installing Xymon from terabithia; two weird issues

5 messages in this thread

list Peter Welter · Fri, 17 Mar 2017 13:56:09 +0100 ·
Hi JC,

I'm still experiencing some difficulties with Xymon version
(4.3.27-1.el6.terabithia) software, that is being deployed from
http://terabithia.org/rpms/xymon/el6/i686/.

There are two different types of problems:

1) Has to do with the integration of Xymon/Devmon.

   Although Devmon gets valid SNMP data on every poll, the values in the
if_load.Ethernet3_1.rrd file (for example) show gaps. The next value is
then so much larger than the rest that the whole graph goes berserk because
of the resulting spikes.

   ...[snip]
            <!-- 2017-03-15 15:10:00 CET / 1489587000 -->
<row><v>5.7197560484e+01</v><v>5.7540255376e+01</v></row>
            <!-- 2017-03-15 15:15:00 CET / 1489587300 -->
<row><v>5.8052253788e+01</v><v>5.7062462121e+01</v></row>
            <!-- 2017-03-15 15:20:00 CET / 1489587600 -->
<row><v>5.8039204545e+01</v><v>5.7738579545e+01</v></row>
            <!-- 2017-03-15 15:25:00 CET / 1489587900 -->
<row><v>5.8352395833e+01</v><v>5.7912187500e+01</v></row>
            <!-- 2017-03-15 15:30:00 CET / 1489588200 -->
<row><v>5.7961458333e+01</v><v>5.8807500000e+01</v></row>
            <!-- 2017-03-15 15:35:00 CET / 1489588500 -->
<row><v>5.7040675403e+01</v><v>5.7108262769e+01</v></row>
            <!-- 2017-03-15 15:40:00 CET / 1489588800 -->
<row><v>5.7984999119e+01</v><v>5.8214662436e+01</v></row>
            <!-- 2017-03-15 15:45:00 CET / 1489589100 -->
<row><v>1.6832224569e+16</v><v>1.6832224569e+16</v></row>
            <!-- 2017-03-15 15:50:00 CET / 1489589400 -->
<row><v>4.4656922344e+16</v><v>4.4656922343e+16</v></row>
            <!-- 2017-03-15 15:55:00 CET / 1489589700 -->
<row><v>5.7648150173e+01</v><v>5.7687031165e+01</v></row>
            <!-- 2017-03-15 16:00:00 CET / 1489590000 -->
<row><v>5.9068884188e+01</v><v>5.9453689406e+01</v></row>
            <!-- 2017-03-15 16:05:00 CET / 1489590300 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 16:10:00 CET / 1489590600 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 16:15:00 CET / 1489590900 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 16:20:00 CET / 1489591200 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 16:25:00 CET / 1489591500 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 16:30:00 CET / 1489591800 -->
<row><v>1.9398478192e+07</v><v>1.8707899982e+07</v></row>
            <!-- 2017-03-15 16:35:00 CET / 1489592100 -->
<row><v>5.6938284153e+01</v><v>5.6770437158e+01</v></row>
            <!-- 2017-03-15 16:40:00 CET / 1489592400 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 16:45:00 CET / 1489592700 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 16:50:00 CET / 1489593000 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 16:55:00 CET / 1489593300 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 17:00:00 CET / 1489593600 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 17:05:00 CET / 1489593900 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 17:10:00 CET / 1489594200 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 17:15:00 CET / 1489594500 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 17:20:00 CET / 1489594800 -->
<row><v>NaN</v><v>NaN</v></row>
            <!-- 2017-03-15 17:25:00 CET / 1489595100 -->
<row><v>3.5775056887e+07</v><v>3.4501518955e+07</v></row>
            <!-- 2017-03-15 17:30:00 CET / 1489595400 -->
<row><v>5.7219344262e+01</v><v>5.7417704918e+01</v></row>
            <!-- 2017-03-15 17:35:00 CET / 1489595700 -->
<row><v>5.7166338798e+01</v><v>5.9383825137e+01</v></row>
            <!-- 2017-03-15 17:40:00 CET / 1489596000 -->
<row><v>5.6769617486e+01</v><v>5.6981202186e+01</v></row>
            <!-- 2017-03-15 17:45:00 CET / 1489596300 -->
<row><v>5.7549617486e+01</v><v>5.7382732240e+01</v></row>
    ...[snip]
    This behaviour does NOT occur on my current Xymon server (version
4.2.3) running on SLES11 SP4.

    At first I thought this had to do with VMware, but that is not the
case. VM or bare metal, the behaviour is the same.

    I made sure that the devmon module itself is not causing the problems.
The same devmon software works fine on SLES and RHEL, and the snmpwalk
command returns valid SNMP data when I write its output to a file. It just
seems that Xymon does not update the rrd file correctly!?

    Any suggestions how to proceed?

2) Is a memory leak that only occurs when the NetApp-plugin (
https://sourceforge.net/projects/hobbit-perl-cl/) is being used for
trending data. Unfortunately, this plugin is not maintained anymore.

   In the past I have been trying to troubleshoot this problem with you
using valgrind etc.

   What do you suggest? Should I upgrade to the newest version first?
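
To locate the gaps and spikes from problem 1 in a dump longer than the
excerpt above, a small script can help. The following is a minimal sketch
and not part of Xymon or Devmon. It assumes the text comes from
`rrdtool dump`, that each row is preceded (on the same or the previous
line) by a comment carrying the epoch timestamp as in the excerpt, and that
a "spike" is any value more than 1000 times the median of all finite
values, which is an arbitrary threshold. The names parse_rows and classify
are made up for illustration.

```python
import math
import re
import statistics

# Epoch inside a dump comment such as
# "<!-- 2017-03-15 15:10:00 CET / 1489587000 -->".
TS_RE = re.compile(r"<!--.*?/\s*(\d+)\s*-->")
# One data-source value inside a <row>, e.g. "<v>5.7e+01</v>" or "<v>NaN</v>".
VAL_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def parse_rows(text):
    """Return a list of (epoch, [values]) tuples from rrdtool dump text."""
    rows = []
    epoch = None
    for line in text.splitlines():
        ts = TS_RE.search(line)
        if ts:
            epoch = int(ts.group(1))
        values = VAL_RE.findall(line)
        if values and epoch is not None:
            rows.append((epoch, [float(v) for v in values]))
            epoch = None
    return rows


def classify(rows, factor=1000.0):
    """Label each row 'gap' (any NaN), 'spike' or 'ok'.

    A row is a spike when any value exceeds factor times the median of all
    finite values. The threshold is an assumption, not an RRD concept.
    """
    finite = [v for _, vals in rows for v in vals if math.isfinite(v)]
    median = statistics.median(finite) if finite else 0.0
    labels = []
    for epoch, vals in rows:
        if any(math.isnan(v) for v in vals):
            label = "gap"
        elif median > 0 and any(v > factor * median for v in vals):
            label = "spike"
        else:
            label = "ok"
        labels.append((epoch, label))
    return labels
```

To use it, save the output of `rrdtool dump` for the affected rrd file to a
text file, read that file into a string, and call
classify(parse_rows(text)). The timestamps labelled "gap" or "spike" can
then be compared with the Devmon poll times.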

Kind regards, Peter
list Matt Vander Werf · Sun, 19 Mar 2017 06:05:18 -0400 ·
Hi Peter,

Regarding #2, it looks like a memory leak with netapp data RRD templates
was fixed as part of the 4.3.28 release [1], and it looks like it says it
was reported by you as well [2]. Given that you said you're still using the
4.3.27 RPM, you might want to try and update to 4.3.28 and see if that
fixes your memory leak issue.

Hope this helps!

[1] https://sourceforge.net/p/xymon/code/7969/
[2] https://sourceforge.net/p/xymon/code/HEAD/tree/branches/4.3.28/Changes

--
Matt Vander Werf

list Peter Welter · Mon, 20 Mar 2017 09:42:33 +0100 ·
Thanks Matt, I was looking for exactly the links you reported, but
apparently I wasn't looking hard enough. I'll report back tomorrow or so.

Peter

list Japheth Cleaver · Tue, 21 Mar 2017 08:36:19 -0700 ·
Assuming that the numeric values are correct for the time periods that
are coming in, my first thought would be that there's something unusual
going on with RRD caching. Are you seeing this issue with other trend
graphs, either for other tests on this host, other hosts using this
test/data, or any other graphs at all?

If it's unique to this, then that speaks to a problem with this specific 
data transmission. If not, there could be a larger issue with xymond_rrd 
(I/O performance, for example). I'd start with enabling debug output and 
examining the logs for when it's receiving data for this test. (Not sure 
if this is being sent via 'data' or 'status' messages, but you'll want 
to make sure you're enabling debug for the right copy of xymond_rrd.)

If nothing shows up there, then you might try disabling the cache, which
will force xymond_rrd to write things out as received (but will also
increase I/O load a lot).

If neither of those fixes it, there could actually be an issue with the
data coming in. At about that point I would set up a channel listener
looking specifically for the host.svc messages related to this source so
I could physically see the contents of each one coming in and look for
any anomalies.
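
A channel listener of that kind could be sketched roughly as follows. This
is an illustration under stated assumptions, not a tested Xymon tool: it
assumes the worker receives messages as text lines, that each message
starts with a '|'-separated header line and ends with a line containing
only "@@", and that the host name and test name each appear as a whole
field somewhere in that header. Check xymond_channel(8) for the exact
layout before relying on it. The function names and the host name
"switch1" are hypothetical.

```python
def split_messages(lines):
    """Group text lines into messages, each ended by a line that is '@@'."""
    current = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "@@":
            if current:
                yield current
            current = []
        else:
            current.append(line)


def matches(message, host, test):
    """True if host and test both appear as whole fields in the header.

    Matching on any field avoids hard-coding field positions, which are an
    assumption here.
    """
    fields = message[0].split("|")
    return host in fields and test in fields


def filter_messages(lines, host, test):
    """Yield only the messages whose header names the given host and test."""
    for message in split_messages(lines):
        if matches(message, host, test):
            yield message
```

Run as a worker fed by xymond_channel, a loop such as
`for m in filter_messages(sys.stdin, "switch1", "if_load"): print("\n".join(m))`
would print each matching message as it arrives, so the raw values can be
compared with what ends up in the rrd file.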

HTH,
-jc
list Peter Welter · Mon, 27 Mar 2017 11:36:07 +0200 ·
Hi JC, Matt

Good news:

Last Friday I first upgraded to 4.3.28, but the spiky behavior showed up
again immediately. So I think this is not specific to the Xymon version.

Then I did as JC suggested: enabling/disabling debug and
enabling/disabling the cache. Since my Xymon server runs on an SSD and
carries no production load, the impact is minimal.

This fixes both issues!

1) The Devmon/Xymon problem and the gaps in the graphs disappeared *as soon
as I disabled the caching* (--no-cache). As you say, that is not something
I want for long, but now we can look specifically at (A) why and (B) where
caching is a problem. I think that is good news!


2) I expect the memory leak to be solved, as the release notes say, but
that will only become clear over time (weeks).


3) Enabling debugging showed me another problem, in my self-modified
netapp.pl script. I reverted my change and now there are no more spurious
xstatvolume,____ rrd-files filling up my disk space.
This is an error I introduced myself and mailed about on the list in
November 2016. Sorry for this.

Very happy now, and hoping we can tackle the cache problem so I can enable
the launching of the rrd-daemons.

Peter
