Xymon Mailing List Archive

Possible Memory Leak (?!) in Version Xymon 4.3.27-1.el6.terabithia

7 messages in this thread

list Peter Welter · Tue, 20 Sep 2016 17:37:08 +0200 ·
Hi J.C.,

First of all: Thanks for your work for Xymon!

Second: I have a question about the terabithia repository. I want to
set up Development, Test, Accept, and Production environments using
this repository. I have installed the first and am working on the next phase.

Over time, however, I see that my Xymon server seems to eat all the memory
available and starts swapping until all memory is consumed?!?

This is for Development only and there are hardly any tests, just a very
small hosts.cfg. So why does Xymon get this hungry for memory over time?

Tue Sep 20 17:29:46 CEST 2016 - Memory CRITICAL

   Memory                  Used       Total  Percentage
green Real/Physical          7737M       7872M         98%
yellow Actual/Virtual         7539M       7872M         95%
red Swap/Page              3886M       4095M         94%

After a Xymon restart, all the swap is freed?

I'm using Red Hat Enterprise Linux Server release 6.8 (Santiago)

Any suggestions what to do next? Thanks in advance for any help!

Peter
list Japheth Cleaver · Tue, 20 Sep 2016 10:07:10 -0700 ·
Hi Peter,

I'm not aware of any memory leaks present in 4.3.27 itself that would cause growth like that. Can you provide the ps output for the system's various xymon tools? Which process seems to be running out of control?
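A minimal sketch of one way to answer that question, assuming `ps aux`-style output where column 2 is the PID, column 6 is the RSS in KiB, and the command line starts at column 11. The helper name `top_rss` is made up for this illustration and is not part of Xymon.

```shell
# Print the N largest processes by resident set size (RSS, in KiB)
# from `ps aux`-style input on stdin, as "rss<TAB>pid<TAB>command".
# The header line is skipped because its RSS column is not numeric.
top_rss() {
    n="${1:-5}"
    awk '$6 ~ /^[0-9]+$/ {
             cmd = $11
             for (i = 12; i <= NF; i++) cmd = cmd " " $i
             printf "%d\t%s\t%s\n", $6, $2, cmd
         }' | sort -k1,1nr | head -n "$n"
}

# Typical use on a live system:
#   ps aux | grep xymon | top_rss 5
```

Sorting on RSS rather than VSZ is deliberate here: VSZ counts mapped address space, which can be large without the memory actually being resident.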

-jc
list Peter Welter · Wed, 21 Sep 2016 13:18:27 +0200 ·
Hi Japheth,

Thanks for your response. I'm looking into this and will be back a.s.a.p.
(a few days or so, since I just restarted Xymon ;-)

Peter

list Peter Welter · Fri, 23 Sep 2016 13:38:36 +0200 ·
Hi Japheth,

It looks like one process (xymond_rrd) is very hungry for memory:

[xymon]# ps aux | egrep 'xymon|MEM'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
xymon    16889  0.0  0.0   4176   604 ?        S    13:26   0:00 /bin/dash
xymon    16892  0.0  0.0   6272   660 ?        S    13:26   0:00 vmstat 300 2
xymon    16986  0.0  0.0   4176   600 ?        S    13:28   0:00 /bin/dash
xymon    16989  0.0  0.0   6272   664 ?        S    13:28   0:00 vmstat 300 2
xymon    17060  0.0  0.0   4176   604 ?        S    13:30   0:00 /bin/dash
xymon    17063  0.0  0.0   6272   664 ?        S    13:30   0:00 vmstat 300 2
xymon    17107  0.5  0.1 140340 10324 ?        S    13:31   0:00 /usr/bin/perl -w -I/home/bbtest/server/ext /etc/xymon/ext/netapp/netapp.pl
xymon    17110  0.2  0.1 142236 11108 ?        S    13:31   0:00 /usr/bin/perl -w -I/home/bbtest/server/ext /etc/xymon/ext/netapp/netapp.pl
xymon    17160  0.0  0.0 106120  1248 ?        S    13:31   0:00 sh -c /usr/bin/ssh -x -l xymon xxx.xxx.xxx.xxx "environment status" 2>&1
xymon    17161  0.0  0.0  60060  3440 ?        S    13:31   0:00 /usr/bin/ssh -x -l xymon 10.10.1.30 environment status
root     17163  0.0  0.0 103324   852 pts/1    S+   13:31   0:00 egrep xymon|MEM
xymon    27932  0.0  0.0  12648   592 ?        Ss   Sep20   0:05 /usr/sbin/xymonlaunch --log=/var/log/xymon/xymonlaunch.log
xymon    27992  0.0  0.1 25212804 8160 ?       S    Sep20   1:57 xymond --restart=/var/lib/xymon/tmp/xymond.chk --checkpoint-file=/var/lib/xymon/tmp/xymond.chk --checkpoint-interval=600 --admin-senders=127.0.0.1,132.229.61.140 --store-clientlogs=!msgs
xymon    27996  0.0  0.0 12624444 1452 ?       S    Sep20   0:00 xymond_channel --channel=stachg xymond_history
xymon    27997  0.0  0.0 12624444 1244 ?       S    Sep20   0:00 xymond_channel --channel=page xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
xymon    27998  0.0  0.0 12624444 1340 ?       S    Sep20   0:00 xymond_channel --channel=client xymond_client
xymon    27999  0.0  0.0 12624860 4328 ?       S    Sep20   0:02 xymond_channel --channel=status xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon    28000  0.0  0.0 12625628 4712 ?       S    Sep20   0:00 xymond_channel --channel=data xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon    28001  0.0  0.0 12624444 1320 ?       S    Sep20   0:00 xymond_channel --channel=clichg xymond_hostdata
xymon    28007  0.0  0.0  41788  1168 ?        S    Sep20   0:00 xymond_channel --channel=user --log=/var/log/xymon/vmware-monitord.log vmware-monitord
xymon    28008  0.0  0.0 10527268 1688 ?       S    Sep20   0:00 xymond_history
xymon    28009  0.0  1.5 12624884 122508 ?     S    Sep20   0:00 xymond_client
xymon    28010  0.0  0.0 106848  2176 ?        S    Sep20   0:00 /bin/gawk -f /usr/libexec/xymon/vmware-monitord
xymon    28011  0.0  0.0 10527252 1212 ?       S    Sep20   0:00 xymond_hostdata
*xymon    28012  0.0  9.4 12680832 765216 ?     S    Sep20   0:08 xymond_rrd --rrddir=/var/lib/xymon/rrd*
*xymon    28013  0.0 12.1 12689484 975908 ?     S    Sep20   0:12 xymond_rrd --rrddir=/var/lib/xymon/rrd*
xymon    28014  0.0  0.1 10527512 9980 ?       S    Sep20   0:00 xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600

I did one test migration, where all hosts (about 4000) ran on this
system, so the directory /var/lib/xymon/rrd is quite large. However,
currently only one host (the Xymon server itself) is being monitored, and
it is testing one NetApp filer. So perhaps xymond_rrd and this large
directory are somehow related. I will try the same on the Accept
environment, which I have installed by now. There are just a few files in
/var/lib/xymon/rrd on the Accept system, and I will check next Monday how
each system behaves.
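To compare the two systems on that point, the size of the RRD tree can be quantified. The following is only a sketch: it assumes the RRD directory holds one subdirectory per host with `.rrd` files inside, and the helper name `count_rrd_by_host` is hypothetical. It relies on `find -mindepth`, which GNU find provides.

```shell
# Count the .rrd files under an RRD directory, grouped by first-level
# subdirectory (assumed here to be one directory per host).
# Output is "count<TAB>host", largest count first.
count_rrd_by_host() {
    dir="${1%/}"                      # drop a trailing slash, if any
    find "$dir" -mindepth 2 -type f -name '*.rrd' |
        awk -v base="$dir" '{
            rel = substr($0, length(base) + 2)   # strip "base/"
            split(rel, parts, "/")
            n[parts[1]]++
        }
        END { for (h in n) printf "%d\t%s\n", n[h], h }' |
        sort -k1,1nr -k2,2
}

# Typical use:
#   count_rrd_by_host /var/lib/xymon/rrd | head
#   count_rrd_by_host /var/lib/xymon/rrd | wc -l    # number of host directories
```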

<That's the update so far; to be continued next week.>


list Henrik Størner · Sat, 24 Sep 2016 14:18:34 +0200 ·
Hi,

Memory leaks are the worst to troubleshoot.

If possible, then running xymond_rrd via the "valgrind" tool is the best way to do it. valgrind comes with some distributions, not sure about RHEL though. There might be some CentOS packages that will work.

An important point is that the binaries must be compiled with debugging info intact; i.e. "-g" as a compile-time option, preferably only -O optimisation, and not stripped. I guess Japheth can help you with that, if necessary.

Then you change the tasks.cfg to run xymond_rrd via valgrind: The CMD setting must then be

CMD valgrind --log-file=/tmp/valgrind-rrd.%p --leak-check=full \
     xymond_channel --channel=status --log=$XYMONSERVERLOGS/rrd-status.log xymond_rrd --rrddir=$XYMONVAR/rrd

Then run Xymon normally for some time, until hopefully it starts logging memory leaks.
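Once valgrind logs exist, the leak totals can be pulled out of them. This is a sketch under two assumptions: the logs use memcheck's usual `definitely lost: N bytes in M blocks` summary line, and the helper name `leak_summary` is invented for this example. Note that memcheck writes its leak summary when the traced process exits, so the task has to be stopped first. Also worth checking: valgrind only follows a program started through exec when `--trace-children=yes` is given, so if xymond_channel launches xymond_rrd that way (an assumption here), the log may otherwise describe only xymond_channel.

```shell
# For each valgrind log file given, print "bytes<TAB>file" for its
# "definitely lost:" summary line, largest first.
# Thousands separators (e.g. 1,234,567) are stripped from the byte count.
leak_summary() {
    for f in "$@"; do
        awk -v file="$f" '/definitely lost:/ {
            for (i = 1; i <= NF; i++)
                if ($i == "lost:") { bytes = $(i + 1); break }
            gsub(/,/, "", bytes)
            printf "%s\t%s\n", bytes, file
        }' "$f"
    done | sort -k1,1nr
}

# Typical use, matching the --log-file pattern above:
#   leak_summary /tmp/valgrind-rrd.*
```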


This checking does have a significant performance impact, so running it on a 4000-server system is probably not possible.


Regards,
Henrik


list Peter Welter · Wed, 28 Sep 2016 10:27:55 +0200 ·
Hi Henrik, J.C.,

Thanks for your response.

It seems that valgrind is available for RHEL (see below), so now I want
to ask J.C.: "What do you want me to do?"

If I want to use the prebuilt packages, and YES, that would be preferable,
can you supply me with a precompiled binary for xymond_rrd that has
all the options Henrik talked about, so I can replace the currently
installed binary with it?

Or should I build a package myself to debug this issue?

Regards, Peter


[root@uhu-a xymon]# yum search valgrind
Loaded plugins: product-id, search-disabled-repos, security, subscription-manager
========================= N/S Matched: valgrind =========================
devtoolset-1.1-valgrind-devel.i686 : Development files for valgrind
devtoolset-1.1-valgrind-devel.x86_64 : Development files for valgrind
devtoolset-1.1-valgrind-openmpi.i686 : OpenMPI support for valgrind
devtoolset-1.1-valgrind-openmpi.x86_64 : OpenMPI support for valgrind
devtoolset-2-eclipse-valgrind.noarch : Valgrind Tools Integration for Eclipse
devtoolset-2-valgrind-devel.i686 : Development files for valgrind
devtoolset-2-valgrind-devel.x86_64 : Development files for valgrind
devtoolset-2-valgrind-openmpi.i686 : OpenMPI support for valgrind
devtoolset-2-valgrind-openmpi.x86_64 : OpenMPI support for valgrind
eclipse-valgrind.x86_64 : Valgrind Tools Integration for Eclipse
perl-Test-Valgrind.noarch : Generate suppressions, analyze and test any command with valgrind
valgrind-devel.i686 : Development files for valgrind
valgrind-devel.x86_64 : Development files for valgrind
valgrind-openmpi.x86_64 : OpenMPI support for valgrind
devtoolset-1.1-valgrind.i686 : Tool for finding memory management bugs in programs
devtoolset-1.1-valgrind.x86_64 : Tool for finding memory management bugs in programs
devtoolset-2-valgrind.i686 : Tool for finding memory management bugs in programs
devtoolset-2-valgrind.x86_64 : Tool for finding memory management bugs in programs
valgrind.i686 : Tool for finding memory management bugs in programs
valgrind.x86_64 : Tool for finding memory management bugs in programs
valkyrie.x86_64 : Graphical User Interface for Valgrind Suite

  Name and summary matches only, use "search all" for everything.

list Japheth Cleaver · Wed, 28 Sep 2016 09:58:18 -0700 ·
Hi,

There's no need to rebuild the packages to enable this type of testing. 
Just make sure the xymon-debuginfo RPM is installed (it's in the same 
repo), as that contains all of the symbol information on RH-type systems.

As far as valgrind goes, all you really need is the base 'valgrind' package.
Simply modify the tasks.cfg as Henrik described and you should be set. I
also typically use "--track-origins=yes".

In terms of the overall problem, xymond_rrd will use a large(ish) amount
of RAM as it spools up its cache of data points before sending them out
to rrdtool itself for writing. In theory, this should hit a constant
level once it's been running for an hour or two (depending on your
data points and hosts) and shouldn't grow beyond that. The overall memory
usage will scale linearly with hosts x RRAs.
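Whether the process really levels off can be checked by sampling its RSS over a few hours. A rough sketch, assuming samples are recorded as `epoch_seconds rss_kib` lines; the helper name `rss_trend`, the log file name, and the `PID` variable are illustrative only.

```shell
# Read "epoch_seconds rss_kib" samples on stdin and report the change in
# RSS between the first and last sample, plus the average rate per hour.
rss_trend() {
    awk 'NF >= 2 {
             if (!seen) { t0 = $1; r0 = $2; seen = 1 }
             t1 = $1; r1 = $2
         }
         END {
             if (!seen || t1 == t0) { print "not enough samples"; exit 1 }
             printf "delta_kib=%d kib_per_hour=%.0f\n", r1 - r0, (r1 - r0) * 3600 / (t1 - t0)
         }'
}

# Collecting samples every five minutes for one process (PID is hypothetical):
#   while sleep 300; do echo "$(date +%s) $(ps -o rss= -p "$PID")"; done >> rss.log
#   rss_trend < rss.log
```

A rate that stays near zero after the first hour or two would match the expected plateau; a steady positive rate over a longer window would point toward a leak.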

I know it had been a source of leaks before, so it's possible something 
is still in there. Are you adding and removing lots of hosts at once by 
any chance? It's possible there's an incorrect cleanup of previously 
cached data, but I'd thought those had been resolved.

HTH,
-jc
quoted from Peter Welter

On 9/28/2016 1:27 AM, Peter Welter wrote:
Hi Henrik, J.C.,

Thanks for your response.

It seems that valgrind is available for RHEL (see below) and now I 
wanted to ask J.C. the following: "What do you want me to do?"

If I want to use the prebuild packages, and YES that would be 
preferable, then can you supply me with a pre-compiles binary for 
xymond_rrd that has all the options Henrik talked about? So I can 
replace this with the currently installed image?

Or should I build a package my self to debug this issue?

Regards, Peter


[root at uhu-a xymon]# yum search valgrind

Loaded plugins: product-id, search-disabled-repos, security, 
subscription-manager

================================================================================================================================== 
N/S Matched: valgrind 
==================================================================================================================================

devtoolset-1.1-*valgrind*-devel.i686 : Development files for *valgrind*

devtoolset-1.1-*valgrind*-devel.x86_64 : Development files for *valgrind*

devtoolset-1.1-*valgrind*-openmpi.i686 : OpenMPI support for *valgrind*

devtoolset-1.1-*valgrind*-openmpi.x86_64 : OpenMPI support for *valgrind*

devtoolset-2-eclipse-*valgrind*.noarch : *Valgrind* Tools Integration 
for Eclipse

devtoolset-2-*valgrind*-devel.i686 : Development files for *valgrind*

devtoolset-2-*valgrind*-devel.x86_64 : Development files for *valgrind*

devtoolset-2-*valgrind*-openmpi.i686 : OpenMPI support for *valgrind*

devtoolset-2-*valgrind*-openmpi.x86_64 : OpenMPI support for *valgrind*

eclipse-*valgrind*.x86_64 : *Valgrind* Tools Integration for Eclipse

perl-Test-*Valgrind*.noarch : Generate suppressions, analyze and test 
any command with *valgrind*

*valgrind*-devel.i686 : Development files for *valgrind*

*valgrind*-devel.x86_64 : Development files for *valgrind*

*valgrind*-openmpi.x86_64 : OpenMPI support for *valgrind*

devtoolset-1.1-*valgrind*.i686 : Tool for finding memory management 
bugs in programs

devtoolset-1.1-*valgrind*.x86_64 : Tool for finding memory management 
bugs in programs

devtoolset-2-*valgrind*.i686 : Tool for finding memory management bugs 
in programs

devtoolset-2-*valgrind*.x86_64 : Tool for finding memory management 
bugs in programs

*valgrind*.i686 : Tool for finding memory management bugs in programs

*valgrind*.x86_64 : Tool for finding memory management bugs in programs

valkyrie.x86_64 : Graphical User Interface for *Valgrind* Suite


  Name and summary matches *only*, use "search all" for everything.


2016-09-24 14:18 GMT+02:00 Henrik Størner <user-ce4a2c883f75@xymon.invalid 

<mailto:user-ce4a2c883f75@xymon.invalid>>:
quoted from Peter Welter

    Hi,

    memory leaks are the worst to troubleshoot.

    If possible, then running xymond_rrd via the "valgrind" tool is
    the best way to do it. valgrind comes with some distributions, not
    sure about RHEL though. There might be some CentOS packages that
    will work.

    An important point is that the binaries must be compiled with
    debugging info intact; i.e. "-g" as a compile-time option,
    preferably only -O optimisation, and not stripped. I guess Japheth
    can help you with that, if necessary.

    Then you change the tasks.cfg to run xymond_rrd via valgrind: The
    CMD setting must then be

    CMD valgrind --log-file=/tmp/valgrind-rrd.%p --leak-check=full \
        xymond_channel --channel=status
    --log=$XYMONSERVERLOGS/rrd-status.log xymond_rrd
    --rrddir=$XYMONVAR/rrd

    Then run Xymon normally for some time, until hopefully it starts
    logging memory leaks.


    This checking does have a significant performance impact, so
    running it on a 4000-server system is probably not possible.


    Regards,
    Henrik


    Den 23-09-2016 kl. 13:38 skrev Peter Welter:
    Hi Japheth,

    Probable one process (xymon_rrd) seems very hungry for memory:

    [xymon]# ps aux | egrep 'xymon|MEM'

    USER       PID %CPU %MEM    VSZ   RSS TTY STAT START   TIME COMMAND

    xymon   16889  0.0  0.0   4176   604 ?        S   13:26   0:00
    /bin/dash

    xymon   16892  0.0  0.0   6272   660 ?        S   13:26   0:00
    vmstat 300 2

    xymon   16986  0.0  0.0   4176   600 ?        S   13:28   0:00
    /bin/dash

    xymon   16989  0.0  0.0   6272   664 ?        S   13:28   0:00
    vmstat 300 2

    xymon   17060  0.0  0.0   4176   604 ?        S   13:30   0:00
    /bin/dash

    xymon   17063  0.0  0.0   6272   664 ?        S   13:30   0:00
    vmstat 300 2

    xymon   17107  0.5  0.1 140340 <tel:XXXXXX> 10324 ?        S   
quoted from Peter Welter
    13:31   0:00 /usr/bin/perl -w -I/home/bbtest/server/ext
    /etc/xymon/ext/netapp/netapp.pl <http://netapp.pl>;

    xymon   17110  0.2  0.1 142236 11108 ?        S   13:31   0:00
    /usr/bin/perl -w -I/home/bbtest/server/ext
    /etc/xymon/ext/netapp/netapp.pl

    xymon   17160  0.0  0.0 106120  1248 ?        S   13:31   0:00 sh
    -c /usr/bin/ssh -x -l xymon xxx.xxx.xxx.xxx "environment status" 2>&1

    xymon   17161  0.0  0.0  60060  3440 ?        S   13:31   0:00
    /usr/bin/ssh -x -l xymon 10.10.1.30 environment status

    root     17163  0.0  0.0 103324   852 pts/1 S+   13:31   0:00
    egrep xymon|MEM

    xymon   27932  0.0  0.0  12648   592 ?        Ss   Sep20   0:05
    /usr/sbin/xymonlaunch --log=/var/log/xymon/xymonlaunch.log

    xymon   27992  0.0  0.1 25212804 8160 ?       S   Sep20   1:57
    xymond --restart=/var/lib/xymon/tmp/xymond.chk
    --checkpoint-file=/var/lib/xymon/tmp/xymond.chk
    --checkpoint-interval=600
    --admin-senders=127.0.0.1,132.229.61.140 --store-clientlogs=!msgs

    xymon   27996  0.0  0.0 12624444 1452 ?       S   Sep20   0:00
    xymond_channel --channel=stachg xymond_history

    xymon   27997  0.0  0.0 12624444 1244 ?       S   Sep20   0:00
    xymond_channel --channel=page xymond_alert
    --checkpoint-file=/var/lib/xymon/tmp/alert.chk
    --checkpoint-interval=600

    xymon   27998  0.0  0.0 12624444 1340 ?       S   Sep20   0:00
    xymond_channel --channel=client xymond_client

    xymon   27999  0.0  0.0 12624860 4328 ?       S   Sep20   0:02
    xymond_channel --channel=status xymond_rrd
    --rrddir=/var/lib/xymon/rrd

    xymon   28000  0.0  0.0 12625628 4712 ?       S   Sep20   0:00
    xymond_channel --channel=data xymond_rrd --rrddir=/var/lib/xymon/rrd

    xymon   28001  0.0  0.0 12624444 1320 ?       S   Sep20   0:00
    xymond_channel --channel=clichg xymond_hostdata

    xymon   28007  0.0  0.0  41788  1168 ?        S   Sep20   0:00
    xymond_channel --channel=user
    --log=/var/log/xymon/vmware-monitord.log vmware-monitord

    xymon   28008  0.0  0.0 10527268 1688 ?       S   Sep20   0:00
    xymond_history

    xymon   28009  0.0  1.5 12624884 122508 ?     S   Sep20   0:00
    xymond_client

    xymon   28010  0.0  0.0 106848  2176 ?        S   Sep20   0:00
    /bin/gawk -f /usr/libexec/xymon/vmware-monitord

    xymon   28011  0.0  0.0 10527252 1212 ?       S   Sep20   0:00
    xymond_hostdata

    *xymon   28012  0.0  9.4 12680832 765216 ? S    Sep20   0:08
    xymond_rrd --rrddir=/var/lib/xymon/rrd*

    *xymon   28013  0.0 12.1 12689484 975908 ? S    Sep20   0:12
    xymond_rrd --rrddir=/var/lib/xymon/rrd*

    xymon 28014  0.0  0.1 10527512 9980 ?       S Sep20   0:00
    xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk
    --checkpoint-interval=600

    I did one test migration, where all hosts (about 4000 hosts) ran
    on this system, so the directory /var/lib/xymon/rrd is quite
    huge. However, currently there is only one host (the Xymon server
    itself) running, and it is testing one NetApp filer. So perhaps
    xymond_rrd and this large directory are somehow related. I will
    try the Accept environment, which I have installed by now. There
    are just a few files in /var/lib/xymon/rrd on this Accept system,
    and I will check next Monday how each system behaves.
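
To compare the two systems over several days, it helps to record which Xymon processes hold the most resident memory rather than reading the full `ps aux` listing by eye. The following is a rough sketch, not part of Xymon: the function names `top_by_rss` and `current_ps_text` are hypothetical, and it assumes a procps-style `ps` that accepts `-eo pid,rss,vsz,args` and reports RSS in KiB.

```python
import subprocess


def top_by_rss(ps_text, pattern="xymon", limit=5):
    """Rank processes whose command line contains `pattern` by resident size.

    `ps_text` is expected to be the output of `ps -eo pid,rss,vsz,args`,
    including its header line. Returns (rss_kb, pid, args) tuples,
    largest RSS first.
    """
    rows = []
    for line in ps_text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 4 or pattern not in parts[3]:
            continue
        pid, rss_kb, _vsz_kb, args = parts
        rows.append((int(rss_kb), int(pid), args))
    rows.sort(reverse=True)
    return rows[:limit]


def current_ps_text():
    """Run ps once and return its output (assumes a procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,rss,vsz,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Running `top_by_rss(current_ps_text())` from cron and appending the result to a file would give a simple growth record for each environment. With the figures from the listing above, the two xymond_rrd workers (PIDs 28013 and 28012) would come out on top.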

    <That is the update so far; to be continued next week.>


    2016-09-21 13:18 GMT+02:00 Peter Welter <user-f55666bd0d1e@xymon.invalid>:

        Hi Japheth,

        Thanks for your response. I'm looking into this and will be
        back a.s.a.p. (a few days or so, since I just restarted Xymon ;-)

        Peter

        2016-09-20 19:07 GMT+02:00 Japheth Cleaver
        <user-87556346d4af@xymon.invalid>:

            On 9/20/2016 8:37 AM, Peter Welter wrote:

                Hi J.C.,

                First of all: Thanks for your work for Xymon!

                Second: I have a question about the repository from
                terabithia. I want to install an Development, Test 
                Accept, Production environment with the use of this
                repository. I installed first and are working on the
                next phase.

                Over time however, I see that my Xymon-server seems
                to eat all the memory available and starts swapping
                until all memory is consumed?!?

                This is for Development only and there are no really
                any tests. A very small host.cfg. So, why is over
                time, Xymon this hungry for memory?

                Tue Sep 20 17:29:46 CEST 2016 - Memory CRITICAL

                   Memory                 Used       Total  Percentage
                green  Real/Physical     7737M       7872M         98%
                yellow Actual/Virtual    7539M       7872M         95%
                red    Swap/Page         3886M       4095M         94%

                After a Xymon restart, all the swap is freed?

                I'm using Red Hat Enterprise Linux Server release 6.8
                (Santiago)

                Any suggestions what to do next? Thanks in advance
                for any help!

                Peter


            Hi Peter,

            I'm not aware of any memory leaks present in 4.3.27
            itself that would cause growth like that. Can you provide
            the ps output for the system's various xymon tools? Which
            process seems to be running out of control?

            -jc

