Xymon Mailing List Archive

Possible Memory Leak (?!) in Version Xymon 4.3.27-1.el6.terabithia

list Japheth Cleaver
Wed, 28 Sep 2016 09:58:18 -0700
Message-Id: <user-87a290a9f073@xymon.invalid>

Hi,

There's no need to rebuild the packages to enable this type of testing. 
Just make sure the xymon-debuginfo RPM is installed (it's in the same 
repo), as that contains all of the symbol information on RH-type systems.

As far as valgrind goes, all you really need is the base 'valgrind' package. 
Simply modify the tasks.cfg as below and you should be set. I also use 
"--track-origins=yes" typically.

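As a rough sketch of what the modified entry could look like, the helper 
below prints a CMD line for tasks.cfg using the valgrind options mentioned 
in this thread. The function name valgrind_cmd is only illustrative and is 
not part of Xymon. One assumption is added on top of what was discussed: 
valgrind does not follow exec() into child programs by default, and the ps 
listing quoted further down shows xymond_rrd as a separate process from 
xymond_channel, so --trace-children=yes is included here. Drop it if it 
turns out not to be needed.

```shell
#!/bin/sh
# Sketch: print a tasks.cfg CMD line that runs a Xymon worker under valgrind.
# Prerequisites named in this thread: the 'valgrind' and 'xymon-debuginfo' packages.
# valgrind_cmd is a hypothetical helper, not something shipped with Xymon.
valgrind_cmd() {
    logprefix=$1   # valgrind writes one log per process: <logprefix>.<pid>
    shift
    # --log-file, --leak-check and --track-origins come from this thread;
    # --trace-children=yes is an added assumption (see the note above).
    printf 'CMD valgrind --log-file=%s.%%p --leak-check=full --track-origins=yes --trace-children=yes \\\n' "$logprefix"
    printf '    %s\n' "$*"
}

# Single quotes keep $XYMONSERVERLOGS and $XYMONVAR literal for tasks.cfg.
valgrind_cmd /tmp/valgrind-rrd \
    'xymond_channel --channel=status --log=$XYMONSERVERLOGS/rrd-status.log xymond_rrd --rrddir=$XYMONVAR/rrd'
```

The two printed lines would replace the existing CMD line of the task 
being checked. Expect it to run noticeably slower while valgrind is 
attached.
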
In terms of the overall problem, xymond_rrd will use a large(ish) amount 
of RAM as it spools up its cache of data points before sending them out 
to rrdtool itself for writing. In theory, this should hit a constant 
level once it's been running for an hour or two (depending on your 
datapoints and hosts) and shouldn't grow beyond that. The overall memory 
usage will scale linearly with hosts x RRAs.
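
To tell a cache that levels off from a genuine leak, it can help to log 
the resident size of the xymond_rrd workers for a few hours. The following 
is a minimal sketch, not a polished tool: rrd_rss and watch_rrd_rss are 
made-up names, and it assumes the worker shows up in "ps aux" with the 
bare command name xymond_rrd, as it does in the listing quoted below.

```shell
#!/bin/sh
# Sketch: track the resident memory (RSS) of xymond_rrd over time.
# rrd_rss and watch_rrd_rss are hypothetical helper names.

# Read `ps aux` output on stdin; print "PID RSS_KB" for each xymond_rrd worker.
# Field 11 is the first word of the command, so the xymond_channel wrappers
# (which only mention xymond_rrd as an argument) are skipped.
rrd_rss() {
    awk '$11 == "xymond_rrd" { print $2, $6 }'
}

# Append a timestamped sample to a log file every <interval> seconds.
watch_rrd_rss() {
    logfile=$1
    interval=${2:-300}
    while :; do
        ps aux | rrd_rss | while read -r pid rss; do
            printf '%s pid=%s rss_kb=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$pid" "$rss"
        done >> "$logfile"
        sleep "$interval"
    done
}

# Example (runs until interrupted):
#   watch_rrd_rss /tmp/xymond_rrd-rss.log 300
```

If the logged values flatten out after the first hour or two, that points 
to the normal cache; if they keep climbing at a steady rate, a leak is 
more likely.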

I know it had been a source of leaks before, so it's possible something 
is still in there. Are you adding and removing lots of hosts at once by 
any chance? It's possible there's an incorrect cleanup of previously 
cached data, but I'd thought those had been resolved.
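
One quick way to put numbers on that question is to compare how many 
per-host directories exist under the RRD tree with how many hosts are 
actually configured. The sketch below assumes the usual 
<rrddir>/<hostname>/*.rrd layout; rrd_inventory is a made-up name.

```shell
#!/bin/sh
# Sketch: count per-host directories and RRD files under the Xymon RRD tree.
# Assumes the layout <rrddir>/<hostname>/*.rrd; rrd_inventory is a hypothetical name.
rrd_inventory() {
    rrddir=$1
    # Immediate subdirectories, one per host that has ever reported data.
    hosts=$(find "$rrddir" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')
    # All RRD files anywhere below the tree.
    files=$(find "$rrddir" -type f -name '*.rrd' | wc -l | tr -d ' ')
    printf 'host_dirs=%s rrd_files=%s\n' "$hosts" "$files"
}

# Example:
#   rrd_inventory /var/lib/xymon/rrd
```

A host_dirs count far above the number of hosts currently being monitored 
would confirm that a lot of data from removed hosts is still on disk. 
Whether that affects xymond_rrd's memory use is a separate question.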

HTH,
-jc

On 9/28/2016 1:27 AM, Peter Welter wrote:
Hi Henrik, J.C.,

Thanks for your response.

It seems that valgrind is available for RHEL (see below) and now I 
wanted to ask J.C. the following: "What do you want me to do?"

If I want to use the prebuilt packages, and YES that would be 
preferable, then can you supply me with a pre-compiled binary for 
xymond_rrd that has all the options Henrik talked about, so I can 
replace the currently installed image with it?

Or should I build a package myself to debug this issue?

Regards, Peter


[root@uhu-a xymon]# yum search valgrind
Loaded plugins: product-id, search-disabled-repos, security, subscription-manager
======================== N/S Matched: valgrind ========================
devtoolset-1.1-valgrind-devel.i686 : Development files for valgrind
devtoolset-1.1-valgrind-devel.x86_64 : Development files for valgrind
devtoolset-1.1-valgrind-openmpi.i686 : OpenMPI support for valgrind
devtoolset-1.1-valgrind-openmpi.x86_64 : OpenMPI support for valgrind
devtoolset-2-eclipse-valgrind.noarch : Valgrind Tools Integration for Eclipse
devtoolset-2-valgrind-devel.i686 : Development files for valgrind
devtoolset-2-valgrind-devel.x86_64 : Development files for valgrind
devtoolset-2-valgrind-openmpi.i686 : OpenMPI support for valgrind
devtoolset-2-valgrind-openmpi.x86_64 : OpenMPI support for valgrind
eclipse-valgrind.x86_64 : Valgrind Tools Integration for Eclipse
perl-Test-Valgrind.noarch : Generate suppressions, analyze and test any command with valgrind
valgrind-devel.i686 : Development files for valgrind
valgrind-devel.x86_64 : Development files for valgrind
valgrind-openmpi.x86_64 : OpenMPI support for valgrind
devtoolset-1.1-valgrind.i686 : Tool for finding memory management bugs in programs
devtoolset-1.1-valgrind.x86_64 : Tool for finding memory management bugs in programs
devtoolset-2-valgrind.i686 : Tool for finding memory management bugs in programs
devtoolset-2-valgrind.x86_64 : Tool for finding memory management bugs in programs
valgrind.i686 : Tool for finding memory management bugs in programs
valgrind.x86_64 : Tool for finding memory management bugs in programs
valkyrie.x86_64 : Graphical User Interface for Valgrind Suite

  Name and summary matches only, use "search all" for everything.


2016-09-24 14:18 GMT+02:00 Henrik Størner <user-ce4a2c883f75@xymon.invalid>:

    Hi,

    memory leaks are the worst to troubleshoot.

    If possible, then running xymond_rrd via the "valgrind" tool is
    the best way to do it. valgrind comes with some distributions, not
    sure about RHEL though. There might be some CentOS packages that
    will work.

    An important point is that the binaries must be compiled with
    debugging info intact; i.e. "-g" as a compile-time option,
    preferably only -O optimisation, and not stripped. I guess Japheth
    can help you with that, if necessary.

    Then you change the tasks.cfg to run xymond_rrd via valgrind: The
    CMD setting must then be

    CMD valgrind --log-file=/tmp/valgrind-rrd.%p --leak-check=full \
        xymond_channel --channel=status --log=$XYMONSERVERLOGS/rrd-status.log xymond_rrd --rrddir=$XYMONVAR/rrd

    Then run Xymon normally for some time, until hopefully it starts
    logging memory leaks.


    This checking does have a significant performance impact, so
    running it on a 4000-server system is probably not possible.


    Regards,
    Henrik


    Den 23-09-2016 kl. 13:38 skrev Peter Welter:
    Hi Japheth,

    One process (xymond_rrd) seems very hungry for memory:

    [xymon]# ps aux | egrep 'xymon|MEM'
    USER       PID %CPU %MEM      VSZ    RSS TTY   STAT START TIME COMMAND
    xymon    16889  0.0  0.0     4176    604 ?     S    13:26 0:00 /bin/dash
    xymon    16892  0.0  0.0     6272    660 ?     S    13:26 0:00 vmstat 300 2
    xymon    16986  0.0  0.0     4176    600 ?     S    13:28 0:00 /bin/dash
    xymon    16989  0.0  0.0     6272    664 ?     S    13:28 0:00 vmstat 300 2
    xymon    17060  0.0  0.0     4176    604 ?     S    13:30 0:00 /bin/dash
    xymon    17063  0.0  0.0     6272    664 ?     S    13:30 0:00 vmstat 300 2
    xymon    17107  0.5  0.1   140340  10324 ?     S    13:31 0:00 /usr/bin/perl -w -I/home/bbtest/server/ext /etc/xymon/ext/netapp/netapp.pl
    xymon    17110  0.2  0.1   142236  11108 ?     S    13:31 0:00 /usr/bin/perl -w -I/home/bbtest/server/ext /etc/xymon/ext/netapp/netapp.pl
    xymon    17160  0.0  0.0   106120   1248 ?     S    13:31 0:00 sh -c /usr/bin/ssh -x -l xymon xxx.xxx.xxx.xxx "environment status" 2>&1
    xymon    17161  0.0  0.0    60060   3440 ?     S    13:31 0:00 /usr/bin/ssh -x -l xymon 10.10.1.30 environment status
    root     17163  0.0  0.0   103324    852 pts/1 S+   13:31 0:00 egrep xymon|MEM
    xymon    27932  0.0  0.0    12648    592 ?     Ss   Sep20 0:05 /usr/sbin/xymonlaunch --log=/var/log/xymon/xymonlaunch.log
    xymon    27992  0.0  0.1 25212804   8160 ?     S    Sep20 1:57 xymond --restart=/var/lib/xymon/tmp/xymond.chk --checkpoint-file=/var/lib/xymon/tmp/xymond.chk --checkpoint-interval=600 --admin-senders=127.0.0.1,132.229.61.140 --store-clientlogs=!msgs
    xymon    27996  0.0  0.0 12624444   1452 ?     S    Sep20 0:00 xymond_channel --channel=stachg xymond_history
    xymon    27997  0.0  0.0 12624444   1244 ?     S    Sep20 0:00 xymond_channel --channel=page xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
    xymon    27998  0.0  0.0 12624444   1340 ?     S    Sep20 0:00 xymond_channel --channel=client xymond_client
    xymon    27999  0.0  0.0 12624860   4328 ?     S    Sep20 0:02 xymond_channel --channel=status xymond_rrd --rrddir=/var/lib/xymon/rrd
    xymon    28000  0.0  0.0 12625628   4712 ?     S    Sep20 0:00 xymond_channel --channel=data xymond_rrd --rrddir=/var/lib/xymon/rrd
    xymon    28001  0.0  0.0 12624444   1320 ?     S    Sep20 0:00 xymond_channel --channel=clichg xymond_hostdata
    xymon    28007  0.0  0.0    41788   1168 ?     S    Sep20 0:00 xymond_channel --channel=user --log=/var/log/xymon/vmware-monitord.log vmware-monitord
    xymon    28008  0.0  0.0 10527268   1688 ?     S    Sep20 0:00 xymond_history
    xymon    28009  0.0  1.5 12624884 122508 ?     S    Sep20 0:00 xymond_client
    xymon    28010  0.0  0.0   106848   2176 ?     S    Sep20 0:00 /bin/gawk -f /usr/libexec/xymon/vmware-monitord
    xymon    28011  0.0  0.0 10527252   1212 ?     S    Sep20 0:00 xymond_hostdata
    *xymon   28012  0.0  9.4 12680832 765216 ?     S    Sep20 0:08 xymond_rrd --rrddir=/var/lib/xymon/rrd*
    *xymon   28013  0.0 12.1 12689484 975908 ?     S    Sep20 0:12 xymond_rrd --rrddir=/var/lib/xymon/rrd*
    xymon    28014  0.0  0.1 10527512   9980 ?     S    Sep20 0:00 xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600

    I did one test migration, where all hosts (about 4000 hosts) ran
    on this system. So the directory /var/lib/xymon/rrd is quite
    huge. However, currently there is only one host (the Xymon server
    itself) running and it is testing one NetApp filer. So perhaps
    xymond_rrd and this large directory are somehow related. I will
    try the Accept environment, which I have installed by
    now. There are just a few files in /var/lib/xymon/rrd on this
    Accept system, and I will check next Monday how each system behaves.

    <So far an update; to be continued next week.>


    2016-09-21 13:18 GMT+02:00 Peter Welter <user-f55666bd0d1e@xymon.invalid>:

        Hi Japheth,

        Thanks for your response. I'm looking into this and will be
        back a.s.a.p. (a few days or so, since I just restarted Xymon ;-)

        Peter

        2016-09-20 19:07 GMT+02:00 Japheth Cleaver <user-87556346d4af@xymon.invalid>:

            On 9/20/2016 8:37 AM, Peter Welter wrote:

                Hi J.C.,

                First of all: Thanks for your work for Xymon!

                Second: I have a question about the repository from
                terabithia. I want to set up a Development, Test,
                Accept, Production environment using this
                repository. I have installed the first and am working
                on the next phase.

                Over time, however, I see that my Xymon server seems
                to eat all the available memory and starts swapping
                until all memory is consumed?!?

                This is for Development only and there aren't really
                any tests, just a very small hosts.cfg. So why does
                Xymon become this hungry for memory over time?

                Tue Sep 20 17:29:46 CEST 2016 - Memory CRITICAL

                       Memory          Used   Total  Percentage
                green  Real/Physical   7737M  7872M  98%
                yellow Actual/Virtual  7539M  7872M  95%
                red    Swap/Page       3886M  4095M  94%

                After a Xymon restart, all the swap is freed?

                I'm using Red Hat Enterprise Linux Server release 6.8
                (Santiago)

                Any suggestions what to do next? Thanks in advance
                for any help!

                Peter


            Hi Peter,

            I'm not aware of any memory leaks present in 4.3.27
            itself that would cause growth like that. Can you provide
            the ps output for the system's various xymon tools? Which
            process seems to be running out of control?

            -jc

