Xymon Mailing List Archive

Possible Memory Leak (?!) in Version Xymon 4.3.27-1.el6.terabithia

Peter Welter
Fri, 23 Sep 2016 13:38:36 +0200
Message-Id: <user-0a19b6ea1dd1@xymon.invalid>

Hi Japheth,

It is probably one process (xymond_rrd) that is very hungry for memory:

[xymon]# ps aux | egrep 'xymon|MEM'

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
xymon    16889  0.0  0.0   4176   604 ?        S    13:26   0:00 /bin/dash
xymon    16892  0.0  0.0   6272   660 ?        S    13:26   0:00 vmstat 300 2
xymon    16986  0.0  0.0   4176   600 ?        S    13:28   0:00 /bin/dash
xymon    16989  0.0  0.0   6272   664 ?        S    13:28   0:00 vmstat 300 2
xymon    17060  0.0  0.0   4176   604 ?        S    13:30   0:00 /bin/dash
xymon    17063  0.0  0.0   6272   664 ?        S    13:30   0:00 vmstat 300 2
xymon    17107  0.5  0.1 140340 10324 ?        S    13:31   0:00 /usr/bin/perl -w -I/home/bbtest/server/ext /etc/xymon/ext/netapp/netapp.pl
xymon    17110  0.2  0.1 142236 11108 ?        S    13:31   0:00 /usr/bin/perl -w -I/home/bbtest/server/ext /etc/xymon/ext/netapp/netapp.pl
xymon    17160  0.0  0.0 106120  1248 ?        S    13:31   0:00 sh -c /usr/bin/ssh -x -l xymon xxx.xxx.xxx.xxx "environment status" 2>&1
xymon    17161  0.0  0.0  60060  3440 ?        S    13:31   0:00 /usr/bin/ssh -x -l xymon 10.10.1.30 environment status
root     17163  0.0  0.0 103324   852 pts/1    S+   13:31   0:00 egrep xymon|MEM
xymon    27932  0.0  0.0  12648   592 ?        Ss   Sep20   0:05 /usr/sbin/xymonlaunch --log=/var/log/xymon/xymonlaunch.log
xymon    27992  0.0  0.1 25212804 8160 ?       S    Sep20   1:57 xymond --restart=/var/lib/xymon/tmp/xymond.chk --checkpoint-file=/var/lib/xymon/tmp/xymond.chk --checkpoint-interval=600 --admin-senders=127.0.0.1,132.229.61.140 --store-clientlogs=!msgs
xymon    27996  0.0  0.0 12624444 1452 ?       S    Sep20   0:00 xymond_channel --channel=stachg xymond_history
xymon    27997  0.0  0.0 12624444 1244 ?       S    Sep20   0:00 xymond_channel --channel=page xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600
xymon    27998  0.0  0.0 12624444 1340 ?       S    Sep20   0:00 xymond_channel --channel=client xymond_client
xymon    27999  0.0  0.0 12624860 4328 ?       S    Sep20   0:02 xymond_channel --channel=status xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon    28000  0.0  0.0 12625628 4712 ?       S    Sep20   0:00 xymond_channel --channel=data xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon    28001  0.0  0.0 12624444 1320 ?       S    Sep20   0:00 xymond_channel --channel=clichg xymond_hostdata
xymon    28007  0.0  0.0  41788  1168 ?        S    Sep20   0:00 xymond_channel --channel=user --log=/var/log/xymon/vmware-monitord.log vmware-monitord
xymon    28008  0.0  0.0 10527268 1688 ?       S    Sep20   0:00 xymond_history
xymon    28009  0.0  1.5 12624884 122508 ?     S    Sep20   0:00 xymond_client
xymon    28010  0.0  0.0 106848  2176 ?        S    Sep20   0:00 /bin/gawk -f /usr/libexec/xymon/vmware-monitord
xymon    28011  0.0  0.0 10527252 1212 ?       S    Sep20   0:00 xymond_hostdata
*xymon    28012  0.0  9.4 12680832 765216 ?     S    Sep20   0:08 xymond_rrd --rrddir=/var/lib/xymon/rrd*
*xymon    28013  0.0 12.1 12689484 975908 ?     S    Sep20   0:12 xymond_rrd --rrddir=/var/lib/xymon/rrd*
xymon    28014  0.0  0.1 10527512 9980 ?       S    Sep20   0:00 xymond_alert --checkpoint-file=/var/lib/xymon/tmp/alert.chk --checkpoint-interval=600

I did one test migration, where all hosts (about 4000 hosts) ran on this
system, so the directory /var/lib/xymon/rrd is quite large. However,
currently there is only one host (the Xymon server itself) running, and it
is testing one NetApp filer. So perhaps xymond_rrd and this large directory
are somehow related. I will try the same thing on the Accept environment,
which I have installed by now. There are just a few files in
/var/lib/xymon/rrd on this Accept system, and I will check next Monday how
each system behaves.
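
To make the comparison between the two systems easier, the ps listing can be
summarised per program. The following is only a minimal Python sketch, not
part of Xymon: the function names parse_ps_aux and rss_by_program are made up
for illustration, the sample rows are copied from the listing above, and it
assumes unwrapped `ps aux` output with one process per line and RSS reported
in KiB.

```python
from collections import defaultdict


def parse_ps_aux(text):
    """Return a list of (pid, rss_kib, command) tuples from `ps aux` text."""
    rows = []
    for line in text.splitlines():
        # COMMAND is the 11th column and may itself contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and blank or wrapped lines
        rows.append((int(fields[1]), int(fields[5]), fields[10]))
    return rows


def rss_by_program(rows):
    """Sum RSS (KiB) per program name, largest consumer first."""
    totals = defaultdict(int)
    for _pid, rss_kib, command in rows:
        program = command.split()[0]  # first word of the command line
        totals[program] += rss_kib
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Sample rows copied from the ps output in this message.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
xymon    27992  0.0  0.1 25212804 8160 ?       S    Sep20   1:57 xymond --restart=/var/lib/xymon/tmp/xymond.chk
xymon    28009  0.0  1.5 12624884 122508 ?     S    Sep20   0:00 xymond_client
xymon    28012  0.0  9.4 12680832 765216 ?     S    Sep20   0:08 xymond_rrd --rrddir=/var/lib/xymon/rrd
xymon    28013  0.0 12.1 12689484 975908 ?     S    Sep20   0:12 xymond_rrd --rrddir=/var/lib/xymon/rrd
"""

if __name__ == "__main__":
    for program, rss_kib in rss_by_program(parse_ps_aux(SAMPLE)):
        print(f"{program:<16} {rss_kib / 1024:8.1f} MiB")
```

Running the same summary against saved ps output from both systems at
intervals should show whether only xymond_rrd keeps growing.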

<That is the update so far; to be continued next week.>


2016-09-21 13:18 GMT+02:00 Peter Welter <user-f55666bd0d1e@xymon.invalid>:
Hi Japheth,

Thanks for your response. I'm looking into this and will be back a.s.a.p.
(a few days or so, since I just restarted Xymon ;-)

Peter

2016-09-20 19:07 GMT+02:00 Japheth Cleaver <user-87556346d4af@xymon.invalid>:
On 9/20/2016 8:37 AM, Peter Welter wrote:
Hi J.C.,

First of all: Thanks for your work for Xymon!

Second: I have a question about the repository from terabithia. I want
to install Development, Test, Accept and Production environments using
this repository. I have installed the first and am working on the next phase.

Over time however, I see that my Xymon-server seems to eat all the
memory available and starts swapping until all memory is consumed?!?

This is for Development only and there are not really any tests, just a very
small hosts.cfg. So why does Xymon become this hungry for memory over time?

Tue Sep 20 17:29:46 CEST 2016 - Memory CRITICAL

          Memory                  Used       Total  Percentage
green     Real/Physical          7737M       7872M         98%
yellow    Actual/Virtual         7539M       7872M         95%
red       Swap/Page              3886M       4095M         94%

After a Xymon restart, all the swap is freed?

I'm using Red Hat Enterprise Linux Server release 6.8 (Santiago)

Any suggestions what to do next? Thanks in advance for any help!

Peter
Hi Peter,

I'm not aware of any memory leaks present in 4.3.27 itself that would
cause growth like that. Can you provide the ps output for the system's
various xymon tools? Which process seems to be running out of control?

-jc