Xymon Mailing List Archive

Could not fork checkpoint child:Cannot allocate memory

3 messages in this thread

Raul GN · Mon, 21 Jul 2014 12:24:08 +0200
After upgrading Xymon from version 4.3.12 to 4.3.17, the xymond daemon's
memory usage grows without limit. After two or three days, these messages
appear in the logs:

2014-07-17 16:27:46 Setup complete
2014-07-18 04:16:49 Flapping detected for web.int:http - 10 changes in 868 seconds
2014-07-18 04:16:49 Flapping detected for web.int:tomcat - 10 changes in 868 seconds
2014-07-18 04:18:23 Flapping detected for web.int:http - 10 changes in 892 seconds
2014-07-18 04:18:23 Flapping detected for web.int:tomcat - 10 changes in 892 seconds
2014-07-18 18:25:44 Flapping detected for web.int:http - 10 changes in 808 seconds
2014-07-18 18:25:44 Flapping detected for web.int:tomcat - 10 changes in 808 seconds
2014-07-18 23:40:53 Could not fork checkpoint child:Cannot allocate memory
2014-07-18 23:50:54 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:00:55 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:10:56 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:20:57 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:30:58 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:40:59 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 00:51:00 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:01:01 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:11:02 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:21:03 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:31:04 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:41:05 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 01:51:06 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 02:01:07 Could not fork checkpoint child:Cannot allocate memory
2014-07-19 02:11:08 Could not fork checkpoint child:Cannot allocate memory

Does anybody know how to avoid this problem? If I don't restart Xymon, it crashes.
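
The fork failures in the log above recur roughly every ten minutes. A minimal Python sketch for pulling those timestamps out of a log and measuring the spacing, assuming the "YYYY-MM-DD HH:MM:SS message" layout shown in the excerpt; the function name fork_failure_intervals is made up for illustration:

```python
from datetime import datetime

# Message text copied from the log excerpt.
FORK_ERROR = "Could not fork checkpoint child"

def fork_failure_intervals(lines):
    """Return the seconds elapsed between consecutive fork-failure log lines."""
    stamps = []
    for line in lines:
        if FORK_ERROR not in line:
            continue
        # The first 19 characters hold the timestamp, e.g. "2014-07-18 23:40:53".
        stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

sample = [
    "2014-07-18 18:25:44 Flapping detected for web.int:tomcat - 10 changes in 808 seconds",
    "2014-07-18 23:40:53 Could not fork checkpoint child:Cannot allocate memory",
    "2014-07-18 23:50:54 Could not fork checkpoint child:Cannot allocate memory",
    "2014-07-19 00:00:55 Could not fork checkpoint child:Cannot allocate memory",
]
print(fork_failure_intervals(sample))  # prints [601, 601]
```

A spacing of about 601 seconds is consistent with the --checkpoint-interval=600 option on the xymond command line posted later in the thread, which suggests every checkpoint attempt fails once the daemon has grown too large to fork.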
Japheth Cleaver · Mon, 21 Jul 2014 10:12:23 -0700
Hmm. If it's growing truly without limit there's something unusual going
on; I'd take the memory allocation error later on at face value.

Can you provide any additional details? Do you have an unusual workload or
ulimits on the xymon user? Or a large number of host inserts/removals?
What OS are you running?


Regards,

-jc
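
One way to put numbers on the growth is to sample the daemon's resident memory at intervals. The following is only a sketch, assuming a Linux host where /proc/<pid>/status is available; the helper names parse_vmrss_kb and read_vmrss_kb are hypothetical and not part of Xymon:

```python
import re

def parse_vmrss_kb(status_text):
    """Extract the resident set size in kB from /proc/<pid>/status contents."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def read_vmrss_kb(pid):
    """Read VmRSS for a running process; Linux-only, since it relies on /proc."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kb(handle.read())

# Example with a hand-written status excerpt (the numbers are invented).
example = "Name:\txymond\nVmPeak:\t  812345 kB\nVmRSS:\t  654321 kB\nThreads:\t1\n"
print(parse_vmrss_kb(example))  # prints 654321
```

Calling read_vmrss_kb() periodically with the PID stored in the xymond pidfile, for example from cron, and logging the result would show how quickly the process grows between restarts.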
Raul GN · Tue, 22 Jul 2014 10:09:49 +0200
Yes, I can provide all the information you need. Xymon was upgraded on 15
July, and this is the memory graph:

[image: Inline image 1]

Xymon is installed on Debian 6.0.9.

This is our ulimit configuration. We didn't change it, so these should be
the default values:
root: # ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63975
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63975
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

xymon: $ ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        8192
coredump(blocks)     0
memory(kbytes)       unlimited
locked memory(kbytes) 64
process              63975
nofiles              1024
vmemory(kbytes)      unlimited
locks                unlimited
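
Note that ulimit -a reports the limits of the shell it runs in, which may differ from those of the already-running daemon. On Linux, the limits of a live process can be read from /proc/<pid>/limits. The sketch below parses that file; the helper names are hypothetical and the sample text is hand-written for illustration, not taken from this server:

```python
import re

def parse_limits(limits_text):
    """Map each limit name to its (soft, hard) values from /proc/<pid>/limits text."""
    limits = {}
    for line in limits_text.splitlines()[1:]:  # skip the column-header row
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits

def read_limits(pid):
    """Read the limits of a running process; Linux-only, since it relies on /proc."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_limits(handle.read())

# Hand-written sample in the layout of /proc/<pid>/limits (values are illustrative).
example = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max open files            1024                 1024                 files\n"
    "Max address space         unlimited            unlimited            bytes\n"
)
print(parse_limits(example)["Max address space"])
```

Running read_limits() against the xymond PID would confirm whether the daemon really has an unlimited address space, as the shell output suggests.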

We monitor 1200 hosts and don't add or remove many of them, maybe 20 hosts
a week:


[image: Inline image 2]

The xymond test shows:
Statistics for Xymon daemon
Version: 4.3.17
Up since 21-Jul-2014 11:37:37 (0 days, 22:00:00)

Incoming messages      :    4335475
- status               :    2251363
- combo                :      24050
- extcombo             :     104869
- page                 :        160
- summary              :          0
- data                 :    1354532
- client               :      92341
- notes                :          0
- enable               :          0
- disable              :          0
- ack                  :          2
- config               :       5968
- query                :       4471
- xymondboard          :     146651
- xymondlog            :     341645
- drop                 :          3
- rename               :          0
- dummy                :        328
- ping                 :          0
- notify               :          0
- schedule             :        298
- download             :          0
- Bogus/Timeouts       :       8794
Incoming messages/sec  :         52 (average last 300 seconds)

status channel messages:    2244782 (1 readers)
stachg channel messages:      10913 (1 readers)
page   channel messages:      49720 (1 readers)
data   channel messages:    1349077 (1 readers)
notes  channel messages:          0 (0 readers)
enadis channel messages:          0 (0 readers)
client channel messages:      90465 (1 readers)
clichg channel messages:        360 (1 readers)
user   channel messages:          0 (0 readers)
backfeed messages      :          0

Ghost reports:
  10.6.71.66      reported host
  10.6.42.103     reported host 10.6.42.10
  10.6.42.103     reported host 10.6.42.11
  10.6.42.103     reported host 10.6.42.12
  10.6.42.103     reported host 10.6.42.13
  10.6.42.103     reported host 10.6.42.14
  10.6.42.103     reported host 10.6.42.15
  10.6.42.103     reported host 10.6.42.4
  10.6.42.103     reported host 10.6.42.5
  10.1.0.194      reported host ePagess_app_3
  10.1.0.86       reported host IIS6_6_1
  10.1.0.87       reported host IIS6_6_2
  10.1.0.194      reported host main0404


Multi-source statuses
  admin01:conn              reported by 10.6.42.103 and 10.0.0.29


And the xymond section in tasks.cfg:

[xymond]
    ENVFILE /usr/lib/xymon/server/etc/xymonserver.cfg
    CMD xymond --pidfile=$XYMONSERVERLOGS/xymond.pid \
        --restart=$XYMONTMP/xymond.chk --checkpoint-file=$XYMONTMP/xymond.chk --checkpoint-interval=600 \
        --log=$XYMONSERVERLOGS/xymond.log \
        --admin-senders=127.0.0.1,$XYMONSERVERIP \
        --store-clientlogs=!msgs \
        --maint-senders=127.0.0.1,$XYMONSERVERIP \
        --www-senders=127.0.0.1,10.0.0.0/24,10.6.42.103 \
        --flap-count=10 \
        --flap-seconds=900


We also changed some MAX variables in xymonserver.cfg:
MAXLINE="32768"
MAXMSG_DATA="5242880"
MAXMSG_CLIENT="5242880"
MAXMSG_STATUS="5242880"

Thank you for your help.


Attachments (1)