Xymon Mailing List Archive

xymon hostdata module going rogue

list Japheth Cleaver
Wed, 10 Jun 2015 10:20:45 -0700
Message-Id: <user-d5d05c8f5a14@xymon.invalid>

On Wed, June 10, 2015 10:01 am, Scot Kreienkamp wrote:
Hi everyone,

I have a xymon server running 4.3.21 that seems to be accumulating
processes like these:

hobbit   28430  0.0  0.0      0     0 ?        Z    12:50   0:00 [xymond_hostdata] <defunct>
hobbit   28435  0.0  0.0      0     0 ?        Z    12:50   0:00 [xymond_hostdata] <defunct>
hobbit   28440  0.0  0.0      0     0 ?        Z    12:50   0:00 [xymond_hostdata] <defunct>
hobbit   28444  0.0  0.0      0     0 ?        Z    12:50   0:00 [xymond_hostdata] <defunct>
hobbit   28449  0.0  0.0      0     0 ?        Z    12:50   0:00 [xymond_hostdata] <defunct>
hobbit   28452  0.0  0.0      0     0 ?        Z    12:50   0:00 [xymond_hostdata] <defunct>

It seemed related to drop messages, so I did a test.


[hobbit@retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
161
[hobbit@retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
162
[hobbit@retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
163
[hobbit@retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
164
[hobbit@retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
165
[hobbit@retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
166
[hobbit@retv6100 temp]$ xymon 127.0.0.1 "drop amds7101_na_lzb_hq" ; ps auxw |grep xymond_hostdata |wc -l
167

So every time I send a drop message I get a defunct process hanging out.
Bug in Xymon?

This is on RHEL5, xymon 4.3.21.

Thanks!

Scot,


Some background: When doing a full drop on a host, xymond_hostdata (and
xymond_history, IIRC) forks to perform the recursive directory removal of
history files and whatnot in the background, then exits out. That's why it
corresponds to those events.


Looks like xymond_hostdata.c is missing a SIGCHLD handler registration, so
the forked children are never reaped after they exit, which is causing the
defunct processes to stack up. Strangely, I haven't observed this behavior
on RHEL6 at all, even though we're dropping hosts all the time. Odd.


The following patch should fix the issue for you, I believe.


Regards,

-jc
Attachments (1)