Xymon Mailing List Archive

Xymon client doesn't clean up all of its children

10 messages in this thread

list Mark Felder · Thu, 26 Feb 2015 12:14:22 -0600 ·
~~ I can't verify on other OSes right now, so I'm hoping someone can
chime in ~~

On FreeBSD when I stop the Xymon client process it doesn't clean up all
of its children. Primarily you'll find that the vmstat command is not
sent a signal and continues to run ... indefinitely?

As a result, restarting the Xymon client leaves vmstat processes around
that really should not be there. I'm working around this right now, but
is it possible this is being seen on other platforms and it's a bug in
the client shutdown code?


Thanks!
list John Thurston · Thu, 26 Feb 2015 09:17:29 -0900 ·
quoted from Mark Felder
On 2/26/2015 9:14 AM, Mark Felder wrote:
~~ I can't verify on other OSes right now, so I'm hoping someone can
chime in ~~

On FreeBSD when I stop the Xymon client process it doesn't clean up all
of its children. Primarily you'll find that the vmstat command is not
sent a signal and continues to run ... indefinitely?
I have observed this behavior on Solaris, but the vmstat does eventually 
disappear. It does not run forever.

-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska
list Ryan Novosielski · Thu, 26 Feb 2015 13:44:53 -0500 ·
However, the result is that you can't do a restart on Solaris 10 with SMF if you are using vmstat. I have patched my scripts on Solaris to kill the child processes.

____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | user-46c89e614701@xymon.invalid - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
    `'
list Galen Johnson · Thu, 26 Feb 2015 18:49:42 +0000 ·
The vmstat is set to run for 5 minutes.  It is a forked process.  I see it on all unix systems that I've run Xymon on.  I usually ignore it or just kill it.  As John said, it doesn't run forever.

=G=
list John Thurston · Thu, 26 Feb 2015 09:55:47 -0900 ·
On 2/26/2015 9:44 AM, Novosielski, Ryan wrote:
However, the result is that you can't do a restart on Solaris 10 with
SMF if you are using vmstat. I have patched my scripts on Solaris to
kill the child processes.
Ahhh. I haven't run into this problem because I'm not trying to use SMF 
to control it. I use " ~/server/xymon.sh restart " if I want to restart 
it interactively as the non-priv'd user. As root, my init-script does 
about the same thing.

For future reference, how did you modify your manifest or scripts to 
meet your needs?
-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska
list Japheth Cleaver · Thu, 26 Feb 2015 11:33:54 -0800 ·
quoted from John Thurston
On Thu, February 26, 2015 10:44 am, Novosielski, Ryan wrote:
However, the result is that you can't do a restart on Solaris 10 with SMF
if you are using vmstat. I have patched my scripts on Solaris to kill the
child processes.
That's... interesting. vmstat (and anything of a similar nature) is indeed
being forked via nohup on a 5m timer. If it lasts more than 5m after the
last run of xymonclient.sh, there's definitely something wrong somewhere.
I wasn't aware that backgrounded processes like that could cause a problem
for Solaris under SMF though.

It does raise the question of three changes I'd considered committing,
but wanted to get differing perspectives on first (especially from
non-Linux OSes):


1) Pipe the vmstat command to a nohup'd shell instead of executing it
directly. I'm curious if it might help SMF cope a little better, but must
confess that the primary reason was simply to have a 'ps' output that
doesn't look quite as scary:

 5599 ?        S      0:00 /bin/sh
 5602 ?        S      0:00  \_ vmstat 300 2


2) Kill backgrounded vmstat (or any other processes) owned by the
configured user when given a 'stop' SysV script command, but *not* when
given a 'restart'.


3) Generally speaking, patch the startup scripts and default configuration
to simply run 'xymoncmd /path/to/xymonlaunch --log=/path/to/log/file'.
This was the path I took for putting in systemd compatibility, and it's
possible it might provide something simpler for other OSes' service
monitors to track too.


Long-time users of the Terabithia RPMs might notice that these three have
been in for a while, but going beyond RH-derived Linux distros is a bigger
step.
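
Change 2 above (reap background collectors on 'stop' but not on 'restart') could be sketched as a small dispatcher. This is only an illustration, not the actual Xymon or Terabithia init script: the function name plan_action and the step names are hypothetical, and it prints the planned steps instead of running anything.

```shell
#!/bin/sh
# Sketch only: plan_action and its step names are hypothetical.
# It prints which steps an init-script action would perform.
plan_action() {
    case "$1" in
        start)
            echo "start-client"
            ;;
        stop)
            # A plain stop also reaps backgrounded collectors such as vmstat.
            echo "stop-client reap-background"
            ;;
        restart)
            # A restart leaves the backgrounded vmstat alone; it exits
            # on its own once its sampling run completes.
            echo "stop-client start-client"
            ;;
        *)
            echo "usage: {start|stop|restart}" >&2
            return 2
            ;;
    esac
}
```

Calling `plan_action stop` prints `stop-client reap-background`, while `plan_action restart` prints `stop-client start-client`; a real script would replace the echo lines with the actual stop, start, and cleanup commands.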


What would you folks think?

-jc
list Mark Felder · Thu, 26 Feb 2015 14:10:24 -0600 ·

On Thu, Feb 26, 2015, at 13:33, J.C. Cleaver wrote:

What would you folks think?
I don't have much of an opinion. On FreeBSD I just pushed an update to
"pkill -U $xymon_client_user vmstat" so it cleans up on a stop/restart.

I swear I saw the vmstat live on longer than 5 minutes, but maybe I need
to do some more formal testing. It appears that the command is:

vmstat 300 2

but on FreeBSD that means it updates/prints new output every 300 seconds
and the 2 means two intervals, so I guess the max time it will run is 10
minutes / 600 seconds?
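
The pkill cleanup mentioned at the top of this message could be wrapped in a helper along these lines. This is a rough sketch, not the code from the FreeBSD port: the function name cleanup_vmstat and the DRY_RUN switch are hypothetical, and the -x flag (exact process-name match) is an addition that the original command does not use.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the FreeBSD port's rc script.
# With DRY_RUN=1 it prints the command instead of running it.
cleanup_vmstat() {
    user="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "pkill -U $user -x vmstat"
        return 0
    fi
    # -U matches the real user ID; -x (an addition here) requires an exact
    # process-name match so that only "vmstat" itself is signalled.
    pkill -U "$user" -x vmstat
    # pkill exits 1 when no process matched, which is fine for a cleanup step.
    [ "$?" -le 1 ]
}
```

Running `DRY_RUN=1 cleanup_vmstat xymon` shows the command that would be issued without signalling anything.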
list Japheth Cleaver · Thu, 26 Feb 2015 12:36:21 -0800 ·
quoted from Mark Felder
On Thu, February 26, 2015 12:10 pm, Mark Felder wrote:

On Thu, Feb 26, 2015, at 13:33, J.C. Cleaver wrote:

What would you folks think?
I don't have much of an opinion. On FreeBSD I just pushed an update to
"pkill -U $xymon_client_user vmstat" so it cleans up on a stop/restart.

I swear I saw the vmstat live on longer than 5 minutes, but maybe I need
to do some more formal testing. It appears that the command is:

vmstat 300 2

but on FreeBSD that means it updates/prints new output every 300 seconds
and the 2 means two intervals, so I guess the max time it will run is 10
minutes / 600 seconds?

Interesting. I guess that makes it similar to AIX in that regard.

http://lists.xymon.com/archive/2014-September/040192.html

We should clarify this on each supported OS.
On RHEL5, "vmstat 10 2" gives me:

-bash-3.2$ vmstat 10 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0    312 9461348 203472 84676768    0    0    34   319    0    0  4  1 94  0  0

(wait 10 seconds)

 1  0    312 9466564 203472 84677632    0    0    51  5290 3138 8928  3  2 95  0  0

(exit)


-jc
list Mark Felder · Thu, 26 Feb 2015 14:39:44 -0600 ·
quoted from Japheth Cleaver

On Thu, Feb 26, 2015, at 14:36, J.C. Cleaver wrote:
We should clarify this on each supported OS.
On RHEL5, "vmstat 10 2" gives me:

-bash-3.2$ vmstat 10 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0    312 9461348 203472 84676768    0    0    34   319    0    0  4  1 94  0  0

(wait 10 seconds)

 1  0    312 9466564 203472 84677632    0    0    51  5290 3138 8928  3  2 95  0  0

(exit)
Yes, on FreeBSD it is behaving exactly like that. I guess I described it
incorrectly. In order for it to hit 600 seconds it would have to have a
third interval... so it prints once immediately at 0 seconds, and at 300
seconds it prints a second time and exits.

My mistake!
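
The corrected timing can be written down as simple arithmetic: the first report is printed immediately and each further report follows one interval later, so the run lasts roughly (count - 1) * interval seconds. The sketch below assumes the behaviour observed in this thread on FreeBSD and RHEL5; the function name vmstat_runtime is hypothetical, and other platforms may behave differently.

```shell
#!/bin/sh
# Hypothetical helper: approximate run time, in seconds, of
# "vmstat INTERVAL COUNT", assuming the first report is printed at once
# and each later report follows INTERVAL seconds after the previous one.
vmstat_runtime() {
    interval="$1"
    count="$2"
    echo $(( (count - 1) * interval ))
}
```

For the client's `vmstat 300 2`, `vmstat_runtime 300 2` gives 300, which matches the 5-minute figure; reaching 600 seconds would take a count of 3.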
list Japheth Cleaver · Thu, 26 Feb 2015 12:41:39 -0800 ·
quoted from Mark Felder

On Thu, February 26, 2015 12:39 pm, Mark Felder wrote:
My mistake!
Ah, no worries! :)


-jc