Xymon Mailing List Archive

Xymon client only reports once

3 messages in this thread

Mark Wagner · Mon, 13 Feb 2012 20:13:50 -0800
I've installed Xymon on my home network as a test for a possible installation at work.  It's working fine on three out of the four systems, but on the fourth, the client only reports its status to the server once, immediately after being started.

The problem computer is a Pentium MMX with 48MB RAM, running Gentoo Linux.

It looks as if the client is getting hung in the process of sending the second report: "ps aux" shows a sleeping "xymonlaunch" process, and the XYMONTMP directory contains a "xymon_vmstat" file with a timestamp five minutes after the successful update.

I could probably work around this with a cron job to restart the client every five minutes, but I'd rather fix it properly.  Any suggestions on what might be going wrong, or other things I could look at?

Thanks, Mark Wagner
Japheth Cleaver · Tue, 14 Feb 2012 11:59:13 -0800 (PST)
The vmstat file there sounds normal... Can you run xymonlaunch with
--debug and see what it's reporting back? Also, strace what it's doing
when the next expected run occurs?

For testing purposes you can bring the interval down to 30s or so. The
only change you should notice is having multiple backgrounded vmstat
processes going at once in a round-robin fashion.


-jc
Mark Wagner · Tue, 14 Feb 2012 23:16:33 -0800
Running xymonlaunch from the command line with the "--no-daemon" and "--debug" options, there's no output to the terminal.

clientlaunch.log:
2012-02-14 21:48:23 xymonlaunch starting
2012-02-14 21:48:23 Loading tasklist configuration from ./etc/clientlaunch.cfg
15337 2012-02-14 21:48:23 Opening file ./etc/clientlaunch.cfg
15337 2012-02-14 21:48:23
15337 2012-02-14 21:48:23 Starting tasklist scan
15337 2012-02-14 21:48:23 About to start task client
15338 2012-02-14 21:48:23 client -> Loading environment from /home/xymon/client/etc/xymonclient.cfg area
15338 2012-02-14 21:48:23 Opening file /home/xymon/client/etc/xymonclient.cfg
15338 2012-02-14 21:48:23 client -> Assigning stdout/stderr to log '/home/xymon/client/logs/xymonclient.log'
15337 2012-02-14 21:48:28
15337 2012-02-14 21:48:28 Starting tasklist scan
15337 2012-02-14 21:48:28 Task client active with PID 15338
15337 2012-02-14 21:48:32
15337 2012-02-14 21:48:32 Starting tasklist scan

The last two lines then repeat every five seconds until I kill the client.
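
For anyone who wants to check a longer debug log for the same symptom, here is a minimal Python sketch that pulls out the "About to start task" entries and reports whether a task is overdue. It assumes the "PID date time message" layout shown above; the function names and the interval argument are my own illustration, not anything Xymon provides.

```python
import re
from datetime import datetime

# "<pid> <date> <time> <message>", the layout of the debug entries above.
ENTRY = re.compile(r"^(\d+) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
START = re.compile(r"About to start task (\S+)")


def task_starts(lines):
    """Return (timestamp, task name) for every task launch found in the log."""
    starts = []
    for line in lines:
        entry = ENTRY.match(line.strip())
        if not entry:
            continue  # e.g. the first two log lines, which carry no PID
        start = START.search(entry.group(3))
        if start:
            when = datetime.strptime(entry.group(2), "%Y-%m-%d %H:%M:%S")
            starts.append((when, start.group(1)))
    return starts


def overdue(starts, task, now, interval_seconds):
    """True if `task` was last launched more than `interval_seconds` before `now`."""
    times = [when for when, name in starts if name == task]
    if not times:
        return True  # never launched at all
    return (now - max(times)).total_seconds() > interval_seconds
```

Fed the log above with a 300-second interval, the only launch it finds for "client" is the one at 21:48:23, so any later timestamp past 21:53:23 counts as overdue.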

xymonclient.log:
15338 2012-02-14 21:48:23 client -> Running '/home/xymon/client/bin/xymonclient.sh', XYMONHOME=/home/xymon/client

That one line is the only entry.

strace shows xymonlaunch forking off a new process (the "About to start task client" entry in clientlaunch.log).  The task client then execs xymonclient.sh, which gathers data, sends it off, and exits.  The main thread, meanwhile, has the following strace output repeating every five seconds with suitable changes to timestamps:

15337 21:56:03 wait4(-1, 0xbffff41c, WNOHANG, NULL) = -1 ECHILD (No child processes)
15337 21:56:03 time(NULL)               = 1329285363
15337 21:56:03 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
15337 21:56:03 getpid()                 = 15337
15337 21:56:03 write(1, "15337 2012-02-14 21:56:03 \n", 27) = 27
15337 21:56:03 time(NULL)               = 1329285363
15337 21:56:03 stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
15337 21:56:03 getpid()                 = 15337
15337 21:56:03 write(1, "15337 2012-02-14 21:56:03 Starting tasklist scan\n", 49) = 49
15337 21:56:03 time(NULL)               = 1329285363
15337 21:56:03 rt_sigprocmask(SIG_BLOCK, [CHLD], [RTMIN], 8) = 0
15337 21:56:03 rt_sigaction(SIGCHLD, NULL, {0x804a950, [], SA_RESTORER, 0x4005d6f8}, 8) = 0
15337 21:56:03 rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
15337 21:56:03 nanosleep({5, 0}, 0xbffff224) = 0

There's no change at the 30-second mark (when the client task should be gathering the next set of data), and the only action at the five-minute mark is vmstat waking up.
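
To make the pattern in the trace easier to reason about, here is a rough Python sketch of a launcher loop of this general shape: a non-blocking wait for children, a pass over the task list, then a sleep. It is not xymonlaunch's actual code. The task dictionary, the "due" rule, and the function names are all assumptions for illustration.

```python
import os


def reap_children():
    """Collect exited children without blocking and return their PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # ECHILD: this process has no children left to wait for
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def is_due(task, now):
    """Assumed rule: a task is due when it is not running and its interval has elapsed."""
    return task["pid"] is None and now >= task["last_start"] + task["interval"]


def scan(tasks, now, launch):
    """One pass over the task list; launch(task) must return the new PID."""
    finished = set(reap_children())
    for task in tasks:
        if task["pid"] in finished:
            task["pid"] = None  # the task has exited and may be started again
        if is_due(task, now):
            task["pid"] = launch(task)
            task["last_start"] = now
```

A real loop would call scan() and then sleep for five seconds, which corresponds to the nanosleep({5, 0}) in the trace. In a design like this, a task is restarted only if both conditions in is_due() hold, so those are the two things worth checking on the failing machine: whether the launcher still believes the task is running, and what it thinks the current time and last start time are.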

For comparison, running strace on a working system shows the main thread creating a new task client process right when it should.  One thing that may or may not be relevant: although the log output on both systems has the entry "Starting tasklist scan", the working client doesn't actually start stat()-ing "clientlaunch.cfg" until after the *second* successful run of the task client; the non-working system never does stat() it.

The strace logs from both machines are available if anyone thinks they might be useful in figuring out what's happening, but since they're about 2.5MB combined, I don't want to send them to the whole list.
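
Since the full logs are too large to post, a summary may be enough to compare the two machines. Below is a minimal Python sketch that counts system calls per PID and lists the calls that appear in one trace but not the other. It assumes the "PID HH:MM:SS syscall(...)" layout shown above; the function names are made up for this example.

```python
import re
from collections import Counter

# "<pid> <HH:MM:SS> <syscall>(" as in the trace lines above.
# Lines such as "<... wait4 resumed>" do not match and are skipped.
CALL = re.compile(r"^(\d+) \d{2}:\d{2}:\d{2} (\w+)\(")


def syscall_counts(lines, pid):
    """Count how often each system call was made by the given PID."""
    counts = Counter()
    for line in lines:
        call = CALL.match(line)
        if call and call.group(1) == str(pid):
            counts[call.group(2)] += 1
    return counts


def only_in_first(first, second):
    """Names of system calls present in `first` but absent from `second`."""
    return sorted(set(first) - set(second))
```

For example, only_in_first(syscall_counts(open("working.strace"), 1234), syscall_counts(open("broken.strace"), 15337)) would list the calls the working launcher makes that the broken one never does. Both file names and the PID 1234 are placeholders.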

-- 
Mark Wagner