Xymon Mailing List Archive

Xymon PROC check fails on un-aligned ps(1) output

3 messages in this thread

Christoph Schug · Mon, 01 Aug 2011 20:32:10 +0200
I have a question regarding Xymon 4.3.3 (running on CentOS
5.6/x86_64). In order to monitor the existence of certain processes like
rsyslogd(8), I have the following process rule defined in analysis.cfg:

CLASS=linux
    PROC     "%^/sbin/rsyslogd -m 0$"

This works fine as long as the columns in the output of ps(1) (more
specifically, of
“ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd”
as defined in xymonclient-linux.sh) are all nicely aligned.

  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD
[...]
 4620  4607 68         Jun 17 S  22  0.0 00:00:00  0.0   860  12348 hald-addon-keyboard: listening on /dev/input/event0
 4709     1 root       Jun 17 S  17  0.0 00:00:00  0.0   496   8540 /usr/bin/hidd --server
 4739     1 root       Jun 17 S  21  0.0 00:11:14  0.0  3576 300132 /sbin/rsyslogd -m 0
 6894     1 root       Jun 17 S  18  0.0 00:00:00  0.0  1540 122008 automount
 6918     1 root       Jun 17 S  24  0.0 00:00:08  0.0  1224  63544 /usr/sbin/sshd

The trouble starts when the process in question runs long enough (as seen
on a different machine) that its value no longer fits the width reserved
for that field, which shifts the rest of the row out of alignment. Process
runtime is just one example; I suppose any value that grows too large for
its reserved width would trigger this behavior:

  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD
[...]
 5377     1 root       May 24 S  21  0.0 00:00:00  0.0   444   3816 /sbin/mingetty tty4
 5378     1 root       May 24 S  20  0.0 00:00:00  0.0   444   3816 /sbin/mingetty tty5
 5380     1 root       May 24 S  19  0.0 00:00:00  0.0   444   3816 /sbin/mingetty tty6
 5382     1 root       May 24 S  22  0.0 00:00:00  0.0   496   3824 /sbin/agetty 9600 ttyS1 vt100
 8734     1 root       Jun 20 S  21  7.7 3-06:51:29  0.1 48640 292664 /sbin/rsyslogd -m 0
20468   262 root       Jul 19 S  24  0.0 00:04:01  0.0     0      0 [pdflush]
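To make the shift concrete, here is a minimal Python sketch using the two rsyslogd rows above (the CMD text, which the archive wrapped onto its own line, is rejoined with the single space ps(1) prints). It assumes, as the last message in this thread concludes, that the CMD value is cut from each row at the character offset where the "CMD" label sits in the header. It is an illustration of that assumption, not Xymon's actual parser.

```python
import re

# Header and two rows copied from the ps(1) output in this thread.
HEADER = "  PID  PPID USER      STARTED S PRI %CPU     TIME %MEM   RSZ    VSZ CMD"
ALIGNED = (" 4739     1 root       Jun 17 S  21  0.0 00:11:14  0.0  3576 300132"
           " /sbin/rsyslogd -m 0")
SHIFTED = (" 8734     1 root       Jun 20 S  21  7.7 3-06:51:29  0.1 48640 292664"
           " /sbin/rsyslogd -m 0")

# Assumption: the command is taken from the column where "CMD" starts in the header.
cmd_col = HEADER.index("CMD")

# The two PROC patterns from analysis.cfg, without their leading '%'.
strict = re.compile(r"^/sbin/rsyslogd -m 0$")
digits = re.compile(r"^[0-9]+ /sbin/rsyslogd -m 0$")

for row in (ALIGNED, SHIFTED):
    cmd = row[cmd_col:]
    print(repr(cmd), "strict:", bool(strict.search(cmd)), "digits:", bool(digits.search(cmd)))
```

For the aligned row the slice is '/sbin/rsyslogd -m 0' and only the strict pattern matches; for the shifted row the 10-character TIME value pushes everything two columns right, the slice becomes '4 /sbin/rsyslogd -m 0', and only the digits pattern matches.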

In this case the above regex no longer seems to match, because
(apparently) the matching starts at some fixed column. Just for fun, and
to double-check, I extended the rule set with another rule:

CLASS=linux
    PROC     "%^/sbin/rsyslogd -m 0$"
    PROC     "%^[0-9]+ /sbin/rsyslogd -m 0$"

After doing so, the first rule indeed still fails while the second rule
matches. So apparently the last digit of the VSZ field of rsyslogd(8)
sneaks into the CMD field and gets matched by the PROC check. Is this a
known bug, and if so, is there a good workaround apart from invoking a
wrapper script in xymonclient-linux.sh that mangles the output of ps(1)
accordingly?

Thanks in advance!
-cs
Christoph Schug · Mon, 01 Aug 2011 21:07:06 +0200
On Mon, 01 Aug 2011 20:32:10 +0200, Christoph Schug <user-cd3e90ddf801@xymon.invalid> wrote:
I have a question regarding Xymon 4.3.3 (running on CentOS
5.6/x86_64). In order to monitor the existence of certain processes like
rsyslogd(8), I have the following process rule defined in analysis.cfg:

CLASS=linux
    PROC     "%^/sbin/rsyslogd -m 0$"
[...]

I was asked off the list (thanks, but honestly I think we all benefit
more if the discussion stays on the list):

"Why not just dispense with the '^'? That way the RE will match regardless
of where it starts. "

I'd like the matching for all my processes to be as exact as possible.
rsyslogd(8) is just an example; the same applies, for example, to shell
scripts that run for a very long time or as daemons. So I would rather use

     PROC     "%^/foo/bar$"

instead of just

     PROC     "/foo/bar"

or a somewhat relaxed regex, because otherwise a local user might look at
the script using more(1), and I don't want the process monitoring to match
things like "more /foo/bar". That would report wrong process counts, or
might even report the check as GREEN while the instance that is supposed
to be running no longer is.
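The difference between the two rules can be sketched in a few lines of Python. This assumes, as the message implies, that a plain PROC pattern such as "/foo/bar" matches anywhere in the command line, while "%^/foo/bar$" must match the whole CMD value; "/foo/bar" and the pager command line are hypothetical examples.

```python
import re

# Hypothetical CMD values: the monitored script itself, and someone paging it.
commands = ["/foo/bar", "more /foo/bar"]

anchored = re.compile(r"^/foo/bar$")  # corresponds to PROC "%^/foo/bar$"

def plain_match(cmd):
    # Assumption: a plain PROC "/foo/bar" behaves like a substring test.
    return "/foo/bar" in cmd

for cmd in commands:
    print(f"{cmd!r}: anchored={bool(anchored.search(cmd))}, plain={plain_match(cmd)}")
```

With the plain pattern both command lines would be counted, so a pager left open on the script could keep the check GREEN after the real process has died; the anchored pattern only counts the first one.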

-cs
Christoph Schug · Tue, 02 Aug 2011 13:43:53 +0200
Okay, I looked more closely at the source. Apparently the column positions
of the fields in the ps(1) output are determined automatically from the
positions of the labels in the header line. So, just to let you know, the
following patch worked for my setup:

--- xymon-4.3.4/client/xymonclient-linux.sh.orig        2011-07-31 23:01:52.000000000 +0200
+++ xymon-4.3.4/client/xymonclient-linux.sh     2011-08-02 13:40:14.000000000 +0200
@@ -68,7 +68,7 @@
 # Report mdstat data if it exists
 if test -r /proc/mdstat; then echo "[mdstat]"; cat /proc/mdstat; fi
 echo "[ps]"
-ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd
+ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time:12,pmem,rsz:10,vsz:10,cmd

 # $TOP must be set, the install utility should do that for us if it exists.
 if test "$TOP" != ""
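Why widening the columns helps can be modelled in Python. This is a rough sketch, not Xymon's code, under these assumptions: every field is right-aligned to a minimum width (the default widths are read off the header shown earlier in the thread; real ps(1) left-aligns USER, which does not change the CMD offset), a longer value is printed in full and pushes later columns right, and the CMD value is cut at the header's "CMD" offset. The helper names `render` and `cmd_as_parsed` are hypothetical.

```python
FIELDS = ["PID", "PPID", "USER", "STARTED", "S", "PRI", "%CPU", "TIME", "%MEM", "RSZ", "VSZ"]
ROW = ["8734", "1", "root", "Jun 20", "S", "21", "7.7", "3-06:51:29", "0.1", "48640", "292664"]
CMD = "/sbin/rsyslogd -m 0"

# Minimum widths as they appear in the default header from the thread.
DEFAULT = dict(zip(FIELDS, [5, 5, 8, 8, 1, 3, 4, 8, 4, 5, 6]))
# The same widths with the overrides from the patch: time:12, rsz:10, vsz:10.
PATCHED = {**DEFAULT, "TIME": 12, "RSZ": 10, "VSZ": 10}

def render(values, widths):
    """Right-align each value to at least its width, one space between columns."""
    return " ".join(f"{v:>{widths[name]}}" for name, v in zip(FIELDS, values))

def cmd_as_parsed(widths):
    """Cut the data row at the offset of "CMD" in the matching header."""
    header = render(FIELDS, widths) + " CMD"
    row = render(ROW, widths) + " " + CMD
    return row[header.index("CMD"):]

print(repr(cmd_as_parsed(DEFAULT)))  # '4 /sbin/rsyslogd -m 0'
print(repr(cmd_as_parsed(PATCHED)))  # '/sbin/rsyslogd -m 0'
```

The widened columns only postpone the problem: a TIME value longer than 12 characters, or an RSZ/VSZ value longer than 10, would shift the row again, but the header moves along with the widths, so typical values stay aligned.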

Cheers
-cs