Xymon Mailing List Archive

Monitoring process with regex - fails with 4.3.12

3 messages in this thread

John Horne · Wed, 31 Jul 2013 17:20:12 +0100
Hello,

I have just upgraded some client servers from Xymon 4.3.10 to 4.3.12.
Two clients now continuously report that a specific process (clamd) is
not running. (I have checked both servers and the process is running.)

In the monitoring server analysis.cfg file I have (for both clients):

	PROC    "% clamd$" TEXT=clamd

This worked fine for 4.3.10, but now seems to fail. We have some other
process strings containing 'clamd', so we had to use the above to ensure
that we were just monitoring the main 'clamd' process.

However, the 4.3.12 'Changes' file does mention:

	* Linux clients now align "ps" output so it is more readable.

I'm not sure if that is relevant. I'll see if I can test things a bit
further to see what is happening.


John.

-- 
John Horne, Plymouth University, UK
Tel: +XX (X)XXXX XXXXXX    Fax: +XX (X)XXXX XXXXXX
John Horne · Wed, 31 Jul 2013 21:23:21 +0100
Well, the code for regular expressions started to get a bit complicated,
so I thought I would check which regexes worked and which didn't.

The 'ps' output the Xymon server receives (taken from a data/hostdata
file) is, as an example:

  6810  1 clam   Jul 11 S  21  0.1  00:40:42 11.6 49000 493632 clamd

As can be seen, the 'clamd' command is just that: no pathname and no
command options. It is preceded by a space, which is why we used the
above regex.
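To show why the leading space was needed, here is a minimal Python sketch that compares a plain substring test, 'clamd$', and ' clamd$' against the sample line above plus two other lines that also contain "clamd". It assumes the 4.3.10 setup matched the pattern against the whole 'ps' line, and it uses Python's re module as a stand-in for Xymon's own regex matching. The two extra lines are hypothetical, not taken from our hosts.

```python
import re

# Sample line from the data/hostdata file quoted above.
CLAMD_LINE = "  6810  1 clam   Jul 11 S  21  0.1  00:40:42 11.6 49000 493632 clamd"

# Hypothetical lines that also contain "clamd" (illustration only, not from our hosts).
OTHER_LINES = [
    "  7001  1 root   Jul 11 S  21  0.0  00:00:03  0.2  1200  15000 /usr/sbin/clamd-milter -c /etc/milter.conf",
    "  7002  1 root   Jul 11 S  21  0.0  00:00:01  0.1   900  12000 perl /usr/local/bin/check_clamd",
]
ALL_LINES = [CLAMD_LINE] + OTHER_LINES


def substring_hits(lines, text):
    """Lines that a plain (non-regex) substring test would accept."""
    return [line for line in lines if text in line]


def regex_hits(lines, pattern):
    """Lines where the regex matches anywhere (unanchored search)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


print(len(substring_hits(ALL_LINES, "clamd")))  # 3: every line contains "clamd"
print(len(regex_hits(ALL_LINES, "clamd$")))     # 2: main clamd and check_clamd
print(len(regex_hits(ALL_LINES, " clamd$")))    # 1: only the main clamd process
```

Without the space, anything whose command merely ends in "clamd" would also be counted.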

For the Xymon 4.3.12 clients I found that using '%^clamd$' worked, and
that '% clamd$' did not (note there is a space between the '%' and 'c'
characters).

For the Xymon 4.3.10 clients I found the opposite. '%^clamd$' did not
work, but '% clamd$' did.

I am, however, confused. As far as I was aware, the procs processing is
done on the Xymon server, not the client. Our main Xymon server was
updated to 4.3.12 yesterday, and all four of our clamd clients showed no
errors in the procs column. Today I updated two clients to 4.3.12 and
started to get the errors about 'clamd' not running.

So, in effect, we have two 4.3.10 clients using the old regex (' clamd$'),
and that works. And we have two 4.3.12 clients using the new regex
('^clamd$'), and they too work. But if this is all processed on the
server, then I would expect two of the clients to be reporting errors.
Since that is not happening, I can only assume that the client is either
doing the processing or communicating something to the main server.

As to what has changed, I can only assume that the 'ps' processing of
the client data, when using a regex, works only on the actual command
being run and not the whole line from the 'ps' output. Hence '^clamd$'
now works (for 4.3.12) and ' clamd$' does not.
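The guess above can be checked with a small Python sketch that tries both patterns against the full 'ps' line and against the command on its own. The assumption, which is not confirmed from the Xymon source, is that 4.3.10 matches against the whole line and 4.3.12 against only the command column. COMMAND_ONLY is simply what that column would look like once isolated, and Python's re stands in for Xymon's regex engine.

```python
import re

FULL_LINE = "  6810  1 clam   Jul 11 S  21  0.1  00:40:42 11.6 49000 493632 clamd"
# Assumed: what the pattern sees if only the command column is matched.
COMMAND_ONLY = "clamd"

PATTERNS = [" clamd$", "^clamd$"]


def matches(pattern, subject):
    """True if the regex matches anywhere in the subject (unanchored search)."""
    return re.search(pattern, subject) is not None


for pattern in PATTERNS:
    print(f"{pattern!r:12} full line: {matches(pattern, FULL_LINE)!s:5}  "
          f"command only: {matches(pattern, COMMAND_ONLY)}")
```

If the assumption holds, ' clamd$' is True only for the full line (the line starts with spaces, so '^clamd$' fails there), and '^clamd$' is True only for the command on its own. That is the same split we see between the 4.3.10 and 4.3.12 clients.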


John.

Japheth Cleaver · Wed, 31 Jul 2013 21:32:56 -0000 (UTC)
I'm seeing the same thing on my systems, interestingly. We'd never used
prepended spaces in our CMD regexes, so I hadn't seen it before.

I suspect this isn't actually caused by the ps-align patch, as AFAICT all
it changes is the amount of padding; the padding is still composed of
spaces only.
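A quick Python sketch of that reasoning: if alignment only changes how many spaces sit between columns, ' clamd$' should still match the full line at any padding width. The render() helper is hypothetical, a crude stand-in for column alignment rather than what the ps-align patch actually does. It reuses the columns from John's sample line, and Python's re stands in for Xymon's matching.

```python
import re

# Columns of the sample line quoted earlier in the thread.
FIELDS = ["6810", "1", "clam", "Jul 11", "S", "21", "0.1",
          "00:40:42", "11.6", "49000", "493632", "clamd"]

PATTERN = re.compile(r" clamd$")


def render(fields, pad):
    """Crude stand-in for column alignment: join columns with `pad` spaces."""
    return (" " * pad).join(fields)


for pad in (1, 2, 4, 8):
    # Any amount of space padding still leaves a space right before "clamd".
    print(pad, PATTERN.search(render(FIELDS, pad)) is not None)

# Only a non-space separator (here a tab) would break the old pattern.
print("tab", PATTERN.search("\t".join(FIELDS)) is not None)
```

Every space-padded variant matches, so padding alone shouldn't explain the failure.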

IIRC there was a different tweak earlier in the dev cycle that involved
trying to ensure that the command was being isolated properly in the first
place, but I can't seem to find the check-in at the moment.
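Until it's clear which behaviour is intended, one option might be a pattern that matches either way, such as "clamd" at the end preceded by either the start of the subject or a single space. This is an untested suggestion, not something from the thread. It assumes Xymon passes the pattern to its regex library unchanged, so a hypothetical analysis.cfg entry would be PROC "%(^| )clamd$" TEXT=clamd. The Python sketch below only checks the pattern logic with Python's re. The "other process" string is hypothetical.

```python
import re

# Hypothetical portable pattern: "clamd" at the end, preceded by either the
# start of the subject or a single space.
PORTABLE = re.compile(r"(^| )clamd$")

SUBJECTS = {
    "full ps line (4.3.10-style)": "  6810  1 clam   Jul 11 S  21  0.1  00:40:42 11.6 49000 493632 clamd",
    "command only (4.3.12-style, assumed)": "clamd",
    "hypothetical other process": "perl /usr/local/bin/check_clamd",
}

for label, subject in SUBJECTS.items():
    print(f"{label}: {PORTABLE.search(subject) is not None}")
```

The first two subjects match and the third does not, so the same entry should work whichever way the command is isolated. Check it on a test host before relying on it.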


Regards,

-jc