Checking process longevity

Colin Coe
Tue, 5 Feb 2008 13:47:07 +0900
Message-Id: <user-f9aa6602da84@xymon.invalid>

-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid] Sent: Monday, 4 February 2008 6:18 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Checking process longevity

On Mon, Feb 04, 2008 at 08:20:42AM +0900, Coe, Colin C. (Unix Engineer) wrote:
I'm trying to work out how I can get hobbit to alert me when a process
has existed for more than n seconds.  This is important as we sometimes
have NFS problems that cause processes such as df to hang due to stale
mounts and I'd like to pick these up sooner rather than later.

Wouldn't it be easier to just scan the logfile for NFS timeout errors?

Hobbit doesn't track the lifetime of a process, and I would think this
would be very bothersome to set up because you'd have to exclude
long-living daemon processes.


Regards,
Henrik

By default, under RHEL (most of) the files under /var/log are owned by,
and only readable by, root.  I'm still deciding whether or not to allow
hobbit to read the log files.  I do think that there are other cases
where monitoring how long a process exists is useful.

I was thinking that this could be done by adding a new flag to 'PROC' in
hobbit-clients.cfg.  Something like:

PROC processname minimumcount maximumcount color [TRACK=id] [TEXT=text]
[RUNTIME=seconds]

Example: alert if a 'df' process has existed for more than 60 seconds:

HOST foo
	PROC df RUNTIME=60
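
To make the intended behaviour concrete, here is a rough sketch of the check that the proposed RUNTIME flag would perform. It is not Hobbit code, and RUNTIME is only the flag proposed above, not an existing option. The sketch assumes the process list comes from `ps -eo pid,etime,comm`, where etime has the form [[dd-]hh:]mm:ss; the function names etime_to_seconds and long_running are made up for illustration.

```python
def etime_to_seconds(etime):
    """Convert a ps 'etime' value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    rest = etime.strip()
    if "-" in rest:
        day_part, rest = rest.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in rest.split(":")]
    # Pad missing leading fields (hours) with zero.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def long_running(ps_output, name, max_seconds):
    """Return (pid, age) for each process called `name` older than max_seconds.

    ps_output is the text printed by `ps -eo pid,etime,comm`,
    including its header line.
    """
    hits = []
    for line in ps_output.splitlines()[1:]:  # skip the header
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, etime, comm = fields
        if comm.strip() != name:
            continue
        age = etime_to_seconds(etime)
        if age > max_seconds:
            hits.append((int(pid), age))
    return hits


# Hypothetical ps output, for illustration only.
SAMPLE = """  PID     ELAPSED COMMAND
    1 12-03:04:05 init
 4242       02:15 df
 4300       00:10 df
"""

if __name__ == "__main__":
    for pid, age in long_running(SAMPLE, "df", 60):
        print("df pid %d has been running for %d seconds" % (pid, age))
```

Because the check only looks at the process name given in the rule, long-living daemons are never considered unless someone writes a rule for them.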

I started hacking but my C fu is weak.

CC

