Checking process longevity
list Colin Coe
Hi all,

I'm trying to work out how I can get hobbit to alert me when a process has existed for more than n seconds. This is important as we sometimes have NFS problems that cause processes such as df to hang due to stale mounts, and I'd like to pick these up sooner rather than later.

Any ideas on how to implement this?

Thanks
CC

NOTICE: This email and any attachments are confidential. They may contain legally privileged information or copyright material. You must not read, copy, use or disclose them without authorisation. If you are not an intended recipient, please contact us at once by return email and then delete both messages and all attachments.
list Henrik Størner
On Mon, Feb 04, 2008 at 08:20:42AM +0900, Coe, Colin C. (Unix Engineer) wrote:
> I'm trying to work out how I can get hobbit to alert me when a process has existed for more than n seconds. This is important as we sometimes have NFS problems that cause processes such as df to hang due to stale mounts and I'd like to pick these up sooner rather than later.
Wouldn't it be easier to just scan the logfile for NFS timeout errors? Hobbit doesn't track the lifetime of a process, and I would think this would be very bothersome to set up because you'd have to exclude long-living daemon processes.

Regards,
Henrik
list Colin Coe
By default, under RHEL (most of) the files under /var/log are owned by, and only readable by, root. I'm still deciding whether or not to allow hobbit to read the log files.

I do think that there are other cases where monitoring how long a process exists is useful. I was thinking that this could be done by adding a new flag to 'PROC' in hobbit-clients.cfg. Something like:

    PROC processname minimumcount maximumcount color [TRACK=id] [TEXT=text] [RUNTIME=seconds]

Example, alert if a 'df' has existed for more than 60 seconds:

    HOST foo
        PROC df RUNTIME=60

I started hacking but my C fu is weak.
CC
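
The proposed RUNTIME flag could be evaluated along these lines. This is only a minimal sketch in Python, not Hobbit code: the `Proc` record, the `overdue` function and the sample values are all hypothetical, and it assumes the client has already worked out each process's age in seconds.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    command: str          # process name as reported by ps (assumed field)
    elapsed_seconds: int  # how long the process has existed (assumed field)


def overdue(procs, name, runtime):
    """Return the processes called `name` that have existed for more than `runtime` seconds."""
    return [p for p in procs if p.command == name and p.elapsed_seconds > runtime]


# Hypothetical sample data: one hung df, one fresh df, one long-lived daemon.
sample = [Proc("df", 312), Proc("df", 4), Proc("sshd", 864000)]

# Equivalent of the proposed "PROC df RUNTIME=60" rule.
hung = overdue(sample, "df", 60)
print("red" if hung else "green", [p.elapsed_seconds for p in hung])
```

Because the rule only looks at processes matching the configured name, long-living daemons such as sshd are ignored unless a rule names them explicitly.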
list Henrik Størner
On Tue, Feb 05, 2008 at 01:47:07PM +0900, Coe, Colin C. (Unix Engineer) wrote:
> I do think that there are other cases where monitoring how long a process exists is useful. I was thinking that this could be done by adding a new flag to 'PROC' in hobbit-clients.cfg. Something like:
>
>     PROC processname minimumcount maximumcount color [TRACK=id] [TEXT=text] [RUNTIME=seconds]
>
> Example, alert if a 'df' has existed for more than 60 seconds:
>
>     HOST foo
>         PROC df RUNTIME=60
Sure. Only problem is: how do you determine how long a process has existed? Some systems report the start time of a process in a separate column (START in Linux, STIME in Solaris, ...). It is not very accurate, since for a process started more than 24 hours ago it shows only the date. I guess we could use that.

Regards,
Henrik
list Massimo Morsiani
Hi all,

what is the right way to use the CLASS tag in hobbit-alerts.cfg? I didn't find anything in the Hobbit man pages.

Regards.

Massimo Morsiani
Information Technology Dept.
Gilbarco S.p.a.
via de' Cattani, 220/G
50145 Firenze, Italy
tel: +XX-XXX-XXXXX
fax: +XX-XXX-XXXXXX
email: user-32025d8bd22e@xymon.invalid
web: http://www.gilbarco.it

This message (including any attachments) contains confidential and/or proprietary information intended only for the addressee. Any unauthorized disclosure, copying, distribution or reliance on the contents of this information is strictly prohibited and may constitute a violation of law. If you are not the intended recipient, please notify the sender immediately by responding to this e-mail, and delete the message from your system. If you have any questions about this e-mail please notify the sender immediately.
list Colin Coe
That sounds great. Typically, I'm looking for processes existing for 5 minutes.
Thanks
CC
list John Glowacki
If etime were added to the ps command, this could work on both Solaris and
Linux. stime seems to report the month and day, or the year, depending on
the OS and how much time has passed.
Solaris man for ps:

    etime    In the POSIX locale, the elapsed time since the process
             was started, in the form:

                 [[dd-]hh:]mm:ss
Example output for different times:

    STIME   ELAPSED
    Mar18   2-03:45:15
    Mar19   1-06:56:27
    14:45   02:08:04
    15:33   01:20:22
    16:25   28:27
    16:53   00:29
    16:53   00:00

SunOS 5.7

    STIME   ELAPSED
    May_04  686-03:00:28

SunOS 5.9

    STIME   ELAPSED
    Mar_08  377-21:30:03

SunOS 5.10

    STIME   ELAPSED
    Jun_12  282-01:43:14

Red Hat Enterprise Linux AS release 3
Linux 2.4.21-47.ELsmp

    STIME   ELAPSED
    Mar15   5-02:51:34

Red Hat Enterprise Linux AS release 4
Linux 2.6.9-34.ELsmp

    STIME   ELAPSED
    2007    203-00:13:39
I found etime today because I had to prove that processes on a Solaris
system were started on Feb_20 of 2007 and not Feb_20 of 2008.
Hope this is helpful.
John
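
To compare an etime value against a threshold in seconds, the `[[dd-]hh:]mm:ss` form quoted from the Solaris man page has to be converted into a number first. The following is a minimal sketch in Python of one way to do that; the function name `etime_to_seconds` is made up for illustration and is not part of Hobbit, and it assumes the value arrives as a plain string such as those in the ELAPSED column above.

```python
import re

# Matches [[dd-]hh:]mm:ss, e.g. "00:29", "02:08:04" or "686-03:00:28".
_ETIME = re.compile(r"^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$")


def etime_to_seconds(etime):
    """Convert a ps etime string into a whole number of seconds."""
    match = _ETIME.match(etime.strip())
    if match is None:
        raise ValueError(f"unrecognised etime value: {etime!r}")
    # Days and hours are optional in the format; treat missing fields as zero.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Values taken from the ELAPSED column in the examples above.
for value in ("00:29", "28:27", "02:08:04", "2-03:45:15"):
    print(value, etime_to_seconds(value))
```

With the elapsed time as an integer, a check such as "alert if df has existed for more than 300 seconds" becomes a simple comparison, and the 24-hour ambiguity of the STIME column does not arise.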