Xymon Mailing List Archive

Restarting failed processes on the client

list Henrik Størner
Wed, 11 Jul 2007 16:13:56 +0200
Message-Id: <user-a15cf8504ced@xymon.invalid>

On Wed, Jul 11, 2007 at 02:01:13PM +0200, Thomas Kaehn wrote:
> But is there also a proper way in Hobbit to take action on failed
> processes?

No. Hobbit only monitors things; it doesn't act to recover from
any failures.

If you really want this, then the easiest way is probably to
have a script on the Hobbit server that handles the service
restart, and trigger it from an alerting script. Here's how:

First, set up monitoring of the "sshd" process in hobbit-clients.cfg
with
    PROC sshd GROUP=ssh 
You need the "GROUP" setting to be able to distinguish between
different types of "procs" alerts.

Next, create /usr/local/bin/sshRecover.sh with the commands needed 
to restart ssh - you can use $BBHOSTNAME to get the name of the host 
that has the problem. 
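
A minimal sketch of what such a script might look like. Only
$BBHOSTNAME comes from Hobbit; the function names are made up for
illustration, and restart_sshd_on is a placeholder that just prints
what it would do. How you actually reach a host whose sshd is down
(rsh, a console server, some management agent) depends on your site
and is not covered here.

```shell
#!/bin/sh
# Sketch of /usr/local/bin/sshRecover.sh, run on the Hobbit server
# by the alert module. BBHOSTNAME is set by Hobbit; all function
# names below are hypothetical.

# Placeholder: replace the body with whatever mechanism your site
# uses to restart sshd on a remote host. As written it only prints.
restart_sshd_on() {
    echo "would restart sshd on $1"
}

# Accept only plain host names (letters, digits, dots, hyphens) so
# the value is safe to hand to other commands.
valid_hostname() {
    case "$1" in
        ""|*[!A-Za-z0-9.-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Validate the host name, then attempt the restart.
recover() {
    if ! valid_hostname "$1"; then
        echo "sshRecover: missing or invalid host name" >&2
        return 1
    fi
    restart_sshd_on "$1"
}

# Act only when BBHOSTNAME is set, as it is when Hobbit runs us.
if [ -n "${BBHOSTNAME:-}" ]; then
    recover "$BBHOSTNAME"
fi
```

To try it by hand before wiring it into the alert configuration,
set the variable yourself:
    BBHOSTNAME=hostA /usr/local/bin/sshRecover.sh 0
With the placeholder above this only prints the host it would act on.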

Finally, in hobbit-alerts.cfg you should have
    HOST=hostA,hostB,hostC SERVICE=procs GROUP=ssh
        SCRIPT /usr/local/bin/sshRecover.sh 0
to trigger the sshRecover.sh script when the "procs" column
goes red due to the "sshd" process missing. The "0" at the end
is a mandatory parameter in hobbit-alerts.cfg (the "recipient"
if you read the man-page) but here it's just a dummy parameter.


Regards,
Henrik