On Wed, Jul 11, 2007 at 02:01:13PM +0200, Thomas Kaehn wrote:
> But is there also a proper way in Hobbit to take action on failed
> processes?
No. Hobbit only monitors things; it doesn't act to recover from
any failures.
If you really want this, then the easiest way is probably to
have a script on the Hobbit server that handles the service
restart, and trigger it from an alerting script. Here's how:
First, set up monitoring of the "sshd" process in hobbit-clients.cfg
with
    PROC sshd GROUP=ssh
You need the "GROUP" setting to be able to distinguish between
different types of "procs" alerts.
Next, create /usr/local/bin/sshRecover.sh with the commands needed
to restart ssh - you can use $BBHOSTNAME to get the name of the host
that has the problem.
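As a rough sketch only, sshRecover.sh could look something like the
following. The only thing taken from Hobbit here is $BBHOSTNAME; the
RECOVER_CMD variable and the recover_host function are names made up
for this example, and the default command merely prints what it would
do. You would have to substitute whatever actually restarts sshd at
your site.

```shell
#!/bin/sh
# Sketch of /usr/local/bin/sshRecover.sh (illustrative, not a tested recipe).
#
# RECOVER_CMD is a hypothetical setting: the command that restarts sshd
# on the affected host. It is called with the hostname as its last
# argument. The default is a dry run that only prints a message.
RECOVER_CMD="${RECOVER_CMD:-echo would restart sshd on}"

# recover_host is a hypothetical helper: run the recovery command for one host.
recover_host() {
    host="$1"
    if [ -z "$host" ]; then
        echo "sshRecover: no hostname given" >&2
        return 1
    fi
    # RECOVER_CMD is left unquoted on purpose so it can carry arguments.
    $RECOVER_CMD "$host"
}

# Hobbit passes the name of the host with the problem in $BBHOSTNAME.
# Do nothing if it is not set (e.g. when the file is only being sourced).
if [ -n "${BBHOSTNAME:-}" ]; then
    recover_host "$BBHOSTNAME"
fi
```

To try it by hand before hooking it into the alert configuration, run
it with BBHOSTNAME set yourself, e.g.
"BBHOSTNAME=hostA /usr/local/bin/sshRecover.sh"; with the dry-run
default it just prints "would restart sshd on hostA".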
Finally, in hobbit-alerts.cfg you should have
    HOST=hostA,hostB,hostC SERVICE=procs GROUP=ssh
        SCRIPT /usr/local/bin/sshRecover.sh 0
to trigger the sshRecover.sh script when the "procs" column
goes red due to the "sshd" process missing. The "0" at the end
is a mandatory parameter in hobbit-alerts.cfg (the "recipient"
if you read the man-page) but here it's just a dummy parameter.
Regards,
Henrik