Restarting failed processes on the client

list Daniel Bourque
Thu, 12 Jul 2007 09:50:11 -0500
Message-Id: <user-852a3b26a169@xymon.invalid>

As a last resort, if you also have rsh running, you could
- set hosts.equiv to allow the hobbit user coming in from the hobbit 
server to log in as user x without a password,
- then give user x sudo (with NOPASSWD) rights to restart sshd.

I have a bunch of automated fixes I set up (restart ntpd, kill processes, 
etc.) using the SCRIPT alert and ssh keys.
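
For services other than sshd, the ssh-key variant could look roughly like
the sketch below. It is only an illustration: the remote_restart helper,
the SSH override variable, and the init script path are assumptions, and
"userx" is a placeholder account with passwordless sudo for the restart.

```shell
#!/bin/sh
# Sketch of an autofix helper that uses ssh keys instead of rsh.
# Assumes a key for "userx" (placeholder account) is already installed
# on the target host and that userx may run the init script via sudo.

# remote_restart HOST SERVICE
# SSH can be overridden (e.g. SSH=echo) to see the command without running it.
remote_restart() {
    ${SSH:-ssh} -o BatchMode=yes -l userx "$1" sudo "/etc/init.d/$2" restart
}

# Usage from a SCRIPT alert, where Hobbit sets BBHOSTNAME:
#   remote_restart "$BBHOSTNAME" ntpd
```

BatchMode=yes makes ssh fail instead of prompting for a password, which
is what you want in a script run by the alert module.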

In your case you could do this to restart the local or remote ssh service:


< from hobbit-alerts.cfg>
...
PAGE=bla COLOR=red
        SCRIPT /opt/hobbit/server/bin/autofix_ssh autofix_ssh SERVICE=ssh DURATION<10m
        MAIL user-0a951403e24f@xymon.invalid DURATION>10m REPEAT=30m

<autofix_ssh>

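#!/bin/bash

# BBHOSTNAME is the host the alert is about.
# Compare as strings (=), not as integers (-eq).
if [ "$BBHOSTNAME" = "$(hostname)" ] ; then
    # The alert is for the Hobbit server itself: restart locally
    sudo /etc/init.d/sshd restart
else
    # Remote host: go in via rsh as userx, who has NOPASSWD sudo rights
    rsh "$BBHOSTNAME" -l userx sudo /etc/init.d/sshd restart
fi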


hope this helps

Daniel Bourque
Systems/Network Administrator
WeatherData Service Inc
An Accuweather Company

Office (XXX) XXX-XXXX
Office (XXX) XXX-XXXX ext. XXXX
Mobile (XXX) XXX-XXXX


Henrik Stoerner wrote:

On Wed, Jul 11, 2007 at 04:13:56PM +0200, Henrik Stoerner wrote:

If you really want this, then the easiest way is probably to
have a script on the Hobbit server that handles the service
restart, and trigger it from an alerting script. Here's how:

[snipped]

Particularly for ssh, running the recovery script from the Hobbit
server might not be easy, since ssh is usually the only way you 
can remote-login to the server and get things (re-)started.

So to implement the same functionality on the client-side, you can
write a client-side extension script that does:

  #!/bin/sh

  PROCSTATUS=`$BB $BBDISP "query $MACHINE.procs" | awk '{print $1}'`
  if test "$PROCSTATUS" = "red"
  then
     /etc/init.d/sshd restart
  fi

  exit 0

This triggers the "sshd restart" whenever the "procs" status goes red.
If you're monitoring multiple processes on each host, it can't tell
whether sshd is the process that turned the status red. Alternatively,
you could add network monitoring of "ssh", and then query the "ssh"
column instead of the "procs" column.
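
A minimal sketch of that alternative, assuming the same client
environment ($BB, $BBDISP, $MACHINE) as the script above and that the
first word of the query response is the color. The status_color and
restart_if_red helpers are made-up names for illustration, not part of
Hobbit.

```shell
#!/bin/sh
# Sketch only: relies on $BB, $BBDISP and $MACHINE being set by the
# Hobbit client environment, as in the script above.

# status_color COLUMN
# Print the current color of one status column for this host.
status_color() {
    $BB $BBDISP "query $MACHINE.$1" 2>/dev/null | awk 'NR == 1 {print $1}'
}

# restart_if_red COLUMN COMMAND...
# Run COMMAND only when COLUMN is currently red.
restart_if_red() {
    column=$1
    shift
    if test "$(status_color "$column")" = "red"
    then
        "$@"
    fi
}

# Usage in the client-side extension script:
#   restart_if_red ssh /etc/init.d/sshd restart
```

Querying "ssh" rather than "procs" means a red status on some other
monitored process no longer causes an sshd restart.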


Regards,
Henrik