Phil Wild wrote:
Hi Chris,
I think it really depends on what you are testing. If you are using the
standard hobbit client and the standard tests, most of the client side is
pretty basic. I guess you could call it a dumb client in a way, as it does a
simple job of pulling the data out and sending it on without making any
intelligent decisions about thresholds etc.
To do what you want, you either have to do as you say (set up keys from
the server etc. and have the server perform an action after a threshold
breach, from a script initiated/configured in the hobbit-alerts file).
Bummer, I was hoping there was perhaps a barely documented feature that
would let you exec a script on the client to make my life easier.
Or, which would be much simpler, put some code in your monitoring script
to take action, but then you are starting to move away from the simplicity
of hobbit. It becomes even harder if you want to take action based on
something picked up in the standard tests (like CPU that you mentioned in
your post).
Indeed, my intent is for standard tests.
You may need to write your own test/new column that monitors the same
metric but in a different light. In my view, an automated action based on a
detected event probably does not belong in the monitoring system. If a
failure can be expected and an automated action is known to fix the issue,
perhaps that should be built into the startup process of the application (a
watchdog process etc).
Indeed, a daemon shouldn't fail; it should run properly or be fixed
natively. However, what I'm trying to monitor is an item that has a series
of dependencies - Postfix, depending on Amavis, depending on p0f, depending
on ClamAV, depending on greylist software, depending on database, etc. Under
certain circumstances, if one of these goes down, it snowballs into high
CPU, in my case.
The better way for me to handle this is likely to search the logs for
specific items, instead of relying on high CPU.
Hobbit can then be used to monitor the log for a restart event, or a
failed restart event etc. Actually, thinking about it more, building the
intelligent action into the agent is an ok idea and you also have the
opportunity of capturing and transmitting additional information about why
something dies if you run an action to fix and it failed etc.
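The log-based approach could look roughly like the following. This is only a sketch: the patterns in FAILURE_PATTERNS and the helper names find_failures and status_color are made up for illustration, and the real Postfix/Amavis log wording on a given system will differ.

```python
import re

# Hypothetical patterns; adjust to what the daemons actually log.
FAILURE_PATTERNS = [
    re.compile(r"connect to .* Connection refused"),
    re.compile(r"restart failed", re.IGNORECASE),
]


def find_failures(lines):
    """Return the log lines that match any of the failure patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS)
    ]


def status_color(matches):
    """Map the matches to a status color: red if anything matched."""
    return "red" if matches else "green"
```

The matched lines could be included in the status message, which gives the extra "why did it die" detail mentioned above.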
For the scenario I laid out above, I intend to write a script that will
restart the daemons properly in the correct order, but this is the "oh shit"
script, and wouldn't be a system startup script, for example.
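A rough sketch of the ordered restart, assuming the dependencies form the simple chain described earlier. The service names and the "service NAME stop/start" commands are placeholders, not taken from a real setup, and the code only builds the plan rather than running it. It needs Python 3.9 or later for graphlib.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on.
# Names are placeholders for whatever the init scripts are really called.
DEPENDS_ON = {
    "postfix": {"amavis"},
    "amavis": {"p0f"},
    "p0f": {"clamav"},
    "clamav": {"greylist"},
    "greylist": {"database"},
    "database": set(),
}


def start_order(depends_on=DEPENDS_ON):
    """Return services so that every dependency comes before its dependent."""
    return list(TopologicalSorter(depends_on).static_order())


def restart_plan(depends_on=DEPENDS_ON):
    """Build the commands: stop dependents first, then start dependencies first."""
    order = start_order(depends_on)
    stops = [["service", name, "stop"] for name in reversed(order)]
    starts = [["service", name, "start"] for name in order]
    return stops + starts
```

Each command in the plan could then be handed to subprocess.run, stopping at the first failure so the reason can be reported.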
I am waffling... You still need extra security-related configuration steps
on the client with sudo or ssh to get around the access permissions needed
to restart something, as your client is running as the hobbit userid, so
this brings the client configuration closer to an ssh setup on the server. I
don't think either way is perfect but both would do what you want...
Indeed. It sounds like generally the two scenarios I'd mentioned in my
email are the way to get it to work, and whichever is most reliable/least
hack-ish would be the best way to do it.
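For the server-side scenario, a script named on a SCRIPT recipient in the hobbit-alerts file could look something like this. It is a sketch under assumptions: it relies on the alert module exporting BBHOSTNAME, BBSVCNAME, BBCOLORLEVEL and RECOVERED to the script's environment (worth checking against the hobbit-alerts documentation for your version), and the remote user, the REMOTE_SCRIPT path and the key-based ssh plus sudo setup are all hypothetical.

```python
import os
import subprocess

# Hypothetical path of the restart script on the monitored host.
REMOTE_SCRIPT = "/usr/local/sbin/restart-mail-stack"


def build_action(env):
    """Return the ssh command to run for this alert, or None to do nothing."""
    host = env.get("BBHOSTNAME")
    if not host:
        return None
    # Only react to the cpu column going red, not to recovery messages.
    if env.get("BBSVCNAME") != "cpu":
        return None
    if env.get("BBCOLORLEVEL") != "red" or env.get("RECOVERED") == "1":
        return None
    # Assumes passwordless ssh keys and a sudo rule for REMOTE_SCRIPT.
    return ["ssh", "-o", "BatchMode=yes", f"hobbit@{host}",
            "sudo", "-n", REMOTE_SCRIPT]


if __name__ == "__main__":
    command = build_action(os.environ)
    if command is not None:
        subprocess.run(command, check=False, timeout=120)
```

Keeping the decision in build_action, separate from the ssh call, makes it possible to test the rule without touching a live host.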
Perhaps this is something that belongs on a feature request list for a
future release of hobbit:
The hobbit client installation could configure sudo to allow the client to
run commands as other users (on the admin's acceptance during the
installation, of course).
The ability of the hobbit server to send a series of actions to the
hobbit client for execution via the hobbit communication channel. Sounds
like something that could have lots of uses if done well...
I think this would be the perfect scenario. Something added to the hobbit
client, that would go in 'localclient.cfg'. A simple 'SCRIPT' line that
could be nested under a test, that would pass along whatever meaningful
variables that could be useful, such as PID.
The hobbit server *already uses* 'setuid root' for some binaries, such as
'hobbitping'. The client would simply need to call some binary whose sole
purpose is to launch scripts as root, so essentially anything would be
possible.
I can't think of any reason that some hook would have to exist for the
script to tell the hobbit client anything back. I think it can just wait for
the next poll period to see if it went back to green.
--Chris