We had a situation last weekend where a critical red alert for
"msgs" went green after less than 5 minutes. We all expected
it to stay red for half an hour.
This particular server has both /var/log/messages and
/var/log/otherlog monitored, and the red was triggered by /var/log/otherlog.
Thoughts? Should this have stayed red longer? Is there possibly an
issue when monitoring two files, if one stays green?
Meanwhile, I put in a hacky little workaround to make sure that we
never EVER miss a red on this server's logs:
In alerts.cfg:
HOST=%hotserver.* SERVICE=msgs
SCRIPT /usr/local/scripts/touchred REDalert FORMAT=TEXT
/usr/local/scripts/touchred just calls "/bin/touch /home/xymon/REDALERT",
and then I have in analysis.cfg:
HOST=%xymon.*
FILE /home/xymon/REDALERT red NOEXIST MTIME>30
That's going to stay red *forever*, until someone goes and manually
deletes /home/xymon/REDALERT.
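
For anyone wanting to copy this, here is a minimal sketch of what touchred
could look like, assuming a plain /bin/sh script. The mark_red function and
the REDALERT_FILE override are names made up for illustration (they are not
part of Xymon); the only real work is the touch itself.

```shell
#!/bin/sh
# touchred: sketch of the alert script referenced from alerts.cfg.
# Xymon runs it when the alert rule matches; all it does is update a flag file.

# mark_red is a hypothetical helper name. It creates the flag file if it is
# missing, or refreshes its modification time if it already exists.
mark_red() {
    /bin/touch "$1"
}

# REDALERT_FILE is a hypothetical override (handy for testing); the default
# is the path watched by the FILE rule in analysis.cfg.
mark_red "${REDALERT_FILE:-/home/xymon/REDALERT}" ||
    echo "touchred: could not touch flag file" >&2
```

Clearing the red is then a manual "rm /home/xymon/REDALERT" once someone has
actually looked at the logs.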
(Plus, for extra fun, BOTH alerts page. We are not going to miss one of
these again.)