Hobbit: tests suddenly went purple and did not reappear
Hi all,
This is the situation:
hobbit server: hobbitd 4.2.0 monitoring a fair number of hosts (different OSs) and network devices.
hobbit client on the problematic host: hobbit-client-4.2.0-1 running on RedHat EL AS rel4 Update 4.
The host was being monitored with no apparent problem until early this morning (ordinary tests: cpu, disk, memory, ports); then, all of a sudden, every test except conn, info and trend went purple. A custom devmon test that queries SNMP on the same host kept working fine.
I tried restarting the client, then the server, then both, with no luck.
Finally, I commented out the host's line in bb-hosts and dropped the host with bb: the host correctly disappeared from the monitoring web page. After that, I uncommented the host's line in bb-hosts and restarted the hobbit server and client.
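For reference, the drop step above can be sketched roughly as follows. This is a minimal sketch, not a transcript of what was typed: `myhost` is a placeholder host name, `drop_host` is a hypothetical helper, and the fallback values for `BB` and `BBDISP` are assumptions for the case where the Hobbit server environment is not already loaded.

```shell
#!/bin/sh
# Sketch only: BB and BBDISP normally come from the Hobbit server
# environment; the fallbacks below are assumptions.
BB="${BB:-bb}"
BBDISP="${BBDISP:-127.0.0.1}"

# Send a "drop" message to hobbitd, asking it to forget the given host.
# In the sequence described above, the host's bb-hosts line was
# commented out before this step.
drop_host() {
    "$BB" "$BBDISP" "drop $1"
}

# Usage (placeholder host name):
#   drop_host myhost
```

After the drop, the bb-hosts line was uncommented again and both server and client were restarted.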
From this point on, I can only see the conn, info and trend tests on the monitoring web page. The devmon custom check (querying SNMP OIDs on the monitored host) also still works fine.
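What the web page shows can be cross-checked on the server by asking hobbitd which statuses it currently holds for the host. A rough sketch, assuming the `hobbitdboard` message as described in the bb(1) man page; `myhost` is a placeholder, `list_tests` is a hypothetical helper, and the field list is just one plausible choice.

```shell
#!/bin/sh
# Sketch only: BB and BBDISP normally come from the Hobbit server
# environment; the fallbacks below are assumptions.
BB="${BB:-bb}"
BBDISP="${BBDISP:-127.0.0.1}"

# Ask hobbitd for the statuses it knows for one host.
# The field names are assumed from the bb(1) documentation.
list_tests() {
    "$BB" "$BBDISP" "hobbitdboard host=$1 fields=hostname,testname,color,lastchange"
}

# Usage (placeholder host name):
#   list_tests myhost
```

If the man page is read correctly, the reply has one line per status held for that host.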
Then I checked the permissions and ownership of the hobbit directories on the monitored host: they look consistent with those on other, non-problematic hosts.
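A small helper along these lines makes such a comparison easier to repeat on several hosts. It is only a sketch: `show_perms` is a hypothetical name, and the subdirectory names (etc, bin, logs, tmp) are assumed from the paths quoted in this post and may differ on other installs.

```shell
#!/bin/sh
# Print mode, owner, group and path for the client home and its main
# subdirectories, one line each, so that the output from two hosts can
# be compared side by side.
show_perms() {
    home="$1"
    for d in "$home" "$home/etc" "$home/bin" "$home/logs" "$home/tmp"; do
        # Skip subdirectories that do not exist on this install.
        if [ -e "$d" ]; then
            ls -ld "$d" | awk '{ print $1, $3, $4, $NF }'
        fi
    done
    return 0
}

# Usage:
#   show_perms "$HOBBITCLIENTHOME"
```

Paths containing spaces would confuse the awk step, which seems acceptable for a standard client install.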
There is nothing unusual in the hobbit configuration files on the client host:
$HOBBITCLIENTHOME/etc/clientlaunch.cfg:
[client]
ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
CMD $HOBBITCLIENTHOME/bin/hobbitclient.sh
LOGFILE $HOBBITCLIENTHOME/logs/hobbitclient.log
INTERVAL 5m
$HOBBITCLIENTHOME/etc/localclient.cfg:
DEFAULT
# These are the built-in defaults.
UP 1h
LOAD 5.0 10.0
DISK * 90 95
MEMPHYS 100 101
MEMSWAP 50 80
MEMACT 90 97
$BBTMP/logfetch.myhost.cfg:
log:/var/log/messages:10240
ignore MARK
On the hobbit server side there are no special settings for this host in /etc/hobbit.
Other third-party services running on the same host do not seem to have been affected by any problem today.
Any hints?
thank you
fra