Hobbit: tests suddenly went purple and they did not reappear
list Francesco Cicolani
Hi all,
This is the situation:
hobbit server: hobbitd 4.2.0 monitoring a fair number of hosts (various OSes) and network devices.
hobbit client on the problematic host: hobbit-client-4.2.0-1 running on RedHat EL AS rel4 Update 4.
The host was being monitored with no apparent problem until early this morning (ordinary tests: cpu, disk, memory, ports); then, all of a sudden, all tests except conn, info and trend went purple (we also have a devmon custom test querying snmp, which kept working fine).
I tried restarting the client, then the server, then both, with no luck.
Finally, I commented out the host's line in bb-hosts and dropped the host with bb: the host correctly disappeared from the monitoring web page. After that, I uncommented the host's line in bb-hosts and restarted the hobbit server and client.
From this point on, I can only see the conn, info and trend tests on the monitoring web page. The devmon custom check (querying snmp oids on the monitored host) also still works fine.
I then checked permissions and ownership of the hobbit dirs on the monitored host: they appear consistent with those on other, non-problematic hosts.
There is nothing unusual in the hobbit config files on the client host:
$HOBBITCLIENTHOME/etc/clientlaunch.cfg:
[client]
ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
CMD $HOBBITCLIENTHOME/bin/hobbitclient.sh
LOGFILE $HOBBITCLIENTHOME/logs/hobbitclient.log
INTERVAL 5m
$HOBBITCLIENTHOME/etc/localclient.cfg:
DEFAULT
# These are the built-in defaults.
UP 1h
LOAD 5.0 10.0
DISK * 90 95
MEMPHYS 100 101
MEMSWAP 50 80
MEMACT 90 97
$BBTMP/logfetch.myhost.cfg:
log:/var/log/messages:10240
ignore MARK
On the hobbit server side there are no peculiar settings for this same host in /etc/hobbit.
Other third-party services running on the same host did not seem to be affected by any problems today.
Any hints?
thank you
fra
list Iain M Conochie
<snip>
On the hobbit server side there are no peculiar settings for this same host in /etc/hobbit. Other third-parties services running on the same host didn't seem to be affected by problems today. Any hints?
From the client, can you telnet to port 1984 (or whatever port your hobbit server is running on)?
Has the hostname of the client changed?
Can you see a new hostname under ghost clients in the hobbitd test for your monitoring server?
Good luck
iain
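The telnet check suggested above can also be scripted, which is handy if it needs repeating from several clients. What follows is only a minimal sketch: the function name `can_reach` is made up for illustration, and 1984 is simply the default port mentioned in this thread. It tests nothing beyond whether a TCP connection can be opened, so a success does not prove the server accepts reports for the host.

```python
import socket


def can_reach(host, port=1984, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `can_reach` is a hypothetical helper, not part of hobbit. It only checks
    that the port accepts connections, like a bare `telnet host port`.
    """
    try:
        # create_connection resolves the name and connects; the context
        # manager closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run on the client as, for example, `can_reach("your-hobbit-server")`, where the host name is a placeholder for the real server.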
list Francesco Cicolani
Hi Iain,
still no luck here:
- telnet test on port 1984 looks good.
- the problematic host does not appear in the ghostlist (and the client name hasn't changed).
Thank you for your answer.
fra
list Bruce White
Is the client software actually running on the client? Is the client software hung? A hung client process will also cause this condition.
......Bruce
Bruce White
Senior Enterprise Systems Engineer
Fellowes, Inc.
http://www.fellowes.com/
From: user-51f34f8db11e@xymon.invalid
Sent: Monday, November 30, 2009 1:42 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Hobbit: tests suddenly went purple and they did not reappear
Hi Iain,
still no luck here:
- telnet test on port 1984 looks good.
- the problematic host does not appear in the ghostlist (and the client
name hasn't changed).
Thank you for your answer.
fra
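One rough way to follow up on the question of a hung client is to check whether a file the client is expected to touch on every run has been modified recently. The sketch below assumes such a file exists on the install in question; which file that is (a log, or something under $BBTMP) is an assumption to verify locally, and `looks_stale` is a hypothetical helper. The 300-second default mirrors the `INTERVAL 5m` setting shown earlier in the thread.

```python
import os
import time


def looks_stale(path, interval_seconds=300, grace=2.0, now=None):
    """Return True if `path` was last modified more than grace * interval ago.

    Hypothetical helper: it assumes the client rewrites `path` on each run,
    which must be confirmed for the file actually chosen.
    """
    if now is None:
        now = time.time()
    age = now - os.path.getmtime(path)  # seconds since last modification
    return age > grace * interval_seconds
```

A stale result would only be a hint that the client has stopped cycling. A fresh one does not prove the reports are reaching the server.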