Xymon Mailing List Archive

Hobbit: tests suddenly went purple and they did not reappear

4 messages in this thread

list Francesco Cicolani · Fri, 27 Nov 2009 15:06:30 +0100 ·
Hi all,

This is the situation:
hobbit server: hobbitd 4.2.0 monitoring a fair number of hosts (different OSs) and network devices.
hobbit client on the problematic host: hobbit-client-4.2.0-1 running on RedHat EL AS rel4 Update 4.


The host was being monitored with no apparent problem until early this morning (ordinary tests: cpu, disk, memory, ports); then, all of a sudden, all tests except conn, info and trend went purple (we also have a devmon custom test querying snmp that was still working fine).


I tried restarting the client, then the server, then both, with no luck.
Finally, I commented out the host's line in bb-hosts and dropped the host with bb: the host correctly disappeared from the monitoring web page. After that, I uncommented the host's line in bb-hosts and restarted the hobbit server and client.
From this point on, I can only see the conn, info and trend tests on the monitoring web page. Also, the devmon custom check (querying snmp oids on the monitored host) works fine.


Then I checked the permissions and ownership of the hobbit dirs on the monitored host: they appeared consistent with those on other non-problematic hosts.


There is nothing unusual in the hobbit conf files on the client host:
$HOBBITCLIENTHOME/etc/clientlaunch.cfg:
[client]
ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
CMD $HOBBITCLIENTHOME/bin/hobbitclient.sh
LOGFILE $HOBBITCLIENTHOME/logs/hobbitclient.log
INTERVAL 5m

$HOBBITCLIENTHOME/etc/localclient.cfg:
DEFAULT
# These are the built-in defaults.
UP 1h
LOAD 5.0 10.0
DISK * 90 95
MEMPHYS 100 101
MEMSWAP 50 80
MEMACT 90 97

$BBTMP/logfetch.myhost.cfg:
log:/var/log/messages:10240
ignore MARK


On the hobbit server side there are no peculiar settings for this same host in /etc/hobbit.


Other third-party services running on the same host didn't seem to be affected by problems today.

Any hints?

thank you
fra
list Iain M Conochie · Fri, 27 Nov 2009 16:24:42 +0000 ·
quoted from Francesco Cicolani
<snip>
On the hobbit server side there are no peculiar settings for this same host in /etc/hobbit.
 
Other third-parties services running on the same host didn't seem to be affected by problems today.
 Any hints?
From the client, can you telnet to port 1984 (or whatever port your hobbit server is running on)?
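
If telnet is not installed on the client, the same reachability check can be sketched in a few lines of Python. This is only a rough equivalent of the telnet test: it confirms that a TCP connection can be opened, not that hobbitd accepts the client's reports. The function name `can_reach` and the host name in the usage comment are placeholders, not part of Hobbit.

```python
import socket


def can_reach(host, port=1984, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    1984 is the port mentioned in this thread; pass a different value
    if your hobbit server listens elsewhere.
    """
    try:
        # create_connection resolves the name and connects, raising OSError
        # on refusal, timeout, or name-resolution failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (placeholder host name, replace with your hobbit server):
# print(can_reach("hobbit-server.example.com"))
```

A successful connection only rules out firewall and routing problems between the client and the server.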

Has the hostname of the client changed? Can you see a new hostname in ghost clients in the hobbitd test for your monitoring server?
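
To make the hostname question concrete: the check amounts to comparing the name the client reports with the names listed in bb-hosts. Below is a minimal sketch of that comparison, assuming the usual bb-hosts host-line layout of an IP address followed by the hostname and optional tags after a `#`. The helper names are hypothetical, and the sketch does not follow include directives or apply any name normalisation the server may do.

```python
import ipaddress


def hosts_in_bbhosts(text):
    """Collect hostnames from bb-hosts style text (assumed 'IP hostname # tags')."""
    names = set()
    for line in text.splitlines():
        # Drop tags/comments after '#'; fully commented-out lines become empty.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            ipaddress.ip_address(fields[0])  # host lines start with an IP address
        except ValueError:
            continue  # page/group/include directives and anything else
        names.add(fields[1])
    return names


def not_listed(reported_name, bbhosts_text):
    """True if the name the client reports is absent from bb-hosts."""
    return reported_name not in hosts_in_bbhosts(bbhosts_text)


# Usage: compare the client's own idea of its name (e.g. the output of
# `uname -n` on the client) with the server's bb-hosts content.
# print(not_listed("myhost", open("/etc/hobbit/bb-hosts").read()))
```

The comparison is an exact string match, so a short name on one side and a fully qualified name on the other will show up as a mismatch.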

Good luck

iain
 thank you
fra
list Francesco Cicolani · Mon, 30 Nov 2009 08:41:50 +0100 ·
Hi Iain,

still no luck here:
- telnet test on port 1984 looks good.
- the problematic host does not appear in the ghostlist (and the client name hasn't changed).

Thank you for your answer.
fra
list Bruce White · Tue, 1 Dec 2009 08:18:12 -0600 ·
Is the client software actually running on the client?  Is the client
software hung?  A hung client process will also cause this condition.
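
One rough way to spot a hung client, besides looking at the process list, is to check whether a file the client rewrites on every cycle is still being updated. The sketch below is built on assumptions: which file to watch (for example something under $BBTMP) has to be verified on your own install, the 300-second default mirrors the "INTERVAL 5m" setting quoted earlier in this thread, and the function names and the slack factor are illustrative only.

```python
import os
import time


def seconds_since_modified(path, now=None):
    """Age of the file's last modification, in seconds."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def looks_stalled(path, interval_seconds=300, slack=3, now=None):
    """True if the file has not been modified for more than `slack` intervals.

    interval_seconds=300 corresponds to INTERVAL 5m; slack is an arbitrary
    safety margin so that one slow cycle does not count as a hang.
    """
    return seconds_since_modified(path, now) > interval_seconds * slack


# Usage (the path is a placeholder, pick a file your client rewrites each run):
# print(looks_stalled("/path/to/client/tmp/some-file"))
```

A stale file suggests the client is not completing its runs; a fresh one means the client is running and the problem lies between the client and the server.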
 
   ......Bruce
 

 Bruce White
 Senior Enterprise Systems Engineer | Phone: XXX-XXX-XXXX | Fax: XXX-XXX-XXXX | user-58f975e8bf9d@xymon.invalid | http://www.fellowes.com/
 
 
 

From: user-51f34f8db11e@xymon.invalid
[mailto:user-51f34f8db11e@xymon.invalid] 
Sent: Monday, November 30, 2009 1:42 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Hobbit: tests suddenly went purple and they did
not reappear
quoted from Francesco Cicolani


Hi Iain,
 
still no luck here:
- telnet test on port 1984 looks good.
- the problematic host does not appear in the ghostlist (and the client
name hasn't changed).
 
Thank you for your answer.
fra