Xymon Mailing List Archive

bbtest-net hangs

5 messages in this thread

list Iain M Conochie · Mon, 4 Sep 2006 09:16:34 +0100 (BST) ·
Morning hobbiters

    I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system. Basically, after running for about 15 hours or so, the bbtest-net command hangs, and all network tests turn purple. Here is the bbtest-net stanza from the hobbitlaunch.cfg file:

[bbnet]
         ENVFILE /BIG/usr/local/hobbit/server/etc/hobbitserver.cfg
         NEEDS hobbitd
         CMD bbtest-net --report --ping --checkresponse --debug
         LOGFILE $BBSERVERLOGS/bb-network.log
         INTERVAL 5m

    The last thing in the logs is the DNS lookup results for all the hosts, and then ... nothing, i.e.:

2006-09-02 17:02:11 Got DNS result for host narya.servista.com : 192.168.1.32
2006-09-02 17:02:11 Got DNS result for host skye.int.servista.com : 192.168.1.45
2006-09-02 17:02:11 Got DNS result for host oban.int.servista.com : 192.168.1.72
2006-09-02 17:02:11 Got DNS result for host orkney.servista.com : 192.168.1.30
2006-09-02 17:02:11 Got DNS result for host islay.servista.com : 192.168.1.28
2006-09-02 17:02:11 Got DNS result for host tennessee.int.servista.com : 192.168.1.109

    Has anyone had any experience of this kind of issue before? Is there any way I can get some more logging to see what is happening?

    Cheers

Iain Conochie
UNIX Systems Administrator
COLT Telecommunications PLC
list Henrik Størner · Mon, 4 Sep 2006 11:11:45 +0200 ·
On Mon, Sep 04, 2006 at 09:16:34AM +0100, Iain M Conochie wrote:
> I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system.
> Basically, after running for about 15 hours or so, the bbtest-net command
> hangs, and all network tests turn purple.

That's obviously bad. I suppose you killed off the bbtest-net process to
get things running again. If it happens again, could you kill it with
   kill -6 <bbtest-net PID>
This causes it to dump a core-file in the ~hobbit/server/tmp/ directory
which should help me track down where it is hanging.
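
As a rough sketch of the kill -6 step above: signal 6 is SIGABRT, and a core file is only written if the core-size limit (ulimit -c) was non-zero in the shell that started Hobbit. The helper name abort_for_core is made up for illustration and is not part of Hobbit.

```shell
# abort_for_core is a hypothetical helper name, not part of Hobbit.
# It sends SIGABRT to one PID so the kernel can write a core file for it.
abort_for_core() {
    pid="$1"
    # kill -0 sends no signal; it only checks that the PID exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # SIGABRT is signal 6, so this is the same as "kill -6 <PID>".
    kill -ABRT "$pid"
}
```

It would be called as `abort_for_core <bbtest-net PID>`, run as the hobbit user or root. If no core file appears, a core-size limit of 0 is the first thing to check.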

Did you notice if the process was using a lot of cpu time, or if it
was just completely stalled? Was there an "fping" or "hobbitping"
process hanging around also?
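
One way to answer the CPU question, sketched under the assumption of a ps that supports the POSIX -o and -p options: sample the accumulated CPU time twice and compare. The names cpu_time_of and is_spinning are hypothetical helpers, not Hobbit tools.

```shell
# cpu_time_of and is_spinning are hypothetical helper names, not Hobbit tools.

# Print the accumulated CPU time (TIME column) of one PID,
# or nothing if the process is gone.
cpu_time_of() {
    ps -o time= -p "$1" 2>/dev/null | tr -d ' '
}

# Return 0 if the PID's CPU time grew during the delay (busy loop),
# 1 if it did not (stalled or sleeping), 2 if the process was not found.
is_spinning() {
    pid="$1"
    delay="${2:-5}"          # seconds between the two samples
    before=$(cpu_time_of "$pid")
    [ -n "$before" ] || return 2
    sleep "$delay"
    after=$(cpu_time_of "$pid")
    [ -n "$after" ] || return 2
    [ "$before" != "$after" ]
}
```

For example, `is_spinning <bbtest-net PID> 10 && echo looping || echo stalled`. Leftover ping helpers can be listed with `ps -e -o pid,stat,time,args | grep -E '[f]ping|[h]obbitping'`.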


Regards,
Henrik
list Iain M Conochie · Mon, 4 Sep 2006 10:21:58 +0100 (BST) ·
On Mon, 4 Sep 2006, Henrik Stoerner wrote:
> On Mon, Sep 04, 2006 at 09:16:34AM +0100, Iain M Conochie wrote:
> > I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system.
> > Basically, after running for about 15 hours or so, the bbtest-net command
> > hangs, and all network tests turn purple.
> That's obviously bad. I suppose you killed off the bbtest-net process to
> get things running again. If it happens again, could you kill it with
>    kill -6 <bbtest-net PID>
> This causes it to dump a core-file in the ~hobbit/server/tmp/ directory
> which should help me track down where it is hanging.

OK, I will try that. Basically, what I was doing was restarting the whole
hobbit server. The next time we have this issue I will try the kill -6
command to get the core dump.

> Did you notice if the process was using a lot of cpu time, or if it
> was just completely stalled? Was there an "fping" or "hobbitping"
> process hanging around also?

Basically, the bbtest-net program stalled and the process was hanging
around. CPU usage was normal. I am using the hobbitping command.


Cheers

Iain

> Regards,
> Henrik

list Rich Smrcina · Mon, 04 Sep 2006 07:25:56 -0500 ·
This sounds somewhat like the problem that I reported a few days ago, except that in my case I could not get access to the machine to check any processes; it was hung tight. The hypervisor does indicate that it was using a lot of CPU time, which would suggest a loop.

Henrik Stoerner wrote:
> On Mon, Sep 04, 2006 at 09:16:34AM +0100, Iain M Conochie wrote:
> > I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system. Basically, after running for about 15 hours or so, the bbtest-net command hangs, and all network tests turn purple.
> That's obviously bad. I suppose you killed off the bbtest-net process to
> get things running again. If it happens again, could you kill it with
>    kill -6 <bbtest-net PID>
> This causes it to dump a core-file in the ~hobbit/server/tmp/ directory
> which should help me track down where it is hanging.
>
> Did you notice if the process was using a lot of cpu time, or if it
> was just completely stalled? Was there an "fping" or "hobbitping"
> process hanging around also?
>
> Regards,
> Henrik

-- 

Rich Smrcina
VM Assist, Inc.
Phone: XXX-XXX-XXXX
Ans Service:  XXX-XXX-XXXX
user-61add9955ef9@xymon.invalid

Catch the WAVV!  http://www.wavv.org
WAVV 2007 - Green Bay, WI - May 18-22, 2007
list Iain M Conochie · Sun, 10 Sep 2006 15:17:00 +0100 ·
On Mon, 2006-09-04 at 11:11 +0200, Henrik Stoerner wrote:
> On Mon, Sep 04, 2006 at 09:16:34AM +0100, Iain M Conochie wrote:
> > I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system.
> > Basically, after running for about 15 hours or so, the bbtest-net command
> > hangs, and all network tests turn purple.
> That's obviously bad. I suppose you killed off the bbtest-net process to
> get things running again. If it happens again, could you kill it with
>    kill -6 <bbtest-net PID>
> This causes it to dump a core-file in the ~hobbit/server/tmp/ directory
> which should help me track down where it is hanging.

OK - got that. How can I analyse this to get the information you need?

> Did you notice if the process was using a lot of cpu time, or if it
> was just completely stalled? Was there an "fping" or "hobbitping"
> process hanging around also?

Nope - none at all.

Cheers

Iain

> Regards,
> Henrik