bbtest-net hangs
list Iain M Conochie
Morning hobbiters
I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system. Basically, after running for about 15 hours or so, the bbtest-net command hangs, and all network tests turn purple. Here is the bbtest-net stanza from the hobbitlaunch.cfg file:
[bbnet]
ENVFILE /BIG/usr/local/hobbit/server/etc/hobbitserver.cfg
NEEDS hobbitd
CMD bbtest-net --report --ping --checkresponse --debug
LOGFILE $BBSERVERLOGS/bb-network.log
INTERVAL 5m
The last thing in the logs is the DNS lookups for all the hosts, and then ... nothing, i.e.:
2006-09-02 17:02:11 Got DNS result for host narya.servista.com : 192.168.1.32
2006-09-02 17:02:11 Got DNS result for host skye.int.servista.com : 192.168.1.45
2006-09-02 17:02:11 Got DNS result for host oban.int.servista.com : 192.168.1.72
2006-09-02 17:02:11 Got DNS result for host orkney.servista.com : 192.168.1.30
2006-09-02 17:02:11 Got DNS result for host islay.servista.com : 192.168.1.28
2006-09-02 17:02:11 Got DNS result for host tennessee.int.servista.com : 192.168.1.109
Has anyone had any experience of this kind of issue before? Is there any way I can get some more logging to see what is happening?
Cheers
Iain Conochie
UNIX Systems Administrator
COLT Telecommunications PLC
list Henrik Størner
On Mon, Sep 04, 2006 at 09:16:34AM +0100, Iain M Conochie wrote:
I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system. Basically, after running for about 15 hours or so, the bbtest-net command hangs, and all network tests turn purple.
That's obviously bad. I suppose you killed off the bbtest-net process to get things running again. If it happens again, could you kill it with
kill -6 <bbtest-net PID>
This causes it to dump a core-file in the ~hobbit/server/tmp/ directory which should help me track down where it is hanging.
Did you notice if the process was using a lot of cpu time, or if it was just completely stalled? Was there an "fping" or "hobbitping" process hanging around also?
Regards,
Henrik
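For anyone following along, the kill -6 step can be wrapped in a small helper. This is only a sketch: the function name abort_for_core is made up for illustration and is not part of hobbit. It assumes a POSIX shell and that you already know the PID of the hung bbtest-net. Note that a core file is only written if the core size limit of the running process allows it (see "ulimit -c").

```shell
#!/bin/sh
# Hypothetical helper (not part of hobbit): abort a hung process so that
# the kernel writes a core file for later analysis.
# Usage: abort_for_core <PID>
abort_for_core() {
    pid="$1"
    # kill -0 delivers no signal; it only checks that the PID exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Signal 6 is SIGABRT; its default action is to terminate the
    # process and dump core.
    kill -6 "$pid"
}
```

It would be called as "abort_for_core <bbtest-net PID>", run as the hobbit user or as root.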
list Iain M Conochie
On Mon, 4 Sep 2006, Henrik Stoerner wrote:
On Mon, Sep 04, 2006 at 09:16:34AM +0100, Iain M Conochie wrote:
I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system. Basically, after running for about 15 hours or so, the bbtest-net command hangs, and all network tests turn purple.
That's obviously bad. I suppose you killed off the bbtest-net process to get things running again. If it happens again, could you kill it with
kill -6 <bbtest-net PID>
This causes it to dump a core-file in the ~hobbit/server/tmp/ directory which should help me track down where it is hanging.
OK, I will try that. Basically, what I was doing was restarting the whole hobbit server. The next time we have this issue I will try the kill -6 command to get the core dump.
On Mon, 4 Sep 2006, Henrik Stoerner wrote:
Did you notice if the process was using a lot of cpu time, or if it was just completely stalled? Was there an "fping" or "hobbitping" process hanging around also?
Basically, the bbtest-net program stalled and the process was hanging around. CPU usage was normal. I am using the hobbitping command.
Cheers
Iain
Regards, Henrik
list Rich Smrcina
This sounds somewhat like the problem that I reported a few days ago, except that in my case I could not get access to the machine to check any processes; it was hung tight. The hypervisor does indicate that it was using a lot of CPU time, which would suggest a loop.
Henrik Stoerner wrote:
On Mon, Sep 04, 2006 at 09:16:34AM +0100, Iain M Conochie wrote:
I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system. Basically, after running for about 15 hours or so, the bbtest-net command hangs, and all network tests turn purple.
That's obviously bad. I suppose you killed off the bbtest-net process to get things running again. If it happens again, could you kill it with
kill -6 <bbtest-net PID>
This causes it to dump a core-file in the ~hobbit/server/tmp/ directory which should help me track down where it is hanging.
Did you notice if the process was using a lot of cpu time, or if it was just completely stalled? Was there an "fping" or "hobbitping" process hanging around also?
Regards,
Henrik
--
Rich Smrcina
VM Assist, Inc.
Phone: XXX-XXX-XXXX
Ans Service: XXX-XXX-XXXX
user-61add9955ef9@xymon.invalid
Catch the WAVV! http://www.wavv.org
WAVV 2007 - Green Bay, WI - May 18-22, 2007
list Iain M Conochie
On Mon, 2006-09-04 at 11:11 +0200, Henrik Stoerner wrote:
On Mon, Sep 04, 2006 at 09:16:34AM +0100, Iain M Conochie wrote:
I am having an issue with hobbit 4.2.0 built on a RedHat 9.0 system. Basically, after running for about 15 hours or so, the bbtest-net command hangs, and all network tests turn purple.
That's obviously bad. I suppose you killed off the bbtest-net process to get things running again. If it happens again, could you kill it with
kill -6 <bbtest-net PID>
This causes it to dump a core-file in the ~hobbit/server/tmp/ directory which should help me track down where it is hanging.
OK - got that. How can I analyse this to get the information you need?
On Mon, 2006-09-04 at 11:11 +0200, Henrik Stoerner wrote:
Did you notice if the process was using a lot of cpu time, or if it was just completely stalled? Was there an "fping" or "hobbitping" process hanging around also?
Nope - none at all.
Cheers
Iain
Regards, Henrik
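On the question of how to analyse the core file: one common approach, offered here only as a sketch and not as the procedure Henrik asked for, is to load the core into gdb together with the binary that produced it and print a stack backtrace. The helper name core_backtrace is hypothetical, and it assumes gdb and mktemp are installed. A command file is used instead of newer gdb options so that it should also work with older gdb versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of hobbit): print a stack backtrace
# from a core file using gdb in batch mode.
# Usage: core_backtrace <path to binary> <path to core file>
core_backtrace() {
    binary="$1"
    core="$2"
    # Both files must be readable before gdb is started.
    for f in "$binary" "$core"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            return 1
        fi
    done
    # Put the gdb commands in a temporary file and pass it with -x.
    cmds=$(mktemp) || return 1
    # "bt" prints the call stack at the point where the process aborted.
    printf 'bt\n' > "$cmds"
    gdb -batch -x "$cmds" "$binary" "$core"
    status=$?
    rm -f "$cmds"
    return "$status"
}
```

A call might look like "core_backtrace ~hobbit/server/bin/bbtest-net ~hobbit/server/tmp/<core file>", where the location of the bbtest-net binary is an assumption about the install layout and the core file name depends on the system. The backtrace is much more readable when the binary was built with debugging symbols.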