Xymon Mailing List Archive

network test timeouts for hung TCP connects?

list Henrik Størner
Wed, 26 Jan 2005 23:29:13 +0100
Message-Id: <user-f1f4fd35ba26@xymon.invalid>

On Wed, Jan 26, 2005 at 03:17:01PM -0700, Charles Jones wrote:
My production BigBrother server is running BigBrother + bbgen 2.5 (I know there is a newer bbgen; I plan on replacing BB with a Hobbit server).
Wow, that's a pretty old bbgen version - 1½ years, in fact.
My current bb+bbgen setup has problems whenever a machine dies in such a way that it is pingable, but when you connect to any open TCP port you get nothing back (usually caused by a memory error or overheating).  When my current bb+bbgen setup tries to test one of these zombified machines, it hangs testing that host, and eventually everything turns purple since bb isn't updating anymore.

Does Hobbit have proper timeouts to time out a hung TCP connection so that this sort of thing does not happen?
It should; if it doesn't, then it's definitely a bug. All network tests
done by Hobbit must time out if the other end doesn't respond. The default
timeout is 10 seconds (set with the "--timeout=N" option to bbtest-net).
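
To illustrate the general idea (this is not Hobbit's actual code, just a
minimal Python sketch), a network test can guard against the "connects but
never answers" case by putting a timeout on both the connect and the read.
The function name probe_tcp and its return values are made up for this
example, and it assumes the service sends a banner first (as SMTP or SSH do):

```python
import socket

def probe_tcp(host, port, timeout=10.0):
    """Hypothetical helper: return 'up', 'timeout', or 'down'.

    Assumes the remote service speaks first (sends a banner).
    """
    try:
        # create_connection() applies the timeout to the connect and
        # leaves it set on the returned socket, so recv() is bounded too.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(1)
            # An empty read means the peer closed without sending anything.
            return "up" if data else "down"
    except socket.timeout:
        # Connect or read took longer than the timeout: the "zombie" case.
        return "timeout"
    except OSError:
        # Connection refused, unreachable, and similar errors.
        return "down"
```

With something like this, a host that accepts the connection but never sends
a byte costs the tester at most one timeout interval instead of hanging the
whole run. For services that wait for the client to speak first, the probe
would need to send a request before reading.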

Looking back through the bbgen changelog, there are a couple of
bugfixes through the 2.x series that seem likely to fix it. But
without knowing exactly what's triggering this behaviour it is hard to
say for sure.


Henrik