Xymon Mailing List Archive

hobbitd crashing (both 4.0.2 and latest snapshot) on Solaris 8

3 messages in this thread

Stephen Menton · Tue, 10 Apr 2007 18:28:43 -0700
After adding additional tests today I saw that some of my tests weren't
getting through. I checked their logs and saw repeated lines like:
2007-04-10 20:17:59 Could not connect to bbd@xxx.xxx.xxx.xxx:1985 - Connection refused
2007-04-10 20:17:59 Whoops ! bb failed to send message - Connection failed
(replaced actual IP with xxx.xxx.xxx.xxx)

Additionally, both in my previous install of hobbit and the current
snapshot I'm seeing the following messages written to
/var/log/hobbit/hobbitlaunch.log:
2007-04-10 18:13:16 Setting up network listener on 0.0.0.0:1985
2007-04-10 18:13:16 Setting up local listener
2007-04-10 18:13:17 Task bbdisplay terminated by signal 15
2007-04-10 18:13:17 Setting up signal handlers
2007-04-10 18:13:17 Setting up hobbitd channels
2007-04-10 18:13:17 Setting up logfiles
2007-04-10 18:16:21 Task hobbitd terminated by signal 6
2007-04-10 18:16:21 Task bbdisplay terminated by signal 15

I will install gdb and see if I can get more info to provide but am
wondering if this is a known issue. I need help as I need to complete a
bb4->hobbit migration by next Friday, have a meeting scheduled tomorrow
to show my progress, and hobbit is "dead" at the moment.

Help?
stephen
Henrik Størner · Wed, 11 Apr 2007 07:55:32 +0200
On Tue, Apr 10, 2007 at 06:28:43PM -0700, Menton, Stephen wrote:
> 2007-04-10 18:13:17 Setting up logfiles
> 2007-04-10 18:16:21 Task hobbitd terminated by signal 6

This should leave a core dump in ~hobbit/server/tmp/.
Please run it through gdb and send the output; see
http://www.hswn.dk/hobbit/help/known-issues.html#bugreport
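
For readers unsure why signal 6 implies a core dump while signal 15 does not: on Solaris and Linux, signal 6 is SIGABRT (normally raised by abort()) and signal 15 is SIGTERM. POSIX gives SIGABRT a default action that terminates the process and may write a core file, whereas SIGTERM only terminates. A minimal sketch in C, where dumps_core_by_default() is a hypothetical helper written for this note and not part of hobbit:

```c
#include <signal.h>

/* Hypothetical helper, for illustration only (not hobbit code).
 * Returns 1 if the POSIX default action for signo is "terminate and
 * possibly write a core file", 0 if it is plain termination, and -1
 * for signals this sketch does not cover.  Only the two signals seen
 * in hobbitlaunch.log are handled. */
int dumps_core_by_default(int signo)
{
    switch (signo) {
      case SIGABRT: return 1;   /* raised by abort(); abnormal termination */
      case SIGTERM: return 0;   /* termination request; no core file */
      default:      return -1;  /* not covered here */
    }
}
```

Whether a core file is actually written also depends on system settings such as the core size limit, so treat this as a rule of thumb.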

If possible, send me (directly, off-list) a copy of your bb-hosts 
file, the hobbitserver.cfg file, and the ~hobbit/server/tmp/hobbitd.chk
file. Note that these have lots of information about the hosts you're
monitoring, so if that is considered confidential make sure you're
allowed to send it to me.


Regards,
Henrik
Henrik Størner · Thu, 12 Apr 2007 07:37:25 +0200
On Tue, Apr 10, 2007 at 06:28:43PM -0700, Menton, Stephen wrote:
> Additionally, both in my previous install of hobbit and the current
> snapshot I'm seeing the following messages written to
> /var/log/hobbit/hobbitlaunch.log:
> 2007-04-10 18:13:16 Setting up network listener on 0.0.0.0:1985
> 2007-04-10 18:13:16 Setting up local listener
> 2007-04-10 18:13:17 Task bbdisplay terminated by signal 15
> 2007-04-10 18:13:17 Setting up signal handlers
> 2007-04-10 18:13:17 Setting up hobbitd channels
> 2007-04-10 18:13:17 Setting up logfiles
> 2007-04-10 18:16:21 Task hobbitd terminated by signal 6

Stephen and I worked on this yesterday, and the problem turned out
to be a general one, which his particular set of tests just happened
to trigger easily. Specifically, he was sending a fairly large status
message from a script as one single line, i.e.
  $BB $BBDISP "status myhost.customtest green .... <several KB data>"
and this would cause hobbitd to crash the next time it had to
update the web pages.

You wouldn't have to have very long status messages to trigger this;
having a lot of smaller ones is enough. So this could be the reason
for some of the unexplained hobbitd crashes reported over time.

The attached patch solves this. I'll be updating the "allinone" patch
later today with this.


Regards,
Henrik

-------------- next part --------------
--- ../hobbit-4.2.0/hobbitd/hobbitd.c	2006-08-09 22:10:05.000000000 +0200
+++ hobbitd/hobbitd.c	2007-04-12 07:28:23.000000000 +0200
@@ -2167,6 +2175,7 @@
 		switch (f_type) {
 		  case F_ACKMSG: if (lwalk->ackmsg) needed += 2*strlen(lwalk->ackmsg); break;
 		  case F_DISMSG: if (lwalk->dismsg) needed += 2*strlen(lwalk->dismsg); break;
+		  case F_LINE1:
 		  case F_MSG: needed += 2*strlen(lwalk->message); break;
 
 		  case F_ACKLIST:
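
Reading the patch, the fix appears to be in a "measure first, then write" routine: a first pass adds up how much buffer each requested field may need (twice the string length, presumably to leave room for escaping), and a later pass writes the fields. Before the patch, the first-line field had no case in the measuring switch, so its bytes were written but never counted. The sketch below illustrates that class of bug in C. It is not the hobbitd code: the enum, both functions, the count_line1 switch, and the escaping rule (newline and backslash become two characters) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in field selectors; only the names echo the patch. */
enum field { F_LINE1, F_MSG };

/* Pass 1: worst-case space for one field, assuming every byte may be
 * escaped into two.  count_line1 == 0 imitates a switch that has no
 * case for F_LINE1, so that field contributes nothing to the total. */
size_t field_estimate(enum field f, const char *message, int count_line1)
{
    assert(message != NULL);
    switch (f) {
      case F_LINE1:
        if (!count_line1) return 0;      /* the forgotten case */
        return 2 * strlen(message);
      case F_MSG:
        return 2 * strlen(message);
    }
    return 0;
}

/* Pass 2: bytes really produced when the field is written.  F_LINE1
 * stops at the first newline; newline and backslash cost two bytes. */
size_t field_output_size(enum field f, const char *message)
{
    assert(message != NULL);
    size_t limit = (f == F_LINE1) ? strcspn(message, "\n") : strlen(message);
    size_t out = 0;
    for (size_t i = 0; i < limit; i++)
        out += (message[i] == '\n' || message[i] == '\\') ? 2 : 1;
    return out;
}

/* 1 if the estimate covers what will be written, 0 if it would overflow. */
int estimate_is_safe(enum field f, const char *message, int count_line1)
{
    return field_estimate(f, message, count_line1) >= field_output_size(f, message);
}
```

With a status message of several KB on a single line, estimate_is_safe(F_LINE1, msg, 0) returns 0, while the same call with count_line1 set to 1 returns 1. The shortfall is per message, which fits the observation that many smaller messages can add up to the same overrun.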