Xymon Mailing List Archive

bbdisplay problems after adding some new hosts

6 messages in this thread

list Stefan Loos · Wed, 11 May 2005 07:41:37 +0000 ·
Hi,

yesterday I added some new hosts to my Hobbit server, and shortly after that
Hobbit started having problems.
Here is what hobbitlaunch.log says:

2005-05-11 09:11:18 Heartbeat lost for task hobbitd, bouncing it
2005-05-11 09:11:18 Task bbretest started with PID 4523
2005-05-11 09:11:23 Heartbeat lost for task hobbitd, killing it
2005-05-11 09:11:23 Task bbdisplay started with PID 4524
2005-05-11 09:11:23 Task hobbitd terminated by signal 9
2005-05-11 09:11:23 Task hobbitd started with PID 4525
2005-05-11 09:11:23 Loading hostnames
2005-05-11 09:11:23 Loading saved state
2005-05-11 09:11:23 Setting up network listener on 0.0.0.0:1984
2005-05-11 09:11:23 Setting up signal handlers
2005-05-11 09:11:23 Setting up hobbitd channels
2005-05-11 09:11:23 Setting up logfiles
2005-05-11 09:11:28 Task bbhistory started with PID 4527
2005-05-11 09:11:28 Task bbenadis started with PID 4528
2005-05-11 09:11:28 Task bbpage started with PID 4530
2005-05-11 09:11:28 Task larrdstatus started with PID 4532
2005-05-11 09:11:28 Task larrddata started with PID 4534
2005-05-11 09:12:18 Task bbretest started with PID 4541
2005-05-11 09:12:23 Task bbdisplay started with PID 4542
2005-05-11 09:12:43 Heartbeat lost for task hobbitd, bouncing it
2005-05-11 09:12:48 Heartbeat lost for task hobbitd, killing it
2005-05-11 09:12:48 Task hobbitd terminated by signal 9
2005-05-11 09:12:48 Task bbdisplay terminated by signal 15
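
A minimal sketch for summarizing a log like the one above: it counts how often each task lost its heartbeat and which signals terminated which tasks. The two patterns are taken only from the lines shown in this excerpt, and the name `summarize` is hypothetical, not part of Hobbit.

```python
import re
from collections import Counter

# Patterns are based only on the lines shown in this thread; any other
# hobbitlaunch messages are ignored (an assumption of this sketch).
HEARTBEAT_RE = re.compile(r"Heartbeat lost for task (\w+), (bouncing|killing) it$")
TERMINATED_RE = re.compile(r"Task (\w+) terminated by signal (\d+)$")


def summarize(log_text):
    """Count heartbeat losses per task and terminations per (task, signal)."""
    heartbeat_losses = Counter()
    terminations = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        m = HEARTBEAT_RE.search(line)
        if m:
            heartbeat_losses[m.group(1)] += 1
            continue
        m = TERMINATED_RE.search(line)
        if m:
            terminations[(m.group(1), int(m.group(2)))] += 1
    return heartbeat_losses, terminations


# Excerpt of the log above (task start-up lines mostly omitted).
SAMPLE = """\
2005-05-11 09:11:18 Heartbeat lost for task hobbitd, bouncing it
2005-05-11 09:11:23 Heartbeat lost for task hobbitd, killing it
2005-05-11 09:11:23 Task hobbitd terminated by signal 9
2005-05-11 09:11:23 Task hobbitd started with PID 4525
2005-05-11 09:12:43 Heartbeat lost for task hobbitd, bouncing it
2005-05-11 09:12:48 Heartbeat lost for task hobbitd, killing it
2005-05-11 09:12:48 Task hobbitd terminated by signal 9
2005-05-11 09:12:48 Task bbdisplay terminated by signal 15
"""

losses, kills = summarize(SAMPLE)
print(dict(losses))
print(dict(kills))
```

In this excerpt only hobbitd loses its heartbeat, and it is killed with signal 9 both times; bbdisplay is stopped with signal 15 only once, at the end.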

So I tried to find out which component causes the problem: I disabled
everything in hobbitlaunch.cfg and re-enabled the tasks one by one.
I found that these errors occur every time I enable bbdisplay.
The bb-display.log looks like this:

2005-05-11 09:09:48 Whoops ! bb failed to send message - timeout
2005-05-11 09:09:48 hobbitd status-board not available
2005-05-11 09:09:53 Whoops ! bb failed to send message - timeout
2005-05-11 09:10:53 Whoops ! bb failed to send message - timeout
2005-05-11 09:10:53 hobbitd status-board not available
2005-05-11 09:11:23 Could not connect to bbd at 10.207.193.41:1984 - Connection refused
2005-05-11 09:11:23 Whoops ! bb failed to send message - Connection failed
2005-05-11 09:11:23 hobbitd status-board not available
2005-05-11 09:11:23 Could not connect to bbd at 10.207.193.41:1984 - Connection refused
2005-05-11 09:11:23 Whoops ! bb failed to send message - Connection failed

I also found some core files in ~server/tmp, but I'm pretty sure they came
from killing hobbit. Nevertheless, I ran gdb:

GNU gdb Red Hat Linux (6.1post-1.20040607.52rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Core was generated by `hobbitd --debug --pidfile=/var/log/hobbit/hobbitd.pid --restart=/usr/local/hobb'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x00df4cef in raise () from /lib/tls/libc.so.6
(gdb) bt
#0  0x00df4cef in raise () from /lib/tls/libc.so.6
#1  0x00df64f5 in abort () from /lib/tls/libc.so.6
#2  0x08054126 in sigsegv_handler (signum=11) at sig.c:57
#3  <signal handler called>
#4  0x00e46cac in mempcpy () from /lib/tls/libc.so.6
#5  0x00e3a4d2 in _IO_default_xsputn_internal () from /lib/tls/libc.so.6
#6  0x00e13527 in vfprintf () from /lib/tls/libc.so.6
#7  0x00e2f3dc in vsprintf () from /lib/tls/libc.so.6
#8  0x00e1a03d in sprintf () from /lib/tls/libc.so.6
#9  0x0804d7a4 in do_message (msg=0x9e0b3f8, origin=0x80554bb "") at hobbitd.c:1903
#10 0x0804fcb5 in main (argc=8, argv=0xbfff9084) at hobbitd.c:2944
(gdb)
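
A note on reading this backtrace: signal 6 (abort) is raised by hobbitd's own sigsegv_handler (frame #2, signum=11), so the underlying fault is a segmentation fault rather than the kill sent by hobbitlaunch. SIGKILL (signal 9) does not produce a core file. The sketch below picks out the first frame with source-file information beneath the `<signal handler called>` marker. The regular expression and the name `faulting_frame` are assumptions for illustration and only cover the frame formats seen in this backtrace.

```python
import re

# Matches frames such as
#   "#9  0x0804d7a4 in do_message (msg=...) at hobbitd.c:1903"
# Frames without "at FILE:LINE" (for example libc frames) do not match.
SOURCE_FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(.*\) at ([\w./-]+):(\d+)$"
)


def faulting_frame(backtrace):
    """Return (frame, function, file, line) of the first source-level frame
    below the "<signal handler called>" marker, or None if there is none."""
    below_handler = False
    for line in backtrace.splitlines():
        line = line.strip()
        if "<signal handler called>" in line:
            below_handler = True
            continue
        m = SOURCE_FRAME_RE.match(line)
        if below_handler and m:
            return int(m.group(1)), m.group(2), m.group(3), int(m.group(4))
    return None


# The backtrace from this mail, with the wrapped frame #9 joined.
BACKTRACE = """\
#0  0x00df4cef in raise () from /lib/tls/libc.so.6
#1  0x00df64f5 in abort () from /lib/tls/libc.so.6
#2  0x08054126 in sigsegv_handler (signum=11) at sig.c:57
#3  <signal handler called>
#4  0x00e46cac in mempcpy () from /lib/tls/libc.so.6
#5  0x00e3a4d2 in _IO_default_xsputn_internal () from /lib/tls/libc.so.6
#6  0x00e13527 in vfprintf () from /lib/tls/libc.so.6
#7  0x00e2f3dc in vsprintf () from /lib/tls/libc.so.6
#8  0x00e1a03d in sprintf () from /lib/tls/libc.so.6
#9  0x0804d7a4 in do_message (msg=0x9e0b3f8, origin=0x80554bb "") at hobbitd.c:1903
#10 0x0804fcb5 in main (argc=8, argv=0xbfff9084) at hobbitd.c:2944
"""

print(faulting_frame(BACKTRACE))
```

For this backtrace the function returns frame #9, the sprintf() call made from do_message at hobbitd.c:1903. What is being formatted there cannot be told from the backtrace alone.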

Now I am trying to find out which of the new hosts, and which test, causes the
problems...

Regards,

Stefan Loos
list Henrik Størner · Wed, 11 May 2005 11:06:13 +0200 ·
Could you try removing the "HEARTBEAT" line from hobbitlaunch.cfg and
see if things run OK after that ?


Regards,
Henrik
-- 

Henrik Storner
list Stefan Loos · Wed, 11 May 2005 09:49:30 +0000 ·
Hi Henrik,

The errors in hobbitlaunch.log are gone now, but the ones in bb-display.log are still there. Another strange thing: since I re-enabled bbdisplay this morning, I haven't seen any hosts on the Hobbit server. Only the subpages and groups are there.

Regards,

Stefan Loos

list Stefan Loos · Thu, 12 May 2005 12:31:44 +0000 ·
Hello,

Is the number of tests per host limited? I've cleaned up my bb-hosts, and the problem appears when I add one host with many tests. I disabled all customized tests and tried to find out whether a single test is responsible. But no test crashed the Hobbit server when run by itself, while running them all together did!
The host runs about 20 tests.

Regards,

Stefan


list Henrik Størner · Sun, 15 May 2005 08:46:01 +0200 ·
quoted from Stefan Loos
On Thu, May 12, 2005 at 12:31:44PM +0000, Stefan Loos wrote:
> Is the number of tests per host limited? I've cleaned up my bb-hosts, and the problem appears when I add one host with many tests. I disabled all customized tests and tried to find out whether a single test is responsible. But no test crashed the Hobbit server when run by itself, while running them all together did!
> The host runs about 20 tests.

No, there is no limit (other than running out of memory, but I think we
can rule out that one).

It seems to crash while loading the checkpoint file. Could you send me
that file? It's the ~hobbit/server/tmp/hobbitd.chk file. I believe it
has somehow become corrupted, but that still shouldn't crash the server.


Regards,
Henrik
list Stefan Loos · Tue, 17 May 2005 05:34:49 +0000 ·
Hello Henrik,

I will send you the checkpoint file directly.
But I found out that it seems to be a single test after all (the first time I didn't wait long enough).
Every time I send the attached status message to my Hobbit server, it crashes after 5-10 minutes.
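
To reproduce this on a test server, a status message can be replayed from a small script. This is only a sketch under stated assumptions: it assumes the BB-compatible format `status HOSTNAME.TESTNAME COLOR text` with dots in the host name written as commas, and that the message is sent over plain TCP to the hobbitd port (1984 in the logs earlier in this thread). The function names, host name, test name and text are placeholders, not the message attached to this mail.

```python
import socket


def build_status(hostname, testname, color, text):
    """Build a BB-style status message (format assumed, see the note above)."""
    # Dots in the host name are written as commas, so the last dot in
    # "HOSTNAME.TESTNAME" separates the host name from the test name.
    return "status {}.{} {} {}".format(
        hostname.replace(".", ","), testname, color, text
    )


def send_status(server, message, port=1984, timeout=10.0):
    """Send one message to the server and return whatever it answers."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(message.encode("latin-1"))
        conn.shutdown(socket.SHUT_WR)  # tell the server the message is complete
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Placeholder values, not the message attached to this mail.
msg = build_status("www.example.com", "mytest", "green", "all is well")
print(msg)
# send_status("127.0.0.1", msg)  # uncomment against a test server only
```

Sending the saved message this way against a scratch installation keeps the production server out of the experiment while waiting the 5-10 minutes for the crash.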

Regards,

Stefan


Attachments (1)