Xymon Mailing List Archive

Hobbit crashes

6 messages in this thread

list Christian Maxeiner · Wed, 1 Feb 2006 17:05:38 +0100 ·
Hi all,

I have just migrated from BB to Hobbit. I have run Hobbit beside my old
BB installation on port 1985 for testing and configuring. Today I have
shut down old BB and switched my Hobbit server and clients back to port
1984.

Everything worked fine but an hour later some of the hobbit processes
are crashing shortly after restarting hobbit. I am running hobbit
4.1.2p1 on HP-UX 11.11.

This is the output of the hobbitlaunch.log file:

...
2006-02-01 16:51:27 hobbitlaunch starting
2006-02-01 16:51:27 Loading tasklist configuration from /users/hobbit4.0//server/etc/hobbitlaunch.cfg
2006-02-01 16:51:27 Loading hostnames
2006-02-01 16:51:27 Loading saved state
2006-02-01 16:51:27 Setting up network listener on 0.0.0.0:1984
2006-02-01 16:51:27 Setting up signal handlers
2006-02-01 16:51:27 Setting up hobbitd channels
2006-02-01 16:51:27 Setting up logfiles
2006-02-01 16:51:34 Task hobbitd terminated by signal 6
2006-02-01 16:51:34 Task bbretest terminated by signal 15
2006-02-01 16:51:34 Task bbdisplay terminated by signal 15
2006-02-01 16:51:34 Task clientdata terminated, status 1
2006-02-01 16:51:34 Task rrddata terminated, status 1
2006-02-01 16:51:34 Task rrdstatus terminated, status 1
2006-02-01 16:51:34 Task bbhistory terminated, status 1
2006-02-01 16:51:34 Task bbstatus terminated, status 1
2006-02-01 16:51:34 Loading hostnames
2006-02-01 16:51:34 Loading saved state
2006-02-01 16:51:34 Setting up network listener on 0.0.0.0:1984
2006-02-01 16:51:34 Setting up signal handlers
2006-02-01 16:51:34 Setting up hobbitd channels
2006-02-01 16:51:34 Setting up logfiles
2006-02-01 16:51:34 Task hobbitclient terminated by signal 15
2006-02-01 16:51:34 Task bbnet terminated by signal 15
2006-02-01 16:52:32 Task hobbitd terminated by signal 6
2006-02-01 16:52:32 Task bbretest terminated by signal 15
2006-02-01 16:52:32 Task clientdata terminated, status 1
2006-02-01 16:52:32 Task rrddata terminated, status 1
2006-02-01 16:52:32 Task rrdstatus terminated, status 1
2006-02-01 16:52:32 Task bbhistory terminated, status 1
2006-02-01 16:52:32 Task bbstatus terminated, status 1
2006-02-01 16:52:32 Loading hostnames
2006-02-01 16:52:32 Loading saved state
2006-02-01 16:52:32 Setting up network listener on 0.0.0.0:1984
2006-02-01 16:52:32 Setting up signal handlers
2006-02-01 16:52:32 Setting up hobbitd channels
2006-02-01 16:52:32 Setting up logfiles
2006-02-01 16:52:37 Task bbdisplay terminated by signal 15
2006-02-01 16:53:32 Task hobbitd terminated by signal 6
2006-02-01 16:53:32 Task bbretest terminated by signal 15
2006-02-01 16:53:32 Task clientdata terminated, status 1
2006-02-01 16:53:32 Task rrddata terminated, status 1
2006-02-01 16:53:32 Task rrdstatus terminated, status 1
2006-02-01 16:53:32 Task bbhistory terminated, status 1
2006-02-01 16:53:32 Task bbstatus terminated, status 1
2006-02-01 16:53:32 Task bbdisplay terminated by signal 15
2006-02-01 16:56:29 Task bb-swap terminated, status 5
2006-02-01 16:56:51 Task bb-httpbench terminated, status 5
2006-02-01 17:01:30 Task bb-swap terminated, status 5
2006-02-01 17:01:52 Task bb-httpbench terminated, status 5
2006-02-01 17:02:36 Loading hostnames
2006-02-01 17:02:36 Loading saved state
2006-02-01 17:02:36 Setting up network listener on 0.0.0.0:1984
2006-02-01 17:02:36 Setting up signal handlers
2006-02-01 17:02:36 Setting up hobbitd channels
2006-02-01 17:02:36 Setting up logfiles
2006-02-01 17:03:33 Task hobbitd terminated by signal 6
2006-02-01 17:03:34 Task clientdata terminated, status 1
2006-02-01 17:03:34 Task rrddata terminated, status 1
2006-02-01 17:03:34 Task rrdstatus terminated, status 1
2006-02-01 17:03:34 Task bbhistory terminated, status 1
2006-02-01 17:03:34 Task bbstatus terminated, status 1
2006-02-01 17:03:34 Loading hostnames
2006-02-01 17:03:34 Loading saved state
2006-02-01 17:03:34 Setting up network listener on 0.0.0.0:1984
2006-02-01 17:03:34 Setting up signal handlers
2006-02-01 17:03:34 Setting up hobbitd channels
2006-02-01 17:03:34 Setting up logfiles
2006-02-01 17:03:34 Task bbdisplay terminated by signal 15

Thanks in advance for your help.

Chris
list Henrik Størner · Wed, 1 Feb 2006 17:14:34 +0100 ·
On Wed, Feb 01, 2006 at 05:05:38PM +0100, Maxeiner, Christian wrote:
> Everything worked fine but an hour later some of the hobbit processes
> are crashing shortly after restarting hobbit. I am running hobbit
> 4.1.2p1 on HP-UX 11.11.
>
> This is the output of the hobbitlaunch.log file:
>
> 2006-02-01 16:51:27 Setting up hobbitd channels
> 2006-02-01 16:51:27 Setting up logfiles
> 2006-02-01 16:51:34 Task hobbitd terminated by signal 6

What's in the hobbitd.log file? After the "Setting up logfiles",
that's where the hobbitd output goes.

This ought to generate a core-file in the ~hobbit/data/tmp/ directory.
Could you check this, and if there is a core file then run it through
gdb as described in http://www.hswn.dk/hobbit/help/known-issues.html#bugreport ?
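
For context on why a crash can show up as signal 6 together with a core file: one common arrangement is a SIGSEGV handler that calls abort(), so the process ends with SIGABRT (6) rather than SIGSEGV. The sketch below demonstrates that pattern in isolation on a POSIX system. The names crash_handler and child_death_signal are made up for this example, and the whole thing is an assumption about the general technique, not a statement of how hobbitd's own signal handling is written.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical crash handler: turn a segfault into abort() so the
 * process dies with SIGABRT (signal 6) and can leave a core file. */
static void crash_handler(int signum)
{
    (void)signum;
    signal(SIGABRT, SIG_DFL);   /* make sure abort() is not intercepted */
    abort();
}

/* Fork a child that installs the handler and then faults on purpose.
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 on error. */
int child_death_signal(void)
{
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        signal(SIGSEGV, crash_handler);
        raise(SIGSEGV);          /* stand-in for a real memory fault */
        _exit(0);                /* not reached if the handler aborts */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A launcher that reaps the child with waitpid() would report this as "terminated by signal 6", even though the underlying fault was a segmentation violation.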


Thanks,
Henrik
list Christian Maxeiner · Wed, 1 Feb 2006 17:18:20 +0100 ·
Hi Henrik,

these are the entries of hobbitd.log:

2006-02-01 16:02:18 Setup complete
2006-02-01 16:19:14 Setup complete
2006-02-01 16:19:20 Setup complete
2006-02-01 16:20:21 Setup complete
2006-02-01 16:30:23 Setup complete
2006-02-01 16:31:25 Setup complete
2006-02-01 16:32:24 Setup complete
2006-02-01 16:37:18 Setup complete
2006-02-01 16:37:25 Setup complete
2006-02-01 16:38:29 Setup complete
2006-02-01 16:42:00 Setup complete
2006-02-01 16:42:06 Setup complete
2006-02-01 16:43:10 Setup complete
2006-02-01 16:51:27 Setup complete
2006-02-01 16:51:34 Setup complete
2006-02-01 16:52:32 Setup complete
2006-02-01 17:02:36 Setup complete
2006-02-01 17:03:34 Setup complete
2006-02-01 17:04:34 Setup complete
2006-02-01 17:14:36 Setup complete

Seems to be normal output.

Output of gdb:

Hewlett-Packard Wildebeest 1.0 (based on GDB 4.16)
(built for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00)
Copyright 1996, 1997 Free Software Foundation, Inc...
Core was generated by `hobbitd'.
Program terminated with signal 6, Aborted.
 
warning: The shared libraries were not privately mapped; setting a
breakpoint in a shared library will not work until you rerun the program.
 
 
warning: Can't find file hobbitd referenced in dld_list.
#0  0xc020d5b8 in _kill () from /usr/lib/libc.2
#0  0xc020d5b8 in _kill () from /usr/lib/libc.2
(gdb) bt
#0  0xc020d5b8 in _kill () from /usr/lib/libc.2
#1  0xc01a6f7c in raise () from /usr/lib/libc.2
#2  0xc01e81e0 in _abort_C () from /usr/lib/libc.2
#3  0xc01e823c in _abort () from /usr/lib/libc.2
#4  0x12bc8 in sigsegv_handler (signum=2063889120) at sig.c:57
#5  <signal handler called>
#6  0xc0199038 in _sigfillset () from /usr/lib/libc.2
#7  0xc0195bec in _sscanf () from /usr/lib/libc.2
#8  0xc019b510 in realloc () from /usr/lib/libc.2
#9  0x104a8 in xrealloc (ptr=0x4010dffc, size=0) at memory.c:149
#10 0x8aac in do_message (msg=0x3c610c68, origin=0x0) at hobbitd.c:2222
#11 0xc17c in main (argc=10485759, argv=0x40009cb8) at hobbitd.c:3512
#12 0xc0143478 in _start () from /usr/lib/libc.2
(gdb) 

Thanks, Chris
list Henrik Størner · Wed, 1 Feb 2006 19:29:10 +0100 ·
On Wed, Feb 01, 2006 at 05:18:20PM +0100, Maxeiner, Christian wrote:
> Output of gdb:
> (gdb) bt
> #5  <signal handler called>
> #6  0xc0199038 in _sigfillset () from /usr/lib/libc.2
> #7  0xc0195bec in _sscanf () from /usr/lib/libc.2
> #8  0xc019b510 in realloc () from /usr/lib/libc.2
> #9  0x104a8 in xrealloc (ptr=0x4010dffc, size=0) at memory.c:149
> #10 0x8aac in do_message (msg=0x3c610c68, origin=0x0) at hobbitd.c:2222
> #11 0xc17c in main (argc=10485759, argv=0x40009cb8) at hobbitd.c:3512

Very odd. The interesting thing is that hobbitd here is doing a 
re-allocation of a buffer, but asking for 0 bytes - and apparently,
HP-UX doesn't like that.

But I don't see how it can get to asking for 0 bytes in that part
of the code...
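
As background, calling realloc() with a size of 0 is implementation-defined in C99 and C11: a C library may return NULL (possibly freeing the block) or return a minimal allocation. Whether that is what actually went wrong here is not established. A minimal defensive sketch, assuming a hypothetical wrapper named realloc_nonzero (this is not the xrealloc() from Hobbit's memory.c), simply avoids ever passing 0:

```c
#include <stdlib.h>

/* Hypothetical wrapper, not Hobbit's xrealloc(): never hand realloc()
 * a zero size, because the result then depends on the C library. */
void *realloc_nonzero(void *ptr, size_t size)
{
    if (size == 0) size = 1;    /* keep a valid 1-byte block instead */
    return realloc(ptr, size);  /* NULL still signals out-of-memory */
}
```

With this wrapper a NULL return can only mean an allocation failure, which makes the caller's error check unambiguous.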

Could you start gdb again, but instead of the "bt" command do this:

gdb> fr 10
gdb> p used
gdb> p needed
gdb> p bufsz
gdb> p bufp
gdb> p buf


and mail me the output?


Thanks,
Henrik
list Christian Maxeiner · Wed, 1 Feb 2006 19:34:35 +0100 ·
Henrik,

here's the output:

#0  0xc020d5b8 in _kill () from /usr/lib/libc.2
#0  0xc020d5b8 in _kill () from /usr/lib/libc.2

(gdb) fr 10
#10 0x8aac in do_message (msg=0x6e651a64, origin=0x0) at hobbitd.c:2222
2222                                                    buf = (char *)realloc(buf, bufsz);
(gdb) p used
$1 = 1074320844
(gdb) p needed
$2 = 1024
(gdb) p bufsz
$3 = 30832
(gdb) p bufp
$4 = (char *) 0x0
(gdb) p buf
$5 = (char *) 0x8c70 "\b\034\002X\204`!0\013\205\n%4\023"
(gdb)  

Thanks, Chris.


list Henrik Størner · Wed, 1 Feb 2006 20:44:08 +0100 ·
On Wed, Feb 01, 2006 at 07:34:35PM +0100, Maxeiner, Christian wrote:
> Henrik,
>
> here's the output:
>
> #0  0xc020d5b8 in _kill () from /usr/lib/libc.2
> #0  0xc020d5b8 in _kill () from /usr/lib/libc.2
> (gdb) fr 10
> #10 0x8aac in do_message (msg=0x6e651a64, origin=0x0) at hobbitd.c:2222
> 2222                                                    buf = (char *)realloc(buf, bufsz);
> (gdb) p used
> $1 = 1074320844
> (gdb) p needed
> $2 = 1024
> (gdb) p bufsz
> $3 = 30832
> (gdb) p bufp
> $4 = (char *) 0x0
> (gdb) p buf
> $5 = (char *) 0x8c70 "\b\034\002X\204`!0\013\205\n%4\023"

Still very odd. The "used" number is insanely high, and bufp is 0. The
first could be a result of the latter, but I cannot understand how that
could happen (bufp starts out being the same as buf, and it only grows).
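
The relationship between these variables can be illustrated with a generic growable buffer. The sketch below is an assumption about the general pattern (the actual code around hobbitd.c:2222 is not shown in this thread). It reuses the names buf, bufp, bufsz, used and needed, while struct growbuf and growbuf_append are hypothetical. Because used is derived as bufp - buf, a bufp that has been zeroed, or left stale after a realloc(), would yield a nonsensical value of used.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Generic growable text buffer; the field names mirror the variables
 * printed in gdb, but the logic is an assumption, not hobbitd.c. */
struct growbuf {
    char  *buf;    /* start of the allocation */
    char  *bufp;   /* next free byte, always inside buf */
    size_t bufsz;  /* allocated size in bytes */
};

/* Append len bytes plus a terminating NUL.
 * Returns 0 on success, -1 if memory could not be allocated. */
int growbuf_append(struct growbuf *g, const char *data, size_t len)
{
    size_t used   = g->buf ? (size_t)(g->bufp - g->buf) : 0;
    size_t needed = len + 1;

    if (g->bufsz - used < needed) {
        size_t newsz  = g->bufsz + needed + 1024;
        char  *newbuf = realloc(g->buf, newsz);
        if (newbuf == NULL) return -1;
        g->buf   = newbuf;
        g->bufsz = newsz;
        g->bufp  = newbuf + used;  /* re-derive: realloc may move the block */
    }

    memcpy(g->bufp, data, len);
    g->bufp += len;
    *g->bufp = '\0';
    return 0;
}
```

Starting from a zero-initialised struct growbuf, repeated calls keep bufp inside the allocation, and re-deriving bufp from the saved offset after every realloc() is what keeps used meaningful.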


What happens if you shut down Hobbit, rename the
~hobbit/data/tmp/hobbitd.chk file to something else, and start
Hobbit up again?

If that makes it work, I'd very much like to have a copy of that file.
It will include a lot of internal information about your tests, so make
sure you have permission to make it available to me. And send it to me
directly - user-ce4a2c883f75@xymon.invalid - instead of to the list.


Regards,
Henrik