Xymon Mailing List Archive

Bug latest snapshot, hobbitd_client

4 messages in this thread

list David W David Gore · Wed, 19 Mar 2008 01:31:54 +0000 ·
[hobbit@hobbit2 server]$ file tmp/core.21567
tmp/core.21567: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
SVR4-style, from 'hobbitd_client'

[hobbit@hobbit2 server]$ file tmp/core.21600
tmp/core.21600: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
SVR4-style, from 'hobbitd_client'

[hobbit@hobbit2 server]$ ls -al tmp/core.21567 tmp/core.21600
-rw-------  1 hobbit hobbit 5210112 Mar 19 00:23 tmp/core.21567
-rw-------  1 hobbit hobbit 5210112 Mar 19 00:23 tmp/core.21600

It dumps core in pairs every 1-5 minutes or so:

-rw-------  1 hobbit hobbit  5210112 Mar 19 00:23 tmp/core.21600
-rw-------  1 hobbit hobbit  5210112 Mar 19 00:23 tmp/core.21567
-rw-------  1 hobbit hobbit  4038656 Mar 19 00:27 tmp/core.21841
-rw-------  1 hobbit hobbit  4038656 Mar 19 00:27 tmp/core.21602
-rw-------  1 hobbit hobbit 49213440 Mar 19 00:31 tmp/core.21520
-rw-------  1 hobbit hobbit  5505024 Mar 19 00:32 tmp/core.22115
-rw-------  1 hobbit hobbit  5505024 Mar 19 00:32 tmp/core.22109
-rw-------  1 hobbit hobbit  4227072 Mar 19 00:36 tmp/core.22378
-rw-------  1 hobbit hobbit  4227072 Mar 19 00:36 tmp/core.22169
-rw-------  1 hobbit hobbit  3776512 Mar 19 00:38 tmp/core.22439
-rw-------  1 hobbit hobbit  3776512 Mar 19 00:38 tmp/core.22379
-rw-------  1 hobbit hobbit  5881856 Mar 19 00:43 tmp/core.22706
-rw-------  1 hobbit hobbit  5881856 Mar 19 00:43 tmp/core.22441
-rw-------  1 hobbit hobbit  3584000 Mar 19 00:44 tmp/core.22715
-rw-------  1 hobbit hobbit  3584000 Mar 19 00:44 tmp/core.22707
-rw-------  1 hobbit hobbit  4902912 Mar 19 00:48 tmp/core.22968
-rw-------  1 hobbit hobbit  4902912 Mar 19 00:48 tmp/core.22716
-rw-------  1 hobbit hobbit  5398528 Mar 19 00:51 tmp/core.23165
-rw-------  1 hobbit hobbit  5398528 Mar 19 00:51 tmp/core.22969
-rw-------  1 hobbit hobbit  4841472 Mar 19 00:53 tmp/core.23233
-rw-------  1 hobbit hobbit  4841472 Mar 19 00:53 tmp/core.23166
-rw-------  1 hobbit hobbit  3964928 Mar 19 00:58 tmp/core.23493
-rw-------  1 hobbit hobbit  3964928 Mar 19 00:58 tmp/core.23234
-rw-------  1 hobbit hobbit  3817472 Mar 19 01:03 tmp/core.23836
-rw-------  1 hobbit hobbit  3817472 Mar 19 01:03 tmp/core.23494
-rw-------  1 hobbit hobbit 54190080 Mar 19 01:07 tmp/core.22100
-rw-------  1 hobbit hobbit  5402624 Mar 19 01:12 tmp/core.24304
-rw-------  1 hobbit hobbit  5402624 Mar 19 01:12 tmp/core.24095
-rw-------  1 hobbit hobbit  4055040 Mar 19 01:13 tmp/core.24367
-rw-------  1 hobbit hobbit  4055040 Mar 19 01:13 tmp/core.24305
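
The pairing can be checked by grouping the listing by size and timestamp. The Python sketch below is only an illustration: `LISTING` and `group_cores` are hypothetical names, the sample is a handful of lines copied from the listing above, and it assumes the nine-column `ls -al` format shown there.

```python
from collections import defaultdict

# A few lines copied from the ls -al listing above.
LISTING = """\
-rw-------  1 hobbit hobbit  5210112 Mar 19 00:23 tmp/core.21600
-rw-------  1 hobbit hobbit  5210112 Mar 19 00:23 tmp/core.21567
-rw-------  1 hobbit hobbit  4038656 Mar 19 00:27 tmp/core.21841
-rw-------  1 hobbit hobbit  4038656 Mar 19 00:27 tmp/core.21602
-rw-------  1 hobbit hobbit 49213440 Mar 19 00:31 tmp/core.21520
"""


def group_cores(listing):
    """Group core files that share the same timestamp and size."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # skip anything that is not a full ls -l entry
        size = int(fields[4])
        stamp = " ".join(fields[5:8])  # e.g. "Mar 19 00:23"
        groups[(stamp, size)].append(fields[8])
    return dict(groups)


groups = group_cores(LISTING)
for (stamp, size), names in sorted(groups.items()):
    print(stamp, size, len(names), " ".join(names))
```

Groups with two members are the pairs; a group with a single, much larger file (such as core.21520) stands out as a likely dump from a different program.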

I am not sure I should post my gdb backtrace here, but it has been
dumping core for at least a week, perhaps longer, with different daily
snapshots.  We use the same configs on a much older snapshot with no
problems.  I am not sure of the date of the stable snapshot; the version
is listed as Hobbit Monitor 4.3.0-0.20071026.  We are running Red Hat
Enterprise Linux 4.0.  After a while the core files fill up the file
system.

As a side note, I thought I reported this a month or so ago, but the
files column is mangled for some hosts and shows duplicate file entries,
like /etc/hosts listed twice or even 3 times on the web page.

Does this mean hobbitd itself is crashing or stopping?

[hobbit@hobbit2 logs]$ cat hobbitlaunch.log
2008-03-19 00:22:14 hobbitlaunch starting
2008-03-19 00:22:14 Loading tasklist configuration from
/home/hobbit/server/etc/hobbitlaunch.cfg
2008-03-19 00:22:14 Loading hostnames
2008-03-19 00:22:14 Loading saved state
2008-03-19 00:22:15 Setting up network listener on 0.0.0.0:1984
2008-03-19 00:22:15 Setting up local listener
2008-03-19 00:22:15 Setting up signal handlers
2008-03-19 00:22:15 Setting up hobbitd channels
2008-03-19 00:22:15 Setting up logfiles
2008-03-19 00:31:46 Task hobbitd terminated by signal 6
2008-03-19 00:31:46 Loading hostnames
2008-03-19 00:31:46 Loading saved state
2008-03-19 00:31:47 Setting up network listener on 0.0.0.0:1984
2008-03-19 00:31:47 Setting up local listener
2008-03-19 00:31:47 Setting up signal handlers
2008-03-19 00:31:47 Setting up hobbitd channels
2008-03-19 00:31:47 Setting up logfiles
2008-03-19 01:07:56 Task hobbitd terminated by signal 6
2008-03-19 01:07:56 Task bbnet terminated by signal 15
2008-03-19 01:07:56 Loading hostnames
2008-03-19 01:07:57 Loading saved state
2008-03-19 01:07:57 Setting up network listener on 0.0.0.0:1984
2008-03-19 01:07:57 Setting up local listener
2008-03-19 01:07:57 Setting up signal handlers
2008-03-19 01:07:57 Setting up hobbitd channels
2008-03-19 01:07:57 Setting up logfiles
2008-03-19 01:17:55 Task hobbitd terminated by signal 6
2008-03-19 01:17:55 Task bbnet terminated by signal 15
2008-03-19 01:17:55 Loading hostnames
2008-03-19 01:17:55 Loading saved state
2008-03-19 01:17:56 Setting up network listener on 0.0.0.0:1984
2008-03-19 01:17:56 Setting up local listener
2008-03-19 01:17:56 Setting up signal handlers
2008-03-19 01:17:56 Setting up hobbitd channels
2008-03-19 01:17:56 Setting up logfiles
2008-03-19 01:21:32 Task hobbitd terminated by signal 6
2008-03-19 01:21:33 Loading hostnames
2008-03-19 01:21:33 Loading saved state
2008-03-19 01:21:33 Setting up network listener on 0.0.0.0:1984
2008-03-19 01:21:33 Setting up local listener
2008-03-19 01:21:33 Setting up signal handlers
2008-03-19 01:21:33 Setting up hobbitd channels
2008-03-19 01:21:33 Setting up logfiles


Perhaps this helps:

[hobbit@hobbit2 logs]$ cat clientdata.log
2008-03-19 00:22:20 Peer not up, flushing message queue
2008-03-19 00:23:52 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:23:52 Peer not up, flushing message queue
2008-03-19 00:27:20 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:27:22 Peer not up, flushing message queue
2008-03-19 00:31:53 Peer not up, flushing message queue
2008-03-19 00:32:23 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:32:23 Peer not up, flushing message queue
2008-03-19 00:32:25 Peer not up, flushing message queue
2008-03-19 00:32:26 Peer not up, flushing message queue
2008-03-19 00:32:29 Peer not up, flushing message queue
2008-03-19 00:32:32 Peer not up, flushing message queue
2008-03-19 00:32:34 Peer not up, flushing message queue
2008-03-19 00:32:37 Peer not up, flushing message queue
2008-03-19 00:32:38 Peer not up, flushing message queue
2008-03-19 00:32:42 Peer not up, flushing message queue
2008-03-19 00:32:44 Peer not up, flushing message queue
2008-03-19 00:32:45 Peer not up, flushing message queue
2008-03-19 00:32:46 Peer not up, flushing message queue
2008-03-19 00:32:48 Peer not up, flushing message queue
2008-03-19 00:32:50 Peer not up, flushing message queue
2008-03-19 00:36:42 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:36:44 Peer not up, flushing message queue
2008-03-19 00:38:11 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:38:12 Peer not up, flushing message queue
2008-03-19 00:43:53 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:43:54 Peer not up, flushing message queue
2008-03-19 00:44:42 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:44:44 Peer not up, flushing message queue
2008-03-19 00:44:44 Peer not up, flushing message queue
2008-03-19 00:44:44 Peer not up, flushing message queue
2008-03-19 00:44:45 Peer not up, flushing message queue
2008-03-19 00:44:45 Peer not up, flushing message queue
2008-03-19 00:44:46 Peer not up, flushing message queue
2008-03-19 00:44:49 Peer not up, flushing message queue
2008-03-19 00:44:50 Peer not up, flushing message queue
2008-03-19 00:44:52 Peer not up, flushing message queue
2008-03-19 00:44:53 Peer not up, flushing message queue
2008-03-19 00:48:50 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:48:51 Peer not up, flushing message queue
2008-03-19 00:51:55 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:51:56 Peer not up, flushing message queue
2008-03-19 00:53:54 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:53:56 Peer not up, flushing message queue
2008-03-19 00:58:54 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 00:58:55 Peer not up, flushing message queue
2008-03-19 01:03:56 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 01:03:56 Peer not up, flushing message queue
2008-03-19 01:08:02 Peer not up, flushing message queue
2008-03-19 01:12:24 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 01:12:24 Peer not up, flushing message queue
2008-03-19 01:13:56 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 01:13:57 Peer not up, flushing message queue
2008-03-19 01:17:17 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 01:17:19 Peer not up, flushing message queue
2008-03-19 01:18:03 Peer not up, flushing message queue
2008-03-19 01:18:58 Peer at 0.0.0.0:0 failed: Broken pipe
2008-03-19 01:18:59 Peer not up, flushing message queue
2008-03-19 01:19:00 Peer not up, flushing message queue
2008-03-19 01:19:01 Peer not up, flushing message queue
2008-03-19 01:19:02 Peer not up, flushing message queue
2008-03-19 01:19:02 Peer not up, flushing message queue
2008-03-19 01:21:38 Peer not up, flushing message queue

David
list Dirk Kastens · Wed, 19 Mar 2008 08:36:44 +0100 ·
Hi,
quoted from David W David Gore

Gore, David W (David) wrote:
As a side note, I thought I reported this a month or so ago, but the
files column is mangled for some hosts and shows duplicate file entries,
like /etc/hosts listed twice or even 3 times on the web page.
I had the same problem with one of the February snapshots and had to 
return to a snapshot from November. I reported this to the list but 
didn't receive an answer.

-- 
Regards,

Dirk Kastens
Universitaet Osnabrueck, Rechenzentrum (Computer Center)
Albrechtstr. 28, 49069 Osnabrueck, Germany
Tel.: +XX-XXX-XXX-XXXX, FAX: -2470
list Henrik Størner · Wed, 19 Mar 2008 09:08:01 +0100 ·
It seems several of your Hobbit programs are dumping core. These three
cores are probably not from the same program, since they differ so
much in size:
quoted from David W David Gore
-rw-------  1 hobbit hobbit  5210112 Mar 19 00:23 tmp/core.21600
-rw-------  1 hobbit hobbit  4038656 Mar 19 00:27 tmp/core.21602
-rw-------  1 hobbit hobbit 49213440 Mar 19 00:31 tmp/core.21520
I'd suspect that the .21520 core was from hobbitd, since its timestamp
matches the log-file entry indicating that hobbitd has crashed.
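
The timestamp comparison can also be automated. The following Python sketch is a rough heuristic and not part of Hobbit: `LOG`, `CORES`, `crashes` and `match_cores` are hypothetical names, the sample data is copied from the log and listing earlier in the thread, the year 2008 is assumed for the `ls` timestamps, and only signals that dump core by default on Linux are considered.

```python
import re
import signal
from datetime import datetime

# Crash lines copied from hobbitlaunch.log earlier in the thread.
LOG = """\
2008-03-19 00:31:46 Task hobbitd terminated by signal 6
2008-03-19 01:07:56 Task hobbitd terminated by signal 6
2008-03-19 01:07:56 Task bbnet terminated by signal 15
"""

# (timestamp from ls, file name) for a few cores from the listing.
CORES = [
    ("Mar 19 00:23", "tmp/core.21600"),
    ("Mar 19 00:31", "tmp/core.21520"),
    ("Mar 19 01:07", "tmp/core.22100"),
]

# Signals whose default action is to dump core (names as on Linux).
CORE_SIGNALS = {"SIGQUIT", "SIGILL", "SIGABRT", "SIGFPE", "SIGSEGV", "SIGBUS"}

TASK_RE = re.compile(r"^(\S+ \S+) Task (\S+) terminated by signal (\d+)$")


def crashes(log):
    """Yield (time, task, signal name) for each terminated task in the log."""
    for line in log.splitlines():
        m = TASK_RE.match(line.strip())
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            yield when, m.group(2), signal.Signals(int(m.group(3))).name


def match_cores(log, cores, year=2008):
    """Pair each core-dumping crash with cores written in the same minute."""
    matches = []
    for when, task, signame in crashes(log):
        if signame not in CORE_SIGNALS:
            continue  # e.g. SIGTERM does not leave a core file
        for stamp, name in cores:
            core_time = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M")
            if core_time == when.replace(second=0):
                matches.append((task, signame, name))
    return matches


matches = match_cores(LOG, CORES)
for task, signame, name in matches:
    print(task, signame, name)
```

A match only says that a core was written in the same minute as a logged crash; running `file` on the core is still needed to confirm which program produced it.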

Before doing anything else, please re-build Hobbit - and run a "make
clean" before doing the "make; make install". If the problem persists
after that, I would like to see the gdb backtrace from the different
programs that crash.
quoted from David W David Gore
I am not sure I should post my gdb backtrace here, but it has been
dumping core for at least a week, perhaps longer, with different daily
snapshots.  We use the same configs on a much older snapshot with no
problems.  I am not sure of the date of the stable snapshot
The best way is to extract the version numbers of each of the
source files like this:

$ strings ~hobbit/server/bin/hobbitd|grep \$Id:
$Id: hobbitd.c,v 1.279 2008/03/02 12:49:40 henrik Exp henrik $
$Id: hobbitd_buffer.c,v 1.10 2008/01/03 10:08:13 henrik Exp $
...lots more lines...
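
Where `strings` is not available, the same check can be approximated in Python. This is a minimal sketch under stated assumptions: `rcs_ids` and `demo_path` are hypothetical names, the demo file is a fake binary containing one keyword copied from the output above, and the pattern assumes the keywords have the usual `$Id: ... $` form.

```python
import re
import tempfile

# RCS/CVS keyword as embedded in the binary: "$Id: ... $"
ID_RE = re.compile(rb"\$Id: [^$\n]*\$")


def rcs_ids(path):
    """Return the $Id: ... $ keywords found in a binary file."""
    with open(path, "rb") as fh:
        data = fh.read()
    return [m.decode("ascii", "replace") for m in ID_RE.findall(data)]


# Demonstration on a fake "binary" holding one keyword from the output above.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(
        b"\x7fELF\x00\x00"
        b"$Id: hobbitd_buffer.c,v 1.10 2008/01/03 10:08:13 henrik Exp $"
        b"\x00junk"
    )
    demo_path = tmp.name

ids = rcs_ids(demo_path)
for line in ids:
    print(line)
```

An empty result means the binary carries no such keywords, in which case the snapshot date has to be determined some other way.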


Regards,
Henrik
list David W David Gore · Wed, 19 Mar 2008 14:16:13 +0000 ·
quoted from Henrik Størner
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Wednesday, March 19, 2008 08:08
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Bug latest snapshot, hobbitd_client

It seems several of your Hobbit programs are dumping core. These three
cores are probably not from the same program, since they differ so
much in size:
-rw-------  1 hobbit hobbit  5210112 Mar 19 00:23 tmp/core.21600
-rw-------  1 hobbit hobbit  4038656 Mar 19 00:27 tmp/core.21602
-rw-------  1 hobbit hobbit 49213440 Mar 19 00:31 tmp/core.21520
I'd suspect that the .21520 core was from hobbitd, since its timestamp
matches the log-file entry indicating that hobbitd has crashed.

Before doing anything else, please re-build Hobbit - and run a "make
clean" before doing the "make; make install". If the problem persists
after that, I would like to see the gdb backtrace from the different
programs that crash.
Each build from a snapshot is done from scratch, so it is clean from
the start.
quoted from Henrik Størner
I am not sure I should post my gdb backtrace here, but it has been
dumping core for at least a week, perhaps longer, with different daily
snapshots.  We use the same configs on a much older snapshot with no
problems.  I am not sure of the date of the stable snapshot
The best way is to extract the version numbers of each of the
source files like this:

$ strings ~hobbit/server/bin/hobbitd|grep \$Id:
$Id: hobbitd.c,v 1.279 2008/03/02 12:49:40 henrik Exp henrik $
$Id: hobbitd_buffer.c,v 1.10 2008/01/03 10:08:13 henrik Exp $
...lots more lines...
[hobbit@hobbit2 tmp]$ file core.21600 core.21602 core.21520
core.21600: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
SVR4-style, from 'hobbitd_client'
core.21602: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
SVR4-style, from 'hobbitd_client'
core.21520: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
SVR4-style, from 'hobbitd'

Sorry, the strings output contains no $Id: lines, so the grep finds
nothing.  I sent you the backtrace in a separate e-mail.

Just as an FYI, I am still disappointed that tooltips are not working
properly.  It really isn't practical to scroll the window horizontally
when descriptions or comments make it really wide.  When you have the
column headings in view, you cannot tell which host the alarm is for,
because the hosts have scrolled off the left side of the window.