bbnet terminating on signal 6

7 messages in this thread

list Tom Kauffman · Wed, 4 May 2005 12:43:14 -0500 ·

Debug output:

Core was generated by `bbtest-net --report --ping --checkresponse'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0  0x4004c691 in kill () from /lib/libc.so.6

And the --version output:
hobbit at whq-bbd:~/server/tmp> ../bin/bbtest-net --version
bbtest-net version 4.0.3rc1
Compile settings: MAXMSG=32768, BBDPORTNUMBER=1984

What do I need to look for?

This is SuSE 8.2 (yeah, I know!); I did not include ssl or ldap testing
at config time.

I saw one other signal 6 on the list, but no resolution. Have I missed
something?

(bbtest-net is working fine on my other system, SuSE 9.0)

Tom
list Tom Kauffman · Wed, 4 May 2005 14:12:19 -0500 ·
OK, let's forget this one -- I'm gonna bite the bullet and upgrade the
server to SuSE 9.x.

Just gotta figure out when . . .

Tom
list Henrik Størner · Wed, 4 May 2005 22:29:15 +0200 ·
On Wed, May 04, 2005 at 12:43:14PM -0500, Kauffman, Tom wrote:
> Loaded symbols for /lib/libnss_files.so.2
> #0  0x4004c691 in kill () from /lib/libc.so.6

Hrm - that doesn't say much about what's happening. Meaning that it's
probably some sort of memory/stack corruption.

Could you try running it with --debug? That should at least give some
idea about what it is doing when it crashes.

> This is SuSE 8.2 (yeah, I know!); I did not include ssl or ldap testing
> at config time.

Still, it shouldn't crash.

> I saw one other signal 6 on the list, but no resolution. Have I missed
> something?

No, I haven't figured out what happened there - since no one else
reported something like that, I was suspecting a local setup issue of
some kind. With two reports, that is rather unlikely.


Henrik
list Tom Kauffman · Wed, 4 May 2005 15:47:31 -0500 ·
Ask and ye shall receive:

This GDB was configured as "i586-suse-linux"...
Core was generated by `bbtest-net --report --ping --checkresponse --debug --no-ssl'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0  0x4004c691 in kill () from /lib/libc.so.6

The bb-network.log is interesting -- doesn't bbtest-net understand the
'noconn' tag? I've got my systems multiply defined, to allow groupings
by function and groupings by site; the function groupings all have
'noconn' specified. 

I'll attach the log -- but this looks like where the problem hit:

2005-05-04 13:07:57 Got DNS result for host rey-primary-router : 10.112.254.12
2005-05-04 13:07:57 Got DNS result for host gos-secondary-router : 10.16.254.13
2005-05-04 13:07:57 DNS lookup failed for lod-router-hsrp - status Domain name not found (4)
2005-05-04 13:08:02 DNS lookup failed for whq-intranet-d.nibco.com - status Channel is being destroyed (16)
2005-05-04 13:08:02 DNS lookup failed for gos-iqserver - status Channel is being destroyed (16)

As I said, it seems to run just fine on SuSE 9.0, so I'm planning on
updating this 8.2 (unpatched 8.2, at that!) to either 9.0 or 9.3. I had
been hoping to wait until I got my 'new' hardware (this is a 4-way
P4-450 Xeon box, an old Dell 4300; the replacement will be a 2-way Dell
currently in use by another application).

Tom
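
The status texts and numbers on the failed lookups above appear to match c-ares status codes (4 for ARES_ENOTFOUND, 16 for ARES_EDESTRUCTION). To see how many lookups fail with each status in a longer bb-network.log, a minimal Python sketch along these lines could tally them. It is not part of Hobbit: the function name `tally_dns_failures` and the regular expression are assumptions based only on the log lines quoted in this message, and each log entry is assumed to be on one line.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this thread:
#   <date> <time> DNS lookup failed for <host> - status <text> (<code>)
FAILURE = re.compile(
    r"DNS lookup failed for (?P<host>\S+) - status (?P<text>.+?) \((?P<code>\d+)\)\s*$"
)


def tally_dns_failures(lines):
    """Count failed lookups per (status code, status text)."""
    counts = Counter()
    for line in lines:
        match = FAILURE.search(line)
        if match:
            counts[(int(match.group("code")), match.group("text"))] += 1
    return counts


# Sample input copied from the log excerpt above (one entry per line).
sample = [
    "2005-05-04 13:07:57 Got DNS result for host gos-secondary-router : 10.16.254.13",
    "2005-05-04 13:07:57 DNS lookup failed for lod-router-hsrp - status Domain name not found (4)",
    "2005-05-04 13:08:02 DNS lookup failed for whq-intranet-d.nibco.com - status Channel is being destroyed (16)",
    "2005-05-04 13:08:02 DNS lookup failed for gos-iqserver - status Channel is being destroyed (16)",
]

for (code, text), count in sorted(tally_dns_failures(sample).items()):
    print(f"{code:>3}  {count}  {text}")
```

A burst of status 16 entries right before the abort would be consistent with the resolver channel being torn down while queries were still pending, though the log alone does not prove that.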
list Henrik Størner · Thu, 5 May 2005 08:21:06 +0200 ·
On Wed, May 04, 2005 at 03:47:31PM -0500, Kauffman, Tom wrote:
> The bb-network.log is interesting -- doesn't bbtest-net understand the
> 'noconn' tag? I've got my systems multiply defined, to allow groupings
> by function and groupings by site; the function groupings all have
> 'noconn' specified.

It does know about the "noconn" tag, but the way the bb-hosts file gets
parsed and tests are added to the queue may not be quite what you expected.
Suffice to say, I do believe it ends up doing the right thing :-)

> I'll attach the log -- but this looks like where the problem hit:

Yes, it seems to be a problem in the resolver library. Just to verify
that, could you try running bbtest-net with the "--no-ares" option and
see if that keeps it from crashing? I know this may cause bbtest-net to
spend much more time doing DNS lookups, but I'd like to have the problem
narrowed down as much as possible.


Thanks,
Henrik
list Tom Kauffman · Thu, 5 May 2005 12:48:01 -0500 ·
Running with "--no-ares" cured the problem; I'll just leave it this way
until I do the rebuild sometime next month. After all, bbtest is
reporting 359 dns lookups, a run time of 22.669 seconds, and a ping time
of 21.018 seconds. It's not like I'm sitting on the host count a lot of
the rest of you have.

Reading somewhat between the lines, I'm assuming this is normal:

2005-05-05 12:32:16 Task bbnet started with PID 26979
2005-05-05 12:33:18 Task bbdisplay started with PID 26999
2005-05-05 12:34:20 Task bbdisplay started with PID 27004
2005-05-05 12:35:20 Task bbdisplay started with PID 27026
2005-05-05 12:36:24 Task bbdisplay started with PID 27028
2005-05-05 12:37:18 Task bbcombotest started with PID 27031
2005-05-05 12:37:18 Task bbnet started with PID 27032
2005-05-05 12:37:28 Task bbdisplay started with PID 27039

bbdisplay firing up once per minute to build the displays, and bbnet
running every five minutes to run the network tests.

Now to verify my alert rules work, and I'm home free!

Thanks for all the help!

Tom

list Henrik Størner · Thu, 5 May 2005 22:52:05 +0200 ·
On Thu, May 05, 2005 at 12:48:01PM -0500, Kauffman, Tom wrote:
> Running with "--no-ares" cured the problem; I'll just leave it this way
> until I do the rebuild sometime next month.

Thanks, that is really nice to know. I'll take a look at the C-ARES
code, but I'll probably just report it to the guy doing that library.

> Reading somewhat between the lines, I'm assuming this is normal:
>
> 2005-05-05 12:32:16 Task bbnet started with PID 26979
> 2005-05-05 12:33:18 Task bbdisplay started with PID 26999
> 2005-05-05 12:34:20 Task bbdisplay started with PID 27004
> 2005-05-05 12:35:20 Task bbdisplay started with PID 27026
> 2005-05-05 12:36:24 Task bbdisplay started with PID 27028
> 2005-05-05 12:37:18 Task bbcombotest started with PID 27031
> 2005-05-05 12:37:18 Task bbnet started with PID 27032
> 2005-05-05 12:37:28 Task bbdisplay started with PID 27039
>
> bbdisplay firing up once per minute to build the displays, and bbnet
> running every five minutes to run the network tests.

Yep, that is the default setup. bbdisplay runs once a minute, because
you're likely to get updates throughout the 5 minute period from your
client hosts, and it is nice to see them (almost) as soon as they
appear.

You can tweak the frequency for each task in hobbitlaunch.cfg (the
"INTERVAL" setting).


Henrik
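
To check task intervals like the ones discussed above without reading timestamps by eye, a small Python sketch can compute the gap between consecutive starts of each task. This is only an illustration, not a Hobbit tool: the function name `start_intervals` is hypothetical, and the line format is assumed from the "Task ... started with PID ..." entries quoted in this thread.

```python
from collections import defaultdict
from datetime import datetime


def start_intervals(lines):
    """Return {task: [seconds between consecutive starts]}."""
    starts = defaultdict(list)
    for line in lines:
        parts = line.split()
        # Assumed shape: DATE TIME Task NAME started with PID N
        if len(parts) >= 5 and parts[2] == "Task" and parts[4] == "started":
            stamp = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S")
            starts[parts[3]].append(stamp)
    return {
        task: [int((later - earlier).total_seconds())
               for earlier, later in zip(times, times[1:])]
        for task, times in starts.items()
    }


# Log lines copied from the message quoted above.
log = """\
2005-05-05 12:32:16 Task bbnet started with PID 26979
2005-05-05 12:33:18 Task bbdisplay started with PID 26999
2005-05-05 12:34:20 Task bbdisplay started with PID 27004
2005-05-05 12:35:20 Task bbdisplay started with PID 27026
2005-05-05 12:36:24 Task bbdisplay started with PID 27028
2005-05-05 12:37:18 Task bbcombotest started with PID 27031
2005-05-05 12:37:18 Task bbnet started with PID 27032
2005-05-05 12:37:28 Task bbdisplay started with PID 27039
""".splitlines()

for task, gaps in start_intervals(log).items():
    print(task, gaps)
```

On the excerpt above this gives a single gap of 302 seconds for bbnet and gaps of 60 to 64 seconds for bbdisplay, which fits the five-minute and one-minute intervals described in the thread.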