bbnet terminating on signal 6
list Tom Kauffman
Debug output:
Core was generated by `bbtest-net --report --ping --checkresponse'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0 0x4004c691 in kill () from /lib/libc.so.6
And the --version output:
hobbit at whq-bbd:~/server/tmp> ../bin/bbtest-net --version
bbtest-net version 4.0.3rc1
Compile settings: MAXMSG=32768, BBDPORTNUMBER=1984
What do I need to look for?
This is SuSE 8.2 (yeah, I know!); I did not include ssl or ldap testing at config time.
I saw one other signal 6 on the list, but no resolution. Have I missed something?
(bbtest-net is working fine on my other system, SuSE 9.0)
Tom
list Tom Kauffman
OK, let's forget this one -- I'm gonna bite the bullet and upgrade the server to SuSE 9.x. Just gotta figure out when . . . Tom
list Henrik Størner
On Wed, May 04, 2005 at 12:43:14PM -0500, Kauffman, Tom wrote:
> Loaded symbols for /lib/libnss_files.so.2
> #0 0x4004c691 in kill () from /lib/libc.so.6
Hrm - that doesn't say much about what's happening, which means it's probably some sort of memory/stack corruption. Could you try running it with --debug? That should at least give some idea of what it is doing when it crashes.
> This is SuSE 8.2 (yeah, I know!); I did not include ssl or ldap testing at config time.
Still, it shouldn't crash.
> I saw one other signal 6 on the list, but no resolution. Have I missed something?
No, I haven't figured out what happened there. Since no one else had reported anything like that, I suspected a local setup issue of some kind. With two reports, that is rather unlikely.
Henrik
list Tom Kauffman
Ask and ye shall receive:
This GDB was configured as "i586-suse-linux"...
Core was generated by `bbtest-net --report --ping --checkresponse --debug --no-ssl'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0 0x4004c691 in kill () from /lib/libc.so.6
The bb-network.log is interesting -- doesn't bbtest-net understand the
'noconn' tag? I've got my systems multiply defined, to allow groupings
by function and groupings by site; the function groupings all have
'noconn' specified.
I'll attach the log -- but this looks like where the problem hit:
2005-05-04 13:07:57 Got DNS result for host rey-primary-router : 10.112.254.12
2005-05-04 13:07:57 Got DNS result for host gos-secondary-router : 10.16.254.13
2005-05-04 13:07:57 DNS lookup failed for lod-router-hsrp - status Domain name not found (4)
2005-05-04 13:08:02 DNS lookup failed for whq-intranet-d.nibco.com - status Channel is being destroyed (16)
2005-05-04 13:08:02 DNS lookup failed for gos-iqserver - status Channel is being destroyed (16)
As I said, it seems to run just fine on SuSE 9.0, so I'm planning on
updating this 8.2 (unpatched 8.2, at that!) to either 9.0 or 9.3. I had
been hoping to wait until I got my 'new' hardware (this is a 4-way
P4-450 Xeon box, an old Dell 4300; the replacement will be a 2-way Dell
currently in use by another application).
Tom
list Henrik Størner
On Wed, May 04, 2005 at 03:47:31PM -0500, Kauffman, Tom wrote:
> The bb-network.log is interesting -- doesn't bbtest-net understand the 'noconn' tag? I've got my systems multiply defined, to allow groupings by function and groupings by site; the function groupings all have 'noconn' specified.
It does know about the "noconn" tag, but the way the bb-hosts file gets parsed and tests get added to the queue may not be quite what you expected. Suffice it to say, I do believe it ends up doing the right thing :-)
> I'll attach the log -- but this looks like where the problem hit:
Yes, it seems to be a problem in the resolver library. Just to verify that, could you try running bbtest-net with the "--no-ares" option and see if that keeps it from crashing? I know this may cause bbtest-net to spend much more time doing DNS lookups, but I'd like to have the problem narrowed down as much as possible.
Thanks,
Henrik
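The pattern in the log excerpt Tom posted (lookups failing with status 16 right before the abort) can be tallied with a short script. This is only a sketch: it assumes each log entry sits on a single line in the format shown in the excerpt, and the names `FAILED`, `tally_dns_failures` and `sample` are made up for illustration, not part of Hobbit.

```python
import re
from collections import Counter

# Matches entries such as:
#   2005-05-04 13:08:02 DNS lookup failed for gos-iqserver - status Channel is being destroyed (16)
# The format is taken from the excerpt in this thread; other versions may log differently.
FAILED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) DNS lookup failed for "
    r"(?P<host>\S+) - status (?P<text>.+) \((?P<code>\d+)\)$"
)

def tally_dns_failures(lines):
    """Count failed DNS lookups per (status code, status text)."""
    counts = Counter()
    for line in lines:
        m = FAILED.match(line.strip())
        if m:
            counts[(int(m.group("code")), m.group("text"))] += 1
    return counts

# The five entries quoted in the thread, one per line.
sample = [
    "2005-05-04 13:07:57 Got DNS result for host rey-primary-router : 10.112.254.12",
    "2005-05-04 13:07:57 Got DNS result for host gos-secondary-router : 10.16.254.13",
    "2005-05-04 13:07:57 DNS lookup failed for lod-router-hsrp - status Domain name not found (4)",
    "2005-05-04 13:08:02 DNS lookup failed for whq-intranet-d.nibco.com - status Channel is being destroyed (16)",
    "2005-05-04 13:08:02 DNS lookup failed for gos-iqserver - status Channel is being destroyed (16)",
]

for (code, text), n in sorted(tally_dns_failures(sample).items()):
    print(f"status {code} ({text}): {n}")
```

To run it against a real log, pass an open file instead of `sample`, for example `tally_dns_failures(open("bb-network.log"))`, where the path is whatever your installation uses.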
list Tom Kauffman
Running with "--no-ares" cured the problem; I'll just leave it this way until I do the rebuild sometime next month. After all, bbtest is reporting 359 dns lookups, a run time of 22.669 seconds, and a ping time of 21.018 seconds. It's not like I'm sitting on the host count a lot of the rest of you have.
Reading somewhat between the lines, I'm assuming this is normal:
2005-05-05 12:32:16 Task bbnet started with PID 26979
2005-05-05 12:33:18 Task bbdisplay started with PID 26999
2005-05-05 12:34:20 Task bbdisplay started with PID 27004
2005-05-05 12:35:20 Task bbdisplay started with PID 27026
2005-05-05 12:36:24 Task bbdisplay started with PID 27028
2005-05-05 12:37:18 Task bbcombotest started with PID 27031
2005-05-05 12:37:18 Task bbnet started with PID 27032
2005-05-05 12:37:28 Task bbdisplay started with PID 27039
bbdisplay firing up once per minute to build the displays, and bbnet running every five minutes to run the network tests.
Now to verify my alert rules work, and I'm home free!
Thanks for all the help!
Tom
list Henrik Størner
On Thu, May 05, 2005 at 12:48:01PM -0500, Kauffman, Tom wrote:
> Running with "--no-ares" cured the problem; I'll just leave it this way until I do the rebuild sometime next month.
Thanks, that is really nice to know. I'll take a look at the C-ARES code, but I'll probably just report it to the guy doing that library.
> Reading somewhat between the lines, I'm assuming this is normal:
> 2005-05-05 12:32:16 Task bbnet started with PID 26979
> 2005-05-05 12:33:18 Task bbdisplay started with PID 26999
> 2005-05-05 12:34:20 Task bbdisplay started with PID 27004
> 2005-05-05 12:35:20 Task bbdisplay started with PID 27026
> 2005-05-05 12:36:24 Task bbdisplay started with PID 27028
> 2005-05-05 12:37:18 Task bbcombotest started with PID 27031
> 2005-05-05 12:37:18 Task bbnet started with PID 27032
> 2005-05-05 12:37:28 Task bbdisplay started with PID 27039
> bbdisplay firing up once per minute to build the displays, and bbnet running every five minutes to run the network tests.
Yep, that is the default setup. bbdisplay runs once a minute, because you're likely to get updates throughout the 5 minute period from your client hosts, and it is nice to see them (almost) as soon as they appear. You can tweak the frequency for each task in hobbitlaunch.cfg (the "INTERVAL" setting).
Henrik
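The intervals Tom read off the task log can also be checked mechanically. The sketch below assumes the "Task NAME started with PID N" line format quoted in this thread; `START`, `start_intervals` and `sample` are illustrative names, not anything shipped with Hobbit.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches entries such as:
#   2005-05-05 12:32:16 Task bbnet started with PID 26979
START = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Task (?P<task>\S+) started with PID \d+$"
)

def start_intervals(lines):
    """Return, per task, the gaps in seconds between consecutive starts."""
    starts = defaultdict(list)
    for line in lines:
        m = START.match(line.strip())
        if m:
            when = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            starts[m.group("task")].append(when)
    return {
        task: [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
        for task, times in starts.items()
    }

# The entries quoted in the thread, one per line.
sample = [
    "2005-05-05 12:32:16 Task bbnet started with PID 26979",
    "2005-05-05 12:33:18 Task bbdisplay started with PID 26999",
    "2005-05-05 12:34:20 Task bbdisplay started with PID 27004",
    "2005-05-05 12:35:20 Task bbdisplay started with PID 27026",
    "2005-05-05 12:36:24 Task bbdisplay started with PID 27028",
    "2005-05-05 12:37:18 Task bbcombotest started with PID 27031",
    "2005-05-05 12:37:18 Task bbnet started with PID 27032",
    "2005-05-05 12:37:28 Task bbdisplay started with PID 27039",
]

for task, gaps in start_intervals(sample).items():
    print(task, gaps)
```

On the quoted sample the gaps come out at roughly one minute for bbdisplay and roughly five minutes for bbnet, which matches the default setup described above; a task that started only once in the excerpt gets an empty list.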