Xymon Mailing List Archive

bbtest-net crashing

13 messages in this thread

list Joe Sloan · Fri, 21 Jan 2005 10:38:26 -0800 ·
Hello Henrik,

Have you had any luck tracking down the issue in the ares code that 
causes bbtest-net to crash? Shall I send more backtraces?

Best Regards,

Joe
list Henrik Størner · Fri, 21 Jan 2005 19:41:31 +0100 ·
quoted from Joe Sloan
On Fri, Jan 21, 2005 at 10:38:26AM -0800, joe wrote:
> Hello Henrik,
>
> Have you had any luck tracking down the issue in the ares code that
> causes bbtest-net to crash? Shall I send more backtraces?

Have you tried it with the beta-5 version released last week-end ?
This includes an update to the C-ARES library that I hope might solve
this problem.

If not, I'll have to file a bug-report with the C-ARES people.


Henrik
list Joe Sloan · Fri, 21 Jan 2005 11:01:38 -0800 ·
quoted from Henrik Størner
Henrik Stoerner wrote:
> On Fri, Jan 21, 2005 at 10:38:26AM -0800, joe wrote:
>> Hello Henrik,
>>
>> Have you had any luck tracking down the issue in the ares code that
>> causes bbtest-net to crash? Shall I send more backtraces?
>
> Have you tried it with the beta-5 version released last week-end ?
> This includes an update to the C-ARES library that I hope might solve
> this problem.
>
> If not, I'll have to file a bug-report with the C-ARES people.
 
Ah, I'm using bbgen-3.5, can I use the newer c-ares lib with that?

Joe
list Henrik Størner · Fri, 21 Jan 2005 23:16:49 +0100 ·
quoted from Joe Sloan
On Fri, Jan 21, 2005 at 11:01:38AM -0800, joe wrote:
> Ah, I'm using bbgen-3.5, can I use the newer c-ares lib with that?

That's a bit tricky, but you can fetch Hobbit in the beta-5
version, and configure it for use with standard Big Brother.
That gives you the same set of tools that you have with bbgen.
In fact, you can just copy the "bbnet/bbtest-net" tool you
get from compiling Hobbit and use it instead of the bbtest-net
from bbgen.

-- 
Henrik Storner
list Joe Sloan · Fri, 21 Jan 2005 14:19:33 -0800 ·
quoted from Henrik Størner
Henrik Stoerner wrote:
> On Fri, Jan 21, 2005 at 11:01:38AM -0800, joe wrote:
>> Ah, I'm using bbgen-3.5, can I use the newer c-ares lib with that?
>
> That's a bit tricky, but you can fetch Hobbit in the beta-5
> version, and configure it for use with standard Big Brother.
> That gives you the same set of tools that you have with bbgen.
> In fact, you can just copy the "bbnet/bbtest-net" tool you
> get from compiling Hobbit and use it instead of the bbtest-net
> from bbgen.
 
Thanks, I'll give that a try and see how it goes...

Joe
list Joe Sloan · Thu, 27 Jan 2005 18:50:07 -0800 ·
I've compiled bbgen-3.5 with c-ares-1.2.1 (shipped with beta-6) and bbtest-net seems to be OK so far, running without a crash. However, as it took some time for the problem to occur with the previous version of the c-ares libs, I'll have to just monitor this and see how it goes...

Joe
list Joe Sloan · Mon, 31 Jan 2005 10:38:27 -0800 ·
quoted from Joe Sloan
joe wrote:
> I've compiled bbgen-3.5 with c-ares-1.2.1 (shipped with beta-6) and bbtest-net seems to be OK so far, running without a crash. However, as it took some time for the problem to occur with the previous version of the c-ares libs, I'll have to just monitor this and see how it goes...

I had earlier reported bbtest-net crashing on bbgen-3.5, which was narrowed down to the ares libs. After upgrading c-ares to 1.2.1 and re-enabling ares, the bb servers have stayed up all weekend. It appears to be the case that ares 1.2.1 fixes the bbtest-net crash bug.

Joe
list Joe Sloan · Thu, 10 Feb 2005 11:52:39 -0800 ·
Unfortunately the good times didn't last - both big brother servers went red last night and started sending red & purple messages - it's a subtle and unpredictable bug; disabling ares got both big brother servers back on track. I have bbtest-net core dumps from both servers, if you are interested -

Joe
list Henrik Størner · Thu, 10 Feb 2005 21:29:07 +0100 ·
quoted from Joe Sloan
On Thu, Feb 10, 2005 at 11:52:39AM -0800, J Sloan wrote:
> Unfortunately the good times didn't last - both big brother servers went red last night and started sending red & purple messages - it's a subtle and unpredictable bug; disabling ares got both big brother servers back on track. I have bbtest-net core dumps from both servers, if you are interested -

I certainly am. Please send them directly to me, and include the
bbtest-net program that you built - the core-dump is hardly usable
without the binary that generated it.


Regards,
Henrik
list Stewart Larsen · Thu, 1 Dec 2005 12:36:07 -0500 ·
Using Hobbit Monitor 4.1.2p1

Clean install replacing a bb/bbgen install

I get a bunch of these in my log...

"Task bbnet terminated by signal 6"


Gdb dumps of the core files vary.

I was getting...
#0  0x0043ecef in raise () from /lib/tls/libc.so.6
#1  0x004404f5 in abort () from /lib/tls/libc.so.6
#2  0x0805dab6 in sigsegv_handler (signum=11) at sig.c:57
#3  <signal handler called>
#4  0x08060483 in read_tcp_data (channel=0x85ecd48, read_fds=0xbfff9f20, now=1133457363) at ares_process.c:192
#5  0x08060287 in ares_process (channel=0x85ecd48, read_fds=0xbfff9f20, write_fds=0xbfff9ea0) at ares_process.c:77
#6  0x08055b2e in dns_queue_run (channel=0x85ecd48) at dns.c:202
#7  0x08055bf2 in flush_dnsqueue () at dns.c:231
#8  0x0804f4c3 in main (argc=2, argv=0xbfffb364) at bbtest-net.c:2200

Changed to CMD bbtest-net --report --ping --checkresponse --dns-timeout=15 --dns=ip
And now I get...

#0  0x0016ccef in raise () from /lib/tls/libc.so.6
#1  0x0016e4f5 in abort () from /lib/tls/libc.so.6
#2  0x0805dab6 in sigsegv_handler (signum=11) at sig.c:57
#3  <signal handler called>
#4  0x0804d18e in start_fping_service (service=0x400100e) at bbtest-net.c:1140
#5  0x0804fcc2 in main (argc=6, argv=0xbfffd994) at bbtest-net.c:2220
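
Both traces show the same pattern: the underlying fault is a SIGSEGV (signum=11), and sigsegv_handler() in sig.c then calls abort(), which is why the log reports "terminated by signal 6". A minimal C sketch of that pattern follows. It assumes a handler that does nothing but abort; the function names are hypothetical and are not taken from the Hobbit source.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler: turn a segfault into abort(), so the process
 * dies with SIGABRT (signal 6) instead of SIGSEGV (signal 11). */
static void demo_sigsegv_handler(int signum)
{
    (void)signum;
    abort();
}

/* Fork a child that installs the handler and then receives SIGSEGV.
 * Returns the signal that terminated the child, 0 if it exited
 * normally, or -1 on error. */
int demo_terminating_signal(void)
{
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        signal(SIGSEGV, demo_sigsegv_handler);
        raise(SIGSEGV);   /* stands in for a real invalid memory access */
        _exit(0);         /* not reached */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling demo_terminating_signal() from a test program should report SIGABRT, so when reading such a core file the frames below "<signal handler called>" are the ones that point at the real fault.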


--
Stewart Larsen
list Henrik Størner · Thu, 1 Dec 2005 22:43:49 +0100 ·
quoted from Stewart Larsen
On Thu, Dec 01, 2005 at 12:36:07PM -0500, user-6f5382941e41@xymon.invalid wrote:
> Using Hobbit Monitor 4.1.2p1
>
> Clean install replacing a bb/bbgen install
>
> I get a bunch of these in my log...
>
> "Task bbnet terminated by signal 6"
>
> #3  <signal handler called>
> #4  0x08060483 in read_tcp_data (channel=0x85ecd48, read_fds=0xbfff9f20, now=1133457363) at ares_process.c:192
> #5  0x08060287 in ares_process (channel=0x85ecd48, read_fds=0xbfff9f20, write_fds=0xbfff9ea0) at ares_process.c:77
> #6  0x08055b2e in dns_queue_run (channel=0x85ecd48) at dns.c:202

There's a known bug in the c-ares library used for DNS lookups.
Could you try running with the --no-ares option ?


Regards,
Henrik
list Etienne Marganne · Mon, 22 Feb 2010 16:25:55 +0100 ·
Hello,

 
A few days ago we experienced a total loss of service from Hobbit. The
bbtest-net process keeps crashing. We first tried to isolate the
offending line in the bb-hosts file, but that did not work: the
instability returned after a short period of stability. We have two
Hobbit server instances that are clones of each other, and both failed
the same way. While debugging, we searched the mailing list for similar
problems to see if someone had a solution, but the --no-ares switch
suggested there did not help.

The issue seems to come from a C library, but I am not sure.

Here is the gdb output for the core file created by bbtest-net:

 
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...
Using host libthread_db library "/lib/tls/libthread_db.so.1".

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /usr/lib/libldap.so.199...done.
Loaded symbols for /usr/lib/libldap.so.199
Reading symbols from /usr/lib/liblber.so.199...done.
Loaded symbols for /usr/lib/liblber.so.199
Reading symbols from /usr/lib/libssl.so.0.9.7...done.
Loaded symbols for /usr/lib/libssl.so.0.9.7
Reading symbols from /usr/lib/libcrypto.so.0.9.7...done.
Loaded symbols for /usr/lib/libcrypto.so.0.9.7
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /usr/lib/libsasl2.so.2...done.
Loaded symbols for /usr/lib/libsasl2.so.2
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `bbtest-net --debug --test-untagged --dns-timeout=10 --timeout=10 --concurrency='.
Program terminated with signal 6, Aborted.
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x401a75c1 in raise () from /lib/tls/libc.so.6
#2  0x401a8ea5 in abort () from /lib/tls/libc.so.6
#3  0x0805b933 in xstrdup (s=0x0) at memory.c:175
#4  0x080514d7 in setup_ssl (item=0x8379aa8) at contest.c:626
#5  0x08052816 in do_tcp_tests (timeout=10, concurrency=40) at contest.c:998
#6  0x08050005 in main (argc=6, argv=0xbfffc454) at bbtest-net.c:2133

 
If you need any other information, let me know.

 
Thank you for any help,

 
Etienne Marganne.
list Etienne Marganne · Tue, 23 Feb 2010 13:50:58 +0100 ·
Hello,

 
Further debugging showed that one https URL returns a response to the
GET query that makes the bbtest-net process crash. Removing that URL
test from the bb-hosts file fixed it. I will keep you posted on the
root cause, so that the exception can be handled in a newer version of
the code.
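
The backtrace from the core file shows xstrdup() being called with s=0x0 from setup_ssl() and ending in abort(). A minimal C sketch of that failure mode and of one possible guard follows. It assumes a duplicate-string wrapper that aborts on NULL input; both function names are hypothetical, and this is not the actual memory.c or contest.c code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical strict duplicate: aborts on NULL input or on
 * allocation failure, similar in spirit to the frame in the trace. */
char *demo_xstrdup(const char *s)
{
    if (s == NULL) {
        fprintf(stderr, "demo_xstrdup: NULL input\n");
        abort();
    }
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy == NULL) abort();
    memcpy(copy, s, len);
    return copy;
}

/* Hypothetical guard for callers whose input may be missing (assumed
 * here to be a value the remote server did not supply): duplicate a
 * fallback string instead of passing NULL through. */
char *demo_dup_or_default(const char *s, const char *fallback)
{
    return demo_xstrdup(s != NULL ? s : fallback);
}
```

With a guard of this kind, a missing value would yield an empty or placeholder string for that one test rather than aborting the whole bbtest-net run.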

 
Best regards,

 
Etienne Marganne.