Xymon Mailing List Archive

hosts still disappear

7 messages in this thread

list Bruce Lysik · Tue, 18 Jan 2005 15:17:51 -0800 ·
Well, I spoke too soon about hosts disappearing.  I'm still seeing the same
problem.  Is there anything I can do to help in debugging this?

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Henrik Størner · Wed, 19 Jan 2005 07:58:58 +0100 ·
On Tue, Jan 18, 2005 at 03:17:51PM -0800, Bruce Lysik wrote:
> Well, I spoke too soon about hosts disappearing.  I'm still seeing the same
> problem.  Is there anything I can do to help in debugging this?

This is with the beta-5 version PLUS the patch I posted yesterday?

First, check if the hobbit daemon is still responding at all. You
can see if the ~/server/tmp/hobbitd.chk file is being updated - it
should refresh every 10 minutes. You can also try this: If

 ~/server/bin/bb 127.0.0.1 "hobbitdlog some,server,name.conn"

responds with a status log, then it is responding. If not, proceed:
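
The checkpoint-file check described above can be scripted. This is a minimal sketch, assuming the 10-minute refresh interval mentioned above; the function name, the two-minute slack, and the example path are illustrative choices, not part of Hobbit.

```python
import os
import time


def checkpoint_is_stale(path, interval=600, slack=120, now=None):
    """Return True if the checkpoint file is missing or older than interval + slack seconds."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # No checkpoint file at all is treated the same as a stale one.
        return True
    return (now - mtime) > (interval + slack)


if __name__ == "__main__":
    # Assumed location; adjust to wherever your installation keeps the file.
    path = os.path.expanduser("~/server/tmp/hobbitd.chk")
    print("stale" if checkpoint_is_stale(path) else "fresh")
```

A stale result only says the daemon has stopped writing checkpoints; it does not say why, which is what the debugger step is for.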


Next, I want to figure out what the hobbitd process is doing. You'll
need gdb (the GNU debugger) for that. Log in as the hobbit user and find
the process ID of hobbitd - it's recorded in
/var/log/hobbit/hobbitd.pid, or use "ps" - then:

  gdb ~/server/bin/hobbitd
  [gdb messages]
  (gdb) attach <hobbitd-processid>
  (gdb) bt

Let me know what this shows.
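
For repeated captures, the same attach-and-backtrace steps can be run non-interactively. The sketch below only builds the gdb command line; it relies on gdb's standard `-p`, `-batch` and `-ex` options, while the helper names and the example PID are hypothetical.

```python
import shlex


def read_pid(pidfile):
    """Read a process ID from a pid file containing a single integer."""
    with open(pidfile) as fh:
        return int(fh.read().strip())


def gdb_backtrace_command(pid):
    """Build argv for a one-shot gdb run: attach to pid, print a backtrace, exit."""
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -batch makes gdb exit after running the -ex commands, which detaches
    # from the attached process instead of leaving it stopped.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


if __name__ == "__main__":
    # 12345 is a placeholder PID, not a real hobbitd process.
    print(shlex.join(gdb_backtrace_command(12345)))
```

Passing the returned list to `subprocess.run(..., capture_output=True, text=True)` as the hobbit user would capture the backtrace text for mailing.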


Henrik
list Bruce Lysik · Wed, 19 Jan 2005 09:34:40 -0800 ·
Correct, beta-5, plus the patch.  

Okay, hobbitd isn't responding at all.  hobbitd.chk hasn't been updated
since yesterday, and the hobbitdlog command doesn't return anything.

Strangely, no pid file was located in /var/log/hobbit.  However, a ps
grepping for hobbit showed this:

bbuser   32296 32295  0 Jan18 ?        00:00:00 hobbitd
--restart=/opt/bb/server/tmp/hobbitd.chk
--checkpoint-file=/opt/bb/server/tmp/hobbitd.chk --checkpoint-interval=600
--purple-conn=conn --log=/var/log/hobbit/hobbitd.log
--admin-senders=127.0.0.1 172.16.190.2

I used gdb and attached to that pid, resulting in:

(gdb) attach 32296
Attaching to process 32296
Reading symbols from /opt/bb/server/bin/hobbitd...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0xb7584416 in semop () from /lib/tls/libc.so.6
(gdb) 
list Bruce Lysik · Wed, 19 Jan 2005 14:49:51 -0800 ·
I just realized I left out the actual backtrace.  Here's one I just ran
again:

0xb7584416 in semop () from /lib/tls/libc.so.6
(gdb) bt
#0  0xb7584416 in semop () from /lib/tls/libc.so.6
#1  0x08049f27 in posttochannel (channel=0xbfffc740,
    channelmarker=0x80538b2 "status",
    msg=0x8101868 "status uberbox.sys green Tue Jan 18 16:51:44 2005 \nSystem Description  NetApp Release 6.5.2: Sun Jul 25 10:56:02 PDT 2004\n\nObject ID", ' ' <repeats 11 times>, ".1.3.6.1.4.1.789.2.1\n\nUptime", ' ' <repeats 14 times>, "143 days, 10:42"...,
    sender=0x8053bbc "hobbitd", hostname=0x81017f0 "uberbox",
    log=0x8101800, readymsg=0x0) at hobbitd.c:390
#2  0x0804ab22 in handle_status (
    msg=0x8101868 "status uberbox.sys green Tue Jan 18 16:51:44 2005 \nSystem Description  NetApp Release 6.5.2: Sun Jul 25 10:56:02 PDT 2004\n\nObject ID", ' ' <repeats 11 times>, ".1.3.6.1.4.1.789.2.1\n\nUptime", ' ' <repeats 14 times>, "143 days, 10:42"...,
    sender=0x8053bbc "hobbitd", hostname=0x81017f0 "uberbox",
    testname=0x7e6bf <Address 0x7e6bf out of bounds>, log=0x8101800,
    newcolor=3) at hobbitd.c:830
#3  0x0804df9a in check_purple_status () at hobbitd.c:2001
#4  0x0804ebff in main (argc=7, argv=0xbfffcb14) at hobbitd.c:2359
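
When comparing several captures of a hang, it can help to reduce each backtrace to its chain of function names. This is a rough sketch that assumes the usual gdb frame layout ("#N  0xADDR in func (...") seen in the trace above; the regular expression and function name are illustrative and will not cover every gdb output variant.

```python
import re

# Matches the start of a gdb frame line such as
#   "#1  0x08049f27 in posttochannel (channel=..."
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([A-Za-z_]\w*) \(",
    re.MULTILINE,
)


def frame_functions(backtrace_text):
    """Return the function names of a gdb backtrace, innermost frame first."""
    return [m.group(2) for m in FRAME_RE.finditer(backtrace_text)]
```

Two hangs that both reduce to a chain ending in semop via posttochannel are probably the same problem, even if the message arguments differ.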

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer

list Charles Jones · Wed, 19 Jan 2005 22:08:54 -0700 ·
When I try using the "apache" keyword in bb-hosts, it causes Hobbit's network tests to crash and fail.  The result is that statuses are still updated by remote bb-clients, but all the outgoing Hobbit tests (apache, httpd, conn, dns) fail and turn purple.  The "bbtest" status also turns red and says "program crashed!"

I verified that the host I am testing with the apache directive has the proper conf for http://IP.OF.HOST/server-status?auto to work.

I have taken the apache directive out, and everything is green again.  It's just a test server, so I would be glad to reproduce the error and send any info you need.  OS is Solaris 10, Hobbit beta 5 plus the patch.

-Charles
list Henrik Størner · Thu, 20 Jan 2005 07:32:39 +0100 ·
On Wed, Jan 19, 2005 at 09:34:40AM -0800, Bruce Lysik wrote:
[hobbitd hangs]
> 0xb7584416 in semop () from /lib/tls/libc.so.6
OK, this is interesting. hobbitd uses a technique called "semaphores"
to synchronize the communication between hobbitd and the
hobbitd_channel processes. hobbitd being stuck doing a "semop" call
indicates that it got into a deadlock situation, where hobbitd is
waiting for one of the hobbitd_channel children to finish, but it
never signals that it has completed.
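
The handshake described above can be modelled in a few lines. This is not hobbitd's code: hobbitd uses semop() on semaphores shared between processes, while this sketch uses Python threads and threading.Semaphore as a stand-in, and every name in it is hypothetical. It shows why the parent hangs when the worker never signals, and how a timed wait would at least make the condition detectable.

```python
import threading


def post_and_wait(worker_acks, timeout=0.5):
    """Post one message to a worker and wait for its completion signal.

    Returns True if the worker acknowledged, False if the wait timed out.
    """
    ready = threading.Semaphore(0)  # parent -> worker: a message is available
    done = threading.Semaphore(0)   # worker -> parent: finished with the message

    def worker():
        ready.acquire()             # wait for the parent to post
        if worker_acks:
            done.release()          # normal case: signal completion
        # Otherwise the worker returns without signalling, which is the
        # situation that leaves the parent waiting.

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    ready.release()
    # An untimed done.acquire() here would block forever when the worker
    # never signals; the timeout turns the hang into a detectable failure.
    acked = done.acquire(timeout=timeout)
    t.join(timeout=timeout)
    return acked
```

With `post_and_wait(False)` the call returns False after the timeout instead of blocking, which is the difference between a logged error and a daemon that silently stops updating.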

This is a different kind of bug than I had expected.

I'll have to think about how to handle that.


Henrik
list Rick Waegner · Thu, 20 Jan 2005 09:10:01 -0600 ·
hmm, I've got the apache keyword as well and nothing has crashed so
far.  I DO have one issue with it though... we're not "allowed" to have
non-authenticated access to the server-status screen, so HOW do I pass
the username and password to get the data so I can have it graphed?


Rick