Xymon Mailing List Archive

hobbitfetch testers needed

7 messages in this thread

Henrik Størner · Tue, 24 Jul 2007 14:36:03 +0200
I've been going over the hobbitfetch code looking for the cause of the
cpu-spinning lockups that have been reported. I *think* I've found it,
but this needs confirmation from someone who sees the problem in a real
setup, as opposed to my testing scenario.

So I'd be interested in having my bugfix tested by someone who has this
problem. If you can, grab the current snapshot from http://www.hswn.dk/beta/
and rebuild Hobbit. Then copy the snapshot/hobbitd/hobbitfetch utility into
your ~hobbit/server/bin/ directory and restart Hobbit. Hopefully the
problem will be gone.
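A minimal sketch of the copy step, for anyone who wants to keep the previous binary around in case the snapshot misbehaves. The function name install_hobbitfetch, the hobbitfetch.orig backup name and the example paths are assumptions for illustration, not part of Hobbit. Restart Hobbit afterwards the way you normally do.

```shell
# Hypothetical helper (not part of Hobbit): install a rebuilt hobbitfetch
# into the server bin directory, keeping the previous binary as
# hobbitfetch.orig so it can be restored if the new one misbehaves.
install_hobbitfetch() {
    local src="$1"     # assumed: rebuilt binary, e.g. snapshot/hobbitd/hobbitfetch
    local bindir="$2"  # assumed: server bin directory, e.g. ~hobbit/server/bin

    if [ ! -f "$src" ] || [ ! -x "$src" ]; then
        echo "install_hobbitfetch: $src is not an executable file" >&2
        return 1
    fi

    # Back up the currently installed binary, if there is one.
    if [ -e "$bindir/hobbitfetch" ]; then
        cp -p "$bindir/hobbitfetch" "$bindir/hobbitfetch.orig" || return 1
    fi

    # -p preserves the source file's mode and timestamps.
    cp -p "$src" "$bindir/hobbitfetch"
}
```

For example, as the hobbit user after rebuilding: `install_hobbitfetch snapshot/hobbitd/hobbitfetch ~hobbit/server/bin`. To roll back, copy hobbitfetch.orig over hobbitfetch and restart again.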


Regards,
Henrik
Daniel J McDonald · Tue, 24 Jul 2007 08:04:27 -0500
quoted from Henrik Størner
On Tue, 2007-07-24 at 14:36 +0200, Henrik Stoerner wrote:
> ... copy the snapshot/hobbitd/hobbitfetch utility into
> your ~hobbit/server/bin/ directory and restart Hobbit.
Ok, I've done that.  We'll keep a close eye on it today to see if it
goes berserk...
-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
Austin Energy
http://www.austinenergy.com
Henrik Størner · Tue, 24 Jul 2007 15:05:23 +0200
quoted from Daniel J McDonald
On Tue, Jul 24, 2007 at 08:04:27AM -0500, Daniel J McDonald wrote:
> Ok, I've done that.  We'll keep a close eye on it today to see if it
> goes berserk...
Please re-get it. I'd forgotten to update the snapshot file with the
last fix I made, so for about 10 minutes the wrong version was on
the web.

The right one will have
  "$Id: hobbitfetch.c,v 1.18 2007/07/24 13:00:29 henrik Exp $";
near the top of the hobbitfetch.c file.
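A quick way to confirm which revision you have unpacked is to pull the revision number out of that $Id string. This is a sketch using sed; the function name hobbitfetch_rev is hypothetical, and the source path in the example is assumed from the snapshot/hobbitd layout mentioned earlier.

```shell
# Hypothetical helper: print the RCS revision from the $Id string in a
# hobbitfetch.c source file. Prints nothing if no such string is found.
hobbitfetch_rev() {
    sed -n 's/.*\$Id: hobbitfetch\.c,v \([0-9.]*\) .*/\1/p' "$1" | head -n 1
}
```

For example, `hobbitfetch_rev snapshot/hobbitd/hobbitfetch.c` should print 1.18 for the fixed source.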


Sorry about that.

Henrik
Daniel J McDonald · Tue, 24 Jul 2007 08:44:34 -0500
quoted from Henrik Størner
On Tue, 2007-07-24 at 15:05 +0200, Henrik Stoerner wrote:
> Please re-get it. [...]
>
> The right one will have
>   "$Id: hobbitfetch.c,v 1.18 2007/07/24 13:00:29 henrik Exp $";
> near the top of the hobbitfetch.c file.
static char rcsid[] = "$Id: hobbitfetch.c,v 1.18 2007/07/24 13:00:29 henrik Exp $";

It's installed now....

Dan McDonald · Wed, 25 Jul 2007 06:58:40 -0500
quoted from Daniel J McDonald
On Tue, 2007-07-24 at 15:05 +0200, Henrik Stoerner wrote:
> On Tue, Jul 24, 2007 at 08:04:27AM -0500, Daniel J McDonald wrote:
>> Ok, I've done that.  We'll keep a close eye on it today to see if it
>> goes berserk...

In the past 22 hours it has not gotten into a loop, but I still see this
status on the hobbitfetch item:


    - Program crashed
    Fatal signal caught!

    Status unchanged in 2 hours, 16 minutes
    Status message received from 127.0.0.1
    Client data available

If you would like any log files or debug items, I will be happy to
provide them.  I will be in meetings most of the day starting in about
2-1/2 hours.
Henrik Størner · Wed, 25 Jul 2007 14:44:56 +0200
quoted from Dan McDonald
On Wed, Jul 25, 2007 at 06:58:40AM -0500, McDonald, Dan wrote:
> In the past 22 hours it has not gotten into a loop,
Good. I suppose the client data is being fetched and updated correctly?
quoted from Dan McDonald
> but I still see this status on the hobbitfetch item:
>
>     - Program crashed
>     Fatal signal caught!
I'd like to know if this generates a core file in ~hobbit/server/tmp/
(it should), and if there is one then please run it through gdb to get
the backtrace as described in
http://www.hswn.dk/hobbit/help/known-issues.html#bugreport
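The core-file check and the gdb step can also be done non-interactively. The sketch below assumes core files are named core or core.<pid> in ~hobbit/server/tmp/ (as in the directory listing later in this thread) and uses gdb's -batch and -ex options to print the backtrace; the function names newest_core and core_backtrace are hypothetical.

```shell
# Hypothetical helpers: find the most recently modified core file in a
# directory, then print its backtrace with gdb without an interactive session.
newest_core() {
    # ls -t sorts newest first; the error for "no matching files" is discarded.
    ls -t "$1"/core* 2>/dev/null | head -n 1
}

core_backtrace() {
    local bin="$1" dir="$2" core
    core=$(newest_core "$dir")
    if [ -z "$core" ]; then
        echo "core_backtrace: no core file found in $dir" >&2
        return 1
    fi
    echo "Using $core"
    gdb -batch -ex bt "$bin" "$core"
}
```

For example, from ~hobbit/server: `core_backtrace bin/hobbitfetch tmp`. The newest core is not necessarily from hobbitfetch, so check the "Core was generated by" line gdb prints before trusting the backtrace.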


Thanks,
Henrik
Dan McDonald · Wed, 25 Jul 2007 08:09:25 -0500
quoted from Henrik Størner
On Wed, 2007-07-25 at 14:44 +0200, Henrik Stoerner wrote:
> I'd like to know if this generates a core file in ~hobbit/server/tmp/
> (it should), and if there is one then please run it through gdb to get
> the backtrace as described in
> http://www.hswn.dk/hobbit/help/known-issues.html#bugreport
This appears to be the appropriate core:

[mcdonalddj@ldap ~]$ sudo su hobbit -
bash-3.00$ cd /usr/lib/hobbit
bash-3.00$ cd server
bash-3.00$ gdb bin/hobbitfetch tmp/core
GNU gdb 6.3-5mdk (Mandriva Linux release 2006.0)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-mandriva-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xffffe000
Core was generated by `/usr/lib/hobbit/server/bin/hobbitfetch --server=127.0.0.1 --no-daemon --pidfile'.
Program terminated with signal 6, Aborted.

warning: svr4_current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7de9051 in raise () from /lib/tls/libc.so.6
#2  0xb7deaa3b in abort () from /lib/tls/libc.so.6
#3  0x0804fbff in sigsegv_handler (signum=11) at sig.c:57
#4  <signal handler called>
#5  main (argc=4, argv=0xbff11094) at hobbitfetch.c:746
(gdb) quit
bash-3.00$ ls -l tmp         
lrwxrwxrwx  1 root root 19 Nov 27  2006 tmp -> /var/lib/hobbit/tmp
bash-3.00$ ls -l tmp/
total 6814
-rw-r--r--  1 hobbit hobbit   92921 Jul 25 07:56 alert.chk
-rw-r--r--  1 hobbit hobbit    1785 Jul 25 07:56 alert.chk.sub
-rw-------  1 hobbit hobbit  700416 Jul 25 07:45 core
-rw-------  1 hobbit hobbit  602112 Dec 11  2006 core.11005
-rw-------  1 hobbit hobbit  630784 Dec  8  2006 core.12024
-rw-------  1 hobbit hobbit  618496 Dec 12  2006 core.19573
-rw-------  1 hobbit hobbit  585728 Dec 27  2006 core.20895
-rw-------  1 hobbit hobbit  454656 Jan 23  2007 core.23799
-rw-------  1 hobbit hobbit  569344 May 22 01:40 core.24289
-rw-------  1 hobbit hobbit  860160 Feb  7 09:20 core.26081
-rw-------  1 hobbit hobbit  430080 Jun 26 21:55 core.30855
-rw-------  1 hobbit hobbit  577536 Jun 13 19:05 core.6773
-rw----r--  1 hobbit hobbit  450560 Dec  4  2006 core.7.21
-rw-------  1 hobbit hobbit  573440 Feb 16 02:20 core.7213
-rw-------  1 hobbit hobbit 1361898 Jul 25 07:58 hobbitd.chk
-rw-------  1 hobbit hobbit     176 Jul 25 07:59 ping..status
lrwxrwxrwx  1 root   root        30 Nov 27  2006 tmp -> ../../../../var/lib/hobbit/tmp
bash-3.00$ 
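Reading the backtrace above from the bottom up: frame #5 is where the fault happened, in main() at hobbitfetch.c line 746; frame #3 is the sigsegv_handler from sig.c, entered with signum=11; and frames #2 and #1 show that handler calling abort(), which raises signal 6. That is why gdb reports "signal 6, Aborted" although the underlying fault was signal 11. The numbers can be translated with the shell's kill -l builtin; sig_name is a hypothetical wrapper, and signal numbering is platform-specific (this assumes Linux).

```shell
# Translate signal numbers (as printed by gdb) into signal names using the
# shell's kill -l builtin. Numbering is platform-specific (Linux assumed).
sig_name() {
    kill -l "$1"
}

for n in 11 6; do
    echo "signal $n = SIG$(sig_name "$n")"
done
```

On Linux this prints SIGSEGV for 11 and SIGABRT for 6.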
