Xymon Mailing List Archive

hobbitfetch infinite loop?

3 messages in this thread

Daniel J McDonald · Sat, 27 Jan 2007 11:43:58 -0600
Every couple of days I note that hobbitfetch is hung:
yellow Load is HIGH
System clock is 0 seconds off


top - 07:23:48 up 67 days, 15:36,  0 users,  load average: 3.57, 5.42, 5.18
Tasks:  94 total,   4 running,  90 sleeping,   0 stopped,   0 zombie
Cpu(s): 47.8% us,  6.5% sy,  0.3% ni, 43.0% id,  1.8% wa,  0.1% hi,  0.5% si
Mem:   2074580k total,  2008140k used,    66440k free,    46652k buffers
Swap:  1052216k total,     2576k used,  1049640k free,  1325888k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
30444 hobbit    25   0  2020 1016  616 R 96.3  0.0   3869:03 hobbitfetch 


I then go kill -9 the hobbitfetch process, and it starts "working" again, but I get an error reported in the hobbitfetch status:
    - Program crashed
    Fatal signal caught!

Nothing is written to hobbitfetch.log.  But I do get correct statuses
for the few hosts that I am polling.

I have 4 hosts that I poll using hobbitfetch.  I think there will be
just two more.  But they are all internet facing, so keeping them up is
a priority...

I would appreciate any suggestions as to where to begin troubleshooting
this, as I really need this to work.


-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
Austin Energy
http://www.austinenergy.com
Henrik Størner · Sat, 27 Jan 2007 23:31:16 +0100
On Sat, Jan 27, 2007 at 11:43:58AM -0600, Daniel J McDonald wrote:
> Every couple of days I note that hobbitfetch is hung:
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 30444 hobbit    25   0  2020 1016  616 R 96.3  0.0   3869:03 hobbitfetch
>
> I then go kill -9 the hobbitfetch process, and it starts "working" again,

Next time, please do a "kill -6" and run the resulting coredump through
gdb to get the stacktrace (see the "Reporting bugs" online help).

If possible, I'd also like to have a copy of the coredump and the
hobbitfetch binary.
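
For reference, a minimal sketch of that workflow as two shell functions. It assumes core dumps are enabled for the running hobbitfetch process (a non-zero `ulimit -c` in the environment that started it). The function names and the binary and core file paths are illustrative only, not part of Hobbit.

```shell
#!/bin/sh
# Sketch only: the helper names and the paths passed to them are assumptions.

# Send SIGABRT (signal 6) so the process terminates and, if core dumps
# are enabled for it, leaves a core file. SIGKILL (9) leaves no core.
abort_for_core() {
    kill -6 "$1"
}

# Print a stack trace from a core file non-interactively.
# $1 = path to the hobbitfetch binary, $2 = path to the core file.
print_backtrace() {
    gdb -batch -ex bt "$1" "$2"
}
```

With the PID from the top output this would be `abort_for_core 30444`, followed by `print_backtrace` with the paths to the binary and to wherever the core file was written on your system.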


Regards,
Henrik
Daniel J McDonald · Thu, 08 Feb 2007 10:35:15 -0600
On Sat, 2007-01-27 at 23:31 +0100, Henrik Stoerner wrote:
> On Sat, Jan 27, 2007 at 11:43:58AM -0600, Daniel J McDonald wrote:
> > Every couple of days I note that hobbitfetch is hung:
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 30444 hobbit    25   0  2020 1016  616 R 96.3  0.0   3869:03 hobbitfetch
> >
> > I then go kill -9 the hobbitfetch process, and it starts "working" again,
>
> Next time, please do a "kill -6" and run the resulting coredump through
> gdb to get the stacktrace (see the "Reporting bugs" online help).
>
> If possible, I'd also like to have a copy of the coredump and the
> hobbitfetch binary.

I've sent you a couple of those in the past...

On a whim, I specified IP addresses for those servers (rather than
0.0.0.0) and the infinite loop issue went away.  I still get the
"Program crashed / Fatal signal caught!" message on the hobbitfetch
monitoring page, but it is still pulling down data much of the time.
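
To find other entries that might be affected, something like the following could list hosts that still use the 0.0.0.0 placeholder. This is a rough sketch and rests on assumptions not stated in this thread: that the hosts are defined in a bb-hosts style file with one "IP hostname # tags" entry per line, and that the hosts polled by hobbitfetch carry a "pulldata" tag. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: file layout ("IP hostname # tags") and the "pulldata" tag
# are assumptions; check them against your own hosts file.

# Print the hostname of every entry whose address is the 0.0.0.0
# placeholder and whose line mentions "pulldata".
# $1 = path to the hosts file.
list_placeholder_pull_hosts() {
    awk '$1 == "0.0.0.0" && /pulldata/ { print $2 }' "$1"
}
```

Each hostname it prints is a candidate for getting an explicit IP address in place of 0.0.0.0.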

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
Austin Energy
http://www.austinenergy.com