Xymon Mailing List Archive

hobbitd status-board not available

18 messages in this thread

list David Gore · Wed, 18 May 2005 01:07:43 +0000 ·
How do I fix this?  I am concerned that Hobbit appears to be fluctuating between all green and numerous alarming hosts.

The hosts should be alarming.  This is after I have upgraded to Solaris 10, which so far has been a major disaster.

Error output:
Whoops ! bb failed to send message - timeout
hobbitd status-board not available

~David
list Henrik Størner · Wed, 18 May 2005 05:50:14 +0200 ·
quoted from David Gore
On Wed, May 18, 2005 at 01:07:43AM +0000, David Gore wrote:
How do I fix this?  I am concerned that Hobbit appears to be fluctuating between all green and numerous alarming hosts.

The hosts should be alarming.  This is after I have upgraded to Solaris 10, which so far has been a major disaster.

Error output:
Whoops ! bb failed to send message - timeout
hobbitd status-board not available

This is usually an indication that Hobbit has crashed.

Check the /var/log/hobbit/hobbitd.log for strange messages
(it's usually almost empty), and also hobbitlaunch.log for 
messages like "hobbitd terminated by signal ..."

If you see a message about "heartbeat lost", try removing
the HEARTBEAT setting in hobbitlaunch.cfg - there are some
cases where it doesn't really work well, and kills hobbitd
without it being necessary.
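
The log checks described above could be scripted roughly as follows. This is only a sketch: the helper name scan_hobbit_logs is hypothetical, the log directory defaults to the /var/log/hobbit path mentioned above, the search strings are just the two message fragments quoted in this reply, and a grep that supports -E is assumed.

```shell
#!/bin/sh
# Sketch: look for the crash indicators mentioned above in the Hobbit logs.
# scan_hobbit_logs is a hypothetical helper name, not part of Hobbit.
scan_hobbit_logs() {
    logdir="${1:-/var/log/hobbit}"   # assumed default location
    found=1                          # 1 = nothing found (shell "false")
    for f in "$logdir/hobbitd.log" "$logdir/hobbitlaunch.log"; do
        [ -r "$f" ] || continue
        # Passing /dev/null as a second file makes grep print the file name.
        if grep -i -E 'terminated by signal|heartbeat lost' "$f" /dev/null; then
            found=0
        fi
    done
    return "$found"
}
```

Calling scan_hobbit_logs with no argument scans the default directory; the function returns success only if at least one matching line was printed.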


Henrik
list Thomas Pedersen · Wed, 22 Jun 2005 09:41:44 +0200 ·
I am running Hobbit 4.0.4 on RHEL Linux in a somewhat unusual way. I have two BBNET servers, one of which is also the BBDISPLAY server.

On the BBNET server without display I am running Hobbit with basically only the bbnet process. On this host I am also running bbproxy for my production BBDISPLAY and my test BBDISPLAY (a third, separate server).

Now the problem started when I decided to "upgrade" the Hobbit BBNET-only server to also be a BBDISPLAY server. I did this by starting hobbit on port 1986 (--listen=172.17.110.20:1986) along with the bbdisplay/history/larrd etc. processes and then adding a third BBDISPLAY server to my bbproxy configuration (--bbdisplay=172.17.110.20:1986).

Hobbitd.log and hobbitlaunch.log are fine - no errors - but on my "primary" BBDISPLAY I now get "hobbitd status-board not available" as status. I tried to add --debug to the hobbitd and bbdisplay configurations in hobbitlaunch.cfg, but this crashed.

I tried to disable the HEARTBEAT in hobbitlaunch, but this did not solve the problem.

If I stop Hobbit, the state is written to the tmp/hobbit.chk file, and when it is started again it has the saved state.

As if this were not enough, the "new" BBDISPLAY generates the page headlines but shows no hosts. :-(

Any help is appreciated.

BR Thomas
list Henrik Størner · Wed, 22 Jun 2005 14:29:44 +0200 ·
quoted from Thomas Pedersen
On Wed, Jun 22, 2005 at 09:41:44AM +0200, Thomas wrote:
Now the problem started when I decided to "upgrade" the Hobbit BBNET-only server to also be a BBDISPLAY server. I did this by starting hobbit on port 1986 (--listen=172.17.110.20:1986) along with the bbdisplay/history/larrd etc. processes and then adding a third BBDISPLAY server to my bbproxy configuration (--bbdisplay=172.17.110.20:1986).

I haven't quite understood exactly which servers are running BBDISPLAY and which are proxying and who should talk to whom, but I think the problem might have to do with port 1986. Several of the tools - including bbgen (generates the overview webpages) and hobbitsvc.cgi (generates the detailed status view) - talk to hobbitd over the network, so they may have to be tweaked to communicate with a hobbitd on port 1986.

For troubleshooting, could you try running

   BBPORT=1986; export BBPORT
   bb ip.of.your.bbdisplay hobbitdboard

This should fetch the current status from your server using port 1986.
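
For repeated checks, the two commands above could be wrapped in a small helper. This is a sketch under assumptions: fetch_board is a hypothetical name, and the fallback to port 1984 assumes the usual Big Brother port applies when none is given. It only relies on the bb client and the BBPORT variable as used above.

```shell
#!/bin/sh
# Sketch: fetch the status board from a hobbitd on a given port.
# fetch_board is a hypothetical helper name, not part of Hobbit.
fetch_board() {
    host="$1"
    port="${2:-1984}"   # assumed default port
    # Run in a subshell so the BBPORT override does not leak
    # into the calling shell's environment.
    ( BBPORT="$port"; export BBPORT; bb "$host" hobbitdboard )
}

# Example with the address and port from this thread:
#   fetch_board 172.17.110.20 1986 | wc -l
```

A non-empty result suggests that hobbitd answers on that port; an empty one points at the listener or the port setting.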

Now I'll go re-read your mail in greater detail and see if I can figure
out what your setup is :-)


Regards,
Henrik
list Thomas Pedersen · Wed, 22 Jun 2005 14:47:32 +0200 ·
It does give me the board as output.

In case this gives you more info:

-bash-2.05b$ netstat -ln | grep 198
tcp 0 0 0.0.0.0:1984 0.0.0.0:* LISTEN
tcp 0 0 172.17.110.20:1986 0.0.0.0:* LISTEN

Here 1986 is the "local" BBNET/BBDISPLAY and 1984 is the bbproxy server.

The error is only reported on the "central" BBDISPLAY.

list Henrik Størner · Sat, 8 Oct 2005 20:35:02 +0200 ·
On Sat, Oct 08, 2005 at 04:08:57PM -0600, David Gore wrote:
What does this message mean? Typically we get this when disabling multiple hosts. Is it a host resource issue, where something isn't replying quickly enough? We are on the snapshot from 03 October. This has been happening over many weeks and different snapshots. The OS is Solaris 9.

It really points to a bug in the hobbitd daemon - it means that some
task (usually bbdisplay) couldn't fetch the status information from
the Hobbit server, which it uses to build the webpages.

I'm somewhat alarmed if you have this problem with such a recent snapshot. I know there was a bug in 4.1.1 (and earlier) that could trigger this when disabling or renaming hosts, but that should not
happen with the snapshot from 03 Oct.

I am pretty sure these happen as people disable hosts and the disable fails. Although bb2.html shows them going to blue in the history, they do not show up on the enable/disable screen and usually show as failed when executing the disable.

Interesting. I'll go over that particular piece of code again to
see if I can come up with an explanation. If you have a way of
triggering this, let me know - in that case, I'd like you to try out
some things to make sure it is fixed.


Regards,
Henrik
list David Gore · Sat, 08 Oct 2005 16:08:57 -0600 ·
What does this message mean? Typically we get this when disabling multiple hosts. Is it a host resource issue, where something isn't replying quickly enough? We are on the snapshot from 03 October. This has been happening over many weeks and different snapshots. The OS is Solaris 9.

It is really starting to frustrate people when we have maintenance and everyone gets repeated pages during the maintenance.

I am pretty sure these happen as people disable hosts and the disable fails. Although bb2.html shows them going to blue in the history, they do not show up on the enable/disable screen and usually show as failed when executing the disable.

I did get disable to work after I restarted hobbit, but less than 12 hours later it is failing again.

Here are some logs:

bb-display.log:
2005-09-28 01:19:22 hobbitd status-board not available
2005-09-28 01:38:47 hobbitd status-board not available
2005-09-28 19:57:51 hobbitd status-board not available
2005-09-28 20:01:53 hobbitd status-board not available
2005-09-29 03:00:01 hobbitd status-board not available
2005-09-29 19:50:052005-09-30 00:14:10 hobbitd status-board not available
2005-09-30 00:15:12 hobbitd status-board not available
2005-09-30 00:16:13 hobbitd status-board not available
2005-09-30 04:15:17 hobbitd status-board not available
2005-10-06 21:37:08 hobbitd status-board not available
2005-10-07 20:43:13 hobbitd status-board not available
2005-10-07 21:41:15 hobbitd status-board not available
2005-10-08 02:56:16 hobbitd status-board not available
2005-10-08 04:11:19 hobbitd status-board not available
2005-10-08 04:12:20 hobbitd status-board not available
2005-10-08 15:22:32 hobbitd status-board not available
2005-10-08 15:32:34 hobbitd status-board not available
2005-10-08 15:40:36 hobbitd status-board not available
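
To see how the failures are spread over time, a log like the one above can be summarized per day. This is a minimal sketch: count_board_errors is a hypothetical helper name, and it assumes each entry starts with a YYYY-MM-DD date as in the lines shown.

```shell
#!/bin/sh
# Sketch: count "status-board not available" entries per day.
# count_board_errors is a hypothetical helper name, not part of Hobbit.
count_board_errors() {
    # Field 1 of each log line is the date (YYYY-MM-DD).
    awk '/hobbitd status-board not available/ { n[$1]++ }
         END { for (d in n) print d, n[d] }' "$1" | sort
}

# Example:
#   count_board_errors bb-display.log
```

Days with many entries can then be compared against the times when hosts were disabled or enabled.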

history.log:
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.memory - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.procs - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.msgs - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.cpu - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.se - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.prtdiag - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.disk - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.memory - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.prtdiag - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.se - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.procs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.msgs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.cpu - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.disk - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01b.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.oradb - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.orasys - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.procs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.memory - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.disk - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.cpu - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.se - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.prtdiag - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.msgs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02b.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.topology - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.memory - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.prtdiag - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.net2 - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.se - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.msgs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.cpu - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.disk - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03b.conn - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.memory - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.procs - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.msgs - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.cpu - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.se - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.prtdiag - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.disk - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.conn - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.memory - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.se - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.prtdiag - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.procs - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.msgs - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.cpu - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.linkstate - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.disk - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.memory - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.prtdiag - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.se - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.procs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.msgs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.cpu - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.disk - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01b.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.oradb - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.orasys - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.procs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.memory - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.disk - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.cpu - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.se - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.prtdiag - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.msgs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02b.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.topology - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.memory - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.prtdiag - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.net2 - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.se - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.msgs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.cpu - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.disk - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03b.conn - color unchanged (blue)

~David
list David Gore · Tue, 11 Oct 2005 16:56:32 +0000 ·
It is still happening with the latest 4.1.2 install.  A multi-host (~75+ 
hosts) disable worked, but then later on the enable it looks like 
hobbitd crashed:

hobbit@hobbit:/export/home/hobbit/server> find . -name core
./tmp/core
hobbit@hobbit:/export/home/hobbit/server> ls -al ./tmp/core
-rw-------   1 hobbit   other    13630500 Oct 11 16:46 ./tmp/core
hobbit@hobbit:/export/home/hobbit/server> file ./tmp/core
./tmp/core:     ELF 32-bit MSB core file SPARC Version 1, from 'hobbitd'
hobbit@hobbit:/export/home/hobbit/server> gdb bin/hobbitd tmp/core
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.9"...
Core was generated by `hobbitd --pidfile=/export/home/hobbit/server/logs/hobbitd.pid --restart=/export'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,Ultra-60/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,Ultra-60/lib/libc_psr.so.1
#0  0xff19fff8 in _libc_kill () from /usr/lib/libc.so.1
(gdb) bt
#0  0xff19fff8 in _libc_kill () from /usr/lib/libc.so.1
#1  0xff136cd8 in abort () from /usr/lib/libc.so.1
#2  0x00021080 in sigsegv_handler (signum=10) at sig.c:57
#3  <signal handler called>
(gdb)

Can you give me directions on how I can do a relatively clean install 
and still retain all my historical information?

~David
list David Gore · Tue, 11 Oct 2005 17:28:28 +0000 ·
It has cored several times now due to attempted multi-host re-enables. I cannot re-enable the hosts. The last time was 5 hosts with 1 test. I am just going to let Hobbit auto-enable them when their disable time expires. Additionally, the disable/enable web page is not populated with any hosts for about ten minutes after the crash; this includes the info page.

~David
list David Gore · Thu, 13 Oct 2005 19:37:31 +0000 ·
I am not sure whether I missed this before. I don't think I did, but it's possible.
Regardless, the problem has been resolved.

hobbitlaunch.log:2005-10-13 19:01:57 Could not get sem: No space left on device

Solaris 9:

/etc/system:

set shmsys:shminfo_shmseg=10

# reboot # or init 6

Everything works well including multi-host enable/disables.  No cores since making the change.
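
To confirm that a tunable like the one above is actually present in /etc/system, something along these lines could be used. It is only a sketch: system_tunable is a hypothetical helper name, and it assumes a POSIX awk (nawk on older Solaris) and the plain "set name=value" syntax shown above.

```shell
#!/bin/sh
# Sketch: print the value of a "set name=value" tunable from an
# /etc/system-style file. system_tunable is a hypothetical helper name.
system_tunable() {
    name="$1"
    file="${2:-/etc/system}"
    # Comment lines in /etc/system start with '*'; they are skipped
    # because their first field is never "set".
    awk -v n="$name" '
        $1 == "set" {
            line = $0
            sub(/^[ \t]*set[ \t]+/, "", line)   # drop the leading "set"
            gsub(/[ \t]/, "", line)             # tolerate spaces around "="
            split(line, kv, "=")
            if (kv[1] == n) val = kv[2]
        }
        END { if (val != "") print val; else exit 1 }
    ' "$file"
}

# Example:
#   system_tunable shmsys:shminfo_shmseg
```

The function prints the last value set for the name and returns a non-zero status if the tunable is not set in the file.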

Thank you Henrik for all your hard work!


~David

*e-mail via SUSE Linux 9.3 and other open source tools.
list Henrik Størner · Thu, 13 Oct 2005 22:01:27 +0200 ·
quoted from David Gore
On Thu, Oct 13, 2005 at 07:37:31PM +0000, David Gore wrote:
I am not sure whether I missed this before. I don't think I did, but it's possible. Regardless, the problem has been resolved.

hobbitlaunch.log:2005-10-13 19:01:57 Could not get sem: No space left on device

Whoa - that would explain why it crashes on any kind of disable you'd
do. I just don't understand how you got hobbitd running at all with
that error.

If you could mail me the full hobbitlaunch.log and hobbitd.log files
(directly at user-ce4a2c883f75@xymon.invalid, not to the list), I would very much like to take a look at them.


Regards,
Henrik
list Guillermo Castellini · Fri, 23 Feb 2007 08:24:01 -0300 ·
I have problems with the bbgen test; I found these logs:

bb-display.log
::::::::::::::
2007-02-22 17:29:33 hobbitd status-board not available
2007-02-22 17:30:33 hobbitd status-board not available
2007-02-22 17:44:46 hobbitd status-board not available
2007-02-22 17:51:48 hobbitd status-board not available
2007-02-22 17:58:57 hobbitd status-board not available
2007-02-22 18:50:40 hobbitd status-board not available
2007-02-22 19:06:54 hobbitd status-board not available
2007-02-22 20:24:59 hobbitd status-board not available
2007-02-22 20:28:00 hobbitd status-board not available
2007-02-22 20:34:05 hobbitd status-board not available
2007-02-22 20:51:29 hobbitd status-board not available
2007-02-22 21:50:20 hobbitd status-board not available
2007-02-22 21:59:26 hobbitd status-board not available
2007-02-22 22:03:30 hobbitd status-board not available
2007-02-22 23:00:18 hobbitd status-board not available
2007-02-22 23:06:23 hobbitd status-board not available
2007-02-22 23:46:01 hobbitd status-board not available
2007-02-23 00:46:56 hobbitd status-board not available
2007-02-23 00:59:06 hobbitd status-board not available
2007-02-23 02:12:12 hobbitd status-board not available
2007-02-23 02:17:17 hobbitd status-board not available
2007-02-23 02:18:17 hobbitd status-board not available
2007-02-23 02:21:19 hobbitd status-board not available
2007-02-23 02:47:42 hobbitd status-board not available
2007-02-23 03:56:47 hobbitd status-board not available
2007-02-23 04:49:36 hobbitd status-board not available
2007-02-23 05:41:23 hobbitd status-board not available
2007-02-23 06:46:20 hobbitd status-board not available
2007-02-23 08:13:40 hobbitd status-board not available

I set "shmsys:shminfo_shmseg=10" in /etc/system on my Solaris 10
(SPARC) server, but the problem is still there.
I read the mailing list archives, but I couldn't find a real solution to my
problem. Any ideas?

Thanks a lot to the community!
list Henrik Størner · Fri, 23 Feb 2007 12:34:57 +0100 ·
quoted from Guillermo Castellini
On Fri, Feb 23, 2007 at 08:24:01AM -0300, Guillermo Castellini wrote:
I have problems with the bbgen test. I found these log entries:

bb-display.log
::::::::::::::
2007-02-22 17:29:33 hobbitd status-board not available

I'm curious if the attached patch solves this problem. I ran into a
similar issue during a major network problem here, and found out that
the Hobbit 4.2.0 "hobbitd" daemon could stop servicing requests if one
connection to e.g. a client sending a status report was hanging.


Regards,
Henrik

-------------- next part --------------
--- hobbitd/hobbitd.c.orig	2007-02-23 12:33:49.678273441 +0100
+++ hobbitd/hobbitd.c	2007-02-23 12:33:53.374595668 +0100
@@ -4368,6 +4368,8 @@
 			switch (cwalk->doingwhat) {
 			  case RECEIVING:
 				if (FD_ISSET(cwalk->sock, &fdread)) {
+					if ((n == -1) && (errno == EAGAIN)) break; /* Do nothing */
+
 					n = read(cwalk->sock, cwalk->bufp, (cwalk->bufsz - cwalk->buflen - 1));
 					if (n <= 0) {
 						/* End of input data on this connection */
@@ -4405,6 +4407,8 @@
 				if (FD_ISSET(cwalk->sock, &fdwrite)) {
 					n = write(cwalk->sock, cwalk->bufp, cwalk->buflen);
 
+					if ((n == -1) && (errno == EAGAIN)) break; /* Do nothing */
+
 					if (n < 0) {
 						cwalk->buflen = 0;
 					}
@@ -4527,6 +4531,9 @@
 			int sock = accept(lsocket, (struct sockaddr *)&addr, &addrsz);
 
 			if (sock >= 0) {
+				/* Make sure our sockets are non-blocking */
+				fcntl(sock, F_SETFL, O_NONBLOCK);
+
 				if (connhead == NULL) {
 					connhead = conntail = (conn_t *)malloc(sizeof(conn_t));
 				}
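
The idea behind the patch can be sketched in isolation. This is a minimal illustration, not the hobbitd code: it puts a descriptor into non-blocking mode with fcntl() and treats EAGAIN after a read() as "nothing to do yet" instead of as end of input, so one stalled peer cannot hold up a select() loop that serves many connections. The names set_nonblocking, read_once and enum io_result are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not taken from hobbitd.c. */
enum io_result { IO_DATA, IO_RETRY, IO_CLOSED };

/* Add O_NONBLOCK to a descriptor while preserving its other status flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Perform a single read. On IO_DATA, *got holds the number of bytes read.
 * IO_RETRY means the call would have blocked (or was interrupted), so the
 * caller should keep the connection open and go back to select().
 */
static enum io_result read_once(int fd, char *buf, size_t bufsz, ssize_t *got)
{
    ssize_t n = read(fd, buf, bufsz);

    if (n > 0) {
        *got = n;
        return IO_DATA;
    }

    *got = 0;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)) {
        return IO_RETRY;
    }

    /* n == 0 (peer closed) or a real error: drop the connection. */
    return IO_CLOSED;
}
```

Unlike the one-line fcntl() call in the patch, this sketch reads the existing flags first with F_GETFL so that other status flags on the descriptor are kept.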
list Mike Rowell · Fri, 23 Feb 2007 12:08:05 -0000 ·
Henrik,

I am also one of the sufferers of this annoying bug. From what I've seen
on the platforms here, it only seems to affect Solaris 10; Solaris 8
(at least here) is unaffected.

I've applied the patch and will monitor the situation.

Regards,

Mike

This email has been scanned for all viruses by the MessageLabs service.

list Guillermo Castellini · Fri, 23 Feb 2007 10:08:55 -0300 ·
I applied this patch and am monitoring the log.
Thanks and regards.

list Guillermo Castellini · Fri, 23 Feb 2007 11:08:58 -0300 ·
I installed this patch, but the problem is still there... :(

bb-display.log
::::::::::::::
2007-02-23 10:35:38 hobbitd status-board not available

list Steve Holmes · Fri, 23 Feb 2007 10:39:59 -0500 ·
Hmmm. Me too. I hadn't noticed it before (I'm still testing hobbit). I would
be very interested in a fix for it :-).
Steve Holmes
Purdue University.

list Mike Rowell · Fri, 23 Feb 2007 17:09:10 -0000 ·
Unfortunately, even with the patch applied, I have just had another
"status-board not available" error.

Regards,

Mike

 