hobbitd status-board not available
list David Gore
How do I fix this? I am concerned that Hobbit appears to be fluctuating between all green and numerous alarming hosts. The hosts should be alarming. This is after I upgraded to Solaris 10, which so far has been a major disaster.

Error output:

Whoops ! bb failed to send message - timeout
hobbitd status-board not available

~David
list Henrik Størner
On Wed, May 18, 2005 at 01:07:43AM +0000, David Gore wrote:
How do I fix this? I am concerned that Hobbit appears to be fluctuating between all green and numerous alarming hosts. The hosts should be alarming. This is after I have upgraded to Solaris 10, which so far has been a major disaster. Error output: Whoops ! bb failed to send message - timeout hobbitd status-board not available
This is usually an indication that Hobbit has crashed. Check /var/log/hobbit/hobbitd.log for strange messages (it's usually almost empty), and also hobbitlaunch.log for messages like "hobbitd terminated by signal ...".

If you see a message about "heartbeat lost", try removing the HEARTBEAT setting in hobbitlaunch.cfg - there are some cases where it doesn't really work well, and kills hobbitd without it being necessary.

Henrik
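The two log checks suggested above can be scripted. The following is only a sketch: check_hobbit_logs is a hypothetical helper name, and the default directory is the one from Henrik's reply, so it may not match your installation (the gdb output later in this thread shows a pidfile under /export/home/hobbit/server/logs, for example).

```shell
#!/bin/sh
# Sketch only: look for the crash indicators mentioned in the reply above.
# check_hobbit_logs is a hypothetical helper, not part of Hobbit.

check_hobbit_logs() {
    # Assumed default; pass your own log directory as the first argument.
    logdir="${1:-/var/log/hobbit}"

    # hobbitd.log is "usually almost empty", so any content is worth reading.
    if [ -s "$logdir/hobbitd.log" ]; then
        echo "== last lines of $logdir/hobbitd.log"
        tail -n 20 "$logdir/hobbitd.log"
    fi

    # hobbitlaunch.log is where a crash or a heartbeat kill would be recorded.
    if [ -f "$logdir/hobbitlaunch.log" ]; then
        echo "== crash indicators in $logdir/hobbitlaunch.log"
        grep -e 'terminated by signal' -e 'heartbeat lost' \
            "$logdir/hobbitlaunch.log" || echo "(none found)"
    fi
    return 0
}
```

Call it as `check_hobbit_logs` for the default directory, or `check_hobbit_logs /path/to/logs` for another one.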
list Thomas Pedersen
I am running Hobbit 4.0.4 on RHEL Linux in a somewhat special way. I have two BBNET servers, one of which is also the BBDISPLAY server. On the BBNET server without a display I am running Hobbit with basically only the bbnet process. On this host I am also running bbproxy for my production BBDISPLAY and my test BBDISPLAY (a third, separate server).

The problem started when I decided to "upgrade" the Hobbit BBNET-only server to also be a BBDISPLAY server. I did this by starting hobbit on port 1986 (--listen=172.17.110.20:1986) along with the bbdisplay/history/larrd etc. processes, and then adding a third BBDISPLAY server to my bbproxy configuration (--bbdisplay=172.17.110.20:1986).

hobbitd.log and hobbitlaunch.log are fine - no errors - but on my "primary" BBDISPLAY I now have the above line as status. I tried adding --debug to the hobbitd and bbdisplay configurations in hobbitlaunch.cfg, but this crashed. I tried disabling the HEARTBEAT in hobbitlaunch, but this did not solve the problem. If I stop Hobbit, the state is written to the tmp/hobbit.chk file, and when it is started again it has the saved state.

As if this were not enough, the "new" BBDISPLAY is making the headlines but no hosts. :-(

Any help is appreciated.

BR
Thomas
list Henrik Størner
On Wed, Jun 22, 2005 at 09:41:44AM +0200, Thomas wrote:
Now the problem started when I decided to "upgrade" the hobbit BBNET only server to also be a BBDISPLAY server. I did this by starting hobbit on port 1986 (--listen=172.17.110.20:1986) along with the bbdisplay/history/larrd etc. processes and then adding a third bbdisplay server to my bbproxy configuration (--bbdisplay=172.17.110.20:1986).
I haven't quite understood exactly which servers are running BBDISPLAY and which are proxying and who should talk to who, but I think the problem might have to do with the port 1986. Several of the tools - including bbgen (generates the overview webpages) and hobbitsvc.cgi (generates the detailed status view) - talk to hobbitd over the network, so they may have to be tweaked to communicate with a hobbitd on port 1986.

For troubleshooting, could you try running

    BBPORT=1986; export BBPORT
    bb ip.of.your.bbdisplay hobbitdboard

This should fetch the current status from your server using port 1986.

Now I'll go re-read your mail in greater detail and see if I can figure out what your setup is :-)

Regards,
Henrik
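For repeated checks against several hobbitd instances, the same test can be wrapped in a small function. This is a sketch: fetch_board is a hypothetical name, the default of 1984 is an assumption based on the standard port seen elsewhere in this thread, and the only real pieces are the bb client, the BBPORT variable and the hobbitdboard message from the reply above.

```shell
#!/bin/sh
# Sketch only: fetch the status board from a hobbitd on a given port.
# fetch_board is a hypothetical helper; it requires the "bb" client in PATH.

fetch_board() {
    host="$1"
    port="${2:-1984}"   # assumed default port
    # Subshell, so BBPORT does not leak into the caller's environment.
    (
        BBPORT="$port"
        export BBPORT
        bb "$host" "hobbitdboard"
    )
}
```

For the setup described in this thread, `fetch_board 172.17.110.20 1986 | wc -l` would give a rough idea of whether that hobbitd returns any status lines at all.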
list Thomas Pedersen
It does give me the board as output.
If it gives you more info
-bash-2.05b$ netstat -ln | grep 198
tcp 0 0 0.0.0.0:1984 0.0.0.0:* LISTEN
tcp 0 0 172.17.110.20:1986 0.0.0.0:* LISTEN
where 1986 is the "local" BBNET/BBDISPLAY and the 1984 is the bbproxy server.
The error is only reported on the "central" BBDISPLAY
list Henrik Størner
On Sat, Oct 08, 2005 at 04:08:57PM -0600, David Gore wrote:
What does this message mean? Typically we get this when disabling multiple hosts. Is it a host resource issue, or is something not replying quickly enough? We are on the snapshot from 03 October. This has been happening over many weeks and different snapshots. OS is Solaris 9.
It really points to a bug in the hobbitd daemon - it means that some task (usually bbdisplay) couldn't fetch the status information from the Hobbit server, which it uses to build the webpages. I'm somewhat alarmed if you have this problem with such a recent snapshot. I know there was a bug in 4.1.1 (and earlier) that could trigger this when disabling or renaming hosts, but that should not happen with the snapshot from 03 Oct.
I am pretty sure these happen as people disable hosts and it fails although bb2.html shows them going to blue in the history, they will not show up on the enable/disable screen and usually show as failed when executing the disable.
Interesting. I'll go over that particular piece of code again to see if I can come up with an explanation. If you have a way of triggering this, let me know - in that case, I'd like you to try out some things to make sure it is fixed.

Regards,
Henrik
list David Gore
What does this message mean? Typically we get this when disabling multiple hosts. Is it a host resource issue, or is something not replying quickly enough? We are on the snapshot from 03 October. This has been happening over many weeks and different snapshots. OS is Solaris 9.

It really is starting to frustrate people when we have a maintenance and everyone gets repeated pages during the maintenance. I am pretty sure these happen as people disable hosts and the disable fails. Although bb2.html shows the hosts going to blue in the history, they will not show up on the enable/disable screen, and they usually show as failed when executing the disable. I did get disable to work after I restarted Hobbit, but less than 12 hours later it is failing again.

Here are some logs:

bb-display.log:
2005-09-28 01:19:22 hobbitd status-board not available
2005-09-28 01:38:47 hobbitd status-board not available
2005-09-28 19:57:51 hobbitd status-board not available
2005-09-28 20:01:53 hobbitd status-board not available
2005-09-29 03:00:01 hobbitd status-board not available
2005-09-29 19:50:05
2005-09-30 00:14:10 hobbitd status-board not available
2005-09-30 00:15:12 hobbitd status-board not available
2005-09-30 00:16:13 hobbitd status-board not available
2005-09-30 04:15:17 hobbitd status-board not available
2005-10-06 21:37:08 hobbitd status-board not available
2005-10-07 20:43:13 hobbitd status-board not available
2005-10-07 21:41:15 hobbitd status-board not available
2005-10-08 02:56:16 hobbitd status-board not available
2005-10-08 04:11:19 hobbitd status-board not available
2005-10-08 04:12:20 hobbitd status-board not available
2005-10-08 15:22:32 hobbitd status-board not available
2005-10-08 15:32:34 hobbitd status-board not available
2005-10-08 15:40:36 hobbitd status-board not available

history.log:
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.memory - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.procs - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.msgs - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.cpu - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.se - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.prtdiag - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.disk - color unchanged (blue)
2005-10-08 15:29:28 Will not update /export/home/hobbit/data/hist/thisHost1.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.memory - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.prtdiag - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.se - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.procs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.msgs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.cpu - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.disk - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost01b.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.oradb - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.orasys - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.procs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.memory - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.disk - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.cpu - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.se - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.prtdiag - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02.msgs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost02b.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.topology - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.memory - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.prtdiag - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.net2 - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.se - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.conn - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.msgs - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.cpu - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03.disk - color unchanged (blue)
2005-10-08 15:31:40 Will not update /export/home/hobbit/data/hist/anotherHost03b.conn - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.memory - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.procs - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.msgs - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.cpu - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.se - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.prtdiag - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.disk - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost1.conn - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.memory - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.se - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.prtdiag - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.procs - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.msgs - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.cpu - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.linkstate - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.disk - color unchanged (blue)
2005-10-08 15:38:23 Will not update /export/home/hobbit/data/hist/thisHost2.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.memory - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.prtdiag - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.se - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.procs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.msgs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.cpu - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.disk - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost01b.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.oradb - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.orasys - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.procs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.memory - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.disk - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.cpu - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.se - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.prtdiag - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02.msgs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost02b.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.topology - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.memory - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.prtdiag - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.net2 - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.se - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.conn - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.msgs - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.cpu - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03.disk - color unchanged (blue)
2005-10-08 15:39:59 Will not update /export/home/hobbit/data/hist/anotherHost03b.conn - color unchanged (blue)

~David
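To see whether the failures cluster around particular days (a maintenance window, for instance), the bb-display.log entries can be tallied per date. This is a sketch under one assumption: the real log has one entry per line with the date as the first field, as in the excerpt above. count_board_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: count "status-board not available" entries per day.
# Reads the files given as arguments, or standard input if none are given.

count_board_errors() {
    awk '/hobbitd status-board not available/ { n[$1]++ }
         END { for (d in n) print d, n[d] }' "$@" | sort
}
```

Typical use would be `count_board_errors bb-display.log`, which prints one "date count" pair per line.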
list David Gore
It is still happening with the latest 4.1.2 install. A multi-host (~75+ hosts) disable worked, but then later on the enable it looks like hobbitd crashed:

hobbit@hobbit:/export/home/hobbit/server> find . -name core
./tmp/core
hobbit@hobbit:/export/home/hobbit/server> ls -al ./tmp/core
-rw------- 1 hobbit other 13630500 Oct 11 16:46 ./tmp/core
hobbit@hobbit:/export/home/hobbit/server> file ./tmp/core
./tmp/core: ELF 32-bit MSB core file SPARC Version 1, from 'hobbitd'
hobbit@hobbit:/export/home/hobbit/server> gdb bin/hobbitd tmp/core
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.9"...
Core was generated by `hobbitd --pidfile=/export/home/hobbit/server/logs/hobbitd.pid --restart=/export'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,Ultra-60/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,Ultra-60/lib/libc_psr.so.1
#0 0xff19fff8 in _libc_kill () from /usr/lib/libc.so.1
(gdb) bt
#0 0xff19fff8 in _libc_kill () from /usr/lib/libc.so.1
#1 0xff136cd8 in abort () from /usr/lib/libc.so.1
#2 0x00021080 in sigsegv_handler (signum=10) at sig.c:57
#3 <signal handler called>
(gdb)

Can you give me directions on how I can do a relatively clean install and still retain all my historical information?

~David
list David Gore
It has cored several times now due to attempted multi-host re-enables. I cannot re-enable the hosts. The last time was 5 hosts with 1 test. I am just going to let hobbit auto-enable them when their disable time expires. Additionally, the disable/enable web page is not populated with any hosts for about ten minutes after the crash, that includes the info page. ~David
list David Gore
I am not sure whether I missed this before. I don't think I did, but it's possible. Regardless, the problem has been resolved.

hobbitlaunch.log:2005-10-13 19:01:57 Could not get sem: No space left on device

Solaris 9, /etc/system:

    set shmsys:shminfo_shmseg=10

    # reboot # or init 6

Everything works well, including multi-host enable/disables. No cores since making the change. Thank you Henrik for all your hard work!

~David

*e-mail via SUSE Linux 9.3 and other open source tools.
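A quick way to check for the same symptom and see what /etc/system currently says is sketched below. It only reports; it changes nothing, and it makes no claim that this setting is the right fix on every Solaris release (a later post in this thread reports no improvement on Solaris 10). check_sem_problem is a hypothetical helper, and both default paths are assumptions.

```shell
#!/bin/sh
# Sketch only: report the "Could not get sem" symptom and the current
# shmsys:shminfo_shmseg line, if any. Read-only; nothing is modified.

check_sem_problem() {
    # Assumed defaults; pass your own paths as arguments.
    launchlog="${1:-/var/log/hobbit/hobbitlaunch.log}"
    sysfile="${2:-/etc/system}"

    if [ -f "$launchlog" ] && grep 'Could not get sem' "$launchlog" >/dev/null; then
        echo "symptom: semaphore allocation failures logged in $launchlog"
    else
        echo "symptom: no 'Could not get sem' entries found in $launchlog"
    fi

    setting=""
    if [ -f "$sysfile" ]; then
        setting=$(grep '^set shmsys:shminfo_shmseg=' "$sysfile" || true)
    fi
    if [ -n "$setting" ]; then
        echo "setting: $setting"
    else
        echo "setting: shmsys:shminfo_shmseg not set in $sysfile"
    fi
}
```

For the installation in this thread the call might look like `check_sem_problem /export/home/hobbit/server/logs/hobbitlaunch.log`, assuming the launch log sits next to the pidfile shown in the gdb output.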
list Henrik Størner
On Thu, Oct 13, 2005 at 07:37:31PM +0000, David Gore wrote:
I am not sure, if I missed this before I don't think I did, but it's possible. Regardless the problem has been resolved. hobbitlaunch.log:2005-10-13 19:01:57 Could not get sem: No space left on device
Whoa - that would explain why it crashes on any kind of disable you'd do. I just don't understand how you got hobbitd running at all with that error. If you could mail me the full hobbitlaunch.log and hobbitd.log files (directly at user-ce4a2c883f75@xymon.invalid, not to the list), I would very much like to take a look at them. Regards, Henrik
list Guillermo Castellini
I have problems with the bbgen test. I found these logs:

bb-display.log
::::::::::::::
2007-02-22 17:29:33 hobbitd status-board not available
2007-02-22 17:30:33 hobbitd status-board not available
2007-02-22 17:44:46 hobbitd status-board not available
2007-02-22 17:51:48 hobbitd status-board not available
2007-02-22 17:58:57 hobbitd status-board not available
2007-02-22 18:50:40 hobbitd status-board not available
2007-02-22 19:06:54 hobbitd status-board not available
2007-02-22 20:24:59 hobbitd status-board not available
2007-02-22 20:28:00 hobbitd status-board not available
2007-02-22 20:34:05 hobbitd status-board not available
2007-02-22 20:51:29 hobbitd status-board not available
2007-02-22 21:50:20 hobbitd status-board not available
2007-02-22 21:59:26 hobbitd status-board not available
2007-02-22 22:03:30 hobbitd status-board not available
2007-02-22 23:00:18 hobbitd status-board not available
2007-02-22 23:06:23 hobbitd status-board not available
2007-02-22 23:46:01 hobbitd status-board not available
2007-02-23 00:46:56 hobbitd status-board not available
2007-02-23 00:59:06 hobbitd status-board not available
2007-02-23 02:12:12 hobbitd status-board not available
2007-02-23 02:17:17 hobbitd status-board not available
2007-02-23 02:18:17 hobbitd status-board not available
2007-02-23 02:21:19 hobbitd status-board not available
2007-02-23 02:47:42 hobbitd status-board not available
2007-02-23 03:56:47 hobbitd status-board not available
2007-02-23 04:49:36 hobbitd status-board not available
2007-02-23 05:41:23 hobbitd status-board not available
2007-02-23 06:46:20 hobbitd status-board not available
2007-02-23 08:13:40 hobbitd status-board not available

I set "shmsys:shminfo_shmseg=10" in the /etc/system of my Solaris 10 (SPARC) machine, but the problem is still there. I read the mail archives, but I couldn't find a real solution to my problem. Any ideas?

Thanks a lot to the community!
list Henrik Størner
On Fri, Feb 23, 2007 at 08:24:01AM -0300, Guillermo Castellini wrote:
I have problems with bbgen test, i found this logs: bb-display.log :::::::::::::: 2007-02-22 17:29:33 hobbitd status-board not available
I'm curious if the attached patch solves this problem. I ran into a
similar issue during a major network problem here, and found out that
the Hobbit 4.2.0 "hobbitd" daemon could stop servicing requests if one
connection to e.g. a client sending a status report was hanging.
Regards,
Henrik
-------------- next part --------------
--- hobbitd/hobbitd.c.orig 2007-02-23 12:33:49.678273441 +0100
+++ hobbitd/hobbitd.c 2007-02-23 12:33:53.374595668 +0100
@@ -4368,6 +4368,8 @@
switch (cwalk->doingwhat) {
case RECEIVING:
if (FD_ISSET(cwalk->sock, &fdread)) {
+ if ((n == -1) && (errno == EAGAIN)) break; /* Do nothing */
+
n = read(cwalk->sock, cwalk->bufp, (cwalk->bufsz - cwalk->buflen - 1));
if (n <= 0) {
/* End of input data on this connection */
@@ -4405,6 +4407,8 @@
if (FD_ISSET(cwalk->sock, &fdwrite)) {
n = write(cwalk->sock, cwalk->bufp, cwalk->buflen);
+ if ((n == -1) && (errno == EAGAIN)) break; /* Do nothing */
+
if (n < 0) {
cwalk->buflen = 0;
}
@@ -4527,6 +4531,9 @@
int sock = accept(lsocket, (struct sockaddr *)&addr, &addrsz);
if (sock >= 0) {
+ /* Make sure our sockets are non-blocking */
+ fcntl(sock, F_SETFL, O_NONBLOCK);
+
if (connhead == NULL) {
connhead = conntail = (conn_t *)malloc(sizeof(conn_t));
}
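The idea behind the patch, marking accepted sockets non-blocking and treating EAGAIN as "nothing to do yet" so that one hanging connection cannot stall the whole loop, can be sketched roughly as follows. This is only a minimal illustration and not code from hobbitd.c: the names io_result_t, set_nonblocking, accept_nonblocking and try_read are hypothetical, and the sketch assumes a POSIX system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of one read attempt (illustrative names, not from hobbitd.c). */
typedef enum { IO_DATA, IO_RETRY, IO_CLOSED } io_result_t;

/* Put a descriptor into non-blocking mode while keeping its other flags.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/* Accept a connection and make it non-blocking right away.
 * Returns the new descriptor, or -1 on failure. */
int accept_nonblocking(int lsocket)
{
    int sock = accept(lsocket, NULL, NULL);
    if (sock >= 0 && set_nonblocking(sock) == -1) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Attempt a single read on a non-blocking descriptor.
 * errno is only meaningful after read() has returned -1, so the
 * EAGAIN test is made after the call. */
io_result_t try_read(int fd, char *buf, size_t len, ssize_t *nread)
{
    assert(buf != NULL && nread != NULL);

    *nread = read(fd, buf, len);
    if (*nread > 0) return IO_DATA;
    if (*nread == -1 &&
        (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)) {
        return IO_RETRY;   /* nothing to read yet; keep the connection */
    }
    return IO_CLOSED;      /* EOF (0) or a genuine error: drop the connection */
}
```

In a select() loop, IO_RETRY would mean leaving the connection in the list and going back to select(), while IO_CLOSED would mean closing the descriptor and discarding its buffer.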
list Mike Rowell
Henrik, I am also one of the sufferers of this annoying bug. From what I've seen on the platforms here, it only seems to affect Solaris 10; Solaris 8 (at least here) is unaffected. I've applied the patch and will monitor the situation. Regards, Mike
This email has been scanned for all viruses by the MessageLabs service.
list Guillermo Castellini
I applied this patch and I am monitoring the log. Thanks and regards.
list Guillermo Castellini
I installed this patch, but the problem is still there... :(
bb-display.log ::::::::::::::
2007-02-23 10:35:38 hobbitd status-board not available
list Steve Holmes
Hmmm. Me too. I hadn't noticed it before (I'm still testing hobbit). I would be very interested in a fix for it :-). Steve Holmes Purdue University.
list Mike Rowell
Unfortunately with the patch I have just had a status-board not available. Regards, Mike