Xymon Mailing List Archive

"hobbitd status-board not available" from bbgen on solaris 10

5 messages in this thread

list Colin Spargo · Thu, 19 Apr 2007 12:30:23 +0100 ·
If anyone has been having issues with bbgen logging this error message on Solaris 10 and intermittently failing, resulting in blank status pages, then I think I have found a workaround.
If you disable TCP fusion by adding the following kernel parameter to /etc/system and rebooting, you will hopefully find that the problem goes away.

set ip:do_tcp_fusion = 0

Apparently this can be done on a live system as well (without rebooting), but will require hobbit to be restarted. To do this:

echo do_tcp_fusion/W0 | mdb -kw
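
For anyone scripting this across several servers, a rough way to check that the persistent setting is in place might look like the sketch below. The helper name fusion_disabled_in is made up for illustration, and it assumes a POSIX-style shell (such as the bash used elsewhere in this thread) and a grep that understands -E, which on Solaris 10 is probably /usr/xpg4/bin/grep rather than /usr/bin/grep. It only inspects the file; it says nothing about the value in the running kernel.

```shell
# Hypothetical helper, not part of hobbit or Solaris.
# Succeeds if FILE (default: /etc/system) contains an active
# "set ip:do_tcp_fusion = 0" line. Lines starting with "*" are
# comments in /etc/system, so they do not match the anchored pattern.
fusion_disabled_in() {
    file=${1:-/etc/system}
    grep -E '^[[:space:]]*set[[:space:]]+ip:do_tcp_fusion[[:space:]]*=[[:space:]]*0[[:space:]]*$' \
        "$file" >/dev/null 2>&1
}
```

Usage would be something like: fusion_disabled_in && echo "TCP fusion is disabled at next boot"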


TCP fusion is only used on local loopback connections to speed them up by bypassing the normal TCP stack. I found that the problem only occurred when connecting to hobbitd locally. I tried running "bb localhost hobbitdboard" once a second, and found it would often return no data, but if I ran the same command from another host to the hobbit server, it always returned correct data. This made me suspect TCP fusion, as I have run into issues with it before. It is best left disabled in my opinion.
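
The once-a-second test described above could be sketched roughly as follows. The function name probe_board and the PROBE_INTERVAL variable are invented for this example, and a POSIX-style shell is assumed. It runs whatever command it is given a fixed number of times and prints how many runs produced no output at all.

```shell
# Hypothetical helper: run a command ATTEMPTS times and print the
# number of runs that returned no data on stdout.
# PROBE_INTERVAL (seconds between runs) defaults to 1.
probe_board() {
    attempts=$1
    shift
    empty=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        out=$("$@" 2>/dev/null)
        if [ -z "$out" ]; then
            empty=$((empty + 1))
        fi
        i=$((i + 1))
        sleep "${PROBE_INTERVAL:-1}"
    done
    echo "$empty"
}
```

For example, "probe_board 60 bb localhost hobbitdboard" on the hobbit server, and the same with the server's hostname from another machine, would give two counts to compare.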
list Henrik Størner · Thu, 19 Apr 2007 15:59:08 +0200 ·
quoted from Colin Spargo
On Thu, Apr 19, 2007 at 12:30:23PM +0100, Colin Spargo wrote:
TCP fusion is only used on local loopback connections to speed them up by bypassing the normal TCP stack. I found that the problem only occurred when connecting to hobbitd locally. I tried running "bb localhost hobbitdboard" once a second, and found it would often return no data, but if I ran the same command from another host to the hobbit server, it always returned correct data. This made me suspect TCP fusion, as I have run into issues with it before. It is best left disabled in my opinion.
Very interesting, thanks. There have been some reports about problems on Solaris 10, and at one point I was suspecting an OS bug. Seems I was
right.


Henrik
list T.J. Yang · Thu, 19 Apr 2007 11:46:23 -0500 ·
1. stop hobbit server
2. zero out the existing log file
3. apply the online fix
4. So far so good, I can confirm the status-board error message is now gone 
;)

bash-3.00# grep -i status-board  *.log
bash-3.00# pwd
/var/opt/hobbitserver42/log
bash-3.00# ls *.log
acknowledge.log    cgierror.log       hobbitlaunch.log   rrd-data.log
bb-display.log     clientdata.log     hobbitlaunch.pid   rrd-status.log
bb-network.log     history.log        hostdata.log
bb-retest.log      hobbitd.log        notifications.log
bbcombotest.log    hobbitd.pid        page.log
bash-3.00# cat /etc/release
                       Solaris 10 6/06 s10s_u2wos_09a SPARC
           Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                             Assembled 09 June 2006
bash-3.00#


Good job on tracking down the cause and providing the fix.

T.J. Yang

list Colin Spargo · Thu, 19 Apr 2007 18:10:26 +0100 ·
Good to hear!

A trawl through sunsolve shows a few bugs that may have something to do 
with it:

Bug ID: 6458410  Synopsis: read() may spuriously return EAGAIN while 
unfusing a TCP connection

No patch for this yet I believe. 

list Colin Spargo · Thu, 19 Apr 2007 18:21:20 +0100 ·
Missed out the other bug, which is a duplicate of the one in my previous message (6458410). It gives more detail about the issue:


"Bug ID: 6454060 Synopsis: select()/poll() indicate a socket as readable 
when there is no data available

Description: 

pollsys() (on behalf of select()) indicates available data on a socket but 
a following recvfrom() fails with EAGAIN:"

I'm not sure whether this applies here, but it is a similarly spurious kind of issue.

