"hobbitd status-board not available" from bbgen on solaris 10
list Colin Spargo
If anyone has been having issues with bbgen logging this error message on Solaris 10 and intermittently failing, resulting in blank status pages, then I think I have found a workaround.
If you disable TCP fusion by adding the following kernel parameter to /etc/system and rebooting, hopefully you will find that the problem goes away.
set ip:do_tcp_fusion = 0
Apparently this can also be done on a live system (without rebooting), but hobbit will need to be restarted afterwards. To do this:
echo do_tcp_fusion/W0 | mdb -kw
TCP fusion is only used on local loopback connections, to speed them up by bypassing the normal TCP stack. I found that the problem only occurred when connecting to hobbitd locally. I tried running "bb localhost hobbitdboard" once a second, and found it would often return no data, but if I ran the same command from another host against the hobbit server, it always returned correct data. This made me suspect TCP fusion, as I have run into issues with it before. It is best left disabled, in my opinion.
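The once-a-second test described above could be scripted roughly as follows. This is only a sketch, assuming bash is available (as the prompts later in the thread suggest). The function name `count_empty_replies` and the `PROBE_INTERVAL` variable are made up for illustration and are not part of hobbit.

```shell
#!/bin/bash
# Hypothetical helper (not part of hobbit): run a command N times and
# print how many of those runs produced no output at all.
# PROBE_INTERVAL is an assumed knob for the pause between runs (default 1s).
count_empty_replies() {
    local runs=$1 empty=0 i=0 out
    shift
    while [ "$i" -lt "$runs" ]; do
        # Only the output matters here, so the exit status is ignored.
        out=$("$@" 2>/dev/null) || true
        if [ -z "$out" ]; then
            empty=$((empty + 1))
        fi
        i=$((i + 1))
        # Pause between runs, but not after the last one.
        if [ "$i" -lt "$runs" ]; then
            sleep "${PROBE_INTERVAL:-1}"
        fi
    done
    echo "$empty"
}

# Example (run on the hobbit server itself):
#   count_empty_replies 60 bb localhost hobbitdboard
```

Running the same probe from another host, with the server's name in place of localhost, gives a count to compare against. If the symptom matches the one described here, the remote count should stay at zero while the local one does not.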
list Henrik Størner
On Thu, Apr 19, 2007 at 12:30:23PM +0100, Colin Spargo wrote:
TCP fusion is only used on local loopback connections to speed them up by bypassing the normal TCP stack. I found that the problem only occurred when connecting to hobbitd locally. I tried running "bb localhost hobbitdboard" once a second, and found it would often return no data, but if I ran the same command from another host to the hobbit server, it always returned correct data. This made me suspect TCP fusion, as I have run into issues with it before. It is best left disabled in my opinion.
Very interesting, thanks. There have been some reports about problems on Solaris 10, and at one point I was suspecting an OS bug. Seems I was right.
Henrik
list T.J. Yang
1. stop hobbit server
2. zero out the existing log file
3. apply the online fix
4. So far so good, I can confirm the status-board error message is now gone
;)
bash-3.00# grep -i status-board *.log
bash-3.00# pwd
/var/opt/hobbitserver42/log
bash-3.00# ls *.log
acknowledge.log cgierror.log hobbitlaunch.log rrd-data.log
bb-display.log clientdata.log hobbitlaunch.pid rrd-status.log
bb-network.log history.log hostdata.log
bb-retest.log hobbitd.log notifications.log
bbcombotest.log hobbitd.pid page.log
bash-3.00# cat /etc/release
Solaris 10 6/06 s10s_u2wos_09a SPARC
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 09 June 2006
bash-3.00#
Good job on tracking down the cause and providing the fix.
T.J. Yang
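The grep check in the message above could be wrapped in a small function for repeated use after the fix. This is a minimal sketch, assuming bash. The function name `count_status_board_errors` is hypothetical, and the log directory is passed as an argument rather than assumed.

```shell
#!/bin/bash
# Hypothetical helper: print the total number of lines in DIR/*.log that
# mention "status-board" (case-insensitive). Prints 0 when nothing
# matches or when the directory holds no .log files.
count_status_board_errors() {
    cat "$1"/*.log 2>/dev/null | grep -ic 'status-board'
}

# Example, using the log directory shown in the transcript:
#   count_status_board_errors /var/opt/hobbitserver42/log
```

A count that stays at 0 after the log files have been zeroed and hobbit restarted corresponds to the empty grep output shown in the transcript.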
list Colin Spargo
Good to hear! A trawl through SunSolve shows a few bugs that may have something to do with it:
Bug ID: 6458410
Synopsis: read() may spuriously return EAGAIN while unfusing a TCP connection
No patch for this yet, I believe.
list Colin Spargo
I missed out the other bug, which is a duplicate of the one in my previous message (6458410). It gives more detail about the issue:
"Bug ID: 6454060
Synopsis: select()/poll() indicate a socket as readable when there is no data available
Description: pollsys() (on behalf of select()) indicates available data on a socket but a following recvfrom() fails with EAGAIN:"
I'm not sure whether this applies here, but it is a similarly spurious kind of issue.