Xymon Mailing List Archive

bbgen frequent yellow alerts - hobbitd problem?

10 messages in this thread

list Mr-Pope · Mon, 6 Nov 2006 07:35:27 -0800 ·
Hi,

We are running a new installation of Hobbit 4.2 on Solaris 10 running
in a non-global zone.  Server is a v240 but I don't think that matters
here.

The problem here is that our bbgen status turns yellow with fairly
high frequency, sometimes multiple times an hour, at (what seem like)
random intervals.  In the yellow alert bbgen reports:
"hobbitd status-board not available"

During this time the hobbitd daemon is still running and the next time
that bbgen runs the alert (usually) turns green.  I've tested this by
running bbgen every second, every 15 seconds, and every minute.  The
same is also true if I run bbgen by hand.

During the 'yellow alert' time window the bb2.html gets updated with
"All Monitored Systems OK"
When all monitored systems are NOT OK.  When the status turns green
again this page reflects the correct status for the non-green systems.

Below is the output from some commands/logs.  These logs don't really
seem to help, so let me know if there is anything else that I can send
along to debug this issue.

Any help is appreciated - we're nearing the point of frustration where
we may have to pull the plug on Hobbit and go back to our old BB
installation.

Thanks in advance.
-Jon

(logs below)


hobbitd log from --debug.  Far fewer entries here than normal.
2006-11-03 10:54:00 -> do_message/1 (12 bytes): hobbitdboard
2006-11-03 10:54:00 -> update_statistics
2006-11-03 10:54:00 <- update_statistics
2006-11-03 10:54:00 -> oksender
2006-11-03 10:54:00 <- oksender(1-a)
2006-11-03 10:54:00 -> setup_filter: hobbitdboard
2006-11-03 10:54:00 <- setup_filter: hobbitdboard
2006-11-03 10:54:00 <- do_message/1
2006-11-03 10:54:01 -> do_message/1 (0 bytes):
2006-11-03 10:54:01 -> update_statistics
2006-11-03 10:54:01 <- update_statistics
2006-11-03 10:54:01 <- do_message/1


$BB --debug $BBDISP "hobbitdboard"
(with no --debug on a 'failure' I get no output.  I'm assuming this has
the same cause as the bbgen yellow alert)

2006-11-03 10:54:01 Transport setup is:
2006-11-03 10:54:01 bbdportnumber = 1984
2006-11-03 10:54:01 bbdispproxyhost = NONE
2006-11-03 10:54:01 bbdispproxyport = 0
2006-11-03 10:54:01 Recipient listed as '10.xxx.xxx.xxx'
2006-11-03 10:54:01 Standard BB protocol on port 1984
2006-11-03 10:54:01 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-03 10:54:01 Connect status is 0
2006-11-03 10:54:01 Sent 12 bytes
2006-11-03 10:54:01 Closing connection


bbgen --debug --report (this one turned bbgen yellow/unavailable.
Note the quick disconnect.)
2006-11-03 09:51:03 load_state()
2006-11-03 09:51:03 Transport setup is:
2006-11-03 09:51:03 bbdportnumber = 1984
2006-11-03 09:51:03 bbdispproxyhost = NONE
2006-11-03 09:51:03 bbdispproxyport = 0
2006-11-03 09:51:03 Recipient listed as '10.xxx.xxx.xxx'
2006-11-03 09:51:03 Standard BB protocol on port 1984
2006-11-03 09:51:03 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-03 09:51:03 Connect status is 0
2006-11-03 09:51:03 Sent 126 bytes
2006-11-03 09:51:03 Closing connection

bbgen --debug --report (this one worked fine)
2006-11-03 09:54:00 load_state()
2006-11-03 09:54:00 Transport setup is:
2006-11-03 09:54:00 bbdportnumber = 1984
2006-11-03 09:54:00 bbdispproxyhost = NONE
2006-11-03 09:54:00 bbdispproxyport = 0
2006-11-03 09:54:00 Recipient listed as '10.xxx.xxx.xxx'
2006-11-03 09:54:00 Standard BB protocol on port 1984
2006-11-03 09:54:00 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-03 09:54:00 Connect status is 0
2006-11-03 09:54:00 Sent 126 bytes
2006-11-03 09:54:00 Read 16384 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 1 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 24578 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 32767 bytes
2006-11-03 09:54:00 Read 24578 bytes
2006-11-03 09:54:00 Read 16503 bytes
2006-11-03 09:54:00 Closing connection
list Henrik Størner · Mon, 6 Nov 2006 17:29:59 +0100 ·
quoted from Mr-Pope
On Mon, Nov 06, 2006 at 07:35:27AM -0800, Mr-Pope wrote:
We are running a new installation of Hobbit 4.2 on Solaris 10 running
in a non-global zone.  Server is a v240 but I don't think that matters
here.

The problem here is that our bbgen status turns yellow with fairly
high frequency, sometimes multiple times an hour, at (what seem like)
random intervals.  In the yellow alert bbgen reports:
"hobbitd status-board not available"
The reports I've had of this only have one thing in common: They all
happen on Solaris 10. So I'm beginning to suspect that maybe Solaris
doesn't work quite the way other systems do.

Or perhaps there is a bug, and something special in Solaris triggers it.
quoted from Mr-Pope
Below are the output from some commands/logs.  These logs don't really
seem to help, so let me know if there is anything else that I can send
along to debug this issue.
$BB --debug $BBDISP "hobbitdboard"
(with no --debug on a 'failure' I get no output.  I'm assuming this is
the same cause of the bbgen yellow alert)
Yes.
quoted from Mr-Pope
bbgen --debug --report (this one turned bbgen yellow/unavailable.
Note the quick disconnect.)
2006-11-03 09:51:03 load_state()
2006-11-03 09:51:03 Transport setup is:
2006-11-03 09:51:03 bbdportnumber = 1984
2006-11-03 09:51:03 bbdispproxyhost = NONE
2006-11-03 09:51:03 bbdispproxyport = 0
2006-11-03 09:51:03 Recipient listed as '10.xxx.xxx.xxx'
2006-11-03 09:51:03 Standard BB protocol on port 1984
2006-11-03 09:51:03 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-03 09:51:03 Connect status is 0
2006-11-03 09:51:03 Sent 126 bytes
2006-11-03 09:51:03 Closing connection
Interesting.

Since it seems that this bites you more than most others, I'd like you
to do a couple of things for me to figure out what is going on. I need
you to add a couple of debugging lines to Hobbit.

First, in the bbdisplay/loaddata.c file, around line 436 you'll find the
code that prints out the "hobbitd status-board not available" message.
It looks like this:
        errprintf("hobbitd status-board not available\n");
I want you to change that to
        errprintf("hobbitd status-board not available, code %d\n", hobbitdresult);


Next, in the lib/sendmsg.c file, around line 340, is the code that
receives data from Hobbit. You'll find these lines:

	n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
	if (n > 0) {

I'd like you to add 8 lines between these two:

	n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
	if (n < 0) {
		dbgprintf("recv() returned error: %s\n", strerror(errno));
		if (errno == EAGAIN) continue; 
	}
	if (n == 0) {
		dbgprintf("recv() gave us 0 bytes\n");
		continue;
	}
	if (n > 0) {

(it isn't the prettiest of programming, but it does the job for now).


After making these two changes, run "make clean; make" and copy the
bbdisplay/bbgen binary into your ~hobbit/server/bin/ directory. Let
Hobbit run as normal (with --debug on the bbgen command) and when it
fails I am very interested to see what's in the logfile.


Regards,
Henrik
list Henrik Størner · Mon, 6 Nov 2006 17:33:19 +0100 ·
Oops - one line too many among those extra ones I sent you:
quoted from Henrik Størner

On Mon, Nov 06, 2006 at 05:29:59PM +0100, Henrik Stoerner wrote:
I'd like you to add 8 lines between these two:

	n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);
	if (n < 0) {
		dbgprintf("recv() returned error: %s\n", strerror(errno));
		if (errno == EAGAIN) continue; 
	}
	if (n == 0) {
		dbgprintf("recv() gave us 0 bytes\n");
		continue;
                ^^^^^^^^^ don't put this "continue" line in there.


Regards,
Henrik
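
For reference, the debugging change with the correction above applied (no
"continue" when recv() returns 0) can be sketched as a standalone function.
This is only an illustration, not the actual lib/sendmsg.c code: the name
read_reply() and its output buffer handling are assumptions, fprintf() to
stderr stands in for dbgprintf(), and retrying on EINTR is an addition that
the original lines do not have.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of Hobbit): read from sockfd until the
 * peer closes the connection, logging every recv() outcome.
 * Stores at most outsz-1 bytes in out and NUL-terminates it.
 * Returns the number of bytes stored, or -1 on a hard error. */
long read_reply(int sockfd, char *out, size_t outsz)
{
	char recvbuf[32768];
	size_t total = 0;

	assert(out != NULL && outsz > 0);

	for (;;) {
		ssize_t n = recv(sockfd, recvbuf, sizeof(recvbuf)-1, 0);

		if (n < 0) {
			fprintf(stderr, "recv() returned error: %s\n", strerror(errno));
			/* Transient conditions: try again (EINTR is an added assumption) */
			if (errno == EAGAIN || errno == EINTR) continue;
			return -1;
		}
		if (n == 0) {
			/* Peer closed the connection: stop reading, do not "continue" */
			fprintf(stderr, "recv() gave us 0 bytes\n");
			break;
		}

		fprintf(stderr, "Read %ld bytes\n", (long)n);

		/* Keep what fits, always leaving room for the terminating NUL */
		size_t room = outsz - 1 - total;
		size_t take = ((size_t)n < room) ? (size_t)n : room;
		memcpy(out + total, recvbuf, take);
		total += take;
	}

	out[total] = '\0';
	return (long)total;
}
```

With a helper like this, an immediate "recv() gave us 0 bytes" followed by a
return value of 0 would mean the other end closed the connection before
sending any reply, whereas -1 would point to a socket error.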
list Matthew G Armstrong · Mon, 6 Nov 2006 16:39:22 +0000 ·
Are there many problems encountered when using Solaris with Hobbit?  I was going to base my design on Solaris, but I am now wondering if I should steer more towards an AIX solution.

This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.


list Greg L Hubbard · Mon, 6 Nov 2006 10:44:20 -0600 ·
Solaris works great once you get past the linker problems.  That is
well-known territory if you read the hints.
list Henrik Størner · Mon, 6 Nov 2006 17:49:47 +0100 ·
Hi Matt,

(always good to see a fellow csc'er on the list :-))
quoted from Matthew G Armstrong

On Mon, Nov 06, 2006 at 04:39:22PM +0000, Matthew G Armstrong wrote:
Are there many problems encountered when using Solaris with Hobbit?  I
was going to base my design on Solaris, but I am now wondering if I should
steer more towards an AIX solution.
I don't think Solaris is a problem. Most likely it's just my code which
doesn't work quite the way it should. I believe there are lots of Hobbit
users out there running on Solaris.


Regards,
Henrik
list Mr-Pope · Mon, 6 Nov 2006 10:27:24 -0800 ·
Thanks, Henrik.

I made the changes that you suggested and copied bbgen to the
appropriate directory.  When we get yellow alerts the following is the
new "Error output":

hobbitd status-board not available, code 0

In the log I get the following messages during the connection process:

*cut*

2006-11-06 09:59:57 load_state()
2006-11-06 09:59:57 Transport setup is:
2006-11-06 09:59:57 bbdportnumber = 1984
2006-11-06 09:59:57 bbdispproxyhost = NONE
2006-11-06 09:59:57 bbdispproxyport = 0
2006-11-06 09:59:57 Recipient listed as '10.xxx.xxx.xxx'
2006-11-06 09:59:57 Standard BB protocol on port 1984
2006-11-06 09:59:57 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-06 09:59:57 Connect status is 0
2006-11-06 09:59:57 Sent 126 bytes
2006-11-06 09:59:57 recv() gave us 0 bytes
2006-11-06 09:59:57 Closing connection

*cut*

2006-11-06 09:59:58 Recipient listed as '10.xxx.xxx.xxx'
2006-11-06 09:59:58 Standard BB protocol on port 1984
2006-11-06 09:59:58 Will connect to address 10.xxx.xxx.xxx port 1984
2006-11-06 09:59:58 Connect status is 0
2006-11-06 09:59:58 Sent 1384 bytes
2006-11-06 09:59:58 Closing connection
2006-11-06 09:59:58 1 status messages merged into 2 transmissions

*end*

I did not see the impact of the changes to sendmsg.c anywhere in the
debug output.

-Jon
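
The "recv() gave us 0 bytes" line in the log above matches the dbgprintf()
text added to lib/sendmsg.c. On a stream socket, recv() returning 0 means
the other end closed the connection in an orderly way without sending any
(more) data. The minimal sketch below shows that behaviour in isolation. It
does not explain why hobbitd would close early: the socketpair() only stands
in for the TCP connection on port 1984, and the function name
recv_after_silent_close() is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustration only: sv[0] plays the client (bbgen), sv[1] plays the
 * server. The server reads the request and closes without replying.
 * Returns what the client's recv() reports afterwards. */
long recv_after_silent_close(void)
{
	int sv[2];
	char buf[128];
	ssize_t n;
	int rc;

	/* A connected pair of local stream sockets replaces the TCP link */
	rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
	assert(rc == 0);

	/* Client sends its 12-byte request */
	n = send(sv[0], "hobbitdboard", 12, 0);
	assert(n == 12);

	/* Server consumes the request, then closes without sending a reply */
	n = recv(sv[1], buf, sizeof(buf)-1, 0);
	assert(n > 0);
	close(sv[1]);

	/* Client now tries to read the reply */
	n = recv(sv[0], buf, sizeof(buf)-1, 0);
	if (n == 0) fprintf(stderr, "recv() gave us 0 bytes\n");

	close(sv[0]);
	return (long)n;
}
```

Calling recv_after_silent_close() is expected to return 0 rather than -1,
i.e. an empty reply is reported as end-of-stream and not as a socket error.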

list Rdeal · Mon, 06 Nov 2006 13:33:37 -0500 ·
I have seen it with Solaris 10 SPARC.  Solaris 9 SPARC was fine.

From: Mike Rowell <user-63f3e97eb1de@xymon.invalid>
Reply-To: <user-ae9b8668bcde@xymon.invalid>
Date: Mon, 6 Nov 2006 18:33:41 -0000
To: <user-ae9b8668bcde@xymon.invalid>
Conversation: [hobbit] bbgen frequent yellow alerts - hobbitd problem?
Subject: RE: [hobbit] bbgen frequent yellow alerts - hobbitd problem?


Henrik,

It might be worth checking to make sure these problems are only on
Solaris 10 x86 as that is the only architecture I've seen this problem
on, sparc seems fine so might help you in narrowing down the problem.

Regards,

Mike Rowell

list Mike Rowell · Mon, 6 Nov 2006 18:33:41 -0000 ·
Henrik,

It might be worth checking whether these problems occur only on
Solaris 10 x86, as that is the only architecture I've seen this problem
on. SPARC seems fine, so that might help you narrow down the problem.

Regards,

Mike Rowell
list Mr-Pope · Mon, 6 Nov 2006 10:36:22 -0800 ·
Mike,

This problem is happening on a sparc box.

-Jon