Xymon Mailing List Archive

AIX 5.2 clientdata problem (was: Note: Hobbit 4.2 release planned for Thursday Aug. 10.)

5 messages in this thread

list Chris Morris · Wed, 9 Aug 2006 14:51:47 +0100 ·
Hi Henrik,

I am running the snapshot from 7th August on this server, with the pscheck
patch applied.

You resolved the clientdata problem on 12th July. The hostdata module does
not terminate, but just keeps saying 

"Our child has failed and will not talk to us: Channel clichg, PID  38436"

My hobbitlaunch.cfg file contains the following lines :-

[hobbitd]
        HEARTBEAT
        ENVFILE /usr/local/hobbit/server/etc/hobbitserver.cfg
        CMD hobbitd --pidfile=$BBSERVERLOGS/hobbitd.pid --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk --checkpoint-interval=600 --log=$BBSERVERLOGS/hobbitd.log --store-clientlogs=!msgs --admin-senders=127.0.0.1,$BBSERVERIP
.
.
# Note: The --store-clientlogs option for the [hobbitd] provides control over
#       which status-changes will cause a client message to be stored.
[hostdata]
        ENVFILE /usr/local/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD hobbitd_channel --channel=clichg --debug --log=$BBSERVERLOGS/hostdata.log --logdir=$BBVAR/hostdata hobbitd_hostdata
.
.

The hobbitlaunch.log file :-

2006-08-09 10:06:29 Loading tasklist configuration from
/usr/local/hobbit/server/etc/hobbitlaunch.cfg
2006-08-09 10:06:47 hobbitlaunch starting
2006-08-09 10:06:47 Loading tasklist configuration from
/usr/local/hobbit/server/etc/hobbitlaunch.cfg
2006-08-09 10:06:47 Loading hostnames
2006-08-09 10:06:47 Loading saved state
2006-08-09 10:06:47 Setting up network listener on 0.0.0.0:1984
2006-08-09 10:06:47 Setting up signal handlers
2006-08-09 10:06:47 Setting up hobbitd channels
2006-08-09 10:06:47 Setting up logfiles

hobbit) ipcs -m
IPC status from /dev/mem as of Wed  9 Aug 14:32:05 2006
T           ID        KEY        MODE       OWNER    GROUP
Shared Memory:
m             0 0x580060c4 --rw-rw-rw-     root   system
m    131073 0x52e74b4f --rw-rw-rw-   imnadm   imnadm
m    131074 0x9308e451 --rw-rw-rw-   imnadm   imnadm
m    131075 0xe4663d62 --rw-rw-rw-   imnadm   imnadm
m             4 0xc76283cc --rw-rw-rw-   imnadm   imnadm
m             5 0x298ee665 --rw-rw-rw-   imnadm   imnadm
m             6 0xffffffff --rw-rw----     root   system
m             7 0x0d01aae1 --rw-rw-rw-     root   system
m   3407880 0x0205b000 --rw-------   hobbit   hobbit
m   4587529 0x0305b000 --rw-------   hobbit   hobbit
m  13107210 0x4100a990 --rw-rw-rw-   patrol      adm
m   4718603 0x0705b000 --rw-------   hobbit   hobbit
m   4718604 0x0405b000 --rw-------   hobbit   hobbit
m   4587533 0x0605b000 --rw-------   hobbit   hobbit
m   4718606 0x0105b000 --rw-------   hobbit   hobbit
m   4849679 0x0505b000 --rw-------   hobbit   hobbit
m   3145744 0x0805b000 --rw-------   hobbit   hobbit
m   2228241 0x0100a9a0 --rw-rw-rw-   patrol      adm
m   1310738 0x0100a98b --rw-rw-rw-   patrol      adm
m  32243731 0x4100a98c --rw-rw-rw-   patrol      adm
m   9175060 0x4100a98d --rw-rw-rw-   patrol      adm
m   1179669 0x4100a98f --rw-rw-rw-   patrol      adm
m   2490390 0x4100a99a --rw-rw-rw-   patrol      adm
m   1703959 0x4100a999 --rw-rw-rw-   patrol      adm
m   1048600 0x4100a99b --rw-rw-rw-   patrol      adm
m  13369369 0x4100a99f --rw-rw-rw-   patrol      adm
m  11403290 0x4100a99d --rw-rw-rw-   patrol      adm
m  12845083 0x4100a99e --rw-rw-rw-   patrol      adm
m  26083356 0x4100a994 --rw-rw-rw-   patrol      adm

The settings in hobbitserver.cfg are :

MAXMSG_CLIENT="768"
MAXMSG_STATUS="768"
MAXMSG_STACHG="768"

Regards,

Chris

-----Original Message-----
From:	user-ce4a2c883f75@xymon.invalid [SMTP:user-ce4a2c883f75@xymon.invalid]
Sent:	Wednesday, August 09, 2006 1:51 PM
To:	user-ae9b8668bcde@xymon.invalid
Subject:	[hobbit] AIX 5.2 clientdata problem (was: Note: Hobbit 4.2 release planned for Thursday Aug. 10.)

(changing the subject)

Hi Chris,

it's quite a while ago - June 8, if my archive search is correct.
http://osiris.hswn.dk/hobbiton/2006/06/msg00203.html

Have you tried running a current snapshot on this server? It looks 
like it might be the same problem that you and I worked on for 
quite some time between the beta-release and the RC-release - 
the one where the clientdata task terminated without any apparent 
reason. This mail started it: 
http://osiris.hswn.dk/hobbiton/2006/06/msg00279.html

If it's still an issue with the current snapshot, I'd like to see
the hobbitlaunch.log file also. And could you send me the output
from the "ipcs -m" command, please?


Regards,
Henrik


On Wed, Aug 09, 2006 at 10:25:31AM +0100, Morris, Chris (Shared Services) wrote:
Hi Henrik,

A problem with hostdata that was raised some little while ago doesn't seem
to be resolved or listed as outstanding.

When running the server on an AIX 5.2 system with the hostdata task enabled,
client data is not written to $BBVAR/hostdata and the hostdata.log contains
the following :-

2006-08-09 09:59:50 Our child has failed and will not talk to us: Channel clichg, PID 17648
2006-08-09 10:06:29 Setting up clichg channel (id=8)
2006-08-09 10:06:29 calling ftok('/usr/local/hobbit/server',8)
2006-08-09 10:06:29 ftok() returns: 0x805B000
2006-08-09 10:06:29 shmget() returns: 0x46000E
2006-08-09 10:06:29 Setting up clichg channel (id=8)
2006-08-09 10:06:29 calling ftok('/usr/local/hobbit/server',8)
2006-08-09 10:06:29 ftok() returns: 0x805B000
2006-08-09 10:06:29 shmget() returns: 0x46000E
2006-08-09 10:06:37 Semaphore wait aborted: Interrupted system call
2006-08-09 10:06:37 Semaphore wait aborted: Interrupted system call
2006-08-09 10:06:52 Setting up clichg channel (id=8)
2006-08-09 10:06:52 calling ftok('/usr/local/hobbit/server',8)
2006-08-09 10:06:52 ftok() returns: 0x805B000
2006-08-09 10:06:52 shmget() returns: 0x300010
2006-08-09 10:06:52 Setting up clichg channel (id=8)
2006-08-09 10:06:52 calling ftok('/usr/local/hobbit/server',8)
2006-08-09 10:06:52 ftok() returns: 0x805B000
2006-08-09 10:06:52 shmget() returns: 0x300010
2006-08-09 10:11:48 Our child has failed and will not talk to us: Channel clichg, PID 38436
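
The ftok()/shmget() pairs in the log above follow the usual System V pattern: a key is derived from a path plus a small id, and the key is then used to create or look up a shared memory segment. As a side note, the key 0x805B000 for id=8 fits the series 0x0105b000 through 0x0805b000 owned by hobbit in the "ipcs -m" listing, and the second shmget() result 0x300010 is 3145744 in decimal, the segment ID listed next to key 0x0805b000. The following is only a minimal sketch of that pattern, not the actual Hobbit code; the function names, the segment size and the 0600 mode (chosen to mirror the --rw------- mode in the listing) are assumptions made for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: derive a System V IPC key from an existing path
 * and a channel id. Returns (key_t)-1 if ftok() fails, e.g. because the
 * path does not exist.
 */
key_t channel_key(const char *path, int channel_id)
{
    return ftok(path, channel_id);
}

/*
 * Hypothetical helper: create the shared memory segment for a channel,
 * or look it up if it already exists. Returns the segment id, or -1 on
 * failure (errno is left as set by ftok() or shmget()).
 */
int open_channel_segment(const char *path, int channel_id, size_t size)
{
    key_t key = channel_key(path, channel_id);
    if (key == (key_t)-1) return -1;

    /* Mode 0600 is an assumption: read/write for the owner only. */
    return shmget(key, size, IPC_CREAT | 0600);
}

/* Mark a segment for removal once no process is attached to it. */
int remove_channel_segment(int shmid)
{
    return shmctl(shmid, IPC_RMID, NULL);
}
```

Calling open_channel_segment() twice with the same path and id should return the same segment id, which is consistent with each pair of "Setting up clichg channel" entries in the log reporting identical shmget() values.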


Looking forward to the 4.2 release - great work.


Regards,

Chris
****************************************************************************
The information contained in this email is intended only for the use of the intended recipient at the email address to which it has been addressed. If the reader of this message is not an intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination or copying of the message or associated attachments is strictly prohibited.

If you have received this email in error, please contact the sender by return email or call 01793 877777 and ask for the sender and then delete it immediately from your system.Please note that neither RWE npower nor the sender accepts any responsibility for viruses and it is your responsibility to scan attachments (if any).
*****************************************************************************
list Henrik Størner · Wed, 9 Aug 2006 16:02:52 +0200 ·
Hi Chris,

On Wed, Aug 09, 2006 at 02:51:47PM +0100, Morris, Chris (Shared Services) wrote:
You resolved the clientdata problem on 12th July. The hostdata module does
not terminate, but just keeps saying 

"Our child has failed and will not talk to us: Channel clichg, PID  38436"

Could you try changing hobbitd/hobbitd_channel.c, around line 347 or so, from 
  errprintf("Our child has failed and will not talk to us: Channel %s, PID %d\n",
to
  errprintf("Our child has failed and will not talk to us: Channel %s, PID %d, cause %s\n",

This adds some additional info about what went wrong when we tried to
push a message to the hostdata module.

I *think* this might be a harmless problem, but I'd like to see what the
actual error message is.
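
For readers following along: the extra "cause %s" needs a matching string argument, and the conventional source for it is strerror(errno) taken right after the failed call. The sketch below illustrates that idea only. It is not the code from hobbitd_channel.c; push_message(), its parameters, and the choice to report a short write as EIO are all assumptions, and the message is formatted into a buffer instead of going through errprintf().

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to push a message to the child's descriptor.
 * On failure, format a log line that carries the errno text as "cause".
 * Returns 0 on success, -1 on failure (with the log line left in buf).
 */
int push_message(int fd, const char *channel, long childpid,
                 const char *msg, char *buf, size_t bufsz)
{
    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);
    int cause;

    if (n >= 0 && (size_t)n == len) return 0;

    /* Save errno at once: later library calls may overwrite it.
     * A short write sets no errno, so it is reported as EIO here
     * (an assumption made for this sketch). */
    cause = (n < 0) ? errno : EIO;

    snprintf(buf, bufsz,
             "Our child has failed and will not talk to us: "
             "Channel %s, PID %ld, cause %s\n",
             channel, childpid, strerror(cause));
    return -1;
}
```

The exact wording of the cause is whatever the platform's strerror() returns for the error code, so the same failure can read differently from one operating system to another.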


Regards,
Henrik
list Chris Morris · Wed, 9 Aug 2006 15:17:30 +0100 ·
Hi Henrik,

Our child has failed and will not talk to us: Channel clichg, PID 27932, cause Bad file number

Regards,
Chris

list Henrik Størner · Wed, 9 Aug 2006 20:40:48 +0200 ·
On Wed, Aug 09, 2006 at 03:17:30PM +0100, Morris, Chris (Shared Services) wrote:
Hi Henrik,

Our child has failed and will not talk to us: Channel clichg, PID 27932, cause Bad file number

"Bad file number" ?! But that file handle was given to us just a few
lines further up in the code without any errors being flagged.
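
"Bad file number" is presumably this system's text for EBADF, the error returned when a descriptor is not open (or not open for the requested operation) in the calling process. One way to narrow such a case down is to test the descriptor just before the write. The helper below is a hypothetical diagnostic, not part of Hobbit: fcntl() with F_GETFD fails with EBADF exactly when the descriptor is not open.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical diagnostic: report whether fd currently refers to an open
 * descriptor in this process. Returns 1 if it is open, 0 if it is not.
 */
int fd_is_open(int fd)
{
    /* F_GETFD only reads the descriptor flags; it has no side effects. */
    if (fcntl(fd, F_GETFD) != -1) return 1;
    return errno != EBADF;
}
```

If such a check reported 0 right before the failing write, the descriptor would have been closed somewhere between the point where it was obtained and the point where it was used.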

Are there any indications that any of the hobbitd_* modules have
crashed? Core files in the ~hobbit/server/tmp/ directory, red/purple
hobbitd_something statuses on the webserver? I think you would have
said so if there were any.

Do any of the other hobbitd modules (i.e. any of the other logfiles)
report anything similar? Or is it only that one logfile?

I haven't heard about anything like this from others, so I'm reasonably
certain this is not a general problem.


Regards,
Henrik
list Chris Morris · Thu, 10 Aug 2006 10:09:55 +0100 ·
Hi Henrik,

No other indications of any problems on the webpage or in other logfiles and
no core files produced.

I think this may be a problem on this specific box - I'll do some more
"testing".

Thanks,

Chris