Xymon Mailing List Archive

weird problem.

4 messages in this thread

list Neil Simmonds · Tue, 26 Feb 2013 08:47:14 -0000 ·
Hi all,

 
I've got a strange problem that I'm trying to diagnose and would
appreciate any help you can give.

 
We have 2 recently set up AIX servers running the hobbit client. We
have 62 other AIX servers with the same client running absolutely fine.

 
The problem is that the client data is getting cut off mid-stream. It's
always in the ps output. I've checked the MAX settings and they're all
OK; in fact, we have other clients sending data files larger than these
that work fine. I've checked the data on the client and it's complete,
but if I look in /xymon/data/hostdata on the server the data is almost
always truncated to 69518 bytes. Occasionally a full message (approx.
93k) gets through.

 
There are no messages regarding truncated data in the server logs and
the only message I can find on the client is the following,

 
2013-02-26 08:41:21 Write error while sending message to bbd at xymonserver:1984

2013-02-26 08:41:21 Whoops ! bb failed to send message - write error 
 
I've googled this extensively and can't find anything that seems
relevant to our problem. 
 
Regards,

 
Neil Simmonds

Senior Operations Analyst (Operations Support Group)
Express Gifts Limited

Express House

Clayton Business Park

Accrington

Lancashire

BB5 5JY T: 01254 303092 | E: user-8188d25e65e4@xymon.invalid 
 
 
list Adam Goryachev · Tue, 26 Feb 2013 20:26:47 +1100 ·

I get this from time to time, primarily when the xymon host has very
limited bandwidth. It seems to me that Xymon will accept whatever data
has been received prior to the connection being broken/interrupted, and
pretend it is complete (as opposed to discarding it).

If this is happening frequently or all the time, I would suspect
firewall settings and/or MTU issues (if it is packet-size related).
Check that you are not blocking all ICMP and that path MTU discovery is
working properly, that no firewall is timing out or blocking the
connection, and that there is enough bandwidth for the messages.

A tcpdump at both client and server could be educational; you could
load the captures into Wireshark for analysis.
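
As a rough sketch of such a capture, something like the following could be
run on the client and, with PEER changed to the client's address, on the
server. The interface name en0, the output path and the capture_filter
helper are assumptions for illustration; port 1984 is taken from the client
error message. The script only prints the tcpdump command so that it can be
reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical values: adjust all three for your site.
PEER=${PEER:-xymonserver}          # on the server, set this to the client's name or IP
IFACE=${IFACE:-en0}                # assumed interface name; yours may differ
OUT=${OUT:-/tmp/xymon-1984.pcap}   # assumed output file for the capture

# Build a capture filter that keeps only traffic to or from the peer
# on the Xymon port seen in the client error message (1984).
capture_filter() {
    printf 'host %s and tcp port 1984\n' "$1"
}

# Print the command instead of running it, so it can be checked first.
# -s 0 asks for whole packets, -w writes a file Wireshark can open.
echo tcpdump -i "$IFACE" -s 0 -w "$OUT" "$(capture_filter "$PEER")"
```

Capturing on both ends while the client sends a report should show which
side closes or resets the connection first.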

PS, I wonder when we will get compression, and/or encryption for the
status messages? Both would assist in making sure the complete message
arrives un-altered...

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au
list David Baldwin · Wed, 27 Feb 2013 10:47:14 +1100 ·
quoted from Adam Goryachev
On 26/02/13 8:26 PM, Adam Goryachev wrote:
I get this from time to time, primarily when the xymon host has very
limited bandwidth. It seems to me that Xymon will accept whatever data
has been received prior to the connection being broken/interrupted,
and pretend it is complete (as opposed to discarding it).

The problem is that there isn't a well-defined "end of message" on a
standard client report. The message starts with "client HOSTNAME.OS
CLASS" line then consists of a bunch of sections starting with
"[section]" lines followed by lines of text. When the client has
finished sending its message it just does a shutdown on the write socket
and reads any returned data until EOF. That's it. The server probably
doesn't care if the client even reads the data it sends back, and has no
way of communicating with it anyway.

So if the client connection to the server is interrupted mid-stream, the
server quite probably just handles it as a socket shutdown and accepts
whatever has been received so far as the whole message.
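
To see where a stored report was cut off, one option is to list the
sections that actually arrived. This is a minimal sketch based only on the
format described above (a "client" line followed by "[section]" headers);
the list_sections name is made up for illustration, and the location of
the stored message depends on your installation.

```shell
#!/bin/sh
# list_sections FILE
# Print every "[section]" header found in a saved client message,
# followed by the number of text lines that belong to that section.
list_sections() {
    awk '
        # A header line looks like "[name]"; report the previous section first.
        /^\[[^]]*\]$/ {
            if (name != "") print name, lines
            name = $0
            lines = 0
            next
        }
        # Any other line after the first header belongs to the current section.
        name != "" { lines++ }
        # Report the final (possibly truncated) section.
        END { if (name != "") print name, lines }
    ' "$1"
}
```

Running list_sections against the client's own copy of a report and
against the copy saved under /xymon/data/hostdata on the server, then
comparing the last lines of output, shows which section the server stopped
receiving in and how many lines of it made it across.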
quoted from Adam Goryachev
PS, I wonder when we will get compression, and/or encryption for the
status messages? Both would assist in making sure the complete message
arrives un-altered...

Indeed. There are other ways of delivering/fetching messages - maybe
worth exploring for more reliable transmission.

David.

-- 
David Baldwin - Senior Systems Administrator (Datacentres + Networks)
Information and Communication Technology Services
Australian Sports Commission          http://ausport.gov.au
Tel 02 62147830 Fax 02 62141830       PO Box 176 Belconnen ACT 2616
user-cbbf693f2c89@xymon.invalid          Leverrier Street Bruce ACT 2617


list Jeremy Laidman · Wed, 27 Feb 2013 16:29:07 +1100 ·
On 26 February 2013 19:47, Neil Simmonds <user-8188d25e65e4@xymon.invalid> wrote:

There are no messages regarding truncated data in the server logs and the
only message I can find on the client is the following,

2013-02-26 08:41:21 Write error while sending message to bbd at xymonserver:1984

2013-02-26 08:41:21 Whoops ! bb failed to send message - write error

Perhaps try running the client script manually like so:

$ cd ~xymon/client/bin
$ sudo -u xymon ./xymoncmd
$ time ./xymonclient.sh

This might show an error you didn't see before.  At the very least, it will
give you an idea how long it takes to run/fail.  You might also run it as:

$ sh -x ./xymonclient.sh

Then see what takes all the time.

Perhaps you could run it through truss to see what system calls are being
run when the connection closes.  Like so:

$ truss -f ./xymonclient.sh

It's likely to be caused by the transfer taking too long, either
because the data is slow to transmit (e.g. a duplex mismatch causing
network errors) or because there's too much data to send.  You could try
increasing the timeout value for xymond on the server by adding "--timeout
N" (from 5 to 60) in tasks.cfg.  The man page for xymond says the default
is 10 seconds, but the code for v4.3.10 shows 30 seconds.
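
As an illustration only, the change would go on the CMD line of the
[xymond] section in tasks.cfg. The existing options are elided here rather
than guessed, the value 60 is an arbitrary example, and the "--timeout=N"
spelling is assumed from the xymond man page; verify it against your
version before relying on it.

```
[xymond]
	...
	# Existing options are elided; only --timeout=60 is new (example value).
	CMD xymond ... --timeout=60
```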

I don't think the server normally logs a message if it times out a
connection in this way.  However if you turn on debug (by adding "--debug"
in tasks.cfg) then it should log "No command for update_statistics" when
this happens.

J