Xymon Mailing List Archive

performance tuning

6 messages in this thread

Jeff Newman · Tue, 4 Apr 2006 18:11:57 -0500
All,

I'm running Hobbit 4.1.2p1 on a Red Hat AS 4 server.

I currently have a number of server-side scripts that post-process
incoming test results.

My concern is a high number of port 1984 sockets in TIME_WAIT state
(upwards of 60 on the client at times). The clients are AIX boxes.

Is there any way to tune Red Hat/AIX/Hobbit to smooth this out? I know
that doesn't sound very technical, heh. Just looking for some ideas.

Thanks,
Jeff
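One way to put a number on the TIME_WAIT count is to parse `netstat -an` output. The sketch below is a hypothetical helper (`count_time_wait` is not part of Hobbit). It assumes the usual leading columns (proto, Recv-Q, Send-Q, local address, foreign address, state) and accepts two address styles: the AIX style, where the port follows the last dot (`10.0.0.9.34567`), and the Linux style, where it follows the last colon (`10.0.0.5:1984`). The sample text is made up for illustration.

```python
import re

HOBBIT_PORT = "1984"  # default Hobbit/BB network port

# Made-up sample mixing AIX-style and Linux-style address columns.
SAMPLE = """Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
tcp4       0      0  10.0.0.9.34567     10.0.0.5.1984      TIME_WAIT
tcp4       0      0  10.0.0.9.34568     10.0.0.5.1984      TIME_WAIT
tcp4       0      0  10.0.0.9.22        10.0.0.7.51000     ESTABLISHED
tcp        0      0 10.0.0.5:1984       10.0.0.9:34569     TIME_WAIT
"""


def count_time_wait(netstat_text: str, port: str = HOBBIT_PORT) -> int:
    """Count TCP sockets in TIME_WAIT whose local or foreign port equals `port`."""
    count = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines; expect at least 6 columns.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "TIME_WAIT":
            continue
        # The port is whatever follows the last '.' (AIX) or ':' (Linux).
        ports = {re.split(r"[:.]", addr)[-1] for addr in (fields[3], fields[4])}
        if port in ports:
            count += 1
    return count
```

On a live box you would pass it the text from `subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout`. Sampling once a second for a minute or two shows whether the count holds steady or moves with the test schedule.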
Henrik Størner · Wed, 5 Apr 2006 08:08:54 +0200
On Tue, Apr 04, 2006 at 06:11:57PM -0500, Jeff Newman wrote:
> I'm running Hobbit 4.1.2p1 on a Red Hat AS 4 server.
>
> I currently have a number of server-side scripts that post-process
> incoming test results.
>
> My concern is a high number of port 1984 sockets in TIME_WAIT state
> (upwards of 60 on the client at times). The clients are AIX boxes.

I don't really understand why you see so many TIME_WAIT connections on
the *client*. I could understand if they were on the Hobbit server, but
not on the client...

If there is a firewall between the clients and the server, it could be
that the firewall terminates the connection tracking state before the
final exchange of FIN/ACK packets has completed between the two sides.
But then I'd expect it to be in FIN_WAIT_[1,2] ...

What's TIME_WAIT anyway ... from
http://www.ssfnet.org/Exchange/tcp/tcpTutorialNotes.html:

  "The main thing to recognize about connection teardown is that a
   connection in the TIME_WAIT state cannot move to the CLOSED state until
   it has waited for two times the maximum amount of time an IP datagram
   might live in the Internet. The reason for this is that while the local
   side of the connection has sent an ACK in response to the other side's
   FIN segment, it does not know that the ACK was successfully delivered.
   As a consequence this other side might retransmit its FIN segment, and
   this second FIN segment might be delayed in the network. If the
   connection were allowed to move directly to the CLOSED state, then
   another pair of application processes might come along and open the same
   connection, and the delayed FIN segment from the earlier incarnation of
   the connection would immediately initiate the termination of the later
   incarnation of that connection."

So sockets should only stay in TIME_WAIT for a few minutes. I
believe a common value for "2 x max lifetime" is 2 minutes, but this is
somewhat OS-dependent. Since the Hobbit client only runs once every 5 minutes,
I don't understand why you have so many of these on the clients.


Henrik
Allan Spencer · Wed, 05 Apr 2006 16:24:24 +1000

I just checked a number of hosts here and none of them had anything like
this. About the most I saw was 6 TIME_WAITs, all of which only hung
around for 60 seconds, maybe 90 tops. Even on both our servers we don't have
any TIME_WAITs on the Hobbit port, only from the client that sends its
stats to the other server, and again those last a minute at most.

I'd suggest the same as what Henrik already has, so I won't say any more.

Allan
Jeff Newman · Wed, 5 Apr 2006 10:27:23 -0500
Hi,

No firewall in between.
Henrik Stoerner wrote:
> Since the Hobbit client only runs once every 5 minutes,
> I don't understand why you have many of these on the clients.

I have 7 hosts that I am doing performance testing on, and I have moved those
7 hosts to their own Hobbit server. Each of those 7 hosts has
between 4 and 6 tests running at 1-second intervals.

Isolating them on their own server keeps the load off our
main Hobbit server. As we have all discussed previously, there is a
difference between monitoring and capacity planning, so I am pushing the
limits of Hobbit with this.

So when the socket is in TIME_WAIT, is it waiting on some response from
the Hobbit server? Would increasing or decreasing the maximum lifespan
have any negative effects? I noticed that the alpha notification for the
new Hobbit mentioned performance enhancements. Do you think these would help?

Thanks,
Jeff

Charles Jones · Wed, 05 Apr 2006 08:37:38 -0700
Just to clarify: you are basically hammering the clients with short
interval tests in order to test performance? If so, why are you
wondering about the greater number of TIME_WAIT sockets? They are the
natural result of many connections. If the tests are being done locally
on the clients themselves, it sounds like they are doing TCP tests on the
external IPs?

Changing the OS value for TIME_WAIT is not a good idea, as it can
break many other things, as Henrik mentioned. Seeing the TIME_WAITs is
not really an issue... you should see how many there are on our web
servers when we do performance testing :)
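Little's law makes Charles's point concrete. On the side doing the active close, the average number of sockets sitting in TIME_WAIT is roughly the rate of new connections multiplied by the TIME_WAIT duration. The sketch makes a hypothetical assumption: every 1-second test result from Jeff's hosts goes over its own short-lived TCP connection to port 1984. The durations plugged in are examples, not measured values, and `expected_time_wait` is an illustrative name.

```python
def expected_time_wait(connections_per_second: float, time_wait_seconds: float) -> float:
    """Little's law estimate: mean sockets in TIME_WAIT = arrival rate x dwell time.

    Only meaningful for the side that performs the active close.
    """
    if connections_per_second < 0 or time_wait_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    return connections_per_second * time_wait_seconds


# Hypothetical scenarios: 4-6 one-second tests per host, versus a normal
# Hobbit client that connects once every 5 minutes.
for label, rate in [("4 tests/s", 4.0), ("6 tests/s", 6.0), ("every 5 min", 1 / 300)]:
    for tw in (15, 60, 120):
        estimate = expected_time_wait(rate, tw)
        print(f"{label:>12}, TIME_WAIT {tw:>3}s -> ~{estimate:.1f} sockets")
```

At 4 connections per second, even a 15-second TIME_WAIT gives about 60 sockets. A client that connects every 5 minutes averages well under one. This fits the view in the thread that the count follows the test interval and is not by itself a fault.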

Note: I'm also unsure how good a performance test you can do with
7 clients, when people are using Hobbit in everyday situations
with hundreds of clients running a dozen or so tests each.

Just some thoughts
-Charles

Henrik Størner · Wed, 5 Apr 2006 17:47:15 +0200
On Wed, Apr 05, 2006 at 10:27:23AM -0500, Jeff Newman wrote:
> So when the socket is in TIME_WAIT, is it waiting on some response from
> the hobbit server then? would increasing or decreasing the maximum life span
> have any negative effects?

The TIME_WAIT state is something that the operating system TCP/IP stack
handles all by itself. It is a safeguard used to prevent retransmitted
or duplicated packets related to a closed connection from interfering
with a new connection that just happens to use the same local port number.

According to
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.express.doc/info/exp/ae/tprf_tuneaix.html
you can check how long sockets currently remain in TIME_WAIT state on
AIX with "/usr/sbin/no -o tcp_timewait". By default, Linux has this set to
2 minutes, but I believe Solaris has a much higher value.
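Reading that setting can be scripted. The sketch below is hypothetical (the helper names are made up). It assumes two things: that `no -o tcp_timewait` prints a single `tcp_timewait = N` line, and that AIX counts this tunable in 15-second units. Please confirm both against the `no` documentation for your AIX level before relying on the conversion. For comparison, mainline Linux has traditionally exposed no equivalent tunable: its TIME_WAIT length is the compile-time constant TCP_TIMEWAIT_LEN (60 seconds). The `net.ipv4.tcp_fin_timeout` setting, often mistaken for it, governs FIN_WAIT_2.

```python
import re

# Assumption: AIX counts tcp_timewait in 15-second units.
AIX_TIMEWAIT_UNIT_SECONDS = 15


def parse_no_value(no_output: str, tunable: str = "tcp_timewait") -> int:
    """Extract the integer value of `tunable` from `no -o` style output ("name = value")."""
    pattern = rf"^\s*{re.escape(tunable)}\s*=\s*(\d+)\s*$"
    match = re.search(pattern, no_output, re.MULTILINE)
    if match is None:
        raise ValueError(f"{tunable} not found in: {no_output!r}")
    return int(match.group(1))


def aix_timewait_seconds(no_output: str) -> int:
    """Convert the tcp_timewait setting into seconds spent in TIME_WAIT."""
    return parse_no_value(no_output) * AIX_TIMEWAIT_UNIT_SECONDS
```

Under these assumptions, `aix_timewait_seconds("tcp_timewait = 1\n")` returns 15.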
> I noticed in the alpha notification for the
> new hobbit it mentioned performance enhancements. Would these help do you think?

Not with this problem. When the socket is in TIME_WAIT state, the
application (Hobbit, in this case) has completely closed the connection.
In fact, the process that created the socket may no longer exist. It
is up to the operating system to decide when the socket can be removed
from the TIME_WAIT state.


The performance enhancements I've put into Hobbit 4.2 could help with
one of the other issues that I think you reported: hobbitd_client
using a lot of CPU time.


Henrik