XYMON Proxy Issue
list Gautier Begin
Hello,
I'm using Xymon 4.3.12 under Solaris 10.5.
Since I put a proxy between my Xymon server and ~1000 agents, I have the
following problems:
- Many tests are in purple state, especially those from BBWin agents.
- Even when a test is green, nothing is displayed in the RRD graphs.
- I find timeout messages in the logs (agents, server, and xymonnet on the
proxy):
2014-02-17 15:58:32 -> Recipient '10.195.243.205', timeout 15
2014-02-17 15:58:32 -> 1st line: 'combo'
2014-02-17 16:02:53 Whoops ! Failed to send message (timeout)
- The xymonproxy log keeps complaining with a message that I don't
understand:
2014-02-17 12:59:30 xymonproxy version 4.3.12 starting
2014-02-17 12:59:30 Listening on 0.0.0.0:1984
2014-02-17 12:59:30 Sending to Xymon server(s) 10.195.241.64:1984
2014-02-17 12:59:30 select() failed: Invalid argument
2014-02-17 12:59:30 select() failed: Invalid argument
2014-02-17 12:59:30 select() failed: Invalid argument
- Many connections from the proxy to the server are in TIME_WAIT; sometimes
the status is SYN_SENT.
- Tests coming from a second xymonproxy with fewer BBWin agents (~70) are
working fine.
I tried adjusting the timeout parameters on the servers, with no success.
Any ideas?
Cordialement, Regards,Mit freundlichen Grüßen,
Gautier BEGIN
list Gautier Begin
Hello,
The xymonproxy log in debug + verbose mode:
2014-02-17 17:30:12 xymonproxy version 4.3.12 starting
2014-02-17 17:30:12 Listening on 0.0.0.0:1984
2014-02-17 17:30:12 Sending to Xymon server(s) 10.195.241.64:1984
8000 2014-02-17 17:30:12 state 0: request from client OK
2014-02-17 17:30:12 0.0.0.0 : status vh-xymon10.xymonproxy
8000 2014-02-17 17:30:12 New connection
8000 2014-02-17 17:30:12 state 0: reading from client
8000 2014-02-17 17:30:12 state 1: request combining
2014-02-17 17:30:12 select() failed: Invalid argument
8000 2014-02-17 17:30:12 state 0: reading from client
8000 2014-02-17 17:30:12 state 1: request combining
2014-02-17 17:30:12 select() failed: Invalid argument
8000 2014-02-17 17:30:12 state 0: reading from client
...
Cordialement, Regards,Mit freundlichen Grüßen,
Gautier BEGIN
list Gautier Begin
Hi,
I ran truss on the xymonproxy process and got these errors (ECONNABORTED/EINPROGRESS):
16881/1: 33.7424 0.0002 fcntl(6, F_SETFL, FNONBLOCK) = 0
16881/1: 33.7427 0.0003 connect(6, 0xFFBFD838, 16, SOV_DEFAULT) = 0
16881/1: 33.7429 0.0002 shutdown(4, SHUT_RDWR, SOV_DEFAULT) = 0
16881/1: 33.7430 0.0001 close(4) = 0
16881/1: 33.7432 0.0002 close(5) = 0
16881/1: 33.7433 0.0001 pollsys(0xFFBFD320, 4, 0xFFBFD3D8, 0x00000000) = 4
16881/1: 33.7435 0.0002 read(9, " c o n f i g b b w i n".., 8185) = 29
16881/1: 33.7437 0.0002 read(7, 0x000544E3, 9124) = 0
16881/1: 33.7439 0.0002 getsockopt(6, SOL_SOCKET, SO_ERROR, 0xFFBFD4B8, 0xFFBFD4B4, SOV_DEFAULT) = 0
16881/1: 33.7440 0.0001 write(6, " d a t a S N F N L X 2".., 402) = 402
16881/1: 33.7465 0.0025 accept(3, 0x0004E7A0, 0xFFBFD4B0, SOV_DEFAULT) Err#130 ECONNABORTED
...
16881/1: 33.2151 0.0002 accept(3, 0x0004E810, 0xFFBFD4B0, SOV_DEFAULT) = 8
16881/1: 33.2152 0.0001 fcntl(8, F_SETFL, FNONBLOCK) = 0
16881/1: 33.2153 0.0001 time() = 1392657199
16881/1: 33.2155 0.0002 shutdown(4, SHUT_RDWR, SOV_DEFAULT) = 0
16881/1: 33.2156 0.0001 close(4) = 0
16881/1: 33.2158 0.0002 shutdown(7, SHUT_WR, SOV_DEFAULT) = 0
16881/1: 33.2160 0.0002 close(5) = 0
16881/1: 33.2161 0.0001 time() = 1392657199
16881/1: 33.2162 0.0001 so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", SOV_DEFAULT) = 4
16881/1: 33.2163 0.0001 fcntl(4, F_SETFL, FNONBLOCK) = 0
16881/1: 33.2166 0.0003 connect(4, 0xFFBFD838, 16, SOV_DEFAULT) Err#150 EINPROGRESS
16881/1: 33.2167 0.0001 pollsys(0xFFBFD328, 5, 0xFFBFD3D8, 0x00000000) = 3
16881/1: 33.2168 0.0001 read(8, " c o n f i g b b w i n".., 8185) = 31
16881/1: 33.2171 0.0003 read(6, " s t a t u s C A C X N".., 49145) = 9630
..
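A note on the two errno values in the trace above. For a socket in non-blocking mode, POSIX specifies that connect() returns EINPROGRESS when the connection cannot be completed immediately; the outcome is read later with getsockopt(SO_ERROR), which matches the pollsys()/getsockopt() sequence in the truss output. ECONNABORTED from accept() means a pending connection was aborted before it could be accepted. The sketch below shows the general non-blocking connect pattern in C. It is not code from xymonproxy.c, and the name connect_nonblocking and its timeout_ms parameter are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from xymonproxy.c): start a non-blocking
 * connect on fd and wait up to timeout_ms for it to complete.
 * Returns 0 on success, or an errno value describing the failure.
 */
int connect_nonblocking(int fd, const struct sockaddr *addr, socklen_t len,
                        int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return errno;

    if (connect(fd, addr, len) == 0)
        return 0;                 /* connected immediately */
    if (errno != EINPROGRESS)
        return errno;             /* a real failure */

    /* EINPROGRESS: the handshake continues in the background.
     * The socket becomes writable once it has succeeded or failed. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return errno;
    if (n == 0)
        return ETIMEDOUT;

    /* Fetch the final outcome of the connect attempt. */
    int err = 0;
    socklen_t errlen = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return errno;
    return err;
}
```

Seen this way, EINPROGRESS on its own does not indicate a fault; a failed connection would show up as a non-zero SO_ERROR value or as a timeout.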
Cordialement, Regards,Mit freundlichen Grüßen,
Gautier BEGIN
Next Absences: - From 20th Feb to 21th Feb
- From 3th Apr to 21th Apr
System Tools Team Lead
CACEIS and APERAM accounts
CSC Computer Sciences Luxembourg S.A.
12D Impasse Drosbach
L-1882 Luxembourg
Global Outsourcing Service | p:+352 24 834 276 | m:+352 621 229 172 | user-083785ae1711@xymon.invalid | www.csc.com
CSC • This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery. NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose
• CSC Computer Sciences SAS • Registered Office: Immeuble Le Balzac, 10 Place des Vosges, 92072 Paris La Défense Cedex, France • Registered in France: RCS Nanterre B 315 268 664
list Gautier Begin
Any news?
Cordialement, Regards,Mit freundlichen Grüßen,
Gautier BEGIN
list Gautier Begin
Hello,
Here is part of the solution:

Communication Issue
====================
Observing the connections between agents and proxy using netstat, I see a lot of CLOSE_WAIT on the proxy side and some SYN_SENT on the agent side.
=> The xymonproxy program doesn't close its connections, so the system reaches its limit on the number of active connections and refuses new ones.
By modifying the creation of the socket in the C program, we succeeded in getting the connections handled correctly:
230d229
< struct linger so_linger;
715,717d713
< so_linger.l_onoff = 0;
< so_linger.l_linger = 10;
< setsockopt(cwalk->ssocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));

Select Issue
====================
In the C program, the line
n = select(maxfd+1, &fdread, &fdwrite, NULL, &selecttmo);
fails: the returned n is equal to -1. We haven't yet found why, or how to solve it.
Any ideas?
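On the -1 return: the log text "Invalid argument" corresponds to errno EINVAL. Among the causes POSIX lists for EINVAL from select() are an nfds value below 0 or above FD_SETSIZE, and an invalid timeout interval. Which of these applies in xymonproxy is not established in this thread, so the following is only a sketch of how the two arguments could be checked just before the call. The helper name check_select_args, the enum, and the 10^8 second ceiling (recalled from the Solaris select(3C) manual page and not verified here) are assumptions.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Ways a select() call can be rejected with EINVAL before it waits. */
enum select_arg_problem {
    SELECT_ARGS_OK = 0,
    SELECT_BAD_NFDS,      /* nfds < 0 or nfds > FD_SETSIZE */
    SELECT_BAD_TIMEOUT    /* a timeout component is out of range */
};

/* Assumption: upper bound for tv_sec as recalled from Solaris select(3C). */
#define ASSUMED_MAX_TV_SEC 100000000L

/* Hypothetical helper: report which select() argument looks invalid. */
enum select_arg_problem check_select_args(int nfds, const struct timeval *tmo)
{
    if (nfds < 0 || nfds > FD_SETSIZE)
        return SELECT_BAD_NFDS;

    if (tmo != NULL) {    /* a NULL timeout means "block indefinitely" */
        if (tmo->tv_sec < 0 || tmo->tv_sec > ASSUMED_MAX_TV_SEC)
            return SELECT_BAD_TIMEOUT;
        if (tmo->tv_usec < 0 || tmo->tv_usec >= 1000000)
            return SELECT_BAD_TIMEOUT;
    }
    return SELECT_ARGS_OK;
}
```

Calling check_select_args(maxfd+1, &selecttmo) right before select(), and logging maxfd together with both timeval fields whenever it reports a problem, would narrow down which argument is being rejected.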
Cordialement, Regards,Mit freundlichen Grüßen,
Gautier BEGIN
list Andy Smith
Hi,
In February, Gautier reported this issue with xymonproxy on Solaris:
http://lists.xymon.com/pipermail/xymon/2014-February/039160.html
This week I came to update an installation of 4.2.3 on Solaris 9 and encountered the exact same issue as Gautier, but this time on the latest 4.3.17 code:
2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
2014-05-04 13:20:41 Listening on 0.0.0.0:1984
2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 Too many select failures, aborting
2014-05-04 13:20:46 xymonproxy version 4.3.17 starting
I do not suffer from the connections in TIME_WAIT, just the constant restarting of the proxy every 15 minutes.
Here is the truss as it gasps when falling over:
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206937
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206938
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206939
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206940
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206941
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206942
poll(0xFFBFF208, 1, 1000) = 1
accept(3, 0x0003AC60, 0xFFBFF310, 1) = 4
fcntl(4, F_SETFL, 0x00000080) = 0
time() = 1399206942
poll(0xFFBFF200, 2, 1000) = 1
read(4, " s t a t u s + 4 5 c s".., 8185) = 140
time() = 1399206942
poll(0xFFBFF200, 2, 1000) = 1
read(4, 0x00038CE2, 8045) = 0
time() = 1399206942
shutdown(4, 2, 1) = 0
close(4) = 0
poll(0xFFBFF208, 1, 1000) = 1
accept(3, 0x0003ACD0, 0xFFBFF310, 1) = 4
fcntl(4, F_SETFL, 0x00000080) = 0
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " T o o m a n y s e l".., 35) = 35
_exit(1)
So, a question to Gautier: are you using Solaris 9, and have you managed to resolve this?
Another question to the rest of the list: this is actually the only proxy I have on Solaris, all the others are on Red Hat. Is anyone else using xymonproxy on Solaris and, if so, what version?
For the time being, I am running the old bbproxy until I get this fixed; the rest of 4.3.17 seems to be working OK.
Thanks for any feedback.
--
Andy
list Gautier Begin
Andy,
I'm using Solaris 10.5 in a cluster zone configuration, for both the main
server and the proxy server. I also have a small proxy under Ubuntu Linux.
Xymon version 4.3.12.
Now my proxy under Solaris is working fine with ~900 targets. Here are
the steps I took:
0- Use a tool to observe the behaviour of the network on the system. I
used netstat in the zone and lsof -i :1984 in the global zone (the physical
node of the cluster).
Here is my Perl script, to be run in the zone (netstat):
#!/usr/bin/perl
# Count TCP connections per state: those involving port 1984, and all ports.
use strict;
use warnings;

my ($total, $big_total) = (0, 0);
my (%Con_Status, %Con_Status_Total);

foreach my $ln (`netstat -naP tcp`) {
    chomp($ln);
    my @elts = split(/ +/, $ln);
    my $state = $elts[$#elts];    # the connection state is the last column

    # Count every line that looks like a connection entry.
    if (($#elts > 5) && ($ln =~ /[0-9]+.*[A-Z]+/)) {
        $big_total++;
        $Con_Status_Total{$state}++;
    }
    # Count the lines that involve port 1984 (the Xymon port).
    if ($ln =~ /\.1984 +/) {
        $Con_Status{$state}++;
    }
}

print " State\t\tPort 1984\tTotal\n=======================================\n";
foreach my $Conn_State (sort keys %Con_Status_Total) {
    $Con_Status{$Conn_State} = 0 unless exists $Con_Status{$Conn_State};
    my $col = (length($Conn_State) < 7) ? "\t\t" : "\t";
    print " $Conn_State$col$Con_Status{$Conn_State}\t\t$Con_Status_Total{$Conn_State}\n";
    $total += $Con_Status{$Conn_State};
}
print "=======================================\n TOTAL\t\t$total\t\t$big_total\n";
1- Tune and configure how Solaris manages the network using the ndd
command:
ndd -set /dev/tcp tcp_time_wait_interval 2000
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
ndd -set /dev/tcp tcp_ip_abort_interval 300000
ndd -set /dev/tcp tcp_keepalive_interval 7200000
ndd -set /dev/tcp tcp_rexmit_interval_max 4000
ndd -set /dev/tcp tcp_rexmit_interval_min 3000
ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
ndd -set /dev/tcp tcp_smallest_anon_port 1024
ndd -set /dev/tcp tcp_conn_req_max_q 2048
ndd -set /dev/tcp tcp_conn_req_max_q0 4096
ndd -set /dev/tcp tcp_slow_start_initial 4
ndd -set /dev/tcp tcp_xmit_hiwat 262144
ndd -set /dev/tcp tcp_recv_hiwat 262144
ndd -set /dev/tcp tcp_max_buf 1048576
2- Modify the program xymonproxy.c
As I said previously, sockets are not handled well in this program
(closure is not managed). Because I know very little about C programming,
I just "arranged" the program, but it remains a dirty solution.
=> so_linger, setsockopt part
I also modified line 973 and the following lines, because the verbose
logging (the "select failed" message) was slowing down the proxy. The best
would be to solve the underlying issue, but I didn't.
# diff xymonproxy.c xymonproxy.c.ORIG
230d229
< struct linger so_linger;
715,717d713
< so_linger.l_onoff = 0;
< so_linger.l_linger = 10;
< setsockopt(cwalk->ssocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));
977,981c973,976
< /* if (n < 0) { */
< /* errprintf("select() %d/%d failed: %s\n", n, maxfd, strerror(errno)); */
< /* } */
< /* else if (n == 0) { */
< if (n == 0) {
---
> if (n < 0) {
> errprintf("select() failed: %s\n", strerror(errno));
> }
> else if (n == 0) {
1001c996
< else if ( n > 0 ) {
---
> else {
3- XYMON proxy conf
Because of the large number of targets:
In the xymonserver.cfg of the proxy, I put MAXMSGSPERCOMBO="500".
In the xymonserver.cfg of the main server, I put
MAXMSGSPERCOMBO="500"
MAXLINE="5242880"
MAXMSG_CLIENT="5242880"
MAXMSG_DATA="5242880"
MAXMSG_STACHG="5242880"
MAXMSG_STATUS="5242880"
MAXMSG_NOTES="5242880"
MAXMSG_PAGE="5242880"
MAXMSG_ENADIS="5242880"
MAXMSG_CLICHG="5242880"
This part is not really tuned (the figures are probably too large), but it works.
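Regarding the SO_LINGER change in step 2: per POSIX, l_onoff = 0 disables lingering, and in that mode the l_linger value is not used, so the diff requests the same behaviour a socket has by default. CLOSE_WAIT, for its part, is the state of a socket whose peer has closed while the local program has not yet called close(). Whether this setsockopt() call or the ndd tuning made the difference cannot be told from the thread. The sketch below, with a made-up helper name set_linger, shows how the option can be set and then read back to confirm what the kernel actually applied.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: set SO_LINGER on fd, then read the setting back.
 *   onoff == 0: lingering disabled (the default); close() returns at
 *               once and `seconds` is ignored.
 *   onoff != 0: close() may block up to `seconds` while unsent data
 *               is delivered.
 * Returns 0 on success, -1 on error (errno is set). On success,
 * *applied holds the values reported by getsockopt().
 */
int set_linger(int fd, int onoff, int seconds, struct linger *applied)
{
    struct linger lg;
    socklen_t len = sizeof(*applied);

    lg.l_onoff = onoff;
    lg.l_linger = seconds;
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
        return -1;
    return getsockopt(fd, SOL_SOCKET, SO_LINGER, applied, &len);
}
```

With the values from the diff, set_linger(cwalk->ssocket, 0, 10, &out) would be expected to report out.l_onoff as 0.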
Cordialement, Regards,Mit freundlichen Grüßen,
Gautier BEGIN
list Andy Smith
▸
Gautier Begin wrote:
Andy,
I'm using Solaris 10.5 in a cluster zone configuration. Both the main and the proxy server. I have also a little proxy under Linux Ubuntu.
XYMON version 4.3.12
Now, my proxy under Solaris is working fine with ~900 targets. Here are the different stepsI have done:
*0- Use a tool to observe the behaviour of the network* on the system. I used netstat on the zone and lsof -i :1984 on the global zone (physical node of the cluster)
Here my perl script to be run on the zone (netstat):
/$total = 0 ;/
/$big_total = 0 ;/
/@netstat = ` netstat -naP tcp ` ;/
/my %Con_Status ;/
/my %Con_Status_Total ;/
/foreach $ln (@netstat)/
/{/
/ chomp($ln) ;/
/ @elts = split(/ +/,$ln) ;/
/ if (( $#elts > 5 ) && ( $ln =~ /[0-9]+.*[A-Z]+/))/
/ {/
/ $big_total++ ;/
/ unless ( exists($Con_Status_Total{$elts[$#elts]}) )/
/ {/
/ $Con_Status_Total{$elts[$#elts]} = 1 ;/
/ } else {/
/ $Con_Status_Total{$elts[$#elts]} = $Con_Status_Total{$elts[$#elts]} + 1 ;/
/ }/
/ }/
/ if ( $ln =~ /\.1984 +/ )/
/ {/
/ unless ( exists($Con_Status{$elts[$#elts]}) )/
/ {/
/ $Con_Status{$elts[$#elts]} = 1 ;/
/ } else {/
/ $Con_Status{$elts[$#elts]} = $Con_Status{$elts[$#elts]} + 1 ;/
/ }/
/ }/
/}/
/print " State\t\tPort 1984\tTotal\n=======================================\n" ;/
/foreach $Conn_State (sort keys %Con_Status_Total )/
/{/
/ unless ( exists($Con_Status{$Conn_State}) ) { $Con_Status{$Conn_State} = 0 ; }/
/ if ( length($Conn_State) < 7 ) { $col = "\t\t" ; } else { $col = "\t" ; }/
/ print " $Conn_State$col$Con_Status{$Conn_State}\t\t$Con_Status_Total{$Conn_State}\n" ;/
/ $total = $total + $Con_Status{$Conn_State} ;/
/}/
/print "=======================================\n TOTAL\t\t$total\t\t$big_total\n" ;/
*1- Tune and configure how Solaris manages the network* using the ndd command:
ndd -set /dev/tcp tcp_time_wait_interval 2000
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
ndd -set /dev/tcp tcp_ip_abort_interval 300000
ndd -set /dev/tcp tcp_keepalive_interval 7200000
ndd -set /dev/tcp tcp_rexmit_interval_max 4000
ndd -set /dev/tcp tcp_rexmit_interval_min 3000
ndd -set /dev/tcp tcp_rexmit_interval_initial 3000
ndd -set /dev/tcp tcp_smallest_anon_port 1024
ndd -set /dev/tcp tcp_conn_req_max_q 2048
ndd -set /dev/tcp tcp_conn_req_max_q0 4096
ndd -set /dev/tcp tcp_slow_start_initial 4
ndd -set /dev/tcp tcp_xmit_hiwat 262144
ndd -set /dev/tcp tcp_recv_hiwat 262144
ndd -set /dev/tcp tcp_max_buf 1048576
*2- Modify the program xymonproxy.c*
As I said previously, sockets are not handled well in this program (their closure is not managed). Because I know very little about C programming, I just "arranged" the program, and it remains a dirty solution.
=> see the so_linger / setsockopt part of the diff.
I also modified line 973 and the following lines, because the verbose logging (the "select() failed" message) was slowing down the proxy. The best approach would be to fix the underlying issue, but I did not.
# diff xymonproxy.c xymonproxy.c.ORIG
230d229
< struct linger so_linger;
715,717d713
< so_linger.l_onoff = 0;
< so_linger.l_linger = 10;
< setsockopt(cwalk->ssocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));
977,981c973,976
< /* if (n < 0) { */
< /* errprintf("select() %d/%d failed: %s\n", n, maxfd, strerror(errno)); */
< /* } */
< /* else if (n == 0) { */
< if (n == 0) {
---
> if (n < 0) {
> errprintf("select() failed: %s\n", strerror(errno));
> }
> else if (n == 0) {
1001c996
< else if ( n > 0 ) {
---
> else {
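For readers less familiar with SO_LINGER, here is a minimal C sketch of how the option is set and read back on a socket. The helper names set_linger and linger_enabled are hypothetical and are not part of xymonproxy.c. As POSIX describes the option, l_linger is only consulted when l_onoff is non-zero, so the values used in the diff (l_onoff = 0) appear to request the default close() behaviour. Whether lingering should actually be switched on in the proxy is not settled in this thread.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not from xymonproxy.c): enable or disable
 * SO_LINGER on a socket. Returns 0 on success, -1 on error (errno set). */
int set_linger(int sock, int enable, int seconds)
{
    struct linger lg;

    memset(&lg, 0, sizeof(lg));
    lg.l_onoff  = enable ? 1 : 0;   /* 0 = default close() behaviour */
    lg.l_linger = seconds;          /* only consulted when l_onoff != 0 */
    return setsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Read the current setting back: 1 if lingering is enabled,
 * 0 if it is disabled, -1 on error. */
int linger_enabled(int sock)
{
    struct linger lg;
    socklen_t len = sizeof(lg);

    if (getsockopt(sock, SOL_SOCKET, SO_LINGER, &lg, &len) < 0) return -1;
    return lg.l_onoff ? 1 : 0;
}
```

Calling set_linger(sock, 0, 10) corresponds to the values in the diff, while set_linger(sock, 1, 10) would make close() wait up to 10 seconds for unsent data.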
*3- XYMON proxy conf*
Because of the large number of targets:
In the proxy's xymonserver.cfg, I set MAXMSGSPERCOMBO="500".
In the main server's xymonserver.cfg, I set:
MAXMSGSPERCOMBO="500"
MAXLINE="5242880"
MAXMSG_CLIENT="5242880"
MAXMSG_DATA="5242880"
MAXMSG_STACHG="5242880"
MAXMSG_STATUS="5242880"
MAXMSG_NOTES="5242880"
MAXMSG_PAGE="5242880"
MAXMSG_ENADIS="5242880"
MAXMSG_CLICHG="5242880"
This part is not really tuned (the figures are probably too large), but it works.
Cordialement, Regards,Mit freundlichen Grüßen,
Gautier BEGIN
System Tools Team Lead
CACEIS and APERAM accounts
CSC Computer Sciences Luxembourg S.A.
12D Impasse Drosbach
L-1882 Luxembourg
Global Outsourcing Service | p:+352 24 834 276 | m:+352 621 229 172 | user-083785ae1711@xymon.invalid | www.csc.com
From: Andy Smith <user-982f5f6d4d28@xymon.invalid>
To: xymon at xymon.com
Date: 05/04/2014 02:50 PM
Subject: Re: [Xymon] XYMON Proxy Issue
Sent by: "Xymon" <xymon-bounces at xymon.com>
Hi,
In February, Gautier reported this issue with xymonproxy on Solaris :-
http://lists.xymon.com/pipermail/xymon/2014-February/039160.html
I have come this week to update an installation of 4.2.3 on Solaris 9 and have encountered the exact same issue as Gautier, but this time on the latest 4.3.17 code :-
2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
2014-05-04 13:20:41 Listening on 0.0.0.0:1984
2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 Too many select failures, aborting
2014-05-04 13:20:46 xymonproxy version 4.3.17 starting
I do not suffer the connections in TIME_WAIT, just the constant restarting of the proxy every 15 minutes. Here is the truss as it gasps when falling over :-
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206937
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206938
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206939
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206940
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206941
poll(0xFFBFF208, 1, 1000) = 0
time() = 1399206942
poll(0xFFBFF208, 1, 1000) = 1
accept(3, 0x0003AC60, 0xFFBFF310, 1) = 4
fcntl(4, F_SETFL, 0x00000080) = 0
time() = 1399206942
poll(0xFFBFF200, 2, 1000) = 1
read(4, " s t a t u s + 4 5 c s".., 8185) = 140
time() = 1399206942
poll(0xFFBFF200, 2, 1000) = 1
read(4, 0x00038CE2, 8045) = 0
time() = 1399206942
shutdown(4, 2, 1) = 0
close(4) = 0
poll(0xFFBFF208, 1, 1000) = 1
accept(3, 0x0003ACD0, 0xFFBFF310, 1) = 4
fcntl(4, F_SETFL, 0x00000080) = 0
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " s e l e c t ( ) f a i".., 34) = 34
time() = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4 1".., 19) = 19
write(2, " ", 1) = 1
write(2, " T o o m a n y s e l".., 35) = 35
_exit(1)
So, question to Gautier, are you using Solaris 9 and have you managed to resolve this?
Another question for the rest of the list: this is actually the only proxy I have on Solaris; all the others are on Red Hat. Is anyone else using xymonproxy on Solaris and, if so, which version? For the time being I am running the old bbproxy until I get this fixed; the rest of 4.3.17 seems to be working OK.
Thanks for any feedback.
--
Andy

Gautier,
My issue is not a matter of performance or resources; I have only 3 servers in this DMZ, but thanks for the complete information. Also, it is a concern that this still happens with recent versions of Solaris. I would be prepared to accept that Solaris 9 might behave incorrectly, but I would have hoped that Solaris 10 might have fixed this. Maybe I will go back to the differences between the code for bbproxy at 4.2.3 and xymonproxy at 4.3.17 for a clue as to what is going on.
--
Andy
list Andy Smith
Done a bit more digging around. Firstly, if I regress to r#7368 (4.3.13) then xymonproxy on Solaris is stable. This just hides the problem of course and might be a factor in Gautier's performance issue. If I modify the code for 4.3.17 to remove the exit after 5 select() failures and add in some further debugging, I can observe that on Solaris 9 at least :-
- every 900 seconds, select() fails
- select() continues to fail for 2 seconds, then succeeds and the proxy continues as normal.
- during these 2 seconds, there are no further calls to poll(), but somewhere in the region of 50,000 calls to time().
- the values for the selecttmo structure and maxfd are reasonable, so the invalid argument must be one of the fdread or fdwrite structures.
Continuing to collect information but still not sure if I am looking at a Sol9 issue or if this affects later Solaris versions.
--
Andy
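The kind of extra debugging described above can be sketched as a small pre-flight check run just before select(). This is a hypothetical diagnostic helper (check_select_args and the SEL_* codes are invented for illustration) and is not taken from xymonproxy.c or from any patch in this thread. It follows the POSIX description of select(), which lists EINVAL for an out-of-range nfds or an invalid timeout and EBADF for a descriptor in a set that is not open; individual platforms may report things differently.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical result codes naming the first problem found. */
enum { SEL_OK = 0, SEL_BAD_NFDS, SEL_BAD_TIMEOUT, SEL_BAD_FD };

/* Inspect the arguments that are about to be passed to select().
 * rd, wr and tmo may each be NULL, as they may be for select() itself. */
int check_select_args(int nfds, fd_set *rd, fd_set *wr, struct timeval *tmo)
{
    int fd;

    /* nfds must lie in the range 0..FD_SETSIZE */
    if (nfds < 0 || nfds > FD_SETSIZE) return SEL_BAD_NFDS;

    /* the timeout must be non-negative, with less than one second of usecs */
    if (tmo != NULL &&
        (tmo->tv_sec < 0 || tmo->tv_usec < 0 || tmo->tv_usec >= 1000000))
        return SEL_BAD_TIMEOUT;

    /* every descriptor named in either set must still be open */
    for (fd = 0; fd < nfds; fd++) {
        int wanted = (rd != NULL && FD_ISSET(fd, rd)) ||
                     (wr != NULL && FD_ISSET(fd, wr));
        /* fcntl(F_GETFD) fails with EBADF on a descriptor that is not open */
        if (wanted && fcntl(fd, F_GETFD) == -1 && errno == EBADF)
            return SEL_BAD_FD;
    }
    return SEL_OK;
}
```

Logging the returned code next to the "select() failed" message would show which of the three inputs is at fault, without changing the behaviour of the select() call itself.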
list Andy Smith
This issue affected Solaris 10 as well. The attached patch resolves all my xymonproxy stability problems on Solaris platforms. I believe the patch is relevant to other platforms too; it is just that select() on those platforms is more tolerant.
--
Andy