Xymon Mailing List Archive

XYMON Proxy Issue

10 messages in this thread

Gautier Begin · Mon, 17 Feb 2014 16:28:49 +0100
Hello


I'm using XYMON 4.3.12 under Solaris 10.5

Since I put a proxy between my XYMON server and ~1000 agents, I have the
following problems:
- Many tests are purple, especially those from BBWin agents.
- Even when a test is green, nothing is displayed in its RRD graph.
- I see timeout messages in the logs (agents, server, and xymonnet on the
proxy):
        2014-02-17 15:58:32 ->  Recipient '10.195.243.205', timeout 15
        2014-02-17 15:58:32 ->  1st line: 'combo'
        2014-02-17 16:02:53 Whoops ! Failed to send message (timeout)

- The xymonproxy log keeps complaining with a message that I don't
understand:

        2014-02-17 12:59:30 xymonproxy version 4.3.12 starting
        2014-02-17 12:59:30 Listening on 0.0.0.0:1984
        2014-02-17 12:59:30 Sending to Xymon server(s) 10.195.241.64:1984
        2014-02-17 12:59:30 select() failed: Invalid argument
        2014-02-17 12:59:30 select() failed: Invalid argument
        2014-02-17 12:59:30 select() failed: Invalid argument

- Many connections from the proxy to the server are in TIME_WAIT; some are
in SYN_SENT.

- Tests coming from a second xymonproxy with fewer BBWin agents (~70) are
working fine.


I tried adjusting the timeout parameters on the servers, without success.
Any ideas?


Cordialement, Regards,Mit freundlichen Grüßen,

Gautier BEGIN
Gautier Begin · Mon, 17 Feb 2014 17:30:26 +0100
Hello,


Here is the xymonproxy log in debug + verbose mode:

2014-02-17 17:30:12 xymonproxy version 4.3.12 starting
2014-02-17 17:30:12 Listening on 0.0.0.0:1984
2014-02-17 17:30:12 Sending to Xymon server(s) 10.195.241.64:1984
8000 2014-02-17 17:30:12 state 0: request from client OK
2014-02-17 17:30:12 0.0.0.0 : status vh-xymon10.xymonproxy
8000 2014-02-17 17:30:12 New connection
8000 2014-02-17 17:30:12 state 0: reading from client
8000 2014-02-17 17:30:12 state 1: request combining
2014-02-17 17:30:12 select() failed: Invalid argument
8000 2014-02-17 17:30:12 state 0: reading from client
8000 2014-02-17 17:30:12 state 1: request combining
2014-02-17 17:30:12 select() failed: Invalid argument
8000 2014-02-17 17:30:12 state 0: reading from client
...

Cordialement, Regards,Mit freundlichen Grüßen,

Gautier BEGIN


Gautier Begin · Mon, 17 Feb 2014 18:29:52 +0100
Hi,

I ran truss on the xymonproxy process and got errors such as ECONNABORTED and EINPROGRESS:


16881/1:        33.7424  0.0002 fcntl(6, F_SETFL, FNONBLOCK)      = 0
16881/1:        33.7427  0.0003 connect(6, 0xFFBFD838, 16, SOV_DEFAULT)   = 0
16881/1:        33.7429  0.0002 shutdown(4, SHUT_RDWR, SOV_DEFAULT)      = 0
16881/1:        33.7430  0.0001 close(4)      = 0
16881/1:        33.7432  0.0002 close(5)      = 0
16881/1:        33.7433  0.0001 pollsys(0xFFBFD320, 4, 0xFFBFD3D8, 0x00000000)  = 4
16881/1:        33.7435  0.0002 read(9, " c o n f i g   b b w i n".., 8185)     = 29
16881/1:        33.7437  0.0002 read(7, 0x000544E3, 9124)      = 0
16881/1:        33.7439  0.0002 getsockopt(6, SOL_SOCKET, SO_ERROR, 0xFFBFD4B8, 0xFFBFD4B4, SOV_DEFAULT) = 0
16881/1:        33.7440  0.0001 write(6, " d a t a   S N F N L X 2".., 402)     = 402
16881/1:        33.7465  0.0025 accept(3, 0x0004E7A0, 0xFFBFD4B0, SOV_DEFAULT)  Err#130 ECONNABORTED
...
16881/1:        33.2151  0.0002 accept(3, 0x0004E810, 0xFFBFD4B0, SOV_DEFAULT)  = 8
16881/1:        33.2152  0.0001 fcntl(8, F_SETFL, FNONBLOCK)      = 0
16881/1:        33.2153  0.0001 time()      = 1392657199
16881/1:        33.2155  0.0002 shutdown(4, SHUT_RDWR, SOV_DEFAULT)      = 0
16881/1:        33.2156  0.0001 close(4)      = 0
16881/1:        33.2158  0.0002 shutdown(7, SHUT_WR, SOV_DEFAULT)      = 0
16881/1:        33.2160  0.0002 close(5)      = 0
16881/1:        33.2161  0.0001 time()      = 1392657199
16881/1:        33.2162  0.0001 so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", SOV_DEFAULT) = 4
16881/1:        33.2163  0.0001 fcntl(4, F_SETFL, FNONBLOCK)      = 0
16881/1:        33.2166  0.0003 connect(4, 0xFFBFD838, 16, SOV_DEFAULT)   Err#150 EINPROGRESS
16881/1:        33.2167  0.0001 pollsys(0xFFBFD328, 5, 0xFFBFD3D8, 0x00000000)  = 3
16881/1:        33.2168  0.0001 read(8, " c o n f i g   b b w i n".., 8185)     = 31
16881/1:        33.2171  0.0003 read(6, " s t a t u s   C A C X N".., 49145)    = 9630
..
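For reference, the EINPROGRESS here is not necessarily a fault: connect() on a socket just switched to non-blocking mode (the FNONBLOCK fcntl before it) normally returns EINPROGRESS, and the caller later waits for the socket to become writable and reads SO_ERROR, which matches the pollsys()/getsockopt(SO_ERROR) calls in the trace. ECONNABORTED from accept() means a client aborted its connection before the proxy accepted it; a server usually just ignores it and accepts again. A minimal sketch of the non-blocking connect pattern follows; it is not the xymonproxy code, and the helper names connect_nonblocking() and wait_connected() are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>   /* struct sockaddr_in, for callers */
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: put fd in non-blocking mode and start connect().
 * Returns 0 if already connected, 1 if the connect is in progress
 * (EINPROGRESS), or -1 on a real error (errno set). */
static int connect_nonblocking(int fd, const struct sockaddr *sa, socklen_t len)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    if (connect(fd, sa, len) == 0)
        return 0;
    return (errno == EINPROGRESS) ? 1 : -1;
}

/* Hypothetical helper: wait up to timeout_ms for a pending connect.
 * Returns 0 when connected, -1 on timeout or failure (errno set). */
static int wait_connected(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int err = 0;
    socklen_t errlen = sizeof(err);
    int n;

    pfd.fd = fd;
    pfd.events = POLLOUT;          /* writable once the handshake ends */
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    /* Writable does not mean success: SO_ERROR holds connect()'s result. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```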

Cordialement, Regards,Mit freundlichen Grüßen,

Gautier BEGIN

Next Absences:    - From 20th Feb to 21st Feb
                                  - From 3rd Apr to 21st Apr

System Tools Team Lead
CACEIS and APERAM accounts
CSC Computer Sciences Luxembourg S.A.
12D Impasse Drosbach
L-1882 Luxembourg

Global Outsourcing Service | p:+352 24 834 276 | m:+352 621 229 172 | user-083785ae1711@xymon.invalid | www.csc.com


CSC • This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery.  NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose
 • CSC Computer Sciences SAS • Registered Office: Immeuble Le Balzac, 10 Place des Vosges, 92072 Paris La Défense Cedex, France • Registered in France: RCS Nanterre B 315 268 664
Gautier Begin · Wed, 19 Feb 2014 10:39:09 +0100
Any news?

Cordialement, Regards,Mit freundlichen Grüßen,

Gautier BEGIN


Gautier Begin · Tue, 25 Feb 2014 11:40:58 +0100
Hello,


Here is part of the solution:

Communication Issue
====================
Observing the connections between the agents and the proxy with netstat, I see many connections in CLOSE_WAIT on the proxy side and some in SYN_SENT on the agent side.
=> The xymonproxy program does not close its connections, so the system reaches its limit on the number of active connections and refuses new ones.

By modifying the socket creation in the C program, we managed to get the connections handled correctly:

230d229
<         struct linger so_linger;
715,717d713
<                                       so_linger.l_onoff = 0;
<                               so_linger.l_linger = 10;
<                               setsockopt(cwalk->ssocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));
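Some background on the change above, as a sketch rather than a definitive fix: a socket stays in CLOSE_WAIT when the peer has closed its side and the local program has not yet called close() on it, so those entries only go away when the proxy closes sockets it has finished with. SO_LINGER only affects what close() does: with l_onoff = 0, as in this patch, lingering is disabled and l_linger is ignored, which is also the default. The helpers set_linger() and close_client_socket() below are hypothetical names used for illustration.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: configure SO_LINGER on a socket.
 * onoff == 0: close() returns at once and the kernel delivers queued
 *             data in the background (the default; linger_secs ignored).
 * onoff != 0: close() may block up to linger_secs; on most TCP stacks
 *             linger_secs == 0 discards unsent data and resets the link. */
static int set_linger(int fd, int onoff, int linger_secs)
{
    struct linger lg;

    lg.l_onoff = onoff;
    lg.l_linger = linger_secs;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Hypothetical helper: finish with a peer socket. shutdown(SHUT_RDWR)
 * sends FIN and stops further reads; close() releases the descriptor.
 * A socket stays in CLOSE_WAIT until this close() happens. */
static void close_client_socket(int fd)
{
    shutdown(fd, SHUT_RDWR);   /* may fail with ENOTCONN; harmless here */
    close(fd);
}
```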


Select Issue
====================
In the C program, the line n = select(maxfd+1, &fdread, &fdwrite, NULL, &selecttmo); fails: select() returns n = -1 with errno set to EINVAL, hence the "Invalid argument" message.

We have not yet found out why, or how to solve it. Any ideas?
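POSIX lists two causes of EINVAL ("Invalid argument") from select(): nfds less than 0 or greater than FD_SETSIZE, or an invalid timeout. Some systems reject a timeout whose tv_usec is negative or 1000000 or more, while others (Linux, for example) normalise it, which would be one way to get a failure that only shows up on Solaris. Whether xymonproxy ever builds such a selecttmo value, or reaches maxfd+1 > FD_SETSIZE with ~1000 agents, is an assumption to check against its source, not something established in this thread. The sketch below, with hypothetical helpers normalize_timeval() and check_select_args(), shows how the arguments could be checked before calling select().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: bring a timeval into the canonical range
 * 0 <= tv_usec < 1000000, carrying whole seconds into tv_sec.
 * A timeout that is already in the past becomes 0 (poll once). */
static void normalize_timeval(struct timeval *tv)
{
    tv->tv_sec += tv->tv_usec / 1000000;
    tv->tv_usec %= 1000000;            /* C99: result has the sign of tv_usec */
    if (tv->tv_usec < 0) {
        tv->tv_usec += 1000000;
        tv->tv_sec -= 1;
    }
    if (tv->tv_sec < 0) {
        tv->tv_sec = 0;
        tv->tv_usec = 0;
    }
}

/* Hypothetical helper: reject values select() may refuse with EINVAL
 * (nfds out of range, or a timeout with negative or out-of-range fields).
 * Returns 0 if the arguments look valid, -1 with errno = EINVAL otherwise. */
static int check_select_args(int nfds, const struct timeval *tv)
{
    if (nfds < 0 || nfds > FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    if (tv != NULL &&
        (tv->tv_sec < 0 || tv->tv_usec < 0 || tv->tv_usec >= 1000000)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

Placed before the select() call quoted above, this would read: normalize_timeval(&selecttmo); if (check_select_args(maxfd+1, &selecttmo) == 0) n = select(maxfd+1, &fdread, &fdwrite, NULL, &selecttmo);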


Cordialement, Regards,Mit freundlichen Grüßen,

Gautier BEGIN

Andy Smith · Sun, 4 May 2014 13:50:51 +0100
Hi,

In February, Gautier reported this issue with xymonproxy on Solaris :-

http://lists.xymon.com/pipermail/xymon/2014-February/039160.html

I have come this week to update an installation of 4.2.3 on Solaris 9 and
have encountered the exact same issue as Gautier, but this time on the
latest 4.3.17 code :-

2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
2014-05-04 13:20:41 Listening on 0.0.0.0:1984
2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 Too many select failures, aborting
2014-05-04 13:20:46 xymonproxy version 4.3.17 starting

I am not suffering from the connections in TIME_WAIT, just the constant
restarting of the proxy every 15 minutes.  Here is the truss as it gasps
when falling over :-

poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206937
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206938
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206939
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206940
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206941
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206942
poll(0xFFBFF208, 1, 1000)                       = 1
accept(3, 0x0003AC60, 0xFFBFF310, 1)            = 4
fcntl(4, F_SETFL, 0x00000080)                   = 0
time()                                          = 1399206942
poll(0xFFBFF200, 2, 1000)                       = 1
read(4, " s t a t u s + 4 5   c s".., 8185)     = 140
time()                                          = 1399206942
poll(0xFFBFF200, 2, 1000)                       = 1
read(4, 0x00038CE2, 8045)                       = 0
time()                                          = 1399206942
shutdown(4, 2, 1)                               = 0
close(4)                                        = 0
poll(0xFFBFF208, 1, 1000)                       = 1
accept(3, 0x0003ACD0, 0xFFBFF310, 1)            = 4
fcntl(4, F_SETFL, 0x00000080)                   = 0
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " T o o   m a n y   s e l".., 35)      = 35
_exit(1)

So, question to Gautier, are you using Solaris 9 and have you managed to
resolve this?

Another question for the rest of the list: this is actually the only proxy
I have on Solaris; all the others are on Red Hat. Is anyone else using
xymonproxy on Solaris, and if so, what version?  For the time being, I am
running the old bbproxy until I get this fixed; the rest of 4.3.17 seems to
be working OK.

Thanks for any feedback.
-- 
Andy
Gautier Begin · Mon, 5 May 2014 10:12:22 +0200
Andy,

I'm using Solaris 10.5 in a clustered zone configuration, for both the main
server and the proxy server. I also have a small proxy on Ubuntu Linux.
XYMON version: 4.3.12

My Solaris proxy now works fine with ~900 targets. Here are the steps I
took:

0- Use a tool to observe the network behaviour of the system. I used
netstat in the zone and lsof -i :1984 in the global zone (the physical
node of the cluster).

 Here is my Perl script, to be run in the zone (netstat):

#!/usr/bin/perl
# Count TCP connection states from "netstat -naP tcp" (Solaris):
# for all connections, and for those using port 1984 (Xymon).
use strict;
use warnings;

my $total     = 0;
my $big_total = 0;
my %Con_Status;          # state => count, port 1984 only
my %Con_Status_Total;    # state => count, all connections
my @netstat = `netstat -naP tcp`;

foreach my $ln (@netstat)
{
        chomp($ln);
        my @elts = split(/ +/, $ln);
        # Connection lines have 7+ fields and end with the state name
        if (($#elts > 5) && ($ln =~ /[0-9]+.*[A-Z]+/))
        {
                $big_total++;
                $Con_Status_Total{$elts[-1]}++;
        }
        # Local or remote address ending in .1984
        if ($ln =~ /\.1984 +/)
        {
                $Con_Status{$elts[-1]}++;
        }
}

print " State\t\tPort 1984\tTotal\n=======================================\n";
foreach my $Conn_State (sort keys %Con_Status_Total)
{
        $Con_Status{$Conn_State} = 0 unless exists($Con_Status{$Conn_State});
        my $col = (length($Conn_State) < 7) ? "\t\t" : "\t";
        print " $Conn_State$col$Con_Status{$Conn_State}\t\t$Con_Status_Total{$Conn_State}\n";
        $total += $Con_Status{$Conn_State};
}
print "=======================================\n TOTAL\t\t$total\t\t$big_total\n";


1- Tune and configure how Solaris manages the network using the ndd 
command:

ndd -set /dev/tcp tcp_time_wait_interval        2000
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
ndd -set /dev/tcp tcp_ip_abort_interval         300000
ndd -set /dev/tcp tcp_keepalive_interval        7200000
ndd -set /dev/tcp tcp_rexmit_interval_max       4000
ndd -set /dev/tcp tcp_rexmit_interval_min       3000
ndd -set /dev/tcp tcp_rexmit_interval_initial   3000
ndd -set /dev/tcp tcp_smallest_anon_port        1024

ndd -set /dev/tcp tcp_conn_req_max_q    2048
ndd -set /dev/tcp tcp_conn_req_max_q0   4096
ndd -set /dev/tcp tcp_slow_start_initial        4

ndd -set /dev/tcp tcp_xmit_hiwat        262144
ndd -set /dev/tcp tcp_recv_hiwat        262144
ndd -set /dev/tcp tcp_max_buf   1048576


2- Modify the program xymonproxy.c

As I said previously, sockets are not handled well in this program
(closing is not managed). Because I know very little about C programming,
I just "arranged" the program, but it remains a dirty solution.
=> so_linger, setsockopt part

I also modified line 973 and the following lines, because verbose logging
of the "select() failed" message was slowing down the proxy. The best would
be to fix the underlying issue, but I didn't.

# diff xymonproxy.c xymonproxy.c.ORIG
230d229
<         struct linger so_linger;
715,717d713
<                                       so_linger.l_onoff = 0;
<                               so_linger.l_linger = 10;
<                               setsockopt(cwalk->ssocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));
977,981c973,976
< /*            if (n < 0) {                      */
< /*                    errprintf("select() %d/%d failed: %s\n", n, maxfd, strerror(errno));    */
< /*            }                      */
< /*            else if (n == 0) {                      */
<               if (n == 0) {
---
>               if (n < 0) {
>                       errprintf("select() failed: %s\n", strerror(errno));
>               }
>               else if (n == 0) {
1001c996
<               else if ( n > 0 ) {
---
>               else {

3- XYMON proxy configuration

Because of the large number of targets:

In the proxy's xymonserver.cfg, I set MAXMSGSPERCOMBO="500".

In the main server's xymonserver.cfg, I set:

MAXMSGSPERCOMBO="500"

MAXLINE="5242880"
MAXMSG_CLIENT="5242880"
MAXMSG_DATA="5242880"
MAXMSG_STACHG="5242880"
MAXMSG_STATUS="5242880"
MAXMSG_NOTES="5242880"
MAXMSG_PAGE="5242880"
MAXMSG_ENADIS="5242880"
MAXMSG_CLICHG="5242880"


This part is not really tuned (the values are probably too large), but it
works.
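Regarding the select() logging change in step 2: rather than commenting the error message out completely, another option is to rate-limit it, so a burst of failures cannot slow the proxy down while the error still stays visible in the log. A minimal sketch follows; struct ratelimit and ratelimit_allow() are hypothetical, and the 60-second interval is only an example value.

```c
#include <time.h>

/* Hypothetical rate limiter for a repeating error message. */
struct ratelimit {
    time_t last;            /* when a message was last let through (0 = never) */
    unsigned long dropped;  /* messages suppressed since then; caller resets */
    int interval;           /* minimum seconds between messages */
};

/* Returns 1 if the caller should log now, 0 if it should stay quiet.
 * 'now' is a parameter so the logic can be tested without sleeping. */
static int ratelimit_allow(struct ratelimit *rl, time_t now)
{
    if (rl->last == 0 || now - rl->last >= rl->interval) {
        rl->last = now;
        return 1;
    }
    rl->dropped++;
    return 0;
}
```

In the select() error branch this could be used as: if (n < 0 && ratelimit_allow(&sel_rl, time(NULL))) { errprintf("select() failed: %s (%lu suppressed)\n", strerror(errno), sel_rl.dropped); sel_rl.dropped = 0; } with sel_rl a static struct ratelimit initialised to { 0, 0, 60 }. This only reduces the logging; it does not address the cause of the failures.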


Cordialement, Regards,Mit freundlichen Grüßen,

Gautier BEGIN

System Tools Team Lead
CACEIS and APERAM accounts
CSC Computer Sciences Luxembourg S.A.
12D Impasse Drosbach
L-1882 Luxembourg

Global Outsourcing Service | p:+352 24 834 276 | m:+352 621 229 172 | 
user-083785ae1711@xymon.invalid | www.csc.com


CSC • This is a PRIVATE message. If you are not the intended recipient, 
please delete without copying and kindly advise us by e-mail of the 
mistake in delivery.  NOTE: Regardless of content, this e-mail shall not 
operate to bind CSC to any order or other contract unless pursuant to 
explicit written agreement or government initiative expressly permitting 
the use of e-mail for such purpose
 • CSC Computer Sciences SAS • Registered Office: Immeuble Le Balzac, 10
Place des Vosges, 92072 Paris La Défense Cedex, France • Registered in 
France: RCS Nanterre B 315 268 664


From:   Andy Smith <user-982f5f6d4d28@xymon.invalid>
To:     xymon at xymon.com
Date:   05/04/2014 02:50 PM
Subject:        Re: [Xymon] XYMON Proxy Issue
Sent by:        "Xymon" <xymon-bounces at xymon.com>


Hi,

In February, Gautier reported this issue with xymonproxy on Solaris :-

http://lists.xymon.com/pipermail/xymon/2014-February/039160.html

I have come this week to update an installation of 4.2.3 on Solaris 9 and 
have encountered the exact same issue as Gautier, but this time on the 
latest 4.3.17 code :-

2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
2014-05-04 13:20:41 Listening on 0.0.0.0:1984
2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 Too many select failures, aborting
2014-05-04 13:20:46 xymonproxy version 4.3.17 starting

I do not see the connections stuck in TIME_WAIT, just the proxy constantly 
restarting every 15 minutes.  Here is the truss output as it gasps when 
falling over :-

poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206937
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206938
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206939
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206940
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206941
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206942
poll(0xFFBFF208, 1, 1000)                       = 1
accept(3, 0x0003AC60, 0xFFBFF310, 1)            = 4
fcntl(4, F_SETFL, 0x00000080)                   = 0
time()                                          = 1399206942
poll(0xFFBFF200, 2, 1000)                       = 1
read(4, " s t a t u s + 4 5   c s".., 8185)     = 140
time()                                          = 1399206942
poll(0xFFBFF200, 2, 1000)                       = 1
read(4, 0x00038CE2, 8045)                       = 0
time()                                          = 1399206942
shutdown(4, 2, 1)                               = 0
close(4)                                        = 0
poll(0xFFBFF208, 1, 1000)                       = 1
accept(3, 0x0003ACD0, 0xFFBFF310, 1)            = 4
fcntl(4, F_SETFL, 0x00000080)                   = 0
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " T o o   m a n y   s e l".., 35)      = 35
_exit(1)

So, question to Gautier, are you using Solaris 9 and have you managed to 
resolve this?

Another question for the rest of the list: this is actually the only proxy 
I have on Solaris (all the others are on Red Hat). Is anyone else using 
xymonproxy on Solaris, and if so, which version?  For the time being I am 
running the old bbproxy until I get this fixed; the rest of 4.3.17 seems 
to be working OK.

Thanks for any feedback.
-- 
Andy
list Andy Smith · Mon, 05 May 2014 10:52:06 +0100 ·
quoted from Gautier Begin
Gautier Begin wrote:
Andy,

I'm using Solaris 10.5 in a clustered zone configuration, for both the main server and the proxy server. I also have a small proxy on Ubuntu Linux.
XYMON version 4.3.12

Now my proxy on Solaris is working fine with ~900 targets. Here are the steps I took:

*0- Use a tool to observe the network behaviour* on the system. I used netstat in the zone and lsof -i :1984 in the global zone (the physical node of the cluster).

Here is my Perl script, to be run in the zone (it summarises the netstat output):

#!/usr/bin/perl
# Count TCP connections per state from "netstat -naP tcp" (Solaris):
# connections involving port 1984 (Xymon) versus all connections.
use strict;
use warnings;

my $total     = 0;      # connections on port 1984
my $big_total = 0;      # all TCP connections
my %Con_Status;         # state => connections on port 1984
my %Con_Status_Total;   # state => all connections

foreach my $ln (`netstat -naP tcp`)
{
        chomp($ln);
        my @elts = split(/ +/, $ln);
        # Skip headers: data lines have at least 7 fields and end in a state name
        next unless ( ( $#elts > 5 ) && ( $ln =~ /[0-9]+.*[A-Z]+/ ) );

        my $state = $elts[$#elts];
        $big_total++;
        $Con_Status_Total{$state}++;
        # Local or remote address ends in .1984
        $Con_Status{$state}++ if ( $ln =~ /\.1984 +/ );
}

print " State\t\tPort 1984\tTotal\n=======================================\n";
foreach my $Conn_State (sort keys %Con_Status_Total)
{
        my $port = $Con_Status{$Conn_State} || 0;
        my $col  = ( length($Conn_State) < 7 ) ? "\t\t" : "\t";
        print " $Conn_State$col$port\t\t$Con_Status_Total{$Conn_State}\n";
        $total += $port;
}
print "=======================================\n TOTAL\t\t$total\t\t$big_total\n";


*1- Tune how Solaris manages the network* using the ndd command:

ndd -set /dev/tcp tcp_time_wait_interval        2000
ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
ndd -set /dev/tcp tcp_ip_abort_interval         300000
ndd -set /dev/tcp tcp_keepalive_interval        7200000
ndd -set /dev/tcp tcp_rexmit_interval_max       4000
ndd -set /dev/tcp tcp_rexmit_interval_min       3000
ndd -set /dev/tcp tcp_rexmit_interval_initial   3000
ndd -set /dev/tcp tcp_smallest_anon_port        1024

ndd -set /dev/tcp tcp_conn_req_max_q    2048
ndd -set /dev/tcp tcp_conn_req_max_q0   4096
ndd -set /dev/tcp tcp_slow_start_initial        4

ndd -set /dev/tcp tcp_xmit_hiwat        262144
ndd -set /dev/tcp tcp_recv_hiwat        262144
ndd -set /dev/tcp tcp_max_buf   1048576


*2- Modify the program xymonproxy.c*

As I said previously, sockets are not handled well in this program (socket closure is not managed). Because I know very little about C programming, I just "patched" the program, and it remains a dirty solution.
=> the so_linger / setsockopt part

I also modified line 973 onwards because the verbose logging (the "select() failed" message) was slowing down the proxy. The best solution would be to fix the underlying issue, but I didn't.

# diff xymonproxy.c xymonproxy.c.ORIG
230d229
<         struct linger so_linger;
715,717d713
<                                       so_linger.l_onoff = 0;
<                               so_linger.l_linger = 10;
<                               setsockopt(cwalk->ssocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof(so_linger));

977,981c973,976
< /*            if (n < 0) {                                                                    */
< /*                    errprintf("select() %d/%d failed: %s\n", n, maxfd, strerror(errno));    */
< /*            }                                                                               */
< /*            else if (n == 0) {                                                              */
<               if (n == 0) {
---
>               if (n < 0) {
>                       errprintf("select() failed: %s\n", strerror(errno));
>               }
>               else if (n == 0) {
1001c996
<               else if ( n > 0 ) {
---
>               else {


*3- XYMON proxy conf*

Because of the large number of targets:

In the proxy's xymonserver.cfg, I set MAXMSGSPERCOMBO="500".

In the main server's xymonserver.cfg, I set:

MAXMSGSPERCOMBO="500"

MAXLINE="5242880"
MAXMSG_CLIENT="5242880"
MAXMSG_DATA="5242880"
MAXMSG_STACHG="5242880"
MAXMSG_STATUS="5242880"
MAXMSG_NOTES="5242880"
MAXMSG_PAGE="5242880"
MAXMSG_ENADIS="5242880"
MAXMSG_CLICHG="5242880"


This part is not really tuned (the figures are probably too large), but it works.
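As a rough illustration of why raising MAXMSGSPERCOMBO matters with this many agents, the sketch below computes how many combo messages a batch of queued status messages needs. It assumes that MAXMSGSPERCOMBO caps how many messages go into one combo, as its name suggests. combos_needed() is a hypothetical helper, not xymonproxy code. The 5242880 figure used above is 5 × 1024 × 1024 bytes, i.e. 5 MiB.

```c
#include <assert.h>

/* Hypothetical helper: number of combo messages needed to forward
 * `pending` status messages when at most `max_per_combo` fit in one
 * combo (ceiling division). Assumes max_per_combo > 0. */
unsigned int combos_needed(unsigned int pending, unsigned int max_per_combo)
{
    assert(max_per_combo > 0);
    return (pending + max_per_combo - 1) / max_per_combo;
}

/* 5 MiB: the value used above for MAXLINE and the MAXMSG_* settings. */
#define FIVE_MIB (5UL * 1024UL * 1024UL)
```

For example, combos_needed(900, 500) is 2, whereas combos_needed(900, 100) is 9.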


Cordialement, Regards,Mit freundlichen Grüßen,

Gautier BEGIN

Gautier,

My issue is not a matter of performance or resources; I have only 3 servers in this DMZ. Thanks for the complete information, though.  It is also a concern that this still happens with more recent versions of Solaris: I would be prepared to accept that Solaris 9 might behave incorrectly, but I would have hoped that Solaris 10 had fixed this.

Maybe I will go back to the differences between the bbproxy code in 4.2.3 and the xymonproxy code in 4.3.17 for a clue as to what is going on.

-- 
Andy
list Andy Smith · Sun, 11 May 2014 21:03:11 +0100 ·
Done a bit more digging around.  Firstly, if I regress to r#7368 
(4.3.13) then xymonproxy on Solaris is stable.  This just hides the 
problem of course and might be a factor in Gautier's performance issue.

If I modify the code for 4.3.17 to remove the exit after 5 select() 
failures and add in some further debugging, I can observe that on 
Solaris 9 at least :-

- every 900 seconds, select() fails
- select continues to fail for 2 seconds then succeeds and the proxy 
continues as normal.
- during these 2 seconds, there are no further calls to poll(), but 
somewhere in the region of 50,000 calls to time().
- the values for the selecttmo structure and maxfd are reasonable, so 
the invalid argument must be one of the fdread or fdwrite structures.
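A check like the one sketched below could be run just before select() to log which argument is being rejected. It is a hypothetical diagnostic, not part of xymonproxy. check_select_args(), normalize_timeval() and log_select_args() are invented names. The sketch covers only the main EINVAL cases POSIX lists: nfds below 0 or above FD_SETSIZE, or an invalid timeout. It treats a timeout as invalid when tv_sec is negative or tv_usec falls outside 0..999999. POSIX reports a bad descriptor in the sets as EBADF rather than EINVAL, and platforms may add their own limits, so a clean result here does not rule out a platform-specific cause. The truss shows poll() where the code calls select(), but no poll() at all during the failures. That suggests the arguments were rejected in the C library before any system call.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical diagnostic: describe the first select() argument that
 * falls into one of the main EINVAL cases POSIX lists, or return NULL. */
const char *check_select_args(int nfds, const struct timeval *tmo)
{
    if (nfds < 0 || nfds > FD_SETSIZE)
        return "nfds out of range";
    if (tmo != NULL) {
        if (tmo->tv_sec < 0)
            return "negative timeout seconds";
        if (tmo->tv_usec < 0 || tmo->tv_usec >= 1000000)
            return "timeout microseconds outside 0..999999";
    }
    return NULL;
}

/* Carry tv_usec into tv_sec so it lies in 0..999999, and clamp a
 * negative (already expired) timeout to zero. */
void normalize_timeval(struct timeval *tv)
{
    assert(tv != NULL);
    tv->tv_sec += tv->tv_usec / 1000000;
    tv->tv_usec %= 1000000;
    if (tv->tv_usec < 0) {
        tv->tv_usec += 1000000;
        tv->tv_sec -= 1;
    }
    if (tv->tv_sec < 0) {
        tv->tv_sec = 0;
        tv->tv_usec = 0;
    }
}

/* Log the arguments about to be passed to select() (in xymonproxy's
 * terms, presumably values derived from maxfd and selecttmo). */
void log_select_args(FILE *log, int nfds, const struct timeval *tmo)
{
    const char *problem = check_select_args(nfds, tmo);
    const char *verdict = problem ? problem : "arguments look valid";

    if (tmo != NULL)
        fprintf(log, "select(nfds=%d, tv_sec=%ld, tv_usec=%ld): %s\n",
                nfds, (long)tmo->tv_sec, (long)tmo->tv_usec, verdict);
    else
        fprintf(log, "select(nfds=%d, no timeout): %s\n", nfds, verdict);
}
```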

Continuing to collect information but still not sure if I am looking at 
a Sol9 issue or if this affects later Solaris versions.
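The roughly 50,000 time() calls with no poll() in between fit a main loop that is spinning. Because select() returns -1 immediately, nothing throttles the next iteration. The sketch below shows one way a loop could handle this. It counts consecutive select() failures, pauses briefly after each one instead of spinning, and gives up only after a configurable limit, which plays the role of the "Too many select failures, aborting" check. It is a hypothetical illustration, not xymonproxy's code: select_with_retry(), max_failures and the 100 ms pause are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical wrapper around select() that counts consecutive failures.
 * - On success, *failures is reset and select()'s result is returned.
 * - On EINTR, -1 is returned without counting it as a failure.
 * - On any other error, the error is logged and counted; after more than
 *   max_failures consecutive errors -2 is returned so the caller can
 *   abort, otherwise the call pauses 100 ms (instead of spinning) and
 *   returns -1. errno is preserved for the caller in both cases. */
int select_with_retry(int nfds, fd_set *rd, fd_set *wr, struct timeval *tmo,
                      int *failures, int max_failures)
{
    int n, saved_errno;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 100000000L }; /* 100 ms */

    assert(failures != NULL);
    n = select(nfds, rd, wr, NULL, tmo);
    if (n >= 0) {
        *failures = 0;
        return n;
    }
    saved_errno = errno;
    if (saved_errno == EINTR)
        return -1;

    fprintf(stderr, "select() failed: %s (nfds=%d)\n", strerror(saved_errno), nfds);
    if (++(*failures) > max_failures) {
        errno = saved_errno;
        return -2;
    }
    nanosleep(&pause, NULL);
    errno = saved_errno;
    return -1;
}
```

A brief pause keeps a transient failure like the two-second window observed above from burning CPU or flooding the log. The failure limit still stops the proxy if select() never recovers.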
-- 
Andy
list Andy Smith · Tue, 20 May 2014 07:28:46 +0100 ·
This issue affects Solaris 10 as well. The attached patch resolves all 
my xymonproxy stability problems on Solaris platforms. I believe the 
patch is relevant to other platforms too; it is just that select() on 
other platforms is more tolerant.

-- 
Andy