Xymon Mailing List Archive

XYMON Proxy Issue

Andy Smith
Sun, 11 May 2014 21:03:11 +0100
Message-Id: <user-691c65cca532@xymon.invalid>

Andy Smith wrote:
Hi,

In February, Gautier reported this issue with xymonproxy on Solaris :-

http://lists.xymon.com/pipermail/xymon/2014-February/039160.html

This week I came to upgrade a 4.2.3 installation on Solaris 9 and hit 
exactly the same issue as Gautier, this time on the latest 4.3.17 
code :-

2014-05-04 13:05:36 xymonproxy version 4.3.17 starting
2014-05-04 13:20:41 Listening on 0.0.0.0:1984
2014-05-04 13:20:41 Sending to Xymon server(s) xx.xx.xx.xx:1984
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 select() failed: Invalid argument
2014-05-04 13:20:41 Too many select failures, aborting
2014-05-04 13:20:46 xymonproxy version 4.3.17 starting

I do not see the connections in TIME_WAIT, just the proxy restarting 
every 15 minutes.  Here is the truss output as it falls over :-

poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206937
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206938
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206939
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206940
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206941
poll(0xFFBFF208, 1, 1000)                       = 0
time()                                          = 1399206942
poll(0xFFBFF208, 1, 1000)                       = 1
accept(3, 0x0003AC60, 0xFFBFF310, 1)            = 4
fcntl(4, F_SETFL, 0x00000080)                   = 0
time()                                          = 1399206942
poll(0xFFBFF200, 2, 1000)                       = 1
read(4, " s t a t u s + 4 5   c s".., 8185)     = 140
time()                                          = 1399206942
poll(0xFFBFF200, 2, 1000)                       = 1
read(4, 0x00038CE2, 8045)                       = 0
time()                                          = 1399206942
shutdown(4, 2, 1)                               = 0
close(4)                                        = 0
poll(0xFFBFF208, 1, 1000)                       = 1
accept(3, 0x0003ACD0, 0xFFBFF310, 1)            = 4
fcntl(4, F_SETFL, 0x00000080)                   = 0
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " s e l e c t ( )   f a i".., 34)      = 34
time()                                          = 1399206942
write(2, " 2 0 1 4 - 0 5 - 0 4   1".., 19)      = 19
write(2, "  ", 1)                               = 1
write(2, " T o o   m a n y   s e l".., 35)      = 35
_exit(1)
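
The trace ends with six "select() failed" messages, one "Too many select failures" message and then _exit(1). That pattern is consistent with a loop that counts select() failures and gives up past a limit. The following is a minimal sketch of such a loop, not taken from the xymonproxy source: the function name select_with_limit and the MAX_SELECT_FAILURES value are assumptions chosen to match the messages in the log.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Assumed limit: six failure messages appear in the log before the abort. */
#define MAX_SELECT_FAILURES 5

/*
 * Hypothetical helper: wait for readable descriptors, retrying when
 * select() fails. Returns the select() result on success, or -1 once
 * the failure limit has been exceeded. *failures receives the number
 * of failed calls.
 */
int select_with_limit(int maxfd, const fd_set *want_read,
                      struct timeval tmo, int *failures)
{
    *failures = 0;

    for (;;) {
        fd_set fdread;
        struct timeval t = tmo;   /* fresh copy for every attempt */
        int n;

        if (want_read)
            fdread = *want_read;  /* select() may overwrite the set */
        else
            FD_ZERO(&fdread);

        n = select(maxfd + 1, &fdread, NULL, NULL, &t);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;             /* interrupted by a signal: not counted */

        fprintf(stderr, "select() failed: %s\n", strerror(errno));
        if (++(*failures) > MAX_SELECT_FAILURES) {
            fprintf(stderr, "Too many select failures, aborting\n");
            return -1;
        }
    }
}
```

Because nothing changes between attempts, a persistent bad argument uses up the whole failure budget almost instantly, which would explain why all the failure lines in the log carry the same timestamp.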

So, a question for Gautier: are you using Solaris 9, and have you 
managed to resolve this?

Another question, for the rest of the list: this is actually the only 
proxy I have on Solaris, all the others are on Red Hat.  Is anyone else 
using xymonproxy on Solaris and, if so, what version?  For the time 
being I am running the old bbproxy until I get this fixed; the rest of 
4.3.17 seems to be working OK.

I have done a bit more digging around.  Firstly, if I regress to r#7368 
(4.3.13) then xymonproxy on Solaris is stable.  Of course this only 
hides the problem, and it might be a factor in Gautier's performance 
issue.

If I modify the code for 4.3.17 to remove the exit after 5 select() 
failures and add in some further debugging, I can observe that on 
Solaris 9 at least :-

- every 900 seconds, select() fails
- select() continues to fail for 2 seconds, then succeeds and the proxy 
continues as normal.
- during these 2 seconds, there are no further calls to poll(), but 
somewhere in the region of 50,000 calls to time().
- the values for the selecttmo structure and maxfd are reasonable, so 
the invalid argument must be one of the fdread or fdwrite structures.
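
The kind of extra debugging described above could look roughly like the sketch below. It is not the code that was added to xymonproxy. The function name check_select_args is hypothetical, and the ranges tested are only the generic POSIX ones (a non-negative timeout with tv_usec below one million, maxfd below FD_SETSIZE, every selected descriptor still open). Solaris may apply further limits that are not covered here.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical debugging helper: inspect the arguments about to be
 * passed to select() and report anything suspicious on stderr.
 * Returns the number of problems found (0 means nothing obvious).
 */
int check_select_args(int maxfd, fd_set *fdread, fd_set *fdwrite,
                      const struct timeval *tmo)
{
    int problems = 0;
    int fd;

    if (maxfd < 0 || maxfd >= FD_SETSIZE) {
        fprintf(stderr, "maxfd %d outside 0..%d\n", maxfd, FD_SETSIZE - 1);
        problems++;
    }

    if (tmo && (tmo->tv_sec < 0 || tmo->tv_usec < 0 ||
                tmo->tv_usec >= 1000000)) {
        fprintf(stderr, "timeout out of range: tv_sec=%ld tv_usec=%ld\n",
                (long)tmo->tv_sec, (long)tmo->tv_usec);
        problems++;
    }

    /* Check that every descriptor present in either set is still open. */
    for (fd = 0; fd <= maxfd && fd < FD_SETSIZE; fd++) {
        int wanted = (fdread && FD_ISSET(fd, fdread)) ||
                     (fdwrite && FD_ISSET(fd, fdwrite));

        if (wanted && fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
            fprintf(stderr, "fd %d is in a set but is not open\n", fd);
            problems++;
        }
    }

    return problems;
}
```

It is safest to call such a helper immediately before select(), or on copies of the sets taken before the call, because the contents of the sets and the timeout should not be relied on after select() has returned an error.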

I am continuing to collect information, but I am still not sure whether 
this is a Solaris 9 issue or whether it also affects later Solaris 
versions.
-- 
Andy