Xymon Mailing List Archive

Xymonproxy and an unreachable server

Jeremy Laidman
Tue, 27 Mar 2018 23:17:08 +1100
Message-Id: <CACO=ejw+1TNMMy7GS0Ed+Y6SS4N_F49=user-21ac5f8c3bd7@xymon.invalid>

Wim

I suspect the solution is to find and fix the cause of the buffer overflow.
Is there a coredump from which you can get a backtrace?

Another fix might be to put msgcache/xymonfetch into the mix. The msgcache
process queues up messages and delivers them when it can.
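The queue-and-deliver idea can be illustrated with a small Python sketch. It is not how msgcache is implemented (there, xymonfetch on the server picks the cached messages up); the class `StoreAndForward`, its `deliver` callback and the `maxlen` bound are hypothetical, chosen only to show messages being held while the server is unreachable and sent in order once it answers again.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Hypothetical store-and-forward buffer (not msgcache's real code).

    Messages are queued locally; flush() delivers them in order and stops
    at the first failure, keeping the rest for a later attempt.
    """

    def __init__(self, deliver: Callable[[str], bool], maxlen: int = 1000) -> None:
        self._deliver = deliver  # returns True if the server accepted the message
        # Bounded queue: when full, the oldest message is silently dropped.
        self._queue: Deque[str] = deque(maxlen=maxlen)

    def submit(self, message: str) -> None:
        """Queue a message, then try to drain the queue."""
        self._queue.append(message)
        self.flush()

    def flush(self) -> int:
        """Deliver queued messages in order; return how many were sent."""
        sent = 0
        while self._queue:
            if not self._deliver(self._queue[0]):
                break  # server unreachable: keep the message and retry later
            self._queue.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of messages still waiting for delivery."""
        return len(self._queue)
```

The bounded queue is a design choice for a small device such as an RPi0: during a long outage the oldest messages are discarded rather than memory growing without limit.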

Cheers
Jeremy


On 23 February 2018 at 02:31, Wim Nelis <user-5c5b902249f8@xymon.invalid> wrote:
On a Raspberry Pi Zero W, xymon-client is running to monitor some sensors
and the RPi0 itself. As the xymon server is not reachable at times,
a minimal xymon-server is also running on the RPi0. Xymonproxy is used to
distribute the status and data messages to both the local xymon server and
the primary xymon server. The intention of this setup is to keep a complete
set of RRDs locally on the RPi0. If needed, the RRDs can be copied from
the RPi0 to the primary xymon server.

The local xymon server listens on port 1985 and xymonproxy on port
1984. The latter distributes the messages to two servers, using the parameter
"--server=127.0.0.1:1985,192.168.178.72:1984". This setup works, but
the graphs created from the local RRDs contain gaps during the periods when
the primary xymon server is not reachable. The logfiles of the clients
running on the RPi0 contain messages like the following, about twice an
hour:

2018-02-21 06:10:01.664965 Whoops ! Failed to send message (Connection failed)
2018-02-21 06:10:01.665673 ->  Could not connect to Xymon daemon at 127.0.0.1:1984 (Connection refused)
2018-02-21 06:10:01.665767 ->  Recipient '127.0.0.1', timeout 15
2018-02-21 06:10:01.665851 ->  1st line: 'status rpi00.mve green Wed 2018.02.21 06:10:01'

This does explain the gaps. The logfile of xymonproxy shows that the proxy
is restarted a dozen times per hour:

2018-02-21 05:55:38.272757 xymonproxy version 4.3.28 starting
2018-02-21 05:55:38.273605 Listening on 0.0.0.0:1984
2018-02-21 05:55:38.273751 Sending to Xymon server(s) 127.0.0.1:1985 192.168.178.72:1984
2018-02-21 05:56:05.304985 Server not responding, message lost
2018-02-21 06:00:30.195973 Server not responding, message lost
2018-02-21 06:00:36.221908 Server not responding, message lost
2018-02-21 06:00:41.231668 Server not responding, message lost
2018-02-21 06:00:41.236076 Server not responding, message lost
*** buffer overflow detected ***: /usr/lib/xymon/server/bin/xymonproxy terminated
2018-02-21 06:00:42.269357 xymonproxy version 4.3.28 starting
2018-02-21 06:00:42.270200 Listening on 0.0.0.0:1984
2018-02-21 06:00:42.270346 Sending to Xymon server(s) 127.0.0.1:1985 192.168.178.72:1984
2018-02-21 06:01:09.301618 Server not responding, message lost
2018-02-21 06:05:29.188224 Server not responding, message lost
2018-02-21 06:05:40.201194 Server not responding, message lost
2018-02-21 06:05:45.208531 Server not responding, message lost
2018-02-21 06:05:45.208936 Server not responding, message lost
2018-02-21 06:05:45.209058 Server not responding, message lost
*** buffer overflow detected ***: /usr/lib/xymon/server/bin/xymonproxy terminated
2018-02-21 06:10:45.237707 xymonproxy version 4.3.28 starting
2018-02-21 06:10:45.239061 Listening on 0.0.0.0:1984
2018-02-21 06:10:45.239219 Sending to Xymon server(s) 127.0.0.1:1985 192.168.178.72:1984
2018-02-21 06:11:11.272425 Server not responding, message lost
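To check the "dozen times per hour" figure against a longer log, restarts and lost messages can be counted with a short Python sketch. The `summarize` helper is hypothetical. It only recognises the line patterns visible in the excerpt above (the `xymonproxy version` start line, "message lost", and glibc's untimestamped "*** buffer overflow detected ***" abort line) and assumes one log entry per line, so entries wrapped by a mail client need rejoining first.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, List

# Timestamp prefix as it appears in the xymonproxy log excerpt.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (.*)$")


def summarize(lines: Iterable[str]) -> Dict[str, object]:
    """Count proxy starts, overflow aborts and lost messages (hypothetical helper)."""
    starts: List[datetime] = []
    overflows = 0
    lost = 0
    for line in lines:
        if "buffer overflow detected" in line:
            overflows += 1  # the abort line carries no timestamp
            continue
        m = TS.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        text = m.group(2)
        if text.startswith("xymonproxy version") and text.endswith("starting"):
            starts.append(ts)
        elif "message lost" in text:
            lost += 1
    # Seconds between consecutive (re)starts of the proxy.
    gaps = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
    return {"starts": len(starts), "overflows": overflows,
            "lost": lost, "restart_gaps_s": gaps}
```

Run over the excerpt above (for example with a hypothetical `with open("xymonproxy.log") as f: print(summarize(f))`), it should report three starts, two overflow aborts, 12 lost messages and restart gaps of roughly 5 and 10 minutes.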

I have been playing with the queue length, but to no avail. Is it possible
to have xymonproxy not terminate every 5 minutes, but instead just report
that it cannot send a message to a particular server?
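The behaviour asked for here (report a failed delivery to one server, keep serving the others) can be sketched in Python. This is not xymonproxy's code and does not address the buffer overflow itself. `parse_servers` and `fan_out` are hypothetical helpers, and the wire protocol is simplified to "connect, send the message text, close the write side", which is an assumption about how a plain status message is delivered; the 15-second default mirrors the timeout shown in the client log.

```python
import socket
from typing import Dict, List, Tuple

Server = Tuple[str, int]


def parse_servers(spec: str) -> List[Server]:
    """Parse a list such as '127.0.0.1:1985,192.168.178.72:1984'."""
    servers: List[Server] = []
    for item in spec.split(","):
        host, _, port = item.strip().rpartition(":")
        servers.append((host, int(port)))
    return servers


def fan_out(message: bytes, servers: List[Server],
            timeout: float = 15.0) -> Dict[Server, str]:
    """Send a message to every server independently (hypothetical sketch).

    A failure on one server is recorded and the loop moves on, so an
    unreachable server never stops delivery to the others.
    """
    results: Dict[Server, str] = {}
    for host, port in servers:
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(message)
                sock.shutdown(socket.SHUT_WR)  # assumed: end of message = close write side
            results[(host, port)] = "ok"
        except OSError as exc:  # refused, unreachable, timed out, ...
            results[(host, port)] = f"failed: {exc}"
    return results
```

The design choice is that each server gets its own `try`/`except`, so an unreachable 192.168.178.72:1984 shows up as a per-server failure instead of interrupting delivery to 127.0.0.1:1985.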

Regards,
  Wim Nelis.