Xymon Mailing List Archive

ports/procs flapping

3 messages in this thread

list Charles Slater · Thu, 17 Nov 2016 20:09:04 +0000 (UTC) ·
Hello,
I am working on a problem where alarms keep randomly going off on a server.  I have taken some screenshots to show you what I am seeing as well.  I'll start with the ports issue first.  Just let me know if you need more information and I can provide it.  I am relatively new to working with Xymon, so any help that you can provide would be beneficial.
Thank you in advance!
Charles
list Torsten Richter · Thu, 17 Nov 2016 22:36:36 +0100 ·
Hi Charles,

In the Xymon web interface, most tests have a link called
"Client data" below the status.
If you click on it you'll see text output in your browser
that starts with something like "[collector:]".
Scroll down to the end of the file and check whether the last entry
is a "[clock]" section with "epoch", "local" and "UTC".

If it is not, the data sent from the client to the server may have
been truncated, and you might see a yellow alert for xymond.
You may also find something about it in your Xymon server logs.

I had similar problems with some hosts where a lot of connections
were in state "ESTABLISHED" or "TIME_WAIT". They filled up the
data file, and on the server side one of the MAXMSG* parameters was
set too low. You'll have to raise that parameter and restart the Xymon
server component.

HTH
Torsten

-- 
E-mail  : user-c862b499d9fa@xymon.invalid
Homepage: http://www.richter-it.net/
list Japheth Cleaver · Fri, 18 Nov 2016 10:06:16 -0800 ·
Data truncation could definitely be a cause here. An unfortunate artifact of the original TCP protocol is that a submitted message has no end delimiter, so we can't tell for sure whether we got the entire payload. (A delimiter is present in the STDIN stream from xymond_channel to the workers, but by that point the original transmission has long since concluded.) This is fixed in the "V5" protocol (former trunk) by adding a payload size prefix, but that isn't backwards compatible with anything not expecting it. It's also almost certainly fixed if you're using any compression, because size details are used as part of decompression validation.
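
To illustrate why a size prefix helps, here is a minimal Python sketch. It is not the actual V5 wire format: the 4-byte big-endian prefix and the names frame/unframe are assumptions made up for this example. The point is only that a receiver who knows the announced size can notice when fewer bytes arrived.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length (4-byte big-endian; an assumed layout)."""
    return struct.pack(">I", len(payload)) + payload

def unframe(data: bytes) -> bytes:
    """Return the payload, or raise ValueError if less arrived than was announced."""
    if len(data) < 4:
        raise ValueError("truncated: size prefix incomplete")
    (expected,) = struct.unpack(">I", data[:4])
    payload = data[4:]
    if len(payload) < expected:
        raise ValueError(
            f"truncated: expected {expected} bytes, got {len(payload)}"
        )
    return payload[:expected]

# Sample payload, invented for this example
message = b"[collector:]\n[ports]\n...\n[clock]\nepoch: 1479413344\n"
wire = frame(message)

# A complete transmission round-trips cleanly
assert unframe(wire) == message

# Simulate a connection that dropped before the last 10 bytes arrived
try:
    unframe(wire[:-10])
except ValueError as err:
    print("rejected:", err)
```

Without the prefix, the receiver in the second case would simply see a shorter message and have no way to know anything was missing.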

There are a number of TCP and connection kernel parameters that can be tweaked in sysctl.conf to help reduce TCP issues here -- especially important if you're running a busy xymonnet server -- but an actual overloaded router or flaky cable (or VPN connection) will still bite you.

One workaround -- if you can spare the CPU capacity and message overhead on your xymon server -- is to add '--filter='\\[clock\\]'' to anything listening on the client channel in tasks.cfg. For example:

     CMD xymond_channel --filter='\\[clock\\]' --channel=client xymond_client --uptime-status

This will reject any incoming client message that got truncated before the [clock] section at the very end, so a partial message won't get processed into individual status messages based on missing data. It's not the best solution, but it will at least prevent status flaps.
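
The idea behind the filter can be sketched in a few lines of Python. This does not reproduce how xymond_channel applies its filter internally; it only shows the same regular expression being used to tell a complete client message from a truncated one. The function name and the sample message contents are made up for illustration.

```python
import re

# Same pattern as in the tasks.cfg example: a literal "[clock]"
CLOCK_SECTION = re.compile(r"\[clock\]")

def looks_complete(client_message: str) -> bool:
    """True if the message contains a [clock] section header."""
    return CLOCK_SECTION.search(client_message) is not None

# Invented sample data; a real client message has many more sections
complete = (
    "[collector:]\n"
    "[ports]\n"
    "tcp 0 0 10.0.0.5:22 10.0.0.9:51514 ESTABLISHED\n"
    "[clock]\n"
    "epoch: 1479413344\n"
)
# Simulate a message cut off partway through, before [clock] arrived
truncated = complete[: complete.index("[clock]")]

for name, msg in (("complete", complete), ("truncated", truncated)):
    verdict = "process" if looks_complete(msg) else "drop"
    print(f"{name}: {verdict}")
```

Note that this check only catches messages cut off before the [clock] header; one truncated inside the [clock] section itself would still pass.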

And yes, check your xymond.log for any warnings about messages truncated due to size. It's always good to give yourself a lot of extra room in the client message if you have servers reporting in that receive bursts of network or process activity, where either netstat or ps could end up thousands of lines longer than normal.
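
If you want to see which section is eating the space, one option is to save the "Client data" text to a file and measure it per section. The Python sketch below assumes sections start with a "[name]" line, as in the "[collector:]" and "[clock]" examples earlier in the thread. The function names and LIMIT_BYTES value are hypothetical; compare against whatever limit your own server is configured with.

```python
import re

# A section header is a line consisting only of "[name]"
SECTION_HEADER = re.compile(r"^\[([^\]]+)\]\s*$")

# Hypothetical limit for illustration only; use your server's actual setting
LIMIT_BYTES = 512 * 1024

def section_sizes(client_message: str) -> dict:
    """Map each section name to the bytes it occupies (header line included)."""
    sizes = {}
    current = None
    for line in client_message.splitlines(keepends=True):
        match = SECTION_HEADER.match(line)
        if match:
            current = match.group(1)
            sizes.setdefault(current, 0)
        if current is not None:
            sizes[current] += len(line.encode("utf-8"))
    return sizes

def report(client_message: str, top: int = 5) -> None:
    """Print the largest sections and how much of the limit is used."""
    sizes = section_sizes(client_message)
    total = len(client_message.encode("utf-8"))
    print(f"total: {total} bytes ({100 * total / LIMIT_BYTES:.1f}% of limit)")
    largest = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
    for name, size in largest:
        print(f"  [{name}]: {size} bytes")

# Invented sample: a [ports] section bloated by many connections
sample = (
    "[ports]\n"
    + "tcp 0 0 10.0.0.5:80 10.0.0.9:51514 TIME_WAIT\n" * 1000
    + "[clock]\nepoch: 1479413344\n"
)
report(sample)
```

On a host with a burst of connections you would expect the netstat-derived section to dominate, which tells you how much headroom the limit needs.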


HTH,
-jc