On 09/03/2022 00:04, Jeremy Laidman wrote:
On Tue, 8 Mar 2022 at 18:52, Christoph Zechner <user-249716582ccc@xymon.invalid> wrote:
It seems I celebrated prematurely, the errors are back in exactly the
same way :-/
2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383 bytes (max: 524288)
2022-03-08 08:47:19.339786 Dropping (more) garbled data
I don't understand where this limit of 512 comes from, everything on
the server checks out (2048 before, tried 4096 as well, no change).
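For reference, the numbers in the log line can be checked with a little shell arithmetic. This is only a sketch, and it assumes that the MAXMSG_* settings are expressed in kB; the helper name kb_to_bytes is made up for illustration.

```shell
#!/bin/sh
# Convert a MAXMSG_* value (assumed here to be in kB) to a byte limit.
kb_to_bytes() {
    echo $(( $1 * 1024 ))
}

kb_to_bytes 512                 # 524288, the "max" reported in the log line
kb_to_bytes 2048                # 2097152, the limit expected from a setting of 2048
echo $(( 528383 - 524288 ))     # 4095, how far the reported size exceeds the max
```

Under that assumption, the "max: 524288" in the log corresponds to a setting of 512 rather than 2048 or 4096.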
I'm at a loss. If the xymond process is proven to have this set at
2048, then I see no reason why it would give that error message with
that number.
Unless it's referring to another message type and hence a different
maximum setting? Perhaps take a look at xymond's environment again,
but search for all MAXMSG_ variables. See which one is set to 512;
that might be the culprit. The defaults for these max values are all
different, with only two of them defaulting to 512: MAXMSG_CLIENT and
MAXMSG_CLICHG (reference: lib/xymond_buffer.c). But it's possible one
of them has been set to 512.
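One way to do that search on Linux is to read the environment of the running process from /proc. The following is a minimal sketch, not something taken from the Xymon documentation; the function name show_maxmsg is hypothetical, and reading /proc/<pid>/environ needs the same user as xymond or root.

```shell
#!/bin/sh
# Print the MAXMSG_* variables found in a NUL-separated environment dump,
# such as /proc/<pid>/environ on Linux.
show_maxmsg() {
    tr '\0' '\n' < "$1" | grep '^MAXMSG_' | sort
}

# Usage (Linux only), one block per running xymond process:
# for pid in $(pgrep -x xymond); do
#     echo "== xymond pid $pid"
#     show_maxmsg "/proc/$pid/environ"
# done
```

Looping over every PID also shows whether more than one xymond is running and whether the copies disagree on any value.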
Thanks, I tried that, but unfortunately this did not help, since all
the values were set correctly, according to my config.
The only other thing I can think of is that you have two copies of
xymond running, somehow with different values of MAXMSG_CLIENT. I
can't think how this could come about. And you've already killed off
any rogue processes.
Right, that's not it either. :-/
Maybe run xymond in debug mode for one round of updates, until you get
the "Got over-size message" and review the debug logs. This might
provide enough additional detail to find out what's going on.
Another approach to solve the problem (truncated client data message)
is to modify the client script (eg xymonclient-linux.sh) to truncate
the ps command output, so that the total message size is less, and
hopefully fits within the max message size. This will mean that the
sections of the client data message that are used for the "cpu"
status and several metrics for graphing are no longer truncated. Maybe
something like adding "head -1000" will cut it down to a reasonable
size:
    echo "[ps]"
    ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000
That's actually a great idea and I modified the [ports] section, because
I know this is the culprit (I'm running a proxy there, and all the active
client connections were too much for xymon to handle).
I'm not interested in client connections anyway, I just want to monitor
my running programs and ports on that server, so I replaced the original
netstat -antuW 2>/dev/null
netstat -antuT 2>/dev/null
with
netstat -tulpenW 2>/dev/null
(adding your "| head -1000" suggestion did not work, because it cut off
the list before it could reach the IPv6 interfaces and thus the ports
check was always red).
Now xymon works again, although this is just a workaround, because the
underlying problem of where exactly my messages get truncated is still
to be found, but I can live with this solution.
Anyway, I very much appreciate your time and efforts, thank you very much!
Cheers
Christoph
Also, review the client data message before the [ps] section to see if
there's actually something else pushing it over the limit, and [ps]
just happens to be where the truncation happens.
J
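A rough way to do the review suggested above is to add up the size of each [section] in a saved copy of the client data message. This is only a sketch: the function name section_sizes and the file path in the usage line are hypothetical, and it assumes each section starts with a line consisting solely of a bracketed name such as [ps] or [ports].

```shell
#!/bin/sh
# Report the byte count of each [section] in a client data message,
# largest first. Lines before the first section header are counted
# under "(header)". LC_ALL=C makes awk count bytes, not characters.
section_sizes() {
    LC_ALL=C awk '
        BEGIN { name = "(header)" }
        /^\[[^]]*\]$/ { name = $0 }          # a new section starts here
        { size[name] += length($0) + 1 }     # +1 for the newline
        END { for (s in size) printf "%d %s\n", size[s], s }
    ' "$1" | sort -rn
}

# Usage, with a hypothetical path to a saved client message:
# section_sizes /tmp/clientmsg.txt | head
```

The largest entries at the top of the output are the sections worth trimming first.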