Xymon Mailing List Archive

localmode, got over-size message, truncating

Christoph Zechner
Fri, 11 Mar 2022 06:25:14 +0100
Message-Id: <user-489896abd2e3@xymon.invalid>

On 10/03/2022 23:56, Jeremy Laidman wrote:
Honestly, I can't work out how this happened. A review of the code - in
as much as I can understand it, not being a C programmer - shows that
there's only one place the MAXMSG_CLIENT parameter is used, and that's
in xymond. In particular, it's not used in the xymon client (which is
the only process that logs to xymonclient.log).

I also dug through the source code trying to find answers, and since
I'm using local mode on my clients (thus utilising the xymond_client
binary), I think it makes sense (more or less).

I can understand how it could have come about that xymond was loaded
using xymonclient.cfg for its environment, thus applying the smaller
size limit to incoming messages. But if this were the case, I can't work
out how you would have seen MAXMSG_CLIENT=2048 in the running xymond
process's environment.

My MAXMSG_CLIENT=2048 messages were always server-side (thanks to your
env command line showing me the currently used options); I never even saw
that variable on my client, because it never got set. Only after I
manually added it to xymonclient.cfg did it start working as expected.
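
The environment check mentioned above can be repeated for both the server process and the local-mode client process. What follows is only a minimal sketch, assuming a Linux host where /proc/<pid>/environ is readable (usually as root or as the owner of the process). The function name show_maxmsg is made up for illustration, and the pgrep calls in the comments are just one possible way to find the PIDs.

```shell
#!/bin/sh
# show_maxmsg FILE: print every MAXMSG_* variable found in a
# NUL-separated environment dump such as /proc/<pid>/environ.
# (show_maxmsg is a hypothetical helper name, not part of Xymon.)
show_maxmsg() {
    tr '\0' '\n' < "$1" | grep '^MAXMSG_' | sort
}

# Possible usage on Linux (run as root or as the xymon user):
#   show_maxmsg "/proc/$(pgrep -o -x xymond)/environ"
#   show_maxmsg "/proc/$(pgrep -o -x xymond_client)/environ"
```

If the second command prints nothing, the variable is not set for that process and whatever default is compiled in applies, which would match the behaviour described in this thread.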

I think it classifies as a bug, but xymon's localmode is somewhat 
undocumented (the binary for it is missing in the Debian package as 
well, for example...) and in my opinion this should be documented somewhere.

Christoph

So, I'm glad you worked out a solution. But I don't think we quite
understand the cause.

On Thu, 10 Mar 2022 at 22:41, Jeremy Laidman <user-0608abae5e7c@xymon.invalid> wrote:

    Great work Christoph.

    Sorry, it appears that I led you down the wrong path, asserting that
    it was a server-only setting in xymond. It would appear to be a
    client-side setting. This seems to be undocumented in the man page
    for xymonclient.cfg.

    J

    On Thu, 10 Mar 2022 at 21:18, Christoph Zechner <user-249716582ccc@xymon.invalid> wrote:

        I solved it!

        I had to add "MAXMSG_CLIENT=1024" to /etc/xymon/xymonclient.cfg
        and restart xymon-client; after that, all the errors were gone.

        Thanks again for your help!

        Cheers
        Christoph


        On 09/03/2022 06:42, Christoph Zechner wrote:
On 09/03/2022 00:04, Jeremy Laidman wrote:
On Tue, 8 Mar 2022 at 18:52, Christoph Zechner <user-249716582ccc@xymon.invalid> wrote:

    It seems I celebrated prematurely, the errors are back in exactly the
    same way :-/

    2022-03-08 08:47:19.321457 Got over-size message, truncating at 528383 bytes (max: 524288)
    2022-03-08 08:47:19.339786 Dropping (more) garbled data

    I don't understand where this limit of 512 comes from, everything on
    the server checks out (2048 before, tried 4096 as well, no change).
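
As a quick sanity check on the numbers in that log line: the MAXMSG_* settings are given in kB, so the reported maximum of 524288 bytes corresponds to a limit of 512 rather than the configured 2048. The small calculation below only restates the figures from the log; the variable names are illustrative.

```shell
#!/bin/sh
# Figures taken from the log line above.
reported_max=524288   # "max:" value, in bytes
message_size=528383   # size of the over-size message, in bytes

limit_kb=$((reported_max / 1024))            # limit expressed in kB
overshoot=$((message_size - reported_max))   # bytes beyond the limit
expected_max=$((2048 * 1024))                # bytes allowed if 2048 applied

echo "effective limit: ${limit_kb} kB"
echo "overshoot: ${overshoot} bytes"
echo "limit with MAXMSG_CLIENT=2048: ${expected_max} bytes"
```

The message is only about 4 kB over a 512 kB limit, which suggests that whichever process logged the error was running with a value of 512 and never saw the 2048 setting.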


I'm at a loss. If the xymond process is proven to have this set at
2048, then I see no reason why it would give that error message with
that number.

Unless it's referring to another message type and hence a different
maximum setting? Perhaps take a look at xymond's environment again,
but search for all MAXMSG_ variables. See which one is set to 512, and
that might be the culprit. The defaults for these max values are all
different, with only two of them defaulting to 512: MAXMSG_CLIENT and
MAXMSG_CLICHG (reference: lib/xymond_buffer.c). But it's possible one
of them has been set to 512.

Thanks, I tried that, but unfortunately, this did not help, since all
the values were set correctly, according to my config.

The only other thing I can think of is that you have two copies of
xymond running, somehow with different values of MAXMSG_CLIENT. But I
can't think how this could come about. And you've already killed off
any rogue processes.

Right, that's not it either. :-/

Maybe run xymond in debug mode for one round of updates, until you get
the "Got over-size message" and review the debug logs. This might
provide enough additional detail to find out what's going on.

Another approach to solve the problem (truncated client data message)
is to modify the client script (eg xymonclient-linux.sh) to truncate
the ps command output, so that the total message size is less, and
hopefully fits within the max message size. This will mean that PROC
checks might not work anymore (which is likely the case now). But the
current state is that monitoring of the sections that come after [ps]
is likely broken now. On Linux these are notably the [top] and [vmstat]
sections of the client data message, which are used for the "cpu"
status and several metrics for graphing. Maybe something like adding
"head -1000" will cut it down to a reasonable size:

echo "[ps]"
ps -Aww -o pid,ppid,user,start,state,pri,pcpu,time,pmem,rsz,vsz,cmd | head -1000
That's actually a great idea and I modified the [ports] section, because
I know this is the culprit (I'm running a proxy there and all the active
client connections were too much for xymon to handle).

I'm not interested in client connections anyway, I just want to monitor
my running programs and ports on that server, so I replaced the original

netstat -antuW 2>/dev/null
netstat -antuT 2>/dev/null

with

netstat -tulpenW 2>/dev/null

(adding your "| head -1000" suggestion did not work, because it cut off
the list before it could reach the IPv6 interfaces and thus the ports
check was always red).

Now xymon works again, although this is just a workaround, because the
underlying problem of where exactly my messages got truncated is still
to be found, but I can live with this solution.

Anyway, I very much appreciate your time and efforts, thank you very much!

Cheers
Christoph

Also, review the client data message before the [ps] section to see if
there's actually something else pushing it over the limit, and [ps]
just happens to be where the truncation happens.

J
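
One way to carry out that review is to total the bytes in each section of a saved client data message. The sketch below assumes the message has been saved to a file and that each section starts with a line of the form [name], as with the [ps], [top] and [ports] sections mentioned in this thread. The function name section_sizes and the file name in the usage comment are hypothetical.

```shell
#!/bin/sh
# section_sizes [FILE]: print "bytes section" pairs, largest first,
# for a client data message read from FILE or from standard input.
# (section_sizes is a hypothetical helper name, not part of Xymon.)
section_sizes() {
    LC_ALL=C awk '
        BEGIN { name = "(before first section)" }
        /^\[.+\]$/ { name = $0 }               # a new [section] header line
        { bytes[name] += length($0) + 1 }      # +1 accounts for the newline
        END { for (n in bytes) printf "%d %s\n", bytes[n], n }
    ' "$@" | sort -rn
}

# Possible usage, with client-message.txt standing in for a saved copy:
#   section_sizes client-message.txt | head
```

The largest entries at the top of the output show which section is worth trimming, whether that turns out to be [ps], [ports] or something else.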