Xymon Mailing List Archive

Bug in xymonping reporting wrong data when pinging multiple hosts

Michael Beatty
Wed, 09 Jan 2013 07:36:31 -0500
Message-Id: <user-23d1883dae17@xymon.invalid>

I installed fping as well, and it is working fine.  For whatever reason
it installed with rwxr-xr-x permissions, so I needed to chmod u+s for it
to work.
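
For anyone hitting the same thing: fping sends ICMP through a raw socket, which normally requires root, so a root-owned binary without the setuid bit fails when run as the xymon user. A minimal sketch for checking the mode bits before reaching for chmod u+s; the path in the final comment is only an assumed example, not necessarily where fping lives on your system.

```python
import os
import stat


def lacks_setuid(mode):
    """True if a file mode (as found in st_mode) has no setuid bit."""
    return not (mode & stat.S_ISUID)


def report(path):
    """Print the permission string of a binary and whether setuid is set."""
    st = os.stat(path)
    state = "setuid missing" if lacks_setuid(st.st_mode) else "setuid set"
    print(path, stat.filemode(st.st_mode), "owner uid", st.st_uid, state)


# The same regular file before and after "chmod u+s":
print(stat.filemode(0o100755))  # -rwxr-xr-x
print(stat.filemode(0o104755))  # -rwsr-xr-x

# Hypothetical usage; adjust the path to your installation:
# report("/usr/sbin/fping")
```

If report() shows the setuid bit missing on a root-owned fping, that matches the symptom described above.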

In some more testing, xymonping does correctly report hosts that have 
failed.  So it is still effective for alerting purposes, but from a 
reporting standpoint, not so much.

Michael Beatty

On 01/08/2013 06:57 PM, Jeremy Laidman wrote:
Of course the solution is to use fping.  Henrik has previously stated 
that fping is preferred over xymonping 
<http://lists.xymon.com/archive/2012-January/033738.html>, and the 
fping.sh script used when building will warn that "it is not yet fully 
stable".

I've just now installed fping (and configured Xymon to use it) and my 
graphs are now showing much more reasonable values than before.

Nevertheless, it's not obvious in any documentation that xymonping 
will give bad data.  The caveats on its use suggest (to me) that it 
can miss some replies when large numbers of hosts are probed, but in 
practice it gives bad data even when the number of hosts is two.

J


On 9 January 2013 10:39, Jeremy Laidman <user-71895fb2e44c@xymon.invalid 
<mailto:user-71895fb2e44c@xymon.invalid>> wrote:

    Yup, I get this too, tested with v4.3.10 and v4.3.4.  It also
    shows up when I ping the localhost address repeatedly:

    sudo ./xymon-4.3.4/xymonnet/xymonping 127.0.0.1 127.0.0.1
    127.0.0.1 127.0.0.1 127.0.0.1
    127.0.0.1 is alive (20 ms)
    127.0.0.1 is alive (0.02 ms)
    127.0.0.1 is alive (24 ms)
    127.0.0.1 is alive (0.02 ms)
    127.0.0.1 is alive (0.02 ms)

    The 20ms and 24ms entries are wrong, and they change as I adjust
    the max-pps values, by a factor of 5.

    None of my conn graphs seems to be completely flatlined, but I
    have noticed that DNS test times are usually less than conn test
    times, which is a bit odd, but might be unrelated.  Hmm, now that
    I look at them, it seems all of my graphs but one are hovering
    close to either 24ms or 48ms.  The host that is the exception,
    with a conn graph that looks correct, happens to be the last entry
    if I sort all host IP addresses.

    J


    On 9 January 2013 05:49, Michael Beatty
    <user-4aea7c115850@xymon.invalid <mailto:user-4aea7c115850@xymon.invalid>>
    wrote:

        Using Xymon 4.3.7
        OS Linux SuSE

        I've been struggling to understand why certain hosts are
        almost always reporting the exact same ping response time.
        I've determined that xymonping isn't working correctly: it is
        reporting incorrect data for half of the hosts tested.

        I start by pinging 6 hosts, one at a time; everything is correct:
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.22
        X.X.X.22 is alive (0.06 ms)
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.70
        X.X.X.70 is alive (0.56 ms)
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.138
        X.X.X.138 is alive (826 ms)
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.137
        X.X.X.137 is alive (980 ms)
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.201
        X.X.X.201 is alive (0.75 ms)
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.202
        X.X.X.202 is alive (0.66 ms)

        Then, put them in the same command: the first, second, and
        fifth values are wrong.
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.70
        X.X.X.22 X.X.X.138 X.X.X.137 X.X.X.201 X.X.X.202
        X.X.X.70 is alive (40 ms)
        X.X.X.22 is alive (20 ms)
        X.X.X.138 is alive (1307 ms)
        X.X.X.137 is alive (1738 ms)
        X.X.X.201 is alive (20 ms)
        X.X.X.202 is alive (0.64 ms)
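
The comparison between the one-at-a-time results and the combined run can be automated. The following is a rough sketch that parses the "is alive" lines quoted above and flags hosts whose combined-run time is far larger than their individual time. The 10x threshold is an arbitrary assumption chosen for illustration, and the function names are made up here; nothing in it comes from xymonping itself.

```python
import re

# Matches lines such as "X.X.X.22 is alive (0.06 ms)".
LINE_RE = re.compile(r"^(\S+) is alive \(([\d.]+) ms\)$")


def parse(output):
    """Map each host to its reported round-trip time in milliseconds."""
    result = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            result[m.group(1)] = float(m.group(2))
    return result


# Results from pinging each host on its own.
single = parse("""
X.X.X.22 is alive (0.06 ms)
X.X.X.70 is alive (0.56 ms)
X.X.X.138 is alive (826 ms)
X.X.X.137 is alive (980 ms)
X.X.X.201 is alive (0.75 ms)
X.X.X.202 is alive (0.66 ms)
""")

# Results from the first combined run.
batch = parse("""
X.X.X.70 is alive (40 ms)
X.X.X.22 is alive (20 ms)
X.X.X.138 is alive (1307 ms)
X.X.X.137 is alive (1738 ms)
X.X.X.201 is alive (20 ms)
X.X.X.202 is alive (0.64 ms)
""")


def inflated(single, batch, ratio=10.0):
    """Hosts whose combined-run time exceeds ratio x their individual time."""
    return [h for h, rtt in batch.items()
            if h in single and rtt > ratio * single[h]]


print(inflated(single, batch))
```

With the figures above this flags X.X.X.70, X.X.X.22 and X.X.X.201, i.e. the first, second, and fifth entries of the combined run.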


        Switch the order of the pings: the first, second, and fifth
        values are exactly the same as the first time, and still wrong.
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.201
        X.X.X.202 X.X.X.137 X.X.X.138 X.X.X.70 X.X.X.22
        X.X.X.201 is alive (40 ms)
        X.X.X.202 is alive (20 ms)
        X.X.X.137 is alive (1598 ms)
        X.X.X.138 is alive (2069 ms)
        X.X.X.70 is alive (20 ms)
        X.X.X.22 is alive (0.04 ms)
        [xymon at mxbscs tmp]$

        Switch the order again; now the third, fourth, and fifth
        values are wrong.
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping
        X.X.X.137 X.X.X.138 X.X.X.201 X.X.X.202 X.X.X.70 X.X.X.22
        X.X.X.137 is alive (1537 ms)
        X.X.X.138 is alive (2016 ms)
        X.X.X.201 is alive (40 ms)
        X.X.X.202 is alive (20 ms)
        X.X.X.70 is alive (20 ms)
        X.X.X.22 is alive (0.06 ms)


        Another thing I have noticed is that by altering the max-pps
        value, you get completely different results.
        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.137
        X.X.X.138 X.X.X.201 X.X.X.202 X.X.X.70 X.X.X.22 --max-pps=1
        X.X.X.137 is alive (2000 ms)
        X.X.X.138 is alive (1000 ms)
        X.X.X.201 is alive (2000 ms)
        X.X.X.202 is alive (1000 ms)
        X.X.X.70 is alive (1000 ms)
        X.X.X.22 is alive (0.06 ms)

        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.137
        X.X.X.138 X.X.X.201 X.X.X.202 X.X.X.70 X.X.X.22 --max-pps=5
        X.X.X.137 is alive (1500 ms)
        X.X.X.138 is alive (1479 ms)
        X.X.X.201 is alive (400 ms)
        X.X.X.202 is alive (200 ms)
        X.X.X.70 is alive (200 ms)
        X.X.X.22 is alive (0.06 ms)

        [xymon at mxbscs tmp]$ /home/xymon/server/bin/xymonping X.X.X.137
        X.X.X.138 X.X.X.201 X.X.X.202 X.X.X.70 X.X.X.22 --max-pps=25
        X.X.X.137 is alive (765 ms)
        X.X.X.138 is alive (896 ms)
        X.X.X.201 is alive (80 ms)
        X.X.X.202 is alive (40 ms)
        X.X.X.70 is alive (40 ms)
        X.X.X.22 is alive (0.04 ms)
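
One pattern in the three --max-pps runs above: the suspicious values are whole multiples of 1000/max-pps milliseconds, which is the gap between outgoing packets at that rate. The sketch below checks this using only the figures quoted above. Which values count as "suspect" is my own reading of the output, and the dictionary and function names are made up for the illustration.

```python
# Suspect values (ms) from the runs above, keyed by --max-pps.
# For max-pps=5 and 25 only the .201/.202/.70 entries are listed; for
# max-pps=1 the two slow hosts also reported round values and are included.
suspect_ms = {
    1: [2000, 1000, 2000, 1000, 1000],
    5: [400, 200, 200],
    25: [80, 40, 40],
}


def interval_ms(max_pps):
    """Gap between packets, in milliseconds, at a given packets-per-second rate."""
    return 1000 / max_pps


def multiples(values, max_pps):
    """Express each reported time as a multiple of the packet interval."""
    step = interval_ms(max_pps)
    return [v / step for v in values]


for pps, values in suspect_ms.items():
    print(pps, interval_ms(pps), multiples(values, pps))
```

Every suspect value comes out as exactly 1 or 2 intervals. That would be consistent with the reported time including the delay spent pacing outgoing packets rather than just the round trip, though this is an inference from the numbers and not something confirmed from the xymonping source. By the same reasoning, the 20 ms and 40 ms values in the earlier runs without --max-pps would correspond to a 20 ms interval, i.e. 50 packets per second.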


        It doesn't appear to be a problem with my configuration. I
        checked the www.xymon.com <http://www.xymon.com> demo site,
        and the same issue seems to be present there. The signature of
        the bad data is easy to see in the graphs: good data produces a
        varied line, whereas bad data produces a generally flat line.
        These hosts look good:
        http://www.xymon.com/xymon-cgi/svcstatus.sh?HOST=pto.linuxbog.dk&SERVICE=conn
        http://www.xymon.com/xymon-cgi/svcstatus.sh?HOST=dali.hswn.dk&SERVICE=conn

        These hosts look bad:
        http://www.xymon.com/xymon-cgi/svcstatus.sh?HOST=blixen.hswn.dk&SERVICE=conn
        http://www.xymon.com/xymon-cgi/svcstatus.sh?HOST=wifi.hswn.dk&SERVICE=conn


        -- 
        Michael Beatty