Xymon Mailing List Archive

WAN performance/monitoring

Adam Goryachev
Thu, 05 Jun 2014 10:29:19 +1000
Message-Id: <user-2f6c9f910871@xymon.invalid>

Hi All,

First some background, then some scripts I've written and used, and finally a request for some advice, please.

Some time ago I was having a LAN issue (dropped packets), so I wrote a small script to measure and quantify the problem. (If you can't see the problem, you can't fix it, and you can't prove it is fixed afterwards.)

All the script did was use fping to ping a group of IPs once per second; every minute it logged the date/time plus one line for each IP that had dropped one or more packets. This worked nicely for the above purpose, allowing me to easily pinpoint the machines that shared the problem and eventually solve it.

Now I'd like to extend the script to cover my WAN connections, but I also need more information, and don't want to re-invent the wheel. So, I'm looking for suggestions on how to implement what I need, and/or other products that already do this.

Specifically, I now want to record at least the following data into RRDs for later viewing:
1) Maximum ping time per minute
2) Average ping time per minute
3) Minimum ping time per minute
4) Packet loss per minute
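
The four values listed above could be derived from a single line of "fping -C 60 -q" output. What follows is only a minimal sketch, not part of the existing script: the function name summarize_fping and the output layout (host, min, avg, max, loss percent) are assumptions made for illustration, and it assumes fping's per-host line format "host : t1 t2 - t3", where "-" marks a lost probe.

```shell
#!/bin/sh
# Summarize "fping -C N -q" result lines read from stdin.
# Expected input format (one line per host):
#   x.x.x.10 : 0.52 0.48 - 0.61
# Output: host min avg max loss_percent
# Times are in ms; min/avg/max are printed as "U" when every probe was lost.
summarize_fping() {
    LC_ALL=C awk '
    $2 == ":" {
        n = 0; lost = 0; sum = 0; min = 0; max = 0
        for (i = 3; i <= NF; i++) {
            if ($i == "-") { lost++; continue }   # "-" means the probe was lost
            t = $i + 0
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            sum += t
            n++
        }
        total = n + lost
        if (total == 0) next                      # no probes on this line
        loss = 100 * lost / total
        if (n == 0)
            printf "%s U U U %.1f\n", $1, loss
        else
            printf "%s %.2f %.2f %.2f %.1f\n", $1, min, sum / n, max, loss
    }'
}

# Example use (fping writes the -C results to stderr):
#   fping -C 60 -q x.x.x.10 2>&1 | summarize_fping
```

With this layout, a line with three replies of 1.00, 3.00 and 2.00 ms and one lost probe is reported as min 1.00, avg 2.00, max 3.00 and 25.0% loss.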

The first three could be handled by having my script calculate the values and record all three each minute, or I could record 60 values per minute and let RRD do the calculation. One thing that obviously happens is drift: my script's processing time takes a fraction of a second, so I won't really get a value for every single second. That is probably overkill anyway; if I can get a value for 99% of seconds, I should get a clear picture of my links, their performance, and any issues.
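
To make the "let RRD do the calculation" option concrete, here is a rough sketch under stated assumptions. It only prints an rrdtool command line rather than running it. The function names, the data source names (rtt, lost), the 5-second heartbeat and the 10080-row retention (one week of minutes) are all placeholders chosen for illustration. The idea is a 1-second step with MIN, AVERAGE and MAX archives consolidated over 60 steps; a lost probe is stored as an unknown rtt and lost=1, so the per-minute AVERAGE of "lost" is the loss fraction.

```shell
#!/bin/sh
# Print (do not run) an rrdtool create command for per-second samples
# that are consolidated into per-minute MIN/AVERAGE/MAX values.
# $1 = path of the RRD file (hypothetical, choose your own).
rrd_create_cmd() {
    printf 'rrdtool create %s --step 1' "$1"
    # rtt: round-trip time in ms; lost: 1 if the probe was lost, else 0.
    printf ' DS:rtt:GAUGE:5:0:U DS:lost:GAUGE:5:0:1'
    # 60 one-second steps per row, 10080 rows = one week of minutes.
    printf ' RRA:MIN:0.5:60:10080 RRA:AVERAGE:0.5:60:10080 RRA:MAX:0.5:60:10080\n'
}

# Turn one fping sample ("0.52", or "-" for a lost probe) into the
# argument for "rrdtool update FILE ...", using "N" for the current time.
rrd_update_arg() {
    if [ "$1" = "-" ]; then
        printf 'N:U:1\n'
    else
        printf 'N:%s:0\n' "$1"
    fi
}

# Example use:
#   rrd_create_cmd /var/lib/pingmon/x.x.x.10.rrd      # review, then run it
#   rrdtool update /var/lib/pingmon/x.x.x.10.rrd "$(rrd_update_arg 0.52)"
```

The trade-off is the one described above: this keeps the script trivial, but it needs one update per host per second, whereas calculating the values in the script needs only one update per minute.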

The second part of this question: what values do you use as Xymon alarm thresholds for the four metrics above? What is acceptable, what is marginal, and what is downright awful? In my case the connections are used for RDP (Windows Remote Desktop).

BTW, the script doesn't currently integrate with Xymon, which is still doing its own standard network ping monitoring. This script is obviously a lot more intense/detailed, and I'd like to integrate the results (to get alerting/history/etc).
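
One way the integration might look is to have the script build a Xymon status message and hand it to the xymon client command. This is only a sketch: the function names and the test name "wanping" are hypothetical, and the warning and critical loss percentages are arguments the caller must supply, since suitable values are exactly the open question here. It assumes the usual Xymon convention that dots in the hostname are written as commas in a status message.

```shell
#!/bin/sh
# pick_color LOSS_PCT WARN_PCT CRIT_PCT
# Prints green, yellow or red. The thresholds are supplied by the caller.
pick_color() {
    LC_ALL=C awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
        l += 0; w += 0; c += 0                 # force numeric comparison
        if (l >= c)      print "red"
        else if (l >= w) print "yellow"
        else             print "green"
    }'
}

# xymon_status_msg HOST TEST COLOR TEXT
# Prints a status message: "status host,with,commas.test color date"
# on the first line, followed by the free-form text.
xymon_status_msg() {
    xs_host=$(printf '%s' "$1" | tr . ,)       # Xymon wants commas, not dots
    printf 'status %s.%s %s %s\n%s\n' "$xs_host" "$2" "$3" "$(date)" "$4"
}

# Example use, with XYMON and XYMSRV set as they are for Xymon client
# extension scripts, and placeholder thresholds of 1% and 5%:
#   color=$(pick_color "$loss" 1 5)
#   "$XYMON" "$XYMSRV" "$(xymon_status_msg wan.example.com wanping "$color" "loss ${loss}%")"
```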

The current script I'm using, which is started by /etc/rc.local at boot with "nohup /usr/local/bin/pingmon.sh >> /var/log/pingmon.log":

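#!/bin/bash
# Hosts to probe, one per line.
HOSTLIST="x.x.x.10
x.x.y.254
x.x.z.254"

# Ping every host 60 times, then log the start time plus one line
# for each host that lost one or more packets.
function doping
{
         START=$(date '+%Y%m%d-%H:%M:%S')
         # fping -C writes its per-host results to stderr; a lost probe shows as "-".
         result=$(fping -C 60 -q ${HOSTLIST} 2>&1)

         if echo "${result}" | grep -q -- -
         then
                 # START also contains a "-", so grep keeps the timestamp line too.
                 printf '%s\n%s\n' "${START}" "${result}" | grep -- -
         else
                 echo "${START}"
         fi
}
while /bin/true
do
         doping >> /var/log/pingmon.log
done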

I also wrote a report generator in Perl that is supposed to parse the log file and generate a summary report. I've attached that script here, but I can't claim that it is bug-free. It also hard-codes some business parameters (i.e., business hours/days/etc.); search for XXXX to find most of the things you will want to change.

Regards,
Adam

-- 
Adam Goryachev Website Managers www.websitemanagers.com.au