Xymon Mailing List Archive

TCP/IP stats (bits/s) limited to 100M

Scott Walters
Sun, 9 Jul 2006 16:02:32 -0400
Message-Id: <user-c402b3a69393@xymon.invalid>

On Jul 9, 2006, at 12:18 PM, Henrik Stoerner wrote:
> OK, you got me on that one.

Not really, you inherited this ;) He is trying to get me, and his point is valid, but the tool 'works as designed', read on . . .

> It seems that using COUNTER for the byte-counts in both the
> netstat- and ifstat-RRD's might be a good idea.

*might* being the operative word there

> The question then
> becomes "what's a suitable max" for these data ? Should I
> assume they are 32-bit counters ? I know some of them are not
> (e.g. Solaris has 64-bit counters for bytes in/out per interface).

exactly, and it is even more complicated than that . . . see below

> I'll change it to a counter now, with MAX set to "unknown". The overflow
> handling should still work correctly, if I understand the RRD
> docs right.

I would not recommend this. Another major issue is that counter resets (e.g. after a reboot), as opposed to overflows, get mistaken for wraps if the MAX is not correct. From what I recall, if you use COUNTER and anything gets mistaken, you get a massive spike in the RRD, which makes all the data relatively useless because the y axis autoscales to the spike.
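
To make the reset-versus-wrap problem concrete, here is a minimal sketch in Python. It is a simplified model, not rrdtool's actual code: it assumes that any decrease in the sample is treated as a 32-bit wrap, and the function name, sample values and 300-second step are made up for illustration.

```python
# Simplified model of a COUNTER-style rate calculation (illustrative only).
# Assumption: any decrease between samples is interpreted as a 32-bit wrap.

WRAP_32 = 2**32


def counter_rate(prev, cur, step_seconds):
    """Return bytes/s between two samples, treating a decrease as a wrap."""
    delta = cur - prev
    if delta < 0:
        # A reboot (counter restarted near 0) looks exactly like a wrap here.
        delta += WRAP_32
    return delta / step_seconds


# Normal interval: 30 MB transferred in 300 seconds.
normal = counter_rate(1_000_000_000, 1_030_000_000, 300)

# Host rebooted: the counter restarted and has only reached 5 MB.
after_reboot = counter_rate(1_030_000_000, 5_000_000, 300)

print(f"normal:       {normal:,.0f} bytes/s")
print(f"after reboot: {after_reboot:,.0f} bytes/s")
print(f"spike factor: {after_reboot / normal:.0f}x")
```

In this model the bogus post-reboot value is roughly two orders of magnitude above the real traffic, which is the kind of value that makes the graph autoscale.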

With DERIVE=0 (DERIVE with a minimum of 0) you acknowledge you won't handle counter wraps correctly (which are not that common anyway), but the result for all wraps/resets is benign: a NaN, which does *not* cause a spike. I am a firm believer that no data is better than bad data.
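
The same idea as a minimal Python sketch, again a simplified model and not rrdtool's code. It assumes a DERIVE-style rate is the plain difference divided by the step, and that anything below the minimum of 0 is stored as unknown (NaN). In rrdtool terms this would correspond to a data source along the lines of `DS:<name>:DERIVE:<heartbeat>:0:U`; the function name and sample values are hypothetical.

```python
import math

# Simplified model of DERIVE with a minimum of 0 (illustrative only).
# Assumption: a negative rate is below the minimum and is stored as NaN.


def derive_rate(prev, cur, step_seconds, minimum=0.0):
    """Return bytes/s between two samples, or NaN if the rate is below minimum."""
    rate = (cur - prev) / step_seconds
    if rate < minimum:
        return math.nan  # reset or wrap: record "unknown" instead of guessing
    return rate


# Four samples taken 300 s apart; the host reboots before the third sample.
samples = [1_000_000_000, 1_030_000_000, 5_000_000, 35_000_000]
rates = [derive_rate(a, b, 300) for a, b in zip(samples, samples[1:])]

for rate in rates:
    print("unknown (NaN)" if math.isnan(rate) else f"{rate:,.0f} bytes/s")

# The largest known value stays at the real traffic level, so nothing rescales.
known = [r for r in rates if not math.isnan(r)]
print(f"max known rate: {max(known):,.0f} bytes/s")
```

The cost is that a genuine wrap also produces a single unknown interval, which is the trade-off described above.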

I am not opposing the idea that COUNTER with a correct MAX is the 'right way'. The problem with software that runs on so many platforms is that the correct MAX is impossible to know for certain. Defining the MAX as simply the 32- or 64-bit limit is not adequate, because reboots will still cause spikes; you'd need to know the MAX for the particular metric, and that is impossible to know absolutely. The inbytes MAX would need to be different for 10 Mb/s, 100 Mb/s, 1000 Mb/s, Token Ring at 16 Mb/s, etc.
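
As a rough illustration of why a generic MAX does not help, the following Python sketch computes the per-link ceiling in bytes/s and checks an illustrative bogus post-reboot rate against it. The link list, the sample values and the helper names are assumptions made up for this example.

```python
# Illustrative only: which MAX values would reject a bogus post-reboot rate?

LINK_SPEEDS_MBIT = {
    "10 Mb/s": 10,
    "Token Ring 16 Mb/s": 16,
    "100 Mb/s": 100,
    "1000 Mb/s": 1000,
}

GENERIC_32BIT_MAX = 2**32 - 1  # "just use the counter width" as MAX


def max_bytes_per_second(mbit_per_second):
    """Theoretical ceiling for a byte counter on a link of the given speed."""
    return mbit_per_second * 1_000_000 / 8


# A reset misread as a 32-bit wrap over a 300 s step (hypothetical samples).
bogus_rate = (2**32 + 5_000_000 - 1_030_000_000) / 300

verdicts = {}
for name, mbit in LINK_SPEEDS_MBIT.items():
    ds_max = max_bytes_per_second(mbit)
    verdicts[name] = "discarded" if bogus_rate > ds_max else "kept"
    print(f"{name:<20} MAX {ds_max:>13,.0f} bytes/s -> spike {verdicts[name]}")

generic = "discarded" if bogus_rate > GENERIC_32BIT_MAX else "kept"
print(f"{'generic 32-bit MAX':<20} MAX {GENERIC_32BIT_MAX:>13,} bytes/s -> spike {generic}")
```

With these made-up numbers, only a MAX matched to the slower links rejects the bogus value; the generic 32-bit MAX lets it through, and so do the faster links' ceilings.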

DERIVE=0 and NaN is a much better compromise than the spikes.  And I  would bet the farm reboots are a much more common event than counter  wraps for the majority of environments.

And Henrik, the net result for you will be answering an endless stream of emails about why every COUNTER RRD has spikes . . . I've been there, done that ;) I am almost 100% positive there is not *one* COUNTER RRD in the larrd stuff; it is all DERIVE. It's not impossible that rrdtool has changed to alleviate some of this, but from what I have read of your email threads, I haven't seen anything to support that.


scott