Xymon Mailing List Archive

Rare shape in graphs.

6 messages in this thread

Raul GN · Tue, 23 Apr 2013 13:33:05 +0200
Hi all,
   I recently added 2 new hosts to monitor in Xymon, but from time to
time their response times are quite high (2 or 3 seconds). The graphs show
ups and downs in a very recognisable pattern:

[image: Inline image 1]
[image: Inline image 2]

Testing the host (a Windows server) with telnet for hours (via a Perl script)
doesn't show these response times (they are always under 1 second). Does
anyone know where the problem could be? Is it a Xymon server problem, or could
it be in the remote server?  I think it is in the remote server, but I
couldn't demonstrate it, and these curious graphs don't help clarify it.
Stef Coene · Fri, 26 Apr 2013 20:13:43 +0200
I see the same on some tests, most of them ping.  I never figured out why this
is happening.


Stef
Srinivas Dasari · Sat, 27 Apr 2013 04:43:13 +0000
Hi All,

I want to install Xymon on my desktop from scratch. Could you please send me any documentation for it?

Thanks,
Srinivas
Josh Luthman · Sat, 27 Apr 2013 11:19:38 -0400
Please create a new thread for a new topic from now on.

http://en.wikibooks.org/wiki/System_Monitoring_with_Xymon/Administration_Guide/Compiling_on_CentOS

Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX
Jeremy Laidman · Mon, 29 Apr 2013 14:55:57 +1000
I believe this is a result of quantization and interpolation when RRD
stores the results, and it shows up when RRD creates the graphs.  It also
seems more pronounced when the metrics are whole numbers, such as clock
offset.  It's also more obvious for really small numbers.  Here's a graph
of mine that shows an interesting repeating diamond shape.

[image: Inline images 1]
(But your multi-coloured "Connection Times" graph is far prettier.)

It took me a while to understand what's going on, and I still don't quite
get it.  But by way of an example, let's say that the clock offset (error)
of a host is 2.333 seconds relative to the Xymon server.  Now let's say the
quantization causes RRD to accept only whole numbers (I'm stretching the
truth here for demonstration purposes).  The numbers will come in as 2.333,
2.333, 2.333, but RRD will store them as 2, 2, 3, 2, 2, 3.  So you end up
with a repeating pattern (a spike every 3rd sample) that really doesn't
exist, and is related to the way RRD stores, consolidates and retrieves
numbers.  This pattern is easily detected by our brain's visual circuitry.
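
The whole-number illustration can be reproduced with a toy model.  This is
only a sketch: it assumes a made-up storage rule that rounds each sample to
an integer and carries the rounding error into the next sample.  The
function name store_as_whole_numbers is hypothetical, and this is not how
RRD really stores data; it only shows how a constant input can turn into a
repeating pattern.

```python
from fractions import Fraction
from math import floor

def store_as_whole_numbers(samples):
    """Toy model (not RRD's real algorithm): round each sample to a whole
    number and carry the rounding error forward into the next sample."""
    stored = []
    carry = Fraction(0)
    for sample in samples:
        total = sample + carry
        whole = floor(total + Fraction(1, 2))  # round half up
        stored.append(whole)
        carry = total - whole  # error pushed into the next sample
    return stored

# A constant offset of 2 1/3 seconds, sampled six times.
offset = Fraction(7, 3)
pattern = store_as_whole_numbers([offset] * 6)
print(pattern)
```

The input never changes, yet one stored sample in every three comes out as
3.  The spike lands on a different position within each group of three than
in the 2, 2, 3 sequence, but the period is the same.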

Of course, RRD can indeed store floating point numbers, but when you have
really small variations in numbers (many decimal places), RRD has to round
the numbers at some point, and this gives the same effect.  Also, the
number of decimal places required to avoid such effects is increased when
you do averaging and interpolation.

Interpolation, the way I see it, is as follows.  When RRD gets a value of 7
at 3:05:00 and a value of 8 at 3:10:01, it doesn't store "8" for the second
value, because it didn't come in exactly 5 minutes after the first value.
Instead, RRD removes one second's worth from the value, because it has to
store the second value at 3:10:00.  If it took 301 seconds to go from 7 to
8, then RRD will calculate that the value after 300 seconds would have been
around 7.996... So if your sampling jumps around a little earlier and
later, in a regular manner, you may see an interpolation artefact that
matches that variation.
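
The arithmetic in that example can be checked with a few lines of Python.
This is a minimal sketch of plain linear interpolation as described in this
message; the helper name value_at_boundary is hypothetical, and RRD's real
consolidation logic may well differ in detail.

```python
def value_at_boundary(t_prev, v_prev, t_now, v_now, t_boundary):
    """Linearly interpolate between two samples to estimate the value at
    t_boundary.  Times are in seconds.  This mirrors the description in
    the message, not RRD's actual internals."""
    fraction = (t_boundary - t_prev) / (t_now - t_prev)
    return v_prev + (v_now - v_prev) * fraction

# 3:05:00 is taken as 0 s, so 3:10:00 is 300 s and 3:10:01 is 301 s.
estimate = value_at_boundary(0, 7, 301, 8, 300)
print(round(estimate, 4))
```

Under this model the stored value is 7 + 300/301, just short of 8, so a
sample that arrives one second late is recorded slightly low.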

Typically, xymonnet performs its checks in the same order every time, and
earlier checks might delay later checks very slightly, just enough to push
the sample timestamp into the next whole second, causing an interpolation
artefact.  Because the ordering is not randomised, the artefacts are
consistent and recognisable as patterns in the graphs.  It could be that,
by adding "--shuffle" to the call to xymonnet, the artefacts will no longer
form a recognisable pattern.
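
To see why a fixed ordering could give consistent artefacts, here is a toy
simulation.  It assumes checks run strictly one after another and each
takes the same time, which is a simplification and not a description of how
xymonnet actually schedules tests.  The host names, the 0.4 second check
time and the function start_offsets are all made up for illustration.

```python
import random

def start_offsets(hosts, per_check_seconds, rounds, shuffle, seed=0):
    """Toy model: return, per host, the delay (in seconds) before its check
    starts in each polling round, assuming strictly sequential checks."""
    rng = random.Random(seed)
    offsets = {host: [] for host in hosts}
    for _ in range(rounds):
        order = list(hosts)
        if shuffle:
            rng.shuffle(order)  # randomise the test order for this round
        for position, host in enumerate(order):
            offsets[host].append(position * per_check_seconds)
    return offsets

hosts = ["hostA", "hostB", "hostC", "hostD"]
fixed = start_offsets(hosts, 0.4, rounds=50, shuffle=False)
shuffled = start_offsets(hosts, 0.4, rounds=50, shuffle=True)

# Number of distinct delays seen by the third host in each mode.
print(len(set(fixed["hostC"])), len(set(shuffled["hostC"])))
```

With a fixed order, each host sees the same delay in every round, so any
timing artefact repeats identically.  With shuffling, the delay varies from
round to round, which should break up the visible pattern even though the
underlying rounding is unchanged.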

While I don't fully understand what's going on, I think it's safe to say
that it's not a problem with either Xymon or the servers being tested, but
is a result of how RRD does its thing.

J
Raul GN · Wed, 8 May 2013 12:03:03 +0200
Thank you, Jeremy. That interpolation behaviour explains a lot of things
about RRD files.