How is clock graph in "trends" column generated

7 messages in this thread

Junaid Shahid · Wed, 22 Jun 2016 22:56:12 +0500
Hello Xymon Gurus!

I have searched the interwebs for it but could not find anything useful.

My question is: how are the values calculated for the Clock offset graph
displayed at the bottom of the "Trends" column? I have seen some posts (and
my experience confirms this) saying that Xymon's ntp check (specified in
hosts.cfg) only tests whether the NTP daemon on a client is up (responsive),
while the clock offset is plotted via some internal logic (the client feeds
data to the server).

I would really appreciate it if someone could shed some light on this!

Some links that I have gone through:
http://osdir.com/ml/monitoring.hobbit/2007-03/msg00361.html
http://lists.xymon.com/oldarchive/2009/01/msg00417.html


-- 
Regards,
Junaid Shahid
Japheth Cleaver · Wed, 22 Jun 2016 11:33:38 -0700
Hi,

The "clock" value is computed as the difference between the timestamp of the
client message (as seen at the end of generation by the client) and the
timestamp of the "cpu" status message generation by xymond_client on the
server. (It's thus dependent on your xymon server having its time set
correctly.) The client isn't doing a comparison itself against anything
external.

This is subject to skew if:
a) your xymon server itself is wrong
b) you have a xymonproxy in the middle and messages are delayed getting to
xymond
c) your xymond_client process is backlogged with [client] messages
d) your xymon server is overloaded and has a long period between
transmission and TCP processing by xymond

For b) and c), the 4.4 version (and the Terabithia RPMs, IIRC) compares
against the receipt time at the first proxy encountered, or xymond's
parsing/separation time otherwise, which fixes both of those things. Local
ntp problems and "raw" TCP lag until connection receipt will still affect
things, however.
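
A minimal sketch of the arithmetic described above, not taken from the Xymon
source. The function name, the sample numbers and the sign convention (server
reference time minus client timestamp) are all assumptions made for
illustration. It shows why a backlog inflates the 4.3-style value but not the
4.4-style one, even when the client clock is perfectly in sync.

```shell
#!/bin/sh
# Hypothetical model of the "clock" value; all times are epoch seconds.

# clock_offset REFERENCE_TIME CLIENT_TIMESTAMP
# Prints the server-side reference time minus the client message timestamp.
clock_offset() {
    echo $(( $1 - $2 ))
}

# Made-up example numbers; the client clock is assumed to be correct.
client_ts=1467200000   # timestamp written at the end of the client message
tcp_delay=3            # d) lag before xymond receives the message
backlog_delay=9        # c) time queued before xymond_client handles it

received_at=$(( client_ts + tcp_delay ))
status_generated_at=$(( received_at + backlog_delay ))

# 4.3 as described: reference is the "cpu" status generation time.
offset_43=$(clock_offset "$status_generated_at" "$client_ts")
# 4.4 as described: reference is the receipt/parsing time.
offset_44=$(clock_offset "$received_at" "$client_ts")

echo "4.3-style offset: ${offset_43}s"
echo "4.4-style offset: ${offset_44}s"
```

With these numbers the 4.3-style value includes both delays, while the
4.4-style value only includes the TCP lag.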

One "feature" of this in 4.3 though is that if you see clock skew rising
and times are correct on both sides, then you can easily tell that your
xymon server is having performance problems.


HTH,
-jc
Junaid Shahid · Tue, 28 Jun 2016 18:27:32 +0500
Thanks JC!

Now that makes it very clear how the CPU stats contain the server's
timestamp (and why).

I have checked: we are running version 4.3.21.

Now let's look at the possible reasons for skew:
a) your xymon server itself is wrong
Our server's time is correct (I have checked it multiple times manually and
also with "ntpstats"). Plus, we have some 300+ clients under Xymon
monitoring, and none of them exhibits any time skew in its Clock Offset
trend.

b) you have a xymonproxy in the middle and messages are delayed getting to
xymond

We don't use any xymon proxy

c) your xymond_client process is backlogged with [client] messages
This can't be the reason either, because none of the other clients exhibits
any noticeable skew in its Clock Offset trend.

d) your xymon server is overloaded and has a long period between transmission
and TCP processing by xymond

This can't be the case either, as no other client shows any noticeable
clock offset.


In our case there is one specific server (out of 300+) whose clock offset
trend oscillates between 2 and 15 seconds (like a sinusoidal wave). This
machine's time is in perfect sync with our NTP server, though (there is no
actual clock drift). The machine does have a somewhat complicated network
topology (it sits behind various layers such as firewalls, load balancers,
etc.). My only guess now is that this is caused by its unusual network
location. What do you think, JC?
Jeremy Laidman · Tue, 28 Jun 2016 23:46:32 +0000
I tend to agree. If it takes a few seconds to make a TCP connection to the
xymon server and transmit the client message, you will see such a delay.

Try manually sending a client message and see how long it takes. Something
like:

$ time $XYMON $XYMSRV "client/timetest $MACHINE.$SERVEROSTYPE"

(run within a xymoncmd shell on the client)
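
Since the offset reportedly rises and falls over time, a single timing may
land on a good moment. The following is only a sketch of one way to repeat
the measurement; the function names elapsed_seconds and sample_elapsed are
made up here, and it assumes a date command that supports +%s (common, but
not strictly POSIX). Resolution is whole seconds, which is enough for delays
in the 2-15 second range.

```shell
#!/bin/sh
# elapsed_seconds COMMAND [ARGS...]
# Runs the command, discards its output, and prints the wall-clock
# time it took in whole seconds.
elapsed_seconds() {
    es_start=$(date +%s)
    "$@" >/dev/null 2>&1
    es_end=$(date +%s)
    echo $(( es_end - es_start ))
}

# sample_elapsed COUNT PAUSE COMMAND [ARGS...]
# Times the command COUNT times, sleeping PAUSE seconds between runs.
sample_elapsed() {
    se_count=$1
    se_pause=$2
    shift 2
    se_i=1
    while [ "$se_i" -le "$se_count" ]; do
        echo "run $se_i: $(elapsed_seconds "$@")s"
        se_i=$(( se_i + 1 ))
        if [ "$se_i" -le "$se_count" ]; then
            sleep "$se_pause"
        fi
    done
}

# Possible use inside a xymoncmd shell (10 samples, 30 seconds apart):
#   sample_elapsed 10 30 "$XYMON" "$XYMSRV" "client/timetest $MACHINE.$SERVEROSTYPE"
```

If some runs take several seconds while others return immediately, that
would point at the connection path rather than the clocks.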

J
Junaid Shahid · Fri, 1 Jul 2016 21:52:18 +0500
Thank you Jeremy for your suggestion!

I have run this command on the client, but I don't know what conclusions
I can draw from it. Here is the output (after dropping into xymoncmd):

==================================================================
# time /usr/libexec/xymon-client/xymon 10.12.12.44 "client/timetest zm1.i2cinc.com.Linux"
ignore mail.info
file:/etc/passwd:md5
file:/etc/shadow:md5
log:/var/adm/messages:10240
0.00user 0.00system 0:00.00elapsed 33%CPU (0avgtext+0avgdata 680maxresident)k
0inputs+0outputs (0major+200minor)pagefaults 0swaps
==================================================================

BTW I do see clock offsets as big as 45 seconds reported on the "CPU" page
for this client.

Thanks.

-- 

Regards,
Junaid Shahid
Jeremy Laidman · Fri, 01 Jul 2016 21:36:41 +0000
Well, that was rather quick: 0.0 seconds elapsed. So it doesn't look like
the comms are causing your problem.
Jeremy Laidman · Wed, 06 Jul 2016 00:20:18 +0000
OK, so what next? The likely causes seem to have been eliminated, or at
least made unlikely.

What I'd do next is to get a packet capture on both endpoints, of the
client message, complete with timestamps on the capture output, when
there's a significant time discrepancy; 45 seconds would be great, but
anything more than a few seconds would be sufficient (an order of magnitude
longer than the expected runtime of the xymonclient.sh script).  Then I'd
compare the capture timestamps with the time shown in the contents of the
client message.  This should hint at whether the time anomaly (sounds like
a scifi plot device) is at the client or server.
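
The comparison could be organised roughly as follows. This is a hypothetical
helper, not a Xymon tool: the name locate_delay and the sample numbers are
invented, the timestamps are assumed to have been read by hand from the two
captures and the client message and converted to epoch seconds, and both
capture hosts are assumed to have correct clocks.

```shell
#!/bin/sh
# locate_delay MSG_TS CLIENT_CAPTURE_TS SERVER_CAPTURE_TS REPORTED_OFFSET
#   MSG_TS             time shown in the contents of the client message
#   CLIENT_CAPTURE_TS  when the client-side capture saw the message leave
#   SERVER_CAPTURE_TS  when the server-side capture saw the message arrive
#   REPORTED_OFFSET    clock offset (seconds) shown for that message
# Splits the reported offset into three legs so the largest one stands out.
locate_delay() {
    ld_client=$(( $2 - $1 ))            # message built -> on the wire
    ld_network=$(( $3 - $2 ))           # client wire -> server wire
    ld_server=$(( $4 - ($3 - $1) ))     # remainder after arrival at the server
    echo "client side: ${ld_client}s"
    echo "network: ${ld_network}s"
    echo "server side: ${ld_server}s"
}

# Made-up numbers: a 45 second offset where the message left the client late.
locate_delay 1000 1040 1041 45
```

A large first figure would suggest the delay is on the client before the
message is sent; a large last figure would suggest it happens after the
message has already reached the server.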

For extra points, trace the client side using strace/truss with timestamps
enabled.

Note that you don't need to wait (up to 5 minutes) for the client message
to be transmitted.  You can send a client message any time you like by
running xymonclient.sh within xymoncmd.  This also makes it easier to use
strace/truss.

J