BUG: ntpdate command used in Xymon takes 2+ seconds each in RHEL 7 (with ntp 4.2.6+)

4 messages in this thread

list Matt Vander Werf · Fri, 18 Sep 2015 09:26:08 -0400 ·
Hello J.C., etc.

(Brief Summary)
I noticed an issue/bug in the way Xymon uses the ntpdate command for its
built-in ntp checks. Each ntp test (using the ntpdate options hard-coded
into Xymon) takes >= 2 seconds to run in any OS that uses upstream ntp
version 4.2.6 or later, including RHEL 7. This is due to a 2 second delay
that was put in place in upstream ntp starting with version 4.2.6. While
RHEL 6 uses upstream ntp 4.2.6 in its latest iteration, the behavior I'm
referring to was reverted by Red Hat when RHEL 6 moved from using version
4.2.4 to 4.2.6 in a minor release. This was because this kind of major
change in behavior was not desired in only a minor RHEL release. However,
in RHEL 7 it was acceptable to keep this behavior change in place since it
was a major OS release.

(Detailed (continued) Summary)
Initially, I thought it was a Red Hat issue, so a support ticket was opened
up with Red Hat to figure out what the issue was. The support person was
able to replicate the issue I was seeing and created a public Bugzilla bug
here: https://bugzilla.redhat.com/show_bug.cgi?id=1260140. The ntp package
maintainer for Red Hat replied saying this was intended behavior and while
reverted in RHEL 6, it would not be reverted in RHEL 7. I asked in the
support ticket why it was not going to be fixed in RHEL 7 and got this
response in the support ticket:

"The original ntp version shipped in RHEL6 was 4.2.4, which had ntpdate
that was fast, but violated the default 2s minimum spacing between
requests. An NTP server with enabled rate limiting would not respond more
than once to such client. In ntp 4.2.6 that bug was fixed by adding spacing
between the requests, which slows down the ntpdate operation. This change
in behavior was not acceptable for a minor RHEL6 release, so the bug was
restored in a patch. In RHEL7 as a new major release I think such change is
acceptable and ntpdate now works as upstream intended."

Please see the public Bugzilla bug for additional links documenting this
change in behavior.
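
A rough way to observe the delay is to time a single query from the shell. The sketch below is only an illustration: `time_cmd` is a hypothetical helper, `ntp.example.com` is a placeholder host, and the `-u -q -p 2` flags are an assumption about what xymonnet passes to ntpdate.

```shell
#!/bin/sh
# time_cmd (hypothetical helper): run a command, discard its output,
# and print the elapsed wall-clock time in whole seconds.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true   # ignore the command's exit status
    end=$(date +%s)
    echo $((end - start))
}

# Example invocation (placeholder host; flags are an assumption):
# time_cmd ntpdate -u -q -p 2 ntp.example.com
```

On a system with the 2 second spacing in place, the example call would be expected to report at least 2; the resolution is whole seconds, so this is a coarse check only.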

(Request)
It would be greatly appreciated if this could be fixed in upstream Xymon
ASAP! However, since my Xymon server was installed using Terabithia RPMs,
what is most important to me is to get an updated Terabithia RPM created
that I can use to update my Xymon server. I understand that a source code
patch has already been created for Debian Xymon here:
https://anonscm.debian.org/cgit/collab-maint/xymon.git/tree/debian/patches/workaround-changed-ntpdate-behaviour
(referenced in earlier mailing list post:
http://lists.xymon.com/pipermail/xymon/2015-September/042224.html). This
would work perfectly!!

However, since I am using Terabithia RPMs, it is not very easy to simply
apply the above patch, recompile the base source code, and use that,
since I am not using just the base source code.

Maybe a hotfix release of Xymon could be created and released sometime very
soon with this fix? Currently, I am unable to do all the ntp tests I would
like to do because it causes xymonnet to go over the 5 minute interval
time!....

In the meantime, would it be possible for someone (maybe J.C.) to create a
binary patch or something that I could apply to the current xymonnet binary
with the fix in the above patch? Just a thought...and would be extremely
appreciated!!


Thank you very much for your time and work you do to make Xymon a great
monitoring tool! I look forward to some fixes being applied to resolve this
issue with the ntp tests!


TL;DR:
ntp checks take WAY too long in Xymon using hard-coded ntpdate options, due
to a change in ntpdate behavior starting in upstream ntp version 4.2.6. This
particularly affects Xymon servers running on RHEL 7 (and new releases in
the future). Hotfix needed in Xymon to fix this issue (see source code
patch above) ASAP! If possible, a binary patch to xymonnet binary would be
much preferred in the meantime! I am using Terabithia RPMs for my Xymon
server, so using above source code patch is not possible or desirable!

FYI: I am running Xymon 4.3.21-4.el7.terabithia for my Xymon server.

Thanks!!

--
Matt Vander Werf
list Matt Vander Werf · Fri, 25 Sep 2015 09:07:55 -0400 ·
<bump>

Any response to this e-mail? Just wanted to make sure it wasn't missed....

I'm currently unable to utilize the ntp test functionality in Xymon as much
as I would really like to (as it causes xymonnet to go over the 5 minute
interval time) and won't be able to until this issue is addressed in an
upstream fix and release AND a Terabithia RPM release, as I am using
Terabithia RPMs for my Xymon server, or at the very least a patch for the
xymonnet binary that I'm using. If it'd be easier and possible to just
release a new Terabithia RPM for (at least) RHEL 7, that would work
too....but I understand if there is the desire to keep the RPMs and
upstream versions the same.

I understand that most people aren't having this issue, as they are using
an upstream ntp version below 4.2.6, but any future OS releases that use
ntp >= 4.2.6 will have this same issue (including all new RHEL releases)!

Would greatly appreciate a fix for this issue ASAP, or at the very least an
acknowledgment of this issue and that a fix will be released soon!

Thanks very much!!

--
Matt Vander Werf
list Jeremy Laidman · Tue, 29 Sep 2015 10:50:14 +1000 ·
quoted from Matt Vander Werf
On 25 September 2015 at 23:07, Matt Vander Werf <user-07704c41c3ad@xymon.invalid> wrote:
<bump>

Any response to this e-mail? Just wanted to make sure it wasn't missed....

In the meantime, here's a work-around you could use.

First create a script named /usr/local/bin/xymon-ntpdate:

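#!/bin/sh
# Drop every argument except the last, which is the address passed by xymonnet.
while [ $# -gt 1 ]; do shift; done
# Query only (-q) from an unprivileged port (-u), with a single sample (-p 1).
ntpdate -p 1 -u -q "$1"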

Now, define NTPDATE=/usr/local/bin/xymon-ntpdate

This script simply throws away all of the switches, and only keeps the IP
address given by xymonnet, adding its own parameters as required.
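
The argument-discarding step can be tried on its own, without ntpdate installed. This is a minimal sketch: `last_arg` is a hypothetical function name, and the sample arguments are an assumption about what xymonnet passes (192.0.2.10 is a documentation address).

```shell
#!/bin/sh
# last_arg (hypothetical name): shift away every argument except the
# final one, then print it. This is the same loop idea as the wrapper.
last_arg() {
    while [ $# -gt 1 ]; do shift; done
    printf '%s\n' "$1"
}

# Assumed xymonnet-style argument list: switches first, address last.
last_arg -u -q -p 2 192.0.2.10
```

Only the trailing address survives, so the wrapper is free to supply its own switches in front of it.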

You could probably squeeze this into NTPDATE definition, something like
this (untested):

NTPDATE="sh -c 'while [ .$2 != . ]; do shift; done; ntpdate -p 1 -u -q $1'"

I'd go with a separate script.

To be honest, I don't really know why the parameters were hard-coded into
xymonnet in the first place, and not defined in NTPDATE.  Doing the latter
would make it much easier to tune for the environment or substitute a
different command with completely different parameters.

Cheers
Jeremy
list Matt Vander Werf · Wed, 30 Sep 2015 14:32:43 -0400 ·
Hi Jeremy,

Thanks for this workaround! I implemented the script as you described and
it seems to work great!

It still shows the original command (with the hard-coded parameters) on the
ntp status page (which makes sense), but I can tell that it's working
because the time spent on the "NTP Tests Executed" part of xymonnet went
from taking ~14 seconds to ~0.738 seconds for the 7 hosts I currently have
set up to use the ntp test. (We were keeping the number of hosts that were
using the ntp test at a bare minimum to prevent any issues with xymonnet
going over the 5 minute limit, and this was the bare minimum number of
hosts we could go down to. We had many, many more hosts (over 170, at
least) with ntp checks in our old Xymon server running RHEL 5, before
moving to our current Xymon server running RHEL 7, where we had the issues
with ntpdate).
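
The arithmetic behind the 5 minute concern can be sketched as follows. It assumes the ntp tests run one after another, that each takes exactly 2 seconds, and that the interval is 300 seconds; `ntp_phase_seconds` and `fits_interval` are hypothetical helper names.

```shell
#!/bin/sh
# ntp_phase_seconds HOSTS SECONDS_PER_TEST
# Total time for the ntp phase, assuming strictly sequential tests.
ntp_phase_seconds() {
    echo $(( $1 * $2 ))
}

# fits_interval HOSTS SECONDS_PER_TEST INTERVAL
# Succeeds when the ntp phase alone fits inside the polling interval.
fits_interval() {
    [ "$(ntp_phase_seconds "$1" "$2")" -le "$3" ]
}

ntp_phase_seconds 7 2      # prints 14, in line with the ~14 seconds measured
ntp_phase_seconds 170 2    # prints 340, already more than 300
```

Under these assumptions, 170 hosts would exceed the interval on the ntp tests alone, before any other network tests are counted.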

Of course, I'd prefer this issue be fixed in the actual Xymon code, so I
don't have to use this workaround to make the ntp tests work, but this will
do for the time being....

I do hope this can get resolved somehow in the Xymon code sometime in the
near future!

Thanks again, Jeremy, for this great workaround!

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
XXX W. South Street
South Bend, IN XXXXX
Phone: (XXX) XXX-XXXX
