Xymon Mailing List Archive

HTTP Test Timeout & Delays

9 messages in this thread

list James Wade · Thu, 16 Nov 2006 09:31:53 -0600 ·
I'm having problems with the HTTP test.

We monitor a load of URL's, and yesterday I added a lot more.

After adding the additional URL's, I started getting timeouts
across the board. If I look at the graph for the old URL's, it
shows the response times increasing at the same time I implemented
all the new URL's yesterday.

I.e., I went from about 30 to 60 URL's.

Does Hobbit stagger URL checks? Is there a way to stagger them?

Thanks.
James
list Ralph Mitchell · Fri, 17 Nov 2006 00:27:29 -0600 ·
quoted from James Wade
On 11/16/06, James Wade <user-659655b2ea05@xymon.invalid> wrote:
ie. I went from about 30 to 60 URL's.

Does Hobbit stagger URL checks? Is there a way to stagger them?
I don't know about Hobbit staggering URL checks, but I'm doing a whole
bunch more than 60 using scripts fired off by cron.  Most of them
repeat every ten minutes, some every five mins or every minute, and
I've spread them out using cron settings.
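
A minimal sketch of the cron-based staggering described above, assuming a cron that accepts the range-with-step form "N-59/P" (Vixie-cron style). The function name and the script paths are hypothetical, for illustration only; the idea is simply to give each script its own starting minute within the period.

```shell
#!/bin/sh
# Print one crontab line per check script, each offset by one minute,
# so that scripts sharing a period do not all start together.
# Usage: stagger_cron PERIOD_MINUTES SCRIPT...
stagger_cron() {
    period=$1
    shift
    offset=0
    for script in "$@"; do
        # "N-59/P" means: start at minute N, then repeat every P minutes.
        printf '%s-59/%s * * * * %s\n' "$offset" "$period" "$script"
        offset=$(( (offset + 1) % period ))
    done
}

# Hypothetical script paths, for illustration only.
stagger_cron 10 /usr/local/bb/ext/check-login.sh \
                /usr/local/bb/ext/check-portal.sh \
                /usr/local/bb/ext/check-search.sh
```

The printed lines can be reviewed and pasted into a crontab by hand. With more scripts than minutes in the period, the offsets wrap around and some scripts share a starting minute.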

These are mostly scripts left over from a previous Big Brother
installation.  I haven't yet converted them to live entirely in a
Hobbit universe, so they still load up the BB definitions and use the
BB bin/bb program to deliver reports, but they're working just fine
from cron.  In fact, most of them were never fired off by the BB ext
script functions.  My old, single 733MHz cpu DL380 was trucking along
without a problem, running around 1600 checks spread out over 400-odd
hosts, until it blew out its power supply...

I don't know offhand how many of my url checks could be converted -
quite a few checks are doing logins, or following links through
several pages - but I was thinking of doing exactly that.  Maybe I'll
rethink that strategy... :)

Ralph Mitchell
list Henrik Størner · Fri, 17 Nov 2006 09:14:06 +0100 ·
quoted from James Wade
On Thu, Nov 16, 2006 at 09:31:53AM -0600, James Wade wrote:
After adding the additional URL's, I started getting Timeouts
across the board. If I look at the graph for old URL's, it
shows the graph increasing at the same time I implemented
all the new URL's yesterday.
ie. I went from about 30 to 60 URL's.
By default, Hobbit runs lots of network tests in parallel. It has been
seen that this can overwhelm either a server or some of your network
infrastructure; or just generate enough traffic that packets are dropped
on their way to the Hobbit server.

60 URL's aren't a whole lot, though.

Still, you can try lowering the number of concurrent tests that Hobbit
performs. The "--concurrency=N" option for bbtest-net does that (goes in
hobbitlaunch.cfg). See the bbtest-net(1) man-page.
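
As a rough sketch of where the option goes: the network tests are normally started from a [bbnet] section in hobbitlaunch.cfg. The paths and the other options shown here are assumptions based on a typical Hobbit 4.2 install and will differ per site. Only --concurrency is the change discussed in this thread, and the value 10 is an arbitrary starting point, not a recommendation.

```
[bbnet]
	ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
	NEEDS hobbitd
	CMD bbtest-net --report --ping --checkresponse --concurrency=10
	LOGFILE $BBSERVERLOGS/bb-network.log
	INTERVAL 5m
```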


Henrik
list Stef Coene · Fri, 17 Nov 2006 09:33:43 +0100 ·
quoted from James Wade
On Thursday 16 November 2006 16:31, James Wade wrote:
I'm having problems with the HTTP test.

Yesterday, I added additional URL's to monitor.
We monitor a load of URL's, and yesterday, I added a lot.

After adding the additional URL's, I started getting Timeouts
across the board. If I look at the graph for old URL's, it
shows the graph increasing at the same time I implemented
all the new URL's yesterday.

ie. I went from about 30 to 60 URL's.

Does Hobbit stagger URL checks? Is there a way to stagger them?
I have a similar problem.  I have +/- 200 network tests (ping, ...) and +/- 20 http tests, all on the local LAN.  I added a second Hobbit server on a remote connection (1 Mbit/s) and got lots of timeouts on the http checks.

By default, Hobbit runs 254 network checks in parallel.  Let's say the first packet of each check is 1 kbyte: that means roughly 254 kbyte sent almost at once, and I only have a 1 Mbit/s line!  The solution was to limit bbnet to only 4 parallel checks by adding --concurrency=4 to the bbnet start command.
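
The arithmetic behind that estimate can be sketched as follows. This is a back-of-the-envelope calculation only, assuming 1 kbyte = 8 kbit and a link of 1000 kbit/s, and ignoring protocol overhead and replies; the function name is made up for illustration.

```shell
#!/bin/sh
# Estimate how long one burst of parallel checks occupies a link.
# Usage: burst_ms CHECKS KBYTES_PER_CHECK LINK_KBIT_PER_SEC
burst_ms() {
    # kbyte -> kbit is x8; dividing by kbit/s gives seconds, x1000 gives ms.
    echo $(( $1 * $2 * 8 * 1000 / $3 ))
}

echo "254 parallel checks: $(burst_ms 254 1 1000) ms"
echo "  4 parallel checks: $(burst_ms 4 1 1000) ms"
```

Under these assumptions, the first packets of 254 checks take about 2 seconds to cross the line, while 4 checks take about 32 ms.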

Actually, the strange thing is that there is a load balancer involved.  If I run the remote http check through the load balancer, the check takes more than 10 seconds; checking the http server directly gives a normal result.  This only happens with 254 parallel checks; with 4 parallel checks everything is normal.


Stef
list Richard Leyton · Fri, 17 Nov 2006 09:55:55 +0000 ·
Morning all,

One request on this matter that I'd like to suggest (perhaps in a future release) is individual HTTP test timeout settings. My customer needs to monitor a few external URL's that we rely on for various things. One in particular has frequent problems.

So we upped the default timeout on bbtest-net to 20s, which helped, but it's still not really enough, and I'm reluctant to increase it further across the board for just one bad egg. If there's a workaround to this particular problem, I'd be happy to hear suggestions.

Thanks and Regards,

Richard.

--

Richard Leyton - user-787ca786c598@xymon.invalid
http://www.leyton.org
list Ralph Mitchell · Fri, 17 Nov 2006 04:41:34 -0600 ·
quoted from Richard Leyton
On 11/17/06, Richard Leyton <user-787ca786c598@xymon.invalid> wrote:
So we upped the default timeout on bbtest-net to 20s, which helped
but it's still not really enough, and I'm reluctant to increase it
further the board for just one bad egg. If there's a workaround to
this particular problem, I'd be happy to hear suggestions.
You could try doing what I do, and run an external script to grab the
page.  I use curl to actually fetch web pages.  Here's a minimal
script to give you an idea:

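   #!/bin/sh
   # Assumes the BB definitions are loaded, so $BB, $BBDISP and $MACHINE are set.

   TIMEOUT=60     # give up after 60 seconds
   COOKIES=/tmp/cookies
   CURLOPTS="-s -S -L -b $COOKIES -c $COOKIES -m $TIMEOUT"
   TEST=http

   # $CURLOPTS is deliberately unquoted so it splits into separate options
   curl $CURLOPTS -o /tmp/page.html http://server.domain.com
   ret=$?
   if [ "$ret" -ne 0 ]; then
     MESSAGE="Something broke.  Curl code: $ret"
     COLOR=red
   else
     MESSAGE="Everything is peachy keen!"
     COLOR=green
   fi

   # First line is the status header, the rest is the message body
   LINE="status $MACHINE.$TEST $COLOR $(date)
$MESSAGE"
   $BB $BBDISP "$LINE"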


You can also examine the page for "interesting stuff", such as text
that wouldn't appear if the server is broken, or text that shouldn't
appear if it's working fine.  Curl also handles secure servers,
proxies, several different authentication mechanisms, and can give you
timing information if you want it for graphs.
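
A minimal sketch of the content check described above, assuming the page has already been saved to a file (as the script does with /tmp/page.html). The function name and the sample strings are hypothetical; a locally generated page stands in for a real fetch so the example runs without a network.

```shell
#!/bin/sh
# Decide a status color from the content of an already-fetched page.
# Usage: page_color FILE MUST_HAVE MUST_NOT_HAVE
page_color() {
    page=$1
    must_have=$2
    must_not_have=$3
    if ! grep -qF -- "$must_have" "$page"; then
        echo red        # expected text is missing (or file unreadable)
    elif grep -qF -- "$must_not_have" "$page"; then
        echo red        # error text is present
    else
        echo green
    fi
}

# Illustration with a locally generated page instead of a real fetch.
page=$(mktemp)
echo '<html><body>Welcome back</body></html>' > "$page"
page_color "$page" 'Welcome' 'Internal Server Error'
rm -f "$page"
```

The result could be assigned to COLOR in a script like the one above. For timing data, curl's write-out option can report the total transfer time, for example `curl -s -o /dev/null -w '%{time_total}' URL`.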

Ralph Mitchell
list James Wade · Fri, 17 Nov 2006 08:34:07 -0600 ·
I found a work-around to my problem.

I was monitoring some pages where the security certificate
was internally generated. If I went to those pages manually,
via a browser window, it would give me a security alert
pop-up window saying the security certificate name is invalid
and doesn't match the site. (Which it doesn't because they
are just using a self-generated one in development for testing)

Anyways, I had about 10 boxes that had this problem, and all
10 were getting timeouts at 10 seconds. This resulted in all
the other http tests starting to get random timeouts as well.
Almost as though all the other http tests were waiting on these
10 machines.

After I removed the 10 machines from bb-hosts, everything started
working just fine, and my graphs went down to 0.XX seconds.

Any suggestions on how I can get past this delay caused by the
security alert? What about the reason that a delay in a few boxes
causes all http tests to delay on all other machines?

Thanks....James
list Henrik Størner · Fri, 17 Nov 2006 16:00:49 +0100 ·
quoted from James Wade
On Fri, Nov 17, 2006 at 08:34:07AM -0600, James Wade wrote:
I found a work-around to my problem.

I was monitoring some pages where the security certificate
was internally generated. If I went to those pages manually,
via a browser window, it would give me a security alert
pop-up window saying the security certificate name is invalid
and doesn't match the site. (Which it doesn't because they
are just using a self-generated one in development for testing)

Anyways, I had about 10 boxes that had this problem, and all
10 were getting timeouts at 10 seconds. This resulted in all
the other http tests starting to get random timeouts as well.
Almost as though all the other http tests were waiting on these
10 machines.

After I removed the 10 machines from bb-hosts, everything started
working just fine, and my graphs went down to 0.XX seconds.

Any suggestions on how I can get past this delay caused by the
security alert? What about the reason that a delay in a few boxes
causes all http tests to delay on all other machines?
This is very odd. What SSL library are you using for the network
tests? Just run "bbtest-net --version" and you should get:

  bbtest-net version 4.2.0
  SSL library : OpenSSL 0.9.8a 11 Oct 2005
  LDAP library: OpenLDAP 20130

It sounds as if the SSL library is attempting to verify the 
authenticity of the SSL certificate from your server. But I've never
heard of it doing this by default. So I'd like to know which version
of OpenSSL you are using, so I can see if there's a configuration
setting that Hobbit can tweak to disable this.


Regards,
Henrik
list James Wade · Fri, 17 Nov 2006 09:18:05 -0600 ·
$ ./bbtest-net --version
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.8b 04 May 2006
LDAP library: OpenLDAP 20328