Xymon Mailing List Archive

randomizing execution of tests

10 messages in this thread

list David Paper · Thu, 5 Feb 2009 12:25:48 -0500 ·
Greetings hobbit gurus [0],

While I am still trying to search my way to an answer via the archives of this list and google, I'm hoping someone could point me in the right direction.

I've got a bb-hosts file with 8 server process instances getting tested. Each instance gets tested with 3 HTTP requests (2 GET, 1 POST). All 8 server processes live on the same physical OS instance. This results in 24 HTTP requests getting sent from hobbit within 1/100th of a second. This causes the load on the host to spike, and generates contention w/in each server to satisfy the requests. This same setup is repeated for hundreds of hosts and hundreds of processes.

Is there a way to tell hobbit to take all of the entries in bb-hosts and test them in a random order w/in the 1 minute testing interval? This would end up staggering the arrival of each HTTP test somewhat and lessen contention within each HTTP server and on each host.

Thanks,

-dave

[0] Of which I am not, but ... maybe one day.

--
Dave Paper

MCSE is to computers as McDonalds Certified Chef is to fine cuisine.
list Ralph Mitchell · Fri, 6 Feb 2009 14:48:59 -0600 ·
I don't think that's possible with Xymon right now, but it can be done if
you're up to a little scripting.  I had an aging, single 733MHz cpu DL380
running web page checkouts on 400+ hosts, generating around 2700 reports,
running at various intervals from 30 seconds to 24 hours.

The trick is to use cron for scheduling...

Something like this, for instance:

============= cut here ============
#!/bin/sh

TESTHOST=www.google.com
TESTURL=http://$TESTHOST/

TIMEOUT=30

# Grab *just* the headers, simulating Xymon's builtin http check.
# Check curl's exit status before filtering: after a pipeline, $? would
# report the status of grep, not curl.
MESSAGE=`curl -m $TIMEOUT \
     -w 'Seconds:     %{time_total}\n' \
     -s -S -L -I $TESTURL`
if [ "$?" -eq "0" ]; then
  COLOR=green
else
  COLOR=red
fi
MESSAGE=`printf '%s\n' "$MESSAGE" | $GREP -v Set-Cookie`

# convert dots to commas in the hostname
MACHINE=`echo $TESTHOST | $SED -e 's/\./,/g'`

$BB $BBDISP "status $MACHINE.home $COLOR `date`

$MESSAGE"
============= cut here ============

You'd run that from xymon's crontab using a command line like:

     $HOME/server/bin/bbcmd $HOME/ping-google.sh > /tmp/ping-google.out 2>&1

at whatever interval is appropriate for the target.

Ralph Mitchell
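
Cron starts jobs on minute boundaries, so many checks scheduled this way would still fire together. One way to spread them out is a small wrapper that sleeps for a pseudo-random number of seconds before running the check. The following is a minimal sketch, not part of Xymon; the script name stagger.sh and the function names random_delay and stagger are hypothetical.

```shell
#!/bin/sh
# Sketch only: delay a command by a pseudo-random number of seconds.

# Print a pseudo-random integer in [0, MAX). The seed combines this
# shell's PID with the current second so parallel cron jobs differ.
random_delay() {
  awk -v max="$1" -v seed="$$`date +%S`" \
    'BEGIN { srand(seed); print int(rand() * max) }'
}

# stagger MAX COMMAND [ARGS...]: sleep up to MAX-1 seconds, then run COMMAND.
stagger() {
  max=$1
  shift
  delay=`random_delay "$max"`
  sleep "$delay"
  "$@"
}

# Only act when called as: stagger.sh MAX COMMAND [ARGS...]
if [ "$#" -ge 2 ]; then
  stagger "$@"
fi
```

Assuming the wrapper is saved as $HOME/stagger.sh, the crontab command could become something like `$HOME/server/bin/bbcmd $HOME/stagger.sh 50 $HOME/ping-google.sh > /tmp/ping-google.out 2>&1`, which delays each run by 0 to 49 seconds.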
list Asif Iqbal · Fri, 6 Feb 2009 17:45:08 -0500 ·
quoted from Ralph Mitchell
On Fri, Feb 6, 2009 at 3:48 PM, Ralph Mitchell <user-00a5e44c48c0@xymon.invalid> wrote:
I don't think that's possible with Xymon right now, but it can be done if
you're up to a little scripting.  I had an aging, single 733MHz cpu DL380
running web page checkouts on 400+ hosts, generating around 2700 reports,
running at various intervals from 30 seconds to 24 hours.

The trick is to use cron for scheduling...

Something like this, for instance:

============= cut here ============
#!/bin/sh

TESTHOST=www.google.com
TESTURL=http://$TESTHOST/

TIMEOUT=30

# Grab *just* the headers, simulating Xymon's builtin http check.
# Check curl's exit status before filtering: after a pipeline, $? would
# report the status of grep, not curl.
MESSAGE=`curl -m $TIMEOUT \
     -w 'Seconds:     %{time_total}\n' \
     -s -S -L -I $TESTURL`
if [ "$?" -eq "0" ]; then
  COLOR=green
else
  COLOR=red
fi
MESSAGE=`printf '%s\n' "$MESSAGE" | $GREP -v Set-Cookie`

# convert dots to commas in the hostname
MACHINE=`echo $TESTHOST | $SED -e 's/\./,/g'`

$BB $BBDISP "status $MACHINE.home $COLOR `date`

$MESSAGE"
============= cut here ============
This curl command looks like all I need as an extension script, instead of
the http:// test, to get my host-specific HTTP timeout.

I could just use this instead of urlplus.pl, correct?
-- 

Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
list Ralph Mitchell · Fri, 6 Feb 2009 21:29:25 -0600 ·
quoted from Asif Iqbal
On Fri, Feb 6, 2009 at 4:45 PM, Asif Iqbal <user-6f4b51ac2a40@xymon.invalid> wrote:
This curl command looks like all I need as an extension script, instead of
the http:// test, to get my host-specific HTTP timeout.

I could just use this instead of urlplus.pl, correct?
Yes, you could do exactly that.  You'll probably want to make the above
script into a function or child script and feed it the hostname, url & max
time values pulled from a file.

Ralph Mitchell
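
The suggestion of turning the check into a function driven by a file could be sketched roughly as follows. This is an illustration under assumptions, not a tested Xymon extension: the "host url timeout" file layout and the function names xymon_hostname, check_url and run_checks are hypothetical, and $BB and $BBDISP are expected to come from the bbcmd environment as in the script quoted earlier in the thread.

```shell
#!/bin/sh
# Sketch only: run one curl header check per line of a check file.
# Assumed file layout: "host url timeout", one check per line.

# Convert dots to commas, as the status message expects for hostnames.
xymon_hostname() {
  echo "$1" | ${SED:-sed} -e 's/\./,/g'
}

# check_url HOST URL TIMEOUT: fetch headers only and report to Xymon
# under the "home" column, with a per-check timeout.
check_url() {
  message=`curl -m "$3" -w 'Seconds:     %{time_total}\n' -s -S -L -I "$2"`
  if [ "$?" -eq 0 ]; then color=green; else color=red; fi
  machine=`xymon_hostname "$1"`
  $BB $BBDISP "status $machine.home $color `date`

$message"
}

# run_checks FILE: call check_url for each entry, skipping blank lines
# and comments; a missing timeout falls back to 30 seconds.
run_checks() {
  while read -r host url timeout; do
    case "$host" in
      ''|'#'*) continue ;;
    esac
    check_url "$host" "$url" "${timeout:-30}"
  done < "$1"
}

# Only act when a check file is given on the command line.
if [ -n "$1" ]; then
  run_checks "$1"
fi
```

Because each line carries its own timeout, a slow server can be given a longer limit without changing the default for every other host.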
list Larry Barber · Fri, 6 Feb 2009 23:09:12 -0600 ·
You can also stretch out your testing interval by limiting the concurrency
in bbtest-net. See the man page for the exact syntax.

Thanks,
Larry Barber
list Asif Iqbal · Sat, 7 Feb 2009 11:03:15 -0500 ·
quoted from Larry Barber
On Sat, Feb 7, 2009 at 12:09 AM, Larry Barber <user-6ef9c2864140@xymon.invalid> wrote:
You can also stretch out your testing interval by limiting the concurrency
in bbtest-net. See the man page for the exact syntax.
The problem is not on the hobbit server side, in which case reducing the
concurrency limit would help. It is actually one specific HTTP server that
takes longer than the default 10 seconds to respond. I don't want to change
the default timeout for all hosts. I was looking for a {host,service}
specific timeout. Since that does not exist, an extension script with
curl for just that one HTTP server will do the job.
-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
list Henrik Størner · Mon, 9 Feb 2009 22:56:49 +0100 ·
quoted from David Paper
On Thu, Feb 05, 2009 at 12:25:48PM -0500, David Paper wrote:
I've got a bb-hosts file with 8 server process instances getting tested.  
Each instance gets tested with 3 HTTP requests (2 GET, 1 POST).  All 8 
server processes live on the same physical OS instance.  This results in 
24 HTTP requests getting sent from hobbit within 1/100th of a second.  
This causes the load on the host to spike, and generates contention w/in 
each server to satisfy the requests.  This same setup is repeated for 
hundreds of hosts and hundreds of processes.
Yeah, Xymon can be pretty aggressive about testing network services.
It's unfortunate that you hit the same server with multiple requests 
at once, although it is a bit unusual that a web/application server
has a problem with just 24 requests simultaneously.

But of course, it depends on what your web application does.
quoted from Asif Iqbal
Is there a way to tell hobbit to take all of the entries in bb-hosts and 
test them in a random order w/in the 1 minute testing interval?  This 
would end up staggering the arrival of each HTTP test somewhat and lessen 
contention within each HTTP server and on each host.
I'm afraid not, at least not directly.

What you *could* do is set up more than one task to run the network
tests, using the "NET:foo" definition in bb-hosts to create groups of
tests that can be run simultaneously. From your description it sounds
as if you have 8 "pseudo-hosts" defined in bb-hosts, each of them with
3 HTTP tests. I.e.

   0.0.0.0  web1   # http://web1/test1 http://web1/test2 \
                     post;http://web1/test3;blababla
   0.0.0.0  web2   # http://web2/test1 http://web2/test2 \
                     post;http://web2/test3;blababla

etc. for your 8 server processes. Is that correct ? Or would it at
least be possible to configure it that way ?

If you can do that, then you can assign each of web1, web2, web3 
and so on a different NET: setting, and then you run the checks 
for each of these NET's one at a time.

E.g. if you want to run the tests for "web1" and "web2" separately
from "web3" and "web4", then your bb-hosts file would be like this:

   0.0.0.0  web1   # http://web1/test1 http://web1/test2 \
                     post;http://web1/test3;blababla \
                     NET:testgroup1
   0.0.0.0  web2   # http://web2/test1 http://web2/test2 \
                     post;http://web2/test3;blababla \
                     NET:testgroup1
   0.0.0.0  web3   # http://web3/test1 http://web3/test2 \
                     post;http://web3/test3;blababla \
                     NET:testgroup2
   0.0.0.0  web4   # http://web4/test1 http://web4/test2 \
                     post;http://web4/test3;blababla \
                     NET:testgroup2

So you have two "NET:" groups of tests - "testgroup1" and "testgroup2".
Then you change the [bbnet] task to run this script, "runnetworktests.sh",
instead of just launching bbtest-net:

   #!/bin/sh

   # Run each NET group of tests separately
   # This script passes all commandline options to bbtest-net

   BBLOCATION=testgroup1 bbtest-net "$@"
   BBLOCATION=testgroup2 bbtest-net "$@"

   # Finally, run the tests that have no NET definition
   bbtest-net --test-untagged "$@"

   exit 0

The [bbnet] task (in hobbitlaunch.cfg) would then have

   CMD runnetworktests.sh --report --ping --checkresponse

instead of the default "CMD bbtest-net --report --ping --checkresponse"


The only "problem" with this is that you get to do the configuration of
what tests can run simultaneously by hand. Also, each invocation of
bbtest-net has to parse all of the bb-hosts file, so there is a small
overhead in doing that.


Regards,
Henrik
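
With more than a couple of NET groups, the wrapper script could loop over a list of group names instead of repeating one line per group. A minimal sketch, assuming the group names are kept in a variable: NETGROUPS and BBTESTNET are hypothetical names introduced here for illustration, not Xymon settings, while BBLOCATION and --test-untagged are used as in the script above.

```shell
#!/bin/sh
# Sketch only: run bbtest-net once per NET group, one group at a time,
# then once more for hosts that have no NET definition.

# Hypothetical settings: a space-separated list of NET group names and
# the command to run (overridable, which also makes dry runs possible).
NETGROUPS=${NETGROUPS:-"testgroup1 testgroup2"}
BBTESTNET=${BBTESTNET:-bbtest-net}

# run_net_groups OPTIONS...: pass all options through to each invocation.
run_net_groups() {
  for group in $NETGROUPS; do
    BBLOCATION=$group $BBTESTNET "$@"
  done
  $BBTESTNET --test-untagged "$@"
}

# Only act when options are given, e.g. --report --ping --checkresponse
if [ "$#" -gt 0 ]; then
  run_net_groups "$@"
fi
```

The groups still run sequentially, so the trade-off described above remains: bb-hosts is parsed once per group, and the NET: assignments themselves are still maintained by hand.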
list Frank Gruellich · Fri, 13 Feb 2009 18:01:44 +0100 ·
Hi,
quoted from Henrik Størner

Henrik Størner wrote:
The only "problem" with this is that you get to do the configuration of
what tests can run simultaneously by hand.
I'd like to second that feature request (if it's not already one I would
like to make it one ;-)).  We have ~1k such "servers", but there are
only ~100 real machines.  (In fact, it's a bit more complicated.)  So it
can happen that some machines are hit by up to 20 monitoring requests
simultaneously.  These machines are no simple HTTP servers but something
more advanced.  Unfortunately each of these servers can only deal with 4
requests in parallel; others are queued up and delayed.  The same goes for
real, non-monitoring requests: they are delayed during such bursts of
monitoring requests and users of the servers have to wait.

OTOH with so many servers we can't manage several monitoring groups
keeping in mind how many checks are in one group and which one still
has "free slots" available.  Having such a randomization and spreading
in time integrated into bbtest-net would be really great.  (Thinking
about it, spreading in time is maybe difficult because you never know
how long all tests will take.  Randomization of test order should be
quite simple.  But I don't know anything about hobbit internals.)

Kind regards,
-- 
Navteq (DE) GmbH
Frank Gruellich
Map24 Systems and Networks

Duesseldorfer Strasse 40a
65760 Eschborn
Germany

Phone:      +XX XXXX XXXXX-XXX
Fax:        +XX XXXX XXXXX-XXX

USt-ID-No.: DE 197947163
Managing Directors: Thomas Golob, Alexander Wiegand,
Hans Pieter Gieszen, Martin Robert Stockman
list Henrik Størner · Sat, 14 Feb 2009 09:05:13 +0100 ·
quoted from Frank Gruellich
On Fri, Feb 13, 2009 at 06:01:44PM +0100, Frank Gruellich wrote:
OTOH with so many servers we can't manage several monitoring groups
keeping in mind how many checks are in one group and which one still
has "free slots" available.  Having such a randomization and spreading
in time integrated into bbtest-net would be really great.  (Thinking
about it, spreading in time is maybe difficult because you never know
how long all tests will take.  Randomization of test order should be
quite simple.  But I don't know anything about hobbit internals.)
You're right that randomizing the sequence of tests is simple - the
attached patch against 4.3.0 should do that nicely.

Completely untested, but it should work :-) It won't be difficult to
port over to the 4.2.3 version if needed.

Spreading things out over a longer time requires much more of a
re-design. That may happen - I have some ideas about doing a major
re-design of how network tests are done - but it will be a while
before that evolves into any code.


Regards,
Henrik
Attachments (1)
list Dave Paper · Sat, 14 Feb 2009 22:17:08 -0500 ·
Now this is what I'm talking about!

Thanks to Ralph, Asif, Larry, Frank and of course, Henrik.

The servers being tested are constantly being improved, but at the
moment they can take several seconds to respond.  While a server is
busy, it chews an entire CPU, so when Hobbit's net tests run in
parallel and hit all 8 servers on the host at once, the host runs out of
CPU.

I've dropped the hobbit concurrency down from 512 to 50, but that hasn't
had much of an effect.

The patch sounds like it'll do the trick.

Thanks!

-dave
--
Dave Paper                          user-8ca85b5297d2@xymon.invalid

"Hello, I must be going." --Groucho