Xymon Mailing List Archive

Hobbit HTTP Monitor Anomalies

4 messages in this thread

list James Wade · Thu, 18 Jun 2009 09:41:56 -0500 ·
So, last weekend, we had a huge conference call because two Production
HTTPS URLs that we monitor with Hobbit were getting timeouts exceeding
30 seconds.

No other URLs that we monitor on the Hobbit Server were getting any
type of timeout.

I have two Hobbit Servers, located at two physical sites, Production &
Development. The Production Hobbit Server reported no timeouts on the
two HTTPS URLs. However, the Development Hobbit Server (which is the
primary monitoring server) indicated that the URLs were timing out.

So, the only difference appeared to be in the network. However, our
network team looked at the network and found no latency, and they
placed a sniffer on the Development Hobbit Server and saw all HTTPS
packets coming back without timeouts.

Early this week, we placed a box that sniffs that network and can
monitor for specific events. In this case, it's connected to the same
network as the Development Hobbit Server, and it's monitoring the HTTPS
requests to the two URLs.

Last night, I received timeouts from Hobbit on the URLs, but the sniffer
showed no such timeouts.

I've checked the Hobbit server: CPU, load, network. None of the graphs
show any correlation with the timeouts.

Does anyone have any ideas on this puzzle?

Thanks..James
list James Wade · Thu, 18 Jun 2009 10:52:03 -0500 ·
We connected up a sniffer. It appears that the HTTPS request never
receives a response. Hobbit is sending it out, but a reply never occurs.

James
list Ralph Mitchell · Thu, 18 Jun 2009 19:11:00 -0500 ·
I don't know about doing retries in bb-net, but it's really quite easy to
roll your own http/https tests.  A very simple case could look something
like this:
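   #!/bin/bash

   # Fetch the page quietly, following redirects, retrying up to 3 times,
   # and giving up after 30 seconds. The page body is discarded.
   curl -s -S -L --retry 3 --max-time 30 -o /dev/null \
     https://server.domain.com/whatever
   if [ $? -eq 0 ]; then
     COLOR=green
     MESSAGE="page fetched OK"
   else
     COLOR=red
     MESSAGE="page fetch failed"
   fi

   # Report the result to the display server as the "https" column.
   $BB $BBDISP "status server,domain,com.https $COLOR $(date)

   $MESSAGE"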

The status message needs commas in the fully qualified server name,
because the column name is separated from the server name by a dot. The
content in $MESSAGE can include HTML, such as a clickable URL so your ops
staff can click through to verify the server status.
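
As a small illustration of the comma rule, the sketch below converts a
fully qualified name into the comma form and builds the first line of the
status message. The function names fqdn_to_bb and status_header are made
up for this example; they are not part of Hobbit.

```shell
#!/bin/bash

# Hypothetical helper: replace every dot in a hostname with a comma,
# so the only dot left in "host.column" separates host from column.
fqdn_to_bb() {
  printf '%s\n' "$1" | tr '.' ','
}

# Hypothetical helper: build the first line of a status message.
# Arguments: hostname, column name, color.
status_header() {
  printf 'status %s.%s %s %s\n' "$(fqdn_to_bb "$1")" "$2" "$3" "$(date)"
}

fqdn_to_bb server.domain.com    # prints server,domain,com
```

With helpers like these, the last line of the script could pass
"$(status_header server.domain.com https $COLOR)" followed by the message
body to $BB $BBDISP, instead of typing the comma form by hand.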

I've done a bunch of these in the last 8 years...  :)

Ralph Mitchell
list Buchan Milne · Fri, 19 Jun 2009 11:04:51 +0200 ·
On Thursday 18 June 2009 17:52:03 James Wade wrote:
> We connected up a sniffer. It appears that the HTTPS request never
> receives a response. Hobbit is sending it out, but a reply never occurs.

So, it looks like a network problem.

Did you try accessing the site with a browser (e.g. lynx or links) on the
Hobbit server? If it also times out, then you've eliminated Hobbit as the
problem.
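
One way to run that check without an interactive browser is curl, with a
hard time limit and a breakdown of where the time went. The sketch below
is only an illustration: the function names probe_url and
classify_curl_exit are invented here, and the URL in the usage comment is
the placeholder from Ralph's script, not the real site.

```shell
#!/bin/bash

# Hypothetical helper: fetch a URL with a 30 second limit and print how
# long the TCP connect, the TLS handshake, and the whole request took.
probe_url() {
  curl -s -S -o /dev/null --max-time 30 \
    -w 'connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' \
    "$1"
}

# Hypothetical helper: translate a few common curl exit codes into words.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "dns lookup failed" ;;
    7)  echo "connection failed" ;;
    28) echo "timeout" ;;
    35) echo "tls handshake failed" ;;
    *)  echo "other error ($1)" ;;
  esac
}

# Usage, run on the Hobbit server itself:
#   probe_url https://server.domain.com/whatever
#   classify_curl_exit $?
```

If this also reports a timeout from the Development server while the same
command succeeds from the Production server, that would point away from
Hobbit and toward something on the path between the two.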