Xymon Mailing List Archive

wake up call

22 messages in this thread

list Gavin Leonard · Mon, 19 May 2008 13:26:31 -0600 ·
Every morning at 7am I get pages from every host I monitor, including the display server, saying that its connection recovered. Then it runs great for the next 23 hours. Looking at the Hobbit web page, I see no downtime, nor do the servers show any downtime. But when I click on the historical web link to see the info, I get the error below. I really love Hobbit, but I am not a web guy at all and I think it might be Apache related...


Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, root at localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error.
More information about this error may be available in the server error log.
Apache/2.0.54 (Yellowdog) user-f8006a414c56@xymon.invalid Port 80


Gavin Leonard

Director, Systems-Network Engineering

T: XXX-XXX-XXXX
F: XXX-XXX-XXXX
E: user-d65663809eb4@xymon.invalid

Research | Marketing | Sales Generation

www.progrexion.com <http://www.progrexion.com/>


This email and its contents are confidential. If you are not the intended recipient, delete this email and do not use or disclose the information within this email or its attachments. Thank you.
list Josh Luthman · Mon, 19 May 2008 15:32:52 -0400 ·
Gavin,

I am having a very similar issue, though not every single day. In my
case every host (or almost all of the hosts) goes conn:red and then
comes back up ~60s later. I confirmed this weekend that it is not
related to the VIA NIC (I am using an Intel Pro/100 S now).

An Internal Server Error like that is almost always Apache related. Can you
post the errors in /var/log/httpd/error_log from this time period?
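
As a rough illustration of pulling out the requested log entries, the following Python sketch filters error-log lines by time of day. It assumes the Apache 2.0-style timestamp prefix (for example "[Mon May 19 07:00:03 2008]"); the function name lines_in_window and the 06:55 to 07:10 window are hypothetical choices, not something from the thread.

```python
import re
from datetime import datetime, time

# Assumed Apache 2.0-style prefix, e.g. "[Mon May 19 07:00:03 2008] [error] ..."
STAMP = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")


def lines_in_window(lines, start=time(6, 55), end=time(7, 10)):
    """Return the log lines whose time of day falls within [start, end]."""
    hits = []
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # continuation lines or other formats are skipped
        # Collapse any padding spaces before parsing the timestamp.
        text = " ".join(match.group(1).split())
        stamp = datetime.strptime(text, "%a %b %d %H:%M:%S %Y")
        if start <= stamp.time() <= end:
            hits.append(line.rstrip("\n"))
    return hits
```

Passing an open file object for /var/log/httpd/error_log (the path mentioned above) as `lines` would return only the entries logged around the 7am pages.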

Josh

-- 

Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
list Phil Wild · Tue, 20 May 2008 09:31:38 +0800 ·
Hi Josh,

This doesn't relate to the Apache error; it relates to your problem. This
is a theory...

I am wondering if you are running a caching name server on your Hobbit
installation? If not, I am wondering if fping places too high a load on
your DNS server and misses the occasional host. Even with a caching DNS
server you may see the issue every time the TTL expires.
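
One way to probe this theory is to time name lookups from the monitoring server. The Python sketch below is only an illustration: the function name time_lookups is hypothetical, and it uses the standard library's socket.gethostbyname by default rather than anything Hobbit itself does.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.gethostbyname):
    """Resolve each name once and return {name: (seconds, address or None)}."""
    results = {}
    for name in hostnames:
        started = time.perf_counter()
        try:
            address = resolve(name)
        except OSError:
            address = None  # lookup failed (socket.gaierror is an OSError)
        results[name] = (time.perf_counter() - started, address)
    return results
```

Slow or failed lookups for many names at once would point towards DNS; uniformly fast lookups would make this theory less likely.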

Phil

-- 

Tel: XXXX XXX XXX
Fax: XXXX XXX XXX
email: user-e365c1418192@xymon.invalid
list Josh Luthman · Mon, 19 May 2008 23:15:00 -0400 ·
That was someone's theory in a very large post about this issue in the
past. I did install a caching-only named on the box and it did not
fix the problem.

It did relieve the stress on my other DNS server though :)
list Phil Wild · Tue, 20 May 2008 11:21:17 +0800 ·
What is the TTL set to for your domain? It would be interesting to see if the
issue reduces with a higher TTL. Another way to rule this out as the source
of the issue would be to set the DNS server up as a slave.

Phil

list Josh Luthman · Mon, 19 May 2008 23:49:39 -0400 ·
Well, almost all (a good 99%) of my hosts have the testip tag, so it doesn't
need to look up the names. The things it does look up have 5m TTLs
though.
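
To check which hosts would still need a name lookup, a small script can list the entries without the testip tag. The Python sketch below assumes the usual hosts-file layout of "IP-address hostname # tags"; the function name hosts_without_testip is hypothetical, and any line that does not start with an IPv4 address is simply skipped.

```python
import re

# Assumed host line layout: "IP-address hostname # tag1 tag2 ..."
HOST_LINE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)\s*(?:#\s*(.*))?$")


def hosts_without_testip(lines):
    """Return hostnames whose tag list does not include 'testip'."""
    missing = []
    for line in lines:
        match = HOST_LINE.match(line)
        if not match:
            continue  # page/group directives, comments, blank lines
        _ip, hostname, tags = match.groups()
        if "testip" not in (tags or "").split():
            missing.append(hostname)
    return missing
```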


list Phil Wild · Tue, 20 May 2008 12:38:25 +0800 ·
Hmmm... bummer, there goes that theory. If you are using IP addresses and
you are still getting failures on these hosts, then DNS is not involved. A
TTL of five minutes is fairly worthless for a caching server. It only helps
if the same name is looked up again within five minutes, and as Hobbit pings
every five minutes (default), you will most likely always be pulling from
your master/slaves...
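
The TTL argument can be illustrated with a simplified model in which a name is looked up at a fixed interval and a cached answer is valid for exactly the TTL. The Python sketch below is an assumption-laden illustration (the function name cache_hits is hypothetical, and real polls have jitter), not a description of how any particular resolver behaves.

```python
def cache_hits(ttl_seconds, interval_seconds, polls):
    """Count how many of `polls` periodic lookups are answered from cache."""
    hits = 0
    expires_at = None  # time at which the cached answer stops being valid
    for n in range(polls):
        now = n * interval_seconds
        if expires_at is not None and now < expires_at:
            hits += 1
        else:
            # Cache miss: ask the upstream server and cache the answer again.
            expires_at = now + ttl_seconds
    return hits
```

In this model, a 300-second TTL with a 300-second poll interval never produces a cache hit, while a one-hour TTL answers 11 of 12 consecutive polls from cache.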

Phil

list Gavin Leonard · Tue, 20 May 2008 09:27:50 -0600 ·
It happened again this morning, so I am going to try a different DNS server.

-Gavin
list Josh Luthman · Tue, 20 May 2008 11:37:44 -0400 ·
What most people suggest is having a local DNS server on the Hobbitmon
server itself.

As this is happening at the same time every single day, I don't believe DNS
would be the cause of the issue, though it is worth taking a look at until
another idea comes along.

list Katherine Cont Spawar Itc Hosch · Tue, 20 May 2008 10:52:09 -0500 ·
Check your cron jobs for Apache log restarts...
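
Since the pages arrive at 7am, one quick check is to list the cron entries that run during that hour. The Python sketch below is a minimal illustration, assuming standard crontab lines whose first two fields are minute and hour; the helper names hour_matches and jobs_at_hour are hypothetical, and month or weekday restrictions are ignored.

```python
def hour_matches(field, hour):
    """Return True if a crontab hour field ('7', '*', '6-8', '*/2', '1,7') includes `hour`."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            low, high = 0, 23
        elif "-" in rng:
            low, high = (int(x) for x in rng.split("-"))
        else:
            low = high = int(rng)
        if low <= hour <= high and (hour - low) % step == 0:
            return True
    return False


def jobs_at_hour(lines, hour=7):
    """Return the crontab lines scheduled to run during the given hour."""
    jobs = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#") or text.startswith("@"):
            continue
        fields = text.split(None, 5)
        if len(fields) < 6 or "=" in fields[0]:
            continue  # environment settings such as SHELL=/bin/sh
        try:
            if hour_matches(fields[1], hour):
                jobs.append(text)
        except ValueError:
            continue  # not a schedule line this sketch understands
    return jobs
```

Jobs with "*" in the hour field are included because they also run at 7am, even though they are unlikely to explain a once-a-day event.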
list Phil Wild · Wed, 21 May 2008 09:11:51 +0800 ·
Can I suggest you use IP addresses for a number of servers and see if they
survive through your next episode. That will give you an idea of where the
problem might be...

It is the least amount of work towards identifying the cause.

Cheers

Phil
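
To pick candidates for this test, it can help to list which bb-hosts entries still depend on a name lookup. The sketch below assumes the usual `IP hostname # tags` layout of bb-hosts and treats an entry as DNS-dependent when it lacks the `testip` tag mentioned earlier in the thread or uses the 0.0.0.0 placeholder address (an assumption about how the file is filled in). The function name and the sample entries are made up for illustration.

```python
def dns_dependent_hosts(bb_hosts_text: str) -> list[str]:
    """Return hostnames whose tests would still need a DNS lookup."""
    names = []
    for line in bb_hosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entry, _, tags = line.partition("#")
        fields = entry.split()
        if len(fields) < 2 or not fields[0][0].isdigit():
            continue  # skip layout directives such as "page" or "group"
        ip, hostname = fields[0], fields[1]
        if ip == "0.0.0.0" or "testip" not in tags.split():
            names.append(hostname)
    return names


# Hypothetical bb-hosts contents, for illustration only.
sample_bb_hosts = """\
page servers Servers
group Web
10.0.0.5   www1.example.com   # testip http://www1.example.com/
10.0.0.6   www2.example.com   # http://www2.example.com/
0.0.0.0    mail.example.com   # testip smtp
"""

for name in dns_dependent_hosts(sample_bb_hosts):
    print(name)
```

Hosts that the function does not list are already tested by IP, so if they also go red during the next episode, DNS is unlikely to be the cause.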

-- 
Tel: XXXX XXX XXX
Fax: XXXX XXX XXX
email: user-e365c1418192@xymon.invalid
list Josh Luthman · Tue, 20 May 2008 21:38:07 -0400 ·
Also, since you are lucky enough to have this problem at the same time
every day, I would advise doing a packet capture with tcpdump.
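
Because the outage recurs at a known time, a capture can be started a few minutes before it and limited to the traffic of interest. The Python sketch below only builds the command line; it assumes the coreutils `timeout` utility is available to stop the capture, and the interface name, output path, and duration are placeholders. The filter keeps ICMP (the pings) and DNS on port 53.

```python
import shlex


def tcpdump_command(interface: str, outfile: str, seconds: int) -> list[str]:
    """Build a time-limited tcpdump invocation for ICMP and DNS traffic."""
    return [
        "timeout", str(seconds),  # stop the capture after this many seconds
        "tcpdump",
        "-i", interface,          # interface to capture on
        "-n",                     # do not resolve addresses to names
        "-w", outfile,            # write raw packets to a file for later analysis
        "icmp or port 53",        # capture filter: pings and DNS only
    ]


# Placeholder values: eth0, a file under /tmp, and a 15-minute window.
cmd = tcpdump_command("eth0", "/tmp/wakeup.pcap", 900)
print(shlex.join(cmd))
```

The printed command needs root privileges to run, for example from a cron entry set a few minutes before 7am; the resulting file can be read back with `tcpdump -r`.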
-- 
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
list Gavin Leonard · Tue, 20 May 2008 22:17:48 -0600 ·
Most if not all of my servers are defined by IP anyway. I have a very segmented network, so DNS is not very helpful across all the different domains and subnets; I use my hosts file for the most part. Now that I think of it, I wonder if the ones in the hosts file are still OK? I will let you know...

-Gavin

list Josh Luthman · Wed, 21 May 2008 00:23:38 -0400 ·
Thanks for the heads up.  I am very interested in knowing the cause and,
more importantly, the solution to your issue, as it may fix mine!

It would be VERY nice to be able to print out uptime and availability reports
without the dozens of 1-minute outages.  I know my issue is related to the
box itself (hardware or software), as the issue appears on the hobbit server
itself.

-- 
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
list Gavin Leonard · Wed, 21 May 2008 10:02:01 -0600 ·
OK, well, it did not do it this morning after I added all of my monitored hosts to the /etc/hosts file. I just copied my bb-hosts file into my /etc/hosts file and modified it into the proper format. No pages this morning, so it could have been a DNS issue. If I am clear for three more mornings then I will be satisfied. I will let you know.

-Gavin
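
For anyone wanting to try the same workaround, here is a rough sketch of the bb-hosts to /etc/hosts conversion described above. It is not the procedure Gavin actually used (he edited by hand). The function name and the sample hosts are made up for illustration, and it assumes the usual bb-hosts layout of "IP hostname # tags", with 0.0.0.0 taken to mean "look the name up in DNS".

```python
import ipaddress


def is_ipv4(token):
    """Return True if token parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(token)
        return True
    except ValueError:
        return False


def bbhosts_to_etc_hosts(lines):
    """Turn bb-hosts style lines into /etc/hosts style 'IP<TAB>hostname' lines."""
    out = []
    for raw in lines:
        # Everything after '#' holds the test tags in bb-hosts; /etc/hosts has no use for it.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line or comment-only line
        ip, name = fields[0], fields[1]
        if not is_ipv4(ip):
            continue  # layout directives such as "page" or "group" do not start with an IP
        if ip == "0.0.0.0":
            continue  # assumed placeholder for "resolve via DNS": no address to pin
        out.append(f"{ip}\t{name}")
    return out


# Hypothetical sample input, not taken from any real bb-hosts file.
sample = [
    "page servers Production servers",
    "group-compress Web",
    "10.0.0.5   web01.example.com   # http://web01.example.com/ testip",
    "0.0.0.0    dnsonly.example.com # conn",
    "",
    "10.0.0.6   db01.example.com    # testip",
]
print("\n".join(bbhosts_to_etc_hosts(sample)))
```

The sketch only prints the converted lines. Review them before appending anything to the real /etc/hosts, since a stale entry there will override DNS for that name.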

list Josh Luthman · Wed, 21 May 2008 12:07:05 -0400 ·
After those three mornings, would you mind commenting out those hosts to be
certain that it reproduces the issue?
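
One possible way to make that test easy to undo is sketched below. The helper name, the "#DISABLED# " marker and the sample entries are all hypothetical. It works on a list of lines rather than on /etc/hosts itself, so nothing is changed until you write the result back yourself.

```python
MARKER = "#DISABLED# "  # hypothetical marker, so disabled entries differ from ordinary comments


def set_commented(hosts_lines, names, commented):
    """Return a copy of /etc/hosts lines with the entries for `names` disabled or restored."""
    result = []
    for line in hosts_lines:
        # Look at the entry itself, whether or not it is currently disabled.
        body = line[len(MARKER):] if line.startswith(MARKER) else line
        fields = body.split("#", 1)[0].split()
        # fields[0] is the address; the rest are the hostname and any aliases.
        matches = len(fields) >= 2 and any(f in names for f in fields[1:])
        if matches:
            result.append(MARKER + body if commented else body)
        else:
            result.append(line)  # leave unrelated lines exactly as they were
    return result


# Hypothetical sample entries.
before = [
    "127.0.0.1\tlocalhost",
    "10.0.0.5\tweb01.example.com",
    "10.0.0.6\tdb01.example.com",
]
off = set_commented(before, {"web01.example.com"}, True)
print("\n".join(off))
# Restoring the same names gives back the original lines.
print(set_commented(off, {"web01.example.com"}, False) == before)
```

If the 7am pages come back only for the hosts that were commented out, that points fairly strongly at name resolution.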

-- 
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
list Gavin Leonard · Wed, 21 May 2008 10:12:31 -0600 ·
Sure, just give me your pager number and they can wake you up... :)

-Gavin

list Josh Luthman · Wed, 21 May 2008 12:20:39 -0400 ·
Tell me what email they're coming from and use user-4c45a83f15cb@xymon.invalid

On Wed, May 21, 2008 at 12:12 PM, Gavin Leonard <user-d65663809eb4@xymon.invalid>
wrote:
 Sure.. just give me your pager # and they can wake you up… J
quoted from Gavin Leonard


-Gavin


*From:* Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
*Sent:* Wednesday, May 21, 2008 10:07 AM

*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] wake up call


After those three mornings would mind commenting those hosts to be certain
that reproduces the issue?

On Wed, May 21, 2008 at 12:02 PM, Gavin Leonard <user-d65663809eb4@xymon.invalid>
wrote:

Ok.. well it did not do it this morning after adding all of my monitored
hosts to the /etc/hosts file… I just cut and copied my bb-hosts file in to
my /etc/hosts file, modified in to proper format.. no pages this morning..
so it could have been a dns issue.. if I am clear for three more mornings
then I will be satisfied… I will let you know..


-Gavin


*From:* Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
*Sent:* Tuesday, May 20, 2008 10:24 PM


*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] wake up call


Thanks for the heads up.  I am very interested in knowing what is the cause
and more importantly the solution to your issue, as it may fix mine!

It would VERY nice to be able to print out uptime and availability reports
without the dozens of 1 minute outages.  I know my issue is related to the
box itself (hardware or software) as the issue appears on the hobbit server
itself.

On Wed, May 21, 2008 at 12:17 AM, Gavin Leonard <user-d65663809eb4@xymon.invalid>
wrote:

Most if not all of my servers are defined by ip anyway, I have a very
segmented network so dns is not very helpful across all the different
domains and subnets.. i use my hosts file for the most part.. now that I
think of it, I wonder if the ones in the host file are still ok?  I will let
you know…


-Gavin


*From:* Phil Wild [mailto:user-e365c1418192@xymon.invalid]
*Sent:* Tuesday, May 20, 2008 7:12 PM


*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] wake up call


Can I suggest you use IP addresses for a number of servers and see if they
survive through your next episode. That will give you an idea of where the
problem might be...


It is the least amount of work towards identifying the cause.


Cheers


Phil

2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <user-f2d837e5c776@xymon.invalid>:

Check your apache log restarts in cron....


-----Original Message-----
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Tuesday, May 20, 2008 10:38
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] wake up call

What most people suggest is having a local DNS server, on the Hobbitmon
server itself.

As this is happening at the same time every single day I don't believe
DNS would be the cause of the issue, though it is worth taking a look at
until another idea comes along.


On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
<user-d65663809eb4@xymon.invalid> wrote:


       Happened again this morning.. so I am going to try a different
dns server.


       -Gavin


       From: Phil Wild [mailto:user-e365c1418192@xymon.invalid]
       Sent: Monday, May 19, 2008 10:38 PM
       To: user-ae9b8668bcde@xymon.invalid
       Subject: Re: [hobbit] wake up call


       Hmmm... bummer, there goes that theory... If you are using IP
addresses, and you are still getting failures on these hosts, then dns
is not involved. A ttl of five minutes is fairly worthless for a caching
server. It only helps if it hits the same device within five minutes, as
hobbit is pinging every five mins (default), you will most likely always
be pulling from your master/slaves...


       Phil

       2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:

       Well almost (good 99%) of my hosts have the testip tag, so it
doesn't
       need to look up the names.  The things it does look up are 5m
TTLs

       though.


       On 5/19/08, Phil Wild <user-e365c1418192@xymon.invalid> wrote:
What is ttl set to for your domain? It would be interesting to
see if the
issue reduces with a higher ttl. Another way to ensure this is
not the area
of the issue would be to set the dns server up as a slave.

Phil

2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:
That was someone's theory in a very large post about this
issue in the
past.  I did install a caching only named on the box and it
did not
fix the problem.

Did relieve the stress of my other DNS server though :)


On 5/19/08, Phil Wild <user-e365c1418192@xymon.invalid> wrote:
Hi Josh,

This doesn't relate to the apache error, it relates to your
problem...
This
is a theory...

I am wondering if you are running a caching name server on
your hobbit
installation? If not, I am wondering if the fping places
too high a load
on
your dns server and misses the occassional host. Even with
a caching dns
server you may see the issue every time ttl expires.

Phil

-- 
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
list Rafal Roginela · Wed, 21 May 2008 12:13:11 -0500 ·
Hi All,

 
Having an issue with Hobbit and fping. When I set the FPING line in my
hobbitserver.cfg to FPING="/usr/sbin/fping -c3", my conn test breaks and
all hosts go red. I want to be able to do 3 pings to each host to get an
average ping time. Any help would be appreciated.

 
Thank You,

Rafal Roginela
list Josh Luthman · Wed, 21 May 2008 13:41:11 -0400 ·
essex ~ -> fping 127.0.0.1
127.0.0.1 is alive
essex ~ -> fping -c3 127.0.0.1
127.0.0.1 : [0], 84 bytes, 0.12 ms (0.12 avg, 0% loss)
127.0.0.1 : [1], 84 bytes, 0.03 ms (0.07 avg, 0% loss)
127.0.0.1 : [2], 84 bytes, 0.07 ms (0.07 avg, 0% loss)

This is probably why it is pissed off =P
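
To spell out the difference between the two outputs above: without -c, fping prints one "is alive" line per host, while with -c3 it prints one line per reply, so anything looking for the "is alive" format would find nothing to match. If the goal is just an average ping time, a minimal Python sketch along these lines could compute it outside of Hobbit. This is not part of Hobbit; the function name and the regex are my own assumptions, based only on the output shown above.

```python
import re

# Matches per-reply lines as printed by `fping -c3` in the transcript above, e.g.
# "127.0.0.1 : [0], 84 bytes, 0.12 ms (0.12 avg, 0% loss)"
REPLY_RE = re.compile(r"^(\S+)\s*:\s*\[(\d+)\],\s*\d+ bytes,\s*([\d.]+) ms")


def average_rtts(fping_output):
    """Return {host: mean round-trip time in ms} for each host with replies."""
    times = {}
    for line in fping_output.splitlines():
        match = REPLY_RE.match(line.strip())
        if match:
            host, _seq, rtt = match.groups()
            times.setdefault(host, []).append(float(rtt))
    return {host: sum(rtts) / len(rtts) for host, rtts in times.items()}


# Sample input copied from the transcript above.
SAMPLE = """\
127.0.0.1 : [0], 84 bytes, 0.12 ms (0.12 avg, 0% loss)
127.0.0.1 : [1], 84 bytes, 0.03 ms (0.07 avg, 0% loss)
127.0.0.1 : [2], 84 bytes, 0.07 ms (0.07 avg, 0% loss)
"""

if __name__ == "__main__":
    for host, avg in average_rtts(SAMPLE).items():
        print(f"{host} {avg:.2f} ms")
```

A plain "127.0.0.1 is alive" line does not match the pattern, so it is simply skipped.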

-- 
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
list Rafal Roginela · Wed, 21 May 2008 12:46:24 -0500 ·
Hi Josh,

 
Yeah I think so too.. but thought that someone might be able to help
make it happen...

 
Thank You,

Rafal Roginela
list Phil Wild · Thu, 22 May 2008 09:36:16 +0800 ·
It sure sounds like your issue is with your DNS servers...

There are a couple of other things to try...

You can set --dns=ip for bbtest-net. This will tell hobbit to use the IPs
specified in your bb-hosts file rather than passing the names to the OS name
resolution libraries.

I would expect you will get the same result as you have now with all IPs
defined in /etc/hosts. It would be very interesting to know why this happens
at the same time every day. Can you describe your network and DNS topology?
What settings do you have in your SOA?

Cheers

Phil

2008/5/22 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:
Tell me what email they're coming from and use user-4c45a83f15cb@xymon.invalid


On Wed, May 21, 2008 at 12:12 PM, Gavin Leonard <user-d65663809eb4@xymon.invalid>
wrote:
Sure.. just give me your pager # and they can wake you up… :)


-Gavin


*From:* Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
*Sent:* Wednesday, May 21, 2008 10:07 AM

*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] wake up call


After those three mornings, would you mind commenting out those hosts to be
certain that it reproduces the issue?

On Wed, May 21, 2008 at 12:02 PM, Gavin Leonard <user-d65663809eb4@xymon.invalid>
wrote:

Ok.. well it did not do it this morning after adding all of my monitored
hosts to the /etc/hosts file… I just cut and copied my bb-hosts file in to
my /etc/hosts file, modified in to proper format.. no pages this morning..
so it could have been a dns issue.. if I am clear for three more mornings
then I will be satisfied… I will let you know..


-Gavin
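
The bb-hosts to /etc/hosts conversion described above can be sketched roughly as follows. This is a minimal Python sketch, not the procedure actually used here: it assumes host lines in bb-hosts look like "IP hostname # tags", that everything else (page/group lines, comments) can be skipped, and that 0.0.0.0 entries have no fixed address worth copying. The function name and sample hosts are made up for illustration.

```python
import ipaddress


def bbhosts_to_etc_hosts(bbhosts_text):
    """Turn bb-hosts host entries into /etc/hosts lines ("IP<TAB>hostname")."""
    out = []
    for raw in bbhosts_text.splitlines():
        # Drop the tag section after '#'; this also drops full-line comments.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a host line (e.g. page/group directives)
        if addr.is_unspecified:
            continue  # 0.0.0.0: no fixed address to put in /etc/hosts
        out.append(f"{addr}\t{fields[1]}")
    return out


# Hypothetical bb-hosts content, for illustration only.
SAMPLE = """\
page servers Servers
group Web
10.0.0.5   www.example.com   # http://www.example.com/
0.0.0.0    dns-only.example.com  # conn
# 10.0.0.9 retired.example.com
"""

if __name__ == "__main__":
    print("\n".join(bbhosts_to_etc_hosts(SAMPLE)))
```

Review the output before appending it to /etc/hosts, since duplicate or stale entries there will silently override DNS.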


*From:* Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
*Sent:* Tuesday, May 20, 2008 10:24 PM


*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] wake up call


Thanks for the heads up.  I am very interested in knowing what is the
cause and more importantly the solution to your issue, as it may fix mine!

It would be VERY nice to be able to print out uptime and availability reports
without the dozens of 1 minute outages.  I know my issue is related to the
box itself (hardware or software) as the issue appears on the hobbit server
itself.

On Wed, May 21, 2008 at 12:17 AM, Gavin Leonard <user-d65663809eb4@xymon.invalid>
wrote:

Most if not all of my servers are defined by ip anyway, I have a very
segmented network so dns is not very helpful across all the different
domains and subnets.. i use my hosts file for the most part.. now that I
think of it, I wonder if the ones in the host file are still ok?  I will let
you know…


-Gavin


*From:* Phil Wild [mailto:user-e365c1418192@xymon.invalid]
*Sent:* Tuesday, May 20, 2008 7:12 PM


*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] wake up call


Can I suggest you use IP addresses for a number of servers and see if they
survive through your next episode. That will give you an idea of where the
problem might be...


It is the least amount of work towards identifying the cause.


Cheers


Phil

2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <user-f2d837e5c776@xymon.invalid>:

Check your apache log restarts in cron....


-----Original Message-----
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Tuesday, May 20, 2008 10:38
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] wake up call

What most people suggest is having a local DNS server, on the Hobbitmon
server itself.

As this is happening at the same time every single day I don't believe
DNS would be the cause of the issue, though it is worth taking a look at
until another idea comes along.


On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
<user-d65663809eb4@xymon.invalid> wrote:


       Happened again this morning.. so I am going to try a different dns server.


       -Gavin


       From: Phil Wild [mailto:user-e365c1418192@xymon.invalid]
       Sent: Monday, May 19, 2008 10:38 PM
       To: user-ae9b8668bcde@xymon.invalid
       Subject: Re: [hobbit] wake up call


       Hmmm... bummer, there goes that theory... If you are using IP
       addresses, and you are still getting failures on these hosts, then dns
       is not involved. A ttl of five minutes is fairly worthless for a caching
       server. It only helps if it hits the same device within five minutes;
       as hobbit is pinging every five mins (default), you will most likely
       always be pulling from your master/slaves...


       Phil
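
To put rough numbers on the TTL argument above, here is a minimal sketch in Python. It is a toy model, not how named or Hobbit actually behave: it assumes a single name looked up at exactly regular intervals and a cache that treats a record as expired once its age reaches the TTL. The function name is made up for illustration.

```python
def upstream_queries(ttl_s, poll_interval_s, polls):
    """Count lookups that miss a TTL-based cache and go to the master/slaves."""
    misses = 0
    expires_at = None  # time at which the cached record stops being valid
    for n in range(polls):
        now = n * poll_interval_s
        if expires_at is None or now >= expires_at:
            # Cache is empty or stale: ask upstream and cache the answer.
            misses += 1
            expires_at = now + ttl_s
    return misses


if __name__ == "__main__":
    # One hour of polling at the default five-minute interval.
    print(upstream_queries(ttl_s=300, poll_interval_s=300, polls=12))
    print(upstream_queries(ttl_s=3600, poll_interval_s=300, polls=12))
```

Under these assumptions, a 5 minute TTL with 5 minute polling sends all 12 lookups in an hour upstream, while a 1 hour TTL sends only the first one.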

       2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:

       Well almost all (a good 99%) of my hosts have the testip tag, so it
       doesn't need to look up the names.  The things it does look up are
       5m TTLs though.


-- 
Tel: XXXX XXX XXX
Fax: XXXX XXX XXX
email: user-e365c1418192@xymon.invalid