wake up call
list Gavin Leonard
Every morning at 7am I get pages from every host I monitor, including the display server, saying that its connection recovered. Then it runs great for the next 23 hours. Looking at the Hobbit web page I see no downtime, nor do the servers show any downtime. But when I click on the historical web link to see the info, I get the error below. I really love Hobbit, but I am not a web guy at all, and I think it might be Apache related.

Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, root at localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error. More information about this error may be available in the server error log.
Apache/2.0.54 (Yellowdog) user-f8006a414c56@xymon.invalid Port 80

Gavin Leonard
Director, Systems-Network Engineering
T XXX-XXX-XXXX F XXX-XXX-XXXX E user-d65663809eb4@xymon.invalid
Research | Marketing | Sales Generation www.progrexion.com

This email and its contents are confidential. If you are not the intended recipient, delete this email and do not use or disclose the information within this email or its attachments. Thank you.
list Josh Luthman
Gavin,

I am having a very similar issue, though it does not happen every single day. In my case every host (or almost all of the hosts) goes conn:red and then comes back up about 60 seconds later. I confirmed this weekend that it is not related to the Via NIC (I am using an Intel Pro/100 S now).

An Internal Server Error like the one you posted is almost always Apache related. Can you post the errors in /var/log/httpd/error_log from this time period?

Josh
--
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX
Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
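To narrow the error log down to the 7am window Josh asks about, something like the following could help. This is a minimal sketch, not part of Hobbit or Apache: it assumes the Apache 2.0 error_log timestamp layout (for example "[Mon May 19 07:00:03 2008]"), and the names entries_between and print_window are made up for illustration.

```python
import re
from datetime import datetime, time

# Assumed Apache 2.0 error_log prefix, e.g. "[Mon May 19 07:00:03 2008]".
STAMP = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]")


def entries_between(lines, start, end):
    """Return the log lines whose time of day falls within [start, end]."""
    selected = []
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # continuation line or a format this sketch does not know
        stamp = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        if start <= stamp.time() <= end:
            selected.append(line.rstrip("\n"))
    return selected


def print_window(path, start, end):
    """Print every entry in the log file at `path` inside the time window."""
    with open(path, errors="replace") as log:
        for entry in entries_between(log, start, end):
            print(entry)
```

Calling print_window("/var/log/httpd/error_log", time(6, 55), time(7, 5)) would list the entries logged within five minutes of 7am on any day. The path is the one from Josh's message and may differ on other distributions.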
list Phil Wild
Hi Josh,

This doesn't relate to the Apache error; it relates to your problem. This is only a theory: are you running a caching name server on your Hobbit installation? If not, I wonder whether fping places too high a load on your DNS server and it misses the occasional host. Even with a caching DNS server you may see the issue every time the TTL expires.

Phil
--
Tel: XXXX XXX XXX
Fax: XXXX XXX XXX
email: user-e365c1418192@xymon.invalid
list Josh Luthman
That was someone's theory in a very large post about this issue in the past. I did install a caching-only named on the box and it did not fix the problem. It did relieve the stress on my other DNS server, though :)
list Phil Wild
What is the TTL set to for your domain? It would be interesting to see whether the issue is reduced with a higher TTL. Another way to rule out this area would be to set the DNS server up as a slave.

Phil
list Josh Luthman
Well, almost all (a good 99%) of my hosts have the testip tag, so Hobbit doesn't need to look up their names. The things it does look up have 5-minute TTLs, though.
list Phil Wild
Hmmm... bummer, there goes that theory. If you are using IP addresses and you are still getting failures on those hosts, then DNS is not involved.

A TTL of five minutes is fairly worthless for a caching server. It only helps if a lookup hits the same device again within five minutes; since Hobbit pings every five minutes (the default), you will most likely always be pulling from your master/slaves.

Phil
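Phil's point about a five-minute TTL against a five-minute poll can be checked with a small model. The sketch below is an idealized assumption, not how named or Hobbit actually schedule work: it looks up one name exactly once per poll cycle and caches the answer for the full TTL after every miss. The function name cache_hits is made up for illustration.

```python
def cache_hits(ttl_seconds, poll_interval_seconds, polls):
    """Count cache hits for one name looked up once per poll cycle."""
    hits = 0
    expires_at = None  # nothing is cached before the first lookup
    for n in range(polls):
        now = n * poll_interval_seconds
        if expires_at is not None and now < expires_at:
            hits += 1  # answer is still cached
        else:
            # Cache miss: ask the upstream server and cache the answer
            # for the full TTL (an assumption of this model).
            expires_at = now + ttl_seconds
    return hits
```

In this model, whenever the TTL is no longer than the poll interval, the cached answer has always expired by the next poll, so every lookup goes upstream. With a one-hour TTL, most lookups in each hour are served from the cache.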
list Gavin Leonard
Happened again this morning, so I am going to try a different DNS server.

-Gavin
list Josh Luthman
What most people suggest is having a local DNS server on the Hobbitmon server itself.

As this is happening at the same time every single day, I don't believe DNS is the cause of the issue, though it is worth taking a look at until another idea comes along.
list Katherine Cont Spawar Itc Hosch
Check your Apache log-rotation restarts in cron.
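One way to follow up on the cron suggestion is to list every crontab entry whose hour field includes 7. The sketch below is a rough aid under stated assumptions, not a full cron parser: it only understands numeric hour fields with `*`, lists, ranges and steps, it skips `@daily`-style shortcuts, and the names hour_matches and jobs_at_hour are made up for illustration.

```python
def hour_matches(field, hour):
    """Return True if a crontab hour field includes the given hour (0-23)."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            low, high = 0, 23
        elif "-" in rng:
            low, high = (int(x) for x in rng.split("-"))
        else:
            low = high = int(rng)
        if low <= hour <= high and (hour - low) % step == 0:
            return True
    return False


def jobs_at_hour(lines, hour):
    """Return the crontab entries whose hour field (second column) matches."""
    matches = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#") or text.startswith("@"):
            continue  # blank line, comment, or @daily-style shortcut
        fields = text.split()
        if len(fields) < 6 or "=" in fields[0]:
            continue  # variable assignment or malformed entry
        try:
            if hour_matches(fields[1], hour):
                matches.append(text)
        except ValueError:
            continue  # hour field uses syntax this sketch does not handle
    return matches
```

Feeding it the lines of /etc/crontab, or the output of `crontab -l` for root and the web server user, with hour 7 gives a short list of candidates. Entries with `*` in the hour field also match, because they run during that hour too. On many distributions log rotation runs from the daily cron directory, so the time at which that directory is scheduled is worth checking as well.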
▸
-----Original Message-----
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Tuesday, May 20, 2008 10:38
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] wake up call
What most people suggest is having a local DNS server, on the Hobbitmon
server itself.
As this is happening at the same time every single day I don't believe
DNS would be the cause of the issue, though it is worth taking a look at
until another idea comes along.
On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
<user-d65663809eb4@xymon.invalid> wrote:
Happened again this morning.. so I am going to try a different
dns server.
-Gavin
From: Phil Wild [mailto:user-e365c1418192@xymon.invalid]
Sent: Monday, May 19, 2008 10:38 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] wake up call
Hmmm... bummer, there goes that theory... If you are using IP
addresses, and you are still getting failures on these hosts, then dns
is not involved. A ttl of five minutes is fairly worthless for a caching
server. It only helps if it hits the same device within five minutes, as
hobbit is pinging every five mins (default), you will most likely always
be pulling from your master/slaves...
Phil
2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:
Well almost (good 99%) of my hosts have the testip tag, so it
doesn't
need to look up the names. The things it does look up are 5m
TTLs
though.
On 5/19/08, Phil Wild <user-e365c1418192@xymon.invalid> wrote:
What is ttl set to for your domain? It would be interesting to see if the issue reduces with a higher ttl. Another way to ensure this is not the area of the issue would be to set the dns server up as a slave.
Phil
2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:
That was someone's theory in a very large post about this issue in the past. I did install a caching only named on the box and it did not fix the problem. Did relieve the stress of my other DNS server though :)
On 5/19/08, Phil Wild <user-e365c1418192@xymon.invalid> wrote:
Hi Josh, This doesn't relate to the apache error, it relates to your problem... This is a theory... I am wondering if you are running a caching name server on your hobbit installation? If not, I am wondering if the fping places too high a load on your dns server and misses the occasional host. Even with a caching dns server you may see the issue every time ttl expires.
Phil
list Phil Wild
Can I suggest you use IP addresses for a number of servers and see if they survive through your next episode. That will give you an idea of where the problem might be... It is the least amount of work towards identifying the cause.
Cheers
Phil
2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <user-f2d837e5c776@xymon.invalid>:
Check your apache log restarts in cron....
-----Original Message-----
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Tuesday, May 20, 2008 10:38
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] wake up call
What most people suggest is having a local DNS server, on the Hobbitmon server itself. As this is happening at the same time every single day I don't believe DNS would be the cause of the issue, though it is worth taking a look at until another idea comes along.
On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard <user-d65663809eb4@xymon.invalid> wrote:
Happened again this morning.. so I am going to try a different dns server.
-Gavin
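For the cron suggestion quoted above, one way to see what is scheduled around 7am is to scan a system crontab for entries whose hour field includes 7. The Python below is a minimal sketch under several assumptions: it expects the `/etc/crontab` layout (minute, hour, day, month, weekday, user, command), it only understands `*`, lists, ranges and `/step` in the hour field, and the sample crontab (including the 07:00 logrotate line) is invented for illustration. The helper names are hypothetical.

```python
from typing import List

# Invented sample in /etc/crontab format; not taken from any real host.
SAMPLE_CRONTAB = """\
SHELL=/bin/bash
# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
00 7 * * * root /usr/sbin/logrotate /etc/logrotate.conf
"""


def hour_matches(field: str, hour: int) -> bool:
    """Return True if a crontab hour field (e.g. '*', '4,7', '6-8', '*/2') includes `hour`."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            lo, hi = 0, 23
        elif "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
        else:
            lo = hi = int(part)
        if lo <= hour <= hi and (hour - lo) % step == 0:
            return True
    return False


def jobs_at_hour(crontab_text: str, hour: int) -> List[str]:
    """List the crontab lines that can fire during the given hour."""
    matches = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip environment assignments (SHELL=...) and lines too short to be jobs.
        if len(fields) < 6 or "=" in fields[0]:
            continue
        try:
            if hour_matches(fields[1], hour):
                matches.append(line)
        except ValueError:
            continue  # non-numeric hour field, e.g. an @daily shortcut
    return matches


if __name__ == "__main__":
    for job in jobs_at_hour(SAMPLE_CRONTAB, 7):
        print(job)
```

Hourly jobs show up too, since they also run during that hour. On a real box the text would come from `/etc/crontab`, the files under `/etc/cron.d`, and each user's crontab, all of which this sketch leaves to the reader.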
list Josh Luthman
Also, since you are lucky enough to have this problem at the same time every day, I would advise doing a packet capture with tcpdump.
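A possible shape for that capture, sketched in Python so the command can be built and inspected before running it. The choices here are assumptions, not something stated in the thread: the `any` pseudo-interface (Linux only), the output path `/tmp/wakeup.pcap`, and a filter of `icmp or port 53` to catch both the pings and any DNS lookups. The function names are made up for the example.

```python
import shlex


def tcpdump_argv(interface="any", outfile="/tmp/wakeup.pcap",
                 capture_filter="icmp or port 53"):
    """Build a tcpdump argument list for a capture around the 7am event.

    -i  interface to listen on ('any' is a Linux-only pseudo-interface)
    -n  do not resolve addresses to names (avoids adding DNS traffic of its own)
    -w  write raw packets to a file for later inspection
    """
    return ["tcpdump", "-i", interface, "-n", "-w", outfile, capture_filter]


def command_line(argv):
    """Render an argument list as a copy-and-paste shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Print rather than execute: tcpdump needs root, and the capture should be
    # started a few minutes before 7am and stopped once the pages have arrived.
    print(command_line(tcpdump_argv()))
```

Printing the command instead of running it keeps the sketch harmless. The resulting pcap file can be read back later with `tcpdump -r`.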
list Gavin Leonard
Most if not all of my servers are defined by IP anyway. I have a very segmented network, so DNS is not very helpful across all the different domains and subnets.. I use my hosts file for the most part.. Now that I think of it, I wonder if the ones in the hosts file are still OK? I will let you know...
-Gavin
list Josh Luthman
Thanks for the heads up. I am very interested in knowing what is the cause and, more importantly, the solution to your issue, as it may fix mine! It would be VERY nice to be able to print out uptime and availability reports without the dozens of 1 minute outages. I know my issue is related to the box itself (hardware or software) as the issue appears on the hobbit server itself.
list Gavin Leonard
OK.. well, it did not do it this morning after adding all of my monitored hosts to the /etc/hosts file... I just copied my bb-hosts file into my /etc/hosts file and modified it into the proper format.. No pages this morning, so it could have been a DNS issue.. If I am clear for three more mornings then I will be satisfied... I will let you know..
-Gavin
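The bb-hosts to /etc/hosts conversion described above could be scripted roughly as follows. This is a sketch, not the procedure Gavin actually used: it assumes each host line in bb-hosts has the form `IP hostname # tags`, treats anything whose first field is not an IP address (page, group and similar layout lines) as something to skip, and also skips `0.0.0.0` entries on the understanding that they mean "look the name up in DNS". The sample data and the function name are invented.

```python
import ipaddress

# Invented bb-hosts sample; addresses and names are placeholders.
SAMPLE_BB_HOSTS = """\
page servers Servers
group-compress Web
10.0.0.5   web01.example.com   # testip http://web01.example.com/
0.0.0.0    mail.example.com    # smtp
# 10.0.0.9 old.example.com
10.0.1.7   db01.example.com
"""


def bb_hosts_to_etc_hosts(bb_hosts_text):
    """Return /etc/hosts style lines ('IP<TAB>name') for the host entries in bb-hosts."""
    entries = []
    for line in bb_hosts_text.splitlines():
        # Everything after '#' is tags or a comment; /etc/hosts does not need it.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        ip, name = fields[0], fields[1]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # not a host line (page, group, include, ...)
        if addr.is_unspecified:
            continue  # 0.0.0.0: no fixed address to copy
        entries.append(f"{ip}\t{name}")
    return entries


if __name__ == "__main__":
    # Review the output before appending it to /etc/hosts by hand.
    print("\n".join(bb_hosts_to_etc_hosts(SAMPLE_BB_HOSTS)))
```

Printing the result for review, rather than writing to /etc/hosts directly, avoids clobbering existing entries such as localhost.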
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Tuesday, May 20, 2008 10:24 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] wake up call
Thanks for the heads up. I am very interested in knowing what is the cause and more importantly the solution to your issue, as it may fix mine!
It would VERY nice to be able to print out uptime and availability reports without the dozens of 1 minute outages. I know my issue is related to the box itself (hardware or software) as the issue appears on the hobbit server itself.
On Wed, May 21, 2008 at 12:17 AM, Gavin Leonard <user-d65663809eb4@xymon.invalid<mailto:user-d65663809eb4@xymon.invalid>> wrote:
Most if not all of my servers are defined by ip anyway, I have a very segmented network so dns is not very helpful across all the different domains and subnets.. i use my hosts file for the most part.. now that I think of it, I wonder if the ones in the host file are still ok? I will let you know...
-Gavin
From: Phil Wild [mailto:user-e365c1418192@xymon.invalid<mailto:user-e365c1418192@xymon.invalid>]
Sent: Tuesday, May 20, 2008 7:12 PM
To: user-ae9b8668bcde@xymon.invalid<mailto:user-ae9b8668bcde@xymon.invalid>
Subject: Re: [hobbit] wake up call
Can I suggest you use IP addresses for a number of servers and see if they survive through your next episode. That will give you an idea of where the problem might be...
It is the least amount of work towards identifying the cause.
Cheers
Phil
2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <user-f2d837e5c776@xymon.invalid<mailto:user-f2d837e5c776@xymon.invalid>>:
Check your apache log restarts in cron....
-----Original Message-----
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid<mailto:user-4c45a83f15cb@xymon.invalid>]
Sent: Tuesday, May 20, 2008 10:38
To: user-ae9b8668bcde@xymon.invalid<mailto:user-ae9b8668bcde@xymon.invalid>
Subject: Re: [hobbit] wake up call
What most people suggest is having a local DNS server, on the Hobbitmon
server itself.
As this is happening at the same time every single day I don't believe
DNS would be the cause of the issue, though it is worth taking a look at
until another idea comes along.
On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
<user-d65663809eb4@xymon.invalid<mailto:user-d65663809eb4@xymon.invalid>> wrote:
Happened again this morning.. so I am going to try a different
dns server.
-Gavin
From: Phil Wild [mailto:user-e365c1418192@xymon.invalid<mailto:user-e365c1418192@xymon.invalid>]
Sent: Monday, May 19, 2008 10:38 PM
To: user-ae9b8668bcde@xymon.invalid<mailto:user-ae9b8668bcde@xymon.invalid>
Subject: Re: [hobbit] wake up call
Hmmm... bummer, there goes that theory... If you are using IP
addresses, and you are still getting failures on these hosts, then dns
is not involved. A ttl of five minutes is fairly worthless for a caching
server. It only helps if it hits the same device within five minutes, as
hobbit is pinging every five mins (default), you will most likely always
be pulling from your master/slaves...
Phil
list Josh Luthman
After those three mornings, would you mind commenting out those hosts to be certain that reproduces the issue?
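For anyone wanting to try the workaround Gavin describes (copying the bb-hosts entries into /etc/hosts), here is a rough Python sketch of the conversion. It assumes the usual bb-hosts layout of "IP hostname # tags" and skips 0.0.0.0 entries, which rely on DNS anyway. The function name and the sample hosts are made up for illustration and are not taken from Gavin's setup.

```python
import ipaddress


def bbhosts_to_etc_hosts(bbhosts_text):
    """Return /etc/hosts-style lines for the host entries in bb-hosts text."""
    lines = []
    for raw in bbhosts_text.splitlines():
        # Drop the trailing "# tags" part and surrounding whitespace.
        entry = raw.split("#", 1)[0].strip()
        fields = entry.split()
        if len(fields) < 2:
            continue  # blank line, comment, or one-word directive
        ip, name = fields[0], fields[1]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # layout directives such as "page" or "group"
        if addr.is_unspecified:
            continue  # 0.0.0.0 means "resolve via DNS"; nothing to copy
        lines.append(f"{ip}\t{name}")
    return lines


# Hypothetical bb-hosts content, for illustration only.
sample = """\
page servers Servers
group Web
10.0.0.5   web01.example.com   # conn http://web01.example.com/
0.0.0.0    dns-only.example.com  # conn
# 10.0.0.9 retired.example.com
10.0.0.6   db01.example.com
"""

print("\n".join(bbhosts_to_etc_hosts(sample)))
```

The output can be reviewed and appended to /etc/hosts by hand. Commenting those lines out again later is the reverse test Josh asks for.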
list Gavin Leonard
Sure.. just give me your pager # and they can wake you up... :)
-Gavin
list Josh Luthman
Tell me what email they're coming from and use user-4c45a83f15cb@xymon.invalid
list Rafal Roginela
Hi All,
Having an issue with Hobbit and fping. When I try to define my hobbitserver.cfg FPING line with FPING="/usr/sbin/fping -c3", my conn test breaks and all hosts go red. I want to be able to do 3 pings to each host to get an average ping time. Any help would be appreciated.
Thank You,
Rafal Roginela
list Josh Luthman
essex ~ -> fping 127.0.0.1
127.0.0.1 is alive
essex ~ -> fping -c3 127.0.0.1
127.0.0.1 : [0], 84 bytes, 0.12 ms (0.12 avg, 0% loss)
127.0.0.1 : [1], 84 bytes, 0.03 ms (0.07 avg, 0% loss)
127.0.0.1 : [2], 84 bytes, 0.07 ms (0.07 avg, 0% loss)
This is probably why it is pissed off =P
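To make Josh's point concrete: the two fping invocations print differently shaped lines, so anything that looks for "is alive" finds nothing once -c3 is added. The Python sketch below illustrates that with the output from this thread and shows how an average could be computed from the -c3 lines outside of Hobbit. That Hobbit's conn test matches on the "is alive" form is Josh's guess here and is not confirmed against the Hobbit source. The regular expressions and function names are hypothetical helpers.

```python
import re

# Line shapes taken from the fping output quoted in this thread.
ALIVE = re.compile(r"^(\S+) is alive")
PROBE = re.compile(r"^(\S+)\s*: \[(\d+)\], \d+ bytes, ([\d.]+) ms")


def alive_hosts(output):
    """Hosts reported with the default 'is alive' wording."""
    hosts = []
    for line in output.splitlines():
        match = ALIVE.match(line)
        if match:
            hosts.append(match.group(1))
    return hosts


def average_rtt(output):
    """Mean round-trip time in ms per host, from per-probe -c lines."""
    samples = {}
    for line in output.splitlines():
        match = PROBE.match(line)
        if match:
            samples.setdefault(match.group(1), []).append(float(match.group(3)))
    return {host: sum(times) / len(times) for host, times in samples.items()}


plain = "127.0.0.1 is alive\n"
counted = (
    "127.0.0.1 : [0], 84 bytes, 0.12 ms (0.12 avg, 0% loss)\n"
    "127.0.0.1 : [1], 84 bytes, 0.03 ms (0.07 avg, 0% loss)\n"
    "127.0.0.1 : [2], 84 bytes, 0.07 ms (0.07 avg, 0% loss)\n"
)

print(alive_hosts(plain))    # the default output is recognised
print(alive_hosts(counted))  # empty list: no "is alive" line with -c3
print(average_rtt(counted))
```

A small wrapper script along these lines could collect averages separately, leaving the FPING setting in hobbitserver.cfg at its default so the conn test keeps working.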
list Rafal Roginela
Hi Josh, Yeah I think so too.. but thought that someone might be able to help make it happen... Thank You, Rafal Roginela
list Phil Wild
It sure sounds like your issue is with your dns servers... There are another couple of things to try... You can set --dns=ip for bb-testnet This will tell hobbit to use the IP's specified in your bb-hosts file rather than passing it to the OS name resolution libraries. I would expect you will get the same result as you have now with all IP's defined in /etc/hosts. It would be very interesting to know why this happens the same time every day. Can you describe your network and dns topology? What settings do you have in your soa? Cheers Phil 2008/5/22 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:
▸
Tell me what email they're coming from and use user-4c45a83f15cb@xymon.invalid
On Wed, May 21, 2008 at 12:12 PM, Gavin Leonard <user-d65663809eb4@xymon.invalid> wrote: Sure.. just give me your pager # and they can wake you up… :) -Gavin
*From:* Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid] *Sent:* Wednesday, May 21, 2008 10:07 AM *To:* user-ae9b8668bcde@xymon.invalid *Subject:* Re: [hobbit] wake up call After those three mornings, would you mind commenting out those hosts to be certain that reproduces the issue?
On Wed, May 21, 2008 at 12:02 PM, Gavin Leonard <user-d65663809eb4@xymon.invalid> wrote: Ok.. well it did not do it this morning after adding all of my monitored hosts to the /etc/hosts file… I just copied my bb-hosts file into my /etc/hosts file and modified it into the proper format.. no pages this morning.. so it could have been a dns issue.. if I am clear for three more mornings then I will be satisfied… I will let you know.. -Gavin
*From:* Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid] *Sent:* Tuesday, May 20, 2008 10:24 PM *To:* user-ae9b8668bcde@xymon.invalid *Subject:* Re: [hobbit] wake up call Thanks for the heads up. I am very interested in knowing what the cause is and, more importantly, the solution to your issue, as it may fix mine! It would be VERY nice to be able to print out uptime and availability reports without the dozens of 1 minute outages. I know my issue is related to the box itself (hardware or software) as the issue appears on the hobbit server itself.
On Wed, May 21, 2008 at 12:17 AM, Gavin Leonard <user-d65663809eb4@xymon.invalid> wrote: Most if not all of my servers are defined by ip anyway. I have a very segmented network so dns is not very helpful across all the different domains and subnets.. I use my hosts file for the most part.. now that I think of it, I wonder if the ones in the hosts file are still ok? I will let you know… -Gavin
*From:* Phil Wild [mailto:user-e365c1418192@xymon.invalid] *Sent:* Tuesday, May 20, 2008 7:12 PM *To:* user-ae9b8668bcde@xymon.invalid *Subject:* Re: [hobbit] wake up call Can I suggest you use IP addresses for a number of servers and see if they survive through your next episode. That will give you an idea of where the problem might be... It is the least amount of work towards identifying the cause. Cheers Phil
2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <user-f2d837e5c776@xymon.invalid>: Check your apache log restarts in cron....
-----Original Message----- From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid] Sent: Tuesday, May 20, 2008 10:38 To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] wake up call What most people suggest is having a local DNS server, on the Hobbitmon server itself. As this is happening at the same time every single day I don't believe DNS would be the cause of the issue, though it is worth taking a look at until another idea comes along.
On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard <user-d65663809eb4@xymon.invalid> wrote: Happened again this morning.. so I am going to try a different dns server. -Gavin
From: Phil Wild [mailto:user-e365c1418192@xymon.invalid] Sent: Monday, May 19, 2008 10:38 PM To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] wake up call Hmmm... bummer, there goes that theory... If you are using IP addresses, and you are still getting failures on these hosts, then dns is not involved. A ttl of five minutes is fairly worthless for a caching server. It only helps if it hits the same device within five minutes; as hobbit is pinging every five mins (default), you will most likely always be pulling from your master/slaves... Phil
2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>: Well, almost all (a good 99%) of my hosts have the testip tag, so it doesn't need to look up the names. The things it does look up are 5m TTLs though.
On 5/19/08, Phil Wild <user-e365c1418192@xymon.invalid> wrote: What is ttl set to for your domain? It would be interesting to see if the issue reduces with a higher ttl. Another way to ensure this is not the area of the issue would be to set the dns server up as a slave. Phil
2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>: That was someone's theory in a very large post about this issue in the past. I did install a caching-only named on the box and it did not fix the problem. Did relieve the stress of my other DNS server though :)
On 5/19/08, Phil Wild <user-e365c1418192@xymon.invalid> wrote: Hi Josh, This doesn't relate to the apache error, it relates to your problem... This is a theory... I am wondering if you are running a caching name server on your hobbit installation? If not, I am wondering if the fping places too high a load on your dns server and misses the occasional host. Even with a caching dns server you may see the issue every time ttl expires. Phil
2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>: Gavin, I am having a very similar issue - though it is not every single day. My issue is that every host (or almost all of the hosts) will have conn:red and then come back up ~60s later. I just confirmed this weekend that it is not related to the Via NIC (Using an Intel Pro/100 S now). An issue like that is almost always Apache related. Can you post the errors in /var/log/httpd/error_log from this time period? Josh
On Mon, May 19, 2008 at 3:26 PM, Gavin Leonard <user-d65663809eb4@xymon.invalid> wrote: Every morning at 7am I get pages from every host I monitor including the display server, that its connection recovered.. then it runs great for the next 23hrs. Looking at the hobbit web page I see no down time nor do the servers show any down time. But when I click on the historical web link to see the info.. I get this.. I really love hobbit.. but I am not a Web guy at all and I think it might be apache related...
*Internal Server Error* The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator, root at localhost and inform them of the time the error occurred, and anything you might have done that may have caused the error. More information about this error may be available in the server error log. *Apache/2.0.54 (Yellowdog) user-f8006a414c56@xymon.invalid Port 80* *Gavin Leonard*
-- Tel: XXXX XXX XXX Fax: XXXX XXX XXX email: user-e365c1418192@xymon.invalid
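For anyone wanting to repeat the workaround Gavin describes in this thread (copying the bb-hosts entries into /etc/hosts in the proper format), here is a rough sketch of how that conversion could be scripted. It is not from the thread and not part of hobbit: the function name `bb_hosts_to_etc_hosts` is made up for illustration, and it assumes the usual bb-hosts host-entry layout of `IP hostname # tags`. It also assumes that an IP of 0.0.0.0 marks a host that is meant to be resolved through DNS, so those entries are skipped.

```python
import ipaddress


def bb_hosts_to_etc_hosts(bb_hosts_text):
    """Return /etc/hosts-style lines ("IP<TAB>hostname") for host entries in bb-hosts text.

    Hypothetical helper for illustration only; not part of hobbit.
    """
    lines = []
    for raw in bb_hosts_text.splitlines():
        # Drop everything from '#' on: in a host entry that is where the tags start.
        entry = raw.split("#", 1)[0].split()
        if len(entry) < 2:
            continue  # blank line, comment, or not a host entry
        ip, hostname = entry[0], entry[1]
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # layout lines such as "page" or "group" do not start with an IP
        if addr.is_unspecified:
            continue  # assumption: 0.0.0.0 means "resolve via DNS", nothing to pin
        lines.append(f"{ip}\t{hostname}")
    return lines


if __name__ == "__main__":
    # Made-up sample entries; real bb-hosts files will differ.
    sample = """\
page servers Servers
group Web
192.168.1.10 web01.example.com # http://web01.example.com/
0.0.0.0 db01.example.com # ssh
10.0.0.5 mail01 # smtp
"""
    for line in bb_hosts_to_etc_hosts(sample):
        print(line)
```

Review the output by hand before appending it to /etc/hosts. As Phil notes above, running the network tester with --dns=ip should be expected to give the same result without touching /etc/hosts at all.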