Most if not all of my servers are defined by IP anyway. I have a very
segmented network, so DNS is not very helpful across all the different
domains and subnets. I use my hosts file for the most part. Now that I
think of it, I wonder if the ones in the hosts file are still OK? I will let
you know…
-Gavin
*From:* Phil Wild [mailto:user-e365c1418192@xymon.invalid]
*Sent:* Tuesday, May 20, 2008 7:12 PM
*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] wake up call
Can I suggest you use IP addresses for a number of servers and see if they
survive through your next episode. That will give you an idea of where the
problem might be...
It is the least amount of work towards identifying the cause.
Cheers
Phil
2008/5/20 Hosch, Katherine CONT (SPAWAR ITC) <user-f2d837e5c776@xymon.invalid>:
Check your Apache log restarts in cron...
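One way to act on that suggestion is to list the cron entries that fire at the hour the pages arrive. The following is only a rough sketch: the function name `jobs_at_hour` and the sample crontab entries are hypothetical, and it only understands plain hour values and comma lists (not ranges such as `6-8` or steps such as `*/2`).

```python
def jobs_at_hour(crontab_text, hour):
    """Return crontab lines whose hour field explicitly names the given hour."""
    matches = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 6:
            continue  # skip environment lines such as MAILTO=root
        # Field 0 is the minute, field 1 is the hour.
        hour_parts = fields[1].split(",")
        if any(p.isdigit() and int(p) == hour for p in hour_parts):
            matches.append(line)
    return matches


# Hypothetical sample in /etc/crontab style, not taken from any real server.
SAMPLE = """\
# hypothetical entries
MAILTO=root
0 7 * * * root /usr/sbin/logrotate /etc/logrotate.conf
*/5 * * * * hobbit /bin/true
30 3,7 * * 0 root /bin/true
"""

for entry in jobs_at_hour(SAMPLE, 7):
    print(entry)
```

Feeding it the contents of the system crontab and each user's crontab would narrow down what runs at 7am, assuming the restart is in fact scheduled through cron.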
-----Original Message-----
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Tuesday, May 20, 2008 10:38
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] wake up call
What most people suggest is having a local DNS server, on the Hobbitmon
server itself.
As this is happening at the same time every single day I don't believe
DNS would be the cause of the issue, though it is worth taking a look at
until another idea comes along.
On Tue, May 20, 2008 at 11:27 AM, Gavin Leonard
<user-d65663809eb4@xymon.invalid> wrote:
Happened again this morning, so I am going to try a different
DNS server.
-Gavin
From: Phil Wild [mailto:user-e365c1418192@xymon.invalid]
Sent: Monday, May 19, 2008 10:38 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] wake up call
Hmmm... bummer, there goes that theory... If you are using IP
addresses and you are still getting failures on these hosts, then DNS
is not involved. A TTL of five minutes is fairly worthless for a caching
server. It only helps if it hits the same device within five minutes; as
hobbit is pinging every five mins (default), you will most likely always
be pulling from your master/slaves...
Phil
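Phil's point about the TTL can be sketched with a little arithmetic. This is a simplified model, not how any particular resolver is implemented: it assumes one name is looked up at exact multiples of the poll interval and that a cached record is discarded as soon as its age reaches the TTL. The function name `count_cache_hits` is made up for illustration.

```python
def count_cache_hits(poll_interval, ttl, polls):
    """Count lookups served from cache when one name is resolved once per poll."""
    hits = 0
    cached_at = None  # time the record was last fetched from the master/slave
    for i in range(polls):
        now = i * poll_interval
        if cached_at is not None and now - cached_at < ttl:
            hits += 1  # record still fresh: answered locally
        else:
            cached_at = now  # expired or never cached: fetch it again
    return hits


# One hour of polling at the default 300 s interval.
print(count_cache_hits(300, 300, 12))   # 5-minute TTL
print(count_cache_hits(300, 3600, 12))  # 1-hour TTL
```

Under these assumptions the 5-minute TTL gives 0 cache hits out of 12 lookups, while a 1-hour TTL gives 11. In practice timing jitter may let a few lookups land inside the TTL, but the caching server still does little work for records that expire once per poll cycle.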
2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:
Well, almost all (a good 99%) of my hosts have the testip tag, so it
doesn't need to look up the names. The things it does look up have 5m
TTLs, though.
On 5/19/08, Phil Wild <user-e365c1418192@xymon.invalid> wrote:
What is the TTL set to for your domain? It would be interesting to see if
the issue reduces with a higher TTL. Another way to ensure this is not
the area of the issue would be to set the DNS server up as a slave.
Phil
2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:
That was someone's theory in a very large post about this issue in the
past. I did install a caching-only named on the box and it did not fix
the problem.
It did relieve the stress on my other DNS server though :)
On 5/19/08, Phil Wild <user-e365c1418192@xymon.invalid> wrote:
Hi Josh,
This doesn't relate to the Apache error, it relates to your problem...
This is a theory...
I am wondering if you are running a caching name server on your hobbit
installation? If not, I am wondering if fping places too high a load on
your DNS server and misses the occasional host. Even with a caching DNS
server you may see the issue every time the TTL expires.
Phil
2008/5/20 Josh Luthman <user-4c45a83f15cb@xymon.invalid>:
Gavin,
I am having a very similar issue - though it is not every single day. My
issue is that every host (or almost all of the hosts) will have conn:red
and then come back up ~60s later. I just confirmed this weekend that it
is not related to the VIA NIC (using an Intel Pro/100 S now).
An issue like that is almost always Apache related. Can you post the
errors in /var/log/httpd/error_log from this time period?
Josh
On Mon, May 19, 2008 at 3:26 PM, Gavin Leonard
<user-d65663809eb4@xymon.invalid> wrote:
Every morning at 7am I get pages from every host I monitor, including
the display server, that its connection recovered. Then it runs great
for the next 23 hrs. Looking at the hobbit web page I see no downtime,
nor do the servers show any downtime. But when I click on the historical
web link to see the info, I get this. I really love hobbit, but I am not
a web guy at all and I think it might be Apache related...
*Internal Server Error*
The server encountered an internal error or misconfiguration and was
unable to complete your request.
Please contact the server administrator, root@localhost and inform them
of the time the error occurred, and anything you might have done that
may have caused the error.
More information about this error may be available in the server error
log.
*Apache/2.0.54 (Yellowdog) user-f8006a414c56@xymon.invalid Port 80*
*Gavin Leonard*
Director, Systems-Network Engineering
*T* XXX-XXX-XXXX
*F* XXX-XXX-XXXX
*E* user-d65663809eb4@xymon.invalid
Research | Marketing | Sales Generation
*www.progrexion.com <http://www.progrexion.com/>*
--
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX
Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
--
Tel: XXXX XXX XXX
Fax: XXXX XXX XXX
email: user-e365c1418192@xymon.invalid <http://gmail.com/>