Xymon Mailing List Archive

DNS reboot causes purples

6 messages in this thread

list Bill Perez · Tue, 10 Jan 2006 10:58:10 -0500 ·
Hi all,

I'm hoping someone might be able to help me.  I'm running Hobbit 4.1.2 on a
Fedora Core 4, monitoring approximately 500 servers.  I have been running
Hobbit for a few months and a few times our DNS server has been rebooted for
patching.  When this happens it causes some servers to go purple and the
only way I've been able to fix this is to restart the Hobbit service but it
has generated a ton of alerts and not a lot of happy alert recipients.  My
/etc/resolv.conf file has primary and secondary DNS servers, so I would have
thought if one wasn't available it would use the other, but this doesn't
seem to be the case.
Has anyone seen this, or does anyone know what I could do to prevent these
purples from occurring when the DNS server is rebooted?

Thanks much in advance.
list Henrik Størner · Thu, 12 Jan 2006 07:40:05 +0100 ·
quoted from Bill Perez
On Tue, Jan 10, 2006 at 10:58:10AM -0500, Bill Perez wrote:
I'm hoping someone might be able to help me.  I'm running Hobbit 4.1.2 on a
Fedora Core 4, monitoring approximately 500 servers.  I have been running
Hobbit for a few months and a few times our DNS server has been rebooted for
patching.  When this happens it causes some servers to go purple and the
only way I've been able to fix this is to restart the Hobbit service but it
has generated a ton of alerts and not a lot of happy alert recipients.  My
/etc/resolv.conf file has primary and secondary DNS servers, so I would have
thought if one wasn't available it would use the other, but this doesn't
seem to be the case.
Which tests are going purple ? The network tests (conn, smtp, http etc.)
or the client-side tests (cpu, disk, memory ...) ?

If it's the network tests, then the problem is probably that Hobbit is
timing out the DNS requests because it takes too long to do the DNS
lookups. It probably sends the query first to the server which is down,
and then times out waiting for the response. But that would normally
cause your network tests to go red - with a DNS error status - not
purple. But setting up a caching DNS server on the Hobbit server might
help with that (and is generally a good idea when testing many servers).

So I think it's your client-side tests that go purple. That doesn't
really make sense, since the only communication between the clients and
Hobbit normally uses the IP address directly. But you should check the
BBDISP setting in your clients' etc/hobbitclient.cfg and make sure it is
set to the IP of your Hobbit server, not the hostname.
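
To make the BBDISP check above repeatable across many clients, here is a
minimal Python sketch. It assumes hobbitclient.cfg uses shell-style
assignments such as BBDISP="192.0.2.10"; the helper names and the sample
address are hypothetical, not part of Hobbit.

```python
import ipaddress
import re


def bbdisp_value(cfg_text):
    """Return the value assigned to BBDISP, or None if it is not set.

    Assumes shell-style lines of the form BBDISP="value" (an assumption
    about the file layout, not taken from the Hobbit documentation).
    """
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip comment lines
        match = re.match(r'^BBDISP="?([^"\s]*)"?', line)
        if match:
            return match.group(1)
    return None


def is_literal_ip(value):
    """True if value is a literal IPv4/IPv6 address rather than a hostname."""
    if not value:
        return False
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False
```

A value that is a hostname, or a reference to another variable, makes
is_literal_ip() return False, so such clients would need a manual look.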


Regards,
Henrik
list Bill Perez · Thu, 12 Jan 2006 08:18:55 -0500 ·
quoted from Henrik Størner
Which tests are going purple ? The network tests (conn, smtp, http etc.)
or the client-side tests (cpu, disk, memory ...) ?

Henrik - It is the network test (conn) that went purple for several
switches, router, windows servers, a unix server - there was really no
consistency in what went purple.  I was thinking of using the dns=ip switch
for bbtest-net to resolve this - do you think that is a viable solution or
would I be better off looking into setting up a caching DNS server on the
Hobbit server?

Thank you


list Henrik Størner · Thu, 12 Jan 2006 14:44:29 +0100 ·
quoted from Bill Perez
On Thu, Jan 12, 2006 at 08:18:55AM -0500, Bill Perez wrote:
Which tests are going purple ? The network tests (conn, smtp, http etc.)
or the client-side tests (cpu, disk, memory ...) ?

Henrik - It is the network test (conn) that went purple for several
switches, router, windows servers, a unix server - there was really no
consistency in what went purple.  I was thinking of using the dns=ip switch
for bbtest-net to resolve this - do you think that is a viable solution or
would I be better off looking into setting up a caching DNS server on the
Hobbit server?
Since you wrote that you are monitoring some 500 hosts, I would really
suggest that you set up a caching DNS server on your Hobbit server.

Last I used Red Hat (Fedora), there was a "caching-dns" RPM included
with the necessary config files to set this up. All I needed to do was
to add a "forwarders" entry to named.conf, so that it would query our 
local DNS server (the one from resolv.conf) instead of the public root
DNS servers; and change resolv.conf to point at 127.0.0.1.
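
As an illustration of the "forwarders" step above, this small Python sketch
reads the nameserver lines from an existing resolv.conf and prints the
matching forwarders statement, which belongs inside the options { ... };
block of named.conf. The function names and the 192.0.2.x addresses are
made up for the example; treat it as a sketch, not a tested procedure for
any particular Fedora release.

```python
def nameservers(resolv_conf_text):
    """Collect the addresses from 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def forwarders_stanza(servers):
    """Build a BIND 'forwarders' statement for the given addresses.

    The loopback address is skipped so the cache never forwards to itself.
    """
    upstream = [s for s in servers if s != "127.0.0.1"]
    return "forwarders { " + " ".join(f"{s};" for s in upstream) + " };"


if __name__ == "__main__":
    # Hypothetical resolv.conf contents, as they were before the change.
    old = "search example.com\nnameserver 192.0.2.53\nnameserver 192.0.2.54\n"
    print(forwarders_stanza(nameservers(old)))
```

Once named is running with that statement, resolv.conf on the Hobbit server
would be reduced to a single "nameserver 127.0.0.1" line.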

The --dns=ip switch will work, but I don't really like it because you
will inevitably change the IP of one of your hosts, and you're bound to
forget to change the bb-hosts file as well as the DNS entries. That
causes some confusion and a frustrated admin when you find out why the
ping-test doesn't work.
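
If you do go with --dns=ip, the drift described above can be caught with a
periodic comparison of bb-hosts against DNS. The following Python sketch
assumes the usual "IP hostname # tags" line layout and treats 0.0.0.0 as
"no fixed IP"; both are assumptions for the example, and the function
names are hypothetical.

```python
import ipaddress
import socket


def host_entries(bb_hosts_text):
    """Return (ip, hostname) pairs for lines that start with an IP address."""
    entries = []
    for line in bb_hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the tags after '#'
        if len(fields) < 2:
            continue
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # page/group/include lines and the like
        entries.append((fields[0], fields[1]))
    return entries


def stale_entries(bb_hosts_text, resolve=socket.gethostbyname):
    """List (hostname, ip_in_file, ip_in_dns) where the two disagree."""
    stale = []
    for ip, name in host_entries(bb_hosts_text):
        if ip == "0.0.0.0":
            continue  # assumed convention: no fixed IP recorded
        try:
            current = resolve(name)
        except OSError:
            continue  # name does not resolve; nothing to compare
        if current != ip:
            stale.append((name, ip, current))
    return stale
```

Calling stale_entries() on the contents of bb-hosts returns the hosts whose
recorded IP no longer matches DNS. Only the first address returned by DNS
is compared, so hosts with several A records can show up as false alarms.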


Regards,
Henrik
list Michael Frey · Thu, 12 Jan 2006 11:27:29 -0500 ·
Has anyone developed a way to disable and enable a single monitored
service on Windows clients, instead of all services?

Michael Frey


list Bill Perez · Thu, 12 Jan 2006 17:42:36 -0500 ·
Thanks for the information Henrik, I really appreciate it.