Xymon Mailing List Archive

DNS test timeout for a downed DNS server.

3 messages in this thread

Steve Holmes · Tue, 24 Jul 2012 16:17:33 -0400
Xymon 4.2.3. One of the DNS servers we monitor was down. The bbtest
total time went from under 30 seconds to over 900 seconds, almost all
of which was spent on the DNS test.

This caused a number of problems. I removed the dns tag from the down
server and the time dropped back to the normal level.

Two things puzzle me. Why did Xymon continue to test the DNS service
on that server while the conn test was red? And why did it take 900
seconds to time out, even though I was running with --dns-timeout=60?
(I.e., I can understand that it might take longer than with the default
DNS timeout, but not 900 seconds.) The color of the dns test for that
host was clear, and it displayed the following text:


Service dns on dns.server.org is OK
Dialup host/service, or test depends on another failed test
Host appears to be down

Timeout

Seconds: 900.042


It also stopped recording data in the graph.

And more importantly, is the behavior still the same in 4.3.9?

Thanks,
Steve Holmes
Purdue University
Steven Carr · Tue, 24 Jul 2012 21:34:03 +0100
Check the archives; there is a well-documented issue with the DNS timeout
not working the way you would expect (it has to do with the underlying
libraries).

IIRC, some fixes/workarounds have been added in later versions, but it won't
be until the next major release (5.?), when the underlying testing tool has
been rewritten, that it will be fixed for good.

Steve

Henrik Størner · Tue, 24 Jul 2012 22:50:57 +0200
On 24-07-2012 22:17, Steve Holmes wrote:
Two things puzzle me. Why did Xymon continue to test the DNS service
on that server while the conn test was red? And why did it take 900
seconds to time out, even though I was running with --dns-timeout=60?

DNS timeout handling does not work correctly in versions prior to 4.3.8; it is a case of the DNS library (c-ares) not behaving the way I thought it did.

4.3.8 and later have a workaround in place, so the timeout is fixed at approximately 25 seconds. You can get almost the same result by running the old bbtest-net utility with "--dns-timeout=2" (no, this won't cause DNS lookups to time out after 2 seconds).
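The 900.042 seconds in the original report fits a simple model of how the DNS library retries. Assuming (this is not stated in the thread) that c-ares makes 4 attempts against the single server being tested and doubles the per-attempt timeout on each retry, a 60-second base timeout gives 60 × (1 + 2 + 4 + 8) = 900 seconds, while a 2-second base gives 30 seconds, in the same range as the roughly 25 seconds of the 4.3.8 workaround. The sketch below only does that arithmetic; `worst_case_dns_wait` is a hypothetical helper, not part of Xymon or c-ares, and the retry count and doubling rule are assumptions about c-ares internals.

```python
def worst_case_dns_wait(per_try_timeout: float, tries: int = 4, nservers: int = 1) -> float:
    """Total seconds a lookup may wait before giving up (hypothetical model).

    Assumptions, not taken from Xymon or c-ares documentation:
    - each server is tried `tries` times;
    - the per-attempt timeout doubles after every full pass over the servers.
    """
    total = 0.0
    for attempt in range(tries * nservers):
        # Attempts 0..nservers-1 use the base timeout, the next pass uses 2x, etc.
        total += per_try_timeout * (2 ** (attempt // nservers))
    return total


print(worst_case_dns_wait(60))  # 900.0  (--dns-timeout=60, one server)
print(worst_case_dns_wait(2))   # 30.0   (--dns-timeout=2, one server)
```

If this model holds, --dns-timeout acts as a base value rather than a hard limit: with a single unresponsive server, the total wait is about 15 times the setting.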

And more importantly, is the behavior still the same in 4.3.9?
No.


Regards,
Henrik