Xymon Mailing List Archive

conn Flapping every 2.5 minutes

3 messages in this thread

Graeme Shea · Tue, 22 Feb 2011 11:09:22 +1100
Hi All,

I have a problem with my Xymon setup I haven't been able to get around. The
conn test for servers at the same site as the Xymon server that have simple
host name entries in bb-hosts fails about every 2.5 minutes with a "Can't
resolve IP address for 1091edudc01" error (1091edudc01 being the host name).
The other hosts at this site with FQDNs work fine, as do simple host names at
other sites. These Windows servers are part of a different Windows domain, so
I cannot add them to my local domain. A relevant entry in bb-hosts is:

 
10.X.Y.21 1091EDUDC01                                 #testip

0.0.0.0 BuServ.Alfps.Internal                      # http://edupass:81

 
Things that I have tried (in order)

 
Added IP address to bb-hosts

Added "testip" to bb-hosts

Added the hosts to the local DNS and set the local domain as the search
domain on the Xymon server.

Added the hostnames to the "hosts" file on the Xymon server

Built new server using latest Ubuntu, Xymon and hobbitping.

Changed to use fping.

Removed IP address and "testip" from bb-hosts to use the local DNS to resolve
the host names.

 
If I use a terminal to ping the problem hosts there are no failures. I ran a
ping every 30 seconds for several hours and it worked every time but conn
still kept failing. When I switched to using DNS to resolve the host name,
the flapping changed to green for 3-4.5 minutes followed by red for 10-15
seconds. Nearly all Xymon settings are at their defaults.

 
The strange thing is that when I built the new server using a temporary IP
address, it worked for several days, but when I changed it to use the same
address as the old server (so the clients can reach it), the problem occurred
on the new server. I admit this sounds like a network issue, but since it
should be using the IP address supplied, it should not be returning a "Can't
resolve IP address" error, and ping works fine. Using DNS to resolve the
names of the other monitored servers at this site works OK.

 
Any thoughts would be much appreciated.

 
Regards

Graeme
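
Since the reported error is a name-resolution failure rather than a failed ping, repeating the lookup itself may be a closer reproduction than repeating the ping. The following is a minimal sketch, not part of the original thread: the function name `probe_resolution` and its parameters are hypothetical, and it uses the operating system's resolver, which may not behave identically to the resolver Xymon uses.

```python
import socket
import time
from datetime import datetime


def probe_resolution(hostname, attempts, interval=30.0,
                     resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `hostname` `attempts` times and return the failed lookups.

    Each failure is recorded as (attempt_number, timestamp, error_text).
    `resolver` and `sleep` are injectable so the loop can be exercised
    without real DNS traffic or real waiting.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            # Port is irrelevant here; only the name lookup matters.
            resolver(hostname, None)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append((attempt, stamp, str(exc)))
        if attempt < attempts:
            sleep(interval)
    return failures
```

Calling `probe_resolution("1091edudc01", attempts=240)` would cover roughly two hours at the default 30-second interval; an empty result would suggest the failing lookups originate somewhere other than this machine.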
Henrik Størner · Tue, 22 Feb 2011 07:58:53 +0100
quoted from Graeme Shea
Hi All,

I have a problem with my Xymon setup I haven't been able to get around. The conn test for servers at the same site as the Xymon server that have simple host name entries in bb-hosts fails about every 2.5 minutes with a "Can't resolve IP address for 1091edudc01" error (1091edudc01 being the host name). The other hosts at this site with FQDN work fine, as do simple host names at other sites.

Such random DNS lookup problems usually happen if your DNS server cannot quite cope with the burst of requests that Xymon generates. Are you using a local caching DNS server on the Xymon server, or the usual one for your other systems?

I really do recommend installing a caching DNS server on the Xymon server - I've seen Xymon bring DNS servers to their knees easily.
quoted from Graeme Shea
Things that I have tried (in order)

Added IP address to bb-hosts

Added "testip" to bb-hosts

These two - in combination! - will force Xymon to use the IP from the hosts.cfg (bb-hosts) file and avoid the DNS lookup. So that should solve it.


Regards,
Henrik
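
The rule Henrik describes (a real IP address plus the "testip" tag avoids the DNS lookup) can be checked mechanically across a hosts file. This is a rough sketch under stated assumptions, not Xymon's own parser: the function `needs_dns` is hypothetical, it assumes tags are the whitespace-separated words after the first "#", and it treats an address of 0.0.0.0 as always requiring DNS. The address 192.0.2.21 in the usage note is a placeholder, because the thread masks the real one.

```python
def needs_dns(line):
    """Classify one bb-hosts / hosts.cfg line.

    Returns True if the entry would still depend on DNS under the
    assumptions above, False if it has both a usable IP and "testip",
    and None if the line does not look like a host entry at all.
    """
    entry, _, tags = line.partition("#")
    fields = entry.split()
    if len(fields) < 2:
        return None  # blank line or pure comment

    ip = fields[0]
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() for o in octets):
        return None  # directives such as "page" or "group"

    has_testip = "testip" in tags.split()
    # Both conditions are required: a non-zero IP and the testip tag.
    return ip == "0.0.0.0" or not has_testip
```

For example, `needs_dns("192.0.2.21 1091EDUDC01 #testip")` returns False, while `needs_dns("0.0.0.0 BuServ.Alfps.Internal # http://edupass:81")` returns True. Running this over the hosts file of every machine that performs network tests, not only the main server, matters when more than one tester is involved.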
Graeme A Shea · Tue, 22 Feb 2011 20:12:04 +1100
Thank you, Henrik, that got it.

Somehow I had configured a remote bbProxy wrong :-#. I copied in the entries for another set of simple hostnames similar to the ones having the issue and forgot to update them. It's strange because those simple host names must be resolving somewhere up the DNS tree, so the display on my main server was as expected (except for the failing conn test).

So simple when you know what you're doing.

Many thanks

Graeme


-----Original Message-----
From: Henrik Størner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Tuesday, 22 February 2011 7:41 PM
To: Shea, Graeme A
Subject: Re: [Xymon] conn Flapping every 2.5 minutes

Hi Graeme,
I have both an IP address and "testip" in the bb-hosts file. The line is

"10.X.Y.21 1091EDUDC01   #testip"  but I still get the error.

Very strange. Does the "info" status column also say "Network tests use: IP-address"?

Is it possible you have another server doing network tests? If you check the "Message received from..." line at the bottom of the status (it's in the historical status logs also), do they all mention the same source of the status?

Try running

    xymoncmd xymonnet --debug 1091EDUDC01

(or "bbcmd bbtest-net" if you're on 4.2.x) and send me the output.
That will show what the network test is doing. You can also try running it with the "--no-update" option a few times to simulate the test that Xymon performs, without actually updating the status. Of course, it would be most interesting if you could get the debug output from one of the runs that actually fails.


Regards,
Henrik
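
Henrik's suggestion to compare the "Message received from..." lines can be automated once the status texts have been saved or copied out. The sketch below is illustrative only: `count_sources` is a hypothetical helper, the exact wording of the line in a given Xymon version is an assumption, and the pattern simply looks for "message received from" followed by a token. How the status texts are collected is left open.

```python
import re
from collections import Counter

# Assumed wording; adjust if the status footer reads differently.
SOURCE_RE = re.compile(r"message received from\s+(\S+)", re.IGNORECASE)


def count_sources(status_texts):
    """Count how often each sender appears across saved status texts."""
    counts = Counter()
    for text in status_texts:
        for match in SOURCE_RE.finditer(text):
            # Drop trailing punctuation that may follow the address.
            counts[match.group(1).rstrip(".,")] += 1
    return counts
```

If the resulting counter holds more than one sender for the same host's conn status, a second system (such as the remote proxy that turned out to be misconfigured in this thread) is probably submitting its own test results.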
