conn Flapping every 2.5 minutes
list Graeme Shea
Hi All,

I have a problem with my Xymon setup I haven't been able to get around. The conn test for servers at the same site as the Xymon server that have simple host name entries in bb-hosts fails about every 2.5 minutes with a "Can't resolve IP address for 1091edudc01" error (1091edudc01 being the host name). The other hosts at this site with FQDN work fine, as do simple host names at other sites. These Windows servers are part of a different Windows domain so I cannot add them to my local domain.

A relevant entry in bb-hosts is:

10.X.Y.21 1091EDUDC01 #testip
0.0.0.0 BuServ.Alfps.Internal # http://edupass:81

Things that I have tried (in order):

1. Added IP address to bb-hosts.
2. Added "testip" to bb-hosts.
3. Added hosts to local DNS and the local domain as the search domain to the Xymon server.
4. Added the hostnames to the "hosts" file on the Xymon server.
5. Built new server using latest Ubuntu, Xymon and hobbitping.
6. Changed to use fping.
7. Removed IP address and "testip" from bb-hosts to use the local DNS to resolve the host names.

If I use a terminal to ping the problem hosts there are no failures. I ran a ping every 30 seconds for several hours and it worked every time, but conn still kept failing. When I switched to using DNS to resolve the host name, the flapping changed to be green for 3-4.5 minutes and red for 10-15 seconds. Nearly all Xymon settings are set to the defaults.

The strange thing is that when I built the new server using a temporary IP address it worked for several days, but when I changed it to use the same address as the old server (so the clients can reach it) the problem occurred on the new server.

I admit this sounds like a network issue, but since it should be using the IP address supplied it should not be returning a "Can't resolve IP address" error, and ping works fine. Using DNS to resolve the names of the other monitored servers at this site works OK.

Any thoughts would be much appreciated.

Regards
Graeme
list Henrik Størner
> Hi All, I have a problem with my Xymon setup I haven't been able to get around. The conn test for servers at the same site as the Xymon server that have simple host name entries in bb-hosts fails about every 2.5 minutes with a "Can't resolve IP address for 1091edudc01" error (1091edudc01 being the host name). The other hosts at this site with FQDN work fine, as do simple host names at other sites.
Such random DNS lookup problems usually happen if your DNS server cannot quite cope with the burst of requests that Xymon generates. Are you using a local caching DNS server on the Xymon server, or the usual one for your other systems? I really do recommend installing a caching DNS server on the Xymon server - I've seen Xymon bring DNS servers to their knees easily.
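One way to check whether lookups for a host fail intermittently is to resolve the name repeatedly and log only the failures. The following is a minimal sketch, not part of Xymon: the function name `probe_dns` is hypothetical, and the default resolver command `host` is an assumption (any command that exits non-zero on a failed lookup can be passed instead). The command you choose may not resolve names the same way Xymon's network tester does, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# probe_dns NAME COUNT DELAY [RESOLVER]
# Resolve NAME COUNT times, DELAY seconds apart, and print a timestamped
# line for every failed lookup, followed by a one-line summary.
# Returns non-zero if any lookup failed.
# RESOLVER defaults to "host" (an assumption; substitute any command
# that exits non-zero when the lookup fails).
probe_dns() {
    name=$1 count=$2 delay=$3 resolver=${4:-host}
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! "$resolver" "$name" >/dev/null 2>&1; then
            failures=$((failures + 1))
            echo "$(date '+%H:%M:%S') lookup failed for $name"
        fi
        i=$((i + 1))
        # Pause between lookups, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
    done
    echo "$failures of $count lookups failed"
    [ "$failures" -eq 0 ]
}
```

For example, `probe_dns 1091EDUDC01 600 1` would try the name once a second for about ten minutes, which should span several of the 2.5-minute failure cycles described above.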
> Things that I have tried (in order)
> Added IP address to bb-hosts
> Added "testip" to bb-hosts
These two - in combination! - will force Xymon to use the IP from the hosts.cfg (bb-hosts) file and avoid the DNS lookup. So that should solve it. Regards, Henrik
list Graeme A Shea
Thank you Henrik, that got it.

Somehow I had configured a remote bbProxy wrong :-#. I copied in the entries for another set of simple hostnames similar to the ones having the issue and forgot to update them. It's strange, because those simple host names must be resolving somewhere up the DNS tree, so the display on my main server was as expected (except for the failing conn test).

So simple when you know what you're doing.

Many thanks
Graeme

-----Original Message-----
From: Henrik Størner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Tuesday, 22 February 2011 7:41 PM
To: Shea, Graeme A
Subject: Re: [Xymon] conn Flapping every 2.5 minutes

Hi Graeme,
I have both an IP address and "testip" in the bb-host file. The line is "10.X.Y.21 1091EDUDC01 #testip" but I still get the error.
Very strange. Does the "info" status column also say "Network tests use: IP-address"?
Is it possible you have another server doing network tests? If you check the "Message received from..." line at the bottom of the status (it's in the historical status logs also), do they all mention the same source of the status?
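To compare the source of many stored status messages at once, the "received from" lines can be tallied. This is only a sketch under assumptions: the function name `count_senders` is hypothetical, the directory holding the historical logs for the host's conn test must be supplied by you, and the pattern assumes each log contains a line ending in "received from" followed by an address or host name. The `-r`, `-h` and `-o` options are those of GNU grep.

```shell
#!/bin/sh
# count_senders DIR
# Tally the distinct "received from <source>" strings found in all files
# under DIR, most frequent first. DIR is whatever directory holds the
# saved status logs (its location is installation-specific and is NOT
# assumed here).
count_senders() {
    grep -rhoE 'received from [0-9A-Za-z.:_-]+' "$1" | sort | uniq -c | sort -rn
}
```

If the output shows more than one source, a second system is submitting conn results for the same host.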
Try running
xymoncmd xymonnet --debug 1091EDUDC01
(or "bbcmd bbtest-net" if you're on 4.2.x) and send me the output.
That will show what the network test is doing. You can also try running it with the "--no-update" option a few times to simulate the test that Xymon performs, without actually updating the status. Of course it would be most interesting if you could get the debug output from one of the runs that actually fails.
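Repeating the test by hand until a failing run turns up gets tedious, so the runs can be scripted and each run's output kept in its own file. A minimal sketch: the function name `run_probes` and the log file naming are hypothetical, and passing "--debug" and "--no-update" together is an assumption based on the two suggestions above. On 4.2.x, pass `bbcmd bbtest-net` with the same options as the trailing command arguments instead.

```shell
#!/bin/sh
# run_probes HOST RUNS OUTDIR [CMD...]
# Run the network test RUNS times for HOST, saving the output of each
# run to OUTDIR/run-N.log and printing each run's exit status.
# CMD defaults to the command from the thread with --no-update added
# (combining the two options is an assumption).
run_probes() {
    host=$1 runs=$2 outdir=$3
    shift 3
    [ "$#" -gt 0 ] || set -- xymoncmd xymonnet --debug --no-update
    mkdir -p "$outdir" || return 1
    n=1
    while [ "$n" -le "$runs" ]; do
        # Capture stdout and stderr so debug output is not lost.
        "$@" "$host" > "$outdir/run-$n.log" 2>&1
        echo "run $n: exit status $?"
        n=$((n + 1))
    done
}
```

For example, `run_probes 1091EDUDC01 20 /tmp/xymonnet-debug` leaves twenty log files that can be searched for the "Can't resolve IP address" message.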
Regards,
Henrik