Xymon Mailing List Archive

Xymon disruption every night!

13 messages in this thread

list L.M.J · Fri, 29 Jan 2016 08:56:57 +0100 ·
Hi,

I've been running Xymon for 6 years (4.3.17 at the moment) on Debian 7.8 (kernel 3.2.0-4-amd64).
For about a month now, every night between roughly 0:30 and 2:00 am (+/- 30 min), around 30 hosts become unreachable:

Fri Jan 29 01:16:38 2016 conn NOT ok : DNS lookup failed
Unable to resolve hostname foo.bar.local
System unreachable for 3 poll periods (170 seconds)
green 0.0.0.0 is alive (0.02 ms) [<- 127.0.0.1]


- I have around 500 monitored hosts, and it looks like the same hosts are lost every single night.
- The affected hosts are not necessarily on the same network and do not run the same OS.
- We cross-monitor the same hosts with another monitoring tool, and it has not reported any DNS outage.
- I ran a DNS lookup every second on the Hobbit server for several days, and it never reported a DNS outage.
- There is no crontab on the server that could disturb Xymon.
- Nothing strange in the Xymon logs or the server logs, no memory leak and no CPU overload.
- The rest of the day, the Xymon server behaves normally.
- What did I change on the server a month ago? Nothing that I know of: no system upgrade or the like.
- I had dnsmasq acting as a cache. I disabled it: same issue.
- /etc/resolv.conf is minimal: "search bar.local", then "nameserver IP.OF.OUR.DNS.SERVER1", just like on our other servers.
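
A once-per-second lookup check like the one described in the list above could be sketched as follows. This is only an illustration, not the script that was actually run: the function name `check_dns`, the `LOOKUP` override variable, and the log path are made up, and `getent hosts` goes through the system resolver, which is not necessarily the code path a monitoring tool uses.

```shell
#!/bin/sh
# Resolve one name and print a timestamped OK/FAIL line.
# check_dns and LOOKUP are hypothetical names used for this sketch only.
# LOOKUP defaults to "getent hosts" (system resolver); it can be overridden.
check_dns() {
    name=$1
    if ${LOOKUP:-getent hosts} "$name" >/dev/null 2>&1; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') OK $name"
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') FAIL $name"
        return 1
    fi
}

# Example loop, commented out so that sourcing this file has no side effects:
# while true; do check_dns foo.bar.local >> /tmp/dnscheck.log; sleep 1; done
```

Logging a timestamp on every line makes it possible to compare any FAIL entries with the times at which the hosts went red.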

The issue could be anywhere: inside or outside the server, in Xymon or not... I have to confess I'm running out of ideas. If anyone here has any leads, I would be thankful!

Have a nice day!
list Becker Christian · Fri, 29 Jan 2016 08:23:14 +0000 ·
Hi L-M-J,

Can you rule out that this behavior comes from a network device, such as a switch or the default gateway?

Regards
Christian

Christian Becker
IT-Services

user-e4a19bfb94c0@xymon.invalid
Mittelrhein-Verlag GmbH
August-Horch-Straße 28
D-56070 Koblenz
Verleger und Geschäftsführer: Walterpeter Twer
Reg.-Gericht Koblenz HRB 121
Finanzamt Koblenz Str.Nr. 22 65 10 285 2
www.rhein-zeitung.de
list L.M.J · Fri, 29 Jan 2016 13:06:48 +0100 ·
The problem appears on VMs and physical servers, and on both LAN and DMZ equipment. I don't see a link between those devices :-(
list Becker Christian · Fri, 29 Jan 2016 12:22:06 +0000 ·
My intention was to figure out whether the network connection of the Xymon server itself has a problem.
For example, if your Xymon server is physical hardware, it has a wired network interface connected to a network switch. That is the link between the Xymon server and all of your other VMs and physical servers.
If you only see problems on the Xymon server, I'd have a look at that particular switch port or at the cabling to the Xymon server. Or could there be a firewall rule preventing the Xymon server from reaching the DNS server?

By the way, do you have only one DNS server in /etc/resolv.conf? Did you check the logs on your DNS server? Can you run a continuous ping to the Xymon server to see whether it loses any packets over 24 hours?
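
The continuous ping suggested above could be summarised with something like the sketch below. It assumes the summary line printed by the common Linux iputils `ping` ("N packets transmitted, M received, X% packet loss"); the function name `ping_loss` and the host name in the example are made up for illustration.

```shell
#!/bin/sh
# Extract the packet-loss percentage from ping's summary output on stdin.
# ping_loss is a hypothetical helper name; the pattern assumes the
# iputils summary format "... X% packet loss ...".
ping_loss() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example, one probe per second for 24 hours (not run here):
# ping -c 86400 -i 1 xymon-server.example | ping_loss
```

Anything other than 0 after a full day would point towards the network path rather than towards Xymon.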

Regards
Christian
list L.M.J · Tue, 16 Feb 2016 10:44:08 +0100 ·
Hi,
  
  I'm still running into trouble every night between ~0:30 and ~2:40 :-(
  1) I checked the backup of my physical Xymon server: it starts around 9 pm and runs for 4:45 min.
  2) We cross-monitored the DNS server with another monitoring tool: no DNS outage detected.
  3) I monitored the Xymon server's network link state with "mii-tool" every second: no trouble detected.
  4) I pinged my Xymon server from 2 different places in the network all night long: no trouble detected.
  5) There are no firewalls between my Xymon server and the monitored hosts.
  6) Out of 500+ hosts, only ~30 are in trouble every night, and mostly the same ones.
  7) The affected hosts include VMs, physical servers, and public internet websites.
  
  
  Here is what I've found in the xymond.log today :
	2016-02-16 02:02:57 Flapping detected for www.foo1.com:http - 5 changes in 1708 seconds
	2016-02-16 02:02:57 Flapping detected for www.foo2.com:http - 5 changes in 1708 seconds
	2016-02-16 02:02:57 Flapping detected for www.microsoft.com:http - 5 changes in 1708 seconds
	2016-02-16 02:06:14 Flapping detected for server01:http - 5 changes in 1678 seconds
	2016-02-16 02:06:14 Flapping detected for server02:http - 5 changes in 1678 seconds
	2016-02-16 02:06:29 Flapping detected for server03:conn - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server04:ldap - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server06:ssh - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server05:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server07:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server08:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server09:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for foo.bar1.com:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for foo.bar2.com:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for foo.bar3.fr:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server10:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server11-t:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server12:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server13:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server14:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server15:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server16:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server17:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server18:http - 5 changes in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for server19:http - 5 changes in 1745 seconds
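
To see at a glance which test types are flapping, log entries like the ones above can be grouped by test name. The following is a minimal sketch, assuming the line layout shown in this excerpt (date, time, "Flapping detected for host:test"); the function name `flap_summary` is made up, and the location of xymond.log depends on the installation.

```shell
#!/bin/sh
# Count "Flapping detected" entries per test name (http, conn, ldap, ...)
# from xymond.log-style input on stdin. flap_summary is a hypothetical name.
flap_summary() {
    awk '/Flapping detected for / {
             n = split($6, parts, ":")   # field 6 is host:test
             count[parts[n]]++
         }
         END { for (t in count) print count[t], t }' | sort -rn
}

# Example (not run here): flap_summary < xymond.log
```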


  Here is part of the configuration, with the errors displayed in the Xymon web interface:
	hosts.cfg : 0.0.0.0	server03	# conn	NAME:"server03" DESCR:"VM FOO BAR"
	Error :		conn NOT ok : DNS lookup failed / Unable to resolve hostname server03
				System unreachable for 2 poll periods (86 seconds)

	Everything looks as if DNS resolution failed.

	hosts.cfg : 10.X.Y.188 server05 # conn tse NAME:"Server 05" DESCR:"My comment" http://server05/
	Error : DNS error  red http://server05/ - DNS error

  - Why do I get a "DNS error" here? Yesterday I set the IP for this host to work around the issue. The "conn" error has been gone since yesterday evening, but the http error remains.
list Lukas Kohl · Tue, 16 Feb 2016 11:11:27 +0100 ·
Hi,
I know this is just a workaround, but maybe it helps you.
I have a Xymon machine with a local caching BIND daemon, which also speeds up the DNS tests a lot.

1. yum install bind
2. Customize /etc/named.conf:
        options {
                listen-on port 53 { 127.0.0.1; };
                #listen-on-v6 port 53 { ::1; };
                directory       "/var/named";
                dump-file       "/var/named/data/cache_dump.db";
                statistics-file "/var/named/data/named_stats.txt";
                memstatistics-file "/var/named/data/named_mem_stats.txt";
                allow-query     { localhost; };
                #recursion yes;
                # foo1 and foo2 are placeholders for the IP addresses of your upstream DNS servers
                forwarders { foo1; foo2; };
                forward only;
                notify no;
                dnssec-enable no;
                dnssec-validation no;
                #dnssec-lookaside auto;
                /* Path to ISC DLV key */
                bindkeys-file "/etc/named.iscdlv.key";
                managed-keys-directory "/var/named/dynamic";
        };
        zone "." IN {
                type hint;
                file "named.ca";
        };
        include "/etc/named.rfc1912.zones";
        include "/etc/named.root.key";
3. Make sure named.conf has mode 640.
4. In /etc/resolv.conf, set: nameserver 127.0.0.1
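
After step 4, it can be worth confirming that the local cache really is the first resolver in use. The sketch below is only an illustration: the function name `first_nameserver` is made up, and the commented `dig` call assumes that `dig` is installed and that foo.bar.local is a name worth testing.

```shell
#!/bin/sh
# Print the first nameserver listed in a resolv.conf-style file.
# first_nameserver is a hypothetical helper; it defaults to /etc/resolv.conf.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Manual check, not run here:
# [ "$(first_nameserver)" = "127.0.0.1" ] && dig @127.0.0.1 foo.bar.local +short
```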

Regards,

     Lukas Kohl
     ERGO Direkt Versicherungen
     Systembetrieb 2
     Karl-Martell-Straße 60
     90344 Nürnberg
     Deutschland
     Tel.: +XX-XXX-XXX-XXXX


list Magdi Mahmoud · Tue, 16 Feb 2016 17:14:20 +0000 ·
Hello

I'm trying to generate an availability CSV report from the command line for a one-month period, but I'm getting an error.
Can someone help, please?

START=`date +%s --date="01 Jan 2015 00:00:00"`
END=`date +%s --date="30 Jan 2015 23:59:59"`

/usr/libexec/xymon/xymongen --reportopts=$START:$END:1:all --csv=monthly_Jan --csvdelim=,  --subpagecolumns=2
2016-02-16 17:08:55.700631 Weird file '/usr/share/xymon/data/hist/xymon-server-pro-prod.clientlog' skipped
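
As a side note on the date range: January has 31 days, so an END of 30 Jan leaves out the last day of the month. The sketch below computes the boundaries of a whole calendar month instead. It assumes GNU date (the `-d` option, equivalent to the `--date=` form used above), the function name `month_range` is made up, and it does not address the "Weird file ... skipped" message.

```shell
#!/bin/sh
# Print "START END" in epoch seconds for a whole calendar month.
# month_range is a hypothetical helper; it requires GNU date.
month_range() {
    year=$1
    month=$2
    # Midnight (local time) on the first day of the month.
    start=$(date -d "$year-$month-01" +%s)
    # First second of the following month, minus one second.
    end=$(( $(date -d "$year-$month-01 +1 month" +%s) - 1 ))
    echo "$start $end"
}

# Example (not run here):
# set -- $(month_range 2015 01)
# /usr/libexec/xymon/xymongen --reportopts=$1:$2:1:all --csv=monthly_Jan --csvdelim=, --subpagecolumns=2
```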

Thank you


Magdi M.
UK Hosting Systems Engineer

A. Chancellor House, 5 Thomas More Square, London, E1W 1YW
T. +44 (0) 20 7032 5173      M. +44 (0) 77500814000
list Ryan Novosielski · Tue, 16 Feb 2016 12:31:06 -0500 ·
Please don't hijack threads. That includes hitting reply, blanking the e-mail, and changing the subject line.
list Japheth Cleaver · Tue, 16 Feb 2016 12:50:28 -0800 ·
quoted from L.M.J

On Tue, February 16, 2016 1:44 am, L-M-J wrote:
Hi,

  I'm still running into troubles every night between ~0h30 and ~2h40 :-(
  1) I checked the backup on my physical XYmon server : around 9pm and
runs for 4:45 min.
  2) We cross-monitored the DNS server from another monitoring tool : no
DNS outage detected.
  3) I monitored the Xymon server network link state with "mii-tool" every
seconds : no troubles detected
  4) I pinged my Xymon servers from 2 differents network places all night
long : no troubles detected.
  5) No firewalls between my Xymon server and the monitored hosts
  6) Over 500 hosts, only ~30 are in trouble every night and mostly the
same
  7) Hosts are VM, physical servers, public internet website


  Here is what I've found in the xymond.log today :
	2016-02-16 02:02:57 Flapping detected for www.foo1.com:http - 5 changes
in 1708 seconds
	2016-02-16 02:02:57 Flapping detected for www.foo2.com:http - 5 changes
in 1708 seconds
	2016-02-16 02:02:57 Flapping detected for www.microsoft.com:http - 5
changes in 1708 seconds
	2016-02-16 02:06:14 Flapping detected for server01:http - 5 changes in
1678 seconds
	2016-02-16 02:06:14 Flapping detected for server02:http - 5 changes in
1678 seconds
	2016-02-16 02:06:29 Flapping detected for server03:conn - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server04:ldap - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server06:ssh - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server05:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server07:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server08:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server09:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for foo.bar1.com:http - 5 changes
in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for foo.bar2.com:http - 5 changes
in 1745 seconds
	2016-02-16 02:07:21 Flapping detected for foo.bar3.fr:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server10:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server11-t:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server12:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server13:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server14:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server15:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server16:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server17:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server18:http - 5 changes in
1745 seconds
	2016-02-16 02:07:21 Flapping detected for server19:http - 5 changes in
1745 seconds
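
To see which tests and which time windows dominate a log like the one above, a small Python sketch along these lines might help. It assumes each entry sits on a single line in the real xymond.log (the wrapping above comes from the mail client), and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches xymond.log entries of the form quoted above, one entry per line.
FLAP_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Flapping detected for (?P<host>\S+):(?P<test>\S+) - "
    r"(?P<changes>\d+) changes in (?P<secs>\d+) seconds"
)


def parse_flaps(lines):
    """Return (timestamp, host, test, changes, seconds) for each flapping entry."""
    flaps = []
    for line in lines:
        m = FLAP_RE.match(line)
        if m:
            flaps.append(
                (m["ts"], m["host"], m["test"], int(m["changes"]), int(m["secs"]))
            )
    return flaps


def count_by_test(flaps):
    """Count flapping entries per test name (http, conn, ldap, ...)."""
    return Counter(test for _, _, test, _, _ in flaps)


def count_by_timestamp(flaps):
    """Count flapping entries per log timestamp, to spot bursts."""
    return Counter(ts for ts, _, _, _, _ in flaps)
```

Feeding it the lines of xymond.log (for example via open(path) as the lines argument) gives a quick view of whether the flapping clusters on one test type or on one moment in time.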


  Here is part of the configuration, plus the errors displayed in the Xymon web
interface:
	hosts.cfg : 0.0.0.0	server03	# conn	NAME:"server03" DESCR:"VM FOO BAR"
	Error :		conn NOT ok : DNS lookup failed / Unable to resolve hostname
server03
				System unreachable for 2 poll periods (86 seconds)

	Everything looks as if DNS resolution failed.

	hosts.cfg : 10.X.Y.188 server05 # conn tse NAME:"Server 05" DESCR:"My
comment" http://server05/
	Error : DNS error  red http://server05/ - DNS error

  - Why do I have a "DNS error" here? I set the IP for this host yesterday
to solve the issue. The "conn" error has disappeared since yesterday evening,
but the http error still remains.

All signs do point to an issue with DNS resolution here.

Was this a custom compile or are you using a package? If custom, what
version of c-ares is on your system? That's the underlying resolution
library that xymonnet is using by default to handle DNS lookups. The fact
that the 'conn' test remained good after you added the local hosts entry
matches that, since HTTP tests are performed using their own secondary DNS
lookup (to deal with vhosts, etc) unless the IP is specified there as
well.

Xymon otherwise does not cache DNS records or anything else when it comes
to network polling like this, since xymonnet is a brand new execution for
each run.

Try adding the '--dnslog=' option to xymonnet during this period to get a
log of exactly what's happening with DNS resolution, and --debug as well
(but just once or twice). You can also try testing using '--no-ares',
however the system resolver is much slower and less predictable than
c-ares (normally).

Another potential help might be altering your --concurrency=N setting to
something lower than the system default (which will typically be 256).


There's clearly *something* going on that's specific to that period, but
signs do point to something more on the host. This is especially true if
you add a local DNS cache and you're still seeing the problem.
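
As a complement to --dnslog, a timestamped resolver probe run on the Xymon host during the bad window can show whether lookups fail outside xymonnet too. The following is only a sketch: it uses Python's socket.getaddrinfo, which goes through the system resolver rather than c-ares, so it will not reproduce xymonnet's exact code path. The function names and the injectable resolver argument are assumptions made for illustration.

```python
import socket
import time
from datetime import datetime


def probe(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once; return (ok, detail, elapsed_seconds).

    detail is a sorted list of addresses on success, or the error text on failure.
    """
    start = time.monotonic()
    try:
        infos = resolver(hostname, None)
        detail = sorted({info[4][0] for info in infos})
        ok = True
    except socket.gaierror as exc:
        detail = str(exc)
        ok = False
    return ok, detail, time.monotonic() - start


def run(hostnames, interval=1.0, rounds=3):
    """Probe every hostname once per round and print only the failures."""
    for _ in range(rounds):
        for name in hostnames:
            ok, detail, elapsed = probe(name)
            if not ok:
                stamp = datetime.now().isoformat()
                print(f"{stamp} {name} FAILED after {elapsed:.3f}s: {detail}")
        time.sleep(interval)
```

Calling run() with the names of the ~30 affected hosts and a large rounds value over the night would give a failure log to compare against the xymonnet timestamps.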


HTH,
-jc
list L.M.J · Fri, 19 Feb 2016 20:17:32 +0100 ·
Le Tue, 16 Feb 2016 12:50:28 -0800,
quoted from Japheth Cleaver
"J.C. Cleaver" <user-87556346d4af@xymon.invalid> a écrit :
Try adding the '--dnslog=' option to xymonnet during this period to get a
log of exactly what's happening with DNS resolution, and --debug as well
(but just once or twice). You can also try testing using '--no-ares',
however the system resolver is much slower and less predictable than
c-ares (normally).

Hi,

   I activated the debug mode just like you suggested.

   Here is the error in the Xymon web interface:

	Fri Feb 19 01:19:58 2016: DNS error
	red http://server01/ - DNS error
	Seconds: 0.000000000


    Part of the xymonnet.log (see my arrow --> a few lines below):

	14599 2016-02-19 01:18:25.663054 Adding hostname 'server01' to resolver queue
	14599 2016-02-19 01:18:25.680411 Got DNS result for host server01 : 192.168.2.188
	14599 2016-02-19 01:18:36.369905 Adding to combo msg: status+30 server01.conn green <!-- [flags:OrdAsTLe] --> Fri Feb 19 01:18:25 2016 conn ok
	14599 2016-02-19 01:18:36.495624 Calc content color host server03 : 14599 2016-02-19 01:18:36.495641 Calc http color host server01 : 14599 2016-02-19 01:18:36.495647 http://server01/(green) 14599 2016-02-19 01:18:36.495651  --> green
	14599 2016-02-19 01:18:36.495656 Adding to combo msg: status+30 server01.http green Fri Feb 19 01:18:25 2016: OK
	14599 2016-02-19 01:18:36.495662 Calc content color host server01 : 14599 2016-02-19 01:18:36.495711 Calc http color host server02 : 14599 2016-02-19 01:18:36.495717 http://server02/(green) 14599 2016-02-19 01:18:36.495720  --> green
	15866 2016-02-19 01:19:58.472535 Adding hostname 'server01' to resolver queue
-->	15866 2016-02-19 01:19:58.472579 DNS lookup failed for server01 - status Could not contact DNS servers (11)
	15866 2016-02-19 01:19:58.662143 Could not resolve URL hostname 'server01'
	15866 2016-02-19 01:19:58.662148 Adding tcp test IP=(NULL), port=80, service=http, silent=0
	15866 2016-02-19 01:20:51.309321 Calc content color host server03 : 15866 2016-02-19 01:20:51.309336 Calc http color host server01 : 15866 2016-02-19 01:20:51.309342 http://server01/(red) 15866 2016-02-19 01:20:51.309347  --> red
	15866 2016-02-19 01:20:51.309353 Adding to combo msg: status+30 server01.http red Fri Feb 19 01:19:58 2016: DNS error
	15866 2016-02-19 01:20:51.309358 Calc content color host server01 : 15866 2016-02-19 01:20:51.309399 Calc http color host server02 : 15866 2016-02-19 01:20:51.309404 http://server02/(red) 15866 2016-02-19 01:20:51.309408  --> red
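
To pull every such failure out of a long debug log, a sketch like the following could be used. It assumes the "DNS lookup failed for ... - status ... (N)" wording shown at the arrow above; the function names are hypothetical. As far as I know, status 11 with the text "Could not contact DNS servers" corresponds to c-ares' ARES_ECONNREFUSED, but treat that mapping as something to verify against your c-ares version.

```python
import re
from collections import Counter

# Matches the xymonnet --debug entry marked with the arrow above.
DNS_FAIL_RE = re.compile(
    r"(?P<pid>\d+) (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"DNS lookup failed for (?P<host>\S+) - status (?P<status>.+?) \((?P<code>\d+)\)"
)


def dns_failures(lines):
    """Return one dict per 'DNS lookup failed' entry found in lines."""
    found = []
    for line in lines:
        m = DNS_FAIL_RE.search(line)
        if m:
            entry = m.groupdict()
            entry["code"] = int(entry["code"])
            found.append(entry)
    return found


def failures_by_host(failures):
    """Count failed lookups per hostname."""
    return Counter(f["host"] for f in failures)


def failures_by_pid(failures):
    """Count failed lookups per xymonnet process, i.e. per polling run."""
    return Counter(f["pid"] for f in failures)
```

Grouping by PID shows whether a whole xymonnet run lost its DNS servers or only a few lookups within a run failed.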


	Here is a part of xymonnetagain.log without error :


	URL                      : http://server01/
	HTTP status              : 200
	HTTP headers
	HTTP/1.1 200 OK
	Content-Length: 1569
	Content-Type: text/html
	Content-Location: http://server01/iisstart.htm
	Last-Modified: Thu, 27 Mar 2003 18:18:28 GMT
	Accept-Ranges: bytes
	ETag: "0b282438df4c21:521"
	Server: Microsoft-IIS/6.0
	X-Powered-By: ASP.NET
	Date: Fri, 19 Feb 2016 00:18:25 GMT
	Connection: close

	HTTP output
	(NULL)


	14605 2016-02-19 01:18:27.985810 Calc http color host server01 : 14605 2016-02-19 01:18:27.985814 http://server01/(green) 14605 2016-02-19 01:18:27.985818  --> green
	14605 2016-02-19 01:18:27.985824 Adding to combo msg: status+30 server01.http green Fri Feb 19 01:18:25 2016: OK
	14605 2016-02-19 01:18:27.985829 Calc content color host server01 : 14605 2016-02-19 01:18:27.985839 Calc http color host server02 : 14605 2016-02-19 01:18:27.985842 http://server02/(green) 14605 2016-02-19 01:18:27.985847  -->
	 green


	Command: xymonnet '--ping' '--checkresponse' '--debug' '--dnslog=/var/log/xymon/xymonnet_test.log' 'server01' 'ap1-aze' 'server02' 'server04' 'server12.domain.local' 'server05.domain2.local' 'server06' 'domain-ws01.domain03.com' 'domain.domain03.com' 'portal.domain.com' 'server13.domain.com' 'server07' 'server07-t' 'server14' 'server15' 'server07' 'server08' 'server09' 'server10' 'server11' 'www.domain04.com' 'www.domain02.com' 'www.microsoft.com'
	Environment XYMONNETWORK=''
	Environment CONNTEST='TRUE'
	Environment IPTEST_2_CLEAR_ON_FAILED_CONN='TRUE'
	17377 2016-02-19 01:20:51.381764 Adding hostname 'server01' to resolver queue
	17377 2016-02-19 01:20:51.387459 Got DNS result for host server01 : 192.168.2.188

	17377 2016-02-19 01:20:53.762665 Adding to combo msg: status+30 server01.conn green <!-- [flags:OrdAsTLe] --> Fri Feb 19 01:20:51 2016 conn ok


	Address=192.168.2.188:80, open=1, res=0, err=0, connecttime=0.009261, totaltime=0.017287,
	httpstatus = 200, open=1, errcode=0, parsestatus=0
	Response:
	HTTP/1.1 200 OK
	Content-Length: 1569
	Content-Type: text/html
	Content-Location: http://server01/iisstart.htm
	Last-Modified: Thu, 27 Mar 2003 18:18:28 GMT
	Accept-Ranges: bytes
	ETag: "0b282438df4c21:521"
	Server: Microsoft-IIS/6.0
	X-Powered-By: ASP.NET
	Date: Fri, 19 Feb 2016 00:20:51 GMT
	Connection: close

	Address=192.168.2.104:443, open=1, res=0, err=0, connecttime=0.031774, totaltime=0.570770, , certinfo='Server certificate:
			subject:/CN=fs.domain.com
			start date: 2016-01-13 00:00:00 GMT
			expire date:2017-01-17 23:59:59 GMT
			key size:2048
			issuer:/C=US/O=thawte, Inc./OU=Domain Validated SSL/CN=thawte DV SSL CA - G2
			signature algorithm: sha256WithRSAEncryption

	Cipher used: AES128-SHA (128 bits)
	' (1484697599 valid)
	httpstatus = 200, open=1, errcode=0, parsestatus=0
	Response:
	HTTP/1.1 200 OK
	Cache-Control: no-cache
	Pragma: no-cache
	Content-Type: text/html; charset=utf-8
	Expires: -1
	Server: Microsoft-IIS/7.5
	X-AspNet-Version: 2.0.50727
	X-Powered-By: ASP.NET
	Date: Fri, 19 Feb 2016 00:20:51 GMT
	Connection: close
	Content-Length: 3923


	URL                      : http://server01/
	HTTP status              : 200
	HTTP headers
	HTTP/1.1 200 OK
	Content-Length: 1569
	Content-Type: text/html
	Content-Location: http://server01/iisstart.htm
	Last-Modified: Thu, 27 Mar 2003 18:18:28 GMT
	Accept-Ranges: bytes
	ETag: "0b282438df4c21:521"
	Server: Microsoft-IIS/6.0
	X-Powered-By: ASP.NET
	Date: Fri, 19 Feb 2016 00:20:51 GMT
	Connection: close

	HTTP output
	(NULL)


	17377 2016-02-19 01:20:53.763440 Calc http color host server01 : 17377 2016-02-19 01:20:53.763445 http://server01/(green) 17377 2016-02-19 01:20:53.763449  --> green
	17377 2016-02-19 01:20:53.763461 Adding to combo msg: status+30 server01.http green Fri Feb 19 01:20:51 2016: OK
	17377 2016-02-19 01:20:53.763469 Calc content color host server01 : 17377 2016-02-19 01:20:53.763482 Calc http color host server02 : 17377 2016-02-19 01:20:53.763485 http://server02/(green) 17377 2016-02-19 01:20:53.763488  --> green
quoted from Japheth Cleaver

Another potential help might be altering your --concurrency=N setting to
something lower than the system default (which will typically be 256).
I'm running tests every 100 s, so I changed --flap-count to 10 and --flap-seconds to 1001, and I had no
disruption or flapping status last night.
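
To reason about what those two values mean with a 100 s poll interval, here is a rough Python sketch. It assumes the rule is "the last flap-count status changes all happened within flap-seconds", which is my reading of the "5 changes in 1745 seconds" log lines earlier in the thread and not taken from the xymond source; the function name is made up.

```python
def is_flapping(change_times, now, flap_count, flap_seconds):
    """True if the latest flap_count changes all fall within flap_seconds of now.

    change_times are the times (in seconds) at which the status changed colour.
    """
    recent = sorted(change_times)[-flap_count:]
    return len(recent) >= flap_count and now - recent[0] <= flap_seconds


# A status that flips at every 100 s poll: 10 changes span 900 s.
every_poll = [i * 100 for i in range(10)]

# A status that flips at every second poll: 10 changes span 1800 s.
every_other_poll = [i * 200 for i in range(10)]
```

Under that assumption, with a count of 10 and a window of 1001 seconds, only a status that changes at practically every 100 s poll would be flagged, so these settings come close to switching flap detection off rather than removing the underlying DNS failures.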
list Greg Earle · Wed, 20 Apr 2016 12:55:54 -0700 ·
quoted from Japheth Cleaver
On Feb 17, 2016, at 12:50 AM,"J.C. Cleaver" <user-87556346d4af@xymon.invalid> wrote:
- Why I have a "DNS error" here ?  I set up the IP yesterday to this host

to solve the issue.  The "conn" error disappeared since yesterday evening
but the http error still remains.
quoted from Japheth Cleaver
All signs do point to an issue with DNS resolution here.

Was this a custom compile or are you using a package?  If custom, what
version of c-ares is on your system?  That's the underlying resolution
library that xymonnet is using by default to handle DNS lookups.  The fact
that the 'conn' test remained good after you added the local hosts entry
matches that, since HTTP tests are performed using their own secondary DNS
lookup (to deal with vhosts, etc.) unless the IP is specified there as well.
J.C.,

I just stumbled across this thread from 2 months ago.  We're having DNS
glitches at my work and it's causing a flood of <hostname>:http "DNS error"
alerts in Xymon, which is becoming a real problem.

But here's what I don't understand.  All of our HTTP-tested hosts are
in the "hosts.cfg" file with their short names (instead of FQHNs).  So I
couldn't understand why DNS was involved since the IP addresses and names
were right there in "hosts.cfg" for Xymon to use.

Your response - specifically "unless the IP is specified there as well" -
implies that there might be another location where I could load the
names and addresses of our HTTP-tested hosts, to avoid this problem.

(Yes, I know - hosts and IP addresses can change.  But I'm in control of
that so I can deal.)

If this is the case, where is that other location where I can specify the
short names/IP addresses for the HTTP tests?

Thanks,

	- Greg
list Japheth Cleaver · Wed, 20 Apr 2016 15:40:14 -0700 ·
quoted from Greg Earle

On Wed, April 20, 2016 12:55 pm, Greg Earle wrote:
On Feb 17, 2016, at 12:50 AM,"J.C. Cleaver" <user-87556346d4af@xymon.invalid>
wrote:
- Why I have a "DNS error" here ?  I set up the IP yesterday to this
host
to solve the issue.  The "conn" error disappeared since yesterday
evening
but the http error still remains.
All signs do point to an issue with DNS resolution here.

Was this a custom compile or are you using a package?  If custom, what
version of c-ares is on your system?  That's the underlying resolution
library that xymonnet is using by default to handle DNS lookups.  The
fact
that the 'conn' test remained good after you added the local hosts entry
matches that, since HTTP tests are performed using their own secondary
DNS
lookup (to deal with vhosts, etc.) unless the IP is specified there as
well.
J.C.,

I just stumbled across this thread from 2 months ago.  We're having DNS
glitches at my work and it's causing a flood of <hostname>:http "DNS
error"
alerts in Xymon, which is becoming a real problem.

But here's what I don't understand.  All of our HTTP-tested hosts are
in the "hosts.cfg" file with their short names (instead of FQHNs).  So I
couldn't understand why DNS was involved since the IP addresses and names
were right there in "hosts.cfg" for Xymon to use.

Your response - specifically "unless the IP is specified there as well" -
implies that there might be another location where I could load the
names and addresses of our HTTP-tested hosts, to avoid this problem.

(Yes, I know - hosts and IP addresses can change.  But I'm in control of
that so I can deal.)

If this is the case, where is that other location where I can specify the
short names/IP addresses for the HTTP tests?

Thanks,

	- Greg
Yep: https://xymon.com/help/manpages/man5/hosts.cfg.5.html#lbAR

192.168.0.10 mywebserver # http://www.sample.com=192.168.0.10/


For HTTP tests, an IP override is put in for each URL you're using. This
will prevent a DNS lookup at http-test time for this URL, and when
combined with testip (for any other TCP checks here) and
noconn (for ping->fping resolution), it should prevent any DNS lookups
from being done for the string "mywebserver".

Unless there's been a regression, that should be sufficient. It's
definitely worked at scale for mass testing of "hosts" that are not being
referred to by a valid DNS name and simply listen on a distinct port.
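
If there are many web hosts to convert, the override can be generated rather than typed. The following Python sketch builds lines in the same shape as the example above. The helper names are hypothetical, and it deliberately refuses URLs with a port or credentials, since the exact placement of "=IP" in those cases is not covered here and should be checked in hosts.cfg(5).

```python
from urllib.parse import urlsplit, urlunsplit


def with_ip_override(url, ip):
    """Return url with '=ip' appended to the hostname part.

    Example shape: http://www.sample.com/ -> http://www.sample.com=192.168.0.10/
    """
    parts = urlsplit(url)
    if not parts.netloc:
        raise ValueError(f"no hostname in URL: {url!r}")
    if "@" in parts.netloc or ":" in parts.netloc:
        # Ports and credentials are outside the scope of this sketch.
        raise ValueError(f"URL with port or credentials not handled: {url!r}")
    netloc = f"{parts.netloc}={ip}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path or "/", parts.query, parts.fragment)
    )


def hosts_line(ip, name, urls, extra_tags=()):
    """Format one hosts.cfg entry: IP, hostname, then tags and overridden URLs."""
    tags = list(extra_tags) + [with_ip_override(u, ip) for u in urls]
    return f"{ip} {name} # " + " ".join(tags)
```

For example, hosts_line("192.168.0.10", "mywebserver", ["http://www.sample.com/"]) yields the same line as the one quoted above; tags such as testip or noconn can be passed through extra_tags.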


HTH,
-jc
list Greg Earle · Sat, 23 Apr 2016 15:37:21 -0700 ·
On Apr 20, 2016, at 3:40 PM, user-87556346d4af@xymon.invalid wrote:

quoted from Greg Earle


On Wed, April 20, 2016 12:55 pm, Greg Earle wrote:
On Feb 17, 2016, at 12:50 AM,"J.C. Cleaver" <user-87556346d4af@xymon.invalid>
wrote:
- Why I have a "DNS error" here ?  I set up the IP yesterday to this
host
to solve the issue.  The "conn" error disappeared since yesterday
evening
but the http error still remains.
All signs do point to an issue with DNS resolution here.

Was this a custom compile or are you using a package?  If custom, what
version of c-ares is on your system?  That's the underlying resolution
library that xymonnet is using by default to handle DNS lookups.  The
fact that the 'conn' test remained good after you added the local hosts
entry matches that, since HTTP tests are performed using their own
secondary DNS lookup (to deal with vhosts, etc.) unless the IP is
specified there as well.
J.C.,

I just stumbled across this thread from 2 months ago.  We're having DNS
glitches at my work and it's causing a flood of <hostname>:http
"DNS error" alerts in Xymon, which is becoming a real problem.

But here's what I don't understand.  All of our HTTP-tested hosts are
in the "hosts.cfg" file with their short names (instead of FQHNs).  So I
couldn't understand why DNS was involved since the IP addresses and names
were right there in "hosts.cfg" for Xymon to use.

Your response - specifically "unless the IP is specified there as well" -
implies that there might be another location where I could load the
names and addresses of our HTTP-tested hosts, to avoid this problem.

(Yes, I know - hosts and IP addresses can change.  But I'm in control of
that so I can deal.)

If this is the case, where is that other location where I can specify the
short names/IP addresses for the HTTP tests?
Yep: https://xymon.com/help/manpages/man5/hosts.cfg.5.html#lbAR

192.168.0.10 mywebserver # http://www.sample.com=192.168.0.10/

For HTTP tests, an IP override is put in for each URL you're using. This
will prevent a DNS lookup at http-test time for this URL, and when
combined with testip (for any other TCP checks here) and
noconn (for ping->fping resolution), it should prevent any DNS lookups
from being done for the string "mywebserver".

Unless there's been a regression, that should be sufficient. It's
definitely worked at scale for mass testing of "hosts" that are not being
referred to by a valid DNS name and simply listen on a distinct port.
Thank you!!!  I had tried using "testip" in each of the Web server
entries and couldn't understand why we still kept getting DNS meltdowns.

Apparently I missed this subtle line in hosts.cfg(5):

--
xymonnet ignores the "testip" tag normally used to force a test to use the
IP-address from the hosts.cfg file instead of the hostname, when it performs
http and https tests.
--

D'oh!

	- Greg