Xymon Mailing List Archive

Two DNS lookups for a server but one fails

25 messages in this thread

list Martin Ward · Mon, 5 Jan 2009 13:58:56 -0000 ·
Hi all,

I have a number of servers that run their own DNS service, each giving
out a single IP address. These DNS servers are monitored in bb-hosts
like this (names and IPs changed to protect the guilty):

1.2.3.4    dns1.server.com        # smtp dns=a:smtp.server.com,ns:smtp.server.com

The issue is that this configuration performs two DNS lookups, one for
an NS record for smtp.server.com and one for an A record for
smtp.server.com. When run, either the NS or the A record is returned, but
not both. The one that fails shows the following in the web interface:

Service dns on mc20.lon.server.com is not OK : Service unavailable

*** DNS lookup of 'a:smtp.server.com' ***
Timeout (channel destroyed)

In this instance it was the A record that failed but in others it is the
NS record. I always get one of the queries back successfully, but not
both.

These were working fine until I upgraded to Xymon 4.2.2 so this looks
like the culprit. Any ideas or suggestions?
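
To make the "two lookups" concrete, here is a minimal sketch of how a dns= tag like the one above breaks down into separate queries. The helper name parse_dns_tag is hypothetical and this is not Xymon's own parser; the fallback to an A lookup for an entry without a type is an assumption.

```python
# Hypothetical helper, not part of Xymon: split a bb-hosts "dns=" tag
# into the individual lookups it asks for.
def parse_dns_tag(tag):
    """Return a list of (record_type, name) pairs for one dns= tag."""
    prefix = "dns="
    if not tag.startswith(prefix):
        raise ValueError("not a dns= tag: %r" % tag)
    lookups = []
    for item in tag[len(prefix):].split(","):
        item = item.strip()
        if not item:
            continue
        if ":" in item:
            rtype, name = item.split(":", 1)
        else:
            # Assumption: an entry with no explicit type means an A lookup.
            rtype, name = "a", item
        lookups.append((rtype.upper(), name))
    return lookups


print(parse_dns_tag("dns=a:smtp.server.com,ns:smtp.server.com"))
```

Each pair is a separate query sent to the same server, which is why one can time out while the other succeeds.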

|\/|artin
--  
Martin W. Ward
TAC Network Systems Team Leader
COLT
Unit 12, Powergate Business Park
Volt Avenue, Park Royal, London
NW10 6PW, United Kingdom

Tel: + 44 (0)20 7863 5218   Internal: 8 441 5218
Fax: + 44 (0)20 7863 5610
Email: user-2d33a6eb6a05@xymon.invalid 
www.colt.net

Data | Voice | Managed Services


*************************************************************************************
The message is intended for the named addressee only and may not be disclosed to or used by anyone else, nor may it be copied in any way. 

The contents of this message and its attachments are confidential and may also be subject to legal privilege.  If you are not the named addressee and/or have received this message in error, please advise us by e-mailing user-61c7f445d564@xymon.invalid and delete the message and any attachments without retaining any copies. 

Internet communications are not secure and COLT does not accept responsibility for this message, its contents nor responsibility for any viruses. 

No contracts can be created or varied on behalf of COLT Telecommunications, its subsidiaries or affiliates ("COLT") and any other party by email Communications unless expressly agreed in writing with such other party.  

Please note that incoming emails will be automatically scanned to eliminate potential viruses and unsolicited promotional emails. For more information refer to www.colt.net or contact us on +44(0)20 7390 3900.
list Henrik Størner · Wed, 7 Jan 2009 14:29:48 +0100 ·
Hi Martin,

On Mon, Jan 05, 2009 at 01:58:56PM -0000, Ward, Martin wrote:
> *** DNS lookup of 'a:smtp.server.com' ***
> Timeout (channel destroyed)
>
> In this instance it was the A record that failed but in others it is the
> NS record. I always get one of the queries back successfully, but not
> both.
>
> These were working fine until I upgraded to Xymon 4.2.2 so this looks
> like the culprit. Any ideas or suggestions?

There was a change made in 4.2.2 - backported from the 4.3.x code - to
fix a bug that could cause the network tests to lock up while doing the
DNS lookups. It is probably that "fix" that causes the problem.

Going over the DNS code again, I think there's some flawed logic in
how it handles the lookups. Could you try the attached version of 
xymon-4.2.2/bbnet/dns.c ? Just copy it on top of the existing one,
then run "make" and copy the resulting xymon-4.2.2/bbnet/bbtest-net 
binary to your ~xymon/server/bin/ directory (save the existing one
just in case this completely breaks stuff).
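
The "save the existing one" step can be sketched as a small helper that keeps the old binary next to the new one. This is only an illustration: the function name install_with_backup and the ".orig" suffix are assumptions, and the real paths depend on the installation.

```python
import os
import shutil


def install_with_backup(new_binary, install_dir, suffix=".orig"):
    """Copy new_binary into install_dir, keeping any existing file as <name><suffix>."""
    target = os.path.join(install_dir, os.path.basename(new_binary))
    backup = None
    if os.path.exists(target):
        backup = target + suffix
        # copy2 preserves the old binary's mode and timestamps
        shutil.copy2(target, backup)
    shutil.copy2(new_binary, target)
    return target, backup


# Example call (paths are illustrative, taken from the description above):
# install_with_backup("xymon-4.2.2/bbnet/bbtest-net",
#                     os.path.expanduser("~xymon/server/bin"))
```

Rolling back is then a matter of copying the ".orig" file over the new binary.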


Let me know if that is better.


Regards,
Henrik
Attachments (1)
list Martin Ward · Wed, 7 Jan 2009 15:51:35 -0000 ·
Hi Henrik,

I compiled that in and installed it but it seems to have messed up all the remote port checks. All my ssh port tests, which are initiated from the server, are now purple, as well as the DNS checks, syslog port checks and others besides.

Rebuilding with the previous version has restored the remote port checks as well as the dual-DNS-check errors.

|\/|artin
list Johan Sjöberg · Wed, 7 Jan 2009 17:01:12 +0100 ·
Hi.

We have been experiencing another DNS check problem since the upgrade to Xymon 4.2.2. Since I upgraded, I sometimes get "Timeout (channel destroyed) Seconds: 4.999" on two DNS servers that are at an offsite location (connected over VPN). The problem started immediately after the update, so I think it is related. This never happened with 4.2.0. Has the timeout been changed in the new version?
Anyhow, I compiled and installed the new dns.c and have not experienced any "purple" issues. Now I will just wait and see if the DNS check alerts will continue to appear.

/Johan
list Johan Sjöberg · Thu, 8 Jan 2009 09:04:50 +0100 ·
Last night, after installing the new bbtest-net, we received an alarm on bbtest for the Xymon server, saying " - Program crashed Fatal signal caught!"
From hobbitlaunch.log: "2009-01-08 05:05:07 Task bbnet terminated by signal 6"
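
To see whether this was a one-off, the log can be scanned for similar entries. The sketch below is modelled only on the single line quoted above, so the pattern is an assumption and other hobbitlaunch messages may be worded differently; find_crashes is a hypothetical name. Signal 6 is SIGABRT on Linux.

```python
import re

# Pattern modelled on the one log line quoted above (assumption).
CRASH_RE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Task (?P<task>\S+) terminated by signal (?P<signal>\d+)$"
)


def find_crashes(lines):
    """Return (timestamp, task, signal) for each matching log line."""
    crashes = []
    for line in lines:
        match = CRASH_RE.match(line.strip())
        if match:
            crashes.append(
                (match.group("when"), match.group("task"), int(match.group("signal")))
            )
    return crashes


sample = ["2009-01-08 05:05:07 Task bbnet terminated by signal 6"]
print(find_crashes(sample))
```

In practice the lines would come from reading hobbitlaunch.log rather than from a sample list.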

/Johan
list Lars Ebeling · Thu, 8 Jan 2009 09:24:29 +0100 ·
If the program crashed, you should have a core dump. Run gdb on the core dump and post the result here for Henrik to look at.

-- 
Regards
Lars Ebeling

http://leopg9.no-ip.org
Hobbithobbyist

"I am not young enough to know everything."
-- Oscar Wilde
list Johan Sjöberg · Thu, 8 Jan 2009 09:38:11 +0100 ·
Where should that core dump be located? There is no dump located in the server/bin/ directory :(

/Johan
list Lars Ebeling · Thu, 8 Jan 2009 13:17:51 +0100 ·
Run find:

find / -name "core*" -print
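
A name match on "core*" will also hit many unrelated files. As a rough sketch, assuming a Linux system that writes ELF core files, the search can be narrowed by checking the ELF header type field. The function names are hypothetical, and the root directory to search is up to you.

```python
import os
import struct

ET_CORE = 4  # e_type value that marks a core file in the ELF header


def is_elf_core(path):
    """Return True if the file starts with an ELF header of type ET_CORE."""
    try:
        with open(path, "rb") as handle:
            header = handle.read(18)
    except OSError:
        return False
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return False
    # Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian.
    fmt = "<H" if header[5] == 1 else ">H"
    (e_type,) = struct.unpack(fmt, header[16:18])
    return e_type == ET_CORE


def find_core_files(root):
    """Walk root and return files named core* that are real ELF core dumps."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("core"):
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and is_elf_core(path):
                    found.append(path)
    return sorted(found)


# Example call (directory is illustrative):
# print(find_core_files(os.path.expanduser("~xymon")))
```

If nothing turns up, core dumps may be disabled for the process (for example by a core size limit of zero).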

Lars
list Johan Sjöberg · Thu, 8 Jan 2009 15:44:14 +0100 ·
Hi.

I think I was able to run gdb the correct way... Here is the output:

# gdb ../bin/bbtest-net core
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".


warning: Can't read pathname for load map: Input/output error.
Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.8...done.
Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.8
Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8...done.
Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8
Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/tls/i686/cmov/libdl.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/tls/i686/cmov/libnss_files.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libnss_files.so.2
Core was generated by `bbtest-net --report --ping --checkresponse'.
Program terminated with signal 6, Aborted.
#0  0xb7efa410 in ?? ()

/Johan

list dOCtoR MADneSs · Fri, 09 Jan 2009 07:33:49 +0100 ·
Hi,

Same issue on my side. I used the newly compiled bbtest-net, built with the
corrected dns.c. Everything else is OK, but I got the same issue with the same
error message.
list Vernon Everett · Fri, 9 Jan 2009 15:43:58 +0900 ·
This looks remarkably like the error we were experiencing.
What are you running on?

Do you have the case where all the network tests - ping, http, https, ftp etc. go purple?
If so it might be the same.

With Henrik's assistance, we narrowed it down to a problem with the ARES resolver.
I changed our hobbitlaunch.cfg file to include the --no-ares option (see below), and we have never had the issue again.
[bbnet]
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD bbtest-net --report --ping --checkresponse --no-ares
        LOGFILE $BBSERVERLOGS/bb-network.log
        INTERVAL 5m 
(Of course this does come with a caveat. See bbtest-net man page in Xymon docs http://www.xymon.com/hobbit/help/manpages/man1/bbtest-net.1.html )
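
For anyone applying the same change across several servers, the edit can be sketched as a small text transformation. This is an illustration only: add_no_ares is a hypothetical helper, and it assumes the section is named [bbnet] and has a single-line CMD entry as in the example above.

```python
def add_no_ares(cfg_text):
    """Append --no-ares to the bbtest-net CMD line of the [bbnet] section."""
    out = []
    in_bbnet = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track which section we are in; only [bbnet] is touched.
            in_bbnet = (stripped == "[bbnet]")
        elif in_bbnet and stripped.startswith("CMD ") and "bbtest-net" in stripped:
            if "--no-ares" not in stripped.split():
                line = line.rstrip() + " --no-ares"
        out.append(line)
    # Keep a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if cfg_text.endswith("\n") else "")


sample = (
    "[bbnet]\n"
    "        NEEDS hobbitd\n"
    "        CMD bbtest-net --report --ping --checkresponse\n"
    "        INTERVAL 5m\n"
)
print(add_no_ares(sample))
```

Running it twice leaves the file unchanged, so it is safe to reapply.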

YMMV.

Cheers
   V

NOTICE: This email and any attachments are confidential. They may contain legally privileged information or copyright material. You must not read, copy, use or disclose them without authorisation. If you are not an intended recipient, please contact us at once by return email and then delete both messages and all attachments.
list dOCtoR MADneSs · Fri, 09 Jan 2009 07:59:29 +0100 ·
Hi,

I had purple on all the net tests (ping, http, https...).
Is your solution of using the --no-ares option applied to the recompiled
bbtest-net or to the original one? (In other words, does your solution fix
the DNS test problem, or the mass-purple one?)
list Vernon Everett · Fri, 9 Jan 2009 16:16:02 +0900 ·
It fixed both problems.
It worked with the bbtest-net as it was.
--no-ares is a standard option. 
list Johan Sjöberg · Fri, 9 Jan 2009 09:49:18 +0100 ·
We did not experience any "purple" problems, only this single crash of the bbtest-net. But I suppose that several consecutive crashes would have caused the tests to go purple.

/Johan
list dOCtoR MADneSs · Fri, 09 Jan 2009 11:22:24 +0100 ·
Using --no-ares does not get rid of the "Timeout (channel destroyed)"
errors with the original bbtest-net binary.
With the corrected binary it seems to work. I'll wait to see whether the
network tests stay alive and keep you informed.
list Martin Ward · Fri, 9 Jan 2009 10:34:30 -0000 ·
Johan,

The "purple problems" are a different symptom of this same error. Having experienced both, I can say that either bbtest-net crashes or all your remote connection tests (where bbtest-net verifies whether a remote port is accessible) turn purple. You may possibly get both at the same time; I haven't noticed that yet.

I can confirm that using the new dns.c code bbtest-net crashes on my Solaris 10 system whether I use the --no-ares option or not. 8-(

|\/|artin
list Martin Ward · Fri, 9 Jan 2009 11:48:02 -0000 ·
OK, I found a core file and have gleaned the following (I removed the symbol load messages):

Core was generated by `bbtest-net --report --ping --checkresponse --no-ares'.
Program terminated with signal 6, Aborted.
#0  0xfec64a27 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0  0xfec64a27 in _lwp_kill () from /lib/libc.so.1
#1  0xfec621d4 in thr_kill () from /lib/libc.so.1
#2  0xfec111c7 in raise () from /lib/libc.so.1
#3  0xfebf15d9 in abort () from /lib/libc.so.1
#4  0x0806cbea in sigsegv_handler (signum=11) at sig.c:52
#5  0xfec63def in __sighndlr () from /lib/libc.so.1
#6  0xfec5a292 in call_user_handler () from /lib/libc.so.1
#7  <signal handler called>
#8  0x0806db51 in strbuf_addtobuffer (buf=0x10,
    newtext=0x8106960 "@\f\a\bÀX\022\b\b", newlen=135419976) at strfunc.c:100
#9  0x08061783 in dns_detail_callback (arg=0x812ee90, status=16, abuf=0x0,
    alen=0) at dns2.c:215
#10 0x08070ba0 in end_squery (squery=0x81258c0, status=134700378, abuf=0x0,
    alen=0) at ares_search.c:185
#11 0x08070c6f in search_callback (arg=0x81258c0, status=134700378, abuf=0x0,
    alen=0) at ares_search.c:179
#12 0x08072fac in qcallback (arg=0x8106960, status=16, abuf=0x0, alen=0)
    at ares_query.c:110
#13 0x080728b4 in ares_destroy (channel=0x8125848) at ares_destroy.c:40
#14 0x08060ff3 in dns_test_server (serverip=0x0,
    hostname=0x810781c "a:mx.colt.net,ns:mx.colt.net", banner=0x8106f60)
    at dns.c:362
#15 0x08057bf7 in run_nslookup_service (service=0x0) at bbtest-net.c:970
---Type <return> to continue, or q <return> to quit---
#16 0x0805abf9 in main (argc=5, argv=0x804675c) at bbtest-net.c:2218
(gdb) 
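
One reading of the backtrace above, offered as an assumption rather than a diagnosis: ares_destroy() at frame #13 finishes the still-pending query by invoking its callback with status 16 and no answer buffer (abuf=0x0), and the callback then writes into per-query state that is no longer valid (buf=0x10). In c-ares, status 16 is ARES_EDESTRUCTION. The small Python model below only illustrates that call pattern and one possible defensive guard. Channel, detail_callback and the msgbuf key are hypothetical names, not Xymon or c-ares code, and the guard is not necessarily what any Xymon fix does.

```python
ARES_EDESTRUCTION = 16  # c-ares status passed to callbacks when the channel is destroyed


class Channel:
    """Toy stand-in for a resolver channel holding unanswered queries."""

    def __init__(self):
        self.pending = []

    def send(self, callback, state):
        # Remember the callback and its per-query state until an answer arrives.
        self.pending.append((callback, state))

    def destroy(self):
        # Every query still pending is finished with the destruction status
        # and no answer data, mirroring frames #13 to #9 of the backtrace.
        pending, self.pending = self.pending, []
        for callback, state in pending:
            callback(state, ARES_EDESTRUCTION, None)


def detail_callback(state, status, answer):
    buf = state.get("msgbuf")
    if buf is None:
        # The per-query buffer was already released: do not touch it.
        return
    if status == ARES_EDESTRUCTION:
        # Message text as reported on the status page earlier in the thread.
        buf.append("Timeout (channel destroyed)\n")
        return
    buf.append("answer of %d bytes\n" % len(answer))


channel = Channel()
live = {"msgbuf": []}
released = {"msgbuf": None}  # models state freed before destroy() runs
channel.send(detail_callback, live)
channel.send(detail_callback, released)
channel.destroy()
```

In this model the query whose buffer is still valid records the "channel destroyed" message, while the one whose buffer is gone is skipped instead of crashing.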
|\/|artin
list Martin Ward · Fri, 9 Jan 2009 15:31:06 -0000 ·
Well, I have poked and prodded at the code for the last few hours, but to no avail. I can't see why strbuf_addtobuffer() is shown as having rubbish passed to it in the newtext variable when it is being called from dns2.c with a static text string: "Undocumented ARES return code\n"

Also I am unsure why it's even reaching this part of the code when I specified --no-ares on the command line.

Any ideas?

|\/|artin
list Henrik Størner · Mon, 12 Jan 2009 11:50:22 +0000 (UTC) ·
OK, it seems that the bugfix I added in Xymon 4.2.2 had some nasty
side-effects.

I have started work on a 4.2.3 maintenance tree in Subversion, and
there is a new version of the DNS code in it currently. 
http://hobbitmon.svn.sourceforge.net/viewvc/hobbitmon/branches/4.2.3/

You can try out this code (download it from the link above), but it
also involves a change from C-ARES 1.2.1 -> 1.6.0, so you will have
to re-run the configure script to perhaps pick up a new runtime
library that the new C-ARES requires (librt).

I ran it for most of Friday afternoon at work with no obvious bad
effects, so I hope it will work better than the current code.

In <user-b8adb59d5e00@xymon.invalid> "Ward, Martin" <user-2d33a6eb6a05@xymon.invalid> writes:
Also I am unsure why it's even reaching this part of the code when I specified
--no-ares on the command line.
Xymon still uses ARES to perform the "dns" tests for specific hosts; 
the standard resolver library does not allow you to specify what DNS
server to query. So the --no-ares option only has effect on the DNS
lookups Hobbit performs to determine the IP of the hosts it is testing,
it does not affect the specific testing of a DNS server.
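
To illustrate the point about querying a specific server: a resolver call such as gethostbyname() takes only a name and uses whatever servers the system is configured with, so a test aimed at one particular DNS server has to build and send the query itself, which is what a library like ARES is for. A minimal Python sketch of such a query, under stated assumptions: build_query and query_server are hypothetical helpers, only A and NS lookups are covered, and the reply is reduced to its response code and answer count.

```python
import socket
import struct

QTYPES = {"a": 1, "ns": 2}  # DNS record type numbers for A and NS


def build_query(name, qtype, query_id=0x1234):
    """Build a single-question DNS query packet (class IN, recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", QTYPES[qtype], 1)
    return header + question


def query_server(server_ip, name, qtype, timeout=5.0):
    """Send the query to one specific server over UDP; return (rcode, answer count)."""
    packet = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server_ip, 53))
        reply, _ = sock.recvfrom(512)
    if len(reply) < 12:
        raise ValueError("reply shorter than a DNS header")
    rcode = reply[3] & 0x0F                        # 0 means "no error"
    ancount = struct.unpack("!H", reply[6:8])[0]   # number of answer records
    return rcode, ancount
```

For example, query_server("1.2.3.4", "smtp.server.com", "a") would send the A lookup from the first message in this thread to that one server; the address and name are the anonymised ones used there.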


Regards,
Henrik
list Martin Ward · Mon, 12 Jan 2009 14:58:38 -0000 ·
Thanks for that Henrik.

I took a copy of my existing 4.2.2 code, overwrote the bbnet directory with the SVN source, and recreated the c-ares subdirectory in the bbnet subdir by untarring c-ares.1.6.0.tar.gz and renaming the c-ares.1.6.0/ subdirectory to c-ares/

Having done this, the compilation failed because it couldn't find a library for the clock_gettime() function. I found that adding "-lrt" to the "LDAPLIBS" variable in the Makefile in the top-level source directory solved the problem, but this may not be the right place to put it. A little more fiddling was required because my SSL libraries are not in the standard library locations and I hate having to set up LD_LIBRARY_PATH everywhere, so my LDAPLIBS and SSLLIBS ended up looking like:

SSLLIBS = -L/usr/local/ssl/lib -R/usr/local/ssl/lib  -lssl -lcrypto
LDAPLIBS = -L/usr/lib -lldap  -lrt

After compiling it successfully I copied bbtest-net into the server/bin directory and restarted Xymon. Apart from a timeout issue with a local script, which I believe is a config issue on my part, things look like they are working fine now. The double DNS lookups are working OK and the remote port connection tests are working too, so that's good for me.

Thanks Henrik!

|\/|artin

list Johan Sjöberg · Mon, 19 Jan 2009 10:45:08 +0100 ·
Is it possible to use the "badTEST" syntax for DNS tests? When I try it, the DNS test stops updating in Xymon.

/Johan
list Henrik Størner · Thu, 22 Jan 2009 14:30:42 +0000 (UTC) ·
quoted from Johan Sjöberg
In <user-4d6ca4e7279b@xymon.invalid> Johan Sjöberg <user-74c177c1220d@xymon.invalid> writes:
Is it possible to use the "badTEST" syntax for DNS tests? When I try it, the DNS test stops updating in Xymon.
My immediate response would be "yes", but I haven't checked the code.
Should work, though.

What does your bb-hosts entry look like?


Regards,
Henrik
list Johan Sjöberg · Thu, 22 Jan 2009 15:52:41 +0100 ·
It looked like this during the tests:
10.225.72.5     xxx.xxx.xxx                  # baddns:1:2:4 devmon:model(compaq;server)

With this entry, the dns test stopped updating.

/Johan
list Henrik Størner · Thu, 22 Jan 2009 15:01:35 +0000 (UTC) ·
In <user-ec5476262788@xymon.invalid> Johan Sjöberg <user-74c177c1220d@xymon.invalid> writes:
It looked like this during the tests:
10.225.72.5     xxx.xxx.xxx                  # baddns:1:2:4 devmon:model(compaq;server)
Ah I see. The "baddns" only tells how to handle failures of
the DNS check. You still need the "dns" to enable DNS checking
at all! So your entry should have been

10.225.72.5  xxx.xxx.xxx  # baddns:1:2:4 dns devmon:model(compaq;server)
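
The same mistake can be caught mechanically. The following is a rough Python sketch, not part of Xymon, that reports any "badTEST" tag on a bb-hosts line that has no matching test tag. It assumes tags are the whitespace-separated words after the "#", and that a test is enabled either by its bare name ("dns") or by a name with settings ("dns=a:..."), as in the entries quoted in this thread. The function name `missing_test_tags` is made up for the example.

```python
def missing_test_tags(line):
    """Return test names that have a badTEST tag but no TEST tag on the line."""
    # Tags are the whitespace-separated words after the '#' marker.
    _, _, tag_text = line.partition("#")
    tags = tag_text.split()
    # A tag such as "dns", "dns=a:host" or "devmon:model(...)" names its test
    # in the part before the first '=' or ':'.
    enabled = {
        tag.split("=", 1)[0].split(":", 1)[0]
        for tag in tags
        if not tag.startswith("bad")
    }
    # "baddns:1:2:4" only tunes failure handling for the "dns" test.
    modified = {tag[3:].split(":", 1)[0] for tag in tags if tag.startswith("bad")}
    return sorted(modified - enabled)
```

Run against the original entry it returns `["dns"]`; once "dns" is added to the line it returns an empty list.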


Regards,
Henrik
list Johan Sjöberg · Thu, 22 Jan 2009 16:09:31 +0100 ·
Ah, thanks, I will try that.

/Johan
