Dumb hobbit network test question
list Tom Kauffman
I've just done my tri-annual hardware shuffle, swapping out all my on-lease RS-6000 for brand spanking new systems. Part of this included upgrading from AIX 5.1 to AIX 5.3.

Now, on half the systems (the last set replaced) I get errors on the smtp and ftp tests -- typically one test every two hours. Interestingly, all these systems barf on the same test cycle. This is obviously something not quite right in the AIX config, but I'm at a loss on what.

I just found out today that we also have production rsh scripts that time out on the same cycle (yeah, I know -- but they've been rsh since year dot, and getting them to ssh is real low on the list . . )

Here's a sample of the network test error:

Service ftp on hudson is not OK : Unexpected service response
Service smtp on hudson is not OK : Unexpected service response

How can I log the actual response? I'm currently running hobbit 4.03rc1; that's scheduled to change sometime next week.

Other suggestions?

TIA --
Tom Kauffman
NIBCO, Inc
list Tom Kauffman
OK -- I used the --debug option; it wasn't as bad as I thought it would be. The resulting log was just over 11 MB by the time my problem occurred and I could turn it off.

Henrik, can you clarify what this really means?

Address=10.8.224.9:21, open=1, res=0, err=1, connecttime=0.003110, totaltime=10.063026,
Address=10.8.224.38:21, open=1, res=0, err=0, connecttime=0.003060, totaltime=0.028471, banner='220 wabash FTP server (Version 4.2 Sat Feb 5 10:12:55 CST 2005) ready. 221 Goodbye. ' (86 bytes)
(good response)
2005-11-30 14:03:59 tcp_got_expected: No data in banner
2005-11-30 14:03:59 Adding to combo msg: status volga.ftp yellow <!-- [flags:OrdastILe] --> Wed Nov 30 14:03:00 2005 ftp NOT ok

This system is showing a load of 0.1 (max 2.0) on a 2-way 1.6 GHz machine; the FTP connect time is 17.4 microseconds (avg) and peaked in the last 48 hours at 5.2 milliseconds.

TIA
Tom Kauffman
NIBCO, Inc
list Frederic Mangeant
Hi Tom
-----Original Message----- From: Kauffman, Tom [mailto:user-3feba9e60a8b@xymon.invalid] Sent: Tuesday, November 29, 2005 11:28 AM To: user-ae9b8668bcde@xymon.invalid Subject: [hobbit] Dumb hobbit network test question
[snip]
Here's a sample of the network test error: Service ftp on hudson is not OK : Unexpected service response Service smtp on hudson is not OK : Unexpected service response
Are you using the "--checkresponse" option? I had the same "Unexpected service response" warnings until I removed it.

http://www.hswn.dk/hobbit/help/manpages/man1/bbtest-net.1.html

--checkresponse[=COLOR]
When testing well-known services (e.g. FTP, SSH, SMTP, POP-2, POP-3, IMAP, NNTP and rsync), bbtest-net will look for a valid service-specific "OK" response. If another response is seen, this will cause the test to report a warning (yellow) status. Without this option, the response from the service is ignored.
The optional color-name is used to select a color other than yellow for the status message when the response is wrong. E.g. "--checkresponse=red" will cause a "red" status message to be sent when the service does not respond as expected.
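To make the manpage text concrete, here is a minimal sketch of what a response check of this kind might look like. It is not bbtest-net's actual code: the function name, the lookup table, and the choice to treat an empty banner as a wrong response are all assumptions for illustration. The only fact relied on is that "220" is the standard ready greeting for both FTP and SMTP.

```python
# Hypothetical table of expected greeting prefixes (not taken from bbtest-net).
# "220" is the standard "service ready" reply for both FTP and SMTP.
EXPECTED_PREFIX = {"ftp": "220", "smtp": "220"}


def classify_banner(service, banner, warn_color="yellow"):
    """Return a status color for a service banner.

    warn_color plays the role of the optional COLOR in --checkresponse[=COLOR].
    """
    expected = EXPECTED_PREFIX.get(service)
    if expected is None:
        # No known greeting for this service, so nothing to check.
        return "green"
    if banner and banner.startswith(expected):
        return "green"
    # Wrong greeting, or no banner data at all (assumed to count as wrong).
    return warn_color


if __name__ == "__main__":
    print(classify_banner("ftp", "220 wabash FTP server ready."))
    print(classify_banner("ftp", ""))
    print(classify_banner("smtp", "", warn_color="red"))
```

Under these assumptions, a server that accepts the connection but sends nothing in time ends up in the warning color, not green.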
list Henrik Størner
On Wed, Nov 30, 2005 at 03:38:56PM -0500, Kauffman, Tom wrote:
Henrik, can you clarify what this really means? Address=10.8.224.9:21, open=1, res=0, err=1, connecttime=0.003110, totaltime=10.063026,
"open=1" means that the connection to the server succeeded. The interesting thing here is that it took only 0.003 seconds to get a connection, but then Hobbit spent more than 10 seconds waiting for a banner to appear. It never did - at least not within those 10 secs; the "err=1" means it gave up waiting for the data and signals a timeout.
Address=10.8.224.38:21, open=1, res=0, err=0, connecttime=0.003060, totaltime=0.028471, banner='220 wabash FTP server (Version 4.2 Sat Feb 5 10:12:55 CST 2005) ready. 221 Goodbye.' (86 bytes)
This is a different server. Again, connecting takes about 0.003 secs, but the banner appears almost immediately - the entire exchange happens in 28 milliseconds.

It might be that the FTP server performs a reverse DNS lookup of the Hobbit server's IP address when Hobbit connects to check the FTP service. Sometimes DNS lookups take a while - maybe long enough for Hobbit to reach the 10-second timeout. Maybe your ftp server has a local DNS cache, and the timeout only happens when the cached DNS entry expires and has to be refreshed.

One thing you can try is to add a "--timeout=30" option to the bbtest-net command in hobbitlaunch.cfg; that makes it wait up to 30 seconds before flagging a timeout.

Regards,
Henrik
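The difference between connecttime and totaltime in the debug output can be reproduced by hand. The following is a rough sketch, not Hobbit's implementation; the function names are made up for illustration, and the 10-second default simply mirrors the timeout discussed above.

```python
import socket
import time


def wait_for_banner(sock, timeout):
    """Wait up to `timeout` seconds for greeting data; None means timed out."""
    sock.settimeout(timeout)
    try:
        return sock.recv(1024)
    except socket.timeout:
        # Connected fine, but the server sent nothing before the deadline.
        return None


def probe(host, port, timeout=10.0):
    """Return (connect_seconds, total_seconds, banner_or_None) for a TCP service."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connect_time = time.monotonic() - start
        banner = wait_for_banner(sock, timeout)
    total_time = time.monotonic() - start
    return connect_time, total_time, banner
```

Calling something like `probe("hudson", 21)` from the monitoring host during a bad cycle should show a small connect time and a total time close to the timeout, with no banner, if the server is stalling before it sends its greeting.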
list Tom Kauffman
Oh, how it helps to have additional minds on these things.

Reverse lookup looks to be the culprit. I cloned all these systems in a bit of a hurry -- and the cloning changed the DNS resolution config to point at (in order) my D/R hotsite Win2003 domain controller (active), my D/R hotsite D/R test domain controller (non-existent), my D/R hotsite hobbit system (also not there), and THEN my local DNS server -- so if the D/R DC didn't answer (wonder why IT goes away every two hours?) I would go through multiple retries to non-existent systems.

One more item added to the system clone checklist.

Thanks!
Tom
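A quick way to confirm a slow reverse lookup is to time it on the affected server itself. This is a small sketch using Python's standard resolver call; the function name and the injectable `resolver` parameter are illustrative choices, not anything from Hobbit or AIX.

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname_or_None, elapsed_seconds) for a reverse lookup of `ip`."""
    start = time.monotonic()
    try:
        name = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        name = None
    return name, time.monotonic() - start
```

Run on the FTP or SMTP server against the monitoring host's IP address, an elapsed time of several seconds would point at the resolver configuration and not at the service itself.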
list Vernon Everett
Hi
Does anybody know what the Hobbit status numbers mean?
I am getting this in my hobbitlaunch.log
2005-12-01 13:27:41 Task hobbitclient terminated, status 208
Regards
Vernon
No trees were killed in the creation of this message. However, many
electrons were terribly inconvenienced.
list Henrik Størner
On Thu, Dec 01, 2005 at 01:40:22PM +0800, Vernon Everett wrote:
Does anybody know what the Hobbit status numbers mean? I am getting this in my hobbitlaunch.log 2005-12-01 13:27:41 Task hobbitclient terminated, status 208
It's the exit code returned by the command you run. "208" doesn't sound right; the hobbitclient.sh script normally returns a 0. Henrik
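For readers unfamiliar with exit codes, this sketch shows the general idea of a launcher collecting the status of a command it ran. It says nothing about how hobbitlaunch itself computes or encodes the number it logs; `run_task` is a hypothetical helper, and the child command is a stand-in.

```python
import subprocess
import sys


def run_task(argv):
    """Run a command and return its exit code, as a launcher might log it."""
    completed = subprocess.run(argv)
    # 0 conventionally means success. On POSIX, Python's subprocess reports
    # a child killed by signal N as a return code of -N.
    return completed.returncode


if __name__ == "__main__":
    # Stand-in child process that deliberately exits with code 3.
    code = run_task([sys.executable, "-c", "raise SystemExit(3)"])
    print("Task terminated, status", code)
```

Running the client script by hand and checking its exit code in the shell is a simple way to see whether the non-zero status comes from the script or from something it calls.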
list Vernon Everett
I would have to agree it doesn't sound right. :-) It's also core dumping, and not showing the status page. Any ideas?
Regards
Vernon