Xymon Mailing List Archive

ssh Unexpected service response

3 messages in this thread

Vernon Everett · Wed, 14 Aug 2013 06:58:01 +0800
Hi all

From time to time I get the "Unexpected service response" on the ssh tests
of a Solaris host.
The test goes yellow, and recovers again within seconds. 10 seconds on the
last example.

Because of the short duration, it's really difficult to replicate.
Can anybody give me some pointers as to what's causing this?
Does xymon log the exact response somewhere?

The old Hobbit archive has a few threads with this error message, but none
give me usable diagnostic information.

The error isn't associated with any service disruption, and nobody has
reported difficulty with their connections, but I would still like to get
to the bottom of this, and hopefully eliminate the "problem".

Any assistance appreciated.

Regards
Vernon


-- 
"Accept the challenges so that you can feel the exhilaration of victory"
- General George Patton
Jeremy Laidman · Wed, 14 Aug 2013 12:11:04 +1000
Vernon

Is there anything interesting showing in the sshd logs on the Solaris host,
especially differences between the entries for success vs failure?  Is this
happening on any other similar servers?

The xymonnet process has a 10-second timeout for each connection, and
probably gives the "unexpected service response" if it doesn't see "SSH" in
that time.  You could try increasing the timeout (adding "--timeout=20" to
the xymonnet CMD in tasks.cfg) and see if that helps.
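
To capture what the port actually returns during one of these blips, a
small standalone probe can be run alongside Xymon. The following is a
minimal Python sketch and not part of Xymon: `probe_banner` is a
hypothetical helper name, and the test for a leading "SSH" only mirrors the
guess above about what xymonnet checks for.

```python
import socket
import time


def probe_banner(host, port=22, timeout=10.0):
    """Connect to host:port, read the first bytes sent, and time the exchange.

    Returns (elapsed_seconds, raw_bytes, looks_like_ssh).
    The timeout applies separately to the connect and to the read.
    """
    start = time.monotonic()
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # An SSH server sends its identification string first,
            # so no request needs to be written before reading.
            data = sock.recv(256)
    except OSError:
        # Covers timeouts, refused connections, resets and lookup failures;
        # data stays empty in all of these cases.
        pass
    elapsed = time.monotonic() - start
    return elapsed, data, data.startswith(b"SSH")
```

Calling this in a loop against the affected host and logging all three
values would show whether a failure is a slow banner (empty data, elapsed
time close to the timeout) or a genuinely different response.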

I suspect you're getting a DNS reverse-lookup delay.  Many sshd processes
will, by default, perform a reverse lookup of the connecting IP address so
that they can put the name into the logs.  Many stub resolvers time out
after 15 seconds, at which point sshd gives up and carries on without the
hostname.  I see this on servers that have been misconfigured for DNS:
every login takes 15 seconds until either the DNS config is fixed or sshd
is told not to wait for DNS.
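
One way to test the reverse-lookup theory is to time the lookup directly on
the Solaris host, using the Xymon server's IP address. This is a minimal
Python sketch under that assumption; `time_reverse_lookup` is a
hypothetical helper name, and it measures the system resolver rather than
sshd itself.

```python
import socket
import time


def time_reverse_lookup(ip):
    """Time a reverse (PTR) lookup of ip through the system resolver.

    Returns (elapsed_seconds, hostname), with hostname None on failure.
    """
    start = time.monotonic()
    try:
        name = socket.gethostbyaddr(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        name = None
    return time.monotonic() - start, name
```

A result that takes several seconds, or that only fails after a long wait,
would be consistent with the resolver delay described above.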

You can speed up this reverse lookup by adding the Xymon server IP and name
into /etc/hosts.  Most sshd implementations allow you to turn off the
reverse lookups, either by running with the "-u0" parameter, or (more
recently) by adding "UseDNS no" into sshd_config.  I do this as a matter of
course on any new servers I deploy.

J

Thomas Eckert · Thu, 15 Aug 2013 14:47:09 +0200
We see this regularly with (over)loaded hosts.
Any chance that your Solaris host is under heavy load (network, system, or both) when this happens?

Best
Thomas 
-- 
IT-Beratung Eckert		Bleickenallee 19			fon: +49 40 7305 1809
Thomas Eckert			22763 Hamburg			fax: +XX XX XXXX XXXX
www.it-eckert.com		user-dcf4e2aaac67@xymon.invalid

