Xymon Mailing List Archive

Question/feature request

list Eric van de Meerakker
Mon, 03 Jul 2006 14:17:43 +0200
Message-Id: <user-3b08f00d1a90@xymon.invalid>

Hi Henrik,


I don't know if you read my previous response (see below), because it
got sent using the wrong mail account. But I think I've found another
issue: does the network retest procedure after a failed test ignore the
"expect" setting in bb-services?

I tried to do some testing by deliberately misconfiguring the expect
setting for the FTP test (I set it to 221 instead of 220), and now I
have got a cyclical behaviour on the Hobbit server: it will turn all
(five) FTP service tests yellow on the next test, but within a minute
they all turn green again. Again five minutes later they turn yellow
again, back green within a minute, etc. etc. This continues to happen
until I put the expect 220 back in bb-services...

I don't think this is the correct behaviour?


Regards,

Eric.


Hi Henrik,


You're right, at least partially. I found out just now that the issue
was with a misconfigured nsswitch.conf on the FTP server. That file
still had entries for nis and nisplus in it, which caused the FTP
banner response to be very slow (just about the length of the network
test timeout I guess :-), due to the hostname lookup. The TCP connection
would be established quickly, but the FTP banner didn't always appear in
time.

But the weird thing is that some green FTP statuses (especially those
following the yellow ones in the history) don't contain any response
string either?!?

I only saw those FTP statuses at first and they made me try to put in
some debugging code to get the actual response on the web page, directly
behind the "Unexpected service response" text. My first attempt crashed
the bbtest-net executable the next time the failure occurred (exactly
because there was no response), so I rewrote it to catch that and put in
an explicit "(null)" text when no data was received, but in the
meantime I found the cause of the issue.
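
To illustrate the kind of guard described above, here is a minimal Python sketch. It is not the bbtest-net code (which is C); the function name describe_response and the "Service OK" wording are made up for this example, and only the "Unexpected service response" text and the "(null)" placeholder come from the discussion.

```python
def describe_response(expected, received):
    """Build a status line; 'received' may be None or empty when no data arrived.

    Hypothetical helper for illustration only, not part of Hobbit.
    """
    if not received:
        # No data at all: show an explicit placeholder instead of crashing.
        shown = "(null)"
    else:
        shown = received.decode("latin-1").strip()

    if received and received.startswith(expected):
        return "Service OK: " + shown
    return "Unexpected service response: " + shown
```

With this, a banner of b"220 ftp ready\r\n" checked against an expect value of b"221" is reported together with the text that was actually received, and an empty response is reported as "(null)".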

Also, the "Seconds: N.NN" reported seems to be the time in which the TCP
connection to the FTP server was established, not the total test time.
That makes sense I suppose for the TCP timing statistics, but it threw
me off-track in finding the solution for this problem. A yellow FTP
status with 0.12 seconds duration did not indicate a timeout to me ;-)
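
The difference between the connect time and the total test time can be sketched in Python as follows. This is only an illustration under assumptions, not how bbtest-net is implemented: probe_banner and its result keys are hypothetical names, and the check simply tests whether the banner starts with the expected string.

```python
import socket
import time


def probe_banner(host, port, expect=b"220", timeout=5.0):
    """Connect, then wait for a banner, timing both phases separately."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Time until the TCP connection is established.
        connect_seconds = time.monotonic() - start
        sock.settimeout(timeout)
        try:
            banner = sock.recv(1024)
        except socket.timeout:
            # The connection worked, but no banner arrived in time.
            banner = b""
    total_seconds = time.monotonic() - start
    return {
        "connect_seconds": connect_seconds,
        "total_seconds": total_seconds,
        "banner": banner,
        "ok": banner.startswith(expect),
    }
```

A server that accepts connections quickly but sends its banner late (for example because of a slow hostname lookup) would show a small connect_seconds, a total_seconds close to the timeout, and an empty banner.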

BTW, I'm testing this on the 4.2 beta release with recent patches. I'm
in the process of installing a new Hobbit server in our remote
datacenter to monitor the production systems locally, so we won't
experience Internet outages as downtime for our services (we're already
running Hobbit remotely on two oldish servers from two remote offices,
but outages in the ADSL connections get in the way of reporting actual
service downtime to our customers). Alerts from the datacenter will go
out through SMS. We're
very happy with Hobbit so far!


Regards,

Eric.


Henrik Stoerner wrote:

On Mon, Jul 03, 2006 at 11:37:17AM +0200, Eric van de Meerakker (Mailings Lists) wrote:

I have a question on Hobbit: how can I find out what the exact
"Unexpected service response" is on a network test? I have an FTP test
that fails momentarily for (to me) mysterious reasons... Would it be
possible to put the actual value of the unexpected service response in
the error message?

It does that already, actually. If you don't see anything on the status
page, it is because no data was received from the server. (And "no data"
obviously doesn't match the "220" status we expect from an ftp server).


Regards,
Henrik