Xymon Mailing List Archive

LDAP Check causes xymonnet to hang

Thomas Vachon
Fri, 1 Nov 2013 10:43:12 -0400
Message-Id: <user-3adb7f24581e@xymon.invalid>

Jeremy,

It does seem that LDAP is the issue. The check has no settable timeout in
hosts.cfg. As for what happens to our LOVELY OpenDirectory server: it just
hangs, on both SSL and plain text. All ports remain open, and even ping
still works.

I caught it this time, and it does seem that the lack of a timeout is the
problem here (the check only terminated because we rebooted the box).
Error output:
WARNING: Runtime 521 longer than time limit (300)

TIME SPENT
Event                                     Start time           Duration
xymonnet startup                          1383315414.744906
Service definitions loaded                1383315414.746373    0.001467
Tests loaded                              1383315414.815137    0.068764
DNS lookups completed                     1383315414.815838    0.000701
Test engine setup completed               1383315414.839164    0.023326
TCP tests completed                       1383315426.045943   11.206779
PING test completed (104 hosts)           1383315434.035841    7.989898
PING test results sent                    1383315434.036641    0.000800
Test result collection completed          1383315434.037506    0.000865
LDAP test engine setup completed          1383315434.037568    0.000062
LDAP tests executed                       1383315935.053479  501.015911
LDAP tests result collection completed    1383315935.053491    0.000012
Test results transmitted                  1383315935.055777    0.002286
xymonnet completed                        1383315935.062680    0.006903
TIME TOTAL                                                   520.317774
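Each duration in this report is simply the difference between consecutive start times, so the numbers can be checked mechanically. A small Python sketch (EVENTS, phase_durations and slowest_phase are illustrative names, not part of Xymon) shows that the LDAP phase accounts for roughly 501 of the 520 seconds:

```python
# Start times (seconds since the epoch) copied from the TIME SPENT table above.
EVENTS = [
    ("xymonnet startup", 1383315414.744906),
    ("Service definitions loaded", 1383315414.746373),
    ("Tests loaded", 1383315414.815137),
    ("DNS lookups completed", 1383315414.815838),
    ("Test engine setup completed", 1383315414.839164),
    ("TCP tests completed", 1383315426.045943),
    ("PING test completed (104 hosts)", 1383315434.035841),
    ("PING test results sent", 1383315434.036641),
    ("Test result collection completed", 1383315434.037506),
    ("LDAP test engine setup completed", 1383315434.037568),
    ("LDAP tests executed", 1383315935.053479),
    ("LDAP tests result collection completed", 1383315935.053491),
    ("Test results transmitted", 1383315935.055777),
    ("xymonnet completed", 1383315935.062680),
]
TIME_LIMIT = 300  # seconds, from "WARNING: Runtime 521 longer than time limit (300)"


def phase_durations(events):
    """Duration of each phase = its start time minus the previous event's start time."""
    return [(name, start - prev) for (name, start), (_, prev) in zip(events[1:], events)]


def slowest_phase(events):
    """Return (name, seconds) for the phase that took longest."""
    return max(phase_durations(events), key=lambda item: item[1])


if __name__ == "__main__":
    name, secs = slowest_phase(EVENTS)
    total = EVENTS[-1][1] - EVENTS[0][1]
    print(f"slowest phase: {name} ({secs:.1f} s of {total:.1f} s total)")
    # Truncating both timestamps to whole seconds gives 521; this may be why the
    # warning says 521 rather than 520 (an assumption about how xymonnet rounds).
    whole = int(EVENTS[-1][1]) - int(EVENTS[0][1])
    print(f"whole-second runtime: {whole} s, limit {TIME_LIMIT} s, over: {whole > TIME_LIMIT}")
```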

In hosts.cfg we have:

    $IP  commander.example.com  # ldap://commander.example.com:389/dc=commander,dc=example,dc=com
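The URL in that entry follows the standard LDAP URL layout (RFC 4516): scheme, host, optional port, then the base DN. As a sketch of how it breaks down (parse_ldap_url is a hypothetical helper built only on Python's urllib.parse, and it ignores the optional ?attributes?scope?filter parts), note that this entry is plain ldap:// on port 389 rather than ldaps://:

```python
from urllib.parse import unquote, urlsplit


def parse_ldap_url(url):
    """Split an ldap:// or ldaps:// URL into (scheme, host, port, base_dn).

    Falls back to the conventional ports (389 for ldap, 636 for ldaps) when
    the URL gives none. Query parts (attributes, scope, filter) are ignored.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("ldap", "ldaps"):
        raise ValueError(f"not an LDAP URL: {url!r}")
    default_port = 636 if parts.scheme == "ldaps" else 389
    port = parts.port or default_port
    base_dn = unquote(parts.path.lstrip("/"))
    return parts.scheme, parts.hostname, port, base_dn
```

For example, parse_ldap_url("ldap://commander.example.com:389/dc=commander,dc=example,dc=com") yields the host commander.example.com, port 389 and base DN dc=commander,dc=example,dc=com.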

It is our only LDAP check (it goes over an IPsec site-to-site link with a
50/50 connection).

xymonnet is a standard install; the tasks.cfg entry is:

    CMD xymonnet --report --ping --checkresponse

The process only exits when we reboot the server or when the check passes
normally.
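Until the LDAP check itself can be bounded, one stopgap is an outer deadline on the whole run, so a hung xymonnet is killed instead of living until the next reboot. This is a minimal Python sketch, assuming you launch xymonnet yourself; run_with_deadline is a hypothetical name and this is not a Xymon feature. Judging by the TIME SPENT order, where results are transmitted only after the LDAP phase, a killed run most likely reports nothing for that cycle; the only gain is that the process does not linger indefinitely.

```python
import subprocess


def run_with_deadline(cmd, limit_seconds):
    """Run cmd and return its exit code, or None if it exceeded limit_seconds.

    subprocess.run() kills the child and waits for it before re-raising
    TimeoutExpired, so the timed-out process is not left running.
    """
    try:
        return subprocess.run(cmd, timeout=limit_seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, run_with_deadline(["xymonnet", "--report", "--ping", "--checkresponse"], 300) would cap a run at the 300-second limit mentioned in the warning and return None if it had to kill xymonnet.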

I couldn't get the command to run against just the one server; I will try
again in a bit. I think the underlying problem is that the LDAP check never
times out, so how can a timeout be set?
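Because the TCP port stays open while the directory stops answering, a plain telnet connection would likely succeed even in the bad state. A protocol-level probe with its own deadline can tell the two apart. This is a minimal Python sketch, not something xymonnet provides: probe_ldap, is_bind_response and the returned labels are hypothetical, it sends only an anonymous LDAPv3 simple bind (RFC 4511 encoding) to plain LDAP without TLS, and its BER check is deliberately crude.

```python
import socket

# Anonymous LDAPv3 simple bind request (RFC 4511), BER-encoded:
# SEQUENCE { messageID 1, [APPLICATION 0] { version 3, name "", [0] simple "" } }
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def is_bind_response(reply):
    """Crude BER check: outer SEQUENCE, messageID INTEGER, then tag 0x61 (BindResponse)."""
    try:
        if reply[0] != 0x30:
            return False
        i = 1
        # Outer length is short form (< 0x80) or long form (0x8N followed by N bytes).
        i += 1 + (reply[i] & 0x7F if reply[i] & 0x80 else 0)
        if reply[i] != 0x02:
            return False
        i += 2 + reply[i + 1]  # skip messageID; assumes a short-form length
        return reply[i] == 0x61
    except IndexError:
        return False


def probe_ldap(host, port=389, timeout=10.0):
    """Send an anonymous bind; each socket operation waits at most `timeout` seconds.

    Returns "ok", "connect-failed", "timeout" (connected but no answer in time),
    "closed" (server hung up), or "bad-reply".
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "connect-failed"
    with sock:
        try:
            sock.sendall(ANON_BIND)
            reply = sock.recv(256)
        except socket.timeout:
            return "timeout"
        except OSError:
            return "closed"
    if not reply:
        return "closed"
    return "ok" if is_bind_response(reply) else "bad-reply"
```

If the server behaves as described above, probe_ldap("commander.example.com", 389, timeout=10) should return "timeout" after about ten seconds instead of blocking indefinitely.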


--
Thomas Vachon


On Thu, Oct 31, 2013 at 12:31 AM, Jeremy Laidman
<user-71895fb2e44c@xymon.invalid> wrote:
Thomas

In what way is the LDAP server in a "bad state"?

Are you using LDAP or LDAPS?

Can you connect to the server on the LDAP port using telnet?

What does your hosts.cfg file entry look like?

What parameters do you have for xymonnet in tasks.cfg?

Does the xymonnet process eventually exit?

Can you run xymonnet manually (as the xymon user, under a xymoncmd shell),
such as:

$ xymonnet --debug --timeout=1 <name-of-server>

If this fails in the same way, perhaps you can tweak some parameters, such
as adding "--dns=ip" or "--noping" or other options.

J


On 31 October 2013 00:01, Thomas Vachon <user-bd0daa6991dc@xymon.invalid> wrote:
We are having issues with xymonnet hanging on 4.3.12 (it happened on 4.3.10
too). As soon as we added an LDAP check that can hang (when the LDAP server
is in a bad state), xymonnet hangs and all remote checks go purple.

Nothing shows in the logs. Here is the historical xymonnet info from the
last purple:

Wed Oct 30 12:12:10 2013


xymonnet version 4.3.12
SSL library : OpenSSL 1.0.1e 11 Feb 2013
LDAP library: OpenLDAP 20431

Statistics:
 Hosts total           :       70
 Hosts with no tests   :        1
 Total test count      :      183
 Status messages       :      183
 Alert status msgs     :        0
 Transmissions         :        3

DNS statistics:
 # hostnames resolved  :      115
 # succesful           :       69
 # failed              :        0
 # calls to dnsresolve :      182

TCP test statistics:
 # TCP tests total     :      113
 # HTTP tests          :       45
 # Simple TCP tests    :       68
 # Connection attempts :      113
 # bytes written       :     7192
 # bytes read          :   528667


TIME SPENT
Event                                     Start time           Duration
xymonnet startup                          1383135130.081755
Service definitions loaded                1383135130.089141    0.007386
Tests loaded                              1383135130.147247    0.058106
DNS lookups completed                     1383135130.147712    0.000465
Test engine setup completed               1383135130.159468    0.011756
TCP tests completed                       1383135142.406914   12.247446
PING test completed (69 hosts)            1383135149.073309    6.666395
PING test results sent                    1383135149.074005    0.000696
Test result collection completed          1383135149.075192    0.001187
LDAP test engine setup completed          1383135149.075263    0.000071
LDAP tests executed                       1383135151.403612    2.328349
LDAP tests result collection completed    1383135151.403621    0.000009
Test results transmitted                  1383135151.405148    0.001527
xymonnet completed                        1383135151.407244    0.002096
TIME TOTAL                                                    21.325489


--
Thomas Vachon