What version of Hobbit? And what OS/hardware are you running on?
Version 4.1.1, on Red Hat 3.0, running on a fairly powerful Compaq server (2 x 3 GHz Intel CPUs).
Is there an equivalent number of "Bogus/Timeout" messages reported in
the Hobbit server's "hobbitd" status column?
No. I had one hobbitd report with "Bogus/Timeout = 1" this morning,
and over 50 bbtest-net reports with one or two "whoops" each.
Are there any unusual messages in the hobbitd.log file?
Nothing in hobbitd.log.
The timeout that bbtest-net hits is a 5-second timeout, which is the
default used whenever a message is sent to the Hobbit daemon. The
5 seconds was chosen back when bbtest-net was sending to the Big
Brother daemon; considering that Hobbit can generate much larger
messages, it might be worth trying to increase that timeout somewhat.
Unfortunately, it is set at compile time and cannot be changed easily,
so could you try editing the lib/sendmsg.h file and changing the line
#define BBTALK_TIMEOUT 5
to
#define BBTALK_TIMEOUT 15
Then run "make clean; make" and, as root, "make install" to build and
install the tools with the new timeout setting.
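For context, a compile-time send timeout of this kind is commonly implemented by waiting with select() until the socket is writable and giving up once a deadline passes. The C snippet below is a minimal sketch of that pattern, assuming a single overall deadline per message; the function send_with_timeout and its exact behaviour are hypothetical illustrations, not the actual code in Hobbit's lib/sendmsg.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Illustrative copy of the compile-time constant discussed above. */
#define BBTALK_TIMEOUT 15

/*
 * Hypothetical helper, not Hobbit code: write all len bytes of buf to fd,
 * giving up if the whole message has not been written within timeout_secs
 * seconds (one overall deadline, not a per-write timeout).
 * Returns 0 on success, -1 on timeout or error.
 * Assumes fd is a non-blocking socket or pipe, so write() does not stall
 * after select() has reported it writable.
 */
int send_with_timeout(int fd, const char *buf, size_t len, int timeout_secs)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    time_t deadline = time(NULL) + timeout_secs;
    size_t done = 0;

    while (done < len) {
        time_t now = time(NULL);
        if (now >= deadline)
            return -1;                      /* overall timeout reached */

        fd_set wfds;
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        struct timeval tv = { .tv_sec = deadline - now, .tv_usec = 0 };

        int n = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (n < 0 && errno == EINTR)
            continue;                       /* interrupted by a signal: retry */
        if (n <= 0)
            return -1;                      /* error, or receiver not reading in time */

        ssize_t w = write(fd, buf + done, len - done);
        if (w < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;                   /* transient condition: wait again */
            return -1;
        }
        done += (size_t)w;
    }
    return 0;
}
```

Under this sketch, raising BBTALK_TIMEOUT from 5 to 15 simply gives the receiver three times as long to accept a large message before the sender gives up.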
Also, on the Hobbit server it may be necessary to raise the timeout on
the receiving side, so add "--timeout=30" to the hobbitd command in
~hobbit/server/etc/hobbitlaunch.cfg.
OK, I've changed those to the values you recommended (15 and 30).
So far, bbtest-net hasn't reported any more whoops.
It looks like bbtest-net actually did connect to hobbitd!
-> Could bbtest-net re-open a connection and resend the affected statuses
when an "oops" happens?
It's tricky. Basically, these timeouts should not happen (especially not
when we're connecting to "localhost"), so I'd rather try to figure out why they happen.
Yes, I understand and agree with you.
Let me know if I can do anything to help with this.
One thing that seems quite long in my bbtest-net report is "Test results
transmitted":
Statistics:
Hosts total : 1629
Hosts with no tests : 0
Total test count : 4511
Status messages : 4851
Alert status msgs : 0
Transmissions : 522
TIME SPENT
Event                                     Starttime          Duration
bbtest-net startup                        1122897713.037280  -
Service definitions loaded                1122897713.040386  0.003106
Tests loaded                              1122897713.568623  0.528237
DNS lookups completed                     1122897723.673199  10.104576
Test engine setup completed               1122897723.737976  0.064777
TCP tests completed                       1122897747.000639  23.262663
PING test completed (1569 hosts)          1122897792.655792  45.655153
PING test results sent                    1122897795.920521  3.264729
Test result collection completed          1122897795.921481  0.000960
LDAP test engine setup completed          1122897795.921485  0.000004
LDAP tests executed                       1122897795.921487  0.000002
LDAP tests result collection completed    1122897795.921488  0.000001
NTP tests executed                        1122897796.143392  0.221904
DIG tests executed                        1122897796.399747  0.256355
NSLOOKUP tests executed                   1122897796.534172  0.134425
Test results transmitted                  1122897824.069917  27.535745
bbtest-net completed                      1122897824.074708  0.004791
TIME TOTAL                                                   111.037428
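To put the "Test results transmitted" figure in perspective, here is some quick arithmetic on the numbers above, written as a small C snippet. It assumes (the report does not say so) that each of the 522 "Transmissions" is one connection to hobbitd carrying a batch of status messages; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Figures copied from the bbtest-net report above. */
#define STATUS_MSGS    4851
#define TRANSMISSIONS  522
#define STARTUP_TS     1122897713.037280   /* "bbtest-net startup" */
#define COMPLETED_TS   1122897824.074708   /* "bbtest-net completed" */
#define TRANSMIT_SECS  27.535745           /* "Test results transmitted" */

/* True when a and b differ by less than tol. */
int approx_eq(double a, double b, double tol)
{
    double d = a - b;
    return d < tol && d > -tol;
}

/* Average number of status messages per transmission. */
double statuses_per_transmission(void)
{
    return (double)STATUS_MSGS / TRANSMISSIONS;
}

/* Average wall-clock seconds per transmission in the final phase. */
double seconds_per_transmission(void)
{
    return TRANSMIT_SECS / TRANSMISSIONS;
}

/* Total run time, recomputed from the start and end timestamps. */
double total_runtime(void)
{
    return COMPLETED_TS - STARTUP_TS;
}

/* Print the three derived figures. */
void print_summary(void)
{
    printf("statuses per transmission: %.2f\n", statuses_per_transmission());
    printf("ms per transmission:       %.1f\n", seconds_per_transmission() * 1000.0);
    printf("total runtime (s):         %.6f\n", total_runtime());
}
```

By this arithmetic each transmission carries about 9.3 status messages on average and takes roughly 53 ms, and the start and end timestamps agree with the reported TIME TOTAL of 111.037428 seconds.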
--
Olivier Beau