Xymon Mailing List Archive

polling period in hobbit

2 messages in this thread

list Engineering · Fri, 3 Oct 2008 07:39:33 -0400 ·
I am experiencing problems with paging not sending out alerts in time.
Looking at the "conn" column, I would assume that if a device has
missed 3 poll periods, then 180 seconds would have elapsed, since I
thought my poll period for conn was 60 seconds.  Here is a host that
had an issue last night:

 
Service conn on gsoncit-jws-nsc-1 is not OK : Host does not respond to ping

 
System unreachable for 3 poll periods (348 seconds)

 
I looked at the "bbtest" column and here are my stats:

 
Fri Oct 3 07:32:41 2008

 
bbtest-net version 4.2.0

SSL library : OpenSSL 0.9.8b 04 May 2006

LDAP library: OpenLDAP 20327

 
Statistics:

 Hosts total           :     1290

 Hosts with no tests   :        2

 Total test count      :     1341

 Status messages       :     1342

 Alert status msgs     :        0

 Transmissions         :       14

 
DNS statistics:

 # hostnames resolved  :     1289

 # succesful           :     1272

 # failed              :       17

 # calls to dnsresolve :     1341

 
TCP test statistics:

 # TCP tests total     :       46

 # HTTP tests          :        1

 # Simple TCP tests    :       45

 # Connection attempts :       46

 # bytes written       :      386

 # bytes read          :     6449

 
TIME SPENT

Event                                            Starttime          Duration
bbtest-net startup                       1223033561.966478                 -
Service definitions loaded               1223033561.971409          0.004931
Tests loaded                             1223033577.163407         15.191998
DNS lookups completed                    1223033582.170201          5.006794
Test engine setup completed              1223033582.184175          0.013974
TCP tests completed                      1223033582.193734          0.009559
PING test completed (1316 hosts)         1223033647.588782         65.395048
PING test results sent                   1223033647.598961          0.010179
Test result collection completed         1223033647.599010          0.000049
LDAP test engine setup completed         1223033647.599012          0.000002
LDAP tests executed                      1223033647.599014          0.000002
LDAP tests result collection completed   1223033647.599016          0.000002
DNS tests executed                       1223033647.600062          0.001046
RPC tests executed                       1223033647.713111          0.113049
Test results transmitted                 1223033647.770369          0.057258
bbtest-net completed                     1223033647.772611          0.002242
TIME TOTAL                                                         85.806133
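
The durations in the table above can be tallied to see which phase dominates a run. The following is only a rough sketch: the figures are copied by hand from the table, and the `summarize` helper is a hypothetical name for illustration, not part of bbtest-net.

```python
# Durations in seconds, copied by hand from the TIME SPENT table above.
DURATIONS = {
    "Service definitions loaded": 0.004931,
    "Tests loaded": 15.191998,
    "DNS lookups completed": 5.006794,
    "Test engine setup completed": 0.013974,
    "TCP tests completed": 0.009559,
    "PING test completed (1316 hosts)": 65.395048,
    "PING test results sent": 0.010179,
    "Test result collection completed": 0.000049,
    "LDAP test engine setup completed": 0.000002,
    "LDAP tests executed": 0.000002,
    "LDAP tests result collection completed": 0.000002,
    "DNS tests executed": 0.001046,
    "RPC tests executed": 0.113049,
    "Test results transmitted": 0.057258,
    "bbtest-net completed": 0.002242,
}


def summarize(durations):
    """Return (total seconds, name of slowest phase, its share of the total)."""
    total = sum(durations.values())
    slowest = max(durations, key=durations.get)
    return total, slowest, durations[slowest] / total


total, slowest, share = summarize(DURATIONS)
print(f"total run time : {total:.6f} s")
print(f"slowest phase  : {slowest} ({share:.0%} of the run)")
```

The per-phase durations add up to the reported TIME TOTAL, and the ping phase alone accounts for roughly three quarters of the run.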


Does anyone know what may be the issue?

 
Thanks,

 
Josh
list Iain M Conochie · Fri, 03 Oct 2008 15:59:03 +0100 ·
Engineering wrote:
I am experiencing problems with the paging not sending out alerts in
time. In looking at the "conn" column, I would assume that if the
device has missed 3 poll periods then the seconds that would have
elapsed would be 180sec. Being that my poll period I thought was 60
seconds for conn. Here is a host that had an issue last night:

I think you will find the interesting stats are:

PING test completed (1316 hosts) 1223033647.588782 65.395048
TIME TOTAL 85.806133

So if you set your poll time to 60 seconds and a run takes 85 seconds to
complete, you are going to have issues!
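
The mismatch can be checked with a little arithmetic. This is only a sketch under stated assumptions: the 60-second interval is the value Josh believes is configured, the other figures come from his report, and the `overrun` helper is a hypothetical name. It does not model how the scheduler actually reacts to a run that overruns its interval.

```python
POLL_INTERVAL = 60.0    # seconds; the interval Josh believes is configured (assumption)
RUN_TIME = 85.806133    # seconds; TIME TOTAL from the bbtest output
OUTAGE_SECONDS = 348    # from "System unreachable for 3 poll periods (348 seconds)"
OUTAGE_POLLS = 3


def overrun(run_time, interval):
    """Seconds by which one test run exceeds the intended interval (0 if it fits)."""
    return max(0.0, run_time - interval)


expected_outage = OUTAGE_POLLS * POLL_INTERVAL    # what Josh expected to see
observed_period = OUTAGE_SECONDS / OUTAGE_POLLS   # what the alert text implies per poll

print(f"run overruns the interval by {overrun(RUN_TIME, POLL_INTERVAL):.1f} s")
print(f"expected outage for 3 polls : {expected_outage:.0f} s")
print(f"observed time per poll      : {observed_period:.0f} s")
```

With these numbers a single run overruns the intended interval by about 26 seconds, and the alert implies about 116 seconds per poll rather than 60.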

Cheers

Iain