Disk,CPU,Memory working Procs not
list Lars
Hello,
I hope someone can help me. I have a Hobbit installation and want to monitor some
ports and processes on a server.
The client is installed and I receive information about disk, CPU, memory and
messages, but the PROC rules don't work. Why?
hobbit-client.cfg on server
HOST=ClientA
    PROC cron 1 -1 yellow
    LOG /var/log/messages
    DIR /home/worker
DEFAULT
    # These are the built-in defaults.
    UP 1m
    LOAD 5.0 10.0
    DISK * 90 95
    MEMPHYS 100 101
    MEMSWAP 50 80
    MEMACT 90 97
localclient.cfg on client
HOST=ClientA
    UP 1m
    DISK * 70 85
    PROC cron 1 -1 yellow
    DIR /home/worker
    FILE /var/log/messages
DEFAULT
    # These are the built-in defaults.
    UP 1h
    LOAD 5.0 10.0
    DISK * 90 95
    MEMPHYS 100 101
    MEMSWAP 50 80
    MEMACT 90 97
    PROC cron 1 -1 yellow
best regards,
Lars
list H. Klomp
Do you have the ps output in the "client data" of the client? Bert
-----Original Message-----
From: Lars [mailto:user-b14a3b373e6d@xymon.invalid]
Sent: Sunday, 16 December 2007 17:24
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] [Hobbit] Disk,CPU,Memory working Procs not
list Lars
Hello,
Yes, I have ps output in the client data.
[ps]
  PID  PPID USER STARTED  S PRI %CPU     TIME %MEM RSZ  VSZ CMD
    1     0 root 20:17:07 S  24  0.0 00:00:01  0.0 284  744 init [5]
    2     0 root 20:17:07 S  27  0.0 00:00:00  0.0   0    0 [kthreadd]
    3     2 root 20:17:07 S 139  0.0 00:00:00  0.0   0    0 much more....
 4428     1 root 20:17:42 S  21  0.0 00:00:00  0.0 564 1984 /usr/sbin/cron
...much more :-)
Lars
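For reference, the rule `PROC cron 1 -1 yellow` is usually read as "at least 1 matching process, no upper limit (-1), otherwise go yellow". The following is only a rough Python sketch of that check against the [ps] section shown above. It assumes a plain substring match on the CMD column, which may not be exactly how Hobbit matches process names, and the names `count_processes` and `check_proc_rule` are made up for illustration.

```python
# Sample rows copied from the [ps] section in this thread.
PS_SECTION = """\
PID PPID USER STARTED S PRI %CPU TIME %MEM RSZ VSZ CMD
1 0 root 20:17:07 S 24 0.0 00:00:01 0.0 284 744 init [5]
2 0 root 20:17:07 S 27 0.0 00:00:00 0.0 0 0 [kthreadd]
4428 1 root 20:17:42 S 21 0.0 00:00:00 0.0 564 1984 /usr/sbin/cron
"""


def count_processes(ps_text, name):
    """Count ps rows whose command column contains `name` (substring match)."""
    lines = ps_text.strip().splitlines()
    ncols = len(lines[0].split())  # number of header columns, CMD is the last
    count = 0
    for line in lines[1:]:
        # Split into at most ncols fields so the command keeps its spaces.
        fields = line.split(None, ncols - 1)
        if len(fields) == ncols and name in fields[-1]:
            count += 1
    return count


def check_proc_rule(ps_text, name, minimum=1, maximum=-1, color="red"):
    """Return (status_color, count); maximum == -1 means no upper limit."""
    count = count_processes(ps_text, name)
    ok = count >= minimum and (maximum == -1 or count <= maximum)
    return ("green" if ok else color), count


print(check_proc_rule(PS_SECTION, "cron", 1, -1, "yellow"))
```

With the ps data posted here, one cron process is found, so under these assumptions the rule itself would be satisfied.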
list Michael A. Price
My bbtest time went from 10 seconds to 89.0 .... Has anyone seen this before???
Wed Dec 19 19:15:55 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 310
Hosts with no tests : 7
Total test count : 307
Status messages : 308
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 303
# succesful : 203
# failed : 100
# calls to dnsresolve : 307
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event                                   Starttime          Duration
bbtest-net startup                      1198091755.294810  -
Service definitions loaded              1198091755.297812  0.003002
Tests loaded                            1198091755.346908  0.049096
DNS lookups completed                   1198091765.439050  10.092142
Test engine setup completed             1198091765.442685  0.003635
TCP tests completed                     1198091765.443457  0.000772
PING test completed (303 hosts)         1198091790.084027  24.640570
PING test results sent                  1198091850.102236  60.018209
Test result collection completed        1198091850.102455  0.000219
LDAP test engine setup completed        1198091850.102472  0.000017
LDAP tests executed                     1198091850.102475  0.000003
LDAP tests result collection completed  1198091850.102482  0.000007
NSLOOKUP tests executed                 1198091850.111523  0.009041
Test results transmitted                1198091850.118622  0.007099
bbtest-net completed                    1198091850.120484  0.001862
TIME TOTAL                                                 94.825674
Thanks, michael
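To see at a glance where the time goes, the TIME SPENT table above can be ranked by duration. This is a small Python sketch, not part of Hobbit. The rows are copied from the output in this message, and the helper names `parse_time_spent` and `slowest` are invented for the example.

```python
import re

# Rows copied from the TIME SPENT table in this message (first nine events).
TIME_SPENT = """\
bbtest-net startup 1198091755.294810 -
Service definitions loaded 1198091755.297812 0.003002
Tests loaded 1198091755.346908 0.049096
DNS lookups completed 1198091765.439050 10.092142
Test engine setup completed 1198091765.442685 0.003635
TCP tests completed 1198091765.443457 0.000772
PING test completed (303 hosts) 1198091790.084027 24.640570
PING test results sent 1198091850.102236 60.018209
Test result collection completed 1198091850.102455 0.000219
"""

# Event name, start timestamp, then a duration or "-" for the first row.
ROW = re.compile(
    r"^(?P<event>.+?)\s+(?P<start>\d+\.\d+)\s+(?P<duration>\d+\.\d+|-)$"
)


def parse_time_spent(text):
    """Return (event, duration_seconds) pairs, skipping rows without a duration."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group("duration") != "-":
            rows.append((m.group("event"), float(m.group("duration"))))
    return rows


def slowest(rows, n=3):
    """Return the n rows with the largest duration, longest first."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


for event, seconds in slowest(parse_time_spent(TIME_SPENT)):
    print(f"{seconds:10.3f}s  {event}")
```

For this run the three largest entries are "PING test results sent" (about 60 s), "PING test completed" (about 25 s) and "DNS lookups completed" (about 10 s), which together account for nearly all of the 94.8 s total.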
list Josh Luthman
# failed : 100 <--- may be the cause, lots of failed DNS queries
--
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX
Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
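One way to find out which hostnames are behind the failed lookups is to try resolving each of them from the Hobbit server. The following Python sketch does that with the system resolver. Note that this is an assumption about the lookup path: bbtest-net may resolve names differently from the operating system, so treat the result as a hint only. The function name `find_unresolvable` and the sample hostnames are hypothetical.

```python
import socket


def find_unresolvable(hostnames, resolve=socket.gethostbyname):
    """Return the hostnames for which `resolve` raises a lookup error."""
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror and socket.herror derive from OSError
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hypothetical names; in practice, feed in the hostnames from bb-hosts.
    for name in find_unresolvable(["localhost", "no-such-host.invalid"]):
        print("cannot resolve:", name)
```

The `resolve` parameter exists only so the function can be exercised without network access.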
list Michael A. Price
Thanks... Actually, I updated my DNS servers and went from 300 failed lookups to 100, so I thought things were going to improve.... But it got worse!!!! Any other ideas??? Thanks, michael
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 20, 2007 8:10 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
# failed : 100 <--- may be the cause, lots of failed
DNS queries
list Josh Luthman
If that was the only change you made recently, try switching the DNS servers back to see if the problem disappears.
On 12/20/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
Thanks… Actually, I updated my DNS servers and went from 300 failed lookups to 100. So I thought I was going to improve…. But it got worse!!!! Any other ideas??? Thanks, michael
list Michael A. Price
Thanks for getting back to me on this. I updated the hobbit server to not use the DNS servers, and all that does is make it go from 100 failed hosts to 299 failed hosts. I think the problem is the large "PING test results sent" number. What else could it be??? Here is another printout... Thanks, michael
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 311
Hosts with no tests : 7
Total test count : 308
Status messages : 309
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 304
# succesful : 203
# failed : 101
# calls to dnsresolve : 308
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event                                   Starttime          Duration
bbtest-net startup                      1198691205.281738  -
Service definitions loaded              1198691205.282850  0.001112
Tests loaded                            1198691205.316420  0.033570
DNS lookups completed                   1198691215.446830  10.130410
Test engine setup completed             1198691215.450594  0.003764
TCP tests completed                     1198691215.451393  0.000799
PING test completed (304 hosts)         1198691240.081987  24.630594
PING test results sent                  1198691270.090627  30.008640
Test result collection completed        1198691270.090642  0.000015
LDAP test engine setup completed        1198691270.090656  0.000014
LDAP tests executed                     1198691270.090660  0.000004
LDAP tests result collection completed  1198691270.090663  0.000003
NSLOOKUP tests executed                 1198691270.146990  0.056327
Test results transmitted                1198691270.149410  0.002420
bbtest-net completed                    1198691270.150271  0.000861
TIME TOTAL                                                 64.868533
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 20, 2007 11:04 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
If that was the only change you made recently try switching the DNS
servers back to see if the problem disappears.
list Josh Luthman
Your calls to dnsresolve went up by one. How in the world did you "[update] the hobbit server to not use the DNS servers"? It looks to me like it is still doing exactly the same DNS work...
On 12/26/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
Thanks for getting back to me on this. I updated the hobbit server to not use the DNS servers and all that does is cause it to go from 100 failed hosts to 299 failed hosts. I think it's the large "PING test results sent" number, what else could be the problem???
list Michael A. Price
I just modified the /etc/nsswitch.conf file to remove DNS. I find it interesting that, whether the hobbit server uses DNS servers or local hosts files to look up the hosts, the 'PING test results sent' number is still off the charts. Thanks so much for getting back to me.
Thanks, michael
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Wednesday, December 26, 2007 6:00 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
Your calls to dnsresolve went up one, how in the world did you "[update]
the hobbit server to not use the DNS servers"?
It looks like it is still doing the exact same stuff concerning DNS to
me...
list Josh Luthman
Michael,
Try adding "testip" after the comment marker on as many hosts as possible, e.g.:
10.0.0.250 myftp.server.com # testip
Josh
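If there are a few hundred host entries, adding the tag by hand is tedious. The following Python sketch shows one way to append "testip" to bb-hosts style lines. It rests on several assumptions: each host line has the form "IP hostname # tags", lines that do not start with an IPv4 address (page, group, comments) are left alone, and 0.0.0.0 entries are skipped because the tag only makes sense when the file holds the real address. The function name `add_testip` is made up, and the result should be reviewed before replacing a live bb-hosts file.

```python
import re

# Loose IPv4 pattern; good enough to tell host lines from other directives.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def add_testip(line):
    """Return the line with a 'testip' tag added if it is a host entry."""
    stripped = line.rstrip("\n")
    fields = stripped.split()
    # Leave non-host lines and placeholder addresses untouched.
    if len(fields) < 2 or not IPV4.match(fields[0]) or fields[0] == "0.0.0.0":
        return stripped
    # Everything after the first '#' is the tag list for the host.
    host_part, sep, tags = stripped.partition("#")
    if "testip" in tags.split():
        return stripped
    if not sep:
        return f"{stripped} # testip"
    new_tags = ("testip " + tags.strip()).strip()
    return f"{host_part.rstrip()} # {new_tags}"


for sample in [
    "10.0.0.250 myftp.server.com # ftp",
    "10.0.0.250 myftp.server.com # testip",
    "page servers Servers",
]:
    print(add_testip(sample))
```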
On 12/27/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
I just modified the /etc/nsswitch.conf file to remove DNS.
I find it interesting that no matter if the hobbit server uses DNS servers
or local host files to look up the hosts the 'PING Test Results Sent' number
is still off the charts.
Thanks so much for getting back to me
Thanks, michael
*From:* Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
*Sent:* Wednesday, December 26, 2007 6:00 PM
*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] bbtest - errors
Your calls to dnsresolve went up one, how in the world did you "[update]
the hobbit server to not use the DNS servers"?
It looks like it is still doing the exact same stuff concerning DNS to
me...
On 12/26/07, *Michael A. Price* <user-d7d653acf808@xymon.invalid> wrote:
Thanks for getting back to me on this.
I updated the hobbit server to not use the DNS servers and all that does
is cause it to go from 100 failed hosts to 299 failed hosts.
I think it's the large "PING test results sent" number, what else could be
the problem???
Here is another printout…
Thanks, michael
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 311
Hosts with no tests : 7
Total test count : 308
Status messages : 309
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 304
# succesful : 203
# failed : 101
# calls to dnsresolve : 308
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event Starttime Duration
bbtest-net startup 1198691205.281738 -
Service definitions loaded 1198691205.282850
0.001112
Tests loaded 1198691205.316420 0.033570
DNS lookups completed 1198691215.446830 10.130410
Test engine setup completed
1198691215.450594 0.003764
TCP tests completed 1198691215.451393 0.000799
PING test completed (304 hosts) 1198691240.081987 24.630594
PING test results sent 1198691270.090627 30.008640
Test result collection completed 1198691270.090642
0.000015
LDAP test engine setup completed 1198691270.090656 0.000014
LDAP tests executed 1198691270.090660 0.000004
LDAP tests result collection completed
1198691270.090663 0.000003
NSLOOKUP tests executed 1198691270.146990 0.056327
Test results transmitted 1198691270.149410 0.002420
bbtest-net completed 1198691270.150271 0.000861
TIME TOTAL
64.868533
*From:* Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
*Sent:* Thursday, December 20, 2007 11:04 AM
*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] bbtest - errors
If that was the only change you made recently try switching the DNS
servers back to see if the problem disappears.
On 12/20/07, *Michael A. Price* < user-d7d653acf808@xymon.invalid> wrote:
Thanks…
Actually, I updated my DNS servers and went from 300 failed lookups to
100. So I thought I was going to improve….
But it got worse!!!! Any other ideas???
Thanks, michael
*From:* Josh Luthman [mailto: user-4c45a83f15cb@xymon.invalid]
*Sent:* Thursday, December 20, 2007 8:10 AM
*To:* user-ae9b8668bcde@xymon.invalid
*Subject:* Re: [hobbit] bbtest - errors
# failed : 100 <--- may be the cause, lots of failed
DNS queries
On 12/19/07, *Michael A. Price* < user-d7d653acf808@xymon.invalid> wrote:
My bbtest time went from 10 seconds to 89.0 ....
Has anyone seen this before???
Wed Dec 19 19:15:55 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 310
Hosts with no tests : 7
Total test count : 307
Status messages : 308
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 303
# succesful : 203
# failed : 100
# calls to dnsresolve : 307
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event                                   Starttime          Duration
bbtest-net startup                      1198091755.294810  -
Service definitions loaded              1198091755.297812  0.003002
Tests loaded                            1198091755.346908  0.049096
DNS lookups completed                   1198091765.439050  10.092142
Test engine setup completed             1198091765.442685  0.003635
TCP tests completed                     1198091765.443457  0.000772
PING test completed (303 hosts)         1198091790.084027  24.640570
PING test results sent                  1198091850.102236  60.018209
Test result collection completed        1198091850.102455  0.000219
LDAP test engine setup completed        1198091850.102472  0.000017
LDAP tests executed                     1198091850.102475  0.000003
LDAP tests result collection completed  1198091850.102482  0.000007
NSLOOKUP tests executed                 1198091850.111523  0.009041
Test results transmitted                1198091850.118622  0.007099
bbtest-net completed                    1198091850.120484  0.001862
TIME TOTAL                                                 94.825674
Thanks, michael
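To see at a glance where the time goes in a report like the one above, the TIME SPENT rows can be ranked by duration. The following is a minimal Python sketch, assuming the three-column layout shown in this thread (event, start time, duration). The function name `slowest_stages` and the regular expression are illustrative assumptions and are not part of bbtest-net.

```python
import re

# Rows of the TIME SPENT table from the Dec 19 run quoted above.
REPORT = """\
bbtest-net startup                      1198091755.294810  -
Service definitions loaded              1198091755.297812  0.003002
Tests loaded                            1198091755.346908  0.049096
DNS lookups completed                   1198091765.439050  10.092142
Test engine setup completed             1198091765.442685  0.003635
TCP tests completed                     1198091765.443457  0.000772
PING test completed (303 hosts)         1198091790.084027  24.640570
PING test results sent                  1198091850.102236  60.018209
Test result collection completed        1198091850.102455  0.000219
LDAP test engine setup completed        1198091850.102472  0.000017
LDAP tests executed                     1198091850.102475  0.000003
LDAP tests result collection completed  1198091850.102482  0.000007
NSLOOKUP tests executed                 1198091850.111523  0.009041
Test results transmitted                1198091850.118622  0.007099
bbtest-net completed                    1198091850.120484  0.001862
"""

# Assumed row layout: event text, epoch start time, duration in seconds.
# The startup row has "-" as its duration and is deliberately not matched.
ROW = re.compile(
    r"^(?P<event>.+?)\s+(?P<start>\d+\.\d+)\s+(?P<duration>\d+\.\d+)\s*$"
)


def slowest_stages(report, top=3):
    """Return the `top` longest stages as (duration, event) tuples."""
    stages = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            stages.append((float(match.group("duration")), match.group("event")))
    return sorted(stages, reverse=True)[:top]


if __name__ == "__main__":
    for duration, event in slowest_stages(REPORT):
        print(f"{duration:10.6f}  {event}")
```

For this run the ranking puts "PING test results sent" first, at about 60 of the roughly 95 seconds total, ahead of the ping run itself and the DNS lookups.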
--
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX
Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
list Michael A. Price
Josh,
Thanks for getting back to me so quickly. I updated my /etc/hosts file to have every single one of my monitored hosts, just as a test. I now have 0 failed hosts in my DNS statistics, but my 'PING test results sent' time is still off the charts. I still can't figure out the problem...
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 311
Hosts with no tests : 7
Total test count : 308
Status messages : 309
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 304
# succesful : 304
# failed : 0
# calls to dnsresolve : 308
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event                                   Starttime          Duration
bbtest-net startup                      1198875012.330887  -
Service definitions loaded              1198875012.331984  0.001097
Tests loaded                            1198875012.405015  0.073031
DNS lookups completed                   1198875012.405024  0.000009
Test engine setup completed             1198875012.408543  0.003519
TCP tests completed                     1198875012.409325  0.000782
PING test completed (304 hosts)         1198875037.083126  24.673801
PING test results sent                  1198875067.092719  30.009593
Test result collection completed        1198875067.092733  0.000014
LDAP test engine setup completed        1198875067.092737  0.000004
LDAP tests executed                     1198875067.092741  0.000004
LDAP tests result collection completed  1198875067.092745  0.000004
NSLOOKUP tests executed                 1198875067.096007  0.003262
Test results transmitted                1198875067.098247  0.002240
bbtest-net completed                    1198875067.099155  0.000908
TIME TOTAL                                                 54.768268
Thanks, michael
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 27, 2007 11:15 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
Michael,
Try adding "testip" after the comment marker (#) for as many hosts as possible, e.g.:
10.0.0.250 myftp.server.com # testip
Josh
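As I understand the bb-hosts documentation, the testip tag makes the network tests use the IP address listed in bb-hosts instead of resolving the hostname through DNS. With roughly 300 hosts, adding the tag by hand is tedious. Below is a minimal Python sketch of a helper that appends it to host lines. The name `add_tag` and the "first field is an IP address" check are assumptions made for illustration, so try it on a copy of the file first.

```python
import ipaddress


def is_host_line(entry):
    """Assume a host line starts with an IP address followed by a hostname."""
    fields = entry.split()
    if len(fields) < 2:
        return False
    try:
        ipaddress.ip_address(fields[0])
    except ValueError:
        return False
    return True


def add_tag(line, tag="testip"):
    """Append `tag` to a bb-hosts host line unless it is already present."""
    entry, marker, tags = line.partition("#")
    if not is_host_line(entry):
        return line  # page/group directives, comments and blank lines stay as-is
    if tag in tags.split():
        return line  # already tagged
    stripped = line.rstrip()
    if not marker:
        return f"{stripped} # {tag}"  # no "#" yet, so add one before the tag
    return f"{stripped} {tag}"


if __name__ == "__main__":
    # The first line is the example from the message above; the others are hypothetical.
    sample = [
        "10.0.0.250 myftp.server.com",
        "10.0.0.251 web01.example.com # ftp",
        "page servers Servers",
    ]
    for line in sample:
        print(add_tag(line))
```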
On 12/27/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
I just modified the /etc/nsswitch.conf file to remove DNS.
I find it interesting that no matter if the hobbit server uses DNS
servers or local host files to look up the hosts the 'PING Test Results
Sent' number is still off the charts.
Thanks so much for getting back to me
Thanks, michael
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Wednesday, December 26, 2007 6:00 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
Your calls to dnsresolve went up one, how in the world did you "[update]
the hobbit server to not use the DNS servers"?
It looks like it is still doing the exact same stuff concerning DNS to
me...
On 12/26/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
Thanks for getting back to me on this.
I updated the hobbit server to not use the DNS servers and all that does
is cause it to go from 100 failed hosts to 299 failed hosts.
I think it's the large "PING test results sent" number, what else could
be the problem???
Here is another printout...
Thanks, michael
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 311
Hosts with no tests : 7
Total test count : 308
Status messages : 309
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 304
# succesful : 203
# failed : 101
# calls to dnsresolve : 308
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event                                   Starttime          Duration
bbtest-net startup                      1198691205.281738  -
Service definitions loaded              1198691205.282850  0.001112
Tests loaded                            1198691205.316420  0.033570
DNS lookups completed                   1198691215.446830  10.130410
Test engine setup completed             1198691215.450594  0.003764
TCP tests completed                     1198691215.451393  0.000799
PING test completed (304 hosts)         1198691240.081987  24.630594
PING test results sent                  1198691270.090627  30.008640
Test result collection completed        1198691270.090642  0.000015
LDAP test engine setup completed        1198691270.090656  0.000014
LDAP tests executed                     1198691270.090660  0.000004
LDAP tests result collection completed  1198691270.090663  0.000003
NSLOOKUP tests executed                 1198691270.146990  0.056327
Test results transmitted                1198691270.149410  0.002420
bbtest-net completed                    1198691270.150271  0.000861
TIME TOTAL                                                 64.868533
list Josh Luthman
Try Henrik's fping command at the bottom of this page: http://www.hswn.dk/hobbiton/2007/11/msg00069.html and stick a time in front to see how long it takes.
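In a shell, prefixing a command with `time` reports how long it ran. The same measurement can be scripted, for example to repeat it several times. This is a minimal Python sketch under that assumption. The fping command line from the linked post is not reproduced here, so the command below is only a placeholder to be replaced, and `time_command` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv, discard its output, and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit code is reported, not raised
    )
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command: substitute the fping command line from the linked post.
    command = [sys.executable, "-c", "pass"]
    code, elapsed = time_command(command)
    print(f"exit={code} elapsed={elapsed:.3f}s")
```

If the ping run alone finishes quickly, the remaining time is being spent elsewhere in bbtest-net rather than in the ping itself.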
list Michael A. Price
Josh,
Thanks for the help, AGAIN.... One step closer...
I have one host down, and I have the trace option on all of my hosts listed in bb-hosts. When I comment out that downed host, the errors clear up in bbtest. Take a look...
Mon Dec 31 12:22:16 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 310
Hosts with no tests : 7
Total test count : 307
Status messages : 308
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 303
# succesful : 303
# failed : 0
# calls to dnsresolve : 307
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
TIME SPENT
Event                                   Starttime          Duration
bbtest-net startup                      1199103736.384784  -
Service definitions loaded              1199103736.385887  0.001103
Tests loaded                            1199103736.768919  0.383032
DNS lookups completed                   1199103736.768928  0.000009
Test engine setup completed             1199103736.772261  0.003333
TCP tests completed                     1199103736.773300  0.001039
PING test completed (303 hosts)         1199103755.089536  18.316236
PING test results sent                  1199103755.091233  0.001697
Test result collection completed        1199103755.091241  0.000008
LDAP test engine setup completed        1199103755.091245  0.000004
LDAP tests executed                     1199103755.091249  0.000004
LDAP tests result collection completed  1199103755.091252  0.000003
NSLOOKUP tests executed                 1199103755.095923  0.004671
Test results transmitted                1199103755.098103  0.002180
bbtest-net completed                    1199103755.099180  0.001077
TIME TOTAL                                                 18.714396
But once I uncomment the host and the hobbit server tries to do a traceroute to it, the errors come back again, even if I disable alerting for that host. Take a look....
Mon Dec 31 12:32:24 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 311
Hosts with no tests : 7
Total test count : 308
Status messages : 309
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 304
# succesful : 304
# failed : 0
# calls to dnsresolve : 308
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event                                   Starttime          Duration
bbtest-net startup                      1199104344.425092  -
Service definitions loaded              1199104344.426152  0.001060
Tests loaded                            1199104344.543955  0.117803
DNS lookups completed                   1199104344.543964  0.000009
Test engine setup completed             1199104344.547454  0.003490
TCP tests completed                     1199104344.548434  0.000980
PING test completed (304 hosts)         1199104369.082520  24.534086
PING test results sent                  1199104399.089988  30.007468
Test result collection completed        1199104399.090003  0.000015
LDAP test engine setup completed        1199104399.090007  0.000004
LDAP tests executed                     1199104399.090011  0.000004
LDAP tests result collection completed  1199104399.090015  0.000004
NSLOOKUP tests executed                 1199104399.095563  0.005548
Test results transmitted                1199104399.097862  0.002299
bbtest-net completed                    1199104399.098975  0.001113
TIME TOTAL                                                 54.673883
Any ideas why it's doing this??? Or how to resolve it???
Thanks, michael
list Michael A. Price
Josh,
I just figured out it's the #trace option. When I remove that option the errors go away...
Thanks, michael
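To check which hosts still carry the trace option before removing it, the bb-hosts file can be scanned for the tag. This is a minimal Python sketch, assuming host lines have the form "IP hostname # tags" as in the testip example earlier in the thread. The function name `hosts_with_tag` and the example.com entries are hypothetical.

```python
def hosts_with_tag(bb_hosts_text, tag):
    """Return hostnames whose tag list (the text after '#') contains `tag`."""
    found = []
    for line in bb_hosts_text.splitlines():
        entry, marker, tags = line.partition("#")
        fields = entry.split()
        # Skip blank lines, comment-only lines and lines without a tag section.
        if not marker or len(fields) < 2:
            continue
        if tag in tags.split():
            found.append(fields[1])
    return found


# Hypothetical excerpt; only the first host line comes from this thread.
SAMPLE = """\
# example bb-hosts excerpt
10.0.0.250  myftp.server.com    # testip trace
10.0.0.251  web01.example.com   # http://web01.example.com/
10.0.0.252  down01.example.com  # trace
"""

if __name__ == "__main__":
    for host in hosts_with_tag(SAMPLE, "trace"):
        print(host)
```

In the runs posted above, the error output and the roughly 30-second "PING test results sent" stage appear only when the downed host with the trace option is present, so the list produced this way is a reasonable place to start trimming.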
▸
-----Original Message----- From: Michael A. Price Sent: Monday, December 31, 2007 7:35 AM To: user-ae9b8668bcde@xymon.invalid Subject: RE: [hobbit] bbtest - errors Josh, Thanks for help, AGAIN.... One step closer... I have one host down, and I have the trace option on all of my hosts listed in bb-hosts. When I comment out that downed host, the errors clear up in bb-test. Take a look... Mon Dec 31 12:22:16 2007 bbtest-net version 4.2.0 SSL library : OpenSSL 0.9.7m 23 Feb 2007 LDAP library: OpenLDAP 20213 Statistics: Hosts total : 310 Hosts with no tests : 7 Total test count : 307 Status messages : 308 Alert status msgs : 0 Transmissions : 5 DNS statistics: # hostnames resolved : 303 # succesful : 303 # failed : 0 # calls to dnsresolve : 307 TCP test statistics: # TCP tests total : 2 # HTTP tests : 1 # Simple TCP tests : 1 # Connection attempts : 2 # bytes written : 135 # bytes read : 553 TIME SPENT Event Starttime Duration bbtest-net startup 1199103736.384784 - Service definitions loaded 1199103736.385887 0.001103 Tests loaded 1199103736.768919 0.383032 DNS lookups completed 1199103736.768928 0.000009 Test engine setup completed 1199103736.772261 0.003333 TCP tests completed 1199103736.773300 0.001039 PING test completed (303 hosts) 1199103755.089536 18.316236 PING test results sent 1199103755.091233 0.001697 Test result collection completed 1199103755.091241 0.000008 LDAP test engine setup completed 1199103755.091245 0.000004 LDAP tests executed 1199103755.091249 0.000004 LDAP tests result collection completed 1199103755.091252 0.000003 NSLOOKUP tests executed 1199103755.095923 0.004671 Test results transmitted 1199103755.098103 0.002180 bbtest-net completed 1199103755.099180 0.001077 TIME TOTAL 18.714396 But once I uncomment out the host and the hobbit server tries to do a traceroute to it, the errors come back again. Even if I disable the alerting of that host. Take a look.... 
Mon Dec 31 12:32:24 2007 bbtest-net version 4.2.0 SSL library : OpenSSL 0.9.7m 23 Feb 2007 LDAP library: OpenLDAP 20213 Statistics: Hosts total : 311 Hosts with no tests : 7 Total test count : 308 Status messages : 309 Alert status msgs : 0 Transmissions : 5 DNS statistics: # hostnames resolved : 304 # succesful : 304 # failed : 0 # calls to dnsresolve : 308 TCP test statistics: # TCP tests total : 2 # HTTP tests : 1 # Simple TCP tests : 1 # Connection attempts : 2 # bytes written : 135 # bytes read : 553 Error output: Timeout waiting for data from child, killing it Timeout waiting for data from child, killing it Child process terminated with signal 15 TIME SPENT Event Starttime Duration bbtest-net startup 1199104344.425092 - Service definitions loaded 1199104344.426152 0.001060 Tests loaded 1199104344.543955 0.117803 DNS lookups completed 1199104344.543964 0.000009 Test engine setup completed 1199104344.547454 0.003490 TCP tests completed 1199104344.548434 0.000980 PING test completed (304 hosts) 1199104369.082520 24.534086 PING test results sent 1199104399.089988 30.007468 Test result collection completed 1199104399.090003 0.000015 LDAP test engine setup completed 1199104399.090007 0.000004 LDAP tests executed 1199104399.090011 0.000004 LDAP tests result collection completed 1199104399.090015 0.000004 NSLOOKUP tests executed 1199104399.095563 0.005548 Test results transmitted 1199104399.097862 0.002299 bbtest-net completed 1199104399.098975 0.001113 TIME TOTAL 54.673883 Any ideas of why its doing it??? Or how to resolve it??? Thanks, michael From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid] Sent: Friday, December 28, 2007 5:30 PM To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] bbtest - errors Try Henrik's fping command at the bottom of this page: http://www.hswn.dk/hobbiton/2007/11/msg00069.html and stick a time in front to see how long it takes. On 12/28/07, Michael A. 
Price <user-d7d653acf808@xymon.invalid> wrote:
Josh,
Thanks for getting back to me so quickly, I updated my /etc/hosts file to have every single one of my monitored hosts, just as a test. I now have 'failed hosts' in my DNS statistics, but my 'PING test results sent' are still off the charts. I still can't figure out the problem...

bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 311
Hosts with no tests : 7
Total test count : 308
Status messages : 309
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 304
# succesful : 304
# failed : 0
# calls to dnsresolve : 308
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event Starttime Duration
bbtest-net startup 1198875012.330887 -
Service definitions loaded 1198875012.331984 0.001097
Tests loaded 1198875012.405015 0.073031
DNS lookups completed 1198875012.405024 0.000009
Test engine setup completed 1198875012.408543 0.003519
TCP tests completed 1198875012.409325 0.000782
PING test completed (304 hosts) 1198875037.083126 24.673801
PING test results sent 1198875067.092719 30.009593
Test result collection completed 1198875067.092733 0.000014
LDAP test engine setup completed 1198875067.092737 0.000004
LDAP tests executed 1198875067.092741 0.000004
LDAP tests result collection completed 1198875067.092745 0.000004
NSLOOKUP tests executed 1198875067.096007 0.003262
Test results transmitted 1198875067.098247 0.002240
bbtest-net completed 1198875067.099155 0.000908
TIME TOTAL 54.768268

Thanks, michael

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 27, 2007 11:15 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
Michael,
Try adding "testip" after the comment in as many hosts as possible, e.g.:
10.0.0.250 myftp.server.com # testip
Josh

On 12/27/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
I just modified the /etc/nsswitch.conf file to remove DNS. I find it interesting that no matter if the hobbit server uses DNS servers or local host files to look up the hosts, the 'PING Test Results Sent' number is still off the charts. Thanks so much for getting back to me
Thanks, michael

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Wednesday, December 26, 2007 6:00 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
Your calls to dnsresolve went up one, how in the world did you "[update] the hobbit server to not use the DNS servers"? It looks like it is still doing the exact same stuff concerning DNS to me...

On 12/26/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
Thanks for getting back to me on this. I updated the hobbit server to not use the DNS servers and all that does is cause it to go from 100 failed hosts to 299 failed hosts. I think it's the large "PING test results sent" number, what else could be the problem??? Here is another printout...
Thanks, michael

bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 311
Hosts with no tests : 7
Total test count : 308
Status messages : 309
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 304
# succesful : 203
# failed : 101
# calls to dnsresolve : 308
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event Starttime Duration
bbtest-net startup 1198691205.281738 -
Service definitions loaded 1198691205.282850 0.001112
Tests loaded 1198691205.316420 0.033570
DNS lookups completed 1198691215.446830 10.130410
Test engine setup completed 1198691215.450594 0.003764
TCP tests completed 1198691215.451393 0.000799
PING test completed (304 hosts) 1198691240.081987 24.630594
PING test results sent 1198691270.090627 30.008640
Test result collection completed 1198691270.090642 0.000015
LDAP test engine setup completed 1198691270.090656 0.000014
LDAP tests executed 1198691270.090660 0.000004
LDAP tests result collection completed 1198691270.090663 0.000003
NSLOOKUP tests executed 1198691270.146990 0.056327
Test results transmitted 1198691270.149410 0.002420
bbtest-net completed 1198691270.150271 0.000861
TIME TOTAL 64.868533

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 20, 2007 11:04 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
If that was the only change you made recently try switching the DNS servers back to see if the problem disappears.

On 12/20/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
Thanks... Actually, I updated my DNS servers and went from 300 failed lookups to 100. So I thought I was going to improve....
But it got worse!!!! Any other ideas???
Thanks, michael

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 20, 2007 8:10 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
# failed : 100 <--- may be the cause, lots of failed DNS queries

On 12/19/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
My bbtest time went from 10 seconds to 89.0 .... Has anyone seen this before???

Wed Dec 19 19:15:55 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213
Statistics:
Hosts total : 310
Hosts with no tests : 7
Total test count : 307
Status messages : 308
Alert status msgs : 0
Transmissions : 5
DNS statistics:
# hostnames resolved : 303
# succesful : 203
# failed : 100
# calls to dnsresolve : 307
TCP test statistics:
# TCP tests total : 2
# HTTP tests : 1
# Simple TCP tests : 1
# Connection attempts : 2
# bytes written : 135
# bytes read : 553
Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
TIME SPENT
Event Starttime Duration
bbtest-net startup 1198091755.294810 -
Service definitions loaded 1198091755.297812 0.003002
Tests loaded 1198091755.346908 0.049096
DNS lookups completed 1198091765.439050 10.092142
Test engine setup completed 1198091765.442685 0.003635
TCP tests completed 1198091765.443457 0.000772
PING test completed (303 hosts) 1198091790.084027 24.640570
PING test results sent 1198091850.102236 60.018209
Test result collection completed 1198091850.102455 0.000219
LDAP test engine setup completed 1198091850.102472 0.000017
LDAP tests executed 1198091850.102475 0.000003
LDAP tests result collection completed 1198091850.102482 0.000007
NSLOOKUP tests executed 1198091850.111523 0.009041
Test results transmitted 1198091850.118622 0.007099
bbtest-net completed 1198091850.120484 0.001862
TIME TOTAL 94.825674

Thanks, michael

--
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX
Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
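Editor's note: the runs quoted in this thread differ mainly in one row of the TIME SPENT table ("PING test results sent" takes about 30 or 60 seconds in the slow runs and almost nothing in the clean one). A minimal Python sketch for picking out the slowest steps is given here. It assumes the report is available with one event per line, as bbtest-net prints it; the function names `parse_time_spent` and `slowest_steps` are made up for this illustration and are not part of Hobbit.

```python
import re

# One row of the TIME SPENT table: event name, start timestamp, duration.
# The startup row has "-" in place of a duration.
ROW = re.compile(r"^(?P<event>.*\S)\s+(?P<start>\d+\.\d+)\s+(?P<duration>-|\d+\.\d+)$")


def parse_time_spent(report: str) -> list[tuple[str, float]]:
    """Return (event, duration_seconds) pairs from a bbtest-net timing table."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match is None or match["duration"] == "-":
            continue  # headers, other lines, and the startup row carry no duration
        rows.append((match["event"], float(match["duration"])))
    return rows


def slowest_steps(report: str, count: int = 3) -> list[tuple[str, float]]:
    """Return the `count` longest steps, slowest first."""
    return sorted(parse_time_spent(report), key=lambda row: row[1], reverse=True)[:count]


# Excerpt of the Dec 31 12:32:24 run quoted in this thread.
SAMPLE = """\
bbtest-net startup 1199104344.425092 -
TCP tests completed 1199104344.548434 0.000980
PING test completed (304 hosts) 1199104369.082520 24.534086
PING test results sent 1199104399.089988 30.007468
Test result collection completed 1199104399.090003 0.000015
"""

for event, seconds in slowest_steps(SAMPLE, count=2):
    print(f"{seconds:10.6f}  {event}")
```

Comparing the top rows of a clean run and a slow run side by side makes it easier to see which phase changed, rather than looking only at TIME TOTAL.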
list Josh Luthman
Can you do a trace at the shell?
▸
On 12/31/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:
Josh, I just figured out it's the #trace option. When I remove that option the errors go away... Thanks, michael
-- Josh Luthman Office: XXX-XXX-XXXX Direct: XXX-XXX-XXXX XXXX Wayne St Suite XXXX Troy, OH XXXXX Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
list Michael A. Price
Nice sleuthing... It looks like the ball is back in my court. The trace command at the command line never seems to end. I will do some research. Thanks, michael
▸
-----Original Message-----
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Monday, December 31, 2007 10:27 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors
Can you do a trace at the shell?
On 12/31/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:Josh, I just figured out it's the #trace option. When I remove that option the errors go away... Thanks, michael -----Original Message----- From: Michael A. Price Sent: Monday, December 31, 2007 7:35 AM To: user-ae9b8668bcde@xymon.invalid Subject: RE: [hobbit] bbtest - errors Josh, Thanks for help, AGAIN.... One step closer... I have one host down, and I have the trace option on all of my hosts listed in bb-hosts. When I comment out that downed host, the errors clear up in bb-test. Take a look... Mon Dec 31 12:22:16 2007 bbtest-net version 4.2.0 SSL library : OpenSSL 0.9.7m 23 Feb 2007 LDAP library: OpenLDAP 20213 Statistics: Hosts total : 310 Hosts with no tests : 7 Total test count : 307 Status messages : 308 Alert status msgs : 0 Transmissions : 5 DNS statistics: # hostnames resolved : 303 # succesful : 303 # failed : 0 # calls to dnsresolve : 307 TCP test statistics: # TCP tests total : 2 # HTTP tests : 1 # Simple TCP tests : 1 # Connection attempts : 2 # bytes written : 135 # bytes read : 553 TIME SPENT Event Starttime
Duration
bbtest-net startup 1199103736.384784
-
Service definitions loaded 1199103736.385887
0.001103
Tests loaded 1199103736.768919
0.383032
DNS lookups completed 1199103736.768928
0.000009
Test engine setup completed 1199103736.772261
0.003333
TCP tests completed 1199103736.773300
0.001039
PING test completed (303 hosts) 1199103755.089536
18.316236
PING test results sent 1199103755.091233
0.001697
Test result collection completed 1199103755.091241
0.000008
LDAP test engine setup completed 1199103755.091245
0.000004
LDAP tests executed 1199103755.091249
0.000004
LDAP tests result collection completed 1199103755.091252
0.000003
NSLOOKUP tests executed 1199103755.095923
0.004671
Test results transmitted 1199103755.098103
0.002180
bbtest-net completed 1199103755.099180
0.001077
TIME TOTAL
18.714396
But once I uncomment out the host and the hobbit server tries to do a traceroute to it, the errors come back again. Even if I disable the alerting of that host. Take a look.... Mon Dec 31 12:32:24 2007 bbtest-net version 4.2.0 SSL library : OpenSSL 0.9.7m 23 Feb 2007 LDAP library: OpenLDAP 20213 Statistics: Hosts total : 311 Hosts with no tests : 7 Total test count : 308 Status messages : 309 Alert status msgs : 0 Transmissions : 5 DNS statistics: # hostnames resolved : 304 # succesful : 304 # failed : 0 # calls to dnsresolve : 308 TCP test statistics: # TCP tests total : 2 # HTTP tests : 1 # Simple TCP tests : 1 # Connection attempts : 2 # bytes written : 135 # bytes read : 553 Error output: Timeout waiting for data from child, killing it Timeout waiting for data from child, killing it Child process terminated with signal 15 TIME SPENT Event Starttime
Duration
bbtest-net startup 1199104344.425092
-
Service definitions loaded 1199104344.426152
0.001060
Tests loaded 1199104344.543955
0.117803
DNS lookups completed 1199104344.543964
0.000009
Test engine setup completed 1199104344.547454
0.003490
TCP tests completed 1199104344.548434
0.000980
PING test completed (304 hosts) 1199104369.082520
24.534086
PING test results sent 1199104399.089988
30.007468
Test result collection completed 1199104399.090003
0.000015
LDAP test engine setup completed 1199104399.090007
0.000004
LDAP tests executed 1199104399.090011
0.000004
LDAP tests result collection completed 1199104399.090015
0.000004
NSLOOKUP tests executed 1199104399.095563
0.005548
Test results transmitted 1199104399.097862
0.002299
bbtest-net completed 1199104399.098975
0.001113
TIME TOTAL
54.673883
Any ideas of why its doing it??? Or how to resolve it??? Thanks, michael From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid] Sent: Friday, December 28, 2007 5:30 PM To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] bbtest - errors Try Henrik's fping command at the bottom of this page: http://www.hswn.dk/hobbiton/2007/11/msg00069.html and stick a time in front to see how long it takes. On 12/28/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote: Josh, Thanks for getting back to me so quickly, I updated my /etc/hosts file to have every single one of my monitored hosts, just as a test. I now have 'failed hosts' in my DNS statistic's, but my 'PING test results sent' are still off the charts. I still cant figure out the problem... bbtest-net version 4.2.0 SSL library : OpenSSL 0.9.7m 23 Feb 2007 LDAP library: OpenLDAP 20213 Statistics: Hosts total : 311 Hosts with no tests : 7 Total test count : 308 Status messages : 309 Alert status msgs : 0 Transmissions : 5 DNS statistics: # hostnames resolved : 304 # succesful : 304 # failed : 0 # calls to dnsresolve : 308 TCP test statistics: # TCP tests total : 2 # HTTP tests : 1 # Simple TCP tests : 1 # Connection attempts : 2 # bytes written : 135 # bytes read : 553 Error output: Timeout waiting for data from child, killing it Timeout waiting for data from child, killing it Child process terminated with signal 15 TIME SPENT Event Starttime
Duration
bbtest-net startup 1198875012.330887
-
Service definitions loaded 1198875012.331984 0.001097 Tests loaded 1198875012.405015
0.073031
DNS lookups completed 1198875012.405024
0.000009
Test engine setup completed 1198875012.408543
0.003519
TCP tests completed 1198875012.409325
0.000782
PING test completed (304 hosts) 1198875037.08312624.673801
PING test results sent 1198875067.092719
30.009593
Test result collection completed 1198875067.092733 0.000014 LDAP test engine setup completed 1198875067.092737
0.000004
LDAP tests executed 1198875067.092741
0.000004
LDAP tests result collection completed 1198875067.092745
0.000004
NSLOOKUP tests executed 1198875067.096007
0.003262
Test results transmitted 1198875067.0982470.002240
bbtest-net completed 1198875067.099155
0.000908
TIME TOTAL 54.768268 Thanks, michael From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid] Sent: Thursday, December 27, 2007 11:15 AM To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] bbtest - errors Michael, Try adding "testip" after the comment in as many hosts as possible,
IE:
10.0.0.250 myftp.server.com # testip Josh On 12/27/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote: I just modified the /etc/nsswitch.conf file to remove DNS. I find it interesting that no matter if the hobbit server uses DNS servers or local host files to look up the hosts the 'PING Test Results Sent' number is still off the charts. Thanks so much for getting back to me Thanks, michael From: Josh Luthman [mailto: user-4c45a83f15cb@xymon.invalid] Sent: Wednesday, December 26, 2007 6:00 PM To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] bbtest - errors Your calls to dnsresolve went up one, how in the world did you
"[update] the
hobbit server to not use the DNS servers"? It looks like it is still doing the exact same stuff concerning DNS to me... On 12/26/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote: Thanks for getting back to me on this. I updated the hobbit server to not use the DNS servers and all that does is cause it to go from 100 failed hosts to 299 failed hosts. I think it's the large "PING test results sent" number, what else could be the problem??? Here is another printout... Thanks, michael bbtest-net version 4.2.0 SSL library : OpenSSL 0.9.7m 23 Feb 2007 LDAP library: OpenLDAP 20213 Statistics: Hosts total : 311 Hosts with no tests : 7 Total test count : 308 Status messages : 309 Alert status msgs : 0 Transmissions : 5 DNS statistics: # hostnames resolved : 304 # succesful : 203 # failed : 101 # calls to dnsresolve : 308 TCP test statistics: # TCP tests total : 2 # HTTP tests : 1 # Simple TCP tests : 1 # Connection attempts : 2 # bytes written : 135 # bytes read : 553 Error output: Timeout waiting for data from child, killing it Timeout waiting for data from child, killing it Child process terminated with signal 15 TIME SPENT Event Starttime
Duration
bbtest-net startup
1198691205.281738 -
Service definitions loaded 1198691205.282850
0.001112
Tests loaded 1198691205.3164200.033570
DNS lookups completed 1198691215.446830 10.130410 Test engine setup completed 1198691215.450594 0.003764 TCP tests completed 1198691215.451393 0.000799 PING test completed (304 hosts) 1198691240.081987
24.630594
PING test results sent 1198691270.090627
30.008640
Test result collection completed 1198691270.090642 0.000015 LDAP test engine setup completed 1198691270.090656 0.000014 LDAP tests executed 1198691270.090660
0.000004
LDAP tests result collection completed 1198691270.090663 0.000003 NSLOOKUP tests executed 1198691270.146990 0.056327 Test results transmitted 1198691270.149410 0.002420 bbtest-net completed 1198691270.150271
0.000861
TIME TOTAL 64.868533 From: Josh Luthman [mailto: user-4c45a83f15cb@xymon.invalid] Sent: Thursday, December 20, 2007 11:04 AM To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] bbtest - errors If that was the only change you made recently try switching the DNS servers back to see if the problem disappears. On 12/20/07, Michael A. Price < user-d7d653acf808@xymon.invalid> wrote: Thanks... Actually, I updated my DNS servers and went from 300 failed lookups to
100.
So I thought I was going to improve.... But it got worse!!!! Any other ideas???
Thanks,
michael

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 20, 2007 8:10 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors

# failed : 100 <--- may be the cause, lots of failed DNS queries

On 12/19/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:

My bbtest time went from 10 seconds to 89.0 .... Has anyone seen this before???

Wed Dec 19 19:15:55 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213

Statistics:
 Hosts total           : 310
 Hosts with no tests   : 7
 Total test count      : 307
 Status messages       : 308
 Alert status msgs     : 0
 Transmissions         : 5

DNS statistics:
 # hostnames resolved  : 303
 # succesful           : 203
 # failed              : 100
 # calls to dnsresolve : 307

TCP test statistics:
 # TCP tests total     : 2
 # HTTP tests          : 1
 # Simple TCP tests    : 1
 # Connection attempts : 2
 # bytes written       : 135
 # bytes read          : 553

Error output:
 Timeout waiting for data from child, killing it
 Timeout waiting for data from child, killing it
 Child process terminated with signal 15
 Timeout waiting for data from child, killing it
 Timeout waiting for data from child, killing it
 Child process terminated with signal 15

TIME SPENT
Event                                     Starttime          Duration
bbtest-net startup                        1198091755.294810         -
Service definitions loaded                1198091755.297812  0.003002
Tests loaded                              1198091755.346908  0.049096
DNS lookups completed                     1198091765.439050 10.092142
Test engine setup completed               1198091765.442685  0.003635
TCP tests completed                       1198091765.443457  0.000772
PING test completed (303 hosts)           1198091790.084027 24.640570
PING test results sent                    1198091850.102236 60.018209
Test result collection completed          1198091850.102455  0.000219
LDAP test engine setup completed          1198091850.102472  0.000017
LDAP tests executed                       1198091850.102475  0.000003
LDAP tests result collection completed    1198091850.102482  0.000007
NSLOOKUP tests executed                   1198091850.111523  0.009041
Test results transmitted                  1198091850.118622  0.007099
bbtest-net completed                      1198091850.120484  0.001862
TIME TOTAL                                                  94.825674

Thanks,
michael

--
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
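Editor's sketch, not part of the original posts: the diagnosis in this thread rests on reading the "TIME SPENT" table and spotting which phase eats the time. A minimal Python sketch of that reading is below. The rows are copied from the Dec 31 12:32:24 run quoted in this thread; the function names (parse_time_spent, slowest) are hypothetical helpers, not anything shipped with Hobbit, and the column layout is assumed to be "event, start time, duration" as in the quoted output.

```python
import re

# Rows copied from the "TIME SPENT" table of the Dec 31 12:32:24 run quoted in this thread.
REPORT = """\
bbtest-net startup                        1199104344.425092         -
Service definitions loaded                1199104344.426152  0.001060
Tests loaded                              1199104344.543955  0.117803
DNS lookups completed                     1199104344.543964  0.000009
Test engine setup completed               1199104344.547454  0.003490
TCP tests completed                       1199104344.548434  0.000980
PING test completed (304 hosts)           1199104369.082520 24.534086
PING test results sent                    1199104399.089988 30.007468
Test result collection completed          1199104399.090003  0.000015
LDAP test engine setup completed          1199104399.090007  0.000004
LDAP tests executed                       1199104399.090011  0.000004
LDAP tests result collection completed    1199104399.090015  0.000004
NSLOOKUP tests executed                   1199104399.095563  0.005548
Test results transmitted                  1199104399.097862  0.002299
bbtest-net completed                      1199104399.098975  0.001113
"""

# Assumed row layout: event name, start time (epoch seconds), duration or "-".
ROW = re.compile(
    r"^(?P<event>.+?)\s+(?P<start>\d+\.\d{6})\s+(?P<duration>\d+\.\d{6}|-)\s*$"
)


def parse_time_spent(text):
    """Return (event, duration_seconds) pairs, skipping rows without a duration."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None or match.group("duration") == "-":
            continue
        rows.append((match.group("event"), float(match.group("duration"))))
    return rows


def slowest(rows, count=2):
    """Return the `count` phases with the largest duration, longest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:count]


if __name__ == "__main__":
    for event, seconds in slowest(parse_time_spent(REPORT)):
        print(f"{seconds:10.6f}  {event}")
```

For this run the two largest entries are the two PING phases, which matches what the posters kept pointing at.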
list Josh Luthman
Damn that ICMP :)
On 12/31/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:

Nice sleuthing... It looks like the ball is back in my court. The trace command at the command line, never seems to end. I will do some research..
Thanks,
michael

-----Original Message-----
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Monday, December 31, 2007 10:27 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors

Can you do a trace at the shell?

On 12/31/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:

Josh,
I just figured out it's the #trace option. When I remove that option the errors go away...
Thanks,
michael

-----Original Message-----
From: Michael A. Price
Sent: Monday, December 31, 2007 7:35 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] bbtest - errors

Josh,
Thanks for help, AGAIN.... One step closer... I have one host down, and I have the trace option on all of my hosts listed in bb-hosts. When I comment out that downed host, the errors clear up in bb-test. Take a look...

Mon Dec 31 12:22:16 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213

Statistics:
 Hosts total           : 310
 Hosts with no tests   : 7
 Total test count      : 307
 Status messages       : 308
 Alert status msgs     : 0
 Transmissions         : 5

DNS statistics:
 # hostnames resolved  : 303
 # succesful           : 303
 # failed              : 0
 # calls to dnsresolve : 307

TCP test statistics:
 # TCP tests total     : 2
 # HTTP tests          : 1
 # Simple TCP tests    : 1
 # Connection attempts : 2
 # bytes written       : 135
 # bytes read          : 553

TIME SPENT
Event                                     Starttime          Duration
bbtest-net startup                        1199103736.384784         -
Service definitions loaded                1199103736.385887  0.001103
Tests loaded                              1199103736.768919  0.383032
DNS lookups completed                     1199103736.768928  0.000009
Test engine setup completed               1199103736.772261  0.003333
TCP tests completed                       1199103736.773300  0.001039
PING test completed (303 hosts)           1199103755.089536 18.316236
PING test results sent                    1199103755.091233  0.001697
Test result collection completed          1199103755.091241  0.000008
LDAP test engine setup completed          1199103755.091245  0.000004
LDAP tests executed                       1199103755.091249  0.000004
LDAP tests result collection completed    1199103755.091252  0.000003
NSLOOKUP tests executed                   1199103755.095923  0.004671
Test results transmitted                  1199103755.098103  0.002180
bbtest-net completed                      1199103755.099180  0.001077
TIME TOTAL                                                  18.714396

But once I uncomment out the host and the hobbit server tries to do a traceroute to it, the errors come back again. Even if I disable the alerting of that host. Take a look....

Mon Dec 31 12:32:24 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213

Statistics:
 Hosts total           : 311
 Hosts with no tests   : 7
 Total test count      : 308
 Status messages       : 309
 Alert status msgs     : 0
 Transmissions         : 5

DNS statistics:
 # hostnames resolved  : 304
 # succesful           : 304
 # failed              : 0
 # calls to dnsresolve : 308

TCP test statistics:
 # TCP tests total     : 2
 # HTTP tests          : 1
 # Simple TCP tests    : 1
 # Connection attempts : 2
 # bytes written       : 135
 # bytes read          : 553

Error output:
 Timeout waiting for data from child, killing it
 Timeout waiting for data from child, killing it
 Child process terminated with signal 15

TIME SPENT
Event                                     Starttime          Duration
bbtest-net startup                        1199104344.425092         -
Service definitions loaded                1199104344.426152  0.001060
Tests loaded                              1199104344.543955  0.117803
DNS lookups completed                     1199104344.543964  0.000009
Test engine setup completed               1199104344.547454  0.003490
TCP tests completed                       1199104344.548434  0.000980
PING test completed (304 hosts)           1199104369.082520 24.534086
PING test results sent                    1199104399.089988 30.007468
Test result collection completed          1199104399.090003  0.000015
LDAP test engine setup completed          1199104399.090007  0.000004
LDAP tests executed                       1199104399.090011  0.000004
LDAP tests result collection completed    1199104399.090015  0.000004
NSLOOKUP tests executed                   1199104399.095563  0.005548
Test results transmitted                  1199104399.097862  0.002299
bbtest-net completed                      1199104399.098975  0.001113
TIME TOTAL                                                  54.673883

Any ideas of why its doing it??? Or how to resolve it???
Thanks,
michael

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Friday, December 28, 2007 5:30 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors

Try Henrik's fping command at the bottom of this page:
http://www.hswn.dk/hobbiton/2007/11/msg00069.html
and stick a time in front to see how long it takes.

On 12/28/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:

Josh,
Thanks for getting back to me so quickly, I updated my /etc/hosts file to have every single one of my monitored hosts, just as a test. I now have 'failed hosts' in my DNS statistic's, but my 'PING test results sent' are still off the charts. I still cant figure out the problem...

bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213

Statistics:
 Hosts total           : 311
 Hosts with no tests   : 7
 Total test count      : 308
 Status messages       : 309
 Alert status msgs     : 0
 Transmissions         : 5

DNS statistics:
 # hostnames resolved  : 304
 # succesful           : 304
 # failed              : 0
 # calls to dnsresolve : 308

TCP test statistics:
 # TCP tests total     : 2
 # HTTP tests          : 1
 # Simple TCP tests    : 1
 # Connection attempts : 2
 # bytes written       : 135
 # bytes read          : 553

Error output:
 Timeout waiting for data from child, killing it
 Timeout waiting for data from child, killing it
 Child process terminated with signal 15

TIME SPENT
Event                                     Starttime          Duration
bbtest-net startup                        1198875012.330887         -
Service definitions loaded                1198875012.331984  0.001097
Tests loaded                              1198875012.405015  0.073031
DNS lookups completed                     1198875012.405024  0.000009
Test engine setup completed               1198875012.408543  0.003519
TCP tests completed                       1198875012.409325  0.000782
PING test completed (304 hosts)           1198875037.083126 24.673801
PING test results sent                    1198875067.092719 30.009593
Test result collection completed          1198875067.092733  0.000014
LDAP test engine setup completed          1198875067.092737  0.000004
LDAP tests executed                       1198875067.092741  0.000004
LDAP tests result collection completed    1198875067.092745  0.000004
NSLOOKUP tests executed                   1198875067.096007  0.003262
Test results transmitted                  1198875067.098247  0.002240
bbtest-net completed                      1198875067.099155  0.000908
TIME TOTAL                                                  54.768268

Thanks,
michael

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 27, 2007 11:15 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors

Michael,
Try adding "testip" after the comment in as many hosts as possible, IE:
10.0.0.250 myftp.server.com # testip
Josh

On 12/27/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:

I just modified the /etc/nsswitch.conf file to remove DNS. I find it interesting that no matter if the hobbit server uses DNS servers or local host files to look up the hosts the 'PING Test Results Sent' number is still off the charts. Thanks so much for getting back to me
Thanks,
michael

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Wednesday, December 26, 2007 6:00 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors

Your calls to dnsresolve went up one, how in the world did you "[update] the hobbit server to not use the DNS servers"? It looks like it is still doing the exact same stuff concerning DNS to me...

On 12/26/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:

Thanks for getting back to me on this. I updated the hobbit server to not use the DNS servers and all that does is cause it to go from 100 failed hosts to 299 failed hosts. I think it's the large "PING test results sent" number, what else could be the problem??? Here is another printout...
Thanks,
michael

bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213

Statistics:
 Hosts total           : 311
 Hosts with no tests   : 7
 Total test count      : 308
 Status messages       : 309
 Alert status msgs     : 0
 Transmissions         : 5

DNS statistics:
 # hostnames resolved  : 304
 # succesful           : 203
 # failed              : 101
 # calls to dnsresolve : 308

TCP test statistics:
 # TCP tests total     : 2
 # HTTP tests          : 1
 # Simple TCP tests    : 1
 # Connection attempts : 2
 # bytes written       : 135
 # bytes read          : 553

Error output:
 Timeout waiting for data from child, killing it
 Timeout waiting for data from child, killing it
 Child process terminated with signal 15

TIME SPENT
Event                                     Starttime          Duration
bbtest-net startup                        1198691205.281738         -
Service definitions loaded                1198691205.282850  0.001112
Tests loaded                              1198691205.316420  0.033570
DNS lookups completed                     1198691215.446830 10.130410
Test engine setup completed               1198691215.450594  0.003764
TCP tests completed                       1198691215.451393  0.000799
PING test completed (304 hosts)           1198691240.081987 24.630594
PING test results sent                    1198691270.090627 30.008640
Test result collection completed          1198691270.090642  0.000015
LDAP test engine setup completed          1198691270.090656  0.000014
LDAP tests executed                       1198691270.090660  0.000004
LDAP tests result collection completed    1198691270.090663  0.000003
NSLOOKUP tests executed                   1198691270.146990  0.056327
Test results transmitted                  1198691270.149410  0.002420
bbtest-net completed                      1198691270.150271  0.000861
TIME TOTAL                                                  64.868533

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 20, 2007 11:04 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors

If that was the only change you made recently try switching the DNS servers back to see if the problem disappears.

On 12/20/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:

Thanks... Actually, I updated my DNS servers and went from 300 failed lookups to 100. So I thought I was going to improve.... But it got worse!!!!
Any other ideas???
Thanks,
michael

From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, December 20, 2007 8:10 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] bbtest - errors

# failed : 100 <--- may be the cause, lots of failed DNS queries

On 12/19/07, Michael A. Price <user-d7d653acf808@xymon.invalid> wrote:

My bbtest time went from 10 seconds to 89.0 .... Has anyone seen this before???

Wed Dec 19 19:15:55 2007
bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7m 23 Feb 2007
LDAP library: OpenLDAP 20213

Statistics:
 Hosts total           : 310
 Hosts with no tests   : 7
 Total test count      : 307
 Status messages       : 308
 Alert status msgs     : 0
 Transmissions         : 5

DNS statistics:
 # hostnames resolved  : 303
 # succesful           : 203
 # failed              : 100
 # calls to dnsresolve : 307

TCP test statistics:
 # TCP tests total     : 2
 # HTTP tests          : 1
 # Simple TCP tests    : 1
 # Connection attempts : 2
 # bytes written       : 135
 # bytes read          : 553

Error output:
 Timeout waiting for data from child, killing it
 Timeout waiting for data from child, killing it
 Child process terminated with signal 15
 Timeout waiting for data from child, killing it
 Timeout waiting for data from child, killing it
 Child process terminated with signal 15

TIME SPENT
Event                                     Starttime          Duration
bbtest-net startup                        1198091755.294810         -
Service definitions loaded                1198091755.297812  0.003002
Tests loaded                              1198091755.346908  0.049096
DNS lookups completed                     1198091765.439050 10.092142
Test engine setup completed               1198091765.442685  0.003635
TCP tests completed                       1198091765.443457  0.000772
PING test completed (303 hosts)           1198091790.084027 24.640570
PING test results sent                    1198091850.102236 60.018209
Test result collection completed          1198091850.102455  0.000219
LDAP test engine setup completed          1198091850.102472  0.000017
LDAP tests executed                       1198091850.102475  0.000003
LDAP tests result collection completed    1198091850.102482  0.000007
NSLOOKUP tests executed                   1198091850.111523  0.009041
Test results transmitted                  1198091850.118622  0.007099
bbtest-net completed                      1198091850.120484  0.001862
TIME TOTAL                                                  94.825674

Thanks,
michael

--
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
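Editor's sketch, not part of the original posts: the "Timeout waiting for data from child, killing it" and "Child process terminated with signal 15" lines are what one would expect when a parent gives a helper process a deadline and sends it SIGTERM (signal 15 on POSIX) once the deadline passes. The Python below only illustrates that general pattern. It is not Hobbit's code, and whether bbtest-net handles its traceroute child exactly this way is an assumption. The function name run_with_timeout is hypothetical, and a sleeping child stands in for a traceroute that never finishes.

```python
import signal
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv; if it has not finished in time, send SIGTERM and collect it.

    Returns (returncode, stdout_bytes). On POSIX a child ended by a signal
    has a negative return code equal to minus the signal number.
    """
    child = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    try:
        output, _ = child.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        child.terminate()  # SIGTERM, i.e. signal 15 on POSIX
        output, _ = child.communicate()
    return child.returncode, output


if __name__ == "__main__":
    # Stand-in for a trace to a host that is down: a child that just sleeps.
    hanging = [sys.executable, "-c", "import time; time.sleep(60)"]
    code, _ = run_with_timeout(hanging, timeout_seconds=1)
    if code == -signal.SIGTERM:
        print("child terminated with signal", int(signal.SIGTERM))
```

Seen this way, each pass over an unreachable host costs the full timeout before the child is killed, which is consistent with the extra 30 or 60 seconds reported in the "PING test results sent" rows whenever the trace option was active.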