bbtest yellow - PING test results sent unacceptably high
list John Payne
bbtest has been yellow for the last week. I'm wondering if anyone has an idea what might be causing this?

I'm using hobbitping. I've tried bumping concurrency up with no luck... but given that the increase in time is in "PING test results sent", I'm not sure that's the right move.

Error output:

Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15

TIME SPENT
Event                                     Starttime           Duration
bbtest-net startup                        1227628387.017852   -
Service definitions loaded                1227628387.018981   0.001129
Tests loaded                              1227628387.053238   0.034257
DNS lookups completed                     1227628387.055291   0.002053
Test engine setup completed               1227628387.067797   0.012506
TCP tests completed                       1227628397.998092   10.930295
PING test completed (1628 hosts)          1227628439.127806   41.129714
PING test results sent                    1227628538.148170   99.020364
Test result collection completed          1227628538.148205   0.000035
LDAP test engine setup completed          1227628538.148205   0.000000
LDAP tests executed                       1227628538.148206   0.000001
LDAP tests result collection completed    1227628538.148207   0.000001
Test results transmitted                  1227628538.209262   0.061055
bbtest-net completed                      1227628538.214518   0.005256
TIME TOTAL                                                    151.196666

Last week:

TIME SPENT
Event                                     Starttime           Duration
bbtest-net startup                        1226941570.282399   -
Service definitions loaded                1226941570.286460   0.004061
Tests loaded                              1226941570.323923   0.037463
DNS lookups completed                     1226941570.324033   0.000110
Test engine setup completed               1226941570.633056   0.309023
TCP tests completed                       1226941580.999136   10.366080
PING test completed (1648 hosts)          1226941625.138740   44.139604
PING test results sent                    1226941634.162412   9.023672
Test result collection completed          1226941634.162446   0.000034
LDAP test engine setup completed          1226941634.162447   0.000001
LDAP tests executed                       1226941634.162447   0.000000
LDAP tests result collection completed    1226941634.162448   0.000001
Test results transmitted                  1226941634.212176   0.049728
bbtest-net completed                      1226941634.213048   0.000872
TIME TOTAL                                                    63.930649

Thanks
John
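The two timing reports above can be compared step by step to confirm which phase accounts for the slowdown. The following is only a minimal sketch: the durations are copied by hand from the reports, and the helper names `biggest_changes` and `per_host_ms` are made up for illustration and are not part of Hobbit.

```python
# Durations in seconds, copied from the two TIME SPENT reports in this thread.
THIS_WEEK = {
    "Service definitions loaded": 0.001129,
    "Tests loaded": 0.034257,
    "DNS lookups completed": 0.002053,
    "Test engine setup completed": 0.012506,
    "TCP tests completed": 10.930295,
    "PING test completed": 41.129714,
    "PING test results sent": 99.020364,
    "Test result collection completed": 0.000035,
    "Test results transmitted": 0.061055,
    "bbtest-net completed": 0.005256,
}
LAST_WEEK = {
    "Service definitions loaded": 0.004061,
    "Tests loaded": 0.037463,
    "DNS lookups completed": 0.000110,
    "Test engine setup completed": 0.309023,
    "TCP tests completed": 10.366080,
    "PING test completed": 44.139604,
    "PING test results sent": 9.023672,
    "Test result collection completed": 0.000034,
    "Test results transmitted": 0.049728,
    "bbtest-net completed": 0.000872,
}


def biggest_changes(before, after, top=3):
    """Return the `top` steps with the largest absolute change in duration."""
    deltas = {step: after[step] - before[step] for step in after if step in before}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]


def per_host_ms(duration_s, hosts):
    """Average time per host in milliseconds for one step."""
    return 1000.0 * duration_s / hosts


if __name__ == "__main__":
    for step, delta in biggest_changes(LAST_WEEK, THIS_WEEK):
        print(f"{step:35s} {delta:+10.3f} s")
    # Host counts (1628 and 1648) come from the "PING test completed" lines.
    print("results sent, this week: %.1f ms/host" % per_host_ms(99.020364, 1628))
    print("results sent, last week: %.1f ms/host" % per_host_ms(9.023672, 1648))
```

The ping phase itself is slightly faster than last week, so nearly all of the extra time sits in "PING test results sent", which is consistent with the observation that raising ping concurrency did not help.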
list Josh Luthman
Do you have the capability to use fping? I'd try that and see what the results are.

Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
list John Payne
On Nov 25, 2008, at 11:26 AM, Josh Luthman wrote:
Do you have the capability to use fping? I'd try that and see what the results are.
I had an issue with fping back in the day. Before I spend too much time figuring out how to switch, I'd like to know what would cause the "test results sent" time to skyrocket.
list John Payne
On Nov 25, 2008, at 1:38 PM, John Payne wrote:
On Nov 25, 2008, at 11:26 AM, Josh Luthman wrote:
Do you have the capability to use fping? I'd try that and see what the results are.
I had an issue with fping back in the day. Before I spend too much time figuring out how to switch, I'd like to know what would cause the "test results sent" time to skyrocket.
Nothing?
list Henrik Størner
In <user-045c9f74b2ff@xymon.invalid> John Payne <user-ec997c02e3b9@xymon.invalid> writes:
On Nov 25, 2008, at 1:38 PM, John Payne wrote:
On Nov 25, 2008, at 11:26 AM, Josh Luthman wrote:
Do you have the capability to use fping? I'd try that and see what the results are.
I had an issue with fping back in the day. Before I spend too much time figuring out how to switch, I'd like to know what would cause the "test results sent" time to skyrocket.
Nothing?
In my experience it is usually due to either a network issue between the network-test server and the main Hobbit server, or a very high
load on the main Hobbit server causing it to take very long to accept
incoming connections. Try correlating the time of your problem with
the load on the Hobbit server (e.g. look at the "vmstat" graphs from
that time).
Regards,
Henrik
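One rough way to check the suggestion above is to time TCP connections to the Hobbit daemon while bbtest-net is running. The sketch below is not part of Hobbit; it assumes the daemon listens on 127.0.0.1 port 1984 (the usual default, but adjust for your setup), and the function names are invented for illustration. A bare connect only measures the TCP handshake, so a busy daemon may show up here only once its listen backlog fills; treat the numbers as a hint, not a diagnosis.

```python
import socket
import time


def connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections and timeouts alike.
        return None


def sample(host="127.0.0.1", port=1984, count=10, pause=0.5):
    """Take `count` connect-time samples, sleeping `pause` seconds between them."""
    results = []
    for _ in range(count):
        results.append(connect_time(host, port))
        time.sleep(pause)
    return results


if __name__ == "__main__":
    # Host and port are assumptions; point them at your own Hobbit server.
    for i, elapsed in enumerate(sample()):
        if elapsed is None:
            print(f"sample {i}: connection failed")
        else:
            print(f"sample {i}: {elapsed * 1000:.2f} ms")
```

Running this once during the "PING test results sent" phase and once while the server is idle gives two sets of samples to compare.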
list John Payne
On Dec 1, 2008, at 4:44 PM, Henrik Størner wrote:
In <user-045c9f74b2ff@xymon.invalid> John Payne <user-ec997c02e3b9@xymon.invalid> writes:
On Nov 25, 2008, at 1:38 PM, John Payne wrote:
On Nov 25, 2008, at 11:26 AM, Josh Luthman wrote:
Do you have the capability to use fping? I'd try that and see what the results are.
I had an issue with fping back in the day. Before I spend too much time figuring out how to switch, I'd like to know what would cause the "test results sent" time to skyrocket.
Nothing?
In my experience it is usually due to either a network issue between the network-test server and the main Hobbit server, or a very high
That'd be loopback... so unlikely to be a network issue.
load on the main Hobbit server causing it to take very long to accept incoming connections. Try correlating the time of your problem with the load on the Hobbit server (e.g. look at the "vmstat" graphs from that time).
Hrm, I don't seem to be graphing vmstat. However, looking at all of
the graphs under trends for this host, they all look relatively flat
_except_ bbtest runtime which cliffs from ~70ms to over 150ms.
list Sebastian Auriol
John Payne <mailto:user-ec997c02e3b9@xymon.invalid> wrote:
On Dec 1, 2008, at 4:44 PM, Henrik Størner wrote:
load on the main Hobbit server causing it to take very long to accept incoming connections. Try correlating the time of your problem with the load on the Hobbit server (e.g. look at the "vmstat" graphs from that time).
Hrm, I don't seem to be graphing vmstat. However, looking at all of the graphs under trends for this host, they all look relatively flat _except_ bbtest runtime which cliffs from ~70ms to over 150ms.
You're not graphing 'CPU Load' or 'CPU Utilization'? They are both graphed by default, assuming you have hobbit-client running of course. You might want to check the hobbitclient section of hobbitlaunch.cfg.

SebA
list John Payne
On Dec 2, 2008, at 1:57 PM, "SebA" <user-7b2156f36779@xymon.invalid> wrote:
John Payne <mailto:user-ec997c02e3b9@xymon.invalid> wrote:
On Dec 1, 2008, at 4:44 PM, Henrik Størner wrote:
load on the main Hobbit server causing it to take very long to accept incoming connections. Try correlating the time of your problem with the load on the Hobbit server (e.g. look at the "vmstat" graphs from that time).
Hrm, I don't seem to be graphing vmstat. However, looking at all of the graphs under trends for this host, they all look relatively flat _except_ bbtest runtime which cliffs from ~70ms to over 150ms.
You're not graphing 'CPU Load' or 'CPU Utilization'? They are both graphed by default (assuming you have hobbit-client running of course: might want to check hobbitlaunch.cfg (hobbitclient)).
I am... And those are flat.
list John Payne
On Dec 2, 2008, at 2:31 PM, John Payne wrote:
On Dec 2, 2008, at 1:57 PM, "SebA" <user-7b2156f36779@xymon.invalid> wrote:
John Payne <mailto:user-ec997c02e3b9@xymon.invalid> wrote:
On Dec 1, 2008, at 4:44 PM, Henrik Størner wrote:
load on the main Hobbit server causing it to take very long to accept incoming connections. Try correlating the time of your problem with the load on the Hobbit server (e.g. look at the "vmstat" graphs from that time).
Hrm, I don't seem to be graphing vmstat. However, looking at all of the graphs under trends for this host, they all look relatively flat _except_ bbtest runtime which cliffs from ~70ms to over 150ms.
You're not graphing 'CPU Load' or 'CPU Utilization'? They are both graphed by default (assuming you have hobbit-client running of course: might want to check hobbitlaunch.cfg (hobbitclient)).
I am... And those are flat.
Any other suggestions? I'd really like to stop having a "yellow" status... on bbtest :)