Xymon Mailing List Archive

bbtest yellow - PING test results sent unacceptably high

9 messages in this thread

list John Payne · Tue, 25 Nov 2008 11:19:10 -0500 ·
bbtest has been yellow for the last week. I'm wondering if anyone has an idea what might be causing this?
I'm using hobbitping. I've tried bumping concurrency up with no luck... but given that the increase in time is in "PING test results sent", I'm not sure that's the right move.

Error output:
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15
Timeout waiting for data from child, killing it
Timeout waiting for data from child, killing it
Child process terminated with signal 15


TIME SPENT
Event                                            Starttime           Duration
bbtest-net startup                        1227628387.017852                 -
Service definitions loaded               1227628387.018981           0.001129
Tests loaded                             1227628387.053238           0.034257
DNS lookups completed                    1227628387.055291           0.002053
Test engine setup completed              1227628387.067797           0.012506
TCP tests completed                      1227628397.998092          10.930295
PING test completed (1628 hosts)         1227628439.127806          41.129714
PING test results sent                   1227628538.148170          99.020364
Test result collection completed         1227628538.148205           0.000035
LDAP test engine setup completed         1227628538.148205           0.000000
LDAP tests executed                      1227628538.148206           0.000001
LDAP tests result collection completed   1227628538.148207           0.000001
Test results transmitted                 1227628538.209262           0.061055
bbtest-net completed                     1227628538.214518           0.005256
TIME TOTAL                                                         151.196666

Last week:
TIME SPENT
Event                                            Starttime           Duration
bbtest-net startup                        1226941570.282399                 -
Service definitions loaded               1226941570.286460           0.004061
Tests loaded                             1226941570.323923           0.037463
DNS lookups completed                    1226941570.324033           0.000110
Test engine setup completed              1226941570.633056           0.309023
TCP tests completed                      1226941580.999136          10.366080
PING test completed (1648 hosts)         1226941625.138740          44.139604
PING test results sent                   1226941634.162412           9.023672
Test result collection completed         1226941634.162446           0.000034
LDAP test engine setup completed         1226941634.162447           0.000001
LDAP tests executed                      1226941634.162447           0.000000
LDAP tests result collection completed   1226941634.162448           0.000001
Test results transmitted                 1226941634.212176           0.049728
bbtest-net completed                     1226941634.213048           0.000872
TIME TOTAL                                                          63.930649
Thanks
John
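
One way to see which phase regressed is to diff the two TIME SPENT tables above row by row. The following is a minimal Python sketch and not part of bbtest-net; the function names parse_time_spent and biggest_regression are made up for illustration, and the sample rows are copied from the two tables in this message.

```python
import re

# Matches rows such as "PING test results sent   1227628538.148170   99.020364".
# Rows without a numeric duration (the startup row, TIME TOTAL) do not match.
ROW = re.compile(r"^(?P<event>.+?)\s+(?P<start>\d+\.\d+)\s+(?P<dur>\d+\.\d+)\s*$")


def parse_time_spent(text):
    """Return {event name: duration in seconds} for one TIME SPENT table."""
    durations = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            # Drop the host count so "(1628 hosts)" and "(1648 hosts)" share a key.
            event = re.sub(r"\s*\(\d+ hosts\)", "", m.group("event")).strip()
            durations[event] = float(m.group("dur"))
    return durations


def biggest_regression(before, after):
    """Return (event, delta in seconds) for the phase whose duration grew most."""
    common = before.keys() & after.keys()
    event = max(common, key=lambda e: after[e] - before[e])
    return event, after[event] - before[event]


LAST_WEEK = """\
TCP tests completed                      1226941580.999136          10.366080
PING test completed (1648 hosts)         1226941625.138740          44.139604
PING test results sent                   1226941634.162412           9.023672
Test results transmitted                 1226941634.212176           0.049728
"""

THIS_WEEK = """\
TCP tests completed                      1227628397.998092          10.930295
PING test completed (1628 hosts)         1227628439.127806          41.129714
PING test results sent                   1227628538.148170          99.020364
Test results transmitted                 1227628538.209262           0.061055
"""

event, delta = biggest_regression(parse_time_spent(LAST_WEEK),
                                  parse_time_spent(THIS_WEEK))
print(f"{event}: {delta:+.3f}s")
```

With these rows the only phase that grows substantially is "PING test results sent", which matches the observation that the ping run itself did not get slower.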
list Josh Luthman · Tue, 25 Nov 2008 11:26:37 -0500 ·
Do you have the capability to use fping?  I'd try that and see what the
results are.

Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer

list John Payne · Tue, 25 Nov 2008 13:38:16 -0500 ·
On Nov 25, 2008, at 11:26 AM, Josh Luthman wrote:
Do you have the capability to use fping? I'd try that and see what the results are.

I had an issue with fping back in the day. Before I spend too much time figuring out how to switch, I'd like to know what would cause the "test results sent" time to skyrocket.

list John Payne · Mon, 1 Dec 2008 16:40:07 -0500 ·
On Nov 25, 2008, at 1:38 PM, John Payne wrote:
On Nov 25, 2008, at 11:26 AM, Josh Luthman wrote:
Do you have the capability to use fping? I'd try that and see what the results are.
I had an issue with fping back in the day. Before I spend too much time figuring out how to switch, I'd like to know what would cause the "test results sent" time to skyrocket.

Nothing?

list Henrik Størner · Mon, 1 Dec 2008 21:44:06 +0000 (UTC) ·
In <user-045c9f74b2ff@xymon.invalid> John Payne <user-ec997c02e3b9@xymon.invalid> writes:
On Nov 25, 2008, at 1:38 PM, John Payne wrote:
On Nov 25, 2008, at 11:26 AM, Josh Luthman wrote:
Do you have the capability to use fping?  I'd try that and see what  the results are.
I had an issue with fping back in the day.   Before I spend too much  time figuring out how to switch, I'd like to know what would cause  the "test results sent" time to skyrocket.
Nothing?

In my experience it is usually due to either a network issue between
the network-test server and the main Hobbit server, or a very high
load on the main Hobbit server causing it to take very long to accept
incoming connections. Try correlating the time of your problem with
the load on the Hobbit server (e.g. look at the "vmstat" graphs from
that time).


Regards,
Henrik
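
To check the "slow to accept incoming connections" theory directly, one could time a few TCP connects to the Hobbit daemon from the network-test host. This is a rough Python sketch only: time_connects is a hypothetical helper, and the address 127.0.0.1 and port 1984 (the usual Hobbit daemon port) are assumptions to be replaced with the values from the local configuration.

```python
import socket
import time


def time_connects(host="127.0.0.1", port=1984, attempts=5, timeout=10.0):
    """Return the seconds each TCP connect took; None marks a failed attempt.

    host and port are assumptions (loopback and the usual Hobbit daemon port).
    """
    results = []
    for _ in range(attempts):
        started = time.monotonic()
        try:
            # Only the connect is timed; nothing is sent to the daemon.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - started)
        except OSError:
            # Covers refused connections and timeouts alike.
            results.append(None)
    return results
```

Calling time_connects() while a bbtest-net run is in its "PING test results sent" phase and comparing the figures with an idle period would show whether connection setup is where the time goes. Connects that stay in the millisecond range point away from the accept path and toward something else in the send step.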
list John Payne · Tue, 2 Dec 2008 10:15:04 -0500 ·
On Dec 1, 2008, at 4:44 PM, Henrik Størner wrote:
In <user-045c9f74b2ff@xymon.invalid> John Payne <user-ec997c02e3b9@xymon.invalid> writes:
On Nov 25, 2008, at 1:38 PM, John Payne wrote:
On Nov 25, 2008, at 11:26 AM, Josh Luthman wrote:
Do you have the capability to use fping?  I'd try that and see what
the results are.
I had an issue with fping back in the day.   Before I spend too much
time figuring out how to switch, I'd like to know what would cause
the "test results sent" time to skyrocket.
Nothing?

In my experience it is usually due to either a network issue between
the network-test server and the main Hobbit server, or a very high
load on the main Hobbit server causing it to take very long to accept
incoming connections. Try correlating the time of your problem with
the load on the Hobbit server (e.g. look at the "vmstat" graphs from
that time).

That'd be loopback... so unlikely to be a network issue.

Hrm, I don't seem to be graphing vmstat.  However, looking at all of  
the graphs under trends for this host, they all look relatively flat  
_except_ bbtest runtime which cliffs from ~70ms to over 150ms.
list Sebastian Auriol · Tue, 2 Dec 2008 18:57:37 -0000 ·
John Payne <mailto:user-ec997c02e3b9@xymon.invalid> wrote:
On Dec 1, 2008, at 4:44 PM, Henrik Størner wrote:
load on the main Hobbit server causing it to take very long to accept
incoming connections. Try correlating the time of your problem with
the load on the Hobbit server (e.g. look at the "vmstat" graphs from
that time).

Hrm, I don't seem to be graphing vmstat.  However, looking at all of
the graphs under trends for this host, they all look relatively flat
_except_ bbtest runtime which cliffs from ~70ms to over 150ms.
You're not graphing 'CPU Load' or 'CPU Utilization'? They are both graphed
by default (assuming you have hobbit-client running of course: might want to
check hobbitlaunch.cfg (hobbitclient)).

SebA
list John Payne · Tue, 2 Dec 2008 14:31:49 -0500 ·

On Dec 2, 2008, at 1:57 PM, "SebA" <user-7b2156f36779@xymon.invalid> wrote:
John Payne <mailto:user-ec997c02e3b9@xymon.invalid> wrote:
On Dec 1, 2008, at 4:44 PM, Henrik Størner wrote:
load on the main Hobbit server causing it to take very long to accept
incoming connections. Try correlating the time of your problem with
the load on the Hobbit server (e.g. look at the "vmstat" graphs from
that time).

Hrm, I don't seem to be graphing vmstat.  However, looking at all of
the graphs under trends for this host, they all look relatively flat
_except_ bbtest runtime which cliffs from ~70ms to over 150ms.
You're not graphing 'CPU Load' or 'CPU Utilization'? They are both graphed
by default (assuming you have hobbit-client running of course: might want to
check hobbitlaunch.cfg (hobbitclient)).

I am... And those are flat.
list John Payne · Mon, 8 Dec 2008 13:27:13 -0500 ·
On Dec 2, 2008, at 2:31 PM, John Payne wrote:
On Dec 2, 2008, at 1:57 PM, "SebA" <user-7b2156f36779@xymon.invalid> wrote:
John Payne <mailto:user-ec997c02e3b9@xymon.invalid> wrote:
On Dec 1, 2008, at 4:44 PM, Henrik Størner wrote:
load on the main Hobbit server causing it to take very long to accept
incoming connections. Try correlating the time of your problem with
the load on the Hobbit server (e.g. look at the "vmstat" graphs from
that time).

Hrm, I don't seem to be graphing vmstat.  However, looking at all of
the graphs under trends for this host, they all look relatively flat
_except_ bbtest runtime which cliffs from ~70ms to over 150ms.
You're not graphing 'CPU Load' or 'CPU Utilization'? They are both graphed
by default (assuming you have hobbit-client running of course: might want to
check hobbitlaunch.cfg (hobbitclient)).
I am... And those are flat.

Any other suggestions? I'd really like to stop having a "yellow" status... on bbtest :)