Xymon Mailing List Archive

Is there a limit on the number of hosts that can be polled?

5 messages in this thread

Jonathan Smith · Thu, 15 Jan 2009 10:16:11 -0500
Once I start to get around 2700 hosts hobbit stops updating. Is there a
cause for this in hobbit (memory or host limit)? Top shows my system
with very little utilization of memory or cpu and all hosts go purple
for no updates.  I can get hobbit updating again by commenting out hosts
and restarting it but I still have roughly another fifteen hundred hosts
to enter.
I have been adding the hosts in slowly in groups of less than two
hundred which seems to work better as I was seeing everything go purple
at under two thousand hosts before.
 
Jon Smith
Network Support Technician
Time Warner Cable


bbtest-net version 4.2.0
Statistics:
Hosts total           :     2672
Hosts with no tests   :       19
Total test count      :     2656
Status messages       :     2657
Alert status msgs     :        0
Transmissions         :       28
 
TCP test statistics:
# TCP tests total     :        3
# HTTP tests          :        1
# Simple TCP tests    :        2
# Connection attempts :        3
# bytes written       :      137
# bytes read          :      374


TIME SPENT
Event                                            Starttime          Duration
bbtest-net startup                       1232031572.798610                 -
Service definitions loaded               1232031572.804664          0.006054
Tests loaded                             1232031606.460960         33.656296
DNS lookups completed                    1232031606.547165          0.086205
Test engine setup completed              1232031606.571986          0.024821
TCP tests completed                      1232031606.575635          0.003649
PING test completed (2653 hosts)         1232031711.407204        104.831569
PING test results sent                   1232031711.435915          0.028711
Test result collection completed         1232031711.435934          0.000019
LDAP test engine setup completed         1232031711.435940          0.000006
LDAP tests executed                      1232031711.435947          0.000007
LDAP tests result collection completed   1232031711.435953          0.000006
Test results transmitted                 1232031711.575185          0.139232
bbtest-net completed                     1232031711.578223          0.003038
TIME TOTAL                                                        138.779613


Go Green! Print this email only when necessary. Thank you for helping Time Warner Cable be environmentally responsible.
 
 
This E-mail and any of its attachments may contain Time Warner
Cable proprietary information, which is privileged, confidential,
or subject to copyright belonging to Time Warner Cable. This E-mail
is intended solely for the use of the individual or entity to which
it is addressed. If you are not the intended recipient of this
E-mail, you are hereby notified that any dissemination,
distribution, copying, or action taken in relation to the contents
of and attachments to this E-mail is strictly prohibited and may be
unlawful. If you have received this E-mail in error, please notify
the sender immediately and permanently delete the original and any
copy of this E-mail and any printout.
Brian Catlin · Sat, 17 Jan 2009 19:54:58 -0500 (Eastern Standard Time)
As nobody took a shot at this: while you are OK on memory and CPU, have
you looked at your other resources?  With that many hosts reporting back to
a master, I would suspect I/O flooding off your interface...

Just a thought ....

Brian 
 
Josh Luthman · Sat, 17 Jan 2009 20:39:04 -0500
You're definitely not at Hobbit's maximum as this user has twice the number
of hosts!

http://en.wikibooks.org/wiki/The_hobbit_Users_list#Steria

Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer

Shawn Heisey · Sun, 18 Jan 2009 13:49:53 -0700
I would agree with this; the disk subsystem is probably unable to keep up with the I/O load.  Use "iostat 30" or "vmstat 30" to determine the iowait percentage, which is probably very high.  To fix it, get rid of any RAID 5/6 (even if handled by a dedicated controller) or LVM, and possibly use faster disks.  The best balance between performance and data redundancy is RAID 10, but obviously it costs more because there are more disks.  For write-intensive tasks like this, even JBOD is a better performance option than RAID 5.  Because I never use it, I don't really know why LVM causes problems, but I know from others' experience that it does.
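
As a rough illustration of the iowait check suggested above, the sketch below works out the iowait percentage from two samples of the aggregate "cpu" line in /proc/stat. It assumes a Linux server (the thread does not say which operating system is in use), and the function names are made up for this example. "iostat 30" and "vmstat 30" report the same kind of figure without any scripting.

```python
import time


def parse_cpu_line(line):
    """Return the counters of the aggregate 'cpu' line from /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two 'cpu' lines.

    Linux field order: user nice system idle iowait irq softirq steal ...
    so iowait is the fifth counter (index 4). Guest time is already
    included in user/nice, so only the first eight counters are summed.
    """
    first = parse_cpu_line(before)[:8]
    second = parse_cpu_line(after)[:8]
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def measure(interval=30, path="/proc/stat"):
    """Sample /proc/stat twice, 'interval' seconds apart (Linux only)."""
    with open(path) as handle:
        before = handle.readline()
    time.sleep(interval)
    with open(path) as handle:
        after = handle.readline()
    return iowait_percent(before, after)
```

Calling measure(30) on the Hobbit server while the hosts are going purple would give a number comparable to the iowait column of "iostat 30".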

The problem with RAID 5 and RAID 6 is that there's a write penalty due to the need to calculate and write parity data.  A good controller with memory for write caching can mitigate this in many typical circumstances, but only if the entire transaction can fit in the cache memory and can be flushed to disk before another data flood comes in.  In this case, it apparently takes about 2700 hosts to generate more data than the system can write before more arrives.
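
To put rough numbers on the write penalty, here is a small sketch using the usual textbook figures for small random writes: each logical write costs 2 back-end I/Os on RAID 10, 4 on RAID 5 and 6 on RAID 6. It ignores controller write caching entirely, and the disk count and the 150 IOPS per disk are purely illustrative assumptions, not measurements from this system.

```python
# Back-end I/Os per small random write (textbook values, no write cache).
WRITE_PENALTY = {"jbod": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def write_iops(level, disks, iops_per_disk):
    """Approximate sustained random-write IOPS for an array."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical array: 4 disks at 150 IOPS each.
    for level in ("jbod", "raid10", "raid5", "raid6"):
        print("%-7s %6.0f write IOPS" % (level, write_iops(level, 4, 150)))
```

With the same four disks, the estimate for RAID 10 comes out at twice the RAID 5 figure, which is the gap the cache has to hide when the write load is sustained.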
Henrik Størner · Sun, 18 Jan 2009 23:32:03 +0100
On Thu, Jan 15, 2009 at 10:16:11AM -0500, Smith, Jonathan wrote:
> Once I start to get around 2700 hosts hobbit stops updating. Is there a
> cause for this in hobbit (memory or host limit)? Top shows my system
> with very little utilization of memory or cpu and all hosts go purple
> for no updates.  I can get hobbit updating again by commenting out hosts
> and restarting it but I still have roughly another fifteen hundred hosts
> to enter.

Which tests go purple - the network tests ("conn" status, since you
seem to be doing mostly ping tests), or all of them including the
client-side tests (cpu, disk, memory etc.)?

You shouldn't have any problems with that number of hosts. 

You're nowhere near the number of hosts I have in my production setup;
I have about 5700 entries in bb-hosts, and my main network probe tests
4100 of them. And it seems your network tests complete well within
the 300-second maximum poll time. What does the "bbgen" status say about
the time it takes to build the webpages? And what's the I/O load on
the Hobbit server? Check out the "CPU utilization" graph in the
"trends" column (NOT the "CPU load" one - you want the multi-color
'stacked' graph).
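
To make the poll-time comparison concrete, the sketch below pulls the TIME TOTAL value out of the bbtest-net output posted at the start of the thread and compares it with the 300 second poll time. The parsing is an assumption based only on the layout shown in that post, and the function names are hypothetical.

```python
POLL_INTERVAL = 300.0  # seconds, the maximum poll time mentioned above


def total_runtime(report):
    """Return the TIME TOTAL value (seconds) from bbtest-net timing output."""
    for line in report.splitlines():
        if line.startswith("TIME TOTAL"):
            return float(line.split()[-1])
    raise ValueError("no TIME TOTAL line found")


def headroom(report, interval=POLL_INTERVAL):
    """Seconds left in the poll interval after the network tests finish."""
    return interval - total_runtime(report)


# Excerpt of the timing output from the original post.
REPORT = """\
PING test completed (2653 hosts)         1232031711.407204        104.831569
bbtest-net completed                     1232031711.578223          0.003038
TIME TOTAL                                                        138.779613
"""

# prints: 138.8 s used, 161.2 s spare
print("%.1f s used, %.1f s spare" % (total_runtime(REPORT), headroom(REPORT)))
```

So the network tests use less than half of the poll interval, which points away from bbtest-net itself as the bottleneck.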

Are there any errors logged in the various Hobbit logfiles? Or any
"resource" problems logged in the operating system logs - like
out of sockets, network card issues, or other weird messages?


And what operating system is this on?


The limitations I've seen over time mostly have to do with the amount
of disk I/O caused by the hobbitd_rrd RRD graph data collector (the
update caching added in the current development version solves that
problem completely), and with the network resources used for testing
lots of hosts - some systems have fairly small ARP caches, and this
can cause all sorts of weird problems, because Hobbit will sporadically
lose contact with itself or with the systems it is testing.
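
As one way to check the ARP cache point, here is a minimal sketch that compares a host count against the Linux neighbour-table limits (gc_thresh2 is the soft limit, gc_thresh3 the hard limit). It assumes the Hobbit server runs Linux, which the thread has not established, and it treats every tested host as being on a directly attached subnet, which is the worst case: hosts reached through a router only need an entry for the router. The function names are made up for this example.

```python
import os

NEIGH_DIR = "/proc/sys/net/ipv4/neigh/default"  # Linux only


def read_thresholds(directory=NEIGH_DIR):
    """Read gc_thresh1..3 from the Linux neighbour-table settings."""
    values = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        with open(os.path.join(directory, name)) as handle:
            values[name] = int(handle.read().strip())
    return values


def arp_cache_warning(local_hosts, thresholds):
    """Return a warning if local_hosts may overflow the ARP cache, else None."""
    if local_hosts >= thresholds["gc_thresh3"]:
        return "hard limit exceeded: new entries can be refused"
    if local_hosts >= thresholds["gc_thresh2"]:
        return "soft limit exceeded: entries will be garbage collected"
    return None
```

For example, arp_cache_warning(2672, read_thresholds()) would flag a server whose hard limit is lower than the number of hosts being pinged on the local segment.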


Regards,
Henrik