Is there a limit on the number of hosts that can be polled?
list Jonathan Smith
Once I start to get around 2700 hosts hobbit stops updating. Is there a cause for this in hobbit (memory or host limit)? Top shows my system with very little utilization of memory or cpu and all hosts go purple for no updates. I can get hobbit updating again by commenting out hosts and restarting it but I still have roughly another fifteen hundred hosts to enter.

I have been adding the hosts in slowly in groups of less than two hundred which seems to work better as I was seeing everything go purple at under two thousand hosts before.

Jon Smith
Network Support Technician
Time Warner Cable

bbtest-net version 4.2.0
Statistics:
 Hosts total           : 2672
 Hosts with no tests   : 19
 Total test count      : 2656
 Status messages       : 2657
 Alert status msgs     : 0
 Transmissions         : 28
TCP test statistics:
 # TCP tests total     : 3
 # HTTP tests          : 1
 # Simple TCP tests    : 2
 # Connection attempts : 3
 # bytes written       : 137
 # bytes read          : 374

TIME SPENT
Event                                    Starttime          Duration
bbtest-net startup                       1232031572.798610  -
Service definitions loaded               1232031572.804664  0.006054
Tests loaded                             1232031606.460960  33.656296
DNS lookups completed                    1232031606.547165  0.086205
Test engine setup completed              1232031606.571986  0.024821
TCP tests completed                      1232031606.575635  0.003649
PING test completed (2653 hosts)         1232031711.407204  104.831569
PING test results sent                   1232031711.435915  0.028711
Test result collection completed         1232031711.435934  0.000019
LDAP test engine setup completed         1232031711.435940  0.000006
LDAP tests executed                      1232031711.435947  0.000007
LDAP tests result collection completed   1232031711.435953  0.000006
Test results transmitted                 1232031711.575185  0.139232
bbtest-net completed                     1232031711.578223  0.003038
TIME TOTAL                                                  138.779613

Go Green! Print this email only when necessary. Thank you for helping Time Warner Cable be environmentally responsible.
This E-mail and any of its attachments may contain Time Warner Cable proprietary information, which is privileged, confidential, or subject to copyright belonging to Time Warner Cable. This E-mail is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient of this E-mail, you are hereby notified that any dissemination, distribution, copying, or action taken in relation to the contents of and attachments to this E-mail is strictly prohibited and may be unlawful. If you have received this E-mail in error, please notify the sender immediately and permanently delete the original and any copy of this E-mail and any printout.
list Brian Catlin
As nobody took a shot at this: while you are OK on memory and CPU, have you looked at your other resources? With that many hosts reporting back to a master, I would suspect I/O flooding of your interface. Just a thought.

Brian
user-259d6a9a548a@xymon.invalid
list Josh Luthman
You're definitely not at Hobbit's maximum, as this user has twice the number of hosts!
http://en.wikibooks.org/wiki/The_hobbit_Users_list#Steria

Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
On Sat, Jan 17, 2009 at 7:54 PM, Brian Catlin <user-af6e4c377507@xymon.invalid> wrote:
As nobody took a shot at this, While you are ok on memory and CPU - have you looked at your other resources? With that many hosts reporting back to a master - I would suspect I/O flooding off your interface... Just a thought .... Brian user-259d6a9a548a@xymon.invalid
list Shawn Heisey
I would agree with this, the disk subsystem is probably unable to keep up with the I/O load. Use "iostat 30" or "vmstat 30" to determine iowait percentage, which is probably very high.

To fix it, get rid of any raid5/6 (even if handled by a dedicated controller) or LVM, and possibly use faster disks. The best balance between performance and data redundancy is raid10, but obviously it costs more because there are more disks. For write-intensive tasks like this, even JBOD is a better performance option than raid5. Because I never use it, I don't really know why LVM causes problems, but I know from others' experience that it does.

The problem with raid5 and raid6 is that there's a write penalty due to the need to calculate and write parity data. A good controller with memory for write caching can mitigate this in many typical circumstances, but only if the entire transaction can fit in the cache memory and can be flushed to disk before another data flood comes in. In this case, it takes about 2700 hosts to generate more data than the system can write before more arrives.
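Shawn's write-penalty argument can be put into rough numbers. The sketch below is not from the thread: it uses the common rule-of-thumb penalties (about 2 back-end disk operations per logical write on raid10, 4 on raid5, 6 on raid6), and the disk count and per-disk IOPS figure in the usage note are hypothetical values chosen only for illustration. Controller write caches, which Shawn mentions as a partial mitigation, are ignored.

```python
# Rule-of-thumb back-end disk operations per logical random write.
# These are textbook approximations, not measurements from this thread.
WRITE_PENALTY = {
    "jbod": 1,    # no redundancy: one disk write per logical write
    "raid10": 2,  # every write goes to both halves of a mirror
    "raid5": 4,   # read data, read parity, write data, write parity
    "raid6": 6,   # as raid5, but with two parity blocks to update
}


def effective_write_iops(level, disks, iops_per_disk):
    """Estimate sustained random-write IOPS for an array.

    Assumes the load spreads evenly over all disks and that no
    controller cache absorbs any of the writes.
    """
    return disks * iops_per_disk / WRITE_PENALTY[level]


def compare_levels(disks, iops_per_disk):
    """Return the estimate for every known level, keyed by level name."""
    return {
        level: effective_write_iops(level, disks, iops_per_disk)
        for level in WRITE_PENALTY
    }
```

With four hypothetical disks of 150 IOPS each, `compare_levels(4, 150)` estimates 150 write IOPS for raid5 against 300 for raid10 and 600 for JBOD, which matches the ordering Shawn describes.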
Brian Catlin wrote:
As nobody took a shot at this, While you are ok on memory and CPU - have you looked at your other resources? With that many hosts reporting back to a master - I would suspect I/O flooding off your interface... Just a thought ....
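The iowait percentage that "iostat 30" or "vmstat 30" reports can also be derived by hand. The following is a minimal sketch that assumes a Linux server (the thread never says which operating system is in use) and reads the aggregate "cpu" line of /proc/stat; the function names are illustrative and not part of Hobbit.

```python
import time


def parse_cpu_line(line):
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.

    Field order on Linux: user, nice, system, idle, iowait, irq,
    softirq, steal. Guest fields are skipped because they are
    already counted inside 'user' and 'nice'.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_sample(path="/proc/stat"):
    """Read one sample; the aggregate line is the first line of the file."""
    with open(path) as handle:
        return parse_cpu_line(handle.readline())


def sample_iowait(interval=30):
    """Take two samples 'interval' seconds apart, like 'iostat 30' does."""
    first = read_cpu_sample()
    time.sleep(interval)
    return iowait_percent(first, read_cpu_sample())
```

Calling `sample_iowait(30)` on the Hobbit server while everything is purple gives one number to compare against a quiet period; a value that stays high would support the disk I/O theory.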
list Henrik Størner
On Thu, Jan 15, 2009 at 10:16:11AM -0500, Smith, Jonathan wrote:
Once I start to get around 2700 hosts hobbit stops updating. Is there a cause for this in hobbit (memory or host limit)? Top shows my system with very little utilization of memory or cpu and all hosts go purple for no updates. I can get hobbit updating again by commenting out hosts and restarting it but I still have roughly another fifteen hundred hosts to enter.
Which tests go purple - the network tests ("conn" status, since you
seem to be doing mostly ping tests), or all of them including the
client-side tests (cpu, disk, memory etc.)?
You shouldn't have any problems with that number of hosts.
You're nowhere near the number of hosts I have in my production setup;
I have about 5700 entries in bb-hosts, my main network probe tests
4100 of them. And it seems your network tests complete well within
the 300 second max. poll time. What does the "bbgen" status say about
the time it takes to build the webpages? And what's the I/O load on
the Hobbit server - check out the "CPU utilization" graph in the
"trends" column (NOT the "CPU load" one - you want the multi-color
'stacked' graph).
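The "well within the 300 second max. poll time" remark can be checked against the bbtest-net timings in the original post. The sketch below uses only figures from that output (138.779613 seconds total for 2672 hosts) and the 300 second limit quoted above. The projection to about 4172 hosts (2672 plus the "roughly another fifteen hundred") assumes run time grows linearly with host count, which is a simplification and not something the thread confirms.

```python
POLL_BUDGET_SECONDS = 300.0  # the max. poll time quoted in the reply above

# Figures copied from the bbtest-net output in the original post.
TOTAL_SECONDS = 138.779613
HOSTS_NOW = 2672

# Assumption: "roughly another fifteen hundred hosts" taken as exactly 1500.
HOSTS_PLANNED = HOSTS_NOW + 1500


def budget_used(total_seconds, budget=POLL_BUDGET_SECONDS):
    """Fraction of the poll interval consumed by one test run."""
    return total_seconds / budget


def projected_total(total_seconds, hosts_now, hosts_planned):
    """Naive linear projection of run time to a larger host count."""
    return total_seconds * hosts_planned / hosts_now


def fits_budget(total_seconds, budget=POLL_BUDGET_SECONDS):
    """True when a run finishes before the next poll is due."""
    return total_seconds < budget
```

Under these assumptions the current run uses a little under half of the poll interval, and the linear projection for the full host list still fits, so the network test duration alone does not explain the purple statuses.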
Are there any errors logged in the various Hobbit logfiles? Or any
"resource" problems logged in the operating system logs - like,
out of sockets, network card issues, or other weird messages?
And what operating system is this on?
The limitations I've seen over time mostly have to do with the amount
of disk I/O caused by the hobbitd_rrd RRD graph data collector (the
update caching added in the current development version solves that
problem completely), and with the network resources used for testing
lots of hosts - some systems have fairly small ARP caches, and this
can cause all sorts of weird problems, because Hobbit will sporadically
lose contact with itself or with the systems it is testing.
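The ARP cache point can be checked quickly if the server runs Linux, which is an assumption since the operating system is not stated in the thread. On Linux the neighbour table is governed by the gc_thresh1, gc_thresh2 and gc_thresh3 sysctls, with gc_thresh3 acting as the hard maximum (commonly 1024 by default). The sketch below compares those limits with the number of hosts on directly attached subnets; hosts reached through a router only consume the router's single entry. The function names and return strings are illustrative.

```python
from pathlib import Path

# Linux-specific location of the IPv4 neighbour (ARP) table limits.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")


def read_gc_thresholds(base=NEIGH_DIR):
    """Read gc_thresh1..3 if present; returns an empty dict elsewhere."""
    values = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        path = base / name
        if path.exists():
            values[name] = int(path.read_text().strip())
    return values


def arp_cache_risk(local_hosts, thresholds):
    """Classify the host count against the neighbour table limits.

    local_hosts counts only hosts on directly attached subnets.
    """
    hard = thresholds.get("gc_thresh3")
    soft = thresholds.get("gc_thresh2")
    if hard is not None and local_hosts >= hard:
        return "over hard limit"
    if soft is not None and local_hosts >= soft:
        return "over soft limit"
    return "ok"
```

For example, `arp_cache_risk(2700, read_gc_thresholds())` on a server with the common defaults would report the hard limit as exceeded if all 2700 hosts sat on local subnets, which is the kind of situation where a probe can sporadically lose contact with the systems it tests.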
Regards,
Henrik