Hobbit Server performance
list Jeff Newman
All,

I have been taking a look at my hobbit server for the past week or so and wanted to ask a couple of things.

The server is a DL360 G3 with two 3 GHz processors and 2 GB of RAM. It does only hobbit and cricket (10 or so routers, so not a huge hit).

I've noticed the following two things:

#1. I am constantly using 100% of my physical RAM (no swap really)

    Mem:  2075040k total, 2051928k used, 23112k free, 52172k buffers
    Swap: 0k total, 0k used, 0k free, 1744168k cached

Is this normal? I'm not sure how to tell which process specifically is eating up the most RAM.

#2. I am consistently running with 200-300 ports in TIME_WAIT involving port 1984

    # netstat -an | grep TIME | grep 1984 | wc -l
    253

I'm not sure if this is normal either, nor am I sure whether it is bad or not.

Here is the output from some of the "bb" buttons at the top of my hobbit page. If anything stands out, any help would be appreciated. If I can provide any other info, let me know.

Thanks,
Jeff

bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7a Feb 19 2003
LDAP library: OpenLDAP 20213

Statistics:
 Hosts total           : 137
 Hosts with no tests   : 0
 Total test count      : 141
 Status messages       : 141
 Alert status msgs     : 0
 Transmissions         : 4

DNS statistics:
 # hostnames resolved  : 138
 # succesful           : 134
 # failed              : 4
 # calls to dnsresolve : 141

TCP test statistics:
 # TCP tests total     : 4
 # HTTP tests          : 2
 # Simple TCP tests    : 2
 # Connection attempts : 4
 # bytes written       : 272
 # bytes read          : 6108

TIME SPENT
Event                                     Starttime          Duration
bbtest-net startup                        1188421844.836442  -
Service definitions loaded                1188421844.837799  0.001357
Tests loaded                              1188421844.860329  0.022530
DNS lookups completed                     1188421854.099983  9.239654
Test engine setup completed               1188421854.101787  0.001804
TCP tests completed                       1188421854.249955  0.148168
PING test completed (137 hosts)           1188421867.657468  13.407513
PING test results sent                    1188421867.659863  0.002395
Test result collection completed          1188421867.659878  0.000015
LDAP test engine setup completed          1188421867.659880  0.000002
LDAP tests executed                       1188421867.659882  0.000002
LDAP tests result collection completed    1188421867.659883  0.000001
Test results transmitted                  1188421867.660531  0.000648
bbtest-net completed                      1188421867.662248  0.001717
TIME TOTAL                                                   22.825806

bbgen for Hobbit version 4.2.0

Statistics:
 Hosts                 : 243
 Status messages       : 1255
 Purple messages       : 0
 Pages                 : 22

TIME SPENT
Event                                     Starttime          Duration
Startup                                   1188421967.748593  -
Load links done                           1188421967.748887  0.000294
Load bbhosts done                         1188421967.752720  0.003833
ACK removal done                          1188421967.752787  0.000067
Load STATE done                           1188421967.772438  0.019651
Color calculation done                    1188421967.772690  0.000252
Hobbit pagegen start                      1188421967.772717  0.000027
Hobbit pagegen done                       1188421967.794090  0.021373
BB2 generation done                       1188421967.800862  0.006772
BBNK generation done                      1188421967.801386  0.000524
Summary transmission done                 1188421967.801388  0.000002
Run completed                             1188421967.801389  0.000001
TIME TOTAL                                                   0.052796

Statistics for Hobbit daemon
Up since 29-Aug-2007 13:49:58 (0 days, 02:20:00)

Incoming messages      : 152683
- status               : 97745
- combo                : 16461
- page                 : 10
- summary              : 0
- data                 : 22090
- client               : 16205
- notes                : 0
- enable               : 0
- disable              : 0
- ack                  : 0
- config               : 0
- query                : 0
- hobbitdboard         : 140
- hobbitdlog           : 4
- drop                 : 0
- rename               : 0
- dummy                : 28
- ping                 : 0
- notify               : 0
- schedule             : 0
- download             : 0
- Bogus/Timeouts       : 0
Incoming messages/sec  : 17 (average last 300 seconds)

status channel messages: 97839 (1 readers)
stachg channel messages: 1113 (1 readers)
page   channel messages: 293 (1 readers)
data   channel messages: 22090 (1 readers)
notes  channel messages: 0 (0 readers)
enadis channel messages: 0 (0 readers)
client channel messages: 16195 (1 readers)
clichg channel messages: 0 (1 readers)
list Larry Barber
1. Linux tends to gobble up whatever memory is not otherwise in use and spend it on buffers and cache, so on Linux boxes using nearly 100% of memory is normal. Use the "actual" reading to get the memory that is really in use, not including caching and buffers.

2. This is normal; Hobbit is very network intensive.

On the bbtest-net results, it looks like your DNS lookups are a little slow and your ping tests are taking a little longer than I would consider normal. I am running Hobbit on hardware similar to yours, monitoring over 600 hosts, and have similar times for the DNS and ping.

Thanks,
Larry Barber
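As a rough check of the "actual" figure against the numbers in the original post, buffers and cache can be subtracted from the used total. This is only a minimal sketch: the function name `actual_used_kb` is made up for illustration, and the inputs are the values from the Mem:/Swap: lines quoted in this thread.

```shell
# Hypothetical helper: memory really held by applications, in kB.
# Arguments: used, buffers, cached (all in kB, as printed by top).
actual_used_kb() {
    echo $(( $1 - $2 - $3 ))
}

# Values from the original post:
# actual_used_kb 2051928 52172 1744168
```

With those values the result is 255588 kB, roughly 12% of the 2075040 kB installed, so most of the "used" memory is reclaimable cache.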
list Henrik Størner
On Wed, Aug 29, 2007 at 04:14:58PM -0500, Jeff Newman wrote:
> The server is a DL360 G3 with two 3ghz processors and 2gb of RAM. It
> does only hobbit and cricket (10 or so routers, so not a huge hit)
> I've noticed the following two things:
>
> #1. I am constantly using 100% of my physical ram (no swap really)
> Mem: 2075040k total, 2051928k used, 23112k free, 52172k buffers
> Swap: 0k total, 0k used, 0k free, 1744168k cached
>
> is this normal? Not sure how to tell which process specifically is
> eating up the most RAM.
I think you're running Linux, right? The "1744168k cached" normally goes in the "Mem:" line, not the "Swap:" line. And that's the clue to what's using your RAM: Linux uses available RAM as a variable-size disk cache; the disk cache grows until just about all of RAM is used. This usually makes sense: the cached data is a copy of data which is stored on disk, so when an application needs more memory, the disk cache memory can be freed instantly and allocated to the application. Therefore, Linux boxes usually have very little "free" memory.

The "top" utility can sort processes by memory usage. Also, "ps -vax" will tell you how much memory each process is using.
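To follow up on the ps suggestion, one option is to sort processes by resident set size. This is a minimal sketch, assuming a ps that accepts POSIX-style `-o` output fields; the helper name `top_rss` is made up for illustration and simply sorts whatever "rss pid command" lines it is fed.

```shell
# Hypothetical helper: read "RSS_KB PID COMMAND" lines on stdin and
# print the N largest by resident set size (default 10).
top_rss() {
    sort -k1,1nr | head -n "${1:-10}"
}

# Typical use (the empty "=" headers suppress the ps header line):
# ps -e -o rss= -o pid= -o comm= | top_rss 5
```

Resident set size counts shared pages in every process that maps them, so the per-process figures tend to overstate the total somewhat.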
> #2. I am consistently running with 200-300 ports in TIME_WAIT involving
> port 1984
>
> # netstat -an | grep TIME | grep 1984 | wc -l
> 253
>
> I'm not sure if this is normal either, nor am I sure if it is bad or not?
It's not unusual, and quite harmless. The TIME_WAIT state happens when a socket is closed; the operating system keeps the socket around for some time (usually 20-30 secs, cannot remember what the Linux default is) to make sure that all packets destined for this socket have been received (there might be some duplicated/retransmitted packets still in transit when the socket is closed). This makes sure that a new connection using the same port will not see packets from the old connection.
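For keeping an eye on this number, the grep pipeline from the original post can be tightened so that it only counts sockets whose local or remote port is exactly 1984 (a plain `grep 1984` would also match port 11984, for example). This is a minimal sketch; `count_state` is a made-up name, and the column layout is assumed to be the usual Linux `netstat -an` one.

```shell
# Hypothetical helper: count sockets in a given TCP state that involve a
# given port. Reads `netstat -an` style lines on stdin, where the columns
# are: proto recv-q send-q local-address foreign-address state.
count_state() {
    awk -v st="$1" -v port=":$2" '
        $NF == st && ($4 ~ (port "$") || $5 ~ (port "$")) { n++ }
        END { print n + 0 }
    '
}

# Typical use:
# netstat -an | count_state TIME_WAIT 1984
```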
> Here is the output from some of the "bb" buttons at the top of my hobbit
> page. If anything stands out, any help would be appreciated.
>
> DNS statistics:
>  # hostnames resolved : 138
>  # succesful : 134
>  # failed : 4
>
> DNS lookups completed 1188421854.099983 9.239654

Your DNS lookups are a bit slow - 9 seconds for 138 DNS lookups. Nothing critical, just a bit slower than I'd expect. Installing a local caching DNS daemon is a good way to eliminate this problem.

Regards,
Henrik
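To put the phase timings from the bbtest-net output on a per-item footing, the duration can be divided by the item count. This is a minimal sketch; `avg_ms` is a made-up name, and the result is an average over the whole phase rather than the latency of any single request, since lookups and pings may run concurrently.

```shell
# Hypothetical helper: average milliseconds per item, given a total
# duration in seconds and an item count.
avg_ms() {
    awk -v t="$1" -v n="$2" 'BEGIN {
        if (n <= 0) { print "n/a"; exit }
        printf "%.0f\n", t / n * 1000
    }'
}

# DNS phase from the original post:  avg_ms 9.239654 138
# PING phase from the original post: avg_ms 13.407513 137
```

For the figures in this thread that works out to about 67 ms per DNS lookup and about 98 ms per pinged host.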
list Greg L Hubbard
Linux, right? I *think* some Linux distributions glom on to any available RAM for a disk cache, and then they give it up when the memory is needed by something else. I hope I am not getting Linux confused with Windows -- that would really start the flames!

As for the TIME_WAIT -- I have no idea. Some systems (Solaris) have a long default timeout when ports are released, and this can cause connections to hang around for a long time after the remote system has closed them. But this is just a shot in the dark.

GLH
list Jeff Newman
Thanks for the feedback everyone. I'll look into the DNS slowness, good to know about the memory/TIME_WAIT's, feel a bit better now :-) -Jeff
list Xbgmsharp
Hi,

Here is a way to reduce I/O, which consumes a lot of resources. I load all the rrd and hist data and the webserver files (cgi-bin, www, web, secure-cgi) into different tmpfs filesystems (http://en.wikipedia.org/wiki/TMPFS). This way everything is loaded into memory (cached).

    # free -k
              total       used       free     shared    buffers     cached
    Mem:    6229400    5570816     658584          0     466896    4685444

I have 6G of memory; I set 2G for rrd, another 2G for hist, and a very small one (64M) for the webserver. I check 2896 hosts. I have reduced my iowait from 60-80% to 0%, and the web interface is much faster. In order to get the data written to disk, I rsync it twice a day.

All of this doesn't exempt you from having a local caching DNS daemon and many other tuning parameters. You can also:

* reduce the TCP/IP timeouts:

    echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
    echo 1800 > /proc/sys/net/ipv4/tcp_keepalive_intvl
    echo 3000 > /proc/sys/net/ipv4/tcp_keepalive_time
    echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
    echo 0 > /proc/sys/net/ipv4/tcp_sack
    echo 0 > /proc/sys/net/ipv4/tcp_timestamps

* increase the TCP/IP buffers:

    echo 262144 > /proc/sys/net/core/rmem_max
    echo 262144 > /proc/sys/net/core/wmem_max

* mount the hobbit partition with the noatime option and, for more security, nodev,noexec,nosuid
* remove the cpu, mem and files limits for the hobbit user
* increase the number of inodes on the partition, because of the rrd, hist and clientdata directories
* control the memory settings:

    # Controls the default maximum size of a message queue
    kernel.msgmax = 65536
    # Controls the maximum shared segment size, in bytes
    kernel.shmmax = 4294967295
    # Controls the maximum number of shared memory segments, in pages
    kernel.shmall = 268435456
    kernel.shmmax = 536870912

Regards,
KaYa
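Values written with echo into /proc/sys only last until the next reboot, whereas entries in /etc/sysctl.conf (the "key = value" form used for the kernel.* settings in this message) are reapplied at boot or with `sysctl -p`. The key is the path below /proc/sys with the slashes turned into dots. This is a minimal sketch of that conversion; `proc_to_sysctl` is a made-up name.

```shell
# Hypothetical helper: turn "echo VALUE > /proc/sys/a/b/c" lines read on
# stdin into "a.b.c = VALUE" lines suitable for /etc/sysctl.conf.
# Lines that do not match that shape are silently dropped.
proc_to_sysctl() {
    sed -n 's|^ *echo \([^ ]*\) > /proc/sys/\(.*\)$|\2 = \1|p' | tr / .
}

# Typical use:
# echo 'echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout' | proc_to_sysctl
```

Whether each of these values suits a given server is a separate question; the sketch only changes the notation.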
list Henrik Størner
In <user-d4e58d08a6e9@xymon.invalid> xbgmsharp <user-b84b7d8ff428@xymon.invalid> writes:
> Here is a way to reduce I/O, which consumes a lot of resources.
> I load all the rrd and hist and webserver (cgi-bin,www,web,secure-cgi)
> into different tmpfs (http://en.wikipedia.org/wiki/TMPFS). This way
> everything is loaded into memory (cached).
> I have 6G of memory and I set 2G for rrd and the other 2G for hist and
> a very small one for the webserver 64M.

That will obviously work, but you'd better be sure this server doesn't crash (or lose power). And it doesn't scale very well - it wouldn't work for me, since I have to plan on the number of hosts being monitored doubling approximately every 12-18 months.

To show what the new code does, have a look at http://www.hswn.dk/~henrik/rrd-ioload.png

Before the new code was put into production, the server was running at 50% "io" load - but since this is a dual-CPU server and all of the I/O goes through one of the CPUs (Linux design choice), it was actually completely maxed out on the amount of I/O it could handle. After the RRD update caching, it uses 7-9% I/O time.

Regards,
Henrik
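The crash-risk caveat applies to whatever has been written to the tmpfs since the last copy to disk. This is a minimal sketch of such a copy step, not anyone's actual setup: `flush_to_disk` is a made-up name, the directory paths in the usage line are hypothetical, and the plain cp fallback is added here for illustration (the thread itself only mentions rsync).

```shell
# Hypothetical helper: copy a tmpfs-backed directory ($1) to a directory
# on persistent storage ($2).
flush_to_disk() {
    mkdir -p "$2" || return 1
    if command -v rsync >/dev/null 2>&1; then
        # Mirror the source, removing files that no longer exist there.
        rsync -a --delete "$1"/ "$2"/
    else
        # Fallback when rsync is not installed; stale files are not removed.
        cp -a "$1"/. "$2"/
    fi
}

# Typical use from cron (both paths are placeholders):
# flush_to_disk /path/to/tmpfs/rrd /path/to/disk/rrd
```

Running the copy more often than twice a day narrows the window of data that a crash or power loss would cost, at the price of some of the disk I/O the tmpfs was meant to avoid.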