On Thu, Feb 12, 2009 at 06:06:48PM +0000, Flyzone Micky wrote:
"really low" as in ... how much ?
Output of iostat command:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.22    0.00    0.91    3.62    0.00   93.26
This is the iostat output for NFS:
Device:              rBlk_nor/s wBlk_nor/s rBlk_dir/s wBlk_dir/s rBlk_svr/s wBlk_svr/s  rops/s  wops/s
vnetapp:/vol/hobbit     1631.11     373.97       0.00       0.00    1170.83     825.22  840.76  840.76
This last iostat also includes rsync statistics, because I was
maintaining an rsync on the local disk of hobbit.
Unlucky nfsstat doesn't sho
of all the RRD files - takes about 8 minutes. No chance at all
then of keeping up with 5-minute update cycles.
But in that case, wouldn't a warning like this appear (I don't have one)?
WARNING: Runtime 110 longer than BBSLEEP
I really think you should try shutting off the hobbitd_rrd tasks,
just to see what happens.
Maybe I left it out of my last post, but I have already done that, and
it didn't solve the problem.
For hosts to go purple they have to go more than 30 minutes without
an update - they don't go purple just because they miss a single
update.
Right... but it doesn't always appear. I also remember an old patch
in the all-in-one about dirty data, but it was already applied.
I suppose you have checked the kernel logs ('dmesg' output) for
anything odd ?
Done, along with all the other system and Hobbit logs. There is no
message that could help.
I'm wondering if maybe you're running out of ports (there's only
64K of them, only about half can be used by normal apps). How
many ports do you have in TIME_WAIT state ?
Ruled out: the ports in TIME_WAIT number 235-300 at most, and as a
kernel parameter I also tried (as is done for Oracle):
net.ipv4.ip_local_port_range = 1024 65000
but nothing changes with or without it.
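
For anyone who wants to check the TIME_WAIT count on their own box, here
is a minimal Python sketch that counts sockets per state from text in the
/proc/net/tcp format. The function name count_tcp_states is made up for
illustration; the state codes are the kernel's TCP state numbers (06 is
TIME_WAIT). It only sees IPv4 sockets unless /proc/net/tcp6 is fed in too.

```python
import os
from collections import Counter

# Kernel TCP state codes as shown in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from /proc/net/tcp-style text.

    Hypothetical helper, not part of Hobbit.
    """
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # fields: 0 = slot, 1 = local address, 2 = remote address, 3 = state
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        print("TIME_WAIT:", count_tcp_states(f.read())["TIME_WAIT"])
```

Sampling this every few seconds while Hobbit is running would show whether
the count ever gets near the size of the local port range.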
Another thing is the size of the ARP cache, if your hosts are
all on the same IP network or your router/firewall is doing
proxy-arp.
There are about 4 different networks.
In any case, remember my test with just 20 clients.
Is this server also running the network tests ?
...
sysctl net.ipv4.tcp_tw_reuse=1
which enables the kernel to re-use ports that are in a TIME_WAIT state.
Yes, but like before... it also appears with just 20 clients,
so I would rule out a problem related to the number of clients.
However, I also tried:
net.ipv4.tcp_fin_timeout = 30
instead of the default 120 seconds in RHEL5 to leave a port
in TIME_WAIT state.
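
To confirm which values the running kernel is actually using for the
settings mentioned in this thread, a small Python sketch like the one
below can read them back from /proc/sys. The helper names sysctl_path and
read_sysctl are made up for illustration, and the sketch assumes the usual
mapping of a dotted sysctl name to a path under /proc/sys (it does not
handle keys whose components themselves contain dots).

```python
def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys.

    Hypothetical helper; assumes no dots inside a single component.
    """
    return root + "/" + name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # The three kernel parameters discussed in this thread.
    for key in (
        "net.ipv4.ip_local_port_range",
        "net.ipv4.tcp_tw_reuse",
        "net.ipv4.tcp_fin_timeout",
    ):
        print(key, "=", read_sysctl(key))
```

Running it before and after a change shows whether the new value really
took effect on the live system.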
One (I) would expect the 64-bit systems to have a bit more "oomph"
so they should be the ones that worked best.
Ahm... what is an oomph? :-S
A datapoint here. I'm also running Hobbit on a 64-bit Linux
platform, but it is using SPARC (Sun) hardware.
We are trying to shut down all our SPARC machines and move to Linux... :)
So you're saying that on a RHEL 5.3 64-bit Intel server, setting
up Hobbit and feeding it with data from ~20 clients will make
the system break?
Yes, that is the point: RHEL > 5.0 and 64-bit (AMD)...
I have yet to try Fedora 10 64-bit.