Xymon Mailing List Archive

performance help needed

4 messages in this thread

Greg Shea · Mon, 26 Oct 2009 15:55:15 -0400
Hi all,

First off, sorry for the long post; I'm trying to supply as much data as
possible for analysis.

I have a single Hobbit server with approximately 3500 hosts, a mixture
of Windows and Unix, some DB tests, some BEA tests and a few custom
tests.  I have over 70000 RRD files, which seem to be causing Hobbit
performance problems, most specifically clock offset.  I have a cron job
that restarts Hobbit every 30 minutes; otherwise the offset grows so
large that it eats all memory and the OOM killer starts.  NTP is fine;
the offset seems to come from the time it takes Hobbit to process the
client data.  The OS resides on RAID1 146GB 15K RPM SAS drives; the
second drive, for the RRDs, is a single 300GB 15K RPM SAS drive.

At the end is a graph showing the clock offset.  What else can I try?
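A rough way to see why the offset keeps growing until the next restart: if RRD updates arrive faster than the disk can complete them, the backlog grows linearly with time, and every new update waits behind it. The sketch below is a simplified fluid model, not a description of how hobbitd_rrd actually queues its work. It assumes one disk write per RRD update and uses figures that come up in this thread (70000 files on a 5-minute cycle, roughly 180 IOPS for one 15K disk).

```python
def backlog_after(seconds: float, arrivals_per_s: float, capacity_per_s: float) -> float:
    """Updates still pending after `seconds`, for constant arrival and service rates.

    Simplified model (assumption): no bursts, no caching, one write per update.
    """
    return max(0.0, (arrivals_per_s - capacity_per_s) * seconds)


RRD_FILES = 70_000      # "over 70000 RRD files"
INTERVAL_S = 300        # assume every RRD is updated once per 5-minute cycle
DISK_IOPS = 180         # rule of thumb for one 15K spindle, quoted later in the thread

arrivals = RRD_FILES / INTERVAL_S                 # about 233 updates/s
pending = backlog_after(30 * 60, arrivals, DISK_IOPS)
# In a FIFO queue, a newly arriving update waits for everything ahead of it.
delay_s = pending / DISK_IOPS

print(f"arrival rate: {arrivals:.1f} updates/s")
print(f"backlog after 30 minutes: {pending:.0f} updates")
print(f"queueing delay for a new update: {delay_s:.0f} s")
```

With these inputs it prints an arrival rate of 233.3 updates/s, a backlog of 96000 updates and a delay of about 533 s. The real system will differ (the page cache absorbs bursts, and not every update is a random seek), so treat the numbers as an order-of-magnitude illustration of why the lag only grows.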

 
I moved the RRDs off to a separate drive hoping this would help, but the
writes per second are high.  I've tried reducing read-ahead, mounting
with noatime,nodiratime, and changing the IO scheduler to deadline;
nothing seems to help.  Here's a sample output from iostat -xd 60 10:

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00    68.08     0.17    20.02     1.33   704.78     0.67   352.39    34.98     4.25   210.36     3.47     7.01
sda1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sda2         0.00    68.08     0.17    20.02     1.33   704.78     0.67   352.39    34.98     4.25   210.36     3.47     7.01
sdb          0.00   674.60     1.53   311.04    12.27  7887.05     6.13  3943.52    25.27    24.50    78.38     1.91    59.70
sdb1         0.00   674.60     1.53   311.04    12.27  7887.05     6.13  3943.52    25.27    24.50    78.38     1.91    59.70
sdb2         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
dm-0         0.00     0.00     0.17    88.10     1.33   704.78     0.67   352.39     8.00    20.31   230.09     0.79     7.01
dm-1         0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00

 
Drive sdb1 holds the RRD files.
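To read the sdb row, it helps to know that several iostat -x columns follow from the others. The sketch below recomputes three of them from the per-second rates in that row, as a consistency check under the usual definitions (512-byte sectors; average queue length equals request rate times average wait, by Little's law). It is not a reimplementation of iostat. Read this way, the RRD disk is busy about 60% of the time, with roughly 24 requests queued and a 78 ms average wait per request.

```python
# Values copied from the sdb row of the iostat -xd 60 10 output above.
r_s, w_s = 1.53, 311.04             # reads/s, writes/s
rsec_s, wsec_s = 12.27, 7887.05     # sectors read/written per second
await_ms, svctm_ms = 78.38, 1.91    # average wait and service time per request (ms)

iops = r_s + w_s                    # total requests per second

# Average request size, in 512-byte sectors.
avgrq_sz = (rsec_s + wsec_s) / iops
# Little's law: mean requests in the system = arrival rate x mean time in system.
avgqu_sz = iops * await_ms / 1000
# Busy fraction of wall-clock time. In sysstat's iostat, svctm is itself derived
# from %util, so this is a round-trip check rather than an independent measurement.
util_pct = iops * svctm_ms / 1000 * 100
# Write throughput in kB/s (two 512-byte sectors per kB).
wkb_s = wsec_s / 2

print(f"IOPS {iops:.2f}  avgrq-sz {avgrq_sz:.2f}  avgqu-sz {avgqu_sz:.2f}  "
      f"%util {util_pct:.2f}  wkB/s {wkb_s:.2f}")
```

The small average request size (about 25 sectors, i.e. roughly 12.5 kB) together with the long queue is what you would expect from many scattered small writes rather than a few large sequential ones.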

 
Memory seems fine:

Memory              Used       Total  Percentage
  Physical           7645M       7973M         95%
  Actual             4688M       7973M         58%
  Swap                 64M       9983M          0%
 

[hobbit at hobbitmon rrd]$ uname -a

Linux hobbitmon 2.6.9-78.0.8.ELsmp #1 SMP Wed Nov 5 07:14:58 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

[hobbit at hobbitmon rrd]$ cat /etc/redhat-release

Red Hat Enterprise Linux AS release 4 (Nahant Update 7)

 
Output from bbgen:

bbgen for Hobbit version 4.2.0

 
Statistics:
 Hosts               :  3506
 Status messages     : 41934
 Purple messages     :     0
 Pages               :   171

 
Output from bbtest:

bbtest-net version 4.2.0
SSL library : OpenSSL 0.9.7a Feb 19 2003
LDAP library: OpenLDAP 20213
 
Statistics:
 Hosts total           :     3511
 Hosts with no tests   :     2390
 Total test count      :     1470
 Status messages       :     1596
 Alert status msgs     :        0
 Transmissions         :       18
 
DNS statistics:
 # hostnames resolved  :      358
 # succesful           :      339
 # failed              :       19
 # calls to dnsresolve :      530
 
TCP test statistics:
 # TCP tests total     :      411
 # HTTP tests          :      161
 # Simple TCP tests    :      250
 # Connection attempts :      411
 # bytes written       :    24722
 # bytes read          :   543706
 
 
TIME SPENT
Event                                            Starttime   Duration
bbtest-net startup                       1256584823.382254
Service definitions loaded               1256584823.383506   0.001252
Tests loaded                             1256584823.468743   0.085237
DNS lookups completed                    1256584828.565010   5.096267
Test engine setup completed              1256584828.572444   0.007434
TCP tests completed                      1256584839.000192  10.427748
PING test completed (1082 hosts)         1256584881.612835  42.612643
PING test results sent                   1256584890.617168   9.004333
Test result collection completed         1256584890.617453   0.000285
LDAP test engine setup completed         1256584890.617453   0.000000
LDAP tests executed                      1256584890.617454   0.000001
LDAP tests result collection completed   1256584890.617455   0.000001
NTP tests executed                       1256584894.477007   3.859552
RPC tests executed                       1256584894.988810   0.511803
Test results transmitted                 1256584895.016358   0.027548
bbtest-net completed                     1256584895.018441   0.002083
TIME TOTAL                                                  71.636187
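The Duration column is the difference between consecutive start times, so the breakdown can be re-derived directly from the output. Doing so shows where bbtest-net's 71.6 seconds go: the PING test alone accounts for about 60% of the run, followed by the TCP tests and sending the ping results. Assuming bbtest-net runs on its usual 5-minute cycle, the network tests finish with plenty of time to spare and do not look like the part that is falling behind. The snippet only reuses numbers already shown above.

```python
# (event, start time) pairs copied from the TIME SPENT table above.
events = [
    ("bbtest-net startup", 1256584823.382254),
    ("Service definitions loaded", 1256584823.383506),
    ("Tests loaded", 1256584823.468743),
    ("DNS lookups completed", 1256584828.565010),
    ("Test engine setup completed", 1256584828.572444),
    ("TCP tests completed", 1256584839.000192),
    ("PING test completed (1082 hosts)", 1256584881.612835),
    ("PING test results sent", 1256584890.617168),
    ("Test result collection completed", 1256584890.617453),
    ("LDAP test engine setup completed", 1256584890.617453),
    ("LDAP tests executed", 1256584890.617454),
    ("LDAP tests result collection completed", 1256584890.617455),
    ("NTP tests executed", 1256584894.477007),
    ("RPC tests executed", 1256584894.988810),
    ("Test results transmitted", 1256584895.016358),
    ("bbtest-net completed", 1256584895.018441),
]

# Duration of each step = its start time minus the previous step's start time.
durations = [(name, t - prev_t) for (name, t), (_, prev_t) in zip(events[1:], events)]
total = events[-1][1] - events[0][1]

# Show the three most expensive steps and their share of the whole run.
for name, d in sorted(durations, key=lambda item: item[1], reverse=True)[:3]:
    print(f"{name:40s} {d:10.6f} s {d / total:6.1%}")
print(f"{'TIME TOTAL':40s} {total:10.6f} s")
```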
 
 
Output from hobbitd:
Statistics for Hobbit daemon
Up since 26-Oct-2009 15:00:11 (0 days, 00:25:02)
 
Incoming messages      :     398039
- status               :     367373
- combo                :       5193
- page                 :        183
- summary              :         75
- data                 :      15310
- client               :       9595
- notes                :          0
- enable               :          0
- disable              :          0
- ack                  :          0
- config               :          0
- query                :         50
- hobbitdboard         :         63
- hobbitdlog           :        180
- drop                 :          0
- rename               :          0
- dummy                :          5
- ping                 :          0
- notify               :          0
- schedule             :          1
- download             :          0
- Bogus/Timeouts       :         11
Incoming messages/sec  :        262 (average last 300 seconds)
 
status channel messages:     366410 (1 readers)
stachg channel messages:      34214 (1 readers)
page   channel messages:       5600 (1 readers)
data   channel messages:      15310 (1 readers)
notes  channel messages:          0 (0 readers)
enadis channel messages:          0 (0 readers)
client channel messages:       9565 (1 readers)
clichg channel messages:         17 (1 readers)
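The counters can be turned into rates using the uptime of 25 minutes 2 seconds (1502 s). Averaged since startup, hobbitd is receiving about 398039 / 1502, or roughly 265 messages/s, in line with the 262 messages/s it reports for the last 300 seconds, and status messages make up about 92% of that traffic. The sketch below just does this arithmetic on the non-zero counters; note that these figures describe what hobbitd receives, not how quickly the RRD updates derived from them reach the disk.

```python
from datetime import timedelta

# Non-zero "Incoming messages" counters from the hobbitd statistics above.
counts = {
    "status": 367373, "combo": 5193, "page": 183, "summary": 75,
    "data": 15310, "client": 9595, "query": 50, "hobbitdboard": 63,
    "hobbitdlog": 180, "dummy": 5, "schedule": 1, "Bogus/Timeouts": 11,
}
total_incoming = 398039
uptime = timedelta(minutes=25, seconds=2)   # "(0 days, 00:25:02)"

secs = uptime.total_seconds()
print(f"overall: {total_incoming / secs:.1f} msg/s since startup")
# The four busiest message types, as a rate and as a share of all messages.
for kind, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:4]:
    print(f"  {kind:14s} {n / secs:8.2f} msg/s  {n / total_incoming:6.1%} of all messages")
```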
Attachments (1)
Buchan Milne · Tue, 27 Oct 2009 12:34:34 +0100
quoted from Greg Shea
On Monday, 26 October 2009 20:55:15 user-762ee872a5a4@xymon.invalid wrote:
At the end is a graph showing the clock offset.  What else can I try?
Add more spindles.

70 000 RRD files will result in a minimum of 233 IOPS (assuming they are all being updated at 5-minute intervals). The EMC people I've spoken to say a 15k FC disk shouldn't really be averaging much more than 180 IOPS; 15k SAS or 15k SCSI wouldn't be any better. The 311 writes/s you seem to be doing are not much overhead above that minimum of 233, so it is unlikely that any tuning will help.
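The spindle arithmetic behind "add more spindles", as a small sketch. It assumes one random write per RRD update (a lower bound) and that I/O spreads evenly over the disks with no RAID write penalty, as with plain striping; with RAID1/10 mirroring every write lands on two disks, so the count would roughly double. The function name and the 70% utilization target are illustrative assumptions, not figures from the thread.

```python
import math

def spindles_needed(required_iops: float, iops_per_disk: float, target_util: float = 1.0) -> int:
    """Minimum disk count to serve `required_iops`.

    Assumes I/O spreads evenly over the disks with no RAID write penalty
    (plain striping); `target_util` < 1.0 leaves headroom on each disk.
    """
    return math.ceil(required_iops / (iops_per_disk * target_util))


RRD_FILES = 70_000
UPDATE_INTERVAL_S = 300     # each RRD updated every 5 minutes
IOPS_PER_15K_DISK = 180     # rule-of-thumb figure quoted above

minimum_iops = RRD_FILES / UPDATE_INTERVAL_S   # one write per update: about 233
observed_iops = 1.53 + 311.04                  # r/s + w/s for sdb in the iostat sample

print(f"minimum:  {minimum_iops:.0f} IOPS -> {spindles_needed(minimum_iops, IOPS_PER_15K_DISK)} disks")
print(f"observed: {observed_iops:.0f} IOPS -> {spindles_needed(observed_iops, IOPS_PER_15K_DISK)} disks")
# Hypothetical 70% utilization target, to leave room for growth and bursts.
print(f"observed at 70%: {spindles_needed(observed_iops, IOPS_PER_15K_DISK, 0.7)} disks")
```

Either way the answer is more than the single 15K disk currently holding the RRDs: two at the bare minimum, three once some headroom is left.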

If you can't add spindles, you could look at the 4.3 branch, which has some features that allow scaling out to more hosts, or streamlining RRD writes (which may allow you to lose the clock offset, but will likely not reduce the load average much).

Regards,
Buchan
Greg Shea · Tue, 27 Oct 2009 09:24:47 -0400
Hi Buchan,

Thanks for your response.  I bounced around the idea of external storage, but even
here at EMC there is a cost associated with external storage; that's why I tried the
second drive.  I've read about the enhancements in 4.3, but thought I should upgrade
from RH 4.7 to RH 5.3 first (RH is the officially supported Linux), as there were IO
improvements in the kernel.  I also tried newer versions of RRDtool, 1.2.30 and 1.3.8,
but RRD 1.3.8 doesn't work with Hobbit 4.2.

On to the storage requisition process....

Thanks
-Grs-
Gregory R Shea
EMC Corporation
Olivier Audry · Tue, 27 Oct 2009 15:16:28 +0000 (UTC)
Hi all,

Do you have a lot of memory?  If so, you can put your RRD files on a tmpfs and sync the tmpfs to disk every couple of hours.

We do this on a small Hobbit server with 3500+ devices, for the rrd, hist and www directories.
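A minimal sketch of the "sync the tmpfs every couple of hours" half of this suggestion, assuming the live RRD directory is already mounted as a tmpfs and a persistent copy lives on disk. It copies only new or changed files and writes each one under a temporary name before renaming it into place, so an interrupted sync never leaves a half-copied file on disk. Caveats for any variant of this scheme: everything written since the last sync is lost on a crash or power failure, an RRD being updated during the copy may be captured mid-update, and the tmpfs has to be repopulated from the disk copy before Hobbit starts. The function name `sync_tree` is hypothetical; `rsync -a` run from cron does essentially the same job.

```python
import os
import shutil
import tempfile
from pathlib import Path

def sync_tree(src: Path, dst: Path) -> int:
    """Copy files from src (e.g. a tmpfs RRD directory) to dst on disk when they
    are new or changed (by size or mtime). Returns the number of files copied."""
    copied = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        st = path.stat()
        if target.exists():
            tst = target.stat()
            # copy2 preserves mtime, so an unchanged source compares equal.
            if tst.st_size == st.st_size and tst.st_mtime >= st.st_mtime:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        # Copy to a temporary file in the destination directory, then rename it
        # over the target, so a failure mid-copy never leaves a partial file.
        fd, tmp = tempfile.mkstemp(dir=target.parent, prefix=".sync-")
        os.close(fd)
        try:
            shutil.copy2(path, tmp)
            os.replace(tmp, target)
        except BaseException:
            os.unlink(tmp)
            raise
        copied += 1
    return copied
```

A cron job could then call `sync_tree(Path("/path/to/tmpfs/rrd"), Path("/path/to/disk/rrd"))`; both paths are placeholders for your own Hobbit data directory layout.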

Regards

Olivier AUDRY

----- Original Message -----
From: "shea greg" <user-762ee872a5a4@xymon.invalid>
To: user-9b139aff4dec@xymon.invalid, user-ae9b8668bcde@xymon.invalid
Cc: "shea greg" <user-762ee872a5a4@xymon.invalid>
Sent: Tuesday, 27 October 2009 14:24:47 GMT +01:00 Amsterdam / Berlin / Bern / Rome / Stockholm / Vienna
Subject: RE: [hobbit] performance help needed