Scaling

list Olivier Audry
Thu, 11 Apr 2013 21:40:40 +0200
Message-Id: <1365709240.4608.9.camel@aragorn>

hello

As I understand it, I should run Xymon on a single NUMA node to improve memory
access latency. Right?

I will test this once I find the right command :)

oau
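
Assuming numactl is installed, one way to try this is to start Xymon under numactl, so that its processes run on one node's CPUs and allocate memory from that node. Child processes inherit the NUMA policy of the process that starts them. The run_on_node function below is a hypothetical helper, not part of Xymon or numactl. Setting DRY_RUN to a non-empty value prints the numactl command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xymon): run a command with its CPUs and
# memory allocations restricted to a single NUMA node via numactl.
# Usage: run_on_node NODE COMMAND [ARGS...]
# If DRY_RUN is non-empty, the numactl command line is printed, not executed.
run_on_node() {
  node=$1
  shift
  ${DRY_RUN:+echo} numactl --cpunodebind="$node" --membind="$node" "$@"
}
```

For example, run_on_node 0 /path/to/xymon.sh start, where the path is a placeholder for however your installation starts Xymon. With --membind, allocations do not fall back to the other node when the bound node runs out of memory. numactl --preferred=0 prefers node 0 but can fall back, which may be safer when a node has little free memory.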

On Thursday, 11 April 2013 at 20:40 +0200, Olivier AUDRY wrote:

hello

Can you give us more information on your NUMA config?

As I understand it, I only see two nodes, one per physical CPU:
numactl --hardware
available: 2 nodes (0-1)
node 0 size: 12097 MB
node 0 free: 594 MB
node 1 size: 12120 MB
node 1 free: 12 MB
node distances:
node   0   1
  0:  10  20

Even though I have 24 CPUs (multi-core plus hyperthreading), there are only
two nodes. Is that correct?

As far as I can see, both of my nodes are nearly full. Not good at all, I guess.
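
To put numbers on "full", the numa_free_summary function below reads numactl --hardware output on stdin and prints each node's free memory as a share of its size. It is a hypothetical helper, not part of numactl. It assumes the "node N size:" / "node N free:" line format shown above, and node numbers that start at 0 with no gaps.

```shell
#!/bin/sh
# Hypothetical helper: summarize per-node free memory from the output of
# `numactl --hardware`, read on stdin.
# Assumes lines like "node 0 size: 12097 MB" and "node 0 free: 594 MB",
# and node numbers that start at 0 with no gaps.
numa_free_summary() {
  awk '
    /^node [0-9]+ size:/ { size[$2] = $4 }
    /^node [0-9]+ free:/ { free[$2] = $4 }
    END {
      # Walk nodes 0, 1, 2, ... until a node number has no size entry.
      for (n = 0; (n in size); n++)
        printf "node %d: %d of %d MB free (%.1f%%)\n", n, free[n], size[n], 100 * free[n] / size[n]
    }
  '
}
```

Usage: numactl --hardware | numa_free_summary. With the figures above, it reports 4.9% free on node 0 and 0.1% on node 1. Linux counts page cache as used memory, so a low free figure alone does not prove that a node is short of memory.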

My policy is the default one. Perhaps you can advise a specific policy
for a Xymon setup?
numactl --show
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
cpubind: 0 1
nodebind: 0 1
membind: 0 1

I'm looking into /proc/<pid>/numa_maps to find more info.
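
Each line of /proc/<pid>/numa_maps describes one memory mapping of the process. Tokens of the form N<node>=<pages> give how many of that mapping's pages sit on each node. The numa_pages function below totals those tokens per node. It is a hypothetical helper, and it counts pages rather than bytes, so huge-page mappings (see each line's kernelpagesize_kB) are counted the same as normal pages.

```shell
#!/bin/sh
# Hypothetical helper: total the N<node>=<pages> tokens from a
# /proc/<pid>/numa_maps file (read on stdin) and print pages per NUMA node.
# Note: page counts are summed regardless of page size.
numa_pages() {
  awk '
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^N[0-9]+=[0-9]+$/) {
          split($i, kv, "=")              # kv[1] = "N<node>", kv[2] = pages
          pages[substr(kv[1], 2)] += kv[2]
        }
    }
    END {
      for (n in pages)
        printf "node %s: %d pages\n", n, pages[n]
    }
  ' | sort -n -k2,2                       # order output by node number
}
```

Usage: numa_pages < /proc/1234/numa_maps, replacing 1234 with the PID of xymond or whichever Xymon process you want to inspect. Reading another user's numa_maps may require root.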

If you can help, that would be great :)

thx

oau

On Thursday, 11 April 2013 at 17:18 +0000, user-87556346d4af@xymon.invalid wrote:

On Wed, Apr 10, 2013 at 5:51 PM, White, Bruce <user-58f975e8bf9d@xymon.invalid> wrote:

Over 1000 devices monitored here, and the only real issue is RRD keeping up.
I have been told an SSD for the RRD files will solve this issue.

~2000 hosts here, and that will double or triple in the next few weeks. I
really don't see any IO issues in the slightest: 6 x 15k RPM SCSI drives in
RAID 5 on a Dell PowerEdge 2950 with 8 GB of RAM, and the thing is snoring
(load average 0.25).

Regards,
Cami

We're currently processing ~2K incoming messages a second on a single
xymond instance. This is a pretty beefy box, but it's also handling lots
of other concurrent monitoring tasks that we're slowly moving over to
xymon... including a non-fping-enabled Icinga install >.<

]# xymon localhost "xymondboard test=info fields=hostname" | wc -l

(Not all of those are full hosts; some are application nodes with statuses
being generated server-side out of client-side jvm stats or the like.)

At these levels it's important to make sure you're properly using whatever
NUMA capabilities your system has, since message passing is basically just
shoveling incoming TCP data around in memory. You might also want to tweak
net.ipv4.ip_local_port_range and enable net.ipv4.tcp_tw_reuse and/or
net.ipv4.tcp_tw_recycle on Linux to eke more simultaneous tests out of
xymonnet.
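
On Linux, the settings mentioned above can be changed at runtime with sysctl, as root. The values below are only illustrative, not recommendations from this thread. Adding the same keys to /etc/sysctl.conf or to a file under /etc/sysctl.d/ keeps them across reboots. net.ipv4.tcp_tw_recycle is deliberately left out. It is known to break connections from clients behind NAT, and it was removed from the kernel in Linux 4.12.

```shell
#!/bin/sh
# Run as root. Illustrative values only; tune for your own environment.

# Widen the range of local (ephemeral) ports used for outgoing connections,
# so many simultaneous xymonnet tests are less likely to run out of ports.
sysctl -w net.ipv4.ip_local_port_range="1024 65000"

# Allow sockets in TIME_WAIT to be reused for new outgoing connections.
sysctl -w net.ipv4.tcp_tw_reuse=1
```
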

One of the beauties of Xymon's architecture is the ability to cleanly
disconnect the components... Xymongen can run on some other box,
xymond_locator can be used to send rrd data off somewhere if IO becomes an
issue, xymonnet pollers can be distributed, and xymonproxy can be used as
needed to aggregate and smooth out incoming status reports, etc.

There are lots of different mechanisms for "scaling" efficiently depending
on your particular needs, but I'd bet that on decently modern server
hardware you'll probably want to scale for HA purposes long before you
actually /need/ the additional power.

HTH,

-jc