Xymon Mailing List Archive

Trying to set up split client load over multiple servers.

3 messages in this thread

list Thomas R. Brand · Thu, 27 May 2010 17:32:23 -0400 ·
Sorry if this post gets a bit long, I've read thru the man pages and the
archives and have failed to reach understanding and need some help.

 
I have over 4300 Hobbit 4.2.0+all-patches clients currently reporting to
one server running Xymon  4.3.0-0.beta2. This server is also used for
other workloads, which normally do not produce much of a load but can at
times. 

 
We are adding more clients to this configuration at a rate of about
400/week for a final total of about 7200 clients.

 
As of this week it seems that my Xymon server is unable to keep up with
the load and I suspect this is due to several reasons: the large number
of clients, file system type (ext3) used for the data/.. directories (,
the problem is.

 
I am trying to determine my best course to be able to handle the 7200+
clients ... 

Some options I've considered:

A)     simply split the clients over 2 or more independent Xymon servers

a.      easiest to configure

b.      lose 'single web page' overview

c.      lose combined statistics/reporting


B)     Split clients over multiple servers for data gathering/storage
and (how?) use one server to display a bb2.html page which combines all
non green from all 'data-gathering' servers

a.      Is this even possible?


C)    Some other method/configuration? Anyone on the list running this
many clients? 

a.      How did you set up your environment?

b.      What size/performance/type of system are you using for your
Xymon server?

 
Some of my current symptoms:

 
The 'top' command on the server is indicating high I/O load  (at times
%iowait > 60%).

 
The 'procs' column warns/alerts about missing processes; closer
examination shows that the client data received has been truncated,
usually somewhere in the 'ps' output section.

 
The bb2.html page sometimes shows many 'purple' clients; the clients in
purple change (e.g., a client goes purple for a while, 5-30+ minutes, then
we get another message processed and the client goes green again).

 
In /var/log/xymon/clientdata.log, I am seeing many (2212 yesterday
alone) messages like:

2010-05-27 11:38:09 hobbitd_client: Got message 55294, expected 55277

2010-05-27 12:08:12 Flushed 7 stale messages for 0.0.0.0:0

2010-05-27 12:08:13 Flushed 43 stale messages for 0.0.0.0:0

2010-05-27 12:08:13 hobbitd_client: Got message 81190, expected 81140

2010-05-27 12:38:36 Flushed 16 stale messages for 0.0.0.0:0

2010-05-27 12:38:37 Flushed 43 stale messages for 0.0.0.0:0

2010-05-27 12:38:38 Flushed 26 stale messages for 0.0.0.0:0

2010-05-27 12:38:38 hobbitd_client: Got message 107484, expected 107399
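
To put numbers on log lines like these, a minimal sketch can tally them. It assumes the two message shapes shown above are the only ones of interest, and it reads "Got message N, expected M" as a gap of N - M sequence numbers; that reading is an interpretation, not something confirmed by the Xymon documentation. The name `summarize` is hypothetical.

```python
import re

# Sample lines copied from the clientdata.log excerpt above.
SAMPLE = """\
2010-05-27 11:38:09 hobbitd_client: Got message 55294, expected 55277
2010-05-27 12:08:12 Flushed 7 stale messages for 0.0.0.0:0
2010-05-27 12:08:13 Flushed 43 stale messages for 0.0.0.0:0
2010-05-27 12:08:13 hobbitd_client: Got message 81190, expected 81140
2010-05-27 12:38:36 Flushed 16 stale messages for 0.0.0.0:0
2010-05-27 12:38:37 Flushed 43 stale messages for 0.0.0.0:0
2010-05-27 12:38:38 Flushed 26 stale messages for 0.0.0.0:0
2010-05-27 12:38:38 hobbitd_client: Got message 107484, expected 107399
"""

GAP_RE = re.compile(r"Got message (\d+), expected (\d+)")
FLUSH_RE = re.compile(r"Flushed (\d+) stale messages")


def summarize(lines):
    """Return (gaps, flushed): sequence-number gaps and flushed-message counts."""
    gaps, flushed = [], []
    for line in lines:
        match = GAP_RE.search(line)
        if match:
            # Gap = received sequence number minus the expected one.
            gaps.append(int(match.group(1)) - int(match.group(2)))
            continue
        match = FLUSH_RE.search(line)
        if match:
            flushed.append(int(match.group(1)))
    return gaps, flushed


gaps, flushed = summarize(SAMPLE.splitlines())
print("sequence gaps:", gaps)
print("total skipped:", sum(gaps))
print("total flushed:", sum(flushed))
```

On this excerpt the gaps are 17, 50 and 85. The flush counts just before the last two gaps add up to exactly those gaps (7 + 43 and 16 + 43 + 26), which would fit a worker that is falling behind and having its backlog discarded. To run it over the real log, pass `open("/var/log/xymon/clientdata.log")` to `summarize` instead of the sample.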

 
The hobbitd page shows:

Statistics for Hobbit daemon
Up since 26-May-2010 14:55:38 (0 days, 22:00:02)
 
Incoming messages      :   16721697
- status               :   11146704
- combo                :    1122112

 
Incoming messages/sec  :        216 (average last 300 seconds)
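
As a rough cross-check of these figures, the sketch below derives the average message rate since startup and projects it to the planned client count. It assumes 4300 clients (the post only says "over 4300") and that load scales linearly with the number of clients, which may not hold in practice.

```python
# Figures copied from the hobbitd statistics above.
incoming_total = 16_721_697
uptime_s = 22 * 3600 + 0 * 60 + 2   # "0 days, 22:00:02"
recent_rate = 216                    # messages/sec, average over last 300 seconds

clients_now = 4300     # assumption: the post says "over 4300"
clients_final = 7200   # planned total from the post

# Average rate over the whole uptime, for comparison with the 300 s average.
avg_rate = incoming_total / uptime_s

# Per-client rate, then a linear projection to the final client count.
per_client_rate = recent_rate / clients_now
projected_rate = per_client_rate * clients_final

print(f"average since startup: {avg_rate:.1f} msg/s")
print(f"per client:            {per_client_rate:.3f} msg/s")
print(f"projected at {clients_final}:     {projected_rate:.0f} msg/s")
```

The long-run average comes out at roughly 211 messages/sec, close to the 216 reported for the last 300 seconds, so the load looks steady rather than bursty. Under the linear assumption, 7200 clients would mean roughly 360 messages/sec.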

 
The bbtest is taking 225 seconds to complete 

PING test completed (4390 hosts)            5005592.360761    203.397989
TIME TOTAL                                                    225.391564
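
The timing above can be turned into a quick estimate of how the ping phase would grow with the host count. This sketch assumes ping time scales linearly with the number of hosts and that network tests run on a 300-second cycle; the cycle length is an assumption, not stated in the post.

```python
# Durations (seconds) copied from the bbtest timing output above.
ping_s = 203.397989
total_s = 225.391564
hosts_now = 4390
hosts_final = 7200   # planned total from the post
interval_s = 300     # assumption: a 5-minute network test cycle

# Share of the whole run spent in the ping phase.
ping_share = ping_s / total_s

# Linear projection of the ping phase to the final host count.
projected_ping_s = ping_s * hosts_final / hosts_now

print(f"ping share of run:  {ping_share:.0%}")
print(f"projected ping time: {projected_ping_s:.0f} s")
print("exceeds test cycle: ", projected_ping_s > interval_s)
```

Ping accounts for about 90% of the run. If the linear assumption holds, the ping phase alone would take about 334 seconds at 7200 hosts, which is longer than a 300-second cycle.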

 
Hoping the group-mind can help me out,

Thanks, 

Tom

     
Thomas Brand 

Disclaimer: 1) all opinions are my own, 2) I may be completely wrong, 3)
my advice is worth at least as much as what you are paying for it, or
your money cheerfully refunded.

CONFIDENTIALITY NOTICE: This communication and any attachments may
contain confidential and/or privileged information for the use of the
designated recipients named above.  If you are not the intended
recipient, you are hereby notified that you have received this
communication in error and that any review, disclosure, dissemination,
distribution or copying of it or its contents is prohibited.  If you
have received this communication in error, please notify the sender
immediately by telephone and destroy all copies of this communication
and any attachments.
list Josh Luthman · Thu, 27 May 2010 17:45:00 -0400 ·
What kind of hardware are you on now?

-- 

Josh Luthman
list Olivier Beau · Fri, 28 May 2010 10:23:33 +0200 ·
Hi,
> Some of my current symptoms:
>
> The ‘top’ command on the server is indicating high I/O load (at times %iowait > 60%).

Try changing the cache time of hobbitd_rrd from 30 minutes to 1 hour.
It's hardcoded in do_rrd.c, so change
#define CACHESZ 6
to
#define CACHESZ 12
and rebuild.
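
The 30-minute and 1-hour figures follow if CACHESZ counts cached updates per RRD file and updates arrive every 5 minutes. Both points are assumptions here, inferred from the numbers in this message rather than checked against do_rrd.c. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
UPDATE_INTERVAL_MIN = 5  # assumption: one update per RRD file every 5 minutes


def cache_window_minutes(cachesz, interval_min=UPDATE_INTERVAL_MIN):
    """Minutes of updates held in memory before they are written out."""
    return cachesz * interval_min


def flushes_per_hour(cachesz, interval_min=UPDATE_INTERVAL_MIN):
    """How often each RRD file is written per hour under that assumption."""
    return 60 / cache_window_minutes(cachesz, interval_min)


for cachesz in (6, 12):
    print(
        f"CACHESZ={cachesz}: window {cache_window_minutes(cachesz)} min, "
        f"{flushes_per_hour(cachesz):.0f} write(s) per file per hour"
    )
```

Under these assumptions, doubling CACHESZ halves the number of write bursts per RRD file, at the cost of more memory and more unwritten data lost if hobbitd_rrd stops unexpectedly.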

> The ‘procs’ column warns/alerts about missing processes; closer examination shows that the client data received has been truncated, usually somewhere in the ‘ps’ output section.

Try putting higher values for MAXLINE and MAXMSG* in hobbitserver.cfg.

> The bb2.html page sometimes shows many ‘purple’ clients; the clients in purple change (e.g., a client goes purple for a while, 5-30+ minutes, then we get another message processed and the client goes green again).

Are there any logs on the client itself, e.g. showing that it could not report to your Hobbit server?


Olivier