Loadbalancing Hobbit Server

Scott Walters
Mon, 19 Feb 2007 23:36:08 -0500
Message-Id: <user-ec24ef9b2ce8@xymon.invalid>

On 2/19/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

Hi Scott,

you're always asking interesting questions :-)


Thanks.  I changed majors to Philosophy since returning to college to finish
my undergraduate degree.  I'm glad to see it's paying off ;)

Some tasks can run on any of the available servers. E.g. analyzing the
client data can be done on any server running the hobbitd_client module;
so it doesn't matter which of the available "client task" servers is
invoked. (Obviously, the config files must be kept in sync on the
servers, but that's why we have tools like rsync).
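As a rough illustration of the rsync approach mentioned above, the following sketch builds (but does not run) one rsync command per server. The directory /etc/hobbit, the server names, and the helper names are assumptions made up for this example; they are not taken from Hobbit itself. The flags used (-a, --delete, --dry-run) are standard rsync options.

```python
def build_rsync_command(config_dir, target_host, target_dir, dry_run=True):
    """Build an rsync command that mirrors config_dir onto target_host.

    All paths and host names are hypothetical; adjust to the real layout.
    """
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        # Only report what would change; nothing is copied or deleted.
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    cmd.append(config_dir.rstrip("/") + "/")
    cmd.append(f"{target_host}:{target_dir.rstrip('/')}/")
    return cmd


def commands_for_servers(config_dir, servers, target_dir, dry_run=True):
    """One command per server that should receive the config files."""
    return [
        build_rsync_command(config_dir, server, target_dir, dry_run)
        for server in servers
    ]


if __name__ == "__main__":
    # Hypothetical "client task" servers.
    for cmd in commands_for_servers("/etc/hobbit", ["serverb", "serverc"], "/etc/hobbit"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run() once the dry-run output looks right. Note that --delete removes files on the target that no longer exist on the source, so it is worth keeping dry_run=True until the paths have been checked.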


Hmmm...since you already have a "task master" it might be convenient to make
it the "config master" as well.  Similar to the hobbit-client.cfg?


Some tasks store data - e.g. the RRD files. Those tasks can run on
multiple servers, BUT: For any given host, there will be only one server
holding the data. It's no good feeding the RRD updates to server A at
10:00 AM, and server B at 10:05 - because that would break the RRD data.
So if the RRD files for "www.foo.com" live on server A, and that server
crashes, then you will lose access to the RRD files for www.foo.com -
but RRD files for hosts on the other servers will still be available.
History logs are handled like RRD files. Now, you can argue that it
would be nice if you could replicate the RRD- or history-updates to
multiple servers so you would have a complete failover where you
wouldn't lose access to some of the data. If there are enough requests
it can be added - there's nothing in the design that prevents it. But
perhaps it would just be simpler to mirror those files between the
servers at regular intervals through some other program.
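The "only one server holds the data for a given host" rule above can be sketched as a deterministic mapping from hostname to server, so that every update for www.foo.com lands on the same machine. This is only an illustration of the idea, not how Hobbit assigns hosts; the function name and the server names are hypothetical.

```python
import hashlib


def server_for_host(hostname, servers):
    """Return the single server responsible for this host's RRD/history data.

    The same hostname always maps to the same entry as long as the
    server list is unchanged (hypothetical helper, not a Hobbit API).
    """
    if not servers:
        raise ValueError("no servers configured")
    # Hash the lower-cased name so "WWW.FOO.COM" and "www.foo.com" agree.
    digest = hashlib.sha256(hostname.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    servers = ["serverA", "serverB"]  # hypothetical data-storing servers
    for host in ["www.foo.com", "mail.foo.com", "db.foo.com"]:
        print(host, "->", server_for_host(host, servers))
```

A simple modulo mapping like this reshuffles hosts whenever the server list changes, which would split a host's RRD data across machines. A static host-to-server table in a config file avoids that, at the cost of manual upkeep.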


Yes, that's the million dollar question:  Should HA with integrity of
RRD/history files be part of Hobbit?  Even if you do put the
history-updates to multiple servers, you still have the nightmare of how to
sync things up when the "dead" server comes back up.

Those are my ideas. Feedback is very welcome from anyone; this is a
relatively new area for me to be working with (at least from a
programmer perspective), so any input will be appreciated.

Because of the complexity of HA solutions and data integrity, I am not sure
the hobbit code is the right place for the logic.  Similar to the database
backend, you'll open yourself up to a lot of potential debugging.  I am a
keep-it-simple-stupid kinda guy, and I am reminded of a saying, "A man with
one watch always knows what time it is."

I'd rather see the hobbit tool improve monitoring, reports, and other
features that really matter.  Let the HA happen outside of hobbit.

I also believe you should only cluster/load-balance when one box can't do
the job.  Introducing those complexities to increase availability is
usually counterproductive -- you end up taking your system down because it's
so hard to configure/maintain.  And then it usually doesn't work anyway when
it's supposed to.

Scott Walters
-PacketPusher