Xymon Mailing List Archive

dr for hobbit

Phil Wild
Mon, 19 May 2008 16:26:21 +0800
Message-Id: <user-f5cce70e02a1@xymon.invalid>

Hi Ian,

I have contemplated this approach...

The two hobbit installations are about 15 km apart. Although I could
cluster between two servers over this distance, it is not my preference.

My biggest issue with a clustered solution is that there is one copy of the
data (albeit usually mirrored). If something goes wrong with the data (e.g.
a mistaken rm bb-hosts), it instantly affects both sites. Recovery
then requires restoration via tape/snapshot etc. If it is via snapshot, then
I have to roll back to the last snapshot (which may be acceptable depending
on the technology being used).

I want my DR copy to be as close to production as possible but without any
shared infrastructure that may allow the poisoning of both services. A warm
standby seems to be a good approach and from my research, it seems quite
feasible.

Cheers

Phil

2008/5/19 Iain Miller <user-6f16c60bcd50@xymon.invalid>:
Hi Phil,

I know you said you don't want to use a HA/clustering solution, but I
have a similar situation to yourself and I use a HA solution with
heartbeat/drbd and, to be honest, it saves me a load of hassle.  OK, the
failover happens automatically and I don't know that it has (which I'd
argue is how I want it) but all the rrd files are kept in sync and all
maintenance settings get maintained across the two servers.  Plus I
don't need to recall which server was down and which server I need to
rsync from and to - the DRBD resource maintains all that for me and I just
worry about configuring hobbit.  Plus as hobbit is only running on the
active server, it's the only one sending out alerts.

I can give you more details on my configuration if you are interested.

Cheers,

Iain.

2008/5/19 Phil Wild <user-e365c1418192@xymon.invalid>:
Hi all,

I am redesigning the method we use for performing a failover to a
disaster recovery installation of hobbit. I am interested in opinions
on the approach and any shortcomings.

Note: This is not HA/clustering, it is for DR purposes.

We are aiming to have:

a production hobbit deployment
a DR hobbit deployment

Clients will be configured to send metrics to both servers, which will
keep historical rrd data up to date etc.

The production server will be configured to send out alerts. The dr
server will not.

At regular intervals, rsync will be used to synchronise data from the
production server to the dr server, including the in-memory checkpoint
file.
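
The periodic sync described above could be sketched roughly as follows.
This is only an illustration in Python wrapping rsync: the host name
and directory paths are made up and would need to match the real
installation, and it assumes passwordless ssh from production to the dr
server.

```python
import subprocess

# Hypothetical values for illustration only; adjust to the real installation.
DR_HOST = "dr-hobbit.example.com"
SYNC_DIRS = ["/var/lib/hobbit/data", "/var/lib/hobbit/tmp"]


def build_rsync_command(src_dir, dr_host, delete=False, dry_run=False):
    """Return the rsync argument list that mirrors src_dir to the same path on dr_host."""
    cmd = ["rsync", "-a"]
    if delete:
        # Propagates deletions to the dr copy as well, so use with care.
        cmd.append("--delete")
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory contents,
    # not the directory itself.
    src = src_dir.rstrip("/") + "/"
    cmd += [src, f"{dr_host}:{src}"]
    return cmd


def sync_to_dr(dirs=SYNC_DIRS, dr_host=DR_HOST, delete=False, dry_run=False):
    """Run one rsync per directory; raises CalledProcessError if any run fails."""
    for d in dirs:
        subprocess.run(build_rsync_command(d, dr_host, delete, dry_run), check=True)
```

Deletion is left off by default here because rsync --delete would carry
an accidental removal on production across to the dr copy at the next
run. Calling sync_to_dr(dry_run=True) first shows what would be
transferred without changing anything.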

In the event of a DR failover, the dr hobbit server will be promoted to
active by restarting hobbit and loading the checkpoint and alert
configurations.

I am expecting that this will ensure that the dr server will be "up to
date" with production as per the last checkpoint. This includes tests
that have been disabled or acknowledged.
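
Since the dr server is only as current as the last checkpoint it
received, it may be worth checking how old that file is before
promoting. A rough Python sketch, assuming the age of the synced copy
can be judged from its modification time (the path and the threshold in
the usage note are hypothetical):

```python
import os
import time


def checkpoint_age_seconds(path, now=None):
    """Return how many seconds ago the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_fresh(path, max_age_seconds, now=None):
    """True if the checkpoint file was modified within max_age_seconds."""
    return checkpoint_age_seconds(path, now) <= max_age_seconds
```

For example, is_fresh("/var/lib/hobbit/tmp/hobbitd.chk", 900) would
report whether the copy is less than 15 minutes old. Note that rsync -a
preserves the source modification time, so this reflects when
production last wrote the checkpoint rather than when it was copied.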

Prior to failback to the production hobbit installation, the reverse of
the above would be performed.
An rsync of rrd data files would be performed to cover any windows
where one of the servers was offline for a period of time.

Is there anything wrong with this approach?

Cheers

Phil


--
Tel: XXXX XXX XXX
Fax: XXXX XXX XXX
email: user-e365c1418192@xymon.invalid

--
Iain Miller
user-6f16c60bcd50@xymon.invalid
