dr for hobbit

8 messages in this thread

list Phil Wild · Mon, 19 May 2008 14:39:11 +0800 ·

Hi all,

I am redesigning the method we use for performing a failover to a disaster
recovery installation of hobbit. I am interested in opinions on the approach
and any shortcomings.

Note: This is not HA/clustering, it is for DR purposes.

We are aiming to have:

a production hobbit deployment
a DR hobbit deployment

Clients will be configured to send metrics to both servers, which will keep
the historical rrd data etc. up to date on both.

The production server will be configured to send out alerts. The DR server
will not.

At regular intervals, rsync will be used to synchronise data from the
production server to the DR server, including the checkpoint file that
hobbitd writes from its in-memory state.
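As a rough illustration of the periodic sync step, here is a minimal Python sketch that builds and runs one rsync command per item to mirror. The DR host name and both paths are hypothetical placeholders, not hobbit defaults; it assumes passwordless ssh from production to the DR host and is meant to be called from a cron-driven script on the production server.

```python
import subprocess
from typing import List

# Hypothetical paths: substitute wherever your install keeps bb-hosts
# and the hobbitd checkpoint file.
SYNC_PATHS = [
    "/home/hobbit/server/etc/bb-hosts",
    "/home/hobbit/server/tmp/hobbitd.chk",
]


def build_sync_commands(dr_host: str, paths: List[str]) -> List[List[str]]:
    """Return one rsync argv per path, copying it to the same path on dr_host."""
    # -a (archive) preserves permissions and modification times.
    return [["rsync", "-a", path, f"{dr_host}:{path}"] for path in paths]


def run_sync(dr_host: str, paths: List[str] = SYNC_PATHS) -> None:
    """Run each rsync in turn; check=True raises CalledProcessError on failure."""
    for argv in build_sync_commands(dr_host, paths):
        subprocess.run(argv, check=True)
```

For example, run_sync("dr-hobbit.example.invalid") would push both files to the same paths on that (hypothetical) host.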

In the event of a disaster, the DR hobbit server will be promoted to active by
restarting hobbit so that it loads the synced checkpoint and the alert
configuration.
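Before promoting the DR server, it is worth confirming that the synced checkpoint is actually recent. Below is a minimal Python sketch of such a pre-flight check; the path and the maximum acceptable age are assumptions you would tune. As far as we know, hobbitd reads a checkpoint at startup through its --restart=FILENAME option (set in hobbitlaunch.cfg in a stock install), but check the hobbitd man page for your version.

```python
import os
import time
from typing import Optional


def checkpoint_is_fresh(path: str, max_age_seconds: float,
                        now: Optional[float] = None) -> bool:
    """True if the checkpoint file exists and its mtime is at most
    max_age_seconds old. rsync -a preserves mtimes, so on the DR host this
    reflects when production wrote the file, not when it was copied."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Missing or unreadable checkpoint: do not promote blindly.
        return False
    return (now - mtime) <= max_age_seconds
```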

I expect this to leave the DR server "up to date" with production as of the
last checkpoint, including tests that have been disabled or acknowledged.
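How far behind can "as of the last checkpoint" be? A back-of-the-envelope bound: a change (say an acknowledgement) made just after a checkpoint is written waits up to one checkpoint interval to reach the file, then up to one rsync interval plus the transfer time to reach the DR host. A tiny Python sketch of that sum; the figures in the example are illustrative, not measured.

```python
def worst_case_staleness(checkpoint_interval: float, rsync_interval: float,
                         rsync_duration: float) -> float:
    """Upper bound, in seconds, on how old the DR copy of hobbitd state can be."""
    # Wait for the next checkpoint write, then for the next rsync run to finish.
    return checkpoint_interval + rsync_interval + rsync_duration
```

With a 600 s checkpoint interval, rsync every 900 s and a 60 s transfer, worst_case_staleness(600, 900, 60) gives 1560 seconds, i.e. up to 26 minutes of acknowledgements or disables that could be missing on the DR side.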

Before failing back to the production hobbit installation, the reverse of the
above would be performed. An rsync of the rrd data files would also be
performed to cover any window during which one of the servers was offline.
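For the failback rsync of rrd files, one option is to copy only files that are missing or older on the receiving side, similar in spirit to rsync's --update flag. The Python sketch below assumes you have gathered a {relative path: mtime} listing from each server (rrd_mtimes shows one way to build it locally); both function names are hypothetical.

```python
import os
from typing import Dict, List


def rrd_mtimes(rrd_dir: str) -> Dict[str, float]:
    """Map each .rrd file under rrd_dir (as a relative path) to its mtime."""
    result: Dict[str, float] = {}
    for root, _dirs, files in os.walk(rrd_dir):
        for name in files:
            if name.endswith(".rrd"):
                full = os.path.join(root, name)
                result[os.path.relpath(full, rrd_dir)] = os.path.getmtime(full)
    return result


def files_to_copy(source: Dict[str, float], dest: Dict[str, float]) -> List[str]:
    """Files on source that are missing on dest or have an older mtime there."""
    return sorted(path for path, mtime in source.items()
                  if path not in dest or dest[path] < mtime)
```

Keep in mind that copying a whole rrd file replaces the receiver's copy, including any samples that only the receiver had.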

Is there anything wrong with this approach?

Cheers

Phil


-- 
Tel: XXXX XXX XXX
Fax: XXXX XXX XXX
email: user-e365c1418192@xymon.invalid

list Iain Miller · Mon, 19 May 2008 09:03:04 +0100 ·

Hi Phil,

I know you said you don't want to use an HA/clustering solution, but I
have a similar situation to yours and I use an HA solution with
heartbeat/drbd, and to be honest it saves me a load of hassle. OK, it
fails over automatically and I don't know that it has (which I'd argue
is how I want it), but all the rrd files are kept in sync and all
maintenance settings are preserved across the two servers. Plus I
don't need to remember which server was down and which direction I need
to rsync in: the DRBD resource tracks all that for me and I just worry
about configuring hobbit. And since hobbit only runs on the active
server, it's the only one sending out alerts.

I can give you more details on my configuration if you are interested.

Cheers,

Iain.

--


Iain Miller
user-6f16c60bcd50@xymon.invalid

list Phil Wild · Mon, 19 May 2008 16:26:21 +0800 ·

Hi Iain,

I have contemplated this approach...

The two hobbit installations are about 15 km apart. Although I could
cluster two servers over this distance, it is not my preference.

My biggest issue with a clustered solution is that there is only one copy of
the data (albeit usually mirrored). If something goes wrong with the data
(e.g. a mistake such as rm bb-hosts), it instantly happens at both sites.
Recovery then requires a restore from tape, snapshot etc. If it is via
snapshot, I have to roll back to the last snapshot (which may be acceptable
depending on the technology being used).

I want my DR copy to be as close to production as possible, but without any
shared infrastructure that could poison both services at once. A warm
standby seems a good approach, and from my research it looks quite
feasible.
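One caveat with periodic rsync: it avoids shared storage, but a bad edit to bb-hosts (say, accidentally truncating it) would still be copied to the standby on the next run. Here is a minimal Python sketch of a pre-sync guard that refuses to mirror a file which is missing, empty, or has shrunk sharply; the function name and the 50% threshold are arbitrary assumptions.

```python
import os


def safe_to_sync(src_file: str, dest_file: str, min_ratio: float = 0.5) -> bool:
    """Refuse to mirror src_file if it is missing, empty, or smaller than
    min_ratio times the size of the copy already on the standby."""
    if not os.path.isfile(src_file) or os.path.getsize(src_file) == 0:
        return False
    if not os.path.isfile(dest_file):
        return True  # nothing on the standby yet to protect
    return os.path.getsize(src_file) >= min_ratio * os.path.getsize(dest_file)
```

A size check is crude, but it is cheap and catches the "file suddenly almost empty" case before it reaches the standby.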

Cheers

Phil

list Christoph Berg · Mon, 19 May 2008 12:44:26 +0200 ·

Re: Phil Wild 2008-05-19 <user-2852918975f3@xymon.invalid>

▸ quoted from Phil Wild

[...]

clients will be configured to send metrics to both servers. which will keep
historical rrd data up to date etc.

Hi,

Just a note on that one: we are running a setup like that too, plus a
third host for alerting. The problem we have is that if one of the
hosts in BBDISPLAYS is down, clients have trouble sending data to any
server because of timeouts. Sometimes notifications lag by as much as
half an hour.

I'm not saying your setup won't work, but you should keep in mind that
you might effectively degrade availability/responsiveness because of
the use of multiple servers.

Christoph
-- 
user-92157dbc91bf@xymon.invalid | http://www.df7cb.de/

list Stewart L · Mon, 19 May 2008 07:02:55 -0400 ·

I'd like to hear about your setup...


On Mon, May 19, 2008 at 4:03 AM, Iain Miller <user-6f16c60bcd50@xymon.invalid> wrote:

▸ quoted from Iain Miller

I can give you more details on my configuration if you are interested.

--


Stewart

The revolution will not be televised.
The revolution will be no re-run brothers;
The revolution will be live.

list Anna Jonna Armannsdottir · Mon, 19 May 2008 11:07:38 +0000 ·

▸ quoted from Iain Miller

On Mon, 2008-05-19 at 09:03 +0100, Iain Miller wrote:

but all the rrd files are kept in sync and all
maintenance settings get maintained across the two servers.

I would say that keeping the rrd files in sync is the toughest task.
How do you merge two rrd files that each have different holes in their data?
I am using a solution where the idea is to eliminate single points of failure. All clients send data to both hobbit servers, both servers store rrd data, and both servers run an http service.
Eliminating single points of failure is a simple method that gives sufficient uptime for my university, without resorting to more advanced methods.
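To make the question about holes concrete: if both servers' samples are aligned on the same time slots, filling one series' gaps from the other is simple. The Python sketch below is a toy model on plain lists, not a reader for real rrd files; with actual files you would first have to get at the samples (e.g. via rrdtool dump and rrdtool restore) and also deal with consolidated archives at different resolutions, which this ignores.

```python
from typing import List, Optional

Sample = Optional[float]  # None marks a hole (an "unknown" sample)


def merge_series(primary: List[Sample], secondary: List[Sample]) -> List[Sample]:
    """Merge two series aligned on the same time slots: keep primary's value,
    fill its holes from secondary. Slots missing in both stay None."""
    if len(primary) != len(secondary):
        raise ValueError("series must cover the same time slots")
    return [p if p is not None else s for p, s in zip(primary, secondary)]
```

For example, merge_series([1.0, None, 3.0, None], [1.1, 2.0, None, None]) returns [1.0, 2.0, 3.0, None]; the last slot stays a hole because neither server has it.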
-- 
Kindest Regards, Anna Jonna Ármannsdóttir,       %&   A: Because people read from top to bottom.
Unix System Aministration, Computing Services,   %&   Q: Why is top posting bad?
University of Iceland.

list Anna Jonna Armannsdottir · Mon, 19 May 2008 11:30:25 +0000 ·

▸ quoted from Christoph Berg

On Mon, 2008-05-19 at 12:44 +0200, Christoph Berg wrote:

clients will be configured to send metrics to both servers. which
will keep
historical rrd data up to date etc.

Hi,

just a note on that one, we are running a setup like that too, plus a
third host for alerting. The problem we have is that if one of the
hosts in BBDISPLAYS is down, clients will have trouble sending in data
to any server, because of timeouts. Sometimes notifications lag for as
much as half an hour.

So the clients contact the servers one after another. The clients could be changed to contact the servers in parallel, so that if one connection times out, it does not hold up the others.
This behaviour probably depends on the client itself. BBWin, the Big Brother
client and the Hobbit client probably behave differently.
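A minimal Python sketch of the parallel idea: open one connection per server concurrently, so a dead server only costs its own timeout instead of delaying the others. It assumes hobbit's usual TCP port 1984 and simply writes the message text and closes the connection; the server names and message are hypothetical, and real clients such as the bb utility do more than this.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Server = Tuple[str, int]


def send_one(server: Server, message: str, timeout: float) -> bool:
    """Connect, send the whole message, close. False on any socket error."""
    try:
        with socket.create_connection(server, timeout=timeout) as sock:
            sock.sendall(message.encode())
        return True
    except OSError:  # includes refused connections and timeouts
        return False


def send_to_all(servers: List[Server], message: str,
                timeout: float = 5.0) -> Dict[Server, bool]:
    """Send to every server concurrently; map each server to success/failure."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        results = pool.map(lambda s: send_one(s, message, timeout), servers)
        return dict(zip(servers, results))
```

For example, send_to_all([("hobbit1.example.invalid", 1984), ("hobbit2.example.invalid", 1984)], "status myhost.cpu green OK") returns a dict with True or False per server, and the total wait is roughly the slowest single attempt rather than the sum.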


list Paul Krash · Mon, 19 May 2008 06:32:26 -0500 ·

Overcome this by checking whether a BBDISPLAY server is down, then
editing the client config file and bouncing the hobbit service.

No delays/responsiveness issues.

Much as hobbit triggers an email or SNPP event to notify a user when
something goes down or comes back up, one could notify clients that a
BBDISPLAY is down/up and have them act accordingly.
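The workaround above could be sketched in Python like this: probe each BBDISPLAY with a TCP connect, then rewrite the BBDISPLAYS line with only the reachable hosts. It assumes a shell-style BBDISPLAYS="host1 host2" line in the client config and port 1984; restarting the client afterwards is left out, and you would want to skip the rewrite entirely if no server is reachable.

```python
import re
import socket
from typing import List


def is_reachable(host: str, port: int = 1984, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reachable_displays(hosts: List[str], port: int = 1984) -> List[str]:
    """Subset of hosts that currently accept connections on port."""
    return [h for h in hosts if is_reachable(h, port)]


def rewrite_bbdisplays(config_text: str, displays: List[str]) -> str:
    """Replace the value of any BBDISPLAYS=... line with the given hosts."""
    new_line = 'BBDISPLAYS="' + " ".join(displays) + '"'
    # A function replacement stops re from interpreting backslashes in new_line.
    return re.sub(r"^BBDISPLAYS=.*$", lambda _m: new_line, config_text,
                  flags=re.MULTILINE)
```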

Best,

PKrash

