Some thoughts on clustered hobbit

10 messages in this thread

list Tom Kauffman · Mon, 9 May 2005 16:19:43 -0500 ·

First, let me express my thanks to Brian for putting this document
together and allowing Henrik to distribute it! I've a lot of experience
with IBM's HACMP for AIX, and getting a clustered configuration working
as desired is not a trivial procedure.

Henrik -- check me on this: it's my impression we no longer need a
'BBPAGER' entry on the client-side bb-hosts because the hobbit server
passes all potentially alertable statuses to hobbit-alert and it decides
if an alert is really required.

Brian -- no offense, but I would rather categorise your configuration as
"active/inactive". I'm looking at doing an "active/passive" cluster when
time frees up -- about a month from now. The difference? I'm running two
hobbit/apache instances all the time -- but the 'passive' (fallover)
side is not doing alerting or network tests. It does build displays
(it's my technical documentation server as well) and it does keep both
history and rrd data updated. Both hosts show up on the client side as
'BBDISPLAY'. On failover it will take over the IP address for the hobbit
display and re-launch hobbit with network testing and alerting enabled.

Depending on host count and test count, this might be a bad idea -- but
we've only got about 300 entries in bb-hosts. 

So -- thanks ever so much, again, for providing this -- it will make my
life ever so much easier next month when I get the time to automate the
failover environment.

Tom

Tom Kauffman
NIBCO, Inc

list Henrik Størner · Mon, 9 May 2005 23:47:29 +0200 ·

▸ quoted from Tom Kauffman

On Mon, May 09, 2005 at 04:19:43PM -0500, Kauffman, Tom wrote:

Henrik -- check me on this: it's my impression we no longer need a
'BBPAGER' entry on the client-side bb-hosts because the hobbit server
passes all potentially alertable statuses to hobbit-alert and it decides
if an alert is really required.

Correct.

In fact it does make sense to remove it, because then the client will
not initiate a connection to the server to send the "page" message -
which hobbitd promptly just discards.

▸ quoted from Tom Kauffman

Brian -- no offense, but I would rather categorise your configuration as
"active/inactive". I'm looking at doing an "active/passive" cluster when
time frees up -- about a month from now. The difference? I'm running two
hobbit/apache instances all the time -- but the 'passive' (fallover)
side is not doing alerting or network tests. It does build displays
(it's my technical documentation server as well) and it does keep both
history and rrd data updated. Both hosts show up on the client side as
'BBDISPLAY'. On failover it will take over the IP address for the hobbit
display and re-launch hobbit with network testing and alerting enabled.

That would certainly be interesting for me as well - it's the kind of
setup I plan to implement when I get time.


Regards,
Henrik

list Andy France · Tue, 10 May 2005 12:22:48 +1200 ·


Hi All,

I am adding a custom graph to track Oracle tablespace usage.  I currently
have an "oracle" test which shows the overall green/yellow/red status of
tablespaces based on percent full thresholds.

I want to append graphs showing each tablespace's size/used stats for a
better view of historical growth, but I don't want the raw data showing in
the HTML page.

I seem to recall using a <data> type tag in the message text but can't find
a good example.  I'm comfortable with post-processing the message to
generate the actual RRDs and graphs but would appreciate some help on the
message side!

TIA,
Andy.

#####################################################################################

This email is intended for the person to whom it is addressed
only. If you are not the intended recipient, do not read, copy
or use the contents in any way. The opinions expressed may not
necessarily reflect those of ZESPRI Group of Companies ('ZESPRI').

While every effort has been made to verify the information
contained herein, ZESPRI does not make any representations 
as to the accuracy of the information or to the performance
of any data, information or the products mentioned herein.
ZESPRI will not accept liability for any losses, damage or
consequence, however, resulting directly or indirectly from
the use of this e-mail/attachments.
#####################################################################################

list Al Jeffcoat · Mon, 9 May 2005 20:41:37 -0400 ·

Did I miss something?  I don't see anything regarding a clustered hobbit
setup in the last couple of days, but would be greatly interested in
clustering 2 AIX servers together across locations to handle hobbit.

Al Jeffcoat
IBM Certified Support Specialist, AIX
Enterprise SAN and Storage Administrator
System Programmer II
(321)843-1051
user-b34a8ad6e24c@xymon.invalid


This e-mail message and any attached files are confidential and are intended solely for the use of the addressee(s) named above. If you are not the intended recipient, any review, use, or distribution of this e-mail message and any attached files is strictly prohibited. This communication may contain material protected by Federal privacy regulations, attorney-client work product, or other privileges. If you have received this confidential communication in error, please notify the sender immediately by reply e-mail message and permanently delete the original message.  To reply to our email administrator directly, send an email to:  user-ecde3bbc361d@xymon.invalid .  If this e-mail message concerns a contract matter, be advised that no employee or agent is authorized to conclude any binding agreement on behalf of Orlando Regional Healthcare by e-mail without express written confirmation by an officer of the corporation. Any views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of Orlando Regional Healthcare.

list Andy France · Tue, 10 May 2005 15:12:14 +1200 ·

Andy France wrote on 10/05/2005 12:22:48:

▸ quoted from Andy France

Hi All,

I am adding a custom graph to track Oracle tablespace usage.  I currently
have an "oracle" test which shows the overall green/yellow/red status of
tablespaces based on percent full thresholds.

I want to append graphs showing each tablespace's size/used stats for a
better view of historical growth, but I don't want the raw data showing in
the HTML page.

I seem to recall using a <data> type tag in the message text but can't find
a good example.  I'm comfortable with post-processing the message to
generate the actual RRDs and graphs but would appreciate some help on the
message side!

TIA,
Andy.


I have made some headway on this - what I was looking for was a simple set
of HTML comment tags <!-- Comment -->

So now I have my data embedded in the message thus:

  <!--DATA
  oracle.ZP1.psapbtabd.rrd:134696864:2010520:1.49
  oracle.ZP1.psapbtabi.rrd:95354800:1822184:1.91
  oracle.ZP1.psapuser1i.rrd:2585600:282128:10.91
  oracle.ZP1.psapstabd.rrd:14479360:1598952:11.04
  oracle.ZP1.psapstabi.rrd:14520304:1872960:12.90
  oracle.ZP1.dbtotsize.rrd:317055808:46472960:14.66
  oracle.ZP1.psapes40bd.rrd:6184960:1728248:27.94
  oracle.ZP1.system.rrd:307200:108216:35.23
  oracle.ZP1.psapes40bi.rrd:2068480:799904:38.67
  oracle.ZP1.psapuser1d.rrd:2068480:887688:42.92
  oracle.ZP1.psapel40bd.rrd:2068480:1025088:49.56
  oracle.ZP1.psapddicd.rrd:512000:258168:50.42
  oracle.ZP1.psapclud.rrd:10342400:5982072:57.84
  oracle.ZP1.psapdocud.rrd:71680:44984:62.76
  oracle.ZP1.psappoold.rrd:4136960:2726784:65.91
  oracle.ZP1.psapdocui.rrd:71680:51568:71.94
  oracle.ZP1.psapddici.rrd:512000:380504:74.32
  oracle.ZP1.psappooli.rrd:4136960:3139896:75.90
  oracle.ZP1.psapsourcei.rrd:307200:243096:79.13
  oracle.ZP1.psapclui.rrd:2068480:1771992:85.67
  oracle.ZP1.psapsourced.rrd:307200:264672:86.16
  oracle.ZP1.psapproti.rrd:1024000:888880:86.81
  oracle.ZP1.psapprotd.rrd:4136960:3735416:90.29
  oracle.ZP1.psapel40bi.rrd:409600:373160:91.10
  oracle.ZP1.psaproll.rrd:8273920:8065888:97.49
  oracle.ZP1.psaploadd.rrd:102400:102288:99.89
  oracle.ZP1.psaploadi.rrd:102400:102288:99.89
  oracle.ZP1.psaptemp.rrd:6205440:6205416:100.00
  -->

It doesn't show up on the web page, and I can happily parse it with my
custom rrd handler script (I really must get to grips with moving it to
C...).
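
As an illustration of the handler side, here is a minimal Python sketch that
pulls the DATA comment out of a status message and splits each line into its
fields. The field meanings (total size, used, percent used) are inferred from
the sample above, and the names parse_data_block and TablespaceSample are
hypothetical; this is not the handler script mentioned above.

```python
import re
from typing import List, NamedTuple


class TablespaceSample(NamedTuple):
    rrd_file: str    # e.g. "oracle.ZP1.system.rrd"
    size: int        # assumed: total tablespace size
    used: int        # assumed: used space, same unit as size
    pct_used: float  # assumed: percent used, as reported


# Everything between "<!--DATA" and the closing "-->", across lines.
DATA_BLOCK = re.compile(r"<!--DATA(.*?)-->", re.DOTALL)


def parse_data_block(message: str) -> List[TablespaceSample]:
    """Return one sample per non-empty line of the embedded DATA comment."""
    match = DATA_BLOCK.search(message)
    if match is None:
        return []
    samples = []
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last three colons are separators.
        name, size, used, pct = line.rsplit(":", 3)
        samples.append(TablespaceSample(name, int(size), int(used), float(pct)))
    return samples
```

Each returned sample carries the rrd file name, so a handler could loop over
the list and update one RRD per tablespace.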

But now I'm having trouble with the hobbitgraph.cfg entry.  What I want is
a series of stacked graphs, one for each rrd file, showing used space over
total space.  I can't figure out how to specify multiple sources with
FNPATTERN while asking rrdgraph to create a separate graph per file, rather
than trying to combine them into one graph as is done for disk etc.

I can't see any useful switches in the rrdgraph man page.  Can anyone help
with this?


TIA,
Andy.


list Henrik Størner · Tue, 10 May 2005 07:21:36 +0200 ·

Brian Lynch wrote up a description of how to cluster two Hobbit
servers - it's in the "contrib" directory in the hobbit-4.0.3rc2
sources.

▸ quoted from Al Jeffcoat


On Mon, May 09, 2005 at 08:41:37PM -0400, Jeffcoat, Al wrote:

Did I miss something?  I don't see anything regarding a clustered hobbit
setup in the last couple of days, but would be greatly interested in
clustering 2 AIX servers together across locations to handle hobbit.

Henrik

list Brian Lynch · Tue, 10 May 2005 00:21:35 -0700 ·

▸ quoted from Al Jeffcoat

On 5/9/05, Kauffman, Tom <user-3feba9e60a8b@xymon.invalid> wrote:

First, let me express my thanks to Brian for putting this document
together and allowing Henrik to distribute it! I've a lot of experience
with IBM's HACMP for AIX, and getting a clustered configuration working
as desired is not a trivial procedure.

Henrik -- check me on this: it's my impression we no longer need a
'BBPAGER' entry on the client-side bb-hosts because the hobbit server
passes all potentially alertable statuses to hobbit-alert and it decides
if an alert is really required.

Brian -- no offense, but I would rather categorise your configuration as
"active/inactive". I'm looking at doing an "active/passive" cluster when
time frees up -- about a month from now. The difference? I'm running two
hobbit/apache instances all the time -- but the 'passive' (fallover)
side is not doing alerting or network tests. It does build displays
(it's my technical documentation server as well) and it does keep both
history and rrd data updated. Both hosts show up on the client side as
'BBDISPLAY'. On failover it will take over the IP address for the hobbit
display and re-launch hobbit with network testing and alerting enabled.


I agree with your assessment, but chose the model for a few reasons
(note that I'm basing my experience on about 2 1/2 years running a dual
big brother failover setup):

1. There is always one repository for both configuration and data that are
kept reasonably identical on both systems (within the synch delay).
2. There is only one ip address accepting BB reports cutting down on
both network traffic and firewall rules (for hosts in locked down vlans).
3. The other system can be dedicated to another purpose (it currently
hosts our documentation site that fails over in the opposite direction).
4. No redundant work is done. Indeed, no load is being 'shared' across
the systems unless you host the web server on the other box.

There is a risk to this based on the possibility of complete machine failure
in between synchronizations. Hence, Hobbit may come up without all
the updates for hosts or alerts. Based on my current model, I will lose
about a day of historical data. These synch rates can be changed and
a gigabit crossover between machines cuts down on any traffic imposed
by multiple synch's.

Note that you could very easily turn off the hobbit alerts with the same
clustering software by truncating and restoring the hobbit-alerts.cfg file.
Not sure how to disable the network tests, so that may require some
custom coding... Once complete, you could use the same cluster
resource sw to accomplish a 'hot' standby.
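
The truncate-and-restore idea could look roughly like the following Python
sketch. The ".saved" backup suffix and both function names are assumptions
made for illustration. How the cluster software would call them, and whether
hobbit-alerts.cfg is re-read without a restart, is not covered here.

```python
import shutil
from pathlib import Path


def _backup_path(cfg: Path) -> Path:
    # Hypothetical naming: keep the saved copy next to the live file.
    return cfg.with_name(cfg.name + ".saved")


def disable_alerts(cfg: Path) -> None:
    """Save the alert rules, then truncate the live file to empty."""
    backup = _backup_path(cfg)
    # Only save once, so a repeated call cannot overwrite the saved
    # rules with an already truncated file.
    if not backup.exists():
        shutil.copyfile(cfg, backup)
    cfg.write_text("")


def restore_alerts(cfg: Path) -> None:
    """Put the saved alert rules back, if a saved copy exists."""
    backup = _backup_path(cfg)
    if backup.exists():
        shutil.copyfile(backup, cfg)
        backup.unlink()
```

If the config files are also synchronized from the other node, the sync job
would need to skip or re-truncate this file, otherwise it would undo the
change.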


list John Turner · Tue, 10 May 2005 07:35:10 -0400 ·

▸ quoted from Brian Lynch

On May 10, 2005, at 3:21 AM, Brian Lynch wrote:

On 5/9/05, Kauffman, Tom <user-3feba9e60a8b@xymon.invalid> wrote:

First, let me express my thanks to Brian for putting this document
together and allowing Henrik to distribute it! I've a lot of experience
with IBM's HACMP for AIX, and getting a clustered configuration working
as desired is not a trivial procedure.

Henrik -- check me on this: it's my impression we no longer need a
'BBPAGER' entry on the client-side bb-hosts because the hobbit server
passes all potentially alertable statuses to hobbit-alert and it decides
if an alert is really required.

Brian -- no offense, but I would rather categorise your configuration as
"active/inactive". I'm looking at doing an "active/passive" cluster when
time frees up -- about a month from now. The difference? I'm running two
hobbit/apache instances all the time -- but the 'passive' (fallover)
side is not doing alerting or network tests. It does build displays
(it's my technical documentation server as well) and it does keep both
history and rrd data updated. Both hosts show up on the client side as
'BBDISPLAY'. On failover it will take over the IP address for the hobbit
display and re-launch hobbit with network testing and alerting enabled.

Software exists to do this active/passive; a search for HA or FAILOVER will turn up several options, and the Linux-HA project is one example.

The biggest issue in my view is keeping the systems in sync, which is often done with shared storage.

▸ quoted from Brian Lynch

I agree with your assessment, but chose the model for a few reasons
(note that I'm basing my experience on about 2 1/2 years running a dual
big brother failover setup):

1. There is always one repository for both configuration and data that are
kept reasonably identical on both systems (within the synch delay).
2. There is only one ip address accepting BB reports cutting down on
both network traffic and firewall rules (for hosts in locked down vlans).
3. The other system can be dedicated to another purpose (it currently
hosts our documentation site that fails over in the opposite direction).
4. No redundant work is done. Indeed, no load is being 'shared' across
the systems unless you host the web server on the other box.

There is a risk to this based on the possibility of complete machine failure
in between synchronizations. Hence, Hobbit may come up without all
the updates for hosts or alerts. Based on my current model, I will lose
about a day of historical data. These synch rates can be changed and
a gigabit crossover between machines cuts down on any traffic imposed
by multiple synch's.

Note that you could very easily turn off the hobbit alerts with the same
clustering software by truncating and restoring the hobbit-alerts.cfg file.
Not sure how to disable the network tests, so that may require some
custom coding... Once complete, you could use the same cluster
resource sw to accomplish a 'hot' standby.

So I think the only issue you have with this method is that you don't want the extra network load. If you are willing to require a crossover cable, and that is a requirement for most HA solutions, then one solution might be to add the idea of a BACKUP_BBDISPLAY. The BBDISPLAY server would forward all incoming messages to the BACKUP, and if you have the private network cable it would not cause any load on your network.

You would also need a way of telling the Hobbit software on the system that it is the BACKUP server.

John

list Al Jeffcoat · Tue, 10 May 2005 10:24:15 -0400 ·

Aha...  I haven't had the pleasure of downloading and testing 4.0.3rc2,
so that explains it :)

▸ signature


Al Jeffcoat
IBM Certified Support Specialist, AIX
Enterprise SAN and Storage Administrator
System Programmer II
(321)843-1051
user-b34a8ad6e24c@xymon.invalid

list Tom Kauffman · Wed, 11 May 2005 09:59:04 -0500 ·

I'm planning to have (on the failover system) two hobbitlaunch.cfg files
- a hobbitlaunch.stby and a hobbitlaunch.run. The .stby file has bbpage
and bbnet disabled, while .run does not. At failover time, stop hobbit,
copy the .run to .cfg, and start hobbit. At recovery time, stop hobbit,
copy .stby to .cfg, start hobbit.
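
The file swap described above can be sketched in Python as follows. The
function name select_launch_config, the role names "active" and "standby",
and the temporary file name are assumptions for illustration; stopping and
starting hobbit around the swap is left to whatever the cluster software
uses and is not shown.

```python
import shutil
from pathlib import Path


def select_launch_config(cfg_dir: Path, role: str) -> Path:
    """Copy hobbitlaunch.run or hobbitlaunch.stby over hobbitlaunch.cfg.

    role is "active" (alerting and network tests enabled, the .run file)
    or "standby" (both disabled, the .stby file).
    """
    suffix = {"active": "run", "standby": "stby"}[role]
    source = cfg_dir / ("hobbitlaunch." + suffix)
    target = cfg_dir / "hobbitlaunch.cfg"
    # Copy to a temporary name first and rename afterwards, so that
    # hobbitlaunch.cfg is never left half-written if the copy fails.
    tmp = cfg_dir / "hobbitlaunch.cfg.tmp"
    shutil.copyfile(source, tmp)
    tmp.replace(target)
    return target
```

At failover the sequence would be: stop hobbit, call
select_launch_config(cfg_dir, "active"), start hobbit. At recovery the same
call is made with "standby".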

 
I'm already syncing the config files from the primary to the standby
whenever changes occur (I'm the only one making changes, so far). What I
*don't* have in place is a dedicated IP address; right now, if the BB
server (now primary hobbit server) goes down, you need to know which
system is running the fallover display. And I'm trying to decide if I
want to sync the history and the rrds BACK to the primary on recovery,
or just leave a hole in the data. Right now, I'm leaning toward doing
this manually after the fact.

 
Tom

