Some thoughts on clustered hobbit
list Tom Kauffman
First, let me express my thanks to Brian for putting this document together and allowing Henrik to distribute it! I've a lot of experience with IBM's HACMP for AIX, and getting a clustered configuration working as desired is not a trivial procedure.
Henrik -- check me on this: it's my impression we no longer need a 'BBPAGER' entry on the client-side bb-hosts because the hobbit server passes all potentially alertable statuses to hobbit-alert and it decides if an alert is really required.
Brian -- no offense, but I would rather categorise your configuration as "active/inactive". I'm looking at doing an "active/passive" cluster when time frees up -- about a month from now. The difference? I'm running two hobbit/apache instances all the time -- but the 'passive' (fallover) side is not doing alerting or network tests. It does build displays (it's my technical documentation server as well) and it does keep both history and rrd data updated. Both hosts show up on the client side as 'BBDISPLAY'. On failover it will take over the IP address for the hobbit display and re-launch hobbit with network testing and alerting enabled. Depending on host count and test count, this might be a bad idea -- but we've only got about 300 entries in bb-hosts.
So -- thanks ever so much, again, for providing this -- it will make my life ever so much easier next month when I get the time to automate the failover environment.
Tom
Tom Kauffman
NIBCO, Inc
list Henrik Størner
▸
On Mon, May 09, 2005 at 04:19:43PM -0500, Kauffman, Tom wrote:
Henrik -- check me on this: it's my impression we no longer need a 'BBPAGER' entry on the client-side bb-hosts because the hobbit server passes all potentially alertable statuses to hobbit-alert and it decides if an alert is really required.
Correct. In fact it does make sense to remove it, because then the client will not initiate a connection to the server to send the "page" message - which hobbitd promptly just discards.
▸
Brian -- no offense, but I would rather categorise your configuration as "active/inactive". I'm looking at doing an "active/passive" cluster when time frees up -- about a month from now. The difference? I'm running two hobbit/apache instances all the time -- but the 'passive' (fallover) side is not doing alerting or network tests. It does build displays (it's my technical documentation server as well) and it does keep both history and rrd data updated. Both hosts show up on the client side as 'BBDISPLAY'. On failover it will take over the IP address for the hobbit display and re-launch hobbit with network testing and alerting enabled.
That would certainly be interesting for me as well - it's the kind of setup I plan to implement when I get time.
Regards,
Henrik
list Andy France
Hi All,
I am adding a custom graph to track Oracle tablespace usage. I currently
have an "oracle" test which shows the overall green/yellow/red status of
tablespaces based on percent full thresholds.
I want to append graphs showing each tablespace's size/used stats for a
better view of historical growth, but I don't want the raw data showing in
the HTML page.
I seem to recall using a <data> type tag in the message text but can't find
a good example. I'm comfortable with post-processing the message to
generate the actual RRDs and graphs but would appreciate some help on the
message side!
TIA,
Andy.
#####################################################################################
This email is intended for the person to whom it is addressed
only. If you are not the intended recipient, do not read, copy
or use the contents in any way. The opinions expressed may not
necessarily reflect those of ZESPRI Group of Companies ('ZESPRI').
While every effort has been made to verify the information
contained herein, ZESPRI does not make any representations
as to the accuracy of the information or to the performance
of any data, information or the products mentioned herein.
ZESPRI will not accept liability for any losses, damage or
consequence, however, resulting directly or indirectly from
the use of this e-mail/attachments.
#####################################################################################
list Al Jeffcoat
Did I miss something? I don't see anything regarding a clustered hobbit setup in the last couple of days, but would be greatly interested in clustering 2 AIX servers together across locations to handle hobbit.
Al Jeffcoat
IBM Certified Support Specialist, AIX
Enterprise SAN and Storage Administrator
System Programmer II
(321)843-1051
user-b34a8ad6e24c@xymon.invalid
▸
-----Original Message-----
From: Kauffman, Tom [mailto:user-3feba9e60a8b@xymon.invalid]
Sent: Monday, May 09, 2005 5:20 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Some thoughts on clustered hobbit
This e-mail message and any attached files are confidential and are intended solely for the use of the addressee(s) named above. If you are not the intended recipient, any review, use, or distribution of this e-mail message and any attached files is strictly prohibited. This communication may contain material protected by Federal privacy regulations, attorney-client work product, or other privileges. If you have received this confidential communication in error, please notify the sender immediately by reply e-mail message and permanently delete the original message. To reply to our email administrator directly, send an email to: user-ecde3bbc361d@xymon.invalid . If this e-mail message concerns a contract matter, be advised that no employee or agent is authorized to conclude any binding agreement on behalf of Orlando Regional Healthcare by e-mail without express written confirmation by an officer of the corporation. Any views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of Orlando Regional Healthcare.
list Andy France
Andy France wrote on 10/05/2005 12:22:48:
▸
Hi All,
I am adding a custom graph to track Oracle tablespace usage. I currently have an "oracle" test which shows the overall green/yellow/red status of tablespaces based on percent full thresholds.
I want to append graphs showing each tablespace's size/used stats for a better view of historical growth, but I don't want the raw data showing in the HTML page.
I seem to recall using a <data> type tag in the message text but can't find a good example. I'm comfortable with post-processing the message to generate the actual RRDs and graphs but would appreciate some help on the message side!
TIA, Andy.
I have made some headway on this - what I was looking for was a simple set
of HTML comment tags <!-- Comment -->
So now I have my data embedded in the message thus:
<!--DATA
oracle.ZP1.psapbtabd.rrd:134696864:2010520:1.49
oracle.ZP1.psapbtabi.rrd:95354800:1822184:1.91
oracle.ZP1.psapuser1i.rrd:2585600:282128:10.91
oracle.ZP1.psapstabd.rrd:14479360:1598952:11.04
oracle.ZP1.psapstabi.rrd:14520304:1872960:12.90
oracle.ZP1.dbtotsize.rrd:317055808:46472960:14.66
oracle.ZP1.psapes40bd.rrd:6184960:1728248:27.94
oracle.ZP1.system.rrd:307200:108216:35.23
oracle.ZP1.psapes40bi.rrd:2068480:799904:38.67
oracle.ZP1.psapuser1d.rrd:2068480:887688:42.92
oracle.ZP1.psapel40bd.rrd:2068480:1025088:49.56
oracle.ZP1.psapddicd.rrd:512000:258168:50.42
oracle.ZP1.psapclud.rrd:10342400:5982072:57.84
oracle.ZP1.psapdocud.rrd:71680:44984:62.76
oracle.ZP1.psappoold.rrd:4136960:2726784:65.91
oracle.ZP1.psapdocui.rrd:71680:51568:71.94
oracle.ZP1.psapddici.rrd:512000:380504:74.32
oracle.ZP1.psappooli.rrd:4136960:3139896:75.90
oracle.ZP1.psapsourcei.rrd:307200:243096:79.13
oracle.ZP1.psapclui.rrd:2068480:1771992:85.67
oracle.ZP1.psapsourced.rrd:307200:264672:86.16
oracle.ZP1.psapproti.rrd:1024000:888880:86.81
oracle.ZP1.psapprotd.rrd:4136960:3735416:90.29
oracle.ZP1.psapel40bi.rrd:409600:373160:91.10
oracle.ZP1.psaproll.rrd:8273920:8065888:97.49
oracle.ZP1.psaploadd.rrd:102400:102288:99.89
oracle.ZP1.psaploadi.rrd:102400:102288:99.89
oracle.ZP1.psaptemp.rrd:6205440:6205416:100.00
-->
It doesn't show up on the web page, and I can happily parse it with my
custom rrd handler script (I really must get to grips with moving it to
C...).
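For anyone wanting to do the same, here is a minimal Python sketch of pulling the hidden block back out of a status message. It assumes the four colon-separated fields are the RRD file name, total size, used space and percent used, which is inferred from the sample values above rather than stated; the names `TablespaceSample` and `parse_data_block` are made up for this illustration.

```python
import re
from typing import List, NamedTuple


class TablespaceSample(NamedTuple):
    rrd_file: str     # e.g. "oracle.ZP1.system.rrd"
    size: int         # second field; assumed to be the total size
    used: int         # third field; assumed to be the used space
    pct_used: float   # fourth field; assumed to be percent used


# Matches "<!--DATA", the payload lines, then the closing "-->".
DATA_BLOCK = re.compile(r"<!--DATA\s*\n(.*?)-->", re.DOTALL)


def parse_data_block(message: str) -> List[TablespaceSample]:
    """Return the samples hidden in the <!--DATA ... --> comment, if any."""
    match = DATA_BLOCK.search(message)
    if match is None:
        return []
    samples = []
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right: the file name contains dots but no colons.
        rrd_file, size, used, pct = line.rsplit(":", 3)
        samples.append(
            TablespaceSample(rrd_file, int(size), int(used), float(pct))
        )
    return samples


if __name__ == "__main__":
    example = (
        "status text shown on the web page\n"
        "<!--DATA\n"
        "oracle.ZP1.system.rrd:307200:108216:35.23\n"
        "oracle.ZP1.psaptemp.rrd:6205440:6205416:100.00\n"
        "-->\n"
    )
    for sample in parse_data_block(example):
        print(sample)
```

Each returned sample could then be handed to whatever creates or updates the matching RRD file; that part is left out because it depends on the handler script.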
But now I'm having trouble with the hobbitgraph.cfg entry. What I want is
a series of stacked graphs, one for each rrd file, showing used space over
total space. But I can't figure out how to specify multiple sources with
FNPATTERN, but ask rrdgraph to create a unique graph per file rather than
trying to combine them like disk etc.
I can't see any useful switches in the rrdgraph man page. Can anyone help
with this?
TIA,
Andy.
list Henrik Størner
Brian Lynch wrote up a description of how to cluster two Hobbit servers - it's in the "contrib" directory in the hobbit-4.0.3rc2 sources.
▸
On Mon, May 09, 2005 at 08:41:37PM -0400, Jeffcoat, Al wrote:
Did I miss something? I don't see anything regarding a clustered hobbit setup in the last couple of days, but would be greatly interested in clustering 2 AIX servers together across locations to handle hobbit.
Henrik
list Brian Lynch
▸
On 5/9/05, Kauffman, Tom <user-3feba9e60a8b@xymon.invalid> wrote:
First, let me express my thanks to Brian for putting this document together and allowing Henrik to distribute it! I've a lot of experience with IBM's HACMP for AIX, and getting a clustered configuration working as desired is not a trivial procedure. Henrik -- check me on this: it's my impression we no longer need a 'BBPAGER' entry on the client-side bb-hosts because the hobbit server passes all potentially alertable statuses to hobbit-alert and it decides if an alert is really required. Brian -- no offense, but I would rather categorise your configuration as "active/inactive". I'm looking at doing an "active/passive" cluster when time frees up -- about a month from now. The difference? I'm running two hobbit/apache instances all the time -- but the 'passive' (fallover) side is not doing alerting or network tests. It does build displays (it's my technical documentation server as well) and it does keep both history and rrd data updated. Both hosts show up on the client side as 'BBDISPLAY'. On failover it will take over the IP address for the hobbit display and re-launch hobbit with network testing and alerting enabled.
I agree with your assessment, but chose the model for a few reasons (note that I'm basing my experience on about 2 1/2 years running a dual Big Brother failover setup):
1. There is always one repository for both configuration and data, kept reasonably identical on both systems (within the sync delay).
2. There is only one IP address accepting BB reports, cutting down on both network traffic and firewall rules (for hosts in locked-down VLANs).
3. The other system can be dedicated to another purpose (it currently hosts our documentation site, which fails over in the opposite direction).
4. No redundant work is done. Indeed, no load is being 'shared' across the systems unless you host the web server on the other box.
There is a risk to this, based on the possibility of complete machine failure in between synchronizations: Hobbit may come up without all the updates for hosts or alerts. Based on my current model, I will lose about a day of historical data. These sync rates can be changed, and a gigabit crossover between machines cuts down on any traffic imposed by multiple syncs.
Note that you could very easily turn off the hobbit alerts with the same clustering software by truncating and restoring the hobbit-alerts.cfg file. I'm not sure how to disable the network tests, so that may require some custom coding... Once complete, you could use the same cluster resource software to accomplish a 'hot' standby.
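A minimal Python sketch of the truncate-and-restore idea, written as two functions a cluster resource script might call. The function names and the location of the saved copy are assumptions for illustration; both paths are passed in by the caller because the real file location depends on the installation.

```python
import shutil
from pathlib import Path


def disable_alerts(cfg: Path, saved: Path) -> None:
    """Save the live alert rules, then truncate the file to zero length."""
    if cfg.stat().st_size > 0:
        # Only save a non-empty file, so calling this twice in a row
        # cannot overwrite the saved rules with an empty copy.
        shutil.copy2(cfg, saved)
    cfg.write_text("")  # truncated file, as suggested in the post above


def enable_alerts(cfg: Path, saved: Path) -> None:
    """Put the previously saved alert rules back in place."""
    shutil.copy2(saved, cfg)
```

The standby node would call `disable_alerts` when it gives up the active role and `enable_alerts` when it takes over. Whether the running alert module picks up the change without a restart is not covered here.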
▸
Depending on host count and test count, this might be a bad idea -- but we've only got about 300 entries in bb-hosts. So -- thanks ever so much, again, for providing this -- it will make my life ever so much easier next month when I get the time to automate the failover environment. Tom Tom Kauffman NIBCO, Inc
list John Turner
▸
On May 10, 2005, at 3:21 AM, Brian Lynch wrote:
On 5/9/05, Kauffman, Tom <user-3feba9e60a8b@xymon.invalid> wrote: First, let me express my thanks to Brian for putting this document together and allowing Henrik to distribute it! I've a lot of experience with IBM's HACMP for AIX, and getting a clustered configuration working as desired is not a trivial procedure. Henrik -- check me on this: it's my impression we no longer need a 'BBPAGER' entry on the client-side bb-hosts because the hobbit server passes all potentially alertable statuses to hobbit-alert and it decides if an alert is really required. Brian -- no offense, but I would rather categorise your configuration as "active/inactive". I'm looking at doing an "active/passive" cluster when time frees up -- about a month from now. The difference? I'm running two hobbit/apache instances all the time -- but the 'passive' (fallover) side is not doing alerting or network tests. It does build displays (it's my technical documentation server as well) and it does keep both history and rrd data updated. Both hosts show up on the client side as 'BBDISPLAY'. On failover it will take over the IP address for the hobbit display and re-launch hobbit with network testing and alerting enabled.
Software exists to do this active/passive; a search for HA or FAILOVER will turn some up, and the Linux-HA project is one example. The biggest issue, in my view, is keeping the systems in sync, which is often done with shared storage.
▸
I agree with your assessment, but chose the model for a few reasons (note that I'm basing my experience on about 2 1/2 years running a dual big brother failover setup): 1. There is always one repository for both configuration and data that are kept reasonably identical on both systems (within the synch delay). 2. There is only one ip address accepting BB reports cutting down on both network traffic and firewall rules (for hosts in locked down vlans). 3. The other system can be dedicated to another purpose (it currently hosts our documentation site that fails over in the opposite direction). 4. No redundant work is done. Indeed, no load is being 'shared' across the systems unless you host the web server on the other box. There is a risk to this based on the possibility of complete machine failure in between synchronizations. Hence, Hobbit may come up without all the updates for hosts or alerts. Based on my current model, I will lose about a day of historical data. These synch rates can be changed and a gigabit crossover between machines cuts down on any traffic imposed by multiple synch's. Note that you could very easily turn off the hobbit alerts with the same
clustering software by truncating and restoring the hobbit-alerts.cfg file.
Not sure how to disable the network tests, so that may require some
custom coding... Once complete, you could use the same cluster
resource sw to accomplish a 'hot' standby.
So I think the only issue you have with this method is that you don't want the extra network load. If you are willing to require a crossover cable, and that is a requirement for most HA solutions, then one solution might be to add the idea of a BACKUP_BBDISPLAY. The BBDISPLAY server would forward all incoming messages to the BACKUP, and if you have the private network cable it would not cause any load on your network. You would also need a way of telling the Hobbit software on the system that it is the BACKUP server.
John
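To make the forwarding idea concrete, here is a rough Python sketch. BACKUP_BBDISPLAY is only a proposal in this thread, not an existing Hobbit setting, and the sketch assumes the Big Brother style of sending one plain-text message per TCP connection. The function names and the `Target` alias are hypothetical.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]  # (host, port) of a backup display server


def send_message(target: Target, message: str, timeout: float = 5.0) -> None:
    """Send one plain-text message over TCP, then close the connection."""
    with socket.create_connection(target, timeout=timeout) as conn:
        conn.sendall(message.encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)  # tell the receiver the message is done


def forward_to_backups(
    message: str,
    backups: Iterable[Target],
    send: Callable[[Target, str], None] = send_message,
) -> List[Target]:
    """Relay a message to every backup and return the targets that failed."""
    failed = []
    for target in backups:
        try:
            send(target, message)
        except OSError:
            # A dead backup should not stop the primary from doing its job.
            failed.append(target)
    return failed
```

The `send` parameter is there so the relay logic can be exercised without a network. In real use the backup address would be the one on the private crossover link.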
list Al Jeffcoat
Aha... I haven't had the pleasure of downloading and testing 4.0.3rc2, so that explains it :)
Al Jeffcoat
IBM Certified Support Specialist, AIX
Enterprise SAN and Storage Administrator
System Programmer II
(321)843-1051
user-b34a8ad6e24c@xymon.invalid
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Tuesday, May 10, 2005 1:22 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Some thoughts on clustered hobbit
list Tom Kauffman
I'm planning to have (on the failover system) two hobbitlaunch.cfg files - a hobbitlaunch.stby and a hobbitlaunch.run. The .stby file has bbpage and bbnet disabled, while .run does not. At failover time, stop hobbit, copy the .run to .cfg, and start hobbit. At recovery time, stop hobbit, copy .stby to .cfg, start hobbit.
I'm already syncing the config files from the primary to the standby whenever changes occur (I'm the only one making changes, so far).
What I *don't* have in place is a dedicated IP address; right now, if the BB server (now primary hobbit server) goes down, you need to know which system is running the fallover display.
And I'm trying to decide if I want to sync the history and the rrds BACK to the primary on recovery, or just leave a hole in the data. Right now, I'm leaning toward doing this manually after the fact.
Tom
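The stop, copy, start sequence described above could be scripted roughly as follows. This is a sketch only: `switch_role` and `run_command` are made-up names, and the stop and start commands are supplied by the caller because they depend on how hobbit is launched on a given system.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def run_command(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def switch_role(
    config_dir: Path,
    role: str,  # "run" at failover time, "stby" at recovery time
    stop_cmd: Sequence[str],
    start_cmd: Sequence[str],
    run: Callable[[Sequence[str]], None] = run_command,
) -> None:
    """Stop hobbit, install hobbitlaunch.<role> as hobbitlaunch.cfg, restart."""
    if role not in ("run", "stby"):
        raise ValueError(f"unknown role: {role!r}")
    source = config_dir / f"hobbitlaunch.{role}"
    if not source.is_file():
        # Check before stopping anything, so a missing file
        # does not leave hobbit down.
        raise FileNotFoundError(source)
    run(stop_cmd)
    shutil.copyfile(source, config_dir / "hobbitlaunch.cfg")
    run(start_cmd)
```

A cluster script would call `switch_role(config_dir, "run", stop_cmd, start_cmd)` when taking over and the same with `"stby"` on recovery, passing whatever commands stop and start hobbit locally.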
▸
-----Original Message-----
From: Brian Lynch [mailto:user-0420823115a8@xymon.invalid]
Sent: Tuesday, May 10, 2005 2:22 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Some thoughts on clustered hobbit