big brother replacement
list Joe Sloan
Hello list,
It's that time of year again - we're looking for alternatives to our aging bb infrastructure. Although it's been helped by the bbgen extensions, it is showing its age, and is getting harder to support as time goes by.
Of all the potential replacements we've looked at, I don't really like any of them - the commercial bb stuff is uninspiring, and their linux support is lacking. The other solutions tend to be heavyweight j2ee and database apps, or oddities like nagios. What I'd really love to find is something like an up-to-date version of big brother+bbgen, something like hobbit.
Unfortunately, last I checked, hobbit still lacked a crucial capability that we depend on, the built-in bb failover mechanism. We have 2 data centers, several hundred miles apart, with bb servers in several lans at both sites. Each bb server has a twin at the other location, and they both monitor the servers in both data centers, but only one of the bb servers does reporting, as determined by the failover state. The bb failover has worked marvelously, and has kept bb firmly in place so far, despite the other advantages of hobbit.
So, the $64 question: Is there anything in hobbit, or on the horizon, which will allow hobbit to serve as a drop-in replacement for bb, including the failover capability?
Thanks for your words of wisdom.
Joe
list Tod Hansmann
Let me see if I understand. You have several bb servers at one datacenter, each with their twin at the other datacenter, and both sets do the tests. They report to one central display server, but only one set reports at a time, depending on failover state, correct? Is this failover automatic? If so, how is this failover determined? What if this failover has a false positive? If not, what is your timeframe to swap over? Tod Hansmann Network Engineer
-----Original Message-----
From: Sloan [mailto:user-b1d2c84d244b@xymon.invalid]
Sent: Thursday, November 01, 2007 4:20 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] big brother replacement
list Josh Luthman
I'm not entirely sure what you mean when you reference the failover capability. Could you please explain how this works? I'm interested in knowing how the hostname reflects to what IP addresses, hardware running what software specifically, etc. Coming from BB1.9btf I don't know of many expansions between 1.9 and 3.3. We had some discussion about multiple servers and redundancy just a short while ago: http://www.hswn.dk/hobbiton/2007/10/msg00423.html
--
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX
Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer
list Joe Sloan
Tod Hansmann wrote:
Let me see if I understand. You have several bb servers at one datacenter, each with their twin at the other datacenter, and both sets do the tests. They report to one central display server, but only one set reports at a time, depending on failover state, correct?
You have the basic idea, but there is no single central server, just pairs of bb servers, one to a data center, in each lan which is being monitored. For each pair of bb servers, only the server at data center A does reporting, unless the server in data center B cannot reach the server in data center A, in which case the server in data center B will take over the reporting duties until the bb server in data center A becomes reachable again. While this could theoretically lead to a split brain condition, the failover condition has only ever triggered when there was a wan outage.
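The rule described here can be sketched in a few lines of shell. This is only an illustration of the behaviour as described, not the Big Brother failover script itself; the function names, the `PROBE_CMD` override and the address `192.0.2.10` are hypothetical, and a single `ping` stands in for whatever reachability test bb actually performs.

```shell
#!/bin/sh
# Sketch only: should the standby (data center B) take over reporting?
# PRIMARY is a placeholder address for the bb server in data center A.
PRIMARY="${PRIMARY:-192.0.2.10}"

# Reachability probe. Set PROBE_CMD to try the logic without a network.
primary_reachable() {
    if [ -n "$PROBE_CMD" ]; then
        $PROBE_CMD "$1"
    else
        ping -c 1 "$1" >/dev/null 2>&1
    fi
}

# Prints "standby" while the primary answers, "active" otherwise.
standby_role() {
    if primary_reachable "$1"; then
        echo standby
    else
        echo active
    fi
}
```

Calling `standby_role "$PRIMARY"` once per bb cycle would give the one-cycle takeover delay. Because server B only sees its own side of the link, a WAN outage leaves both servers reporting, which is the split-brain case mentioned in this message.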
Is this failover automatic? If so, how is this failover determined? What if this failover has a false positive? If not, what is your timeframe to swap over?
IIRC It takes one bb cycle to kick in. We've not seen a false positive, as I mentioned above. It's just the standard built-in bb failover - head ~bb/ext/failover follows:
#!/bin/sh
# failover
#
# BIG BROTHER - FAILOVER SCRIPT
# Sean MacGuire
#
# (c) Copyright Quest Software, Inc. 1997-2003 All rights reserved.
#
#
# failover WATCHES BBNET and BBPAGER
#
# IF BBNET OR BBPAGER BECOMES UNAVAILABLE, THEN TAKE OVER UNTIL THEY RETURN
#
# To use, just add failover to the BBEXT variable in etc/bbdef.sh
#
# To configure BBPAGER failover:
# define both the primary and failover machines as BBPAGERS in etc/bb-hosts
# and set bbwarn: FAILOVER in etc/bbwarnsetup.cfg
Joe
list Josh Luthman
I see what you're saying, but you still have to manually specify which server you're connecting to. If the bb1.domain.tld can not be reached the techs have to manually enter bb2.domain.tld - correct? I know of several BB ext scripts that work perfectly with Hobbit and even more that just needed a small tweak. Would you be able to post the entire ext script? Hopefully Henrik is willing to answer your $64 question =) Josh
list Joe Sloan
Josh Luthman wrote:
I'm not entirely sure what you mean when you reference the failover capability. Could you please explain how this works?
I'm basically just a user of the capability, how much detail do you want? It's just the standard failover built into bb.
I'm interested in knowing how the hostname reflects to what IP addresses, hardware running what software specifically, etc.
How the hostname reflects to what IP address? I'm not sure what you mean. There are no tricks here, just the standard dns naming scheme. I'm not sure what hardware has to do with it, but we're running SLES on HP/Compaq DL servers. I'm not sure what you mean by "what software" - you mean the OS, or the applications being monitored, or the exact version of bb?
Coming from BB1.9btf I don't know of many expansions between 1.9 and 3.3.
We're on 1.9 here, patched with bbgen to keep it going - AFAIK the bb code has basically languished since it was bought back in 2001 or so, so I'm curious about this version 3.3 that you speak of.
We had some discussion about multiple servers and redundancy just a short while ago: http://www.hswn.dk/hobbiton/2007/10/msg00423.html
Yes, those discussions look mostly like the typical ha requirements, e.g. managing bb failover via external proxies, redirectors etc, which adds a whole new layer of cost and complexity. It would be a hard sell to justify all the new ha infrastructure if we are replacing bb with something newer and better, since bb currently handles that all by itself, with no need of an external ha system. Joe
list Tod Hansmann
I'd be using Henrik's solution as follows, given your situation:
"I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.
"Config files are rsync'ed from the primary site to the disaster site regularly."
Though to be honest, this failover script may be something that can be converted for use in hobbit. You might be better off going with one of a dozen options that are slightly different from how you have it set up, but that's up to you. Hobbit doesn't have this built in, that's for sure. I would think it's fairly easy to use it to get much the same effect, though.
I'll wait for others' responses on your situation and throw my own thoughts back in tomorrow morning.
Tod Hansmann
Network Engineer
-----Original Message-----
From: Sloan [mailto:user-b1d2c84d244b@xymon.invalid]
Sent: Thursday, November 01, 2007 5:03 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] big brother replacement
list Tod Hansmann
I think he's just looking for the alerts. From what he's indicating, it doesn't look like he's too concerned about the display (unless he has a bunch of web pages up at once all the time).
Tod Hansmann
Network Engineer
<http://www.directpointe.com/>
From: Josh Luthman [mailto:user-4c45a83f15cb@xymon.invalid]
Sent: Thursday, November 01, 2007 5:12 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] big brother replacement
list Joe Sloan
Josh Luthman wrote:
I see what you're saying, but you still have to manually specify which server you're connecting to. If the bb1.domain.tld can not be reached the techs have to manually enter bb2.domain.tld - correct?
Well, no need to enter anything manually. We have on each big brother server a page with links to all the other bb servers, and I'm sure the support folks have links bookmarked, so if the DC1 data center is down, then instead of clicking on e.g. dc1bbdata, they'd click on dc2bbdata. Normally the 2 bb display servers in each pair provide an identical view, so this only matters in the event of an outage.
I know of several BB ext scripts that work perfectly with Hobbit and even more that just needed a small tweak. Would you be able to post the entire ext script? Hopefully Henrik is willing to answer your $64 question =)
Sure, I can post the entire script - it's in the attachment - Joe
list Josh Luthman
I see now - you've a redundant BBNET. I haven't used BB in several weeks and I never got really complex with it - a lot of ping tests was what I needed out of it. Can you explain what BBNET is?
list Joe Sloan
Well, bb being somewhat modular, there are 3 components: BBNET, BBPAGER, BBDISPLAY.
BBNET is the component of bb that tests the network services/connectivity on the remote hosts.
Joe
list Josh Luthman
That would be relative to Hobbit's bbtest I believe - someone correct me if I'm wrong, I'm just guessing here! I don't see any reason to think that script wouldn't work with some variable changes like BBNET=`$GREP BBNET $BBHOSTS | $GREP "^[0-9]" | $GREP -v "^\#"` but I am not an expert by any means!
Getting back to the version 3.3 - after 1.9btf Quest started selling the product. I don't know the exact history behind it but 1.9btf is what you get without paying for anything. I have worked with and continue to monitor a network with 3.1 or 3.2 (they decided to revert from 3.3 as it looks quite a bit different on the BBDISPLAY) and I honestly don't see what they've changed between 1.9 and 3.2. Most of the features of BB I heard of or read about were not only already in Hobbit but were even better than what I had heard. Not to mention the dozens of BB scripts that can be relatively painless to migrate.
Josh
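To make the suggested variable assignment concrete, here is a small sketch of what that grep pipeline selects. It assumes the usual bb-hosts layout of "IP hostname # tags"; the helper name `bbnet_entries` and the sample hosts are made up for illustration, and plain `grep` is used in place of `$GREP`.

```shell
#!/bin/sh
# Sketch: list the bb-hosts entries that carry the BBNET tag.
# Keeps lines that mention BBNET, start with a digit (an IP address)
# and are not comment lines.
bbnet_entries() {
    grep BBNET "$1" | grep '^[0-9]' | grep -v '^#'
}

# Demo with a made-up bb-hosts file (IPs and hostnames are placeholders).
demo_hosts=$(mktemp)
cat > "$demo_hosts" <<'EOF'
# BBNET servers are listed below
10.0.0.1  bb1.domain.tld  # BBDISPLAY BBNET BBPAGER
10.0.0.2  bb2.domain.tld  # BBNET
10.0.0.3  www.domain.tld  # http://www.domain.tld/
EOF
bbnet_entries "$demo_hosts"
rm -f "$demo_hosts"
```

The demo prints the bb1 and bb2 lines only. The last filter is redundant once the line must start with a digit, but it is kept to mirror the suggested command.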
list Joe Sloan
Josh Luthman wrote:
That would be relative to Hobbit's bbtest I believe - someone correct me if I'm wrong, I'm just guessing here! I don't see any reason to think that script wouldn't work with some variable changes like BBNET=`$GREP BBNET $BBHOSTS | $GREP "^[0-9]" | $GREP -v "^\#"` but I am not an expert by any means!
Well, depending on what I hear on this list in the next day or so,
taking a crack at adapting the old bb failover script may be my best
option.
Getting back to the version 3.3 - after 1.9btf Quest started selling the product. I don't know the exact history behind it but 1.9btf is what you get without paying for anything. I have worked with and continue to monitor a network with 3.1 or 3.2 (they decided to revert from 3.3 as it looks quite a bit different on the BBDISPLAY) and I honestly don't see what they've changed between 1.9 and 3.2. Most of the features of BB I heard of or read about were not only already in Hobbit but were even better than what I had heard. Not to mention the dozens of BB scripts that can be relatively painless to migrate.
Ah, interesting - I always had the feeling that Quest didn't do much of anything with the code except put in some verbiage and legal warnings, while pushing their own proprietary and non linux-friendly stuff and leaving the bb code base to slowly decay. Joe
list Josh Luthman
You think their lack of linux support is bad? Get a quote.
list Mark Deiss
For a vanilla BB environment, you can have multiple BBDISPLAY entities but the recommendation is that there is only one BBNET entity. A BBNET server that is generating the pings out to the clients will be sending the ping results to all of the BBDISPLAY entities (as defined on the BBNET host). If you have multiple BBNET entities that ping the same servers, you will be sending duplicated results as far as the individual BBDISPLAY servers are concerned (the connection messages will be renamed to the host being pinged).
Supporting multiple BBNETs without races requires additional coding to carefully direct the BBNET results so they do not trip over each other. The default behavior is to pump them out to whatever BBDISPLAY is listed - you get the race conditions when you want all the BBDISPLAY servers to monitor all of the BBNET hosts (i.e. you want the BBNETs to send their client-side tests to the BBDISPLAY entities - this will result in the BBNET poll messages going out to all the BBDISPLAY entities also).
It's doable; I hacked the heck out of the BB code base years ago to support three separate BBDISPLAY/BBNET servers that provided redundant monitoring over a client base and over each other. The goal in this case is that the BBNET directs its status messages only to a defined group of BBDISPLAY servers, which will then have only the one source of BBNET message traffic. The rest of the BBNET's tests would be allowed to go to a wider distribution of BBDISPLAY servers. These other tests would be keyed to the BBNET's server name so there would not be race or conflict conditions occurring on the BBDISPLAY servers.
The BBNET race condition may seem minor - but then think about what is going on with any RRD database entries - you would be updating them from all the BBNET entities in a given time window. Resulting trends can get really bizarre if the BBNET polls are originating from different network segments with different response times: bad times from one segment get masked in the trends by updates arriving from another segment, etc.
The main difference in the commercial version of BB over the BTF version is that they added support for encrypting the communications from the clients to the servers. I would place some value on that, as some sites are running external tests that are sending sensitive client information to the BBDISPLAY boxes. There may be a difference in the level of included documentation - but who reads the documentation anyways.....
Maybe Mr. Croteau and Mr. MacGuire will do an LBO and take BB private again.
-----Original Message-----
From: Sloan [mailto:user-b1d2c84d244b@xymon.invalid]
Sent: Thursday, November 01, 2007 8:05 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] big brother replacement
list Paul Williamson
The biggest problem I have with going to Hobbit is there is no snmp trap sending support. We don't use BB as our main interface for showing all alerts, but it is required and we do send snmp traps from the BBPAGER to our main dashboard of alerts. Is this functionality in Hobbit yet?
list Henrik Størner
Hi Joe,
▸
On Thu, Nov 01, 2007 at 03:20:12PM -0700, Sloan wrote:
So, the $64 question: Is there anything in hobbit, or on the horizon, which will allow hobbit to serve as a drop-in replacement for bb, including the failover capability?
The BB "failover" script does two things: It makes the network tests
run on the failover server if the primary BBNET server cannot be
pinged; and it enables alerts to be sent from the failover server
if there is no connection from the failover server to the primary
BBPAGER server.
The network-test failover is fairly simple to do. I've attached two
scripts here, both of which must run on the backup/standby/failover
server:
1) failover.sh - goes in ~hobbit/server/ext/
Add a section to hobbitlaunch.cfg with
    [failovercheck]
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD $BBHOME/ext/failover.sh 10.0.0.1 hobbitnet.mydom.com
"10.0.0.1" is the IP of your primary Hobbit server,
"hobbitnet.mydom.com" is the hostname (in the bb-hosts file) of the
primary network test machine.
This queries the primary Hobbit server for how long ago the network
tests were last updated. If that was more than 7 minutes ago, it deems
the primary network test node to be DOWN and flags this via the file
$BBTMP/primarynetDOWN. If the update was less than 7 minutes ago, it
removes the file.
This is then used by the other script, which replaces the CMD in the
"[bbnet]" section in hobbitlaunch.cfg.
2) failovernet.sh - goes in ~hobbit/server/ext/
When this runs to do the normal network tests, it will check for the
presence of the $BBTMP/primarynetDOWN file. If this file exists, it
picks up the IP of the primary Hobbit server from the file, and
modifies the settings to report data to both the normal (local)
Hobbit server, and to the primary server. If the file does not exist,
it will just run the network tests the normal way.
So to run this, modify the [bbnet] section in hobbitlaunch.cfg and
change the CMD setting to "$BBHOME/ext/failovernet.sh" (the same
ext/ directory as failover.sh).
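The attached scripts are not reproduced in this archive, so what follows is only a minimal sketch of the logic described above, not Henrik's actual failover.sh or failovernet.sh. The function names, the MAX_AGE variable, and the idea that the caller supplies the last-update time as epoch seconds are all assumptions made for illustration; how that timestamp is fetched from the primary Hobbit server is deliberately left out.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers illustrating the flag-file logic.
# Nothing here is taken from the real failover.sh / failovernet.sh.

MAX_AGE=420   # 7 minutes in seconds (threshold from the description)

# update_flag LAST_UPDATE NOW FLAGFILE PRIMARY_IP
# LAST_UPDATE and NOW are epoch seconds. Fetching LAST_UPDATE from the
# primary Hobbit server is assumed to happen in the caller.
update_flag() {
    last_update=$1; now=$2; flagfile=$3; primary_ip=$4
    age=$((now - last_update))
    if [ "$age" -gt "$MAX_AGE" ]; then
        # Primary network tester looks stale: record the primary server's IP.
        printf '%s\n' "$primary_ip" > "$flagfile"
    else
        # Primary is updating normally: clear the flag.
        rm -f "$flagfile"
    fi
}

# report_targets FLAGFILE LOCAL_IP
# Prints the server(s) the network test results should be reported to.
report_targets() {
    flagfile=$1; local_ip=$2
    if [ -f "$flagfile" ]; then
        # Failover active: report to the local server and to the primary.
        printf '%s %s\n' "$local_ip" "$(cat "$flagfile")"
    else
        # Normal operation: report to the local server only.
        printf '%s\n' "$local_ip"
    fi
}
```

In this sketch the first function plays the role of the [failovercheck] task, with the flag file at $BBTMP/primarynetDOWN, and the second is what a wrapper around the normal network test command would consult before deciding where to send results. Storing the primary's IP inside the flag file is what lets the second script stay free of hard-coded addresses.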
The alert failover is different, because Hobbit doesn't have a separate
BBPAGER server - alerts are sent from the same host that handles the
Hobbit data collection and webpages. A solution to this has been
implemented for the next release, where the alerting module can be
distributed onto multiple servers, but only one of them will send alerts
at any given time.
Regards,
Henrik
Attachments (2)
list Henrik Størner
▸
On Fri, Nov 02, 2007 at 08:30:38AM -0400, PAUL WILLIAMSON wrote:
The biggest problem I have with going to Hobbit is there is no snmp trap sending support. We don't use BB as our main interface for showing all alerts, but it is required and we do send snmp traps from the BBPAGER to our main dashboard of alerts. Is this functionality in Hobbit yet?
BB really just calls the "snmptrap" utility to send the trap message to your SNMP based console. A script to do the same from hobbit-alerts.cfg is a very simple solution to this. Henrik
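As a rough illustration of the kind of script Henrik describes, here is a minimal sketch that forwards an alert as an SNMPv2c trap using the net-snmp "snmptrap" utility. It is not an existing Hobbit script. The two OIDs are placeholders, the function name is made up, and it assumes the alert module exports BBHOSTSVC and BBCOLORLEVEL to alert scripts; check the hobbit-alerts.cfg documentation for your version before relying on those names.

```shell
#!/bin/sh
# Sketch only: hypothetical alert script that forwards a Hobbit alert as an
# SNMP trap. TRAP_OID and VAR_OID are placeholder OIDs, not real assignments.

TRAP_OID=${TRAP_OID:-1.3.6.1.4.1.99999.1}
VAR_OID=${VAR_OID:-1.3.6.1.4.1.99999.1.1}
SNMPTRAP=${SNMPTRAP:-snmptrap}   # override to test without sending anything

# send_trap MANAGER COMMUNITY
# Reads BBHOSTSVC (host.service) and BBCOLORLEVEL (status color) from the
# environment; both are assumed to be set by the alert module.
send_trap() {
    manager=$1; community=$2
    text="${BBHOSTSVC:-unknown} ${BBCOLORLEVEL:-unknown}"
    # net-snmp v2c form:
    #   snmptrap -v 2c -c COMMUNITY HOST UPTIME TRAP-OID [OID TYPE VALUE]...
    # An empty UPTIME argument lets snmptrap fill in the current value.
    "$SNMPTRAP" -v 2c -c "$community" "$manager" '' \
        "$TRAP_OID" "$VAR_OID" s "$text"
}
```

A script built around this would call send_trap with the address and community string of the trap receiver, and would be hooked in as a script recipient in hobbit-alerts.cfg. Sending the host, service, and color as a single string keeps the receiving side simple; a site with its own MIB would likely split them into separate variables.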
list Josh Luthman
So I take it that Joe has to Paypal Henrik $64 now? Please let me, and everyone else of course, know how the failover script works on Hobbit. I'd be very interested in knowing the result to this! Thanks to all three of you!
-- Josh Luthman Office: XXX-XXX-XXXX Direct: XXX-XXX-XXXX XXXX Wayne St Suite XXXX Troy, OH XXXXX Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
list Josh Luthman
I believe there is a workaround for it using something else. I know you can find it in the mailing list archives - it has been asked and discussed many times. I read that Henrik has SNMP planned for a later release, too, though I don't believe that is confirmed. Josh
▸
On 11/2/07, PAUL WILLIAMSON <user-4a2fa5b5a229@xymon.invalid> wrote:
The biggest problem I have with going to Hobbit is there is no snmp trap sending support. We don't use BB as our main interface for showing all alerts, but it is required and we do send snmp traps from the BBPAGER to our main dashboard of alerts. Is this functionality in Hobbit yet?
list Greg L Hubbard
Paul, Drop me a private email and I will tell you what I am doing for my "main dashboard of alerts." GLH
▸
From: PAUL WILLIAMSON [mailto:user-4a2fa5b5a229@xymon.invalid]
Sent: Friday, November 02, 2007 7:31 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] big brother replacement
The biggest problem I have with going to Hobbit is there is no
snmp trap sending support. We don't use BB as our main interface for
showing all alerts, but it is required and we do send snmp traps from
the BBPAGER to our main dashboard of alerts. Is this functionality in
Hobbit yet?
list Tod Hansmann
For posterity: We use mrtg to do all our SNMP polling (I know devmon is the popular solution here, but it's messy and spread across so many files that we have a bad taste for it), and for traps we had a script we stopped using recently, but it was set up much like Henrik stated. It was just added to hobbit-alerts.cfg and life was good.
▸
Tod Hansmann Network Engineer <http://www.directpointe.com/>
list Buchan Milne
▸
On Friday 02 November 2007 17:07:28 Tod Hansmann wrote:
For posterity: We use mrtg to do all our SNMP polling (I know devmon is the popular solution hear, but it's messy and so multi-filed so we have a bad taste for it)
Would you like to expand on what's wrong with devmon? The problem I have with mrtg/cacti etc. is that they *only* do trends, which is half the job. I have got interface graphs (as well as temperature for Dell and HP servers) working, and there is very little that our cacti installation is doing that I still need to have available on devmon. Regards, Buchan
list Joe Sloan
This looks promising, I'll give it a whirl - Joe
list Joe Sloan
Yes, but keep in mind that's $64 octal. Joe
▸
Josh Luthman wrote:
So I take it that Joe has to Paypal Henrik $64 now? Please let me, and everyone else of course, know how the failover script works on Hobbit. I'd be very interested in knowing the result to this! Thanks to all three of you!
list Joe Sloan
Hello Tod, Since you've stopped using the script, would you be willing to share it for posterity's sake? Joe
list Galen Johnson
Aw...don't be cheap...go ahead and kick in the other $12...
▸
-----Original Message-----
From: Sloan [mailto:user-b1d2c84d244b@xymon.invalid]
Sent: Friday, November 02, 2007 1:20 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] big brother replacement
Yes, but keep in mind that's $64 octal.
Joe
list Tod Hansmann
Sorry, no can do. It's long gone. I didn't set it up, but I'm pretty sure we just got it off of deadcat (though I just did a couple of searches and found nothing like what we were using). It probably wouldn't be hard to write a new one for someone who knows snmp "syntax" well enough. Just pass the hobbit message to the server receiving traps, and you're golden. It wasn't exactly helpful for us, since we use the emails and the display page as the key modes of contact for alarms in our environment.
▸
Tod Hansmann
Network Engineer
-----Original Message-----
From: Sloan [mailto:user-b1d2c84d244b@xymon.invalid]
Sent: Friday, November 02, 2007 11:21 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] big brother replacement
Hello Tod,
Since you've stopped using the script, would you be willing to share it
for posterity sake?
Joe
list Joe Sloan
Galen Johnson wrote:
Aw...don't be cheap...go ahead and kick in the other $12...
OK you win - $40 hex, as soon as I get paid. Joe
list Galen Johnson
Just offer $1000000 binary...man, I'm on geek overload...
▸
-----Original Message-----
From: Sloan [mailto:user-b1d2c84d244b@xymon.invalid]
Sent: Friday, November 02, 2007 2:31 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] big brother replacement
Galen Johnson wrote:
Aw...don't be cheap...go ahead and kick in the other $12...
OK you win - $40 hex, as soon as I get paid. Joe
list Gary Baluha
On 11/2/07, Galen Johnson <user-87f955643e3d@xymon.invalid> wrote:
Just offer $1000000 binary...man, I'm on geek overload...
::shakes head::
list Joe Sloan
▸
Henrik Stoerner wrote:
The alert failover is different, because Hobbit doesn't have a separate BBPAGER server - alerts are sent from the same host that handles the Hobbit data collection and webpages. A solution to this has been implemented for the next release, where the alerting module can be distributed onto multiple servers, but only one of them will send alerts at any given time.
So the distributed alerting capability will be in the soon-to-be-released 4.3.0? Joe
list Joe Sloan
▸
Deiss, Mark wrote:
For a vanilla BB environment, you can have multiple BBDISPLAY entities but the recommendation is that there is only one BBNET entity. A BBNET server that is generating the pings out to the clients will be sending the ping results to all of the BBDISPLAY entities (as defined on the BBNET host). If you have multiple BBNET entities that ping the same servers, you will be sending duplicated results as far as the individual BBDISPLAY servers are concerned (the connection messages will be renamed to the host being pinged). To support multiple BBNETs in a non-race environment requires additional coding to carefully direct the BBNET results to not trip over each other. The default behavior is to pump them out to whatever BBDISPLAY is listed - you get the race conditions when you want all the BBDISPLAY servers to monitor all of the BBNET hosts (i.e. want BBNET to send their client-side tests to the BBDISPLAY entities - this will result in the BBNET poll messages going out to all the BBDISPLAY entities also).
Interestingly enough, we've been running redundant bb servers for each lan without any concern for race conditions, and while that has its own peculiar behavior in corner cases, we've never seen any real, intractable problems with it. The general consensus here is that redundancy is good, except for the notifications - we don't want to be notified twice for every incident, so the so-called bb "failover" capability saves us that annoyance with no extra hacks required. I probably made it sound a lot more sophisticated than it really is - we really just have active/active BBNET/BBDISPLAY servers, with the delegation of BBPAGER decided by the failover status. It looks like Henrik has a good roadmap to get there in 4.3 from what I read here, so hopefully we've got our bb replacement at last. The only other concern is that we copy all bb notifications as snmp traps to netcool, but it looks as though that should be doable with a hobbit plugin. Joe
list Josh Luthman
Joe, Do you have any support to any extent with BB? The main reason I switched was that there was a mailing list to look to for support. Secondly, it wasn't BB. Josh
-- Josh Luthman Office: XXX-XXX-XXXX Direct: XXX-XXX-XXXX XXXX Wayne St Suite XXXX Troy, OH XXXXX Those who don't understand UNIX are condemned to reinvent it, poorly. --- Henry Spencer
list Joe Sloan
We have no official bb support, just google, and the experience of local sys admins. But it's been here so long that infrastructure has grown up around it, which really drives the need for a drop-in replacement. Joe
list Josh Luthman
I really like this mailing list - you learn quite a bit even just by reading the questions of others. I'm the only system administrator around here, so I need a shoulder to lean on every once in a while! Are you saying you've used BB so much that you know it too well and need a replacement? Sounds kind of backwards to me =P
list Ryan Jay B. Lapuz
Hi Henrik, I just set up a backup server for our primary hobbit server yesterday, inspired by this: From Henrik:
▸
I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary. --
(Manual, I think.) That is my current setup; however, you just created a failover script that makes the failover transition automatic. The concept is the same, but will the clients still report to both Hobbit servers if there is no failover? Thanks and regards, Ryan
▸
----- Original Message -----
From: "Henrik Stoerner" <user-ce4a2c883f75@xymon.invalid>
To: <user-ae9b8668bcde@xymon.invalid>
Sent: Friday, November 02, 2007 10:37 PM
Subject: Re: [hobbit] big brother replacement
Hi Joe,
On Thu, Nov 01, 2007 at 03:20:12PM -0700, Sloan wrote:
So, the $64 question: Is there anything in hobbit, or on the horizon, which will allow hobbit to serve as a drop-in replacement for bb, including the failover capability?
The BB "failover" script does two things: It makes the network tests run on the failover server if the primary BBNET server cannot be ping'ed; and it enables alerts being sent from the failover server if there is no connection from the failover server to the primary BBPAGER server.
The network-test failover is fairly simple to do. I've attached two scripts here, both of which must run on the backup/standby/failover server:
1) failover.sh - goes in ~hobbit/server/ext/
Add a section to hobbitlaunch.cfg with
    [failovercheck]
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD $BBHOME/ext/failover.sh 10.0.0.1 hobbitnet.mydom.com
"10.0.0.1" is the IP of your primary Hobbit server, "hobbitnet.mydom.com" is the hostname (in the bb-hosts file) of the primary network test machine.
What this does is that it queries the primary Hobbit server for how long ago the network tests were updated. If more than 7 minutes ago it deems the primary network test node to be DOWN, and flags this via the file $BBTMP/primarynetDOWN. If the network test update was less than 7 minutes ago, it removes the file. This is then used by the other script, which replaces the CMD in the "[bbnet]" section in hobbitlaunch.cfg.
2) failovernet.sh - goes in ~hobbit/server/ext/
When this runs to do the normal network tests, it will check for the presence of the $BBTMP/primarynetDOWN file. If this file exists, it picks up the IP of the primary Hobbit server from the file, and modifies the settings to report data to both the normal (local) Hobbit server, and to the primary server. If the file does not exist, it will just run the network tests the normal way.
So to run this, modify the [bbnet] section in hobbitlaunch.cfg and change the CMD setting to "$BBHOME/server/ext/failovernet.sh"
The alert failover is different, because Hobbit doesn't have a separate BBPAGER server - alerts are sent from the same host that handles the Hobbit data collection and webpages. A solution to this has been implemented for the next release, where the alerting module can be distributed onto multiple servers, but only one of them will send alerts at any given time.
Regards, Henrik
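The staleness check described for failover.sh can be sketched in shell. This is a minimal illustration and not the attached script: the function names primary_net_stale and update_flag are hypothetical, and the timestamps are passed in as arguments because the query that fetches the last update time from the primary server is not shown in the thread. Only the 7-minute threshold, the flag file, and the fact that the file carries the primary server's IP come from the message.

```shell
#!/bin/sh
# Hypothetical sketch of the staleness check; NOT the attached failover.sh.
MAX_AGE=420   # 7 minutes, the threshold named in the message

# Succeeds (exit 0) when the last network-test update is older than MAX_AGE.
# $1 = epoch seconds of the last update, $2 = current epoch seconds
primary_net_stale() {
    [ $(( $2 - $1 )) -gt "$MAX_AGE" ]
}

# Create or remove the flag file according to the staleness check.
# $1 = flag file path, $2 = primary server IP, $3 = last update, $4 = now
update_flag() {
    if primary_net_stale "$3" "$4"; then
        # Store the primary IP so the network-test wrapper can read it back.
        echo "$2" > "$1"
    else
        rm -f "$1"
    fi
}
```

In a real deployment the flag path would be $BBTMP/primarynetDOWN and the "now" argument would come from something like `date +%s`; how the last update time is obtained from the primary server is left open here.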
list Joe Sloan
▸
Josh Luthman wrote:
I really like this mailing list - you learn quite a bit even just by reading the questions of others. I'm the only system administrator around here so I need a shoulder to lean on every once and a while! Are you saying you've used BB so much that you know it too well and need a replacement? Sounds kind of backwards to me =P
Haha, what I mean is, big brother is showing its age, and getting harder to support as time goes by, simply by virtue of the fact that the code is slowly rotting. But we can't just make a clean sweep - so much now depends on the way big brother behaves that the replacement needs to be able to act identically to big brother. Joe
list Josh Luthman
Joe, I see what you're saying now. In your case Hobbit is an absolute perfect match! Hopefully we can all enjoy seeing your questions and see you answering some of ours =) Josh
list Ralph Mitchell
▸
On 11/5/07, Sloan <user-b1d2c84d244b@xymon.invalid> wrote:
Haha, what I mean is, big brother is showing its age, and getting harder to support as time goes by, simply by virtue of the fact that the code is slowly rotting. But we can't just make a clean sweep - so much now depends on the way big brother behaves that the replacement needs to be able to act identically to big brother.
In your original email you said:
▸
"only one of the bb servers does reporting, as determined by the
failover state"
Does that mean only one of the two sends out notifications?? If so,
why not just have two identical Hobbit servers, with all the clients
sending reports to both, and disable the [bbpage] section in
server/etc/hobbitlaunch.cfg on the backup system. Then your failover
solution only has to detect "death of twin" and rewrite the
hobbitlaunch.cfg with DISABLED removed from that section. I guess
Hobbit might need to be reloaded or restarted for the change to take
effect.
I haven't tried this - we don't use the paging system - I'm just
wondering if something as simple as that might work.
Ralph Mitchell
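The "rewrite the hobbitlaunch.cfg with DISABLED removed" step could look roughly like the sketch below. It is untested against a live Hobbit install, just like the idea itself: the function name enable_section is hypothetical, the "death of twin" detection is left out entirely, and the function only prints the rewritten file to standard output, so installing the result and any reload or restart remain separate steps.

```shell
#!/bin/sh
# Hypothetical helper: print a hobbitlaunch.cfg-style file with the
# DISABLED keyword dropped from one named section only.
# $1 = config file, $2 = section name without brackets (e.g. bbpage)
enable_section() {
    awk -v want="[$2]" '
        /^\[/ { in_section = ($1 == want) }      # a new [section] header starts
        in_section && $1 == "DISABLED" { next }  # skip DISABLED in the wanted section
        { print }                                # pass everything else through
    ' "$1"
}
```

A cautious way to use it is to write the output to a temporary file, compare it with the original, and only then move it into place, for example `enable_section hobbitlaunch.cfg bbpage > hobbitlaunch.cfg.new`.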
list Joe Sloan
▸
Ralph Mitchell wrote:
In your original email you said: "only one of the bb servers does reporting, as determined by the failover state" Does that mean only one of the two sends out notifications?? If so, why not just have two identical Hobbit servers, with all the clients sending reports to both, and disable the [bbpage] section in server/etc/hobbitlaunch.cfg on the backup system. Then your failover solution only has to detect "death of twin" and rewrite the hobbitlaunch.cfg with DISABLED removed from that section. I guess Hobbit might need to be reloaded or restarted for the change to take effect. I haven't tried this - we don't use the paging system - I'm just wondering if something as simple as that might work.
Yes, that approach has merit; it could work. However, I still need to give Henrik's quick solution from last week some testing - I'm currently building a hobbit infrastructure so I can see how his failover script behaves under pressure. Joe
list Henrik Størner
On Fri, Nov 02, 2007 at 02:48:32PM -0700, Sloan wrote:
Henrik Stoerner wrote:
The alert failover is different, because Hobbit doesn't have a separate BBPAGER server - alerts are sent from the same host that handles the Hobbit data collection and webpages. A solution to this has been implemented for the next release, where the alerting module can be distributed onto multiple servers, but only one of them will send alerts at any given time.
So the distributed alerting capability will be in the soon-to-be-released 4.3.0?
Yes. Henrik
list Henrik Størner
▸
On Sat, Nov 03, 2007 at 02:27:42PM +0800, Ryan Jay B. Lapuz wrote:
I just set up a backup server for our primary hobbit server yesterday, inspired by this:
From Henrik: I run two completely separate systems in parallel, and have the clients report to both of them. The system at our disaster center has the paging module disabled (just disable the [bbpage] section in hobbitlaunch.cfg), to avoid double alerts - it is simple to activate it, if necessary.
(Manual, I think.) That is my current setup; however, you just created a failover script that makes the failover transition automatic. The concept is the same, but will the clients still report to both Hobbit servers if there is no failover?
The failover script is used in a scenario where you have two servers running the network tests - and ONLY the network tests, not the Hobbit display - each reporting to their own Hobbit server. If the primary network test server goes down, then you want the backup server to automatically start feeding the test results to both servers. In that case it becomes necessary to have a failover from the primary to the backup server to do the network tests, and that is what the script I posted does. But in both scenarios you'd probably want to have the "full picture" of the health of your systems, so you do need to have the data from the clients available, both on the primary and on the backup system. Therefore the clients should be configured to send data to both systems. Henrik
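The "feed the test results to both servers" decision on the backup network tester can be sketched as follows. This is an illustration under assumptions, not the posted failovernet.sh: the function name report_targets is hypothetical, and how the resulting list is handed to the network test tool is not shown in the thread (one possibility, assumed here, is setting BBDISP and BBDISPLAYS in the environment before running the tests).

```shell
#!/bin/sh
# Hypothetical helper: print the servers that network-test results should
# be reported to, based on the flag file maintained by the check script.
# $1 = flag file (contains the primary server IP while the primary is down)
# $2 = IP of the local Hobbit server
report_targets() {
    if [ -s "$1" ]; then
        # Primary network tester is considered down: report to both servers.
        echo "$2 $(cat "$1")"
    else
        # Normal operation: report to the local server only.
        echo "$2"
    fi
}
```

Keeping the decision in a small function that only reads the flag file means the wrapper makes a fresh choice on every test run, so the backup stops double-reporting as soon as the flag file is removed.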
list Tom L. Stewart
I tried to set up two dummy hosts to use as clone masters for critical systems. The first one is for the conn test, and I added a bunch of systems to it (Conn_Host_P1). That worked fine. Next I created another clone master called something else (Proc_Host_P1). I took a system that was already set in Conn_Host_P1 and added it to Proc_Host_P1. However, the added system now disappears from Conn_Host_P1 as a clone and appears only in Proc_Host_P1. Am I understanding correctly that a system can have only one clone master? I wanted master tests for conn and procs and a few other items. I am using the Edit Critical Systems web page to do this. The file permissions on server/etc/hobbit-nkview.cfg are set to hobbit owner and webxxx group. Any insight would be appreciated. Tom
list Joe Sloan
▸
On 20071102, Henrik Stoerner wrote:
The alert failover is different, because Hobbit doesn't have a separate BBPAGER server - alerts are sent from the same host that handles the Hobbit data collection and webpages. A solution to this has been implemented for the next release, where the alerting module can be distributed onto multiple servers, but only one of them will send alerts at any given time.
Could you elaborate on this alerting module configuration, or point me to TFM? If it has indeed been implemented, it sounds like it would be the key to enabling hobbit to emulate the big brother style alerting failover behavior. Joe