[re-post] xymon notifications
list Joe Sloan
We've been running Big Brother for some years, and we monitor several hundred servers in 2 data centers.
We are currently running a pilot of Xymon (4.3.0 svn), and just as with Big Brother, we have redundant monitoring servers. A Xymon server in California monitors hosts in both California and Arizona, and a Xymon server in Arizona monitors hosts in both data centers as well.
For the most part, Xymon is performing well, but one annoyance is that we get duplicate notifications, as each Xymon server sends its own notifications for every event. Big Brother sends a single notification for each event, as controlled by the alerting failover.
Is there any way to suppress the duplicate notifications from Xymon?
Joe
list Ralph Mitchell
On Wed, May 27, 2009 at 2:03 PM, J Sloan <user-b1d2c84d244b@xymon.invalid> wrote:
> Is there any way to suppress the duplicate notifications from Xymon?
I don't think Xymon has failover yet. How are you sending the alerts? If
you're using a script, you could make the script on the backup server run a
check against the primary. If that check fails, allow the backup to send out
the notifications. Not perfect, but it could at least reduce the duplicates.
Ralph Mitchell
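A minimal sketch of the check-the-primary idea above, written as a script that the backup server could run in place of direct mail delivery. The host name `xymon1`, the `BB` path default, the `MAILER` variable, and the use of the client's "ping" message as the health check are illustrative assumptions, not details from this thread. `RCPT`, `BBHOSTSVC` and `BBALPHAMSG` are the environment variables Xymon documents for alert scripts; verify them against your version.

```shell
#!/bin/sh
# Alert script sketch for the BACKUP server: stay quiet while the primary
# answers, send mail only when it does not.

PRIMARY=${PRIMARY:-xymon1}   # hypothetical name of the primary server
BB=${BB:-server/bin/bb}      # client binary; adjust the path for your install

# Succeeds if the primary's daemon returns any output for a "ping" message.
# Treating non-empty output as "alive" is an assumption of this sketch.
primary_alive() {
    [ -n "$($BB "$PRIMARY" ping 2>/dev/null)" ]
}

# Mails the alert text to $RCPT unless the primary is alive.
send_if_primary_down() {
    if primary_alive; then
        return 0             # the primary will notify; suppress the duplicate
    fi
    printf '%s\n' "$BBALPHAMSG" |
        ${MAILER:-mail} -s "Xymon [backup] $BBHOSTSVC" "$RCPT"
}
```

Calling `send_if_primary_down` as the last line of the script makes it usable as a SCRIPT recipient on the backup server. The check runs at alert time, so a primary that is up but cannot deliver mail would still go unnoticed.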
list Joe Sloan
Ralph Mitchell wrote:
> I don't think Xymon has failover yet. How are you sending the
> alerts? If you're using a script, you could make the script on the
> backup server run a check against the primary.
We're using the mail facility - for instance:
$LXADM=MAIL joe TIME=W:0800:1800 REPEAT=30m
$LXDNS=%(emerald|ebony|mulholland|plifsp01|plifsp10).*
HOST=$LXDNS
$LXADM RECOVERED
Joe
list T.J. Yang
Hi Joe,
We are also running two Xymon servers across a WAN.
Here is a brief description of how we did it.
1. xymon1 is the primary and xymon2 is the standby, which is dumb (not alerting).
2. All the clients send Xymon messages to both xymon1 and xymon2.
3. On xymon2 (standby):
   1. We have a cron entry to sync the xymon1 config files every 5 minutes.
   2. A heartbeat server-side external module on xymon2 checks the health of xymon1. If xymon1 is dead or not healthy, this module turns on alerting on xymon2 by enabling its [bbpage] section.
   3. The heartbeat module disables alerting on xymon2 again once xymon1 is back online.
So we have a semi-automatic failover architecture, but we have to accept the loss of metrics on xymon1 during its downtime.
Keeping two Xymon servers in sync on the same LAN is easy using HA/clustering software, but keeping two Xymon servers in sync across a WAN, far apart, is not.
I heard Sun's clustering software has a new feature to enable clustering over WANs, but I haven't studied this myself.
Hope this helps.
T.J. Yang
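The enable/disable logic in step 3 can be sketched as a small decision function. This is only an illustration of the behaviour described above, not the actual module: the function name and its yes/no arguments are hypothetical, and how alerting is switched on or off (for example by toggling the [bbpage] section) is deliberately left out.

```shell
#!/bin/sh
# failover_action PRIMARY_OK ALERTING_ON
#   PRIMARY_OK  : "yes" if xymon1 passed its health check, otherwise "no"
#   ALERTING_ON : "yes" if xymon2 is currently alerting, otherwise "no"
# Prints the action the standby should take: "enable", "disable" or "none".
failover_action() {
    case "$1:$2" in
        no:no)   echo enable ;;    # primary down, standby silent: take over
        yes:yes) echo disable ;;   # primary back, standby alerting: stand down
        *)       echo none ;;      # state already matches, nothing to do
    esac
}
```

A wrapper run from the heartbeat module would map "enable" and "disable" onto whatever mechanism the site uses to turn alerting on and off. Keeping the decision separate from the action makes it easy to check all four combinations before touching a live server.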
Date: Wed, 27 May 2009 12:03:34 -0700
From: user-b1d2c84d244b@xymon.invalid
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] [re-post] xymon notifications
> Is there any way to suppress the duplicate notifications from Xymon?
list Joe Sloan
T.J. Yang wrote:
> So we have a semi-automatic failover architecture, but we have to accept the loss of metrics on xymon1 during its downtime.
T.J. - Thanks for your insights. Your setup sounds like an engineering tour de force, but our needs are much simpler than that - no cluster is needed in our environment, as the redundant Xymon servers are providing all the reliability we need and more. In fact, a cluster would be hard to implement since the corresponding Xymon servers are in separate networks, hundreds of miles apart.
Our problem with Xymon is all the duplicated alerts. If there were some way to get Xymon to emulate Big Brother in this regard it would be ideal.
The ideas posted here so far have merit, but I'm still trying to think through all the options to come up with the simplest way to suppress the duplicate alerts without introducing a new single point of failure.
Joe
list Ralph Mitchell
On Tue, Jun 2, 2009 at 2:18 PM, J Sloan <user-b1d2c84d244b@xymon.invalid> wrote:
> The ideas posted here so far have merit, but I'm still trying to think through all the options to come up with the simplest way to suppress the duplicate alerts without introducing a new single point of failure.
I once had an old Compaq desktop system and a laptop, both running Gentoo Linux
with 'heartbeat' installed. Whenever I shut down the laptop, the Compaq
'acquired' its IP address, so that it wouldn't be given away to anyone else
on that segment of the company network. It wasn't anything fancy, just
heartbeat packets being exchanged over the network every few seconds.
As long as your two xymon servers are sending each other status messages,
you could use that for the heartbeat. Something like this in an external
script:
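# X = epoch time of the last 'bbd' status this server logged for xymon1
X=`server/bin/bb localhost 'hobbitdboard host=xymon1 test=bbd fields=logtime'`
Y=`date +%s`
# An empty X (no status found) counts as 0, so it is treated as stale
Z=`expr $Y - ${X:-0}`
if [ "$Z" -ge 600 ]; then
    # do stuff to enable paging
    :
fi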
If you were using a script to send the pages out, enabling paging could be
as simple as "touch $BBTMP/pager", then in the pager script, do this:
if [ -f "$BBTMP/pager" ]; then
    # send the page
    :
fi
Ralph Mitchell
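The two fragments above can be combined into one helper that also clears the flag once the primary is reporting again, so the backup goes quiet after a recovery. This is a sketch under stated assumptions: the function name and argument order are made up for illustration, and the 600-second threshold and the `$BBTMP/pager` flag file are simply carried over from the message above.

```shell
#!/bin/sh
# update_pager_flag LAST_UPDATE NOW MAX_AGE FLAGFILE
#   LAST_UPDATE : epoch time of the primary's last status (empty counts as 0)
#   NOW         : current epoch time
#   MAX_AGE     : seconds of silence after which the backup should page
#   FLAGFILE    : file whose existence tells the pager script to send
update_pager_flag() {
    last=${1:-0}
    now=$2
    max_age=$3
    flag=$4
    if [ $(( $now - $last )) -ge "$max_age" ]; then
        touch "$flag"      # primary looks stale: enable paging on the backup
    else
        rm -f "$flag"      # primary is current again: disable paging
    fi
}

# Example call from an external script on the backup server:
#   LAST=`server/bin/bb localhost 'hobbitdboard host=xymon1 test=bbd fields=logtime'`
#   update_pager_flag "$LAST" "`date +%s`" 600 "$BBTMP/pager"
```

Passing the timestamps in as arguments keeps the function independent of the `bb` client, so it can be exercised with fixed numbers before it is wired into a live server.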
list Joe Sloan
Ralph Mitchell wrote:
> I once had an old Compaq desktop system and a laptop, both running Gentoo Linux with 'heartbeat' installed.
I've played with heartbeat, it's a neat package - bit of overkill for
what we need here though.
> As long as your two xymon servers are sending each other status
> messages, you could use that for the heartbeat.
Now that is interesting - I wasn't familiar with the detail of information available from the bb command using the hobbitdboard directive, but that definitely lends itself to the solution I'm looking for. I'll play with it a bit and see what I can come up with.
Joe