Xymon Mailing List Archive

[re-post] xymon notifications

7 messages in this thread

Joe Sloan · Wed, 27 May 2009 12:03:34 -0700
We've been running Big Brother for some years, and we monitor several
hundred servers in two data centers.

We are currently running a pilot of Xymon (4.3.0 svn), and, just as with
Big Brother, we have redundant monitoring servers. A Xymon server in
California monitors hosts in both California and Arizona, and a Xymon
server in Arizona monitors hosts in both data centers as well.

For the most part, Xymon is performing well, but one annoyance is that
we get duplicate notifications, because each Xymon server sends its own
notification for every event.

Big Brother sends a single notification for each event, as controlled by
its alerting failover.

Is there any way to suppress the duplicate notifications from Xymon?

Joe
Ralph Mitchell · Wed, 27 May 2009 14:30:59 -0500
I don't think Xymon has failover yet. How are you sending the alerts? If
you're using a script, you could make the script on the backup server run a
check against the primary. If the check fails, let the backup send out the
notifications. Not perfect, but it could at least reduce the duplicates.
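A minimal sketch of Ralph's suggestion, assuming the backup server's alerts are routed through Xymon's SCRIPT recipient type instead of MAIL. The PRIMARY variable, the use of a `ping` message as the health check, and the script itself are assumptions for illustration, not something described in this thread:

```shell
#!/bin/sh
# Hypothetical alert gate for the BACKUP Xymon server (a sketch, not a Xymon feature).
# Assumed: a rule such as "SCRIPT /path/to/this-script joe" calls it, and the alert
# module exports RCPT, BBHOSTSVC and BBALPHAMSG to the script.
PRIMARY=${PRIMARY:-xymon1}   # hypothetical hostname of the primary Xymon server
BB=${BB:-bb}                 # bb client path; normally set by the server environment

# Succeeds if the primary's daemon answers a "ping" message (assumed to be
# supported by your version; any other cheap check against the primary would do).
primary_alive() {
    "$BB" "$PRIMARY" ping >/dev/null 2>&1
}

# Mail the alert only when the primary does not answer; otherwise the primary
# is assumed to be sending it already, so the backup stays quiet.
gate_alert() {
    if primary_alive; then
        return 0
    fi
    echo "$BBALPHAMSG" | mail -s "$BBHOSTSVC" "$RCPT"
}

# Only act when invoked by the alert module (which sets RCPT).
if [ -n "$RCPT" ]; then
    gate_alert
fi
```

The primary keeps its plain MAIL rule. As Ralph says, this is not perfect: if the primary answers but fails to send its own alert, nobody is notified.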

Ralph Mitchell
Joe Sloan · Wed, 27 May 2009 13:06:18 -0700
We're using the mail facility - for instance:

$LXADM=MAIL joe TIME=W:0800:1800 REPEAT=30m

$LXDNS=%(emerald|ebony|mulholland|plifsp01|plifsp10).*

HOST=$LXDNS
        $LXADM RECOVERED


Joe
T.J. Yang · Wed, 27 May 2009 19:36:33 -0500
Hi Joe,

We are also running two Xymon servers across a WAN.
Here is a brief description of how we did it.

1. xymon1 is the primary and xymon2 is a "dumb" standby (it does not send alerts).
2. All the clients send their Xymon messages to both xymon1 and xymon2.
3. On xymon2 (the standby):
    1. A cron entry syncs the config files from xymon1 every 5 minutes.
    2. A server-side heartbeat external module on xymon2 checks the health of xymon1.
       If xymon1 is dead or unhealthy, the module turns on alerting on xymon2 by enabling its [bbpage] section.
    3. The heartbeat module disables xymon2's alerting again once xymon1 is back online.
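Steps 3.2 and 3.3 amount to acting only on state changes. A minimal sketch of that decision, assuming a hypothetical state file and leaving both the health check itself and the actual enabling of [bbpage] to your own setup (this is not T.J.'s module):

```shell
#!/bin/sh
# Sketch of the standby's enable/disable decision (steps 3.2 and 3.3 above).
# Hypothetical: xymon1's health arrives as "yes"/"no" from whatever check you use,
# and xymon2's current alerting state is remembered in a small state file.
STATE_FILE=${STATE_FILE:-/tmp/xymon2-alerting.state}   # hypothetical path

# Current alerting state of xymon2; a missing file means "off" (dumb standby).
current_state() {
    cat "$STATE_FILE" 2>/dev/null || echo off
}

# Print the transition to make: "enable", "disable" or "keep".
next_action() {
    primary_healthy=$1   # "yes" or "no"
    alerting=$2          # "on" or "off"
    if [ "$primary_healthy" = no ] && [ "$alerting" = off ]; then
        echo enable      # xymon1 is down and nobody is alerting: take over
    elif [ "$primary_healthy" = yes ] && [ "$alerting" = on ]; then
        echo disable     # xymon1 is back: hand alerting back to it
    else
        echo keep        # already in the right state
    fi
}
```

Acting only on transitions means the alerting task is switched once per outage rather than on every run. One thing to check with a design like this: if the 5-minute sync copied the file that enables [bbpage] from xymon1, it could presumably undo the standby's state, so that file may need to stay out of the sync.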

So we have a semi-automatic failover architecture, but we have to accept the loss of metrics on xymon1 during its downtime.

Keeping two Xymon servers in sync on the same LAN is easy using HA/clustering software,
but keeping two Xymon servers that are far apart on different WANs in sync is not. I heard that Sun's clustering software has a new feature to enable clustering over WANs, but I haven't studied it myself.

Hope this helps.


T.J. Yang
Joe Sloan · Tue, 02 Jun 2009 12:18:38 -0700
T.J. - 

Thanks for your insights. Your setup sounds like an engineering tour de force, but our needs are much simpler than that: no cluster is needed in our environment, since the redundant Xymon servers provide all the reliability we need and more. In fact, a cluster would be hard to implement, because the corresponding Xymon servers are in separate networks, hundreds of miles apart.

Our problem with Xymon is all the duplicated alerts. If there were some way to get Xymon to emulate Big Brother in this regard, it would be ideal.

The ideas posted here so far have merit, but I'm still trying to think through all the options to come up with the simplest way to suppress the duplicate alerts without introducing a new single point of failure.

Joe
Ralph Mitchell · Wed, 3 Jun 2009 11:45:16 -0500
I once had an old Compaq desktop system and a laptop, both running Gentoo Linux
with 'heartbeat' installed. Whenever I shut down the laptop, the Compaq
'acquired' its IP address, so that it wouldn't be given away to anyone else
on that segment of the company network. It wasn't anything fancy, just
heartbeat packets being exchanged over the network every few seconds.

As long as your two Xymon servers are sending each other status messages,
you could use that for the heartbeat. Something like this in an external
script:

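     # $BB is the bb client from the server environment; logtime is when the
     # last "bbd" status for xymon1 arrived here (epoch seconds)
     X=`$BB localhost 'hobbitdboard host=xymon1 test=bbd fields=logtime'`
     Y=`date +%s`
     # No status at all, or none for 10 minutes: assume xymon1 is down
     if [ -z "$X" ] || [ `expr $Y - $X` -ge 600 ]; then
        touch $BBTMP/pager    # enable paging, e.g. by setting a flag file
     fi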

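If you were using a script to send the pages out, enabling paging could be
as simple as "touch $BBTMP/pager"; then, at the top of the pager script, do this:

     # Paging is not enabled on this server: exit without sending anything
     [ -f "$BBTMP/pager" ] || exit 0
     # ...otherwise carry on and send the page as before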
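Putting the two fragments above together, here is a sketch of a standby-side task that also clears the flag once xymon1 starts reporting again. The PEER and MAXAGE names are hypothetical, and it assumes BB and BBTMP come from the server environment as in the example above:

```shell
#!/bin/sh
# Standby-side heartbeat task combining the two fragments above (a sketch).
# Assumed: it runs periodically on the standby (e.g. as an external script), BB
# and BBTMP come from the server environment, and xymon1/600 s are examples.
PEER=${PEER:-xymon1}     # hypothetical name of the primary server
MAXAGE=${MAXAGE:-600}    # seconds of silence before this server takes over paging

# Succeeds if the heartbeat is stale: no logtime at all, or older than MAXAGE.
is_stale() {
    logtime=$1
    now=$2
    [ -z "$logtime" ] && return 0
    [ $((now - logtime)) -ge "$MAXAGE" ]
}

# Set the pager flag while the peer is silent and clear it once the peer reports
# again, so paging is handed back without manual intervention.
update_pager_flag() {
    logtime=$("$BB" localhost "hobbitdboard host=$PEER test=bbd fields=logtime")
    if is_stale "$logtime" "$(date +%s)"; then
        touch "$BBTMP/pager"
    else
        rm -f "$BBTMP/pager"
    fi
}

# Only do real work inside the Xymon server environment.
if [ -n "$BB" ] && [ -n "$BBTMP" ]; then
    update_pager_flag
fi
```

The pager script then only needs the flag test on $BBTMP/pager. Removing the flag once xymon1's status is fresh again is what hands paging back automatically.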

Ralph Mitchell
Joe Sloan · Wed, 03 Jun 2009 14:19:46 -0700
I've played with heartbeat; it's a neat package, but a bit of overkill for
what we need here though.

Now that is interesting - I wasn't familiar with the detail of
information available from the bb command using the hobbitdboard
directive, but that definitely lends itself to the solution I'm looking
for. I'll play with it a bit and see what I can come up with.

Joe