Xymon Mailing List Archive

pagetype: HOST alerting feature?

6 messages in this thread

Bruce Lysik · Thu, 17 Feb 2005 17:01:53 -0800
Hi,

So I migrated one of our BB installations officially over to Hobbit today.  Woohoo.  So now the questions and comments from the other sysadmins start to come in.

Big Brother apparently has a paging setting where if all the hosts for a check fail, it will only page you on one of the checks. I.e., a host's NIC fails, so the ping, http and snmp checks all fail, but Big Brother would only page you about the ping check failing, preventing a deluge to your pager.

Does hobbit have a way to do this?

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer
Bruce Lysik · Thu, 17 Feb 2005 17:13:32 -0800
Bruce Lysik wrote:
> Bigbrother apparently has a paging setting where if all the
> hosts for a check fail, it will only page you on one of the
> checks.
Er, I meant: a paging setting where if all the checks for a host fail...

--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid>
Operations Engineer
Charles Jones · Thu, 17 Feb 2005 18:37:55 -0700
Bruce Lysik wrote:
> > Bigbrother apparently has a paging setting where if all the
> > hosts for a check fail, it will only page you on one of the
> > checks.
>
> Er, I meant: a paging setting where if all the checks for a host fail...
 
depends=(testA:host1/test1,host2/test2),(testB:host3/test3),[...]
    This tag allows you to define dependencies between tests. If
    "testA" for the current host depends on "test1" for host "host1" and
    test "test2" for "host2", this can be defined with


       depends=(testA:host1/test1,host2/test2)

    When deciding the color to report for testA: if either host1/test1
    or host2/test2 has failed, and testA has also failed, then testA is
    reported as "clear" instead of red or yellow.

    Since all tests are actually run before the dependencies are
    evaluated, you can use any host/test in the dependency - regardless
    of the actual sequence that the hosts are listed, or the tests run.
    It is also valid to use tests from the same host that the dependency
    is for. E.g.


       1.2.3.4  foo # http://foo/ webmin depends=(webmin:foo/http)

    is valid; if both the http and the webmin tests fail, then webmin
    will be reported as clear.

    Note: The "depends" tag is evaluated on the BBNET server while
    running the network tests. It can therefore only refer to other
    network tests that are handled by the same BBNET server - there is
    currently no way to use e.g. the status of locally run tests
    (disk, cpu, msgs) or network tests from other BBNET servers in a
    dependency definition. Such dependencies are silently ignored.
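The color rule in that excerpt can be sketched in a few lines of Python. This is a minimal illustration, not Hobbit's network-tester code: the function name `effective_color`, the `(host, test)` dictionary of already-run results, and the assumption that "failed" means red or yellow for both the test and its dependencies are all hypothetical choices made for the sketch.

```python
from typing import Dict, List, Tuple

# Assumption for this sketch: a test (or a dependency) counts as "failed"
# when its color is red or yellow.
FAILED = {"red", "yellow"}

# Hypothetical representation: colors of network tests already run by this
# tester, keyed by (host, test).
Statuses = Dict[Tuple[str, str], str]


def effective_color(own_color: str,
                    deps: List[Tuple[str, str]],
                    statuses: Statuses) -> str:
    """Color to report for a test carrying a depends= rule."""
    if own_color not in FAILED:
        # The test itself is fine, so its dependencies do not matter.
        return own_color
    for dep in deps:
        # Dependencies this tester never ran (local disk/cpu/msgs tests,
        # tests from another BBNET server) are absent and thus ignored.
        if statuses.get(dep) in FAILED:
            return "clear"
    return own_color


# The man page example: foo's webmin test depends on foo/http.
statuses: Statuses = {("foo", "http"): "red", ("foo", "webmin"): "red"}
print(effective_color("red", [("foo", "http")], statuses))  # clear
```

Because the whole rule is evaluated after every test has run, the order in which hosts appear in bb-hosts does not matter here either: the lookup is a plain dictionary access.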
Bruce Lysik · Thu, 17 Feb 2005 18:12:41 -0800
<snip 'depends' snippet>
 
Interesting.  So I guess one way to do it would be to have all checks for a given host depend on that hosts conn check.
 
So if I'm comprehending correctly, this should work:
 
1.2.3.4 foo # http://foo/ smtp depends=(http:foo/conn),(smtp:foo/conn)
 
In theory, the http and smtp checks would go clear if conn is red, so no alert would be sent for them (while an alert would still be sent for the red conn check).
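One way to double-check how a depends= line like the one above breaks down is a small parser for the tag syntax shown in the man page excerpt, `depends=(testA:host1/test1,host2/test2),...`. This is a hypothetical sketch (the name `parse_depends` is made up, and it only handles the syntax the excerpt shows); it is not Hobbit's own bb-hosts parser.

```python
import re
from typing import Dict, List, Tuple

# One "(test:host/test,host/test,...)" group inside a depends= tag.
GROUP_RE = re.compile(r"\(([^:()]+):([^()]*)\)")


def parse_depends(tag: str) -> Dict[str, List[Tuple[str, str]]]:
    """Turn 'depends=(a:h1/t1,h2/t2),(b:h3/t3)' into {test: [(host, test)]}."""
    prefix = "depends="
    if not tag.startswith(prefix):
        raise ValueError(f"not a depends= tag: {tag!r}")
    result: Dict[str, List[Tuple[str, str]]] = {}
    for test, deps in GROUP_RE.findall(tag[len(prefix):]):
        for dep in deps.split(","):
            dep = dep.strip()
            if not dep:
                continue  # tolerate stray commas
            host, _, dep_test = dep.partition("/")
            result.setdefault(test.strip(), []).append((host, dep_test))
    return result


print(parse_depends("depends=(http:foo/conn),(smtp:foo/conn)"))
# {'http': [('foo', 'conn')], 'smtp': [('foo', 'conn')]}
```

The output confirms the reading above: both http and smtp on foo depend only on foo/conn.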
 
That's pretty cool, but unfortunately doesn't solve the problem I've run into:
 
I have some hostsA which all NFS-mount from hostB. hostB dies. Now when the BB clients on hostsA run their disk check, the df hangs, which causes all those local BB checks to go purple in Hobbit. Hobbit proceeds to send out five alerts per host (one for each local purple). In the past, Big Brother would only send out one alert per group of purples. Is there any way to roll up alerts like this?
 
--
Bruce Z. Lysik  <user-4e63a10f8934@xymon.invalid> 
Operations Engineer
Henrik Størner · Fri, 18 Feb 2005 07:50:30 +0100
On Thu, Feb 17, 2005 at 05:01:53PM -0800, Bruce Lysik wrote:
> So I migrated one of our BB installations officially over to Hobbit
> today.  Woohoo.  So now the questions and comments from the other
> sysadmins start to come in.
Oh my god ... they're actually *using* the darn thing :-)
> Bigbrother apparently has a paging setting where if all the hosts
> for a check fail, it will only page you on one of the checks.  ie, a
> host's nic fails, so ping, http checks, snmp checks fail, but
> bigbrother would only page you about the ping check failing and
> preventing a deluge to your pager.
>
> Does hobbit have a way to do this?
It should do that "out of the box". Hobbit's network tester mimics the
way BB does network tests, so if the ping-test fails the "conn" column 
will be red, but the other network tests for that host will go
"clear". And the "clear" color normally does not trigger a page.

The "depends" setting mentioned here is for dependencies between
hosts, e.g. if your webserver needs an application server to be up
before it can send a response.


Regards,
Henrik
Henrik Størner · Sun, 20 Feb 2005 13:53:35 +0100
On Thu, Feb 17, 2005 at 06:12:41PM -0800, Bruce Lysik wrote:
> I have some hostsA which all nfs mount from hostB.  hostB dies.  Now
> when hostsA bb clients run its disk check, the df hangs, which
> causes all those local bb checks to purple in hobbit.  hobbit
> proceeds to send out five alerts per host (one for each local
> purple).
OK, that's pretty annoying.
> In the past, bigbrother would only send out one alert per
> group of purples.  Is there any way to roll-up alerts like this?
Did BB really have this? I didn't know that.

Hobbit doesn't support it - sorry. I think it might be worthwhile to
look at doing "alert-merging" more generally, e.g. so you'll get
all alerts for a given recipient merged into one message based on some
criteria. 

But that's for a future version.
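To make the "alert-merging" idea concrete, here is a hypothetical sketch of merging pending alerts into one message per recipient. Nothing like this exists in the Hobbit version discussed here; the `Alert` record, `merge_alerts`, and the merge criterion (simply "pending for the same recipient at the same time") are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Alert(NamedTuple):
    recipient: str
    host: str
    test: str
    color: str


def merge_alerts(alerts: List[Alert]) -> Dict[str, str]:
    """Merge all pending alerts into one message body per recipient."""
    by_recipient: Dict[str, List[Alert]] = defaultdict(list)
    for alert in alerts:
        by_recipient[alert.recipient].append(alert)
    messages: Dict[str, str] = {}
    for recipient, group in by_recipient.items():
        lines = [f"{a.host}:{a.test} is {a.color}"
                 for a in sorted(group, key=lambda a: (a.host, a.test))]
        messages[recipient] = f"{len(group)} alert(s)\n" + "\n".join(lines)
    return messages


# Bruce's NFS case: several purple client tests on two hosts, one pager.
pending = [Alert("oncall", h, t, "purple")
           for h in ("hostA1", "hostA2") for t in ("cpu", "disk")]
print(merge_alerts(pending)["oncall"])
```

A real implementation would also need a short collection window, so that purples arriving a few seconds apart still end up in the same message instead of being sent one by one.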


Henrik