Xymon Mailing List Archive

Some thoughts about alerts, acks and escalations

10 messages in this thread

list Henrik Størner · Wed, 13 Apr 2005 07:49:07 +0200 ·
I'm beginning to look at the issue of escalating alerts. And I've had
an idea that I'd like to get some feedback on before I go ahead and
implement it.

Right now, Hobbit doesn't handle escalating an alert. When someone
receives an alert message, they can ack it - when they do, all alerts
stop and the item disappears from the "Critical systems" page (the NK
page).

BB has the concept of escalating an alert, meaning that some
recipients of an alert will get the alert message even if the alert
has been acknowledged.


What I'd like to have is the BB system with a finer granularity. A
recipient in the hobbit-alerts.cfg file has an associated "level",
default is 1.

I want our NOC guys who do nothing but stare at the NK page 24x7 to be
able to acknowledge an alert - and that just gets it off their
monitor, it doesn't stop alerts from going out. A "level 0"
acknowledgment - this is just to log that a trouble ticket has been
raised for the issue.

A technician (who is a "level 1" recipient) can acknowledge the alert
he receives - this will stop alert messages from going out to other
"level 1" recipients, so all of the engineers can concentrate on
doing what needs to be done.

Alerts will still be sent to recipients who are "level 2" and above -
these are the equivalent of the BB "escalation" alerts. They can ack
the alert if they'd like to turn off more alert messages, of course.

You can have even higher levels if you like, probably going up the
hierarchy of managers. I don't think we'll be using more than the 3
levels I've described, but there is no reason to impose any limit.
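
The level rules described above can be sketched in a few lines of Python.
This is only an illustration of the proposal, not Hobbit code: the
`Recipient` class and the `recipients_to_alert` function are made-up names,
and the treatment of recipients below the ack level (silenced here) is an
assumption that the proposal does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recipient:
    address: str
    level: int = 1  # the proposal's default level


def recipients_to_alert(recipients: List[Recipient],
                        ack_level: Optional[int]) -> List[Recipient]:
    """Return the recipients who should still receive alert messages."""
    # No ack yet, or a "level 0" ack that only logs a trouble ticket:
    # alerts keep going out to everyone.
    if ack_level is None or ack_level == 0:
        return list(recipients)
    # A level-N ack silences level N; higher levels still get alerts.
    # ASSUMPTION: levels below N are silenced as well.
    return [r for r in recipients if r.level > ack_level]


recipients = [
    Recipient("tech1@example.com"),             # level 1 by default
    Recipient("tech2@example.com"),
    Recipient("manager@example.com", level=2),  # escalation recipient
]

# After a technician's level 1 ack, only the level 2 recipient remains.
still_alerted = recipients_to_alert(recipients, ack_level=1)
print([r.address for r in still_alerted])
```

With this shape, adding more levels needs no extra code, which matches the
point that there is no reason to impose a limit.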


Does that sound like it would be useful?


Regards,
Henrik
list Richard Deal · Wed, 13 Apr 2005 08:24:36 -0400 ·
Yes, this sounds useful, especially if there is an easy way to set when
the alerts start going to the next level up.  An escalation delay, for
example, would determine how long something is in alert before going
to the next level.  It would be nice to be able to set the escalation delay
globally, and to change it per host.  It might also be nice (for some) to
have different delays for each escalation level (10 mins from 0-to-1,
but 30 mins from 1-to-2)?
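
A rough sketch of how such delays might be looked up, in Python. Nothing
here is an existing Hobbit setting: the table names, the host name and the
functions are hypothetical, and the sketch assumes each delay is counted
from the moment the previous level was reached.

```python
from typing import Dict

# Minutes to wait before moving from level N-1 to level N (keyed by N).
# Hypothetical global defaults: 10 mins from 0-to-1, 30 mins from 1-to-2.
GLOBAL_STEP_DELAYS: Dict[int, int] = {1: 10, 2: 30}

# Hypothetical per-host overrides; unlisted levels fall back to the global value.
HOST_STEP_DELAYS: Dict[str, Dict[int, int]] = {
    "db1.example.com": {2: 5},
}


def step_delay(host: str, level: int) -> int:
    """Delay for one escalation step; a per-host value wins over the global one."""
    host_specific = HOST_STEP_DELAYS.get(host, {})
    return host_specific.get(level, GLOBAL_STEP_DELAYS.get(level, 0))


def minutes_until_level(host: str, level: int) -> int:
    """Total minutes an alert must be active before `level` is notified."""
    return sum(step_delay(host, n) for n in range(1, level + 1))


def highest_level_reached(host: str, minutes_in_alert: int,
                          max_level: int = 2) -> int:
    """Highest level that should be receiving alerts by now (0 = none yet)."""
    reached = 0
    for level in range(1, max_level + 1):
        if minutes_in_alert >= minutes_until_level(host, level):
            reached = level
    return reached


print(highest_level_reached("web1.example.com", 20))
print(highest_level_reached("db1.example.com", 20))
```

Keeping the per-host table sparse means only the exceptions have to be
written down, while every other host follows the global delays.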
list David Stuffle · Wed, 13 Apr 2005 08:01:25 -0500 ·
I like it too.  Would there still be a way to ack an alert which stops all
alerts to everyone?  This way you're saying, "I see the alert and I know it's
not a problem, so don't alert anyone else at any level."

Similarly, sometimes I would like to say, "I see the alert, but I can't do
anything about it right now because I'm on the golf course, so stop alerting
me, but continue alerting everyone else, even people in my same level."

Would the recipient levels be set per host?  I may be level 1 on one host
but level 3 on another.
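
To make these requests concrete, here is a small Python sketch. It is purely
illustrative: the `AckScope` names, the `LEVELS` table and both functions are
invented for this example and do not correspond to anything in Hobbit.

```python
from enum import Enum
from typing import Dict, List, Tuple


class AckScope(Enum):
    EVERYONE = "everyone"  # "not a problem, don't alert anyone at any level"
    ONLY_ME = "only_me"    # "stop alerting me, keep alerting everyone else"


# Hypothetical per-host levels, keyed by (host, recipient address).
LEVELS: Dict[Tuple[str, str], int] = {
    ("web1.example.com", "david@example.com"): 1,
    ("db1.example.com", "david@example.com"): 3,
}


def recipient_level(host: str, address: str) -> int:
    """Level of a recipient on a given host; 1 is the proposal's default."""
    return LEVELS.get((host, address), 1)


def remaining_recipients(addresses: List[str], acker: str,
                         scope: AckScope) -> List[str]:
    """Who still gets alerts after `acker` acknowledges with `scope`."""
    if scope is AckScope.EVERYONE:
        return []
    # ONLY_ME: drop just the person who acked, whatever their level.
    return [a for a in addresses if a != acker]


team = ["david@example.com", "pat@example.com"]
print(recipient_level("db1.example.com", "david@example.com"))
print(remaining_recipients(team, "david@example.com", AckScope.ONLY_ME))
```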

Somewhat related, one big problem we have is being able to mail ack an alert
from a cell phone.  When you reply to the message it doesn't keep the
subject.  But I believe it puts the ack code somewhere in the body.  Could
Hobbit search the body of the message for "Hobbit [xxxxxx]"?  I haven't
heard of anyone else with this problem; am I missing something?
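
A minimal sketch of that kind of body search, in Python. It assumes the ack
code is a run of digits in square brackets right after the word "Hobbit";
the real format may differ, and `find_ack_code` is a hypothetical helper
rather than part of Hobbit.

```python
import re
from typing import Optional

# ASSUMPTION: the ack code looks like "Hobbit [123456]" (digits in brackets).
ACK_PATTERN = re.compile(r"Hobbit\s*\[(\d+)\]")


def find_ack_code(subject: str, body: str) -> Optional[str]:
    """Return the ack code from the subject, or from the body as a fallback."""
    for text in (subject, body):
        match = ACK_PATTERN.search(text)
        if match:
            return match.group(1)
    return None


# A phone reply that dropped the subject but quoted the alert in the body.
reply_body = "> Hobbit [123456] web1:http CRITICAL\nack, on my way"
print(find_ack_code("Re:", reply_body))
```

Checking the subject first keeps the current behaviour for ordinary mail
clients and only falls back to the body when the subject carries no code.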

Thanks Henrik.


~~~~~~~~~~~~~~
David Stuffle                       user-4d88f4a4f51e@xymon.invalid
Delta Faucet Company                (XXX) XXX-XXXX


list Terry Barnes · Wed, 13 Apr 2005 09:10:29 -0400 ·
YES - that would work well here.

Terry Barnes
Siemens Com @ HFHS
XXX-XXX-XXXX (Office)
XXX-XXX-XXXX (Cellular)
XXX-XXX-XXXX (Fax)
user-34ea5ff61ded@xymon.invalid (Text Pager)
user-0e29285d9a67@xymon.invalid
list Stefan Loos · Wed, 13 Apr 2005 13:19:30 +0000 ·
Hi Henrik,

for me it would be a "nice to have" feature.
What I would put at the top of my wishlist is a failover server solution,
so that the guys who stare at the NK page 24x7 will have something to stare
at when the Hobbit server crashes ;-)
Nevertheless, I want to say thank you for your great work!

Regards,
Stefan
list Michael Lowery · Wed, 13 Apr 2005 09:12:46 -0500 ·
Absolutely!  I'm currently the level 3 guy, but if one of the level 1
guys acks the alert, I don't get it anymore...  Sounds like a great idea!

Michael
list Larry Barber · Wed, 13 Apr 2005 10:16:21 -0400 (EDT) ·
Yes, very useful. I think the biggest flaw with BigBrother is the way
one red alert can mask subsequent red alerts, even alerts from different
machines. There needs to be a way for the operations people to clear an
alert after they've noticed it and logged it. 
Thanks,
Larry Barber
list Daniel Deighton · Wed, 13 Apr 2005 10:27:01 -0400 ·
This would be extremely useful.  I agree with Richard.  Full control of
the escalation times would be very beneficial.

-- 

Daniel Deighton <user-fdcc03e0c730@xymon.invalid>
list Tom Georgoulias · Wed, 13 Apr 2005 14:04:41 -0400 ·
Henrik Stoerner wrote:
> I'm beginning to look at the issue of escalating alerts.
> Does that sound like it would be useful?

I like this feature and wouldn't mind having it, for many of the same reasons already echoed by others on the list.

Just so I understand, though: an ack by a person only prevents alerts from being sent out to others at their same level.  Someone above or below still gets them, unless they ack the alert themselves?

Say a level 1 guy acks the alert.  It still goes to 0 & 2, right?  Or would it just go to 2 and no longer to 0?

Tom
list Bob Gordon · Wed, 13 Apr 2005 13:57:58 -0700 ·
On 4/13/05, Stuffle, David <user-4d88f4a4f51e@xymon.invalid> wrote:
> I like it too.  Would there still be a way to ack an alert which stops all
> alerts to everyone?  This way you're saying, "I see the alert and I know it's
> not a problem, so don't alert anyone else at any level."
>
> Similarly, sometimes I would like to say, "I see the alert, but I can't do
> anything about it right now because I'm on the golf course, so stop alerting
> me, but continue alerting everyone else, even people in my same level."
>
> Would the recipient levels be set per host?  I may be level 1 on one host
> but level 3 on another.

Something like this is what our Datacenter Ops crew is clamoring for.
As someone mentioned though, if it could have the capability of
variable escalation times (so that we can adjust them to our existing
SLAs) that would be great. :)

-- 
--==[ Bob Gordon ]==--