Some thoughts about alerts, acks and escalations
list Henrik Størner
I'm beginning to look at the issue of escalating alerts, and I've had an idea that I'd like to get some feedback on before I go ahead and implement it.

Right now, Hobbit doesn't handle escalating an alert. When someone receives an alert message, they can ack it - when they do, all alerts stop and the item disappears from the "Critical systems" page (the NK page).

BB has the concept of escalating an alert, meaning that some recipients of an alert will get the alert message even if the alert has been acknowledged.

What I'd like to have is the BB system with a finer granularity. A recipient in the hobbit-alerts.cfg file has an associated "level"; the default is 1.

I want our NOC guys who do nothing but stare at the NK page 24x7 to be able to acknowledge an alert - and that just gets it off their monitor, it doesn't stop alerts from going out. This is a "level 0" acknowledgment: it just logs that a trouble ticket has been raised for the issue.

A technician (who is a "level 1" recipient) can acknowledge the alert he receives - this will stop alert messages from going out to other "level 1" recipients, so all of the engineers can concentrate on doing what needs to be done.

Alerts will still be sent to recipients who are "level 2" and above - these are the equivalent of the BB "escalation" alerts. They can ack the alert if they'd like to turn off more alert messages, of course.

You can have even higher levels if you like, probably going up the hierarchy of managers. I don't think we'll be using more than the 3 levels I've described, but there is no reason to impose any limit.

Does that sound like it would be useful?

Regards,
Henrik
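To make the proposed rule concrete, here is a minimal Python sketch of one possible reading of it. Nothing here is Hobbit code: `Recipient`, `recipients_to_alert` and `shown_on_nk_page` are hypothetical names. It also assumes that an ack at level N (N >= 1) silences every recipient at level N or below, and that any ack takes the item off the NK page; the proposal itself only states what happens to recipients at the same level and at higher levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipient:
    name: str
    level: int = 1  # the proposal's default level


def recipients_to_alert(recipients, acked_level=None):
    """Return the recipients who should still receive alert messages.

    acked_level is the level of the acknowledgment, or None if nobody
    has acked yet.
    """
    # No ack, or a "level 0" ack: alerts keep going out to everyone.
    if acked_level is None or acked_level == 0:
        return list(recipients)
    # Assumption: an ack at level N silences level N and everything below
    # it; recipients above level N still get the alert (the "escalation").
    return [r for r in recipients if r.level > acked_level]


def shown_on_nk_page(acked_level=None):
    """Assumption: any ack, including level 0, removes the item from the NK page."""
    return acked_level is None
```

With a technician at level 1 and a manager at level 2, a level 1 ack leaves only the manager on the list, and a level 2 ack leaves nobody.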
list Richard Deal
Yes, this sounds useful, especially if there is an easy way to set when the alerts start going to the next level up: an escalation delay, for example, that would determine how long something is in alert before going to the next level. It would be nice to be able to set the escalation delay globally, and to change it per host. It might also be nice (for some) to have different delays for each escalation level (10 mins from 0-to-1, but 30 mins from 1-to-2).
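As a rough illustration of that request, the Python sketch below resolves an escalation delay from three places: a per-host override, a per-level setting, and a global default. All names (`GLOBAL_DELAY`, `LEVEL_DELAYS`, `HOST_DELAYS`, `escalation_delay`, `escalated_level`) and the host `db1.example.com` are hypothetical, and none of this is hobbit-alerts.cfg syntax. It also assumes each delay is counted from the moment the previous level was reached.

```python
GLOBAL_DELAY = 10  # minutes; used when nothing more specific is set

# Per-transition delays: (from_level, to_level) -> minutes.
LEVEL_DELAYS = {(1, 2): 30}

# Per-host overrides (hypothetical host name).
HOST_DELAYS = {"db1.example.com": {(1, 2): 15}}


def escalation_delay(host, from_level, to_level):
    """Look up the delay: host override first, then per-level, then global."""
    step = (from_level, to_level)
    host_overrides = HOST_DELAYS.get(host, {})
    if step in host_overrides:
        return host_overrides[step]
    return LEVEL_DELAYS.get(step, GLOBAL_DELAY)


def escalated_level(host, minutes_in_alert, top_level=2):
    """Return the highest level the alert has escalated to so far.

    Assumption: delays are cumulative, so the 1-to-2 delay starts
    counting when level 1 is reached, not when the alert began.
    """
    level = 0
    elapsed = 0
    while level < top_level:
        elapsed += escalation_delay(host, level, level + 1)
        if minutes_in_alert < elapsed:
            break
        level += 1
    return level
```

With these numbers, an ordinary host reaches level 1 after 10 minutes and level 2 after 40, while `db1.example.com` reaches level 2 after 25.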
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Wednesday, April 13, 2005 1:49 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Some thoughts about alerts, acks and escalations
list David Stuffle
I like it too. Would there still be a way to ack an alert which stops all alerts to everyone? This way you're saying, "I see the alert and I know it's not a problem, so don't alert anyone else in any level." Similarly, sometimes I would like to say, "I see the alert, but I can't do anything about it right now because I'm on the golf course, so stop alerting me, but continue alerting everyone else, even people in my same level."

Would the recipient levels be set per host? I may be level 1 on one host but level 3 on another.

Somewhat related, one big problem we have is being able to mail ack an alert from a cell phone. When you reply to the message it doesn't keep the subject, but I believe it puts the ack code somewhere in the body. Could Hobbit search the body of the message for "Hobbit [xxxxxx]"? I don't think I've heard of anyone else with this problem; am I missing something?

Thanks Henrik.

David Stuffle
user-4d88f4a4f51e@xymon.invalid
Delta Faucet Company
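The body search asked about here could look roughly like the Python sketch below. It is only an illustration, not how Hobbit processes mail acks: `find_ack_code` is a hypothetical helper, and the pattern assumes the ack code is a run of digits inside the "Hobbit [xxxxxx]" marker mentioned above.

```python
import re

# Assumption: the ack code is numeric and appears as "Hobbit [123456]".
ACK_CODE = re.compile(r"Hobbit\s*\[(\d+)\]", re.IGNORECASE)


def find_ack_code(subject, body):
    """Return the ack code from the subject, else from the body, else None."""
    for text in (subject, body):
        match = ACK_CODE.search(text or "")
        if match:
            return match.group(1)
    return None
```

Checking the subject first keeps the behaviour unchanged for mail clients that preserve it; the body is only a fallback for replies, such as those from a phone, that drop the subject.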
list Terry Barnes
YES - that would work well here.

Terry Barnes
Siemens Com @ HFHS
list Stefan Loos
Hi Henrik,

for me it would be a "nice to have" feature. What I would put at the top of a wishlist is a failover server solution, so that the guys who stare at the NK page 24x7 will have something to stare at when the Hobbit server crashes ;-)

Nevertheless I want to say thank you for your great work!

Regards,
Stefan
list Michael Lowery
Absolutely! I'm currently the level 3 guy, but if one of the level 1 guys acks the alert, I don't get it anymore... Sounds like a great idea!

Michael
list Larry Barber
Yes, very useful. I think the biggest flaw with BigBrother is the way one red alert can mask subsequent red alerts, even alerts from different machines. There needs to be a way for the operations people to clear an alert after they've noticed it and logged it.

Thanks,
Larry Barber
list Daniel Deighton
This would be extremely useful. I agree with Richard. Full control of the escalation times would be very beneficial.
--
Daniel Deighton <user-fdcc03e0c730@xymon.invalid>
list Tom Georgoulias
Henrik Stoerner wrote:
> I'm beginning to look at the issue of escalating alerts.
> Does that sound like it would be useful?
I like this feature and wouldn't mind having it, for many of the same reasons already echoed by others on the list.

Just so I understand, though: an ack by a person only prevents alerts from being sent out to others at their same level. Someone above or below still gets them, unless they ack the alert themselves?

Say a level 1 guy acks the alert. It still goes to 0 & 2, right? Or would it just go to 2 and no longer to 0?

Tom
list Bob Gordon
On 4/13/05, Stuffle, David <user-4d88f4a4f51e@xymon.invalid> wrote:
> I like it too. Would there still be a way to ack an alert which stops all alerts to everyone? This way you're saying, "I see the alert and I know it's not a problem, so don't alert anyone else in any level." Similarly, sometimes I would like to say, "I see the alert, but I can't do anything about it right now because I'm on the golf course, so stop alerting me, but continue alerting everyone else, even people in my same level."
>
> Would the recipient levels be set per host? I may be level 1 on one host but level 3 on another.
Something like this is what our Datacenter Ops crew is clamoring for. As someone mentioned, though, it would be great if it could support variable escalation times (so that we can adjust them to our existing SLAs). :)

--
Bob Gordon