alert storm / intelligent extra mailscript
list Martin Flemming
Hi!

I've got a problem with my colleagues and the alert storm that occurs when a whole batch farm is rebooted for a kernel upgrade etc., and the person who did it doesn't disable the tests or set an acknowledged downtime. Don't ask me why ... he hates web GUIs and wants to run only one command on the console ...

I know, I asked something similar before:
http://www.hswn.dk/hobbiton/2009/01/msg00398.html
Re: [hobbit] remote/commandline Acknowledge Alerts
and Henrik answered quite right, as always :-) But that works only if I know the ID of the event; in our situation I need it before the event(s) start .. :-(

They don't want to get 5 or more mails for just one machine (times ca. 50 or more machines) ...

So we've played around a bit with Duration and Recovered. Now I get two mails for conn (red and recovered) and one for cpu (yellow for the reboot). We could of course reduce that to only two mails (deactivate Recovered for conn, or set a higher Duration for the cpu reboot mail) ...

My question is whether there already exists an intelligent extra mailscript, or something else, which looks at the conn condition and, if it's bad, sends no alarm for any other service, only for conn ....

Thanks & cheers
Martin
list Buchan Milne
On Friday 17 April 2009 11:16:17 Martin Flemming wrote:
Hi! I've got a problem with my colleagues and the alert storm that occurs when a whole batch farm is rebooted for a kernel upgrade etc., and the person who did it doesn't disable the tests or set an acknowledged downtime. Don't ask me why ... he hates web GUIs and wants to run only one command on the console ... I know, I asked something similar before: http://www.hswn.dk/hobbiton/2009/01/msg00398.html Re: [hobbit] remote/commandline Acknowledge Alerts
IMHO, planned changes should be preceded by disabling the tests that would be affected, which can easily be done with a command-line ...
list Martin Flemming
On Fri, 17 Apr 2009, Buchan Milne wrote:
On Friday 17 April 2009 11:16:17 Martin Flemming wrote:
Hi! I've got a problem with my colleagues and the alert storm that occurs when a whole batch farm is rebooted for a kernel upgrade etc., and the person who did it doesn't disable the tests or set an acknowledged downtime. Don't ask me why ... he hates web GUIs and wants to run only one command on the console ... I know, I asked something similar before: http://www.hswn.dk/hobbiton/2009/01/msg00398.html Re: [hobbit] remote/commandline Acknowledge Alerts
IMHO, planned changes should be preceded by disabling the tests that would be affected,
Yep, you're right of course ..
which can easily be done with a command-line ...
But how do I know the alert ID for the host/service in advance, e.g. cpu & conn for host1, host2?
NAME
bb-ack.cgi - Hobbit CGI script to acknowledge alerts
bb-ack.cgi is passed a QUERY_STRING environment variable with the ACTION, NUMBER, DELAY and MESSAGE parameters.
NUMBER is the number identifying the host/service to be acknowledged. It is included in all alert-messages sent out by Hobbit.
Or am I missing something?
cheers,
Martin
list Martin Flemming
OK, I have to read the manual again first .. :-(
http://www.hswn.dk/hobbit/help/manpages/man1/bb.1.html

XYMON MESSAGE SYNTAX
disable HOSTNAME.TESTNAME DURATION <additional text>
Disables a specific test for DURATION minutes. This will cause the status of this test to be listed as "blue" on the BBDISPLAY server, and no alerts for this host/test will be generated. If DURATION is given as a number followed by s/m/h/d, it is interpreted as being in seconds/minutes/hours/days respectively. To disable all tests for a host, use an asterisk * for TESTNAME.

Right? I will try it .. sorry

cheers,
martin
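The "one command on the console" for a whole farm could look roughly like the sketch below. It only builds the `disable HOSTNAME.TESTNAME DURATION <additional text>` messages quoted from the manual and hands each one to the `bb` client as `bb RECIPIENT message`. The function names, the server address `127.0.0.1`, the default reason text, and the assumption that `bb` is on the PATH are all made up for illustration; the host names are the short `host1`, `host2` style used in this thread, and how names containing dots should be written is something to check in the bb manual.

```python
import subprocess


def build_disable_messages(hosts, tests=("*",), duration="1h", reason="planned reboot"):
    """Return one 'disable' message per host/test pair (hypothetical helper)."""
    return [
        f"disable {host}.{test} {duration} {reason}"
        for host in hosts
        for test in tests
    ]


def send_messages(messages, server="127.0.0.1", bb_cmd="bb", dry_run=True):
    """Send each message via the bb client; with dry_run, only print what would be run.

    The server address and the bb command name are assumptions, adjust to the local setup.
    """
    for msg in messages:
        argv = [bb_cmd, server, msg]
        if dry_run:
            print(argv)
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    msgs = build_disable_messages(["host1", "host2"], tests=("conn", "cpu"), duration="2h")
    send_messages(msgs)  # dry run by default
```

With the default `tests=("*",)` a single message per host disables all of its tests, which is probably what is wanted before rebooting a batch farm.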
list Martin Flemming
Eureka! Works like a charm, of course :-) .. but if someone has such an "intelligent mailscript", I'm still interested anyway ... :-)
martin
Regards
Martin Flemming
DESY / IT office : Building 2b / 008a
Notkestr. 85 phone : XXX - XXXX - XXXX
22603 Hamburg mail : user-f286aaa49a76@xymon.invalid
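
For the "intelligent mailscript" idea asked about in this thread, the core decision could be sketched as below. This is not an existing Hobbit feature or script, just an illustration: all function names are hypothetical, and how the script learns the current conn color of a host (for example by asking the server) is deliberately left out and passed in as plain data.

```python
def should_send_alert(service, conn_color):
    """Decide whether to mail an alert, given the host's current conn color.

    conn alerts always go out; alerts for other services are dropped
    while conn for the same host is red.
    """
    if service == "conn":
        return True
    return conn_color != "red"


def filter_alerts(alerts, conn_colors):
    """Keep only the alerts worth mailing.

    alerts:      list of (host, service) tuples
    conn_colors: dict mapping host -> current conn color;
                 hosts missing from the dict are assumed to be green
    """
    return [
        (host, service)
        for host, service in alerts
        if should_send_alert(service, conn_colors.get(host, "green"))
    ]


if __name__ == "__main__":
    pending = [("host1", "conn"), ("host1", "cpu"), ("host2", "disk")]
    print(filter_alerts(pending, {"host1": "red", "host2": "green"}))
```

Note that this only suppresses alerts raised while conn is red. The yellow cpu alert for the reboot presumably arrives after the host is reachable again, so it would still need the Duration setting discussed above.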