Xymon Mailing List Archive

alert storm / intelligent extra mailscript

5 messages in this thread

list Martin Flemming · Fri, 17 Apr 2009 11:16:17 +0200 (CEST) ·
Hi !

I've got a problem with my colleagues and the alert storm we get
when a whole batch farm is rebooted for a kernel upgrade etc.
  .. and the person who does it doesn't disable the tests or set an acknowledged downtime.
Don't ask me why ... he hates web GUIs and wants to run just one command on the console ...

I know I asked something similar before:
http://www.hswn.dk/hobbiton/2009/01/msg00398.html
Re: [hobbit] remote/commandline Acknowledge Alerts

and Henrik answered quite rightly, as always :-)
But that only works if I know the ID of the event,
and in our situation I would need it before the event(s) start .. :-(


They don't want to get 5 or more mails for a single machine
(times about 50 or more machines) ...

So we've played around a bit with Duration and Recovered ..

Now I get two mails for conn (red and recovered)
and one for cpu (yellow for the reboot) ... we can of course reduce that
to just two mails (by disabling the recovery mail for conn or setting a
longer Duration for the cpu reboot mail) ...

My question is whether there already exists an intelligent extra mail script or
something similar that looks at the conn status and, if it is bad, sends no
alarms for any other service, only for conn ....

Thanks & cheers


        Martin
list Buchan Milne · Fri, 17 Apr 2009 11:36:30 +0200 ·
IMHO, planned changes should be preceded by disabling the tests that would be 
affected, which can easily be done with a command-line ...
list Martin Flemming · Fri, 17 Apr 2009 12:21:37 +0200 (CEST) ·
quoted from Buchan Milne
On Fri, 17 Apr 2009, Buchan Milne wrote:
> IMHO, planned changes should be preceded by disabling the tests that would be
> affected,

Yep, you're right of course ..

> which can easily be done with a command-line ...

But how do I know the alert ID for a host/service in advance, e.g. cpu & conn for host1, host2?

NAME
        bb-ack.cgi - Hobbit CGI script to acknowledge alerts


        bb-ack.cgi is passed a QUERY_STRING environment variable with the ACTION, NUMBER, DELAY and MESSAGE parameters.


        NUMBER is the number identifying the host/service to be acknowledged.  It is included in all alert-messages sent out by Hobbit.

Or am I missing something?

cheers,

        Martin
list Martin Flemming · Fri, 17 Apr 2009 12:41:01 +0200 (CEST) ·

OK, I should have read the manual again first .. :-(

http://www.hswn.dk/hobbit/help/manpages/man1/bb.1.html

XYMON MESSAGE SYNTAX

disable HOSTNAME.TESTNAME DURATION <additional text>
     Disables a specific test for DURATION minutes. This will cause the status of this test to be listed as "blue" on the BBDISPLAY server, and no alerts for this host/test will be generated. If DURATION is given as a number followed by s/m/h/d, it is interpreted as being in seconds/minutes/hours/days respectively. To disable all tests for a host, use an asterisk * for TESTNAME.

Right ?

I will try it .. sorry
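
As a rough sketch of what that could look like for a whole batch of hosts: the Python below only builds the "disable" messages from the man page excerpt above and the matching bb command lines, and prints them instead of sending anything. The server address 127.0.0.1, the duration "2h" and the reason text are assumptions for illustration, and host1/host2 are the short example names from this thread (check the bb man page for how fully qualified hostnames have to be written).

```python
import shlex


def disable_message(host, test="*", duration="2h", reason="planned reboot"):
    """Build one message of the form: disable HOSTNAME.TESTNAME DURATION <additional text>."""
    return f"disable {host}.{test} {duration} {reason}"


def disable_commands(hosts, server, test="*", duration="2h", reason="planned reboot"):
    """Return one argv list per host, in the form: bb RECIPIENT message."""
    return [["bb", server, disable_message(h, test, duration, reason)] for h in hosts]


if __name__ == "__main__":
    # Assumed values: the server address and host names are placeholders.
    for argv in disable_commands(["host1", "host2"], server="127.0.0.1"):
        # Only print the shell command; run it by hand or via subprocess.run(argv).
        print(shlex.join(argv))
```

Printing first makes it easy to check the list of hosts before anything is actually disabled; the default test "*" covers all tests of a host, as described in the man page text.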

cheers,
 	martin
list Martin Flemming · Fri, 17 Apr 2009 13:02:43 +0200 (CEST) ·
Eureka!

Works like a charm, of course  :-)

.. but if someone has such an "intelligent mail script",
I'm still interested ... :-)
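
The filtering idea behind such an "intelligent mail script" could be sketched roughly as follows. This is not an existing Xymon feature or script, just the logic described in this thread: if conn is bad for a host, keep only the conn alert and drop the alerts for its other services. The tuple format (host, service, color) and the function name filter_alerts are made up for this example; how the alerts would reach the script is left open.

```python
def filter_alerts(alerts, bad_colors=("red",)):
    """Drop non-conn alerts for every host whose conn test is in a bad color.

    alerts is a list of (host, service, color) tuples (hypothetical format).
    """
    # Hosts that are unreachable according to their conn test.
    down = {host for host, service, color in alerts
            if service == "conn" and color in bad_colors}
    # Keep all conn alerts, and other alerts only for hosts that are still reachable.
    return [a for a in alerts if a[1] == "conn" or a[0] not in down]


if __name__ == "__main__":
    pending = [
        ("host1", "conn", "red"),
        ("host1", "cpu", "yellow"),
        ("host2", "cpu", "yellow"),
    ]
    for host, service, color in filter_alerts(pending):
        print(host, service, color)
```

With the sample data, the cpu alert for host1 is suppressed because host1's conn test is red, while host2's cpu alert is still passed on.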

 	martin


Regards

        Martin Flemming


Martin Flemming
DESY / IT          office : Building 2b / 008a
Notkestr. 85       phone  : XXX - XXXX - XXXX
22603 Hamburg      mail   : user-f286aaa49a76@xymon.invalid