canceling acknowledgements
list Larry Barber
Is there some way to cancel an 'ack' after it has been issued? Thanks, Larry Barber
list Henrik Størner
In <AANLkTikhQ6NkY6A-nH=user-2a7e5b9319ef@xymon.invalid> Larry Barber <user-6ef9c2864140@xymon.invalid> writes:
Is there some way to cancel an 'ack' after it has been issued?
I guess you can re-send the ack with a very short duration-value. Regards, Henrik
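Henrik's suggestion could be scripted roughly as below. This is only a sketch under stated assumptions: it assumes the `xymondack COOKIE DURATION TEXT` message described in the xymon(1) manual page, with the duration given in minutes, and it assumes (as Henrik himself only guesses) that a second ack replaces the first. The helper name `short_ack_message`, the cookie value and the server address are made up for illustration; the real cookie would be the `cookie` field reported by `xymondboard`.

```perl
use strict;
use warnings;

# Hypothetical helper: build an ack message with a very short duration.
# Assumes the "xymondack COOKIE DURATION TEXT" format, DURATION in minutes.
sub short_ack_message {
    my ($cookie, $minutes, $text) = @_;
    die "a cookie is required\n" unless defined $cookie && length $cookie;
    $minutes = 1 unless defined $minutes && $minutes =~ /^\d+$/ && $minutes > 0;
    $text = 'ack shortened' unless defined $text && length $text;
    return "xymondack $cookie $minutes $text";
}

# Made-up cookie value, for illustration only.
my $message = short_ack_message(123456, 1, 'acked by accident');
print "$message\n";

# Sending is left commented out; the server address is an assumption:
# system('/home/xymon/server/bin/xymon', '127.0.0.1', $message);
```

Whether the shortened ack really supersedes the earlier one should be checked on a test host before relying on it.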
list Ryan Novosielski
I'm guessing, but I don't know, that these two things would work:
1) Remove the file created on the server that contains the ack notice.
2) Ack the same test as the same person for 1 minute or something similar.
I was thinking about this same one the other day. The conclusion I came to was "Why on earth would I want to do that?" The only reason I could think of was a case of ack'd by accident.
----- Original Message -----
From: Larry Barber <user-6ef9c2864140@xymon.invalid>
Date: Friday, October 22, 2010 9:21 pm
Subject: [xymon] canceling acknowledgements
To: xymon at xymon.com
Is there some way to cancel an 'ack' after it has been issued? Thanks, Larry Barber
----
Ryan Novosielski - Sr. Systems Programmer
user-ae4522577e16@xymon.invalid - 973/972.0922 (2-0922)
Univ. of Med. and Dent. | IST/CST-Academic Svcs. - ADMC 450, Newark
list John Rothlisberger
Oh, this is so old...
There is a simple answer (IMO) as to why you would want to cancel an ack'd alert: a new alert for an ack'd test has been received.
This has been bugging me for years. Let's assume you ack an alert on the service "W3SVC" for 4 hours for maintenance, because you have stopped it on purpose. Let's assume you also monitor services for SQL, Java, etc. If one of those also fails during the 4 hours of ack time, you won't get an alert. There are 3 tests with multiple components where I would like to know if a new alert arrives while the test is ack'd: disk, procs, and svcs.
I currently have to address this exact scenario. I have multiple services and processes to monitor; they are often ack'd, but that doesn't change the importance of knowing when a different process or service needs attention. So I have written the Perl script below (just written, so it may not be 100% - YMMV), which monitors tests that have been ack'd and looks for changes. If there is a change that needs to be addressed, the ack is canceled by sending a temporary green status to the host.test. The next update from the client then triggers a new alert.
Maybe this will help someone else too...
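Steps (paths are relative to the xymon user's home directory, /home/xymon here):
1) Copy the script below to ~/bin/watch_ackd_alerts.pl and make it executable.
2) Create the directory ~/server/tmp/ACKS.
3) The log file is written to ~/logs/ack_terminate.log.
4) Create the following crontab entry:
*/5 * * * * /home/xymon/bin/watch_ackd_alerts.pl > /dev/null 2>&1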
#!/usr/bin/perl
# ------------------------------------------------------------------------------------------------
# Script Name: watch_ackd_alerts.pl
# Author: John Rothlisberger
# Created On: March 10, 2014
# VERSION="1.03102014.09";
# ------------------------------------------------------------------------------------------------
# Purpose: A script to monitor ack'd alerts and watch for changes.
# Example: The C: drive fills up and sends out a red alert. Knowing this will
# take some time to fix you ack the alert for 60 minutes. If, during that 60
# minute window the D: drive fills up you will not be notified as the 'disk' test
# has been acknowledged. This script is an attempt to short circuit the ack and
# allow for the new alert to be sent out.
# ------------------------------------------------------------------------------------------------
# Execution: Run every 5 minutes from xymon crontab:
# */5 * * * * /home/xymon/bin/watch_ackd_alerts.pl > /dev/null 2>&1
# ------------------------------------------------------------------------------------------------
# Setup COUNT and directory where to store ack info files.
$COUNT=0;
$ACKSDIR="/home/xymon/server/tmp/ACKS";
# Log file
open(LOGFILE,">> /home/xymon/logs/ack_terminate.log") || die("can't open ack_terminate.log: $!");
# input file example
# servername|test|color|flags|lastchange|logtime|validtime|acktime|disabletime|sender|cookie|line1|ackmsg|dismsg|msg
# Open input file
open ALERTS, "/home/xymon/server/bin/xymon 0 'xymondboard color=yellow,red' |" or die "Couldn't execute: $!";
# Parse all active alerts
while (<ALERTS>) {
    chomp;
    # Fields are separated by "|"; pick out the ones we need by position.
    @LINE=split(/\|/,$_);
    $SERVERNAME=$LINE[0];
    $TESTTYPE=$LINE[1];
    $COLOR=$LINE[2];
    $ACKTIME=$LINE[7];
    $COOKIE=$LINE[10];
    $LINE1=$LINE[11];
    $ACKMSG=$LINE[12];
    $DISMSG=$LINE[13];
    $MSG=$LINE[14];
    # Skip all alerts except disk, procs, and svcs (others are not tested).
    # Test names are strings, so they must be compared with "ne", not "==".
    if ("$TESTTYPE" ne "svcs" && "$TESTTYPE" ne "disk" && "$TESTTYPE" ne "procs") {
        next;
    }
    # If the alert has been ack'd we want to watch for any changes.
    if ( $ACKTIME > 0) {
        $COUNT+=1;
        $REDS=0;
        $YELLOWS=0;
        $REDS_CMP=0;
        $YELLOWS_CMP=0;
        $NEED_COMP=0;
        print LOGFILE "-------------------------------------------------------------------\n";
        print LOGFILE "SERVERNAME: $SERVERNAME\n";
        print LOGFILE "TESTTYPE: $TESTTYPE\n";
        print LOGFILE "COLOR: $COLOR\n";
        # If this is a new ack'd alert we will create a static file that holds current test state.
        # We will use this file to decide if there have been changes to what has been ack'd.
        if (! -e "${ACKSDIR}/${SERVERNAME}${TESTTYPE}${COLOR}${ACKTIME}" ) {
            open DETAILS, "/home/xymon/server/bin/xymon 0 'xymondlog ${SERVERNAME}.${TESTTYPE}' |" or die "Couldn't execute: $!";
            open OUTFILE, ">${ACKSDIR}/${SERVERNAME}${TESTTYPE}${COLOR}${ACKTIME}" or die "Couldn't open ack state file: $!";
            while (<DETAILS>) {
                chomp;
                # Component lines start with "&" followed by the color and the component name.
                if ( $_ =~ /^&/ ) {
                    $_ =~ s/\&//;
                    @DETLINE=split(/ /,$_);
                    # Change colors to numbers red=2 yellow=1 anything else = 0
                    if ( "$DETLINE[0]" eq "red" ) {
                        $COL_VALUE = "2";
                    } elsif ( "$DETLINE[0]" eq "yellow" ) {
                        $COL_VALUE = "1";
                    } else {
                        $COL_VALUE = "0";
                    }
                    # Create the status file which will be used on subsequent runs.
                    print OUTFILE "${COL_VALUE}:${DETLINE[1]}\n";
                    print LOGFILE "DATA: ${COL_VALUE}:${DETLINE[1]}\n";
                }
            }
            close DETAILS;
            close OUTFILE;
        } else {
            # We have already recorded the initial state of the test and saved it to a file.
            # Now we will check new status output with that file to see if the alerts have changed.
            # Start with empty lists so entries from a previously processed alert are not reused.
            @COMP_contents=();
            @INITFILE_contents=();
            open DETAILS, "/home/xymon/server/bin/xymon 0 'xymondlog ${SERVERNAME}.${TESTTYPE}' |" or die "Couldn't execute: $!";
            while (<DETAILS>) {
                chomp;
                if ( $_ =~ /^&/ ) {
                    $_ =~ s/\&//;
                    @DETLINE=split(/ /,$_);
                    # Change colors to numbers red=2 yellow=1 anything else = 0
                    if ( "$DETLINE[0]" eq "red" ) {
                        $COL_VALUE = "2";
                    } elsif ( "$DETLINE[0]" eq "yellow" ) {
                        $COL_VALUE = "1";
                    } else {
                        $COL_VALUE = "0";
                    }
                    push (@COMP_contents, "${COL_VALUE}:${DETLINE[1]}");
                }
            }
            close DETAILS;
            # Get the initial ack file that was created.
            open INITFILE, "<${ACKSDIR}/${SERVERNAME}${TESTTYPE}${COLOR}${ACKTIME}" or die "Couldn't open ack state file: $!";
            while (<INITFILE>) {
                chomp;
                push (@INITFILE_contents, "$_");
            }
            close INITFILE;
            # Create a hash that contains the initial ack file.
            %INITF = map(($_,1), @INITFILE_contents);
            foreach (@COMP_contents) {
                if ($INITF{$_}) {
                    # No change to the alert - nothing to do.
                    print LOGFILE "Alert hasn't changed: $_\n";
                } else {
                    # Alert has changed in some form.
                    print LOGFILE "Alert has changed: $_\n";
                    # Entries look like "COLOR:NAME"; split only on the first colon so names such as "C:" survive.
                    @CURRENT=split(/:/,$_,2);
                    $CUR_COLOR=$CURRENT[0];
                    $CUR_TEST=$CURRENT[1];
                    # Find the saved entry for exactly this component (\Q..\E keeps regex characters literal).
                    @ACKD_EVENT=grep (/^\d:\Q${CUR_TEST}\E$/, @INITFILE_contents);
                    @ACK_EVENT=split(/:/,$ACKD_EVENT[0],2);
                    $ACK_COLOR=$ACK_EVENT[0];
                    $ACK_TEST=$ACK_EVENT[1];
                    # Compare the current alert color with that which was saved initially.
                    if ( $CUR_COLOR < $ACK_EVENT[0] ) {
                        # New color is lower than initial color - leave ack alone.
                        print LOGFILE "NO ACTION NEEDED (new level lower than ack level).\n";
                    } elsif ( $CUR_COLOR > $ACK_EVENT[0] ) {
                        # New color is greater than initial ack color, dump ack so new alerts can be sent.
                        if ( ! defined $ACK_COLOR ) {
                            # New alert not previously detected (different service, process, or disk alerting)
                            print LOGFILE "ACK COLOR $ACK_COLOR\n";
                            print LOGFILE "NEW ALERT - DISABLE ACK AND SEND NEW ALERT.\n";
                            # Reset the server.test status to green. Next update will reset the alert condition,
                            # effectively canceling the acknowledge.
                            open RESET, "/home/xymon/server/bin/xymon 0 'status+10 ${SERVERNAME}.${TESTTYPE} green Ack Reset New Alert Rcvd.' |" or die "Couldn't execute: $!";
                            close RESET;
                        } else {
                            # Level of original alert has upgraded (typically yellow->red)
                            print LOGFILE "ACK COLOR $ACK_COLOR\n";
                            print LOGFILE "OLD ALERT - DISABLE ACK AND SEND NEW ALERT.\n";
                            # Reset the server.test status to green. Next update will reset the alert condition,
                            # effectively canceling the acknowledge.
                            open RESET, "/home/xymon/server/bin/xymon 0 'status+1 ${SERVERNAME}.${TESTTYPE} green Ack Reset Alert Level Changed.' |" or die "Couldn't execute: $!";
                            close RESET;
                        }
                    } else {
                        # Nothing to do here.
                        print LOGFILE "NO ACTION NEEDED (new level equals ack level).\n";
                    }
                }
            }
        }
    }
}
# When there are no ack'd alerts clean out the ACK status directory.
if ( $COUNT == 0 ) {
unlink glob "${ACKSDIR}/*";
}
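The decision at the heart of the script (cancel the ack only when a component is new or worse than it was at ack time) can be sketched as two small functions that need no Xymon server. This is an illustrative sketch, not part of the script above: the names `parse_components` and `components_needing_alert` are hypothetical, the sample status text is made up, and the `&color name ...` line layout is assumed from the way the script parses `xymondlog` output. It uses the `//` operator, so it needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Severity values follow the script: red=2, yellow=1, anything else=0.
my %SEVERITY = (red => 2, yellow => 1);

# Turn status text into { component name => severity }.
# Only lines of the assumed form "&color name ..." are considered.
sub parse_components {
    my ($status_text) = @_;
    my %seen;
    for my $line (split /\n/, $status_text) {
        next unless $line =~ /^&(\w+)\s+(\S+)/;
        $seen{$2} = $SEVERITY{$1} // 0;
    }
    return \%seen;
}

# Return the components that are new or more severe than at ack time.
sub components_needing_alert {
    my ($at_ack, $now) = @_;
    my @changed;
    for my $name (sort keys %$now) {
        my $old = $at_ack->{$name} // 0;   # unknown at ack time counts as 0
        push @changed, $name if $now->{$name} > $old;
    }
    return @changed;
}

# Made-up sample data: C: was red when ack'd, D: turned yellow afterwards.
my $at_ack = parse_components("&red C: 98% full\n&green D: 40% full\n");
my $now    = parse_components("&red C: 98% full\n&yellow D: 91% full\n");
print join(',', components_needing_alert($at_ack, $now)), "\n";   # prints "D:"
```

A non-empty result is the case in which the ack would be reset; an empty result means the ack can stay in place.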
Thanks,
John
Upcoming PTO:
(none)
John Rothlisberger
IT Strategy, Infrastructure & Security - Technology Growth Platform
TGP for Business Process Outsourcing
Accenture
XXX.XXX.XXXX office
This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
www.accenture.com
list Ryan Novosielski
I'm a little confused about some of this. For one, I don't ack servers for maintenance; I disable the affected tests. You can cancel a disable anytime you want. Also, since you acknowledge individual services, why would acknowledging one of them prevent you from getting alerts on others? Do you mean, perhaps, alerts like on the "ports" test, where failures can occur in different ways?
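For reference, disabling a test and later cancelling the disable might look like the sketch below. It assumes the `disable HOSTNAME.TESTNAME DURATION <text>` and `enable HOSTNAME.TESTNAME` messages described in the xymon(1) manual page, with the duration in minutes. The helper names, the host name `web01` and the server address are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers; the message formats are assumed from xymon(1).
sub disable_message {
    my ($host, $test, $minutes, $reason) = @_;
    return "disable $host.$test $minutes $reason";
}

sub enable_message {
    my ($host, $test) = @_;
    return "enable $host.$test";
}

# Made-up host: disable svcs for 4 hours, then cancel the disable early.
print disable_message('web01', 'svcs', 240, 'W3SVC stopped for maintenance'), "\n";
print enable_message('web01', 'svcs'), "\n";

# Sending is left commented out; the server address is an assumption:
# system('/home/xymon/server/bin/xymon', '127.0.0.1', enable_message('web01', 'svcs'));
```

Note that a disable, like an ack, applies to the whole host.test rather than to a single service inside it.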
----
Note: UMDNJ is now Rutgers-Biomedical and Health Sciences
Ryan Novosielski - Sr. Systems Programmer
user-46c89e614701@xymon.invalid - 973/972.0922 (2x0922)
Rutgers Biomedical and Health Sciences | OIT/EI-Academic Svcs. - ADMC 450, Newark
list John Rothlisberger
It was so new I missed a screw up in my code.
I had to change my "if next" to be:
# Skip all alerts except disk, procs, and svcs (others are not tested)
if ("$TESTTYPE" ne "svcs" && "$TESTTYPE" ne "disk" && "$TESTTYPE" ne "procs") {
next;
}
Please forgive me....
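As background on why the string form matters: a condition written as `! $TESTTYPE == "disk"` does not compare strings in Perl. The `!` binds more tightly than `==`, and `==` compares numerically, so for any non-empty test name the expression becomes an empty string compared with "disk", both of which are 0 as numbers, and the condition is always true. One alternative way to write the filter, shown here only as a sketch with hypothetical names (`%WATCHED`, `is_watched_test`), is a hash lookup:

```perl
use strict;
use warnings;

# The three multi-component tests named in this thread.
my %WATCHED = map { $_ => 1 } qw(disk procs svcs);

# Hypothetical helper: 1 if the test type should be examined, 0 otherwise.
sub is_watched_test {
    my ($testtype) = @_;
    return (defined $testtype && exists $WATCHED{$testtype}) ? 1 : 0;
}

for my $t (qw(disk cpu svcs)) {
    printf "%-5s %s\n", $t, is_watched_test($t) ? 'watch' : 'skip';
}
```

Inside the alert loop this would read `next unless is_watched_test($TESTTYPE);`, and adding another test type becomes a one-word change.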
list Paul Root
He's saying if he is monitoring 4-5 different processes or ports, or files, and he acks one of them failing, he needs to know if something else fails.
▸
-----Original Message-----
From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Ryan Novosielski
Sent: Monday, March 10, 2014 10:07 AM
To: xymon at xymon.com
Subject: Re: [Xymon] [xymon] canceling acknowledgements
I'm a little confused about some of this. For one, I don't ack servers
for maintenance; I disable the affected tests. You can cancel a disable
anytime you want. Also, since you acknowledge individual services, why
would acknowledging one of them prevent you from getting alerts on others?
Do you mean, perhaps, alerts like on the "ports" test where failures can
occur in different ways?
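For reference, the disable-then-enable approach described above can be scripted. This is only a minimal sketch: it assumes the "disable" and "enable" messages accepted by the xymon client command and the /home/xymon/server/bin/xymon path used elsewhere in this thread. The helper names (disable_msg, enable_msg, send_msg) and the host name "winsrv01" are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Path and server address follow the script quoted in this thread (assumption).
my $XYMON  = "/home/xymon/server/bin/xymon";
my $SERVER = "0";

# Build a "disable" message. The duration is in minutes unless it carries
# an h, d or w suffix (for example "4h").
sub disable_msg {
    my ($host, $test, $duration, $reason) = @_;
    return "disable $host.$test $duration $reason";
}

# Build the matching "enable" message, which cancels the disable.
sub enable_msg {
    my ($host, $test) = @_;
    return "enable $host.$test";
}

# Send one message with the xymon client. The list form of system()
# avoids any shell quoting problems in the message text.
sub send_msg {
    my ($msg) = @_;
    return system($XYMON, $SERVER, $msg) == 0;
}

# Hypothetical usage: disable svcs on winsrv01 for 4 hours, re-enable early.
# send_msg(disable_msg("winsrv01", "svcs", "4h", "W3SVC maintenance"));
# send_msg(enable_msg("winsrv01", "svcs"));
```

Unlike an ack, a disable can be lifted at any time this way, but it still covers the whole column, so it does not by itself solve the multi-component problem discussed in this thread.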
On 03/10/2014 10:51 AM, user-7adce57665bb@xymon.invalid wrote:
Oh, this is so old...
There is a simple answer (IMO) as to why you would want to cancel an ack'd alert - a new alert for an ack'd test has been received.
This has been bugging me for years. Let's assume you ack an alert for the service "W3SVC" for 4 hours for maintenance, which you have stopped on purpose for whatever reason. Let's assume you also monitor services for SQL, Java, etc. If one of those also fails during the 4 hours of ack time - you won't get an alert. There are 3 tests that can have multiple components for which I would like to know if new alerts arrive while the test is ack'd: disk, procs, & svcs.
I have a situation currently where I have to address this exact scenario. As described, I have multiple services and processes that I need to monitor; those are often ack'd, but that doesn't change the importance of knowing if a different process/service needs attention. Thus, I have written the perl script below which (being just written, it may not be 100% - YMMV) will monitor tests that have been ack'd and look for changes. If there is a change that needs to be addressed, the ack is canceled by sending a temporary green status to the host.test. The next update from the client triggers a new alert.
Maybe this will help someone else too...
Steps:
Copy contents to ~bin/watch_ackd_alerts.pl
Create directory ~server/tmp/ACKS
Log file created in ~logs
Create the following crontab entry
*/5 * * * * /home/xymon/bin/watch_ackd_alerts.pl > /dev/null 2>&1

#!/usr/bin/perl
#
# Script Name: watch_ackd_alerts.pl
# Author: John Rothlisberger
# Created On: March 10, 2014
# VERSION="1.03102014.09";
#
# Purpose: A script to monitor ack'd alerts and watch for changes.
# Example: The C: drive fills up and sends out a red alert. Knowing this will
# take some time to fix you ack the alert for 60 minutes. If, during that 60
# minute window the D: drive fills up you will not be notified as the 'disk' test
# has been acknowledged. This script is an attempt to short circuit the ack and
# allow for the new alert to be sent out.
#
# Execution: Run every 5 minutes from xymon crontab:
# */5 * * * * /home/xymon/bin/watch_ackd_alerts.pl > /dev/null 2>&1
#
# Setup COUNT and directory where to store ack info files.
$COUNT=0;
$ACKSDIR="/home/xymon/server/tmp/ACKS";
# Log file
open(LOGFILE,">> /home/xymon/logs/ack_terminate.log") || die("can't open ack_terminate.log: $!");
# input file example
# servername|test|color|flags|lastchange|logtime|validtime|acktime|disabletime|sender|cookie|line1|ackmsg|dismsg|msg
# Open input file
open ALERTS, "/home/xymon/server/bin/xymon 0 'xymondboard color=yellow,red' |" or die "Couldn't execute: $!";
# Parse all active alerts
while (<ALERTS>) {
  chomp;
  @LINE=split(/\|/,$_);
  $SERVERNAME=$LINE[0];
  $TESTTYPE=$LINE[1];
  $COLOR=$LINE[2];
  $ACKTIME=$LINE[7];
  $COOKIE=$LINE[10];
  $LINE1=$LINE[11];
  $ACKMSG=$LINE[12];
  $DISMSG=$LINE[13];
  $MSG=$LINE[14];
  # Skip all alerts except disk, procs, and svcs (others are not tested)
  next unless ($TESTTYPE eq "disk" || $TESTTYPE eq "procs" || $TESTTYPE eq "svcs");
  # If the alert has been ack'd we want to watch for any changes.
  if ( $ACKTIME > 0) {
    $COUNT+=1;
    $REDS=0; $YELLOWS=0; $REDS_CMP=0; $YELLOWS_CMP=0; $NEED_COMP=0;
    # Start every ack'd alert with empty comparison lists.
    @COMP_contents=();
    @INITFILE_contents=();
    print LOGFILE "-------------------------------------------------------------------\n";
    print LOGFILE "SERVERNAME: $SERVERNAME\n";
    print LOGFILE "TESTTYPE: $TESTTYPE\n";
    print LOGFILE "COLOR: $COLOR\n";
    # If this is a new ack'd alert we will create a static file that holds current test state.
    # We will use this file to decide if there have been changes to what has been ack'd.
    if (! -e "${ACKSDIR}/${SERVERNAME}${TESTTYPE}${COLOR}${ACKTIME}" ) {
      open DETAILS, "/home/xymon/server/bin/xymon 0 'xymondlog ${SERVERNAME}.${TESTTYPE}' |" or die "Couldn't execute: $!";
      open OUTFILE, ">${ACKSDIR}/${SERVERNAME}${TESTTYPE}${COLOR}${ACKTIME}" or die "Couldn't open ack file: $!";
      while (<DETAILS>) {
        chomp;
        if ( $_ =~ /^&/ ) {
          $_ =~ s/\&//;
          @DETLINE=split(/ /,$_);
          # Change colors to numbers red=2 yellow=1 anything else = 0
          if ( "$DETLINE[0]" eq "red" ) {
            $COL_VALUE = "2";
          } elsif ( "$DETLINE[0]" eq "yellow" ) {
            $COL_VALUE = "1";
          } else {
            $COL_VALUE = "0";
          }
          # Create the status file which will be used on subsequent runs.
          print OUTFILE "${COL_VALUE}:${DETLINE[1]}\n";
          print LOGFILE "DATE: ${COL_VALUE}:${DETLINE[1]}\n";
        }
      }
      close DETAILS;
      close OUTFILE;
    # We have already recorded the initial state of the test and saved it to a file.
    # Now we will check new status output with that file to see if the alerts have changed.
    } else {
      open DETAILS, "/home/xymon/server/bin/xymon 0 'xymondlog ${SERVERNAME}.${TESTTYPE}' |" or die "Couldn't execute: $!";
      while (<DETAILS>) {
        chomp;
        if ( $_ =~ /^&/ ) {
          $_ =~ s/\&//;
          @DETLINE=split(/ /,$_);
          # Change colors to numbers red=2 yellow=1 anything else = 0
          if ( "$DETLINE[0]" eq "red" ) {
            $COL_VALUE = "2";
          } elsif ( "$DETLINE[0]" eq "yellow" ) {
            $COL_VALUE = "1";
          } else {
            $COL_VALUE = "0";
          }
          push (@COMP_contents, "${COL_VALUE}:${DETLINE[1]}");
        }
      }
      close DETAILS;
      # Get the initial ack file that was created.
      open INITFILE, "<${ACKSDIR}/${SERVERNAME}${TESTTYPE}${COLOR}${ACKTIME}" or die "Couldn't open ack file: $!";
      while (<INITFILE>) {
        chomp;
        push (@INITFILE_contents, "$_");
      }
      close INITFILE;
      # Create a hash that contains the initial ack file.
      %INITF = map { $_ => 1 } @INITFILE_contents;
      foreach (@COMP_contents) {
        if ($INITF{$_}) {
          # No change to the alert - nothing to do.
          print LOGFILE "Alert hasn't changed: $_\n";
        } else {
          # Alert has changed in some form.
          print LOGFILE "Alert has changed: $_\n";
          @CURRENT=split(/:/,$_);
          $CUR_COLOR=$CURRENT[0];
          $CUR_TEST=$CURRENT[1];
          # Look up the same component in the state saved when the ack was first seen.
          @ACKD_EVENT=grep (/:\Q${CUR_TEST}\E$/, @INITFILE_contents);
          @ACK_EVENT=@ACKD_EVENT ? split(/:/,$ACKD_EVENT[0]) : ();
          $ACK_COLOR=@ACKD_EVENT ? $ACK_EVENT[0] : "";
          $ACK_TEST=@ACKD_EVENT ? $ACK_EVENT[1] : "";
          # A component that was not in the saved state counts as level 0.
          $ACK_LEVEL=($ACK_COLOR eq "") ? 0 : $ACK_COLOR;
          # Compare the current alert color with that which was saved initially.
          if ( $CUR_COLOR < $ACK_LEVEL ) {
            # New color is lower than initial color - leave ack alone.
            print LOGFILE "NO ACTION NEEDED (new level lower than ack level).\n";
          } elsif ( $CUR_COLOR > $ACK_LEVEL ) {
            # New color is greater than initial ack color, dump ack so new alerts can be sent.
            if ( $ACK_COLOR eq "" ) {
              # New alert not previously detected (different service, process, or disk alerting)
              print LOGFILE "ACK COLOR $ACK_COLOR\n";
              print LOGFILE "NEW ALERT - DISABLE ACK AND SEND NEW ALERT.\n";
              # Reset the server.test status to green. Next update will reset the alert condition
              # effectively canceling the acknowledge.
              open RESET, "/home/xymon/server/bin/xymon 0 'status+10 ${SERVERNAME}.${TESTTYPE} green Ack Reset New Alert Rcvd.' |" or die "Couldn't execute: $!";
              close RESET;
            } else {
              # Level of original alert has upgraded (typically yellow->red)
              print LOGFILE "ACK COLOR $ACK_COLOR\n";
              print LOGFILE "OLD ALERT - DISABLE ACK AND SEND NEW ALERT.\n";
              # Reset the server.test status to green. Next update will reset the alert condition
              # effectively canceling the acknowledge.
              open RESET, "/home/xymon/server/bin/xymon 0 'status+1 ${SERVERNAME}.${TESTTYPE} green Ack Reset Alert Level Changed.' |" or die "Couldn't execute: $!";
              close RESET;
            }
          } else {
            # Nothing to do here.
            print LOGFILE "NO ACTION NEEDED (new level equals ack level).\n";
          }
        }
      }
    }
  }
}
# When there are no ack'd alerts clean out the ACK status directory.
if ( $COUNT == 0 ) {
  unlink glob "${ACKSDIR}/*";
}

Thanks,
John
--
____*Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Sr. Systems Programmer
|| \\ and Health | user-46c89e614701@xymon.invalid - 973/972.0922 (2x0922)
|| \\ Sciences | OIT/EI-Academic Svcs. - ADMC 450, Newark
`'
list John Rothlisberger
Paul, you are correct. A better analogy may be for disk. If you monitor SQL servers you probably know that they can fill up drives quickly and often. If you have to wait for someone else to shrink a DB or remove old files, you can simply ack the alert. But while that alert is acknowledged, if the C drive fills up you won't be sent a new alert. The same can be said for disabling an alert: if you disable alerts for disk on a server and a drive fills up, you won't be notified. This script simply monitors the tests that have been acknowledged, checks whether anything has changed, and allows new alerts to flow.
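The check described above boils down to comparing the per-component levels saved at ack time with the current ones. Here is a minimal standalone sketch of that comparison. It assumes status lines of the form "&color component ...", which is what the script in this thread parses. The function names and the sample disk data are hypothetical, and it does not talk to Xymon at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %LEVEL = (red => 2, yellow => 1);   # anything else counts as 0

# Parse status text into { component => level }, using lines of the
# form "&color component ..." (assumed layout).
sub parse_components {
    my ($text) = @_;
    my %seen;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^&(\S+)\s+(\S+)/;
        $seen{$2} = $LEVEL{$1} // 0;
    }
    return \%seen;
}

# Return the components that are new, or worse than when the ack was made.
sub changed_for_worse {
    my ($acked, $current) = @_;
    my @worse = grep { $current->{$_} > ($acked->{$_} // 0) } keys %$current;
    return sort @worse;
}

# Hypothetical disk status: C: was red when acked, D: has since gone red.
my $at_ack = parse_components("&red C: 98% used\n&green D: 40% used\n");
my $now    = parse_components("&red C: 98% used\n&red D: 97% used\n");
print join(",", changed_for_worse($at_ack, $now)), "\n";   # prints "D:"
```

A non-empty result is the condition under which the ack would be reset. A component that improves or stays the same leaves the ack alone.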
Thanks,
John
Upcoming PTO:
(none)
John Rothlisberger
IT Strategy, Infrastructure & Security - Technology Growth Platform
TGP for Business Process Outsourcing
Accenture
XXX.XXX.XXXX office
This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy. www.accenture.com
list Henrik Størner
On 2014-03-10 15:51, user-7adce57665bb@xymon.invalid wrote:
This has been bugging me for years. Let's assume you ack an alert for the service "W3SVC" for 4 hours for maintenance, which you have stopped on purpose for whatever reason. Let's assume you also monitor services for SQL, Java, etc. If one of those also fails during the 4 hours of ack time - you won't get an alert. There are 3 tests that can have multiple components which I would like to know if new alerts arrive while a test has been ack'd: disk, procs, & svcs.
As I see it, the core of the problem with this is that Xymon currently bundles tests by the method in which they are tested, not by the "thing" that they test.
So you have a "procs" status containing the status of multiple processes. But these processes may not have anything to do with each other, so handling alerts and acks based on this combination-status causes problems.
Same issue with e.g. "disk" or "http" status.
As the old-timers here know, the reason for this is historical. That's a bad excuse, though, and it is something that needs changing. It is "in the pipeline"; since we have all of the rules for checking e.g. processes defined in analysis.cfg, we can also tell Xymon to put the analysis result in a different column than the "procs" column.
I just need to write the code to do that :-(
Regards,
Henrik
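To illustrate the bundling problem described above, here is a small sketch of why a second failure can go unnoticed. It rests on one assumption: a combined column shows the most severe color among its components. The function name and the process names are hypothetical, and nothing here calls Xymon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %RANK = (green => 0, yellow => 1, red => 2);

# Assumption: a combined column shows the most severe color among its components.
sub combined_color {
    my (%component) = @_;
    my $worst = "green";
    for my $color (values %component) {
        $worst = $color if ($RANK{$color} // 0) > $RANK{$worst};
    }
    return $worst;
}

# Hypothetical "procs" column covering two unrelated processes.
my $before = combined_color(httpd => "red", sqlservr => "green");
my $after  = combined_color(httpd => "red", sqlservr => "red");

# Both are "red": the column sees no change, so an ack on it stays in force.
print "before=$before after=$after\n";   # prints "before=red after=red"
```

With one column per monitored thing, the second failure would turn its own column red and alert independently of the ack on the first.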