Ongoing problem with multiple recovery notices on a few tests
list Larry Bonham
I'm having an ongoing problem with multiple recovery notices. I was really hoping that 4.3.21 would have fixed it but not so.
The alert picks up the correct rule and stops at line 473. Recover hits that rule then continues on down the list. It will then duplicate on a secondary matching rule line 482 and then on my catch all default rule line 618. This isn't on all alert recoveries. Mostly appears to only happen on "server" names that have an underscore or dash in it. That may just be a coincidence.
Attached is a snapshot of my email notices showing the problem. It's really a small percentage of my total alerts that do this. But that subset will do it consistently. There has to be something different about that group.
I got a response from J.C. on this around 3/4/15. He and I agreed that the problem is most likely in lib/loadalerts.c. Which is a pretty complicated piece of code.
ANYWAY, I wanted to see if anyone else is experiencing this and, if so, were you able to adjust or work around it. It isn't a major problem. Just an annoyance.
RHEL 6.6
Xymon 4.3.21
From hosts.cfg
0.0.0.0 ccs_menu.qa.ccsic.fni-stl.com # noconn nosslcert https://qa.ccsic.fni-stl.com/cgi-bin/xxx.pl DESCR:"CCS:Testing ccs_menu.pl response on qa.ccsic.fni-stl.com" DOWNTIME=12345:0000:0659,12345:1701:2359,06:0000:2359
Relevant section from alerts.cfg
### My email screen snap shot represents user-dec3fc4ceca4@xymon.invalid
### macro to stop further rule checking. Also tried IGNORE. Same results.
180 $STOP=SCRIPT xymon-ignore.sh none FORMAT=SCRIPT STOP
470 PAGE=%url/CCS HOST=%ccs_menu.(qa|test|launch).ccsic.fni-stl.com EXSERVICE=sslcert
471 MAIL user-a6de924e57a5@xymon.invalid DURATION>2 REPEAT=60 RECOVERED
472 MAIL user-dec3fc4ceca4@xymon.invalid DURATION>2 REPEAT=60 RECOVERED
473 $STOP <-- when alerting on fail it always stops here. But recovery notices keep going.
### catch all rule for anything not handled above.
480 PAGE=%url/.* EXSERVICE=sslcert
481 MAIL user-c3b701b2fb96@xymon.invalid DURATION>2 REPEAT=60 RECOVERED COLOR=yellow,red
482 MAIL user-dec3fc4ceca4@xymon.invalid DURATION>2 REPEAT=60 RECOVERED COLOR=yellow,red
... Email other users. Line format identical to above.
490 SCRIPT xymon-page.sh grp1 DURATION>2 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red
491 SCRIPT xymon-page.sh grp3 DURATION>2 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red,purple
492 $STOP
### catch all rule for anything not handled above.
616 HOST=*
617 MAIL user-a6de924e57a5@xymon.invalid REPEAT=1440 RECOVERED COLOR=yellow,red
618 MAIL user-dec3fc4ceca4@xymon.invalid REPEAT=1440 RECOVERED COLOR=yellow,red
619 SCRIPT xymon-page.sh grp3 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red
620 $STOP
Larry B.
CONFIDENTIALITY NOTICE:
This electronic mail message is intended exclusively for recipient to which it is addressed. The contents of this message and any attachments may contain confidential and privileged information. Any unauthorized review, use, print, storage, copy, disclosure or distribution is strictly prohibited. If you have received this message in error, please advise the sender immediately by replying to the message's sender and delete all copies of this message and its attachments without disclosing the contents to anyone, or using the contents for any purpose.
list Larry Bonham
I apologize if this mostly duplicates my message from earlier today. That one was too large because of an attachment, and the example configuration I provided contained some incorrect information. This one is smaller and correct.
I'm having an ongoing problem with multiple recovery notices. I was really hoping that 4.3.21 would fix it, but it did not.
The alert picks up the correct rule and stops at line 468. The recovery hits that rule, then continues on down the list. It is then duplicated on a secondary matching rule at line 482 and again on my catch-all default rule at line 618. This doesn't happen on all alert recoveries. It mostly appears to happen only on "server" names that have an underscore or dash in them, though that may just be a coincidence.
Here is an example of recent email notices showing the problem. Only a small percentage of my total alerts do this, but that subset does it consistently. There has to be something different about that group.
From       Subject                                                               Received
Xymon STL  Xymon [839283] qa_ccsic_ccs_red_alert:http CRITICAL (RED) [cfid:467]  7:11 AM
Xymon STL  Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:467]                7:40 AM
Xymon STL  Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:482]                7:40 AM
Xymon STL  Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:618]                7:40 AM
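To find which host/test pairs are affected across a larger mailbox, one option is to group recovery subjects by host:test and flag any pair that recovered under more than one cfid. The following is a minimal Python sketch, assuming the subject layout seen in the notices above. The function name `duplicate_recoveries` and the regular expression are illustrative and not part of Xymon.

```python
import re
from collections import defaultdict

# Subject lines copied from the notices listed above.
SUBJECTS = [
    "Xymon [839283] qa_ccsic_ccs_red_alert:http CRITICAL (RED) [cfid:467]",
    "Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:467]",
    "Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:482]",
    "Xymon qa_ccsic_ccs_red_alert:http recovered [cfid:618]",
]

# Assumed subject layout: "Xymon <host>:<test> recovered [cfid:<n>]".
RECOVERY_RE = re.compile(r"^Xymon (\S+):(\S+) recovered \[cfid:(\d+)\]$")


def duplicate_recoveries(subjects):
    """Return {(host, test): [cfid, ...]} for pairs recovered via several cfids."""
    seen = defaultdict(list)
    for subject in subjects:
        match = RECOVERY_RE.match(subject)
        if match:
            host, test, cfid = match.groups()
            seen[(host, test)].append(int(cfid))
    return {key: cfids for key, cfids in seen.items() if len(cfids) > 1}


if __name__ == "__main__":
    for (host, test), cfids in duplicate_recoveries(SUBJECTS).items():
        print(f"{host}:{test} recovered via cfid {cfids}")
```

The sketch ignores timestamps, so separate incidents for the same host:test would be lumped together. For a real mailbox the grouping key would probably need the received time as well.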
I got a response from J.C. on this around 3/4/15. He and I agreed that the problem is most likely in lib/loadalerts.c, which is a pretty complicated piece of code.
Anyway, I wanted to see whether anyone else is experiencing this and, if so, whether you were able to adjust or work around it. It isn't a major problem, just an annoyance.
RHEL 6.6
Xymon 4.3.21
From hosts.cfg
0.0.0.0 qa_ccsic_ccs_red_alert # noconn nosslcert https://qa.ccsic.fni-stl.com/cgi-bin/xxx_alert.pl
Relevant section from alerts.cfg
### My email results represent user-dec3fc4ceca4@xymon.invalid
### macro to stop further rule checking. Also tried IGNORE. Same results.
180 $STOP=SCRIPT xymon-ignore.sh none FORMAT=SCRIPT STOP
465 PAGE=%url/CCS HOST=%(qa|test|launch)_ccsic_ccs.*_(redalert|red_alert) EXSERVICE=sslcert
466 MAIL user-a6de924e57a5@xymon.invalid DURATION>2 REPEAT=60 RECOVERED
467 MAIL user-dec3fc4ceca4@xymon.invalid DURATION>2 REPEAT=60 RECOVERED
468 $STOP <-- when alerting on fail it always stops here. But recovery notices keep going.
### catch all rule for the url page and not handled above.
480 PAGE=%url/.* EXSERVICE=sslcert
481 MAIL user-c3b701b2fb96@xymon.invalid DURATION>2 REPEAT=60 RECOVERED COLOR=yellow,red
482 MAIL user-dec3fc4ceca4@xymon.invalid DURATION>2 REPEAT=60 RECOVERED COLOR=yellow,red
... Email other users. Line format identical to above.
490 SCRIPT xymon-page.sh grp1 DURATION>2 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red
491 SCRIPT xymon-page.sh grp3 DURATION>2 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red,purple
492 $STOP
### catch all rule for anything not handled above.
616 HOST=*
617 MAIL user-a6de924e57a5@xymon.invalid REPEAT=1440 RECOVERED COLOR=yellow,red
618 MAIL user-dec3fc4ceca4@xymon.invalid REPEAT=1440 RECOVERED COLOR=yellow,red
619 SCRIPT xymon-page.sh grp3 FORMAT=SCRIPT REPEAT=60 RECOVERED COLOR=red
620 $STOP
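To make the expected behaviour concrete, here is a rough Python model of the rule walk I would expect for the excerpt above: rules are checked top-down, and the first $STOP reached inside a matching rule ends the walk. This is only a sketch of the intended semantics, not how lib/loadalerts.c actually works. It assumes the HOST regex is matched unanchored and case-insensitively, assumes the host sits on a page that matches both PAGE patterns, and leaves out EXSERVICE, COLOR, DURATION, REPEAT and the elided "other users" lines. The names `expected_cfids` and `unexpected_cfids` are made up for illustration.

```python
import re

# HOST pattern copied from line 465 of the excerpt (without the leading %).
HOST_PATTERN = r"(qa|test|launch)_ccsic_ccs.*_(redalert|red_alert)"
# Stand-in for PAGE=%url/.* and HOST=* (assumption: both match this host).
MATCH_ANY = r".*"

# Simplified model: each rule is (host regex, [(cfid, recipient, is_stop), ...]).
RULES = [
    (HOST_PATTERN, [(466, "user-a6de924e57a5", False),
                    (467, "user-dec3fc4ceca4", False),
                    (468, "$STOP", True)]),
    (MATCH_ANY, [(481, "user-c3b701b2fb96", False),
                 (482, "user-dec3fc4ceca4", False),
                 (490, "grp1", False),
                 (491, "grp3", False),
                 (492, "$STOP", True)]),
    (MATCH_ANY, [(617, "user-a6de924e57a5", False),
                 (618, "user-dec3fc4ceca4", False),
                 (619, "grp3", False),
                 (620, "$STOP", True)]),
]


def expected_cfids(hostname, rules=RULES):
    """Walk rules top-down and stop at the first $STOP inside a matching rule."""
    hits = []
    for pattern, recipients in rules:
        if not re.search(pattern, hostname, re.IGNORECASE):
            continue
        for cfid, _recipient, is_stop in recipients:
            if is_stop:
                return hits
            hits.append(cfid)
    return hits


def unexpected_cfids(hostname, observed):
    """Return the observed cfids that the model above would not have produced."""
    expected = set(expected_cfids(hostname))
    return [cfid for cfid in observed if cfid not in expected]


if __name__ == "__main__":
    host = "qa_ccsic_ccs_red_alert"
    print(expected_cfids(host))
    print(unexpected_cfids(host, [467, 482, 618]))
```

Under these assumptions the model yields cfids 466 and 467 for qa_ccsic_ccs_red_alert, so the recoveries seen at 482 and 618 are the ones that should not have been sent. The host name does match the line 465 pattern in this model, which suggests the underscore itself is not stopping the first rule from matching.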
Larry B.
list Sebastian Auriol
Yes, I'm also having this issue, although I haven't investigated it recently and this is with an older version of Xymon. I was hoping it would be fixed in 4.3.21 too!
Kind regards,
SebA
-----Original Message-----
From: Xymon [mailto:xymon-bounces at xymon.com] On Behalf Of Larry Bonham
Sent: 11 September 2015 20:49
To: xymon at xymon.com
Subject: [Xymon] Ongoing problem with multiple recovery notices with on afew tests