On Wed, October 7, 2015 11:58 pm, Gavin Stone-Tolcher wrote:
Hi, we are seeing unusual alerting behaviour with a Xymon 4.3.21 server
using a "holidays.cfg" with HOLIDAYLIKEWEEKDAY=0.
We have a network operations team (uqnoc-sms) that gets alerts during
business hours (TIME=W:0800:1700),
and a data networks team (dn-sms) that gets out-of-business-hours alerts in
certain windows (TIME=W:0600:0759,W:1701:2200,60:0600:2200).
Rules are like:
PAGE=$UNSMSREGEX EXHOST=$UNEXCLUDE
MAIL user-760dce092658@xymon.invalid SERVICE=$UNSMSSVCS DURATION>6m TIME=W:0800:1700 COLOR=red REPEAT=1w FORMAT=SMS RECOVERED
MAIL user-c6d0660c5139@xymon.invalid SERVICE=$UNSMSSVCS DURATION>6m TIME=W:0600:0759,W:1701:2200,60:0600:2200 COLOR=red REPEAT=1w FORMAT=SMS RECOVERED
For a "red" conn test covered by the rule on a weekday public holiday, it
seems to correctly send no alert to "uqnoc-sms" (TIME=W:0800:1700) and
instead correctly generates an alert to "dn-sms" (the TIME=60:0600:2200
component), but then keeps sending the same alert approximately every
minute (my xymonnet poll cycle). Is it ignoring REPEAT=1w?
Before I try and debug much further, I thought I would ask if anyone else
has seen similar behaviour?
Hmm. Does the REPEAT value work with a smaller interval (such as 1d or
1h)? And what type of system are you running on?
I'm curious if there's a REPEAT over/underflow going on instead of
something specific to the TIME exclusion back and forth.
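
For a rough sense of the numbers involved, here is a small Python sketch
that converts REPEAT-style durations to seconds and compares them with the
signed 32-bit limit. It is not Xymon's own parser: duration_seconds and
fits_int32 are hypothetical helpers, and treating a bare number as minutes
is my assumption about the config syntax.

```python
# Hypothetical helpers, not Xymon's own code: convert a duration such as
# "6m", "1h", "1d" or "1w" to seconds.
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 604800}
INT32_MAX = 2**31 - 1


def duration_seconds(text):
    """Return the duration in seconds; a bare number is assumed to be minutes."""
    text = text.strip()
    if text and text[-1] in UNIT_SECONDS:
        return int(text[:-1]) * UNIT_SECONDS[text[-1]]
    return int(text) * 60


def fits_int32(text):
    """True if the duration in seconds fits in a signed 32-bit integer."""
    return duration_seconds(text) <= INT32_MAX


for value in ("1h", "1d", "1w"):
    print(value, duration_seconds(value), fits_int32(value))
```

One week is 604800 seconds, far below 2**31 - 1, so a plain 32-bit overflow
of the interval itself looks unlikely. Trying 1d or 1h is still a cheap way
to see whether the interval size matters at all.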
Is the test persistently red with no spurious recoveries being generated
during the period in question?
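
To double-check the TIME side independently of Xymon, a sketch like the one
below can evaluate both TIME specs for a given moment. It is only a model
of how I read the report, not the server's implementation: time_matches and
day_matches are hypothetical names, the end time is assumed to be inclusive,
and a holiday is assumed to match like a Sunday (day 0) when
HOLIDAYLIKEWEEKDAY=0.

```python
from datetime import datetime


def day_matches(days, weekday, is_holiday, holiday_like_weekday=False):
    """weekday uses 0=Sunday .. 6=Saturday, as in the TIME day digits."""
    if is_holiday and not holiday_like_weekday:
        weekday = 0  # assumption: a holiday is treated like a Sunday
    if days == "*":
        return True
    if days == "W":
        return 1 <= weekday <= 5
    return str(weekday) in days


def time_matches(spec, when, is_holiday=False, holiday_like_weekday=False):
    """True if any comma-separated days:start:end part covers 'when'."""
    weekday = (when.weekday() + 1) % 7  # Python has Monday=0; shift to Sunday=0
    hhmm = when.strftime("%H%M")
    for part in spec.split(","):
        days, start, end = part.split(":")
        if (day_matches(days, weekday, is_holiday, holiday_like_weekday)
                and start <= hhmm <= end):  # end assumed inclusive
            return True
    return False


UQNOC = "W:0800:1700"
DN = "W:0600:0759,W:1701:2200,60:0600:2200"
when = datetime(2015, 10, 7, 10, 0)  # a Wednesday, 10:00

print("holiday:", time_matches(UQNOC, when, True), time_matches(DN, when, True))
print("normal: ", time_matches(UQNOC, when), time_matches(DN, when))
```

Under these assumptions the holiday case selects only dn-sms and the normal
weekday case selects only uqnoc-sms, which matches what was reported. If the
recipient selection is stable like this for the whole window, the TIME rules
alone would not explain the per-minute repeats.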
-jc