Xymon Mailing List Archive

Need some help with delayyellow / delayred

5 messages in this thread

list Grant Taylor · Fri, 09 Aug 2024 10:39:47 -0500 ·
Hi,

I need some help understanding delayyellow and delayred in Xymon 4.3.30 compiled from source (not distro).

I've got some equipment that I'm pinging (conn) and checking web (http) on BMCs (Dell iDRAC and Oracle ILOM) and am having a LOT of trouble with very short-lived failures.  That is, a host fails a test and then the xymonnet (?) retries succeed within the next minute.

So I've tried enabling delayyellow and delayred, first with 5 minutes and then with 10 minutes.  But I'm still seeing color changes and receiving email notifications.

My hosts.cfg entries look like this:

    192.0.2.1	test-client-ilom	# NAME:Test-Client-ILOM ssh delayred=conn:10,ssh:10 delayyellow=conn:10,ssh:10

These are placeholders, but they are straight find-and-replace substitutions.

Am I incorrect in thinking that delayyellow / delayred will cause failing test results to be ignored for the length of the delay value /before/ the color changes?

I suppose that I can switch from delayyellow / delayred in hosts.cfg to their counterpart in the alerts.cfg file and not send the email(s).  But I'd rather the page not change colors, as we keep a screen open to it, and I'd really rather it not go false-yellow / false-red only to change back to green < 5 minutes later.

I'm sure that I'm missing something and am hoping that someone will help me figure things out and learn.

Thank you and have a good day.


-- 
Grant. . . .
unix || die
list Tom Schmidt · Fri, 09 Aug 2024 10:43:51 -0600 ·
Since you are using the delayred and delayyellow options for network tests (conn and ssh), the xymonnet-again.sh script will retest them every minute for up to 30 minutes (see man page for hosts.cfg).  So your "delayred=conn:10,ssh:10 delayyellow=conn:10,ssh:10" option should delay changing these tests to red/yellow for up to 10 minutes after the first failure.   I believe you have it configured correctly.  Is the test perhaps flapping?  There is also a noflap option that could be used if that is the case.  I have a few systems that I use the delayred/delayyellow options on and they appear to be working as expected, such as this:
    0.0.0.0  google.com  # ?conn https://google.com/ sni HIDEHTTP delayred=http:10 delayyellow=http:10

Tom
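
The delay behavior described above can be modeled with a small Python sketch. This is only an illustration of the expected effect, not Xymon's actual implementation: the function name `displayed_colors` is hypothetical, it assumes exactly one test result per minute, and it applies the same delay to red and yellow.

```python
def displayed_colors(results, delay_minutes):
    """Model of a delayred/delayyellow-style delay (not Xymon's real code).

    results: raw test colors, one entry per minute ("green", "yellow", "red").
    Returns the color that would be shown for each minute if a non-green
    result is only displayed after it has persisted for delay_minutes.
    """
    shown = []
    bad_since = None  # minute index at which the current failure run began
    for minute, color in enumerate(results):
        if color == "green":
            bad_since = None
            shown.append("green")
            continue
        if bad_since is None:
            bad_since = minute
        # Hold the display at green until the failure is old enough.
        if minute - bad_since >= delay_minutes:
            shown.append(color)
        else:
            shown.append("green")
    return shown


# A two-minute "burp" with a 10-minute delay never leaves green.
print(displayed_colors(["green", "red", "red", "green"], 10))
# → ['green', 'green', 'green', 'green']
```

Under this model, a column that turns red for less than 2 minutes with a 10-minute delay configured suggests the delay is not being applied at all, rather than that the delay value is too short.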

list Grant Taylor · Fri, 09 Aug 2024 12:36:36 -0500 ·
On 8/9/24 11:43 AM, Tom Schmidt wrote:
> Since you are using the delayred and delayyellow options for network tests (conn and ssh), the xymonnet-again.sh script will retest them every minute for up to 30 minutes (see man page for hosts.cfg).

ACK

I thought it would re-test every minute for the first 5 minutes, but 30 minutes is cool too.  #TIL

> So your "delayred=conn:10,ssh:10 delayyellow=conn:10,ssh:10" option should delay changing these tests to red/yellow for up to 10 minutes after the first failure.

That's what I thought, and the behavior I was trying to achieve.

> I believe you have it configured correctly.

Thank you for the 2nd set of eyes.

> Is the test perhaps flapping?  There is also a noflap option that could be used if that is the case.

No, I don't think so.

It seems like the tests (conn/ping and/or http/https) periodically (once an hour or so, for the sake of discussion) fail and Xymon causes the associated column to go red.

It almost always goes green again a minute or two after it went red. Time stamps in alert emails are usually one minute apart.  Sometimes they are in the same minute, or up to three minutes apart.

Test history shows that it was red for < 2 minutes.

It's just older / slower / cantankerous hardware that occasionally burps and fails a test.

I don't care about onesie-twosie test failures.  I care about when it's been failing for 10-15 minutes.

Well ... I prefer no delay<COLOR>.  But I'd rather not have color changes for burps on the known problematic systems.  --  I hope that makes sense.

> I have a few systems that I use the delayred/delayyellow options on and they appear to be working as expected, such as this:

ACK

> 0.0.0.0 google.com  # ?conn https://google.com/ sni HIDEHTTP delayred=http:10 delayyellow=http:10

I'm not sure what the question mark in front of the conn does.  I think sni causes the test to use Server Name Indication, which it doesn't do by default.  The delayred / delayyellow is what I'm trying to get to work.

I wonder if the syntax isn't correct with the comma separating multiple tests.  I'll try the following and see if that improves things:

    delayred=conn:10 delayyellow=conn:10

Thank you Tom.  :-)


-- 
Grant. . . .
unix || die
list Jeremy Laidman · Sat, 10 Aug 2024 12:54:51 +1000 ·
The comma format is valid according to the doco. However, I've never seen that usage before, so trying separate entries is a good idea.

Regarding the "?":

"By prefixing a test with "?" errors will be reported with a "clear" status instead of red. This is known as a test for a "dialup" service, and allows you to run tests of hosts that are not always online, without getting alarms while they are off-line."

J
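
For readers unsure how the comma form is meant to be read, here is a minimal Python sketch that splits one such option into per-column delays. It assumes the NAME=COLUMN:MINUTES[,COLUMN:MINUTES...] shape seen in the examples in this thread; `parse_delay_option` is a hypothetical helper and says nothing about how Xymon itself parses hosts.cfg.

```python
def parse_delay_option(token):
    """Split e.g. 'delayred=conn:10,ssh:10' into a name and a {column: minutes} dict.

    Illustrative only: the grammar is inferred from the examples in this
    thread, and this is not Xymon's own parser.
    """
    name, _, value = token.partition("=")
    delays = {}
    for item in value.split(","):
        column, _, minutes = item.partition(":")
        delays[column] = int(minutes)
    return name, delays


print(parse_delay_option("delayred=conn:10,ssh:10"))
# → ('delayred', {'conn': 10, 'ssh': 10})
```

Read this way, the comma form and a single-column form such as delayred=conn:10 express the same kind of setting, so the comma by itself would not be expected to change the behavior.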

list Grant Taylor · Mon, 12 Aug 2024 16:13:21 -0500 ·
On 8/9/24 9:54 PM, Jeremy Laidman wrote:
> The comma format is valid according to the doco. However I've never seen that usage before, so trying separate entries is a good idea.

Well separate entries didn't seem to work.

    192.0.2.1    test-client-ilom    # NAME:Test-Client-ILOM ssh delayred=conn:10 delayred=ssh:10 delayyellow=conn:10 delayyellow=ssh:10

> Regarding the "?":
>
> "By prefixing a test with "?" errors will be reported with a "clear" status instead of red. This is known as a test for a "dialup" service, and allows you to run tests of hosts that are not always online, without getting alarms while they are off-line."

Thank you.

I suspect that other tests (e.g. ssh) will go clear if conn doesn't respond.  But I'll have to test what happens if conn is good but ssh would otherwise go red.


-- 
Grant. . . .
unix || die