Xymon Mailing List Archive

Xymon Dependencies configuration.

list Tom Diehl
Sat, 6 Jun 2020 18:16:48 -0400 (EDT)
Message-Id: <user-f0a8e7264b50@xymon.invalid>

On Sat, 6 Jun 2020, Ralph M wrote:
On Sat, Jun 6, 2020 at 3:36 PM <user-dcee455aaab0@xymon.invalid> wrote:
On Thu, 4 Jun 2020, Ralph M wrote:
On Thu, Jun 4, 2020 at 3:36 PM <user-dcee455aaab0@xymon.invalid> wrote:
Hi,

On Thu, 4 Jun 2020, Adam Thorn wrote:
On 03/06/2020 22:49, user-dcee455aaab0@xymon.invalid wrote:
 Hi,

 I am trying to configure xymon dependencies so that if the core router is
 down my xymon server only pages me for the core router.

 In reading the man page it says to do something like the following:

 1.2.3.4 cg1.example.com # noconn https://cg1.example.com depends=(http:router.example.com/conn)

 The above works for a single service but the above host for example has
 http and sslcert. How can I tell xymon that if router.example.com is down
 all of the other services for a host should go clear?

 I tried setting the service to a *, but that does not work, and I tried
 listing services separated with either a comma or a pipe, but no joy.
"man hosts.cfg" suggests that the syntax you want is

depends=(testA:host1/test1,host2/test2),(testB:host3/test3)

so for your example,

depends=(http:router.example.com/conn),(sslcert:router.example.com/conn)
That does not work for the sslcert test but does work for things like ssh,
which now makes sense given the info below.
As the man page says, "depends" only applies to tests performed by xymonnet.
Wildcards do not appear to be supported but protocols.cfg will show you most
of the tests that xymonnet might perform.
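
Since wildcards do not appear to be supported, one workaround is to generate the depends clause from a list of test names. The following is a minimal sketch, not something taken from the Xymon distribution: build_depends is a hypothetical helper, and, as noted in this thread, the resulting clause only takes effect for tests that xymonnet performs.

```shell
#!/bin/sh
# Hypothetical helper: build a hosts.cfg "depends" clause that makes
# every listed test depend on the same parent host/test pair.
# Usage: build_depends PARENT_HOST/PARENT_TEST TEST [TEST...]
build_depends() {
    parent=$1
    shift
    clause=""
    for t in "$@"; do
        # Separate the (test:parent) groups with commas.
        clause="${clause:+$clause,}($t:$parent)"
    done
    printf 'depends=%s\n' "$clause"
}

build_depends router.example.com/conn http ssh
# prints: depends=(http:router.example.com/conn),(ssh:router.example.com/conn)
```

The output can be pasted after the # on the host's line in hosts.cfg.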
Ok, that explains why neither the conn nor the sslcert test will go clear.
Neither test is listed in protocols.cfg. Given that both of these tests are
network type tests, it seems odd that they cannot be made to go clear on
failure of another network test. I guess I do not really understand how
Xymon works.

I was really hoping to be able to get a single alert when the router went
down. It does not happen real often but it is a pita to get several hundred
text messages for what is really a single failure.

Does anyone have a solution for these kinds of failures?
You could write an external script to connect to the router and "do stuff"
if the connection fails.

For example, if you're checking the router every 5 minutes, when it fails
you could send a "disable" message to Xymon for the list of things behind
the router, with a 10 minute lifetime.  That'll turn off alerts for all
those devices.  As long as the router continues to fail, keep on sending
disables with 10 min lifetime, essentially extending the original
lifetime.  Once the router recovers, the disable message will expire up to
10 mins later and those devices will alert or not depending on their next
status.

I don't have such a script, but it feels like it ought to be fairly trivial
to implement.
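
A rough sketch of such a script follows. It is untested against a live Xymon server and makes several assumptions: the xymon client lives at /usr/bin/xymon, the server is 127.0.0.1, a failed ping is a good enough sign that the router is down, and the "disable" message accepts comma-separated host names and "*" as the test name (check xymon(1) for your version). The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. XYMON and XYMSRV are assumed defaults; adjust as needed.
XYMON="${XYMON:-/usr/bin/xymon}"
XYMSRV="${XYMSRV:-127.0.0.1}"

# Xymon messages write host names with commas in place of dots.
xymon_hostname() {
    printf '%s' "$1" | tr '.' ','
}

# disable_message HOST DURATION REASON
# Builds a "disable" message covering all tests ("*") on HOST.
disable_message() {
    printf 'disable %s.* %s %s' "$(xymon_hostname "$1")" "$2" "$3"
}

# suppress_if_router_down ROUTER HOST [HOST...]
# If ROUTER does not answer a ping, disable each HOST for 10 minutes.
suppress_if_router_down() {
    router=$1
    shift
    if ping -c 3 "$router" >/dev/null 2>&1; then
        return 0    # router is up: let existing disables expire
    fi
    for host in "$@"; do
        "$XYMON" "$XYMSRV" \
            "$(disable_message "$host" 10m "router $router unreachable")"
    done
}

# Example call, e.g. from cron every 5 minutes (not run here):
# suppress_if_router_down router.example.com cg1.example.com
```

Running it every 5 minutes with a 10 minute duration means each run renews the disable while the router stays down, as described above.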
In preparation for writing a script to do what I need, I have been playing
with xymon commands.

If I send the following to xymon it appears to be ignoring the lifetime
parameter:
/usr/bin/xymon 127.0.0.1 "status+10m EMD1-2,example,com.conn clear `date` test message"

The above command will send a status message to xymon, but it only stays
clear for approx. 30 seconds. If I am reading the man page correctly, it
should stay clear for 10 minutes.
Does anyone know what I am missing?

Is Xymon pinging that host?  Its message would override your message.  Try
inventing a whole new column for the testing process, that way you can be
sure it shouldn't flip state unexpectedly.  Just replace .conn with any
other string that is not used for a test.
That is what I am seeing. It does not matter if the service is red or green:
set it to clear and it switches back to red or green in under 30 seconds.
In thinking about this, that makes sense. Originally I thought that the message
sent from the xymon command line would override the automatic updates for
as long as the lifetime on the message was set. Obviously I was wrong.
For example:

    /usr/bin/xymon 127.0.0.1 "status+10m EMD1-2,example,com.tdiehl clear `date` test message"

Best to keep it alphanumeric, but apart from that, use any word or random
string you like.  Bear in mind that the longer the string, the wider the
display, so you might start running off the edge of your display.  This may
or may not be important to you now, but if you add enough custom tests, the
sideways scrolling can be tiresome...  :)
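
For scripting, the status message above can be assembled by a small helper. This is only a sketch: status_message and xymon_hostname are hypothetical names, and the column name must be one that no real test reports to, otherwise the next automatic update will overwrite the status as described in this thread.

```shell
#!/bin/sh
# Xymon messages write host names with commas in place of dots.
xymon_hostname() {
    printf '%s' "$1" | tr '.' ','
}

# status_message HOST COLUMN COLOR LIFETIME TEXT...
# Builds "status+LIFETIME HOST.COLUMN COLOR TEXT" for the xymon client.
status_message() {
    host=$(xymon_hostname "$1")
    column=$2
    color=$3
    lifetime=$4
    shift 4
    printf 'status+%s %s.%s %s %s' "$lifetime" "$host" "$column" "$color" "$*"
}

# Example (not run here): a 10 minute clear status in a made-up column.
# /usr/bin/xymon 127.0.0.1 \
#     "$(status_message EMD1-2.example.com tdiehl clear 10m "$(date) test message")"
```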
That makes sense, but for my purposes I am thinking that instead of using
clear I will disable the test. Blue is still better than red. :-)

Thanks for the confirmation and help.

Regards,

-- 
Tom			user-dcee455aaab0@xymon.invalid