Xymon Mailing List Archive

Green status

Steven Carr
Tue, 14 Aug 2012 19:33:15 +0100
Message-Id: <user-27c68cf84b83@xymon.invalid>

You could also mark the nodes with the "dialup" flag, which allows them to
go down without Xymon complaining. You will still have to write your own
server-side script that determines whether a host is down when it shouldn't
be, and raises an alert for it. Xymon can't do that out of the box, and I'm
not sure any other monitoring system can either; most monitoring solutions
expect nodes to be up 100% of the time and only make exceptions for
scheduled downtime and the like.
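
A minimal sketch of the decision such a server-side script has to make,
assuming you already have two sets of node names: the nodes the scheduler
has powered down and the nodes that currently don't respond. The function
and parameter names are made up for illustration; nothing here is part of
Xymon.

```python
from typing import Dict, Iterable, Set


def classify_nodes(
    all_nodes: Iterable[str],
    expected_down: Set[str],
    unreachable: Set[str],
    expected_down_color: str = "green",
) -> Dict[str, str]:
    """Map each node to the color a custom test would report.

    expected_down: nodes the scheduler powered down on purpose (assumed input).
    unreachable:   nodes that did not answer a check (assumed input).
    """
    colors: Dict[str, str] = {}
    for node in all_nodes:
        if node not in unreachable:
            colors[node] = "green"              # node is up
        elif node in expected_down:
            colors[node] = expected_down_color  # down, but on purpose
        else:
            colors[node] = "red"                # down and it shouldn't be
    return colors


if __name__ == "__main__":
    result = classify_nodes(
        ["node01", "node02", "node03"],
        expected_down={"node02"},
        unreachable={"node02", "node03"},
    )
    for node, color in sorted(result.items()):
        print(node, color)
```

Only the last branch needs to raise an alert; the color for nodes that are
down on purpose is a parameter because the original request asked for green.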

Steve


On 14 August 2012 19:13, pankaj dorlikar <user-93d3572686c4@xymon.invalid> wrote:
Hi,

Thanks for the reply. We have clients installed on all the nodes. At any
point in time, the nodes that are not running a job will be powered
down. If a new job comes in, these nodes will be powered up, and other
nodes that are not running any job will go down.


On 8/14/12, Steven Carr <user-923b20c0d620@xymon.invalid> wrote:
How are you monitoring the nodes? Do you have a Xymon client on each of
the nodes, or are you doing a simple "ping" check to the node?

If you are just doing a simple ping check then, off the top of my head, I
would make all nodes "noconn" in hosts.cfg so Xymon doesn't ping them
itself anymore. Then write a script that uses the data you have to ping
the nodes and work out whether each node should be up or not. If a node
is down and it shouldn't be, trigger a red alarm for that node.
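
A rough sketch of how such a script might raise the red alarm, assuming the
usual Xymon status message layout ("status HOSTNAME.TESTNAME COLOR text",
with dots in the hostname written as commas, as in the xymon man page
examples) and the default Xymon port 1984. The column name "nodestate" and
the host names are invented for the example.

```python
import socket

XYMON_PORT = 1984  # default xymond port; adjust if your server differs
VALID_COLORS = {"green", "yellow", "red", "clear"}


def build_status_message(hostname: str, testname: str, color: str, text: str) -> str:
    """Build a Xymon status message for one column on one host."""
    if color not in VALID_COLORS:
        raise ValueError(f"unsupported status color: {color}")
    # Dots in the hostname become commas so the last dot separates the test name.
    host_field = hostname.replace(".", ",")
    return f"status {host_field}.{testname} {color} {text}"


def send_to_xymon(server: str, message: str, port: int = XYMON_PORT,
                  timeout: float = 10.0) -> None:
    """Send one message to the Xymon server over TCP (no reply expected)."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal end of message


if __name__ == "__main__":
    msg = build_status_message(
        "node03.cluster.example", "nodestate", "red",
        "Node is down but the scheduler did not power it off",
    )
    print(msg)
    # send_to_xymon("xymon.example", msg)  # uncomment with your server name
```

The same message can also be handed to the xymon (formerly bb) command line
tool instead of opening the socket yourself.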

Steve


On 14 August 2012 12:05, pankaj dorlikar <user-93d3572686c4@xymon.invalid> wrote:
---------- Forwarded message ----------
From: pankaj dorlikar <user-93d3572686c4@xymon.invalid>
Date: Tue, 14 Aug 2012 16:34:00 +0530
Subject: Re: [Xymon] Green status
To: Ryan Novosielski <user-ae4522577e16@xymon.invalid>

Hi,

Thank you for the reply.
But at any point in time, only some of the nodes will be down and all
the other nodes will be up. If the server itself goes down, monitoring
of the remaining working nodes will be affected.

On 8/14/12, Ryan Novosielski <user-ae4522577e16@xymon.invalid> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

What he is saying is that if there is an event that takes place where
you can execute a script at the time it happens, you can disable the
server by using the main binary's "disable" function. This binary used
to be called "bb" but is now called "xymon" -- take a look at its man
page to see how to send a disable message.

On 08/14/2012 03:55 AM, pankaj dorlikar wrote:
Hi,

Thank you for providing pointers and important clues.

1) Query regarding the "server-side test": We can find out which nodes
are down on the scheduler's instructions. But how will this
information help in setting the blue/green color for those nodes on
the Xymon web page? I mean, how do we send this data to the Xymon
server? Also, will it cover all the tests?

2) How can a client send a "disable" command to the server?

thank you

-pankaj


On 8/14/12, user-87556346d4af@xymon.invalid <user-87556346d4af@xymon.invalid> wrote:
We are using xymon-4.2.2 on a RHEL 5.2 server with more than 200
clients (HPC cluster nodes).

Our requirement is:

-> If a node is powered down by the scheduler to save power, Xymon
should show its state as green, and likewise for the other tests on
that node.

Nodes powered down by the scheduler can be identified with the
pbsnodes command, which shows their state as "power".

-> If a node goes down for any reason other than being powered down
by the scheduler, it should show red like normal clients.
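
For illustration, one way to pull the powered-down nodes out of pbsnodes
output. This assumes the common layout where each node name sits on an
unindented line followed by indented "key = value" attribute lines; the
sample text below is made up, and the "power" state string is taken from the
description above, so check both against what your scheduler really prints.

```python
from typing import Dict, Set


def parse_pbsnodes_states(output: str) -> Dict[str, str]:
    """Return {node name: state string} from pbsnodes-style text."""
    states: Dict[str, str] = {}
    current = None
    for line in output.splitlines():
        if not line.strip():
            current = None                # blank line ends a node block
        elif not line[0].isspace():
            current = line.strip()        # unindented line: node name
        elif current is not None:
            key, sep, value = line.strip().partition("=")
            if sep and key.strip() == "state":
                states[current] = value.strip()
    return states


def nodes_powered_down(states: Dict[str, str], marker: str = "power") -> Set[str]:
    """Nodes whose (possibly comma-separated) state includes the marker."""
    return {node for node, state in states.items()
            if marker in [part.strip() for part in state.split(",")]}


# Invented sample in the assumed layout, not captured from a real cluster.
SAMPLE = """node01
     state = free
     np = 8

node02
     state = power
     np = 8
"""

if __name__ == "__main__":
    print(sorted(nodes_powered_down(parse_pbsnodes_states(SAMPLE))))
```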
Assuming your scheduler can have shell script hooks attached to
events, I'd add something to send a "disable" command before it
brings a node down, and then re-enable the node as it comes back up.
If the nodes are being powered down without state being saved (e.g.,
not suspending/resuming themselves), then just disable "until
OK"; otherwise I'd use some arbitrary future value.
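
A sketch of what such hooks might send, assuming the message formats
described in the xymon(1) man page for recent releases:
"disable HOSTNAME.TESTNAME DURATION text", with "*" as the test name for all
tests and a duration of -1 meaning "until OK", and "enable
HOSTNAME.TESTNAME" to undo it. It is worth checking the man page of the
version you run (the 4.2.x "bb" binary) before relying on "-1". The helper
names, host names and server address are placeholders.

```python
from typing import List, Optional


def _host_field(hostname: str) -> str:
    # Dots in the hostname are written as commas in Xymon messages.
    return hostname.replace(".", ",")


def build_disable_message(hostname: str, reason: str,
                          duration_minutes: Optional[int] = None,
                          testname: str = "*") -> str:
    """Disable one test (or all, with '*'); None means 'until OK' (-1)."""
    duration = "-1" if duration_minutes is None else str(duration_minutes)
    return f"disable {_host_field(hostname)}.{testname} {duration} {reason}"


def build_enable_message(hostname: str, testname: str = "*") -> str:
    """Re-enable one test (or all, with '*') for a host."""
    return f"enable {_host_field(hostname)}.{testname}"


def xymon_command(server: str, message: str, binary: str = "xymon") -> List[str]:
    """Argument list for the command line tool: BINARY RECIPIENT MESSAGE."""
    return [binary, server, message]


if __name__ == "__main__":
    down = build_disable_message("node02.cluster.example",
                                 "Powered down by scheduler")
    up = build_enable_message("node02.cluster.example")
    # A hook could pass these lists to subprocess.run(); here we only print them.
    print(xymon_command("192.0.2.10", down))
    print(xymon_command("192.0.2.10", up))
```

On 4.2.2 the binary argument would be "bb" rather than "xymon".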

Relevant tests will be blue (not green, as requested), but that
will be handled as a non-event for SLA purposes.

Separately, it might be a good idea to have a server-side test that
sends state about each node to Xymon independently of the node
itself. That test is a fine place to put logic as well.


HTH,

-jc

- --
- ---- _  _ _  _ ___  _  _  _
|Y#| |  | |\/| |  \ |\ |  | |Ryan Novosielski - Sr. Systems Programmer
|$&| |__| |  | |__/ | \| _| |user-ae4522577e16@xymon.invalid - 973/972.0922(2-0922)
\__/ Univ. of Med. and Dent.|IST/EI-Academic Svcs. - ADMC 450, Newark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlAqFsEACgkQmb+gadEcsb53xACfVP9x3ThR0zKtrYFVfVhHzJoI
JNQAoLUaRTt3AcQmrhoArknmclS7WkPw
=jBNe
-----END PGP SIGNATURE-----
--
Pankaj V. Dorlikar

