Xymon Digest, Vol 111, Issue 17

list Gary Allen Vollink
Mon, 27 Apr 2020 07:54:23 -0400
Message-Id: <user-b6f5250ab2c5@xymon.invalid>

---------- Forwarded message ----------
From: Adam Goryachev <user-92fd6827f6ae@xymon.invalid>
On 27/4/20 05:06, Gary Allen Vollink wrote:

Hi all,

I have a configuration which uses RAID meta-devices set up as raid1 over
empty slots for GUI configuration and notification.  As such, I have md0
and md1 showing up as fatal errors in Xymon.  Again, this setup is standard
for this installation.  md2 and up are all normal, valid (and actually
hold mounted filesystems).

I'd normally expect to be able to set up analysis.cfg to "something
something IGNORE" for this machine.  Like:

HOST=vault.home.vollink.com
    RAID md0 IGNORE
    RAID md1 IGNORE

Does such a thing exist (and I missed it/have the syntax wrong?)  If not,
/could/ such a thing exist?

I'm starting to become used to just having a RED screen (and that is
dangerous).

If the answer to the above is all, 'no,' then what is the best way to
ignore all RAID for that machine?

Thank you much for any thoughts,
Gary

You will need to share your /proc/mdstat and/or a pointer to which ext
script you are using to monitor your md RAID. I suspect that your RAID
arrays are defined as a two-member RAID1 with one missing member, so
they would be expected to show as red, because they are failed.

You could either define the RAID arrays as RAID1 with only one member, or
else define them as RAID0 with only one member.

Or, you could add the spare drives as spares, or simply not define them as
RAID arrays until you actually need to use them.

Regards,
Adam

Thank you for responding.

I'm going to guess that the answer to my actual question - is there a way
to ignore individual md failures - is "I don't know".  To be clear: "I
don't know" is acceptable.  I read through the source code looking for a way
and couldn't find one (and so many bits are auto-loaded that it's very
hard to be sure enough to say "no").  I was hoping someone on-list would
actually know, but I get why that might not be the case.

To the questions:

============================ /proc/mdstat ===========================

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
      11711382912 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2]
      2097088 blocks [6/3] [UUU___]

md0 : active raid1 sda1[0] sdb1[1] sdc1[2]
      2490176 blocks [6/3] [UUU___]

unused devices: <none>
============================ /proc/mdstat ===========================
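A minimal sketch of how the output above can be read, assuming Python is
available on whichever machine evaluates it.  In /proc/mdstat the [n/m]
field gives the number of member slots the array is defined with (n) and
the number currently active (m), so [6/3] is a six-slot array with three
disks present.  The names degraded_arrays, SAMPLE and the ignore parameter
are hypothetical; nothing here is part of Xymon or of xymon-rclient.sh.

```python
import re

# Matches the member-count field of an mdstat stanza, e.g. "[6/3]".
COUNT_RE = re.compile(r"\[(\d+)/(\d+)\]")
# Matches the first line of a stanza, e.g. "md1 : active raid1 sda2[0] ...".
HEADER_RE = re.compile(r"^(md\d+)\s*:")

# The mdstat text quoted in this message.
SAMPLE = """\
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
      11711382912 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2]
      2097088 blocks [6/3] [UUU___]

md0 : active raid1 sda1[0] sdb1[1] sdc1[2]
      2490176 blocks [6/3] [UUU___]

unused devices: <none>
"""


def degraded_arrays(mdstat_text, ignore=()):
    """Return {array: (defined, active)} for degraded arrays not in `ignore`."""
    result = {}
    current = None  # name of the stanza being read, if any
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        counts = COUNT_RE.search(line)
        if current and counts:
            defined, active = int(counts.group(1)), int(counts.group(2))
            if active < defined and current not in ignore:
                result[current] = (defined, active)
            current = None  # only the first count field per stanza matters
    return result


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
    print(degraded_arrays(SAMPLE, ignore={"md0", "md1"}))
```

With the sample above, the first call reports md1 and md0 as (6, 3) and the
second call reports nothing, while a real failure in md2 would still show up.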

I'm using the script here:
http://www.it-eckert.com/blog/2015/agent-less-monitoring-with-xymon/
(xymon-rclient.sh).

Specifically, the platform is Synology and yes, Synology runs two raid1
arrays over all of the slots (even though some are empty).  I could fix
this easily by adding hard drives into the empty slots, but I specifically
bought this unit so that I could expand it later.  That is, I both
understand that this is properly showing broken but unmounted RAIDs and I
know why those RAIDs are broken (and thus why the errors are nominal in my
setup).

I am still hoping that a failure state that is nominal would be something
I'd be able to ignore (just as I can ignore specific libraries or
individual filesystems).

The other choice for me is to entirely remove the mdstat portion of Eckert's
script.  (Sadly, there is nothing else for Synology monitoring that I can
get to work at all, and that simple script otherwise covers all of what I
need.)  This means I won't be notified (through Xymon) if one of my drives
does fail, but it's better than getting used to ignoring a RED background.
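A middle ground between keeping and removing the mdstat portion might be to
drop only the md0 and md1 stanzas from the text before it is reported.  The
following is a rough sketch of that filtering step only, assuming Python is
available where the filtering would run.  The names drop_arrays and SAMPLE
are hypothetical, and how the filtered text would be wired into
xymon-rclient.sh, or how the Xymon side evaluates it, is not covered here.

```python
import re

# Matches the first line of an mdstat stanza, e.g. "md0 : active raid1 ...".
HEADER_RE = re.compile(r"^(md\d+)\s*:")

# Shortened copy of the mdstat text from this message.
SAMPLE = """\
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda5[0] sdc5[2] sdb5[1]
      11711382912 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid1 sda2[0] sdb2[1] sdc2[2]
      2097088 blocks [6/3] [UUU___]

md0 : active raid1 sda1[0] sdb1[1] sdc1[2]
      2490176 blocks [6/3] [UUU___]

unused devices: <none>
"""


def drop_arrays(mdstat_text, ignore):
    """Return mdstat_text with the stanzas of the arrays in `ignore` removed."""
    kept = []
    skipping = False
    for line in mdstat_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # A new stanza starts; skip it only if its name is on the list.
            skipping = header.group(1) in ignore
        elif line.strip() and not line.startswith((" ", "\t")):
            # A non-indented, non-blank line (e.g. "unused devices:")
            # is not part of any stanza, so stop skipping.
            skipping = False
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    print(drop_arrays(SAMPLE, {"md0", "md1"}))
```

The md2 stanza is passed through untouched, so a failed drive in the array
that actually holds data would still be visible in the reported text.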

[Archive readers: It is okay to contact me directly with questions about my
setup]

Thank you,
Gary Allen Vollink