Hi,
On Mon, Mar 09, 2015 at 12:44:03PM -0000, SebA wrote:
> I have been trying to find out if there is a way of Xymon detecting that a
> file-system in Linux has gone read-only as a result of a disk error (other
> than reporting it just the once via monitoring /var/log/messages). Nothing
> is showing up in my Xymon server, but my xymon-client is a bit old:
> xymon-client-4.3.7-26.1.el5.tnt
> I did a bit of Googling and I came up with these two links that may be
> relevant:
> http://sisyphus.ru/en/srpm/Sisyphus/xymon/sources/8
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764197
> It seems that an RPM maintainer may have made some modifications to their
> version in order to catch disks in a read-only state (in the first link) and
> that there is a mount-ro plugin that is part of the hobbit-plugins package in
> Debian / Ubuntu. Does anyone have more information on either of
> these
I'm one of the maintainers of Debian's hobbit-plugins package, so
yes. :-)
> and whether any patches can be integrated upstream or plug-ins added
> to xymonton?
I'm not sure where exactly at https://wiki.xymonton.org/ I should add
our set of plugins.
> CCing Axel Beckert as he seems to have committed
> something to the mount-ro plugin recently:
> https://www.openhub.net/p/hobbit-plugins/commits
Hrm, OpenHub seems horribly out of date with most projects
recently... The full view on that Git repo is at
https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/
The source code of the mount-ro plugin is quite simple:
https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/misc.d/mount-ro
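To illustrate the general idea of checking mount state rather than log text, here is a rough Python sketch. It is not the code of the actual mount-ro script. It assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options), and the function name and the list of filesystem types to ignore are made up for this example:

```python
# Sketch only: find filesystems whose mount options contain "ro".
# Assumption: these filesystem types are read-only by design and are ignored.
ALWAYS_RO = {"iso9660", "squashfs", "cramfs"}


def read_only_mounts(mounts_text, ignore_types=ALWAYS_RO):
    """Return (mountpoint, fstype) pairs that are mounted read-only."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip empty or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in ignore_types:
            continue
        # Options are comma-separated; match "ro" as a whole option only,
        # so that e.g. "errors=remount-ro" does not count.
        if "ro" in options.split(","):
            found.append((mountpoint, fstype))
    return found


def main(path="/proc/mounts"):
    """Print every read-only mount; return 1 if there is one, else 0."""
    with open(path) as fh:
        bad = read_only_mounts(fh.read())
    for mountpoint, fstype in bad:
        print(f"{mountpoint} ({fstype}) is mounted read-only")
    return 1 if bad else 0

# As a script, one would end with: sys.exit(main())
```

The non-zero return value for "something is read-only" is an assumption about how a calling wrapper would interpret the result. On some systems virtual filesystems are mounted read-only on purpose, so the ignore list would need local tuning.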
The mount-ro script is not a standalone plugin, though; it is meant for
the meta-plugin "misc", which calls all scripts in /etc/xymon/misc.d/ and
summarizes their exit codes into a single check. This is intended for
checks that turn yellow or red only very rarely and for which you don't
want to waste a whole column.
misc plugin:
https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/client-ext/misc
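The real misc plugin is written in Perl (see the link above). Purely as an illustration of the "run everything in a directory and report the worst result" idea, here is a Python sketch. The mapping of exit codes to colours (0 = green, 1 = yellow, anything else = red) and both function names are assumptions for this example, not taken from the plugin:

```python
import os
import subprocess

# Assumption: exit code 0/1/2 maps to these colours; the real plugin may differ.
COLOURS = ["green", "yellow", "red"]


def run_checks(directory):
    """Run each executable file in `directory`; return {name: (code, output)}."""
    results = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue  # ignore subdirectories and non-executable files
        proc = subprocess.run(
            [path], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        results[name] = (proc.returncode, proc.stdout)
    return results


def summarize(results):
    """Collapse per-script results into one overall colour and a report text."""
    worst = 0
    lines = []
    for name, (code, output) in results.items():
        # Unexpected codes (including negative ones from signals) count as red.
        level = code if 0 <= code <= 2 else 2
        worst = max(worst, level)
        lines.append(f"{COLOURS[level]} {name}: {output.strip()}")
    return COLOURS[worst], "\n".join(lines)
```

With this, `summarize(run_checks("/etc/xymon/misc.d"))` would give a single colour plus one line per script, which is the property that saves the extra columns.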
Hobbit.pm used in the misc plugin and many other plugins in that
package:
https://anonscm.debian.org/cgit/collab-maint/hobbit-plugins.git/tree/perl/Hobbit.pm
> The following was at the bottom of /var/log/messages, but it does not
> suggest any very obvious alarm strings to add other than the last line
> without the 'dm-0', but it would be nicer to have something more generic
> still as textual messages can change between different versions of the O/S.
> kernel: sd 0:0:0:0: Unhandled sense code
> kernel: sd 0:0:0:0: SCSI error: return code = 0x08100002
> kernel: Result: hostbyte=invalid driverbyte=DRIVER_SENSE,SUGGEST_OK
> kernel: sda: Current: sense key: Hardware Error
> kernel: Add. Sense: Defect list error
> kernel:
> kernel: Buffer I/O error on device dm-0, logical block 1358756
> kernel: lost page write due to I/O error on dm-0
That's probably something which can be caught via the LOG keyword in
analysis.cfg.
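Before putting a pattern into a LOG rule, it can be tried against the lines quoted above. The Python sketch below does that with a device-independent pattern; the pattern itself and the helper name are my assumptions, and Python's `re` is only a stand-in for the regex engine Xymon uses:

```python
import re

# Assumption: a pattern that matches "I/O error" without naming any device.
IO_ERROR = re.compile(r"I/O.error", re.IGNORECASE)

# A few of the lines from the /var/log/messages excerpt quoted above.
SAMPLE = """\
kernel: sd 0:0:0:0: SCSI error: return code = 0x08100002
kernel: sda: Current: sense key: Hardware Error
kernel: Buffer I/O error on device dm-0, logical block 1358756
kernel: lost page write due to I/O error on dm-0
"""


def matching_lines(text, pattern=IO_ERROR):
    """Return the lines of `text` that the pattern matches."""
    return [line for line in text.splitlines() if pattern.search(line)]


for line in matching_lines(SAMPLE):
    print(line)
```

If I remember the analysis.cfg syntax correctly, the corresponding rule would look something like `LOG /var/log/messages %I/O.error COLOR=red`, but please check that against the analysis.cfg man page of your Xymon version.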
Kind regards, Axel Beckert
--
Axel Beckert <user-96d9963fe797@xymon.invalid> support: +41 44 633 26 68
IT Services Group, HPT H 6 voice: +41 44 633 41 89
Departement of Physics, ETH Zurich
CH-8093 Zurich, Switzerland http://nic.phys.ethz.ch/