df hanging will cause xymon-client to hang.
list Cédric Briner
Hello,
I'm running xymon-client on Debian Jessie.
OS: Debian
OS-Release: 8 (Jessie)
xymon-client version: 4.3.17
The error happened because an NFS resource was not responding. I suppose that, since xymon launches "df" to gather information and the NFS mount was hanging, xymon-client is not able to detect that df hangs, and it does not send any data to the server. Worse, xymon-client is not verbose at all: it does not report that this test (df) is taking too long to complete.
Many thanks for xymon.
Regards.
Cédric BRINER
list Ryan Novosielski
You do eventually find out when the status turns purple.
This is a pretty hard one to deal with, from my experience. A hang on something NFS-related is pretty difficult to get out of. I've seen that mounting NFS with the "bg" option can improve this somewhat, but that might create other problems.
--
Ryan Novosielski - Senior Technologist
Rutgers Biomedical and Health Sciences (formerly UMDNJ)
OIRT/High Perf & Res Comp - MSB C630, Newark
user-46c89e614701@xymon.invalid - 973/972.0922 (2x0922)
list Mark Felder
This will happen on every OS with an NFS mount that is not responding. There's really not a workaround. You should know when that's happening because it should go purple in my experience. There are NFS mount options on most platforms that will help mitigate this type of issue.
list Steve Anderson
If you don't care about the nfs mounted volumes, you may get away with replacing the df with something like
df -l
or
df -x nfs
Those /should/ (depending on implementation) just skip the testing of the nfs volumes.
timeout 5s df
is another option, which should kill the df if it takes more than 5 seconds.
Steve
list Ryan Novosielski
I wouldn't be certain that would work. Hangs on NFS tend not to respond to CTRL-C or kill. I'm working from memory here, but it would be interesting to try.
list Jeremy Ruffer
That's what I do. In xymonclient-linux.sh I changed it to df -Plh and df -Pil for the inode test. I find the h option makes the output easier to understand. Jeremy
list Henrik Størner
On 18-06-2015 at 15:11, Cédric BRINER wrote:
The error happened because an NFS resource was not responding. I suppose that, since xymon launches "df" to gather information and the NFS mount was hanging, xymon-client is not able to detect that df hangs, and it does not send any data to the server. Worse, xymon-client is not verbose at all: it does not report that this test (df) is taking too long to complete.
Which is exactly the reason why the default xymon client uses "df -Pl" to avoid testing nfs mounts. If it hangs on nfs mounts, then either you changed the script or you installed a package where the package maintainer changed it. Regards, Henrik
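One way to check this on a given host is to list the df invocations in the client script. This is a minimal sketch: the function name show_df_calls is made up for illustration, and the example path is an assumption about where a packaged client installs xymonclient-linux.sh, so adjust it to the local install.

```shell
#!/bin/sh
# show_df_calls SCRIPT: print the lines of a client script that mention df.
# (Hypothetical helper, not part of Xymon.)
show_df_calls() {
    # -n prefixes each match with its line number;
    # -w matches "df" only as a whole word.
    grep -nw 'df' "$1"
}

# Example (the path is an assumption):
# show_df_calls /usr/lib/xymon/client/bin/xymonclient-linux.sh
```

If the df lines shown lack the -l option, the script has been changed from the default Henrik describes.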
list Richard L. Hamilton
A “hard” NFS mount will not give up on an access, but retry until the server comes back up (contrasted to a “soft” mount which will eventually give up, but can cause program crashes or even data loss on write operations when it gives up). “soft” mounts are almost always Evil (TM). An NFS mount with BOTH “hard” and “intr” options is as robust as a regular “hard” mount, but programs hung in an access to an unresponsive server can be killed.
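To see which of these modes the NFS mounts on a host actually use, the mount table can be inspected. The sketch below assumes a Linux-style /proc/mounts (device, mount point, type, comma-separated options) in which "hard" or "soft" appears among the options; the function name nfs_mount_modes is hypothetical.

```shell
#!/bin/sh
# nfs_mount_modes [MOUNTS_FILE]
# Prints "<mount point> <hard|soft|unknown>" for every nfs/nfs4 entry.
# MOUNTS_FILE defaults to /proc/mounts (assumed Linux layout).
nfs_mount_modes() {
    mounts_file=${1:-/proc/mounts}
    awk '$3 == "nfs" || $3 == "nfs4" {
        mode = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "hard" || opts[i] == "soft") mode = opts[i]
        }
        print $2, mode
    }' "$mounts_file"
}

# Example: nfs_mount_modes
```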
list Cédric Briner
What I find very disturbing is that there are no logs at all saying that df is taking a long time to complete. Rather than finding a way to stop xymon-client from hanging, could we somehow have xymon-client write a message to the log saying that df has not returned for a long time?
cED
list Galen Johnson
You might consider putting a process check for df on the affected servers. They actually will build up so you could alert on that. =G=
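As a rough illustration of the idea, the check amounts to counting processes named df and alerting above a threshold. This is only a sketch, not Xymon's own process check: the function names and the threshold are hypothetical, and it reads /proc/<pid>/comm, so it is Linux-only.

```shell
#!/bin/sh
# count_procs NAME: print how many processes currently have this command name.
# Reads /proc/<pid>/comm, so this is Linux-only.
count_procs() {
    wanted=$1
    count=0
    for comm_file in /proc/[0-9]*/comm; do
        # A process may exit between the glob and the read; skip it.
        name=$(cat "$comm_file" 2>/dev/null) || continue
        if [ "$name" = "$wanted" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# check_pileup NAME MAX: print a colour line and return non-zero
# if more than MAX processes called NAME are running.
check_pileup() {
    running=$(count_procs "$1")
    if [ "$running" -gt "$2" ]; then
        echo "red: $running $1 processes (limit $2)"
        return 1
    fi
    echo "green: $running $1 processes (limit $2)"
}

# Example: check_pileup df 3
```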
list Phil Crooker
The trouble is the OS blocks on the i/o request while waiting for a response from the missing nfs server - the xymon client is not aware of 'real time' to be able to detect it has been blocked waiting for a return from another program. For these types of errors I create an ext script that uses expect. In the expect script you spawn the test, and with expect's timeout capability you can then alarm on the timeout.
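A similar ext script can be sketched in plain shell. Note that this swaps in timeout(1) from GNU coreutils for expect, so it is not Phil's script. The column name "nfsprobe", the function name and the probed path are hypothetical, and the commented-out reporting line assumes the XYMON, XYMSRV and MACHINE variables are provided by the client environment.

```shell
#!/bin/sh
# Hypothetical column name for the status report.
COLUMN=nfsprobe

# probe_colour LIMIT COMMAND [ARGS...]
# Prints "green" if COMMAND finishes within LIMIT seconds,
# "red" if timeout(1) had to stop it (exit status 124),
# "yellow" for any other failure.
probe_colour() {
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo green ;;
        124) echo red ;;
        *)   echo yellow ;;
    esac
}

# In the ext script itself (path and variables are assumptions):
# colour=$(probe_colour 10 df -P /mnt/nfs)
# $XYMON $XYMSRV "status $MACHINE.$COLUMN $colour $(date) df probe of /mnt/nfs"
```

This only detects the hang and reports it; it does not settle whether a df blocked on NFS can actually be killed.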
list Andrey Chervonets
I suppose a workaround for the problem could be to rewrite the shell script that calls df:
1) Call the df ... command a) in the background, b) redirecting its output to a temp file, c) saving the child process PID in a variable.
2) Check in a loop whether the process is still running:
if it is still running (this can be checked with the ps command), sleep for 1 second or so and go to the next check iteration;
if it is not running, exit the loop and check the output;
if the number of checks (iterations) exceeds N (defined in the configuration), stop checking and raise the red status for disk.
3) df is called twice, for space and for inodes, so implement this as a shell function.
Best regards,
Andrey Chervonets
SIA CoMinder
http://www.cominder.eu/
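The steps Andrey outlines can be sketched as a shell function. This is an illustration under assumptions, not code from the Xymon client: the name run_with_poll and the return value 124 for "gave up" are made up, and it uses kill -0 rather than ps to test whether the child is still running.

```shell
#!/bin/sh
# run_with_poll MAX_CHECKS OUTFILE COMMAND [ARGS...]
# Starts COMMAND in the background with its output redirected to OUTFILE,
# then polls about once per second. Returns the command's exit status if
# it finishes, or 124 if it is still running after MAX_CHECKS polls.
run_with_poll() {
    max_checks=$1
    outfile=$2
    shift 2

    "$@" >"$outfile" 2>&1 &
    child=$!

    checks=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$child" 2>/dev/null; do
        checks=$((checks + 1))
        if [ "$checks" -gt "$max_checks" ]; then
            # Best effort; a process blocked on NFS may not die right away.
            kill "$child" 2>/dev/null || true
            return 124
        fi
        sleep 1
    done

    # The child has exited; collect its exit status.
    wait "$child"
}

# Example, once for space and once for inodes (file names are arbitrary):
# run_with_poll 10 /tmp/df.out  df -Pl  || echo "df did not complete"
# run_with_poll 10 /tmp/dfi.out df -Pil || echo "df -i did not complete"
```

A caller would treat a return value of 124 as the cue to raise the red status for disk, and otherwise read the output from the temp file.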