Xymon Mailing List Archive

Monitoring disk space problems (was: RE: [hobbit] Highlights of the 4.3.0 version)

Buchan Milne
Wed, 8 Aug 2007 18:28:30 +0200
Message-Id: <user-5028613fda9e@xymon.invalid>

On Monday 06 August 2007 21:25:46 Haertig, David F (Dave) wrote:
I try to identify filesystem "space hogs" via custom scripts I wrote a
long time ago when using BB.  99% of my custom stuff is done in Perl.

I use 'du -k' to get the size of all directories in the filesystem.  I
then cut those results down to only the first and second level
directories (but you could go as deep as you want).  I store the size of
each subdirectory in a small "database".  I did this ages ago, and my
code uses Perl's "Storable" module to store the accumulated data in a
file (my "database").  These days I'd just use Hobbit's easily
accessed RRD files.  I then use Perl's
Statistics::Descriptive::least_squares_fit() to calculate the slope and
linear correlation coefficient of the "best fit line".

This would be really useful for directories monitored with the dir: option
in client-local.cfg plus the DIR rule in hobbit-clients.cfg, e.g. to be able
to alert at a specified "time before disk is full".

This allows me 
to see how fast each subdirectory is growing/shrinking, and how linear
that growth/reduction is.  I trigger yellow/red conditions based on rate
of growth and predicted fill time at current growth rate, in addition to
the standard "95% full = red" test.
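
A minimal sketch of the pipeline described above (trim 'du -k' output to the first two directory levels, fit a line to the size history, predict the fill time). It is written in Python rather than the Perl the poster used, and the formulas are spelled out by hand instead of calling Statistics::Descriptive. The function names, the sample layout (seconds, KB) and the yellow/red thresholds are assumptions made for illustration, not anything from the original scripts.

```python
import posixpath
from math import sqrt


def top_level_sizes(du_output, root, max_depth=2):
    """Keep only 'du -k' entries at most max_depth levels below root."""
    sizes = {}
    for line in du_output.splitlines():
        size_kb, _, path = line.partition("\t")
        if not path:
            continue
        rel = posixpath.relpath(path, root)
        if rel.startswith(".."):
            continue  # not under root
        depth = 0 if rel == "." else rel.count("/") + 1
        if depth <= max_depth:
            sizes[path] = int(size_kb)
    return sizes


def least_squares_fit(xs, ys):
    """Return (slope, intercept, r) of the best-fit line y = slope*x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all sample times are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is the linear correlation coefficient; 0.0 when the size never changes
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


def seconds_until_full(times, used_kb, capacity_kb):
    """Predicted seconds from the last sample until used_kb reaches capacity_kb."""
    slope, intercept, _ = least_squares_fit(times, used_kb)
    if slope <= 0:
        return None  # flat or shrinking: no predicted fill time
    full_at = (capacity_kb - intercept) / slope
    return max(0.0, full_at - times[-1])


def fill_color(seconds_left, yellow_below=7 * 86400, red_below=86400):
    """Map a predicted fill time to a colour (thresholds are hypothetical)."""
    if seconds_left is None:
        return "green"
    if seconds_left < red_below:
        return "red"
    if seconds_left < yellow_below:
        return "yellow"
    return "green"
```

The correlation coefficient is returned alongside the slope so that a caller can ignore predictions from a poorly fitting line; where to draw that cut-off is left open here.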

The above makes it fairly easy to identify which subdirectory is your
problem, which is often enough to identify the file/process that is
killing you.  When it's not, I have a separate test that tries to
identify problem files a different way.  BB/Hobbit uses 'top' to
identify cpu-hogging processes.  Many times, files hogging space are
directly tied to processes hogging cpu (runaway process = runaway
file in many cases).  'top' identifies the process(es), then "lsof -p
<pid>" is used to identify the files that the suspect process has open.
Finding a cpu-hogger that has a filespace-hogger open is usually the
holy grail you seek.

The "CPU usage by process" graph is the utopian one ...

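The "lsof -p <pid>" step quoted above could be sketched roughly as follows, in Python. It assumes lsof's field output mode ("-F sn", one field per line, each prefixed by a one-letter tag) and assumes that the size field is simply absent for entries without a size; the helper names are made up for this example, and picking the suspect pid from 'top' is left out.

```python
import subprocess


def parse_lsof_fields(text):
    """Parse `lsof -F sn` style output into a list of (size_bytes, path)."""
    files = []
    current = {}

    def flush():
        # Only keep entries that reported both a size and a name.
        if "n" in current and current.get("s", "").isdigit():
            files.append((int(current["s"]), current["n"]))
        current.clear()

    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag in ("p", "f"):
            flush()  # a new process or file-descriptor set starts here
        elif tag in ("s", "n"):
            current[tag] = value
    flush()
    return files


def largest_open_files(pid, limit=5):
    """Return the largest files the given process has open, biggest first."""
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-F", "sn"],
        capture_output=True, text=True, check=False,
    )
    return sorted(parse_lsof_fields(result.stdout), reverse=True)[:limit]
```

Calling largest_open_files() on the pid that 'top' reports as the cpu hog would then list its biggest open files first.
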
As a "repair" action for Hobbit, I squirreled away 2 GB of disk space in
100 MB chunks for critical filesystems.  "dd if=/dev/zero
of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then
"cp reserve01 reserve02", etc. to build up the reserve.

lvextend may be another useful command here ...
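
For illustration, the reserve-building step quoted above might look like this in Python. The chunk size matches the dd command (bs=1024 count=102400); the function names are hypothetical, and release_one() reflects an assumption about what the "repair" action does (delete one chunk to free space), which the post does not spell out.

```python
import os

CHUNK_BYTES = 1024 * 102400  # same size as dd bs=1024 count=102400


def build_reserve(directory, chunks, chunk_bytes=CHUNK_BYTES):
    """Create reserve01..reserveNN in directory, each filled with real zeros."""
    os.makedirs(directory, exist_ok=True)
    block = b"\0" * (1024 * 1024)
    paths = []
    for i in range(1, chunks + 1):
        path = os.path.join(directory, f"reserve{i:02d}")
        with open(path, "wb") as fh:
            remaining = chunk_bytes
            while remaining > 0:
                # Write actual data (not a sparse file) so the space is claimed.
                n = min(len(block), remaining)
                fh.write(block[:n])
                remaining -= n
        paths.append(path)
    return paths


def release_one(directory):
    """Delete the highest-numbered reserve chunk; return its path or None."""
    names = sorted(n for n in os.listdir(directory) if n.startswith("reserve"))
    if not names:
        return None
    path = os.path.join(directory, names[-1])
    os.remove(path)
    return path
```

build_reserve("/filesystem/DiskSpaceReserve", 20) would set aside roughly the 2 GB mentioned in the post.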


Regards,
Buchan