Xymon Mailing List Archive

Problem with disk monitoring

14 messages in this thread

Shailesh Paudyal · Wed, 28 Jul 2010 09:43:09 -0500
All,

Please see below: there is a problem with disk monitoring on one of the
servers. Can someone tell me if I did something wrong?


W]d Jul 28 10:34:31 EDT 2010 - Filesystems NOT ok

 7% / (8816628% used) has reached the PANIC level (95%)
 38% /u01 (90371708% used) has reached the PANIC level (95%)
 2% /tmp (9254468% used) has reached the PANIC level (95%)
 34% /usr (6261556% used) has reached the PANIC level (95%)
 18% /opt (7775588% used) has reached the PANIC level (95%)
 4% /var (13653064% used) has reached the PANIC level (95%)
 3% /home (27514896% used) has reached the PANIC level (95%)
 30% /boot (67864% used) has reached the PANIC level (95%)
 14% /u02 (1697518148% used) has reached the PANIC level (95%)
 94% /u03 (136865636% used) has reached the PANIC level (95%)

Filesystem         10
4-b]ocks      Used Available Capacity Mounted on
/dev/sda9              9920592    591896   8816628       7% /
/dev/sda10           152435112  54195172  90371708      38% /u01
/dev/sda8              9920592    154056   9254468       2% /tmp
/dev/sda7              9920592   3146968   6261556      34% /usr
/dev/sda6              9920592   1632936   7775588      18% /opt
/dev/sda5             14877060    456092  13653064       4% /var
/dev/sda3             29753588    702880  27514896       3% /home
/dev/sda1               101086     28003     67864      30% /boot
/dev/mapper/VolGroup02-u02 2064204960 261831260 1697518148      14% /u02
/dev/mapper/VolGroup03-u03 2064204960 1822483772 136865636      94% /u03

-- 
Shailesh K. Paudyal
user-baeafc0cb301@xymon.invalid
Steve Holmes · Wed, 28 Jul 2010 10:51:59 -0400
Which OS?
Steve
--
Shailesh Paudyal · Wed, 28 Jul 2010 09:57:05 -0500
The OS is Red Hat Linux 5.4.
Steve Holmes · Wed, 28 Jul 2010 11:21:21 -0400
It appears that Xymon has slipped one field to the left in parsing the df
output. The string at the beginning of each line, before the actual df
output, should be the name of the filesystem (plus an icon, but we'll
ignore that for now). Instead, it is using the Available number as the
percent used, which, of course, is huge.
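The "slipped one field" reading can be checked against the numbers themselves. df's Capacity column is used / (used + available) expressed as a percentage, with any fraction rounded up to the next whole percent. The minimal Python sketch below (the helper name `capacity_pct` is hypothetical) recomputes that value for the / row and shows that the figure in the alert, "8816628% used", is exactly the Available field for that row.

```python
def capacity_pct(used_kb: int, avail_kb: int) -> int:
    """df's Capacity column: used / (used + available) as a percentage,
    with any fraction rounded up to the next whole percent."""
    total = used_kb + avail_kb
    if total == 0:
        return 0
    return -(-used_kb * 100 // total)  # integer ceiling division, no float rounding


# Data row for / from the df output in this thread, in
# Filesystem / 1024-blocks / Used / Available / Capacity / Mounted on order.
row = "/dev/sda9              9920592    591896   8816628       7% /"
fs, blocks, used, avail, capacity, mount = row.split()

recomputed = capacity_pct(int(used), int(avail))
alert_pct = 8816628  # value from the Xymon alert "(8816628% used)"

print(recomputed, capacity)       # 7 7%
print(alert_pct == int(avail))    # True: the alert reports the Available field
print(alert_pct == recomputed)    # False
```

The same check works for every row in the thread: the recomputed value matches df's own Capacity column, while the number Xymon alerted on always equals Available.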

I don't know if this is causing the problem, but there is some funkiness with
the first line of the df output. It is broken between the 10 and the 4, and
there is a ']' instead of an 'l' in the word "blocks". Maybe this is a
cut-and-paste error, but if not, it is certainly not right.

I think it should read something like

Filesystem          1K-blocks            Used   Available  Use%    Mounted on

If your df actually outputs a broken first line, I think that should be fixed
first; then see if Xymon parses the output correctly.
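For reference, GNU `df -P` (POSIX output format) prints the header `Filesystem 1024-blocks Used Available Capacity Mounted on`, and the Xymon page in the first message also says "Capacity", so the broken first line reads like a damaged "1024-blocks". The sketch below shows one way such damage could produce exactly the symptom in the alert. It assumes a hypothetical parser (`parse_df`, not Xymon's actual code) that takes the Capacity and Mounted column positions from the header line and reuses them for every data row. Once the header is split after "10", "Capacity" moves from the fifth token to the fourth, so the Available number is read as the percentage and "7% /" becomes the filesystem name.

```python
from typing import List, Optional, Tuple


def parse_df(text: str) -> List[Tuple[str, int]]:
    """HYPOTHETICAL df parser, for illustration only (not Xymon's code).

    Column positions are taken from the first line that contains both
    "Capacity" and "Mounted"; every later line is read at those positions.
    """
    cap_idx: Optional[int] = None
    mnt_idx: Optional[int] = None
    results: List[Tuple[str, int]] = []
    for line in text.splitlines():
        tokens = line.split()
        if cap_idx is None or mnt_idx is None:
            if "Capacity" in tokens and "Mounted" in tokens:
                cap_idx = tokens.index("Capacity")
                mnt_idx = tokens.index("Mounted")
            continue  # skip everything up to and including the header
        if len(tokens) <= mnt_idx:
            continue  # too short to be a data row
        pct = int(tokens[cap_idx].rstrip("%"))
        name = " ".join(tokens[mnt_idx:])  # mount point may contain spaces
        results.append((name, pct))
    return results


ROW = "/dev/sda9              9920592    591896   8816628       7% /"
good = "Filesystem         1024-blocks      Used Available Capacity Mounted on\n" + ROW
# Header as it appeared on the Xymon page: split after "10", "l" turned into "]".
broken = "Filesystem         10\n4-b]ocks      Used Available Capacity Mounted on\n" + ROW

print(parse_df(good))    # [('/', 7)]
print(parse_df(broken))  # [('7% /', 8816628)]
```

With the intact header the row parses as ("/", 7); with the damaged header the same row yields the "7% / (8816628% used)" pair seen in the alert.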

Steve

-- 
The test of a democracy is not the magnificence of buildings or the speed of
automobiles or the efficiency of air transportation, but rather the care
given to the welfare of all the people. -Helen Adams Keller, lecturer and
author (1880-1968)

Truth never damages a cause that is just. -Mohandas Karamchand Gandhi
(1869-1948)
Shailesh Paudyal · Wed, 28 Jul 2010 10:29:33 -0500
The output of df is as follows:
[root@localhost u03]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda9              9920592    591896   8816628   7% /
/dev/sda10           152435112  54216212  90350668  38% /u01
/dev/sda8              9920592    154052   9254472   2% /tmp
/dev/sda7              9920592   3146968   6261556  34% /usr
/dev/sda6              9920592   1632936   7775588  18% /opt
/dev/sda5             14877060    456172  13652984   4% /var
/dev/sda3             29753588    702880  27514896   3% /home
/dev/sda1               101086     28003     67864  30% /boot
tmpfs                 74156180  13079592  61076588  18% /dev/shm
/dev/mapper/VolGroup02-u02
                     2064204960 261882740 1697466668  14% /u02
/dev/mapper/VolGroup03-u03
                     2064204960 1827234716 132114692  94% /u03

When I pasted the output from the Xymon screen, I deleted the icons; it did
have icons, like red smileys.

Thank you,
-shailesh
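Shailesh's interactive `df` wraps the two long device-mapper names, putting the numbers on the following line, whereas the Xymon page shows each filesystem on one line. GNU `df -P` (POSIX output format) always prints each filesystem on exactly one line. If plain `df` output ever has to be parsed, the wrapped rows need to be joined first. A minimal sketch, with a hypothetical function name, assuming a line holding a single token is a device name whose numbers follow on the next line:

```python
from typing import List, Optional


def join_wrapped(text: str) -> List[str]:
    """Re-join df rows that plain `df` split because the device name was long.

    Assumption: a line with exactly one token is a device name whose numbers
    are on the next line. Whitespace inside each row is collapsed to one space.
    """
    rows: List[str] = []
    pending: Optional[str] = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        if len(tokens) == 1:
            pending = tokens[0]  # device name on a line by itself
            continue
        if pending is not None:
            tokens = [pending] + tokens
            pending = None
        rows.append(" ".join(tokens))
    return rows


sample = """\
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1               101086     28003     67864  30% /boot
/dev/mapper/VolGroup02-u02
                     2064204960 261882740 1697466668  14% /u02
"""
for row in join_wrapped(sample):
    print(row)
```

Asking df for the POSIX format avoids the problem at the source; the join step is only needed when the input is the default, wrapped layout.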
Steve Holmes · Wed, 28 Jul 2010 11:40:01 -0400
Well, that looks good to me.  I have no idea.
Steve
Johan Sjöberg · Wed, 28 Jul 2010 17:50:01 +0200
Are you using Xymon 4.3.0 beta?

I have seen this before on that version. It seems to be caused by status messages that are too long. Check your hobbit test and see if it reports any status messages that are too long. I fixed this problem by increasing the maximum status message size (in hobbitserver.cfg). There are other discussions about this on the mailing list.

/Johan
Tim McCloskey · Wed, 28 Jul 2010 08:55:28 -0700
Is the "W]d" below a cut-and-paste error, or is that the real output?

From: Shailesh Paudyal [user-baeafc0cb301@xymon.invalid]
Sent: Wednesday, July 28, 2010 7:43 AM
To: xymon at xymon.com
Subject: [xymon] Problem with disk monitoring

...snip...

W]d Jul 28 10:34:31 EDT 2010 - Filesystems NOT ok

 7% / (8816628% used) has reached the PANIC level (95%)

 ...snip...
Shailesh Paudyal · Wed, 28 Jul 2010 10:59:58 -0500
Yes, I am using Xymon 4.3.0 beta. I am going to increase the message size
and see; I will post if it fixes the problem.

Thanks

Shailesh Paudyal · Wed, 28 Jul 2010 11:00:16 -0500
That's the real output!
Henrik Størner · Thu, 23 Sep 2010 20:45:18 +0000 (UTC)
This is a somewhat old post, but I'm responding anyway ...

There is a bug somewhere in the Xymon 4.3.0-beta code with the "df"
status handling. I've seen it cause random RRD files to appear for
systems that don't have such filesystems, and occasionally it would
also result in this behaviour where a disk status goes wild.

I haven't been able to nail it yet, mostly because it seems to happen
very rarely and completely without any pattern. It would seem like
some sort of memory corruption problem, but I've had the client-message
handler running for days with valgrind (memory access checker) enabled,
and it came up with nothing.

Very annoying.


Regards,
Henrik
Shailesh Paudyal · Thu, 23 Sep 2010 15:57:02 -0500
Thanks, Henrik.
But I still see the problem. Please see the following alert, which came from
Xymon a week or so ago:

red Su] Aug 22 01:26:51 EDT 2010 - Filesystems NOT ok
&red 21% / (20461936% used) has reached the PANIC level (95%)
&red 1% /app (65009052% used) has reached the PANIC level (95%)
&red 1% /home (112112360% used) has reached the PANIC level (95%)
&red 6% /var (8933348% used) has reached the PANIC level (95%)
&red 1% /tmp (18638136% used) has reached the PANIC level (95%)
&red 49% /boot (48938% used) has reached the PANIC level (95%)
&red 11% /u01 (218810168% used) has reached the PANIC level (95%)
&red 7% /u04 (228560164% used) has reached the PANIC level (95%)
&red 35% /u02 (1154279480% used) has reached the PANIC level (95%)
&red 24% /old_u02 (1507070236% used) has reached the PANIC level (95%)


Filesystem         1
24-]locks      Used Available Capacity Mounted on
/dev/sda5             27054004   5195620  20461936      21% /
/dev/sdb1             68814716    253696  65009052       1% /app
/dev/sdc2            118417044    192356 112112360       1% /home
/dev/sda3              9920624    475208   8933348       6% /var
/dev/sdc1             19840892    178616  18638136       1% /tmp
/dev/sda1               101086     46929     48938      49% /boot
/dev/mapper/VolGroup01-u01 258022788  26105832 218810168      11% /u01
/dev/mapper/VolGroup04-u04 258022788  16355836 228560164       7% /u04
/dev/mapper/VolGroup03-u03 1857784872 609135396 1154279480      35% /u02
/dev/mapper/VolGroup02-u02 2064204960 452279172 1507070236      24% /old_u02
quoted from Henrik Størner

On Thu, Sep 23, 2010 at 3:45 PM, Henrik Størner <user-ce4a2c883f75@xymon.invalid> wrote:
This is a somewhat old post, but I'm responding anyway ...


-- 

Shailesh K. Paudyal
user-baeafc0cb301@xymon.invalid
list Johan Sjöberg · Thu, 23 Sep 2010 23:02:21 +0200 ·
I think this is somehow related to oversized status messages. We were having problems with this on 4.3.0 beta, and we also had a lot of oversized status messages (ports etc). Since we increased the max message size, we have not seen the problem with the disk test.

/Johan
quoted from Shailesh Paudyal

From: Shailesh Paudyal [mailto:user-baeafc0cb301@xymon.invalid]
Sent: 23 September 2010 22:57
To: xymon at xymon.com
Subject: Re: [xymon] Problem with disk monitoring

list Henrik Størner · Thu, 7 Oct 2010 16:47:11 +0000 (UTC) ·
In <user-d804fbff0b62@xymon.invalid> Johan Sjöberg <user-74c177c1220d@xymon.invalid> writes:
I think this is somehow related to oversized status messages. We were having
problems with this on 4.3.0 beta, and we also had a lot of oversized status
messages (ports etc). Since we increased the max message size, we have not
seen the problem with the disk test.
I think I've found this bug. It was quite subtle, and there was a greater
chance of triggering it when you had an oversize message, but it could
also happen without one.

I'll commit an update for this later tonight, so it will be included in the
next (beta) release, which I expect to ship sometime next week.


Regards,
Henrik