False red alerts for disk

6 messages in this thread

list Patrik Nilsson · Wed, 10 Jun 2009 16:57:30 +0200 ·
Hi,

Running Xymon 4.3.0-0.beta2, I sometimes get false red alerts from
disk on a few servers (one of them is the Xymon server itself).

Usually disk status is reported green, like this:

Wed Jun 10 16:29:17 CEST 2009 - Filesystems OK

Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda1            204603376   1616748 192593380       1% /

But occasionally, I get red alerts, like this:

- Filesystems NOT ok

red 192593256       1% / (1616872% used) has reached the PANIC level (95%)

Filesystem         1024-blocks
Use] Available Capacity Mounted on
/dev/sda1            204603376   1616872 192593256       1% /

Somehow the parsing of the client data doesn't work right, resulting
in the used disk blocks being interpreted as percent used.

The corresponding df part in the actual client report looks like this:

 [df]
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda1            204603376   1616872 192593256       1% /


On another server, the false red alert looks like this:
Wed Jun 10 15:51:53 CEST 2009 - Filesystems NOT ok

red 44% / (2778580% used) has reached the PANIC level (95%)
red 6% /home (2167204% used) has reached the PANIC level (95%)

Filesystem         1
24-]locks      Used Available Capacity Mounted on
/dev/xvda2             5162828   2121988   2778580      44% /
/dev/xvda3             24
7244  ] 136744   2167204       6% /home

While it usually looks like this:
 Wed Jun 10 15:56:54 CEST 2009 - Filesystems OK

Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/xvda2             5162828   2122012   2778556      44% /
/dev/xvda3             2427244    136784   2167164       6% /home


Slightly different, but once again a block count (this time from the
Available column) is being interpreted as percentage used.
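
For illustration, here is a minimal Python sketch of how both false alerts
above could arise. It assumes (this is a guess, not taken from the Xymon
source) that the server finds the Capacity and Mounted columns by their token
position in whatever it reads as the df header line. The function name
parse_df_line is made up for this example. With the damaged header fragments
shown in the red reports, the positions shift and a block count lands in the
percentage slot:

```python
# Hypothetical sketch: locate df columns by token position in the header.
# This is NOT Xymon's actual parser, only an illustration of the symptom.

GOOD_HEADER = "Filesystem         1024-blocks      Used Available Capacity Mounted on"
BAD_HEADER_1 = "Use] Available Capacity Mounted on"                  # from the first red report
BAD_HEADER_2 = "24-]locks      Used Available Capacity Mounted on"   # from the second red report

SDA1 = "/dev/sda1            204603376   1616872 192593256       1% /"
XVDA2 = "/dev/xvda2             5162828   2121988   2778580      44% /"


def parse_df_line(header, line):
    """Return (mount point, capacity field) using column positions from the header."""
    columns = header.split()
    cap_idx = columns.index("Capacity")   # token position of the percentage column
    mnt_idx = columns.index("Mounted")    # the mount point runs from here to end of line
    fields = line.split()
    return " ".join(fields[mnt_idx:]), fields[cap_idx]


print(parse_df_line(GOOD_HEADER, SDA1))    # ('/', '1%')
print(parse_df_line(BAD_HEADER_1, SDA1))   # ('192593256 1% /', '1616872')
print(parse_df_line(BAD_HEADER_2, XVDA2))  # ('44% /', '2778580')
```

The last two results line up with the "192593256 1% / (1616872% used)" and
"44% / (2778580% used)" lines in the red reports, which suggests the header
is being damaged before the df section is parsed.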

Does anyone have an idea of what might be causing this?

Thanks,

Patrik Nilsson
list Patrik Nilsson · Thu, 11 Jun 2009 11:40:40 +0200 ·
I am now also seeing this with memory reports. There seems to be a
general but intermittent parsing error of client data.

T 2009][uname]
Linux tc1.jalbum.net 2.6.18-92.1
22.el5xen ]86_64 - Memory CRITICAL
   Memory              Used       Total  Percentage
red Physical          48576M          1M    4857600%
red Actual              819M          1M      81900%
green Swap                 80M       1983M          4%

Notice the messed up brackets.

The corresponding part of the actual client data reported is:

client tc1,hostnamechanged,net.linux linux
[date]
Thu Jun 11 11:31:36 CEST 2009
[uname]
Linux tc1.hostnamechanged.net 2.6.18-92.1.22.el5xen x86_64
[osversion]
CentOS release 5.2 (Final)
[uptime]
 11:31:36 up 26 days, 22:25,  1 user,  load average: 0.12, 0.10, 0.03
[who]
root     xvc0         May 15 13:09
[df]
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/mapper/VolGroup00-LogVol00  10102072   5553636   4131628      58% /
/dev/xvda1              101086     20724     75143      22% /boot
[mount]
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/xvda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
192.168.8.8:/mnt/share on /share type nfs (rw,addr=192.168.8.8)
[free]
             total       used       free     shared    buffers     cached
Mem:       1048576    1043172       5404          0       1936     201892
-/+ buffers/cache:     839344     209232
Swap:      2031608      82368    1949240
[ifconfig]
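
As a cross-check, this small Python sketch computes what the memory status
should have shown from the [free] section above. It assumes the values are in
KB and are converted to MB by integer division by 1024; that rounding is an
assumption, and summarize_free is a made-up helper name, not a Xymon function.

```python
# Hypothetical sketch: derive Used / Total / Percentage from `free` output.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       1048576    1043172       5404          0       1936     201892
-/+ buffers/cache:     839344     209232
Swap:      2031608      82368    1949240
"""


def summarize_free(text):
    """Return {row name: (used MB, total MB, percent)} from `free` output in KB."""
    rows = {}
    mem_total_kb = None  # set by the "Mem:" line, which precedes "-/+" in free output
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            mem_total_kb = int(fields[1])
            rows["Physical"] = (int(fields[2]), mem_total_kb)
        elif fields[0] == "-/+" and mem_total_kb is not None:
            # "used" with buffers and cache subtracted, against the same total
            rows["Actual"] = (int(fields[2]), mem_total_kb)
        elif fields[0] == "Swap:":
            rows["Swap"] = (int(fields[2]), int(fields[1]))
    return {
        name: (used // 1024, total // 1024, 100 * used // total)
        for name, (used, total) in rows.items()
    }


for name, (used_mb, total_mb, pct) in summarize_free(FREE_OUTPUT).items():
    print(f"{name:<10}{used_mb:>8}M{total_mb:>8}M{pct:>8}%")
```

Under these assumptions the result is Physical 1018M of 1024M (99%), Actual
819M of 1024M (80%) and Swap 80M of 1983M (4%). The Swap row and the Actual
"used" figure agree with the red report, so only the physical memory figures
(used 48576M, total 1M) were mangled. The bogus percentages in the report are
simply 100 * used / total with the total taken as 1M.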

Patrik
list Craig Cook · Thu, 11 Jun 2009 12:20:33 -0400 ·
Sounds like what I saw previously:
http://www.hswn.dk/hobbiton/2008/10/msg00416.html

I think the cause was oversize messages coming into the Xymon server...

Craig
list Patrik Nilsson · Thu, 11 Jun 2009 19:48:30 +0200 ·
Thanks Craig, that sounds exactly like our issue! We are having
oversize messages coming in from servers with thousands of ports
active.

Did you solve it by increasing the max size of the messages?

Patrik
list Craig Cook · Thu, 11 Jun 2009 15:01:17 -0400 ·
> Did you solve it by increasing the max size of the messages?

I increased these in hobbitserver.cfg (if they are not there, add them):

MAXLINE=
MAXMSG_STATUS=
MAXMSG_CLIENT=

I have not seen the weird disk errors since.
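
To judge how large the limits need to be, one option is to measure a saved
client message. The Python sketch below is only an illustration: it assumes
the MAXMSG_* limits are expressed in kB (check the hobbitserver.cfg man page
for your version), and the 512 used here is an arbitrary example value, not a
default or a recommendation. The function names are made up.

```python
# Hypothetical sketch: compare a client message's size with a configured limit.

def message_size_kb(message: bytes) -> float:
    """Size of a raw client message in kB (1 kB = 1024 bytes)."""
    return len(message) / 1024


def exceeds_limit(message: bytes, limit_kb: int) -> bool:
    """True if the message is larger than limit_kb kilobytes."""
    return len(message) > limit_kb * 1024


# Example with synthetic data; in practice, read a captured client report.
EXAMPLE_LIMIT_KB = 512                            # illustrative value only
small = b"x" * 1024                               # 1 kB
large = b"x" * (EXAMPLE_LIMIT_KB * 1024 + 1)      # one byte over the limit

print(message_size_kb(small), exceeds_limit(small, EXAMPLE_LIMIT_KB))  # 1.0 False
print(exceeds_limit(large, EXAMPLE_LIMIT_KB))                          # True
```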

Craig
list Patrik Nilsson · Fri, 12 Jun 2009 12:34:32 +0200 ·
Thanks, Craig, that solved it.

Patrik