Xymon Mailing List Archive

Disk Messages being misparsed

3 messages in this thread

list Thomas Vachon · Thu, 1 Nov 2012 09:20:52 -0400 ·
I'm not sure how to stop this, but it is becoming an epidemic for all of our
hosts.  The message is clearly being mis-parsed; restarting the client sends
a clean disk message.  I have a very large message size allowed and I'm not
getting warnings about messages over the limit.


Th] Nov 1 13:08:04 UTC 2012 - Filesystems NOT ok

 54% / (4535836% used) has reached the PANIC level (95%)
 1% /lib/init/rw (873004% used) has reached the PANIC level (95%)
 1% /dev (867876% used) has reached the PANIC level (95%)
 0% /dev/shm (873012% used) has reached the PANIC level (95%)
 1% /mnt (331757456% used) has reached the PANIC level (95%)

Filesyst
m   ]
   ]1024-blocks      Used Available Capacity Mounted on
/dev/sda1             10321208   5261084   4535836      54% /
tmpfs                   873012         8    873004       1% /lib/init/rw
udev                    868012       136    867876       1% /dev
tmpfs                   873012         0    873012       0% /dev/shm
/dev/sda2            350891748   1310012 331757456       1% /mnt
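
The bogus status lines above look like what a header-driven parser would produce once the header has lost its first column. The sketch below is only an illustration of that failure mode, not Xymon's actual code: `parse_df` is a hypothetical helper that finds the "Capacity" and "Mounted" columns by their position in the header. With the intact header it reports `/` at 54%; with the truncated header from the status page every index shifts left by one, so "Available" is read as the percentage and "54% /" as the mount point.

```python
def parse_df(lines):
    """Parse df-style output, locating columns by header token position.

    Hypothetical helper for illustration; not taken from the Xymon sources.
    """
    header = lines[0].split()
    cap_idx = header.index("Capacity")   # column holding "NN%"
    mnt_idx = header.index("Mounted")    # first token of "Mounted on"
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= mnt_idx:
            continue  # too short to contain a mount point
        mount = " ".join(fields[mnt_idx:])
        used = fields[cap_idx].rstrip("%")
        rows.append((mount, used))
    return rows


data_row = "/dev/sda1             10321208   5261084   4535836      54% /"

# Header as the client sent it.
good = [
    "Filesystem         1024-blocks      Used Available Capacity Mounted on",
    data_row,
]

# Header as it appeared on the status page, with "Filesystem" lost.
bad = [
    "   ]1024-blocks      Used Available Capacity Mounted on",
    data_row,
]

for label, lines in (("intact", good), ("corrupted", bad)):
    for mount, used in parse_df(lines):
        print(f"{label}: {mount} ({used}% used)")
```

The second line printed is `corrupted: 54% / (4535836% used)`, the same shape as the first PANIC line in the status.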


--
Thomas Vachon
Principal Operations Architect
*session M*
list Thomas Vachon · Thu, 1 Nov 2012 09:37:59 -0400 ·
One major item of note: the raw client data below is intact (it only got
munged on display), so the corruption has to be happening server side.

[df]
Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/sda1             10321208   5940992   3855928      61% /
tmpfs                  3824060         8   3824052       1% /lib/init/rw
udev                   3816848       144   3816704       1% /dev
tmpfs                  3824060         0   3824060       0% /dev/shm
/dev/sdb1            433455540 156601584 254835672      39% /mnt/hadoop/dfs/data1
/dev/sdc1            433455540 156777848 254659408      39% /mnt/hadoop/dfs/data2
[mount]
/dev/sda1 on / type ext4 (rw)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
/dev/sdb1 on /mnt/hadoop/dfs/data1 type ext3 (rw)


--
Thomas Vachon
Principal Operations Architect
*session M*

list Henrik Størner · Thu, 01 Nov 2012 14:54:09 +0100 ·
On Thu, 1 Nov 2012 09:20:52 -0400, Thomas Vachon <user-bd0daa6991dc@xymon.invalid> wrote:
I'm not sure how to stop this, but it is becoming an epidemic for all of our
hosts.  The message is clearly being mis-parsed; restarting the client sends
a clean disk message.  I have a very large message size allowed and I'm not
getting warnings about messages over the limit.

You are running a Xymon version older than 4.3.4. Please upgrade - this is
a known bug in all 4.3.x versions before 4.3.4.


From the Changes file:

Changes from 4.3.3 -> 4.3.4 (1 Aug 2011)
========================================
* Fix crashes and data corruption in Xymon worker modules
  (xymond_client, xymond_rrd etc) after handling large
  messages.
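
To check a batch of servers against that cutoff, a small comparison helper may be enough. This is a minimal sketch: `version_tuple` and `has_large_message_fix` are hypothetical names, and it assumes plain dotted numeric version strings such as "4.3.3" (suffixes like "-beta" are not handled). How the version string is obtained from each server is left out.

```python
def version_tuple(version):
    """Turn a dotted numeric version like '4.3.10' into (4, 3, 10)."""
    return tuple(int(part) for part in version.split("."))


def has_large_message_fix(version, fixed_in="4.3.4"):
    """True if `version` is at or after the release named in the Changes file."""
    return version_tuple(version) >= version_tuple(fixed_in)


for v in ("4.3.3", "4.3.4", "4.3.10"):
    status = "ok" if has_large_message_fix(v) else "affected, upgrade"
    print(f"{v}: {status}")
```

Comparing tuples of integers rather than raw strings matters here: as strings, "4.3.10" would sort before "4.3.4".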


Regards,
Henrik