Xymon Mailing List Archive

False alert on disk

Lars Kollstedt
Mon, 28 Nov 2016 23:14:14 +0100
Message-Id: <1852618.MeXDiMz60J@linux-larsk>

Hi Neil, hi List,

on Friday, 25. November 2016, 13:36:29 Neil Simmonds wrote:
Hi all,

I'm getting a strange false alert on one of our Xymon systems.
We got an alert for disk and the webpage output looks like this, 

Fri Nov 25 13:18:56 2016 - Filesystems NOT ok

red 99 (0 units free) has reached the PANIC level (524288 units)

red GB (18446744073709551615 units free) has reached the PANIC level (524288
units)

red N/A (18446744073709551615 units free) has reached the PANIC level
(524288 units)


Filesystem   1K-blocks     Used       Avail    Capacity   Total Size   Free
Space   Type    Mount Point
[...]
C             52420060   33935280   18484780    64%         49

99 GB]    17.63 GB   FIXED   N/A
[...]
Notice that some of the lines seem to have spurious line feeds, there is a
square bracket that has appeared and we have some letters missing.

When I clicked on the link for the client data this is what the disk section
looks like.
[...]
As you can see, there doesn't appear to be anything wrong with this.
Yes. I'm not completely sure that it would always show up here already, but I captured the client message channel and analyzed it with a script, and the messages I got were all OK.
The only difference that I am aware of with this is that on our system where
we are not seeing this, we are running Xymon 4.3.4 on CentOS 5.6 and on the
one where we are seeing the issue we are running Xymon 4.3.4 on CentOS 6.3
[...]
Has anyone ever seen this kind of behaviour?
Yes, I had the same issue some weeks ago on a really old 4.3.0.0-beta2. It turned out to be caused by an initialization issue when truncating client messages, so it was triggered by a large client message from the client that had reported just before. My workaround was to allow larger client messages, but I'm not sure this doesn't also have a security impact, since the behavior still looks like wrongly initialized pointers or data left over in hobbitd_worker.c / xymond_worker.c when truncating messages.
What worried me most was the part you show as "99 GB] ". Where does this bracket come from? I had it, too; see the examples below. And it definitely wasn't at that place in the client message passed to the hobbitd_client / xymond_client worker.
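
A side note on the free-space figure in Neil's output: 18446744073709551615 is exactly the largest value an unsigned 64-bit integer can hold, which is what -1 looks like when it is read as unsigned. That fits a wrapped-around or never-initialized value better than a real disk size. A small Python sketch of the arithmetic (the helper name as_signed_64 is only for illustration):

```python
# The "units free" value reported in the false alert.
reported = 18446744073709551615

# Largest value of an unsigned 64-bit integer.
U64_MAX = 2**64 - 1


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    return value - 2**64 if value >= 2**63 else value


print(reported == U64_MAX)     # True
print(as_signed_64(reported))  # -1
```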

After lots of debugging I saw the "Got over-size message, truncating at" message, which led me to the cause.
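
If you want to check whether your server is hitting the same truncation, you can search its log for that text. This is only a sketch: the function name find_oversize is made up, and the log file location depends on your installation, so the path in the usage comment is a placeholder. It matches on the quoted substring only and assumes nothing else about the log line format.

```python
from typing import Iterable, Iterator, Tuple

# Substring quoted in the mail; the rest of the log line is not assumed.
MARKER = "Got over-size message, truncating at"


def find_oversize(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every log line containing MARKER."""
    for lineno, line in enumerate(lines, start=1):
        if MARKER in line:
            yield lineno, line.rstrip("\n")


# Usage sketch (the path is a placeholder, adjust it to your setup):
#
#   with open("/path/to/server.log", errors="replace") as handle:
#       for lineno, line in find_oversize(handle):
#           print(lineno, line)
```

Comparing the timestamps of any hits with the times of the false alerts should show whether an oversize message arrived shortly before each one.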

But I haven't had the time to really hunt it down so far. :-( Possibly I'm also not familiar enough with the Xymon code for this. ;-)
I often also had a bracket and sometimes a line break, but sometimes neither of them, within the header line of the df output.
It randomly affected different machines, and the square brackets were also found within the ports status reported by the hobbitd_client / xymond_client worker, but they didn't result in red statuses there because our analysis rules for the ports are mostly less strict.

**** False Positive Message ****
manda4.hrz.tu-darmstadt.de:disk red [443790]
red Sat Oct 15 04]20:35 CEST 2016 - Filesystems NOT ok
&red 15594972      15% / (2651148% used) has reached the PANIC level (95%)
&red 609648       1% /run (444% used) has reached the PANIC level (95%)
&red 2% /tmp (1787588% used) has reached the PANIC level (95%)
&red 13324360       3% /home (360668% used) has reached the PANIC level (95%)
&red 44667620       6% /srv (2574396% used) has reached the PANIC level (95%)
&red 39834852      13% /var (5784076% used) has reached the PANIC level (95%)
&red 4472720       4% /var/lib/mysql (179952% used) has reached the PANIC level (95%)
&red 10% /var/lib/hobbit (116445760% used) has reached the PANIC level (95%)

Filesystem     1024-bloc
s   ]
Use] Available Capacity Mounted on
/dev/sda1         19222656  2651148  15594972      15% /
udev               3041408        4   3041404       1% /dev
tmpfs               610092      444    609648       1% /run
none                  5120        0      5120       0% /run/lock
none               3050460        0   3050460       0% /run/shm
/dev/sda7         19210]6    35864   1787588       2% /tmp
/dev/sda8         14417392   360668  13324360       3% /home
/dev/sda9         49770220  2574396  44667620       6% /srv
/dev/sda6         48060296  5784076  39834852      13% /var
/dev/sda10         4914816   179952   4472720       4% /var/lib/mysql
/dev/sda11       1
531996] 11656580 116445760      10% /var/lib/hobbit

**** False Positive Message ****
maven01-vb.hrz.tu-darmstadt.de:disk red [774507]
red Sat Oct 15 09:46:22 CEST 2016 - Filesystems NOT ok
&red 1% /run (406100% used) has reached the PANIC level (95%)

Filesystem                                             1024-blocks    Used Available Capacity Mounted on
udev                                                         10240       0     10240       0% /dev
t
pfs   ]                                                   406356     256    406100       1% /run
/dev/disk/by-uuid/298ee340-256f-4430-bba1-a14a475728c1    19222656 4254772  13991348      24% /
tmpfs                                                         5120       0      5120      0% /r]n/lock
tmpfs                                                      1398620       0   1398620       0% /run/shm
/dev/sda1                                                   350275   19677    311910       6% /boot
/dev/sda5                                                  8484528  220312   7833216       3% /home
/dev/sdb1                                                 31391836 6749152  23069824      23% /mnt/vol0
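
Damage like in the two listings above can be spotted mechanically. The following Python sketch is not the script I used, just an illustration under a few assumptions: the listing has the six-column Unix layout shown above (Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on), mount points contain no spaces, and the names row_ok and suspicious_df_lines are made up.

```python
import re
from typing import List

# Capacity column: one to three digits followed by a percent sign.
CAPACITY = re.compile(r"(\d{1,3})%")


def row_ok(line: str) -> bool:
    """Check one data row: six fields, integer sizes, 0-100%, absolute mount."""
    fields = line.split()
    if len(fields) != 6:
        return False
    _device, blocks, used, avail, capacity, mount = fields
    if not (blocks.isdigit() and used.isdigit() and avail.isdigit()):
        return False
    match = CAPACITY.fullmatch(capacity)
    if match is None or int(match.group(1)) > 100:
        return False
    return mount.startswith("/")


def suspicious_df_lines(section: str) -> List[str]:
    """Return the header and data lines that do not look well-formed."""
    lines = [line for line in section.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    bad = []
    # The header must still carry the last two column titles on one line.
    if "Capacity" not in header or not header.rstrip().endswith("Mounted on"):
        bad.append(header)
    bad.extend(row for row in rows if not row_ok(row))
    return bad
```

Run against the first listing, this flags the split header, the "19210]6" row and the broken /dev/sda11 row. It is deliberately simple: a stray character inside a mount point, like "/r]n/lock" in the second listing, still passes these checks.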

Since allowing larger client messages the issues are gone. The oversize message came from the machine that reported one or two client messages earlier. As far as I could reproduce it, the client message from the machine in between was completely ignored if the cause was two messages before.

Kind regards.
	Lars

-- 
man-da.de GmbH, AS8365                          Phone: +XX XXXX XX-XXXXX
Mornewegstraße 30                               Fax: +XX XXXX XX-XXXXX
D-64293 Darmstadt                               e-mail: user-0f90394071da@xymon.invalid
Geschäftsführer Marcus Stögbauer                AG Darmstadt, HRB 94 84