Xymon Mailing List Archive

Xymon reports excessive memory usage on 1 SLES 11 host

14 messages in this thread

list Carl Melgaard · Tue, 14 Dec 2010 09:46:20 +0100 ·
Hi,

I have a weird problem on one of my SLES 11 hosts:

Tue Dec 14 03:00:52 CET 2010 - Memory CRITICAL

   Memory              Used       Total  Percentage
Physical           3803M       3829M         99%
Actual      17592186044128M       3829M  459445966156%
Swap                  0M       2055M          0%


As you can see, it reports excessive memory usage and hence turns red. This happens twice each night. It's the 4.2.3 client compiled under SLES 11.

Any ideas?

Regards,

Carl Melgaard
list Henrik Størner · Tue, 14 Dec 2010 10:08:15 +0000 (UTC) ·
quoted from Carl Melgaard
On Tue, 14 Dec 2010 09:46:20 +0100, Carl Melgaard wrote:
I have a weird problem on one of my SLES 11 hosts:

Tue Dec 14 03:00:52 CET 2010 - Memory CRITICAL

   Memory              Used       Total  Percentage

Physical           3803M      3829M         99%
Actual      17592186044128M      3829M  459445966156%
Swap                  0M      2055M          0%

as you can see, it reports an excessive memory usage
Could you show me the client data behind this report?
Assuming you have the "hostdata" task running, it should be
available from the historical status log via the "Client data available"
link near the bottom of the page.

It is the "[free]" section that is interesting for the memory
report.

Also, what version of Xymon are you running on your Xymon server?


Regards,
Henrik
list Carl Melgaard · Tue, 14 Dec 2010 11:16:20 +0100 ·
Hi,
quoted from Henrik Størner
as you can see, it reports an excessive memory usage
Could you show me the client data behind this report?
Assuming you have the "hostdata" task running, it should be
available from the historical status log via the "Client data available"
link near the bottom of the page.

It is the "[free]" section that is interesting for the memory
report.
Here are two separate [free] sections (from history), that resulted in red alerts:

[free]
             total       used       free     shared    buffers     cached
Mem:       3921396    3894772      26624          0     302132    3887292
-/+ buffers/cache: 18014398509187332    4216048
Swap:      2104472        904    2103568

- and

[free]
             total       used       free     shared    buffers     cached
Mem:       3921396    3851576      69820          0     319192    3903296
-/+ buffers/cache: 18014398509111072    4292308
Swap:      2104472        904    2103568

- two different days.
Also, what version of Xymon are you running on your Xymon server?
I'm actually running 4.4.0-1 on the server, as I was bold when I implemented Xymon originally.

Regards,

Carl Melgaard
list Tim McCloskey · Tue, 14 Dec 2010 09:31:05 -0800 ·
I've seen the same strange reports from 4.2.0.  Solaris 10 zones on x86 are the reporting clients, example: 
red Physical     4294953114M      16384M 4294967210%

Since it's not a large-scale issue for me, I've not dedicated any time to looking into it. For me, it's a new issue on recent zone deployments only (in the last year or so), so I figured some version of shell tool xyz may be the culprit. I can look further or provide additional data if needed. The point is that the issue presents itself from 4.2.0 through 4.4, but it's new for me, so I'm not sure Hobbit is at fault.

Regards, 

Tim
list Ralph Mitchell · Tue, 14 Dec 2010 12:56:54 -0500 ·
That looks like output from the 'free' command.  Try going to the server and
entering

     free

and see what it says.  Xymon doesn't alter the output at all, it just passes
on whatever comes out.  Here's the relevant part of xymonclient-linux.sh,
with context:

     echo "[mount]"
     mount
     echo "[free]"
     free
     echo "[ifconfig]"
     /sbin/ifconfig


Ralph Mitchell
list Henrik Størner · Tue, 14 Dec 2010 21:29:30 +0000 (UTC) ·
quoted from Carl Melgaard
On Tue, 14 Dec 2010 11:16:20 +0100, Carl Melgaard wrote:
Could you show me the client data behind this report? Assuming you have
the "hostdata" task running, it should be available from the historical
status log via the "Client data available" link near the bottom of the
page.

It is the "[free]" section that is interesting for the memory report.
Here are two separate [free] sections (from history), that resulted in
red alerts:

[free]
             total       used       free     shared    buffers    cached
Mem:       3921396    3894772      26624          0     302132  3887292
-/+ buffers/cache: 18014398509187332    4216048
Swap:     2104472        904    2103568

- and

[free]
             total       used       free     shared    buffers    cached
Mem:       3921396    3851576      69820          0     319192  3903296
-/+ buffers/cache: 18014398509111072    4292308
Swap:     2104472        904    2103568
I have to plead "not guilty" on behalf of Xymon, then. The data reported by "free" in the "-/+ buffers/cache" line is obviously bogus, but it is
what Xymon uses for the "Actual" memory calculations. If Xymon gets bogus data, then you will also have bogus results.


Regards,
Henrik
list Tim McCloskey · Tue, 14 Dec 2010 13:42:21 -0800 ·
What does the Solaris client use to get this data?  vmstat?
(free is not a native solaris tool).
list Henrik Størner · Tue, 14 Dec 2010 21:50:10 +0000 (UTC) ·
quoted from Tim McCloskey
On Tue, 14 Dec 2010 13:42:21 -0800, Tim McCloskey wrote:
What does the Solaris client use to get this data?  vmstat? (free is not
a native solaris tool).
Each OS has its own way of reporting memory utilisation - it is
completely non-standard, and it is the one part of the Xymon client that
requires the most code for each new OS!

Specifically for Solaris, Xymon uses prtconf to determine how
much memory is installed, and vmstat to determine how much is
being used. "swap -s" was used for determining how much swap was
being used, but earlier today I committed an update so we will
now use "swap -l" instead.


Regards,
Henrik
list Tim McCloskey · Tue, 14 Dec 2010 14:11:52 -0800 ·
Henrik,

Thanks for the speedy answer.  I had seen this, for fun, in hobbitd/client/$clients.c.  You must enjoy porting that part of the project each time some OS makes a change :)

Trivia: 
On Solaris 10 zones, prtconf can get the installed "Memory size:", but anything further (like prtdiag) will fail.

System Configuration:  Sun Microsystems  i86pc
Memory size: 32768 Megabytes
System Peripherals (Software Nodes):

prtconf: devinfo facility not available


Regards, 

Tim
list Vernon Everett · Wed, 15 Dec 2010 11:16:46 +0800 ·
Not much point in doing memory monitoring on a Solaris sparse zone.
Might as well put a check in the client script to not collect memory info
for a sparse zone with capped memory.
See here for more info.
http://www.xymon.com/archive/2010/02/msg00213.html

Regards
     Vernon
list Carl Melgaard · Wed, 15 Dec 2010 09:29:45 +0100 ·
Hi,

"free" just gives the normal output, but sometimes it apparently reports bogus data on 1 host. Shrug. And it's the same version running across all the hosts. Thanks for looking into it, though.

Regards,

Carl Melgaard
list Carl Melgaard · Wed, 15 Dec 2010 09:34:12 +0100 ·
quoted from Tim McCloskey
[free]
             total       used       free     shared    buffers    cached
Mem:       3921396    3851576      69820          0     319192  3903296
-/+ buffers/cache: 18014398509111072    4292308
Swap:     2104472        904    2103568
I have to plead "not guilty" on behalf of Xymon, then. The data reported by "free" in the "-/+ buffers/cache" line is obviously bogus - but it is
what Xymon uses for the "Actual" memory calculations. If Xymon gets bogus data, then you will also have bogus results.
Yes, that's understandable. Is there any way I can NOT trigger notification on these bogus alerts? Disable the MEMACT check for that host?

Regards,

Carl Melgaard
list Henrik Størner · Wed, 15 Dec 2010 11:31:33 +0000 (UTC) ·
quoted from Carl Melgaard
In <user-b025057e2892@xymon.invalid> Carl Melgaard <user-cdea55422fa4@xymon.invalid> writes:
[free]
             total       used       free     shared    buffers    cached
Mem:       3921396    3851576      69820          0     319192  3903296
-/+ buffers/cache: 18014398509111072    4292308
Swap:     2104472        904    2103568
I have to plead "not guilty" on behalf of Xymon, then. The data reported
by "free" in the "-/+ buffers/cache" line is obviously bogus - but it is
what Xymon uses for the "Actual" memory calculations. If Xymon gets
bogus data, then you will also have bogus results.
Yes, that's understandable. Is there any way I can NOT trigger notification
on these bogus alerts? Disable the MEMACT check for that host?
Not in the code you have. But it seems reasonable to add some sort of sanity
check in the memory-status handler, so I've done that: it now only acts on the
data when the percent-used is at most 100%. So a) you will not get alerts
from the bogus data, and b) you can disable all memory alerts by setting
a threshold greater than 100.

Patch below should apply to 4.3.0-beta3.

Regards,
Henrik

Index: xymond/xymond_client.c
===================================================================
--- xymond/xymond_client.c	(revision 6590)
+++ xymond/xymond_client.c	(working copy)
@@ -883,17 +883,19 @@
 	get_memory_thresholds(hinfo, clientclass, &physyellow, &physred, &swapyellow, &swapred, &actyellow, &actred);
 
 	memphyspct = (memphystotal > 0) ? ((100 * memphysused) / memphystotal) : 0;
-	if (memphyspct > physyellow) physcolor = COL_YELLOW;
-	if (memphyspct > physred)    physcolor = COL_RED;
+	if (memphyspct <= 100) {
+		if (memphyspct > physyellow) physcolor = COL_YELLOW;
+		if (memphyspct > physred)    physcolor = COL_RED;
+	}
 
-	if (memswapused != -1) {
-		memswappct = (memswaptotal > 0) ? ((100 * memswapused) / memswaptotal) : 0;
+	if (memswapused != -1) memswappct = (memswaptotal > 0) ? ((100 * memswapused) / memswaptotal) : 0;
+	if (memswappct <= 100) {
 		if (memswappct > swapyellow) swapcolor = COL_YELLOW;
 		if (memswappct > swapred)    swapcolor = COL_RED;
 	}
 
-	if (memactused != -1) {
-		memactpct = (memphystotal > 0) ? ((100 * memactused) / memphystotal) : 0;
+	if (memactused != -1) memactpct = (memphystotal > 0) ? ((100 * memactused) / memphystotal) : 0;
+	if (memactpct <= 100) {
 		if (memactpct  > actyellow)  actcolor  = COL_YELLOW;
 		if (memactpct  > actred)     actcolor  = COL_RED;
 	}
@@ -927,14 +929,24 @@
 	addtostatus(msgline);
 
 	if (memactused != -1) {
-		sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n", 
-			colorname(actcolor), "Actual", memactused, memphystotal, memactpct);
+		if (memactpct <= 100)
+			sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n", 
+				colorname(actcolor), "Actual", memactused, memphystotal, memactpct);
+		else
+			sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%% - invalid data\n", 
+				colorname(COL_CLEAR), "Actual", memactused, memphystotal, 0);
 		addtostatus(msgline);
 	}
 
 	if (memswapused != -1) {
-		sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n", 
-			colorname(swapcolor), "Swap", memswapused, memswaptotal, memswappct);
+		if (memswappct <= 100)
+			sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n", 
+				colorname(swapcolor), "Swap", memswapused, memswaptotal, memswappct);
+		else
+			sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%% - invalid data\n", 
+				colorname(COL_CLEAR), "Swap", memswapused, memswaptotal, 0);
 		addtostatus(msgline);
 	}
 	if (fromline && !localmode) addtostatus(fromline);
list Tim McCloskey · Wed, 15 Dec 2010 09:25:47 -0800 ·
Thanks Vernon.  I know that what we see is not 100% accurate but for the majority of my zones they are only used for one purpose (only one child zone per phys server).  So, for me, the data provided from the zone is useful.  We don't really measure exact numbers, more of trends and watching for quick spikes.  