Xymon reports excessive memory usage on 1 SLES 11 host
list Carl Melgaard
Hi, I have a weird problem on one of my SLES 11 hosts:
Tue Dec 14 03:00:52 CET 2010 - Memory CRITICAL
Memory Used Total Percentage
Physical 3803M 3829M 99%
Actual 17592186044128M 3829M 459445966156%
Swap 0M 2055M 0%
As you can see, it reports excessive memory usage, so the status turns red, and it does this twice every night. It's the 4.2.3 client compiled on SLES 11. Any ideas? Regards, Carl Melgaard
list Henrik Størner
On Tue, 14 Dec 2010 09:46:20 +0100, Carl Melgaard wrote:
I have a weird problem on one of my SLES 11 hosts: Tue Dec 14 03:00:52 CET 2010 - Memory CRITICAL
Memory Used Total Percentage
Physical 3803M 3829M 99%
Actual 17592186044128M 3829M 459445966156%
Swap 0M 2055M 0%
as you can see, it reports an excessive memory usage
Could you show me the client data behind this report? Assuming you have the "hostdata" task running, it should be available from the historical status log via the "Client data available" link near the bottom of the page. It is the "[free]" section that is interesting for the memory report. Also, what version of Xymon are you running on your Xymon server? Regards, Henrik
list Carl Melgaard
Hi,
as you can see, it reports an excessive memory usage
Could you show me the client data behind this report? Assuming you have the "hostdata" task running, it should be available from the historical status log via the "Client data available" link near the bottom of the page. It is the "[free]" section that is interesting for the memory report.
Here are two separate [free] sections (from history) that resulted in red alerts:
[free]
total used free shared buffers cached
Mem: 3921396 3894772 26624 0 302132 3887292
-/+ buffers/cache: 18014398509187332 4216048
Swap: 2104472 904 2103568
- and
[free]
total used free shared buffers cached
Mem: 3921396 3851576 69820 0 319192 3903296
-/+ buffers/cache: 18014398509111072 4292308
Swap: 2104472 904 2103568
- two different days.
Also, what version of Xymon are you running on your Xymon server?
I'm actually running 4.4.0-1 on the server; I was feeling bold when I first implemented Xymon. Regards, Carl Melgaard
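Editor's sketch (hypothetical, not Xymon's own parser): the memory status lines can be reproduced from Carl's first [free] sample, assuming each value is converted from KB to MB by integer division by 1024 and each percentage is computed as (100 * used) / total in integer arithmetic, which is the form used in Henrik's patch later in the thread. Physical is taken from the used column of the Mem: line, Actual from the first number on the -/+ buffers/cache line, and Swap from the Swap: line. The struct and function names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the figures shown in the memory status, in MB. */
struct memstats {
    unsigned long long phys_total, phys_used;
    unsigned long long act_used;
    unsigned long long swap_total, swap_used;
};

/* Parse Linux "free" output (values in KB) into MB.
 * Returns 0 if the Mem:, -/+ buffers/cache: and Swap: lines were all found. */
int parse_free_section(const char *text, struct memstats *m)
{
    unsigned long long total, used, act;
    const char *p;
    int found = 0;

    p = strstr(text, "Mem:");
    if (p && sscanf(p, "Mem: %llu %llu", &total, &used) == 2) {
        m->phys_total = total / 1024;
        m->phys_used = used / 1024;
        found |= 1;
    }
    p = strstr(text, "-/+ buffers/cache:");
    if (p && sscanf(p, "-/+ buffers/cache: %llu", &act) == 1) {
        m->act_used = act / 1024;       /* "Actual" is taken as-is, bogus or not */
        found |= 2;
    }
    p = strstr(text, "Swap:");
    if (p && sscanf(p, "Swap: %llu %llu", &total, &used) == 2) {
        m->swap_total = total / 1024;
        m->swap_used = used / 1024;
        found |= 4;
    }
    return (found == 7) ? 0 : -1;
}

/* Integer percentage in the same form as the expressions in Henrik's patch. */
unsigned long long mem_pct(unsigned long long used, unsigned long long total)
{
    return (total > 0) ? (100 * used) / total : 0;
}
```

Fed the first sample, this yields 3803M of 3829M (99%), an Actual of 17592186044128M (459445966156%) and 0M of 2055M swap, matching the status in Carl's first message, so the enormous Actual value comes straight from the input.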
list Tim McCloskey
I've seen the same strange reports from 4.2.0. The reporting clients are Solaris 10 zones on x86, for example:
red Physical 4294953114M 16384M 4294967210%
Since it's not a large-scale issue for me, I haven't dedicated any time to looking into it. For me it's a new issue, seen only on recent zone deployments (in the last year or so), so I figured some version of shell tool xyz might be the culprit. I can look further or provide additional data if needed. The point is that the issue presents itself on everything from 4.2.0 to 4.4, but since it's new for me I'm not sure it's Hobbit. Regards, Tim
From: Carl Melgaard [user-cdea55422fa4@xymon.invalid]
Sent: Tuesday, December 14, 2010 2:16 AM
To: 'xymon at xymon.com'
Subject: SV: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host
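Tim's figures look like negative numbers printed as unsigned 32-bit values: 4294953114 is 2^32 - 14182 and 4294967210 is 2^32 - 86. The sketch below shows one way to get exactly those numbers. It assumes, without confirmation from the thread, that "used" is computed as total minus a reported free value in signed 32-bit arithmetic and then displayed as unsigned; the free value of 30566M is inferred from the displayed numbers (16384 + 14182), not something Tim reported. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical reconstruction: used = total - free in signed 32-bit
 * arithmetic, then shown as an unsigned 32-bit value. */
uint32_t shown_used_mb(int32_t total_mb, int32_t free_mb)
{
    int32_t used = total_mb - free_mb;    /* negative when free > total */
    return (uint32_t)used;                /* wraps modulo 2^32 */
}

/* Same model for the percentage; C integer division truncates toward zero. */
uint32_t shown_pct(int32_t total_mb, int32_t free_mb)
{
    int32_t used = total_mb - free_mb;
    int32_t pct = (total_mb > 0) ? (100 * used) / total_mb : 0;
    return (uint32_t)pct;                 /* -86 becomes 4294967210 */
}
```

With a total of 16384M and a free value of 30566M, shown_used_mb() returns 4294953114 and shown_pct() returns 4294967210. Computing the percentage from the already-wrapped unsigned value would give a different result, so under this model the subtraction happened in signed arithmetic.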
list Ralph Mitchell
That looks like output from the 'free' command. Try going to the server, entering
free
and seeing what it says. Xymon doesn't alter the output at all; it just passes
on whatever comes out. Here's the relevant part of xymonclient-linux.sh,
with context:
echo "[mount]"
mount
echo "[free]"
free
echo "[ifconfig]"
/sbin/ifconfig
Ralph Mitchell
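Ralph's excerpt shows that the client just prints a "[free]" header followed by the raw command output, so on the receiving side the memory check only has to pick out the lines between "[free]" and the next section header. A minimal sketch of that extraction, assuming sections always start with a "[name]" line (get_section() is a hypothetical helper, not a Xymon function):

```c
#include <stdio.h>
#include <string.h>

/* Copy the body of section "[name]" from a client message into out.
 * The body runs from the line after the header up to the next line that
 * starts with '[' (the next section) or the end of the message.
 * Returns 0 if the section was found, -1 otherwise. */
int get_section(const char *msg, const char *name, char *out, size_t outlen)
{
    char hdr[64];
    const char *p, *start, *end;
    size_t n;

    if (outlen == 0) return -1;
    snprintf(hdr, sizeof(hdr), "[%s]\n", name);

    /* Accept the header only at the start of a line. */
    for (p = strstr(msg, hdr); p != NULL; p = strstr(p + 1, hdr)) {
        if (p == msg || p[-1] == '\n') break;
    }
    if (p == NULL) return -1;

    start = p + strlen(hdr);
    end = start;
    while (*end != '\0' && *end != '[') {       /* stop at the next "[section]" */
        const char *nl = strchr(end, '\n');
        end = nl ? nl + 1 : end + strlen(end);
    }

    n = (size_t)(end - start);
    if (n >= outlen) n = outlen - 1;            /* truncate to fit the buffer */
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}
```

Nothing in this step inspects or corrects the numbers, which fits Ralph's point that whatever free prints is what the server sees.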
list Henrik Størner
On Tue, 14 Dec 2010 11:16:20 +0100, Carl Melgaard wrote:
Could you show me the client data behind this report? Assuming you have the "hostdata" task running, it should be available from the historical status log via the "Client data available" link near the bottom of the page. It is the "[free]" section that is interesting for the memory report.
Here are two separate [free] sections (from history) that resulted in red alerts:
[free]
total used free shared buffers cached
Mem: 3921396 3894772 26624 0 302132 3887292
-/+ buffers/cache: 18014398509187332 4216048
Swap: 2104472 904 2103568
- and
[free]
total used free shared buffers cached
Mem: 3921396 3851576 69820 0 319192 3903296
-/+ buffers/cache: 18014398509111072 4292308
Swap: 2104472 904 2103568
I have to plead "not guilty" on behalf of Xymon, then. The data reported by "free" in the "-/+ buffers/cache" line is obviously bogus - but it is what Xymon uses for the "Actual" memory calculations. If Xymon gets bogus data, then you will also have bogus results. Regards, Henrik
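Both samples are consistent with an unsigned underflow: buffers + cached exceeds used (302132 + 3887292 = 4189424 > 3894772 in the first sample), so used - buffers - cached is negative (-294652 KB, and -370912 KB in the second). A sketch of the arithmetic, assuming (as a model, not taken from free's source) that free does the subtraction in unsigned 64-bit integers and converts KB to bytes and back with 10-bit shifts, reproduces both reported values exactly. The function names are hypothetical.

```c
#include <stdint.h>

/* Modelled "used" column of free's "-/+ buffers/cache" line, in KB:
 * unsigned 64-bit subtraction, then a KB -> bytes -> KB round trip via shifts. */
uint64_t bufcache_used_kb(uint64_t used, uint64_t buffers, uint64_t cached)
{
    uint64_t kb = used - (buffers + cached);  /* wraps if buffers+cached > used */
    return (kb << 10) >> 10;                  /* top 10 bits lost: result < 2^54 */
}

/* The "free" column of the same line: a plain sum, no wraparound. */
uint64_t bufcache_free_kb(uint64_t memfree, uint64_t buffers, uint64_t cached)
{
    return memfree + buffers + cached;
}
```

The shifts discard the top 10 bits, which is why the wrapped values land just below 2^54 (18014398509481984) instead of near 2^64. The underlying problem is still the input: in both samples buffers + cached is larger than even the total of 3921396 KB, so no arithmetic in free or Xymon can turn it into a sensible "Actual" figure.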
list Tim McCloskey
What does the Solaris client use to get this data? vmstat? (free is not a native Solaris tool.)
From: Henrik Størner [user-ce4a2c883f75@xymon.invalid]
Sent: Tuesday, December 14, 2010 1:29 PM
To: xymon at xymon.com
Subject: Re: SV: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host
list Henrik Størner
On Tue, 14 Dec 2010 13:42:21 -0800, Tim McCloskey wrote:
What does the Solaris client use to get this data? vmstat? (free is not a native Solaris tool.)
Each OS has its own way of reporting memory utilisation - it is completely non-standard, and the one part of the Xymon client that requires the most code for each new OS! Specifically for Solaris, Xymon uses prtconf to determine how much memory is installed, and vmstat to determine how much is being used. "swap -s" was used for determining how much swap was being used, but earlier today I committed an update so we will now use "swap -l" instead. Regards, Henrik
list Tim McCloskey
Henrik, Thanks for the speedy answer. I had seen this fun in hobbitd/client/$clients.c. You must enjoy porting that part of the project each time some OS makes a change :) Trivia: On Solaris 10 zones, prtconf can get the installed "Memory size:", but anything further (like prtdiag) will fail:
System Configuration: Sun Microsystems i86pc
Memory size: 32768 Megabytes
System Peripherals (Software Nodes):
prtconf: devinfo facility not available
Regards, Tim
From: Henrik Størner [user-ce4a2c883f75@xymon.invalid]
Sent: Tuesday, December 14, 2010 1:50 PM
To: xymon at xymon.com
Subject: Re: SV: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host
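Tim's trivia suggests that even inside a zone the installed-memory figure is available from the "Memory size:" line, although prtconf then fails. A hedged sketch of extracting it, tolerant of the trailing error message (prtconf_memory_mb() is a hypothetical helper; the Xymon client's real parsing lives in the per-OS code Tim mentions, and units other than Megabytes are simply rejected here):

```c
#include <stdio.h>
#include <string.h>

/* Return installed memory in MB from prtconf output, or -1 if the
 * "Memory size:" line is missing or not given in Megabytes. */
long prtconf_memory_mb(const char *out)
{
    const char *p = strstr(out, "Memory size:");
    long mb;
    char unit[16];

    if (p == NULL) return -1;
    if (sscanf(p, "Memory size: %ld %15s", &mb, unit) != 2) return -1;
    if (strcmp(unit, "Megabytes") != 0) return -1;   /* other units not handled */
    return mb;
}
```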
list Vernon Everett
Not much point in doing memory monitoring on a Solaris sparse zone. Might as well put a check in the client script to not collect memory info for a sparse zone with capped memory. See here for more info. http://www.xymon.com/archive/2010/02/msg00213.html Regards Vernon
list Carl Melgaard
Hi, "free" just gives the normal output, but sometimes it apparently reports bogus data on one host. Shrug. And it's the same version running across all the hosts. Thanks for looking into it, though. Regards, Carl Melgaard
From: Ralph Mitchell [mailto:user-00a5e44c48c0@xymon.invalid]
Sent: 14 December 2010 18:57
To: xymon at xymon.com
Subject: Re: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host
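Since free on this host only occasionally produces bogus numbers, one option is to check the raw Mem: line before trusting the derived -/+ buffers/cache figure. In Carl's samples "used" equals total minus free (3921396 - 26624 = 3894772), so it should include buffers and cache, and buffers + cached should never exceed used. A minimal sketch under that assumption (free_mem_line_plausible() is a hypothetical helper, not part of Xymon):

```c
#include <stdio.h>

/* Check one "Mem:" line of free output (KB). Returns 1 if plausible,
 * 0 if the numbers are inconsistent, -1 if the line cannot be parsed. */
int free_mem_line_plausible(const char *line)
{
    unsigned long long total, used, memfree, shared, buffers, cached;

    if (sscanf(line, "Mem: %llu %llu %llu %llu %llu %llu",
               &total, &used, &memfree, &shared, &buffers, &cached) != 6)
        return -1;
    if (used > total) return 0;              /* cannot use more than installed */
    if (buffers + cached > used) return 0;   /* would make -/+ "used" negative */
    return 1;
}
```

Both of Carl's samples fail this check (buffers + cached is 4189424 and 4222488 KB against used of 3894772 and 3851576 KB).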
list Carl Melgaard
[free]
total used free shared buffers cached
Mem: 3921396 3851576 69820 0 319192 3903296
-/+ buffers/cache: 18014398509111072 4292308
Swap: 2104472 904 2103568
I have to plead "not guilty" on behalf of Xymon, then. The data reported by "free" in the "-/+ buffers/cache" line is obviously bogus - but it is what Xymon uses for the "Actual" memory calculations. If Xymon gets bogus data, then you will also have bogus results.
Yes, that's understandable. Is there any way I can avoid triggering notifications on these bogus alerts? Disable the MEMACT check for that host? Regards, Carl Melgaard
list Henrik Størner
In <user-b025057e2892@xymon.invalid> Carl Melgaard <user-cdea55422fa4@xymon.invalid> writes:
[free]
total used free shared buffers cached
Mem: 3921396 3851576 69820 0 319192 3903296
-/+ buffers/cache: 18014398509111072 4292308
Swap: 2104472 904 2103568
I have to plead "not guilty" on behalf of Xymon, then. The data reported by "free" in the "-/+ buffers/cache" line is obviously bogus - but it is what Xymon uses for the "Actual" memory calculations. If Xymon gets bogus data, then you will also have bogus results.
Yes, that's understandable. Is there any way I can avoid triggering notifications on these bogus alerts? Disable the MEMACT check for that host?
Not in the code you have. But it seems reasonable to add some sort of sanity
check in the memory-status handler, so I've done that: it now only acts on the
data when the percent-used is at most 100%. So a) you will not get alerts
from the bogus data, and b) you can disable all memory alerts by setting
a threshold greater than 100.
The patch below should apply to 4.3.0-beta3.
Regards,
Henrik
Index: xymond/xymond_client.c
===================================================================
--- xymond/xymond_client.c (revision 6590)
+++ xymond/xymond_client.c (working copy)
@@ -883,17 +883,19 @@
get_memory_thresholds(hinfo, clientclass, &physyellow, &physred, &swapyellow, &swapred, &actyellow, &actred);
memphyspct = (memphystotal > 0) ? ((100 * memphysused) / memphystotal) : 0;
- if (memphyspct > physyellow) physcolor = COL_YELLOW;
- if (memphyspct > physred) physcolor = COL_RED;
+ if (memphyspct <= 100) {
+ if (memphyspct > physyellow) physcolor = COL_YELLOW;
+ if (memphyspct > physred) physcolor = COL_RED;
+ }
- if (memswapused != -1) {
- memswappct = (memswaptotal > 0) ? ((100 * memswapused) / memswaptotal) : 0;
+ if (memswapused != -1) memswappct = (memswaptotal > 0) ? ((100 * memswapused) / memswaptotal) : 0;
+ if (memswappct <= 100) {
if (memswappct > swapyellow) swapcolor = COL_YELLOW;
if (memswappct > swapred) swapcolor = COL_RED;
}
- if (memactused != -1) {
- memactpct = (memphystotal > 0) ? ((100 * memactused) / memphystotal) : 0;
+ if (memactused != -1) memactpct = (memphystotal > 0) ? ((100 * memactused) / memphystotal) : 0;
+ if (memactpct <= 100) {
if (memactpct > actyellow) actcolor = COL_YELLOW;
if (memactpct > actred) actcolor = COL_RED;
}
@@ -927,14 +929,24 @@
addtostatus(msgline);
if (memactused != -1) {
- sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n",
- colorname(actcolor), "Actual", memactused, memphystotal, memactpct);
+ if (memactpct <= 100)
+ sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n",
+ colorname(actcolor), "Actual", memactused, memphystotal, memactpct);
+ else
+ sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%% - invalid data\n",
+ colorname(COL_CLEAR), "Actual", memactused, memphystotal, 0UL);
addtostatus(msgline);
}
if (memswapused != -1) {
- sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n",
- colorname(swapcolor), "Swap", memswapused, memswaptotal, memswappct);
+ if (memswappct <= 100)
+ sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%%\n",
+ colorname(swapcolor), "Swap", memswapused, memswaptotal, memswappct);
+ else
+ sprintf(msgline, "&%s %-12s%11luM%11luM%11lu%% - invalid data\n",
+ colorname(COL_CLEAR), "Swap", memswapused, memswaptotal, 0UL);
addtostatus(msgline);
}
if (fromline && !localmode) addtostatus(fromline);
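The essence of the patch is a guard around the threshold comparisons. A standalone sketch of that logic (the enum and memory_color() are hypothetical, and the green default is an assumption since the initial color is set outside the hunk) shows both effects Henrik describes: a percentage above 100 leaves the color at its default, and a threshold above 100 can never be crossed by valid data.

```c
/* Hypothetical color values; Xymon's real COL_* constants are defined elsewhere. */
enum color { GREEN, YELLOW, RED };

/* Same comparisons as the patched code: data above 100% is treated as
 * invalid and does not change the (assumed green) default color. */
enum color memory_color(unsigned long long pct,
                        unsigned long long yellow, unsigned long long red)
{
    enum color c = GREEN;   /* assumption: default color before the checks */

    if (pct <= 100) {
        if (pct > yellow) c = YELLOW;
        if (pct > red) c = RED;
    }
    return c;
}
```

For example, Carl's Actual value of 459445966156% stays at the default (green in this sketch) instead of going red, while a red threshold of 101 keeps even 100% usage from alerting.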
list Tim McCloskey
Thanks, Vernon. I know that what we see is not 100% accurate, but the majority of my zones are used for only one purpose (only one child zone per physical server). So, for me, the data provided from the zone is useful. We don't really look at exact numbers; we watch trends and look for quick spikes.
From: Vernon Everett [user-b3f8dacb72c8@xymon.invalid]
Sent: Tuesday, December 14, 2010 7:16 PM
To: xymon at xymon.com
Subject: Re: SV: [xymon] Xymon reports excessive memory usage on 1 SLES 11 host