Xymon Mailing List Archive

AIX memory checks

8 messages in this thread

list Rob McBroom · Tue, 28 Jun 2011 09:29:42 -0400 ·
We’re seeing garbage data on some of our AIX (7.1) clients like this:

red Thu Jun 23 15:29:28 EDT 2011 - Memory CRITICAL
  Memory              Used       Total  Percentage
&red Physical    18446744073709551177M       4096M18446744073709551606%
&green Swap                 40M       4096M          0%

The ones that exhibit problems have Active Memory Expansion enabled. Is there anything we can do to get valid data? Should I report it as a bug?

-- 
Rob McBroom
<http://www.skurfer.com/>
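
A note on the numbers in this report: both oversized values sit just below 2^64 (18446744073709551616), which is what a small negative number looks like when it is held or printed as an unsigned 64-bit integer. The Python sketch below only decodes the two values from the report. The helper name `as_signed64` is made up for illustration, and the reading that a negative "used" figure was computed somewhere is an assumption, not something the report itself confirms.

```python
# Reinterpret an unsigned 64-bit value as a signed (two's complement) one.
# as_signed64 is a hypothetical helper, not part of Xymon.
def as_signed64(value: int) -> int:
    return value - 2**64 if value >= 2**63 else value

used_mb = as_signed64(18446744073709551177)   # "Used" column from the report
used_pct = as_signed64(18446744073709551606)  # "Percentage" column

print(used_mb, used_pct)  # -439 -10
```

Read this way, the line says -439 MB used out of 4096 MB, which is about -10.7% and truncates to -10.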
list Carl Melgaard · Wed, 29 Jun 2011 09:47:16 +0200 ·
Hi,

As far as I recall, Henrik fixed this in the newer versions (the 4.3 branch) by adding a sanity check in the code, so that we shouldn't get these bogus alerts. The problem, as I remember it, is that it's actually the system that sends these values back to the agent, so it's not related to Xymon.

/melgaard
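
For illustration only, a sanity check of the kind mentioned here might look like the following Python sketch. The function name, the return convention, and the exact conditions are all assumptions; the actual check in the 4.3 branch is not shown in this thread.

```python
from typing import Optional

# Hypothetical sanity check, not the code from the 4.3 branch.
def memory_percent(used_mb: int, total_mb: int) -> Optional[int]:
    """Return the used percentage, or None when the inputs are implausible."""
    if total_mb <= 0 or used_mb < 0 or used_mb > total_mb:
        return None  # bogus data from the client: do not alert on it
    return (100 * used_mb) // total_mb

print(memory_percent(40, 4096))                    # 0
print(memory_percent(18446744073709551177, 4096))  # None
```

With a check like this, the swap line from the report still yields 0%, while the physical line is rejected because "used" exceeds "total".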
quoted from Rob McBroom

-----Original Message-----
From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On behalf of Rob McBroom
Sent: 28 June 2011 15:30
To: xymon at xymon.com
Subject: [Xymon] AIX memory checks

We're seeing garbage data on some of our AIX (7.1) clients like this:

red Thu Jun 23 15:29:28 EDT 2011 - Memory CRITICAL
  Memory              Used       Total  Percentage
&red Physical    18446744073709551177M       4096M18446744073709551606%
&green Swap                 40M       4096M          0%

The ones that exhibit problems have Active Memory Expansion enabled. Is there anything we can do to get valid data? Should I report it as a bug?

-- 
Rob McBroom
<http://www.skurfer.com/>
list Stef Coene · Wed, 29 Jun 2011 10:55:28 +0200 ·
quoted from Rob McBroom
On Tuesday 28 June 2011, Rob McBroom wrote:
We’re seeing garbage data on some of our AIX (7.1) clients like this:

red Thu Jun 23 15:29:28 EDT 2011 - Memory CRITICAL
  Memory              Used       Total  Percentage
&red Physical    18446744073709551177M       4096M18446744073709551606%
&green Swap                 40M       4096M          0%

The ones that exhibit problems have Active Memory Expansion enabled. Is
there anything we can do to get valid data? Should I report it as a bug?
I don't see this on my AIX 7.1 test systems.  AME is not enabled.

Can you post the output of
vmstat 1 2


Stef

list Rob McBroom · Wed, 29 Jun 2011 09:15:27 -0400 ·
On Jun 29, 2011, at 4:55 AM, Stef Coene wrote:
Can you post the output of
vmstat 1 2
I looked at that a bit, but I have no idea what normal is, so it didn’t tell me much. I certainly didn’t see anything like “18446744073709551177M”. :)

The problem seems to have cleared on its own for now. If it reappears, I’ll send the output from that command. Thanks.

-- 
Rob McBroom
<http://www.skurfer.com/>
list Rob McBroom · Mon, 1 Aug 2011 14:04:18 -0400 ·
quoted from Rob McBroom
On Jun 29, 2011, at 9:15 AM, Rob McBroom wrote:
The problem seems to have cleared on its own for now. If it reappears, I’ll send the output from that command. Thanks.

OK, here's one. The report looks like this:

    hostname:memory red [884137]
    red Mon Aug  1 11:23:33 EDT 2011 - Memory CRITICAL
      Memory              Used       Total  Percentage
    &red Physical    18446744073709546509M      16384M18446744073709551585%
    &green Swap                 40M       4096M          0%
    

The vmstat section looks like this:

    [vmstat]
    
    System configuration: lcpu=16 mem=28672MB ent=0.50
    
    kthr    memory              page              faults              cpu          
    ----- ----------- ------------------------ ------------ -----------------------
     r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa    pc    ec
     1  1 1025924 5498980   0   0   0   0    0   0  25 2080 1221  8  4 88  0  0.10  19.6
     2  1 1026236 5498663   0   0   0   0    0   0  25 2152 1213  8  4 88  0  0.10  19.6


-- 
Rob McBroom
<http://www.skurfer.com/>
list Henrik Størner · Mon, 01 Aug 2011 22:38:15 +0200 ·
quoted from Rob McBroom
The vmstat section looks like this:

     [vmstat]

     System configuration: lcpu=16 mem=28672MB ent=0.50
What does the "[realmem]" section look like? If you look in the xymonclient-aix.sh script, you'll see that it uses these commands:

echo "[realmem]"
lsattr -El sys0 -a realmem
echo "[freemem]"
vmstat 1 2 | tail -1
echo "[swap]"
lsps -s

All of this is in the "client data" linked to from the detailed status page. It would be nice to have an example both when it is green and when it is red.


Regards,
Henrik
list Rob McBroom · Tue, 2 Aug 2011 10:28:10 -0400 ·
On Aug 1, 2011, at 4:38 PM, Henrik Størner wrote:
What does the "[realmem]" section look like ?
This is what it currently looks like (still red and reporting invalid numbers). The client is version 4.3.3, by the way. Server is 4.2.2-RC1 if it matters.

[realmem]
realmem 16777216 Amount of usable physical memory in Kbytes False
[freemem]
0  0 1051672 5471551   0   0   0   0    0   0  10  577 525  1  3 96  0  0.03   6.7
[swap]
Total Paging Space   Percent Used
     4096MB               1%
All of this is in the "client data" linked to from the detailed status page. It would be nice to have an example both when it is green and when it is red.
It’s generally red, but here’s the same data from a nearly identical system that’s green. This one has more memory, but the configuration should be the same.

[realmem]
realmem 33554432 Amount of usable physical memory in Kbytes False
[freemem]
9  3 9838132 4694881   0   0   0 909  947   0 2836 50135 29272 50 11 38  1  2.90  96.7
[swap]
Total Paging Space   Percent Used
     4096MB               3%

Thanks.

-- 
Rob McBroom
<http://www.skurfer.com/>
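
To see how these figures could lead to a wrapped value, here is a rough Python sketch that recomputes used memory from the two samples in this message. It assumes 4 KB pages for the `fre` column (the fourth field of the vmstat line) and that used memory is derived as realmem minus free. Both are assumptions about how the server-side parser works, and the function name is made up.

```python
PAGE_KB = 4  # assumed page size for the vmstat "fre" column

# Hypothetical helper; the real parser lives in the Xymon server code.
def physical_used_mb(realmem_line: str, freemem_line: str) -> int:
    """Used physical memory in MB, computed as total minus free."""
    total_mb = int(realmem_line.split()[1]) // 1024           # realmem is in KB
    free_mb = int(freemem_line.split()[3]) * PAGE_KB // 1024  # fre is in pages
    return total_mb - free_mb

red = physical_used_mb(
    "realmem 16777216 Amount of usable physical memory in Kbytes False",
    "0  0 1051672 5471551   0   0   0   0    0   0  10  577 525  1  3 96  0  0.03   6.7",
)
green = physical_used_mb(
    "realmem 33554432 Amount of usable physical memory in Kbytes False",
    "9  3 9838132 4694881   0   0   0 909  947   0 2836 50135 29272 50 11 38  1  2.90  96.7",
)
print(red, green)  # -4989 14429
```

On the red host the free figure (about 21373 MB) is larger than realmem (16384 MB), so the difference is negative. Held in an unsigned variable, that would show up as a number just below 2^64. It would also fit the earlier vmstat header reporting mem=28672MB on a host whose realmem is 16384 MB, though whether Active Memory Expansion is the cause is a guess.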
list Henrik Størner · Tue, 02 Aug 2011 18:11:48 +0200 ·
quoted from Rob McBroom
On 02-08-2011 16:28, Rob McBroom wrote:
This is what it currently looks like (still red and reporting invalid numbers). The client is version 4.3.3, by the way. Server is 4.2.2-RC1 if it matters.
Ouch, that is an old server you have there ...

I cannot say that upgrading will fix it, but there have been some changes along the way to how the memory-parsing code works. It also means that it will be difficult to provide any kind of patches to test.
quoted from Rob McBroom
[realmem]
realmem 16777216 Amount of usable physical memory in Kbytes False
[freemem]
0  0 1051672 5471551   0   0   0   0    0   0  10  577 525  1  3 96  0  0.03   6.7
[swap]
Total Paging Space   Percent Used
      4096MB               1%
Looks sane, and running it through my Linux/Intel test system here gives the correct result with 4.3.4.

The only possible bug I can see is that something weird happens when your compiler does the arithmetic, because there are long ints and short ints involved. One thing you could try is to change the hobbitd/hobbitd_client.c file: in the "unix_memory_report()" function, remove the "unsigned" keyword from the declaration of the "memphyspct", "memswappct" and "memactpct" variables, and in the 4 "sprintf" commands below, change the "%11lu" formatting string to "%lld" (replace 'u' with 'd').

No idea if it helps, but it might. But I would strongly recommend upgrading your Xymon server to 4.3.4.


Regards,
Henrik
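
The difference between unsigned and signed formatting can be previewed without rebuilding anything. The Python sketch below imitates the status line with a format string reconstructed from the report layout (the `%-12s` and `%11d` widths are inferred from the output, not copied from hobbitd_client.c), then prints the same 64-bit values reinterpreted as signed numbers.

```python
import ctypes

# Layout inferred from the report; an assumption, not the real Xymon source.
LINE_FMT = "&%s %-12s%11dM%11dM%11d%%"

used, total, pct = 18446744073709551177, 4096, 18446744073709551606

# As reported: the 20-digit unsigned values overflow the 11-character fields,
# which is why "4096M" runs straight into the percentage.
reported = LINE_FMT % ("red", "Physical", used, total, pct)
print(reported)

# The same bit patterns read as signed 64-bit integers.
signed_used = ctypes.c_int64(used).value
signed_pct = ctypes.c_int64(pct).value
signed_line = LINE_FMT % ("red", "Physical", signed_used, total, signed_pct)
print(signed_line)
```

The signed form makes the negative values visible (-439M and -10%) and keeps the columns aligned, but it does not explain where the negative figure comes from.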