Xymon Mailing List Archive

FreeBSD Actual Memory Usage

7 messages in this thread

list Jeremy Laidman · Thu, 21 Nov 2013 17:23:36 +1100 ·
Good Xymon Folks

Xymon doesn't support reporting "actual" memory usage for FreeBSD systems -
that is, available memory that may or may not be in use for buffers and
cache.  It only reports, graphs, and alerts on, swap and physical memory
usage.  Some OSes use some of the unused memory for filesystem caching and
other purposes related to performance, and so the reported free memory
count goes down over time even though the memory available for use is
not decreasing.  On my systems, free memory is only a few percent.  So this
doesn't give any indication of the risk of memory "resource exhaustion".
So what I really need is to report on "actual" memory usage.  From what
reading I've done, for FreeBSD this would be total memory minus the sum of
"free" and "inactive" memory, although that depends on who you ask.

The "actual" memory reporting is not only a problem for FreeBSD, but for
all supported OSes except for Linux, IRIX and Windows.  I suspect many of
these OSes report their free memory to include used-but-available memory
also, and so the "real" available memory is a useful number.
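
For the FreeBSD case described above, the arithmetic can be sketched in a few lines of Python. The function names and the sample figures are hypothetical, and treating "inactive" pages as available is an assumption that not every source shares.

```python
def actual_used(total, free, inactive):
    """Memory in use once free and inactive pages are both treated as available."""
    return total - (free + inactive)


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Hypothetical figures, in MB, for a host whose "free" count looks alarming.
    total, free, inactive = 8192, 256, 3072
    print("physical used: %.1f%%" % percent(total - free, total))
    print("actual used:   %.1f%%" % percent(actual_used(total, free, inactive), total))
```

With figures like these, the plain "used" percentage sits close to 100% while the "actual" percentage leaves a lot of headroom, which is the gap the thread is about.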

Even real memory usage reporting seems to have caused trouble in the past
for FreeBSD, as Xymon has had to have its own client-side binary for
getting the memory numbers, as is also the case for HPUX and the other
*BSDs, NetBSD and OpenBSD.  For all other OS types, standard OS tools
(free, sar) are used to get memory usage numbers.

Memory usage reporting in Xymon seems to be quite a mixed bag, in general.
 Some clients report usage to [memory], some to [freemem], some to
[meminfo], some to [free].  The Irix client has no specific memory usage
report at all.  This is by no means a complaint - I'm sure Henrik would
have been much happier if all OSes had a standard memory query interface,
and I suspect a lot of these different reporting methods were for legacy
support.

So.  I need to get actual memory reported for some FreeBSD systems, so I'm
trying to work out the best way to do this.  I'd like to fix it in a way
that fits in with the "standard" model, but as I described above, there
isn't really a "standard" model.  Here are several ways I can think of to
solve the problem:

1. FreeBSD reports [top] header output, if top is installed (as do many
OSes).  The server-side code could simply grab the numbers from there.
This would work for other OSes that don't report "actual" memory, and
could be a common interface to memory usage metrics.  Any new OS that
doesn't have extensive client support could simply report a [top] section
in client data, and Xymon would magically start reporting actual free
memory.  The down-side to this is that top isn't installed everywhere.
 Although on my systems it is, so that's OK.  The other down-side is that
it requires patches to the server code.

2. I could replace the "freebsd-meminfo" binary that comes with the Xymon
client, so that the "free" figure has the "inactive" memory added (or
whatever adjustments are appropriate).  This doesn't solve the problem for
any other OS.  Perhaps that's OK - perhaps the problem is very much OS
dependent because each OS has its own unique memory management.  I think
the binary can be replaced by a simple shell script that parses the "top"
header or "sysctl vm.vmtotal" for the correct figures.  (It seems that
"top" gets all its numbers from sysctl anyway, so I'd do the latter, so as
to avoid a dependency.)

3. I could report each of the different memory metrics separately to Xymon:
active, inactive, wired, cache, buffers, free.  Then I can graph them all,
and look for various conditions on each of them separately, or in certain
combinations that make sense.  This is the most flexible option, and would
provide the highest degree of insight to someone trying to troubleshoot a
sluggish server, but it requires a lot more work on both client and server.
 It's also specific to *BSD systems.
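
As a rough sketch of the sysctl route in option 2, the following Python reads per-category page counts and converts them to bytes. It uses the vm.stats.vm.* counters rather than vm.vmtotal; those OID names are an assumption to verify on the target FreeBSD release (not every release exposes all of them), and the function names and sample values are made up for illustration.

```python
import subprocess

# sysctl OIDs assumed to hold per-category page counts on FreeBSD.
# Verify them on the target release; some may be missing.
PAGE_COUNT_OIDS = {
    "active": "vm.stats.vm.v_active_count",
    "inactive": "vm.stats.vm.v_inactive_count",
    "wired": "vm.stats.vm.v_wire_count",
    "cache": "vm.stats.vm.v_cache_count",
    "free": "vm.stats.vm.v_free_count",
}


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict, keeping only integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def memory_bytes(values):
    """Convert page counts to bytes, skipping any OID that was not reported."""
    page_size = values["hw.pagesize"]
    return {
        label: values[oid] * page_size
        for label, oid in PAGE_COUNT_OIDS.items()
        if oid in values
    }


def read_sysctl():
    """Run sysctl on the local host (only meaningful on FreeBSD)."""
    names = ["hw.pagesize"] + list(PAGE_COUNT_OIDS.values())
    result = subprocess.run(["sysctl"] + names, capture_output=True, text=True)
    return parse_sysctl(result.stdout)


# Made-up sample in the "name: value" layout that sysctl prints.
SAMPLE = """\
hw.pagesize: 4096
vm.stats.vm.v_active_count: 100000
vm.stats.vm.v_inactive_count: 200000
vm.stats.vm.v_wire_count: 50000
vm.stats.vm.v_free_count: 10000
"""

if __name__ == "__main__":
    print(memory_bytes(parse_sysctl(SAMPLE)))
```

Keeping the parsing separate from the sysctl call means the same code can be exercised against captured output on a non-FreeBSD machine.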

So, any other suggestions on the best way to achieve this?  Which of the
above is the best approach, do you think?

The other issue I have is that nobody seems to agree on what's a useful
measure to keep an eye on.  The Xymon server-side code for Darwin reports
used memory as the sum of active, inactive and wired.  But other sources
use the sum of active, wired, cache and buffers.  Yet other sources say
that buffers cannot be freed, and also that inactive pages are kind-of
available if needed.  My intention is to be able to predict when it's time
to add RAM to avoid performance degradation, but it's not clear what
numbers are going to give me that.
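
To show how far apart the two definitions mentioned above can land on the same host, here is a small Python comparison. The function names and the sample numbers are hypothetical; only the two sums come from the text.

```python
def used_darwin_style(m):
    """Used = active + inactive + wired."""
    return m["active"] + m["inactive"] + m["wired"]


def used_counting_cache(m):
    """Used = active + wired + cache + buffers."""
    return m["active"] + m["wired"] + m["cache"] + m["buffers"]


def percent_of_total(used, m):
    """Express a used figure as a percentage of total memory."""
    return 100.0 * used / m["total"]


# Hypothetical snapshot, in MB.
SNAPSHOT = {
    "total": 8192,
    "active": 2048,
    "inactive": 3072,
    "wired": 1024,
    "cache": 256,
    "buffers": 512,
}

if __name__ == "__main__":
    for fn in (used_darwin_style, used_counting_cache):
        used = fn(SNAPSHOT)
        print("%-22s %5d MB  %5.1f%%" % (fn.__name__, used, percent_of_total(used, SNAPSHOT)))
```

Which of these (if either) predicts the need for more RAM is exactly the open question; the sketch only makes the disagreement visible.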

Cheers
Jeremy
list Mark Felder · Thu, 21 Nov 2013 15:49:29 -0600 ·
quoted from Jeremy Laidman

On Thu, Nov 21, 2013, at 0:23, Jeremy Laidman wrote:
Good Xymon Folks

Xymon doesn't support reporting "actual" memory usage for FreeBSD systems -
Correct, and this is quite annoying!
quoted from Jeremy Laidman
3. I could report each of the different memory metrics separately to Xymon:
active, inactive, wired, cache, buffers, free.  Then I can graph them all,
and look for various conditions on each of them separately, or in certain
combinations that make sense.  This is the most flexible option, and would
provide the highest degree of insight to someone trying to troubleshoot a
sluggish server, but it requires a lot more work on both client and server.
It's also specific to *BSD systems.
Yes, more data is better. For example, look at what Observium pulls over
SNMP vs what Xymon reports:

http://imgur.com/a/P4Qq1
quoted from Jeremy Laidman

So, any other suggestions on the best way to achieve this?  Which of the
above is the best approach, do you think?

The other issue I have is that nobody seems to agree on what's a useful
measure to keep an eye on.  The Xymon server-side code for Darwin reports
used memory as the sum of active, inactive and wired.  But other sources
use the sum of active, wired, cache and buffers.  Yet other sources say
that buffers cannot be freed, and also that inactive pages are kind-of
available if needed.  My intention is to be able to predict when it's time
to add RAM to avoid performance degradation, but it's not clear what
numbers are going to give me that.
Graph it all as granularly as you can. Let the admins figure out what's
important to monitor.
list Jeremy Laidman · Fri, 22 Nov 2013 13:00:22 +1100 ·
quoted from Mark Felder
On 22 November 2013 08:49, Mark Felder <user-db141d317836@xymon.invalid> wrote:
Yes, more data is better. For example, look at what Observium pulls over
SNMP vs what Xymon reports:

http://imgur.com/a/P4Qq1

Now, that's what I want!

Interestingly, Observium(SNMP) splits all memory into
used+cached+buffers+shared+free.  I don't know where those numbers come
from - they don't map neatly to what "top" shows:
active+inactive+wired+cache+buffers+free.  So used+shared =
active+inactive+wired??

According to this:
http://www.daemonforums.org/showthread.php?t=2125
Net-SNMP counts cache memory twice when calculating MIB::memAvailReal.0.
It's a bit suspect.
quoted from Mark Felder
So, any other suggestions on the best way to achieve this?
Graph it all as granularly as you can. Let the admins figure out what's
important to monitor.
You're correct of course.  But it's the most work, and the least likely to
get completed anytime soon.

A bigger problem is that Xymon's genericised way of reporting memory is a
call to unix_memory_report() with parameters for total, used and actual -
and that's all.  (For FreeBSD and others, "actual" is set to -1.)  The
function unix_memory_report() does the memory threshold checks (via status
message) and also governs what gets sent to the RRD files.  If I wanted to
alert on all available memory numbers, and to have them all on the graph
for the "memory" page, I'd have to find another way to get them sent to the
RRD files and to check for threshold violations, because Xymon is simply
not geared up to do this.  And it probably won't ever be, because different
OSes do memory management differently.
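
The shape of that interface can be illustrated with a simplified Python model. This is not the Xymon source: the function names, the output layout and the threshold percentages are placeholders chosen for the sketch. Only the three inputs and the -1 convention for a missing "actual" come from the description above.

```python
UNKNOWN = -1  # convention described above: "actual" was not reported

# Placeholder (warning, panic) percentages. Real values would come from
# the server configuration, not from this sketch.
THRESHOLDS = {"physical": (100, 101), "actual": (90, 97)}

SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def check(name, used, total):
    """Return (color, report line) for one memory figure."""
    pct = 100 * used // total
    warn, panic = THRESHOLDS[name]
    color = "red" if pct >= panic else "yellow" if pct >= warn else "green"
    return color, "%-10s %8dM %8dM %4d%%" % (name, used, total, pct)


def memory_report(total, used, actual=UNKNOWN):
    """Check physical memory, plus actual memory when the client reported it."""
    rows = [check("physical", used, total)]
    if actual != UNKNOWN:
        rows.append(check("actual", actual, total))
    worst = max((color for color, _ in rows), key=SEVERITY.get)
    return worst, "\n".join(line for _, line in rows)


if __name__ == "__main__":
    print(memory_report(8192, 7936))        # no "actual" figure available
    print(memory_report(8192, 7936, 7600))  # "actual" reported by the client
```

Because the function only knows about a fixed set of figures, anything beyond them (wired, cache, buffers and so on) has nowhere to go, which is the limitation being described.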

I think what I'm left with is a two-prong approach.

1) Improve the "memory" page: I need to have "actual" memory reported by
the client, and parsed by the OS-specific code in xymond, so that it
thresholds on, and generates a status message with, 3 numbers instead of
two.  This needs adjustments to the client-side code
client/freebsd-meminfo.c, to add an "Actual: nnn" line to its output; and
also to the server-side code xymond/client/freebsd.c, to parse that line in
the same way that the Linux code does.

2) Display the extra numbers: I need to get all the separate numbers -
perhaps from [top] - reported into a completely separate graph (eg
[topmem]), that can be viewed on the trends page.  I can knock up a
server-side perl script to do that right now, but ultimately this would be
best done in the Xymon server-side code (probably xymond/client/freebsd.c),
and could include thresholding if it makes sense.
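
For the first prong, the server-side parsing could look roughly like this Python sketch. It assumes the client prints simple "Key:value" lines and that a new "Actual" line is added as proposed; the exact existing output layout, the function name and the sample values are assumptions, and a missing line falls back to -1.

```python
def parse_meminfo(section):
    """Parse 'Key:value' lines and return (total, free, actual).

    actual is -1 when the client did not send an "Actual" line, so that
    older clients keep working unchanged.
    """
    fields = {}
    for line in section.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            fields[key.strip().lower()] = int(value)
    return fields["total"], fields["free"], fields.get("actual", -1)


# Hypothetical client output, before and after the proposed change.
OLD_CLIENT = "Total:8192\nFree:256\nSwaptotal:4096\nSwapused:12\n"
NEW_CLIENT = "Total:8192\nFree:256\nActual: 4864\nSwaptotal:4096\nSwapused:12\n"

if __name__ == "__main__":
    print(parse_meminfo(OLD_CLIENT))
    print(parse_meminfo(NEW_CLIENT))
```

Tolerating both "Actual:4864" and "Actual: 4864" keeps the parser independent of how the client formats the new line.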

J
list Mark Felder · Thu, 21 Nov 2013 20:10:01 -0600 ·
quoted from Jeremy Laidman
On Nov 21, 2013, at 20:00, Jeremy Laidman <user-71895fb2e44c@xymon.invalid> wrote:
You're correct of course.  But it's the most work, and the least likely to get completed anytime soon.
Let me save you a ton of work. Everything you need should be obtained from sysctl.

https://feld.me/pub/freebsd/freebsd-memory.pl.txt
list Jeremy Laidman · Fri, 22 Nov 2013 13:38:17 +1100 ·
quoted from Mark Felder
On 22 November 2013 13:10, Mark Felder <user-db141d317836@xymon.invalid> wrote:
Let me save you a ton of work. Everything you need should be obtained from
sysctl.

https://feld.me/pub/freebsd/freebsd-memory.pl.txt

I can get these numbers from the [top] client data, because top gets them
from sysctl system calls.  So the client-side really is the easy
part.  Most of the work is in making changes to the Xymon server code - not
just because it's code, but also because it needs to be turned into a patch
and submitted for inclusion, vetted and approved by Henrik, then the new
code packaged up and installed onto Xymon servers.
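
A minimal Python sketch of pulling those numbers out of the [top] client data follows. It assumes the client message is split into "[name]" sections and that the top header has a "Mem:" line in the usual FreeBSD layout; the sample message, the function names and the unit handling are illustrative only.

```python
import re

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def extract_section(message, name):
    """Return the lines between '[name]' and the next '[...]' header."""
    lines, inside = [], False
    for line in message.splitlines():
        if line.startswith("[") and line.rstrip().endswith("]"):
            inside = line.strip() == "[%s]" % name
            continue
        if inside:
            lines.append(line)
    return "\n".join(lines)


def parse_top_mem(top_text):
    """Parse the 'Mem:' header line of top into {label: bytes}."""
    for line in top_text.splitlines():
        if line.startswith("Mem:"):
            pairs = re.findall(r"(\d+)([KMG])\s+(\w+)", line)
            return {label.lower(): int(num) * UNITS[unit] for num, unit, label in pairs}
    return {}


# Made-up client message fragment for illustration.
SAMPLE_MESSAGE = """\
[uptime]
 1:00PM  up 10 days,  2:03, 1 user, load averages: 0.10, 0.20, 0.15
[top]
last pid:  1234;  load averages:  0.10,  0.20,  0.15
Mem: 2048M Active, 3072M Inact, 1024M Wired, 256M Cache, 512M Buf, 1792M Free
Swap: 4096M Total, 12M Used, 4084M Free
[ps]
PID COMMAND
"""

if __name__ == "__main__":
    print(parse_top_mem(extract_section(SAMPLE_MESSAGE, "top")))
```

The labels are taken verbatim from the header (lower-cased), so "Inact" and "Buf" come through as "inact" and "buf" rather than being renamed.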

J
list Henrik Størner · Wed, 04 Dec 2013 15:09:36 +0100 ·
 

On 22.11.2013 03:00, Jeremy Laidman wrote:
On 22 November 2013 08:49, Mark Felder <user-db141d317836@xymon.invalid [2]> wrote:
http://imgur.com/a/P4Qq1 [1]
Now, that's what I want!
[... snip ...]
quoted from Jeremy Laidman
A bigger problem is that Xymon's genericised way of reporting
memory is a call to unix_memory_report() with parameters for total, used
and actual - and that's all. (For FreeBSD and others, "actual" is set to
-1.) The function unix_memory_report() does the memory threshold checks
(via status message) and also governs what gets sent to the RRD files.
If I wanted to alert on all available memory numbers, and to have them
all on the graph for the "memory" page, I'd have to find another way to
get them sent to the RRD files and to check for threshold violations,
because Xymon is simply not geared up to do this. And it probably won't
ever be, because different OSes do memory management
differently.

<rant>Memory reporting is probably *the* single most
bothersome monitoring item in Xymon. Every single OS seems to count
memory differently, and different sources claim different ways of
interpreting the same data. Not to mention that the OS providers
frequently change what the numbers mean, or come up with new ways of
reporting them. </rant> 

The way Xymon reports memory handling is very much due to historical
events - how it was done in Big Brother. I agree that the
"one-size-fits-all" approach in the current code is not the best way of
doing it, unless your OS happens to nicely fit into the real+actual+swap
metrics mold.

However, it doesn't have to be that way.

The clientdata handling code is specific for each type of client, and it
would be perfectly possible for that code to NOT use the
"unix_memory_report()" routine. The client code just needs to generate a
status message; it can do that without calling unix_memory_report(). But
you need to write some code specifically for that type of client,
including the bit that grabs configuration data from analysis.cfg.

You can also send data into an RRD file with a different layout, so you
can have more data. Getting that rrd-graph to show up on a "memory" status
is the tricky part, and right now I would recommend that you simply use a
different name for the memory status.

So it is not an un-solvable problem, but someone needs to figure out just
how the memory metrics can be found by the client code, and how it should
be interpreted over on the Xymon server.
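
As an illustration of a status under a different name, the Python sketch below composes a status message as a string, with one "name : value" line per metric. The test name "topmem", the helper names and the timestamp format are assumptions made for the example; whether the server turns those lines into an RRD file depends on its configuration, which is not covered here.

```python
import time


def metric_lines(metrics):
    """Format metrics as 'name : value' lines, sorted by name."""
    return ["%s : %d" % (name, value) for name, value in sorted(metrics.items())]


def build_status(hostname, testname, color, lines, now=None):
    """Compose a status message as one string; sending it is left out."""
    stamp = time.strftime("%a %b %d %H:%M:%S %Y", time.localtime(now))
    # Dots in the hostname are written as commas in the message header.
    host = hostname.replace(".", ",")
    header = "status %s.%s %s %s" % (host, testname, color, stamp)
    return header + "\n\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host and figures (MB).
    metrics = {"active": 2048, "inactive": 3072, "wired": 1024, "free": 1792}
    print(build_status("bsd1.example.com", "topmem", "green", metric_lines(metrics)))
```

Keeping the message builder free of any network code makes it easy to check the output before wiring it into a real sender.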

Regards, 

Henrik 

 
Links:
[1] http://imgur.com/a/P4Qq1
[2] mailto:user-db141d317836@xymon.invalid
list Jeremy Laidman · Thu, 5 Dec 2013 16:30:12 +1100 ·
quoted from Henrik Størner
On 5 December 2013 01:09, <user-ce4a2c883f75@xymon.invalid> wrote:
The way Xymon reports memory handling is very much due to historical
events - how it was done in Big Brother. I agree that the
"one-size-fits-all" approach in the current code is not the best way of
doing it, unless your OS happens to nicely fit into the real+actual+swap
metrics mold.
I think many OSes do fit that mold these days.  Those that do not can
probably be fudged to do a useful approximation of either real+actual+swap
or actual+swap.  In most cases, the slightly nebulous "actual" number is
all we care about (vs total RAM), as sysadmins who just want to know why
our servers are misbehaving.

So while more data points would be better, perfect is the enemy of good,
and currently there are no useful memory numbers for some OSes.  I don't
believe it would take much to add a few more OSes into the list of those
supported.

But I think there are two goals here.  One is to get "actual" memory
included - that is, raise the usefulness above zero, which would be
infinitely better; the other is to get all available metrics into Xymon to
be available for analysis.  I think the first of these goals needs just a
few minor tweaks, mostly in unix_memory_report().  The second is a much
more daunting task, because of the diversity of OSes, and this is where an
alternative to unix_memory_report() might be warranted.
quoted from Henrik Størner
You can also send data into an RRD file with a different layout, so you
can have more data. Getting that rrd-graph to show up on a "memory" status
is the tricky part, and right now I would recommend that you simply use a
different name for the memory status.
I think the "memory" status should show the simpler set of memory stats
(including "actual" if available).  If other memory stats are available,
these should only be shown in trends.
quoted from Henrik Størner
 So it is not an un-solvable problem, but someone needs to figure out just
how the memory metrics can be found by the client code, and how it should
be interpreted over on the Xymon server.
Yes.  And for that reason, I think the complex option might never be
completed for most, if not all OSes.  It might be better handled by custom
server-side scripts that people can implement depending on their
requirements.  For this to work, only the Xymon client needs to be
enhanced, to report the numbers.

Let me re-iterate that I'm not complaining about any part of Xymon, and
fully appreciate the difficulties in collecting useful memory data from
heterogeneous systems and presenting them in a uniform and consistent way.
 I think the design of Xymon - even despite being somewhat an "evolved"
beast - is excellent.  So what I'm trying to do is to enhance Xymon in a
way that's consistent with the current architecture and future direction.

Henrik, I'm happy to do much, if not all, of the work to mod client and/or
server code to support these enhancements.  However, I'd like to be
confident that it fits with your future directions for memory monitoring,
and avoid adding yet another data collection method hacked into Xymon, that
gets used by a shrinking minority of installs.  Can you provide guidance on
the best way to implement these features (or not)?  I'm proposing that I/we:

1) Enhance the Xymon client to also send "actual" memory usage, for FreeBSD
and any other OSes that can do this.  Also update the Xymon server to
recognise the presence of "actual", and make use of it in the same way that
it currently does for Linux.  The client data would be in the form of an
enhanced [meminfo] section of the client message.  (This could use the
already-used-by-Linux [free] section, or the [memory] section used by
bbwin, hpux, osf and solaris; or it could be a completely new section name,
which would not be my preference).

2) Enhance the Xymon client to send the full range of OS-specific memory
metrics available, included in the [meminfo] (or other) section, to apply
to FreeBSD and any other OSes that can do this.  This would allow for
server-side extension scripts to query the [meminfo] client data and create
RRD files as required.  This would provide the _opportunity_ for Xymon to
support parsing and reporting on these metrics, but this could be developed
by champions of each OS who wanted the feature and knew enough to interpret
what the numbers actually mean.
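
To make the two proposals concrete, here is a hedged Python sketch of what an enhanced [meminfo] section might look like when emitted by a client. The key names, their ordering and the helper name are hypothetical; the idea is only that the basic lines stay first and any OS-specific metrics are appended, so that a server which ignores unknown lines would not need to change.

```python
def format_meminfo(total, free, actual=None, extra=None):
    """Build a [meminfo] section: basic lines first, optional lines after."""
    lines = ["[meminfo]", "Total:%d" % total, "Free:%d" % free]
    if actual is not None:
        # Proposal 1: a single extra line carrying the "actual" figure.
        lines.append("Actual:%d" % actual)
    # Proposal 2: any OS-specific metrics, one per line, in a stable order.
    for name in sorted(extra or {}):
        lines.append("%s:%d" % (name.capitalize(), extra[name]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical figures, in MB.
    print(format_meminfo(8192, 256))
    print(format_meminfo(8192, 256, actual=4864,
                         extra={"active": 2048, "inactive": 3072, "wired": 1024}))
```

Appending rather than reordering is a design choice in this sketch: it leaves room for per-OS extension scripts to pick out the lines they understand and skip the rest.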

Cheers
Jeremy