FreeBSD Actual Memory Usage
list Jeremy Laidman
Good Xymon Folks

Xymon doesn't support reporting "actual" memory usage for FreeBSD systems - that is, available memory that may or may not be in use for buffers and cache. It only reports, graphs, and alerts on, swap and physical memory usage. Some OSes use some of the unused memory for filesystem caching and other purposes related to performance, and so the reported free memory count goes down over time even though the memory available for use is not decreasing. On my systems, free memory is only a few percent. So this doesn't give any indication of the risk of memory "resource exhaustion".

So what I really need is to report on "actual" memory usage. From what reading I've done, for FreeBSD this would be total memory minus the sum of "free" and "inactive" memory, although that depends on who you ask.

The "actual" memory reporting is not only a problem for FreeBSD, but for all supported OSes except for Linux, IRIX and Windows. I suspect many of these OSes report their free memory to include used-but-available memory also, and so the "real" available memory is a useful number.

Even real memory usage reporting seems to have caused trouble in the past for FreeBSD, as Xymon has had to have its own client-side binary for getting the memory numbers, as is also the case for HPUX and the other *BSDs, NetBSD and OpenBSD. For all other OS types, standard OS tools (free, sar) are used to get memory usage numbers.

Memory usage reporting in Xymon seems to be quite a mixed bag, in general. Some clients report usage to [memory], some to [freemem], some to [meminfo], some to [free]. The Irix client has no specific memory usage report at all. This is by no means a complaint - I'm sure Henrik would have been much happier if all OSes had a standard memory query interface, and I suspect a lot of these different reporting methods were for legacy support.

So. I need to get actual memory reported for some FreeBSD systems, so I'm trying to work out the best way to do this.
I'd like to fix it in a way that fits in with the "standard" model, but as I described above, there isn't really a "standard" model. Here are several ways I can think of to solve the problem:

1. FreeBSD reports [top] header output, if top is installed (as do many OSes). The server-side code could simply grab the numbers from there. This would work for other OSes that don't report "active" memory, and could be a common interface to memory usage metrics. Any new OS that doesn't have extensive client support could simply report a [top] section in client data, and Xymon would magically start reporting actual free memory. The down-side to this is that top isn't installed everywhere. Although on my systems it is, so that's OK. The other down-side is that it requires patches to the server code.

2. I could replace the "freebsd-meminfo" binary that comes with the Xymon client, so that the "free" figure has the "inactive" memory added (or whatever adjustments are appropriate). This doesn't solve the problem for any other OS. Perhaps that's OK - perhaps the problem is very much OS dependent because each OS has its own unique memory management. I think the binary can be replaced by a simple shell script that parses the "top" header or "sysctl vm.vmtotal" for the correct figures. (It seems that "top" gets all its numbers from sysctl anyway, so I'd do the latter, so as to avoid a dependency.)

3. I could report each of the different memory metrics separately to Xymon: active, inactive, wired, cache, buffers, free. Then I can graph them all, and look for various conditions on each of them separately, or in certain combinations that make sense. This is the most flexible option, and would provide the highest degree of insight to someone trying to troubleshoot a sluggish server, but it requires a lot more work on both client and server. It's also specific to *BSD systems.

So, any other suggestions on the best way to achieve this? Which of the above is the best approach, do you think?
The other issue I have is that nobody seems to agree on what's a useful measure to keep an eye on. The Xymon server-side code for Darwin reports used memory as the sum of active, inactive and wired. But other sources use the sum of active, wired, cache and buffers. Yet other sources say that buffers cannot be freed, and also that inactive pages are kind-of available if needed. My intention is to be able to predict when it's time to add RAM to avoid performance degradation, but it's not clear what numbers are going to give me that.

Cheers
Jeremy
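A minimal sketch of the "total minus free plus inactive" calculation described in this message, not a tested Xymon client replacement. It assumes the page counters are read with `sysctl` from `hw.pagesize` and the `vm.stats.vm.*` tree. The sample values are invented, and counting only free and inactive pages as available is just one of the competing definitions mentioned above.

```python
# Sketch only: the sample numbers are made up, and "free + inactive" is
# one of several definitions of available memory in circulation.

def parse_sysctl(text):
    """Parse 'name: value' lines (as printed by `sysctl name ...`) into ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value.strip())
    return values


def actual_memory(values):
    """Return (total_mb, used_mb, actual_used_mb) from page counters."""
    page = values["hw.pagesize"]
    total = values["vm.stats.vm.v_page_count"] * page
    free = values["vm.stats.vm.v_free_count"] * page
    inactive = values["vm.stats.vm.v_inactive_count"] * page
    used = total - free                  # conventional "used" figure
    actual = total - (free + inactive)   # "actual" usage as proposed above
    mb = 1024 * 1024
    return total // mb, used // mb, actual // mb


# Invented sample input, in the layout `sysctl` prints by default.
SAMPLE = """\
hw.pagesize: 4096
vm.stats.vm.v_page_count: 1000000
vm.stats.vm.v_free_count: 30000
vm.stats.vm.v_inactive_count: 400000
"""

total_mb, used_mb, actual_mb = actual_memory(parse_sysctl(SAMPLE))
print(f"total={total_mb}MB used={used_mb}MB actual={actual_mb}MB")
```

With numbers like these the conventional "used" figure sits near 97% while the "actual" figure is well under 60%, which is the gap the message is complaining about.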
list Mark Felder
On Thu, Nov 21, 2013, at 0:23, Jeremy Laidman wrote:
> Good Xymon Folks Xymon doesn't support reporting "actual" memory usage for FreeBSD systems -
Correct, and this is quite annoying!
> 3. I could report each of the different memory metrics separately to Xymon: active, inactive, wired, cache, buffers, free. Then I can graph them all, and look for various conditions on each of them separately, or in certain combinations that make sense. This is the most flexible option, and would provide the highest degree of insight to someone trying to troubleshoot a sluggish server, but it requires a lot more work on both client and server. It's also specific to *BSD systems.
Yes, more data is better. For example, look at what Observium pulls over SNMP vs what Xymon reports: http://imgur.com/a/P4Qq1
> So, any other suggestions on the best way to achieve this? Which of the above is the best approach, do you think? The other issue I have is that nobody seems to agree on what's a useful measure to keep an eye on. The Xymon server-side code for Darwin reports used memory as the sum of active, inactive and wired. But other sources use the sum of active, wired, cache and buffers. Yet other sources say that buffers cannot be freed, and also that inactive pages are kind-of available if needed. My intention is to be able to predict when it's time to add RAM to avoid performance degradation, but it's not clear what numbers are going to give me that.
Graph it all as granularly as you can. Let the admins figure out what's important to monitor.
list Jeremy Laidman
On 22 November 2013 08:49, Mark Felder <user-db141d317836@xymon.invalid> wrote:
> Yes, more data is better. For example, look at what Observium pulls over SNMP vs what Xymon reports: http://imgur.com/a/P4Qq1
Now, that's what I want! Interestingly, Observium (SNMP) splits all memory into used+cached+buffers+shared+free. I don't know where those numbers come from - they don't map neatly to what "top" shows: active+inactive+wired+cache+buffers+free. So used+shared = active+inactive+wired?? According to this: http://www.daemonforums.org/showthread.php?t=2125 Net-SNMP counts cache memory twice when calculating MIB::memAvailReal.0. It's a bit suspect.
>> So, any other suggestions on the best way to achieve this?
> Graph it all as granularly as you can. Let the admins figure out what's important to monitor.
You're correct of course. But it's the most work, and the least likely to get completed anytime soon.

A bigger problem is that Xymon's genericised way of reporting memory is a call to unix_memory_report() with parameters for total, used and actual - and that's all. (For FreeBSD and others, "actual" is set to -1.) The function unix_memory_report() does the memory threshold checks (via status message) and also governs what gets sent to the RRD files. If I wanted to alert on all available memory numbers, and to have them all on the graph for the "memory" page, I'd have to find another way to get them sent to the RRD files and to check for threshold violations, because Xymon is simply not geared up to do this. And it probably won't ever be, because different OSes do memory management differently.

I think what I'm left with is a two-prong approach.

1) Improve the "memory" page: I need to have "actual" memory reported by the client, and parsed by the OS-specific code in xymond, so that it thresholds on, and generates a status message with, 3 numbers instead of two. This needs adjustments to the client-side code client/freebsd-meminfo.c, to add an "Actual: nnn" line to its output; and also to the server-side code xymond/client/freebsd.c, to parse that line in the same way that the Linux code does.

2) Display the extra numbers: I need to get all the separate numbers - perhaps from [top] - reported into a completely separate graph (eg [topmem]), that can be viewed on the trends page. I can knock up a server-side perl script to do that right now, but ultimately this would be best done in the Xymon server-side code (probably xymond/client/freebsd.c), and could include thresholding if it makes sense.

J
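The first prong could look roughly like the sketch below, which shows the parsing logic in Python rather than the C it would need in xymond/client/freebsd.c. The "Key:value" layout of the [meminfo] section, and the choice to let a hypothetical "Actual" line carry actual used memory in MB, are assumptions for illustration. The real freebsd-meminfo output should be checked before relying on either.

```python
# Sketch only: the "Key:value" layout and the meaning of the "Actual" key
# are assumptions modelled on the proposal above, not verified formats.

def parse_meminfo(section):
    """Return a dict of integer fields from 'Key:value' lines."""
    fields = {}
    for line in section.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            fields[key.strip().lower()] = int(value.strip())
    return fields


def memory_triplet(section):
    """Return (total, used, actual) in MB; actual is -1 when not reported."""
    fields = parse_meminfo(section)
    total = fields["total"]
    used = total - fields["free"]
    # -1 mirrors the "not available" marker the thread says is passed
    # to unix_memory_report() for FreeBSD today.
    actual = fields.get("actual", -1)
    return total, used, actual


old_style = "Total:4096\nFree:128\n"
new_style = "Total:4096\nFree:128\nActual:2300\n"
print(memory_triplet(old_style))
print(memory_triplet(new_style))
```

Falling back to -1 when the line is missing means a server patched this way would keep working with clients that have not been upgraded.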
list Mark Felder
On Nov 21, 2013, at 20:00, Jeremy Laidman <user-71895fb2e44c@xymon.invalid> wrote:
> You're correct of course. But it's the most work, and the least likely to get completed anytime soon.
Let me save you a ton of work. Everything you need should be obtained from sysctl. https://feld.me/pub/freebsd/freebsd-memory.pl.txt
list Jeremy Laidman
On 22 November 2013 13:10, Mark Felder <user-db141d317836@xymon.invalid> wrote:
> Let me save you a ton of work. Everything you need should be obtained from sysctl. https://feld.me/pub/freebsd/freebsd-memory.pl.txt
I can get these numbers from the [top] client data, because top gets them
from sysctl system calls. So the client side really is the easy
part. Most of the work is in making changes to the Xymon server code - not
just because it's code, but also because it needs to be turned into a patch
and submitted for inclusion, vetted and approved by Henrik, then the new
code packaged up and installed onto Xymon servers.
J
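As an illustration of how little client-side work is involved, here is a rough sketch of pulling the per-category numbers out of the "Mem:" line of the top header. The sample line is invented, and the format assumed (a number, a K/M/G suffix, then a label) should be checked against the top version in use, since other unit suffixes and extra categories may appear.

```python
import re

# Multipliers to convert each unit suffix to kilobytes.
UNITS_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def parse_top_mem(line):
    """Return {label: kilobytes} from a top 'Mem:' header line."""
    if not line.startswith("Mem:"):
        raise ValueError("not a Mem: line")
    result = {}
    # Each entry looks like "2100M Inact": digits, unit suffix, label.
    for number, unit, label in re.findall(r"(\d+)([KMG])\s+(\w+)", line):
        result[label.lower()] = int(number) * UNITS_KB[unit]
    return result


# Invented sample line; real output varies between top versions.
line = "Mem: 1200M Active, 2100M Inact, 450M Wired, 20M Cache, 110M Buf, 96M Free"
mem = parse_top_mem(line)
print(mem)
```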
list Henrik Størner
On 22.11.2013 03:00, Jeremy Laidman wrote:
> On 22 November 2013 08:49, Mark Felder <user-db141d317836@xymon.invalid [2]> wrote:
>> http://imgur.com/a/P4Qq1 [1]
> Now, that's what I want!
[... snip ...]
> A bigger problem is that Xymon's genericised way of reporting memory is a call to unix_memory_report() with parameters for total, used and actual - and that's all. (For FreeBSD and others, "actual" is set to -1.) The function unix_memory_report() does the memory threshold checks (via status message) and also governs what gets sent to the RRD files. If I wanted to alert on all available memory numbers, and to have them all on the graph for the "memory" page, I'd have to find another way to get them sent to the RRD files and to check for threshold violations, because Xymon is simply not geared up to do this. And it probably won't ever be, because different OSes do memory management differently.
<rant>Memory reporting is probably *the* single most bothersome monitoring item in Xymon. Every single OS seems to count memory differently, and different sources claim different ways of interpreting the same data. Not to mention that the OS providers frequently change what the numbers mean, or come up with new ways of reporting them. </rant>

The way Xymon reports memory handling is very much due to historical events - how it was done in Big Brother. I agree that the "one-size-fits-all" approach in the current code is not the best way of doing it, unless your OS happens to nicely fit into the real+actual+swap metrics mold.

However, it doesn't have to be that way. The clientdata handling code is specific for each type of client, and it would be perfectly possible for that code to NOT use the "unix_memory_report()" routine. The client code just needs to generate a status message; it can do that without calling unix_memory_report(). But you need to write some code specifically for that type of client, including the bit that grabs configuration data from analysis.cfg.

You can also send data into an RRD file with a different layout, so you can have more data. Getting that rrd-graph to show up on a "memory" status is the tricky part, and right now I would recommend that you simply use a different name for the memory status.

So it is not an un-solvable problem, but someone needs to figure out just how the memory metrics can be found by the client code, and how it should be interpreted over on the Xymon server.

Regards,
Henrik

Links:
[1] http://imgur.com/a/P4Qq1
[2] mailto:user-db141d317836@xymon.invalid
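To make the "generate a status message under a different name" idea concrete, here is a hedged sketch of building the message text in Python. The test name "memdetail", the host name, the metric values and the 90/97 percent thresholds are all placeholders for illustration. A real implementation would read its thresholds from analysis.cfg, and actually delivering the message to the Xymon server is left out.

```python
# Sketch only: "memdetail" is a hypothetical test name and the thresholds
# are placeholder values; nothing here is sent to a Xymon server.

def color_for(percent, warn=90.0, panic=97.0):
    """Map a usage percentage to a Xymon status color."""
    if percent >= panic:
        return "red"
    if percent >= warn:
        return "yellow"
    return "green"


def build_status(hostname, testname, used_mb, total_mb, metrics_mb):
    """Return the text of a Xymon 'status' message for one host and test."""
    percent = 100.0 * used_mb / total_mb
    color = color_for(percent)
    # Xymon status messages write the hostname with commas instead of dots.
    host = hostname.replace(".", ",")
    lines = [f"status {host}.{testname} {color} Memory in use: {percent:.1f}%"]
    # One "name : value" line per metric keeps the body easy to parse later.
    lines += [f"{name} : {value}" for name, value in sorted(metrics_mb.items())]
    return "\n".join(lines)


message = build_status(
    "www.example.com", "memdetail", used_mb=3700, total_mb=4000,
    metrics_mb={"active": 1200, "inactive": 2100, "wired": 400, "free": 300},
)
print(message)
```

Keeping the extra metrics as plain "name : value" lines in the message body is a design choice made here so that a separate graphing step could pick them up. It is not something the thread prescribes.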
list Jeremy Laidman
On 5 December 2013 01:09, <user-ce4a2c883f75@xymon.invalid> wrote:
> The way Xymon reports memory handling is very much due to historical events - how it was done in Big Brother. I agree that the "one-size-fits-all" approach in the current code is not the best way of doing it, unless your OS happens to nicely fit into the real+actual+swap metrics mold.
I think many OSes do fit that mold these days. Those that do not can probably be fudged to do a useful approximation of either real+actual+swap or actual+swap. In most cases, the slightly nebulous "actual" number is all we care about (vs total RAM), as sysadmins who just want to know why our servers are misbehaving. So while more data points would be better, perfect is the enemy of good, and currently there are no useful memory numbers for some OSes. I don't believe it would take much to add a few more OSes into the list of those supported. But I think there are two goals here. One is to get "actual" memory included - that is, raise the usefulness above zero, which would be infinitely better; the other is to get all available metrics into Xymon to be available for analysis. I think the first of these goals needs just a few minor tweaks, mostly in unix_memory_report(). The second is a much more daunting task, because of the diversity of OSes, and this is where an alternative to unix_memory_report() might be warranted.
> You can also send data into an RRD file with a different layout, so you can have more data. Getting that rrd-graph to show up on a "memory" status is the tricky part, and right now I would recommend that you simply use a different name for the memory status.
I think the "memory" status should show the simpler set of memory stats (including "active" if available). If other memory stats are available, these should only be shown in trends.
> So it is not an un-solvable problem, but someone needs to figure out just how the memory metrics can be found by the client code, and how it should be interpreted over on the Xymon server.
Yes. And for that reason, I think the complex option might never be completed for most, if not all OSes. It might be better handled by custom server-side scripts that people can implement depending on their requirements. For this to work, only the Xymon client needs to be enhanced, to report the numbers.

Let me re-iterate that I'm not complaining about any part of Xymon, and fully appreciate the difficulties in collecting useful memory data from heterogeneous systems and presenting them in a uniform and consistent way. I think the design of Xymon - even despite being somewhat an "evolved" beast - is excellent. So what I'm trying to do is to enhance Xymon in a way that's consistent with the current architecture and future direction.

Henrik, I'm happy to do much, if not all, of the work to mod client and/or server code to support these enhancements. However, I'd like to be confident that it fits with your future directions for memory monitoring, and avoid adding yet another data collection method hacked into Xymon, that gets used by a shrinking minority of installs. Can you provide guidance on the best way to implement these features (or not)?

I'm proposing that I/we:

1) Enhance the Xymon client to also send "active" memory usage, for FreeBSD and any other OSes that can do this. Also update the Xymon server to recognise the presence of "active", and make use of it in the same way that it currently does for Linux. The client data would be in the form of an enhanced [meminfo] section of the client message. (This could use the already-used-by-Linux [free] section, or the [memory] section used by bbwin, hpux, osf and solaris; or it could be a completely new section name, which would not be my preference).

2) Enhance the Xymon client to send the full range of OS-specific memory metrics available, included in the [meminfo] (or other) section, to apply to FreeBSD and any other OSes that can do this. This would allow for server-side extension scripts to query the [meminfo] client data and create RRD files as required. This would provide the _opportunity_ for Xymon to support parsing and reporting on these metrics, but this could be developed by champions of each OS who wanted the feature and knew enough to interpret what the numbers actually mean.

Cheers
Jeremy
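A server-side extension script of the kind proposed in point 2 would first need to pick the [meminfo] section out of the client message. The sketch below assumes only that sections are introduced by a "[name]" line, as they are in the client data discussed in this thread. The sample message and its field values are invented.

```python
def split_sections(client_message):
    """Split a client message into {section_name: body_text}.

    Lines before the first "[name]" header (such as the leading
    "client ..." line) are ignored.
    """
    sections = {}
    current = None
    for line in client_message.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


# Invented sample; real client messages contain many more sections.
SAMPLE = """\
client www,example,com.freebsd
[uname]
FreeBSD www.example.com 9.2-RELEASE
[meminfo]
Total:4096
Free:128
[top]
last pid: 1234
"""

sections = split_sections(SAMPLE)
print(sections["meminfo"])
```

A script built on this could then parse whatever OS-specific metrics the section carries and decide for itself which ones to graph or alert on, which is the division of labour proposed above.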