I'm working on improving my Xymon configuration to reduce the number of false alerts that we get. In particular, memory monitoring is a bit of a problem so I'm hoping someone will be able to offer some advice.
At the moment, Xymon is set up with something like:
MEMPHYS 100 101
MEMSWAP 20 40
MEMACT 95 97
I pretty much don't care about MEMPHYS. The problem with MEMSWAP and MEMACT is that they work independently of each other - i.e. the above will give me an alert if > 97% of the RAM is used OR > 40% of swap is used.
However, this results in warnings for systems that have a lot of idle data in memory. The Linux kernel will page out idle data (increasing swap usage and reducing RAM usage) and use that space for buffers/caches, and this is a very sensible strategy. Unfortunately, then Xymon comes along and notices that there's lots of swap in use and throws an alert, even though there's plenty of RAM free.
Basically, I don't care that a machine is 4GB into swap if it has 5GB of free RAM - that isn't a problem, it just means there's quite a lot of idle data that the kernel has decided can be paged out. I do care if it's 4GB into swap and only has 0.5GB of free RAM, since this would indicate that it's actually short of memory.
What I really need is to warn if > x% of the RAM is used AND > y% of swap is used - is there a way to do that?
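To make the condition concrete, here is a rough Python sketch of the logic I'm after. It is not Xymon configuration syntax, just an illustration. The function names, the 8GB RAM / 8GB swap host sizes and the 90% / 40% limits are all made up for the example:

```python
def percent_used(total_kb, free_kb):
    """Return the percentage of total_kb that is in use."""
    if total_kb <= 0:
        return 0.0
    return 100.0 * (total_kb - free_kb) / total_kb


def independent_alert(ram_pct, swap_pct, ram_limit, swap_limit):
    # Behaviour I see today: either threshold on its own triggers an alert.
    return ram_pct > ram_limit or swap_pct > swap_limit


def combined_alert(ram_pct, swap_pct, ram_limit, swap_limit):
    # Behaviour I want: alert only when BOTH thresholds are exceeded.
    return ram_pct > ram_limit and swap_pct > swap_limit


GB = 1024 * 1024  # kB per GB

# Hypothetical host: 8GB RAM and 8GB swap, 4GB into swap in both cases.
swap_pct = percent_used(8 * GB, 4 * GB)
idle_host_ram_pct = percent_used(8 * GB, 5 * GB)    # 5GB of RAM free
short_host_ram_pct = percent_used(8 * GB, GB // 2)  # 0.5GB of RAM free

RAM_LIMIT, SWAP_LIMIT = 90, 40  # hypothetical x and y

for name, ram_pct in (("idle", idle_host_ram_pct), ("short", short_host_ram_pct)):
    print(name,
          "OR:", independent_alert(ram_pct, swap_pct, RAM_LIMIT, SWAP_LIMIT),
          "AND:", combined_alert(ram_pct, swap_pct, RAM_LIMIT, SWAP_LIMIT))
```

With these made-up numbers the OR check alerts on both hosts, while the AND check only alerts on the one that is genuinely short of memory.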
Thanks.
--
- Steve Hill
Technical Director
Opendium Limited http://www.opendium.com
Direct contacts:
Instant messenger: xmpp:user-8cda31fbea61@xymon.invalid
Email: user-8cda31fbea61@xymon.invalid
Phone: sip:user-8cda31fbea61@xymon.invalid
Sales / enquiries contacts:
Email: user-2675bcaab7d4@xymon.invalid
Phone: +XX-XXXX-XXXXXX / sip:user-2675bcaab7d4@xymon.invalid
Support contacts:
Email: user-126f03e2871f@xymon.invalid
Phone: +XX-XXXX-XXXXXX / sip:user-126f03e2871f@xymon.invalid