Xymon + graphite

list Japheth Cleaver
Thu, 10 Dec 2015 08:02:15 -0800
Message-Id: <user-cf7910cdb8e7@xymon.invalid>


On Thu, December 10, 2015 2:49 am, Jeremy Laidman wrote:

On Tue, Dec 8, 2015 at 5:49 AM Galen Johnson <user-87f955643e3d@xymon.invalid>
wrote:

Has anyone tried to integrate alerting based on Graphite?  Or used
Graphite as a trending replacement for RRD?  I love Xymon for my
monitoring, but the limitations and aggregations of RRDs are starting to
become an issue.

Nope, but I'm intrigued by Graphite.  Most of my servers have enormously
long trends pages because of all the extra graphs I've added.  These are
indispensable for tracking down weird faults.  But the number of graphs
and RRD files has become unwieldy.  One major shortcoming is that I can't
put metrics from different hosts onto the same graph.  I've used RRGrapher
<http://pages.cs.wisc.edu/~plonka/RRGrapher/> to let me create ad-hoc
graphs like this, but it's obviously from last millennium, and could do
with a facelift.

I'd been looking at http://www.flotcharts.org/ and a few other RRD
graphing packages that could be used to provide a more browsable
interface. I agree there's absolutely a need (aside from the CSS work and
a potential "dashboard" view generally) for improved multi-host and
multi-graph views beyond the linear trends output.

For trending, Xymon can threshold (alert) on RRD files with the "DS"
operator in analysis.cfg.  Perhaps this can be extended to alert on
Holt-Winters aberrant behaviour thresholds.  Doing the same sort of thing
with a rewrite of the g2zproxy probably wouldn't be too difficult, at
least not on the Xymon side.

(Actually, the RRD files generated on new RPM installs have had HWPREDICT,
SEASONAL, and a few other RRAs configured for a while now, if anyone
feels like experimenting...)
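
As a rough illustration of what an aberrant-behaviour threshold does, here
is a minimal Python sketch. It is not RRDtool's implementation: it leaves
out the seasonal term that the HWPREDICT and SEASONAL RRAs provide, and
uses plain double exponential smoothing with a smoothed deviation band.
The function name, the parameter defaults and the warm-up length are all
assumptions made up for the example.

```python
def aberrant_points(values, alpha=0.5, beta=0.1, gamma=0.1, delta=3.0, warmup=5):
    """Return the indexes of samples outside prediction +/- delta * deviation.

    Simplified, non-seasonal sketch. All parameters are illustrative
    assumptions, not values taken from RRDtool or Xymon.
    """
    level = values[0]   # smoothed baseline
    trend = 0.0         # smoothed slope
    deviation = 0.0     # smoothed absolute prediction error
    flagged = []
    for i, observed in enumerate(values[1:], start=1):
        predicted = level + trend
        error = abs(observed - predicted)
        # Only start flagging once the smoothed state has had time to settle.
        if i >= warmup and error > delta * deviation:
            flagged.append(i)
        # Update the smoothed state with the new observation.
        new_level = alpha * observed + (1 - alpha) * predicted
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        deviation = gamma * error + (1 - gamma) * deviation
    return flagged
```

Fed a flat series with a single spike, this flags the spike and possibly
the sample right after it, since the prediction needs a step or two to
recover. A real check would want the seasonal term back before being
trusted on anything with a daily or weekly cycle.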


One problem with the current RRD paradigm is that alerting happens only
on the data available at insertion time, not on data already stored in
the RRD file (or whatever metric store you have), so xymond_rrd can't
efficiently alert on anything beyond that.

A "xymond_trend" could operate asynchronously on the RRD files, but to get
useful trend data back out of RRDs you'll need to flush the data to disk
first, which more or less blows out your I/O performance. Fine if you're
on SSD, but more of a problem if you're on heavily loaded spinning disks.

The problem is that there are so many different ways of doing this,
serving a lot of different needs. Making something flexible enough would
require a good survey of what people are looking for.

(With that in mind -- What are people looking for? :) Maybe it's easier
than I'm thinking.)


Alternatively, you can send the metric data off entirely to a different
package, which can reinject an alert into Xymon if/when it notices a
trend. That's an easily approachable option using xymond_rrd's --processor
option, which can fork your metric feed off to whatever you like
(OpenTSDB, Graphite, Splunk, etc...). The re-posting of alerts back into
Xymon can be done with that package's notification tool set and some
scripting of Xymon messages.
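
To make the round trip a bit more concrete, here is a minimal Python
sketch of the two ends: formatting a sample for Graphite's plaintext
protocol ("path value timestamp", conventionally on TCP port 2003) and
building a Xymon "status" message to post an alert back (conventionally
TCP port 1984). The parsing step is a placeholder: the "host test dataset
value" input line is a hypothetical format invented for the example, not
what the --processor feed actually emits, so that part would need
adapting. The "xymon." metric prefix and all function names are likewise
assumptions.

```python
import socket
import time


def graphite_line(host, test, dataset, value, timestamp=None):
    """Format one sample for Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    # Dots separate path components in Graphite, so flatten the hostname.
    safe_host = host.replace(".", "_")
    return f"xymon.{safe_host}.{test}.{dataset} {value} {int(timestamp)}\n"


def xymon_status(host, test, color, text):
    """Build a Xymon status message; dots in the hostname become commas."""
    return f"status {host.replace('.', ',')}.{test} {color} {text}"


def parse_sample(line):
    """Parse a HYPOTHETICAL 'host test dataset value' line (placeholder)."""
    host, test, dataset, value = line.split()
    return host, test, dataset, float(value)


def forward(lines, send):
    """Convert each input line to a Graphite line and hand it to send()."""
    for line in lines:
        if not line.strip():
            continue
        host, test, dataset, value = parse_sample(line)
        send(graphite_line(host, test, dataset, value))


def send_tcp(server, port, payload):
    """Open a TCP connection, send one payload, and close it."""
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(payload.encode("utf-8"))
```

A processor script could then call something like
forward(sys.stdin, lambda l: send_tcp("graphite.example.com", 2003, l)),
and the alerting side could call send_tcp() with the output of
xymon_status() against the Xymon server. Both hostnames here are
placeholders, and opening one connection per sample is only reasonable
for a sketch; a real feed would keep the socket open.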


Regards,
-jc