On Mon, Dec 12, 2005 at 01:12:20PM -0600, Jeff Newman wrote:
I wanted to move from a 5 minute interval on all my clients to a 1 minute
interval.
In all my years of Systems Administration, things that run every minute all the time usually end up being a "Bad Idea".
How will a smaller sampling period improve the service you provide?
The script is "vmstat 300 2", so do I need to update that to reflect 1 minute
as well (i.e. vmstat 60 2)?
Or is this by design? Are there others that might need to change that I
don't know about? Is the way I am going about this wrong?
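A quick way to see why the vmstat line matters. "vmstat N 2" prints one report immediately, then sleeps N seconds before printing the second, so each invocation lives about N seconds. If the client starts a new one every minute while the command stays at "vmstat 300 2", several copies overlap. The sketch below is a rough back-of-the-envelope model. The function name and the assumption that the client launches exactly one vmstat per run are illustrative, not taken from the hobbit client.

```python
import math

def concurrent_vmstats(client_interval_s: int, vmstat_span_s: int = 300) -> int:
    """Roughly how many 'vmstat <span> 2' processes are alive at once
    if the client (hypothetically) starts one every client_interval_s seconds.
    Each process lives about vmstat_span_s seconds (one sleep between reports)."""
    return math.ceil(vmstat_span_s / client_interval_s)

print(concurrent_vmstats(300))      # stock setup: 5-minute client, vmstat 300 2
print(concurrent_vmstats(60))       # 1-minute client, unchanged vmstat 300 2
print(concurrent_vmstats(60, 60))   # 1-minute client, vmstat 60 2
```

Switching to "vmstat 60 2" removes the overlap, but it does not change the 5-minute assumption built into the RRD files, which is the point made further down.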
That's an interesting question :-)
My job requires data to be useful, not just interesting. That is not to say there aren't jobs where useful is good enough.
The graph DBs that vmstat feeds data into (the RRD files) are
constructed in such a way that a 5-minute interval is what makes
sense. So running it at any other interval is really just a waste of
resources.
With the stock larrd/hobbit RRD definitions you are correct. He'll only use one of the five, and whine about the timestamp of the other four.
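To illustrate the "only one of the five" behaviour described above, here is a toy model that assumes the server keeps just the first sample in each 300-second step and rejects the rest. This is a deliberate simplification for illustration; it is not how rrdtool itself consolidates updates, and the function name is hypothetical.

```python
STEP = 300  # assumed RRD step in seconds (the stock 5-minute interval)

def keep_one_per_step(timestamps, step=STEP):
    """Toy model: keep the first sample in each step-sized bucket and
    report every later sample in the same bucket as rejected."""
    kept, rejected = [], []
    seen = set()
    for ts in sorted(timestamps):
        bucket = ts // step  # which 5-minute slot this sample falls into
        if bucket in seen:
            rejected.append(ts)
        else:
            seen.add(bucket)
            kept.append(ts)
    return kept, rejected

samples = list(range(0, 600, 60))  # one sample per minute for 10 minutes
kept, rejected = keep_one_per_step(samples)
print(kept)           # [0, 300]
print(len(rejected))  # 8
```

Under this model, a 1-minute client produces five samples per step and four of them are thrown away.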
(I do have a patch here from a user that would allow you to configure
the RRD files for different data-collection frequencies, but that has
not been merged yet - primarily due to me being overloaded).
The design goal of larrd (I can't speak for Henrik and hobbit/RRD) was capacity planning and trending. 5m samples are more than adequate for that activity.
IMO, sampling at a high frequency implies real-time performance analysis, and I've always felt that is outside the scope of capacity planning and trending. E.g., we don't run sendmail in debug mode all the time . . . .
All that being said, those long-term trends are very helpful for problem resolution. One can compare a single 5m sample against an aggregate of 5m samples and determine whether things are 'normal'. But judging whether all the activity within a single 5m sample is normal is very, very difficult.
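One simple way to compare a single 5m sample against an aggregate of earlier 5m samples is a mean/standard-deviation check. This is a swapped-in technique for illustration, not anything larrd or hobbit does; the function name, the threshold k=3 and the history values are all hypothetical.

```python
import statistics

def looks_normal(history, latest, k=3.0):
    """Hypothetical check: is the latest 5-minute sample within k standard
    deviations of the mean of earlier 5-minute samples?"""
    if len(history) < 2:
        return True  # not enough history to judge either way
    mean = statistics.mean(history)
    sd = statistics.stdev(history)  # sample standard deviation
    if sd == 0:
        return latest == mean
    return abs(latest - mean) <= k * sd

history = [12, 15, 11, 14, 13, 12, 16, 14]  # illustrative CPU busy % per 5m sample
print(looks_normal(history, 15))  # True
print(looks_normal(history, 60))  # False
```

Note that this only works because every point in the history covers the same 5-minute window; mixing sample lengths would make the comparison meaningless.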
So no - you shouldn't change that vmstat command. But it is bad design
on my part to assume that the client polling period would always be
5 minutes - it's perfectly valid to run the client checks differently.
That's my design you inherited, and given the complexity of the parts, I think it is a very solid design. To become flexible enough to handle different sampling rates, the server would need to know the frequency of the tests. And changing the RRD later on is 'almost' impossible (very difficult at the least). And I've never seen what happens to 1.5 years of data when you start messing with the RRD.
In the end, I think you'd get the worst of both worlds.
I'll think about what's the most sensible solution. It probably would be
to only start the vmstat command if one isn't running; that does assume
that you will run the client scripts *at least once* every 5 minutes.
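A minimal sketch of "only start the vmstat command if one isn't running", assuming a pidfile-based guard. The pidfile path and function names are hypothetical, and this is not the actual hobbit client code. It also assumes the launching script exits after each run (so a finished vmstat is reaped rather than left as a zombie), and it ignores PID reuse and the race between the check and the pidfile write.

```python
import os
import subprocess

PIDFILE = "/tmp/vmstat-client.pid"  # hypothetical location

def vmstat_running(pidfile=PIDFILE):
    """Return True if the PID recorded in pidfile belongs to a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no pidfile, or garbage in it
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True

def start_vmstat_if_idle(outfile, pidfile=PIDFILE):
    """Start 'vmstat 300 2' in the background unless one is already running."""
    if vmstat_running(pidfile):
        return None
    with open(outfile, "w") as out:
        proc = subprocess.Popen(["vmstat", "300", "2"], stdout=out)
    with open(pidfile, "w") as f:
        f.write(str(proc.pid))
    return proc
```

With this guard, a client run every minute still produces only one vmstat per 5-minute window, which is why it needs the client to run at least once every 5 minutes.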
I disagree. If real-time performance analysis is needed, I would pick other tools -- "vmstat 5" works for me ;) Or construct/fork a client agent specifically designed for such a task, and run it on an as-needed basis.
Then try and decide for real time perf analysis if the sampling rate should be 5s or 1m ;)
scott