Xymon Mailing List Archive

Client interval question

Scott Walters
Thu, 15 Dec 2005 03:16:22 -0500
Message-Id: <user-9c39715e3c9e@xymon.invalid>

First off, I know I can come off terse in e-mail, but they are not personal attacks.
It can be a bad idea sometimes, but not always (for example, the reply from the person catching intermittent problems with BB running every minute).
Who ended up stating that the anomaly *was* detected in 5m intervals, but only once every 13h instead of every hour. But I still don't understand how it will help *you*.
A smaller sampling period can show things at a finer granularity. For example, a process kicks off and 5 minutes later you see 100 errors (I'm keeping things generic for illustrative purposes). Were those 100 errors in the first minute? The last? Spread throughout the 5 minutes?
The 5m averages over a week would be quite low compared to a single 5m plot. From that, one could infer that in the last 5m things have not been 'normal'.
I'm not saying you're wrong, simply pointing out that it's not as black and white as you're making it.
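The 100-errors example can be made concrete with a little arithmetic. The following is a minimal Python sketch using hypothetical per-minute error counts (not real Hobbit data): three very different patterns inside one 5-minute window all roll up to the same 5-minute total, and a single bad 5-minute interval barely moves a week-long average of 5-minute samples.

```python
# Hypothetical per-minute error counts; none of this is real Hobbit/larrd data.

def rollups(per_minute, width=5):
    """Sum consecutive per-minute counts into width-minute buckets."""
    return [sum(per_minute[i:i + width]) for i in range(0, len(per_minute), width)]

patterns = {
    "burst at start": [100, 0, 0, 0, 0],
    "spread evenly": [20, 20, 20, 20, 20],
    "burst at end": [0, 0, 0, 0, 100],
}

# Each pattern rolls up to [100]: the 5m sample cannot tell them apart.
summaries = {name: rollups(counts) for name, counts in patterns.items()}

# One week of 5-minute samples: 7 days * 24 h * 60 min / 5 min.
SAMPLES_PER_WEEK = 7 * 24 * 60 // 5
week = [0] * SAMPLES_PER_WEEK
week[-1] = 100  # a single bad 5-minute interval at the end of the week
weekly_mean = sum(week) / len(week)  # 100 / 2016, far below the spike itself
```

Either side of the argument can use this: the per-minute view distinguishes the three patterns, while the spike still stands out clearly against the weekly average at 5m resolution.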
And I am disagreeing with you ;) I've been watching the data in these graphs for many, many years now, and I have yet to come across a situation where having a 1m sampling/graphing period would have helped me fix/improve something . . .

It's like a story problem with too much information: it makes coming up with the real answer harder in the end. Most people don't have the time/energy/brains to sift through all the data correctly. Even if they do, the 5m samples are good enough.

Most people (including really smart people that are forgetful) can't deal with an auto-scaling y-axis.
Something that is just interesting initially can sometimes uncover problems you didn't see before.
Like I said, if you have a job where "interesting" is worthwhile, wonderful. In my experience, most folks running the BB/hobbit tools are involved in the operational aspects of infrastructure, not R&D.
With the stock larrd/hobbit RRD definitions you are correct. He'll only use one of the five, and whine about the timestamp of the other four.
Firstly, can you explain your comment in more detail?
RRD interpolates time-series data to put a value at a fixed interval. That is why you hardly ever see integers in the data. If your sample comes in at 299s, RRD interpolates that value to what it would have been at 300s. How this is done can be tuned. The default settings with the RRAs expect data every 300s, and RRD will only insert data once within that interval.
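The effect of snapping irregular samples onto fixed 300s boundaries can be sketched in a few lines. This is a deliberately simplified Python model that assumes plain linear interpolation between two samples; it is not rrdtool's actual normalization code, which also depends on the data source type (GAUGE, COUNTER, etc.) and the heartbeat. The function name `to_step_boundaries` is hypothetical.

```python
# Hypothetical, simplified model of normalizing samples to a fixed step.
# rrdtool's real consolidation also depends on DS type and heartbeat.
STEP = 300  # default 5-minute step discussed in the thread

def to_step_boundaries(samples, step=STEP):
    """samples: list of (unix_time, value) with strictly increasing times.

    Returns (boundary_time, interpolated_value) for every step boundary
    that falls after the first sample and at or before the last one.
    """
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        # First step boundary strictly after t0.
        b = (t0 // step + 1) * step
        while b <= t1:
            frac = (b - t0) / (t1 - t0)  # how far b lies between t0 and t1
            out.append((b, v0 + (v1 - v0) * frac))
            b += step
    return out

# Samples arriving at 290s and 601s land on the 300s and 600s boundaries
# as non-integer values (about 11.0 and 40.9), even though both inputs
# were whole numbers.
normalized = to_step_boundaries([(290, 10.0), (601, 41.0)])
```

The non-integer results illustrate why stored RRD values rarely look like the raw numbers the client reported.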
Secondly, I'm confused as to why you would state that I would "whine" about anything when you have no basis for a conclusion to that effect. It seems to be a rather pointed comment in a discussion that hasn't involved the kind of language that would warrant a response like that.
"He'll whine" meant rrdtool, not you:

ERROR: illegal attempt to update using time 1042731000 when last update time is 1043099100 (minimum one second step)
That's whining in my book.  Sorry you thought I was speaking about you.
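The error above comes from rrdtool refusing an update whose timestamp is not at least one second later than the last accepted update. Below is a minimal Python sketch of that rule; it does not call rrdtool, and the `MonotonicSeries` and `UpdateRejected` names are hypothetical stand-ins for an RRD file's last-update bookkeeping.

```python
# Hypothetical stand-in for rrdtool's "last update time" check; no rrdtool calls.

class UpdateRejected(Exception):
    """Raised when an update is not strictly later than the last one."""


class MonotonicSeries:
    def __init__(self, last_update):
        self.last_update = last_update  # unix time of the last accepted update
        self.values = []

    def update(self, timestamp, value):
        # Each update must be at least one second after the previous one.
        if timestamp <= self.last_update:
            raise UpdateRejected(
                f"illegal attempt to update using time {timestamp} when last "
                f"update time is {self.last_update} (minimum one second step)"
            )
        self.last_update = timestamp
        self.values.append((timestamp, value))
```

Using the timestamps from the message, `MonotonicSeries(1043099100).update(1042731000, 1.0)` raises, because the new time is earlier than the last update time.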
That is a very good point you make. There is a difference between real-time analysis and capacity planning/trending. I don't, however, think that it is that far outside of hobbit's scope to try and leverage it for a more pointed analysis.
From a software development standpoint there is a lot to be said for: "Do one thing and do it well." If architecting the RRD framework for real-time analysis (RTA) breaks trending, that's a bad idea.

My goal isn't to take every machine in my environment and make them into 1-minute sampling period machines. Having the ability to do so on a machine-by-machine basis could be useful.
Which is why I proposed another client collector for this activity.
That's my design you inherited, and because of the complexity of the parts, I think it is a very solid design.
I don't think anyone is really questioning that.
You are questioning that. And that is fine. I don't take it personally that you think there may be a better way. I know my way may not be the best, but I sure know exactly *why* I chose it.
Honestly, I don't claim to know anything in the slightest about the way larrd and hobbit are coded. There are difficulties to be sure, but part of having a community such as this is to foster ideas and innovation. Just because you don't think it's useful, or that it's hard, doesn't mean the same is true for everyone out there.
Ahhhhh, to the heart of the matter. Don't suggest ideas in a public forum if you are not prepared to defend them. Fostering ideas comes from intelligent discussions. I merely wanted to understand why you felt you needed a higher sampling rate from a business perspective.


scott