Xymon Mailing List Archive

system log and application log monitoring

Henrik Størner
Sun, 4 Jun 2006 10:04:44 +0200
Message-Id: <user-5d3a7bdeb7ef@xymon.invalid>

On Fri, Jun 02, 2006 at 11:03:52AM -0500, Jeff Newman wrote:
> Is there a facility already in place, or a way to graph the number of "hits"
> returned by a pattern match for a log file?
>
> For instance:
>
> I am checking xyz log file for the word "wrap". It would be *very* useful to
> have a graph that shows the number of times that word showed up between the
> previous check and the current check.

No, there isn't.

> This could be very useful to illustrate, say, a disk dying (one blip
> of a bad read or something would be one thing, but looking at a graph
> over time that shows 1 blip one week, 10 the next, and 20 the week
> after that would indicate the disk was almost dead) etc...

Hobbit only looks at log entries over a 30-minute period, so we would
have to extend that significantly; the counting would therefore have to
be done on the client side rather than on the server. (Not a problem,
I'm just thinking out loud.)

> Right now, the only way I have to do this is with a client side script that
> runs in a constant loop:
>
> while true; do
>   NUM=`grep "Buffer wrapped" /quotes/env/errlog | wc -l | sed 's/  *//g'`
>   if [ $NUM -gt $INITIALNUM ] ; then
>      WRAP_NUM=`expr $NUM - $INITIALNUM`
>      $BB $BBDISP "status $MACHINE.wraps green `date`
>      `echo "wraps:$WRAP_NUM"`
>      "
>      INITIALNUM=$NUM
>   else
>      OKNUM=0
>      $BB $BBDISP "status $MACHINE.wraps green `date`
>      `echo "wraps:$OKNUM"`
>      "
>   fi
> done

If all that you want is the graph and not alerts, then I wonder if it
couldn't be done more easily. Just do the "grep" and report the total
count, without subtracting the previous value yourself. Then send it
into the NCV handler, with a dataset definition that uses the DERIVE
datatype (which is the default, btw). RRDtool should then handle all
of the "subtract the previous value from the current value" stuff and
you needn't worry about it.
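
A minimal sketch of that simpler approach, assuming the same log file,
test name and $BB/$BBDISP/$MACHINE environment as the quoted script. The
helper names count_matches and build_status are made up for illustration,
and the server-side settings mentioned in the comments are assumptions to
check against your own hobbitserver.cfg:

```shell
#!/bin/sh
# Report the running total of matching lines; RRDtool's DERIVE datatype
# turns consecutive totals into a rate, so no INITIALNUM bookkeeping.
# Assumed server side: the "wraps" test is routed to the NCV handler
# (e.g. TEST2RRD containing wraps=ncv, NCV_wraps="wraps:DERIVE").

# Print the number of lines in file $2 matching the regular expression $1.
# Prints 0 when nothing matches or the file cannot be read.
count_matches() {
    n=$(grep -c -- "$1" "$2" 2>/dev/null) || true
    echo "${n:-0}"
}

# Print a status message for host $1, test $2, carrying value $3
# as a "name: value" line for the NCV handler to pick up.
build_status() {
    printf 'status %s.%s green %s\n%s: %s\n' "$1" "$2" "$(date)" "$2" "$3"
}

# Send only when the client environment is available.
if [ -n "$BB" ] && [ -n "$BBDISP" ]; then
    TOTAL=$(count_matches "Buffer wrapped" /quotes/env/errlog)
    MSG=$(build_status "$MACHINE" wraps "$TOTAL")
    $BB $BBDISP "$MSG"
fi
```

Run once per reporting interval (from the client's task scheduler or
cron) instead of in a constant loop.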

.....

After thinking a bit more about this, I believe that having a method to
do "grep ...| wc -l" in the client might be a good thing. So I've added
a new type of configuration to the client-local.cfg file, so you can do

    linecount:/var/log/messages
    diskerrors I/O error.*/dev/hd
    badlogins Login failed

and the client will report the following data back in the client message:

   diskerrors: 0
   badlogins: 2

which are the number of times these two expressions were found in the
/var/log/messages file.
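
To try out patterns by hand before putting them into client-local.cfg,
the counting can be imitated with a few lines of shell. This is only a
sketch of the "grep ... | wc -l" idea that prints the same "name: count"
lines; the function name linecount is reused here for illustration and
this is not the client's actual code:

```shell
#!/bin/sh
# Usage: linecount FILE NAME PATTERN [NAME PATTERN ...]
# For each NAME/PATTERN pair, print "NAME: <number of matching lines>".
linecount() {
    file=$1
    shift
    while [ $# -ge 2 ]; do
        # grep -c counts matching lines; fall back to 0 if the file is unreadable
        n=$(grep -c -- "$2" "$file" 2>/dev/null) || true
        echo "$1: ${n:-0}"
        shift 2
    done
}

# Example with the two patterns from the configuration above:
# linecount /var/log/messages diskerrors 'I/O error.*/dev/hd' badlogins 'Login failed'
```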

Given those data, it will be easy on the server side to feed them into
a graph and do other nice things with them.


Regards,
Henrik