Xymon Mailing List Archive

Tricky one for log file monitoring

8 messages in this thread

list Neil Simmonds · Thu, 22 Mar 2012 10:41:09 -0000 ·
We have a requirement that would allow us to move some more of our
monitoring into Xymon if we could find a way of doing it.

 
Basically the issue is this:

A message appears in the log file for a failure. From this we want an alert
that will stay active and not expire after 30 minutes like log file
alerts usually do.

We will hopefully then get a message in the log file that tells us of
completion of the failed process. At this point we want to clear the
alert.

Is anyone doing anything like this, or does anyone have any idea how we
might go about it?

 
Regards,

Neil Simmonds

list Henrik Størner · Thu, 22 Mar 2012 12:11:01 +0100 ·
On Thu, 22 Mar 2012 10:41:09 -0000, "Neil Simmonds" <user-8188d25e65e4@xymon.invalid> wrote:
> Message appears in log file for failure - from this we want an alert
> that will stay active and not expire after 30 minutes like log file
> alerts usually do.
>
> We will hopefully then get a message in the log file that tells us of
> completion of the failed process, at this point we want to clear the
> alert.

It's not something that the Xymon client will do automatically, but you
can script your way out of it. What I would do is to create a custom test
for this - something like this:

#!/bin/sh

# Logfile we monitor
FN="/var/log/mylogfile"
# Message patterns that say "alert" or "OK"
ALERTMSG="Something bad"
OKMSG="All OK"

# Use the data from the "logfetch" status to grab the last 5 minutes of log data
FPOS=`cat $XYMONTMP/logfetch.${MACHINEDOTS}.status | grep "^${FN}:" | cut -d: -f2`
LASTMSG=`dd if=$FN bs=1 skip=$FPOS 2>/dev/null | egrep "$ALERTMSG|$OKMSG" | tail -n 1`

# LASTMSG now holds the last message which is either an alert or an OK message
#
# Actually the whole "cat ... grep ... cut ... dd .." thing is not needed, since
# you could just scan the entire logfile and pick out the last message which is
# either OK or alert... you could just do
# LASTMSG=`egrep "$ALERTMSG|$OKMSG" $FN | tail -n 1`

# Determine color
COLOR="green"
if test `echo "$LASTMSG" | grep -c "$ALERTMSG"` -ne 0
then
   COLOR=red
fi

# Send the status with a very long duration so it doesn't go purple.
$XYMON $XYMSRV "status+365d $MACHINE.mylog $COLOR `date`

Last message seen: $LASTMSG
"

exit 0


This raises two interesting ideas:

1) We should have status-messages that don't expire (go purple). Using a
very long status lifetime is a kludge, really.
2) The log analysis tool should know how to handle messages that cancel
each other out.


Regards,
Henrik
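
The colour decision in Henrik's script can be isolated into a small function, which makes it easy to try against a sample file before wiring it up to Xymon. This is only a sketch: the function name last_state_color and its argument order are made up for illustration, and it scans the whole file in the way the commented-out alternative in the script suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xymon): print "red" if the most recent
# line matching either pattern is an alert, otherwise "green". It also
# prints "green" when neither pattern has been seen at all.
# Usage: last_state_color LOGFILE ALERT_PATTERN OK_PATTERN
last_state_color() {
    logfile=$1
    alertmsg=$2
    okmsg=$3

    # Last line that matches either pattern; empty if there is none.
    lastmsg=$(grep -E -e "$alertmsg|$okmsg" -- "$logfile" 2>/dev/null | tail -n 1)

    # A line containing both texts counts as an alert, as in the original.
    if printf '%s\n' "$lastmsg" | grep -E -q -e "$alertmsg"
    then
        echo red
    else
        echo green
    fi
}
```

A caller could then use COLOR=`last_state_color /var/log/mylogfile "Something bad" "All OK"` and send the status exactly as the script above does.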
list John Horne · Thu, 22 Mar 2012 11:32:45 +0000 ·
On Thu, 2012-03-22 at 12:11 +0100, user-ce4a2c883f75@xymon.invalid wrote:
> This raises two interesting ideas:
>
> 1) We should have status-messages that don't expire (go purple). Using a
> very long status lifetime is a kludge, really.

Hello,

We have a similar requirement to the OP's. What would be nice is for the
'xymon' command to have a 'remove' command which would remove a
(permanent, non-purple) status for a given host/service. Along with the
remove command would be an 'ID' which must be the same as that sent with
the original status message. That way only statuses with a known ID could
be removed; an unknown ID would cause the remove to be ignored.

I won't go into details, but we have been running BB for several years
using an in-house modified 'TheState' BB addon. This allows us to send
permanent (non-purple) status messages when an event occurs, and then
can remove them when required (via a web frontend which simply calls the
'bb' command to talk to the server).

We already do here what the OP has asked for; it is on my TODO list to
see how we can get Xymon to do it too! :-)


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX
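
The ID check John describes is a wished-for feature rather than something the thread shows Xymon doing, but the matching rule itself can be approximated on the sending side. The sketch below is purely illustrative: the state directory and both function names are assumptions, and actually sending the red or green status is left to the caller.

```shell
#!/bin/sh
# Hypothetical client-side emulation of "remove only if the ID matches".
# STATEDIR and both function names are assumptions for illustration.
STATEDIR=${STATEDIR:-/var/tmp/xymon-sticky}

# remember_id TEST ID: record the ID that raised the sticky status.
remember_id() {
    mkdir -p "$STATEDIR" && printf '%s\n' "$2" > "$STATEDIR/$1.id"
}

# may_clear TEST ID: succeed (and forget the ID) only when ID equals the
# one recorded by remember_id; unknown or mismatched IDs are ignored.
may_clear() {
    idfile="$STATEDIR/$1.id"
    [ -f "$idfile" ] || return 1
    [ "$(cat "$idfile")" = "$2" ] || return 1
    rm -f "$idfile"
}
```

A wrapper would call remember_id when it sends the red status, and send green only if may_clear succeeds for the same test name and ID.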
list Ken Connell · Thu, 22 Mar 2012 11:54:20 +0000 ·
I do something similar to what you're asking to check one of our DHCP servers for "low dhcp pools" and update Xymon.
I wanted to:
Run swatch on the DHCP server and watch for messages in the log that indicate "low dhcp pool", then execute a script that updates Xymon. This would be ideal and real time, but...
Because of issues, I did:
Periodically ssh to the DHCP server (from my Xymon server), grab logs from the last "x" minutes, grep out DHCP pool info, then update Xymon based on what I found.

   Ken Connell
Intermediate Network Engineer
Computer & Communication Services
Ryerson University
XXX Victoria St
RM AB50
Toronto, Ont
M5B 2K3
XXX-XXX-XXXX x6709


list John Horne · Thu, 22 Mar 2012 12:45:14 +0000 ·
On Thu, 2012-03-22 at 11:54 +0000, user-7cb0f5662626@xymon.invalid wrote:
> I do something similar to what your asking to check one of our DHCP
> servers to see about "low dhcp pools" and update Xymon.
>
> I wanted to:
> Run swatch on the dhcp server and watch for messages in the log that
> indicate "low dhcp pool", then execute a script that updates xymon.
> This would be ideal and real time, but...

Within swatch could you not use the 'exec' command to invoke 'xymon' to
update the Xymon server? Something like (completely untested!):

   watchfor = /low dhcp pools/
      exec = xymon <xymonserver IP> 'status <localhostname>.dhcp red
`date` DHCP pools getting low!'

(The 'exec' is all on one line.)

The man page for 'xymon' has more details.


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX
list Ken Connell · Thu, 22 Mar 2012 09:29:25 -0400 ·
My issue was getting swatch working on this particular SunOS box... which I'm not too crazy about.

That's exactly what I had in mind though :)


Ken Connell
Intermediate Network Engineer
Computer & Communication Services
Ryerson University
XXX Victoria St
RM AB50
Toronto, Ont
M5B 2K3
XXX-XXX-XXXX x6709

list Jeremy Laidman · Mon, 26 Mar 2012 13:40:27 +1100 ·
On Thu, Mar 22, 2012 at 11:45 PM, John Horne <user-e95f1ec2f147@xymon.invalid> wrote:
> Within swatch could you not use the 'exec' command to invoke 'xymon' to
> update the Xymon server? Something like (completely untested!):
>
>   watchfor = /low dhcp pools/
>      exec = xymon <xymonserver IP> 'status <localhostname>.dhcp red
> `date` DHCP pools getting low!'

Won't the test go purple after a while?  A way to get around this
would be to create a status file, and then have another process
"refresh" Xymon based on the content of the file:

    exec = echo 'status <localhostname>.dhcp red `date` DHCP pools getting low!' > /var/tmp/dhcp.status

Then in tasks.cfg:

[dhcp]
  ENVFILE /usr/lib/xymon/server/etc/xymonserver.cfg
  CMD xymon <xymonserverIP> `cat /var/tmp/dhcp.status`
  INTERVAL 5m

J
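
The "refresh" step in Jeremy's suggestion could also live in a small script of its own, which avoids relying on backtick handling inside tasks.cfg. This is a rough sketch under stated assumptions: the function name refresh_status and the STATUSFILE variable are hypothetical, and XYMON and XYMSRV are expected to come from the Xymon environment as in Henrik's script earlier in the thread.

```shell
#!/bin/sh
# Hypothetical refresh helper for the status-file approach.
# XYMON and XYMSRV are assumed to be set by the Xymon environment;
# STATUSFILE defaults to the path used in the example above.
STATUSFILE=${STATUSFILE:-/var/tmp/dhcp.status}

refresh_status() {
    # Nothing recorded yet (missing or empty file): leave the test alone.
    [ -s "$STATUSFILE" ] || return 0

    # Resend the stored message unchanged, as one single argument so that
    # spaces and newlines in the message survive.
    "$XYMON" "$XYMSRV" "$(cat "$STATUSFILE")"
}
```

A script that calls refresh_status could then be the CMD of the [dhcp] task, run at the same 5m interval.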
list John Horne · Mon, 26 Mar 2012 10:32:43 +0100 ·
On Mon, 2012-03-26 at 13:40 +1100, Jeremy Laidman wrote:
> On Thu, Mar 22, 2012 at 11:45 PM, John Horne <user-e95f1ec2f147@xymon.invalid> wrote:
> > Within swatch could you not use the 'exec' command to invoke 'xymon' to
> > update the Xymon server? Something like (completely untested!):
> >
> >   watchfor = /low dhcp pools/
> >      exec = xymon <xymonserver IP> 'status <localhostname>.dhcp red
> > `date` DHCP pools getting low!'
> Won't the test go purple after a while?

As it stands, yes, so maybe use something like 'status+5d'.

> A way to get around this would be to create a status file, and then
> have another process "refresh" Xymon based on the content of the file:

To achieve this with our old BB system with TheState, we had an 'expire'
time included in the status message. A separate process then ran every 5
mins or so and looked for 'expired' statuses and deleted them. This then
allowed a 'default' status to be shown (usually green). (Sorry probably
didn't describe that too well, it is a tad complex but has worked well.)


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX
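
The expiry scheme John describes for TheState can be sketched in a few lines of shell. Everything here is an assumption made for illustration, not TheState's actual format: the state file holds one line of the form "<expiry-epoch-seconds> <color>", and the function name effective_color is made up. The current time is passed in as an argument so the logic can be tried without touching the clock.

```shell
#!/bin/sh
# Hypothetical file format (an assumption, not TheState's own): one line,
# "<expiry-epoch-seconds> <color>", for example "1332412800 red".
# effective_color FILE NOW: print the stored color while NOW is before the
# expiry time, otherwise fall back to the default "green".
effective_color() {
    file=$1
    now=$2
    default=green

    # No stored status (missing or empty file): show the default.
    [ -s "$file" ] || { echo "$default"; return 0; }

    read -r expiry color < "$file"
    if [ "$now" -lt "$expiry" ]
    then
        echo "$color"
    else
        echo "$default"
    fi
}
```

A periodic job could call effective_color with the current epoch time (for example from date +%s, where the local date supports it) and send the resulting color to Xymon.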