Xymon Mailing List Archive

alerting on log messages

Colin Coe
Wed, 18 Sep 2019 16:11:52 +0800
Message-Id: <user-c2b9c2081963@xymon.invalid>

Hi Jeremy

Many thanks for this, working perfectly.

Thanks again

On Wed, Sep 18, 2019 at 2:58 PM Jeremy Laidman <user-0608abae5e7c@xymon.invalid> wrote:
Ah, I worked it out. In client-local.cfg, the patterns after "ignore" and
"trigger" are defined as "regular expression". This is unlike (say)
analysis.cfg, where patterns for the HOST and LOG keywords are defined as
"string or regular expression" and the "%" signifies a regular expression.
Also, in analysis.cfg, strings with spaces must be enclosed in quotes,
whereas in client-local.cfg, everything after the keyword (e.g. ignore,
trigger) is treated as the regular expression. So in client-local.cfg you
must not include a % or quotes. Instead you want something like:

[host=test_server_41]
log:/var/log/messages:1024000
ignore (Failed to fetch|Failed to parse|Failed to evaluate)
trigger (ORA-04091|Failed to log off resource|Failed to log on resource)
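A quick way to see why the "%" and the quotes hurt is to try both forms of the pattern against a few sample lines. The sketch below uses grep -E as a stand-in for logfetch's matching; that is an assumption (logfetch's regex flags may differ, e.g. in case sensitivity), and the variable names are made up for illustration. In an extended regular expression the quote and percent characters are ordinary literals, so the quoted form only matches lines that actually contain them.

```shell
# Sample lines modelled on the test logfile used later in this thread.
log_lines='log line 1 ignore Failed to fetch bla bla
log line 4 trigger ORA-04091 bla bla
log line 6 trigger Failed to log on resource bla bla'

# Bare regex, as client-local.cfg expects it.
good='(ORA-04091|Failed to log off resource|Failed to log on resource)'
# analysis.cfg-style pattern: the quotes and % become literal characters.
bad='"%(ORA-04091|Failed to log off resource|Failed to log on resource)"'

# grep -c prints the number of matching lines; "|| true" keeps a zero-match
# result from aborting scripts that run under "set -e".
good_hits=$(printf '%s\n' "$log_lines" | grep -cE -- "$good" || true)
bad_hits=$(printf '%s\n' "$log_lines" | grep -cE -- "$bad" || true)

echo "bare regex matched $good_hits line(s)"
echo "quoted %-prefixed regex matched $bad_hits line(s)"
```

With these three sample lines the bare regex matches the two trigger lines and the quoted form matches nothing.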

It can be very painful to troubleshoot problems with client-local.cfg
configs, especially as it can take up to 10 minutes for updates to
propagate to clients and generate new results. I like to create my own copy
of the logfile and the client-local.cfg snippet, and manually run logfetch
(which is what processes the client-local.cfg lines). For example, this is
what I used to diagnose Colin's problem:

xymon@server:/tmp/logfetch-test> ls -l
total 8
-rw-r--r-- 1 xymon xymon 160 2019-09-18 16:49 my-client-local.cfg
-rw-r--r-- 1 xymon xymon 273 2019-09-18 16:34 my-logfile.log

xymon@server:/tmp/logfetch-test> cat my-logfile.log
log line 1 ignore Failed to fetch bla bla
log line 2 ignore Failed to parse bla bla
log line 3 ignore Failed to evaluate bla bla
log line 4 trigger ORA-04091 bla bla
log line 5 trigger Failed to log off resource bla bla
log line 6 trigger Failed to log on resource bla bla

xymon@server:/tmp/logfetch-test> cat my-client-local.cfg
log:my-logfile.log:1024000
ignore (Failed to fetch|Failed to parse|Failed to evaluate)
trigger (ORA-04091|Failed to log off resource|Failed to log on resource)

xymon@server:/tmp/logfetch-test> $XYMONCLIENTHOME/bin/logfetch my-client-local.cfg /dev/null
[msgs:my-logfile.log]
log line 4 trigger ORA-04091 bla bla
log line 5 trigger Failed to log off resource bla bla
log line 6 trigger Failed to log on resource bla bla

[logfile:my-logfile.log]
type:100000 (file)
mode:644 (-rw-r--r--)
linkcount:1
owner:1984 (xymon)
group:1984 (xymon)
size:273
clock:1568789652 (2019/09/18-16:54:12)
atime:1568789652 (2019/09/18-16:54:12)
ctime:1568788625 (2019/09/18-16:37:05)
mtime:1568788446 (2019/09/18-16:34:06)
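
For anyone wanting to repeat this test, the session above can be rolled into a small script. This is only a sketch: the mktemp working directory is my own choice, and it assumes XYMONCLIENTHOME points at a Xymon client install with logfetch in its bin directory. If logfetch is not found there, the script just says so.

```shell
# Build a throwaway test directory (location chosen by mktemp, an assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample logfile: three lines that should be ignored, three that should trigger.
cat > my-logfile.log <<'EOF'
log line 1 ignore Failed to fetch bla bla
log line 2 ignore Failed to parse bla bla
log line 3 ignore Failed to evaluate bla bla
log line 4 trigger ORA-04091 bla bla
log line 5 trigger Failed to log off resource bla bla
log line 6 trigger Failed to log on resource bla bla
EOF

# client-local.cfg snippet under test: bare regexes, no quotes, no "%".
cat > my-client-local.cfg <<'EOF'
log:my-logfile.log:1024000
ignore (Failed to fetch|Failed to parse|Failed to evaluate)
trigger (ORA-04091|Failed to log off resource|Failed to log on resource)
EOF

# Run logfetch the same way as in the session above, if it is installed.
if [ -x "${XYMONCLIENTHOME:-}/bin/logfetch" ]; then
    "$XYMONCLIENTHOME/bin/logfetch" my-client-local.cfg /dev/null
else
    echo "logfetch not found; set XYMONCLIENTHOME to the Xymon client directory"
fi
```

Edit my-client-local.cfg in that directory and re-run the logfetch line to try other patterns without waiting for the config to reach a real client.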

Cheers
Jeremy


On Wed, 18 Sep 2019 at 16:18, Colin Coe <user-5b250cd7a540@xymon.invalid> wrote:
Hi Jeremy

What I'm finding is that I'm getting SMS alerts from Xymon about lines in
/var/log/messages such as "Failed to evaluate", "Failed to fetch", and
"Failed to calculate", which are normal for the application we're running.
I only want SMS alerts about the Oracle error and the failed to log on/off
resource messages.

Thanks

On Wed, Sep 18, 2019 at 1:37 PM Jeremy Laidman <user-0608abae5e7c@xymon.invalid> wrote:
"After picking out the "trigger" lines, any remaining space up to the
maximum size is filled in with the most recent entries from the logfile."

So it will include what you request as well as whatever else that will
fit.


On Wed, 18 Sep 2019 at 09:32, Colin Coe <user-5b250cd7a540@xymon.invalid> wrote:
Hi all

I have this entry in client-local.cfg
---
[host=test_server_41]
log:/var/log/messages:1024000
ignore "%(Failed to fetch|Failed to parse|Failed to evaluate)"
trigger "%(ORA-04091|Failed to log off resource|Failed to log on resource)"
---

And in analysis.cfg I have:
---
HOST= test_server_41
    LOG /var/log/messages %(ORA-04091|Failed to log off resource|Failed to log on resource) IGNORE=OCS color=red
---

Needless to say I'm getting alerts for more than just "ORA-04091",
"Failed to log off resource", and "Failed to log on resource".

Any ideas what I'm doing wrong?

Thanks