alerting on log messages
list Colin Coe
Hi all
I have this entry in client-local.cfg
---
[host=test_server_41]
log:/var/log/messages:1024000
ignore "%(Failed to fetch|Failed to parse|Failed to evaluate)"
trigger "%(ORA-04091|Failed to log off resource|Failed to log on resource)"
---
And in analysis.cfg I have:
---
HOST= test_server_41
LOG /var/log/messages %(ORA-04091|Failed to log off resource|Failed to log on resource) IGNORE=OCS color=red
---
Needless to say I'm getting alerts for more than just "ORA-04091", "Failed
to log off resource", and "Failed to log on resource".
Any ideas what I'm doing wrong?
Thanks
list Jeremy Laidman
"After picking out the "trigger" lines, any remaining space up to the maximum size is filled in with the most recent entries from the logfile."
So the log data will include the lines you asked for as well as whatever else will fit.
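To make the quoted behaviour concrete, here is a toy model in Python. It is only a sketch of the rule as described in the quote, not logfetch's actual implementation; the function name `select_log_lines` and the byte accounting (line length plus one for the newline) are assumptions made for illustration.

```python
import re


def select_log_lines(lines, trigger_re, max_bytes):
    """Toy model: keep every trigger match, then fill leftover space
    with the most recent non-trigger lines (newest first)."""
    trigger = re.compile(trigger_re)
    # Step 1: trigger lines are always picked out.
    chosen = {i for i, line in enumerate(lines) if trigger.search(line)}
    used = sum(len(lines[i]) + 1 for i in chosen)
    # Step 2: walk backwards from the end of the log and add
    # non-trigger lines until the size limit would be exceeded.
    for i in range(len(lines) - 1, -1, -1):
        if i in chosen:
            continue
        cost = len(lines[i]) + 1
        if used + cost > max_bytes:
            break
        chosen.add(i)
        used += cost
    # Return the selection in original log order.
    return [lines[i] for i in sorted(chosen)]
```

With a generous size limit such as 1024000, nearly every recent line fits, which is why unrelated messages show up next to the trigger lines.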
list Colin Coe
Hi Jeremy
What I'm finding is that I'm getting SMS alerts from Xymon about lines in /var/log/messages such as "Failed to evaluate", "Failed to fetch", and "Failed to calculate", which are normal for the application we're running. I only want SMS alerts about the Oracle error and the "Failed to log on/off resource" messages.
Thanks
list Jeremy Laidman
Ah, I worked it out. In client-local.cfg, the patterns after "ignore" and "trigger" are defined as "regular expression". This is unlike (say) analysis.cfg, where the patterns for the HOST and LOG keywords are defined as "string or regular expression" and the "%" signifies a regular expression. Also, in analysis.cfg, strings with spaces must be enclosed in quotes, whereas in client-local.cfg, everything after the keyword (e.g. ignore, trigger) is treated as the regular expression. So in client-local.cfg you must not include a % or quotes. Instead you want something like:
[host=test_server_41]
log:/var/log/messages:1024000
ignore (Failed to fetch|Failed to parse|Failed to evaluate)
trigger (ORA-04091|Failed to log off resource|Failed to log on resource)
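The effect of the stray quotes and "%" can be shown with a short Python sketch. Python's `re` module is used here only as a stand-in for the regex engine in the Xymon client, on the assumption that both treat `"` and `%` as literal characters; the helper name `matches` is made up for this example.

```python
import re

# Pattern as originally written in client-local.cfg: the quotes and the
# leading % become part of the regular expression itself.
QUOTED = r'"%(Failed to fetch|Failed to parse|Failed to evaluate)"'

# Corrected pattern: everything after the keyword is the regex.
BARE = r'(Failed to fetch|Failed to parse|Failed to evaluate)'

SAMPLE_LINE = "log line 1 ignore Failed to fetch bla bla"


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None
```

Here `matches(QUOTED, SAMPLE_LINE)` is false, because the line contains no literal `"%` before the text, while `matches(BARE, SAMPLE_LINE)` is true. With the quoted form, the ignore and trigger patterns never match ordinary log lines.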
It can be very painful to troubleshoot problems with client-local.cfg
configs, especially as it can take up to 10 minutes for updates to
propagate to clients and generate new results. I like to create my own copy
of the logfile and the client-local.cfg snippet, and manually run logfetch
(which is what processes the client-local.cfg lines). For example, this is
what I used to diagnose Colin's problem:
xymon@server:/tmp/logfetch-test> ls -l
total 8
-rw-r--r-- 1 xymon xymon 160 2019-09-18 16:49 my-client-local.cfg
-rw-r--r-- 1 xymon xymon 273 2019-09-18 16:34 my-logfile.log
xymon@server:/tmp/logfetch-test> cat my-logfile.log
log line 1 ignore Failed to fetch bla bla
log line 2 ignore Failed to parse bla bla
log line 3 ignore Failed to evaluate bla bla
log line 4 trigger ORA-04091 bla bla
log line 5 trigger Failed to log off resource bla bla
log line 6 trigger Failed to log on resource bla bla
xymon@server:/tmp/logfetch-test> cat my-client-local.cfg
log:my-logfile.log:1024000
ignore (Failed to fetch|Failed to parse|Failed to evaluate)
trigger (ORA-04091|Failed to log off resource|Failed to log on resource)
xymon@server:/tmp/logfetch-test> $XYMONCLIENTHOME/bin/logfetch my-client-local.cfg /dev/null
[msgs:my-logfile.log]
log line 4 trigger ORA-04091 bla bla
log line 5 trigger Failed to log off resource bla bla
log line 6 trigger Failed to log on resource bla bla
[logfile:my-logfile.log]
type:100000 (file)
mode:644 (-rw-r--r--)
linkcount:1
owner:1984 (xymon)
group:1984 (xymon)
size:273
clock:1568789652 (2019/09/18-16:54:12)
atime:1568789652 (2019/09/18-16:54:12)
ctime:1568788625 (2019/09/18-16:37:05)
mtime:1568788446 (2019/09/18-16:34:06)
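For repeated testing, the same setup can be scripted. The following Python sketch recreates the two test files shown above and, if `XYMONCLIENTHOME` is set and the binary exists, runs logfetch with the same arguments as in the transcript. The function names `write_fixture` and `run_logfetch` are hypothetical, and the script assumes logfetch resolves the relative log path against its working directory, as it did in the session above.

```python
import os
import subprocess
from pathlib import Path

LOG_LINES = [
    "log line 1 ignore Failed to fetch bla bla",
    "log line 2 ignore Failed to parse bla bla",
    "log line 3 ignore Failed to evaluate bla bla",
    "log line 4 trigger ORA-04091 bla bla",
    "log line 5 trigger Failed to log off resource bla bla",
    "log line 6 trigger Failed to log on resource bla bla",
]

CFG_LINES = [
    "log:my-logfile.log:1024000",
    "ignore (Failed to fetch|Failed to parse|Failed to evaluate)",
    "trigger (ORA-04091|Failed to log off resource|Failed to log on resource)",
]


def write_fixture(workdir):
    """Create the test logfile and config snippet in workdir."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    log_path = workdir / "my-logfile.log"
    cfg_path = workdir / "my-client-local.cfg"
    log_path.write_text("\n".join(LOG_LINES) + "\n")
    cfg_path.write_text("\n".join(CFG_LINES) + "\n")
    return cfg_path, log_path


def run_logfetch(workdir):
    """Run logfetch inside workdir; return its stdout, or None if unavailable."""
    home = os.environ.get("XYMONCLIENTHOME")
    if not home:
        return None
    binary = Path(home) / "bin" / "logfetch"
    if not binary.exists():
        return None
    result = subprocess.run(
        [str(binary), "my-client-local.cfg", "/dev/null"],
        cwd=str(workdir),  # the config refers to the logfile by relative path
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `write_fixture("/tmp/logfetch-test")` followed by `run_logfetch("/tmp/logfetch-test")` should reproduce the session above on a host with the Xymon client installed. Editing `CFG_LINES` then lets you try other patterns without waiting for the config to reach the client.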
Cheers
Jeremy
list Colin Coe
Hi Jeremy Many thanks for this, working perfectly. Thanks again