Xymon Mailing List Archive

Setting thresholds in analysis.cfg

3 messages in this thread

list Christopher Seip · Fri, 8 Jun 2018 16:43:43 +0000 ·
I could use a hand getting the basics of analysis.cfg worked out, please. Here's mine:

# egrep -v '^#' analysis.cfg

HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06
	DISK /disk/data 50 55
	DISK    * 90 95
	MEMSWAP 80 90

HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07
	DISK /disk/data 92 96
	DISK    * 90 95

HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18
	DISK /BACKUP 98 99
	DISK    * 90 95

DEFAULT
	# Ignore some usually uninteresting tmpfs mounts.
	DISK    /dev IGNORE
	DISK    /dev/shm IGNORE
	DISK    /lib/init/rw IGNORE
	DISK    /run IGNORE
	# These are the built-in defaults. You should only modify these
	# lines, not add new ones (no PROC, DISK, LOG ... lines).
	UP      1h
	LOAD    5.0 10.0
	DISK    * 90 95
	INODE	* 70 90
	MEMPHYS 100 101
	MEMSWAP 50 80
	MEMACT  90 97


Three issues with this:

1. Swap consumption in the first host, swnfs06, has been steady at 74%, so I was trying to hush the alerts with the MEMSWAP line. This change hasn't had any effect; I am still getting a memory low yellow-warning for swap/page usage on swnfs06.

2. On the same swnfs06 host, its /disk/data partition is 56% full, so my "DISK..50 60" line was an attempt to trigger a yellow alert. I was testing my understanding of the analysis.cfg file, but the filesystems test remains green.

3. And my 96% full /BACKUP drive on hpnsvr18 is issuing a red alert for being over the panic level of 95%, where I was trying to set the panic level at 99%.

After wrestling with the man page and many experiments, I'm tossing this to the group for help. Seems very basic, but it's just not working for me. What'm I missing?

I tried switching to the "threshold hostname" format, like this:

# egrep -v '^#' analysis.cfg | head -11

DISK /disk/data 50 55 HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06
DISK    * 90 95 HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06
MEMSWAP 80 90 HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06

DISK /disk/data 92 96 HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07
DISK    * 90 95 HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07

DISK /BACKUP 98 99 HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18
DISK    * 90 95 HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18

This produced no change in behavior. I am stopping and starting the Xymon server software and waiting for new html pages to generate after every change in the analysis.cfg.

In my configuration report, I can see that every server configured for local memory tests has acquired the 80%/90% threshold setting, not just swnfs06. And my "DISK /disk/data 50 55" is having no effect at all on any host: The strings "50%" or "60%" appear nowhere in my configuration report.

# egrep 'swnfs0[67]' hosts.cfg
    16.93.247.204	swnfs06.rose.rdlabs.hpecorp.net	# rpc=mount,nlockmgr,nfs,ypbind ssh
    16.93.247.205	swnfs07.rose.rdlabs.hpecorp.net	# NOCOLUMNS=files rpc=mount,nlockmgr,nfs,ypbind ssh
	16.93.247.204	swnfs06.rose.rdlabs.hpecorp.net

# xymoncmd xymond_client --dump-config
DISK /disk/data 50% 55% 0 -1 red HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06 (line: 351)
DISK * 90% 95% 0 -1 red HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06 (line: 352)
MEMSWAP 80 90 HOST=%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06 (line: 353)
DISK /disk/data 92% 96% 0 -1 red HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07 (line: 355)
DISK * 90% 95% 0 -1 red HOST=%swnfs07.rose.rdlabs.hpecorp.net,%swnfs07 (line: 356)
DISK /BACKUP 98% 99% 0 -1 red HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18 (line: 360)
DISK * 90% 95% 0 -1 red HOST=%hpnsvr18.rose.rdlabs.hpecorp.net,%hpnsvr18 (line: 361)
DISK /dev IGNORE (line: 371)
DISK /dev/shm IGNORE (line: 372)
DISK /lib/init/rw IGNORE (line: 373)
DISK /run IGNORE (line: 374)
UP 3600 -1 (line: 377)
LOAD 5.00 10.00 (line: 378)
DISK * 90% 95% 0 -1 red (line: 379)
INODE * 70% 90% 0 -1 red (line: 380)
MEMREAL 100 101 (line: 381)
MEMSWAP 50 80 (line: 382)
MEMACT 90 97 (line: 383)

Thanks for any insights you can provide. Feels like I'm making a wrong assumption about how analysis.cfg works.

Best thing I can figure to do would be to switch to local configuration of my Xymon clients, but I'd rather manage custom thresholds centrally.

- Chris
list Jeremy Laidman · Wed, 22 Aug 2018 01:25:51 +1000 ·
Chris

I think this is the key part of the man page:

HOST=targetstring Rule matching a host by the hostname.  "targetstring"
       is either a comma-separated  list  of  hostnames  (from  the  hosts.cfg
       file),  "*"  to  indicate  "all  hosts",  or  a Perl-compatible regular
       expression.

Are your host definitions comma-separated lists, or PCREs? They can't be
both.

So none of your hosts match, and the DEFAULT stanza is the one that applies.
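
If the leading "%" is what marks a target as a regular expression (that is how the analysis.cfg man page reads, but treat it as an assumption here), then a target such as "%swnfs06.rose.rdlabs.hpecorp.net,%swnfs06" would presumably be taken as one single pattern, comma and second "%" included, which no hostname can match. Two untested sketches of how the first stanza might be written instead, assuming the name in hosts.cfg is the fully qualified one shown in the egrep output of the original post. Either as a plain hostname:

HOST=swnfs06.rose.rdlabs.hpecorp.net
	DISK /disk/data 50 55
	DISK    * 90 95
	MEMSWAP 80 90

or as one regular expression that accepts both the short and the fully qualified name:

HOST=%^swnfs06(\.rose\.rdlabs\.hpecorp\.net)?$
	DISK /disk/data 50 55
	DISK    * 90 95
	MEMSWAP 80 90

The same change would apply to the swnfs07 and hpnsvr18 stanzas. Running "xymoncmd xymond_client --dump-config" again afterwards, as in the original post, should show whether the HOST= target was parsed as intended.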

J



list Christopher Seip · Tue, 21 Aug 2018 15:31:59 +0000 ·
Good point!

Thanks for finding my message and replying. That gives me something to try.

- Chris