possible alerting bug in RC2?
list Bruce Lysik
Hi,
So I installed RC2 this morning. Later on, I noticed an alert email for a monitor going into yellow. I had disabled this previously with --alertcolors=red,purple in hobbitlaunch.cfg. Here's the snippet:
[hobbitd]
HEARTBEAT
ENVFILE /opt/bb/server/etc/hobbitserver.cfg
CMD hobbitd --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk --checkpoint-interval=600 --purple-conn=conn --log=$BBSERVERLOGS/hobbitd.log --admin-senders=127.0.0.1,$BBSERVERIP --alertcolors=red,purple
And here's the alert I just received:
im68:cpu yellow [-1]
yellow Mon Feb 14 13:13:56 PST 2005 up: 208 day(s), 1 users, 115 procs, load=529
LOAD AVG on im68 is 529
Any ideas?
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Asif Iqbal
On Mon, Feb 14, 2005 at 01:28:28PM, Bruce Lysik wrote:
Hi,
So I installed RC2 this morning. Later on, I noticed an alert email for a monitor going into yellow. I had disabled this previously with --alertcolors=red,purple in hobbitlaunch.cfg. Here's the snippet:
[hobbitd]
HEARTBEAT
ENVFILE /opt/bb/server/etc/hobbitserver.cfg
CMD hobbitd --restart=$BBTMP/hobbitd.chk --checkpoint-file=$BBTMP/hobbitd.chk --checkpoint-interval=600 --purple-conn=conn --log=$BBSERVERLOGS/hobbitd.log --admin-senders=127.0.0.1,$BBSERVERIP --alertcolors=red,purple
And here's the alert I just received:
im68:cpu yellow [-1]
yellow Mon Feb 14 13:13:56 PST 2005 up: 208 day(s), 1 users, 115 procs, load=529
What do you get when you run the following test as hobbit user?
cd ~hobbit/server
./bin/bbcmd --test <FQDN of im68> cpu
LOAD AVG on im68 is 529
Any ideas?
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"...it said: Install Windows XP or better...so I installed Solaris..."
list Asif Iqbal
On Mon, Feb 14, 2005 at 06:44:37PM, Asif Iqbal wrote:
What do you get when you run the following test as hobbit user?
cd ~hobbit/server
./bin/bbcmd --test <FQDN of im68> cpu
Oops, I meant:
./bin/bbcmd hobbitd_alert --test FQDN cpu
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"...it said: Install Windows XP or better...so I installed Solaris..."
list Bruce Lysik
What do you get when you run the following test as hobbit user? cd ~hobbit/server ./bin/bbcmd --test <FQDN of im68> cpu
-bash-2.05b$ ./bin/bbcmd --test im68.internal.shutterfly.com cpu
2005-02-14 15:48:51 Using default environment file /opt/bb/server/etc/hobbitserver.cfg
2005-02-14 15:48:51 execvp() failed: No such file or directory
Same with non-FQDN:
-bash-2.05b$ ./bin/bbcmd --test im68 cpu
2005-02-14 15:49:12 Using default environment file /opt/bb/server/etc/hobbitserver.cfg
2005-02-14 15:49:12 execvp() failed: No such file or directory
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Bruce Lysik
oops I meant ./bin/bbcmd hobbitd_alert --test FQDN cpu
Ah, that actually works. Prepare for spam:
-bash-2.05b$ ./bin/bbcmd hobbitd_alert --test im68 cpu
2005-02-14 15:53:56 Using default environment file /opt/bb/server/etc/hobbitserver.cfg
Matching host:service:page 'im68:cpu:' against rule line 68:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 76:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 84:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 92:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 100:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 110:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 117:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 124:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 131:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 139:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 147:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 155:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 163:Matched
*** Match with 'HOST=$HG-IMAGE' ***
Matching host:service:page 'im68:cpu:' against rule line 169:Failed (min. duration)
Matching host:service:page 'im68:cpu:' against rule line 171:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 176:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 184:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 192:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 200:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 208:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 213:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 218:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 223:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 231:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 239:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 247:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 255:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 263:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 271:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 279:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 284:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 289:Failed (hostname not in include list)
Matching host:service:page 'im68:cpu:' against rule line 297:Failed (hostname not in include list)
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Asif Iqbal
On Mon, Feb 14, 2005 at 03:55:46PM, Bruce Lysik wrote:
oops I meant ./bin/bbcmd hobbitd_alert --test FQDN cpu
Ah, that actually works. Prepare for spam:
-bash-2.05b$ ./bin/bbcmd hobbitd_alert --test im68 cpu
2005-02-14 15:53:56 Using default environment file /opt/bb/server/etc/hobbitserver.cfg
Matching host:service:page 'im68:cpu:' against rule line 163:Matched
*** Match with 'HOST=$HG-IMAGE' ***
Matching host:service:page 'im68:cpu:' against rule line 169:Failed (min. duration)
I don't see any rule with a RED alert. Try removing the DURATION parameter from this rule and show me the output again. You can also just post the relevant portion of the output of the following command to get a better diagnosis:
./bin/bbcmd hobbitd_alert --dump-config
Thanks
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
"...it said: Install Windows XP or better...so I installed Solaris..."
list Bruce Lysik
You can also just post the relevant portion of the following test's output to get a better diagnosis ./bin/bbcmd hobbitd_alert --dump-config
Sure. It's very simple, really:
HOST=<snip list of about 100 hosts>
SCRIPT /opt/bb/server/ext/email bruce_mail FORMAT=SCRIPT REPEAT=30 DURATION>6 RECOVERED
But the main issue is, I was under the impression a yellow alert would never trigger anything, because of how I've defined --alertcolors. I've just received another yellow alert. So something is up.
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
list Henrik Størner
On Mon, Feb 14, 2005 at 01:28:28PM -0800, Bruce Lysik wrote:
So I installed RC2 this morning. Later on, I noticed an alert email for a monitor going into yellow. I had disabled this previously with --alertcolors=red,purple in hobbitlaunch.cfg.
[config from another mail]
HOST=<snip list of about 100 hosts>
SCRIPT /opt/bb/server/ext/email bruce_mail FORMAT=SCRIPT REPEAT=30 DURATION>6 RECOVERED
And here's the alert I just received:
im68:cpu yellow [-1]
yellow Mon Feb 14 13:13:56 PST 2005 up: 208 day(s), 1 users, 115 procs, load=529
The alert you show here looks like a recovery-notice (the "-1" I assume is the acknowledgment cookie, and this value indicates that there is no active alert).
If you look in the ~/data/ack/notifications.log file for these notifications, you can tell if it's an alert message or a recovery message by the number of columns in the file. E.g. in my log I have
Wed Feb 16 13:08:43 2005 www.sslug.dk.smtp (130.228.2.150) user-ce4a2c883f75@xymon.invalid 1108555723 725
Wed Feb 16 13:09:44 2005 www.sslug.dk.smtp (130.228.2.150) user-ce4a2c883f75@xymon.invalid 1108555784 725 61
The first one is the alert message, the second is the recovery message. The recovery has an extra field "61", which is the duration of the event (in seconds).
Could you check the following in hobbitlaunch.cfg:
* The "hobbitd" command has "--alertcolors=red,purple --okcolors=green"
* The "hobbitd_alert" command has "--alertcolors=red,purple"
This setup should give you alerts when a status is red (or purple), and recovery notices only when they go green (after being red or purple).
Regards,
Henrik
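The column-count check described above can be sketched in Python. This is only an illustration: it assumes every line in notifications.log has exactly the whitespace-separated layout of the two example lines (10 fields for an alert, 11 for a recovery), which may not hold for other versions or for recipients containing spaces. The function name classify_notification is made up for this sketch.

```python
def classify_notification(line):
    """Classify one notifications.log line by its field count.

    Assumed layout (taken from the two example lines only):
    5 timestamp tokens, host.service, (ip), recipient, epoch time,
    one more number, and for recoveries a trailing duration in seconds.
    """
    fields = line.split()
    if len(fields) == 11 and fields[-1].isdigit():
        return "recovery", int(fields[-1])
    if len(fields) == 10:
        return "alert", None
    return "unknown", None


alert_line = ("Wed Feb 16 13:08:43 2005 www.sslug.dk.smtp (130.228.2.150) "
              "user-ce4a2c883f75@xymon.invalid 1108555723 725")
recovery_line = ("Wed Feb 16 13:09:44 2005 www.sslug.dk.smtp (130.228.2.150) "
                 "user-ce4a2c883f75@xymon.invalid 1108555784 725 61")

print(classify_notification(alert_line))
print(classify_notification(recovery_line))
```

Run over the whole log, this would show whether the yellow mails line up with recovery entries rather than new alerts.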
list Bruce Lysik
And here's the alert I just received: im68:cpu yellow [-1] yellow Mon Feb 14 13:13:56 PST 2005 up: 208 day(s), 1 users, 115 procs, load=529
The alert you show here looks like a recovery-notice (the "-1" I assume is the acknowledgment cookie, and this value indicates that there is no active alert).
Argh. That's bitten me before. I have to figure out how to get the actual word 'recovered' in there to make it easier to understand at 4am.
Could you check the following in hobbitlaunch.cfg:
* The "hobbitd" command has "--alertcolors=red,purple --okcolors=green"
* The "hobbitd_alert" command has "--alertcolors=red,purple"
This setup should give you alerts when a status is red (or purple), and recovery notices only when they go green (after being red or purple).
Hmm. I didn't have --okcolors=green, so I added that to the hobbitd command.
Just to clarify about the hobbitd_alert command, that's in the bbpage module, correct? I didn't have --alertcolors there, but I've just added it:
[bbpage]
ENVFILE /opt/bb/server/etc/hobbitserver.cfg
NEEDS hobbitd
CMD hobbitd_channel --channel=page --log=$BBSERVERLOGS/page.log hobbitd_alert --alertcolors=red,purple
Thanks for your assistance.
--
Bruce Z. Lysik <user-4e63a10f8934@xymon.invalid>
Operations Engineer
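For the wish above to see the word 'recovered' in the message, one possible approach is to have the SCRIPT recipient build its own subject line. The sketch below is hypothetical: it assumes the alert script receives environment variables named RECOVERED (set to "1" on recovery), BBHOSTSVC and BBCOLORLEVEL, as described in later Hobbit/Xymon alert documentation. Whether RC2 sets all of these is not confirmed in this thread, and the function name subject_line is invented for illustration.

```python
def subject_line(env):
    """Build a mail subject that states plainly whether this is a recovery.

    env is a mapping such as os.environ. The variable names used here
    (RECOVERED, BBHOSTSVC, BBCOLORLEVEL) are assumptions, not verified
    against RC2.
    """
    host_svc = env.get("BBHOSTSVC", "unknown")
    color = env.get("BBCOLORLEVEL", "unknown")
    prefix = "RECOVERED" if env.get("RECOVERED") == "1" else "ALERT"
    return f"{prefix}: {host_svc} is {color}"


# In a real alert script this would be called as subject_line(os.environ).
print(subject_line({"BBHOSTSVC": "im68.cpu", "BBCOLORLEVEL": "yellow", "RECOVERED": "1"}))
```

A subject starting with RECOVERED would make a mail like the yellow one in this thread much harder to misread at 4am.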
list Henrik Størner
On Wed, Feb 16, 2005 at 10:01:40AM -0800, Bruce Lysik wrote:
Could you check the following in hobbitlaunch.cfg:
* The "hobbitd" command has "--alertcolors=red,purple --okcolors=green"
* The "hobbitd_alert" command has "--alertcolors=red,purple"
This setup should give you alerts when a status is red (or purple), and recovery notices only when they go green (after being red or purple).
Hmm. I didn't have --okcolors=green, so I added that to the hobbitd command.
Just to clarify about the hobbitd_alert command, that's in the bbpage module, correct?
Yep.
I didn't have --alertcolors there, but I've just added it:
[bbpage]
ENVFILE /opt/bb/server/etc/hobbitserver.cfg
NEEDS hobbitd
CMD hobbitd_channel --channel=page --log=$BBSERVERLOGS/page.log hobbitd_alert --alertcolors=red,purple
Looks ok now.
Henrik
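The two hobbitlaunch.cfg checks discussed in this thread can be sketched as a small Python script. It is a rough illustration, not a real parser for the file: it assumes each CMD sits on a single line, and the function names commands and check_colors are invented here.

```python
def commands(text):
    """Yield the token list of every CMD line (assumes one CMD per line)."""
    for raw in text.splitlines():
        tokens = raw.split()
        if tokens and tokens[0] == "CMD":
            yield tokens[1:]


def check_colors(text):
    """Return a list of missing colour options, empty if all are present."""
    problems = []
    for tokens in commands(text):
        # The hobbitd daemon itself: needs both options.
        if tokens and tokens[0] == "hobbitd":
            for flag in ("--alertcolors=red,purple", "--okcolors=green"):
                if flag not in tokens:
                    problems.append(f"hobbitd: missing {flag}")
        # hobbitd_alert runs behind hobbitd_channel, so only look at
        # the arguments that follow the hobbitd_alert token.
        if "hobbitd_alert" in tokens:
            worker_args = tokens[tokens.index("hobbitd_alert") + 1:]
            if "--alertcolors=red,purple" not in worker_args:
                problems.append("hobbitd_alert: missing --alertcolors=red,purple")
    return problems


SAMPLE = """\
[hobbitd]
CMD hobbitd --alertcolors=red,purple
[bbpage]
CMD hobbitd_channel --channel=page hobbitd_alert
"""
print(check_colors(SAMPLE))
```

With the sample above, which mirrors the configuration before the fixes in this thread, both the missing --okcolors on hobbitd and the missing --alertcolors on hobbitd_alert are reported.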