Xymon Mailing List Archive

Paging & Notification Not Working

James Wade
Thu, 12 Jul 2007 09:58:08 -0500
Message-Id: <03d101c7c495$2517b3b0$user-1799f697e832@xymon.invalid>

Thanks Jason, I did a test of my config, and it looks fine.

This morning at about 7:45 an Oracle partition started filling
up. It should have paged people via a pager and email.

The DB group received an email, but not a page; the Unix group
received no email and no page.

The notification log confirms that only the DB group was sent
a notification.

Here's part of the test output:

00009958 2007-07-12 09:35:47 send_alert db101:disk state Paging
00009958 2007-07-12 09:35:47 Matching host:service:page 'db101:disk:Chicago' against rule line 121
00009958 2007-07-12 09:35:47 *** Match with 'HOST=*' ***
00009958 2007-07-12 09:35:47 Matching host:service:page 'db101:disk:Chicago' against rule line 122
00009958 2007-07-12 09:35:47 *** Match with 'MAIL user-8c39f9246549@xymon.invalid REPEAT=15 COLOR=RED DURATION<30 RECOVERED' ***
00009958 2007-07-12 09:35:47 Mail alert with command 'mailx -s "BB [12345] db101:disk CRITICAL (RED)" user-8c39f9246549@xymon.invalid'
00009958 2007-07-12 09:35:47 Matching host:service:page 'db101:disk:Chicago' against rule line 132
00009958 2007-07-12 09:35:47 *** Match with 'MAIL user-5c6e60637778@xymon.invalid REPEAT=15 COLOR=RED DURATION<30 RECOVERED' ***

I never received a page today.

Notification log:

Wed Jul 11 20:54:24 2007 system005.cpu (192.168.10.3) user-8c39f9246549@xymon.invalid [125] 1184205263 200
Thu Jul 12 07:29:08 2007 db101.disk (192.168.10.10) user-644d591309b5@xymon.invalid[129] 1184243348 100
Thu Jul 12 07:31:12 2007 db101.disk (192.168.10.10) user-644d591309b5@xymon.invalid[129] 1184243472 100

The notification log shows that only the DB group was notified;
nothing was sent to me, for example. Yet as the test output above
shows, I should have matched and been sent both a page and an email.
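
To compare what was actually sent against what the test says should match, the notification log can be summarized per recipient. The following is only a rough sketch in Python, not part of Hobbit. It assumes each entry has the layout seen above (date, host.test, IP in parentheses, recipient, a bracketed number that appears to be the matching rule line, an epoch timestamp, and a numeric code), and that the archive may have wrapped entries across lines. The function names and the SAMPLE list are made up for illustration.

```python
import re

# Start of an entry, e.g. "Thu Jul 12 07:29:08 2007 " (assumed date layout).
DATE = r"\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}"

# Assumed layout of one complete entry after wrapped lines are re-joined:
#   <date> <host>.<test> (<ip>) <recipient>[<rule line>] <epoch> <code>
ENTRY = re.compile(
    r"^(?P<when>" + DATE + r") "
    r"(?P<host>\S+)\.(?P<test>[^.\s]+) \((?P<ip>[^)]+)\) "
    r"(?P<recipient>\S+?)\s*\[(?P<rule>\d+)\] (?P<epoch>\d+) (?P<code>\d+)$"
)


def join_wrapped(lines):
    """Re-join entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if re.match("^" + DATE + " ", line) or not entries:
            entries.append(line)          # a new entry starts with a date
        else:
            entries[-1] += " " + line     # continuation of the previous entry
    return entries


def last_notification_per_recipient(lines):
    """Return {recipient: details of the most recent notification}."""
    latest = {}
    for entry in join_wrapped(lines):
        m = ENTRY.match(entry)
        if m is None:
            continue  # skip anything that does not fit the assumed layout
        recipient = m.group("recipient")
        epoch = int(m.group("epoch"))
        if recipient not in latest or epoch > latest[recipient]["epoch"]:
            latest[recipient] = {
                "when": m.group("when"),
                "host": m.group("host"),
                "test": m.group("test"),
                "rule": int(m.group("rule")),
                "epoch": epoch,
            }
    return latest


# Entries copied from the notification log quoted in this message.
SAMPLE = [
    "Wed Jul 11 20:54:24 2007 system005.cpu (192.168.10.3)",
    "user-8c39f9246549@xymon.invalid [125] 1184205263 200",
    "Thu Jul 12 07:29:08 2007 db101.disk (192.168.10.10)",
    "user-644d591309b5@xymon.invalid[129] 1184243348 100",
    "Thu Jul 12 07:31:12 2007 db101.disk (192.168.10.10)",
    "user-644d591309b5@xymon.invalid[129] 1184243472 100",
]

result = last_notification_per_recipient(SAMPLE)
for recipient, info in sorted(result.items()):
    print(recipient, info["when"], info["host"] + "." + info["test"], info["rule"])
```

Any recipient that the test output lists but that is missing from this summary (or whose last entry is older than the event) was not notified for that event.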

James

-----Original Message-----
From: Jones, Jason (Altrincham) [mailto:user-ee957b46acd2@xymon.invalid] 
Sent: Thursday, July 12, 2007 9:30 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] Paging & Notification Not Working

Are you 100% sure that your hobbit-alerts file is configured correctly?

You can test your config with:

$BBHOME/bin/bbcmd hobbitd_alert --test <host> <test>

This should give you a list of everyone who should be notified.

Jason.

-----Original Message-----
From: James Wade [mailto:user-659655b2ea05@xymon.invalid] 
Sent: 12 July 2007 15:24
To: user-ae9b8668bcde@xymon.invalid
Subject: FW: [hobbit] Paging & Notification Not Working

Hi,

I never received any answers on this one.
I had another problem today and no one was paged.

How can I manually test paging?

Thanks...James

Hello,

I've noticed a random problem in Hobbit that
I can't seem to track down. I thought perhaps
other folks had seen the same problem.

Occasionally, paging and notification seem to stop
for no reason, or not all of the groups in my file
get notified. The only way we know it has happened
is that we stop receiving notifications, i.e.
something goes wrong and nobody is notified.

As an example, a database server last night filled 
up a /u? partition 100%. We received no email notification
of the event.

The notifications.log indicates that messages stopped
around 9:21 p.m., i.e. this was the last message
at the end of the notifications log:

Mon Jul  9 21:18:39 2007 server1.disk (192.168.10.2) user-0a82b7f6ef66@xymon.invalid [129] 1184033919 100

The one above was the last message received. The DB group
was notified of the problem, but the unix group was not.

Also, we had other problems, minor, with a couple other systems
we should have gotten notices from, but all paging seemed
to halt after the last message above.

Has anyone seen this? Any suggestions for tracking it down?
I've fixed it in the past by restarting Hobbit, after which
notification and paging seem to start working again.

Thanks...James