Xymon Mailing List Archive search

hobbitd_alert: Servers on multiple pages & PAGE= rules

7 messages in this thread

list S Aiello · Fri, 30 Mar 2007 15:30:17 -0500 ·
Let's say I have the following bb-hosts file:
page servers
subpage Web Web Servers
1.2.3.4		Web01.domain.com			#
1.2.3.5		Web02.domain.com			#
Subpage Other Other Web Servers
0.0.0.0		Web02.domain.com			#

And now I have the following hobbit-alerts.cfg:
PAGE=servers/Other SERVICE=conn
	MAIL user-f21f09270637@xymon.invalid COLOR=red,yellow

When I run the command "bin/bbcmd hobbitd_alert --test Web02.domain.com conn", I see:
2007-03-30 15:09:38 Using default environment file ..../server/etc/hobbitserver.cfg
00000897 2007-03-30 15:09:38 send_alert Web02.domain.com:conn state Paging
00000897 2007-03-30 15:09:38 Matching host:service:page 'Web02.domain.com:conn:server/Web' against rule line 119
00000897 2007-03-30 15:09:38 Failed 'PAGE=servers/Other SERVICE=conn' (pagename not in include list)

So it seems that PAGE= alert rules only honor the first page a device is listed on. Is this by design?
list S Aiello · Tue, 10 Apr 2007 16:11:42 -0400 ·
Searching the mailing list archives, I see a few others have experienced the 
same problem. So is there a recommended workaround, or is this on the todo 
list to be fixed?

Any information would be most welcomed, Thanks.
 ~Steve
list S Aiello · Tue, 17 Apr 2007 10:28:08 -0400 ·
Since I have not seen a response, I will assume there presently isn't a known workaround. The only workaround I can come up with is the use of macros in hobbit-alerts.cfg: use macros to define a group of hosts, and then create an alert rule with HOST=$MacroName.
$GROUPA=%HOSTA|HOSTB|HOSTC|HOSTF
HOST=$GROUPA

This would be similar to HostGroup (hg-) in BigBrother. I haven't tested this yet. I was hoping that PAGE= would work for any page a device is listed on and not just the top-level one. That would make management of alert rules much easier, and it was one of the big features in Hobbit I was looking forward to.

I still do not know whether PAGE= matching only the top-level page listing is a bug or not. I did see a past mailing list article (http://www.hobbitmon.com/hobbiton/2005/05/msg0t0211.html) and a patch to fix the hobbitd_alert --test option. I did not see that patch mentioned in the all-in-one patch, nor was I able to apply it; the patch failed with an error.

I believe this issue is also bigger than just the failure occurring with the --test option. I do not see the alert rules being applied on the info page, nor do I receive alerts. I tried looking into the code, but was unable to come up with an answer or fix myself.

Henrik, or anyone else, if you have time to dig into this, I would appreciate it greatly.

 ~Steve
list John Glowacki · Tue, 17 Apr 2007 16:12:34 -0400 ·
Here is an example of a workaround: call SCRIPT instead of MAIL and do the
page filtering in a custom script. That might give you some ideas for
other workarounds until the cause of the problem is found.

hobbit-alerts.cfg:
HOST=* RECOVERED NOTICE
     SCRIPT /opt/hobbit/server/etc/alert.sh noc FORMAT=SCRIPT

alert.sh:
#!/bin/sh
# Skip the alert if the host is on a page whose name starts with "test"
# (grep -F -x: exact, literal hostname match, so dots are not wildcards)
if bb 127.0.0.1 "hobbitdboard test=$BBSVCNAME page=^test fields=hostname" |
   grep -qFx "$BBHOSTNAME"
then
    echo "Skip alert: $(date) P $BBHOSTNAME.$BBSVCNAME" >> /var/log/hobbit/skip_alert.log
    exit 0
fi
# send alert (mail,sms,etc)

John
list S Aiello · Thu, 31 May 2007 12:21:20 -0400 ·
Digging back into this issue: that workaround would only be practical if I had 
a small number of devices and/or a small number of page-based rules. If I have 
a lot of devices, the script would get called for every host. And with a lot 
of these page-based alert rules, a large number of scripts would be spawned 
for each alert.

Out of desperation, and since I had a bit of time, I dug further into this. 
Even the hobbitdboard command does not show the devices. Suppose I have the 
following:

Main Page:
 |-Page1
 |  +Host1a
 |  +Host1b
 |  +Host1c
 |  +Host2c
 |
 |-Page2
    +Host2a
    +Host2b
    +Host1a
    +Host2c

If I then run 'bin/bb 127.0.0.1 "hobbitdboard page=Page2 test=info 
fields=hostname"', I get "Host2a, Host2b, Host2c". 
Host1a is not shown.

So it seems that alerts, hobbitdboard, and the info page section that shows 
which alert rules a device matches are all affected. The Page/subpage section 
of the info page does not appear to be affected.
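Since hobbitdboard appears to report only one page per host, one possible variation on John's SCRIPT workaround is to look the pages up in bb-hosts itself. Below is a minimal, hypothetical sketch (the function name host_pages is made up for illustration) that prints every page path ("page" or "page/subpage") under which a host is listed. It assumes host lines start with a dotted-quad IP address followed by the hostname, and it handles only the page and subpage tags used in this setup; subparent and other bb-hosts directives are ignored. Keywords are matched case-insensitively because the example file uses both "subpage" and "Subpage".

```shell
#!/bin/sh
# Hypothetical helper: list every page path where a host appears in bb-hosts.
# Usage: host_pages HOSTNAME BB_HOSTS_FILE
# Assumption: only "page" and "subpage" tags matter (no "subparent").
host_pages() {
    awk -v host="$1" '
        { kw = tolower($1) }                          # tag keyword, any case
        kw == "page"    { top = $2; path = top; next }
        kw == "subpage" { path = top "/" $2; next }
        # Host line: dotted-quad IP, then the hostname we are looking for.
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $2 == host && !(path in seen) {
            seen[path] = 1                            # print each path only once
            print path
        }
    ' "$2"
}
```

For the bb-hosts file at the top of this thread, host_pages Web02.domain.com prints both servers/Web and servers/Other. In an alert script it could stand in for the hobbitdboard check, e.g. `host_pages "$BBHOSTNAME" "$BBHOSTS" | grep -qx 'servers/Other'`; that $BBHOSTS is set to your bb-hosts path in the alert script's environment is an assumption to verify against your hobbitserver.cfg.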

I am using my bb-hosts file from BigBrother. It uses the page and subpage tags; 
I do not use any subparent tags. I am using Hobbit 4.2.0 with the 02/09/2007 
all-in-one patch.

Is there any other information I can collect to help resolve this problem?
 ~Steve
list Henrik Størner · Fri, 1 Jun 2007 22:00:51 +0200 ·
On Thu, May 31, 2007 at 12:21:20PM -0400, user-ce96540ed38f@xymon.invalid wrote:
Digging back into this issue [...]
The root of the problem is that as far as 99% of Hobbit is concerned, a
host lives on one page only: The one it gets from the "page", "subpage"
and "subparent" tags in bb-hosts. If a host is listed twice (or more) in
bb-hosts then it is assigned one of those as the "preferred" definition,
either by explicitly having the "prefer" keyword listed on one entry, 
or by virtue of having one of the entries with an IP and the others
listed as "0.0.0.0" and "noconn".
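The preference rule described here can be checked mechanically before relying on PAGE= rules. The sketch below is hypothetical and only approximates the rule as stated in this message; it is not Hobbit's actual code. For each host listed more than once, an entry carrying the "prefer" tag wins; failing that, the single entry with a real (non-0.0.0.0) IP wins; anything else is reported as AMBIGUOUS. Only page and subpage tags are handled, and the function name preferred_pages is made up for illustration.

```shell
#!/bin/sh
# Hypothetical lint helper: for hosts listed more than once in bb-hosts,
# print "HOST PAGEPATH" for the entry that should be preferred, or
# "HOST AMBIGUOUS" if neither rule selects exactly one entry.
# Heuristic based on the rule described in this thread, NOT Hobbit's code.
preferred_pages() {
    awk '
        { kw = tolower($1) }
        kw == "page"    { top = $2; path = top; next }
        kw == "subpage" { path = top "/" $2; next }
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            h = $2; n[h]++
            # Remember first-seen order so the output is deterministic.
            if (!(h in order)) { order[h] = ++nhosts; names[nhosts] = h }
            # Rule 1: an entry with the "prefer" tag wins.
            for (i = 3; i <= NF; i++) if ($i == "prefer") { np[h]++; pp[h] = path }
            # Rule 2: otherwise the only entry with a real IP address wins.
            if ($1 != "0.0.0.0") { nr[h]++; rp[h] = path }
        }
        END {
            for (k = 1; k <= nhosts; k++) {
                h = names[k]
                if (n[h] < 2) continue                # listed once: nothing to decide
                if (np[h] == 1) print h, pp[h]
                else if (np[h] == 0 && nr[h] == 1) print h, rp[h]
                else print h, "AMBIGUOUS"
            }
        }
    ' "$1"
}
```

Run against the original bb-hosts file at the start of the thread, this reports Web02.domain.com on the Web subpage, which agrees with the page shown in the --test output; run against the corrected layout below, it reports servers/Other.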

The only exception is that "bbgen" which builds the webpages can place a
host in multiple locations on the webpages. All the other tools just
ignore that.

So the workaround for your scenario would be to define your bb-hosts
file as 

    page servers
    subpage Web Web Servers
    1.2.3.4             Web01.domain.com   #
    0.0.0.0             Web02.domain.com   # noconn
    Subpage Other Other Web Servers
    1.2.3.5             Web02.domain.com   # prefer

Then, as far as Hobbit is concerned, the Web02 host resides on the
"servers/Other" page.

It would be nice to have hosts internally represented as residing on a
list of pages rather than just a single page. But it's a complexity that
so far I haven't found it worth adding.


Regards,
Henrik
list S Aiello · Fri, 1 Jun 2007 17:04:44 -0400 ·
If I understand you correctly, a host can really only live on one page. 'Aliases' of the host can show up on other pages, but most of Hobbit just does not see the aliases.

That workaround doesn't work for me. I was hoping to set up my alert rules to be page based, so that if a host is displayed on 1, 2, or 3 pages, the appropriate group or groups would be alerted.
Though if this actually did work, and I then started using different group-only options on the different pages, I would only want the reports displayed on that particular page to match the PAGE= alert rule. So I can see that it would be a can of worms.

So the only valid solution I can see is to use macros to create groups of hosts, and use those macro groups in my alert rules. Then, when I add or remove a device from a page, I will also need to add or remove it from my macro groups. Is there a limit on the number of hosts that can be defined in a macro? Using a macro would be somewhat ugly, though:
$GroupA=(HostA|HostB|HostC|HostD)

My problem is that I have multiple groups that want to be alerted, and a good number of the devices are shared between groups.
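To cut down on keeping macro groups in step with the pages by hand, the macro lines could be generated from bb-hosts. Below is a minimal, hypothetical sketch (page_macro is an illustrative name, not a Hobbit tool) that collects the hosts listed under one page path and prints a macro in the `$NAME=%(host1|host2)` form used in this thread, writing dots as `[.]` so they match literally. The generated regex is not anchored, so it may also match longer hostnames that contain one of these names. Only page and subpage tags are handled; the output would still have to be pasted into hobbit-alerts.cfg.

```shell
#!/bin/sh
# Hypothetical generator: print a hobbit-alerts.cfg macro listing every host
# found under PAGEPATH ("page" or "page/subpage") in a bb-hosts file.
# Usage: page_macro MACRONAME PAGEPATH BB_HOSTS_FILE
# Assumptions: host lines start with a dotted-quad IP; no "subparent" tags.
page_macro() {
    awk -v name="$1" -v want="$2" '
        { kw = tolower($1) }
        kw == "page"    { top = $2; path = top; next }
        kw == "subpage" { path = top "/" $2; next }
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && path == want {
            host = $2
            gsub(/\./, "[.]", host)                   # make dots literal in the regex
            if (!(host in seen)) {                    # skip duplicate listings
                seen[host] = 1
                list = list (list == "" ? "" : "|") host
            }
        }
        # %% prints a literal "%", the regex marker in hobbit-alerts.cfg.
        END { if (list != "") printf "$%s=%%(%s)\n", name, list }
    ' "$3"
}
```

With the bb-hosts file from the first message, `page_macro WEB servers/Web bb-hosts` prints `$WEB=%(Web01[.]domain[.]com|Web02[.]domain[.]com)` and `page_macro OTHER servers/Other bb-hosts` prints `$OTHER=%(Web02[.]domain[.]com)`, so rules written as HOST=$WEB or HOST=$OTHER would cover Web02 on both pages. Re-running the generator after editing bb-hosts keeps the groups in sync.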

Thank you for your prompt response, I appreciate it.
 ~Steve