hobbitd_alert: Servers on multiple pages & PAGE= rules
list S Aiello
Let's say I have the following bb-hosts file:
page servers
subpage Web Web Servers
1.2.3.4 Web01.domain.com #
1.2.3.5 Web02.domain.com #
Subpage Other Other Web Servers
0.0.0.0 Web02.domain.com #
And now I have the following hobbit-alerts.cfg:
PAGE=servers/Other SERVICE=conn
MAIL user-f21f09270637@xymon.invalid COLOR=red,yellow
Now I run the command "bin/bbcmd hobbitd_alert --test Web02.domain.com conn" and I see:
2007-03-30 15:09:38 Using default environment file ..../server/etc/hobbitserver.cfg
00000897 2007-03-30 15:09:38 send_alert Web02.domain.com:conn state Paging
00000897 2007-03-30 15:09:38 Matching host:service:page 'Web02.domain.com:conn:server/Web' against rule line 119
00000897 2007-03-30 15:09:38 Failed 'PAGE=servers/Other SERVICE=conn' (pagename not in include list)
So it seems that PAGE= alert rules only honor the first page a device is listed on. Is this by design?
list S Aiello
Searching the mailing list archives, I see a few others have experienced the same problem. So is there a recommended workaround, or is this on the todo list to be fixed? Any information would be most welcome. Thanks. ~Steve
On Friday 30 March 2007 16:30, user-ce96540ed38f@xymon.invalid wrote: [...]
list S Aiello
Since I have not seen a response, I will assume there presently isn't a known workaround. The only workaround I can come up with is to use macros in hobbit-alerts.cfg: define a macro for a group of hosts, then create an alert rule with HOST=$MacroName.
$GROUPA=%HOSTA|HOSTB|HOSTC|HOSTF
HOST=$GROUPA
This would be similar to a HostGroup (hg-) in BigBrother. I haven't tested this yet.
I was hoping that PAGE= would work for any page a device is listed on, and not just the top level one. That would make management of alert rules much easier, and it was one of the big features in Hobbit I was looking forward to. I still do not know whether PAGE= only working for the top level page listing is a bug or not.
I did see a past mailing list article (http://www.hobbitmon.com/hobbiton/2005/05/msg0t0211.html) and a patch to fix the hobbitd_alert --test option. I did not see that patch mentioned in the all-in-one patch, nor was I able to apply the patch; there was an error. I believe this issue is also bigger than just the failure occurring with the --test option: I do not see the alert rules being applied in the info page, nor do I receive alerts.
I tried looking into the code, but was unable to come up with an answer or fix myself. Henrik, or anyone else, if you have time to dig into this, I would appreciate it greatly. ~Steve
On Tuesday 10 April 2007 16:11, user-ce96540ed38f@xymon.invalid wrote: [...]
list John Glowacki
Here is an example of a workaround: call SCRIPT instead of MAIL and do the
page filtering in a custom script. That might give you some ideas for
other workarounds until the cause of the problem is found.
hobbit-alerts.cfg:
HOST=* RECOVERED NOTICE
SCRIPT /opt/hobbit/server/etc/alert.sh noc FORMAT=SCRIPT
alert.sh:
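#!/bin/sh
# Check whether the alerting host is listed on a page matching "^test".
# grep -F -x matches the whole line literally, so dots in the hostname
# are not treated as regex wildcards; -q suppresses the output.
if bb 127.0.0.1 "hobbitdboard test=$BBSVCNAME page=^test fields=hostname" |
   grep -Fxq "$BBHOSTNAME"
then
    # Host is on that page: log it and skip the alert.
    echo "Skip alert: $(date) P $BBHOSTNAME.$BBSVCNAME" >> /var/log/hobbit/skip_alert.log
    exit 0
fi
# send alert (mail,sms,etc)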
John
user-ce96540ed38f@xymon.invalid wrote: [...]
list S Aiello
Digging back into this issue. That workaround would only be practical if I had
a small number of devices and/or a small number of page-based rules. If I have
a lot of devices, the script would get called for every host. And with a lot of
these page-based alert rules, that would be a large number of scripts being
spawned for each alert.
But in desperation, and since I had a bit of time, I started to dig more into
this. Even the hobbitdboard command does not show the devices. When I have the
following:
Main Page:
|-Page1
| +Host1a
| +Host1b
| +Host1c
| +Host2c
|
|-Page2
+Host2a
+Host2b
+Host1a
+Host2c
Now if I use the command, 'bin/bb 127.0.0.1 "hobbitdboard page=Page2 test=info
fields=hostname"', I will get the results of, "Host2a, Host2b, Host2c".
Host1a will not be shown.
So it seems that alerts, hobbitdboard, and the info page that shows what alert
rules the device matches to are affected. The info page section,
Page/subpage, does not appear to be affected.
I am using my bb-hosts file from BigBrother. It uses the page & subpage tags; I
do not use any subparent tags. I am using Hobbit 4.2.0 with the 02/09/2007
all-in-one patch.
Is there any other information I can collect to help resolve this problem?
~Steve
On Tuesday 17 April 2007 16:12, John Glowacki wrote: [...]
list Henrik Størner
On Thu, May 31, 2007 at 12:21:20PM -0400, user-ce96540ed38f@xymon.invalid wrote:
Digging back into this issue [...]
The root of the problem is that as far as 99% of Hobbit is concerned, a
host lives on one page only: The one it gets from the "page", "subpage"
and "subparent" tags in bb-hosts. If a host is listed twice (or more) in
bb-hosts then it is assigned one of those as the "preferred" definition,
either by explicitly having the "prefer" keyword listed on one entry,
or by virtue of having one of the entries with an IP and the others
listed as "0.0.0.0" and "noconn".
The only exception is that "bbgen" which builds the webpages can place a
host in multiple locations on the webpages. All the other tools just
ignore that.
So the workaround for your scenario would be to define your bb-hosts
file as
page servers
subpage Web Web Servers
1.2.3.4 Web01.domain.com #
0.0.0.0 Web02.domain.com # noconn
Subpage Other Other Web Servers
1.2.3.5 Web02.domain.com # prefer
Then, as far as Hobbit is concerned, the Web02 host resides on the
"servers/Other" page.
It would be nice to have hosts internally represented as residing on a
list of pages rather than just a single page. But it's a complexity that
so far I haven't found worth adding.
Regards,
Henrik
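The selection rule Henrik describes can be sketched as a small shell function. This is only one reading of his explanation, not code taken from Hobbit: the function name effective_pages is hypothetical, "subparent" tags are not handled, keywords are matched case-insensitively only because the thread's example writes both "subpage" and "Subpage", and the tie-break between two equally ranked entries (first one listed wins) is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hobbit: read a bb-hosts style file on
# stdin and print "hostname page" for the entry that would be preferred
# under the rule described above (explicit "prefer" beats a real IP,
# which beats a 0.0.0.0 placeholder).
effective_pages() {
    awk '
    # page/subpage lines set the current page path.
    tolower($1) == "page"    { top = $2; cur = $2; next }
    tolower($1) == "subpage" { cur = top "/" $2; next }

    # Host lines start with an IPv4 address.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
        host = $2
        # rank: 2 = explicit "prefer", 1 = real IP, 0 = 0.0.0.0 placeholder
        rank = ($1 == "0.0.0.0") ? 0 : 1
        for (i = 3; i <= NF; i++) if ($i == "prefer") rank = 2

        # Assumption: on equal rank the first entry listed is kept.
        if (!(host in best) || rank > best[host]) {
            if (!(host in best)) order[++n] = host
            best[host] = rank
            page[host] = cur
        }
    }

    END { for (i = 1; i <= n; i++) print order[i], page[order[i]] }
    '
}
```

Running "effective_pages < bb-hosts" on the original file from this thread reports Web02.domain.com under servers/Web, and on the rearranged file above it reports servers/Other, which is the page a PAGE= rule would then have to name.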
list S Aiello
If I understand you correctly, a host can really only live on one page. 'Aliases' of the host can show up on other pages, but most of Hobbit does not see the aliases.
That workaround doesn't work for me. I was hoping to set up my alert rules to be page based, so that if a host is displayed on 1, 2, or 3 pages, the appropriate group or groups would be alerted. Though if this actually did work, and I then started using different group-only options on the different pages, I would only want the reports that are displayed on that particular page to match the PAGE= alert rule. So I can see that it would be a can of worms.
So the only valid solution I can see is to use macros to create groups of hosts, and use those macro groups in my alert rules. Then when I add or remove a device from a page, I will also need to add or remove it from my macro groups. Is there a limit on the number of hosts that can be defined in a macro? Though using a macro would be somewhat ugly:
$GroupA=(HostA|HostB|HostC|HostD)
My problem is that I have multiple groups that want to be alerted, and a good number of the devices are shared between groups.
Thank you for your prompt response, I appreciate it. ~Steve
On Friday 01 June 2007 16:00, Henrik Stoerner wrote: [...]
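To avoid maintaining the macro groups by hand, they could be generated from bb-hosts, with one macro per page listing every host that appears on it, including the 0.0.0.0 duplicates. The following is a minimal, untested-against-Hobbit sketch: the function name page_macros and the PAGE_ macro naming are hypothetical, the "$NAME=%..." form is borrowed from the macro quoted earlier in this thread, the anchored "^(...)$" pattern is an assumption, dots in hostnames are left unescaped, and "subparent" tags are not handled.

```shell
#!/bin/sh
# Hypothetical helper, not part of Hobbit: read a bb-hosts style file on
# stdin and print one macro definition per page, containing every host
# listed on that page (preferred entry or not).
page_macros() {
    awk '
    # page/subpage lines set the current page path (case-insensitive
    # here because the example in this thread uses both spellings).
    tolower($1) == "page"    { top = $2; cur = $2; next }
    tolower($1) == "subpage" { cur = top "/" $2; next }

    # Host lines start with an IPv4 address; hosts listed before the
    # first page line are skipped.
    $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && cur != "" {
        if (!(cur in hosts)) { order[++n] = cur; hosts[cur] = $2 }
        else hosts[cur] = hosts[cur] "|" $2
    }

    END {
        for (i = 1; i <= n; i++) {
            name = order[i]
            gsub(/[^A-Za-z0-9]/, "_", name)   # servers/Other -> servers_Other
            printf "$PAGE_%s=%%^(%s)$\n", name, hosts[order[i]]
        }
    }
    '
}
```

With the output of "page_macros < bb-hosts" pasted into hobbit-alerts.cfg, a rule such as "HOST=$PAGE_servers_Other SERVICE=conn" would stand in for "PAGE=servers/Other SERVICE=conn". The macros still have to be regenerated whenever bb-hosts changes.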