Xymon Mailing List Archive

Hobbit 4.0-RC1 available

14 messages in this thread

list Henrik Størner · Thu, 3 Feb 2005 00:09:01 +0100 ·
The first release-candidate of Hobbit 4.0 was uploaded to
hobbitmon.sourceforge.net a few minutes ago.

This fixes several serious bugs, especially in the alert handling
module where use of the DURATION specification in beta6 would delay an
alert for 24 hours, and the handling of macros was very broken.

A number of platform compatibility issues have been resolved,
especially on NetBSD and AIX.

The data collected for vmstat- and netstat-reports has changed, and
this unfortunately means that the vmstat- and netstat-RRD files in RC1
are incompatible with previous versions, and also with files generated
by LARRD. You must delete files in the old format, or collection of
these data will fail. The larrd-data.log and larrd-status.log files
will contain error-messages if this happens.
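
Before deleting anything, it may help to list the candidate files first. The following is a minimal sketch, not part of Hobbit: the function name list_old_rrds is hypothetical, and it assumes the RRD files live in one subdirectory per host under a single top-level directory that you pass in. It lists every vmstat.rrd it finds, although only those from Linux-based systems need to go, so review the list before removing anything.

```shell
#!/bin/sh
# Hypothetical helper: print the RRD files affected by the RC1 format change.
# $1 = top-level RRD directory (assumed layout: one subdirectory per host)
list_old_rrds() {
    find "$1" -type f \( -name 'netstat.rrd' -o -name 'vmstat.rrd' \) -print
}

# Example (the path is an assumption; adjust it to your installation):
#   list_old_rrds "$HOME/data/rrd"
```

The function only prints file names; deleting them is left as a separate, deliberate step.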

Some joker disabled all of the tests on my demonstration site. So
I decided to move the "Enable/Disable" and "Acknowledge" CGI scripts
to a secured area requiring a login. This is now part of Hobbit and
can be enabled during configuration.

There are also several other minor bug fixes and improvements; the full list is below.

Thanks to everyone testing Hobbit - there are now more than 100 people
on the mailing list, which is much more than I had expected. I hope
you'll try out this version, so any remaining bugs can be squashed.


Regards,
Henrik

Changes from beta-6 -> RC-1

* NOTE: The netstat RRD file layout has changed.  You must delete all 
  "netstat.rrd" files, or your data collection will fail. Sorry, but the old 
  LARRD format cannot cope with reality, where some systems report packet 
  counts, some report byte counts, and some have both.

* NOTE: The vmstat RRD file layout has changed, for data collected
  from Linux-based systems. You must delete all such "vmstat.rrd"
  files, or the data collection will fail. There were several 
  incompatible formats of linux vmstat RRD files - now they all
  use the same format.

* Hobbit should now work on AIX, and possibly other platforms
  that enforce strict X/Open XPG4 semantics for the ftok() 
  routine. If you were seeing messages like "Could not generate 
  shmem key..." or "Cannot setup status channel", then these
  should now be fixed. Thanks to Chris Morris for being patient 
  with me while I struggled with the finer details of SVR4 IPC.

* A new "--test" option was implemented for hobbitd_alert. Running
  "hobbitd_alert --test HOSTNAME SERVICENAME" will print out the
  rules that match the HOSTNAME/SERVICENAME combination, so you 
  can see what alerts are triggered. (Note: man-page is not updated
  with this info yet).

* Macros in hobbit-alerts.cfg were broken. Completely.

* Specifying time-values in hobbit-alerts.cfg as "10m" was 
  interpreted as "10 months" instead of "10 minutes". Since
  minutes is much more likely to be useful, the support for
  "months" and "years" was dropped, and "10m" now means minutes.

* Specifying any kind of duration meant that the alert would not
  happen until 24 hours had passed (and the alert still was active).
 
* Two new keywords in the hobbit-alerts.cfg file: 
  STOP on a recipient means that Hobbit stops looking for more 
  recipients after the current one has matched. 
  UNMATCHED on a recipient means that this recipient only 
  gets an alert if no other recipients got an alert (for
  setting up a default catch-all rule).

* A simple syntax for "all hosts" is now "HOST=*" in hobbit-alerts.cfg

* Configuring the Hobbit URLs as a root URL - "http://host/" -
  now works, and yields correct links to the menu, the GIF-files,
  and the documentation.

* The Enable/Disable script (maint.pl) now picks up the menu
  correctly.

* The "configure" script now allows you to select a second
  CGI directory for administration scripts (the ack-script
  and enable/disable script). This second directory is 
  access-controlled; the default Apache configuration was
  updated to show how.

* The ~/server/starthobbit.sh symbolic link was not being
  re-generated if the file already existed (on some platforms).

* The hobbit-tips.html file is now generated during installation,
  so it has correct links to the icons. Previously it assumed 
  that the icon-files had been copied to the www/help/ directory.

* vmstat data for the "I/O wait cpu time" on Linux is now being
  collected. The Linux RRD data format is now identical across the
  various vmstat formats that are in use; this means any existing
  vmstat.rrd files from Linux systems must be deleted.

* Support for NetBSD 2.0 in vmstat and netstat RRD handler.

* HTTP authentication strings are now URL un-escaped, so you can
  use any byte-value in the username and/or password.

* The info-column now includes the current address assigned to 
  a DHCP host.

* A new "noinfo" tag can be used to suppress the "info" column on
  selected hosts.

* bbhostgrep no longer segfaults if you do not use the BBLOCATION
  environment setting. This kept some extension scripts from working.

* If every single network test failed, bbtest-net could loop 
  indefinitely. Seen on NetBSD.

* If the number of available file-descriptors (ulimit -n) was
  exceeded, bbtest-net would loop forever. Seen on NetBSD.

* When running http-tests via a proxy that requires authentication,
  the authentication info could not contain a ":" or a "@". The
  authentication info is now un-escaped using the normal URL
  decoding routine, so these characters can be entered as %XX 
  escaped data (e.g. to put a "@" in the password, use "%40").

* LDAP tests now try to select LDAPv3 first, and fall back to
  LDAPv2 if this is not possible. Some newer LDAP servers 
  refuse to talk v2 protocol.

* Memory debugging was accidentally enabled in beta6 - it is now
  disabled.
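
To illustrate the %XX escaping mentioned in the proxy authentication item above, here is a minimal sketch of a percent-encoder in POSIX shell. It is not part of Hobbit: the function name urlencode is hypothetical, and it is only meant for plain ASCII input such as a typical password.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode a string so that characters such as
# "@" and ":" can be placed in the authentication part of a URL.
# ASCII input is assumed; multi-byte characters are not handled.
urlencode() {
    input=$1
    out=
    while [ -n "$input" ]; do
        rest=${input#?}          # everything after the first character
        c=${input%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        input=$rest
    done
    printf '%s\n' "$out"
}

# Example: urlencode 'p@ss:word' prints p%40ss%3Aword
```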
list Andy France · Thu, 3 Feb 2005 17:24:42 +1300 ·

Henrik Stoerner wrote on 03/02/2005 12:09:01:
quoted from Henrik Størner
The first release-candidate of Hobbit 4.0 was uploaded to
hobbitmon.sourceforge.net a few minutes ago.
I've just updated from beta6 to RC1, and all my conn tests now fail :-(

A simple fping command line test seems to work, but a sample alert says...


red <!-- [flags:ordAstILe] --> Thu Feb  3 16:53:17 2005 conn NOT ok

Service conn on zpnz-mtm-ctrx05 is not OK : Host does not respond to ping


System unreachable for 1 poll periods (0 seconds)


I'm not sure what the flags mean.  If they are for fping, some of them
don't appear in the man page for my version.

I'm running on Solaris 9 x86 (SunOS zpnz-mtm-bb01 5.9 Generic_117172-05
i86pc i386 i86pc), with the libraries and tools installed via blastwave
packages in /opt/csw.  Everything was fine under beta6.

Any ideas?  The only log which is being updated is hobbitlaunch.log which
doesn't tell me much.

TIA,
Andy.


#####################################################################################

This email is intended for the person to whom it is addressed
only. If you are not the intended recipient, do not read, copy
or use the contents in any way. The opinions expressed may not
necessarily reflect those of ZESPRI Group of Companies ('ZESPRI').

While every effort has been made to verify the information
contained herein, ZESPRI does not make any representations 
as to the accuracy of the information or to the performance
of any data, information or the products mentioned herein.
ZESPRI will not accept liability for any losses, damage or
consequence, however, resulting directly or indirectly from
the use of this e-mail/attachments.
#####################################################################################
list Tom Georgoulias · Thu, 03 Feb 2005 13:08:12 -0500 ·
quoted from Henrik Størner
Henrik Stoerner wrote:
The first release-candidate of Hobbit 4.0 was uploaded to
hobbitmon.sourceforge.net a few minutes ago.
Thanks to everyone testing Hobbit - there are now more than 100 people
on the mailing list, which is much more than I had expected. I hope
you'll try out this version, so any remaining bugs can be squashed.
So far so good.  Started testing it out first thing this morning and all 
of the bugs that I was aware of have been squashed; most notably, the 
DURATION variable now works in alerts.

Tom
list Rick Waegner · Thu, 03 Feb 2005 12:22:29 -0600 ·
Oh yes indeed! Excellent! I even have my BEA column now! One quick
question though: the bea column consists of only a BEA heap utilization
graph. Are other BEA stats available, like the multiple Apache graphs
from "LARRD:*,apache:apache1|apache2|apache3"? I have the two
enterprise OID strings in Henrik's bea-snmpstats.sh ext file, but only
one graph (heap) is displayed, and it's empty! I've done snmpwalks to
the servers (for both OIDs) and get VAST amounts of data.


Rick
list Rick Waegner · Thu, 03 Feb 2005 12:36:18 -0600 ·
Spoke too soon. Now that a few polling periods have passed, the bea page
shows nothing but truncated snmpwalk output from the BEA-WEBLOG-MIB, and
the graphs have disappeared. Odd.
In the bea.log file, "/home/hobbit/server/ext/bea-snmpstats.sh: line 27:
/home/hobbit/server/bin/bb: Argument list too long" is being logged at
each polling time. Any ideas where I went wrong?


Rick

list Charles Jones · Thu, 03 Feb 2005 14:28:32 -0700 ·
When you use maint.pl, there are no links back to the main Hobbit page, nor the menu overlay. So, when you enable/disable a host, the only way to get back to the main hobbit page is to click back several times.

Note: it's not a problem with bb-ack.sh, as the Hobbit menu overlay remains there, so you can easily click "Views, Main View".

-Charles
list Henrik Størner · Thu, 3 Feb 2005 21:47:31 +0000 (UTC) ·
quoted from Charles Jones
In <user-efc6f9db7018@xymon.invalid> Charles Jones <user-e86b4aeade4e@xymon.invalid> writes:
When you use maint.pl, there are no links back to the main Hobbit page, 
nor the menu overlay. So, when you enable/disable a host, the only way 
to get back to the main hobbit page is to click back several times.
Make sure you installed the latest files
hobbit-4.0-RC1/hobbitd/webfiles/maint_{header,footer} in your
~hobbit/server/web/ directory.

If you upgraded, then the old files are NOT overwritten with the new
ones; you need to copy them by hand.

Same goes for the menu-files in hobbit-4.0-RC1/hobbitd/wwwfiles/menu/
that should be copied to ~hobbit/server/www/menu/

The maint.pl distributed with Hobbit should include the menu.
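
The manual copy described above could be sketched roughly as follows. This is only an illustration, not an official upgrade step: the function name copy_rc1_webfiles is hypothetical, and it assumes the source tree and server directory are laid out exactly as in the paths above and that the target directories already exist.

```shell
#!/bin/sh
# Hypothetical helper: copy the RC1 maint templates and menu files by hand.
# $1 = unpacked source tree (e.g. the hobbit-4.0-RC1 directory)
# $2 = Hobbit server directory (e.g. ~hobbit/server)
copy_rc1_webfiles() {
    src=$1
    dst=$2
    # Header and footer templates used by the enable/disable page
    cp "$src/hobbitd/webfiles/maint_header" \
       "$src/hobbitd/webfiles/maint_footer" "$dst/web/" &&
    # Everything in the menu directory
    cp -r "$src/hobbitd/wwwfiles/menu/." "$dst/www/menu/"
}

# Example (paths are assumptions; run as the hobbit user):
#   copy_rc1_webfiles ./hobbit-4.0-RC1 "$HOME/server"
```

Keeping a backup of the existing files first is sensible if they were customized locally.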


Henrik
list Henrik Størner · Thu, 3 Feb 2005 21:52:40 +0000 (UTC) ·
quoted from Rick Waegner
In <1107455777.32088.36.camel at rinux> rwaegner <user-e63f4e3fcb43@xymon.invalid> writes:
Spoke too soon. Now that a few polling periods have passed, the bea page
is nothing but the truncated snmpwalk output from the BEA-WEBLOG-MIB and
the graphs have disappeared, odd.
in the bea.log file, "/home/hobbit/server/ext/bea-snmpstats.sh: line 27:
/home/hobbit/server/bin/bb: Argument list too long" is being dumped at
each polling time. Any ideas where I went wrong?
How much data do the two snmpwalk commands in bea-snmpstats generate?

It's probably your servers that return more data than the current
script can handle. The output of the snmpwalk commands is passed as a
command-line parameter to the "bb" client tool, so if it exceeds
something like 32 KB (it varies between shell implementations), then
your shell cannot handle it.

It would need some tweaking of the bea-snmpstats script to work around
this, and you probably also need to recompile Hobbit with an increased
MAXMSG setting.
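
As a rough way to check this before calling "bb", the captured output can be measured against a byte limit. This is a minimal sketch and not part of bea-snmpstats.sh: the function name fits_on_cmdline and the idea of capturing the output in a file are assumptions, and the 32768 default only mirrors the rough figure above. On many systems "getconf ARG_MAX" reports the overall limit for arguments plus environment.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a captured output file is small enough
# to be passed as a single command-line argument.
# $1 = file holding the captured snmpwalk output
# $2 = limit in bytes (optional; defaults to 32768 as a rough guess)
fits_on_cmdline() {
    size=$(wc -c < "$1" | tr -d ' ')
    limit=${2:-32768}
    [ "$size" -lt "$limit" ]
}

# Example (the file name is an assumption):
#   fits_on_cmdline /tmp/bea-walk.txt || echo "output too large for one argument"
```

Output larger than the limit would need the kind of script tweaking described above.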


Henrik
list Charles Jones · Thu, 03 Feb 2005 14:58:50 -0700 ·
Did that, now when I go to the enable/disable option I get a popup in IE that says:

"A Runtime Error has occured. Do you wish to Debug?
Line: 503
Error: 'menu' is undefined"

I assume this is some sort of javascript error. The menu is working fine for everything else, including the Ack section.  I made sure I copied the files you specified, and chown'ed them to the hobbit user and restarted hobbit.

-Charles

list Gordon Thiesfeld · Thu, 3 Feb 2005 16:06:22 -0600 ·
Check your hobbitserver.cfg against the newly created one.  You need a line
like this:

 
BBMENUSKIN="$BBSERVERWWWURL/menu"
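
A quick way to check whether an existing configuration file already has such a line is sketched below. The function name has_setting is hypothetical, and the sketch assumes settings are written as NAME=value at the start of a line, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a config file defines the given setting.
# $1 = path to the configuration file
# $2 = setting name, e.g. BBMENUSKIN
has_setting() {
    grep -q "^[[:space:]]*$2=" "$1"
}

# Example (the path is an assumption; adjust it to your installation):
#   has_setting "$HOME/server/etc/hobbitserver.cfg" BBMENUSKIN || echo "BBMENUSKIN is missing"
```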
list Rick Waegner · Thu, 03 Feb 2005 16:42:05 -0600 ·
Lots of data: 1125 lines totalling 132k, with ALL the cluster
members responding to the management server that I am polling. The
script is very simple, but the hobbitgraph.cfg for the bea stats is a
little confusing. And what would need to change? Also, what needs to
change for the MAXMSG setting?

Thanks

Rick

list Rick Waegner · Thu, 03 Feb 2005 16:54:32 -0600 ·
One other thing, the data/rrd/beaserver directory has rrd files from the
tests:

-rw-rw-r--    1 hobbit   hobbit      57516 Feb  3 16:52 bea.threads.questia_admin_server.weblogic.kernel.System.rrd
-rw-rw-r--    1 hobbit   hobbit      57516 Feb  3 16:52 bea.threads.questia_admin_server.weblogic.kernel.Non-Blocking.rrd
-rw-rw-r--    1 hobbit   hobbit      57516 Feb  3 16:52 bea.threads.questia_admin_server.weblogic.kernel.Default.rrd
-rw-rw-r--    1 hobbit   hobbit      57516 Feb  3 16:52 bea.threads.questia_admin_server.weblogic.admin.RMI.rrd
-rw-rw-r--    1 hobbit   hobbit      57516 Feb  3 16:52 bea.threads.questia_admin_server.weblogic.admin.HTTP.rrd
-rw-rw-r--    1 hobbit   hobbit     171420 Feb  3 16:52 bea.memory.questia_admin_server.rrd


Appreciate any help you can provide!


Rick



list Henrik Størner · Sun, 6 Feb 2005 22:22:19 +0100 ·
quoted from Andy France
On Thu, Feb 03, 2005 at 05:24:42PM +1300, Andy France wrote:
Henrik Stoerner wrote on 03/02/2005 12:09:01:
The first release-candidate of Hobbit 4.0 was uploaded to
hobbitmon.sourceforge.net a few minutes ago.
I've just updated from beta6 to RC1, and all my conn tests now fail :-(
There are no changes from beta-6 -> RC1 that would explain such a
failure.

I'd like to see what happens if you run it with debugging output
enabled. The simplest way of doing that is (login as the hobbit user):

cd ~/server
./bin/bbcmd --env=etc/hobbitserver.cfg bbtest-net --ping --debug 

If you have lots of hosts, you can add a couple of hostnames to the
command, that will make it run the tests for those hosts only.

I'm interested in the tmp/fping.* files that are generated by this,
and in the output that bbtest-net dumps while running the test.

It's probably best if you send them to me directly, especially if they
are large.


Regards,
Henrik
list Andy France · Mon, 7 Feb 2005 12:11:52 +1300 ·

Wouldn't you know it - I just ran the stop, make install, start again to
upgrade to RC1 and all my conn's are fine :-/

Thanks for the help and sorry for the noise!
quoted from Andy France

Andy.

