Highlights of the 4.3.0 version

98 messages in this thread

list Henrik Størner · Sun, 22 Jul 2007 00:08:12 +0200 ·

In another thread, someone asked about what new features are planned for
version 4.3.0. I've summarized them below; they have all been
implemented by now. Some of them have been contributed by others over
the past year - I'm pleased to have finally gotten their patches merged.

There are some open bug-reports, and the plan now is to try and get
those fixed. Once that is done I'll ask you all to start testing the
beta-versions, and then a new release is hopefully available soon.

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.


Major new features
* PAGE setting for alert- and client-configuration handles hosts on
  multiple pages, so any pagename can be used.
* Flap detection of statuses that change color rapidly. The status
  is kept at the most critical level until it stops flapping.
* Holiday support for alerts, including variable holidays (Easter etc)
* Split NCV support - graph data from NCV can be split into multiple
  RRD databases allowing for varying number of datasets.
* RRD database parameters are now configurable (e.g. number of
  datapoints stored, whether to store min/max values etc). Note that
  this only applies to newly created RRD files, not existing ones.
* Distributed worker modules allow sharing the load across multiple
  Hobbit servers
* RRD updates are now cached for up to 30 minutes before being written
  to disk. This makes the I/O load on large installations much lighter.
* Detection of statuses that are reported by multiple hosts
* Client backend-support for the z/OS and z/VSE clients by Rich Smirna

Display things
* Graph zooming now limits the lower/upper bounds of a graph (requires
  rrdtool 1.2.x)
* The trends page default data-period can be configured to something
  other than the default 48-hour view, and the user can select a
  different period on-the-fly.
* Hosts can be sorted automatically on the overview webpage with a
  "group-sorted" group definition.
* NOCOLUMNS setting in bb-hosts lets you suppress certain columns on
  a per-host basis
* Host-comments are displayed as tool-tips, to save screen space.

Checks and graphs
* Network tests can use a specific source IP instead of the default
* The validity-period of network tests is configurable, instead of
  being fixed at the default 30-minute setting
* Client file checks can check for a symlink
* "trends" report for RRD handling allows generating custom-made
  RRD files
* Hobbit host- and status-counts are tracked in an RRD file

Miscellaneous
* NCV reports can handle color-icons before the name:value data
* hobbitlaunch tasks can be configured to run on certain hosts only
* Time-warp detection and warning
* Local unix-socket interface to Hobbit daemon
* hobbitd_capture can collect several statuses and hand off such a
  batch to an external command
* Support for SHA-224/256/384/512 digests


Regards,
Henrik

list Asif Iqbal · Sat, 21 Jul 2007 18:49:49 -0400 ·

▸ quoted from Henrik Størner

On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

In another thread, someone asked about what new features are planned for
version 4.3.0. I've summarized them below; they have all been
implemented by now. Some of them have been contributed by others over
the past year - I'm pleased to have finally gotten their patches merged.

There are some open bug-reports, and the plan now is to try and get
those fixed. Once that is done I'll ask you all to start testing the
beta-versions, and then a new release is hopefully available soon.

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.


Send a disable request through email. Currently it only accepts delay requests.


--


Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

list Asif Iqbal · Sat, 21 Jul 2007 18:51:59 -0400 ·

▸ quoted from Asif Iqbal

On 7/21/07, Asif Iqbal <user-6f4b51ac2a40@xymon.invalid> wrote:

On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

In another thread, someone asked about what new features are planned for
version 4.3.0. I've summarized them below; they have all been
implemented by now. Some of them have been contributed by others over
the past year - I'm pleased to have finally gotten their patches merged.


There are some open bug-reports, and the plan now is to try and get
those fixed. Once that is done I'll ask you all to start testing the
beta-versions, and then a new release is hopefully available soon.

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.

Monitoring and RRD graphs of memory and CPU usage for a process

[..stripped for brevity..]

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

list Asif Iqbal · Sat, 21 Jul 2007 19:16:12 -0400 ·

▸ quoted from Asif Iqbal

On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

In another thread, someone asked about what new features are planned for
version 4.3.0. I've summarized them below; they have all been
implemented by now. Some of them have been contributed by others over
the past year - I'm pleased to have finally gotten their patches merged.

There are some open bug-reports, and the plan now is to try and get
those fixed. Once that is done I'll ask you all to start testing the
beta-versions, and then a new release is hopefully available soon.

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.


- Display column only when it is red
  (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)
- SNMP trap by default
- SNMP probe option builtin
- Process specific alert
  (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)
- Comment TAG for DOWNTIME
  (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)
- Add functionalities in `delay'
  (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)
- CPU/Memory Usage per process
  (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)
- Text-based alert for `msgs'. Currently it shows up as HTML in my email
  (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)


Thanks again for such an excellent application and keeping it open!!


-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

list Scott Walters · Sat, 21 Jul 2007 21:34:11 -0400 ·

▸ quoted from Asif Iqbal

On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

In another thread, someone asked about what new features are planned for
version 4.3.0. I've summarized them below;

Great to see the summary, these features look great.  I'd like to
request more RRDs and reports about the monitoring system and the
servers/services monitored.  For example:

I think the following could be "gauge" metrics:

Number of devices monitored
Number of services monitored
Number of host.service in green state
Number of host.service in yellow state
Number of host.service in red state
Number of host.service in XXX state

I am thinking these could be done by creating counters within hobbit
(since boot):

Number of state changes
Number of state changes per server
Number of state changes per service
Number of notifications sent

I think the above metrics could help create reports over time periods
for review to help get to "management by facts" vs. "management by
feeling."  Most admins that pay attention to their install will
"know", but it's different when you can "prove."  Plus, when
improvements are made, it's nice to see it.

I am also thinking we could try and apply some Six Sigma terminology
and methodology to hobbit which may have value.  Six Sigma keys on
statistics and defects.  Six Sigma refers to having production quality
such that you only see 3.4 defects per million.  Granted we are not
"producing" a physical item, but I am thinking that a defect could be
considered a purple/yellow/red state.   With the counters I suggested
above, we could apply various statistical measures (control charts,
pareto charts, etc.) and see what makes sense or has value for
monitoring.
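
The "defects per million" figure mentioned above is simple arithmetic. A minimal Python sketch, assuming (as suggested in the message) that every purple/yellow/red status counts as one defect and every status sample as one opportunity; the function name and the sample counts are hypothetical:

```python
def defects_per_million(defects, opportunities):
    """Return defects per million opportunities (DPMO)."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return defects / opportunities * 1_000_000


# Hypothetical numbers: 34 non-green samples out of 10 million status samples.
dpmo = defects_per_million(defects=34, opportunities=10_000_000)
print(f"{dpmo:.1f} defects per million")
```

Whether a yellow status should weigh the same as a red one is a policy decision that the sketch does not attempt to answer.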

The goal is to improve consistency and reduce variance.

If you like, I could draft up some graphs and reports I'd like to see.
My above description might be hard to visualize.  I definitely think
hobbit could benefit from internal counters, similar to how an OS
keeps track of context switches and the like.

Scott

list Henrik Størner · Sun, 22 Jul 2007 15:03:08 +0200 ·

▸ quoted from Scott Walters

On Sat, Jul 21, 2007 at 09:34:11PM -0400, Scott Walters wrote:

Great to see the summary, these features look great.  I'd like to
request more RRDs and reports about the monitoring system and the
servers/services monitored.  For example:

I think the following could be "gauge" metrics:

Number of devices monitored
Number of services monitored
Number of host.service in green state
Number of host.service in yellow state
Number of host.service in red state
Number of host.service in XXX state

You mean like this:

    Statistics:
    Hosts                      :  4321
    Pages                      :   286
    Status messages            : 22331
    - Red                      :   907 ( 4.06 %)
    - Red (non-propagating)    :   809 ( 3.62 %)
    - Yellow                   :   353 ( 1.58 %)
    - Yellow (non-propagating) :   210 ( 0.94 %)
    - Clear                    :  1970 ( 8.82 %)
    - Green                    : 17052 (76.36 %)
    - Purple                   :   452 ( 2.02 %)
    - Blue                     :   578 ( 2.59 %)

The first three are from the current "bbgen --report" status message; 
I've added the breakdown of the colors now. Will put these into an RRD
for tracking trends.
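
For reference, the percentages in the listing above are each color count divided by the total number of status messages. A small Python sketch of that arithmetic, using the counts from the listing; the function name is made up for illustration and this is not how bbgen itself computes the report:

```python
def color_breakdown(total, counts):
    """Return each count as a percentage of total, rounded to 2 decimals."""
    return {color: round(100.0 * n / total, 2) for color, n in counts.items()}


# Counts copied from the statistics listing above.
counts = {
    "Red": 907,
    "Yellow": 353,
    "Clear": 1970,
    "Green": 17052,
    "Purple": 452,
    "Blue": 578,
}
breakdown = color_breakdown(22331, counts)
for color, pct in breakdown.items():
    print(f"- {color:<7}: {counts[color]:>5} ({pct:5.2f} %)")
```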

▸ quoted from Scott Walters

I am thinking these could be done by creating counters within hobbit
(since boot):

Number of state changes
Number of state changes per server
Number of state changes per service
Number of notifications sent

The state changes can be calculated from the history logs. This is
preferable, I think, because that way it won't get reset if the Hobbit
server is restarted.
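
Counting state changes from history could be sketched roughly like this in Python. The record layout (host, service, timestamp, color) is an assumption made for illustration; it is not Hobbit's actual history log format, and the sample data is invented:

```python
from collections import Counter


def count_state_changes(history):
    """Count color changes per (host, service).

    history: iterable of (host, service, timestamp, color) tuples,
    in time order for each host/service pair.
    """
    last_color = {}
    changes = Counter()
    for host, service, _timestamp, color in history:
        key = (host, service)
        previous = last_color.get(key)
        # Only a difference from the previous color counts as a change.
        if previous is not None and previous != color:
            changes[key] += 1
        last_color[key] = color
    return changes


# Invented sample records.
history = [
    ("web1", "http", 100, "green"),
    ("web1", "http", 160, "red"),
    ("web1", "http", 220, "red"),
    ("web1", "http", 280, "green"),
    ("db1", "conn", 100, "green"),
    ("db1", "conn", 400, "yellow"),
]
changes = count_state_changes(history)
print(changes)
```

Because the counts are derived from stored records instead of in-memory counters, they survive a server restart, which is the point made above.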

Notifications - it would make sense to have the alert module provide
some statistics that we could put into a trend graph.

▸ quoted from Scott Walters

If you like, I could draft up some graphs and reports I'd like to see.
My above description might be hard to visualize.  I definitely think
hobbit could benefit from internal counters, similar to how an OS
keeps track of context switches and the like.

Please do. The graphs I've created about the Hobbit "internals" have
been mostly for my own use as debugging / performance evaluation data.
If we can provide some data that is interesting to management, that
would be a good thing.


Regards,
Henrik

list Asif Iqbal · Sun, 22 Jul 2007 20:01:12 -0400 ·

▸ quoted from Asif Iqbal

On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

In another thread, someone asked about what new features are planned for
version 4.3.0. I've summarized them below; they have all been
implemented by now. Some of them have been contributed by others over
the past year - I'm pleased to have finally gotten their patches merged.

There are some open bug-reports, and the plan now is to try and get
those fixed. Once that is done I'll ask you all to start testing the
beta-versions, and then a new release is hopefully available soon.

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.


Here is another feature I'd like to see.

A way for the hobbit server to request that the hobbit client run a command
locally, based on an alert.

(Pretty similar to sending a request to the client to download a newer version
from the download dir.)


Maybe a dir called ~hobbit/server/command (like
~hobbit/server/download).

In that command file, define the command.

Then in the ~hobbit/server/etc/client-local.cfg file, define a class, and in
the class have an attribute like
    clientcommand: command definition

And in the bb-hosts file, msgs:command

So whenever there is a msgs alert, run that command locally on the client.


-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

list Charles Goyard · Mon, 23 Jul 2007 08:59:54 +0200 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote :

    - Red                      :   907 ( 4.06 %)
    - Red (non-propagating)    :   809 ( 3.62 %)
    - Yellow                   :   353 ( 1.58 %)
    - Yellow (non-propagating) :   210 ( 0.94 %)

Hey,

what a nice hook to tell about a bug in nopropred/nopropyellow: I
_often_ (but not always) get a red status with nopropred on the bb2
page. Full report here:

[June 21st] (http://www.hswn.dk/hobbiton/2007/06/msg00311.html)


-- 
Charles Goyard - user-a6cdca7046e2@xymon.invalid - (+33) 1 45 38 01 31
Orange Business Services - online multimedia  // ingénierie

list Daniel J McDonald · Mon, 23 Jul 2007 06:14:14 -0500 ·

▸ quoted from Asif Iqbal

On Sun, 2007-07-22 at 00:08 +0200, Henrik Stoerner wrote:

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.

Get hobbitfetch to not crash, hang, or spin the cpu at 100%.  I don't
have to use hobbitfetch for many hosts, but it is incredibly annoying
for the few that I do that I have to kill -6 the hobbitfetch process 4-5
times a day in order to get any statuses.

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
Austin Energy
http://www.austinenergy.com

list Henrik Størner · Mon, 23 Jul 2007 14:00:59 +0200 ·

▸ quoted from Daniel J McDonald

On Mon, Jul 23, 2007 at 06:14:14AM -0500, Daniel J McDonald wrote:

On Sun, 2007-07-22 at 00:08 +0200, Henrik Stoerner wrote:

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.

Get hobbitfetch to not crash, hang, or spin the cpu at 100%.

I know, this one is definitely a "must-fix-before-4.3.0" bug.


Henrik

list David Gilmore · Mon, 23 Jul 2007 08:54:41 -0400 ·

Henrik,

One thing I would like to see is the ability to encrypt traffic between the client and the server.  Of course this would mean some tweaking of code on the BBWin side of things too.  That is one feature that I know the BB Pro client introduced a few years back that would be an excellent addition for those of us who monitor systems at client sites.

Thank you and keep up the good work.

Dave Gilmore

list T.J. Yang · Mon, 23 Jul 2007 10:10:43 -0500 ·

Great to see the author of larrd participating in the hobbit discussion.
See below for my comments.

▸ quoted from Scott Walters

From: "Scott Walters" <user-2c405ccfe1ee@xymon.invalid>
Reply-To: user-ae9b8668bcde@xymon.invalid
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Highlights of the 4.3.0 version
Date: Sat, 21 Jul 2007 21:34:11 -0400

On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

In another thread, someone asked about what new features are planned for
version 4.3.0. I've summarized them below;

Great to see the summary, these features look great.  I'd like to
request more RRDs and reports about the monitoring system and the
servers/services monitored.  For example:

I think the following could be "gauge" metrics:

Number of devices monitored
Number of services monitored
Number of host.service in green state
Number of host.service in yellow state
Number of host.service in red state
Number of host.service in XXX state

I am thinking these could be done by creating counters within hobbit
(since boot):

Number of state changes
Number of state changes per server
Number of state changes per service
Number of notifications sent

I think the above metrics could help create reports over time periods
for review to help get to "management by facts" vs. "management by
feeling."  Most admins that pay attention to their install will
"know", but it's different when you can "prove."  Plus, when
improvements are made, it's nice to see it.

Providing OS type and version metrics as well would give us a clear view
of how many vendor-unsupported OS versions (e.g. Solaris 2.5.1, 2.6, 2.7,
HP-UX 9, HP-UX 10.20) are still in an IT system.

Henrik showed me the command on this list last time I asked, but it would be
good if this could be done from the hobbit server.

▸ quoted from Scott Walters

I am also thinking we could try and apply some Six Sigma terminology
and methodology to hobbit which may have value.  Six Sigma keys on
statistics and defects.  Six Sigma refers to having production quality
such that you only see 3.4 defects per million.  Granted we are not
"producing" a physical item, but I am thinking that a defect could be
considered a purple/yellow/red state.   With the counters I suggested
above, we could apply various statistical measures (control charts,
pareto charts, etc.) and see what makes sense or has value for
monitoring.

In Six Sigma, availability is formatted with five nines (99.999).
There are some patches floating around to make HB's availability report
show the five-nines format.
This is a baby step, but I got asked by management why the bb/hb report is
one digit short of nines.

Associating Hobbit more with Six Sigma is definitely a good thing. Connecting
Hobbit with ITIL is even better.


tj

▸ quoted from Scott Walters

The goal is to improve consistency and reduce variance.

If you like, I could draft up some graphs and reports I'd like to see.
My above description might be hard to visualize.  I definitely think
hobbit could benefit from internal counters, similar to how an OS
keeps track of context switches and the like.

Scott


list S Aiello · Mon, 23 Jul 2007 12:16:53 -0400 ·

▸ quoted from Asif Iqbal

On Saturday 21 July 2007 18:08, Henrik Stoerner wrote:

In another thread, someone asked about what new features are planned for
version 4.3.0. I've summarized them below; they have all been
implemented by now. Some of them have been contributed by others over
the past year - I'm pleased to have finally gotten their patches merged.

There are some open bug-reports, and the plan now is to try and get
those fixed. Once that is done I'll ask you all to start testing the
beta-versions, and then a new release is hopefully available soon.
.......

Just checking if the iconnames.patch will be included in 4.3.0. That is the
patch that allowed the &color-acked keyword to display the acknowledged icons
in tests?

Thank you for all of your work,
 ~Steve

list Henrik Størner · Mon, 23 Jul 2007 22:34:08 +0200 ·

▸ quoted from S Aiello

On Mon, Jul 23, 2007 at 12:16:53PM -0400, user-ce96540ed38f@xymon.invalid wrote:

Just checking if the iconnames.patch will be included in 4.3.0. That is the
patch that allowed the &color-acked keyword to display the acknowledged icons
in tests?

Those patches that have been posted over the past year have all been
merged into the code, so yes - it's included.


Regards,
Henrik

list Mike Arnold · Mon, 23 Jul 2007 13:51:55 -0700 (MST) ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote:

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.

I'd like to see POWER5 CPU stats:
http://www.docum.org/twiki/bin/view/Hobbit/AixPower5

-- 
-mike

list Henrik Størner · Mon, 23 Jul 2007 23:21:32 +0200 ·

▸ quoted from Mike Arnold

On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:

Henrik Stoerner wrote:

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.

I'd like to see POWER5 CPU stats:
http://www.docum.org/twiki/bin/view/Hobbit/AixPower5

No problem, except I don't understand why the Wiki claims that it is
necessary to remove the '.' from the numbers. It seems this is to
convert the data to percentages, but that can be done in the graph 
definition:

    [vmstat-pc]
        TITLE Used Physical CPU
        YAXIS pc (100 = 1 CPU)
        DEF:pc=vmstat.rrd:cpu_pc:AVERAGE
        CDEF:pcpercent=pc,100,*
        LINE2:pcpercent#00CC00

And the "-b 1024" in the Wiki graph definition looks bogus.

I'm cc'ing Stef Coene who wrote the Wiki entry to see if he can shed
some light on this.
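
The CDEF in the graph definition above is written in reverse Polish notation: "pc,100,*" pushes pc, pushes 100, then multiplies them. A toy Python evaluator illustrates the idea. It is only a sketch under stated assumptions: it handles the four basic arithmetic operators, whereas rrdtool's RPN supports many more, and the function name is made up:

```python
import operator

# Only basic arithmetic; rrdtool's real RPN has many more operators.
_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression, variables):
    """Evaluate a comma-separated RPN expression such as 'pc,100,*'."""
    stack = []
    for token in expression.split(","):
        if token in _OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(_OPS[token](left, right))
        elif token in variables:
            stack.append(float(variables[token]))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression: " + expression)
    return stack[0]


# A pc value of 0.46 physical CPUs becomes 46 on the "100 = 1 CPU" axis.
pcpercent = eval_rpn("pc,100,*", {"pc": 0.46})
print(pcpercent)
```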


Regards,
Henrik

list Henrik Størner · Mon, 23 Jul 2007 23:33:24 +0200 ·

▸ quoted from Charles Goyard

On Mon, Jul 23, 2007 at 08:59:54AM +0200, Charles Goyard wrote:

what a nice hook to tell about a bug in nopropred/nopropyellow: I
_often_ (but not always) get a red status with nopropred on the bb2
page. Full report here:

[June 21st] (http://www.hswn.dk/hobbiton/2007/06/msg00311.html)

noprop's don't affect the bb2 page - they only control if a status
affects the color of the "main page" that the status is on. If you
want to remove these from the BB2 page, run bbgen with
"--bb2-ignorecolumns=procs_master,is_master".


Regards,
Henrik

list Scott Walters · Mon, 23 Jul 2007 21:44:11 -0400 ·

▸ quoted from T.J. Yang

If you like, I could draft up some graphs and reports I'd like to see.
My above description might be hard to visualize.

Henrik, you're right about using the histories for reports.  That data
keeps its integrity unlike the RRD averages, much better for reports.

For a given input period (Last 7 days, June 2007, etc.)

* Servers with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on server would generate list of state
changes.  "Look Bob, your server is not stable you need to get your
developers under control!"

* Services with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on service would generate list of the
state changes for that period.  "PHB, the web group is performing way
too many undocumented code changes."

* Red events with longest durations (for events still open, use start
time to NOW as duration)

* Yellow events with longest durations (for events still open, use
start time to NOW as duration)

* All ping/fping/conn events.

You could piece-meal some of those from the eventlog report, but I'd
prefer a single page that showed them all.  For weekly, quarterly
meetings, turnover, etc.
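
A rough Python sketch of two of the report sections listed above: a "Top 10" by state changes, and the longest-running events where still-open events are measured up to NOW. The input shapes (a dict of change counts, and event dicts with host/service/color/start/end keys) are assumptions for illustration only, not Hobbit data structures:

```python
from collections import Counter


def top_changers(change_counts, limit=10):
    """Return (name, count) pairs sorted from most to fewest state changes."""
    return Counter(change_counts).most_common(limit)


def longest_events(events, now, color, limit=10):
    """Return (duration, host, service) for events of one color, longest first.

    An event whose 'end' is None is still open, so its duration runs to 'now'.
    """
    durations = []
    for event in events:
        if event["color"] != color:
            continue
        end = event["end"] if event["end"] is not None else now
        durations.append((end - event["start"], event["host"], event["service"]))
    durations.sort(reverse=True)
    return durations[:limit]


# Invented sample data; timestamps are plain seconds.
counts = {"web1": 5, "db1": 2, "mail1": 9}
events = [
    {"host": "web1", "service": "http", "color": "red", "start": 100, "end": 400},
    {"host": "db1", "service": "conn", "color": "red", "start": 500, "end": None},
    {"host": "mail1", "service": "smtp", "color": "yellow", "start": 0, "end": 50},
]
print(top_changers(counts, limit=2))
print(longest_events(events, now=1000, color="red"))
```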

Scott

list Stef Coene · Tue, 24 Jul 2007 08:38:37 +0200 ·

▸ quoted from Henrik Størner

On Monday 23 July 2007, you wrote:

On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:

I'd like to see POWER5 CPU stats:
http://www.docum.org/twiki/bin/view/Hobbit/AixPower5

No problem, except I don't understand why the Wiki claims that it is
necessary to remove the '.' from the numbers. It seems this is to
convert the data to percentages, but that can be done in the graph
definition:

The cpu_pc and cpu_ec values always have a "." in them:
kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa    pc    ec
 2  2 1341331 15949   0   8   7  85   49   0 583 34667 1878 38  5 55  2  0.46  46.0

pc always has 2 digits after the "." and ec has 1 (I hope this stays the same in future AIX releases).  Rrd wants integers (I think), so you have to strip the "." from the numbers.
And, indeed, the -b 1024 is a copy-and-paste error.

I have more AIX updates (iostat graphs), but I don't have the time to create patches right now; maybe at the end of this week.  After that I'm on holiday for 2 weeks.


Stef

list Stef Coene · Tue, 24 Jul 2007 08:38:38 +0200 ·

▸ quoted from Henrik Størner

On Monday 23 July 2007, Henrik Stoerner wrote:

On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:

Henrik Stoerner wrote:

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.

I'd like to see POWER5 CPU stats:
http://www.docum.org/twiki/bin/view/Hobbit/AixPower5

Related to this post, I have a perl script that can manipulate rrds:
- adding RRA's (so you can add MAX and MIN)
- adding DS's (for the extra vmstat numbers with AIX 5.3)
- changing the DS (so you can keep the data longer)
- migrating from OS to OS (with rrdtool dump and restore)

Let me know if you are interested.  The script itself uses some custom perl
libraries so I cannot publish them, but I can try to filter out the needed
information and procedures.


Stef

list Henrik Størner · Tue, 24 Jul 2007 09:43:26 +0200 ·

▸ quoted from Stef Coene

On Tue, Jul 24, 2007 at 08:38:37AM +0200, Stef Coene wrote:

On Monday 23 July 2007, you wrote:

On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:

I'd like to see POWER5 CPU stats:
http://www.docum.org/twiki/bin/view/Hobbit/AixPower5

No problem, except I don't understand why the Wiki claims that it is
necessary to remove the '.' from the numbers. It seems this is to
convert the data to percentages, but that can be done in the graph
definition:

The cpu_pc and cpu_ec values always have a "." in them:
kthr    memory              page              faults              cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa    pc    ec
 2  2 1341331 15949   0   8   7  85   49   0 583 34667 1878 38  5 55  2  0.46  46.0

pc always has 2 digits after the "." and ec has 1 (I hope this stays the same
in future AIX releases).  Rrd wants integers (I think), so you have to strip
the "." from the numbers.

No, RRD uses floating-point numbers everywhere. So I'll keep the numbers
unmodified - then we won't have any problems if IBM does change the
number of decimals they report.
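
Keeping the numbers unmodified means simply parsing them as floats. A small Python sketch, assuming (as in the sample vmstat output quoted above) that pc and ec are the last two columns of the data line; the function name is made up and this is not the actual Hobbit parser:

```python
def parse_pc_ec(vmstat_line):
    """Return (pc, ec) as floats from an AIX vmstat data line.

    Assumes pc and ec are the last two whitespace-separated columns.
    """
    fields = vmstat_line.split()
    return float(fields[-2]), float(fields[-1])


# Data line copied from the vmstat sample quoted above.
line = " 2  2 1341331 15949   0   8   7  85   49   0 583 34667 1878 38  5 55  2  0.46  46.0"
pc, ec = parse_pc_ec(line)
print(pc, ec)
```

Since nothing here depends on the number of decimals, a line ending in "0.463  46.25" would parse just as well.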


Regards,
Henrik

list Ralph Mitchell · Tue, 24 Jul 2007 09:18:49 -0500 ·

▸ quoted from Scott Walters

On 7/23/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:

* Services with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on service would generate list of the
state changes for that period.  "PHB, the web group is performing way
too many undocumented code changes."

Heh, that would be useful.  I've got a perl script using SOAP to get
BigIP pool status and some joker has transferred some machines between
BigIPs without removing the old definitions.  So, there's a bunch of
systems/ports that flip/flop between enable & disable.  Whether
they're red or green depends on which report comes in last.

Maybe I can persuade the load balancer guys to actually remove the
duplicate definitions.

Ralph Mitchell

list Henrik Størner · Tue, 24 Jul 2007 22:08:38 +0200 ·

▸ quoted from Ralph Mitchell

On Tue, Jul 24, 2007 at 09:18:49AM -0500, Ralph Mitchell wrote:

On 7/23/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:

* Services with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on service would generate list of the
state changes for that period.  "PHB, the web group is performing way
too many undocumented code changes."

Heh, that would be useful.  I've got a perl script using SOAP to get
BigIP pool status and some joker has transferred some machines between
BigIPs without removing the old definitions.  So, there's a bunch of
systems/ports that flip/flop between enable & disable.  Whether
they're red or green depends on which report comes in last.

That should actually be caught by another 4.3.0 feature: Flap detection.
If a status changes more than 10 times in 10 minutes, Hobbit deems it
"flapping" and stops logging status changes - instead, it fixes the
status at the most critical level reported.

Any hosts flapping are reported on the "hobbitd" status display.
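
The rule described above (more than 10 changes in 10 minutes) can be
illustrated with a small sketch. This is not Hobbit's actual implementation:
is_flapping is a hypothetical helper that reads status-change timestamps
(epoch seconds, one per line) on stdin, and the window and threshold are
parameters that merely default to the figures quoted above.

```sh
#!/bin/sh
# is_flapping NOW [WINDOW_SECONDS] [THRESHOLD]
# Succeeds (exit 0) when more than THRESHOLD status changes fall
# inside the last WINDOW_SECONDS before NOW; fails (exit 1) otherwise.
is_flapping() {
    now=$1
    window=${2:-600}
    threshold=${3:-10}
    awk -v now="$now" -v win="$window" -v max="$threshold" '
        $1 > now - win && $1 <= now { n++ }
        END { exit ((n > max) ? 0 : 1) }
    '
}
```

With the defaults, a test that only reports every 5 minutes can never reach
the threshold, since at most a handful of changes fit in the window.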


Regards,
Henrik

list Henrik Størner · Tue, 24 Jul 2007 22:31:21 +0200 ·

▸ quoted from Asif Iqbal

On Sat, Jul 21, 2007 at 07:16:12PM -0400, Asif Iqbal wrote:

- Display column only when  it is red
 (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)

I'll leave that for later. There will probably be an entire new version
with just display things.

- SNMP trap by default
- SNMP probe option builtin

Too much for now. I need to dig into the Net-SNMP library API to do
that.

- Process specific alert
 (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)

Already in 4.2.0 via the GROUP definition in hobbit-clients.cfg and the
corresponding rule in hobbit-alerts.cfg

- Comment TAG for DOWNTIME
 (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)

Has been implemented for 4.3.0

- Add functionalities in `delay'
 (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)

Haven't looked at that.

- CPU/Memory Usage per process
 (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)

Probably impossible. Most "ps" implementations can report the current 
amount of cpu/memory a process uses, but that's a snapshot (ever noticed 
how "top" always has itself in the top list of cpu-using processes?). 
What's interesting is not how much cpu/memory a process uses exactly when 
the Hobbit client runs the "ps" command, but how much it has used on 
average since the last client run - similar to what "vmstat" reports for 
the system as a whole. I don't know of any way to get this data.

Another problem with this is identifying what a process is. A
long-running daemon often forks child-processes that are short-lived;
should we add their cpu-utilisation to that of the long-running process?
If yes, then we have to monitor all processes that are started (so
running once every N seconds is not sufficient); if no, then you won't 
spot the cpu hog because it was spawned as a child process.

▸ quoted from Asif Iqbal

- Text based alert for `msgs'. Currently it shows as html in my email
 (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)

Easily done with an alert script.
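
A minimal sketch of such an alert script, for illustration only. It assumes
the script is started through a SCRIPT rule in hobbit-alerts.cfg and that the
alert module exports RCPT, BBALPHAMSG, BBHOSTSVC and BBCOLORLEVEL; check the
hobbit-alerts.cfg documentation for your version. strip_html is a made-up
helper, and the tag stripping is deliberately naive.

```sh
#!/bin/sh
# strip_html: remove anything that looks like an HTML tag from stdin.
strip_html() {
    sed -e 's/<[^>]*>//g'
}

# Only send mail when the alert module has set a recipient.
if [ -n "$RCPT" ]
then
    printf '%s\n' "$BBALPHAMSG" | strip_html |
        mail -s "Hobbit alert: $BBHOSTSVC is $BBCOLORLEVEL" "$RCPT"
fi
```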


Regards,
Henrik

list Ralph Mitchell · Tue, 24 Jul 2007 15:34:20 -0500 ·

▸ quoted from Henrik Størner

On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

On Tue, Jul 24, 2007 at 09:18:49AM -0500, Ralph Mitchell wrote:

On 7/23/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:

* Services with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on service would generate list of the
state changes for that period.  "PHB, the web group is performing way
too many undocumented code changes."

Heh, that would be useful.  I've got a perl script using SOAP to get
BigIP pool status and some joker has transferred some machines between
BigIPs without removing the old definitions.  So, there's a bunch of
systems/ports that flip/flop between enable & disable.  Whether
they're red or green depends on which report comes in last.

That should actually be caught by another 4.3.0 feature: Flap detection.
If a status changes more than 10 times in 10 minutes, Hobbit deems it
"flapping" and stops logging status changes - instead, it fixes the
status at the most critical level reported.

Unfortunately that's not going to affect my particular checks.  Right
now I have a Hobbit client kicking off the test on a 5 minute
interval, so it goes off at time T, T+5min, T+10min, etc.  The
duplicated servers are only on 2 BigIPs, so they flip/flop over and
back at time T, T+5, T+10.  At most there will be 6 changes in a 10
minute period.

Could that 10-times-in-10-minutes be made into a variable??  Maybe a
default value in the hobbitserver.cfg with an override in bb-hosts,
though I hate to add yet another inch to the width of that file...

Actually, even flap detection isn't going to help my situation - the
reports are going to be red for the BigIP where the server/port is
disabled and green/red for the BigIP that *really* owns the server, so
flap detection would show red anyway.  All the time.  I really need to
get the duplicates removed.

Ralph Mitchell

list Henrik Størner · Tue, 24 Jul 2007 22:41:28 +0200 ·

▸ quoted from Asif Iqbal

On Sun, Jul 22, 2007 at 08:01:12PM -0400, Asif Iqbal wrote:

Here is another feature I'd like to see.

A way for the hobbit server to request the hobbit client to run a command locally
based on an alert.

[snip]

So whenever there is a msgs alert run that command locally on the client

Run this as a client extension:

  #!/bin/sh

  # Get the current status of the "msgs" column
  MSGSSTATUS=`$BB $BBDISP "query $MACHINE.msgs" | awk '{ print $1 }'`

  # Get the command we must run from the client config
  CMD=`grep "^msgsrecovercmd:" $BBTMP/logfetch.$MACHINEDOTS.cfg | sed -e 's!^msgsrecovercmd:!!'`

  # If "msgs" is red and there is a command, run it
  if test "$MSGSSTATUS" = "red" -a "$CMD" != ""
  then
     $CMD
  fi

  exit 0

Before doing this, consider the security implications of having your
servers run commands that they fetch from a remote host without
authentication.
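
One way to reduce that risk, sketched here under assumptions: only run a
fetched command if it also appears, character for character, in an allow-list
file kept locally on the client. The helper names and the allow-list file are
hypothetical and not part of Hobbit.

```sh
#!/bin/sh
# is_allowed_cmd CMD ALLOWFILE
# Succeeds only if CMD is identical to one whole line of ALLOWFILE.
is_allowed_cmd() {
    [ -n "$1" ] && [ -r "$2" ] && grep -Fqx -e "$1" "$2"
}

# run_if_allowed CMD ALLOWFILE
# Runs CMD through sh only when it is on the local allow-list.
run_if_allowed() {
    if is_allowed_cmd "$1" "$2"
    then
        sh -c "$1"
    else
        echo "refusing to run unlisted command: $1" >&2
        return 1
    fi
}
```

In the extension above, the bare $CMD line would then become something like
run_if_allowed "$CMD" /path/to/allowed-commands, with the path chosen by the
local administrator.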


Regards,
Henrik

list Greg L Hubbard · Tue, 24 Jul 2007 15:44:02 -0500 ·

Well, we watch for the presence of processes today.  It would be nice to
be able to track cpu and size of "important" processes over time.

Another problem is detecting CPU hogs (sometimes things run away),
another problem is detecting processes with memory leaks -- they just
grow and grow and grow.  How can Hobbit help?

GLH


list Greg L Hubbard · Tue, 24 Jul 2007 15:55:02 -0500 ·

Wonder if there is any way to tell a client what its status is so it
can be autonomous?  What I mean is this:  suppose there was a way for
the Hobbit client to tell the server that service X was now in state Y,
and a client-side module could then activate response Z on its own?

I know the Hobbit model is to have the server own the configurations,
but how do we solve the "trust" problem?

GLH

list Greg Shea · Tue, 24 Jul 2007 16:55:59 -0400 ·

Hi Greg,

I needed to do this originally with BB to track a memory leak with HP
OpenView's pmd process, when we used to use it.


#!/bin/sh
#
# SCRIPTS IN THE BBHOME/ext DIRECTORY ARE ONLY RUN IF
# THEY ARE DEFINED IN THE ENTRY FOR THE CURRENT HOST
# LISTED IN THE ext/bb-bbexttab FILE.
#

#
# BBPROG SHOULD JUST CONTAIN THE NAME OF THIS FILE
# USEFUL WHEN YOU GET ENVIRONMENT DUMPS TO LOCATE
# THE OFFENDING SCRIPT...
#
BBPROG=bb-pmd.sh; export BBPROG

#
# TEST NAME: THIS WILL BECOME A COLUMN ON THE DISPLAY
# IT SHOULD BE AS SHORT AS POSSIBLE TO SAVE SPACE...
# NOTE YOU CAN ALSO CREATE A HELP FILE FOR YOUR TEST
# WHICH SHOULD BE PUT IN www/help/$TEST.html.  IT WILL
# BE LINKED INTO THE DISPLAY AUTOMATICALLY.
#
TEST="pmd"
#
# BBHOME CAN BE SET MANUALLY WHEN TESTING.
# OTHERWISE IT SHOULD BE SET FROM THE BB ENVIRONMENT
#
#BBHOME=/opt/BB/bb19c ; export BBHOME   # FOR TESTING

if test "$BBHOME" = ""
then
        echo "BBHOME is not set... exiting"
        exit 1
fi

if test ! "$BBTMP"                      # GET DEFINITIONS IF NEEDED
then
         # echo "*** LOADING BBDEF ***"
        . $BBHOME/etc/bbdef.sh          # INCLUDE STANDARD DEFINITIONS
fi

PMDMEM=`/bin/ps -e -o vsz -o comm | grep " pmd" | awk '{printf "%d", $1/1024}'`
if test "$PMDMEM" = ""
then
    COLOR="clear"
else
    COLOR="green"
fi

#
# AT THIS POINT WE HAVE OUR RESULTS.  NOW WE HAVE TO SEND IT TO
# THE BBDISPLAY TO BE DISPLAYED...
#

# MACHINE NAME MUST EITHER BE A REAL MACHINE NAME, OR
# LOOK LIKE A REAL MACHINE (in the case of arbitrary measurements
# like temperature).  IF THE NAME YOU ARE USING DOESN'T EXIST
# IN THE DNS THEN IT SHOULD BE LISTED IN THE bb-hosts FILE WITH noping,
# PREFERABLY IN ITS OWN GROUP...

# NOTE THE COMMAS HERE - YOU NEED THEM!

MACHINE=`echo $MACHINE | $SED 's/\./,/g'`      # HAS TO BE IN A,B,C FORM

#
# THE FIRST LINE IS STATUS INFORMATION... STRUCTURE IMPORTANT!
# THE REST IS FREE-FORM - WHATEVER YOU'D LIKE TO SEND...
#
LINE="PMD Statistics.
"
SUMMARY="
PMD memory usage is $PMDMEM"

# NOW USE THE BB COMMAND TO SEND THE DATA ACROSS
# SEND IT TO BBDISPLAY
$BB $BBDISP "status $MACHINE.$TEST $COLOR `date` $LINE $SUMMARY MB"
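
The script above always reports green when the process is found, so growth is
only visible in the status text. A possible extension, sketched under
assumptions, is to pick the color from size thresholds. mem_color and the
example limits of 500 and 800 MB are invented for illustration.

```sh
#!/bin/sh
# mem_color MB WARN_MB PANIC_MB
# Prints clear if MB is empty (process not found), red at or above
# PANIC_MB, yellow at or above WARN_MB, and green otherwise.
mem_color() {
    if [ -z "$1" ]
    then
        echo clear
    elif [ "$1" -ge "$3" ]
    then
        echo red
    elif [ "$1" -ge "$2" ]
    then
        echo yellow
    else
        echo green
    fi
}

# Example use in the script above:
#   COLOR=`mem_color "$PMDMEM" 500 800`
```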


list Greg L Hubbard · Tue, 24 Jul 2007 16:09:54 -0500 ·

Thanks! 


list Francesco Duranti · Tue, 24 Jul 2007 23:21:15 +0200 ·

I have a problem with bbhostgrep in the latest snapshots ... It seems that it cannot get the data from the bb-hosts file...
To give an example:
I have a host defined in the bb-hosts file as:

0.0.0.0 ITROMFS10 #   WIN:*  netapp

Now if I run bbhostgrep netapp, I get:
sh-3.1$ bbhostgrep netapp
2007-07-24 23:13:19 Cannot load bb-hosts, or file is empty

bbhostshow works correctly

Francesco

list Trent Melcher · Tue, 24 Jul 2007 16:30:43 -0500 ·

Why don't you just use the SCRIPT feature of hobbit-alerts?  You can
set up ssh authentication between your hobbit server and hobbit clients.
Then if a specific test goes red, it executes the script, which in turn
ssh's to the remote server having the issue and executes a script
there to resolve the issue or whatever you need it to do.

We did this with a legacy application we used to have.  The app would
stop listening on its ports and the only way to fix it was to respin the
application.  So hobbit would test the port and if it failed it would
send a page and fire off a script to respin the app.  After a while we got
tired of the pages, so we had it email a generic mailbox that someone
checked once in a while and stopped it paging us.  Worked great; we never
had customer complaints about that specific app after that.
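
A rough sketch of what such an alert script might look like, not the script
described above. It assumes the alert module exports BBHOSTNAME and
BBCOLORLEVEL to SCRIPT recipients and that key-based ssh login from the Hobbit
server to the client is already set up; REMOTE_SCRIPT and its path are
hypothetical.

```sh
#!/bin/sh
# Hypothetical recovery script on the monitored host.
REMOTE_SCRIPT=${REMOTE_SCRIPT:-/usr/local/bin/restart-app.sh}

# recovery_command HOST COLOR
# Prints the ssh command to run, but only for a red status.
recovery_command() {
    [ -n "$1" ] && [ "$2" = "red" ] || return 1
    echo "ssh -o BatchMode=yes $1 $REMOTE_SCRIPT"
}

# Only act when started by the alert module (BBHOSTNAME is set).
if [ -n "$BBHOSTNAME" ]
then
    if cmd=`recovery_command "$BBHOSTNAME" "$BBCOLORLEVEL"`
    then
        $cmd
    fi
fi
```

BatchMode=yes makes ssh fail instead of prompting for a password, which
matters for a script that runs unattended.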

Trent

▸ quoted from Greg L Hubbard



On Tue, 2007-07-24 at 15:55 -0500, Hubbard, Greg L wrote:

Wonder if there is any way to tell a client what its status is so it
can be autonomous?  What I mean is this:  suppose there was a way for
the Hobbit client to tell the server that service X was now in state Y,
and a client-side module could then activate response Z on its own?

I know the Hobbit model is to have the server own the configurations,
but how do we solve the "trust" problem?

GLH 

list Henrik Størner · Wed, 25 Jul 2007 00:07:11 +0200 ·

▸ quoted from Scott Walters

On Mon, Jul 23, 2007 at 09:44:11PM -0400, Scott Walters wrote:

For a given input period (Last 7 days, June 2007, etc.)

* Servers with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on server would generate list of state
changes.  "Look Bob, your server is not stable you need to get your
developers under control!"

* Services with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on service would generate list of the
state changes for that period.  "PHB, the web group is performing way
too many undocumented code changes."

I've whipped up a very rough implementation as part of the eventlog
report on the Hobbit demo site. Could you try generating a report at
   http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh
and let me know if the data at the top is something in the right
direction?

The nice thing about making it an add-on or variant of the eventlog
report is that there's already all of the nice filtering for hosts,
pages, time-periods etc in place, plus the "allevents" logfile parsing
is also done.
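
For illustration, the "Top 10" counting could be sketched as below. This is
not the code behind the eventlog report; it assumes an event log with one
state change per line, the hostname in field 1 and the service name in
field 2, and top_changers is a made-up helper.

```sh
#!/bin/sh
# top_changers FIELD [N]
# Reads event lines on stdin, counts them per value of FIELD
# (1 = host, 2 = service under the assumption above) and prints
# the N busiest entries as "count name", highest count first.
top_changers() {
    field=$1
    n=${2:-10}
    awk -v f="$field" '{ c[$f]++ } END { for (k in c) print c[k], k }' |
        sort -k1,1nr -k2,2 | head -n "$n"
}
```

Ties are broken alphabetically by name, so the output is stable from run to
run.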


Regards,
Henrik

list Henrik Størner · Wed, 25 Jul 2007 00:23:47 +0200 ·

▸ quoted from Francesco Duranti

On Tue, Jul 24, 2007 at 11:21:15PM +0200, Francesco Duranti wrote:

I have a problem with bbhostgrep in the latest snapshots ...
It seems that it cannot get the data from the bb-hosts file...

The current snapshot has this fixed.


Regards,
Henrik

list Scott Walters · Tue, 24 Jul 2007 22:15:02 -0400 ·

▸ quoted from Henrik Størner

On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

I've whipped up a very rough implementation as part of the eventlog
report on the Hobbit demo site. Could you try generating a report at
  http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh
and let me know if the data at the top is something in the right
direction?

Bingo.  But since that was so easy, here are a few more:

* So voodoo.hswn.dk has had 12 state changes . . . what were they?  It
would be nice if the server name and service name could be HTML links
which would generate a report of the state changes for the specified
server/service over the given period.

* Also, please show the total.  If there are more than 10
hosts/services use an "Other" at the end of the list.   I love seeing
single hosts on 100+ node installs with 25% of activity.  You know
where to focus.

* And I would imagine the "Top X" where X is configurable will be requested.

* And print the report period on the page so you know what you are looking at.

▸ quoted from Henrik Størner

The nice thing about making it an add-on or variant of the eventlog
report is that there's already all of the nice filtering for hosts,
pages, time-periods etc in place, plus the "allevents" logfile parsing
is also done.

We'll keep requesting features until it gets hard ;)

Scott Walters
-PacketPusher

list John G · Tue, 24 Jul 2007 22:52:48 -0400 ·

▸ quoted from Henrik Størner

On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

On Mon, Jul 23, 2007 at 09:44:11PM -0400, Scott Walters wrote:

For a given input period (Last 7 days, June 2007, etc.)

* Servers with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on server would generate list of state
changes.  "Look Bob, your server is not stable you need to get your
developers under control!"

* Services with the most state changes, sorted by highest to lowest
(Maybe just Top 10).  Clicking on service would generate list of the
state changes for that period.  "PHB, the web group is performing way
too many undocumented code changes."

I've whipped up a very rough implementation as part of the eventlog
report on the Hobbit demo site. Could you try generating a report at
   http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh
and let me know if the data at the top is something in the right
direction?

The nice thing about making it an add-on or variant of the eventlog
report is that there's already all of the nice filtering for hosts,
pages, time-periods etc in place, plus the "allevents" logfile parsing
is also done.


Regards,
Henrik

Henrik, I like this. This provides a lot of flexibility on reporting
the top ten stats. I could see where bigger sites might want more than
a top 10 listed. Maybe it could be 10 by default and have the option
to list more.

John

list Asif Iqbal · Tue, 24 Jul 2007 23:28:16 -0400 ·

▸ quoted from Greg L Hubbard

On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

On Sat, Jul 21, 2007 at 07:16:12PM -0400, Asif Iqbal wrote:

- Display column only when  it is red
 (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)

I'll leave that for later. There will probably be an entire new version
with just display things.

- SNMP trap by default
- SNMP probe option builtin

Too much for now. I need to dig into the Net-SNMP library API to do
that.

- Process specific alert
 (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)

Already in 4.2.0 via the GROUP definition in hobbit-clients.cfg and the
corresponding rule in hobbit-alerts.cfg

- Comment TAG for DOWNTIME
 (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)

Has been implemented for 4.3.0

- Add functionalities in `delay'
 (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)

Haven't looked at that.

- CPU/Memory Usage per process
 (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)

Probably impossible. Most "ps" implementations can report the current
amount of cpu/memory a process uses, but that's a snapshot (ever noticed
how "top" always has itself in the top list of cpu-using processes?).
What's interesting is not how much cpu/memory a process uses exactly when
the Hobbit client runs the "ps" command, but how much it has used on
average since the last client run - similar to what "vmstat" reports for
the system as a whole. I don't know of any way to get this data.


Well, in my `hobbit-clients.cfg' there is already an entry like this.

   PROC "%hobbitd.*" TRACK=hobbitd

It already counts the total number of processes matching %hobbitd and labels
it as hobbitd. How about letting it also sum the rss and pcpu for those
processes and just create two more RRDs?

It won't really be inaccurate, because it gives you a graphical
representation of what `ps' is telling you. Plus it could be GAUGE type
data, I guess.

At least it will give you some trend of how a process has been behaving.
Even though it may not do the pmap -x calculation, it will surely point a
finger at some heavy processes.

I bet a lot of hobbit community members would like to see ps graphs
built in to the hobbit app.
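
As a sketch of the summing proposed here (not an existing Hobbit feature),
the helper below totals rss and pcpu for all commands matching a pattern.
sum_proc_usage is a made-up name, and the input is assumed to be lines of
"rss pcpu command" such as many ps implementations print for
-o rss= -o pcpu= -o args= (rss is not available on every platform).

```sh
#!/bin/sh
# sum_proc_usage PATTERN
# Reads "RSS_KB PCPU COMMAND..." lines on stdin and prints the number
# of commands matching PATTERN (an awk regex) with their summed RSS
# and %CPU. These are snapshot values, as ps reports them.
sum_proc_usage() {
    awk -v pat="$1" '
        {
            cmd = $0
            # Drop the two numeric columns to leave only the command.
            sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", cmd)
            if (cmd ~ pat) { n++; rss += $1; cpu += $2 }
        }
        END { printf "count=%d rss_kb=%d pcpu=%.1f\n", n, rss, cpu }
    '
}

# Typical use on a live system:
#   ps -e -o rss= -o pcpu= -o args= | sum_proc_usage '^hobbitd'
```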

▸ quoted from Greg L Hubbard



Another problem with this is identifying what a process is. A
long-running daemon often forks child-processes that are short-lived;
should we add their cpu-utilisation to that of the long-running process?
If yes, then we have to monitor all processes that are started (so
running once every N seconds is not sufficient); if no, then you won't
spot the cpu hog because it was spawned as a child process.

- Text based alert for `msgs'. Currently it shows as html in my email
 (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)

Easily done with an alert script.


Regards,
Henrik

Thanks for the feedback on all of my feature requests. It is very kind of you.


-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

list Asif Iqbal · Tue, 24 Jul 2007 23:48:24 -0400 ·

▸ quoted from Asif Iqbal

On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

On Sat, Jul 21, 2007 at 07:16:12PM -0400, Asif Iqbal wrote:

- Display column only when  it is red
 (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)

I'll leave that for later. There will probably be an entire new version
with just display things.

- SNMP trap by default
- SNMP probe option builtin

Too much for now. I need to dig into the Net-SNMP library API to do
that.

- Process specific alert
 (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)

Already in 4.2.0 via the GROUP definition in hobbit-clients.cfg and the
corresponding rule in hobbit-alerts.cfg

- Comment TAG for DOWNTIME
 (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)

Has been implemented for 4.3.0

- Add functionalities in `delay'
 (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)

Haven't looked at that.

- CPU/Memory Usage per process
 (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)

Probably impossible. Most "ps" implementations can report the current
amount of cpu/memory a process uses, but that's a snapshot (ever noticed
how "top" always has itself in the top list of cpu-using processes?).
What's interesting is not how much cpu/memory a process uses exactly when
the Hobbit client runs the "ps" command, but how much it has used on
average since the last client run - similar to what "vmstat" reports for
the system as a whole. I don't know of any way to get this data.

Another problem with this is identifying what a process is. A
long-running daemon often forks child-processes that are short-lived;
should we add their cpu-utilisation to that of the long-running process?
If yes, then we have to monitor all processes that are started (so
running once every N seconds is not sufficient); if no, then you won't
spot the cpu hog because it was spawned as a child process.

- Text based alert for `msgs'. Currently it shows as html in my email
 (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)

Easily done with an alert script.


Well, all my messages show up in html format. Wouldn't it be nice to generate
the email, or have a choice to generate the email, as text type instead of
html type?

Also this email may suggest that text based email alert is possible.

  http://www.hobbitmon.com/hobbiton/2005/10/msg00382.html

However, I might be misreading that email.

Again, please understand this is still just a low priority feature request.

Until then I will just explore the script idea that you suggested.
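
A rough sketch of such an alert script, assuming it is attached to a SCRIPT
recipient in hobbit-alerts.cfg and that Hobbit passes the alert in the
documented environment variables (RCPT, BBHOSTNAME, BBSVCNAME, BBCOLORLEVEL,
BBHOSTSVC, BBALPHAMSG). The function names and the use of `mail -s' are
assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical plain-text alert script for a SCRIPT recipient.

# Remove HTML tags, then decode the most common entities.
strip_html() {
    sed -e 's/<[^>]*>//g' -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g'
}

# Build a plain-text mail body from the alert environment.
alert_body() {
    printf 'Host:    %s\n' "${BBHOSTNAME:-unknown}"
    printf 'Service: %s\n' "${BBSVCNAME:-unknown}"
    printf 'Status:  %s\n' "${BBCOLORLEVEL:-unknown}"
    printf '\n'
    printf '%s\n' "${BBALPHAMSG:-}" | strip_html
}

# Only send when Hobbit has supplied a recipient.
if [ -n "${RCPT:-}" ]; then
    alert_body | mail -s "Hobbit ${BBHOSTSVC:-alert} ${BBCOLORLEVEL:-}" "$RCPT"
fi
```

The tag stripping is deliberately crude; it is meant for the simple markup
found in status messages, not for arbitrary HTML.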

Appreciate all your work really!


Regards,

Henrik

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

list Sabeer MZ · Wed, 25 Jul 2007 11:26:33 +0530 ·

Hi Henrik,

New feature request:

I would like to add some network news to the pages. Suppose we find that
a server has a bad disk and someone will fix it later; I want to add a
note saying that the issue is being taken care of.

▸ quoted from Scott Walters



On 7/25/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:

On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

I've whipped up a very rough implementation as part of the eventlog
report on the Hobbit demo site. Could you try generating a report at
  http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh
and let me know if the data at the top is something in the right
direction?

Bingo.  But since that was so easy, here are a few more:

* So voodoo.hswn.dk has had 12 state changes . . . what were they?  It
would be nice If the server name and service name could be HTML links
which would generate a report of the state changes for the specified
server/service over the given period.

* Also, please show the total.  If there are more then 10
hosts/services use an "Other" at the end of the list.   I love seeing
single hosts on 100+ node installs with 25% of activity.  You know
where to focus.

* And I would imagine the "Top X" where X is configurable will be
requested.

* And print the report period on the page so you know what you are looking
at.

The nice thing about making it an add-on or variant of the eventlog
report is that there's already all of the nice filtering for hosts,
pages, time-periods etc in place, plus the "allevents" logfile parsing
is also done.

We'll keep requesting features until it gets hard ;)

Scott Walters
-PacketPusher

--


Thanks
Sabeer MZ

list Henrik Størner · Wed, 25 Jul 2007 11:19:04 +0200 ·

▸ quoted from Sabeer MZ

On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:

New Feature Request.-

I would like to add some network news on the pages. Suppose we found that
some server has bad disk and some one will fix it later so here i want add
the info that this issue has taken care.

Several possibilities already:

1) Ack the red/yellow statuses you have, and put this information in the
   acknowledgement text.
2) Disable the server and provide the information in the disable text.
3) Create a host "notes" file with the information.

I'd use 1) or 2). I don't see the need for a fourth way of doing this.
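
For option 2), a small sketch of building the "disable" message that bb can
send, assuming the message syntax "disable HOSTNAME.TESTNAME DURATION text"
with the duration in minutes. The helper name disable_message is
hypothetical, and replacing dots with commas in the hostname is borrowed
from the status message convention; it may not be needed here.

```shell
#!/bin/sh
# Hypothetical helper: build a "disable" message carrying an explanation.
# $1 = hostname, $2 = test name, $3 = duration in minutes, rest = free text
disable_message() {
    host=$(printf '%s' "$1" | sed -e 's/\./,/g')   # dots -> commas (assumed)
    tst=$2
    duration=$3
    shift 3
    printf 'disable %s.%s %s %s' "$host" "$tst" "$duration" "$*"
}
```

It would be sent with something like
`$BB $BBDISP "$(disable_message db1.example.com disk 240 Bad disk, replacement ordered)"`.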


Regards,
Henrik

list Charles Goyard · Wed, 25 Jul 2007 11:28:00 +0200 ·

▸ quoted from Henrik Størner

Henrik Stoerner wrote :

On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:

New Feature Request.-

I would like to add some network news on the pages. Suppose we found that
some server has bad disk and some one will fix it later so here i want add
the info that this issue has taken care.

Several possibilities already:

1) Ack the red/yellow statuses you have, and put this information in the
   acknowledgement text.
2) Disable the server and provide the information in the disable text.
3) Create a host "notes" file with the information.

Wasn't there a bb_bulletin feature too ?

▸ quoted from Charles Goyard


-- 
Charles Goyard - user-a6cdca7046e2@xymon.invalid - (+33) 1 45 38 01 31
Orange Business Services - online multimedia  // ingénierie

list Henrik Størner · Wed, 25 Jul 2007 12:16:35 +0200 ·

▸ quoted from Charles Goyard

On Wed, Jul 25, 2007 at 11:28:00AM +0200, Charles Goyard wrote:

Henrik Stoerner wrote :

On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:

New Feature Request.-

I would like to add some network news on the pages. Suppose we found that
some server has bad disk and some one will fix it later so here i want add
the info that this issue has taken care.

Several possibilities already:

1) Ack the red/yellow statuses you have, and put this information in the
   acknowledgement text.
2) Disable the server and provide the information in the disable text.
3) Create a host "notes" file with the information.

Wasn't there a bb_bulletin feature too ?

~hobbit/server/web/bulletin_header and _footer, yes. But these show up
on all pages, I think Sabeer wanted something specifically for a single 
status page.


Regards,
Henrik

list Henrik Størner · Wed, 25 Jul 2007 16:05:14 +0200 ·

▸ quoted from Scott Walters

On Tue, Jul 24, 2007 at 10:15:02PM -0400, Scott Walters wrote:

On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

I've whipped up a very rough implementation as part of the eventlog
report on the Hobbit demo site. Could you try generating a report at
 http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh
and let me know if the data at the top is something in the right
direction?

Bingo.  But since that was so easy, here are a few more:

[snip]

We'll keep requesting features until it gets hard ;)

Reports are usually rather boring things to do, but this one was fun.
Have a look at the current state of this report at the Hobbit demo site
http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh
(You can also just go to the demo site, and pick the "Reports" -> "Top
 Changes" report).

Should cover everything you've asked for - at least until now, that is.


Regards,
Henrik

list Galen Johnson · Wed, 25 Jul 2007 10:10:45 -0400 ·

Well, that's damned handy...

=G=


list Johann Eggers · Wed, 25 Jul 2007 16:12:06 +0200 ·

▸ quoted from Galen Johnson

Reports are usually rather boring things to do, but this one was fun.
Have a look at the current state of this report at the Hobbit demo site
http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh
(You can also just go to the demo site, and pick the "Reports" -> "Top
 Changes" report).

Should cover everything you've asked for - at least until now, that is.

Looks really great!
Can we have next to the numbers also the percentage related to all event
changes in the defined timeframe?

Johann

list Jason Altrincham Jones · Wed, 25 Jul 2007 15:13:39 +0100 ·

One thing that might be useful would be to alter the importance flag so
if a critical system goes down it comes in with an exclamation mark etc.
so it stands out from the other hobbit alerts....just a thought.
Jason.


list Scott Walters · Wed, 25 Jul 2007 10:28:14 -0400 ·

▸ quoted from Jason Altrincham Jones

On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

Reports are usually rather boring things to do, but this one was fun.
Have a look at the current state of this report at the Hobbit demo site
http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh
(You can also just go to the demo site, and pick the "Reports" -> "Top
 Changes" report).

Should cover everything you've asked for - at least until now, that is.

Perfect.  You rock Henrik.

Scott Walters
-PacketPusher

list Scott Walters · Wed, 25 Jul 2007 10:34:40 -0400 ·

On 7/25/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:

Perfect.  You rock Henrik.

Not quite, Could you add the total at the bottom of the "Top X" list.

Scott Walters
-PacketPusher

list Peter Welter · Wed, 25 Jul 2007 17:50:19 +0200 ·

One of the things I'd like to see in 4.3.0 is what is already partly
available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are
gathered but not plotted. I hope this monitor will make it, on all
platforms, because you can pinpoint any disk performance problems much
easier.

Thanks, Peter


2007/7/22, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid>:

▸ quoted from Stef Coene

[snip]

This doesn't mean that I won't consider adding new stuff before the

4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.

list John Glowacki · Wed, 25 Jul 2007 12:05:19 -0400 ·

▸ quoted from Peter Welter

Peter Welter wrote:

One of the things I'd like to see in 4.3.0 is what is already partly
available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are
gathered but not plotted. I hope this monitor will make it, on all
platforms, because you can pinpoint any disk performance problems much
easier.

Thanks, Peter

That would be good to have. I have been asked if hobbit does this from
other groups in the company.

John

list Johann Eggers · Wed, 25 Jul 2007 18:06:14 +0200 ·

▸ quoted from Scott Walters

Reports are usually rather boring things to do, but this one was fun.
Have a look at the current state of this report at the Hobbit demo site
http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh
(You can also just go to the demo site, and pick the "Reports" -> "Top
 Changes" report).

Should cover everything you've asked for - at least until now, that is.

Looks really great!
Can we have next to the numbers also the percentage related to all event
changes in the defined timeframe?

Wonderful.

That's the kind of information you can show your manager(s), and it tells you
where you probably have to investigate.

Thanks
Johann

list Henrik Størner · Wed, 25 Jul 2007 18:44:01 +0200 ·

▸ quoted from Johann Eggers

On Wed, Jul 25, 2007 at 04:12:06PM +0200, Johann Eggers wrote:

Reports are usually rather boring things to do, but this one was fun.
Have a look at the current state of this report at the Hobbit demo site
http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh

Looks really great!
Can we have next to the numbers also the percentage related to all event
changes in the defined timeframe?

Sure, already done.

I also added something I felt was missing: When you have the top-10 list
showing that host "foo" has the most status changes, then when you click 
on that host I wanted an overview of what services put it in the top-10.
So I added a summary by service when you click on a host in the top-10
display.

And likewise when you click on a service in the top-10 list, it gives
you a list of the hosts that were counted for that service.
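
The counting behind such a top-N list with percentages can be sketched in a
few lines. This is only an illustration, not the code behind the report: it
assumes a hypothetical input of one state change per line with the host in
field 1 and the service in field 2, which is not claimed to be the layout of
the real "allevents" log.

```shell
#!/bin/sh
# Hypothetical helper: rank hosts or services by number of state changes.
# $1 = field to group by (1 = host, 2 = service), $2 = number of rows to show
# Input on stdin: one state change per line, "host service ..."
top_changes() {
    awk -v f="$1" '
        { count[$f]++; total++ }
        END {
            for (k in count)
                printf "%d %.1f%% %s\n", count[k], 100 * count[k] / total, k
        }
    ' | sort -rn | head -n "$2"
}
```

Grouping by the other field for a single host (or service) gives the
per-host and per-service breakdowns described above; a `grep` for the name in
front of the function is enough for that.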


Regards,
Henrik

list Scott Walters · Wed, 25 Jul 2007 12:55:38 -0400 ·

▸ quoted from John Glowacki

On 7/25/07, John Glowacki <user-a1361bcdf988@xymon.invalid> wrote:

Peter Welter wrote:

One of the things I'd like to see in 4.3.0 is what is already partly
available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are
gathered but not plotted. I hope this monitor will make it, on all
platforms, because you can pinpoint any disk performance problems much
easier.

Thanks, Peter

That would be good to have. I have been asked if hobbit does this from
other groups in the company.

Tracking disk IO gets complicated pretty quickly for a few reasons:

* Operating systems don't have common commands for measuring disk performance
* Do you watch IO by filesystem or spindle?  If you have RAID,
grabbing the data can become even more difficult.
* People can disagree on what good disk IO means, and even fewer
understand disk IO workloads.
* I have no idea if Windows and the WMI has this kind of info.

For *ix, the "blocked processes" of vmstat is an excellent way to see
if the server overall is IO bound.  I would definitely like to see
that as a "stock" displayed metric in 4.3.0.  Most *ix vmstat implementations
provide that number.  Similar to the iostat for Solaris, the info is
collected, just not displayed.
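
As an illustration of pulling that number out, here is a small sketch that
assumes vmstat output in which one header line carries a column titled "b"
(as Linux and Solaris print it). The function name is hypothetical, and
column names differ between platforms, so check your own vmstat first.

```shell
#!/bin/sh
# Hypothetical helper: print the "b" (blocked processes) value from the
# last sample of vmstat output supplied on stdin.
blocked_from_vmstat() {
    awk '
        # Until the header naming the columns is found, look for a "b" field.
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "b") col = i; next }
        # Every later line is a sample; remember the most recent one.
        { last = $col }
        END { if (col) print last }
    '
}
```

Typical use would be `vmstat 5 2 | blocked_from_vmstat`, taking the second
sample because the first one usually reports averages since boot.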

Scott Walters
-PacketPusher

list Scott Walters · Wed, 25 Jul 2007 13:11:47 -0400 ·

▸ quoted from Henrik Størner

On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

I also added something I felt was missing: When you have the top-10 list
showing that host "foo" has the most status changes, then when you click
on that host I wanted an overview of what services put it in the top-10.
So I added a summary by service when you click on a host in the top-10
display.

Wow.  That is awesome.  Great idea.  Someone needs to talk to the mail
admin of SMTP for www.sslug.dk!

Could you also add the report period to the server and services "sub-reports"?

Could you add state changes by day of week and hour of day?

Mon  12
Tue   145
Wed   351

And since this all been so easy, how about trending reports based on
an interval?

For example, by week of year show total state changes for a specified
server or service.  E.G.

                   server1        server2
Week 1             10                 12
Week 2             134               23

I'll try and think of a clever way to use RRD for this kind of data.
I'd imagine we could structure the RRAs to avoid averaging, and force
timestamps to match the interval.
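
The day-of-week count can be sketched without RRD at all. The following
assumes a hypothetical input with one state change per line and a Unix epoch
timestamp in field 3; the real event log layout may differ. Days are
computed in UTC.

```shell
#!/bin/sh
# Hypothetical helper: count state changes per weekday.
# Input on stdin: "host service epoch", one state change per line.
changes_by_weekday() {
    awk '
        BEGIN { split("Sun Mon Tue Wed Thu Fri Sat", name, " ") }
        {
            # 1970-01-01 was a Thursday, hence the offset of 4 (0 = Sunday)
            day = (int($3 / 86400) + 4) % 7
            n[day]++
        }
        END { for (d = 0; d < 7; d++) printf "%s %d\n", name[d + 1], n[d] }
    '
}
```

Replacing the day calculation with `int($3 / 3600) % 24` would give the
hour-of-day variant in the same way.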

Scott Walters
-PacketPusher

list Gary Baluha · Wed, 25 Jul 2007 13:36:11 -0400 ·

One feature I'd like to see is a more comprehensive editing page for the
Critical Systems.  Specifically, it'd be nice to see all of the currently
defined groups.  This would make it a little easier when adding new hosts to
monitor in Hobbit, and ensure that they are added to the correct critical
systems group (and to avoid duplicates and near-duplicates).

list Stef Coene · Wed, 25 Jul 2007 21:48:04 +0200 ·

▸ quoted from Peter Welter

On Wednesday 25 July 2007, Peter Welter wrote:

One of the things I'd like to see in 4.3.0 is what is already partly
available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are
gathered but not plotted. I hope this monitor will make it, on all
platforms, because you can pinpoint any disk performance problems much
easier.

I have this running for AIX with an external script.  One of my todos is making the rrd hobbitd module work like the vmstat module, so you can have a definition per host type.
On the other hand, the iostat output is different than the vmstat output and the external script is working fine ....


Stef

list Charles Jones · Wed, 25 Jul 2007 12:55:42 -0700 ·

All the new features sound great. It also sounds like nearly everyone has additional features they would like to see...do you use any sort of tool for tracking feature requests?

P.S. I might as well throw in my own feature request ;-)
* Content check should correctly follow 302 (redirects).  I currently have to use a custom-made script that uses curl in order to do content checks.  In fact, I will include it in case anyone wants to use it:

#!/bin/bash
# contchk.sh written by Charles Jones (user-02bccbb1bbb5@xymon.invalid) 6/6/2007
# This script is designed to perform a content check on a URL and report the
# status to a Hobbit server.
#
# This script was created because Hobbit's built-in content check functionality
# does not follow 302 redirects.
#
# The script parses out a "contchk" tag in the bb-hosts file. The proper
# syntax is: contchk;URL;REFERRER;CHECKSTRING
#
# Note that CHECKSTRING cannot contain spaces so you must use regular
# expression metacharacters, so use something like string.with.spaces
BBHTAG=contchk     # Name of the tag in bb-hosts
COLUMN=cont        # Column display name in Hobbit
CURL=/usr/bin/curl # Location of curl binary
CURLOPTS="--connect-timeout 30 -m 30 -s -L -b cookiejar" # Curl options
# Note: using grep because bbhostgrep fails on long lines
grep $BBHTAG $BBHOME/etc/bb-hosts | while read L
   do
      set $L    # Split the bb-hosts line into positional parameters $1, $2, ...
      HOSTIP="$1"
      MACHINEDOTS="$2"
      MACHINE=`echo $2 | $SED -e's/\./,/g'`
      CHECKURL=`echo $4 | awk -F";" '{print $2}'`    # Parse out the check URL
      REFERRER=`echo $4 | awk -F";" '{print $3}'`    # Parse out the referrer string
      if [ "" != "$REFERRER" ];
         then
           REFERRER="-e $REFERRER"
      fi
      CHECKSTRING=`echo $4 | awk -F";" '{print $4}'` # Parse out the check string
      $CURL $CURLOPTS $REFERRER $CHECKURL |grep -q "$CHECKSTRING"
      status=$? # Save grep's return status
      if [ 0 -eq $status ]; then # grep returns 0 if it found something
        COLOR=green
        MSG="String <b>\"$CHECKSTRING\"</b> was found in <a href=$CHECKURL>$CHECKURL</a>"
        $BB $BBDISP "status $MACHINE.$COLUMN $COLOR `date` Content Check OK
        ${MSG}
        "
      else # grep didn't find anything
        COLOR=red
        MSG="String <b>\"$CHECKSTRING\"</b> was NOT FOUND in <a href=$CHECKURL>$CHECKURL</a>"
        $BB $BBDISP "status $MACHINE.$COLUMN $COLOR `date` Content Check FAILED
        ${MSG}
        "
      fi
   done
exit 0

list Henrik Størner · Wed, 25 Jul 2007 22:46:59 +0200 ·

▸ quoted from Scott Walters

On Wed, Jul 25, 2007 at 01:11:47PM -0400, Scott Walters wrote:

On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

I also added something I felt was missing: When you have the top-10 list
showing that host "foo" has the most status changes, then when you click
on that host I wanted an overview of what services put it in the top-10.
So I added a summary by service when you click on a host in the top-10
display.

Wow.  That is awesome.  Great idea.  Someone needs to talk to the mail
admin of SMTP for www.sslug.dk!

Could you also add the report period to the server and services 
"sub-reports"?

Done.

▸ quoted from Scott Walters

Could you add state changes by day of week and hour of day?
And since this all been so easy, how about trending reports based on
an interval?

Let's leave those for now - these will be more difficult to implement.
The only other addition I'd like to make for this report now is to
have it count the event durations instead of the number of changes,
so you can have a top-10 report of the hosts (or services) that have the 
longest outages. Could be useful when playing the "blame game".
"Look - the DB people are always soooo slow when it comes to cleaning 
up the filled tables".


Regards,
Henrik

list Ralph Mitchell · Thu, 26 Jul 2007 14:35:26 -0500 ·

▸ quoted from Henrik Størner

On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

On Wed, Jul 25, 2007 at 01:11:47PM -0400, Scott Walters wrote:

On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

I also added something I felt was missing: When you have the top-10 list
showing that host "foo" has the most status changes, then when you click
on that host I wanted an overview of what services put it in the top-10.
So I added a summary by service when you click on a host in the top-10
display.


Just a couple of minor observations on the  top-10 list:

1) the right hand box, "Top 10 Services" has a "Host" column.  Probably
should be "Service"??

2) I was lazy and just put in the date, with no time of day, and scored an
"Internal Server Error".  Could it default to "from 00:00:00" & "to
23:59:59", or maybe have a "last XX minutes OR from/to", same as the
Notification Report??

Thanks,

Ralph Mitchell

list Henrik Størner · Thu, 26 Jul 2007 22:21:52 +0200 ·

▸ quoted from Ralph Mitchell

On Thu, Jul 26, 2007 at 02:35:26PM -0500, Ralph Mitchell wrote:

Just a couple of minor observations on the  top-10 list:

1) the right hand box, "Top 10 Services" has a "Host" column.  Probably
should be "Service"??

Of course - fixed.

▸ quoted from Ralph Mitchell

2) I was lazy and just put in the date, with no time of day, and scored an
"Internal Server Error".  Could it default to "from 00:00:00" & "to
23:59:59", or maybe have a "last XX minutes OR from/to", same as the
Notification Report??

I'll have to do some extra checking on that input. I've also added some
buttons so you can easily select the last/current year/month/week.


Regards,
Henrik

list Sabeer MZ · Tue, 31 Jul 2007 13:15:27 +0530 ·

Many thanks. I'll check it out...

▸ quoted from Henrik Størner



On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:

On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:

New Feature Request.-

I would like to add some network news on the pages. Suppose we found
that
some server has bad disk and some one will fix it later so here i want
add
the info that this issue has taken care.

Several possibilities already:

1) Ack the red/yellow statuses you have, and put this information in the
   acknowledgement text.
2) Disable the server and provide the information in the disable text.
3) Create a host "notes" file with the information.

I'd use 1) or 2). I don't see the need for a fourth way of doing this.


Regards,
Henrik

--


Thanks
Sabeer MZ

list Jason Altrincham Jones · Tue, 31 Jul 2007 10:51:38 +0100 ·

Hi All,

I've been looking through the archives at the sample server side module
Henrik showed us (http://www.hswn.dk/hobbiton/2007/01/msg00487.html).
I'm just curious if anyone knows how to send data from the client side
to the server the same way the standard hobbit client tests do. Looking
at bb, I'm guessing either bb data or bb client, but when I run bb
<hobbitIP> "client <hostname>.<os>" nothing happens, and the man pages
don't mention how to actually send the data etc.

Any help appreciated,

Thanks,

Jason.

list Ralph Mitchell · Tue, 31 Jul 2007 05:32:21 -0500 ·

▸ quoted from Jason Altrincham Jones

On 7/31/07, Jones, Jason (Altrincham) <user-ee957b46acd2@xymon.invalid> wrote:

I've been looking through the archives at the sample server side module
Henrick showed us
(http://www.hswn.dk/hobbiton/2007/01/msg00487.html) I'm
just curious if anyone knows how to send data from the client side to the
server the same way the standard hobbit client tests do, looking at bb I'm
guessing either bb data or bb client, but when I run bb <hobbitIP> "client
<hostname>.<os>" nothing happens and the man pages don't mention how to
actually send the data etc.

I have a Hobbit client install running a BigIP check.  In
client/etc/clientlaunch.cfg:

      [bigip-v4]
          ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
          CMD $HOBBITCLIENTHOME/ext/bigip/bigip3.sh
          LOGFILE $HOBBITCLIENTHOME/logs/hobbitclient.log
          INTERVAL 5m

After doing what it needs to do to get the status, the script sends
off a status message to the server like this:

      MACHINE=`echo $NAME | sed -e 's/\./,/g'`
      MESSAGE="status $MACHINE.$TEST $COLOR `date`<P><font size=+2>The \
$BIGIP BigIP says: $NAME $TEST is $STATE</font>"
      $BB $BBDISP "$MESSAGE"

Is that what you're looking for??

Ralph Mitchell

list Sofian Brabez · Tue, 31 Jul 2007 13:33:29 +0200 ·

Hello Jones,

 
You can use the following command line to send data to your Hobbit
server:

bbuser@server:$BBHOME/bin$ bb <hobbitdisplay> "status <hostname>.<test>
<color> <date> <message>"

<hobbitdisplay> is your BBDISPLAY, as set in your $BBHOME/etc/bb-hosts agent
file
<hostname> is the name of the host
<date> is the current date; I suggest you use the default unix command
`date`
<test> is the service to monitor, for example cpu, conn, disk, msgs, and you
can put a regular expression
<color> is the color on BBDISPLAY
<message> is the message you want to display on your BBDISPLAY; you can
put HTML text into it for a better visual appearance

I hope this answers your question and helps you.
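
Put together as a small helper, the status message could be built like this.
It is a sketch only: the function name is hypothetical, and it assumes the
usual convention that dots in the hostname are replaced by commas so the
last dot separates host from test.

```shell
#!/bin/sh
# Hypothetical helper: build a status message for bb.
# $1 = hostname, $2 = test name, $3 = color, rest = message text
status_message() {
    host=$(printf '%s' "$1" | sed -e 's/\./,/g')   # dots -> commas
    tst=$2
    color=$3
    shift 3
    # First line: "status host.test color date"; the text follows below it.
    printf 'status %s.%s %s %s\n%s\n' "$host" "$tst" "$color" "$(date)" "$*"
}
```

It would be sent as a single quoted argument, for example
`$BB $BBDISP "$(status_message www.example.com mycheck green All good)"`.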

 
Regards

--

Sofian Brabez

Monitoring Team

Natixis France 

user-2ae52e06a4a1@xymon.invalid



list Jason Altrincham Jones · Tue, 31 Jul 2007 12:38:03 +0100 ·

Not really, that sends the predetermined colours, what I was thinking is
more sending the output of command x and then have hobbit generate the
webpage.

Any ideas?
Jason.
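
One possible direction, sketched under assumptions: bb also accepts a "data"
message ("data HOSTNAME.DATANAME" followed by free text), which carries raw
output without a color and leaves the interpretation to something on the
server side, such as a custom module listening on the data channel. The
helper name below is hypothetical, and whether this fits the sample module
mentioned above has not been checked.

```shell
#!/bin/sh
# Hypothetical helper: wrap raw command output from stdin in a "data" message.
# $1 = hostname, $2 = data name
data_message() {
    host=$(printf '%s' "$1" | sed -e 's/\./,/g')   # dots -> commas
    printf 'data %s.%s\n' "$host" "$2"
    cat    # pass the command output through unchanged
}
```

For example: `$BB $BBDISP "$(iostat | data_message web1.example.com iostat)"`.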

▸ quoted from Ralph Mitchell


-----Original Message-----
From: Ralph Mitchell [mailto:user-00a5e44c48c0@xymon.invalid] 
Sent: 31 July 2007 11:32
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] sending client side data

On 7/31/07, Jones, Jason (Altrincham) <user-ee957b46acd2@xymon.invalid> wrote:

I've been looking through the archives at the sample server side
module
Henrick showed us
(http://www.hswn.dk/hobbiton/2007/01/msg00487.html) I'm
just curious if anyone knows how to send data from the client side to
the
server the same way the standard hobbit client tests do, looking at bb

I'm

guessing either bb data or bb client, but when I run bb <hobbitIP>

"client

<hostname>.<os>" nothing happens and the man pages don't mention how
to
actually send the data etc.

I have a Hobbit client install running a BigIP check.  In
client/etc/clientlaunch.cfg:

      [bigip-v4]
          ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
          CMD $HOBBITCLIENTHOME/ext/bigip/bigip3.sh
          LOGFILE $HOBBITCLIENTHOME/logs/hobbitclient.log
          INTERVAL 5m

After doing what it needs to do to get the status, the script sends
off a status message to the server like this:

      MACHINE=`echo $NAME | sed -e 's/\./,/g'`
      MESSAGE="status $MACHINE.$TEST $COLOR `date`<P><font size=+2>The
$BIGIP BigIP says: $NAME $TEST is $STATE</font>"
      $BB $BBDISP $MESSAGE

Is that what you're looking for??
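
If the goal is to ship the raw output of a command rather than a fixed
sentence, the same kind of status message can carry it as the body. A
minimal sketch, assuming BB and BBDISP come from the client environment
as in the script above; build_status and send_status are hypothetical
helper names, and the colour still has to be chosen on the client side:

```shell
#!/bin/sh
# Hypothetical sketch: wrap the output of an arbitrary command in a Hobbit
# "status" message. BB and BBDISP are normally set by the client
# environment; the defaults below are stand-ins for a dry run.
BB=${BB:-echo}
BBDISP=${BBDISP:-127.0.0.1}

# build_status HOST TEST COLOR COMMAND...
# Prints a status message whose body is the output of COMMAND.
build_status() {
    host=$1; testname=$2; color=$3; shift 3
    # Dots in the hostname are sent as commas, as in the script above.
    machine=$(echo "$host" | sed -e 's/\./,/g')
    printf 'status %s.%s %s %s\n\n%s\n' \
        "$machine" "$testname" "$color" "$(date)" "$("$@" 2>&1)"
}

# send_status HOST TEST COLOR COMMAND...
# Hands the whole message to bb as one quoted argument.
send_status() {
    "$BB" "$BBDISP" "$(build_status "$@")"
}
```

For example, send_status "$NAME" uptime green uptime would report the
output of uptime in a green "uptime" column. Letting the server pick the
colour from the data would need a server-side module instead.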

Ralph Mitchell

list Buchan Milne · Fri, 3 Aug 2007 08:31:29 +0200 ·

▸ quoted from Trent Melcher

On Tuesday 24 July 2007 22:55:02 Hubbard, Greg L wrote:

Wonder if there is any way to tell a client what its status is so it
can be autonomous?  What I mean is this:  suppose there was a way for
the Hobbit client to tell the server that service X was now in state Y,
and a client-side module could then activate response Z on its own?

I don't like band-aids like this.

"restart because it's down" prevents the real impact of problems being seen, and provides less motivation for fixing things properly. Instead, you sit with frequent short outages (which may avoid the attention of managers, production managers) which have end-user impact.

I like even less using a monitoring system to do this ...

Regards,
Buchan

list Tod Hansmann · Fri, 3 Aug 2007 08:39:56 -0600 ·

In my experience, I have to agree.  Hobbit is for monitoring so the
information that x is down gets to people who can properly diagnose what
is going on, not take generic actions.  If generic actions were
something that were required for X to function properly, it should be a
feature of that software.

Hobbit CAN do some scripting based on alerts, but even that might be a
bit more than a systems administrator wants to hinder himself with.

Tod Hansmann
Network Engineer


list Galen Johnson · Fri, 3 Aug 2007 11:18:23 -0400 ·

Don't forget... this is the model that Tivoli, HP OpenView, and many
other commercial monitoring solutions provide and sell as a feature.
From my experience as a sys admin, I've always found automatically
restarting a service when it goes down to be "a bad thing"(TM).

In many solutions, logs that would be integral to the real resolution
and prevention get overwritten upon a restart.

=G=


list Thomas Kern · Fri, 3 Aug 2007 11:31:28 -0400 ·

When a monitoring system detects something wrong, the only action I
want the monitor to perform is to get the admin (or the admin's boss)
moving to diagnose and fix the problem.  And I am the admin that I am
most concerned with. I don't understand most of the errors well enough
to automate a recovery process.
/Thomas Kern
/XXX-XXX-XXXX


list Greg L Hubbard · Fri, 3 Aug 2007 10:38:58 -0500 ·

Well, I use Netcool which has the opposite philosophy -- there is a
"process automation" system that watches processes and restarts them if
they fail, while also logging restarts.  You can configure a "restart"
parameter to be anything from 0 (forever) to any number of times.  I
like to set a reasonable number so persistent errors eventually kill the
process, but occasional errors do not.  Log files are not overwritten,
but are appended and rotated.
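
The restart limit described above might look roughly like this in a
shell wrapper. This is only an illustration of the idea, not how Netcool
implements it; run_with_restarts, RESTART_LOG and RESTART_DELAY are
made-up names:

```shell
#!/bin/sh
# Hypothetical sketch of a bounded restart loop.
# run_with_restarts MAX COMMAND...
#   MAX=0 means "restart forever"; otherwise give up after MAX restarts.
run_with_restarts() {
    max=$1; shift
    restarts=0
    while :; do
        "$@" && return 0              # clean exit: nothing to restart
        if [ "$max" -gt 0 ] && [ "$restarts" -ge "$max" ]; then
            echo "giving up after $restarts restarts: $*" >&2
            return 1
        fi
        restarts=$((restarts + 1))
        # Append to the log so earlier entries are never overwritten.
        echo "$(date) restart #$restarts: $*" >> "${RESTART_LOG:-/dev/null}"
        sleep "${RESTART_DELAY:-1}"
    done
}
```

With a limit of 3, a command that fails every time is started four
times in total (the first run plus three restarts) before the wrapper
gives up.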

But whatever.  My view seems to be in the minority -- guess the rest of
you don't mind 24x7x365 babysitting.

GLH


list S Aiello · Fri, 3 Aug 2007 13:06:11 -0400 ·

▸ quoted from Greg L Hubbard

On Friday 03 August 2007 11:38, Hubbard, Greg L wrote:

Well, I use Netcool which has the opposite philosophy -- there is a
"process automation" system that watches processes and restarts them if
they fail, while also logging restarts.  You can configure a "restart"
parameter to be anything from 0 (forever) to any number of times.  I
like to set a reasonable number so persistent errors eventually kill the
process, but occasional errors do not.  Log files are not overwritten,
but are appended and rotated.

But whatever.  My view seems to be in the minority -- guess the rest of
you don't mind 24x7x365 babysitting.

GLH

To restart a process, some form of intelligence has to be added to the restart script, especially when recovering from a failure mode. Scripts can only have so much intelligence, so a restart script could be dangerous unless it deals with a simple situation.

Having said all this, I have to admit that I do have scripts that query the status of the monitoring server and perform a restart on reds. There should be nothing stopping you from implementing the same. It is just a very fine line when deciding when/how to implement process restarts.

More often than not, it is much better for a person to react to an alert than a script. But for recurring failure modes, these scripts do help, and I don't get called at 3 am.

So if you really need to implement restart scripts, just use the bb tool's query feature.

 ~Steve

list Scott Walters · Fri, 3 Aug 2007 13:15:27 -0400 ·

I am definitely in the "monitor only" camp.  As appealing as
"self-healing" may seem, I've seen attempts go horribly wrong too many
times.  For example, shutting down Oracle for upgrades and then having
it restarted in the middle of the upgrade.  Not good.

I also agree that "self-healing" lends itself to band-aids that avoid
root-cause determination.  I don't think this requires "baby-sitting,"
but a commitment to fixing things once.  I have also had the
displeasure of making permanent band-aids, but I cannot condone it.

All of those "operational" aspects aside, I've convinced myself from a
security point of view, corrective action from monitoring is bad-- a
clear violation of the separation of duties.  You don't want your
auditors "cleaning up" the numbers as they go over your books.

You know what's better than your webserver being automatically
restarted when it crashes?  Your webserver not crashing.

I completely support the absence of corrective actions from monitor
triggers.  The question I have yet to answer satisfactorily is, "Should
the monitoring system perform additional data collection after
specific errors?"  For example, running a particular "find" command
when disk usage increases to try and identify which files are causing
the partition to fill.
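
As an illustration of that kind of follow-up data collection, a check
could list the large files on the affected filesystem once usage crosses
a threshold. A small sketch; big_files is a hypothetical helper, and the
-size +Nk form is assumed to be supported by the local find (GNU and BSD
find accept it):

```shell
#!/bin/sh
# Hypothetical sketch: list regular files under DIR that are larger than
# SIZE_KB kilobytes, without crossing into other mounted filesystems.
# big_files DIR SIZE_KB
big_files() {
    find "$1" -xdev -type f -size +"$2"k -print
}
```

Something like big_files /var 102400 could then be appended to the
status message, so whoever picks up the alert sees the candidates
straight away.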


Scott Walters
-PacketPusher


list Henrik Størner · Fri, 3 Aug 2007 21:53:14 +0200 ·

On Fri, Aug 03, 2007 at 01:15:27PM -0400, Scott Walters wrote:

I am definitely in the "monitor only" camp.

Me too. For those who feel differently, Hobbit does provide the
necessary hooks so you can trigger actions from some status going red;
either through alert scripts, or from the bb "query" command which
others have mentioned. In fact, I implemented the "query" feature
because I needed it to set up such an automated recovery for one of
our customers at work.
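
For anyone who wants to try the query route, a rough sketch follows. It
assumes the response to "query HOSTNAME.TESTNAME" starts with the
current colour (check bb(1) for your version); status_color and
should_restart are hypothetical helper names, and the restart command
itself is left to the reader:

```shell
#!/bin/sh
# Hypothetical sketch: decide on a restart from the status the server holds.
# BB and BBDISP normally come from the Hobbit environment.
BB=${BB:-bb}
BBDISP=${BBDISP:-127.0.0.1}

# status_color HOST TEST
# Prints the first word of the query response, assumed to be the colour.
status_color() {
    "$BB" "$BBDISP" "query $1.$2" | awk 'NR == 1 { print $1 }'
}

# should_restart HOST TEST
# Succeeds only on red. Blue (a disabled test, e.g. planned downtime)
# and every other colour are left alone.
should_restart() {
    [ "$(status_color "$1" "$2")" = "red" ]
}
```

A cron job or extension script could then run something like
should_restart www.example.com http && /path/to/restart-script, where
both the host name and the script path are placeholders.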

▸ quoted from Scott Walters

All of those "operational" aspects aside, I've convinced myself from a
security point of view, corrective action from monitoring is bad-- a
clear violation of the separation of duties.  You don't want your
auditors "cleaning up" the numbers as they go over your books.

Good point.

▸ quoted from Scott Walters

The question I have yet to answer satisfactorily is,"Should
the monitoring system perform additional data collection after
specific errors?"  For example, running a particular "find" command
when disk usage increases to try and identify which files are causing
the partition to fill.

It can be very useful at times, especially when you have to do a 
"root cause analysis" to explain why some service was down at 2 AM in
the morning - and the problem was fixed by a 2nd-level technician who 
just rebooted the box. That's why I added the feature that Hobbit saves
the latest client-data report when a status goes yellow or red. It has
helped me track down the cause of quite a few service outages.


Regards,
Henrik

list Dave Haertig · Fri, 3 Aug 2007 14:15:25 -0600 ·

Sometimes the real world runs interference for Utopia.  While in Utopia
you want to analyse, find the root cause, and fix everything before
proceeding, you can't always do that.  When an outage of one hour costs
your company tens of thousands of dollars, you can't justify withholding
a simple bandaid (so long as you don't then ignore the long term fix).

Most everything I do in Hobbit is a custom script.  Restarting crashed
processes is one of the least of my worries.  Although in some rare
cases I do just that (short term), with appropriate logging and email to
the app development team.  The corporate expense of having the app down
is too great to let Utopian ideas prevail.

Most of the automated Hobbit stuff I do is not restarting dead apps
(luckily, that is very infrequent around here).  It's more mundane.  One
example is disk space.  A full filesystem would shut many things down.
Apps should not fill a filesystem, but sometimes they do.  So my custom
Hobbit scripts first scream and scream about low disk space, even
analysing things down to specific subdirectories and fast growing files
and doing trend analysis.  But if their call is not answered, they start
freeing up space from a "private reserve" I have set aside to deal with
emergencies.  So if we experience a sudden unexpected blowup in a
filesystem at 3am, Hobbit keeps things running in production until the
appropriate people can look into and diagnose the problem.  This may not
be Utopian behavior, but it sure is practical at 3am in the morning!

But my vote would be for Hobbit out-of-the-box to NOT attempt automated
repair actions.  That should be left to the Hobbit administrator.  We
can write custom monitor scripts or custom alert scripts to add this
functionality if it's appropriate for our environments.  It's trivial to
integrate your own scripting into Hobbit.

I sure wish I worked in Utopia though.  The job would be a helluva lot
less stressful!  :-)



list Kolbjørn Barmen · Sat, 4 Aug 2007 03:36:02 +0200 (CEST) ·

On Fri, 3 Aug 2007, Haertig, David F (Dave) wrote:

I sure wish I worked in Utopia though.

Ditto, since there would be no top-posters around...

-- 
 Kolbjørn Barmen
 UNINETT Driftsenter

list Gary Baluha · Mon, 6 Aug 2007 09:28:31 -0400 ·

▸ quoted from Dave Haertig

On 8/3/07, Haertig, David F (Dave) <user-68874b735d77@xymon.invalid> wrote:

Most everything I do in Hobbit is a custom script.  Restarting crashed
processes is one of the least of my worries.  Although in some rare
cases I do just that (short term), with appropriate logging and email to
the app development team.  The corporate expense of having the app down
is too great to let Utopian ideas prevail.


Agreed, though sometimes it's worth the effort for an extra few minutes of
downtime to do *some* analysis.

▸ quoted from Dave Haertig


Most of the automated Hobbit stuff I do is not restarting dead apps
(luckily, that is very infrequent around here).  It's more mundane.  One
example is disk space.  A full filesystem would shut many things down.
Apps should not fill a filesystem, but sometimes they do.  So my custom
Hobbit scripts first scream and scream about low disk space, even
analysing things down to specific subdirectories and fast growing files
and doing trend analysis.  But if their call is not answered, they start
freeing up space from a "private reserve" I have set aside to deal with
emergencies.  So if we experience a sudden unexpected blowup in a
filesystem at 3am, Hobbit keeps things running in production until the
appropriate people can look into and diagnose the problem.  This may not
be Utopian behavior, but it sure is practical at 3am in the morning!


What sort of trend analysis do your scripts perform?  We have a few boxes
that are notorious for filling up their disk space, and I haven't yet come
up with an idea of how to neatly track exactly what it is that keeps filling
up the disk.

▸ quoted from Dave Haertig


But my vote would be for Hobbit out-of-the-box to NOT attempt automated
repair actions.  That should be left to the Hobbit administrator.  We
can write custom monitor scripts or custom alert scripts to add this
functionality if it's appropriate for our environments.  It's trivial to
integrate your own scripting into Hobbit.


Due to the demands of some of the other admins, I have implemented a script
that does some rudimentary restarting, and even looks at the status of the
specific Hobbit alert in question, so that it doesn't try to restart
something, if the alert has been disabled (such as for a planned downtime).

It wasn't all that hard to write, and I also would prefer Hobbit NOT have
auto-restart logic out of the box.

▸ quoted from Dave Haertig


I sure wish I worked in Utopia though.  The job would be a helluva lot
less stressful!  :-)


Working in the real world isn't so bad compared to working in a real
world where management _thinks_ you actually work in Utopia, and yet
still can't spare an extra second of downtime for real-time root cause
analysis. ;-)

list Dave Haertig · Mon, 6 Aug 2007 13:25:46 -0600 ·

I try to identify filesystem "space hogs" via custom scripts I wrote a
long time ago when using BB.  99% of my custom stuff is done in PERL.
 
I use 'du -k' to get the size of all directories in the filesystem.  I
then cut those results down to only the first and second level
directories (but you could go as deep as you want).  I store the size of
each subdirectory in a small "database".  I did this ages ago and my
code uses PERL's "Storable" module to store the accumulated data into a
file (called my "database").  These days I'd just use Hobbit's easily
accessed RRD files.  I then use PERL's
Statistics::Descriptive::least_squares_fit() to calculate the slope and
linear correlation coefficient of the "best fit line".  This allows me
to see how fast each subdirectory is growing/shrinking, and how linear
that growth/reduction is.  I trigger yellow/red conditions based on rate
of growth and predicted fill time at current growth rate, in addition to
the standard "95% full = red" test.
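
The slope, correlation coefficient and predicted fill time can be
computed with nothing more than awk. A sketch under stated assumptions:
the input format (one "timestamp used_kb" pair per line), the function
name fit_trend and the capacity argument are all invented for
illustration and are not taken from the Perl script described above:

```shell
#!/bin/sh
# Hypothetical sketch of a least-squares trend check.
# fit_trend CAPACITY_KB
#   stdin : lines of "timestamp used_kb" (timestamp in seconds)
#   stdout: slope (KB/s), correlation coefficient r, seconds until full
#           (-1 if usage is not growing)
fit_trend() {
    awk -v cap="$1" '
        NR == 1 { t0 = $1 }      # offset timestamps to keep the sums small
        {
            x = $1 - t0; y = $2
            n++; sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
            last = y
        }
        END {
            dx = n * sxx - sx * sx
            dy = n * syy - sy * sy
            if (n < 2 || dx == 0) {
                print "need at least two distinct timestamps"
                exit 1
            }
            slope = (n * sxy - sx * sy) / dx
            r = (dy > 0) ? (n * sxy - sx * sy) / sqrt(dx * dy) : 0
            # Time to fill, measured from the last observed usage.
            eta = (slope > 0) ? (cap - last) / slope : -1
            printf "%.3f %.3f %.0f\n", slope, r, eta
        }'
}
```

A value of r close to 1 means the growth is nearly linear, so the
predicted fill time is worth trusting; a low r suggests bursty growth
where the plain percentage threshold is the better alarm.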
 
The above makes it fairly easy to identify which subdirectory is your
problem, which is often times good enough to identify the file/process
that is killing you.  When that's not, I have a separate test that tries
to identify problem files a different way.  BB/Hobbit uses 'top' to
identify cpu-hogging processes.  Many times you see files hogging space
are directly tied to processes hogging cpu (runaway process = runaway
file in many cases).  'top' identifies the process(es), then "lsof -p
<pid>" is used to identify the files that the suspect process has open.
Finding a cpu-hogger that has a filespace-hogger open is usually the
holy grail you seek.
 
As a "repair" action for Hobbit, I squirreled away 2 GB of disk space in
100 MB chunks for critical filesystems.  "dd if=/dev/zero
of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then
"cp reserve01 reserve02", etc. to build up the reserve.  A separate
Hobbit "notification script" is used to simply delete files from this
reserve under dire circumstances, after normal email/pager notifications
have failed to trigger action by developers/production support people.
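
Spelled out as a script, the reserve idea might look like this. The
make_reserve and release_chunk names and their arguments are
hypothetical; the dd invocation mirrors the one quoted above:

```shell
#!/bin/sh
# Hypothetical sketch of a disk space reserve.
# make_reserve DIR CHUNKS CHUNK_KB
#   Creates CHUNKS files named reserve01, reserve02, ... of CHUNK_KB each.
make_reserve() {
    mkdir -p "$1" || return 1
    i=1
    while [ "$i" -le "$2" ]; do
        dd if=/dev/zero of="$1/reserve$(printf '%02d' "$i")" \
            bs=1024 count="$3" 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# release_chunk DIR
#   Deletes one reserve file and reports it; fails when none are left.
release_chunk() {
    for f in "$1"/reserve*; do
        [ -f "$f" ] || return 1       # glob did not match: reserve is empty
        rm -f "$f" && echo "released $f"
        return
    done
}
```

make_reserve /filesystem/DiskSpaceReserve 20 102400 would build the 2 GB
reserve in 100 MB chunks, and a notification script could call
release_chunk once per escalation step.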
 
My BB/Hobbit custom scripts tend to get quite involved.  Probably too
much so, but they're fun for me to write!



list Buchan Milne · Wed, 8 Aug 2007 18:20:04 +0200 ·

▸ quoted from Scott Walters

On Friday 03 August 2007 19:15:27 Scott Walters wrote:

I am definitely in the "monitor only" camp.  As appealing as
"self-healing" may seem, I've seen attempts go horribly wrong too many
times.  For example, shutting down Oracle for upgrades and then having
it restarted in the middle of the upgrade.  Not good.

How about the easy example of a web server not responding? Do you restart it? In the case I am thinking of, no, since the reason it is not responding is that the database server it (and another 4 web servers) is waiting for is having problems. Restarting the web server would drop the >1000 existing (working) sessions, causing a full-blown outage, and migrate the problem to the other 4 web servers that sit behind the same load balancer.

▸ quoted from Dave Haertig

I also agree that "self-healing" lends itself to band-aids that avoid
root-cause determination.

Or *prevent* the root-cause determination. For example, I had a problem on an LDAP server that appeared once every 2 or 3 weeks. I started it under a debugger, and when the problem next occurred, some online debugging (after taking it out of the pool) with a developer found and fixed the bug within one hour (and allowed me to understand the cause so I could work around it). A restart here would have meant waiting some more and another few outages.

▸ quoted from Dave Haertig

I don't think this requires "baby-sitting," but a commitment to fixing things once.  I have also had the
displeasure of making permanent band-aids, but I cannot condone it.

We do have some applications that require supervision ... but for them we use daemontools or supervise-scripts (a re-implementation of daemontools), as these are *much* better at supervision than a monitoring system. If you really need a baby-sitter, the monitoring system isn't the best one ...

▸ quoted from Dave Haertig

All of those "operational" aspects aside, I've convinced myself from a
security point of view, corrective action from monitoring is bad-- a
clear violation of the separation of duties.  You don't want your
auditors "cleaning up" the numbers as they go over your books.

You know what's better than your webserver being automatically
restarted when it crashes?  Your webserver not crashing.

I completely support the absence of corrective actions from monitor
triggers.  The question I have yet to answer satisfactorily is,"Should
the monitoring system perform additional data collection after
specific errors?"  For example, running a particular "find" command
when disk usage increases to try and identify which files are causing
the partition to fill.

Or attach a debugger to the hung process and get a backtrace ?

Regards,
Buchan

list Buchan Milne · Wed, 8 Aug 2007 18:28:30 +0200 ·

▸ quoted from Dave Haertig

On Monday 06 August 2007 21:25:46 Haertig, David F (Dave) wrote:

I try to identify filesystem "space hogs" via custom scripts I wrote a
long time ago when using BB.  99% of my custom stuff is done in PERL.

I use 'du -k' to get the size of all directories in the filesystem.  I
then cut those results down to only the first and second level
directories (but you could go as deep as you want).  I store the size of
each subdirectory in a small "database".  I did this ages ago and my
code uses PERL's "Storable" module to store the accumulated data into a
file (called my "database").  These days I'd just use Hobbit's easily
accessed RRD files.  I then use PERL's
Statistics::Descriptive::least_squares_fit() to calculate the slope and
linear correlation coefficient of the "best fit line".

This would be really useful to do on directories monitored with the dir option 
in client-local.cfg plus DIR option in hobbit-clients, e.g. to be able to 
specify alerts at specified "time before disk is full".

▸ quoted from Dave Haertig

This allows me 
to see how fast each subdirectory is growing/shrinking, and how linear
that growth/reduction is.  I trigger yellow/red conditions based on rate
of growth and predicted fill time at current growth rate, in addition to
the standard "95% full = red" test.

The above makes it fairly easy to identify which subdirectory is your
problem, which is oftentimes good enough to identify the file/process
that is killing you.  When that's not, I have a separate test that tries
to identify problem files a different way.  BB/Hobbit uses 'top' to
identify cpu-hogging processes.  Many times you see files hogging space
are directly tied to processes hogging cpu (runaway process = runaway
file in many cases).  'top' identifies the process(es), then "lsof -p
<pid>" is used to identify the files that the suspect process has open.
Finding a cpu-hogger that has a filespace-hogger open is usually the
holy grail you seek.
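
The growth-rate test described in this message can be sketched in a few lines. This is a minimal Python sketch, not Dave's Perl code: it fits a straight line to (timestamp, used-KB) samples by ordinary least squares and extrapolates to the moment usage would reach capacity. The function names, the sample layout and the yellow/red thresholds are all assumptions made for illustration; the original used Perl's Statistics::Descriptive and a Storable file.

```python
import math
from typing import Optional, Sequence, Tuple


def least_squares_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, r) of the best-fit line y = slope*x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two samples, and equally many x and y values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is the linear correlation coefficient; 0.0 if usage never changed.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


def seconds_until_full(times: Sequence[float], used_kb: Sequence[float],
                       capacity_kb: float) -> Optional[float]:
    """Predicted seconds from the last sample until capacity is reached.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept, _ = least_squares_fit(times, used_kb)
    if slope <= 0:
        return None
    time_full = (capacity_kb - intercept) / slope
    return max(0.0, time_full - times[-1])


def growth_color(seconds_left: Optional[float],
                 yellow_below: float = 7 * 86400,
                 red_below: float = 86400) -> str:
    """Map a predicted fill time to a status color (thresholds are hypothetical)."""
    if seconds_left is None:
        return "green"
    if seconds_left < red_below:
        return "red"
    if seconds_left < yellow_below:
        return "yellow"
    return "green"
```

A correlation coefficient far from 1 suggests the growth is not linear, in which case the predicted fill time should probably be treated with suspicion rather than alerted on.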

The "CPU usage by process" graph is the utopian one ...

▸ quoted from Dave Haertig

As a "repair" action for Hobbit, I squirreled away 2Gb of diskspace in
100Mb chunks for critical filesystems.  "dd if=/dev/zero
of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then
"cp reserve01 reserve02", etc. to build up the reserve.
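
For anyone who wants to script the reserve rather than type the dd/cp commands by hand, here is a rough Python sketch of the same idea. The helper names and the "delete the highest-numbered chunk" release policy are assumptions for illustration; the quoted message only describes creating the files. The files are filled with real zero bytes on purpose, since a sparse file would not actually hold the space back.

```python
import os
from typing import List, Optional


def create_reserve(directory: str, count: int, size_bytes: int,
                   chunk: int = 1024 * 1024) -> List[str]:
    """Create `count` zero-filled files named reserve01, reserve02, ... in `directory`."""
    os.makedirs(directory, exist_ok=True)
    zeros = bytes(chunk)
    paths = []
    for index in range(1, count + 1):
        path = os.path.join(directory, f"reserve{index:02d}")
        remaining = size_bytes
        with open(path, "wb") as handle:
            # Write actual data so the blocks are allocated, not just promised.
            while remaining > 0:
                step = min(chunk, remaining)
                handle.write(zeros[:step])
                remaining -= step
        paths.append(path)
    return paths


def release_one(directory: str) -> Optional[str]:
    """Delete the highest-numbered reserve file and return its path, or None if none are left."""
    names = sorted(name for name in os.listdir(directory) if name.startswith("reserve"))
    if not names:
        return None
    victim = os.path.join(directory, names[-1])
    os.remove(victim)
    return victim
```

Matching the sizes in the message would mean something like create_reserve("/filesystem/DiskSpaceReserve", 20, 100 * 1024 * 1024). On filesystems that compress or deduplicate data, zero-filled files may not reserve anything, so treat this as a sketch only.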

lvextend may be another useful command here ...


Regards,
Buchan

list Jason Altrincham Jones · Thu, 9 Aug 2007 12:33:36 +0100 ·

Hi All,

Is there a way to filter out hosts/sites based on pagename or just a
regular expression? Trying to do an availability report for all sites
except one and wondering if there is a way.

Thanks,
Jason.

list Robert · Thu, 9 Aug 2007 05:59:19 -0700 (PDT) ·

Hi list,
  I am trying to compile hobbit on Solaris 9, make is failing:
   
  bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT -DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include" SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include" SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket -lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
             [ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
             [ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro +=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
   
  It is using the -C option, which this make does not accept. Which option can I use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient?
  Please let me know.
   
  Thanks in advance
  
 

list Steve Holmes · Thu, 9 Aug 2007 09:14:54 -0400 ·

Make sure the command make is really gmake, or use gmake explicitly.
Steve
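
A quick way to check which make you are actually getting is to look at the version banner. The following is only a sketch with made-up helper names: it relies on GNU make printing a banner that starts with "GNU Make" when run with --version, and treats anything else (including a usage error like the one above) as "not GNU make".

```python
import subprocess
from typing import Optional, Sequence


def looks_like_gnu_make(version_output: str) -> bool:
    """True if the text looks like the banner GNU make prints for --version."""
    return version_output.lstrip().startswith("GNU Make")


def find_gnu_make(candidates: Sequence[str] = ("gmake", "make")) -> Optional[str]:
    """Return the first candidate command that identifies itself as GNU make, else None."""
    for name in candidates:
        try:
            result = subprocess.run([name, "--version"], capture_output=True,
                                    text=True, timeout=10)
        except (OSError, subprocess.TimeoutExpired):
            # Command not installed, not executable, or hung: try the next one.
            continue
        if looks_like_gnu_make(result.stdout):
            return name
    return None
```

If find_gnu_make() returns "gmake", running the build with that command instead of plain make should avoid the "Unknown option `-C'" error.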


list Robert · Thu, 9 Aug 2007 06:17:15 -0700 (PDT) ·


list Robert · Thu, 9 Aug 2007 06:17:30 -0700 (PDT) ·


list Tom Moore · Thu, 9 Aug 2007 09:19:27 -0400 ·

Install and try "gmake" on your solaris machine.  You can get a
precompiled binary package from www.sunfreeware.com


list Pkc_mls · Thu, 09 Aug 2007 15:20:10 +0200 ·

Robert wrote:

Hi list,
hi Robert,

▸ quoted from Tom Moore

I am trying to compile hobbit on Solaris 9, make is failing:
 bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT -DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include" SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include" SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket -lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
             [ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
             [ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro +=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'

Are you using gmake or the standard Solaris make?

▸ quoted from Tom Moore

it is using -C option which is not a right option, which option I can use and chaning in /bbgen-3.5/build/Makefile.rules is sufficient?
Please let me know.
 Thanks in advance

list Mike Arnold · Thu, 9 Aug 2007 07:28:18 -0700 (MST) ·


You need to use gmake and not Solaris make.

Or you can always use the packages from http://www.blastwave.org/ .

-- 
-m

list Trent Melcher · Thu, 09 Aug 2007 10:56:05 -0500 ·

Talk about hijacking a thread, and it's not even close to anything about
Filter reports.  Folks, please create a new email with a new subject for
your "New" posts.  This will help threads stay on subject.

Thanks
Trent


list Jason Altrincham Jones · Thu, 9 Aug 2007 17:30:18 +0100 ·

I thought it'd gotten a little off topic too :)

Original question I asked:

▸ signature


Hi All,

Is there a way to filter out hosts/sites based on pagename or just a
regular expression? Trying to do an availability report for all sites
except one and wondering if there is a way.

Thanks,
Jason.


list Henrik Størner · Thu, 9 Aug 2007 22:36:05 +0200 ·

▸ quoted from Jason Altrincham Jones

On Thu, Aug 09, 2007 at 12:33:36PM +0100, Jones, Jason (Altrincham) wrote:

Is there a way to filter out hosts/sites based on pagename or just a
regular expression? Trying to do an availability report for all sites
except one and wondering if there is a way.

Only way I can see is to run the report using a bb-hosts file without
the host you want excluded.


Henrik
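
Producing such a reduced bb-hosts file can be scripted. Below is a minimal Python sketch, assuming host entries are lines of the form "IP-address hostname # tags" and that every other line (page, group, comments and so on) should pass through untouched. The function name and the sample hostnames are made up, and how the report is pointed at the alternate file is not covered here.

```python
import re
from typing import Iterable, List

# Assumed shape of a host entry: a dotted-quad IP, then the hostname.
_IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def exclude_hosts(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the lines with host entries whose hostname matches `pattern` dropped."""
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        fields = line.split()
        is_host_line = len(fields) >= 2 and _IPV4.fullmatch(fields[0]) is not None
        if is_host_line and regex.search(fields[1]):
            continue  # drop the excluded host
        kept.append(line)
    return kept
```

For example, exclude_hosts(open("bb-hosts"), r"^db1\.") would return every line except the entry for a host whose name starts with "db1."; writing the result to a separate file leaves the real bb-hosts untouched.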

list Robert · Fri, 10 Aug 2007 09:24:41 -0700 (PDT) ·

Mike,
I spent a lot of time on that site. I am trying to download CSWhobbit, but when I click on it to download, it shows a bunch of dependencies and I can't download any of those. I am not sure what I am doing wrong; could you please let me know how to download from there?
Thanks in advance


list Galen Johnson · Fri, 10 Aug 2007 12:40:57 -0400 ·

Try sunfreeware.com...


list Flemming · Fri, 10 Aug 2007 19:14:37 +0200 (CEST) ·


or try a mirror-site ;-)

http://ftp.uni-erlangen.de/pub/mirrors/blastwave.org/unstable/i386/5.9/hobbit_client-4.2.0,REV=2007.04.12-SunOS5.8-i386-CSW.pkg.gz

plus these additional packages:
 CSWchkconfig CSWcommon CSWexpat CSWggettext CSWhobbitc CSWiconv CSWlibpopt SMCpcre


Cheers,

          Flemming

treibsAND
Willy-Brandt-Allee 9
23554 Lübeck

www.treibsand.net
www.walli-bleibt.de
www.myspace.com/treibsand_luebeck
user-690778e9ef6b@xymon.invalid

list Tom Georgoulias · Fri, 10 Aug 2007 15:25:14 -0400 ·

▸ quoted from Peter Welter

Henrik Stoerner wrote:

This doesn't mean that I won't consider adding new stuff before the
4.3.0 release, but right now the plan is to get 4.3.0 shipped with
the current set of features. But if I've missed someone's favourite
patch or feature request, do let me know.

I am testing the latest snapshot now and I really like what I've seen, especially the hobbit-topchanges.sh script.  This is

Major new features

▸ quoted from Asif Iqbal

* Flap detection of statuses that change color rapidly. The status
  is kept at the most critical level until it stops flapping.

Is the flap detection going to have some kind of configurable parameters?  And will we get some kind of indication that flapping is occurring?

▸ quoted from Asif Iqbal

* Split NCV support - graph data from NCV can be split into multiple
  RRD databases allowing for varying number of datasets.

I'm very happy to see this one!  :)

▸ quoted from Asif Iqbal

* RRD database parameters are now configurable (i.e. number of
  datapoints stored, whether to store min/max values etc). Note that
  this only applies to newly created RRD files, not existing ones.

How do I toggle the min/max for new RRDs?

Display things

▸ quoted from Asif Iqbal

* The trends page default data-period can be configured to something
  other than the default 48-hour view, and the user can select a
  different period on-the-fly.

Another really cool and useful improvement.

Two features I've been asked for but I know either can't be done for 4.3.0 or maybe even at all are:

1.  I get asked for this all the time:  A way to mark "big" events on the graphs, so that we can have some marker inside or text outside the graph that gives some context to sweeping, overall trend changes.   For example, if the average CPU IOwait on a database server drops from 35% to 10% after a code release, a person who is looking at that graph and wasn't involved in that code release would immediately know why the performance improved, because a brief explanation of what took place during that time frame would be displayed alongside.

2.  An easier way to customize the colors or style of the Hobbit webpages.  I tried to take this on one afternoon and found myself going through lots of source code, since it appeared that it was hard coded in quite a few places.

Tom
-- 
Tom Georgoulias
Sr. Systems Engineer
McClatchy Interactive
user-6a0b8b0f0ae1@xymon.invalid

list Mike Arnold · Fri, 10 Aug 2007 13:46:58 -0700 (MST) ·

▸ quoted from Robert

Robert wrote:

Mike,
I spent lot of time on that site, I am trying to download CSWhobbit but
when I click on it to download it is showing bunch of dependencies and I
can't download any of those. I am not sure what I am doing wrong, could
you please let me know how to download from there.
Thanks in advance

Blastwave's web pages are just informative.  To use blastwave you must
have pkg-get installed.

HOWTO Use Blastwave
http://www.blastwave.org/howto_S8.html

Once you have pkg-get installed, you can install Hobbit like this:
pkg-get -i hobbit hobbit_client

Hobbit then lives in /opt/csw/libexec/hobbit .

-- 
-mike

list Sebastian Auriol · Fri, 30 Nov 2007 17:03:41 -0000 ·

Henrik,

The new features sound good!  But are they now documented?  I checked the
snapshot and none of the man pages or the Changes page seem to have been updated
since the 4.2.0 release.  It makes it quite difficult to test and use the
new features (which presumably is a requirement before releasing the new
version)! ;-)

Or am I missing something less obvious than using the source (TM)?

I think it would be useful if the Changes page
(http://www.hswn.dk/beta/snapshot/Changes) was kept more or less up-to-date
for the snapshot releases.  Knowing what has changed in hobbit-server betas
seems pretty difficult without downloading frequent snapshots and doing
diffs.  Is there any chance of the source being put into a public subversion
/ CVS repository or something?  I see there is already a public CVS
repository for hobbit on SourceForge, but it only includes the hobbit-client
code.  A public repository may encourage more contributed patches.

I also see that last year you mentioned that you are "using RCS which is a
predecessor to CVS".  I don't know if this is still the case, but if so, it
appears pretty simple to migrate to CVS using rcs2cvs as documented here:
http://www.linuxdocs.org/HOWTOs/CVS-RCS-HOWTO-3.html
Or as the OP did in http://www.nabble.com/RCS-to-svn-t792119.html - he
migrated RCS to SVN via CVS.

Many thanks,

Sebastian 



list Sebastian Auriol · Thu, 3 Jan 2008 18:33:23 -0000 ·

A slight clarification on my earlier message:  the HTML versions of man
pages haven't been updated since the 4.2.0 release, but the actual man pages
have.  I did end up doing a source diff of some man pages against 4.2.0 to
see what had changed and how to use some of the new features...  So the
situation is better than I feared.

I'm not sure whether you saw my previous message, Henrik?

Sebastian


list Jersey Man · Mon, 10 Nov 2014 20:05:11 -0800 ·

Curious if this ever worked..
