Highlights of the 4.3.0 version
list Henrik Størner
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged.

There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon.

This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.

Major new features
* PAGE setting for alert- and client-configuration handles hosts on multiple pages, so any pagename can be used.
* Flap detection of statuses that change color rapidly. The status is kept at the most critical level until it stops flapping.
* Holiday support for alerts, including variable holidays (Easter etc)
* Split NCV support - graph data from NCV can be split into multiple RRD databases allowing for varying number of datasets.
* RRD database parameters are now configurable (i.e. number of datapoints stored, whether to store min/max values etc). Note that this only applies to newly created RRD files, not existing ones.
* Distributed worker modules allow sharing the load across multiple Hobbit servers
* RRD updates are now cached for up to 30 minutes before being written to disk. This makes the I/O load on large installations much lighter.
* Detection of statuses that are reported by multiple hosts
* Client backend-support for the z/OS and z/VSE clients by Rich Smirna

Display things
* Graph zooming now limits the lower/upper bounds of a graph (requires rrdtool 1.2.x)
* The trends page default data-period can be configured to something other than the default 48-hour view, and the user can select a different period on-the-fly.
* Hosts can be sorted automatically on the overview webpage with a "group-sorted" group definition.
* NOCOLUMNS setting in bb-hosts lets you suppress certain columns on a per-host basis
* Host-comments are displayed as tool-tips, to save screen space.

Checks and graphs
* Network tests can use a specific source IP instead of the default
* The validity-period of network tests is configurable, instead of being fixed at the default 30-minute setting
* Client file checks can check for a symlink
* "trends" report for RRD handling allows generating custom-made RRD files
* Hobbit host- and status-counts are tracked in an RRD file

Miscellaneous
* NCV reports can handle color-icons before the name:value data
* hobbitlaunch tasks can be configured to run on certain hosts only
* Time-warp detection and warning
* Local unix-socket interface to Hobbit daemon
* hobbitd_capture can collect several statuses and hand off such a batch to an external command
* Support for SHA-224/256/384/512 digests

Regards,
Henrik
list Asif Iqbal
▸
On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged. There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon. This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Send a disable request through email. Currently it only accepts a delay request.
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
list Asif Iqbal
▸
On 7/21/07, Asif Iqbal <user-6f4b51ac2a40@xymon.invalid> wrote:
On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged. There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon. This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Monitor and RRD of memory and cpu usage for a process
[..stripped for brevity..]
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
list Asif Iqbal
▸
On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged. There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon. This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
- Display column only when it is red (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)
- SNMP trap by default
- SNMP probe option builtin
- Process specific alert (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)
- Comment TAG for DOWNTIME (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)
- Add functionalities in `delay' (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)
- CPU/Memory Usage per process (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)
- Text based alert for `msgs'. Currently it shows as html in my email (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)

Thanks again for such an excellent application and keeping it open!!
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
list Scott Walters
▸
On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below;
Great to see the summary, these features look great.

I'd like to request more RRDs and reports about the monitoring system and the servers/services monitored. For example:

I think the following could be "gauge" metrics:
  Number of devices monitored
  Number of services monitored
  Number of host.service in green state
  Number of host.service in yellow state
  Number of host.service in red state
  Number of host.service in XXX state

I am thinking these could be done by creating counters within hobbit (since boot):
  Number of state changes
  Number of state changes per server
  Number of state changes per service
  Number of notifications sent

I think the above metrics could help create reports over time periods for review to help get to "management by facts" vs. "management by feeling." Most admins that pay attention to their install will "know", but it's different when you can "prove." Plus, when improvements are made, it's nice to see it.

I am also thinking we could try and apply some Six Sigma terminology and methodology to hobbit which may have value. Six Sigma keys on statistics and defects. Six Sigma refers to having production quality such that you only see 3.4 defects per million. Granted we are not "producing" a physical item, but I am thinking that a defect could be considered a purple/yellow/red state. With the counters I suggested above, we could apply various statistical measures (control charts, pareto charts, etc.) and see what makes sense or has value for monitoring.

The goal is to improve consistency and reduce variance.

If you like, I could draft up some graphs and reports I'd like to see. My above description might be hard to visualize. I definitely think hobbit could benefit from internal counters, similar to how an OS keeps track of context switches and the like.

Scott
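To make the "defects per million" idea concrete, here is a minimal Python sketch. It assumes that every host.service status counts as one opportunity and that purple, yellow and red count as defects, as suggested above; the function name and the sample counts are made up for illustration and are not part of Hobbit.

```python
def defects_per_million(status_counts, defect_colors=("red", "yellow", "purple")):
    """Return defects per million statuses for a snapshot of color counts.

    status_counts maps a color name to the number of host.service
    statuses currently in that color (hypothetical input format).
    """
    total = sum(status_counts.values())
    if total == 0:
        raise ValueError("no statuses to evaluate")
    defects = sum(status_counts.get(color, 0) for color in defect_colors)
    return defects * 1_000_000 / total


# Hypothetical snapshot: 10 defective statuses out of one million.
snapshot = {"green": 999_990, "red": 6, "yellow": 3, "purple": 1}
dpm = defects_per_million(snapshot)
```

This is a point-in-time ratio only. A figure that accounts for how long each status stayed non-green would need the history data rather than a snapshot.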
list Henrik Størner
▸
On Sat, Jul 21, 2007 at 09:34:11PM -0400, Scott Walters wrote:
Great to see the summary, these features look great. I'd like to request more RRDs and reports about the monitoring system and the servers/services monitored. For example: I think the following could be "gauge" metrics: Number of devices monitored Number of services monitored Number of host.service in green state Number of host.service in yellow state Number of host.service in red state Number of host.service in XXX state
You mean like this:
Statistics:
 Hosts                      :  4321
 Pages                      :   286
 Status messages            : 22331
 - Red                      :   907 ( 4.06 %)
 - Red (non-propagating)    :   809 ( 3.62 %)
 - Yellow                   :   353 ( 1.58 %)
 - Yellow (non-propagating) :   210 ( 0.94 %)
 - Clear                    :  1970 ( 8.82 %)
 - Green                    : 17052 (76.36 %)
 - Purple                   :   452 ( 2.02 %)
 - Blue                     :   578 ( 2.59 %)
The first three are from the current "bbgen --report" status message;
I've added the breakdown of the colors now. Will put these into an RRD
for tracking trends.
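As an illustration of how such a report could be turned into numbers for trend tracking, here is a small Python sketch that parses the text shown above. It assumes the "label : count" layout exactly as quoted in this message; the real bbgen output may differ, and the function name is hypothetical.

```python
import re

REPORT = """\
Hosts : 4321
Pages : 286
Status messages : 22331
- Red : 907 ( 4.06 %)
- Red (non-propagating) : 809 ( 3.62 %)
- Yellow : 353 ( 1.58 %)
- Yellow (non-propagating) : 210 ( 0.94 %)
- Clear : 1970 ( 8.82 %)
- Green : 17052 (76.36 %)
- Purple : 452 ( 2.02 %)
- Blue : 578 ( 2.59 %)
"""

# Optional leading "-", then a label, a colon, and an integer count.
LINE = re.compile(r"^-?\s*(.+?)\s*:\s*(\d+)")


def parse_statistics(text):
    """Map each label in the report to its integer count."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_statistics(REPORT)
total = counts["Status messages"]
# Recompute the percentage of each color from the raw counts.
percentages = {
    label: round(100 * value / total, 2)
    for label, value in counts.items()
    if label not in ("Hosts", "Pages", "Status messages")
}
```

The recomputed percentages can be compared with the ones printed in the report as a sanity check before storing the counts anywhere.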
▸
I am thinking these could be done by creating counters within hobbit (since boot): Number of state changes Number of state changes per server Number of state changes per service Number of notifications sent
The state changes can be calculated from the history logs. This is preferable, I think, because that way it won't get reset if the Hobbit server is restarted. Notifications - it would make sense to have the alert module provide some statistics that we could put into a trend graph.
▸
If you like, I could draft up some graphs and reports I'd like to see. My above description might be hard to visualize. I definitely think hobbit could benefit from internal counters, similarly to how on OS keeps tracks of context switches and the like.
Please do. The graphs I've created about the Hobbit "internals" have been mostly for my own use as debugging / performance evaluation data. If we can provide some data that is interesting to management, that would be a good thing. Regards, Henrik
list Asif Iqbal
▸
On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged. There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon. This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Here is another feature I'd like to see.
A way for the hobbit server to request the hobbit client to run a command locally
based on an alert.
(Pretty similar to sending a request to the client to download a newer version
from the download dir)
Maybe a dir called ~hobbit/server/command (like
~hobbit/server/download)
In that command file, define the command.
Then in the ~hobbit/server/etc/client-local.cfg file define a class, and in
the class
have an attribute like
clientcommand: command definition
And in the bb-hosts file msgs:command
So whenever there is a msgs alert, run that command locally on the client
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
list Charles Goyard
▸
Henrik Stoerner wrote :
- Red : 907 ( 4.06 %)
- Red (non-propagating) : 809 ( 3.62 %)
- Yellow : 353 ( 1.58 %)
- Yellow (non-propagating) : 210 ( 0.94 %)
Hey, what a nice hook to tell about a bug in nopropred/nopropyellow: I _often_ (but not always) get a red status with nopropred on the bb2 page. Full report here: [Jun, 21th] (http://www.hswn.dk/hobbiton/2007/06/msg00311.html)
--
Charles Goyard - user-a6cdca7046e2@xymon.invalid - (+33) 1 45 38 01 31
Orange Business Services - online multimedia // ingénierie
list Daniel J McDonald
▸
On Sun, 2007-07-22 at 00:08 +0200, Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Get hobbitfetch to not crash, hang, or spin the cpu at 100%. I don't have to use hobbitfetch for many hosts, but it is incredibly annoying for the few that I do that I have to kill -6 the hobbitfetch process 4-5 times a day in order to get any statuses. -- Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX Austin Energy http://www.austinenergy.com
list Henrik Størner
▸
On Mon, Jul 23, 2007 at 06:14:14AM -0500, Daniel J McDonald wrote:
On Sun, 2007-07-22 at 00:08 +0200, Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Get hobbitfetch to not crash, hang, or spin the cpu at 100%.
I know, this one is definitely a "must-fix-before-4.3.0" bug. Henrik
list David Gilmore
Henrik,

One thing I would like to see is the ability to encrypt traffic between the client and the server. Of course this would mean some tweaking of code on the BBWin side of things too. That is one feature that I know the BB Pro client introduced a few years back that would be an excellent addition for those of us who monitor systems at client sites.

Thank you and keep up the good work.

Dave Gilmore
list T.J. Yang
Great to see the author of larrd participating in the hobbit discussion. See below for my comments.
▸
From: "Scott Walters" <user-2c405ccfe1ee@xymon.invalid>
Reply-To: user-ae9b8668bcde@xymon.invalid
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Highlights of the 4.3.0 version
Date: Sat, 21 Jul 2007 21:34:11 -0400

On 7/21/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below;
Great to see the summary, these features look great. I'd like to request more RRDs and reports about the monitoring system and the servers/services monitored. For example: I think the following could be "gauge" metrics: Number of devices monitored Number of services monitored Number of host.service in green state Number of host.service in yellow state Number of host.service in red state Number of host.service in XXX state I am thinking these could be done by creating counters within hobbit (since boot): Number of state changes Number of state changes per server Number of state changes per service Number of notifications sent I think the above metrics could help create reports over time periods for review to help get to "management by facts" vs. "management by feeling." Most admins that pay attention to their install will "know", but its different when you can "prove." Plus, when improvements are made, it's nice to see it.
It would also help to provide OS type and version metrics; this would give us a clear view of how many vendor-unsupported OS versions (e.g. Solaris 2.5.1, 2.6, 2.7, HP-UX 9, HP-UX 10.20) are still in an IT system. Henrik showed me the command on this list the last time I asked, but it would be good if this could be done from the Hobbit server.
▸
I am also thinking we could try and apply some Six Sigma terminology and methodology to hobbit which may have value. Six Sigma keys on statistics and defects. Six Sigma refers to having production quality such that you only see 3.4 defects per million. Granted we are not "producing" a physical item, but I am thinking that a defect could be considered a purple/yellow/red state. With counters I suggested above, we could to apply various statistical measures (control charts, pareto charts, etc.) and see what makes sense or has value for monitoring.
In Six Sigma, availability is formatted with five nines (99.999). There are some patches floating around to make HB's availability report show the five-nines format. This is a baby step, but I got asked by management why the bb/hb report is one digit short of nines. Associating Hobbit more with Six Sigma is definitely a good thing. Connecting Hobbit with ITIL is even better.
tj
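For reference, the arithmetic behind a five-nines figure can be sketched in a few lines of Python. The function names and the sample downtime are illustrative assumptions; this is not how the Hobbit availability report or the patches mentioned above are implemented.

```python
def availability_percent(up_seconds, total_seconds, decimals=3):
    """Format availability as a percentage with a chosen number of decimals."""
    return f"{100 * up_seconds / total_seconds:.{decimals}f}"


def allowed_downtime_minutes(target_percent, period_days=365):
    """Downtime budget in minutes for a target availability over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - target_percent / 100)


YEAR = 365 * 24 * 3600
# Hypothetical host that was down for five minutes in a year.
three_decimals = availability_percent(YEAR - 300, YEAR)
two_decimals = availability_percent(YEAR - 300, YEAR, decimals=2)
budget = allowed_downtime_minutes(99.999)
```

With only two decimals the five-minute outage rounds away entirely, which shows why the number of digits in the report matters at this level of availability.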
▸
The goal is to improve consistency and reduce variance. If you like, I could draft up some graphs and reports I'd like to see. My above description might be hard to visualize. I definitely think hobbit could benefit from internal counters, similarly to how on OS keeps tracks of context switches and the like. Scott
list S Aiello
▸
On Saturday 21 July 2007 18:08, Henrik Stoerner wrote:
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged. There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon. .......
Just checking if the iconnames.patch will be included in 4.3.0. That is the patch that allowed the &color-acked keyword to display the acknowledged icons in tests.

Thank you for all of your work,
~Steve
list Henrik Størner
▸
On Mon, Jul 23, 2007 at 12:16:53PM -0400, user-ce96540ed38f@xymon.invalid wrote:
Just checking if the iconnames.patch will be included in 4.3.0. That is that patch that allowed &color-acked keyword to display the acknowledged icons in tests ?
Those patches that have been posted over the past year have all been merged into the code, so yes - it's included. Regards, Henrik
list Mike Arnold
▸
Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5 -- -mike
list Henrik Størner
▸
On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:
Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
No problem, except I don't understand why the Wiki claims that it is
necessary to remove the '.' from the numbers. It seems this is to
convert the data to percentages, but that can be done in the graph
definition:
[vmstat-pc]
        TITLE Used Physical CPU
        YAXIS pc (100 = 1 CPU)
        DEF:pc=vmstat.rrd:cpu_pc:AVERAGE
        CDEF:pcpercent=pc,100,*
        LINE2:pcpercent#00CC00
And the "-b 1024" in the Wiki graph definition looks bogus.
I'm cc'ing Stef Coene who wrote the Wiki entry to see if he can shed
some light on this.
Regards,
Henrik
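For readers unfamiliar with the reverse-Polish notation used in the CDEF line above, the following Python sketch evaluates such an expression. It is a toy evaluator that handles only the four basic arithmetic operators, not rrdtool's full RPN language, and the function name is made up for illustration.

```python
import operator

# Only basic arithmetic; rrdtool's RPN supports many more operators.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression, variables):
    """Evaluate a comma-separated RPN expression such as "pc,100,*"."""
    stack = []
    for token in expression.split(","):
        if token in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token in variables:
            stack.append(float(variables[token]))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


# A stored value of 0.46 physical CPUs becomes roughly 46 on the graph.
pcpercent = eval_rpn("pc,100,*", {"pc": 0.46})
```

The point is that the scaling happens at graph time, so the RRD can keep the value exactly as vmstat reported it.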
list Henrik Størner
▸
On Mon, Jul 23, 2007 at 08:59:54AM +0200, Charles Goyard wrote:
what a nice hook to tell about a bug in nopropred/nopropyellow: I _often_ (but not always) get a red status with nopropred on the bb2 page. Full report here: [Jun, 21th] (http://www.hswn.dk/hobbiton/2007/06/msg00311.html)
noprop's don't affect the bb2 page - they only control if a status affects the color of the "main page" that the status is on. If you want to remove these from the BB2 page, run bbgen with "--bb2-ignorecolumns=procs_master,is_master". Regards, Henrik
list Scott Walters
▸
If you like, I could draft up some graphs and reports I'd like to see. My above description might be hard to visualize.
Henrik, you're right about using the histories for reports. That data keeps its integrity unlike the RRD averages, much better for reports.

For a given input period (Last 7 days, June 2007, etc.)
* Servers with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on server would generate list of state changes. "Look Bob, your server is not stable you need to get your developers under control!"
* Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
* Red events with longest durations (for events still open, use start time to NOW as duration)
* Yellow events with longest durations (for events still open, use start time to NOW as duration)
* All ping/fping/conn events.

You could piece-meal some of those from the eventlog report, but I'd prefer a single page that showed them all. For weekly, quarterly meetings, turnover, etc.

Scott
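The reports listed above could be prototyped along these lines in Python. This is only a sketch: the Event record is a hypothetical in-memory representation of one state change, and reading the actual Hobbit history logs into that form is left out.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """One state change (hypothetical record, not Hobbit's log format)."""
    host: str
    service: str
    color: str
    start: int            # epoch seconds when the status entered this color
    end: Optional[int]    # None while the event is still open


def in_period(events, period_start, period_end):
    """Events that began inside [period_start, period_end)."""
    return [e for e in events if period_start <= e.start < period_end]


def top_hosts(events, n=10):
    """Hosts with the most state changes, highest first."""
    return Counter(e.host for e in events).most_common(n)


def top_services(events, n=10):
    """Services with the most state changes, highest first."""
    return Counter(e.service for e in events).most_common(n)


def longest(events, color, now, n=10):
    """Longest events of one color; open events last from start to `now`."""
    def duration(event):
        end = event.end if event.end is not None else now
        return end - event.start
    matching = [e for e in events if e.color == color]
    return sorted(matching, key=duration, reverse=True)[:n]


sample = [
    Event("web1", "http", "red", 100, 400),
    Event("web1", "http", "green", 400, None),
    Event("web1", "cpu", "yellow", 500, None),
    Event("db1", "conn", "red", 200, None),
]
busiest = top_hosts(sample, n=1)
worst_red = longest(sample, "red", now=1000)
```

Filtering with in_period first and then calling the other functions gives the per-period view ("Last 7 days", "June 2007") described above.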
list Stef Coene
▸
On Monday 23 July 2007, you wrote:
On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
No problem, except I don't understand why the Wiki claims that it is necessary to remove the '.' from the numbers. It seems this is to convert the data to percentages, but that can be done in the graph definition:
The cpu_pc and cpu_ec values always have a "." in them:

kthr  memory      page                     faults       cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b     avm   fre  re  pi  po  fr  sr  cy  in    sy   cs us sy id wa    pc    ec
 2  2 1341331 15949   0   8   7  85  49   0 583 34667 1878 38  5 55  2  0.46  46.0

pc always has 2 digits after the "." and ec has 1 (I hope this stays the same for the next AIX releases). Rrd wants integers (I think), so you have to strip the "." from the numbers.

And, indeed, the -b 1024 is a copy-and-paste error.

I have more AIX updates (iostat graphs), but I don't have the time to create patches, maybe at the end of this week. After that I'm on holiday for 2 weeks.

Stef
list Stef Coene
▸
On Monday 23 July 2007, Henrik Stoerner wrote:
On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:
Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
Related to this post, I have a perl script that can manipulate rrds:
- adding RRAs (so you can add MAX and MIN)
- adding DSs (for the extra vmstat numbers with AIX 5.3)
- changing the DS (so you can keep the data longer)
- migrating from OS to OS (with rrdtool dump and restore)

Let me know if you are interested. The script itself uses some custom perl libraries, so I cannot publish them, but I can try to filter out the needed information and procedures.

Stef
list Henrik Størner
▸
On Tue, Jul 24, 2007 at 08:38:37AM +0200, Stef Coene wrote:
On Monday 23 July 2007, you wrote:
On Mon, Jul 23, 2007 at 01:51:55PM -0700, Mike Arnold wrote:
I'd like to see POWER5 CPU stats: http://www.docum.org/twiki/bin/view/Hobbit/AixPower5
No problem, except I don't understand why the Wiki claims that it is necessary to remove the '.' from the numbers. It seems this is to convert the data to percentages, but that can be done in the graph definition:
The cpu_pc and cpu_ec has always a "." in it:

kthr  memory      page                     faults       cpu
----- ----------- ------------------------ ------------ -----------------------
 r  b     avm   fre  re  pi  po  fr  sr  cy  in    sy   cs us sy id wa    pc    ec
 2  2 1341331 15949   0   8   7  85  49   0 583 34667 1878 38  5 55  2  0.46  46.0

pc has always 2 numbers after the "." and ec 1 (I hope this stays the same for next AIX releases). Rrd wants integers (I think), so you have to strip the "." from the numbers.
No, RRD uses floating-point numbers everywhere. So I'll keep the numbers unmodified - then we won't have any problems if IBM does change the number of decimals they report. Regards, Henrik
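As an illustration of keeping the numbers unmodified, the Python sketch below pulls pc and ec out of the vmstat sample quoted in this thread and converts them to floats without stripping the decimal point. It is not the Hobbit server's actual parser; the function name is hypothetical and the header and sample lines are copied from the message above.

```python
HEADER = "r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec"
SAMPLE = "2 2 1341331 15949 0 8 7 85 49 0 583 34667 1878 38 5 55 2 0.46 46.0"


def parse_pc_ec(header_line, data_line):
    """Return (pc, ec) as floats from one AIX vmstat data line."""
    names = header_line.split()
    values = data_line.split()
    if len(names) != len(values):
        raise ValueError("header and data have different column counts")
    # "sy" appears twice in the header, so columns are located by position
    # rather than by building a name -> value dictionary.
    pc = float(values[names.index("pc")])
    ec = float(values[names.index("ec")])
    return pc, ec


pc, ec = parse_pc_ec(HEADER, SAMPLE)
```

Because the values are parsed as floats, the code keeps working whether the columns carry one, two, or more decimals.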
list Ralph Mitchell
▸
On 7/23/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:
* Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
Heh, that would be useful. I've got a perl script using SOAP to get BigIP pool status and some joker has transferred some machines between BigIPs without removing the old definitions. So, there's a bunch of systems/ports that flip/flop between enable & disable. Whether they're red or green depends on which report comes in last. Maybe I can persuade the load balancer guys to actually remove the duplicate definitions. Ralph Mitchell
list Henrik Størner
▸
On Tue, Jul 24, 2007 at 09:18:49AM -0500, Ralph Mitchell wrote:
On 7/23/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:
* Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
Heh, that would be useful. I've got a perl script using SOAP to get BigIP pool status and some joker has transferred some machines between BigIPs without removing the old definitions. So, there's a bunch of systems/ports that flip/flop between enable & disable. Whether they're red or green depends on which report comes in last.
That should actually be caught by another 4.3.0 feature: Flap detection. If a status changes more than 10 times in 10 minutes, Hobbit deems it "flapping" and stops logging status changes - instead, it fixes the status at the most critical level reported. Any hosts flapping are reported on the "hobbitd" status display. Regards, Henrik
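The rule described above can be sketched as a sliding-window counter. The Python below is an illustration only, not Hobbit's implementation: the class name, the three-color severity ranking, and the choice to make the threshold and window constructor parameters are all assumptions.

```python
from collections import deque

# Assumed severity ranking for this sketch; Hobbit's own ordering may differ.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


class FlapDetector:
    """Track color changes of one status inside a sliding time window."""

    def __init__(self, max_changes=10, window_seconds=600):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self.changes = deque()   # (timestamp, new color) for each change
        self.last_color = None

    def is_flapping(self):
        return len(self.changes) > self.max_changes

    def report(self, timestamp, color):
        """Record a status report and return the color to display."""
        if self.last_color is not None and color != self.last_color:
            self.changes.append((timestamp, color))
        self.last_color = color
        # Forget changes that have fallen out of the window.
        while self.changes and timestamp - self.changes[0][0] > self.window_seconds:
            self.changes.popleft()
        if self.is_flapping():
            # Pin the status at the most critical color seen in the window.
            return max((c for _, c in self.changes), key=SEVERITY.__getitem__)
        return color


detector = FlapDetector()
shown = None
for step in range(13):
    # Alternate green/red every 30 seconds: 12 changes within 6 minutes.
    shown = detector.report(30 * step, "red" if step % 2 else "green")
```

After the loop the detector reports the status as flapping and pinned at red, even though the last raw report was green.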
list Henrik Størner
▸
On Sat, Jul 21, 2007 at 07:16:12PM -0400, Asif Iqbal wrote:
- Display column only when it is red (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)
I'll leave that for later. There will probably be an entire new version with just display things.
- SNMP trap by default - SNMP probe option builtin
Too much for now. I need to dig into the Net-SNMP library API to do that.
- Process specific alert (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)
Already in 4.2.0 via the GROUP definition in hobbit-clients.cfg and the corresponding rule in hobbit-alerts.cfg
- Comment TAG for DOWNTIME (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)
Has been implemented for 4.3.0
- Add functionalities in `delay' (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)
Haven't looked at that.
- CPU/Memory Usage per process (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)
Probably impossible. Most "ps" implementations can report the current amount of cpu/memory a process uses, but that's a snapshot (ever noticed how "top" always has itself in the top list of cpu-using processes?). What's interesting is not how much cpu/memory a process uses exactly when the Hobbit client runs the "ps" command, but how much it has used on average since the last client run - similar to what "vmstat" reports for the system as a whole. I don't know of any way to get this data. Another problem with this is identifying what a process is. A long-running daemon often forks child-processes that are short-lived; should we add their cpu-utilisation to that of the long-running process? If yes, then we have to monitor all processes that are started (so running once every N seconds is not sufficient); if no, then you won't spot the cpu hog because it was spawned as a child process.
▸
- Text based alert for `msgs'. Currently it shows as html in my email (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)
Easily done with an alert script. Regards, Henrik
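A minimal sketch of such an alert script, assuming it is invoked through a SCRIPT recipient in hobbit-alerts.cfg and that the alert text, recipient, host, service and color arrive in the BBALPHAMSG, RCPT, BBHOSTNAME, BBSVCNAME and BBCOLORLEVEL environment variables (check hobbit-alerts.cfg(5) for your version). The strip_html and send_text_alert helpers and the MAILER variable are hypothetical names introduced for illustration; the tag stripping is a crude filter, not a real HTML parser.

```sh
#!/bin/sh
# Hypothetical helper: remove HTML tags and decode a few common entities.
# "&amp;" is decoded last so that "&amp;lt;" is not decoded twice.
strip_html() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&nbsp;/ /g' \
        -e 's/&amp;/\&/g'
}

# Send the alert as plain text. MAILER is an assumed variable naming a
# mail command that accepts "-s subject recipient" and reads the body
# from standard input.
send_text_alert() {
    printf '%s\n' "$BBALPHAMSG" | strip_html |
        ${MAILER:-mail} -s "Hobbit $BBHOSTNAME.$BBSVCNAME is $BBCOLORLEVEL" "$RCPT"
}
```

The script body would end with a call to send_text_alert; the helpers are kept as functions so the filter can be tried on sample text first.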
list Ralph Mitchell
▸
On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
On Tue, Jul 24, 2007 at 09:18:49AM -0500, Ralph Mitchell wrote:On 7/23/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:* Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."Heh, that would be useful. I've got a perl script using SOAP to get BigIP pool status and some joker has transferred some machines between BigIPs without removing the old definitions. So, there's a bunch of systems/ports that flip/flop between enable & disable. Whether they're red or green depends on which report comes in last.That should actually be caught by another 4.3.0 feature: Flap detection. If a status changes more than 10 times in 10 minutes, Hobbit deems it "flapping" and stops logging status changes - instead, it fixes the status at the most critical level reported.
Unfortunately that's not going to affect my particular checks. Right now I have a Hobbit client kicking off the test on a 5 minute interval, so it goes off at time T, T+5min, T+10min, etc. The duplicated servers are only on 2 BigIPs, so they flip/flop over and back at time T, T+5, T+10. At most there will be 6 changes in a 10 minute period. Could that 10-times-in-10-minutes be made into a variable?? Maybe a default value in the hobbitserver.cfg with an override in bb-hosts, though I hate to add yet another inch to the width of that file... Actually, even flap detection isn't going to help my situation - the reports are going to be red for the BigIP where the server/port is disabled and green/red for the BigIP that *really* owns the server, so flap detection would show red anyway. All the time. I really need to get the duplicates removed. Ralph Mitchell
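To make the "10 times in 10 minutes" rule concrete, here is a rough sketch of a sliding-window check with both numbers as variables. It is not how hobbitd implements flap detection; FLAP_COUNT, FLAP_WINDOW and is_flapping are hypothetical names, and the timestamps are assumed to be epoch seconds of recent status changes.

```sh
#!/bin/sh
# Hypothetical tunables; the thread describes fixed values of 10 and 10.
FLAP_COUNT=${FLAP_COUNT:-10}      # number of changes that must be exceeded
FLAP_WINDOW=${FLAP_WINDOW:-600}   # window length in seconds

# is_flapping NOW TS1 TS2 ...
# Succeeds (exit status 0) when more than FLAP_COUNT of the given
# status-change timestamps fall within the last FLAP_WINDOW seconds.
is_flapping() {
    now=$1; shift
    recent=0
    for ts in "$@"; do
        if [ "$ts" -le "$now" ] && [ $((now - ts)) -le "$FLAP_WINDOW" ]; then
            recent=$((recent + 1))
        fi
    done
    [ "$recent" -gt "$FLAP_COUNT" ]
}
```

With FLAP_COUNT=5, the six changes per ten minutes described above would count as flapping; with the default of 10 they would not.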
list Henrik Størner
▸
On Sun, Jul 22, 2007 at 08:01:12PM -0400, Asif Iqbal wrote:
Here is another feature I'd like to see: a way for the Hobbit server to request the Hobbit client to run a command locally based on an alert.
[snip]
So whenever there is a msgs alert run that command locally on the client
Run this as a client extension:
#!/bin/sh
# Get the current status of the "msgs" column
MSGSSTATUS=`$BB $BBDISP "query $MACHINE.msgs" | awk '{ print $1 }'`
# Get the command we must run from the client config
CMD=`grep "^msgsrecovercmd:" $BBTMP/logfetch.$MACHINEDOTS.cfg | sed -e 's!^msgsrecovercmd:!!'`
# If "msgs" is red and there is a command, run it
if test "$MSGSSTATUS" = "red" -a "$CMD" != ""
then
$CMD
fi
exit 0
Before doing this, consider the security implications of having your
servers run commands that they fetch from a remote host without
authentication.
Regards,
Henrik
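One way to limit that risk, sketched here under assumptions, is to run only commands that also appear in a list kept locally on the client. The ALLOWED_CMDS file and the is_allowed helper are hypothetical and not part of Hobbit.

```sh
#!/bin/sh
# Hypothetical allowlist file: one permitted command line per line.
ALLOWED_CMDS=${ALLOWED_CMDS:-/etc/hobbit/allowed-recover-cmds}

# is_allowed CMD: succeed only if CMD is non-empty and matches a whole
# line of the allowlist exactly (-x whole line, -F fixed string).
is_allowed() {
    [ -n "$1" ] && [ -r "$ALLOWED_CMDS" ] &&
        grep -qxF -e "$1" "$ALLOWED_CMDS"
}

# In the extension above, the final test could then become:
#   if test "$MSGSSTATUS" = "red" && is_allowed "$CMD"; then $CMD; fi
```

The server can then only choose among commands the client's administrator has already approved.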
list Greg L Hubbard
Well, we watch for the presence of processes today. It would be nice to be able to track cpu and size of "important" processes over time. Another problem is detecting CPU hogs (sometimes things run away), another problem is detecting processes with memory leaks -- they just grow and grow and grow. How can Hobbit help? GLH
list Greg L Hubbard
Wonder if there is any way to tell a client what its status is so it can be autonomous? What I mean is this: suppose there was a way for the Hobbit client to tell the server that service X was now in state Y, and a client-side module could then activate response Z on its own? I know the Hobbit model is to have the server own the configurations, but how do we solve the "trust" problem?
▸
GLH
list Greg Shea
Hi Greg,
I needed to do this originally with BB to track a memory leak with HP OpenView's pmd process, when we used to use it.
#!/bin/sh
#
# SCRIPTS IN THE BBHOME/ext DIRECTORY ARE ONLY RUN IF
# THEY ARE DEFINED IN THE ENTRY FOR THE CURRENT HOST
# LISTED IN THE ext/bb-bbexttab FILE.
#
#
# BBPROG SHOULD JUST CONTAIN THE NAME OF THIS FILE
# USEFUL WHEN YOU GET ENVIRONMENT DUMPS TO LOCATE
# THE OFFENDING SCRIPT...
#
BBPROG=bb-pmd.sh; export BBPROG
#
# TEST NAME: THIS WILL BECOME A COLUMN ON THE DISPLAY
# IT SHOULD BE AS SHORT AS POSSIBLE TO SAVE SPACE...
# NOTE YOU CAN ALSO CREATE A HELP FILE FOR YOUR TEST
# WHICH SHOULD BE PUT IN www/help/$TEST.html. IT WILL
# BE LINKED INTO THE DISPLAY AUTOMATICALLY.
#
TEST="pmd"
#
# BBHOME CAN BE SET MANUALLY WHEN TESTING.
# OTHERWISE IT SHOULD BE SET FROM THE BB ENVIRONMENT
#
#BBHOME=/opt/BB/bb19c ; export BBHOME # FOR TESTING
if test "$BBHOME" = ""
then
echo "BBHOME is not set... exiting"
exit 1
fi
if test ! "$BBTMP" # GET DEFINITIONS IF NEEDED
then
# echo "*** LOADING BBDEF ***"
. $BBHOME/etc/bbdef.sh # INCLUDE STANDARD DEFINITIONS
fi
PMDMEM=`/bin/ps -e -o vsz -o comm | grep " pmd" | awk '{printf "%d", $1/1024}'`
if test "$PMDMEM" = ""
then
COLOR="clear"
else
COLOR="green"
fi
#
# AT THIS POINT WE HAVE OUR RESULTS. NOW WE HAVE TO SEND IT TO
# THE BBDISPLAY TO BE DISPLAYED...
#
# MACHINE NAME MUST EITHER BE A REAL MACHINE NAME, OR
# LOOK LIKE A REAL MACHINE (in the case of arbitrary measurements
# like temperature). IF THE NAME YOU ARE USING DOESN'T EXIST
# IN THE DNS THEN IT SHOULD BE LISTED IN THE bb-hosts FILE WITH noping,
# PREFERABLY IN ITS OWN GROUP...
# NOTE THE COMMAS HERE - YOU NEED THEM!
MACHINE=`echo $MACHINE | $SED 's/\./,/g'` # HAS TO BE IN A,B,C FORM
#
# THE FIRST LINE IS STATUS INFORMATION... STRUCTURE IMPORTANT!
# THE REST IS FREE-FORM - WHATEVER YOU'D LIKE TO SEND...
#
LINE="PMD Statistics.
"
SUMMARY="
PMD memory usage is $PMDMEM"
# NOW USE THE BB COMMAND TO SEND THE DATA ACROSS
# SEND IT TO BBDISPLAY
$BB $BBDISP "status $MACHINE.$TEST $COLOR `date` $LINE $SUMMARY MB"
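The script above reports green whenever the process exists, so a leak only shows up on the graph. As a sketch of how it could also change color as memory grows, the hypothetical helper below maps the size in MB to a color; the threshold names and values are assumptions to be tuned per process.

```sh
#!/bin/sh
# Hypothetical thresholds in MB.
PMD_WARN_MB=${PMD_WARN_MB:-200}
PMD_PANIC_MB=${PMD_PANIC_MB:-400}

# pmd_color MB: print the status color for a memory size given in MB.
# An empty argument means the process was not found.
pmd_color() {
    if [ -z "$1" ]; then
        echo "clear"
    elif [ "$1" -ge "$PMD_PANIC_MB" ]; then
        echo "red"
    elif [ "$1" -ge "$PMD_WARN_MB" ]; then
        echo "yellow"
    else
        echo "green"
    fi
}
```

In the script, the if/else that sets COLOR could then be replaced by COLOR=`pmd_color "$PMDMEM"`. This assumes a single pmd process, so that PMDMEM holds one number.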
list Greg L Hubbard
Thanks!
list Francesco Duranti
I have a problem with bbhostgrep in the latest snapshots. It seems that it cannot get the data from the bb-hosts file. For example, I have a host defined in the bb-hosts file as:
0.0.0.0 ITROMFS10 # WIN:* netapp
Now if I run bbhostgrep netapp, I get:
sh-3.1$ bbhostgrep netapp
2007-07-24 23:13:19 Cannot load bb-hosts, or file is empty
bbhostshow works correctly.
Francesco
list Trent Melcher
Why don't you just use the SCRIPT feature of hobbit-alerts? You can set up ssh authentication between your Hobbit server and Hobbit clients. Then if a specific test goes red, it executes the script, which in turn ssh's to the remote server having the issue and executes a script there to resolve the issue, or whatever you need it to do. We did this with a legacy application we used to have: the app would stop listening on its ports, and the only way to fix it was to respin the application. So Hobbit would test the port, and if it failed it would send a page and fire off a script to respin the app. After a while we got tired of the pages, so we had it email a generic mailbox that someone checked once in a while and stopped it paging us. Worked great; we never had customer complaints about that specific app after that. Trent
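A rough sketch of such an alert script, assuming key-based ssh is already set up and that Hobbit passes the host name, color and recovery flag in the BBHOSTNAME, BBCOLORLEVEL and RECOVERED environment variables (see hobbit-alerts.cfg(5) for your version). RECOVER_CMD, SSH and recover_if_red are hypothetical names, and the restart command path is only a placeholder.

```sh
#!/bin/sh
# Hypothetical settings: the remote command and the ssh client to use.
RECOVER_CMD=${RECOVER_CMD:-/usr/local/bin/restart-app}
SSH=${SSH:-ssh}

# recover_if_red: run RECOVER_CMD on the alerting host, but only for a
# red alert that is not a recovery notice. BatchMode makes ssh fail
# instead of prompting when key authentication is not available.
recover_if_red() {
    if [ "$BBCOLORLEVEL" = "red" ] && [ "$RECOVERED" != "1" ]; then
        $SSH -o BatchMode=yes "$BBHOSTNAME" "$RECOVER_CMD"
    fi
}
```

The script body would simply call recover_if_red. Keeping ssh behind a variable makes it possible to dry-run the script with a stand-in command before pointing it at real servers.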
list Henrik Størner
▸
On Mon, Jul 23, 2007 at 09:44:11PM -0400, Scott Walters wrote:
For a given input period (Last 7 days, June 2007, etc.) * Servers with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on server would generate list of state changes. "Look Bob, your server is not stable you need to get your developers under control!" * Services with the most state changes, sorted by highest to lowest (Maybe just Top 10). Clicking on service would generate list of the state changes for that period. "PHB, the web group is performing way too many undocumented code changes."
I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction? The nice thing about making it an add-on or variant of the eventlog report is that there's already all of the nice filtering for hosts, pages, time-periods etc in place, plus the "allevents" logfile parsing is also done. Regards, Henrik
list Henrik Størner
▸
On Tue, Jul 24, 2007 at 11:21:15PM +0200, Francesco Duranti wrote:
I've a problem with bbhostgrep of the latest snapshots ... It seems that he cannot get the data from the bb-hosts file...
The current snapshot has this fixed. Regards, Henrik
list Scott Walters
▸
On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction?
Bingo. But since that was so easy, here are a few more:
* So voodoo.hswn.dk has had 12 state changes... what were they? It would be nice if the server name and service name could be HTML links which would generate a report of the state changes for the specified server/service over the given period.
* Also, please show the total. If there are more than 10 hosts/services, use an "Other" at the end of the list. I love seeing single hosts on 100+ node installs with 25% of activity. You know where to focus.
* And I would imagine the "Top X" where X is configurable will be requested.
* And print the report period on the page so you know what you are looking at.
▸
The nice thing about making it an add-on or variant of the eventlog report is that there's already all of the nice filtering for hosts, pages, time-periods etc in place, plus the "allevents" logfile parsing is also done.
We'll keep requesting features until it gets hard ;) Scott Walters -PacketPusher
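The counting behind such a report can be sketched in a few lines of shell. This is not the code behind the eventlog report; it assumes a simplified, hypothetical input of one "host service" line per state change rather than the real "allevents" log format, and top_changes is an illustrative name.

```sh
#!/bin/sh
# top_changes N: read "host service" lines (one per state change) on
# standard input and print the N hosts with the most changes as
# "count host", then an "Other" line for the remainder and a "Total".
top_changes() {
    awk '{ count[$1]++ } END { for (h in count) print count[h], h }' |
        sort -k1,1nr -k2,2 |
        awk -v n="$1" '
            { total += $1 }
            NR <= n { print; next }
            { other += $1 }
            END {
                if (other > 0) print other, "Other"
                print total + 0, "Total"
            }'
}
```

The same pipeline counts per service if the first awk keys on $2 instead of $1.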
list John G
Henrik, I like this. This provides a lot of flexibility in reporting the top ten stats. I can see where bigger sites might want more than a top 10 listed. Maybe it could be 10 by default, with an option to list more. John
list Asif Iqbal
▸
On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
On Sat, Jul 21, 2007 at 07:16:12PM -0400, Asif Iqbal wrote:
- Display column only when it is red (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)
I'll leave that for later. There will probably be an entire new version with just display things.
- SNMP trap by default - SNMP probe option builtin
Too much for now. I need to dig into the Net-SNMP library API to do that.
- Process specific alert (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)
Already in 4.2.0 via the GROUP definition in hobbit-clients.cfg and the corresponding rule in hobbit-alerts.cfg
- Comment TAG for DOWNTIME (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)
Has been implemented for 4.3.0
- Add functionalities in `delay' (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)
Haven't looked at that.
- CPU/Memory Usage per process (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)
Probably impossible. Most "ps" implementations can report the current amount of cpu/memory a process uses, but that's a snapshot (ever noticed how "top" always has itself in the top list of cpu-using processes?). What's interesting is not how much cpu/memory a process uses exactly when the Hobbit client runs the "ps" command, but how much it has used on average since the last client run - similar to what "vmstat" reports for the system as a whole. I don't know of any way to get this data.
Well, in my `hobbit-clients.cfg' there is already an entry like this:
PROC "%hobbitd.*" TRACK=hobbitd
It already counts the total number of %hobbitd processes and labels it as hobbitd. How about letting it count the total amount of rss and pcpu for that process as well, and just create two more RRDs?
It won't really be inaccurate, because it gives you a graphical representation of what `ps' is telling you. Plus it could be GAUGE type data, I guess. At least it will give you some trend of how a process has been behaving. Even though it may not do the pmap -x calculation, it will surely point a finger at some heavy processes.
I bet a lot of Hobbit community members would like to see ps graphs built into the Hobbit app.
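As a sketch of what that could look like on the client side, the hypothetical sum_usage helper below totals resident size and CPU percentage for all processes whose command line matches a pattern. The numbers are still the snapshot values that ps reports at that moment, and the rss column is an assumption: it is widely available but not guaranteed by POSIX.

```sh
#!/bin/sh
# sum_usage PATTERN: read "rss pcpu command..." lines on standard input
# and print total RSS (KB), total %CPU and the process count for the
# commands matching the awk regular expression PATTERN.
sum_usage() {
    awk -v pat="$1" '
        {
            # Rebuild the command text from the third field onwards.
            cmd = $3
            for (i = 4; i <= NF; i++) cmd = cmd " " $i
        }
        cmd ~ pat { rss += $1; cpu += $2; n++ }
        END { printf "%d %.1f %d\n", rss, cpu, n }'
}

# Typical use on a client (the "=" suffix suppresses the header line):
#   ps -e -o rss= -o pcpu= -o args= | sum_usage 'hobbitd'
```

The three output numbers could then be sent to the server in a status message and graphed as GAUGE data.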
▸
Another problem with this is identifying what a process is. A long-running daemon often forks child-processes that are short-lived; should we add their cpu-utilisation to that of the long-running process? If yes, then we have to monitor all processes that are started (so running once every N seconds is not sufficient); if no, then you won't spot the cpu hog because it was spawned as a child process.
- Text based alert for `msgs'. Currently it shows as html in my email (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)
Easily done with an alert script. Regards, Henrik
Thanks for the feedback to all of my feature requests. It is very kind of you. -- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
list Asif Iqbal
▸
On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
On Sat, Jul 21, 2007 at 07:16:12PM -0400, Asif Iqbal wrote:
- Display column only when it is red (http://www.hobbitmon.com/hobbiton/2006/08/msg00920.html)
I'll leave that for later. There will probably be an entire new version with just display things.
- SNMP trap by default - SNMP probe option builtin
Too much for now. I need to dig into the Net-SNMP library API to do that.
- Process specific alert (http://www.hswn.dk/hobbiton/2005/11/msg00159.html)
Already in 4.2.0 via the GROUP definition in hobbit-clients.cfg and the corresponding rule in hobbit-alerts.cfg
- Comment TAG for DOWNTIME (http://www.hobbitmon.com/hobbiton/2007/04/msg00141.html)
Has been implemented for 4.3.0
- Add functionalities in `delay' (http://www.hswn.dk/hobbiton/2005/06/msg00272.html)
Haven't looked at that.
- CPU/Memory Usage per process (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00429.html)
Probably impossible. Most "ps" implementations can report the current amount of cpu/memory a process uses, but that's a snapshot (ever noticed how "top" always has itself in the top list of cpu-using processes?). What's interesting is not how much cpu/memory a process uses exactly when the Hobbit client runs the "ps" command, but how much it has used on average since the last client run - similar to what "vmstat" reports for the system as a whole. I don't know of any way to get this data. Another problem with this is identifying what a process is. A long-running daemon often forks child-processes that are short-lived; should we add their cpu-utilisation to that of the long-running process? If yes, then we have to monitor all processes that are started (so running once every N seconds is not sufficient); if no, then you won't spot the cpu hog because it was spawned as a child process.
- Text based alert for `msgs'. Currently it shows as html in my email (http://osdir.com/ml/monitoring.hobbit/2007-01/msg00203.html)
Easily done with an alert script.
Well all my messages show up in html format. Wouldn't it be nice to generate the email, or have a choice to generate email, as text type instead of html type. Also this email may suggest that text based email alert is possible. http://www.hobbitmon.com/hobbiton/2005/10/msg00382.html However, I might be misreading that email. Again, please understand this is still just a low priority feature request. Until then I will just explore the script idea that you suggested. Appreciate all your work really! Regards,
Henrik
-- Asif Iqbal PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
list Sabeer MZ
Hi Henrik, New feature request: I would like to add some network news on the pages. Suppose we found that some server has a bad disk and someone will fix it later; here I want to add the info that this issue is being taken care of.
--
Thanks
Sabeer MZ
list Henrik Størner
▸
On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:
New Feature Request.- I would like to add some network news on the pages. Suppose we found that some server has bad disk and some one will fix it later so here i want add the info that this issue has taken care.
Several possibilities already:
1) Ack the red/yellow statuses you have, and put this information in the acknowledgement text.
2) Disable the server and provide the information in the disable text.
3) Create a host "notes" file with the information.
I'd use 1) or 2). I don't see the need for a fourth way of doing this. Regards, Henrik
list Charles Goyard
▸
Henrik Stoerner wrote :
On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:New Feature Request.- I would like to add some network news on the pages. Suppose we found that some server has bad disk and some one will fix it later so here i want add the info that this issue has taken care.Several possibilities already: 1) Ack the red/yellow statuses you have, and put this information in the acknowledgement text. 2) Disable the server and provide the information in the disable text. 3) Create a host "notes" file with the information.
Wasn't there a bb_bulletin feature too ?
▸
--
Charles Goyard - user-a6cdca7046e2@xymon.invalid - (+33) 1 45 38 01 31
Orange Business Services - online multimedia // ingénierie
list Henrik Størner
▸
On Wed, Jul 25, 2007 at 11:28:00AM +0200, Charles Goyard wrote:
Henrik Stoerner wrote :On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:New Feature Request.- I would like to add some network news on the pages. Suppose we found that some server has bad disk and some one will fix it later so here i want add the info that this issue has taken care.Several possibilities already: 1) Ack the red/yellow statuses you have, and put this information in the acknowledgement text. 2) Disable the server and provide the information in the disable text. 3) Create a host "notes" file with the information.Wasn't there a bb_bulletin feature too ?
~hobbit/server/web/bulletin_header and _footer, yes. But these show up on all pages; I think Sabeer wanted something specifically for a single status page. Regards, Henrik
list Henrik Størner
▸
On Tue, Jul 24, 2007 at 10:15:02PM -0400, Scott Walters wrote:
On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction?Bingo. But since that was so easy, here are a few more:
[snip]
We'll keep requesting features until it gets hard ;)
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report). Should cover everything you've asked for - at least until now, that is. Regards, Henrik
list Galen Johnson
Well, that's damned handy... =G=
▸
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Wednesday, July 25, 2007 10:05 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Highlights of the 4.3.0 version
On Tue, Jul 24, 2007 at 10:15:02PM -0400, Scott Walters wrote:On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction?Bingo. But since that was so easy, here are a few more:
[snip]
We'll keep requesting features until it gets hard ;)
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report). Should cover everything you've asked for - at least until now, that is. Regards, Henrik
list Johann Eggers
▸
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report). Should cover everything you've asked for - at least until now, that is.
Looks really great! Can we also have, next to the numbers, the percentage of all event changes in the defined timeframe? Johann
list Jason Altrincham Jones
One thing that might be useful would be to alter the importance flag so if a critical system goes down it comes in with an exclamation mark etc. so it stands out from the other hobbit alerts....just a thought. Jason.
▸
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: 25 July 2007 15:05
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Highlights of the 4.3.0 version
On Tue, Jul 24, 2007 at 10:15:02PM -0400, Scott Walters wrote:On 7/24/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:I've whipped up a very rough implementation as part of the eventlog report on the Hobbit demo site. Could you try generating a report at http://www.hswn.dk/hobbit-cgi/bb-eventlog.sh and let me know if the data at the top is something in the right direction?Bingo. But since that was so easy, here are a few more:
[snip]
We'll keep requesting features until it gets hard ;)
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report). Should cover everything you've asked for - at least until now, that is. Regards, Henrik
list Scott Walters
▸
On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report). Should cover everything you've asked for - at least until now, that is.
Perfect. You rock Henrik. Scott Walters -PacketPusher
list Scott Walters
On 7/25/07, Scott Walters <user-2c405ccfe1ee@xymon.invalid> wrote:
Perfect. You rock Henrik.
Not quite. Could you add the total at the bottom of the "Top X" list? Scott Walters -PacketPusher
list Peter Welter
One of the things I'd like to see in 4.3.0 is what is already partly available on Sun boxes in 4.2.0, the [iostatdisk] part. The stats are gathered but not plotted. I hope this monitor will make it, on all platforms, because it lets you pinpoint disk performance problems much more easily. Thanks, Peter 2007/7/22, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid>:
▸
[snip]
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
list John Glowacki
▸
Peter Welter wrote:
One of the things I'd like to see in 4.3.0 is what is already partly available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are gathered but not plotted. I hope this monitor will make it, on all platforms, because you can pinpoint any disk performance problems much easier. Thanks, Peter
That would be good to have. Other groups in the company have asked me whether Hobbit does this. John
list Johann Eggers
▸
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.sh (You can also just go to the demo site, and pick the "Reports" -> "Top Changes" report). Should cover everything you've asked for - at least until now, that is.
Looks really great! Can we have next to the numbers also the percentage related to all event changes in the defined timeframe?
Wonderful. That's the kind of information you can show your manager(s), and it tells you where you probably have to investigate. Thanks Johann
list Henrik Størner
▸
On Wed, Jul 25, 2007 at 04:12:06PM +0200, Johann Eggers wrote:
Reports are usually rather boring things to do, but this one was fun. Have a look at the current state of this report at the Hobbit demo site http://www.hswn.dk/hobbit-cgi/hobbit-topchanges.shLooks really great! Can we have next to the numbers also the percentage related to all event changes in the defined timeframe?
Sure, already done. I also added something I felt was missing: When you have the top-10 list showing that host "foo" has the most status changes, then when you click on that host I wanted an overview of what services put it in the top-10. So I added a summary by service when you click on a host in the top-10 display. And likewise when you click on a service in the top-10 list, it gives you a list of the hosts that were counted for that service. Regards, Henrik
list Scott Walters
▸
On 7/25/07, John Glowacki <user-a1361bcdf988@xymon.invalid> wrote:
Peter Welter wrote:One of the things I'd like to see in 4.3.0 is what is already partly available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are gathered but not plotted. I hope this monitor will make it, on all platforms, because you can pinpoint any disk performance problems much easier. Thanks, PeterThat would be good to have. I have been asked if hobbit does this from other groups in the company.
Tracking disk IO gets complicated pretty quickly for a few reasons:
* OSes don't have common commands for measuring disk performance.
* Do you watch IO by filesystem or spindle? If you have RAID, grabbing the data can become even more difficult.
* People can disagree on what good disk IO means, and even fewer understand disk IO workloads.
* I have no idea if Windows and the WMI has this kind of info.
For *ix, the "blocked processes" of vmstat is an excellent way to see if the server overall is IO bound. I would definitely like to see that as a "stock" displayed metric in 4.3.0. Most *ix vmstat implementations provide that number. Similar to the iostat for Solaris, the info is collected, just not displayed.
Scott Walters
-PacketPusher
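For anyone who wants to experiment with the "blocked processes" figure mentioned above, here is a minimal sketch. It assumes a Linux- or Solaris-style vmstat layout, where a header line names the columns and one of them is literally called "b". The function name vmstat_blocked is made up for illustration and is not part of Hobbit.

```shell
#!/bin/sh
# Sketch: print the "b" (blocked processes) value from the last sample of
# vmstat output read on stdin. Assumes a header line containing a column
# literally named "b"; other vmstat layouts will need adjusting.
vmstat_blocked() {
    awk '
        # Until the header is found, look for the column named "b"
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "b") col = i; next }
        # Data lines after the header: remember the last numeric value seen
        col > 0 && $col ~ /^[0-9]+$/ { val = $col }
        END { if (val != "") print val }
    '
}

# Typical use (two samples, the second one is reported):
#   vmstat 5 2 | vmstat_blocked
```

Taking the last sample rather than the first matters because the first vmstat line is normally an average since boot.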
list Scott Walters
▸
On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
I also added something I felt was missing: When you have the top-10 list showing that host "foo" has the most status changes, then when you click on that host I wanted an overview of what services put it in the top-10. So I added a summary by service when you click on a host in the top-10 display.
Wow. That is awesome. Great idea. Someone needs to talk to the mail admin of SMTP for www.sslug.dk!
Could you also add the report period to the server and services "sub-reports"?
Could you add state changes by day of week and hour of day?
    Mon   12
    Tue  145
    Wed  351
And since this has all been so easy, how about trending reports based on an interval?
For example, by week of year show total state changes for a specified server or service, e.g.
            server1  server2
    Week 1       10       12
    Week 2      134       23
I'll try and think of a clever way to use RRD for this kind of data. I'd imagine we could structure the RRAs to avoid averaging, and force timestamps to match the interval.
Scott Walters
-PacketPusher
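The day-of-week breakdown requested above can be approximated outside Hobbit. The following is only a sketch: the input layout (HOST SERVICE EPOCH, one event per line) is invented for illustration and is not the real eventlog format, and days are computed in UTC. The function name changes_by_weekday is likewise hypothetical.

```shell
#!/bin/sh
# Sketch: count state changes per day of week (UTC).
# Input on stdin, one event per line: HOST SERVICE EPOCH
# (a made-up layout; adapt the field number to your own log).
changes_by_weekday() {
    awk '
        BEGIN { split("Thu Fri Sat Sun Mon Tue Wed", name, " ") }
        {
            # 1970-01-01 was a Thursday, so day index 0 maps to "Thu"
            d = int($3 / 86400) % 7
            count[d]++
        }
        END {
            # Print in calendar order, Monday first
            n = split("4 5 6 0 1 2 3", order, " ")
            for (i = 1; i <= n; i++) {
                d = order[i]
                printf "%s %d\n", name[d + 1], count[d]
            }
        }
    '
}
```

Days with no events are printed with a count of 0, so the output always has seven lines.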
list Gary Baluha
One feature I'd like to see is a more comprehensive editing page for the Critical Systems. Specifically, it'd be nice to see all of the currently defined groups. This would make it a little easier when adding new hosts to monitor in Hobbit, and ensure that they are added to the correct critical systems group (and to avoid duplicates and near-duplicates).
list Stef Coene
▸
On Wednesday 25 July 2007, Peter Welter wrote:
One of the things I'd like to see in 4.3.0 is what is already partly available on Sun-boxes in 4.2.0, the [iostatdisk] part. The stats are gathered but not plotted. I hope this monitor will make it, on all platforms, because you can pinpoint any disk performance problems much easier.
I have this running for AIX with an external script. One of my todos is making the rrd hobbitd module work like the vmstat module, so you can have a definition per host type. On the other hand, the iostat output is different from the vmstat output, and the external script is working fine .... Stef
list Charles Jones
All the new features sound great. It also sounds like nearly everyone has additional features they would like to see...do you use any sort of tool for tracking feature requests?
P.S. I might as well throw in my own feature request ;-)
* Content check should correctly follow 302 (redirects). I currently have to use a custom-made script that uses curl in order to do content checks. In fact, I will include it in case anyone wants to use it:
#!/bin/bash
# contchk.sh written by Charles Jones (user-02bccbb1bbb5@xymon.invalid) 6/6/2007
# This script is designed to perform a content check on a URL and report the
# status to a Hobbit server.
#
# This script was created because Hobbit's built-in content check functionality
# does not follow 302 redirects.
#
# The script parses out a "contchk" tag in the bb-hosts file. The proper
# syntax is: contchk;URL;REFERRER;CHECKSTRING
#
# Note that CHECKSTRING cannot contain spaces, so use regular expression
# metacharacters instead, e.g. string.with.spaces
#
# The tag is read from the fourth field of the bb-hosts line, so it must be
# the first tag after the "#" (IP HOSTNAME # contchk;...).
# $BBHOME, $BB, $BBDISP and $SED are expected from the Hobbit environment.
BBHTAG=contchk      # Name of the tag in bb-hosts
COLUMN=cont         # Column display name in Hobbit
CURL=/usr/bin/curl  # Location of curl binary
CURLOPTS="--connect-timeout 30 -m 30 -s -L -b cookiejar"  # Curl options
# Note: using grep because bbhostgrep fails on long lines
grep "$BBHTAG" "$BBHOME/etc/bb-hosts" | while read L
do
    set -- $L   # Split the bb-hosts line into positional parameters
    HOSTIP="$1"
    MACHINEDOTS="$2"
    MACHINE=`echo "$2" | $SED -e 's/\./,/g'`
    CHECKURL=`echo "$4" | awk -F";" '{print $2}'`     # Parse out the check URL
    REFERRER=`echo "$4" | awk -F";" '{print $3}'`     # Parse out the referrer string
    if [ -n "$REFERRER" ]
    then
        REFERRER="-e $REFERRER"
    fi
    CHECKSTRING=`echo "$4" | awk -F";" '{print $4}'`  # Parse out the check string
    # $CURLOPTS and $REFERRER are deliberately unquoted so they split into options
    $CURL $CURLOPTS $REFERRER "$CHECKURL" | grep -q "$CHECKSTRING"
    status=$?   # Save grep's return status
    if [ "$status" -eq 0 ]; then   # grep returns 0 if it found something
        COLOR=green
        MSG="String <b>\"$CHECKSTRING\"</b> was found in <a href=\"$CHECKURL\">$CHECKURL</a>"
        $BB $BBDISP "status $MACHINE.$COLUMN $COLOR `date` Content Check OK
${MSG}
"
    else                           # grep didn't find anything
        COLOR=red
        MSG="String <b>\"$CHECKSTRING\"</b> was NOT FOUND in <a href=\"$CHECKURL\">$CHECKURL</a>"
        $BB $BBDISP "status $MACHINE.$COLUMN $COLOR `date` Content Check FAILED
${MSG}
"
    fi
done
exit 0
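The script above takes the tag from the fourth field of the bb-hosts line, so it only works when contchk is the first tag after the "#". If the tag can appear anywhere among the tags, something like the following sketch could be used to locate it. The function name find_contchk_tag is hypothetical and the example hostnames are placeholders.

```shell
#!/bin/sh
# Sketch: print the first "contchk;..." tag found on a bb-hosts line,
# wherever it appears among the tags. Returns non-zero if there is none.
find_contchk_tag() {
    # $1 = one full line from bb-hosts
    set -f                      # no filename globbing while word-splitting
    for word in $1; do
        case "$word" in
            "contchk;"*)
                set +f
                printf '%s\n' "$word"
                return 0
                ;;
        esac
    done
    set +f
    return 1
}

# The parts can then be split on ";" as in the original script, e.g.:
#   TAG=$(find_contchk_tag "$L") || continue
#   CHECKURL=$(printf '%s\n' "$TAG" | cut -d';' -f2)
```

Matching on the "contchk;" prefix also avoids picking up lines where the word contchk merely appears in a comment or hostname.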
list Henrik Størner
▸
On Wed, Jul 25, 2007 at 01:11:47PM -0400, Scott Walters wrote:
On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:I also added something I felt was missing: When you have the top-10 list showing that host "foo" has the most status changes, then when you click on that host I wanted an overview of what services put it in the top-10. So I added a summary by service when you click on a host in the top-10 display.Wow. That is awesome. Great idea. Someone needs to talk to the mail admin of SMTP for www.sslug.dk! Could you also add the report period to the server and services "sub-reports"?
Done.
▸
Could you add state changes by day of week and hour of day? And since this all been so easy, how about trending reports based on an interval?
Let's leave those for now - these will be more difficult to implement. The only other addition I'd like to make for this report now is to have it count the event durations instead of the number of changes, so you can have a top-10 report of the hosts (or services) that have the longest outages. Could be useful when playing the "blame game". "Look - the DB people are always soooo slow when it comes to cleaning up the filled tables". Regards, Henrik
list Ralph Mitchell
▸
On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
On Wed, Jul 25, 2007 at 01:11:47PM -0400, Scott Walters wrote:On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:I also added something I felt was missing: When you have the top-10 list showing that host "foo" has the most status changes, then when you click on that host I wanted an overview of what services put it in the top-10. So I added a summary by service when you click on a host in the top-10 display.
Just a couple of minor observations on the top-10 list:
1) the right hand box, "Top 10 Services" has a "Host" column. Probably
should be "Service"??
2) I was lazy and just put in the date, with no time of day, and scored an
"Internal Server Error". Could it default to "from 00:00:00" & "to
23:59:59", or maybe have a "last XX minutes OR from/to", same as the
Notification Report??
Thanks,
Ralph Mitchell
list Henrik Størner
▸
On Thu, Jul 26, 2007 at 02:35:26PM -0500, Ralph Mitchell wrote:
Just a couple of minor observations on the top-10 list: 1) the right hand box, "Top 10 Services" has a "Host" column. Probably should be "Service"??
Of course - fixed.
▸
2) I was lazy and just put in the date, with no time of day, and scored an "Internal Server Error". Could it default to "from 00:00:00" & "to 23:59:59", or maybe have a "last XX minutes OR from/to", same as the Notification Report??
I'll have to do some extra checking on that input. I've also added some buttons so you can easily select the last/current year/month/week. Regards, Henrik
list Sabeer MZ
Many thanks. I'll check it out...
▸
On 7/25/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:On Wed, Jul 25, 2007 at 11:26:33AM +0530, Sabeer MZ wrote:New Feature Request.- I would like to add some network news on the pages. Suppose we found that some server has bad disk and some one will fix it later so here i want add the info that this issue has taken care.Several possibilities already: 1) Ack the red/yellow statuses you have, and put this information in the acknowledgement text. 2) Disable the server and provide the information in the disable text. 3) Create a host "notes" file with the information. I'd use 1) or 2). I don't see the need for a fourth way of doing this. Regards, Henrik
--
Thanks
Sabeer MZ
list Jason Altrincham Jones
Hi All, I've been looking through the archives at the sample server-side module Henrik showed us (http://www.hswn.dk/hobbiton/2007/01/msg00487.html). I'm curious whether anyone knows how to send data from the client side to the server the same way the standard Hobbit client tests do. Looking at bb, I'm guessing either bb data or bb client, but when I run bb <hobbitIP> "client <hostname>.<os>" nothing happens, and the man pages don't mention how to actually send the data. Any help appreciated, Thanks, Jason.
list Ralph Mitchell
▸
On 7/31/07, Jones, Jason (Altrincham) <user-ee957b46acd2@xymon.invalid> wrote:
I've been looking through the archives at the sample server side module Henrick showed us (http://www.hswn.dk/hobbiton/2007/01/msg00487.html) I'm just curious if anyone knows how to send data from the client side to the server the same way the standard hobbit client tests do, looking at bb I'm guessing either bb data or bb client, but when I run bb <hobbitIP> "client <hostname>.<os>" nothing happens and the man pages don't mention how to actually send the data etc.
I have a Hobbit client install running a BigIP check. In client/etc/clientlaunch.cfg:
    [bigip-v4]
        ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
        CMD $HOBBITCLIENTHOME/ext/bigip/bigip3.sh
        LOGFILE $HOBBITCLIENTHOME/logs/hobbitclient.log
        INTERVAL 5m
After doing what it needs to do to get the status, the script sends off a status message to the server like this:
    MACHINE=`echo $NAME | sed -e 's/\./,/g'`
    MESSAGE="status $MACHINE.$TEST $COLOR `date`<P><font size=+2>The $BIGIP BigIP says: $NAME $TEST is $STATE</font>"
    # Quoted so that bb receives the whole message as a single argument
    $BB $BBDISP "$MESSAGE"
Is that what you're looking for??
Ralph Mitchell
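Building on the snippet above, here is a sketch of a helper that wraps the output of an arbitrary command in a status message. The layout "status HOST.TEST COLOR date" and the comma-for-dot substitution are taken from the messages in this thread; the function name build_status and the example test name are made up. Note that the color is still chosen on the client side.

```shell
#!/bin/sh
# Sketch: build a Hobbit status message whose body is command output.
# Usage: some_command | build_status HOSTNAME TEST COLOR
build_status() {
    # Dots in the hostname are replaced with commas, as in the scripts above
    machine=$(printf '%s' "$1" | sed -e 's/\./,/g')
    printf 'status %s.%s %s %s\n' "$machine" "$2" "$3" "$(date)"
    cat     # append stdin (the command output) as the message body
}

# Sending it: $BB and $BBDISP come from the Hobbit client environment, and
# the message is quoted so it is passed as one argument.
#   MESSAGE=$(df -k | build_status "$NAME" mydisk green)
#   $BB $BBDISP "$MESSAGE"
```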
list Sofian Brabez
Hello Jones,
You can use the following command line to send data from your client to the server:
bbuser at server:$BBHOME/bin$ bb <hobbitdisplay> "status <hostname>.<test> <color> <date> <message>"
where:
<hobbitdisplay> is your BBDISPLAY, set in your $BBHOME/etc/bb-hosts agent file
<hostname> is the name of the host
<date> is the current date; I suggest you use the default Unix command `date`
<test> is the service to monitor, for example cpu, conn, disk or msgs, and you can put a regular expression
<color> is the color on BBDISPLAY
<message> is the message you want to display on your BBDISPLAY; you can put HTML text in it for a better visual appearance
I hope this answers your question and helps you.
Regards
--
Sofian Brabez
Monitoring Team
Natixis France
user-2ae52e06a4a1@xymon.invalid
▸
From: Jones, Jason (Altrincham) [mailto:user-ee957b46acd2@xymon.invalid] Sent: Tuesday, July 31, 2007 11:52 AM To: user-ae9b8668bcde@xymon.invalid Subject: [hobbit] sending client side data Hi All, I've been looking through the archives at the sample server side module Henrick showed us (http://www.hswn.dk/hobbiton/2007/01/msg00487.html) I'm just curious if anyone knows how to send data from the client side to the server the same way the standard hobbit client tests do, looking at bb I'm guessing either bb data or bb client, but when I run bb <hobbitIP> "client <hostname>.<os>" nothing happens and the man pages don't mention how to actually send the data etc. Any help appreciated, Thanks, Jason.
list Jason Altrincham Jones
Not really; that sends predetermined colours. What I was thinking of is sending the output of command x and then having Hobbit generate the web page. Any ideas? Jason.
▸
-----Original Message-----
From: Ralph Mitchell [mailto:user-00a5e44c48c0@xymon.invalid]
Sent: 31 July 2007 11:32
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] sending client side data
On 7/31/07, Jones, Jason (Altrincham) <user-ee957b46acd2@xymon.invalid> wrote: I've been looking through the archives at the sample server side module Henrick showed us (http://www.hswn.dk/hobbiton/2007/01/msg00487.html) I'm just curious if anyone knows how to send data from the client side to the server the same way the standard hobbit client tests do, looking at bb I'm guessing either bb data or bb client, but when I run bb <hobbitIP> "client <hostname>.<os>" nothing happens and the man pages don't mention how to actually send the data etc.
I have a Hobbit client install running a BigIP check. In
client/etc/clientlaunch.cfg:
[bigip-v4]
ENVFILE $HOBBITCLIENTHOME/etc/hobbitclient.cfg
CMD $HOBBITCLIENTHOME/ext/bigip/bigip3.sh
LOGFILE $HOBBITCLIENTHOME/logs/hobbitclient.log
INTERVAL 5m
After doing what it needs to do to get the status, the script sends
off a status message to the server like this:
MACHINE=`echo $NAME | sed -e 's/\./,/g'`
MESSAGE="status $MACHINE.$TEST $COLOR `date`<P><font size=+2>The
$BIGIP BigIP says: $NAME $TEST is $STATE</font>"
$BB $BBDISP $MESSAGE
Is that what you're looking for??
Ralph Mitchell
list Buchan Milne
▸
On Tuesday 24 July 2007 22:55:02 Hubbard, Greg L wrote:
Wonder if there is any way to tell a client what its status is so it can be autonomous? What I mean is this: suppose there was a way for the Hobbit client to tell the server that service X was now in state Y, and a client-side module could then activate response Z on its own?
I don't like band-aids like this. "restart because it's down" prevents the real impact of problems being seen, and provides less motivation for fixing things properly. Instead, you sit with frequent short outages (which may avoid the attention of managers, production managers) which have end-user impact. I like even less using a monitoring system to do this ... Regards, Buchan
list Tod Hansmann
In my experience, I have to agree. Hobbit is for monitoring so the information that x is down gets to people who can properly diagnose what is going on, not take generic actions. If generic actions were something that were required for X to function properly, it should be a feature of that software. Hobbit CAN do some scripting based on alerts, but even that might be a bit more than a systems administrator wants to hinder himself with. Tod Hansmann Network Engineer
▸
-----Original Message-----
From: Buchan Milne [mailto:user-9b139aff4dec@xymon.invalid]
Sent: Friday, August 03, 2007 12:31 AM
To: user-ae9b8668bcde@xymon.invalid
Cc: Hubbard, Greg L
Subject: Re: [hobbit] Highlights of the 4.3.0 version
On Tuesday 24 July 2007 22:55:02 Hubbard, Greg L wrote: Wonder if there is any way to tell a client what its status is so it can be autonomous? What I mean is this: suppose there was a way for the Hobbit client to tell the server that service X was now in state Y, and a client-side module could then activate response Z on its own?
I don't like band-aids like this. "restart because it's down" prevents the real impact of problems being seen, and provides less motivation for fixing things properly. Instead, you sit with frequent short outages (which may avoid the attention of managers, production managers) which have end-user impact. I like even less using a monitoring system to do this ... Regards, Buchan
list Galen Johnson
Don't forget... this is the model that Tivoli and HP Openview, and many other commercial monitoring solutions, provide and sell as a feature. From my experience as a sys admin, I've always found automatically restarting a service when it goes down to be "a bad thing"(TM). In many solutions, logs that would be integral to the real resolution and prevention get overwritten upon a restart. =G=
▸
-----Original Message-----
From: Tod Hansmann [mailto:user-b6e28cb93fa4@xymon.invalid]
Sent: Friday, August 03, 2007 10:40 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] Highlights of the 4.3.0 version
In my experience, I have to agree. Hobbit is for monitoring so the
information that x is down gets to people who can properly diagnose what
is going on, not take generic actions. If generic actions were
something that were required for X to function properly, it should be a
feature of that software.
Hobbit CAN do some scripting based on alerts, but even that might be a
bit more than a systems administrator wants to hinder himself with.
Tod Hansmann
Network Engineer
list Thomas Kern
When a monitoring system detects something wrong, the only action I want the monitor to perform is to get the admin (or the admin's boss) moving to diagnose and fix the problem. And I am the admin that I am most concerned with. I don't understand most of the errors well enough to automate a recovery process. /Thomas Kern /XXX-XXX-XXXX
▸
-----Original Message-----
From: Galen Johnson [mailto:user-87f955643e3d@xymon.invalid]
Sent: Friday, August 03, 2007 11:18 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] Highlights of the 4.3.0 version
Don't forget...this is the model that Tivoli and HP Openview, and many other commercial monitoring solutions provide and sell as a feature. From my experience as a sys admin, I've always found that automatically restarting a service if it goes down to be "a bad thing"(TM). In many solutions, logs get overwritten upon a restart that would be integral to the real resolution and prevention. =G=
list Greg L Hubbard
Well, I use Netcool which has the opposite philosophy -- there is a "process automation" system that watches processes and restarts them if they fail, while also logging restarts. You can configure a "restart" parameter to be anything from 0 (forever) to any number of times. I like to set a reasonable number so persistent errors eventually kill the process, but occasional errors do not. Log files are not overwritten, but are appended and rotated. But whatever. My view seems to be in the minority -- guess the rest of you don't mind 24x7x365 babysitting. GLH
▸
-----Original Message-----
From: Galen Johnson [mailto:user-87f955643e3d@xymon.invalid]
Sent: Friday, August 03, 2007 10:18 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] Highlights of the 4.3.0 version
Don't forget...this is the model that Tivoli and HP Openview, and many
other commercial monitoring solutions provide and sell as a feature.
From my experience as a sys admin, I've always found that automatically
restarting a service if it goes down to be "a bad thing"(TM).
In many solutions, logs get overwritten upon a restart that would be
integral to the real resolution and prevention.
=G=
list S Aiello
▸
On Friday 03 August 2007 11:38, Hubbard, Greg L wrote:
Well, I use Netcool which has the opposite philosophy -- there is a "process automation" system that watches processes and restarts them if they fail, while also logging restarts. You can configure a "restart" parameter to be anything from 0 (forever) to any number of times. I like to set a reasonable number so persistent errors eventually kill the process, but occasional errors do not. Log files are not overwritten, but are appended and rotated. But whatever. My view seems to be in the minority -- guess the rest of you don't mind 24x7x365 babysitting. GLH
To restart a process, some form of intelligence has to be added to the restart script, especially when recovering from a failure mode. Scripts can only have so much intelligence, so a restart script can be dangerous unless it is dealing with a simple situation. Having said all this, I have to admit that I do have scripts that query the status on the monitoring server and perform a restart on reds. There should be nothing stopping you from implementing the same. It is just a very fine line when deciding when and how to implement process restarts. More often than not, it is much better for a person to react to an alert than a script. But for recurring failure modes, these scripts do help, and I don't get called at 3 am. So if you really need to implement restart scripts, just use the bb tool's query feature.
~Steve
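A minimal sketch of a query-driven restart script of the kind described above, written in Python for illustration. It assumes the Hobbit `bb` client is on the PATH and that `bb SERVER "query HOST.TEST"` prints the first line of the latest status with the color as its first word; check the bb man page for your version. The function names, and any server, host, test and restart command you pass in, are placeholders rather than anything from the original scripts.

```python
import subprocess


def query_color(server, host, test, run=subprocess.run):
    """Ask the Hobbit server for a status and return its color, or None."""
    # Assumption: `bb SERVER "query HOST.TEST"` prints the first line of the
    # status message, and the color is the first word on that line.
    result = run(["bb", server, f"query {host}.{test}"],
                 capture_output=True, text=True, timeout=30)
    words = result.stdout.split()
    return words[0].lower() if words else None


def restart_if_red(server, host, test, restart_cmd, run=subprocess.run):
    """Run restart_cmd only when the queried status is red."""
    if query_color(server, host, test, run=run) != "red":
        return False
    # The restart command is supplied by the caller; nothing is built in.
    run(restart_cmd, check=False)
    return True
```

A cron job could call something like `restart_if_red("127.0.0.1", "www.example.com", "http", ["/etc/init.d/httpd", "restart"])`, with every one of those values replaced by your own. The `run` parameter exists only so the logic can be exercised without a real `bb` binary.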
list Scott Walters
I am definitely in the "monitor only" camp. As appealing as "self-healing" may seem, I've seen attempts go horribly wrong too many times. For example, shutting down Oracle for upgrades and then having it restarted in the middle of the upgrade. Not good.
I also agree that "self-healing" lends itself to band-aids that avoid root-cause determination. I don't think this requires "baby-sitting," but a commitment to fixing things once. I have also had the displeasure of making permanent band-aids, but I cannot condone it.
All of those "operational" aspects aside, I've convinced myself that from a security point of view, corrective action from monitoring is bad: a clear violation of the separation of duties. You don't want your auditors "cleaning up" the numbers as they go over your books.
You know what's better than your webserver being automatically restarted when it crashes? Your webserver not crashing.
I completely support the absence of corrective actions from monitor triggers. The question I have yet to answer satisfactorily is, "Should the monitoring system perform additional data collection after specific errors?" For example, running a particular "find" command when disk usage increases to try and identify which files are causing the partition to fill.
Scott Walters
-PacketPusher
list Henrik Størner
On Fri, Aug 03, 2007 at 01:15:27PM -0400, Scott Walters wrote:
I am definitely in the "monitor only" camp.
Me too. For those who feel differently, Hobbit does provide the necessary hooks so you can trigger actions from some status going red, either through alert scripts or from the bb "query" command which others have mentioned. In fact, I implemented the "query" feature because I needed it to set up such an automated recovery for one of our customers at work.
▸
All of those "operational" aspects aside, I've convinced myself from a security point of view, corrective action from monitoring is bad-- a clear violation of the separation of duties. You don't want your auditors "cleaning up" the numbers as they go over your books.
Good point.
▸
The question I have yet to answer satisfactorily is,"Should the monitoring system perform additional data collection after specific errors?" For example, running a particular "find" command when disk usage increases to try and identify which files are causing the partition to fill.
It can be very useful at times, especially when you have to do a "root cause analysis" to explain why some service was down at 2 AM - and the problem was fixed by a 2nd-level technician who just rebooted the box. That's why I added the feature that Hobbit saves the latest client-data report when a status goes yellow or red. It has helped me track down the cause of quite a few service outages.
Regards,
Henrik
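As an illustration of the extra data collection discussed above, here is a small Python sketch that lists the largest files under a directory, roughly what a `find`-based one-liner would report when a disk status changes color. How it gets triggered (alert script, client-side extension) is deliberately left open, and the function name and the default of ten files are assumptions made for the example.

```python
import os


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) tuples, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() measures a symlink itself instead of following it.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # Files can vanish or be unreadable while we scan; skip them.
                continue
    sizes.sort(reverse=True)
    return sizes[:count]
```

Unlike `find -xdev`, this walk does not stop at filesystem boundaries, so point it at a directory that lives entirely on the partition you care about.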
list Dave Haertig
Sometimes the real world runs interference for Utopia. While in Utopia you want to analyse, find the root cause, and fix everything before proceeding, you can't always do that. When an outage of one hour costs your company tens of thousands of dollars, you can't justify withholding a simple band-aid (so long as you don't then ignore the long-term fix).
Most everything I do in Hobbit is a custom script. Restarting crashed processes is one of the least of my worries, although in some rare cases I do just that (short term), with appropriate logging and email to the app development team. The corporate expense of having the app down is too great to let Utopian ideas prevail.
Most of the automated Hobbit stuff I do is not restarting dead apps (luckily, that is very infrequent around here). It's more mundane. One example is disk space. A full filesystem would shut many things down. Apps should not fill a filesystem, but sometimes they do. So my custom Hobbit scripts first scream and scream about low disk space, even analysing things down to specific subdirectories and fast-growing files and doing trend analysis. But if their call is not answered, they start freeing up space from a "private reserve" I have set aside to deal with emergencies. So if we experience a sudden unexpected blowup in a filesystem at 3am, Hobbit keeps things running in production until the appropriate people can look into and diagnose the problem. This may not be Utopian behavior, but it sure is practical at 3am!
But my vote would be for Hobbit out-of-the-box to NOT attempt automated repair actions. That should be left to the Hobbit administrator. We can write custom monitor scripts or custom alert scripts to add this functionality if it's appropriate for our environments. It's trivial to integrate your own scripting into Hobbit.
I sure wish I worked in Utopia though. The job would be a helluva lot less stressful! :-)
list Kolbjørn Barmen
On Fri, 3 Aug 2007, Haertig, David F (Dave) wrote:
I sure wish I worked in Utopia though.
Ditto, since there would be no top-posters around... -- Kolbjørn Barmen UNINETT Driftsenter
list Gary Baluha
▸
On 8/3/07, Haertig, David F (Dave) <user-68874b735d77@xymon.invalid> wrote:
Most everything I do in Hobbit is a custom script. Restarting crashed processes is one of the least of my worries. Although in some rare cases I do just that (short term), with appropriate logging and email to the app developement team. The corporate expense of having the app down is too great to let Utopian ideas prevail.
Agreed, though sometimes it's worth the effort for an extra few minutes of
downtime to do *some* analysis.
▸
Most of the automated Hobbit stuff I do is not restarting dead apps (luckily, that is very infrequent around here). It's more mundane. One example is disk space. A full filesystem would shut many things down. Apps should not fill a filesystem, but sometimes they do. So my custom Hobbit scripts first scream and scream about low disk space, even analysing things down to specific subdirectories and fast growing files and doing trend analysis. But if their call is not answered, they start freeing up space from a "private reserve" I have set aside to deal with emergencies. So if we experience a sudden unexpected blowup in a filesystem at 3am, Hobbit keeps things running in production until the appropriate people can look into and diagnose the problem. This may not be Utopian behavior, but it sure is practical at 3am in the morning!
What sort of trend analysis do your scripts perform? We have a few boxes
that are notorious for filling up their disk space, and I haven't yet come
up with an idea of how to neatly track exactly what it is that keeps filling
up the disk.
▸
But my vote would be for Hobbit out-of-the-box to NOT attempt automated repair actions. That should be left to the Hobbit administrator. We can write custom monitor scripts or custom alert scripts to add this functionality if it's appropriate for our environments. It's trivial to integrate your own scripting into Hobbit.
Due to the demands of some of the other admins, I have implemented a script
that does some rudimentary restarting, and even looks at the status of the
specific Hobbit alert in question, so that it doesn't try to restart
something, if the alert has been disabled (such as for a planned downtime).
It wasn't all that hard to write, and I also would prefer Hobbit NOT have
auto-restart logic out of the box.
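The check described above can be sketched as a small decision function. This is an illustration in Python, not the script from the message: it assumes a disabled test is reported with the color "blue", and it adds a cap on restart attempts, which echoes the restart limit mentioned earlier in the thread rather than anything stated here. The name `should_restart` and the default of three attempts are hypothetical.

```python
def should_restart(color, restarts_so_far, max_restarts=3):
    """Decide whether an automatic restart is appropriate.

    color: current status color reported by the monitoring server.
    restarts_so_far: automatic restarts already attempted for this outage.
    """
    # Assumption: a disabled test shows up as "blue" (e.g. planned downtime),
    # so only "red" counts as a failure worth acting on.
    if color != "red":
        return False
    # Give up after a few attempts so a persistent fault reaches a human.
    return restarts_so_far < max_restarts
```

Keeping the decision separate from the code that actually restarts anything makes it easy to log why a restart was skipped.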
▸
I sure wish I worked in Utopia though. The job would be a helluva lot less stressful! :-)
Working in the real world isn't as bad as working in a real world
where management _thinks_ you actually work in Utopia, and yet still can't
spare an extra second of downtime for real-time root cause analysis. ;-)
list Dave Haertig
I try to identify filesystem "space hogs" via custom scripts I wrote a long time ago when using BB. 99% of my custom stuff is done in Perl.
I use 'du -k' to get the size of all directories in the filesystem. I then cut those results down to only the first- and second-level directories (but you could go as deep as you want). I store the size of each subdirectory in a small "database". I did this ages ago, and my code uses Perl's "Storable" module to store the accumulated data in a file (my "database"). These days I'd just use Hobbit's easily accessed RRD files.
I then use Perl's Statistics::Descriptive::least_squares_fit() to calculate the slope and linear correlation coefficient of the "best fit line". This allows me to see how fast each subdirectory is growing/shrinking, and how linear that growth/reduction is. I trigger yellow/red conditions based on rate of growth and predicted fill time at the current growth rate, in addition to the standard "95% full = red" test.
The above makes it fairly easy to identify which subdirectory is your problem, which is often good enough to identify the file/process that is killing you. When it's not, I have a separate test that tries to identify problem files a different way. BB/Hobbit uses 'top' to identify CPU-hogging processes. Many times, files hogging space are directly tied to processes hogging CPU (runaway process = runaway file in many cases). 'top' identifies the process(es), then "lsof -p <pid>" is used to identify the files that the suspect process has open. Finding a CPU-hogger that has a filespace-hogger open is usually the holy grail you seek.
As a "repair" action for Hobbit, I squirreled away 2 GB of disk space in 100 MB chunks for critical filesystems: "dd if=/dev/zero of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then "cp reserve01 reserve02", etc. to build up the reserve. A separate Hobbit "notification script" simply deletes files from this reserve under dire circumstances, after normal email/pager notifications have failed to trigger action by developers/production support people.
My BB/Hobbit custom scripts tend to get quite involved. Probably too much so, but they're fun for me to write!
list Buchan Milne
▸
On Friday 03 August 2007 19:15:27 Scott Walters wrote:
I am definitely in the "monitor only" camp. As appealing as "self-healing" may seem, I've seen attempts go horrible wrong too many times. For example, shutting down Oracle for upgrades and then being restarted in the middle of the upgrade. Not good.
How about the easy example of a web server not responding? Do you restart it? In the case I am thinking of, no: the reason it is not responding is that the database server that it (and another 4 web servers) is waiting for is having problems. Restarting the web server would drop the >1000 existing (working) sessions, causing a full-blown outage, and migrate the problem to the other 4 web servers that sit behind the same load balancer.
▸
I also agree that "self-healing" lends itself to band-aids that avoid root-cause determination.
Or *prevent* the root-cause determination. For example, I had a problem on an LDAP server that appeared once every 2 or 3 weeks. I started it under a debugger, and when it next hit the problem, some online debugging (after taking it out of the pool) with a developer found and fixed the bug within one hour (and allowed me to understand the cause so I could work around it). A restart here would have meant waiting some more and another few outages.
▸
I don't think this requires "baby-sitting," but a commitment to fixing things once. I have also had the displeasure of making permanent band-aids, but I cannot condone it.
We do have some applications that require supervision ... but for them we use daemontools or supervise-scripts (a re-implementation of daemontools), as these are *much* better at supervision than a monitoring system. If you really need a baby-sitter, the monitoring system isn't the best one ...
▸
All of those "operational" aspects aside, I've convinced myself from a security point of view, corrective action from monitoring is bad-- a clear violation of the separation of duties. You don't want your auditors "cleaning up" the numbers as they go over your books. You know what's better than your webserver being automatically restarted when it crashes? Your webserver not crashing. I completely support the absence of corrective actions from monitor triggers. The question I have yet to answer satisfactorily is,"Should the monitoring system perform additional data collection after specific errors?" For example, running a particular "find" command when disk usage increases to try and identify which files are causing the partition to fill.
Or attach a debugger to the hung process and get a backtrace?
Regards,
Buchan
list Buchan Milne
▸
On Monday 06 August 2007 21:25:46 Haertig, David F (Dave) wrote:
I try to identify filesystem "space hogs" via custom scripts I wrote a long time ago when using BB. 99% of my custom stuff is done in PERL. I use 'du -k' to get the size of all directories in the filesystem. I then cut those results down to only the first and second level directories (but you could go as deep as you want). I store the size of each subdirectory in a small "database". I did this ages ago and my code uses PERL's "Storable" module to store the accumulated date into a file (called my "database"). These days I'd just use Hobbit's easily accessed RRD files. I then use PERL's Statistics::Descriptive::least_squares_fit() to calculate the slope and linear correlation coefficient of the "best fit line".
This would be really useful to do on directories monitored with the dir option in client-local.cfg plus the DIR option in hobbit-clients.cfg, e.g. to be able to trigger alerts at a specified "time before disk is full".
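The trend calculation quoted above can be sketched in a few lines. This is a plain-Python illustration of an ordinary least-squares fit and a "time before disk is full" estimate, not the Perl code from the message; the function names and the (timestamp, used_kb) sample format are assumptions made for the example.

```python
def least_squares_fit(points):
    """Fit y = intercept + slope * x; return (slope, intercept, r)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all samples have the same timestamp")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is the linear correlation coefficient; 0.0 when usage never changes.
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return slope, intercept, r


def seconds_until_full(points, capacity_kb):
    """Predict seconds until usage reaches capacity_kb, or None if not growing.

    points: list of (unix_time, used_kb) samples, e.g. taken from `du -k`.
    """
    slope, intercept, _r = least_squares_fit(points)
    if slope <= 0:
        return None
    latest_time = max(x for x, _ in points)
    # Time at which the fitted line crosses the capacity.
    full_time = (capacity_kb - intercept) / slope
    return max(0.0, full_time - latest_time)
```

A yellow or red threshold could then be expressed as "fewer than N hours until full", and the correlation coefficient gives a rough idea of how far to trust the prediction: values near 1 mean steady growth, values near 0 mean the line is a poor fit.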
▸
This allows me to see how fast each subdirectory is growing/shrinking, and how linear that growth/reduction is. I trigger yellow/red conditions based on rate of growth and predicted fill time at current growth rate, in addition to the standard "95% full = red" test. The above makes it fairly easy to identify which subdirectory is your problem, which is often times good enough to identify the file/process that is killing you. When that's not, I have a seperate test that tries to identify problem files a different way. BB/Hobbit uses 'top' to identify cpu-hogging processes. Many times you see files hogging space are directly tied to processes hogging cpu (runaway process = runaway file in many cases). 'top' identifies the process(es), then "lsof -p <pid>" is used to identify the files that the suspect process has open. Finding a cpu-hogger that has a filespace-hogger open is usually the holy grail you seek.
The "CPU usage by process" graph is the utopian one ...
▸
As a "repair" action for Hobbit, I squirreled away 2Gb of diskspace in 100Mb chunks for critical filesystems. "dd if=/dev/zero of=/filesystem/DiskSpaceReserve/reserve01 bs=1024 count=102400", then "cp reserve01 reserve02", etc. to build up the reserve.
lvextend may be another useful command here ... Regards, Buchan
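For readers who want to try the reserve-file approach quoted above, here is a hedged Python equivalent of the `dd` and `cp` commands plus the "delete one chunk" step. The file naming follows the `reserve01`, `reserve02` pattern from the message; the function names are hypothetical, and hooking `release_one` into a notification script is left to the administrator.

```python
import os

WRITE_BLOCK = 1024 * 1024  # write zeros 1 MiB at a time


def build_reserve(directory, chunks, chunk_bytes):
    """Create `chunks` zero-filled files of `chunk_bytes` each."""
    os.makedirs(directory, exist_ok=True)
    for i in range(1, chunks + 1):
        path = os.path.join(directory, f"reserve{i:02d}")
        with open(path, "wb") as f:
            remaining = chunk_bytes
            while remaining > 0:
                step = min(WRITE_BLOCK, remaining)
                # Write real zeros (as dd from /dev/zero does) instead of
                # creating a sparse file.
                f.write(b"\0" * step)
                remaining -= step


def release_one(directory):
    """Delete one reserve file and return its path, or None if none are left."""
    names = sorted(n for n in os.listdir(directory) if n.startswith("reserve"))
    if not names:
        return None
    path = os.path.join(directory, names[-1])
    os.remove(path)
    return path
```

Calling `build_reserve("/filesystem/DiskSpaceReserve", 20, 100 * 1024 * 1024)` would set aside roughly the 2 GB in 100 MB chunks described in the message. On filesystems that compress or deduplicate data, zero-filled files may not actually hold the space, so check with `df` after building the reserve.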
list Jason Altrincham Jones
Hi All, Is there a way to filter out hosts/sites based on pagename or just a regular expression? Trying to do an availability report for all sites except one and wondering if there is a way. Thanks, Jason.
list Robert
Hi list,
I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT -DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include" SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include" SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket -lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro +=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
It is using the -C option, which this make does not accept. Which option can I use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient?
Please let me know.
Thanks in advance
list Steve Holmes
Make sure the command make is really gmake, or use gmake explicitly. Steve
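One way to check which make you are getting is to ask it for its version. The sketch below, in Python for illustration, looks for `gmake` and then `make` on the PATH and keeps the first one whose `--version` output mentions "GNU Make". It assumes that a non-GNU make either rejects `--version` or prints something else; the function names are made up for the example.

```python
import shutil
import subprocess


def is_gnu_make(version_output):
    """True if the text printed by `make --version` identifies GNU make."""
    return "GNU Make" in version_output


def find_gnu_make(candidates=("gmake", "make")):
    """Return the first candidate on PATH that is GNU make, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            result = subprocess.run([path, "--version"], capture_output=True,
                                    text=True, timeout=10)
        except (OSError, subprocess.TimeoutExpired):
            continue
        # Assumption: a non-GNU make will not print "GNU Make" here.
        if is_gnu_make(result.stdout):
            return path
    return None
```

If `find_gnu_make()` returns a path, run the build with that binary; if it returns None, GNU make is not installed or not on the PATH.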
list Tom Moore
Install and try "gmake" on your Solaris machine. You can get a precompiled binary package from www.sunfreeware.com.
list Pkc_mls
Robert wrote:
Hi list,
Hi Robert,
▸
I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT -DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include" SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include" SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket -lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro +=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
Are you using gmake or the standard Solaris make?
▸
It is using the -C option, which this make does not accept. Which option can I use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient? Please let me know. Thanks in advance
list Mike Arnold
▸
Robert wrote:
Hi list,
I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT
-DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include"
SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include"
SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket
-lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][
-t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro
+=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
It is using the -C option, which this make does not accept. Which option can I
use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient?
Please let me know.
You need to use gmake and not Solaris make. Or you can always use the packages from http://www.blastwave.org/ .
-- -m
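One way to check which make is on the PATH before building is sketched below. It assumes a bash shell (as in Robert's session); `find_gnu_make` is a hypothetical helper name, not something shipped with Hobbit, and the check relies only on GNU make printing a "GNU Make" banner for `--version`.

```shell
#!/bin/bash
# Sketch only: find_gnu_make is a hypothetical helper, not part of Hobbit.
find_gnu_make() {
    local candidate
    for candidate in gmake make; do
        # GNU make answers --version with a "GNU Make ..." banner;
        # a missing or non-GNU make will not produce that text.
        if "$candidate" --version 2>/dev/null | grep "GNU Make" >/dev/null; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

if MAKE_CMD=$(find_gnu_make); then
    echo "GNU make found: $MAKE_CMD"
else
    echo "No GNU make found on PATH" >&2
fi
```

If a GNU make is found, running "$MAKE_CMD" from the source directory in place of plain make should avoid the "Unknown option `-C'" failure.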
list Trent Melcher
Talk about hijacking a thread, and it's not even close to anything about Filter reports. Folks, please create a new email with a new subject for your "New" posts. This will help threads stay on subject.
Thanks
Trent
▸
On Thu, 2007-08-09 at 05:59 -0700, Robert wrote:
Hi list,
I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT
-DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include"
SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include"
SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv
-lsocket -lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]...
[ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro
+=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
It is using the -C option, which this make does not accept. Which option can I
use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient?
Please let me know.
Thanks in advance
list Jason Altrincham Jones
I thought it'd gotten a little off topic too :) Original question I asked:
▸
Hi All,
Is there a way to filter out hosts/sites based on pagename or just a
regular expression? Trying to do an availability report for all sites
except one and wondering if there is a way.
Thanks,
Jason.
-----Original Message-----
▸
From: Trent Melcher [mailto:user-c65e78735b17@xymon.invalid]
Sent: 09 August 2007 16:56
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Filter reports?
Talk about hijacking a thread, and it's not even close to anything about
Filter reports. Folks, please create a new email with a new subject for
your "New" posts. This will help threads stay on subject.
Thanks
Trent
On Thu, 2007-08-09 at 05:59 -0700, Robert wrote:
Hi list,
I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT
-DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include"
SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include"
SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv
-lsocket -lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]...
[ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro
+=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
It is using the -C option, which this make does not accept. Which option can I
use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient?
Please let me know.
Thanks in advance
list Henrik Størner
▸
On Thu, Aug 09, 2007 at 12:33:36PM +0100, Jones, Jason (Altrincham) wrote:
Is there a way to filter out hosts/sites based on pagename or just a regular expression? Trying to do an availability report for all sites except one and wondering if there is a way.
The only way I can see is to run the report using a bb-hosts file without the host you want excluded.
Henrik
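A filtered copy of bb-hosts along those lines could be produced as sketched here. This assumes bash and awk, and that host entries have the usual "IP-address hostname # tags" shape; `filter_bbhosts`, the sample entries, and the host names are made up for illustration. How the report is then pointed at the copy depends on the local setup and is not shown.

```shell
#!/bin/bash
# Sketch only: filter_bbhosts is a hypothetical helper, not a Hobbit tool.
# It prints a bb-hosts file with one host entry left out.
filter_bbhosts() {
    local src="$1" exclude="$2"
    # Drop lines that start with an IP address and name the excluded host;
    # page/group directives, comments and all other hosts pass through.
    awk -v host="$exclude" '!($1 ~ /^[0-9]/ && $2 == host)' "$src"
}

# Demo on a throwaway sample file (all names below are placeholders).
sample=$(mktemp)
cat > "$sample" <<'EOF'
page sites Sites
group Web servers
10.0.0.1 site-a.example.com # http://site-a.example.com/
10.0.0.2 site-b.example.com # http://site-b.example.com/
EOF

filter_bbhosts "$sample" site-b.example.com
```

Redirecting the output to a separate file leaves the live bb-hosts untouched, so the exclusion only affects the report run that uses the copy.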
list Robert
Mike, I spent a lot of time on that site. I am trying to download CSWhobbit, but when I click on it to download, it shows a bunch of dependencies and I can't download any of those. I am not sure what I am doing wrong; could you please let me know how to download from there? Thanks in advance
▸
Mike Arnold <user-95d566fbb20b@xymon.invalid> wrote:
Robert wrote:
Hi list,
I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT
-DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include"
SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include"
SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket
-lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][
-t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro
+=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
It is using the -C option, which this make does not accept. Which option can I
use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient?
Please let me know.
You need to use gmake and not Solaris make. Or you can always use the packages from http://www.blastwave.org/ .
-- -m
list Galen Johnson
Try sunfreeware.com...
▸
From: Robert [mailto:user-36b337833045@xymon.invalid]
Sent: Friday, August 10, 2007 12:25 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Hobbit instlalation on Solaris 9
Mike,
I spent a lot of time on that site. I am trying to download CSWhobbit, but
when I click on it to download, it shows a bunch of dependencies and I
can't download any of those. I am not sure what I am doing wrong; could
you please let me know how to download from there?
Thanks in advance
Mike Arnold <user-95d566fbb20b@xymon.invalid> wrote:
Robert wrote:
Hi list,
I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT
-DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include"
SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include"
SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket
-lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro +=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
It is using the -C option, which this make does not accept. Which option can I
use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient?
Please let me know.
You need to use gmake and not Solaris make. Or you can always use the packages from http://www.blastwave.org/ .
-- -m
list Flemming
or try a mirror-site ;-)
http://ftp.uni-erlangen.de/pub/mirrors/blastwave.org/unstable/i386/5.9/hobbit_client-4.2.0,REV=2007.04.12-SunOS5.8-i386-CSW.pkg.gz
and additionally:
CSWchkconfig CSWcommon CSWexpat CSWggettext CSWhobbitc CSWiconv CSWlibpopt SMCpcre
▸
On Fri, 10 Aug 2007, Galen Johnson wrote:
Try sunfreeware.com...
From: Robert [mailto:user-36b337833045@xymon.invalid]
Sent: Friday, August 10, 2007 12:25 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Hobbit instlalation on Solaris 9
Mike,
I spent a lot of time on that site. I am trying to download CSWhobbit, but
when I click on it to download, it shows a bunch of dependencies and I
can't download any of those. I am not sure what I am doing wrong; could
you please let me know how to download from there?
Thanks in advance
Mike Arnold <user-95d566fbb20b@xymon.invalid> wrote:
Robert wrote:
Hi list,
I am trying to compile hobbit on Solaris 9, make is failing:
bash-2.05# pwd
/apps/hobbit/bbgen-3.5/build
bash-2.05# cd ..
bash-2.05# make
CC="gcc" CFLAGS="-g -O2 -Wall -Wno-unused -D_REENTRANT -DHAVE_RPCENT
-DMAXMSG=8192 -DBBDPORTNUMBER=1984 -I. -I`pwd`/include"
SSLFLAGS="-DBBGEN_SSL" SSLINCDIR="-I/usr/local/ssl/include"
SSLLIBS="-L/usr/local/ssl/lib -lcrypto -lssl" NETLIBS="-lresolv -lsocket
-lnsl" make -C lib all
Usage : make [ -f makefile ][ -K statefile ]... [ -d ][ -dd ][ -D ][ -DD ]
[ -e ][ -i ][ -k ][ -n ][ -p ][ -P ][ -q ][ -r ][ -s ][ -S ][ -t ]
[ -u ][ -w ][ -V ][ target... ][ macro=value... ][ "macro +=value"... ]
make: Fatal error: Unknown option `-C'
*** Error code 1
make: Fatal error: Command failed for target `lib-build'
It is using the -C option, which this make does not accept. Which option can I
use instead, and is changing /bbgen-3.5/build/Makefile.rules sufficient?
Please let me know.
You need to use gmake and not Solaris make. Or you can always use the packages from http://www.blastwave.org/ .
-- -m
Cheers,
Flemming
treibsAND
Willy-Brandt-Allee 9
23554 Lübeck
www.treibsand.net
www.walli-bleibt.de
www.myspace.com/treibsand_luebeck
user-690778e9ef6b@xymon.invalid
list Tom Georgoulias
▸
Henrik Stoerner wrote:
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
I am testing the latest snapshot now and I really like what I've seen, especially the hobbit-topchanges.sh script. This is
Major new features
▸
* Flap detection of statuses that change color rapidly. The status is kept at the most critical level until it stops flapping.
Is the flap detection going to have some kind of configurable parameters? And will we get some kind of indication that flapping is occurring?
▸
* Split NCV support - graph data from NCV can be split into multiple RRD databases allowing for varying number of datasets.
I'm very happy to see this one! :)
▸
* RRD database parameters are now configurable (i.e. number of datapoints stored, whether to store min/max values etc). Note that this only applies to newly created RRD files, not existing ones.
How do I toggle the min/max for new RRDs?
Display things
▸
* The trends page default data-period can be configured to something other than the default 48-hour view, and the user can select a different period on-the-fly.
Another really cool and useful improvement.
Two features I've been asked for, but which I know either can't be done for 4.3.0 or maybe even at all, are:
1. I get asked for this all the time: a way to mark "big" events on the graphs, so that we can have some marker inside or text outside the graph that gives some context to sweeping, overall trend changes. For example, if the average CPU IOwait on a database server drops from 35% to 10% after a code release, a person who is looking at that graph and wasn't involved in that code release would immediately know why the performance improved, because a brief explanation of what took place during that time frame would be displayed alongside.
2. An easier way to customize the colors or style of the Hobbit webpages. I tried to take this on one afternoon and found myself going through lots of source code, since it appeared that it was hard coded in quite a few places.
Tom
--
Tom Georgoulias
Sr. Systems Engineer
McClatchy Interactive
user-6a0b8b0f0ae1@xymon.invalid
list Mike Arnold
▸
Robert wrote:
Mike, I spent a lot of time on that site. I am trying to download CSWhobbit, but when I click on it to download, it shows a bunch of dependencies and I can't download any of those. I am not sure what I am doing wrong; could you please let me know how to download from there? Thanks in advance
Blastwave's web pages are just informative. To use Blastwave you must have pkg-get installed.
HOWTO Use Blastwave
http://www.blastwave.org/howto_S8.html
Once you have pkg-get installed, you can install Hobbit like this:
pkg-get -i hobbit hobbit_client
Hobbit then lives in /opt/csw/libexec/hobbit .
-- -mike
list Sebastian Auriol
Henrik,
The new features sound good! But are they now documented? I checked the snapshot and none of the man pages or the Changes page seem to have been updated since the 4.2.0 release. It makes it quite difficult to test and use the new features (which presumably is a requirement before releasing the new version)! ;-) Or am I missing something less obvious than using the source (TM)?
I think it would be useful if the Changes page (http://www.hswn.dk/beta/snapshot/Changes) was kept more or less up-to-date for the snapshot releases. Knowing what has changed in hobbit-server betas seems pretty difficult without downloading frequent snapshots and doing diffs.
Is there any chance of the source being put into a public subversion / CVS repository or something? I see there is already a public CVS repository for hobbit on SourceForge, but it only includes the hobbit-client code. A public repository may encourage more contributed patches.
I also see that last year you mentioned that you are "using RCS which is a predecessor to CVS". I don't know if this is still the case, but if so, it appears pretty simple to migrate to CVS using rcs2cvs as documented here: http://www.linuxdocs.org/HOWTOs/CVS-RCS-HOWTO-3.html
Or as the OP did in http://www.nabble.com/RCS-to-svn-t792119.html - he migrated RCS to SVN via CVS.
Many thanks,
Sebastian
user-ce4a2c883f75@xymon.invalid (Henrik Stoerner) wrote on Sun, 22 Jul 2007 00:08:12 +0200:
▸
In another thread, someone asked about what new features are planned for version 4.3.0. I've summarized them below; they have all been implemented by now. Some of them have been contributed by others over the past year - I'm pleased to have finally gotten their patches merged.
There are some open bug-reports, and the plan now is to try and get those fixed. Once that is done I'll ask you all to start testing the beta-versions, and then a new release is hopefully available soon.
This doesn't mean that I won't consider adding new stuff before the 4.3.0 release, but right now the plan is to get 4.3.0 shipped with the current set of features. But if I've missed someone's favourite patch or feature request, do let me know.
Major new features
------------------
* PAGE setting for alert- and client-configuration handles hosts on multiple pages, so any pagename can be used.
* Flap detection of statuses that change color rapidly. The status is kept at the most critical level until it stops flapping.
* Holiday support for alerts, including variable holidays (Easter etc)
* Split NCV support - graph data from NCV can be split into multiple RRD databases allowing for varying number of datasets.
* RRD database parameters are now configurable (i.e. number of datapoints stored, whether to store min/max values etc). Note that this only applies to newly created RRD files, not existing ones.
* Distributed worker modules allow sharing the load across multiple Hobbit servers
* RRD updates are now cached for up to 30 minutes before being written to disk. This makes the I/O load on large installations much lighter.
* Detection of statuses that are reported by multiple hosts
* Client backend-support for the z/OS and z/VSE clients by Rich Smirna
Display things
--------------
* Graph zooming now limits the lower/upper bounds of a graph (requires rrdtool 1.2.x)
* The trends page default data-period can be configured to something other than the default 48-hour view, and the user can select a different period on-the-fly.
* Hosts can be sorted automatically on the overview webpage with a "group-sorted" group definition.
* NOCOLUMNS setting in bb-hosts lets you suppress certain columns on a per-host basis
* Host-comments are displayed as tool-tips, to save screen space.
Checks and graphs
-----------------
* Network tests can use a specific source IP instead of the default
* The validity-period of network tests is configurable, instead of being fixed at the default 30-minute setting
* Client file checks can check for a symlink
* "trends" report for RRD handling allows generating custom-made RRD files
* Hobbit host- and status-counts are tracked in an RRD file
Miscellaneous
-------------
* NCV reports can handle color-icons before the name:value data
* hobbitlaunch tasks can be configured to run on certain hosts only
* Time-warp detection and warning
* Local unix-socket interface to Hobbit daemon
* hobbitd_capture can collect several statuses and hand off such a batch to an external command
* Support for SHA-224/256/384/512 digests
Regards,
Henrik
list Sebastian Auriol
A slight clarification on my earlier message: the HTML versions of man pages haven't been updated since the 4.2.0 release, but the actual man pages have. I did end up doing a source diff of some man pages against 4.2.0 to see what had changed and how to use some of the new features... So the situation is better than I feared.
I'm not sure whether you saw my previous message, Henrik?
Sebastian
▸
From: Sebastian [mailto:user-7b2156f36779@xymon.invalid]
Sent: 30 November 2007 17:04
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] RE: Highlights of the 4.3.0 version
Henrik,
The new features sound good! But are they now documented? I checked the snapshot and none of the man pages or the Changes page seem to have been updated since the 4.2.0 release. It makes it quite difficult to test and use the new features (which presumably is a requirement before releasing the new version)! ;-) Or am I missing something less obvious than using the source (TM)?
I think it would be useful if the Changes page (http://www.hswn.dk/beta/snapshot/Changes) was kept more or less up-to-date for the snapshot releases. Knowing what has changed in hobbit-server betas seems pretty difficult without downloading frequent snapshots and doing diffs.
Is there any chance of the source being put into a public subversion / CVS repository or something? I see there is already a public CVS repository for hobbit on SourceForge, but it only includes the hobbit-client code. A public repository may encourage more contributed patches.
I also see that last year you mentioned that you are "using RCS which is a predecessor to CVS". I don't know if this is still the case, but if so, it appears pretty simple to migrate to CVS using rcs2cvs as documented here: http://www.linuxdocs.org/HOWTOs/CVS-RCS-HOWTO-3.html
Or as the OP did in http://www.nabble.com/RCS-to-svn-t792119.html - he migrated RCS to SVN via CVS.
Many thanks,
Sebastian
list Jersey Man
Curious if this ever worked..