Contact to Henrik
list Wolfgang Wutz
Hello everybody, sorry for spamming the mailing list, but I'm quite desperate. I'm currently working on my diploma thesis, and part of it is setting up and customizing a monitoring environment using Hobbit. Because my professor wants me to write something about the Hobbit system architecture, I tried to reach Henrik personally by mail, but unfortunately got no reply. The spam filters probably blocked my messages. So Henrik, if you read this, please tell me how to get in contact with you. Thanks for your time. Wolfgang
list Henrik Størner
Hi Wolfgang, your mails were in my spam mailbox for some reason. I then started writing a reply directly to you, but decided to send it to the list also - so it can be referenced through the mail archive if anyone else needs it.
Now I need your help, because my professor at my university wants me to present the software architecture of Hobbit in my thesis. He wants some pictures and diagrams showing the 'inner mechanisms' and how the different modules work together. So my question is, do you have any information about the architecture that you can share with me?
I don't have any formal design documents; the Hobbit design has evolved
from a few basic principles and some ideas I had. I'll try to give you
a quick summary.
Hobbit should be portable across Unix architectures. This is obviously
important for the Hobbit client code, but I've done my best to use only
"standard" Unix and C for both the server- and client-code. This has turned
out to be easier than I thought; there are some really old Unix systems
that cannot run the Hobbit server, but any recent version of a Unix-like
system (released within the past 5-10 years) runs Hobbit without problems.
Hobbit also has to be backwards compatible with Big Brother, in the
sense that you can use BB clients with a Hobbit server. I had over 1000
servers with a BB client on them (Unix and Windows), so doing a "Big
Bang" switch changing the servers and clients at once would be
impossible.
Hobbit must scale well. When I started using Big Brother I had 40
servers to monitor. When I got to 100, I had to re-implement parts of BB
and this became the "bbgen" toolkit. When I got to 500, my BB server was
getting overloaded, and I started to work on a replacement. Today I have
2500 servers, and before Christmas I will be monitoring nearly 4000
servers. That's a 100x increase in just 5 years.
Hobbit should not rely on a lot of other infrastructure to work. E.g. it
doesn't require a huge database backend; you can add one if you like,
but it is not needed. Keeping Hobbit simple makes it robust - and you
really, REALLY want your monitoring to work when everything else is
crashing. We once had a major power outage at our datacenter; the Hobbit
server came up quickly, but there were a lot of systems that needed some
manual intervention to get back online. It was quite interesting to see
how the activity on the Hobbit server just sky-rocketed, because everyone
was looking at Hobbit to see which systems were running, and which were down.
Analyzing data
==============
For me, a key principle of handling the data that is poured into Hobbit
is that as much as possible of the data analysis should take place on
the Hobbit server. I believe it is a huge benefit to keep configuration
settings in one place (the Hobbit server); also, by having access to the
raw data you can perform types of analysis that you didn't think of
when a script was initially created to collect some data.
E.g. the Hobbit client reports data about who is logged on to a system.
This information is not currently used by Hobbit, but I know that
someone wrote a custom backend utility to check this data and alert him
if there was someone logged in as "root". I had not thought of this, but
by making the raw data from the client available, it was very easy for
him to implement this check on all of his servers.
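
A minimal sketch of such a check, assuming the backend receives the raw `who` output as text. The function names and the sample data are hypothetical and are not taken from the utility mentioned above.

```python
def root_sessions(who_output):
    """Return the lines of `who` output that belong to user root."""
    sessions = []
    for line in who_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "root":
            sessions.append(line.strip())
    return sessions


def login_status(who_output):
    """Turn raw `who` data into a (color, text) pair on the server side."""
    hits = root_sessions(who_output)
    if hits:
        return "red", "root is logged in:\n" + "\n".join(hits)
    return "green", "no root logins"


# Hypothetical sample of the raw data a client might report.
sample = """\
henrik   pts/0        2006-11-16 09:12 (10.0.0.5)
root     tty1         2006-11-16 03:58
"""
color, text = login_status(sample)
print(color)
```

Because the decision is made from the raw listing, the rule can be changed on the server without touching any client.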
This is quite different from Big Brother, where the data is
pre-processed into status messages. The BB client can check if a
process is running, but then it just reports "process foo is OK".
When process "foo" is NOT running you get an error status - but you
cannot easily see if process "foo" has stopped because your backup is
running at the same time (and they cannot coexist) because you only get
part of the information, not the full process listing.
This may seem like a trivial example, but I realized early on that there
are far more ways of using these data than I could possibly imagine.
So instead of forcing my ideas of how to use the data upon others, it
should be possible to just get the raw data and perform your own analysis
of it.
Another example is that in Hobbit 4.2, I added a module which saves a
copy of the client data if a status goes red on a host. This has turned
out to be extremely helpful in diagnosing those "why did the webserver
crash at 4 AM last Tuesday" questions ... because you have access to a
lot of raw data collected by the client just before the crash happened,
including all of the data that Hobbit didn't analyze by itself but which
humans can use to put the whole picture together.
This is not implemented completely yet. The network test utility - which
was also carried over from the bbgen toolkit - works the "Big Brother"
way. One thing on my agenda is to change that, so the network tester
just reports that the ping of host "foo" responded in 12.7 ms, the ping of
host "bar" failed and so on. Then a module on the Hobbit server can
decide if these should result in a red or yellow status, perhaps based
on other information it has (e.g. that the response time shouldn't exceed
10 ms during working hours, unless the primary network connection was
down so we were running on a backup line with less capacity).
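
The kind of server-side decision described above could look roughly like this. It is only a sketch: the function name, the working hours and the way the backup-line state is passed in are assumptions; only the 10 ms limit comes from the example above.

```python
from datetime import time


def ping_color(rtt_ms, at, on_backup_line=False,
               limit_ms=10.0, work_start=time(8, 0), work_end=time(17, 0)):
    """Decide a status color on the server from a raw ping measurement.

    rtt_ms is None when the ping failed. The working hours are
    illustrative defaults, not Hobbit settings.
    """
    if rtt_ms is None:
        return "red"
    working_hours = work_start <= at < work_end
    # Slow responses only matter during working hours on the primary line.
    if working_hours and not on_backup_line and rtt_ms > limit_ms:
        return "yellow"
    return "green"


print(ping_color(12.7, time(10, 0)))
```

The network tester only has to report the measurement; everything that is policy stays in one function on the server.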
The core daemons
================
I wanted to have a network daemon holding all of the "current state"
information. This information changes all the time as new status
reports arrive, so it has to keep this in memory - writing it to disk
would be too slow (BB did this, and it doesn't scale). So a core
component of Hobbit would be this central daemon (hobbitd). The daemon
NEVER does any disk I/O; this would slow it down and I don't want that,
because Hobbit must support monitoring of thousands of servers. All
communication between hobbitd and the outside world goes via a network
connection; this is used both for in-band data (status updates and data
messages), but also for out-of-band data like control messages (drop a
host, disable a server and so on). Tools that need to fetch the entire
status of all servers, or just the detailed status of a host also do
this through a network connection to hobbitd.
However, some things must be stored on disk - RRD (graph) files, for
instance, or historical eventlogs. So this is handled by a bunch of
independent "worker" modules - hobbitd_rrd (RRD updates),
hobbitd_history (history logs), hobbitd_alert (sending out alerts).
These obviously have to be fed information about the data that flows
into the hobbitd daemon - e.g. hobbitd_rrd needs the full status message
to extract the data it puts into the RRD files, and hobbitd_history
needs information about the status changes from green->red and so on.
So I needed a fast inter-process communication mechanism between hobbitd
and the worker modules. Also, I wanted to be able to start/stop/restart
worker modules on-the-fly; this is extremely nice for testing and makes
the system much more robust. Finally, I wanted an interface that was
simple to use so that end-users can hook into the data stream if they
need to write some custom back-end script. The solution for this was
a mechanism that uses the System V "shared memory" IPC mechanism,
combined with a group of semaphores to control access to the shared
memory area. So hobbitd copies a message into the shared memory area
and ups a semaphore telling the workers that there is a new message.
The workers then pick up the message and down another semaphore once
they have secured their copy of the message; hobbitd then knows when it
is safe to overwrite the shared-memory area with a new message. I call
this IPC mechanism a "channel", and there are in fact several of these:
One for each type of message. So there is a channel which receives all
of the raw "status" messages; another channel for the raw "data"
messages; a channel that receives messages about status changes (for
history logging); a channel that receives messages about critical
red/yellow statuses (for alerts) and so on. Recently a new channel was
added for the "client" messages that come from the Hobbit client.
There are some early notes about this mechanism in the
hobbitd/new-daemon.txt file in the hobbit sources. Not all of the ideas
there have been implemented, e.g. the "streaming" protocol turned out
not to be particularly important.
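
The handshake behind a channel can be illustrated with an in-process analogy. This sketch uses Python threads and `threading.Semaphore` instead of System V shared memory and semaphores, and it simplifies the direction of the semaphore operations; the class and function names are hypothetical and do not mirror the hobbitd sources.

```python
import threading


class Channel:
    """One-slot message channel guarded by semaphores (in-process analogy)."""

    def __init__(self, n_workers):
        self.slot = None
        self.n_workers = n_workers
        # One "new message" semaphore per worker, so each worker wakes once.
        self.posted = [threading.Semaphore(0) for _ in range(n_workers)]
        # Counts workers that have secured their copy of the current message.
        self.copied = threading.Semaphore(0)

    def send(self, message):
        self.slot = message
        for sem in self.posted:           # tell every worker about it
            sem.release()
        for _ in range(self.n_workers):   # wait until all have copied
            self.copied.acquire()
        # Only now is it safe to overwrite self.slot with the next message.

    def receive(self, worker_id):
        self.posted[worker_id].acquire()  # wait for a new message
        message = self.slot               # secure a private copy
        self.copied.release()             # let the sender know
        return message


def run_demo(messages, n_workers=2):
    """Send messages through one channel and return what each worker saw."""
    channel = Channel(n_workers)
    received = [[] for _ in range(n_workers)]

    def worker(worker_id):
        while True:
            msg = channel.receive(worker_id)
            if msg is None:               # None is this sketch's stop marker
                break
            received[worker_id].append(msg)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for msg in list(messages) + [None]:
        channel.send(msg)
    for t in threads:
        t.join()
    return received


print(run_demo(["status a", "status b"]))
```

The sender blocks until every worker has taken its copy, which is why a slow worker would stall the daemon if it talked to the channel directly.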
To make sure that the semaphore stuff is handled correctly, I decided to
put a "buffer" module between hobbitd and the actual workers. This is
the hobbitd_channel module; it serves only one purpose, which is to grab
the messages that hobbitd sends out through the IPC mechanism, and queue
them for the real worker module (hobbitd_rrd, hobbitd_history etc). The
fact that hobbitd_channel acts as a message queue is useful to
accommodate spikes in the activity; the alert module sometimes gets
a huge spike of messages, e.g. when a network switch dies.
hobbitd_channel also makes it easy to build your own backend modules,
because it forwards the messages via a simple text-based pipe; so your
custom backend modules can just read them from stdin.
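
A custom backend that reads from stdin might be structured like the sketch below. The framing is an ASSUMPTION made for illustration (a header line starting with "@@" with "|"-separated fields, and a line containing only "@@" as terminator), and the sample header fields are hypothetical; check the hobbitd_channel documentation for the real layout before relying on it.

```python
def read_messages(lines):
    """Yield (header_fields, body_lines) for each framed message.

    ASSUMPTION: a message starts with a header line beginning with "@@"
    and ends with a line containing only "@@".
    """
    header, body = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "@@":
            if header is not None:
                yield header, body
            header, body = None, []
        elif header is None and line.startswith("@@"):
            header = line[2:].split("|")
        elif header is not None:
            body.append(line)


# Hypothetical input; a real worker would iterate over sys.stdin instead.
sample_stream = [
    "@@status#1|1163692800|web01\n",
    "status web01.http green OK\n",
    "@@\n",
]
for fields, text in read_messages(sample_stream):
    print(fields[0], len(text))
```

In a real module the loop would run over `sys.stdin`, so the script can be started, stopped and restarted without touching hobbitd.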
Another benefit of having hobbitd_channel between hobbitd and the worker
modules showed up recently; I am currently working on a new version of
hobbitd_channel which can distribute the incoming messages between multiple
worker "clones" running on different servers, to perform some load
balancing of the heavy tasks (primarily RRD file updates). This has been
implemented almost exclusively by changing hobbitd_channel, instead of
having to modify all of the worker modules.
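
One possible way to spread messages over worker clones is to hash the hostname, so that all updates for a host reach the same clone. This is purely an assumption for illustration; the mail does not say how the new hobbitd_channel chooses a worker, and the function names are hypothetical.

```python
import zlib


def pick_worker(hostname, n_workers):
    """Map a hostname to a worker index in a stable way."""
    return zlib.crc32(hostname.encode("utf-8")) % n_workers


def distribute(messages, n_workers):
    """Split (hostname, payload) pairs into one queue per worker."""
    queues = [[] for _ in range(n_workers)]
    for hostname, payload in messages:
        queues[pick_worker(hostname, n_workers)].append((hostname, payload))
    return queues


queues = distribute([("web01", "cpu"), ("db01", "disk"), ("web01", "mem")], 3)
print([len(q) for q in queues])
```

Keeping a host on one clone means a given RRD file is only ever updated by one worker.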
So the core design looks like this:
  Network tests --\
                   \
          TCP:1984  \            IPC
  Clients ----------> hobbitd --------> hobbitd_channel ---> worker module
                    /          Sh. mem.                stdin
                   /
  Custom tests ---/
The Web interface
=================
The web interface is mostly carried over from Hobbit's predecessor, the
"bbgen" toolkit. I wrote this for Big Brother, to speed up the
generation of the Big Brother webpages, and by re-using this in Hobbit I
would quickly get a working web interface - all I had to do was to
change the programs to grab their data from the hobbitd daemon, instead
of reading through the status logfiles that Big Brother uses.
This also means that the web interface is not tied in with the core
daemons. Sure, they need to communicate and there are some things in the
core daemons that are closely related to how the web interface works -
e.g. disabling a host. But it should be possible for an adventurous
programmer to use the core Hobbit daemons with their own web front-end
tools and come up with a completely different user-interface.
So the web interface is probably the part of Hobbit that has evolved the
least from its origins in Big Brother. Some new CGI programs have been
added, but nothing revolutionary - it just picks up bits of
information from hobbitd and the configuration files and displays them.
One design criterion for the web interface is that it should be as
dynamic as possible; it must reflect the current status and
configuration as much as possible. That is why most of the web interface
is done with CGI programs; the only static webpages in Hobbit are the
overview pages generated by bbgen - and I hope to eliminate those soon.
The clients
===========
So with this background, it is obvious that the Hobbit client is really,
really dumb. It is basically just a shell script that runs some normal
OS commands - df, ps, who and so on - and then it's up to the Hobbit
server to analyze them and generate some status columns. Client data is
sent to hobbitd, which feeds it through a channel to the hobbitd_client
module. hobbitd_client has some parsers for each of the operating
systems it knows about, and uses those to grab the interesting data and
compare it to the client configuration rules. Then hobbitd_client
generates some "status" messages and sends them to hobbitd. The major
challenge with this design is logfiles; you cannot realistically send
entire logfiles - some of them are several GB of data - over to Hobbit
for analysis every 5 minutes. So some filtering must be done on the
client side; to keep all of the configuration data on the Hobbit server
this meant that the client has to pick up its filter-configuration from
the Hobbit server.
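
Client-side log filtering of this kind can be sketched as follows, assuming the filter configuration fetched from the server boils down to a list of ignore patterns and a size limit. Both the function and the shape of that configuration are hypothetical.

```python
import re


def filter_log(lines, ignore_patterns, max_bytes):
    """Drop ignored lines, then keep the newest lines that fit in max_bytes."""
    ignore = [re.compile(p) for p in ignore_patterns]
    kept = [l for l in lines if not any(rx.search(l) for rx in ignore)]
    selected, size = [], 0
    for line in reversed(kept):                    # newest lines first
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if size + line_size > max_bytes:
            break
        selected.append(line)
        size += line_size
    selected.reverse()                             # restore original order
    return selected


log = ["a debug x", "error one", "debug y", "error two"]
print(filter_log(log, ["debug"], 100))
```

The client stays dumb: it applies whatever patterns it was given and leaves the interpretation of the remaining lines to the server.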
I hope this is enough of an overview for you. Good luck with your
thesis.
Regards,
Henrik
list Rich Smrcina
Henrik, That is an excellent description of the impetus for the design of Hobbit. Thank you so much, I read it with much interest. The 'dumb client' design is very handy in a shared environment, like the mainframe, where dozens or hundreds of virtual machines can be running on one box. We don't want a client that sucks up a lot of CPU time when it wakes up. Also, the performance improvements in the 4.2 release have dropped the CPU utilization of the server by at least 80%!
-- Rich Smrcina VM Assist, Inc. Phone: XXX-XXX-XXXX Ans Service: XXX-XXX-XXXX user-61add9955ef9@xymon.invalid Catch the WAVV! http://www.wavv.org WAVV 2007 - Green Bay, WI - May 18-22, 2007
list Wolfgang Wutz
Hi Henrik, I just noticed your email and all I can say is: WOW! Haven't had the time to read it all through, but I wanted to say thank you very much for the information! It's really appreciated! .......................................................... Mit freundlichem Gruß / Kind regards Wolfgang Wutz Siemens VDO Automotive AG SV IIS PTQ Rgb Im Gewerbepark C25 93059 Regensburg Germany Tel. +XX XXX XXX XXXX E-Mail: user-cfd3593ddd2c@xymon.invalid Visit our Communication Platform: https://toolnet.rbgs.ww011.siemens.net/index.php
list Gildas le Nadan
Hi Henrik, The content of this mail should IMHO be copied and pasted into the Hobbit documentation! Just a random thought: wouldn't it be possible to have a pure shell or Perl script Hobbit client running from a cron job? Even if only a subset of the functionality is provided this way, it may be useful for some OSes/machines where compilation is not possible or where running a compiled client is tricky or not supported (appliances, for instance). Cheers, Gildas Henrik Stoerner wrote:
Hi Wolfgang,
[SNIP]
Regards, Henrik
list Stef Coene
On Thursday 16 November 2006 16:42, Gildas Le Nadan wrote:
Hi Henrik, The content of this mail should IMHO be copy/pasted and added to the hobbit documentation! Just a random thought: wouldn't it be possible to have a pure shell or perl script hobbit client running from a cron job?
I think there is a Perl script that can be used to replace the bb command. Most systems have a perl binary. You can even use netcat to replace the bb command; you just need to be able to send some data over the network. Stef
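
Sending a status without the bb command comes down to writing one text message to the server. The sketch below assumes the BB-style message layout `status HOST.TEST COLOR text`, with the dots in the hostname written as commas, and port 1984 as shown in Henrik's diagram; the helper names are hypothetical.

```python
import socket


def build_status(hostname, test, color, text):
    """Build a BB-style status message (layout assumed, see above)."""
    return "status %s.%s %s %s" % (hostname.replace(".", ","), test, color, text)


def send_message(server, message, port=1984, timeout=10.0):
    """Open a TCP connection, send one message and close the connection."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))


print(build_status("www.example.com", "disk", "green", "all fine"))
```

A call such as `send_message("hobbit.example.com", build_status(...))` would then do what piping the same text into netcat does; the server name here is a placeholder.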
list Gildas le Nadan
No, sh and telnet are widely available binaries. Perl relies on far too many libs to be usable. And a lot of the old architectures don't have netcat or a usable compiler/libs. Gildas
list Steve Aiello
Hello all,

I have been tasked with writing a client app for users. Currently it is basically a Yahoo widget that feeds off the bbgen nstab page. We monitor a lot of services, and from that we have configured a good number of Alternate PageSets, basically providing a focused Hobbit/BBGen display for each group. Those groups can now run a Yahoo widget that parses nstab.html for all critical alerts, and that seems to be working rather well.

But this got me thinking. There is a lot more data in the hobbitdxboard output, and it is much easier to use. Plus, by using hobbitdxboard, users/groups can customize their queries much more. So my basic questions are:

1. Is there a way to get at the hobbitdxboard data other than using the bb command? It would be great if there were a web CGI for this. If there isn't, I am sure I can whip something up in Perl.
2. Should load be a concern? I know all of the Hobbit data is stored in memory, so I didn't think this would be a problem. I was also thinking that if I write a Perl CGI, I could cache the output of the query for a minute or so. Since the normal client update period is 5 minutes, caching the hobbitdxboard output for a minute could save on load.

Thoughts?

~ Steve
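
The one-minute cache from question 2 can be sketched independently of how the board data is fetched. Here `fetch` stands for any callable that returns the hobbitdxboard output; the class name and the injectable clock are assumptions made so the idea can be tried without a server. The sketch is in Python rather than Perl.

```python
import time


class CachedBoard:
    """Cache the result of fetch() for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires = None

    def get(self):
        now = self._clock()
        # Query the server only when there is no fresh copy.
        if self._expires is None or now >= self._expires:
            self._value = self._fetch()
            self._expires = now + self._ttl
        return self._value


board = CachedBoard(lambda: "<board/>")   # placeholder fetch function
print(board.get())
```

With a 5-minute client update period, a 60-second cache means the widget sees data at most one minute older than hobbitd has, while the daemon answers at most one query per minute from this CGI.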
list Buchan Milne
After reading Henrik's mail on the architecture, specifically:
But it should be possible for an adventurous programmer to use the core Hobbit daemons with their own web front-end tools and come up with a completely different user-interface.
I wondered about writing a frontend in Catalyst (http://www.catalystframework.org/). A Hobbit model plugin might be an idea ... Regards, Buchan -- Buchan Milne ISP Systems Specialist - Monitoring/Authentication Team Leader B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)
list Stef Coene
On Thursday 16 November 2006 17:05, Gildas Le Nadan wrote:
No, sh and telnet are widely available binaries. Perl relies on far too many libs to be usable. And a lot of the old architectures don't have netcat or a usable compiler/libs.
You can use ssh or rsh to remotely execute a bb command. Or what about mail? You can create a script that is triggered by incoming mail and executes bb, so the client mails its status to the Hobbit server. As long as the client has a network connection, there is a way to monitor it with Hobbit. Stef
list Matthew Davis
You're better off asking the list first. Henrik is likely quite busy, which is the reason for the lack of response. You're liable to find just as good answers from members of the list.
--
Matthew Davis http://familycampground.org/matthew/
list Michael Nagel
Hi Henrik, first of all: Happy New Year! Before a holiday I normally redefine hobbit-alert.cfg so that the rules for Sunday also apply to that day, and this Christmas I forgot to. Attached is an addition (based on, and in some parts taken from, your source) which calculates the holidays and alerts as on a Sunday (or whichever weekday you like). Source: holiday.src Holiday definition: hobbit-holidays.cfg In Germany there are three basic kinds of rules: static holidays, days based on Easter, and days based on the fourth Advent Sunday. These are implemented. I had no time to look into other (national and religious) rules; maybe that is a job for the mailing list. If you like it, add it to your source tree. Regards, Michael
Attachments (2)
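
The three kinds of rules can be sketched as below. This is not Michael's attached code and not the format of hobbit-holidays.cfg; the rule tables are a small illustrative selection, and the Easter date uses the well-known anonymous Gregorian algorithm.

```python
from datetime import date, timedelta


def easter_sunday(year):
    """Gregorian Easter Sunday (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def fourth_advent(year):
    """Last Sunday on or before 24 December."""
    christmas_eve = date(year, 12, 24)
    # weekday(): Monday=0 ... Sunday=6
    return christmas_eve - timedelta(days=(christmas_eve.weekday() + 1) % 7)


# Illustrative rule tables (a selection, not a complete national list).
STATIC = {"New Year's Day": (1, 1), "Christmas Day": (12, 25)}
EASTER_BASED = {"Good Friday": -2, "Easter Monday": 1}
ADVENT_BASED = {"Buss- und Bettag": -32}


def holidays(year):
    """Return {date: name} for the rules above."""
    days = {date(year, m, d): name for name, (m, d) in STATIC.items()}
    easter = easter_sunday(year)
    for name, offset in EASTER_BASED.items():
        days[easter + timedelta(days=offset)] = name
    advent = fourth_advent(year)
    for name, offset in ADVENT_BASED.items():
        days[advent + timedelta(days=offset)] = name
    return days


def alert_weekday(day, holiday_weekday=6):
    """Weekday to use for alert rules; holidays count as Sunday by default."""
    return holiday_weekday if day in holidays(day.year) else day.weekday()


print(easter_sunday(2007), fourth_advent(2007))
```

`alert_weekday` captures the "alert like Sunday" idea: the alert rules stay unchanged and only the weekday they are evaluated against is swapped on a holiday.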
list Henrik Størner
Hi Michael,
On Tue, Jan 02, 2007 at 04:12:52PM +0100, Michael Nagel wrote:In the attachment you find an addition (based and in some parts stealed from your source) which calculate the holidays and alert like sunday (or what weekday you like). Source: holiday.src Holiday definition: hobbit-holidays.cfg
Thanks a lot for this - I've been needing this myself, but never got around to implementing it. I've merged your code with some minor tweaks here and there. If anyone cares to write holiday definitions for other countries, please submit them to me. Regards, Henrik
list Thomas Kern
If you can send me the format of the data file, I will see if I can get the US government holidays for the next few years. /Thomas Kern /U.S. Department of Energy /XXX-XXX-XXXX
list Stef Coene
On Monday 05 February 2007 16:29, Kern, Thomas wrote:
If you can send me the format of the data file, I will see if I can get the US government holidays for the next few years.
I had the same problem for a website I was creating. You can use Google Calendar to download an XML or iCal list of holidays. Parsing this with Perl and dumping it into a file is not so difficult. The only problem I had was finding a good Google calendar. Stef
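
Pulling dates and names out of an iCal download can be sketched like this (in Python rather than Perl). It is a deliberately minimal reader that only looks at DTSTART and SUMMARY inside VEVENT blocks and does not handle folded lines or time zones; the sample calendar is hypothetical, and the output is a plain list rather than the Hobbit holiday file format.

```python
from datetime import datetime


def parse_ical_holidays(text):
    """Return a sorted list of (date, summary) pairs from iCal text."""
    events, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            if current and "date" in current:
                events.append((current["date"], current.get("summary", "")))
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            key = name.split(";", 1)[0].upper()   # drop parameters like VALUE=DATE
            if key == "DTSTART":
                current["date"] = datetime.strptime(value[:8], "%Y%m%d").date()
            elif key == "SUMMARY":
                current["summary"] = value
    return sorted(events)


sample_ics = """\
BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART;VALUE=DATE:20070704
SUMMARY:Independence Day
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20070101
SUMMARY:New Year's Day
END:VEVENT
END:VCALENDAR
"""
for day, name in parse_ical_holidays(sample_ics):
    print(day.isoformat(), name)
```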
list Thorsten Erdmann
Hi, how can I reduce the history logs and RRD files Hobbit creates? These files are growing constantly. Hobbit hasn't been running for a full year yet, so I don't know whether they will stop growing once the year is full. Does Hobbit need the history logs for some tasks, or can I delete them from time to time? Thorsten Erdmann
list Henrik Størner
On Tue, Feb 06, 2007 at 09:39:42AM +0100, user-06a84d0fcc19@xymon.invalid wrote:
how can I reduce the history logs and RRD files Hobbit creates? These files are growing constantly. Hobbit hasn't been running for a full year yet, so I don't know whether they will stop growing once the year is full. Does Hobbit need the history logs for some tasks, or can I delete them from time to time?
The RRD files are constant in size - and new ones only appear when you add hosts. The historical log entries can be trimmed with the "trimhistory" utility. Regards, Henrik