Contact to Henrik

17 messages in this thread

list Wolfgang Wutz · Thu, 16 Nov 2006 08:20:09 +0100 ·

Hello everybody,

sorry for spamming the mailing list, but I'm quite desperate. 

Currently I'm doing my diploma thesis, and part of it is setting up and
customizing a monitoring environment using Hobbit.
Because my professor wants me to write something about the Hobbit
system architecture, I tried to reach Henrik personally by mail, but
unfortunately got no reaction. Probably the spam filters blocked my
messages. 

So Henrik, if you read this, please tell me a way to get in contact with
you.

Thanks for your time.

Wolfgang

list Henrik Størner · Thu, 16 Nov 2006 15:17:16 +0100 ·

Hi Wolfgang,

your mails were in my spam mailbox for some reason. I then started
writing a reply directly to you, but decided to send it to the list as
well, so that it can be referenced through the mail archive if anyone
else needs it.

Now I need your help, because my professor at my university wants me
to present the software architecture of Hobbit in my thesis. He wants
some pictures and diagrams showing the 'inner mechanisms' and how the
different modules work together. So my question is: do you have any
information about the architecture that you can share with me?

I don't have any formal design documents; the Hobbit design has evolved
from a few basic principles and some ideas I had. I'll try to give you
a quick summary.

Hobbit should be portable across Unix architectures. This is obviously
important for the Hobbit client code, but I've done my best to use only
"standard" Unix and C for both the server- and client-code. This has turned 
out to be easier than I thought; there are some really old Unix systems 
that cannot run the Hobbit server, but any recent version of a Unix-like 
system (released within the past 5-10 years) runs Hobbit without problems.

Hobbit also has to be backwards compatible with Big Brother, in the
sense that you can use BB clients with a Hobbit server. I had over 1000
servers with a BB client on them (Unix and Windows), so doing a "Big
Bang" switch changing the servers and clients at once would be
impossible.

Hobbit must scale well. When I started using Big Brother I had 40 
servers to monitor. When I got to 100, I had to re-implement parts of BB 
and this became the "bbgen" toolkit. When I got to 500, my BB server was
getting overloaded, and I started to work on a replacement. Today I have
2500 servers, and before Christmas I will be monitoring nearly 4000
servers. That's a 100x increase in just 5 years.

Hobbit should not rely on a lot of other infrastructure to work. E.g. it
doesn't require a huge database backend; you can add one if you like,
but it is not needed. Keeping Hobbit simple makes it robust - and you
really, REALLY want your monitoring to work when everything else is
crashing. We once had a major power outage at our datacenter; the Hobbit 
server came up quickly, but there were a lot of systems that needed some 
manual intervention to get back online. It was quite interesting to see 
how the activity on the Hobbit server just sky-rocketed, because everyone 
was looking at Hobbit to see which systems were running, and which were down.


Analyzing data
==============
For me, a key principle of handling the data that is poured into Hobbit
is that as much of the data analysis as possible should take place on
the Hobbit server. I believe it is a huge benefit to keep configuration
settings in one place (the Hobbit server); also, having access to the
raw data lets you perform types of analysis that you didn't think of
when a script was initially created to collect some data.

E.g. the Hobbit client reports data about who is logged on to a system.
This information is not currently used by Hobbit, but I know that
someone wrote a custom backend utility to check this data and alert him
if there was someone logged in as "root". I had not thought of this, but
by making the raw data from the client available, it was very easy for
him to implement this check on all of his servers.

This is quite different from Big Brother, where the data is
pre-processed into status messages. The BB client can check if a
process is running, but then it just reports "process foo is OK".
When process "foo" is NOT running you get an error status, but you
cannot easily see whether "foo" has stopped because your backup is
running at the same time (and the two cannot coexist), since you only
get part of the information, not the full process listing.
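With the full process listing available on the server, the check can tell those two cases apart. A minimal sketch, assuming the hypothetical process names "foo" and "backup" from the example and a naive substring match over the `ps` output; the colour policy (yellow while the backup is running) is illustrative, not a Hobbit default.

```c
#include <assert.h>
#include <string.h>

/*
 * Decide the status colour for process "foo" from the full ps listing.
 * Hypothetical policy: foo running -> green; foo stopped while the
 * (mutually exclusive) backup is running -> yellow; otherwise -> red.
 * Naive substring match; a real check would compare the command field.
 */
const char *foo_status(const char *ps_listing)
{
    assert(ps_listing != NULL);
    if (strstr(ps_listing, "foo") != NULL)
        return "green";
    if (strstr(ps_listing, "backup") != NULL)
        return "yellow";   /* explained by the backup, not a crash */
    return "red";
}
```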

This may seem like a trivial example, but I realized early on that there
are far more ways of using these data than I could possibly imagine.
So instead of forcing my ideas of how to use the data upon others, it 
should be possible to just get the raw data and perform your own analysis 
of it.

Another example is that in Hobbit 4.2, I added a module which saves a
copy of the client data if a status goes red on a host. This has turned
out to be extremely helpful in diagnosing those "why did the webserver
crash at 4 AM last Tuesday" questions ... because you have access to a
lot of raw data collected by the client just before the crash happened,
including all of the data that Hobbit didn't analyze by itself but which
humans can use to put the whole picture together.
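The idea behind that module can be illustrated with a small cache: remember the most recent client message per host, and write a copy out when a status for that host changes to red. This is a minimal sketch, assuming a fixed-size in-memory table; the function names `remember_client_data` and `save_on_red` are hypothetical, and the real Hobbit 4.2 module may be organised quite differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXHOSTS 64
#define MAXDATA  4096

struct last_client {
    char host[64];
    char data[MAXDATA];
};

static struct last_client cache[MAXHOSTS];
static int ncached = 0;

/* Remember the latest raw client message for a host (overwrites older data). */
void remember_client_data(const char *host, const char *data)
{
    int i;
    for (i = 0; i < ncached; i++)
        if (strcmp(cache[i].host, host) == 0)
            break;
    if (i == ncached) {
        if (ncached == MAXHOSTS)
            return;                  /* table full: dropped in this sketch */
        ncached++;
        snprintf(cache[i].host, sizeof(cache[i].host), "%s", host);
    }
    snprintf(cache[i].data, sizeof(cache[i].data), "%s", data);
}

/*
 * Called on a status change. If the new colour is red, write the cached
 * client data for the host to `out` and return 1; otherwise return 0.
 */
int save_on_red(const char *host, const char *newcolor, FILE *out)
{
    assert(out != NULL);
    if (strcmp(newcolor, "red") != 0)
        return 0;
    for (int i = 0; i < ncached; i++) {
        if (strcmp(cache[i].host, host) == 0) {
            fprintf(out, "Client data for %s:\n%s", host, cache[i].data);
            return 1;
        }
    }
    return 0;                        /* no client data seen for this host */
}
```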

This principle is not fully implemented yet. The network test utility,
which was also carried over from the bbgen toolkit, still works the
"Big Brother" way. One thing on my agenda is to change that, so the
network tester just reports that the ping of host "foo" responded in
12.7 ms, the ping of host "bar" failed, and so on. Then a module on the
Hobbit server can decide whether these should result in a red or yellow
status, perhaps based on other information it has (e.g. that the
response time shouldn't exceed 10 ms during working hours, unless the
primary network connection was down so we were running on a backup line
with less capacity).
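The server-side decision in that example can be written as a small C function. This is a minimal sketch, assuming the network tester delivers a raw result (did the host answer, and how fast), and that the server knows the current hour and whether the backup line is in use. The working hours (08:00 to 17:00) and the function name `ping_color` are assumptions for illustration.

```c
#include <assert.h>

enum color { COLOR_GREEN, COLOR_YELLOW, COLOR_RED };

#define MAX_RTT_MS  10.0   /* limit from the example above */
#define WORK_START  8      /* hypothetical working hours: 08:00-17:00 */
#define WORK_END    17

/* Turn one raw ping result into a status colour on the server side. */
enum color ping_color(int responded, double rtt_ms, int hour, int on_backup_line)
{
    assert(hour >= 0 && hour <= 23);
    if (!responded)
        return COLOR_RED;                 /* e.g. host "bar" did not answer */
    if (hour >= WORK_START && hour < WORK_END &&
        rtt_ms > MAX_RTT_MS && !on_backup_line)
        return COLOR_YELLOW;              /* too slow on the primary line */
    return COLOR_GREEN;
}
```

With this split, the 12.7 ms reply from host "foo" turns yellow at 10:00 on the primary line, but stays green when the backup line is active.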


The core daemons
================
I wanted to have a network daemon holding all of the "current state" 
information. This information changes all the time as new status
reports arrive, so it has to keep this in memory - writing it to disk
would be too slow (BB did this, and it doesn't scale). So a core
component of Hobbit would be this central daemon (hobbitd). The daemon
NEVER does any disk I/O; this would slow it down and I don't want that,
because Hobbit must support monitoring of thousands of servers. All
communication between hobbitd and the outside world goes via a network
connection; this is used both for in-band data (status updates and data
messages), but also for out-of-band data like control messages (drop a
host, disable a server and so on). Tools that need to fetch the entire
status of all servers, or just the detailed status of a host also do
this through a network connection to hobbitd.
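As an illustration of that network interface, here is a sketch of a tiny sender in C. It assumes the Big Brother-style text message `status HOST.TEST COLOR TEXT`, one message per TCP connection, and hobbitd listening on port 1984 (the TCP:1984 port in the core-design diagram). The function names are hypothetical; the real `bb` utility handles more message types and options.

```c
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "status HOST.TEST COLOR TEXT"; returns its length, or -1 if truncated. */
int format_status(char *buf, size_t len, const char *host, const char *test,
                  const char *color, const char *text)
{
    assert(len > 0);
    int n = snprintf(buf, len, "status %s.%s %s %s\n", host, test, color, text);
    return (n >= 0 && (size_t)n < len) ? n : -1;
}

/* Send one message to hobbitd on TCP port 1984, then close the connection. */
int send_to_hobbitd(const char *server, const char *msg)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(server, "1984", &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                       /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd == -1)
        return -1;

    size_t left = strlen(msg);
    while (left > 0) {                   /* write() may send only part */
        ssize_t n = write(fd, msg, left);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        msg += n;
        left -= (size_t)n;
    }
    shutdown(fd, SHUT_WR);               /* end of message */
    close(fd);
    return 0;
}
```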

However, some things must be stored on disk - RRD (graph) files, for
instance, or historical eventlogs. So this is handled by a bunch of
independent "worker" modules - hobbitd_rrd (RRD updates),
hobbitd_history (history logs), hobbitd_alert (sending out alerts).
These obviously have to be fed information about the data that flows
into the hobbitd daemon - e.g. hobbitd_rrd needs the full status message
to extract the data it puts into the RRD files, and hobbitd_history
needs information about the status changes from green->red and so on.
So I needed a fast inter-process communication mechanism between hobbitd
and the worker modules. Also, I wanted to be able to start/stop/restart
worker modules on-the-fly; this is extremely nice for testing and makes
the system much more robust. Finally, I wanted an interface that was
simple to use so that end-users can hook into the data stream if they
need to write some custom back-end script. The solution for this was
a mechanism that uses the System V "shared memory" IPC mechanism,
combined with a group of semaphores to control access to the shared
memory area. So hobbitd copies a message into the shared memory area
and ups a semaphore telling the workers that there is a new message.
The workers then pick up the message and down another semaphore once
they have secured their copy of the message; hobbitd then knows when it
is safe to overwrite the shared-memory area with a new message. I call
this IPC mechanism a "channel", and there are in fact several of these:
One for each type of message. So there is a channel which receives all
of the raw "status" messages; another channel for the raw "data"
messages; a channel that receives messages about status changes (for
history logging); a channel that receives messages about critical
red/yellow statuses (for alerts) and so on. Recently a new channel was
added for the "client" messages that come from the Hobbit client.
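The handshake described above can be sketched with System V shared memory and a two-semaphore set. This is a simplified sketch, assuming a single worker: the producer (hobbitd's role) waits until the "busy" semaphore is zero, copies the message into shared memory, raises "busy" by one per worker and ups "go"; the worker downs "go", copies the message, then downs "busy". All names here are hypothetical; the real channel code supports several workers and differs in detail.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHAN_BUFSZ 512
enum { SEM_BUSY = 0, SEM_GO = 1 };       /* indexes in the semaphore set */

/* semctl() needs a caller-defined union; named uniquely to avoid clashing
   with systems whose <sys/sem.h> already defines `union semun`. */
union chan_semarg { int val; struct semid_ds *buf; unsigned short *array; };

static int chan_semop(int semid, unsigned short num, short op)
{
    struct sembuf sb;
    sb.sem_num = num;
    sb.sem_op = op;      /* +n: up, -n: down, 0: wait until the value is zero */
    sb.sem_flg = 0;
    return semop(semid, &sb, 1);
}

/*
 * Pass each message through one shared-memory buffer to a forked worker.
 * An empty message ends the stream (sketch convention). Returns how many
 * messages the worker received, or -1 on error.
 */
int channel_demo(const char *const msgs[], int nmsgs)
{
    union chan_semarg zero = { .val = 0 };
    int shmid = shmget(IPC_PRIVATE, CHAN_BUFSZ, IPC_CREAT | 0600);
    int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);
    int received = -1, status;
    char *buf;
    pid_t pid;

    assert(nmsgs >= 0 && nmsgs < 255);   /* count comes back as exit status */
    if (shmid == -1 || semid == -1)
        goto cleanup;
    semctl(semid, SEM_BUSY, SETVAL, zero);
    semctl(semid, SEM_GO, SETVAL, zero);
    buf = shmat(shmid, NULL, 0);
    if (buf == (void *)-1)
        goto cleanup;

    pid = fork();
    if (pid == -1) {
        shmdt(buf);
        goto cleanup;
    }
    if (pid == 0) {                      /* worker: inherits the attachment */
        char copy[CHAN_BUFSZ];
        int count = 0;
        for (;;) {
            if (chan_semop(semid, SEM_GO, -1) == -1)   /* wait for a message */
                _exit(255);
            memcpy(copy, buf, CHAN_BUFSZ);             /* secure our copy */
            if (chan_semop(semid, SEM_BUSY, -1) == -1) /* buffer may be reused */
                _exit(255);
            if (copy[0] == '\0')
                break;
            count++;
        }
        _exit(count);
    }

    for (int i = 0; i <= nmsgs; i++) {   /* producer, plus a final empty message */
        const char *m = (i < nmsgs) ? msgs[i] : "";
        if (chan_semop(semid, SEM_BUSY, 0) == -1) {    /* previous copy done? */
            kill(pid, SIGKILL);          /* don't leave the worker blocked */
            break;
        }
        snprintf(buf, CHAN_BUFSZ, "%s", m);
        chan_semop(semid, SEM_BUSY, 1);  /* one worker still has to copy it */
        chan_semop(semid, SEM_GO, 1);    /* wake the worker */
    }
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
        WEXITSTATUS(status) != 255)
        received = WEXITSTATUS(status);
    shmdt(buf);
cleanup:
    if (shmid != -1)
        shmctl(shmid, IPC_RMID, NULL);
    if (semid != -1)
        semctl(semid, 0, IPC_RMID);
    return received;
}
```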

There are some early notes about this mechanism in the
hobbitd/new-daemon.txt file in the hobbit sources. Not all of the ideas
there have been implemented, e.g. the "streaming" protocol turned out
not to be particularly important.

To make sure that the semaphore stuff is handled correctly, I decided to
put a "buffer" module between hobbitd and the actual workers. This is
the hobbitd_channel module; it serves only one purpose, which is to grab
the messages that hobbitd sends out through the IPC mechanism, and queue
them for the real worker module (hobbitd_rrd, hobbitd_history etc). The
fact that hobbitd_channel acts as a message queue is useful to
accommodate spikes in activity; e.g. the alert module sometimes gets
a huge spike of messages when a network switch dies.
hobbitd_channel also makes it easy to build your own backend modules,
because it forwards the messages via a simple text-based pipe; so your
custom backend modules can just read them from stdin.

Another benefit of having hobbitd_channel between hobbitd and the worker
modules showed up recently; I am currently working on a new version of
hobbitd_channel which can distribute the incoming messages between multiple
worker "clones" running on different servers, to perform some load
balancing of the heavy tasks (primarily RRD file updates). This has been
implemented almost exclusively by changing hobbitd_channel, instead of
having to modify all of the worker modules.
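The mail does not say how the new hobbitd_channel chooses a clone. One plausible scheme, shown here only as an assumption, hashes the host name so that every update for a given host goes to the same clone, and that host's RRD files then live on a single server. The djb2 string hash and the function name `pick_clone` are illustrative choices.

```c
#include <assert.h>

/*
 * Pick which of `nclones` worker clones handles a message for `host`.
 * Hashing the host name (djb2 here) keeps each host on one clone.
 */
unsigned int pick_clone(const char *host, unsigned int nclones)
{
    unsigned long h = 5381;

    assert(nclones > 0);
    for (const unsigned char *p = (const unsigned char *)host; *p; p++)
        h = h * 33 + *p;
    return (unsigned int)(h % nclones);
}
```

A drawback of plain modulo hashing is that changing the number of clones moves most hosts to a different clone, so their RRD files would have to move too.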

So the core design looks like this:


   Network tests --\
                    \
            TCP:1984 \           IPC
   Clients ----------> hobbitd --------> hobbitd_channel ---> worker module
                     /         Sh. mem.                 stdin
                    /
   Custom tests ---/


The Web interface
=================
The web interface is mostly carried over from Hobbit's predecessor, the
"bbgen" toolkit. I wrote this for Big Brother, to speed up the
generation of the Big Brother webpages, and by re-using this in Hobbit I
would quickly get a working web interface - all I had to do was to
change the programs to grab their data from the hobbitd daemon, instead
of reading through the status logfiles that Big Brother uses.

This also means that the web interface is not tied in with the core
daemons. Sure, they need to communicate and there are some things in the
core daemons that are closely related to how the web interface works -
e.g. disabling a host. But it should be possible for an adventurous
programmer to use the core Hobbit daemons with their own web front-end
tools and come up with a completely different user-interface.

So the web interface is probably the part of Hobbit that has evolved the
least from its origins in Big Brother. Some new CGI programs have been
added, but nothing revolutionary; it just picks up bits of
information from hobbitd and the configuration files and displays them.

One design criterion for the web interface is that it should be as
dynamic as possible; it must reflect the current status and
configuration as much as possible. That is why most of the web interface
is done with CGI programs; the only static webpages in Hobbit are the
overview pages generated by bbgen - and I hope to eliminate those soon.


The clients
===========
So with this background, it is obvious that the Hobbit client is really,
really dumb. It is basically just a shell script that runs some normal
OS commands - df, ps, who and so on - and then it's up to the Hobbit
server to analyze them and generate some status columns. Client data is
sent to hobbitd, which feeds it through a channel to the hobbitd_client
module. hobbitd_client has some parsers for each of the operating
systems it knows about, and uses those to grab the interesting data and
compare it to the client configuration rules. Then hobbitd_client
generates some "status" messages and sends them to hobbitd. The major
challenge with this design is logfiles; you cannot realistically send
entire logfiles - some of them are several GB of data - over to Hobbit
for analysis every 5 minutes. So some filtering must be done on the
client side; to keep all of the configuration data on the Hobbit server
this meant that the client has to pick up its filter-configuration from
the Hobbit server.


I hope this is enough of an overview for you. Good luck with your
thesis.


Regards,
Henrik

list Rich Smrcina · Thu, 16 Nov 2006 08:59:32 -0600 ·

Henrik,

That is an excellent description of the impetus for the design of Hobbit.  Thank you so much, I read it with much interest.

The 'dumb client' design is very handy in a shared environment, like the mainframe, where dozens or hundreds of virtual machines can be running on one box.  We don't want a client that sucks up a lot of CPU time when it wakes up.  Also, the performance improvements in the 4.2 release have dropped the CPU utilization of the server by at least 80%!


-- 
Rich Smrcina
VM Assist, Inc.
Phone: XXX-XXX-XXXX
Ans Service:  XXX-XXX-XXXX
user-61add9955ef9@xymon.invalid

Catch the WAVV!  http://www.wavv.org
WAVV 2007 - Green Bay, WI - May 18-22, 2007

list Wolfgang Wutz · Thu, 16 Nov 2006 16:24:27 +0100 ·

Hi Henrik,

I just noticed your email and all I can say is: WOW!
Haven't had the time to read it all through, but I wanted to say thank you very much for the information! 
It's really appreciated!

..........................................................
Mit freundlichem Gruß / Kind regards
Wolfgang Wutz
Siemens VDO Automotive AG
SV IIS PTQ Rgb
Im Gewerbepark C25
93059 Regensburg
Germany
Tel. +XX XXX XXX XXXX
E-Mail: user-cfd3593ddd2c@xymon.invalid

Visit our Communication Platform:
https://toolnet.rbgs.ww011.siemens.net/index.php

list Gildas le Nadan · Thu, 16 Nov 2006 15:42:04 +0000 ·

Hi Henrik,

The content of this mail should IMHO be copy/pasted and added to the hobbit documentation!

Just a random thought: wouldn't it be possible to have a pure shell or perl script hobbit client running from a cron job?

Even if only a subset of functionality is provided this way, it may be useful for some OS/machines where compilation is not possible or where running a compiled client is tricky or unsupported (appliances, for instance).

Cheers,
Gildas


list Stef Coene · Thu, 16 Nov 2006 17:01:27 +0100 ·

▸ quoted from Gildas le Nadan

On Thursday 16 November 2006 16:42, Gildas Le Nadan wrote:

Hi Henrik,

The content of this mail should IMHO be copy/pasted and added to the
hobbit documentation!

Just a random thought: wouldn't it be possible to have a pure shell or
perl script hobbit client running from a cron job?

I think there is a perl script that can be used to replace the bb command.
Most systems have a perl binary.  You can even use netcat to replace the bb 
command; you just need to be able to send some data over the network.


Stef

list Gildas le Nadan · Thu, 16 Nov 2006 16:05:14 +0000 ·

▸ quoted from Stef Coene


Stef Coene wrote:

On Thursday 16 November 2006 16:42, Gildas Le Nadan wrote:

Hi Henrik,

The content of this mail should IMHO be copy/pasted and added to the
hobbit documentation!

Just a random thought: wouldn't it be possible to have a pure shell or
perl script hobbit client running from a cron job?

I think there is a perl script that can be used to replace the bb command.
Most systems have a perl binary.  You can even use netcat to replace the bb command; you just need to be able to send some data over the network.


Stef

No, sh and telnet are widely available binaries. Perl relies on far too many libs to be usable. And a lot of the old architectures don't have netcat or a usable compiler/libs.

Gildas

list Steve Aiello · Thu, 16 Nov 2006 11:18:02 -0500 ·

Hello all,

I have been tasked with working on a client app for users. Currently it is
basically a Yahoo widget that feeds off the bbgen nstab page. We monitor
a lot of services, and from that we have configured a good number of
Alternate PageSets, basically providing a focused Hobbit/BBGen display
for each group.

Now those groups can run a Yahoo widget that parses the nstab.html for
all Critical alerts. And that seems to be working rather well. But this
got me thinking. There is a lot more data contained in the
hobbitdxboard output, and it is much easier to use. Plus, by using
hobbitdxboard, users/groups can customize their query much more.

So the basic questions I have:
1. Is there a way to get to the hobbitdxboard data other than using the
bb command? It would be great if there was a web CGI for this. If there
isn't, I am sure I can whip up something using perl.

2. Should load be a concern? I know all of the
hobbit data is stored in memory, so I didn't think this would be a
problem. Also, I was thinking that if I write a perl CGI, I could cache
the output of the query for 1 minute or so. Since the normal client update
period is 5 minutes, caching the hobbitdxboard output for a minute
could reduce the load.

Thoughts ?
 ~ Steve
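The caching idea from question 2 can be sketched as a small time-based cache. This is a minimal sketch, assuming the board data is fetched elsewhere (for instance by running the bb command with the hobbitdxboard query) and stored here together with the fetch time; `fetched == 0` is used to mean "never fetched". The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_TTL 60   /* seconds: the one-minute window suggested above */

struct board_cache {
    char data[16384];
    time_t fetched;    /* 0 = never fetched */
};

/* Return 1 if the cached board data must be refreshed at time `now`. */
int cache_is_stale(const struct board_cache *c, time_t now)
{
    assert(c != NULL);
    return c->fetched == 0 || now - c->fetched >= CACHE_TTL;
}

/* Store freshly fetched board data (e.g. hobbitdxboard output). */
void cache_store(struct board_cache *c, const char *data, time_t now)
{
    assert(c != NULL && data != NULL);
    snprintf(c->data, sizeof(c->data), "%s", data);
    c->fetched = now;
}
```

A CGI would check `cache_is_stale(&cache, time(NULL))` on each request and query hobbitd only when it returns 1, so hobbitd sees at most one board query per minute regardless of how many widgets poll.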

list Buchan Milne · Thu, 16 Nov 2006 18:45:00 +0200 ·


After reading Henrik's mail on the architecture, specifically:

▸ quoted from Henrik Størner

But it should be possible for an adventurous
programmer to use the core Hobbit daemons with their own web front-end
tools and come up with a completely different user-interface.

I wondered about writing a frontend in Catalyst 
(http://www.catalystframework.org/).

A Hobbit model plugin might be an idea ...

Regards,
Buchan

-- 
Buchan Milne
ISP Systems Specialist - Monitoring/Authentication Team Leader
B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)

list Stef Coene · Thu, 16 Nov 2006 20:07:14 +0100 ·

▸ quoted from Gildas le Nadan

On Thursday 16 November 2006 17:05, Gildas Le Nadan wrote:

No, sh and telnet are widely available binaries. Perl relies on far too
many libs to be usable. And a lot of the old architectures don't have
netcat or a usable compiler/libs.

You can use ssh or rsh to remotely execute a bb command.
Or what about mail?  You can create a script that is triggered on incoming mail and executes bb.  So the client mails the status to the hobbit server.

As long as the client has a network connection, there is a way to monitor it with hobbit.


Stef

list Matthew Davis · Thu, 16 Nov 2006 23:24:36 -0500 ·

You're better off asking the list first.  Henrik is likely quite busy,
which is the reason for the lack of response.  You're likely to find just
as good answers from members of the list.

--


Matthew Davis
http://familycampground.org/matthew/

list Michael Nagel · Tue, 2 Jan 2007 16:12:52 +0100 ·

Hi Henrik,

first of all: Happy New Year!

Before a holiday I normally redefine hobbit-alert.cfg
so that the Sunday rules also apply to that day.

This Christmas I forgot to do it.

In the attachment you will find an addition (based on, and in some
parts stolen from, your source) which calculates the holidays
and alerts as on a Sunday (or whichever weekday you like).
Source: holiday.src
Holiday definition: hobbit-holidays.cfg

In Germany there are 3 basic rules: static holidays, days
based on Easter, and days based on the 4th Advent Sunday.
These are implemented. I had no time to look into the rules
for other (national and religious) holidays. Maybe that is a
job for the mailing list.
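The two movable anchors mentioned above can be computed directly. This is a sketch, not the code in holiday.src: Easter Sunday uses the well-known anonymous Gregorian algorithm (Meeus/Jones/Butcher), the 4th Advent Sunday is the last Sunday on or before December 24, and Easter-based holidays are expressed as day offsets (e.g. -2 for Good Friday, +1 for Easter Monday, +39 for Ascension). The function names are hypothetical.

```c
#include <assert.h>
#include <time.h>

/* Easter Sunday in the Gregorian calendar (anonymous Gregorian algorithm). */
void easter(int y, int *month, int *day)
{
    int a = y % 19, b = y / 100, c = y % 100;
    int d = b / 4, e = b % 4;
    int f = (b + 8) / 25, g = (b - f + 1) / 3;
    int h = (19 * a + b - d - g + 15) % 30;
    int i = c / 4, k = c % 4;
    int l = (32 + 2 * e + 2 * i - h - k) % 7;
    int m = (a + 11 * h + 22 * l) / 451;
    int n = h + l - 7 * m + 114;

    *month = n / 31;
    *day = n % 31 + 1;
}

/* Day of the week, 0 = Sunday (Sakamoto's method). */
static int weekday(int y, int m, int d)
{
    static const int t[] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };
    if (m < 3)
        y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* 4th Advent Sunday: the last Sunday on or before December 24. */
int fourth_advent_day(int y)
{
    return 24 - weekday(y, 12, 24);
}

/* Date `offset` days from Easter Sunday. */
void easter_offset(int y, int offset, int *month, int *day)
{
    struct tm tm = { 0 };
    int em, ed;

    assert(month != NULL && day != NULL);
    easter(y, &em, &ed);
    tm.tm_year = y - 1900;
    tm.tm_mon = em - 1;
    tm.tm_mday = ed + offset;
    tm.tm_hour = 12;            /* midday avoids DST edge cases */
    tm.tm_isdst = -1;
    mktime(&tm);                /* normalises tm_mday overflow into the month */
    *month = tm.tm_mon + 1;
    *day = tm.tm_mday;
}
```

For 2007 this gives Easter on April 8 and the 4th Advent Sunday on December 23.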

If you like it, add it to your source tree.

Regards,
Michael

Attachments (2)

attachment.obj application/octet-stream · 1.6 KB
attachment-0001.obj application/octet-stream · 9.2 KB

list Henrik Størner · Sat, 3 Feb 2007 17:11:22 +0100 ·

Hi Michael,

▸ quoted from Michael Nagel



On Tue, Jan 02, 2007 at 04:12:52PM +0100, Michael Nagel wrote:

In the attachment you find an addition (based and in some 
parts stealed from your source) which calculate the holidays
and alert like sunday (or what weekday you like).
Source: holiday.src
Holiday definition: hobbit-holidays.cfg

thanks a lot for this - I've been needing this myself, but never
got around to implementing it.

I've merged your code with some minor tweaks here and there. If
anyone cares to write holiday definitions for other countries,
please submit them to me.


Regards,
Henrik

list Thomas Kern · Mon, 5 Feb 2007 10:29:02 -0500 ·

If you can send me the format of the data file, I will see if I can get
the US government holidays for the next few years.

/Thomas Kern
/U.S. Department of Energy
/XXX-XXX-XXXX

▸ quoted from Henrik Størner

-----Original Message-----
From: user-ce4a2c883f75@xymon.invalid [mailto:user-ce4a2c883f75@xymon.invalid] Sent: Saturday, February 03, 2007 11:11 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] alerting on holidays

Hi Michael,


On Tue, Jan 02, 2007 at 04:12:52PM +0100, Michael Nagel wrote:

In the attachment you find an addition (based and in some > parts stealed from your source) which calculate the holidays
and alert like sunday (or what weekday you like).
Source: holiday.src
Holiday definition: hobbit-holidays.cfg

thanks a lot for this - I've been needing this myself, but never
got around to implement it.

I've merged your code with some minor tweaks here and there. If
anyone cares to write holiday definitions for other countries,
please submit them to me.


Regards,
Henrik

list Stef Coene · Mon, 5 Feb 2007 16:42:29 +0100 ·

▸ quoted from Thomas Kern

On Monday 05 February 2007 16:29, Kern, Thomas wrote:

If you can send me the format of the data file, I will see if I can get
the US government holidays for the next few years.

I had the same problem for a website I was creating.
You can use Google Calendar to download an XML or iCal list of holidays.  
Parsing this with perl and dumping it in a file is not so difficult.  The only 
problem I had was finding a good Google calendar.


Stef

list Thorsten Erdmann · Tue, 06 Feb 2007 09:39:42 +0100 ·

Hi,

how can I reduce the history logs and RRD files Hobbit creates? These files keep growing. Hobbit hasn't been running for a full year yet, so I don't know whether they will stop growing once a year has passed.
Does Hobbit need the history logs for some tasks, or can I delete them from time to time?

Thorsten Erdmann

list Henrik Størner · Tue, 6 Feb 2007 10:01:28 +0100 ·

▸ quoted from Thorsten Erdmann

On Tue, Feb 06, 2007 at 09:39:42AM +0100, user-06a84d0fcc19@xymon.invalid wrote:

how can I reduce the history logs und RRD files hobbit creates. I have a constant growth of these files. Hobbit isn't running for a year now, so I don't know if they will be limited when the year ist full.
Does hobbit need the historylogs for some tasks or can I delete them from time to time.

The RRD files are constant in size - and new ones only appear when you
add hosts.

The historical log entries can be trimmed with the "trimhistory"
utility.


Regards,
Henrik
