Xymon Mailing List Archive

Agentless clients

11 messages in this thread

list Scott Walters · Wed, 4 Jan 2006 12:45:05 -0500 (EST) ·
Henrik,

	Since you seem to be a fan of 'parse everything on the server',
have you ever considered adding the ability to have hobbit use 'agentless
clients'?

	By agentless, I mean only the trust relationship between the
server and client is established, commands and output are done
'on-the-fly', every 5 minutes.

	After writing the VPN/sshbb/email I was thinking it could be pretty
easy to add this 'functionality' native to hobbit.


-- 
Scott Walters
-PacketPusher
list Henrik Størner · Wed, 4 Jan 2006 23:49:25 +0100 ·
Hi Scott,
quoted from Scott Walters

On Wed, Jan 04, 2006 at 12:45:05PM -0500, Scott Walters wrote:
	Since you seem to be a fan of 'parse everything on the server',
have you ever considered adding the ability to have hobbit use 'agentless
clients'?

	By agentless, I mean only the trust relationship between the
server and client is established, commands and output are done
'on-the-fly', every 5 minutes.

	After writing the VPN/sshbb/email I was thinking it could be pretty
easy to add this 'functionality' native to hobbit.
I guess it would, and when I wrote the Hobbit client I did it with this
in mind. It would be fairly trivial to provide a wrapper for the client
side scripts that runs them through a VPN or SSH tunnel.

It's not something that I plan on using myself, but it could easily be
a common add-on to Hobbit. I do prefer to run the clients locally on the
servers because that scales much better than having one server pull all
of this data into Hobbit. Parsing the data doesn't take nearly as long
as collecting it...


Regards,
Henrik
list Daniel J McDonald · Thu, 05 Jan 2006 09:40:31 -0600 ·
quoted from Henrik Størner
On Wed, 2006-01-04 at 23:49 +0100, Henrik Stoerner wrote:
Hi Scott,

On Wed, Jan 04, 2006 at 12:45:05PM -0500, Scott Walters wrote:
	After writing the VPN/sshbb/email I was thinking it could be pretty
easy to add this 'functionality' native to hobbit.
I guess it would, and when I wrote the Hobbit client I did it with this
in mind. It would be fairly trivial to provide a wrapper for the client
side scripts that runs them through a VPN or SSH tunnel.
This would be very useful to me - "hobbit central" ala bb-central.
quoted from Henrik Størner
It's not something that I plan on using myself, but it could easily be
a common add-on to Hobbit. I do prefer to run the clients locally on the
servers because that scales much better than having one server pull all
of this data into Hobbit. Parsing the data doesn't take nearly as long
as collecting it...
But having one or two servers that poll all of the others does scale
well, because you don't have to install (and upgrade) hobbit clients on
a hundred machines - just set up an rsa key and you are done.  If the
primary hobbit display/alarm/parse work is too much with the polling
added, just use a second hobbit server for polling/parsing and feed the
results to the display server...

-- 
Daniel J McDonald, CCIE # 2495, CNX, CISSP # 78281
Austin Energy
user-290ce4e24e19@xymon.invalid


gpg Key: http://austinnetworkdesign.com/pgp.key
Key fingerprint = B527 F53D 0C8C D38B DCC7  901D 2F19 A13A 22E8 A76A
list Charles Jones · Thu, 05 Jan 2006 09:21:15 -0700 ·
quoted from Daniel J McDonald
Daniel J McDonald wrote:
But having one or two servers that poll all of the others does scale
well, because you don't have to install (and upgrade) hobbit clients on
a hundred machines - just set up an rsa key and you are done.  If the
primary hobbit display/alarm/parse work is too much with the polling
added, just use a second hobbit server for polling/parsing and feed the
results to the display server...
 
I disagree. The distributed system scales much better, as the remote servers are sending in their results in parallel.
While fping is able to test remote hosts in parallel, the other tests are done in serial (bb-fetch, etc).

Let's say you have 1000 hosts. Let's then just for fun pretend that it will only take 1 second to log into the remote hosts, run several tests, and receive the result (it would actually take a bit longer than that).

1000 hosts x 1 second = 1000 seconds; 1000 / 60 = 16.666 minutes to poll those hosts!

So then you can say oh well just have 2-3 hobbit servers doing the polling then.  Now you have 3 hobbit servers to deal with, monitoring them, upgrading them, etc.

Now let's look at a *real world* example of how long it takes to ssh in and execute a command:

[hobbit@hobbit ~]$ time ssh myhost.net df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c5t0d0s0       30G    10G    20G    35%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   6.6G  1000K   6.6G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
fd                       0K     0K     0K     0%    /dev/fd
swap                   6.6G    16K   6.6G     1%    /tmp
swap                   6.6G    32K   6.6G     1%    /var/run
/dev/md/dsk/d0         639G   116G   523G    19%    /raid
/dev/md/dsk/d1         807G   504G   304G    63%    /raid2

*real    0m1.912s*
user    0m0.022s
sys     0m0.008s

Almost 2 seconds there....and just for one command. So now even 2 hobbit servers polling simultaneously will still take over 15 minutes just to poll 1000 servers. Having hobbit do the ssh's in parallel wouldn't work either; I have tried something similar on far fewer hosts, and even using the -c blowfish option the server CPU still hit 100% from all the overhead.
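
The arithmetic above can be checked with a small shell calculation. This is only a sketch: the helper name sweep_minutes is made up for illustration, and it assumes hosts are polled strictly one after another and are split evenly between the polling servers.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hobbit): estimate how long one serial
# polling sweep takes, in minutes.
#   $1 = number of hosts, $2 = seconds per host, $3 = number of polling servers
sweep_minutes() {
    LC_ALL=C awk -v hosts="$1" -v secs="$2" -v pollers="$3" \
        'BEGIN { printf "%.1f\n", hosts * secs / pollers / 60 }'
}

sweep_minutes 1000 1 1       # the optimistic 1-second case, one poller
sweep_minutes 1000 1.912 2   # the measured ssh time, two pollers
```

Both cases come out well past a 5-minute polling interval.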

The way that I get around this is to have bbproxy running on a DMZ host, and have the hobbit/bb clients configured to use the bbproxy IP as their BBDISPLAY, which then forwards the traffic out of the DMZ to my hobbit server.  Not 100% secure, but using bb-fetch isn't either (an attacker could compromise one of the remote servers, and modify one of the commands that the hobbit user executes, thus giving them the ability to communicate with the hobbit server, injecting something to break the parsing engine, buffer overflows, etc). I will stop talking about that now as I am getting off subject :)

I agree that having similar functionality to bb-fetch could be useful for a *few* remote/DMZ hosts, but it certainly doesn't scale well. Once you reach a number of hosts whose polling time exceeds the hobbit refresh interval you are done.  I know it would be "nice" if we didn't have to upgrade remote clients and maintain them, but your solution involves ssh keys, so just use those same keys and a script to roll out the updates :)

-Charles
list Buchan Milne · Thu, 5 Jan 2006 19:02:52 +0200 ·
quoted from Charles Jones
On Thursday 05 January 2006 18:21, Charles Jones wrote:
Let's say you have 1000 hosts. Let's then just for fun pretend that it
will only take 1 second to log into the remote hosts, run several tests,
and receive the result (it would actually take a bit longer than that).

1000 hosts x 1 second = 1000 seconds; 1000 / 60 = 16.666 minutes to poll those hosts!

So then you can say oh well just have 2-3 hobbit servers doing the
polling then.  Now you have 3 hobbit servers to deal with, monitoring
them, upgrading them, etc.

Now let's look at a *real world* example of how long it takes to ssh in
and execute a command:

[hobbit@hobbit ~]$ time ssh myhost.net df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c5t0d0s0       30G    10G    20G    35%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   6.6G  1000K   6.6G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
fd                       0K     0K     0K     0%    /dev/fd
swap                   6.6G    16K   6.6G     1%    /tmp
swap                   6.6G    32K   6.6G     1%    /var/run
/dev/md/dsk/d0         639G   116G   523G    19%    /raid
/dev/md/dsk/d1         807G   504G   304G    63%    /raid2

*real    0m1.912s*
user    0m0.022s
sys     0m0.008s

Almost 2 seconds there....and just for one command. 
I keep wondering how long the equivalent snmp query takes ... or in fact 
gathering all the data asynchronously via snmp ...

Regards,
Buchan

-- 
Buchan Milne
ISP Systems Specialist
B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)
list Henrik Størner · Thu, 5 Jan 2006 18:31:46 +0100 ·
quoted from Buchan Milne
On Thu, Jan 05, 2006 at 07:02:52PM +0200, Buchan Milne wrote:
On Thursday 05 January 2006 18:21, Charles Jones wrote:
*real    0m1.912s*
user    0m0.022s
sys     0m0.008s

Almost 2 seconds there....and just for one command. 
I keep wondering how long the equivalent snmp query takes ... or in fact 
gathering all the data asynchronously via snmp ...
Let's see ... Net-SNMP includes an "snmpdf" command:

$ time snmpdf -v 1 -c somepassword somehost
Description        size (kB)         Used       Available Used%
/                    5969124       754896         5214228   12%

real    0m0.201s
user    0m0.154s
sys     0m0.019s


$ time ssh somehost df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1              5969124    754896   4911004  14% /

real    0m0.535s
user    0m0.011s
sys     0m0.004s


So: 0.2 seconds for snmp, 0.5 seconds for ssh. (No, I don't know why
they calculate the available disk size differently - df probably 
leaves out the 5% filesystem space that is reserved for "root" use).
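
A quick check of the two listings above supports that guess. The sketch below only reuses the figures printed by snmpdf and df; nothing is queried from a real filesystem.

```shell
#!/bin/sh
# Figures copied from the snmpdf and df output above (all in kB).
size=5969124
used=754896
snmp_avail=5214228
df_avail=4911004

# snmpdf's "Available" is simply size minus used.
echo "size-used=$((size - used))"

# The gap between the two "available" figures, as a share of the filesystem.
gap=$((snmp_avail - df_avail))
LC_ALL=C awk -v gap="$gap" -v size="$size" \
    'BEGIN { printf "gap=%d kB (%.1f%% of size)\n", gap, 100 * gap / size }'
```

The gap is 303224 kB, about 5.1% of the filesystem size, which is consistent with a 5% reserve being left out by df.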

SNMP probably wins because it is UDP based, so you avoid a lot of
overhead from the TCP connection setup. Plus the SNMP daemon is running,
so it doesn't need to start a new process to respond.

But I do agree with Charles - using a "bbfetch" style method of pulling
data from clients to the server only works for a small number of hosts.
On the scale that I work with on a daily basis - 2000 hosts or more -
it is simply not practical to contact all hosts every 5 minutes.
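
To put the 5-minute limit in numbers, here is a small sketch. It assumes hosts are contacted one at a time and that the single-host timings measured above are typical; max_hosts is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: how many hosts fit into one polling interval
# when they are contacted strictly one after another.
#   $1 = interval in seconds, $2 = seconds needed per host
max_hosts() {
    LC_ALL=C awk -v interval="$1" -v secs="$2" \
        'BEGIN { printf "%d\n", int(interval / secs) }'
}

max_hosts 300 0.201   # snmpdf timing from above
max_hosts 300 0.535   # ssh + df timing from above
```

Both results fall short of 2000 hosts, and that is for a single disk query per host.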

I'm still willing to implement the agent-less data collection in Hobbit,
because sometimes that is just going to be the only way you can get
information about a server. So it is an OK way of doing this, if you
know what it should - and should not - be used for.

(BTW, the issue that was raised about updates being easier if you don't
have to deploy them on all clients - that's a non-issue. It boils down
to whether or not you have an (automated) procedure for software and
patch distribution - if you don't have that, then you're in trouble no 
matter how Hobbit collects data.)


Regards,
Henrik
list Scott Walters · Thu, 5 Jan 2006 13:04:52 -0500 (EST) ·
quoted from Buchan Milne
On Thu, 5 Jan 2006, Buchan Milne wrote:
Almost 2 seconds there....and just for one command.
I keep wondering how long the equivalent snmp query takes ... or in fact
gathering all the data asynchronously via snmp ...
I've always thought that SNMP would be great for UNIX.  But every time I
have ever tried to really make it work, it just doesn't.  snmp agent
issues, OID issues, polling issues, UDP network issues.

My previous experiences have been so heinous I can easily recognize my own
bias against SNMP for unix at this point.

But what is weird is it works so well for comm/network equipment.

-- 
Scott Walters
-PacketPusher
list Scott Walters · Thu, 5 Jan 2006 13:46:36 -0500 (EST) ·
quoted from Charles Jones
On Thu, 5 Jan 2006, Charles Jones wrote:
I disagree. The distributed system scales much better, as the remote
servers are sending in their results in parallel.
I think we could architect the agentless solution to run in parallel, or
some sort of asynch scheduler/threads.
quoted from Charles Jones
Let's say you have 1000 hosts. Let's then just for fun pretend that it
will only take 1 second to log into the remote hosts, run several tests,
and receive the result (it would actually take a bit longer than that).
1000 hosts x 1 second = 1000 seconds; 1000 / 60 = 16.666 minutes to poll those hosts!
1 sec is *very* optimistic ;)  So your point is very clear that a generic
serial "for host in x y z" would not scale at all.
quoted from Charles Jones
either, I have tried something similar on far fewer hosts, and even
using -c blowfish option the server CPU still hit 100% from all the
overhead.
I've found the -c blowfish only helps when you are pushing a lot of data
around (ufsdump 0fc - | ssh -c blowfish).
quoted from Charles Jones
commands that the hobbit user executes, thus giving them the ability to
communicate with the hobbit server, injecting something to break the
parsing engine, buffer overflows, etc). I will stop talking about that
now as I am getting off subject :)
If that's the easiest way to get into your network, you get a gold star ;)
quoted from Charles Jones
I agree that having similar functionality to bb-fetch could be useful
for a *few* remote/DMZ hosts, but it certainly doesn't scale well. Once
you reach a number of hosts whose polling time exceeds the hobbit
refresh interval you are done.  I know it would be "nice" if we didn't
have to upgrade remote clients and maintain them, but your solution
involves ssh keys, so just use those same keys and a script to roll out
the updates :)
True, and I am not sold on the agentless clients idea either, but we've
got such a great framework to try it on.

The first design decision in my mind would be if in agentless we mean

1)  install/run/uninstall the hobbit client every 5m, i know this sounds
horribly inefficient but I am attracted to the simplicity of agent and
agentless machines being the 'same'.  or just automagically install the
client if the trust exists . . . . .

2) only running the exact OS commands necessary and capturing the output.
This would require some new code on the server.  But if done right, it
could perhaps replace existing clients.

abstract the collection from the delivery . . . .


-- 
Scott Walters
-PacketPusher
list Henrik Størner · Thu, 5 Jan 2006 20:00:12 +0100 ·
quoted from Scott Walters
On Thu, Jan 05, 2006 at 01:46:36PM -0500, Scott Walters wrote:
I agree that having similar functionality to bb-fetch could be useful
for a *few* remote/DMZ hosts, but it certainly doesn't scale well.
True, and I am not sold on the agentless clients idea either, but we've
got such a great framework to try it on.

The first design decision in my mind would be if in agentless we mean

1)  install/run/uninstall the hobbit client every 5m, i know this sounds
horribly inefficient but I am attracted to the simplicity of agent and
agentless machines being the 'same'.  or just automagically install the
client if the trust exists . . . . .

2) only running the exact OS commands necessary and capturing the output.
This would require some new code on the server.  But if done right, it
could perhaps replace existing clients.

abstract the collection from the delivery . . . .
You can easily combine 1) and 2). Running something like this on the 
client-polling server would do it:

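# Run on the polling server; BB, BBDISP and BBHOME are assumed to be set by the Hobbit environment.
CLIENT="www.foo.com"
CLIENTOS="linux"
# The message starts with a "client" line naming the host and its OS type.
echo "client $CLIENT.$CLIENTOS" >"/tmp/msg-$CLIENT.txt"
# Feed the local client script to a shell on the remote host and append its output.
ssh "$CLIENT" <"$BBHOME/bin/hobbitclient-$CLIENTOS.sh" >>"/tmp/msg-$CLIENT.txt"
# Send the collected file to the Hobbit server.
"$BB" "$BBDISP" "@" <"/tmp/msg-$CLIENT.txt"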

would run the normal client-side scripts without having them installed
on each client box, and send the output to the Hobbit server in a normal
"client" message. There is an issue with the OS'es that need special tools 
installed (usually to collect the memory statistics), but that can be worked 
around.
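
One way to keep collection separate from delivery is to put the message-building step into a function. This is only a sketch: build_client_message and fake_collect are hypothetical names, not Hobbit tools, and the stand-in collector replaces the ssh call so the example runs without a remote host.

```shell
#!/bin/sh
# Hypothetical helper: write the "client HOST.OS" header line, then append
# whatever the given collection command prints. Prints the message file name.
#   $1 = client host name, $2 = client OS, remaining arguments = collector
build_client_message() {
    client=$1
    clientos=$2
    shift 2
    # mktemp avoids a predictable file name in /tmp.
    msgfile=$(mktemp "${TMPDIR:-/tmp}/msg-$client.XXXXXX") || return 1
    echo "client $client.$clientos" >"$msgfile"
    "$@" >>"$msgfile"
    echo "$msgfile"
}

# Stand-in collector; real use would run the client script over ssh instead.
fake_collect() {
    echo "output of the client script would go here"
}

msg=$(build_client_message www.foo.com linux fake_collect)
cat "$msg"
rm -f "$msg"
```

In real use the stand-in would be replaced by the ssh invocation from the script above, and the resulting file would be handed to $BB as before.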


Regards,
Henrik
list Adam Scheblein · Thu, 5 Jan 2006 13:04:43 -0600 ·
What if you were to use ssh and xargs to collect the same client
information and then just send that message straight into the hobbit
listener?  Would that work?

Adam
quoted from Henrik Størner

-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid] 
Sent: Thursday, January 05, 2006 1:00 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Agentless clients

On Thu, Jan 05, 2006 at 01:46:36PM -0500, Scott Walters wrote:
I agree that having similar functionality to bb-fetch could be useful
for a *few* remote/DMZ hosts, but it certainly doesn't scale well.
True, and I am not sold on the agentless clients idea either, but we've
got such a great framework to try it on.

The first design decision in my mind would be if in agentless we mean

1)  install/run/uninstall the hobbit client every 5m, i know this sounds
horribly inefficient but I am attracted to the simplicity of agent and
agentless machines being the 'same'.  or just automagically install the
client if the trust exists . . . . .

2) only running the exact OS commands necessary and capturing the output.
This would require some new code on the server.  But if done right, it
could perhaps replace existing clients.

abstract the collection from the delivery . . . .
You can easily combine 1) and 2). Running something like this on the 
client-polling server would do it:

CLIENT="www.foo.com"
CLIENTOS="linux"
echo "client $CLIENT.$CLIENTOS" >/tmp/msg-$CLIENT.txt
ssh $CLIENT < $BBHOME/bin/hobbitclient-$CLIENTOS.sh >>/tmp/msg-$CLIENT.txt
$BB $BBDISP "@" </tmp/msg-$CLIENT.txt

would run the normal client-side scripts without having them installed
on each client box, and send the output to the Hobbit server in a normal
"client" message. There is an issue with the OS'es that need special tools
installed (usually to collect the memory statistics), but that can be worked
around.


Regards,
Henrik
list Scott Walters · Thu, 5 Jan 2006 16:04:38 -0500 (EST) ·
quoted from Adam Scheblein
You can easily combine 1) and 2). Running something like this on the
client-polling server would do it:
Wow, I should have known the hobbit client would have been this clean
already ;)

Real world:

[hobbit@myhost bin]# cat hobbitremote.sh
#!/bin/sh -x

CLIENT="myclient"
CLIENTOS="aix"

echo "client $CLIENT.$CLIENTOS" >/tmp/msg-$CLIENT.txt
ssh bb@$CLIENT < /usr/local/hobbit/client/bin/hobbitclient-$CLIENTOS.sh >>/tmp/msg-$CLIENT.txt
#$BB $BBDISP "@" </tmp/msg-$CLIENT.txt

[hobbit@dev1 bin]# time ./hobbitremote.sh
CLIENT=aixserver
CLIENTOS=aix
+ echo client aixserver.aix
+ ssh bb@aixserver
Pseudo-terminal will not be allocated because stdin is not a terminal.
ksh[39]: top:  not found.

real 0m7.525s
user 0m0.120s
sys  0m0.010s


Seven seconds.  But I am pretty sure 5 of that is the ps of the machine.

Needless to say, this would need to be parallel processed to handle
scaling, but it's awesome how easy the proof of concept was.
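
A minimal sketch of what the parallel version might look like, using xargs. The names poll_all and POLL_ONE are hypothetical, and the -P option is assumed to be available (GNU and BSD xargs have it, but it is not in POSIX). The default POLL_ONE only prints what it would do, so the sketch runs without any remote hosts.

```shell
#!/bin/sh
# POLL_ONE is a shell snippet that polls one host, passed to it as "$1".
# The default is a dry run that just prints the host name.
POLL_ONE=${POLL_ONE:-'echo "would poll $1"'}

# Hypothetical helper: run POLL_ONE for every host, several at a time.
#   $1 = maximum number of simultaneous polls, remaining arguments = host names
poll_all() {
    maxjobs=$1
    shift
    printf '%s\n' "$@" | xargs -P "$maxjobs" -I '{}' sh -c "$POLL_ONE" sh '{}'
}

poll_all 4 host1 host2 host3
```

Output order is not guaranteed, since the polls finish independently. Capping the number of simultaneous polls matters because, as noted earlier in the thread, many concurrent ssh sessions can saturate the polling server's CPU.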


-- 
Scott Walters
-PacketPusher