All network test suddenly purple
list Ralf Strandell
Hi
I am doing a migration from BigBrother to Hobbit. I have installed Hobbit on a new dedicated Linux server, configured bb-hosts and started Hobbit. At first everything seemed to work: I had nice results from the ping, http and dns tests. After that I did not even touch the Hobbit system. Then it all suddenly stopped. All my network tests are purple. bb-network.log is zero bytes. The ONLY error message that I can find is "df: cannot read table of mounted file systems: Permission denied" and that is not a problem. There's also "hobbitd status-board not available" in the Hobbit server log. I guess something must be wrong with bbtest-net. Any ideas?
  PID  PPID STIME     ELAPSED COMMAND
18428     1 Nov30  5-03:06:20 /home/hobbit/server/bin/hobbitlaunch --config=/home/hobbit/server/etc/hobbitlaunch.cfg --env=/h
18429 18428 Nov30  5-03:06:20 hobbitd --pidfile=/var/log/hobbit/hobbitd.pid --restart=/home/hobbit/server/tmp/hobbitd.chk --c
18430 18428 Nov30  5-03:06:15 hobbitd_channel --channel=stachg --log=/var/log/hobbit/history.log hobbitd_history
18431 18430 Nov30  5-03:06:15 hobbitd_history
18432 18428 Nov30  5-03:06:15 hobbitd_channel --channel=clichg --log=/var/log/hobbit/hostdata.log hobbitd_hostdata
18433 18432 Nov30  5-03:06:15 hobbitd_hostdata
18434 18428 Nov30  5-03:06:15 hobbitd_channel --channel=page --log=/var/log/hobbit/page.log hobbitd_alert --checkpoint-file=/
18435 18434 Nov30  5-03:06:15 hobbitd_alert --checkpoint-file=/home/hobbit/server/tmp/alert.chk --checkpoint-interval=600
18436 18428 Nov30  5-03:06:15 hobbitd_channel --channel=status --log=/var/log/hobbit/rrd-status.log hobbitd_rrd --rrddir=/hom
18437 18428 Nov30  5-03:06:15 hobbitd_channel --channel=data --log=/var/log/hobbit/rrd-data.log hobbitd_rrd --rrddir=/home/ho
18438 18436 Nov30  5-03:06:15 hobbitd_rrd --rrddir=/home/hobbit/data/rrd
18439 18437 Nov30  5-03:06:15 hobbitd_rrd --rrddir=/home/hobbit/data/rrd
18440 18428 Nov30  5-03:06:15 hobbitd_channel --channel=client --log=/var/log/hobbit/clientdata.log hobbitd_client
18441 18440 Nov30  5-03:06:15 hobbitd_client
26129 18428 Dec04    22:35:09 bbtest-net --report --ping --checkresponse
 9286     1 14:36       01:24 sh -c vmstat 300 2 1>/home/hobbit/client/tmp/hobbit_vmstat.slfitkusap007.9269 2>&1; mv /home/ho
 9287  9286 14:36       01:24 vmstat 300 2
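One detail in the process listing can be checked mechanically: bbtest-net (PID 26129) shows an elapsed time of 22:35:09, which looks long for a network tester that is launched periodically. The following is a minimal Python sketch, not part of Hobbit, that converts the ps ELAPSED column to seconds and flags processes that have been running longer than a chosen limit. The helper names and the 5-minute limit are assumptions made for illustration; the thread does not state the configured test interval.

```python
def etime_to_seconds(etime):
    """Convert a ps ELAPSED value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    # Left-pad so that hours, minutes and seconds are always present.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def overdue(ps_lines, command, max_seconds):
    """Return (pid, elapsed_seconds) for matching processes running too long.

    Expects lines in the 'PID PPID STIME ELAPSED COMMAND' layout.
    """
    hits = []
    for line in ps_lines:
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header or malformed line
        pid, _ppid, _stime, etime, cmd = fields
        elapsed = etime_to_seconds(etime)
        if cmd.split()[0] == command and elapsed > max_seconds:
            hits.append((int(pid), elapsed))
    return hits


listing = [
    "PID PPID STIME ELAPSED COMMAND",
    "18429 18428 Nov30 5-03:06:20 hobbitd --pidfile=/var/log/hobbit/hobbitd.pid",
    "26129 18428 Dec04 22:35:09 bbtest-net --report --ping --checkresponse",
]
# Assumption: a single network test run should finish within 5 minutes.
print(overdue(listing, "bbtest-net", max_seconds=300))  # [(26129, 81309)]
```

A tester process that has been alive for most of a day while its log stays empty would be consistent with a run that is blocked rather than one that never started.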
list Jason Altrincham Jones
Hi All,
I have been working with clientupdate to get our server's re-IP to go
smoothly, but am having problems. I put the tar file into ~/download and
altered client-local.cfg to contain:
[Foo.bar]
ClientID=server_move
The file under ~/download is called server_move.tar and has 777
permissions.
The problem is that it isn't working and I can't see a reason why. When I
run this on foo.bar itself:
~/bin/clientupdate --update=server_move
I get the output:
tar: blocksize = 0
Even adding .tar to the end of the --update value does nothing. Is there
any other way I can see what is happening and why it is going wrong?
Thanks,
Jason.
(note: the option is written with two dashes; Outlook tends to reformat "--" into a single dash)
list Buchan Milne
On Tuesday 05 December 2006 14:39, Strandell, Ralf wrote:
> [...] All my network tests are purple. Bb-network.log is zero bytes. [...] I guess something must be wrong with bbtest-net. Any ideas?
Well, this can happen when the hobbit server runs out of disk space ...
--
Buchan Milne
ISP Systems Specialist - Monitoring/Authentication Team Leader
B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)
list Greg L Hubbard
Last time I had this problem it was because of a problem with a DNS server. Even though Hobbit is supposed to be immune to DNS issues, it isn't really bulletproof. GLH
From: Strandell, Ralf [mailto:user-c0435af0b547@xymon.invalid]
Sent: Tuesday, December 05, 2006 6:39 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] All network test suddenly purple
list Jason Altrincham Jones
The plot thickens. I can now do a clientupdate... sort of. For some damnable reason it puts itself under /usr/local/hobbit/client/hobbit/client/ (the repeated hobbit/client is not a mistake). Any ideas why it would do this? Running tar xvf on the files on the server creates the directories in the same directory as the tar (i.e. no hobbit/client), so why is it doing it on the client side? Do I need to alter BBHOME somehow?
Thanks,
Jason.
From: Jones, Jason (Altrincham)
Sent: 05 December 2006 13:54
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] clientupdate help
list Greg L Hubbard
Sounds like you may have created your tar file incorrectly. The somewhat sketchy instructions say to create the tarball relative to the client directory, not root.
GLH
From: Jones, Jason (Altrincham) [mailto:user-ee957b46acd2@xymon.invalid]
Sent: Tuesday, December 05, 2006 10:00 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] clientupdate help
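To illustrate what "relative to the client directory" means for the archive contents, here is a minimal Python sketch using the standard tarfile module. It is not how Hobbit itself builds update packages; the function names and the "hobbit/client" prefix check are assumptions made for this example. The point is that member names inside the tar should look like bin/... and etc/..., not hobbit/client/bin/..., otherwise unpacking inside the client directory recreates hobbit/client underneath it.

```python
import os
import tarfile


def build_update_tar(client_dir, tar_path):
    """Pack all files under client_dir with names relative to client_dir.

    Note: empty directories are not added in this sketch.
    """
    names = []
    with tarfile.open(tar_path, "w") as tar:
        for root, dirs, files in os.walk(client_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                # arcname is the name stored in the archive, e.g. "bin/tool"
                # rather than "hobbit/client/bin/tool" or an absolute path.
                arcname = os.path.relpath(full, client_dir)
                tar.add(full, arcname=arcname)
                names.append(arcname)
    return names


def nested_members(tar_path, prefix="hobbit/client"):
    """List members that would recreate the client directory inside itself."""
    with tarfile.open(tar_path) as tar:
        return [m.name for m in tar.getmembers()
                if m.name == prefix or m.name.startswith(prefix + "/")]
```

Running nested_members() on an existing server_move.tar is a quick way to see whether the archive was created one or two levels too high; an empty list means no member starts with the assumed prefix.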
list Ralf Strandell
It wasn't the IPC resources either. The output of ipcs looks similar before and after a restart/recovery. So, if it wasn't disk/memory/cpu/ipcs, then what? I can't build an enterprise-class monitoring system that just shuts down without any known cause.
What kind of filesystem access does Hobbit need? I have a bad habit of restricting access... Oh yes, I had chosen "paranoid" filesystem security just because it was possible.
-----Original Message-----
From: Buchan Milne [mailto:user-9b139aff4dec@xymon.invalid]
Sent: Tuesday, December 05, 2006 4:56 PM
To: user-ae9b8668bcde@xymon.invalid
Cc: Strandell, Ralf
Subject: Re: [hobbit] All network test suddenly purple
list Ralf Strandell
Well, there is 15 GB of free storage space... Enough free memory and swap. Load is zero. There are not too many processes.
Mem: 646492k total, 314756k used, 331736k free, 51164k buffers
Swap: 1052216k total, 0k used, 1052216k free, 220304k cached
Hobbit has write permission to its home directory and in /tmp, and nowhere else.
list Daniel J McDonald
On Tue, 2006-12-05 at 18:09 +0200, Strandell, Ralf wrote:
> [...] What kind of filesystem access does Hobbit need? I have a bad habit of restricting access... Oh yes, I had chosen "paranoid" filesys security just because it was possible.
Ah, the old "msec changed my permissions to something completely bizarre and now it won't run" problem. I put hobbit clients in the adm and ntools groups, and change the permissions on /etc/mandriva-release and /var/log/messages so that the adm group can read them (in /etc/security/msec/perm.local).
Also, for the server, hobbitping has to be installed setuid (just like fping). The Hobbit server needs at least to be able to write to its subdirectories in /var/lib/hobbit/...
Look for a core file: if it crashed, there will be a core in /var/lib/hobbit/server/ or maybe a level deeper. And check disk space: use the `df` command.
--
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
Austin Energy
http://www.austinenergy.com
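The checks suggested above (setuid bit on the ping helper, writable data directories, free disk space, leftover core files) can be gathered into one small script. This is a hedged Python sketch, not a Hobbit tool: the function names, the "core" / "core.*" naming pattern and the free-space threshold are assumptions, and the paths have to be filled in for the local installation.

```python
import os
import shutil
import stat


def is_setuid_root(path):
    """True if the file is owned by root and has the setuid bit set."""
    st = os.stat(path)
    return st.st_uid == 0 and bool(st.st_mode & stat.S_ISUID)


def find_core_files(top):
    """Return paths under top whose name is 'core' or starts with 'core.'."""
    found = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name == "core" or name.startswith("core."):
                found.append(os.path.join(root, name))
    return sorted(found)


def check_environment(ping_binary, data_dirs, min_free_bytes):
    """Collect readable problem descriptions; an empty list means none found."""
    problems = []
    if not os.path.exists(ping_binary):
        problems.append(f"{ping_binary}: not found")
    elif not is_setuid_root(ping_binary):
        problems.append(f"{ping_binary}: not setuid root")
    for d in data_dirs:
        if not os.path.isdir(d):
            problems.append(f"{d}: missing")
            continue
        if not os.access(d, os.W_OK):
            problems.append(f"{d}: not writable by the current user")
        if shutil.disk_usage(d).free < min_free_bytes:
            problems.append(f"{d}: low on disk space")
        for core in find_core_files(d):
            problems.append(f"{core}: core file present")
    return problems
```

For the writability check to be meaningful it should be run as the hobbit user, for example against /var/lib/hobbit and /var/log/hobbit, with the installed path of hobbitping as the first argument.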
list Ralf Strandell
No, unfortunately not.
I have changed the permissions to "easy" in YaST (SUSE Linux), and the network test still performs one poll only.
Hobbitping is suid root. Fping is run with sudo, and sudo is suid root. Hobbitping works: user "hobbit" can use it to ping IP addresses. It fails to ping hostnames. The /home/hobbit and /var/log/hobbit directory trees are writable. The bbtest page does not update either; it really looks as if the network test does not run at all.
Is there a way to debug the network test to find out what is going on?
-----Original Message-----
From: Daniel J McDonald [mailto:user-290ce4e24e19@xymon.invalid]
Sent: Tuesday, December 05, 2006 7:28 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] All network test suddenly purple
list Greg L Hubbard
"It fails to ping hostnames."
I have seen the network test stall out if there is a DNS problem, even if the "testip" flag is set. If you turn on debug for the bb-network test (I forget the exact name, but you can look it up in the man pages) then you can see if the tests are getting stuck on a reverse lookup. Even though the Ares code is supposed to time out if a DNS server does not respond, this seems to be wishful thinking in the documentation. My system was a Sun Solaris system.
Don't know how many hosts you are monitoring, but if you were to add some to your hobbit server /etc/hosts file, you might be able to see if this is a DNS issue.
I guess you could check your resolver configuration (/etc/resolv.conf?) and whatever it is on your system that determines precedence for naming services (on Solaris it is /etc/nsswitch.conf, not sure about your brand of Linux).
A shot in the dark, but based on something that bedeviled me for a couple of days...
GLH
-----Original Message-----
From: Strandell, Ralf [mailto:user-c0435af0b547@xymon.invalid]
Sent: Thursday, December 07, 2006 9:13 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] All network test suddenly purple
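One way to test the DNS theory outside of Hobbit is to resolve every hostname listed in bb-hosts and time each lookup. The sketch below is a minimal Python illustration under stated assumptions: host lines in bb-hosts are taken to start with an IP address followed by the hostname, with optional tags after a "#"; the function names and the 2-second "slow" threshold are invented for this example. The resolver is passed in as a parameter so the logic can be exercised without touching the network.

```python
import ipaddress
import socket
import time


def parse_bb_hosts(text):
    """Return (ip, hostname) pairs; skip comments and non-host directives."""
    hosts = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # page/group/include lines and other directives
        hosts.append((fields[0], fields[1]))
    return hosts


def time_lookups(hosts, resolve=socket.gethostbyname, slow_after=2.0):
    """Resolve each hostname; report the result, failures and slow answers."""
    report = []
    for _ip, name in hosts:
        start = time.monotonic()
        try:
            status = "resolved to " + resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            status = f"failed ({exc})"
        elapsed = time.monotonic() - start
        if elapsed > slow_after:
            status += ", slow"
        report.append((name, status, round(elapsed, 2)))
    return report


sample = """\
# example bb-hosts content (hypothetical hosts)
page servers Servers
10.0.0.1 www.example.com # http://www.example.com/
10.0.0.2 db.example.com # noping
"""
print(parse_bb_hosts(sample))
```

Calling time_lookups(parse_bb_hosts(open(path).read())) as the hobbit user, with path pointing at the real bb-hosts file, would show which names fail or take seconds to resolve; lookups that hang for a long time point at the resolver configuration rather than at Hobbit.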
list Ralf Strandell
Thanks!
I have now set the --dns=ip option for bbtest-net and now it works like a dream.
Hobbit seems to report a "cannot run fping" condition as "no ICMP echo reply". That's a bit misleading, as the problem was not network related. Not a big problem though, now that I know it.
PS. Hobbit is a wonderful piece of code. Much, much better than BigBrother ever was.
-----Original Message-----
From: Hubbard, Greg L [mailto:user-d970b5e56ec9@xymon.invalid]
Sent: Thursday, December 07, 2006 5:35 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] All network test suddenly purple