hobbit-nkview.cfg question

23 messages in this thread

list Charles Jones · Mon, 18 Sep 2006 12:39:52 -0700 ·

I want to add ALL services for all the hosts in my prod layer, but doing this via the GUI seems like it will take a long time. I notice the format of hobbit-nkview.cfg seems pretty simple:

prod-app-19|disk||||2|Systems|Critical processes run from /apps partition|cjones 2006-09-17 15:39:16

I'm wondering if I can use wildcards for the service, to quickly and easily have all services show up in the Critical Systems view if they turn red.  Something like:

prod-app-19|*||||2|Systems|Contact Systems Group for Resolution and Updates|cjones 2006-09-17 15:39:16

Is this possible?

-Charles
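
The thread does not say whether a `*` wildcard is accepted in the service field. One workaround is to generate one line per host/service pair. The following is a minimal sketch that only copies the shape of the example line above; the function names and the host and service lists you would pass in are hypothetical, and the three empty fields are left empty because their meaning is not stated in the thread.

```python
from datetime import datetime


def nkview_line(host, service, priority, group, note, editor, when):
    """Format one entry in the same shape as the example line in the post."""
    stamp = when.strftime("%Y-%m-%d %H:%M:%S")
    # Fields 3-5 are empty in the example; their purpose is not given in the
    # thread, so this sketch leaves them empty as well.
    fields = [host, service, "", "", "", str(priority), group, note,
              f"{editor} {stamp}"]
    for field in fields:
        if "|" in field or "\n" in field:
            raise ValueError(f"field would break the '|' layout: {field!r}")
    return "|".join(fields)


def expand_services(hosts, services, priority, group, note, editor, when):
    """Return one line for every (host, service) combination."""
    return [
        nkview_line(h, s, priority, group, note, editor, when)
        for h in hosts
        for s in services
    ]
```

Calling `expand_services(["prod-app-19"], ["disk", "cpu"], 2, "Systems", "Contact Systems Group for Resolution and Updates", "cjones", datetime.now())` would yield two lines that can be appended to a copy of the file for review before anything goes live.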

list Stewart Larsen · Tue, 23 Oct 2007 12:42:24 -0400 ·

So, how are others doing this?  I have a server set up here in my
primary data center.  We're monitoring a few thousand hosts right now
with a large number of custom externals. 

I've been tasked with setting up a fail-over or disaster response server
in case our primary data center has issues.  All of our clients are
currently configured to send their messages to the IP address of our
primary server. 

Now, I could just copy the bb-hosts file to the DR site, but then I
would only get the network tests since the clients all report to the
primary.  

Would I use bbproxy to do this?  But if I install bbproxy on the
primary, I won't be proxying the messages if my primary goes down... :(

--
Stewart Larsen

list Stewart L · Tue, 23 Oct 2007 14:18:16 -0400 ·

So, how are others doing this?  I have a server set up here in my
primary data center.  We're monitoring a few thousand hosts right now
with a large number of custom externals.

I've been tasked with setting up a fail-over or disaster response
server in case our primary data center has issues.  All of our clients
are currently configured to send their messages to the IP address of
our primary server.

Now, I could just copy the bb-hosts file to the DR site, but then I
would only get the network tests since the clients all report to the
primary.

Would I use bbproxy to do this?  But if I install bbproxy on the
primary, I won't be proxying the messages if my primary goes down...
:(

--
Stewart Larsen

list Henrik Størner · Tue, 23 Oct 2007 22:02:34 +0200 ·

On Tue, Oct 23, 2007 at 02:18:16PM -0400, Stewart L wrote:

So, how are others doing this?  I have a server set up here in my
primary data center.  We're monitoring a few thousand hosts right now
with a large number of custom externals.

I've been tasked with setting up a fail-over or disaster response
server in case our primary data center has issues.  All of our clients
are currently configured to send their messages to the IP address of
our primary server.

Now, I could just copy the bb-hosts file to the DR site, but then I
would only get the network tests since the clients all report to the
primary.

I run two completely separate systems in parallel, and have the clients
report to both of them. The system at our disaster center has the paging
module disabled (just disable the [bbpage] section in hobbitlaunch.cfg),
to avoid double alerts - it is simple to activate it, if necessary.

Config files are rsync'ed from the primary site to the disaster site
regularly.


Regards,
Henrik
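
The regular config copy described above could be scripted along these lines. This is only a sketch under stated assumptions: the directory, the DR host name and the function names are hypothetical, and excluding hobbitlaunch.cfg is an assumption made here so that a synced copy does not re-enable the [bbpage] section that was disabled on the disaster-site server.

```python
import subprocess


def build_rsync_command(src_dir, dr_host, dest_dir,
                        excludes=("hobbitlaunch.cfg",)):
    """Build an rsync command line that copies config files to the DR host."""
    cmd = ["rsync", "-az"]  # -a: archive mode, -z: compress in transit
    # Skip files that must differ between the two sites.
    cmd += [f"--exclude={name}" for name in excludes]
    # A trailing slash on the source copies the directory contents.
    cmd += [src_dir.rstrip("/") + "/", f"{dr_host}:{dest_dir}"]
    return cmd


def sync_configs(src_dir, dr_host, dest_dir, dry_run=True):
    """Run the copy; with dry_run=True rsync only reports what it would do."""
    cmd = build_rsync_command(src_dir, dr_host, dest_dir)
    if dry_run:
        cmd.insert(1, "--dry-run")
    return subprocess.run(cmd, check=True)
```

Something like `sync_configs("/usr/local/hobbit/server/etc", "dr-hobbit.example.com", "/usr/local/hobbit/server/etc/")` could then be run from cron on the primary; both the path and the host name are placeholders.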

list Rolf Schrittenlocher · Wed, 24 Oct 2007 07:47:03 +0200 ·

Hi,
you might use a virtual IP for the Hobbit server, which could be reassigned to
another machine if necessary. If both servers use a common RAID,
data could be stored on the RAID so even the history would be there. This is
what we do.

Rolf

-- 
Kind regards
Rolf Schrittenlocher

HRZ/BDV, Senckenberganlage 31, 60054 Frankfurt
Tel: (XX) XX - XXX XXXXX   Fax: (XX) XX XXX XXXXX
LBS: user-1e39a1813094@xymon.invalid
Personal: user-6ea8e907e200@xymon.invalid

list T.J. Yang · Wed, 24 Oct 2007 05:24:54 -0500 ·

> Date: Tue, 23 Oct 2007 22:02:34 +0200
> From: user-ce4a2c883f75@xymon.invalid
> To: user-ae9b8668bcde@xymon.invalid
> Subject: Re: [hobbit] Fail over?
>
> On Tue, Oct 23, 2007 at 02:18:16PM -0400, Stewart L wrote:
> > So, how are others doing this? I have a server set up here in my
> > primary data center. We're monitoring a few thousand hosts right now
> > with a large number of custom externals.
> >
> > I've been tasked with setting up a fail-over or disaster response
> > server in case our primary data center has issues. All of our clients
> > are currently configured to send their messages to the IP address of
> > our primary server.
> >
> > Now, I could just copy the bb-hosts file to the DR site, but then I
> > would only get the network tests since the clients all report to the
> > primary.
>
> I run two completely separate systems in parallel, and have the clients
> report to both of them. The system at our disaster center has the paging
> module disabled (just disable the [bbpage] section in hobbitlaunch.cfg),
> to avoid double alerts - it is simple to activate it, if necessary.

 
I was thinking of using Sun Cluster (hb on Solaris) or HeartBeat (hb on Linux), but
then how can I configure the cluster solution to fail over from one site (Florida) to another (New York)?

I believe this setup is the simplest failover solution, at the sole expense of extra
network bandwidth usage to the secondary hb server.
 
tj

> Config files are rsync'ed from the primary site to the disaster site
> regularly.
>
> Regards,
> Henrik


list Josh Luthman · Wed, 24 Oct 2007 10:15:04 -0400 ·

I believe you could use something like a proxy (Squid maybe?) for clients to
connect to and then use one or the other.  I'm not familiar at all with
squid itself so I may be completely off, but a load balancer does sound like
an option.

-- 
Josh Luthman
Office: XXX-XXX-XXXX
Direct: XXX-XXX-XXXX
XXXX Wayne St
Suite XXXX
Troy, OH XXXXX

Those who don't understand UNIX are condemned to reinvent it, poorly.
--- Henry Spencer

list Sean R. Clark · Wed, 24 Oct 2007 11:05:16 -0400 ·

 
running snapshot 4.3
 
crashes twice a day, every day, @line 112 in loadhosts file
 
( sprintf(newp->pagepath, "%s/%s", curtoppage->pagepath, name);  )  is the
line in question
 
 
this is running on Solaris 10 x86, compiled with 
 
./configure.server --rrdinclude /sw/include --rrdlib /sw/lib \
    --pcreinclude /sw/include --pcrelib /sw/lib \
    --sslinclude /sw/include/openssl --ssllib /sw/ssl/lib
 
don't see why it's crashing at all, any ideas?
 
 
Reading hobbitd_channel
core file header read successfully
Reading ld.so.1
Reading libresolv.so.2
Reading libsocket.so.1
Reading libnsl.so.1
Reading libc.so.1
program terminated by signal ABRT (Abort)
0xfee60717: __lwp_kill+0x0007:  jae      __lwp_kill+0x15        [ 0xfee60725, .+0xe ]
Current function is bbh_item
  466                     p = strrchr(host->page->pagetitle, '/');
(dbx) where

  [1] __lwp_kill(0x1, 0x6), at 0xfee60717 
  [2] _thr_kill(0x1, 0x6), at 0xfee5ded4 
  [3] raise(0x6), at 0xfee0ced3 
  [4] abort(0x8071c20, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
  [5] 0x80581fe(0xb, 0x0, 0x80467f0), at 0x80581fe 
  [6] __sighndlr(0xb, 0x0, 0x80467f0, 0x80581d0), at 0xfee5fadf 
  [7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3 
  [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253 
  ---- called from signal handler with signal 11 (SIGSEGV) ------
=>[9] bbh_item(hostin = 0x80739a8, item = BBH_NET), line 466 in "loadhosts.c"
  [10] load_hostnames(bbhostsfn = (nil), extrainclude = 0x8046ddc "hobbitd_channel", fqdn = 134508012), line 112 in "loadhosts_file.c"

list Josh Luthman · Wed, 24 Oct 2007 11:11:35 -0400 ·

As you're on Solaris, could you do a dtrace on it?

list Paul Ehrenreich · Wed, 24 Oct 2007 12:23:20 -0400 ·

Using squid to load balance between the two servers sounds like an
interesting idea. I need to do something similar in our lab and have been
trying to figure out the best way to do it.

list Stewart L · Wed, 24 Oct 2007 13:10:25 -0400 ·

Will the old BB client report to two servers? Or do I need to upgrade
to the hobbit client?

list T.J. Yang · Wed, 24 Oct 2007 12:44:13 -0500 ·

Isn't a proxy another SPOF (single point of failure)?

T.J. Yang

list Stewart L · Wed, 24 Oct 2007 13:58:34 -0400 ·

Yes, it is.

I've spoken with our infrastructure and support team, and they will be
re-configuring the client on all of their machines to point to both
servers.  We already have the DR server up and running; I just need to
script the daily copy of the config files.

Thanks for all the input, folks!

Stewart

list Hobbit User · Wed, 24 Oct 2007 13:59:14 -0400 (EDT) ·

On Wed, October 24, 2007 13:44, T.J. Yang wrote:

Isn't a proxy another SPOF (single point of failure)?

Well, yup, a single proxy would be.  And there are all kinds of options
for putting up multiple proxies and setting up high availability/load
balancing amongst _those_.  This whole thing has kind of transmogrified
from its beginnings as a discussion of replication and (possibly manual)
failover of a Hobbit server.

Not that there's anything wrong with that.

list Ralph Mitchell · Wed, 24 Oct 2007 13:02:30 -0500 ·

On 10/24/07, T.J. Yang <user-8e841282cda5@xymon.invalid> wrote:

Isn't a proxy another SPOF (single point of failure)?

Yes, but a proxy doesn't have to be as complicated as a whole Hobbit
server.  It wouldn't even necessarily have to have disk drives - in a
crunch you could probably run a Linux distro off a LiveCD or USB
stick, with Squid or Tinyproxy built in.

Ralph Mitchell

list Josh Luthman · Wed, 24 Oct 2007 14:09:11 -0400 ·

I think the popular method of doing this around here is to use a USB thumb
drive and a VM.  I'd suggest making a duplicate of this.  I'm sure you'll
have a spare PC to plug it into already, but I wanted to point it out.
Having two of everything on cold swap will definitely help you keep things
running.

Josh

list Tom L. Stewart · Wed, 24 Oct 2007 15:01:13 -0500 ·

I am just setting this up, and I am wondering if I can vi/sed the
configuration file hobbit-nkview.cfg. I noticed that the file is
ordered, so should I also sort the file?

I have to add a lot of systems, and the host clone still takes some time
even if I add multiple hosts at the same time.

Thank you,
Tom

list Gary Baluha · Wed, 24 Oct 2007 16:38:39 -0400 ·

I have manually edited the hobbit-nkview.cfg file without problems.  Be
aware, however, that it is more format-specific than the other Hobbit
configuration files, and if you make a typo or similar mistake, it will
probably break some functionality of the critical systems feature.  I have
been careful enough not to make any errors, so I'm not sure exactly what
could break.  But as long as you are careful, it's fine.
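
Since a typo can break the critical systems view, a quick sanity check before installing a hand-edited file may help. The sketch below assumes that every entry has nine `|`-separated fields with non-empty host and service, which is inferred only from the single example line posted earlier in this thread; the real format may allow more than that, and the function name is made up for illustration.

```python
EXPECTED_FIELDS = 9  # assumption: counted from the one example line in the thread


def check_nkview_lines(lines):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("|")
        if len(fields) != EXPECTED_FIELDS:
            problems.append(
                (lineno,
                 f"expected {EXPECTED_FIELDS} fields, found {len(fields)}"))
        elif not fields[0] or not fields[1]:
            problems.append((lineno, "empty host or service field"))
    return problems
```

Running it as `check_nkview_lines(open("hobbit-nkview.cfg"))` on the edited copy returns an empty list when every line matches the assumed layout.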

list Josh Luthman · Wed, 24 Oct 2007 16:49:39 -0400 ·

For something critical like this, I would work on a secondary system that
duplicates the primary.  Anything will work really - if you can head to a
flea market or something, even a 4x86 could work =)

list Henrik Størner · Thu, 25 Oct 2007 10:16:24 +0200 ·

On Wed, Oct 24, 2007 at 11:05:16AM -0400, Sean R. Clark wrote:

  [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253 
  ---- called from signal handler with signal 11 (SIGSEGV) ------
=>[9] bbh_item(hostin = 0x80739a8, item = BBH_NET), line 466 in "loadhosts.c"
  [10] load_hostnames(bbhostsfn = (nil), extrainclude = 0x8046ddc "hobbitd_channel", fqdn = 134508012), line 112 in "loadhosts_file.c"

This trace doesn't make sense - the "bbh_item()" function isn't called
from the "load_hostnames()" function. So I think there's some corruption
of the stack involved.

Either that, or the binary you're running doesn't match the source code
you have (i.e. your source files were not used to compile the binary that
is running).

If you load the binary and core into gdb as you did to get the stack
trace, could you then do this:
   gdb> fr 10
This should print out that you're now at stackframe #10, which is the
"load_hostnames" routine.
   gdb> p *inbuf
   gdb> p name
   gdb> p title
These print out the value of a number of variables.
   gdb> fr 9
   gdb> p *hostin


Regards,
Henrik
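
When the crash recurs daily, the gdb steps listed above can be replayed non-interactively against each new core file. This is a rough sketch: the function name and the binary and core paths are placeholders, and it relies on gdb's `-batch` and `-ex` options to run the same commands in order.

```python
import subprocess

# The commands from the message above, in the order given.
GDB_COMMANDS = ("fr 10", "p *inbuf", "p name", "p title", "fr 9", "p *hostin")


def build_gdb_command(binary, core, commands=GDB_COMMANDS):
    """Build a gdb command line that runs the commands and then exits."""
    cmd = ["gdb", "-batch"]
    for command in commands:
        cmd += ["-ex", command]  # each -ex runs one gdb command
    cmd += [binary, core]
    return cmd


def dump_frames(binary, core):
    """Run gdb and return its combined output as text."""
    result = subprocess.run(build_gdb_command(binary, core),
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True)
    return result.stdout
```

For example, `print(dump_frames("hobbitd_channel", "core"))` would print the frame and variable dumps for one core file, assuming both paths are adjusted to the local install.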

list Sean R. Clark · Thu, 25 Oct 2007 11:42:19 -0400 ·

Ahh you are correct, my binary + source did not match

Here is the stack trace from the (correct) binary (it's still crashing)

All of them show:

program terminated by signal ABRT (Abort)
0xfee60717: __lwp_kill+0x0007:  jae      __lwp_kill+0x15        [ 0xfee60725, .+0xe ]
Current function is sigsegv_handler
   58           abort();
(dbx) where
  [1] __lwp_kill(0x1, 0x6), at 0xfee60717
  [2] _thr_kill(0x1, 0x6), at 0xfee5ded4
  [3] raise(0x6), at 0xfee0ced3
  [4] abort(0x8071c20, 0x0, 0x8046758, 0xfee4dd4f, 0x8046758, 0xfee4dd4f), at 0xfedf0969
=>[5] sigsegv_handler(signum = 11), line 58 in "sig.c"
  [6] __sighndlr(0xb, 0x0, 0x80467f0, 0x80581d0), at 0xfee5fadf
  [7] call_user_handler(0xb, 0x0, 0x80467f0), at 0xfee560d3
  [8] sigacthandler(0xb, 0x0, 0x80467f0, 0xf, 0x0, 0x0), at 0xfee56253
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [9] main(argc = 4, argv = 0x8046b28), line 678 in "hobbitd_channel.c"


From this:

        /*
         * Try to fork a child to send in an alarm message.
         * If the fork fails, then just attempt to exec() the BB command
         */


Do you have any commands I can run in gdb or dbx to help further?

The name and inbuf variables are not defined when I try it with the
correct binary + core.

list Henrik Størner · Thu, 25 Oct 2007 22:42:28 +0200 ·

On Thu, Oct 25, 2007 at 11:42:19AM -0400, Sean R. Clark wrote:

Ahh you are correct, my binary + source did not match

Here is the stack trace from the (correct) binary (it's still crashing)
  ---- called from signal handler with signal 11 (SIGSEGV) ------
  [9] main(argc = 4, argv = 0x8046b28), line 678 in "hobbitd_channel.c"

Thanks, the line number isn't quite right, but I think this patch should
fix it. However, it should only happen if the worker process
(hobbitd_alert, hobbitd_rrd, hobbitd_history) cannot keep up with the
flow of incoming messages, so there might be a different problem with 
your setup that triggers this. That would also explain why you see it
regularly, and others do not.

Anyway, let me know if this patch stops it from crashing.


Regards,
Henrik

Attachments (1)

attachment.patch application/octet-stream · 1.2 KB

list Sean R. Clark · Sun, 28 Oct 2007 18:32:07 -0400 ·

Just to follow up

Since applying the patch it's been stable

Thanks for the patch


-Sean
