Xymon Mailing List Archive

Multiple hobbit (bbproxy and bb server) queries

4 messages in this thread

list E-mail j.sansford · Tue, 7 Jul 2009 14:40:19 +0100 ·
Hi guys,

You may remember my questions from last week; thanks again for the answers. I have now implemented the setup, but I have a few questions (and possibly bugs?). I will start by describing the setup. To keep things simple, I'll refer to each hobbit server IP as "A", "B" or "C", depending on the data centre.

Data centre 1:
bbproxy and bbserver (running on same box, A). bbproxy configured to send to B,C,A. bbserver configured to talk to A,B,C in hobbitserver.cfg.

Data centre 2:
bbproxy and bbserver (running on same box, B). bbproxy configured to send to C,A,B. bbserver configured to talk to B,C,A in hobbitserver.cfg.

Data centre 3:
bbproxy and bbserver (running on same box, C). bbproxy configured to send to A,B,C. bbserver configured to talk to C,A,B in hobbitserver.cfg.

Firstly, everything looks good. However, if I stop bbserver at A (but keep bbproxy running at A), then shortly afterwards bbproxy at A starts crashing. I've tried changing the order of --bbdisplays, and it seems that bbproxy crashes whenever the last bbdisplay IP has been shut down or is unavailable. Is this a known issue, or is there a workaround?

To explain this better: let's say bbproxy at site B is configured as --bbdisplays=B,C,A. If I kill the xymon server at site A, this proxy crashes shortly afterwards. Note that I'm on x86 Solaris.
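
To confirm which of the configured display servers is actually unreachable when the proxy dies, something like the following could be run from the proxy box. It is only a sketch: the function names and example addresses are made up for illustration, 1984 is the default Hobbit/Xymon port, and the probe assumes a bash with /dev/tcp support plus a `timeout` command, neither of which may be present on every Solaris install.

```bash
#!/bin/bash
# Sketch only: report which display servers accept TCP connections.
# Function names and addresses are illustrative, not part of Hobbit.

# Split a comma-separated server list into one address per line.
split_displays() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# Return 0 if host ($1) accepts a TCP connection on port ($2) within 5 seconds.
# Assumes bash /dev/tcp redirection and a `timeout` command are available.
probe() {
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print "<host> up" or "<host> DOWN" for each server in the list ($1).
# The port ($2) defaults to 1984.
check_displays() {
    local host
    for host in $(split_displays "$1"); do
        if probe "$host" "${2:-1984}"; then
            echo "$host up"
        else
            echo "$host DOWN"
        fi
    done
}

# Example: check_displays "192.0.2.10,192.0.2.20,192.0.2.30"
```

Passing the list in the same order as the proxy's display option makes it easy to see whether the last entry is the one that is down.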


My other question is this: currently, if a proxy crashes and the other two xymon servers do not receive updates, most tests stay green. I'm sure I've seen a configuration option for this, but I can't find it. Can I configure these tests to go purple if they don't receive an update within 5-10 minutes? They have only just gone purple after 30 minutes, but we really need to know within 5 minutes if we haven't received a valid update.

Many thanks,
James.
list Ralph Mitchell · Tue, 7 Jul 2009 12:31:36 -0500 ·
What version of hobbit/xymon are you running?  I used to have a problem
like that with 4.2.  No bbproxy involved there, just several hobbit servers.
If one of them was down, the server/bin/bb command would hang trying to
talk to it.  It should have either failed to connect or timed out,
but it didn't.  Anyway...

You could set up a simple heartbeat script, a bit like this:

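     #!/bin/bash
     # Heartbeat: run inside the Hobbit environment so $BB and $BBDISP are set.
     # "status+2" gives the message a 2-minute lifetime; a colour is required.

     $BB $BBDISP "status+2 bbproxyname.panicnow green $(date)
        If this is purple, a bbproxy died."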

Set that up to launch every minute.  The message has a lifetime of 2
minutes, so it'll go purple about 3 minutes after the bbproxy hangs up or
dies.  You might want to pick a different column name.  :)
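
If each of the three proxies needs its own heartbeat, the script above could be parameterised along these lines. This is a sketch, not a tested drop-in: the function names, the example host name and the choice of arguments are assumptions for illustration, and it still relies on $BB and $BBDISP being set by the Hobbit environment.

```bash
#!/bin/bash
# Sketch only: parameterised heartbeat for one bbproxy.
# Function names and the example host are illustrative, not from the thread.

# Build a status message: "status+LIFETIME HOST.COLUMN COLOUR text".
# Dots in the host name are sent as commas, as the status protocol expects.
build_heartbeat() {
    local host="${1//./,}" column="$2" lifetime="$3"
    printf 'status+%s %s.%s green %s\nIf this is purple, a bbproxy died.\n' \
        "$lifetime" "$host" "$column" "$(date)"
}

# Send the heartbeat with the bb client; $BB and $BBDISP come from the
# Hobbit environment.
send_heartbeat() {
    "$BB" "$BBDISP" "$(build_heartbeat "$@")"
}

# Example: send_heartbeat proxy-a.example.com panicnow 2
```

Keeping the message builder separate from the sender makes it possible to check the exact text being sent without touching a live server.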

Ralph Mitchell

list James · Tue, 7 Jul 2009 21:31:37 +0100 ·
Hi Ralph,

I'm using 4.2.3. This is for a production area, so I'm wary of trying out betas at the moment; we need stability. I've configured bbproxy as a service on Solaris so it can bring itself back up, but it will just crash again, so I think we might miss quite a few messages. That links into my second point: I was trying to find a way of configuring tests such as conn, cpu etc. to have a faster purple time and to alert on it. I've found the lifetime option, but as far as I can see it is for custom tests; I'm not sure how to set it up for tests such as conn, ssh, cpu etc.

I'll try out your script tomorrow and have a play. As it currently stands, if a bbserver dies for whatever reason (and therefore a bbproxy), many things can look "green" for up to half an hour when they aren't in fact being updated.

Cheers
James
list Ralph Mitchell · Tue, 7 Jul 2009 18:27:15 -0500 ·
The conn and ssh tests are handled by bbnet on the xymon server.  The cpu,
disks, memory, etc. tests are run on each client system.  Personally I
wouldn't want to fool around with those.

The point of the little test script was that, with a much shorter lifetime,
those messages will show purple dots a lot quicker when the bbproxy dies.
You'll also only get a maximum of three to start with.  All the other
purple dots will come along 25 minutes later, if nobody fixes the bbproxy.

Ralph Mitchell

