Multiple hobbit (bbproxy and bb server) queries
list E-mail j.sansford
Hi guys,
You may remember my questions from last week. Thanks again for these. I have now implemented it; however, I have a few questions (and possibly bugs?). I will first start by describing the setup. To keep things simple I'll call each hobbit server IP either "A", "B" or "C" depending on the data centre.
Data centre 1:
bbproxy and bbserver (running on same box, A). bbproxy configured to send to B,C,A. bbserver configured to talk to A,B,C in hobbitserver.cfg.
Data centre 2:
bbproxy and bbserver (running on same box, B). bbproxy configured to send to C,A,B. bbserver configured to talk to B,C,A in hobbitserver.cfg.
Data centre 3:
bbproxy and bbserver (running on same box, C). bbproxy configured to send to A,B,C. bbserver configured to talk to C,A,B in hobbitserver.cfg.
Firstly, everything looks good. However, if I stop bbserver at A (but keep bbproxy running at A), then shortly afterwards bbproxy at A will start crashing. I've tried changing the order of --bbdisplays and it seems like the bbproxy will crash if the last bbdisplay IP has been shut down or is not available. Is this known, or is there a workaround?
To explain this better: let's say bbproxy at site B is configured as --bbdisplays=B,C,A. If I kill the xymon server at site A then this proxy will crash shortly afterwards. Note I'm on x86 Solaris.
My other question is this: currently, if a proxy crashes and the other 2 xymon servers do not receive updates, most tests continue to stay green. I'm sure I've seen a configuration option but I can't seem to find it. Can I configure these tests to go purple if they don't receive an update within the next 5-10 minutes? They've only just gone purple after 30 minutes, but we really need to know within 5 minutes if we haven't received a valid update.
Many thanks,
James.
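For readers following along, the layout for data centre 1 (box A) might be sketched roughly as below. This is only an illustration of the description above, not James's actual configuration: the 192.0.2.x addresses are placeholders, the BBDISP/BBDISPLAYS variables are assumed to be set the way hobbitserver.cfg normally sets them for multiple servers, and the option spelling is copied from the post (check bbproxy(8) on your version, which may spell it --bbdisplay).

```shell
#!/bin/bash
# Placeholder addresses (documentation range), standing in for servers A, B and C.
A=192.0.2.1
B=192.0.2.2
C=192.0.2.3

# hobbitserver.cfg style settings on box A (assumed): BBDISP=0.0.0.0 tells the
# tools to use the BBDISPLAYS list, here in the order A,B,C.
BBDISP="0.0.0.0"
BBDISPLAYS="$A $B $C"

# Target list for the bbproxy on box A, in the order B,C,A from the post.
PROXY_TARGETS="$B,$C,$A"

# Print the command line rather than starting a proxy; option name as written
# in the post, not verified against the man page.
echo "bbproxy --bbdisplays=$PROXY_TARGETS"
```

Boxes B and C would use the same pattern with the orders given in the post.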
list Ralph Mitchell
What version of hobbit/xymon are you running?? I used to have a problem
like that with 4.2. No bbproxy involved there, just several hobbit servers.
If one of them was down, the server/bin/bb command would hang trying to
talk to it. It should have either failed to make the connect or timed out,
but it didn't. Anyway...
You could set up a simple heartbeat script, a bit like this:
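#!/bin/bash
# Heartbeat status with a 2-minute lifetime; a status message needs a colour
# (green here) after the host.column name.
$BB $BBDISP "status+2 bbproxyname.panicnow green `date`
If this is purple, a bbproxy died."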
Set that up to launch every minute. The message has a lifetime of 2
minutes, so it'll go purple about 3 minutes after the bbproxy hangs up or
dies. You might want to pick a different column name. :)
Ralph Mitchell
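A slightly expanded sketch of the same heartbeat idea, for anyone who wants to try the message out before pointing it at a live server. The HB_HOST and HB_COLUMN names are hypothetical, and the script assumes the usual Hobbit environment provides $BB and $BBDISP; when they are not set it only prints the message it would have sent.

```shell
#!/bin/bash
# Hypothetical names: change the host and column to suit your setup.
HB_HOST="${HB_HOST:-bbproxyname}"
HB_COLUMN="${HB_COLUMN:-heartbeat}"

# Build a green status with a 2-minute lifetime ("status+2").
heartbeat_message() {
    printf 'status+2 %s.%s green %s\nIf this is purple, a bbproxy died.\n' \
        "$HB_HOST" "$HB_COLUMN" "$(date)"
}

# Send only when the Hobbit environment is loaded; otherwise do a dry run.
if [ -n "$BB" ] && [ -n "$BBDISP" ]; then
    "$BB" "$BBDISP" "$(heartbeat_message)"
else
    heartbeat_message
fi
```

Run once a minute (from cron or the Hobbit launcher, whichever you already use), this keeps the column green while the proxy is forwarding and lets it expire quickly when it is not.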
On Tue, Jul 7, 2009 at 8:40 AM, <user-c15424b7e83a@xymon.invalid> wrote:
list James
Hi Ralph,
I'm using 4.2.3. This is for a production area, so I'm wary of trying out betas at the moment; we need stability. I've configured bbproxy as a service on Solaris so it can bring itself back, but it will just crash again, so I think we might miss quite a few messages.
That links into my second point: I was trying to find a way of configuring things such as conn, cpu etc. to have a faster purple time and to alert for it. I've found the lifetime option, but as far as I can see this is for custom stuff. I'm not sure how to set it up for tests such as conn, ssh, cpu etc.
I'll try out your script tomorrow and have a play. As it currently stands, if a bbserver dies for whatever reason (and therefore a bbproxy), it can look for up to half an hour like many things are "green" when they aren't in fact being updated.
Cheers,
James
----- Original Message -----
From: Ralph Mitchell
To: user-ae9b8668bcde@xymon.invalid
Sent: Tuesday, July 07, 2009 6:31 PM
Subject: Re: [hobbit] Multiple hobbit (bbproxy and bb server) queries
list Ralph Mitchell
The conn and ssh tests are handled by bbnet on the xymon server. The cpu, disk, memory, etc. tests are run on each client system. Personally I wouldn't want to fool around with those. The point of the little test script was that, with a much shorter lifetime, its messages will show purple dots a lot quicker when the bbproxy dies. You'll also only get a maximum of three to start with. All the other purple dots will come along about 25 minutes later, if nobody fixes the bbproxy.
Ralph Mitchell
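To watch the purple dots arrive as described, one option is to ask a server for its current purple statuses. The sketch below assumes the hobbitdboard command of the bb client accepts a color= filter and a fields= list, as documented in bb(1) for Hobbit 4.2; check the man page for your version before relying on it. Without $BB and $BBDISP in the environment it only prints the query.

```shell
#!/bin/bash
# Query string for the status board: purple statuses only, with host, column
# and time of last change (field names assumed from the bb(1) man page).
list_purple_query() {
    printf 'hobbitdboard color=purple fields=hostname,testname,lastchange'
}

# Send only when the Hobbit environment is loaded; otherwise do a dry run.
if [ -n "$BB" ] && [ -n "$BBDISP" ]; then
    "$BB" "$BBDISP" "$(list_purple_query)"
else
    echo "would send: $(list_purple_query)"
fi
```

Shortly after a proxy dies, this should list only the heartbeat columns; the regular tests would be expected to join the list once their own lifetime runs out.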
On Tue, Jul 7, 2009 at 3:31 PM, James <user-c15424b7e83a@xymon.invalid> wrote: