Xymon Mailing List Archive

cores

12 messages in this thread

David Gore · Mon, 15 Aug 2005 22:41:42 +0000
It is possible the cores are caused by running two clients on the same host both redirecting to ~hobbitclient/tmp/msg.txt.  That would explain my occasional rash of purples followed by greens at least.  I am going to work around that.

If I remove most of the entries from bin/hobbitclient-osf1.sh, will the back-end get annoyed and engage in odd behavior? Or will it simply not send cpu, procs, disk, etc. to the web page?

-- 
David
Henrik Størner · Tue, 16 Aug 2005 07:31:05 +0200
On Mon, Aug 15, 2005 at 10:41:42PM +0000, David Gore wrote:
> It is possible the cores are caused by running two clients on the same
> host both redirecting to ~hobbitclient/tmp/msg.txt.  That would explain
> my occasional rash of purples followed by greens at least.  I am going
> to work around that.

I don't think so. It is your server modules that are crashing, so what
happens on the client shouldn't matter at all.

Unfortunately, the backtrace you've sent from the core files doesn't
reveal much about where the crash happens, which indicates that
something is trashing the stack and causing the crash.

To begin with, I'd like you to add "--debug" to the hobbitd command
line. This will cause a lot of output to go into your hobbitd.log file;
the interesting bits obviously are what happens when it crashes. I'd
like to see the full log file, though. I'll e-mail you details of where
you can upload it, since it's probably too large for e-mail.

> If I remove most of the entries from bin/hobbitclient-osf1.sh, will the
> back-end get annoyed and engage in odd behavior?  Or will it simply not
> send cpu, procs, disk, etc. to the web page?

It should simply stop sending the messages it doesn't have data for.


Regards,
Henrik
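
For reference, the hobbitd command line Henrik mentions is normally set in the [hobbitd] section of hobbitlaunch.cfg on the Hobbit server. This is a minimal sketch, assuming a standard Hobbit 4.x layout where $BBSERVERLOGS points at the server log directory. Only the --debug option is new; any other options already on your CMD line should stay as they are (they are left out here for brevity). Command-line options are only read when hobbitd starts, so restart Hobbit after the change.

```
[hobbitd]
	# Add --debug to the existing CMD line; keep its other options unchanged.
	CMD hobbitd --debug --log=$BBSERVERLOGS/hobbitd.log
```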
Wes Neal · Tue, 16 Aug 2005 13:41:07 -0400
Below is all I currently have in my hobbit-alerts.cfg file.  I first tested
this without any macros and got the e-mails fine, but now that I'm trying
to use macros it does not send the mail.


#    PAGER GROUPS
$nocsupp = user-337afe88c8b7@xymon.invalid,user-5944348b568b@xymon.invalid

#    HOST GROUPS
$testing = tardis,nocsunray04,weslap


HOST=$testing
        MAIL $nocsupp REPEAT=10 RECOVERED


Thanks
Wes
Peter Welter · Tue, 16 Aug 2005 21:01:37 +0200
Avoid using spaces.

> $nocsupp = user-337afe88c8b7@xymon.invalid,user-5944348b568b@xymon.invalid
$nocsupp=user-337afe88c8b7@xymon.invalid,user-5944348b568b@xymon.invalid

> $testing = tardis,nocsunray04,weslap
Ditto.

Peter
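
Applied to Wes's file, Peter's advice gives the following. This is a sketch that assumes the spaces around "=" in the macro definitions are the only problem; the HOST rule and its MAIL line are unchanged.

```
#    PAGER GROUPS
$nocsupp=user-337afe88c8b7@xymon.invalid,user-5944348b568b@xymon.invalid

#    HOST GROUPS
$testing=tardis,nocsunray04,weslap

HOST=$testing
        MAIL $nocsupp REPEAT=10 RECOVERED
```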


Humberto Cabrera · Mon, 23 Jan 2006 13:07:35 -0500
Hi everyone, in:

$CBDC_EMAIL=MAIL $EMAIL REPEAT=10M TIME=W:0000:2359,6:0000:2359,0:0000:0100,0:0430:2359
$CBDC_CELL=MAIL $CELL FORMAT=sms REPEAT=20M TIME=W:0000:2359,6:0000:2359,0:0000:0100,0:0430:2359

$CBTS1_EMAIL=MAIL $EMAIL REPEAT=10M TIME=W:0000:2359,6:0000:2359,0:0000:2000,0:2025:2359
$CBTS1_CELL=MAIL $CELL FORMAT=sms REPEAT=20M TIME=W:0000:2359,6:0000:2359,0:0000:2000,0:2025:2359

$CBWS_CELL=MAIL $CELL FORMAT=sms REPEAT=20M TIME=W:0000:0900,W:1730:2359,60:0000:2359

$RPPRO_EMAIL=MAIL $EMAIL REPEAT=10M TIME=*:0000:0305,*:0310:2130,*:2200:2359
$RPPRO_CELL=MAIL $CELL FORMAT=sms REPEAT=20M TIME=*:0000:0305,*:0310:2130,*:2200:2359

$WEB_CELL=MAIL $CELL FORMAT=sms REPEAT=20M TIME=*:0000:0200,*:0500:2359


Could someone tell me what the W stands for?  I'm assuming weekends?

Also, for example, in:

6:0000:2359,0:0000:0100,0:0430:2359

What do the 6: and the 0: stand for?  Any help would be greatly
appreciated.


Thanks!

Humberto Cabrera
Systems Administrator
Cosabella
Larry Barber · Mon, 23 Jan 2006 12:18:21 -0600
'W' stands for weekdays, not weekends. 0 and 6 are days of the week, Sunday
and Saturday, respectively.

Thanks,
Larry Barber
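
To make Larry's answer concrete, here are three of Humberto's definitions with the TIME windows spelled out in comments. Each entry is day:start:end with 24-hour HHMM times, and the alert is only sent inside the listed windows. Reading "60" as "Saturday and Sunday" (two day codes run together) is an assumption, worth confirming against the DOWNTIME section of the bb-hosts man page.

```
# W = weekdays (Monday-Friday), 6 = Saturday, 0 = Sunday, * = every day.
#
# Weekdays and Saturday all day; Sunday 00:00-01:00 and 04:30-23:59.
# Net effect: no alerts on Sunday between 01:00 and 04:30.
$CBDC_EMAIL=MAIL $EMAIL REPEAT=10M TIME=W:0000:2359,6:0000:2359,0:0000:0100,0:0430:2359
#
# Every day, except 03:05-03:10 and 21:30-22:00.
$RPPRO_EMAIL=MAIL $EMAIL REPEAT=10M TIME=*:0000:0305,*:0310:2130,*:2200:2359
#
# Weekdays before 09:00 and after 17:30; assuming "60" means Saturday and
# Sunday, the whole weekend as well (i.e. outside office hours only).
$CBWS_CELL=MAIL $CELL FORMAT=sms REPEAT=20M TIME=W:0000:0900,W:1730:2359,60:0000:2359
```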

Humberto Cabrera · Mon, 23 Jan 2006 13:21:39 -0500
Thank you very much.  By any chance, can you point me to the
documentation that covers these settings?  I've read the "setting up
alerts" doc on the Hobbit site, and nothing regarding those parameters
was mentioned.

Once again, thank you.
Larry Barber · Mon, 23 Jan 2006 13:19:35 -0600
The hobbit-alerts.cfg man page refers you to the bb-hosts man page for the
TIME specification, in particular the section in bb-hosts dealing with
DOWNTIME.

Thanks,
Larry Barber

Colin Coe · Fri, 8 Oct 2010 10:40:39 +0800
Hi all

The alerting is starting to take shape but I've a question regarding
how the alerting works.  If I have a stanza similar to the following,
how is it evaluated?  Once for all hosts, or for one host at a time?
---
HOST=%.*
        # Proliant tests
        MAIL user-4c524593359c@xymon.invalid SERVICE=proliant FORMAT=SMS REPEAT=1440m
        MAIL user-4c524593359c@xymon.invalid SERVICE=proliant FORMAT=SMS RECOVERED

        # conn where status is RED
        MAIL user-4c524593359c@xymon.invalid COLOR=red SERVICE=conn EXPAGE=dev REPEAT=1440m
        MAIL user-4c524593359c@xymon.invalid COLOR=red SERVICE=conn EXPAGE=dev RECOVERED

        # conn where status is RED (dev/test)
        MAIL user-65aef167d5bd@xymon.invalid COLOR=red SERVICE=conn PAGE=dev REPEAT=1440m
        MAIL user-65aef167d5bd@xymon.invalid COLOR=red SERVICE=conn PAGE=dev RECOVERED

        # cpu,disk,memory where status is RED
        MAIL user-4c524593359c@xymon.invalid COLOR=red SERVICE=cpu,disk,memory EXPAGE=dev REPEAT=1440m
        MAIL user-4c524593359c@xymon.invalid COLOR=red SERVICE=cpu,disk,memory EXPAGE=dev RECOVERED

        # Dev servers
        MAIL user-65aef167d5bd@xymon.invalid COLOR=red SERVICE=cpu,disk,memory PAGE=dev REPEAT=1440m
        MAIL user-65aef167d5bd@xymon.invalid COLOR=red SERVICE=cpu,disk,memory PAGE=dev RECOVERED

        # Non-dev status YELLOW
        MAIL user-65aef167d5bd@xymon.invalid COLOR=yellow SERVICE=cpu,disk,memory REPEAT=1440m DURATION>30m
        MAIL user-65aef167d5bd@xymon.invalid COLOR=yellow SERVICE=cpu,disk,memory RECOVERED
---

Also, I've noticed that when a fault occurs I get two e-mails (or SMS
messages), and another one when the fault is rectified.  I'm thinking this
is because of the 'RECOVERED' line, but I thought that would only trigger
when the fault clears.  Have I misunderstood?

Thanks

CC

-- 
RHCE#805007969328369
Vernon Everett · Fri, 8 Oct 2010 11:12:06 +0800
Hi Colin

One line per alert, with RECOVERED on the end.
Change it to something like this.
   MAIL user-4c524593359c@xymon.invalid SERVICE=proliant FORMAT=SMS REPEAT=1440m RECOVERED

Cheers
     Vernon
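
Applying Vernon's advice to every pair in Colin's stanza gives something like the following. It is a sketch that assumes the intent of each pair was "alert me, then tell me when it recovers"; the criteria are copied unchanged from Colin's lines, and only the separate RECOVERED lines are folded into the alert lines.

```
HOST=%.*
        # Proliant tests
        MAIL user-4c524593359c@xymon.invalid SERVICE=proliant FORMAT=SMS REPEAT=1440m RECOVERED
        # conn where status is RED
        MAIL user-4c524593359c@xymon.invalid COLOR=red SERVICE=conn EXPAGE=dev REPEAT=1440m RECOVERED
        # conn where status is RED (dev/test)
        MAIL user-65aef167d5bd@xymon.invalid COLOR=red SERVICE=conn PAGE=dev REPEAT=1440m RECOVERED
        # cpu,disk,memory where status is RED
        MAIL user-4c524593359c@xymon.invalid COLOR=red SERVICE=cpu,disk,memory EXPAGE=dev REPEAT=1440m RECOVERED
        # Dev servers
        MAIL user-65aef167d5bd@xymon.invalid COLOR=red SERVICE=cpu,disk,memory PAGE=dev REPEAT=1440m RECOVERED
        # Non-dev status YELLOW
        MAIL user-65aef167d5bd@xymon.invalid COLOR=yellow SERVICE=cpu,disk,memory REPEAT=1440m DURATION>30m RECOVERED
```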

Colin Coe · Fri, 8 Oct 2010 11:35:15 +0800
Cool.  Thanks Vernon.

-- 

RHCE#805007969328369
Henrik Størner · Fri, 8 Oct 2010 05:34:40 +0000 (UTC)
In <AANLkTi=user-dc67722108c1@xymon.invalid> Colin Coe <user-5b250cd7a540@xymon.invalid> writes:
> The alerting is starting to take shape but I've a question regarding
> how the alerting works.  If I have a stanza similar to the following,
> how is it evaluated?  Once for all hosts, or for one host at a time?

I understand your curiosity, but does it really matter how?
But it is evaluated whenever a potential alert may be generated,
based on the host/service combination, time of day and all the
other criteria. Think of it as a set of rules: each time
something is red or yellow, hobbitd_alert looks at this set of
rules and finds the actions that match (if any).

> HOST=%.*
>        # Proliant tests
>        MAIL user-4c524593359c@xymon.invalid SERVICE=proliant FORMAT=SMS REPEAT=1440m
>        MAIL user-4c524593359c@xymon.invalid SERVICE=proliant FORMAT=SMS RECOVERED
> Also, I've noticed that when a fault occurs I get two emails (or sms')
> and another when the fault is rectified.  I'm thinking this is because
> of the 'RECOVERED' line but I thought this would only trigger when the
> fault goes.  Have I misunderstood?

I think you have. Your configuration sets up two alerting actions, but
both of them send mail to the same recipient. That's why you get two
messages. What you want to do is simpler:

HOST=%.*
        # Proliant tests
        MAIL user-4c524593359c@xymon.invalid SERVICE=proliant FORMAT=SMS REPEAT=1440m RECOVERED

This will give you one message when the service goes red or yellow, and
one when it recovers. "RECOVERED" is an "add-on" to the normal alert,
since you probably would like to know not only when something is fixed,
but also when it broke.


Regards,
Henrik
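
To illustrate Henrik's "set of rules" description, here is how one hypothetical event would be checked against the combined rules. The host name devbox1, and the assumption that it sits on the "dev" page, are made up for the example; the comments describe which criteria match, not actual hobbitd_alert output.

```
# Hypothetical event: "conn" on host devbox1 goes red; devbox1 is on page "dev".
# HOST=%.* is a regular expression matching every host name, so the rule applies.
HOST=%.*
        # No match: SERVICE=proliant does not cover "conn".
        MAIL user-4c524593359c@xymon.invalid SERVICE=proliant FORMAT=SMS REPEAT=1440m RECOVERED
        # No match: EXPAGE=dev excludes hosts on the "dev" page.
        MAIL user-4c524593359c@xymon.invalid COLOR=red SERVICE=conn EXPAGE=dev REPEAT=1440m RECOVERED
        # Match: red, "conn", and the host is on page "dev", so this recipient is alerted.
        MAIL user-65aef167d5bd@xymon.invalid COLOR=red SERVICE=conn PAGE=dev REPEAT=1440m RECOVERED
        # No match: SERVICE=cpu,disk,memory does not cover "conn".
        MAIL user-65aef167d5bd@xymon.invalid COLOR=red SERVICE=cpu,disk,memory PAGE=dev REPEAT=1440m RECOVERED
# Result: only user-65aef167d5bd@xymon.invalid gets a message for this event.
```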