Xymon Mailing List Archive

Excluding Hosts from Paging at certain times

7 messages in this thread

list Allan Spencer · Fri, 11 Mar 2005 11:35:40 +1100 ·
Hi there,

Not sure if this is the right place to be sending this, as I've only just
subscribed and the emails don't quite say exactly where to email, but here
goes.

We have just upgraded from bb after 5 years or so to hobbit and we're not
looking back. We have some servers that have to be rebooted on a daily basis
due to some of the poorly written software (not by us) that runs on them. I
have tried using the DOWNTIME= option in bb-hosts, but it only seems to
ignore the net tests like http, conn etc. The server still has the bb client
installed on it, so when the server comes back up the bb client starts before
some of the monitored services start and therefore marks them red, and we get
paged on them.

Given our worldwide customer base we still want to know about the issues
24x7, except for the 10 mins or so when it reboots. BB used to have the
option of putting in something like !host:time:email and it wouldn't page at
that particular time for that host. Just wondering if someone can suggest
the best way to work around this.

We have one alert that goes out via email to a sort of mailing list,
and then another line that does email-to-pager. I was thinking something
along the lines of what I have below, although I am not sure about the TIME
statement: whether you can have comma-separated values, and whether you can
wrap past midnight, such as TIME=*:0910:0900

There are only 4 hosts currently that reboot: one at midnight, two at 9am,
one at 8:30am

If someone could possibly give me a hand and point me in the right
direction it would be greatly appreciated

Cheers

Allan

#extract of hobbit-alerts.cfg

$NPSERV=cpu,disk,msgs

HOST=host1.domain.com
    MAIL user-4801ab49df04@xymon.invalid COLOR=red,purple REPEAT=15 RECOVERED TIME=*:0010:0000
    MAIL user-dfbeab1964ab@xymon.invalid COLOR=red,purple REPEAT=15 FORMAT=SMS EXSERVICE=$NPSERV TIME=*:0010:0000 STOP

HOST=host2.domain.com
    MAIL user-4801ab49df04@xymon.invalid COLOR=red,purple REPEAT=15 RECOVERED TIME=*:0000:0900,*:0910:0000
    MAIL user-dfbeab1964ab@xymon.invalid COLOR=red,purple REPEAT=15 FORMAT=SMS EXSERVICE=$NPSERV TIME=*:0000:0900,*:0910:0000 STOP

HOST=*
    MAIL user-4801ab49df04@xymon.invalid REPEAT=15 COLOR=red,purple RECOVERED
    MAIL user-dfbeab1964ab@xymon.invalid FORMAT=SMS REPEAT=15 COLOR=red,purple EXSERVICE=$NPSERV

list Allan Spencer · Fri, 11 Mar 2005 11:40:00 +1100 ·
Just something else I forgot to mention: we want to send everything to email
(msgs, disk, http, conn etc.), but not send disk, cpu or msgs alerts to the
pager email address.

list Kevin Grady · Thu, 10 Mar 2005 19:57:25 -0500 ·
If you need more info let me know.

From the help file:

The first line defines a rule for alerting when something breaks on
the host "www.foo.com".
There are two recipients: user-b7c20e0da76a@xymon.invalid is notified if it is the
"http" service that fails, and the notification is repeated once an
hour until the problem is resolved.
user-09a89c618369@xymon.invalid is notified if it is the "cpu", "disk" or "memory"
tests that report a failure. Since there is no "REPEAT" setting for
this recipient, the default is used which is to repeat the alert every
30 minutes.

OK, suppose now that the webmaster complains about getting e-mails at
4 AM in the morning. The webserver is not supposed to be running
between 9 PM and 8 AM, so even though there is a problem, he doesn't
want to hear about it until 7:30 - that gives him just enough time to
fix the problem. So you must modify the rule so that it doesn't send
out alerts until 7:30 AM:

	HOST=www.foo.com
		MAIL user-b7c20e0da76a@xymon.invalid SERVICE=http REPEAT=1h TIME=*:0730:2100
		MAIL user-09a89c618369@xymon.invalid SERVICE=cpu,disk,memory

Adding the TIME setting on the recipient causes the alerts for this
recipient to be suppressed, unless the time of day is within the
interval. So with this setup, the webmaster gets his sleep.

What would have happened if you put the TIME setting on the rule
instead of on the recipient ? Like this:

	HOST=www.foo.com TIME=*:0730:2100
		MAIL user-b7c20e0da76a@xymon.invalid SERVICE=http REPEAT=1h
		MAIL user-09a89c618369@xymon.invalid SERVICE=cpu,disk,memory

Well, the webmaster would still have his nights to himself - but the
TIME setting would then also apply to the alerts that go out when
there is a problem with the "cpu", "disk" or "memory" services. So
there would not be any mails going to user-09a89c618369@xymon.invalid when a disk
fills up during the night.
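
To make the rule-versus-recipient distinction concrete, here is a minimal Python sketch of the behaviour the help file describes. It is only a model for illustration, not Hobbit's code: the `Rule` and `Recipient` classes, the `who_gets_alerted` function and the short recipient names `webmaster` and `sysadmin` are all hypothetical, and time windows are simplified to same-day (start, end) pairs of HHMM integers.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Window = Tuple[int, int]  # (start, end) as HHMM integers; same-day windows only


def inside(window: Optional[Window], hhmm: int) -> bool:
    """No window means no time restriction."""
    if window is None:
        return True
    start, end = window
    return start <= hhmm < end


@dataclass
class Recipient:
    address: str              # hypothetical short name standing in for an email address
    services: Set[str]
    time: Optional[Window] = None


@dataclass
class Rule:
    host: str
    recipients: List[Recipient]
    time: Optional[Window] = None


def who_gets_alerted(rule: Rule, host: str, service: str, hhmm: int) -> List[str]:
    # A rule-level TIME gates every recipient under the rule.
    if host != rule.host or not inside(rule.time, hhmm):
        return []
    # A recipient-level TIME gates only that one recipient.
    return [r.address for r in rule.recipients
            if service in r.services and inside(r.time, hhmm)]


def make_rule(on_rule: bool) -> Rule:
    """Build the www.foo.com example with TIME on the rule or on the webmaster."""
    daytime = (730, 2100)
    return Rule(
        host="www.foo.com",
        time=daytime if on_rule else None,
        recipients=[
            Recipient("webmaster", {"http"}, None if on_rule else daytime),
            Recipient("sysadmin", {"cpu", "disk", "memory"}),
        ],
    )


# A disk alert at 04:00 with TIME on the recipient, then with TIME on the rule.
print(who_gets_alerted(make_rule(on_rule=False), "www.foo.com", "disk", 400))  # ['sysadmin']
print(who_gets_alerted(make_rule(on_rule=True), "www.foo.com", "disk", 400))   # []
```

With TIME on the recipient, the night-time disk alert still reaches the second recipient; with TIME on the rule, nobody is notified.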

list Allan Spencer · Fri, 11 Mar 2005 12:37:39 +1100 ·
I've already read all of the man pages online, but I still couldn't be 100%
sure about what can and can't be specified in terms of times.

Also, I needed clarification on what exactly the DOWNTIME parameter in
bb-hosts relates to.

list Robert Edeker · Thu, 10 Mar 2005 20:54:44 -0500 ·
On Fri, 11 Mar 2005 12:37:39 +1100, ZanDAhaR <user-42a3456c44ef@xymon.invalid> wrote:
> I've already read all of the man pages online, but I still couldn't be 100%
> sure about what can and can't be specified in terms of times.
>
> Also, I needed clarification on what exactly the DOWNTIME parameter in
> bb-hosts relates to.

The DOWNTIME issue is a bug. I just ran into it (msg posted on the
6th) where we reboot every Sunday, but it was only ignoring
network-based tests. (I have to assume it'll be fixed in the next RC or
sometime before final.)

-r
list Allan Spencer · Fri, 11 Mar 2005 14:56:00 +1100 ·
Ahh excellent, as long as I know I'm not being stupid and doing something
wrong :)

Also, I only just subscribed and could not find any sort of online version
of the mailing list to search (is there one? If so, where?), so I would not
have seen your earlier post.

Cheers
Allan

list Henrik Størner · Fri, 11 Mar 2005 07:43:10 +0100 ·
On Fri, Mar 11, 2005 at 11:35:40AM +1100, ZanDAhaR wrote:
> We have just upgraded from bb after 5 years or so to hobbit and we're not
> looking back. We have some servers that have to be rebooted on a daily basis
> due to some of the poorly written software (not by us) that runs on them. I
> have tried using the DOWNTIME= option in bb-hosts, but it only seems to
> ignore the net tests like http, conn etc. The server still has the bb client
> installed on it, so when the server comes back up the bb client starts before
> some of the monitored services start and therefore marks them red, and we get
> paged on them.

That doesn't sound right - the DOWNTIME setting should trigger on any
test status that goes into Hobbit, whether it is a client-side or
network-test.

I'll set up something similar here and see if I can reproduce this.

> We have one alert that goes out via email to a sort of mailing list,
> and then another line that does email-to-pager. I was thinking something
> along the lines of what I have below, although I am not sure about the TIME
> statement: whether you can have comma-separated values, and whether you can
> wrap past midnight, such as TIME=*:0910:0900

"Yes" to both questions. You can definitely have a list of
comma-separated settings, and wrapping around midnight also works. You
can try running "hobbitd_alert --test HOSTNAME TESTNAME" and see how
it decides which alerts to send out.


Your configuration looks OK to me.
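
As a rough illustration of the two points above (comma-separated windows and wrapping past midnight), here is a small Python sketch of how such a TIME setting could be evaluated. It is not Hobbit's actual implementation: the function names `in_window` and `time_matches` are made up for this example, only the `*` (every day) form of the day field is handled, and treating the end time as exclusive is an assumption.

```python
from datetime import datetime


def in_window(window: str, now: datetime) -> bool:
    """Check one DAYS:HHMM:HHMM window; only the '*' (every day) form is handled."""
    days, start, end = window.split(":")
    if days != "*":
        raise ValueError("this sketch only supports '*' in the day field")
    current = now.hour * 100 + now.minute
    first, last = int(start), int(end)
    if first <= last:
        # Ordinary window within one day (end treated as exclusive: an assumption).
        return first <= current < last
    # Start is later than end, so the window wraps past midnight.
    return current >= first or current < last


def time_matches(setting: str, now: datetime) -> bool:
    """A comma-separated setting matches if any one of its windows matches."""
    return any(in_window(w, now) for w in setting.split(","))


host2 = "*:0000:0900,*:0910:0000"
print(time_matches(host2, datetime(2005, 3, 11, 9, 5)))    # False: inside the reboot gap
print(time_matches(host2, datetime(2005, 3, 11, 23, 30)))  # True: covered by the wrapped window
```

Under these assumptions the host2 setting from the original post suppresses alerts only between 09:00 and 09:10; the "hobbitd_alert --test" run mentioned above remains the way to check what Hobbit itself decides.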


Regards,
Henrik