Excluding Hosts from Paging at certain times
list Allan Spencer
Hi there,
Not sure if this is the right place to be sending this, as I've only just
subscribed and the emails don't quite say exactly where to email, but here
goes.
We have just upgraded from bb to hobbit after 5 years or so and we're not
looking back. We have some servers that have to be rebooted on a daily basis
due to some of the poorly written software (not by us) that runs on them. I
have tried using the DOWNTIME= option in bb-hosts but it only seems to
ignore the net tests like http, conn etc. The server still has the bb client
installed on it, so when the server comes back up the bb client starts before
some of the monitored services start and therefore marks them red, and we get
paged on them.
Given our worldwide customer base we still want to know about issues
24x7, except for the 10 mins or so when a server reboots. BB used to have the
option of putting in something like !host:time:email and it wouldn't page at
that particular time for that host. Just wondering if someone can suggest
the best way to work around this.
We have one alert that goes out via email and then to a sort of mailing list,
and then another line that does email-to-pager. I was thinking something
along the lines of what I have below, although I am not sure about the TIME
statement: whether you can have comma-separated values, and whether you can
pass midnight, such as TIME=*:0910:0900
There are only 4 hosts currently that reboot: one at midnight, two at 9am,
one at 8:30am.
If someone could possibly give me a hand and point me in the right
direction it would be greatly appreciated.
Cheers
Allan
#extract of hobbit-alerts.cfg
$NPSERV=cpu,disk,msgs
HOST=host1.domain.com
MAIL user-4801ab49df04@xymon.invalid COLOR=red,purple REPEAT=15 RECOVERED TIME=*:0010:0000
MAIL user-dfbeab1964ab@xymon.invalid COLOR=red,purple REPEAT=15 FORMAT=SMS EXSERVICE=$NPSERV TIME=*:0010:0000 STOP
HOST=host2.domain.com
MAIL user-4801ab49df04@xymon.invalid COLOR=red,purple REPEAT=15 RECOVERED TIME=*:0000:0900,*:0910:0000
MAIL user-dfbeab1964ab@xymon.invalid COLOR=red,purple REPEAT=15 FORMAT=SMS EXSERVICE=$NPSERV TIME=*:0000:0900,*:0910:0000 STOP
HOST=*
MAIL user-4801ab49df04@xymon.invalid REPEAT=15 COLOR=red,purple RECOVERED
MAIL user-dfbeab1964ab@xymon.invalid FORMAT=SMS REPEAT=15 COLOR=red,purple EXSERVICE=$NPSERV
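The exclusion windows in the extract above follow one pattern: alert all day except for a short gap after the reboot. As a rough sketch only, a small Python helper could build such a TIME value from a reboot time and a gap length. The function name `time_excluding` and the 10-minute default are hypothetical and not part of Hobbit, and the result relies on the assumption (the very point asked about above) that a window whose end is earlier than its start wraps past midnight.

```python
def time_excluding(reboot_hhmm: str, gap_minutes: int = 10) -> str:
    """Build a TIME value covering the whole day except the gap that
    starts at reboot_hhmm (hypothetical helper, not part of Hobbit)."""
    start = int(reboot_hhmm[:2]) * 60 + int(reboot_hhmm[2:])
    resume = (start + gap_minutes) % (24 * 60)
    # Alerts resume when the gap ends and run until the next reboot.
    return "*:%02d%02d:%s" % (resume // 60, resume % 60, reboot_hhmm)


# Reboot times mentioned in the post: midnight, 9am and 8:30am.
for reboot in ("0000", "0900", "0830"):
    print(reboot, "->", time_excluding(reboot))
```

For the three reboot times in the post this yields `*:0010:0000`, `*:0910:0900` and `*:0840:0830`, each of which would then go on the recipient lines of the matching HOST rule.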
list Allan Spencer
Just something else I forgot to mention: we want to send everything to email, such as msgs, disk, http, conn etc., but not send disk, cpu or msgs to the pager email address.
▸
----- Original Message -----
From: "ZanDAhaR" <user-42a3456c44ef@xymon.invalid>
To: <user-ae9b8668bcde@xymon.invalid>
Sent: Friday, March 11, 2005 11:35 AM
Subject: [hobbit] Excluding Hosts from Paging at certain times
list Kevin Grady
If you need more info let me know.
From the help file:
The first line defines a rule for alerting when something breaks on the host "www.foo.com". There are two recipients: user-b7c20e0da76a@xymon.invalid is notified if it is the "http" service that fails, and the notification is repeated once an hour until the problem is resolved. user-09a89c618369@xymon.invalid is notified if it is the "cpu", "disk" or "memory" tests that report a failure. Since there is no "REPEAT" setting for this recipient, the default is used which is to repeat the alert every 30 minutes.
OK, suppose now that the webmaster complains about getting e-mails at 4 AM in the morning. The webserver is not supposed to be running between 9 PM and 8 AM, so even though there is a problem, he doesn't want to hear about it until 7:30 - that gives him just enough time to fix the problem. So you must modify the rule so that it doesn't send out alerts until 7:30 AM:
HOST=www.foo.com
MAIL user-b7c20e0da76a@xymon.invalid SERVICE=http REPEAT=1h TIME=*:0730:2100
MAIL user-09a89c618369@xymon.invalid SERVICE=cpu,disk,memory
Adding the TIME setting on the recipient causes the alerts for this recipient to be suppressed, unless the time of day is within the interval. So with this setup, the webmaster gets his sleep.
What would have happened if you put the TIME setting on the rule instead of on the recipient? Like this:
HOST=www.foo.com TIME=*:0730:2100
MAIL user-b7c20e0da76a@xymon.invalid SERVICE=http REPEAT=1h
MAIL user-09a89c618369@xymon.invalid SERVICE=cpu,disk,memory
Well, the webmaster would still have his nights to himself - but the TIME setting would then also apply to the alerts that go out when there is a problem with the "cpu", "disk" or "memory" services. So there would not be any mails going to user-09a89c618369@xymon.invalid when a disk fills up during the night.
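The difference between the two TIME placements described in the help text can be modelled in a few lines of Python. This is only a toy model of the described behaviour, not Hobbit's code: the `Recipient` class, the `notified` function and the labels "webmaster" and "sysadmin" are made up for illustration, and times are plain HHMM integers without midnight wrapping.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Window = Tuple[int, int]  # (start, end) as HHMM integers, e.g. (730, 2100)


@dataclass
class Recipient:
    name: str
    services: List[str]
    time: Optional[Window] = None  # recipient-level TIME, if any


def within(window: Optional[Window], now: int) -> bool:
    """No window means no restriction; otherwise start <= now <= end."""
    return window is None or window[0] <= now <= window[1]


def notified(rule_time: Optional[Window], recipients: List[Recipient],
             service: str, now: int) -> List[str]:
    """Names of the recipients alerted for `service` at time `now`."""
    if not within(rule_time, now):
        return []  # a rule-level TIME silences every recipient
    return [r.name for r in recipients
            if service in r.services and within(r.time, now)]


# Variant 1: TIME on the http recipient only.
per_recipient = [Recipient("webmaster", ["http"], (730, 2100)),
                 Recipient("sysadmin", ["cpu", "disk", "memory"])]
# Variant 2: TIME on the rule, none on the recipients.
per_rule = [Recipient("webmaster", ["http"]),
            Recipient("sysadmin", ["cpu", "disk", "memory"])]

print(notified(None, per_recipient, "disk", 400))    # ['sysadmin']
print(notified((730, 2100), per_rule, "disk", 400))  # []
```

In this model a disk alert at 4 AM still reaches the second recipient when TIME sits on the first recipient only, and reaches nobody when TIME sits on the rule.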
▸
On Fri, 11 Mar 2005 11:40:00 +1100, ZanDAhaR <user-42a3456c44ef@xymon.invalid> wrote:
Just something else I forgot to mention, we want to send everything to email such as msgs disk http conn etc, but not send disk, cpu, msgs to the pager email address
list Allan Spencer
I've already read all of the man pages online, but I still couldn't be 100% sure about what can and can't be specified in terms of times. I also needed clarification on what exactly the DOWNTIME param in bb-hosts relates to.
▸
----- Original Message -----
From: "kevin grady" <user-50dc3c45bc73@xymon.invalid>
To: <user-ae9b8668bcde@xymon.invalid>
Sent: Friday, March 11, 2005 11:57 AM
Subject: Re: [hobbit] Excluding Hosts from Paging at certain times
list Robert Edeker
▸
On Fri, 11 Mar 2005 12:37:39 +1100, ZanDAhaR <user-42a3456c44ef@xymon.invalid> wrote:
I've already read all of the man pages online, but I still couldn't be 100% sure about what can and can't be specified in terms of times. I also needed clarification on what exactly the DOWNTIME param in bb-hosts relates to.
The DOWNTIME issue is a bug. I just ran into it (msg posted on the 6th) where we reboot every Sunday, but it was only ignoring network-based tests. (I have to assume it'll be fixed in the next RC or sometime before final.)
-r
list Allan Spencer
Ahh excellent, as long as I know I'm not being stupid and doing something wrong :) Also, I only just subscribed and could not find any sort of online version of the mailing list to search (is there one? if so, where?), so I would not have seen your earlier post.
Cheers
Allan
▸
----- Original Message -----
From: "Robert Edeker" <user-9ab521a661f1@xymon.invalid>
To: <user-ae9b8668bcde@xymon.invalid>
Sent: Friday, March 11, 2005 12:54 PM
Subject: Re: [hobbit] Excluding Hosts from Paging at certain times
list Henrik Størner
▸
On Fri, Mar 11, 2005 at 11:35:40AM +1100, ZanDAhaR wrote:
We have just upgraded from bb after 5 years or so to hobbit and we're not looking back. We have some servers that have to be rebooted on a daily basis due to some of the poorly written software (not by us) that runs on them. I have tried using the DOWNTIME= option on the BB hosts but it only seems to ignore the net tests like http, conn etc. The server still has the bb client installed on it so when the server comes back up the bb client starts before some of the monitored services start and therefore marks them red and we get paged on them.
That doesn't sound right - the DOWNTIME setting should trigger on any test status that goes into Hobbit, whether it is a client-side or a network test. I'll set up something similar here and see if I can reproduce this.
▸
We have one alert that goes out via email and then to a sort of mailing list, and then another line that does email-to-pager. I was thinking something along the lines of what I have below, although I am not sure about the TIME statement: whether you can have comma-separated values, and whether you can pass midnight, such as TIME=*:0910:0900
"Yes" to both questions. You can definitely have a list of comma-separated settings, and wrapping around midnight also works.
You can try running "hobbitd_alert --test HOSTNAME TESTNAME" and see how it decides which alerts to send out. Your configuration looks OK to me.
Regards,
Henrik
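To make the two confirmed behaviours concrete (comma-separated windows and wrapping past midnight), here is a minimal Python sketch of how a TIME value might be evaluated. It is not Hobbit's implementation: the function names are invented, only `*` is accepted as the day field, and both the inclusive endpoints and the "end not later than start means wrap" rule are assumptions made for illustration.

```python
from datetime import time


def hhmm_to_minutes(text: str) -> int:
    """Convert an 'HHMM' string into minutes since midnight."""
    return int(text[:2]) * 60 + int(text[2:])


def in_time_spec(spec: str, now: time) -> bool:
    """True if `now` lies inside any comma-separated window of `spec`.

    Simplification: only '*' (every day) is accepted as the day field.
    """
    minute = now.hour * 60 + now.minute
    for window in spec.split(","):
        day, start_text, end_text = window.split(":")
        if day != "*":
            raise ValueError("sketch only handles '*' as the day field")
        start = hhmm_to_minutes(start_text)
        end = hhmm_to_minutes(end_text)
        if start < end:
            inside = start <= minute <= end
        else:
            # Assumed rule: an end that is not later than the start
            # means the window runs past midnight into the next day.
            inside = minute >= start or minute <= end
        if inside:
            return True
    return False


# In this model 09:05 falls in the reboot gap; 09:10 and 23:59 do not.
for hour, minute in ((9, 5), (9, 10), (23, 59)):
    print(hour, minute, in_time_spec("*:0910:0900", time(hour, minute)))
```

Under these assumptions the two-window value `*:0000:0900,*:0910:0000` and the single wrapped window `*:0910:0900` describe the same gap. Running `hobbitd_alert --test` against the real configuration remains the authoritative check.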