unexpected green mails
list Robin Wood
Hi I installed hobbit around a week ago and am very impressed with it but I have a question. I am occasionally getting a set of emails from it telling me things have a status of green. These aren't after a state change as far as I can tell as there are no other mails before them. I am getting these once every couple of days, not at any particular time. Can anyone suggest why? ta Robin
list Henrik Størner
▸
On Fri, Feb 18, 2005 at 07:04:47PM +0000, Robin Wood wrote:
I installed hobbit around a week ago and am very impressed with it but I have a question. I am occasionally getting a set of emails from it telling me things have a status of green. These aren't after a state change as far as I can tell as there are no other mails before them. I am getting these once every couple of days, not at any particular time.
Sounds a bit odd, but I'd need some more information before trying to track it down. Which version are you using? What's in the ~/data/acks/notifications.log file? What are your rules in hobbit-alerts.cfg for sending out alert and recovery messages? What does the history show for a host that you get one of these messages for? Regards, Henrik
list Robin Wood
The version is 4.0-RC1.
I monitor 3 external boxes and an internal one.
Here are the last 2 batches of entries from the notifications.log file:
Fri Feb 18 22:14:43 2005 another.domain.com.imap (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 843
Fri Feb 18 22:14:43 2005 third.domain.com.http (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 600
Fri Feb 18 22:14:43 2005 another.domain.com.ssh (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 722
Fri Feb 18 22:14:43 2005 internal.domain.int.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 500
Fri Feb 18 22:14:43 2005 another.domain.com.http (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 600
Fri Feb 18 22:14:43 2005 internal.domain.int.http (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 600
Fri Feb 18 22:14:43 2005 internal.domain.int.ssh (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 722
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.bbd (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 0
Fri Feb 18 22:14:43 2005 internal.domain.int.smtp (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 725
Fri Feb 18 22:14:43 2005 another.domain.com.smtp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 725
Fri Feb 18 22:14:43 2005 third.domain.com.ftp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 721
Fri Feb 18 22:14:43 2005 third.domain.com.conn (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 500
Fri Feb 18 22:14:43 2005 another.domain.com.conn (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 500
Fri Feb 18 22:14:43 2005 internal.domain.int.rpc (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 0
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 500
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.http (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 600
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.ssh (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 722
Fri Feb 18 22:14:43 2005 internal.domain.int.imap (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 843
Fri Feb 18 22:14:43 2005 internal.domain.int.dns (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 800
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.bbtest (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 0
Sat Feb 19 05:45:08 2005 third.domain.com.ftp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 721
Sat Feb 19 05:45:08 2005 third.domain.com.http (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.http (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.smtp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 725
Sat Feb 19 05:45:08 2005 another.domain.com.ssh (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 722
Sat Feb 19 05:45:08 2005 internal.domain.int.http (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.imap (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 843
Sat Feb 19 05:45:09 2005 internal.domain.int.rpc (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 0
Sat Feb 19 05:45:09 2005 internal.domain.int.ssh (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 722
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.ssh (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 722
Sat Feb 19 05:45:09 2005 internal.domain.int.smtp (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 725
Sat Feb 19 05:45:09 2005 internal.domain.int.imap (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 843
Sat Feb 19 05:45:09 2005 internal.domain.int.dns (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 800
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.bbd (192.168.0.8)
user-1af806390433@xymon.invalid 1108791909 0
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.http (192.168.0.8)
user-1af806390433@xymon.invalid 1108791909 600
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.bbtest (192.168.0.8)
user-1af806390433@xymon.invalid 1108791909 0
Sat Feb 19 06:15:08 2005 another.domain.com.conn (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108793708 500
Sat Feb 19 06:15:08 2005 internal.domain.int.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108793708 500
Sat Feb 19 06:15:08 2005 third.domain.com.conn (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108793708 500
Sat Feb 19 06:15:08 2005 alerts.mydomain.com.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108793708 500
The only rule in the alerts file is
HOST=*
MAIL user-1af806390433@xymon.invalid
The histories are showing that the status for most of them is
unchanged in the last 14 hours which counting back is when the mails
were sent out. The graphs seem to show a gap in monitoring from around
21:30 (just before the first set of notifications entered the logs but
no mails were sent out) to around 04:30 (again just before the
notifications entered the log).
I know that the servers do a log rotate but that is around midnight.
I can't understand why the status would have changed 14 hours ago and
why there should be no log data for any period.
My update period is 30 mins. The rest of the install is virtually
straight out of the box with nothing more than what the instructions
say to change.
If you want any more info just ask.
Ta
Robin
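For anyone wanting to look at a notifications.log like the one above, here is a minimal Python sketch that groups the entries into batches. The field layout (timestamp, host.test, address, recipient, epoch value, trailing number) is inferred only from the sample lines in this thread, not from Hobbit documentation. The names `parse_entries` and `batches` and the 60-second gap are hypothetical choices made for illustration.

```python
import re

# Layout inferred from the sample lines in this thread (an assumption):
# <timestamp> <host.test> (<address>) <recipient> <epoch> <number>
ENTRY = re.compile(
    r"(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) (\S+) \((\S+)\)\s+(\S+) (\d+) (\d+)"
)

def parse_entries(text):
    """Yield one dict per entry; an entry may wrap onto a second line."""
    for m in ENTRY.finditer(text):
        # Split "another.domain.com.imap" into host and test at the last dot.
        host, _, test = m.group(2).rpartition(".")
        yield {"logged": m.group(1), "host": host, "test": test,
               "recipient": m.group(4), "epoch": int(m.group(5))}

def batches(entries, gap=60):
    """Group consecutive entries whose epoch values are within `gap` seconds."""
    groups = []
    for e in entries:
        if groups and e["epoch"] - groups[-1][-1]["epoch"] <= gap:
            groups[-1].append(e)
        else:
            groups.append([e])
    return groups

# Three entries copied from the log excerpt in this thread.
SAMPLE = """Fri Feb 18 22:14:43 2005 another.domain.com.imap (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 843
Fri Feb 18 22:14:43 2005 internal.domain.int.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 500
Sat Feb 19 05:45:08 2005 third.domain.com.ftp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 721
"""

for group in batches(parse_entries(SAMPLE)):
    print(group[0]["logged"], len(group), "entries")
```

Grouping this way makes it easier to see that the notifications arrive in bursts covering many tests at once, rather than one test at a time.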
list Robin Wood
Just a bit to add to this, the things which are alerting as being green are showing up in the monitor as green smilies, the rest that aren't alerting their green status are green diamonds. Does this matter? Robin
list Henrik Størner
▸
On Fri, Feb 18, 2005 at 07:04:47PM +0000, Robin Wood wrote:
I am occasionally getting a set of emails from it telling me things have a status of green. These aren't after a state change as far as I can tell as there are no other mails before them.
I think I've resolved this in the RC4 release that will be available shortly. I would appreciate it if you would try it out and let me know if this problem is solved. Regards, Henrik
list Robin Wood
ye, I'll check it out. What was wrong?
list Robin Wood
Just some extra info, this is the top of a mail I was getting:
Subject: BB [182299] otherdomain.com:ssh stopped reporting to BB
Date: Mon, 28 Feb 2005 06:23:42 +0000 (GMT)
From: user-71c66846cfd2@xymon.invalid (BigBrother)
green <!-- [flags:OrdastILe] --> Mon Feb 28 05:53:31 2005 ssh ok
Service ssh on otherdomain.com is OK (up)
It claims that it stopped reporting but gave me a green status.
Robin
list Henrik Størner
It hasn't been completely resolved in RC4. I found a bug in the way recovery messages were being handled that could trigger these to go out when they should not, and thought that might be the cause of the problem. Kevin Hanrahan actually found another reason you may get an unexpected green mail: if you have set up alerts to be sent out only on red (COLOR=red), you won't get an alert when it goes yellow (obviously). But you will get the recovery notice when it goes back to green! I'm working on that one, but need to do some more testing later today before I send out the fix. Regards, Henrik
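The COLOR=red case described above can be illustrated with a small Python sketch. This is not Hobbit's actual code; the `Notifier` class and its method names are hypothetical, and the only behavior taken from the thread is the intended rule: a recovery notice should go out only if an alert was sent for that status first.

```python
class Notifier:
    """Toy model: send a recovery only for statuses that were alerted on."""

    def __init__(self, alert_colors):
        self.alert_colors = set(alert_colors)  # e.g. {"red"} for COLOR=red
        self.alerted = set()                   # (host, test) pairs with an open alert
        self.sent = []                         # record of notifications sent

    def update(self, host, test, color):
        key = (host, test)
        if color in self.alert_colors:
            if key not in self.alerted:
                self.alerted.add(key)
                self.sent.append(("alert", host, test, color))
        elif color == "green" and key in self.alerted:
            # Recovery is only meaningful if an alert went out earlier.
            self.alerted.discard(key)
            self.sent.append(("recovery", host, test, color))


n = Notifier({"red"})
n.update("otherdomain.com", "ssh", "yellow")  # below the alert threshold: nothing sent
n.update("otherdomain.com", "ssh", "green")   # no alert was sent, so no recovery
print(n.sent)
n.update("otherdomain.com", "ssh", "red")     # alert
n.update("otherdomain.com", "ssh", "green")   # recovery
print(n.sent)
```

Without the `key in self.alerted` check, the yellow-to-green transition would produce exactly the kind of unexpected green mail Kevin found.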
--
Henrik Storner
list Henrik Størner
▸
On Mon, Feb 28, 2005 at 08:35:09AM +0000, Robin Wood wrote:
Just some extra info, this is the top of a mail I was getting:
Subject: BB [182299] otherdomain.com:ssh stopped reporting to BB
Date: Mon, 28 Feb 2005 06:23:42 +0000 (GMT)
From: user-71c66846cfd2@xymon.invalid (BigBrother)
green <!-- [flags:OrdastILe] --> Mon Feb 28 05:53:31 2005 ssh ok
OK, this isn't a "green" mail - it's purple! The clue is the subject "otherdomain.com:ssh stopped reporting". The "green" is just the last statusreport that was sent before it stopped reporting any further status. Henrik
list Robin Wood
I've just put rc4 on so I'll see if anything does get fixed, two questions though, first why would things stop reporting? I'm monitoring 3 different boxes, one local, 2 remote on different hosts, what constitutes "stopping reporting"? I have my internet connection all night so it can't be that, especially as one box is the box that has the monitor on it. The other is why are some of my green entries smilies and others diamonds? Ta
Robin
list Kevin Hanrahan
Green smilies = change of state within the last 24 hours
Green diamonds = no change of state for more than 24 hours (this parameter is now configurable)
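As a rough illustration of the rule above, here is a tiny Python sketch. The function name `green_icon` is hypothetical, and how an age of exactly 24 hours is treated is an assumption, since the thread does not say.

```python
def green_icon(seconds_since_change, recent_window=24 * 3600):
    """Pick the icon for a green status from the age of its last color change.

    recent_window stands in for the configurable threshold mentioned above;
    treating exactly 24 hours as "not recent" is an assumption.
    """
    return "smiley" if seconds_since_change < recent_window else "diamond"


print(green_icon(14 * 3600))  # changed 14 hours ago
print(green_icon(3 * 86400))  # unchanged for 3 days
```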
list Henrik Størner
▸
On Thu, Mar 03, 2005 at 08:19:38PM +0000, Robin Wood wrote:
I've just put rc4 on so I'll see if anything does get fixed
Do pick up the post-RC4 patch, it has the final fix for the green mails. http://www.hswn.dk/beta/post-RC4.patch
two questions though, first why would things stop reporting?
Most common cause: The server was rebooted, and the client was not set up to restart automatically after a boot.
▸
I'm monitoring 3 different boxes, one local, 2 remote on different hosts, what constitutes "stopping reporting"?
A status in Hobbit (and BB) has a lifetime - default is 30 minutes. Normally a status is refreshed every 5 minutes, so it stays "alive". If Hobbit sees that a status has not been updated for so long that its lifetime has been exceeded, it goes into the "stopped reporting" (purple) state.
▸
The other is why are some of my green entries smilies and others diamonds?
Smilies mean the color has changed within the past 24 hours. Henrik
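The lifetime rule described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `effective_color` is hypothetical, and whether the real check is "older than" or "at least as old as" the lifetime is not stated in the thread. The timestamps are the ones from the ssh mail quoted earlier.

```python
from datetime import datetime, timedelta

LIFETIME = timedelta(minutes=30)  # default lifetime given in the thread

def effective_color(last_color, last_update, now, lifetime=LIFETIME):
    """Return "purple" once the last report is older than its lifetime."""
    if now - last_update > lifetime:
        return "purple"
    return last_color


last_report = datetime(2005, 2, 28, 5, 53, 31)  # "green ... 05:53:31 ... ssh ok"
print(effective_color("green", last_report, datetime(2005, 2, 28, 5, 58, 31)))
print(effective_color("green", last_report, datetime(2005, 2, 28, 6, 23, 42)))
```

The second call uses the Date header of the mail (06:23:42), which is 30 minutes and 11 seconds after the last report, so the status has outlived its lifetime even though the last color reported was green.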
list Robin Wood
▸
On Thu, 3 Mar 2005 23:15:41 +0100, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
On Thu, Mar 03, 2005 at 08:19:38PM +0000, Robin Wood wrote:
I've just put rc4 on so I'll see if anything does get fixed
Do pick up the post-RC4 patch, it has the final fix for the green mails. http://www.hswn.dk/beta/post-RC4.patch
two questions though, first why would things stop reporting?
Most common cause: The server was rebooted, and the client was not set up to restart automatically after a boot.
None of the boxes get rebooted, they are all live servers running 24/7, two with ISPs and one my own which I know the uptime of.
▸
I'm monitoring 3 different boxes, one local, 2 remote on different hosts, what constitutes "stopping reporting"?
A status in Hobbit (and BB) has a lifetime - default is 30 minutes. Normally a status is refreshed every 5 minutes, so it stays "alive". If Hobbit sees that a status has not been updated for so long that its lifetime has been exceeded, it goes into the "stopped reporting" (purple) state.
I've never seen anything actually go purple when the mails were sent out but I don't watch it all the time so it could have done.
▸
The other is why are some of my green entries smilies and others diamonds?
Smilies mean the color has changed within the past 24 hours.
ok sounds reasonable that if it sends out the mails then it is because it thinks the status has changed. I was going to report that RC4 had fixed it as I'd had no mails but then I got this: - Program crashed Fatal signal caught! on the hobbit-alert monitor so I guess that may be why I hadn't got any. I'll put the other patch on and see what happens. Robin
list Robin Wood
One other thing I did think of is that I set my monitor period to be 30 mins, could that have anything to do with it, something to do with the time to live and the refresh period being the same?
list Robin Wood
RC5 is unfortunately still causing random "x stopped reporting" errors. I just got 20 mails similar to this one:
Subject: BB [431703] mydomain.int:imap stopped reporting to BB
Date: Mon, 7 Mar 2005 22:02:52 +0000 (GMT)
green Mon Mar 7 21:32:41 2005 imap ok
Service imap on mydomain.int is OK (up)
* OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE ACL ACL2=UNION STARTTLS] Courier-IMAP ready. Copyright 1998-2004 Double Precision, Inc. See COPYING for distribution information.
* BYE Courier-IMAP server shutting down
ABC123 OK LOGOUT completed
Seconds: 0.01
This is for the IMAP server on the same box as the monitor, so there could be no network or connection issues. Does anyone have any ideas of anything else to try? On the plus side, it is happening less frequently.
Robin
list Henrik Størner
▸
On Mon, Mar 07, 2005 at 10:35:04PM +0000, Robin Wood wrote:
RC5 is unfortunately still causing random "x stopped reporting" errors. I just got 20 mails similar to this one:
Subject: BB [431703] mydomain.int:imap stopped reporting to BB
Date: Mon, 7 Mar 2005 22:02:52 +0000 (GMT)
green Mon Mar 7 21:32:41 2005 imap ok
Well, that report is more than 30 minutes old - the report is from Mar 7 21:32, and the alert is dated Mar 7 22:02. You mentioned that
▸
One other thing I did think of is that I set my monitor period to be 30 mins, could that have anything to do with it, something to do with the time to live and the refresh period being the same?
What exactly is it that you've changed? I don't quite follow what you mean by "monitor period". What's the "interval" setting in hobbitlaunch.cfg for the [bbnet] task? Regards, Henrik
list Robin Wood
This is the setting I have in hobbitlaunch.cfg
[bbnet]
ENVFILE /home/bb/server/etc/hobbitserver.cfg
NEEDS hobbitd
CMD bbtest-net --report --ping --checkresponse
LOGFILE $BBSERVERLOGS/bb-network.log
INTERVAL 30m
I am wondering if the problem is that sometimes this isn't getting its
data in before the alerter tries to pick the data up, so the data is
slightly over 30 minutes old and so causes the alerts to be sent out.
list Henrik Størner
▸
On Tue, Mar 08, 2005 at 01:52:02PM +0000, Robin Wood wrote:
This is the setting I have in hobbitlaunch.cfg
[bbnet]
ENVFILE /home/bb/server/etc/hobbitserver.cfg
NEEDS hobbitd
CMD bbtest-net --report --ping --checkresponse
LOGFILE $BBSERVERLOGS/bb-network.log
INTERVAL 30m
I am wondering if the problem is that sometimes this isn't getting its
data in before the alerter tries to pick the data up, so the data is
slightly over 30 minutes old and so causes the alerts to be sent out.
Yep, that is it. Network tests have a lifetime of 30 minutes before they go purple, so if you only run the network tests at 30-minute intervals, there are bound to be some occasions where the network tests fail to update the status before the go-purple triggers. Just don't set the interval that high - problem fixed. Regards, Henrik
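To see why an interval equal to the lifetime is fragile, here is a small Python sketch of the arithmetic. It is a simplified model, not how Hobbit schedules anything internally: it assumes each test run starts exactly on the interval and that the status arrives when the run finishes. The function name `stale_gaps` and the run durations are made up for illustration.

```python
def stale_gaps(interval_s, run_durations_s, lifetime_s=30 * 60):
    """Return the gaps between status arrivals that exceed the lifetime.

    Run i starts at i * interval_s and delivers its status after
    run_durations_s[i] seconds (hypothetical durations).
    """
    arrivals = [i * interval_s + d for i, d in enumerate(run_durations_s)]
    gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
    return [g for g in gaps if g > lifetime_s]


durations = [10, 21, 15, 12]  # seconds each run takes; illustrative values only

print(stale_gaps(30 * 60, durations))  # INTERVAL 30m
print(stale_gaps(25 * 60, durations))  # INTERVAL 25m
```

With a 30-minute interval, any run that takes a little longer than the one before it pushes the gap past the lifetime (here 30 minutes and 11 seconds, the same margin seen in the mails above). With a shorter interval there is slack to absorb that variation.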
list Robin Wood
I'll drop it to 25 mins and that should fix it. Ta Robin