Xymon Mailing List Archive

unexpected green mails

19 messages in this thread

list Robin Wood · Fri, 18 Feb 2005 19:04:47 +0000 ·
Hi
I installed hobbit around a week ago and am very impressed with it but
I have a question.

I am occasionally getting a set of emails from it telling me things
have a status of green. These aren't after a state change as far as I
can tell as there are no other mails before them.

I am getting these once every couple of days, not at any particular time.

Can anyone suggest why?

ta

Robin
list Henrik Størner · Fri, 18 Feb 2005 23:12:48 +0100 ·
quoted from Robin Wood
On Fri, Feb 18, 2005 at 07:04:47PM +0000, Robin Wood wrote:
I installed hobbit around a week ago and am very impressed with it but
I have a question.

I am occasionally getting a set of emails from it telling me things
have a status of green. These aren't after a state change as far as I
can tell as there are no other mails before them.

I am getting these once every couple of days, not at any particular time.
Sounds a bit odd, but I'd need some more information before trying to
track it down.

Which version are you using ?

What's in the ~/data/acks/notifications.log file ?

What are your rules in hobbit-alerts.cfg for sending out alert-
and recovery-messages ?

What does the history show for a host that you get one of these
messages for ?


Regards,
Henrik
list Robin Wood · Sat, 19 Feb 2005 20:09:27 +0000 ·
The version is 4.0-RC1.

I monitor 3 external boxes and an internal one.

Here are the last 2 batches of entries from the notifications.log file:

Fri Feb 18 22:14:43 2005 another.domain.com.imap (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 843
Fri Feb 18 22:14:43 2005 third.domain.com.http (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 600
Fri Feb 18 22:14:43 2005 another.domain.com.ssh (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 722
Fri Feb 18 22:14:43 2005 internal.domain.int.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 500
Fri Feb 18 22:14:43 2005 another.domain.com.http (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 600
Fri Feb 18 22:14:43 2005 internal.domain.int.http (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 600
Fri Feb 18 22:14:43 2005 internal.domain.int.ssh (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 722
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.bbd (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 0
Fri Feb 18 22:14:43 2005 internal.domain.int.smtp (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 725
Fri Feb 18 22:14:43 2005 another.domain.com.smtp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 725
Fri Feb 18 22:14:43 2005 third.domain.com.ftp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 721
Fri Feb 18 22:14:43 2005 third.domain.com.conn (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 500
Fri Feb 18 22:14:43 2005 another.domain.com.conn (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 500
Fri Feb 18 22:14:43 2005 internal.domain.int.rpc (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 0
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 500
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.http (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 600
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.ssh (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 722
Fri Feb 18 22:14:43 2005 internal.domain.int.imap (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 843
Fri Feb 18 22:14:43 2005 internal.domain.int.dns (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 800
Fri Feb 18 22:14:43 2005 alerts.mydomain.com.bbtest (192.168.0.8)
user-1af806390433@xymon.invalid 1108764882 0
Sat Feb 19 05:45:08 2005 third.domain.com.ftp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 721
Sat Feb 19 05:45:08 2005 third.domain.com.http (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.http (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.smtp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 725
Sat Feb 19 05:45:08 2005 another.domain.com.ssh (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 722
Sat Feb 19 05:45:08 2005 internal.domain.int.http (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 600
Sat Feb 19 05:45:08 2005 another.domain.com.imap (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 843
Sat Feb 19 05:45:09 2005 internal.domain.int.rpc (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 0
Sat Feb 19 05:45:09 2005 internal.domain.int.ssh (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 722
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.ssh (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 722
Sat Feb 19 05:45:09 2005 internal.domain.int.smtp (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 725
Sat Feb 19 05:45:09 2005 internal.domain.int.imap (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 843
Sat Feb 19 05:45:09 2005 internal.domain.int.dns (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 800
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.bbd (192.168.0.8)
user-1af806390433@xymon.invalid 1108791909 0
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.http (192.168.0.8)
user-1af806390433@xymon.invalid 1108791909 600
Sat Feb 19 05:45:09 2005 alerts.mydomain.com.bbtest (192.168.0.8)
user-1af806390433@xymon.invalid 1108791909 0
Sat Feb 19 06:15:08 2005 another.domain.com.conn (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108793708 500
Sat Feb 19 06:15:08 2005 internal.domain.int.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108793708 500
Sat Feb 19 06:15:08 2005 third.domain.com.conn (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108793708 500
Sat Feb 19 06:15:08 2005 alerts.mydomain.com.conn (192.168.0.8)
user-1af806390433@xymon.invalid 1108793708 500
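
For anyone reading along, a batch like the one above can be summarised with a short script. This is only a sketch in Python: the field layout (timestamp, host.test, IP, recipient, epoch time, a trailing number) is inferred from the pasted lines rather than taken from the Hobbit documentation, and the names `parse_entries` and `batch_sizes` are made up for the example.

```python
import re
from collections import Counter

# Assumed layout of one entry (wrapped over two lines in the mail):
#   <timestamp> <host>.<test> (<ip>) <recipient> <epoch> <number>
# The meaning of the trailing number is not explained in the thread.
ENTRY = re.compile(
    r"(?P<when>\w{3} \w{3} \d{1,2} \d\d:\d\d:\d\d \d{4}) "
    r"(?P<hosttest>\S+) \((?P<ip>[^)]*)\) "
    r"(?P<recipient>\S+) (?P<epoch>\d+) (?P<number>\d+)"
)


def parse_entries(text):
    """Return one dict per log entry, tolerating the mail's line wrapping."""
    flat = " ".join(text.split())  # undo wrapping, collapse runs of spaces
    entries = []
    for match in ENTRY.finditer(flat):
        # Assumption: the test name is the part after the last dot.
        host, _, test = match.group("hosttest").rpartition(".")
        entries.append({
            "when": match.group("when"),
            "host": host,
            "test": test,
            "epoch": int(match.group("epoch")),
        })
    return entries


def batch_sizes(entries):
    """Count entries per minute so that bursts of mails stand out."""
    return Counter(e["when"].rsplit(":", 1)[0] for e in entries)


sample = """\
Fri Feb 18 22:14:43 2005 another.domain.com.imap (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108764882 843
Sat Feb 19 05:45:08 2005 third.domain.com.ftp (xxx.xxx.xxx.xxx)
user-1af806390433@xymon.invalid 1108791908 721
Sat Feb 19 05:45:09 2005 internal.domain.int.rpc (192.168.0.8)
user-1af806390433@xymon.invalid 1108791908 0
"""

for minute, count in batch_sizes(parse_entries(sample)).items():
    print(minute, count)
```

Grouping by minute makes it easy to see that every test on every host was notified in the same instant, which points at something global rather than a real per-service state change.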

The only rule in the alerts file is

HOST=*
     MAIL user-1af806390433@xymon.invalid

The histories are showing that the status for most of them is
unchanged in the last 14 hours which counting back is when the mails
were sent out. The graphs seem to show a gap in monitoring from around
21:30 (just before the first set of notifications entered the logs but
no mails were sent out) to around 04:30 (again just before the
notifications entered the log).

I know that the servers do a log rotate but that is around midnight. 

I can't understand why the status would have changed 14 hours ago and
why there should be no log data for any period.

My update period is 30 mins. The rest of the install is virtually
straight out of the box with nothing more than what the instructions
say to change.

If you want any more info just ask.

Ta

Robin
list Robin Wood · Wed, 23 Feb 2005 23:17:27 +0000 ·
Just a bit to add to this, the things which are alerting as being
green are showing up in the monitor as green smilies, the rest that
aren't alerting their green status are green diamonds.

Does this matter?

Robin
list Henrik Størner · Sun, 27 Feb 2005 17:04:10 +0100 ·
quoted from Robin Wood
On Fri, Feb 18, 2005 at 07:04:47PM +0000, Robin Wood wrote:
I am occasionally getting a set of emails from it telling me things
have a status of green. These aren't after a state change as far as I
can tell as there are no other mails before them.
I think I've resolved this in the RC4 release that will be available
shortly. I would appreciate it if you would try it out and let me know
if this problem is solved.


Regards,
Henrik
list Robin Wood · Mon, 28 Feb 2005 08:33:09 +0000 ·
Yes, I'll check it out. What was wrong?
list Robin Wood · Mon, 28 Feb 2005 08:35:09 +0000 ·
Just some extra info, this is the top of a mail I was getting:

Subject: BB [182299] otherdomain.com:ssh stopped reporting to BB                
Date: Mon, 28 Feb 2005 06:23:42 +0000 (GMT)
From: user-71c66846cfd2@xymon.invalid (BigBrother)

green <!-- [flags:OrdastILe] --> Mon Feb 28 05:53:31 2005 ssh ok

Service ssh on otherdomain.com is OK (up)  

It claims that it stopped reporting but gave me a green status.

Robin
list Henrik Størner · Mon, 28 Feb 2005 12:54:38 +0100 ·
It hasn't been completely resolved in RC4. I found a bug in the way
recovery messages were being handled that could trigger these to go out
when they should not, and thought that might be the cause of the
problem.

Kevin Hanrahan actually found another reason you may get an unexpected
green mail - if you have set up alerts to be sent out only on red
(COLOR=red), you won't get an alert when it goes yellow
(obviously). But you will get the recovery notice when it goes back to
green! I'm working on that one, but need to do some more testing later
today before I send out the fix.
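
To make the COLOR=red case concrete, here is a toy model in Python. It is not Hobbit's actual code, only an illustration of the behaviour described: `simulate`, `ALERT_COLORS` and the `track_alerts` switch are invented names, and the second mode shows the intended behaviour where a recovery mail only goes out if an alert was sent first.

```python
ALERT_COLORS = {"red"}  # models a rule restricted with COLOR=red


def simulate(colors, track_alerts):
    """Return the mails sent for a sequence of status colors.

    track_alerts=False models the reported behaviour (a recovery mail on
    any return to green); track_alerts=True only sends a recovery mail
    when an alert actually went out beforehand.
    """
    mails = []
    previous = "green"
    alert_sent = False
    for color in colors:
        if color in ALERT_COLORS:
            if color != previous:
                mails.append("alert:" + color)
            alert_sent = True
        elif color == "green" and previous != "green":
            if alert_sent or not track_alerts:
                mails.append("recovery")
            alert_sent = False
        previous = color
    return mails


# green -> yellow -> green with a red-only rule
print(simulate(["yellow", "green"], track_alerts=False))  # unexpected recovery mail
print(simulate(["yellow", "green"], track_alerts=True))   # nothing is sent
print(simulate(["red", "green"], track_alerts=True))      # alert, then recovery
```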


Regards,
Henrik
list Henrik Størner · Mon, 28 Feb 2005 12:58:09 +0100 ·
quoted from Robin Wood
On Mon, Feb 28, 2005 at 08:35:09AM +0000, Robin Wood wrote:
Just some extra info, this is the top of a mail I was getting:

Subject: BB [182299] otherdomain.com:ssh stopped reporting to BB                
Date: Mon, 28 Feb 2005 06:23:42 +0000 (GMT)
From: user-71c66846cfd2@xymon.invalid (BigBrother)

green <!-- [flags:OrdastILe] --> Mon Feb 28 05:53:31 2005 ssh ok
OK, this isn't a "green" mail - it's purple! The clue is the subject
"otherdomain.com:ssh stopped reporting". The "green" is just the last
statusreport that was sent before it stopped reporting any further
status.
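
As a rough sketch of how to tell the two kinds of mail apart, the check can be written in a few lines of Python. It relies only on the one mail format shown in this thread (the "stopped reporting" subject and a body whose first word is the last reported color); the function name `describe_alert` is made up for the example.

```python
def describe_alert(subject, first_body_line):
    """Return (effective_color, last_reported_color) for an alert mail.

    Assumption: a subject containing "stopped reporting" marks a purple
    alert, and the first word of the body is the last color reported.
    """
    last_color = first_body_line.split()[0]
    if "stopped reporting" in subject:
        return ("purple", last_color)
    return (last_color, last_color)


subject = "BB [182299] otherdomain.com:ssh stopped reporting to BB"
body = "green <!-- [flags:OrdastILe] --> Mon Feb 28 05:53:31 2005 ssh ok"
print(describe_alert(subject, body))
```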


Henrik
list Robin Wood · Thu, 3 Mar 2005 20:19:38 +0000 ·
I've just put rc4 on so I'll see if anything does get fixed, two
questions though, first why would things stop reporting? I'm
monitoring 3 different boxes, one local, 2 remote on different hosts,
what constitutes "stopping reporting"? I have my internet connection
all night so it can't be that, especially as one box is the box that
has the monitor on it.

The other is why are some of my green entries smilies and others diamonds?

Ta

Robin
list Kevin Hanrahan · Thu, 3 Mar 2005 15:26:25 -0500 ·
Green smilies = state changed within the last 24 hours
Green diamonds = no state change for more than 24 hours
	(this parameter is now configurable)
list Henrik Størner · Thu, 3 Mar 2005 23:15:41 +0100 ·
quoted from Robin Wood
On Thu, Mar 03, 2005 at 08:19:38PM +0000, Robin Wood wrote:
I've just put rc4 on so I'll see if anything does get fixed
Do pick up the post-RC4 patch, it has the final fix for the green
mails. http://www.hswn.dk/beta/post-RC4.patch
two questions though, first why would things stop reporting?
Most common cause: The server was rebooted, and the client was not
set up to restart automatically after a boot.
quoted from Kevin Hanrahan
I'm monitoring 3 different boxes, one local, 2 remote on different
hosts, what constitutes "stopping reporting"?
A status in Hobbit (and BB) has a lifetime - default is 30 minutes.
Normally a status is refreshed every 5 minutes, so it stays "alive".
If Hobbit sees that a status has not been updated for so long that its
lifetime has been exceeded, it goes into the "stopped reporting"
(purple) state.
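
The lifetime rule can be sketched in Python as follows. This is an illustration of the description above, not Hobbit's implementation: the name `effective_color` is invented, and whether the real check uses "greater than" or "greater than or equal" at the exact boundary is an assumption.

```python
from datetime import datetime, timedelta

LIFETIME = timedelta(minutes=30)  # default lifetime mentioned above


def effective_color(last_color, last_update, now, lifetime=LIFETIME):
    """Return 'purple' once a status has outlived its lifetime."""
    if now - last_update > lifetime:
        return "purple"
    return last_color


last_update = datetime(2005, 2, 28, 5, 53, 31)

# Refreshed 5 minutes ago: still alive, keeps its reported color.
print(effective_color("green", last_update, last_update + timedelta(minutes=5)))

# Not refreshed for 31 minutes: treated as "stopped reporting".
print(effective_color("green", last_update, last_update + timedelta(minutes=31)))
```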
quoted from Kevin Hanrahan
The other is why are some of my green entries smilies and others
 diamonds?
Smilies mean the color has changed within the past 24 hours.
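
Expressed as code, the icon choice is a single comparison. The Python below is a sketch with a made-up function name (`green_icon`); the 24-hour threshold comes from this thread and is described elsewhere in it as configurable.

```python
from datetime import timedelta

RECENT = timedelta(hours=24)  # threshold from the thread; configurable


def green_icon(time_since_last_change, recent=RECENT):
    """Pick the icon for a green status from the age of its last change."""
    return "smiley" if time_since_last_change < recent else "diamond"


print(green_icon(timedelta(hours=14)))  # changed recently
print(green_icon(timedelta(days=3)))    # stable for days
```

This also fits the earlier observation that the tests which had been mailed about showed up as smilies: a trip to purple and back counts as a color change.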


Henrik
list Robin Wood · Fri, 4 Mar 2005 23:59:22 +0000 ·
quoted from Henrik Størner
On Thu, 3 Mar 2005 23:15:41 +0100, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
On Thu, Mar 03, 2005 at 08:19:38PM +0000, Robin Wood wrote:
I've just put rc4 on so I'll see if anything does get fixed
Do pick up the post-RC4 patch, it has the final fix for the green
mails. http://www.hswn.dk/beta/post-RC4.patch
two questions though, first why would things stop reporting?
Most common cause: The server was rebooted, and the client was not
set up to restart automatically after a boot.
None of the boxes get rebooted, they are all live servers running
24/7, two with ISPs and one my own which I know the uptime of.
quoted from Henrik Størner
I'm monitoring 3 different boxes, one local, 2 remote on different
hosts, what constitutes "stopping reporting"?
A status in Hobbit (and BB) has a lifetime - default is 30 minutes.
Normally a status is refreshed every 5 minutes, so it stays "alive".
If Hobbit sees that a status has not been updated for so long that its
lifetime has been exceeded, it goes into the "stopped reporting"
(purple) state.
I've never seen anything actually go purple when the mails were sent
out but I don't watch it all the time so it could have done.
quoted from Henrik Størner
The other is why are some of my green entries smilies and others
 diamonds?
Smilies mean the color has changed within the past 24 hours.
ok sounds reasonable that if it sends out the mails then it is because
it thinks the status has changed.

I was going to report that RC4 had fixed it as I'd had no mails but
then I got this:

 - Program crashed

Fatal signal caught!

on the hobbit-alert monitor so I guess that may be why I hadn't got any.

I'll put the other patch on and see what happens.

Robin
list Robin Wood · Sat, 5 Mar 2005 00:00:30 +0000 ·
One other thing I did think of is that I set my monitor period to be
30 mins, could that have anything to do with it, something to do with
the time to live and the refresh period being the same?
list Robin Wood · Mon, 7 Mar 2005 22:35:04 +0000 ·
RC5 is unfortunately still causing random "x stopped reporting" errors.
I just got 20 mails similar to this one:

Subject: 	BB [431703] mydomain.int:imap stopped reporting to BB
Date: 	Mon,  7 Mar 2005 22:02:52 +0000 (GMT)

green  Mon Mar  7 21:32:41 2005 imap ok 

Service imap on mydomain.int is OK (up)


* OK [CAPABILITY IMAP4rev1 UIDPLUS CHILDREN NAMESPACE
THREAD=ORDEREDSUBJECT THREAD=REFERENCES SORT QUOTA IDLE ACL ACL2=UNION
STARTTLS] Courier-IMAP ready. Copyright 1998-2004 Double Precision,
Inc.  See COPYING for distribution information.
* BYE Courier-IMAP server shutting down
ABC123 OK LOGOUT completed


Seconds: 0.01


This is for the IMAP server on the same box as the monitor so there
could be no network or connection issues. Anyone any ideas of anything
else to try?

A good side is that it is happening less frequently.

Robin
list Henrik Størner · Tue, 8 Mar 2005 00:08:27 +0100 ·
quoted from Robin Wood
On Mon, Mar 07, 2005 at 10:35:04PM +0000, Robin Wood wrote:
RC5 is unfortunately still causing random "x stopped reporting" errors.
I just got 20 mails similar to this one:

Subject: 	BB [431703] mydomain.int:imap stopped reporting to BB
Date: 	Mon,  7 Mar 2005 22:02:52 +0000 (GMT)

green  Mon Mar  7 21:32:41 2005 imap ok 
Well, that report is more than 30 minutes old - the report is from Mar
7 21:32, and the alert is dated Mar 7 22:02.
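
The gap between the two timestamps is easy to check. The Python snippet below uses the exact values from the quoted mail and assumes both are in the same time zone (the mail header says GMT; the status line does not say).

```python
from datetime import datetime, timedelta

FMT = "%a %b %d %H:%M:%S %Y"

report = datetime.strptime("Mon Mar 7 21:32:41 2005", FMT)  # last status
alert = datetime.strptime("Mon Mar 7 22:02:52 2005", FMT)   # mail Date header

age = alert - report
print(age)                          # 0:30:11
print(age > timedelta(minutes=30))  # True
```

So the status was 11 seconds past the 30-minute mark when the alert went out.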


You mentioned that
quoted from Robin Wood
One other thing I did think of is that I set my monitor period to be
30 mins, could that have anything to do with it, something to do with
the time to live and the refresh period being the same?
What exactly is it that you've changed? I don't quite follow what you
mean by "monitor period".

What's the "interval" setting in hobbitlaunch.cfg for the [bbnet]
task?


Regards,
Henrik
list Robin Wood · Tue, 8 Mar 2005 13:52:02 +0000 ·
This is the setting I have in hobbitlaunch.cfg

[bbnet]
    ENVFILE /home/bb/server/etc/hobbitserver.cfg
    NEEDS hobbitd
    CMD bbtest-net --report --ping --checkresponse
    LOGFILE $BBSERVERLOGS/bb-network.log
    INTERVAL 30m

I am wondering if the problem is that sometimes this isn't getting its
data in before the alerter tries to pick the data up, so the data is
slightly over 30 minutes old and that causes the alerts to be sent out.
list Henrik Størner · Tue, 8 Mar 2005 15:15:27 +0100 ·
quoted from Robin Wood
On Tue, Mar 08, 2005 at 01:52:02PM +0000, Robin Wood wrote:
This is the setting I have in hobbitlaunch.cfg

[bbnet]
    ENVFILE /home/bb/server/etc/hobbitserver.cfg
    NEEDS hobbitd
    CMD bbtest-net --report --ping --checkresponse
    LOGFILE $BBSERVERLOGS/bb-network.log
    INTERVAL 30m

I am wondering if the problem is that sometimes this isn't getting its
data in before the alerter tries to pick the data up, so the data is
slightly over 30 minutes old and that causes the alerts to be sent
out.
Yep, that is it. Network tests have a lifetime of 30 minutes before
they go purple, so if you only run the network tests at 30-minute
intervals, there are bound to be some occasions where the network
tests fail to update the status before the go-purple triggers.

Just don't set the interval that high - problem fixed.
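
A small Python model shows why an interval equal to the lifetime cannot work reliably. It assumes the test is launched every INTERVAL from start to start and that each status arrives after a slightly different run time; the run times in `delays` are hypothetical numbers and `max_status_age` is a made-up helper, not part of Hobbit.

```python
LIFETIME_S = 30 * 60  # status lifetime in seconds (30 minutes)


def max_status_age(interval_s, delays_s):
    """Largest gap between consecutive status updates, in seconds.

    Update i arrives at i * interval_s + delays_s[i], i.e. the launch
    time plus however long that run of the network test took.
    """
    arrivals = [i * interval_s + d for i, d in enumerate(delays_s)]
    return max(b - a for a, b in zip(arrivals, arrivals[1:]))


delays = [9, 12, 8, 11]  # hypothetical run times of the test, in seconds

for interval_min in (30, 25, 5):
    age = max_status_age(interval_min * 60, delays)
    print(interval_min, age, "purple" if age > LIFETIME_S else "ok")
```

With a 30-minute interval, any run that takes a few seconds longer than the previous one pushes the gap past 1800 seconds. A 25-minute interval leaves a five-minute margin in this model, and the usual 5-minute interval leaves far more.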


Regards,
Henrik
list Robin Wood · Tue, 8 Mar 2005 21:38:53 +0000 ·
I'll drop it to 25 mins and that should fix it.

Ta

Robin