Xymon Mailing List Archive

purple/clear problems?!

5 messages in this thread

list Peter Welter · Mon, 27 Jun 2005 10:07:31 +0200 ·
Hi all,

Hope you could shine a light on this one. I'm running Hobbit 4.0.4 on
Solaris 8. I've enabled reporting as follows with macros in
hobbit-alert.cfg:

$UNIXDAY=MAIL user-54ad2508e1e4@xymon.invalid TIME=*:0000:2359 REPEAT=1d RECOVERED

$UNIXNIGHT=MAIL user-54ad2508e1e4@xymon.invalid TIME=*:0000:2359 DURATION>30m REPEAT=60m RECOVERED SERVICE=!cpu,!msgs COLOR=!yellow

This morning I found about 4000 mails in my mailbox reporting "stopped
reporting" and "recovered". What happened was that the connection to
'host' failed. I got one email for that event, as I should. But until
the connection was re-established last night, all tests other than the
conn test kept bouncing. I could build a dependency (don't warn if
there is no conn), but that is rather annoying since I have many hosts
to manage.

I just don't fully understand why the other tests are bouncing, whereas
the conn test went down on Friday and came back up last night. See below:

...
Bigbrother account	Hobbit [705353] host:disk stopped reporting (PURPLE)	Sat 25-06-05 6:04	1 KB
Bigbrother account	Hobbit [735704] host:msgs stopped reporting (PURPLE)	Sat 25-06-05 6:04	1 KB
Bigbrother account	Hobbit [115710] host:procs stopped reporting (PURPLE)	Sat 25-06-05 6:04	1 KB
Bigbrother account	Hobbit [963024] host:cpu stopped reporting (PURPLE)	Sat 25-06-05 6:04	1 KB
Bigbrother account	Hobbit host:msgs recovered	Sat 25-06-05 6:05	1 KB
Bigbrother account	Hobbit host:procs recovered	Sat 25-06-05 6:05	1 KB
Bigbrother account	Hobbit host:disk recovered	Sat 25-06-05 6:05	1 KB
Bigbrother account	Hobbit host:cpu recovered	Sat 25-06-05 6:05	1 KB
Bigbrother account	Hobbit [982927] host:procs stopped reporting (PURPLE)	Sat 25-06-05 6:09	1 KB
Bigbrother account	Hobbit [276744] host:disk stopped reporting (PURPLE)	Sat 25-06-05 6:09	1 KB
Bigbrother account	Hobbit [850873] host:msgs stopped reporting (PURPLE)	Sat 25-06-05 6:09	1 KB
Bigbrother account	Hobbit [612113] host:cpu stopped reporting (PURPLE)	Sat 25-06-05 6:09	1 KB

The process check, for example, turns purple, of course, but then a
clear status appears:

...
Sun Jun 26 20:09:44 2005 	clear 	0:04:43
Sun Jun 26 20:09:27 2005 	purple 	0:00:17
Sun Jun 26 20:04:32 2005 	clear 	0:04:55
Sun Jun 26 20:04:27 2005 	purple 	0:00:05
Sun Jun 26 20:00:22 2005 	clear 	0:04:05
Sun Jun 26 19:59:26 2005 	purple 	0:00:56
...

Thanks,

Peter
list Peter Welter · Wed, 29 Jun 2005 17:16:37 +0200 ·
Sorry I have to bother you again, but this afternoon I experienced the
same bouncing effect problem again:

The server to be tested went down at 13:31. After 10 minutes conn
worked again, but 5 minutes later conn failed again and did not come
back up.

Then the following happened:

At 14:01 cpu, procs, msgs, disk went purple.
At 14:02 cpu, procs, msgs, disk went clear.
At 14:06 cpu, procs, msgs, disk went purple.
At 14:06 cpu, procs, msgs, disk went clear.
At 14:11 cpu, procs, msgs, disk went purple.
At 14:11 cpu, procs, msgs, disk went clear.
At 14:16 cpu, procs, msgs, disk went purple.
At 14:16 cpu, procs, msgs, disk went clear.

Then I blue-dotted the tests until the machine came back up.

Perhaps I should add the following info: the Hobbit server gets its
data from a Big Brother server (yeah, it's the last week where Hobbit
runs parallel to a BB-server; production next week after my week off).
So it's not Hobbit that is performing the network tests, yet.

On the Hobbit server the bbd and bbtest columns are blue-dotted and I
have commented out the bbnet test in hobbitlaunch.cfg, so I
thought the network tests were not performed twice and everything should
be working fine. Well, the mail actions do work :-]

Should I disable the bbretest-test as well?

Thanks for your help,

Peter

list Henrik Størner · Tue, 5 Jul 2005 17:59:59 +0200 ·
It's been a while, but I'm trying to get around to this:

On Wed, Jun 29, 2005 at 05:16:37PM +0200, Peter Welter wrote:
> At 14:01 cpu, procs, msgs, disk went purple.
> At 14:02 cpu, procs, msgs, disk went clear.
> At 14:06 cpu, procs, msgs, disk went purple.
> At 14:06 cpu, procs, msgs, disk went clear.
>
> Perhaps I should add the following info: the Hobbit server gets its
> data from a Big Brother server

Are you running the BBDISPLAY part of Big Brother? That could explain
the purples, because a BBDISPLAY server will send out status messages
to change a color to purple - Hobbit doesn't, that's handled internally
in the hobbit daemon.


Regards,
Henrik
list Gee Pee · Mon, 11 Jul 2005 09:17:55 +0200 ·
On 5 Jul 2005, at 17:59, Henrik Stoerner wrote:
> It's been a while, but I'm trying to get around to this:

Very much appreciated!

> On Wed, Jun 29, 2005 at 05:16:37PM +0200, Peter Welter wrote:
>> At 14:01 cpu, procs, msgs, disk went purple.
>> At 14:02 cpu, procs, msgs, disk went clear.
>> At 14:06 cpu, procs, msgs, disk went purple.
>> At 14:06 cpu, procs, msgs, disk went clear.
>>
>> Perhaps I should add the following info: the Hobbit server gets its
>> data from a Big Brother server
>
> Are you running the BBDISPLAY part of Big Brother? That could explain
> the purples, because a BBDISPLAY server will send out status messages
> to change a color to purple - Hobbit doesn't, that's handled
> internally in the hobbit daemon.

Yes, the Hobbit server gets its data from the BBDISPLAY BB-server,
using the BBRELAY option.

But I still do not completely understand the bouncing effect.

Regards,
Peter
list Henrik Størner · Mon, 11 Jul 2005 09:38:18 +0000 (UTC) ·
In <user-42e03adc1f99@xymon.invalid> Gee Pee <user-f55666bd0d1e@xymon.invalid> writes:

> On 5 Jul 2005, at 17:59, Henrik Stoerner wrote:
>> Are you running the BBDISPLAY part of Big Brother? That could explain
>> the purples, because a BBDISPLAY server will send out status messages
>> to change a color to purple - Hobbit doesn't, that's handled
>> internally in the hobbit daemon.
>
> Yes, the Hobbit server gets its data from the BBDISPLAY BB-server,
> using the BBRELAY-option.
>
> But I do not completely understand the bouncing effect?

The original bb-display.sh script - and my bbgen package, if you happen
to use that - generates purple status messages as part of updating the
webpages.

Hobbit does this differently: Statuses change to purple "automatically"
inside the Hobbit daemon. When Hobbit checks for this, it will first look
at the "ping" test, and if this indicates that the host is down, it will
change the tests to "clear" instead of purple.
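
As a rough illustration of that rule (not Hobbit's actual code; the
function name is made up, and treating only a red ping as "host is
down" is an assumption), the decision could be sketched in Python like
this:

```python
def color_for_expired_status(ping_color: str) -> str:
    """Pick the color for a status whose lifetime has run out.

    Hypothetical helper that models the rule described above; it is not
    taken from the Hobbit sources.
    """
    # Assumption: only a red ping test counts as "host is down".
    if ping_color == "red":
        return "clear"   # host is down, so the missing reports are expected
    return "purple"      # host looks reachable, but the test went silent


print(color_for_expired_status("red"))    # clear
print(color_for_expired_status("green"))  # purple
```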

So I think what happens is this:

* you originally (while the host was up) received status messages for
  ping, cpu, disk, msgs and perhaps some other tests.
* The host goes down. The ping test changes to red, and the cpu, disk
  etc. local tests are no longer updated.
* After 30 minutes the lifetime of the cpu etc. tests expires. Hobbit
  notices this but also sees that the ping-test is red, so it changes
  status to clear.
* On your BBDISPLAY server, the web update now runs. It also sees these
  tests as having expired, so it generates a purple status-update which
  it forwards (because of BBRELAY) to the Hobbit server. Hobbit handles
  this as any other status message and changes the status to purple.
* Shortly afterwards, Hobbit again scans for messages that have expired,
  finds these tests again and changes them to clear because the "ping"
  test is still red.
* After 5 minutes, the BBDISPLAY server again updates the webpages, sees
  these tests as having expired, and generates a new purple status message
  which it forwards to the Hobbit server. Status goes purple.
* etc. ad infinitum.
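
The loop in the steps above can be replayed with a small Python sketch.
It is only a toy model: the 5-minute BBDISPLAY update comes from the
description above, while the one-minute interval for Hobbit's own
expiry scan and all names in the code are assumptions made for the
illustration.

```python
def simulate_bounce(minutes: int = 12):
    """Return (minute, changed_by, new_color) for every status change."""
    ping_color = "red"   # the host stays down for the whole run
    status = "clear"     # Hobbit has already cleared the expired test
    changes = []
    for minute in range(1, minutes + 1):
        if minute % 5 == 0:
            # BBDISPLAY web update: it sees the expired test and relays
            # a purple status message to the Hobbit server.
            new_color, source = "purple", "bbdisplay"
        else:
            # Hobbit's expiry scan (assumed to run every minute): ping is
            # red, so an expired test goes clear rather than purple.
            new_color = "clear" if ping_color == "red" else "purple"
            source = "hobbit"
        if new_color != status:
            status = new_color
            changes.append((minute, source, new_color))
    return changes


for minute, source, color in simulate_bounce():
    print(f"t+{minute:02d}m  {source:9s} -> {color}")
```

Each relayed purple is undone by the next Hobbit scan, so the status
flips twice per BBDISPLAY cycle, and each flip can trigger a "stopped
reporting" or "recovered" mail.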

One could argue that Hobbit should ignore any incoming status message
whose color is "purple" (because that should never happen: this color
should only be triggered internally in Hobbit). I'll add a check for this.
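
A minimal Python sketch of such a check, assuming statuses are kept in
a plain dictionary keyed by test name (the function, the dictionary and
the key format are hypothetical and do not reflect how the Hobbit
daemon is actually written):

```python
def apply_status(statuses: dict, test: str, color: str) -> bool:
    """Apply an incoming status message; return True if it was accepted."""
    # Purple is reserved for the server's own expiry handling, so a
    # purple status arriving from outside is dropped.
    if color == "purple":
        return False
    statuses[test] = color
    return True


statuses = {"host.cpu": "clear"}
print(apply_status(statuses, "host.cpu", "purple"), statuses["host.cpu"])  # False clear
print(apply_status(statuses, "host.cpu", "green"), statuses["host.cpu"])   # True green
```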


Regards,
Henrik