purple/clear problems?!
list Peter Welter
Hi all,

I hope you could shine a light on this one. I'm running Hobbit 4.0.4 on Solaris 8, and I've set up alerting with macros in hobbit-alert.cfg as follows:

    $UNIXDAY=MAIL user-54ad2508e1e4@xymon.invalid TIME=*:0000:2359 REPEAT=1d RECOVERED
    $UNIXNIGHT=MAIL user-54ad2508e1e4@xymon.invalid TIME=*:0000:2359 DURATION>30m REPEAT=60m RECOVERED SERVICE=!cpu,!msgs COLOR=!yellow

This morning I found about 4000 mails in my mailbox, all reporting "stopped reporting" and "recovered". What happened was that the connection to 'host' failed. I got one email for that event, as expected. But until the connection was re-established last night, all the other tests (everything except conn) kept bouncing.

I could build a dependency (don't alert when conn is down), but that is rather tedious since I have many hosts to manage. What I don't understand is why the other tests bounce, while the conn test simply went down on Friday and came back up last night. See below:

...
Bigbrother account Hobbit [705353] host:disk stopped reporting (PURPLE) Sat 25-06-05 6:04 1 KB
Bigbrother account Hobbit [735704] host:msgs stopped reporting (PURPLE) Sat 25-06-05 6:04 1 KB
Bigbrother account Hobbit [115710] host:procs stopped reporting (PURPLE) Sat 25-06-05 6:04 1 KB
Bigbrother account Hobbit [963024] host:cpu stopped reporting (PURPLE) Sat 25-06-05 6:04 1 KB
Bigbrother account Hobbit host:msgs recovered Sat 25-06-05 6:05 1 KB
Bigbrother account Hobbit host:procs recovered Sat 25-06-05 6:05 1 KB
Bigbrother account Hobbit host:disk recovered Sat 25-06-05 6:05 1 KB
Bigbrother account Hobbit host:cpu recovered Sat 25-06-05 6:05 1 KB
Bigbrother account Hobbit [982927] host:procs stopped reporting (PURPLE) Sat 25-06-05 6:09 1 KB
Bigbrother account Hobbit [276744] host:disk stopped reporting (PURPLE) Sat 25-06-05 6:09 1 KB
Bigbrother account Hobbit [850873] host:msgs stopped reporting (PURPLE) Sat 25-06-05 6:09 1 KB
Bigbrother account Hobbit [612113] host:cpu stopped reporting (PURPLE) Sat 25-06-05 6:09 1 KB

The process check, for example, turns purple, of course, but then a clear status appears:

...
Sun Jun 26 20:09:44 2005  clear   0:04:43
Sun Jun 26 20:09:27 2005  purple  0:00:17
Sun Jun 26 20:04:32 2005  clear   0:04:55
Sun Jun 26 20:04:27 2005  purple  0:00:05
Sun Jun 26 20:00:22 2005  clear   0:04:05
Sun Jun 26 19:59:26 2005  purple  0:00:56
...

Thanks,
Peter
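The purple/clear history has a regular rhythm. A short Python sketch (the names `HISTORY` and `purple_intervals` are just for illustration) parses the rows above and measures the time between the starts of successive purple periods:

```python
from datetime import datetime

# Rows copied from the status history above (newest first): start time, color.
HISTORY = [
    ("Sun Jun 26 20:09:44 2005", "clear"),
    ("Sun Jun 26 20:09:27 2005", "purple"),
    ("Sun Jun 26 20:04:32 2005", "clear"),
    ("Sun Jun 26 20:04:27 2005", "purple"),
    ("Sun Jun 26 20:00:22 2005", "clear"),
    ("Sun Jun 26 19:59:26 2005", "purple"),
]


def purple_intervals(history):
    """Return the seconds between the starts of successive purple periods."""
    starts = sorted(
        datetime.strptime(ts, "%a %b %d %H:%M:%S %Y")
        for ts, color in history
        if color == "purple"
    )
    return [int((b - a).total_seconds()) for a, b in zip(starts, starts[1:])]


print(purple_intervals(HISTORY))  # [301, 300]
```

The purple periods start roughly every 300 seconds, while each one lasts less than a minute. The 14:01/14:06/14:11/14:16 timeline in the follow-up mail shows the same five-minute cadence.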
list Peter Welter
Following up on my message of 2005/6/27: sorry to bother you again, but this afternoon I ran into the same bouncing problem. The server being tested went down at 13:31. After 10 minutes conn worked again, but 5 minutes later conn failed and did not come back up. Then the following happened:

At 14:01 cpu, procs, msgs, disk went purple.
At 14:02 cpu, procs, msgs, disk went clear.
At 14:06 cpu, procs, msgs, disk went purple.
At 14:06 cpu, procs, msgs, disk went clear.
At 14:11 cpu, procs, msgs, disk went purple.
At 14:11 cpu, procs, msgs, disk went clear.
At 14:16 cpu, procs, msgs, disk went purple.
At 14:16 cpu, procs, msgs, disk went clear.

Then I blue-dotted (disabled) the tests until the machine came up again.

Perhaps I should add the following: the Hobbit server gets its data from a Big Brother server (yes, this is the last week in which Hobbit runs in parallel with a BB server; it goes into production next week, after my week off). So Hobbit is not performing the network tests yet. On the Hobbit server the bbd and bbtest columns are blue-dotted, and I have commented out the bbnet test in hobbitlaunch.cfg, so I assumed the network tests would not be performed twice and everything would work fine. Well, the mail alerts certainly work :-]

Should I disable the bbretest test as well?

Thanks for your help,
Peter
list Henrik Størner
It's been a while, but I'm trying to get around to this:
On Wed, Jun 29, 2005 at 05:16:37PM +0200, Peter Welter wrote:
At 14:01 cpu, procs, msgs, disk went purple.
At 14:02 cpu, procs, msgs, disk went clear.
At 14:06 cpu, procs, msgs, disk went purple.
At 14:06 cpu, procs, msgs, disk went clear.
Perhaps I should add the following info: the Hobbit server gets its data from a Big Brother server
Are you running the BBDISPLAY part of Big Brother? That could explain the purples, because a BBDISPLAY server sends out status messages that change a color to purple. Hobbit doesn't; that is handled internally in the hobbit daemon.

Regards,
Henrik
list Gee Pee
On 5 Jul 2005, at 17:59, Henrik Stoerner wrote:
It's been a while, but I'm trying to get around to this:
Very much appreciated!
Yes, the Hobbit server gets its data from the BBDISPLAY BB server, using the BBRELAY option. But I still don't completely understand the bouncing effect. Why does it happen?

Regards,
Peter
list Henrik Størner
In <user-42e03adc1f99@xymon.invalid> Gee Pee <user-f55666bd0d1e@xymon.invalid> writes:
On 5 Jul 2005, at 17:59, Henrik Stoerner wrote:
Are you running the BBDISPLAY part of Big Brother? That could explain the purples, because a BBDISPLAY server sends out status messages that change a color to purple. Hobbit doesn't; that is handled internally in the hobbit daemon.
Yes, the Hobbit server gets its data from the BBDISPLAY BB-server, using the BBRELAY-option.
But I do not completely understand the bouncing effect?
The original bb-display.sh script (and my bbgen package, if you happen to use that) generates purple status messages as part of updating the web pages. Hobbit does this differently: statuses change to purple "automatically" inside the Hobbit daemon. When Hobbit checks for this, it first looks at the "ping" test, and if that indicates the host is down, it changes the tests to "clear" instead of purple.

So I think this is what happens:

* While the host was up, you received status messages for ping, cpu, disk, msgs and perhaps some other tests.
* The host goes down. The ping test changes to red, and the local tests (cpu, disk etc.) are no longer updated.
* After 30 minutes the lifetime of the cpu etc. tests expires. Hobbit notices this, but also sees that the ping test is red, so it changes the status to clear.
* On your BBDISPLAY server, the web update now runs. It also sees these tests as expired, so it generates a purple status update, which it forwards (because of BBRELAY) to the Hobbit server. Hobbit handles this like any other status message and changes the status to purple.
* Shortly afterwards, Hobbit again scans for expired statuses, finds these tests again and changes them to clear because the "ping" test is still red.
* After 5 minutes, the BBDISPLAY server updates the web pages again, sees these tests as expired, and generates a new purple status message, which it forwards to the Hobbit server. The status goes purple.
* And so on, ad infinitum.

One could argue that Hobbit should ignore incoming status messages with the color "purple", because that should never happen: this color should only be set internally by Hobbit. I'll add a check for this.

Regards,
Henrik
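Henrik's sequence can be replayed as a toy simulation. This is a minimal Python sketch under stated assumptions, not Hobbit or Big Brother code: the function `simulate` and the flag `ignore_relayed_purple` are hypothetical. The 30-minute lifetime and the 5-minute BBDISPLAY cycle come from the thread, while the 60-second Hobbit expiry scan is an assumed value. It models one local test (e.g. procs) on a host that went down at t=0, so ping stays red and no real reports arrive. A relayed purple message is modeled as changing the color without counting as a fresh report, matching the description above.

```python
# Toy model of the purple/clear bounce; hypothetical, not Hobbit/BB source.
LIFETIME = 30 * 60          # status lifetime before a test counts as expired (from the thread)
BBDISPLAY_PERIOD = 5 * 60   # BBDISPLAY regenerates its web pages every 5 minutes (from the thread)
HOBBIT_SCAN = 60            # assumed: Hobbit checks for expired statuses once a minute


def simulate(duration, ignore_relayed_purple=False):
    """Return (time_in_seconds, color) changes for one local test.

    The host went down at t=0: ping is red from then on, and the test
    receives no further real reports, so its last report stays at t=0.
    """
    last_report = 0
    color = "green"
    changes = []

    for t in range(duration + 1):
        expired = t - last_report > LIFETIME

        # Hobbit daemon: an expired test goes clear because ping is red.
        if t % HOBBIT_SCAN == 0 and expired and color != "clear":
            color = "clear"
            changes.append((t, color))

        # BBDISPLAY: its web update sees the test as expired and relays a
        # purple status message to Hobbit (via BBRELAY). With the proposed
        # check, Hobbit simply drops such relayed purple messages.
        if t > 0 and t % BBDISPLAY_PERIOD == 0 and expired:
            if not ignore_relayed_purple and color != "purple":
                color = "purple"
                changes.append((t, color))

    return changes


if __name__ == "__main__":
    for t, c in simulate(3600):
        print(f"{t // 60:3d} min  {c}")
    print(simulate(3600, ignore_relayed_purple=True))
```

Without the check, the test flips between clear and purple every five minutes once its lifetime has expired, and with RECOVERED in the alert rules each flip can produce a "stopped reporting" or "recovered" mail for every affected test. With relayed purple messages ignored, the only change is the single transition to clear. The exact length of each purple spell depends on the assumed scan interval.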