Xymon Mailing List Archive

purple/clear problems?!

Peter Welter
Wed, 29 Jun 2005 17:16:37 +0200
Message-Id: <user-5de3279ff970@xymon.invalid>

Sorry to bother you again, but this afternoon I ran into the same
bouncing problem:

the server being tested went down at 13:31. After 10 minutes the conn
test worked again, but 5 minutes later conn failed and did not come
back up afterwards.

Then the following happened:

At 14:01 cpu, procs, msgs, disk went purple.
At 14:02 cpu, procs, msgs, disk went clear.
At 14:06 cpu, procs, msgs, disk went purple.
At 14:06 cpu, procs, msgs, disk went clear.
At 14:11 cpu, procs, msgs, disk went purple.
At 14:11 cpu, procs, msgs, disk went clear.
At 14:16 cpu, procs, msgs, disk went purple.
At 14:16 cpu, procs, msgs, disk went clear.

Then I blue-dotted the test until the machine came back up.

Perhaps I should add the following: the Hobbit server gets its data
from a Big Brother server (yes, this is the last week in which Hobbit
runs in parallel with a BB server; it goes into production next week,
after my week off). So Hobbit itself is not yet performing the network
tests.

On the Hobbit server the bbd and bbtest columns are blue-dotted, and I
have commented out the bbnet test in hobbitlaunch.cfg, so I assumed the
network tests were not being performed twice and that everything should
work fine. Well, the mail actions do work :-]

Should I disable the bbretest test as well?

Thanks for your help,

Peter

2005/6/27, Peter Welter <user-f55666bd0d1e@xymon.invalid>:
Hi all,

Hope you could shed some light on this one. I'm running Hobbit 4.0.4 on
Solaris 8. I've enabled alerting as follows, using macros in
hobbit-alert.cfg:

$UNIXDAY=MAIL user-54ad2508e1e4@xymon.invalid TIME=*:0000:2359 REPEAT=1d RECOVERED

$UNIXNIGHT=MAIL user-54ad2508e1e4@xymon.invalid TIME=*:0000:2359 DURATION>30m REPEAT=60m RECOVERED SERVICE=!cpu,!msgs COLOR=!yellow

This morning I found about 4000 mails in my mailbox reporting "stopped
reporting" and "recovered". What happened was that the connection to
'host' failed. I got one email about that event, as expected. But until
the connection was re-established last night, every test other than conn
kept bouncing. I could set up a dependency (don't alert if conn is down),
but that is rather tedious since I have many hosts to manage.

What I don't understand is why the other tests keep bouncing, whereas
the conn test simply went down on Friday and came back up last night.
See below:

...
Bigbrother account      Hobbit [705353] host:disk stopped reporting (PURPLE)        Sat 25-06-05 6:04        1 KB
Bigbrother account      Hobbit [735704] host:msgs stopped reporting (PURPLE)        Sat 25-06-05 6:04        1 KB
Bigbrother account      Hobbit [115710] host:procs stopped reporting (PURPLE)        Sat 25-06-05 6:04        1 KB
Bigbrother account      Hobbit [963024] host:cpu stopped reporting (PURPLE)        Sat 25-06-05 6:04        1 KB
Bigbrother account      Hobbit host:msgs recovered      Sat 25-06-05 6:05        1 KB
Bigbrother account      Hobbit host:procs recovered     Sat 25-06-05 6:05        1 KB
Bigbrother account      Hobbit host:disk recovered      Sat 25-06-05 6:05        1 KB
Bigbrother account      Hobbit host:cpu recovered       Sat 25-06-05 6:05        1 KB
Bigbrother account      Hobbit [982927] host:procs stopped reporting (PURPLE)        Sat 25-06-05 6:09        1 KB
Bigbrother account      Hobbit [276744] host:disk stopped reporting (PURPLE)        Sat 25-06-05 6:09        1 KB
Bigbrother account      Hobbit [850873] host:msgs stopped reporting (PURPLE)        Sat 25-06-05 6:09        1 KB
Bigbrother account      Hobbit [612113] host:cpu stopped reporting (PURPLE)        Sat 25-06-05 6:09        1 KB

The procs check, for example, turns purple, as expected, but then a
clear status appears:

...
Sun Jun 26 20:09:44 2005        clear   0:04:43
Sun Jun 26 20:09:27 2005        purple  0:00:17
Sun Jun 26 20:04:32 2005        clear   0:04:55
Sun Jun 26 20:04:27 2005        purple  0:00:05
Sun Jun 26 20:00:22 2005        clear   0:04:05
Sun Jun 26 19:59:26 2005        purple  0:00:56
...

Thanks,

Peter