Xymon Mailing List Archive

Phantom red statuses (Fwd: Xymon [750466] mgmtconsole:msgs CRITICAL (RED))

5 messages in this thread

list Greg Earle · Thu, 11 Feb 2016 14:15:06 -0800 ·
I'm running Xymon 4.3.12-2 server (yeah, I know ...) on my management system.
(RHEL 6.5 currently)

A couple of days ago I migrated our central syslog server over to the
Xymon server, so now "/var/log/messages" is getting a ton of stuff in
it that it never had before since all my systems are now reporting in
to it.

Ever since then I've seen something weird - every hour (for about 17+ hours)
I was getting RED alerts for the management console's own "msgs" status, but
the actual e-mail notifications don't show anything marked red in them!

It's either yellow or, as in the forwarded message below, green.  I have no
idea why I was getting RED alerts for this file if it thinks it's yellow or
green - any ideas?

The only other thing I can add is that when I go to the Web page for
mgmtconsole:msgs, it says "WARNING: Flapping status" at the top.

Is that a clue?

(Update: interestingly, it looks like the status has finally changed to
 green a few minutes ago - after having been red for nearly 17 1/2 hours.
 Still seeing "WARNING: Flapping status" on the svcstatus Web page, though.)

Thanks,

	- Greg
Begin forwarded message:

From: xymon Monitor <user-c84c5ca2f00e@xymon.invalid>
Subject: Xymon [750466] mgmtconsole:msgs CRITICAL (RED)
Date: February 11, 2016 at 11:57:25 AM PST
To: user-9179ff85409c@xymon.invalid

green Thu Feb 11 11:57:24 PST 2016 - Log files ok
<pre>
</pre>

No entries in <a href="/xymon-cgi/svcstatus.sh?CLIENT=mgmtconsole&amp;SECTION=msgs:/var/log/messages">/var/log/messages</a>


Full log <a href="/xymon-cgi/svcstatus.sh?CLIENT=mgmtconsole&amp;SECTION=msgs:/var/log/messages">/var/log/messages</a>
<...SKIPPED...>
Feb 11 11:57:17 host7 nrpe[20194]: [ID 927837 daemon.info] connect from mtfuji

[... rest elided ... ]

See http://mgmtconsole/xymon-cgi/svcstatus.sh?HOST=mgmtconsole&SERVICE=msgs
list Japheth Cleaver · Thu, 11 Feb 2016 15:07:41 -0800 ·
Hi Greg,

The flapping warning is what tips it off. Flap-detection in xymond
functions by looking at alternating alert states (eg, red/green) happening
within a certain period and "pegging" it at the higher status while it's
going back and forth. This prevents spurious recovery messages and
untimely pager death.
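
As a rough illustration of the "pegging" idea described above, here is a toy Python model. It is a sketch under stated assumptions, not xymond's actual code: the class name, the rule of pegging at the worst color seen in the window, and the default thresholds (3 changes within 300 seconds, figures mentioned later in this thread as a recollection) are all assumptions.

```python
from collections import deque

# Severity ordering used to pick the "higher" status (assumed ordering).
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


class FlapDetector:
    """Toy model of flap detection; NOT xymond's implementation."""

    def __init__(self, max_changes=3, window_seconds=300):
        # Both defaults are assumptions, not verified xymond settings.
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self.changes = deque()  # (timestamp, old_color, new_color)
        self.last_color = None

    def update(self, timestamp, color):
        """Record an incoming status and return the color to display."""
        if self.last_color is not None and color != self.last_color:
            self.changes.append((timestamp, self.last_color, color))
        self.last_color = color

        # Forget color changes that fell out of the observation window.
        while self.changes and timestamp - self.changes[0][0] > self.window_seconds:
            self.changes.popleft()

        if len(self.changes) >= self.max_changes:
            # Flapping: peg at the worst color involved in the recent changes.
            involved = [c for _, old, new in self.changes for c in (old, new)]
            return max(involved, key=SEVERITY.__getitem__)
        return color
```

Feeding it alternating red/green reports a minute apart keeps the displayed color at red once the third change arrives, even when the latest report is green.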

The thing is, though, for a normal functioning 'msgs' test it's almost
impossible for it to actually flap out of the box. The logfetch program
which controls the raw data sent to xymond_client for evaluation actually
walks back 6 "periods" (run cycles) in the log file and sends all
subsequent data up to xymond. This helps with mitigating any lost messages
by turning the 'msgs' test into a "Recent Errors in the Log" test instead
of a direct reflection of the event, since xymon is a state-based
monitoring system rather than a single-fire-and-forget (trap) based
system.

Out of the box, a single red event will cause the msgs test to remain red
for a solid 30m -- far too long for flapping to get triggered in most
cases.
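
The 30 minute figure follows from the 6-cycle lookback if the client runs every 5 minutes. A minimal sketch of that arithmetic, assuming a 300-second run interval (an assumption, check your client configuration) and using a hypothetical helper name:

```python
RUN_INTERVAL_SECONDS = 300  # assumed 5-minute client cycle
LOOKBACK_CYCLES = 6         # "periods" that logfetch walks back, per the text above


def msgs_color(now, red_event_times):
    """Return "red" while any red-matching log line is still inside the
    lookback window, otherwise "green" (yellow is ignored in this sketch)."""
    window = RUN_INTERVAL_SECONDS * LOOKBACK_CYCLES  # 1800 s = 30 min
    if any(0 <= now - t < window for t in red_event_times):
        return "red"
    return "green"
```

With a single red event at time 0, every run from 0 through 1500 seconds reports red, and the run at 1800 seconds is the first to report green again.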


Is there any chance you have multiple servers reporting in with the name
'mgmtconsole'? Especially if you're not using FQDN (which it doesn't seem
like you are), that seems like something that might cause this: Two
different servers with the same name, each sending their own red/green
states every few minutes.


HTH,
-jc
list Ryan Novosielski · Thu, 11 Feb 2016 18:13:46 -0500 ·
quoted from Greg Earle
On Feb 11, 2016, at 5:15 PM, Greg Earle <user-8f45ae7a27f3@xymon.invalid> wrote:

I'm running Xymon 4.3.12-2 server (yeah, I know ...) on my management system.
(RHEL 6.5 currently)

A couple of days ago I migrated our central syslog server over to the
Xymon server, so now "/var/log/messages" is getting a ton of stuff in
it that it never had before since all my systems are now reporting in
to it.

Ever since then I've seen something weird - every hour (for about 17+ hours)
I was getting RED alerts for the management console's own "msgs" status, but
the actual e-mail notifications don't show anything marked red in them!

It's either yellow or, as in the forwarded message below, green.  I have no
idea why I was getting RED alerts for this file if it thinks it's yellow or
green - any ideas?

The only other thing I can add is that when I go to the Web page for
mgmtconsole:msgs, it says "WARNING: Flapping status" at the top.

Is that a clue?

(Update: interestingly, it looks like the status has finally changed to
green a few minutes ago - after having been red for nearly 17 1/2 hours.
Still seeing "WARNING: Flapping status" on the svcstatus Web page, though.)
Yes, flapping status is essentially “pegged at red due to too many status changes.”

--
____ *Note: UMDNJ is now Rutgers-Biomedical and Health Sciences*
|| \\UTGERS      |---------------------*O*---------------------
||_// Biomedical | Ryan Novosielski - Senior Technologist
|| \\ and Health | user-46c89e614701@xymon.invalid - 973/972.0922 (2x0922)
||  \\  Sciences | OIRT/High Perf & Res Comp - MSB C630, Newark
     `'
list Greg Earle · Fri, 12 Feb 2016 15:20:07 -0800 ·
quoted from Japheth Cleaver
On Feb 12, 2016, at 3:00 AM, xymon-request at xymon.com wrote:

Is there any chance you have multiple servers reporting in with the name
'mgmtconsole'?  Especially if you're not using FQDN (which it doesn't seem
like you are), that seems like something that might cause this: Two
different servers with the same name, each sending their own red/green
states every few minutes.
Thanks JC.  Interesting theory: you see, the old syslog server was on a
separate machine and it never had this Flapping status issue, despite it
getting all of the same syslog messages.  "mgmtconsole" is my Xymon server
so I thought "Oh maybe because it's acting as both server and client that's
what's screwing it up", but before this I never got any anomalous alerts
from "mgmtconsole" itself, just the expected reds/yellows from any
(ab)normal condition that triggered them.

So no, there aren't two different servers (and yes, I don't use FQHN's).

The perma-red state is back, btw.  So, given that the old syslog server
wasn't in perma-red/Flapping state, why would the new syslog/Xymon combo
server be in it?  Also, given that the actual contents of "/var/log/messages"
aren't causing red alerts, is the red alert state caused solely by it
flapping between yellow and green?  (I still don't get why the old machine
wasn't in a similar perma-red/Flapping state.)

Where should I look to try and cure this?

Thanks,

	- Greg
list Japheth Cleaver · Fri, 12 Feb 2016 17:30:59 -0800 ·
Hmm.

The first step will be to track down what status messages xymond is
receiving, precisely.

Something like the following:
    xymoncmd xymond_channel --channel=status xymond_capture --hosts=mgmtconsole --tests=msgs

(or xymoncmd xymond_channel --channel=status --filter='|msgs|' cat > /tmp/foo, etc...)

should spit out each incoming 'msgs' status message for the host. By
default, I believe the status would need to change color 3x within 300s
for flap detection to be triggered. Once you have that, compare the various colors coming in to
try to find a distinguishing pattern. This still seems like something most
likely caused by varying client sources being falsely reported as the same
host, so keep an eye on the IPs and so forth.
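
Once the raw channel output has been saved to a file, a short script can tabulate colors per sender. The sketch below assumes each message starts with a pipe-delimited header line of the form `@@status#seq/host|timestamp|sender|origin|hostname|testname|expiretime|color|...`. That layout is an assumption to verify against your own capture, and the function names are hypothetical.

```python
def parse_status_headers(text, host, test):
    """Extract (timestamp, sender, color) from '@@status' header lines.

    The field positions are ASSUMED from the status channel header layout;
    adjust the indexes if your capture looks different.
    """
    records = []
    for line in text.splitlines():
        if not line.startswith("@@status"):
            continue
        fields = line.split("|")
        if len(fields) < 8:
            continue  # not a header in the assumed layout
        timestamp, sender = fields[1], fields[2]
        hostname, testname, color = fields[4], fields[5], fields[7]
        if hostname == host and testname == test:
            records.append((float(timestamp), sender, color))
    return records


def summarize(records):
    """Group colors by sender and count color changes between messages."""
    by_sender = {}
    changes = 0
    previous = None
    for _, sender, color in records:
        by_sender.setdefault(sender, []).append(color)
        if previous is not None and color != previous:
            changes += 1
        previous = color
    return by_sender, changes
```

For example, `summarize(parse_status_headers(open("/tmp/foo").read(), "mgmtconsole", "msgs"))` would show whether more than one sender address is reporting, and how often the color changes from one message to the next.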

Additionally, check this server itself for anything that might cause
duplication... More than one copy of runclient.sh and/or xymonlaunch
executing, or perhaps permissions/updating/corruption problems on the
logfetch ".status" file it's saving in $XYMONTMP.
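
A quick way to look for duplicate launchers is to count matching entries in the process list. This is only a sketch: the function names are hypothetical, matching is a plain substring test on the command line, and what counts as "too many" depends on the install (a combined server and client host may legitimately run more than one xymonlaunch).

```python
import subprocess

# Process names mentioned above; extend as needed for your install.
WATCHED = ("runclient.sh", "xymonlaunch")


def count_processes(ps_output, names=WATCHED):
    """Count lines of `ps -eo pid,args` output whose command mentions each name."""
    counts = {name: 0 for name in names}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)          # -> [pid, command line]
        if len(parts) < 2:
            continue
        for name in names:
            if name in parts[1]:
                counts[name] += 1
    return counts


def current_ps_output():
    """Run ps and return its output (requires Python 3.7+ and a POSIX ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `count_processes(current_ps_output())` on the Xymon server returns a dictionary of counts that can be compared against what the configuration is expected to start.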

Finally, check the raw client data message as well for anything unusual
around the [msgs:/some/path/here] sections. It's unlikely, but possible
there's a parsing bug around there that might be confusing xymond_client
into generating multiple or erroneous 'msgs' test entries.
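
To spot repeated or odd-looking sections in a saved copy of the raw client message, something like the following may help. It is a sketch that assumes section headers appear alone on a line as `[msgs:/path]`, as in the text above; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches a section header such as "[msgs:/var/log/messages]" on its own line.
MSGS_SECTION = re.compile(r"^\[msgs:(.+)\]\s*$")


def msgs_sections(client_message):
    """Count how often each [msgs:...] section path appears in the message."""
    counts = Counter()
    for line in client_message.splitlines():
        match = MSGS_SECTION.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def duplicated_sections(client_message):
    """Return the log paths that appear in more than one [msgs:...] section."""
    return sorted(p for p, n in msgs_sections(client_message).items() if n > 1)
```

A non-empty result from `duplicated_sections()` would point at the same log file being reported twice in one client message, which is worth ruling out before suspecting a parsing bug.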


HTH,
-jc