Xymon Mailing List Archive

Frequent purple alerts

9 messages in this thread

list Jaime Kikpole · Wed, 10 Jul 2019 08:53:57 -0400 ·
I have a number of systems running the PowerShell Xymon client.  One and
only one of them frequently gives purple alerts on all tests.  I haven't
found any pattern for this yet.  They just seem to happen several times per
day and then clear themselves after anything from a few minutes to a few
hours.  I tried removing and reinstalling the client, but that didn't seem
to help.

Any suggestions?


Jaime Kikpole

Director of Technology & Innovations
Cairo-Durham Central School District
(XXX) XXX-XXXX, x59500
cairodurham.org <http://www.cairodurham.org>

Technical Support:
user-2eed5d3dd752@xymon.invalid
go.cairodurham.org/techtips

[image: Google Certified Educator, Level 1] [image: Google Certified
Educator, Level 2]

-- 
This electronic message and any attachment(s) may contain confidential or legally privileged information protected by law from further disclosure and is intended only for the individual or entity identified above as the addressee. If you are not the addressee (or the employee or agency responsible to deliver it to the addressee), or if this message has been addressed to you in error, you are hereby notified that you may not copy, forward, disclose or use any part of this message or any attachment(s). Please notify the sender immediately by return email or telephone and permanently delete this message and attachment(s) from your system.
list Timothy Williams · Wed, 10 Jul 2019 09:03:46 -0400 ·
What OS version are they? Does it happen more often after Patch Tuesday? We
have some 2008 to 2012 servers where the CPU goes very high while the OS runs
its update scans to see what patches it needs. This happens every few hours
(set by the WSUS frequency), and it is so severe that XymonClient can't run
and send its files to the Xymon server.

Tim Williams


list Jaime Kikpole · Wed, 10 Jul 2019 09:30:15 -0400 ·
It's Windows Server 2012R2 and it happens several times a day, every day.
I could take a look at the CPU load, though.  Thanks for the idea.
list Paul Root · Wed, 10 Jul 2019 15:44:10 +0000 ·
Purple means that it is not getting updates to the server in a timely manner.

This could be the network is too congested. Or the computer is overloaded. Or even that it is sending messages that are too big (a frequent Windows problem in my experience – log files get so big).

Does it go purple for 1 minute, 5 minutes, 30 minutes?

Shorter durations suggest that the client isn't finishing its run and sending the update in time. Longer durations suggest connectivity issues.
This communication is the property of CenturyLink and may contain confidential or privileged information. Unauthorized use of this communication is strictly prohibited and may be unlawful. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy all copies of the communication and any attachments.
list Jaime Kikpole · Wed, 10 Jul 2019 12:21:42 -0400 ·
On Wed, Jul 10, 2019 at 11:44 AM Root, Paul T <user-76fdb6883669@xymon.invalid>
wrote:
> This could be the network is too congested.

That seems unlikely.  The Xymon server and clients are all on one of two VM
clusters, so there aren't a lot of "hops" and the bandwidth is very high.

The Xymon client is one of two AD controllers, named dir1 and dir2.  The
one with purple alerts is dir2.  On dir1, we also have .1x authentication,
but they're otherwise pretty much the same.  Their biggest difference is
RAM:  8GB for dir1 and 4GB for dir2.  Do you think I should increase the
RAM?  Other than leaving Task Manager open at all times and hoping to catch
it as soon as the purple alert occurs, I'm not sure how to check if it's
running out of RAM.

> Or the computer is overloaded. Or even that it is sending messages that
> are too big (a frequent Windows problem in my experience – log files get so
> big).

Any way to check on the log size?

> Does it go purple for 1 minute, 5 minutes, 30 minutes?

I just looked at the log.  The purple alert durations are roughly 40
minutes each during the last three occasions.

If it helps, this server is an Active Directory domain controller, but it
is one of two and the other one (a) has more stuff running on it, such as
.1x authentication for our wifi and (b) isn't having this issue.

Thanks for the advice!


list Timothy Williams · Wed, 10 Jul 2019 12:40:57 -0400 ·
In my xymonclient_config.xml file I have the log written to C:\Logs with
<clientlogfile>c:\Logs\xymonclient.log</clientlogfile>. Yours may be in the
same folder as the script.
Look at the xymonclient.log file, which records the transmission data, and
check the size reported on the "Sent ... bytes" line. If the buffers are set
too low on the Xymon server, the data file gets truncated, which usually
causes white, not purple.

2019-07-10 12:28:43.087  Sending to server
2019-07-10 12:28:43.087  Using "original" ASCII encoding
2019-07-10 12:28:43.087  Connecting to host 128.172.5.33
2019-07-10 12:28:43.103  Sent 54761 bytes to server
2019-07-10 12:28:43.309  Received 436 bytes from server
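
To check the sent size across a whole log rather than by eye, something like the following Python sketch could pull out every "Sent N bytes to server" line. It assumes the line format shown in the excerpt above; the function name sent_sizes is made up for illustration, and other client versions may log differently.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2019-07-10 12:28:43.103  Sent 54761 bytes to server
# The fractional seconds are treated as optional.
SENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?\s+Sent (\d+) bytes to server"
)


def sent_sizes(lines):
    """Return a list of (timestamp, byte_count), one per 'Sent' line."""
    results = []
    for line in lines:
        match = SENT_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            results.append((stamp, int(match.group(2))))
    return results


if __name__ == "__main__":
    # Sample lines copied from the log excerpt in this message.
    sample = [
        "2019-07-10 12:28:43.087  Connecting to host 128.172.5.33",
        "2019-07-10 12:28:43.103  Sent 54761 bytes to server",
        "2019-07-10 12:28:43.309  Received 436 bytes from server",
    ]
    for stamp, size in sent_sizes(sample):
        print(stamp, size)
```

To run it against a real log, pass the lines of the file instead of the sample list, for example the result of opening c:\Logs\xymonclient.log and reading its lines.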

Tim Williams

list Jaime Kikpole · Wed, 31 Jul 2019 16:32:55 -0400 ·
Sorry to resurrect this old thread, but I finally was able to grab the logs
from the Xymon client during a purple alert.  Usually, it would go back to
green before I could notice, switch gears, and begin working on it.

Thanks, Timothy Williams, for pointing out the file uploading parts of the
logs.  Based on that, I found these lines in the xymonclient.log file:
2019-07-31 15:25:38  Connecting to host 163.153.163.90
2019-07-31 15:25:59  ERROR: Cannot connect to host monitor1.cairodurham.org
(163.153.163.90) : System.Management.Automation.MethodInvocationException:
Exception calling "Connect" with "2" argument(s): "A connection attempt
failed because the connected party did not properly respond after a period
of time, or established connection failed because connected host has failed
to respond 163.153.163.90:1984" ---> System.Net.Sockets.SocketException: A
connection attempt failed because the connected party did not properly
respond after a period of time, or established connection failed because
connected host has failed to respond 163.153.163.90:1984

It looks like it was somehow resolving the FQDN (monitor1.cairodurham.org)
to its external IP address instead of its internal IP address.  I'm not
sure why.  I just checked the DNS settings and they're the same as another
Windows 2012R2 server that isn't having this issue.

I changed the FQDN to the internal IP address and restarted the service.
Everything went green almost immediately.

Any idea how it could resolve to the public IP address 2 - 4 times each day
but only for a few hours total each day?
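
One way to narrow down when the flip happens would be to scan the client log for connection attempts to a non-private address, and to log what the name resolves to at intervals. The Python sketch below is only an illustration: the function names, the injectable resolver, and the 10.0.0.5 address in the sample are hypothetical, and it assumes the internal address sits in a private-use range such as 10.0.0.0/8.

```python
import ipaddress
import re
import socket

# Matches lines such as:
#   2019-07-31 15:25:38  Connecting to host 163.153.163.90
CONNECT_RE = re.compile(
    r"^(\S+ \S+)\s+Connecting to host (\d{1,3}(?:\.\d{1,3}){3})\s*$"
)


def is_internal(ip_text):
    """True if the address is in a private-use or other special-purpose range."""
    return ipaddress.ip_address(ip_text).is_private


def check_name(hostname, resolver=socket.gethostbyname):
    """Resolve hostname and report (ip, is_internal) for the answer."""
    ip_text = resolver(hostname)
    return ip_text, is_internal(ip_text)


def external_connects(lines):
    """Return (timestamp, ip) for each connect attempt to a non-internal IP."""
    hits = []
    for line in lines:
        match = CONNECT_RE.match(line.strip())
        if match and not is_internal(match.group(2)):
            hits.append((match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = [
        "2019-07-31 15:25:38  Connecting to host 163.153.163.90",
        "2019-07-31 15:30:38  Connecting to host 10.0.0.5",  # hypothetical
    ]
    for stamp, ip_text in external_connects(sample):
        print(stamp, "connected to external address", ip_text)
```

Calling check_name on the monitor's FQDN from a scheduled task every few minutes, and writing the result to a file, would show whether the bad answers line up with the purple periods.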
list Timothy Williams · Thu, 1 Aug 2019 11:02:03 -0400 ·
I don't know about the DNS switching around, unless it is due to some DC
synchronizing stuff, and one has a manual entry the other doesn't? Two ways
to work around that are to use the IP in the <servers> tag of the Xymon
settings file (I think that is what you said you did), or to add the
internal IP to the server's HOSTS file; both require future editing if the
IP of the hostname gets changed.
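
For the HOSTS file route, a quick check that the entry is actually present could look like this Python sketch. The path is the usual Windows location, the function name hosts_lookup is made up for illustration, and the 10.0.0.5 address in the comment is a placeholder, since the real internal IP is not given in this thread.

```python
import os

# Usual location of the HOSTS file on Windows.
HOSTS_PATH = r"C:\Windows\System32\drivers\etc\hosts"


def hosts_lookup(hosts_text, hostname):
    """Return the IP mapped to hostname in HOSTS-format text, or None.

    Each entry is "IP name [alias ...]"; anything after "#" is a comment.
    Names are compared case-insensitively.
    """
    wanted = hostname.lower()
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return fields[0]
    return None


if __name__ == "__main__":
    # An entry such as "10.0.0.5  monitor1.cairodurham.org" (placeholder IP)
    # would make this print 10.0.0.5; without one it prints None.
    if os.path.exists(HOSTS_PATH):
        with open(HOSTS_PATH, encoding="utf-8", errors="replace") as handle:
            print(hosts_lookup(handle.read(), "monitor1.cairodurham.org"))
```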

I should have mentioned that I use the tag
<clientlogretain>4</clientlogretain>  in my xymonclient_config.xml file to
save multiple versions of the logs to give me some time to look at them and
track changes from one file to another when I make a change.

Glad you are able to get it stable.

Tim Williams
VCU Computer Center


list Paul Root · Thu, 1 Aug 2019 15:55:39 +0000 ·
You can tell Xymon to use the IP address in the hosts.cfg file. Put testip in the comment section of that host's entry. See the hosts.cfg man page.