Xymon Mailing List Archive

Monitoring Exchange 2010 - imaps/pop3s alarms

6 messages in this thread

list Shawn Heisey · Thu, 18 Nov 2010 14:02:04 -0700 ·
I recently migrated from Exchange 2003 to Exchange 2010.  Several times a day, I get alarms for imaps and pop3s, which clear less than a minute later.

The graph (http://www.elyograg.org/pop3s.png) shows hourly spikes in response time.  The alarms do not happen every hour, but when they do happen, they correspond to the spikes.  The email from xymon looks like this.  Note the "seconds" value, which is extremely low for a timeout.  Any time there's an alarm, the value is low like this.

----
Service pop3s on server.example.com is not OK : Service unavailable (connect timeout)


Seconds: 0.003435
----

There have been no complaints from people who use the service, and believe me, I'd hear about it if there was a problem.

The non-SSL versions are not alarming.  I just noticed that the graph for pop3 has two entries - pop3 and pop3s.  From what I can tell, it checks TLS on the standard test.  The pop3s graph on the pop3 entry looks identical to the pop3s graph I've included here, except it's red instead of blue.

I suspect that it's related to the throttling policies in Exchange 2010, but I don't know what to change.  I don't want to open the default throttling policy way up, but I did change the anonymous connection limit from 1 to 5.  Has anyone else had this problem and found a way to solve it?

Thanks,
Shawn
list Shawn Heisey · Thu, 18 Nov 2010 14:06:32 -0700 ·
I forgot to mention versions.  Xymon is the Debian package from lenny-backports (x86_64), version 4.3.0~beta2.dfsg-5~bpo50+1.  The Exchange server is running on Windows Server 2008 R2 with BBWin 0.12.
list Xymon User in Richmond · Thu, 18 Nov 2010 18:51:33 -0500 ·
quoted from Shawn Heisey
On Thu, November 18, 2010 16:02, Shawn Heisey wrote:
I have recently migrated from Exchange 2003 to Exchange 2010.  Several
times a day, I get alarms for  imaps and pop3s, which are resolved less
than a minute later.
Not the same thing, but I had a situation where internal web servers would
all go red a number of times a day then go green again on the next test. 
No complaints from users, and I couldn't identify the network anomaly
causing it, so I just use "badhttp:2:3:4" for them in bb-hosts.  They
pretty much stay in "smiley" green, but that's better than alarms for no
good purpose.

It's "badTEST" in the manpage.
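
For anyone unfamiliar with the tag, here is a rough sketch of how the three numbers in "badTEST:x:y:z" behave, going by the bb-hosts man page as I read it: fewer than x successive failures stays green, x or more goes clear, y or more goes yellow, z or more goes red. The function names are made up for illustration and are not part of Xymon, and the optional weekday/time suffix on the tag is not handled.

```python
def parse_badtest(tag):
    """Split a tag such as 'badhttp:2:3:4' into ('http', 2, 3, 4).

    Assumption: plain 'badTEST:x:y:z' form only, no time specification.
    """
    name, x, y, z = tag.split(":")
    if not name.startswith("bad"):
        raise ValueError("not a badTEST tag: %r" % tag)
    return name[3:], int(x), int(y), int(z)


def colour(failures, x, y, z):
    """Colour shown after `failures` successive failed tests."""
    if failures >= z:
        return "red"
    if failures >= y:
        return "yellow"
    if failures >= x:
        return "clear"
    return "green"


def replay(results, x, y, z):
    """Map a sequence of test results (True = passed) to colours."""
    streak = 0
    colours = []
    for ok in results:
        # A single success resets the run of failures.
        streak = 0 if ok else streak + 1
        colours.append(colour(streak, x, y, z))
    return colours
```

With 2:3:4, a single failed poll followed by a success never leaves green, which matches the behaviour described above. With 1:2:3, the first failure already shows as clear.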
list Shawn Heisey · Fri, 19 Nov 2010 01:30:32 -0700 ·
quoted from Xymon User in Richmond
On 11/18/2010 4:51 PM, Xymon User in Richmond wrote:
Not the same thing, but I had a situation where internal web servers would
all go red a number of times a day then go green again on the next test.
No complaints from users, and I couldn't identify the network anomaly
causing it, so I just use "badhttp:2:3:4" for them in bb-hosts.  They
pretty much stay in "smiley" green, but that's better than alarms for no
good purpose.

It's "badTEST" in the manpage.
Thanks!  This will get rid of the false alarms while I work out what's really wrong and how to fix it.  I think that'll take an extended tcpdump followed by inspection in Wireshark.  I went with:

badimaps:1:2:3 badpop3s:1:2:3

I'm still interested in knowing if anyone else has run into this already and dealt with it at the source.  If I do manage to find a way in Exchange to fix it, I'll post it here.

Shawn
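
While a capture like that is running, a small independent probe can log timestamped connect results to line up against it. This is only a sketch under assumptions: it makes one TLS connection, reads whatever greeting arrives first, and reports the elapsed time. It is not how Xymon's own network tester works, and the names probe and log_line are hypothetical. The verify=False option is there because an internal Exchange server may present a certificate the client does not trust.

```python
import socket
import ssl
import time


def probe(host, port, timeout=5.0, use_tls=True, verify=True):
    """Connect once and read the greeting.

    Returns (ok, seconds, detail), where detail is the greeting text
    on success or the exception text on failure.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            if use_tls:
                ctx = ssl.create_default_context()
                if not verify:
                    # Skip certificate checks (self-signed or internal CA).
                    ctx.check_hostname = False
                    ctx.verify_mode = ssl.CERT_NONE
                conn = ctx.wrap_socket(raw, server_hostname=host)
            else:
                conn = raw
            with conn:
                # A single recv is enough for a short greeting line.
                banner = conn.recv(512).decode("ascii", "replace").strip()
        return True, time.monotonic() - start, banner
    except OSError as exc:
        # Timeouts, refused connections and TLS errors all derive from OSError.
        detail = "%s: %s" % (type(exc).__name__, exc)
        return False, time.monotonic() - start, detail


def log_line(host, port, **kwargs):
    """Format one probe result as a timestamped log line."""
    ok, seconds, detail = probe(host, port, **kwargs)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    status = "OK" if ok else "FAIL"
    return "%s %s:%d %s %.6fs %s" % (stamp, host, port, status, seconds, detail)
```

Calling log_line("server.example.com", 995) and log_line("server.example.com", 993) once a minute from cron, appended to a file, would give a record to compare with the hourly spikes. 995 and 993 are the standard pop3s and imaps ports.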
list Shawn Heisey · Fri, 19 Nov 2010 08:29:15 -0700 ·
quoted from Xymon User in Richmond
On 11/18/2010 4:51 PM, Xymon User in Richmond wrote:
Not the same thing, but I had a situation where internal web servers would
all go red a number of times a day then go green again on the next test.
No complaints from users, and I couldn't identify the network anomaly
causing it, so I just use "badhttp:2:3:4" for them in bb-hosts.  They
pretty much stay in "smiley" green, but that's better than alarms for no
good purpose.

It's "badTEST" in the manpage.
Does this work for all tests, or does it only work for the built-in 
network tests?  I have a couple of custom scripts that occasionally give 
false alarms.

Shawn
list Epperson · Fri, 19 Nov 2010 12:16:26 -0500 ·
quoted from Shawn Heisey
On Fri, November 19, 2010 10:29, Shawn Heisey wrote:
On 11/18/2010 4:51 PM, Xymon User in Richmond wrote:
Not the same thing, but I had a situation where internal web servers would
all go red a number of times a day then go green again on the next test.
No complaints from users, and I couldn't identify the network anomaly
causing it, so I just use "badhttp:2:3:4" for them in bb-hosts.  They
pretty much stay in "smiley" green, but that's better than alarms for no
good purpose.

It's "badTEST" in the manpage.
Does this work for all tests, or does it only work for the built-in
network tests?  I have a couple of custom scripts that occasionally give
false alarms.
IIRC it's only for the built-in network tests.  That's implied but not
definitively stated in the manpage.
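
If that is right, a custom script would have to do its own damping. One possible approach, sketched here as an assumption rather than anything Xymon provides, is to keep a count of consecutive failures in a small state file and only escalate the reported colour once the count crosses thresholds like the badTEST ones. The function name damped_colour and the JSON state format are hypothetical.

```python
import json
import os


def damped_colour(state_path, test_ok, thresholds=(1, 2, 3)):
    """Return the colour to report, damping short runs of failures.

    state_path holds the current run of consecutive failures between
    invocations. thresholds is (x, y, z) with the same meaning as in
    badTEST:x:y:z (clear at x, yellow at y, red at z).
    """
    x, y, z = thresholds
    try:
        with open(state_path) as fh:
            streak = int(json.load(fh)["streak"])
    except (OSError, ValueError, KeyError, TypeError):
        # Missing or unreadable state: start from a clean slate.
        streak = 0

    streak = 0 if test_ok else streak + 1

    # Write to a temp file and rename, so a crash cannot leave a
    # half-written state file behind.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump({"streak": streak}, fh)
    os.replace(tmp_path, state_path)

    if streak >= z:
        return "red"
    if streak >= y:
        return "yellow"
    if streak >= x:
        return "clear"
    return "green"
```

The script would then use the returned colour in the status message it already sends, instead of going straight to red on the first failure. Each test needs its own state file.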