Xymon Mailing List Archive

Phantom trap alerts?

6 messages in this thread

list Betsy Schwartz · Tue, 12 Mar 2013 06:54:01 -0400 ·
We have one class of device (Exadata storage cells), and one single
server out of hundreds, for which we occasionally get bogus trap
alerts.

We don't use trap alerts at present on *any* devices.

What happens is, we get a trap alert, and then after some time passes
it turns purple. Someone gets paged and woken up for the purple which
is VERY annoying. I drop it on the xymon server side, and it goes away
for days or weeks or months.

I can set a rule to ignore trap when paging, but why are we getting
them? The Exadata storage cells do not run a Xymon client; they are
ping-monitored only. The lone server does run the Linux client. What
could be triggering the trap just for these boxes? Has anyone else
ever experienced this?
list Jeremy Laidman · Wed, 13 Mar 2013 08:44:06 +1100 ·
quoted from Betsy Schwartz
On 12 March 2013 21:54, Betsy Schwartz <user-c61747246f66@xymon.invalid> wrote:
We have one class of device (Exadata storage cells), and one single
server out of hundreds, for which we occasionally get bogus trap
alerts.
Are you talking of SNMP traps?

J
list Betsy Schwartz · Wed, 13 Mar 2013 13:37:49 -0400 ·
quoted from Jeremy Laidman
On Tue, Mar 12, 2013 at 5:44 PM, Jeremy Laidman <user-71895fb2e44c@xymon.invalid> wrote:
On 12 March 2013 21:54, Betsy Schwartz <user-c61747246f66@xymon.invalid> wrote:
We have one class of device (Exadata storage cells), and one single
server out of hundreds, for which we occasionally get bogus trap
alerts.

Are you talking of SNMP traps?
Yes, we don't use them at all, and then out of the blue we'll get a purple:

clear Thu Feb 16 10:31:22 2012
Unknown trap (.1.3.6.1.4.1.111.16.2.0.1)

We've only gotten them for one particular Linux host, plus several
Exadata cells. The above OID is from an Exadata cell and does appear
to be from an Oracle Exadata MIB. But why would the Xymon server,
only occasionally, get one of these when SNMP is not enabled? I'm not
running devmon, and I'm not sure what other ways there are to have
Xymon pick up an SNMP alert.
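
One way to double-check which vendor an OID like the one above belongs to is to pull out the private enterprise number, the arc that follows .1.3.6.1.4.1. The sketch below is illustrative only: the function name and the one-entry lookup table are made up for this example, and the mapping of 111 to Oracle is taken from the IANA enterprise number registry as I understand it, so verify it against the registry before relying on it.

```python
# Prefix of the "enterprises" subtree: iso.org.dod.internet.private.enterprises
ENTERPRISES_PREFIX = (1, 3, 6, 1, 4, 1)

# Hypothetical one-entry lookup table; check the IANA registry for real use.
KNOWN_ENTERPRISES = {111: "Oracle"}


def enterprise_number(oid):
    """Return the private enterprise number of an OID, or None if the OID
    is not under .1.3.6.1.4.1."""
    parts = tuple(int(p) for p in oid.strip(".").split("."))
    if len(parts) < 7 or parts[:6] != ENTERPRISES_PREFIX:
        return None
    return parts[6]


if __name__ == "__main__":
    oid = ".1.3.6.1.4.1.111.16.2.0.1"  # the OID from the "Unknown trap" status
    pen = enterprise_number(oid)
    print(pen, KNOWN_ENTERPRISES.get(pen, "unknown vendor"))
```

This only confirms who defined the trap. It says nothing about which process turned it into a Xymon status.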
list David Baldwin · Thu, 14 Mar 2013 10:03:04 +1100 ·
Betsy,
quoted from Betsy Schwartz
On Tue, Mar 12, 2013 at 5:44 PM, Jeremy Laidman
<user-71895fb2e44c@xymon.invalid> wrote:
On 12 March 2013 21:54, Betsy Schwartz <user-c61747246f66@xymon.invalid> wrote:
We have one class of device (Exadata storage cells), and one single
server out of hundreds, for which we occasionally get bogus trap
alerts.
Are you talking of SNMP traps?
Yes, we don't use them at all, and then out of the blue we'll get a purple:

clear Thu Feb 16 10:31:22 2012
Unknown trap (.1.3.6.1.4.1.111.16.2.0.1)
SNMP traps are event-based notifications. If you follow the recipe at
http://cerebro.victoriacollege.edu/hobbit-trap.html for setting up trap
notifications using snmptt/sec/etc, part of the config requires having a
poller that checks for expiring (soon to go purple) trap status messages
and sends a clear "no traps" message to prevent that.
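
To make the purple mechanism concrete: a status goes purple when its lifetime runs out without a fresh update, so the poller's job is to re-send a status before that happens. Below is a minimal sketch of such a refresh, not the script from the recipe. It assumes the standard Xymon "status+LIFETIME host.test color text" message format (dots in the hostname replaced by commas, lifetime in minutes) and the default Xymon port 1984. The function names, the "trap" column name, and the 60-minute lifetime are choices made for this example.

```python
import socket


def build_status(host, test, color, text, lifetime_min=None):
    """Build a Xymon status message. The lifetime (in minutes) controls how
    long the status stays valid before xymond turns it purple."""
    lifetime = "+%d" % lifetime_min if lifetime_min else ""
    # Xymon expects dots in the hostname to be written as commas.
    return "status%s %s.%s %s %s" % (
        lifetime, host.replace(".", ","), test, color, text)


def send_to_xymon(message, server="127.0.0.1", port=1984):
    """Send one message to the Xymon daemon over TCP (not called below)."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall(message.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)


if __name__ == "__main__":
    msg = build_status("dm02cel14.example.com", "trap", "clear",
                       "No traps", lifetime_min=60)
    print(msg)
    # send_to_xymon(msg)  # uncomment only on a host allowed to report status
```

If a trap handler posts a status once and nothing ever refreshes it, you get exactly the pattern described: one trap status, then purple some time later.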
quoted from Betsy Schwartz
We've only gotten them for one particular Linux host, plus several
Exadata cells. The above OID is from an Exadata cell and does appear
to be from an Oracle Exadata MIB. But why would the Xymon server,
only occasionally, get one of these when SNMP is not enabled? I'm not
running devmon, and I'm not sure what other ways there are to have
Xymon pick up an SNMP alert.
It absolutely requires some test to generate these. Check the IP address
of the originating server that sent the trap status message, then check
what tests are running from there. Might also be worth checking Ghost
Clients to see if there are more of these that you don't know about.
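
A rough way to do the sender check in bulk, assuming your Xymon version supports the "sender" field in xymondboard queries: run something like

  xymon 127.0.0.1 "xymondboard test=trap fields=hostname,testname,color,sender"

and group the pipe-delimited output by sender. The sketch below only does the grouping. The function name is invented for this example, and the sample lines and the 192.0.2.10 address are made-up placeholders, not real output.

```python
from collections import defaultdict


def group_by_sender(board_output):
    """Group 'hostname|testname|color|sender' lines by the sender IP."""
    senders = defaultdict(list)
    for line in board_output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        hostname, testname, color, sender = fields
        senders[sender].append((hostname, testname, color))
    return dict(senders)


if __name__ == "__main__":
    # Hypothetical sample data in the field order requested above.
    sample = (
        "dm02cel14.example.com|trap|purple|192.0.2.10\n"
        "dba-apps2.example.com|trap|purple|192.0.2.10\n"
    )
    for sender, statuses in group_by_sender(sample).items():
        print(sender, [host for host, _test, _color in statuses])
```

If every trap status comes from the same address, that is the box to go and look at.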

devmon does not do SNMP traps in any way. It is SNMP polling only.

David.

-- 
David Baldwin - Senior Systems Administrator (Datacentres + Networks)
Information and Communication Technology Services
Australian Sports Commission          http://ausport.gov.au
Tel 02 62147830 Fax 02 62141830       PO Box 176 Belconnen ACT 2616
user-cbbf693f2c89@xymon.invalid          Leverrier Street Bruce ACT 2617


list Jeremy Laidman · Thu, 14 Mar 2013 10:49:02 +1100 ·
quoted from David Baldwin
On 14 March 2013 10:03, David Baldwin <user-cbbf693f2c89@xymon.invalid> wrote:
It absolutely requires some test to generate these. Check the IP address
of the originating server that sent the trap status message, then check
what tests are running from there. Might also be worth checking Ghost
Clients to see if there are more of these that you don't know about.
Also, check the trap destination configured on the device.  If it's set to
the Xymon server, then look for a process on your Xymon server that's
listening for SNMP packets.  On Linux, you can do "sudo netstat -naup |
grep :162" and it should show the PID and name of the process that is
receiving the traps.
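
A caveat on the grep: matching on "162" without the colon also hits ports such as 1162 or 16200, and PIDs containing 162, while a match on ":162" alone still catches 16200. If you want an exact match on the local port, something like the sketch below works on saved "netstat -naup" output. It assumes the usual Linux net-tools column layout (protocol first, local address fourth, PID/program last). The function name and the sample lines are invented for illustration.

```python
def udp_listeners_on_port(netstat_output, port=162):
    """Return (proto, local_address, owner) for UDP sockets bound to exactly
    the given local port in `netstat -naup` style output."""
    matches = []
    for line in netstat_output.splitlines():
        cols = line.split()
        if len(cols) < 5 or not cols[0].startswith("udp"):
            continue  # header lines and non-UDP sockets
        local = cols[3]
        # Compare the text after the last colon so IPv6 (":::162") also works.
        if local.rsplit(":", 1)[-1] == str(port):
            owner = cols[-1] if "/" in cols[-1] else "-"
            matches.append((cols[0], local, owner))
    return matches


if __name__ == "__main__":
    # Hypothetical sample output; PIDs and program names are placeholders.
    sample = (
        "Proto Recv-Q Send-Q Local Address   Foreign Address State PID/Program name\n"
        "udp        0      0 0.0.0.0:162     0.0.0.0:*             4321/snmptrapd\n"
        "udp        0      0 0.0.0.0:16200   0.0.0.0:*             999/otherd\n"
        "udp        0      0 192.0.2.5:123   0.0.0.0:*             800/ntpd\n"
    )
    print(udp_listeners_on_port(sample))
```

An empty result on the Xymon server only rules out a listener on that host. The trap receiver could still be running somewhere else.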

devmon does not do SNMP traps in any way. It is SNMP polling only.
(As David implied) neither does Xymon.  There must be another process that
receives a trap and then generates a Xymon status message, but not
necessarily running on the Xymon server.

Googling the phrase ["Unknown trap" xymon] shows the HOWTO that David
linked to.  I suspect someone has set this up on your Xymon server.  This
means you probably have snmptrapd running, which you should stop if you
don't ever use SNMP traps.

J
list Betsy Schwartz · Wed, 13 Mar 2013 21:34:01 -0400 ·
quoted from Jeremy Laidman
On Wed, Mar 13, 2013 at 7:49 PM, Jeremy Laidman <user-71895fb2e44c@xymon.invalid> wrote:
On 14 March 2013 10:03, David Baldwin <user-cbbf693f2c89@xymon.invalid> wrote:

It absolutely requires some test to generate these. Check the IP address
of the originating server that sent the trap status message, then check
what tests are running from there. Might also be worth checking Ghost
Clients to see if there are more of these that you don't know about.

Also, check the trap destination configured on the device.  If it's set to
the Xymon server, then look for a process on your Xymon server that's
listening for SNMP packets.  On Linux, you can do "sudo netstat -naup | grep
:162" and it should show the PID and name of the process that is receiving
the traps.
devmon does not do SNMP traps in any way. It is SNMP polling only.

(As David implied) neither does Xymon.  There must be another process that
receives a trap and then generates a Xymon status message, but not
necessarily running on the Xymon server.

Googling the phrase ["Unknown trap" xymon] shows the HOWTO that David linked
to.  I suspect someone has set this up on your Xymon server.  This means you
probably have snmptrapd running, which you should stop if you don't ever use
SNMP traps.

Hm, still puzzled. The Exadata cells are minimal storage devices that
don't run a Linux client. The lone server that we get these messages
from is not running any SNMP tests. We only get them once in a blue
moon.

There are no SNMP processes of any sort running on the Linux server. I
built that server myself from our master image and only installed
Xymon. The one thing that has something to do with SNMP is the VMware
CLI and ESXi tests, BUT the hosts that are alerting aren't VMware
hosts (and the ESXi tests are running on another server since I
haven't gotten them to run yet on this one, but that's another story).

sudo netstat -naup | grep 162
returns nothing.

The one linux server that we occasionally see this from is running two
hp hardware tests that call hpacucli, bb-roracle, and ntp and memory
tests.

I have a bazillion ghost clients because the ESXi test is returning
non-FQDN names for all VMware hosts, but those aren't the hosts that
are alerting. I'm getting alerts from two Exadata cells and one Linux
HP G5.

From Jan 1, 2012 we've gotten this many purple trap alerts:

Host	State changes
dm02cel14.example.com	4	( 28.57 %)
dm02cel13.example.com	4	( 28.57 %)
dm03cel14.example.com	1	( 7.14 %)
dm03cel13.example.com	1	( 7.14 %)
dm03cel12.example.com	1	( 7.14 %)
dm03cel11.example.com	1	( 7.14 %)
dm03cel09.example.com	1	( 7.14 %)
dba-apps2.example.com	1	( 7.14 %)
Other hosts	0	( 0.00 %)

Note that we have 28 identically configured Exadata cells and only
seven of them have ever alerted.

It's not really that many - fourteen alerts in 15 months - but since
every single one is bogus, I'd like to hunt it down and shoot it.
If something were really sending these alerts I think we'd get more than this.