Xymon Mailing List Archive

HPE BL460c Gen9 SNMP tests going clear

list Colin Coe
Thu, 14 Mar 2019 08:11:25 +0800
Message-Id: <user-0a76bf09378c@xymon.invalid>

Hi Jeremy

Looks like I've had similar issues in the past that I'd completely
forgotten about...  I've already got
"$SNMP_Session::default_avoid_negative_request_ids = 1;"
in /usr/share/perl5/SNMP_Session.pm

While multiple servers are affected, they don't always go clear at the same
time.

I've changed the snmpver from 2 to 1 but that didn't improve things.
With tcpdump running in another ssh session, I can see that requests are
being sent and responses are being received.  I see no negative request IDs,
and often the Devmon tests go clear even though SNMP data is received.  The
devmon.log file reflects devmon's view, not the reality of the SNMP
responses.

I'm thinking about using the baked in Xymon SNMP tests but how would I
translate the Proliant server tests, for example
"compaq-servernohspare/raid"?

Thanks for the pointers

CC

On Wed, Mar 13, 2019 at 6:38 PM Jeremy Laidman <user-0608abae5e7c@xymon.invalid> wrote:
Colin

Are all your different servers losing numbers all around the same time?
The fact that both Windows and Linux systems (with probably different snmpd
code) are having the same issues, suggests that it's a devmon problem.

It's been a while since I did any devmon stuff (mine has been running for
years but a low node/OID count seems to keep it humming along), but I seem
to recall there was some issue with 32-bit vs 64-bit SNMP request/response
IDs. What can happen is that a negative 32-bit request ID gets turned into
a positive and large 64-bit response ID that is actually the same, but
parsed as different by the SNMP library used by devmon. This might be a
problem only where there is a 32-bit SNMP library installed on a 64-bit OS,
or that might just be a red herring.

If this is a problem, a "tcpdump" with "-v" will show request/response
IDs, and you can confirm if you only get responses for positive request IDs.
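
To illustrate the request ID mismatch described above, here is a minimal Python sketch of the arithmetic only. It is not devmon or SNMP_Session code; the helper names (as_unsigned32, as_signed32) and the sample ID are made up for the example.

```python
import struct


def as_unsigned32(value: int) -> int:
    """Return the low 32 bits of value as an unsigned integer."""
    return value & 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack(">i", struct.pack(">I", value & 0xFFFFFFFF))[0]


# Hypothetical request ID chosen by the poller (negative in 32 bits).
request_id = -123456789

# The same 32 bits read back as a large positive number, which is
# what a wider integer type would show.
response_id = as_unsigned32(request_id)

# A plain comparison treats the two as different IDs ...
naive_match = request_id == response_id

# ... while comparing after normalising both to 32-bit signed matches.
normalized_match = as_signed32(request_id) == as_signed32(response_id)

print(request_id, response_id, naive_match, normalized_match)
```

If this is what is happening, the response is on the wire but gets discarded as not belonging to any outstanding request, which would look like "no data" to devmon.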

There's an SNMP_Session API variable you can set in the devmon script to
avoid negative request IDs. See
https://lists.oetiker.ch/pipermail/mrtg-developers/2002-September/000103.html
and
https://xymon.xymon.narkive.com/6ltXE2nB/devmon-tests-clear-but-snmpwalk-works#post8
.

Another issue might be the use of SNMPBulkGet, which results in lots of
packets in the response, all fragments of a large datagram. One way to
disable Bulk requests is to set SNMP version (snmpver) to 1 in your
devmon/templates/<devtype>/specs file. Most templates have "snmpver : 2".
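
If there are many templates to change, the edit could be scripted. The following is a rough Python sketch that only rewrites the "snmpver : N" line in the text of a specs file. The function name set_snmpver and the sample text are made up for illustration, and it assumes the line always has the "snmpver : 2" shape mentioned above.

```python
import re

# Matches a line of the form "snmpver : 2", allowing spaces or tabs
# around the key and the colon. Only this line is touched.
SNMPVER_LINE = re.compile(r"^([ \t]*snmpver[ \t]*:[ \t]*)\d+[ \t]*$", re.MULTILINE)


def set_snmpver(specs_text: str, version: int) -> str:
    """Return specs_text with the snmpver value replaced by version."""
    return SNMPVER_LINE.sub(lambda m: f"{m.group(1)}{version}", specs_text)


# Placeholder content: only the snmpver line reflects a real specs file.
sample = "otherkey : value\nsnmpver : 2\n"
print(set_snmpver(sample, 1))
```

To apply it, read each specs file, pass the text through set_snmpver, and write it back after keeping a copy of the original.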

The last time I tried to add a bunch of hosts to devmon, it failed, not
only for the hosts I added, but also for the hosts that were there before
the addition. I ended up reverse-engineering the as-yet-undocumented Xymon
SNMP features, and it seems to work well enough. I do sometimes get no
response from some hosts from time to time, but it's better than nothing.

In case you wanted to go down this path, it's just a matter of adding
entries into /etc/xymon/snmphosts.cfg like this:

[hostname.example.net]
  version=2
  community=secret
  ip=192.0.2.19
  systemmib
  ifmib=(*)
  icmpmib
  hrsystem
  hrstorage=(*)

The entries after the "ip=" line are references to stanzas in the
/etc/xymon/snmpmibs.cfg file. There's special black magic in the
snmpmibs.cfg file that I haven't worked out, and probably needs code
inspection to understand, so I haven't dared to touch this file, with the
exception of one apparent typo; diff to fix it is here:

 [hrstorage]
 # storage has data for both memory- and disk-storage
        keyidx (HOST-RESOURCES-MIB::hrStorageDescr)
-       keyidx [(HOST-RESOURCES-MIB::hrStorageType]
+       keyidx [HOST-RESOURCES-MIB::hrStorageType]
        Type = HOST-RESOURCES-MIB::hrStorageType
        Description = HOST-RESOURCES-MIB::hrStorageDescr
        Units = HOST-RESOURCES-MIB::hrStorageAllocationUnits /rrd:GAUGE
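
For more than a handful of hosts, the snmphosts.cfg stanzas shown earlier could be generated rather than typed. This is a minimal Python sketch that simply reproduces the layout of the example stanza. The function name render_snmphost and its parameters are made up for illustration, and since the Xymon SNMP features are undocumented, nothing here is checked against what Xymon actually accepts.

```python
# MIB references copied from the example stanza; each one names a
# stanza in snmpmibs.cfg.
DEFAULT_MIBS = ("systemmib", "ifmib=(*)", "icmpmib", "hrsystem", "hrstorage=(*)")


def render_snmphost(hostname, ip, community, version=2, mibs=DEFAULT_MIBS):
    """Return one snmphosts.cfg stanza as text, ending with a newline."""
    lines = [
        f"[{hostname}]",
        f"  version={version}",
        f"  community={community}",
        f"  ip={ip}",
    ]
    lines.extend(f"  {mib}" for mib in mibs)
    return "\n".join(lines) + "\n"


print(render_snmphost("hostname.example.net", "192.0.2.19", "secret"))
```

Passing version=1 would produce the variant that is worth trying for the occasional missing data.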

Hmm, perhaps I should try setting "version=1" and see if that helps my
occasional missing data...

J


On Wed, 13 Mar 2019 at 17:12, Colin Coe <user-5b250cd7a540@xymon.invalid> wrote:
Hi Bruce and thanks for the quick response.

These Gen9's are a mix of RHEL6 and Windows 2016, and interestingly, both
behave the same.  The RHEL boxen are running 'net-snmp' and for Windows
2016 it's the stock Windows SNMP service.  We don't do SNMP traps.  These
machines also have the HP Service Pack for Proliant installed.  The RHEL
config can be summarized as:
---
cat /etc/snmp/snmpd.conf
dlmod cmaX /usr/lib64/libcmaX64.so
rocommunity secret
---

Thanks again

On Wed, Mar 13, 2019 at 2:02 PM Bruce Ferrell <user-24fbf1912cfe@xymon.invalid>
wrote:
On 3/12/19 9:58 PM, Colin Coe wrote:
Hi all

All of our Gen9 servers are flicking between green and clear for
Devmon SNMP tests.  (Posting here as Devmon seems a dead project.)
The attached screen region grab shows what I'm trying to say.

I've updated the firmware (including iLO) on one server and it made no
difference.

Anyone else seen this?

Thanks

CC

I see a lot of odd things with devmon. I have seen an issue something
like this on my Dell R710 with OMSA.  It patches itself into snmpd via a loaded
library to report on Dell storage,
and sometimes the response from that gets really slow and blocks
the response to devmon.  An update a while back to OMSA cleared that up.

I also have to run a cron job that does a kill -9 on it every two hours
and then restart it... It likes to gobble memory like it's going out of
style.

People DO respond on the devmon list, just a little slowly.

I run my devmon in multi node mode with MySQL backing.  When I do see
gross errors like this I turn on debug and watch the devmon log to see what
errors are thrown.

What OS and SNMPD is running on the HP?  Especially the SNMPD... It
matters.  OS X and pfSense don't report some OIDs correctly due to the
snmpd they run.  It's one of those
"snmpd/ya just gotta know about it" things.