Colin
Are all your different servers losing numbers at around the same time? The
fact that both Windows and Linux systems (with probably different snmpd
code) are having the same issues suggests that it's a devmon problem.
It's been a while since I did any devmon stuff (mine has been running for
years but a low node/OID count seems to keep it humming along), but I seem
to recall there was some issue with 32-bit vs 64-bit SNMP request/response
IDs. What can happen is that a negative 32-bit request ID gets turned into
a large positive 64-bit response ID that is actually the same ID, but is
parsed as different by the SNMP library used by devmon. This might be a
problem only where there is a 32-bit SNMP library installed on a 64-bit OS,
or that might just be a red herring.
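To illustrate the kind of mismatch I mean, here is a minimal Python sketch. It is not devmon's or SNMP_Session's actual code; the function names and the sample request ID are made up for illustration. It only shows how one 32-bit pattern reads as a negative number when signed and as a large positive number when unsigned, so a naive comparison fails.

```python
import struct

def as_unsigned32(value: int) -> int:
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF

def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of an integer as a signed 32-bit value."""
    return struct.unpack(">i", struct.pack(">I", value & 0xFFFFFFFF))[0]

# Hypothetical request ID, chosen only because it is negative.
request_id = -123456789

# What a peer or library might report if it treats the ID as unsigned.
response_id = as_unsigned32(request_id)

# A naive comparison sees two different IDs...
print(request_id == response_id)

# ...but they match once both are normalised to the same 32-bit interpretation.
print(as_signed32(response_id) == request_id)
```

If the IDs in a packet capture differ in exactly this way, that would point at the signed/unsigned handling rather than at the agents.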
If this is a problem, a "tcpdump" with "-v" will show request/response IDs,
and you can confirm if you only get responses for positive request IDs.
There's an SNMP_Session API variable you can set in the devmon script to
avoid negative request IDs. See
https://lists.oetiker.ch/pipermail/mrtg-developers/2002-September/000103.html
and
https://xymon.xymon.narkive.com/6ltXE2nB/devmon-tests-clear-but-snmpwalk-works#post8
.
Another issue might be the use of SNMPBulkGet, which results in lots of
packets in the response, all fragments of a large datagram. One way to
disable Bulk requests is to set SNMP version (snmpver) to 1 in your
devmon/templates/<devtype>/specs file. Most templates have "snmpver : 2".
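If you have many templates to change, something like the following Python sketch could rewrite the setting. It assumes only that the specs file contains a line of the form "snmpver : 2" as quoted above; the function name is hypothetical, and you would still need to read and write the real files yourself (and keep a backup).

```python
import re

# Matches a line such as "snmpver : 2", keeping the original spacing.
SNMPVER_RE = re.compile(r"^([ \t]*snmpver[ \t]*:[ \t]*)\d+([ \t]*)$", re.MULTILINE)

def force_snmp_v1(specs_text: str) -> str:
    """Return the specs text with any snmpver line set to 1."""
    return SNMPVER_RE.sub(r"\g<1>1\g<2>", specs_text)

# Sample input containing only the line quoted in this mail.
sample = "snmpver : 2\n"
print(force_snmp_v1(sample), end="")
```

Lines that do not start with "snmpver" are left untouched, so the rest of the template should come through unchanged.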
The last time I tried to add a bunch of hosts to devmon, it failed, not
only for the hosts I added, but also for the hosts that were there before
the addition. I ended up reverse-engineering the as-yet-undocumented Xymon
SNMP features, and it seems to work well enough. I do sometimes get no
response from some hosts from time to time, but it's better than nothing.
In case you wanted to go down this path, it's just a matter of adding
entries into /etc/xymon/snmphosts.cfg like this:
[hostname.example.net]
version=2
community=secret
ip=192.0.2.19
systemmib
ifmib=(*)
icmpmib
hrsystem
hrstorage=(*)
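For a batch of hosts, stanzas in the format above could be generated rather than typed. This is only a sketch in Python: the function name and its defaults are my own, and the MIB list simply repeats the entries from the example stanza.

```python
# MIB references copied from the example stanza above.
DEFAULT_MIBS = ("systemmib", "ifmib=(*)", "icmpmib", "hrsystem", "hrstorage=(*)")

def snmphosts_stanza(hostname, ip, community, version=2, mibs=DEFAULT_MIBS):
    """Build one snmphosts.cfg stanza in the layout shown above."""
    lines = [
        f"[{hostname}]",
        f"version={version}",
        f"community={community}",
        f"ip={ip}",
    ]
    lines.extend(mibs)
    return "\n".join(lines) + "\n"

# Reproduces the example stanza from this mail.
print(snmphosts_stanza("hostname.example.net", "192.0.2.19", "secret"), end="")
```

Appending the output for each host to snmphosts.cfg should give the same result as editing the file by hand.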
The entries after the "ip=" line are references to stanzas in the
/etc/xymon/snmpmibs.cfg file. There's special black magic in the
snmpmibs.cfg file that I haven't worked out, and probably needs code
inspection to understand, so I haven't dared to touch this file, with the
exception of one apparent typo; diff to fix it is here:
[hrstorage]
# storage has data for both memory- and disk-storage
keyidx (HOST-RESOURCES-MIB::hrStorageDescr)
- keyidx [(HOST-RESOURCES-MIB::hrStorageType]
+ keyidx [HOST-RESOURCES-MIB::hrStorageType]
Type = HOST-RESOURCES-MIB::hrStorageType
Description = HOST-RESOURCES-MIB::hrStorageDescr
Units = HOST-RESOURCES-MIB::hrStorageAllocationUnits
/rrd:GAUGE
Hmm, perhaps I should try setting "version=1" and see if that helps my
occasional missing data...
J
On Wed, 13 Mar 2019 at 17:12, Colin Coe <user-5b250cd7a540@xymon.invalid> wrote:
Hi Bruce and thanks for the quick response.
These Gen9's are a mix of RHEL6 and Windows 2016, and interestingly, both
behave the same. The RHEL boxen are running 'net-snmp' and for Windows
2016 it's the stock Windows SNMP service. We don't do SNMP traps. These
machines also have the HP Service Pack for Proliant installed. The RHEL
config can be summarized as:
---
cat /etc/snmp/snmpd.conf
dlmod cmaX /usr/lib64/libcmaX64.so
rocommunity secret
---
Thanks again
On Wed, Mar 13, 2019 at 2:02 PM Bruce Ferrell <user-24fbf1912cfe@xymon.invalid>
wrote:
On 3/12/19 9:58 PM, Colin Coe wrote:
Hi all
All of our Gen9 servers are flicking between green and clear for Devmon
SNMP tests. (Posting here as Devmon seems a dead project.)
The attached screen region grab shows what I'm trying to say.
I've updated the firmware (including iLO) on one server and it made no
difference.
Anyone else seen this?
Thanks
CC
I see a lot of odd things with devmon. I have seen an issue something
like this on my Dell R710 with OMSA. It patches itself into snmpd via a
loaded library to report on Dell storage,
and sometimes the response from that gets really slow and blocks
responses to devmon. An update a while back to OMSA cleared that up.
I also have to run a cron job that does a kill -9 on it every two hours
and then restart it... It likes to gobble memory like it's going out of
style.
People DO respond on the devmon list, just a little slowly.
I run my devmon in multi node mode with MySQL backing. When I do see
gross errors like this, I turn on debug and watch the devmon log to see what
errors are thrown.
What OS and SNMPD is running on the HP? Especially the SNMPD... It
matters. OS X and pfSense don't report some OIDs correctly due to the
snmpd they run. It's one of those
"snmpd/ya just gotta know about it" things.