Xymon Mailing List Archive

Devmon device support, Cross post between lists

5 messages in this thread

list Chris Wopat · Fri, 18 Apr 2008 09:17:35 -0500 ·
Hello,

Chiming in on some info on Devmon. While primarily targeted to the Devmon list, it may be useful to hobbit/devmon users who don't subscribe to that list.

The cisco-7206 template works perfectly fine on a Cisco 7500. I'm sure it works on a 7200 as well. I also have an old 7000 here, but I don't want to boot it up to test. Anyway, it may be in the best interest to rename 7206 to 7200, and just copy its templates to a 7500 folder, or generically rename the whole thing cisco-7000.

Also, there is a typo in the USING doc:

http://devmon.svn.sourceforge.net/viewvc/devmon/trunk/docs/USING?revision=3&view=markup

This line is listed:
	DEVMON:tests(cpu),thresh(cpu;CPUTotal5Min;y=50;r=90)

But it should be:
	DEVMON:tests(cpu),thresh(cpu;CPUTotal5Min;y:50;r:90)

It's correct in the details further down the page, but the equal symbols should be colons near the top when it first mentions thresh().
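
To make the difference between the two forms concrete, here is a minimal Python sketch of a parser for the option string. It is purely illustrative: `parse_thresh` is a hypothetical helper, not Devmon's own code (Devmon is written in Perl), and the field layout (test; OID alias; color:value pairs) is inferred only from the example line above.

```python
def parse_thresh(option):
    """Split 'thresh(test;oid;color:value;...)' into its parts.

    Hypothetical helper for illustration; not part of Devmon.
    """
    if not (option.startswith("thresh(") and option.endswith(")")):
        raise ValueError(f"not a thresh() option: {option!r}")
    # First two fields are the test name and the OID alias, the rest are levels.
    test, oid, *pairs = option[len("thresh("):-1].split(";")
    levels = {}
    for pair in pairs:
        color, sep, value = pair.partition(":")
        if not sep:
            # This is where the 'y=50' form from the doc typo is rejected.
            raise ValueError(f"expected 'color:value', got {pair!r}")
        levels[color] = value  # kept as a string; thresholds need not be numeric
    return {"test": test, "oid": oid, "levels": levels}
```

With this sketch, the colon form parses into `y` and `r` levels, while the `y=50;r=90` form raises a `ValueError`.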

Lastly, and this is very minor, Devmon doesn't properly detect administratively down interfaces in all cases. On one router, I am using subinterfaces as follows:

GigabitEthernet0/2
GigabitEthernet0/2.1
GigabitEthernet0/2.2
GigabitEthernet0/2.3
..etc..

If I shut down Gi0/2, 'sh ip int br' shows its subinterfaces administratively down, but devmon doesn't detect that; one has to go into each subinterface and shut them down as well. It does appear that the OID that checks admin status (.1.3.6.1.2.1.2.2.1.7) does indeed say up, which is why it's showing red:

ifAdminStatus.89 = INTEGER: up(1)

I couldn't find any alternate OID to report ifAdminStatus, so short of putting in code to check parent interface status, it probably couldn't be considered a bug, but I thought I'd mention it.
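
The "check parent interface status" idea could look roughly like the Python sketch below. It is an assumption-laden illustration, not Devmon code: `effective_admin_status` is a hypothetical helper, it takes a plain mapping of interface name to ifAdminStatus value, and it assumes a subinterface is named after its parent with a ".N" suffix, as in the list above.

```python
UP, DOWN = 1, 2  # ifAdminStatus values from IF-MIB: up(1), down(2)


def effective_admin_status(admin_by_name):
    """Return a copy of the mapping where a subinterface inherits
    'down' from an administratively down parent interface.

    Assumes subinterfaces are named '<parent>.<number>'.
    """
    effective = {}
    for name, status in admin_by_name.items():
        # "GigabitEthernet0/2.1" -> "GigabitEthernet0/2"
        parent = name.split(".", 1)[0]
        if parent != name and admin_by_name.get(parent) == DOWN:
            status = DOWN
        effective[name] = status
    return effective


polled = {
    "GigabitEthernet0/2": DOWN,
    "GigabitEthernet0/2.1": UP,   # what the router reports over SNMP
    "GigabitEthernet0/3": UP,
    "GigabitEthernet0/3.1": UP,
}
result = effective_admin_status(polled)
```

Here `result` marks GigabitEthernet0/2.1 as down because its parent is down, and leaves the GigabitEthernet0/3 entries untouched.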

--Chris
list Robert Holden · Fri, 18 Apr 2008 08:47:21 -0700 ·
I have noticed quite a bit of (unnecessary) redundancy when it comes to the
cisco templates.  I have been able to reduce nearly all the cisco devices
down to two templates:  cisco-switch and cisco-common
I still have a few minor issues to deal with, but should have something to
post to the group in about a week's time.  The biggest of these issues is
finding something in the specs "model" that is common to the cisco-switch
(2811, 4003, 5500, & 6506), that is not found in all the other devices.
Similarly, I would like to find something in the specs "model" that is
common to all other cisco devices (cisco-common).

note: Many switches are still able to use cisco-common (2900, 3500, 3550,
etc), so I probably have to come up with a better name for cisco-switch.

I will see what I can find on your subinterfaces issue.

I am also working on an idea (change to devmon) to allow for "default"
templates depending on vendor.

Robert Holden
list Buchan Milne · Tue, 22 Apr 2008 10:09:07 +0200 ·
quoted from Robert Holden
On Friday 18 April 2008 17:47:21 Robert Holden wrote:
I have noticed quite a bit of (unnecessary) redundancy when it comes to the
cisco templates.

Why do you think it is specific to cisco templates? E.g., the if_load template
works just as well with any device that supports the RFC-standard IFMIB (e.g.
the linux-openwrt template has the if_load test taken almost directly from a
cisco device). The only differences are really how devices are named, and
thus maybe default device patterns that should be ignored.

quoted from Robert Holden
I have been able to reduce nearly all the cisco devices
down to two templates:  cisco-switch and cisco-common
I still have a few minor issues to deal with, but should have something to
post to the group in about a week's time.  The biggest of these issues is
finding something in the specs "model" that is common to the cisco-switch
(2811, 4003, 5500, & 6506), that is not found in all the other devices.
Similarly, I would like to find something in the specs "model" that is
common to all other cisco devices (cisco-common).

note: Many switches are still able to use cisco-common (2900, 3500, 3550,
etc), so I probably have to come up with a better name for cisco-switch.

Well, the issue is that you shouldn't really distinguish features on a device
based on the hardware model in the first place.

If we stick to the Cisco topic, is a 6500 a switch? Is a 7600 a router? What 
if I put a better supervisor in the 6500 ? If I put a CSM blade into a 6500, 
or into a 7600, is one a load balancer and the other not?

Moving on, if I run a RADIUS server (which supports the RADIUS MIB) on a HP 
ProLiant, is a Dell PowerEdge *not* a RADIUS server?

So, yes, I think we need a new approach to:
1)Which tests are done on a specific device
2)Which tests are done by default on a device of a specific kind of hardware

quoted from Robert Holden
I will see what I can find on your subinterfaces issue.

IMHO, if the device lies over SNMP, you should report it to the vendor, rather
than work around the problem in an SNMP manager.

quoted from Robert Holden
I am also working on an idea (change to devmon) to allow for "default"
templates depending on vendor.

I would prefer that you discuss any design issues on the development list ...

list Buchan Milne · Tue, 22 Apr 2008 11:35:55 +0200 ·
quoted from Chris Wopat
On Friday 18 April 2008 16:17:35 Chris Wopat wrote:
Hello,

Chiming in on some info on Devmon. While primarily targeted to the
Devmon list, it may be useful to hobbit/devmon users who don't subscribe
to that list.

The cisco-7206 template works perfectly fine on a Cisco 7500. I'm sure
it works on a 7200 as well. I also have an old 7000 here, but I don't
want to boot it up to test. Anyway, it may be in the best interest to
rename 7206 to 7200, and just copy its templates to a 7500 folder, or
generically rename the whole thing cisco-7000.

Also, there is a typo in the USING doc:

http://devmon.svn.sourceforge.net/viewvc/devmon/trunk/docs/USING?revision=3&view=markup

This line is listed:
	DEVMON:tests(cpu),thresh(cpu;CPUTotal5Min;y=50;r=90)

But it should be:
	DEVMON:tests(cpu),thresh(cpu;CPUTotal5Min;y:50;r:90)

I've fixed this locally (I ran into it myself earlier but was too busy to fix
it). I'll commit it later.

quoted from Chris Wopat
It's correct in the details further down the page, but the equal symbols
should be colons near the top when it first mentions thresh().

Lastly, and this is very minor, Devmon doesn't properly detect
administratively down interfaces in all cases. On one router, I am using
subinterfaces as follows:

GigabitEthernet0/2
GigabitEthernet0/2.1
GigabitEthernet0/2.2
GigabitEthernet0/2.3
..etc..

If I shut down Gi0/2, 'sh ip int br' shows its subinterfaces
administratively down, but devmon doesn't detect that- one has to go
into each subinterface and shut them down as well. It does appear that
the OID that checks admin status (.1.3.6.1.2.1.2.2.1.7) does indeed say
up, which is why it's showing red:

ifAdminStatus.89 = INTEGER: up(1)

Right, so the router is lying to you. I would prefer not to work around device
bugs in devmon itself. If you can, you should log a TAC case regarding this
(e.g. "Interface status reported via SNMP does not match the configured
status").

In the meantime you can work around it with exceptions in the bb-hosts file,
such as:

DEVMON:except(if_stat;ifName;na:Gi\d+/\d+\.\d+)

(which would ignore the if_status for all GigabitEthernet sub-interfaces, or 
you could make it more specific if you want).
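
To see which interface names the pattern in that exception would cover, it can be tried out in isolation. The sketch below uses Python's `re` module rather than Devmon's Perl regex engine, and it assumes the pattern has to match the whole ifName; both are assumptions, so treat the result as a rough check only. The interface names are made-up samples.

```python
import re

# Pattern taken from the except() example above.
PATTERN = re.compile(r"Gi\d+/\d+\.\d+")


def is_excepted(if_name):
    # Assumption: the pattern must match the entire ifName.
    return PATTERN.fullmatch(if_name) is not None


names = ["Gi0/2", "Gi0/2.1", "Gi0/2.23", "Gi1/0/1.5", "Fa0/1.1"]
ignored = [n for n in names if is_excepted(n)]
```

Under those assumptions only `Gi0/2.1` and `Gi0/2.23` end up in `ignored`. The parent `Gi0/2` stays monitored, and a three-level name such as `Gi1/0/1.5` or a FastEthernet subinterface would need a broader pattern.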

Regards,
Buchan
list Robert Holden · Tue, 22 Apr 2008 12:10:39 -0700 ·
quoted from Buchan Milne
On Tue, Apr 22, 2008 at 1:09 AM, Buchan Milne <user-9b139aff4dec@xymon.invalid> wrote:
On Friday 18 April 2008 17:47:21 Robert Holden wrote:
I have noticed quite a bit of (unnecessary) redundancy when it comes to the
cisco templates.
Why do you think it is specific to cisco templates? E.g., the if_load template
works just as well with any device that supports the RFC-standard IFMIB (e.g.
the linux-openwrt template has the if_load test taken almost directly from a
cisco device). The only differences are really how devices are named, and
thus maybe default device patterns that should be ignored.

Most of the equipment we are monitoring is Cisco, as hobbit is used to
monitor all our servers.  As a result, I do not have enough experience with
SNMP as it relates to servers to answer your question.  As for RFC-standard
IFMIB, you are right, all cisco devices should follow these standards, but
these relate to Interfaces on the devices.  But having a static oid for all
interfaces will not always work:

ifSpeed [ifBps] (1.3.6.1.2.1.2.2.1.5) vs ifHighSpeed (1.3.6.1.2.1.31.1.1.1.15)

   The range of ifSpeed is limited to
   reporting a maximum speed of (2**31)-1 bits/second, or approximately
   2.2Gbs.  SONET defines an OC-48 interface, which is defined at
   operating at 48 times 51 Mbs, which is a speed in excess of 2.4Gbs.
   Thus, ifSpeed is insufficient for the future, and this memo defines
   an additional object: ifHighSpeed.

   The ifHighSpeed object reports the speed of the interface in
   1,000,000 (1 million) bits/second units.  Thus, the true speed of the
   interface will be the value reported by this object, plus or minus
   500,000 bits/second.   [RFC 2233 <http://www1.tools.ietf.org/html/rfc2233>, 3.1.7]
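
As a rough illustration of combining the two objects, the Python sketch below picks a usable speed in bits/second from a polled ifSpeed/ifHighSpeed pair. `interface_bps` is a hypothetical helper, not a Devmon transform. Because ifHighSpeed is only accurate to plus or minus 500,000 bits/second, the sketch trusts ifSpeed unless ifHighSpeed reports clearly more than ifSpeed does.

```python
def interface_bps(if_speed, if_high_speed):
    """Return the interface speed in bits/second.

    if_speed      -- ifSpeed value, in bits/second
    if_high_speed -- ifHighSpeed value, in units of 1,000,000 bits/second
    """
    high_bps = if_high_speed * 1_000_000
    # ifHighSpeed is rounded to the nearest Mb/s, so only prefer it when it
    # exceeds ifSpeed by more than that rounding error (ifSpeed has topped out).
    if high_bps - if_speed > 500_000:
        return high_bps
    return if_speed
```

For a 10 Gb/s interface whose ifSpeed has hit its ceiling, this returns the ifHighSpeed-based value; for a 100 Mb/s or T1 interface it keeps the more precise ifSpeed value.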


ifInOctets (1.3.6.1.2.1.2.2.1.10) vs ifHCInOctets (1.3.6.1.2.1.31.1.1.1.6)
ifOutOctets (1.3.6.1.2.1.2.2.1.16) vs ifHCOutOctets (1.3.6.1.2.1.31.1.1.1.10)

   As the speed of network media increase, the minimum time in which
   a 32 bit counter will wrap decreases.  For example, a 10Mbs stream
   of back-to-back, full-size packets causes ifInOctets to wrap in
   just over 57 minutes; at 100Mbs, the minimum wrap time is 5.7
   minutes, and at 1Gbs, the minimum is 34 seconds.  Requiring that
   interfaces be polled frequently enough not to miss a counter wrap
   is increasingly problematic.   [RFC 2233 <http://www1.tools.ietf.org/html/rfc2233>, 3.1.6]
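
The wrap times quoted from the RFC can be reproduced with a little arithmetic: a 32-bit octet counter wraps after 2**32 octets, i.e. 2**32 * 8 bits, so the minimum wrap time is that number divided by the line rate. A small Python sketch (the function name is only for illustration):

```python
def wrap_seconds(bits_per_second, counter_bits=32):
    """Minimum time for an octet counter to wrap at full line rate."""
    octets_to_wrap = 2 ** counter_bits
    return octets_to_wrap * 8 / bits_per_second


for mbps in (10, 100, 1000):
    seconds = wrap_seconds(mbps * 1_000_000)
    print(f"{mbps:>5} Mb/s: {seconds:8.1f} s ({seconds / 60:.1f} min)")
```

This gives roughly 57.3 minutes at 10 Mb/s, 5.7 minutes at 100 Mb/s and 34 seconds at 1 Gb/s, in line with the RFC figures. With a 300-second polling interval, a saturated 1 Gb/s link could wrap a 32-bit counter several times between two polls.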

As devmon polls data every 5 minutes, it probably should use the HC versions
of counters when needed (Gb+ speeds).  Is there a transform for performing
an IF statement/substitution?

Example: IF the ifSpeed > 20Mb, use ifHCInOctets instead of ifInOctets.

   For interfaces that operate at 20,000,000 (20 million) bits per
   second or less, 32-bit byte and packet counters MUST be used.  For
   interfaces that operate faster than 20,000,000 bits/second, and
   slower than 650,000,000 bits/second, 32-bit packet counters MUST
   be used and 64-bit octet counters MUST be used.  For interfaces
   that operate at 650,000,000 bits/second or faster, 64-bit packet
   counters AND 64-bit octet counters MUST be used. [RFC 2233 <http://www1.tools.ietf.org/html/rfc2233>, 3.1.6]
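
The IF-style selection asked about above, following the speed classes in the RFC text just quoted, could be sketched in Python as follows. This is not a Devmon transform and says nothing about whether Devmon supports such a substitution; `counter_widths` and `in_octets_object` are hypothetical helpers.

```python
def counter_widths(speed_bps):
    """Counter sizes (in bits) for an interface of the given speed,
    following the three speed classes in RFC 2233, section 3.1.6."""
    octets = 64 if speed_bps > 20_000_000 else 32
    packets = 64 if speed_bps >= 650_000_000 else 32
    return {"octets": octets, "packets": packets}


def in_octets_object(speed_bps):
    """Name of the inbound octet counter to poll for this interface."""
    if counter_widths(speed_bps)["octets"] == 64:
        return "ifHCInOctets"
    return "ifInOctets"
```

For example, a 10 Mb/s interface would be polled via ifInOctets, while 100 Mb/s and 1 Gb/s interfaces would be polled via ifHCInOctets; only the 1 Gb/s interface would also use 64-bit packet counters.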


Some tests, such as serial, fans & power have some differences from device
to device.  At times an OID is not available (power/fans), other times, the
information is only available under a different OID (serial).  So this
creates some difference between templates (hence cisco-common vs
cisco-switch in my previous email).
quoted from Buchan Milne

I have been able to reduce nearly all the cisco devices
down to two templates:  cisco-switch and cisco-common
I still have a few minor issues to deal with, but should have something to
post to the group in about a week's time.  The biggest of these issues is
finding something in the specs "model" that is common to the cisco-switch
(2811, 4003, 5500, & 6506), that is not found in all the other devices.
Similarly, I would like to find something in the specs "model" that is
common to all other cisco devices (cisco-common).

note: Many switches are still able to use cisco-common (2900, 3500, 3550,
etc), so I probably have to come up with a better name for cisco-switch.
Well, the issue is that you shouldn't really distinguish features on a device
based on the hardware model in the first place.

If we stick to the Cisco topic, is a 6500 a switch? Is a 7600 a router? What
if I put a better supervisor in the 6500? If I put a CSM blade into a 6500,
or into a 7600, is one a load balancer and the other not?

Moving on, if I run a RADIUS server (which supports the RADIUS MIB) on a HP
ProLiant, is a Dell PowerEdge *not* a RADIUS server?

So, yes, I think we need a new approach to:
1)Which tests are done on a specific device
2)Which tests are done by default on a device of a specific kind of hardware

What about IOS vs CATOS, or differences between versions of IOS?  I have yet
to come up with a better way to do this, but I am thinking it will be along the
lines of:
   1. SNMP Get manufacturer
   2. SNMP Get hardware model
   3. SNMP Get OS & OS Version
   4. SNMP Get Software & Version ??
   5. Run appropriate tests

Unfortunately, this can mess up the nice & clean layout to the templates
that devmon has now.
quoted from Buchan Milne

I will see what I can find on your subinterfaces issue.
IMHO, if the device lies over SNMP, you should report it to the vendor, rather
than workaround the problem in an SNMP manager.
I am also working on an idea (change to devmon) to allow for "default"
templates depending on vendor.
I would prefer that you discuss any design issues on the development list ...

I just signed up for the devmon-devel list.
I will post my ideas for changes & templates to that list.

Robert