Xymon Mailing List Archive

Hobbit newbie from BB: differences and what may I lose from migrating?

18 messages in this thread

list Kent Brodie · Wed, 2 Aug 2006 00:01:21 -0500 ·
Basically, you lose NOTHING when you change a BB server to a Hobbit
server.    At first, it'll look very similar.    Life will be good.
The BB clients can run untouched.   

However- once you start setting up a few *Hobbit* clients, you'll
quickly see what Hobbit DOES-- and what a typical BB client does NOT do.

That's the moment when you'll race to wipe BB completely.  Took me about
a week. :-)

-----Original Message-----
From: Jordan Mendler [mailto:user-d91c99e0e5c6@xymon.invalid] 
Sent: Tuesday, August 01, 2006 8:43 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] Hobbit newbie from BB: differences and what may
I lose from migrating?

Cool. I guess I'll add a second display to bb-hosts and give hobbit a
run. I'll just use Shmux to deploy bb-hosts to all the clients (figured
I'd mention that great application while I'm at it :-)

Once again, thanks for all the help everyone, hopefully my next message
here will be as a convert.

Jordan
list Joe Sloan · Tue, 01 Aug 2006 22:18:36 -0700 ·
quoted from Kent Brodie

Brodie, Kent wrote:
Basically, you lose NOTHING when you change a BB server to a Hobbit
server.    At first, it'll look very similar.    Life will be good.
The BB clients can run untouched.   

However- once you start setting up a few *Hobbit* clients, you'll
quickly see what Hobbit DOES-- and what a typical BB client does NOT do.

That's the moment when you'll race to wipe BB completely.  Took me about
a week. :-)
We'd like to replace bb with hobbit but there's no way we can do without the
bb failover mechanism. We have 2 separate data centers, and while there are bb
servers in both data centers monitoring the hosts on both sides, only side "a"
does notifications. When side "b" can not reach side "a", then side "b" "fails
over" and takes on the notification tasks, until side "a" becomes reachable again.

There's nothing like that in hobbit yet, but if there were, we'd be able to
make the switch.

J
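
The failover rule described in the message above can be sketched as a small decision function. This is only an illustration of the logic, not something BB or Hobbit provides; the function name should_notify, the site labels and the peer_reachable flag are all assumptions made for the example.

```python
def should_notify(site: str, peer_reachable: bool) -> bool:
    """Decide whether this monitoring server should send notifications.

    Hypothetical helper: side "a" always notifies; side "b" notifies only
    while it cannot reach side "a" (that is, while it has "failed over").
    """
    if site == "a":
        return True
    if site == "b":
        return not peer_reachable
    raise ValueError(f"unknown site: {site!r}")


if __name__ == "__main__":
    # Walk through the three situations described in the message.
    for site, reachable in [("a", True), ("b", True), ("b", False)]:
        print(site, reachable, should_notify(site, reachable))
```

Note that with this rule, a broken link between two otherwise healthy sites makes both sides notify, so duplicate alerts are possible in that case.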
list Henrik Størner · Wed, 2 Aug 2006 09:31:16 +0200 ·
quoted from Joe Sloan
On Tue, Aug 01, 2006 at 10:18:36PM -0700, J Sloan wrote:
Brodie, Kent wrote:
Basically, you lose NOTHING when you change a BB server to a Hobbit
server.
We'd like to replace bb with hobbit but there's no way we can do without the
bb failover mechanism. We have 2 separate data centers, and while there are bb
servers in both data centers monitoring the hosts on both sides, only side "a"
does notifications. When side "b" can not reach side "a", then side "b" "fails
over" and takes on the notification tasks, until side "a" becomes reachable again.

There's nothing like that in hobbit yet, but if there were, we'd be able to
make the switch.
I won't say it is being worked on, but it is definitely on my agenda.
My own setup is identical to yours, except that we have a procedure for
doing the failover from site "a" to site "b" manually. I've done some
planning for how to implement an active/passive cluster-like setup in
Hobbit, so ... it's coming.


Regards,
Henrik
list Stephane Caminade · Wed, 02 Aug 2006 12:05:25 +0200 ·
quoted from Henrik Størner
Henrik Stoerner wrote:
On Tue, Aug 01, 2006 at 10:18:36PM -0700, J Sloan wrote:
Brodie, Kent wrote:
Basically, you lose NOTHING when you change a BB server to a Hobbit
server.
We'd like to replace bb with hobbit but there's no way we can do without the
bb failover mechanism. We have 2 separate data centers, and while there are bb
servers in both data centers monitoring the hosts on both sides, only side "a"
does notifications. When side "b" can not reach side "a", then side "b" "fails
over" and takes on the notification tasks, until side "a" becomes reachable again.

There's nothing like that in hobbit yet, but if there were, we'd be able to
make the switch.
I won't say it is being worked on, but it is definitely on my agenda.
My own setup is identical to yours, except that we have a procedure for
doing the failover from site "a" to site "b" manually. I've done some
planning for how to implement an active/passive cluster-like setup in
Hobbit, so ... it's coming.


Regards,
Henrik

Hi,

Have you considered setting up some kind of Heartbeat or VRRP system?
At my lab, we use VRRP to share one IP between a master DNS and a secondary DNS, which takes over if the primary fails (we have the same system for our web site and our mail server).
If the slave cannot contact the master, it takes over the 'public' IP and can start some services, like bind or dhcpd for example.
Heartbeat seems to offer the same kind of possibilities, but I haven't looked into it yet.
Maybe you could set up your "b" site to start sending notifications in the event that site "a" is unreachable?

Stephane

-- 
_____________________________________________________________________________
Stephane Caminade
Systems and Network Administrator
                                   \  
Institut d'Astrophysique Spatiale  /  tel : (XX) (X) XX XX XX XX
Batiment 121, Universite Paris XI  \  fax : (XX) (X) XX XX XX XX
F-91405 ORSAY Cedex                /  www : http://www.ias.u-psud.fr/
_____________________________________________________________________________

list Beau Olivier · Wed, 2 Aug 2006 12:23:53 +0200 ·
 
Hi,
 
I'm getting "Internal error: Duplicate match ignored" in my rrd-data.log.
What could cause this?
 
 
olivier
list Henrik Størner · Wed, 2 Aug 2006 12:59:27 +0200 ·
quoted from Beau Olivier
On Wed, Aug 02, 2006 at 12:23:53PM +0200, Beau Olivier wrote:
 
I'm getting "Internal error: Duplicate match ignored" in my rrd-data.log.
What could cause this?
It means your netstat data doesn't look like what Hobbit expects.
Basically that it found two or more values for the same piece of data.

The best way of identifying which data causes this is probably to
run two things at the same time:

1) Log in as the hobbit user, and run
      bbcmd hobbitd_channel --channel=data tee /tmp/data.log

2) Run "tail -f" on the rrd-data.log file.

When you see that error message in the rrd-data.log file, terminate
the first command. You should then have the "guilty" data at the end of
the /tmp/data.log file.

I'd obviously be interested to see what it looks like.


Regards,
Henrik
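
Once the first command has been terminated, the last message in /tmp/data.log can be pulled out with a short script instead of scrolling by hand. This is a rough sketch, not part of Hobbit: it assumes each message starts with a line like "@@data#810|..." and ends with a line containing only "@@", which is how the dumps posted later in this thread look. The function names are made up for the example.

```python
import re
from typing import List, Optional

# Assumed start-of-message marker, e.g. "@@data#810|1154529466.624163|..."
HEADER = re.compile(r"^@@data#\d+\|")


def split_messages(text: str) -> List[str]:
    """Split a channel dump into messages (header line plus body lines)."""
    messages: List[str] = []
    current: Optional[List[str]] = None
    for line in text.splitlines():
        if HEADER.match(line):
            current = [line]                      # a new message begins
        elif line.strip() == "@@" and current is not None:
            messages.append("\n".join(current))   # end-of-message marker
            current = None
        elif current is not None:
            current.append(line)
    if current is not None:
        # The dump was cut off mid-message, e.g. when the command was killed.
        messages.append("\n".join(current))
    return messages


def last_message(text: str) -> Optional[str]:
    """Return the final message in the dump, or None if there is none."""
    messages = split_messages(text)
    return messages[-1] if messages else None
```

To use it, read the file and print the result, for example print(last_message(open("/tmp/data.log", errors="replace").read())).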
list Beau Olivier · Wed, 2 Aug 2006 13:50:28 +0200 ·
Hi,

Yes, this is interesting, and I think it points to a new problem: 802.1q on NICs:

eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:217776970 (207.6 MiB)  TX bytes:4275403340 (3.9 GiB)
          Interrupt:201 

eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C  
          inet addr:192.168.250.33  Bcast:192.168.250.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:520111630 (496.0 MiB)  TX bytes:410431496 (391.4 MiB)

eth1.15   Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C  
          inet addr:10.11.99.99  Bcast:10.11.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1909363 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7253215 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:110322292 (105.2 MiB)  TX bytes:1702401944 (1.5 GiB)


olivier
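
To see which interfaces in output like the above are 802.1q sub-interfaces, a small parser can group them by parent device. This is a sketch for Linux ifconfig-style output only. The function name and the assumption that a VLAN sub-interface is named parent.vlan-id are made up for the example, and it says nothing about how Hobbit itself parses this data.

```python
import re
from typing import Dict, List

# An interface block starts at column 0, e.g. "eth1.9    Link encap:Ethernet ..."
IFACE = re.compile(r"^(\S+)\s+Link encap:")

SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
eth1.15   Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
"""


def group_vlan_interfaces(ifconfig_output: str) -> Dict[str, List[str]]:
    """Map each parent interface to its VLAN sub-interfaces.

    Assumption: a sub-interface is named "<parent>.<numeric VLAN id>".
    """
    groups: Dict[str, List[str]] = {}
    for line in ifconfig_output.splitlines():
        match = IFACE.match(line)
        if not match:
            continue
        name = match.group(1)
        parent, dot, vlan = name.partition(".")
        if dot and vlan.isdigit():
            groups.setdefault(parent, []).append(name)
        else:
            groups.setdefault(name, [])
    return groups


if __name__ == "__main__":
    print(group_vlan_interfaces(SAMPLE))  # {'eth1': ['eth1.9', 'eth1.15']}
```

All three entries share the same hardware address, so anything that keys on the base device name or the MAC alone would see them as the same interface.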
quoted from Henrik Størner


-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Wednesday, 2 August 2006 12:59
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] rrd-data.log


list Kent Brodie · Wed, 2 Aug 2006 09:48:55 -0500 ·
Aha...!   I asked this very question last week- nobody gave me help...
(sob, sob..).

Anyway, here are two separate chunks of data that are CAUSING the
duplicate data error like the one Beau is having.

Henrik, what in this data is considered to be "duplicate"?   (I do
notice that the netstat data in AIX is um...  QUITE verbose........)

Data segments follow.  Help?


@@data#810|1154529466.624163|999.888.777.202||fred.moo.mcw.edu|ifstat
data fred,moo,mcw,edu.ifstat
aix
ETHERNET STATISTICS (ent0) :
Device Type: 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
Hardware Address: 00:02:55:4f:a9:f4
Elapsed Time: 0 days 0 hours 0 minutes 0 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 0                                    Packets: 0
Bytes: 0                                      Bytes: 0
Interrupts: 2                                 Interrupts: 0
Transmit Errors: 0                            Receive Errors: 0
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 0         
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 0                          Broadcast Packets: 0
Multicast Packets: 0                          Multicast Packets: 0
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 0
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 0                                   Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 200
Driver Flags: Up Broadcast Running 
        Simplex AlternateAddress 64BitSupport 
        ChecksumOffload PrivateSegment LargeSend 
        DataRateSet 

10/100 Mbps Ethernet PCI Adapter II (1410ff01) Specific Statistics:
Link Status: Down
Media Speed Selected: Auto negotiation
Media Speed Running: Unknown
Receive Pool Buffer Size: 1024
Free Receive Pool Buffers: 1024
No Receive Pool Buffer Errors: 0
Receive Buffer Too Small Errors: 0
Entries to transmit timeout routine: 0
Transmit IPsec packets: 0
Transmit IPsec packets dropped: 0
Receive IPsec packets: 0
Receive IPsec packets dropped: 0
Inbound IPsec SA offload count: 0
Transmit Large Send packets: 0
Transmit Large Send packets dropped: 0
Packets with Transmit collisions:
 1 collisions: 0           6 collisions: 0          11 collisions: 0
 2 collisions: 0           7 collisions: 0          12 collisions: 0
 3 collisions: 0           8 collisions: 0          13 collisions: 0
 4 collisions: 0           9 collisions: 0          14 collisions: 0
 5 collisions: 0          10 collisions: 0          15 collisions: 0
ETHERNET STATISTICS (ent1) :
Device Type: 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
Hardware Address: 00:02:55:4f:a9:f3
Elapsed Time: 0 days 0 hours 0 minutes 0 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 0                                    Packets: 0
Bytes: 0                                      Bytes: 0
Interrupts: 2                                 Interrupts: 0
Transmit Errors: 0                            Receive Errors: 0
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 0         
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 0                          Broadcast Packets: 0
Multicast Packets: 0                          Multicast Packets: 0
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 0
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 0                                   Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 200
Driver Flags: Up Broadcast Running 
        Simplex AlternateAddress 64BitSupport 
        ChecksumOffload PrivateSegment LargeSend 
        DataRateSet 

10/100 Mbps Ethernet PCI Adapter II (1410ff01) Specific Statistics:
Link Status: Down
Media Speed Selected: Auto negotiation
Media Speed Running: Unknown
Receive Pool Buffer Size: 1024
Free Receive Pool Buffers: 1024
No Receive Pool Buffer Errors: 0
Receive Buffer Too Small Errors: 0
Entries to transmit timeout routine: 0
Transmit IPsec packets: 0
Transmit IPsec packets dropped: 0
Receive IPsec packets: 0
Receive IPsec packets dropped: 0
Inbound IPsec SA offload count: 0
Transmit Large Send packets: 0
Transmit Large Send packets dropped: 0
Packets with Transmit collisions:
 1 collisions: 0           6 collisions: 0          11 collisions: 0
 2 collisions: 0           7 collisions: 0          12 collisions: 0
 3 collisions: 0           8 collisions: 0          13 collisions: 0
 4 collisions: 0           9 collisions: 0          14 collisions: 0
 5 collisions: 0          10 collisions: 0          15 collisions: 0
ETHERNET STATISTICS (ent2) :
Device Type: 10/100/1000 Base-TX PCI-X Adapter (14106902)
Hardware Address: 00:02:55:53:c2:3e
Elapsed Time: 196 days 15 hours 56 minutes 59 seconds

Transmit Statistics:                          Receive Statistics:
--------------------                          -------------------
Packets: 365802341                            Packets: 1036460447
Bytes: 1683156286637                          Bytes: 112378387074
Interrupts: 0                                 Interrupts: 614513005
Transmit Errors: 0                            Receive Errors: 0
Packets Dropped: 0                            Packets Dropped: 0
                                              Bad Packets: 0
Max Packets on S/W Transmit Queue: 30        
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 16681                      Broadcast Packets: 210557123
Multicast Packets: 0                          Multicast Packets: 283080
No Carrier Sense: 0                           CRC Errors: 0
DMA Underrun: 0                               DMA Overrun: 0
Lost CTS Errors: 0                            Alignment Errors: 0
Max Collision Errors: 0                       No Resource Errors: 0
Late Collision Errors: 0                      Receive Collision Errors: 0
Deferred: 381                                 Packet Too Short Errors: 0
SQE Test: 0                                   Packet Too Long Errors: 0
Timeout Errors: 0                             Packets Discarded by Adapter: 0
Single Collision Count: 0                     Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running 
        Simplex 64BitSupport ChecksumOffload 
        PrivateSegment LargeSend DataRateSet 

10/100/1000 Base-TX PCI-X Adapter (14106902) Specific Statistics:
Link Status: Up
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
PCI Mode: PCI-X (100-133)
PCI Bus Width: 64-bit
Latency Timer: 144
Cache Line Size: 128
Jumbo Frames: Disabled
TCP Segmentation Offload: Enabled
        TCP Segmentation Offload Packets Transmitted: 113291719
        TCP Segmentation Offload Packet Errors: 0
Transmit and Receive Flow Control Status: Enabled
        XON Flow Control Packets Transmitted: 0
        XON Flow Control Packets Received: 430
        XOFF Flow Control Packets Transmitted: 0
        XOFF Flow Control Packets Received: 430
Transmit and Receive Flow Control Threshold (High): 45056
Transmit and Receive Flow Control Threshold (Low): 24576
Transmit and Receive Storage Allocation (TX/RX): 16/48
@@


@@data#811|1154529466.624871|999.777.666.202||phred.mrr.mcw.edu|netstat
data phred,mrr,mcw,edu.netstat
aix
icmp:
        597 calls to icmp_error
        0 errors not generated because old message was icmp
        Output histogram:
                echo reply: 58031
                destination unreachable: 537
        1 message with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                echo reply: 2
                destination unreachable: 562
                echo: 58031
                time exceeded: 12
        58031 message responses generated
igmp:
        282998 messages received
        0 messages received with too few bytes
        0 messages received with bad checksum
        282998 membership queries received
        0 membership queries received with invalid field(s)
        0 membership reports received
        0 membership reports received with invalid field(s)
        0 membership reports received for groups to which we belong
        2 membership reports sent
tcp:
        449406211 packets sent
                419904427 data packets (1612944301 bytes)
                176131 data packets (218422100 bytes) retransmitted
                20439309 ack-only packets (8520363 delayed)
                6 URG only packets
                420337 window probe packets
                2131650 window update packets
                6334351 control packets
                113291719 large sends
                3038842766 bytes sent using largesend
                64240 bytes is the biggest largesend
        909296772 packets received
                857387271 acks (for 1835107293 bytes)
                7844383 duplicate acks
                0 acks for unsent data
                307390334 packets (2464690667 bytes) received in-sequence
                45039 completely duplicate packets (1602267 bytes)
                0 old duplicate packets
                5 packets with some dup. data (624 bytes duped)
                2441887 out-of-order packets (440550 bytes)
                8 packets (8 bytes) of data after window
                8 window probes
                2587766 window update packets
                3902 packets received after close
                0 packets with bad hardware assisted checksum
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                1440 discarded by listeners
                0 discarded due to listener's queue full
                23840405 ack packet headers correctly predicted
                35343642 data packet headers correctly predicted
        399880 connection requests
        5548229 connection accepts
        5947965 connections established (including accepts)
        5953023 connections closed (including 25563 drops)
        0 connections with ECN capability
        0 times responded to ECN
        137 embryonic connections dropped
        404559742 segments updated rtt (of 404578267 attempts)
        0 segments with congestion window reduced bit set
        0 segments with congestion experienced bit set
        0 resends due to path MTU discovery
        10019 path MTU discovery terminations due to retransmits
        29115 retransmit timeouts
                1 connection dropped by rexmit timeout
        2 fast retransmits
                0 when congestion window less than 4 segments
        3 newreno retransmits
        0 times avoided false fast retransmits
        427603 persist timeouts
                0 connections dropped due to persist timeout
        8957 keepalive timeouts
                8928 keepalive probes sent
                29 connections dropped by keepalive
        0 times SACK blocks array is extended
        0 times SACK holes array is extended
        0 packets dropped due to memory allocation failure
        0 connections in timewait reused
        0 delayed ACKs for SYN
        0 delayed ACKs for FIN
        0 send_and_disconnects
        0 spliced connections
        0 spliced connections closed
        0 spliced connections reset
        0 spliced connections timeout
        0 spliced connections persist timeout
        0 spliced connections keepalive timeout
udp:
        13767017 datagrams received
        0 incomplete headers
        0 bad data length fields
        0 bad checksums
        597 dropped due to no socket
        6762980 broadcast/multicast datagrams dropped due to no socket
        0 dropped due to full socket buffers
        7003440 delivered
        6994616 datagrams output
ip:
        923406868 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 packets reassembled ok
        923121828 packets for this host
        283574 packets for unknown/unsupported protocol
        0 packets forwarded
        1396 packets not forwardable
        0 redirects sent
        456499667 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        0 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        69 IP Multicast packets dropped due to no receiver
        0 successful path MTU discovery cycles
        0 path MTU rediscovery cycles attempted
        0 path MTU discovery no-response estimates
        0 path MTU discovery response timeouts
        0 path MTU discovery decreases detected
        0 path MTU discovery packets sent
        0 path MTU discovery memory allocation failures
        0 ipintrq overflows
        0 with illegal source
        0 packets processed by threads
        0 packets dropped by threads
        0 packets dropped due to the full socket receive buffer
        0 dead gateway detection packets sent
        0 dead gateway detection packet allocation failures
        0 dead gateway detection gateway allocation failures

ipv6:
        3 total packets received
        0 with size smaller than minimum
        0 with data size < data length
        0 with incorrect version number
        0 with illegal source
        0 input packets without enough memory
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 packets reassembled ok
        0 packets for this host
        0 packets for unknown/unsupported protocol
        0 packets forwarded
        3 packets not forwardable
        0 too big packets not forwarded
        0 packets sent from this host
        0 packets sent with fabricated ipv6 header
        0 output packets dropped due to no bufs
        0 output packets without enough memory
        0 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 packets dropped due to full socket receive buffer
        0 packets not delivered due to bad raw IPv6 checksum
icmpv6:
        0 calls to icmp6_error
        0 errors not generated because old message was icmpv6
        Output histogram:
                unreachable: 0
                packets too big: 0
                time exceeded: 0
                parameter problems: 0
                redirects: 0
                echo requests: 0
                echo replies: 0
                group queries: 0
                group reports: 0
                group terminations: 0
                router solicitations: 0
                router advertisements: 0
                neighbor solicitations: 0
                neighbor advertisements: 0
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                unreachable: 0
                packets too big: 0
                time exceeded: 0
                parameter problems: 0
                echo requests: 0
                echo replies: 0
                group queries: 0
                        bad group queries: 0
                group reports: 0
                        bad group reports: 0
                        our groups' reports: 0
                group terminations: 0
                bad group terminations: 0
                router solicitations: 0
                bad router solicitations: 0
                router advertisements: 0
                bad router advertisements: 0
                neighbor solicitations: 0
                bad neighbor solicitations: 0
                neighbor advertisements: 0
                bad neighbor advertisements: 0
                redirects: 0
                bad redirects: 0
                mobility calls when not started: 0
                home agent address discovery requests: 0
                bad home agent address discovery requests: 0
                bad home agent address discovery replys: 0
                bad home agent address discovery replys: 0
                prefix solicitations: 0
                bad prefix solicitations: 0
                prefix advertisements: 0
                bad prefix advertisements: 0
        0 message responses generated
@@
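
One way to hunt for candidate "duplicates" in a netstat block like the one above is to strip the numbers from each line and look for labels that repeat within the same protocol section. This is only a heuristic written for this thread. It is not Hobbit's actual matching logic, and the function name and the normalization rule (every run of digits becomes "N") are assumptions.

```python
import re
from collections import Counter
from typing import Dict, List

# A protocol section header sits at column 0, e.g. "tcp:" or "icmpv6:"
SECTION = re.compile(r"^(\w+):\s*$")


def repeated_labels(netstat_text: str) -> Dict[str, List[str]]:
    """Return {section: [labels that occur more than once in that section]}."""
    counts: Dict[str, Counter] = {}
    section = None
    for line in netstat_text.splitlines():
        header = SECTION.match(line)
        if header:
            section = header.group(1)
            counts.setdefault(section, Counter())
            continue
        if section is None or not line.strip():
            continue  # skip preamble lines and blank lines
        # Replace every run of digits so only the wording is compared.
        label = re.sub(r"\d+", "N", line.strip())
        counts[section][label] += 1

    result: Dict[str, List[str]] = {}
    for name, counter in counts.items():
        repeated = [label for label, n in counter.items() if n > 1]
        if repeated:
            result[name] = repeated
    return result
```

Run against the block above, this kind of check would flag, among others, the two "window update packets" lines in the tcp section (one under packets sent, one under packets received) and the line "bad home agent address discovery replys" that appears twice in the icmpv6 section. Whether any of these is what triggers the error is not established here.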


@@data#1228|1154529774.054469|141.106.224.202||jordan.hmgc.mcw.edu|netstat
data jordan,hmgc,mcw,edu.netstat
aix
icmp:
        29 calls to icmp_error
        0 errors not generated because old message was icmp
        Output histogram:
                echo reply: 16024
                destination unreachable: 17
        45 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                echo reply: 7
                destination unreachable: 58
                echo: 16024
        16024 message responses generated
igmp:
        79096 messages received
        0 messages received with too few bytes
        0 messages received with bad checksum
        79096 membership queries received
        0 membership queries received with invalid field(s)
        0 membership reports received
        0 membership reports received with invalid field(s)
        0 membership reports received for groups to which we belong
        2 membership reports sent
tcp:
        256987083 packets sent
                233786265 data packets (146658044 bytes)
                30885 data packets (6116690 bytes) retransmitted
                16701059 ack-only packets (3980875 delayed)
                0 URG only packets
                193 window probe packets
                17752 window update packets
                6450950 control packets
                45166554 large sends
                740111354 bytes sent using largesend
                64240 bytes is the biggest largesend
        496262560 packets received
                450701474 acks (for 151098725 bytes)
                9591183 duplicate acks
                0 acks for unsent data
                195459972 packets (949784396 bytes) received in-sequence
                23755 completely duplicate packets (8371 bytes)
                0 old duplicate packets
                0 packets with some dup. data (0 bytes duped)
                3375737 out-of-order packets (104293 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                4869090 window update packets
                3508 packets received after close
                0 packets with bad hardware assisted checksum
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                1117 discarded by listeners
                0 discarded due to listener's queue full
                10028924 ack packet headers correctly predicted
                24673885 data packet headers correctly predicted
        100631 connection requests
        6257521 connection accepts
        6358083 connections established (including accepts)
        6359382 connections closed (including 24024 drops)
        0 connections with ECN capability
        0 times responded to ECN
        65 embryonic connections dropped
        240375423 segments updated rtt (of 240404633 attempts)
        0 segments with congestion window reduced bit set
        0 segments with congestion experienced bit set
        0 resends due to path MTU discovery
        14379 path MTU discovery terminations due to retransmits
        31032 retransmit timeouts
                5 connections dropped by rexmit timeout
        3 fast retransmits
                0 when congestion window less than 4 segments
        10 newreno retransmits
        5 times avoided false fast retransmits
        195 persist timeouts
                0 connections dropped due to persist timeout
        1407 keepalive timeouts
                1393 keepalive probes sent
                14 connections dropped by keepalive
        0 times SACK blocks array is extended
        0 times SACK holes array is extended
        0 packets dropped due to memory allocation failure
        0 connections in timewait reused
        0 delayed ACKs for SYN
        0 delayed ACKs for FIN
        0 send_and_disconnects
        0 spliced connections
        0 spliced connections closed
        0 spliced connections reset
        0 spliced connections timeout
        0 spliced connections persist timeout
        0 spliced connections keepalive timeout
udp:
        6773177 datagrams received
        0 incomplete headers
        0 bad data length fields
        0 bad checksums
        29 dropped due to no socket
        471205 broadcast/multicast datagrams dropped due to no socket
        0 socket buffer overflows
        6301943 delivered
        6301977 datagrams output
ip:
        504670433 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 packets reassembled ok
        503051772 packets for this host
        79154 packets for unknown/unsupported protocol
        0 packets forwarded
        1539510 packets not forwardable
        0 redirects sent
        263316001 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        0 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        0 IP Multicast packets dropped due to no receiver
        0 successful path MTU discovery cycles
        0 path MTU rediscovery cycles attempted
        0 path MTU discovery no-response estimates
        0 path MTU discovery response timeouts
        0 path MTU discovery decreases detected
        0 path MTU discovery packets sent
        0 path MTU discovery memory allocation failures
        0 ipintrq overflows
        0 with illegal source
        0 packets processed by threads
        0 packets dropped by threads
        0 packets dropped due to the full socket receive buffer
        0 dead gateway detection packets sent
        0 dead gateway detection packet allocation failures
        0 dead gateway detection gateway allocation failures

ipv6:
        0 total packets received
        0 with size smaller than minimum
        0 with data size < data length
        0 with incorrect version number
        0 with illegal source
        0 input packets without enough memory
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 packets reassembled ok
        0 packets for this host
        0 packets for unknown/unsupported protocol
        0 packets forwarded
        0 packets not forwardable
        0 too big packets not forwarded
        0 packets sent from this host
        0 packets sent with fabricated ipv6 header
        0 output packets dropped due to no bufs
        0 output packets without enough memory
        0 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 packets dropped due to full socket receive buffer
        0 packets not delivered due to bad raw IPv6 checksum
icmpv6:
        0 calls to icmp6_error
        0 errors not generated because old message was icmpv6
        Output histogram:
                unreachable: 0
                packets too big: 0
                time exceeded: 0
                parameter problems: 0
                redirects: 0
                echo requests: 0
                echo replies: 0
                group queries: 0
                group reports: 0
                group terminations: 0
                router solicitations: 0
                router advertisements: 0
                neighbor solicitations: 0
                neighbor advertisements: 0
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                unreachable: 0
                packets too big: 0
                time exceeded: 0
                parameter problems: 0
                echo requests: 0
                echo replies: 0
                group queries: 0
                        bad group queries: 0
                group reports: 0
                        bad group reports: 0
                        our groups' reports: 0
                group terminations: 0
                bad group terminations: 0
                router solicitations: 0
                bad router solicitations: 0
                router advertisements: 0
                bad router advertisements: 0
                neighbor solicitations: 0
                bad neighbor solicitations: 0
                neighbor advertisements: 0
                bad neighbor advertisements: 0
                redirects: 0
                bad redirects: 0
                mobility calls when not started: 0
                home agent address discovery requests: 0
                bad home agent address discovery requests: 0
                bad home agent address discovery replys: 0
                bad home agent address discovery replys: 0
                prefix solicitations: 0
                bad prefix solicitations: 0
                prefix advertisements: 0
                bad prefix advertisements: 0
        0 message responses generated
@@


Kent C. Brodie - user-da7f7d5174c0@xymon.invalid
Department of Physiology
Medical College of Wisconsin
(XXX) XXX-XXXX
quoted from Beau Olivier
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid] 
Sent: Wednesday, August 02, 2006 5:59 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] rrd-data.log

On Wed, Aug 02, 2006 at 12:23:53PM +0200, Beau Olivier wrote:
 
I'm having "Internal error: Duplicate match ignored" in my rrd-data.log,
what could cause this?

It means your netstat data doesn't look like what Hobbit expects.
Basically that it found two or more values for the same piece of data.

The best way of identifying which data causes this is probably to
run two things at the same time:

1) login as the hobbit user, and run
      bbcmd hobbitd_channel --channel=data tee /tmp/data.log

2) Run "tail -f" on the rrd-data.log file.

When you see that error message in the rrd-data.log file, terminate
the first command. You should then have the "guilty" data at the end of
the /tmp/data.log file.

I'd obviously be interested to see what it looks like.


Regards,
Henrik
list Henrik Størner · Wed, 2 Aug 2006 16:52:37 +0200 ·
quoted from Beau Olivier
On Wed, Aug 02, 2006 at 01:50:28PM +0200, Beau Olivier wrote:
Hi,

Yes, this is interesting, and I think it points to a new problem, 802.1q on NICs:

eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:217776970 (207.6 MiB)  TX bytes:4275403340 (3.9 GiB)
          Interrupt:201 

eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C  
          inet addr:192.168.250.33  Bcast:192.168.250.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:520111630 (496.0 MiB)  TX bytes:410431496 (391.4 MiB)

Perhaps, but this data should not get anywhere near the code that
prints this message. The code that generates that message is
the one that parses the output of "netstat -s", which should look like:

Ip:
    3017099 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    3017058 incoming packets delivered
    3154813 requests sent out
Icmp:
    51081 ICMP messages received
    0 input ICMP message failed.

What does this command report on your host?


Regards,
Henrik
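
The "netstat -s" layout described above is a section header ending in a colon, followed by indented "count description" lines. A minimal sketch of reading that layout, assuming only the shape shown in the sample; the function name `parse_netstat_s` is hypothetical and this is not Hobbit's actual parser:

```python
import re

def parse_netstat_s(text):
    """Group 'netstat -s' counters by section header (e.g. 'Ip:', 'Icmp:')."""
    sections = {}
    current = None
    for line in text.splitlines():
        # A section header starts in column 0 and ends with a colon.
        header = re.match(r"^(\w+):\s*$", line)
        if header:
            current = sections.setdefault(header.group(1), {})
            continue
        # A counter line is indented: "<number> <description>".
        counter = re.match(r"^\s+(\d+)\s+(.+?)\s*$", line)
        if counter and current is not None:
            current[counter.group(2)] = int(counter.group(1))
    return sections

sample = """Ip:
    3017099 total packets received
    1 with invalid addresses
    0 forwarded
Icmp:
    51081 ICMP messages received
    0 input ICMP message failed.
"""
stats = parse_netstat_s(sample)
print(stats["Ip"]["total packets received"])
```

Lines that do not fit either shape, such as the indented "ICMP input histogram:" sub-headers in fuller output, are simply skipped by this sketch.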
list Kent Brodie · Wed, 2 Aug 2006 10:01:02 -0500 ·
I am stabbing in the dark here, but the duplicate data on my end seems
to be caused by parsing the output of the netstat -s command on *AIX*.
What is different here is that netstat -s on AIX is much more verbose,
showing statistics for both IPv4 and IPv6. Perhaps the "icmp:" and
"icmpv6:" headers, or other similar items, are where the parsing breaks
and supposed duplicates are detected?


Kent C. Brodie - user-da7f7d5174c0@xymon.invalid
Department of Physiology
Medical College of Wisconsin
(XXX) XXX-XXXX

list Beau Olivier · Wed, 2 Aug 2006 17:02:41 +0200 ·
Here is the netstat output from data.log:

Ip:
    6774912 total packets received
    8069 forwarded
    0 incoming packets discarded
    6766842 incoming packets delivered
    12918060 requests sent out
Icmp:
    725255 ICMP messages received
    1 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 712247
        timeout in transit: 30
        echo requests: 4212
        echo replies: 8766
    716456 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 712244
        echo replies: 4212
Tcp:
    19091 active connections openings
    20926 passive connection openings
    1 failed connection attempts
    2952 connection resets received
    2 connections established
    4460418 segments received
    11359826 segments send out
    76470 segments retransmited
    0 bad segments received.
    3763 resets sent
Udp:
    105882 packets received
    711976 packets to unknown port received.
    0 packet receive errors
    817609 packets sent
TcpExt:
    ArpFilter: 0
    24632 TCP sockets finished time wait in fast timer
    1629 delayed acks sent
    2 delayed acks further delayed because of locked socket
    Quick ack mode was activated 10 times
    1101 packets directly queued to recvmsg prequeue.
    208762 packets directly received from backlog
    8763 packets directly received from prequeue
    417698 packets header predicted
    155 packets header predicted and directly queued to user
    TCPPureAcks: 586976
    TCPHPAcks: 3180140
    TCPRenoRecovery: 0
    TCPSackRecovery: 46644
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 0
    TCPLossUndo: 1
    TCPLoss: 20751
    TCPLostRetransmit: 61
    TCPRenoFailures: 0
    TCPSackFailures: 272
    TCPLossFailures: 1
    TCPFastRetrans: 62624
    TCPForwardRetrans: 1657
    TCPSlowStartRetrans: 2052
    TCPTimeouts: 3170
    TCPRenoRecoveryFail: 0
    TCPSackRecoveryFail: 3672
    TCPSchedulerFailed: 0
    TCPRcvCollapsed: 0
    TCPDSACKOldSent: 11
    TCPDSACKOfoSent: 0
    TCPDSACKRecv: 0
    TCPDSACKOfoRecv: 0
    TCPAbortOnSyn: 0
    TCPAbortOnData: 1432
    TCPAbortOnClose: 9
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 8
    TCPAbortOnLinger: 0
    TCPAbortFailed: 0
    TCPMemoryPressures: 0
list Henrik Størner · Wed, 2 Aug 2006 17:05:35 +0200 ·
quoted from Beau Olivier
On Wed, Aug 02, 2006 at 04:52:37PM +0200, Henrik Stoerner wrote:
On Wed, Aug 02, 2006 at 01:50:28PM +0200, Beau Olivier wrote:
Hi,

Yes, this is interesting, and I think it points to a new problem, 802.1q on NICs:

eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:217776970 (207.6 MiB)  TX bytes:4275403340 (3.9 GiB)
          Interrupt:201 

eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C  
          inet addr:192.168.250.33  Bcast:192.168.250.0  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:520111630 (496.0 MiB)  TX bytes:410431496 (391.4 MiB)
Perhaps, but these data should not get anywhere near the code that
prints out this message.

Yikes, I cannot remember my own code. You're right - it IS the interface
statistics code that triggers this error. OK, I'll try and work out why
and how it can be fixed.


Regards,
Henrik
list Henrik Størner · Wed, 2 Aug 2006 17:06:30 +0200 ·
quoted from Kent Brodie
On Wed, Aug 02, 2006 at 10:01:02AM -0500, Brodie, Kent wrote:
I am stabbing in the dark here, but the duplicate data on my end seems
to be caused by parsing the output of the netstat -s command on *AIX*.

No, it's me who is confused. Thanks for your AIX data; it gives me a
way of reproducing the problem.


Regards,
Henrik
list Dominique Frise · Wed, 02 Aug 2006 17:13:30 +0200 ·
quoted from Henrik Størner
Henrik Stoerner wrote:
On Wed, Aug 02, 2006 at 10:01:02AM -0500, Brodie, Kent wrote:
I am stabbing in the dark here, but the duplicate data on my end seems
to be caused by parsing the output of the netstat -s command on *AIX*.
No, it's me who is confused. Thanks for your aix data, they do give me a
way of reproducing the problem.


Regards,
Henrik

We had the same problem with the following data (the client is RHAS 2.1). The
same statistics are reported for both eth0 interfaces:

@@data#366293|1154521218.440675|1.2.7.23||tulp|ifstat
data tulp.ifstat
linux22
eth0      Link encap:Ethernet  HWaddr 00:0C:29:FC:14:DD
           inet addr:1.2.5.36  Bcast:1.2.5.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:167690690 errors:2305 dropped:2628 overruns:0 frame:0
           TX packets:155904732 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:3709223888 (3537.3 Mb)  TX bytes:2014658132 (1921.3 Mb)
           Interrupt:10 Base address:0x1080

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:FC:14:DD
           inet addr:1.2.5.56  Bcast:1.2.5.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:167690690 errors:2305 dropped:2628 overruns:0 frame:0
           TX packets:155904732 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:3709223888 (3537.3 Mb)  TX bytes:2014658132 (1921.3 Mb)
           Interrupt:10 Base address:0x1080

lo        Link encap:Local Loopback
           inet addr:127.0.0.1  Mask:255.0.0.0
           UP LOOPBACK RUNNING  MTU:16436  Metric:1
           RX packets:18870 errors:0 dropped:0 overruns:0 frame:0
           TX packets:18870 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:4785828 (4.5 Mb)  TX bytes:4785828 (4.5 Mb)

@@
@@data#366294|1154521218.440921|130.223.27.23||tulp|vmstat
data tulp.vmstat
linux22
...
...

Hope this helps


Dominique
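
When hunting for the offending message in a capture such as /tmp/data.log, it can help to split the file into individual messages first. A minimal sketch, assuming the framing seen in the excerpt above (a "@@data#..." header line with "|"-separated fields, a body, and a closing "@@" line). The function name is hypothetical, and reading the last two header fields as host and test name is an inference from the "data tulp.ifstat" line, not documented behaviour:

```python
def split_data_messages(capture):
    """Split a captured data-channel log into (header_fields, body_lines) pairs."""
    messages = []
    header, body = None, []
    for line in capture.splitlines():
        if line.startswith("@@data#"):
            # New message: keep the '|'-separated header fields.
            header, body = line[len("@@data#"):].split("|"), []
        elif line == "@@" and header is not None:
            # A bare '@@' closes the current message.
            messages.append((header, body))
            header = None
        elif header is not None:
            body.append(line)
    return messages

capture = """@@data#366293|1154521218.440675|1.2.7.23||tulp|ifstat
data tulp.ifstat
linux22
eth0      Link encap:Ethernet  HWaddr 00:0C:29:FC:14:DD
@@
"""
fields, body = split_data_messages(capture)[-1]
print(fields[-2], fields[-1], len(body))
```

A message that is still open when the capture ends (no closing "@@") is dropped by this sketch, so the last returned entry is the last complete message.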
UNIL - University of Lausanne
list Henrik Størner · Wed, 2 Aug 2006 17:30:06 +0200 ·
quoted from Beau Olivier
On Wed, Aug 02, 2006 at 12:23:53PM +0200, Beau Olivier wrote:
 
I'm having "Internal error: Duplicate match ignored" in my rrd-data.log,
what could cause this?

Turns out to be a couple of bad regular expressions in the interface
statistics code. This patch should fix it for both the AIX and Linux
systems you've reported this on.


Regards,
Henrik

-------------- next part --------------
--- hobbitd/rrd/do_ifstat.c	2006/08/01 21:32:37	1.7
+++ hobbitd/rrd/do_ifstat.c	2006/08/02 15:25:48
@@ -20,7 +20,7 @@
 /* eth0   Link encap:                                                 */
 /*        RX bytes: 1829192 (265.8 MiB)  TX bytes: 1827320 (187.7 MiB */
 static const char *ifstat_linux_exprs[] = {
-	"^([a-z]+[0-9]+)\\s",
+	"^([a-z]+[0123456789.:]+)\\s",
 	"^\\s+RX bytes:([0-9]+) .*TX bytes.([0-9]+) "
 };
 
@@ -73,7 +73,7 @@
 */
 static const char *ifstat_aix_exprs[] = {
 	"^ETHERNET STATISTICS \\(([a-z0-9]+)\\) :",
-	"^Bytes:\\s+(\\d+)\\s+(\\d+)"
+	"^Bytes:\\s+(\\d+)\\s+Bytes:\\s+(\\d+)"
 };
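
The effect of the two regex changes can be checked outside Hobbit. The sketch below uses Python's `re` module rather than the regex library Hobbit links against, on the assumption that these particular constructs behave the same in both. The interface header lines are taken from the reports in this thread; the AIX "Bytes:" line is a made-up sample whose layout is only inferred from the patched expression:

```python
import re

# Patterns copied from the patch (C string escapes removed).
OLD_LINUX = r"^([a-z]+[0-9]+)\s"
NEW_LINUX = r"^([a-z]+[0123456789.:]+)\s"
OLD_AIX = r"^Bytes:\s+(\d+)\s+(\d+)"
NEW_AIX = r"^Bytes:\s+(\d+)\s+Bytes:\s+(\d+)"

def interface_name(pattern, line):
    """Return the captured interface name, or None if the line is not recognised."""
    m = re.match(pattern, line)
    return m.group(1) if m else None

headers = [
    "eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C",
    "eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C",
    "eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:FC:14:DD",
]
for line in headers:
    print(interface_name(OLD_LINUX, line), interface_name(NEW_LINUX, line))

# Hypothetical two-column AIX line, for illustration only.
aix_line = "Bytes: 1829192                                Bytes: 1827320"
print(re.match(OLD_AIX, aix_line), re.match(NEW_AIX, aix_line).groups())
```

With the old Linux expression the "eth1.9" and "eth0:1" header lines are not recognised at all. One reading, which the thread does not state outright, is that their byte counters are then attributed to the previously seen interface, which would be consistent with the "Duplicate match" error.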
list Joe Sloan · Wed, 02 Aug 2006 09:22:02 -0700 ·
quoted from Stephane Caminade

Stephane Caminade wrote:
Have you considered setting up some kind of Heartbeat or VRRP system ?
At my lab, we use VRRP to share one IP between a master DNS and a
secondary DNS which takes over if the primary fails (we have the same
system for our web site and our mail server).
If the slave cannot contact the master, it takes over the 'public' IP,
and can start some services, like bind or dhcpd for example.
Heartbeat seems to offer similar possibilities, but I
haven't looked into it yet.
You could maybe set up your "b" site to start sending notifications in
the event that site "a" is unreachable?

We thought about this, and the problem with the generic solutions is that they
tend to be active/passive. We need both sides active and fully functional all
the time, just without redundant notifications, and the failover mechanism of
bb does exactly what is needed, out of the box.

We could, given enough time and effort, implement something that would do what
we need, but management tends to be very conservative about change, and very
reluctant to allow us to spend time on anything not related to the current
projects. It's the power of inertia, and the old "If it ain't broke, don't fix
it" mentality. IOW, the bb/bbgen-3.6 combo is "good enough" to keep running.


J
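
The behaviour described in this thread (both sides monitor everything, side "a" normally notifies, side "b" notifies only while it cannot reach "a") comes down to a small decision rule. A hypothetical sketch of that rule only; it is not how BB implements failover, and the function and parameter names are invented for illustration:

```python
def should_notify(side, peer_reachable):
    """Decide whether this monitoring side should send notifications.

    side: "a" (normal notifier) or "b" (takes over on failure).
    peer_reachable: whether this side can currently reach the other side.
    """
    if side == "a":
        # Side "a" always notifies.
        return True
    # Side "b" notifies only while it cannot reach side "a".
    return not peer_reachable

print(should_notify("b", peer_reachable=False))
```

Because side "b" re-evaluates reachability on every call, it stops notifying as soon as side "a" is reachable again, which matches the "until side a becomes reachable again" behaviour described earlier in the thread.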
list Ralph Mitchell · Thu, 3 Aug 2006 01:32:11 -0500 ·
quoted from Joe Sloan
On 8/2/06, J Sloan <user-b1d2c84d244b@xymon.invalid> wrote:
We could, given enough time and effort, implement something that would do what
we need, but management tends to be very conservative about change, and very
reluctant to allow us to spend time on anything not related to the current
projects. It's the power of inertia, and the old "If it ain't broke, don't fix
it" mentality. IOW, the bb/bbgen-3.6 combo is "good enough" to keep running.
I have a similar kind of management.  I came across Hobbit around
Christmas and have been running it in parallel to Big Brother since
then.  The problem of how to switch over was solved for me back in May
when the power supply in my Big Brother server blew out.  I swear it
was nothing I did... :)  The machine is old and probably off
maintenance, so I figured it would be faster to load a backup copy of
my checkout scripts onto the Hobbit server and run with that.

Everybody I've spoken with about it either doesn't care or prefers
Hobbit.  The lone exception being one person who would prefer to just
click on a recycle icon to flip between the main page and the summary,
instead of using the drop-down menu...

Ralph Mitchell
list Rolf Schrittenlocher · Thu, 03 Aug 2006 09:24:11 +0200 ·
Hi,

We have the same issue with netstat and vmstat on Sun Solaris 9 (Hobbit
4.1.2). We also had it with other tests while running more than
two instances of the Hobbit client using different virtual hosts on one machine.

regards
Rolf
-- 

Kind regards
Rolf Schrittenlocher

HRZ/BDV, Senckenberganlage 31, 60054 Frankfurt 
Tel: (XX) XX - XXX XXXXX   Fax: (XX) XX - XXX XXXXX
LBS: user-1e39a1813094@xymon.invalid
Personal: user-6ea8e907e200@xymon.invalid