Hobbit newbie from BB: differences and what may I lose from migrating?
list Kent Brodie
Basically, you lose NOTHING when you change a BB server to a Hobbit server. At first, it'll look very similar. Life will be good. The BB clients can run untouched. However- once you start setting up a few *Hobbit* clients, you'll quickly see what Hobbit DOES-- and what a typical BB client does NOT do. That's the moment when you'll race to wipe BB completely. Took me about a week. :-)
-----Original Message-----
From: Jordan Mendler [mailto:user-d91c99e0e5c6@xymon.invalid]
Sent: Tuesday, August 01, 2006 8:43 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] Hobbit newbie from BB: differences and what may I lose from migrating?
Cool. I guess I'll add a second display to bb-hosts and give hobbit a run. I'll just use Shmux to deploy bb-hosts to all the clients (figured I'd mention that great application while I'm at it :-)
Once again, thanks for all the help everyone, hopefully my next message here will be as a convert.
Jordan
list Joe Sloan
Brodie, Kent wrote:
Basically, you lose NOTHING when you change a BB server to a Hobbit server. At first, it'll look very similar. Life will be good. The BB clients can run untouched. However- once you start setting up a few *Hobbit* clients, you'll quickly see what Hobbit DOES-- and what a typical BB client does NOT do. That's the moment when you'll race to wipe BB completely. Took me about a week. :-)
We'd like to replace bb with hobbit but there's no way we can do without the bb failover mechanism. We have 2 separate data centers, and while there are bb servers in both data centers monitoring the hosts on both sides, only side "a" does notifications. When side "b" cannot reach side "a", then side "b" "fails over" and takes on the notification tasks, until side "a" becomes reachable again. There's nothing like that in hobbit yet, but if there were, we'd be able to make the switch.
J
list Henrik Størner
On Tue, Aug 01, 2006 at 10:18:36PM -0700, J Sloan wrote:
Brodie, Kent wrote: Basically, you lose NOTHING when you change a BB server to a Hobbit server.
We'd like to replace bb with hobbit but there's no way we can do without the bb failover mechanism. We have 2 separate data centers, and while there are bb servers in both data centers monitoring the hosts on both sides, only side "a" does notifications. When side "b" cannot reach side "a", then side "b" "fails over" and takes on the notification tasks, until side "a" becomes reachable again. There's nothing like that in hobbit yet, but if there were, we'd be able to make the switch.
I won't say it is being worked on, but it is definitely on my agenda. My own setup is identical to yours, except that we have a procedure for doing the failover from site "a" to site "b" manually. I've done some planning for how to implement an active/passive cluster-like setup in Hobbit, so ... it's coming.
Regards,
Henrik
list Stephane Caminade
Have you considered setting up some kind of Heartbeat or VRRP system ?
At my lab, we use VRRP to share one IP between a master DNS and a secondary DNS which takes over if the primary fails (we have the same system for our web site and our mail server).
If the slave cannot contact the master, it takes over the 'public' IP, and can start some services, like bind or dhcpd for example.
There seem to be the same kind of possibilities with Heartbeat, but I haven't looked into it yet.
You could maybe set up your "b" site to start sending notifications in the event that site "a" is unreachable ?
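One way to picture this suggestion: site "b" probes site "a" and takes over notifications only after several consecutive failed probes. The Python sketch below is an illustration under assumptions, not a Hobbit or BB feature. The threshold of three failures and the names site_a_reachable and FailoverState are hypothetical, and the probe simply attempts a TCP connection to port 1984, the port the BB and Hobbit daemons conventionally listen on.

```python
import socket


def site_a_reachable(host, port=1984, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverState:
    """Track consecutive probe failures and decide who sends notifications."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    def update(self, reachable):
        """Record one probe result; return True if site "b" should notify."""
        if reachable:
            # A single good probe hands notifications back to site "a".
            self.failures = 0
        else:
            self.failures += 1
        return self.failures >= self.max_failures
```

In a loop that probes once a minute, a True result from state.update(site_a_reachable("site-a.example.com")) would be the cue for site "b" to start notifying (the host name is a placeholder). Requiring several failures in a row is a design choice meant to avoid flapping on a single lost probe.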
Stephane
--
_____________________________________________________________________________
Stephane Caminade
Administrateur Systemes et Reseaux
\
Institut d'Astrophysique Spatiale / tel : (XX) (X) XX XX XX XX
Batiment 121, Universite Paris XI \ fax : (XX) (X) XX XX XX XX
F-91405 ORSAY Cedex / www : http://www.ias.u-psud.fr/
_____________________________________________________________________________
list Beau Olivier
Hi, I'm getting "Internal error: Duplicate match ignored" in my rrd-data.log. What could cause this?
olivier
list Henrik Størner
On Wed, Aug 02, 2006 at 12:23:53PM +0200, Beau Olivier wrote:
I'm having "Internal error: Duplicate match ignored" in my rrd-data.log, what could cause this ?
It means your netstat data doesn't look like what Hobbit expects:
it found two or more values for the same piece of data.
The best way of identifying which data causes this is probably to
run two things at the same time:
1) Log in as the hobbit user, and run
bbcmd hobbitd_channel --channel=data tee /tmp/data.log
2) Run "tail -f" on the rrd-data.log file.
When you see that error message in the rrd-data.log file, terminate
the first command. You should then have the "guilty" data at the end of
the /tmp/data.log file.
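Once the first command has been stopped, only the last message in /tmp/data.log matters. Judging from the dumps posted in this thread, each message starts with a line beginning "@@data#" and ends with a line containing only "@@". Assuming that framing holds, a small Python sketch (the function name last_data_message is made up for illustration) can pull out the final message:

```python
def last_data_message(log_text):
    """Return the lines of the last '@@data#...' message in log_text.

    Assumes each message starts with a line beginning '@@data#' and ends
    with a line that is exactly '@@'. A message cut off before its
    terminator (for example when tee was interrupted) is still returned.
    """
    messages = []
    current = None
    for line in log_text.splitlines():
        if line.startswith("@@data#"):
            # Header line: start collecting a new message.
            current = [line]
            messages.append(current)
        elif line.strip() == "@@":
            # Terminator line: stop collecting until the next header.
            current = None
        elif current is not None:
            current.append(line)
    return messages[-1] if messages else []
```

Reading the file with open("/tmp/data.log").read() and printing the returned lines should show the header (host name and test name) followed by the payload that was being processed when the error appeared.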
I'd obviously be interested to see what it looks like.
Regards,
Henrik
list Beau Olivier
Hi,
Yes, this is interesting, and I think it points to a new problem: 802.1q on NICs:
eth1 Link encap:Ethernet HWaddr 00:0D:9D:4E:11:9C
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:217776970 (207.6 MiB) TX bytes:4275403340 (3.9 GiB)
Interrupt:201
eth1.9 Link encap:Ethernet HWaddr 00:0D:9D:4E:11:9C
inet addr:192.168.250.33 Bcast:192.168.250.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:520111630 (496.0 MiB) TX bytes:410431496 (391.4 MiB)
eth1.15 Link encap:Ethernet HWaddr 00:0D:9D:4E:11:9C
inet addr:10.11.99.99 Bcast:10.11.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1909363 errors:0 dropped:0 overruns:0 frame:0
TX packets:7253215 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:110322292 (105.2 MiB) TX bytes:1702401944 (1.5 GiB)
olivier
list Kent Brodie
Aha...! I asked this very question last week- nobody gave me help...
(sob, sob..).
Anyway, here are two separate chunks of data that are CAUSING the
duplicate data error like the one Beau is having.
Henrik, what in this data is considered to be "duplicate"? (I do
notice that the netstat data in AIX is um... QUITE verbose........)
Data segments follow. Help?
@@data#810|1154529466.624163|999.888.777.202||fred.moo.mcw.edu|ifstat
data fred,moo,mcw,edu.ifstat
aix
ETHERNET STATISTICS (ent0) :
Device Type: 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
Hardware Address: 00:02:55:4f:a9:f4
Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 0 Packets: 0
Bytes: 0 Bytes: 0
Interrupts: 2 Interrupts: 0
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 0
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0
Broadcast Packets: 0 Broadcast Packets: 0
Multicast Packets: 0 Multicast Packets: 0
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0
General Statistics:
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 200
Driver Flags: Up Broadcast Running
Simplex AlternateAddress 64BitSupport
ChecksumOffload PrivateSegment LargeSend
DataRateSet
10/100 Mbps Ethernet PCI Adapter II (1410ff01) Specific Statistics:
Link Status: Down
Media Speed Selected: Auto negotiation
Media Speed Running: Unknown
Receive Pool Buffer Size: 1024
Free Receive Pool Buffers: 1024
No Receive Pool Buffer Errors: 0
Receive Buffer Too Small Errors: 0
Entries to transmit timeout routine: 0
Transmit IPsec packets: 0
Transmit IPsec packets dropped: 0
Receive IPsec packets: 0
Receive IPsec packets dropped: 0
Inbound IPsec SA offload count: 0
Transmit Large Send packets: 0
Transmit Large Send packets dropped: 0
Packets with Transmit collisions:
1 collisions: 0 6 collisions: 0 11 collisions: 0
2 collisions: 0 7 collisions: 0 12 collisions: 0
3 collisions: 0 8 collisions: 0 13 collisions: 0
4 collisions: 0 9 collisions: 0 14 collisions: 0
5 collisions: 0 10 collisions: 0 15 collisions: 0
ETHERNET STATISTICS (ent1) :
Device Type: 10/100 Mbps Ethernet PCI Adapter II (1410ff01)
Hardware Address: 00:02:55:4f:a9:f3
Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 0 Packets: 0
Bytes: 0 Bytes: 0
Interrupts: 2 Interrupts: 0
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 0
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0
Broadcast Packets: 0 Broadcast Packets: 0
Multicast Packets: 0 Multicast Packets: 0
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0
General Statistics:
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 200
Driver Flags: Up Broadcast Running
Simplex AlternateAddress 64BitSupport
ChecksumOffload PrivateSegment LargeSend
DataRateSet
10/100 Mbps Ethernet PCI Adapter II (1410ff01) Specific Statistics:
Link Status: Down
Media Speed Selected: Auto negotiation
Media Speed Running: Unknown
Receive Pool Buffer Size: 1024
Free Receive Pool Buffers: 1024
No Receive Pool Buffer Errors: 0
Receive Buffer Too Small Errors: 0
Entries to transmit timeout routine: 0
Transmit IPsec packets: 0
Transmit IPsec packets dropped: 0
Receive IPsec packets: 0
Receive IPsec packets dropped: 0
Inbound IPsec SA offload count: 0
Transmit Large Send packets: 0
Transmit Large Send packets dropped: 0
Packets with Transmit collisions:
1 collisions: 0 6 collisions: 0 11 collisions: 0
2 collisions: 0 7 collisions: 0 12 collisions: 0
3 collisions: 0 8 collisions: 0 13 collisions: 0
4 collisions: 0 9 collisions: 0 14 collisions: 0
5 collisions: 0 10 collisions: 0 15 collisions: 0
ETHERNET STATISTICS (ent2) :
Device Type: 10/100/1000 Base-TX PCI-X Adapter (14106902)
Hardware Address: 00:02:55:53:c2:3e
Elapsed Time: 196 days 15 hours 56 minutes 59 seconds
Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 365802341 Packets: 1036460447
Bytes: 1683156286637 Bytes: 112378387074
Interrupts: 0 Interrupts: 614513005
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 30
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0
Broadcast Packets: 16681 Broadcast Packets: 210557123
Multicast Packets: 0 Multicast Packets: 283080
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 381 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0
General Statistics:
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
PrivateSegment LargeSend DataRateSet
10/100/1000 Base-TX PCI-X Adapter (14106902) Specific Statistics:
Link Status: Up
Media Speed Selected: Auto negotiation
Media Speed Running: 1000 Mbps Full Duplex
PCI Mode: PCI-X (100-133)
PCI Bus Width: 64-bit
Latency Timer: 144
Cache Line Size: 128
Jumbo Frames: Disabled
TCP Segmentation Offload: Enabled
TCP Segmentation Offload Packets Transmitted: 113291719
TCP Segmentation Offload Packet Errors: 0
Transmit and Receive Flow Control Status: Enabled
XON Flow Control Packets Transmitted: 0
XON Flow Control Packets Received: 430
XOFF Flow Control Packets Transmitted: 0
XOFF Flow Control Packets Received: 430
Transmit and Receive Flow Control Threshold (High): 45056
Transmit and Receive Flow Control Threshold (Low): 24576
Transmit and Receive Storage Allocation (TX/RX): 16/48
@@
@@data#811|1154529466.624871|999.777.666.202||phred.mrr.mcw.edu|netstat
data phred,mrr,mcw,edu.netstat
aix
icmp:
597 calls to icmp_error
0 errors not generated because old message was icmp
Output histogram:
echo reply: 58031
destination unreachable: 537
1 message with bad code fields
0 messages < minimum length
0 bad checksums
0 messages with bad length
Input histogram:
echo reply: 2
destination unreachable: 562
echo: 58031
time exceeded: 12
58031 message responses generated
igmp:
282998 messages received
0 messages received with too few bytes
0 messages received with bad checksum
282998 membership queries received
0 membership queries received with invalid field(s)
0 membership reports received
0 membership reports received with invalid field(s)
0 membership reports received for groups to which we belong
2 membership reports sent
tcp:
449406211 packets sent
419904427 data packets (1612944301 bytes)
176131 data packets (218422100 bytes) retransmitted
20439309 ack-only packets (8520363 delayed)
6 URG only packets
420337 window probe packets
2131650 window update packets
6334351 control packets
113291719 large sends
3038842766 bytes sent using largesend
64240 bytes is the biggest largesend
909296772 packets received
857387271 acks (for 1835107293 bytes)
7844383 duplicate acks
0 acks for unsent data
307390334 packets (2464690667 bytes) received in-sequence
45039 completely duplicate packets (1602267 bytes)
0 old duplicate packets
5 packets with some dup. data (624 bytes duped)
2441887 out-of-order packets (440550 bytes)
8 packets (8 bytes) of data after window
8 window probes
2587766 window update packets
3902 packets received after close
0 packets with bad hardware assisted checksum
0 discarded for bad checksums
0 discarded for bad header offset fields
0 discarded because packet too short
1440 discarded by listeners
0 discarded due to listener's queue full
23840405 ack packet headers correctly predicted
35343642 data packet headers correctly predicted
399880 connection requests
5548229 connection accepts
5947965 connections established (including accepts)
5953023 connections closed (including 25563 drops)
0 connections with ECN capability
0 times responded to ECN
137 embryonic connections dropped
404559742 segments updated rtt (of 404578267 attempts)
0 segments with congestion window reduced bit set
0 segments with congestion experienced bit set
0 resends due to path MTU discovery
10019 path MTU discovery terminations due to retransmits
29115 retransmit timeouts
1 connection dropped by rexmit timeout
2 fast retransmits
0 when congestion window less than 4 segments
3 newreno retransmits
0 times avoided false fast retransmits
427603 persist timeouts
0 connections dropped due to persist timeout
8957 keepalive timeouts
8928 keepalive probes sent
29 connections dropped by keepalive
0 times SACK blocks array is extended
0 times SACK holes array is extended
0 packets dropped due to memory allocation failure
0 connections in timewait reused
0 delayed ACKs for SYN
0 delayed ACKs for FIN
0 send_and_disconnects
0 spliced connections
0 spliced connections closed
0 spliced connections reset
0 spliced connections timeout
0 spliced connections persist timeout
0 spliced connections keepalive timeout
udp:
13767017 datagrams received
0 incomplete headers
0 bad data length fields
0 bad checksums
597 dropped due to no socket
6762980 broadcast/multicast datagrams dropped due to no socket
0 dropped due to full socket buffers
7003440 delivered
6994616 datagrams output
ip:
923406868 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with header length < data size
0 with data length < header length
0 with bad options
0 with incorrect version number
0 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets reassembled ok
923121828 packets for this host
283574 packets for unknown/unsupported protocol
0 packets forwarded
1396 packets not forwardable
0 redirects sent
456499667 packets sent from this host
0 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
0 output packets discarded due to no route
0 output datagrams fragmented
0 fragments created
0 datagrams that can't be fragmented
69 IP Multicast packets dropped due to no receiver
0 successful path MTU discovery cycles
0 path MTU rediscovery cycles attempted
0 path MTU discovery no-response estimates
0 path MTU discovery response timeouts
0 path MTU discovery decreases detected
0 path MTU discovery packets sent
0 path MTU discovery memory allocation failures
0 ipintrq overflows
0 with illegal source
0 packets processed by threads
0 packets dropped by threads
0 packets dropped due to the full socket receive buffer
0 dead gateway detection packets sent
0 dead gateway detection packet allocation failures
0 dead gateway detection gateway allocation failures
ipv6:
3 total packets received
0 with size smaller than minimum
0 with data size < data length
0 with incorrect version number
0 with illegal source
0 input packets without enough memory
0 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets reassembled ok
0 packets for this host
0 packets for unknown/unsupported protocol
0 packets forwarded
3 packets not forwardable
0 too big packets not forwarded
0 packets sent from this host
0 packets sent with fabricated ipv6 header
0 output packets dropped due to no bufs
0 output packets without enough memory
0 output packets discarded due to no route
0 output datagrams fragmented
0 fragments created
0 packets dropped due to full socket receive buffer
0 packets not delivered due to bad raw IPv6 checksum
icmpv6:
0 calls to icmp6_error
0 errors not generated because old message was icmpv6
Output histogram:
unreachable: 0
packets too big: 0
time exceeded: 0
parameter problems: 0
redirects: 0
echo requests: 0
echo replies: 0
group queries: 0
group reports: 0
group terminations: 0
router solicitations: 0
router advertisements: 0
neighbor solicitations: 0
neighbor advertisements: 0
0 messages with bad code fields
0 messages < minimum length
0 bad checksums
0 messages with bad length
Input histogram:
unreachable: 0
packets too big: 0
time exceeded: 0
parameter problems: 0
echo requests: 0
echo replies: 0
group queries: 0
bad group queries: 0
group reports: 0
bad group reports: 0
our groups' reports: 0
group terminations: 0
bad group terminations: 0
router solicitations: 0
bad router solicitations: 0
router advertisements: 0
bad router advertisements: 0
neighbor solicitations: 0
bad neighbor solicitations: 0
neighbor advertisements: 0
bad neighbor advertisements: 0
redirects: 0
bad redirects: 0
mobility calls when not started: 0
home agent address discovery requests: 0
bad home agent address discovery requests: 0
bad home agent address discovery replys: 0
bad home agent address discovery replys: 0
prefix solicitations: 0
bad prefix solicitations: 0
prefix advertisements: 0
bad prefix advertisements: 0
0 message responses generated
@@
@@data#1228|1154529774.054469|141.106.224.202||jordan.hmgc.mcw.edu|netstat
data jordan,hmgc,mcw,edu.netstat
aix
icmp:
29 calls to icmp_error
0 errors not generated because old message was icmp
Output histogram:
echo reply: 16024
destination unreachable: 17
45 messages with bad code fields
0 messages < minimum length
0 bad checksums
0 messages with bad length
Input histogram:
echo reply: 7
destination unreachable: 58
echo: 16024
16024 message responses generated
igmp:
79096 messages received
0 messages received with too few bytes
0 messages received with bad checksum
79096 membership queries received
0 membership queries received with invalid field(s)
0 membership reports received
0 membership reports received with invalid field(s)
0 membership reports received for groups to which we belong
2 membership reports sent
tcp:
256987083 packets sent
233786265 data packets (146658044 bytes)
30885 data packets (6116690 bytes) retransmitted
16701059 ack-only packets (3980875 delayed)
0 URG only packets
193 window probe packets
17752 window update packets
6450950 control packets
45166554 large sends
740111354 bytes sent using largesend
64240 bytes is the biggest largesend
496262560 packets received
450701474 acks (for 151098725 bytes)
9591183 duplicate acks
0 acks for unsent data
195459972 packets (949784396 bytes) received in-sequence
23755 completely duplicate packets (8371 bytes)
0 old duplicate packets
0 packets with some dup. data (0 bytes duped)
3375737 out-of-order packets (104293 bytes)
0 packets (0 bytes) of data after window
0 window probes
4869090 window update packets
3508 packets received after close
0 packets with bad hardware assisted checksum
0 discarded for bad checksums
0 discarded for bad header offset fields
0 discarded because packet too short
1117 discarded by listeners
0 discarded due to listener's queue full
10028924 ack packet headers correctly predicted
24673885 data packet headers correctly predicted
100631 connection requests
6257521 connection accepts
6358083 connections established (including accepts)
6359382 connections closed (including 24024 drops)
0 connections with ECN capability
0 times responded to ECN
65 embryonic connections dropped
240375423 segments updated rtt (of 240404633 attempts)
0 segments with congestion window reduced bit set
0 segments with congestion experienced bit set
0 resends due to path MTU discovery
14379 path MTU discovery terminations due to retransmits
31032 retransmit timeouts
5 connections dropped by rexmit timeout
3 fast retransmits
0 when congestion window less than 4 segments
10 newreno retransmits
5 times avoided false fast retransmits
195 persist timeouts
0 connections dropped due to persist timeout
1407 keepalive timeouts
1393 keepalive probes sent
14 connections dropped by keepalive
0 times SACK blocks array is extended
0 times SACK holes array is extended
0 packets dropped due to memory allocation failure
0 connections in timewait reused
0 delayed ACKs for SYN
0 delayed ACKs for FIN
0 send_and_disconnects
0 spliced connections
0 spliced connections closed
0 spliced connections reset
0 spliced connections timeout
0 spliced connections persist timeout
0 spliced connections keepalive timeout
udp:
6773177 datagrams received
0 incomplete headers
0 bad data length fields
0 bad checksums
29 dropped due to no socket
471205 broadcast/multicast datagrams dropped due to no socket
0 socket buffer overflows
6301943 delivered
6301977 datagrams output
ip:
504670433 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with header length < data size
0 with data length < header length
0 with bad options
0 with incorrect version number
0 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets reassembled ok
503051772 packets for this host
79154 packets for unknown/unsupported protocol
0 packets forwarded
1539510 packets not forwardable
0 redirects sent
263316001 packets sent from this host
0 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
0 output packets discarded due to no route
0 output datagrams fragmented
0 fragments created
0 datagrams that can't be fragmented
0 IP Multicast packets dropped due to no receiver
0 successful path MTU discovery cycles
0 path MTU rediscovery cycles attempted
0 path MTU discovery no-response estimates
0 path MTU discovery response timeouts
0 path MTU discovery decreases detected
0 path MTU discovery packets sent
0 path MTU discovery memory allocation failures
0 ipintrq overflows
0 with illegal source
0 packets processed by threads
0 packets dropped by threads
0 packets dropped due to the full socket receive buffer
0 dead gateway detection packets sent
0 dead gateway detection packet allocation failures
0 dead gateway detection gateway allocation failures
ipv6:
0 total packets received
0 with size smaller than minimum
0 with data size < data length
0 with incorrect version number
0 with illegal source
0 input packets without enough memory
0 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets reassembled ok
0 packets for this host
0 packets for unknown/unsupported protocol
0 packets forwarded
0 packets not forwardable
0 too big packets not forwarded
0 packets sent from this host
0 packets sent with fabricated ipv6 header
0 output packets dropped due to no bufs
0 output packets without enough memory
0 output packets discarded due to no route
0 output datagrams fragmented
0 fragments created
0 packets dropped due to full socket receive buffer
0 packets not delivered due to bad raw IPv6 checksum
icmpv6:
0 calls to icmp6_error
0 errors not generated because old message was icmpv6
Output histogram:
unreachable: 0
packets too big: 0
time exceeded: 0
parameter problems: 0
redirects: 0
echo requests: 0
echo replies: 0
group queries: 0
group reports: 0
group terminations: 0
router solicitations: 0
router advertisements: 0
neighbor solicitations: 0
neighbor advertisements: 0
0 messages with bad code fields
0 messages < minimum length
0 bad checksums
0 messages with bad length
Input histogram:
unreachable: 0
packets too big: 0
time exceeded: 0
parameter problems: 0
echo requests: 0
echo replies: 0
group queries: 0
bad group queries: 0
group reports: 0
bad group reports: 0
our groups' reports: 0
group terminations: 0
bad group terminations: 0
router solicitations: 0
bad router solicitations: 0
router advertisements: 0
bad router advertisements: 0
neighbor solicitations: 0
bad neighbor solicitations: 0
neighbor advertisements: 0
bad neighbor advertisements: 0
redirects: 0
bad redirects: 0
mobility calls when not started: 0
home agent address discovery requests: 0
bad home agent address discovery requests: 0
bad home agent address discovery replys: 0
bad home agent address discovery replys: 0
prefix solicitations: 0
bad prefix solicitations: 0
prefix advertisements: 0
bad prefix advertisements: 0
0 message responses generated
@@
Kent C. Brodie - user-da7f7d5174c0@xymon.invalid
Department of Physiology
Medical College of Wisconsin
(XXX) XXX-XXXX
list Henrik Størner
On Wed, Aug 02, 2006 at 01:50:28PM +0200, Beau Olivier wrote:
Hi,
yes, this is interesting, and i think it points out a new problem, 802.1q on nics :
eth1 Link encap:Ethernet HWaddr 00:0D:9D:4E:11:9C
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:217776970 (207.6 MiB) TX bytes:4275403340 (3.9 GiB)
Interrupt:201
eth1.9 Link encap:Ethernet HWaddr 00:0D:9D:4E:11:9C
inet addr:192.168.250.33 Bcast:192.168.250.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:520111630 (496.0 MiB) TX bytes:410431496 (391.4 MiB)
Perhaps, but these data should not get anywhere near the code that
prints out this message. The code that generates that message is
the one that parses the output from "netstat -s" which should look like
Ip:
3017099 total packets received
1 with invalid addresses
0 forwarded
0 incoming packets discarded
3017058 incoming packets delivered
3154813 requests sent out
Icmp:
51081 ICMP messages received
0 input ICMP message failed.
What does this command report on your host?
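To check a host's "netstat -s" output for repeated entries before sending it in, something like the Python sketch below may help. It is not Hobbit's parser and does not claim to reproduce its matching rules. It assumes a section header is a single word ending in a colon (such as "Ip:" or "tcp:") and a counter line is a number followed by a label, then reports labels that occur more than once within a section. For instance, the AIX tcp: section posted in this thread lists "window update packets" twice, once under packets sent and once under packets received; whether that is what triggers the message is not established here.

```python
import re
from collections import Counter

SECTION_RE = re.compile(r"^(\S+):$")        # e.g. "Ip:" or "tcp:"
COUNTER_RE = re.compile(r"^(\d+)\s+(.+)$")  # e.g. "0 forwarded"


def duplicate_labels(text):
    """Map each section name to the labels that occur more than once in it."""
    counts = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            counts.setdefault(section, Counter())
            continue
        counter = COUNTER_RE.match(line)
        if section is not None and counter:
            counts[section][counter.group(2)] += 1
    # Keep only sections that actually contain a repeated label.
    return {
        name: sorted(label for label, n in labels.items() if n > 1)
        for name, labels in counts.items()
        if any(n > 1 for n in labels.values())
    }
```

An empty result means no label repeats inside any section under these assumptions; lines that do not start with a number, such as the histogram entries, are ignored.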
Regards,
Henrik
list Kent Brodie
I am stabbing in the dark here, but the duplicate data on my end seems to be caused by parsing the output of the netstat -s command on *AIX*. What is different here is that the netstat -s command on AIX is much more verbose, showing stuff for both ipv4 and ipv6. Perhaps the "icmp:" and "icmpv6:" or other similar items are where the parsing breaks, and supposed duplicates are detected?
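To make that guess concrete: if a parser picked sections by a case-insensitive prefix, then "icmp" would select both "icmp:" and "icmpv6:", and any label present in both would yield two values. The Python sketch below only illustrates that hypothesis; it is not taken from Hobbit's source, and the function name values_for and the prefix-matching rule are assumptions.

```python
import re

SECTION_RE = re.compile(r"^(\S+):$")        # e.g. "icmp:" or "icmpv6:"
COUNTER_RE = re.compile(r"^(\d+)\s+(.+)$")  # e.g. "0 bad checksums"


def values_for(text, section_prefix, label):
    """Collect every value of `label` in sections whose name starts with
    `section_prefix`, compared case-insensitively."""
    found = []
    in_section = False
    prefix = section_prefix.lower()
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION_RE.match(line)
        if header:
            in_section = header.group(1).lower().startswith(prefix)
            continue
        counter = COUNTER_RE.match(line)
        if in_section and counter and counter.group(2) == label:
            found.append(int(counter.group(1)))
    return found
```

Run against the AIX dumps in this thread, values_for(text, "icmp", "bad checksums") would be expected to return two values, one from each section, which is the kind of "duplicate" suspected here.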
Kent C. Brodie - user-da7f7d5174c0@xymon.invalid
Department of Physiology
Medical College of Wisconsin
(XXX) XXX-XXXX
list Beau Olivier
Here is the netstat output from data.log:
Ip:
6774912 total packets received
8069 forwarded
0 incoming packets discarded
6766842 incoming packets delivered
12918060 requests sent out
Icmp:
725255 ICMP messages received
1 input ICMP message failed.
ICMP input histogram:
destination unreachable: 712247
timeout in transit: 30
echo requests: 4212
echo replies: 8766
716456 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 712244
echo replies: 4212
Tcp:
19091 active connections openings
20926 passive connection openings
1 failed connection attempts
2952 connection resets received
2 connections established
4460418 segments received
11359826 segments send out
76470 segments retransmited
0 bad segments received.
3763 resets sent
Udp:
105882 packets received
711976 packets to unknown port received.
0 packet receive errors
817609 packets sent
TcpExt:
ArpFilter: 0
24632 TCP sockets finished time wait in fast timer
1629 delayed acks sent
2 delayed acks further delayed because of locked socket
Quick ack mode was activated 10 times
1101 packets directly queued to recvmsg prequeue.
208762 packets directly received from backlog
8763 packets directly received from prequeue
417698 packets header predicted
155 packets header predicted and directly queued to user
TCPPureAcks: 586976
TCPHPAcks: 3180140
TCPRenoRecovery: 0
TCPSackRecovery: 46644
TCPSACKReneging: 0
TCPFACKReorder: 0
TCPSACKReorder: 0
TCPRenoReorder: 0
TCPTSReorder: 0
TCPFullUndo: 0
TCPPartialUndo: 0
TCPDSACKUndo: 0
TCPLossUndo: 1
TCPLoss: 20751
TCPLostRetransmit: 61
TCPRenoFailures: 0
TCPSackFailures: 272
TCPLossFailures: 1
TCPFastRetrans: 62624
TCPForwardRetrans: 1657
TCPSlowStartRetrans: 2052
TCPTimeouts: 3170
TCPRenoRecoveryFail: 0
TCPSackRecoveryFail: 3672
TCPSchedulerFailed: 0
TCPRcvCollapsed: 0
TCPDSACKOldSent: 11
TCPDSACKOfoSent: 0
TCPDSACKRecv: 0
TCPDSACKOfoRecv: 0
TCPAbortOnSyn: 0
TCPAbortOnData: 1432
TCPAbortOnClose: 9
TCPAbortOnMemory: 0
TCPAbortOnTimeout: 8
TCPAbortOnLinger: 0
TCPAbortFailed: 0
TCPMemoryPressures: 0
list Henrik Størner
▸
On Wed, Aug 02, 2006 at 04:52:37PM +0200, Henrik Stoerner wrote:
On Wed, Aug 02, 2006 at 01:50:28PM +0200, Beau Olivier wrote:
Hi, yes, this is interesting, and i think it points out a new problem, 802.1q on nics :
eth1 Link encap:Ethernet HWaddr 00:0D:9D:4E:11:9C
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2798842 errors:0 dropped:0 overruns:0 frame:0
TX packets:8950695 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:217776970 (207.6 MiB) TX bytes:4275403340 (3.9 GiB)
Interrupt:201
eth1.9 Link encap:Ethernet HWaddr 00:0D:9D:4E:11:9C
inet addr:192.168.250.33 Bcast:192.168.250.0 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2226941 errors:0 dropped:0 overruns:0 frame:0
TX packets:3441485 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:520111630 (496.0 MiB) TX bytes:410431496 (391.4 MiB)
Perhaps, but these data should not get anywhere near the code that prints out this message.
Yikes, I cannot remember my own code. You're right - it IS the interface statistics code that triggers this error. OK, I'll try and work out why and how it can be fixed.
Regards,
Henrik
list Henrik Størner
▸
On Wed, Aug 02, 2006 at 10:01:02AM -0500, Brodie, Kent wrote:
I am stabbing in the dark here, but the duplicate data on my end seems to be caused by parsing the output of the netstat -s command on *AIX*.
No, it's me who is confused. Thanks for your aix data, they do give me a way of reproducing the problem.
Regards,
Henrik
list Dominique Frise
▸
Henrik Stoerner wrote:
On Wed, Aug 02, 2006 at 10:01:02AM -0500, Brodie, Kent wrote:
I am stabbing in the dark here, but the duplicate data on my end seems to be caused by parsing the output of the netstat -s command on *AIX*.
No, it's me who is confused. Thanks for your aix data, they do give me a way of reproducing the problem.
Regards,
Henrik
We had the same problem with the following data (the client is RHAS 2.1). The same
statistics are reported for both eth0 interfaces:
@@data#366293|1154521218.440675|1.2.7.23||tulp|ifstat
data tulp.ifstat
linux22
eth0 Link encap:Ethernet HWaddr 00:0C:29:FC:14:DD
inet addr:1.2.5.36 Bcast:1.2.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:167690690 errors:2305 dropped:2628 overruns:0 frame:0
TX packets:155904732 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3709223888 (3537.3 Mb) TX bytes:2014658132 (1921.3 Mb)
Interrupt:10 Base address:0x1080
eth0:1 Link encap:Ethernet HWaddr 00:0C:29:FC:14:DD
inet addr:1.2.5.56 Bcast:1.2.5.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:167690690 errors:2305 dropped:2628 overruns:0 frame:0
TX packets:155904732 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3709223888 (3537.3 Mb) TX bytes:2014658132 (1921.3 Mb)
Interrupt:10 Base address:0x1080
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:18870 errors:0 dropped:0 overruns:0 frame:0
TX packets:18870 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:4785828 (4.5 Mb) TX bytes:4785828 (4.5 Mb)
@@
@@data#366294|1154521218.440921|130.223.27.23||tulp|vmstat
data tulp.vmstat
linux22
...
...
Hope this helps
Dominique
UNIL - University of Lausanne
list Henrik Størner
▸
On Wed, Aug 02, 2006 at 12:23:53PM +0200, Beau Olivier wrote:
I'm having "Internal error: Duplicate match ignored" in my rrd-data.log, what could cause this ?
Turns out to be a couple of bad regular expressions in the interface
statistics code. This patch should fix it for both the AIX and Linux
systems you've reported this on.
Regards,
Henrik
-------------- next part --------------
--- hobbitd/rrd/do_ifstat.c 2006/08/01 21:32:37 1.7
+++ hobbitd/rrd/do_ifstat.c 2006/08/02 15:25:48
@@ -20,7 +20,7 @@
/* eth0 Link encap: */
/* RX bytes: 1829192 (265.8 MiB) TX bytes: 1827320 (187.7 MiB */
static const char *ifstat_linux_exprs[] = {
- "^([a-z]+[0-9]+)\\s",
+ "^([a-z]+[0123456789.:]+)\\s",
"^\\s+RX bytes:([0-9]+) .*TX bytes.([0-9]+) "
};
@@ -73,7 +73,7 @@
*/
static const char *ifstat_aix_exprs[] = {
"^ETHERNET STATISTICS \\(([a-z0-9]+)\\) :",
- "^Bytes:\\s+(\\d+)\\s+(\\d+)"
+ "^Bytes:\\s+(\\d+)\\s+Bytes:\\s+(\\d+)"
};
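
To illustrate what the two pattern changes do, here is a small Python sketch (not Hobbit's C code). The regular expressions are copied from the patch, and Python's re module accepts them unchanged. The function parse_ifstat, the sample text and the AIX line are hypothetical: the sample reuses interface names and byte counters quoted in this thread, with ifconfig-style indentation restored, and the AIX line is only inferred from the shape of the new pattern. The duplicate handling is a simplified model, an assumption about why the error shows up rather than a copy of the real logic.

```python
import re

# Interface-name patterns before and after the patch (copied verbatim).
OLD_LINUX_IFNAME = re.compile(r"^([a-z]+[0-9]+)\s")
NEW_LINUX_IFNAME = re.compile(r"^([a-z]+[0123456789.:]+)\s")
# Byte-counter pattern, unchanged by the patch.
LINUX_BYTES = re.compile(r"^\s+RX bytes:([0-9]+) .*TX bytes.([0-9]+) ")

# AIX byte-counter patterns before and after the patch (copied verbatim).
OLD_AIX_BYTES = re.compile(r"^Bytes:\s+(\d+)\s+(\d+)")
NEW_AIX_BYTES = re.compile(r"^Bytes:\s+(\d+)\s+Bytes:\s+(\d+)")

# Hypothetical sample built from values quoted in this thread.
SAMPLE = """\
eth1      Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          RX bytes:217776970 (207.6 MiB)  TX bytes:4275403340 (3.9 GiB)
eth1.9    Link encap:Ethernet  HWaddr 00:0D:9D:4E:11:9C
          RX bytes:520111630 (496.0 MiB)  TX bytes:410431496 (391.4 MiB)
eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:FC:14:DD
          RX bytes:3709223888 (3537.3 Mb)  TX bytes:2014658132 (1921.3 Mb)
"""

# Hypothetical AIX line; the layout is an assumption inferred from the new pattern.
AIX_LINE = "Bytes: 1234                Bytes: 5678"


def parse_ifstat(text, ifname_re):
    """Simplified model: return (stats, duplicates).

    stats maps interface name -> (rx_bytes, tx_bytes). A byte-counter line
    seen for an interface that already has counters is recorded as a duplicate.
    """
    stats = {}
    duplicates = []
    current = None
    for line in text.splitlines():
        m = ifname_re.match(line)
        if m:
            current = m.group(1)  # start of a new interface block
            continue
        m = LINUX_BYTES.match(line)
        if m and current is not None:
            if current in stats:
                duplicates.append(current)
            else:
                stats[current] = (int(m.group(1)), int(m.group(2)))
    return stats, duplicates


if __name__ == "__main__":
    print(parse_ifstat(SAMPLE, OLD_LINUX_IFNAME))
    print(parse_ifstat(SAMPLE, NEW_LINUX_IFNAME))
    print(OLD_AIX_BYTES.match(AIX_LINE), NEW_AIX_BYTES.match(AIX_LINE).groups())
```

In this model the old pattern does not recognise the "eth1.9" and "eth0:1" header lines, so their byte counters look like repeats for eth1. With the patched pattern each name is picked up separately.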
list Joe Sloan
▸
Stephane Caminade wrote:
Have you considered setting up some kind of Heartbeat or VRRP system ? At my lab, we use VRRP to share one IP between a master DNS and a secondary DNS which takes over if the primary fails (we have the same system for our web site and our mail server). If the slave cannot contact the master, it takes over the 'public' IP, and can start some services, like bind or dhcpd for example. There seem to be the same kind of possibilities with Heartbeat, but I haven't looked into it yet. You could maybe set up your "b" site to start sending notifications in the event that site "a" is unreachable ?
We thought about this, and the problem with the generic solutions is that they tend to be active/passive. We need both sides active and fully functional all the time, just without redundant notifications, and the failover mechanism of bb does exactly what is needed, out of the box. We could, given enough time and effort, implement something that would do what we need, but management tends to be very conservative about change, and very reluctant to allow us to spend time on anything not related to the current projects. It's the power of inertia, and the old "If it ain't broke, don't fix it" mentality. IOW, the bb/bbgen-3.6 combo is "good enough" to keep running. J
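
The behaviour described in this thread (both sites monitor all the time, side "a" sends the notifications, and side "b" takes over only while it cannot reach side "a") can be sketched as a single decision function. This is a hypothetical Python illustration. The function name and its parameters are assumptions made for the example and are not part of BB or Hobbit.

```python
def should_send_notifications(site: str, primary_reachable: bool) -> bool:
    """Decide whether a monitoring site should send alerts right now.

    site: "a" for the primary notifier, "b" for the standby notifier.
    primary_reachable: whether site "b" can currently reach site "a"
    (ignored when evaluated on site "a" itself).
    """
    if site == "a":
        # The primary always notifies.
        return True
    # The standby notifies only while the primary is unreachable.
    return not primary_reachable


if __name__ == "__main__":
    for site, reachable in [("a", True), ("b", True), ("b", False)]:
        print(site, reachable, should_send_notifications(site, reachable))
```

Note that if only the link between the two sites fails, both sides would notify under this rule, which produces redundant alerts but does not lose any.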
list Ralph Mitchell
▸
On 8/2/06, J Sloan <user-b1d2c84d244b@xymon.invalid> wrote:
We could, given enough time and effort, implement something that would do what we need, but management tends to be very conservative about change, and very reluctant to allow us to spend time on anything not related to the current projects. It's the power of inertia, and the old "If it ain't broke, don't fix it" mentality. IOW, the bb/bbgen-3.6 combo is "good enough" to keep running.
I have a similar kind of management. I came across Hobbit around Christmas and have been running it in parallel to Big Brother since then. The problem of how to switch over was solved for me back in May when the power supply in my Big Brother server blew out. I swear it was nothing I did... :) The machine is old and probably off maintenance, so I figured it would be faster to load a backup copy of my checkout scripts onto the Hobbit server and run with that. Everybody I've spoken with about it either doesn't care or prefers Hobbit. The lone exception being one person who would prefer to just click on a recycle icon to flip between the main page and the summary, instead of using the drop-down menu... Ralph Mitchell
list Rolf Schrittenlocher
Hi, we have the same issue for netstat and vmstat on Sun Solaris 9 (hobbit 4.1.2). We also had it for other tests while running more than two instances of the hobbit client, using different virtual hosts, on one machine. Regards, Rolf
--
Mit freundlichen Gruessen
Rolf Schrittenlocher
HRZ/BDV, Senckenberganlage 31, 60054 Frankfurt
Tel: (XX) XX - XXX XXXXX Fax: (XX) XX - XXX XXXXX
LBS: user-1e39a1813094@xymon.invalid
Persoenlich: user-6ea8e907e200@xymon.invalid