Xymon Mailing List Archive

Patch not done yet? was RE: rrd-data.log

7 messages in this thread

list Kent Brodie · Wed, 2 Aug 2006 11:07:06 -0500 ·
Hi Henrik: Below are snippets from rrd that are still causing the
"Duplicate Error" on my end, even after applying the patch. In the
cases where netstat and ifstat data are shown together, I had to
include both because those chunks of data came out right at the time the
duplicate error appeared. It is too hard to tell from the timing which of
those two chunks of data causes the problem. In other cases, only ifstat
data caused the problem.

Large chunks of whitespace separate "instances" of the duplicate error
occurring.
===========


@@data#348|1154533834.591188|192.168.224.202||wolf13.hmgc.mcw.edu|netstat
data wolf13,hmgc,mcw,edu.netstat
linux
Ip:
    56731 total packets received
    0 forwarded
    0 incoming packets discarded
    56067 incoming packets delivered
    65344 requests sent out
Icmp:
    648 ICMP messages received
    0 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 89
        echo requests: 559
    648 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 89
        echo replies: 559
Tcp:
    4630 active connections openings
    1 passive connection openings
    1 failed connection attempts
    0 connection resets received
    0 connections established
    46150 segments received
    54675 segments send out
    85 segments retransmited
    0 bad segments received.
    1 resets sent
Udp:
    9933 packets received
    0 packets to unknown port received.
    0 packet receive errors
    10021 packets sent
TcpExt:
    ArpFilter: 0
    4616 TCP sockets finished time wait in fast timer
    908 delayed acks sent
    557 packets directly queued to recvmsg prequeue.
    507580 packets directly received from backlog
    554 packets directly received from prequeue
    19157 packets header predicted
    426 packets header predicted and directly queued to user
    TCPPureAcks: 7526
    TCPHPAcks: 9608
    TCPRenoRecovery: 0
    TCPSackRecovery: 0
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 0
    TCPLossUndo: 35
    TCPLoss: 0
    TCPLostRetransmit: 0
    TCPRenoFailures: 0
    TCPSackFailures: 0
    TCPLossFailures: 0
    TCPFastRetrans: 0
    TCPForwardRetrans: 0
    TCPSlowStartRetrans: 0
    TCPTimeouts: 85
    TCPRenoRecoveryFail: 0
    TCPSackRecoveryFail: 0
    TCPSchedulerFailed: 0
    TCPRcvCollapsed: 0
    TCPDSACKOldSent: 0
    TCPDSACKOfoSent: 0
    TCPDSACKRecv: 0
    TCPDSACKOfoRecv: 0
    TCPAbortOnSyn: 0
    TCPAbortOnData: 0
    TCPAbortOnClose: 0
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 0
    TCPAbortOnLinger: 0
    TCPAbortFailed: 0
    TCPMemoryPressures: 0
@@
@@data#349|1154533834.591337|192.168.224.202||wolf13.hmgc.mcw.edu|ifstat
data wolf13,hmgc,mcw,edu.ifstat
linux
eth1      Link encap:Ethernet  HWaddr 00:30:6E:F3:0B:46  
          inet addr:192.168.96.113  Bcast:192.168.96.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:72022 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71139 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:21476056 (20.4 Mb)  TX bytes:17076949 (16.2 Mb)
          Interrupt:56 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:189 errors:0 dropped:0 overruns:0 frame:0
          TX packets:189 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:16228 (15.8 Kb)  TX bytes:16228 (15.8 Kb)

@@


@@data#507|1154533965.074399|192.168.224.202||bc1s2.phys.mcw.edu|ifstat
data bc1s2,phys,mcw,edu.ifstat
linux
eth1      Link encap:Ethernet  HWaddr 00:0D:60:1E:0E:DD  
          inet addr:192.168.224.111  Bcast:192.168.224.255  Mask:255.255.255.0
          inet6 addr: fe80::20d:60ff:fe1e:edd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:43166350 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6166704 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100 
          RX bytes:14398847028 (13731.8 Mb)  TX bytes:2587305933 (2467.4 Mb)
          Interrupt:45 Memory:c0010000-c0020000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:104 errors:0 dropped:0 overruns:0 frame:0
          TX packets:104 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:9752 (9.5 Kb)  TX bytes:9752 (9.5 Kb)

@@


@@data#740|1154534133.480180|192.168.224.202||dunn.hmgc.mcw.edu|netstat
data dunn,hmgc,mcw,edu.netstat
linux
Ip:
    9154191 total packets received
    0 forwarded
    0 incoming packets discarded
    8170997 incoming packets delivered
    15226980 requests sent out
Icmp:
    8961 ICMP messages received
    9 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 16
        echo requests: 8945
    13860 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 4915
        echo replies: 8945
Tcp:
    33598 active connections openings
    15245 passive connection openings
    4 failed connection attempts
    106 connection resets received
    10 connections established
    7841472 segments received
    15058779 segments send out
    412 segments retransmited
    0 bad segments received.
    2935 resets sent
Udp:
    315663 packets received
    4901 packets to unknown port received.
    0 packet receive errors
    154341 packets sent
TcpExt:
    7 resets received for embryonic SYN_RECV sockets
    ArpFilter: 0
    37618 TCP sockets finished time wait in fast timer
    2 packets rejects in established connections because of timestamp
    12809 delayed acks sent
    2 delayed acks further delayed because of locked socket
    Quick ack mode was activated 1930 times
    144921 packets directly queued to recvmsg prequeue.
    7524 packets directly received from backlog
    40929271 packets directly received from prequeue
    47328 packets header predicted
    135097 packets header predicted and directly queued to user
    TCPPureAcks: 5602771
    TCPHPAcks: 2027416
    TCPRenoRecovery: 47
    TCPSackRecovery: 19
    TCPSACKReneging: 0
    TCPFACKReorder: 0
    TCPSACKReorder: 0
    TCPRenoReorder: 0
    TCPTSReorder: 0
    TCPFullUndo: 0
    TCPPartialUndo: 0
    TCPDSACKUndo: 0
    TCPLossUndo: 39
    TCPLoss: 55
    TCPLostRetransmit: 0
    TCPRenoFailures: 2
    TCPSackFailures: 1
    TCPLossFailures: 1
    TCPFastRetrans: 166
    TCPForwardRetrans: 8
    TCPSlowStartRetrans: 35
    TCPTimeouts: 151
    TCPRenoRecoveryFail: 9
    TCPSackRecoveryFail: 1
    TCPSchedulerFailed: 0
    TCPRcvCollapsed: 0
    TCPDSACKOldSent: 1
    TCPDSACKOfoSent: 0
    TCPDSACKRecv: 0
    TCPDSACKOfoRecv: 0
    TCPAbortOnSyn: 0
    TCPAbortOnData: 948
    TCPAbortOnClose: 14
    TCPAbortOnMemory: 0
    TCPAbortOnTimeout: 4
    TCPAbortOnLinger: 0
    TCPAbortFailed: 0
    TCPMemoryPressures: 0
@@
@@data#741|1154534133.480925|192.168.224.202||dunn.hmgc.mcw.edu|ifstat
data dunn,hmgc,mcw,edu.ifstat
linux
eth0      Link encap:Ethernet  HWaddr 00:09:3D:13:DC:AB  
          inet addr:192.168.224.105  Bcast:192.168.224.255  Mask:255.255.255.0
          inet6 addr: fe80::209:3dff:fe13:dcab/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:39290006 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15275236 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2864046246 (2.6 GiB)  TX bytes:21735816115 (20.2 GiB)
          Interrupt:185 

eth0:0    Link encap:Ethernet  HWaddr 00:09:3D:13:DC:AB  
          inet addr:192.168.224.107  Bcast:192.168.224.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:185 

eth0:1    Link encap:Ethernet  HWaddr 00:09:3D:13:DC:AB  
          inet addr:192.168.224.160  Bcast:192.168.224.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:185 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:36728 errors:0 dropped:0 overruns:0 frame:0
          TX packets:36728 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:5951488 (5.6 MiB)  TX bytes:5951488 (5.6 MiB)

@@
list Beau Olivier · Wed, 2 Aug 2006 18:24:02 +0200 ·
Same for me, I still get the log after the patch.


Olivier 
quoted from Kent Brodie

-----Original Message-----
From: Brodie, Kent [mailto:user-8fbf1c81e97c@xymon.invalid]
Sent: Wednesday, 2 August 2006 18:07
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Patch not done yet? was RE: rrd-data.log


list Henrik Størner · Thu, 3 Aug 2006 12:23:00 +0200 ·
quoted from Beau Olivier
On Wed, Aug 02, 2006 at 11:07:06AM -0500, Brodie, Kent wrote:
Hi Henrik:   Below are snippets from rrd that are still causing the
"Duplicate Error" on my end, even after applying the patch.   In the
cases where there's netstat and ifstat data shown together, I had to
include both because those chunks of data came out right at the time the
duplicate error appeared.    Too hard to time/see which of those 2
chunks of data cause the problem.   In other cases, only ifstat data
caused the problem.
It's the ifstat data; specifically, it is the interface aliases
("eth0:1") that messed up how the data was being parsed. So a somewhat
larger patch was required. Back out the previous patch I sent you, and
apply this one instead.

Or grab the current snapshot if you cannot get it applied without
problems.


Regards,
Henrik

-------------- next part --------------
--- hobbitd/do_rrd.c	2006/07/20 16:06:41	1.36
+++ hobbitd/do_rrd.c	2006/08/03 10:17:27
@@ -225,6 +225,61 @@
 }
 
 
+static pcre **compile_exprs(char *id, const char **patterns, int count)
+{
+	pcre **result = NULL;
+	int i;
+
+	result = (pcre **)calloc(count, sizeof(pcre *));
+	for (i=0; (i < count); i++) {
+		result[i] = compileregex(patterns[i]);
+		if (!result[i]) {
+			errprintf("Internal error: %s pickdata PCRE-compile failed\n", id);
+			for (i=0; (i < count); i++) if (result[i]) pcre_free(result[i]);
+			xfree(result);
+			return NULL;
+		}
+	}
+
+	return result;
+}
+
+static int pickdata(char *buf, pcre *expr, int dupok, ...)
+{
+	int res, i;
+	int ovector[30];
+	va_list ap;
+	char **ptr;
+	char w[100];
+
+	res = pcre_exec(expr, NULL, buf, strlen(buf), 0, 0, ovector, (sizeof(ovector)/sizeof(int)));
+	if (res < 0) return 0;
+
+	va_start(ap, dupok);
+
+	for (i=1; (i < res); i++) {
+		*w = '\0';
+		pcre_copy_substring(buf, ovector, res, i, w, sizeof(w));
+		ptr = va_arg(ap, char **);
+		if (dupok) {
+			if (*ptr) xfree(*ptr);
+			*ptr = strdup(w);
+		}
+		else {
+			if (*ptr == NULL) {
+				*ptr = strdup(w);
+			}
+			else {
+				errprintf("Internal error: Duplicate match ignored\n");
+			}
+		}
+	}
+
+	va_end(ap);
+
+	return 1;
+}
+
 /* Include all of the sub-modules. */
 #include "rrd/do_bbgen.c"
 #include "rrd/do_bbtest.c"
--- hobbitd/rrd/do_ifstat.c	2006/08/01 21:32:37	1.7
+++ hobbitd/rrd/do_ifstat.c	2006/08/03 10:15:38
@@ -20,7 +20,7 @@
 /* eth0   Link encap:                                                 */
 /*        RX bytes: 1829192 (265.8 MiB)  TX bytes: 1827320 (187.7 MiB */
 static const char *ifstat_linux_exprs[] = {
-	"^([a-z]+[0-9]+)\\s",
+	"^([a-z]+[0123456789.:]+|lo)\\s",
 	"^\\s+RX bytes:([0-9]+) .*TX bytes.([0-9]+) "
 };
 
@@ -73,7 +73,7 @@
 */
 static const char *ifstat_aix_exprs[] = {
 	"^ETHERNET STATISTICS \\(([a-z0-9]+)\\) :",
-	"^Bytes:\\s+(\\d+)\\s+(\\d+)"
+	"^Bytes:\\s+(\\d+)\\s+Bytes:\\s+(\\d+)"
 };
 
 
@@ -176,25 +176,39 @@
 		  case OS_LINUX22:
 		  case OS_LINUX:
 		  case OS_RHEL3:
-			if (pickdata(bol, ifstat_linux_pcres[0], &ifname)) dmatch |= 1;
-			else if (pickdata(bol, ifstat_linux_pcres[1], &rxstr, &txstr)) dmatch |= 6;
+			if (pickdata(bol, ifstat_linux_pcres[0], 1, &ifname)) {
+				/*
+				 * Linux' netif aliases mess up things. 
+				 * Clear everything when we see an interface name.
+				 * But we dont want to track the "lo" interface.
+				 */
+				if (strcmp(ifname, "lo") == 0) {
+					xfree(ifname); ifname = NULL;
+				}
+				else {
+					dmatch = 1;
+					if (rxstr) { xfree(rxstr); rxstr = NULL; }
+					if (txstr) { xfree(txstr); txstr = NULL; }
+				}
+			}
+			else if (pickdata(bol, ifstat_linux_pcres[1], 1, &rxstr, &txstr)) dmatch |= 6;
 			break;
 
 		  case OS_FREEBSD:
-			if (pickdata(bol, ifstat_freebsd_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+			if (pickdata(bol, ifstat_freebsd_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
 			break;
 
 		  case OS_OPENBSD:
-			if (pickdata(bol, ifstat_openbsd_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+			if (pickdata(bol, ifstat_openbsd_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
 			break;
 
 		  case OS_NETBSD:
-			if (pickdata(bol, ifstat_netbsd_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+			if (pickdata(bol, ifstat_netbsd_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
 			break;
 
 		  case OS_SOLARIS: 
-			if (pickdata(bol, ifstat_solaris_pcres[0], &ifname, &txstr)) dmatch |= 1;
-			else if (pickdata(bol, ifstat_solaris_pcres[1], &dummy, &rxstr)) dmatch |= 6;
+			if (pickdata(bol, ifstat_solaris_pcres[0], 0, &ifname, &txstr)) dmatch |= 1;
+			else if (pickdata(bol, ifstat_solaris_pcres[1], 0, &dummy, &rxstr)) dmatch |= 6;
 
 			if (ifname && dummy && (strcmp(ifname, dummy) != 0)) {
 				/* They must match, drop the data */
@@ -205,22 +219,32 @@
 			break;
 
 		  case OS_AIX: 
-			if (pickdata(bol, ifstat_aix_pcres[0], &ifname)) dmatch |= 1;
-			else if (pickdata(bol, ifstat_aix_pcres[1], &txstr, &rxstr)) dmatch |= 6;
+			if (pickdata(bol, ifstat_aix_pcres[0], 1, &ifname)) {
+				/* Interface names comes first, so any rx/tx data is discarded */
+				dmatch |= 1;
+				if (rxstr) { xfree(rxstr); rxstr = NULL; }
+				if (txstr) { xfree(txstr); txstr = NULL; }
+			}
+			else if (pickdata(bol, ifstat_aix_pcres[1], 1, &txstr, &rxstr)) dmatch |= 6;
 			break;
 
 		  case OS_HPUX: 
-			if (pickdata(bol, ifstat_hpux_pcres[0], &ifname)) dmatch |= 1;
-			else if (pickdata(bol, ifstat_hpux_pcres[1], &rxstr)) dmatch |= 2;
-			else if (pickdata(bol, ifstat_hpux_pcres[2], &txstr)) dmatch |= 4;
+			if (pickdata(bol, ifstat_hpux_pcres[0], 1, &ifname)) {
+				/* Interface names comes first, so any rx/tx data is discarded */
+				dmatch |= 1;
+				if (rxstr) { xfree(rxstr); rxstr = NULL; }
+				if (txstr) { xfree(txstr); txstr = NULL; }
+			}
+			else if (pickdata(bol, ifstat_hpux_pcres[1], 1, &rxstr)) dmatch |= 2;
+			else if (pickdata(bol, ifstat_hpux_pcres[2], 1, &txstr)) dmatch |= 4;
 			break;
 
 		  case OS_DARWIN:
-			if (pickdata(bol, ifstat_darwin_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+			if (pickdata(bol, ifstat_darwin_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
 			break;
 			
  		  case OS_SCO_SV:
-		        if (pickdata(bol, ifstat_sco_sv_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+		        if (pickdata(bol, ifstat_sco_sv_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
 			break;
 			
 		  case OS_OSF:
--- hobbitd/rrd/do_netstat.c	2006/08/01 21:32:37	1.25
+++ hobbitd/rrd/do_netstat.c	2006/08/03 10:01:21
@@ -46,55 +46,6 @@
             *tcpretransbytes = NULL, *tcpretranspackets = NULL;
 
 
-static pcre **compile_exprs(char *id, const char **patterns, int count)
-{
-	pcre **result = NULL;
-	int i;
-
-	result = (pcre **)calloc(count, sizeof(pcre *));
-	for (i=0; (i < count); i++) {
-		result[i] = compileregex(patterns[i]);
-		if (!result[i]) {
-			errprintf("Internal error: %s netstat PCRE-compile failed\n", id);
-			for (i=0; (i < count); i++) if (result[i]) pcre_free(result[i]);
-			xfree(result);
-			return NULL;
-		}
-	}
-
-	return result;
-}
-
-static int pickdata(char *buf, pcre *expr, ...)
-{
-	int res, i;
-	int ovector[30];
-	va_list ap;
-	char **ptr;
-	char w[100];
-
-	res = pcre_exec(expr, NULL, buf, strlen(buf), 0, 0, ovector, (sizeof(ovector)/sizeof(int)));
-	if (res < 0) return 0;
-
-	va_start(ap, expr);
-
-	for (i=1; (i < res); i++) {
-		*w = '\0';
-		pcre_copy_substring(buf, ovector, res, i, w, sizeof(w));
-		ptr = va_arg(ap, char **);
-		if (*ptr == NULL) {
-			*ptr = strdup(w);
-		}
-		else {
-			errprintf("Internal error: Duplicate match ignored\n");
-		}
-	}
-
-	va_end(ap);
-
-	return 1;
-}
-
 static void prepare_update(char *outp)
 {
 	outp += sprintf(outp, ":%s", (udpreceived ? udpreceived : "U")); if (udpreceived) xfree(udpreceived);
@@ -135,20 +86,20 @@
 		else {
 			switch (sect) {
 			  case AT_TCP:
-				if (pickdata(datapart, pcreset[0], &tcpretranspackets, &tcpretransbytes)   ||
-				    pickdata(datapart, pcreset[1], &tcpoutdatapackets, &tcpoutdatabytes)   ||
-				    pickdata(datapart, pcreset[2], &tcpinorderpackets, &tcpinorderbytes)   ||
-				    pickdata(datapart, pcreset[3], &tcpoutorderpackets, &tcpoutorderbytes) ||
-				    pickdata(datapart, pcreset[4], &tcpconnrequests)                       ||
-				    pickdata(datapart, pcreset[5], &tcpconnaccepts)) havedata++;
+				if (pickdata(datapart, pcreset[0],  0, &tcpretranspackets, &tcpretransbytes)   ||
+				    pickdata(datapart, pcreset[1],  0, &tcpoutdatapackets, &tcpoutdatabytes)   ||
+				    pickdata(datapart, pcreset[2],  0, &tcpinorderpackets, &tcpinorderbytes)   ||
+				    pickdata(datapart, pcreset[3],  0, &tcpoutorderpackets, &tcpoutorderbytes) ||
+				    pickdata(datapart, pcreset[4],  0, &tcpconnrequests)                       ||
+				    pickdata(datapart, pcreset[5],  0, &tcpconnaccepts)) havedata++;
 				break;
 
 			  case AT_UDP:
-				if (pickdata(datapart, pcreset[6], &udpreceived)   ||
-				    pickdata(datapart, pcreset[7], &udpsent)       ||
-				    pickdata(datapart, pcreset[8], &udperr1)       ||
-				    pickdata(datapart, pcreset[9], &udperr2)       ||
-				    pickdata(datapart, pcreset[10], &udperr3)) havedata++;
+				if (pickdata(datapart, pcreset[6],  0, &udpreceived)   ||
+				    pickdata(datapart, pcreset[7],  0, &udpsent)       ||
+				    pickdata(datapart, pcreset[8],  0, &udperr1)       ||
+				    pickdata(datapart, pcreset[9],  0, &udperr2)       ||
+				    pickdata(datapart, pcreset[10], 0, &udperr3)) havedata++;
 				break;
 
 			  default:
list Colin Spargo · Thu, 3 Aug 2006 11:37:02 +0100 ·
I got this in the "memory" column for a Solaris 8 host this morning, which caused it to go red (even though I have the threshold set to 101).

 Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL
   Memory              Used       Total  Percentage
red Physical     4294955003M     131072M 4294967287%
green Swap              40973M     144024M         28%


That physical memory calculation is obviously incorrect!

This only happened once, then it went back to normal. The "hostdata" that was saved at the time of the alert had the following memory data:

[memory]
 0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0  0 2692 436955 11454 8 6 86

That looks fine to me. I can't see how it could have taken those values and got that bizarre total for memory.


This is how it normally looks for this host:

 Thu Aug 3 10:01:48 BST 2006 - Memory OK

   Memory              Used       Total  Percentage
green Physical          59483M     131072M         45%
green Swap              40960M     144026M         28%


and this is a sample of "normal" memory data from the host:

[memory]
 0 0 0 105535000 73280352 152 910 0 126 126 0 0 0 0 0 0 3339 811798 11953 4 7 89


(I'm running 4.2-RC-20060712)

list Henrik Størner · Thu, 3 Aug 2006 12:49:44 +0200 ·
quoted from Colin Spargo
On Thu, Aug 03, 2006 at 11:37:02AM +0100, Colin Spargo wrote:
I got this in the "memory" column for a Solaris 8 host this morning, which caused it to go red (even though I have the threshold set to 101).

 Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL
   Memory              Used       Total  Percentage
red Physical     4294955003M     131072M 4294967287%
green Swap              40973M     144024M         28%


That physical memory calculation is obviously incorrect!
Yep.
quoted from Colin Spargo
This only happened once, then it went back to normal. The "hostdata" that was saved at the time of the alert had the following memory data:

[memory]
 0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0  0 2692 436955 11454 8 6 86

That looks fine to me. I can't see how it could have taken those values and got that bizarre total for memory.
Could you send me the [prtconf], [swap] and [memory] sections from
that hostdata file?

Regards,
Henrik
list Henrik Størner · Thu, 3 Aug 2006 13:33:50 +0200 ·
quoted from Colin Spargo
On Thu, Aug 03, 2006 at 11:37:02AM +0100, Colin Spargo wrote:
I got this in the "memory" column for a Solaris 8 host this morning, which 
caused it to go red (even though I have the threshold set to 101).

 Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL
   Memory              Used       Total  Percentage
red Physical     4294955003M     131072M 4294967287%
green Swap              40973M     144024M         28%


That physical memory calculation is obviously incorrect!
Yep, but the data it got were weird. Colin sent me some additional
data from the client message. The interesting bits are here:

The Solaris prtconf command is used to determine the amount of RAM
in the box. Here it says:

[prtconf]
System Configuration:  Sun Microsystems  sun4u
Memory size: 131072 Megabytes

So this box has 131072 MB. (128 GB - a lot, I might add. Is this really
true?)

The command "vmstat 1 2|tail -1" is used to grab the current memory
usage:
quoted from Henrik Størner

[memory]
0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0  0 2692 436955 11454 8 6 86

Column 5 is the "free memory" column in KB, here: 146806184 KB. Divide
by 1024 to get MB, and it gives 143365 MB free.

Now ... how can a box with 131072 MB RAM end up with 143365 MB free?
That's about 12 GB more than what is physically installed in the box.

Hobbit then gets a negative value for the amount of memory used, and
because it is then used in a calculation with some unsigned variables
it blows up and comes up with this hilarious value of the amount of
memory used.


Now, I'll admit that Hobbit should probably do a sanity check on the
data so it doesn't trigger alerts in these circumstances. But the core
problem is that your box is reporting some weird data.


Regards,
Henrik
list Galen Johnson · Thu, 03 Aug 2006 09:24:35 -0400 ·

 
It sounds to me like vmstat is reporting the total memory as "real + 
disk"... which I believe is what the 4th column shows (I don't have a 
Solaris server where I'm currently at to confirm). So, while Hobbit 
keys on the physical (real) memory to determine the total, it's keying 
on a computed value to do the differential. I've seen this under HP-UX 
and Linux on occasion for the memory tests for BB. I've usually just 
gone in and tweaked the scripts when that would happen.

=G=