Patch not done yet? was RE: rrd-data.log
list Kent Brodie
Hi Henrik: Below are snippets from rrd that are still causing the
"Duplicate Error" on my end, even after applying the patch. In the
cases where there's netstat and ifstat data shown together, I had to
include both because those chunks of data came out right at the time the
duplicate error appeared. It is too hard to tell from the timing which of those 2
chunks of data causes the problem. In other cases, only ifstat data
caused the problem.
Large chunks of whitespace separate "instances" of
the duplicate error occurring.
===========
@@data#348|1154533834.591188|192.168.224.202||wolf13.hmgc.mcw.edu|netstat
data wolf13,hmgc,mcw,edu.netstat
linux
Ip:
56731 total packets received
0 forwarded
0 incoming packets discarded
56067 incoming packets delivered
65344 requests sent out
Icmp:
648 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
destination unreachable: 89
echo requests: 559
648 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 89
echo replies: 559
Tcp:
4630 active connections openings
1 passive connection openings
1 failed connection attempts
0 connection resets received
0 connections established
46150 segments received
54675 segments send out
85 segments retransmited
0 bad segments received.
1 resets sent
Udp:
9933 packets received
0 packets to unknown port received.
0 packet receive errors
10021 packets sent
TcpExt:
ArpFilter: 0
4616 TCP sockets finished time wait in fast timer
908 delayed acks sent
557 packets directly queued to recvmsg prequeue.
507580 packets directly received from backlog
554 packets directly received from prequeue
19157 packets header predicted
426 packets header predicted and directly queued to user
TCPPureAcks: 7526
TCPHPAcks: 9608
TCPRenoRecovery: 0
TCPSackRecovery: 0
TCPSACKReneging: 0
TCPFACKReorder: 0
TCPSACKReorder: 0
TCPRenoReorder: 0
TCPTSReorder: 0
TCPFullUndo: 0
TCPPartialUndo: 0
TCPDSACKUndo: 0
TCPLossUndo: 35
TCPLoss: 0
TCPLostRetransmit: 0
TCPRenoFailures: 0
TCPSackFailures: 0
TCPLossFailures: 0
TCPFastRetrans: 0
TCPForwardRetrans: 0
TCPSlowStartRetrans: 0
TCPTimeouts: 85
TCPRenoRecoveryFail: 0
TCPSackRecoveryFail: 0
TCPSchedulerFailed: 0
TCPRcvCollapsed: 0
TCPDSACKOldSent: 0
TCPDSACKOfoSent: 0
TCPDSACKRecv: 0
TCPDSACKOfoRecv: 0
TCPAbortOnSyn: 0
TCPAbortOnData: 0
TCPAbortOnClose: 0
TCPAbortOnMemory: 0
TCPAbortOnTimeout: 0
TCPAbortOnLinger: 0
TCPAbortFailed: 0
TCPMemoryPressures: 0
@@
@@data#349|1154533834.591337|192.168.224.202||wolf13.hmgc.mcw.edu|ifstat
data wolf13,hmgc,mcw,edu.ifstat
linux
eth1 Link encap:Ethernet HWaddr 00:30:6E:F3:0B:46
inet addr:192.168.96.113 Bcast:192.168.96.255
Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:72022 errors:0 dropped:0 overruns:0 frame:0
TX packets:71139 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:21476056 (20.4 Mb) TX bytes:17076949 (16.2 Mb)
Interrupt:56
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:189 errors:0 dropped:0 overruns:0 frame:0
TX packets:189 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:16228 (15.8 Kb) TX bytes:16228 (15.8 Kb)
@@
@@data#507|1154533965.074399|192.168.224.202||bc1s2.phys.mcw.edu|ifstat
data bc1s2,phys,mcw,edu.ifstat
linux
eth1 Link encap:Ethernet HWaddr 00:0D:60:1E:0E:DD
inet addr:192.168.224.111 Bcast:192.168.224.255
Mask:255.255.255.0
inet6 addr: fe80::20d:60ff:fe1e:edd/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:43166350 errors:0 dropped:0 overruns:0 frame:0
TX packets:6166704 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:14398847028 (13731.8 Mb) TX bytes:2587305933 (2467.4 Mb)
Interrupt:45 Memory:c0010000-c0020000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:104 errors:0 dropped:0 overruns:0 frame:0
TX packets:104 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:9752 (9.5 Kb) TX bytes:9752 (9.5 Kb)
@@
@@data#740|1154534133.480180|192.168.224.202||dunn.hmgc.mcw.edu|netstat
data dunn,hmgc,mcw,edu.netstat
linux
Ip:
9154191 total packets received
0 forwarded
0 incoming packets discarded
8170997 incoming packets delivered
15226980 requests sent out
Icmp:
8961 ICMP messages received
9 input ICMP message failed.
ICMP input histogram:
destination unreachable: 16
echo requests: 8945
13860 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 4915
echo replies: 8945
Tcp:
33598 active connections openings
15245 passive connection openings
4 failed connection attempts
106 connection resets received
10 connections established
7841472 segments received
15058779 segments send out
412 segments retransmited
0 bad segments received.
2935 resets sent
Udp:
315663 packets received
4901 packets to unknown port received.
0 packet receive errors
154341 packets sent
TcpExt:
7 resets received for embryonic SYN_RECV sockets
ArpFilter: 0
37618 TCP sockets finished time wait in fast timer
2 packets rejects in established connections because of timestamp
12809 delayed acks sent
2 delayed acks further delayed because of locked socket
Quick ack mode was activated 1930 times
144921 packets directly queued to recvmsg prequeue.
7524 packets directly received from backlog
40929271 packets directly received from prequeue
47328 packets header predicted
135097 packets header predicted and directly queued to user
TCPPureAcks: 5602771
TCPHPAcks: 2027416
TCPRenoRecovery: 47
TCPSackRecovery: 19
TCPSACKReneging: 0
TCPFACKReorder: 0
TCPSACKReorder: 0
TCPRenoReorder: 0
TCPTSReorder: 0
TCPFullUndo: 0
TCPPartialUndo: 0
TCPDSACKUndo: 0
TCPLossUndo: 39
TCPLoss: 55
TCPLostRetransmit: 0
TCPRenoFailures: 2
TCPSackFailures: 1
TCPLossFailures: 1
TCPFastRetrans: 166
TCPForwardRetrans: 8
TCPSlowStartRetrans: 35
TCPTimeouts: 151
TCPRenoRecoveryFail: 9
TCPSackRecoveryFail: 1
TCPSchedulerFailed: 0
TCPRcvCollapsed: 0
TCPDSACKOldSent: 1
TCPDSACKOfoSent: 0
TCPDSACKRecv: 0
TCPDSACKOfoRecv: 0
TCPAbortOnSyn: 0
TCPAbortOnData: 948
TCPAbortOnClose: 14
TCPAbortOnMemory: 0
TCPAbortOnTimeout: 4
TCPAbortOnLinger: 0
TCPAbortFailed: 0
TCPMemoryPressures: 0
@@
@@data#741|1154534133.480925|192.168.224.202||dunn.hmgc.mcw.edu|ifstat
data dunn,hmgc,mcw,edu.ifstat
linux
eth0 Link encap:Ethernet HWaddr 00:09:3D:13:DC:AB
inet addr:192.168.224.105 Bcast:192.168.224.255
Mask:255.255.255.0
inet6 addr: fe80::209:3dff:fe13:dcab/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:39290006 errors:0 dropped:0 overruns:0 frame:0
TX packets:15275236 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2864046246 (2.6 GiB) TX bytes:21735816115 (20.2 GiB)
Interrupt:185
eth0:0 Link encap:Ethernet HWaddr 00:09:3D:13:DC:AB
inet addr:192.168.224.107 Bcast:192.168.224.255
Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:185
eth0:1 Link encap:Ethernet HWaddr 00:09:3D:13:DC:AB
inet addr:192.168.224.160 Bcast:192.168.224.255
Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:185
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:36728 errors:0 dropped:0 overruns:0 frame:0
TX packets:36728 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5951488 (5.6 MiB) TX bytes:5951488 (5.6 MiB)
@@
list Beau Olivier
Same for me: I still get the log after the patch. Olivier
-----Original Message-----
From: Brodie, Kent [mailto:user-8fbf1c81e97c@xymon.invalid]
Sent: Wednesday, August 2, 2006 18:07
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Patch not done yet? was RE: rrd-data.log
list Henrik Størner
On Wed, Aug 02, 2006 at 11:07:06AM -0500, Brodie, Kent wrote:
Hi Henrik: Below are snippets from rrd that are still causing the "Duplicate Error" on my end, even after applying the patch. In the cases where there's netstat and ifstat data shown together, I had to include both because those chunks of data came out right at the time the duplicate error appeared. Too hard to time/see which of those 2 chunks of data cause the problem. In other cases, only ifstat data caused the problem.
It's the ifstat data; or, more specifically, it is the interface aliases
("eth0:1") that messed up how the data was being parsed. So a somewhat
larger patch was required. Back out the previous patch I sent you, and
apply this one instead.
Or grab the current snapshot if you cannot get it applied without
problems.
Regards,
Henrik
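The effect of the interface-name pattern change can be checked in isolation. The following is a minimal sketch, not code from Hobbit: it uses POSIX regex.h instead of PCRE (an assumption made so that it builds with only the C library), so the "\s" in the patterns is written as "[[:space:]]". The helper name match_ifname and the two pattern constants are hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* POSIX ERE equivalents of the old and new Linux interface-name patterns.
 * Assumption: "\s" from the PCRE originals is replaced by "[[:space:]]". */
static const char *OLD_IFNAME = "^([a-z]+[0-9]+)[[:space:]]";
static const char *NEW_IFNAME = "^([a-z]+[0123456789.:]+|lo)[[:space:]]";

/*
 * Hypothetical helper: returns 1 and copies the captured interface name
 * into out if line matches pattern, otherwise returns 0 with out empty.
 */
static int match_ifname(const char *pattern, const char *line,
                        char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    int matched = 0;

    if (outlen == 0) return 0;
    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return 0;

    if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so >= 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outlen) len = outlen - 1;   /* truncate to fit */
        memcpy(out, line + m[1].rm_so, len);
        out[len] = '\0';
        matched = 1;
    }

    regfree(&re);
    return matched;
}
```

Called on the interface header lines from the dunn.hmgc.mcw.edu ifstat data, the old pattern picks up only "eth0", while the new one also recognises "eth0:0", "eth0:1" and "lo" as interface names.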
-------------- next part --------------
--- hobbitd/do_rrd.c 2006/07/20 16:06:41 1.36
+++ hobbitd/do_rrd.c 2006/08/03 10:17:27
@@ -225,6 +225,61 @@
}
+static pcre **compile_exprs(char *id, const char **patterns, int count)
+{
+ pcre **result = NULL;
+ int i;
+
+ result = (pcre **)calloc(count, sizeof(pcre *));
+ for (i=0; (i < count); i++) {
+ result[i] = compileregex(patterns[i]);
+ if (!result[i]) {
+ errprintf("Internal error: %s pickdata PCRE-compile failed\n", id);
+ for (i=0; (i < count); i++) if (result[i]) pcre_free(result[i]);
+ xfree(result);
+ return NULL;
+ }
+ }
+
+ return result;
+}
+
+static int pickdata(char *buf, pcre *expr, int dupok, ...)
+{
+ int res, i;
+ int ovector[30];
+ va_list ap;
+ char **ptr;
+ char w[100];
+
+ res = pcre_exec(expr, NULL, buf, strlen(buf), 0, 0, ovector, (sizeof(ovector)/sizeof(int)));
+ if (res < 0) return 0;
+
+ va_start(ap, dupok);
+
+ for (i=1; (i < res); i++) {
+ *w = '\0';
+ pcre_copy_substring(buf, ovector, res, i, w, sizeof(w));
+ ptr = va_arg(ap, char **);
+ if (dupok) {
+ if (*ptr) xfree(*ptr);
+ *ptr = strdup(w);
+ }
+ else {
+ if (*ptr == NULL) {
+ *ptr = strdup(w);
+ }
+ else {
+ errprintf("Internal error: Duplicate match ignored\n");
+ }
+ }
+ }
+
+ va_end(ap);
+
+ return 1;
+}

/* Include all of the sub-modules. */
#include "rrd/do_bbgen.c"
#include "rrd/do_bbtest.c"
--- hobbitd/rrd/do_ifstat.c 2006/08/01 21:32:37 1.7
+++ hobbitd/rrd/do_ifstat.c 2006/08/03 10:15:38
@@ -20,7 +20,7 @@
/* eth0 Link encap: */
/* RX bytes: 1829192 (265.8 MiB) TX bytes: 1827320 (187.7 MiB */
static const char *ifstat_linux_exprs[] = {
- "^([a-z]+[0-9]+)\\s",
+ "^([a-z]+[0123456789.:]+|lo)\\s",
"^\\s+RX bytes:([0-9]+) .*TX bytes.([0-9]+) "
};
@@ -73,7 +73,7 @@
*/
static const char *ifstat_aix_exprs[] = {
"^ETHERNET STATISTICS \\(([a-z0-9]+)\\) :",
- "^Bytes:\\s+(\\d+)\\s+(\\d+)"
+ "^Bytes:\\s+(\\d+)\\s+Bytes:\\s+(\\d+)"
};
@@ -176,25 +176,39 @@
case OS_LINUX22:
case OS_LINUX:
case OS_RHEL3:
- if (pickdata(bol, ifstat_linux_pcres[0], &ifname)) dmatch |= 1;
- else if (pickdata(bol, ifstat_linux_pcres[1], &rxstr, &txstr)) dmatch |= 6;
+ if (pickdata(bol, ifstat_linux_pcres[0], 1, &ifname)) {
+ /*
+ * Linux' netif aliases mess up things.
+ * Clear everything when we see an interface name.
+ * But we dont want to track the "lo" interface.
+ */
+ if (strcmp(ifname, "lo") == 0) {
+ xfree(ifname); ifname = NULL;
+ }
+ else {
+ dmatch = 1;
+ if (rxstr) { xfree(rxstr); rxstr = NULL; }
+ if (txstr) { xfree(txstr); txstr = NULL; }
+ }
+ }
+ else if (pickdata(bol, ifstat_linux_pcres[1], 1, &rxstr, &txstr)) dmatch |= 6;
break;
case OS_FREEBSD:
- if (pickdata(bol, ifstat_freebsd_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+ if (pickdata(bol, ifstat_freebsd_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
break;
case OS_OPENBSD:
- if (pickdata(bol, ifstat_openbsd_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+ if (pickdata(bol, ifstat_openbsd_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
break;
case OS_NETBSD:
- if (pickdata(bol, ifstat_netbsd_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+ if (pickdata(bol, ifstat_netbsd_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
break;
case OS_SOLARIS:
- if (pickdata(bol, ifstat_solaris_pcres[0], &ifname, &txstr)) dmatch |= 1;
- else if (pickdata(bol, ifstat_solaris_pcres[1], &dummy, &rxstr)) dmatch |= 6;
+ if (pickdata(bol, ifstat_solaris_pcres[0], 0, &ifname, &txstr)) dmatch |= 1;
+ else if (pickdata(bol, ifstat_solaris_pcres[1], 0, &dummy, &rxstr)) dmatch |= 6;
if (ifname && dummy && (strcmp(ifname, dummy) != 0)) {
/* They must match, drop the data */
@@ -205,22 +219,32 @@
break;
case OS_AIX:
- if (pickdata(bol, ifstat_aix_pcres[0], &ifname)) dmatch |= 1;
- else if (pickdata(bol, ifstat_aix_pcres[1], &txstr, &rxstr)) dmatch |= 6;
+ if (pickdata(bol, ifstat_aix_pcres[0], 1, &ifname)) {
+ /* Interface names comes first, so any rx/tx data is discarded */
+ dmatch |= 1;
+ if (rxstr) { xfree(rxstr); rxstr = NULL; }
+ if (txstr) { xfree(txstr); txstr = NULL; }
+ }
+ else if (pickdata(bol, ifstat_aix_pcres[1], 1, &txstr, &rxstr)) dmatch |= 6;
break;
case OS_HPUX:
- if (pickdata(bol, ifstat_hpux_pcres[0], &ifname)) dmatch |= 1;
- else if (pickdata(bol, ifstat_hpux_pcres[1], &rxstr)) dmatch |= 2;
- else if (pickdata(bol, ifstat_hpux_pcres[2], &txstr)) dmatch |= 4;
+ if (pickdata(bol, ifstat_hpux_pcres[0], 1, &ifname)) {
+ /* Interface names comes first, so any rx/tx data is discarded */
+ dmatch |= 1;
+ if (rxstr) { xfree(rxstr); rxstr = NULL; }
+ if (txstr) { xfree(txstr); txstr = NULL; }
+ }
+ else if (pickdata(bol, ifstat_hpux_pcres[1], 1, &rxstr)) dmatch |= 2;
+ else if (pickdata(bol, ifstat_hpux_pcres[2], 1, &txstr)) dmatch |= 4;
break;
case OS_DARWIN:
- if (pickdata(bol, ifstat_darwin_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+ if (pickdata(bol, ifstat_darwin_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
break;
case OS_SCO_SV:
- if (pickdata(bol, ifstat_sco_sv_pcres[0], &ifname, &rxstr, &txstr)) dmatch = 7;
+ if (pickdata(bol, ifstat_sco_sv_pcres[0], 0, &ifname, &rxstr, &txstr)) dmatch = 7;
break;
case OS_OSF:
--- hobbitd/rrd/do_netstat.c 2006/08/01 21:32:37 1.25
+++ hobbitd/rrd/do_netstat.c 2006/08/03 10:01:21
@@ -46,55 +46,6 @@
*tcpretransbytes = NULL, *tcpretranspackets = NULL;
-static pcre **compile_exprs(char *id, const char **patterns, int count)
-{
- pcre **result = NULL;
- int i;
-
- result = (pcre **)calloc(count, sizeof(pcre *));
- for (i=0; (i < count); i++) {
- result[i] = compileregex(patterns[i]);
- if (!result[i]) {
- errprintf("Internal error: %s netstat PCRE-compile failed\n", id);
- for (i=0; (i < count); i++) if (result[i]) pcre_free(result[i]);
- xfree(result);
- return NULL;
- }
- }
-
- return result;
-}
-
-static int pickdata(char *buf, pcre *expr, ...)
-{
- int res, i;
- int ovector[30];
- va_list ap;
- char **ptr;
- char w[100];
-
- res = pcre_exec(expr, NULL, buf, strlen(buf), 0, 0, ovector, (sizeof(ovector)/sizeof(int)));
- if (res < 0) return 0;
-
- va_start(ap, expr);
-
- for (i=1; (i < res); i++) {
- *w = '\0';
- pcre_copy_substring(buf, ovector, res, i, w, sizeof(w));
- ptr = va_arg(ap, char **);
- if (*ptr == NULL) {
- *ptr = strdup(w);
- }
- else {
- errprintf("Internal error: Duplicate match ignored\n");
- }
- }
-
- va_end(ap);
-
- return 1;
-}

static void prepare_update(char *outp)
{
outp += sprintf(outp, ":%s", (udpreceived ? udpreceived : "U")); if (udpreceived) xfree(udpreceived);
@@ -135,20 +86,20 @@
else {
switch (sect) {
case AT_TCP:
- if (pickdata(datapart, pcreset[0], &tcpretranspackets, &tcpretransbytes) ||
- pickdata(datapart, pcreset[1], &tcpoutdatapackets, &tcpoutdatabytes) ||
- pickdata(datapart, pcreset[2], &tcpinorderpackets, &tcpinorderbytes) ||
- pickdata(datapart, pcreset[3], &tcpoutorderpackets, &tcpoutorderbytes) ||
- pickdata(datapart, pcreset[4], &tcpconnrequests) ||
- pickdata(datapart, pcreset[5], &tcpconnaccepts)) havedata++;
+ if (pickdata(datapart, pcreset[0], 0, &tcpretranspackets, &tcpretransbytes) ||
+ pickdata(datapart, pcreset[1], 0, &tcpoutdatapackets, &tcpoutdatabytes) ||
+ pickdata(datapart, pcreset[2], 0, &tcpinorderpackets, &tcpinorderbytes) ||
+ pickdata(datapart, pcreset[3], 0, &tcpoutorderpackets, &tcpoutorderbytes) ||
+ pickdata(datapart, pcreset[4], 0, &tcpconnrequests) ||
+ pickdata(datapart, pcreset[5], 0, &tcpconnaccepts)) havedata++;
break;
case AT_UDP:
- if (pickdata(datapart, pcreset[6], &udpreceived) ||
- pickdata(datapart, pcreset[7], &udpsent) ||
- pickdata(datapart, pcreset[8], &udperr1) ||
- pickdata(datapart, pcreset[9], &udperr2) ||
- pickdata(datapart, pcreset[10], &udperr3)) havedata++;
+ if (pickdata(datapart, pcreset[6], 0, &udpreceived) ||
+ pickdata(datapart, pcreset[7], 0, &udpsent) ||
+ pickdata(datapart, pcreset[8], 0, &udperr1) ||
+ pickdata(datapart, pcreset[9], 0, &udperr2) ||
+ pickdata(datapart, pcreset[10], 0, &udperr3)) havedata++;
break;
default:
list Colin Spargo
I got this in the "memory" column for a Solaris 8 host this morning, which caused it to go red (even though I have the threshold set to 101).

Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL
      Memory          Used    Total   Percentage
red   Physical 4294955003M  131072M  4294967287%
green Swap          40973M  144024M          28%

That physical memory calculation is obviously incorrect! This only happened once, then it went back to normal.

The "hostdata" that was saved at the time of the alert had the following memory data:

[memory]
0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0 0 2692 436955 11454 8 6 86

That looks fine to me. I can't see how it could have taken those values and got that bizarre total for memory.

This is how it normally looks for this host:

Thu Aug 3 10:01:48 BST 2006 - Memory OK
      Memory      Used    Total   Percentage
green Physical  59483M  131072M          45%
green Swap      40960M  144026M          28%

and this is a sample of "normal" memory data from the host:

[memory]
0 0 0 105535000 73280352 152 910 0 126 126 0 0 0 0 0 0 3339 811798 11953 4 7 89

(I'm running 4.2-RC-20060712)
list Henrik Størner
On Thu, Aug 03, 2006 at 11:37:02AM +0100, Colin Spargo wrote:
I got this in the "memory" column for a Solaris 8 host this morning, which caused it to go red (even though i have the threshold set to 101). Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL Memory Used Total Percentage red Physical 4294955003M 131072M 4294967287% green Swap 40973M 144024M 28% That physical memory calculation is obviously incorrect!
Yep.
This only happened once, then it went back to normal. The "hostdata" that was saved at the time of the alert had the following memory data: [memory] 0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0 0 2692 436955 11454 8 6 86 That looks fine to me. I can't see how it could have taken those values and got that bizzare total for memory.
Could you send me the [prtconf], [swap] and [memory] sections from that
hostdata file?
Regards,
Henrik
list Henrik Størner
On Thu, Aug 03, 2006 at 11:37:02AM +0100, Colin Spargo wrote:
I got this in the "memory" column for a Solaris 8 host this morning, which caused it to go red (even though i have the threshold set to 101). Thu Aug 3 09:11:17 BST 2006 - Memory CRITICAL Memory Used Total Percentage red Physical 4294955003M 131072M 4294967287% green Swap 40973M 144024M 28% That physical memory calculation is obviously incorrect!
Yep, but the data it got were weird. Colin sent me some additional data
from the client message. The interesting bits are here:

The Solaris prtconf command is used to determine the amount of RAM in
the box. Here it says:

[prtconf]
System Configuration: Sun Microsystems sun4u
Memory size: 131072 Megabytes

So this box has 131072 MB. (128 GB - a lot, I might add. Is this really
true?)

The command "vmstat 1 2|tail -1" is used to grab the current memory
usage:

[memory]
0 0 0 211046168 146806184 744 6249 0 0 0 0 0 0 0 0 0 2692 436955 11454 8 6 86
Column 5 is the "free memory" column in KB, here: 146806184 KB. Divide
by 1024 to get MB, and it gives 143365 MB free.
Now ... how can a box with 131072 MB RAM end up with 143365 MB free ?
That's almost 12 GB more than what is physically installed in the box.
Hobbit then gets a negative value for the amount of memory used, and
because it is then used in a calculation with some unsigned variables
it blows up and comes up with this hilarious value of the amount of
memory used.
Now, I'll admit that Hobbit should probably do a sanity check on the
data so it doesn't trigger alerts in these circumstances. But the core
problem is that your box is reporting some weird data.
Regards,
Henrik
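The arithmetic described above can be reproduced with a short sketch. This is an illustration only, assuming 32-bit unsigned storage for the results; the function names are hypothetical and nothing here is taken from the Hobbit source. With the reported numbers it yields exactly the values shown in the alert (4294955003 MB used and 4294967287%), and the last function shows one possible form of the sanity check mentioned above.

```c
#include <stdint.h>

/*
 * Illustration only: hypothetical helpers, not Hobbit code. They mimic a
 * "used = total - free" calculation whose signed result ends up in a
 * 32-bit unsigned variable.
 */

/* Used memory in MB, stored as an unsigned 32-bit value. */
static uint32_t used_mb_unsigned(int64_t total_mb, int64_t free_kb)
{
    int64_t used_mb = total_mb - (free_kb / 1024);  /* negative if free > total */
    return (uint32_t)used_mb;                       /* wraps modulo 2^32 */
}

/* Used percentage, computed signed and then stored unsigned. */
static uint32_t used_pct_unsigned(int64_t total_mb, int64_t free_kb)
{
    int64_t used_mb = total_mb - (free_kb / 1024);
    int64_t pct = (100 * used_mb) / total_mb;       /* C division truncates toward zero */
    return (uint32_t)pct;
}

/*
 * One possible sanity check: returns 1 and stores the result in *used_mb
 * when the input is plausible, or returns 0 and leaves *used_mb untouched
 * when the reported free memory exceeds the installed memory.
 */
static int used_mb_checked(int64_t total_mb, int64_t free_kb, int64_t *used_mb)
{
    int64_t free_mb = free_kb / 1024;

    if (total_mb <= 0 || free_mb < 0 || free_mb > total_mb) return 0;
    *used_mb = total_mb - free_mb;
    return 1;
}
```

With total_mb = 131072 and free_kb = 146806184, the difference is -12293 MB, which wraps to 4294955003; the percentage is -9 before the conversion, which wraps to 4294967287. The checked variant rejects that sample and accepts the "normal" sample (free_kb = 73280352).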
list Galen Johnson
It sounds to me like vmstat is reporting the total memory as "real + disk", which I believe is what the 4th column shows (I don't have a Solaris server where I currently am to confirm). So, while Hobbit keys on the physical (real) memory to determine the memory, it's keying on a computed value to do the differential. I've seen this under HP-UX and Linux on occasion for the memory tests for BB. I've usually just gone in and tweaked the scripts when that would happen.
=G=