Hi all
I was only back at the client today, and unfortunately have not
managed to get that patch in yet.
(As I mentioned before, it's a production system)
However, I did notice something really odd.
I have focused my attention on the trends graphs, where I get all the
extra values, but it's not happening in the test itself, despite the
existence of the additional rrd files.
Example.
I have something that plots the power usage of the PSUs on a NetApp
e-series.
There are 4 PSUs, output looks like this.
Total power drawn- 487 Watts
Number of trays- 2
Tray power input details-
TRAY ID POWER SUPPLY SERIAL NUMBER INPUT POWER
99 0 145 Watts
99 1 151 Watts
0 0 99 Watts
0 1 92 Watts
All good. And I have a graph with 4 lines. Min, Max, Curr and Avg
values are all there. It looks beautiful.
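For anyone following along, here is a rough sketch of how the tray rows in that output could be turned into NCV lines. It is not the actual client script. The function name tray_row_to_ncv is made up for illustration, and the "name : value" layout assumes the usual Xymon NCV convention; the dataset names are chosen to match the valid rrd files listed further down.

```c
#include <stdio.h>
#include <string.h>

/* Convert one row such as "99 0 145 Watts" into an NCV line such as
 * "Tray99_PSU0 : 145". Returns 1 on success, 0 if the row does not parse.
 * Hypothetical helper: the name and the exact spacing are assumptions. */
static int tray_row_to_ncv(const char *row, char *out, size_t outlen)
{
    int tray, psu, watts;
    char unit[16];

    /* Expect three integers followed by the unit word. */
    if (sscanf(row, "%d %d %d %15s", &tray, &psu, &watts, unit) != 4) return 0;
    if (strcmp(unit, "Watts") != 0) return 0;

    snprintf(out, outlen, "Tray%d_PSU%d : %d", tray, psu, watts);
    return 1;
}
```

Header and summary lines do not start with three integers, so they are rejected and never become datasets.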
But go look at the power graph in trends, and it's ugly.
Heaps of additional data lines with no entries; all values are NaN.
Mixed in amongst the additional empty lines are the 4 valid ones.
I look at the rrd files, and they are all there, even the bad ones.
Here's a few of them.
power,tcpListenDrop.rrd
power,tcpOutAck.rrd
power,tcpOutDataSegs.rrd
power,tcpOutRsts.rrd
power,tcpOutUrg.rrd
power,tcpOutWinProbe.rrd
power,tcpRetransSegs.rrd
power,tcpRtoMax.rrd
power,tcpRttUpdate.rrd
power,tcpTimKeepaliveProbe.rrd
power,tcpTimRetransDrop.rrd
power,Tray0_PSU0.rrd <--- Valid
power,Tray0_PSU1.rrd <--- Valid
power,Tray99_PSU0.rrd <--- Valid
power,Tray99_PSU1.rrd <--- Valid
power,trlogpool.rrd
power,UDP_udpInDatagrams.rrd
power,udpInCksumErrs.rrd
power,udpOutDatagrams.rrd
power,vnet.rrd
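To separate the good files from the strays mechanically, something like the check below might help. It is only a sketch: the function name is_expected_power_rrd is made up, and it assumes every valid file follows the "power,Tray<N>_PSU<M>.rrd" naming seen in the four valid entries above.

```c
#include <stdio.h>

/* Returns 1 if fn looks like "power,Tray<N>_PSU<M>.rrd", otherwise 0.
 * Loose check for illustration: %d also accepts leading whitespace or
 * a '+' sign. */
static int is_expected_power_rrd(const char *fn)
{
    int tray, psu;
    int consumed = 0;   /* stays 0 unless the trailing ".rrd" also matched */

    if (sscanf(fn, "power,Tray%d_PSU%d.rrd%n", &tray, &psu, &consumed) != 2)
        return 0;

    /* Require the whole name to be consumed, with nothing after ".rrd". */
    return consumed > 0 && fn[consumed] == '\0' && tray >= 0 && psu >= 0;
}
```

Running each name in the rrd directory through this would flag entries such as power,tcpListenDrop.rrd and power,vnet.rrd as unexpected.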
So I thought I would check my configs.
In xymonserver.cfg
From TEST2RRD= ,power=ncv,
From GRAPHS= ,power::9,
And further down
SPLITNCV_power="*:GAUGE"
And in graphs.cfg
[power]
FNPATTERN power,(.*).rrd
TITLE Database Power Consumption Per Tray PSU
YAXIS Watts
-l 0
DEF:p@RRDIDX@=@RRDFN@:lambda:AVERAGE
LINE2:p@RRDIDX@#@COLOR@:@RRDPARAM@
GPRINT:p@RRDIDX@:LAST: \: %5.1lf (cur)
GPRINT:p@RRDIDX@:MAX: \: %5.1lf (max)
GPRINT:p@RRDIDX@:MIN: \: %5.1lf (min)
GPRINT:p@RRDIDX@:AVERAGE: \: %5.1lf (avg)\n
With luck I will get approval to recompile with the debugging bug-fix,
and we can get more info. In the meantime, I thought it was interesting
that the extra entries show up in trends but not in the test itself.
Regards
Vernon
On 13 March 2015 at 15:24, J.C. Cleaver <user-87556346d4af@xymon.invalid>
wrote:
On Wed, March 11, 2015 5:51 pm, Jeremy Laidman wrote:
On 11 March 2015 at 14:18, Vernon Everett <user-b3f8dacb72c8@xymon.invalid>
wrote:
About now, I am getting a little nervous adding send and expect, because
unlike telnet and telnets, we are doing ldap and ldaps testing.
That's understandable. A read through the code suggests that at least in
some places, an empty string is equivalent to an undefined string, as the
string length (shown in Sendlen in the debug output) is zero in both
cases. So until a patch is in place, a work-around might be to define
empty "send" and "expect" strings for those that have none.
Any suggestions?
I think we have some debug code update recommendations for JC though.
:-)
Here's my patch. I'll push this into the dev list for proposed inclusion
in a future release.
--- lib/netservices.c.orig 2012-07-25 01:48:41.000000000 +1000
+++ lib/netservices.c 2015-03-12 11:18:18.000000000 +1100
@@ -328,9 +328,9 @@
dbgprintf("Service list dump\n");
for (i=0; (svcinfo[i].svcname); i++) {
dbgprintf(" Name : %s\n", svcinfo[i].svcname);
- dbgprintf(" Sendtext: %s\n", binview(svcinfo[i].sendtxt, svcinfo[i].sendlen));
+ dbgprintf(" Sendtext: %s\n", svcinfo[i].sendtxt!=NULL?binview(svcinfo[i].sendtxt, svcinfo[i].sendlen):"[null]");
dbgprintf(" Sendlen : %d\n", svcinfo[i].sendlen);
- dbgprintf(" Exp.text: %s\n", binview(svcinfo[i].exptext, svcinfo[i].explen));
+ dbgprintf(" Exp.text: %s\n", svcinfo[i].exptext!=NULL?binview(svcinfo[i].exptext, svcinfo[i].explen):"[null]");
dbgprintf(" Exp.len : %d\n", svcinfo[i].explen);
dbgprintf(" Exp.ofs : %d\n", svcinfo[i].expofs);
dbgprintf(" Flags : %d\n", svcinfo[i].flags);
This produces "[null]" where we would have seen "(null)" on a GNU-based
OS, to differentiate between the two situations.
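The same guard can be pulled into a small helper instead of repeating the ternary at each call site. This is only a sketch and not part of the patch; the names str_or_null_marker and format_sendtext are made up. Passing a NULL pointer for %s is undefined behaviour in C, which is why glibc's "(null)" can't be relied on elsewhere.

```c
#include <stdio.h>

/* Return s itself, or a visible marker when s is NULL, so the result is
 * always safe to pass to a "%s" conversion. Hypothetical helper. */
static const char *str_or_null_marker(const char *s)
{
    return (s != NULL) ? s : "[null]";
}

/* Example call site: format a debug line without ever handing NULL
 * to "%s". Returns what snprintf returns. */
static int format_sendtext(char *out, size_t outlen, const char *sendtxt)
{
    return snprintf(out, outlen, " Sendtext: %s", str_or_null_marker(sendtxt));
}
```

An empty string still prints as empty, so it stays distinguishable from a pointer that was never set.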
In the meantime, you could compile a special version of xymond_rrd, and
run it manually on the same data channel as the real one, but have it
write RRD files and its log file to a different location. This shouldn't
interfere with your production Xymon. Here's one I prepared earlier that
works for me:
sudo -u xymon mkdir /tmp/my-rrd-data/
sudo -u xymon xymoncmd /bin/sh -c 'XYMONTMP=/tmp; /usr/lib/xymon/server/bin/xymond_channel --channel=data --log=/tmp/my-rrd-data.log /path/to/xymond_rrd_debug_patch --rrddir=/tmp/my-rrd-data/ --debug'
This seems to show some really useful stuff that's relevant to solving
your problem. Some sample debug lines:
15306 2015-03-12 11:36:28 xymond_rrd_debug_patch: Got message 165619 @@data#165619/servername|1426120588.401891|172.16.0.1||servername|vmstat|sunos|ABC
...
15306 2015-03-12 11:36:28 Creating rrd /tmp/my-rrd-data//servername/vmstat.rrd
15306 2015-03-12 11:36:28 RRD create param 00: 'rrdcreate'
15306 2015-03-12 11:36:28 RRD create param 01: '/tmp/my-rrd-data//servername/vmstat.rrd'
15306 2015-03-12 11:36:28 RRD create param 02: '-s'
15306 2015-03-12 11:36:28 RRD create param 03: '300'
15306 2015-03-12 11:36:28 RRD create param 04: 'DS:cpu_r:GAUGE:600:0:U'
15306 2015-03-12 11:36:28 RRD create param 05: 'DS:cpu_b:GAUGE:600:0:U'
15306 2015-03-12 11:36:28 RRD create param 06: 'DS:cpu_w:GAUGE:600:0:U'
...
15306 2015-03-12 11:39:42 Got 265 bytes
15306 2015-03-12 11:39:42 xymond_rrd_debug_patch: Got message 165737 @@data#165737/servername|1426120782.080244|172.16.0.2||servername|trends||DEF
15306 2015-03-12 11:39:42 startpos 216644, fillpos 216644, endpos -1
15306 2015-03-12 11:39:42 Flushing '/servername/tcp.xopiy90404.parameter.rrd' with 1 updates pending, template 'sec'
15306 2015-03-12 11:39:42 Want msg 165738, startpos 216644, fillpos 216644, endpos -1, usedbytes=0, bufleft=1884603
J
This is some excellent sleuthing! :)
As I was poring through the thread (sorry, I've been out the last few
days), I failed to take note of the SPARC-Enterprise-T2000 in the output.
The patch below should fix the immediate issue triggered by debug mode,
letting us move on to the larger oddness. Unfortunately, I have a feeling
there are other occasions where we're relying on GNU's printf printing
"(null)" for a NULL string, and those might be caught by this too. As I
find them, I'll go ahead and work to put fixes in.
In the meantime, this will be in 4.3.19 and can be patched directly from
below.
HTH,
-jc
--- lib/netservices.c (revision 7598)
+++ lib/netservices.c (working copy)
@@ -81,9 +81,9 @@
unsigned char *inp, *outp;
int i;
- if (!buf) return NULL;
+ if (result) xfree(result);
+ if (!buf) { result = strdup("[null]"); return result; }
- if (result) xfree(result);
if (buf && (buflen == 0)) buflen = strlen(buf);
result = (char *)malloc(4*buflen + 1); /* Worst case: All binary */
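To make the effect of that reordering easier to see, here is a self-contained sketch of a binview-style function with the same shape: free the previous result first, then return a printable marker for a NULL buffer instead of NULL. It is not the real binview(). The name binview_sketch and the "\xNN" escaping are assumptions for illustration, and it uses plain malloc/free rather than Xymon's own wrappers.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for binview(): printable bytes are copied,
 * anything else becomes "\xNN". The real escaping rules may differ.
 * The returned string is owned by this function and reused per call. */
static char *binview_sketch(const unsigned char *buf, size_t buflen)
{
    static char *result = NULL;
    static const char marker[] = "[null]";
    char *outp;
    size_t i;

    free(result);                      /* release the previous result first */
    result = NULL;

    if (buf == NULL) {                 /* never hand NULL back to a "%s" */
        result = malloc(sizeof marker);
        if (result != NULL) memcpy(result, marker, sizeof marker);
        return result;
    }
    if (buflen == 0) buflen = strlen((const char *)buf);

    result = malloc(4 * buflen + 1);   /* worst case: every byte escaped */
    if (result == NULL) return NULL;

    outp = result;
    for (i = 0; i < buflen; i++) {
        if (isprint(buf[i])) *outp++ = (char)buf[i];
        else outp += sprintf(outp, "\\x%02X", (unsigned)buf[i]);
    }
    *outp = '\0';
    return result;
}
```

With the NULL check ahead of the free, as in the original ordering, a NULL buffer would return early and leave the stale result from the previous call allocated.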
--
"Accept the challenges so that you can feel the exhilaration of
victory"
- General George Patton