Xymon 4.2.3 rrd-data.log shows xstrdup: Cannot dup NULL string

list Peter Welter
Wed, 3 Jun 2009 12:12:48 +0200
Message-Id: <user-63cbfda625e3@xymon.invalid>

Here I am with some new data, because the problem still exists. I know
that the rrd-data-daemon crashes with the "xstrdup: Cannot dup NULL
string" error. I have set up netapp.pl with $Hobbit_fd_lib::debug = 2;
and found that the sysstat output is different; I don't know whether
that is the real cause of the crash.

orwell:/usr/lib/hobbit/server/ext # cat /var/lib/hobbit/tmp/netapp.sysstat.DEBUG.camelot
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s iSCSI  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out    in   out
 29%     0  7976     0    7976  3147  5098   3872   3104     0     0     3   96%  12%  T    8%      0     0     0     0     0     0

orwell:/usr/lib/hobbit/server/ext # cat /var/lib/hobbit/tmp/netapp.sysstat.DEBUG.noah
 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s     Tape kB/s Cache Cache  CP   CP Disk    FCP iSCSI   FCP  kB/s
                                  in   out   read  write  read write   age   hit time  ty util                 in   out
  8%     0     0     0       0     1     6    986   1988     0     0    60  100%  13%  T    9%      0     0     0     0
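
To make the difference between the two sysstat dumps concrete, here is a
minimal Python sketch (not part of netapp.pl or Xymon) that counts the
whitespace-separated fields in each data row. The rows are copied from
the two debug files above with the wrapped halves rejoined; the names
field_count, camelot_row and noah_row are made up for illustration.

```python
# Data rows from the two sysstat debug files above, wrapped halves rejoined.
camelot_row = "29% 0 7976 0 7976 3147 5098 3872 3104 0 0 3 96% 12% T 8% 0 0 0 0 0 0"
noah_row = "8% 0 0 0 0 1 6 986 1988 0 0 60 100% 13% T 9% 0 0 0 0"


def field_count(row):
    """Return the number of whitespace-separated fields in a sysstat data row."""
    return len(row.split())


if __name__ == "__main__":
    print("camelot:", field_count(camelot_row))  # camelot: 22
    print("noah:   ", field_count(noah_row))     # noah:    20
```

The two extra fields on camelot line up with the additional "iSCSI kB/s"
in/out columns in its header. Whether those extra columns are what
upsets the RRD worker is not established by this count alone.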

The other files (for example /var/lib/hobbit/tmp/netapp.xtstats.DEBUG.camelot)
also show a change in output: the block that used to come at the end of
the file now comes first. So the file now begins with:

system:system:nfs_ops:3190/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:read_ops:619/s
system:system:write_ops:144/s
system:system:net_data_recv:4187KB/s
system:system:net_data_sent:23328KB/s
system:system:disk_data_read:5493KB/s
system:system:disk_data_written:6156KB/s
system:system:cpu_busy:10%
system:system:avg_processor_busy:10%
system:system:total_processor_busy:20%
system:system:num_processors:2
system:system:time:1244021254s
system:system:uptime:1048085s
disk:2000001D:38B5ED6F:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:total_transfers:8/s
disk:2000001D:38B5ED6F:00000000:00000000:00000000:00000000:00000000:00000000:00000000:00000000:user_read_chain:3.60

Whereas on our pre-7.3.1.1 filers, the output ends with:

.....
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:guarenteed_read_latency:0us
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:guarenteed_read_blocks:0/s
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:guarenteed_write_latency:0us
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:guarenteed_write_blocks:0/s
disk:6BE7CF95:56AFA883:F30CAEF5:83103FAC:00000000:00000000:00000000:00000000:00000000:00000000:disk_busy:0%
system:system:nfs_ops:0/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:dafs_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:net_data_recv:13KB/s
system:system:net_data_sent:47KB/s
system:system:disk_data_read:986KB/s
system:system:disk_data_written:1988KB/s
system:system:cpu_busy:8%
system:system:avg_processor_busy:5%
system:system:total_processor_busy:10%
system:system:num_processors:2
system:system:time:1244021255s
system:system:uptime:7436873s
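
Besides the changed order, the set of system counters differs as well.
A minimal Python sketch (again not part of netapp.pl or Xymon; the
helper name counter_names is invented here) that compares the counter
names in the two system:system blocks quoted above:

```python
# system:system lines copied from the two xtstats dumps quoted above.
NEW_DUMP = """\
system:system:nfs_ops:3190/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:read_ops:619/s
system:system:write_ops:144/s
system:system:net_data_recv:4187KB/s
system:system:net_data_sent:23328KB/s
system:system:disk_data_read:5493KB/s
system:system:disk_data_written:6156KB/s
system:system:cpu_busy:10%
system:system:avg_processor_busy:10%
system:system:total_processor_busy:20%
system:system:num_processors:2
system:system:time:1244021254s
system:system:uptime:1048085s
"""

OLD_DUMP = """\
system:system:nfs_ops:0/s
system:system:cifs_ops:0/s
system:system:http_ops:0/s
system:system:dafs_ops:0/s
system:system:fcp_ops:0/s
system:system:iscsi_ops:0/s
system:system:net_data_recv:13KB/s
system:system:net_data_sent:47KB/s
system:system:disk_data_read:986KB/s
system:system:disk_data_written:1988KB/s
system:system:cpu_busy:8%
system:system:avg_processor_busy:5%
system:system:total_processor_busy:10%
system:system:num_processors:2
system:system:time:1244021255s
system:system:uptime:7436873s
"""


def counter_names(dump, prefix="system:system:"):
    """Collect the counter names of all lines that start with the given prefix."""
    names = set()
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # "nfs_ops:3190/s" -> "nfs_ops" (the value follows the last colon)
            names.add(line[len(prefix):].rsplit(":", 1)[0])
    return names


added = counter_names(NEW_DUMP) - counter_names(OLD_DUMP)
removed = counter_names(OLD_DUMP) - counter_names(NEW_DUMP)

if __name__ == "__main__":
    print("only in new output:", sorted(added))    # ['read_ops', 'write_ops']
    print("only in old output:", sorted(removed))  # ['dafs_ops']
```

So in these samples the newer output adds read_ops and write_ops and no
longer reports dafs_ops. This only covers the system:system lines; the
disk lines quoted above are partial and are not compared.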


2009/5/30 Peter Welter <user-f55666bd0d1e@xymon.invalid>:

Addendum:
After turning off 'netapp.pl' for all filers and selectively turning it
on again, it appears that there are no problems with ONTAP 7.2.3 and
7.2.4. The error does not show up and all trending (including other
data-dependent trending) shows no gaps anymore.

But these 7.3.1.1 filers are very important, so I have to turn the
monitoring on again for this NetApp cluster. I will see whether
debugging the Perl script gives more relevant data.

2009/5/29 Peter Welter <user-f55666bd0d1e@xymon.invalid>:

Hello all,

Last Friday, May 22, at 8:20 we finished the upgrade of our
NetApp filers (from version 7.2.3 to 7.3.1.1). These filers were (and
are) monitored by Xymon in combination with the perl-netapp-client.
Together, a great combo!

However, since the upgrade I keep getting this error in
/var/log/hobbit/rrd-data.log:
...
2009-05-22 08:22:00 xstrdup: Cannot dup NULL string
2009-05-22 08:22:00 Worker process died with exit code 6, terminating
....

This error appears every 5 minutes.

Only one graph type has not been trended since the upgrade: the
xtstats column, which delivers all statistics about each drive in the
filer (about 20 graphs). Sometimes it does trend some data, but only
for a very short time, say 5 or 15 minutes. Then for hours, nothing.

One filer has not been upgraded, but shows the same lack of trending.
That may be because I have set it up with MultiThreading (something
that can be set using a parameter).

Now I will change this to 1 (a separate process for each filer) to see
if the problem can be narrowed down. I'll post an update later this
weekend.

Regards,
Peter

PS I do not know if this has to do with either Xymon or netapp.pl, but
since it is integrated into the Xymon source (hobbitd/rrd/do_netapp.c)
I think it should be posted here.