Xymon Mailing List Archive

hobbitd_rrd error

list Geoff Steer
Fri, 31 Mar 2006 15:21:20 +1100
Message-Id: <user-f3c48ceba6b7@xymon.invalid>

I've finally gotten back to looking at this problem and have some more
info that may be relevant. It hasn't been high on the list, as hobbit is
still working fine for alerts.

Firstly, I've tried removing the existing rrd files and letting hobbit
create new ones: no change, the core files are still produced.

I've tried building hobbit 4.1.2p1 with rrdtool 1.2.11 and with 1.2.12,
no change. This is with existing rrd files and also letting hobbit
create new ones as required.

Looking at the current core files with gdb, it seems that they all
crash in the sendmail RRD handler:

(gdb) bt
#0  0x00abe7a2 in ?? () from /lib/ld-linux.so.2
#1  0x00afe7d5 in raise () from /lib/tls/libc.so.6
#2  0x00b00149 in abort () from /lib/tls/libc.so.6
#3  0x08054af2 in sigsegv_handler (signum=11) at sig.c:57
#4  0x00afe8c8 in killpg () from /lib/tls/libc.so.6
#5  0x0804e011 in do_sendmail_rrd (
    hostname=0xb7f6f037 "outrelay1.firstwave.com.au", 
    testname=0xb7f6f052 "sendmail", 
    msg=0xbffc5dd0  tstamp=1143771322)
    at rrd/do_sendmail.c:127
#6  0x08050120 in update_rrd (
    hostname=0xb7f6f037 "outrelay1.firstwave.com.au", 
    testname=0xb7f6f052 "sendmail", 
    msg=0xb7f6f05b "data outrelay1,firstwave,com,au.sendmail Fri Mar 31 13:15:22 EST 2006\nStatistics from Tue Jun 21 10:47:07 2005\n M   msgsfr bytes_from   msgsto    bytes_to  msgsrej msgsdis msgsqur  Mailer\n 3 25299848"...,
    tstamp=1143771322, sender=0x0, ldef=0x0) at do_rrd.c:271
#7  0x08049e3a in main (argc=0, argv=0xbffca4e4) at hobbitd_rrd.c:199

I'm ready to rebuild the server entirely, but I'm not convinced that this
will resolve the issue. As I said previously, this setup had been
working fine for months; the problem started for no obvious reason in
early March.

Regards
geoff


On Mon, 2006-03-06 at 10:58 +0100, Henrik Stoerner wrote:
On Mon, Mar 06, 2006 at 03:55:13PM +1100, Geoff Steer wrote:
My hobbit server has been error free since I installed 4.1.2, but in the
last day or so it has had errors from hobbitd_rrd.

The rrd-data.log shows:
*** glibc detected *** double linked list
Worker process died with exit code 134
*** glibc detected *** double free or corruption (fasttop)

This usually indicates some sort of corruption of the memory
allocation inside hobbitd_rrd. Since hobbitd_rrd depends on the
rrdtool library, it could also be a problem with that.
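Two details in these logs fit together. Frame #3 of Geoff's backtrace shows Hobbit's own sigsegv_handler (signum=11, i.e. SIGSEGV) calling abort(), and glibc's "double free or corruption" check also ends in abort(). abort() raises SIGABRT, which is signal 6 on Linux, and a parent that reports a signal death the way a shell does gives 128 + 6 = 134, matching "Worker process died with exit code 134". The C sketch below reproduces that path in a forked child. The names handle_segv, run_crashing_child and shell_style_exit_code are hypothetical and are not Hobbit code; it also assumes, without having checked hobbitd_channel's source, that hobbitd_channel reports signal deaths using the 128 + N convention.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the handler seen in the backtrace:
 * it turns a SIGSEGV into abort(), which raises SIGABRT. */
static void handle_segv(int signum)
{
    (void)signum;
    abort();
}

/* Fork a child that takes a SIGSEGV; return its raw wait status,
 * or -1 if fork() or waitpid() fails. */
int run_crashing_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: disable core files so the demo leaves nothing behind. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handle_segv;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        raise(SIGSEGV); /* simulate the crash; the handler calls abort() */
        _exit(0);       /* not reached */
    }

    int status = 0;
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/* Shell convention: a child killed by signal N is reported as 128 + N;
 * a normal exit is reported as its exit status. */
int shell_style_exit_code(int status)
{
    if (status < 0)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

Calling shell_style_exit_code(run_crashing_child()) from a main() should give 134 on Linux, where SIGABRT is 6. The point is only that a SIGSEGV routed through an abort()-calling handler and a glibc heap-corruption abort look the same from the parent's side; finding which memory is being corrupted still needs a tool like Valgrind.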

Since it's glibc you're probably on a Linux/Intel platform.
Would it be possible for you to run the hobbitd_rrd command
through the "Valgrind" memory checker? I don't know if
Valgrind is included with your distribution - it is part
of the standard Debian release, but your distro might be
different. If you can get it installed, then just change
the command in the "[rrddata]" section from

CMD hobbitd_channel --channel=data   --log=$BBSERVERLOGS/rrd-data.log \
    hobbitd_rrd --rrddir=$BBVAR/rrd

to

CMD hobbitd_channel --channel=data   --log=$BBSERVERLOGS/rrd-data.log \
    valgrind --log-file=$BBSERVERLOGS/valgrind.log \
    hobbitd_rrd --rrddir=$BBVAR/rrd

Let it run until the errors show up, then send me the valgrind.log.*
files.


Regards,
Henrik