Xymon Mailing List Archive

hobbitd coredumping and purple trends

list Henrik Størner
Sat, 2 Apr 2005 08:51:40 +0200
Message-Id: <user-287af29a5b7f@xymon.invalid>

On Fri, Apr 01, 2005 at 05:22:42PM -0500, Deal, Richard wrote:
> My hobbitd is core dumping every so often and, less often but still
> occasionally, the trends column turns purple.

hobbitd crashing - that's bad.

Could you run the core dump through gdb and send me the call trace?
Do this:

    $ gdb ~hobbit/server/bin/hobbitd /tmp/core-file-from-hobbitd
    [messages from gdb]
    (gdb) bt

and send me the output from that "bt" command.
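
If typing into an interactive gdb session is inconvenient, the same trace can probably be captured in one go. This is only a sketch: the function name hobbitd_backtrace is made up for illustration, and it assumes your gdb is recent enough to support the -batch and -ex options.

```shell
# Print a backtrace from a core file without an interactive gdb session.
# Arguments: $1 = path to the binary, $2 = path to the core file.
# (hobbitd_backtrace is a hypothetical helper name, not part of Hobbit.)
hobbitd_backtrace() {
    # -batch makes gdb exit after running its commands;
    # -ex bt runs the "bt" command once the core file is loaded.
    gdb -batch -ex bt "$1" "$2"
}

# Example, using the same paths as above:
# hobbitd_backtrace ~hobbit/server/bin/hobbitd /tmp/core-file-from-hobbitd > /tmp/hobbitd-bt.txt 2>&1
```

Redirecting to a file as in the commented example makes it easy to attach the whole output to a mail.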

> Looking through the makefile the only oddity is MAXMSG=32768
> where my old BBd was set to #define MAXLINE  11264

That shouldn't cause any problems; it just means Hobbit will accept larger
messages than your BB setup.
> more bb-display.log
> 2005-04-01 15:47:59 Whoops ! bb failed to send message - timeout
> 2005-04-01 16:02:59 Whoops ! bb failed to send message - timeout
> 2005-04-01 16:03:00 connect to bbd failed - Connection refused

Probably a result of hobbitd being down.

> I have a lot of these errors in larrd-data.log from various hosts.
> 2005-04-01 17:17:53 RRD error updating
> /local/packages/IT/HOBBIT/hobbit/data/rrd/ray1.tigr.org/netstat.rrd from
> 172.17.10.20: expected 12 data source readings (got 16) from

The "netstat" and "vmstat" RRD files from LARRD are not compatible
with Hobbit. Do a

   find ~hobbit/data/rrd -name netstat.rrd | xargs rm -f

to delete the old files.
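
Since rm -f is not reversible, it may be worth listing the files first. The sketch below is one way to do that; list_larrd_rrds is a hypothetical helper name, and including vmstat.rrd is an assumption based on both file types being incompatible, even though the error shown only mentions netstat.rrd.

```shell
# List the LARRD-era netstat/vmstat RRD files under a directory so they
# can be reviewed before anything is removed. $1 = RRD data directory.
# (list_larrd_rrds is a hypothetical helper name.)
list_larrd_rrds() {
    find "$1" -type f \( -name netstat.rrd -o -name vmstat.rrd \) -print
}

# Example: look at the list first, then delete exactly what it printed.
# list_larrd_rrds ~hobbit/data/rrd
# list_larrd_rrds ~hobbit/data/rrd | xargs rm -f
```

The history in the deleted files is lost, so copy them elsewhere first if the old graphs matter.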

> 2005-04-01 17:18:10 RRD error updating
> /local/packages/IT/HOBBIT/hobbit/data/rrd/IGR51RRTB.tigr.org/temperature
> .module_6_asic-.rrd from 172.17.10.16: illegal attempt to update using
> time 1112393889 when last update time is 1112393889 (minimum one second
> step)

This is a bit more tricky. It means that the same RRD file was updated
by two status messages within one second. That normally should not
happen, because a status is sent every 5 minutes. It can happen if you
have two hosts reporting the same hostname (one of them would be the
172.17.10.16 IP you have in that error message).
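
To see which senders are involved, the log can be summarised per RRD file and IP. This is a rough sketch under two assumptions: each entry in the real log sits on a single line (the wrapping above comes from the mail) in the form "date time RRD error updating PATH from IP: reason", and the paths contain no spaces. The function name rrd_step_errors is made up.

```shell
# Count "illegal attempt to update" errors per RRD file and sender IP.
# $1 = log file (for example larrd-data.log).
# Assumed entry layout, one entry per line:
#   <date> <time> RRD error updating <rrd-path> from <ip>: <reason>
# (rrd_step_errors is a hypothetical helper name.)
rrd_step_errors() {
    awk '
        $5 == "updating" && $7 == "from" && /illegal attempt to update/ {
            ip = $8
            sub(/:$/, "", ip)      # drop the colon that follows the IP
            n[$6 " " ip]++         # key is "<rrd-path> <ip>"
        }
        END { for (k in n) print n[k], k }
    ' "$1" | sort -rn              # most frequent combinations first
}

# Example:
# rrd_step_errors ~hobbit/server/log/larrd-data.log   # log path is a guess
```

Each output line is "count path ip". A file that shows up with a steady count for one IP is a good place to start looking for a second host using the same name.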


Regards,
Henrik