hobbitd_rrd crash in trunk
list Olivier Beau
Hi, I'm running trunk, and got a red "hobbitd_rrd" column for 30 minutes last night. It went purple after "hobbitd_rrd" reported "program crash - Fatal signal caught". What could be causing this? Olivier
list Henrik Størner
In <user-f4c95569bcb1@xymon.invalid> Olivier Beau <user-eb340192b6fc@xymon.invalid> writes:
i'm running trunk, and got a red "hobbitd_rrd" column for 30 minutes this night, it went purple after the "hobbitd_rrd" said "program crash - Fatal signal caught" -> what could be causing this ?
The hobbitd_rrd column is a way of making sure you notice that there's been a crash of a Hobbit program. You can remove it with bb 127.0.0.1 "drop YOURHOBBITSERVER hobbitd_rrd" like you would remove any other status column.
It would of course be nice to figure out *why* hobbitd_rrd crashed. It should leave a core-file in the ~hobbit/server/tmp/ directory, so if you could run it through gdb as described in http://www.xymon.com/hobbit/help/known-issues.html#bugreport it would help a lot.
Thanks,
Henrik
list Olivier Beau
ok, i understand this column
here is the backtrace : (i'm running debian 4.0, 32bit)
$ gdb bin/hobbitd_rrd tmp/core
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /usr/lib/librrd.so.2...done.
Loaded symbols for /usr/lib/librrd.so.2
Reading symbols from /usr/lib/libpng12.so.0...done.
Loaded symbols for /usr/lib/libpng12.so.0
Reading symbols from /usr/lib/libpcre.so.3...done.
Loaded symbols for /usr/lib/libpcre.so.3
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.8...done.
Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.8
Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.8...done.
Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.8
Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /usr/lib/libfreetype.so.6...done.
Loaded symbols for /usr/lib/libfreetype.so.6
Reading symbols from /usr/lib/libart_lgpl_2.so.2...done.
Loaded symbols for /usr/lib/libart_lgpl_2.so.2
Reading symbols from /lib/tls/i686/cmov/libm.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/tls/i686/cmov/libdl.so.2...done.
Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Core was generated by `hobbitd_rrd --rrddir=/data/hobbit/data/rrd --extra-script=/data/hobbit/server/e'.
Program terminated with signal 6, Aborted.
#0 0xb7f72410 in ?? ()
(gdb) bt
#0 0xb7f72410 in ?? ()
#1 0xbfbe576c in ?? ()
#2 0x00000006 in ?? ()
#3 0x000017c0 in ?? ()
#4 0xb7c42811 in raise () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7c43fb9 in abort () from /lib/tls/i686/cmov/libc.so.6
#6 0x0806d9d1 in sigsegv_handler (signum=11) at sig.c:58
#7 0xb7f72420 in ?? ()
#8 0x0000000b in ?? ()
#9 0x00000033 in ?? ()
#10 0x00000000 in ?? ()
(gdb)
Olivier
list Olivier Beau
Hello Henrik, This is happening once or twice a day. Each time I seem to be losing 15 to 30 minutes of some graphs (I suppose this is because of the rrdcache module). Is there anything I could do to help fix this? Olivier
list Henrik Størner
In <user-fcae5e5390a7@xymon.invalid> Olivier Beau <user-eb340192b6fc@xymon.invalid> writes:
ok, i understand this column
here is the backtrace : (i'm running debian 4.0, 32bit)
$ gdb bin/hobbitd_rrd tmp/core
Core was generated by `hobbitd_rrd --rrddir=/data/hobbit/data/rrd --extra-script=/data/hobbit/server/e'.
Program terminated with signal 6, Aborted.
#0 0xb7f72410 in ?? ()
(gdb) bt
#0 0xb7f72410 in ?? ()
#1 0xbfbe576c in ?? ()
#2 0x00000006 in ?? ()
#3 0x000017c0 in ?? ()
#4 0xb7c42811 in raise () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7c43fb9 in abort () from /lib/tls/i686/cmov/libc.so.6
#6 0x0806d9d1 in sigsegv_handler (signum=11) at sig.c:58
#7 0xb7f72420 in ?? ()
#8 0x0000000b in ?? ()
#9 0x00000033 in ?? ()
#10 0x00000000 in ?? ()
Yuck, not much help there - I was hoping there would be some more
meaningful stuff in there.
Since this happens regularly, could you try running hobbitd_rrd
for a while with debugging enabled ? Either restart it with the
"--debug" option, or do a "killall -USR2 hobbitd_rrd" while it is
running (the USR2 signal toggles debugging output on/off). The
debug output goes to the normal hobbitd_rrd logfile (rrd-status.log
or rrd-data.log).
Regards,
Henrik
list Olivier Beau
Hi Henrik,
It happened again today at 17:00:22. Nothing new when doing a bt on the coredump.
An extract of rrd-status.log from 16h55 to 17h05 is available at http://www.qalpit.com/~olivier/tmp/rrd-status.log.gz
Olivier
PS: hobbitd_rrd only crashes on the status channel (the hobbitd_rrd running on the data channel has never crashed).
list Henrik Størner
In <user-bb605d1afd88@xymon.invalid> Olivier Beau <user-eb340192b6fc@xymon.invalid> writes:
It happened again today at 17:00:22. Nothing new when doing a bt on the coredump. An extract of rrd-status.log from 16h55 to 17h05 is available at http://www.qalpit.com/~olivier/tmp/rrd-status.log.gz
OK, the interesting part is here when it crashes:
2009-01-19 17:00:22 hobbitd_rrd: Got message 181436 @@status#181436/cedratnet-bdd1|1232380822.602633|127.0.0.1||cedratnet-bdd1|mysql|1232398822|green||green|1231215890|0||0||1232380812|0|linuxmysql|unix/mysql
2009-01-19 17:00:22 startpos 342639, fillpos 378880, endpos 342991
2009-01-19 17:00:22 hobbitd_rrd: Got message 181437 @@status#181437/moniteur-ora2|1232380822.618847|10.12.0.67||moniteur-ora2|cpu|1255363113|blue||blue|1228751913|0||1255363113|Disabled by
2009-01-19 17:00:22 startpos 342995, fillpos 378880, endpos -1
2009-01-19 17:00:22 Peer at 0.0.0.0:0 failed: Broken pipe
2009-01-19 17:00:22 Peer not up, flushing message queue
2009-01-19 17:00:22 Opening file /data/hobbit/server/etc/hobbit-rrddefinitions.cfg
2009-01-19 17:00:22 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
2009-01-19 17:00:22 hobbitd_rrd: Got message 181450 @@status#181450/nurun-etam-bdd1|1232380822.807004|127.0.0.1||nurun-etam-bdd1|mysql|1232398822|green||green|1231768476|0||0||1232380582|0|linuxmysql|unix/mysql
2009-01-19 17:00:22 startpos 17100, fillpos 19357, endpos 17846
2009-01-19 17:00:22 Opening file /data/hobbit/server/etc/bb-hosts
It appears to be a "mysql" status from either cedratnet-bdd1 or nurun-etam-bdd1 that causes the crash (I cannot tell exactly, because output buffering comes into play when there's a crash). It *could* also be the cpu-report from moniteur-ora2, but I doubt that - the cpu-status is tested a lot more than the mysql-status.
In fact, "mysql" isn't part of hobbitd_rrd by default. So is this something you've added ? Is it something that you generate graphs for ? Or is it just a status that hobbitd_rrd should ignore ?
Regards,
Henrik
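To compare the candidate messages above more easily, the host, test and color can be pulled out of each logged "@@status" header. The following is a minimal Python sketch, not part of Hobbit: the function name is made up for illustration, and the field positions are an assumption inferred from the sample log lines in this thread.

```python
# Field positions are inferred from the sample log lines in this thread;
# they are an assumption, not taken from the hobbitd documentation.
HOST_FIELD, TEST_FIELD, COLOR_FIELD = 4, 5, 7


def parse_status_header(line):
    """Return (host, test, color) from a logged '@@status' header, or None."""
    marker = line.find("@@status#")
    if marker == -1:
        return None  # not a "Got message" line carrying a status header
    fields = line[marker:].rstrip("\n").split("|")
    if len(fields) <= COLOR_FIELD:
        return None  # header truncated before the color field
    return fields[HOST_FIELD], fields[TEST_FIELD], fields[COLOR_FIELD]


sample = (
    "2009-01-19 17:00:22 hobbitd_rrd: Got message 181437 "
    "@@status#181437/moniteur-ora2|1232380822.618847|10.12.0.67||"
    "moniteur-ora2|cpu|1255363113|blue||blue|1228751913|0||1255363113|Disabled by"
)
print(parse_status_header(sample))  # prints ('moniteur-ora2', 'cpu', 'blue')
```

Run over the three "Got message" lines in the extract, this gives two green mysql statuses and one blue cpu status.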
list Olivier Beau
Henrik,
Here are 2 other extracts from crashes:
2009-01-20 16:36:06 hobbitd_rrd: Got message 517875 @@status#517875/sw01.courrierinternational|1232465766.838715|192.168.255.32||sw01.courrierinternational|if_load|1232467566|green||green|1225102669|0||0||0|0||network/switch-dedie
2009-01-20 16:36:06 startpos 162634, fillpos 166552, endpos -1
2009-01-20 16:36:06 Want msg 517876, startpos 162634, fillpos 166552, endpos -1, usedbytes=3918, bufleft=361831
2009-01-20 16:36:06 Want msg 517876, startpos 162634, fillpos 170333, endpos -1, usedbytes=7699, bufleft=358050
2009-01-20 16:36:06 hobbitd_rrd: Got message 517876 @@status#517876/sw01.ctoutvert|1232465766.838761|192.168.255.32||sw01.ctoutvert|memory|1234247285|blue||blue|1231828085|0||1234247285|Disabled by
2009-01-20 16:36:06 startpos 172884, fillpos 172884, endpos -1
2009-01-20 16:36:06 Peer at 0.0.0.0:0 failed: Broken pipe
2009-01-20 16:36:06 Peer not up, flushing message queue
2009-01-20 16:36:06 Opening file /data/hobbit/server/etc/hobbit-rrddefinitions.cfg
2009-01-20 16:36:06 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
2009-01-20 16:36:06 hobbitd_rrd: Got message 517913 @@status#517913/sw01.excenteurofac|1232465766.929692|192.168.255.32||sw01.excenteurofac|if_err|1232467566|green||green|1231866461|0||0||0|0||network/switch-dedie
if_load and if_err are statuses from devmon that I do not graph using ncv/extra-test.
memory is also generated by devmon, and is graphed by default in xymon.
2009-01-22 17:14:20 hobbitd_rrd: Got message 343666 @@status#343666/logicimmo-netapp2|1232640859.848737|127.0.0.1||logicimmo-netapp2|disk|2147483647|blue||blue|1232479545|0||-1|Disabled by
2009-01-22 17:14:20 startpos 417512, fillpos 419047, endpos -1
2009-01-22 17:14:20 Peer at 0.0.0.0:0 failed: Broken pipe
2009-01-22 17:14:20 Peer not up, flushing message queue
2009-01-22 17:14:20 Opening file /data/hobbit/server/etc/hobbit-rrddefinitions.cfg
2009-01-22 17:14:20 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=528383
2009-01-22 17:14:20 hobbitd_rrd: Got message 343677 @@status#343677/tif-netapp1|1232640860.884630|127.0.0.1||tif-netapp1|disk|1232644460|green||green|1230710616|0||0||0|0|stockage|unix/infrasys/stockage
2009-01-22 17:14:20 startpos 1335, fillpos 3954, endpos 2589
disk is generated by netapp.pl (from the hobbit-client-perl).
-> I noticed that in my 3 extracts, the last message logged before the crash is a disabled status. Could this be the problem? (I've checked 2 other crashes, and there again the last message logged is a disabled status.)
I checked those 3 disabled statuses: the hosts are up and running (so normal statuses are sent to hobbitd). We have disabled them ahead of a migration that might happen in a few days or weeks.
For your mysql question: yes, I do graph mysql using NCV:
NCV_mysql="Questions:DERIVE,Threadsconnected:GAUGE,*:NONE"
Olivier
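The pattern described above (a disabled status being the last message logged before each crash) can be checked mechanically across a longer debug log. The following is a rough Python sketch, not a Hobbit tool: the function names are hypothetical, it treats the "Broken pipe" line as the crash marker because that is what follows the crash in these extracts, and it is subject to the output-buffering caveat Henrik raised, so the last logged message is only a suspect.

```python
def last_message_before_crash(log_lines):
    """Yield the 'Got message' line most recently seen before each 'Broken pipe'."""
    last_msg = None
    for line in log_lines:
        if "Got message" in line:
            last_msg = line
        elif "failed: Broken pipe" in line and last_msg is not None:
            yield last_msg
            last_msg = None  # do not report the same message for a later crash


def looks_disabled(msg_line):
    """True if the header carries the blue color and 'Disabled by' text seen above."""
    return "|blue|" in msg_line and "Disabled by" in msg_line


# Sample lines copied from the 2009-01-22 extract in this post.
sample_log = [
    "2009-01-22 17:14:20 hobbitd_rrd: Got message 343666 @@status#343666/logicimmo-netapp2|1232640859.848737|127.0.0.1||logicimmo-netapp2|disk|2147483647|blue||blue|1232479545|0||-1|Disabled by",
    "2009-01-22 17:14:20 startpos 417512, fillpos 419047, endpos -1",
    "2009-01-22 17:14:20 Peer at 0.0.0.0:0 failed: Broken pipe",
    "2009-01-22 17:14:20 Peer not up, flushing message queue",
    "2009-01-22 17:14:20 hobbitd_rrd: Got message 343677 @@status#343677/tif-netapp1|1232640860.884630|127.0.0.1||tif-netapp1|disk|1232644460|green||green|1230710616|0||0||0|0|stockage|unix/infrasys/stockage",
]

suspects = list(last_message_before_crash(sample_log))
for msg in suspects:
    print("disabled" if looks_disabled(msg) else "enabled", msg[:60])
```

To scan a real log, pass an open file object for rrd-status.log instead of the sample list.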
list Olivier Beau
I re-enabled all the disabled statuses that are used for graphing (cpu, disk, procs, ...) and have not had a single crash in the last 36 hours (before, crashes would happen at least twice per day).
-> From my user point of view, it looks like disabled statuses can crash hobbitd_rrd.
Olivier
list Henrik Størner
Hi Olivier, just one quick question: What version of the RRDtool library are you using ? I just had a rather nasty problem which looks like a problem with the RRD library on Debian (1.2.15); upgrading to 1.2.30 - the latest 1.2.x version - made the problem disappear. This was causing random crashes of hobbitd_rrd ... Regards, Henrik
list Olivier Beau
Hello Henrik,
I upgraded to 1.2.30 and started disabling some statuses, and after a few hours I got a crash. The last message logged is again a disabled status:
2009-02-02 17:47:46 hobbitd_rrd: Got message 812317 @@status#812317/www-vip-topannonce|1233593260.660693|127.0.0.1||www-vip-test|http|2147483647|blue||blue|1233592711|0||-1|Disabled by
2009-02-02 17:47:46 startpos 211966, fillpos 227847, endpos -1
2009-02-02 17:47:46 Peer at 0.0.0.0:0 failed: Broken pipe
2009-02-02 17:47:46 Peer not up, flushing message queue
Olivier
list Jason Hand
One more issue I need help with. If I wanted to change the default PING interval from 5 minutes to 1 minute, where would I do that? Thanks, Jason
list Josh Luthman
Look in the server config files. Should take 5 minutes to find.
list Buchan Milne
----- "Jason Hand" <user-17f7af22f408@xymon.invalid> wrote:
One more issue I need help with. If I wanted to change the default PING interval from 5 minutes to 1 minute, where would I do that?
Change the interval for the bbnet task in hobbitlaunch.cfg Regards, Buchan
list Jason Hand
That was it, thanks! -Jason
-----Original Message-----
From: Buchan Milne [mailto:user-9b139aff4dec@xymon.invalid]
Sent: Tuesday, February 03, 2009 9:54 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Where can you change the default hobbitping interval?
----- "Jason Hand" <user-17f7af22f408@xymon.invalid> wrote:
One more issue I need help with. If I wanted to change the default PING interval from 5 minutes to 1 minute, where would I do that?
Change the interval for the bbnet task in hobbitlaunch.cfg Regards, Buchan
list Henrik Størner
In <user-d36f528cd948@xymon.invalid> Jason Hand <user-17f7af22f408@xymon.invalid> writes:
One more issue I need help with. If I wanted to change the default PING interval from 5 minutes to 1 minute, where would I do that?
INTERVAL for the [bbnet] task in hobbitlaunch.cfg Henrik
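For anyone who wants to script that change rather than edit the file by hand, here is a small Python sketch. It is only an illustration: the function name is invented, the sample text is a hypothetical, abbreviated fragment rather than a real hobbitlaunch.cfg, and the "5m"/"1m" interval notation is assumed, so check it against your own file.

```python
import re


def set_task_interval(config_text, task, interval):
    """Return config_text with the INTERVAL line of section [task] replaced."""
    out = []
    in_task = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section header: track whether it is the one we want.
            in_task = stripped == "[%s]" % task
        elif in_task and re.match(r"INTERVAL\s", stripped):
            indent = line[: len(line) - len(line.lstrip())]
            line = "%sINTERVAL %s" % (indent, interval)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical, abbreviated fragment; [othertask] is a made-up section name.
sample = """[bbnet]
\tINTERVAL 5m

[othertask]
\tINTERVAL 10m
"""

updated = set_task_interval(sample, "bbnet", "1m")
print(updated)
```

Only the INTERVAL line inside the named section is rewritten; other tasks keep their own intervals.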
list Jakub Heichman
Hello there,
I'm experiencing similar crashes of hobbitd_rrd using 4.3.0-beta2 with rrdtool-1.4.2. I have custom tests using NCV graphs. Disabled tests seem to be crashing hobbitd_rrd in the status channel:
2010-01-21 09:33:18 hobbitd_rrd: Got message 526516 @@status#526516/cnsb1|1264066398.129576|10.10.0.165||cnsb1|dns|1295449658|blue||blue|1264000058|0||1295449658|Disabled by
2010-01-21 09:33:18 startpos 86857, fillpos 86857, endpos -1
2010-01-21 09:33:18 Peer at 0.0.0.0:0 failed: Broken pipe
2010-01-21 09:33:18 Peer not up, flushing message queue
2010-01-21 09:39:28 hobbitd_rrd: Got message 551600 @@status#551600/prx09|1264066768.465806|10.10.1.179||prx09|cdaemon|2147483647|blue||blue|1263999443|0||-1|Disabled by
2010-01-21 09:39:28 startpos 353816, fillpos 354099, endpos -1
2010-01-21 09:39:28 Peer at 0.0.0.0:0 failed: Broken pipe
2010-01-21 09:39:28 Peer not up, flushing message queue
2010-01-21 09:45:53 hobbitd_rrd: Got message 581864 @@status#581864/prx34|1264067153.276604|10.10.1.74||prx34|cdaemon|2147483647|blue||blue|1263999443|0||-1|Disabled by
2010-01-21 09:45:53 startpos 1147497, fillpos 1147497, endpos -1
2010-01-21 09:45:53 Peer at 0.0.0.0:0 failed: Broken pipe
2010-01-21 09:45:53 Peer not up, flushing message queue
GDB output:
Reading symbols from /usr/local/rrdtool-1.4.2/lib/librrd.so.4...done.
Loaded symbols for /usr/local/rrdtool/lib/librrd.so.4
Reading symbols from /usr/lib/libpng12.so.0...done.
Loaded symbols for /usr/lib/libpng12.so.0
Reading symbols from /lib/libpcre.so.0...done.
Loaded symbols for /lib/libpcre.so.0
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libxml2.so.2...done.
Loaded symbols for /usr/lib/libxml2.so.2
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libpangocairo-1.0.so.0...done.
Loaded symbols for /usr/lib/libpangocairo-1.0.so.0
Reading symbols from /usr/lib/libpango-1.0.so.0...done.
Loaded symbols for /usr/lib/libpango-1.0.so.0
Reading symbols from /usr/lib/libcairo.so.2...done.
Loaded symbols for /usr/lib/libcairo.so.2
Reading symbols from /lib/libgobject-2.0.so.0...done.
Loaded symbols for /lib/libgobject-2.0.so.0
Reading symbols from /lib/libgmodule-2.0.so.0...done.
Loaded symbols for /lib/libgmodule-2.0.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libglib-2.0.so.0...done.
Loaded symbols for /lib/libglib-2.0.so.0
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib/libpangoft2-1.0.so.0...done.
Loaded symbols for /usr/lib/libpangoft2-1.0.so.0
Reading symbols from /usr/lib/libfontconfig.so.1...done.
Loaded symbols for /usr/lib/libfontconfig.so.1
Reading symbols from /usr/lib/libfreetype.so.6...done.
Loaded symbols for /usr/lib/libfreetype.so.6
Reading symbols from /usr/lib/libXrender.so.1...done.
Loaded symbols for /usr/lib/libXrender.so.1
Reading symbols from /usr/lib/libX11.so.6...done.
Loaded symbols for /usr/lib/libX11.so.6
Reading symbols from /lib/libexpat.so.0...done.
Loaded symbols for /lib/libexpat.so.0
Reading symbols from /usr/lib/libXau.so.6...done.
Loaded symbols for /usr/lib/libXau.so.6
Reading symbols from /usr/lib/libXdmcp.so.6...done.
Loaded symbols for /usr/lib/libXdmcp.so.6
Core was generated by `hobbitd_rrd --rrddir=/var/hobbit/data/rrd'.
Program terminated with signal 6, Aborted.
[New process 9999]
#0 0x009c9410 in __kernel_vsyscall ()
(gdb) bt
#0 0x009c9410 in __kernel_vsyscall ()
#1 0x0042ddf0 in raise () from /lib/libc.so.6
#2 0x0042f701 in abort () from /lib/libc.so.6
#3 0x08069e93 in sigsegv_handler ()
#4 <signal handler called>
#5 0x00474a23 in strchr () from /lib/libc.so.6
#6 0x08052118 in do_ncv_rrd ()
#7 0x08058c6b in update_rrd ()
#8 0x0804a400 in main ()
Best regards,
--
Kuba Heichman
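Frames #5 and #6 of the backtrace above place the fault in strchr() called from do_ncv_rrd(), i.e. in the NCV parsing step. One plausible reading, which nothing in this thread confirms, is that a disabled status gives the parser no usable "name: value" data to search. The sketch below is Python, not the C code of hobbitd_rrd, and the function name and sample values are made up; it only illustrates NCV-style parsing that tolerates a missing body or lines without a separator.

```python
def parse_ncv(body):
    """Collect 'name: value' or 'name = value' pairs; ignore everything else."""
    values = {}
    for line in (body or "").splitlines():  # a missing body yields no lines
        for sep in (":", "="):
            name, found, raw = line.partition(sep)
            if found:
                break
        else:
            continue  # no separator on this line, nothing to record
        try:
            values[name.strip()] = float(raw.strip())
        except ValueError:
            continue  # separator present, but the value is not numeric
    return values


# Metric names come from the NCV_mysql setting quoted earlier in the thread;
# the numbers are invented for the example.
print(parse_ncv("Questions: 1234\nThreadsconnected: 7\nsome free text"))
print(parse_ncv(None))
print(parse_ncv("Disabled by"))
```

The last two calls return an empty dictionary instead of failing, which is the behaviour one would want for a status that carries no NCV data.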