Problems since upgrading from bbgen to Hobbit
list Frédéric Mangeant
Hi Henrik

I upgraded this morning from bbgen 3.6 to Hobbit 4.1.2p1 (with ~ 1000 hosts), and have a few problems :

- I can't display the history of one of my tests :
http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos
returns an internal server error
[Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of script headers: bb-hist.sh, referer: http://10.50.80.44/hobbit-cgi/bb-hostsvc.sh?HOSTSVC=cronos.AHD&IP=10.50.80.44&DISPLAYNAME=cronos

- I had a coredump with bbgen :
$ file core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, SVR4-style, from 'bbgen'
$ gdb /BB/hobbit/server/bin/bbgen core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `bbgen --recentgifs --subpagecolumns=2 --nopropred=AHD --subpagecolumns=2 --page'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libpcre.so.0...done.
Loaded symbols for /usr/lib/libpcre.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x40075941 in kill () from /lib/libc.so.6
(gdb) bt
#0 0x40075941 in kill () from /lib/libc.so.6
#1 0x400756e5 in raise () from /lib/libc.so.6
#2 0x40076a86 in abort () from /lib/libc.so.6
#3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4 <signal handler called>
#5 main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
(gdb) quit

- some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should.
Even if I add a process to check, it doesn't appear in Hobbit.
Could it be a compatibility problem (my tests with a Quest BB client 3.01 running on XP SP2 ran fine) ?

- one of my custom network tests keeps getting "Unexpected service response".
Its definition is this :
[ica]
expect "ICA"
port 1494

- I get a lot of errors in page.log :
2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 1-6:0000:0700
Do I have to use "TIME=123456:0000:0700" ?

Many thanks in advance for your help.

Regards,
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
list Henrik Størner
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
> - I can't display the history of one of my tests :
> http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos
> returns an internal server error
> [Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of script headers: bb-hist.sh
Most likely there is some sort of malformed entry in that history file which the Hobbit histlog CGI cannot handle. If you could send me that file - should be ~hobbit/data/hist/cronos.AHD - I can look into it.
> - I had a coredump with bbgen :
> #3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
> #4 <signal handler called>
> #5 main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
Weird. Does it happen every time you run bbgen ? I hope not.
> - some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should.
> Even if I add a process to check, it doesn't appear in Hobbit.
> Could it be a compatibility problem (my tests with a Quest BB client 3.01 running on XP SP2 ran fine) ?
Hard to tell ... my best suggestion is to capture a network trace of the traffic between one of these clients and the Hobbit server. If a Linux box, you can use

tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP

to grab only the traffic between Hobbit and this client.
> - one of my custom network tests keeps getting "Unexpected service response".
> Its definition is this :
> [ica]
> expect "ICA"
> port 1494
Most likely, it isn't getting any response back. What does "telnet HOSTNAME 1494" give you ?
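A terminal may not display every byte a service sends, so it can help to look at the raw reply. The following is a minimal Python sketch, not Hobbit's own code: `read_banner` and `banner_matches` are hypothetical helper names, and the prefix comparison is only an assumption about how an `expect` string is matched.

```python
import socket


def read_banner(host: str, port: int, timeout: float = 5.0, max_bytes: int = 64) -> bytes:
    """Connect to host:port and return the first bytes the service sends.

    Returns b"" if the service stays silent until the timeout expires.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            return sock.recv(max_bytes)
        except socket.timeout:
            return b""


def banner_matches(banner: bytes, expected: bytes) -> bool:
    # Assumption: the expected string must appear at the very start of the reply.
    return banner.startswith(expected)
```

Printing `repr(read_banner("HOSTNAME", 1494))` would show whether anything non-printable precedes the text seen in telnet.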
> - I get a lot of errors in page.log :
> 2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 1-6:0000:0700
> Do I have to use "TIME=123456:0000:0700" ?
I think so, yes.

Regards,
Henrik
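For anyone with many bbgen-era entries to convert, the rewrite from "1-6" to "123456" can be automated. This is a minimal sketch under assumptions: the thread only confirms that the range form is rejected and that listing the digits works, so the accepted digit set 0-6 and the helper name `expand_day_range` are hypothetical.

```python
import re

# Matches a day range such as "1-6" (assumed digit set: 0-6).
_DAY_RANGE = re.compile(r"^([0-6])-([0-6])$")


def expand_day_range(timespec: str) -> str:
    """Rewrite 'D1-D2:HHMM:HHMM' as 'D1...D2:HHMM:HHMM'.

    Specs whose day field is not a range are returned unchanged.
    Raises ValueError if the spec does not have exactly three fields.
    """
    days, start, end = timespec.split(":")
    match = _DAY_RANGE.match(days)
    if match:
        first, last = int(match.group(1)), int(match.group(2))
        if first > last:
            raise ValueError("descending day range: %s" % days)
        days = "".join(str(d) for d in range(first, last + 1))
    return "%s:%s:%s" % (days, start, end)
```

For example, `expand_day_range("1-6:0000:0700")` gives the form confirmed above.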
list Frédéric Mangeant
Henrik Stoerner wrote:
> On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
>> - I can't display the history of one of my tests :
>> http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos
>> returns an internal server error
>> [Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of script headers: bb-hist.sh
> Most likely there is some sort of malformed entry in that history file which the Hobbit histlog CGI cannot handle. If you could send me that file - should be ~hobbit/data/hist/cronos.AHD - I can look into it.
First of all, let me thank you for your help. I've sent the cronos.AHD file to you by email.
>> - I had a coredump with bbgen :
>> #3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
>> #4 <signal handler called>
>> #5 main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
> Weird. Does it happen every time you run bbgen ? I hope not.
Once only. I had some "errors" in bb-hosts (duplicate subpage names, which seemed to confuse bb-findhost.cgi). Since then I didn't have any new coredumps.
>> - some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should.
>> Even if I add a process to check, it doesn't appear in Hobbit.
>> Could it be a compatibility problem (my tests with a Quest BB client 3.01 running on XP SP2 ran fine) ?
> Hard to tell ... my best suggestion is to capture a network trace of the traffic between one of these clients and the Hobbit server. If a Linux box, you can use
>
> tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP
>
> to grab only the traffic between Hobbit and this client.
I'm looking into it, with tcpdump and tcpflow.
>> - one of my custom network tests keeps getting "Unexpected service response".
>> Its definition is this :
>> [ica]
>> expect "ICA"
>> port 1494
> Most likely, it isn't getting any response back. What does "telnet HOSTNAME 1494" give you ?
It works as expected :
$ telnet xx.xx.xx.xx 1494
Trying xx.xx.xx.xx...
Connected to xx.xx.xx.xx
Escape character is '^]'.
ICA
>> - I get a lot of errors in page.log :
>> 2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 1-6:0000:0700
>> Do I have to use "TIME=123456:0000:0700" ?
> I think so, yes.
Thanks, it's working now.

I have some more questions :

- when stopping Hobbit, is it normal to have these kinds of errors in history.log ?
2005-11-29 10:51:28 Tried to down BOARDBUSY: Invalid argument
2005-11-29 10:51:28 Worker process died with exit code 0, terminating
2005-11-29 10:51:28 Could not get shm of size 262144: No such file or directory
2005-11-29 10:51:28 Channel not available

- I'm running a few scripts with a long heartbeat (eg. "status+1440"), are these errors in history.log normal ?
2005-11-29 11:00:04 Will not update /BB/hobbit/data/hist/HQSTERIACRAUO.NetBackup - color unchanged (green)
2005-11-29 12:00:14 Will not update /BB/hobbit/data/hist/cronos.ebuilds - color unchanged (green)
2005-11-29 12:00:19 Will not update /BB/hobbit/data/hist/hades.ebuilds - color unchanged (green)
2005-11-29 12:01:08 Will not update /BB/hobbit/data/hist/ve1stnet3.photos - color unchanged (green)
2005-11-29 13:33:33 Will not update /BB/hobbit/data/hist/prod.file - color unchanged (green)

- what do these errors in page.log mean ?
2005-11-29 14:06:31 hobbitd_alert: Got message 4723, expected 4722
2005-11-29 14:00:15 Worker process died with exit code 0, terminating
2005-11-29 13:42:36 Stale alert for antivirus-1:http dropped

- and these in hobbitlaunch.log ? Should I run hobbitd_alert in debug mode ?
2005-11-29 12:48:32 Task bbpage terminated, status 1
2005-11-29 13:21:41 Task bbpage terminated, status 1
2005-11-29 13:32:07 Task bbpage terminated, status 1
2005-11-29 13:32:32 Task bbpage terminated, status 1
2005-11-29 13:37:44 Task bbpage terminated, status 1
2005-11-29 13:59:52 Task bbpage terminated, status 1
2005-11-29 14:00:15 Task bbpage terminated, status 1

Thanks again !
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
list Figaro Nicolas
-----Original Message-----
From: Frédéric Mangeant [mailto:user-b6ea1d850181@xymon.invalid]
Sent: Tuesday, 29 November 2005 14:29
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Problems since upgrading from bbgen to Hobbit
> - I can't display the history of one of my tests :
> http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos
> returns an internal server error
A good way of solving this is to run the script bb-hist.sh with a -x option directly from the command line. The script contains only a call to the CGI, so you can then try a truss like this one :

export QUERY_STRING="HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos"
. /usr2/hobbit/server/etc/hobbitcgi.cfg
script /tmp/truss.txt
truss /usr2/hobbit/server/bin/bb-hist.cgi $CGI_HIST_OPTS

Hope this'll help, and good luck.

Nicolas Figaro
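The same idea, running the CGI outside the web server with QUERY_STRING set by hand, can be scripted. This is a minimal Python sketch: `run_cgi` and `describe_exit` are hypothetical helper names. It does not source hobbitcgi.cfg, so any options such as $CGI_HIST_OPTS would have to be passed in `argv` by the caller.

```python
import os
import signal
import subprocess
from urllib.parse import urlencode


def run_cgi(argv, params):
    """Run a CGI program with QUERY_STRING built from params.

    Returns the subprocess.CompletedProcess with raw stdout/stderr bytes.
    """
    env = dict(os.environ)
    env["QUERY_STRING"] = urlencode(params)
    return subprocess.run(argv, env=env, capture_output=True)


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a short readable string."""
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode
```

A crash of the CGI would then show up as a negative return code rather than as a bare "Premature end of script headers" line in the web server log.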
list Frédéric Mangeant
FIGARO Nicolas wrote:
> A good way of solving this is to run the script bb-hist.sh with a -x option directly from the command line. There is only a call to the cgi. So you can then try a truss like this one :
>
> export QUERY_STRING="HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos"
> . /usr2/hobbit/server/etc/hobbitcgi.cfg
> script /tmp/truss.txt
> truss /usr2/hobbit/server/bin/bb-hist.cgi $CGI_HIST_OPTS
>
> Hope this'll help, and good luck.
Thanks for the info, Nicolas.

Running bb-hist.cgi returns 1000+ lines, and ends with :

read(3, " 2005 green 1121362283 877\nThu"..., 4096) = 4096
read(3, "0:59 2005 red 1121389259 12\nFr"..., 4096) = 4096
read(3, "2005 clear 1121415369 12\nFri J"..., 4096) = 4096
brk(0x84c1000) = 0x84c1000
read(3, "5 green 1121440956 879\nFri Jul"..., 4096) = 4096
read(3, "2005 red 1121467928 10\nSat Jul"..., 4096) = 4096
read(3, "08:07:14 2005 clear 1121494034"..., 4096) = 4096
read(3, "Sat Jul 16 15:22:19 2005 green "..., 4096) = 4096
read(3, "1546235 880\nSat Jul 16 22:51:55 "..., 4096) = 4096
read(3, " red 1121573209 10\nSun Jul 17 "..., 4096) = 4096
read(3, "1:56 2005 clear 1121599316 10\n"..., 4096) = 4096
read(3, "Jul 17 20:37:04 2005 green 112"..., 4096) = 4096
read(3, "520 879\nMon Jul 18 04:06:39 2005"..., 4096) = 4096
read(3, " 1121678494 10\nMon Jul 18 11:2"..., 4096) = 4096
read(3, "592\nMon Jul 18 18:36:50 2005 gre"..., 4096) = 4096
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

This external script's color changes very frequently (I use this to open "fake" tickets in Computer Associates USPSD).

The ~data/hist/cronos.AHD file is more than 4 MB, maybe I should change one of the "MAXMSG_*" settings in hobbitserver.cfg ?

Regards,
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
list Figaro Nicolas
-----Original Message-----
From: Frédéric Mangeant [mailto:user-b6ea1d850181@xymon.invalid]
Sent: Tuesday, 29 November 2005 14:59
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] RE : [hobbit] Problems since upgrading from bbgen to Hobbit
> Thanks for the info, Nicolas.
My pleasure.
> Running bb-hist.cgi returns 1000+ lines, and ends with :
>
> read(3, " 2005 green 1121362283 877\nThu"..., 4096) = 4096
> read(3, "0:59 2005 red 1121389259 12\nFr"..., 4096) = 4096
> read(3, "2005 clear 1121415369 12\nFri J"..., 4096) = 4096
> brk(0x84c1000) = 0x84c1000
> read(3, "5 green 1121440956 879\nFri Jul"..., 4096) = 4096
> read(3, "2005 red 1121467928 10\nSat Jul"..., 4096) = 4096
> read(3, "08:07:14 2005 clear 1121494034"..., 4096) = 4096
> read(3, "Sat Jul 16 15:22:19 2005 green "..., 4096) = 4096
> read(3, "1546235 880\nSat Jul 16 22:51:55 "..., 4096) = 4096
> read(3, " red 1121573209 10\nSun Jul 17 "..., 4096) = 4096
> read(3, "1:56 2005 clear 1121599316 10\n"..., 4096) = 4096
> read(3, "Jul 17 20:37:04 2005 green 112"..., 4096) = 4096
> read(3, "520 879\nMon Jul 18 04:06:39 2005"..., 4096) = 4096
> read(3, " 1121678494 10\nMon Jul 18 11:2"..., 4096) = 4096
> read(3, "592\nMon Jul 18 18:36:50 2005 gre"..., 4096) = 4096
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
>
> This external script's color changes very frequently (I use this to open "fake" tickets in Computer Associates USPSD).
>
> The ~data/hist/cronos.AHD file is more than 4 Mb, maybe I should change one of the "MAXMSG_*" settings in hobbitserver.cfg ?
Or try to run a head -100 to select only the 100 most recent color changes and see if the SIGSEGV still occurs. That way you could determine how many lines the CGI can handle.

NF
> Regards,
> --
> Frédéric Mangeant
> Steria EDC Sophia-Antipolis
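Trimming the history file for such an experiment is safest on a copy. The reads in the trace appear to run in ascending date order, which suggests (as an assumption, not something confirmed in this thread) that the newest entries sit at the end of the file; if so, the last N lines rather than the first N are the most recent ones. A minimal Python sketch, with `keep_last_lines` as a hypothetical helper name:

```python
from collections import deque


def keep_last_lines(src_path: str, dst_path: str, n: int) -> int:
    """Copy the last n lines of src_path to dst_path; return lines written.

    The source file is only read, never modified.
    """
    with open(src_path, "rb") as src:
        # A bounded deque keeps only the final n lines while streaming the file.
        tail = deque(src, maxlen=n)
    with open(dst_path, "wb") as dst:
        dst.writelines(tail)
    return len(tail)
```

Repeating the CGI run against copies of different sizes would narrow down how many entries it can handle before crashing.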
list Frédéric Mangeant
Henrik Stoerner wrote:
> On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
>> - some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should.
>> Even if I add a process to check, it doesn't appear in Hobbit.
>> Could it be a compatibility problem (my tests with a Quest BB client 3.01 running on XP SP2 ran fine) ?
> Hard to tell ... my best suggestion is to capture a network trace of the traffic between one of these clients and the Hobbit server. If a Linux box, you can use
>
> tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP
>
> to grab only the traffic between Hobbit and this client.
Well, I have finally found the reason...

The Quest BB Client 3.01 provides a useful "Start Monitoring After" feature. When it is set to anything other than 0, changes made with the bbntcfg.exe control panel are not taken into account.

If I don't use the "Start Monitoring After" feature, everything works fine.

Let's open a trouble ticket at Quest Software...
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis
list Frédéric Mangeant
Henrik Stoerner wrote:
> On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
>> - I had a coredump with bbgen :
>> #3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
>> #4 <signal handler called>
>> #5 main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
> Weird. Does it happen every time you run bbgen ? I hope not.
It occurred twice today :
$ file ~/server/tmp/core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV),
SVR4-style, SVR4-style, from 'bbgen'
$ gdb ~/server/bin/bbgen core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".
Core was generated by `bbgen --recentgifs --subpagecolumns=2 --report
--nopropred=AHD --subpagecolumns'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libpcre.so.0...done.
Loaded symbols for /usr/lib/libpcre.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x40075941 in kill () from /lib/libc.so.6
(gdb) bt
#0 0x40075941 in kill () from /lib/libc.so.6
#1 0x400756e5 in raise () from /lib/libc.so.6
#2 0x40076a86 in abort () from /lib/libc.so.6
#3 0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4 <signal handler called>
#5 main (argc=11, argv=0xbfffe3b4) at bbgen.c:586
(gdb)
I'm running bbgen twice :
- once every 10 seconds for my "main" Hobbit map
- once every minute for my "customers" Hobbit map.
Can it be a problem ?
--
Frédéric Mangeant
Steria EDC Sophia-Antipolis