Xymon Mailing List Archive

Problems since upgrading from bbgen to Hobbit

8 messages in this thread

Frédéric Mangeant · Tue, 29 Nov 2005 12:49:44 +0100
Hi Henrik

I upgraded this morning from bbgen 3.6 to Hobbit 4.1.2p1 (with ~ 1000 hosts), and have a few problems :


- I can't display the history of one of my tests :

http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos returns an internal server error

[Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of script headers: bb-hist.sh, referer: http://10.50.80.44/hobbit-cgi/bb-hostsvc.sh?HOSTSVC=cronos.AHD&IP=10.50.80.44&DISPLAYNAME=cronos


- I had a coredump with bbgen :

$ file core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, SVR4-style, from 'bbgen'

$ gdb /BB/hobbit/server/bin/bbgen core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `bbgen --recentgifs --subpagecolumns=2 --nopropred=AHD --subpagecolumns=2 --page'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libpcre.so.0...done.
Loaded symbols for /usr/lib/libpcre.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x40075941 in kill () from /lib/libc.so.6
(gdb) bt
#0  0x40075941 in kill () from /lib/libc.so.6
#1  0x400756e5 in raise () from /lib/libc.so.6
#2  0x40076a86 in abort () from /lib/libc.so.6
#3  0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4  <signal handler called>
#5  main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
(gdb) quit


- some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should
Even if I add a process to check, it doesn't appear in Hobbit.
Could it be a compatibility problem? (My tests with a Quest BB client 3.01 running on XP SP2 ran fine.)


- one of my custom network tests keeps getting "Unexpected service response"
Its definition is this :
[ica]
   expect "ICA"
   port 1494


- I get a lot of errors in page.log :
2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 1-6:0000:0700

Do I have to use "TIME=123456:0000:0700" ?


Many thanks in advance for your help.

Regards,

-- 

Frédéric Mangeant

Steria EDC Sophia-Antipolis
Henrik Størner · Tue, 29 Nov 2005 13:12:21 +0100
quoted from Frédéric Mangeant
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
- I can't display the history of one of my tests :

http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos 
returns an internal server error

[Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of 
script headers: bb-hist.sh
Most likely there is some sort of malformed entry in that history file
which the Hobbit histlog CGI cannot handle. If you could send me that
file  - should be ~hobbit/data/hist/cronos.AHD - I can look into it.
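
A quick way to look for a malformed entry before sending the file is to scan it line by line. The sketch below is only an illustration: the line layout (ctime-style date, color, start timestamp, optional duration) is an assumption based on what history entries appear to look like, and the names `ENTRY` and `find_malformed` are hypothetical, not part of Hobbit.

```python
import re

# Assumed entry layout (not confirmed by the Hobbit sources):
#   "Sat Jul 16 15:22:19 2005 green   1121494034 880"
# i.e. ctime-style date, color name, start epoch, optional duration in seconds.
ENTRY = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4} +"
    r"[a-z]+ +\d+( +\d+)?\s*$"
)


def find_malformed(lines):
    """Return (line_number, text) pairs for lines that do not match ENTRY."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not ENTRY.match(text):
            bad.append((number, text))
    return bad
```

Calling `find_malformed(open("cronos.AHD", errors="replace"))` would list the suspicious lines with their line numbers. A clean result does not prove the file is fine, only that every line fits the assumed layout.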
quoted from Frédéric Mangeant

- I had a coredump with bbgen :
#3  0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4  <signal handler called>
#5  main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
Weird. Does it happen every time you run bbgen ? I hope not.
quoted from Frédéric Mangeant

- some of my devices (running the Quest BB client 3.01) do not update 
their statuses as frequently as they should
Even if I add a process to check, it doesn't appear in Hobbit.
Could it be a compatibility problem (my tests with a Quest BB client 
3.01 running on XP SP2 ran fine).
Hard to tell ... my best suggestion is to capture a network trace of the
traffic between one of these clients and the Hobbit server. If a Linux
box, you can use 
   tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP
to grab only the traffic between Hobbit and this client.
quoted from Frédéric Mangeant

- one of my custom network tests keeps getting "Unexpected service response"
Its definition is this :
[ica]
  expect "ICA"
  port 1494
Most likely, it isn't getting any response back. What does
"telnet HOSTNAME 1494" give you ?
quoted from Frédéric Mangeant
- I get a lot of errors in page.log :
2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 
1-6:0000:0700

Do I have to use "TIME=123456:0000:0700" ?
I think so, yes.
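
For anyone with many such entries to convert, the rewrite from a day range to an explicit list of day digits can be scripted. This is a minimal sketch, assuming single-digit day numbers and the `days:start:end` layout seen in the log message; `expand_weekdays` is a hypothetical helper name, not a Hobbit tool.

```python
def expand_weekdays(spec):
    """Spell out a day range in a 'days:HHMM:HHMM' timespec.

    '1-6:0000:0700' becomes '123456:0000:0700'. Specs whose day field
    has no range are returned unchanged.
    """
    days, sep, rest = spec.partition(":")
    if not sep or "-" not in days:
        return spec
    first, last = days.split("-", 1)
    # Assumption: days are single digits and the range is ascending.
    valid = (
        len(first) == 1 and len(last) == 1
        and first.isdigit() and last.isdigit()
        and int(first) <= int(last)
    )
    if not valid:
        raise ValueError(f"cannot expand day range in {spec!r}")
    expanded = "".join(str(day) for day in range(int(first), int(last) + 1))
    return f"{expanded}:{rest}"
```

With this, `expand_weekdays("1-6:0000:0700")` gives `"123456:0000:0700"`, the form that was accepted in this thread.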


Regards,
Henrik
Frédéric Mangeant · Tue, 29 Nov 2005 14:28:44 +0100
quoted from Henrik Størner
Henrik Stoerner wrote:
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
  
- I can't display the history of one of my tests :

http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos 
returns an internal server error

[Tue Nov 29 12:08:08 2005] [error] [client 10.50.8.55] Premature end of 
script headers: bb-hist.sh
    
Most likely there is some sort of malformed entry in that history file
which the Hobbit histlog CGI cannot handle. If you could send me that
file  - should be ~hobbit/data/hist/cronos.AHD - I can look into it.
  
First of all, let me thank you for your help.

I've sent the cronos.AHD file to you by email.
quoted from Henrik Størner
- I had a coredump with bbgen :
#3  0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4  <signal handler called>
#5  main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
    
Weird. Does it happen every time you run bbgen ? I hope not.
  
Only once. I had some "errors" in bb-hosts (duplicate subpage names, 
which seemed to confuse bb-findhost.cgi).
I haven't had any new coredumps since then.
quoted from Henrik Størner

- some of my devices (running the Quest BB client 3.01) do not update 
their statuses as frequently as they should
Even if I add a process to check, it doesn't appear in Hobbit.
Could it be a compatibility problem (my tests with a Quest BB client 
3.01 running on XP SP2 ran fine).
    
Hard to tell ... my best suggestion is to capture a network trace of the
traffic between one of these clients and the Hobbit server. If a Linux
box, you can use 
   tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP
to grab only the traffic between Hobbit and this client.
  
I'm looking into it, with tcpdump and tcpflow.
quoted from Henrik Størner

- one of my custom network tests keeps getting "Unexpected service response"
Its definition is this :
[ica]
  expect "ICA"
  port 1494
    
Most likely, it isn't getting any response back. What does
"telnet HOSTNAME 1494" give you ?
  
It works as expected :

$ telnet xx.xx.xx.xx 1494
Trying xx.xx.xx.xx...
Connected to xx.xx.xx.xx
Escape character is '^]'.
ICA
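
Note that telnet does not display non-printable bytes, so a banner that shows up as ICA on screen may still contain bytes that are not visible. A small sketch like the one below prints the raw bytes instead. Whether Hobbit compares the expect string against the very start of the response is an assumption here, and `read_banner` and `banner_starts_with` are hypothetical helper names.

```python
import socket


def read_banner(host, port, timeout=5.0, max_bytes=64):
    """Connect and return the first bytes the service sends, unmodified."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        try:
            return conn.recv(max_bytes)
        except socket.timeout:
            # The service accepted the connection but sent nothing in time.
            return b""


def banner_starts_with(banner, expected):
    """True only if the raw banner begins with the expected bytes."""
    return banner.startswith(expected)
```

Running `print(repr(read_banner("xx.xx.xx.xx", 1494)))` against the server shows every byte, including any that precede the visible text.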
quoted from Henrik Størner

- I get a lot of errors in page.log :
2005-11-29 12:38:03 Bad timespec (missing colon or wrong weekdays): 
1-6:0000:0700

Do I have to use "TIME=123456:0000:0700" ?
    
I think so, yes.
  
Thanks, it's working now.


I have some more questions :

- when stopping Hobbit, is it normal to get errors like these in 
history.log :

2005-11-29 10:51:28 Tried to down BOARDBUSY: Invalid argument
2005-11-29 10:51:28 Worker process died with exit code 0, terminating
2005-11-29 10:51:28 Could not get shm of size 262144: No such file or 
directory
2005-11-29 10:51:28 Channel not available

- I'm running a few scripts with a long heartbeat (eg. "status+1440"), 
are these errors in history.log normal ?

2005-11-29 11:00:04 Will not update 
/BB/hobbit/data/hist/HQSTERIACRAUO.NetBackup - color unchanged (green)
2005-11-29 12:00:14 Will not update /BB/hobbit/data/hist/cronos.ebuilds 
- color unchanged (green)
2005-11-29 12:00:19 Will not update /BB/hobbit/data/hist/hades.ebuilds - 
color unchanged (green)
2005-11-29 12:01:08 Will not update 
/BB/hobbit/data/hist/ve1stnet3.photos - color unchanged (green)
2005-11-29 13:33:33 Will not update /BB/hobbit/data/hist/prod.file - 
color unchanged (green)

- what do these errors in page.log mean ?

2005-11-29 14:06:31 hobbitd_alert: Got message 4723, expected 4722
2005-11-29 14:00:15 Worker process died with exit code 0, terminating
2005-11-29 13:42:36 Stale alert for antivirus-1:http dropped

- and these in hobbitlaunch.log ? Should I run hobbitd_alert in debug mode ?

2005-11-29 12:48:32 Task bbpage terminated, status 1
2005-11-29 13:21:41 Task bbpage terminated, status 1
2005-11-29 13:32:07 Task bbpage terminated, status 1
2005-11-29 13:32:32 Task bbpage terminated, status 1
2005-11-29 13:37:44 Task bbpage terminated, status 1
2005-11-29 13:59:52 Task bbpage terminated, status 1
2005-11-29 14:00:15 Task bbpage terminated, status 1


Thanks again !

-- 

Frédéric Mangeant

Steria EDC Sophia-Antipolis
Figaro Nicolas · Tue, 29 Nov 2005 14:40:18 +0100
-----Original Message-----
From: Frédéric Mangeant [mailto:user-b6ea1d850181@xymon.invalid]
Sent: Tuesday, 29 November 2005 14:29
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Problems since upgrading from bbgen to Hobbit
- I can't display the history of one of my tests :

http://10.50.80.44/hobbit-cgi/bb-hist.sh?HISTFILE=cronos.AHD&ENTRIES=
50&IP=10.50.80.44&DISPLAYNAME=cronos
returns an internal server error
A good way of solving this is to run the script bb-hist.sh with a -x option directly from the command line. 
There is only a call to the cgi. 
So you can then try a truss like this one :
export QUERY_STRING="HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos"
. /usr2/hobbit/server/etc/hobbitcgi.cfg
script /tmp/truss.txt truss  /usr2/hobbit/server/bin/bb-hist.cgi $CGI_HIST_OPTS
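
The same idea, running the CGI by hand with QUERY_STRING set, can also be scripted when the test needs repeating. This is a minimal sketch and not a Hobbit utility: `run_cgi` is a hypothetical helper, and unlike the shell commands above it does not source hobbitcgi.cfg, so any options from $CGI_HIST_OPTS would have to be added to the command list by hand.

```python
import os
import subprocess
from urllib.parse import urlencode


def run_cgi(command, params):
    """Run `command` with QUERY_STRING built from `params`.

    Returns (returncode, stdout). On POSIX systems a negative return
    code -N means the process was killed by signal N, so a crash with
    SIGSEGV shows up as -signal.SIGSEGV.
    """
    env = dict(os.environ, QUERY_STRING=urlencode(params))
    result = subprocess.run(command, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout
```

For example, `run_cgi(["/usr2/hobbit/server/bin/bb-hist.cgi"], {"HISTFILE": "cronos.AHD", "ENTRIES": "50", "IP": "10.50.80.44", "DISPLAYNAME": "cronos"})` reproduces the request from the URL, and the return code tells whether the CGI exited normally or died on a signal.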

Hope this helps, and good luck.
Nicolas Figaro
Frédéric Mangeant · Tue, 29 Nov 2005 14:58:58 +0100
quoted from Figaro Nicolas
FIGARO Nicolas wrote:
A good way of solving this is to run the script bb-hist.sh with a -x option directly from the command line. 

There is only a call to the cgi. 

So you can then try a truss like this one : 
export QUERY_STRING="HISTFILE=cronos.AHD&ENTRIES=50&IP=10.50.80.44&DISPLAYNAME=cronos"
. /usr2/hobbit/server/etc/hobbitcgi.cfg
script /tmp/truss.txt 
truss  /usr2/hobbit/server/bin/bb-hist.cgi $CGI_HIST_OPTS

Hope this helps, and good luck.
Thanks for the info, Nicolas.

Running bb-hist.cgi returns 1000+ lines, and ends with :

read(3, " 2005 green   1121362283 877\nThu"..., 4096) = 4096
read(3, "0:59 2005 red   1121389259 12\nFr"..., 4096) = 4096
read(3, "2005 clear   1121415369 12\nFri J"..., 4096) = 4096
brk(0x84c1000)                          = 0x84c1000
read(3, "5 green   1121440956 879\nFri Jul"..., 4096) = 4096
read(3, "2005 red   1121467928 10\nSat Jul"..., 4096) = 4096
read(3, "08:07:14 2005 clear   1121494034"..., 4096) = 4096
read(3, "Sat Jul 16 15:22:19 2005 green  "..., 4096) = 4096
read(3, "1546235 880\nSat Jul 16 22:51:55 "..., 4096) = 4096
read(3, " red   1121573209 10\nSun Jul 17 "..., 4096) = 4096
read(3, "1:56 2005 clear   1121599316 10\n"..., 4096) = 4096
read(3, "Jul 17 20:37:04 2005 green   112"..., 4096) = 4096
read(3, "520 879\nMon Jul 18 04:06:39 2005"..., 4096) = 4096
read(3, "   1121678494 10\nMon Jul 18 11:2"..., 4096) = 4096
read(3, "592\nMon Jul 18 18:36:50 2005 gre"..., 4096) = 4096
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++


This external script's color changes very frequently (I use this to open 
"fake" tickets in Computer Associates USPSD).
The ~data/hist/cronos.AHD file is more than 4 MB, maybe I should change 
one of the "MAXMSG_*" settings in hobbitserver.cfg ?

Regards,

-- 

Frédéric Mangeant

Steria EDC Sophia-Antipolis
Figaro Nicolas · Tue, 29 Nov 2005 15:11:44 +0100
-----Original Message-----
From: Frédéric Mangeant [mailto:user-b6ea1d850181@xymon.invalid]
Sent: Tuesday, 29 November 2005 14:59
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] RE : [hobbit] Problems since upgrading from bbgen to Hobbit


Thanks for the info, Nicolas.
My pleasure.
quoted from Frédéric Mangeant
Running bb-hist.cgi returns 1000+ lines, and ends with :

read(3, " 2005 green   1121362283 877\nThu"..., 4096) = 4096
read(3, "0:59 2005 red   1121389259 12\nFr"..., 4096) = 4096
read(3, "2005 clear   1121415369 12\nFri J"..., 4096) = 4096
brk(0x84c1000)                          = 0x84c1000
read(3, "5 green   1121440956 879\nFri Jul"..., 4096) = 4096
read(3, "2005 red   1121467928 10\nSat Jul"..., 4096) = 4096
read(3, "08:07:14 2005 clear   1121494034"..., 4096) = 4096
read(3, "Sat Jul 16 15:22:19 2005 green  "..., 4096) = 4096
read(3, "1546235 880\nSat Jul 16 22:51:55 "..., 4096) = 4096
read(3, " red   1121573209 10\nSun Jul 17 "..., 4096) = 4096
read(3, "1:56 2005 clear   1121599316 10\n"..., 4096) = 4096
read(3, "Jul 17 20:37:04 2005 green   112"..., 4096) = 4096
read(3, "520 879\nMon Jul 18 04:06:39 2005"..., 4096) = 4096
read(3, "   1121678494 10\nMon Jul 18 11:2"..., 4096) = 4096
read(3, "592\nMon Jul 18 18:36:50 2005 gre"..., 4096) = 4096
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++


This external script's color changes very frequently (I use this to open "fake" tickets in Computer Associates USPSD).
The ~data/hist/cronos.AHD file is more than 4 Mb, maybe I should change one of the "MAXMSG_*" settings in hobbitserver.cfg ?
Or try running head -100 on the file to keep only its first 100 color changes (tail -100 would keep the last 100 instead) and see if the SIGSEGV still occurs. 
That way you should be able to determine how many lines the cgi can handle. 
NF 
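
The trimming experiment can be scripted so that different line counts are easy to try. This is a rough sketch with hypothetical helper names (`keep_lines`, `write_trimmed_copy`); it writes to a separate copy so the original history file is left untouched.

```python
from collections import deque
from itertools import islice


def keep_lines(lines, count, from_end=False):
    """Return the first `count` lines, or the last `count` if from_end."""
    if from_end:
        # A bounded deque keeps only the final `count` items it is fed.
        return list(deque(lines, maxlen=count))
    return list(islice(lines, count))


def write_trimmed_copy(source, target, count, from_end=False):
    """Copy `count` lines of `source` into `target`; return lines written."""
    # Binary mode leaves the bytes of each line exactly as they are.
    with open(source, "rb") as src:
        kept = keep_lines(src, count, from_end)
    with open(target, "wb") as dst:
        dst.writelines(kept)
    return len(kept)
```

Pointing the CGI at copies of increasing size (100, 1000, 10000 lines and so on) narrows down where the crash begins. How the CGI is told to read the copy instead of the real file depends on the installation and is not covered here.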
Regards,

-- 

Frédéric Mangeant

Steria EDC Sophia-Antipolis
Frédéric Mangeant · Tue, 29 Nov 2005 15:39:30 +0100
quoted from Frédéric Mangeant
Henrik Stoerner wrote:
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
  
- some of my devices (running the Quest BB client 3.01) do not update their statuses as frequently as they should
Even if I add a process to check, it doesn't appear in Hobbit.
Could it be a compatibility problem (my tests with a Quest BB client 3.01 running on XP SP2 ran fine).
    
Hard to tell ... my best suggestion is to capture a network trace of the
traffic between one of these clients and the Hobbit server. If a Linux
box, you can use
   tcpdump -s 1500 -w capturefile tcp port 1984 and host CLIENTHOSTIP
to grab only the traffic between Hobbit and this client.
Well, I have finally found the reason...

The Quest BB Client 3.01 provides a useful "Start Monitoring After" feature. When it is set to anything other than 0, changes made with the bbntcfg.exe control panel are not taken into account.
If I don't use the "Start Monitoring After" feature, everything is working fine.

Let's open a trouble ticket at Quest Software...

-- 

Frédéric Mangeant

Steria EDC Sophia-Antipolis
Frédéric Mangeant · Tue, 29 Nov 2005 17:00:22 +0100
quoted from Frédéric Mangeant
Henrik Stoerner wrote:
On Tue, Nov 29, 2005 at 12:49:44PM +0100, Frédéric Mangeant wrote:
  
- I had a coredump with bbgen :
#3  0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4  <signal handler called>
#5  main (argc=11, argv=0xbfffd7a4) at bbgen.c:586
    
Weird. Does it happen every time you run bbgen ? I hope not.
  
It occurred twice today :

$ file ~/server/tmp/core
core: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), 
SVR4-style, SVR4-style, from 'bbgen'

$ gdb ~/server/bin/bbgen core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db 
library "/lib/libthread_db.so.1".

Core was generated by `bbgen --recentgifs --subpagecolumns=2 --report 
--nopropred=AHD --subpagecolumns'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libpcre.so.0...done.
Loaded symbols for /usr/lib/libpcre.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0  0x40075941 in kill () from /lib/libc.so.6
(gdb) bt
#0  0x40075941 in kill () from /lib/libc.so.6
#1  0x400756e5 in raise () from /lib/libc.so.6
#2  0x40076a86 in abort () from /lib/libc.so.6
#3  0x080639d1 in sigsegv_handler (signum=0) at sig.c:57
#4  <signal handler called>
#5  main (argc=11, argv=0xbfffe3b4) at bbgen.c:586
(gdb)


I'm running bbgen twice :
- once every 10 seconds for my "main" Hobbit map
- once every minute for my "customers" Hobbit map.

Can it be a problem ?


-- 

Frédéric Mangeant

Steria EDC Sophia-Antipolis