A few hobbit problems
list Eric E *hs Schwimmer
Hi all,
We're in the process of migrating from BB to hobbit, and, for the most part, the process has been quick and painless (hobbit seems to be handling our 800+ hosts much better than BB did). We're seeing a few oddities, though. My apologies if someone else has already pointed these out; I've only been on the list since earlier today and don't know how to search the archives.
1. We see the occasional yellow message from bbtest:
Error output: dns_queue_run deadlock - loops=260
During this problem, ~70 seconds get added to our "Test setup" phase as reported by bb-test. This only happens once or twice an hour so far, so it's not a showstopper. We are using the --dns=ip option when calling the bbtest-net binary from hobbitlaunch.
2. The bb-eventlog.sh script dumps core. I've run it successfully in the past, but at some point, as we added more hosts to our bb-hosts file, it began to fail. Calling it from the command line:
% setenv QUERY_STRING 'MAXTIME=140&MAXCOUNT=&Send=View+log'
% /usr/local/hobbit/bb-eventlog.sh
Segmentation fault (core dumped)
TIA,
-Eric Schwimmer
Network Engineer
University of Virginia HSCS
list Henrik Størner
Hi Eric,
On Sun, Apr 03, 2005 at 04:53:28PM -0400, Schwimmer, Eric E *HS wrote:
> We're in the process of migrating from BB to hobbit, and, for the most part, the process has been quick and painless (hobbit seems to be handling our 800+ hosts much better than BB did).
Glad to hear that.
> 1. We see the occasional yellow message from bbtest:
> Error output: dns_queue_run deadlock - loops=260
> During this problem, ~70 seconds get added to our "Test setup" phase as reported by bb-test. This only happens once or twice an hour so far, so it's not a showstopper. We are using the --dns=ip option when calling the bbtest-net binary from hobbitlaunch.
Since you're seeing this, some DNS lookups must be happening. Most likely, they are from http tests, or hosts that have a "0.0.0.0" IP.
I don't have a solution for this problem - it would mean digging into the C-ARES library which handles the DNS lookups, and I haven't done that yet. If it becomes more urgent, I'll see what I can do about it.
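To narrow down which bb-hosts entries might still trigger a lookup despite --dns=ip, a small checker along these lines could help. This is only a rough sketch: the function names are hypothetical, it assumes the usual "IP hostname # tags" line layout, and it assumes that a URL written as hostname=IP supplies the address itself and so needs no lookup. It is not code from Hobbit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the n characters at s look like a dotted-quad IPv4 address. */
static int is_ipv4_literal(const char *s, size_t n)
{
    int dots = 0, digits = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (isdigit((unsigned char)s[i])) {
            if (++digits > 3) return 0;
        }
        else if (s[i] == '.') {
            if (digits == 0) return 0;
            dots++;
            digits = 0;
        }
        else return 0;
    }
    return (dots == 3 && digits > 0);
}

/*
 * Hypothetical helper: returns 1 if a bb-hosts line looks like it could
 * cause a DNS lookup, i.e. its IP is 0.0.0.0 or it carries a URL whose
 * host part is a name rather than an IP address.
 */
int may_need_dns(const char *line)
{
    char ip[64];
    const char *url;

    assert(line != NULL);
    if (sscanf(line, "%63s", ip) != 1) return 0;   /* blank line */
    if (ip[0] == '#') return 0;                    /* comment line */
    if (strcmp(ip, "0.0.0.0") == 0) return 1;      /* address must be resolved */

    url = line;
    while ((url = strstr(url, "://")) != NULL) {
        const char *host = url + 3;
        size_t n = strcspn(host, "/:= \t\r\n");

        /* Assumption: "name=IP" in the URL already provides the address. */
        if (host[n] != '=' && !is_ipv4_literal(host, n)) return 1;
        url = host;
    }
    return 0;
}
```

Feeding each line of the bb-hosts file through may_need_dns() and printing the ones that return 1 would give a short list of candidates to look at first.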
> 2. The bb-eventlog.sh script dumps core. I've run it successfully in the past, but at some point, as we added more hosts to our bb-hosts file, it began to fail. Calling it from the command line:
> % setenv QUERY_STRING 'MAXTIME=140&MAXCOUNT=&Send=View+log'
> % /usr/local/hobbit/bb-eventlog.sh
> Segmentation fault (core dumped)
Could you try getting the call trace from the core file? Assuming
the core file is in the current directory, you should do this:
$ gdb /usr/local/hobbit/server/bin/bb-eventlog.cgi core
[messages from gdb]
gdb> bt
The output from the "bt" command would be very helpful in narrowing
down the problem.
Thanks,
Henrik
list Eric E *hs Schwimmer
> Could you try getting the call trace from the core file? Assuming the core file is in the current directory, you should do this:
> $ gdb /usr/local/hobbit/server/bin/bb-eventlog.cgi core
> [messages from gdb]
> gdb> bt
> The output from the "bt" command would be very helpful in narrowing down the problem.
Below is the output from gdb. I apologize for the formatting; I'm using a rather awkward web client. Interestingly, I found that it runs fine as user 'hobbit', but users root and apache get a segfault.
Core was generated by `/usr/local/hobbit/server/bin/bb-eventlog.cgi'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x08049502 in do_eventlog (output=0x9145c0, maxcount=100, maxminutes=140, allowallhosts=1)
at eventlog.c:170
170 fprintf(output, "<TD ALIGN=CENTER BGCOLOR=%s><FONT COLOR=black>%s</FONT></TD>\n",
(gdb) bt
#0 0x08049502 in do_eventlog (output=0x9145c0, maxcount=100, maxminutes=140, allowallhosts=1)
at eventlog.c:170
#1 0x08049c5b in main (argc=160256136, argv=0x98d50a3) at eventlog.c:338
list Eric E *hs Schwimmer
I might have found another, unrelated, problem: When I include any one of the following three lines in my bb-hosts file:
137.54.102.2 healthsystem.virginia.edu # http://healthsystem.virginia.edu/
137.54.102.2 healthsystem.virginia.edu # http://healthsystem.virginia.edu=137.54.102.2/
137.54.102.2 healthsystem.virginia.edu # http://137.54.102.2/
The bbgen process seems to hang. The 'healthsystem.virginia.edu' page fails to appear in the appropriate menu. None of the menu pages update, although individual test pages (such as a conn test for a switch) update appropriately.
Furthermore, the bbgen test for my hobbit server sends this message:
- Program crashed Fatal signal caught!
(though you can only see it by viewing the bbgen test; the parent page doesn't update).
-Eric
list Henrik Størner
On Sun, Apr 03, 2005 at 06:20:53PM -0400, Schwimmer, Eric E *HS wrote:
> > Could you try getting the call trace from the core file? Assuming the core file is in the current directory, you should do this:
> > $ gdb /usr/local/hobbit/server/bin/bb-eventlog.cgi core
> > [messages from gdb]
> > gdb> bt
> > The output from the "bt" command would be very helpful in narrowing down the problem.
> Below is the output from gdb
Thanks, that pin-pointed the problem nicely. Your eventlog has an
entry from a host that is not in the bb-hosts file; these are ignored
by the normal eventlog shown on the bb2 page, but the CGI script tried
to include them with fatal consequences.
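The failure mode described above comes down to using a host record that was never found. A minimal sketch of the guard, assuming a simplified host record and lookup (only the nobb2 field name appears in the patch; the struct and both function names here are hypothetical and are not Hobbit's real definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified host record; only "nobb2" is taken from the patch. */
typedef struct {
    const char *hostname;
    int nobb2;              /* non-zero: host is hidden from the bb2 page */
} host_rec_t;

/* Returns the matching record, or NULL when the host is not in the table. */
const host_rec_t *find_host(const host_rec_t *hosts, size_t count, const char *hostname)
{
    size_t i;

    assert(hostname != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(hosts[i].hostname, hostname) == 0) return &hosts[i];
    }
    return NULL;
}

/*
 * An event is only displayable when its host is known AND not hidden.
 * Checking the pointer first means an event logged by a host that has
 * since been removed is skipped instead of being dereferenced.
 */
int event_is_displayable(const host_rec_t *eventhost)
{
    return (eventhost != NULL) && !eventhost->nobb2;
}
```

With a table holding only "www" and a hidden host, an event from any other hostname yields a NULL record and is simply filtered out, which is the behaviour the patch restores for the CGI by dropping the allowallhosts bypass.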
I've attached a patch to fix this. To apply, save the patch to
/tmp/eventlog-crash.patch, then
cd hobbit-4.0.1
patch -p0 </tmp/eventlog-crash.patch
make
make install # as root
Regards,
Henrik
-------------- next part --------------
--- bbdisplay/pagegen.c 2005/03/22 09:03:37 1.139
+++ bbdisplay/pagegen.c 2005/04/04 05:43:53
@@ -1023,7 +1023,7 @@
while (p) {
/* Dont redo the eventlog or acklog things */
if (strcmp(p, "eventlog.sh") == 0) {
- if (bb2eventlog && !havedoneeventlog) do_eventlog(output, bb2eventlogmaxcount, bb2eventlogmaxtime, 0);
+ if (bb2eventlog && !havedoneeventlog) do_eventlog(output, bb2eventlogmaxcount, bb2eventlogmaxtime);
}
else if (strcmp(p, "acklog.sh") == 0) {
if (bb2acklog && !havedoneacklog) do_acklog(output, 25, 240);
@@ -1202,7 +1202,7 @@
do_bb2ext(output, "BBMKBB2EXT", "mkbb");
/* Dont redo the eventlog or acklog things */
- if (bb2eventlog && !havedoneeventlog) do_eventlog(output, 0, 240, 0);
+ if (bb2eventlog && !havedoneeventlog) do_eventlog(output, 0, 240);
if (bb2acklog && !havedoneacklog) do_acklog(output, 25, 240);
}
--- bbdisplay/eventlog.c 2005/03/22 09:03:37 1.17
+++ bbdisplay/eventlog.c 2005/04/04 05:42:57
@@ -48,7 +48,7 @@
return result;
}
-void do_eventlog(FILE *output, int maxcount, int maxminutes, int allowallhosts)
+void do_eventlog(FILE *output, int maxcount, int maxminutes)
{
FILE *eventlog;
char eventlogfilename[PATH_MAX];
@@ -117,7 +117,7 @@
if ( (itemsfound == 8) &&
(eventtime > cutoff) &&
- (allowallhosts || (eventhost && !eventhost->nobb2)) &&
+ (eventhost && !eventhost->nobb2) &&
(wanted_eventcolumn(svcname)) ) {
newevent = (event_t *) malloc(sizeof(event_t));
@@ -335,7 +335,7 @@
headfoot(stdout, "event", "", "header", COL_GREEN);
fprintf(stdout, "<center>\n");
- do_eventlog(stdout, maxcount, maxminutes, 1);
+ do_eventlog(stdout, maxcount, maxminutes);
fprintf(stdout, "</center>\n");
headfoot(stdout, "event", "", "footer", COL_GREEN);
--- bbdisplay/eventlog.h 2005/03/22 09:03:37 1.3
+++ bbdisplay/eventlog.h 2005/04/04 05:43:14
@@ -14,6 +14,6 @@
extern char *eventignorecolumns;
extern int havedoneeventlog;
-extern void do_eventlog(FILE *output, int maxcount, int maxminutes, int allowallhosts);
+extern void do_eventlog(FILE *output, int maxcount, int maxminutes);
#endif
list Henrik Størner
On Sun, Apr 03, 2005 at 06:32:22PM -0400, Schwimmer, Eric E *HS wrote:
> When I include any one of the following three lines in my bb-hosts file:
> 137.54.102.2 healthsystem.virginia.edu # http://healthsystem.virginia.edu/
> 137.54.102.2 healthsystem.virginia.edu # http://healthsystem.virginia.edu=137.54.102.2/
> 137.54.102.2 healthsystem.virginia.edu # http://137.54.102.2/
> The bbgen process seems to hang. The 'healthsystem.virginia.edu' page fails to appear in the appropriate menu. None of the menu pages update, although individual test pages (such as a conn test for a switch) update appropriately.
> Furthermore, the bbgen test for my hobbit server sends this message:
> - Program crashed Fatal signal caught!
This is a sure sign of the "bbgen" task crashing while generating the
new webpages. You should find a core file from it in the
~hobbit/server/tmp/ directory (or occasionally in ~hobbit/data/logs/),
so do the same thing that you did with the eventlog problem:
cd ~hobbit/server
gdb bin/bbgen tmp/core
gdb> bt
and send me the output.
Thanks,
Henrik
list Eric E *hs Schwimmer
Here's the gdb output:
Core was generated by `bbgen --hobbitd --recentgifs --subpagecolumns=2 --report'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
#0 0x007d57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0 0x007d57a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00815955 in raise () from /lib/tls/libc.so.6
#2 0x00817319 in abort () from /lib/tls/libc.so.6
#3 0x0805df0e in sigsegv_handler (signum=11) at sig.c:57
#4 <signal handler called>
#5 0x00856490 in strcpy () from /lib/tls/libc.so.6
#6 0x0804d1ed in load_bbhosts (pgset=0x80647e3 "") at loadbbhosts.c:603
#7 0x08049bdb in main (argc=5, argv=0xbff67c44) at bbgen.c:550
-Eric
list Eric E *hs Schwimmer
Works like a champ! Thanks!
-Eric
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Mon 4/4/2005 1:49 AM
To: user-ae9b8668bcde@xymon.invalid
Cc:
Subject: Re: [hobbit] A few hobbit problems
On Sun, Apr 03, 2005 at 06:20:53PM -0400, Schwimmer, Eric E *HS wrote:
> > Could you try getting the call trace from the core file? Assuming the core file is in the current directory, you should do this:
> > $ gdb /usr/local/hobbit/server/bin/bb-eventlog.cgi core
> > [messages from gdb]
> > gdb> bt
> > The output from the "bt" command would be very helpful in narrowing down the problem.
> Below is the output from gdb
Thanks, that pin-pointed the problem nicely. Your eventlog has an entry from a host that is not in the bb-hosts file; these are ignored by the normal eventlog shown on the bb2 page, but the CGI script tried to include them with fatal consequences.
I've attached a patch to fix this. To apply, save the patch to /tmp/eventlog-crash.patch, then
cd hobbit-4.0.1
patch -p0 </tmp/eventlog-crash.patch
make
make install # as root
Regards,
Henrik
list Henrik Størner
On Sun, Apr 03, 2005 at 06:32:22PM -0400, Schwimmer, Eric E *HS wrote:
> I might have found another, unrelated, problem: When I include any one of the following three lines in my bb-hosts file:
> 137.54.102.2 healthsystem.virginia.edu # http://healthsystem.virginia.edu/
> 137.54.102.2 healthsystem.virginia.edu # http://healthsystem.virginia.edu=137.54.102.2/
> 137.54.102.2 healthsystem.virginia.edu # http://137.54.102.2/
> The bbgen process seems to hang.
I investigated this together with Eric, and found out that the culprit was setting FQDN=FALSE in hobbitserver.cfg - this was not handled correctly by bbgen after it was adapted for Hobbit. Since it hasn't shown up in the beta-tests, I guess most of you use the default setup where FQDN=TRUE :-)
I'll probably release a 4.0.2 version in a few days with the collection of patches that have been done after the 4.0 release.
Regards,
Henrik
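The thread does not include the actual fix, so the following is only an illustration of the general idea, under the assumption that FQDN=FALSE means hostnames are reduced to the part before the first dot. The function name is hypothetical. It copies with an explicit size check rather than an unchecked strcpy(), which is where the backtrace showed the crash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: writes the short form of a hostname (everything
 * before the first '.') into out, which holds outsz bytes.
 * Returns 0 on success, -1 if the short name would not fit.
 */
int short_hostname(const char *fqdn, char *out, size_t outsz)
{
    size_t n;

    assert(fqdn != NULL && out != NULL);

    n = strcspn(fqdn, ".");                /* length up to first dot, or whole string */
    if (outsz == 0 || n >= outsz) return -1;

    memcpy(out, fqdn, n);
    out[n] = '\0';
    return 0;
}
```

Called with "healthsystem.virginia.edu" and a 64-byte buffer, this leaves "healthsystem" in the buffer; a name with no dot is copied unchanged, and a buffer that is too small is reported instead of being overrun.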