Xymon Mailing List Archive

Core dump

25 messages in this thread

list Kevin Hanrahan · Mon, 7 Feb 2005 19:19:38 -0500 ·
Henrik,
 I added the following lines in bb-hosts:

group-compress <H3><I>Legacy Systems</I></H3> 
172.16.69.226  nemesis #
apache=http://172.16.69.226/bb/bb.html/server-status?auto 


And got a core dump... a Hobbit core dump, not a Linux core dump.

All outbound tests (conn, http, ftp, etc.) from the Hobbit server stopped
working, and they eventually went purple.
All inbound tests (from the clients) continued to work.

I stopped Hobbit, deleted the core dump file (saved it first!),
restarted... and got another core dump.

I stopped Hobbit, deleted the core dump file (saved it again), commented out
the line in bb-hosts and restarted and everything started working again.


Any ideas?


Kevin
list Henrik Størner · Tue, 8 Feb 2005 08:26:49 +0100 ·
quoted from Kevin Hanrahan
On Mon, Feb 07, 2005 at 07:19:38PM -0500, Kevin Hanrahan wrote:
Henrik,
 I added the following lines in bb-hosts:

group-compress <H3><I>Legacy Systems</I></H3> 
172.16.69.226  nemesis #
apache=http://172.16.69.226/bb/bb.html/server-status?auto 


And got a core dump....a Hobbit core dump not a linux core dump
Not good.
quoted from Kevin Hanrahan
All outbound tests (conn, http, ftp. etc.) from the Hobbit server stopped
working and they eventually went purple
All inbound tests (from the clients) continued to work.
OK, this sounds as if the problem is in the network test program.
Could you send me the core-dump file AND your bbtest-net binary,
please?

BTW, that's a very odd-looking URL you've set up for the apache test.
I very much doubt it will work.


Henrik
list Henrik Størner · Wed, 9 Feb 2005 22:13:41 +0100 ·
quoted from Henrik Størner
On Tue, Feb 08, 2005 at 08:26:49AM +0100, Henrik Stoerner wrote:
On Mon, Feb 07, 2005 at 07:19:38PM -0500, Kevin Hanrahan wrote:
Henrik,
 I added the following lines in bb-hosts:
group-compress <H3><I>Legacy Systems</I></H3>
172.16.69.226  nemesis #
apache=http://172.16.69.226/bb/bb.html/server-status?auto

And got a core dump....a Hobbit core dump not a linux core dump
Not good.
It turned out to be a genuine bug that shows up when you have more than
one "apache" test in your bb-hosts.

Fixed in RC-2.


Henrik
list David Gore · Mon, 15 Aug 2005 18:40:42 +0000 ·
A hobbitd core dump. Could it have been caused by my turning up new Hobbit
clients to replace the old BB clients?

hobbit - hobbitd_client
	Mon Aug 15 18:25:37 2005

- Program crashed

Fatal signal caught!

I suppose this means hobbitd_client cored first and then hobbitd, 
overwriting the first core?

hobbit at hobbit:/export/home/hobbit/server> file ./tmp/core
./tmp/core:     ELF 32-bit MSB core file SPARC Version 1, from 'hobbitd'
hobbit at hobbit:/export/home/hobbit/server> gdb bin/hobbitd tmp/core
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.9"...
Core was generated by `hobbitd --pidfile=/export/home/hobbit/server/logs/hobbitd.pid --restart=/export'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/libresolv.so.2...done.
Loaded symbols for /usr/lib/libresolv.so.2
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libc.so.1...done.
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libdl.so.1...done.
Loaded symbols for /usr/lib/libdl.so.1
Reading symbols from /usr/lib/libmp.so.2...done.
Loaded symbols for /usr/lib/libmp.so.2
Reading symbols from /usr/platform/SUNW,Ultra-60/lib/libc_psr.so.1...done.
Loaded symbols for /usr/platform/SUNW,Ultra-60/lib/libc_psr.so.1
#0  0xff19fff8 in _libc_kill () from /usr/lib/libc.so.1
(gdb) bt
#0  0xff19fff8 in _libc_kill () from /usr/lib/libc.so.1
#1  0xff136cd8 in abort () from /usr/lib/libc.so.1
#2  0x00020fe8 in sigsegv_handler (signum=10) at sig.c:57
#3  <signal handler called>
(gdb) q
hobbit at hobbit:/export/home/hobbit/server> ls -al ./tmp/core
-rw-------   1 hobbit   other    13491236 Aug 15 18:29 ./tmp/core
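Frames #1 to #3 show why gdb reports "signal 6, Aborted" although the handler is named sigsegv_handler: a handler for a fatal signal (here called with signum=10, which is SIGBUS on Solaris) reports the problem and then calls abort(), so the core file records SIGABRT instead of the original signal. Because no frames are shown below "<signal handler called>", this backtrace does not reveal where the original fault happened. A minimal sketch of that handler pattern, assuming POSIX sigaction(); fatal_handler and install_fatal_handlers are hypothetical names, not the code in Hobbit's sig.c:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fatal-signal handler: report the event with an
 * async-signal-safe write(), then call abort(). abort() raises SIGABRT,
 * so the resulting core file reports signal 6 rather than the signal
 * that triggered the handler. */
static void fatal_handler(int signum)
{
    static const char msg[] = "Fatal signal caught!\n";
    ssize_t ignored;

    (void)signum;
    ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    abort();
}

/* Route the usual "program bug" signals to fatal_handler(). */
static int install_fatal_handlers(void)
{
    static const int sigs[] = { SIGSEGV, SIGBUS, SIGILL, SIGFPE };
    struct sigaction sa;
    size_t i;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++) {
        if (sigaction(sigs[i], &sa, NULL) == -1)
            return -1;
    }
    return 0;
}
```

Since every crash routed through such a handler ends in abort(), the frames underneath "<signal handler called>" are the ones that locate the real bug.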

-- 
David
list Lars Ebeling · Sat, 27 Jan 2007 13:10:18 +0100 ·
I feel that I am nagging, so I changed the subject today.

From today's (27/1) snapshot:

$ gdb ../bin/hobbitd core
GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "hppa2.0n-hp-hpux11.00"...
Core was generated by `hobbitd'.
Program terminated with signal 6, Aborted.

warning: The shared libraries were not privately mapped; setting a
breakpoint in a shared library will not work until you rerun the program.

warning: Can't find file hobbitd referenced in dld_list.
Reading symbols from /usr/local/lib/libpcre.sl...done.
Reading symbols from /usr/lib/libnsl.1...done.
Reading symbols from /usr/lib/libxti.2...done.
Reading symbols from /usr/lib/libc.2...done.
Reading symbols from /usr/lib/libdld.2...done.
#0  0xc020bad0 in kill () from /usr/lib/libc.2
(gdb) bt
#0  0xc020bad0 in kill () from /usr/lib/libc.2
#1  0xc01a655c in raise () from /usr/lib/libc.2
#2  0xc01e69a8 in abort_C () from /usr/lib/libc.2
#3  0xc01e6a04 in abort () from /usr/lib/libc.2
#4  0x00018258 in sigsegv_handler (signum=8216) at sig.c:57
#5  <signal handler called>
#6  int_compare (a=0x6f3d6, b=0xb19fd) at rbtr.c:406
#7  0x00017ee0 in rbtFind (h=0x6f3d6, key=0x6f3d6) at rbtr.c:381
#8  0x00005a20 in find_cookie (cookie=455638) at hobbitd.c:925
#9  0x000060b8 in handle_status (
    msg=0x4001c9d0 "status leopg9.hobbitd yellow\nStatistics for Hobbit daemon\nUp since 27-Jan-2007 09:50:23 (0 days, 00:05:01)\n\nIncoming messages      :        156\n- status", ' ' <repeats 15 times>, ":        121\n- combo", ' ' <repeats 13 times>..., sender=0x1dddc "hobbitd", hostname=0x40018bf0 "leopg9",
    testname=0x4001c3b8 "hobbitd", grouplist=0x0, log=0x4001c3e8, newcolor=4,
    downcause=0x0) at hobbitd.c:1135
#10 0x0000e8bc in main (argc=3, argv=0x0) at hobbitd.c:4318
(gdb) quit
$
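One detail in this backtrace is worth spelling out: find_cookie() was called with cookie=455638, which is exactly 0x6f3d6 in hexadecimal, the same value gdb shows for the a argument of int_compare() and the key argument of rbtFind(). A plausible reading (an inference from the frame values, not something confirmed in this thread) is that the integer cookie itself reached the comparison callback where a pointer to it was expected, so the comparator dereferenced address 0x6f3d6 and faulted. The sketch below shows the usual contract for an integer-keyed lookup: keys are passed by address and the comparator dereferences them. It uses bsearch() over a sorted array as a stand-in for the tree; cookie_compare and lookup_cookie are hypothetical names, not Hobbit's rbtr.c:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical comparator for int cookies.
 * Both arguments are POINTERS to ints, never the int values themselves. */
static int cookie_compare(const void *a, const void *b)
{
    int ia = *(const int *)a;
    int ib = *(const int *)b;

    /* (ia > ib) - (ia < ib) avoids the overflow that "ia - ib" can hit. */
    return (ia > ib) - (ia < ib);
}

/* Hypothetical lookup over a sorted array standing in for the tree:
 * the key is passed by address, exactly as the comparator expects. */
static const int *lookup_cookie(const int *sorted, size_t n, int cookie)
{
    return bsearch(&cookie, sorted, n, sizeof(int), cookie_compare);
}
```

Passing cookie instead of &cookie to a lookup with this contract would make the comparator read from address 455638, the kind of invalid access that ends in the signal handler.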


Regards
Lars Ebeling

http://leopg9.no-ip.org
Hobbithobbyist

"It is better to keep your mouth shut and appear stupid than to open it and remove all doubt."
-- Mark Twain
list Lars Ebeling · Sat, 27 Jan 2007 13:52:54 +0100 ·
I restarted Hobbit at 12:57, and this time I don't get a core dump. The status pages have not been updated since the core dump at 9:55.

Extract from hobbitlaunch.log:

$ pg *
2007-01-27 12:57:51 hobbitlaunch starting
2007-01-27 12:57:51 Loading tasklist configuration from /home/hobbit/server/etc/hobbitlaunch.cfg
2007-01-27 12:57:52 Loading hostnames
2007-01-27 12:57:52 Loading saved state
2007-01-27 12:57:52 Setting up network listener on 0.0.0.0:1984
2007-01-27 12:57:52 Setting up local listener
2007-01-27 12:57:52 Cannot bind to local listen socket (Address already in use)
2007-01-27 12:57:52 Could not connect to bbd at leopg9:1984 - Connection refused
2007-01-27 12:57:52 Whoops ! bb failed to send message - Connection failed
2007-01-27 12:57:53 Task hobbitd terminated, status 1
2007-01-27 12:57:59 Loading hostnames
2007-01-27 12:58:00 Loading saved state
2007-01-27 12:58:00 Setting up network listener on 0.0.0.0:1984
2007-01-27 12:58:00 Setting up local listener
2007-01-27 12:58:00 Cannot bind to local listen socket (Address already in use)
2007-01-27 12:58:05 Task hobbitd terminated, status 1
2007-01-27 12:58:05 Loading hostnames
2007-01-27 12:58:05 Loading saved state
2007-01-27 12:58:05 Setting up network listener on 0.0.0.0:1984
2007-01-27 12:58:05 Setting up local listener
2007-01-27 12:58:05 Cannot bind to local listen socket (Address already in use)
2007-01-27 12:58:10 Task hobbitd terminated, status 1
This is the only Hobbit process running:
  hobbit 26744     1  0 12:57:51 ?         0:00 /home/hobbit/server/bin/hobbitlaunch --config=/home/hobbit/serv
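A repeated "Cannot bind to local listen socket (Address already in use)" with no hobbitd process left running is the typical symptom of a stale Unix-domain socket file left behind by a crash. Assuming that is what hobbitd's local listener is (the log shows neither the socket type nor its path), the usual recovery is: on EADDRINUSE, check whether anything still accepts connections on that path, and remove the file only if nothing does. bind_local_listener() is a hypothetical helper illustrating that approach, not Hobbit code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: create a Unix-domain listening socket at 'path'.
 * If bind() fails with EADDRINUSE and the existing file is a socket that
 * nobody is listening on, remove it and bind again.
 * Returns the listening fd, or -1 on failure. */
static int bind_local_listener(const char *path)
{
    struct sockaddr_un addr;
    struct stat st;
    int fd, probe, stale;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        return -1;

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        /* Only a leftover *socket* file is a candidate for removal. */
        if (errno != EADDRINUSE || lstat(path, &st) == -1 || !S_ISSOCK(st.st_mode))
            goto fail;

        /* Is anyone still accepting connections on it? */
        if ((probe = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
            goto fail;
        stale = (connect(probe, (struct sockaddr *)&addr, sizeof(addr)) == -1 &&
                 errno == ECONNREFUSED);
        close(probe);
        if (!stale)
            goto fail;          /* a live listener owns it: leave it alone */

        if (unlink(path) == -1 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
            goto fail;
    }
    if (listen(fd, 16) == -1)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

Probing with connect() first avoids deleting the socket of a daemon that is still alive; an unconditional unlink() before every bind() would silently steal it.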
Lars                                                           
list Don Munyak · Mon, 9 Apr 2007 16:09:04 -0400 ·
I noticed an unanswered post similar to my current issue. Here's what
I found in the list by searching for "task hobbitd terminated":

You need to tune the kernel shared-memory settings. See the
sysctl.conf man page. There's also this page that explains a
bit about tuning shared-memory:
http://www.unidata.ucar.edu/support/help/MailArchives/mcidas/msg02324.html

However, I am still having issues as well.
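The shared-memory advice quoted above can be checked directly. Assuming hobbitd's channels use System V shared memory, as the quoted tuning advice implies (this thread does not state the segment sizes), a quick probe shows whether the kernel will grant a segment of a given size before you start adjusting limits such as kernel.shmmax with sysctl on Linux. shm_probe is a hypothetical helper, and the size to test is whatever your installation needs:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Try to create (and immediately remove) a private SysV shared-memory
 * segment of 'size' bytes. Returns 0 if the kernel allows it, otherwise
 * the errno from shmget(): EINVAL typically means 'size' is outside
 * SHMMIN..SHMMAX, ENOSPC that a system-wide limit (SHMMNI/SHMALL) is hit. */
static int shm_probe(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);

    if (id == -1)
        return errno;
    shmctl(id, IPC_RMID, NULL);   /* release the test segment right away */
    return 0;
}
```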
list Asif Iqbal · Thu, 1 Oct 2009 12:12:30 -0400 ·
I had a core dump from bb in a non-global Solaris 10 zone on a T1000:
bash-3.00# gdb ~hobbit/client/bin/bb core
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.

This GDB was configured as "sparc-sun-solaris2.8"...
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libsocket.so.1...done.
Loaded symbols for /lib/libsocket.so.1
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libc.so.1...done.
Loaded symbols for /lib/libc.so.1
Reading symbols from /platform/sun4v/lib/libc_psr.so.1...done.
Loaded symbols for /platform/SUNW,Sun-Fire-T1000/lib/libc_psr.so.1
Reading symbols from /lib/ld.so.1...done.
Loaded symbols for /lib/ld.so.1
Core was generated by `/home/hobbit/client/bin/bb 0.0.0.0 @'.
Program terminated with signal 11, Segmentation fault.
#0  0xff109644 in doformat () from /lib/libc.so.1
(gdb) bt
#0  0xff109644 in doformat () from /lib/libc.so.1
#1  0xff10ac90 in strftime () from /lib/libc.so.1
#2  0x00013594 in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:47
#3  0x00013a28 in xmalloc (size=121384) at memory.c:121
#4  0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#5  0x00013a28 in xmalloc (size=121384) at memory.c:121
#6  0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#7  0x00013a28 in xmalloc (size=121384) at memory.c:121
#8  0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#9  0x00013a28 in xmalloc (size=121384) at memory.c:121
#10 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#11 0x00013a28 in xmalloc (size=121384) at memory.c:121
#12 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#13 0x00013a28 in xmalloc (size=121384) at memory.c:121
#14 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#15 0x00013a28 in xmalloc (size=121384) at memory.c:121
#16 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#17 0x00013a28 in xmalloc (size=121384) at memory.c:121
#18 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#19 0x00013a28 in xmalloc (size=121384) at memory.c:121
#20 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#21 0x00013a28 in xmalloc (size=121384) at memory.c:121
#22 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#23 0x00013a28 in xmalloc (size=121384) at memory.c:121
#24 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#25 0x00013a28 in xmalloc (size=121384) at memory.c:121
#26 0x0001362c in errprintf (fmt=0x1da28 "xmalloc: Out of memory!\n")
    at errormsg.c:61
#27 0x00013a28 in xmalloc (size=121384) at memory.c:121
#28 0x0001362c in errprintf (fmt=0x1da48 "xrealloc: Out of memory!\n")
    at errormsg.c:61
#29 0x00013a68 in xrealloc (ptr=0x0, size=195584) at memory.c:151
#30 0x00017d80 in strbuf_addtobuffer (buf=0x302a0,
    newtext=0x51cc0 ' ' <repeats 34 times>, "(sparc) 2.6.0,REV=10.0.3.2004.12.15.20.59\n", newlen=76) at strfunc.c:109
#31 0x00012b44 in main (argc=3, argv=0xffbff51c) at bb.c:118
(gdb)
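The backtrace is a recursion loop: xmalloc() fails, calls errprintf() to report it, errprintf() calls xmalloc() to build the message, that fails too, and so on until the stack is exhausted and strftime() faults. A common way to break such a loop is to make the out-of-memory path independent of the allocator it is reporting on. The sketch below assumes nothing about Xymon's actual errormsg.c or memory.c; oom_message and xmalloc_noloop are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Format "<who>: Out of memory!\n" into a caller-supplied buffer instead
 * of a heap buffer. Returns the number of bytes written (excluding NUL). */
static size_t oom_message(char *buf, size_t len, const char *who)
{
    int n;

    if (len == 0)
        return 0;
    n = snprintf(buf, len, "%s: Out of memory!\n", who);
    if (n < 0)
        return 0;
    return ((size_t)n < len) ? (size_t)n : len - 1;
}

/* Hypothetical allocation wrapper: on failure it reports through a stack
 * buffer and write(), so it never calls back into itself. */
static void *xmalloc_noloop(size_t size)
{
    void *p = malloc(size);

    if (p == NULL) {
        char buf[64];
        size_t n = oom_message(buf, sizeof(buf), "xmalloc");
        ssize_t ignored = write(STDERR_FILENO, buf, n);
        (void)ignored;
        abort();
    }
    return p;
}
```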


Let me know if I should have sent the backtrace directly to Henrik instead.

-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
list Stef Coene · Mon, 13 Jan 2014 11:02:12 +0100 ·
Hi,

I have installed the client, but I moved it to another directory.
I tried to set XYMONHOME to this new directory, but this causes xymonlaunch to crash with a buffer overflow. The rest of the commands run fine.
This is the code from ./common/xymonlaunch.c:

/* Find config */
if (!config) {
   if (stat("/etc/tasks.cfg", &st) != -1) config = strdup("/etc/tasks.cfg");
   else if (stat("/etc/xymon/tasks.cfg", &st) != -1) config = strdup("/etc/xymon/tasks.cfg");
   else if (stat("/etc/xymon-client/clientlaunch.cfg", &st) != -1) config = strdup("/etc/xymon-client/clientlaunch.cfg");
   else if (xgetenv("XYMONHOME")) {
      char *pf = NULL;
      sprintf(pf, "%s/etc/tasks.cfg", xgetenv("XYMONHOME"));
      if (pf && stat(pf, &st) != -1) config = strdup(pf);
   }
   if (config) dbgprintf("Using config file: %s\n", config);
}
I fixed it like this (line 546). I also added an extra if statement so that xymonlaunch also tries clientlaunch.cfg:

/* Find config */
if (!config) {
   if (stat("/etc/tasks.cfg", &st) != -1) config = strdup("/etc/tasks.cfg");
   else if (stat("/etc/xymon/tasks.cfg", &st) != -1) config = strdup("/etc/xymon/tasks.cfg");
   else if (stat("/etc/xymon-client/clientlaunch.cfg", &st) != -1) config = strdup("/etc/xymon-client/clientlaunch.cfg");
   else if (xgetenv("XYMONHOME")) {
      char pf[1024];   /* stack buffer instead of writing through a NULL pointer */

      /* Try $XYMONHOME/etc/tasks.cfg first ... */
      snprintf(pf, sizeof(pf), "%s/etc/tasks.cfg", xgetenv("XYMONHOME"));
      if (stat(pf, &st) != -1) {
         config = strdup(pf);
      } else {
         /* ... then fall back to $XYMONHOME/etc/clientlaunch.cfg */
         snprintf(pf, sizeof(pf), "%s/etc/clientlaunch.cfg", xgetenv("XYMONHOME"));
         if (stat(pf, &st) != -1) config = strdup(pf);
      }
   }
   if (config) dbgprintf("Using config file: %s\n", config);
}

I also have a problem with this line from file common/xymoncmd.c:
    sprintf(envfn, "%s/etc/xymonserver.cfg", xgetenv("XYMONHOME"));

I want to run xymoncmd without setting XYMONHOME, so it has to use the XYMONHOME value from compile time, but that is not working. In the Makefile I set XYMONHOME to a directory, but during the compile "/client" is appended.
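The usual C idiom for "use the environment variable if it is set, otherwise a compiled-in default" is sketched below for the envfn path. DEFAULT_XYMONHOME is a hypothetical macro standing in for whatever the build would pass with -D, its value here is only a placeholder, and envfile_path is a hypothetical helper; the real xymoncmd.c uses xgetenv() as shown above:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compile-time default; a build would normally set it with -D. */
#ifndef DEFAULT_XYMONHOME
#define DEFAULT_XYMONHOME "/opt/xymon"
#endif

/* Build "<home>/etc/<file>" into buf, taking <home> from $XYMONHOME when
 * it is set and non-empty, otherwise from DEFAULT_XYMONHOME.
 * Returns 0 on success, -1 if the result would not fit in buf. */
static int envfile_path(char *buf, size_t len, const char *file)
{
    const char *home = getenv("XYMONHOME");
    int n;

    if (home == NULL || *home == '\0')
        home = DEFAULT_XYMONHOME;
    n = snprintf(buf, len, "%s/etc/%s", home, file);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Using snprintf() with the buffer size also turns an over-long XYMONHOME into an error instead of a buffer overflow.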


Stef
list Henrik Størner · Mon, 13 Jan 2014 11:57:49 +0100 ·
quoted from Stef Coene
Den 13.01.2014 11:02, Stef Coene skrev:
I have installed the client but I moved it to another directory.
I tried to set XYMONHOME to this new directory, but this causes
xymonlaunch to crash with a buffer overflow.
This is the code from ./common/xymonlaunch.c:
[...]
   else if (xgetenv("XYMONHOME")) {
      char *pf = NULL;
      sprintf(pf, "%s/etc/tasks.cfg", xgetenv("XYMONHOME"));
      if (pf && stat(pf, &st) != -1) config = strdup(pf);
   }
I have no idea how that ever got into the official code. It is so obviously broken I should have caught it before hitting 'commit'. Fixed.
quoted from Stef Coene

I also have a problem with this line from file common/xymoncmd.c:
    sprintf(envfn, "%s/etc/xymonserver.cfg", xgetenv("XYMONHOME"));

I want to run xymoncmd without setting XYMONHOME.  So it has to use
XYMONHOME from compile time, but that's not working.
Now I'm confused. You start by saying you are working on a *client* installation, but xymonserver.cfg refers to a *server* installation.
In the Makefile I set XYMONHOME to a directory but during the compile, /client is added.
The "/client" is added to the top-level Makefile during 'configure'. If you don't want it, then rebuild the client with XYMONHOME set the way you want.

Anyway, if you override XYMONHOME in your xymonclient.cfg or xymonserver.cfg, then it should work fine. The only problem is bootstrapping it for xymonlaunch and xymoncmd (if you run that for commands not invoked through xymonlaunch). The easiest solution in that case is to add the --env and --config options to xymonlaunch / xymoncmd.


Regards,
Henrik
list Stef Coene · Mon, 13 Jan 2014 13:57:22 +0100 ·
quoted from Henrik Størner
Now I'm confused. You start by saying you are working on a *client*
installation, but xymonserver.cfg refers to a *server* installation.
It is indeed for a client.
But the search logic for the config file for xymonlaunch is:
/etc/tasks.cfg
/etc/xymon/tasks.cfg
/etc/xymon-client/clientlaunch.cfg
$XYMONHOME/etc/tasks.cfg

And I think a fifth one is missing:
$XYMONHOME/etc/clientlaunch.cfg
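The search order above, with the extra $XYMONHOME/etc/clientlaunch.cfg candidate at the end, can be written as a table-driven loop, which turns adding a candidate into a one-line change. This is a sketch, not the xymonlaunch.c code: find_config is a hypothetical name, and the fixed /etc paths are passed in as a parameter so the function can be exercised without touching /etc:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return a malloc'ed copy of the first candidate that exists, or NULL.
 * 'fixed' is a NULL-terminated list of absolute paths checked first;
 * then, if 'home' is non-NULL, home/etc/tasks.cfg and
 * home/etc/clientlaunch.cfg are tried in that order. */
static char *find_config(const char *const fixed[], const char *home)
{
    static const char *const relative[] = { "etc/tasks.cfg", "etc/clientlaunch.cfg" };
    struct stat st;
    char buf[1024];
    size_t i;

    for (i = 0; fixed[i] != NULL; i++)
        if (stat(fixed[i], &st) == 0)
            return strdup(fixed[i]);

    if (home == NULL)
        return NULL;
    for (i = 0; i < sizeof(relative) / sizeof(relative[0]); i++) {
        int n = snprintf(buf, sizeof(buf), "%s/%s", home, relative[i]);
        if (n < 0 || (size_t)n >= sizeof(buf))
            continue;                       /* path too long: skip it */
        if (stat(buf, &st) == 0)
            return strdup(buf);
    }
    return NULL;
}
```

In xymonlaunch the fixed list would be { "/etc/tasks.cfg", "/etc/xymon/tasks.cfg", "/etc/xymon-client/clientlaunch.cfg", NULL } and home would come from xgetenv("XYMONHOME").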
quoted from Henrik Størner
The "/client" is added to the top-level Makefile during 'configure'. If you
don't want it, then rebuild the client with XYMONHOME set the way you
want.

Anyway, if you override XYMONHOME in your xymonclient.cfg or
xymonserver.cfg, then it should work fine. The only problem is
bootstrapping it for xymonlaunch and xymoncmd (if you run that for commands
not invoked through xymonlaunch). The easiest solution in that case is to add the
--env and --config options to xymonlaunch / xymoncmd.
I just want to get rid of a forced /client in the path.

I found that in lib/environ.c XYMONHOME is set to BUILD_HOME.
So even if you define XYMONHOME in the top-level Makefile, it is overwritten in
lib/environ.c with BUILD_HOME.
I fixed my problem by changing the Makefiles in lib and client and removing "/client"
from the definition of BUILD_HOME.


Stef
list Henrik Størner · Tue, 14 Jan 2014 10:36:52 +0100 ·
quoted from Stef Coene
Den 13.01.2014 13:57, Stef Coene skrev:
I just want to get rid of a forced /client in the path.
OK, looking at this in more detail, it is a bit messy. I think the attached patch should fix it properly: it eliminates the ".../client" in a client-only configuration, but keeps it in a client+server installation.

Note that you should start from scratch with "./configure --client" to test it.


Regards,
Henrik
Attachments (1)
list Stef Coene · Tue, 14 Jan 2014 11:09:50 +0100 ·
quoted from Henrik Størner
On Tuesday 14 January 2014 10:36:52 user-ce4a2c883f75@xymon.invalid wrote:
Den 13.01.2014 13:57, Stef Coene skrev:
I just want to get rid of a forced /client in the path.
OK, looking at this in more detail - it is a bit messy. I think the
attached patch should fix it properly, it eliminates the ".../client" in
a client-only configuration, but keeps it in a client+server
installation.
Should I apply this patch against xymon-4.3.14?


Stef
list Henrik Størner · Tue, 14 Jan 2014 11:37:07 +0100 ·
quoted from Stef Coene
Den 14.01.2014 11:09, Stef Coene skrev:
On Tuesday 14 January 2014 10:36:52 user-ce4a2c883f75@xymon.invalid wrote:
Den 13.01.2014 13:57, Stef Coene skrev:
I just want to get rid of a forced /client in the path.
OK, looking at this in more detail - it is a bit messy. I think the
attached patch should fix it properly, it eliminates the ".../client" in
a client-only configuration, but keeps it in a client+server
installation.
Should I apply this patch against xymon-4.3.14?
Works with 4.3.13 also.


Regards,
Henrik
list Stef Coene · Tue, 14 Jan 2014 12:51:50 +0100 ·
After applying the patch, I get the error 

build/Makefile.rules:95: *** Recursive variable `CC' references itself (eventually).  Stop.


Stef
list Gautier Begin · Tue, 14 Jan 2014 14:23:26 +0100 ·
Hello,

Last Sunday, I tried to start a xymonproxy (version 4.3.12) on Solaris 10.5 with 900 targets, and ran into a performance issue:
 - The xymonproxy process used 100% of a single CPU (no multithreading seen).
 - On the main XYMON server, data from this proxy (I have another one on Ubuntu with 50 targets that works fine) arrived with difficulty (delays and gaps).

I tried using the -lqueue option for the proxy, without success.
Nothing special was seen in any logs.
The UNIX admin told me that xymonproxy was just calling time() all the time.

When I try it in a test environment with Solaris for both proxy and server, everything is OK. So I suspect a scaling issue (the number of targets connected to the proxy), and possibly a multithreading issue as well.

Cordialement, Regards,Mit freundlichen Grüßen,

Gautier BEGIN
list Henrik Størner · Tue, 14 Jan 2014 15:02:52 +0100 ·
quoted from Gautier Begin
 
Den 14.01.2014 14:23, Gautier Begin skrev: 
Last Sunday, I wanted
to start a xymonproxy (vers 4.3.12) on a Solaris 10.5 with 900 targets .
I had a performance issue: 
- The xymonproxy process used 100% of only
one CPU (no multithread seen). 
- On the main XYMON server, data from
this proxy (I have one other on Ubuntu with 50 targets working fine)
came with difficulties (delays and lacks). 

I tried to used the
-lqueue option for the proxy with no success. 
Nothing special seen in
any logs. 
The UNIX admin said me that the xymonproxy was just making
'time()' command all the time.

I've had the proxy handling about 5000
hosts simultaneously. On Linux, though. 
I have not heard of such
behaviour before, and I am sure there must be others running the proxy
on Solaris in a similar setup. If you can make it happen again, please
do a "kill -USR2" on the xymonproxy process to toggle debugging on/off.
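Toggling debug output with "kill -USR2" relies on a signal handler that flips a flag the main loop checks. The usual async-signal-safe way to implement such a toggle is sketched below; debug_enabled, toggle_debug and install_debug_toggle are hypothetical names, and this is not taken from xymonproxy's source:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flag read by the main loop; volatile sig_atomic_t is the type the C
 * standard guarantees a signal handler may safely write. */
static volatile sig_atomic_t debug_enabled = 0;

/* Each SIGUSR2 flips the flag; the handler does nothing else. */
static void toggle_debug(int signum)
{
    (void)signum;
    debug_enabled = !debug_enabled;
}

static int install_debug_toggle(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = toggle_debug;
    sigemptyset(&sa.sa_mask);
    /* Ask for interrupted slow calls such as read() to be restarted;
     * select() may still return EINTR and must handle it. */
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR2, &sa, NULL);
}
```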


None of the Xymon tools are multithreaded - so far I have stuck to the
traditional Unix way of doing things. 
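In the traditional single-process Unix design, one process serves many connections by blocking in select() or poll() until a socket is ready, so it uses no CPU while idle. A loop that repeatedly calls time() while waiting for a deadline would instead look much like the 100% CPU usage reported above; whether xymonproxy does that is not established in this thread. The helper below is a hypothetical illustration of blocking with a timeout, not xymonproxy's actual loop:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Sleep in the kernel until 'fd' is readable or 'timeout_ms' expires.
 * Returns 1 if readable, 0 on timeout, -1 on error. No CPU is used
 * while waiting, unlike a loop that polls time() until a deadline. */
static int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    do {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (rc == -1 && errno == EINTR);   /* signal: retry with the full timeout */

    if (rc <= 0)
        return rc;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```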
Regards, 
Henrik
list John Thurston · Tue, 14 Jan 2014 11:26:08 -0900 ·
quoted from Gautier Begin
On 1/14/2014 4:23 AM, Gautier Begin wrote:
Hello,

Last Sunday, I wanted to start a xymonproxy (version 4.3.12) on a Solaris
10.5 with 900 targets. I had a performance issue:
  - The xymonproxy process used 100% of only one CPU (no multithread seen).
  - On the main XYMON server, data from this proxy (I have one other on
Ubuntu with 50 targets working fine) came with difficulties (delays and
gaps).
Is proxy the only service enabled in your tasks.cfg?

I have occasionally seen xymon run away with the processor. In some 
cases I have been able to find a cause. In all cases, it has been caused 
by an error in my configuration files.

Is it possible that you have defined a proxy or notification loop?
-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska
list Stef Coene · Thu, 16 Jan 2014 13:12:25 +0100 ·
quoted from Stef Coene
On Tuesday 14 January 2014 12:51:50 Stef Coene wrote:
After applying the patch, I get the error

build/Makefile.rules:95: *** Recursive variable `CC' references itself
(eventually).  Stop.
Henrik,

Any idea where this error comes from?


Stef
list Henrik Størner · Thu, 16 Jan 2014 15:38:22 +0100 ·
quoted from Stef Coene
On 14.01.2014 12:51, Stef Coene wrote:
After applying the patch, I get the error

build/Makefile.rules:95: *** Recursive variable `CC' references itself
(eventually).  Stop.
Doesn't happen here. Try going to http://sourceforge.net/p/xymon/code/HEAD/tree/branches/4.3.14/ and click on "Download snapshot" to grab the current test-version of 4.3.14, and then apply the diff on top of that.

Regards,
Henrik
list Stef Coene · Thu, 16 Jan 2014 15:58:13 +0100 ·
quoted from Henrik Størner
On Thursday 16 January 2014 15:38:22 user-ce4a2c883f75@xymon.invalid wrote:
On 14.01.2014 12:51, Stef Coene wrote:
After applying the patch, I get the error

build/Makefile.rules:95: *** Recursive variable `CC' references itself
(eventually).  Stop.
Doesn't happen here. Try going to
http://sourceforge.net/p/xymon/code/HEAD/tree/branches/4.3.14/ and click
on "Download snapshot" to grab the current test-version of 4.3.14, and
then apply the diff on top of that.
I already tried that.
But I found a solution: I re-saved the diff, and now the patch works.


Stef
list Gautier Begin · Fri, 17 Jan 2014 14:41:51 +0100 ·
Henrik,

I managed to reproduce the issue:
- Start XYMON as a proxy server => no problem
- Stop XYMON => a process remains alive (xymonlaunch)
- Kill xymonlaunch
- Start XYMON as a proxy server again => the problem occurs again
Could some files in XYMON's ./tmp directory influence this?
 
Regards,

Gautier BEGIN


From:   user-ce4a2c883f75@xymon.invalid
To:     <xymon at xymon.com>
Date:   01/14/2014 03:00 PM
Subject:        Re: [Xymon] xymonproxy perf issue
Sent by:        "Xymon" <xymon-bounces at xymon.com>
quoted from Gautier Begin


On 14.01.2014 14:23, Gautier Begin wrote:
Last Sunday, I wanted to start a xymonproxy (version 4.3.12) on a Solaris 10.5 with 900 targets. I had a performance issue:
 - The xymonproxy process used 100% of only one CPU (no multithread seen).
 - On the main XYMON server, data from this proxy (I have one other on Ubuntu with 50 targets working fine) came with difficulties (delays and gaps).
I tried to use the -lqueue option for the proxy, with no success. Nothing special seen in any logs. The UNIX admin told me that xymonproxy was doing nothing but calling time().
I've had the proxy handling about 5000 hosts simultaneously. On Linux, though.
I have not heard of such behaviour before, and I am sure there must be others running the proxy on Solaris in a similar setup. If you can make it happen again, please do a "kill -USR2" on the xymonproxy process to toggle debugging on/off.
None of the Xymon tools are multithreaded - so far I have stuck to the traditional Unix way of doing things.
 Regards,
Henrik
list Gautier Begin · Fri, 17 Jan 2014 16:14:11 +0100 ·
Hello,

Could it come from how the xymonproxy program handles the MACHINE variable?

I ask because when the process goes haywire, it does nothing but call gettimer(). I then looked at the source file and found this line (line 452), which could correspond:

if (proxyname && ((now = gettimer()) >= (laststatus+300))) {


proxyname comes from $MACHINE
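One possible reading of this symptom, offered only as a sketch: in a single-process, select()-driven loop, a periodic check like the quoted line runs once per iteration. If select() stops blocking and returns an error at once, the loop never sleeps, and the timer call becomes most of what the process does. The C simulation below illustrates that effect under stated assumptions: fake_select() and fake_gettimer() are hypothetical stand-ins for select() and Xymon's gettimer(), time is a virtual clock, and the 5-second select() timeout is an assumed value; only the 300-second interval comes from the quoted line.

```c
#include <errno.h>

/* Virtual clock (seconds) so the simulation needs no real waiting. */
static long virtual_now = 0;
/* How many times the periodic timer check was evaluated. */
static long timer_calls = 0;

/* Hypothetical stand-in for gettimer(): read the virtual clock. */
static long fake_gettimer(void)
{
    timer_calls++;
    return virtual_now;
}

/* Hypothetical stand-in for select(). Healthy: block for 'timeout'
   seconds (advance the clock) and return 0. Broken: fail at once with
   EINVAL, so no time passes. */
static int fake_select(int broken, long timeout)
{
    if (broken) {
        errno = EINVAL;
        return -1;
    }
    virtual_now += timeout;
    return 0;
}

/* Run the event loop until 'duration' virtual seconds have passed or
   'max_iter' iterations have run; return the number of iterations. */
static long run_loop(int broken, long duration, long max_iter)
{
    long iter = 0;
    long laststatus;

    virtual_now = 0;
    timer_calls = 0;
    laststatus = fake_gettimer();

    while (virtual_now < duration && iter < max_iter) {
        /* The result is ignored to model a loop that carries on after a
           select() error: a failing select() returns at once and never
           sleeps. */
        (void)fake_select(broken, 5);
        iter++;

        /* Periodic check modelled on the quoted line 452. */
        long now = fake_gettimer();
        if (now >= laststatus + 300)
            laststatus = now;   /* a status report would be sent here */
    }
    return iter;
}
```

With a healthy select(), 60 virtual seconds take 12 iterations; with a failing one, the loop runs until the iteration cap without any time passing.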

Regards,

Gautier BEGIN


CSC Computer Sciences SAS • Registered Office: Immeuble Le Balzac, 10 Place des Vosges, 92072 Paris La Défense Cedex, France • Registered in France: RCS Nanterre B 315 268 664
list Henrik Størner · Mon, 20 Jan 2014 11:45:08 +0100 ·
quoted from Gautier Begin
 

On 17.01.2014 16:14, Gautier Begin wrote:
Could it come from how the xymonproxy program handles the MACHINE variable?

I ask because when the process goes haywire, it does nothing but call gettimer(). I then looked at the source file and found this line (line 452), which could correspond:
if (proxyname && ((now = gettimer()) >= (laststatus+300))) {
[snip]
Last Sunday, I wanted to start a xymonproxy (version 4.3.12) on a Solaris 10.5 with 900 targets. I had a performance issue:

- The xymonproxy process used 100% of only one CPU (no multithread seen).
- On the main XYMON server, data from this proxy (I have one other on Ubuntu with 50 targets working fine) came with difficulties (delays and gaps).

I suspect some kind of error happened with the network socket handling. Could you add this patch and try to reproduce the problem? It doesn't change the behaviour, but it does add some error-reporting in case the core select() call fails.
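The patch itself is not reproduced in this archive. As a hedged illustration of the general idea, reporting select() failures usually means checking for a return value of -1 and logging strerror(errno). The wrapper name checked_select() and its log format are hypothetical; EINTR is not logged because it only means a signal (such as SIGUSR2) interrupted the wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical wrapper: call select() and log real failures instead of
   letting the caller loop silently. Returns select()'s own result, with
   errno preserved for the caller. */
static int checked_select(int nfds, fd_set *rd, fd_set *wr,
                          struct timeval *tmo, FILE *log)
{
    int n = select(nfds, rd, wr, NULL, tmo);

    if (n == -1 && errno != EINTR) {
        int saved = errno;            /* logging calls may overwrite errno */
        char stamp[32] = "?";
        time_t t = time(NULL);
        struct tm *tm = localtime(&t);

        if (tm != NULL)
            strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", tm);
        fprintf(log, "%s select() failed: %s\n", stamp, strerror(saved));
        errno = saved;                /* let the caller see the real cause */
    }
    return n;
}
```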


Regards,
Henrik
Attachments (1)
list Gautier Begin · Wed, 29 Jan 2014 10:18:46 +0100 ·
Henrik,

I applied the patch on my test machine, which does not have the problem => everything continues to run well.

I applied the patch on the machine with the 100% CPU issue. I get the same behaviour; xymonproxy.log contains the following lines, repeating in a loop:
1166 2014-01-28 18:09:00 state 0: reading from client
1166 2014-01-28 18:09:00 state 1: reading from client
1166 2014-01-28 18:09:00 state 2: request combining
1166 2014-01-28 18:09:00 state 3: sending to server
1166 2014-01-28 18:09:00 state 4: reading from client
2014-01-28 18:09:00 select() failed: Invalid argument
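For reference, two of the conditions under which POSIX lets select() fail with EINVAL ("Invalid argument") are an nfds argument below 0 or above FD_SETSIZE, and an invalid timeout interval. The C sketch below is a hypothetical pre-flight check, select_args_problem(), that could be logged alongside such a failure to narrow down which argument is at fault. It is not Xymon code and does not establish the cause in this thread; with roughly 900 clients, one thing it would rule in or out is descriptor numbers approaching FD_SETSIZE (1024 on many systems). The timeout test is deliberately strict: some platforms normalize an oversized tv_usec instead of rejecting it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical pre-flight check covering two EINVAL conditions that POSIX
   documents for select(). Returns a short reason string, or NULL when
   both arguments look sane. */
static const char *select_args_problem(int nfds, const struct timeval *tmo)
{
    /* nfds must lie in [0, FD_SETSIZE]. */
    if (nfds < 0 || nfds > FD_SETSIZE)
        return "nfds out of range";

    /* Reject negative fields and a non-normalized tv_usec. */
    if (tmo != NULL &&
        (tmo->tv_sec < 0 || tmo->tv_usec < 0 || tmo->tv_usec >= 1000000))
        return "invalid timeout";

    return NULL;
}
```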


Could the issue come from the fact that the machine's IP address is associated with 2 hostnames in DNS (not aliases)?

Regards,

Gautier BEGIN

System Tools Team Lead
CACEIS and APERAM accounts
CSC Computer Sciences Luxembourg S.A.
12D Impasse Drosbach
L-1882 Luxembourg

Global Outsourcing Service | p:+352 24 834 276 | m:+352 621 229 172 | user-083785ae1711@xymon.invalid | www.csc.com


  [attachment "proxyerror.diff" deleted by Gautier Begin/LUX/CSC]