Xymon Mailing List Archive

Hobbitd_client on server is crashing

3 messages in this thread

Martin Ward · Fri, 28 Mar 2008 14:12:49 -0000
Hi All,

I am running Hobbit 4.2.0 on a number of Sun servers and recently the
hobbitd_client daemon has been crashing and leaving core files around.

I ran gdb according to
http://www.hswn.dk/hobbit/help/known-issues.html#bugreport and it came
back with:
hobbit@hbt0:/opt/hobbit/server>gdb bin/hobbitd_client tmp/core
GNU gdb 6.6
...
Core was generated by `hobbitd_client'.
Program terminated with signal 6, Aborted.
#0  0xfee25687 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0  0xfee25687 in _lwp_kill () from /lib/libc.so.1
#1  0xfee22dee in thr_kill () from /lib/libc.so.1
#2  0xfedd11bb in raise () from /lib/libc.so.1
#3  0xfedb15d9 in abort () from /lib/libc.so.1
#4  0x08067172 in sigsegv_handler (signum=11) at sig.c:57
#5  0xfee24a4f in __sighndlr () from /lib/libc.so.1
#6  0xfee1ae72 in call_user_handler () from /lib/libc.so.1
#7  <signal handler called>
#8  0x08063ef9 in get_ostype (osname=0x806c28d "") at misc.c:43
#9  0x0805b320 in main (argc=1, argv=0x80464ac) at hobbitd_client.c:1766

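In this trace, frame #7 is gdb's marker for the signal handler. The frame listed directly after it (#8, get_ostype() at misc.c:43) is therefore where the segmentation fault (signal 11) actually happened. Frames #0 to #4 show Hobbit's own sigsegv_handler calling abort(), which is why gdb reports signal 6. When sifting through several core files, a small script can pull that frame out automatically. The following is a minimal Python sketch that assumes gdb's usual `bt` line layout. `faulting_frame` is a hypothetical helper, not part of Hobbit or gdb.

```python
import re
from typing import Optional, Tuple

# Matches a gdb frame line such as
#   #8  0x08063ef9 in get_ostype (osname=0x806c28d "") at misc.c:43
# capturing the frame number and the function name (the "0x... in" part is optional).
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")
# Captures a trailing "at file:line" source location, if gdb printed one.
LOC_RE = re.compile(r"\sat\s(\S+:\d+)\s*$")


def faulting_frame(backtrace: str) -> Optional[Tuple[int, str, Optional[str]]]:
    """Return (frame number, function, file:line) for the first frame listed
    after gdb's '<signal handler called>' marker, i.e. the code that was
    running when the signal arrived. Returns None if there is no marker."""
    after_signal = False
    for raw in backtrace.splitlines():
        line = raw.strip()
        if "<signal handler called>" in line:
            after_signal = True
            continue
        if not after_signal:
            continue  # still inside the signal-handling frames
        m = FRAME_RE.match(line)
        if m:
            loc = LOC_RE.search(line)
            return int(m.group(1)), m.group(2), loc.group(1) if loc else None
    return None
```

Given the backtrace above, it returns `(8, 'get_ostype', 'misc.c:43')`.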
The really bizarre thing is that it only crashes when one particular
machine connects to it and sends data. Other Hobbit clients connect in
and transfer their core data over with no issues; it's only this one Sun
Ultra 60 that causes a problem.

Any ideas where I might start looking?

|\/|artin
-- 
Martin Ward
Network Systems Operations Specialist
DDI:	+44 (0) 20 7863 5218
Fax: 	+XX (X) XX XXXX XXXX
Mob: 	+44 (0) 7971 97 77 21
www.colt.net

Data | Voice | Managed Services 

Help reduce your carbon footprint | Think before you print

COLT Telecommunications, Beaufort House, XX St Botolph Street, London,
EC3A 7QN UK
Registered in England and Wales, registered number 02452736, VAT number
GB 645 4205 50


*************************************************************************************
The message is intended for the named addressee only and may not be disclosed to or used by anyone else, nor may it be copied in any way. 

The contents of this message and its attachments are confidential and may also be subject to legal privilege.  If you are not the named addressee and/or have received this message in error, please advise us by e-mailing user-61c7f445d564@xymon.invalid and delete the message and any attachments without retaining any copies. 

Internet communications are not secure and COLT does not accept responsibility for this message, its contents nor responsibility for any viruses. 

No contracts can be created or varied on behalf of COLT Telecommunications, its subsidiaries or affiliates ("COLT") and any other party by email Communications unless expressly agreed in writing with such other party.  

Please note that incoming emails will be automatically scanned to eliminate potential viruses and unsolicited promotional emails. For more information refer to www.colt.net or contact us on +44(0)20 7390 3900.
Henrik Størner · Sat, 29 Mar 2008 07:43:12 +0100
On Fri, Mar 28, 2008 at 02:12:49PM -0000, Ward, Martin wrote:
> Hi All,
>
> I am running Hobbit 4.2.0 on a number of Sun servers and recently the
> hobbitd_client daemon has been crashing and leaving core files around.
> [snip]
> #7  <signal handler called>
> #8  0x08063ef9 in get_ostype (osname=0x806c28d "") at misc.c:43
> #9  0x0805b320 in main (argc=1, argv=0x80464ac) at hobbitd_client.c:1766
> [snip]
> The really bizarre thing is that it only crashes when one particular
> machine connects to it and sends data.

Weird, indeed. It looks like the client message from that particular
host is garbled in some way - specifically, the "osname" argument should
never be blank, it should be "sunos" or some other operating system
name.
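Henrik's point is that a blank osname is input the daemon never expected to see. To illustrate the kind of guard that turns such input into an "unknown OS" result instead of a crash, here is a minimal Python sketch. `classify_os` and the `KNOWN_OS` set are hypothetical stand-ins; they are not Hobbit's actual get_ostype() code or its real list of operating systems.

```python
from typing import Optional

# Hypothetical subset of OS names, for illustration only.
KNOWN_OS = {"sunos", "solaris", "linux", "hp-ux", "aix", "freebsd"}


def classify_os(osname: Optional[str]) -> str:
    """Map a client-reported OS name to a known OS type.
    A missing or blank name yields 'unknown' rather than an error."""
    if not osname or not osname.strip():
        return "unknown"
    # Assumption for this sketch: only the first word identifies the OS.
    word = osname.split()[0].lower()
    return word if word in KNOWN_OS else "unknown"
```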

Could you have a look at this client host? Specifically, the client
data that it sends to Hobbit are stored in the
~hobbit/client/tmp/msg.HOSTNAME.txt file on the client, and it
would be interesting to know what the first few lines of that file
contain. It *should* read something like this:

client my,hostname,com.sunos sunos
[date]
Sat Mar 29 07:40:29 MET 2008
[uname]
SunOS my.hostname.com 5.8 Generic_117350-46 sun4u sparc SUNW,UltraAX-i2
[uptime]
  7:40am  up 197 day(s), 22:10,  0 users,  load average: 0.28, 0.16, 0.14
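To check a client's msg.HOSTNAME.txt against the sample above, a short script can test the first line and look for the expected section markers. This is a minimal Python sketch. The header layout it assumes ("client", then HOSTNAME.OSNAME with the hostname's dots sent as commas, then an optional trailing field it does not interpret) is inferred from the sample alone, and `parse_client_header` and `check_client_message` are hypothetical helpers.

```python
from typing import List, Optional, Tuple


def parse_client_header(line: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Parse a header like 'client my,hostname,com.sunos sunos' into
    (hostname, osname, trailing field). Returns None if it does not fit
    the layout assumed here."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "client":
        return None
    # Split HOSTNAME.OSNAME at the last dot; commas in the hostname stand for dots.
    host_part, sep, osname = fields[1].rpartition(".")
    if not sep or not host_part or not osname:
        return None  # no ".OSNAME" suffix, or one side is empty
    extra = fields[2] if len(fields) > 2 else None
    return host_part.replace(",", "."), osname, extra


def check_client_message(text: str) -> List[str]:
    """Return a list of problems found in the text of a msg.HOSTNAME.txt file."""
    lines = [l.strip() for l in text.splitlines()]
    problems = []
    if not lines or parse_client_header(lines[0]) is None:
        problems.append("first line is not a well-formed 'client' header")
    for section in ("[date]", "[uname]"):
        if section not in lines:
            problems.append(f"missing {section} section")
    return problems
```

An empty list means the file looks like the sample. Any entries point at the part of the message worth inspecting by hand.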


Regards,
Henrik
Martin Ward · Mon, 31 Mar 2008 15:17:54 +0100
Solved it (possible bug?):
The hostname of the server was configured as tac0.lon.ws.colt.net when I
installed the Hobbit client, but the machine was rebooted last week
and, for reasons I have yet to figure out, its hostname was reset to tac0.
It seems that this caused hobbitd_client on the server to crash (though it
doesn't make sense to me). Once I found that the name had been changed, I
changed it back, and the Hobbit server immediately registered it as up and
running again!

|\/|artin
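Since the trigger here was the hostname silently reverting from tac0.lon.ws.colt.net to tac0 after a reboot, a check run at boot time (for example from a startup script) could flag the change before the client starts reporting under the wrong name. The following is a minimal Python sketch. `hostname_mismatch` is a hypothetical helper, and the expected name must be supplied by you; nothing here reads the Hobbit client's own configuration.

```python
import socket
from typing import Optional


def hostname_mismatch(expected_fqdn: str, current: Optional[str] = None) -> Optional[str]:
    """Compare the machine's hostname with the name the Hobbit client was
    installed under. Returns a warning string, or None if they match.
    `current` defaults to socket.gethostname()."""
    if current is None:
        current = socket.gethostname()
    # Hostnames are compared case-insensitively, as DNS names are.
    if current.lower() == expected_fqdn.lower():
        return None
    short = expected_fqdn.split(".")[0]
    if current.lower() == short.lower():
        return (f"hostname is the short name {current!r}; "
                f"client was installed as {expected_fqdn!r}")
    return f"hostname {current!r} differs from expected {expected_fqdn!r}"
```

For the case in this thread, `hostname_mismatch("tac0.lon.ws.colt.net")` would return a "short name" warning on a machine whose hostname had been reset to tac0.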