Xymon Mailing List Archive

[unsolved]defunct problem on sun 5.7

list John Glowacki
Wed, 06 Sep 2006 11:13:50 -0400
Message-Id: <user-5c3d43970f28@xymon.invalid>

I am seeing defunct processes on a few Solaris 7 clients too, although most Solaris 7 clients work OK. I have not had time to look into why. The client seems to run once and send its status to the Hobbit server correctly. Then the processes go defunct and a second status is never sent, so the results turn purple after 30 minutes. My first guess is a patch-level issue: so far, all of the working servers are at 106541-16 or higher.
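
For anyone who wants to compare patch levels across clients, here is a minimal sketch that pulls the highest installed revision of a patch out of "showrev -p" output. The function name patch_rev is made up for illustration. It assumes each installed patch appears on a line beginning with "Patch: <id>-<rev>", and that the awk in use accepts -v (on Solaris 7, nawk may need to be substituted).

```shell
# Hypothetical helper (not part of the Hobbit client): reads `showrev -p`
# output on stdin and prints the highest installed revision of the patch
# ID given as $1. Prints nothing if that patch is not listed.
patch_rev() {
    awk -v id="$1" '
        $1 == "Patch:" {
            # $2 is assumed to look like 106541-16
            split($2, p, "-")
            if (p[1] == id && (!found || p[2] + 0 > max)) {
                max = p[2] + 0
                found = 1
            }
        }
        END { if (found) print max }
    '
}

# Intended use on a client:
#   showrev -p | patch_rev 106541
```

A result below 16 on a failing client would fit the patch-level guess, but that guess is unconfirmed.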

# ps -ef | grep bb
       bb 22873     1  0 10:08:20 ?        0:00 /opt/hobbit/client/bin/hobbitlaunch --config=/opt/hobbit/client/etc/clientlaunc
       bb 22911 22909  0 10:08:22 ?        0:00 vmstat 300 2
       bb   549     1  0   Aug 19 ?        0:16 /opt/bbc1.9e-btf/bin/bbrun -a /opt/bbc1.9e-btf/bin/bb-local.sh
     root 22921 22195  0 10:08:33 pts/0    0:00 grep bb
       bb 22875 22873  1 10:08:20 ?        0:00 /opt/RICHPse/bin/se.sparcv9.5.7 /opt/hobbit/client/etc/hobbit.se
       bb 22909     1  0 10:08:22 ?        0:00 sh -c vmstat 300 2 1>/opt/hobbit/client/tmp/hobbit_vmstat.22878 2>&1; mv /opt/h
       bb 22876 22873  0                   0:00 <defunct>
       bb 22874 22873  0                   0:00 <defunct>

# cat clientlaunch.log
2006-09-06 10:08:20 hobbitlaunch starting
2006-09-06 10:08:20 Loading tasklist configuration from /opt/hobbit/client/etc/clientlaunch.cfg
# ps -ef | grep bb
       bb 22873     1  0 10:08:20 ?        0:00 /opt/hobbit/client/bin/hobbitlaunch --config=/opt/hobbit/client/etc/clientlaunc
       bb   549     1  0   Aug 19 ?        0:16 /opt/bbc1.9e-btf/bin/bbrun -a /opt/bbc1.9e-btf/bin/bb-local.sh
       bb 22875 22873  0 10:08:20 ?        0:00 /opt/RICHPse/bin/se.sparcv9.5.7 /opt/hobbit/client/etc/hobbit.se
       bb 22876 22873  0                   0:00 <defunct>
       bb 22874 22873  0                   0:00 <defunct>
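
To watch for this without reading the listing by eye, a rough sketch like the following counts the defunct children of a given parent PID in "ps -ef" output. The function name defunct_children is made up for illustration. It assumes the PID and PPID are the second and third columns and that a zombie's last column is the literal <defunct>, as in the listings above, and that the awk in use accepts -v (on Solaris 7, nawk may need to be substituted).

```shell
# Hypothetical helper (not part of the Hobbit client): reads `ps -ef`
# output on stdin and prints how many <defunct> entries have the parent
# PID given as $1.
defunct_children() {
    awk -v ppid="$1" '
        $3 == ppid && $NF == "<defunct>" { n++ }
        END { print n + 0 }
    '
}

# Intended use, with 22873 being the hobbitlaunch PID from the listing:
#   ps -ef | defunct_children 22873
```

A count that stays above zero between runs means hobbitlaunch has not collected the exit status of those children.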

John

user-bb3e9041f07f@xymon.invalid wrote:
Hello,

I wrote some weeks ago about a 'defunct' problem on a Solaris 7 client. Today I hit the same problem on another server. It worries me a bit, as I can't find any clue on how to solve it. Is it a bug in the Hobbit client? A bug in the system? Have you seen this kind of error with Hobbit or with Solaris? All the system commands respond normally when I launch them separately, and all my other applications on this server work normally. I really don't understand the problem.
Thanks for your help.
Best regards,

Thomas


Hello,

I am seeing strange behaviour on a server running Solaris 7. I get this output when I launch the client:

$ ./runclient.sh start
Hobbit client for sunos started on edsvideo
$ ps -efd|grep hobbit
  hobbit 21806 21805  0 11:46:21 ?        0:00 /usr/lib/sa/sadc 300 2
  hobbit 21898 21897  0 11:46:24 ?        0:00 vmstat 300 2
  hobbit 21780     1  0 11:46:20 ?        0:00 /opt/hobbit/client/bin/hobbitlaunch --config=/opt/hobbit/client/etc/clientlaunc
  hobbit 21785 21780  0                   0:00 <defunct>
  hobbit 21781 21780  0                   0:00 <defunct>
  hobbit 21784 21780  0                   0:00 <defunct>
  hobbit 21787 21780  0                   0:00 <defunct>
  hobbit 21783 21780  0 11:46:20 ?        0:00 /usr/bin/sh /opt/hobbit/client/ext/bb-sar.sh
  hobbit 21897     1  0 11:46:24 ?        0:00 sh -c vmstat 300 2 1>/opt/hobbit/client/tmp/hobbit_vmstat.edsvideo.21811 2>&1;
  hobbit 21904 21903  0 11:46:24 ?        0:00 iostat -c 300 2
  hobbit 21903     1  0 11:46:24 ?        0:00 sh -c iostat -c 300 2 1>/opt/hobbit/client/tmp/hobbit_iostatcpu.edsvideo.21811
  hobbit 15431 15421  0 10:51:57 pts/2    0:00 -ksh
  hobbit 21805 21783  0 11:46:21 ?        0:00 /usr/bin/sar -A -o /opt/hobbit/client/tmp/sar.21783 300 1
  hobbit 21786 21780  0                   0:00 <defunct>
  hobbit 21782 21780  0                   0:01 <defunct>

A few minutes later, I got this:

$ ps -efd|grep hobbit
  hobbit 21780     1  0 11:46:20 ?        0:00 /opt/hobbit/client/bin/hobbitlaunch --config=/opt/hobbit/client/etc/clientlaunc
  hobbit 21785 21780  0                   0:00 <defunct>
  hobbit 21781 21780  0                   0:00 <defunct>
  hobbit 21784 21780  0                   0:00 <defunct>
  hobbit 21787 21780  0                   0:00 <defunct>
  hobbit 21783 21780  0                   0:00 <defunct>
  hobbit 15431 15421  0 10:51:57 pts/2    0:00 -ksh
  hobbit 21786 21780  0                   0:00 <defunct>
  hobbit 21782 21780  0                   0:01 <defunct>

As a result, nothing is graphed on the Hobbit server and all tests are purple. I have several other servers running Solaris 7 and everything works fine on those, so I'm posting here to find out whether anyone else has seen this problem.
Sincerely,

Thomas