Xymon Mailing List Archive

all purple?

5 messages in this thread

Emmanuel Dreyfus · Wed, 26 Jan 2005 17:37:03 +0000
Hi

How can I track down this problem: all network tests show up as purple.
bbgen is yellow, because "Too many purple updates (>30) - disabling 
updates for purple logs". Yeah, good, that helps a lot.

hobbitd is sometimes yellow too, with these messages:
Latest errormessages:
Loading hostnames
Loading saved state
Setting up network listener on 0.0.0.0:1984
Setting up signal handlers
Setting up hobbitd channels
Setting up logfiles
Setup complete

Any suggestion?

-- 
Emmanuel Dreyfus
user-69030d3a16d0@xymon.invalid
Henrik Størner · Wed, 26 Jan 2005 20:15:55 +0100
quoted from Emmanuel Dreyfus
On Wed, Jan 26, 2005 at 05:37:03PM +0000, Emmanuel Dreyfus wrote:
Hi

How can I track down this problem: all network tests show up as purple.
bbgen is yellow, because "Too many purple updates (>30) - disabling 
updates for purple logs". Yeah, good, that helps a lot.
This happens if the network tester does not run (or it crashes).
Check the /var/log/hobbit/bb-network.log file, and also the
hobbitlaunch.log (same directory) for messages like "terminated with
signal X".

It's also supposed to drop a core dump in either the hobbit user's
home directory, or ~/server/tmp/.
quoted from Emmanuel Dreyfus
hobbitd is sometimes yellow too, with these messages:
Latest errormessages:
Loading hostnames
Loading saved state
Setting up network listener on 0.0.0.0:1984
Setting up signal handlers
Setting up hobbitd channels
Setting up logfiles
Setup complete
That happens when hobbitd (re)starts. It should not happen unless you
do it manually, or trigger it by changing the setup in
hobbitlaunch.cfg.


Henrik
Emmanuel Dreyfus · Wed, 26 Jan 2005 21:16:16 +0100
quoted from Henrik Størner
Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
This happens if the network tester does not run (or it crashes).
Check the /var/log/hobbit/bb-network.log file, and also the
hobbitlaunch.log (same directory) for messages like "terminated with
signal X".
bb-network.log loops around
2005-01-26 18:33:31 select - no active fd's found, but pending is 2
quoted from Henrik Størner
 
It's also supposed to drop a core dump in either the hobbit user's
home directory, or ~/server/tmp/.
No core.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
user-69030d3a16d0@xymon.invalid
Henrik Størner · Wed, 26 Jan 2005 23:38:14 +0100
quoted from Emmanuel Dreyfus
On Wed, Jan 26, 2005 at 09:16:16PM +0100, Emmanuel Dreyfus wrote:
Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
This happens if the network tester does not run (or it crashes).
Check the /var/log/hobbit/bb-network.log file, and also the
hobbitlaunch.log (same directory) for messages like "terminated with
signal X".
bb-network.log loops around
2005-01-26 18:33:31 select - no active fd's found, but pending is 2
That's the first time I've ever seen this error message trigger.
NetBSD must be doing something slightly different from the other
Unixes I've tried.

Is this on the machine you gave me access to? I'd like to run the
bbtest-net tool with debugging output turned on to see what happens.
If it is some other system, then the command to use is (assuming
you're logged in as the hobbit user):

server/bin/bbcmd --env=server/etc/hobbitserver.cfg \
   bbtest-net --report --ping --checkresponse --debug


Henrik
Emmanuel Dreyfus · Thu, 27 Jan 2005 00:11:32 +0100
Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
server/bin/bbcmd --env=server/etc/hobbitserver.cfg \
   bbtest-net --report --ping --checkresponse --debug
2005-01-27 00:02:20 Cannot get socket - EMFILE
2005-01-27 00:02:20 Try running with a lower --concurrency setting (currently: 64)
2005-01-27 00:02:20 Cannot get socket - EMFILE
2005-01-27 00:02:20 Try running with a lower --concurrency setting (currently: 64)
2005-01-27 00:02:20 select - no active fd's found, but pending is 2
2005-01-27 00:02:20 select - no active fd's found, but pending is 2
2005-01-27 00:02:20 select - no active fd's found, but pending is 2

And it loops forever on this.

We hit the file descriptor limit. The default soft limit here is 64
(check it with ulimit -n), which is not enough for bbtest-net: with
--concurrency at 64, the test sockets alone can use up the whole limit.
Running ulimit -n 256 before your test fixes the problem.
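To make the EMFILE failure concrete, here is a minimal sketch that reproduces it in isolation. It is not taken from bbtest-net. The helper name errno_when_fds_run_out and the use of /dev/null instead of real sockets are assumptions made for illustration; any descriptor, socket or file, counts against the same RLIMIT_NOFILE limit.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_PROBE 1024  /* upper bound on descriptors this sketch will open */

/*
 * Hypothetical helper: lower the soft descriptor limit to `soft`, open
 * /dev/null repeatedly until open() fails, then restore the old limit.
 * Returns the errno of the failing open(), 0 if MAX_PROBE opens all
 * succeeded, or -1 if the limit could not be read or changed.
 */
int errno_when_fds_run_out(rlim_t soft)
{
    struct rlimit saved, lowered;
    int fds[MAX_PROBE];
    int n = 0, fd, failure = 0;

    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;

    /* The soft limit may never exceed the hard limit. */
    lowered = saved;
    lowered.rlim_cur = (soft > saved.rlim_max) ? saved.rlim_max : soft;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return -1;

    while (n < MAX_PROBE) {
        fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            failure = errno;   /* expected: EMFILE once the limit is reached */
            break;
        }
        fds[n++] = fd;
    }

    /* Clean up and put the original limit back. */
    while (n > 0)
        close(fds[--n]);
    setrlimit(RLIMIT_NOFILE, &saved);

    return failure;
}
```

Calling errno_when_fds_run_out(64) mimics the NetBSD default mentioned above: the process runs out of descriptors well before MAX_PROBE opens, and the failing call reports EMFILE, which is the error bbtest-net logs as "Cannot get socket - EMFILE".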

That can be fixed by running ulimit -n in the startup script, but at
that point we have no idea how many file descriptors will be required,
so we would have to raise the limit as high as possible and hope it is
enough. The other fix is to raise the limit in the C program that
consumes a lot of file descriptors. If you need fdmax descriptors:

        /* Bump the file descriptor soft limit to fdmax.
         * Needs <sys/resource.h> (getrlimit/setrlimit) and <err.h> (err/errx). */
        struct rlimit limit;

        if (getrlimit(RLIMIT_NOFILE, &limit) != 0)
                err(1, "getrlimit failed");

        /* The soft limit cannot be raised above the hard limit */
        if (limit.rlim_max < fdmax)
                errx(1, "File descriptor hard limit hit");

        limit.rlim_cur = fdmax;
        if (setrlimit(RLIMIT_NOFILE, &limit) != 0)
                err(1, "setrlimit failed");
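A variant of the same idea, written as a self-contained function, might look like the sketch below. It is only an illustration, not code from hobbit: the name raise_fd_limit is hypothetical, and unlike the fragment above it clamps the request to the hard limit instead of exiting, so the caller can decide what to do (for example, lower --concurrency) when it gets fewer descriptors than it asked for.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: make sure the soft RLIMIT_NOFILE limit is at least
 * `want`, as far as the hard limit allows. Returns the soft limit in
 * effect afterwards, or 0 if getrlimit()/setrlimit() failed.
 */
rlim_t raise_fd_limit(rlim_t want)
{
    struct rlimit limit;

    if (getrlimit(RLIMIT_NOFILE, &limit) != 0)
        return 0;

    /* Nothing to do if the current soft limit already covers the request. */
    if (limit.rlim_cur >= want)
        return limit.rlim_cur;

    /* An unprivileged process can raise the soft limit only up to the hard limit. */
    limit.rlim_cur = (want > limit.rlim_max) ? limit.rlim_max : want;

    if (setrlimit(RLIMIT_NOFILE, &limit) != 0)
        return 0;

    return limit.rlim_cur;
}
```

A caller would invoke raise_fd_limit(fdmax) once at startup and compare the return value with fdmax; a smaller value means the hard limit is the bottleneck and has to be raised outside the program.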

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
user-69030d3a16d0@xymon.invalid