Xymon Mailing List Archive

Odd named errors, perhaps related to Xymon or Devmon?

2 messages in this thread

list Simeon Berkley · Mon, 12 Sep 2011 11:49:38 -0400 ·
We've been getting an average of 40+/day of these messages from the
caching nameserver process on one of our Xymon/Devmon servers:

 named[N]: sockmgr 0xN: maximum number of FD events (64) received

This server is running bind-9.3.6-16.P1 (old, I know, but it's the
latest for the distro we are using). I was wondering if anyone else
out there was seeing the same errors? I'm making an assumption that
Xymon/Devmon (or one of our ext scripts) might be inducing this
condition, since this server doesn't have any other processes on it
that would be doing a large number of DNS lookups. This server has
over 600 entries in hosts.cfg and 92 entries in the Devmon hosts.db. I
have tried reproducing the problem by making a large number of DNS
lookups as simultaneously as possible, but with no luck. The xymonnet
status only lists 42 failed DNS lookups out of 797 calls to
dnsresolve. Any suggestions on how to find out what specific process
is causing named to make periodic high demands for sockets?

-- 
S i m e o n  B e r k l e y

Systems Engineer
McClatchy Interactive
phone:  XXX-XXX-XXXX
fax:    XXX-XXX-XXXX
mobile: XXX-XXX-XXXX
e-mail: user-7650b51cc6e3@xymon.invalid
AIM:    sberkleymi
www.mcclatchyinteractive.com
list Jeremy Laidman · Tue, 13 Sep 2011 12:19:58 +1000 ·
On Tue, Sep 13, 2011 at 1:49 AM, Berkley, Simeon <user-7650b51cc6e3@xymon.invalid> wrote:
quoted from Simeon Berkley
We've been getting an average of 40+/day of these messages from the
caching nameserver process on one of our Xymon/Devmon servers:

 named[N]: sockmgr 0xN: maximum number of FD events (64) received
This is a known condition (seems to be more common on Solaris) that you
shouldn't need to worry about.  You can get rid of the message by
recompiling BIND, if you're so inclined.

It's related to the way BIND asks the OS for any sockets with available
data.  BIND uses epoll_wait() to fill a buffer of sockets with data, but it
gives epoll_wait() a maximum number of sockets to return, so as to avoid
buffer overflow.  If the number of sockets with data is higher than the
compiled maximum (64 by default), BIND logs the message, handles what's
returned from epoll_wait() and then loops around again.  The message does
not indicate any lost data.
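
A minimal sketch of that polling pattern, for illustration only. This is not BIND's code: it uses Python's select.epoll (Linux only), local socket pairs instead of DNS sockets, and a hypothetical limit of 4 in place of BIND's 64. It shows that when more sockets are ready than the limit allows, the full batch is handled and the rest are picked up on the next pass.

```python
import select
import socket

MAX_EVENTS = 4  # hypothetical stand-in for BIND's compiled-in limit of 64


def drain_ready_sockets(readers, max_events=MAX_EVENTS):
    """Poll until no socket has data left.

    Returns (payloads, full_batches), where full_batches counts how often
    the poll returned exactly max_events, i.e. the condition under which
    BIND would log its "maximum number of FD events" message.
    """
    ep = select.epoll()
    by_fd = {}
    for s in readers:
        s.setblocking(False)
        ep.register(s.fileno(), select.EPOLLIN)
        by_fd[s.fileno()] = s

    payloads = []
    full_batches = 0
    try:
        while True:
            # timeout=0: return immediately; at most max_events are reported
            events = ep.poll(0, max_events)
            if not events:
                break
            if len(events) == max_events:
                full_batches += 1
            for fd, _mask in events:
                payloads.append(by_fd[fd].recv(4096))
    finally:
        ep.close()
    return payloads, full_batches


def demo(n_sockets=10):
    """Make n_sockets readable at once, then drain them in capped batches."""
    pairs = [socket.socketpair() for _ in range(n_sockets)]
    for i, (writer, _reader) in enumerate(pairs):
        writer.send(b"msg%d" % i)
    result = drain_ready_sockets([reader for _writer, reader in pairs])
    for writer, reader in pairs:
        writer.close()
        reader.close()
    return result
```

Calling demo(10) makes ten sockets readable before the first poll, so the cap is hit at least once, yet every payload is still read on a later pass.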
quoted from Simeon Berkley

I have tried reproducing the problem by making a large number of DNS
lookups as simultaneously as possible, but with no luck.

Have you tried using "queryperf" from the BIND source?
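
To build an input file for that kind of test, the monitored hostnames can be pulled out of hosts.cfg. The sketch below rests on two assumptions: that host lines in hosts.cfg begin with an IPv4 address followed by the hostname, and that queryperf reads one "name type" pair per line. The function name is made up for this example.

```python
import re

# Assumed hosts.cfg host-line shape: "<IPv4 address> <hostname> [# tags...]"
HOST_LINE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)")


def queryperf_lines(hosts_cfg_text, rtype="A"):
    """Return one 'name type' line per distinct hostname, in file order."""
    names = {}
    for line in hosts_cfg_text.splitlines():
        match = HOST_LINE.match(line)
        if match:
            # dict keys keep insertion order and drop duplicates
            names.setdefault(match.group(2), None)
    return ["%s %s" % (name, rtype) for name in names]
```

Writing "\n".join(queryperf_lines(open("hosts.cfg").read())) to a file gives a data set that roughly mirrors what xymonnet would look up.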
quoted from Simeon Berkley

The xymonnet
status only lists 42 failed DNS lookups out of 797 calls to
dnsresolve. Any suggestions on how to find out what specific process
is causing named to make periodic high demands for sockets?
Enable query logging and find out what queried domains are highest during
the interval.  This might give you a hint.
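
Query logging can be toggled at runtime with "rndc querylog". A rough way to tally the result is sketched below. It assumes each logged query contains the text "query: <name> IN <type>"; the exact line format differs between BIND versions, so the pattern may need adjusting. The function name is made up for this example.

```python
import re
from collections import Counter

# Assumed fragment of a BIND query-log line: "query: www.example.com IN A"
QUERY = re.compile(r"query: (\S+) IN (\S+)")


def top_queries(log_lines, n=10):
    """Return the n most frequently queried names as (name, count) pairs."""
    counts = Counter()
    for line in log_lines:
        match = QUERY.search(line)
        if match:
            # normalise case and any trailing dot so variants count together
            counts[match.group(1).lower().rstrip(".")] += 1
    return counts.most_common(n)
```

Feeding it only the log lines from the minutes around each "maximum number of FD events" message should show which names were being requested in bulk at that time.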

If you're only using BIND as a local caching nameserver, you could consider
instead using nscd, the name service caching daemon.  For most applications,
this is a more lightweight caching resolver.  It's not suitable for
everyone, but it might be just fine for your purpose, and you would no
longer see the error messages.

Cheers
Jeremy