Xymon Mailing List Archive

Modernizing the DNS check

11 messages in this thread

list Mark Felder · Tue, 07 Jan 2014 14:56:12 -0600 ·
Is there any hope of enhancing the DNS check capability beyond its
current functionality? It would be nice if it could detect all the NS
for the domain you're monitoring to compare the SOA serial of all the NS
servers and go red if they're not in sync.
list Jeremy Laidman · Wed, 8 Jan 2014 12:23:32 +1100 ·
Mark

I think more DNS checks would be really useful for many, but I would say
that we'd be going down a rabbit hole chasing this.  The DNS check you've
described is worthwhile to do for many people (myself included), but is
only one of many that would need to be done to ensure that a name or domain
is resolvable.

For example, should the same checks be done for the parent zone(s)?  Should
we check the WHOIS record for impending zone expiry date?  Should we check
that there is more than one NS record?  Should we check that the NS records
don't all point at the same IP addresses?  For high-turn-over (e.g. dynamic)
zones, the masters nameservers might only rarely be in sync, or the serial
number might typically change before all of the SOA lookups are complete.
 What about when there's a stealth master that can't be queried?  What
about reporting on slave zones that are about to expire?  Or zones that have
semantic errors such as MX records that refer to CNAME records, or host
records with underscores, or CNAME loops?  Should we be checking DNSSEC
signatures?

Hmm, that list turned into a bit of a rant, really.  Sorry.  You can
probably guess that I think about this stuff a fair bit, and many of the
things I've listed are more "niche" than others, but still.

For each possible test anyone might want to include, each installation
might need different ways of reporting and/or recording statistics, and so
it would get complex very quickly.  Do you report a yellow if only 3 out of
4 NS servers are the same, or 7 out of 8?  If the master's serial number
somehow goes backwards, do we show seven servers wrong or is it just one?
 Do you assume that the master is in the MNAME field, or would you get the
option to override?  If two hosts have different values for the MNAME
field, which do you consider master?  Or in this case, do you care?
 Also, which host(s) would you report the status against?  Do you have to
create hosts.cfg entries for every NS, and then maintain that list by
tracking the NS records as they change over time, or do you create a
pseudo-host for each domain, or some of each?

Woops, there I go ranting again, sorry.

Such complexity and flexibility is better implemented outside Xymon, to
keep the Xymon core as simple and easy to maintain as possible.

I think the best solution is for each installer to decide on their own
detection and reporting requirements, and create or install ext scripts to
suit each case.  In fact, I'm surprised there aren't any on Xymonton.org
already, but that's where I would expect such code to reside.  I'd be happy
to assist with developing ext scripts for enhanced DNS checks.

J
list Mark Felder · Tue, 7 Jan 2014 20:44:46 -0600 ·
quoted from Jeremy Laidman
On Jan 7, 2014, at 19:23, Jeremy Laidman <user-71895fb2e44c@xymon.invalid> wrote:
Mark

I think more DNS checks would be really useful for many, but I would say that we'd be going down a rabbit hole chasing this.  The DNS check you've described is worthwhile to do for many people (myself included), but is only one of many that would need to be done to ensure that a name or domain is resolvable.

For example, should the same checks be done for the parent zone(s)?
Why are you monitoring something so far out of your control? I don't monitor the ROOT servers; I trust my friends at Verisign & co to handle that for me.
 Should we check the WHOIS record for impending zone expiry date?
No, and doing so is ignorant; you can easily get banned from WHOIS lookups for abusing it. Use the registrar's APIs.
 Should we check that there is more than one NS record?  Should we check that the NS records don't all point at the same IP addresses?
The question here is "Are the publicly accessible NS servers in a consistent functional state?". The goal is not to validate the data.
 For high-turn-over (eg dynamic) zones, the masters nameservers might only rarely be in sync,
If you expect it to rarely be in sync why would you try to monitor for that?
or the serial number might typically change before all of the SOA lookups are complete.
Of course you'd expect the race condition where the check happens while a change is happening. Waiting for another check is a reasonable way to avoid a false positive.
 What about when there's a stealth master that can't be queried?
I'm not monitoring from an untrusted network; I own these NS servers and can certainly get to my stealth master from my monitoring infrastructure. Also, the theme is "Are the publicly accessible NS servers in a consistent functional state?"
 What about reporting on slave zones that are about to expire?
I could see that as useful, but when the query starts failing it will go red. This would be really easy to do though...
 Or zones that have semantic errors such as MX records that refer to CNAME records, or host records with underscores, or CNAME loops?
Again, we're not validating the data just making sure it can be served correctly which mostly amounts to no errors and the serials not being out of whack. This isn't the proper place for those kinds of checks.
 Should we be checking DNSSEC signatures?
No. I wouldn't trust Xymon's implementation of that anyway; that's best handled by your OS's DNS stack. The check will fail if the signature is incorrect because the entire lookup will fail.
Hmm, that list turned into a bit of a rant, really.  Sorry.  You can probably guess that I think about this stuff a fair bit, and many of the things I've listed are more "niche" than others, but still.
I'd say most are niche :-/
For each possible test anyone might want to include, each installation might need different ways of reporting and/or recording statistics, and so it would get complex very quickly.  Do you report a yellow if only 3 out of 4 NS servers are the same, or 7 out of 8?
If any NS are not at the same serial there should be concern. You have no control over which NS the client chooses. (side note: 7 NS is the max recommended by RFC 1912 anyway)
 If the master's serial number somehow goes backwards, do we show seven servers wrong or is it just one?
Alert will happen because they're not in sync anyway. This is a problem for a human familiar with the environment to figure out once they've been informed.
 Do you assume that the master is in the MNAME field, or would you get the option to override?
"Are the publicly accessible NS servers in a consistent functional state?"
 If two hosts have different values for the MNAME field, which do you consider master?  Or in this case, do you care?
How is this even happening? This is not a multi-master infrastructure. If the MNAME is different the serial most certainly is as well or you've picked up AXFR errors in the logs, etc.
quoted from Jeremy Laidman
 Also, which host(s) would you report the status against?
 Do you have to create hosts.cfg entries for every NS, and then maintain that list by tracking the NS records as they change over time, or do you create a pseudo-host for each domain, or some of each?
I don't care. I'd probably end up doing "127.0.0.1 foo.com # noping fancydnscheck"

The error is telling me there's something wrong with the infrastructure and most likely will tell you which NS is the problem. I'm not interested in tying the event to a specific NS server hosts.cfg entry in Xymon because it's possible that there isn't one.
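A hosts.cfg line like the one above carries the domain in the host-name field and a custom tag after the "#". As a minimal sketch, assuming that layout, a script could pick out the tagged domains as shown here. The helper name "hosts_with_tag" is hypothetical, and "fancydnscheck" is simply the tag from the example.

```sh
#!/bin/sh
# Hypothetical helper: read hosts.cfg-style lines on stdin and print the
# host name (second field) of every line whose tag list contains the tag $1.
hosts_with_tag() {
    awk -v tag="$1" '
        /^[ \t]*#/ { next }    # skip lines that are entirely comments
        {
            in_tags = 0
            for (i = 3; i <= NF; i++) {
                if ($i == "#") { in_tags = 1; continue }
                if (in_tags && $i == tag) { print $2; break }
            }
        }'
}

# Example:
# printf '127.0.0.1 foo.com # noping fancydnscheck\n' | hosts_with_tag fancydnscheck
```

Only words after the "#" are treated as tags, so a host that merely has the tag text in its name is not selected.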
list Jeremy Laidman · Wed, 8 Jan 2014 16:07:40 +1100 ·
Yes, lots of good rebuttals there.  I think I have to agree that most of my
proposed checks are quite niche, but the one you've proposed is probably
the least niche of the lot.  So if a sizeable number of Xymonsters could
make use of it without needing too much configuration (hence complex parsing
code), then it's worthy of inclusion.
quoted from Mark Felder

On 8 January 2014 13:44, Mark Felder <user-db141d317836@xymon.invalid> wrote:
The question here is "Are the publicly accessible NS servers in a
consistent functional state?". The goal is not to validate the data.

So, I suppose the "object" you're trying to watch is the
"NS" consistency state of the zone.  So yes, you'd alert against the zone
name such as what you've shown in your hosts.cfg example.

Although I can't speak for Henrik's design model, I do think that xymonnet
is not geared up to monitor this type of object, and instead it expects its
objects to all have either one IP address, or a name that resolves to an IP
address.

So really, I think the answer to your original question is that the DNS
check capability probably can not be easily enhanced to check something
that doesn't look like a host.

But there's no reason I can think of that some zone check code couldn't be
added into some other part of Xymon.  But probably not as an extension to
the existing DNS check.

Many of the internal checks seem to have been modelled on ext scripts.

J
list Henrik Størner · Wed, 08 Jan 2014 09:39:36 +0100 ·
quoted from Jeremy Laidman
Den 08-01-2014 06:07, Jeremy Laidman skrev:
On 8 January 2014 13:44, Mark Felder <user-db141d317836@xymon.invalid> wrote:

    The question here is "Are the publicly accessible NS servers in a
    consistent functional state?". The goal is not to validate the data.


So, I suppose the "object" you're trying to watch is the
"NS" consistency state of the zone.  So yes, you'd alert against the
zone name such as what you've shown in your hosts.cfg example.
Testing this would essentially do

    dig example.com ns
    <grab the list of dns servers>
    dig @ns1 example.com soa
    dig @ns2 example.com soa
    <compare soa records to see if they are identical>

Xymon can do the DNS lookups, all that is needed is to cook up the 
necessary data analysis.

I think this should be a separate test from the normal "dns" column?


Regards,
Henrik
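The outline above can be sketched in shell. This is only a sketch under assumptions: it assumes "dig" is installed and that "dig +short ... SOA" prints the serial as the third field. The function names "soa_serial" and "serials_in_sync" are hypothetical and not part of Xymon.

```sh
#!/bin/sh
# Hypothetical helper: print the SOA serial that one nameserver returns.
# $1 = zone, $2 = nameserver to query
# "dig +short ... SOA" prints: mname rname serial refresh retry expire minimum
soa_serial() {
    dig +short "@$2" "$1" SOA | awk 'NR == 1 { print $3 }'
}

# Hypothetical helper: read "server serial" lines on stdin. Succeeds only
# when there is at least one line, no serial is missing, and all serials
# are identical.
serials_in_sync() {
    awk 'NF < 2 { bad = 1 }
         { seen[$2] = 1 }
         END { n = 0; for (s in seen) n++; exit (bad || n != 1) }'
}

# Example wiring (performs live DNS queries, so it is left commented out):
# for ns in $(dig +short example.com NS); do
#     echo "$ns $(soa_serial example.com "$ns")"
# done | serials_in_sync && echo green || echo red
```

Treating a missing serial as a failure means a nameserver that does not answer turns the result red instead of being silently ignored.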
list Mark Felder · Wed, 08 Jan 2014 06:48:51 -0600 ·
quoted from Henrik Størner

On Wed, Jan 8, 2014, at 2:39, Henrik Størner wrote:
Den 08-01-2014 06:07, Jeremy Laidman skrev:
On 8 January 2014 13:44, Mark Felder <user-db141d317836@xymon.invalid> wrote:

    The question here is "Are the publicly accessible NS servers in a
    consistent functional state?". The goal is not to validate the data.


So, I suppose the "object" you're trying to watch is the
"NS" consistency state of the zone.  So yes, you'd alert against the
zone name such as what you've shown in your hosts.cfg example.
Testing this would essentially do

    dig example.com ns
    <grab the list of dns servers>
    dig @ns1 example.com soa
    dig @ns2 example.com soa
    <compare soa records to see if they are identical>

Xymon can do the DNS lookups, all that is needed is to cook up the 
necessary data analysis.

I think this should be a separate test from the normal "dns" column?

That is basically all I need at this point. Anything else would require
a far more clever tool in my opinion. A separate column makes sense as
well. Anything to avoid the unnecessary addition of more monitoring
software that will get neglected. Right now we monitor nearly everything
we need in Xymon and the thought of having to deploy Nagios or something
else that implements this functionality is far from ideal. The fewer
systems I have to train people on the better.
list Mark Felder · Wed, 08 Jan 2014 06:54:18 -0600 ·
Thanks for the feedback on everything. It's nice to know I'm not talking
to myself on this list :-) You've also made me consider a few scenarios
I previously didn't carefully consider, but I have some other ideas on
how to prevent those errors by enhancing the procedure used to change
DNS entries.
list Gautier Begin · Wed, 8 Jan 2014 13:57:09 +0100 ·
Hello,

I have installed version 4.3.12 of Xymon after having used 4.3.0, and everything works well except for one strange behaviour:
When displaying a window with an RRD graph, it takes a long time, and in the Apache log I get the following message:
        Sendto failed: Destination address required


Any idea ?

Regards,

Gautier BEGIN

System Tools Team Lead
CACEIS and APERAM accounts
CSC Computer Sciences Luxembourg S.A.
12D Impasse Drosbach
L-1882 Luxembourg

Global Outsourcing Service | p:+352 24 834 276 | m:+352 621 229 172 | user-083785ae1711@xymon.invalid | www.csc.com


list Adam Goryachev · Fri, 10 Jan 2014 11:02:38 +1100 ·
quoted from Henrik Størner
On 08/01/14 19:39, Henrik Størner wrote:
Den 08-01-2014 06:07, Jeremy Laidman skrev:
On 8 January 2014 13:44, Mark Felder <user-db141d317836@xymon.invalid> wrote:

    The question here is "Are the publicly accessible NS servers in a
    consistent functional state?". The goal is not to validate the data.


So, I suppose the "object" you're trying to watch is the
"NS" consistency state of the zone.  So yes, you'd alert against the
zone name such as what you've shown in your hosts.cfg example.
Testing this would essentially do

   dig example.com ns
   <grab the list of dns servers>
   dig @ns1 example.com soa
   dig @ns2 example.com soa
   <compare soa records to see if they are identical>

Xymon can do the DNS lookups, all that is needed is to cook up the
necessary data analysis.

I think this should be a separate test from the normal "dns" column?
IMHO, this really should be an ext test. In a couple of minutes, this might almost suffice with some small extra wrappers:
/usr/bin/dnsqr ns mydomain.com | awk '/^answer:/ { print $5 }' |
while read server; do
    /usr/bin/dnsq soa mydomain.com "$server" | awk '/^answer:/ { print $7 }'
done | sort -u | wc -l

If the answer is 1, then green, anything else is red.

Of course, I've only tested this against my own domain on my NS, and it worked, but your results may vary. You should add a whole bunch of error checking to ensure that each lookup is successful, returns valid results, etc...
The main reason for doing this was to simply see how hard it would be to complete the task using the djb tools as they are claimed to be easier to use for scripting. I didn't actually attempt this with the bind tools, but past experience suggests it would be more difficult to parse the correct answers/etc...

Sharing was just in case it is useful to others...

Regards,
Adam
-- 
Adam Goryachev Website Managers www.websitemanagers.com.au
list Henrik Størner · Fri, 10 Jan 2014 07:08:43 +0100 ·
quoted from Adam Goryachev
Den 10-01-2014 01:02, Adam Goryachev skrev:
On 08/01/14 19:39, Henrik Størner wrote:
I think this should be a separate test from the normal "dns" column?
IMHO, this really should be an ext test.
You mean, like this?


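#!/bin/sh
# Compare the SOA record served by every nameserver of a domain.

if test $# -lt 1
then
         echo "usage: $0 domain" >&2
         exit 1
fi

DOMAIN=$1; shift

# Names of the nameservers listed in the NS records
SERVERS=$(host -tNS "$DOMAIN" | awk '/name server/ {print $4}')

# Scratch file for the collected records; start from an empty file
TMPFILE=/tmp/soarec-$DOMAIN.$$
rm -f "$TMPFILE"
: > "$TMPFILE"

FAILED=0
for H in $SERVERS
do
         SOAREC=$(host -tSOA "$DOMAIN" "$H" | grep SOA)
         if test $? -eq 0
         then
                 echo "$H: $SOAREC" >> "$TMPFILE"
         else
                 # A nameserver that returns no SOA record is a failure too
                 FAILED=1
                 echo "$H: no SOA record returned" >> "$TMPFILE"
         fi
done

# Drop the "server:" prefix and count the distinct SOA records
RECCOUNT=$(cut -d: -f2- "$TMPFILE" | sort -u | wc -l)

if test $FAILED -eq 0 && test $RECCOUNT -eq 1
then
         echo "green: All SOA records match"
else
         echo "red: SOA records differ or a nameserver did not answer"
fi

cat "$TMPFILE"
rm -f "$TMPFILE"

exit 0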


Modifying this to actually send a status report is left as an exercise for the reader :-) While you're at it, make it use "xymongrep" to pick up the domains you want to test.


Regards,
Henrik
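One possible take on that exercise, offered as a sketch and not a finished ext script: the helper below builds the text of a Xymon "status" message, and the commented lines show how it might be combined with "xymongrep". The function name "status_message", the column name "soa" and the tag "fancydnscheck" are assumptions for illustration. XYMON, XYMSRV and XYMONHOME are the environment variables Xymon normally provides to ext scripts.

```sh
#!/bin/sh
# Hypothetical helper: build the text of a Xymon "status" message.
# $1 = host name as listed in hosts.cfg, $2 = column, $3 = color, $4 = summary
status_message() {
    # Status messages use commas in place of the dots in the host name.
    host=$(printf '%s' "$1" | tr . ,)
    printf 'status %s.%s %s %s %s\n' "$host" "$2" "$3" "$(date)" "$4"
}

# Example wiring inside an ext script launched by Xymon (left commented
# out because it needs a running Xymon server):
# "$XYMONHOME/bin/xymongrep" fancydnscheck | while read ip domain rest; do
#     "$XYMON" "$XYMSRV" "$(status_message "$domain" soa green "All SOA records match")"
# done
```

In a real script the color and summary would come from the SOA comparison, not from fixed strings.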
list Jeremy Laidman · Tue, 14 Jan 2014 16:16:30 +1100 ·
On 8 January 2014 23:57, Gautier Begin <user-083785ae1711@xymon.invalid> wrote:
I have installed the 4.3.12 version of XYMON after been used to work with
4.3.0 and all working well, except that strange behaviour:
When displaying a windows with a RRD graph, it takes a long time, and in
the Apache log I get the following message:
        *Sendto failed: Destination address required*
http://lists.xymon.com/archive/2012-November/035918.html

J