295 lines
15 KiB
Plaintext
295 lines
15 KiB
Plaintext
|
Requirements for Recursive Caching Resolver
|
||
|
(a.k.a. Treeshrew, Unbound-C)
|
||
|
By W.C.A. Wijngaards, NLnet Labs, October 2006.
|
||
|
|
||
|
Contents
|
||
|
1. Introduction
|
||
|
2. History
|
||
|
3. Goals
|
||
|
4. Non-Goals
|
||
|
|
||
|
|
||
|
1. Introduction
|
||
|
---------------
|
||
|
This is the requirements document for a DNS name server and aims to
|
||
|
document the goals and non-goals of the project. The DNS (the Domain
|
||
|
Name System) is a global, replicated database that uses a hierarchical
|
||
|
structure for queries.
|
||
|
|
||
|
Data in the DNS is stored in Resource Record sets (RR sets), and has a
|
||
|
time to live (TTL). During this time the data can be cached. It is
|
||
|
thus useful to cache data to speed up future lookups. A server that
|
||
|
looks up data in the DNS for clients and caches previous answers to
|
||
|
speed up processing is called a caching, recursive nameserver.
|
||
|
|
||
|
This project aims to develop such a nameserver in modular components, so
|
||
|
that also DNSSEC (secure DNS) validation and stub-resolvers (that do not
|
||
|
run as a server, but a linked into an application) are easily possible.
|
||
|
|
||
|
The main components are the Validator that validates the security
|
||
|
fingerprints on data sets, the Iterator that sends queries to the
|
||
|
hierarchical DNS servers that own the data and the Cache that stores
|
||
|
data from previous queries. The networking and query management code
|
||
|
then interface with the modules to perform the necessary processing.
|
||
|
|
||
|
In Section 2 the origins of the Unbound project are documented. Section
|
||
|
3 lists the goals, while Section 4 lists the explicit non-goals of the
|
||
|
project. Section 5 discusses choices made during development.
|
||
|
|
||
|
|
||
|
2. History
|
||
|
----------
|
||
|
The unbound resolver project started by Bill Manning, David Blacka, and
|
||
|
Matt Larson (from the University of California and from Verisign), that
|
||
|
created a Java based prototype resolver called Unbound. The basic
|
||
|
design decisions of clean modules was executed.
|
||
|
|
||
|
The Java prototype worked very well, with contributions from Geoff
|
||
|
Sisson and Roy Arends from Nominet. Around 2006 the idea came to create
|
||
|
a full-fledged C implementation ready for deployed use. NLnet Labs
|
||
|
volunteered to write this implementation.
|
||
|
|
||
|
|
||
|
3. Goals
|
||
|
--------
|
||
|
o A validating recursive DNS resolver.
|
||
|
o Code diversity in the DNS resolver monoculture.
|
||
|
o Drop-in replacement for BIND apart from config.
|
||
|
o DNSSEC support.
|
||
|
o Fully RFC compliant.
|
||
|
o High performance
|
||
|
* even with validation.
|
||
|
o Used as
|
||
|
* stub resolver.
|
||
|
* full caching name server.
|
||
|
* resolver library.
|
||
|
o Elegant design of validator, resolver, cache modules.
|
||
|
* provide the ability to pick and choose modules.
|
||
|
o Robust.
|
||
|
o In C, open source: The BSD license.
|
||
|
o Highly portable, targets include modern Unix systems, such as *BSD,
|
||
|
solaris, linux, and maybe also the windows platform.
|
||
|
o Smallest as possible component that does the job.
|
||
|
o Stub-zones can be configured (local data or AS112 zones).
|
||
|
|
||
|
|
||
|
4. Non-Goals
|
||
|
------------
|
||
|
o An authoritative name server.
|
||
|
o Too many Features.
|
||
|
|
||
|
|
||
|
5. Choices
|
||
|
----------
|
||
|
o rfc2181 decourages duplicates RRs in RRsets. unbound does not create
|
||
|
duplicates, but when presented with duplicates on the wire from the
|
||
|
authoritative servers, does not perform duplicate removal.
|
||
|
It does do some rrsig duplicate removal, in the msgparser, for dnssec qtype
|
||
|
rrsig and any, because of special rrsig processing in the msgparser.
|
||
|
o The harden-glue feature, when yes all out of zone glue is deleted, when
|
||
|
no out of zone glue is used for further resolving, is more complicated
|
||
|
than that, see below.
|
||
|
Main points:
|
||
|
* rfc2182 trust handling is used.
|
||
|
* data is let through only in very specific cases
|
||
|
* spoofability remains possible.
|
||
|
Not all glue is let through (despite the name of the option). Only glue
|
||
|
which is present in a delegation, of type A and AAAA, where the name is
|
||
|
present in the NS record in the authority section is let through.
|
||
|
The glue that is let through is stored in the cache (marked as 'from the
|
||
|
additional section'). And will then be used for sending queries to. It
|
||
|
will not be present in the reply to the client (if RD is off).
|
||
|
A direct query for that name will attempt to get a msg into the message
|
||
|
cache. Since A and AAAA queries are not synthesized by the unbound cache,
|
||
|
this query will be (eventually) sent to the authoritative server and its
|
||
|
answer will be put in the cache, marked as 'from the answer section' and
|
||
|
thus remove the 'from the additional section' data, and this record is
|
||
|
returned to the client.
|
||
|
The message has a TTL smaller or equal to the TTL of the answer RR.
|
||
|
If the cache memory is low; the answer RR may be dropped, and a glue
|
||
|
RR may be inserted, within the message TTL time, and thus return the
|
||
|
spoofed glue to a client. When the message expires, it is refetched and
|
||
|
the cached RR is updated with the correct content.
|
||
|
The server can be spoofed by getting it to visit a especially prepared
|
||
|
domain. This domain then inserts an address for another authoritative
|
||
|
server into the cache, when visiting that other domain, this address may
|
||
|
then be used to send queries to. And fake answers may be returned.
|
||
|
If the other domain is signed by DNSSEC, the fakes will be detected.
|
||
|
|
||
|
In summary, the harden glue feature presents a security risk if
|
||
|
disabled. Disabling the feature leads to possible better performance
|
||
|
as more glue is present for the recursive service to use. The feature
|
||
|
is implemented so as to minimise the security risk, while trying to
|
||
|
keep this performance gain.
|
||
|
o The method by which dnssec-lameness is detected is not secure. DNSSEC lame
|
||
|
is when a server has the zone in question, but lacks dnssec data, such as
|
||
|
signatures. The method to detect dnssec lameness looks at nonvalidated
|
||
|
data from the parent of a zone. This can be used, by spoofing the parent,
|
||
|
to create a false sense of dnssec-lameness in the child, or a false sense
|
||
|
or dnssec-non-lameness in the child. The first results in the server marked
|
||
|
lame, and not used for 900 seconds, and the second will result in a
|
||
|
validator failure (SERVFAIL again), when the query is validated later on.
|
||
|
|
||
|
Concluding, a spoof of the parent delegation can be used for many cases
|
||
|
of denial of service. I.e. a completely different NS set could be returned,
|
||
|
or the information withheld. All of these alterations can be caught by
|
||
|
the validator if the parent is signed, and result in 900 seconds bogus.
|
||
|
The dnssec-lameness detection is used to detect operator failures,
|
||
|
before the validator will properly verify the messages.
|
||
|
|
||
|
Also for zones for which no chain of trust exists, but a DS is given by the
|
||
|
parent, dnssec-lameness detection enables. This delivers dnssec to our
|
||
|
clients when possible (for client validators).
|
||
|
|
||
|
The following issue needs to be resolved:
|
||
|
a server that serves both a parent and child zone, where
|
||
|
parent is signed, but child is not. The server must not be marked
|
||
|
lame for the parent zone, because the child answer is not signed.
|
||
|
Instead of a false positive, we want false negatives; failure to
|
||
|
detect dnssec-lameness is less of a problem than marking honest
|
||
|
servers lame. dnssec-lameness is a config error and deserves the trouble.
|
||
|
So, only messages that identify the zone are used to mark the zone
|
||
|
lame. The zone is identified by SOA or NS RRsets in the answer/auth.
|
||
|
That includes almost all negative responses and also A, AAAA qtypes.
|
||
|
That would be most responses from servers.
|
||
|
For referrals, delegations that add a single label can be checked to be
|
||
|
from their zone, this covers most delegation-centric zones.
|
||
|
|
||
|
So possibly, for complicated setups, with multiple (parent-child) zones
|
||
|
on a server, dnssec-lameness detection does not work - no dnssec-lameness
|
||
|
is detected. Instead the zone that is dnssec-lame becomes bogus.
|
||
|
|
||
|
o authority features.
|
||
|
This is a recursive server, and authority features are out of scope.
|
||
|
However, some authority features are expected in a recursor. Things like
|
||
|
localhost, reverse lookup for 127.0.0.1, or blocking AS112 traffic.
|
||
|
Also redirection of domain names with fixed data is needed by service
|
||
|
providers. Limited support is added specifically to address this.
|
||
|
|
||
|
Adding full authority support, requires much more code, and more complex
|
||
|
maintenance.
|
||
|
|
||
|
The limited support allows adding some static data (for localhost and so),
|
||
|
and to respond with a fixed rcode (NXDOMAIN) for domains (such as AS112).
|
||
|
|
||
|
You can put authority data on a separate server, and set the server in
|
||
|
unbound.conf as stub for those zones, this allows clients to access data
|
||
|
from the server without making unbound authoritative for the zones.
|
||
|
|
||
|
o the access control denies queries before any other processing.
|
||
|
This denies queries that are not authoritative, or version.bind, or any.
|
||
|
And thus prevents cache-snooping (denied hosts cannot make non-recursive
|
||
|
queries and get answers from the cache).
|
||
|
|
||
|
o If a client makes a query without RD bit, in the case of a returned
|
||
|
message from cache which is:
|
||
|
answer section: empty
|
||
|
auth section: NS record present, no SOA record, no DS record,
|
||
|
maybe NSEC or NSEC3 records present.
|
||
|
additional: A records or other relevant records.
|
||
|
A SOA record would indicate that this was a NODATA answer.
|
||
|
A DS records would indicate a referral.
|
||
|
Absence of NS record would indicate a NODATA answer as well.
|
||
|
|
||
|
Then the receiver does not know whether this was a referral
|
||
|
with attempt at no-DS proof) or a nodata answer with attempt
|
||
|
at no-data proof. It could be determined by attempting to prove
|
||
|
either condition; and looking if only one is valid, but both
|
||
|
proofs could be valid, or neither could be valid, which creates
|
||
|
doubt. This case is validated by unbound as a 'referral' which
|
||
|
ascertains that RRSIGs are OK (and not omitted), but does not
|
||
|
check NSEC/NSEC3.
|
||
|
|
||
|
o Case preservation
|
||
|
Unbound preserves the casing received from authority servers as best
|
||
|
as possible. It compresses without case, so case can get lost there.
|
||
|
The casing from the query name is used in preference to the casing
|
||
|
of the authority server. This is the same as BIND. RFC4343 allows either
|
||
|
behaviour.
|
||
|
|
||
|
o Denial of service protection
|
||
|
If many queries are made, and they are made to names for which the
|
||
|
authority servers do not respond, then the requestlist for unbound
|
||
|
fills up fast. This results in denial of service for new queries.
|
||
|
To combat this the first 50% of the requestlist can run to completion.
|
||
|
The last 50% of the requestlist get (200 msec) at least and are replaced
|
||
|
by newer queries when older (LIFO).
|
||
|
When a new query comes in, and a place in the first 50% is available, this
|
||
|
is preferred. Otherwise, it can replace older queries out of the last 50%.
|
||
|
Thus, even long queries get a 50% chance to be resolved. And many 'short'
|
||
|
one or two round-trip resolves can be done in the last 50% of the list.
|
||
|
The timeout can be configured.
|
||
|
|
||
|
o EDNS fallback. Is done according to the EDNS RFC (and update draft-00).
|
||
|
Unbound assumes EDNS 0 support for the first query. Then it can detect
|
||
|
support (if the servers replies) or non-support (on a NOTIMPL or FORMERR).
|
||
|
Some middleboxes drop EDNS 0 queries, mainly when forwarding, not when
|
||
|
routing packets. To detect this, when timeouts keep happening, as the
|
||
|
timeout approached 5-10 seconds, and EDNS status has not been detected yet,
|
||
|
a single probe query is sent. This probe has a sub-second timeout, and
|
||
|
if the server responds (quickly) without EDNS, this is cached for 15 min.
|
||
|
This works very well when detecting an address that you use much - like
|
||
|
a forwarder address - which is where the middleboxes need to be detected.
|
||
|
Otherwise, it results in a 5 second wait time before EDNS timeout is
|
||
|
detected, which is slow but it works at least.
|
||
|
It minimizes the chances of a dropped query making a (DNSSEC) EDNS server
|
||
|
falsely EDNS-nonsupporting, and thus DNSSEC-bogus, works well with
|
||
|
middleboxes, and can detect the occasional authority that drops EDNS.
|
||
|
For some boxes it is necessary to probe for every failing query, a
|
||
|
reassurance that the DNS server does EDNS does not mean that path can
|
||
|
take large DNS answers.
|
||
|
|
||
|
o 0x20 backoff.
|
||
|
The draft describes to back off to the next server, and go through all
|
||
|
servers several times. Unbound goes on get the full list of nameserver
|
||
|
addresses, and then makes 3 * number of addresses queries.
|
||
|
They are sent to a random server, but no one address more than 4 times.
|
||
|
It succeeds if one has 0x20 intact, or else all are equal.
|
||
|
Otherwise, servfail is returned to the client.
|
||
|
|
||
|
o NXDOMAIN and SOA serial numbers.
|
||
|
Unbound keeps TTL values for message formats, and thus rcodes, such
|
||
|
as NXDOMAIN. Also it keeps the latest rrsets in the rrset cache.
|
||
|
So it will faithfully negative cache for the exact TTL as originally
|
||
|
specified for an NXDOMAIN message, but send a newer SOA record if
|
||
|
this has been found in the mean time. In point, this could lead to a
|
||
|
negative cached NXDOMAIN reply with a SOA RR where the serial number
|
||
|
indicates a zone version where this domain is not any longer NXDOMAIN.
|
||
|
These situations become consistent once the original TTL expires.
|
||
|
If the domain is DNSSEC signed, by the way, then NSEC records are
|
||
|
updated more carefully. If one of the NSEC records in an NXDOMAIN is
|
||
|
updated from another query, the NXDOMAIN is dropped from the cache,
|
||
|
and queried for again, so that its proof can be checked again.
|
||
|
|
||
|
o SOA records in negative cached answers for DS queries.
|
||
|
The current unbound code uses a negative cache for queries for type DS.
|
||
|
This speeds up building chains of trust, and uses NSEC and NSEC3
|
||
|
(optout) information to speed up lookups. When used internally,
|
||
|
the bare NSEC(3) information is sufficient, probably picked up from
|
||
|
a referral. When answering to clients, a SOA record is needed for
|
||
|
the correct message format, a SOA record is picked from the cache
|
||
|
(and may not actually match the serial number of the SOA for which the
|
||
|
NSEC and NSEC3 records were obtained) if available otherwise network
|
||
|
queries are performed to get the data.
|
||
|
|
||
|
o Parent and child with different nameserver information.
|
||
|
A misconfiguration that sometimes happens is where the parent and child
|
||
|
have different NS, glue information. The child is authoritative, and
|
||
|
unbound will not trust information from the parent nameservers as the
|
||
|
final answer. To help lookups, unbound will however use the parent-side
|
||
|
version of the glue as a last resort lookup. This resolves lookups for
|
||
|
those misconfigured domains where the servers reported by the parent
|
||
|
are the only ones working, and servers reported by the child do not.
|
||
|
|
||
|
o Failure of validation and probing.
|
||
|
Retries on a validation failure are now 5x to a different nameserver IP
|
||
|
(if possible), and then it gives up, for one name, type, class entry in
|
||
|
the message cache. If a DNSKEY or DS fails in the chain of trust in the
|
||
|
key cache additionally, after the probing, a bad key entry is created that
|
||
|
makes the entire zone bogus for 900 seconds. This is a fixed value at
|
||
|
this time and is conservative in sending probes. It makes the compound
|
||
|
effect of many resolvers less and easier to handle, but penalizes
|
||
|
individual resolvers by having less probes and a longer time before fixes
|
||
|
are picked up.
|
||
|
|