Critical Cache Poisoning Vulnerability in Dnsmasq
lists.thekelleys.org.uk
127 points by westurner 2 days ago
I'm confused. There are no "special characters" in domain names -- they're length-tagged 8-bit clean. Hostnames (as opposed to domain names) are limited by convention to a subset of ASCII, but that shouldn't impact resolver logic.
Which resolvers silently discard (or do anything else weird with) requests whose QNAMEs contain non-hostname characters (such requests aren't "malformed")?
The "special character" thing sounds like a red herring: IIUC, dnsmasq isn't dealing with lost responses correctly, creating a window for a birthday collision attack?
Dnsmasq forwards invalid requests (containing invalid characters) to the resolver. The resolver silently ignores these requests.
However, Dnsmasq continues to wait for a response. The attacker only needs to brute force 32 bits (source port and TxID) to falsify a response and poison the cache.
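To put a rough number on "32 bits", here is a back-of-the-envelope sketch. It assumes every forged reply is an independent uniform guess over the combined source port and TxID space, and that the full 16 bits of port are actually random (in practice fewer are). The function names are made up for illustration.

```python
import math

def spoof_success_probability(packets_sent: int,
                              entropy_bits: int = 32,
                              open_queries: int = 1) -> float:
    """Chance that at least one forged reply matches an outstanding query.

    Assumption: each forged packet is an independent uniform guess over
    2**entropy_bits (port, TxID) combinations.
    """
    space = 2 ** entropy_bits
    # 1 - (1 - q/N)**k, computed with log1p/expm1 to avoid rounding loss
    return -math.expm1(packets_sent * math.log1p(-open_queries / space))

def packets_for_probability(target: float,
                            entropy_bits: int = 32,
                            open_queries: int = 1) -> int:
    """Forged packets needed to reach the target success probability."""
    space = 2 ** entropy_bits
    return math.ceil(math.log1p(-target) / math.log1p(-open_queries / space))

if __name__ == "__main__":
    for bits in (16, 32):
        needed = packets_for_probability(0.5, entropy_bits=bits)
        print(f"{bits} bits: {needed} forged packets for a 50% chance")
```

The point of the comparison is that the work is only feasible if the window stays open long enough to send that many packets, which is why an unanswered query that is never timed out matters.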
The correct and expected behaviour of Dnsmasq would have been not to forward invalid requests to the resolver.
No.
They aren't "invalid requests". You can put literally anything in a domain name (see RFC 2181, section 11) and the upstream should respond. I'm curious what resolvers are dropping these requests.
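For anyone who wants to see the "length-tagged, 8-bit clean" part concretely, here is a minimal sketch of the wire encoding of a name. The helper names are hypothetical; the only limits enforced are the 63-octet label and 255-octet name limits. The hostname check is the conventional letters-digits-hyphen rule, shown only for contrast.

```python
def encode_qname(labels: list[bytes]) -> bytes:
    """Encode labels as a DNS wire-format name: length octet + raw bytes."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("each label must be 1..63 octets")
        out.append(len(label))   # length tag
        out += label             # any octet values are allowed here
    out.append(0)                # zero-length root label ends the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)

def is_hostname_label(label: bytes) -> bool:
    """Conventional hostname rule: letters, digits, hyphen, no edge hyphen."""
    allowed = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-"
    return (len(label) > 0
            and all(b in allowed for b in label)
            and not label.startswith(b"-")
            and not label.endswith(b"-"))

if __name__ == "__main__":
    weird = [b"a~b\x00", b"com"]
    print(encode_qname(weird))                    # encodes without complaint
    print([is_hostname_label(l) for l in weird])  # first label is not a hostname label
```

A label like `a~b\x00` is a perfectly encodable domain name label even though it would never pass as a hostname.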
The correct behavior is for dnsmasq to forward requests to the upstream regardless of the content of the QNAME. If dnsmasq doesn't get a response back in some reasonable amount of time, it should (probably) return SERVFAIL to its client.
Further, DNS mostly uses UDP which is unreliable -- all DNS clients must deal with the query or response being lost. Dnsmasq's timeouts might be overly long (I didn't bother to check), but this is a minor configuration issue.
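A rough sketch of the "give up and return SERVFAIL" idea, to make it concrete. This is not how dnsmasq is implemented; `servfail_reply`, `expire_pending`, and the pending-table layout are hypothetical, and the sketch assumes the stored query carries only a question section (no EDNS OPT record).

```python
import struct

SERVFAIL = 2  # RCODE value for "server failure"

def servfail_reply(query: bytes) -> bytes:
    """Build a SERVFAIL response echoing the query's ID and question.

    Assumption: the query has no answer/authority/additional records.
    """
    if len(query) < 12:
        raise ValueError("DNS header is 12 octets")
    txid, flags, qdcount = struct.unpack("!HHH", query[:6])
    # Set QR (response) and RA, keep opcode and RD, then set the RCODE.
    new_flags = 0x8000 | (flags & 0x7900) | 0x0080 | SERVFAIL
    header = struct.pack("!HHHHHH", txid, new_flags, qdcount, 0, 0, 0)
    return header + query[12:]

def expire_pending(pending: dict[int, tuple[float, bytes]],
                   now: float, timeout: float) -> list[bytes]:
    """Drop forwarded queries older than `timeout`; return SERVFAIL replies."""
    stale = [txid for txid, (sent, _) in pending.items() if now - sent >= timeout]
    replies = []
    for txid in stale:
        _, query = pending.pop(txid)  # forged replies can no longer match
        replies.append(servfail_reply(query))
    return replies
```

Once an entry is removed from the pending table, a late or forged reply for it has nothing to match against, which is what closes the window.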
This sounds like the (well-known) birthday attack, defending against which is precisely the point of DNSSEC. AFAIK, dnsmasq supports DNSSEC, so the right answer is to turn on validation.
--fast-dns-retry=[<initial retry delay in ms>[,<time to continue retries in ms>]]
Under normal circumstances, dnsmasq relies on DNS clients to do retries; it does not generate timeouts itself. Setting this option instructs dnsmasq to generate its own retries starting after a delay which defaults to 1000ms. If the second parameter is given this controls how long the retries will continue for otherwise this defaults to 10000ms. Retries are repeated with exponential backoff. Using this option increases memory usage and network bandwidth. If not otherwise configured, this option is activated with the default parameters when --dnssec is set.
--dnssec
Validate DNS replies and cache DNSSEC data. When forwarding DNS queries, dnsmasq requests the DNSSEC records needed to validate the replies. The replies are validated and the result returned as the Authenticated Data bit in the DNS packet. In addition the DNSSEC records are stored in the cache, making validation by clients more efficient. Note that validation by clients is the most secure DNSSEC mode, but for clients unable to do validation, use of the AD bit set by dnsmasq is useful, provided that the network between the dnsmasq server and the client is trusted. Dnsmasq must be compiled with HAVE_DNSSEC enabled, and DNSSEC trust anchors provided, see --trust-anchor. Because the DNSSEC validation process uses the cache, it is not permitted to reduce the cache size below the default when DNSSEC is enabled. The nameservers upstream of dnsmasq must be DNSSEC-capable, ie capable of returning DNSSEC records with data. If they are not, then dnsmasq will not be able to determine the trusted status of answers and this means that DNS service will be entirely broken.
Query ID prediction attacks are not in fact the point of DNSSEC, which will not actually meaningfully address this attack because almost nothing in the DNS is signed.
> Query ID prediction attacks are not in fact the point of DNSSEC
Do you deny DNSSEC's goal is to protect DNS data? Do you deny "Query ID prediction attacks" (or more generally, flooding attacks) aim to corrupt DNS data? Do you deny the 16-bit transaction ID allows for effective flooding attacks?
As for "almost nothing in the DNS is signed", while it's true that a large percentage of second-level domains aren't signed, the DNS root, all generic top-level domains, and the vast majority of country code TLDs are signed. In some countries (e.g., The Netherlands) more than 50% of the zones in their ccTLD are signed. As we've seen empirically, with improved automation/tools and authoritative servers that turn on DNSSEC-signing by default, the percentage will go up.
I deny that query ID protection was the impetus for the development of DNSSEC and that the earliest advocacy for it as an operational security tool, rather than a (government-funded) design improvement for the entire TCP/IP stack, was about query ID prediction. Like you, I was there at the time; if the NANOG archives go back that far, you'll see me on the threads babbling about this.
This notion of DNSSEC signatures being widespread comes up in every thread about the protocol. Here's a little thingy I threw together because I got tired of typing out the bash "dig" loop to regenerate it in threads:
Note that the Tranco list is international, so captures popular zones in places that have automatic (and security-theatric) DNSSEC signatures, as well as amplifying the impact of vendors like Cloudflare who have several different zones in the top 1000. Even with all that included: single digits.
It's been over 30 years of tooling work on DNSSEC --- in recent time intervals, DNSSEC adoption in North America has gone down. Stick a fork in it.
I guess you and I were at different meetings. I was at meetings at NSF with TIS folks that resulted in funding for DNSSEC implementation in BIND where the presentation focused on the 16-bit transaction field (and included a live demonstration), so I'll stand by my view that the point of DNSSEC was to address that particular flaw.
In any event, that's a nice site that provides useful stats.
I remember tearing my hair out in the pre-2008 era as folks tried to get source-port randomization into BIND. The response was "That's what DNSSEC is for" ... which further supports your narrative. But it's still very damning.
Source port randomization, BCP38, and then the 0x20 qname capitalization trick, all turned out to be far more practical mitigations for query-id concerns and others prioritized them. "We really need this massive internet-wide jobs-program lift of the entire Internet, without even providing confidentiality, to solve this query-id issue. Never mind the easier fixes."
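For readers who haven't seen the 0x20 trick: the resolver randomizes the case of the letters in the QNAME and only accepts a reply that echoes the exact same casing, so each ASCII letter adds roughly one bit an off-path attacker has to guess. A minimal sketch, with hypothetical function names, that skips all actual packet handling:

```python
import secrets

def randomize_case(qname: str) -> str:
    """Flip each ASCII letter to upper or lower case at random."""
    return "".join(
        (ch.upper() if secrets.randbits(1) else ch.lower())
        if ch.isascii() and ch.isalpha() else ch
        for ch in qname
    )

def reply_matches(sent_qname: str, echoed_qname: str) -> bool:
    """Accept a reply only if it echoes the query name byte for byte."""
    return sent_qname == echoed_qname  # deliberately case-sensitive

def extra_entropy_bits(qname: str) -> int:
    """One extra bit per ASCII letter in the name."""
    return sum(ch.isascii() and ch.isalpha() for ch in qname)

if __name__ == "__main__":
    sent = randomize_case("www.example.com")
    print(sent, extra_entropy_bits(sent))
    print(reply_matches(sent, sent))          # genuine echo is accepted
    print(reply_matches(sent, sent.lower()))  # usually rejected: casing differs
```

Short names gain little from this, which is one reason it complements source port randomization rather than replacing it.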
Wow this brings up memories. I was at OpenDNS when Dan gave us the heads up.
I'll just leave this here: https://blog.netherlabs.nl/articles/2008/07/09/some-thoughts...