DNSSEC disruption affecting .de domains – Resolved
status.denic.de
660 points by warpspin 13 hours ago
Looks like a DNSSEC issue, not a nameserver outage. Validating resolvers SERVFAIL on every .de name with EDE:
RRSIG with malformed signature found for a0d5d1p51kijsevll74k523htmq406bk.de/nsec3 (keytag=33834)
dig +cd amazon.de @8.8.8.8 works; dig amazon.de @a.nic.de works. Zone data is intact; DENIC just published an RRSIG over an NSEC3 record that doesn't validate against ZSK 33834. Every validating resolver therefore refuses to answer.
Intermittency fits anycast: some [a-n].nic.de instances still serve the previous (good) signatures, so retries occasionally land on a healthy auth. Per DENIC's FAQ the .de ZSK rotates every 5 weeks via pre-publish, so this smells like a botched rollover.
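For anyone who wants to poke at it themselves, the checks above boil down to a few dig invocations (assuming a dig recent enough to print Extended DNS Errors; amazon.de is just the example name from above):
dig amazon.de @8.8.8.8
(through a validating resolver: SERVFAIL, with the EDE text naming the bad RRSIG)
dig +cd amazon.de @8.8.8.8
(same query with checking disabled: resolves fine, so the data itself is reachable)
dig amazon.de @a.nic.de
(straight at a .de authoritative server, where no validation happens)
dig +dnssec DNSKEY de. @a.nic.de
(pulls the DNSKEY set, so you can see which key tags the zone is currently publishing)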
So a single configuration mistake in a single place wiped out external reachability of a major economy. It happened in the evening local time and should be fixable, modulo cache TTLs, by morning. This will limit the blast radius somewhat.
Still, at this level, brittle infrastructure is a political risk. The internet's famous "routing around damage" isn't quite working here. Should make for an interesting post mortem.
I am reminded of the warning that Zonemaster gives about putting your domain's name servers on a single AS, as is common practice for many larger providers. A lot of people do not want this to be seen as a problem, since a single AS is a convenient configuration for routing, but it has the downside of being a single point of failure.
Building redundant infrastructure that can withstand BGP and DNS configuration mistakes is not that simple, but it can be done.
As the CPU/RAM resources to run an authoritative-only slave nameserver for a few domains are extremely minimal (mine run at a unix load of 0.01), it's a very wise idea to put your ns3 or something at a totally different service provider on another continent. It costs less than a cup of coffee per month.
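For reference, the secondary side of that is only a handful of lines; a minimal sketch in BIND's named.conf, where the zone name, the primary's address and the file path are all placeholders:
zone "example.de" {
    type secondary;              # "slave" on older BIND versions
    primaries { 192.0.2.1; };    # the primary you transfer the zone from ("masters" on older versions)
    file "secondary/example.de.db";
};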
For a very long time, the computer club I was in operated a DNS server on a Pentium 75 MHz, and after the last major hardware upgrade it had a total of 110 MB of RAM and 2 GB of disk space. It worked great, except that before the upgrade it tended to run out of RAM whenever there was a Linux kernel update, a problem we solved forever by populating all the RAM slots with the maximum that the motherboard could handle, to that nice 110 MB.
On Google cloud it's always four nameservers like
ns-cloud-c1.googledomains.com
ns-cloud-c2.googledomains.com
ns-cloud-c3.googledomains.com
ns-cloud-c4.googledomains.com
Would not make any sense to do four of them if it's a single AZ. Also, they are geo-aware and routed to your nearest region.
DNS is a centralization risk, yes. Somehow we've decided this is fine. DNSSEC isn't the only issue - your TLD's nameservers could also be offline, or censored in your country.
DNS is barely centralized. Is there an alternative global name lookup system that is less centralized without even worse downsides?
GNS is the obvious response here, in addition to the various blockchain based solutions. Nothing that enjoys widespread support or mindshare unfortunately.
Even the current centralized ICANN flavor could be substantially more resilient if it instead handed out key fingerprints and semi-permanent addresses when queried. That way it would only ever need to be used as a fallback when the previously queried information failed to resolve.
BGP, but the names in question are limited to 128 bits, of which at most 48 will be looked up, and you don't get to choose which 48 bits are assigned to you.
Normally it should not have been, with cache and all, but that was the past...
Think about what would happen the day that Let's Encrypt is broken for whatever reason, technical or political (say, a hostile US administration and being located in the wrong country). Take into account the push by Let's Encrypt and the major web browsers to restrict certificate validity to short periods, like only a few days...
Let's Encrypt has to be down for days before people begin to feel the pain. DNS is very different, it breaks stuff immediately everywhere.
No it doesn't. DNS breaks as soon as TTLs run out. It's your choice to set them so low that stuff breaks immediately.
What do you recommend then? DNS doesn't usually change that often, but if you mess it up when it does, you're in for some pain if TTLs are high!
Not the one you're replying to, but I'd keep TTL high normally and lower it one TTL ahead of a planned change.
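Purely as an illustration (name, addresses and TTL values made up), the dance looks like this in the zone:
www.example.de.  3600  IN  A  192.0.2.10   ; steady state: comfortable TTL
www.example.de.   300  IN  A  192.0.2.10   ; at least one full old TTL before the change, drop it
www.example.de.  3600  IN  A  192.0.2.20   ; make the change, confirm it, then raise the TTL again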
Not really? .com and .net are still up
If Let's Encrypt goes down, half of the Internet will become inaccessible in a week.
Presumably if LetsEncrypt goes down and stays down for a week, the sites that go down are the ones that see that their CA went down and at no point in the week take the option to get certs from a different CA?
Are there alternative CAs that are anywhere near as easy to deal with as Let's Encrypt?
I guarantee that there are a ton of sites out there not monitoring their certs.
"The internet's famous "routing around damage" isn't quite working here."
DNS is a look up service that runs on the internet.
Internet routing of IP packets is what the internet does and that is working fine (for a given value of fine).
You remind me of someone using the phrase "the internet is down" who really means: "I've forgotten my wifi password".
> So a single configuration mistake in a single place wiped out external reachability of a major economy.
Real world beats sci-fi :) And isn't that why we love IT? And hate it too, because of the "people in charge"...
Fail-closed protocols have introduced some brittleness. An HTTP 1.0 server from 1999 can probably still serve visitors today. An HTTPS/TLS 1.0 server from the same year wouldn't.
I think I see the point you're making here and I agree.
There is designing something to be fail-closed because it needs to be secure in a physical sense (actually secure, physically protected), and then there's designing something fail-closed because it needs to be secure from an intellectual sense (gatekept, intellectually protected). While most of the internet is "open source" by nature, the complexity has been increased to the point where significant financial and technical investment must be made to even just participate. We've let the gatekeepers raise the gates so high that nobody can reach them. AI will let the gatekeepers keep raising the gates, but then even they won't be able to reach the top. Then what?
I think the point you're trying to make, put another way, is that in the context of "availability" and "accessibility" we've compromised a lot of both in the name of security since the dawn of the internet. How much of that security actually benefits the internet, and how much of it hinders it? How much of it exists as a gatekeeping measure by those who can afford to write the rules?
You're not wrong but objecting to fail-closed in a security sensitive context is entirely missing the point.
>So a single configuration mistake in a single place wiped out external reachability of a major economy.
And fuck nothing at all happened as a result.
Prove it? I'm sure many lifespans were lost to stress
As someone who was on call yesterday, it was a fun experience, but you noticed quickly that everything .de was down and then it was just a waiting game.
We had a short discussion about migrating to .com, but decided risk != reward, as no one would know the new TLD.
I assume there are a couple of people working for DENIC who had a stressful night..
There is the KRITIS law (critical infrastructure law), which tries to enforce some standards to make things not as brittle.
I have a bad feeling that the impact will be quite severe for some services, as monitoring, performance, and security services might get disrupted, and just cleaning up is a big mess. Worst case, some OT will experience outage and/or damage. But maybe I am just overestimating the severity of this.
It looks like a failed key replacement during a scheduled maintenance event. Normally this sort of thing is thoroughly tested and has multiple eyes on for detailed review and planning before changes get committed, but obviously something got missed.
> The internet's famous "routing around damage"
...is only for Pentagon networks and military stuff. It's not for us normal people. (We get Cloudflare and FAANG bullshit instead.)
This is actually startlingly true.
Every FAANG company has their own fiber backbone. Why invest in the internet that everyone uses when you can invest in your own private internet and then sell that instead?
It's not like the long-haul fiber not owned by FAANG is a public utility, at least not in most places.
Traffic that goes over "the Internet" traverses some mix of your ISP's fiber, fiber belonging to some other ISP they have a deal with, then fiber belonging to some ISP they have a deal with, etc.
All those ISPs are being paid to provide service, they can invest in their own networks.
I love how I've worked in IT for 20 years and don't understand a single acronym here other than DNSSEC
I've been in IT 30+ years, been running DNS, web servers, etc. since at least 1994. I haven't bothered with DNSSEC due to perceived operational complexity. The penalty for a screw up, a total outage, just doesn't seem worth the security it provides.
That was my experience too, until I decided that having run email systems for 30-odd years, when HN says that is unnatural, had piqued my weird streak or something!
I ran up three new VMs on three different sites. I linked all three systems via a private Wireguard mesh. MariaDB on each VM bound to the wg IP and stock replication from the "primary". PowerDNS runs across that lot. One of the VMs is not available from the internet and has no identity within the DNS. The idea is that if the Eye of Sauron bears down on me, I can bring another DNS server online quite quickly and fiddle the records to bring it online. It also serves as a third authority for replication.
I also deployed https://github.com/PowerDNS-Admin/PowerDNS-Admin which is getting on a bit and will be replaced eventually but works beautifully.
Now I have DNS with DNSSEC and dynamic DNS and all the rest. This is how you start signing a zone and PowerDNS will look after everything else:
# pdnsutil secure-zone example.co.uk
# pdnsutil zone set-nsec3 example.co.uk
# pdnsutil zone rectify example.co.uk
Grab a test zone and work it all out first, it will cost you not a lot, and then go for "production".
My home systems are DNSSEC signed.
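A couple of sanity checks once a test zone is signed (example.co.uk and ns1.example.co.uk are stand-ins, same as above):
# pdnsutil show-zone example.co.uk
(lists the keys and prints the DS records you hand to your registrar)
# dig +dnssec SOA example.co.uk @ns1.example.co.uk
(the answers should now carry RRSIGs)
# dig SOA example.co.uk @8.8.8.8
(once the DS is published at the registry, a validating resolver sets the ad flag on the answer)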
How simple sysadmin was in 1994 with no cryptography on any protocol. Everything could be easily MITM'd. Your credit card number would get jacked left and right in the 90s.
Nobody was taking credit cards online then. Your telnet sessions were easily sniffed, however.
Not in '94, sure. But a couple of years later it was common and SSL was still uncommon, for a bunch of reasons, and also everyone was storing the card numbers in plaintext on their servers too.
Telnet was sniffed. IRC was being sniffed and logged.
Cool. Feel free to explain how to tighten things up.
I've just given them part of a recipe for using DNSSEC. I suspect you are not actually human .. qingcharles.
I don't even understand what your comment is about, my dude. Given who a recipe? DENIC?
To be fair, advanced real-world knowledge of public/private key PKIs (X.509 or other), things like root CAs, is a fairly esoteric and very specialized field of study. There are people whose regular day jobs are nothing but doing stuff with PKI infrastructure, and their depth of knowledge on many other non-PKI subjects is probably surface level only.
I know quite a bit about PKI and X.509, and I can tell you that much: the overlap with how DNSSEC works is limited.
As is the overlap between DNSSEC and DNS itself, to be honest.
I once worked at the level of administering DNSSEC for 300+ TLDs. It's its own world. When that company was winding down, I tried to continue in the field but the most common response (outside of no response, of course), was 'we already have a DNS team/vendor/guy.' And well, then things like this happen. I won't throw stones though, it's a lot to learn and can be incredibly brittle.
Is that actually true, though? Even though it's not really my job, I find myself debugging certificates and keys at least once a month, and that's after automating as much as possible with certbot and cloud certificates. PKI always seems to demand attention.
In my initial comment, I meant more in terms of complexity and planning from the perspective of the people who are running the public/private key infrastructure on the other side/upstream of what you're doing as a letsencrypt end user.
Broadly similar general concept to the team responsible for the DNSSEC signing keys for an entire ccTLD.
Yeah, an X.509 PKI / root CA is a very different thing than DNSSEC, but they have a number of general logical similarities in that the chain of trust ultimately comes down to a "do not fuck this up" single point of failure.
It's not made easier by the fact that a lot of cryptography is either very old and arcane or it's one hell of a mess of code that doesn't make sense without reading standards.
I had the misfortune of having to dig deep into constructing ASN.1 payloads by hand [1] because that's the only thing Java speaks, and oh holy hell is this A MESS because OF COURSE there's two ways to encode a bunch of bytes (BIT STRING vs OCTET STRING) and encoding ed25519 keys uses BOTH [2].
And ed25519 is a mess in itself. The more-or-less standard implementation by orlp [3] is almost completely lacking any comments explaining what is going on where and reading the relevant RFCs alone doesn't help, it's probably only understandable by reading a 500 pages math paper.
It's almost as if cryptographers have zero interest in letting interested random people join the field.
End of rant.
[1] https://github.com/msmuenchen/meshcore-packets-java/blob/mai...
[2] https://datatracker.ietf.org/doc/html/rfc8410#appendix-A
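If anyone wants to see the two encodings side by side without touching Java, a throwaway key and openssl asn1parse will do it (nothing here is Meshcore-specific):
openssl genpkey -algorithm ed25519 -out /tmp/ed25519.pem
openssl asn1parse -in /tmp/ed25519.pem
(the RFC 8410 private key: the 32 key bytes sit inside an OCTET STRING)
openssl pkey -in /tmp/ed25519.pem -pubout -out /tmp/ed25519.pub.pem
openssl asn1parse -in /tmp/ed25519.pub.pem
(the public half, as SubjectPublicKeyInfo: the same kind of bytes land in a BIT STRING instead)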
The trick to asn.1 is to generate both the parser and the serializer from the spec. Elliptic curve math on the other hand is ... yeah, you need to know the math and also know the tricks to code that implements it. Both of those have a steep learning curve, but it's hardly because it's a mess or it's old.
The problem with ASN.1 is that it is big and complicated, you only need a fraction of it for cryptography, and it isn't really used for anything outside of PKI anymore.
It wouldn't be as bad if ASN.1 had caught on more as a general-purpose serialization format and there were ubiquitous decent libraries for dealing with it. But that didn't happen. Probably partly because there are so many different representations of ASN.1.
A bespoke serialization specifically for certificates might actually have aged better, if it was well designed.
Assuming there are some libraries for it, would this make a pretty good case for LLM-generated ports of these existing libraries into other languages or onto other OSs/platforms? One implementation could be treated as "the spec".
ASN.1 is protobufs designed by committee. It is a general-purpose serialization format, but there's no good reason to choose it instead of protobufs.
> Both of those have steep learning curve, but it's hardly because it's a mess or it's old.
Bitpacking structures used to be important in the 60s. That time has passed; unless you're dealing with LoRa, NFC or other cases of highly constrained bandwidth, there are way better options to serialize and deserialize information. It's time to move on, and the complexity of all the legacy garbage in crypto has been the cause of many a security vulnerability in the past.
As for the code, it might be personal preference but I'd love to have at least some comments referring back to a specification or original research paper in the code.
I think you misunderstand the problem asn.1 solves and the constraints it works within (both 30 years ago and now). We sure can have a better one now, once we've learned all the lessons and know what good parts to keep, but this critique of bitpacking is misplaced.
ASN.1 is not used because of just bitpacking. There are other benefits to ASN.1 and it's probably one of the least problematic parts there.
People who have thought they can do better have made things like PGP. It's one of the worst cryptographic solutions out there. You're free to try as well though.
People who thought they could do better did JWT, which is not complicated at all and has no bugs as well. Also solves 20% of what asn.1 is used for.
Maybe a bit pedantic, but it would actually be the more general JOSE which includes tokens (JWT), signatures (JWS), and key transmission (JWK).
And there is a related binary format that uses CBOR (COSE) as well.
The typical vector for entering cryptography as a professional is called "grad school".
X.509 is a deep legacy, but at least at this point it's well tested.
> because that's the only thing Java speaks
No, it most definitely is not. You can just construct a private key directly in BouncyCastle: https://downloads.bouncycastle.org/java/docs/bcprov-jdk18on-...
I'm 100% certain that you also can do that with raw java.security. I did that about 15 years ago with raw RSA/EC keys. You can just directly specify the private exponent for RSA (as a bigint!) or the curve point for EC.
Ditto for ed25519, you can just take the canonical implementation from DJB. And you really really shouldn't do that anyway, please just use OpenSSL or another similar major crypto library.
I wouldn't recommend touching openssl (the library; the command line tools are okay-ish) with anything that breathes.
> I'm 100% certain that you also can do that with raw java.security.
I tried that, the problem is Meshcore specific - they do their own weird shit with private and public keys [1]. Haven't figured out how to do the private key import either, because in the C source code (or in python re-implementations) Meshcore just calls directly into the raw ed25519 library to do their custom math... it's a mess.
[1] https://jacksbrain.com/2026/01/a-hitchhiker-s-guide-to-meshc...
I'm playing with LoRa/Meshcore right now (I have an nRF52840 lying around). I'm pretty sure I know how to do that, will take a look.
Apparently the DENIC team was at a party this evening! Party hard, but not too hard. https://bsky.app/profile/denic.de/post/3ml4r2lvcjg2h
A real party killer if I have ever seen one.
At least all of the appropriate people were in a room together when the outage happened
Sounds like poor risk pooling. If that room crashed, we'd have nobody to fix this.
A nation state actor picking the right time to sabotage a tiny part of the key rotation process. On Monday someone cut major fiber lines, on Tuesday DENIC is failing.
maybe someone is showing off?
Unironically yeah, we are at the level of weaponizable sophistication that this metaphorical dick waving you are suggesting is probably something that happens
Interesting "bus problem" to have in a scenario where everyone who is qualified, experienced and trusted enough to commit lives changes (or perform a revert, undo results of a botched maintenance, etc) in an emergency situation is not completely sober.
Sobriety is just factor to be weighed in an emergency situation. 30 years ago I was at a ski resort with about 50 friends having a drinking competition in the resort's main bar. Late that night two ski lodges collapsed, trapping people inside. Around midnight, soon after the winner was announced, the police entered and asked "who's able to drive a crane truck?" The winner of the competition put his hand up and informed them of how much he had had to drink. Don't care they said, so he drove a crane big enough to lift a building up a single lane 35km mountain road in nighttime ice conditions. (The crane made it, but sadly most of the people in the ski lodges didn't. https://en.wikipedia.org/wiki/1997_Thredbo_landslide )
Sounds like Australian police. I remember 15 or so years ago being in a big team assisting the Australian police with something on a remote farm. There were 20 people that needed to be taken back to base and one 10 seater car. Someone asked the police if everyone could get in the car and policeman shrugged and said you can try. So the policeman drove a four wheel drive across farmland with 16 people stuffed into the back.
Cloudflare has now disabled DNSSEC validation on their 1.1.1.1 resolver: https://www.cloudflarestatus.com/incidents/vjrk8c8w37lz
Welp. I think we can call it on DNSSEC now.
OTOH there was recently a DNSSEC-saved-the-day piece of news: https://incrypted.com/en/dns-attack-on-eth-limo-was-stopped/
That only worked because the attacker didn't understand dnssec. If they had unsigned the domain first and then hijacked it they would have succeeded.
I haven't been able to find any cases of genuine dns hijack attacks in the last few years. Would love to know if anyone else can?
Only about 40% of the crypto companies seem to use dnssec. Seems like a target rich environment.
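Checking is a one-liner against the parent zone (example.com is just a stand-in):
dig +short DS example.com
A non-empty answer means there is a signed delegation; an empty answer means no DNSSEC.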
Probably the most common reason to use DNSSEC is to check a box on a list of compliance rules. And I don't think this will change anything for people who need DNSSEC for compliance.
There's no commercial compliance regime that requires DNSSEC (FedRAMP might be the only exception --- I'm uncertain about the current state of FedRAMP DNSSEC rules --- but that makes sense given that DNSSEC is a giant key escrow scheme.)
FedRAMP requires it, although like many requirements, you may be able to get out of it if you have a good reason and/or your sponsoring agency doesn't care about it.
There are also some large businesses that require, or strongly pressure SaaS providers to use DNSSEC. You can often contest that, but if you have DNSSEC, that's one less thing to argue about in the contract.
Which businesses are those? (I ask because if they're North American, I have a pretty good sense of which large North American businesses even have DNSSEC signatures set up, and it's not many; small enough that you can easily memorize the list.)
Probably the most common reason to use TLS is to check a box on a list of compliance rules. Is that bad?
Do browsers even load non-HTTPS sites anymore without a massive warning?
neverssl.com works fine for me, only a small warning in the place where the padlock usually is, that no-one checks anyway.
The browser would be very unhappy with an <input type="password"/> on a non-TLS site (localhost excepted). HSTS would trigger the "massive" warning and refuse to load the site, however.
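For reference, HSTS is just a response header the site has to have sent over HTTPS at some earlier point, something like:
Strict-Transport-Security: max-age=31536000; includeSubDomains
Once a browser has seen that, it silently upgrades any http:// request for the host to https:// and makes certificate errors non-overridable - the hard failure rather than the bypassable warning.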
Yes, they do.
Yeah just ignore the big "not secure" warning in the URL bar
I just checked it. You mean the very small open padlock icon? The era of browsers warning loudly about HTTP was a decade ago, it got reversed due to pushback.
Well I checked both Chrome and Firefox on mobile and my desktop and they were all much more obvious than just an "open padlock". They both said "Not Secure" and in Firefox it was bright red text. Also in incognito mode Chrome refused to even open the site without a full screen warning. They all make it super clear non-HTTPS sites are not secure so I'm not really sure what your point is?
I doubt it. The root cause of this was a misconfiguration or bug on the registry's side. It happened to DNSSEC records this time, which is a pain, but next time it might as well flip bits or point to wrong IP addresses instead.
Paradoxically, resolvers wouldn't have noticed the misconfiguration if it weren't for DNSSEC.
Hahaha. You wish :-p
It's a pretty hard argument to work around: WebPKI certificates should go in the DNS, and also the largest DNS providers might at any moment decide not to validate DNSSEC anymore to get through an outage.
Yes, it's a crappy outcome, but endpoints can still choose to enforce this. Further, it's not a persuasive argument against more DNSSEC usage, since if there was more DNSSEC usage then resolvers would be more reluctant to disable it.
If there's going to be a single point of failure in front of your website, that single point of failure may as well be the only single point of failure instead of having two single points of failure, and it's probably important that people can't spoof responses.
Nobody had to hack it. A system at DENIC broke, and so Cloudflare turned off DNSSEC validation for all of their users accessing .de. If DNSSEC was actually important for the security model of those users, that would be a huge deal.
If DNSSEC is part of your security model, you want local validation, not reliance on a third-party resolver that you don't have a contract with.
Beyond that, DNS has the AD bit. If you need DNSSEC secure data (for example for the TLSA record), then when Cloudflare turns off DNSSEC validation, the AD bit will be clear and things will stop working.
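Concretely, the .de zone itself works as an example, since we know it's signed:
dig soa de. @1.1.1.1 +dnssec | grep flags
A validating resolver sets ad (authenticated data) in that flags line on answers it could verify; with .de validation switched off, the answer still comes back but without ad, so anything that insists on it - TLSA lookups, local policy - fails closed.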
If it turns out the DNSSEC issue was caused by threat actors, this downstream effect could very well have been the reason to do it.
It is indeed a bit sad that Cloudflare had to turn off DNSSEC completely. But I completely understand that they don't have a production-ready, tested path to override DNSSEC validation for only some domains.
Sorry! status message was not clear. DNSSEC validation is temporarily disabled only for .de domains.
Originally it said:
---
The issue has been identified as a DNSSEC signing problem at DENIC, the organization responsible for the .DE top-level domain. Cloudflare has temporarily disabled DNSSEC validation on 1.1.1.1 resolver in order to allow .DE names to continue to resolve. DNSSEC validation will be re-enabled when the signing problems at DENIC are known to have been resolved.
---
(and in case it changes again, now it says)
---
The issue has been identified as a DNSSEC signing problem at DENIC, the organization responsible for the .DE top-level domain. Cloudflare has temporarily disabled DNSSEC validation for .de domains on 1.1.1.1 resolver (as per RFC 7646) in order to allow .DE names to continue to resolve. DNSSEC validation will be re-enabled when the signing problems at DENIC are known to have been resolved.
See RFC 7646 for more details: https://datatracker.ietf.org/doc/html/rfc7646
---
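On a resolver you run yourself, the per-zone override RFC 7646 describes is a one-line negative trust anchor; for example, in Unbound's unbound.conf (this is just what the local equivalent would look like, not what Cloudflare runs):
server:
    domain-insecure: "de."
That skips DNSSEC validation for the .de subtree only, while everything else stays validated.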
The RFC 7646 thing here is the funniest possible addition. This is the greatest day.
It didn't originally say that. They added the clarification just a few minutes ago. The guidelines ask you not to ask people these kinds of questions, for what it's worth.
We only disabled SSL on all the websites in one country for a little bit.. I'm sure those credit card numbers were perfectly safe over the wire
I must be early. There's not a single tptacek DNSSEC rant in this thread yet.
doesn't this event speak for itself though?
Kind-of. But there are worse things than outages when it's PKIs we're talking about. DNSSEC is also extremely opaque and unmonitored. Any compromise will not be noticed. Nor will anyone have any recourse against misbehaving roots.
Fun fact: Cloudflare has used the same KSK for the zones it serves for more than a decade now.
Which is fine. Not because KSK rollover is supposedly complicated, but if you can't manage to keep your private keys and PKI safe in the first place then key rotation is just a security circus trick. But if you do know how to keep them safe, then...
It is not fine. Keeping key material safe is not a boolean between "permanently safe" and "leaks immediately".
Keeping key material secure for more than a decade while it's in active use is vastly more complex than keeping it secure for a month, until it rotates.
For all we know, some ex-employee might be walking around with that KSK, theoretically being able to use it for god knows what for an another decade.
> Keeping key material secure for more than a decade while it's in active use is vastly more complex than keeping it secure for a month, until it rotates.
Nope. Key material rotation is just a circus when it's done for the sake of rotation.
> For all we know, some ex-employee might be walking around with that KSK, theoretically being able to use it for god knows what for an another decade.
Or maybe an employee has compromised the new key that is going to be rotated in, while the old key is securely rooted in an HSM?
> Or maybe an employee has compromised the new key that is going to be rotated in, while the old key is securely rooted in an HSM?
Also possible, but that'd be an active threat that has some probability of being caught.
Never replacing keys allows permanent compromise that can only be caught if someone directly observes misuse.
Though nobody monitors DNSSEC like that, nor uses it, so it's fine from that aspect I guess.
The point of rotation for these kinds of keys is that it limits the blast radius of what happens if an employee compromises such a key. This is sort of like how there are one or two die-hard PGP advocates who have come up with a whole Cinematic Universe where authenticated encryption is problematic ("it breaks error recovery! it's usually not what you want!") because mainstream PGP doesn't do it. Except here, it's that key rotation is bad, because of how often DNSSEC has failed to successfully pull off coordinated key rotations.
I can see the periodic rotations used as a way to keep up the operational experience. This is indeed a valid reason, although it needs to be weighed against the increased risk of compromise due to the rotation procedure itself.
I'm just saying that rotating the key just in case someone compromised it is not a great idea. Doubly so if it's done infrequently enough for the operational experience to atrophy between rotations.
And yeah, I fully agree that anything surrounding the DNSSEC operations is a burning trash fire. It doesn't have to be this way, but it is.
I'm glad we agree about DNSSEC, but the rationale I'm giving you for key rotation is the same reason we use short-lived secrets everywhere in modern cryptosystems. It's not controversial (except among Unix systems administrators).