Branch Privilege Injection: Exploiting branch predictor race conditions
comsec.ethz.ch | 421 points by alberto-m a year ago
See also: ETH Zurich researchers discover new security vulnerability in Intel processors - https://ethz.ch/en/news-and-events/eth-news/news/2025/05/eth...
Researchers' blog post: https://comsec.ethz.ch/research/microarch/branch-privilege-i... Paper: https://comsec.ethz.ch/wp-content/files/bprc_sec25.pdf

Thanks! We've changed the URL above from the university press release (https://ethz.ch/en/news-and-events/eth-news/news/2025/05/eth...) to that first link.

Impact illustration:

> [...] the contents of the entire memory to be read over time, explains Rüegge. “We can trigger the error repeatedly and achieve a readout speed of over 5000 bytes per second.” In the event of an attack, therefore, it is only a matter of time before the information in the entire CPU memory falls into the wrong hands.

Prepare for another dive maneuver in the benchmarks department, I guess.

We need software and hardware to cooperate on this. Specifically, threads from different security contexts shouldn't get assigned to the same core. If we guarantee this, the fences/flushes/other clearing of shared state can be limited to kernel calls and process lifetime events, leaving all the benefits of caching and speculative execution on the table for things actually doing heavy lifting, without worrying about side-channel leaks.

I get you, but devs struggle to configure nginx to serve their overflowing cauldrons of 3rd-party npm modules of witches' incantations. Getting them to securely design and develop security-labelled, cgroup-based micro (nano?) compute services for inferencing text of various security levels is beyond even 95% of coders. I'd posit that it would be a herculean effort even for 1% devs.

Just fix the processors?

It's not a "just" if the fix cripples performance; it's a tradeoff. It is forced to hurt everything everywhere because the processor alone has no mechanism to determine when the mitigation is actually required and when it is not. It is 2025 and security is part of our world; we need to bake it right into how we think about processor/software interaction instead of attempting to bolt it on after the fact.
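To put the quoted readout speed in perspective, here is a back-of-the-envelope calculation (assuming a sustained 5000 bytes/second, the figure from the quote; everything else is illustrative):

```python
# Time needed to exfiltrate memory at the quoted ~5000 bytes/second leak rate.
RATE = 5000  # bytes per second, from the researchers' quote

def time_to_read(n_bytes: int) -> float:
    """Seconds needed to leak n_bytes at RATE."""
    return n_bytes / RATE

KIB = 1024
GIB = 1024 ** 3

# A single 4 KiB page (say, one page holding key material) takes under a second:
print(f"4 KiB:  {time_to_read(4 * KIB):.2f} s")              # ~0.82 s
# A full 16 GiB of RAM takes weeks, so targeted reads matter far more
# than dumping "the entire memory":
print(f"16 GiB: {time_to_read(16 * GIB) / 86400:.1f} days")  # ~39.8 days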
We learned that lesson for internet-facing software decades ago. It's about time we learned it here as well.

Is the juice worth the squeeze? Not everything needs Orange Book (DoD 5200.28-STD) Class B1 systems.

And if not, why did they introduce severe bugs for a tiny performance improvement?

It's not tiny. Speculative execution usually makes code run 10-50% faster, depending on how many branches there are.

Yeah… folks who think this is just some easy-to-avoid thing should go look around and find the processor without branch prediction that they want to use. On the bright side, they will get to enjoy a much better music scene, because they'll be visiting the 90's.

> Does Branch Privilege Injection affect non-Intel CPUs?

> No. Our analysis has not found any issues on the evaluated AMD and ARM systems.

IBM Stretch had branch prediction. Pentium in the early 1990s had it. It's a huge win with any pipelining.

That's a vast underestimate. Putting in lfence before every branch is on the order of a 10x slowdown.

There is of course a slight chicken-and-egg thing here: if there was no (dynamic) branch prediction, we (as in compilers) would emit different code that is faster for non-predicting CPUs (and presumably slower for predicting CPUs). That would mitigate a bit of that 10x. A bit.

I think we've shown time and time again that letting the compiler do what the CPU is doing doesn't work out, most recently with Itanium.

Of course I know that. But if the fix for this bug (how many security holes have there been now in Intel CPUs? 10?) brings only a couple % performance loss, like most of them so far, how can you even justify that at all? Isn't there a fundamental issue in there?

How much improvement would there still be if we weren't so lazy when it comes to writing software? If we were working to get as much performance out of the machines as possible and avoiding useless bloat, instead of just counting on the hardware to be "good enough" to handle the slowness with some grace.
A modern processor pipeline is dozens of cycles deep. Without branch prediction, we would need to know the next instruction at all times before beginning to fetch it. So we couldn't begin fetching anything until the current instruction is decoded and we know it's not a branch or jump. Even more seriously, if it is a branch, we would need to stall the pipeline and not do anything until the instruction finishes executing and we know whether it's taken or not (possibly dozens of cycles later, or hundreds if it depends on a memory access). Stalling for so many cycles on every branch is totally incompatible with any kind of modern performance. If you want a processor that works this way, buy a microcontroller.

But branch prediction doesn't necessarily need complicated logic. If I remember correctly (it's been 20 years since I read any papers on it), the simple heuristic "all relative branches backwards are taken, but forward and absolute branches are not" could achieve 70-80% of the performance of the state-of-the-art implementations back then.

Do you mean overall, or localized to branch prediction? Assuming all of that is true, you're talking about a 20-30% performance hit?

> If you want a processor that works this way, buy a microcontroller.

The ARM Cortex-R5F and Cortex-M7, to name a few, have branch predictors as well, for what it's worth ;)

You can still have a static branch predictor. That has surprisingly good coverage. I'm not saying this is a great idea, just pointing it out.

Thanks! It would be great if someone could update the title URL to that blog post; the press release is worse than useless.

Ok, we've changed to that from https://ethz.ch/en/news-and-events/eth-news/news/2025/05/eth... above.

I don't know, guys. Yes, the direct link saves a click, but the original title was more informative for the casual reader.
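The static heuristic mentioned above ("backward branches taken, forward branches not taken", often called BTFN) is easy to see in action. A minimal sketch with a synthetic branch trace (not a real pipeline model; the offsets and counts are invented for illustration): a loop's backward branch is taken on every iteration but the last, so the heuristic predicts it almost perfectly.

```python
def btfn_predict(target_offset: int) -> bool:
    """Static BTFN heuristic: predict 'taken' iff the branch jumps backward."""
    return target_offset < 0

def hit_rate(trace) -> float:
    """trace: list of (target_offset, actually_taken). Returns prediction accuracy."""
    hits = sum(btfn_predict(off) == taken for off, taken in trace)
    return hits / len(trace)

# Synthetic trace: a 100-iteration loop. The backward branch at the bottom
# of the loop (offset -8) is taken 99 times, then falls through at loop exit.
loop_branch = [(-8, True)] * 99 + [(-8, False)]
# A forward error-check branch (offset +16) that is rarely taken.
check_branch = [(+16, False)] * 95 + [(+16, True)] * 5

print(hit_rate(loop_branch))   # 0.99
print(hit_rate(check_branch))  # 0.95
```

Note the caveat in the comment above: a 95-99% hit rate on friendly traces does not translate to 95-99% of a dynamic predictor's *overall* performance, since the remaining mispredictions each cost a full pipeline flush.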
I'm not a professional karma farmer, and in dang's shoes I would have made the same adjustment, but I can't deny that seeing the upvote rate go down by 75% after the change was a little harsh.

It was on the frontpage for 23 hours (and still is!) so the submission still did unusually well. I thought about adding the blog post link to the top text (a bit like in this thread: https://news.ycombinator.com/item?id=43936992), but https://news.ycombinator.com/item?id=43974971 was the top comment for most of the day, and that seemed sufficient. Edit: might as well belatedly do that!

Great read! Some boiled-down takeaways:

- Predictor updates may be deferred until sometime after a branch retires. Makes sense, otherwise I guess you'd expect that branches would take longer to retire!

- Dispatch-serializing instructions don't stall the pipeline for pending updates to predictor state. Also makes sense, considering you've already made a distinction between "committing the result of the branch instruction" and "committing the result of the prediction".

- Privilege-changing instructions don't stall the pipeline for pending updates either. Also makes sense, but only if you can guarantee that the privilege level is consistent between making/committing a prediction. Otherwise, you might be creating a situation where predictions generated by code in one privilege level may be committed to state used in a different one? Maybe this is hard because "current privilege level" is not a single unambiguous thing in the pipeline?

Good to see Kaveh Razavi; he used to teach at my uni, the Vrije Universiteit in Amsterdam :) The course Hardware Security was crazy cool and delved into stuff like this.

I checked out this course (and another one from Vrije about malware) a couple of years ago; back then there was very little public info about the courses. Do you know if there is any official recording or notes online? Thanks in advance.

As far as I am aware, the course material is not public.
Practical assignments are an integral part of the courses given by the VUSEC group, and unfortunately those are difficult to do remotely without the course infrastructure. The Binary and Malware Analysis course that you mentioned builds on top of the book "Practical Binary Analysis" by Dennis Andriesse, so you could grab a copy of that if you are interested.

Ah yea, he gave a guest lecture on how he hacked a botnet! More info here: https://krebsonsecurity.com/2014/06/operation-tovar-targets-... it's been a while back :)

Thanks. I understand that it is difficult to do it remotely. I do have the book! I bought it a while ago but did not have the pleasure to check it out.

No, but last time I checked you can be a contracted student for 1200 euros. If I knew what I was getting into at the time, I'd do it. I did pay extra, but in my case it was the low Dutch rate, so for me it was 400 euros to follow Hardware Security, since I already graduated. But I can give a rough outline of what they taught. It has been years ago, but here you go.

Hardware Security:

* Flush+Reload

* Cache eviction

* Spectre

* Rowhammer

* Implement a research paper

* Read all kinds of research papers of our choosing (just use VUSEC as your seed and you'll be good to go)

Binary & Malware Analysis:

* Using IDA Pro to find the exact assembly line where the unpacker software we had to analyze unpacked its software fully into memory. Also, we had to disable GDB debug protections. Something to do with ptrace and nopping some instructions out, if I recall correctly (look, I only did low-level programming in my security courses and it was years ago - I'm a bit flabbergasted I remember the rough course outlines relatively well).

* Being able to dump the unpacked binary program from memory onto disk. Understanding page alignment was rough, because even if you got it, there were a few gotchas. I've looked at so many hexdumps it was insane.
* Taint analysis: watching user input "taint" other variables

* Instrumenting a binary with Intel PIN

* Cracking some program with Triton. I think Triton helped to instrument your binary with the help of Intel PIN by putting certain things (like xors) into an SMT equation or something, and you had this SMT/Z3 solver thingy and then you cracked it. I don't remember; I got a 6 out of 10 for this assignment, had a hard time cracking the real thing.

Computer & Network Security:

* Web security: think XSS, CSRF, SQLi and reflected SQLi

* Application security: see Binary & Malware Analysis

* Network security: we had to create our own packet sniffer, and we enacted a Kevin Mitnick attack (it's an old-school one) where we had to spoof our IP addresses and figure out the algorithm to create TCP packet numbers - all in the blind, without feedback. Kevin in '97, I believe, attacked the San Diego supercomputer (might be wrong about the details here). He noticed that the supercomputer S trusted a specific computer T. So the assignment was to spoof the address of T and pretend we were sending packets from that location. I think... writing this packet sniffer was my first C program. My prof thought I was crazy that this was my first time writing C. I was; I also had 80 hours of time and motivation per week. So that helped.

* Finding vulnerabilities in C programs. I remember: stack overflows, heap overflows and format string bugs.

-----

For Binary & Malware Analysis + Computer & Network Security I highly recommend hackthebox.eu. For Hardware Security, I haven't seen an alternative. To be fair, I'm not looking. I like to dive deep into security for a few months out of the year and then I can't stand it for a while.

Wow, thanks a lot for the detailed answer. I'm going to see if I can register as a contracted student, but they probably do not accept remote students. BTW I can see you were very motivated back then. It got to be pretty steep but you managed to break through. Congrats!
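The taint-analysis item in that course list (watching user input "taint" other variables) can be sketched in a few lines. This is a toy value-level tracker, nothing like real Intel PIN-based binary instrumentation; all names here are made up for illustration:

```python
class Tainted:
    """Wraps an int and propagates a taint flag through arithmetic."""

    def __init__(self, value: int, tainted: bool = False):
        self.value, self.tainted = value, tainted

    def _combine(self, other, result: int) -> "Tainted":
        # The result is tainted if either operand was.
        other_taint = other.tainted if isinstance(other, Tainted) else False
        return Tainted(result, self.tainted or other_taint)

    def __add__(self, other):
        rhs = other.value if isinstance(other, Tainted) else other
        return self._combine(other, self.value + rhs)

    def __mul__(self, other):
        rhs = other.value if isinstance(other, Tainted) else other
        return self._combine(other, self.value * rhs)

user_input = Tainted(41, tainted=True)   # attacker-controlled source
constant = Tainted(1)                    # trusted value

derived = user_input + constant          # taint propagates through +
untouched = constant * 2                 # stays clean

print(derived.value, derived.tainted)      # 42 True
print(untouched.value, untouched.tainted)  # 2 False
```

A real taint engine does the same bookkeeping at the instruction level (registers and memory bytes instead of Python objects), which is where tools like PIN and Triton come in.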
Remote won't work, yea. It has to be in person.

> BTW I can see you were very motivated back then. It got to be pretty steep but you managed to break through. Congrats!

Thanks! Yea, I was :)

Anyone know how this relates to the Training Solo attack that was just disclosed? https://www.vusec.net/projects/training-solo/

Both exploit Spectre v2, but in different ways. My takeaway:

Training Solo:

- Enter the kernel (and switch privilege level) and "self-train" to mispredict branches to a disclosure gadget, leak memory.

Branch predictor race conditions:

- Enter the kernel while your trained branch predictor updates are still in flight, causing the updates to be associated with the wrong privilege level. Again, use this to redirect a branch in the kernel to a disclosure gadget, leak memory.

If the CPU branch predictor had bits of information readily available to check buffer boundaries and the privilege level of the code, all this would be much easier to prevent. But apparently that will only happen when we pry the void* from the cold C programmers' hands and start enriching our pointers with vital information.

I don't see how you think that will help? It's not about software abstraction, it's about hardware. Changing the "pointer" does nothing to the transistors. Doing what you want would essentially require a hardware architecture where every load/store has to go through some kind of "augmented address" that stores boundary information. Which is to say, you're asking for 80286 segmentation. We had that; it didn't do what you wanted. And the reason is that those segment descriptors need to be loaded by software that doesn't mess things up. And it doesn't: it's "just a pointer" to software and amenable to the same mistakes.

Why stop at the 80286? Consider going back to the ideas of the iAPX 432, but with modern silicon tech and the ability to spend a few million transistors here and there. (CHERI already exists on ARM and RISC-V, though.)

286 far pointers were used sparingly, to save precious memory. Now we don't have any such problem, and there are still unused bits in pointers even on the largest 64-bit systems that might perhaps be repurposed. With virtual memory, there are all kinds of hardware-supported address mappings and translations, and IOMMUs already, so adding more transistors isn't an issue. The issue is purely cultural, as you have just shown: people can't imagine it.

That's misunderstanding the hardware.
All memory access on a 286 was through a segment descriptor; every access done in protected mode was checked against the segment limit. Every single one. A "far pointer" was, again, a *software* concept where you could tell the compiler that this particular pointer needed to use a different descriptor than the one the toolchain assumed (by convention!) was loaded in DS or SS.

I suppose a CPU that only runs Rust p-code is what the OP is dreaming about...

Generated Rust "p-code" would presumably be isomorphic to LLVM IR, which doesn't have this behavior either and would be subject to the same exploits. Again, it's just not a software problem. In the real world we have hardware that exposes "memory" to running instructions as a linear array of numbers with sequential addresses. As long as that's how it works, you can demand an out-of-bounds address (because the "bounds" are a semantic thing and not a hardware thing). It is possible to change that basic design principle (again, x86 segmentation being a good example), but it's a whole lot more involved than just "Rust Will Fix All The Things".

Holy... I need to stop making fun of Rust (*). I keep getting misinterpreted. (*) ...although I don't think I can abstain...

Or people could just understand the scope of the issue better, and realize that just because something has a vulnerability doesn't mean there is a direct line to an attack. In the case of speculative execution, you need an insane amount of prep to use that exploit to actually do something. The only real way this could ever be used is if you have direct access to the computer, where you can run low-level code. It's not like you can write JS code with this that runs on browsers and lets you leak arbitrary secrets.
And in the case of systems that are valuable enough to exploit, with a risk of a dedicated private or state-funded group doing the necessary research and targeting, there should be a system that doesn't allow unauthorized arbitrary code to run in the first place. I personally disable all the mitigations because the performance boost is actually noticeable.

> It's not like you can write JS code with this that runs on browsers and lets you leak arbitrary secrets

That's precisely what Spectre and Meltdown were, though. It's unclear whether this attack would work in modern browsers, but they did re-enable SharedArrayBuffer, and it's unclear if the existing mitigations for Spectre/Meltdown stymie this attack.

> I personally disable all the mitigations because the performance boost is actually noticeable.

Congratulations, you are probably susceptible to JS code reading crypto keys on your machine.

Disabling some mitigations makes sense for an internal box that does not run arbitrary code from the internet, like a build server, or a load balancer, or maybe even a stateless API-serving box, as long as it's not a VM on a physical machine shared with other tenants.

You run "arbitrary code from the internet" as soon as you use a web browser with JS enabled.

This is exactly what you won't do on most of your infrastructure boxes, would you? If you can reasonably trust all the software on the whole box, many mitigations that protect against the effects of running adversary code on your machine become superfluous. OTOH, if an adversary gets a low-privilege RCE on your box, exploiting something like Spectre or Rowhammer could help elevate the privilege level and more easily mount an attack on your other infrastructure.

Yeah, as stated in a sibling answer, I misread your comment a little bit. It's true, on at least some classes of infrastructure boxes, you more or less "own all that is on the machine" anyway. But also note my caveat about database servers, for example.
A database server shared between accounts of different trust levels will be affected, if the database supports stored procedures, for example. Basically, as soon as there's anything on the box that not all users of it should be able to access anyway, you'll have to be very, very careful.

While that's an interesting idea, I'm not sure a side-channel attack is actually exploitable by a stored procedure, as I don't believe it has enough gadgets.

I don't know. PL/SQL (which is separate from SQL) is effectively a general-purpose language, and kind of a beast at that. I have not the faintest idea, but at least I wouldn't be surprised to see high enough precision timers, and maybe it even gets JITted down into machine code for performance nowadays. (And I've read that tight loops can be used for timing in side-channel attacks as well, although I assume it requires a lot more knowledge about the device you're running on.) A quick search reveals that there is at least a timer mechanism, but I have no idea of any of its properties: https://docs.oracle.com/en/database/oracle/oracle-database/1...

But what I'm actually trying to say is: for many intents and purposes (which might or might not include relevance to this specific vulnerability), as soon as you allow stored procedures in your database, "not running arbitrary code" is not a generally true statement anymore.

You need some lowish-level programming primitives to execute side-channel attacks. For example, you can't do cache timing with SQL.

PL/SQL, not SQL. Whatever I knew about PL/SQL in the 90s and early 2000s I've forgotten, but I wouldn't be so certain that PL/SQL a) does not have precise enough timing primitives, and b) does not get JITted down into machine code nowadays. It is a fully fledged, Turing-complete programming language with loops, arrays, etc.

What infrastructure box are you running that is running 100% all your code? Unless you ignore supply chain attacks, you've always got exposure.
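The point above about tight loops serving as timers can be illustrated with a small simulation. Assume a coarse clock that only ticks in 100-cycle units: a single 40-cycle vs. 60-cycle operation is indistinguishable, but repeating the operation in a loop amplifies the difference well past the timer's resolution. (All the cycle counts here are invented for illustration.)

```python
RESOLUTION = 100  # hypothetical timer granularity, in cycles

def coarse_clock(true_cycles: int) -> int:
    """What a low-resolution timer reports for a given true elapsed time."""
    return (true_cycles // RESOLUTION) * RESOLUTION

# One run: a 40-cycle (fast) and a 60-cycle (slow) operation look identical,
# because both round down to zero ticks.
print(coarse_clock(40), coarse_clock(60))                # 0 0

# 1000 runs in a loop: the accumulated difference dwarfs the resolution,
# so the two cases are now trivially distinguishable.
print(coarse_clock(40 * 1000), coarse_clock(60 * 1000))  # 40000 60000
```

This is also why post-Spectre browsers coarsened their timers: it raises the number of loop iterations an attacker needs, lowering the attack's bandwidth, but does not eliminate the channel.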
Excluding hardware supply chain attacks: you start with a secure Linux distro that is signed, and then the code that you write is basically written from scratch, using only the core libraries. I got really good at CS because I used to work for a contractor in a SCIF where we couldn't bring in any external packages, so I basically had to write C code for things like web servers from scratch.

Or with JS disabled. HTML isn't as expressive, but it's still "arbitrary code from the internet".

There is a difference. JS is Turing-complete; pure HTML is far from it (as far as I'm aware). So HTML might (!) well be restricted enough to not be able to carry out such an attack. But I'd never state it definitively, as I don't know enough about what HTML without JS can do these days. For all I know there's a Turing tarpit in there somewhere...

CSS3 is Turing-complete, but creating an exploit using just it would be... quite a feat. With JS or WASM, it's much more straightforward.

HTML doesn't have the potential to deliver Spectre-like attacks because:

1. No timers - timers are generally a required gadget, and often they need to be high-resolution, or building a suitable timing gadget gets harder and the bandwidth of the attack goes down.

2. No loops - you have to do timing stuff in a loop to exploit bugs in the predictor.

Which you wouldn't do on an internal load balancer or database server, right?

You are right, I sort of misread the statement I was replying to, but I also wanted to reinforce that the large class of personal desktop machines is still very much affected, even if you "think" that you don't run "arbitrary code" on your machine. By the way, you have to be careful on your database server to not actually run arbitrary code as well. If your database supports stored procedures (think PL/SQL), that qualifies, if the clients that are able to create the stored procedures are not supposed to be able to access all data on that server anyway.

Oh yeah.
Supply-chain risk is still a thing too, and defense-in-depth is not a bad strategy.

Physical isolation simplifies a lot of this. This class of attacks isn't (as) relevant for single-tenant, single-workload dedicated machines.

Based on this thread, I think people badly misjudge what "single-tenant" means in the context of susceptibility to exploits.