Bug 1950764: Work Around Crash on Intel Raptor Lake CPU

phabricator.services.mozilla.com

146 points by luu 3 days ago


bri3d - 14 hours ago

Linked in the Bugzilla thread is a really nice in-depth investigation of the same issue with high register aliases in a similar algorithm (Huffman coding) but in an entirely different product: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in... .

It's concerning that Intel don't seem to have been responsive to anyone about this issue, and it doesn't appear to have an official erratum yet. That said, Raptor Lake was the Intel CPU with voltage issues and basically random bit rot, so I suppose it's hard to tell whether this is a silicon-level erratum caused by bad design or the result of some kind of post-manufacturing damage. Raptor Lake in general causes enough non-reproducible noise that I believe Firefox gave up on automated crash reports from it ( https://bugzilla.mozilla.org/show_bug.cgi?id=1975808 ).

EDIT: I read that Oodle article (which is SO good!) again and realized that their customer-provided reproduction of the bug was directly linked to boost clock speeds (the customer said that overclocking by 5% made it happen entirely reliably), so this is definitely not a "the architecture has a 100% bug in it" but rather some deeper issue with clock propagation that appears at edge cases.

Polizeiposaune - 14 hours ago

Details of the errata from a comment in the diff:

"Write both dist bytes as a single 2-byte store. This avoids the `movb %ch, [mem]` instruction pattern (store from high-byte register alias) that LLVM otherwise emits when dist arrives as a wide register. That pattern triggers the Intel Raptor Lake CPU errata, causing silent 2-byte stores that corrupt the adjacent `len` byte."
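To make the quoted comment concrete, here is a minimal Rust sketch of the source-level change it describes. The three-byte record layout and both function names are hypothetical and not taken from the actual diff, and the language of the real patch is an assumption here. Whether a compiler really emits a high-byte store for the first version depends on the compiler, target, and optimization level, so treat this as an illustration of the idea rather than a reproduction of the bug.

```rust
/// Hypothetical record layout: [dist_lo, dist_hi, len].
/// Writes the two dist bytes with two separate 1-byte stores.
fn store_bytewise(buf: &mut [u8; 3], dist: u16, len: u8) {
    buf[0] = dist as u8;
    // A compiler may lower this to a store from a high-byte
    // register alias (e.g. %ch), the pattern the comment avoids.
    buf[1] = (dist >> 8) as u8;
    buf[2] = len;
}

/// Writes both dist bytes as one 2-byte little-endian value,
/// then writes len separately.
fn store_combined(buf: &mut [u8; 3], dist: u16, len: u8) {
    buf[..2].copy_from_slice(&dist.to_le_bytes());
    buf[2] = len;
}

fn main() {
    let mut a = [0u8; 3];
    let mut b = [0u8; 3];
    store_bytewise(&mut a, 0x1234, 7);
    store_combined(&mut b, 0x1234, 7);
    // Both versions produce the same bytes; only the store pattern differs.
    assert_eq!(a, b);
    println!("{:?}", b);
}
```

Both functions leave the buffer in the same state, so the workaround is invisible at the source level. The only intended difference is in which store instructions the compiler is likely to generate.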

robin_reala - 13 hours ago

Also worth reading this thread on the subject: https://mas.to/@gabrielesvelto/116630047156991279

Regarding the Raptor Lake bug I received a couple of messages from confused users that had read articles on Tomshardware and Neowin. They asked about erratas and microcode updates which puzzled me, because that was part of my early investigation into the bug and we know that the failure is not caused by a known errata and microcode updates cannot fix broken CPUs. So why did they ask? As it turns out it was slop. Both articles are 100% slop full of confusing and inaccurate claims.

codedokode - 7 hours ago

I looked at the Raptor Lake errata [1] and it looks pretty scary. What if someone builds an exploit on these errors?

This is why CPU designers should aim for simplicity. This is why the RISC-V vector extension, which requires complicated logic, can become a source of implementation errors.

[1] https://edc.intel.com/content/www/us/en/design/products/plat...

mike_hock - 14 hours ago

Uh ... working around this in each and every piece of software sounds like a non-starter? Intel should be on the hook to fix this.

ahartmetz - 2 hours ago

Wait what, they are using Phabricator? I don't think it's particularly bad, I just thought it was... particularly dead.

userbinator - 14 hours ago

WTF, Intel? This is reminding me of a very similar bug from 9 years ago: https://news.ycombinator.com/item?id=14630183

Clearly Intel needs to do far more extensive regression-testing, with things like demoscene productions --- especially the extremely size-optimised ones that can exercise the edge-cases much better than the usual "compiler slop".

charcircuit - 13 hours ago

Hopefully this bug is getting handled upstream in a microcode update or a compiler fix to avoid emitting such instructions. Just a comment mentioning that you should not emit a particular instruction is not a strong guarantee.
