C++26 Shipped a SIMD Library Nobody Asked For

lucisqr.substack.com

107 points by signa11 2 days ago


jandrewrogers - 5 hours ago

I have written a lot of SIMD for both x86 and ARM over many years and many microarchitectures. Every abstraction, including autovectorization, is universally pretty poor outside of narrow cases because they don’t (and mostly can’t) capture what is possible with intrinsics and their rather extreme variation across microarchitectures. If I want good results, I have to write intrinsics. No library can optimally generate non-trivial SIMD code. Neither can the compiler. Portability just amplifies this gap.

I think a legitimate criticism is that it is unclear who std::simd is for. People that don’t use SIMD today are unlikely to use std::simd tomorrow. At the same time, this does nothing for people that use SIMD for serious work. Who is expected to use this?

The intrinsics are not difficult but you do have to learn how the hardware works. This is true even if you are using a library. A good software engineer should have a rough understanding of this regardless.

mgaunard - 4 hours ago

I made the first proposal to the C++ standard committee to introduce SIMD in 2011, before Matthias Kretz got involved with his own version (which is what became std::simd). This was based on what eventually became Eve (mentioned in the article).

Back then, it was rejected, for the same arguments that people are making today, such as not mapping to SVE well, having a separate way to express control flow etc.

There was a real alternative being considered at the time: integrating ISPC-like semantics natively in the language. Then that died out (I'm not sure why), and SIMD became trendy, so the committee was more open to doing something to show that they were keeping up with the times.

magicalhippo - 2 days ago

The linked[1] "six reasons to use std::simd" was just what I needed after a long week. Hilarious!

[1]: https://github.com/NoNaeAbC/std_simd

ozgrakkurt - 23 minutes ago

Just write inline asm for x86 and aarch64 (if you care about that) and don't care about the rest. Is it even useful to do SIMD on other processors?

Having the compiler optimize even the code around the SIMD code, based on the semantics of arithmetic or other things, sounds silly once you have written some of this kind of code.

countWSS - 3 hours ago

GCC already solved it: https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

> The operations behave like C++ valarrays. Addition is defined as the addition of the corresponding elements of the operands. For example, in the code below, each of the 4 elements in a is added to the corresponding 4 elements in b and the resulting vector is stored in c.
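
A minimal sketch of what that passage describes, assuming GCC or Clang (the vector_size attribute is a compiler extension, not standard C++). The helper name add4 is made up for illustration:

```cpp
// GCC/Clang vector extension: a 16-byte vector holding four 32-bit ints.
typedef int v4si __attribute__((vector_size(16)));

// Element-wise addition: each lane of a is added to the matching lane of b.
// add4 is a hypothetical helper, not part of any library.
inline v4si add4(v4si a, v4si b) {
    return a + b;
}
```

Individual lanes can be read back with subscripts, e.g. c[0], on both compilers.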

jcranmer - 2 hours ago

If you thought std::simd was a library nobody asked for, just wait until you hear about <linalg>. I feel like half the people looking forward to that think they're just going to get standard C++ bindings to LAPACK, when instead they're probably going to get an unoptimized, slapdash implementation of LAPACK written by people who aren't good at BLAS.

As for SIMD itself, designing a good SIMD library is difficult because there are several different SIMD approaches and some of them work poorly for certain use cases. For example, you can take an HPC-ish approach of "vectorize this loop" (à la #pragma omp simd) and have the compiler take care of a fairly mechanical transformation. Or you can take an opposite approach of treating a 128-bit SIMD vector as a fundamental data type in your language. Which approach is better depends on your use case.
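
To make the two styles concrete, here is a rough sketch of each, assuming GCC or Clang. The names scale_loop, scale_vec and f32x4 are invented for this example, and the second half uses the compiler's vector extension rather than any standard type:

```cpp
#include <cstddef>

// Approach 1: "vectorize this loop". The pragma is only a request to the
// compiler; it is honored with -fopenmp or -fopenmp-simd and ignored otherwise.
inline void scale_loop(float* x, std::size_t n, float k) {
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        x[i] *= k;
    }
}

// Approach 2: a 128-bit vector as a first-class value type
// (GCC/Clang vector extension; four floats per vector).
typedef float f32x4 __attribute__((vector_size(16)));

inline f32x4 scale_vec(f32x4 v, float k) {
    f32x4 kk = {k, k, k, k};  // broadcast the scalar to every lane
    return v * kk;            // lane-wise multiply
}
```

In the first style the caller never sees a vector width. In the second the width is part of the type, so handling array tails and wider hardware is the programmer's problem.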

zombot - an hour ago

The article's point in a nutshell:

> The problem is that std::simd in 2026 is the 2012 solution arriving after the world moved on. The committee spent a decade polishing a library-based approach while compilers solved the easy cases automatically and ISPC solved the hard cases with language-level support.

I find it interesting that the C++ committee would make that kind of mistake. Shouldn't they know better?

plasticeagle - 2 hours ago

Nobody should read that AI slop article. Nobody.

Maybe there's an interesting story in there, it's certainly possible. But the "author" could not be bothered to write it, so why should we waste our time reading it?

raverbashing - 27 minutes ago

sigh

C++ sits at that weird abstraction level where it wants to be a higher-level language but keeps grinding its gears on stuff like pointer sizes, pointer arithmetic, or vector sizes, while at the same time it wants to stay C-compatible and needs that interface with the lower-level world.

Now compare with how numpy does things: you care about the data size but not the implementation.

Still, I didn't expect less (of a crap fest) from the C++ committee as presented here

fithisux - an hour ago

Why is just writing inline assembly not enough?

You optimize for a specific target.

The problem is that you cannot be cross-platform. Sure.

But that is why software is incremental.

I write for my HW, not yours. You can write for yours.

Make folders with implementations

x86_v1 x86_v2 arm64 riscv64 ... ... ...

and include
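
A rough single-file sketch of that layout, under a few assumptions: it uses intrinsics instead of inline assembly to stay short, the function names add_arrays and kernel_name are hypothetical, and each branch stands in for a header that would live in its own folder:

```cpp
#include <cstddef>

// Each branch below stands in for a per-target folder (x86_v1/, arm64/, ...)
// that would normally be pulled in with #include. Names are hypothetical.
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2, part of the x86-64 baseline
inline const char* kernel_name() { return "x86_v1"; }
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {  // four floats per 128-bit register
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}
#elif defined(__aarch64__)
#include <arm_neon.h>  // NEON, part of the AArch64 baseline
inline const char* kernel_name() { return "arm64"; }
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        vst1q_f32(out + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail
}
#else
// Any other target: plain scalar code until someone writes a folder for it.
inline const char* kernel_name() { return "generic"; }
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
#endif
```

Callers only ever see add_arrays; which body they get is decided at compile time by the target's predefined macros.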

ori_b - 2 hours ago

Slop.