Cost of enum-to-string: C++26 reflection vs. the old ways
vittorioromeo.com | 95 points by sagacity 4 days ago
> The header is the cost. Not the reflection. The reflection algorithm is fast – asymptotically ~0.07 ms per enumerator, essentially the same as the hand-rolled switch in the X-macro version (~0.06 ms). What makes reflection look expensive is <meta>: just including it costs ~155 ms per TU over the baseline.
So speaking of old ways: I'm not a C++ dev, but a while ago I saw someone comment that they still organize their C++ projects using tips from John Lakos' Large-Scale C++ Software Design from 1997, and that their compile times are incredibly fast. So I decided to find a digital copy on the high seas and read it out of historical curiosity. While I didn't finish it, one wild thing stood out to me: he advised using redundant external include guards around every include, e.g.
#ifndef INCLUDED_MATH
#include <math.h>
#define INCLUDED_MATH
#endif
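(For completeness: in Lakos' scheme each project header also carries the matching internal guard, so the external check short-circuits without the preprocessor ever opening the file again. Standard headers like math.h don't define INCLUDED_* macros, which is why the external #define above does that job itself. A minimal sketch with a hypothetical project header:)

// mymath.h
#ifndef INCLUDED_MYMATH
#define INCLUDED_MYMATH
double my_sqrt(double x);
#endif

// client.cpp
#ifndef INCLUDED_MYMATH
#include "mymath.h"
#endif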
The reason for this being that (in 1997) every include required the preprocessor to open the file just to check for an include guard and read it all the way to the end to find the closing #endif, causing potentially quadratic disk-read overhead across a project (if anyone feels like verifying this, it's explained on pages 85 to 87). Again, that was in 1997. I have no idea what mitigations for this problem exist in compilers by now, but I hope at least a few, right?
This conclusion is making me wonder whether following that advice would still have a positive impact on compile times today after all. Surely not, right? Can anyone more knowledgeable about this comment on that?
This cost is not significant nowadays; the dominant cost is frontend/parsing time.
You can also use `#pragma once`, which works everywhere in practice, is nicer, and technically needs less work from the compiler, but compilers have optimized for include guards for a long time now.
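For reference, the two idioms side by side (header name and contents hypothetical):

// widget.h, classic include guard
#ifndef WIDGET_H
#define WIDGET_H
struct Widget { int id; };
#endif

// widget.h, same effect with the non-standard but widely supported pragma
#pragma once
struct Widget { int id; };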
Some random measurements I found: https://github.com/Return-To-The-Roots/s25client/issues/1073
Yes, I've heard that before, but comments like this one in your linked issue still make me wonder:
> at least for gcc and Visual Studio using #pragma once has a significant impact. The fact is, the compiler does not need to continue parsing the whole file when reaching a #pragma once. otherwise the compiler always needs to do it even if the include guard afterwards will avoid double processing of the content afterwards.
As written, the explanation of these optimizations suggests that both #pragma once and the include-guard optimization still require opening and closing the file each time an include is encountered, even if you bail after parsing the first line. Is that overhead zero? Or are the optimizations explained poorly, and is repeatedly opening/closing the file avoided as well?
Either way, do you know what causes the slowdown as a result of including <meta>?
The compiler doesn't need to open the same file multiple times. It can remember whether a file is guarded every time it sees its name.
My understanding is that this is an optimization that has been available for a very long time now.
The only issue is if a file is referred to through multiple names (because of hard links, symlinks, or mounts). That might cause the file to be opened again, and it can actually break #pragma once.
gcc actually documents this behaviour: https://gcc.gnu.org/onlinedocs/cppinternals/Guard-Macros.htm...
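The documented condition, roughly: the multiple-include optimization only applies when the guard wraps the entire file. A sketch (contents hypothetical):

// foo.h -- qualifies: nothing but comments/whitespace outside the guard,
// so the compiler can skip re-opening it on subsequent includes
#ifndef FOO_H
#define FOO_H
void foo();
#endif

// bar.h -- does NOT qualify: the token before the guard defeats the
// optimization, and bar.h is re-opened every time it is included
void bar();
#ifndef BAR_H
#define BAR_H
void bar_impl();
#endif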
The overhead isn't zero, but with SSDs (and filesystem caches in the gigabytes these days) it's damn near insignificant in pure terms of opening files and such.
>...from John Lakos' Large-Scale C++ Software Design from 1997...
I'll just point out that Lakos updated his work with a new edition in 2019:
Large-Scale C++ Volume I: Process and Architecture
and there's scattered evidence that Volume II might be published in Feb. 2027 [1]
Large-Scale C++ Volume II: Design and Implementation
[1]: https://www.amazon.co.uk/Large-Scale-Implementation-Addison-...
Oh nice, thanks for the tip! Don't know if I can justify picking up a copy given that I do not work with C++ at all nor with large-scale systems. But I know a few people who might be interested.
What I found (so far on MSVC) is that #pragma once does only process the file once, whereas include guards still cause the file to be opened each time it is included. It takes almost no time to do so, but it still appears in the traces.
I'm going to experiment with other compilers and figure out how they handle it.
I've not been diligently keeping up with C++ recently, but there's a C++20 feature called modules. Per Wikipedia, they're somewhat like precompiled headers.
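A minimal sketch of what that looks like (a C++20 named module; file names hypothetical, and build-system support still varies by toolchain):

// math_utils.cppm
export module math_utils;
export int square(int x) { return x * x; }

// main.cpp
import math_utils; // no header text is re-parsed per TU
int main() { return square(4); }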
Oof, that first example (the idiomatic C++26 way) looks so foreign if you're mostly used to C++11.
I was very curious to see what C++ 26 brings to the table, since I haven't used C++ in a while.
When I saw the 'no boilerplate' example, the very first thought that came to my mind:
This is the ugliest, most cryptic and confusing piece of code I've ever seen. Calling this 'no boilerplate' is an insult to the word 'boilerplate'.
Yeah, I can parse it for a minute or two and I mostly get it.
But if given the choice, I'd choose the C-macro implementation (which is 30+ years old) over this, every time. Or the good old switch case where I understand what's going on.
I understand that reflection is a powerful capability for C++, but the template-meta-cryptic-insanity is just too much to invite me back to this version of the language.
As a developer who doesn't really write C++ code I'm inclined to agree, but I think Herb Sutter's "syntax 2" project might provide a nice way out of that mess eventually.
I played around with cppfront over Christmas and it was a lot more ergonomic than my distant memories of C++11, which I don't even have negative memories of per se.
That isn't going anywhere official.
It is no different from any other language that compiles via C or C++ code generation; it got sold a bit differently due to his former position at WG21.
Well, if you mean "as an official C++ syntax" then I agree, and I suspect Sutter would agree as well. He titled one talk about it "Towards a TypeScript for C++", after all[0].
But I do think it is different than other "compile to C++" languages, because it seems to be more of a personal case study for Sutter to figure out various reflection and metaprogramming features, and then "backport" those worked out ideas to regular C++ via proposals. And the latter don't have to match the CPP2 syntax at all.
In multiple examples he's given in talks the resulting "regular" C++ code is easier to read, mainly because the metaprogramming deals with so much boilerplate.
What Herb Sutter misses with his TypeScript-and-Kotlin-for-C++ metaphor is the reality of how those languages actually integrate with their hosts, unlike cpp2.
TypeScript is a linter, nothing else: type annotations for JavaScript. The two features that aren't present in JavaScript, enums and namespaces, are considered design mistakes, and the team vowed to focus only on being a linter, and a polyfill for older runtimes where possible (some JS features require runtime support).
While Kotlin emits JVM bytecode, many language constructs, like coroutines, make the interop one-way: it is easy to call Java from Kotlin, but the other way around requires boilerplate code, manipulating the additional classes the Kotlin compiler generates for its semantics.
My point was that TypeScript isn't exactly about to replace JavaScript, which was what you were arguing. I'm honestly not sure what you're trying to argue now.
Like, yeah, what you say about TS and Kotlin is true about TS and Kotlin. But since you're not explaining what cpp2 does or plans to do differently, and why it matters, I'm not sure where you're going with that. It's probably obvious but I'm not getting it.
The metaphor Sutter was going for, as I see it, is that TS and Kotlin both added missing features to their host language. Most importantly reflection and decorators in TS, which are now becoming a standard in JS as well[0]. cpp2 mainly focuses on experimenting with reflection and metaprogramming too, adding features currently missing in C++ by being a compiles-to-C++ language. Sutter has written C++ proposals that would give C++ similar reflection and metaprogramming capabilities, based on what he discovered while working on cpp2. That's pretty comparable if you ask me.
> But if given the choice, I'd choose the C-macro implementation (which is 30+ years old) over this, every time.
Why? The implementation is not pretty, but you only need to write it once and then it works for all enums. The actual usage is trivial, it's just a function call.
The C macro version is horrendous in comparison. Why would I want to declare my enums like that just because I might want to print them?
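For reference, the write-it-once helper is roughly along these lines; a sketch in the spirit of the article and the C++26 reflection proposal (P2996), so the exact std::meta spellings may differ from what finally ships:

#include <meta>
#include <string_view>
#include <type_traits>

template <typename E>
    requires std::is_enum_v<E>
constexpr std::string_view enum_to_string(E value) {
    // expansion statement over the enumerators reflected from E
    template for (constexpr auto e : std::meta::enumerators_of(^^E)) {
        if (value == [:e:])                      // splice the enumerator back
            return std::meta::identifier_of(e);  // its source-level name
    }
    return "<unknown>";
}

// usage:
//   enum class Color { red, green, blue };
//   enum_to_string(Color::green) == "green"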
Then why isn't it part of the stdlib? Why should everybody maintain their own version?
Just wait for C++32 :-D. After all, we only got `std::string::starts_with` in C++20 and C++23 finally gave us `std::string::contains`. It's a clown show, you just need to take it with humor.
It is "cryptic" and "ugly" to you just because you're not familiar with it. You'd pick the macro-based implementation because you are familiar with it.
Seeing this argumentation is so tiresome, because it feels like there is a lack of self-awareness regarding what is "familiar" and what isn't, which is subconsciously translated to "ugly" and "bad".
Have you ever used other (modern) programming languages?
In a lot of languages, you achieve the same with 1 line of code. It's not about familiarity, it's about the fact that it's a long and convoluted incantation to get the name of an enum.
Why do I have to be familiar with all those weird symbols just to do a trivial thing?
Update:
Zig:
const Color = enum { red, green, blue };
const name = @tagName(Color.red); // "red"
Rust (note: the Display derive isn't in std; it comes from a crate such as strum):
use strum_macros::Display;

#[derive(Display)]
enum Color { Red, Green, Blue }

let name = Color::Red.to_string(); // "Red"
Clojure:
(name :red) => "red"
And what if you want to implement something like Rust's "derive"? That is what the article shows.
As far as I understand, in Rust you would have to mess with individual parser tokens instead of high-level structures like "enum" (which is what C++ reflection gives you). It would be much, much uglier to implement anything like "to_enum_string" in Rust, as you would have to re-implement parts of the compiler to get the "enum" concept out of a list of tokens.
C++:
enum Color { red, green, blue };
auto name = to_enum_string(Color::red); // "red"

...and where does that `to_enum_string` come from exactly? It doesn't seem to be built-in, which is the point of the parent comment.
It's a fair comparison. The parent comment isn't showing the compiler source code for the built-in reflection mechanisms.
You won't have to care about ^^ and [:X:] if you just want to consume reflection-based utils, which was the whole point of my comment.
What? No. Parent comment is comparing C++ to modern programming languages, showcasing how they provide commonly used utilities out-of-the-box instead of making every programmer re-implement them again and again and again and again and again.
The parent comment is quite clear:
> Why do I have to be familiar with all those weird symbols just to do a trivial thing?
And my answer demonstrates that you do not have to.