The Cost of a Closure in C

thephd.dev

177 points by ingve 15 hours ago


kazinator - 5 hours ago

> It’s no wonder GCC is trying to add -ftrampoline-impl=heap to the story of GNU Nested Functions; they might be able to tighten up that performance and make it more competitive with Apple Blocks.

[disclaimer] Without brushing up on the details of this, I strongly suspect that this is more about removing the need for executable stacks than about performance. Allocating a trampoline on the stack rather than the heap is good for efficiency.

These days, many GNU/Linux distros are disabling executable stacks by default in their toolchain configuration, both for building the distro and for the toolchain offered by the system to the user.

When you use GCC local functions, the compiler overrides the linker's default behavior so that the executable is marked as requiring an executable stack.

Of course, that is a security concession: when your stack is executable, remote code execution exploits can work by injecting code into the stack via a buffer overflow and tricking the process into jumping to it.

If trampolines can be allocated in a heap, then you don't need an executable stack. You do need an executable heap, or an executable dedicated heap for these allocations. (Trampolines are all the same size, so they could be packed into an array.)

Programs which indirect upon GCC local functions are not aware of the trampolines. The trampolines are deallocated naturally when the stack rolls back on function return or longjmp, or a C++ exception passing through.

Heap-allocated trampolines have an obvious deallocation problem; it would be interesting to see what strategy is used for that.

unwind - 10 hours ago

This was very interesting, and it's obvious from the majority of the text that the author knows a lot about these languages, their implementation, benchmarking corners, and so on. Really!

Therefore it's very jarring to read this text after the first C code example:

> This uses a static variable to have it persist between both the compare function calls that qsort makes and the main call which (potentially) changes its value to be 1 instead of 0

This feels completely made up, and/or some confusion about things that I would expect an author of a piece like this to really know.

In reality, in this usage (at the global outermost scope level) `static` has nothing to do with persistence. All it does is make the variable "private" to the translation unit (C parlance, read as "C source code file"). The value will "persist" since the global outermost scope can't go out of scope while the program is running.

It's different when used inside a function: then it makes the value persist between invocations, in practice typically by moving the variable from the stack to the "global data" region, which is generally set up once as the program loads. Note that C does not mention the existence of a stack for local variables, but of course that is the typical implementation on modern systems.

Rochus - 11 hours ago

The benchmark demonstrates that the modern C++ "Lambda" approach (creating a unique struct with fields for captured variables) is effectively a compile-time calculated static link. Because the compiler sees the entire definition, it can flatten the "link" into direct member access, which is why it wins. The performance penalty the author sees in GCC is partly due to the OS/CPU overhead of managing executable stacks, not just code inefficiency. The author correctly identifies that C is missing a primitive that low-level languages perfected decades ago: the bound method (wide) pointer.

The most striking surprise is the magnitude of the gap between std::function and std::function_ref. It turns out std::function (the owning container) forces copy-by-value semantics deep into the recursion. In the "Man-or-Boy" test, this apparently causes an exponential explosion of copying the closure state at every recursive step. std::function_ref (the non-owning view) avoids this entirely.

uecker - 3 hours ago

BTW: I wrote why the lambda design does not fit C well here:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3654.pdf

(and I am not impressed by micro benchmarks)

RossBencina - 13 hours ago

Good to see Borland's __closure extension got a mention.

Something I've been thinking about lately is having a "state" keyword for declaring variables in a "stateful" function. This works just like "static", except that instead of having a single global instance of each variable, the variables are added to an automatically defined struct whose type is available using "statetype(foo)" or some other mechanism. You can then invoke foo with an instance of the state (in C this would be an explicit first parameter, also marked with the "state" keyword). Stateful functions are colored in the sense that if you invoke a nested stateful function, its state gets added to the caller's state. This probably won't fly with separate compilation though.

sirwhinesalot - 12 hours ago

I think local functions (like the GNU extension) that behave like C++ byref(&) capturing lambdas make the most sense for C.

You can call the local functions directly and get the benefits of the specialized code.

There's no way to spell out this function's type, and no way to store it anywhere. This is true of regular functions too!

To pass it around you need to use the type-erased "fat pointer" version.

I don't see how anything else makes sense for C.

Progge - 13 hours ago

Long time ago I wrote C. Could anyone fill me in why the first code snippet is arg parsing the way it is?

    int main(int argc, char* argv[]) {
      if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
          ptrdiff_t r_from_start = (r_loc - argv[1]);
          if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
            in_reverse = 1;
          }
        }
      }
      ...
    }

Why not

    if (argc > 1 && strcmp(argv[1], "-r") == 0) {
      in_reverse = 1;
    }

for example?

kazinator - 5 hours ago

Defining a callback interface in C without a user context parameter is a capital crime.

hyperbolablabla - 7 hours ago

Stewart Lynch in his 10x VODs mentions his custom Function abstraction in C++. It's super clean and explicit, avoiding the `auto` requirement of C++ lambdas. Its use looks something akin to:

    // imagine my_function takes 3 ints, the first 2 args are captured and curried.
    Function<void(int)> my_closure(&my_function, 1, 2);
    my_closure(3);

I've never implemented it myself, as I don't use C++ features all too much, but as a pet project I'd like to someday. I wonder how something like that compares!

nesarkvechnep - 12 hours ago

I'm thinking of using C++ for a personal project specifically for the lambdas and RAII.

I have a case where I need to create a static templated lambda to be passed to C as a pointer. Such a thing is impossible in Rust, which I considered at first.

mgaunard - 13 hours ago

I feel the results say more about the testing methodology and inlining settings than anything else.

Practically speaking all lambda options except for the one involving allocation (why would you even do that) are equivalent modulo inlining.

In particular, the caveat with the type erasure/helper variants is precisely that it prevents inlining, but given everything is in the same translation unit and isn't runtime-driven, it's still possible for the compiler to devirtualize.

I think it would be more interesting to make measurements when controlling explicitly whether inlining happens or the function type can be deduced statically.

groundzeros2015 - 7 hours ago

Thread locals do solve the problem. You create a wrapper around the original function. The wrapper stores the user data (and the function pointer that accepts user data) in thread-local globals, then passes in a plain function which calls that function pointer with the stored user data.

ddtaylor - 13 hours ago

I actually enjoy trampoline functions in C a bit and it's one of the GNU extensions I use sometimes.

keymasta - 9 hours ago

It's a post about Man or Boy... and the only typo is... the word _son_. Pretty sure it's supposed to be "on"

psyclobe - 11 hours ago

c++ for the win!! finally!!

capestart - 13 hours ago

The breakdown of lambda, blocks, and nested functions demonstrates how important implementation and ABI details are in addition to syntax. I think the standard for C should include a straightforward, first class wide function pointer along with a closure story to stop people from adding these half portable, half spooky extensions.

trgn - 8 hours ago

i wish JS gurus understood this before jumping all in on hooks and bloating the runtime footprint of every web app out there