Exploiting Undefined Behavior in C/C++ Programs: The Performance Impact [pdf]
web.ist.utl.pt
90 points by luu 4 days ago
I notice that the paper doesn't claim to eliminate all reasoning about undefined behavior for optimizations. For example:
  int f() {
    int arr[3], i = 0;
    arr[3] = 5;
    return i;
  }
Optimizing this to "return 0" is relying on UB, because it's assuming that i wasn't laid out directly after arr in the stack frame. I believe this is what the paper calls "non-guardable UB".

I don't agree with the claim in the paper that their semantics offers a "flat memory model". A flat memory model would rule out the optimization above. Rather, the memory model still has the notion of object bounds; it's just simplified in some ways.
> it's assuming that i wasn't laid out directly after arr in the stack frame
The compiler isn't "assuming" that so much as choosing not to put i in the stack frame at all. And I don't think it's correct to view the lack of a register spill as an "optimization" per se. It does remain true that code writing past the end of an array will be UB in typical scenarios (i.e. when not using asan/valgrind).
(Now, if the compiler also removed the store, that could legitimately be called an optimization based on assuming no-UB)
"Exploiting undefined behavior" occurs when a simple semantics (however one defines "simple") results in behavior A, but the compiler chooses behavior B instead based on the actual, more complex, language semantics. The code snippet in question passes that test. If I flip the declaration order of i and arr, then I get this [1] at -O0 (the "simple" semantics):
  push rbp
  mov rbp, rsp
  mov dword ptr [rbp - 4], 0
  mov dword ptr [rbp - 4], 5
  mov eax, dword ptr [rbp - 4]
  pop rbp
  ret
Which indeed returns 5. But at -O2 clang optimizes it to this:

  xor eax, eax
  ret
Which returns 0. So the simple semantics produces one result, and the complex semantics produces another. Hence, it's exploiting undefined behavior.

Maybe this is just arguing semantics, but I don't agree with the definition you've given, and I don't think that your definition is what TFA means. "Exploiting undefined behavior" I think refers narrowly to the implementation assuming that undefined behavior does not occur, and acting on that assumption. Undefined behavior naturally resulting in unpredictable behavior is not exploitation in the same sense. For example,
  printf("A");
  bool x;
  if (x) { printf("B"); } else { printf("C"); }
  printf("\n");
If at -O0 "AB" is printed and at -O2 "AC" is printed (due to the vagaries of whatever was left on the stack), then that would meet your definition, but I would not regard that as "exploiting undefined behavior", merely as the manifestation of the inherent unpredictability of UB. If the compiler didn't print anything (i.e. the whole block was removed due to UB detection) then that _would_ be a case of exploiting undefined behavior.

What about this: https://godbolt.org/z/xP9xG3Ee3
Here the compiler "register allocates" i for some reads but not for others.
i gets stack allocated, but some uses of it act as though they were register allocated.
I'm not quite sure what you're asking for exactly, given that the link is for clang trunk, which doesn't have the modifications discussed in TFA, and I don't dispute that clang does UB-based reasoning at -O3. But I will argue that the assembly shown can be accomplished without resorting to what I call "reasoning about UB", and within a flat memory model, supporting the claim that these sacrifices are often not necessary.

I'm going to draw a distinction between stack memory that is "private", in the sense that only the compiler is allowed to alter it, and "public", where the address can be written to by something else and the compiler needs to handle that. Local variables are at first tracked privately. After the address of a variable is taken with &x, or at the point in time when an array variable is indexed, the associated memory becomes public. Conceptually, the use of private memory can be indirect; the compiler could encode/decode a stack variable x as (x XOR 0xDEADBEEF) on the stack and it would be fine (though in practice the compiler naturally does the simple thing).

Note that this notion of "private"/"public" stack memory is a property of addresses, not of the provenance of the accessing pointers, and so it is fully compatible with a flat memory model. The compiler's ability to use private memory isn't a case of "reasoning about UB" in any meaningful sense -- otherwise you could just as well argue that returning from a function call is "reasoning about UB", because the return address can't be clobbered.
In your snippet, the correctness argument for the assembly in a non-UB-reasoning universe goes like this: at first, i is stored privately on the stack with value zero, so as an optimization we can assume that value is still zero without rereading it. Only later, when &i is taken, is that memory made public, and the compiler has to worry about something altering it. In actual execution, the problem is that the write function alters compiler-private memory (and note again, being private is a property of the underlying address, not of the fact that it's accessed via out-of-bounds array indexing), and this is UB, so the program breaks. But the compiler didn't need to make _assumptions_ about UB.
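To make that timeline concrete, here is a minimal sketch (my own code, not the godbolt snippet; escape() is a hypothetical external function the compiler can't see through):

  /* hypothetical: defined in another translation unit, opaque to the compiler */
  void escape(int *p);

  int demo(void) {
    int i = 0;       /* i's stack slot is private: only the compiler writes it */
    int a = i + 1;   /* so this read can be folded to 0 without a reload */
    escape(&i);      /* address taken: the slot is now public */
    return a + i;    /* i must be reloaded from memory after the call */
  }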
I've only briefly skimmed the paper, but on that glance, it looks like what they did was (effectively) drop all the attributes in LLVM that can indicate UB, e.g., the inbounds flags on getelementptr instructions, or the nsw flags on arithmetic operations.
As you note, it doesn't remove the more core UB behaviors in LLVM, in particular LLVM's reliance on pointer provenance.
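For a concrete sense of what one of those flags buys (a standard example, not something from the paper): because signed overflow is UB, clang tags the addition below with nsw, and at -O2 the whole function folds to "return 1":

  int always_true(int x) {
    return x + 1 > x;   /* false only if x + 1 overflows, which nsw rules out */
  }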
They do a bit more than that. One of the options (-disable-object-based-analysis under AA2) disables the assumption that distinct identified objects do not alias, which is disabling pointer provenance in at least one key place.
So I think this option very roughly approximates a kind of "no provenance, but with address non-determinism" model, which still permits optimizations like SROA on non-escaping objects.
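Roughly the kind of case that option covers, as I understand it (my sketch, not the paper's): under object-based alias analysis, a store through a pointer derived from one local is assumed not to touch a distinct local, even if the addresses happen to coincide at runtime:

  int distinct(void) {
    int a = 1, b = 2;
    int *p = &a;
    p[1] = 3;    /* out of bounds for a; might land on b's address at runtime */
    return b;    /* object-based AA says the store can't alias b, so this
                    can be folded to "return 2" */
  }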
Sorry, I don't get why the memory layout should have any effect when it's clear in the AST that i = 0 should be returned.
I think in the example the parent gave `arr[3]` is past the end of the 3 element array, where `i` might reside, potentially changing its value.
It's clear in the AST that there is undefined behaviour, and the code is malformed in the sense that the standard imposes no requirements on its behaviour. So what the compiler chooses to do with it is not defined by the language.
Note that if you change the code to this you have the same issue:
  int g(int n) {
    int arr[3], i = 0;
    arr[n] = 5;
    return i;
  }
Without "exploiting UB" it's incorrect to optimize this to "return 0", because of the possibility that i was allocated right after arr and n == 3.

If we consider writing out of bounds to be legal, we make it impossible to reason about the behavior of programs.
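For example (a sketch of my own, not from TFA):

  int h(int n) {
    int arr[3], i = 0, j = 1;
    arr[n] = 5;     /* if this store may legally hit i or j... */
    return i + j;   /* ...then this can't be folded to "return 1"; both
                       locals must be reloaded from memory */
  }

Every local would have to live in memory, pessimistically reloaded across any store whose address the compiler can't pin down.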
Hence why I (like many other compiler developers) am inherently skeptical whenever anyone says "just stop exploiting undefined behavior".
I'm not a compiler developer, but I'm at least as skeptical as you, because there is no sign that the "just stop exploiting UB" people actually want any specific semantics; IMO they want Do What I Mean, which isn't a realizable language feature.
If you could somehow "stop exploiting UB", they'd just be angry either that you're still exploiting an actual language requirement they don't like (and so have decided ought to be excluded), or that you followed the rules too literally when "obviously" the thing they meant ought to happen, even though that's not what they actually wrote. It's lose-lose for compiler vendors.
I am in the "stop exploiting UB" camp. [1]
I agree that some of us are unreasonable, but I do recognize that DWIM is not feasible.
I just want compilers to treat UB the same as unspecified behavior, which cannot be assumed away.
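A sketch of the difference I mean, using the classic null-check case (my example):

  int deref(int *p) {
    int v = *p;       /* UB model: the compiler may assume p != NULL here     */
    if (p == NULL)    /*   and delete this entire check;                      */
      return -1;      /* "unspecified" model: *p yields some value or traps,  */
    return v;         /*   but the check afterwards must be preserved         */
  }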
> I just want compilers to treat UB the same as unspecified behavior, which cannot be assumed away.
Unspecified behavior is defined as the "use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance".
Which (two or more) possibilities should the standard provide for out-of-bounds writes? Note that "do what the hardware does" wouldn't be a good specification because it would either (a) disable all optimizations or (b) be indistinguishable from undefined behavior.
You write "Note that those surprised programmers are actually Rust compiler authors", but I can't figure out which of the many links points to "surprised programmers" who are actually rustc authors, so I don't even know whether you're right.
Rust's safe subset doesn't have any UB, but unsafe Rust can of course cause UB very easily, because Rust's rules are extremely strict and only safe code gets the compiler's guarantee that the rules aren't broken. So it seems weird for people who work on the compiler's guts to be "surprised".