Kernel optimization with BOLT (binary optimization and layout tool)

lwn.net

133 points by chmaynard 7 days ago


stephc_int13 - 3 hours ago

Instruction cache and TLB thrashing is an often overlooked consequence of code bloat, and sometimes of overly aggressive, micro-benchmark-driven optimization.

Reorganizing the binary is an interesting approach to minimizing that cost, but I think any performance-oriented developer should keep in mind that most projects rarely depend on a single hot loop. They depend on many systems working together and competing for space in the cache(s).

I generally use -Os instead of -O2 or -O3 in my projects, and try to keep code bloat to a minimum for that reason.
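As a rough illustration of why reorganizing the binary helps, here is a toy sketch that counts how many 4 KiB pages contain hot code before and after placing the most frequently called functions first. The function names, sizes, call counts, page size, and hotness threshold are all hypothetical. BOLT's real layout is more sophisticated (it uses profile data and caller/callee relationships, and can split cold blocks out of functions), so treat this only as a model of the idea.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

# Hypothetical functions: (name, size_in_bytes, profiled_call_count)
FUNCTIONS = [
    ("parse_args", 3000, 1),
    ("hot_loop", 1200, 1_000_000),
    ("log_error", 5000, 3),
    ("hash_key", 800, 900_000),
    ("init_tables", 6000, 1),
    ("lookup", 1500, 850_000),
]

def pages_touched(order, hot_threshold):
    """Return the set of page indices that hold at least one hot function."""
    pages = set()
    offset = 0
    for _name, size, calls in order:
        if calls >= hot_threshold:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            pages.update(range(first, last + 1))
        offset += size  # functions are laid out back to back
    return pages

def hot_first(functions):
    """Simplest possible reordering: sort by call count, hottest first."""
    return sorted(functions, key=lambda f: f[2], reverse=True)

THRESHOLD = 100_000  # hypothetical cutoff for "hot"
print("original layout:", len(pages_touched(FUNCTIONS, THRESHOLD)), "hot pages")
print("hot-first layout:", len(pages_touched(hot_first(FUNCTIONS), THRESHOLD)), "hot pages")
```

With these made-up numbers the hot code is spread over 5 pages in the original order and fits in 1 page after reordering, which means fewer I-cache lines and TLB entries competing with everything else.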

JoelJacobson - 9 hours ago

Here is another interesting BOLT article, this one on PostgreSQL optimization:

https://vondra.me/posts/playing-with-bolt-and-postgres/

"results are unexpectedly good, in some cases up to 40%"

BSDobelix - 12 hours ago

One can try it out with CachyOS/Arch:

https://cachyos.org/blog/2411-kernel-autofdo/

OnlyMortal - 7 hours ago

Back in the day on the Mac, the order of source files in your project would determine locality in the binary.

If memory serves, this was with MPW C or maybe CodeWarrior.

You could see it in the jump (jmp) instructions, which used short jumps rather than long ones.
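A toy model of that effect: a linker places files back to back in project order, and a branch can use the short encoding only if the displacement fits. The ranges below (signed 8-bit for short, signed 16-bit for long, loosely like 68k Bcc.S/Bcc.W) and the file names and sizes are assumptions for illustration. This is not a model of how MPW or CodeWarrior actually linked code.

```python
# Assumed displacement ranges for a hypothetical short/long branch encoding
SHORT_RANGE = (-128, 127)     # signed 8-bit displacement
LONG_RANGE = (-32768, 32767)  # signed 16-bit displacement

def link(files):
    """Assign a start address to each (name, size) in the given order."""
    addr, layout = 0, {}
    for name, size in files:
        layout[name] = addr
        addr += size
    return layout

def branch_kind(src, dst):
    """Pick the smallest encoding whose range covers dst - src."""
    disp = dst - src
    if SHORT_RANGE[0] <= disp <= SHORT_RANGE[1]:
        return "short"
    if LONG_RANGE[0] <= disp <= LONG_RANGE[1]:
        return "long"
    return "out of range"

def call_kind(order, caller_file, caller_offset, callee_file):
    """Encoding needed for a call from inside caller_file to the start of callee_file."""
    layout = link(order)
    src = layout[caller_file] + caller_offset
    dst = layout[callee_file]
    return branch_kind(src, dst)

# Same hypothetical files, two different project orders
FILES_A = [("main.c", 100), ("tables.c", 4000), ("helper.c", 50)]
FILES_B = [("main.c", 100), ("helper.c", 50), ("tables.c", 4000)]

print(call_kind(FILES_A, "main.c", 90, "helper.c"))  # long
print(call_kind(FILES_B, "main.c", 90, "helper.c"))  # short
```

Moving helper.c next to main.c shrinks the displacement from 4010 to 10 bytes, so the same call fits the short form.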

kardos - 10 hours ago

Does it work with Intel Fortran-compiled code?

vsskanth - 8 hours ago

Does anyone know of a Windows equivalent to BOLT?