Laws of Software Engineering
lawsofsoftwareengineering.com
941 points by milanm081 20 hours ago
> Premature optimization is the root of all evil.
There are few principles of software engineering that I hate more than this one, though SOLID is close.
It is important to understand that this quote comes from a 1974 paper; computing was very different back then, and so was the idea of optimization. Back then, optimizing meant writing assembly code and counting cycles. That is still done today in very specific applications, but today, performance is mostly about architectural choices, and those have to be given consideration right from the start. In 1974, these architectural choices weren't really choices; the hardware didn't let you do it differently.
Focusing on the "critical 3%" (which implies profiling) is still good advice, but it will mostly help you fix "performance bugs": an accidentally quadratic algorithm, work done inside a loop that doesn't need to be, etc. But once you have dealt with those, that's when you notice you spend 90% of the time in abstractions and it is too late to change them, so you add caching, parallelism, etc., making your code more complicated and still slower than if you had thought about performance from the start.
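To make the "performance bug" point concrete, here is a minimal C sketch of the accidentally quadratic pattern (function names are just illustrative): strlen() re-scans the whole string on every iteration, so the loop is O(n^2), and a profiler points straight at it.

    #include <ctype.h>
    #include <string.h>

    /* Accidentally quadratic: strlen() walks the whole string on every
       iteration, so this loop is O(n^2) for an n-byte string. */
    void uppercase_slow(char *s) {
        for (size_t i = 0; i < strlen(s); i++)
            s[i] = (char)toupper((unsigned char)s[i]);
    }

    /* Hoisting the length out restores O(n). */
    void uppercase_fast(char *s) {
        size_t n = strlen(s);
        for (size_t i = 0; i < n; i++)
            s[i] = (char)toupper((unsigned char)s[i]);
    }

(Some compilers can hoist that strlen() for you, but you can't count on it.)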
Today, late optimization is just as bad as premature optimization, if not more so.
The most misunderstood statement in all of programming by a wide margin.
I really encourage people to read the Donald Knuth essay that features this sentiment. Pro tip: You can skip to the very end of the article to get to this sentiment without losing context.
Here ya go: https://dl.acm.org/doi/10.1145/356635.356640
Basically, don't spend unnecessary effort increasing performance in an unmeasured way before it's necessary, except for those 10% of situations where you know in advance that crucial performance is absolutely necessary. That is the sentiment. I have seen people take this to some bizarre alternate insanity of their own creation as a law to never measure anything, typically because the given developer cannot measure things.
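"Measure" doesn't have to mean anything fancy, either. Even a throwaway sketch like this (POSIX clock_gettime assumed) beats guessing:

    #include <time.h>

    /* Time one call to fn() with a monotonic clock; returns seconds elapsed.
       A profiler gives a proper breakdown, but even this is a measurement. */
    double time_it(void (*fn)(void)) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

Then you wrap whatever you suspect is slow, e.g. time_it(work_under_test), where work_under_test is a hypothetical stand-in for your own code.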
> I have seen people take this to some bizarre alternate insanity of their own creation as a law to never measure anything, typically because the given developer cannot measure things.
Similar to the "code should be self-documenting - ergo: we don't write any comments, ever"
It is to me incredible how many "developers", even "10 years senior developers", have no idea how to use a debugger and/or profiler. I've even met some that asked "what is a profiler?" I hope I'm not insulting anybody, but to me it is like going to an "experienced mechanic" and they don't know what a screwdriver is.
It’s because in most enterprise contexts:
1) Most bugs are integration bugs. Whereby multiple systems are glued together but there’s something about the API contract that the various developers in each system don’t understand.
2) Most performance issues are architectural. Unnecessary round trips, doing work synchronously, fetching too much data.
Debuggers and profilers don’t really help with those problems.
I personally know how to use those tools and I do for personal projects. It just doesn’t come up in my enterprise job.
If you don't have personal examples of using a profiler to diagnose an issue like "too many round trips" and identify where those round trips are coming from, then you've never inherited a complex performance problem before.
That is surprising. They have come up in every enterprise job I have had. Debuggers and profilers absolutely do help, although for distributed systems they are called something else.
I once interviewed at Microsoft. The hiring manager asked me how I would go about programming a breakpoint if I were writing a debugger. I started to explain how I would have to swap out an instruction to put an INT 3 in the code and then restore it when the breakpoint hit.
He stopped me and said he was just looking to see if I knew what an INT 3 was. He said few engineers he interviewed had any idea.
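For anyone curious, here's a rough Linux/x86-64 sketch of what he was fishing for (ptrace-based; the helper names are made up): the debugger saves the word at the breakpoint address, patches its low byte to 0xCC (the INT 3 opcode), and puts the original back when the trap fires.

    #include <stdint.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>

    /* Save the original word at addr in the traced child, then patch its low
       byte to 0xCC (INT 3). Little-endian x86-64 assumed. */
    long set_breakpoint(pid_t pid, uint64_t addr) {
        long orig = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
        long patched = (orig & ~0xFFL) | 0xCC;
        ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)patched);
        return orig;   /* caller keeps this to undo the patch later */
    }

    /* When the breakpoint hits, restore the original instruction (a real
       debugger also rewinds RIP by one byte before resuming). */
    void clear_breakpoint(pid_t pid, uint64_t addr, long orig) {
        ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)orig);
    }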
The last time I interviewed (around 10 years ago) I was surprised when 9 of the 10 senior developers didn't know how many bits were in basic elementary types.
(Then, shortly afterward I also tried to find a new job, realized the entire industry had changed, and was fortunate enough to decide it wasn't worth the trouble.)
> 9 of the 10 senior developers didn't know how many bits were in basic elementary types
That's likely thanks to C, which goes to great pains to not specify the size of the basic types. For example, on 64 bit architectures, "long" is 32 bits on Windows and 64 bits on Linux and the Mac.
The net result of that is I never use C "long", instead using "int" and "long long".
This mess is why D has 32 bit ints and 64 bit longs, whether it's a 32 bit machine or a 64 bit machine. The result is we haven't had porting problems with integer sizes.
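The difference is easy to see for yourself; something like this prints a different size for long depending on the data model (LP64 Linux/macOS vs LLP64 Windows), which is exactly why the fixed-width <stdint.h> types exist:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* On common 64-bit targets: int is 4 bytes and long long is 8 bytes
           everywhere, but long is 8 on LP64 (Linux/macOS) and 4 on LLP64 (Windows). */
        printf("int:       %zu\n", sizeof(int));
        printf("long:      %zu\n", sizeof(long));
        printf("long long: %zu\n", sizeof(long long));

        /* If the width matters, say so explicitly. */
        printf("int32_t:   %zu\n", sizeof(int32_t));
        printf("int64_t:   %zu\n", sizeof(int64_t));
        return 0;
    }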
It's substantially worse on the JVM. One's intuition from C just fails when you have to think about references vs primitives, and the overhead of those (with or without compressed OOPs).
I've met very few folks who understand the overheads involved, and how extreme the benefits can be from avoiding those.
Conversely I've met many folks who come into managed environments and piss away time trying to wrangle the managed system into how they think it should work, instead of accepting that clever people wrote it and guidelines when followed result in acceptable outcomes.
The sort of insane stuff I've seen on the dotnet repo where people are trying to tear apart the entire type system just because they think they've cracked some secret performance code.
>on the dotnet repo
You mean the .net compiler/runtime itself? I haven't looked at it, but isn't that the one place you'd expect to see weirdly low-level C# code?
My favourite JVM trivia, although I openly admit I don't know if it's still true, is the fact that the size of a boolean is not defined.
If you ask a typical grad the size of a boolean they will inevitably say one bit, but CPUs and RAM don't work like that; they typically expect byte- or word-sized chunks of memory, meaning that a one-bit boolean ends up occupying a full byte or word, assuming it hasn't been packed.
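You can see the same effect in plain C, no JVM needed. The exact numbers are implementation-defined, but on typical compilers the one-bit answer turns into a byte, and alignment padding can inflate it further:

    #include <stdbool.h>
    #include <stdio.h>

    struct loose { bool a; int x; bool b; int y; };  /* padding after each bool */
    struct tight { int x; int y; bool a; bool b; };  /* bools share one padded tail */

    int main(void) {
        printf("sizeof(bool)         = %zu\n", sizeof(bool));          /* typically 1  */
        printf("sizeof(struct loose) = %zu\n", sizeof(struct loose));  /* typically 16 */
        printf("sizeof(struct tight) = %zu\n", sizeof(struct tight));  /* typically 12 */
        return 0;
    }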
That's a reasonable answer. But, I meant they seemed to have little understanding or interest. I don't interview much, and I'm probably a poor interviewer. But, I guess I was expecting some discussion.
I ran into some comp sci graduates in the early 80's who did not know what a "register" was.
To be fair, though, I come up short on a lot of things comp sci graduates know.
It's why Andrei Alexandrescu and I made a good team. I was the engineer, and he the scientist. The yin and the yang, so to speak.
Oooh, saw Andrei's name pop up and remember his books on C++ back in the day.. ran into a systems engineer a while ago who, during a tech review, asked why some data size wasn't 1000 instead of 1024.. like err ??
Even more fun is pointers, especially when windows / macos were switching from 32-bits to 64-bits (in different ways).
Microsoft tried valiantly to make Win16 code portable to Win32, and Win32 to Win64. But it failed miserably, apparently because the programmers had never ported 16 bit C to 32 bit C, etc., and picked all the wrong abstractions.
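The canonical "wrong abstraction" is stashing a pointer in a long: it compiles everywhere, happens to work on LP64 Linux/macOS, and silently truncates on LLP64 Win64. A sketch of the hazard and the portable alternative, (u)intptr_t:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Loses the top 32 bits when long is 32 bits but pointers are 64 (LLP64). */
    void risky(void *p) {
        long v = (long)p;
        printf("%ld\n", v);
    }

    /* uintptr_t is defined to round-trip a data pointer on any conforming platform. */
    void portable(void *p) {
        uintptr_t v = (uintptr_t)p;
        printf("%" PRIuPTR "\n", v);
    }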
> Even more fun is pointers, especially when windows / macos were switching from 32-bits to 64-bits (in different ways).
And yet even more of a fun time with porting pointer code was going from the various x86 memory models[0] to 32-bit. Depending on the program, the pain was either near, far, or huge... :-D
Why did they design it like that? It must have seemed like a good idea at the time.
In ancient computing times, which is when C was birthed, the size and representation of integers at the hardware level were much more diverse than they are today. Register bit-widths were almost arbitrary, not the tidy powers of 2 that everyone is accustomed to now.
The integer representation wasn't always two's complement in the early days of computing, so you couldn't even assume that. C++ only required integer representations to be two's complement as of C++20, since the last architectures that don't work this way had effectively been dead for decades.
In that context, an 'int' was supposed to be the native word size of an integer on a given architecture: an abstraction over the dozen different bit-widths used in real hardware, and thus an aid to portability.
C is a portable language, in that programs will likely compile successfully on a different architecture. Unfortunately, that doesn't mean they will run properly, as the semantics are not portable.
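"Compiles but doesn't run properly" can be as small as one shift. Assuming an LP64 target where long is 64 bits, this does what the author meant; on any target where long is 32 bits it is undefined behavior, yet it still compiles (perhaps with a warning):

    #include <stdio.h>

    int main(void) {
        /* Fine when long is 64 bits; undefined behavior (shift wider than the
           type) when long is 32 bits, e.g. 32-bit targets or 64-bit Windows. */
        long mask = 1L << 40;
        printf("%ld\n", mask);
        return 0;
    }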