Hardware Stockholm Syndrome

programmingsimplicity.substack.com

106 points by rajiv_abraham 6 days ago


summa_tech - 6 days ago

> Look at a modern CPU die. See those frame pointer registers? That stack management hardware? That’s real estate. Silicon. Transistor budget.

You don't. What you see is caches, lots of caches. Huge vector register files. Massive TLBs.

Get real - learn something about modern chips and stop fighting 1980s battles.

flohofwoe - 6 days ago

Weird article.

Hasn't most code compiled in the last few decades been using the x86 frame pointer register (ebp) as a regular register? And C also worked just fine on CPUs that didn't have a dedicated frame pointer.

AFAIK the concepts of the stack and 'subroutine call instructions' existed before C because those concepts are also useful when writing assembly code (subroutines which return to the caller location are about the only way to isolate and share reusable pieces of code, and a stack is useful for deep call hierarchies - for instance some very old CPUs with a dedicated on-chip call-stack or 'return register' only allowed a limited call-depth or even only a single call-depth).

Also, it's not like radical approaches to CPU design aren't tried all the time; they just usually fail because hardware and software are not developed in a vacuum - they heavily depend on each other. How much C played a role in that symbiosis can be argued about, but the internal design of modern CPUs doesn't have much in common with their ISA (which is more or less just a thin compatibility wrapper around the actual CPU and really doesn't take up much die area).

wrs - 6 days ago

Given this is rewinding to the 1970s, I expected a mention of CSP [0], or Transputers [1], or systolic arrays [2], or Connection Machines [3], or... The history wasn't quite as one-dimensional as this makes it seem.

[0] https://en.wikipedia.org/wiki/Communicating_sequential_proce...

[1] https://en.wikipedia.org/wiki/Transputer

[2] https://en.wikipedia.org/wiki/Systolic_array

[3] https://en.wikipedia.org/wiki/Connection_Machine

killerstorm - 2 days ago

This is a poorly written nonsense article.

"In the beginning, CPUs gave you the basics: registers, memory access, CALL and RETURN instructions."

Well, CALL and RETURN need a stack: RETURN would need an address to return to. So there you go.

The concept of a subroutine was definitely not introduced by C. It was an essential part of older languages like Algol and Fortran, and is inherently a good way to organize computation. E.g. the idea is that you can implement a matrix multiplication subroutine just once and then call it every time you need to multiply matrices. That was absolutely a staple of programming back in the day.

Synchronous calls offer a simple memory management convention: the caller takes care of data structures passed to the callee. If the caller's state is not maintained, you need to take care of allocated data in some other way, e.g. introduce GC. So synchronous calls are the simpler, less opinionated option.
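
A minimal C sketch of that convention (just an illustration, not anything from the article): the caller owns the buffer it passes down, and because the caller's stack frame outlives the synchronous call, nothing needs a GC or an ownership handoff:

    #include <stdio.h>

    /* Callee fills a caller-owned buffer; it never worries about the
       buffer's lifetime, because the caller's frame outlives this call. */
    static void fill_squares(int *out, int n) {
        for (int i = 0; i < n; i++)
            out[i] = i * i;
    }

    int main(void) {
        int squares[8];              /* caller allocates, here on its own stack */
        fill_squares(squares, 8);    /* synchronous call: callee borrows the buffer */
        for (int i = 0; i < 8; i++)
            printf("%d ", squares[i]);
        printf("\n");
        return 0;                    /* buffer released implicitly with the frame */
    }

With asynchronous callees, the caller's frame may be gone by the time the data is used, which is exactly where GC or some other ownership scheme has to come in.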

mrheosuper - 6 days ago

>Pong had no software. Zero. It was built entirely from hardware logic chips - flip-flops, counters, comparators. Massively parallel.

And that is what FPGAs are for.

This strikes me as the author lacking hardware knowledge but still trying to write a post about hardware.

bibanez - 6 days ago

You see innovation in this space a lot in research. For example, Dalorex [1] or Azul [2] for operations on sparse matrices. Currently a more general version of Azul is being developed at MIT with support for arbitrary matrix/graph algorithms with reconfigurable logic.

[1] https://ieeexplore.ieee.org/document/10071089

[2] https://ieeexplore.ieee.org/document/10764548

Animats - 6 days ago

If you disallow recursion, or put an upper bound on recursion depth, you can statically allocate all "stack based" objects at compile time. Modula I did this, allowing multithreading on machines which did not have enough memory to allow stack growth. Statically analyzing worst case stack depth is still a thing in real time control.

Not clear that this would speed things up much today.
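
A toy C sketch of what that buys you (illustrative only, assuming no recursion and no reentrancy): each function's locals can live in one fixed, statically allocated activation record instead of a growing stack, so worst-case memory use is known exactly at compile time:

    #include <stdio.h>

    /* With recursion disallowed, dot() can own a single statically
       allocated activation record instead of a stack frame. A compiler
       can lay all such records out at fixed addresses and know the
       total "stack" requirement at compile time. */
    struct dot_frame { int i; long acc; };
    static struct dot_frame dot_ar;     /* dot()'s "frame", allocated once */

    static long dot(const int *a, const int *b, int n) {
        dot_ar.acc = 0;
        for (dot_ar.i = 0; dot_ar.i < n; dot_ar.i++)
            dot_ar.acc += (long)a[dot_ar.i] * b[dot_ar.i];
        return dot_ar.acc;
    }

    int main(void) {
        int a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
        printf("%ld\n", dot(a, b, 3));  /* prints 32 */
        return 0;
    }

The trick obviously breaks the moment dot() needs to be reentrant (recursion, or concurrent calls from multiple threads), which is exactly the restriction.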

jebarker - 2 days ago

> …and has more computing power than the machines that sent humans to the moon.

We’ve all read this comparison many times. Is there any reason to think of sending people to the moon as a difficult computational problem, though? Conversely, just because relatively little computation was needed to send someone to the moon, does that necessarily mean that more computational power is a necessary path to doing something we’d all agree is more impressive, e.g. sending someone to Mars?

dooglius - 2 days ago

Pretty clueless. When hardware with a new paradigm is invented that provides value, new software has been created for it. GPUs are a great example; TPUs/systolic arrays and the Cerebras device are others; all of these are pretty successful. CPU architecture is what it is because it's the best approach anyone has come up with for its domain, and C is built around that, not the other way around. If you want to claim CPUs can be done better, you're going to have to give something much more concrete than vague gesturing at how things could be different. How, specifically, do you intend to build an abstraction around Pong's approach? What does the programming model look like? Seems to me that this is a pretty difficult way to build anything, and Pong only managed it because of its simplicity.

TinkersW - 2 days ago

The article doesn't make much sense; it rambles on about stack frames and calling conventions etc., when these don't matter very much on modern hardware and are hardly standardized (Windows and Linux don't use the same calling convention).

Modern hardware is about keeping the compute units as busy as possible with as fast as possible access to memory - pretty much the opposite of the proposed solution: message passing.

nippoo - 2 days ago

One of the big things this article fails to mention is that the TDP/heat budget is way more of a constraint than the number of transistors - at small feature sizes, silicon is (relatively) cheap; power isn't.

There's no way you can use 100% of your CPU at once - it would instantly overheat. So it suddenly makes even more sense to have optimised hardware units for all sorts of tasks (h264 encoding, crypto, etc.) if they can do the job more efficiently than general-purpose logic.

rendall - 2 days ago

The article is no more about transistors or literal frame-pointer hardware than a discussion about the QWERTY keyboard layout is about the metallurgy and mechanics of typewriter arms.

It's about how early design choices, once reinforced by tooling and habit, shape the whole ecosystem's assumptions about what’s "normal" or "efficient."

The only logical refutation of the article would be to demonstrate that any other computational paradigm, such as dataflow, message-passing, continuation-based, logic, actor, whatever, can execute on commodity CPUs with the same efficiency as imperative C-style code.

Saying "modern CPUs don't have stack hardware" is a bit like saying "modern keyboards don't jam, so QWERTY isn't a problem."

True, but beside the point. The argument isn't that QWERTY (or C) is technically flawed, but that decades of co-evolution have made their conventions invisible, and that invisibility limits how we imagine alternatives.

The author's Stockholm Syndrome metaphor isn't claiming we can’t build other kinds of CPUs. Of course we can. It's pointing out how our collective sense of what computing should look like has been quietly standardized, much like how QWERTY standardized how we type.

Saying that "modern CPUs are mostly caches and vector units" is like saying modern keyboards don't have typebars that jam. Technically true, but it misses that we're still typing on layouts designed for those constraints.

Dismissing the critique as fighting 1980s battles is like saying nobody uses typewriters anymore, so QWERTY doesn’t matter.

Pointing out that C works fine on architectures without a frame pointer is like noting that Dvorak and Colemak exist. Yes, but it ignores how systemic inertia keeps alternatives niche.

The argument that radical CPU designs fail because hardware and software co-evolve fits the analogy: people have tried new keyboard layouts too, but they rarely succeed because everything from muscle memory to software assumes QWERTY.

The claim that CPU internals are now nothing like their ISA is just like saying keyboards use digital scanning instead of levers. True, but irrelevant to the surface conventions that still shape how we interact with them.

This dismissive pile-on validates the article's main metaphor of Stockholm Syndrome surprisingly directly!

constantcrying - 2 days ago

I agree with the conclusion, but nothing about the arguments used to arrive at that conclusion sounds true at all.

Hardware does not have Stockholm Syndrome. Making chips that are better at the things people actually use them for is the correct thing to do. CPU architecture is relatively static because making hardware for software which does not exist is a terrible idea - look at Itanium, a total failure.

The C paradigm which the author bemoans was never the only one, but it was good, it worked well, and that is why it is the default to this day. It is also good enough to emulate message passing and parallelism; it is totally non-obvious that throwing out this paradigm would be in any way beneficial over the current status quo. There is no reason for the abstractions we want to also be the abstractions the hardware uses.

zyxzevn - 2 days ago

Check out Transputers, which were programmed via Occam. They do most of the stuff the article desires, though their hardware was restricted to a matrix/grid layout.

Another option is Erlang. At the top level it is organized with micro-services instead of functions.

None of them are systems languages. The old hardware had weird data and memory formats; with C, a lot of assembly could be avoided when programming it. C came as a default with Unix and some other operating systems. Fortran and Pascal were kind of similar.

The most commonly used default languages on most systems were interpreted ones, so you got LISP and BASIC. There is no fast hardware for those. To get things fast, one needed to write assembly, unless a C compiler was available.

eqvinox - 2 days ago

The article (and a lot of comments here) confuses C with the psABI (platform-specific ABI), which is what really defines the conventions - more than just calling conventions - for executable code.

Due to history, yes, most psABIs are "do what C does". But the real problem isn't C being rigidly frozen; it's psABIs designed around these "classical" programming models. E.g. none of the Linux psABIs even have a concept of "message passing", let alone more creative deviations from good ol' imperative code.
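
As a hypothetical illustration (names made up, not any real psABI extension): under a C-style psABI, "sending a message" has to be expressed as an ordinary call plus stores into memory, so every framework reinvents the convention and the ABI never sees a "message" at all:

    #include <stdio.h>
    #include <string.h>

    /* The "mailbox" is just a struct; the "send" is a plain function call
       that copies bytes into it. To the psABI this is only CALL plus
       stores - there is no message-passing primitive anywhere. */
    struct mailbox { char buf[64]; int full; };

    static void send_msg(struct mailbox *mb, const char *msg) {
        strncpy(mb->buf, msg, sizeof mb->buf - 1);
        mb->buf[sizeof mb->buf - 1] = '\0';
        mb->full = 1;               /* real code would need an atomic/fence here */
    }

    int main(void) {
        struct mailbox mb = {0};
        send_msg(&mb, "hello");
        if (mb.full)
            printf("%s\n", mb.buf);
        return 0;
    }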

xg15 - 2 days ago

You could say the same about JavaScript: JS is "fast" today even though the language itself is hilariously inefficient - browser vendors invested an ungodly amount of work into optimizing their engines, solving all kinds of crazy problems that wouldn't have existed in the first place if the language had been designed differently, to the point that execution is now good enough that you can even use it for high-throughput server-side tasks.

mcdeltat - 6 days ago

Would some message passing hardware actually be "better" in terms of performance, efficiency, and ease of construction? I thought moving data between hardware subsystems is generally pretty expensive (e.g. with atomic memory instructions there's a significant performance penalty).

Disclaimer that I'm not a hardware engineer though.
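
For a sense of where that cost comes from, here's a minimal single-producer/single-consumer queue sketch in C11 (illustrative only): the "message passing" reduces to atomic loads/stores on shared indices, and the real expense is the cache-line traffic those trigger between cores:

    #include <stdatomic.h>
    #include <stdio.h>

    /* Minimal SPSC ring buffer, capacity 8. Cross-core "messages" are just
       atomic index updates plus the coherence traffic they cause - that
       cache-line ping-pong is most of the cost of the communication. */
    #define CAP 8u
    static int slots[CAP];
    static _Atomic unsigned head, tail;  /* producer advances head, consumer advances tail */

    static int try_send(int msg) {
        unsigned h = atomic_load_explicit(&head, memory_order_relaxed);
        unsigned t = atomic_load_explicit(&tail, memory_order_acquire);
        if (h - t == CAP) return 0;                        /* full */
        slots[h % CAP] = msg;
        atomic_store_explicit(&head, h + 1, memory_order_release);
        return 1;
    }

    static int try_recv(int *msg) {
        unsigned t = atomic_load_explicit(&tail, memory_order_relaxed);
        unsigned h = atomic_load_explicit(&head, memory_order_acquire);
        if (h == t) return 0;                              /* empty */
        *msg = slots[t % CAP];
        atomic_store_explicit(&tail, t + 1, memory_order_release);
        return 1;
    }

    int main(void) {
        int v;
        try_send(42);
        if (try_recv(&v)) printf("%d\n", v);               /* single-threaded demo */
        return 0;
    }

Whether dedicated message-passing hardware would actually beat this depends on whether it can make that handoff cheaper than the existing cache-coherence machinery does.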

v9v - 2 days ago

Might be relevant: https://yosefk.com/blog/the-high-level-cpu-challenge.html

poppafuze - 2 days ago

Complete fantasy rubbish from typical software thinking. Skipped over all the reasons we no longer wire a Z80 directly to static RAM, and what that precipitates.

IshKebab - 2 days ago

This is one of those "what if we're doing everything wrong and there's a better alternative?" articles that is entirely useless because it doesn't propose a single alternative.

cadamsdotcom - 6 days ago

To say nothing of virtualization!

And we solve the inefficiency with hypervisors!