A CPU that runs entirely on GPU

github.com

235 points by cypres 18 hours ago


jagged-chisel - 12 hours ago

“A CPU that runs entirely on the GPU”

I imagine a carefully crafted set of programming primitives used to build up the abstraction of a CPU…

“Every ALU operation is a trained neural network.”

Oh… oh. Fun. Just not the type of “interesting” I was hoping for.

andreadev - 2 hours ago

The bit about multiplication being ~12x faster than addition is worth pausing on. In silicon, addition is the "easy" operation — but here the complexity hierarchy completely inverts. Makes sense once you think about it: multiplication decomposes into parallel byte-pair lookups (which neural nets handle trivially as table approximation), while addition has a sequential carry chain you can't fully parallelize away.
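To make the dependency-structure point concrete, here is a toy Python sketch (my own illustration, not the project's code): addition carries a sequential dependency from bit to bit, while a 16-bit multiply splits into four independent byte-pair table lookups, which is exactly the shape a lookup-table approximation handles well.

```python
# Toy illustration of the dependency structure (not the project's code).
# Addition of two n-bit numbers (LSB-first bit lists) has a carry that
# ripples sequentially: each step depends on the previous carry.
def ripple_add(a_bits, b_bits):
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))
        out.append(s)
    return out, carry

# Multiplication of two 16-bit numbers splits into four byte-pair
# products, each a pure 256x256 table lookup with no dependency on
# the other three.
TABLE = [[lo * hi for hi in range(256)] for lo in range(256)]

def mul16(a, b):
    a0, a1 = a & 0xFF, a >> 8
    b0, b1 = b & 0xFF, b >> 8
    # the four lookups below are independent of each other
    return (TABLE[a0][b0] + (TABLE[a0][b1] << 8)
            + (TABLE[a1][b0] << 8) + (TABLE[a1][b1] << 16))
```

Note that the combine step still uses shifts and adds; the point is only that the four lookups themselves have no mutual dependencies, unlike the carry chain.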

Funny enough, analog computing had the same inversion — a Gilbert cell does multiplication cheaply, while addition needs more complex summing circuits. Completely different path to the same result.

What I haven't seen discussed: if the whole CPU is neural nets, the execution pipeline is differentiable end-to-end. You could backprop through program execution. Useless for booting Linux, but potentially interesting for program synthesis — learning instruction sequences via gradient descent instead of search. Feels like that's the more promising research direction here than trying to make it fast.
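To illustrate the backprop-through-execution idea, here is a hypothetical minimal sketch: treat the opcode as a softmax over a few differentiable ops and learn it by gradient descent. Everything here (`OPS`, the loss, the finite-difference gradient standing in for autograd) is made up for illustration, not taken from the project.

```python
# Hypothetical sketch of "backprop through execution": if each
# instruction is a differentiable function, a softmax over opcode
# logits gives a soft-selected instruction, and plain gradient
# descent can learn which opcode maps an input to a target output.
# Pure Python; finite differences replace autograd to stay
# dependency-free.
import math

OPS = [lambda x: x + 1, lambda x: x * 2, lambda x: x * x]  # add1, dbl, sqr

def soft_exec(logits, x):
    """Run the softmax-weighted mixture of all ops on x."""
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return sum(wi / z * op(x) for wi, op in zip(w, OPS))

def loss(logits, x, target):
    return (soft_exec(logits, x) - target) ** 2

# Learn the opcode that sends 3 -> 9 (i.e. squaring).
logits, eps, lr = [0.0, 0.0, 0.0], 1e-5, 0.5
for _ in range(200):
    grad = []
    for i in range(3):  # finite-difference gradient per logit
        bumped = logits[:]
        bumped[i] += eps
        grad.append((loss(bumped, 3, 9) - loss(logits, 3, 9)) / eps)
    logits = [l - lr * g for l, g in zip(logits, grad)]

best = max(range(3), key=lambda i: logits[i])  # index of the learned op
```

The same mechanics would let a loss defined on a program's output push gradients back through a chain of soft-selected instructions, which is the program-synthesis angle.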

jdlyga - 4 hours ago

I'll do you one better, imagine a CPU that runs entirely in an LLM.

You’re absolutely right! I made an arithmetic mistake there — 3 * 3 is 9, not 8. Let’s correct that: Before: EAX = 3 After imul eax, eax: EAX = 9 Thanks for catching that — the correct return value is 9.

user____name - 11 hours ago

Someone needs to implement LLVMpipe to target this ISA; then one can run software OpenGL emulation and call it "hardware accelerated".


bob1029 - 13 hours ago

A fun experiment but I wonder how many out there seriously think we could ever completely rid ourselves of the CPU. It seems to be a rising sentiment.

The cost of communicating information through space is dealt with in fundamentally different ways here. On the CPU it is addressed directly. The actual latency is minimized as much as possible, usually by predicting the future in various ways and keeping the spatial extent of each device (core complex) as small as possible. The GPU hides latency with massive parallelism. That's why we can put them across relatively slow networks and still see excellent performance.

Latency hiding cannot cope well with workloads that are branchy and serialized, because you can only have one logical thread throughout. The CPU dominates this area because it doesn't cheat: it directly targets the objective. Making efficient, accurate control-flow decisions tends to be more valuable than being able to process data in large volumes. It just happens that there are a few exceptions to this rule that are incredibly popular.

bmc7505 - 16 hours ago

As foretold six years ago. [1]

[1]: https://breandan.net/2020/06/30/graph-computation#roadmap

robertcprice1 - 10 hours ago

Hey everyone, thank you for taking a look at my project. This was purely a "can I do it" type of deal, but ultimately my goal is to make an OS that runs purely on the GPU, or one composed of learned systems.

nomercy400 - 14 hours ago

I was taught years ago that MUL and ADD can be implemented in one or a few cycles. They can be the same complexity. What am I missing here?

Also, is it possible to use the GPU's ADD/MUL implementation? It is what a GPU does best.

himata4113 - 4 hours ago

I always wondered what would happen if you trained a model to emulate a CPU in the most efficient way possible. This is definitely not what I expected, but it also shows promise for how much more efficient models can become.

andrewdb - 14 hours ago

Why do we call them GPUs these days?

Most GPUs, sitting in racks in datacenters, aren't "processing graphics" anyhow.

deep1283 - 16 hours ago

This is a fun idea. What surprised me is the inversion where MUL ends up faster than ADD because the neural LUT removes sequential dependency while the adder still needs prefix stages.
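For reference, the "prefix stages" are carry-lookahead: a Kogge-Stone-style adder (toy version below, my own illustration rather than anything from the project) computes every carry in log2(n) parallel combine stages instead of n sequential ripple steps, which is as parallel as addition gets.

```python
# Toy Kogge-Stone-style carry-lookahead adder: all carries are derived
# from (generate, propagate) pairs combined in log2(n) stages. Each
# stage's updates are mutually independent (parallelizable); only the
# stages themselves are sequential.
def prefix_add(a, b, n=32):
    g = [(a >> i & 1) & (b >> i & 1) for i in range(n)]  # generate
    p = [(a >> i & 1) ^ (b >> i & 1) for i in range(n)]  # propagate
    dist = 1
    while dist < n:  # log2(n) combine stages
        g = [g[i] | (p[i] & g[i - dist]) if i >= dist else g[i]
             for i in range(n)]
        p = [p[i] & p[i - dist] if i >= dist else p[i]
             for i in range(n)]
        dist *= 2
    carries = [0] + g[:n - 1]  # carry into bit i
    s = sum(((a >> i & 1) ^ (b >> i & 1) ^ carries[i]) << i
            for i in range(n))
    return s & ((1 << n) - 1)
```

Even in the best case, those log2(n) stages are inherently ordered, which is the residual serial dependency the neural LUT sidesteps.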

lorenzohess - 16 hours ago

Out of curiosity, how much slower is this than an actual CPU?

Nevermark - 8 hours ago

Time to benchmark Doom.

Now we know future genius models won't even need CPUs, just tensor/rectifier circuits. If they need a CPU, they will just imagine them.

A low-bit model with adaptive sparse execution might even be able to do that imagining with real performance. Effectively, a neural PGA capability.

DonThomasitos - 11 hours ago

I don't understand why you would train an NN for an operation like sqrt that the GPU supports in silicon.

GeertB - 7 hours ago

I don't quite understand how multiply doesn't require addition as well to combine the various partial products.

RandyOrion - 6 hours ago

Cool. However, one still needs a CPU to send commands to the GPU in order to let the GPU do CPU things.

sudo_cowsay - 16 hours ago

"Multiplication is 12x faster than addition..."

Wow. That's cool but what happens to the regular CPU?

low_tech_punk - 8 hours ago

Saw the DOOM raycast demo at bottom of page.

Can't wait for someone to build a DOOM that runs entirely on GPU!

koolala - 12 hours ago

Exciting if an AI that is helping with its own improvement finds this and incorporates it into its own architecture. Then it starts reading and running all the world's binaries and gains intelligence as a fully actualized "computer", finally becoming both a master of language and of binary bits, thinking in poetry and in pure, precise numerical calculations.

artemonster - 13 hours ago

Every clueless person who suggests that we move to GPUs entirely has zero idea how things work and is basically suggesting using Lambos to plow fields and tractors to race in NASCAR.

throawayonthe - 13 hours ago

very tangentially related is whatever vectorware et al are doing: https://www.vectorware.com/blog/

jleyank - 11 hours ago

How is this different from the (various?) efforts back then to build a machine based on the Intel i860? Didn't work, although people gave it a good try.

_blk - 5 hours ago

"Result: 100% accuracy on integer arithmetic" - Could someone with low-level LLM expertise comment on that: Is that future-proof, or does it have to be re-asserted with every rebuild of the neural building blocks? Can it be proven to remain correct? I assume there's a low-temperature setting that keeps it from getting too creative.

The creative thinking behind this project is truly mind boggling.

taofor4 - 9 hours ago

What is the purpose of this project? I didn't get it. How will it be useful?

wartywhoa23 - 10 hours ago

Oh these brave new ways to paraphrase the good old "fuck fuel economy"...

Thank you, Mr. Do-because-I-can!

Yours truly,

- GPU company CEO,

- Electric company CEO.

RagnarD - 16 hours ago

Being able to perform precise math in an LLM is important, glad to see this.

nicman23 - 16 hours ago

Can I run Linux on an Nvidia card, though?

mrlonglong - 15 hours ago

Now I've seen it all. Time to die... (meant humourously)


Surac - 16 hours ago

Well, GPUs are just special-purpose CPUs.

MadnessASAP - 15 hours ago

Ya know, just today I was thinking about a way to compile a neural network down to assembly: matching and replacing neural network structures with their closest machine-code equivalents.

This is way cooler though! Instead of efficiently running a neural network on a CPU, I can inefficiently run my CPU on a neural network! With the work being done to make more powerful GPUs and ASICs, I bet in a few years I'll be able to run a 486 at 100MHz(!!) with power consumption just under a megawatt! The mind boggles at the sort of computations this will unlock!

Few more years and I'll even be able to realise the dream of self-hosting ChatGPT on my own neural network simulated CPU!