Show HN: I built a toy TPU that can do inference and training on the XOR problem

131 points by evxxan 3 days ago

We wanted to do something very challenging to prove to ourselves that we can do anything we put our minds to. The reason we chose to build a toy TPU specifically is fairly simple:

- Building a chip for ML workloads seemed cool
- There was no well-documented open source repo for an ML accelerator that performed both inference and training

None of us have real professional experience in hardware design, which, in a way, made the TPU even more appealing since we weren't able to estimate exactly how difficult it would be. As we worked on the initial stages of this project, we established a strict design philosophy: TO ALWAYS TRY THE HACKY WAY. This meant trying out the "dumb" ideas that came to our mind first BEFORE consulting external sources. This philosophy helped us make sure we weren't reverse engineering the TPU, but rather re-inventing it, which helped us derive many of the key mechanisms used in the TPU ourselves.

We also wanted to treat this project as an exercise in coding without relying on AI to write it for us, since we felt that our first instinct recently has been to reach for LLMs whenever we faced a slight struggle. We wanted to cultivate a style of thinking that we could carry forward and use in future endeavours to think through difficult problems.

Throughout this project we tried to learn as much as we could about the fundamentals of deep learning, hardware design, and creating algorithms. We found that the best way to learn this stuff is to draw everything out, and to make that our first instinct. On tinytpu.com, you will see how our explanations were inspired by this philosophy.

Note that this is NOT a 1-to-1 replica of the TPU--it is our attempt at re-inventing a toy version of it ourselves.
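
For readers unfamiliar with the XOR problem, here is a rough software sketch of what "inference and training on XOR" means: a forward pass through a tiny two-layer network, followed by backprop and a gradient-descent update. This is an illustration in plain Python, not the project's hardware design. The sigmoid activation, four hidden units, mean squared error loss, learning rate, and all function names are assumptions made for the example.

```python
import math
import random

# XOR truth table: inputs and target outputs.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0, 1.0, 0.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(hidden=4, seed=0):
    # Hypothetical layout: 2 inputs -> `hidden` units -> 1 output.
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    # Inference: hidden activations, then a single sigmoid output.
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b)
         for w, b in zip(p["W1"], p["b1"])]
    out = sigmoid(sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"])
    return h, out

def mse(p):
    return sum((forward(p, x)[1] - y) ** 2 for x, y in zip(X, Y)) / len(X)

def train_step(p, lr=0.5):
    # One full-batch gradient-descent step using backprop.
    hidden = len(p["W2"])
    gW1 = [[0.0, 0.0] for _ in range(hidden)]
    gb1 = [0.0] * hidden
    gW2 = [0.0] * hidden
    gb2 = 0.0
    for x, y in zip(X, Y):
        h, out = forward(p, x)
        # Gradient of the mean squared error w.r.t. the output pre-activation.
        d_out = 2.0 * (out - y) * out * (1.0 - out) / len(X)
        gb2 += d_out
        for j in range(hidden):
            gW2[j] += d_out * h[j]
            # Push the error back through hidden unit j.
            d_h = d_out * p["W2"][j] * h[j] * (1.0 - h[j])
            gb1[j] += d_h
            gW1[j][0] += d_h * x[0]
            gW1[j][1] += d_h * x[1]
    for j in range(hidden):
        p["W2"][j] -= lr * gW2[j]
        p["b1"][j] -= lr * gb1[j]
        p["W1"][j][0] -= lr * gW1[j][0]
        p["W1"][j][1] -= lr * gW1[j][1]
    p["b2"] -= lr * gb2

params = init_params()
loss_before = mse(params)
for _ in range(5000):
    train_step(params)
loss_after = mse(params)
predictions = [forward(params, x)[1] for x in X]
```

Comparing `loss_before` with `loss_after`, and looking at `predictions` against `Y`, shows whether training made progress. A hardware accelerator performs the same multiply-accumulate steps, but in parallel and in fixed hardware rather than in Python loops.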

ganiszulfa - 2 days ago

Amazing project, and amazing write-up, I especially like the animations. What's the end goal here? Putting these TPUs in the consumer hands or edge devices?

jacquesm - 2 days ago

Sometimes it is the projects where you don't know that you really don't know what you are doing that are the most satisfying. Kudos, amazing work you have done.

evxxan - 2 days ago

Thank you!

skybrian - 2 days ago

It's unclear to me what the end result is. Did you build real hardware or is it simulated somehow? If it's hardware, what kind and how did you make it?

jacquesm - 2 days ago

Verilog spec by the looks of it. So you should be able to make it work on an FPGA, or if you happen to have a chip fab in your garage you might want to make your own silicon ;) I'd go the FPGA route.

antognini - 2 days ago

Based on the code in the repo, it looks like they designed the chip in Verilog and then ran it in a simulator. But since they have the Verilog code, in principle they could send it off to a fab and get real hardware back.
- UncleOxidant - 2 days ago
  
  Next step would be to try it out in an FPGA.
zhainya - 2 days ago

I feel like I missed a whole section somewhere. "Built a toy TPU". What does that mean? I have no idea what was actually "built" here.
- evxxan - 2 days ago
  
  By "toy TPU", we mean that we simulated the forward pass and backprop on a minimal TPU-like accelerator.
evxxan - 2 days ago

all in simulation :)

airza - 2 days ago

What did you use to make the illustrations? It looks nice.

frutiger - 2 days ago

Not OP, but these look like Excalidraw.

zoobab - 2 days ago

Maybe try to build a proto with LiteX?

utopcell - 2 days ago

The Google team used Chisel instead of SystemVerilog. You could consider switching to that if it makes sense for your project.

FirmwareBurner - 2 days ago

>The Google team used Chisel instead of SystemVerilog.
Not sure blindly copying whatever Google is doing is always the right idea for small projects.
They have unlimited ad money and some quirky hiring practices, so they can afford to have development practices that go against HW industry norms, just for shits and giggles, without worrying about the costs.
- utopcell - 2 days ago
  
  [flagged]

UncleOxidant - 2 days ago

Have you tried it out in an FPGA?

evxxan - 2 days ago

Not yet! But that's our next step.
- utopcell - 2 days ago
  
  Tang Nano 20K. You can't find a cheaper FPGA board than this.
  - UncleOxidant - 2 days ago
    
    You can apparently use the open source yosys/nextpnr tools with the Tang Nano 9K, but, unless something has changed recently, nextpnr doesn't work with the 20K yet. However, I found the Gowin tools to work reasonably well (and definitely way less bloated than the Xilinx & Altera tools.)
  - addaon - 2 days ago
    
    At a higher price point but with more capability, Digilent has a one-week 20% sale on their FPGA boards this week. Some good options (Artix 7 and Spartan 7) within spitting distance of $100.
    
    UncleOxidant - 2 days ago
    
    From what it looks like (Xilinx parts primarily), if I bought one of these boards I'd be stuck using either Altera or Xilinx tools. I think some Spartan 7s work with yosys/nextpnr, but I'm not sure how well.
    
    addaon - 2 days ago
    
    Yep. The Xilinx tools are very, very good; but they're definitely proprietary.
    
    UncleOxidant - 16 hours ago
    
    > The Xilinx tools are very, very good
    Ummm... no, that has not been my experience at all. I'd replace 'good' with 'buggy' in that sentence. And also very, very bloated - like 90GB bloated. I've had good experiences using yosys/nextpnr/SymbiFlow, but that's kind of limited to the Lattice iCE40 and ECP5 families and QuickLogic.

skyzouwdev - 3 days ago

This is super cool. The fact that you went in without hardware experience and still pushed through makes it even more impressive. I like the philosophy of trying the “hacky” way first instead of just copying existing designs—it’s probably the fastest path to real understanding. Curious, what was the hardest part where you almost gave up?