Making Deep Learning Go Brrrr from First Principles

horace.io

41 points by tosh 3 hours ago


tosh - 2 hours ago

> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS

wild
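The quoted ratio is just arithmetic once you pick the two rates. Here is a quick sketch that assumes ~312 TFLOPS for an A100 (its peak dense FP16/BF16 tensor-core figure) and ~32 million additions per second for a pure-Python loop. Those two inputs are my assumption, chosen because they reproduce 9.75 million exactly; the article may use different ones. The second half times a Python loop on your own machine (`python_additions_per_second` is a hypothetical helper), so you can plug in a local figure.

```python
import timeit

# Assumed inputs (not stated in this thread): ~312 TFLOPS is the A100's peak
# dense FP16/BF16 tensor-core rate; ~32 million is a rough count of scalar
# additions a CPython loop can do per second.
A100_PEAK_FLOPS = 312e12
PYTHON_FLOPS = 32e6

ratio = A100_PEAK_FLOPS / PYTHON_FLOPS
print(f"{ratio:,.0f} GPU FLOPs per Python FLOP")  # 9,750,000


def python_additions_per_second(n=1_000_000):
    """Rough measurement: one float addition per loop iteration in pure Python."""
    def loop():
        acc = 0.0
        for _ in range(n):
            acc += 1.0  # interpreter overhead dominates the arithmetic itself
        return acc
    # Best of three runs to reduce timer noise.
    seconds = min(timeit.repeat(loop, number=1, repeat=3))
    return n / seconds


measured = python_additions_per_second()
print(f"this machine: ~{measured / 1e6:.0f}M additions/s, "
      f"ratio ~{A100_PEAK_FLOPS / measured:,.0f}")
```

The measured ratio will vary with CPU and Python version, but it should land in the millions either way.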

jdw64 - 2 hours ago

Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.

noosphr - 2 hours ago

>For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.

https://arxiv.org/abs/1912.02292
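For anyone who wants to see what the linked paper (double descent) is getting at without training a deep net, here is a minimal sketch. It uses random ReLU features with a minimum-norm least-squares fit via `np.linalg.pinv`, which is a common toy setting for the effect but not the setup used in the paper. The sizes, noise level, seed, and the helper names `relu_features` and `sweep_width` are arbitrary, hypothetical choices for illustration. Once width p reaches n_train (40 here), the model fits the noisy training labels essentially exactly, which is the "overfitting" regime the quote describes. In this kind of setup, test error usually spikes around p ≈ n_train and then comes back down as p keeps growing, so adding capacity is not always a waste of time.

```python
import numpy as np


def relu_features(x, w):
    # Fixed random projection followed by ReLU; only the linear head is fit.
    return np.maximum(x @ w, 0.0)


def sweep_width(widths, n_train=40, n_test=500, d=5, noise=0.5, trials=20, seed=0):
    """Return {width: (median train MSE, median test MSE)} over random trials."""
    rng = np.random.default_rng(seed)
    results = {}
    for p in widths:
        train_errs, test_errs = [], []
        for _ in range(trials):
            beta = rng.normal(size=d)  # true linear target
            x_tr = rng.normal(size=(n_train, d))
            x_te = rng.normal(size=(n_test, d))
            y_tr = x_tr @ beta + noise * rng.normal(size=n_train)  # noisy labels
            y_te = x_te @ beta  # clean test targets
            w = rng.normal(size=(d, p)) / np.sqrt(d)
            f_tr, f_te = relu_features(x_tr, w), relu_features(x_te, w)
            # pinv gives ordinary least squares when p < n_train and the
            # minimum-norm interpolating solution when p > n_train.
            coef = np.linalg.pinv(f_tr) @ y_tr
            train_errs.append(np.mean((f_tr @ coef - y_tr) ** 2))
            test_errs.append(np.mean((f_te @ coef - y_te) ** 2))
        results[p] = (float(np.median(train_errs)), float(np.median(test_errs)))
    return results


for p, (tr, te) in sweep_width([10, 20, 40, 80, 160, 640]).items():
    print(f"width={p:4d}  train={tr:.4f}  test={te:.4f}")
```

Exact numbers depend on the seed; the thing to look for is a bump in test error near width 40 rather than a curve that only goes up once train loss hits zero.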