Train Your Own LLM from Scratch

github.com

140 points by kristianpaul 3 hours ago


JoeDaDude - 2 minutes ago

Coincidentally, I just started on Build a Large Language Model (From Scratch), a repo/book/course by Sebastian Raschka [0][1][2]. Maybe having to decide which learning resource to use is a good problem to have.

[0] https://github.com/rasbt/LLMs-from-scratch

[1] https://www.manning.com/books/build-a-large-language-model-f...

[2] https://magazine.sebastianraschka.com/p/coding-llms-from-the...

jvican - 2 hours ago

If you're interested in this resource, I highly recommend checking out Stanford's CS336 class. It covers this whole curriculum in a lot more depth and introduces you to a lot of theoretical aspects (scaling laws, intuitions) and systems thinking (kernel optimization/profiling). For this, you have to do the assignments, of course... https://cs336.stanford.edu/

antirez - 4 minutes ago

Context: he is one of the MLX developers, a skilled ML researcher.

steveharing1 - 5 minutes ago

The documentation is helpful enough to get started.

NSUserDefaults - an hour ago

Been doing it since the day I was born. The beginnings were hard but I’m getting there.

ofsen - 34 minutes ago

This looks like an exact copy of this Andrej Karpathy video ( https://youtu.be/kCc8FmEb1nY ), but in written form. Am I wrong?

baalimago - 2 hours ago

Train your LM from scratch*

I doubt you have a machine big enough to make it "Large".

hiroakiaizawa - 2 hours ago

Nice. What scale does this realistically reach on a single machine?

iamnotarobotman - 2 hours ago

This looks great for a first introduction to training LLMs, and it looks simple enough to try locally. Great job!