Scaffolding to Superhuman: How Curriculum Learning Solved 2048 and Tetris

kywch.github.io

123 points by a1k0n 13 hours ago


omneity - 12 hours ago

Related: I hear about curriculum learning for LLMs quite often, but I couldn't find a library to order training data by an arbitrary measure like difficulty, so I made one [0].

What you get is an iterator over the dataset that samples based on how far along you are in training.

0: https://github.com/omarkamali/curriculus
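To make the idea concrete, here is a minimal sketch of a progress-dependent sampler. It is not the API of the library linked above; `curriculum_iter`, `start_fraction`, and the linear schedule are all assumptions chosen for illustration. The pool of eligible examples starts with the easiest fraction of the data and grows linearly to the full dataset.

```python
import random
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def curriculum_iter(
    data: Sequence[T],
    difficulty: Callable[[T], float],
    total_steps: int,
    start_fraction: float = 0.2,  # hypothetical knob: share of data eligible at step 0
    seed: int = 0,
) -> Iterator[T]:
    """Yield one example per step from a pool that grows with training progress."""
    if not data:
        return
    rng = random.Random(seed)
    ordered = sorted(data, key=difficulty)  # easiest first
    for step in range(total_steps):
        progress = step / max(1, total_steps - 1)  # goes from 0.0 to 1.0
        fraction = start_fraction + (1.0 - start_fraction) * progress
        pool_size = max(1, round(fraction * len(ordered)))
        # Sample uniformly from the easiest `pool_size` examples.
        yield ordered[rng.randrange(pool_size)]


if __name__ == "__main__":
    sentences = ["a b", "a b c d e f", "a", "a b c", "a b c d e f g h i"]
    # Assumption for the demo: word count stands in for difficulty.
    for s in curriculum_iter(sentences, lambda x: len(x.split()), total_steps=5):
        print(s)
```

Sampling uniformly from a growing pool keeps easy examples in the mix late in training, which is one simple way to hedge against forgetting them; other schedules (stepwise, or weighted toward the newest level) slot into the same loop.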

juggy69 - an hour ago

Is there value in using deep RL for problems that seem more suited to planning-based approaches?

kywch - 6 hours ago

You can watch these agents play live, and you can also intervene:

* 2048: https://kywch.github.io/games/2048.html
* Tetris: https://kywch.github.io/games/tetris.html

gyrovagueGeist - 9 hours ago

I've always found curriculum learning incredibly hard to tune and calibrate reliably (even more so than many other RL approaches!).

The issues I keep running into: reward scales and horizon lengths that vary across tasks of different difficulty, exploring policy space effectively (keeping multimodal strategy distributions around for exploration before the policy overfits on small problems), and catastrophic forgetting when curriculum levels are mixed or introduced too late.

Does any reader, or the author, have good heuristics for these? Or is it still so problem-dependent that a hyperparameter search for something that works in spite of these challenges is the go-to?

bob1029 - 12 hours ago

> To learn, agents must experience high-value states, which are hard (or impossible) for untrained agents to reach. The endgame-only envs were the final piece to crack 65k. The endgame requires tens of thousands of correct moves where a single mistake ends the game, but to practice, agents must first get there.

This seems really similar to the motivations around masked language modeling. By providing increasingly-masked targets over time, a smooth difficulty curve can be established. Randomly masking X% of the tokens/bytes is trivial to implement. MLM can take a small corpus and turn it into an astronomically large one.
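A rough sketch of the increasing-mask idea, assuming a linear schedule; the function names, the `[MASK]` placeholder, and the 5% to 50% range are made up for illustration and not taken from any particular MLM implementation.

```python
import random

MASK = "[MASK]"  # placeholder token, an assumption for this sketch


def mask_rate(step, total_steps, start=0.05, end=0.5):
    """Linearly raise the masking probability from `start` to `end`."""
    progress = min(1.0, step / max(1, total_steps))
    return start + (end - start) * progress


def mask_tokens(tokens, rate, rng):
    """Independently replace each token with MASK with probability `rate`."""
    return [MASK if rng.random() < rate else tok for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the quick brown fox jumps over the lazy dog".split()
    for step in (0, 500, 1000):
        rate = mask_rate(step, total_steps=1000)
        print(f"{rate:.2f}", " ".join(mask_tokens(tokens, rate, rng)))
```

Because each token is masked independently, a sequence of n tokens has 2^n possible mask patterns, which is where the "astronomically large" effective corpus comes from.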

drubs - 11 hours ago

Star the puffer https://github.com/PufferAI/PufferLib

someoneontenet - 11 hours ago

Curriculum learning helped me out a lot in this project too https://www.robw.fyi/2025/12/28/solve-hi-q-with-alphazero-an...

infinitepro - 7 hours ago

Unless I am mistaken, this would be the first heuristic-free model trained to play Tetris, which is pretty incredible, since mastering Tetris from just the raw game state has never been close to solved until now(?)

Zacharias030 - 2 hours ago

I'm gonna go out on a limb and say that this is LLM-written slop, badly edited by a human. Factually correct, but the awful writing remains.

NooneAtAll3 - 6 hours ago

I wonder if he tried NNUE

pedrozieg - 11 hours ago

What I like about this writeup is that it quietly demolishes the idea that you need DeepMind-scale resources to get “superhuman” RL. The headline result is less about 2048 and Tetris and more about treating the data pipeline as the main product: careful observation design, reward shaping, and then a curriculum that drops the agent straight into high-value endgame states so that it actually gets to see them. Once your env runs at millions of steps per second on a single 4090, the bottleneck is human iteration on those choices, not FLOPs.

The happy Tetris bug is also a neat example of how “bad” inputs can act like curriculum or data augmentation. Corrupted observations forced the policy to be robust to chaos early, which then paid off when the game actually got hard. That feels very similar to tricks in other domains where we deliberately randomize or mask parts of the input. It makes me wonder how many surprisingly strong RL systems in the wild are really powered by accidental curricula that nobody has fully noticed or formalized yet.
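For readers who want to picture the "drop the agent into endgame states" part, here is a toy sketch for 2048. It is not the author's environment: the snake layout, the `top_exponent` and `p_endgame` parameters, and the function names are all assumptions. The only point is that `reset` sometimes returns a late-game board instead of a fresh one.

```python
import random


def fresh_board(rng):
    """Standard 2048 start: empty 4x4 grid with two random tiles (2 or 4)."""
    board = [[0] * 4 for _ in range(4)]
    cells = [(r, c) for r in range(4) for c in range(4)]
    for r, c in rng.sample(cells, 2):
        board[r][c] = 4 if rng.random() < 0.1 else 2
    return board


def endgame_board(rng, top_exponent=15):
    """Hypothetical endgame start: big tiles in a snake pattern from the
    top-left corner, descending from 2**top_exponent; other cells stay empty."""
    board = [[0] * 4 for _ in range(4)]
    n_tiles = rng.randint(6, 10)
    # Row 0 left-to-right, row 1 right-to-left, and so on.
    snake = [(r, c if r % 2 == 0 else 3 - c) for r in range(4) for c in range(4)]
    for i, (r, c) in enumerate(snake[:n_tiles]):
        board[r][c] = 2 ** (top_exponent - i)
    return board


def reset(rng, p_endgame=0.5):
    """With probability `p_endgame`, start the episode from an endgame board."""
    if rng.random() < p_endgame:
        return endgame_board(rng)
    return fresh_board(rng)


if __name__ == "__main__":
    rng = random.Random(0)
    for row in reset(rng, p_endgame=1.0):
        print(row)
```

Annealing `p_endgame` over training, or running dedicated endgame-only environments alongside normal ones, are both variations on the same reset hook.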

jsuarez5341 - 11 hours ago

[dead]

hiddencost - 12 hours ago

Those are not hard tasks ...

kgwxd - 10 hours ago

Great, add "curriculum" to the list of words that will spark my interest in human learning, only for it to be about garbage AI. I want HN with a hard rule against AI posts.