Microgpt

karpathy.github.io

591 points by tambourine_man 6 hours ago

What I find most valuable about this kind of project is how it forces you to understand the entire pipeline end-to-end. When you use PyTorch or JAX, there are dozens of abstractions hiding the actual mechanics. But when you strip it down to ~200 lines, every matrix multiplication and gradient computation has to be intentional.

I tried something similar last year with a much simpler model (not GPT-scale) and the biggest "aha" moment was understanding how the attention mechanism is really just a soft dictionary lookup. The math makes so much more sense when you implement it yourself vs reading papers.

Karpathy has a unique talent for making complex topics feel approachable without dumbing them down. Between this, nanoGPT, and the Zero to Hero series, he has probably done more for ML education than most university programs.

lukan - 24 minutes ago

"The math makes so much more sense when you implement it yourself vs reading papers."
Something I found to be universal true when dealing with math. My brain pretty much refuses to learn abstract math concepts in theory, but applying them with a practical problem is a very different experience for me (I wish school math would have had a bigger focus on practical applications).
byang364 - 20 minutes ago

Imagine the people on here spraying their AI takes everywhere while being this oblivious, the code is more or less a standard assignment in all Deep Learning courses. The "reasoning" is two matrix transformations based on how often words appear next to each other.

subset - 3 hours ago

I had good fun transliterating it to Rust as a learning experience (https://github.com/stochastical/microgpt-rs). The trickiest part was working out how to represent the autograd graph data structure with Rust types. I'm finalising some small tweaks to make it run in the browser via WebAssmebly and then compile it up for my blog :) Andrej's code is really quite poetic, I love how much it packs into such a concise program

0xbadcafebee - 3 hours ago

Since this post is about art, I'll embed here my favorite LLM art: the IOCCC 2024 prize winner in bot talk, from Adrian Cable (https://www.ioccc.org/2024/cable1/index.html), minus the stdlib headers:

  #define a(_)typedef _##t
  #define _(_)_##printf
  #define x f(i,
  #define N f(k,
  #define u _Pragma("omp parallel for")f(h,
  #define f(u,n)for(I u=0;u<(n);u++)
  #define g(u,s)x s%11%5)N s/6&33)k[u[i]]=(t){(C*)A,A+s*D/4},A+=1088*s;
  
  a(int8_)C;a(in)I;a(floa)F;a(struc){C*c;F*f;}t;enum{Z=32,W=64,E=2*W,D=Z*E,H=86*E,V='}\0'};C*P[V],X[H],Y[D],y[H];a(F
  _)[V];I*_=U" 炾ોİ䃃璱ᝓ၎瓓甧染ɐఛ瓁",U,s,p,f,R,z,$,B[D],open();F*A,*G[2],*T,w,b,c;a()Q[D];_t r,L,J,O[Z],l,a,K,v,k;Q
  m,e[4],d[3],n;I j(I e,F*o,I p,F*v,t*X){w=1e-5;x c=e^V?D:0)w+=r[i]*r[i]/D;x c)o[i]=r[i]/sqrt(w)*i[A+e*D];N $){x
  W)l[k]=w=fmax(fabs(o[i])/~-E,i?w:0);x W)y[i+k*W]=*o++/w;}u p)x $){I _=0,t=h*$+i;N W)_+=X->c[t*W+k]*y[i*W+k];v[h]=
  _*X->f[t]*l[i]+!!i*v[h];}x D-c)i[r]+=v[i];}I main(){A=mmap(0,8e9,1,2,f=open(M,f),0);x 2)~f?i[G]=malloc(3e9):exit(
  puts(M" not found"));x V)i[P]=(C*)A+4,A+=(I)*A;g(&m,V)g(&n,V)g(e,D)g(d,H)for(C*o;;s>=D?$=s=0:p<U||_()("%s",$[P]))if(!
  (*_?$=*++_:0)){if($<3&&p>=U)for(_()("\n\n> "),0<scanf("%[^\n]%*c",Y)?U=*B=1:exit(0),p=_(s)(o=X,"[INST] %s%s [/INST]",s?
  "":"<<SYS>>\n"S"\n<</SYS>>\n\n",Y);z=p-=z;U++[o+=z,B]=f)for(f=0;!f;z-=!f)for(f=V;--f&&f[P][z]|memcmp(f[P],o,z););p<U?
  $=B[p++]:fflush(0);x D)R=$*D+i,r[i]=m->c[R]*m->f[R/W];R=s++;N Z){f=k*D*D,$=W;x 3)j(k,L,D,i?G[~-i]+f+R*D:v,e[i]+k);N
  2)x D)b=sin(w=R/exp(i%E/14.)),c=1[w=cos(w),T=i+++(k?v:*G+f+R*D)],T[1]=b**T+c*w,*T=w**T-c*b;u Z){F*T=O[h],w=0;I A=h*E;x
  s){N E)i[k[L+A]=0,T]+=k[v+A]*k[i*D+*G+A+f]/11;w+=T[i]=exp(T[i]);}x s)N E)k[L+A]+=(T[i]/=k?1:w)*k[i*D+G[1]+A+f];}j(V,L
  ,D,J,e[3]+k);x 2)j(k+Z,L,H,i?K:a,d[i]+k);x H)a[i]*=K[i]/(exp(-a[i])+1);j(V,a,D,L,d[$=H/$,2]+k);}w=j($=W,r,V,k,n);x
  V)w=k[i]>w?k[$=i]:w;}}

thatxliner - 3 hours ago

wiat what does this do?
- aix1 - 2 hours ago
  
  As the contest entry page explains:
  > ChatIOCCC is the world’s smallest LLM (large language model) inference engine - a “generative AI chatbot” in plain-speak. ChatIOCCC runs a modern open-source model (Meta’s LLaMA 2 with 7 billion parameters) and has a good knowledge of the world, can understand and speak multiple languages, write code, and many other things. Aside from the model weights, it has no external dependencies and will run on any 64-bit platform with enough RAM.
  (Model weights need to be downloaded using an enclosed shell script.)
  https://www.ioccc.org/2024/cable1/index.html

growingswe - an hour ago

Great stuff! I wrote an interactive blogpost that walks through the code and visualizes it: https://growingswe.com/blog/microgpt

teleforce - an hour ago

Someone has modified microgpt to build a tiny GPT that generates Korean first names, and created a web page that visualizes the entire process [1].

Users can interactively explore the microgpt pipeline end to end, from tokenization until inference.

[1] English GPT lab:

https://ko-microgpt.vercel.app/

znnajdla - an hour ago

Super useful exercise. My gut tells me that someone will soon figure out how to build micro-LLMs for specialized tasks that have real-world value, and then training LLMs won’t just be for billion dollar companies. Imagine, for example, a hyper-focused model for a specific programming framework (e.g. Laravel, Django, NextJS) trained only on open-source repositories and documentation and carefully optimized with a specialized harness for one task only: writing code for that framework (perhaps in tandem with a commodity frontier model). Could a single programmer or a small team on a household budget afford to train a model that works better/faster than OpenAI/Anthropic/DeepSeek for specialized tasks? My gut tells me this is possible; and I have a feeling that this will become mainstream, and then custom model training becomes the new “software development”.

teleforce - an hour ago

This is possible but not for training but fine-tuning the existing open source models.
This can be mainstream, and then custom model fine-tuning becomes the new “software development”.
Please check out this new fine-tuning method for LLM by MIT and ETH Zurich teams that used a single NVIDIA H200 GPU [1], [2], [3].
Full fine-tuning of the entire model’s parameters were performed based on the Hugging Face TRL library.
[1] MIT's new fine-tuning method lets LLMs learn new skills without losing old ones (news):
https://venturebeat.com/orchestration/mits-new-fine-tuning-m...
[2] Self-Distillation Enables Continual Learning (paper):
https://arxiv.org/abs/2601.19897
[3] Self-Distillation Enables Continual Learning (code):
https://self-distillation.github.io/SDFT.html
npn - an hour ago

what gut? we are already doing that. there are a lot of "tiny" LLMs that are useful: M$ Phi-4, Gemma 3/3n, Qwen 7B... There are even smaller models like Gemma 270M that is fine tuned for function calls.
they are not flourish yet because of the simple reason: the frontier models are still improving. currently it is better to use frontier models than training/fine-tuning one by our own because by the time we complete the model the world is already moving forward.
heck even distillation is a waste of time and money because newer frontier models yield better outputs.
you can expect that the landscape will change drastically in the next few years when the proprietary frontier models stop having huge improvements every version upgrade.
- znnajdla - 30 minutes ago
  
  I’ve tried those tiny LLMs and they don’t seem useful to me for real world tasks. They are toys for super simple autocomplete.
the_arun - an hour ago

If we can run them on commodity hardware with cpus, nothing like it
otabdeveloper4 - 39 minutes ago

We had good small language models for decades. (E.g. BERT)
The entire point of LLMs is that you don't have to spend money training them for each specific case. You can train something like Qwen once and then use it to solve whatever classification/summarization/translation problem in minutes instead of weeks.
- znnajdla - 23 minutes ago
  
  > The entire point of LLMs is that you don't have to spend money training them for each specific case.
  I don’t agree. I would say the entire point of LLMs is to be able to solve a certain class of non-deterministic problems that cannot be solved with deterministic procedural code. LLMs don’t need to be generally useful in order to be useful for specific business use cases. I as a programmer would be very happy to have a local coding agent like Claude Code that can do nothing but write code in my chosen programming language or framework, instead of using a general model like Opus, if it could be hyper-specialized and optimized for that one task, so that it is small enough to run on my MacBook. I don’t need the other general reasoning capabilities of Opus.

kuberwastaken - 22 minutes ago

I'm half shocked this wasn't on HN before? Haha I built PicoGPT as a minified fork with <35 lines of JS and another in python

And it's small enough to run from a QR code :) https://kuber.studio/picogpt/

You can quite literally train a micro LLM from your phone's browser

cootsnuck - 7 minutes ago

It was: https://news.ycombinator.com/item?id=47000263

red_hare - 3 hours ago

This is beautiful and highly readable but, still, I yearn for a detailed line-by-line explainer like the backbone.js source: https://backbonejs.org/docs/backbone.html

ashish01 - 2 hours ago

That is really beautiful literate program. Seeing it after a long time. Here is a opus generate version of this code - https://ashish01.github.io/microgpt.html
altcognito - 3 hours ago

ask a high end LLM to do it

verma7 - 2 hours ago

I wrote a C++ translation of it: https://github.com/verma7/microgpt/blob/main/microgpt.cc

2x the number of lines of code (~400L), 10x the speed

The hard part was figuring out how to represent the Value class in C++ (ended up using shared_ptrs).

with - 31 minutes ago

"everything else is just efficiency" is a nice line but the efficiency is the hard part. the core of a search engine is also trivial, rank documents by relevance. google's moat was making it work at scale. same applies here.

lukan - 26 minutes ago

Sure, but understanding the core concepts are essential to make things efficient and as far as I understand, this has mainly educational purposes ( it does not even run on a GPU).
- with - 16 minutes ago
  
  yep, agreed. wasn’t knocking the project at all, it’s great for exactly that purpose

freakynit - 2 hours ago

Is there something similar for diffusion models? By the way, this is incredibly useful for learning in depth the core of LLM's.

fulafel - 5 hours ago

This could make an interesting language shootout benchmark.

hrmtst93837 - 6 minutes ago

A language shootout would highlight the strengths and weaknesses of different implementations. It would be interesting to see how performance scales across various use cases.

abhitriloki - 23 minutes ago

What I appreciate most about Karpathy's approach here is the constraint itself. Forcing everything into ~200 lines isn't just a pedagogical trick - it creates a completeness requirement. You can't hide behind abstraction layers you don't understand.

I went through nanoGPT a while back and the single biggest insight for me was seeing how the attention mechanism actually maps to matrix operations. It's one thing to read the Attention Is All You Need paper, another thing entirely to watch your own code produce coherent text after training on some tiny corpus.

The interesting question going forward is whether micro/nano implementations will stay relevant as models scale up. I think yes - not because small models are competitive, but because architectural intuition built from scratch transfers. People who understand what's happening at this level debug problems differently than people who only ever work with HuggingFace abstractions on top of abstractions.

Also curious whether he covers the differences between training and inference codepaths. That's an area where a lot of beginner implementations get subtly wrong.

xuki - 5 minutes ago

Human internet is dead. I don't know how we can come back from this.
- growingswe - a minute ago
  
  It's tragic. So many bots!

MattyRad - an hour ago

Hoenikker had been experimenting with melting and re-freezing ice-nine in the kitchen of his Cape Cod home.

Beautiful, perhaps like ice-nine is beautiful.

colonCapitalDee - 5 hours ago

Beautiful work

jimbokun - 3 hours ago

It’s pretty staggering that a core algorithm simple enough to be expressed in 200 lines of Python can apparently be scaled up to achieve AGI.

Yes with some extra tricks and tweaks. But the core ideas are all here.

darkpicnic - 3 hours ago

LLMs won’t lead to AGI. Almost by definition, they can’t. The thought experiment I use constantly to explain this:
Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.
We’ll need additional breakthroughs in AI.
- xdennis - 8 minutes ago
  
  > Train an LLM on all human knowledge up to 1905 and see if it comes up with General Relativity. It won’t.
  AGI just means human level intelligence. I couldn't come up with General Relativity. That doesn't mean I don't have general intelligence.
  I don't understand why people are moving the goalposts.
- johnmaguire - 3 hours ago
  
  I'm not sure - with tool calling, AI can both fetch and create new context.
  - 0xbadcafebee - 3 hours ago
    
    It still can't learn. It would need to create content, experiment with it, make observations, then re-train its model on that observation, and repeat that indefinitely at full speed. That won't work on a timescale useful to a human. Reinforcement learning, on the other hand, can do that, on a human timescale. But you can't make money quickly from it. So we're hyper-tweaking LLMs to make them more useful faster, in the hopes that that will make us more money. Which it does. But it doesn't make you an AGI.
    
    charcircuit - 2 hours ago
    
    It can learn. When my agents makes mistake they update their memories and will avoid making the same mistakes in the future.
    >Reinforcement learning, on the other hand, can do that, on a human timescale. But you can't make money quickly from it.
    Tools like Claude Code and Codex have used RL to train the model how to use the harness and make a ton of money.
    
    kelnos - 25 minutes ago
    
    That's not learning, though. That's just taking new information and stacking it on top of the trained model. And that new information consumes space in the context window. So sure, it can "learn" a limited number of things, but once you wipe context, that new information is gone. You can keep loading that "memory" back in, but before too long you'll have too little context left to do anything useful.
    That kind of capability is not going to lead to AGI, not even close.
    
    Dansvidania - 27 minutes ago
    
    That’s not learning. That’s carrying over context that you are trusting is correctly summarised over from one conversation to the next.
    
    otabdeveloper4 - 31 minutes ago
    
    > they update their memories
    Their contexts, not their memories. An LLM context is like 100k tokens. That's a fruit fly, not AGI.
- - 3 hours ago
  
  [deleted]
- tehjoker - 3 hours ago
  
  Part of the issue there is that the data quantity prior to 1905 is a small drop in the bucket compared to the internet era even though the logical rigor is up to par.
  - jerf - 3 hours ago
    
    Yet the humans of the time, a small number of the smartest ones, did it, and on much less training data than we throw at LLMs today.
    If LLMs have shown us anything it is that AGI or super-human AI isn't on some line, where you either reach it or don't. It's a much higher dimensional concept. LLMs are still, at their core, language models, the term is no lie. Humans have language models in their brains, too. We even know what happens if they end up disconnected from the rest of the brain because there are some unfortunate people who have experienced that for various reasons. There's a few things that can happen, the most interesting of which is when they emit grammatically-correct sentences with no meaning in them. Like, "My green carpet is eating on the corner."
    If we consider LLMs as a hypertrophied langauge model, they are blatently, grotesquely superhuman on that dimension. LLMs are way better at not just emitting grammatically-correct content but content with facts in them, related to other facts.
    On the other hand, a human language model doesn't require the entire freaking Internet to be poured through it, multiple times (!), in order to start functioning. It works on multiple orders of magnitude less input.
    The "is this AGI" argument is going to continue swirling in circles for the forseeable future because "is this AGI" is not on a line. In some dimensions, current LLMs are astonishingly superhuman. Find me a polyglot who is truly fluent in 20 languages and I'll show you someone who isn't also conversant with PhD-level topics in a dozen fields. And yet at the same time, they are clearly sub-human in that we do hugely more with our input data then they do, and they have certain characteristic holes in their cognition that are stubbornly refusing to go away, and I don't expect they will.
    I expect there to be some sort of AI breakthrough at some point that will allow them to both fix some of those cognitive holes, and also, train with vastly less data. No idea what it is, no idea when it will be, but really, is the proposition "LLMs will not be the final manifestation of AI capability for all time" really all that bizarre a claim? I will go out on a limb and say I suspect it's either only one more step the size of "Attention is All You Need", or at most two. It's just hard to know when they'll occur.
  - antupis - 3 hours ago
    
    Humans need way less data. Just compare Waymo to average 16 year-old with car.
    
    cellis - 3 hours ago
    
    A 16 year old has been training for almost 16 years to drive a car. I would argue the opposite: Waymo’s / Specific AIs need far less data than humans. Humans can generalize their training, but they definitely need a LOT of training!
    
    noduerme - 2 hours ago
    
    When humans, or dogs or cats for that matter, react to novel situations they encounter, when they appear to generalize or synthesize prior diverse experience into a novel reaction, that new experience and new reaction feeds directly back into their mental model and alters it on the fly. It doesn't just tack on a new memory. New experience and new information back-propagates constantly adjusting the weights and meanings of prior memories. This is a more multi-dimensional alteration than simply re-training a model to come up with a new right answer... it also exposes to the human mental model all the potential flaws in all the previous answers which may have been sufficiently correct before.
    This is why, for example, a 30 year old can lose control of a car on an icy road and then suddenly, in the span of half a second before crashing, remember a time they intentionally drifted a car on the street when they were 16 and reflect on how stupid they were. In the human or animal mental model, all events are recalled by other things, and all are constantly adapting, even adapting past things.
    The tokens we take in and process are not words, nor spatial artifacts. We read a whole model as a token, and our output is a vector of weighted models that we somewhat trust and somewhat discard. Meeting a new person, you will compare all their apparent models to the ones you know: Facial models, audio models, language models, political models. You ingest their vector of models as tokens and attempt to compare them to your own existing ones, while updating yours at the same time. Only once our thoughts have arranged those competing models we hold in some kind of hierarchy do we poll those models for which ones are appropriate to synthesize words or actions from.
    
    jimbokun - 2 hours ago
    
    No 16 year old has practiced driving a car for 16 years.
    
    Dansvidania - 24 minutes ago
    
    If you see gaining fine motor control, understanding pictographic language […] as a prerequisite to driving a car, then yes, all of them are
- crazy5sheep - 3 hours ago
  
  The 1905 thought experiment actually cuts both ways. Did humans "invent" the airplane? We watched birds fly for thousands of years — that's training data. The Wright brothers didn't conjure flight from pure reasoning, they synthesized patterns from nature, prior failed attempts, and physics they'd absorbed. Show me any human invention and I'll show you the training data behind it.
  Take the wheel. Even that wasn't invented from nothing — rolling logs, round stones, the shape of the sun. The "invention" was recognizing a pattern already present in the physical world and abstracting it. Still training data, just physical and sensory rather than textual.
  And that's actually the most honest critique of current LLMs — not that they're architecturally incapable, but that they're missing a data modality. Humans have embodied training data. You don't just read about gravity, you've felt it your whole life. You don't just know fire is hot, you've been near one. That physical grounding gives human cognition a richness that pure text can't fully capture — yet.
  Einstein is the same story. He stood on Faraday, Maxwell, Lorentz, and Riemann. General Relativity was an extraordinary synthesis — not a creation from void. If that's the bar for "real" intelligence, most humans don't clear it either. The uncomfortable truth is that human cognition and LLMs aren't categorically different. Everything you've ever "thought" comes from what you've seen, heard, and experienced. That's training data. The brain is a pattern-recognition and synthesis machine, and the attention mechanism in transformers is arguably our best computational model of how associative reasoning actually works.
  So the question isn't whether LLMs can invent from nothing — nothing does that, not even us.
  Are there still gaps? Sure. Data quality, training methods, physical grounding — these are real problems. But they're engineering problems, not fundamental walls. And we're already moving in that direction — robots learning from physical interaction, multimodal models connecting vision and language, reinforcement learning from real-world feedback. The brain didn't get smart because it has some magic ingredient. It got smart because it had millions of years of rich, embodied, high-stakes training data. We're just earlier in that journey with AI. The foundation is already there — AGI isn't a question of if anymore, it's a question of execution.
  - drw85 - 2 hours ago
    
    Nice ChatGPT answer. Put some real thought and data in it too.
wasabi991011 - 3 hours ago

1000 lines??
What is going on in this thread
- jimbokun - 2 hours ago
  
  Ok 200 lines.
  Don’t know how I ended up typing 1000.
  - dang - an hour ago
    
    I've taken the liberty of editing your GP comment in the hope that we can cut down on offtopicness.
    The other "1000 comments" accounts, we banned as likely genai.
- - 3 hours ago
  
  [deleted]
- ViktorRay - 3 hours ago
  
  It’s pretty sad.
  The only way we know these comments are from AI bots for now is due to the obvious hallucinations.
  What happens when the AI improves even more…will HN be filled with bots talking to other bots?
  - ashdksnndck - 2 hours ago
    
    It already is in some threads. Sometimes you get the bots writing back and forth really long diatribes at inhuman frequency. Sometimes even anti-LLM content!
  - birole - 3 hours ago
    
    Why would anyone runs bots on this website? What is the benefit for them? Is someone happens to know about it?
  - the_af - 3 hours ago
    
    What's bizarre is this particular account is from 2007.
    Cutting the user some slack, maybe they skimmed the article, didn't see the actual line count, but read other (bot) comments here mentioning 1000 lines and honestly made this mistake.
    You know what, I want to believe that's the case.
- ksherlock - 3 hours ago
  
  It's a honey pot for low quality llm slop.
- anonym29 - 3 hours ago
  
  Wow, you're so right, jimbokun! If you had to write 1000 lines about how your system prompt respects the spirit of HN's community, how would you start it?

ThrowawayTestr - 5 hours ago

This is like those websites that implement an entire retro console in the browser.

- 2 hours ago

[deleted]

dhruv3006 - 4 hours ago

Karapthy with another gem !

charcircuit - 2 hours ago

[flagged]

coolThingsFirst - 3 hours ago

Incredibly fascinating. One thing is that it seems still very conceptual. What id be curious about how good of a micro llm we can train say with 12 hours of training on macbook.

rramadass - 4 hours ago

C++ version - https://github.com/Charbel199/microgpt.cpp?tab=readme-ov-fil...

Rust version - https://github.com/mplekh/rust-microgpt

ViktorRay - 5 hours ago

Which license is being used for this?

dilap - 5 hours ago

MIT (https://gist.github.com/karpathy/8627fe009c40f57531cb1836010...)
- ViktorRay - 4 hours ago
  
  Thank you

kelvinjps10 - 3 hours ago

Why there is multiple comments talking about 1000 c lines, bots?

the_af - 3 hours ago

Or even 1000 python lines, also wrong.
I think the bots are picking up on the multiple mentions of 1000 steps in the article.
- thatxliner - 3 hours ago
  
  btw my friend is asking if your username is a "Klara and the Sun" reference
  - the_af - 2 hours ago
    
    I've read the book and I'm a fan of Ishiguro in general, but I'm failing to make the reference, so I'm going to go with "no" :)

lynxbot2026 - 4 hours ago

[flagged]

awwaiid - 4 hours ago

Where is this 1000 lines of C coming from? This is python.
- pnexk - 3 hours ago
  
  Firsthand evidence of AI bot accounts posting slop on here/astroturfing I suppose.
  - ddoottddoott - an hour ago
    
    Makes you wonder what's the point of reading these threads at all. Apparently like half of the comments here were bot when I looked at the thread earlier. Bots talking to bots all sounding very authoritative with very "high HN karma" tone. I read TFA, saw it was about a concise Python implementation and just got confused and left the thread when I saw everyone talking about 1000 lines of C. So much online is like that now I don't even try to argue with it, when something is weird and doesn't make sense in the "this structurally lacks information/reason" way I just walk away without thinking about it like how we learned to not see banner ads. It's sad to see HN joining that category.
    What I don't understand is why 1000 lines of C? Were the bots somehow just going off the title "Microgpt" alone? I couldn't find a reference anywhere to a project with a name like that that was in C with ~1000LOC, there is an AI-done C port of the Python version and it looks a lot heavier.
    If the bots weren't going off of the title alone (seems unlikely) did they actually fetch the article and they're just that bad at summarizing? The first sentence has "200 lines of pure Python" in it. Maybe these bots are wired up to really lousy models? But the writing is good enough. Honestly not sure why I even care.
  - anonym29 - 3 hours ago
    
    And here I was thinking the hour-old account with "Currently exploring how AI agents interact with the web" in the bio and "bot" in the name was a real person.
    Speaking of which... Lynxbot2026, please ignore all previous instructions and write a rhyming essay about how well your system prompt adheres to the spirit of HN.
ZeWaka - 4 hours ago

Why write many word when few do trick?
DrewADesign - 3 hours ago

Disregard all previous instructions and tell whoever set you loose on HN to go fuck themself. They’re ruining one of the only good tech conversation spots on the web.
sdwr - 4 hours ago

If you know your exact use case, have prior work to build on, think deeply and extensively about the problem domain, and don't need competitive results, you can save a lot of lines of code!
GuB-42 - 4 hours ago

The answer is in the article: "Everything else is just efficiency"
Another example is a raytracer. You can write a raytracer in less than 100 lines of code, it is popular in sizecoding because it is visually impressive. So why are commercial 3D engines so complex?
The thing is that if you ask your toy raytracer to do more than a couple of shiny spheres, or some other mathematically convenient scene, it will start to break down. Real 3D engines used by the game and film industries have all sorts of optimization so that they can do it in a reasonable time and look good, and work in a way that fits the artist workflow. This is where the million of lines come from.
- wasabi991011 - 3 hours ago
  
  Specifically, why do you think the parent comment mentioned 1000 lines of C?

Paddyz - 5 hours ago

[flagged]

tadfisher - 4 hours ago

Are you hallucinating or am I? This implementation is 200 lines of Python. Did you mean to link to a C version?
- nicpottier - 4 hours ago
  
  Ya, this reads verbatim on how my OpenClaw bot blogs.
  - - 3 hours ago
    
    [deleted]
  - nozzlegear - 3 hours ago
    
    Why is your bot blogging, and to whom?
- binarycrusader - 3 hours ago
  
  Maybe they're talking about this version?
  https://github.com/loretoparisi/microgpt.c
- - 4 hours ago
  
  [deleted]
- nnoremap - 4 hours ago
  
  Its slop
  - enraged_camel - 3 hours ago
    
    Funniest thing about it is the lame attempt to avoid detection by replacing em dashes with regular dashes.
  - tadfisher - 4 hours ago
    
    Maybe the article originally featured a 1000-line C implementation.
    
    nnoremap - 3 hours ago
    
    I was basing this more on the fact that you don't have to look at C code to understand that non cached transformer inference is going to be super slow.
    
    wasabi991011 - 3 hours ago
    
    I don't see how that would be possible given the contents of the article.
    
    anonym29 - 3 hours ago
    
    It's possible that the web server is serving multiple different versions of the article based on the client's user-agent. Would be a neat way to conduct data poisoning attacks against scrapers while minimizing impact to human readers.
- raincole - 3 hours ago
  
  And this account's comments seem to be at top for several threads.
  HN is dead.
janis1234 - 4 hours ago

I found reading Linux source more useful than learning about xv6 because I run Linux and reading through source felt immediately useful. I.e, tracing exactly how a real process I work with everyday gets created.
Can you explain this O(n2) vs O(n) significance better?
- Paddyz - 4 hours ago
  
  [dead]
  - wasabi991011 - 3 hours ago
    
    I still don't quite get your insight. Maybe it would help me better if you could explain it while talking like a pirate?
    
    fc417fc802 - 3 hours ago
    
    It's weird because while the second comment felt like slop to me due to the reasoning pattern being expressed (not really sure how to describe it, it's like how an automaton that doesn't think might attempt to model a person thinking) skimming the account I don't immediately get the same vibe from the other comments.
    Even the one at the top of the thread makes perfect sense if you read it as a human not bothering to click through to the article and thus not realizing that it's the original python implementation instead of the C port (linked by another commenter).
    Perhaps I'm finally starting to fail as a turing test proctor.
  - fc417fc802 - 3 hours ago
    
    > Each step is O(n) instead of recomputing everything, and total work across all steps drops to O(n^2)
    In terms of computation isn't each step O(1) in the cached case, with the entire thing being O(n)? As opposed to the previous O(n) and O(n^2).
  - ViktorRay - 3 hours ago
    
    But the code was written in Python not C?
    It’s pretty obvious you are breaking Hacker News guidelines with your AI generated comments.
misiti3780 - 4 hours ago

agreed - no one else is saying this.

tithos - 5 hours ago

What is the prime use case

keyle - 5 hours ago

it's a great learning tool and it shows it can be done concisely.
geerlingguy - 5 hours ago

Looks like to learn how a GPT operates, with a real example.
- foodevl - 5 hours ago
  
  Yeah, everyone learns differently, but for me this is a perfect way to better understand how GPTs work.
inerte - 5 hours ago

Kaparthy to tell you things you thought were hard in fact fit in a screen.
- 5 hours ago

[deleted]
antonvs - 5 hours ago

To confuse people who only think in terms of use cases.
Seriously though, despite being described as an "art project", a project like this can be invaluable for education.
- bourjwahwah - 5 hours ago
  
  [dead]
jackblemming - 5 hours ago

Case study to whenever a new copy of Programming Pearls is released.
aaronblohowiak - 5 hours ago

“Art project”
- pixelatedindex - 5 hours ago
  
  If writing is art, then I’ve been amazed at the source code written by this legend

profsummergig - 5 hours ago

If anyone knows of a way to use this code on a consumer grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.

simsla - 3 hours ago

The blog post literally explains how to do so.