We tasked Opus 4.6 using agent teams to build a C Compiler

anthropic.com

190 points by modeless 3 hours ago


geooff_ - 4 minutes ago

Maybe I'm naive, but I find these re-engineering complex product posts underwhelming. C Compilers exist and realistically Claudes training corpus contains a ton of C Compiler code. The task is already perfectly defined. There exists a benchmark of well-adopted codebases that can be used to prove if this is a working solution. Half the difficulty in making something is proving it works and is complete.

IMO a simpler novel product that humans enjoy is 10x more impressive than rehashing a solved problem, regardless of difficulty.

NitpickLawyer - 2 hours ago

This is a much more reasonable take than the cursor-browser thing. A few things that make it pretty impressive:

> This was a clean-room implementation (Claude did not have internet access at any point during its development); it depends only on the Rust standard library. The 100,000-line compiler can build Linux 6.9 on x86, ARM, and RISC-V. It can also compile QEMU, FFmpeg, SQlite, postgres, redis

> I started by drafting what I wanted: a from-scratch optimizing compiler with no dependencies, GCC-compatible, able to compile the Linux kernel, and designed to support multiple backends. While I specified some aspects of the design (e.g., that it should have an SSA IR to enable multiple optimization passes) I did not go into any detail on how to do so.

> Previous Opus 4 models were barely capable of producing a functional compiler. Opus 4.5 was the first to cross a threshold that allowed it to produce a functional compiler which could pass large test suites, but it was still incapable of compiling any real large projects.

And the very open points about limitations (and hacks, as cc loves hacks):

> It lacks the 16-bit x86 compiler that is necessary to boot [...] Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase

> It does not have its own assembler and linker;

> Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

Ending with a very down to earth take:

> The resulting compiler has nearly reached the limits of Opus’s abilities. I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality.

All in all, I'd say it's a cool little experiment, impressive even with the limitations, and a good test-case as the author says "The resulting compiler has nearly reached the limits of Opus’s abilities". Yeah, that's fair, but still highly imrpessive IMO.

201984 - 31 minutes ago

https://github.com/anthropics/claudes-c-compiler/issues/1

Havoc - 38 minutes ago

Cool project, but they really could have skipped the mention of clean room. Something trained on every copyrighted thing known to mankind is the opposite of clean room

btown - 2 hours ago

> This was a clean-room implementation (Claude did not have internet access at any point during its development); it depends only on the Rust standard library. The 100,000-line compiler can build Linux 6.9 on x86, ARM, and RISC-V. It can also compile QEMU, FFmpeg, SQlite, postgres, redis, and has a 99% pass rate on most compiler test suites including the GCC torture test suite. It also passes the developer's ultimate litmus test: it can compile and run Doom.

This is incredible!

But it also speaks to the limitations of these systems: while these agentic systems can do amazing things when automatically-evaluable, robust test suites exist... you hit diminishing returns when you, as a human orchestrator of agentic systems, are making business decisions as fast as the AI can bring them to your attention. And that assumes the AI isn't just making business assumptions with the same lack of context, compounded with motivation to seem self-reliant, that a non-goal-aligned human contractor would have.

throwaway2027 - 5 minutes ago

Next time can you build a Rust compiler in C? It doesn't even have to check things or have a borrow checker, as long as it reduces the compile times so it's like a fast debug iteration compiler.

jcalvinowens - 2 hours ago

How much of this result is effectively plagiarized open source compiler code? I don't understand how this is compelling at all: obviously it can regurgitate things that are nearly identical in capability to already existing code it was explicitly trained on...

It's very telling how all these examples are all "look, we made it recreate a shitter version of a thing that already exists in the training set".

whinvik - 2 hours ago

It's weird to see the expectation that the result should be perfect.

All said and done, that its even possible is remarkable. Maybe these all go into training the next Opus or Sonnet and we start getting models that can create efficient compilers from scratch. That would be something!

akrauss - 2 hours ago

I would like to see the following published:

- All prompts used

- The structure of the agent team (which agents / which roles)

- Any other material that went into the process

This would be a good source for learning, even though I'm not ready to spend 20k$ just for replicating the experiment.

OsrsNeedsf2P - 2 hours ago

This is like a working version of the Cursor blog. The evidence - it compiling the Linux kernel - is much more impressive than a browser that didn't even compile (until manually intervened)

ks2048 - an hour ago

It's cool that you can look at the git history to see what it did. Unfortunately, I do not see any of the human written prompts (?).

First 10 commits, "git log --all --pretty=format:%s --reverse | head",

  Initial commit: empty repo structure
  Lock: initial compiler scaffold task
  Initial compiler scaffold: full pipeline for x86-64, AArch64, RISC-V
  Lock: implement array subscript and lvalue assignments
  Implement array subscript, lvalue assignments, and short-circuit evaluation
  Add idea: type-aware codegen for correct sized operations
  Lock: type-aware codegen for correct sized operations
  Implement type-aware codegen for correct sized operations
  Lock: implement global variable support
  Implement global variable support across all three backends
exitcode0000 - 23 minutes ago

Cool article, interesting to read about their challenges. I've tasked Claude with building an Ada83 compiler targeting LLVM IR within a single C file - currently at 100% a-series and ~30% c-series coverage for the ACATS (Ada Conformity Assessment Test Suite).

I am not using teams though and there is quite a bit of knowledge needed to direct it (even with the test suite).

In case anyone is curious: https://github.com/AdaDoom3/Ada83/tree/main

polskibus - 20 minutes ago

So did the Linux compiled with this compiler worked? Does it work the same as GCC-compiled Linux (but slower due to generating non optimized code?)

epolanski - an hour ago

However it was achieved, building a such a complex project like a C compiler on a 20k $ budget in full autonomy is quite impressive.

Imho some commenters focus way too much on the (many, and honestly also shared by the blog post too) cons, that they forget to be genuinely impressed by the steps forward.

yu3zhou4 - an hour ago

At this point, I genuinely don't know what to learn next to not become obsolete when another Opus version gets released

stephc_int13 - 25 minutes ago

They should add this to the benchmark suite, and create a custom eval for how good the resulting compiler is, as well as how maintainable the source code.

falloutx - 2 hours ago

So it copied one of the C compilers? This was always possible but now you need to pay $1000 in API costs to Anthropic

gignico - 2 hours ago

> To stress test it, I tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. Over nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V.

If you don't care about code quality, maintainability, readability, conformance to the specification, and performance of the compiler and of the compiled code, please, give me your $20,000, I'll give you your C compiler written from scratch :)

small_model - 2 hours ago

How about we get the LLM's to collaborate and design a perfect programming language for LLM coding, it would be terse (less tokens) easy for pattern searches etc and very fast to build, iterate over.

throwaway2027 - 2 hours ago

I think it's funny how me and I assume many others tried to do the same thing and they probably saw it being a popular query or had the same idea.

owenpalmer - 2 hours ago

It can compile the linux kernel, but does it boot?

7734128 - 2 hours ago

I'm sure this is impressive, but it's probably not the best test case given how many C compilers there are out there and how they presumably have been featured in the training data.

This is almost like asking me to invent a path finding algorithm when I've been thought Dijkstra's and A*.

sho_hn - 2 hours ago

Nothing in the post about whether the compiled kernel boots.

gre - 2 hours ago

There's a terrible bug where once it compacts then it sometimes pulls in .o or binary files and immediately fills your entire context. Then it compacts again...10m and your token budget is gone for the 5 hour period. edit: hooks that prevent it from reading binary files can't prevent this.

Please fix.. :)

light_hue_1 - 2 hours ago

> This was a clean-room implementation (Claude did not have internet access at any point during its development);

This is absolutely false and I wish the people doing these demonstrations were more honest.

It had access to GCC! Not only that, using GCC as an oracle was critical and had to be built in by hand.

Like the web browser project this shows how far you can get when you have a reference implementation, good benchmarks, and clear metrics. But that's not the real world for 99% of people, this is the easiest scenario for any ML setting.

sjsjsbsh - 2 hours ago

> So, while this experiment excites me, it also leaves me feeling uneasy. Building this compiler has been some of the most fun I’ve had recently, but I did not expect this to be anywhere near possible so early in 2026

What? Didn’t cursed lang do something similar like 6 or 7 months ago? These bombastic marketing tactics are getting tired.

dmitrygr - 2 hours ago

> The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.

Worse than "-O0" takes skill...

So then, it produced something much worse than tcc (which is better than gcc -O0), an equivalent of which one man can produce in under two weeks. So even all those tokens and dollars did not equal one man's week of work.

Except the one man might explain such arbitrary and shitty code as this:

https://github.com/anthropics/claudes-c-compiler/blob/main/s...

why x9? who knows?!

Oh god the more i look at this code the happier I get. I can already feel the contracts coming to fix LLM slop like this when any company who takes this seriously needs it maintained and cannot...

hrgadyx - 2 hours ago

[flagged]

trilogic - 2 hours ago

Can it create employment? How is this making life better. I understand the achievement but come on, wouldn´t it be something to show if you created employment for 10000 people using your 20000 USD!

Microsoft, OpenAI, Anthropic, XAI, all solving the wrong problems, your problems not the collective ones.

- 2 hours ago
[deleted]
chvid - 2 hours ago

100.000 lines of code for something that is literally a text book task?

I guess if it only created 1.000 lines it would be easy to see where those lines came from.

fxtentacle - 2 hours ago

You could hire a reasonably skilled dev in India for a week for $1k —- or you could pay $20k in LLM tokens, spend 2 hours writing essays to explain what you want, and then get a buggy mess.