The Zig project's rationale for their anti-AI contribution policy
(simonwillison.net) | 319 points by lumpa 9 hours ago
From https://kristoff.it/blog/contributor-poker-and-ai/:
"Unfortunately the reality of LLM-based contributions has been mostly negative for us, from an increase in background noise due to worthless drive-by PRs full of hallucinations (that wouldn’t even compile, let alone pass CI), to insane 10 thousand line long first time PRs. In-between we also received plenty of PRs that looked fine on the surface, some of which explicitly claimed to not have made use of LLMs, but where follow-up discussions immediately made it clear that the author was sneakily consulting an LLM and regurgitating its mistake-filled replies to us."
Pretty much sums up the LLM fanbase.
I don't think it's the complete fanbase. However, there are lots of people in the world who live their whole life by vibing. It's a viable way to live and sometimes it's the only way to live. But they have a very loose relationship with truth and reason. Programming was a domain that filtered out those people because they found it hard to succeed at it. LLMs have changed that and it's a huge problem. It's hard to know if LLMs will end up being a net win for the industry. They may speed up the good programmers a little, but those people were able to program anyway without LLMs. They will speed up the bad programmers a lot and that's where the balance sheet goes into the red.
> However, there are lots of people in the world who live their whole life by vibing
Why are they often so desperate to lie and non-consensually harass others with their vibing rather than be honest about it? Why do they think they are "helping" with hallucinated rubbish that can't even build?
I use LLMs. It is not difficult to: ethically disclose your use, double check all of your work, ensure things compile without errors, not lie to others, not ask it to generate ten paragraphs of rubbish when the answer is one sentence, and respect the project's guidelines. But for so many people this seems like an impossible task.
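None of this requires heavy tooling, either. As a minimal sketch of the "ensure things compile" step (in Python, with the build commands assumed to be `zig build` and `zig build test`; adjust to whatever the project actually uses, since this is not any project's official tooling):

```python
import subprocess
import sys

def run(cmd: list[str]) -> None:
    """Run a command and abort with a nonzero exit code if it fails."""
    print("->", " ".join(cmd))
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit(f"FAILED: {' '.join(cmd)} -- fix this before opening a PR")

if __name__ == "__main__":
    # Assumed commands: substitute the project's actual build system.
    run(["zig", "build"])          # everything must compile
    run(["zig", "build", "test"])  # and the test suite must pass
```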
> Why do they think they are "helping" with hallucinated rubbish that can't even build?
Because they can't tell the difference between what the machine is outputting and what people have built. All they see is the superficial resemblance (long lines of incomprehensible code) and the reward that the people writing the code have got, and they want that reward too.
"Main character energy". What they're really doing is protecting their view of themselves as smart, and they're making a contribution for the sake of trying to perform being an OSS dev rather than out of need or altruism.
AI is absolutely terrible for people like that, as it's the perfect enabler.
You're asking why oil doesn't act like water. It's not really an impossible task, it's just not one they agree with.
It's the same as cheating in a game. You are given an """advantage""", so lying about it seems like the best option
LLMs are in this case enabling bad behavior, but open source software has always been vulnerable to this. Similarly, people who use LLMs to do this kind of thing are the kind of people who would have done it without LLMs but for the large effort it would have taken. We're just learning now how large that group is.
This is a good thing: it's an opportunity to make open-source development processes robust to this kind of sabotage.
> It's hard to know if LLMs will end up being a net win for the industry.
True. Regardless of that, with LLMs we are certainly taking on technical debt like never before.
> It's hard to know if LLMs will end up being a net win for the industry. They may speed up the good programmers a little, but those people were able to program anyway without LLMs. They will speed up the bad programmers a lot and that's where the balance sheet goes into the red.
If you will forgive an appeal to authority:
The hard thing about building software is deciding what one wants to say, not saying it. No facilitation of expression can give more than marginal gains.
- Fred Brooks, 1986
> there are lots of people in the world who live their whole life by vibing. It's a viable way to live and sometimes it's the only way to live. But they have a very loose relationship with truth and reason
This response was 1000% crafted with input from an LLM, or the user spends too much time reading output from LLMs.
I have never used an LLM to write. Writing forces me to think (and I edited the comment a couple of times when writing it which helped me clear up my thinking). "It's a viable way to live and sometimes it's the only way to live" is a personal realization that has taken me some time to understand. You can go back through my comment history to the time before LLMs to check if my style was different then.
If you run your writing through an LLM, it can poke holes in your argument, organize your ideas better, or point out that your tone is hostile/dismissive. It doesn’t need to be a replacement for writing or thinking, especially if you’re learning along the way.
I don't get that impression at all. LLMs would have avoided the stylistic repetition of "live". Asking an LLM to reformulate the sentences you quoted yields this slop:
> There are a lot of people who go through life by vibing. And honestly: that’s not automatically “bad.” Sometimes it’s even the only workable way to get through things. The issue is that “vibe-first” people tend to have a pretty loose relationship with truth, rigor, and being pinned down by specifics. They’ll confidently move forward on what sounds right instead of what they can verify.
I'll finish this post with a sentence containing an em-dash -- just to confuse people -- and by remarking on how sad I find it that people latch onto dashes and complete sentences as the signifiers of LLM use, instead of the inconsistent logic and general sloppiness that's the actual problem.
Fanbase, maybe. Software engineers using these projects? Probably forking and updating themselves.
FWIW, I've opened a half dozen PRs from LLMs and had them approved. I have some prompts I use that make it very difficult to tell they are AI-generated.
However if it is a big anti-llm project I just fork and have agents rebase my changes.
I'm firmly in the LLM fanbase. Not because I can't type code (I've been doing it for over 17 years, everywhere from low-level hardware drivers in C to web frontend to robot development at home as a hobby - coding is fun!), but because in my profession it allows me to focus more on the abstraction layer where "it matters".
I'm not saying that I'm no longer dealing with code at all, though. The way I work is interactively with the LLM, and I pretty much tell it exactly what to do and how to do it. Sometimes all the way down to "don't copy the reference like that, grab a deep copy of the object instead". Just like with any other type of programming, the only way to achieve valuable and correct results is to know exactly what you want and express it exactly and without ambiguity.
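To make that kind of instruction concrete, here is the reference-versus-deep-copy distinction sketched in Python (a hypothetical example; the real exchange could involve any language):

```python
import copy

config = {"limits": {"max_connections": 10}}

alias = config                 # copies the reference: both names share one object
shallow = copy.copy(config)    # new outer dict, but nested objects are still shared
deep = copy.deepcopy(config)   # fully independent copy

deep["limits"]["max_connections"] = 99
print(config["limits"]["max_connections"])  # still 10: the deep copy is isolated

shallow["limits"]["max_connections"] = 99
print(config["limits"]["max_connections"])  # now 99: the shallow copy shared the nested dict
```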
But I no longer need to remember most of the syntax for the language I happen to work with at the moment, and can instead spend time thinking about the high level architecture. To make sure each involved component does one thing and one thing well, with its complexities hidden behind clear interfaces.
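As a minimal sketch of what I mean by a clear interface (Python here; all names are made up):

```python
from typing import Protocol

class Storage(Protocol):
    """The only thing the rest of the system knows about persistence."""
    def put(self, key: str, value: bytes) -> None: ...
    def get(self, key: str) -> bytes | None: ...

class InMemoryStorage:
    """One concrete implementation; its complexity stays behind the interface."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)
```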
Engineers who refuse to, or can't, or won't utilize the benefits that LLMs bring will be left behind. It's just the way it is. I'm already seeing it happening.
This mindset is fine (it's mine essentially too).
But it absolutely has to be combined with verification/testing at the same speed as code production.
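Concretely, that means the tests appear in the same sitting as the generated code. A minimal pytest-style sketch (the function under test is a hypothetical stand-in; a real suite would import it from the module being verified):

```python
# test_normalize.py -- written alongside the LLM-generated code,
# so verification keeps pace with generation.

def normalize_email(addr: str) -> str:
    """Stand-in implementation: trim whitespace, lowercase the domain."""
    local, _, domain = addr.strip().partition("@")
    return f"{local}@{domain.lower()}"

def test_lowercases_domain():
    assert normalize_email("User@EXAMPLE.com") == "User@example.com"

def test_strips_surrounding_whitespace():
    assert normalize_email("  a@b.com ") == "a@b.com"
```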
I generally do have that mindset, but over the past year of using Claude Code I've noticed that I'm clearly losing my understanding of the internals of projects. I do review LLM-generated code and understand it; no problem reading or following through. But then someone asks me a question, and I'm like... wait, I actually don't know. I remember the instructions I gave and reviewing the code, but I don't actually have a fine-grained model of the actual implementation crystallized in my mind. I need to check: was that thing implemented the way I thought it was or not? Wait, it's actually wrong/not matching what I thought at all! It's definitely becoming uncomfortable and makes me reconsider my use of Claude Code pretty significantly.
> Engineers who refuse to, or can't, or won't utilize the benefits that LLMs bring will be left behind. It's just the way it is. I'm already seeing it happening.
Any examples how you see some engineers being left behind?
Not really - I imagine, as with almost everything in life, there's a normal distribution, in this case of the quality with which people use AI tools.
You can steer an LLM into doing what you want. Unfortunately, people don't have the patience or the skill.
People who have skill can do the same without LLMs, maybe slightly slower on average but on more predictable schedule.
I wouldn’t say slightly slower; LLMs are massively useful for software engineering in the right hands.
For some personal projects I still stick to the basics and write everything by hand though. It's kinda nice and grounding, and almost feels like a detox.
For any new software engineer, I’m a strong advocate of zero LLM use (except maybe as a stack overflow alternative) for your first few months.
Apparently, the noise around the AI policy came from Bun's developers saying that the policy blocks upstreaming their performance PR. But the real reason seems to be that the PR's code itself isn't in great shape and introduces unhealthy complexity: https://ziggit.dev/t/bun-s-zig-fork-got-4x-faster-compilatio...
> Parallel semantic analysis has been an explicitly planned feature of the Zig compiler for a long time, and it has heavily influenced the design of the self-hosted Zig compiler. However, implementing this feature correctly has implications not only for the compiler implementation, but for the Zig language itself! Therefore, to implement this feature without an avalanche of bugs and inconsistencies, we need to make language changes.
Yes, that reply provides convincing arguments for not merging the Bun fork, as it interferes with Zig's own roadmap for achieving even better results, while continuing to improve the whole language.
A single PR for a 3000-line addition would, in all likelihood, be rejected anyway.
When somebody comments on a PR with “Incredible work, Jacob. It is an honor to call you my colleague.” then it's safe to assume it's an out-of-the-ordinary contribution, pretty much falling outside of the “in all likelihood”.
A 3000-line LLM commit is not that.
Also, 95% of those 30k lines changed are fully self-contained inside the aarch64 directory, and of the remaining changes it looks like the majority is just adding "aarch64" as another item in an existing list. There are a few core changes that to me look like they could be done in their own PRs, but core maintainers also get to decide if they want to apply bureaucracy to their own work.
No description provided. I love this PR. But yeah, try being anyone besides Jacob and submitting that!
> In successful open source projects you eventually reach a point where you start getting more PRs than what you’re capable of processing. Given what I mentioned so far, it would make sense to stop accepting imperfect PRs in order to maximize ROI from your work, but that’s not what we do in the Zig project. Instead, we try our best to help new contributors to get their work in, even if they need some help getting there. We don’t do this just because it’s the “right” thing to do, but also because it’s the smart thing to do.
I feel like if their goal is to prioritize contributors over contributions, it'd also logically follow that they should try to have descriptions where possible? Just to make exploring any set of changes and learning easier? Looked it over briefly, no Markdown or similar doc changes there either.
I mean the changes can be amazing, it's just that adding some description of what they are in more detail, alongside the considerations during development, for new folks or anyone wanting to learn from good code would also be due diligence.
How would you differentiate a 3000 line LLM commit made by the best models and good AI processes from a 3000 line commit made by the best human developer?
Edit: Okay, I set the bar too high here with "best human developer" and the vague "good AI processes". My bad. Yes, LLMs are not quite there yet.
The post that inspired this post [0] says:
> So while one could in theory be a valid contributor that makes use of LLMs, from the perspective of contributor poker it’s simply irrational for us to bet on LLM users while there’s a huge pool of other contributors that don’t present this risk factor.
> The people who remarked on how it’s impossible to know if a contribution comes from an LLM or not have completely missed the point of this policy and are clearly unaware of contributor poker.
The point isn't about the 3000-line PR; it's about whether we think the submitter is going to stick around.
It's still fairly obvious just by skimming the code. The best AI models are still quite far from the best human developers in ability and especially in code quality.
When the best AI models are the same or better than the best[1] human developers, what then?
We're already at the point of talking about best vs. best.
If that happens and we have a way of reliably knowing if some code is produced to that high quality, then I think we probably can accept that AI coding is the only sensible option.
We definitely are not close to that point though and it's unclear if/when we will get there.
It seems to me that people might be arguing from conflicting hidden premises here. "AI Coding" is a spectrum that could mean something as simple as letting the LLM proofread your changes and then act on those with your own human brain, or it could mean just telling the agent what you want and let it rip and tear until it is done.
If I do the latter and submit a PR to something like Zig, I'll certainly be caught doing it and rightfully chastised. If I do the former, my PR will be better without anybody besides myself having any way of knowing how it got better. These days, when I contribute to open source, I probably do something in between.
Blanket banning all of these seems like a bad idea to me. It actively gates people like me from contributing, because I respect these people and projects that much. It feels like I would be doing something they find disgusting if my work has touched an LLM, and I obviously don't want to do that to people I respect. But it's fine; there are plenty of things to do in the world even when some doors are closed.
I do not presume to have any say on the Zig project's well-argued decisions[0] -- I'm not really even their user, let alone someone important like a contributor. Their point about preferring human contact is superb, frankly. It probably addresses a different kind of problem in an open-source project staffed with a lot of remote-working people, where human contact is scarce.
How can AI possibly be better than “the best” when the corpus of training data now includes its own slop in addition to all the code by new devs/lazy devs/bad devs scattered all over the internet? Law of averages applies here.
Very different context: that PR is from a maintainer and trusted member of the Zig team, who surely discussed the implementation/design internally as well.
What’s the point in debating the PR quality? The policy explicitly forbids all LLM code, so that policy is of course the “real reason”.
> What’s the point in debating the PR quality?
Because the pro-group are whining that the policy is preventing the merge, when in actual fact even if the policy did not exist, the PR is crap anyway.
I don’t see how it could be that bad (incorrect, specifically), considering bun is probably the most widely-used production use case of zig. But regardless, let’s say it’s a bad PR for the sake of argument - it’s beside the point. It cannot be merged no matter how good it is, due to the strict no-LLM policy.
Of course the policy is preventing the merge. That’s literally the point of the policy…
> Of course the policy is preventing the merge. That’s literally the point of the policy…
In this case it isn't the blocker - the fact that the dev took the time to read the PR in detail, comment on it, and provide reasons why it could not be merged makes it very clear to me that the policy wasn't the blocker.
If they were going to enforce the policy for this PR, they wouldn't have bothered to read it. The only reason to read it is to see if the policy is waived for this specific PR.
OTOH why bother to polish the PR if it won't get accepted anyway?
> OTOH why bother to polish the PR if it won't get accepted anyway?
As the Zig maintainer so patiently explained, no amount of "polish" can fix the PR because it is misaligned to the correctness that they require.
IOW, that PR is so far off the mark that, unless it is completely rewritten, it won't be accepted.
It could have been rewritten; rewriting PRs is cheap today, but that isn't the question. The question is: would it have been accepted had it met all the quality and engineering standards, with full disclosure that it was 90%+ LLM-generated?
> It could have been rewritten; rewriting PRs is cheap today
Rewriting PRs with LLMs is cheap, but often the output is no better than the previous revision (fixing one issue only to cause another one is very common IME). And reviewing each revision of the PR is not cheap.
I've had good experiences with people submitting AI generated PRs who then actually take the time to understand what's going on and fix issues (either by hand or with a targeted LLM generated fix) that are brought up in review. But it's incredibly frustrating when you spend an hour reviewing something only to have someone throw your review comments directly back at the LLM and have it generate something new that requires another hour of review.
> It could have been rewritten; rewriting PRs is cheap today, but that isn't the question. The question is: would it have been accepted had it met all the quality and engineering standards, with full disclosure that it was 90%+ LLM-generated?
In this case it looks like the answer is "Yes"; the PR was not dismissed immediately, it was first examined in great detail!
Why would the maintainer expend effort on something that was going to be rejected anyway?