Who owns the code Claude Code wrote?
legallayer.substack.com · 377 points by senaevren 21 hours ago
Three things matter when it comes to eating my breakfast sandwich:
1/ Was the pork in my sausage reared on a farm that meets agricultural standards?
2/ Was the food handled safely by the kitchen that cooked my food?
3/ Does the owner of the diner pay kitchen wages in accordance with labor law?
By contrast, I have no idea what went into the models I use, what system prompts have prejudiced it, and whose IP has been exploited in pursuit of my answer.
That’s being charitable, really. In practice the open secret of the AI industry is that the vast majority of training data, for want of a better word even if it is likely to be the most precise description, is stolen data.
The media industry loves to quote ridiculous numbers on lost revenue due to piracy, etc. Maybe rough ballpark numbers will get them to do something about this theft.
Can someone put a rough estimate on the potential revenue loss (direct and incidental) from AI training, with an industry-wise breakdown?
Probably, yes, but the burden of proof is with us not them.
I'm already glad some companies have the guts to open their models because proving it for open models is probably a lot easier than for a model behind a service.
Could you please stop posting generated comments to HN? It's not allowed here, and it looks like you've done it over 30 times already.
(Of course, there's no way to be certain of this, but it's what our software thinks, and the overall pattern is pretty convincing.)
See https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079
On that matter, wouldn't an AI flag for submissions help HN? I wouldn't flag a submission for LLM style, as that is too harsh, but I don't want to read them -- if only because I don't like LLM prose.
There are so many submissions where most of the discussion is about whether the content has any human effort behind it, or whether the LLM played a purely assistive role like translating. It's really devaluing HN, IMO. Not sure how much an AI flag would help, or whether it would introduce new issues, given how difficult the problem is, though.
You are definitely right to flag it, apologize for that. I used an AI assistant for the replies, and I will make sure not to use one going forward.
Why do you use an AI assistant for the replies?
My guess is she wants to respond to all feedback and questions but doesn't have time to do it all by hand.
> The US Copyright Office confirmed this in January 2025, and the Supreme Court declined to disturb it in March 2026 when it turned away the Thaler appeal. Works predominantly generated by AI without meaningful human authorship are not eligible for copyright protection, and that rule is now settled at the highest judicial level available.
Misstates the law. Denial of certiorari can happen for many reasons unrelated to the merits and does not settle the issue nationwide. From TFA:
> When the Supreme Court declined to hear the Thaler appeal in March 2026, it did not endorse the lower court's reasoning or settle the question nationally. Cert denial means the Court chose not to hear the case, nothing more. What it does mean is that the DC Circuit's ruling stands, the Copyright Office's position is intact, and no court has yet gone the other way.
Your quoted text is no longer in TFA.
Also, I don't think there is any example testing the conclusion. There is no case to point at that any of the factors they listed are sufficient to convey authorship. Would love to be pointed to a case where rejecting decisions and redirecting to a different approach was deemed human authorship. What we do know is that you can disclaim the part of the code a human didn't author. In fact, the Copyright Office requires you disclose and disclaim. If anyone out there has more factual and citable sources please share.
It's in fact the opposite, from what I've read. In one of the Supreme Court cases cited by the Copyright Office itself in its opinion on AI works (https://en.wikipedia.org/wiki/Community_for_Creative_Non-Vio...), the Court held that merely advising someone to do the work for you, and giving criticisms and revisions, isn't enough for authorship or co-authorship.
While it's not code-related, the Copyright Office's opinion is a good read, and I don't see any reason to believe its opinion is different for works of text vs. works of physical art: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
You are right that no court has yet ruled that a specific set of human contributions to AI-assisted work was sufficient to establish authorship. What exists is the inverse: the Copyright Office has granted partial registrations where human-authored elements were separated from AI-generated elements, as in Zarya of the Dawn, where the human-written text was protected but the Midjourney images were not. The Allen v. Perlmutter case pending in Colorado is the first direct judicial test of whether iterative prompting and editing can constitute authorship. Until that decision, the positive threshold is genuinely unknown. The piece reflects this in the calibration section at the end, though your point is worth adding to the authorship discussion more explicitly.
> meaningful human authorship
How is this defined? Is my code review "meaningful"? Are my amendments and edits to the generated code "human authorship"?
From the article:
> Specifying an objective to the model is not enough. Directing how the work is constructed is what counts.
That's interesting but how is anyone supposed to prove it? They would have to get their hands on your prompts.
Leaks, whistleblowers. Some circumstantial evidence will also do if there's enough of it. Like having hallucinated parts of code that do absolutely nothing, and can't be explained as e.g. leftovers from a refactor.
But it means that the appellate decision will retain precedence, no? Wouldn’t losing precedence be the primary legal effect of overturning that decision? All case law that hasn’t touched the Supreme Court could theoretically be challenged, but most of it isn’t, and it’s considered the law until it isn’t anymore, right? How would this be any different?
The decision is binding only within the jurisdiction of the Court of Appeals for the D.C. Circuit.
So it’s not correct to say “because SCOTUS denied cert, Thaler is now binding national copyright law.”
Practically speaking, it is binding on the US Copyright office (one of the parties in the case) in CADC. And that’s important. But copyright litigation happens all across the country, while this ruling only directly constrains the relatively small number of cases within CADC.
Yes, I didn’t imply national precedent. I imagine it would also signal to attorneys appealing cases in other circuits that the same challenge will likely yield the same result.
Fair and correct. Cert denial means the Court declined to hear the case, not that it endorsed the lower court's reasoning or settled the question nationally. The DC Circuit ruling stands and the Copyright Office's position is consistent, but that is stable doctrine rather than Supreme Court-settled law. Updated the piece to reflect this distinction accurately.
Since this is a tech audience... the Supreme Court uses a bounded priority queue. An unbounded queue would risk growing impractically large.
There are some kinds of cases where the Court has "original jurisdiction," meaning they must hear them, but those are very rare.
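Since the comment above reaches for a data-structure analogy: a bounded priority queue keeps only the top-k items and rejects the rest, which is roughly the "cert denied" behavior being described. A minimal sketch in Python (the case names, priorities, and capacity here are illustrative assumptions, not anything about actual Supreme Court practice):

```python
import heapq

class BoundedDocket:
    """Keep only the `capacity` highest-priority items, using a min-heap."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # (priority, name) pairs; lowest priority at the root

    def petition(self, name, priority):
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (priority, name))
        elif priority > self._heap[0][0]:
            # Evict the lowest-priority pending case to make room.
            heapq.heapreplace(self._heap, (priority, name))
        # Otherwise the petition is simply dropped: the queue never grows
        # past `capacity`, no matter how many petitions arrive.

    def granted(self):
        return sorted(self._heap, reverse=True)

docket = BoundedDocket(capacity=2)
docket.petition("circuit-split case", 9)
docket.petition("routine appeal", 2)
docket.petition("constitutional question", 8)
print(docket.granted())  # only the two highest-priority cases survive
```

The point of the bound is exactly the one made above: rejection is the default, and dropping an item says nothing about its merits, only that it lost a capacity contest.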
The Supreme Court declining to take up an issue is taking a position.
Now different circuits can take a different view of the same issue. This is a common reason why the Supreme Court will grant cert: to resolve a circuit split. Appeals court judges know this and have at times (allegedly) intentionally split to force an issue to the Supreme Court.
Even without settling the issue appeals courts will look at how other circuits have ruled and be guided by their reasoning, generally. The fact that the Supreme Court declined to grant cert actually carries weight.
> The Supreme Court declining to take up an issue is taking a position.
No, it is not.
> "The denial of a writ of certiorari imports no expression of opinion upon the merits of the case, as the bar has been told many times."
United States v. Carver, 260 U.S. 482, 490 (1923). Moreover, SCOTUS does not decide issues, they decide cases.
> "We are acutely aware, however, that we sit to decide concrete cases, and not abstract propositions of law."
Upjohn Co. v. United States, 449 U.S. 383, 386 (1981). The real issue is that the Thaler case asked a different question: "Can AI be an author?" The lower court said no and SCOTUS left it alone. But the question of "what is enough for the human to be the author" wasn't even part of the case. That question remains completely untested.
Logically, I think there's a big difference between code which was produced from a simple generic prompt without other input vs code which was produced from multiple complex prompts with large existing code as input.
When I'm feeding AI my code as input and it ends up producing new code which adheres to my architecture, my coding style, and my detailed technical requirements, the copyright over the output should be mine, since the code looks exactly like what I would have produced by hand; there is no creative input from the AI. It's just a code completion tool to save time.
I understand if someone leaves an LLM running as an agent for multiple days and it produces a whole bunch of code, then it's a very different process.
Fair point and worth being precise about. Cert denial is not meaningless: it leaves the lower court ruling intact, it signals the Court did not find the issue urgent enough to resolve now, and as you note, other circuits will look at the DC Circuit's reasoning. What it does not do is bind other circuits or establish Supreme Court precedent. The distinction matters here because if a Ninth Circuit case involving AI-generated code reaches a different conclusion, that circuit split would be live law regardless of the Thaler cert denial.
This is the same shape as the image cases.
Zarya of the Dawn already settled it for Midjourney output: human-written elements were protected, AI-generated images were not. The character design didn't get copyright even though the human picked, prompted, and curated. Code isn't different. Prompting Claude to produce a function is closer to prompting Midjourney to produce a frame than to writing the function yourself.
The reason it feels different to engineers is that we're used to thinking of the compiler as the analogy. But a compiler is deterministic — same input, same output. An LLM isn't. That's the line the Copyright Office is drawing, and image cases got there first.
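The compiler-vs-LLM distinction above can be sketched with a toy example (a deliberately simplified stand-in, not a real model; `compile_like` and `llm_like` are invented names for illustration):

```python
import random

def compile_like(source):
    # Deterministic transform: the same input always yields the same output.
    return source.upper()

def llm_like(prompt, temperature, seed=None):
    # Toy "model" over a fixed three-token vocabulary. At temperature 0 it
    # greedily picks the highest-probability token; otherwise it samples.
    rng = random.Random(seed)
    candidates = {"foo": 0.5, "bar": 0.3, "baz": 0.2}
    if temperature == 0:
        return max(candidates, key=candidates.get)  # greedy: reproducible
    tokens = list(candidates)
    weights = [candidates[t] ** (1.0 / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights)[0]  # stochastic choice

assert compile_like("int x;") == compile_like("int x;")  # always equal
assert llm_like("p", temperature=0) == llm_like("p", temperature=0)
# With temperature > 0 and no fixed seed, repeated calls can differ.
```

The compiler path is a pure function of its input; the sampled path is a function of the input plus a random draw, which is the asymmetry the comment is pointing at.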
Depends on the scale of LLM involvement; the Copyright Office left a pretty big carve-out for things that are human-sourced and then modified by an LLM, or the reverse, LLM output that's modified by human intention. (They had to do this because there are already pseudo-random elements in digital artwork, like, say, rendered clouds and rendered noise, that might otherwise poison an artwork.) In fact I don't think this has been tested with "highlight area > prompt a change to this area of the image" workflows.
They also mention in the same document that were LLMs to more closely approximate deterministic tools, they would be open to reevaluating. That is Requesting X gets X without substantial wiggle room.
I don't think that last part has been tested with an extremely large set of prompts and human-generated input to create a more deterministic output. Even outside of code, where you see large prompts, creative-writing LLM tools like NovelAI or Sudowrite can have pages and pages of spec for the LLM, sometimes close to 50% of the size of the final output.
Then there's testing, review etc, human processes confirming that the output meets spec, updating it where needed intelligently.
There are also foreign courts, with similar rules about human intention, that have found in favor of prompts only, where it could be demonstrated that multiple rounds of prompts were used to refine the image.
I wouldn't call this settled at all, to be honest. And a lot of this doesn't require exposure: you don't need to own up to LLM use in a lot of settings, and proving LLM use is so difficult that it's easy to jump up the ladder from LLM (100%) to LLM (50%) and ultimately claim ownership.
The people who will get busted for this are basically just the super lazy: leaving ChatGPT responses in, failing to pay an editor, failing to modify images for anything more than layouts.
> But a compiler is deterministic — same input, same output. An LLM isn't.
Temperature-0 determinism is subject to active research. NVIDIA tried but has failed so far; DeepSeek V4 seems to have done it. I hope judges won't be swayed by this, with AI-generated code ending up classified as uncopyrightable, just like images are.
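One concrete reason temperature-0 runs can still diverge, as a minimal illustration (it is not the only cause): floating-point addition is not associative, so reductions performed in a different order, as happens with different batch sizes or kernel schedules, can nudge logits slightly and flip a near-tied argmax.

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # cancellation happens first, then add 1.0 -> 1.0
right = a + (b + c)  # the 1.0 is absorbed by -1e16 first      -> 0.0

print(left, right)  # 1.0 0.0
```

Two mathematically identical sums, two different answers; scale that up to the billions of additions inside a forward pass and "same input, same output" is harder than it sounds.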
Fair point on temp-0. But I don't think determinism is what the courts will hang it on. A deterministic LLM still makes the expressive choices — naming, structure, control flow — that the human didn't make. The image cases didn't turn on whether you could re-roll the same Midjourney frame. They turned on who made the creative decisions. Same logic should hold for code.
But is there anything stopping a human from applying for copyright in their own name? Does the fact that somebody can recreate the prompt invalidate their claim?
Filing isn't the gate, registration is.
Copyright Office requires you to disclose AI involvement and disclaim the AI-generated parts. Zarya of the Dawn is the example — applicant filed for the whole graphic novel, got partial registration on the human-written text, refused on the Midjourney images. The reproducibility of the prompt isn't really the test. The test is whether a human made the expressive choices.
Your comments are getting classified by our software as LLM-generated or (more likely) LLM-edited. It's impossible to be certain, of course, but if this is the case—can you please not do this? It's not allowed here - see https://news.ycombinator.com/newsguidelines.html#generated and https://news.ycombinator.com/item?id=47340079.
LLMs are amazing of course and we use them heavily ourselves - but not for modifying text that is to be posted to HN. Doing so leaves imprints on the language that readers are increasingly becoming allergic to, and we want HN to be a place for human conversation.
> Filing isn't the gate, registration is.
Not really. Copyright registration is pretty much automatic. The Copyright Office does not check for duplicates. Patent registration involves actual examination for patentability. Issued patents are presumed valid (less so than they used to be), but issued copyrights are not. You have to litigate.
The US does not have "sweat of the brow" copyrights. It's the "spark" that creates the originality, not the work. Which is why you can't copyright a telephone directory (Feist vs. Rural Telephone) or a copy of an uncopyrighted image (Bridgeman vs. Corel) or a scan of a 3D object (Meshwerks vs. Toyota). Or the contents of a database as a collective work. Note that some EU countries do allow database copyright.
Interestingly, a corporation can be an author for copyright purposes. The movie industry pushed for that. We may in time see AI corporate personhood for IP purposes.
Personally, I think that the human directing the agent owns the copyright for whatever is produced, but the ability for the agent to build it in the first place is based off of stolen IP.
I'm concerned about the copyright 'washing' this enables though, especially in OSS, and I think the right thing for OSS devs to do is to try to publish resulting code with the strongest copyleft licensing that they are comfortable with - https://jackson.dev/post/moral-ai-licensing/
Funny how the copyright industry was able to spin copyright infringement into the pejorative "stealing". If you still have the item, what was stolen?
Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act
I still find the idea that "learning" from code is "stealing" kind of ridiculous.
The "learning" isn't learning, really. I mean it might be, but if you define learning to be a human endeavor, then AI can't learn.
It's perfectly reasonable to say it's okay for humans to do something but not okay for a computer program to do the same thing. We don't have to equate AI to humans, that's a choice and usually a bad one.
It's also perfectly reasonable to say it's ok for a program or machine to do the same thing as a human. This has been the basis for the technological revolution since the dawn of technology.
If one defines 'flying' to be a bird's endeavor, then humans can't fly.
Now, if you'll excuse me, I need to catch a metal shuttle that chucks itself through the air on wings.
Sure as a word it can be broad, as a concept in our legal system that should be much more nuanced.
The relevant extension of your analogy is should birds be required to obey FAA rules? Or should plane factories be protected as nesting sites?
Yes I guess there's also no such thing as stealing in torrents since the computer "learns" the data and returns it in a transcoded fashion so it's technically not a reproduction. Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".
The mental calisthenics required to justify this stuff must be exhausting.
> The mental calisthenics required to justify this stuff must be exhausting.
It's only exhausting if you think copyright ever reasonably settled the matter of ownership of knowledge and want to morally justify an incoherent set of outcomes that you personally favor. In practice it's primarily been a tool for the powerful party in any dispute to hammer others for disrupting their business model. I think that's pretty much the only way attempting to apply ownership semantics to knowledge or information can end up.
Correct.
Knowledge consists of, roughly speaking, thoughts.
(a "justified true belief" - per https://plato.stanford.edu/entries/knowledge-analysis/ - is a kind of thought)
The "thinking" part of a "thinking being" - that also consists of thoughts.
If your knowledge is someone's property, you are someone's property.
A society where all knowledge is proprietary, is a society of ubiquitous slavery.
Maybe multi-layered, maybe fractional, maybe with a smiley-face drawn on top.
Doesn't matter.
Humans have been known to recite entire parts from plays from memory, live in front of audiences even.
And they are legally required to license the play to do that, if it's still in copyright.
I think that it's absurd that we've jumped to the conclusion that backpropagation in neural networks should be legally treated the same as human learning.
I mean, I don't think I could find a better description for following the derivatives of the error in reproducing a set of works than creating a "derivative work".
>> ... we've jumped to the conclusion backpropagation in neural networks should be legally treated the same as human learning.
I agree. However, the reverse is also likely true, i.e., it cannot currently be denied that learning in humans is different from learning in artificial neural networks from the point of view of production of works that mix ideas/memes from several works processed/read. Surely, as the article says, copyright law talks exclusively about humans, not machines, not animals.
I understand the article - the point about 'learning' is that if the model and its outputs are derivative works, then the copyright belongs to the human creators of the works it was trained on.
Edit: Or perhaps, put more pseudo-legally, that the created works infringe on the copyrights of the original human creators.
The part I agree with is that copyright law calls out humans specifically as the potential owners of copyright. So what you suggest seems to be the only possible way out. Calling out humans could imply that when a human reads a thousand books and then writes something based on them, but which is not a substantial copy of anything explicitly read, that human owns the copyright to the text written. Whereas if an artificial neural network does the same (hypothetically writing the same text), it would not.
The above does not follow from, imply or conclude anything about learning in artificial neural networks and humans being similar or dissimilar.
I find it more ridiculous to equate the act of a human learning with for-profit AI training without recompense to the authors of the training material.
Learning, probably not.
Copy/pasting at scale, yes
It is learning though. It’s not just copying the code.
Code gets turned into tokens and then it learns the next most likely token.
The issue that I see most people talk about is the scale at which it is learnt.
A human will learn from other people's code, but not from every person's code.
The issue is that of copyright law with respect to derivative works. Machine transformations on original works do not create a new copyright for the person that directed the machine transformation. That's why you can't pirate a bunch of media by simply adding a red pixel to the right-hand corner or by color-shifting the video.
Copyright law is very clear that if a machine does it, the original copyright on the input is kept. This is why your distributed binaries are still copyrighted, because the machine transformed, very significantly, the source code into binary which maintains the copyright throughout.
It would be inconsistent for the courts to suddenly decide that "actually, this specific type of machine transformation is actually innovative."
I know this is generally really bad for the AI industry, so they just ignore it until a court tells them they can't anymore. And they might get away with it as I don't have faith that the courts will be consistent.
Shredding is a machine transformation. Does it mean that shreds retain original copyright even if the content can't be restored and the provenance can't be traced? Just an example that treating all machine transformations equally with no regard to the specifics doesn't make much sense.
And the specific nature of autoregressive pretraining is that it is lossy compression. Good luck finding which copyrighted materials have made it into the final weights.
> Does it mean that shreds retain original copyright even if the content can't be restored?
Yup, it absolutely does. In fact, that's why you are still violating copyright law by using bittorrent even though each of the users is only giving out a small slice or shred of the original content.
The US has a granted defense in the case of something like shredding called "Fair Use" but that doesn't mean or imply that a copyright is void simply because of a fair use claim.
> And the specifics of autoregressive pretraining is that it is lossy compression.
That doesn't matter. Why would it? If I take a FLAC recording and change it to an MP3, the fact that it was a lossy transform doesn't suddenly give me the legal right to distribute the MP3.
> Good luck finding which copyrighted materials have made it into the final weights.
That's what the NYT v. OpenAI lawsuit is all about. And for earlier models they could, in fact, pull out full NYT articles which proved they made it into the final weights.
Further, the NYT is currently in discovery which means OpenAI must open up to the NYT what goes into their weights. A move that, if OpenAI loses, other litigants can also use because there's a real good shot that OpenAI also included their works in the dataset.
> Yup, it absolutely does
Well, it's not the first time the law contradicts laws of nature (for the entertainment of future generations). BitTorrent is not a relevant example, because the system is designed to restore the work in its fullness.
> in fact, pull out full NYT articles
That's when they used their knowledge of the exact text they wanted to "retrieve" to get the text? It wouldn't be so efficient with a random number generator, but it's doable.