Relicensing with AI-Assisted Rewrite

tuananh.net

65 points by tuananh 3 hours ago


nairboon - 2 hours ago

That code is still LGPL, it doesn't matter what some release engineer writes in the release notes on Github. All original authors and copyright holders must have explicitly agreed to relicense under a different license, otherwise the code stays LGPL licensed.

Also, the mentioned SCOTUS decision is concerned with authorship of generative AI products. That's very different from this case. Here we're talking about a tool that transformed source code and somehow magically got rid of copyright due to this transformation? Imagine the consequences for the US copyright industry if that were actually possible.

samrus - an hour ago

> The ownership void: If the code is truly a “new” work created by a machine, it might technically be in the public domain the moment it’s generated, rendering the MIT license moot.

I'm struggling to see where this conclusion came from. To me it sounds like the AI-written work cannot be copyrighted, so it's kind of like copy-pasting the original code. Copy-pasting the original code doesn't make it public domain. AI-generated code can't be copyrighted, entered into the public domain, or used for purposes outside the original code's license. What's the paradox here?

kshri24 - an hour ago

> The ownership void: If the code is truly a “new” work created by a machine, it might technically be in the public domain the moment it’s generated, rendering the MIT license moot.

How would that work? We still have no legal conclusion on whether AI-model-generated code, trained on all publicly available source (irrespective of the type of license), is legal or not. IANAL, but IMHO it is totally illegal, as no permission was sought from the authors of the source code the models were trained on. So there is no way to just release code created by a machine into the public domain without knowing how the model was inspired to come up with the generated code in the first place. Pretty sure it would fall within the scope of "reverse engineering", and that is not specific only to humans. You can extend it to machines as well.

EDIT: I would go so far as to say the most restrictive license the model is trained on should be applied to all model-generated code. And a licensing model with the original authors (all GitHub users who contributed code in some form) should be set up so they are reimbursed by AI companies. In other words, a % of profits must flow back to the community as a whole every time code-related tokens are generated. Even if everyone receives pennies, it doesn't matter. That is fair. It should also extend to artists whose art was used for training.

zozbot234 - an hour ago

If you ask an LLM to derive a spec that has no expressive element of the original code (a clean-room human team can carefully verify this), and then ask another instance of the LLM (with a fresh context) to write out code from the spec, how is that different from a "clean room" rewrite? The agent that writes the new code only ever sees the spec, and by assumption (the assumption made in all clean-room rewrites) the spec is purely factual, with all copyrightable expression having been distilled out.
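The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not a working implementation: `llm_call` is a hypothetical stub standing in for whatever completion API a real pipeline would use, and the prompts are invented for the example.

```python
def llm_call(prompt: str) -> str:
    # Hypothetical stub: a real pipeline would call a model here,
    # each call with a fresh context (no shared conversation state).
    if "Describe the observable behavior" in prompt:
        return "SPEC: detect_encoding(bytes) -> str; returns 'utf-8' for valid UTF-8."
    return "def detect_encoding(data):\n    return 'utf-8'"

def derive_spec(original_source: str) -> str:
    # Stage 1: one LLM instance sees the original (LGPL) code and
    # distills a purely factual spec. A human team must then review
    # the spec to confirm no copyrightable expression leaked through.
    return llm_call(
        "Describe the observable behavior of this code as a factual "
        "spec, copying no code or comments:\n" + original_source
    )

def implement_from_spec(spec: str) -> str:
    # Stage 2: a *fresh* LLM instance sees only the spec, never the
    # original source, mirroring the second clean-room team.
    assert "SPEC:" in spec  # sanity check: we were handed a spec, not source
    return llm_call("Implement this spec from scratch:\n" + spec)

def clean_room_rewrite(original_source: str) -> str:
    # The only artifact crossing the wall between stages is the spec text.
    spec = derive_spec(original_source)
    return implement_from_spec(spec)
```

The structural point is the information barrier: nothing but the reviewed spec flows from stage 1 to stage 2, which is the same separation traditional clean-room rewrites rely on.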

spwa4 - a minute ago

Can we do the same with universal music? Because that's easy and already possible.

mfabbri77 - an hour ago

This has the potential to kill open source, or at least the most restrictive licenses (GPL, AGPL, ...): if a license no longer protects software from unwanted use, the only possible strategy is to make the development closed source.

pu_pe - 40 minutes ago

Licensing issues aside, the chardet rewrite seems to be clearly superior to the original in performance too. It's likely that many open source projects could benefit from a similar approach.

Tomte - an hour ago

> The original author, a2mark, saw this as a potential GPL violation

Mark Pilgrim! Now that's a name I haven't read in a long time.

Retr0id - 2 hours ago

> In traditional software law, a “clean room” rewrite requires two teams

Is the "clean room" process meaningfully backed by legal precedent?

anilgulecha - 2 hours ago

This is precedent-setting. In this case the rewrite was in the same language, but if there's a Python GPL project, and its tests (spec) were used to write a spec, and then an implementation in Rust, can the second project legally be MIT, or any other license?

If yes, this in a sense allows a path around GPL requirements. Linux's MIT version would be out in the next 1-2 years.

DrammBA - 2 hours ago

I like the idea of AI-generated ~code~ anything being public domain. Public data in, public domain out.

foota - 2 hours ago

I think the more interesting question here would be if someone could fine tune an open weight model to remove knowledge of a particular library (not sure how you'd do that, but maybe possible?) and then try to get it to produce a clean room implementation.

est - an hour ago

Uh, patricide?

The key leap from GPT-3 to GPT-3.5 (aka ChatGPT) was code-davinci-002, which was trained on GitHub source code after the OpenAI-Microsoft partnership.

Open source code contributed much to LLMs' amazing CoT consistency. Without the open source movement, LLMs would have been developed much later.

himata4113 - an hour ago

I mean, in my opinion, GPL-licensed code should just infect models, forcing them to follow the license.

You can often demonstrate this by prompting things like: complete the code "<snippet from GPL-licensed code>".

And if the models are then GPL-licensed, the problem of relicensing is gone, since the code produced by these models should in theory also be GPL-licensed.

Unfortunately, there is a dumb clause that computer-generated code cannot be copyrighted or licensed to begin with.

verdverm - 2 hours ago

Interesting questions raised by the recent SCOTUS refusal to hear appeals related to AI and copyrightability, and how that may affect licensing in open source.

Hoping the HN community can bring more color to this, there are some members who know about these subjects.
