Hacker News — vinext + Cloudflare Workers

new
past
show
ask
show
jobs
submit

▲No right to relicense this project (github.com)

512 points by robin_reala 2 days ago | 366 comments

antirez 2 days ago [-]

I believe that Pilgrim here does not understand very well how copyright works:

> Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code

This is simply not true. The reason why the "clean room" concept exists is precisely since actually the law recognizes that independent implementations ARE possibile. The "clean room" thing is a trick to make the litigation simpler, it is NOT required that you are not exposed to the original code. For instance, Linux was implemented even if Linus and other devs where well aware of Unix internals. The law really mandates this: does the new code copy something that was in the original one? The clean room trick makes it simpler to say, it is not possible, if there are similar things it is just by accident. But it is NOT a requirement.

maybewhenthesun 2 days ago [-]

Regardless of the legal interpretations, I think it's very worrying if an automated AI rewrite of GPLed code (or any code for that matter) could somehow be used to circumvent the original license. That kinda takes out the one stick the open source community has to force soulless multinationals to contribute back to the open source projects they use.

rao-v 2 days ago [-]

I’m genuinely surprised to see this not discussed more by the FOSS community. There are so many ways to blow past the GPL now:

1. File by file rewrite by AI (“change functions and vars a bit”)

2. One LLM writes a diff language (or pseudo code) version of each function that a diff LLM translates back into code and tests for input/output parity

The real danger is that this becomes increasingly undetectable in closed source code and can continue to sync with progress in the GPLed repo.

I don’t think any current license has a plausible defense against this sort of attack.

nmfisher 1 days ago [-]

I’ve never delved fully into IP law, but wouldn’t these be considered derivative works? They’re basically just reimplementing exactly the same functionality with slightly different names?

This would be different from the “API reimplementation” (see Google vs Oracle) because in that case, they’re not reusing implementation details, just the external contract.

pas 10 hours ago [-]

there's usually a test for originality, and it involves asking things (from the jury) like, is it transformative enough?

so if someone tells the LLM to write it in WASM and also make it much faster and use it in a different commercial sector... then maybe

since 2023 the standard is much higher (arguably it was placed too low in 1993)

singpolyma3 1 days ago [-]

"change functions and bars a bit" isn't a rewrite. Anything where the LLM had access to the original code isn't a rewrite. This would just be a derivative work.

However most of the industry willfully violates the GPL without even trying such tricks anyway so there are certainly issues

UqWBcuFx6NV4r 19 hours ago [-]

The fact that you are drawing such absolute conclusions is indication enough that you are not qualified to speak on this.

wlonkly 1 days ago [-]

(1) sounds like a derivative work, but (2) is an interesting AI-simulacrum of a clean room implementation IF the first LLM writes a specification and not a translation.

wakawaka28 1 days ago [-]

#1 is already possible and always has been. I never heard of a case of anyone actually trying it. #2 is too nitpicky and unnecessarily costly for LLMs. It would be better to just ask it to generate a spec and tests based on the original, them create a separate implementation based on that. A person can do that today free and clear. If LLMs will be able to do this, we will just need to cope. Perhaps the future is in validating software instead of writing it.

ineedasername 5 hours ago [-]

It’s less worry to me given that a year ago this would have been exceptionally harder to do, requiring a lot more time and effort and been more costly. A year from now it will be even easier. All of this means that one aspect of the mission that brought about the need for a license like this is now fundamentally easier whether or not the license is used. There can be less worry about software locked up in closed source overall.

TruePath 13 hours ago [-]

If the AI is good enough to truly implement the whole thing to a similar level of reliability without copying it then who cares. At that point you should be able to decompile any program you want and find enough information inside that an AI can go write a similar quality program from the vague information about the call graph. We've transcended copyright in computer code.

If it can't and it costs a bunch of money to clean it up then same as always.

OTOH if what is actually happening is just that it is rewording the existing code so it looks different then it is still going to run afoul of copyright. You can't just rewrite harry potter with different words.

Note that even with Google vs oracle it was important they didn't need the actual code just the headers to get the function calls were enough. Yes it's true that the clean room isn't required but when you have an AI and you can show that it can't do it a second time without looking at the source (not just function declarations) that's pretty strong evidence.

13 hours ago [-]

wareya 1 days ago [-]

It's worrying, but it's consistent with how copyright law is currently written. Laws haven't caught up with what technology is currently capable of yet. The discussion should be whether, and if so how, our laws should be tweaked to stop this from getting out of hand, IMO.

therealpygon 2 days ago [-]

Take AI out…if a person can do it, which they can, the situation hasn’t changed. Further, it was a person who did it, with the assistance of AI. Also, the concept that you “can’t be exposed to the code before writing a compatible alternative” is utterly false in their arguments. In fact, one could take every single interface definition they have defined to communicate and use those interfaces directly to write their own, because in fact this i(programmatic) interface code is not covered by copyright (with an implicit fair use exemption due to the face the software cannot operate without activating said interfaces). The Java lawsuit set that as precedent with JDK. A person could have absolutely rewritten this software using the interfaces and their knowledge, which is perfectly legal if they don’t literally copy and re-word code. Now, if it IS simply re-worded copies of the same code and otherwise the entire project structure is basically the same, it’s a different story. That doesn’t sound like what happened.

Finally, how exactly do people think corporations rewrite portions of code that were contributed before re-licensing under a private license? It is ABSOLUTELY possible to rewrite code and relicense it.

Edit: Further, so these people think you contribute to a project, that project is beholden to your contribution permanently and it can never be excised? That seems like it would blatantly violate their original persons rights to exercise their own control of the code without those contributions, which is exactly the purpose of a rewrite.

rmast 1 days ago [-]

As part of the relicensing ZeroMQ did a few years ago, they sought permission from all previous contributors (yes, it was a multi-year effort). Code contributions that they weren’t able to get permission to relicense resulting in the corresponding lines being removed (or functionality rewritten from scratch).

ottah 1 days ago [-]

It cuts both ways. You can write a GPL version of a proprietary or permissively licensed program. The only difference is the effort of the rewrite is (theoretically) easier.

(I have my doubts the rewrite is a reasonably defect free replacement)

TophWells 20 hours ago [-]

True, but if that is found to be how it works then an automated AI rewrite of closed-source code is just as unbound by the original license. Which is a much bigger win for the open-source community, since any closed-source software can become the inspiration for an open-source project.

luma 1 days ago [-]

If automated AI rewrites are generally feasible, then the marginal price of nearly all software trends to zero.

robinsonb5 1 days ago [-]

If code becomes essentially free (ignoring for a moment the environmental cost or the long term cost of allowing code generation to be tollboothed by AI megacorps) the value of code must lie in its track record.

The 5-day-old code in chardet has little to no value. The battle-tested years-old code that was casually flushed away to make room for it had value.

singpolyma3 1 days ago [-]

If it actually is a rewrite it's not "circumventing" it's just a new thing

wakawaka28 1 days ago [-]

Soulless multinationals often want to share costs with other soulless multinationals, just like individuals do. So I think there will always be publicly shared code. The real question is whether this code will be worth much if it can be implemented so quickly by a machine.

judahmeek 1 days ago [-]

Implementation is only one of the costs shared through open-source projects.

There are others, such as security vulnerability detection, support, & general maintenance.

CamperBob2 2 days ago [-]

That kinda takes out the one stick the open source community has to force soulless multinationals to contribute back to the open source projects they use.

I'll trade that stick for what GenAI can do for me, in a heartbeat.

The question, of course, is how this attitude -- even if perfectly rational at the moment -- will scale into the future. My guess is that pretty much all the original code that will ever need to be written has already been written, and will just need to be refactored, reshaped, and repurposed going forward. A robot's job, in other words. But that could turn out to be a mistaken guess.

beepbooptheory 2 days ago [-]

I think it's very weird but valid I guess to want to be just atomic individual in constant LLM feedback loop. But, at risk of sounding too trite and wholesome here, what about caring for others, the world at large? If you wanna get your thing to rewrite curl or something, that's again really weird but fine, but just don't share it or try to make money off of it. Isn't that like even the rational position here if you still wanna have good training materials for future models? These need not be conflicting interests! We can all be in this together, even if you wanna totally fork yourself into your own LLM output world.

What happened to sticking up for the underdogs? For the goodness of well-made software in itself, for itself? Isn't that what gave you all the stuff you have now? Don't you feel at least a little grateful, if maybe not obliged? Maybe we can start there?

lukeschlather 2 days ago [-]

> If you wanna get your thing to rewrite curl or something, that's again really weird but fine, but just don't share it or try to make money off of it.

The whole point of the GPL is to encourage sharing! Making money off of GPL code is not encouraged by the text of the license, but it is encouraged by the people who wrote the licenses. Saying "don't share it" is antithetical to the goals of the free software movement.

I feel like everyone is getting distracted by protecting copyright, when in fact the point of the GPL is that we should all share and share alike. The GPL is a negotiation tactic, it is not an end unto itself. And curl, I might note, is permissively licensed so there's no need for a clean room reimplementation. If someone's rewriting it I'm very interested to hear why and I hope they share their work. I'm mostly indifferent to how they license it.

joquarky 1 days ago [-]

> what about caring for others, the world at large

30 years of experience in the tech industry taught me that this will get you nowhere. Nobody will reciprocate generosity or loyalty without an underlying financial incentive.

> What happened to sticking up for the underdogs?

Underdogs get SPACed out and dump the employees that got them there.

beepbooptheory 1 days ago [-]

Grateful I do not share your experiences. But I'm sure your viewpoint here is hard won. Sorry.

CamperBob2 2 days ago [-]

Everything I have now arose from processes of continuous improvement, carried out by smart people taking full advantage of the best available tools and technologies including all available means of automation.

It'll be OK.

beepbooptheory 2 days ago [-]

Ah well, I tried.. To paraphrase Nietzsche, a man can be measured by how well he sleeps at night. I can only hope you stay well rested into this future ;).

And yes, it will be ok!

CamperBob2 2 days ago [-]

Ah, Nietzsche. "They call him Ubermensch, 'cause he's so driven." He told us that man is a thing that will be surpassed, and asked what we've done to surpass him. The last thing I want to do is get in the way of the people doing it.

beepbooptheory 1 days ago [-]

Ah geeze don't lie down so easily! It's aspirational! You don't need to prefigure yourself as so impotent here... We can all find the courage to roar against the consensus of slave mentality, even those of us who are maybe quicker to give it all up at first for some new God. I think you have the right attitude, but you are going to end up on the side of losers either way if you don't even try to fight. Also, I am just an old man, so grain of salt and all that!

And fwiw, the idea he meant like literal people walking around being Uber is kinda nazi distortion anyway.

dragonwriter 2 days ago [-]

Neither does the maintainer that claims a mechanical test of structural similarities can prove anything either waybwith regard to whether legally it is a derivative work (or even a mechnaical copy without the requisite new creative work to be a derivative work.)

And then Pilgrim is again wrong by saying that the use of Claude definitively makes it a derivative work because of the inability to prove it the work in question did not influence the neurons involved.

It is all dueling lay misreadings of copyright law, but it is also an area where the actual specific applicable law, on any level specific enough to cleanly apply, isn’t all that clear.

simiones 2 days ago [-]

I think this is a bit too broad. There are actually three possible cases.

When there is similar code, the only defense possible to prove that you have not copied the original is to show that your process is a clean room re-implementation.

If the code is completely different, then clean room or not is indeed irrelevant. The only way the author can claim that you violated their copyright despite no apparent similarity is for them to have proof you followed some kind of mechanical process for generating the new code based on the old one, such as using an LLM with the old code as input prompt (TBD, completely unsettled: what if the old code is part of the training set, but was not part of the input?) - the burden of proof is on them to show that the dissimilarity is only apparent.

In realistic cases, you will have a mix of similar and dissimilar portions, and portions where the similarity is questionable. Each of these will need to be analyzed separately - and it's very likely that all the similar portions will need to be re-written again if you can't prove that they were not copied directly or from memory from the original, even if they represent a very small part of the work overall. Even if you wrote a 10k page book, if you copied one whole page verbatim from another book, you will be liable for that page, and the author may force you to take it out.

dmurvihill 9 hours ago [-]

The burden of proof is completely uncharted when it comes to LLMs. Burden of proof is assigned by court precedent, not the Copyright Act itself (in US law). Meaning, a court looking at a case like this could (should) see the use of an LLM trained on the copyrighted work as a distinguishing factor that shifts the burden to the defense. As a matter of public policy, it's not great if infringers can use the poor accountability properties of LLMs to hide from the consequences of illegally redistributing copyrighted works.

Someone 2 days ago [-]

> When there is similar code, the only defense possible to prove that you have not copied the original is to show that your process is a clean room re-implementation.

Yes, but you do not have to prove that you haven’t copied the original; you have to prove you didn’t infringe copyright. For that there are other possible defenses, for example:

- fair use

- claiming the copied part doesn’t require creativity

- arguing that the copied code was written by AI (there’s jurisdiction that says AI-generated art can’t be copyrighted (https://www.theverge.com/2023/8/19/23838458/ai-generated-art...). It’s not impossible judges will make similar judgments for AI-generated programs)

kube-system 2 days ago [-]

Courts have ruled that you can't assign copyrights to a machine, because only humans qualify for human rights. ** There is not currently a legal consensus on whether or not the humans using AI tools are creating derivative works when they use AI models to create things.

** this case is similar to an old case where a ~~photographer~~ PETA claimed a monkey owned a copyright to a photo, because they said a monkey took the photo completely on their own. The court said "okay well, it's public domain then because only humans can have copyrights"

Imagine you put a harry potter book in a copy machine. It is correct that the copy machine would not have a copyright to the output. But you would still be violating copyright by distributing the output.

schlauerfox 2 days ago [-]

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput... Specifically he claimed he owned the copyright on a photo he didn't directly take. PETA weighed in trying to say the monkey owned the copyright.

kube-system 2 days ago [-]

Ah yeah you’re right I forgot it was PETA arguing that.

pseudalopex 2 days ago [-]

> there’s jurisdiction that says AI-generated art can’t be copyrighted

The headline was misleading. The courts said what Thaler could have copyrighted was a complicated question they ignored because he said he was not the author.

gpm 1 days ago [-]

- Arguing that you owned the copyright on the copied code (the author here has apparently been the sole maintainer of this library since 2013, not all, but a lot of the code that could be copied here probably already belongs to him...)

red_admiral 2 days ago [-]

I'm with you here, but I see another problem.

The expected functionality of chardet (detect the unicode encoding) is kind of fixed - apart from edge cases and new additions to unicode, you'd expect the original and new implementations to largely pass the same tests, and have a lot of similar code such as for "does this start with a BOM".

The fact that the JPlag shows such a low %overlap for an implementation of "the same interface" is convincing evidence for me that it's not just plagiarised.

kahnclusions 1 days ago [-]

I’m surprised they think the AI generated rewrite is even copyrightable.

cubefox 2 days ago [-]

If you let an LLM merely rephrase the codebase, that's like letting it rephrase the Harry Potter novels. Which, I'm pretty sure, would still be considered a copy under copyright law, not an original work, despite not copying any text verbatim.

actsasbuffoon 2 days ago [-]

But what if it didn’t summarize Harry Potter? What if it analyzed Harry Potter and came back with a specification for how to write a compelling story about wizards? And then someone read that spec and wrote a different story about wizards that bears only the most superficial resemblance to Harry Potter in the sense that they’re both compelling stories about wizards?

This is legitimately a very weird case and I have no idea how a court would decide it.

cubefox 1 days ago [-]

That seems unrelated to what happened.

2 days ago [-]

spwa4 2 days ago [-]

Given that LLMs were trained on the repository directly, it's not just the case that anything made by the LLM is a derivative work, the LLM ITSELF is a derivative work. After all, they all are substantially based on GPL licensed works by others. The standard courts have always used for "substantially based" by the way, is the ability to extract from the new work anything bigger than an excerpt of the original work.

So convincing evidence, by historical standards, that ChatGPT, Gemini, Copilot AND Claude are all derivative works of the GPL linux kernel can be gotten simply by asking "give me struct sk_buff", then keep asking until you're out of the headers (say, ask how a network driver uses it).

That means if courts are honest (and they never are when it comes to GPL) OpenAI, Google and Anthropic would be forced to release ALL materials needed to duplicate their models "at cost". Given how LLMs work that would include all models, code, AND training data. After all, that is the contract these companies entered into when using the GPL licensed linux kernel.

But of course, to courts copyright applies to you when Microsoft demands it ($30000 per violation PLUS stopping the use of the offending file/torrent/software/... because such measures are apparently justified for downloading a $50 piece of software), it does not apply to big companies when the rules would destroy them.

The last time this was talked about someone pointed out that Microsoft "stole", as they call it, the software to do product keys. They were convicted for doing that, and the judge even increased damages because of Microsoft's behavior in the case.

But there is no way in hell you'll ever get justice from the courts in this. In fact courts have already decided that AI training is fair use on 2 conditions:

1) that the companies acquired the material itself without violating copyright. Of course it has already been proven that this is not the case for any of them (they scraped it without permission, which has been declared illegal again and again in the file sharing trials)

2) that the models refuse to reproduce copyrighted works. Now go to your favorite model and ask "Give me some code written by Linus Torvalds": not a peep about copyright violation.

... but it does not matter, and it won't matter. Courts are making excuses to allow LLM models to violate any copyright, the excuse does not work, does not convince rational people, but it just doesn't matter.

But of course, if you thought that just because they cheat against the law to make what they're already doing legal, they'll do the same for you, help you violate copyright, right? After all, that's how they work! Ok now go and ask:

"Make me an image of Mickey Mouse peeling a cheese banana under an angry moon"

And you'll get a reply "YOU EVIL COPYRIGHT VILLAIN". Despite, of course, Mickey Mouse no longer being covered under copyright!

And to really get angry, find your favorite indie artist, and ask to make something based on their work. Even "Make an MC Escher style painting of Sonic the Hedgehog" ... even that doesn't count as copyright violation, only the truly gigantic companies deserve copyright protection.

dragonwriter 1 days ago [-]

> Given that LLMs were trained on the repository directly, it's not just the case that anything made by the LLM is a derivative work, the LLM ITSELF is a derivative work.

That’s not how “derivative works”, well, work.

First of all, a thing can only be a derivative work if it is itself an original work of authorship.

Otherwise, it might be (or contain) a complete copy or a partial copy of one or more source works (which, if it doesn't fall into a copyright exception, would still be a at least a potential violation), but its not a derivative work.

spwa4 24 hours ago [-]

So you're saying LLMs don't count as an original work and so have zero copyright protection? So anyone running those models can just freely copy them if they have access to them? And, of course, it means distillation attacks, even if they do turn out to copy the OpenAIs/Anthropic/... model are just 100% perfectly legal? I mean paying someone to break into the DC and then putting the model on torrent would allow anyone downloading it to use it, legally. Because that would be the implication, wouldn't it?

Plus, if this is true, it would be a loophole. Plus this is totally crazy.

It would be great if courts declared WHAT is the case. But they won't, because copyright only protects massive companies.

dragonwriter 23 hours ago [-]

> So you're saying LLMs don't count as an original work and so have zero copyright protection?

No, I'm saying that your explanation of what makes something a derivative work is wrong. Now, personally, I think there is a very good argument that LLMs and similar models, if they have a copyright at all, do so only because of whatever copyright can be claimed on the training set as a work of its own (which, if ti exists, would be a compilation copyright), as a work of authorship of which it is a mechanical transformation (similar to object code having a copyright as a consequence of the copyright on the source code, which is a work of authorship.) Its also quite arguable that they are not subject to copyright, and many have made that argument.

> So anyone running those models can just freely copy them if they have access to them?

I'm not arguingn for that, but yes that is the consequence if they are not subject to copyright, assuming no other (e.g., contractual) prohibition binds the parties seeking to make copies.

> And, of course, it means distillation attacks, even if they do turn out to copy the OpenAIs/Anthropic/... model are just 100% perfectly legal?

Distillation isn't an “attack” and probably isn't a violation of copyright even if models are protected, they are literally interacting with the model through its interface to reproduce its function; they are functional reverse engineering.

Distillation is a violation of ToS, for which there are remedies outside of copyright.

> I mean paying someone to break into the DC and then putting the model on torrent would allow anyone downloading it to use it, legally.

Paying someone to break into the DC and do that would subject you to criminal charges for burglary and conspiracy, and civil liability for the associated torts as well as for theft of trade secrets covering the resulting harms, even without copyright protection.

> Plus, if this is true, it would be a loophole. Plus this is totally crazy.

Its not a “loophole” that copyright law only covers works of original authorship, it is the whole point of copyright law.

> It would be great if courts declared WHAT is the case.

If there is a dispute which turns on what is the case, courts will rule one way or the other on the issues necessary to resolve it. Courts (in the US at least) do not rule on issues not before them, except to the extent that a general rule which resolves but covers somewhat more than the immediate case can usefully be articulated by an appellate court.)

> But they won't, because copyright only protects massive companies.

Leaving out any question of whether the premise of this claim is true, the conclusion doesn't follow from it, since “what is the case” here is the kind of thing that is quite likely to be an issue between massive companies at some point in the not too distant future, requiring courts to resolve it even if they only address the meaning of copyright law for that purpose.

spwa4 22 hours ago [-]

Your first 3-4 arguments I just read as trying to weasel out from under the GPL. Because everyone trains on GPL code and if the GPL applies to the result ... well clearly you know the implications of that.

And btw: that a "compilation copyright" would apply to training data. Great. That only means, of course, that if they are publish their training data (like they agreed to when using GPL code to base their models on), people can't republish the exact same collection under different conditions (BUT they can under the same conditions). Everyone will happily follow that rule, don't worry.

> Paying someone to break into the DC and do that would subject you to criminal charges for burglary and conspiracy, and civil liability for the associated torts as well as for theft of trade secrets covering the resulting harms, even without copyright protection.

I don't claim the break-in would be legal, but without copyright protection, if that made a model leak, it would be fair game for everyone to use.

> Distillation is a violation of ToS, for which there are remedies outside of copyright.

But the models were created by violating ToS of webservers! This has the exact same problem the copyright violations have, only far far bigger! Scraping webservers is a violation of the ToS of those servers. For example [1]. Almost all have language somewhere that only allows humans to browse them, and bots, and IF bots are allowed at all (certainly not always), only specific bots for the purpose of indexing. So this is a much bigger problem for AI labs than even the GPL issue.

So yes, if you wanted to make the case that the AI labs, and large companies, violate any kind of contract, not just copyright licenses, excellent argument. But I know already: I'm a consultant, and I've had to sue, and won, 2 very large companies on terms of payment. In one case, I've had to do something called "forced execution", of the payment order (ie. going to the bank and demanding the bank execute the transaction against a random account of the company, against the will of the large company. Let me tell you, banks DO NOT like to do this)

Btw: what model training is doing, obviously, is distilling from the work, from the brain, of humans, against the will of those humans, and without paying for it. So in any reasonable interpretation, that's also a ToS violation. Probably a lot more implicit than the ones spelled out on websites, but not fundamentally different.

[1] https://www.bakerdatacounsel.com/blogs/terms-of-use-10-thing...

dragonwriter 6 hours ago [-]

> Your first 3-4 arguments I just read as trying to weasel out from under the GPL.

I haven't talked about any license, or given any though to any particular license in any of this; I don't know where you are reading anything about the GPL specifically into it.

None of this has anything to do with the GPL, except that the GPL only is even necessary where there is something to license because of a prohibition on copyright law.

> nd btw: that a "compilation copyright" would apply to training data. Great. That only means, of course, that if they are publish their training data (like they agreed to when using GPL code to base their models on), people can't republish the exact same collection under different conditions (BUT they can under the same conditions).

No, that's not what it means, and I don't know where you got the "other terms" or the dependency on publication from; neither is from copyright law.

> But the models were created by violating ToS of webservers!

And, so what?

To the extent those terms are binding (more likely the case for sites where there is affirmative assent to the conditions, like ones that are gated on accounts with a signup process that requires agreeing to the ToS, e.g., “clickwrap”), there are remedies. For those where the conditions are not legally binding (more like the case where the terms are linked but there is no access gating, clear notice, or affirmative assent), well, they aren't binding.

> Btw: what model training is doing, obviously, is distilling from the work, from the brain, of humans, against the will of those humans, and without paying for it. So in any reasonable inteUhrpretation, that's also a ToS violation.

Uh, what? We are just creating imaginary new categories of intellectual property and imaginary terms of service and imaginary bases for those terms to be enforceable now?

vsl 1 days ago [-]

The LLM would, under that argument, be a transformative derivative work, which has important fair use implications (that don’t exist in the chardet case)…

aaron695 2 days ago [-]

[dead]

TZubiri 2 days ago [-]

Ok sure, in the alternative, here's the argument:

The AI was trained with the code, so the complete rewrite is tainted and not a clean room. I can't believe this would need spelling out.

pocksuppet 2 days ago [-]

"Tainted rewrite" isn't a legal concept either. You have to prove (on balance of probabilities - more likely than not) that the defendant made an unauthorized copy, made an unauthorized derivative work, etc. Clean-room rewriting is a defense strategy, because if the programmer never saw the original work, they couldn't possibly have made a derivative. But even without that, you still have to prove they did. It's not an offence to just not be able to prove you didn't break the law.

2 days ago [-]

rmast 1 days ago [-]

If you wanted to do the clean-room approach for something like chardet in a less controversial way, instead of having the AI do all the work couldn’t the AI generate the spec and then a human (with no exposure to the original code) do an initial implementation based on the spec?

Manuel_D 2 days ago [-]

As other pointed out, the notion of "clean room" rewrites is to make a particularly strong case of non-infringement. It doesn't mean that anything other than a clean room implementation is an infringement.

jdauriemma 2 days ago [-]

This is interesting and I'm not sure what to make of it. Devil's advocate: the person operating the AI also was "trained with the code," is that materially different from them writing it by hand vs. assisted by an LLM? Honestly asking, I hadn't considered this angle before.

cardanome 2 days ago [-]

If you worked at Microsoft and had access to the Windows source code you probably should not be contributing to WINE or similar projects as there would be legal risk.

So for this case, not much different legally. Of course there is the practical difference just like there is between me seeing you with my own eyes and me taking a picture of you.

"Training" an LLM ist not the same as training a human being. It a metaphor. Its confusing the save icon with an actual floppy disk.

I can say I "trained" my printer to print copyrighted material by feeding it bits but that that would be pure sophism.

Problem is that law hasn't really caught up the our brave new AI future yet so lots of decisions are up in the air. Plus governments incentivized to look the other way regarding copyright abuses when it comes to AI as they think that having competitive AI is of strategic importance.

jdauriemma 2 days ago [-]

> "Training" an LLM ist not the same as training a human being. It a metaphor. Its confusing the save icon with an actual floppy disk.

Maybe? But the design of the floppy disk is for data storage and retrieval per se. It can't give you your bits in a novel order like an LLM does (by design). From what I can tell in this case, the output is significantly differentiated from the source code.

senko 2 days ago [-]

Reread the parent: clean room is not required.

TZubiri 10 hours ago [-]

Oh, got it.

Parent was making a claim about clean room not being required, without making claims about whether LLM coding is or isn't clean room.

jacquesm 2 days ago [-]

This is correct. I think any author of a main chunk of code that they claim ownership to (which is probably all of us!) should at least study the basics of copyright law. Getting little details wrong can cost you time, money and eventually your business if you're not careful.

dathinab 2 days ago [-]

The argument that a rewrite is a copyright violation because they are familiar with the code base is not fully sound.

"Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law then copyright law.

Or else a artist having seen a picture of a sunset over an empty ocean wouldn't be allowed to pain another sunset over an empty ocean as people could claim copyright violation.

Through what is a violation is, if you place the code side by side and try to circumvent copyright law by just rephrasing the exact same code.

This also means that if you give an AI access to a code base and tell it to produce a new code base doing the same (or similar) it will most likely be ruled as copyright violation as it's pretty much a side by side rewriting.

But you very much can rewrite a project under new license even if you have in depth knowledge. IFF you don't have the old project open/look at it while doing so. Rewrite it from scratch. And don't just rewrite the same code from memory, but instead write fully new code producing the same/similar outputs.

Through while doing so is not per-se illegal, it is legally very attackable. As you will have a hard time defending such a rewrite from copyright claims (except if it's internally so completely different that it stops any claims of "being a copy", e.g. you use complete different algorithms, architecture, etc. to produce the same results in a different way).

In the end while technically "legally hard to defend" != "illegal", for companies it's most times best to treat it the same.

simiones 2 days ago [-]

> "Insider Knowledge" is not relevant for copyright law. That is more in the space of patent law then copyright law.

On the contrary. Except for discussions about punitive damages and so on, insider knowledge or lack thereof is completely irrelevant to patent law. If company A has a patent on something, they can assert said patent against company B regardless of whether any person in company B had ever seen or heard of company A and their patent. Company B could have a legal trail proving they invented their product that matches the patent from scratch with no outside knowledge, and that they had been doing this before company A had even filed their patent, and it wouldn't matter at all - company A, by virtue of filing and being granted a patent, has a legal monopoly on that invention.

In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work. Even more, you have your own copyright over your own work, and can assert it over anyone that tries to copy your work without permission, despite an identical work existing and being owned by someone else.

Now, purely in principle this would remain true even if you had seen the other work. But in reality, it's impossible to convince any jury that you happened to produce, entirely out of your own creativity, an original work that is identical to a work you had seen before.

> But you very much can rewrite a project under new license even if you have in depth knowledge. IFF you don't have the old project open/look at it while doing so.

No, this is very much false. You will never be able to win a court case on this, as any significant similarity between your work and the original will be considered a copyright violation, per the preponderance of the evidence.

aleph_minus_one 2 days ago [-]

> In contrast, for copyright the right is intrinsically tied to the origin of a work. If you create a digital image that is entirely identical at the pixel level with a copyrighted work, and you can prove that you had never seen that original copyrighted work and you created your image completely independently, then you have not broken anyone's copyright and are free to sell copies of your own work.

This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

> https://www.travelandleisure.com/photography/illegal-to-take...

> https://www.headout.com/blog/eiffel-tower-copyright/

simiones 2 days ago [-]

This has no relation to what I was saying. Taking a photo of a copyrighted work is a method for creating a copy of said work using a mechanical device, so it is of course covered by copyright (whether buildings or light shows fall under copyright is an irrelevant detail).

What I'm saying is that if you, say, create an image of a red oval in MS Paint, you have copyright over said image. If 2 years later I create an identical image myself having never seen your image, I also have copyright over my image - despite it being identical to your image, I have every right to sell copies of my image, and even to sue someone who distributes copies of my image without my permission (but not if they're distributing copies of your image).

But if I had seen your image of a red oval before I created mine, it's basically impossible for me to prove that I created my own image out of my own creativity, and I didn't just copy yours. So, if you were to sue me for copyright infringement, I would almost certainly lose in front of any reasonable jury.

chimeracoder 2 days ago [-]

> This is not true. I will just give the example of the nighttime illumination of the Eiffel Tower:

That example is not analogous to the topic at hand.

But furthermore, it also is specific to French/European copyright law. In the US, the US Copyright Act would not permit restrictions on photographs of architectural works that are visible from public spaces.

jerrysievert 2 days ago [-]

actually, the US Copyright Act does in fact allow restrictions on photographs of architectural works that are visible from public spaces:

https://en.wikipedia.org/wiki/Portlandia_(statue)

the Portlandia statue is one such architectural work - and its creator is fairly litigious.

chimeracoder 2 days ago [-]

I don't know the details of that specific case so I can't speak to it, but the text of the AWCPA is very clear:

> The copyright in an architectural work that has been constructed does not include the right to prevent the making, distributing, or public display of pictures, paintings, photographs, or other pictorial representations of the work, if the building in which the work is embodied is located in or ordinarily visible from a public place.

This codifies an already-established principle in US law. French law does not have that same principle.

2 days ago [-]

twoodfin 2 days ago [-]

If I read Mario Puzo’s The Godfather and then proceed to write a structurally identical novel with many of the same story beats and character types, it will not be difficult to convince a jury exposed to these facts that I’ve created a derivative work.

On the other hand, if I can prove to the jury’s satisfaction that I’ve never been exposed to Puzo’s work in any form, it’s independent creation.

Manuel_D 2 days ago [-]

To the contrary, there have been many cases of very similar novels with largely identical plot points and settings that survive copyright allegations, even if the author was exposed to the original work.

For a rather entertaining example (though raunchy, for a heads up): https://www.youtube.com/watch?v=zhWWcWtAUoY&themeRefresh=1

twoodfin 2 days ago [-]

Sure, but there’s some level of slavish copying with the serial numbers filed off that would convince a judge or a jury that it’s derivative.

Manuel_D 1 days ago [-]

Sure, but that level is a lot higher than what a lot of commenters seem to think.

2 days ago [-]

helsinkiandrew 2 days ago [-]

In the case of chardet though it wouldn't it be more like you were the publisher of the godfather novel, withdrawing it from print and releasing a novel with the same name with much of the same plot and characters but claiming the new version was an independent creation?

pocksuppet 2 days ago [-]

That's even worse for your case.

helsinkiandrew 2 days ago [-]

If the new maintainers used Claude as their “fancy code generator” (there’s a Claude.md file in the repository so it seems so) then it was almost certainly trained with the chardet source code.

oneeyedpigeon 2 days ago [-]

> And don't just rewrite the same code from memory, but instead write fully new code producing the same/similar outputs.

How different does the new code have to be from the old code and how is that measured?

larodi 2 days ago [-]

nobody can tell and this is how we entered this very turbulent modern times of "everything can be retold" without punishment. LLMs already doing it at large, while original author is correct in terms of the LGPL, it is nearly impossible to say how different should expression of an idea be to be considered separate one. this is truly fundamental philosophical question that may not have an easy answer.

bsenftner 2 days ago [-]

Hate to be "that guy" but in a corrupt legal system, which ours is, none of this matters. Who has the influence and dollars to make the decision theirs is all that matters.

jmyeet 2 days ago [-]

This is a bad argument.

Think of a rewrite (by a human or an LLM) as a translation. If you wrote a book in English and somebody translated it into Spanish, it'd still be a copyright issue. Same thing with translations.

That's very different to taking the idea of a body of work. So you can't copyright the idea of a pirate taking a princess hostage and a hero rescuing her. That's too generic. But even here there are limits. There have been lawsuits over artistic works being too similar.

Back to software, you can't copyright the idea of photo-editing software but you can copyright the source code that produces that software. If you can somehow prompt an LLM to produce photo editing software or if a person writes it themselves then you have what's generally referred to as a "cleanroom" implmentation and that's copyright-free (although you may have patent issues, which is a whole separate issue).

But even if you prompted an LLM that way, how did the LLM learn what it needed? Was the source code of another project an input in its training? This is a legal grey area, currently. But I suspect it's going to be a problem.

pera 2 days ago [-]

Suchir Balaji, the OpenAI researcher who was found dead in his flat just before testifying against his employer, published an excellent article somehow related to this topic:

When does generative AI qualify for fair use?

https://suchir.net/fair_use.html

Balaji's argument is very strong and I feel we will see it tested in court as soon as LLM license-washing starts getting more popular.

RcouF1uZ4gsC 2 days ago [-]

I think you could have an LLM produce a written English detailed description of the complete logic of the program and tests.

Then use another LLM to produce code from that spec.

This would be similar to the cleanroom technique.

simiones 2 days ago [-]

Producing a copy of a copyrighted work through a purely mechanical process is clear violation of copyright. LLMs are absolutely not different from a copier machine in the eyes of the law.

Original works can only be produced by a human being, by definition in copyright law. Any artifact produced by an animal, a mechanical process, a machine, a natural phenomenon etc is either a derived work if it started from an original copyrighted work, or a public domain artifact not covered by copyright law if it didn't.

For example, an image created on a rock struck by lightning is not a copyright covered work. Similarly, an image generated by an diffusion model from a randomly generated sentence is not a copyrightable work. However, if you feed a novel as a prompt to an LLM and ask for a summary, the resulting summary is a derived work of said novel, and it falls under the copyright of the novel's owner - you are not allowed to distribute copies of the summary the LLM generated for you.

Whether the output of an LLM, or the LLM weights themselves, might be considered derived works of the training set of that LLM is a completely different discussion, and one that has not yet been settled in court.

robinsonb5 2 days ago [-]

Perhaps - but an argument might still be made that the result is a derivative work of the original, given that it's produced by feeding the original work through automated tooling.

But either way, deleting the original version from the repo and replacing it with the new version - as opposed to, say, archiving the old version and starting a new repo with the new version - would still be a dick move.

robin_reala 2 days ago [-]

Assuming the second LLM hadn’t been trained on the existing codebase. Which in this case we can’t know, but can assume that it was.

knollimar 2 days ago [-]

Does the second LLM have the codebase in its training?

9864247888754 2 days ago [-]

One could use Comma, which has only been trained on public domain texts:

https://arxiv.org/pdf/2506.05209

Roritharr 2 days ago [-]

As part of my consulting, i've stumbled upon this issue in a commercial context. A SaaS company who has the mobile apps of their platform open source approached me with the following concern.

One of their engineers was able to recreate their platform by letting Claude Code reverse engineer their Apps and the Web-Frontend, creating an API-compatible backend that is functionally identical.

Took him a week after work. It's not as stable, the unit-tests need more work, the code has some unnecessary duplication, hosting isn't fully figured out, but the end-to-end test-harness is even more stable than their own.

"How do we protect ourselves against a competitor doing this?"

Noodling on this at the moment.

3rodents 2 days ago [-]

You're not describing anything new, you're describing progress. A company invests time and money and expertise into building a product, it becomes established, people copy in 1/10th of the time, the quality of products across the industry improve. Long before generative AI, Instagram famously copied Snapchat's stories concept in a weekend, and that is now a multi-multi-multi-billion contributor to Meta's bottom line.

As engineers, we often think only about code, but code has never been what makes a business succeed. If your client thinks that their businesses primary value is in the mobile app code they wrote, 1) why is it even open source? 2) the business is doomed.

Realistically, though, this is inconsequential, and any time spent worrying about this is wasted time. You don't protect yourself from your competitor by worrying about them copying your mobile app.

amelius 2 days ago [-]

> You don't protect yourself from your competitor by worrying about them copying your mobile app.

They did not copy the mobile app. They copied the service.

3rodents 2 days ago [-]

Replace “mobile app” with “backend” in my comment.

IanCal 2 days ago [-]

You might be interested in the dark factory work here https://factory.strongdm.ai/

They do something very similar for some of their work. It’s hard to use external services so they replicate them and the cost of doing so has come down from “don’t be daft, we can’t reimplement slack and google drive this sprint just to make testing faster” to realistic. They run the sdks against the live services and their own implementations until they don’t see behaviour differences. Now they have a fast slack and drive and more (that do everything they need for their testing) accelerating other work. I’m dramatically shifting my concept of what’s expensive and not for development. What you’re describing could have been done by someone before, but the difficulty of building that backend has dropped enormously. Even if the application was closed you could probably either now or soon start to do the same thing starting with building back to core user stories and building the app as well.

You can view some of this as having things like the application as a very precise specification.

Really fascinating moment of change.

Garlef 2 days ago [-]

> It’s hard to use external services

I think it's interesting to add what they use it for and why its hard.

What they use it for:

- It's about automated testing against third party services.

- It's not about replicating the product for end users

Why using external services is hard/problematic

- Performance: They want to have super fast feedback cycles in the agentic loop: In-Memory tests. So they let the AI write full in-memory simulations of (for example) the slack api that are behaviorally equivalent for their use cases.

- Feasiblity: The sandboxes offered by these services usually have performance limits (= number of requests per month, etc) that would easily be exhausted if attached to a test harness that runs every other minute in an automated BDD loop.

zozbot234 2 days ago [-]

> "How do we protect ourselves against a competitor doing this?"

If the platform is so trivial that it can be reverse engineered by an AI agent from a dumb frontend, what's there to protect against? One has to assume that their moat is not that part of the backend but something else entirely about how the service is being provided.

littlecranky67 2 days ago [-]

Interesting case, IANAL but sounds legal and legit. The AI did not have expose to the backend it re-implemented. The API itself is public and not protectable.

bandrami 2 days ago [-]

OTOH as of yesterday the output of the LLM isn't copyrightable, which makes licensing it difficult

graemep 2 days ago [-]

As other's have pointed out, this case is really about refusing to allow an LLM to be recognised as the author. The person using the LLM waived any right to be recognised as the author.

Its also US only. Other countries will differ. This means you can only rely on this ruling at all for something you are distributing only in the US. Might be OK for art, definitely not for most software. Very definitely not OK for a software library.

For example UK law specifically says "In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken."

https://www.legislation.gov.uk/ukpga/1988/48/section/9

jacquesm 2 days ago [-]

> The person using the LLM waived any right to be recognised as the author.

They can't waive their liability from being identified as an infringer though.

bakugo 2 days ago [-]

> the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken.

This seems extremely vague. One could argue that any part of the pipeline counts as an "arrangement necessary for the creation of the work", so who is the author? The prompter, the creator of the model, or the creator of the training data?

graemep 2 days ago [-]

The courts will have to settle that according to circumstances. I think it is likely to be the prompter, and in some cases the creator of the training data as well. The creator of the model will have copyright on the model, but unlikely to have copyright on its outputs (any more than the writer of a compiler has copyright on its output).

NitpickLawyer 2 days ago [-]

I wrote this comment on another thread earlier, but it seems relevant here, so I'll just c/p:

I think we didn't even began to consider all the implications of this, and while people ran with that one case where someone couldn't copyright a generated image, it's not that easy for code. I think there needs to be way more litigation before we can confidently say it's settled.

If "generated" code is not copyrightable, where do draw the line on what generated means? Do macros count? Does code that generates other code count? Protobuf?

If it's the tool that generates the code, again where do we draw the line? Is it just using 3rd party tools? Would training your own count? Would a "random" code gen and pick the winners (by whatever means) count? Bruteforce all the space (silly example but hey we're in silly space here) counts?

Is it just "AI" adjacent that isn't copyrightable? If so how do you define AI? Does autocomplete count? Intellisense? Smarter intellisense?

Are we gonna have to have a trial where there's at least one lawyer making silly comparisons between LLMs and power plugs? Or maybe counting abacuses (abaci?)... "But your honour, it's just random numbers / matrix multiplications...

bandrami 2 days ago [-]

In terms of adoption, "it's not settled" is even worse

amelius 2 days ago [-]

Maybe we should build an LLM that can be the judge of that :)

senko 2 days ago [-]

That's a very incorrect reading.

AI can't be the author of the work. Human driving the AI can, unless they zero-shotted the solution with no creative input.

camgunz 2 days ago [-]

Only the authored parts can be copyrighted, and only humans can author [0].

"For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the 'traditional elements of authorship' are determined and executed by the technology—not the human user."

"In other cases, however, a work containing AI-generated material will also contain sufficient human authorship to support a copyright claim. For example, a human may select or arrange AI-generated material in a sufficiently creative way that 'the resulting work as a whole constitutes an original work of authorship.'"

"Or an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work, which are 'independent of' and do 'not affect' the copyright status of the AI-generated material itself."

IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

[0]: https://www.federalregister.gov/d/2023-05321/p-40

simiones 2 days ago [-]

> IMO this is pretty common sense. No one's arguing they're authoring generated code; the whole point is to not author it.

Actually this is very much how people think for code.

Consider the following consequence. Say I work for a company. Every time I generate some code with Claude, I keep a copy of said code. Once the full code is tested and released, I throw away any code that was not working well. Now I leave the company and approach their competitor. I provide all of the working code generated by Claude to the competitor. Per the new ruling, this should be perfectly legal, as this generated code is not copyrightable and thus doesn't belong to anyone.

camgunz 2 days ago [-]

No software company thinks this, not Oracle, not Google, not Meta, no one. See: the guy they sued for taking things to Uber.

simiones 1 days ago [-]

The person I replied to said "No one's arguing they're authoring generated code; the whole point is to not author it.". My point was that people absolutely do think and believe strongly they are authoring code when they are generating it with AI - and thus they are claiming ownership rights over it.

camgunz 24 hours ago [-]

(the person you originally replied to is also me, tl;dr: I think engineers don't think they're authoring, but companies do)

The core feature of generative AI is the human isn't the author of the output. Authoring something and generating something with generative AI aren't equivalent processes; you know this because if you try and get a person who's fully on board w/ generative AI to not use it, they will argue the old process isn't the same as the new process and they don't want to go back. The actual output is irrelevant; authorship is a process.

But, to your point, I think you're right: companies super think their engineers have the rights to the output they assign to them. If it wasn't clear before it's clear now: engineers shouldn't be passing off generated output as authored output. They have to have the right to assign the totality of their output to their employer (same as using MIT code or whatever), so that it ultimately belongs to them or they have a valid license to use it. If they break that agreement, they break their contract with the company.

maxerickson 2 days ago [-]

So if I want to publish a project under some license and I put a comment in an AI generated file (never mind what I put in the comment), how do you go about proving which portion of that file is not protected under copyright?

If the AI code isn't copyrightable, I don't have any obligations to acknowledge it.

bandrami 2 days ago [-]

You're looking at this as the infringer rather than the owner. How do you as a copyright owner prove you meaningfully arranged the work when you want to enforce your copyright?

maxerickson 1 days ago [-]

I was looking at it from the perspective of an owner who simply wants to discourage use outside of some particular license.

There's close enough to zero enforcement of infringement, it's all self policing or violation.

camgunz 2 days ago [-]

Copyright office says this has to be done case-by-case. My guess is they'd ask to see prompts and evidence of authorship.

skeledrew 2 days ago [-]

The human is still at best a co-author, as the primary implementation effort isn't theirs. And I think effort involved is the key contention in these cases. Yesterday ideas were cheap, and it was the execution that matters. Today execution is probably cheaper than ideas, but things should still hold.

tricorn 18 hours ago [-]

No, effort is explicitly not a factor in copyright. It was at one point, but "sweat of the brow" doctrine went away in Feist Publications in 1991, at least in the US.

phire 2 days ago [-]

That's not really what the ruling said. Though, I suspect this type of "vibe rewrite" does fall afoul of the same issue.

But for this type of copyright laundering, it doesn't really matter. The goal isn't really about licensing it, it's about avoiding the existing licence. The idea that the code ends up as public domain isn't really an issue for them.

oblio 2 days ago [-]

As of yesterday?

phi-go 2 days ago [-]

I think they mean this: https://news.ycombinator.com/item?id=47232289

rwmj 2 days ago [-]

No serious enterprise SaaS company differentiates themselves solely on the product (the products are usually terrible). It's the sales channel, the fact that you know how to bill a big company, the human engineer who is sent on site to deploy and integrate the product, the people on the support line 24/7, the regulatory framework that ensures the customer can operate legally and obtain insurance, the fact that there's a deep pool of potential hires who have used and understand the product. Those are the differentiators.

jillesvangurp 2 days ago [-]

> "How do we protect ourselves against a competitor doing this?"

You can try patenting; but not after the fact. Copyright won't help you here. You can't copyright an algorithm or idea, just a specific form or implementation of it. And there is a lot of legal history about what is and isn't a derivative work here. Some companies try to forbid reverse engineering in their licensing. But of course that might be a bit hard to enforce, or prove. And it doesn't work for OSS stuff in any case.

Stuff like this has been common practice in the industry for decades. Most good software ideas get picked apart, copied and re-implemented. IBM's bios for the first PC quickly got reverse engineered and then other companies started making IBM compatible PCs. IBM never open sourced their bios and they probably did not intend for that to happen. But that didn't matter. Likewise there were several PC compatible DOS variants that each could (mostly) run the same applications. MS never open sourced DOS either. There are countless examples of people figuring out how stuff works and then creating independent implementations. All that is perfectly legal.

jasomill 1 days ago [-]

IBM never open sourced their BIOS, but they did publish complete source code listings:

https://bitsavers.org/pdf/ibm/pc/pc/6025008_PC_Technical_Ref...

https://bitsavers.org/pdf/ibm/pc/xt/1502237_PC_XT_Technical_...

https://bitsavers.org/pdf/ibm/pc/at/1502494_PC_AT_Technical_...

Between this and the fact that their PC-DOS (née MS-DOS) license was nonexclusive, I'm honestly not sure what they expected to happen.

The nature of early IBM PC advertising suggests to me that they expected the IBM name and established business relationships to carry as much weight as the specifications itself, and that "IBM PC compatible" systems would be no more attractive than existing personal computers running similar if not identical third-party software (PC-DOS wasn't the only example of IBM reselling third-party software under nonexclusive license), and would perhaps even lead to increased sales of first-party IBM PCs.

Which, in fact, they did, leading me to believe the actual result may have been not too far from their original intent, only with IBM capturing and holding a larger share of the pie.

consumer451 2 days ago [-]

> "How do we protect ourselves against a competitor doing this?"

I have been thinking about this a lot lately, as someone launching a niche b2b SaaS. The unfortunate conclusion that I have come to is: have more capital than anyone for distribution.

Is there any other answer to this? I hope so, as we are not in the well-capitalized category, but we have friendly user traction. I think the only possible way to succeed is to quietly secure some big contracts.

I had been hoping to bootstrap, but how can we in this new "code is cheap" world? I know it's always been like this, but it is even worse now, isn't it?

ShowalkKama 2 days ago [-]

If your backend is trivial enough to be implemented by a large language model, what value are you providing?

I know it's a provoking question but that answers why a competitor is not a competitor.

dboreham 2 days ago [-]

I suspect you're underestimating the capabilities of today's LLMs.

nandomrumber 2 days ago [-]

Maybe a better question is:

How do our competitors protect themselves against us doing this?

dredmorbius 2 days ago [-]

Particularly if you're named "Google", "Amazon", "Microsoft", or "Apple".

mellosouls 2 days ago [-]

The famous case Google vs Oracle may need to be re-evaluated in the light of Agents making API implementation trivial.

https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....

Meneth 2 days ago [-]

"How do we protect ourselves against a competitor doing this?"

That's the neat thing: you don't!

senko 2 days ago [-]

> "How do we protect ourselves against a competitor doing this?"

DMCA. The EULA likely prohibits reverse engineering. If a competitor does that, hit'em with lawyers.

Or, if you want to be able to sleep at night, recognize this as an opportunity instead of a threat.

orthoxerox 2 days ago [-]

What about jurisdictions where reverse engineering is an inalienable right?

dredmorbius 2 days ago [-]

Which are those?

orthoxerox 2 days ago [-]

Afaik, the EU and Russia says that observing/experimenting with the external behavior of the program to determine its internal logic is legal.

Russia even allows to decompile object code if you have to solve private compatibility issues.

jasomill 1 days ago [-]

Even in the US, are there any non-DRM examples where reverse engineering for the purpose of interoperability in violation of a license agreement have been used as the basis for copyright claims, even when the results are incorporated into a competing product?

For example, I don't recall Microsoft ever being sued by WordPerfect or Lotus for reading and writing their applications' unpublished file formats, which wouldn't have necessarily involved disassembly or decompilation, but was still the result of reverse engineering that almost certainly involved using a licensed or unlicensed copy of the competitor's product.

dredmorbius 1 days ago [-]

Google LLC v. Oracle America, Inc. is also a relevant case, I suspect. Found for Google against Oracle's claim of copyright infringement, for non-clean-room RE of Java APIs:

<https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...>

amelius 2 days ago [-]

Makes me wonder when AI will put the mobile phone OS duopoly to an end.

fragmede 2 days ago [-]

Nothing. This is why SaaS stocks took a dump last week.

jmyeet 2 days ago [-]

I think the genie is out of the bottle on this one and there's really no putting it back.

There is a certain amount of brand loyalty and platform inertia that will keep people. Also, as you point out, just having the source code isn't enough. Running a platform is more than that. But that gap will narrow with time.

The broader issue here is that there are people in tech who don't realize that AI is coming for their jobs (and companies) too. I hope people in this position can maybe understand the overall societal issues for other people seeing their industries "disrupted" (ie destroyed) by AI.

markthered 2 days ago [-]

The copyright argument is a sidetrack both in the PR comment thread and here. The issue opened claims the new code is based on the old code, and therefore derivative, and therefore must be offered in a modified version of the source code under the previous license, LGPL. The complaint is the maintainers violated the terms of LGPL, that they must prove no derivation from the original code to legally claim this is a legal new version without the LGPL license. Claim is if they or Claude read the old code (or of course directly use any of it) it is a license violation. “… in the release 7.0.0, the maintainers claim to have the right to “relicense” the project. They have no such right; doing so is an explicit violation of the LGPL. Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation).“ By this reasoning, I am genuinely asking (I’m not a license expert) if a valid clean room rewrite is possible, because at a minimum you would need a spec describing all behavior, which ses to require ample exposure to the original to be sufficiently precise.

Pannoniae 2 days ago [-]

That's not what a derivative work means, though. Being exposed to something doesn't mean you can't create original work which is similar to it (otherwise every song or artwork would be a derivative work of everything before it)

People do cleanroom implementations as a precaution against a lawsuit, but it's not a necessary element.

In fact, even if some parts are similar, it's still not a clear-cut case - the defendant can very well argue that the usage was 1. transformative 2. insubstantial to the entirety of work.

"The complaint is the maintainers violated the terms of LGPL, that they must prove no derivation from the original code to legally claim this is a legal new version without the LGPL license."

The burden of proof is on the accuser.

"I am genuinely asking (I’m not a license expert) if a valid clean room rewrite is possible, because at a minimum you would need a spec describing all behavior, which ses to require ample exposure to the original to be sufficiently precise."

Linux would be illegal if so (they had knowledge of Unix before), and many GNU tools are libre API-compatible reimplementations of previous Unix utilities :)

simiones 2 days ago [-]

The copyright argument is the only relevant argument. If the new work is a derived work of the original, then it follows by definition that the new work is under the copryight of the original's author(s). Since the original chardet was distributed by its author(s) only under the LGPL, any copy/derivative of it that anyone else creates must be distributed only under the LGPL, per the terms of the LGPL.

Now, whether chardet 7.0.0 is a derivative of chardet or not is a matter of copyright law that the LGPL has no say on, and a rather murky ground with not that much case law to rely on behind it. If it's not, the new author is free to distribute chardet 7.0.0 under any license they want, since it is a new work under his copyright.

klustregrif 1 days ago [-]

> Claim is if they or Claude read the old code (or of course directly use any of it) it is a license violation

The original code is part of claude's training material. With that intepretation of the LGPL AI is incapable of writing non LGPL derivatives. I like that interpretation.

yxhuvud 1 days ago [-]

I'm not certain I buy it, but I find it a little hard to motivate the training being fair use if used to regenerate the project in a different licence.

gmerc 2 days ago [-]

It's the same project/name, with a version bump.

amtamt 2 days ago [-]

Maintainers must not be able to change the license that original author chose, and based on which contributors made contributions. When one stepped up to be maintainer, it was a trustee role, not owner role.

It should be perfectly ok (by maintainer or anyone for that mater) to be inspired from a community project and build something from scratch hand-crafting/ AI sloping, as long as the imitation is given a new name/ identity.

What rubbed me off personally was maintainer saying "pin your dependncies to version 6.0.0 or 5.x.x", as if maintainer owns the project. maintainer role is more akin to serve the community, not rule.

If it is completely new, why not start a new project with new name? No one will object. And of course leave the old project behind to whoever is willing to maintain it. And if the new name project is better, people will follow.

hoopyKt 1 days ago [-]

A distinction should be made between ownership of the code and its copyright and ownership of the repository and associated distribution channels. As far as I know, there's no precedent to state that owning the former means ownership of the latter. The original author abandoned this project years ago and likely has no legal basis to maintain that the project itself stays LGPL, only that their code and derivatives of it stay LGPL. Unless it could be proven otherwise, a rewrite of the project under a different license made without directly referencing the original is likely well within the rights of the owners of the repository to do.

amtamt 24 hours ago [-]

Say a community builds a hall with an explicit intent of community use only, led by single person or a group, and then a person/ a group is appointed as a caretaker. Caretaker of the community hall decides unilaterally to convert the hall into a business convention center, razing old building to ground and rebuilding, disregarding community wishes, to make hall business friendly. How would you react to this situation? Caretaker has the ownership of the community assets, including the ground on which the hall is standing?

my understanding of the situation is:

Is the caretaker paying from his own pocket to maintain the hall? no

Is the caretaker paying from his own pocket for community usage of the hall? no

Is the caretaker spending time to maintain the community hall? yes

Is caretaker obliged to spend time on community hall? no

Is caretaker free to stop spending time on community hall? yes.

Is caretaker free to raze current hall, build new hall on same ground for new purposes WITH community agreement? YES

Is caretaker free to raze current hall, build new hall on same ground for new purposes WITHOUT community agreement (even if paying all the bill)? NO

Is caretaker free to build another similar hall someplace else? YES

Reasoning of your comment is of someone who is hell bent on staking claims on community resources (like big companies) without having slightest concern of the wishes or well-being of the community. Not sure of the commenter's motive either, given the new account with just two comments, supporting such blatant disregard of basic human decency.

hoopyKt 21 hours ago [-]

I think your metaphor is flawed though, firstly because we're not talking about the maintainer being a caretaker, for all intents and purposes they are the owner of chardet, just not a subset of the IP within, those are two separate entities here. Secondly, the original author doesn't have any ties to this project within the last decade, to imply that they're paying for it or have any ownership over how the project is operated beyond the scope of the license is just wrong.

If you'd want to correct the metaphor, this is a more accurate understanding of the situation:

Is the maintainer obligated to the terms of the original license? yes

Does the original IP holder have any rights beyond that license? no

Is the maintainer free to raze the current hall, as long as the IP-holder's property is appropriately removed first? YES

Now if it were to come out that ownership of the chardet name, pip package, or github organization were transferred to the maintainer under an agreement that the project always stay LGPL regardless of the actual terms of the license that's a whole other thing, but nobody has stated that is the case. The only contention is whether LGPL was violated by the rewrite under a new license, but if that's not the case it is entirely the prerogative of the project maintainers to do as they wish.

That's what free software is all about.

If the community wants the old "community hall" it still exists, they can still use it and do what they want with it. They have a right to the hall, but the maintainer has a right to the repository and the package name and one does not nullify the other.

hoopyKt 21 hours ago [-]

There is a certain irony here as well that this project was considered for actual community ownership by being added to the standard library, but it was decided that it was ineligible due to the LGPL license. Had this been MIT from the start you'd actually be correct about the community having some kind of ownership over how the project is operated, but that isn't the case here, it's not community owned. It's owned by the maintainer, it's their IP broadly and they can do as they wish within that LGPL license, including removing the LGPL licensed code.

amtamt 20 hours ago [-]

Just one question, why maintainer is hell bent on using existing name and removing LGPL and not create an entirely new project by new name and new license (after all this is completely new code... right)?

First reason would be use the "name recall", and second guess would be to do another rug-pull to re-licence under some other conditions.

> It's owned by the maintainer

This is completely in-correct. GPL and variants (FOSS, not OSS) were meant to make software free of "any ownership".

hoopyKt 20 hours ago [-]

Possibly they're "hell bent" on using the existing name because they've been using that name for their project in their github repository with their pip package that they've been supporting for a decade now. If they chose to make a new name, new package, new repo the current one would simply be abandonware. That's no different from what they're doing now, but with the added benefit of including a mechanism to inform their users of the situation.

You're acting like they've just swooped in in the last week to steal this repo out from under the community, when you have to go back to 2024 to see another person's name in the commits and to 2022 to see another person show up more than once. This one guy has been thanklessly maintaining chardet for years, decided to do a fresh rewrite and decided he'd like to use a different license now that the opportunity is here.

> GPL and variants (FOSS, not OSS) were meant to make software free of "any ownership".

And you're right! Version 6 and earlier is (functionally but not actually) free of "any ownership"! It still exists in this repo and out in the world! You can still personally fork it and make your own LGPL with a version 7 if that's the world you want to live in! If you don't want to use an OSS project using MIT you still have the community non-ownership of that code!

But you're not upset that the new code is MIT, you're upset that the new MIT code is using that name in pip and GitHub, but pip is MIT and GitHub is proprietary! The parts you're mad about were never LGPL! Because reminder! This code works outside of pip! Git is decentralized, this GitHub repository isn't the source of truth! Your fork of version 6 is just as real and valid as the MIT'd version 7! Your fork of version 6 can still be called chardet and will still work and is still community owned! You never had to use this guy's repo and this guy's pip publication! And nobody is entitled to this guy's pip account or GitHub account just because he rewrote the library, the community is entitled to the software and they still have it. This is all 100% valid under even the strictest FOSS license, much less LGPL.

amtamt 18 hours ago [-]

One would say

this guy's forked MIT'd version 0 will be as real and valid as version 6 of original chardet.

instead of

> Your fork of version 6 is just as real and valid as the MIT'd version 7!

Supporting for a decade is not a basis for unilateral takeover. In last 3 months there seem to be at least 3 other active contributors, any many dozens in past, who share the copyright on parts (ownership)

> nobody is entitled to this guy's pip account or GitHub account just because he rewrote the library

this guy's also not entitled to takeover what is communal, exactly in the same manner.

Someone answered it long ago in quite some details. Feel free to have a look: https://stackoverflow.com/a/11455485

hoopyKt 18 hours ago [-]

You're ignoring the part where the maintainer demonstrated that version 7 isn't a relicense of LGPL work but a complete rewrite based on public domain research and algorithms. That stack overflow article is irrelevant to this situation, and again, versions 6 and before have not been "taken over" they still exist exactly as they did a week ago and are still available to anyone that wants them in full license compliance. LGPL requires the source be made available, not that the source's distribution channels never be used for something else.

amtamt 16 hours ago [-]

Then we are back to... if this is a complete re-write, why not a new name for new code?

I guess it's futile to argue in this circular logic when full picture is not considered and argument are being put forward only for the sake of winning argument. Have a good $time_of_day.

kreco 2 days ago [-]

A comment from 2021:

> Unfortunately, because the code that chardet was originally based on was LGPL, we don't really have a way to relicense it. Believe me, if we could, I would. There was talk of chardet being added to the standard library, and that was deemed impossible because of being unable to change the license.

So the person that did the rewrite knew this was a dive into dangerous water. That's so disrespectful.

Pannoniae 2 days ago [-]

[flagged]

ks2048 2 days ago [-]

Why didn't he just start a new project?

scosman 2 days ago [-]

Sounds like they didn’t build a proper clean room setup: the agent writing the code could see the original code.

Question: if they had built one using AI teams in both “rooms”, one writing a spec the other implementing, would that be fine? You’d need to verify spec doesn’t include source code, but that’s easy enough.

It seems to mostly follow the IBM-era precedent. However, since the model probably had the original code in its training data, maybe not? Maybe valid for closed source project but not open-source? Interesting question.

swiftcoder 2 days ago [-]

> Sounds like they didn’t build a proper clean room setup: the agent writing the code could see the original code.

It doesn't matter how they structure the agents. Since chardet is in the LLM training set, you can't claim any AI implementation thereof is clean room.

scosman 2 days ago [-]

Yeah I mention that in the question.

Might still be valid for closed source projects (probably is).

I think courts would need to weigh in on the open source side. There’s legal precedent is that you can use a derived work to generate a new unique work (the spec derived for the copyrighted code is very much a derived work). There are rulings that LLMs are transformative works, not just copies of training data.

LLMs can’t reproduce their entire training set. But this thinking is also ripe for misuse. I could always train or fine-tune a model on the original work so that it can reproduce the original. We quickly get into statistical arguments here.

It’s a really interesting question.

jacquesm 2 days ago [-]

I just wrote a long comment about that, but yes, you are on to something here.

The key to me is that the LLM itself is a derived work and that by definition it can not produce something original. Which in turn would make profiting off such a derived work created by an automated process from copyrighted works a case of wholesale copyright infringement. If you can get a judge to agree on that I predict the price of RAM will come down again.

swiftcoder 2 days ago [-]

> There’s legal precedent is that you can use a derived work to generate a new unique work (the spec derived for the copyrighted code is very much a derived work)

Indeed, but in the clean room scenario, the party who implements the spec has to be a separate entity that has never seen the code. Whether or not the LLM is copyright infringing is a separate question - it definitely has (at least some) familiarity with the code in question, which makes the "clean room" argument an uphill battle

bsza 2 days ago [-]

So by that logic, you're not legally allowed to implement your own character detector and license it as your own if you've ever looked at chardet's source code? I'm confused. I thought copyright laws protect intellectual property as-is, not the impression it leaves on someone.

jacquesm 2 days ago [-]

Well, you are not making things easier for yourself by looking at that source code if the author of chardet brings a case for copyright infringement against you.

The question is: if you had not looked at chardet's source would you still be able to create your work? If the answer is 'yes' then you probably shouldn't have looked at the source, you just made your defense immeasurably harder. And if the answer is 'no' then you probably should have just used chardet and respected its license.

bsza 2 days ago [-]

Sorry, but that sounds like a witch hunt to me, not modern law. Isn't the burden of proof on the accuser? I.e. the accuser has to prove that "this piece of code right here is a direct refactoring of my code, and here are the trivial and mechanical steps to produce one from the other"? And if they present no such evidence, we can all go home?

jacquesm 2 days ago [-]

No, the burden of proof is on the defender: if you didn't create it you are not the copyright holder.

Copyright is automatic for a reason, the simple act of creation is technically enough to establish copyright. But that mechanism means that if your claimed creation has an uncanny resemblance to an earlier, published creation or an unpublished earlier creation that you had access to that you are going to be in trouble when the real copyright holder is coming to call.

In short: just don't. Write your own stuff if you plan on passing it off as your own.

The accuser just needs to establish precedence.

So if you by your lonesome have never listened to the radio and tomorrow morning wake up and 'Billy Jean' springs from your brain you're going to get sued, even if the MJ estate won't be able to prove how you did it.

bsza 2 days ago [-]

That much I understand, but that question only comes up when the similarity is already an established fact, no? If we take the claim that this is a "complete rewrite" at face value, then there should be no reason for the code to have any uncanny similarities with chardet 6 beyond what is expectable from their functionality (which is not copyrightable) being the same, right?

So my (perhaps naive) understanding is if none can be found, then the author of chardet 1-6 simply doesn't have a case here, and we don't get to the point of asking "have you been exposed to the code?".

jacquesm 2 days ago [-]

No, they're on the record as this being a derived work. There is no argument here at all. Not finding proof in a copyright case when the author is on the record about the infringement is a complete non-issue.

You'd have to make that claim absent any proof and then there better not be any gross similarities between the two bodies of code that can not be explained away by coincidence.

And then there is such a thing as discovery. I've been party to a case like this and won because of some silly little details (mostly: identical typos) and another that was just a couple of lines of identical JavaScript (with all of the variable names changed). Copyright cases against large entities are much harder to win because they have deeper pockets but against smaller parties that are clearly infringing it is much easier.

When you're talking about documented protocols or interface specifications then it is a different thing, those have various exceptions and those vary from one jurisdiction to another.

What can help bolster the case for the defense is for instance accurate record keeping, who contributed what parts, sworn depositions by those individuals that they have come up with these parts by their lonesome, a delivery pace matching that which you would expect from that particular employee without any suspicious outliers in terms of amount of code dropped per interval and so on. Code copied from online sources being properly annotated with a reference to the source also helps, because if you don't do that then it's going to look like you have no problem putting your own copyright on someone else's code.

If it is real, then it is fairly easy to document that it is real. If it is not, after discovery has run its course it is usually fairly easy to prove that it is not if it is not.

swiftcoder 2 days ago [-]

> when the similarity is already an established fact

The similarity is an established fact - the authors claim that this is chardet, to the extent that they are even using the chardet name!

Had they written a similar tool with a different name, and placed it in its own repo, we might be having a very different discussion.

nz 2 days ago [-]

Not all legal systems put the burden of proof on the accuser. In fact, many legal systems have indefinite detentions, in which the government effectively imprisons a suspect, sometimes for months at a time. To take it a step further, the plea-bargain system of the USA, is really just a method to skip the entire legal process. After all, proving guilt is expensive, so why not just strong-arm a suspect into confessing? It also has the benefit of holding someone responsible for an injustice, even if the actual perpetrator cannot be found. By my personal standards, this is a corrupt system, but by the standards of the legal stratum of society, those KPIs look _solid_.

By contrast, in Germany (IIRC), false confessions are _illegal_, meaning that objective evidence is required.

Many legal systems follow the principle of "innocent until proven guilty", but also have many "escape hatches" that let them side-step the actual process that is supposed to guarantee that ideal principle.

EDIT: And that is just modern society. Past societies have had trial by ordeal and trial by combat, neither of which has anything to do with proof and evidence. Many such archaic proof procedures survive in modern legal systems, in a modernized and bureaucratized way. In some sense, modern trials are a test of who has the more expensive attorney (as opposed to who has a more skilled champion or combatant).

pocksuppet 2 days ago [-]

This is a balance of probabilities standard of proof. Both sides have the same burden of proof, it's equally split. Whoever has the stronger proof wins.

swiftcoder 2 days ago [-]

> if you've ever looked at chardet's source code

If you wish to be able to claim in court that it is a "clean room" implementation, yes.

Clean room implementations are specifically where a company firewalls the implementing team off from any knowledge of the original implementation, in order to be able to swear in court that their implementation does not make any use of the original code (which they are in such a case likely not licensed to use).

zozbot234 2 days ago [-]

This seems right to me. If you ask a LLM to derive a spec that has no expressive element of the original code (a clean-room human team can carefully verify this), and then ask another instance of the LLM (with fresh context) to write out code from the spec, how is that different from a "clean room" rewrite? The agent that writes the new code only ever sees the spec, and by assumption (the assumption that's made in all clean room rewrites) the spec is purely factual with all copyrightable expression having been distilled out. But the "deriving the spec (and verifying that it's as clean as possible)" is crucial and cannot be skipped!

sigseg1v 2 days ago [-]

How would a team verify this for any current model? They would have to observe and control all training data. In practice, any currently available model that is good enough to perform this task likely fails the clean room criteria due to having a copy of the source code of the project it wants to rewrite. At that point it's basically an expensive lossy copy paste.

zozbot234 2 days ago [-]

You can always verify the output. Unless the problem being solved really is exceedingly specific and non-trivial, it's at least unlikely that the AI will rip off recognizable expression from the original work. The work may be part of the training but so are many millions of completely unrelated works, so any "family resemblance" would have to be there for very specific reasons about what's being implemented.

oytis 2 days ago [-]

It requires the original project to not be in the training data for the model for it to be a clean room rewrite

zozbot234 2 days ago [-]

That only matters if expression of the original project really does end up in the rewrite, doesn't it? This can be checked for (by the team with access to the code) and it's also quite unlikely at least. It's not trivial at all to have an LLM replicate their training verbatim: even when feasible (the Harry Potter case, a work that's going to be massively overweighted in training due to its popularity) it takes very specific prompting and hinting.

oytis 2 days ago [-]

> That only matters if expression of the original project really does end up in the rewrite, doesn't it?

No, I don't think so. I hate comparing LLMs with humans, but for a human being familiar with the original code might disqualify them from writing a differently-licensed version.

Anyway, LLMs are not human, so as many courts confirmed, their output is not copyrightable at all, under any license.

toyg 2 days ago [-]

Uh, this is just a curiosity, but do you have a reference for that last argument?

If true, it would mean most commercial code being developed today, since it's increasingly AI-generated, would actually be copyright-free. I don't think most Western courts would uphold that position.

duskdozer 2 days ago [-]

https://news.ycombinator.com/item?id=47232289

pseudalopex 2 days ago [-]

The headline was misleading. The courts avoided to decide what Thaler could have copyrighted because he said he was not the author.

vkou 2 days ago [-]

> That only matters if expression of the original project really does end up in the rewrite, doesn't it?

If that were the case, nobody would bother with clean-room rewrites.

nneonneo 2 days ago [-]

Somewhat annoyingly, there's been research that suggests that models can pass information to each other via (effectively) steganographic techniques - specific but apparently harmless choices of tokens, wordings, and so on; see https://arxiv.org/abs/1712.02950 and https://alignment.anthropic.com/2025/subliminal-learning/ for some simple examples.

While it feels unlikely that a simple "write this spec from this code" + "write this code from this spec" loop would actually trigger this kind of hiding behaviour, an LLM trained to accurately reproduce code from such a loop definitely would be capable of hiding code details within the spec - and you can't reasonably prove that the frontier LLMs have not been trained to do so.

actionfromafar 2 days ago [-]

Yeah I think, the Compaq / IBM precedent can only superficially apply. It would be like having two teams only meet in a room full of documentation - but both teams crammed the source code the day before. (That, the source code you are "reverse engineering" is in the training data.) It doesn't make sense.

Also, it's weird that it's okay apparently to use pirated materials to teach an LLM, but maybe not to disseminate what the LLM then tells you.

duskdozer 2 days ago [-]

Not if the codebase was included in training the implementer.

fergie 2 days ago [-]

Answer: probably not, as API-topography is also a part of copyright

Edit: this is wrong

Tiberium 2 days ago [-]

Didn't the Google - Oracle case about Java APIs in Android https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_.... directly disprove this?

looperhacks 2 days ago [-]

In the end, the supreme court case decided that the re-implementation fell under fair use, it did not answer the copyright question.

scosman 2 days ago [-]

The courts decided that wasn’t true for IBM, Java and many other cases. API typography describes functionality, which isn’t copyrightable (IANAL).

Keyframe 2 days ago [-]

Wasn't Oracle vs Google about all of that?

mytailorisrich 2 days ago [-]

> Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation).

I don't think that the second sentence is a valid claim per se, it depends on what this "rewritten code" actually looks like (IANAL).

Edit: my understanding of "clean room implementation" is that it is a good defence to a copyright infrigement claim because there cannot be infringement if you don't know the original work. However it does not mean that NOT "clean room implementation" implies infrigement, it's just that it is potentially harder to defend against a claim if the original work was known.

bo1024 2 days ago [-]

I agree that (while the ethics of this are a different issue) the copyright question is not obviously clear-cut. Though IANAL.

As the LGPL says:

> A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".)

Is v7.0.0 a [derivative work](https://en.wikipedia.org/wiki/Derivative_work)? It seems to depend on the details of the source code (implementing the same API is not copyright infringement).

jerven 2 days ago [-]

I was wondering how the existing case law of translated works, from one language to an other works here. It would at suggest that this is an infringement of the license especially because of the lack of creativity. But IANAL and of course no idea of applicable case law.

_ache_ 2 days ago [-]

"Exposure" means here, I think, that they feed the 6.X code version to Claude.

Radle 2 days ago [-]

the ai copy pasted the existing project. How can such a procedure not fall under copyright?

Especially now that ai can do this for any kind of intellectual property, like images, books or sourcecode. If judges would allow an ai rewrite to count as an original creation, copyright as we know it completely ends world wide.

Instead whats more likely is that no one is gonna buy that shit

charcircuit 2 days ago [-]

>the ai copy pasted the existing project.

The change log says the implementation is completely different, not a copy paste. Is that wrong?

>Internal architecture is completely different (probers replaced by pipeline stages). Only the public API is preserved.

fzeroracer 2 days ago [-]

It's up to them to prove that a) the original implementation was not part of whatever data set said AI used and b) that the engineers in question did not use the original as a basis.

charcircuit 2 days ago [-]

It's up to the accuser to prove that they copied it and did not actually write it from scratch as they claimed.

fzeroracer 2 days ago [-]

No, that's not how copyright laws work. Especially in a world where the starting point is the accused making something and marketing it as someone else's IP with a license change.

Ukv 2 days ago [-]

It's still on the claimant to establish copying, which usually involves showing that the two works are substantially similar in protected elements. That the defendants had access to the original helps establish copying, but isn't on its own sufficient.

Only after that would the burden be on the defendants, such as to give a defense that their usage is sufficiently transformative to qualify as fair use.

spacedcowboy 2 days ago [-]

I came here to say this. While I agree with Mark that what they’re doing is not nice, I’m not sure it’s wrong. A clean-room implementation is one way the industry worked around licensing in the past (and present, I guess), but it’s not a requirement in law as far as I know.

I’m not sure that “a total rewrite” wouldn’t, in fact, pass muster - depending on how much of a rewrite it was of course. The ‘clean room’ approach was just invented as a plausible-sounding story to head off gratuitous lawsuits. This doesn’t look as defensible against the threat of a lawsuit, but it doesn’t mean it wouldn’t win that lawsuit (I’m not saying it would, I haven’t read or compared the code vs its original). Google copied the entire API of the Java language, and got away with it when Oracle sued. Things in a courtroom can often go in surprising ways…

[edit: negative votes, huh, that’s a first for a while… looks like Reddit/Slashdot-style “downvote if you don’t like what is being said” is alive and well on HN]

duskdozer 2 days ago [-]

I spent like two minutes looking at the diff between the original and the supposed "clean room" implementation [1] and already found identical classes, variable names, methods, and parameters. It looks like there was no actual attempt at clean-rooming this, regardless of whether that "counts".

[1]https://github.com/chardet/chardet/compare/6.0.0.post1...7.0...

toyg 2 days ago [-]

Lol at the statement that "clean room" would have been invented to scare people from suing. It's the opposite: clean room is a fairly-desperate attempt to pre-empt accusations in court when it is expected that the "derivative" argument will be very strong, in order to then piggyback on the doctrine about interoperability. Sometimes it works, but it's a very high bar to clear.

actionfromafar 2 days ago [-]

I thought we were debating if it was legal, not if it's wrong. The law is about creativity. Was this creative or a more mechanical translation?

klustregrif 2 days ago [-]

It will hold up in court. The line of argument of “well I went into a dark room with only the first Harry Potter book and a type writer and reproduced the entire work, so now I own the rewrite” doesn’t hold up in court, it doesn’t either when when you put AI in the mix. It doesn’t matter if the result is slightly different, a judge will rule based on the fact that this even is literally what the law is intended to prevent, it’s not a case of which incantation or secret sentence you should utter to free the work of its existing license.

mytailorisrich 2 days ago [-]

> “well I went into a dark room with only the first Harry Potter book and a type writer and reproduced the entire work, so now I own the rewrite”

This is not a good analogy.

A "rewrite" in context here is not a reproduction of the original work but a different work that is functionally equivalent, or at least that is the claim.

IanCal 2 days ago [-]

Possibly important is that it’s largely api compatible but it’s not functionally equivalent in that its performance (as accuracy not just speed) is different.

2 days ago [-]

red_admiral 2 days ago [-]

To stay with the analogy, Harry Potter is not a rewrite of A Wizard of Earthsea, even if they both contain schools that teach magic.

QuadmasterXLII 2 days ago [-]

“Mr Teacher, how many words do I have to change after copy pasting wikipedia so its not plagiarism?” has grown up and entered the workforce.

Pin your dependency versions people! With hashes at this point, cant trust anybody out here.

jmyeet 2 days ago [-]

There's a subtext in your point that I want to expand on.

Tech people, particularly engineers, tend to make a fundamental error when dealing with the law that almost always causes them to make wrong conclusions. And that error is that they look for technical compliance when so much of the law is subjective and holistic.

An example I like to use is people who do something illegal on the Internet and then use the argument "you can't prove I did it (with absolute certainty)". It could've been someone who hacked your Wifi. You don't know who on the Wifi did it, etc. But the law will look at the totality of the evidence. Did the activity occur when you were at home and stop when you weren't? How likely are alternative explanations? Etc.

All of that will be considered based on some legal standard depending on the venue. In civil court that tends to be "the preponderance of the evidence" (meaning more likely than not) while in criminal court it's "beyond a reasonable doubt" (which is a much higher standard).

So, using your example, an engineer will often fall into a trap of thinking they can substitute enough words to have a new original work, Ship of Theseus-like. And the law simply doesn't work that way.

So, when this gets to a court (which it will, it's not a question of "if"), the court will consider how necessary the source work was to what you did. If you used it for a direct translation (eg from C++ to Go) then you're going to lose. My prediction is that even using it in training data will be cause for a copyright claim.

If you use Moby Dick in your training data and ask an LLM to write a book like Moby Dick (either explicitly or implicitly) then you're going to have an issue. Even if you split responsibilities so one LLM (training on Moby Dick) comes up with a structure/prompt and another LLM (not trained on Moby Dick) writes it, I don't think that'll really help you avoid the issue.

sumtechguy 2 days ago [-]

> So, when this gets to a court (which it will, it's not a question of "if"), the court will consider how necessary the source work was to what you did. If you used it for a direct translation (eg from C++ to Go) then you're going to lose. My prediction is that even using it in training data will be cause for a copyright claim.

This has a lot of similarity to when colorization of film started popping up. Did colorizing black and white movies suddenly change the copyright of the film? At this point is seems mostly the courts say no. But you may find sometimes people rule the other way and say yes. But it takes time and a lot of effort to get what in general people want.

But basically if you start with a 'spec' then make something you probably can get a wholly owned new thing. But if you start with the old thing and just transform it in some way. You can do that. But the original copyright holders still have rights too to the thing you mangled too.

If I remember right they called it 'color of copyright' or something like that.

The LLM bits you are probably right. But that has not been worked out by the law or the courts yet. So the courts may make up new case law around it. Or the lawmakers might get ahead of it and say something (unlikely).

pas 1 hours ago [-]

it's going to be fun when the same LLM output in one jurisdiction will be a new original work whereas in a different one it will be a derivative one

(This is/was already the case with software patents for US and EU, right?)

pocksuppet 2 days ago [-]

A work could even have two copyrights! Copying a colorized film could require the permission of both the studio that made the film, and the studio that colorized it.

npongratz 2 days ago [-]

> And that error is that they look for technical compliance when so much of the law is subjective and holistic.

I know it sounds like an oversimplification, but "got off on a technicality" is a common thing among the well-connected and well-heeled. Sure, us nerds probably focus too much on the "technicality" part, since we are by definition technical, but the rest is wishy-washy, unfair BS as far as many of our brains work much of the time.

jmyeet 2 days ago [-]

"Get off on a technicality" is largely police propaganda. The "technicality" tends to be the police violated their rights in some way or did something illegal.

And if you get to trial (without being coerced into a guilty plea for something you may not have done [1]), the cops will lie constantly in police reports and even on the stand. It happens so often it has a name: testalying [2].

The well-connected don't really get off on a "technicality". They simply never get charged in the first place. Only two people were ever charged because of Jeffrey Epstein. One is Epstein, who died, and the other is Ghislane Maxwell who got convicted of trafficking minors to... nobody in particular... and is now in a low-security work camp it's technically illegal for sex offenders to be in.

And even if somehow you, as a connected person, are charged and convicted, well you just buy a pardon [3].

[1]: https://www.vera.org/news/how-the-criminal-legal-system-coer...

[2]: https://www.chicagoappleseed.org/2020/11/09/testilying/

[3]: https://www.propublica.org/article/trump-pardons-clemency-ge...

starkparker 2 days ago [-]

Everyone armchair debating the licensing side of this frustrates discussion, because the only licensing discussion that matters is the one in front of a court. Until and unless one happens, this is just a boring hobby.

Otherwise all this rewrite accomplishes is a 2.3% accuracy improvement and some performance gains that might not be relevant in production, in exchange for a broken test suite, breaking changes, and unnecessary legal and ethical risks pushed out as an update to what was already a stable project.

If it's truly a sufficiently separate project that it can be relicensed from LGPL, then it could've just been _a fully separate project with a new identity_, and the license change would've been at least harder to challenge. Instead, we're here.

p0w3n3d 2 days ago [-]

Wow that's hot. I was not aware that you need to be "untainted" by the original LGPL code. This could mean that...

All AI generated code is tainted with GPL/LGPL because the LLMs might have been taught with it

wongarsu 2 days ago [-]

Being completely untainted is the standard many reimplementations set for themselves to completely rule out legal trouble. For example ReactOS won't let you contribute if you have ever seen Windows code. Because if you have never seen it, there can be no allegation that you copied it.

That is however stricter than what's actually legally necessary. It's just that the actual legal standard would require a court ruling to determine if you passed it, and everyone wants to avoid that. As a consequence there also aren't a lot of court cases to draw similarities to

indrora 2 days ago [-]

> For example ReactOS won't let you contribute if you have ever seen Windows code. Because if you have never seen it, there can be no allegation that you copied it.

I've heard this called in some circles "The curse of knowledge." The same thing applies to emulator developers, especially N64 developers (and now Nintendo emulator developers in general) after the Oman Archive and later Gigaleaks. There's an informal "If you read this, you can NEVER directly contribute to the development of that emulator, ever."

This comes to a head when a relatively unknown developer starts contributing oddly specific patches to an emulator.

p_l 2 days ago [-]

"Taint" requires that the code is demonstratably derivative from the *GPL licensed work.

This is actually harder standard than some people think.

The absolute clean room approaches in USA are there because they help short circuit a long lawsuit where a bigger corp can drag forever until you're broken.

SpicyLemonZest 2 days ago [-]

It's harder than some people think, but the author does a lot of the work when he names the resulting artifact "chardet v7.0.0". If I thought I was writing the kind of arms-length reimplementation that's required, I would never put it into the versioning scheme of the original, come on.

pas 60 minutes ago [-]

it can be API compatible and legally original

not to mention that it's not a complete copy, because it has different behavior (the better performance)

but of course we have to check the code too

actionfromafar 2 days ago [-]

[flagged]

greggoB 2 days ago [-]

Does "lonely" in this case encompass people who've formed relationshios with said LLMs?

orwin 2 days ago [-]

I'm not lonely! And I stopped shouting that since 24, because you know :/

allreduce 2 days ago [-]

Not a lawyer, but that always seemed naively correct to me.

However, the copyright system has always be a sham to protect US capital interests. So I would be very surprised if this is actually ruled/enforced. And in any case american legislators can just change the law.

kermatt 2 days ago [-]

On a side note, it is interesting to see Mark Pilgrim rise from the "dead": https://en.wikipedia.org/wiki/Mark_Pilgrim#%22Disappearance%...

His Python books, although a bit dated, are something I still recommend to new Python programmers.

darkwater 2 days ago [-]

Let's see how much time it takes for the Wikipedia page to mention the "a2mark" GitHub user :)

(I can hear a "challenge accepted" from some random HNer already)

tricorn 19 hours ago [-]

A rewrite based on functional equivalency is not infringing on the copyright as long as no creative expression was copied. That was the heart of the Google case, whether the API itself was creative expression or functionality.

There are many aspects to what can be considered creative expression, including names, organization, non-functional aspects. An algorithm would not be protected expression. If an AI can write it without reference to the original source code, using only documented behavior, then it would not be infringing (proving that it didn't copy anything from training data might be tough though). It also would not itself be copyrightable, except for elements that could be traced back as "authorship" to the humans who worked with the AI.

If LLMs can create GOOD software based only on functionality, not by copying expression, then they could reproduce every piece of GPL software and release it as Public Domain (which it would have to be if no human has any authorship in it). By the same principle that the GPL software wasn't infringing on the programs they copied functionality from, neither would the AI software. That's a big IF at this point, though, the part about producing GOOD software without copying.

hu3 2 days ago [-]

I torn on where the line should be drawn.

If the code is different but API compatible, Google Java vs Oracle Java case shows that if the implementation is different enough, it can be considered a new implementation. Clean room or not.

spoiler 2 days ago [-]

That whole clean room argument makes no sense. Project changed governance and was significantly refactored or reimplemented... I think the maintainers deserve to call it their own. Original-pre MIT release can stay LGPL.

I don't think this is a precedent either, plenty of projects changed licenses lol.

I keep kind mixing them up but the GPL licenses keep popping up as occasionally horror stories. Maybe the license is just poorly written for today's standards?

shaan7 2 days ago [-]

> plenty of projects changed licenses lol.

They usually did that with approval from existing license holders (except when they didn't, those were the bad cases for sure).

DarkmSparks 2 days ago [-]

No. Because they couldnt have done any of that refactoring without a licence to do so, and that licence forbids them from relicencing it.

spoiler 2 days ago [-]

Ok since this is not really answered... Hypothetically, If I'm a maintainer of this project. I decided I hate the implementation, it's naive, horrible performance, weird edge cases. I'm wiser today than 3 years ago.

I rewrite it, my head full of my own, original, new ideas. The results turn out great. There's a few if and while loops that look the same, and some public interfaces stayed the same. But all the guts are brand new, shiny, my own.

Do I have no rights to this code?

DarkmSparks 2 days ago [-]

You have all rights to the code that you wrote that is not "colored" by previous code. Aka "an original work"

But code that is any kind of derivative of code before it contains a complex mix of other peoples rights. It can be relicensed, but only if all authors large and small agree to the terms.

pocksuppet 2 days ago [-]

You have rights, but if it's a derivative, the original author might have rights too. If you made a substantial creative input, the original author can't copy your project without your permission, but neither can you copy theirs.

IanCal 2 days ago [-]

Hmm are we in a ship of Theseus/speciation area? Each individual step of refactoring would not cross the threshold but would a rewrite? Even if the end result was the same?

spoiler 2 days ago [-]

Let us also remember that certain architectural changes need to happen over a period of planned refractors. Nobody wants to read a 5000 line shotgun-blast looking diff

spoiler 2 days ago [-]

So effective, LGPL means you freely give all copyright for your work to the license holder? Even if the license holder has moved on from the project?

What if I decide to make a JS or Rust implementation of this project and use it as inspiration? Does that mean I'm no longer doing a "clean room" implementation and my project is contaminated by LGPL too?

justinclift 2 days ago [-]

The standard way of "relicensing" a project is to contact all of the prior code contributors about it and get their ok.

Generally relicensing is done in good faith for a good reason, so pretty much everyone ok's it.

Trickiness can turn up when code contributors aren't contactable (ie dead, missing, etc), and I'm unsure of the legally sound approach to that.

toyg 2 days ago [-]

The legally-sound approach is to keep track of your actions, so you can later prove you've made "reasonable" efforts to contact them.

Meneth 2 days ago [-]

If a copyright holder does not give you permission, you can't legally relicense. Even if they're dead.

If they're dead and their estate doesn't care, you might pirate it without getting sued, but any recipient of the new work would be just as liable as you are, and they'd know that, so I probably wouldn't risk it.

2 days ago [-]

user34283 2 days ago [-]

Afaik you can do whatever you like to GPL licensed code, you do not need a license to refactor it.

I understand you need to publish the source code of your modifications, if you distribute them outside of your company.

skeledrew 2 days ago [-]

You can do anything except change the license, which ensures that right to do anything passes on to others in perpetuity. That's how it's designed.

duskdozer 2 days ago [-]

You also can't relicense it to be less restrictive

pocksuppet 2 days ago [-]

Or more restrictive! There are certain exceptions permitting combinations of open-source code however.

scosman 2 days ago [-]

Governance change or refactoring don’t give you a right to relicense someone else’s work. It needs to be a whole new work, which you own the copyright to.

spoiler 2 days ago [-]

Which is what happened here? The maintainers did a rewrite, apparently, but it's not enough!

duskdozer 2 days ago [-]

No, that defeats the entire purpose of GPL licenses

darkwater 2 days ago [-]

It's not clear at all why the current maintainers wanted/needed this re-licensing. I guess that their employee, Monarch Money, wants to use derivative work in their application without releasing the changes? It was already LGPL, perfect for a library, not GPL.

pseudalopex 2 days ago [-]

Python wouldn't take LGPL code in the standard library. And Dan Blanchard imagined more people would want to work on it.[1]

[1] https://github.com/chardet/chardet/issues/327#issuecomment-4...

darkwater 2 days ago [-]

That take and the link within it to https://github.com/chardet/chardet/issues/36 which is from 2014 (!!) and revived in 2021 (!) make me think now that Dan Blanchard is in good faith, but still acted in a naive manner. Rewriting from "scratch" with Claude Code and basically taking over the project by changing the license, all in one single pull request, well... it's not going to end well on the public relations side.

tokai 2 days ago [-]

"I prefer MIT/BSD licenses just because they're simpler"[0]

Seems like there is no real point, just vibes.

[0] https://github.com/chardet/chardet/issues/36

binaryturtle 2 days ago [-]

Isn't the real issue here that tons of projects that depend on the "chardet" now drag in some crappy still unverified AI slop? AI forgery poisoning, IMHO.

Why does this new project here needed to replace the original like that in this dishonourable way? The proper way would have been to create a proper new project.

Note: even Python's own pip drags this in as dependency it seems (hopefully they'll stick to a proper version)

robinsonb5 2 days ago [-]

This indeed the real issue (not the AI angle per se, but the wholesale replacement. The licensing issue is real, but less important IMO).

Half a million lines of code have been deleted and replaced over the course of four days, directly to the main branch with no opportunity for community review and testing. (I've no idea whether depending projects use main or the stable branch, but stable is nearly 4 years old at this point, so while I hope it's the version depending projects use, I wouldn't put money on it.)

The whole thing smells a lot like a supply chain attack - and even if it's in good faith, that's one hell of a lot of code to be reviewed in order to make sure.

duskdozer 2 days ago [-]

The test coverage is going to be entirely different, unless of course they copied the tests, which would then preclude them from changing the license. They didn't even bother to make sure the CI passed on merging a major version release https://github.com/chardet/chardet/actions/runs/22563903687/...

earthscienceman 2 days ago [-]

Woah. As someone not in this particular community but dependent on these tools this is exactly the terrifying underbelly we've all discussed with the user architecture of tools like pip and npm. It's horrifying that a major component just got torn apart, rebuilt, and deployed to anyone who uses those python ecosystems (... many millions? ... billions of people?)

adrian17 2 days ago [-]

The drop"-in" compatibility claims are also just wrong? I ran it on the old test suite from 6.0 (which is completely absent now), and quickly checking:

- the outputs, even if correctly deduced, are often incompatible: "utf-16be" turns into "utf-16-be", "UTF-16" turns into "utf-16-le" etc. FWIW, the old version appears to have been a bit of a mess (having had "UTF-16", "utf-16be" and "utf-16le" among its outputs) but I still wouldn't call the new version _compatible_,

- similarly, all `ascii` turn into `Windows-1252`

- sometimes it really does appear more accurate,

- but sometimes it appears to flip between wider families of closely related encodings, like one SHIFT_JIS test (confidence 0.99) turns into cp932 (confidence 0.34), or the whole family of tests that were determined as gb18030 (chinese) are now sometimes determined as gb2312 (the older subset of gb18030), and one even as cp1006, which AFAIK is just wrong.

As for performance claims, they appear not entirely false - analyzing all files took 20s, versus 150s with v6.0. However, looks like the library sometimes takes 2s to lazy initialize something, which means that if one uses `chardetect` CLI instead of Python API, you'll pay this cost each time and get several times slower instead.

Oh, and this "Negligible import memory (96 B)" is just silly and obviously wrong.

duskdozer 2 days ago [-]

Yeah, there's really low quality code added if you take a look.

Ardren 2 days ago [-]

Huh, 7e25bf4 was a big commit.

  2,305 files changed
  +0 -546871 lines changed

https://github.com/chardet/chardet/commit/7e25bf40bb4ae68848...

vintagedave 2 days ago [-]

Impossible to code inspect and verify. Someone else commented 'smells like a supply chain attack' and while I'm sure it's not intended to be, there is no way to verify. And who believes, in half a million lines of code, no bugs have been introduced?

duckerude 2 days ago [-]

Perhaps notable: years ago the original original chardet was rewritten with a different license: https://github.com/hsivonen/chardetng

AFAIK this was not a clean room reimplementation. But since it was rewritten by hand, into a different language, with not just a different internal design but a different API, I could easily buy that chardetng doesn't infringe while Python chardet 7 does.

raggi 2 days ago [-]

Look, forget the details, step back and consider the implications of the principle.

Someone should not be able to write a semi-common core utility, provide it as a public good, abandon it for over a decade, and yet continue to hold the rest of the world hostage just because of provenance. That’s a trap and it’s not in any public interest.

The true value of these things only comes from use. The extreme positions for ideals might be nice at times, but for example we still don’t have public access to printer firmware. Most of this ideology has failed in key originating goals and continues to cause headaches.

If we’re going to share, share. If you don’t want to share, don’t. But let’s not setup terminal traps, no one benefits from that.

If we flip this back around though, shouldn’t this all be MPL and Netscape communications? (Edit: turns out they had an argument about that in the past on their own issue tracker: https://github.com/chardet/chardet/issues/36)

luke5441 2 days ago [-]

LGPL means "gift to the world". The license ensures that any modification/improvements stay a gift to the world.

People not being okay with having to share their improvements not being able to use the software is by design.

I don't get how you get from there to some sinister hostage taking situation.

Also everyone that contributes to the previous LGPL verison probably contributed under LGPL only, so it is now just one guy...

raggi 2 days ago [-]

LGPL applies to the LGPL’d code, not to every piece of code someone might add to the repository or under the same name implicitly.

The claim being made is that because some prior implementation was licensed one way, all other implementations must also be licensed as such.

AIUI the code has provenance in Netscape, prior to the chardet library, and the Netscape code has provenance in academic literature.

Now the question of what constitutes a rewrite is complex, and maybe somewhat more complex with the AI involvement, but if we take the current maintainers story as honest they almost certainly passed the bar of independence for the code.

il-b 2 days ago [-]

Fork it?

raggi 2 days ago [-]

That certainly probably would have reduced the noise.

malklera 2 days ago [-]

You have to look from two sides this Moral: What is right or wrong? If they wanted to change the license, they could have made another project with another name, and nobody would care, but they wanted the reputation of the project.

Legal: How much are you willing to spend on litigation? The only real "protection" by copyright is in court.

hoopyKt 1 days ago [-]

Further, while the copyright of the original code and its derivatives are still owned by the original author does that hold true to the rights to the name and package namespace? Supposing this were indisputably a clean room implementation instead of an unclear one, would the maintainers then have a right to relicense under the same name? I would imagine that yes, they do have the right to relicense in that case because the copyright only applies to the code, not the project itself.

Other questions that haven't really been explored before also are maintained: the original author hasn't been involved in some time, technically the copyright of all code since still belongs to those authors who might be bound by LGPL but are also the only ones with the right to enforce it and could simply choose not to. What then?

oytis 2 days ago [-]

I wonder if LLMs will push the industry towards protecting their IP with patents like the other branches of engineering rather than copyright. If you patent a general idea of how your software works then no rewrite will be able to lift this protection.

skeledrew 2 days ago [-]

General patents aren't allowed.

pocksuppet 2 days ago [-]

If there wasn't a specific limitation against software patents, you could patent the process of "first we look at the first 3 bytes for a BOM, then we take statistics of the most frequently used bytes and match them against this table..." which is the software equivalent of "the device contains a large box full of water, underneath which is a flame and on the upper side is a pipe leading to a valve box which contains..."

Either one would still have to meet the requirements like being sufficiently non-obvious. The first steam engine was patented, even though you couldn't patent one any more.

ajnin 1 days ago [-]

Interestingly the original author ported the code from the Mozilla C++ version which is licensed as "MPL 1.1 or LGPL" (it's a bit unclear as the readme says that but the license file mentions only MPL). So the author did already relicence the project in a way by licensing the port as LGPL only.

Chris2048 1 days ago [-]

> licensing the port as LGPL only

They ported the LGPL version. There's no obligation to port any other, unless "MPL 1.1 or LGPL" is itself some kind of singular licence.

ondrek 1 days ago [-]

With continuing the same repo, name, and reputation built on LGPL code.. you can’t keep the goodwill and drop the contract that created it :-)

jrochkind1 2 days ago [-]

Does anyone understand the intent behind the changed license on the package, why are the current maintainers trying to do it in the first place? What's actually going on?

red_admiral 2 days ago [-]

Unicode detection is the kind of utility the language maintainers want in their package collection if not in the standard library, and programmers who have to do anything with "plain text" files might want to rely on.

Releasing a core library like this under a genuinely free licence (MIT) is a service to anyone working in the ecosystem.

pocksuppet 2 days ago [-]

Maybe. Enabling more GPL software to become proprietary isn't exactly a service.

red_admiral 2 days ago [-]

I think they moved chardet from GPL to MIT? If the maintainer made future versions proprietery, they'd surely be forked and then kicked out of the python package repo?

pocksuppet 2 days ago [-]

That's what MIT does. It lets downstream versions become proprietary.

2 days ago [-]

omnibrain 2 days ago [-]

Nobody that is not already writing (L)GPL licensed software wants to pull LGPL licensed libraries into his software. I'm not a lawyer. If you have it in separate object artefacts that you can dynamically link to, you should be fine. Everything else may be more difficult.

See what FFmpeg writes on this topic: https://ffmpeg.org/legal.html

pseudalopex 2 days ago [-]

https://news.ycombinator.com/item?id=47260749

PaulDavisThe1st 2 days ago [-]

I am confused. In the USA, there has been a clear rule that machine-generated code cannot be copyright. If the "new implementation" was in fact created by Claude (which is my impression), then nobody holds any copyright to the code, and it cannot be licensed under any license at all.

I am sure I am missing something ... what is it?

pas 33 minutes ago [-]

The machine cannot get authorship, but just as images created by humans with PhotoShop and all kinds of machinery are still copyrighted to the human creator - unless they explicitly set up some circumstances where the process of creation happens completely without them - code/software produced by a machine instructed by a human should get copyright (either original or derivative).

Unless the human is so far removed from the output. (And how far is far enough is probably very much depends on the circumstances and unless/until case law or Congress gives us some unifying criteria, it's going to be up to how the judge and the jury feels.)

For example someone set up a system where their dog ends up promoting some AI to make video games. This might be the closest to the case of that photo.

Though there the court ruled only that PETA (as friend of the monkey) cannot sue the photographer, because the monkey cannot be a copyright holder, but very importantly it didn't rule on the authorship of the photographer. (And thus the Wikimedia metadata which states that the image is in thr public domain, is simply their opinion.)

next_xibalba 2 days ago [-]

This is effectively a contract. You can put anything you want in a contract, but contracts are enforceable to only to the extent they comply with the law (statutes, case law, the constitution, etc.)

So to settle this, someone needs to violate this license and get sued. Or maybe proactively sue?

PaulDavisThe1st 2 days ago [-]

Remember: this applies to all LLM-generated code, not just chardet. No LLM-generated code is copyrightable, and thus cannot be licensed. The legal challenge could come in any context where LLMs have been used and the code placed under any license (proprietary or otherwise).

Which is going cause a collision between the "not copyrightable" and "derived from copyrighted work" angles.

nilsbunger 2 days ago [-]

Does using the old version’s tests to create a new version make it a derivative work? That’s certainly some pretty tight coupling.

pocksuppet 2 days ago [-]

My gut feeling is no, a work is not a derivative of a compliance test suite that it passes and was used to guide it. But I'm not a lawyer.

nilsbunger 2 days ago [-]

We’re going to be in a very weird place:

* LLMs make it trivial to recreate almost any software using its test suite (maybe not a derivative work)

* LLM generated code has no copyright (according to current court interpretations)

Soon we will be able to make an unlicensed copy of anything if we have its test suite and a little money for tokens.

joquarky 1 days ago [-]

> We’re going to be in a very weird place

It's only weird because copyright is an unnatural abstraction that requires increasing amounts of legal upkeep to maintain.

And by the standard of promoting the progress of science and the useful arts, it is rapidly becoming obsolete.

pocksuppet 2 days ago [-]

* LLM generated code from a test suite is crap

pmarreck 2 days ago [-]

I have successfully reproduced a few projects with LLM assistance via strict cleanroom rules and only working off public specifications.

b40d-48b2-979e 2 days ago [-]

Once you use a LLM, the room is no longer clean.

pmarreck 2 days ago [-]

Does that hold up if it’s in an entirely different language?

Because I don’t think so

bdangubic 2 days ago [-]

[flagged]

pocksuppet 2 days ago [-]

Human training and AI training are legally distinct processes.

b40d-48b2-979e 2 days ago [-]

No, it isn't. A human wasn't trained on the material they're trying to reproduce.

geenat 2 days ago [-]

FastAPI's underlying library, Starlette, has been going through licensing shenanigans too lately: https://github.com/Kludex/starlette/issues/3042

Be really careful who you give your projects keys to, folks!

Orygin 2 days ago [-]

That doesn't seem related at all, this is just adding attribution, not changing the license through LLM-washing

oxag3n 2 days ago [-]

AI "translation" kills any incentive to open source code.

What's worse - disassembler+AI is good enough to "translate" the binary into working source code, probably in a different programming language than then original.

charcircuit 2 days ago [-]

Clean room implementations are not necessary to avoid copyright infringement.

red_admiral 2 days ago [-]

The law is what a court says it is; there is precedent for decisions on human rewrites but LLM (assisted) code might still be fairly uncharted territory.

kreco 2 days ago [-]

This is not unprecedented, TCC relicensed part of its code by being approved by all authors:

https://repo.or.cz/tinycc.git/blob/3d963aebcd533da278f086a3e...

The interesting part is that the original author is against it but some people claims it could be a rewrite and not a derivative work.

I don't know the legal basis of everything but it's definitly not morally correct toward the original author.

noosphr 2 days ago [-]

If the code is written by an Ai they can't copyright it. It is all public domain.

0sdi 2 days ago [-]

surely this can also be used to turn proprietary software into free

alpaca128 2 days ago [-]

If you can afford to defend yourself in court

nailer 2 days ago [-]

> chardet 7.0 is a ground-up, MIT-licensed rewrite of chardet. Same package name, same public API

Licensing aside, morally you don't rewrite someone else's project with the same package name.

gpm 1 days ago [-]

I've got some serious moral questions about "rewrite your own widely used project from scratch with the same package name"... but I don't think it's fair to call this "someone else's project" when the OP has apparently been the only maintainer working on the project for 13 years...

nailer 1 days ago [-]

Creator is still the creator and retains rights to the name.

byteatwork 1 days ago [-]

maybe we are now heading to the new world where software become 'free goods' like air?

byteatwork 1 days ago [-]

maybe we are heading to a new world where software become "free goods" like air?

2 days ago [-]

tantalor 1 days ago [-]