Why GitHub doesn’t violate free software licences

🔗 forgoodeyesonly.codeberg.page/

GitHub’s new code completion tool Copilot is an AI trained using free and open-source code. Many see this as a copyright infringement of free software licences, but this is dangerous half-knowledge. Read here why this is the case and why stricter law won’t get us anywhere.


TL;DR:

  1. Scraping code simply isn't a copyright infringement.
  2. Copilot outputs are not derivative works.
  3. As an artificial machine, Copilot is not an author in the meaning of copyright.
  4. GitHub doesn’t even claim copyright in the outputs.
  5. The outputs don’t reach the necessary level of creation to be copyright-protected.
  6. The AI's complexity is irrelevant for the protection of the outputs.
  7. GitHub's terms of use override the repo licences.

While Copilot doesn't violate free licences (see ⬆️), there are plenty of reasons to avoid GitHub anyway. Here are a few:

1. Since we're FOSS developers, our tools should be FOSS too.
2. Monopolies are never a good idea.
3. By using a walled garden, we're excluding potential contributors.
4. By using Microsoft products, we're supporting a producer of malware and an […] collaborator.

Instead, we should switch to platforms running @gitea, such as @codeberg. Also, @dachary and others are already working hard on federation in the @forgefriends project.

Read more on the blog: forgoodeyesonly.codeberg.page/

@pixelcodeapps And while you’re switching platforms, maybe re-investigate the choice of git. With Mercurial it becomes quite a bit easier to make decentralized tooling well-integrated. See for example b: foss.heptapod.net/mercurial/b @gitea @codeberg @dachary @forgefriends

@pixelcodeapps Hello, first time finding Pixelcode on Mastodon. Greetings!

@pixelcodeapps I'm sorry, your article is completely wrong.
Why are you supporting the actions of a criminal company?
>Scraping isn’t copyright infringement
Sure, downloading something *publicly available* isn't copyright infringement, but this isn't particularly relevant.
>Copilot outputs are no derivative works
>Yet, this in no way makes newly generated code fragments derived works, as they are entirely new creations; after all, there is usually no intersection between the functionalities of the original and output codes. The training data is usually only used to artificially “understand” the syntax and semantics of the programming language, in order to subsequently be able to create output with a completely new task.
This is complete garbage. The "AI" doesn't *learn* from the source, it only *encodes* source code in a different form.
You can quite easily get co-pilot to output GPL'd code *verbatim* with the wrong license on top, as at least one user has found.
If it really learned from such code, rather than encode it in a roundabout way, the above would *never* happen.
If what co-pilot does is not creating a derivative work, then compilers don't create derivative works when you use them to compile source code.
>Copilot is not an author
Correct, but users of whatever co-pilot outputs are responsible for any copyright infringement they commit with it.
>Copilot outputs don’t reach the necessary level of creation
Sure, outputs are usually short, but co-pilot is happy to give you longer outputs.
Short size is not the sole indicator of whether something qualifies for copyright or not.
>constitute intellectual property
This term is an oxymoron. "intellectual property" does not exist: https://www.gnu.org/philosophy/not-ipr.en.html
>Do licences apply here at all?
Yes, they have taken huge amounts of work under many licenses, then proceeded to make a derivative work out of all of them.
>However, they overlook the fact that when you register with GitHub, you have to agree to their terms of service which explicitly state: >it is completely irrelevant whether a developer on GitHub demands compliance with his copyleft provisions.
The service terms only cover making identical copies and do *not* override any author's copyrights. A lot of software on GitHub is based on other authors' works, and those authors have *not* agreed to such terms anyway, so they should demand compliance and demand it hard.

The following parts get even more proprietary. 1/2
@pixelcodeapps 2/2
As for your repeated claims that co-pilot does not violate copyright, I would say *lawyers* at the SFC know a hell of a lot more than you.

>Commercial re-use is not unethical
There is nothing wrong with commercial usage and sale of software, unless such software is proprietary software.
You can't get much more unethical than co-pilot, as it was created by taking free software and turning it into proprietary software.
>In fact, it is quite unjust for someone to redistribute an application for the sole purpose of making a profit without having made any substantial contribution of his own.
That is not true in any way as long as you are honest and note that you didn't make any meaningful changes.
I don't care if you take free software verbatim and under the same license, try to sell it for a billion dollars, but don't come complaining when it doesn't sell.

>Such behaviour is also not excluded without reason by the For Good Eyes Only Licence (which allows commercial re-use only for substantial works of which the licensed work is merely a component).
That is a proprietary software license. I am completely DISGUSTED.
Did microsoft at least pay you, or are you doing this pro-bono from your hate for freedom?
>Scraping is not unethical but a cornerstone of software freedom
I haven't actually seen any complaints about scraping.
There's obviously no issue with the software being used for any purpose, except taking other peoples freedom.
>Double standards at their best >it is downright absurd that copyleft advocates on the one hand attack users of ethical licences (such as the For Good Eyes Only Licence) for not wishing to tolerate human rights violations
The software license is NOT the place to tackle human rights violations, as if you're ready to commit some of those, you'll use the software anyway without giving a stuff about the license.
Any attempt to do such ends up being another proprietary software license that we will fight tirelessly to eliminate along with the rest.
Software is a tool, like a hammer - some people use it to drive in nails as they should, but it is ridiculous to even try to stop them using it as a weapon with a license - as this will not stop them.

>Copyleft promotes monopoly
I'm sorry, not allowing software to be proprietized is not promoting a monopoly.
You can use the software for any purpose, as you wish, as long as that doesn't consist of turning it into proprietary software - which really isn't difficult unless you are an asshole.
>Software freedom means freedom of licence choice
Yes, you can choose whatever license you want, as long as such license isn't opposed to the GPL's quite reasonable terms.
If there's an issue, it's not the fault of the GPLv3, it's the fault of the incompatible license you selected.
>Stricter copyright law would only bring disadvantages
The current laws state that co-pilot is a violation, there's no need for stricter laws.
If other parties decide to make the copyright laws stricter, that's fine, as it'll make copyleft stronger as well.
>In January 2022, the regional court in Hamburg ruled (308 O 130/19) that browser ad blockers do not constitute “unauthorised copying and/or reworking” of copyright-protected websites. In line with the copyleft advocates’ ideas of stricter copyright protection, it would therefore possibly be forbidden in future to use additional software to protect oneself from intrusive advertising and privacy-invading trackers.
Well no, the ultimate solution to the advertising problem is disabling JavaScript - there is no software with a license to violate them.
I'm not sure if ad-deniers even modify any part of the JavaScript, they usually just decline to load unwanted JavaScript files.
>How we should license our works instead >Instead, we should rather just use one of the conventional permissive licences for our works, such as the MIT licence.
Go ahead and use expat if you want to write proprietary software FOR FREE: https://lukesmith.xyz/articles/why-i-use-the-gpl-and-not-cuck-licenses/

Fellow brothers, let's continue to eliminate all forms of proprietary software, foreign and domestic, whether its makers at least admit it or call it "open source".

@Suiseiseki I don't “support” GitHub, I advocate against using it. The article simply explains why Copilot isn't illegal. Please understand the difference.

Call what Copilot does whatever you want; re-arranging the characters of a copyright-protected work doesn't constitute a copyright infringement whatsoever.

Those few excerpts that are copied verbatim are not copyright-protectable because they are too short for that (in 99% of cases, fewer than 150 characters). And yes, size isn't the only factor for copyright protection – even longer excerpts might not be protectable.

Verbatim copying is not magically impossible just because one has “learnt” from code. Take the Fibonacci sequence for example: scribe.citizen4.eu/developers- There just aren't that many different ways of implementing it. There are tons of similar situations where even a human would repeat “someone else's” code, knowingly or unknowingly.
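As an illustration of how constrained such code is, here is the common iterative Fibonacci implementation, sketched in Python (the language is an assumption; the linked example may use another). Independently written versions tend to look almost identical, with differences typically limited to variable names.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, with fib(0) == 0 and fib(1) == 1."""
    a, b = 0, 1
    # Each step shifts the pair one position along the sequence.
    for _ in range(n):
        a, b = b, a + b
    return a


print([fib(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

There is little room for "individual" expression here, which is the point being made about short, generic excerpts.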

@Suiseiseki Compilers only translate the source code into machine code without changing anything about the functionality, and they're so simple they're merely a tool of the programmer.

Well, “intellectual property” might be an oxymoron, but that doesn't change the fact that it's core to copyright law.

Authors' rights don't need to be overridden; GitHub just needs sufficient usage rights, which are in fact granted by the ToS.

Yes, uploading copyleft works to a third-party service, whose ToS aren't copyleft-compliant, is most likely a copyright infringement by the uploader.

@Suiseiseki The SFC's lawyers absolutely do know “a hell of a lot” more than me, but that doesn't change anything about the legal situation, you know? Funny how, for some reason, you seem to think that a billion-dollar company hasn't made sure it won't face tons of lawsuits.

Also, you might be interested in this paper by John A. Rothchild, Professor of Law, Wayne State University, and Daniel H. Rothchild, PhD candidate, University of California, who argue that “Copilot and its developer-customers likely do not infringe developers’ copyrights”: fsf.org/licensing/copilot/copy

There are regularly instances of F-Droid apps (even the F-Droid store itself) being sold on Google Play by third parties, sometimes even including ads and trackers. You might find that a great example of software freedom but I don't.

@Suiseiseki If you knew what the term “proprietary” means, you'd understand that something like “proprietary licences” doesn't exist: forgoodeyesonly.codeberg.page/

The main purpose of the For Good Eyes Only Licence is banning privacy invasions, for example by including third-party trackers in the derivative work. It doesn't make sense to ban something (in this context) as “trivial” as GDPR violations but on the other hand not to care about something much worse like human rights infringements.

Copyleft prevents licensees from uploading derivative works to third-party services whose ToS aren't copyleft-compliant, which is basically a form of vendor lock-in.

If you think stricter copyright law would “strengthen copyleft”, you must be a troll. It would mean that trivial code fragments, like said implementation of the Fibonacci sequence, would be copyright-protected, so that it would be much easier to unknowingly commit IP infringements. Read this for reference: felixreda.eu/2021/07/github-co

@Suiseiseki The argument of copyright organisations is that ad-blockers modify websites in whole (not the JavaScript), by removing the ads from the display visible to the user.

@pixelcodeapps >If you knew what the term “proprietary” means, you'd understand that something like “proprietary licences” doesn't exist
I know what a proprietary software license is and that license is one of those.
>is banning privacy invasions, for example by including third-party trackers in the derivative work >something much worse like human rights infringements.
If people are willing to violate human rights by violating privacy, they'll be happy to violate the law as well, thus they will just ignore your license.
Your proprietary software license isn't even going to achieve what you wanted.
>Copyleft prevents licensees from uploading derivative works to third-party services whose ToS aren't copyleft-compliant, which is basically a form of vendor lock-in.
Please cease writing garbage. You can upload GPLv3'd works to any services that aren't opposed to the terms. In cases where the service is opposed to the terms, there's no lock-in; rather, such a platform is locking out such works. You're trying to claim that services not being able to turn GPL'd works into proprietary software is lock-in - nice try.
>It would mean that trivial code fragments, like said implementation of the Fibonacci sequence, would be copyright-protected, so that it would be much more easy to unknowingly commit IP infringements.
Yes, trivial code fragments are copyrightable, but if you independently arrive at an implementation of the Fibonacci sequence, then you hold the copyright for that instance, as you wrote it.
Such is in no way related to a machine learning algorithm encoding an implementation of the Fibonacci sequence and then vomiting it out verbatim.
I'm sorry, you cannot confuse me by whipping out the "IP" term.
I've read the article and that author is very confused, a key sign is at the end, where they've written "World Intellectual Property Organization" without quotes.
>The argument of copyright organisations is that ad-blockers modify websites in whole (not the JavaScript), by removing the ads from the display visible to the user.
Such argument is complete garbage.
Most ads seem to be JavaScript-based, and browsers load no JavaScript when you first load a page (the JavaScript is either embedded or in an external script), so the ads really "pop in", rather than being there in the first place.
I guess the same applies to images based ads - as they're loaded after the HTML page.
Thus, no "removal" takes place, the ads simply aren't loaded in first place.
It would be tyrannical to dictate how a user views a page.
A user may wish to download the page with wget, open it up in nano and render the HTML in their head after all.

@Suiseiseki By definition, licences grant rights, therefore they can't reserve all rights (that's the default due to copyright law), which is exactly what “proprietary” means. “Proprietary” is not a synonym for “not OSD-compliant”.

Almost everyone who violates the GDPR is not something like a human rights violator or war criminal etc. For example, the German national railway company is currently being sued for violating the GDPR by including mandatory trackers in their app. According to the For Good Eyes Only Licence, that would also be a copyright infringement (if the app was a derivative work).

@Suiseiseki
Almost every single content hosting platform has got ToS that allow them to use the content in ways that violate standard copyleft provisions. That's not unique to GitHub, but is also the case for GitLab and Gitea (servers). Therefore, if one is not a lawyer, the only way to make sure you don't commit a copyright infringement by re-distributing a copyleft work is not uploading derivative works to third-party services, but only to that service used by the original work. That is a de-facto vendor lock-in. It would be possible to host the code oneself on one's own server, but that's not an actual option for most developers.

@Suiseiseki “Yes, trivial code fragments are copyrightable”

That's plainly false. I'd suggest re-reading § 69a UrhG.

Regarding the ad-blocking stuff you're completely right. Your arguments are similar to the Hamburg regional court's argumentation why ad-blockers don't constitute copyright infringement.

@pixelcodeapps >“Proprietary” is not a synonym for “not OSD-compliant”.
Correct, but where did I claim that?
I regard "open source" to be the same thing as proprietary software, but with some trickery.
>Almost everyone who violates the GDPR is not something like a human rights violator or war criminal etc. For example, the German national railway company is currently being sued for violating the GDPR by including mandatory trackers in their app. According to the For Good Eyes Only Licence, that would also be a copyright infringement (if the app was a derivative work).
Why are you bringing up the EU's laws? I'm talking about USA copyright and related laws and the general applicability of those to the rest of the world.
>every single content hosting platform has got ToS that allow them to use the content in ways that violate standard copyleft provisions
Big claim.
>but is also the case for GitLab and Gitea (servers).
I wouldn't be surprised by gitlab, but I guess gitea.io is looking mighty proprietary as well.
>the only way to make sure you don't commit a copyright infringement by re-distributing a copyleft work is not uploading derivative works to third-party services, but only to that service used by the original work
Or you could use literally any git host without nonfree terms?
There are a few crappy sites, but those are little in number.

>I'd suggest re-reading § 69a UrhG.
Can you link that to me?
I've searched that, but I can only really find a German law, which is not under USA copyright.
I can't read German and such translations are only to assist the understanding of the German law and have no legal effect.
Anyway, the relevant part appears to be;
"(2) The protection granted applies to the expression, in any form, of a computer program. Ideas and principles which underlie any element of a computer program, including the ideas and principles which underlie its interfaces, are not protected."
That is NOT stating that short excerpts are not copyrightable. That is stating that the "ideas and principles", like the idea of a function, or function names of an interface etc are not copyrightable.
It then goes on to say; "(3) Computer programs are protected if they represent individual works in the sense that they are the result of the author’s own intellectual creation. No other criteria, especially qualitative or aesthetic criteria, are to be applied when determining its eligibility for protection." - which means anything else not mentioned, no matter how short, is copyrightable, as long as "they are the result of the author’s own intellectual creation".
Maybe the German version says something else, I don't know.

@Suiseiseki “I'm talking about USA copyright”

Why then are you commenting on a blog article that deals with European law and uses German law as an example? (Also, FYI, the For Good Eyes Only Licence enforces the GDPR regardless of the licensee's jurisdiction.)

“use literally any git host without nonfree terms”

Can you name a few?

Yes, that's the one I meant. Though, the devil is in the details, particularly in the phrase “[... ] if they represent individual works in the sense that they are the result of the author’s own intellectual creation”. The terms “individual work” and “intellectual creation” both refer to the level of artistic creation necessary for copyright protection. And that level just isn't reached by short excerpts, according to common interpretation.

@pixelcodeapps >Why then are you commenting on a blog article that deals with European law and uses German law
Because Microsoft is based in the USA, and so USA laws apply to them when it comes to copyright.

Why do you keep trying to tell me about your nonfree license?
It's like trying to get rms to accept a proprietary software license.

>“use literally any git host without nonfree terms” >Can you name a few?
Plenty;
- My cgit server.
- https://sh.ht
- The hosting software built into git itself.
- gitea - for example: https://codeberg.org/
I'm sure there's plenty more that escape me right now.

>that level just isn't reached by short excerpts, according to common interpretation.
I don't care about common interpretation - I care what it actually says and I've stated what it does.

@Suiseiseki

“Why do you keep trying to tell me about your nonfree license?”

I'm just replying to your criticism.

sh.ht only says “It works”. Codeberg's ToS don't grant Codeberg any usage rights in the first place (which I'd consider a huge risk for Codeberg).

“I don't care about common interpretation - I care what it actually says”

That's not how legal interpretation works; there's a ton of literature on that.

Here are a few (German-only, sorry) articles that all agree that trivial software works aren't copyright-protected:

prigge-recht.de/urheberrecht-a
studlib.de/7519/recht/software
kanzlei-lachenmann.de/software
digital-recht.at/wann-ist-eine (note: Austrian law is very similar to German law)

@pixelcodeapps >re-arranging the characters of a copyright-protected work doesn't constitute a copyright infringement whatsoever
Do you know of a legal ruling that states this?
As far as I am aware, re-arranging a copyrighted work doesn't magically wash off the copyright.
>Those few excerpts that are copied verbatim are not copyright-protectable because they are too short for that >There are tons of similar situations where even a human would repeat “someone else's” code, knowingly or unknowingly.
Do you have a legal ruling that specifically states what the minimum length of a copyrightable work is?
As far as I am aware, as long as you don't *copy* or *know* of another work, if you end up reproducing another work exactly, you hold the copyright for what you wrote just the same.
But, co-pilot *knows* the works it outputs. It knew *exactly* what's in the quick square root function, and was happy to *copy* it verbatim, including the copied (non-optimal) magic values exactly (if it actually learned from such function, it would know to use a more optimal magic values I reckon).
>Compilers only translate the source code into machine code without changing anything about the funcionality
This is incorrect. Optimizing compilers change a lot of functionality - they just happen to arrive at a program that does the intended thing, usually.
>that doesn't change the fact that it's core to copyright law.
It isn't core to copyright law - it's been grafted on by corrupt public officials sure, but that doesn't mean you should play along.
>Funny how, for some reason, you seem to think that a billion-dollar company has not made sure not to risk tons of lawsuits.
Microsoft knows what they are doing is copyright infringement (why else would they refuse to answer the SFC's questions, if they knew it wasn't?), but they know that if they avoid trouble by playing their cards right, spreading enough FUD and refusing to answer questions, they can probably pull it off, as so far it seems that everyone who willingly uses github is utterly spineless.
Really, they're happy to risk a lawsuit or two, as they have more money than sense, so they can just crush any small player in court, no matter what they do.
>Also, you might be interested in this paper
I read the paper a few months ago and I don't agree with the conclusions reached.
In my experience, I've noticed a pattern in that almost everything that says "intellectual property" without quotes is complete garbage - as either the writer is confused about the topic, or it's intended to confuse the reader.
>There are regularly instances of F-Droid apps (even the F-Droid store itself) being sold on Google Play, sometimes even including ads and trackers >You might find that a great example of software freedom but I don't.
As long as the license has been complied with, any users can just strip garbage out, so I don't see a problem as long as the before-mentioned has been done.
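The "quick square root" function mentioned above is presumably the well-known fast inverse square root popularised by Quake III Arena, whose recognisable fingerprint is the magic constant 0x5f3759df. Below is a minimal Python re-creation for illustration only (not the original C source; `fast_inv_sqrt` and its `magic` parameter are hypothetical names), showing why a verbatim reproduction is easy to spot: the constant is specific and arbitrary-looking.

```python
import struct


def fast_inv_sqrt(x: float, magic: int = 0x5F3759DF) -> float:
    """Approximate 1/sqrt(x) for positive x using the bit-level trick."""
    # Reinterpret the bits of x as a 32-bit float stored in an unsigned int.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # The "magic" step: shift-and-subtract gives a rough first guess.
    i = (magic - (i >> 1)) & 0xFFFFFFFF
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson iteration refines the guess.
    return y * (1.5 - 0.5 * x * y * y)


print(fast_inv_sqrt(4.0))  # close to the exact value 0.5
```

For example, `fast_inv_sqrt(4.0)` returns roughly 0.499. Later analyses, notably by Chris Lomont, suggested slightly different constants such as 0x5f375a86; presumably that is the kind of "more optimal magic value" the comment alludes to, and it can be passed via `magic=`.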

@Suiseiseki I'm not talking about re-arrangements of the work, but of its characters. What I was referring to is that most of Copilot's outputs have nothing in common with the analysed code, other than the syntax of the programming language used.

There is no universal minimum length for copyright protection; it's always case-dependent and usually very complex. I mean, there could be two works of the same length; one of them might condense significant artistic value into minimal volume, while the other one is nothing but a bloat of uncreative, generic commands.

If you don't know you're infringing upon someone else's copyright, you aren't punished – but that doesn't make the copyright infringement itself legal.

Compilers: Hm, that's not exactly what I meant. I was more referring to the fact that compilers always produce “expected” outputs, or in other words: the same output for the same input.

@Suiseiseki

“As long as the license has been complied with, any users can just strip garbage out”

Well, not only garbage, but dangerous garbage, that's the problem. I just don't want to allow anyone to misuse my creations for the purpose of harming others.

@pixelcodeapps >What I was referring to is that most of Copilot's outputs have nothing in common with the analysed code, other than the syntax of the programming language used.
That appears to be the case, but really that's just a whole lot of grafting together.
>copyright protection
Copyright is restrictions, not "protections".
>If you don't know you're infringing upon someone else's copyright, you aren't punished
Um, yes you can be, as copyright infringement is always a crime.
Usually complainants will be happy if you just cease violating, but it's still a crime - although it rarely is enforced.
>I just don't want to allow anyone to misuse my creations for the purpose of harming others.
Software are tools and even attempting to restrict what you personally do with a tool is tyrannical (i.e. what you're trying to do is make a hammer with license terms that allow you to drive in nails, but don't allow you to drive in screws) - you are a wannabe tyrant who wishes to restrict the users.
Anyone can use what I write for any purpose, aside from turning it into proprietary software.

@Suiseiseki “yes you can be [punished], as copyright infringement is always a crime”

§ 15 StGB clearly says: “Unless the law expressly provides for criminal liability for negligent conduct, only intentional conduct attracts criminal liability.” And § 106 UrhG doesn't say anything about negligent conduct.

If you're saying that prohibiting the use of one's creations for the purpose of harming others is “tyrannical”, then, by your own logic, you must be a war crimes advocate.

Just like free speech doesn't mean you can say literally anything, software freedom shouldn't mean that you can use the software for literally any purpose – but only for those purposes that aren't destructive for society.

Have you heard of the paradox of tolerance (rhetorical question)?

@pixelcodeapps You're linking to German laws there.
I'm talking about USA laws.
>If you're saying that prohibiting the use of one's creations for the purpose of harming others is “tyrannical”
Attempting to do so with a proprietary software license is tyrannical, as you just hurt the users, while not stopping the creation from being used to harm others.
>Just like free speech doesn't mean you can say literally anything
I really don't care what anyone says, as they're just words, but don't expect to be able to say complete garbage or threats without me criticizing.
>software freedom shouldn't mean that you can use the software for literally any purpose – but only for those purposes that aren't destructive for society.
The most destructive thing to society is proprietary software (you would think it isn't, but it is), so literally for any other purpose is arguably less destructive.
I don't respect your ideals of proprietary software, so the answer is no.

@pixelcodeapps I'm pretty sure it violates a lot if not all foss licenses because it strips the license file from the code it distributes, modified or not.

Whether or not Github's ToS override license files the author added to the code is an interesting point. If they indeed do then the code authors are at fault for not catching that when creating their projects there and trusting their license agreement would be honored by both users as well as Github themselves.

@fedops The (re-)distributed excerpts are too small to be copyright-protected, so there's no need for keeping the licence files.

Yes, developers should be very careful about creating forks of copyleft-licensed projects on third-party platforms. In contrast, permissive licences don't introduce such problems in the first place.

@pixelcodeapps is that conjecture or a verifiable fact?

Unless there's a court decision (which I doubt there is) I'd say for example GPLv3 5a) and 5b) apply to anything non-trivial derived from Foss code taken from any repo.

The fact that they don't even attempt to keep track of which repo it was taken from and what license applied at the time leaves an additional sour taste.

I'd like to see someone change the license post-mortem and sue them for damages.

@fedops Not sure what you mean by “verifiable fact”. Please note that court rulings, especially those from lower instances, don't necessarily create certainty, since (a) judges are independent, (b) cases are different and (c) they only apply to one specific jurisdiction.

If the “derived” work doesn't contain an actual excerpt from the original code, in Germany this usage falls under § 44b UrhG which allows “text and data mining”; meaning that creating (non-trivial) works based on the *analysis* of copyright-protected code is not a copyright infringement, so there's no need to comply with the GPL in this case.

If the derived work does, in fact, contain actual excerpts from the original code, then it depends on whether those excerpts themselves reach a level of artistic creation that is high enough to fulfil the requirements of § 69a UrhG for copyright protection as a computer program: forgoodeyesonly.codeberg.page/

@pixelcodeapps yeah the latter case is the interesting one. If you can't trace a code snippet to something on github it doesn't really matter.

What I'm wondering (and IANAL, which is why I'm asking about precedent) is whether this actually even falls under copyright law. Since we're not talking about copyright protection but what I would call legitimate use within the license established by the code author.

But I really don't know. All I know is don't trust microsoft, ever.

@fedops I really hope forge federation will soon be ready so that large FOSS projects won't be able to argue anymore that they “need GitHub because all the contributors are there”.

@pixelcodeapps Even if it is, my guess is GitHub is staying out of it because of its business model.

@fedops @pixelcodeapps It's almost as though copyright law is an arbitrary mess or something...

@pixelcodeapps Here's the problem with number 7:

In order to supersede the GPL, you need the consent of every single contributor to the project.