Meta admits using pirated books to train AI, but won't pay for it

Lee Duna@lemmy.nz · 1 year ago

TWeaK@lemm.ee · 1 year ago

We do it on a non-commercial basis. Meta does it in the hope of building a market, establishing paywalls and eventually turning a profit - all the while never paying the original creators.

This is exactly what they (and Google and many others) do with personal user data. We manufacture the data, they collect it without due consideration (payment) and use it to profit so much that they’ve become some of the wealthiest businesses in the world. They’ve robbed us via deceptive fine print, why wouldn’t they think they can get away with this also?

archomrade [he/him]@midwest.social · 1 year ago

Responding to you here with the same response from the /c/piracy conversation:

I appreciate your enthusiasm here, but the law (and precedent reading of the law) simply does not bear out a clear interpretation like you’re suggesting.

It bears relationship to the end product when the end product reproduces the original work.

This is not how copyright has been applied to other machine learning processes using logistic regression that are considered fair use, as in the Text and Data Mining (TDM) classifications (proposed classes 7(a) and 7(b), page 102, in the Recommendation of the Register of Copyrights, 2021). The model itself is simply a very large regression model that has produced metadata analysis from unstructured data sources. When determining whether an LLM fits into this fair use category, they will look at what the model is and how it is created, not at whether it can be prompted to recreate a similar work. To quote from the Comments in Response to Notice of Inquiry on the matter:

Understanding the process of training foundation models is relevant to the generative AI systems’ fair use defenses because the scope of copyright protection does not extend to “statistical information” such as “word frequencies, syntactic patterns, and thematic markers.” Processing in-copyright works to extract “information about the original [work]” does not infringe because it does not “replicat[e] protected expression.”

Granted, what is novel about this particular case (and LLMs generally) is their apparent ability to reconstruct substantially similar works through the same overall TDM process. Acknowledged, but to borrow again from the same comments as above:

Yet, in limited situations, Generative AI models do copy the training data.[24] So unlike prior copy-reliant technologies that courts have held are fair use, it is impossible to say categorically that inputs and outputs of Generative AI will always be fair use. We note in addition that some have argued that the ability of Generative AI to produce artifacts that could pass for human expression and the potential scale of such production may have implications not seen in previous non-expressive use cases. The difficulty with such arguments is that the harm asserted does not flow from the communication of protected expression to any human audience.

Basically, they are asserting that applying copyright to this use, which falls outside its explicit scope, would not prevent the same harm caused by the same technology built without the copyrighted works. Any work sufficiently described in publicly available text data could be reconstructed by a sufficiently weighted regression model given the right prompting. For example, if I described a desired output in enough detail in my input to the model, the output could be substantially similar to a protected work, regardless of whether that work was represented in the training data.

I happen to agree that these AI models represent a threat to the work and livelihoods of real artists, and that the benefit as currently captured by billion-dollar companies is a substantial problem that must be addressed, but I simply do not think the application of copyright in this manner is appropriate (as it will prevent legitimate uses of the technology), nor do I think it would be sufficient to prevent the future consolidation of wealth through the use of these models.

Never mind my personal objections to copyright law on the basis of my worldview - I just don’t think copyright is the correct tool for the desired protection.

TWeaK@lemm.ee · 1 year ago

There was no need for a duplicate reply. Different points were raised in my different comments. Also, we haven’t chatted in /c/privacy.

The downvote wasn’t from me btw, but I was planning on downvoting this duplicate comment down to zero (but not below).

██████████@lemmy.world · edited 1 year ago

Man, you didn't use to sell pirated DVDs? I mean, I didn't, but I sure supported those who did. I guess what I'm trying to say is I'm always down with piracy.

TWeaK@lemm.ee · 1 year ago

Nah man, although I did buy some of my first CDs that were rips with home printed covers from this girl who was the daughter of my dad’s lawyer friend. Nowadays though I think paying for piracy is for chumps - even if I do admit that people with hacked Firesticks get better access to live sports with their dodgy subscriptions.

██████████@lemmy.world · 1 year ago

I just want to let you know I read your comment about 4 times and think it's very well written. You must have gone to uni.

TWeaK@lemm.ee · 1 year ago

I went to 2 of them over more than 10 years (with a gap year at the end) and left with a Bachelors!!

Even so, my most prized qualification is my NVQ in Contact Centre Operations.

All true stories.

██████████@lemmy.world · 1 year ago

Dude, that was not an invitation to ramble. Thanks.

TWeaK@lemm.ee · 1 year ago

It fucking was, and you’re welcome.

Also, why didn’t the person who downvoted you upvote me?!

██████████@lemmy.world · 1 year ago

deleted by creator

Honytawk@lemmy.zip · edited 1 year ago

Nah, you are just irritating

Like a pebble in a shoe

Dark_Dragon@lemmy.dbzer0.com · 1 year ago

👆 spotted a corporate slave

ShroOmeric@lemmy.world · edited 1 year ago

It’s not piracy if no one can come after you, matey. You wouldn’t call the queen's ships pirates, would you?