AI Lie: Machines Don’t Learn Like Humans (And Don’t Have the Right To)

RickRussell_CA@beehaw.org · 3 years ago

lily33@lemm.ee · 3 years ago

They have the right to ingest data, not because they’re “just learning like a human would”, but because I - a human - have a right to grab all data that’s available on the public internet and process it however I want, including by training statistical models. The only thing I don’t have a right to do is distribute it (or works that resemble it too closely).

If you actually show me people who are extracting books from LLMs and reading them that way, then I’d agree that would be piracy. But that’d be such a terrible experience, if it ever works, that I can’t see it actually happening.

donuts@kbin.social · 3 years ago

You’re making two big, incorrect assumptions:

1. Simply seeing something on the internet does not give you any legal or moral right to use that thing in any way other than those which are, or have previously been, deemed to be “fair use” by a court of law. Individuals have personal rights over their likeness and persona, and copyright holders have rights over their works, whether they are on the internet or not. In other words, there is a big difference between “visible in public” and “public domain”.
2. More importantly, something that might be considered “fair use” for a human being to do is not necessarily “fair use” when a computer or “AI” does it. Judgements of what is and is not fair use are made on a case-by-case basis as a legal defense against copyright infringement claims, and multiple factors (purpose of use, nature of the original work, amount and substantiality of the use, market effect, etc.) are taken into consideration. At the very least, AI use has serious implications for substantiality and markets, especially compared to examples of human use.

I know these are really tough pills for AI fans to swallow, but you know what they say… “If it seems too good to be true, it probably is.”

lily33@lemm.ee · edit-2 3 years ago

On the contrary - the reason copyright is called that is because it started as the right to make copies. Since then it’s been expanded to include more than just copies, such as distributing derivative works.

But the act of distribution is key. If I wanted to, I could write whatever derivative works in my personal diary.

I also have the right to count the number of occurrences of the letter ‘Q’ in Harry Potter without Rowling’s permission. I can also post my count online for other lovers of ‘Q’, because it’s not derivative (it is ‘derived’, but ‘derivative’ is different - according to Wikipedia it means ‘includes major copyrightable elements’).

Or do more complex statistical analysis.
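For what it’s worth, the count itself is only a few lines of code. Here is a minimal sketch in Python; the function name `letter_count` and the sample sentence are made up for illustration and are not taken from any book:

```python
from collections import Counter


def letter_count(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text."""
    # Counter maps each character to how many times it appears.
    return Counter(text.lower())[letter.lower()]


# Hypothetical stand-in text; a real run would read the full text from a file.
sample = "The quick queen quietly quit."
print(letter_count(sample, "Q"))  # prints 4
```

The output is a single number, which contains none of the original wording.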

RickRussell_CA@beehaw.org · 3 years ago

Two things:

1. Many of these LLMs – perhaps all of them – have been trained on datasets that include books that were absolutely NOT released into the public domain.
2. Ethically, we would ask any author who parrots the work of others to provide citations to original references. That rarely happens with AI language models, and when they do provide citations, they often get them wrong.

lily33@lemm.ee · 3 years ago

I’m sick and tired of this “parrots the works of others” narrative. Here’s a challenge for you: go to https://huggingface.co/chat/, input some prompt (for example, “Write a three paragraphs scene about Jason and Carol playing hide and seek with some other kids. Jason gets injured, and Carol has to help him.”). And when you get the response, try to find the author that it “parroted”. You won’t be able to - because it wouldn’t just reproduce someone else’s already made scene. It’ll mesh maaany things from all over the training data in such a way that none of them will be even remotely recognizable.

state_electrician@discuss.tchncs.de · 3 years ago

Well, I think that these models learn in a way similar to humans, in that it’s basically impossible to tell where parts of the model came from. And as such the copyright claims are ridiculous. We need less copyright, not more. But, on the other hand, LLMs are not humans; they are tools created and owned by corporations, and I hate to see them profiting off of other people’s work without proper compensation.

I am fine with public domain models being trained on anything and being used for noncommercial purposes without being taken down by copyright claims.

RickRussell_CA@beehaw.org · 3 years ago

it’s basically impossible to tell where parts of the model came from

AIs are deterministic.

1. Train the AI on data without the copyrighted work.
2. Train the same AI on data with the copyrighted work.
3. Ask the two instances the same question.
4. The difference is the contribution of the copyrighted work.

There may be larger questions of precisely how an AI produces one answer when trained with a copyrighted work, and another answer when not trained with the copyrighted work. But we know why the answers are different, and we can show precisely what contribution the copyrighted work makes to the response to any prompt, just by running the AI twice.
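The with/without comparison can be sketched on a toy scale. This is only an illustration under stated assumptions: the “AI” here is a tiny bigram word model, the corpus and the “copyrighted work” are invented sentences, and both training and generation are fully deterministic by construction. Whether the same holds for a production LLM is not something this sketch shows.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Build bigram counts: for each word, how often each next word follows it."""
    model = defaultdict(Counter)
    for text in corpus:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model


def complete(model, prompt, max_words=3):
    """Greedy, deterministic completion: always pick the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = model.get(words[-1])
        if not followers:
            break
        # Break ties alphabetically so the result never depends on insertion order.
        best = min(followers, key=lambda w: (-followers[w], w))
        words.append(best)
    return " ".join(words)


# Hypothetical training data, invented for this example.
base = ["the cat sat on the mat", "the dog sat on the rug"]
extra_work = "the cat chased a mouse and the cat chased a bird"

without = train(base)                 # step 1: trained without the work
with_work = train(base + [extra_work])  # step 2: trained with the work

# Step 3: ask both instances the same question.
print(complete(without, "the cat"))    # the cat sat on the
print(complete(with_work, "the cat"))  # the cat chased a bird
```

Step 4 is then just comparing the two printed lines: in this toy, the only thing that changed between the two runs is the added work.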

RickRussell_CA@beehaw.org · 3 years ago

And yet, we know that the work is mechanically derivative.

keegomatic@kbin.social · 3 years ago

So is your comment. And mine. What do you think our brains do? Magic?

edit: This may sound inflammatory but I mean no offense

conciselyverbose@kbin.social · 3 years ago

So is literally every human work in the last 1000 years in every context.

Nothing is “original”. It’s all derivative. Feeding copyrighted work into an algorithm does not in any way violate any copyright law, and anyone telling you otherwise is a liar and a piece of shit. There is no valid interpretation anywhere close.

RandoCalrandian@kbin.social · 3 years ago

Is there a meaningful difference between reproducing the work and giving a summary? Because I’ll absolutely be using AI to filter all the editorial garbage out of news, set up and trained by myself to surface what is meaningful to me, stripped of all advertising, sponsorships, and detectable bias.

RickRussell_CA@beehaw.org · 3 years ago

When you figure out how to train an AI without bias, let us know.

RandoCalrandian@kbin.social · 3 years ago

You’re confusing ai with chatgpt, but to answer your question: if it’s my own bias, why would I care that it’s in my personal ai? That’s kind of the point: using my personal lens (bias) to determine what info I would be interested in being alerted to.

RaleighEnt@kbin.social · 3 years ago

oooh I dunno man having an AI feed you shit based on what fits your personal biases is basically what social media already does and I do not think that’s something we need more of.

Ilandar@aussie.zone · 3 years ago

You’re confusing ai with chatgpt

???

RickRussell_CA@beehaw.org · 3 years ago

The bias is in the AI design and the training dataset.