Are there any initiatives aimed at training generative AI using 100% public domain works and works authorized by the creator?

HiddenLayer555@lemmy.ml · 29 days ago

kadup@lemmy.world · 29 days ago

It’s also very hard to keep track of licenses for text-based content on the internet. Do most users know what the default license is for their comments on Reddit? How about Facebook? How about the comments section of a random blog? How about the title of their Medium post? And so on.

General_Effort@lemmy.world · 29 days ago

The usual arrangement is that the platform can do basically whatever it wants. That shouldn’t really be surprising. But I see your point. If you literally want consent, not just legally licensed material, then you need more than just a clause in the TOS.

You could raise the same issue with permissively licensed material. People who released it may not have foreseen AI training as a use, and might not have wanted to actually allow it.

kadup@lemmy.world · 29 days ago

Exactly - the platform owner usually can do everything. Can a third-party crawler? I don’t know.

General_Effort@lemmy.world · 28 days ago

You mean legally? Yeah, no problem. It depends on the location, though. In the EU, the rights-holder can opt out. So if you want to do it in the EU, you have to pay off Reddit, Meta, and so on. In Japan, it’s fine regardless. In the US, it should turn out similarly, but it’s up to the courts to work out the details, and it’s quite up in the air whether you can trust the system to work.