Avram Piltch is the editor in chief of Tom’s Hardware, and he’s written a thoroughly researched article breaking down the promises and failures of LLM AIs.

    • lily33@lemm.ee
      1 year ago

      I’m sick and tired of this “parrots the works of others” narrative. Here’s a challenge for you: go to https://huggingface.co/chat/, input some prompt (for example, “Write a three paragraphs scene about Jason and Carol playing hide and seek with some other kids. Jason gets injured, and Carol has to help him.”). And when you get the response, try to find the author that it “parroted”. You won’t be able to - because it wouldn’t just reproduce someone else’s already made scene. It’ll mesh maaany things from all over the training data in such a way that none of them will be even remotely recognizable.

      • state_electrician@discuss.tchncs.de
        1 year ago

        Well, I think that these models learn in a way similar to humans, in that it’s basically impossible to tell where parts of the model came from. As such, the copyright claims are ridiculous. We need less copyright, not more. On the other hand, LLMs are not humans; they are tools created and owned by corporations, and I hate to see them profiting off other people’s work without proper compensation.

        I am fine with public domain models being trained on anything and being used for noncommercial purposes without being taken down by copyright claims.

        • RickRussell_CA@beehaw.orgOP
          1 year ago

          “it’s basically impossible to tell where parts of the model came from”

          AIs are deterministic.

          1. Train the AI on data without the copyrighted work.

          2. Train the same AI on data with the copyrighted work.

          3. Ask the two instances the same question.

          4. The difference is the contribution of the copyrighted work.

          There may be larger questions of precisely how an AI produces one answer when trained with a copyrighted work, and another answer when not trained with the copyrighted work. But we know why the answers are different, and we can show precisely what contribution the copyrighted work makes to the response to any prompt, just by running the AI twice.
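The four steps above can be sketched with a toy model. This is only an illustration under stated assumptions: a tiny bigram counter stands in for "the AI", greedy decoding keeps generation deterministic, and the corpus, the `held_out_work` text, and the function names `train_bigram` and `generate` are all hypothetical. A real LLM would also need fixed random seeds and non-sampling decoding for the two runs to be comparable, and the comparison only covers the one prompt that is asked.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count word-to-next-word transitions across all documents."""
    counts = defaultdict(Counter)
    for doc in corpus:
        words = doc.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def generate(model, prompt, max_words=5):
    """Greedy decoding: pick the most frequent next word, ties broken alphabetically."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        best = min(candidates, key=lambda w: (-candidates[w], w))
        words.append(best)
    return " ".join(words)


# Hypothetical training data; held_out_work plays the role of the copyrighted work.
base_corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
held_out_work = "the cat chased the red ball and the cat chased the red kite"

# Steps 1 and 2: train once without the work and once with it.
model_without = train_bigram(base_corpus)
model_with = train_bigram(base_corpus + [held_out_work])

# Step 3: ask both instances the same question.
prompt = "the cat"
output_without = generate(model_without, prompt)
output_with = generate(model_with, prompt)

# Step 4: the two answers differ only because of the added work.
print("without:", output_without)
print("with:   ", output_with)
```

In this toy case the two outputs differ for the prompt "the cat". Whether the same comparison is practical for a large model is a separate matter, since each comparison needs a full retraining run.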

        • keegomatic@kbin.social
          1 year ago

          So is your comment. And mine. What do you think our brains do? Magic?

          edit: This may sound inflammatory but I mean no offense

        • conciselyverbose@kbin.social
          1 year ago

          So is literally every human work in the last 1000 years in every context.

          Nothing is “original”. It’s all derivative. Feeding copyrighted work into an algorithm does not in any way violate any copyright law, and anyone telling you otherwise is a liar and a piece of shit. There is no valid interpretation anywhere close.

    • RandoCalrandian@kbin.social
      1 year ago

      Is there a meaningful difference between reproducing the work and giving a summary? Because I’ll absolutely be using an AI that I set up and trained myself to filter all the editorial garbage out of the news and surface what is meaningful to me, stripped of all advertising, sponsorships, and detectable bias.