• 2pt_perversion@lemmy.world
    1 year ago

    I’d love to debate politics with you, but first tell me how many r’s are in the word strawberry. (AI models are starting to get that answer correct now, though.)

    • sbv@sh.itjust.works
      1 year ago

      I tried this with Gemini. Regardless of the number of r’s in a word (zero to three), it said two.

      • Kraven_the_Hunter@lemmy.dbzer0.com
        1 year ago

        So ask it about a made-up or misspelled word, like “how many r’s are in the word strauburrry”, or ask it something with no answer, like “what word did I just type?”. Any answer other than “you haven’t typed anything yet” is wrong.

        • lad@programming.dev
          1 year ago

          But there is a phrase you typed: the very one that contains the question, unless you asked by voice or in a picture.

      • 2pt_perversion@lemmy.world
        1 year ago

        This is an oversimplification, but it partly has to do with how LLMs split language into tokens, some of which span multiple letters. When we look for R’s, we split the word like S - T - R - A - W - B - E - R - R - Y, where each character is a token. LLMs split it into something more like STR - AW - BERRY, which makes predicting the correct answer difficult without a lot of training on that specific problem. If you asked it to count how many times STR shows up in “strawberrystrawberrystrawberry”, it would have a better chance.
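
        A rough Python sketch of the difference described above. The STR / AW / BERRY split is the hypothetical one from this comment, and the token IDs are made up; no real tokenizer is being reproduced here.

```python
# Toy illustration only: the split and the IDs below are assumptions,
# not the output of any real tokenizer.
word = "strawberry"

# Character-level view: every letter is its own unit, so counting is trivial.
char_count = word.count("r")

# Token-level view: the model receives a few opaque IDs instead of letters.
toy_tokens = ["str", "aw", "berry"]                  # assumed split
toy_vocab = {"str": 101, "aw": 102, "berry": 103}    # made-up IDs
token_ids = [toy_vocab[t] for t in toy_tokens]

# The letters only reappear if each ID is mapped back to its text,
# which is knowledge the model has to pick up from training.
id_to_text = {token_id: text for text, token_id in toy_vocab.items()}
r_per_token = [id_to_text[i].count("r") for i in token_ids]

# Counting a whole token is the easier case mentioned in the comment.
str_count = ("strawberry" * 3).count("str")

print(char_count)    # 3
print(token_ids)     # [101, 102, 103]
print(r_per_token)   # [1, 0, 2]
print(str_count)     # 3
```

        Nothing in `[101, 102, 103]` says how many r’s are inside; that information only comes back once each ID is expanded to its letters.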

      • tee9000@lemmy.world
        1 year ago

        LLMs look for patterns in their training data. So if you give one 2+2=, it draws on its training and finds that the text most likely to follow 2+2= is 4. It’s not calculating; it’s finding the most likely completion of the pattern based on the data it has.

        So it’s not deconstructing the word strawberry into letters and running a count. It tries to finish the pattern, and it fails at simple logic tasks that aren’t baked into the training data.

        But a new model, OpenAI’s o1, checks its answers against itself in ways I don’t fully understand, and it now scores something like 85% on an international standardized mathematics test, so they are making great improvements there. (Compared to a score of something like 14% from the model that can’t count the r’s in strawberry.)
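
        To make the “completing a pattern vs. calculating” point concrete, here is a deliberately exaggerated toy in Python. The training pairs are invented, and real LLMs generalize far beyond exact-match lookup, so treat this as a caricature of the idea rather than a description of how any model works.

```python
from collections import Counter, defaultdict

# Invented "training data": prompt/completion pairs, purely illustrative.
training_text = [
    ("2+2=", "4"), ("2+2=", "4"), ("2+2=", "5"),
    ("1+1=", "2"), ("3+3=", "6"),
]

# Count how often each completion followed each prompt.
completions = defaultdict(Counter)
for prompt, answer in training_text:
    completions[prompt][answer] += 1

def complete(prompt):
    """Return the most frequent completion seen for this exact prompt, else None."""
    seen = completions.get(prompt)
    if not seen:
        return None
    return seen.most_common(1)[0][0]

def calculate(prompt):
    """Actually parse 'a+b=' and add the two numbers."""
    left, right = prompt.rstrip("=").split("+")
    return str(int(left) + int(right))

print(complete("2+2="))      # 4
print(complete("17+26="))    # None
print(calculate("17+26="))   # 43
```

        The lookup answers 2+2= because that pattern was in its data, and has nothing to say about 17+26=; the calculator handles both because it runs the actual procedure.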