OpenAI still leads in agentic terminal coding, but by less.

Claude can plan the work and then run hundreds of parallel subagents in a single session (and with Opus 4.8, the agents can run for even longer).

That’s one way to turn profitable before the IPO, I guess. Goodbye tokens.

  • Nighed@feddit.uk · 2 days ago · ↑12 ↓2

    The Opus models are the first models I can work with day to day without raging at them.

    But at their true cost, they’re not worth it 90% of the time. Next month, with the GitHub Copilot changes, is going to be a bloodbath.

    • unpossum@sh.itjust.works (OP) · 19 hours ago · ↑3

      Pricing is an issue, yes; the open-weight models aren’t on par with Claude and Codex yet. I have hopes that six months to a year can bring them to the level of current frontier models, and if so, I think that’s probably good enough for most users, including me. How Anthropic and OpenAI intend to make money at that point, I couldn’t tell you, but I don’t see an actual downside there :)

    • obelisk_complex@piefed.ca · 1 day ago · ↑3 ↓3

      I’ve been using DeepSeek V4 Flash on OpenCode’s infra for a couple of weeks, and it’s pretty solid for something so low-cost. Honestly, given the premium Anthropic charges, I’m more satisfied with it than with Claude. Have you tried it at all?

      • Nighed@feddit.uk · 1 day ago · ↑4 ↓2

        Limited to approved tools at work.

        Need to set up local models on my server for personal stuff.

        • obelisk_complex@piefed.ca · 1 day ago · ↑2 ↓2

          Ahhhh, fair play. I have a lot of freedom since I’m paying out of pocket for my own use. I have a pretty beefy rig for running models locally, but it’s not beefy enough to run DeepSeek Pro and the like 😬 So I have a bunch of subscriptions to try out different models and see what works best in my workflow. I also have a problem with making alts in games, which seems like it rhymes 🤔

          I was pretty impressed with GLM-5.1 too, before DeepSeek V4 came out, but you’d be amazed what even a smaller, older coding model can do with the right config and a little proactive context management. I really hope this trend of smaller, better models for local agentic use continues.