• FatCrab@slrpnk.net · 2 days ago

A quick search turns up that AlphaFold 3, which is what they are using for this, is a diffusion architecture, not a transformer. It works more like the image generators than the GPT-style text generators. It isn’t really the same as “the LLMs”.

    • holomorphic@lemmy.world · 2 days ago

      I will admit I didn’t check, because it was late and the article failed to load. I just remember reading several papers 1-2 years ago on things like cancer-cell segmentation where the ‘classical’ U-Net architecture was beaten by either pure transformers, or U-Nets with attention gates added on all the horizontal (skip) connections.
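
      For anyone curious, the attention-gate idea (from the Attention U-Net paper, Oktay et al. 2018) looks roughly like the sketch below. It’s my own minimal version; the names and shapes are illustrative, not taken from any of the papers discussed here:

      ```python
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class AttentionGate(nn.Module):
          """Gates a U-Net skip connection using a coarser decoder signal."""
          def __init__(self, skip_ch, gate_ch, inter_ch):
              super().__init__()
              self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
              self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
              self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

          def forward(self, skip, gate):
              # Upsample the gating signal to the skip feature map's spatial size.
              gate = F.interpolate(gate, size=skip.shape[2:], mode="bilinear",
                                   align_corners=False)
              attn = torch.relu(self.w_skip(skip) + self.w_gate(gate))
              attn = torch.sigmoid(self.psi(attn))  # per-pixel weights in [0, 1]
              return skip * attn  # irrelevant skip features get suppressed
      ```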

    • MajinBlayze@lemmy.world · 2 days ago

      I skimmed the paper, and it seems pretty cool. I’m not sure I quite follow the “diffusion model-based architecture” it mentions, but it sounds interesting.

      • FatCrab@slrpnk.net · 1 day ago

        Diffusion models iteratively convert noise across a space into forms; that’s what they are trained to do. Contrast that with, say, a GPT, which basically performs recursive next-token prediction over a sequence. They’re just totally different models, both in structure and in mode of operation. Diffusion models are actually pretty incredible imo, and I think we’re just beginning to scratch the surface of their power. A very fundamental part of most modes of cognition is converting the noise of unstructured multimodal signal data into something with form and intention, so being able to do this with a model, even if only in very, very narrow domains right now, is a pretty massive leap forward.
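
        To make the contrast concrete, here’s a toy sketch of the two sampling loops. `denoiser` and `next_token_logits` are hypothetical stand-ins, not any real model’s API:

        ```python
        import torch

        def diffusion_sample(denoiser, shape, n_steps):
            """Start from pure noise; iteratively refine the whole sample at once."""
            x = torch.randn(shape)              # unstructured noise over the space
            for t in reversed(range(n_steps)):
                x = denoiser(x, t)              # each pass nudges x toward a coherent form
            return x

        def gpt_sample(next_token_logits, prompt, n_tokens):
            """Grow a sequence one token at a time, conditioned on all prior tokens."""
            tokens = list(prompt)
            for _ in range(n_tokens):
                probs = torch.softmax(next_token_logits(tokens), dim=-1)
                tokens.append(int(torch.multinomial(probs, num_samples=1)))
            return tokens
        ```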