Get llama.cpp and try Qwen3.6-35B-A3B. It just came out and looks good. You’ll have to look into optimal settings, since it’s a Mixture of Experts (MoE) model with only 3B parameters active per token: the active weights fit in VRAM while the rest of the experts can stay in system RAM and inference stays quick.
You could also try the dense model (Qwen3.5-27B), but that will be significantly slower. Put either one in a coding harness like Oh-My-Pi, OpenCode, etc. and see how it fares on your tasks. It should be OK for small tasks, but don’t expect Opus / Sonnet 4.6 quality; think better than Haiku.
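As a sketch of what those “optimal settings” can look like: recent llama.cpp builds let you pin the MoE expert tensors to the CPU while the rest of the model sits in VRAM. The model path, layer count, and context size below are placeholders you’d tune to your own hardware:

```shell
# Sketch: serve a MoE GGUF with llama.cpp, keeping expert weights in system RAM.
# -ngl 99       offloads all layers that fit to the GPU.
# --n-cpu-moe N keeps the expert FFN tensors of the first N layers on the CPU;
#               raise or lower N until the model fits in your VRAM.
llama-server \
  -m ./models/your-moe-model.gguf \
  -ngl 99 \
  --n-cpu-moe 24 \
  -c 16384 \
  --port 8080
```

On older builds the same effect is achieved with a tensor-override pattern such as `-ot "ffn_.*_exps=CPU"` instead of `--n-cpu-moe`. Either way, the coding harness then talks to the OpenAI-compatible endpoint at `http://localhost:8080`.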
Say I have a GPU with 32GB VRAM and I am on Linux, what local LLM would be good for coding?
Currently I just have an iGPU ;) but that’s always an option, albeit a very expensive one.
I meant that I could buy one of those Radeons dedicated to AI work, like the ASRock Radeon AI PRO R9700 Creator 32GB GDDR6, if I need to.
Currently my Ryzen iGPU is all I need, since the only thing I ask of it is to put the graphical desktop on my screen ;) It does that job well.
I use Claude Code as well, and I’m slightly concerned about that ID verification news, even more so because of the technology partner they chose.