Parameter Update: 2025-32

"small fruit" edition

This week: The whale strikes back and tiny bananas!

DeepSeek V3.1

Despite being reported as a minor version bump (remember the Claude 4 -> 4.1 bump last week?), this is actually a huge upgrade! DeepSeek's new V3.1 is about as close to an R1 successor as we're likely to get. The model has all the bells and whistles you would expect from the whale: hybrid reasoning, agentic tool use and benchmark scores somewhere between Sonnet 4 and GPT-5 - while being completely open source (MIT license!) and offered through their own API for less than half the price of the OpenAI API. I'll keep a close eye on hallucination rates and multilingual (German) performance, as those were two weaknesses of R1 (I haven't looked at V3 in depth), but assuming this holds up, this is really impressive!

Cohere: Command A Reasoning

I was pretty sure Cohere had gotten lost somewhere, making too much money selling RAG to large enterprises to bother training large models, but it seems they're still on the ball! Despite getting very little coverage this week, Command A Reasoning seems surprisingly good for a model targeted at enterprise self-hosting (running on <2 GPUs!). They're also giving the model away for free for non-commercial use.

Image Generation News

Meta: Midjourney Partnership

I was under the impression that Midjourney just didn't really like money, given that they still don't offer an API for their models and until recently mostly operated through a Discord bot monetized (primarily) through an art magazine. It seems that Meta has now found their price point, though, partnering with the company and licensing "their aesthetic technology for (...) future models and products, bringing beauty to billions".

Google (?): Nano-Banana

While I was pretty sure we'd run into diminishing returns after the gpt-image-1 release (and it did seem like things were slowing down for a few weeks), someone (presumably Google) is back on LMArena with a new model that seems like another pretty large improvement in character consistency and speed. I've seen some remaining artifacts and suboptimal realism in my tests (look at the laptop below!), but for the most part, this seems like a huge unlock! Either wait for the inevitable Google release (probably coming to AI Studio first?) or try your luck in LMArena.

Qwen Image Edit

Following last week's Qwen Image "base model", this week we saw the release of a variant focussed specifically on image editing. In my tests, it seems comparable to the Ideogram model from last week but still worse than the new nano-banana model.

Input vs. gpt-image-1 vs. qwen-image-edit vs. nano-banana

Runway: Game Worlds Beta

I initially assumed this would be another world model (as that would seem in character for Runway), but instead it turned out to be a very cool but very weird generative gaming experience that also appears to be a bit broken right now. Runway combines their image and video generation with some language model to provide dynamically generated interactive text adventures. This lands right at the intersection of "very neat intern project" and "potential to go mega viral". If you haven't tried it, give them a few days to fix it and then check it out here.

Grok 2: Public Weights

In the past, Elon has promised to open source old Grok models after the launch of the new generation. While slightly late this time around (Grok 4 is already out!), he has finally made good on that promise by releasing Grok 2.5. At first glance, this release sucks. Not only is this model worse than what's already out there, it also comes with an extremely restrictive licence and only as a specific checkpoint requiring 8 GPUs with >40GB VRAM - hence xAI getting some flak on Twitter. But I'd be remiss not to point out that they are the only big lab to actually open source their old foundation models once they are deprecated, so while not ideal, this is still better than the competition (see: Anthropic killing Claude 3 Opus and OpenAI spontaneously deprecating all their models). I only wish they'd also give us the image component!