Parameter Update: 2025-32
"small fruit" edition

This week: The whale strikes back and tiny bananas!
DeepSeek V3.1
Despite being reported as a minor version bump (remember the Claude 4 -> 4.1 bump last week?), this is actually a huge upgrade! DeepSeek's new V3.1 is about as close to an R1 successor as we're likely to get. The model has all the bells and whistles you would expect from the whale: hybrid reasoning, agentic tool use, and benchmark scores somewhere between Sonnet 4 and GPT-5 - all while being completely open source (MIT license!) and offered through their own API for less than half the price of the OpenAI API. I'll keep a close eye on hallucination rates and multilingual (German) performance, as those were two weaknesses of R1 (I haven't looked at V3 in depth), but assuming this holds up, this is really impressive!
Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀
🧠 Hybrid inference: Think & Non-Think - one model, two modes
⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528
🛠️ Stronger agent skills: Post-training boosts tool use and…
- DeepSeek (@deepseek_ai) August 21, 2025
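For the curious: both modes sit behind DeepSeek's OpenAI-compatible API, so switching between them is just a model-name change. A minimal sketch (the deepseek-chat / deepseek-reasoner names match their docs at the time of writing - verify before copy-pasting):

```python
# Minimal sketch: DeepSeek V3.1 via the OpenAI-compatible API.
# Assumes the openai Python package and a DeepSeek API key.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

# Non-thinking mode: the model answers directly.
chat = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "One-line summary of hybrid reasoning?"}],
)
print(chat.choices[0].message.content)

# Thinking mode: same weights, but the model reasons before answering.
reasoned = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
)
print(reasoned.choices[0].message.content)
```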
Cohere: Command A Reasoning
I was pretty sure Cohere had gotten lost somewhere, making too much money selling RAG to large enterprises to bother training large models, but it seems they're still on the ball! Despite getting surprisingly little coverage this week, Command A Reasoning looks remarkably good for a model targeted at enterprise self-hosting (it runs on fewer than two GPUs!). They're also giving the model away for free for non-commercial use.
It surpasses other privately deployable models in its class across key agentic and multilingual benchmarks to unlock real value for global enterprises.
- cohere (@cohere) August 21, 2025
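If you want to kick the tires before committing GPUs, the hosted API works too. A hedged sketch with Cohere's Python SDK - the model ID below is my guess at their naming scheme, so check their model list first:

```python
# Sketch: Command A Reasoning via Cohere's v2 chat API.
# The model ID is an assumption based on Cohere's usual naming
# convention; verify against their docs before using it.
import cohere

co = cohere.ClientV2(api_key="YOUR_COHERE_API_KEY")  # placeholder key

response = co.chat(
    model="command-a-reasoning-08-2025",  # assumed ID
    messages=[{"role": "user", "content": "Outline a rollout plan for an on-prem RAG pilot."}],
)
print(response.message.content[0].text)
```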
Image Generation News
Meta: Midjourney Partnership
I was under the impression that Midjourney just didn't really like money: they still don't offer an API for their models, and until recently they mostly operated through a Discord bot monetized (primarily) through an art magazine. But it seems Meta has now found their price point, partnering with the company to license "their aesthetic technology for (...) future models and products, bringing beauty to billions".
Google (?): Nano-Banana
While I was pretty sure we'd run into diminishing returns after the gpt-image-1 release (and it did seem like things were slowing down for a few weeks), someone (presumably Google) is back on LMArena with a new model that looks like another big improvement in character consistency and speed. I've seen some remaining artifacts and suboptimal realism in my tests (look at the laptop below!), but for the most part, this seems like a huge unlock! Either wait for the inevitable Google release (probably coming to AI Studio first?) or try your luck on LMArena.
Qwen Image Edit
Following last week's Qwen Image "base model", this week we saw the release of a variant focused specifically on image editing. In my tests, it seems comparable to the Ideogram model from last week but still worse than the new nano-banana model.
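For local tinkering, the weights are on Hugging Face. A sketch assuming a diffusers integration - both the QwenImageEditPipeline class name and the Qwen/Qwen-Image-Edit repo ID are my assumptions about how the release is exposed, so check the model card:

```python
# Hedged sketch: editing an image with Qwen Image Edit via diffusers.
# Pipeline class and repo ID are assumptions; verify on the model card.
import torch
from diffusers import QwenImageEditPipeline
from PIL import Image

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",   # assumed repo ID
    torch_dtype=torch.bfloat16,
).to("cuda")

source = Image.open("input.png").convert("RGB")
edited = pipe(
    image=source,
    prompt="replace the laptop with a vintage typewriter",
    num_inference_steps=50,
).images[0]
edited.save("edited.png")
```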

Runway: Game Worlds Beta
I initially assumed this would be another world model (as that would seem in character for Runway), but instead it turned out to be a very cool but very weird generative gaming experience that also appears to be a bit broken right now. Runway combines their image and video generation with a language model to provide dynamically generated interactive text adventures. This lands right at the intersection of "very neat intern project" and "potential to go mega viral". If you haven't tried it, give them a few days to fix it and then check it out here.
Grok 2: Public Weights
In the past, Elon has promised to open source old Grok models after the launch of the new generation. While slightly late this time around (Grok 4 is already out!), he has finally made good on that promise by releasing Grok 2.5. At first glance, this release sucks: not only is the model worse than what's already out there, it also comes with an extremely restrictive license and ships only as a specific checkpoint requiring 8 GPUs with >40GB of VRAM each - hence xAI catching some flak on Twitter. But I'd be remiss not to point out that they are the only big lab to actually open source their old foundation models once they're deprecated, so while not ideal, this is still better than the competition (see: Anthropic killing Claude 3 Opus and OpenAI spontaneously deprecating all their models). I only wish they'd also give us the image component!
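If you want the checkpoint anyway, it's a plain Hugging Face download - though the repo ID below is my assumption about xAI's naming, and the files run to several hundred GB:

```python
# Sketch: fetching the released Grok checkpoint with huggingface_hub.
# Repo ID is an assumption; the download is several hundred GB.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xai-org/grok-2",              # assumed repo ID, verify on the Hub
    local_dir="/data/checkpoints/grok-2",  # make sure you have the disk space
)
```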