Parameter Update: 2025-10
"R2 when?" edition

After the rush of releases in the past two weeks, this one felt almost calm by comparison. Nevertheless, there's still some cool stuff to dig into.
Manus
Dominating my timeline this week is Manus, another release coming out of China. Through what appears to be an ensemble of Claude 3.5 Sonnet and fine-tuned Qwen models, they're pushing the frontier of agents: long-term planning, interaction with UIs, and the ability to run code autonomously.
While they have a whole load of use cases listed on their site, and all the demos I've seen so far are really impressive (sometimes borderline unrealistic), I'm mostly excited about hopefully getting to try it myself at some point. With these types of systems, cherry-picking something that looks impressive is pretty easy, while getting something to work reliably (and without hallucinating some of the details on that stock analysis you want the thing to do) is decidedly not. In the meantime, I've enjoyed getting a feel for the vibe of what people are doing by stalking their share domain.
Google: AI Mode
If you thought the initial wave of backlash regarding problematic hallucinations in their "AI Overview" feature would dissuade Google from going full throttle on the AI train, you'd be decidedly wrong.
This week, we got a new "AI Mode" in Search, currently a limited opt-in preview for paying subscribers (still called "Google One AI Premium", by the way, in case you were wondering whether they've sorted the naming out yet), this time featuring "multimodality and advanced reasoning". While I would've assumed this would cause concern over at Perplexity, that assumption would've been dead wrong, as they were busy launching their own VC fund and partnering with Telekom on an "app-less AI phone". Interesting strategy; let's see if it pays off.
As I haven't tested the feature myself, I can't really comment on the specific implementation, which might not be entirely uninteresting given that the "Deep Research" feature in Gemini is usually the worst of all the "Deep Research" contenders out there. Perhaps more pressingly though, I'd enjoy any insight into how they expect to make this work financially, both for the sites whose content they're taking and for themselves: they'll be stuck paying for loads of inference compute if they ever think about rolling this out more broadly, which might be a stretch to make work with their current ad model.
Mistral: OCR
After just relaunching Le Chat as a proper European ChatGPT competitor, it seems Mistral is starting to realize they'd also like some corporate money, please. Their new OCR model solves some very hard problems and seems genuinely useful, but it's also a stark departure from their previous "let's just tweet out the torrent" release strategy, arriving instead with a blog post discussing how "advancements in information abstraction and retrieval have driven human progress". They're also only offering this model through the API and "on-premises deployment on a selective basis". Disappointing!
Alibaba: QwQ-32B
If the past two weeks have been about pushing the limits with larger and larger models, this week has been about seeing how small we can make a model and still get it to reason with RL.
The slightly bigger variant of this is QwQ-32B from Alibaba, which manages to squeeze benchmark performance somewhere between DeepSeek R1 (the big 671B one!) and OpenAI's o3-mini into a model you can reasonably run on your laptop!
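For a rough sense of what "run on your laptop" means here, a back-of-envelope calculation of the weight memory at common quantization levels (weights only — KV cache and runtime overhead come on top, so treat these as lower bounds):

```python
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Memory needed just to hold the weights of a quantized model.
    params_billion * 1e9 params, each stored in `bits` bits."""
    return params_billion * 1e9 * bits / 8 / 1e9  # simplifies to params_billion * bits / 8

# QwQ-32B at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(32, bits):.0f} GB")
# 16-bit: 64 GB, 8-bit: 32 GB, 4-bit: 16 GB
```

At 4-bit quantization the weights fit in roughly 16 GB, which is why a well-specced laptop with unified memory can plausibly run it.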
I've only tried this one briefly, and I am surprised how much harder it is getting to vibe-check these new releases, but this seems like a pretty big release that has been somewhat underdiscussed so far?
BlinkDL: RWKV7-G1
The second release in the "scaling down RL" story this week is even more insane to me. Released by BlinkDL, whom I had never heard of before, RWKV7-G1 is based on their own "transformer-free" architecture, which attempts to scale pure RNNs while matching transformer performance at massively reduced memory requirements.
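The memory argument behind this kind of architecture can be sketched in a few lines. This is a deliberately toy illustration, not the actual RWKV7 update rule (which uses a far more elaborate gated state): a transformer decoder has to keep a key/value cache that grows with every token, while an RNN folds each token into a fixed-size state.

```python
import numpy as np

d = 8  # toy hidden size

def attention_step(kv_cache, x):
    """Transformer-style decoding: append this token's key/value pair,
    then attend over the whole cache -- memory grows with sequence length."""
    kv_cache.append((x, x))  # toy: reuse x as both key and value
    keys = np.stack([k for k, _ in kv_cache])
    scores = keys @ x
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    values = np.stack([v for _, v in kv_cache])
    return weights @ values, kv_cache

def rnn_step(state, x, decay=0.9):
    """RNN-style decoding (the flavour RWKV scales up): fold the token
    into a fixed-size state -- memory stays constant at any length."""
    return decay * state + (1 - decay) * x

rng = np.random.default_rng(0)
tokens = rng.normal(size=(1000, d))

cache, state = [], np.zeros(d)
for x in tokens:
    _, cache = attention_step(cache, x)
    state = rnn_step(state, x)

print(len(cache))   # grows to 1000 entries -- O(T) memory
print(state.shape)  # stays (8,) -- O(1) memory
```

After 1000 tokens the attention cache holds 1000 entries while the RNN state is still a single vector; the open research question RWKV attacks is whether that constant-size state can carry enough information to match attention in quality.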
While we've seen headlines about transformer alternatives before, what caught me off guard is their RWKV7-G1 model, which, at just 0.1B parameters, manages honestly shocking levels of coherence for something the size of the original GPT-2!