Parameter Update: 2025-31

"model welfare" edition

It would seem we're all still recovering from the GPT-5 launch last week, so this week was very light.

OpenAI: GPT-5 Updates

After OpenAI was already forced to walk back a lot of the changes made to ChatGPT as part of the GPT-5 rollout, this week saw further concessions.

For one, OpenAI has updated the GPT-5 base model to have a "better personality", which is about as unspecific as release notes get. On the other hand, we also got the rest of the legacy models (o3, o4-mini, ...) back - in addition to GPT-5-mini/thinking-mini, higher rate limits for GPT-5, and a clearer Auto/Fast/Thinking switcher.

All in all, the GPT-5 launch was not what most people were hoping for, but at least the iteration speed seems extremely high and ChatGPT feels like a better deal than it has for a while.

Anthropic

Model Welfare

As part of what they are calling "potential model welfare", Anthropic has given Claude the option to end "persistently harmful and abusive conversations" itself, making them un-resumable by users. Not sure what to make of this - I see the general idea, but if taken literally, isn't this the equivalent of Claude killing itself?

One Million Token Context Limit

Besides model welfare, we also (finally!) got a larger context window for Claude over the API. However, they gave it to us in the most Anthropic way possible: hidden behind a feature flag while also charging extra for any request beyond 200,000 tokens. Looking forward to this being available in Claude Code - the context compacting is by far my biggest annoyance with it as it stands.
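
For reference, opting in looks roughly like the sketch below with the Python SDK. The exact beta flag name and model ID are my assumptions from the announcement, so double-check them before relying on this:

```python
# Minimal sketch: requesting the extended context window over the Anthropic API.
# The beta flag ("context-1m-2025-08-07") and model ID are assumptions, not verified.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",      # assumed model ID
    max_tokens=1024,
    betas=["context-1m-2025-08-07"],       # assumed opt-in flag for the 1M window
    messages=[{"role": "user", "content": "Summarize this very long document: ..."}],
)
print(response.content[0].text)
```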

Google: Gemma 270M

Given that a tiny LM that could run directly in a user's browser would be really cool for a variety of use cases, I was extremely excited about Google's new Gemma 270M model. That excitement lasted right up until I actually gave it a single prompt, which it failed to respond to coherently. They claim it's good for fine-tuning (which might be true?), but given that 170M of the parameters are just embeddings, I will continue to argue that while the model may be good for its size, it's just not a very good model, period.
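
A quick back-of-the-envelope check on that embedding claim - vocabulary size and hidden dimension here are my assumptions, not official numbers, but they show how a large vocab eats up a tiny model's parameter budget:

```python
# Rough arithmetic: embedding table size for a tiny LM with a large vocabulary.
# vocab_size and hidden_dim are assumed values, not official Gemma 270M specs.
vocab_size = 256_000   # assumed
hidden_dim = 640       # assumed

embedding_params = vocab_size * hidden_dim
print(f"~{embedding_params / 1e6:.0f}M parameters in the embedding table alone")
# -> roughly 160-170M, i.e. the bulk of a 270M-parameter model
```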

Meta

TRIBE

The winning entry of this year's Algonauts challenge, TRIBE is a 1B-parameter model trained to predict brain responses to stimuli across multiple modalities, cortical areas, and individuals, relying on a combination of pretrained representations from existing foundation models.

GitHub - facebookresearch/algonauts-2025: Training and evaluating encoding models to predict fMRI brain responses to naturalistic video stimuli

DINO v3

Perhaps more exciting for a general audience, DINO v3 is a "universal backbone" for image models, pretrained on huge amounts of visual imagery, including satellite data. In practice, that means the model can serve as the basis for a wide variety of use cases via transfer learning (i.e., freezing the backbone's weights and adding a small "adapter" on top). For many of those use cases, DINO v3 achieves SOTA results, surprisingly outperforming specialized models.
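
To make the frozen-backbone-plus-adapter pattern concrete, here's a minimal sketch. The torch.hub entry point and feature dimension are assumptions on my part; the actual DINOv3 release may expose the model differently:

```python
# Sketch of transfer learning with a frozen backbone and a small trainable head.
# The hub repo/entry point ("facebookresearch/dinov3", "dinov3_vitb16") and the
# 768-dim feature size are assumptions, not confirmed details of the release.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vitb16")  # assumed entry point
for p in backbone.parameters():
    p.requires_grad = False  # freeze the universal backbone

head = nn.Linear(768, 10)  # small task-specific "adapter", e.g. a 10-class classifier

def forward(images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        features = backbone(images)  # frozen features from the pretrained backbone
    return head(features)            # only this small head gets trained
```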

DINOv3
DINOv3 scales self-supervised learning (SSL) for images to produce our strongest universal vision backbones, enabling breakthrough performance across diverse domains.

ARC-AGI: HRM Update

Given I did a bit of a deeper dive on the Hierarchical Reasoning Model architecture a few weeks ago, I felt obligated to follow up on it now. ARC-AGI ran the numbers on their test set and, as it turns out, the model performed effectively just as well as a plain transformer trained on the same data, without any of the "neurobiology-inspired" design tweaks. Still a neat party trick, but no revolution.

World Humanoid Robot Games

Really only including this because the video material is really cool, but here it is: China hosted the world's first World Humanoid Robot Games in Beijing last week, with participants from 16 countries including Germany, the US, and Japan.