Parameter Update: 2025-31

"model welfare" edition

It would seem we're all still recovering from the GPT-5 launch last week, so this week was very light.

OpenAI: GPT-5 Updates

After OpenAI was already forced to walk back a lot of the changes made to ChatGPT as part of the GPT-5 rollout, this week saw further concessions.

For one, OpenAI has updated the GPT-5 base model to have a "better personality", which is about as unspecific as release notes get. On the other hand, we also got the rest of the legacy models (o3, o4-mini, ...) back - in addition to GPT-5-mini/thinking-mini, higher rate limits for GPT-5, and a clearer Auto/Fast/Thinking switcher.

All in all, the GPT-5 launch was not what most people were hoping for, but at least the iteration speed seems extremely high and ChatGPT feels like a better deal than it has for a while.

Anthropic

Model Welfare

As part of what they are calling "potential model welfare", Anthropic has given Claude the option to end "persistently harmful and abusive conversations" itself, making them un-resumable by users. Not sure what to make of this - I see the general idea, but if taken literally, isn't this the equivalent of Claude killing itself?

One Million Token Context Limit

Besides model welfare, we also (finally!) got a larger context window for Claude over the API. However, they gave it to us in the most Anthropic way possible: hidden behind a feature flag while also charging extra for any request beyond 200,000 tokens. Looking forward to this being available in Claude Code - the context compacting is by far my biggest annoyance with it as it stands.
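
For reference, opting in looks roughly like the sketch below with the Python SDK. The exact beta flag name and model ID are my assumptions from the announcement, so double-check them before relying on this:

```python
# Minimal sketch: requesting the extended context window over the Anthropic API.
# The beta flag ("context-1m-2025-08-07") and model ID are assumptions, not verified.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",      # assumed model ID
    max_tokens=1024,
    betas=["context-1m-2025-08-07"],       # assumed opt-in flag for the 1M window
    messages=[{"role": "user", "content": "Summarize this very long document: ..."}],
)
print(response.content[0].text)
```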

Google: Gemma 270M

Given that a tiny LM that could run directly in a user's browser would be really cool for a variety of use cases, I was extremely excited about Google's new Gemma 270M model. That excitement lasted right up until I actually gave it a single prompt, which it failed to respond to coherently. They claim it's good for fine-tuning (which might be true?), but given that 170M of the parameters are just embeddings, I will continue to argue that while the model may be good for its size, it's just not a very good model, period.
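
A quick back-of-the-envelope check on that embedding claim - vocabulary size and hidden dimension here are my assumptions, not official numbers, but they show how a large vocab eats up a tiny model's parameter budget:

```python
# Rough arithmetic: embedding table size for a tiny LM with a large vocabulary.
# vocab_size and hidden_dim are assumed values, not official Gemma 270M specs.
vocab_size = 256_000   # assumed
hidden_dim = 640       # assumed

embedding_params = vocab_size * hidden_dim
print(f"~{embedding_params / 1e6:.0f}M parameters in the embedding table alone")
# -> roughly 160-170M, i.e. the bulk of a 270M-parameter model
```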

Meta

TRIBE

The winning entry of this year's Algonauts challenge, TRIBE is a 1B-parameter model trained to predict brain responses to stimuli across multiple modalities, cortical areas, and individuals, relying on a combination of pretrained representations from existing foundation models.

GitHub - facebookresearch/algonauts-2025: Training and evaluating encoding models to predict fMRI brain responses to naturalistic video stimuli

DINO v3

Perhaps more exciting for a general audience, DINO v3 is a "universal backbone" for image models, pretrained on huge amounts of visual imagery, including satellite data. In practice, that means the model can serve as the basis for a wide variety of use cases via transfer learning (i.e., freezing the backbone's weights and adding a small "adapter" on top). For many of those use cases, DINO v3 achieves SOTA results, surprisingly outperforming specialized models.
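
To make the frozen-backbone-plus-adapter pattern concrete, here's a minimal sketch. The torch.hub entry point and feature dimension are assumptions on my part; the actual DINOv3 release may expose the model differently:

```python
# Sketch of transfer learning with a frozen backbone and a small trainable head.
# The hub repo/entry point ("facebookresearch/dinov3", "dinov3_vitb16") and the
# 768-dim feature size are assumptions, not confirmed details of the release.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vitb16")  # assumed entry point
for p in backbone.parameters():
    p.requires_grad = False  # freeze the universal backbone

head = nn.Linear(768, 10)  # small task-specific "adapter", e.g. a 10-class classifier

def forward(images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        features = backbone(images)  # frozen features from the pretrained backbone
    return head(features)            # only this small head gets trained
```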

DINOv3
DINOv3 scales self-supervised learning (SSL) for images to produce our strongest universal vision backbones, enabling breakthrough performance across diverse domains.

ARC-AGI: HRM Update

Given I did a bit of a deeper dive on the Hierarchical Reasoning Model architecture a few weeks ago, I felt obligated to follow up on it now. ARC-AGI ran the numbers on their test set and, as it turns out, the model performed effectively just as well as a plain transformer trained on the same data, without any of the "neurobiology-inspired" design tweaks. Still a neat party trick, but no revolution.

World Humanoid Robot Games

Really only including this because the video material is really cool, but here it is: China hosted the world's first World Humanoid Robot Games in Beijing last week, with participants from 16 countries including Germany, the US, and Japan.