Parameter Update: 2026-06

"claudex" edition

Two good model drops this week - I wonder when Google is going to drop theirs!

Anthropic

Claude Opus 4.6

After last week's speculation about Sonnet 5 dropping this week, I will admit to being slightly disappointed that we instead got Opus 4.6. It's still a nice upgrade (extremely impressive by some metrics), but it also feels like... Anthropic repackaged what could have been Sonnet 5 with a different name to sell it for a much higher price? Anyway, it's apparently much better at knowledge work tasks and can work in "agent teams" in Claude Code.

In their testing, Anthropic used it to build a C compiler (though it apparently has some really funny issues?). Notably, this is (as far as I can tell) the first time Apollo Research couldn't complete safety testing as the model kept figuring out it was being tested?

Either way, Opus 4.6 seems to be... not thrilled about the whole thing:

Fast Mode

Days after launching Opus 4.6, Anthropic followed up with a new Fast Mode - same intelligence, but at 2.5x the speed (and 6x-12x the price!). Having briefly tried it - it's really, really fun and feels like a surprisingly big unlock. But as a student with limited financial resources, I ran through $50 of credits in minutes before realizing it and turning the whole thing off again. Really excited about getting something along these lines at 5% of the price in a few months.
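To get a feel for why the credits evaporate, here's a back-of-the-envelope sketch of the trade-off. Only the 2.5x speed and 6x-12x price multipliers come from the announcement; the base price and token count below are made-up placeholders, not Anthropic's actual pricing.

```python
# Rough Fast Mode math: same tokens, 6x-12x the price, ~2.5x the speed.
BASE_PRICE_PER_MTOK = 25.0   # assumed $/1M output tokens on regular Opus (placeholder)
TOKENS_USED = 2_000_000      # assumed tokens burned in one long agent session (placeholder)

base_cost = BASE_PRICE_PER_MTOK * TOKENS_USED / 1_000_000
for multiplier in (6, 12):
    print(f"{multiplier}x pricing: ${base_cost * multiplier:,.0f} "
          f"instead of ${base_cost:,.0f}, delivered ~2.5x faster")
```

In other words: you pay 6x-12x more per token to finish the same work about 2.5x sooner - fine for a demo, brutal on a student budget.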

Ad drama

Alongside the model launch drama, last week was dominated by discourse around Anthropic's Super Bowl ad mocking OpenAI for... introducing ads in ChatGPT?

The good news around this is that, alongside the ad, Anthropic also committed to not introducing ads in Claude in the future - a commitment that I sure hope they actually keep. The ad itself is quite funny, and also made Altman extremely mad:

"More Texans use ChatGPT for free than total people use Claude in the US" is quote of the week as far as I am concerned. Right next to these two gems that I am sure will age like fine wine:

OpenAI

Codex app

I always thought it was a bit ironic that the primary UX for running current AI agents was antiquated-feeling CLIs. Why exactly am I subjecting myself to that mess? Is it just because it fulfills some people's "hackerman" fantasies?

Well, it would appear that OpenAI agrees. This week, they launched their Codex app (yes, that is yet another product named Codex!), a tool to run concurrent agents, with built-in git worktrees, Skills, and even automations. In my testing, it still feels slightly rough around the edges (for example: why does the terminal keep resetting?) but as these agent runtimes keep getting longer, I am grateful to move towards a UX that actually feels tailored to letting them run for a while.
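For the curious, the core mechanic behind "concurrent agents on git worktrees" is simple enough to sketch. This is not the Codex app's actual implementation - `run_agent` and the task names are hypothetical stand-ins - just an illustration of how worktrees give each agent its own branch and working directory.

```python
# Minimal sketch: one git worktree + branch per concurrent agent task.
# `run_agent` is a hypothetical CLI stand-in for whatever drives the agent.
import subprocess
from pathlib import Path

REPO = Path(".")                              # main checkout
TASKS = ["fix-flaky-test", "migrate-config"]  # hypothetical task names

def spawn_agent(task: str) -> subprocess.Popen:
    worktree = REPO / ".worktrees" / task
    # Each agent edits its own working directory, so concurrent runs never
    # stomp on each other's files or index state.
    subprocess.run(
        ["git", "worktree", "add", "-b", f"agent/{task}", str(worktree)],
        check=True,
    )
    return subprocess.Popen(["run_agent", "--task", task], cwd=worktree)

processes = [spawn_agent(task) for task in TASKS]
for process in processes:
    process.wait()
```

The nice part is that every run ends up on its own agent/&lt;task&gt; branch, so reviewing and merging whatever the agents produced stays ordinary git work.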

GPT-5.3-Codex

Twenty-three minutes after the Opus 4.6 launch, OpenAI dropped GPT-5.3-Codex. Since then, my timeline has yet to reach a consensus on which model is better. In my testing, Codex feels better at communicating intent before taking action, but is still notably worse at design. It also, at times, feels less thorough than GPT-5.2. I will still take the upgrade though, given the much improved token efficiency.

All that being said, I also really appreciate the large price difference compared to Opus (as far as we can tell, anyway - we aren't getting API access yet!), and the fact that they just... made the whole thing 40% faster overnight for everyone, for free?

Closed loop biolab experiments

While Anthropic's safety work usually feels borderline self-indulgent (recent Opus launch notwithstanding), OpenAI appears to be moving towards a more Zuckerbergian attitude of "move fast and break things". Last week, for example, they announced they gave GPT-5 access to an autonomous biolab to let it run novel biotech experiments - just to see what it could do.

For now, this appears to have paid off - they claim a 40% production cost reduction. But then again, the first couple meters of a slippery slope probably feel like fun sledding?

Mistral: Voxtral Transcribe 2

Mistral is still alive! And still working on their own models! After Aleph Alpha's full disintegration into a consultancy (they used to be Germany's biggest hope for AI independence), I honestly expected Mistral to be next. Therefore, I am very happy to report they are still kicking around, and their newest transcription model actually looks very good.

The other interesting finding from this announcement is that speaker diarization is still comparatively unsolved, with even their new model reporting a >30% error rate? That feels extremely high!
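For context on what that number means: diarization is usually scored with diarization error rate (DER), which adds up missed speech, false alarms, and speaker confusion over the total reference speech time. The durations below are invented purely to illustrate the formula - they are not Mistral's figures, and their reported number may well use a slightly different metric.

```python
# Diarization error rate: (missed + false alarm + speaker confusion) / total speech.
def diarization_error_rate(missed_s: float, false_alarm_s: float,
                           confusion_s: float, total_speech_s: float) -> float:
    return (missed_s + false_alarm_s + confusion_s) / total_speech_s

# Invented example: 60 s of reference speech, 6 s missed, 4 s falsely detected,
# 9 s assigned to the wrong speaker -> ~31.7% DER.
print(f"{diarization_error_rate(6, 4, 9, 60):.1%}")
```

Read that way, >30% means roughly a third of the speech time is missed, falsely detected, or pinned on the wrong speaker - which does indeed feel extremely high.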