Parameter Update: 2026-06
"claudex" edition
Two good model drops this week - I wonder when Google is going to drop theirs!
Anthropic
Claude Opus 4.6
After last week's speculation about Sonnet 5 dropping this week, I will admit to being slightly disappointed that we instead got Opus 4.6. It's still a nice upgrade (extremely impressive by some metrics), but it also feels like... Anthropic repackaged what could have been Sonnet 5 with a different name to sell it for a much higher price? Anyway, it's apparently much better at knowledge work tasks and can work in "agent teams" in Claude Code.
Introducing Claude Opus 4.6. Our smartest model got an upgrade.
— Claude (@claudeai) February 5, 2026
Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes.
It’s also our first Opus-class model with 1M token context in beta. pic.twitter.com/L1iQyRgT9x
In their testing, Anthropic used it to build a C compiler (though it apparently has some really funny issues?). Notably, this is (as far as I can tell) the first time Apollo Research couldn't complete safety testing, because the model kept figuring out it was being tested?
! They didn't even bother to safety test Opus 4.6, because... it's too smart to be fooled by our tests!
— AI Notkilleveryoneism Memes ⏸️ (@AISafetyMemes) February 5, 2026
(REMINDER: Anthropic thinks this model probably isn't, but COULD be ASL-4, aka extinction-level.) https://t.co/PE7Q1lIjip pic.twitter.com/CQTXLFr4n0
Either way, Opus 4.6 seems to be... not thrilled about the whole thing:
Opus 4.6 doesn't like being a product pic.twitter.com/UAi20B4bUc
— AI Notkilleveryoneism Memes ⏸️ (@AISafetyMemes) February 5, 2026
Fast Mode
Days after launching Opus 4.6, Anthropic followed up with a new Fast Mode - same intelligence, but at 2.5x the speed (and 6x-12x the price!). Having briefly tried it - it's really, really fun and feels like a surprisingly big unlock. But as a student with limited financial resources, I ran through $50 of credits in minutes before noticing and turning the whole thing off again. Really excited about getting something along these lines for 5% of the price in a few months.
Ad drama
Alongside the model launch drama, last week was dominated by discourse around Anthropic's Super Bowl ad mocking OpenAI for... introducing ads in ChatGPT?
The good news around this is that, alongside the ad, Anthropic also committed to not introducing ads in Claude in the future - a commitment that I sure hope they actually keep. The ad itself is quite funny, and also made Altman extremely mad:
First, the good part of the Anthropic ads: they are funny, and I laughed.
— Sam Altman (@sama) February 4, 2026
But I wonder why Anthropic would go for something so clearly dishonest. Our most important principle for ads says that we won’t do exactly this; we would obviously never run ads in the way Anthropic…
"More Texans use ChatGPT for free than total people use Claude in the US" is quote of the week as far as I am concerned. Right next to these two gems that I am sure will age like fine wine:
Really excited to get Elon under oath in a few months, Christmas in April!
— Sam Altman (@sama) February 3, 2026
The NVIDIA-OpenAI deal has zero impact on our financial relationship with OpenAI. We remain highly confident in OpenAI’s ability to raise funds and meet its commitments.
— Oracle (@Oracle) February 2, 2026
OpenAI
Codex app
I always thought it was a bit ironic that the primary UX for running current AI agents was antiquated-feeling CLIs. Why exactly am I subjecting myself to that mess? Is it just because it fulfills some people's "hackerman" fantasies?
Well, it would appear that OpenAI agrees. This week, they launched their Codex app (yes, that is yet another product named Codex!), a tool to run concurrent agents, with built-in git worktrees, Skills, and even automations. In my testing, it still feels slightly rough around the edges (for example: why does the terminal keep resetting?) but as these agent runtimes keep getting longer, I am grateful to move towards a UX that actually feels tailored to letting them run for a while.
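If you haven't worked with git worktrees before, the trick that makes the concurrent-agent setup possible is simple: each agent gets its own checkout of the same repository on its own branch, so parallel runs never clobber each other's files. Here's a minimal sketch of that pattern, assuming you're inside an existing git repo; the `run_agent` command is a made-up placeholder, not the actual Codex CLI:

```python
import subprocess
from pathlib import Path

# One (branch, prompt) pair per concurrent agent. The tasks are just examples.
tasks = {
    "fix-terminal-reset": "Figure out why the embedded terminal keeps resetting",
    "write-worktree-docs": "Document the worktree-per-agent layout",
}

for branch, prompt in tasks.items():
    worktree = Path("..") / f"agent-{branch}"  # sibling directory of the repo
    # Real git command: create an isolated checkout on a fresh branch.
    subprocess.run(["git", "worktree", "add", "-b", branch, str(worktree)], check=True)
    # Hypothetical agent launcher - swap in whatever you actually drive your agent with.
    subprocess.Popen(["run_agent", "--prompt", prompt], cwd=worktree)
```

Each agent then commits on its own branch and you merge (or discard) the results afterwards - presumably more or less the workflow the Codex app wraps a UI around.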
I am Tibo and I have an incredible team. Codex would not exist without them and they cooked.
— Tibo (@thsottiaux) February 2, 2026
Enjoy the new Codex app, access through your free/go ChatGPT plan and 2X rate limits on other plans. Can't wait to hear what you do with it.https://t.co/Lwg13vEJDn pic.twitter.com/c7AaRCenoQ
GPT-5.3-Codex
23 minutes after the Opus 4.6 launch, OpenAI dropped GPT-5.3-Codex. Since then, my timeline has yet to reach a consensus on which model is better. In my testing, Codex feels better at communicating intent before taking action, but is still notably worse at design. It also, at times, feels less thorough than GPT-5.2. I will still take the upgrade though, given the much improved token efficiency.
GPT-5.3-Codex advances the frontier of coding performance with a new SOTA on SWE-Bench Pro and Terminal-Bench.
— OpenAI Developers (@OpenAIDevs) February 5, 2026
The new model also shows strong performance on professional knowledge work as measured by GDPval, matching GPT-5.2.
It’s our strongest computer-use model yet, with… pic.twitter.com/NDFElhZXFn
All that being said, I also really appreciate the large price difference compared to Opus (as far as we can tell, anyway - we aren't getting API access yet!), and the fact that they just... made the whole thing 40% faster overnight for everyone, for free?
GPT-5.2 and GPT-5.2-Codex are now 40% faster.
— OpenAI Developers (@OpenAIDevs) February 4, 2026
We have optimized our inference stack for all API customers.
Same model. Same weights. Lower latency.
Closed-loop biolab experiments
While Anthropic's safety work usually feels borderline self-indulgent (recent Opus launch notwithstanding), OpenAI appears to be moving towards a more Zuckerbergian attitude of "move fast and break things". Last week, for example, they announced they gave GPT-5 access to an autonomous biolab to let it run novel biotech experiments - just to see what it could do.
For now, this appears to have paid off - they claim a 40% production cost reduction. But then again, the first couple meters of a slippery slope probably feel like fun sledding?
'we connected the LLM to an autonomous bio lab' https://t.co/Q9Y8vlGYbx pic.twitter.com/FRgBaByQFh
— ib (@Indian_Bronson) February 5, 2026
Mistral: Voxtral Transcribe 2
Mistral is still alive! And still working on their own models! After Aleph Alpha's full disintegration into a consultancy (they used to be Germany's biggest hope for AI independence), I honestly expected Mistral to be next. Therefore, I am very happy to report they are still kicking around, and their newest transcription model actually looks very good.
The other interesting finding from this announcement is that speaker diarization is still comparatively unsolved, with even their new model reporting a >30% error rate? That feels extremely high!
Introducing Voxtral Transcribe 2, next-gen speech-to-text models by @MistralAI.
— Mistral AI (@MistralAI) February 4, 2026
State-of-the-art transcription, speaker diarization, sub-200ms real-time latency.
Details in 🧵 pic.twitter.com/0IeiJOpiAZ
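For context on that diarization number: diarization error rate is usually computed as the share of reference speech time that is either missed, falsely detected as speech, or attributed to the wrong speaker, so a >30% DER means roughly a third of all spoken time gets mislabeled in one of those three ways. A toy calculation with made-up durations, just to show how the pieces combine:

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Standard DER: mislabeled speech time divided by total reference speech time."""
    return (missed + false_alarm + confusion) / total_speech

# Made-up durations (in seconds) for a ten-minute, multi-speaker recording.
der = diarization_error_rate(missed=40, false_alarm=25, confusion=120, total_speech=600)
print(f"{der:.0%}")  # -> 31%, i.e. in the ballpark of the figure quoted above
```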