Parameter Update: 2025-33

"animals" edition

I was expecting this week to be a bit slow apart from the Google image model, but there's actually a lot going on - wow!

Google: nano-banana

As expected, Google finally launched their new "nano-banana" image model this week, officially called "Gemini 2.5 Flash Image" (I won't even complain about the name, I'm just glad they're not calling things "Gemini 2.0 Flash with Image Generation Capabilities" any more). And what can I say? It's a really good model! While their marketing seems a bit confused about what to call it (though they are sure it's "leaderboard topping", as their emails pointed out 15 times), it's also by far the most capable image model we have seen yet, especially when it comes to consistency. For all intents and purposes, this is a conversational image editor.

You can still expect some limitations when it comes to longer text, and I have seen some images of people that look a bit artificial, but it's nevertheless extremely cool that they are giving us this capability at a very good price point (~3ct per image)!

Model launches

Microsoft: MAI-1-Preview

Remember when Microsoft and OpenAI seemed inseparable? Well, it's been clear for a while now that those times are over, but this week we got another reminder of just how over they really are, with Microsoft announcing their own LLM, MAI-1-preview (now available in LMArena), and their own text-to-speech model, MAI-voice-1 (now live in parts of Copilot). While neither of the two models is record-breaking, they are already pretty competitive for the amount of compute they used and (more importantly) clearly serve to de-risk more expensive training runs down the line.

Nous Research: Hermes 4

After their detour into distributed model training, I didn't expect to hear from Nous Research again so soon. But it turns out they were cooking! Hermes 4 doesn't claim to be state-of-the-art on most benchmarks, but it is a modern hybrid reasoning model that doesn't originate from any of the big labs, while also being tuned for a lower refusal rate and a more "user-aligned" personality (whatever that means).

Grok Code Fast 1

After the slow and expensive Grok 4 models, this week we got the much faster and much cheaper Grok Code Fast 1. While usage seems pretty high right now (presumably mostly because it's still free in a bunch of coding assistants), and it's doing well in some benchmarks, my experience with it has mostly been this:

LongCat Flash

If you thought that Europe had any chance of keeping up with international model training, here's another sign of how hard that will be: After Alibaba decided to get into food delivery, Chinese food delivery platform Meituan (i.e., Chinese Deliveroo/Lieferando) just launched their first public LLM, "LongCat Flash", which happens to be surprisingly competitive, fully open weight and seemingly designed mostly to spite Alibaba's Qwen models? Either way, the tech report is a very interesting read!

OpenAI

Realtime API GA + gpt-realtime

After being stuck in preview forever, the OpenAI Realtime API has finally moved to general availability. With it, we got a new model (gpt-realtime), which sounds more natural, is a bit smarter, better at instruction following, and also 20% cheaper. We also got some dev improvements like SIP integration and support for EU Data Residency (still waiting on that one to be available for us, OpenAI!).
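For anyone who hasn't touched the Realtime API before: it's driven by JSON events sent over a WebSocket, and you pick the model and voice via a session configuration event. A minimal sketch of building such an event - field names here follow the preview-era shape and are my assumptions, so check the GA docs before copying:

```python
import json

def build_session_update(voice: str, instructions: str) -> str:
    """Build a (hypothetical) 'session.update' event for the Realtime API.

    The overall structure (a typed event wrapping a 'session' object) matches
    the preview protocol; exact field names may differ in the GA schema.
    """
    event = {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime",          # the new GA model
            "voice": voice,                    # e.g. "alloy"
            "instructions": instructions,      # system-style guidance
            "modalities": ["audio", "text"],   # what the model may emit
        },
    }
    # The client would send this string over the WebSocket connection.
    return json.dumps(event)

payload = build_session_update("alloy", "You are a concise voice assistant.")
print(json.loads(payload)["type"])  # session.update
```

In a real client this string would be sent right after opening the WebSocket, before streaming any audio.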

Codex Updates

Just in case you weren't confused yet by the 500 products all named Codex, this week OpenAI added another one - an IDE plugin for VS Code and related forks (Cursor, Windsurf, ...). Behind the curtain, this uses the same Codex CLI my timeline has been pretty happy with for a while now, but the UI seems much more approachable. I'm a bit surprised to see this so quickly after their recent collaboration with Cursor (as this seems like a pretty direct attack), but from what I can tell, Codex on the $20/month plan is one of the better deals out there right now, so I'll take what I can get.

Anthropic

Claude for Chrome (+ discussion of agentic browsing)

Just last week, there was a small piece of drama I decided not to cover, when Brave (the browser) called out a prompt injection vulnerability in Perplexity's new Comet browser. While the whole thing also seemed slightly self-interested, given that Brave is rumoured to be working on their own AI browser, it is important that these things are handled competently.

This week, Anthropic launched Claude for Chrome - an extension of their Computer Use feature, designed to integrate directly into Chrome and let the AI take actions within that environment. The majority of the announcement was focussed on discussing limitations and hardening against attacks, and the initial pilot will be live for just 100 users (with Max plan subscribers able to sign up for a waitlist). While this sucks for everyone not willing to shell out at least $100/month, it certainly feels like the much more grown-up way of handling this type of application.

In general, it seems like most companies now agree that agentic browsers (or at least "browsing AI") are a frontier they want to push. Google has AI in Search, OpenAI has Deep Research, Agent and search grounding during reasoning (I will sometimes see o3 perform 20-30 page retrievals, with Deep Research doing even more!), and Perplexity was early to the party for once. Against that backdrop, I was very interested to hear about Browserbase collaborating with Cloudflare on their "Web Bot Auth". After calling out Perplexity's behavior a little while ago, this seems like a way for Cloudflare to protect itself and its customers (AI crawlers are undoubtedly creating loads of traffic!). On the other hand, it is also a very convenient framing for a potential new revenue channel: If Reddit's training data is worth at least $60 million, how much would people be willing to pay Cloudflare (who are "protecting" ~20% of all internet traffic!)?

Terms of Service Change

Just a short note: After cutting down Max plan usage a couple of weeks ago, Anthropic pulled another change on us: By default, training data will now be retained for up to five years (explicitly to be used for training future models). As far as I can tell, this is designed as an explicit opt-in in the EU (nice!) but also applies to code submitted via Claude Code (boo!). Given how quickly Codex is catching up, Anthropic could really use a win right now, and this is not it.