Parameter Update: 2025-06

"ge-mini?" edition

Gemini 2.0

The biggest news this week: Google is finally expanding access to its Gemini 2.0 model series. In practice, this means a few things:

  • Gemini 2.0 Flash moving to general availability: After months of terribly named, extremely rate-limited, but equally brilliant model variants, this is long overdue.
  • Gemini 2.0 Flash Lite (public preview): Advertised as a "workhorse model", this is a replacement for Gemini 1.5 Flash (did anyone use that?).
  • Gemini 2.0 Pro (experimental): This is the first time we're getting public access to one of the "big" next-gen models from the labs. As such, I was particularly excited about what this release might indicate about where traditional scaling laws are going.

After using the models for a few days, I have mixed feelings about this release.
The good: While the Flash model is great by itself, I am especially excited about people building PDF-parsing / RAG killers with it. That still feels like an unsolved problem to me, and one these models could be a great fit for (I am thinking of the combination of low hallucinations, good visual capabilities, and long context). That's great!
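
For the curious, here is a minimal sketch of what that could look like against the Gemini API. This assumes the google-genai Python SDK; the file name, prompt, and API key are placeholders I made up, not a recipe from the release notes:

```python
# Minimal sketch: feeding a PDF straight to Gemini 2.0 Flash instead of
# running a separate OCR / chunking / embedding pipeline.
import pathlib

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Hypothetical input file; small PDFs can be passed inline as bytes.
pdf_bytes = pathlib.Path("paper.pdf").read_bytes()

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
        "Answer from the document only: what are the key claims, "
        "and on which pages do they appear?",
    ],
)
print(response.text)
```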

The bad: The Pro model, on the other hand, feels like the pinnacle of traditional (non-reasoning) LLMs. While the benchmark scores are mostly just "alright" (except for almost matching o1 in coding?), it has very good vibes in practice. Nevertheless, people have been hyping up the next generation of LLMs for years now, with Altman most recently affirming his belief that GPT-5 will be smarter than him. A lot of this hype has been based on the idea of traditional scaling laws holding, i.e. that model capabilities keep increasing as you scale up the data (and compute) used in self-supervised pretraining. Now, one would assume Google spent a lot of time attempting exactly this scaling - which requires enormous investments in both data and compute - to build 2.0 Pro. That all these efforts converged into a model with okay benchmarks and pretty good vibes doesn't negate the accomplishment, but it does explain why OpenAI has been very transparent about putting their eggs primarily into the test-time compute (TTC) basket.
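
For context, the "traditional scaling laws" in question are the Chinchilla-style pretraining curves (Hoffmann et al., 2022), under which loss falls off as a power law in both parameter count and training tokens:

```latex
% Chinchilla-style pretraining loss (Hoffmann et al., 2022):
% N = model parameters, D = training tokens;
% E, A, B, \alpha, \beta are empirically fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The worry driving the TTC pivot is simply that once N and D are already enormous, those power-law terms shrink very slowly for each additional dollar of pretraining.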

The ugly: Despite the fanfare made about them a few months ago, we're not getting any major news on the Thinking variant or access to any new modalities just yet. Wake me up when I can finally reproduce the image editing demo.

Besides these three models, we also got the release of 2.0 Flash on the primary consumer Gemini site (which, for some reason, always seems to lag behind AI Studio in terms of model access, feature set, UX, ...?).
What's even cooler, though, is the release of the "Gemini 2.0 Flash Thinking Experimental with Tools" (naming lmao) model variant. Can anyone tell me why they're launching an experimental model on the consumer site? Anyway, days after blocking OpenAI's agents from accessing YouTube, Google is now flexing their ecosystem even more in this release, allowing for some really impressive demos like summarizing 5-hour-long video essays within a few seconds. Unfortunately, they seem to be clamping down quite hard on the compute time allowed for thinking, which limits the usefulness of this model in practice. It's easy to imagine how cool a "Pro" version of this would be, though.
Edit: Turns out this is just using the subtitles from YouTube now? I swear it used to just throw the whole video in there wholesale? ☹️
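
If you want to poke at video understanding outside the consumer app, the Gemini API also accepts YouTube URLs directly. A minimal sketch, assuming the google-genai SDK and a placeholder URL (and no promises on whether this path looks at the full video or just the subtitles either):

```python
# Minimal sketch: asking Gemini to summarize a YouTube video by URL.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

video_url = "https://www.youtube.com/watch?v=VIDEO_ID"  # placeholder URL

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=types.Content(parts=[
        types.Part(file_data=types.FileData(file_uri=video_url)),
        types.Part(text="Summarize the main arguments of this video."),
    ]),
)
print(response.text)
```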

OpenAI: o3-mini Updates

After Altman announced it in a Reddit AMA, ChatGPT now shows a better, more comprehensive summary of the CoT created during reasoning. Unfortunately, this still feels slightly condescending, as it's not the raw tokens - OpenAI must still be extremely afraid of distillation - but a slightly cringe summary. The prompt used to create the summary proceeded to leak, making the whole thing funnier still. Despite knowing the thoughts aren't real in any sense of the word, I am scared to admit that I still cheer for o3-mini when it says things like "Wowser!" or "What a doozy!" while poring through my shitty code.

Anthropic: Jailbreak competition

After announcing a prompt jailbreaking competition for their new "8 layer defense system" last week, which later also received a cash prize, Anthropic has been subjected to some very fun mockery and pretty good points by Pliny (this is the guy who may have just pulled off the first data poisoning attack on a major LLM).

So far, the competition seems to have mostly brought to light some really annoying false positives, before concluding with someone breaking all lines of defense. lol.

Mistral: New Le Chat

Just one week after wondering how Mistral planned to compete and/or make any money, I am now eating my words. After soft-relaunching their "Le Chat" platform, they seem to somehow have pulled together everything it takes to reach critical mass. Their new model is now accelerated by Cerebras, which makes for an amazingly fun experience (similar to the "I am speed" moment we all got from Groq a few months ago). The platform also does image generation with Flux and has web browsing, a code interpreter, and Canvas built in. It also got recommended by Macron (who then proceeded to post a deepfake video of himself???), and their app is moving up the App Store charts. Hats off to Mistral for the quick turnaround; it's genuinely refreshing to see the EU doing so well!

ByteDance OmniHuman

The latest in the series of slightly unnerving video models coming out of China is ByteDance's OmniHuman-1. In contrast to previous models, this one works with a combination of modalities, taking in a single image of a person as well as video, audio, or a combination of both. As usual, the demos look extremely impressive, but I'll hold back judgement until I get to try it myself.

Replit Agent: Mobile & free tier

In the final news item of this week, Replit has massively broadened access to their Agent. The tool is now also available to free users (though limited to a measly 10 checkpoints) and - this is really cool - also works on mobile. They've also given us some insights into how the agent works under the hood (the second-coolest guide this week, after the absolute banger dropped by Karpathy). This is all great - now I'll just keep waiting for someone to actually build something useful with this, which I still haven't seen.