Parameter Update: 2025-07

"cringe post" edition

OpenAI: Drama

Taking a break from just being the US president now, Musk proposed buying OpenAI for a ridiculous $97 billion. The current working theory is that this is a fake bid, meant to push up the price Altman will have to pay to buy out the nonprofit as part of his plan to transition the company into a public benefit corporation.

Incoming announcements

While the past week was slightly boring (at least compared to the past couple of months), that is only because a load of exciting stuff seems to be coming down the pipeline.

xAI: Grok 3

Later today, xAI will be the first to move, unveiling their Grok 3 model. While both Elon and most xAI employees seem excited about its performance, we also saw Elon post cringe on the timeline that decidedly didn't look state of the art. My personal hype level is further moderated by the firing of an employee who voiced more tempered expectations.

On the other hand, xAI may actually have the largest supply of compute available right now, so this launch will be another indicator of whether current scaling laws are holding.

Anthropic: Claude 4

The past week was also filled with discussion and speculation around Claude 4, still rumored to be coming this month. The model is currently expected to outperform at least some of the o-series models while being able to dynamically switch between reasoning and non-reasoning modes.

This "merge" of models is similar to some of the concepts Jeff Dean and Noam Shazeer talked about in my favorite podcast of the past week.

OpenAI: Roadmaps, blog posts and vibe shifts

Not leaving all the thunder to Elon and Anthropic, Altman also had some things in store. On the same day OpenAI released one of their more interesting blog posts in recent memory, he shared new details about their current roadmap. Contrary to what some people were arguing, the "GPT" name is not dead after all. Instead, we will get one final "traditional" GPT model in GPT-4.5, coming within a few weeks, which will then be merged with the o-series and the existing agent products into an "all in one" GPT-5 that can dynamically choose to reason, do research, ... - as it sees fit. Will the timeline get moved up after the Grok 3 announcement? Is this a delayed response to DeepSeek? Will Claude 4 mog all of the above? We'll see.

The only thing people seem sure about is that the new GPT-4o update has massively improved the overall vibe of the model, giving it much more personality. Now, if it would only stop overusing 🚀the rocket emoji🚀, that'd be great.

But still...

While all of these non-announcements are pretty cool, it seems that Nous Research may have beaten everyone to the punch: their DeepHermes-3 experiment can already dynamically shift between reasoning and non-reasoning responses - and, unlike the rest, it is actually available right now. A rough sketch of how that toggle works is below.
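The trick, as I understand Nous' preview release, is that a single set of weights changes behavior based on a special system prompt: with it, the model deliberates inside <think> tags before answering; without it, it responds directly. A minimal sketch, assuming the preview's Hugging Face repo name and paraphrasing the activation prompt (check the model card for the exact wording):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "NousResearch/DeepHermes-3-Llama-3-8B-Preview"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Paraphrase of the reasoning-activation prompt; the real one instructs the
# model to deliberate inside <think> </think> tags before answering.
REASONING_PROMPT = (
    "You are a deep thinking AI. Deliberate on the problem inside "
    "<think> </think> tags, then give your final answer."
)

def ask(question: str, reasoning: bool = False) -> str:
    # Same weights, two behaviors: the system prompt alone flips the mode.
    messages = [{"role": "system", "content": REASONING_PROMPT}] if reasoning else []
    messages.append({"role": "user", "content": question})
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=2048)
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)

print(ask("What is 17 * 24?"))                  # direct answer
print(ask("What is 17 * 24?", reasoning=True))  # <think>...</think> first
```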

Perplexity: Deep Research

Not to be outdone by the Deep Research announcements of the past week, Perplexity announced a feature of the same name, with much, much broader access (thanks!). In my testing, it somehow seems worse (and much slower) than their admittedly pretty great o3-mini based "Reasoning" mode, but that didn't stop Altman and Aravind from posting the worst cringe of the week in the comments under Altman's announcement of the updated GPT-4o (a ranking that includes the Elon post from the beginning, btw - very impressive, you two!)

Research

Distillation Scaling Laws

After a month of everyone talking about distillation as the hottest topic around, some people at Apple ran a large set of controlled experiments on the concept. This let them derive some very solid-looking rules for when it makes sense to distill from a teacher and when you should just train a single model directly. Cool stuff! For context, the basic objective they build on is sketched below.
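The scaling laws are fit on top of plain knowledge distillation, so as a refresher, here is the textbook distillation objective (à la Hinton et al.), not Apple's law itself; the alpha and temperature defaults here are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(
    student_logits: torch.Tensor,  # (batch, vocab)
    teacher_logits: torch.Tensor,  # (batch, vocab)
    labels: torch.Tensor,          # (batch,)
    temperature: float = 2.0,      # softening temperature (illustrative)
    alpha: float = 0.5,            # weight on the soft-target term
) -> torch.Tensor:
    # Soft targets: match the teacher's softened distribution. The T^2 factor
    # keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    # Hard targets: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```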

Large Language Diffusion Models

In what is still very much a developing story, a Chinese team appears to have succeeded in scaling a non-autoregressive language model based on discrete diffusion to 7B parameters. Incredibly, they claim benchmark performance competitive with the likes of Llama 3 at similar model sizes.

Not relying on autoregressive generation comes with some very cool advantages, like solving string reversal, which trips up left-to-right models. This is genuinely incredibly cool to see, and I, for one, am really excited about the open-sourcing of the base model in around two weeks. A toy sketch of the decoding idea follows.
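For the unfamiliar: instead of emitting tokens left to right, these models start from a fully masked sequence and refine all positions in parallel over a fixed number of steps. A toy sketch of that decoding loop, as I understand it from the paper (the model interface and the unmasking schedule are my own simplification, not the authors' code):

```python
import torch

def diffusion_decode(model, seq_len: int, mask_id: int, steps: int = 8) -> torch.Tensor:
    """Toy masked-diffusion decoding: predict every position in parallel,
    commit the most confident tokens, re-mask the rest, repeat.

    `model` is assumed to be a bidirectional LM mapping a (seq_len,) token
    tensor to (seq_len, vocab_size) logits.
    """
    tokens = torch.full((seq_len,), mask_id)        # start fully masked
    for step in range(1, steps + 1):
        logits = model(tokens)                      # all positions at once
        conf, pred = logits.softmax(-1).max(-1)     # per-position best guess
        k = seq_len * step // steps                 # unmask a growing fraction
        keep = conf.topk(k).indices                 # most confident positions
        tokens = torch.full((seq_len,), mask_id)
        tokens[keep] = pred[keep]                   # commit; rest stay masked
    return tokens
```

Because every position is predicted jointly rather than conditioned only on what came before, there is no left-to-right bottleneck, which is presumably what helps on reversal-style tasks.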

Zyphra Zonos

Flying somewhat under the radar, it seems we have a new SOTA open-source TTS model: Zonos-v0.1 from Zyphra. While we have seen some cool demos around this idea in the past, this was the first time a voice cloning demo really worked for me, so I am both slightly terrified and would highly recommend you give it a shot!