Parameter Update: 2025-09

"emotional intelligence" edition

Between two major new model announcements, the most impressive voice demo I've seen since Advanced Voice Mode, and DeepSeek open sourcing alien technology, it has certainly been a week.

OpenAI: GPT-4.5

Hot on the heels of last week's Grok 3 announcement, we got another weird launch. In an incredibly awkward launch stream, OpenAI finally revealed their long-awaited successor to GPT-4.

Unfortunately, instead of the next frontier of model intelligence, we got a model that sucks at benchmarks (though there is a point to be made about this being somewhat expected) while being surprisingly good at writing 4chan greentexts - all while being 40x more expensive than even Claude 3.5 over the API. It being limited to the $200/month ChatGPT Pro tier is especially hard to swallow given OpenAI just opened up access to Deep Research to the regular Plus tier (only 10 queries per month though!).
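
To put the pricing in perspective, the per-request arithmetic is simple enough to run yourself. Here's a quick sketch using the list prices as I recall them at launch - verify against the current pricing pages before relying on this:

```python
# Rough API cost comparison. Prices are $ per million tokens (input, output),
# as I recall them at launch - double-check the official pricing pages.
PRICES = {
    "gpt-4.5-preview": (75.00, 150.00),
    "claude-3-5-sonnet": (3.00, 15.00),
}

def request_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    p_in, p_out = PRICES[model]
    return (in_tokens * p_in + out_tokens * p_out) / 1e6

for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.3f} per 10k-in/1k-out request")
```

The exact multiple depends on your input/output mix, but it's brutal either way.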

After 50 bucks of API usage over the past few days, I can definitely confirm the model is, at times, surprisingly funny and certainly has that big model smell people talk about. Unfortunately, the majority of my use cases employ these models as tools, not as friends, which means I am not looking for understanding and personal connection but for reliability and intelligence - and there, GPT-4.5 currently lacks a good value proposition.

It'll be interesting to see if this changes when GPT-4.5 is combined with the test-time compute / reasoning paradigms. Either way, the path to getting there will be worth watching: given the major compute required, OpenAI is going to have to focus a lot on infra over the next little bit.

Anthropic: Claude 3.7

Announced a few days before GPT-4.5, Claude 3.7 seems downright tame in comparison. Most analysis focuses less on what its announcement may mean for the ecosystem more broadly and more on the model itself, and there are few new insights into scaling laws to be had here.

Nevertheless, this is Anthropic's first "real" step into reasoning models (let's not forget Claude 3.5 already did reasoning before it was cool!) and, in terms of raw usefulness, it seems like this one might be hard to beat for the next little bit. Somehow, despite having a worse naming convention than OpenAI and openly mocking users with an (admittedly hilarious) "strawberry counting demo", the Claude announcement still filled my timeline this week, with more users than ever embracing the vibe coding paradigm.

Personally, I managed to sneak into the Claude Code preview and, after burning through even more money than I did with GPT-4.5, vibed together a very competent questionnaire tool for my current thesis project. I am surprised by how much of a difference the increased output length makes for the use cases I found myself gravitating to, and I look forward to seeing what the dozens of YC startups Anthropic just killed will pivot to next.

DeepSeek: Open Source Week

While the large model announcements stole most of the thunder this week, just below the radar DeepSeek released some insanely cool things, most of which I honestly barely understand. Following the "12 days of OpenAI" last year (or the meinGPT Launch Week we did this January!), we got a new open source project every day last week.

Monday: FlashMLA

To kick things off, we got an optimized implementation of DeepSeek's own Multi-head Latent Attention (MLA) for Hopper GPUs, which should speed up inference quite significantly - vLLM reported up to a 16% speedup in their initial integration.
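
For context: MLA's trick is caching a single low-rank latent per token instead of full per-head K/V tensors, then decompressing at attention time. Here's a conceptual plain-PyTorch sketch of that idea (my own illustration - not DeepSeek's kernel - with RoPE and masking details omitted; all names are mine):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveMLA(nn.Module):
    """Conceptual Multi-head Latent Attention: the KV cache holds one small
    latent vector per token rather than full per-head keys and values."""

    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.down_kv = nn.Linear(d_model, d_latent)   # compress -> this is what gets cached
        self.up_k = nn.Linear(d_latent, d_model)      # decompress at attention time
        self.up_v = nn.Linear(d_latent, d_model)
        self.q_proj = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache):
        B, T, _ = x.shape
        latent_cache = torch.cat([latent_cache, self.down_kv(x)], dim=1)
        split = lambda t: t.view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        q = split(self.q_proj(x))
        k, v = split(self.up_k(latent_cache)), split(self.up_v(latent_cache))
        o = F.scaled_dot_product_attention(q, k, v)   # causal masking omitted for brevity
        return self.out(o.transpose(1, 2).reshape(B, T, -1)), latent_cache

mla = NaiveMLA()
cache = torch.empty(2, 0, 128)                        # empty latent cache, d_latent=128
y, cache = mla(torch.randn(2, 16, 1024), cache)
print(y.shape, cache.shape)                           # (2, 16, 1024), (2, 16, 128)
```

Caching 128 floats per token instead of 8 heads x 128 dims for both K and V is exactly the kind of memory saving that makes an aggressively optimized decoding kernel pay off.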

Tuesday: DeepEP

On Tuesday, we got a new communication library for expert parallelism (meant for MoE models), which just happened to exploit "out-of-doc" instructions (cool!).
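
The core problem a library like this attacks is the all-to-all dispatch/combine step in MoE: every rank has to ship each token to the ranks hosting its top-k experts and gather the results back. A toy, single-process sketch of just the dispatch bookkeeping (my own illustration, not DeepEP's API):

```python
import torch

def dispatch(tokens, topk_ids, n_experts, n_ranks):
    """Group token copies by destination rank for an MoE all-to-all.
    tokens: (T, d); topk_ids: (T, k) chosen experts per token.
    Experts are assumed sharded round-robin across ranks."""
    expert_rank = torch.arange(n_experts) % n_ranks   # expert id -> hosting rank
    dest = expert_rank[topk_ids]                      # (T, k) destination ranks
    buffers = []
    for r in range(n_ranks):
        tok, slot = (dest == r).nonzero(as_tuple=True)
        buffers.append((tokens[tok], topk_ids[tok, slot]))
    return buffers  # a real system hands these to an all-to-all collective

tokens, topk = torch.randn(16, 64), torch.randint(0, 32, (16, 2))
for r, (buf, _) in enumerate(dispatch(tokens, topk, n_experts=32, n_ranks=4)):
    print(f"rank {r} receives {buf.shape[0]} token copies")
```

The hard part - and presumably where the out-of-doc instructions come in - is doing this transfer at line rate while overlapping it with compute, which is exactly what a naive version like this doesn't do.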

Wednesday: DeepGEMM

Wednesday brought a heavily optimized library for General Matrix Multiplications (GEMMs) on Hopper GPUs at FP8 precision.
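
FP8's dynamic range is tiny, so the standard trick is to carry per-block scaling factors and accumulate in higher precision. Here's a rough emulation of that scheme (my sketch of the general technique, not DeepGEMM's kernels; the block size is arbitrary):

```python
import torch

def quantize_blockwise(x, block=128):
    """Per-block scaling into float8_e4m3fn: each chunk of `block` values along
    the last dim gets its own scale, so one outlier can't clip everything."""
    xb = x.reshape(*x.shape[:-1], -1, block)
    scale = xb.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / 448.0  # 448 = e4m3 max
    return (xb / scale).to(torch.float8_e4m3fn).reshape(x.shape), scale.squeeze(-1)

def fp8_gemm_emulated(a, b, block=128):
    """Quantize both operands blockwise, accumulate partial products in fp32 -
    numerically similar to an FP8 GEMM with per-tile scales."""
    qa, sa = quantize_blockwise(a, block)
    qb, sb = quantize_blockwise(b.T.contiguous(), block)
    out = torch.zeros(a.shape[0], b.shape[1])
    for i in range(a.shape[1] // block):              # one fp32 partial sum per K-block
        blk = slice(i * block, (i + 1) * block)
        out += (qa[:, blk].float() @ qb[:, blk].float().T) * sa[:, i:i+1] * sb[:, i]
    return out

a, b = torch.randn(256, 512), torch.randn(512, 256)
print((fp8_gemm_emulated(a, b) - a @ b).abs().max())  # small, but not zero
```

The real kernels do this scaling inside the tensor-core pipeline instead of in a Python loop, which, as I understand it, is where all the hard engineering lives.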

Thursday: Optimized Parallelism Strategies

Thursday brought three announcements: DualPipe, a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training (honestly haven't looked into this one!), EPLB (Expert Parallelism Load Balancer - pretty self-explanatory) and, perhaps most interestingly, profiling data from their own training runs.
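
The EPLB idea is easy to sketch: when some experts are much hotter than others, replicate the hot ones and pack the replicas across GPUs so every device sees similar load. A greedy toy version of that logic (my own illustration, not EPLB's actual algorithm):

```python
import heapq

def balance_experts(expert_load, n_gpus, slots_per_gpu):
    """Greedy expert load balancing: replicate the hottest experts until every
    slot is used, then pack replicas onto the least-loaded GPU with free slots.
    expert_load: measured load per expert (e.g. routed tokens/s from profiling)."""
    replicas = {e: 1 for e in expert_load}
    heap = [(-load, e) for e, load in expert_load.items()]  # negated per-replica load
    heapq.heapify(heap)
    for _ in range(n_gpus * slots_per_gpu - len(expert_load)):
        _, e = heapq.heappop(heap)                          # currently hottest per replica
        replicas[e] += 1
        heapq.heappush(heap, (-expert_load[e] / replicas[e], e))
    gpus = [(0.0, g, []) for g in range(n_gpus)]            # (load, gpu id, assignments)
    heapq.heapify(gpus)
    work = sorted(((expert_load[e] / r, e) for e, r in replicas.items()
                   for _ in range(r)), reverse=True)        # heaviest replicas first
    for load, e in work:
        full = []
        g_load, g, assigned = heapq.heappop(gpus)
        while len(assigned) >= slots_per_gpu:               # skip GPUs with no free slots
            full.append((g_load, g, assigned))
            g_load, g, assigned = heapq.heappop(gpus)
        heapq.heappush(gpus, (g_load + load, g, assigned + [e]))
        for item in full:
            heapq.heappush(gpus, item)
    return sorted(gpus, key=lambda t: t[1])

for load, g, experts in balance_experts({0: 900, 1: 100, 2: 80, 3: 60},
                                        n_gpus=2, slots_per_gpu=3):
    print(f"GPU {g}: load~{load:.0f}, experts {experts}")
```

The real thing presumably also has to care about which node each GPU sits on to keep traffic local, but the flavor is the same.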

Friday: Fire-Flyer File System (3FS)

On Friday, we got a whole new distributed file system. Yup, that's right - they just built a whole new FS, lol. I'm far from an expert on this, but it also seems to reach really impressive performance numbers. To top things off, we also got a data processing framework built on top of 3FS.

Saturday: DeepSeek-V3/R1 Inference System Overview

On Saturday, just as you might have expected things to be over, DeepSeek came out and told us just how they combine all the components they just gave us access to in their own system, including the exact number of GPUs they're currently running and their profit margins. Honestly, I am still struggling to think of a reason for doing this that isn't just pissing off the other labs (and I am all here for it!).
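
If you want to sanity-check their margin math, the arithmetic itself is trivial. Here's a back-of-the-envelope version with placeholder numbers (everything below is hypothetical - plug in the actual figures from their write-up):

```python
# Toy serving-economics calculation. All numbers are hypothetical placeholders,
# NOT DeepSeek's actual figures - substitute the values from their post.
avg_gpus = 2_048                 # GPUs busy on average across the day
gpu_cost_per_hour = 2.00         # assumed rental price, $/GPU/hour
tokens_per_gpu_per_s = 2_000     # blended output throughput per GPU
price_per_m_tokens = 2.00        # list price, $/million output tokens

daily_cost = avg_gpus * gpu_cost_per_hour * 24
daily_revenue = avg_gpus * tokens_per_gpu_per_s * 86_400 / 1e6 * price_per_m_tokens
print(f"cost ${daily_cost:,.0f}/day, revenue ${daily_revenue:,.0f}/day, "
      f"margin {(daily_revenue - daily_cost) / daily_cost:.0%}")
```

Keep in mind their revenue figure was explicitly theoretical - the web and app are free, after all.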

Google: Gemini Code Assist

Not letting Microsoft steal all the thunder with their free Copilot announcement some time ago, Google has now also released a free version of their coding tool, Gemini Code Assist. Honestly, I haven't tried it yet and haven't heard from anyone that has, which is probably not a great sign?

Sesame Voice Mode

In the final news item this week, I want to shout out Sesame (who I hadn't heard of before) for putting together the most impressive voice mode demo I've seen in months. The fact that this feels like the voice mode GPT-4.5 should have had is not lost on me. If you haven't seen the videos, I won't spoil it - try it yourself, it feels scarily human!