Parameter Update: 2025-15

"glazing" edition

After last week's packed recap, this week's a fair bit lighter - sometimes that's just how it goes, I guess.

OpenAI: GPT-image-1

Now that the ghibli-fication of my timeline has died down a bit, OpenAI has finally made the underlying model (which I initially assumed to just be GPT-4o, but which may be a combination of things, given the autoregressive-plus-diffusion-style generation animation?) available over the API.

While the model is just as locked down as the ChatGPT version (perhaps more so?), it is also among the first set of models to be locked behind an ID verification wall. While there has been some outrage about this on Twitter (and I am always skeptical when someone wants me to upload my ID anywhere), I can see that the abuse potential here is sufficiently large that not requiring verification also feels like a poor decision. The fact that, for a short while, it was possible to circumvent this by using Replicate or fal.ai instead is, then, mostly just very funny.
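
If you do want to try it, the model is exposed through the regular images endpoint; here's a minimal sketch using the official openai Python SDK (the prompt and output filename are made up, and you'll need an API key on a verified account):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-1",
    prompt="a studio-ghibli-style watercolor of a quiet timeline",
    size="1024x1024",
)

# gpt-image-1 returns base64-encoded image data rather than a URL
with open("out.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```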

As an unrelated sidenote, this is the third or fourth time Altman has talked about smaller GPT-4o updates that bring "improved personality" - not only do I find this a very weird thing to say, but GPT-4o has honestly never felt worse to me than it does now? The glazing has gotten bad enough that Altman has admitted they may have overdone it.

Cognition Labs: Project DeepWiki

In order to provide their "AI Software Engineer" Devin with appropriate context for the software it works with, Cognition figured they'd just go ahead and let an AI index, summarize and explain essentially every relevant GitHub repo ever.

Thankfully, they decided to make these summaries open to the public - simply replace "github.com" with "deepwiki.com" (e.g., https://deepwiki.com/karpathy/nanogpt), and you'll be provided with architectural diagrams, summaries, and more - for every one of the 30k+ indexed repos. Seems neat, especially given that this entire thing has cost them over $300k in compute credits alone so far - now, who is building an MCP server for it?
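
Since the mapping is just a host swap, automating it is trivial; a minimal Python sketch (the helper name is mine, and whether a given repo is indexed is of course up to Cognition):

```python
from urllib.parse import urlparse, urlunparse

def to_deepwiki(github_url: str) -> str:
    """Swap the github.com host for deepwiki.com, keeping the repo path."""
    parts = urlparse(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    return urlunparse(parts._replace(netloc="deepwiki.com"))

print(to_deepwiki("https://github.com/karpathy/nanogpt"))
# -> https://deepwiki.com/karpathy/nanogpt
```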

ICLR 2025

Dominating my timeline this week has been content from one of the larger AI conferences of the year. Unfortunately, given the field's long publication cycles, it feels like this sentiment probably rings true (and it matches my own, remote-only, experience):

Tsinghua: Reinforcement Learning vs. Reasoning

In a new paper, researchers from Tsinghua University claim that the "generalization" supposedly reached by applying reinforcement learning to LLMs does not actually help them scale beyond the base model's performance consistently. Instead, while RL may increase best-of-1 performance, it actually hurts best-of-n in many cases (meaning it helps the model perform more consistently, but not strictly better).
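
For context, "best-of-n" results in evaluations like this are typically reported via the unbiased pass@k estimator from the Codex paper (Chen et al., 2021); a minimal sketch, with sample counts made up purely for illustration:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k samples, drawn from n generations of which
    c are correct, solves the task."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up numbers, just to show the shape of the claim: a model can
# look much better at k=1 than its advantage at large k suggests.
print(pass_at_k(n=1024, c=400, k=1))    # ~0.39
print(pass_at_k(n=1024, c=400, k=256))  # ~1.0
```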

While this runs counter to my own experience, I also have to admit that I am not usually running 1024 inference passes for every request, so maybe getting that consistency is good actually? On the other hand, it seems like some OpenAI people are in disagreement:

With others even blaming things on a skill issue, as it were.