Parameter Update: 2025-15
"glazing" edition

After last week's packed recap, this week's a fair bit lighter - sometimes that's just how it goes, I guess.
OpenAI: GPT-image-1
Now that the Ghibli-fication of my timeline has died down a bit, OpenAI has finally made the underlying model (which I initially assumed to just be GPT-4o, but which may be a combination of things, given the hints of an autoregressive + diffusion setup?) available over the API.
While the model is just as locked down as the ChatGPT version (perhaps more so?), it is also among the first set of models to be locked behind an ID verification wall. There has been some outrage about this on Twitter (and I am always skeptical when someone wants me to upload my ID anywhere), but the abuse potential here is large enough that not doing it also feels like a poor decision. The fact that, for a short while, it was possible to circumvent the verification by using Replicate or fal.ai instead is, then, mostly just very funny.
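If you want to try it yourself, the call is unspectacular - a minimal sketch, assuming the current openai Python SDK and an ID-verified organization (prompt and filename are, of course, mine):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# gpt-image-1 returns base64-encoded image data rather than a URL
result = client.images.generate(
    model="gpt-image-1",
    prompt="a watercolor painting of a data center at dusk",
    size="1024x1024",
)

with open("output.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```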
As an unrelated sidenote, this is the third or fourth time Altman has talked about smaller GPT-4o updates that bring "improved personality" - not only do I find this a very weird thing to say, but GPT-4o has honestly never felt worse to me than it does now? The glazing has gotten bad enough that Altman has admitted they may have overdone it:
yeah it glazes too much

will fix

— Sam Altman (@sama) April 25, 2025
Cognition Labs: Project DeepWiki
In order to provide their "AI Software Engineer" Devin with appropriate context for the software it works with, Cognition figured they'd just go ahead and let an AI index, summarize and explain essentially every relevant GitHub repo ever.
Thankfully, they decided to make these summaries open to the public - simply replace "github.com" with "deepwiki.com" in any repo URL (e.g., https://deepwiki.com/karpathy/nanogpt), and you'll be provided with architectural diagrams, summaries, and more for every one of the 30k+ indexed repos. Seems neat, given that this entire thing has cost them more than $300k in compute credits alone so far - now, who is building an MCP server for it?
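The URL mapping is mechanical enough to fit in a few lines - a quick sketch (the helper name is mine, not an official tool):

```python
from urllib.parse import urlparse

def deepwiki_url(github_url: str) -> str:
    """Rewrite a GitHub repo URL to its DeepWiki equivalent.
    DeepWiki simply mirrors the GitHub owner/repo path."""
    parts = urlparse(github_url)
    if parts.netloc not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {github_url}")
    return f"https://deepwiki.com{parts.path}"

print(deepwiki_url("https://github.com/karpathy/nanogpt"))
# https://deepwiki.com/karpathy/nanogpt
```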
ICLR 2025
Dominating my timeline this week has been content from one of the larger AI conferences of the year. Unfortunately, given the field's long publication cycles, it feels like this sentiment probably rings true (and matches my own, remote-only, experience):
btw iclr is awesome but the disconnect between what people present in conferences and what they talk about is at an all-time high

one salient example: o1 came out in september and deepseek r1 in jan and got lots of people excited about ~reasoning~

but unfortunately all the work…

— jack morris (is at iclr) (@jxmnop) April 25, 2025
Tsinghua: Reinforcement Learning vs. Reasoning
In a new paper, researchers from Tsinghua University claim that the "generalization" said to be unlocked by applying reinforcement learning to LLMs does not actually help models scale beyond the base model's performance consistently. Instead, while RL may increase best-of-1 performance, it negatively impacts best-of-n in many cases (meaning it helps the model perform more consistently, but not strictly better).
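For the unfamiliar: best-of-n is the pass@k framing, where the model gets k attempts and succeeds if any one of them is correct. A minimal sketch using the standard unbiased pass@k estimator (the per-problem sample counts are invented purely to show how pass@1 can rise while pass@k falls):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the chance that
    at least one of k draws is correct, given c correct out of n samples."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

n = 1024  # samples per problem
base = [64, 64]    # invented: base model solves both problems, but rarely
tuned = [512, 0]   # invented: RL-tuned model solves one reliably, one never

for k in (1, 256):
    b = sum(pass_at_k(n, c, k) for c in base) / len(base)
    t = sum(pass_at_k(n, c, k) for c in tuned) / len(tuned)
    print(f"pass@{k}: base={b:.2f}, rl-tuned={t:.2f}")

# pass@1:   base=0.06, rl-tuned=0.25  -> RL looks strictly better
# pass@256: base=1.00, rl-tuned=0.50  -> the base model wins at large k
```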
While this runs counter to my own experience, I also have to admit that I am not usually running 1024 inference passes for every request, so maybe getting that consistency is good actually? On the other hand, it seems like some OpenAI people are in disagreement:
reading this must feel great if you have one brain cell https://t.co/tJFRFHZy8h
— will depue (in singapore for ICLR) (@willdepue) April 24, 2025
With others even blaming things on a skill issue, as it were:
🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨

That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing.

You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.

🧵 How? pic.twitter.com/1kBO8mPP4Z

— Kunhao Zheng @ ICLR 2025 (@KunhaoZ) April 27, 2025
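For intuition on what "optimize for pass@k at training time" might look like, here is a deliberately toy sketch (my own illustration, not the thread's actual method): score each group of k sampled attempts by whether any of them succeeds, rather than rewarding every sample independently:

```python
import random

def group_rewards(p_solve: float, k: int) -> list[float]:
    """Binary rewards for k independently sampled attempts at one prompt
    (p_solve stands in for a policy; this is purely a toy)."""
    return [float(random.random() < p_solve) for _ in range(k)]

def pass1_style_signal(rewards: list[float]) -> float:
    # The usual per-sample objective: mean reward over the group.
    # Maximizing this pushes every sample toward the single safest answer.
    return sum(rewards) / len(rewards)

def passk_style_signal(rewards: list[float]) -> float:
    # pass@k-style objective: the group scores if ANY attempt succeeds,
    # so low-probability but diverse attempts still earn credit.
    return max(rewards)

rewards = group_rewards(p_solve=0.1, k=8)
print(pass1_style_signal(rewards), passk_style_signal(rewards))
```

Under the mean signal, collapsing onto one safe answer is optimal; under the max signal, spreading bets across plausible candidates can score higher - which is exactly the pass@1 vs. pass@k tension from the Tsinghua paper above.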