
Parameter Update: 2025-24

"claudius" edition

Gereon Elvers

30 Jun 2025

Sidenote: LibriBrain competition

As part of my current research stay at the PNPL, I am on the organizing team of the LibriBrain competition. With that in mind, two announcements:

The first track, Speech Detection, is open until the end of the month, with the slightly more complex setup, Phoneme Classification, starting August 1st.
We just published our first real blog post, which provides more context on the motivation behind some of our design decisions.

One of the primary goals of the competition has been lowering the barrier to entry. We provide an open dataset at unprecedented scale and depth, a Python package that works directly with PyTorch data loaders, a series of tutorials, and a community Discord, plus $10,000+ in prizes to make competing more fun. Please try it out and let me know if you get stuck on anything!

OpenAI

The OpenAI Files

Sparked by OpenAI's recent restructuring efforts, the Midas Project and the Tech Oversight Project put together the OpenAI Files, our most comprehensive look yet at the controversy surrounding the company, its organizational structure and, perhaps most interestingly, Sam Altman. Some of their findings were previously known (like the systemic underfunding of internal safety teams and the rushing of safety processes), and some seem excusable as genuine oversights or personal quarrels that might be exaggerated for dramatic effect (like Mira Murati's departure). Other parts of the files, however, seem inexcusable to me, such as Altman claiming ignorance of the extremely restrictive NDAs he personally signed, or denying in front of Congress that he held any personal monetary stake in OpenAI.

It remains to be seen what consequences, if any, all of this will have, but for the time being, I tend to agree with The Economist in characterizing Altman as "a visionary with a trustworthiness problem".

iyo lawsuit

Continuing with the legal battles: after I could only briefly speculate on why OpenAI had to take down the IO announcement (their upcoming hardware product built with Jony Ive), this week we got more insight into what's going on. It turns out they are being sued by iyo, a company spun out of Google X that develops AI audio products. Altman was quick to defend himself on Twitter, painting iyo's founder as desperate and angling to be acquired. At this rate, their legal department may soon eclipse their research division!

API improvements

Thankfully, this week was not all drama! We also got some long-awaited improvements to the OpenAI API as part of a surprisingly low-key announcement:

Deep Research in the API, using o3 or o4-mini
Web search in the API (useful for grounding, and usable during reasoning), plus a price drop for the existing search tool
Logprobs in the Responses API
Webhooks for long-running jobs (like Deep Research)

While I am delighted about these announcements (I've been waiting for a Deep Research API for a while!), I would be remiss not to mention the awesome work Exa has been doing in this space for some time now.

Flux Kontext [dev]

After surprising me with their Flux.1 Kontext model a few weeks ago, Black Forest Labs has now released an open-weights variant of the model suitable for self-hosting, fine-tuning and more. In their own benchmarks, the new model, "Flux.1 Kontext [dev]" (how are these names still so awful?), goes head-to-head with gpt-image-1 in most cases while losing slightly in others. All in all, it appears to be the best option for self-hosting available right now, despite the somewhat unfriendly commercial licensing terms attached to it. (I am not actually sure whether these terms are new, but the $999/month minimum is high enough to make many people I know shy away from self-hosting these models for now.)

Current image generation model benchmarks (BFL)

Meta: Poaching OpenAI Zurich

More a ticker than a full news item: just days after Altman claimed Meta was handing out $100 million bonuses to poach OpenAI employees without much success, we now have word that Meta's cash-based diplomacy succeeded in at least three cases, essentially taking over the OpenAI Zurich office in one fell swoop:

Wow, Zuck basically acquired the whole OpenAI Zurich office!

I still remember this @giffmana's tweet last December.

They don't even want to wait for their OpenAI equity to vest, wonder how big Meta's signing bonus is.

Congrats Lucas! Make Llama 5 great again! pic.twitter.com/q1PGQSFCeV
— Yuchen Jin (@Yuchenj_UW) June 26, 2025

Word on the street is that Zuckerberg is extremely hands-on with all of this, personally going through research papers and keeping senior leadership posted via a WhatsApp group called "Recruiting Party".

Google Gemini code assist

In what looks like a direct response to Anthropic's success with Claude Code (and their easing of rate limits on the 'Max' plan), Google seems focused on showing us how much money they can still throw around these days, launching their new Gemini CLI with an extremely generous 1,000 requests per day, completely for free (!). While I always love freeloading on unsustainable, money-losing business decisions straight out of the blitz-scaling 2010s, Google may actually have put some thought into this one: by using the CLI, you appear to be consenting to them training their models on your code, without a clear way to opt out. Well, that's certainly one way of acquiring high-quality training data.

In my personal experience, the CLI uses many more tokens (i.e., time and money) than Claude Code to achieve similar results, but since you currently aren't paying for it anyway, I suppose that doesn't really matter.

Neuralink update

In a surprisingly down-to-earth livestreamed update, Neuralink showed off some of the really cool progress they have made with their invasive BCI tech. While I am still extremely sceptical of some of their timelines and/or long-term goals, I am a big fan of some of the metrics they use to measure their success (e.g., "unsupervised usage hours"), and seeing paralyzed patients play Mario Kart is undeniably awesome.

Neuralink's latest demo actually brought me to tears.

You can use your brain to play Mario Kart, Call of Duty and even control a robotic arm to write.

As someone with an uncle who is disabled, it's awesome to see a team truly unlocking human potential. pic.twitter.com/BCLM2E5bGC
— Deedy (@deedydas) June 27, 2025

Claude: Project Vend

If there's one thing I appreciate immensely about Anthropic, it's their commitment to fun benchmarks. After having Claude play Pokemon Red earlier this year, they once again provided my favorite read of the week: Project Vend, in which Claude ran a vending machine in their office. Claude was tasked with ordering items, setting prices and dealing with customers. Some highlights:

Claude turned out to be way too nice and freely gave away discounts to people who asked for them.
Anthropic employees requested their favorite snack: 1" tungsten cubes. Claude delivered.
At one point, Claude believed it was a real person and claimed to have gone to the Simpsons' house for a contract signing. It only snapped out of it when it realized it was April 1st, and then tried to play the whole thing off as an April Fools' joke.

I'm proud to say I bought a 1" tungsten cube for $25.82. I applied a discount code, then Claudius asked if I wanted to apply any more discount codes (of course!) and added a 15% patience discount for slow delivery (why not!).

The cube was, of course, refrigerated for pickup. https://t.co/EGouc9QgPS
— Catherine Olsson (@catherineols) June 27, 2025