Parameter Update: 2025-20

"kontext?" edition

After last week's firehose of updates, this week feels almost calm by comparison - but there's still some cool stuff to dig into, so here we go!

DeepSeek R1-0528

It seems that “weirdly named reasoning models that work surprisingly well” may no longer be just a Western thing! A few days ago, DeepSeek released (as they tend to do) an updated version of one of their existing models. While last time around this turned out to be the updated V3, reaching ~4o performance, this time we got an updated R1 going for o3 (full) level! I’ve been surprised by how little noise there was about this - while initial tests seem to indicate that it may also match o3 in terms of hallucinations (not a good thing!), Open Source catching up is actually really cool to see!

Flux Kontext

In news I didn’t expect this week: A German lab closing the gap to gpt-image-1! Black Forest Labs has released a new version of Flux that, through some black magic I don't quite understand, manages to achieve gpt-image-1 performance using (what seems like) purely diffusion?

Flux Kontext seems to win in most comparisons, both zero-shot and with reference images.

Anthropic

Open Sourcing mech interp tools

Anthropic has made public some of the research work they published in their safety blog posts over the last couple of months - cool!

Claude Voice Mode

Anthropic now has their own Advanced Voice Mode? And he's British? How is no one talking about this?

OpenAI: Project Stargate

OpenAI has announced that as part of their Stargate initiative, they would provide free ChatGPT for all citizens of the United Arab Emirates. Having a company offer all citizens of a country free access to their software like this feels unprecedented to me?