Parameter Update: 2025-20
"kontext?" edition

After last week's firehose of updates, this week feels almost calm by comparison - but there's still some cool stuff to dig into, so here we go!
DeepSeek R1-0528
It seems that “weirdly named reasoning models that work surprisingly well” may no longer be just a western thing! A few days ago, DeepSeek released (as they tend to do) an updated version of one of their existing models. While last time around this turned out to be the updated V3, reaching ~4o performance, this time we got an updated R1 going for o3 (full) level! I’ve been surprised by how little noise there was about this - while initial tests seem to indicate that it may also reach o3 in terms of hallucinations (not a good thing!), Open Source catching up is actually really cool to see!
First benchmark for the new Deepseek R1!
— AiBattle (@AiBattle_) May 28, 2025
The new Deepseek R1-0528 performs nearly on par with o3 (High) on the LiveCodeBench benchmark. pic.twitter.com/TtMOisgu3O
Flux Kontext
In news I didn’t expect this week: A German lab closing the gap to gpt-image-1! Black Forest Labs has released a new version of Flux that, through some black magic I don't quite understand, manages to achieve gpt-image-1 performance using (what seems like) purely diffusion?
bro wtf, these were all generated w/ ONE input image
— Sakib (@zsakib_) May 29, 2025
> 1 img → 13 outputs (picked the best 4 to show y'all)
> i want you to keep in mind, this is just 13 prompts ran against 1 image pic.twitter.com/yNGFGdr922
Flux Kontext seems to win in most comparisons, including zero-shot and with reference images.
Anthropic
Open Sourcing mech interp tools
Anthropic has made public some of the research work they published in their safety blog posts over the last couple of months - cool!
The methods we used to trace the thoughts of Claude are now open to the public!
— Emmanuel Ameisen (@mlpowered) May 29, 2025
Today, we are releasing a library which lets anyone generate graphs which show the internal reasoning steps a model used to arrive at an answer. https://t.co/8IR88WriNB pic.twitter.com/CA0aCSYK8F
Claude Voice Mode
Anthropic now has their own Advanced Voice Mode? And he's British? How is no one talking about this?
"no its fine i just didn't expect Claude to be british" https://t.co/JJIKIGDiUN
— kalomaze (@kalomaze) May 27, 2025
OpenAI: Project Stargate
OpenAI has announced that as part of their Stargate initiative, they would provide free ChatGPT for all citizens of the United Arab Emirates. Having a company offer all citizens of a country free access to their software like this feels unprecedented to me?