Parameter Update: 2025-28
"action plan!" edition

Slightly slower week this time, so a good opportunity to catch up on some of the stuff from the past two weeks!
Geopolitics: US & China AI Action Plans
With the EU AI Act's next phase due to activate on August 2nd (which might hamper large model releases moving forward!), the UK's Online Safety Act requiring ID verification for Wikipedia, and the "Tea" app leaking over 10,000 IDs of women across the US in the same week, we also got a more concrete idea of what US and Chinese AI policy might look like moving forward.
Over in the States, the Trump administration announced "America’s AI Action Plan", a primarily domestic agenda focused on removing red tape for model providers and requiring that models be free of "ideological bias". China, on the other hand, announced its "Global AI Governance Action Plan", which focuses on growing its global soft power in the space, e.g., through the creation of a new international regulatory body (headquartered in Shanghai). Honestly, I haven't had the time to look too deeply into either of these yet, but they are effectively the governance blueprints that will dominate AI policy for the foreseeable future.
Alibaba: Qwen 3 Coder
After the Kimi K2 release two weeks ago (read their technical report if you haven't yet - it's good!), we got another very good Chinese model this week. Qwen 3 Coder is a sibling of the new "Qwen3-235B-A22B-Thinking-2507" (worst name in a while!). While that model is tuned for general reasoning tasks, the Coder variant was trained specifically for the kinds of long-horizon agentic tasks that IDEs like Cursor or CLIs like Claude Code throw at a model (Qwen themselves just forked the Gemini CLI to make their own!), and it matches Sonnet 4 on most benchmarks while being ~80% cheaper.
Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves… pic.twitter.com/Z8HfyrVScE
— Qwen (@Alibaba_Qwen) July 22, 2025
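If you want to kick the tires yourself, the easiest route is probably an OpenAI-compatible endpoint. Here's a minimal sketch - note that the base URL and model identifier are my assumptions, so check Alibaba's current docs before copying:

```python
# Minimal sketch: querying Qwen3-Coder through an OpenAI-compatible endpoint.
# The base_url and model name below are assumptions - verify against Alibaba's docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",  # placeholder; use whatever key the provider issues
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

response = client.chat.completions.create(
    model="qwen3-coder-480b-a35b-instruct",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Write a Python function that diffs two JSON files."},
    ],
)
print(response.choices[0].message.content)
```

The nice part of the OpenAI-compatible surface is that you can also point existing agentic tooling at it by just swapping the base URL and key.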
Image Segmentation with Gemini 2.5
While I don't think this is technically a new capability, Google's demo of Gemini 2.5-based conversational image segmentation is incredibly cool!
Next-level referring expression segmentation with Gemini 2.5 🤯 https://t.co/1mHILu4wnT pic.twitter.com/cleBzg9BHu
— Valentin Gabeur (@vgabeur) July 22, 2025
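As far as I can tell, this is purely a prompting pattern: you ask the model for masks in a JSON format and parse what comes back. A minimal sketch with the google-genai Python SDK - the box_2d/mask/label schema follows Google's published examples as I understand them, so treat the field names as assumptions and inspect the raw output:

```python
# Minimal sketch: conversational segmentation by prompting Gemini 2.5 for JSON masks.
# The response schema (box_2d / mask / label) is an assumption based on Google's
# published examples - inspect response.text before building on it.
from google import genai
from PIL import Image

client = genai.Client()  # reads GEMINI_API_KEY from the environment

image = Image.open("kitchen.jpg")  # hypothetical example image
prompt = (
    "Give a segmentation mask for the utensil used to stir the pot. "
    "Output a JSON list of entries with keys 'box_2d', 'mask' "
    "(base64-encoded PNG), and 'label'."
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[image, prompt],
)
print(response.text)  # parse the JSON, decode each mask PNG, overlay on the image
```

The "conversational" part is that the referring expression can require actual reasoning ("the utensil used to stir the pot") rather than a fixed class label.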
Bonus: You can also do something sort of similar with Veo 3 by providing on-image descriptions of what you want the model to animate:
Google just discovered a powerful emergent capability in Veo 3 - visually annotate your instructions on the start frame, and Veo just does it for you! Instead of iterating endlessly on the perfect prompt, defining complex spatial relationships in words, you can just draw it out… pic.twitter.com/DWsxiVGBuq
— Bilawal Sidhu (@bilawalsidhu) July 25, 2025
Deep Research with Test-Time Diffusion
Since I do a lot more scientific writing these days than I used to, I am growing more and more interested in Google's attempts at creating LLM-based AI scientists. While we are still quite a way from them being really good, I think their new approach to deep research (which they call "test-time diffusion" - technically correct, but it sounds a lot fancier than what they're actually doing) is extremely promising. Effectively, the model first generates a very high-level draft, which it then makes more and more detailed, "denoising" the text over time. This is in contrast to traditional LLM writing, which may feature some conceptualization up front but is then done front-to-back, end-to-end. This "denoising" style feels a lot closer to the way I approach writing, so I am excited to see future iterations (the current variant beats OpenAI Deep Research ~60% of the time, which is good but not really a step-function improvement).
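To make the loop concrete, here's a toy sketch of how I read the idea - this is my paraphrase of the concept, not Google's actual implementation, and llm()/retrieve() are hypothetical stubs:

```python
# Toy sketch of the "test-time diffusion" idea: start from a coarse draft and
# repeatedly revise ("denoise") it with freshly retrieved evidence.
# My paraphrase of the concept, not Google's implementation; llm() and
# retrieve() are hypothetical helpers you would wire up to real services.

def llm(prompt: str) -> str:
    """Call your LLM of choice; stubbed out here."""
    raise NotImplementedError

def retrieve(query: str) -> str:
    """Run a (web) search and return condensed evidence; stubbed out here."""
    raise NotImplementedError

def deep_research(question: str, steps: int = 5) -> str:
    # Step 0: a deliberately rough, high-level draft - the "noisy" starting point.
    draft = llm(f"Write a short, high-level outline answering: {question}")

    for _ in range(steps):
        # Ask the model where the draft is weakest, retrieve evidence for that
        # gap, then rewrite the whole draft with the new material folded in.
        gap = llm(
            f"Question: {question}\nDraft:\n{draft}\n"
            "Name the single biggest gap or vague claim, phrased as a search query."
        )
        evidence = retrieve(gap)
        draft = llm(
            f"Question: {question}\nDraft:\n{draft}\nNew evidence:\n{evidence}\n"
            "Revise the draft: keep the structure, add detail, fix errors."
        )
    return draft
```

The contrast with the usual front-to-back generation is that every pass revises the whole document, so early sections keep improving as later evidence comes in.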
OpenAI
Very quiet week from OpenAI this time around. No big surprise, as we are expecting both the open-source models and GPT-5 in the next few days/weeks. I did get to try out Agent on the Plus tier, and while 40 requests don't feel like nearly enough, it has already enabled some really cool use cases that either would not have been possible at all or would have taken a lot longer with any other product. In short: it's good, try it out!
In other news, we got Altman on Theo Von's podcast (lol) and some rumoured new models on LMArena that appear to be wiping the floor with what's out there today:
Kinda amazing: the mystery model "summit" with the prompt "create something I can paste into p5js that will startle me with its cleverness in creating something that invokes the control panel of a starship in the distant future" & "make it better". 2,351 lines of code. First time https://t.co/Hc1uHkZl08 pic.twitter.com/Wkr7vvwYIB
— Ethan Mollick (@emollick) July 27, 2025