Parameter Update: 2025-28

"action plan!" edition

Slightly slower week this time, so good opportunity to catch up with some of the stuff from the past two weeks!

Geopolitics: US & China AI Action Plans

With the EU AI Act's next phase due to take effect on August 2nd (which might hamper large model releases moving forward!), and the UK's Online Safety Act requiring ID verification for Wikipedia in the same week the "Tea" app leaked over 10,000 IDs of women across the US, we also got a more concrete idea of what US and Chinese AI policy might look like moving forward.

Over in the States, the Trump administration announced "America's AI Action Plan", which appears to be a primarily domestic agenda, focused on removing red tape for model providers and requiring that models be free from "ideological bias". China, on the other hand, announced its "Global AI Governance Action Plan", which will focus on trying to grow its global soft power in the space, e.g., through the creation of a new international regulatory body (headquartered in Shanghai). Honestly, I haven't had the time to look too deeply into either of these yet, but they are effectively the governance blueprints that will dominate AI policy over the next little while.

Alibaba: Qwen 3 Coder

After the Kimi K2 release two weeks ago (read their technical report if you haven't yet - it's good!), we got another very good Chinese model this week. Qwen 3 Coder is a variant of the new "Qwen3-235B-A22B-Thinking-2507" (worst name in a while!). While the base model is tuned for general reasoning tasks, the Coder variant was trained specifically for the types of long-horizon agentic tasks that IDEs like Cursor or CLIs like Claude Code rely on (Qwen themselves just forked the Gemini CLI to make their own!), and it matches Sonnet 4 on most benchmarks while being ~80% cheaper.

Google

Image Segmentation with Gemini 2.5

While I don't think this is technically a new capability, Google's demo of Gemini 2.5-based conversational image segmentation is incredibly cool!
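Under the hood, this works by prompting the model to return its detections as JSON; per Google's published examples, bounding boxes come back as `box_2d` arrays in [y_min, x_min, y_max, x_max] order, normalized to a 0-1000 range. Here's a minimal sketch of converting one of those boxes into pixel coordinates (the exact response shape is my assumption based on the demo, not an official schema):

```python
def denormalize_box(box_2d, img_width, img_height):
    """Convert a Gemini-style [y_min, x_min, y_max, x_max] box,
    normalized to the 0-1000 range, into absolute pixel coordinates
    (x_min, y_min, x_max, y_max) for the given image size."""
    y0, x0, y1, x1 = box_2d
    return (
        int(x0 / 1000 * img_width),
        int(y0 / 1000 * img_height),
        int(x1 / 1000 * img_width),
        int(y1 / 1000 * img_height),
    )

# Example: a detection covering the left half of a 1920x1080 frame.
detection = {"label": "dog", "box_2d": [0, 0, 1000, 500]}
print(denormalize_box(detection["box_2d"], 1920, 1080))  # (0, 0, 960, 1080)
```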

Bonus: You can also do something sort-of similar with Veo 3 by providing on-image descriptions of what you want the model to animate:

Deep Research with Test-time diffusion

Since I do a lot more scientific writing these days than I used to, I am growing more and more interested in Google's attempts at creating LLM-based AI scientists. While we are still quite a way away from them being really good, I think their new approach to deep research is extremely promising (they call it "test-time diffusion", which is technically correct but also sounds a lot fancier than what they're really doing). Effectively, the model first generates a very high-level draft, which it then makes more and more detailed, "denoising" the text over time. This is in contrast to traditional LLM writing, which may feature some conceptualization up front but is then done front-to-back, end-to-end. This "denoising" style feels a lot closer to the way I approach writing, so I am excited to see future iterations (the current variant beats OpenAI Deep Research ~60% of the time, which is good but not really a step-function improvement).

OpenAI

Very quiet week from OpenAI this time around. No big surprise, as we are expecting both the open-source models and GPT-5 in the next few days/weeks. I did get to try out Agent on the Plus tier, and while 40 requests doesn't feel like nearly enough, it has already enabled some really cool use cases that either would not have been possible at all or would have taken a lot longer using any other product. In short: it's good, try it out!

In other news, we got Altman on Theo Von's podcast (lol) and some rumoured new models in LMArena that appear to be wiping the floor with what's out there today: