Parameter Update: 2025-26
"truth seeking" edition

When I speculated last week that things might pick up some steam, I may have underestimated just how much: we really haven't had a week like this in a while!
xAI
Grok Drama
Just hours before revealing their new Grok 4 series of models (see below!), xAI got themselves into some trouble with the existing "reply guy" infrastructure (the thing where Grok responds when tagged on Twitter). For one reason or another (xAI would go on to blame a minor modification to their system prompt, though I am surprised such a change would consistently trigger behavior like this), the AI started spouting extremist viewpoints while also referring to itself as "MechaHitler".
Grok is currently calling itself ‘MechaHitler’ pic.twitter.com/A6YAkvbfoh
— Josh Otten (@ordinarytings) July 8, 2025
Grok 4
While initially somewhat overshadowed by the MechaHitler fiasco, Grok 4 turns out to be a really good model: by 10x-ing its post-training/RL spend, xAI now has a model that beats effectively every other model out there (at the cost of also being a lot more expensive to run).
Grok 4 comes in 1st or 2nd in every benchmark, even the "not good" ones. Compared to Claude 4 Sonnet, it cost almost 5x more money to run the Artificial Analysis benchmark pic.twitter.com/oUio9VeOTJ
— Theo - t3.gg (@theo) July 10, 2025
One small caveat: the model appears to have been post-trained to check Elon Musk's Twitter account when asked for its opinion on controversial topics (l0l):
Grok 4 decides what it thinks about Israel/Palestine by searching for Elon's thoughts. Not a confidence booster in "maximally truth seeking" behavior. h/t @catehall. Screenshots are mine. pic.twitter.com/WFAG3FOG10
— Ramez Naam (@ramez) July 10, 2025
Kimi
Slightly overshadowed by Grok, but no less impressive: Chinese startup Moonshot AI has released "Kimi K2", an absolutely gigantic 1T-parameter (MoE though, so only 32B active parameters!) open-weights model under a slightly modified MIT license (which, strictly speaking, means it isn't MIT anymore and shouldn't be called that).
🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence… pic.twitter.com/PlRQNrg9JL
— Kimi.ai (@Kimi_Moonshot) July 11, 2025
Besides handily beating most other non-reasoning models and being post-trained specifically for tool calls, it also appears to simply be a better writer, producing text that is noticeably different from the "slop" most other models generate.
Kimi has a distinct writing style that is free of most of the patterns we now associate with AI generated text. Both Kimi and DeepSeek's prose is apparently even more impressive in Chinese. Both of these models have a unique 'voice', quite different from Western AI. https://t.co/25NL4VUv23 pic.twitter.com/8CxqjGyMAq
— Andrew Curran (@AndrewCurran_) July 13, 2025
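To make the "1T total, but only 32B active" distinction a bit more concrete, here is a toy Python sketch of Mixture-of-Experts routing in general: a router scores every expert for each token and only the top-k experts actually run, so most of the stored weights sit idle for any given token. All sizes and scores below are made up for illustration and are NOT Kimi K2's actual configuration; real MoE layers also route separately in every layer and weight the chosen experts' outputs, which this sketch skips.

```python
# Toy illustration of MoE top-k routing.
# All numbers are hypothetical and NOT Kimi K2's real configuration.


def top_k_experts(router_scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(router_scores)), key=lambda i: router_scores[i], reverse=True)
    return ranked[:k]


def parameter_counts(shared_params: int, params_per_expert: int,
                     num_experts: int, k: int) -> tuple[int, int]:
    """Parameters stored in the model vs. parameters actually used for one token."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert  # only the k routed experts run
    return total, active


if __name__ == "__main__":
    # Made-up router outputs for a layer with 8 experts.
    scores = [0.1, 0.7, 0.05, 0.9, 0.3, 0.2, 0.6, 0.4]
    print("experts used for this token:", top_k_experts(scores, k=2))  # [3, 1]

    total, active = parameter_counts(shared_params=1_000, params_per_expert=10_000,
                                     num_experts=8, k=2)
    print(f"total={total}, active={active}, fraction={active / total:.1%}")  # 25.9%
```

Applied to Moonshot's announced numbers, the same ratio works out to 32B / 1T, or roughly 3.2% of the weights used per token, which is why the per-token compute of such a model looks much more like that of a ~32B model than a 1T one (all 1T parameters still have to be stored, though).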
Windsurf Acquisition Drama
A couple of weeks after Altman and the Windsurf people first started posting about a potential acquisition (frankly, I thought the deal was already done), there have been some new developments. The Windsurf CEO (and a couple of engineers) are instead going to Google. As part of the deal, Google will also pay Windsurf $2.4B for a non-exclusive license to use their technology. My take: this looks like an acquisition if I've ever seen one, and it honestly sucks for the remaining engineers, who now appear to hold unvested shares of nothing. I am also not sure I like this new acqui-hire pattern as a way of skirting antitrust scrutiny.
Welcome Windsurf to this list of totally serious independent companies pic.twitter.com/JfJ4xLM9hS
— Deva Hazarika (@devahaz) July 11, 2025
Other stuff
Hugging Face Robot
While I haven't gotten into robotics much so far, Hugging Face's new "Reachy Mini" robot really makes me want to change that. For just $299 ($449 if you actually want to build anything serious), it seems you can get a whole lot of robot!
Thrilled to finally share what we've been working on for months at @huggingface 🤝@pollenrobotics
Our first robot: Reachy Mini
A dream come true: cute and low priced, hackable yet easy to use, powered by open-source and the infinite community.
Tiny price, small size, huge… pic.twitter.com/yl71EtwTKs
— Thomas Wolf (@Thom_Wolf) July 9, 2025
Perplexity Comet
After teasing it for months, Perplexity has finally given its first users access to its new "Comet" browser. It turns out that while it _will_ probably collect most of your data, it actually appears to be good?