Parameter Update: 2026-24

"better call sol" edition

The US government has ensured the new OpenAI launch is sufficiently interesting. Not much going on besides that.

OpenAI

GPT-5.6

Since the Claude Fable release, OpenAI's market position has really only been saved by the US government's intervention. This week, they finally responded with a new model drop. GPT-5.6 updates their naming scheme - gone are the "-mini" and "-nano" suffixes, replaced with "Sol", "Terra" and "Luna" for the three available model sizes (somehow even more confusing than the old names, but at least a little cooler?) - and provides a solid step up in capabilities.

Sol is the biggest of the three models, providing performance "close to" Mythos for the same per-token price as GPT-5.5. It wins in TerminalBench, which is the headline metric OpenAI chose to highlight, but lags slightly behind in most other benchmarks - even in their new "ultra" reasoning mode, which appears to be OpenAI's response to Anthropic's "workflows" feature. The system card unfortunately provides very few details that aren't safety-related, so we'll have to take their word for it for now.

Luna roughly matches GPT-5.5 at half the price, while having worse token efficiency, leading to some very fun benchmarks where the two actually match in final price, with Luna just taking significantly longer.
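To make the arithmetic behind that concrete, here is a minimal sketch. All numbers are made up for illustration (the prices and token counts are assumptions, not published figures); the only point is that halving the per-token price is cancelled out exactly when the model needs twice as many tokens for the same task.

```python
def task_cost(price_per_mtok: float, tokens_used: int) -> float:
    """Total cost in dollars for one task, given a price per million tokens."""
    return price_per_mtok * tokens_used / 1_000_000


# Hypothetical numbers, chosen only to illustrate the effect.
baseline_price = 10.0              # $/M tokens, stand-in for GPT-5.5
luna_price = baseline_price / 2    # "half the price"

baseline_tokens = 200_000          # tokens the baseline spends on a task
luna_tokens = 2 * baseline_tokens  # assumption: Luna needs twice as many

baseline_cost = task_cost(baseline_price, baseline_tokens)
luna_cost = task_cost(luna_price, luna_tokens)

print(f"baseline: ${baseline_cost:.2f}, luna: ${luna_cost:.2f}")
```

With these assumed inputs both tasks cost the same, and since Luna generates twice the tokens, it will also take longer unless it is correspondingly faster per token.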

Terra is positioned as a "fast and affordable model", which means I expect it to be useless for most productive tasks but great for batch processing.

While Anthropic has now finally been allowed to re-release Fable to a small portion of national security organizations around the US government, OpenAI received more notice and planned out a staggered release strategy where each individual customer needs to be approved by the Trump administration. Ever the diplomat, Altman called it not "quite the process that we think is optimal".

The thing to watch with these restrictions is the loss of soft power that goes along with them. Similarly to the ML engineering restrictions Anthropic silently placed on Fable, they erode trust, which will cause some organizations to shift to non-US alternatives. I am unconvinced this is the optimal strategy for anyone involved, but I wish them nothing but success.

Jalapeño

In line with their new model release, OpenAI also announced their first in-house chip, engineered in collaboration with Broadcom. It's specifically tuned for LLM inference (interestingly enough, they specify explicitly that it's designed for general LLMs, not OpenAI's specific deployment stack). While we got very few details, we know a few things:

  • This is the first chip in a multi-year strategy, where we can expect to see chips released in conjunction with new models
  • Performance-per-watt is "substantially better than current state-of-the-art", and I expect it to be the primary metric worth optimizing for
  • Engineering samples are already serving GPT‑5.3‑Codex‑Spark, which is a very high-throughput model

Strategically, this is an important integration step for OpenAI. Diversification away from Nvidia has traditionally meant buying AMD or hoping Google will give you TPUs - both of which came with significant compromises. Expect a broader rollout, and hopefully more details, over the next couple of months.

Claude Tag

Anthropic has a history of announcing seemingly small things that end up reshaping the AI engineering stack. I remember dismissing both MCP and Skills when they first came out, and Claude Code was explicitly framed as an engineering experiment. Claude Tag might be the next release in this spirit.

You can think of it as a Claude Code integration into Slack, but that might be missing the reframing they want to achieve: instead of being a collaborator for individual developers, Claude becomes a standalone team member, with its own credentials, persistent memory, and audit log.

According to Anthropic, 65% of their Product team's code now comes from Claude Tag, which might very well be cherry-picked, but could also indicate Tag works really well in a shared environment with technical + non-technical people.

Personally, I am also a bit sketched out by the amount of lock-in a properly tuned instance of this might lead to - if all your organizational engineering knowledge is locked into Claude, you'll have a hard time kicking it out if Anthropic stops playing ball.

Sakana Fugu / OpenRouter Fusion

I missed covering this last week, but it's a trend worth talking about: with OpenRouter Fusion and Sakana Fugu, we've seen the back-to-back release of two very capable "Fusion" model systems that both claim to match Mythos/Fable in general capabilities.

The idea behind both of these systems is relatively simple: orchestrate multiple models (sequentially and in parallel), let them vote on each other's responses, and condense the consensus into a single response. Treat the whole system as a black box that, externally, looks like a single model.
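As a rough illustration of that pattern (and explicitly not how Fusion or Fugu are actually implemented, since neither detail is covered here), the sketch below fans a prompt out to several models in parallel and returns the majority answer. The `Model` type, the `fuse` function, and the stub models are all hypothetical, and "voting" is simplified to plain agreement between answers rather than models judging each other's responses.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical interface: a "model" is anything that maps a prompt to an answer.
Model = Callable[[str], str]


def fuse(prompt: str, models: Sequence[Model]) -> str:
    """Query all models in parallel and return the most common answer.

    Ties are broken in favour of the earliest model in `models`.
    """
    with ThreadPoolExecutor() as pool:
        # pool.map preserves the order of `models` in its results.
        answers = list(pool.map(lambda model: model(prompt), models))

    # Normalize lightly so trivially different answers count as agreement.
    votes = Counter(answer.strip().lower() for answer in answers)
    winner, _count = votes.most_common(1)[0]
    return winner


# Stub models standing in for real API calls.
def model_a(prompt: str) -> str:
    return "4"


def model_b(prompt: str) -> str:
    return " 4 "


def model_c(prompt: str) -> str:
    return "5"


result = fuse("What is 2 + 2?", [model_a, model_b, model_c])
print(result)
```

From the caller's side, `fuse` looks like a single model call, which is the black-box framing described above. The latency, however, is at least that of the slowest member, and the cost is the sum of all of them.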

It's not surprising that this works - we've known for a while that there's a tradeoff between compute/latency and performance, and this is really just pushing that frontier, but that doesn't make it less cool.

That being said, I am skeptical that these systems enable things in the real world that weren't doable beforehand. If none of the orchestrated models is uniquely capable of something, the ensemble also won't be. And treating an orchestrated set of models as a single entity also introduces another layer of fragility - I'd be very interested in seeing someone actually build a production application using one of these in a regular coding harness to verify the performance/latency trade-off makes sense.

Again, this isn't meant to take away from the fact that these systems are really cool and can probably do really cool things. It just feels like comparing a cluster of gaming GPUs with a Blackwell GPU.