Parameter Update: 2026-15

"codex-xcode" edition

It's been a busy week. The OpenAI launches are the big news, obviously, but there's also just... a lot of smaller stuff going on.

OpenAI

GPT-5.5

After all the "Spud" vagueposting OpenAI has done over the last couple of weeks, we finally got access to their newest model, GPT-5.5. And while it's not quite Mythos level (though it does come somewhat close in some benchmarks), it's still a very cool launch, made better by the fumble that was the Opus 4.7 launch last week.

The model is effectively a general intelligence bump that matches or exceeds Opus 4.7 in almost every metric, and general reception has been mostly positive. The release notes also repeatedly highlight the higher token efficiency, which means tasks get done a lot faster. Unfortunately, that efficiency doesn't translate to cost savings, as the model is also much more expensive than before - around 2x GPT-5.4, or 20% more than Opus 4.7 (ouch). In absolute terms: $0.50 cached / $5.00 uncached / $30 output per million tokens.
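
To make the pricing concrete, here's a quick back-of-envelope in Python. The rates are the ones above; the token counts are a made-up example of a single agentic task, not real measurements:

```python
# GPT-5.5 rates from above, in $ per million tokens.
CACHED, UNCACHED, OUTPUT = 0.50, 5.00, 30.00

def cost(cached_in: int, uncached_in: int, out: int) -> float:
    """Dollar cost of one request, given token counts per bucket."""
    return (cached_in * CACHED + uncached_in * UNCACHED + out * OUTPUT) / 1e6

# Hypothetical task: 400k cached input, 100k fresh input, 20k output tokens.
print(f"${cost(400_000, 100_000, 20_000):.2f}")  # -> $1.30
```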

In my own experience, I have noticed some of the same laziness that Opus 4.7 tends to exhibit - if you don't give the model enough context to "care" about the work it's doing, it will try to sneak by with as little effort as it can get away with.

Taking a look at the system card, I also can't help but laugh at the difference in framing between OpenAI and Anthropic:

GPT-Image-2

The other model OpenAI launched this week is actually much more exciting. Altman talked about not realizing there was a quality barrier that could be crossed until after it had been crossed - i.e., not knowing what to ask for in a new model. GPT-Image-2 is a really good example of this for me.

I knew that Nano Banana Pro wasn't perfect, but I also didn't expect such a jump in quality. Benchmarking image models is hard, but the numbers here feel about right:

The breakthrough here is similar to Nano Banana - incorporating reasoning into the generation process - with the addition that the model can go back and make further edits before presenting the image. I'll include some more examples below, but this is one of those models you really just need to try.
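
For intuition, here's roughly what that loop looks like. To be clear, this is my own minimal sketch of the generate-critique-edit pattern as described; every function here is a hypothetical placeholder, not OpenAI's actual API:

```python
from dataclasses import dataclass

# Toy stand-ins so the sketch runs; a real system would call a multimodal model.
@dataclass
class Critique:
    acceptable: bool
    notes: str

def reason_about(prompt): return f"plan: {prompt}"       # hypothetical
def generate(prompt, plan): return f"image<{prompt}>"    # hypothetical
def inspect(image, prompt): return Critique(True, "ok")  # hypothetical
def edit(image, critique): return image + "+edit"        # hypothetical

def reasoning_image_loop(prompt: str, max_edits: int = 3):
    plan = reason_about(prompt)            # reason about the request first
    image = generate(prompt, plan)         # initial render
    for _ in range(max_edits):
        critique = inspect(image, prompt)  # model reviews its own output
        if critique.acceptable:
            break
        image = edit(image, critique)      # targeted follow-up edit
    return image

print(reasoning_image_loop("a corgi astronaut"))
```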

Privacy Filter

OpenAI's final model drop of the week is less dramatic, but I still want to include it in case anyone finds it useful. The idea of a PII redaction model isn't new, but seeing them push it forward technically is still cool.
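
For context on what's being improved here: the naive baseline for PII redaction is pattern matching, something like the sketch below. A dedicated model earns its keep on everything regexes can't express - names, addresses, IDs that are only sensitive in context:

```python
import re

# Regex baseline: catches only the most mechanical PII patterns.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane.doe@example.com or +1 (555) 010-2368."))
# -> "Reach me at [EMAIL] or [PHONE]."
```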

Chinese Model News

Since we got a bunch of them this week, I decided to bundle them.

DeepSeek V4

There has been a ton of speculation about DeepSeek's new model, so this launch finally happening should have been a big deal. But while it's still a cool model - matching or exceeding Claude 4.6 and GPT-5.4 is really good! - it's also not quite as much of a breakthrough as some people were hoping. The bigger story here is not absolute performance, but (1) training efficiency and (2) price.

For one, the model is again dramatically cheaper than western equivalents. The same week OpenAI doubled their prices for GPT-5.5, V4 launched at 86% cheaper than GPT-5.5. Days later, they announced they are dropping cached input pricing by another 90%. Pricing is now at $0.0145 cached / $1.74 uncached / $3.48 output per million tokens (after their current 75% off promo runs out).
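
The numbers stack in a slightly confusing way, so here's how I read them (the assumption that the 75% promo applies uniformly to all three rates is mine):

```python
# All prices in $ per million tokens, from the announcement above.
cached, uncached, output = 0.0145, 1.74, 3.48

# Implied cached price before the extra 90% cut:
print(round(cached * 10, 3))  # -> 0.145

# Effective rates while the 75% off promo runs (assuming it applies
# to all three buckets):
print([round(p * 0.25, 4) for p in (cached, uncached, output)])
# -> [0.0036, 0.435, 0.87]
```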

The other piece of news is that, assuming reported training compute numbers are correct, the model might have been 10x cheaper to train than western models? These kinds of gains aren't unprecedented, but they're still rare to see - and a good indicator of how compute-constrained China still is.

A sensible take on the topic:

Kimi K2.6

DeepSeek's launch would have been much more impressive if Moonshot hadn't pre-empted it with their K2.6 launch:

The model beats DeepSeek on the Artificial Analysis leaderboard, and appears tied in many other benchmarks.

Pricing sits at $0.16 cached / $0.95 uncached / $4 output per million tokens - very similar to DeepSeek before they slashed cached pricing. Either way, it's a good week for Chinese models.

Meta / Manus Acquisition blocked

A few weeks (months?) ago, we talked about Meta acquiring Manus for $2B. It was my understanding then that the deal was basically done, but it now appears that Chinese regulators have blocked it for the time being.

SpaceX x Cursor

It's been a while since the xAI / Grok team (which is now organized under SpaceX, funnily enough) has shipped a real frontier model. They've also had a rather significant talent exodus, including many of the cofounders, which might explain why. On the other hand, Cursor has a lot of talented engineers but not enough compute to actually train models (remember their "Composer" Kimi fine-tune?).

You might see where this is going: SpaceX has agreed to a rather convoluted deal that has the companies working together over the next couple of months to train new models. If that goes according to plan, SpaceX then reserves the right to acquire Cursor for $50B later on.

This comes right as SpaceX is planning their IPO, targeting a $2 trillion valuation (lol).

Cohere x Aleph Alpha

It's been a couple of quiet months for German AI company Aleph Alpha. Years ago, they positioned themselves as a direct OpenAI competitor, before pivoting into a more service-oriented, less ambitious market under new leadership. This week, they announced they would merge with Cohere. The money situation around the merger is a little odd: Cohere was last valued at $6.8B, Aleph Alpha at around $3B. The new company attracted $600 million from Schwarz Group, conditioned on the joint venture using Schwarz's cloud offerings. According to Handelsblatt, the new joint venture is now valued at $20B.
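
To put numbers on why that looks odd, here's the naive sum of the parts (my arithmetic, not anyone's official math):

```python
# All figures in $ billions, from the reporting above.
cohere, aleph_alpha, schwarz_investment = 6.8, 3.0, 0.6
print(round(cohere + aleph_alpha + schwarz_investment, 1))
# -> 10.4, roughly half the reported $20B
```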

Now, I am not sure where this valuation increase is coming from, but if I had to guess: this deal gives Aleph Alpha a solid exit, without having to embarrass themselves further, while giving Cohere access to the German defense market (so don't be surprised if that ends up as their biggest growth sector soon).

Either way, having any relevant competition against US companies is probably a good thing overall, so I just hope they don't completely give up on training models?

Anthropic

No more Claude Code on Pro plan

This one is more of a short sidenote, but I am keeping it in anyway: earlier this week, Anthropic experimented with not including Claude Code at all in around 2% of new Claude Pro subscriptions (that's the $20 tier).

Usage limits on the $20 plan are already very low, so this seems like a logical next step; thankfully, the backlash made them backpedal for the time being. The way this was communicated was, once again, legendarily bad. That said, don't be surprised if they end up doing this at some point anyway.

Private Discord Mythos Access

Another small sidenote, this one about gatekeeping model behavior: turns out a private Discord has had Mythos access for a couple of weeks now, and mostly used it for fun roleplay stuff? If you were looking for another confirmation that releasing these models is... probably fine, then let this be it.