Parameter Update: 2026-15
"codex-xcode" edition
It's been a busy week. OpenAI launches are the big news obviously, but there's also just... a lot of smaller stuff going on.
OpenAI
GPT-5.5
After all the "Spud" vagueposting OpenAI has done over the last couple of weeks, we finally got access to their newest model, GPT-5.5. And while it's not quite Mythos level (though it does come somewhat close on some benchmarks), it's still a very cool launch, made better by the fumble that was the Opus 4.7 launch last week.
The model is effectively a general intelligence bump that matches or exceeds Opus 4.7 on almost every metric, and reception has been mostly positive. The release notes also repeatedly highlight the higher token efficiency, which means tasks get done a lot faster. Unfortunately, this higher efficiency doesn't translate into cost savings, as the model is also much more expensive than before - around 2x GPT-5.4 or 20% more than Opus 4.7 (ouch). In absolute terms: $0.50 cached / $5.00 uncached / $30 output per million tokens.
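To make the relative-pricing framing concrete, here's a quick back-of-the-envelope sketch. The assumption that the "2x GPT-5.4, 20% over Opus 4.7" multipliers apply uniformly to all three rate categories is mine, not something OpenAI has stated:

```python
# Listed GPT-5.5 rates in USD per million tokens.
gpt_5_5 = {"cached": 0.50, "uncached": 5.00, "output": 30.00}

# Working backwards from the stated multipliers (assumed uniform across
# cached / uncached / output rates):
implied_gpt_5_4 = {k: v / 2 for k, v in gpt_5_5.items()}     # GPT-5.5 is ~2x GPT-5.4
implied_opus_4_7 = {k: v / 1.2 for k, v in gpt_5_5.items()}  # ~20% more than Opus 4.7

print(implied_gpt_5_4["uncached"])              # 2.5
print(round(implied_opus_4_7["uncached"], 2))   # 4.17
```

So under that reading, GPT-5.4 would have sat around $2.50 uncached and Opus 4.7 around $4.17 - rough numbers only, since the multipliers in the announcement are themselves approximate.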
In my own experience, I have noticed some of the same laziness that Opus 4.7 tends to exhibit - if you don't give the model enough context to "care" about the work it's doing, it will try to sneak by with as little effort as it can get away with.
GPT-5.5 delivers this step up in intelligence without compromising on speed.
— OpenAI (@OpenAI) April 23, 2026
GPT-5.5 matches GPT-5.4 per-token latency in real-world serving, while performing better across nearly every evaluation we measured.
It also uses significantly fewer tokens to complete the same Codex… pic.twitter.com/5mR46SM7mW
Taking a look at the system card, I also can't help but laugh at the difference in framing OpenAI takes vs. Anthropic:
Anthropic System Card:
— Nathan Calvin (@_NathanCalvin) April 23, 2026
Claude accessed the internet without us asking, enjoys brooding philosophy, and is seeing a psychotherapist.
OpenAI System Card:
Here are lots of graphs. The graphs are slightly taller than the previous graphs. https://t.co/T4qVNf19up
GPT-Image-2
The other model OpenAI launched this week is actually much more exciting. Altman talked about not realizing there was a quality barrier that could be crossed until after it had been done - i.e., not knowing what to ask for in a new model. GPT-image-2 is a really good example of this for me.
I knew that Nano Banana Pro wasn't perfect, but I also didn't expect such a jump in quality. Benchmarking image models is hard, but the numbers here feel about right:
Exciting news - GPT-Image-2 by @OpenAI has claimed the #1 spot across all Image Arena leaderboards!
— Arena.ai (@arena) April 21, 2026
A clean sweep with a record-breaking +242 point lead in Text-to-Image - the largest gap we’ve seen to date.
- #1 Text-to-Image (1512), +242 over #2 (Nano-banana-2 with web-search… https://t.co/YYKjhgjhsn pic.twitter.com/IBN9a1RIJ4
The breakthrough here is similar to Nano Banana - incorporating reasoning into the generation process - with the addition of letting the model go back and make further edits before presenting the image. I'll include some more examples below, but this is one of those models you really just need to try.
what. what. what.
— Justin Schroeder (@jpschroeder) April 21, 2026
gpt-image-2 almost passes the pelican test...in a screenshot of a code editor. pic.twitter.com/KK0BZEOP3f
Greater Precision and Control
— OpenAI (@OpenAI) April 21, 2026
ChatGPT Images 2.0 can conceptualize more sophisticated images, and then actually bring that vision to life effectively.
It’s able to follow instructions, preserve requested details, and render the fine-grained elements that often break image… pic.twitter.com/n29165pV9Q
Text-heavy visuals get more practical.
— OpenAI Developers (@OpenAIDevs) April 21, 2026
gpt-image-2 improves multilingual text rendering and structured image generation for diagrams, infographics, charts, comics, and multi-panel scenes. pic.twitter.com/cp06c9nqYo
Privacy Filter
OpenAI's final model drop of the week is less dramatic, but I still want to include it in case anyone finds it useful. The idea of a PII redaction model isn't new, but seeing them push it forward technically is still cool.
very nice release by @OpenAI! a 50M active, 1.5B total gpt-oss arch MoE, to filter private information from trillion scale data cheaply. keeping 128k context with such a small model is quite impressive too https://t.co/zdZQWUA4T9
— elie (@eliebakouch) April 22, 2026
Chinese Model News
Since we got a bunch of Chinese model releases this week, I decided to bundle them together.
DeepSeek V4
There has been a ton of speculation about DeepSeek's new model, so this launch finally happening should have been a big deal. But while it's still a cool model - matching or exceeding Claude 4.6 and GPT-5.4 is really good! - it's also not quite as much of a breakthrough as some people were hoping. The bigger story here is not absolute performance, but (1) training efficiency and (2) price.
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
— DeepSeek (@deepseek_ai) April 24, 2026
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params.… pic.twitter.com/n1AgwMIymu
For one, the model is again dramatically cheaper than western equivalents. The same week OpenAI doubled their prices with GPT-5.5, V4 launched at 86% cheaper than GPT-5.5. Days later, they announced they are dropping cached input pricing by another 90%. Pricing now sits at $0.0145 cached / $1.74 uncached / $3.48 output per million tokens (after their current 75% off promo runs out).
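To get a feel for what that gap means in practice, here's a small sketch comparing the listed per-million-token rates on a hypothetical workload. The 50/10/5 cached/uncached/output split is an assumption I picked to resemble a cache-heavy agentic coding session, not a figure from either company:

```python
# Per-million-token prices (USD) as listed above.
GPT_5_5 = {"cached": 0.50, "uncached": 5.00, "output": 30.00}
DEEPSEEK_V4 = {"cached": 0.0145, "uncached": 1.74, "output": 3.48}

def job_cost(prices, cached_m, uncached_m, output_m):
    """Total USD cost for a job, with token counts given in millions."""
    return (prices["cached"] * cached_m
            + prices["uncached"] * uncached_m
            + prices["output"] * output_m)

# Hypothetical job: 50M cached input, 10M uncached input, 5M output tokens.
openai_cost = job_cost(GPT_5_5, 50, 10, 5)        # 25 + 50 + 150 = 225.0
deepseek_cost = job_cost(DEEPSEEK_V4, 50, 10, 5)  # 0.725 + 17.4 + 17.4 = 35.525
print(f"GPT-5.5: ${openai_cost:.2f}, V4: ${deepseek_cost:.2f}, "
      f"savings: {1 - deepseek_cost / openai_cost:.0%}")
```

On this particular mix the savings come out around 84%, in the same ballpark as the 86% headline figure; the exact number shifts with the cached/uncached/output ratio of your workload.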
The other piece of news is that, assuming reported training compute numbers are correct, the model might have been 10x cheaper to train than western models? These types of gains aren't unprecedented, but they're still rare to see. It's also a good indicator of how compute-constrained China still is.
DS didn't report cost of v4 final training run, but could be $15-30M ~ 10²⁵ FLOPs
— steve hsu (@hsu_steve) April 24, 2026
Algorithm efficiency gains mean ~10x fewer FLOPs vs latest US closed models?
Looks like mixed HW + Nvidia training infrastructure, but optimized for HW inference.https://t.co/d7JxLE55DX
A sensible take on the topic appears to be:
DeepSeek V4 is impressive because it’s a near-SOTA model with highly efficient 1 million token context that can run on Huawei’s new Ascend 950PR chips.
— Kyle Chan (@kyleichan) April 24, 2026
But equally notable is what V4 didn’t do:
- No mention of training on Chinese AI chips
- Still lags behind US frontier models…
Kimi K2.6
DeepSeek's launch would have been much more impressive if Moonshot hadn't pre-empted it with their K2.6 launch:
Meet Kimi K2.6: Advancing Open-Source Coding
— Kimi.ai (@Kimi_Moonshot) April 20, 2026
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python(86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+… pic.twitter.com/wkzsQqKphv
The model beats DeepSeek on the Artificial Analysis leaderboard, and appears tied in many other benchmarks.
DeepSeek is back among the leading open weights models with the release of DeepSeek V4 Pro and V4 Flash, with V4 Pro second only to Kimi K2.6 on the Artificial Analysis Intelligence Index @deepseek_ai has released DeepSeek V4 Pro and V4 Flash. V4 is the first new architecture… pic.twitter.com/grL6nsZ1qL
— Artificial Analysis (@ArtificialAnlys) April 24, 2026
Pricing sits at $0.16 cached, $0.95 uncached, $4 output, so very similar to DeepSeek before they slashed cached pricing. Either way, it's a good week for Chinese models.
Meta / Manus Acquisition blocked
A few weeks (months?) ago, we talked about Meta acquiring Manus for $2B. It was my understanding then that the deal was basically done, but it now appears that Chinese regulators have blocked it for the time being.
Breaking news: China has blocked Meta’s $2bn acquisition of artificial intelligence platform Manus, after regulators reviewed whether the deal violated Beijing’s investment rules. https://t.co/hsuAdD1HUB pic.twitter.com/anPdvfcNYJ
— Financial Times (@FT) April 27, 2026
SpaceX x Cursor
It's been a while since the xAI / Grok team (which is now organized under SpaceX, funnily enough) has shipped a real frontier model. They also had a rather significant talent exodus, including many of the cofounders, which might explain why. On the other hand, Cursor has a lot of talented engineers but not enough compute to actually train models (remember their "Composer" Kimi fine-tune?).
You might see where this is going: SpaceX has agreed to a rather convoluted deal that has the companies working together over the next couple of months to train new models. If that goes according to plan, they then reserve the right to acquire Cursor for $50B later on.
some quick thoughts on the xai cursor deal
— FleetingBits (@fleetingbits) April 22, 2026
1) spacex has agreed to either pay cursor $10bn for some joint model work or otherwise to acquire cursor outright for $60bn later this year
2) i suspect that elon thinks of the $50bn option to buy cursor more as a performance incentive… https://t.co/zpaFqCP3qH
This comes right as SpaceX is planning their IPO, targeting a $2 trillion valuation (lol).
wonder how will Elon rename cursor, considering both xcode and codex are taken
— simran sachdeva (@simranrambles) April 25, 2026
Cohere x Aleph Alpha
It's been a couple of quiet months for German AI company Aleph Alpha. Years ago, they positioned themselves as a direct competitor to OpenAI, before pivoting into a more service-oriented, less ambitious market under new leadership. This week, they announced they would merge with Cohere. The money situation around the merger is a little odd: Cohere was last valued at $6.8B, Aleph Alpha at around $3B. The new company attracted $600 million from Schwarz Group, conditioned on it using Schwarz's cloud offerings. According to Handelsblatt, the new joint venture is now valued at $20B.
Now, I am not sure where this valuation increase is coming from, but if I had to guess: This deal gives Aleph Alpha a solid exit strategy without having to embarrass themselves further, while also giving Cohere access to the German defense market (so don't be surprised if that ends up being their biggest growth sector soon).
Either way, having any relevant competition against US companies is probably a good thing overall, so I just hope they don't completely give up on training models?
Today, we announce a landmark agreement with @cohere. By uniting our European research depth with global AI scale, we are building a transatlantic AI powerhouse to give enterprises control over their AI.
— Aleph Alpha (@Aleph__Alpha) April 24, 2026
Learn more about our shared vision here: https://t.co/6GLIosEION pic.twitter.com/97GZAMl1es
Anthropic
No more Claude Code on Pro plan
This one is more of a short sidenote, but I am keeping it in anyway: Earlier this week, Anthropic experimented with not including Claude Code in around 2% of new Claude Pro (that's the $20 tier) subscriptions at all.
Usage limits on the $20 plan are already very low, so this seems like a logical next step; however, the backlash thankfully made them backpedal for the time being. The way this was communicated was, once again, legendarily bad. That being said, don't be surprised if they end up doing this at some point anyway.
For clarity, we're running a small test on ~2% of new prosumer signups. Existing Pro and Max subscribers aren't affected. https://t.co/CkTiVCTmd7
— Amol Avasare (@TheAmolAvasare) April 21, 2026
Private Discord Mythos Access
Another small sidenote, this one about gatekeeping model behavior: Turns out a private Discord has had Mythos access for a couple of weeks now, and mostly used it for fun roleplay stuff? If you were looking for another confirmation that releasing these models is... probably fine?, then let this be it.
A private discord has had access to Mythos since launch. https://t.co/qTrUuJxgMU pic.twitter.com/p3Qidov8yu
— Andrew Curran (@AndrewCurran_) April 21, 2026