Parameter Update: 2026-19

"gemin-IO-mni" edition

I was expecting Google I/O to be a lot, but even so, this week was heavy all around!

Google IO

Google had a very Google I/O: a million announcements, almost all of which were Gemini. Still, a lot of them were individually interesting, even if most of them will either never exist or be killed within the year. It's impossible to cover everything, but here are some highlights:

Gemini Omni

Google has been pushing Gemini's multimodal abilities for a while (it's one of the reasons Nano Banana is as good as it is), and Gemini Omni takes this up a notch by being "any modality in, any modality out". Unfortunately, we only get "video out" for now. That puts Google in the slightly awkward position of branding this "everything model" as "Nano Banana for video": a strong pitch, but one that arguably undersells what's going on.

Either way, it's available to try for free (so go ahead!), and in my testing the videos look good, though it's not a generational leap over, for example, Seedance 2 (which is probably a stronger endorsement of the latter than a diss on the former).

Gemini 3.5 Flash

The main "Gemini" model announcement was Gemini 3.5 Flash, which Google is positioning as the fast, agentic, coding-capable default for basically everything.

While I have always been a bit skeptical of the Flash series, the benchmark claims are pretty strong, positioning the model as on par with current frontier models (and better than Gemini 3.1 Pro). At the same time, the pricing is not especially “Flash” anymore: the standard API price is $1.50 / $9.00 per million input/output tokens. That puts it in an awkward spot: much cheaper than the biggest frontier models, sure, but expensive enough that token efficiency (the main thing OpenAI and Anthropic have been talking about recently) suddenly starts to matter a lot. Also, given that Google has historically struggled to get its tool calling and harness interactions to work well in practice, I remain skeptical that this will be a better choice than, for example, DeepSeek V4 for budget-oriented agentic use.
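To put the token-efficiency point into numbers, here is a back-of-the-envelope Python sketch using the standard prices quoted above. The shape of the agent run (a fixed number of turns, each re-sending the same context) and all token counts are made-up assumptions for illustration. The sketch also ignores context caching, batch discounts, and any long-context price tiers.

```python
# Standard Gemini 3.5 Flash API prices quoted above, in USD per million tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given number of input and output tokens."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def agent_run_cost(turns: int, context_tokens: int, output_tokens: int) -> float:
    """Hypothetical agent loop: every turn re-sends the full context and emits some output.

    Assumes a flat context size per turn and no caching, which is a simplification.
    """
    return request_cost(turns * context_tokens, turns * output_tokens)


# Two hypothetical runs of the same task: a chatty one and a more token-efficient one.
verbose = agent_run_cost(turns=50, context_tokens=40_000, output_tokens=2_000)
lean = agent_run_cost(turns=25, context_tokens=40_000, output_tokens=2_000)
print(f"verbose: ${verbose:.2f}, lean: ${lean:.2f}")  # verbose: $3.90, lean: $1.95
```

In this toy setup, the re-sent context ($3.00 of the $3.90) dominates the bill. An agent that needs fewer round trips can therefore matter more than a slightly lower sticker price.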

Gemini Spark

Given the wide reach of the Google ecosystem, and the way Google has been pushing proactive usefulness through ever deeper integration of its services, the writing has been on the wall for a while: Gemini Spark is Google doing an OpenClaw.

This means Spark is an always-on background agent, connected to all aspects of your digital life, able to take actions on your behalf, and meant to be interacted with like an executive assistant rather than a chatbot. In the demo, this seemed genuinely useful. But the question is rarely whether the demo is good; for most AI products, the demo tends to be very good. The question is whether Spark will work the way users expect when they actually try it in practice, and whether it does so in a way that justifies the (now cheaper) "starting at" $100/month Google AI Ultra plans. Context management and permissions are the two main things OpenClaw is still struggling with, and Google is not making its life easier by building this as a product for a billion regular, non-technical people (rather than the small horde of Mac Mini enthusiasts who regularly use OpenClaw).

Antigravity 2.0

If Gemini 3.5 Flash turns out to be useful to people in practice, it will almost certainly be due to the Antigravity 2.0 harness, which was mentioned almost as often as the model during the keynote. While the harness will be widely deployed, including in many of the smaller features noted below, the most prominent points of interaction will be the Antigravity desktop application (which now looks almost exactly like the Codex app) and the agy CLI (which replaces the Gemini CLI but appears to be worse in some ways, including the fact that it is now closed source).

Other stuff

Apart from the big headliners above, the two-hour main keynote also featured countless smaller features and integrations, like "braindumping a voice note into a structured Google Doc", "generative UI in Google Search (!)", a "universal shopping cart for the web", "vibecoding native android apps and putting them straight into Google Play", and a million other things. If that sounds interesting to you, take a look at Google's own "100 things we announced at IO" post.

OpenAI

Elon Lawsuit

The OpenAI / Elon lawsuit is finally over. After all the drama of the past week, the conclusion is almost underwhelming: the jury rejected Elon's claims on timeliness grounds, i.e., he simply brought them too late, after the statute of limitations had run out. This means we didn't get any judgement on the merits of what actually went down.

I've seen some people on Twitter wonder why Elon even decided to sue if the timing issue was so clear. The answer is that it actually wasn't that clear, or at least that's the case he tried to make: Musk argued the breach of trust only occurred in 2023, when OpenAI actually started properly morphing into a for-profit. OpenAI argued (and the jury agreed) that by 2017-2019, Musk already knew the facts he later sued over, so the clock actually started then.

Either way, Musk has announced plans to appeal, so this might not be the last we hear from the case.

Erdős Unit Distance Problem

This one seems genuinely impressive: OpenAI says an internal reasoning model disproved a central conjecture about the planar unit distance problem, first posed by Erdős in 1946. Very roughly: if you place n points in the plane, what is the maximum number of pairs that can be exactly distance 1 apart? The longstanding belief was that square-grid-like constructions were essentially optimal. The model found an infinite family of better constructions, and the proof was checked by external mathematicians.
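To make the setup concrete, here is a minimal Python sketch of the classical grid idea. This is my own illustration, not OpenAI's construction, and the function names are hypothetical. The idea: take a k × k integer grid, find the squared distance that occurs most often among all pairs, and rescale the grid so that this distance becomes exactly 1.

```python
from collections import Counter
from itertools import combinations
from math import dist, isclose, sqrt


def grid_points(k):
    """All points (x, y) of a k x k integer grid."""
    return [(x, y) for x in range(k) for y in range(k)]


def most_common_distance_pairs(points):
    """Return (squared_distance, pair_count) for the most frequent squared distance.

    Squared distances between integer points are integers, so this count is exact.
    """
    counts = Counter(
        (px - qx) ** 2 + (py - qy) ** 2
        for (px, py), (qx, qy) in combinations(points, 2)
    )
    return counts.most_common(1)[0]


def count_unit_distance_pairs(points, tol=1e-9):
    """Count pairs of (float) points that are 1 apart, up to floating-point rounding."""
    return sum(1 for p, q in combinations(points, 2) if isclose(dist(p, q), 1.0, abs_tol=tol))


k = 10
points = grid_points(k)
d_sq, pairs = most_common_distance_pairs(points)

# Rescale so the most common distance sqrt(d_sq) becomes exactly 1.
scale = 1 / sqrt(d_sq)
scaled = [(x * scale, y * scale) for x, y in points]

print(f"{k * k} points, most common squared distance {d_sq}, {pairs} pairs")
print("unit-distance pairs after scaling:", count_unit_distance_pairs(scaled))
```

For k = 10, the winner is already √5 rather than 1 (288 pairs versus 180). Distances that can be written as a sum of two squares in many ways show up unusually often. Erdős's argument pushes this to roughly n^(1 + c / log log n) unit distances for n points, which is the kind of grid construction the result described here reportedly beats.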

While there is some hedging around the proof being a case of "reapplying known methods" instead of coming up with something truly novel (and there might be something to that claim; I am in no position to tell!), the published result seems to impress serious mathematicians more consistently than most previous work of this kind I have seen.

According to erdosproblems.com, 671 open problems remain, and I assume we'll see that number decrease in short order.

Cursor: Composer 2.5

Narratively, I would love to claim we're already seeing some of the xAI compute being put to good use by the folks at Cursor, but their blog post implies this model actually predates that deal. Either way, Composer 2.5 is built on the same Kimi 2.5 base as Composer 2 but improves on it by a surprising amount. In Cursor's own benchmarks, it matches or beats Opus 4.7 and GPT-5.5 in all but the highest reasoning settings, while being massively cheaper.

I stopped using Cursor's AI features a couple of months ago after my free student plan ran out, but if the usage budget in the $20 tier is actually acceptable, this might be enough to win me back! The release also produced one of the more fun benchmark comparisons in recent memory (keep in mind that Composer 2.5 is specifically tuned for coding, so the comparison might not be entirely fair).

Anthropic

Hiring Andrej Karpathy

After spending the past ~two years making educational content around LLMs, Andrej Karpathy (one of the founding members of OpenAI, who later led self-driving at Tesla) has now joined Anthropic. The last big project of his that blew up was Auto Research, so let's see whether that gives us an indication of what he will work on over there.

SpaceX Compute Deal

A couple of weeks ago I noted that xAI didn't really seem to know what to do with all the compute it had built, now that no one wanted to use its models. Shortly thereafter, it signed a deal with Cursor to give them some cash and compute (and, optionally, acquire them) to build big new models.

This week, it appears they found someone else interested in the GPUs: Anthropic, which is apparently paying SpaceX $1.25B per month for compute. At first glance this makes sense: Anthropic desperately needs compute, and SpaceX has a bunch of it. On the other hand, it comes at the same time as SpaceX is preparing a $2B IPO centered in large part on its AI capabilities. It also means Anthropic is paying a direct competitor (especially if SpaceX actually ends up acquiring Cursor down the line). Slightly bizarre all around, but as long as it leads to better Claude rate limits, I'm all for it.