Parameter Update

Parameter Update: 2025-34

"hallucinations?" edition

Gereon Elvers

08 Sep 2025

This week's been pretty slow, so no hard feelings if you skip it!

Qwen3-Max-Preview

After almost being outdone by a Chinese delivery service last week, Alibaba shot back this week with its largest model yet: Qwen3-Max, now live on OpenRouter. Apart from some very impressive benchmark numbers for a non-reasoning model, the most interesting thing here is that Alibaba isn't releasing open weights (yet?) and will only offer the model through its own "Qwen Chat" or via API.

Big news: Introducing Qwen3-Max-Preview (Instruct) — our biggest model yet, with over 1 trillion parameters! 🚀

Now available via Qwen Chat & Alibaba Cloud API.

Benchmarks show it beats our previous best, Qwen3-235B-A22B-2507. Internal tests + early user feedback confirm:… pic.twitter.com/7vQTfHup1Z
— Qwen (@Alibaba_Qwen) September 5, 2025

EmbeddingGemma

While I haven't been following the embedding model space too closely, this one looks very useful: Google’s new 300M-parameter, open-weights multilingual embedding model, built on Gemma 3, ranks highest on MTEB among open multilingual embedding models under 500M parameters while being small enough to run on a phone. That could make high-quality on-device vector search very feasible, even on lower-end Android devices.
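On-device vector search is conceptually simple: embed your documents once, embed the query, and rank documents by cosine similarity. Below is a minimal brute-force sketch of that ranking step. It does not call EmbeddingGemma at all; the three-dimensional vectors and document names are hypothetical stand-ins for real model outputs, chosen only to show how the ranking works.

```python
import math

# Hypothetical toy "embeddings"; in practice these would come from an embedding model.
INDEX = {
    "grocery_list": [0.9, 0.1, 0.0],
    "recipe_pasta": [0.7, 0.6, 0.1],
    "train_ticket": [0.0, 0.2, 0.95],
}
# Hypothetical query embedding, e.g. for "what do I need to cook dinner?"
QUERY = [0.8, 0.5, 0.05]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Brute-force search: score every stored vector and return the k best document ids."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


if __name__ == "__main__":
    print(top_k(QUERY, INDEX, k=2))  # ['recipe_pasta', 'grocery_list']
```

A linear scan like this is fine for the few thousand items a phone might hold; larger collections would typically use an approximate nearest-neighbor index instead.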

OpenAI

Why AIs hallucinate

While OpenAI's new blog post on "Why AIs hallucinate" seems pretty fluffy, the underlying paper is actually very interesting. It argues that hallucinations are not a fundamental problem of current LLM architectures, but an artifact of training that rewards guessing over acknowledging uncertainty. If that's right, they may be solvable by improving calibration rather than accuracy. The paper also features one of the more intuitive visuals on the cause of hallucinations I've seen yet (below):

Copied from OpenAI's new paper, Kalai et al. (2025)
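To make the incentive argument concrete, here is a minimal sketch of the expected-score comparison behind it. It assumes a scoring rule where a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer costs a penalty; with the penalty set to t/(1-t), answering only pays off when the model's confidence p exceeds t. If I read the paper right, this mirrors the explicit confidence targets it proposes, but the function names here are my own.

```python
# Minimal sketch (assumed scoring rule): correct = +1, abstain = 0, wrong = -penalty.
# Answering beats abstaining iff p - (1 - p) * penalty > 0.
# With penalty = t / (1 - t), that condition simplifies to p > t.


def expected_score_if_answering(p_correct: float, penalty: float) -> float:
    """Expected score when committing to an answer that is right with probability p_correct."""
    return p_correct * 1.0 - (1.0 - p_correct) * penalty


def should_answer(p_correct: float, penalty: float) -> bool:
    """Answer only if the expected score beats the guaranteed 0 of abstaining."""
    return expected_score_if_answering(p_correct, penalty) > 0.0


def confidence_target_penalty(t: float) -> float:
    """Penalty t / (1 - t), which makes answering worthwhile only when p_correct > t."""
    return t / (1.0 - t)


if __name__ == "__main__":
    p = 0.3  # the model is only 30% sure of its answer
    # Binary accuracy grading: wrong answers cost nothing, so guessing always wins.
    print(should_answer(p, penalty=0.0))  # True
    # With a confidence target of t = 0.75, a 30% guess is no longer worth it...
    print(should_answer(p, confidence_target_penalty(0.75)))  # False
    # ...but a 90%-confident answer still is.
    print(should_answer(0.9, confidence_target_penalty(0.75)))  # True
```

Under plain accuracy grading (penalty 0), any guess with p > 0 has a higher expected score than abstaining, which is exactly the pressure toward confident guessing that the paper describes.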

Jobs Platform & Certifications

In what reads like an extremely corporate post, OpenAI has announced that it will build an OpenAI Jobs Platform, meant to connect companies with AI adoption experts. In the same vein, it will also launch OpenAI certifications. Neither is live yet, so it's hard to judge, but my guess is that these will be mostly worthless as a signal of technical skill while being very valuable as marketing. Clever, but I'm not convinced.

Acquiring Statsig

In the final OpenAI news of the week, the company has announced its acquisition of Statsig, a platform for data-driven product development. This comes at a time when my Twitter timeline is heavily debating the value of evals vs. A/B tests, so it seems OpenAI has committed heavily to the latter.

Claude Code: no evals

[well known code agent company]: no evals

[well known code agent company 2]: kinda halfassed evals

[leading vibe coding company]: no evals

[ceo of company selling you evals]: mmmmm yess all my top customers do evals, you should do evals

[vc's in love… https://t.co/FMg8lMyF2Q pic.twitter.com/qKlxIZsZet
— swyx (@swyx) September 4, 2025

Mistral: ASML buying ~11%

In a surprise turn, ASML (the chipmaker-maker) is now the largest shareholder in Mistral, investing $1.5 billion of Mistral's $2 billion Series C for around 11% of the company. I will say I really like this development: part of why the EU is so weak in tech is that we have very few big players, so while I am all for EU antitrust legislation in theory, moves like this still make me hopeful.
