AI inference is becoming a commodity. Who wins? Who loses?
GLM 5.2 might be another DeepSeek R1 moment, maybe bigger. The open-weight model from Z.ai is really good. I gave it the same one-shot task as Opus 4.8 (to build a cap table simulator) and it roughly matched it for about a seventh of the cost.
In terms of performance, GLM 5.2 is close to Opus 4.8 and GPT-5.5. I’ve put millions of tokens through it and it’s excellent. Prominent people in tech and AI have been raving about it, too.
Jeremy Howard, AI researcher & co-founder of fast.ai, rates GLM 5.2 at least as good as GPT-5.5 and Opus 4.8.
Guillermo Rauch, CEO of Vercel, said he was “impressed, almost shocked, at how good GLM-5.2 by zai_org is at coding,” and that “this changes things.”
Engineers at Kilo, the agentic coding platform, compared GLM’s planning output with Claude Fable 5 on the same rubric. The scores were 9.1 (Fable) and 9.0 (GLM).
Admittedly, open models won’t replace all AI workloads. James Dborin’s analysis is a useful caveat here: across 18 benchmarks, open models are still, on average, about five months behind closed models. But coding is the exception. On coding, open models used to trail by 15 months. Today, the gap is probably no more than two months. So you can’t use open models for everything. But for coding, the race is on.
What does all this add up to?
First, if models like GLM 5.2 can get you 80-90% of the way there at a fifth to an eighth of the price of proprietary models, the enormous revenue numbers the big AI labs have been posting might not prove as durable.
Second, and perhaps more excitingly, open source is an opportunity for anyone building with AI. It brings cheaper inference without big performance trade-offs. This means you no longer have to ration tokens to the same extent.
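To make the rationing point concrete, here is a rough back-of-the-envelope sketch of how far a fixed budget stretches at 5x and 8x lower prices. The budget and the per-million-token price are made-up placeholders, not real list prices for any model, and `tokens_for_budget` is just an illustrative helper.

```python
def tokens_for_budget(budget_usd: float, price_per_million_usd: float) -> float:
    """Return how many million tokens a budget buys at a given price."""
    if price_per_million_usd <= 0:
        raise ValueError("price must be positive")
    return budget_usd / price_per_million_usd


# Hypothetical numbers for illustration only; not real prices.
BUDGET_USD = 1_000.0
PROPRIETARY_PRICE = 20.0  # assumed USD per million tokens

for discount in (5, 8):
    open_price = PROPRIETARY_PRICE / discount
    closed = tokens_for_budget(BUDGET_USD, PROPRIETARY_PRICE)
    opened = tokens_for_budget(BUDGET_USD, open_price)
    print(f"{discount}x cheaper: {closed:.0f}M tokens -> {opened:.0f}M tokens")
```

Under these assumptions, the same budget buys five to eight times as many tokens, which is the whole argument in one line of division.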
We’ve seen some version of this before. Storage and compute are essentially commodities today. We hardly ration the number of files we store in the cloud or the number of virtual machines a software developer can spin up.
That said, we still pay a premium for the orchestration and reliability of these services. So even if AI inference becomes a commodity, it doesn’t necessarily doom the big labs. They have a ton of talent and infrastructure capable of developing and selling additional valuable services.
So the big labs could be fine whether or not inference becomes a commodity.
Who else is in this picture?
If you’re an AI application layer company and your product is suddenly more profitable because you’ve switched to open models, do you pass the savings on to users or keep the margin? What about your competitors?
As for enterprises, if GLM or some other new open model can do 95% of your AI workloads, how do you recut your AI budget? Would you trust an open model over a closed one? What new guardrails and harnesses would you need around something open source?
Most interesting of all: if you’re building an AI startup, what ambitious work have you been holding off on because of token costs? How far would you push things with greater AI capabilities at a fraction of the cost?
Interesting times ahead!