Coinbase’s CEO vows to route prompts to cheaper models
Coinbase CEO Brian Armstrong says the company is routing prompts to cheaper AI models when appropriate to keep costs roughly flat as token usage grows. He expects most workloads to run on cheaper models within 12–18 months, with the newest models reserved for tasks that push intelligence to the edge, such as scientific breakthroughs or agent orchestration.
For users watching their token bills climb, Coinbase’s CEO offered a message that read almost like a budget memo: don’t spend expensive model capacity on work a cheaper model can handle.
On Sunday, Coinbase CEO Brian Armstrong wrote on X that the company is “working hard on routing prompts to cheaper models where appropriate,” and that in some cases it has been able to keep costs “roughly flat” even as token usage continues to grow. The effort isn’t just about saving money on paper. It’s about sustaining the economics of AI usage while demand for compute keeps accelerating.
Armstrong’s post came with a clear warning about why this matters. He pointed to the latest model releases, citing Opus 4.8 and GPT-5.5, as offering “bleeding-edge benefits” while also consuming more tokens. That token appetite, he noted, becomes especially noticeable once features like “Fast mode” are turned on. He also referenced earlier user complaints about Anthropic’s Opus 4.7, when many users said they were quickly hitting rate limits.
The strategy Armstrong described rests on shifting where the highest costs land. He anticipates that “80% of workloads will be running on 99% cheaper models within 12-18 months.” In his view, the newest models will be used only when a task calls for pushing intelligence to the edge, which he called “IQ maxing.” Examples include “scientific breakthroughs or agent orchestration.”
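To make Armstrong’s figures concrete, here is a back-of-the-envelope sketch. It takes his numbers at face value and assumes, as a simplification, that every workload uses the same number of tokens; the function name `blended_cost` is purely illustrative. Under those assumptions, moving 80% of workloads to models that are 99% cheaper cuts spend to about 21% of an all-frontier bill. That would absorb roughly a 4.8x increase in token usage before costs rise.

```python
def blended_cost(cheap_share: float, cheap_discount: float) -> float:
    """Spend relative to running every workload on a frontier model (= 1.0).

    Illustrative only: assumes every workload consumes the same number of
    tokens, which real workloads do not.
    """
    cheap_price = 1.0 - cheap_discount      # "99% cheaper" -> 0.01 of the frontier price
    frontier_share = 1.0 - cheap_share      # workloads still sent to the newest models
    return frontier_share * 1.0 + cheap_share * cheap_price


# Armstrong's figures: 80% of workloads on models 99% cheaper.
cost = blended_cost(cheap_share=0.80, cheap_discount=0.99)
print(f"relative spend: {cost:.3f}")                            # relative spend: 0.208
print(f"usage growth that keeps spend flat: {1 / cost:.1f}x")   # usage growth that keeps spend flat: 4.8x
```

In this simple model, the 20% of workloads left on frontier models still accounts for about 96% of the remaining spend (0.2 out of 0.208). That is why the choice of which tasks count as “IQ maxing” would dominate the bill.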
Armstrong then framed the trade-off less as a pricing tactic than as a forecast about what will limit AI’s progress. “This leads me to think the limiting factor will be energy and compute, not better models,” he wrote.
His message quickly spread beyond Coinbase’s own circles. Venture capitalist Marc Andreessen called it “interesting.” Hugging Face cofounder Julien Chaumond said that “model routing is growing a lot these days.” Box CEO Aaron Levie responded that Armstrong’s numbers were “a bit extreme,” adding that AI use would likely “stratify” in the coming years, with “high end” work handled by leading models and “high volume” tasks shifting to cheaper ones. Harvey cofounder Winston Weinberg weighed in with a single line: “Intelligence allocation is going to be extremely important.”
The debate also touched on a broader shift in the culture around AI spending. The efficiency mindset Armstrong is publicly advocating is relatively new, particularly compared with the earlier era when “tokenmaxxing” was the rage and tech leaders posted their high token bills or highlighted their use of the latest models. In the startup world, that approach had a famous cheerleader: Y Combinator CEO Garry Tan advised founders to “let it rip” with tokens. Lance Yan, a YC-backed startup founder, told Business Insider in April that rationing tokens was “stupid.”
Now, the tone is shifting. Glean cofounder Tony Gentilcore commented that Armstrong’s post was “spot on,” writing that “Everyone technical already knows this,” and adding that “The financial markets are the only ones extrapolating out Opus prices to infinite scale.”
Underneath the arguments is a single pressure point: as token demand grows exponentially, companies that want AI to scale can’t treat every prompt as if it deserves the most expensive model in the lineup. Armstrong’s post frames routing to cheaper models as the practical bridge between ambitious usage and real-world cost control, an approach that, for now, has captured attention not just in crypto circles but across the broader AI business world.
So they’re doing coupons for AI prompts now?
That sounds like “we’re cheaping out” but dressed up as efficiency. If tokens keep going up, how is “roughly flat” even real? Like I’m still paying at the end.
I’m confused bc doesn’t Coinbase deal with crypto not AI? Also “IQ maxing” sounds like marketing lol. Opus 4.8 and GPT-5.5—aren’t those the ones everyone was complaining about rate limits with?
Routing prompts to cheaper models sounds great until you realize the “newest models reserved for…” whatever that means. So basically if you turn on Fast mode they’ll save money by giving you slower/less capable answers? Or am I mixing it up with that Opus 4.7 thing? I swear every time there’s a new model release my app gets weird and then “token usage” is suddenly the problem.