Coinbase pushes tokenmaxxing while cutting AI costs

Coinbase CEO Brian Armstrong laid out five internal steps aimed at keeping AI spend low without restricting how many tokens engineers use. His plan pairs cheaper default large language models and smarter routing with cost controls that focus on efficiency and …

On Friday, Coinbase CEO Brian Armstrong posted a clear message to engineers building with AI: keep tokenmaxxing, but don’t let costs spiral.

In an X post, Armstrong outlined five strategies his company is using to keep AI spend down while still letting engineers use as many tokens as they want. The stakes are real inside the company: token usage has recently reached one of the highest levels in Coinbase’s history, while AI spending has fallen to nearly half its peak level, based on a graph he attached to the post.

Armstrong’s first move targets what happens at the start of an AI workflow: the default model. He said Coinbase is experimenting with cheaper Chinese large language models as defaults, rather than relying on frontier American AI labs like Anthropic and OpenAI.

He wrote that Coinbase is “experimenting with defaulting to open weight models like GLM 5.2 and Kimi 2.7 through our LLM gateway,” while still encouraging engineers to select the right model for the task. GLM 5.2 and Kimi 2.7, Armstrong said, are models developed by the Chinese AI labs Z.ai and Moonshot AI, respectively.
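To make the idea of a gateway default concrete, here is a minimal Python sketch of a request that falls back to a default model unless the engineer picks one explicitly. It is an illustration only, not Coinbase’s gateway: the names `DEFAULT_MODEL`, `ALLOWED_MODELS`, `GatewayRequest`, and `resolve_model`, along with the model identifiers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model identifiers; a real gateway would use its own naming.
DEFAULT_MODEL = "open-weight-default"
ALLOWED_MODELS = {"open-weight-default", "frontier-large"}


@dataclass
class GatewayRequest:
    prompt: str
    model: Optional[str] = None  # None means "use the gateway default"


def resolve_model(request: GatewayRequest) -> str:
    """Return the explicit model choice if one was given, else the default."""
    if request.model is None:
        return DEFAULT_MODEL
    if request.model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {request.model}")
    return request.model
```

In this sketch the cheaper model is used only when nobody asks for something else, which mirrors the stated intent of a default that still lets engineers select the right model for the task.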

The second strategy is about routing work to the right model based on difficulty. Armstrong described the tradeoff in plain terms: a frontier model might be useful for planning, but it can be overkill for execution. “Ultimately, humans shouldn’t be choosing models – AI can automate this task,” he wrote. Earlier in June, Armstrong had already spoken about this approach.
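The planning-versus-execution split can be sketched as a simple lookup. Armstrong’s post describes AI making the choice automatically; the version below swaps in a fixed rule keyed on the phase of the work, purely for illustration. `Phase`, `ROUTES`, `assign_models`, and the tier names are all hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class Phase(Enum):
    PLANNING = "planning"
    EXECUTION = "execution"


# Hypothetical tier names, not real model identifiers.
ROUTES = {
    Phase.PLANNING: "frontier-large",        # harder, open-ended reasoning
    Phase.EXECUTION: "open-weight-default",  # routine, well-specified steps
}


def assign_models(steps: List[Tuple[str, Phase]]) -> List[Tuple[str, str]]:
    """Pair each step description with the model tier for its phase."""
    return [(description, ROUTES[phase]) for description, phase in steps]


workflow = [
    ("draft migration plan", Phase.PLANNING),
    ("rename column in each service", Phase.EXECUTION),
]
assignments = assign_models(workflow)
```

Here only the planning step is sent to the expensive tier. A production router would presumably classify difficulty itself rather than rely on a hand-labeled phase.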

Cost discipline, in his telling, doesn’t come from asking engineers to use less. It comes from reducing waste once a prompt is already in motion. The third strategy, Armstrong said, is better caching, a technique that reduces inference costs.
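The post does not say which kind of caching Coinbase uses. As one simple illustration of the general idea, the Python sketch below implements an exact-match response cache: a repeated (model, prompt) pair is answered from memory instead of triggering a second paid call. `ResponseCache` and `fake_model` are hypothetical names, and `fake_model` merely stands in for a real inference call.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache keyed on a hash of the model name and prompt."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._call_model(model, prompt)  # the only paid call
        self._store[key] = result
        return result


def fake_model(model: str, prompt: str) -> str:
    """Stand-in for a real inference call (an assumption for this sketch)."""
    return f"[{model}] echo: {prompt}"


cache = ResponseCache(fake_model)
first = cache.complete("open-weight-default", "explain this stack trace")
second = cache.complete("open-weight-default", "explain this stack trace")
```

After the two identical requests, the cache records one miss and one hit, so the underlying model was called only once. Provider-side prompt caching, which reuses work on a shared prompt prefix, is a different mechanism that this sketch does not attempt to model.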

Fourth, he pointed to keeping context lean. The method is straightforward: start new sessions when switching between tasks, rather than letting context carry over and inflate the amount of information models must process.
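A rough back-of-the-envelope model shows why fresh sessions help. It assumes each turn re-sends the whole conversation so far as input, and it ignores model replies and any caching. The turn sizes are made-up numbers, and `input_tokens_processed` is a hypothetical helper.

```python
from typing import List


def input_tokens_processed(turn_sizes: List[int]) -> int:
    """Total input tokens for one session where each turn re-sends prior turns."""
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # context grows with every turn
        total += history  # the model reads the full context each time
    return total


# Made-up token counts for the turns of two unrelated tasks.
task_a = [1000, 500, 500]
task_b = [800, 400]

one_session = input_tokens_processed(task_a + task_b)
fresh_sessions = input_tokens_processed(task_a) + input_tokens_processed(task_b)
```

Under these assumptions, one long session processes 10,500 input tokens while two fresh sessions process 6,500, because the second task no longer drags the first task’s 2,000 tokens of history through every turn.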

The final strategy is transparency. Armstrong said Coinbase will “improve visibility into AI spending across the company” so that engineers can use as many tokens as they want while still seeing their own usage. He added that Coinbase will expect “more impact” from employees who spend more on AI.
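Visibility of this kind usually starts with a simple roll-up of usage records into spend per person. The Python sketch below is one way that could look. It is not based on any detail from the post: `UsageRecord`, `spend_by_engineer`, the model names, and the prices are all hypothetical, and the prices are not real vendor pricing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical prices in USD per million tokens; not real vendor pricing.
PRICE_PER_MILLION = {"frontier-large": 10.0, "open-weight-default": 1.0}


@dataclass
class UsageRecord:
    engineer: str
    model: str
    tokens: int


def spend_by_engineer(records: Iterable[UsageRecord]) -> Dict[str, float]:
    """Sum estimated spend in USD for each engineer across all records."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        price = PRICE_PER_MILLION[record.model]
        totals[record.engineer] += record.tokens / 1_000_000 * price
    return dict(totals)


records = [
    UsageRecord("alice", "frontier-large", 2_000_000),
    UsageRecord("alice", "open-weight-default", 3_000_000),
    UsageRecord("bob", "open-weight-default", 5_000_000),
]
report = spend_by_engineer(records)
```

With these invented numbers, both engineers use five million tokens, but alice’s estimated spend is $23 against bob’s $5, which is the kind of gap a per-person view makes visible.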

Armstrong framed the goal as growth without breaking the budget. “The goal isn’t to suppress usage. It’s to build the infrastructure that makes exponential growth sustainable,” he wrote.

His post landed less than two months after Coinbase laid off 14% of its staff, partly because AI has been changing how people work. In a May post, Armstrong said he’d watched engineers use AI to ship in days what used to take a team weeks, adding that “the pace of what’s possible with a small, focused team has changed dramatically.”

He also placed his approach in the context of a broader industry debate. He said his strategy follows an industry shift away from the short-lived tokenmaxxing trend, which had pushed some companies to impose usage caps on employees to curb rampant token consumption.

The result, at least inside Coinbase right now, is a paradox engineers can live with: token usage has surged to some of the highest levels in the company’s history, while AI spending has dropped to nearly half its peak. According to Armstrong, that drop is supported by cheaper default models, smarter routing, caching, leaner context, and visibility that turns token use into something engineers can see and manage.

4 Comments

  1. Tokenmaxxing sounds like another crypto TikTok thing. Why would they even care how many tokens engineers use lol. Also “Chinese models”??? I dunno, feels sketchy.

  2. I read “router” and thought they’re sending trades to different exchanges or something. But then it’s like routing AI tasks… still, frontier model overkill, execution model, whatever. Wouldn’t this just make the AI worse though?

  3. Half its peak level spending is still a lot of money. And they’re defaulting to open weight models like GLM and Kimi, which sounds like cheaper knockoffs, so are they basically nerfing stuff? Also token usage hit one of the highest levels in Coinbase history—so engineers got told to stop… but also keep tokenmaxxing? sounds like corporate doublespeak to me.
