Gemini inches ahead as Vercel customers demand more
Gemini Flash – Even before Google's big announcements next week, demand for Gemini is showing up inside Vercel's AI Gateway, where tokens flowing through the platform have shifted toward Google's Gemini models, with different labs winning depending on how companies measure success.
When Guillermo Rauch talks about AI demand, he doesn’t sound worried about hype cycles. He sounds concerned about supply.
At Anthropic's developer conference last week, the Vercel CEO told MISRYOUM's news desk he ran into Google first, not on a stage, but in conversations with people building products. Rauch said he had even contacted a top Google executive recently to ask for more Gemini tokens, describing it as a sign of how quickly customer needs are outpacing what's available.
While much of the spotlight has stayed on Anthropic and OpenAI, Rauch said Google’s Gemini has been quietly gaining ground with Vercel’s customers—AI startups, software firms, and enterprise teams using a single connection point to access multiple model providers.
Vercel’s AI Gateway is designed to let companies route requests to different AI models through one system. Rauch pointed to what that setup reveals in day-to-day usage: the pattern of which models are being pulled into production workflows.
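Conceptually, a gateway of this kind exposes a single entry point and dispatches each request to whichever provider hosts the requested model. The sketch below is purely illustrative, not Vercel's actual implementation; the function names and the `provider/model` ID convention are assumptions for the example.

```python
# Illustrative sketch of gateway-style model routing (NOT Vercel's
# actual implementation): one entry point dispatches requests to
# whichever provider hosts the requested model.

def make_gateway(providers):
    """providers maps a provider name to a callable that handles prompts."""
    def route(model_id: str, prompt: str) -> str:
        # Model IDs like "google/gemini-flash" encode the provider.
        provider, _, model = model_id.partition("/")
        if provider not in providers:
            raise ValueError(f"unknown provider: {provider}")
        return providers[provider](model, prompt)
    return route

# Stub callables stand in for real API clients.
gateway = make_gateway({
    "google": lambda model, p: f"[{model}] echo: {p}",
    "anthropic": lambda model, p: f"[{model}] echo: {p}",
})

print(gateway("google/gemini-flash", "hello"))  # -> [gemini-flash] echo: hello
```

Because every request flows through one routing function, a setup like this is also the natural place to observe which models production traffic actually lands on, which is the usage pattern Rauch describes.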
Rauch said that in March, Anthropic models led the pack on the number of tokens handled by Vercel’s AI Gateway. By early April, Gemini 3 Flash jumped into the lead and stayed there—an outcome Rauch frames as evidence that Gemini is winning real usage, not just attention.
That matters because it’s happening before Google’s next major moment: its annual developer conference, I/O, kicking off next week. Rauch described what’s coming as an opportunity for more capable models, tools, and features.
Gemini Flash’s pitch is straightforward. Rauch said it is less powerful than the full Gemini 3 model, but it’s faster and cheaper—qualities that translate well for corporate customers.
“Enterprise teams tend to pick Gemini Flash and Claude Haiku, the smallest, fastest, cheapest models each lab ships,” Rauch told the newsroom.
He added that Flash, in particular, is drawing strong B2C adoption because it “doesn’t hallucinate much, uses tools effectively, and it’s fast and affordable.”
But Rauch cautioned that AI “winners” depend on the metric. Token volume tells one story; money spent tells another.
In Rauch’s view, a single month can be misleading. “I’m often asked which lab is ‘winning’, but what we see in production looks nothing like the benchmark leaderboards,” he said. “AI Gateway reflects a variety of models winning different use cases.”
Rauch said Google clearly led by tokens in April. Yet Anthropic led by dollars spent, with 61% share.
He explained the mismatch this way: some teams want high-volume, lower-cost responses, while others need “expensive, quality-critical work.” In that split, different labs can come out ahead depending on where the workload lands.
OpenAI’s spend share, meanwhile, rose sharply—from 4% in March to 12% in April—after the launch of its new GPT-5.4 and 5.5 model series. Google climbed as well, moving from 8% to 21% based on dollars spent, as Gemini Flash usage scaled.
Rauch stressed that the snapshot doesn’t predict what comes next, especially with Google I/O approaching.
The takeaway, from inside Vercel's production plumbing, is that Google's Gemini momentum isn't waiting for big announcements. The shift is already visible in customer demand, token by token, while the spending picture suggests the market is still dividing its needs across different models for different moments.
Tags: Google Gemini, Vercel, AI Gateway, tokens, Anthropic, OpenAI, GPT-5.4, GPT-5.5, Gemini Flash, I/O