Science

AI compute crunch: Why usage limits are rising

AI compute – Misryoum explains the compute crunch behind faster-than-expected AI usage caps and what it means for access.

AI usage limits are no longer a nuisance you can ignore; they’re a sign that the digital engine behind popular AI tools is hitting real-world constraints.

In late March, heavy users of Claude large language models began sharing screenshots showing an abrupt jump in scarcity: sessions were running through five-hour usage limits in just minutes. Complaints spread quickly, and Misryoum reports that the company pointed to peak-hour demand as a key driver, while also tightening how some third-party tools could draw from flat-rate plans. For many paying customers, the shift felt like a simple price-to-performance mismatch, especially after earlier settings that affect how models “think” were quietly adjusted.

This kind of friction is increasingly familiar across the industry, and Misryoum sees it as part of a broader “compute crunch” narrative: demand for AI is moving faster than the systems that supply it.

The idea behind a compute crunch is straightforward, even if the hardware underneath is anything but: AI services need computing power both when training models and when answering real user queries. Training scales with how large a model is and how much data it needs, but inference, the step that powers every response, can be intensely expensive too, especially for bigger systems and longer outputs. If more people use AI, and they use it more heavily, the cost of serving each request rises sharply, turning “unlimited” promises into a budgeting problem.
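To make that scaling concrete, here is a minimal sketch of per-request serving cost. The per-token prices and token counts are hypothetical, chosen only to illustrate why long outputs dominate the bill; real providers publish their own rates.

```python
# Hypothetical prices, for illustration only (not any provider's real rates).
PRICE_PER_1K_INPUT = 0.003   # USD per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.015  # USD per 1,000 output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of serving one request: output tokens are priced higher,
    so longer responses drive most of the expense."""
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# A short chat turn versus a long "thinking" run:
short_turn = request_cost(500, 300)
long_run = request_cost(4_000, 8_000)
print(f"short turn: ${short_turn:.4f}, long run: ${long_run:.4f}")
```

Multiply the long-run figure by millions of daily requests and the gap between a flat monthly fee and actual serving cost becomes obvious.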

Misryoum notes that this is where flat-rate subscriptions start to break. Older internet experiences could afford near-unlimited use because the marginal cost per extra user was low. AI is different: providers pay for tokens, and tokens map to compute. When usage grows beyond what a monthly plan can cover, rate limits become the practical lever to keep access from collapsing for everyone.

Even outside strict caps, companies can steer demand by changing defaults or routing users to smaller, cheaper models. Misryoum says that the same pattern appears when tool access changes, such as when certain third-party integrations can no longer benefit from the way subscriptions were designed. In effect, these policies don’t just manage cost; they shape what users experience: quicker responses, different capabilities, and fewer “long-thinking” runs.
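A routing policy like that might look something like the sketch below. The model names, load thresholds, and token cutoff are all invented for illustration; the point is only that under peak load, the most expensive requests are the first to be redirected.

```python
def route_model(fleet_load: float, prompt_tokens: int) -> str:
    """Illustrative demand-steering policy (all names and thresholds
    are hypothetical). fleet_load is serving utilization in [0, 1]."""
    if fleet_load > 0.9:
        return "small-fast"       # peak hours: protect capacity for everyone
    if fleet_load > 0.7 and prompt_tokens > 2000:
        return "medium"           # trim the costliest requests first
    return "large-thinking"       # off-peak: full capability by default
```

The user never sees a hard error in this scheme; they just get a faster, shallower answer, which is exactly the subtle quality shift subscribers tend to notice.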

The supply side adds another layer of difficulty. Building more AI capacity is not just a software upgrade: it involves factories for chips, power infrastructure for data centers, and specialized components that require long lead times. Misryoum explains that the bottlenecks show up across the chain, from semiconductor manufacturing to the electrical and memory systems that keep large-scale AI running, because physical industries cannot instantly scale the way software companies sometimes can. That’s why “just add more compute” can be an expensive and slow proposition, particularly when multiple buyers compete for the same limited resources.

There’s also the internal competition for resources. Companies are balancing research efforts that require heavy compute to develop and test stronger models against the serving capacity needed to generate revenue day to day. Misryoum highlights that this isn’t a clean split between “training” and “inference” so much as a continuous trade-off between building the next generation and keeping today’s services responsive.
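The trade-off can be sketched as simple arithmetic over a fixed fleet. The capacity figure and demand numbers below are hypothetical; the example only shows how shifting more compute toward research directly shrinks serving headroom.

```python
TOTAL_GPU_HOURS = 10_000  # hypothetical weekly fleet capacity

def serving_headroom(research_share: float, serving_demand: float) -> float:
    """GPU-hours left for user-facing serving after research takes its
    share. Negative headroom is where usage limits tighten. All numbers
    are illustrative."""
    serving_capacity = TOTAL_GPU_HOURS * (1 - research_share)
    return serving_capacity - serving_demand

print(serving_headroom(0.4, 5_500))  # modest research push: slack remains
print(serving_headroom(0.6, 5_500))  # heavy research push: serving deficit
```

In this toy model the same fleet that comfortably serves users at a 40% research share runs a deficit at 60%, which is the squeeze the article describes.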

Ultimately, the compute crunch matters because AI is becoming the interface for work in education, medicine, customer support, and daily productivity. When compute becomes scarce, access constraints follow, and the real risk is not just developer frustration but slower, uneven adoption of tools that increasingly shape the pace of economic and social life.