jani@raatti:~ $ cat ~/blog/ai-capacity-is-getting-full.md
---
title: "AI Capacity Is Getting Full"
date: 2026-05-05
author: jani
categories: [AI]
tags: [AI, Claude Code, GitHub Copilot, GPU, Infrastructure]
reading_time: 13 min
cover: "ai-dc-prices-up-optimized.png"
---
*AI prices up*

Something shifted in the AI developer space over the past few weeks. Subscription limits tightened, usage-based billing appeared where flat rates used to be, and companies that were handing out unlimited inference started quietly pulling back. Developer streamers have been sounding the alarm — AI prices have been heavily subsidized, and that era is ending. But the reason isn’t what most people think. It’s not greed. It’s physics.

“You don’t pause signups because you want to make more money. You pause signups because you don’t have capacity.”
— Theo (t3.gg)

“There is not enough compute in the world right now. That is the problem. We are working with a limited resource and the limited resource isn’t money.”
— Theo (t3.gg)

“Things just can’t be as free as they once were, and the amount of usage you’re going to be getting is clearly and obviously going down.”
— ThePrimeagen

“How could you tell every employee to maximally use AI? By the way, we’re judging you on AI usage. Oh my gosh, you’re using too much AI.”
— ThePrimeagen (on Uber)


I’ve been using Claude Code daily for months now. I’ve watched my own usage patterns shift from “let me ask it a quick question” to “here, take this entire codebase and figure out what’s wrong.” The productivity gains are real — genuinely, meaningfully real. But “it works” and “it’s sustainable” aren’t the same thing, and the last few weeks have made that painfully clear.

The narrative most people are running with is that AI companies got greedy. Anthropic tested removing Claude Code from the $20 plan — obviously they want your money. GitHub Copilot switched to token billing — obviously Microsoft is squeezing developers. Uber burned its annual AI budget in four months — obviously the economics don’t work. Cue the “AI bubble is popping” takes.

That framing is wrong. Not partially wrong — fundamentally wrong about what the problem even is. These companies don’t care about your $20 or even your $200 subscription. What they care about is that they’re running out of GPUs, and you’re sitting on compute they need for enterprise customers who pay full rate.

It’s Compute, Not Money

The real constraint isn’t financial — it’s physical. There aren’t enough GPUs in the world to serve the current demand. Data center GPUs are sold out for months with lead times stretching 36-52 weeks. All three major HBM suppliers — SK Hynix, Samsung, Micron — have their 2025-2026 production fully booked. And the GPUs that do exist can’t all be powered — Microsoft literally has Nvidia cards sitting in inventory because they don’t have enough megawatts to turn them on. Half of planned US data center builds have been delayed or cancelled because of power grid limitations. High-voltage transformer lead times have stretched from 12-18 months to 36-48 months.

When GitHub pauses Copilot signups, when Anthropic restricts Claude Code on the $20 tier — these aren’t revenue plays. These are companies rationing a physical resource they’ve run out of. The subscription tiers were always marketing. Now the marketing is eating the product.

Consider the 7.5x message multiplier for GPT-5.5 on Copilot. GPT-5.5 actually ends up cheaper per run than 5.4 because it uses far fewer tokens despite being 2x more expensive per token. If this were about cost, GPT-5.5’s multiplier would be lower than 5.4’s, not set at 7.5x. The multiplier exists because Microsoft needs those GPUs for enterprise Azure customers, not because 5.5 costs 7.5x more to run. These numbers are compute rationing, not pricing. Once you see that, every other move by every other company clicks into place.
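The arithmetic here is worth making explicit: per-token price alone tells you nothing about what a task costs. A quick sketch, using hypothetical numbers purely for illustration (these are not the real rates or token counts for any model):

```python
# Per-token price alone doesn't determine cost per task.
# All numbers below are hypothetical, for illustration only.

def cost_per_task(price_per_mtok: float, tokens_per_task: int) -> float:
    """Cost of completing one task, given a price per million tokens."""
    return price_per_mtok * tokens_per_task / 1_000_000

# A model that is 2x pricier per token but uses ~3x fewer tokens per task
# ends up cheaper per task overall.
older = cost_per_task(price_per_mtok=10.0, tokens_per_task=300_000)  # $3.00/task
newer = cost_per_task(price_per_mtok=20.0, tokens_per_task=100_000)  # $2.00/task

print(f"older: ${older:.2f}/task, newer: ${newer:.2f}/task")
```

This is why judging models by their per-token price sheet is misleading — the only number that matters for your bill is cost per problem solved.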

The Subsidization Problem Is Real — But It’s Not What You Think

Here’s what’s wild about the current moment. Cursor’s internal auditing reportedly shows you can get up to $5,000 of inference from a $200/month Claude Code subscription. Anthropic is losing money on some heavy users purely from electricity costs — before you even count GPU depreciation, training costs, or salaries. OpenAI lost $5 billion on $3.7 billion in revenue last year. They spend $1.35 for every dollar they earn. Anthropic hit $9 billion in annualized revenue by end of 2025 while spending $9.7 billion — and still had to raise $30 billion in February 2026 after raising $13 billion in September 2025.

The bubble crowd looks at these numbers and says “see, it’s all fake.” But the subsidy isn’t hitting a financial wall — it’s hitting a physical one. You can raise another $30 billion. What you can’t do is fabricate GPUs faster than TSMC’s CoWoS packaging capacity allows, or conjure gigawatts of power capacity that takes years to build. Data center occupancy is projected to hit 95%+ by late 2026, up from 85% in 2023.

The Uber story illustrates this perfectly. They rolled Claude Code out in December, 84% of their 5,000 engineers adopted it, monthly API costs hit $500 to $2,000 per engineer, and by April the annual budget was gone. But here’s what most people miss — those engineers were paying full API rates, not the subsidized subscription prices. If they’d each been on $200/month personal subscriptions, Anthropic would be eating the difference. The enterprise pricing is closer to the real cost. The consumer pricing is the fantasy.
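The back-of-envelope math on the Uber numbers shows how fast this compounds. Using only the figures above (84% adoption across 5,000 engineers, $500-$2,000/month per engineer) — the annualized range is my own extrapolation:

```python
# Rough annualized spend implied by the Uber figures in this post.
engineers = int(5_000 * 0.84)      # 4,200 engineers adopted Claude Code
low, high = 500, 2_000             # monthly API cost per engineer ($)

monthly_low, monthly_high = engineers * low, engineers * high
print(f"monthly:    ${monthly_low:,} - ${monthly_high:,}")
print(f"annualized: ${monthly_low * 12:,} - ${monthly_high * 12:,}")
```

That lands somewhere between roughly $25M and $100M a year at full API rates — easy to see how an annual budget evaporated in four months.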

The Enterprise Copilot Problem — And Why Direct API Access Might Be the Answer

GitHub Copilot is the default AI coding tool for most enterprises. It’s integrated into VS Code, it has SSO and policy controls, it’s easy to roll out to hundreds of developers. At $19/user/month for Business or $39/user/month for Enterprise, it looks like a reasonable line item. But that flat pricing was always a fiction — and now that fiction is ending.

Flat pricing existed as a sales tactic. People buy flat rates because uncertainty kills purchases. Tell an enterprise CTO “it’ll cost somewhere between $5 and $500 per developer per month depending on usage” and they’ll think about it forever. Tell them “$19/user/month, done” and they sign. The problem is that Copilot’s internal costs have nearly doubled since January as agentic workflows exploded, and the flat rate became untenable. This isn’t enshittification — it’s GitHub trying to make the numbers work for the first time.

Starting June 1st, Copilot moves to usage-based billing — you get AI credits matching your subscription ($19 in credits for Business, $39 for Enterprise), and anything beyond that is billed by token consumption at listed API rates. Code completions stay unlimited, but chat, CLI, cloud agents, and Spaces all consume credits.

But here’s what enterprises should actually be asking: if you’re paying per-token anyway, why pay the middleman?

Copilot Enterprise at $39/user/month gives you 1,000 premium requests with a markup on the underlying API costs. For a 50-developer team, that’s $23,400/year before overages. The alternative is buying API access directly — Claude Code via Anthropic’s API, or OpenAI’s API — and plugging it into open-source harnesses, VS Code extensions, or terminal tools like Claude Code CLI. You lose some of the managed platform convenience (SSO policies, usage dashboards, PR review integration), but you pay actual API rates instead of Copilot’s marked-up credit system.
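One way to frame the decision is as a break-even calculation: at what blended per-million-token rate does direct API spend equal the Copilot seat cost? The seat price is from above; the token volume is an assumption — plug in your team’s measured usage and your provider’s actual rates:

```python
# Seat cost vs direct-API break-even for a 50-developer team.
# $39/seat/month is from the post; the token volume is an assumption.
devs = 50
copilot_annual = devs * 39 * 12                    # $23,400/year

tokens_per_dev_month = 10_000_000                  # assumption: 10M tokens/dev/month
total_mtok_year = devs * 12 * tokens_per_dev_month / 1_000_000

# Blended $/Mtok at which direct API spend equals the Copilot seat cost.
break_even = copilot_annual / total_mtok_year
print(f"Copilot seats: ${copilot_annual:,}/year for {total_mtok_year:,.0f} Mtok")
print(f"break-even blended rate: ${break_even:.2f}/Mtok")
```

If your blended API rate comes in under the break-even figure, going direct is cheaper before you even count the multiplier penalties — and the heavier your usage, the more the math tilts that way.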

For teams that are heavy users — and the whole point of deploying AI to your engineering org is to make them heavy users — the direct API path can be significantly cheaper. You also get to choose your models without multiplier penalties (the 7.5x multiplier for GPT-5.5 on Copilot doesn’t exist when you buy direct). And there’s a longer-term strategic concern worth thinking about: once AI becomes core engineering infrastructure and the market consolidates, whoever controls your AI pipeline has VMware-style lock-in power. Someone will become the Broadcom of AI. Going direct to the API layer at least reduces your exposure to a single intermediary deciding to jack up prices once you’re locked in.

This isn’t to say Copilot is bad — for teams starting out with AI coding tools, the managed experience is worth something. But the move to usage-based billing changes the calculus. If you’re paying per token either way, evaluate whether the Copilot wrapper is worth the premium over direct API access. For a lot of enterprises, especially ones already doing significant inference volumes, it won’t be.

Google Isn’t Winning Either

There’s a popular narrative that Google is the “real winner” here — rich enough to pour $200 billion into AI and still make money, free from the hype machine because they don’t need investor capital. That’s wrong.

Google was subsidizing harder than anyone. Free AI Overviews on every search query — signed out, incognito, unlimited. Subsidized Opus 4.5 access through Gemini subscriptions. People were using Gemini specifically to get cheap access to Anthropic models. And Google had to clamp down faster and more aggressively than anyone else — banning users who built plugins to track usage, restricting API access, walking back the generosity at every turn. Google was arguably the most extreme example of the compute crunch because they were giving away more free inference than anyone.

The reason you don’t think of Google as struggling is because their models haven’t been competitive enough for developers to notice the restrictions. But behind the scenes, Google is as compute-constrained as everyone else — possibly more so, with rumors that they’re using CPUs for some training and inference because that’s what they have available. Google makes their own TPUs and they’re still behind on capacity.

The Cost of Intelligence IS Dropping

Before this reads too much like a doom piece — models are getting dramatically more efficient. The Artificial Analysis intelligence index numbers tell a compelling story: GPT-5.5 medium matches 5.4-high’s intelligence level at less than half the cost per benchmark run ($1,200 vs $2,800). GPT-5.5 low delivers comparable scores to Claude Sonnet 4.6 at a sixth of the price. And the key insight is that per-token pricing is misleading — what matters is cost per problem solved. GPT-5.5 is 2x more expensive per token but uses so many fewer tokens that it ends up cheaper overall.

The frontier keeps getting more expensive to push, but any given level of capability gets cheaper over time. Gartner projects inference costs for trillion-parameter models will drop 90% by 2030. Whether that timeline is right or not, the direction is clear. The question is whether the industry can survive the next two to three years of physical constraints before efficiency gains bail them out.

What This Means If You’re Building With AI

If you’re using AI tools through personal subscriptions — Claude Pro, Copilot, Cursor — expect the generosity to keep shrinking. You’re getting an incredible deal right now, and the companies providing it are losing money on you. That’s not sustainable. Usage-based pricing is the future, and Copilot’s June 1st switch is just the start.

If you’re making enterprise decisions about AI adoption, budget conservatively. The current API prices are still subsidized. Plan for 30-50% increases over the next 18 months as these companies move toward unit economics that don’t require raising billions every quarter. For every $1 billion spent training a model, organizations face $15-20 billion in inference costs over its production lifetime. And for the love of god, don’t tell your entire engineering org to “maximize AI usage” without understanding what that means for your token bill. Uber learned that the hard way.
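Applying that 30-50% range to a concrete budget makes the planning exercise trivial. The current spend figure below is a placeholder — substitute your own:

```python
# Conservative budget planning: apply the 30-50% price-rise range
# from this post to your current annual inference spend.
current_annual = 120_000               # assumption: $120k/year today

for rise in (0.30, 0.50):
    projected = current_annual * (1 + rise)
    print(f"+{rise:.0%}: ${projected:,.0f}/year")
```

Budget to the top of the range; if prices rise less, you have slack, and if usage grows (it will), you’ll need it anyway.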

If you’re watching this from the outside trying to decide whether AI is real or a bubble — it’s both, and that’s not a contradiction. The technology is genuinely transformative. The economics are genuinely unsustainable at current pricing. Those two things coexist. The dot-com bubble popped and the internet was still the most important technology of the century. AI tools will get more restricted and more expensive in the short term, and they’ll get dramatically cheaper and more capable in the long term. The companies that survive the squeeze will be the ones with the most compute, not the most hype.

My Take

I’m not cutting back on AI usage. The productivity gains are too significant to walk away from, even as the free ride winds down. But I am paying closer attention to which models I use for what, how many tokens my workflows actually consume, and whether I’m getting real value from that inference or just burning tokens because they’re cheap.

The era of “unlimited AI for $20/month” is ending. What replaces it will be more honest about the actual costs — and that’s probably healthier for everyone. The current pricing was always a distortion. The question is how rough the transition period gets before efficiency gains, new fab capacity, and power infrastructure catch up to demand.

This isn’t the end of the AI subsidy economy — this is the start of compute restrictions really affecting the way companies think. Meaningful new HBM3e capacity is expected online in late 2026, but it won’t clear the backlog immediately. Based on everything I’ve seen, we’re looking at 18-24 months of increasing restrictions before things start to ease. Late 2027, early 2028 is when new HBM capacity, additional data center power, and continued model efficiency improvements should start meaningfully closing the gap.

Until then — use AI deliberately, watch your token spend, and don’t build your workflow around subsidized pricing that won’t last.


References & Source Material

Videos That Sparked This Post

Data & Sources

GPU/Compute Supply:
GPU lead times 36-52 weeks, TSMC CoWoS fully allocated
HBM production sold out through 2026
Big Five committed $600-630B capex for 2026

Power Grid Bottleneck:
Microsoft has GPUs sitting idle — can’t power them
Half of US data center builds delayed/cancelled
Transformer lead times stretched to 36-48 months
Data center occupancy projected >95% by late 2026
AI to drive 165% increase in power demand by 2030

Uber:
Burned entire 2026 AI budget by April
84% of 5,000 engineers adopted Claude Code, CTO “back to the drawing board”
$500-$2,000/month API costs per engineer

Subsidization & Losses:
Anthropic: $9B annualized revenue, $9.7B total spend (2025)
Anthropic Series G: $30B raised February 2026
OpenAI: $3.7B revenue, lost ~$5B
Ed Zitron’s “Subprime AI Crisis” analysis
$1B training = $15-20B inference costs over model lifetime
Budget for 30-50% API price increases

GitHub Copilot:
Moving to usage-based billing June 1st
Developer reactions to pricing change
Copilot internal costs leaked — nearly doubled since January
Models and pricing details

Direct API Alternatives:
Claude Code vs Copilot comparison
AI coding tools pricing compared 2026
Claude Code vs Copilot vs Cursor comparison

Efficiency Improvements:
Gartner: inference costs to drop 90% by 2030
AI compute shortage challenges bubble narrative

jani@raatti:~ $ git commit # leave a comment