News Daily Nation Digital News & Media Platform

Why AI tokens will send your enterprise cloud bill sky-high again

Jun 29, 2026  Twila Rosenbaum

AI usage is moving to token-based pricing, a model that is far more expensive than the previous flat-fee approach. As discussed at FinOps X 2026, tokens have become the foundation of the entire generative AI economy. Many enterprise customers are reminded of the early days of cloud pricing, when invoices were volatile and business models shifted frequently. Underneath the confusion, tokens are quietly standardizing how labs translate scarce GPU capacity into billable units, how enterprises measure AI usage, and how software vendors reprice their products.

Tokens: The atomic units of AI

In this new world, the token is the basic unit of AI work. J.R. Storment, executive director of the FinOps Foundation, calls it "the atomic unit of AI." He notes that tokens serve multiple roles: they are the unit of output from hardware, compute, and data centers; the way labs price their outputs and inputs; and the value unit that enterprises look to monetize. This abstraction is why labs and hyperscalers like the token. Instead of charging for GPU types, memory, and power directly, they expose a single unit, priced per million tokens, over a mix of architectures and deployment topologies. OpenAI, Anthropic, Google, and others now publish per-model rate cards with separate prices for input and output tokens, usually quoted in dollars per million tokens.

So what are tokens? An AI token is the smallest unit a word or phrase can be broken down into when processed by a large language model (LLM). Before a model can work with text, it breaks it into fragments, a process called tokenization. For English, one token is roughly four characters, or about three-quarters of a word, so 100 tokens ≈ 75 words. The token hides enormous complexity, from model choice and quantization to caching or agent usage. That complexity is exactly what FinOps teams are now being asked to decode.
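As a rough illustration of the rule of thumb above, the following sketch estimates a token count from character length and converts token counts into a dollar figure. The four-characters-per-token ratio comes from the text; the rate card values and function names are hypothetical placeholders, not any provider's actual prices, and real tokenizers will produce different counts.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English text, from the article

# Hypothetical rate card in dollars per million tokens (not real prices).
RATE_CARD = {"input": 3.00, "output": 15.00}


def estimate_tokens(text: str) -> int:
    """Approximate the token count of English text from its length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Price one request using separate input and output rates."""
    return (
        input_tokens * RATE_CARD["input"]
        + output_tokens * RATE_CARD["output"]
    ) / 1_000_000


prompt = "x" * 400  # a 400-character prompt
print(estimate_tokens(prompt))             # 100
print(estimate_cost(1_000_000, 200_000))   # 6.0
```

Note that in this made-up rate card the 200,000 output tokens cost as much as the million input tokens, which is why input and output are tracked separately.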

The all-you-can-eat token era is over

If 2023 through early 2025 was the era of cheap experiments, the last 18 months have been a rude awakening. Storment describes three distinct phases: the old days of AI before ChatGPT, the good old days when chatbots could write decent code, and the post-November 2025 world when major model releases took AI from pretty good to really good. In the good old days, we saw a brief period of token maxing, where everyone was excited about token leaderboards. Today, token leaderboards are painfully obsolete because no one can afford to waste tokens. As Amazon senior vice president Dave Treadwell said, "Please don't use AI just for the sake of using AI."

Between June and November last year, global token usage grew linearly. Then new models and agentic patterns landed. Context windows went from a few thousand or tens of thousands of tokens up to millions of tokens in a single conversation, and agentic workflows hit the scene, adding loops, retries, and corrections. Companies happily subsidized that behavior until they saw the bills. Some $200-a-month power users actually cost tens of thousands of dollars a month when running everything on the latest model. For example, an analyst firm estimated that a $200 Anthropic plan used to give $8,000 worth of Claude tokens, while a similar OpenAI offering gave $14,000 worth of Codex tokens. Those days are done. Moving forward, companies will have to pay the real cost of AI tokens.
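To see why loops and long contexts inflate bills, consider a simplified agent that re-sends its whole conversation history on every step. This is a sketch under stated assumptions: the context sizes and the price are invented for illustration, and prompt caching, which can reduce input cost, is ignored.

```python
def agent_input_tokens(base_context: int, tokens_per_step: int, steps: int) -> int:
    """Total input tokens when every step re-sends the full history.

    Each step is billed for the base context plus everything appended
    by earlier steps, so the total grows quadratically with step count.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context            # the whole history is billed as input
        context += tokens_per_step  # replies and tool output are appended
    return total


PRICE_PER_MILLION_INPUT = 3.00  # hypothetical dollars per million tokens

for steps in (1, 10, 100):
    tokens = agent_input_tokens(base_context=10_000, tokens_per_step=2_000, steps=steps)
    cost = tokens * PRICE_PER_MILLION_INPUT / 1_000_000
    print(f"{steps:>3} steps: {tokens:>10,} input tokens, about ${cost:.2f}")
```

Under these assumptions a 100-step run is billed for 10.9 million input tokens, more than a thousand times the single-call figure, even though it takes only a hundred times as many steps.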

"So now what matters more than anything is AI value," Storment said. "We've got to bring value back to what we're doing… We're in an era where tokens are the main measurement. They're in everything in software, and they're driving a lot of the global token economy."

Scarcity keeps token prices from collapsing

If Moore's law and hyperscale competition were the only forces at work, you'd expect token prices to keep falling, and since 2023 they have fallen dramatically. However, industry experts stress a caveat: the floor may be in sight. Token prices have been flat since November 2025, a plateau linked directly to hardware and power constraints. The industry can't get enough hardware or enough power, and it is seeing backlogs, long commitment periods, and shortages. Intel's CEO has said he doesn't expect real relief in GPU and related component supply until 2028. Supply chain constraints, rising hardware prices, and the growing costs of new frontier models all contribute.

The net result is a classic Jevons paradox: falling unit cost, exploding total spend. Even with falling token prices, spend is still rising. At one enterprise, unit costs fell, but in some months spend doubled. Global usage is estimated to rise from 6 quadrillion tokens today to a forecast 120 quadrillion tokens within about 3.5 years, a 20-fold increase. Even if token prices drop further once supply loosens, they are unlikely to fall fast enough to offset that growth in volume.
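The arithmetic behind that paradox is simple: total spend is price multiplied by volume. The sketch below works through the usage figures quoted above; the function names are illustrative, and the 50 percent price cut is an assumed scenario, not a forecast.

```python
def spend_multiplier(volume_growth: float, price_change: float) -> float:
    """Relative change in total spend, where spend = price * volume.

    volume_growth: new volume / old volume (20.0 means 20x).
    price_change:  new price / old price (0.5 means prices halved).
    """
    return volume_growth * price_change


def break_even_price_change(volume_growth: float) -> float:
    """Price ratio at which total spend stays flat."""
    return 1.0 / volume_growth


growth = 120 / 6  # quadrillion tokens, the figures quoted above
print(growth)                           # 20.0
print(break_even_price_change(growth))  # 0.05
print(spend_multiplier(growth, 0.5))    # 10.0
```

On these figures, prices would have to fall to 5 percent of today's level just to hold global spend flat. If prices merely halved, spend would still grow tenfold.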

FinOps discovers token economics

For the FinOps community, which cut its teeth on cloud right-sizing and reserved instances, token pricing is both familiar and alien. The familiar part: usage-based billing, big invoices, and hard forecasting. The alien part? The unit is tied to language, not infrastructure, and it changes as fast as model releases, not as slowly as server depreciation schedules. AI does not just stretch the cloud playbook; it breaks it. Unlike CPUs, AI models have unique strengths, weaknesses, and cost profiles, so swapping out an LLM is not just a pricing decision but also a quality-of-output decision.

One enterprise's experience is a case study. Its business AI platform runs across multiple LLMs, including ChatGPT, Anthropic, Gemini, and open source models, layered on different hyperscalers. When the company first went looking for AI cost data, it hit a wall. Existing cloud tools were largely blind to the nuances of LLMs: they could tell how much was spent with a provider, but not on which model or how much of it was used. The team pulled data manually, merged it across tables, and got a first picture. That picture, once it reached the global infrastructure lead and then the CTO, transformed the conversation. Within days, the reaction went from mild interest to a constant demand for more data. That demand forced the enterprise to formalize an internal AI FinOps framework built around three pillars: spend visibility (what is consumed, how, and where), economics (how efficiently AI is leveraged, measured with token-level metrics), and value (connecting AI spend to business outcomes).
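The first two pillars can be sketched as a simple aggregation over usage records. This is only an illustration: the record fields, provider and team names, and numbers are all hypothetical, and the article does not describe the enterprise's actual tooling.

```python
from collections import defaultdict

# Hypothetical usage records; field names and numbers are made up.
records = [
    {"provider": "provider_a", "model": "large", "team": "search", "tokens": 4_000_000, "cost": 60.0},
    {"provider": "provider_a", "model": "small", "team": "search", "tokens": 10_000_000, "cost": 5.0},
    {"provider": "provider_b", "model": "large", "team": "support", "tokens": 2_000_000, "cost": 20.0},
    {"provider": "provider_a", "model": "large", "team": "support", "tokens": 1_000_000, "cost": 15.0},
]


def spend_by(records, *keys):
    """Spend visibility: sum tokens and cost per combination of keys."""
    totals = defaultdict(lambda: {"tokens": 0, "cost": 0.0})
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group]["tokens"] += rec["tokens"]
        totals[group]["cost"] += rec["cost"]
    return dict(totals)


def cost_per_million(entry):
    """Economics: a token-level metric in dollars per million tokens."""
    return entry["cost"] / entry["tokens"] * 1_000_000


by_model = spend_by(records, "provider", "model")
for group, entry in sorted(by_model.items()):
    print(group, entry["tokens"], entry["cost"], round(cost_per_million(entry), 2))
```

Grouping by `"provider"` alone reproduces the coarse view that existing cloud tools offered; adding `"model"` or `"team"` to the keys gives the finer breakdown the enterprise had to assemble by hand.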

"Every token needs to earn its cost," said one executive, echoing the phrase "token factory effectiveness." That factory spans everything from silicon and data center leases to model routing and prompt design.

Tokenomics: Beyond just counting tokens

If FinOps is about cost control and accountability, tokenomics is about the full lifecycle of tokens as an economic good. Storment defines it as the emerging discipline of converting energy and capital into AI tokens and resources, consuming those tokens to drive efficient intelligence, and then ultimately driving value. This breaks into three buckets: production (taking energy and capital to create tokens), consumption (allocation, forecasting, and optimization), and value (how to monetize tokens, adjust pricing based on cost, and labor implications).

That last piece is where token pricing directly collides with software-as-a-service business models. Tokenomics is changing pricing models for Fortune 100 companies. One example is a major platform shifting Copilot toward more explicit usage-based charging. Developers who loved the unlimited tokens are angry because their implicit subsidy vanished. The labs themselves are also tightening the screws in ways invisible at the token level. Some model cards include policies that silently drop users to a different model if they try to build another LLM. Such policies make a mockery of naive "cost per token" metrics because not all tokens are created equal.

Advanced LLMs can chase answers and burn tokens without users knowing what's happening. For instance, one developer reported that an AI model launched a web server, used numerous browsers, and performed many tricks to track down a simple display bug. Had he used token pricing, it would have cost only $12. It's easy to envision a frontier model taking on a more complex problem and burning hundreds or thousands of dollars.

Business models: From credits and seats to blended token bundles

These pricing experiments point to a pricey future. Most customers will never see a line item labeled "120 quadrillion tokens." Instead, vendors build layers of abstraction: credits and opaque consumption (like putting quarters in a machine), hybrid subscription plus usage (a basic monthly fee plus token-denominated overages), or direct pass-through models (showing the token meter more honestly but wrapped in dashboards and guardrails). All are vulnerable to upstream shocks. Any change in the token factory, whether in model routing, caching, or forecasting, affects consumer pricing, and this cascades into banks and everyone else.
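The hybrid model is the easiest of the three to express in code. The sketch below is a minimal illustration, assuming a flat monthly fee with an included token allowance and a per-million overage rate; the plan values and the function name are hypothetical.

```python
def hybrid_invoice(tokens_used: int, base_fee: float, included_tokens: int,
                   overage_per_million: float) -> float:
    """Monthly charge: flat fee plus per-token overage beyond the allowance."""
    overage_tokens = max(0, tokens_used - included_tokens)
    return base_fee + overage_tokens * overage_per_million / 1_000_000


# Hypothetical plan: $200 per month, 50M tokens included, $10 per extra million.
plan = dict(base_fee=200.0, included_tokens=50_000_000, overage_per_million=10.0)

print(hybrid_invoice(30_000_000, **plan))   # 200.0
print(hybrid_invoice(250_000_000, **plan))  # 2200.0
```

A user who stays inside the allowance sees a predictable bill, while a heavy user pays eleven times the headline price, which is the kind of gap a flat-fee plan used to hide.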

That cascading effect is why a foundation is spinning up a Tokenomics Foundation alongside the FinOps Foundation, to give big consumers and suppliers a vendor-neutral place to hash out specifications for measuring and allocating token-based costs. The FinOps Open Cost and Usage Specification (FOCUS), originally designed to normalize cloud billing data, is already being extended for token-level telemetry.

The human side: AI haves versus have-nots

Beyond spreadsheets, token pricing shapes who gets to use powerful AI and who doesn't. If high token costs persist, a societal divide will open between those who can afford AI and those who can't. At the enterprise level, certain teams are deemed worthy of getting the latest model, while others are routed automatically to cheaper models. Yet there is also a strong argument against crude caps. One executive advised looking across usage for outliers and talking to them rather than capping them, because they might be doing something interesting. In a world where startups receive millions of dollars of tokens to disrupt incumbents, shutting down internal experimentation could be an existential threat.

For individuals, token pricing feeds into anxieties about AI and jobs. The view is nuanced: AI is not immediately coming for everybody's job, but the person who is better at AI is coming for the job of the person who's not using AI. If token prices and quotas restrict who can learn and experiment, that divide will only deepen. Both companies and individuals are moving quickly into an AI-token-based economy. What all that will mean is a question we do not yet have an answer to, but the one thing we know for certain is that it will be orders of magnitude more expensive than it has been.


Source: ZDNET News

