
The rise of token pricing
For years, enterprise AI access came with a simple flat fee. Users paid a monthly subscription and used models as needed. That era has ended. Today, token-based pricing is the standard, and costs are rising sharply. AI tokens, which represent the smallest unit of text processed by large language models (LLMs), now determine how much businesses pay for every interaction with generative AI. This shift mirrors the early days of cloud computing, when unpredictable usage bills first startled finance teams. Now, enterprises face a similar shock: paying by the token costs far more than the previous all-you-can-eat plans did, and the complexity of managing these costs is daunting.
Why costs are rising
Token prices have actually fallen since 2023, but total enterprise spending has exploded. This is the classic Jevons paradox: cheaper units drive higher consumption. The rise of agentic AI (models that loop, retry, and correct themselves) has dramatically increased the number of tokens consumed per task. Context windows have expanded from thousands to millions of tokens, and usage has surged. Power users who paid $200 a month were costing providers tens of thousands of dollars. As a result, labs and hyperscalers have ended subsidies and are now charging the real cost of tokens. Hardware scarcity, including GPU shortages and power constraints, is keeping token prices from dropping further. Industry leaders predict supply relief may not come until at least 2028, meaning high costs are here to stay.
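The arithmetic behind the paradox can be sketched in a few lines. This is a minimal illustration, not data from the article: the prices, tokens per task, and task counts are hypothetical, chosen only to show how a falling unit price and a rising bill can coexist.

```python
def monthly_spend(price_per_million_usd: float,
                  tokens_per_task: int,
                  tasks_per_month: int) -> float:
    """Total spend = unit price x tokens per task x number of tasks."""
    total_tokens = tokens_per_task * tasks_per_month
    return price_per_million_usd * total_tokens / 1_000_000


# Hypothetical "before": pricier tokens, short single-shot prompts.
before = monthly_spend(price_per_million_usd=30.0,
                       tokens_per_task=2_000,
                       tasks_per_month=10_000)

# Hypothetical "after": tokens are 10x cheaper, but an agentic workflow
# that loops and retries burns 100x more tokens per task.
after = monthly_spend(price_per_million_usd=3.0,
                      tokens_per_task=200_000,
                      tasks_per_month=10_000)

print(f"before: ${before:,.2f}  after: ${after:,.2f}")
```

With these assumed numbers, the unit price falls tenfold while the monthly bill rises tenfold, because consumption per task grows faster than the price drops.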
FinOps meets tokenomics
The FinOps community, which mastered cloud cost optimization, now faces a new challenge. Token pricing is tied to language, not infrastructure, and model releases happen faster than server depreciation. Traditional cloud tools cannot track which model or prompt drove costs. Enterprises must build custom dashboards to measure token consumption, input/output ratios, and caching efficiency. The Linux Foundation is launching a Tokenomics Foundation to standardize how tokens are measured and allocated. The new discipline, called tokenomics, covers the entire lifecycle: production of tokens from energy and capital, consumption through models and agents, and value derived from business outcomes. Without these frameworks, companies risk runaway bills and poor AI investment decisions.
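As a rough sketch of what such a dashboard might compute, the snippet below aggregates per-model token consumption, input/output ratio, and cache hit rate. The `UsageRecord` fields and the `summarize` helper are hypothetical; real provider billing exports use their own schemas and would need to be mapped onto something like this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One hypothetical usage row; field names are assumptions."""
    model: str
    input_tokens: int         # all prompt tokens, cached ones included
    cached_input_tokens: int  # prompt tokens served from a cache
    output_tokens: int
    cost_usd: float


def summarize(records):
    """Aggregate token metrics per model."""
    totals = defaultdict(
        lambda: {"input": 0, "cached": 0, "output": 0, "cost": 0.0}
    )
    for r in records:
        t = totals[r.model]
        t["input"] += r.input_tokens
        t["cached"] += r.cached_input_tokens
        t["output"] += r.output_tokens
        t["cost"] += r.cost_usd

    summary = {}
    for model, t in totals.items():
        summary[model] = {
            "total_tokens": t["input"] + t["output"],
            # None when the denominator is zero, to avoid dividing by zero.
            "input_output_ratio": t["input"] / t["output"] if t["output"] else None,
            "cache_hit_rate": t["cached"] / t["input"] if t["input"] else None,
            "cost_usd": t["cost"],
        }
    return summary


records = [
    UsageRecord("model-a", 1_000, 250, 500, 0.5),
    UsageRecord("model-a", 3_000, 750, 500, 1.5),
    UsageRecord("model-b", 2_000, 0, 1_000, 4.0),
]
report = summarize(records)
```

Grouping by model is the key design choice here, since the gap described above is that cloud billing tools report totals without attributing them to a model or prompt.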
SAP's internal AI FinOps
SAP provides a practical case study. The company runs multiple LLMs across several hyperscalers. Initially, the team hit a wall: cloud tools showed total spend but not which model drove it or what each token cost. By manually merging data, they gained visibility into model-level costs. This picture transformed the conversation with leadership. Now SAP uses a three-pillar framework: spend visibility (what, how, where), economics (token-level efficiency metrics), and value (cost per use case). They track drift between token consumption and spend to detect mix shifts to pricier models. Every token must earn its cost. This approach has become a mandate from the C-suite, enabling SAP to optimize model routing, set agent limits, and decide which AI features are economically viable.
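The drift check can be illustrated with a small sketch. This is not SAP's implementation, which the article does not describe in detail; the function names, the sample figures, and the 5-point alert threshold are all assumptions made for illustration.

```python
def growth(previous: float, current: float) -> float:
    """Fractional change between two periods, e.g. 0.10 for +10%."""
    if previous <= 0:
        raise ValueError("previous period value must be positive")
    return (current - previous) / previous


def spend_token_drift(prev_tokens: float, curr_tokens: float,
                      prev_spend_usd: float, curr_spend_usd: float) -> float:
    """Spend growth minus token growth.

    A positive value means spend is rising faster than consumption,
    which can indicate a shift in the mix toward pricier models.
    """
    return growth(prev_spend_usd, curr_spend_usd) - growth(prev_tokens, curr_tokens)


def blended_cost_per_million(tokens: float, spend_usd: float) -> float:
    """Average cost per million tokens across all models."""
    return spend_usd * 1_000_000 / tokens


# Hypothetical month-over-month figures: tokens +10%, spend +30%.
drift = spend_token_drift(100_000_000, 110_000_000, 1_000.0, 1_300.0)

DRIFT_ALERT_THRESHOLD = 0.05  # assumed tolerance, not from the article
mix_shift_suspected = drift > DRIFT_ALERT_THRESHOLD
```

In this example the blended cost per million tokens rises from $10.00 to roughly $11.82 even though no list price changed, which is the kind of signal that would prompt a closer look at model routing.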
Business models adapting
Vendors are layering abstract pricing on top of raw tokens. Some use credits that disappear quickly, others combine a base subscription with token overages, and a few pass through token costs directly. All are vulnerable to upstream shocks: model changes, cache failures, or routing errors can instantly alter customer pricing. Microsoft's shift of GitHub Copilot to explicit usage-based charging angered developers who relied on unlimited tokens. The labs themselves sometimes downgrade users to cheaper models without notice, undermining any naive cost-per-token metric. As tokenomics evolves, companies must build guardrails and forecasting tools to avoid surprises.
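The hybrid model, a base subscription plus token overages, is the easiest of these to sketch. The function and every figure below are hypothetical and do not reflect any vendor's actual price list.

```python
def hybrid_bill(base_fee_usd: float,
                included_tokens: int,
                used_tokens: int,
                overage_per_million_usd: float) -> float:
    """Base subscription plus a per-token charge beyond the included allowance."""
    overage_tokens = max(0, used_tokens - included_tokens)
    return base_fee_usd + overage_tokens * overage_per_million_usd / 1_000_000


# Hypothetical plan: $200 base, 10M tokens included, $8 per extra million.
light_month = hybrid_bill(200.0, 10_000_000, 4_000_000, 8.0)
heavy_month = hybrid_bill(200.0, 10_000_000, 25_000_000, 8.0)
```

Under these assumptions a light month costs only the base fee, while a heavy month adds $120 in overages. An upstream change that raises `used_tokens`, such as a cache failure, flows straight into the customer's bill.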
The human side of token costs
Token pricing is creating a societal divide. Teams deemed worthy get access to expensive frontier models; others are restricted to cheaper ones. This can stifle experimentation and innovation. Yet crude caps can be counterproductive: one Fortune 100 executive advises against shutting down outlier users, as they may be discovering valuable use cases. For new workers, limited token access deepens anxiety about AI replacing jobs. The reality is that those who master AI tools will outpace those who do not. If token costs restrict learning opportunities, the gap between AI haves and have-nots will widen. The future of enterprise AI depends on solving the value measurement problem — determining whether each token spent generates commensurate business benefit.
Source: ZDNET News
