As artificial intelligence got smarter, it was supposed to become too cheap to meter. It's proving to be anything but.
Developers who buy AI by the barrel, for apps that do things like make software or analyze documents, are discovering their bills are higher than expected -- and growing.
What's driving up costs? The latest AI models are doing more "thinking," especially when used for deep research, coding or as AI agents. So while the price of a unit of AI, known as a token, continues to drop, the number of tokens needed to accomplish many tasks is skyrocketing.
It's the opposite of what many analysts and experts predicted even a few months ago. That has set off a new debate in the tech world about who the AI winners and losers will be.
"The arms race for who can make the smartest thing has resulted in a race for who can make the most expensive thing," says Theo Browne, chief executive of T3 Chat.
Browne should know. His service allows people to access dozens of different AI models in one place. He can calculate, across thousands of user queries, his relative costs for the various models.
Remember, AI training and AI inference are different. Training those huge models continues to demand ever more costly processing, delivered by those AI supercomputers you've probably heard about. But getting answers out of existing models -- inference -- should be getting cheaper fast.
Sure enough, the cost of inference is going down by a factor of 10 every year, says Ben Cottier, a former AI engineer who is now a researcher at Epoch AI, a not-for-profit research organization that has received funding from OpenAI in the past.
Despite that drop in cost per token, what's driving up costs for many AI applications is so-called reasoning. Many new forms of AI re-run queries to double-check their answers, fan out to the web to gather extra intel, even write their own little programs to calculate things, all before returning with an answer that can be as short as a sentence. And AI agents will carry out a lengthy series of actions based on user prompts, potentially taking minutes or even hours.
As a result, they deliver meaningfully better responses, but can spend a lot more tokens in the process. Also, when you give them a hard problem, they may just keep going until they get the answer, or fail trying.
Here are approximate token counts for tasks at different levels of complexity, based on a variety of sources:
-- Basic chatbot Q&A: 50 to 500 tokens;
-- Short document summary: 200 to 6,000 tokens;
-- Basic code assistance: 500 to 2,000 tokens;
-- Writing complex code: 20,000 to 100,000+ tokens;
-- Legal document analysis: 75,000 to 250,000+ tokens;
-- Multi-step agent workflow: 100,000 to one million+ tokens.
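The token ranges above translate into dollars in a simple way: tokens consumed times the price per token. A back-of-the-envelope sketch, using the two per-million-token rates quoted later in this article (roughly $0.10 for a budget model and $3.44 for a flagship model); actual rates vary by provider and by input/output mix:

```python
# Rough cost per task: tokens consumed times price per token.
# Rates below are illustrative, drawn from figures quoted in this
# article; real pricing differs by provider and usage pattern.

def task_cost(tokens: int, price_per_million_usd: float) -> float:
    """Dollar cost of a task that consumes `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million_usd

BUDGET_RATE = 0.10     # $ per million tokens, small model
FLAGSHIP_RATE = 3.44   # $ per million tokens, blended usage mix

tasks = {
    "basic chatbot Q&A": 500,
    "complex code": 100_000,
    "agent workflow": 1_000_000,
}

for name, tokens in tasks.items():
    print(f"{name:>20}: ${task_cost(tokens, FLAGSHIP_RATE):.4f} flagship"
          f" vs ${task_cost(tokens, BUDGET_RATE):.4f} budget")
```

At chatbot scale the price gap between models is noise; at agent scale, the same gap is the difference between a fraction of a cent and several dollars per request, which is exactly the squeeze described below.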
Hence the debate: If new AI systems that use orders of magnitude more tokens just to answer a single request are driving much of the spike in demand for AI infrastructure, who will ultimately foot the bill?
Ivan Zhao, CEO of productivity software company Notion, says that two years ago, his business had margins of around 90%, typical of cloud-based software companies. Now, around 10 percentage points of that profit go to the AI companies that underpin Notion's latest offerings.
The challenges are similar -- but potentially more dire -- for companies that use AI to write code for developers. These "vibecoding" startups, including Cursor and Replit, have recently adjusted their pricing. Some users of Cursor have, under the new plan, found themselves burning through a month's worth of credits in just a few days. That's led some to complain or switch to competitors.
And when Replit updated its pricing model with something it calls "effort-based pricing," in which more complicated requests could cost more, the world's complaint box, Reddit, filled up with posts by users declaring they were abandoning the vibecoding app.
Despite protests from a noisy minority of users, "we didn't see any significant churn or slowdown in revenue after updating the pricing model," says Replit CEO Amjad Masad. The company's plan for enterprise customers can still command margins of 80% to 90%, he adds.
Some consolidation in the AI industry is inevitable. Hot markets eventually slow down, says Martin Casado, a general partner at venture-capital firm Andreessen Horowitz. But the fact that some AI startups are sacrificing profits in the short term to expand their customer bases isn't evidence that they are at risk, he adds.
Casado sits on the boards of several of the AI startups now burning investor cash to rapidly expand, including Cursor. Some of those companies are already pursuing healthy margins, he says; for others, it makes sense to "just go for distribution." Cursor didn't respond to requests for comment.
The big companies creating cutting-edge AI models can, at least for now, afford to collectively spend more than $100 billion a year building out infrastructure to train and deliver AI. That includes well-funded startups OpenAI and Anthropic, as well as companies like Google and Meta that redirect profits in other lines of business to their AI ventures.
For all of that investment to pay off, businesses and individuals will eventually have to spend big on these AI-powered services and products. There is an alternative, says Browne: Consumers could just use cheaper, less-powerful models that require fewer resources.
For T3, his many-model AI chatbot, Browne is beginning to explore ways to encourage this behavior. Most consumers are using AI chatbots for things that don't require the most resource-intensive models, and could be nudged toward "dumber" AIs, he says.
This fits the profile of the average ChatGPT user. OpenAI's chief financial officer said in October that three-quarters of the company's revenue came from regular Joes and Janes paying $20 a month. That means just a quarter of the company's revenue comes from businesses and startups paying to use its models in their own processes and products.
And the difference in price between good-enough AI and cutting-edge AI isn't small.
The cheapest AI models, including OpenAI's new GPT-5 Nano, now cost around 10 cents per million tokens. Compare that with OpenAI's full-fledged GPT-5, which costs about $3.44 per million tokens, when using an industry-standard weighted average for usage patterns, says Cottier.
While rate limits and dumber AI could help some of these AI-using startups for a while, those measures leave them in a bind. Price hikes will drive customers away. And the really big players, which own their own monster models, can afford to lose money while serving their customers directly.
In late June, Google offered its own code-writing tool to developers, completely free of charge [1].
Which raises a thorny question about the state of the AI boom: How long can it last if the giants are competing with their own customers? [2]
1. In late June 2025, Google released the open-source Gemini CLI, a free, terminal-based AI coding agent. This followed the release of a free tier for Gemini Code Assist in February 2025. These offerings give developers free access to AI coding tools powered by Google's Gemini models.
Released on June 25, 2025, the Gemini CLI is an open-source command-line interface that lets developers use Gemini directly from the terminal. As a terminal-based coding agent, it integrates with the command-line environment: users interact with it through natural-language prompts to get code suggestions, complete features, debug issues, or perform other development tasks without leaving their familiar terminal workflow.
Purpose: It functions as a local utility supporting coding, problem-solving, content generation, and task management.
Models: The free version provides access to the Gemini 2.5 Pro model with a 1 million-token context window.
Usage: It provides up to 1,000 requests daily and 60 requests per minute.
Access: It can be used by logging in with a personal Google account, which issues a free Gemini Code Assist license.
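Per-minute caps like the 60 requests/minute quoted above are typically respected on the client side with a sliding-window throttle, so batch jobs don't trip the quota. A minimal sketch of that pattern, not Google's code; the window parameters are assumptions you would match to your own quota:

```python
# A client-side sliding-window throttle: block before sending a request
# whenever the last `max_requests` sends all happened within the window.
import time
from collections import deque


class RateLimiter:
    def __init__(self, max_requests: int, per_seconds: float):
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self.timestamps = deque()  # monotonic send times, oldest first

    def acquire(self) -> None:
        """Block until a request may be sent without exceeding the cap."""
        now = time.monotonic()
        # Drop send times that have aged out of the sliding window.
        while self.timestamps and now - self.timestamps[0] >= self.per_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # Sleep until the oldest in-window send expires.
            time.sleep(self.per_seconds - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())


# For a 60 requests/minute quota, call limiter.acquire() before each send:
limiter = RateLimiter(max_requests=60, per_seconds=60.0)
```

A deque works here because timestamps only ever age out from the front; each `acquire` is O(1) amortized.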
Free Gemini Code Assist
Released in February 2025, Gemini Code Assist offers a free tier for individual developers.
Purpose: The tool acts as an AI pair programmer within a developer's integrated development environment (IDE).
Features: It can generate and complete code, explain and debug existing code, and perform AI-powered code reviews.
High limits: The free tier provides up to 180,000 code completions per month.
Integrations: The free IDE extension is available for Visual Studio Code and JetBrains IDEs. A companion tool, Gemini Code Assist for GitHub, provides free AI code reviews for pull requests.
Access: A personal Gmail account is required to sign up, with no credit card necessary.
2. Mims, Christopher. "AI Was Supposed to Get Cheaper. It's More Expensive Than Ever. Artificial intelligence is doing more 'thinking' than ever before. Small companies are feeling the pinch." Wall Street Journal, Eastern edition; New York, N.Y., 30 Aug 2025: B2.