The shift from per-seat to per-token AI pricing is not a billing footnote. It is a fundamental change in how AI costs behave inside your business, and the organisations that understand it first will have a structural advantage that compounds over time.
- AI pricing is moving from per-seat licences to per-token consumption, which means your costs will track actual usage rather than headcount.
- Most organisations get worse results from AI because they automate broken workflows. Fix the constraint first, then integrate.
- Work is shifting from doing to framing and verifying. Your people's judgment is now the scarce resource, not their hours.
- The businesses compounding value fastest are the ones measuring what AI actually changes in their workflows, not just what it costs per month.
The Old Model Was Built for a Different World
Software pricing has run on seats for decades. You pay per user, per month, whether those users open the tool or not. IT budgets were predictable. Vendor relationships were simple. And the economics made sense when software was mostly static, doing roughly the same thing for every user every day.
AI does not work that way. A language model doing a simple task might consume a fraction of a cent in compute. That same model writing a detailed technical specification, cross-referencing three documents, and producing a structured output might cost fifty times more. Per-seat pricing papers over that variation. Per-token pricing exposes it.
The transition is already underway. OpenAI, Anthropic, Google, and the major enterprise AI platforms have all moved toward consumption-based models at their API layer. What you pay is increasingly tied to what you actually use, measured in tokens. A token is roughly three-quarters of a word in English, and the ratio varies in other languages. As AI moves deeper into business software, that consumption layer is surfacing in the tools SMBs actually use every day.
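To make that variation concrete, here is a minimal sketch of how a per-token bill adds up. The prices and token counts are illustrative assumptions, not any vendor's actual rates, and the word-to-token conversion uses the rough three-quarters-of-a-word rule of thumb.

```python
# Illustrative only: prices and token counts are assumptions, not real vendor rates.
PRICE_PER_MILLION_INPUT = 3.00    # hypothetical USD per 1M input tokens
PRICE_PER_MILLION_OUTPUT = 15.00  # hypothetical USD per 1M output tokens
WORDS_PER_TOKEN = 0.75            # rough rule of thumb for English text


def words_to_tokens(words: int) -> int:
    """Estimate a token count from a word count."""
    return round(words / WORDS_PER_TOKEN)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request under per-token pricing."""
    total = (
        input_tokens * PRICE_PER_MILLION_INPUT
        + output_tokens * PRICE_PER_MILLION_OUTPUT
    )
    return total / 1_000_000


# A short question with a short answer.
simple_task = request_cost(input_tokens=200, output_tokens=100)

# Three source documents in, one detailed specification out.
complex_task = request_cost(
    input_tokens=words_to_tokens(11_250),
    output_tokens=words_to_tokens(3_000),
)
```

With these assumed numbers the simple request costs about $0.002 and the complex one about $0.105, roughly fifty times more. The same seat licence would have hidden that gap entirely.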
What Per-Token Pricing Actually Means for Your P&L
Here is the practical difference. Under per-seat pricing, your AI spend is a fixed cost. It sits on your P&L like a SaaS subscription and behaves like rent. Under per-token pricing, your AI spend is a variable cost. It moves with activity, with complexity, and with how aggressively your team leans on the tools.
That is not automatically bad. Variable costs tied to productive output are often the best kind of cost. If your estimating team produces twice as many quotes because AI is doing the first-draft work, and your token spend doubles alongside that output, you have a ratio worth protecting. The problem comes when token spend scales with activity but the activity itself is not generating value.
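The ratio worth protecting can be tracked with one division. The sketch below uses made-up monthly figures for a hypothetical estimating team; the function name and the numbers are assumptions for illustration.

```python
def cost_per_output(token_spend: float, outputs: int) -> float:
    """Token spend divided by units of useful output, such as quotes sent."""
    if outputs <= 0:
        raise ValueError("outputs must be positive")
    return token_spend / outputs


# Hypothetical healthy case: spend doubles and quotes double, so the ratio holds.
before = cost_per_output(token_spend=400.0, outputs=80)
after = cost_per_output(token_spend=800.0, outputs=160)

# Hypothetical drift: spend doubles but output barely moves.
drifting = cost_per_output(token_spend=800.0, outputs=85)
```

In the healthy case the cost per quote stays at $5. In the drifting case it rises to more than $9, which is the signal that activity is growing faster than value.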
This is why the question of workflow design matters so much more now. Per-seat pricing let you get sloppy. You paid the same whether AI helped close three deals or was used to draft emails nobody read. Per-token pricing makes the inefficiency legible. And if you have not thought carefully about what you are pointing AI at, that legibility is going to be uncomfortable.
The Three Budget Scenarios
Most SMBs will end up in one of three positions as per-token pricing takes hold.
- Spend less. Your team is currently licensed for AI tools they use inconsistently. Moving to consumption pricing drops your baseline cost and you pay more only when usage drives real output. This happens most often in businesses that adopted AI tools early but never built consistent workflows around them.
- Spend more, proportionally. Your team integrates AI deeply into high-value work, token costs scale with output, and your revenue scales faster than your costs. This is the position you want to be in.
- Spend more, disproportionately. Token costs rise because AI is being used on tasks that do not produce measurable value. Drafting internal updates nobody acts on. Summarising reports that do not change decisions. This is the scenario that will quietly erode margin until someone looks at the numbers.
The only way to know which scenario you are heading toward is to understand your workflows before you optimise them with AI. That diagnostic step is the one most businesses skip.
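The three scenarios can be expressed as a simple rule of thumb. This is a sketch, not a formal model: the function name, the inputs, and the cut-off (value growing at least as fast as spend counts as proportional) are assumptions chosen for illustration.

```python
def budget_scenario(spend_change: float, value_change: float) -> str:
    """Classify a period by the relative change in AI spend and in measured value.

    Both arguments are fractions: 0.5 means a 50 percent increase,
    -0.2 means a 20 percent decrease.
    """
    if spend_change <= 0:
        return "spend less"
    if value_change >= spend_change:
        # Value is growing at least as fast as token spend.
        return "spend more, proportionally"
    # Spend is rising faster than the value it produces.
    return "spend more, disproportionately"


# Hypothetical quarters for three different businesses.
early_adopter = budget_scenario(spend_change=-0.20, value_change=0.00)
deep_integrator = budget_scenario(spend_change=0.50, value_change=0.80)
margin_eroder = budget_scenario(spend_change=0.50, value_change=0.05)
```

The hard part is not the rule but the second input. You can only supply a value_change figure if you have already decided what the workflow is supposed to produce.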
The 94 Percent Problem
Research from McKinsey and BCG has consistently shown that the majority of organisations attempting large-scale AI integration do not achieve the productivity gains they expect. The pattern holds across studies: roughly 94 to 96 percent of organisations see flat or negative results in the first cycle of AI adoption. The remaining 4 to 6 percent compound value at a rate that pulls them away from the field.
The difference is not access to better tools. The tools are largely the same. The difference is sequencing.
The 94 percent grab tools first. They find a workflow that looks automatable, point AI at it, and measure the time saved per task. Sometimes that produces real gains. More often, it produces a faster version of the wrong thing.
The 4 to 6 percent find the constraint first. They ask: what is the one thing that, if we fixed it, would unlock everything downstream? Then they redesign the workflow around that constraint. Then they integrate AI into the redesigned workflow.
That sequence matters because AI amplifies what already exists. If your estimating process is slow because the underlying data is unstructured and the decision criteria are implicit, AI will produce fast estimates that are still unreliable. You have not solved the problem. You have automated the symptom.
A Real Example: From Estimating Bottleneck to Doubled Revenue
A contracting and cladding firm came to us having tried several AI tools with mixed results. Their estimating team was the bottleneck. Jobs were being quoted late, the follow-up process was inconsistent, and the business was turning away work it could not price fast enough.
The instinct was to point an AI tool at the existing estimating process and make it faster. We pushed back. The problem was not speed. The problem was that the estimating process had no clear structure. Inputs were scattered across emails, site photos, and handwritten notes. The criteria for pricing decisions lived entirely in the heads of two senior estimators.
We spent six weeks restructuring the workflow before a single AI tool was integrated. We defined the inputs, standardised the decision criteria, and built a simple template that captured the right information upfront. Then we integrated AI into that redesigned workflow to handle first-draft estimates, flag missing inputs, and generate client-facing summaries.
The result over the following six months was a doubling of revenue. Not because AI made estimating faster, though it did. Because fixing the estimating constraint unlocked capacity throughout the entire business. AI accelerated the benefit, but it did not create it.
Work Is Shifting from Doing to Framing and Verifying
The IMF has estimated that about 60 percent of jobs in advanced economies are exposed to AI and may change materially as its capabilities expand. That number gets quoted in contexts ranging from alarming to encouraging. The useful way to read it is neither.
What it describes is a shift in where human judgment is most valuable. In most knowledge work roles today, the majority of time goes to production: writing the thing, calculating the thing, assembling the thing, formatting the thing. AI is very good at production tasks when the inputs are clear and the criteria are defined.
What AI cannot do well is frame the problem correctly or verify that the output actually solves the right problem in the right context. Those are judgment tasks. They require understanding of the business, the client, the relationship, and the consequences of being wrong. That is where your people's time is increasingly most valuable.
This is the practical meaning of AI amplifying talent rather than replacing it. A good estimator using AI is not doing less skilled work. They are doing more of the skilled part, the part where experience and judgment produce outcomes that a model cannot replicate, and less of the time-consuming production work that was never what made them valuable in the first place.
The businesses that get this right are building roles around framing and verification. The ones that miss it are measuring lines of output per hour and wondering why quality is declining.
The J-Curve and the K-Curve
There is a predictable performance pattern when organisations integrate new technology without fixing underlying constraints first. Productivity dips before it recovers, because the team is now managing both the old problems and the new tools. This is the J-curve.
Most organisations adopting AI are on the J-curve. They are slower than they were six months ago at some tasks because the AI integration created friction without eliminating the root cause of inefficiency. The productivity lift that research consistently shows on AI-assisted knowledge work, often around 40 percent on specific well-defined tasks, requires that the tasks themselves are sound and the workflow is structured.
The organisations compounding value are on what looks more like a K-curve. One branch of the K climbs steeply as AI accelerates high-value work on well-designed workflows. The other branch declines as lower-value tasks such as manual data handling, routine formatting, and repetitive communication become cheap enough to eliminate or delegate entirely. The organisation that was doing all of those things with the same team is now doing more of the high-value work with the same headcount, or the same amount of high-value work with a smaller team and wider margins.
Growth That Sticks
A therapy clinic we worked with illustrates this clearly. They started with three practitioners and a backlog. Admin work, scheduling, intake documentation, and insurance correspondence were absorbing about a third of each clinician's time. That time was expensive, and the work itself was not what the clinicians were trained for or hired to do.
Redesigning the intake and documentation workflow and integrating AI into the administrative layer shifted clinician time toward billable care. Within eighteen months the clinic had grown to sixteen practitioners. The growth was not solely because of AI. It was because fixing the administrative constraint made it possible to bring on practitioners without proportionally scaling the overhead that was previously the limit.
A construction firm made a comparable jump from $42 million to $180 million in revenue over a similar timeframe. The constraint in that case was business development and estimating capacity. The fix was structural before it was technological. AI compressed the timeline, but the sequencing was the same: find the constraint, fix the workflow, then integrate.
How to Measure Workflow Value Before You Spend
Per-token pricing gives you a mechanism to measure AI ROI at a granular level that per-seat pricing never allowed. But that measurement is only useful if you know what you are measuring for.
Before you scale AI spend under a per-token model, three questions are worth answering clearly.
First: what is the constraint that limits your revenue or your capacity right now? Not what is inconvenient. Not what your team complains about. The actual bottleneck that, if removed, would let more value flow through the business.
Second: what does the workflow around that constraint look like today, and is it structured enough for AI to help? If inputs are inconsistent, criteria are implicit, and outputs are poorly defined, AI will make the mess faster. The workflow needs to be sound before AI makes it faster.
Third: what does a good outcome look like, and how will you know if you are getting it? If you cannot define what the AI-assisted version of this task should produce and how you will verify it, you do not have a workflow yet. You have a hope.
Answering those three questions before you configure a single integration is the difference between the J-curve and the K-curve. It is also the difference between per-token costs that track value creation and per-token costs that track activity with no clear return.
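One way to hold yourself to those three questions is to write the answers down in a fixed shape and check for blanks. The sketch below is a hypothetical template, not a standard: the class name, the field names, and the example values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowBrief:
    """Hypothetical brief to complete before configuring any AI integration."""

    constraint: str = ""                                        # question 1
    inputs: list[str] = field(default_factory=list)             # question 2
    decision_criteria: list[str] = field(default_factory=list)  # question 2
    good_output: str = ""                                       # question 3
    verification: str = ""                                      # question 3

    def gaps(self) -> list[str]:
        """Return the names of the fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# A half-finished brief for an estimating workflow.
estimating = WorkflowBrief(
    constraint="Quotes go out late, so work is turned away",
    inputs=["site photos", "measurements", "client emails"],
)
missing = estimating.gaps()
```

Here the brief still lacks decision criteria, a definition of a good output, and a verification method. Until that list is empty, any token spend on the workflow is tracking activity rather than value.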
Do It With Your People, Not To Them
One more thing the 4 to 6 percent consistently get right: they build these workflows with the people who do the work, not for them.
This matters for practical reasons as much as cultural ones. The person doing the estimation knows where the information gaps are. The clinician doing the intake knows what documentation is actually used downstream. The project manager knows which parts of the weekly report anyone actually reads. That knowledge is the raw material for workflow design, and it only surfaces when the people closest to the work are in the room.
AI integration done without that input tends to produce polished outputs that do not quite fit the real context. The team works around them. Token spend rises. Value delivered stays flat.
Done with that input, the workflow changes stick. The team has a stake in making the integration work because they shaped it. And the outputs fit the actual requirements of the work because the people who understand those requirements were part of the design.
This is not a soft point about change management. It is a hard point about information quality. The people doing the work hold the constraint knowledge. If you bypass them, you are building on incomplete information, and AI will amplify that gap just as efficiently as it amplifies everything else.
Per-token AI pricing is not a technical detail. It is a business model shift that will make the quality of your workflow design visible in your costs every month. The organisations that are compounding value right now are not the ones with the most AI tools. They are the ones who found their binding constraint, redesigned the workflow around it with their team, and then integrated AI into a process that was already sound. The single next action: map your one highest-cost constraint, write down the inputs, the criteria, and what a verified good output looks like, and only then ask how AI could accelerate it.
