There is a growing chorus of voices in legal AI telling you to be very, very worried about the cost of tokens. Stanford says agentic AI uses 1,000 times more tokens than a chat query. Bloomberg Law says the subsidies are ending and the meter is about to start. A company called Portal26 just launched an entire product category — “Agentic Token Controls” — to cap your runaway AI spend before it eats your budget alive.

The message is clear: usage-based AI pricing is a ticking time bomb, and you had better lock in a flat rate while you still can.

I have spent the last few days stewing over an economic model of legal AI costs, and I think this narrative is almost entirely wrong. Not wrong about the facts — the Stanford data is real, the token multipliers are real, and yes, AI vendors are subsidizing current prices. Wrong about the conclusion. Wrong about what the numbers actually mean when you do the math instead of just reading the headline.

Let me show you.

Start With the Deal

Josh Kubicki’s recent Brainyacts briefing cites a case study from law.co — a mid-size corporate firm running M&A purchase agreement reviews through a five-agent AI chain. Before any optimization, the firm was consuming 3.2 million tokens per deal. At Sonnet rates, that is somewhere between $16 and $48 in raw AI compute.

The legal fees on an M&A purchase agreement review at a mid-size firm? Call it $50,000. That is a conservative round number.

So the AI compute cost was, at worst, one-tenth of one percent of the deal fee. Before anyone lifted a finger to optimize anything.
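For anyone who wants to check that comparison, here is a minimal sketch of the arithmetic. The per-million-token prices ($3 input, $15 output) are an assumption about what "Sonnet rates" means, and the 25% output share is borrowed from the split used in the next section. The $48 upper end corresponds to billing every token at the output rate. The mix behind the $16 lower end is not stated, so the sketch does not try to reproduce it.

```python
# Assumed prices in USD per million tokens; swap in your own contract rates.
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00

DEAL_TOKENS = 3_200_000   # tokens per deal before optimization
DEAL_FEE = 50_000         # round-number legal fee for the review


def raw_cost(tokens: float, output_share: float) -> float:
    """Uncached cost in USD when `output_share` of the tokens are output."""
    output_tokens = tokens * output_share
    input_tokens = tokens - output_tokens
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000


for label, share in [("25% output", 0.25), ("100% output", 1.0)]:
    cost = raw_cost(DEAL_TOKENS, share)
    print(f"{label}: ${cost:,.2f} = {cost / DEAL_FEE:.3%} of the fee")
```

Under those assumptions the two cases come to $19.20 and $48.00, or roughly 0.04% and 0.1% of the fee.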

Now let us make it scary.

The 1,000x Scenario

The Stanford Digital Economy Lab found that agentic tasks can consume 1,000 times more tokens than simple code reasoning and chat. That is the headline number that launched a thousand LinkedIn posts about the coming token apocalypse.

Fine. Let us take it at face value. Multiply those 3.2 million deal tokens by 1,000 and you get 3.2 billion tokens. Assume a 75/25 split between input and output tokens, which is reasonable for agentic workflows that spend most of their cycles re-reading context rather than generating new text. At Sonnet rates, with no caching, no optimization, no discount of any kind, the naive cost is $19,200.
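The naive figure can be reproduced in a few lines. This is a sketch under stated assumptions: the $3 and $15 per-million-token prices are assumed rather than quoted from a rate card, and every token is billed at full price.

```python
# Assumed prices in USD per million tokens (no caching, no discounts).
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00

deal_tokens = 3_200_000
multiplier = 1_000
total_tokens = deal_tokens * multiplier      # 3.2 billion tokens

input_tokens = total_tokens * 0.75           # 75% of tokens are input
output_tokens = total_tokens * 0.25          # 25% of tokens are output

input_cost = input_tokens / 1e6 * INPUT_PER_M
output_cost = output_tokens / 1e6 * OUTPUT_PER_M
naive_cost = input_cost + output_cost

deal_fee = 50_000
print(f"input ${input_cost:,.0f} + output ${output_cost:,.0f} = ${naive_cost:,.0f}")
print(f"share of deal fee: {naive_cost / deal_fee:.1%}")
```

With these inputs the split is $7,200 of input and $12,000 of output, which is where the $19,200 comes from.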

That is 38% of the deal fee. Now it sounds like a real number. Now the panic makes sense.

Except it does not. Because that calculation treats every token as if it costs the same, and in an agentic workflow, that is not how any of this works.

What the 1,000x Is Actually Made Of

When an agentic AI system loops through a task — retrying approaches, reading files, building context, refining its output — the token count explodes. But the composition of those tokens matters enormously.

Most of the tokens in an agentic loop are the same context being re-read on every cycle. The system prompt, the uploaded documents, the accumulated conversation history. Each cycle adds a relatively small amount of new input and new output. The rest is recycled context.

And recycled context is exactly what prompt caching covers. Cached input tokens cost 90% less than fresh ones.

Here is what that does to the math. At a moderate 75% cache rate — meaning 75% of the input tokens on each cycle are cached context, which is conservative for a system that is re-reading the same contract fifty times — the 1,000x scenario drops from $19,200 to roughly $14,300. At 90% caching, it drops to $13,400. And this is not a theoretical optimization. Claude handles caching automatically — every turn in a session re-reads the accumulated context at cached rates, no engineering required.

But here is what the model really reveals: the output tokens, not the input tokens, are the cost driver. Output tokens are never cached. They are always full price. In the 1,000x scenario at 75% caching, output tokens account for $12,000 of the $14,300 total. The entire caching debate — the part that dominates the conversation about token costs — is fighting over the remaining $2,300.
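A rough version of that model fits in one function. Treat it as a sketch, not a billing calculator: the prices are assumed, the 90% discount is applied as a flat rate to the cached share of input, and it ignores details such as any surcharge for writing to the cache or cache entries expiring between turns. The function and parameter names are illustrative.

```python
def agentic_cost(total_tokens, cache_rate, input_share=0.75,
                 input_per_m=3.00, output_per_m=15.00, cache_discount=0.90):
    """Return (input_cost, output_cost) in USD.

    Prices and the flat discount are assumptions; cache-write surcharges
    and cache expiry are deliberately ignored.
    """
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    cached = input_tokens * cache_rate       # re-read context billed at the discount
    fresh = input_tokens - cached            # new input billed at full price
    cached_price = input_per_m * (1 - cache_discount)
    input_cost = (fresh * input_per_m + cached * cached_price) / 1e6
    output_cost = output_tokens * output_per_m / 1e6   # output is never cached
    return input_cost, output_cost


TOTAL = 3_200_000 * 1_000  # the 1,000x scenario: 3.2 billion tokens

for rate in (0.0, 0.75, 0.90):
    inp, out = agentic_cost(TOTAL, rate)
    print(f"cache {rate:.0%}: input ${inp:,.0f} + output ${out:,.0f} = ${inp + out:,.0f}")
```

The output line stays at $12,000 in every row. Only the input line moves, from $7,200 with no caching to $2,340 at 75% and $1,368 at 90%.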

The “1,000x token usage” framing conflates volume with cost. It is like saying a lawyer who re-reads a contract ten times did ten times the work. They did not. They read the same thing again. Unlike the lawyer, the AI actually charges 90% less for the second through tenth readings.

Stack Every Worst Case

I want to be honest about the upper bound. Let us stack every worst-case assumption simultaneously:

  • The 1,000x agentic multiplier (the extreme outlier, not the expected range)
  • Zero prompt caching (ignoring how the technology actually works)
  • Doubled token prices (assuming subsidies fully unwind and prices go up 100%)

The result: $38,400. On a $50,000 deal.

That is the single scenario where token costs start to matter. And it requires you to simultaneously assume the worst-case usage multiplier, ignore the primary cost reduction mechanism built into the platform, and double the price of every token. If you told a first-year associate to model a risk scenario that required stacking three independent worst-case assumptions to produce a concerning result, they would tell you that is not a risk — that is a tail event.

Now here is what the realistic range looks like. Kubicki’s briefing cites Gartner’s March 2026 analysis, which puts the agentic multiplier at 5x to 30x, not 1,000x. At 30x with 75% caching and current prices, the AI compute on that $50,000 deal costs $430.

Four hundred and thirty dollars. On a fifty-thousand-dollar deal. That is the number everyone is panicking about.
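One way to see how much stacking is required is to enumerate the combinations. The sketch below crosses two multipliers (30x and 1,000x), two cache rates (75% and none), and two price levels (current and doubled). The per-deal base costs assume $3 and $15 per million tokens and a 75/25 input/output split; the function name and layout are illustrative only.

```python
from itertools import product

BASE_INPUT_COST = 7.20    # 2.4M input tokens at an assumed $3 per million
BASE_OUTPUT_COST = 12.00  # 0.8M output tokens at an assumed $15 per million
DEAL_FEE = 50_000


def scenario_cost(multiplier, cache_rate, price_factor, cache_discount=0.90):
    """Per-deal cost in USD for one combination of assumptions."""
    # Only the cached share of input gets the discount; output is full price.
    input_cost = BASE_INPUT_COST * (1 - cache_discount * cache_rate)
    return multiplier * price_factor * (input_cost + BASE_OUTPUT_COST)


for mult, cache, price in product((30, 1_000), (0.75, 0.0), (1.0, 2.0)):
    cost = scenario_cost(mult, cache, price)
    print(f"{mult:>5}x  cache {cache:>4.0%}  prices x{price:.0f}  "
          f"${cost:>9,.2f}  ({cost / DEAL_FEE:.1%} of fee)")
```

Of the eight rows, only the one with all three worst cases reaches $38,400. The 30x row with 75% caching and current prices comes to $430.20.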

Who Benefits From the Panic?

The loudest voices in the token cost panic are not neutral observers. The Stanford paper’s “1,000x” headline is empirically accurate but stripped of economic context — 1,000 times almost nothing is still almost nothing. The Bloomberg Law piece is explicitly framed as an argument for locking in flat-rate pricing now. Portal26 is literally selling a product that solves the problem. And the broader narrative — that usage-based pricing is dangerous and flat-rate licensing is safe — benefits exactly one category of vendor: the legal-specific AI platforms that charge per-seat flat rates.

The ones with billion-dollar valuations, venture-funded pricing, and a business model that depends on firms paying the same license fee whether 20% or 80% of their attorneys actually use the tool. For those vendors, the token cost panic is not a bug. It is a feature. Every firm that locks into a flat-rate contract because they are afraid of unpredictable token costs is a firm that just chose the pricing model that benefits the vendor’s economics, not the firm’s.

I am not saying those tools are bad. I am not saying flat-rate pricing is never the right choice — my own company bills consulting on a flat rate. I am saying that the market narrative about token costs is doing the vendors’ sales work for them, and you should at least notice that before you sign the contract.

The Legitimate Concern (and Its Answer)

There is one version of the cost predictability argument that is not FUD, and I want to give it its due.

If you are a law firm CTO or CFO who has never managed usage-based AI spend before, the preference for predictability is rational. You do not have the tooling, the budgeting frameworks, or the institutional muscle memory for variable AI costs. That is a real operational gap.

But it has answers. Claude Enterprise already ships four levels of spend controls: organization-wide monthly caps, group-level caps, per-seat-tier caps (Standard vs. Premium), and individual per-user caps. The limits are hierarchical — a user cannot exceed their individual cap, their group cap, or the organization cap, whichever is lowest. When someone hits their limit, they are blocked until the next billing period or until an admin raises the cap. That is more granular cost governance than most firms have on their Westlaw spend. And it is a shipping product, not a roadmap item.
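The "whichever is lowest" logic is simple enough to sketch. To be clear, this is not the Claude Enterprise admin API. The class, function names, and dollar figures below are all hypothetical, and the sketch only illustrates how hierarchical caps resolve to a single binding limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    level: str     # "organization", "group", "seat_tier", or "user"
    cap: float     # monthly cap in USD
    spent: float   # spend so far this billing period counted against that cap

    @property
    def headroom(self) -> float:
        return max(0.0, self.cap - self.spent)


def binding_budget(budgets: list[Budget]) -> Budget:
    """The level with the least headroom is the one that blocks first."""
    return min(budgets, key=lambda b: b.headroom)


def is_blocked(budgets: list[Budget]) -> bool:
    """A user is blocked as soon as any level above them is exhausted."""
    return binding_budget(budgets).headroom <= 0


# Hypothetical numbers for one attorney partway through the month.
budgets = [
    Budget("organization", cap=20_000, spent=12_000),
    Budget("group", cap=5_000, spent=4_800),
    Budget("seat_tier", cap=400, spent=150),
    Budget("user", cap=300, spent=150),
]
tightest = binding_budget(budgets)
print(f"{tightest.level} cap binds: ${tightest.headroom:,.2f} left; "
      f"blocked={is_blocked(budgets)}")
```

In this made-up example the individual cap binds first with $150 of headroom. If the group had already spent its full $5,000, the same attorney would be blocked regardless of their personal balance.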

Is the matter-level attribution tooling where it needs to be? No. Kubicki is right that most firms cannot tell you what the AI compute cost was on a specific matter. That infrastructure needs to be built. But the answer to “we do not have good cost visibility yet” is not “pay a 3x to 10x premium for flat-rate pricing so we do not have to look.” The answer is to build the visibility so you know whether you are getting a good deal on your flat-rate per-seat pricing.

The Reasons That Actually Matter

Here is my real argument, and it is not about cost at all.

There are excellent reasons to care about token efficiency. Cost barely makes the list. The reasons that actually matter:

Data minimization. Every token you send to a model is data leaving your environment. ABA Model Rule 1.6, the confidentiality rule, might have something to say about sending a model more client information than the task requires. Token efficiency is less a cost optimization than a professional responsibility practice. Do not send the entire deal room to summarize one document.

Output quality. Tighter, better-structured context produces better reasoning. Models perform worse when you flood them with irrelevant context. Pruning your token usage is not about saving money — it is about getting better work product.

Latency. Fewer tokens means faster responses. In an agentic workflow where the system is cycling through multiple steps, token efficiency is the difference between a result in three minutes and a result in thirty.

Environmental impact. This is the one nobody in legal AI is talking about, and they should be. Data center electricity consumption is projected to hit 1,050 terawatt-hours by 2026. Ranked alongside countries, that would make data centers the fifth-largest electricity consumer on the planet, between Japan and Russia. Reasoning-mode queries draw 5 to 12 times more energy than standard inference. A typical 100-megawatt AI data center consumes 1.5 to 3 million cubic meters of water per year for cooling.

Every large law firm I know has an ESG page on their website. Most of them publish sustainability commitments. Not one of them is accounting for AI compute in their environmental reporting. They are tracking the carbon footprint of their office buildings while ignoring the energy footprint of the millions of tokens their attorneys are burning every day.

That is not a criticism. It is an observation that the industry has not connected these dots yet. Token efficiency is an environmental issue, and the firms that figure that out first will have a genuine story to tell — to their clients, to their recruits, and to the market.

Cost. Last on the list. The math does not justify the panic.

What This Means

One fair objection: a single deal at $430 is a rounding error, but a firm running 500 matters a month through agentic AI at varying complexity levels starts to see real aggregate numbers — maybe $50,000 to $200,000 a month in total token spend. That is not catastrophic, but it is not invisible either. That is the scale where firms need real cost attribution — the ability to see token spend by matter, by practice group, by workflow type. Yet another argument to build that visibility into token usage.
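Cost attribution of that kind is mostly bookkeeping. Here is a minimal sketch, assuming you can export usage records tagged with a matter and a practice group. The record layout, matter IDs, and token counts are invented for illustration, and the per-million prices ($3 fresh input, $0.30 cached input, $15 output) are assumptions.

```python
from collections import defaultdict

# Assumed prices in USD per million tokens.
INPUT_PER_M, CACHED_PER_M, OUTPUT_PER_M = 3.00, 0.30, 15.00

# Hypothetical usage records:
# (matter_id, practice_group, fresh_input, cached_input, output)
records = [
    ("M-1001", "Corporate", 6_000_000, 18_000_000, 8_000_000),
    ("M-1002", "Litigation", 1_000_000, 3_000_000, 1_200_000),
    ("M-1001", "Corporate", 1_200_000, 3_600_000, 1_600_000),
]


def record_cost(fresh, cached, output):
    """Cost in USD of one usage record."""
    return (fresh * INPUT_PER_M + cached * CACHED_PER_M + output * OUTPUT_PER_M) / 1e6


def spend_by(rows, key_index):
    """Total spend grouped by one column (0 = matter, 1 = practice group)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += record_cost(*row[2:])
    return dict(totals)


by_matter = spend_by(records, 0)
by_group = spend_by(records, 1)
for matter, cost in by_matter.items():
    print(f"{matter}: ${cost:,.2f}")
```

As a sanity check on the aggregate range, $50,000 to $200,000 a month across 500 matters works out to an average of $100 to $400 per matter.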

The conversation about token costs is a distraction from the conversations that actually matter. It is easier to argue about whether to pay $1,000 per seat or $430 per deal than to argue about what happens to associate leverage when AI absorbs junior-level work. It is easier to compare vendor pricing models than to ask whether your tool architecture is pushing your attorneys into fragmented, cognitively degraded work patterns instead of genuine strategic delegation.

The token cost debate is the comfortable debate. The one where the numbers are small enough that nobody has to change anything fundamental about how law firms operate.

The uncomfortable debate — the one about leverage economics, margin transparency, cognitive architecture, and who captures the value of AI efficiency gains — is the one that will actually determine which firms thrive in the next five years.

But first: stop panicking about the tokens. The math does not support it. Your time, your data hygiene, your environmental footprint, and your competitive position are all better reasons to optimize. The cost is a rounding error.

Do the math yourself. You will see.

Ryan McClead

Ryan is Principal and CEO at Sente Advisors, a legal technology consultancy helping law firms with innovation strategy, project planning and implementation, prototyping, and technology evaluation. He has been an evangelist, advocate, consultant, and creative thinker in Legal Technology for more than two decades. In 2015, he was named a Fastcase 50 recipient, and in 2018, he was elected a Fellow in the College of Law Practice Management. In past lives, Ryan was a Legal Tech Strategist, a BigLaw Innovation Architect, a Knowledge Manager, a Systems Analyst, a Help Desk answerer, a Presentation Technologist, a High Fashion Merchandiser, and a Theater Composer.