The wrong way to make a budget case for AI is to lead with model benchmarks. The right way is to lead with the line on the P&L the work touches.
Benchmarks are interesting to engineers. The CFO doesn't have a budget category for "MMLU score." They have one for headcount and one for vendor spend. If the AI workflow doesn't show up against either of those, it doesn't get a second year.
The three numbers that travel
We frame every Implement engagement against three columns. They're not always all positive. Sometimes the workflow saves hours but adds vendor cost, and the trade is what's being negotiated.
- Hours reclaimed. Operator hours that used to be spent on the workflow and are now spent elsewhere. This is the number that lets a department lead reassign capacity instead of asking for headcount.
- Errors avoided. Mistakes the manual workflow used to produce (missed SLAs, mis-routed cases, billing errors) that the AI now catches or prevents. Easy to undercount; very visible when it goes wrong.
- Capacity unlocked. Work the team couldn't do before because they were full doing the manual version. Backlog reduction, faster response times, new client segments served.
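As a rough illustration of how the three columns might sit next to the vendor cost they trade against, here is a minimal Python sketch. The class name, the field names, the loaded hourly rate, the cost per error, and every figure in it are hypothetical placeholders, not numbers from an engagement. Capacity unlocked is kept as a descriptive list because it rarely reduces to a single dollar figure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowCase:
    """Hypothetical container: the three columns plus the vendor cost they trade against."""
    hours_reclaimed_per_month: float
    errors_avoided_per_month: float
    capacity_unlocked: List[str] = field(default_factory=list)
    loaded_hourly_rate: float = 0.0     # assumed fully loaded cost of one operator hour
    cost_per_error: float = 0.0         # assumed average cost of one avoided mistake
    vendor_cost_per_month: float = 0.0  # assumed added vendor spend

    def monthly_net(self) -> float:
        """Value of reclaimed hours and avoided errors, minus added vendor spend."""
        hours_value = self.hours_reclaimed_per_month * self.loaded_hourly_rate
        errors_value = self.errors_avoided_per_month * self.cost_per_error
        return hours_value + errors_value - self.vendor_cost_per_month


# Illustrative figures only.
case = WorkflowCase(
    hours_reclaimed_per_month=120,
    errors_avoided_per_month=8,
    capacity_unlocked=["backlog cleared", "faster first response"],
    loaded_hourly_rate=50.0,
    cost_per_error=200.0,
    vendor_cost_per_month=3000.0,
)
print(case.monthly_net())
```

A negative result is not automatically a failed case. It is the trade being negotiated, with the capacity list as the other side of it.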
What we don't lead with
Token spend, model accuracy, latency, vendor logos. Those are all real and important, but they live in the technical addendum, not the executive summary. If the executive summary leads with them, you've already lost the argument.
The framing exists because most AI rollouts that get killed in year two don't get killed for technical reasons. They get killed because nobody could explain to the new CFO why the line item exists. The work to translate from "tokens per request" to "hours per case" is the work that makes the line item survive a budget review.
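The translation itself is mostly arithmetic. A minimal sketch, assuming a flat per-token price and a measured handling time per case before and after the rollout, could look like the following. The function names, parameters, and sample values are all hypothetical; real vendor pricing and time studies will be messier.

```python
def vendor_cost_per_case(tokens_per_request: float,
                         requests_per_case: float,
                         price_per_1k_tokens: float) -> float:
    """Turn a technical metric (tokens per request) into vendor spend per case.

    Assumes a single flat price per 1,000 tokens, which is a simplification.
    """
    return tokens_per_request * requests_per_case * price_per_1k_tokens / 1000


def hours_reclaimed_per_case(manual_minutes: float, assisted_minutes: float) -> float:
    """Operator hours saved on one case, from assumed before/after handling times."""
    return (manual_minutes - assisted_minutes) / 60


# Illustrative figures only.
cost = vendor_cost_per_case(tokens_per_request=2000,
                            requests_per_case=4,
                            price_per_1k_tokens=0.01)
hours = hours_reclaimed_per_case(manual_minutes=30, assisted_minutes=12)
print(f"vendor cost per case: {cost:.2f}, hours reclaimed per case: {hours:.2f}")
```

The two outputs are the ones that belong in the executive summary. The token count that fed them stays in the addendum.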
The CFO doesn't need to understand the model. They need to understand what the model did to a number they already track.
If you can't draw a line from the AI workflow to a number on the P&L, the audit isn't done. We'd rather find that out in week two than in budget season.