Here's the arithmetic behind a number we hear constantly from prospects: Relativity prices aiR's privilege review at roughly 30¢ a document — we've written about that comparison before. Add a comparably naive, one-shot Responsiveness Review pass through a frontier model — in our own nine-model benchmark, that's claude-opus-4-8 at $0.0813/doc — and you're at $0.3813. Call it 40 cents. And that's before privilege log drafting, quality control, or project management touch the number at all.
We don't think 40¢ is a technology ceiling. It's what a document review pipeline costs when nobody has gone through the exercise of removing waste from it. This post is that exercise, run in order: six decisions, each one building on the last, that take the combined cost of clearing both gates — responsive or not, privileged or not — from 40¢ to under a nickel using nothing but off-the-shelf hosted models. Then we get to where we think the next unlock actually is: small, open-weight models, fine-tuned on labels your own review process is already producing for free.
1.Lever One — Model Selection Beats Model Tier
The single biggest lever requires no new architecture at all: stop assuming the most expensive model is the most accurate one. We ran the same nine models — spanning Qwen, DeepSeek, MiniMax, Kimi, and the Claude family — against gold-labeled sets for both tasks, holding the document set and review logic constant and varying only the model. The pattern repeats almost exactly on both tasks:
| Model | Privilege F1 / $ per doc | Responsiveness F1 / $ per doc |
|---|---|---|
| Qwen 3.6 Plus | 0.868 · $0.0205 |
0.87 · $0.0205 |
| DeepSeek V4 Pro | 0.815 · $0.0027 |
0.81 · $0.0027 |
| claude-haiku-4-5 | 0.824 · $0.0195 |
— |
| claude-opus-4-8 | 0.760 · $0.0813 |
0.76 · $0.0813 |
| claude-sonnet-4-6 | Precision 0.448 — disqualifying | F1 0.61 (recall 0.963, precision 0.448) |
Two things jump out. First, it's the same headline number twice: claude-opus-4-8 costs $0.0813 a document whether you're checking responsiveness or privilege, because underneath the label it's the same operation both times — read a document, decide against a definition — and it's the lowest-F1 performer in both studies despite being the most expensive model tested in both. Second, DeepSeek V4 Pro lands within a few points of the best F1 on both tasks at a price 30× lower than opus. Just this swap — no pre-filtering, no cascade, still one-shot, still full text — takes the combined cost of both checks from $0.1626/doc to $0.0054/doc.
claude-sonnet-4-6 looks cheap and posts strong recall, but its 44.8% precision means it flags roughly half of everything as positive. On responsiveness that's an expensive false-positive problem; on privilege it's a sanctions-exposure problem, since every improperly withheld document is a decision a judge can unwind. A benchmark has to be run per task, not borrowed from a leaderboard — a model's F1 on one classification question doesn't transfer to a different one, even when both are phrased as "is this document X."
2.Lever Two — Stop Paying to Classify What You Can Already Answer With a Lookup
Model selection cuts the price of a model call. The next lever cuts the number of model calls, by refusing to spend one on a question a deterministic check can already answer. Before any document reaches a model in our privilege pipeline, it passes through a pure rule-based screen: is a privileged party even present in the header roster, and has a confirmed non-privileged third party already broken confidentiality? No inference, no tokens — just set membership against a roster built from the matter's own custodian and counsel data.
That single deterministic gate clears 58% of a privilege-eligible collection before a model is ever called. The same principle governs responsiveness ingestion: documents with no extractable text, exact-duplicate files already scored under a different file ID, and known non-evidentiary system files are identified by hash lookup and never generate a model call at all. None of this is a modeling technique. It's the recognition that a meaningful share of any collection can be resolved by a lookup table, and every document resolved that way costs exactly $0 in model spend — which pulls the blended cost across the whole collection well below the per-call price quoted in any benchmark table.
3.Lever Three — Read the Metadata Before You Read the Document
Everything that survives the deterministic gate still doesn't need the same amount of model attention. Both pipelines run a cheap first pass over a compact, structured summary of the document — not the full text — and only escalate to a full-text pass with a stronger model when that first pass can't confidently resolve the question.
In practice, roughly 45% of documents need that deeper pass; the rest are confidently resolved from structured metadata and a compact summary alone, at a fraction of the full-text price. Confidence is calibrated, not self-reported — models cluster their own stated confidence near 1.0 regardless of whether they're actually right, so escalation is governed by a validated decision threshold and a hard floor below which a document is never auto-resolved either way. Ambiguity always escalates rather than resolves; cost discipline never gets to trade away recall. The privilege pipeline runs the identical shape one layer further: a coarse legal-purpose screen on a summary escalates anything with a legal signal, and full-text judgment is reserved for whatever survives every gate before it.
4.Lever Four — Cache the Parts of the Prompt That Don't Change
A matter's tagging or privilege definition — the legal standard every document gets judged against — is often tens of thousands of characters, and it's identical across every document in the scan. Compiling it into a compact brief once per scan, rather than re-sending and re-billing the full definition on every one of hundreds of thousands of calls, turns a fixed cost into a rounding error. Layer provider-side prompt caching on top of that stable prefix and the repeated tokens are billed at a fraction of the standard input rate on every call after the first.
Prompt caching only pays off if an identical prefix lands on the same backend twice. Some routing layers load-balance a single model name across multiple providers or regions that don't share a cache — so a stable, cacheable prefix keeps missing the cache, and every call quietly bills at full price. Pin the route to one deployment and the discount holds. It's a one-line fix with an outsized effect on a bill that's otherwise identical on paper.
5.Lever Five — Spend Attorney Hours on a Sample, Not the Whole Population
The cascade from Lever Three already tells you exactly where the risk in the collection lives: the roughly 45% of documents that needed the deeper pass are, by construction, the hardest and most ambiguous share. Point full attorney review there. For the confidently-resolved majority, a flat "read it all again" QC pass doesn't buy back the accuracy it costs — a calibrated statistical sample, sized off the model's own measured precision and recall on a gold set, tells you the true error rate on the whole population without re-reviewing the whole population.
This is also where the AI cost curve and the human cost curve actually meet. Our own pricing benchmark put manual first-pass responsiveness review at roughly $1.50/document. A 100% manual QC pass on top of an AI-assisted pipeline reintroduces most of the cost the first four levers just removed — sampling, calibrated by the pipeline's own confidence bands, is what keeps the nickel a nickel end to end, not just on the model-inference line item.
6.The Bridge, Summarized
| Stage | What changes | Combined cost, both checks |
|---|---|---|
| Baseline | One flagship model, one-shot, full text, per task | $0.3813/doc |
| + Model selection | Swap flagship for the accuracy-per-dollar leader | $0.0054/doc |
| + Deterministic pre-filter | 58% of eligible documents cost $0 in model spend | ≈$0.003/doc blended |
| + Cascade & caching | Cheap metadata pass first; full text only for the ∼45% that need it; stable prefixes cached | <$0.05/doc, target met |
Four architecture decisions, zero self-hosting, zero fine-tuning — and the combined AI cost of clearing both gates on a document lands comfortably under a nickel. It's worth noting this isn't a hypothetical target: it's close to the same number already baked into our own public cost estimator, which plans AI review — classification, relevance, and privilege screening together — at a flat 1¢ per document.
7.A Worksheet: Pricing This Against Relativity for Your Own Matter
The formula underneath both platforms is the same, whether or not either vendor itemizes it that way:
Total AI review cost = (docs × privilege $/doc) + (docs × responsiveness $/doc) + (flagged docs × log $/entry, if billed separately)
Here's what to plug in on each side. The Relativity/aiR column below is only as good as the quote in front of you — we've sourced the one number that's publicly benchmarked (aiR's privilege pricing) and flagged everywhere else you need your own reseller's number, rather than guessing on your behalf:
| Line item | Relativity + aiR (typical) | DecoverAI |
|---|---|---|
| Privilege review | aiR list price ≈ $0.30/doc — confirm your negotiated rate |
≈$0.003/doc average across our ACP benchmark set |
| Responsiveness review | Not publicly benchmarked the way aiR is — get this figure from your vendor or reseller quote | Included at the same flat rate; publicly modeled at 1¢/doc |
| Privilege log drafting | $10–$15/entry if billed as a distinct pass, on the flagged subset only | Included, no separate per-entry fee |
| Hosting | Varies by reseller; commonly $20–30/GB/month plus $200–800/user/month in seat fees | $60/GB/month flat, unlimited users |
| Billing model | Per-document, per-entry, and per-seat line items stack independently | One flat rate that scales with data volume, not headcount or document count |
Worked at 250,000 documents / 100 GB — the same matter size we used in our pricing benchmark — the aiR privilege line alone runs $75,000 before a single document is checked for responsiveness, before any log entries are drafted, and before hosting or seats are counted. The equivalent DecoverAI cost for both AI review passes and log generation together is folded into the same $36,000 all-in total for the matter. You don't have to take our benchmark's word for either side of that: run your own document count and your own negotiated per-document rate through the formula above, or drop your numbers into the cost estimator for an instant side-by-side.
8.Why We Think Small, Open Models Are the Next Lever
Everything above gets both checks under a nickel using hosted, proprietary APIs — no self-hosting, no training run, nothing beyond good architecture and honest benchmarking. We think the next real unlock isn't a better hosted model. It's a small, open-weight model, fine-tuned on your own matter's labels, that you can run yourself.
The case for it starts with what document review actually is: a narrow, repeatable classification task against a fixed definition, run over and over on structurally similar documents. That's a much better fit for supervised fine-tuning than for open-ended chat — and it means the training data isn't a new cost center. Your review process is already producing it for free. Every document the cascade escalates is a labeled example of a case the cheap pass got wrong or wasn't sure about; every attorney QC decision on a sampled document (Lever Five, above) is a gold-standard human label. Both are decisions your team was already making — turning them into a fine-tuning dataset costs you the storage, not a new labeling operation.
This is an active research track for us, not a shipped default: we're running supervised fine-tuning and DPO (direct preference optimization) experiments on open-weight 7–9B models — Qwen2.5-7B-Instruct, DeepSeek's 7B chat model, and GLM-4-9B — using LoRA adapters to distill a flagship model's privilege and responsiveness judgments into a fraction of the parameter count, and comparing the fine-tuned models' precision and recall against the base open-weight models and against the flagship teacher itself.
The four levers above already push the hosted-API cost of both checks below a penny in the best case. A fine-tuned small model you run yourself isn't primarily about beating that number further — it's about what stops moving. Inference becomes a fixed compute cost you control, not a variable line item tied to a frontier lab's next price change, deprecation notice, or rate limit. And privileged content never has to leave your environment to be scored by someone else's model in the first place — which matters as much to a general counsel weighing waiver risk as it does to a CFO weighing a bill.
That's the framing we keep coming back to: outside counsel sets a litigation budget before document one gets reviewed. A review cost that scales with a third party's API pricing is a budget nobody can actually hold to for the life of a matter. A review cost that's a fixed, owned, self-hosted compute expense — trained on labels your own team already generates as a byproduct of doing the work — is one they can. That's the bet we're placing our research effort on.
9.Conclusion
Forty cents a document is what you get by default: one frontier model, one shot, full text, no pre-filtering, no cascade, no caching. None of that is a law of physics — it's a starting point nobody has optimized yet. Six ordered decisions get you under a nickel: pick the model that actually earns its price, refuse to spend a model call on what a lookup can already answer, read the metadata before the document, cache the prompt content that never changes, sample your QC instead of duplicating it, and — where the volume and the stakes justify it — own the model outright by fine-tuning a small open-weight one on labels you were already generating. The first five are available today. The sixth is where we're headed next.
Cost Estimation Worksheet: the formula, a fill-in comparison table, and the worked 250,000-document example from this post, in one printable page — so you can run the DecoverAI-vs.-Relativity math on your own matter.
Want to run this math against your own matter — document count, current AI review spend, and what it would cost on a flat rate? Book a session with our technical team, or start with the cost estimator.