What does it cost to run privilege detection with large language models? On a controlled benchmark, the model itself costs between $0.001 and $0.08 per document depending on which one you use. Our recommended operating point is DeepSeek V4 Pro, chosen for its balanced precision and recall rather than for being the cheapest option on the table, and it lands at $0.0027 per document. Layer in the deterministic screen of Decover's attorney-client privilege (ACP) pipeline, which clears roughly 58% of a typical collection before any model is called at all, and the effective blended cost across a full collection drops well below that already-low per-document figure. Compare that with Relativity, where aiR-based AI privilege review is priced at roughly 30¢ per document reviewed: on the order of 100 times more per document than Decover's benchmarked operating point.
That gap is not a pricing gimmick — it is downstream of two decisions: which model does the review, and how much of the collection ever needs a model at all. Ask most eDiscovery teams how their platform detects attorney-client privilege and you'll get a list of doctrines: a Kovel check for consultants, an Upjohn check for corporate employees, a common-interest check for co-defendants, an in-house check for general counsel. Each one sounds like its own problem, because the case law treats them as separate doctrines with separate elements and separate citations. Priced and built that way, privilege detection is expensive — every doctrine means another full-text model pass.
They are not separate problems for a classification system, and they do not need to be separate model calls. Every one of those doctrines is answering some combination of the same four questions: is a privileged party in the circle, was the communication for legal advice, was it kept confidential, and was privilege later waived. Build a classifier per doctrine and you pay to compute the same underlying facts five or six times over. Build the pipeline around the four shared questions instead, and most of the expense disappears along with the redundancy: mechanical questions get answered deterministically for free, and the model is reserved for the one genuine judgment call left over.
This post is a plain-language walkthrough of how Decover's ACP (attorney-client privilege) tag is actually architected: what gets decided deterministically, what gets left to a model, in what order, and why the pipeline defaults to withholding a document until it has a specific reason not to. It also covers what our own model benchmarks say about where that $0.0027-a-document figure comes from, and why spending more doesn't buy more accuracy.
1. The Problem With One Classifier Per Doctrine
The instinct to build a doctrine-specific detector is understandable. The case law is organized by doctrine, so it feels natural to organize the software the same way: a Kovel model, an Upjohn model, a common-interest model, each trained or prompted on its own elements.
The problem shows up the first time two doctrines overlap on the same document — an outside forensic accountant (Kovel) retained by in-house counsel (in-house primary-purpose) communicating with employees (Upjohn) about a matter shared with a co-defendant under a joint defense agreement (common-interest). Run four independent classifiers over that email and you get four independent, possibly inconsistent, judgments about facts that are actually identical across all four: who is inside the privilege circle, and whether the purpose was legal advice. The inconsistency is not a modeling failure — it is what you get when the same underlying fact is computed four separate times by four separate systems with no shared source of truth.
A privilege rulebook looks like dozens of independent doctrines — Kovel, Upjohn, common-interest, in-house primary-purpose, the waiver theories, crime-fraud. Underneath, they decompose into a handful of shared primitives. Compute each primitive once, at the stage where it is cheapest and most reliable, and every doctrine becomes a combination of those primitives rather than a system of its own.
Kovel is the clearest example. It is not one test. Its “the expert is inside the circle” element lives in the roster. Its “the purpose was legal advice” element lives in the same purpose test every doctrine uses. Its confidentiality element lives in the same confidentiality screen every doctrine uses. Kovel is not a fifth classifier — it is one more membership rule plus the same three tests everything else already runs through.
2. Four Questions, Not Six Doctrines
Every ACP doctrine we looked at reduces to some combination of the same four questions, plus one exception check. The four questions determine whether privilege attaches and survives; the exception strips it back off:
| Primitive | The question | Doctrines that are really just this question |
|---|---|---|
| Membership | Is a privileged party in the circle? | Core ACP · Kovel · agent/expert · Upjohn · common-interest / pooled defense · in-house counsel |
| Legal purpose | Was it for legal advice, not business? | Core ACP · in-house primary-purpose · dual-purpose communications |
| Confidentiality | Was it kept confidential? | Core ACP (a necessary element of every doctrine above) |
| Waiver | Was privilege lost after attaching? | Subject-matter · inadvertent (FRE 502) · at-issue · advice-of-counsel waiver |
| Exception | Does an exception strip it anyway? | Crime-fraud |
The membership row is the one worth sitting with. Kovel, agent/expert, Upjohn, common-interest, and in-house counsel are five different fact patterns for deciding one thing: who counts as being inside the privilege circle. A system that builds five separate answers to that question will disagree with itself. A system that builds one membership computation — the roster — and feeds it five classification rules will not.
The same logic applies to legal purpose. “Made for the purpose of legal advice” is a requirement of core ACP, and it is the entire fight in in-house and dual-purpose cases, where business advice from a lawyer is not privileged. That test is common to every doctrine — it gets evaluated once, not re-implemented inside each one.
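One way to picture "one roster, many classification rules" is the minimal Python sketch below. It is an illustration under stated assumptions, not Decover's implementation: the `Participant` type, the role strings, and the `MEMBERSHIP_RULES` list are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Participant:
    email: str
    role: str  # hypothetical role tag, e.g. "retained_expert" or "third_party"


# Each rule answers one question: is this participant in-circle, and under
# which doctrine label? Returning None means "this rule does not apply".
Rule = Callable[[Participant], Optional[str]]

# Hypothetical rules: one per membership fact pattern, all feeding one roster.
MEMBERSHIP_RULES: list[Rule] = [
    lambda p: "core_acp" if p.role == "outside_counsel" else None,
    lambda p: "in_house" if p.role == "in_house_counsel" else None,
    lambda p: "kovel" if p.role == "retained_expert" else None,
    lambda p: "agent" if p.role == "attorney_agent" else None,
    lambda p: "upjohn" if p.role == "employee" else None,
    lambda p: "common_interest" if p.role == "jda_party" else None,
]


def build_roster(participants: list[Participant]) -> dict[str, str]:
    """Compute membership once: map each in-circle email to the rule that admitted it."""
    roster: dict[str, str] = {}
    for p in participants:
        for rule in MEMBERSHIP_RULES:
            label = rule(p)
            if label is not None:
                roster[p.email.lower()] = label
                break  # first matching rule wins
    return roster


people = [
    Participant("gc@acme.example", "in_house_counsel"),
    Participant("cpa@forensics.example", "retained_expert"),
    Participant("vendor@other.example", "third_party"),
]
roster = build_roster(people)
```

In this sketch, supporting a new in-circle fact pattern means appending one rule to the list. Every downstream test reads the same `roster`, so two doctrines cannot disagree about who is inside the circle.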
3. Deterministic Where the Law Is Mechanical, AI Only for Judgment
Once the doctrines are broken into primitives, a second design question follows: which primitives should a model decide, and which should a rule decide?
Membership and confidentiality are lookups against a known circle — is this sender's domain on the case's attorney list, is there an outside recipient on the header. That is mechanical. It should be deterministic: fast, free, reproducible, and safe to run at full recall on every document. Legal purpose and at-issue waiver require reading the substance of a communication and forming a judgment. That is where a model belongs.
This is also where a cheap, high-recall screen should run before an expensive one. Proving a document is privileged is genuinely hard. Proving it cannot be — because no attorney appears anywhere on the thread — is easy and mechanical. So the pipeline rejects only on the absence of a necessary condition, and escalates everything uncertain. On our internal validation set, that single deterministic screen clears roughly 58% of documents for free, before any model call is made.
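The "reject only on the absence of a necessary condition" rule can be sketched in a few lines of Python. This is a simplified illustration, not the production screen: the `Email` type, the `stage1_screen` name, and the domain-only attorney check are assumptions made for the example, and a real roster would cover more than attorney domains.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    sender: str
    recipients: list[str] = field(default_factory=list)


def stage1_screen(thread: list[Email], attorney_domains: set[str]) -> str:
    """Return "clear" only when no attorney can be on the thread; else "escalate"."""
    if not thread:
        return "escalate"  # nothing to inspect is uncertainty, not proof of absence
    for msg in thread:
        for addr in [msg.sender, *msg.recipients]:
            if "@" not in addr:
                return "escalate"  # unparseable address: uncertain, so escalate
            if addr.rsplit("@", 1)[1].lower() in attorney_domains:
                return "escalate"  # an attorney is present: privilege is possible
    return "clear"  # every address parsed and none belongs to an attorney domain


domains = {"lawfirm.example"}
a = stage1_screen([Email("ops@acme.example", ["hr@acme.example"])], domains)
b = stage1_screen([Email("ops@acme.example", ["partner@lawfirm.example"])], domains)
```

Here `a` is cleared and `b` is escalated. The only path to "clear" is a fully parsed thread with no attorney on it; anything ambiguous falls through to the later stages.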
Notice what this means for model selection, a subject we have benchmarked in depth (see our privilege review model-selection guide): the model only ever touches Stage 2's compact document summary and Stage 4's full-text judgment call. It is never asked to re-derive membership or confidentiality — those facts are already settled by the time a model sees the document. That has direct consequences for which model you should pay for, which we come back to in Section 6.
4. Walking the Four Stages
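Here is what actually happens to a document, stage by stage. Stage 1 is the deterministic roster and confidentiality screen. Stage 2 is a cheap model pass over a compact document summary that clears obvious administrative noise. Stage 3 is the deterministic waiver partition. Stage 4 is the full-text protective gate, the one genuine judgment call.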
Because every in-circle doctrine shares the same roster, purpose test, and confidentiality screen, adding a new one is not a new stage: it is one more classification rule fed into Stage 1's roster. The table below shows how six doctrines that read as unrelated case law all resolve into the same four stages:
| Doctrine | Membership | Purpose | Confidentiality / Waiver | Exception |
|---|---|---|---|---|
| Core ACP | Stage 1 roster | Stage 2 + 4 | Stage 1 + 3 | Stage 4 |
| Kovel (expert) | Stage 1 (expert in-circle) | Stage 2 + 4 | Stage 1 + 3 | Stage 4 |
| Agent (non-attorney) | Stage 1 (agent in-circle) | Stage 2 + 4 | Stage 1 + 3 | Stage 4 |
| Upjohn | Stage 1 (employee = client) | Stage 2 + 4 | Stage 1 + 3 | Stage 4 |
| Common-interest / pooled | Stage 1 (joint-defense party in-circle) | Stage 2 + 4 | Stage 1 + 3 | Stage 4 |
| In-house / primary-purpose | Stage 1 (in-house counsel) | Stage 4 (primary-purpose) | Stage 1 + 3 | Stage 4 |
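The stage numbers in the table imply an ordering: deterministic checks first, the cheap summary pass next, and the full-text judgment last. The Python skeleton below sketches that ordering, assuming each stage is supplied as a callable. The function name, the `doc` dictionary keys, the `"not_privileged"` disposition, and the stub stages are all hypothetical; the stubs stand in for real roster lookups and model calls.

```python
from typing import Callable


def run_acp_pipeline(
    doc: dict,
    in_circle_party_present: Callable[[dict], bool],  # Stage 1, deterministic
    summary_is_admin_noise: Callable[[str], bool],    # Stage 2, cheap model call
    partition_waiver: Callable[[dict], dict],         # Stage 3, deterministic
    protective_gate: Callable[[dict], str],           # Stage 4, full-text model call
) -> tuple[str, int]:
    """Return (disposition, number of the stage that decided it)."""
    if not in_circle_party_present(doc):
        return "not_privileged", 1  # a necessary condition is absent
    if summary_is_admin_noise(doc["summary"]):
        return "not_privileged", 2  # obvious administrative traffic
    return protective_gate(partition_waiver(doc)), 4


model_calls: list[str] = []  # records which model-backed stages actually ran


def fake_stage2(summary: str) -> bool:
    model_calls.append("stage2")
    return "calendar invite" in summary.lower()


def fake_stage4(doc: dict) -> str:
    model_calls.append("stage4")
    return "withhold"


stubs = dict(
    in_circle_party_present=lambda d: bool(d["in_circle"]),
    summary_is_admin_noise=fake_stage2,
    partition_waiver=lambda d: d,
    protective_gate=fake_stage4,
)

no_lawyer = run_acp_pipeline({"in_circle": [], "summary": "Lunch order"}, **stubs)
advice = run_acp_pipeline(
    {"in_circle": ["gc@acme.example"], "summary": "Advice on contract risk"}, **stubs
)
```

With these stubs, the first document is decided at Stage 1 and triggers no model call at all, while the second reaches the Stage 4 gate. Both outcomes follow from the ordering, not from any per-doctrine branch.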
5. Why the Gate Defaults to Withhold
ACP has a different error asymmetry than responsiveness review, and it drives the single most important design decision in Stage 4.
False negative (missed privilege): a privileged document gets produced. This risks an inadvertent waiver — potentially of the entire subject matter, not just the one document, depending on how FRE 502 and the governing protective order treat the disclosure. A clawback can limit the damage, but the exposure exists the moment the document leaves your hands.
False positive (improper withholding): a non-privileged document gets withheld and logged. This is a correctable problem — it shows up as a challengeable log entry, a meet-and-confer, potentially an in-camera review. It is not free, but it does not create waiver.
The practical conclusion: recall on the “is anything here potentially privileged” question needs to be as close to 1.0 as the pipeline can get, because a missed privileged document is close to irreversible. Precision on what specifically gets released back into the production set is where the engineering effort belongs.
That is why Stage 4 does not start neutral. It starts at withhold, and releases a passage only when it can point to a specific, confident reason the remaining content is plainly innocuous. Precision in this design does not come from being aggressive about rejecting borderline documents — it comes from being disciplined about what gets released. Loosening that stance to catch a few more borderline documents is exactly the kind of change that looks like an accuracy improvement in testing and turns into a waiver incident in production.
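A withhold-by-default gate can be expressed as a decision that starts at "withhold" and changes only on positive evidence. The Python sketch below assumes the model returns a release reason with a confidence score; the `ReleaseFinding` type, the `gate` function, and the 0.95 threshold are hypothetical choices for illustration, not values taken from the pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReleaseFinding:
    reason: Optional[str]  # specific reason the passage is innocuous, if any
    confidence: float      # confidence in that reason, from 0.0 to 1.0


def gate(finding: Optional[ReleaseFinding], min_confidence: float = 0.95) -> str:
    """Start at withhold; release only on a specific, confident reason."""
    disposition = "withhold"  # the default is never neutral
    if (
        finding is not None
        and finding.reason  # a missing or empty reason cannot justify release
        and finding.confidence >= min_confidence
    ):
        disposition = "release"
    return disposition
```

A missing finding, an empty reason, or a low-confidence reason all leave the passage withheld. Lowering `min_confidence` is the loosening the paragraph above warns about: it raises the release rate without adding any new justification.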
6. What Our Model Benchmarks Say About Where AI Judgment Belongs
Because Stages 1 and 3 are deterministic and Stage 2 only clears the obvious administrative noise, the only place a model's capability tier actually matters in this pipeline is Stage 4's protective-gate judgment on the documents that survive to it. That makes our own model benchmarking directly relevant to how we'd recommend configuring an ACP review.
Decover's Working Paper 2026-02 compared nine large language models — from Alibaba, DeepSeek, MiniMax, Moonshot AI, and Anthropic — on a fixed 100-document gold-labeled classification task, holding the pipeline constant and varying only the model. This was a responsiveness benchmark, not an ACP-specific one, and privilege is a harder, more judgment-intensive task than responsiveness — model rankings on ACP specifically may differ, and we are building a privilege-labeled benchmark to test that directly. But the structural finding is the one that matters here, and there is no reason to expect it is unique to responsiveness: cost does not predict accuracy, and a well-designed pipeline reaches a performance plateau that model spend does not break through.
Figure 1. Cost vs. F1 across nine models on a 100-document gold-labeled sample. Eight of nine models cluster in an 11-point F1 band (0.76–0.87) across a 30× cost range — the most expensive model (Claude Opus, far right) scores below models costing 30× less. In the ACP pipeline, this is the band that determines the quality of Stage 4's protective-gate call, since Stages 1 and 3 remove the model from the decision entirely. Source: Decover Research Working Paper 2026-02.
This lines up with an internal validation result specific to the ACP gate itself: when we tested collapsing Stage 4 from separate per-doctrine calls into a single holistic structured verdict, precision and recall did not move. What changed was that the single call could also emit the doctrine label, waiver basis, and rules triggered — the fields a privilege log actually needs — at no accuracy cost. That is the same pattern the nine-model benchmark shows at the model layer: once the architecture around a judgment call is right, spending more (on a bigger model, or on more separate model calls) buys auditability and structure, not additional accuracy.
7. The Cost Math Changes When Most Documents Never Reach a Model
The per-document costs in our benchmark — and in most vendor pricing — are quoted as if every document in a collection gets a full model pass. In the ACP pipeline, that is not what happens. Stage 1 alone removes roughly 58% of documents deterministically, for zero model cost, before Stage 2's cheap summary-only pass thins the rest further. Only what survives all three earlier stages ever reaches Stage 4's full-text call — the one stage where model choice tracks the cost figures below.
Chart note: costs are extrapolated linearly from per-document metered costs at June 2026 prices, for a flat full-collection pass. F1 is shown for context (higher is better): Qwen leads at 0.87, and Claude Opus scores 0.76.
Apply the ACP funnel to that chart and the real number drops sharply. Even on the conservative assumption that Stage 2 clears nothing further — that the full 42% surviving Stage 1 reaches Stage 4 — the most expensive model in the study falls from $8,130 to roughly $3,415 per 100,000 documents, and the best-value model falls from $270 to roughly $113. Stage 2 does clear additional obvious non-privileged administrative traffic in practice, so the real blended cost of an ACP tag on a typical collection is lower than either figure. The upshot for budgeting: model spend on ACP is not the line item it looks like when priced per-document across a whole collection — because most of the collection is never priced at all.
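The funnel arithmetic above is easy to reproduce. The short Python sketch below takes the flat full-collection cost and the clear rates as inputs; the function name and parameters are hypothetical, and as a simplification it prices only the Stage 4 calls, ignoring the (smaller) cost of Stage 2's summary pass.

```python
def funnel_cost(
    full_pass_cost: float,
    stage1_clear: float = 0.58,  # share of documents cleared deterministically
    stage2_clear: float = 0.0,   # share of Stage 1 survivors cleared by Stage 2
) -> float:
    """Model cost when only documents surviving the screens get a full-text call."""
    if not (0.0 <= stage1_clear <= 1.0 and 0.0 <= stage2_clear <= 1.0):
        raise ValueError("clear rates must be between 0 and 1")
    surviving_share = (1.0 - stage1_clear) * (1.0 - stage2_clear)
    return full_pass_cost * surviving_share


# Conservative case from the text: Stage 2 clears nothing further.
most_expensive = round(funnel_cost(8130))  # 3415 dollars per 100,000 documents
best_value = round(funnel_cost(270))       # 113 dollars per 100,000 documents
```

Passing a nonzero `stage2_clear` shows how quickly the figure falls once the summary pass removes administrative traffic; the actual Stage 2 clear rate is not stated here, so any value used for it is an assumption.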
8. What This Buys You on a Privilege Log
The reason to architect the gate this way, rather than as a single yes/no privilege flag, is that a privilege log needs more than a flag. Every Stage 4 verdict is a structured record:
| Verdict field | What it's for |
|---|---|
| label | The disposition — withhold, produce, redact, or refer to senior attorney review |
| doctrine | The named basis for the log entry — core ACP, Kovel, Upjohn, common-interest, in-house, or none |
| waived / waiver_basis | Whether privilege attached but was later lost, and on what theory — anticipates the challenge opposing counsel will make |
| crime_fraud | Whether the exception was considered and what it found |
| rules_triggered | The specific rule references applied — the audit trail behind the call |
| basis | Plain-language reasoning, written at privilege-log quality — not a debugging note |
A black-box “privileged: yes” output cannot survive a meet-and-confer. A record that names the doctrine, states whether waiver was considered, and cites the rules applied is the difference between a privilege log you can defend entry-by-entry and one that invites a wholesale challenge.
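As a rough illustration, the verdict fields in the table map naturally onto a small validated record. The Python sketch below uses the field names from the table; the types, the allowed-value sets, the `"not_considered"` default, and the validation rules are assumptions added for the example rather than the pipeline's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed vocabularies, derived from the table's descriptions.
LABELS = {"withhold", "produce", "redact", "refer"}
DOCTRINES = {"core_acp", "kovel", "upjohn", "common_interest", "in_house", "none"}


@dataclass
class Verdict:
    label: str
    doctrine: str
    basis: str  # plain-language reasoning, written for the log
    waived: bool = False
    waiver_basis: Optional[str] = None
    crime_fraud: str = "not_considered"
    rules_triggered: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")
        if self.doctrine not in DOCTRINES:
            raise ValueError(f"unknown doctrine: {self.doctrine!r}")
        if self.waived and not self.waiver_basis:
            raise ValueError("a waived verdict must state its waiver basis")
        if not self.basis.strip():
            raise ValueError("every verdict needs a plain-language basis")


v = Verdict(
    label="withhold",
    doctrine="kovel",
    basis="Forensic accountant retained by counsel to help render legal advice.",
    rules_triggered=["kovel.expert_in_circle"],  # hypothetical rule identifier
)
```

Rejecting a verdict that lacks a basis, or that claims waiver without naming the theory, enforces at construction time the property the table describes: every entry carries enough to defend it.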
9. Scope: What ACP Is Not
This pipeline is deliberately scoped to attorney-client privilege only. It is not the same test as work product, and treating the two as interchangeable is itself a common source of over-broad withholding claims. Work product (Hickman v. Taylor; FRCP 26(b)(3)) protects a different interest — materials prepared in anticipation of litigation — under different elements and a different (and weaker) waiver standard. Jurisdiction and choice-of-law questions, ESI logging mechanics, and ethics rules are handled elsewhere in the pipeline and are out of scope for the ACP gate specifically.
The doctrine references behind each stage, for the record:
- Core ACP — Upjohn Co. v. United States, 449 U.S. 383 (1981); Fed. R. Evid. 501.
- Kovel — United States v. Kovel, 296 F.2d 918 (2d Cir. 1961).
- Non-attorney agent — 8 Wigmore §2301; cf. Kovel.
- Upjohn (employee communications) — Upjohn, 449 U.S. 383.
- Common-interest / pooled defense — In re Teleglobe, 493 F.3d 345 (3d Cir. 2007); United States v. Schwimmer, 892 F.2d 237 (2d Cir. 1989).
- In-house / primary-purpose — In re Kellogg Brown & Root, 756 F.3d 754 (D.C. Cir. 2014).
- Dual-purpose — In re Grand Jury, 23 F.4th 1088 (9th Cir. 2021).
- Waiver — Fed. R. Evid. 502; In re Pacific Pictures, 679 F.3d 1121 (9th Cir. 2012).
- Crime-fraud exception — United States v. Zolin, 491 U.S. 554 (1989).
10. A Checklist for Evaluating an AI Privilege Tool
If you're evaluating any AI-assisted ACP workflow — ours or anyone else's — these are the questions that separate an architecture built to hold up in a log negotiation from one built to demo well:
- Does it compute membership once, or re-derive it per doctrine? If a Kovel call and an Upjohn call on the same email can disagree about who's an attorney, membership isn't a shared fact — it's five independent guesses.
- Does it model waiver at the communication level, or the document level? A thread that was forwarded outside the circle halfway through should have its still-confidential half protected, not treated as one all-or-nothing document.
- Does it default to withhold, and can it explain every release? Ask for the reasoning behind a specific innocuous-content release, not just the aggregate precision/recall numbers.
- Does every verdict include a doctrine and a basis you could put directly in a log entry? A bare privileged/not-privileged flag is not a privilege log input.
- Will they run a validation pass on your own document population? Ask for it before committing to a model or a platform — see our model-selection framework for what that validation should measure.
11. Conclusion
The ACP tag is not made more defensible by having more classifiers behind it. It is made more defensible by having fewer, shared, correctly-ordered ones: one roster that every in-circle doctrine draws from, one purpose test, one confidentiality screen, one waiver partition, and one protective gate that starts at withhold and only releases what it can specifically justify. Doctrines stop being separate systems and become combinations of the same handful of tests — which is also why the system can absorb a new fact pattern, a new jurisdiction's rule, or a new doctrine entirely as one more classification rule rather than a rebuild.
Where a model sits in that pipeline matters more than which model it is. Our benchmark data shows a performance plateau that a bigger model does not break through, and a deterministic-first architecture means most of a collection never reaches a model call at all. Spend the engineering effort on getting the stages and the withhold-by-default gate right; spend the model budget on the judgment calls that are actually left over — and put the savings into attorney QC on the documents the pipeline escalates as uncertain.
To see the ACP pipeline validated on your own document population, book a session with our technical team.