DecoverAI Blog - What Happens to Your eDiscovery Costs When the Data Set Doubles?

The Baseline: A 100 GB Matter

The starting point is the benchmark matter from the DecoverAI pricing white paper: 100 GB of collected data, approximately 250,000 documents after deduplication, and a 6-month active review window. These parameters reflect a mid-sized commercial dispute — large enough that a traditional eDiscovery vendor is necessary, small enough that cost consciousness should govern tool selection.

Under traditional per-GB/per-document pricing at mid-market US rates as of early 2026, the all-in cost for this baseline matter is approximately $460,000. The line items include processing and ingestion ($10,000), hosting for six months ($15,000), first-pass responsiveness review at $1.50 per document across 250,000 documents ($375,000), privilege review and log generation ($32,000), production ($8,000), and project management ($20,000). The dominant line item is first-pass review: at $375,000 it makes up about 82% of the traditional total and the bulk of the cost differential versus any alternative.

Under AI-augmented all-inclusive pricing (DecoverAI at $60/GB/month), the same matter costs $36,000: 100 GB × $60/GB/month × 6 months. Processing, hosting, AI-driven relevance classification, privilege log generation, production, and redaction are all included. The 12.8× cost differential between the two models at 100 GB is almost entirely explained by whether first-pass review is billed per document or absorbed into a GB-based rate. The question examined here is what happens to each model as the matter grows.
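The arithmetic behind both baseline figures can be reproduced in a few lines. The sketch below uses only the numbers quoted above; the function and variable names are hypothetical and are not part of any DecoverAI tool.

```python
# Baseline 100 GB matter, using the figures quoted in this post.
# All names are illustrative; this is not a vendor API.

TRADITIONAL_LINE_ITEMS = {
    "processing_and_ingestion": 10_000,
    "hosting_6_months": 15_000,
    "first_pass_review": 250_000 * 1.50,  # 250,000 documents at $1.50 each
    "privilege_review_and_log": 32_000,
    "production": 8_000,
    "project_management": 20_000,
}


def all_inclusive_cost(gb, rate_per_gb_month=60, months=6):
    """Per-GB all-inclusive cost: volume x monthly rate x months hosted."""
    return gb * rate_per_gb_month * months


traditional_total = sum(TRADITIONAL_LINE_ITEMS.values())
ai_total = all_inclusive_cost(100)
review_share = TRADITIONAL_LINE_ITEMS["first_pass_review"] / traditional_total

print(f"Traditional total:   ${traditional_total:,.0f}")
print(f"All-inclusive total: ${ai_total:,.0f}")
print(f"Ratio:               {traditional_total / ai_total:.1f}x")
print(f"Review share of traditional total: {review_share:.0%}")
```

Changing `gb` or `months` in `all_inclusive_cost` is the only adjustment the per-GB model needs, whereas the traditional model needs every line item re-estimated.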

What Happens at 200 GB

When the data set doubles from 100 GB to 200 GB, the document count roughly doubles to approximately 500,000 documents, assuming similar data density across the additional custodians. Under the traditional per-document model, each line item responds differently to this expansion.

Processing and hosting scale linearly. Processing surcharges double from $10,000 to $20,000. Hosting doubles from $15,000 to $30,000. These are genuinely proportional increases and represent the only components of the traditional model that behave the way a client might expect when the data set doubles.

First-pass review scales at least linearly, and in practice super-linearly. The additional 250,000 documents are billed at the same per-document rate ($1.50), which adds $375,000 and brings the nominal total to $750,000. That figure is a floor rather than an estimate. Expanding the review team to handle double the document volume in the same time window adds coordination overhead: additional project management hours, a second QC layer to maintain consistency across a larger team, and training time for new contract reviewers brought on to meet the deadline. The first-pass review cost at 200 GB is conservatively $750,000 and realistically higher.

Privilege review scales with responsive documents. If 6% of 500,000 documents are responsive, approximately 30,000 documents require privilege screening — double the baseline. At $6 per document for privilege review plus $10 per log entry, the privilege line item doubles from $32,000 to $64,000 or more, depending on how privilege-dense the additional custodians' materials are.

Under the AI-augmented model, the math is exact: 200 GB × $60/GB/month × 6 months = $72,000. No renegotiation. No additional line items. No per-document charges triggered by the expanded collection.

Line item	100 GB traditional	200 GB traditional	100 GB AI	200 GB AI
Processing & ingestion	$10,000	$20,000	Included	Included
Hosting (6 months)	$15,000	$30,000	$36,000	$72,000
First-pass review	$375,000	$750,000+	Included	Included
Privilege review + log	$32,000	$64,000+	Included	Included
Production	$8,000	$12,000	Included	Included
Project management	$20,000	$40,000+	Optional	Optional
Total	$460,000	$916,000+	$36,000	$72,000
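As a rough check on the table, the sketch below computes how each traditional line item scales between 100 GB and 200 GB. It uses the table's figures as written, treating every "+" entry as its lower bound; the dictionary and function names are illustrative only.

```python
# Floor figures from the comparison table ("+" entries are lower bounds).
traditional_100 = {
    "processing": 10_000,
    "hosting": 15_000,
    "first_pass_review": 375_000,
    "privilege": 32_000,
    "production": 8_000,
    "project_management": 20_000,
}
traditional_200 = {
    "processing": 20_000,
    "hosting": 30_000,
    "first_pass_review": 750_000,   # floor
    "privilege": 64_000,            # floor
    "production": 12_000,
    "project_management": 40_000,   # floor
}


def scaling_ratio(before, after):
    """How many times larger the 200 GB figure is than the 100 GB figure."""
    return after / before


total_100 = sum(traditional_100.values())
total_200 = sum(traditional_200.values())

for item, cost in traditional_100.items():
    ratio = scaling_ratio(cost, traditional_200[item])
    print(f"{item:<20} {ratio:.2f}x")

print(f"{'traditional total':<20} {scaling_ratio(total_100, total_200):.2f}x")
print(f"{'all-inclusive total':<20} {scaling_ratio(36_000, 72_000):.2f}x")
```

On the floor figures alone the traditional total lands just under 2x, because production grows by only 1.5x. The super-linear behavior described in this post would appear as overhead above the "+" floors, which is the part a vendor estimate cannot fix in advance.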

Why Review Costs Superscale

The per-document model scales super-linearly with data volume for two structurally distinct reasons, and understanding both matters for any matter budget that may be subject to a collection expansion.

First, contract reviewers are not infinitely scalable. A review team of 12 reviewers can process approximately 125,000–150,000 documents in a standard review window at a quality-controlled pace. Doubling the document volume does not simply require twice as many reviewers — it requires twice as many reviewers plus the project management infrastructure to coordinate them: additional team leads, a second QC layer, more training sessions to ensure consistency across the larger group, and more attorney supervision hours at the senior level. The cost per document reviewed does not stay flat as the team scales; it tends to increase because the supervisory overhead grows faster than the review throughput.

Second, the responsiveness rate of additional collections is typically lower than the initial collection. The initial custodians collected in a matter are typically the most central to the dispute — the decision-makers whose communications are most likely to be responsive. Additional custodians added later (those compelled by opposing counsel's motion or ordered by the court) tend to be more peripheral. This means the additional 250,000 documents in a 200 GB matter do not produce 250,000 × 6% = 15,000 additional responsive documents. The yield is lower. But the per-document review charge applies regardless of responsiveness: every document in the population must be reviewed before it can be classified as non-responsive and set aside. A 200 GB matter does not produce twice as many responsive documents as a 100 GB matter, but it does cost twice as much to review — and the discovery yield per dollar of review cost drops as the matter expands.
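The drop in discovery yield per review dollar can be illustrated with a small calculation. The 6% responsiveness rate and the $1.50 review rate come from this post; the 3% rate for the expansion custodians is a purely hypothetical value chosen to show the effect, not a benchmark.

```python
# Yield per review dollar: initial collection vs. a mid-matter expansion.
REVIEW_RATE = 1.50       # dollars per document (from the post)
INITIAL_RATE = 0.06      # responsiveness rate used in the post's example
EXPANSION_RATE = 0.03    # HYPOTHETICAL: peripheral custodians yield less


def responsive_per_1000_dollars(documents, responsiveness_rate):
    """Responsive documents found per $1,000 of first-pass review spend."""
    review_cost = documents * REVIEW_RATE
    responsive_docs = documents * responsiveness_rate
    return responsive_docs / review_cost * 1_000


initial_yield = responsive_per_1000_dollars(250_000, INITIAL_RATE)
expansion_yield = responsive_per_1000_dollars(250_000, EXPANSION_RATE)

print(f"Initial collection: {initial_yield:.0f} responsive docs per $1,000")
print(f"Expansion:          {expansion_yield:.0f} responsive docs per $1,000")
```

Because the review charge is per document, the yield depends only on the responsiveness rate: whatever the true expansion rate is, halving it halves the responsive documents obtained for the same spend.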

The AI-augmented model is structurally immune to both of these dynamics. First-pass responsiveness classification is performed computationally, not by contract reviewers. The marginal cost of classifying an additional document is not $1.50; it is a fraction of a cent of compute, already included in the per-GB rate. The project management cost of coordinating a larger review team does not exist because there is no review team to coordinate. The only scaling parameter in the AI-augmented model is data volume — which is exactly what the per-GB pricing formula captures.

Mid-Matter Collection Expansion

The most common trigger for a data set doubling is not initial scoping error — it is a mid-matter collection expansion. Opposing counsel moves to compel additional custodians. The court orders production from a data source that was initially excluded from the scope. New claims added to the pleadings bring additional document categories into scope that were not contemplated in the original preservation letter. These are operationally normal events in commercial litigation; they are not anomalies. Any matter budget that does not model the cost of collection expansion has not modeled the matter.

At the point of expansion, a party using traditional per-document pricing receives a revised vendor estimate that scales with the size of the expansion. That revised estimate arrives while the review is already underway — after the vendor relationship is established, after the review team is staffed, after the budget has been submitted to the client. Renegotiating the vendor rate at the point of expansion is possible in theory and almost never successful in practice; vendors have no incentive to discount a line item that is already running.

Under an all-inclusive per-GB pricing model, the expansion is absorbed into the existing rate structure without renegotiation. If opposing counsel's successful motion to compel adds 50 GB to the collection, the incremental cost is 50 GB × $60/GB/month × the number of months remaining in the review window. That number is calculable the moment the court's order issues. It does not require a vendor estimate, a revised SOW, or a new approval cycle. The predictability difference is operationally significant for in-house legal teams managing matter budgets, particularly in matters where the initial budget was approved by a finance committee that did not contemplate a collection expansion as a line-item risk.
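The incremental cost of an expansion under per-GB pricing is a one-line calculation. The sketch below assumes the $60/GB/month rate quoted in this post; the three remaining months in the example are a hypothetical value, and the function name is illustrative.

```python
def expansion_cost(extra_gb, remaining_months, rate_per_gb_month=60):
    """Incremental all-inclusive cost of data added mid-matter."""
    return extra_gb * rate_per_gb_month * remaining_months


# 50 GB added by a motion to compel, with a HYPOTHETICAL 3 months remaining.
print(f"${expansion_cost(50, 3):,}")

# The same expansion at each possible point in a 6-month review window.
for months_left in range(6, 0, -1):
    print(f"{months_left} months left: ${expansion_cost(50, months_left):,}")
```

The only inputs are the size of the added collection and the time left in the window, both of which are known on the day the order issues.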

This is also why matter budget variance is systematically higher under per-document pricing than under GB-based all-inclusive pricing. The per-document model has a latent expansion multiplier built into every collection: any additional custodian or data source triggers a per-document charge on every document in that source, regardless of whether those documents turn out to be relevant. The GB-based model has no such multiplier; the rate is the rate, and the only variables are how many GB were collected and for how many months they are hosted.

The Question to Ask Before You Sign

Every eDiscovery vendor evaluation should include a marginal cost analysis as a required step. The question to ask is specific: “What is your all-in price if our data set doubles from the initial estimate?” The answer reveals the structure of the pricing model more clearly than any rate card, because it forces the vendor to expose the scaling behavior of the line items they would prefer not to discuss.

A vendor whose answer to the doubled-data-set question is “more than double” is using per-document review and project management as the growth engines of their revenue model. The rate card may look competitive at the initial data estimate; it is not competitive at 2× the initial estimate, and it is not predictable at any intermediate point. A vendor whose answer is “exactly proportional to volume growth” has a cost structure that aligns with the client's interest: the client pays for what they store and process, not for the labor cost of reviewing what turns out to be non-responsive.
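One way to apply this test to written quotes is to compute the ratio between a vendor's all-in price at the initial estimate and at double the volume. The sketch below is a minimal illustration: the function names and the 1% tolerance are assumptions, and the second pair of quotes is hypothetical.

```python
def marginal_cost_ratio(quote_at_estimate, quote_at_double):
    """How much the all-in quote grows when the data volume doubles."""
    return quote_at_double / quote_at_estimate


def describe_scaling(ratio, tolerance=0.01):
    """Label a ratio relative to exact doubling (tolerance is an assumption)."""
    if ratio > 2 + tolerance:
        return "more than double"
    if ratio < 2 - tolerance:
        return "less than double"
    return "proportional"


# All-inclusive figures from this post: $36,000 at 100 GB, $72,000 at 200 GB.
print(describe_scaling(marginal_cost_ratio(36_000, 72_000)))

# HYPOTHETICAL vendor quotes, for illustration only.
print(describe_scaling(marginal_cost_ratio(460_000, 1_000_000)))
```

Asking for both numbers in writing before signing makes the ratio something that can be checked against later invoices.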

The full list of vendor evaluation questions — including how to elicit an all-in number that covers collection expansion scenarios — is covered in detail at /blog/ediscovery-vendor-all-in-price. The marginal cost question is the most diagnostic single question on that list, because it is the one that most clearly separates per-document pricing from volume-based all-inclusive pricing.

For matters where collection expansion is a realistic risk (which is to say, most contested commercial matters), the marginal cost analysis should be performed before the vendor is selected, not after the first motion to compel is granted. Once the motion is granted, the leverage is gone.

The right marginal cost question: “If our data set doubles, does your cost double?” For per-document pricing models, the honest answer is “more than double.” For all-inclusive GB-based pricing, the honest answer is “exactly double.” The difference compounds over the life of the matter.

