What is the unit of review for a chat message? That question sounds trivial until you try to answer it for a Discord server with 40,000 messages across 30 channels, or a Slack workspace where a single thread runs for eight months and gets edited, reacted to, and quoted out of order the entire time. Email-era eDiscovery tooling has a clean answer — the document is the email, the attachment is the attachment, and the boundary between them is obvious. Chat data has no such boundary built in. Someone has to decide where one reviewable unit ends and the next begins, and that decision cascades into everything downstream: how tagging works, how redaction works, how privilege gets logged, and how a production actually gets built.
We already ingest Slack and Teams conversational data in production. This post documents something narrower and newer: the decision record behind extending that same message-level architecture to Discord, which is showing up with increasing frequency in litigation involving online communities, crypto and Web3 organizations, gaming companies, and remote-first teams that never adopted Slack. Discord's export formats, threading model, and social conventions (servers, roles, stickers, voice channels) are different enough from Slack's that most of these decisions had to be made from scratch — and a good number of them apply to any chat platform, Slack included.
1. Why a Chat Message Breaks the Document Model
An email has a natural document boundary: the message itself, plus whatever is attached to it. A chat channel does not work that way. It is continuous — messages accumulate for as long as the channel exists, with no natural stopping point. It is branching — a reply three days later can attach to a message from the top of the channel, not the one directly above it. And it is mutable — messages get edited after the fact, deleted, reacted to, and pinned, all of which change what the record shows depending on when you look at it.
Treat an entire channel as one document and a reviewer has to wade through months of unrelated conversation to find the three messages that matter, with no way to tag just the relevant part. Treat every message as a fully independent document with no thread context and a reviewer sees "I agree, let's proceed" with no idea what was being agreed to. Both failure modes are common in tools that were retrofitted from email review rather than built around chat's actual structure. The fix is to separate two things that document-era tools collapse into one: the unit of review is the message, but the context a reviewer sees is the thread.
2. Collection: Preserving Custody Before Anything Else
Collection has to happen before any of the review-side decisions matter, and it has to happen in a way that survives a challenge. For the Discord proof of concept, collection starts from platform exports, with native API-based collection supported as a second path for matters where a broader or more targeted pull is needed. Whichever path is used, the same custody discipline applies on the way in.
The final step on the way in is what makes a multi-platform review workflow possible at all: Discord's raw fields (server ID, channel ID, per-server nickname, role) are mapped into the same normalized schema that Slack and Teams data already flows through, rather than left in a platform-specific silo. A reviewer working a matter that includes both Slack and Discord custodians should not have to switch between two different data models to do it.
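A minimal sketch of that mapping, assuming a simplified Discord export record. The input field names (`guild_id`, `channel_id`, `author`, `nickname`, `roles`, `reply_to`) and the `NormalizedMessage` shape are hypothetical illustrations, not Discord's actual export format or the production schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NormalizedMessage:
    """Hypothetical platform-neutral record shared by Slack, Teams, and Discord data."""

    platform: str
    workspace_id: str        # Slack workspace, Teams tenant, or Discord server
    channel_id: str
    message_id: str
    author_id: str           # stable account ID, never a display name
    author_display: str      # name as shown in this particular server
    sent_at: str             # timestamp as exported
    text: str
    parent_id: Optional[str] = None
    roles: Tuple[str, ...] = ()


def from_discord(raw: dict) -> NormalizedMessage:
    """Map one simplified (hypothetical) Discord export record onto the shared schema."""
    author = raw["author"]
    return NormalizedMessage(
        platform="discord",
        workspace_id=raw["guild_id"],  # Discord's API calls servers "guilds"
        channel_id=raw["channel_id"],
        message_id=raw["id"],
        author_id=author["id"],
        # The per-server nickname wins over the account username when present.
        author_display=author.get("nickname") or author["username"],
        sent_at=raw["timestamp"],
        text=raw.get("content", ""),
        parent_id=raw.get("reply_to"),
        roles=tuple(author.get("roles", [])),
    )
```

Keeping `author_id` separate from `author_display` is the design choice that matters here: display names change from server to server, account IDs do not.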
3. What a Reviewer Actually Sees
Two decisions determine whether chat review feels usable or feels like reading a database dump: what metadata surfaces, and what the review screen looks like.
Metadata. The pipeline surfaces threading structure, user and channel identity, file associations, reactions, and timestamps as core fields on every message — the same backbone Slack and Teams review already runs on. Discord adds its own identity layer on top: server ID, role, and per-server nickname, since the same Discord user can appear under a different display name in every server they belong to.
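Because one account can carry a different nickname in every server, identity is safest keyed on the stable account ID, with display names collected as aliases. A hypothetical sketch, assuming sightings arrive as `(author_id, server_id, display_name)` tuples:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_identity_index(
    sightings: Iterable[Tuple[str, str, str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Group (author_id, server_id, display_name) sightings by stable account ID.

    Keying on the account ID rather than the display name keeps one person
    with several per-server nicknames from being split into several people.
    """
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for author_id, server_id, display_name in sightings:
        index[author_id][server_id].add(display_name)
    return index


def aliases(index: Dict[str, Dict[str, Set[str]]], author_id: str) -> Set[str]:
    """Every display name an account has used, across all servers."""
    return {name for names in index.get(author_id, {}).values() for name in names}
```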
Layout. Review happens in a threaded, chronological view that resembles the native app rather than a flattened list of rows, because that is the layout a reviewer needs to actually follow a conversation. Discord-specific visual conventions — the channel sidebar, embeds, stickers — are preserved rather than stripped out, since stripping them is exactly what turns a conversation back into an unreadable database dump.
4. Picking the Right Unit: Three Separate Boundaries, Not One
This is the decision that everything else depends on, and it is actually three decisions, because tagging, redaction, and document boundary do not have to use the same grain. Collapsing them into a single unit is the mistake that makes chat review either too coarse (you tag an entire channel to catch three privileged messages) or too fragile (you redact whole messages when only one word is sensitive).
| Boundary | Unit | Why it's set there |
|---|---|---|
| Tagging | A single message | The smallest thing a reviewer should have to make a relevance or privilege call on — but the document rendered for that call always carries the surrounding thread for context |
| Redaction | A single word | Privilege or confidentiality rarely spans an entire message; word-level redaction avoids withholding content that happens to sit next to a sensitive phrase |
| Document boundary | A single message | Each message is its own reviewable, loggable, producible unit — not the channel, not the thread, not the day's conversation |
Making the message the document boundary is also what makes a defensible privilege log possible: a log entry needs to point to one specific communication, not "some messages somewhere in this 40,000-message channel export."
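The table's split can be sketched as data: tags are keyed by a single message ID, so a privilege call on one message never spills onto its neighbors, and each log entry cites exactly one communication. The names below (`PrivilegeLogEntry`, `basis`, the `"privileged"` tag) are illustrative assumptions, not a standard privilege-log format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PrivilegeLogEntry:
    """One log line per withheld message: the entry cites a single communication."""

    message_id: str
    channel_id: str
    sent_at: str
    author_id: str
    basis: str  # e.g. "attorney-client" or "work product"


def build_privilege_log(
    messages: Dict[str, dict], tags: Dict[str, str], bases: Dict[str, str]
) -> List[PrivilegeLogEntry]:
    """Emit log entries only for messages tagged 'privileged'.

    `tags` maps message_id -> tag, so tagging one message never tags its
    neighbors, even though the reviewer saw the whole thread when deciding.
    """
    log = []
    for message_id, tag in tags.items():
        if tag != "privileged":
            continue
        m = messages[message_id]
        log.append(PrivilegeLogEntry(
            message_id, m["channel_id"], m["sent_at"], m["author_id"], bases[message_id]
        ))
    # Chronological order (timestamps share one format) eases checking against the export.
    return sorted(log, key=lambda e: (e.sent_at, e.message_id))
```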
5. Reconstructing Threads So a Reply Makes Sense
Message-level tagging only works if the reviewer isn't reading messages in isolation. Discord's raw export scatters replies throughout the file in send order, disconnected from whatever they're replying to — a reply sent an hour after the original message, and possibly after a dozen unrelated messages in between, shows up as just another line in the stream. Threading reconstruction walks the raw export and links every reply back to its parent message, regardless of how far apart they landed in send order.
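A minimal sketch of that reconstruction, assuming each record carries an `id`, a `sent_at` timestamp in one consistent ISO 8601 format, and an optional `parent_id` (all hypothetical names). Indexing every message by ID first is what lets a reply find a parent that appeared thousands of lines earlier; replies whose parent was never collected are flagged rather than dropped.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def reconstruct_threads(
    messages: List[dict],
) -> Tuple[List[str], Dict[str, List[str]], List[str]]:
    """Link every reply to its parent, however far apart they sit in send order.

    Returns (roots, children, orphans):
      roots    - IDs of messages that start a thread
      children - parent ID -> reply IDs, each list in send order
      orphans  - replies whose parent is not in the collection
    """
    by_id = {m["id"]: m for m in messages}  # pass 1: index everything by ID
    children: Dict[str, List[str]] = defaultdict(list)
    roots: List[str] = []
    orphans: List[str] = []
    # Pass 2, in timestamp order so every child list comes out chronological.
    for m in sorted(messages, key=lambda rec: (rec["sent_at"], rec["id"])):
        parent: Optional[str] = m.get("parent_id")
        if parent is None:
            roots.append(m["id"])
        elif parent in by_id:
            children[parent].append(m["id"])
        else:
            # Keep the reply visible and flag the gap instead of dropping it.
            orphans.append(m["id"])
    return roots, dict(children), orphans
```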
Replied-to and quoted messages are a direct consequence of that same reconstruction, not a separate feature: once the parent/reply linkage exists, showing a reviewer what a given message was quoting or responding to is just rendering the link that Discord already encoded and the pipeline already recovered.
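Once each message knows its parent, showing what a reply was responding to is a walk up the parent links. A hypothetical sketch, assuming a `parent_of` map from message ID to parent ID (`None` for thread starters), with guards for malformed exports:

```python
from typing import Dict, List, Optional


def reply_context(
    message_id: str, parent_of: Dict[str, Optional[str]], max_depth: int = 50
) -> List[str]:
    """Return the chain of messages a reply responds to, oldest ancestor first.

    The walk stops at a thread starter, at a parent missing from the
    collection, at a cycle in malformed data, or after `max_depth` hops.
    """
    chain: List[str] = []
    seen = {message_id}
    current = parent_of.get(message_id)
    while current is not None and current not in seen and len(chain) < max_depth:
        chain.append(current)
        seen.add(current)
        # A parent missing from the collection ends the walk; its ID stays in
        # the chain so the review screen can render a placeholder for it.
        current = parent_of.get(current)
    return chain[::-1]
```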
6. Voice Messages Are Chat With Extra Steps
Discord communities do a meaningful amount of their communication in voice — voice messages sent in DMs and channels, and voice-channel activity more broadly. Treating audio as out of scope for review because it isn't text is how privileged or responsive conversations slip through a collection untouched. The pipeline explicitly transcribes voice messages and voice-channel recordings, so they become searchable and reviewable the same way a text message is — the reviewer sees the transcription and tags it privileged or not, exactly as they would any other message.
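A sketch of how a transcript can enter the same review queue as text. The transcription engine is deliberately left as an injected callable, since no particular speech-to-text service is assumed here; `VoiceReviewItem` and its fields are hypothetical. The record keeps a pointer to the original audio and marks the text as machine-generated, so a reviewer knows what they are tagging.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (start_seconds, end_seconds, text) segments; the engine producing them is left open.
Segment = Tuple[float, float, str]


@dataclass(frozen=True)
class VoiceReviewItem:
    message_id: str
    audio_path: str                  # the original recording stays the source of record
    text: str                        # transcript the reviewer reads, searches, and tags
    segments: Tuple[Segment, ...]    # timings let a reviewer jump to the audio
    machine_transcribed: bool = True


def voice_to_review_item(
    message_id: str, audio_path: str, transcribe: Callable[[str], List[Segment]]
) -> VoiceReviewItem:
    """Turn one voice message into an item that flows through text review."""
    segments = tuple(transcribe(audio_path))
    # Join non-empty segment texts into one searchable string.
    text = " ".join(seg[2].strip() for seg in segments if seg[2].strip())
    return VoiceReviewItem(message_id, audio_path, text, segments)
```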
7. Dedup, Time Zones, and a Production Format That Isn't a Screenshot
Three more decisions round out the pipeline: deduplication, time-zone normalization, and a production format that isn't a screenshot. Each addresses a way that chat data behaves differently from a discrete document set.
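Deduplication and time-zone normalization can be sketched together. This assumes timestamps arrive as ISO 8601 strings with an explicit offset and that a message's identity is `(platform, channel_id, message_id)`; both are assumptions for illustration. The same message collected from two custodians' exports is kept once, with every custodian recorded.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str, str]


def to_utc(ts: str) -> str:
    """Normalize an ISO 8601 timestamp that carries an offset to UTC."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # A naive timestamp has no recoverable zone; refuse to guess.
        raise ValueError(f"timestamp without offset: {ts!r}")
    return dt.astimezone(timezone.utc).isoformat()


def dedupe(messages: Iterable[dict]) -> Tuple[List[dict], Dict[Key, List[str]]]:
    """Keep one copy per (platform, channel_id, message_id), recording every custodian who had it."""
    kept: Dict[Key, dict] = {}
    custodians: Dict[Key, List[str]] = {}
    for m in messages:
        key = (m["platform"], m["channel_id"], m["message_id"])
        if key not in kept:
            kept[key] = dict(m, sent_at_utc=to_utc(m["sent_at"]))
            custodians[key] = []
        if m["custodian"] not in custodians[key]:
            custodians[key].append(m["custodian"])
    return list(kept.values()), custodians
```

Refusing naive timestamps rather than assuming UTC is the conservative choice: a silently wrong offset can reorder a conversation.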
8. Partial Privilege Inside a Single Message
A single Discord or Slack message can mix a privileged legal question with an unrelated logistics line — "also can we push standup to 3pm" tacked onto the end of a question for outside counsel. Withholding the whole message over one sentence is over-broad; producing the whole message defeats the privilege. Because redaction is already scoped down to the word level, the reviewer has a real choice, not a forced compromise:
a) Withhold the whole message and log it as privileged, when the reviewer judges the privileged and non-privileged content are too entangled to safely separate.
b) Redact just the privileged portion and produce the rest, when the non-privileged content stands on its own without the redacted material.
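Option (b) can be sketched as word-level redaction over the message text. In this hypothetical version the reviewer selects words by index; adjacent selections collapse into one marker, and the character spans are returned so the redaction can be logged. The marker text is an assumption.

```python
import re
from typing import List, Set, Tuple

MARKER = "[REDACTED: PRIVILEGED]"


def redact_words(text: str, redact: Set[int]) -> Tuple[str, List[Tuple[int, int]]]:
    """Replace the selected words (by index); return produced text and redacted spans.

    Consecutive redacted words collapse into one marker; the whitespace
    around produced words is kept exactly as it was.
    """
    words = list(re.finditer(r"\S+", text))
    out: List[str] = []
    spans: List[Tuple[int, int]] = []
    cursor = 0
    i = 0
    while i < len(words):
        if i not in redact:
            i += 1
            continue
        # Extend the run over adjacent redacted words.
        j = i
        while j + 1 < len(words) and (j + 1) in redact:
            j += 1
        start, end = words[i].start(), words[j].end()
        out.append(text[cursor:start])
        out.append(MARKER)
        spans.append((start, end))
        cursor = end
        i = j + 1
    out.append(text[cursor:])
    return "".join(out), spans
```

For the standup example, `redact_words("Counsel, is the merger exposure material? also can we push standup to 3pm", set(range(6)))` returns `("[REDACTED: PRIVILEGED] also can we push standup to 3pm", [(0, 41)])`: the logistics line is produced and the span goes to the redaction log.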
9. Proportionality Before Review Starts
The last decision is about controlling volume before a reviewer ever opens a channel. Chat collections routinely run into the hundreds of thousands of messages once bot chatter, GIF reactions, and off-topic banter are included, and proportionality arguments are as relevant here as they are for any other ESI source. The pipeline supports prompt-driven responsiveness and privilege review — letting a team scope a first-pass review around the specific subject matter, custodians, or date ranges that proportionality actually calls for, rather than reviewing every message at uniform depth regardless of relevance.
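Prompt-driven review itself depends on the model and is not shown here, but the deterministic scoping that can run ahead of it is easy to sketch. Assuming normalized records with `author_id`, `sent_at_utc`, and an optional `is_bot` flag (hypothetical names), a first pass might keep only agreed custodians inside an agreed date window and drop bot chatter unless the scope says otherwise:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, Iterator


@dataclass(frozen=True)
class Scope:
    """A hypothetical first-pass scope agreed under proportionality."""

    custodians: FrozenSet[str]   # message authors treated as custodians in this sketch
    start: datetime              # inclusive, timezone-aware
    end: datetime                # exclusive, timezone-aware
    include_bots: bool = False


def in_scope(messages: Iterable[dict], scope: Scope) -> Iterator[dict]:
    """Yield only messages from agreed custodians inside the agreed date window."""
    for m in messages:
        if m.get("is_bot") and not scope.include_bots:
            continue  # bot chatter stays out unless the scope explicitly includes it
        if m["author_id"] not in scope.custodians:
            continue
        # Expects an offset-bearing timestamp so it compares with aware datetimes.
        sent = datetime.fromisoformat(m["sent_at_utc"])
        if scope.start <= sent < scope.end:
            yield m
```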
10. What's Next: The Nine Harder Decisions
The decisions above handle the mechanics that any chat collection needs on day one. The nine decisions still in active development are harder because they involve judgment calls under uncertainty: resolving an identity that doesn't cleanly resolve, deciding what counts as noise versus signal, and handling content that was deleted before it could be collected cleanly.
Of those nine, deleted-but-recoverable messages carry the most direct defensibility consequence. A platform that silently omits a message that Discord's audit log shows was deleted invites exactly the challenge eDiscovery teams dread: opposing counsel discovering, after the fact, that something existed and isn't in the production, with no record that it was ever considered. Flagging the gap, even when the content itself can't be recovered, is what keeps the record honest.
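A hypothetical sketch of flagging those gaps, assuming deletion evidence arrives as records with `channel_id`, `author_id`, `source`, and, where available, `message_id`. Depending on the source, deletion evidence may identify only a channel and author rather than a specific message, so the placeholder allows the ID to be missing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class DeletionPlaceholder:
    """Stands in the review set for a message known to have existed but not collected."""

    message_id: Optional[str]   # None when the evidence names only channel and author
    channel_id: str
    author_id: str
    evidence: str               # where the deletion was observed
    content_recovered: bool = False


def flag_deletion_gaps(
    collected_ids: Set[str], deletions: Iterable[dict]
) -> List[DeletionPlaceholder]:
    """Emit a placeholder for every recorded deletion not covered by a collected message."""
    gaps = []
    for d in deletions:
        mid = d.get("message_id")
        if mid is not None and mid in collected_ids:
            continue  # collected before deletion (e.g. an earlier export); nothing missing
        gaps.append(DeletionPlaceholder(mid, d["channel_id"], d["author_id"], d["source"]))
    return gaps
```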
11. Bring Your Own Model
The one decision we've deliberately left open is the model itself. Some teams have their own classification requirements, their own fine-tuned models, or compliance constraints that dictate which model can touch their data. Rather than forcing every customer onto a single default, DecoverAI lets customers bring their own model into the classification pipeline: the same message-level architecture, thread reconstruction, and metadata normalization underneath, with the model choice left to the customer.
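One way to keep the model swappable is a narrow interface that the rest of the pipeline calls. The `ChatClassifier` protocol and `Prediction` type below are hypothetical, not a published DecoverAI API; the point is that the model receives the target message together with its thread, the same context a reviewer gets. `KeywordClassifier` is a trivial stand-in, not a real model.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence


@dataclass(frozen=True)
class Prediction:
    message_id: str
    label: str          # e.g. "privileged" or "not_privileged"
    confidence: float


class ChatClassifier(Protocol):
    """Anything with this method can plug in: a hosted model, a fine-tuned local model, rules."""

    def classify(self, target: dict, thread: Sequence[dict]) -> Prediction: ...


def run_first_pass(
    classifier: ChatClassifier, targets: List[dict], threads: Dict[str, List[dict]]
) -> List[Prediction]:
    """Score each message with its thread context, independent of which model is plugged in."""
    return [classifier.classify(t, threads.get(t["thread_id"], [])) for t in targets]


class KeywordClassifier:
    """Trivial stand-in model, only to show the interface being satisfied."""

    def classify(self, target: dict, thread: Sequence[dict]) -> Prediction:
        text = " ".join(m["text"] for m in [*thread, target]).lower()
        hit = "counsel" in text
        return Prediction(target["id"], "privileged" if hit else "not_privileged", 0.9 if hit else 0.6)
```

With thread context, "I agree, let's proceed" inherits the signal from the question to counsel it answers; scored in isolation, it would not.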
12. A Checklist for Evaluating a Chat-Native eDiscovery Tool
If you're evaluating any platform's ability to handle Discord, Slack, or Teams data — ours or anyone else's — these are the questions worth asking:
- Does it tag at the message level, or force you to tag whole channels? Channel-level tagging either buries relevant content or over-produces irrelevant content — there's no middle ground without message-level granularity.
- Does redaction go down to the word, or only the message? Message-level-only redaction forces an all-or-nothing choice on partially privileged content.
- Does it reconstruct threads, or just list messages in send order? A reply with no visible parent is close to meaningless to a reviewer.
- What happens to a message that was deleted before collection? Silent omission is a defensibility risk; a flagged placeholder is not.
- Is voice content transcribed and reviewable, or excluded from scope entirely? Excluding it isn't a scoping decision; it's a gap.
- Can it validate against your own export, before you commit? Discord, Slack, and Teams each export differently enough that a generic demo doesn't tell you how the pipeline handles your actual data.
13. Conclusion
Chat data doesn't need eDiscovery to abandon the concept of a document — it needs the document boundary redrawn at the right grain. A message is small enough to tag and log precisely, a word is small enough to redact precisely, and a reconstructed thread is the context a reviewer actually needs to make sense of either one. Get those three boundaries right, and pipeline features that sound platform-specific — Discord's per-server nicknames, Slack's app integrations, either platform's edit and delete history — turn out to be variations on the same handful of underlying problems: who is this from, what did it originally say, and can we prove it.
To see the chat pipeline validated against your own Discord or Slack export, book a session with our technical team.