How DecoverAI Handles Discord & Slack Chats for Privilege and Responsiveness Review

DecoverAI Product & Engineering · Design record for the chat message ingestion, review, and production pipeline — built against a Discord proof of concept, extending patterns already shipped for Slack · July 2026

What is the unit of review for a chat message? That question sounds trivial until you try to answer it for a Discord server with 40,000 messages across 30 channels, or a Slack workspace where a single thread runs for eight months and gets edited, reacted to, and quoted out of order the entire time. Email-era eDiscovery tooling has a clean answer — the document is the email, the attachment is the attachment, and the boundary between them is obvious. Chat data has no such boundary built in. Someone has to decide where one reviewable unit ends and the next begins, and that decision cascades into everything downstream: how tagging works, how redaction works, how privilege gets logged, and how a production actually gets built.

We already ingest Slack and Teams conversational data in production. This post documents something narrower and newer: the decision record behind extending that same message-level architecture to Discord, which is showing up with increasing frequency in litigation involving online communities, crypto and Web3 organizations, gaming companies, and remote-first teams that never adopted Slack. Discord's export formats, threading model, and social conventions (servers, roles, stickers, voice channels) are different enough from Slack's that most of these decisions had to be made from scratch — and a good number of them apply to any chat platform, Slack included.

Decisions shipping now — collection, threading, tagging, redaction, and production, built and validated

Decisions in active development — identity resolution, scope controls, edited/deleted message handling, near-dup clustering

Decision left to the customer — bring-your-own-model support for teams with their own classification requirements

1. Why a Chat Message Breaks the Document Model

An email has a natural document boundary: the message itself, plus whatever is attached to it. A chat channel does not work that way. It is continuous — messages accumulate for as long as the channel exists, with no natural stopping point. It is branching — a reply three days later can attach to a message from the top of the channel, not the one directly above it. And it is mutable — messages get edited after the fact, deleted, reacted to, and pinned, all of which change what the record shows depending on when you look at it.

Treat an entire channel as one document and a reviewer has to wade through months of unrelated conversation to find the three messages that matter, with no way to tag just the relevant part. Treat every message as a fully independent document with no thread context and a reviewer sees "I agree, let's proceed" with no idea what was being agreed to. Both failure modes are common in tools that were retrofitted from email review rather than built around chat's actual structure. The fix is to separate two things that document-era tools collapse into one: the unit of review is the message, but the context a reviewer sees is the thread.

2. Collection: Preserving Custody Before Anything Else

Collection has to happen before any of the review-side decisions matter, and it has to happen in a way that survives a challenge. For the Discord proof of concept, collection starts from platform exports, with native API-based collection supported as a second path for matters where a broader or more targeted pull is needed. Whichever path is used, the same custody discipline applies on the way in.

Ingest: Preserve the original export/API package, untouched
→ Verify: Generate hash manifests; validate message and attachment counts
→ Log: Record processing exceptions instead of silently dropping data
→ Normalize: Map raw platform fields into a common eDiscovery schema
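The Verify and Log steps can be illustrated with a minimal sketch. It assumes the preserved export sits in a local directory and that expected message and attachment counts are available from somewhere (for example, the export's own summary). The function names `sha256_of_file`, `build_manifest`, and `validate_counts` are hypothetical and are not DecoverAI's actual API.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large attachments are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(export_dir: Path) -> Dict[str, str]:
    """Map every file's relative path to its SHA-256, in sorted path order."""
    return {
        path.relative_to(export_dir).as_posix(): sha256_of_file(path)
        for path in sorted(export_dir.rglob("*"))
        if path.is_file()
    }


def validate_counts(expected: Dict[str, int], observed: Dict[str, int]) -> List[str]:
    """Return mismatches as loggable exceptions instead of raising or dropping data."""
    exceptions = []
    for key in sorted(expected.keys() | observed.keys()):
        if expected.get(key) != observed.get(key):
            exceptions.append(
                f"{key}: expected {expected.get(key)}, observed {observed.get(key)}"
            )
    return exceptions
```

Returning the mismatches as a list, rather than raising on the first one, mirrors the Log step: every exception is recorded and the rest of the package still gets processed.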

The last step is the one that makes a multi-platform review workflow possible at all: Discord's raw fields (server ID, channel ID, per-server nickname, role) get mapped into the same normalized schema that Slack and Teams data already flows through, rather than living in a platform-specific silo. A reviewer working a matter that includes both Slack and Discord custodians should not have to context-switch between two different data models to do it.
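As an illustration of what a common schema might look like, here is a minimal sketch. The `NormalizedMessage` fields and the raw key names (`guild_id`, `nickname`, `reply_to`, `user_name`, and so on) are assumptions chosen for the example; they are not the actual schema or a guaranteed match for any particular export format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class NormalizedMessage:
    platform: str
    workspace_id: str          # Discord server ID or Slack workspace ID
    channel_id: str
    message_id: str
    author_id: str
    display_name: str          # on Discord, the per-server nickname
    roles: Tuple[str, ...]     # empty for platforms without roles
    sent_at_utc: datetime
    parent_id: Optional[str]   # message this one replies to, if any
    text: str


def normalize_discord(raw: dict) -> NormalizedMessage:
    """Map a (hypothetical) raw Discord record into the shared schema."""
    return NormalizedMessage(
        platform="discord",
        workspace_id=raw["guild_id"],
        channel_id=raw["channel_id"],
        message_id=raw["id"],
        author_id=raw["author_id"],
        display_name=raw["nickname"],
        roles=tuple(raw.get("roles", [])),
        sent_at_utc=datetime.fromisoformat(raw["timestamp"]).astimezone(timezone.utc),
        parent_id=raw.get("reply_to"),
        text=raw["content"],
    )


def normalize_slack(raw: dict, workspace_id: str, channel_id: str) -> NormalizedMessage:
    """Map a (hypothetical) raw Slack record into the same schema."""
    thread_ts = raw.get("thread_ts")
    return NormalizedMessage(
        platform="slack",
        workspace_id=workspace_id,
        channel_id=channel_id,
        message_id=raw["ts"],
        author_id=raw["user"],
        display_name=raw.get("user_name", raw["user"]),
        roles=(),
        sent_at_utc=datetime.fromtimestamp(float(raw["ts"]), tz=timezone.utc),
        # A thread root carries its own timestamp as thread_ts, so it has no parent.
        parent_id=thread_ts if thread_ts and thread_ts != raw["ts"] else None,
        text=raw["text"],
    )
```

Once both platforms produce the same record type, everything downstream (threading, tagging, dedup) can be written once against `NormalizedMessage`.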

3. What a Reviewer Actually Sees

Two decisions determine whether chat review feels usable or feels like reading a database dump: what metadata surfaces, and what the review screen looks like.

Metadata. The pipeline surfaces threading structure, user and channel identity, file associations, reactions, and timestamps as core fields on every message — the same backbone Slack and Teams review already runs on. Discord adds its own identity layer on top: server ID, role, and per-server nickname, since the same Discord user can appear under a different display name in every server they belong to.

Layout. Review happens in a threaded, chronological view that resembles the native app rather than a flattened list of rows, because that is the layout a reviewer needs to actually follow a conversation. Discord-specific visual conventions — the channel sidebar, embeds, stickers — are preserved rather than stripped out, since stripping them is exactly what turns a conversation back into an unreadable database dump.

4. Picking the Right Unit: Three Separate Boundaries, Not One

This is the decision that everything else depends on, and it is actually three decisions, because tagging, redaction, and document boundary do not have to use the same grain. Collapsing them into a single unit is the mistake that makes chat review either too coarse (you tag an entire channel to catch three privileged messages) or too fragile (you redact whole messages when only one word is sensitive).

Boundary	Unit	Why it's set there
Tagging	A single message	The smallest thing a reviewer should have to make a relevance or privilege call on — but the document rendered for that call always carries the surrounding thread for context
Redaction	A single word	Privilege or confidentiality rarely spans an entire message; word-level redaction avoids withholding content that happens to sit next to a sensitive phrase
Document boundary	A single message	Each message is its own reviewable, loggable, producible unit — not the channel, not the thread, not the day's conversation

Making the message the document boundary is also what makes a defensible privilege log possible: a log entry needs to point to one specific communication, not "some messages somewhere in this 40,000-message channel export."

5. Reconstructing Threads So a Reply Makes Sense

Message-level tagging only works if the reviewer isn't reading messages in isolation. Discord's raw export scatters replies throughout the file in send order, disconnected from whatever they're replying to — a reply sent an hour after the original message, and possibly after a dozen unrelated messages in between, shows up as just another line in the stream. Threading reconstruction walks the raw export and links every reply back to its parent message, regardless of how far apart they landed in send order.

Replied-to and quoted messages are a direct consequence of that same reconstruction, not a separate feature: once the parent/reply linkage exists, showing a reviewer what a given message was quoting or responding to is just rendering the link that Discord already encoded and the pipeline already recovered.
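The parent/reply linkage described above can be sketched roughly as follows. This is a minimal illustration, assuming each normalized message is a dict with an `id` and an optional `parent_id` (both hypothetical field names); the production reconstruction may work differently.

```python
from collections import defaultdict
from typing import Dict, List


def thread_context(message_id: str, messages: List[dict]) -> List[str]:
    """Return the IDs of the full thread containing message_id, parents before replies."""
    by_id: Dict[str, dict] = {m["id"]: m for m in messages}

    # Walk up from the message to the thread root. A reply whose parent was
    # never collected is treated as a root rather than being dropped.
    root = message_id
    seen = {root}
    while True:
        parent = by_id[root].get("parent_id")
        if parent is None or parent not in by_id or parent in seen:
            break
        root = parent
        seen.add(root)

    # Index replies under their parent, preserving send order within each parent.
    children: Dict[str, List[str]] = defaultdict(list)
    for m in messages:
        parent = m.get("parent_id")
        if parent in by_id:
            children[parent].append(m["id"])

    # Depth-first walk from the root so every reply appears after its parent.
    ordered: List[str] = []
    visited = set()
    stack = [root]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        ordered.append(current)
        stack.extend(reversed(children[current]))
    return ordered
```

Given any single message, the function returns the whole thread a reviewer would need on screen, no matter how many unrelated messages separated the replies in send order.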

6. Voice Messages Are Chat With Extra Steps

Discord communities do a meaningful amount of their communication in voice — voice messages sent in DMs and channels, and voice-channel activity more broadly. Treating audio as out of scope for review because it isn't text is how privileged or responsive conversations slip through a collection untouched. The pipeline explicitly transcribes voice messages and voice-channel recordings, so they become searchable and reviewable the same way a text message is — the reviewer sees the transcription and tags it privileged or not, exactly as they would any other message.

7. Dedup, Time Zones, and a Production Format That Isn't a Screenshot

Three more decisions round out the pipeline, each addressing a way that chat data behaves differently from a discrete document set:

Deduplication

Hash + Metadata Match

Chat platforms are continuous streams, not batches, so dedup runs on a hash of the message content combined with matching metadata — not just a content hash, which would risk collapsing distinct messages that happen to share identical text.
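A minimal sketch of a content-plus-metadata dedup key follows. Which metadata fields participate is an assumption here (channel, author, and UTC send time); the source does not specify the exact set, and `dedup_key` and `deduplicate` are hypothetical names.

```python
import hashlib
from typing import List


def dedup_key(message: dict) -> str:
    """Hash the message text together with the metadata that must also match."""
    digest = hashlib.sha256()
    for field in ("channel_id", "author_id", "sent_at_utc", "text"):
        encoded = str(message[field]).encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()


def deduplicate(messages: List[dict]) -> List[dict]:
    """Keep the first occurrence of each key and drop exact repeats."""
    seen = set()
    kept = []
    for message in messages:
        key = dedup_key(message)
        if key not in seen:
            seen.add(key)
            kept.append(message)
    return kept
```

Two different people replying "ok" in the same channel produce different keys and both survive; the same message arriving twice from overlapping exports collapses to one.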

Time Zones

UTC Internally, Custodian-Local on Screen

All timestamps are stored and compared internally in UTC. On screen, the reviewer sees each time in the time zone of the custodian under review, so a chronology built across custodians in different time zones stays internally consistent while each custodian's messages still read in the local time they were sent.
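The store-in-UTC, display-in-local pattern can be sketched with the standard library. The function names are hypothetical; in practice the custodian's zone would usually be an IANA zone such as `zoneinfo.ZoneInfo("America/New_York")`, which also handles daylight-saving transitions.

```python
from datetime import datetime, timezone, tzinfo


def to_utc(timestamp: datetime) -> datetime:
    """Normalize an aware timestamp to UTC; reject naive values rather than guess."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return timestamp.astimezone(timezone.utc)


def to_custodian_local(sent_at_utc: datetime, custodian_tz: tzinfo) -> datetime:
    """Convert a stored UTC timestamp to the custodian's zone for display only."""
    return to_utc(sent_at_utc).astimezone(custodian_tz)
```

Sorting and comparison should always use the UTC value; the local value exists only for rendering.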

Production

RSMF, With Metadata

Each message is converted into RSMF (Relativity Short Message Format) along with its metadata, rather than flattened into a TIFF image of a chat window — keeping the production searchable and consistent with how other short-message platforms are already produced in the industry.

8. Partial Privilege Inside a Single Message

A single Discord or Slack message can mix a privileged legal question with an unrelated logistics line — "also can we push standup to 3pm" tacked onto the end of a question for outside counsel. Withholding the whole message over one sentence is over-broad; producing the whole message defeats the privilege. Because redaction is already scoped down to the word level, the reviewer has a real choice, not a forced compromise:

Two paths for a partially privileged message

a) Withhold the whole message and log it as privileged, when the reviewer judges the privileged and non-privileged content are too entangled to safely separate.

b) Redact just the privileged portion and produce the rest, when the non-privileged content stands on its own without the redacted material.
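Path (b) can be sketched as a word-level redaction over whitespace-delimited tokens. This is an illustration only: the tokenization rule, the `[REDACTED]` marker, and the `redact_words` name are assumptions, not the product's actual behavior.

```python
import re
from typing import Iterable

WORD = re.compile(r"\S+")  # a "word" here is any run of non-whitespace characters


def redact_words(text: str, redacted_indexes: Iterable[int], marker: str = "[REDACTED]") -> str:
    """Replace the words at the given 0-based positions, one marker per contiguous run."""
    drop = set(redacted_indexes)
    pieces = []
    cursor = 0
    previous_redacted = False
    for index, match in enumerate(WORD.finditer(text)):
        if index in drop:
            if not previous_redacted:
                # Keep the whitespace before the run, then emit a single marker.
                pieces.append(text[cursor:match.start()])
                pieces.append(marker)
            previous_redacted = True
        else:
            pieces.append(text[cursor:match.start()])
            pieces.append(match.group())
            previous_redacted = False
        cursor = match.end()
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Collapsing a contiguous run into a single marker avoids revealing how many words were withheld, while the unredacted remainder is produced verbatim.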

9. Proportionality Before Review Starts

The last decision is about controlling volume before a reviewer ever opens a channel. Chat collections routinely run into the hundreds of thousands of messages once bot chatter, GIF reactions, and off-topic banter are included, and proportionality arguments are as relevant here as they are for any other ESI source. The pipeline supports prompt-driven responsiveness and privilege review — letting a team scope a first-pass review around the specific subject matter, custodians, or date ranges that proportionality actually calls for, rather than reviewing every message at uniform depth regardless of relevance.

10. What's Next: The Nine Harder Decisions

The list above handles the mechanics that any chat collection needs on day one. The decisions still in active development are harder because they involve judgment calls under uncertainty — resolving an identity that doesn't cleanly resolve, deciding what counts as noise versus signal, and handling content that was deleted before it could be collected cleanly.

Identity resolution

Names That Change Mid-Matter

Internal IDs get mapped to human-readable names for Slack and Teams today. Discord adds real complexity — per-server nicknames and username/discriminator changes over time — handled as best-effort matching, with every resolution decision auditable, human QC available for names that don't resolve cleanly, and a confidence score shown to the reviewer.

Scope

Which Servers, Channels, and DMs

Slack's scoping pattern — by custodian, channel, and date range via the Discovery API or Corporate Export — is the template. Discord's equivalent API or export-based scoping method is still being defined.

Bots & system messages

Filtering Automation Noise

Slack's "app integrations" — automated messages from bots and connected services — are already treated as a distinct data type that AI classification deprioritizes in favor of legally significant content. The same approach is extending to Discord's bot and system-message spam (joins, pins, automated posts).

Rich content

Embeds, Stickers, GIFs, Custom Emoji

Reviewers can see and search these with their associated metadata. A vision model additionally generates descriptive metadata for embeds, stickers, GIFs, and custom emoji, so they're searchable by content, not just by filename.

Reactions

Discoverable, Not Decorative

Reaction tracking is already a named pipeline step for Slack, and DecoverAI treats reactions as potentially legally significant — evidence of awareness or agreement with a message's content. This is one of the more directly transferable capabilities to Discord.

Edited messages

History, Not Just the Latest Version

The engine reconstructs and displays prior versions of an edited message, not just the current text. The latest version displays by default, with an explicit "Edited" tag so a reviewer knows to check the history.
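A minimal sketch of the default display rule, assuming prior versions have already been recovered as (UTC ISO-8601 timestamp, text) pairs. Both that input shape and the `render_edited_message` name are hypothetical.

```python
from typing import List, Tuple


def render_edited_message(versions: List[Tuple[str, str]]) -> dict:
    """Show the latest version by default and expose earlier ones as history."""
    if not versions:
        raise ValueError("at least one version is required")
    # ISO-8601 timestamps in the same UTC format sort chronologically as strings.
    ordered = sorted(versions)
    latest_text = ordered[-1][1]
    return {
        "text": latest_text,
        "edited": len(ordered) > 1,   # drives the explicit "Edited" tag
        "history": ordered[:-1],      # prior versions, oldest first
    }
```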

Deleted messages

Transparency Over Silence

A deletion event with no recoverable content is ingested as a flagged placeholder — "Deleted, content unavailable" — instead of vanishing silently. Content that is actually recoverable through an agreed collection process gets full chain-of-custody treatment and a distinct "Recovered, Deleted Message" flag with who/when deletion metadata.
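The two outcomes can be sketched as a single ingest function that never returns nothing. The flag strings come from the description above; the record fields, the event keys, and the `ingest_deletion` name are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

FLAG_UNAVAILABLE = "Deleted, content unavailable"
FLAG_RECOVERED = "Recovered, Deleted Message"


@dataclass(frozen=True)
class DeletedMessageRecord:
    message_id: str
    flag: str
    text: Optional[str]
    deleted_by: Optional[str]
    deleted_at_utc: Optional[str]


def ingest_deletion(event: dict, recovered_text: Optional[str] = None) -> DeletedMessageRecord:
    """Always emit a record for a deletion event, with or without recovered content."""
    recovered = recovered_text is not None
    return DeletedMessageRecord(
        message_id=event["message_id"],
        flag=FLAG_RECOVERED if recovered else FLAG_UNAVAILABLE,
        text=recovered_text,
        # Who/when metadata is kept whenever the deletion event supplies it.
        deleted_by=event.get("deleted_by"),
        deleted_at_utc=event.get("deleted_at_utc"),
    )
```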

Bulk batch tagging

Tag a Thread, Not Just a Message

Reviewers will be able to select an entire thread of communication and tag it in one action, rather than clicking through each message individually when an entire exchange shares the same disposition.

Near-duplicate detection

Beyond Exact-Match Dedup

Discord communities repeat copy-pasted announcements and re-post pinned messages across channels. Clustering near-duplicates, not just exact ones, meaningfully cuts reviewer volume on top of the hash-based dedup already running upstream.
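The clustering method is not specified above, so the following sketch swaps in one common technique: Jaccard similarity over word shingles, with union-find to merge matches into clusters. The shingle size, the 0.7 threshold, and all function names are assumptions, and the pairwise comparison is quadratic, so a real pipeline would likely need something like MinHash or blocking at scale.

```python
import re
from typing import List, Set, Tuple


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Lowercased word n-grams; short texts become a single shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_near_duplicates(texts: List[str], threshold: float = 0.7) -> List[List[int]]:
    """Group message indexes whose shingle sets meet the similarity threshold."""
    parent = list(range(len(texts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [shingles(t) for t in texts]
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(texts)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

A reviewer could then tag one representative per cluster and propagate the call, subject to QC, instead of reading every re-post of the same announcement.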

Why the deleted-message decision matters most

Of the nine decisions above, the handling of deleted messages is the one with the most direct defensibility consequence. A platform that silently omits a message that Discord's audit log shows was deleted invites exactly the challenge eDiscovery teams dread: opposing counsel discovering, after the fact, that something existed and isn't in the production, with no record that it was ever considered. Flagging the gap, even when the content itself can't be recovered, is what keeps the record honest.

11. Bring Your Own Model

The one decision we've deliberately left open is the model itself. Some teams have their own classification requirements, their own fine-tuned models, or compliance constraints that dictate which model can touch their data. Rather than forcing every customer onto a single default, DecoverAI supports customers bringing their own model into the classification pipeline — the same message-level architecture, thread reconstruction, and metadata normalization underneath, with the model choice left to the customer.

12. A Checklist for Evaluating a Chat-Native eDiscovery Tool

If you're evaluating any platform's ability to handle Discord, Slack, or Teams data — ours or anyone else's — these are the questions worth asking:

Does it tag at the message level, or force you to tag whole channels? Channel-level tagging either buries relevant content or over-produces irrelevant content — there's no middle ground without message-level granularity.
Does redaction go down to the word, or only the message? Message-level-only redaction forces an all-or-nothing choice on partially privileged content.
Does it reconstruct threads, or just list messages in send order? A reply with no visible parent is close to meaningless to a reviewer.
What happens to a message that was deleted before collection? Silent omission is a defensibility risk; a flagged placeholder is not.
Is voice content transcribed and reviewable, or excluded from scope entirely? Excluding it isn't a scoping decision, it's a gap.
Can it validate against your own export, before you commit? Discord, Slack, and Teams each export differently enough that a generic demo doesn't tell you how the pipeline handles your actual data.

13. Conclusion

Chat data doesn't need eDiscovery to abandon the concept of a document — it needs the document boundary redrawn at the right grain. A message is small enough to tag and log precisely, a word is small enough to redact precisely, and a reconstructed thread is the context a reviewer actually needs to make sense of either one. Get those three boundaries right, and pipeline features that sound platform-specific — Discord's per-server nicknames, Slack's app integrations, either platform's edit and delete history — turn out to be variations on the same handful of underlying problems: who is this from, what did it originally say, and can we prove it.

To see the chat pipeline validated against your own Discord or Slack export, book a session with our technical team.