A 40-message email thread is one of the most reliable ways to lose a decision. The decision is in there somewhere — probably between the third and fourth reply, before the side-tangent about scheduling, after the bit where someone said “let’s circle back.” Anyone joining mid-thread is lost. Anyone returning from vacation is lost. Anyone trying to write the follow-up two months later is definitely lost.
AI summaries promise to fix this. Most of them produce wallpaper — a polite narrative paragraph that mentions everyone, attributes nothing precisely, and leaves the action items implied. This piece describes the workflow that produces summaries people actually use: structured, attributed, surfacing the dissent rather than smoothing over it, with action items that have owners and dates.
Where this fits — and where it doesn't
Use this if the thread is informational or coordinative — project updates, vendor discussions, internal planning, scheduling tangles, post-mortems. Anywhere the value of the thread is “what did we decide and who’s doing what” rather than the specific phrasing of the conversation, AI summarisation works. Best applied to threads with 10+ messages where the cost of re-reading is non-trivial.
Don’t use this if the thread is contentious, legally sensitive, or contains decisions that will be cited verbatim later (HR matters, customer-facing commitments, regulatory or compliance discussions). Don’t use it on threads with confidentiality classifications that exceed your AI vendor’s data terms. And don’t use it on threads where the original phrasing is the point — disputes about wording, drafts being negotiated, lawyer-reviewed text. A summary in those cases loses the thing you actually need.
What you'll need before starting
- Access to an AI tool with sufficient context window — Claude (200k/1M), ChatGPT (128k–272k), or Gemini (1M+) all handle typical email threads comfortably.
- The thread in a copyable form — Outlook, Gmail, or your client’s “view raw” / “view source” output works.
- A standing summarisation prompt — we’ll build it in step 2. Save it somewhere reusable (Claude Project, Custom GPT, or a note-taking app).
- Awareness of your team’s data-handling rules — internal threads, client confidential threads, and threads under privilege may not be acceptable inputs to your AI tool. Check before establishing the habit.
Six steps to summaries you can actually trust
- Sanitise the input — strip signatures, footers, quoted replies, and noise
Email threads carry roughly 30–60% of their bytes in repeat material: signatures, legal disclaimers, deeply-quoted prior replies, “sent from my iPhone” footers. None of it improves the summary; all of it costs tokens. Strip aggressively before pasting. Preserve sender attribution lines (“From: Alex Chen, 14 March 09:42”) and the unique content of each message. Most email clients have a “view source” or plain-text mode that makes this easier than copying from the rendered view.
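The stripping can be automated with a few regular expressions. A minimal sketch, assuming a plain-text export; the patterns are illustrative heuristics for common noise, not a complete parser:

```python
import re

# Heuristic noise patterns; tune these to your client's export format.
NOISE_PATTERNS = [
    re.compile(r"(?m)^>.*$"),               # quoted prior replies
    re.compile(r"(?mi)^sent from my .+$"),  # mobile footers
    re.compile(r"(?ms)^--\s*$.*"),          # signature delimiter and everything after it
]

def sanitise(message: str) -> str:
    """Strip quoted replies, footers, and signatures from one message body.

    Attribution lines ('From: Alex Chen, 14 March 09:42') match none of the
    noise patterns, so they survive untouched.
    """
    for pattern in NOISE_PATTERNS:
        message = pattern.sub("", message)
    # Collapse the blank runs the removals leave behind.
    return re.sub(r"\n{3,}", "\n\n", message).strip()
```

Run it per message, then join the cleaned messages back into one transcript before pasting.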
- Use a structured summarisation prompt — not “summarise this”
The default summary is too narrative. Override with explicit sections. A version that works across Claude, ChatGPT, and Gemini:
- TL;DR: 2 sentences max — what was decided, what changes.
- Decisions made: bullet list. Each line names what was decided and who proposed/agreed.
- Open questions: bullets — what is still unresolved.
- Disagreements: bullets — where participants disagreed, even if a decision was reached. Do not smooth these over.
- Action items: bullets with owner and due date on each line. Flag any item without an owner or date.
- Skip: scheduling tangents, social pleasantries, attribution of who-said-what unless it shapes a decision.
Save this prompt as the standing template. Edit it once a quarter as the thread types you’re summarising shift.
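Kept as a small template, the standing prompt is easy to version and reuse. A sketch, with the section wording lifted from the list above; adjust the phrasing to taste:

```python
SUMMARY_PROMPT = """\
Summarise the email thread below using exactly these sections:

TL;DR: 2 sentences max — what was decided, what changes.
Decisions made: bullets; each names the decision and who proposed or agreed.
Open questions: bullets — what is still unresolved.
Disagreements: bullets — where participants disagreed, even if a decision was
reached. Do not smooth these over.
Action items: bullets with an owner and due date on each line. Flag any item
missing either.
Skip scheduling tangents, pleasantries, and who-said-what that does not shape
a decision.

THREAD:
{thread}
"""

def build_prompt(thread_text: str) -> str:
    """Wrap a sanitised thread transcript in the standing summary prompt."""
    return SUMMARY_PROMPT.format(thread=thread_text)
```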
- Check the token budget before sending — the model is not magic
A typical 40-message email thread runs 8,000–15,000 tokens after sanitisation. All three major models handle that comfortably. A 200-message thread (rare but real — long sales cycles, multi-month projects) can exceed 100,000 tokens; at that length, attention degradation starts mattering and quality drops on the parts of the thread that landed near the start. For very long threads, summarise in batches (first 50 messages, next 50, then combine the batch summaries) rather than relying on raw long-context handling.
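Both the budget check and the batching fallback fit in a few lines. A sketch, assuming the common rough heuristic of about 4 characters per token for English prose:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4

def batch_messages(messages: list[str], max_tokens: int = 50_000) -> list[list[str]]:
    """Group messages into consecutive batches that each fit the token budget.

    Summarise each batch separately, then combine the batch summaries.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for msg in messages:
        tokens = estimate_tokens(msg)
        if current and current_tokens + tokens > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(msg)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches
```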
- Verify attribution before trusting the summary
The most common failure mode of AI thread summaries is misattribution — Alex’s quote attributed to Sam, the decision Sam pushed back on attributed as Sam’s idea, the action item the engineer agreed to attributed to the project manager. Spot-check three claims per summary against the original thread, especially anything that names a person. The model’s attribution is right most of the time and confidently wrong sometimes; the cost of confidently-wrong attribution is high enough to justify the spot-check.
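One cheap automated pre-check, assuming the attribution lines survived sanitisation: flag any name the summary mentions that never sent a message. A sketch; note it only catches invented names, not swapped ones, so the manual spot-check still stands:

```python
import re

def participants(thread: str) -> set[str]:
    """Sender names taken from attribution lines like 'From: Alex Chen, 14 March'."""
    return {m.group(1).strip() for m in re.finditer(r"(?m)^From: ([^,\n]+)", thread)}

def unknown_names(summary: str, thread: str) -> set[str]:
    """Names the summary mentions that never sent a message in the thread.

    Catches invented names only; a swapped attribution (Sam's idea credited
    to Alex) still requires checking against the original thread.
    """
    known = participants(thread)
    # Crude heuristic: two capitalised words in a row look like a name.
    candidates = set(re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", summary))
    return candidates - known
```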
- Preserve disagreements deliberately — most summarisers smooth them over
Default summarisation collapses dissent into consensus (“the team discussed pricing and aligned on $X”). This is the second-most-common failure mode, behind misattribution. The structured prompt asks for disagreements explicitly; verify the section is populated when the original thread contained real disagreement. If the disagreement section reads as empty when the thread had clear pushback, the summary is hiding something — rewrite or re-run the prompt with an explicit instruction to surface the disagreement.
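The "is the disagreement section suspiciously empty?" check can also be partially automated. A sketch, assuming the structured prompt's section headings; the pushback-marker list is a crude illustration to extend with your team's vocabulary:

```python
import re

PUSHBACK_MARKERS = ("disagree", "push back", "pushed back", "concern", "i don't think")

def disagreements_empty(summary: str) -> bool:
    """True if the summary's Disagreements section is missing or has no content."""
    match = re.search(r"(?s)Disagreements:(.*?)(?:\n[A-Z][a-z]+ ?\w*:|\Z)", summary)
    return match is None or not match.group(1).strip()

def summary_hides_dissent(summary: str, thread: str) -> bool:
    """Flag a summary whose Disagreements section is empty even though the
    original thread contains obvious pushback language."""
    had_pushback = any(marker in thread.lower() for marker in PUSHBACK_MARKERS)
    return had_pushback and disagreements_empty(summary)
```

A flag here means: re-run the prompt with an explicit instruction to surface the disagreement, then re-read.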
- Distribute the summary where the reader already lives — not in a new folder
The number-one reason summaries go unread is that they land somewhere readers don’t already check. For internal threads, paste the summary into the team’s existing channel (the project Slack, the Notion doc, the Linear ticket) rather than creating a new “Email Summaries” folder. For customer threads, attach to the deal or account in the CRM. Distribution is the workflow step that determines whether the summary is useful or just another piece of generated content nobody opens.
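For the Slack path, posting is one HTTP call to an incoming-webhook URL, and the payload shape is the part worth getting right: incoming webhooks accept a JSON object with a `text` field. A sketch (the webhook URL itself comes from your Slack app configuration):

```python
import json

def slack_summary_payload(summary: str, subject: str) -> str:
    """JSON body for a Slack incoming webhook.

    POST this string to the webhook URL with Content-Type: application/json.
    """
    return json.dumps({"text": f"*Thread summary: {subject}*\n{summary}"})
```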
What it costs and what to expect
The numbers favour doing this in your existing AI subscription rather than buying a dedicated email-AI tool, unless the email tool also handles triage and prioritisation. For thread summarisation alone, the consumer-plan path is more flexible and effectively free: a handful of thread summaries per week fits comfortably inside the usage limits of a subscription you are likely already paying for.
Other ways to solve this
Built-in email-tool AI (Superhuman, SaneBox, Outlook Copilot). Right answer if you want the summarisation inside the email client itself rather than copy-pasting into Claude or ChatGPT. The trade-off is less prompt control — you get the tool’s default summary structure, not your standing template. Good for individual productivity; weaker for teams that want a consistent summary shape across the company.
Slack-channel summarisation (Slack AI, Glean, and similar workplace-AI add-ons). Same pattern, different medium — if your team’s discussion has moved to Slack rather than email, the channel summarisation tools are the right surface. The structured-prompt pattern from step 2 transfers directly.
Manual notes by a designated thread owner. Still the right answer for legally sensitive, HR-related, or board-level threads where misattribution is unacceptable. Higher cost in attention; zero risk of AI-introduced error.
Don’t summarise — restructure the conversation. Many threads that “need a summary” are signals the conversation was in the wrong tool. If a thread regularly produces decisions and action items that need extracting, move the next instance to a shared doc, a Linear ticket, or a project channel with structure built in. The best summary is the conversation that didn’t need one.
Related work
For the meeting-summary equivalent of this workflow, see Meeting summaries people actually read. For the broader unit-economics of LLM context windows that shapes pricing at long-thread sizes, see Tokens, context windows, and what they cost. For pattern-detection across many summarised threads — the workflow this enables — see Find patterns in customer feedback.
FAQ
Can I do this directly in my email client?
Yes — most major email clients now have built-in summarisation. Outlook Copilot, Gmail with Gemini, Superhuman's native feature, SaneBox add-ons. They use the same underlying LLM technology with the client's default prompt. Quality is fine for personal use; for team-wide consistency, the standing-prompt workflow (paste into Claude or ChatGPT) wins because you control the structure.
What about confidential threads — does the model see them?
Yes, the model sees them. For sensitive content, three options: (1) use a paid tier with explicit data exclusion from training (Claude Team / Enterprise, ChatGPT Team / Enterprise, Gemini Business); (2) use a self-hosted setup — see build a private knowledge base for the architecture; (3) don't summarise that thread with AI. Picking option 3 for legal-privileged, HR-related, and customer-PII-heavy threads is the responsible default.
How do I handle threads with attachments?
Extract the attachment content separately and include the most-relevant excerpts as part of the context, not the whole attachment. A 50-page PDF as the attachment is its own summarisation problem; summarising the thread that mentions it should reference the attachment ('Alex shared the budget spreadsheet') rather than try to summarise the spreadsheet inline. See extract structured data from PDFs for the document-handling side.
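Selecting the "most-relevant excerpts" can be as simple as ranking the attachment's paragraphs by keyword hits from the thread. A crude sketch, assuming the attachment text has already been extracted to plain text:

```python
def relevant_excerpts(attachment_text: str, keywords: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` paragraphs mentioning the most thread keywords.

    Crude keyword scoring: enough to give the summary supporting context,
    not a substitute for summarising the document itself.
    """
    paragraphs = [p.strip() for p in attachment_text.split("\n\n") if p.strip()]
    scored = [(sum(k.lower() in p.lower() for k in keywords), p) for p in paragraphs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:limit] if score > 0]
```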
What about multilingual threads where participants reply in different languages?
All three major models handle code-switching within a thread reasonably well — typically returning the summary in whichever language dominates or in English by default. Specify the summary language explicitly in the prompt ("summarise in English" or "summarise in French") to avoid surprises. Attribution quality may drop slightly on code-switched content; spot-check more carefully.
Can I summarise weekly Slack channels the same way?
Yes — same pattern, same structured prompt, with a small adjustment to handle Slack's threaded-reply structure. Most teams find Slack channel summarisation more useful than email-thread summarisation because the channels are continuous and the catch-up problem is recurring. Tools like Slack AI offer this natively; the standing-prompt approach gives more control for teams that already have a summarisation library.
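The "small adjustment" is flattening Slack's threaded replies into a linear transcript before summarising. A sketch, assuming a simplified Slack-export shape where each message has `ts`, `user`, and `text`, and replies carry a `thread_ts` pointing at the parent's `ts`:

```python
def flatten_slack_export(messages: list[dict]) -> str:
    """Linearise Slack messages so thread replies sit under their parent,
    in the order a reader would follow them."""
    parents = [m for m in messages if m.get("thread_ts") in (None, m["ts"])]
    replies: dict[str, list[dict]] = {}
    for m in messages:
        thread_ts = m.get("thread_ts")
        if thread_ts and thread_ts != m["ts"]:
            replies.setdefault(thread_ts, []).append(m)
    lines = []
    for parent in sorted(parents, key=lambda m: float(m["ts"])):
        lines.append(f"{parent['user']}: {parent['text']}")
        for reply in sorted(replies.get(parent["ts"], []), key=lambda m: float(m["ts"])):
            lines.append(f"  ↳ {reply['user']}: {reply['text']}")
    return "\n".join(lines)
```

Feed the flattened transcript into the same structured prompt from step 2.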