Contract clause extraction means pulling specific clauses out of a contract — termination, liability cap, intellectual property, renewal, governing law — and checking each one against the terms your team has decided are acceptable. A modern AI model can do this in seconds on an inbound NDA, a vendor agreement, or a SaaS subscription.
The point isn’t to replace your lawyer. It’s to change what the lawyer spends time on. NDAs and routine SaaS contracts are roughly 90% identical to the previous one and 10% genuinely worth reading. Most of the senior-attention cost gets burned confirming the 90% instead of digging into the 10%. AI triage inverts that — the lawyer’s hour lands on the contract that actually needs an hour, not on the NDA that needed a glance.
What comes next is the workflow: extract the standard clauses from each inbound contract, compare them to your written playbook of acceptable terms, flag the differences, and let everything else flow through your normal signing process. Nothing in the workflow replaces legal advice — it just sends legal attention where it matters.
Where this fits — and where it doesn't
Use this if you sign 20+ routine contracts a month, your legal team is a bottleneck on time-to-close, and you have a written playbook (formal or informal) of clauses you accept and clauses you redline. Common fits: legal ops teams in growing companies, in-house counsel handling vendor and partner agreements, founders running their own contract pipeline, sales ops teams managing customer paper.
Don’t use this if every contract you sign is a custom negotiation (M&A, complex licensing, regulated industry agreements), your playbook isn’t written down yet (build the playbook first — the AI is useless without it), or your legal counsel is unwilling to share review responsibility with an automated triage layer. The last case is real and worth respecting; the right pace is the pace your legal partner can support.
What you'll need before starting
- A written playbook of standard clauses — what you accept by default, what you typically redline, what’s a hard no. If this doesn’t exist yet, write it first; the AI is the engine, the playbook is the road. A first version can be a one-page Notion doc; it gets more sophisticated over time.
- 20–50 representative contracts of the type you want to triage. The system learns what “standard” looks like for your business from this sample.
- A long-context-capable model API — Claude (200k default, 1M on the higher tiers), GPT (128k on GPT-4o, 1M on GPT-4.1 family), or Gemini (1M+). Routine contracts run 5–25k tokens; complex ones can hit 100k+.
- Buy-in from legal counsel that AI triage is an acceptable first pass for routine contract types. This is operational; the lawyer still reviews flagged contracts and signs off on the playbook.
- A clear definition of what the triage does and doesn’t do. “Surfaces non-standard terms for human review” is the right framing. “Automatically approves contracts” is not.
Six steps to a triage your legal team will trust
- Codify the playbook — clauses, acceptable language, redline triggers
For each contract type (NDA, vendor agreement, SaaS subscription, partnership), write down the 8–15 clauses that matter: governing law, term and termination, liability caps, IP ownership, indemnity, confidentiality, payment terms, renewal mechanics, data protection. For each clause, capture three things: what’s clearly acceptable, what triggers a soft flag (acceptable but worth noting), and what triggers a hard flag (requires human review). The playbook is the artifact the AI is comparing against; without it, the AI has no basis to flag anything.
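A playbook entry can start as simply as a structured file. A minimal sketch in Python — the clause names, jurisdictions, and thresholds below are illustrative examples, not recommendations; your legal team sets the real values:

```python
# Illustrative playbook structure. Clause names, jurisdictions, and
# thresholds are examples only -- your legal team owns the real values.
PLAYBOOK = {
    "nda": {
        "governing_law": {
            "accept": ["Delaware", "New York", "England and Wales"],
            "soft_flag": ["California"],        # acceptable, note for awareness
            "hard_flag": "any other jurisdiction",
        },
        "liability_cap": {
            "accept": "cap >= 12 months of fees",
            "soft_flag": "cap between 6 and 12 months of fees",
            "hard_flag": "uncapped, or under 6 months of fees",
        },
        "payment_terms": {
            "accept": "net 30 or shorter",
            "soft_flag": "net 45",
            "hard_flag": "net 60 or longer",
        },
    },
}
```

Prose values are fine at this stage; step 3 turns the ones that can be checked mechanically into deterministic rules.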
- Extract clauses with structured output, citing source spans
For each inbound contract, ask the model to extract each clause from the playbook, returning the exact source text (verbatim quote with span position) plus a normalised summary in plain English. The verbatim quote matters — it’s how reviewers verify what the contract actually says, not what the model paraphrased. Skipping the source-span requirement is how AI contract review loses trust: the summary is plausible, the actual clause says something different, and a wrong redline goes back to the counterparty.
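A minimal extraction sketch using the Anthropic Python SDK — any long-context model API works, and the prompt wording, JSON shape, and model name are assumptions, not a fixed recipe. The loop at the end is the part that earns trust: every quote the model returns is verified against the source text before anyone reads the summary.

```python
import json
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from env

client = anthropic.Anthropic()

PROMPT = """Extract the following clauses from the contract below. For each
clause return a JSON object with keys: "clause" (name), "quote" (exact
verbatim text, or null if the clause is absent), and "summary" (one
plain-English sentence). Return a JSON array only, no commentary.

Clauses: {clauses}

Contract:
{contract}"""

def extract_clauses(contract_text: str, clause_names: list[str]) -> list[dict]:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any long-context model
        max_tokens=4096,
        messages=[{"role": "user", "content": PROMPT.format(
            clauses=", ".join(clause_names), contract=contract_text)}],
    )
    clauses = json.loads(response.content[0].text)
    for c in clauses:
        # Verify the quote actually appears verbatim in the source. A quote
        # that fails this check gets surfaced, never silently trusted.
        c["verified"] = bool(c["quote"]) and c["quote"] in contract_text
        if c["verified"]:
            c["start"] = contract_text.find(c["quote"])  # span position
    return clauses
```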
- Compare extracted clauses to the playbook — rules first, model second
Run deterministic rules first: governing law is one of your acceptable jurisdictions; liability cap is at least your minimum; payment terms aren’t longer than your maximum. Use the model only for the harder judgement calls: is this indemnity provision standard or unusual? Does this confidentiality clause include the carve-outs you require? The rule-first pattern is more reliable than letting the model judge everything — deterministic rules don’t drift, and they produce a clean audit trail.
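A sketch of the rule layer, assuming the extraction step has already normalised clause values into comparable fields — the field names and thresholds here are illustrative:

```python
ACCEPTABLE_JURISDICTIONS = {"Delaware", "New York", "England and Wales"}
MIN_LIABILITY_CAP_MONTHS = 12   # assumption: caps normalised to months of fees
MAX_PAYMENT_TERMS_DAYS = 45

def run_rules(clauses: dict) -> list[dict]:
    """Deterministic first pass. Each flag carries a human-readable
    reason -- that's the audit trail. Anything the rules can't decide
    (indemnity wording, confidentiality carve-outs) goes to the model."""
    flags = []
    if clauses["governing_law"] not in ACCEPTABLE_JURISDICTIONS:
        flags.append({"clause": "governing_law", "level": "red",
                      "reason": f"{clauses['governing_law']!r} is not an "
                                "approved jurisdiction"})
    if clauses["liability_cap_months"] < MIN_LIABILITY_CAP_MONTHS:
        flags.append({"clause": "liability_cap", "level": "red",
                      "reason": "cap below playbook minimum"})
    if clauses["payment_terms_days"] > MAX_PAYMENT_TERMS_DAYS:
        flags.append({"clause": "payment_terms", "level": "yellow",
                      "reason": "payment terms longer than standard"})
    return flags
```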
- Generate the triage report — clean pass, soft flags, hard flags
For each contract, produce a one-page report with three sections: (1) clauses that match playbook standard — green, no action; (2) clauses that diverge but stay within acceptable parameters — yellow, note for awareness; (3) clauses that require human review — red, with source spans attached and suggested redline language where applicable. The report is what counsel reads; the format should be optimised for a lawyer’s skim, not for showcasing the AI. Keep summaries short, keep source quotes verbatim, keep the rationale for each flag explicit.
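One way to render the report, as a sketch — red first, because that’s what counsel reads; the dict keys match the extraction and rule sketches above:

```python
def render_report(contract_name: str, results: list[dict]) -> str:
    """One-page triage report grouped red / yellow / green. Each result
    carries: clause, level, summary, reason, quote."""
    sections = {"red": [], "yellow": [], "green": []}
    for r in results:
        sections[r["level"]].append(r)
    lines = [f"Triage report: {contract_name}", ""]
    titles = [("red", "REQUIRES HUMAN REVIEW"),
              ("yellow", "DIVERGES, WITHIN ACCEPTABLE PARAMETERS"),
              ("green", "MATCHES PLAYBOOK STANDARD")]
    for level, title in titles:
        lines.append(f"{title} ({len(sections[level])})")
        for r in sections[level]:
            lines.append(f"- {r['clause']}: {r['summary']}")
            if level != "green":
                lines.append(f"    why flagged: {r['reason']}")
                lines.append(f"    source: \"{r['quote']}\"")  # always verbatim
        lines.append("")
    return "\n".join(lines)
```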
- Route by flag level — only red-flagged contracts reach counsel
Three routes: (a) all green and no yellow → contract proceeds to standard signing flow with the triage report attached as a record; (b) green plus minor yellow → routes to a non-lawyer reviewer (legal ops, contract manager) for sign-off; (c) any red flag → routes to counsel with the report, the source contract, and suggested redlines. The routing is what saves counsel’s time — the goal is that they only read contracts that genuinely warrant attorney attention.
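The routing itself is a few lines; a sketch under the same flag schema as above, with queue names as placeholders:

```python
def route(flags: list[dict]) -> str:
    """Worst flag wins. Returns the destination queue (names illustrative)."""
    levels = {f["level"] for f in flags}
    if "red" in levels:
        return "counsel_review"     # report + contract + suggested redlines
    if "yellow" in levels:
        return "ops_review"         # non-lawyer sign-off
    return "standard_signing"       # all green; report attached as a record
```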
- Track flag patterns weekly — the triage gets smarter as the corpus grows
Log every flag and the eventual outcome (counsel agreed it needed review, counsel said this is fine, contract redlined and re-extracted). Within a month you’ll see patterns: which clauses generate the most false positives (rule too strict — relax it), which generate the most false negatives (counsel caught something the rules missed — add the rule), which counterparty templates consistently trigger the same flags (worth a vendor-specific playbook). The audit loop is what makes month six dramatically better than month one.
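A minimal audit log, assuming a flat CSV is enough to start (it usually is); the outcome labels are one possible taxonomy:

```python
import csv
import datetime
from collections import Counter

AUDIT_LOG = "triage_audit.csv"  # assumption: one flat file per contract type

def log_outcome(contract_id: str, clause: str, level: str, outcome: str) -> None:
    """outcome: 'confirmed' (counsel agreed it needed review),
    'false_positive' (counsel said it's fine), or
    'false_negative' (counsel caught something the triage cleared)."""
    with open(AUDIT_LOG, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today().isoformat(),
                                contract_id, clause, level, outcome])

def flag_patterns() -> Counter:
    """Weekly review: which (clause, outcome) pairs recur? Frequent
    false positives mean a rule is too strict; false negatives mean
    a rule is missing."""
    with open(AUDIT_LOG, newline="") as f:
        return Counter((row[2], row[4]) for row in csv.reader(f))
```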
What it costs and what to expect
The API cost is small: at current long-context pricing, a routine 5–25k-token contract costs cents, not dollars, per pass. Freed counsel time is the operational metric; that’s what determines whether the workflow is paying off. Track it explicitly — the legal team will thank you.
Other ways to solve this
Full CLM platforms with AI built in (Ironclad, LinkSquares, Workday Contracts [formerly Evisort], Agiloft, ContractPodAi). Turnkey contract lifecycle management with clause extraction, repository, approval workflows, and reporting. Right answer for legal teams that want a complete system rather than building. Trade-off: higher per-month cost, less control over the AI behaviour, and platform lock-in. Strong fit for larger legal teams or regulated industries.
AI-augmented drafting tools (Spellbook, Harvey, LawGeex). Different problem — these help lawyers draft and review faster, not triage at the inbox. Useful complement to the triage workflow; sits at the next step in the pipeline (after triage flags a contract, the lawyer uses Harvey or Spellbook for the review itself). Pricing tends to be per-seat for legal users.
Paralegal triage. Still the right answer for low-volume teams where the cost of building exceeds the cost of a paralegal hour. The threshold is roughly 20–30 routine contracts a month; below that, human triage is faster than the system you’d build.
No triage — counsel reads everything. The current baseline at many companies. Honest answer for very-low-volume operations; expensive for anything else. The pain isn’t dollars (counsel is already on payroll); it’s calendar — every signed contract has a counsel-review queue in front of it, and that queue is the bottleneck on time-to-close.
Related work
For the broader document-extraction pattern this fits into, see Extract structured data from PDFs. For classifying contracts by type before triage, see Document classification at scale. For pulling contract-related action items out of email threads (renewal reminders, signature requests), see Email-to-task automation. For the broader privacy-and-data-handling implications of running contracts through an LLM, see AI privacy — what to watch for.
FAQ
Is this legal advice? Is the AI's output privileged?
No, and no. This workflow is a triage and operational tool — it surfaces clauses for human review, it does not provide legal advice. Output is not protected by attorney-client privilege unless an attorney is in the loop and the privilege is explicitly preserved through the workflow design. Talk to your counsel about how the triage output is stored, who sees it, and how it interacts with privilege. For sensitive contracts, the triage should run inside the attorney's workflow (counsel reviews the output as part of their work product) rather than as a standalone ops layer.
What about confidentiality — can we send contracts through an LLM API?
Use a vendor with an enterprise data-exclusion tier (Anthropic Enterprise, OpenAI Team / Enterprise, Google AI Enterprise) that excludes your inputs from training and offers a documented data-retention policy. For highly sensitive contracts (M&A, regulated industries, customer-PII-heavy), consider self-hosting an open-source model — see build a private knowledge base for the architecture. Many in-house legal teams are comfortable with enterprise-tier APIs for routine contracts and reserve self-hosting for the sensitive subset.
How do we keep the playbook current as our business changes?
Quarterly review with the legal team. Each quarter, ask: which clauses have we redlined recently that aren't yet in the playbook? Which standards have shifted (new state privacy laws, new corporate policies, new insurance requirements)? Update the playbook; re-run the triage on the recent contracts to verify the new rules behave as expected. The playbook should evolve at roughly the same pace as your business; static playbooks decay within a year.
What about contracts in other languages?
Modern LLMs handle 30+ languages competently for clause extraction. Quality holds well in the major business languages (English, Spanish, French, German, Portuguese, Japanese, Mandarin) and drops in lower-resource languages. For multilingual operations, maintain a playbook per language (clause patterns vary by legal tradition) rather than translating one master playbook. The triage workflow itself is identical; the rules and examples differ by jurisdiction.
How is this different from a CLM like Ironclad or LinkSquares?
A CLM is the system of record — repository, approval workflow, e-signature integration, reporting. The triage workflow described here is the AI layer that sits above (or alongside) the CLM. Modern CLMs increasingly bundle the AI triage; pre-AI CLMs require you to layer the triage on top. If you don't have a CLM yet and process meaningful contract volume, evaluate AI-bundled CLM platforms before building. If you have a CLM but limited AI features, the triage workflow described here is a reasonable supplement.
What if the AI surfaces a clause as 'acceptable' but it turns out to be problematic?
That's a false negative, and it's the failure mode the audit loop is designed to catch. Every cleared contract should still go through your standard signing workflow with the triage report attached — the report is supplementary, not authoritative. When counsel reviews a contract that the triage cleared and spots a problem, log the pattern, update the playbook or the rules, and re-test on the historical sample. The system gets better; the human-in-the-loop never goes away for material contracts.