A marketing lead decides on a Friday afternoon to “just AI-translate the site into French” for next week’s launch. By Monday she has a French site. By the following month her brand looks slightly off to native French readers: flat tone, awkward idioms, a product tagline that means something different from what was intended. The work takes a weekend to ship and six months to undo.
AI translation is genuinely good in 2026 — better than it has ever been — but the failure modes are subtle, language-specific, and uneven across tools. The right question isn’t “which is best.” It’s “which level of quality does this content need, and which tool meets that bar?”
What follows: the side-by-side plus the decision rule. Snapshot is current as of May 2026; the AI-translation market moves quickly — see the change log for the freshness check.
The comparison matrix
| | DeepL | ChatGPT (GPT-5) | Claude (Sonnet 4.6) | Google Translate | Amazon Translate |
|---|---|---|---|---|---|
| Architecture | Specialist neural-MT service tuned for translation | General-purpose LLM with strong multilingual training | General-purpose LLM with strong multilingual training | Specialist neural-MT service from Google | Specialist neural-MT service from AWS |
| Quality on European language pairs (EN↔DE, FR, ES, IT) | Strongest — idiomatic accuracy is the DeepL specialty | Strong — close to DeepL, better at steerable tone | Strong — close to DeepL, better at steerable tone | Strong but flatter — accurate, less idiomatic | Competent — improving but still behind on subtlety |
| Quality on Asian language pairs (EN↔JA, KO, ZH) | Good and improving; the gap is narrowing, but language-specialist services still lead | Strong; ChatGPT consistently leads on lower-resource Asian pairs | Strong; Claude is close to GPT here | Strong on volume; less reliable on register | OK; uneven across the family |
| Quality on low-resource languages | Limited language coverage; not the strength | Broadest coverage of the LLM-based options | Broad coverage; some gaps vs ChatGPT | Widest language coverage in the comparison (130+) | Wide coverage; quality varies sharply by language |
| Formality / register control | Yes — formal / informal toggle on many language pairs | Strongest — prompt-steerable tone, audience, idioms | Strongest — prompt-steerable tone, audience, idioms | Limited — newer "formality" feature on select pairs | Limited — formality only on a subset of pairs |
| Glossary / term injection | Yes — glossary feature on Pro and API plans | Yes via system prompt; not a structured glossary | Yes via system prompt; not a structured glossary | Yes — Translation API supports glossaries | Yes — custom terminology supported |
| Document-format preservation (PDF, DOCX, HTML) | Yes — document translation preserves formatting on supported file types | No native document support; requires extraction pipeline | No native document support; requires extraction pipeline | Yes — document translation API supports common formats | Yes — supports DOCX, XLSX, PPTX, HTML |
| Pricing — Free tier | 500,000 chars/month free with API key | Limited consumer free tier (not API) | Limited consumer free tier (not API) | 500,000 chars/month free on Cloud Translation | No free tier; pay-as-you-go from first char (free tier ended) |
| Pricing — Paid (per million characters typical) | $20/million chars (Pro API) | ~$3/million chars at API tier (varies by output length; LLM not char-billed) | ~$4/million chars at API tier (varies by output length; LLM not char-billed) | $20/million chars (NMT) · $5/million (NMT Lite tier) | $15/million chars (NMT) |
| Latency at scale | Low — purpose-built service | Higher — LLM inference; latency varies with prompt size | Higher — LLM inference; latency varies with prompt size | Lowest — Google's NMT is heavily optimised | Low — AWS NMT is well-optimised |
| API maturity | Excellent — translation-specific API for a decade | Excellent — general LLM API, broad ecosystem | Excellent — general LLM API, broad ecosystem | Excellent — translation API since 2016 | Excellent — translation API since 2017 |
| Data residency / EU compliance | EU-based; GDPR-friendly by default | US-based; EU compliance via Enterprise / Business plans | US-based; EU compliance via Enterprise / Business plans | Multi-region; data-residency controls available | Multi-region; data-residency controls available |
What to actually use
For European language pairs where idiomatic accuracy matters — DeepL. The lead on EN↔DE / FR / ES / IT / NL is real and consistent; the document-format preservation makes it the fastest path from “we have a website in English” to “we have a website in three languages.” Subscription plus glossary plus the formality toggle covers most marketing-content workflows.
For content where you need to steer the tone, audience, or terminology — GPT or Claude. The ability to say “translate this product description into French for a younger audience, keep the product names in English, use tu not vous, and don’t translate the call-to-action” beats every specialist NMT service. The trade-off: no native document support, so you’ll build an extraction-and-reinsertion pipeline yourself or pay a tool that does it.
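That kind of steering instruction is just structured text. A minimal sketch of how a team might assemble one per job — the helper name and constraint phrasing are illustrative, not any vendor's schema:

```python
# Hypothetical sketch: composing a steerable translation prompt for an LLM API.
# build_translation_prompt and its constraint strings are illustrative only.

def build_translation_prompt(text, target_lang, constraints):
    """Assemble a single instruction block from per-job constraints."""
    lines = [f"Translate the text below into {target_lang}."]
    lines += [f"- {c}" for c in constraints]
    lines.append("Return only the translation.")
    return "\n".join(lines) + "\n\n" + text

prompt = build_translation_prompt(
    "Meet the new Foo Widget. Setup in minutes.",   # hypothetical copy
    "French",
    [
        "Write for a younger audience; use 'tu', not 'vous'.",
        "Keep product names in English.",
        "Do not translate the call-to-action.",
    ],
)
```

The resulting string would be sent as the message to whichever LLM API the team uses; the point is that the constraints live in version-controlled code, not in someone's chat history.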
For breadth of language coverage at industrial scale — Google Translate or Amazon Translate. When you need 130 languages, when latency matters, when the workload is per-line product-catalog translation rather than long-form prose, specialist NMT services are the right tool. Quality is good; tone is flat.
For anything legal, medical, financial, or marketing-launch — a human translator, optionally with AI assist as a first pass. The cost differential is high (USD 0.10–0.25 per word versus a few dollars per million characters) but the cost of a translation error in any of those contexts is much higher. AI translation is a productivity tool for translators, not a replacement.
What you'll actually pay
The pricing comparison is asymmetric: LLM-based translation is character-cheaper than specialist NMT for most pairs, but you pay in engineering effort to handle documents, formatting, and scale. For high-volume catalog translation, specialist NMT wins on operational simplicity; for steerable marketing-quality translation, LLMs win on flexibility.
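A back-of-envelope comparison using the prices in the table above, for a hypothetical 2M-character-per-month catalog. The LLM figure is approximate because LLM APIs bill by token rather than character:

```python
# Monthly-cost sketch from this article's May 2026 price snapshot.
# The 2M chars/month workload is a made-up example; LLM prices are
# approximations since those APIs are token-billed, not char-billed.

PRICE_PER_M_CHARS = {
    "DeepL Pro API": 20.0,
    "Google NMT": 20.0,
    "Amazon Translate": 15.0,
    "GPT-5 (approx.)": 3.0,
}

monthly_chars = 2_000_000
costs = {tool: price * monthly_chars / 1_000_000
         for tool, price in PRICE_PER_M_CHARS.items()}
# DeepL and Google: $40/month; GPT-5: roughly $6/month,
# before the engineering cost of the document pipeline.
```

At these volumes the per-character spread is small money either way; the real cost driver is the engineering column, which is why the decision rule above is about content type, not price.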
The work AI translation should not be doing alone
There is a specific kind of content where AI translation is irresponsible, not just suboptimal:
- Legal documents — contracts, terms of service, privacy policies, regulatory filings. A mistranslation in a contract is a legal liability; the cost of a human translator is small relative to the exposure.
- Medical and clinical content — drug instructions, clinical-trial documents, patient-facing health information. Specialist medical translators exist for a reason.
- Financial disclosures and investor communications — regulated language, precise number formatting, jurisdiction-specific conventions. Use AI for internal drafts; ship a human-translated version.
- Marketing launches into a new market — the first thing your brand says in a language is the first impression. Spend the money for a native translator on launch content; AI translation can handle the long tail once your brand voice is established in the new market.
- Anything that will be quoted in press or PR — your CEO’s quote, your press release, the line your customer success team will hear back from journalists. AI translation flattens nuance in exactly the places press attention amplifies it.
For everything else — product descriptions in a large catalog, help-centre articles after the launch, internal documentation, social-media variants — AI translation is the right tool, with a human reviewer in the loop on the published flow.
Volatility notes
This category is moving but stabilising. Concrete watch-list for the next two quarters:
- DeepL’s lead on European languages. Narrowing, but real. Each new flagship LLM closes a fraction of the gap; specialist NMT services have responded with steerability features that look more LLM-like. The architecture distinction may matter less by 2027.
- Google Translate’s quality vs Cloud Translation API quality. The consumer product and the API have diverged: Cloud Translation offers higher-quality models on paid tiers. Don’t benchmark on the free consumer site if you’re evaluating for production.
- Pricing pressure on specialist NMT. AWS, Google, and DeepL all face the same competitive pressure from LLM-based translation being cheaper per character. Expect at least one price drop in 2026.
- Translation memory + LLM hybrids. Tools that combine traditional translation memory (Trados, MemoQ) with LLM steering are emerging. Worth watching for high-volume teams.
Re-verify every 3–6 months. Pricing rows in particular are the most likely to drift.
Related work
For the cost-vector story behind freemium translation tools, see The hidden costs of “free” AI tools. For the broader content-team workflow that translation fits into, see First-draft marketing copy without the AI tells. For scalable image accompaniment (alt text in multiple languages), see Generate alt text at scale.
FAQ
Can I post-edit AI translations with a human and call it "human translation"?
There's an industry term for this: machine-translation post-editing (MTPE). It's a recognised translator service tier, usually billed at 40–60% of full-human rates. "Human translation" without qualification typically implies full-human work; many clients and certifying bodies treat MTPE as a distinct service tier. Be explicit with translators (and with clients) about which tier you're paying for. The quality is usually better than raw MT and below full-human; the price reflects that.
What about low-resource languages — Welsh, Yoruba, Quechua, Khmer?
AI translation quality drops sharply outside the top ~30 languages. ChatGPT and Google Translate have the broadest coverage; DeepL is narrower. For published content in low-resource languages, treat AI as a research draft and budget for native-speaker review. The risk isn't that the translation is wrong word-for-word — it's that it's syntactically fluent but culturally off in ways monolingual editors can't catch.
Does fine-tuning help translation quality?
On specialist NMT, custom-trained models (AutoML Translation on Google, custom terminology + parallel-data training on AWS) improve domain-specific quality measurably — especially for technical content. On LLMs, fine-tuning is less common for translation specifically; prompt steering with glossary injection achieves much of the same outcome with less engineering effort. For most teams, glossary + good prompts is the right starting point; full fine-tuning is overkill until volume justifies it.
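"Glossary + good prompts" is mostly string assembly. A sketch of glossary injection via system prompt — the function and term pairs are illustrative, and on specialist NMT the same pairs would go into the service's structured glossary feature instead:

```python
# Illustrative sketch, not a vendor API: render a terminology glossary
# as hard constraints in an LLM translator's system prompt.

def glossary_system_prompt(glossary, source_lang="English", target_lang="German"):
    """Turn {source_term: target_term} pairs into prompt rules."""
    rules = "\n".join(f'- "{src}" must be translated as "{tgt}"'
                      for src, tgt in glossary.items())
    return (f"You translate {source_lang} to {target_lang}.\n"
            f"Always apply this terminology:\n{rules}")

# Hypothetical product glossary.
prompt = glossary_system_prompt({"dashboard": "Dashboard", "tenant": "Mandant"})
```

Keeping the glossary as data (a dict, a CSV) means the same terms can feed an LLM prompt today and a DeepL or AWS glossary upload tomorrow.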
How do I evaluate translation quality without speaking the target language?
Three reliable proxies. (1) Back-translation: translate to target, then translate back to source with a different tool, compare to original. Catches major errors but misses subtle tone failures. (2) Native-speaker spot-check: pay a translator hourly to review a representative sample. (3) Customer feedback: if your translated content is in front of users, watch which segments get questions, edits, or complaints. None of these substitutes for having a native speaker on the team for any market where translation quality matters.
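Proxy (1) can be automated as a triage filter. A rough sketch using a plain string-similarity ratio — the actual translate calls to two different services are left out, and a low score is a prompt for human review, not a verdict:

```python
# Back-translation triage sketch. Compares source text against its
# round-trip (source -> target -> source via a *different* tool).
# As noted above, this catches major errors but misses tone failures.
import difflib

def back_translation_score(original, round_tripped):
    """Similarity in [0, 1] between source text and its back-translation."""
    return difflib.SequenceMatcher(None, original.lower(),
                                   round_tripped.lower()).ratio()

# Hypothetical round-trip result for one catalog segment.
score = back_translation_score(
    "Our widget ships in two days.",
    "Our widget is delivered within two days.",
)
```

In practice a team would score every segment, sort ascending, and send the bottom slice to the hourly native-speaker spot-check from proxy (2).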
What about real-time translation in customer support?
Live-chat translation is now mature for the top languages — Intercom, Zendesk, and Front all offer it bundled or as add-ons. Quality is good enough for support contexts where the operator on each end has context to repair mistranslations. Don't deploy it as a replacement for a human agent in the target language; deploy it as a way to give a small support team broader language coverage. See customer support reply drafting for the broader pattern.