A marketing lead decides on a Friday afternoon to “just AI-translate the site into French” for next week’s launch. By Monday she has a French site. By the following month her brand looks slightly off to native French readers: flat tone, awkward idioms, a product tagline that means something different from what was intended. The work takes a weekend to ship and six months to undo.
AI translation is genuinely good in 2026 — better than it has ever been — but the failure modes are subtle, language-specific, and uneven across tools. The right question isn’t “which is best.” It’s “which level of quality does this content need, and which tool meets that bar?”
What follows: the side-by-side plus the decision rule. Snapshot is current as of May 2026; the AI-translation market moves quickly — see the change log for the freshness check.
The comparison matrix
| | DeepL | ChatGPT (GPT-5) | Claude (Sonnet 4.6) | Google Translate | Amazon Translate |
|---|---|---|---|---|---|
| Architecture | Specialist neural-MT service tuned for translation | General-purpose LLM with strong multilingual training | General-purpose LLM with strong multilingual training | Specialist neural-MT service from Google | Specialist neural-MT service from AWS |
| Quality on European language pairs (EN↔DE, FR, ES, IT) | Strongest — idiomatic accuracy is the DeepL specialty | Strong — close to DeepL, better at steerable tone | Strong — close to DeepL, better at steerable tone | Strong but flatter — accurate, less idiomatic | Competent — improving but still behind on subtlety |
| Quality on Asian language pairs (EN↔JA, KO, ZH) | Good and improving; the gap is narrowing, but language-specialist services still lead | Strong; ChatGPT consistently leads on lower-resource Asian pairs | Strong; Claude is close to GPT here | Strong on volume; less reliable on register | OK; uneven across the family |
| Quality on low-resource languages | Limited language coverage; not the strength | Broadest coverage of the LLM-based options | Broad coverage; some gaps vs ChatGPT | Widest language coverage in the comparison (130+) | Wide coverage; quality varies sharply by language |
| Formality / register control | Yes — formal / informal toggle on many language pairs | Strongest — prompt-steerable tone, audience, idioms | Strongest — prompt-steerable tone, audience, idioms | Limited — newer "formality" feature on select pairs | Limited — formality only on a subset of pairs |
| Glossary / term injection | Yes — glossary feature on Pro and API plans | Yes via system prompt; not a structured glossary | Yes via system prompt; not a structured glossary | Yes — Translation API supports glossaries | Yes — custom terminology supported |
| Document-format preservation (PDF, DOCX, HTML) | Yes — document translation preserves formatting on supported file types | No native document support; requires extraction pipeline | No native document support; requires extraction pipeline | Yes — document translation API supports common formats | Yes — supports DOCX, XLSX, PPTX, HTML |
| Pricing — Free tier | 500,000 chars/month free with API key | Limited consumer free tier (not API) | Limited consumer free tier (not API) | 500,000 chars/month free on Cloud Translation | No free tier; pay-as-you-go from first char (free tier ended) |
| Pricing — Paid (per million characters typical) | $20/million chars (Pro API) | ~$3/million chars at API tier (varies by output length; LLM not char-billed) | ~$4/million chars at API tier (varies by output length; LLM not char-billed) | $20/million chars (NMT) · $5/million (NMT Lite tier) | $15/million chars (NMT) |
| Latency at scale | Low — purpose-built service | Higher — LLM inference; latency varies with prompt size | Higher — LLM inference; latency varies with prompt size | Lowest — Google's NMT is heavily optimised | Low — AWS NMT is well-optimised |
| API maturity | Excellent — translation-specific API for a decade | Excellent — general LLM API, broad ecosystem | Excellent — general LLM API, broad ecosystem | Excellent — translation API since 2016 | Excellent — translation API since 2017 |
| Data residency / EU compliance | EU-based; GDPR-friendly by default | US-based; EU compliance via Enterprise / Business plans | US-based; EU compliance via Enterprise / Business plans | Multi-region; data-residency controls available | Multi-region; data-residency controls available |
What to actually use
For European language pairs where idiomatic accuracy matters — DeepL. The lead on EN↔DE / FR / ES / IT / NL is real and consistent; the document-format preservation makes it the fastest path from “we have a website in English” to “we have a website in three languages.” Subscription plus glossary plus the formality toggle covers most marketing-content workflows.
For content where you need to steer the tone, audience, or terminology — GPT or Claude. The ability to say “translate this product description into French for a younger audience, keep the product names in English, use tu not vous, and don’t translate the call-to-action” beats every specialist NMT service. The trade-off: no native document support, so you’ll build an extraction-and-reinsertion pipeline yourself or pay a tool that does it.
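That kind of steering instruction is just structured text. A minimal sketch of how a team might assemble one per job — the helper name and constraint phrasing are illustrative, not any vendor's schema:

```python
# Hypothetical sketch: composing a steerable translation prompt for an LLM API.
# build_translation_prompt and its constraint strings are illustrative only.

def build_translation_prompt(text, target_lang, constraints):
    """Assemble a single instruction block from per-job constraints."""
    lines = [f"Translate the text below into {target_lang}."]
    lines += [f"- {c}" for c in constraints]
    lines.append("Return only the translation.")
    return "\n".join(lines) + "\n\n" + text

prompt = build_translation_prompt(
    "Meet the new Foo Widget. Setup in minutes.",   # hypothetical copy
    "French",
    [
        "Write for a younger audience; use 'tu', not 'vous'.",
        "Keep product names in English.",
        "Do not translate the call-to-action.",
    ],
)
```

The resulting string would be sent as the message to whichever LLM API the team uses; the point is that the constraints live in version-controlled code, not in someone's chat history.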
For breadth of language coverage at industrial scale — Google Translate or Amazon Translate. When you need 130 languages, when latency matters, when the workload is per-line product-catalog translation rather than long-form prose, specialist NMT services are the right tool. Quality is good; tone is flat.
For anything legal, medical, financial, or marketing-launch — a human translator, optionally with AI assist as a first pass. The cost differential is high (USD 0.10–0.25 per word versus a few dollars per million characters) but the cost of a translation error in any of those contexts is much higher. AI translation is a productivity tool for translators, not a replacement.
What you'll actually pay
The pricing comparison is asymmetric: LLM-based translation is character-cheaper than specialist NMT for most pairs, but you pay in engineering effort to handle documents, formatting, and scale. For high-volume catalog translation, specialist NMT wins on operational simplicity; for steerable marketing-quality translation, LLMs win on flexibility.
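A back-of-envelope comparison using the prices in the table above, for a hypothetical 2M-character-per-month catalog. The LLM figure is approximate because LLM APIs bill by token rather than character:

```python
# Monthly-cost sketch from this article's May 2026 price snapshot.
# The 2M chars/month workload is a made-up example; LLM prices are
# approximations since those APIs are token-billed, not char-billed.

PRICE_PER_M_CHARS = {
    "DeepL Pro API": 20.0,
    "Google NMT": 20.0,
    "Amazon Translate": 15.0,
    "GPT-5 (approx.)": 3.0,
}

monthly_chars = 2_000_000
costs = {tool: price * monthly_chars / 1_000_000
         for tool, price in PRICE_PER_M_CHARS.items()}
# DeepL and Google: $40/month; GPT-5: roughly $6/month,
# before the engineering cost of the document pipeline.
```

At these volumes the per-character spread is small money either way; the real cost driver is the engineering column, which is why the decision rule above is about content type, not price.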
The work AI translation should not be doing alone
There is a specific kind of content where AI translation is irresponsible, not just suboptimal:
- Legal documents — contracts, terms of service, privacy policies, regulatory filings. A mistranslation in a contract is a legal liability; the cost of a human translator is small relative to the exposure.
- Medical and clinical content — drug instructions, clinical-trial documents, patient-facing health information. Specialist medical translators exist for a reason.
- Financial disclosures and investor communications — regulated language, precise number formatting, jurisdiction-specific conventions. Use AI for internal drafts; ship a human-translated version.
- Marketing launches into a new market — the first thing your brand says in a language is the first impression. Spend the money for a native translator on launch content; AI translation can handle the long tail once your brand voice is established in the new market.
- Anything that will be quoted in press or PR — your CEO’s quote, your press release, the line your customer success team will hear back from journalists. AI translation flattens nuance in exactly the places press attention amplifies it.
For everything else — product descriptions in a large catalog, help-centre articles after the launch, internal documentation, social-media variants — AI translation is the right tool, with a human reviewer in the loop on the published flow.
Volatility notes
This category is moving but stabilising. Concrete watch-list for the next two quarters:
- DeepL’s lead on European languages. Narrowing, but real. Each new flagship LLM closes a fraction of the gap; specialist NMT services have responded with steerability features that look more LLM-like. The architecture distinction may matter less by 2027.
- Google Translate’s quality vs Cloud Translation API quality. The consumer product and the API have diverged: Cloud Translation offers higher-quality models on paid tiers. Don’t benchmark on the free consumer site if you’re evaluating for production.
- Pricing pressure on specialist NMT. AWS, Google, and DeepL all face the same competitive pressure from LLM-based translation being cheaper per character. Expect at least one price drop in 2026.
- Translation memory + LLM hybrids. Tools that combine traditional translation memory (Trados, MemoQ) with LLM steering are emerging. Worth watching for high-volume teams.
Re-verify every 3–6 months. Pricing rows in particular are the most likely to drift.
Related work
For the cost-vector story behind freemium translation tools, see The hidden costs of “free” AI tools. For the broader content-team workflow that translation fits into, see First-draft marketing copy without the AI tells. For scalable image accompaniment (alt text in multiple languages), see Generate alt text at scale.
FAQ
Can I post-edit AI translations with a human and call it "human translation"?
There's an industry term for this: machine-translation post-editing (MTPE). It's a recognised translator service tier, usually billed at 40–60% of full-human rates. "Human translation" without qualification typically implies full-human work; many clients and certifying bodies treat MTPE as a distinct service tier. Be explicit with translators (and with clients) about which tier you're paying for. The quality is usually better than raw MT and below full-human; the price reflects that.
What about low-resource languages — Welsh, Yoruba, Quechua, Khmer?
AI translation quality drops sharply outside the top ~30 languages. ChatGPT and Google Translate have the broadest coverage; DeepL is narrower. For published content in low-resource languages, treat AI as a research draft and budget for native-speaker review. The risk isn't that the translation is wrong word-for-word — it's that it's syntactically fluent but culturally off in ways monolingual editors can't catch.
Does fine-tuning help translation quality?
On specialist NMT, custom-trained models (AutoML Translation on Google, custom terminology + parallel-data training on AWS) improve domain-specific quality measurably — especially for technical content. On LLMs, fine-tuning is less common for translation specifically; prompt steering with glossary injection achieves much of the same outcome with less engineering effort. For most teams, glossary + good prompts is the right starting point; full fine-tuning is overkill until volume justifies it.
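"Glossary + good prompts" is mostly string assembly. A sketch of glossary injection via system prompt — the function and term pairs are illustrative, and on specialist NMT the same pairs would go into the service's structured glossary feature instead:

```python
# Illustrative sketch, not a vendor API: render a terminology glossary
# as hard constraints in an LLM translator's system prompt.

def glossary_system_prompt(glossary, source_lang="English", target_lang="German"):
    """Turn {source_term: target_term} pairs into prompt rules."""
    rules = "\n".join(f'- "{src}" must be translated as "{tgt}"'
                      for src, tgt in glossary.items())
    return (f"You translate {source_lang} to {target_lang}.\n"
            f"Always apply this terminology:\n{rules}")

# Hypothetical product glossary.
prompt = glossary_system_prompt({"dashboard": "Dashboard", "tenant": "Mandant"})
```

Keeping the glossary as data (a dict, a CSV) means the same terms can feed an LLM prompt today and a DeepL or AWS glossary upload tomorrow.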
How do I evaluate translation quality without speaking the target language?
Three reliable proxies. (1) Back-translation: translate to target, then translate back to source with a different tool, compare to original. Catches major errors but misses subtle tone failures. (2) Native-speaker spot-check: pay a translator hourly to review a representative sample. (3) Customer feedback: if your translated content is in front of users, watch which segments get questions, edits, or complaints. None of these substitutes for having a native speaker on the team for any market where translation quality matters.
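Proxy (1) can be automated as a triage filter. A rough sketch using a plain string-similarity ratio — the actual translate calls to two different services are left out, and a low score is a prompt for human review, not a verdict:

```python
# Back-translation triage sketch. Compares source text against its
# round-trip (source -> target -> source via a *different* tool).
# As noted above, this catches major errors but misses tone failures.
import difflib

def back_translation_score(original, round_tripped):
    """Similarity in [0, 1] between source text and its back-translation."""
    return difflib.SequenceMatcher(None, original.lower(),
                                   round_tripped.lower()).ratio()

# Hypothetical round-trip result for one catalog segment.
score = back_translation_score(
    "Our widget ships in two days.",
    "Our widget is delivered within two days.",
)
```

In practice a team would score every segment, sort ascending, and send the bottom slice to the hourly native-speaker spot-check from proxy (2).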
What about real-time translation in customer support?
Live-chat translation is now mature for the top languages — Intercom, Zendesk, and Front all offer it bundled or as add-ons. Quality is good enough for support contexts where the operator on each end has context to repair mistranslations. Don't deploy it as a replacement for a human agent in the target language; deploy it as a way to give a small support team broader language coverage. See customer support reply drafting for the broader pattern.