Cyberax AI Playbook
cyberax.com
Comparison · Content & Marketing

AI translation services compared

Five AI translation services and the human translator on retainer. What each is good for, what each costs per million characters, and the kind of content where AI translation will quietly betray you in ways a monolingual editor can't catch.

At a glance Last verified · May 2026
Problem solved Pick a translation pipeline for a content team — and recognise when "AI translation" is the wrong tool and a human translator is the right answer
Best for Content managers, localisation leads, founders entering new markets
Tools DeepL, ChatGPT, Claude, Google Translate, Amazon Translate
Difficulty Intermediate
Cost $0 (free tiers, low volume) → $0.50–$25 per million characters (API tiers) → $0.10–$0.25/word (human translator)

A marketing lead decides on a Friday afternoon to “just AI-translate the site into French” for next week’s launch. By Monday she has a French site. By the following month her brand looks slightly off to native French readers — flat tone, awkward idioms, a product tagline that means something different than intended. The work takes a fortnight to ship and six months to undo.

AI translation is genuinely good in 2026 — better than it has ever been — but the failure modes are subtle, language-specific, and uneven across tools. The right question isn’t “which is best.” It’s “which level of quality does this content need, and which tool meets that bar?”

What follows: the side-by-side plus the decision rule. Snapshot is current as of May 2026; the AI-translation market moves quickly — see the change log for the freshness check.

Side by side

The comparison matrix

DeepLChatGPT (GPT-5)Claude (Sonnet 4.6)Google TranslateAmazon Translate
Architecture Specialist neural-MT service tuned for translationGeneral-purpose LLM with strong multilingual trainingGeneral-purpose LLM with strong multilingual trainingSpecialist neural-MT service from GoogleSpecialist neural-MT service from AWS
Quality on European language pairs (EN↔DE, FR, ES, IT) Strongest — idiomatic accuracy is the DeepL specialtyStrong — close to DeepL, better at steerable toneStrong — close to DeepL, better at steerable toneStrong but flatter — accurate, less idiomaticCompetent — improving but still behind on subtlety
Quality on Asian language pairs (EN↔JA, KO, ZH) Good and improving; gap narrowing but specialists in the language still beat itStrong; ChatGPT consistently leads on lower-resource Asian pairsStrong; Claude is close to GPT hereStrong on volume; less reliable on registerOK; uneven across the family
Quality on low-resource languages Limited language coverage; not the strengthBroadest coverage of the LLM-based optionsBroad coverage; some gaps vs ChatGPTWidest language coverage in the comparison (130+)Wide coverage; quality varies sharply by language
Formality / register control Yes — formal / informal toggle on many language pairsStrongest — prompt-steerable tone, audience, idiomsStrongest — prompt-steerable tone, audience, idiomsLimited — newer "formality" feature on select pairsLimited — formality only on a subset of pairs
Glossary / term injection Yes — glossary feature on Pro and API plansYes via system prompt; not a structured glossaryYes via system prompt; not a structured glossaryYes — Translation API supports glossariesYes — custom terminology supported
Document-format preservation (PDF, DOCX, HTML) Yes — document translation preserves formatting on supported file typesNo native document support; requires extraction pipelineNo native document support; requires extraction pipelineYes — document translation API supports common formatsYes — supports DOCX, XLSX, PPTX, HTML
Pricing — Free tier 500,000 chars/month free with API keyLimited consumer free tier (not API)Limited consumer free tier (not API)500,000 chars/month free on Cloud TranslationNo free tier; pay-as-you-go from first char (free tier ended)
Pricing — Paid (per million characters typical) $20/million chars (Pro API)~$3/million chars at API tier (varies by output length; LLM not char-billed)~$4/million chars at API tier (varies by output length; LLM not char-billed)$20/million chars (NMT) · $5/million (NMT Lite tier)$15/million chars (NMT)
Latency at scale Low — purpose-built serviceHigher — LLM inference; latency varies with prompt sizeHigher — LLM inference; latency varies with prompt sizeLowest — Google's NMT is heavily optimisedLow — AWS NMT is well-optimised
API maturity Excellent — translation-specific API for a decadeExcellent — general LLM API, broad ecosystemExcellent — general LLM API, broad ecosystemExcellent — translation API since 2016Excellent — translation API since 2017
Data residency / EU compliance EU-based; GDPR-friendly by defaultUS-based; EU compliance via Enterprise / Business plansUS-based; EU compliance via Enterprise / Business plansMulti-region; data-residency controls availableMulti-region; data-residency controls available
The decision

What to actually use

For European language pairs where idiomatic accuracy matters — DeepL. The lead on EN↔DE / FR / ES / IT / NL is real and consistent; the document-format preservation makes it the fastest path from “we have a website in English” to “we have a website in three languages.” Subscription plus glossary plus the formality toggle covers most marketing-content workflows.

For content where you need to steer the tone, audience, or terminology — GPT or Claude. The ability to say “translate this product description into French for a younger audience, keep the product names in English, use tu not vous, and don’t translate the call-to-action” beats every specialist NMT service. The trade-off: no native document support, so you’ll build an extraction-and-reinsertion pipeline yourself or pay a tool that does it.

For breadth of language coverage at industrial scale — Google Translate or Amazon Translate. When you need 130 languages, when latency matters, when the workload is per-line product-catalog translation rather than long-form prose, specialist NMT services are the right tool. Quality is good; tone is flat.

For anything legal, medical, financial, or marketing-launch — a human translator, optionally with AI assist as a first pass. The cost differential is high (USD 0.10–0.25/word vs cents per million characters) but the cost of a translation error in any of those contexts is much higher. AI translation is a productivity tool for translators, not a replacement.

The numbers

What you'll actually pay

DeepL Pro (API entry) $5.49/month base + $25 per million chars; 500k chars/month free
DeepL Pro (high volume) Scales down; enterprise contracts negotiable
GPT-4o API (translation use) $2.50 input / $10 output per million tokens — typically ~$3 per million chars translated
Claude Sonnet 4.6 API (translation use) $3 input / $15 output per million tokens — typically ~$4 per million chars translated
Google Cloud Translation — NMT $20 per million chars; free tier 500k chars/month
Google Cloud Translation — NMT Lite $5 per million chars; cheaper but smaller language coverage
Amazon Translate — standard $15 per million chars (active text); custom terminology incurs extra
Human translator — typical rate $0.10–$0.25 per word ($100,000–$250,000 per million chars equivalent)
Human post-editor on AI draft (MT post-editing) $0.04–$0.10 per word — typically 40–60% of full-human cost
Quality drop on under-served languages (Yoruba, Khmer, Maori, etc.) Material — escalation to human review essential for any published content
Token ratio penalty — Latin-script English to CJK target Output is ~1.5–3× the source token count; matters for LLM pricing, not for char-billed NMT

The pricing comparison is asymmetric: LLM-based translation is character-cheaper than specialist NMT for most pairs, but you pay in engineering effort to handle documents, formatting, and scale. For high-volume catalog translation, specialist NMT wins on operational simplicity; for steerable marketing-quality translation, LLMs win on flexibility.

When to escalate to a human

The work AI translation should not be doing alone

There is a specific kind of content where AI translation is irresponsible, not just suboptimal:

  • Legal documents — contracts, terms of service, privacy policies, regulatory filings. A mistranslation in a contract is a legal liability; the cost of a human translator is small relative to the exposure.
  • Medical and clinical content — drug instructions, clinical-trial documents, patient-facing health information. Specialist medical translators exist for a reason.
  • Financial disclosures and investor communications — regulated language, precise number formatting, jurisdiction-specific conventions. Use AI for internal drafts; ship a human-translated version.
  • Marketing launches into a new market — the first thing your brand says in a language is the first impression. Spend the money for a native translator on launch content; AI translation can handle the long tail once your brand voice is established in the new market.
  • Anything that will be quoted in press or PR — your CEO’s quote, your press release, the line your customer success team will hear back from journalists. AI translation flattens nuance in exactly the places press attention amplifies it.

For everything else — product descriptions in a large catalog, help-centre articles after the launch, internal documentation, social-media variants — AI translation is the right tool, with a human reviewer in the loop on the published flow.

What changes between now and the next refresh

Volatility notes

This category is moving but stabilising. Concrete watch-list for the next two quarters:

  • DeepL’s lead on European languages. Narrowing, but real. Each new flagship LLM closes a fraction of the gap; specialist NMT services have responded with steerability features that look more LLM-like. The architecture distinction may matter less by 2027.
  • Google Translate’s quality vs Cloud Translation API quality. The consumer product and the API now diverge — Cloud Translation now offers higher-quality models on paid tiers. Don’t benchmark on the free consumer site if you’re evaluating for production.
  • Pricing pressure on specialist NMT. AWS, Google, and DeepL all face the same competitive pressure from LLM-based translation being cheaper per character. Expect at least one price drop in 2026.
  • Translation memory + LLM hybrids. Tools that combine traditional translation memory (Trados, MemoQ) with LLM steering are emerging. Worth watching for high-volume teams.

Re-verify every 3–6 months. Pricing rows in particular are the most likely to drift.

What's next

Related work

For the cost-vector story behind freemium translation tools, see The hidden costs of “free” AI tools. For the broader content-team workflow that translation fits into, see First-draft marketing copy without the AI tells. For scalable image accompaniment (alt text in multiple languages), see Generate alt text at scale.

Common questions

FAQ

Can I post-edit AI translations with a human and call it "human translation"?

There's an industry term for this: MT post-editing (MTPE), and it's a recognised translator service tier — usually billed at 40–60% of full-human rates. "Human translation" without qualification typically implies full-human work; many clients and certifying bodies treat MTPE as a distinct service tier. Be explicit with translators (and with clients) about which tier you're paying for. The quality is usually better than raw MT and below full-human; the price reflects that.

What about low-resource languages — Welsh, Yoruba, Quechua, Khmer?

AI translation quality drops sharply outside the top ~30 languages. ChatGPT and Google Translate have the broadest coverage; DeepL is narrower. For published content in low-resource languages, treat AI as a research draft and budget for native-speaker review. The risk isn't that the translation is wrong word-for-word — it's that it's syntactically fluent but culturally off in ways monolingual editors can't catch.

Does fine-tuning help translation quality?

On specialist NMT, custom-trained models (AutoML Translation on Google, custom terminology + parallel-data training on AWS) improve domain-specific quality measurably — especially for technical content. On LLMs, fine-tuning is less common for translation specifically; prompt steering with glossary injection achieves much of the same outcome with less engineering effort. For most teams, glossary + good prompts is the right starting point; full fine-tuning is overkill until volume justifies it.

How do I evaluate translation quality without speaking the target language?

Three reliable proxies. (1) Back-translation: translate to target, then translate back to source with a different tool, compare to original. Catches major errors but misses subtle tone failures. (2) Native-speaker spot-check: pay a translator hourly to review a representative sample. (3) Customer feedback: if your translated content is in front of users, watch which segments get questions, edits, or complaints. None of these substitutes for having a native speaker on the team for any market where translation quality matters.

What about real-time translation in customer support?

Live-chat translation is now mature for the top languages — Intercom, Zendesk, and Front all offer it bundled or as add-ons. Quality is good enough for support contexts where the operator on each end has context to repair mistranslations. Don't deploy it as a replacement for a human agent in the target language; deploy it as a way to give a small support team broader language coverage. See customer support reply drafting for the broader pattern.

Sources & references

Change history (1 entry)
  • 2026-05-11 Initial publication.