
AI Bulk Translation of a Content Site: The Pipeline That Works

Translate an existing MDX content site with AI in 2026: batched Claude/GPT calls, the 50%-off Batch API for cost, terminology lock-ins, MDX-safe validation, and the QA pass that catches the 5% of translations Google's spam systems would penalize.

Published: May 23, 2026 | Updated: Jun 04, 2026 | Author: AI Productivity Guide Team

Translating 500 articles by hand is a multi-month project. With a frontier model and a 50-line script it is a long weekend — but only if you treat it as a batch data job, not a copy-paste session. This is the exact pipeline I used to translate this site (1,209 articles per language) from English into Chinese, including the terminology lock-ins, frontmatter rules, MDX validation, and the QA pass that keeps Google from flagging the translated set as machine-generated.

TL;DR

Translate 5-10 files per LLM call with a system prompt that locks terminology and forbids paraphrase. Keep H2 count and code-block count identical to the source.
Pass frontmatter through field-by-field: translate only title and description; keep urlSlug and translationKey identical so the bilingual nav can pair sister articles.
Use the Anthropic Batch API for a flat 50% discount on input and output tokens (results within 24h). For 100 typical articles, expect roughly $3-4 on Sonnet 4.6 batch and about $5-7 on Opus 4.7.
Run a structural validator before saving each file (same headings, code fences, link count) and a brace audit, since stray {var} patterns break the whole MDX build.
Edit before you publish. As of June 2026, Google’s quality systems detect unedited machine translation with high accuracy and can suppress rankings across every language version, not just the translated one. Spot-check 10%; native-review your top-traffic 5%.

Why pipeline correctness, not language quality, is the bottleneck

Frontier models (Claude Opus 4.7, Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro) translate technical prose well enough that fluency is no longer the limiting factor. With 1M-token context windows on Opus 4.7, Sonnet 4.6, and Gemini 3.1 Pro (as of June 2026), you can even feed a glossary plus several full articles in a single call without truncation. The actual failure modes are mechanical:

MDX syntax breaks — a stray { or < in translated prose fails the build for the entire site.
Frontmatter fields get translated when they should pass through verbatim.
Terminology drifts across files (the same product name rendered three ways).
Links point to the wrong language version, or the URL slug gets translated.
The model paraphrases instead of translating, quietly changing meaning.

Solve these five and the output is publishable with a 5-10% human review pass. The interactive approach (paste into ChatGPT, copy the output, save) works for ten files. At 500 you need a script; at 2,000 the script needs caching, retries, and a resume log. Build it as a batch job from day one — the 2-3 hours of upfront scripting saves days later.

When this pipeline is the right call

You have 50+ articles in one language and want a second-language version.
Google Translate or DeepL broke your MDX braces, frontmatter, or links.
You have a glossary (product names, jargon) that must translate consistently.
You want the translated site indexed as real content, not flagged as a thin clone.

What it costs (June 2026)

The cheap part is the model; the expensive part is your review time. A typical 1,500-word article is roughly 2,500 input tokens and, because Chinese output expands a little, around 3,000 output tokens. Run those numbers through the Batch API (a flat 50% off both input and output, results within 24 hours) and a full corpus is cheaper than a single freelance translator’s day rate.

| Model | Standard price ($ per 1M tokens, in / out) | Batch price (50% off) | ~Cost per 100 articles (batch) |
|---|---|---|---|
| Sonnet 4.6 | $3 / $15 | $1.50 / $7.50 | ~$3-4 |
| Opus 4.7 | $5 / $25 | $2.50 / $12.50 | ~$5-7 |
| GPT-5.5 | $5 / $30 | OpenAI Batch API, also ~50% off | ~$5-8 |
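The per-100 column is simple arithmetic. Here is a sketch that reproduces it, assuming the 2,500-in / 3,000-out token counts above plus roughly 3,000 tokens of system prompt and glossary sent uncached with every article (`overheadIn` is an illustrative guess, not a measured value):

```javascript
// Estimate translation cost for a corpus. Prices are USD per 1M tokens.
// overheadIn is an assumed per-request system prompt + glossary size (uncached).
function estimateCost({ articles, inTokens, outTokens, overheadIn = 0, inPrice, outPrice, batch = true }) {
  const discount = batch ? 0.5 : 1; // Batch API: flat 50% off input and output
  const totalIn = articles * (inTokens + overheadIn);
  const totalOut = articles * outTokens;
  return ((totalIn / 1e6) * inPrice + (totalOut / 1e6) * outPrice) * discount;
}

const base = { articles: 100, inTokens: 2500, outTokens: 3000, overheadIn: 3000 };
console.log('Sonnet 4.6 batch:', estimateCost({ ...base, inPrice: 3, outPrice: 15 }).toFixed(2));
console.log('Opus 4.7 batch:', estimateCost({ ...base, inPrice: 5, outPrice: 25 }).toFixed(2));
console.log('Opus 4.7 real-time:', estimateCost({ ...base, inPrice: 5, outPrice: 25, batch: false }).toFixed(2));
```

Under those assumptions the sketch gives about $3.1 for Sonnet 4.6 and $5.1 for Opus 4.7 on batch, and about $10 for Opus at real-time rates. Caching the glossary shrinks the overhead term further.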

Batch and real-time output are identical in quality; you only trade latency. The batch discount also stacks with prompt caching: cache the long, static glossary + system prompt, and later requests pay the much lower cache-read rate for those tokens instead of the full input price on every file. For a one-time corpus translation, Sonnet 4.6 on the Batch API is the value pick; reserve Opus 4.7 for your highest-traffic pages where the marginal fluency is worth it. See the Anthropic batch processing docs for the API shape.

Pipeline shape

Build a glossary file: 30-100 terms in source and target language, with notes on tone. Example: "prompt" -> "prompt" (keep English), "workflow" -> "工作流", "shipping" -> "上线" (not “运输”).
Write a system prompt that includes: the glossary, “do not translate code blocks”, “do not translate URLs”, “do not translate frontmatter fields except title and description”, “keep the publishedAt date verbatim”, “preserve all MDX components and braces exactly”.
Process files in batches. Each batch: 5-10 articles per LLM call. Use the source MDX as input, expect the translated MDX as output. Keep a done.txt list so you can resume after interruptions.
Validate output structurally: same number of ## headings, same code blocks, same number of [](url) links. Any mismatch goes to a manual review queue.
Run a brace audit. MDX breaks on stray {var} patterns. Strip or escape them in both source and target.
Human spot-check 10% by random sample. Read full article in target language. Fix terminology drift in the glossary, re-run affected files.
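The structural validation and brace audit steps above are mechanical enough to script. A minimal sketch, assuming plain regex counting is good enough for your corpus: it counts `## ` headings, fence lines, and Markdown `[text](url)` links, and compares `{` / `}` counts with code stripped out, so braces the source already had (JSX props, MDX expressions) pass while new ones introduced in translated prose are flagged. The function names are illustrative, not from any library.

```javascript
// Remove fenced code blocks (``` or ~~~) and inline `code` so prose checks ignore them.
function stripCode(mdx) {
  return mdx
    .replace(/^(```|~~~)[\s\S]*?^\1[^\n]*$/gm, '')
    .replace(/`[^`\n]*`/g, '');
}

function countMatches(text, re) {
  return (text.match(re) || []).length;
}

// Structural fingerprint of one MDX file.
function structureOf(mdx) {
  const prose = stripCode(mdx);
  return {
    h2: countMatches(prose, /^## /gm),
    fences: countMatches(mdx, /^(```|~~~)/gm), // opening + closing fence lines
    links: countMatches(prose, /\[[^\]\n]*\]\([^)\n]*\)/g),
    openBraces: countMatches(prose, /\{/g),
    closeBraces: countMatches(prose, /\}/g),
  };
}

// Returns human-readable problems; an empty list means the file can be saved.
function validateTranslation(sourceMdx, targetMdx) {
  const src = structureOf(sourceMdx);
  const dst = structureOf(targetMdx);
  const problems = [];
  for (const key of Object.keys(src)) {
    if (src[key] !== dst[key]) problems.push(`${key}: source ${src[key]}, target ${dst[key]}`);
  }
  if (dst.openBraces !== dst.closeBraces) problems.push('target has unbalanced braces');
  return problems;
}
```

Anything with a non-empty problem list goes to the manual review queue rather than being written to disk.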

Frontmatter rules

title and description: translate.
urlSlug: keep identical to source so translationKey matching works.
category, subcategory, tags: keep identical to source — these are facets, not user-facing strings.
publishedAt, author, featured, draft: keep identical.
lang: change to target language code.
translationKey: set equal to urlSlug. This is what connects EN and ZH versions of the same article.
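These rules reduce to a field-by-field diff. A sketch, assuming flat frontmatter with one `key: value` per line between `---` markers (nested or multi-line YAML would need a real YAML parser); `parseFrontmatter` and `checkFrontmatter` are hypothetical helper names:

```javascript
// Parse flat `key: value` frontmatter into raw strings (enough for equality checks).
// Assumes no nested or multi-line YAML values.
function parseFrontmatter(mdx) {
  const match = mdx.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(':');
    if (idx > 0) fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

const TRANSLATED = new Set(['title', 'description']);
const unquote = (v) => (v === undefined ? v : v.replace(/^["']|["']$/g, ''));

// Every field except title/description/lang/translationKey must pass through verbatim.
function checkFrontmatter(sourceMdx, targetMdx, targetLang = 'zh') {
  const src = parseFrontmatter(sourceMdx);
  const dst = parseFrontmatter(targetMdx);
  if (!src || !dst) return ['missing frontmatter'];
  const problems = [];
  for (const [key, value] of Object.entries(src)) {
    if (TRANSLATED.has(key) || key === 'lang' || key === 'translationKey') continue;
    if (dst[key] !== value) problems.push(`${key} changed: ${value} -> ${dst[key]}`);
  }
  if (unquote(dst.lang) !== targetLang) problems.push(`lang should be ${targetLang}`);
  if (!dst.translationKey || unquote(dst.translationKey) !== unquote(dst.urlSlug)) {
    problems.push('translationKey must equal urlSlug');
  }
  return problems;
}
```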

Minimal batch script

A working starter (Node + Anthropic SDK): it translates up to 10 files per run, one API call per file, writes the outputs, and logs progress to done.txt:

```javascript
// scripts/translate-batch.mjs
import fs from 'node:fs/promises';
import path from 'node:path';
import Anthropic from '@anthropic-ai/sdk';

const SRC_DIR = 'src/content/articles/en';
const OUT_DIR = 'src/content/articles/zh';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const GLOSSARY = await fs.readFile('glossary.md', 'utf8');
const SYSTEM = `Translate EN MDX articles to ZH.
Rules:
- Keep code fences and inline code (\`x\`) verbatim. Do not translate inside.
- Frontmatter: translate ONLY title and description. Keep urlSlug, tags, category,
  publishedAt, author exactly. Set lang: "zh". Set translationKey = urlSlug.
- Keep link slugs verbatim; only change the /en/ path prefix to /zh/.
- No paraphrase. Keep H2 count and order identical.

Glossary:
${GLOSSARY}`;

async function translateFile(src) {
  const body = await fs.readFile(src, 'utf8');
  const res = await client.messages.create({
    model: 'claude-opus-4-7', // use a Sonnet model ID for cheaper bulk runs
    max_tokens: 8000,
    system: SYSTEM,
    messages: [{ role: 'user', content: body }],
  });
  // A truncated reply would be a broken MDX file; fail loudly instead of saving it.
  if (res.stop_reason === 'max_tokens') throw new Error(`output truncated: ${src}`);
  return res.content.find((block) => block.type === 'text').text;
}

// done.txt holds one finished filename per line; a missing file means nothing is done yet.
const done = new Set((await fs.readFile('done.txt', 'utf8').catch(() => '')).split('\n'));
const files = (await fs.readdir(SRC_DIR)).filter((f) => /\.mdx?$/.test(f) && !done.has(f));
await fs.mkdir(OUT_DIR, { recursive: true });

for (const f of files.slice(0, 10)) { // up to 10 files per run, one API call each
  const out = await translateFile(path.join(SRC_DIR, f));
  await fs.writeFile(path.join(OUT_DIR, f), out);
  await fs.appendFile('done.txt', f + '\n'); // log only after the write succeeds
  console.log('translated:', f);
}
```

Run with node scripts/translate-batch.mjs repeatedly until done.txt covers everything. Resume is free because done.txt is the source of truth on progress. For a one-time bulk run, swap messages.create for the Batch API (client.messages.batches.create) to halve the token bill — you submit all jobs at once and poll for results within 24 hours.
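A minimal sketch of that swap, assuming the current `@anthropic-ai/sdk` batch methods (`messages.batches.create`, `retrieve`, and `results`). Two details matter: a batch `custom_id` only allows letters, digits, `_` and `-` (up to 64 characters), so filenames containing dots cannot be used directly, and results can come back in any order, so match on the ID. The client is passed in as a parameter so the helpers can be exercised with a stub; the function names and the one-minute poll interval are assumptions.

```javascript
// Build one batch request per file. custom_id must match [A-Za-z0-9_-]{1,64},
// so use the index as the ID and keep a map back to the filename.
function buildBatchRequests(files, bodies, { model, system, maxTokens = 8000 }) {
  const idToFile = new Map();
  const requests = files.map((file, i) => {
    const customId = `file-${i}`;
    idToFile.set(customId, file);
    return {
      custom_id: customId,
      params: { model, max_tokens: maxTokens, system, messages: [{ role: 'user', content: bodies[i] }] },
    };
  });
  return { requests, idToFile };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Submit, poll until the batch has ended, then collect text output per filename.
async function runBatch(client, requests, idToFile, pollMs = 60_000) {
  const batch = await client.messages.batches.create({ requests });
  let status = batch;
  while (status.processing_status !== 'ended') {
    await sleep(pollMs);
    status = await client.messages.batches.retrieve(batch.id);
  }
  const translated = new Map();
  const failed = [];
  for await (const entry of await client.messages.batches.results(batch.id)) {
    const file = idToFile.get(entry.custom_id);
    const msg = entry.result.type === 'succeeded' ? entry.result.message : null;
    if (msg && msg.stop_reason !== 'max_tokens') {
      translated.set(file, msg.content.find((b) => b.type === 'text').text);
    } else {
      failed.push(file); // errored, canceled, expired, or truncated: re-queue these
    }
  }
  return { translated, failed };
}
```

In the starter script you would pass `new Anthropic()` as the client and `SYSTEM` as the system prompt, write each entry of `translated` to the zh folder, and append it to done.txt. Failed files stay out of done.txt, so the next run picks them up.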

Terminology consistency

Lock product names and proper nouns in the glossary. Most “translations” of these should be “keep original”.
Lock 20-50 domain terms. For a developer audience, commit / deploy / build often stay English in Chinese. For a general audience, translate to standard equivalents.
Run a post-pass that greps the translated corpus for each glossary term. If 95% use one rendering and 5% use another, batch-replace.
For tone words (terse, conversational, professional), include 2-3 sample sentences in the system prompt showing the target tone.
Maintain the glossary in a versioned file (glossary.json or glossary.csv). When you update a term, grep the source corpus for the term (or the translated corpus for its old rendering) to find exactly which articles need re-translation.
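The grep-and-count post-pass can be sketched as follows. It assumes a glossary entry shape of `{ term, preferred, variants }` (hypothetical; adapt it to your glossary.json), counts how often each rendering appears across the translated corpus, and lists files that use a non-preferred variant so you can batch-replace them. Plain substring counting double-counts when one rendering contains another, so keep variants distinct.

```javascript
// Count each rendering of one glossary term across translated files.
// corpus: Map of filename -> translated text. The entry shape is an assumption.
function auditTerm(corpus, { term, preferred, variants }) {
  const counts = { [preferred]: 0 };
  for (const v of variants) counts[v] = 0;
  const offenders = [];
  for (const [file, text] of corpus) {
    let usesVariant = false;
    for (const rendering of [preferred, ...variants]) {
      const n = text.split(rendering).length - 1; // plain substring count, no regex escaping
      counts[rendering] += n;
      if (n > 0 && rendering !== preferred) usesVariant = true;
    }
    if (usesVariant) offenders.push(file);
  }
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const share = total ? counts[preferred] / total : 1; // fraction using the preferred rendering
  return { term, counts, share, offenders };
}
```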

Resume and retry

Log every successful translation to done.txt with a hash of the source file. If the source changes, the hash differs and that file goes back into the queue.
Retry on rate limits with exponential backoff. Frontier model rate limits in 2026 are generous but not infinite — for 2000 files you will hit them.
Cache the LLM responses by source-file hash for the first 30 days. If the script crashes mid-run, restarting only re-translates files that have not been cached yet.
Run a structural validator on every output before saving. If output is malformed, retry once with a higher temperature, then quarantine for human review.
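Hash-keyed resume and backoff are both small. A sketch under assumptions: the done log stores `filename<TAB>sha256` per line (a hypothetical format; the starter script stores bare filenames), and `withRetry` treats HTTP 429 and 5xx status codes as retryable. The official Anthropic SDK already retries some failures on its own (configurable through the `maxRetries` client option), so a wrapper like this mainly matters when you want a longer, custom backoff.

```javascript
// SHA-256 of the source text via Node's built-in crypto (dynamic import works in ESM and CommonJS).
async function sha256(text) {
  const { createHash } = await import('node:crypto');
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Parse "filename\thash" lines into a Map; blank or malformed lines are ignored.
function parseDoneLog(contents) {
  const done = new Map();
  for (const line of contents.split('\n')) {
    const [file, hash] = line.split('\t');
    if (file && hash) done.set(file, hash);
  }
  return done;
}

// A file needs (re)translation if it was never logged or its source hash changed.
async function needsTranslation(done, file, sourceText) {
  return done.get(file) !== (await sha256(sourceText));
}

// Retry fn on rate limits (429) and server errors (5xx) with exponential backoff and jitter.
async function withRetry(fn, { retries = 5, baseMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = err?.status;
      const retryable = status === 429 || (status >= 500 && status < 600);
      if (!retryable || attempt >= retries) throw err;
      const delay = baseMs * 2 ** attempt * (1 + Math.random()); // 1x-2x jitter
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```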

QA pass that actually catches things

This step is not optional, and not only for reader experience. As of June 2026, Google Search documentation is explicit that you should avoid side-by-side translations and that its quality systems detect unedited machine translation with high accuracy — an unedited bulk dump can suppress rankings across all of your language versions, not just the new pages. AdSense reviewers apply the same “original, not auto-generated” bar. A light human edit is what separates an indexable second-language site from a thin clone.

Random sample 10% of articles. Read each one fully in the target language for naturalness, not just accuracy.
Targeted checks: every code block matches source, every link’s URL is verbatim, every frontmatter field is the right value.
Native-speaker review for the highest-traffic 5% of articles. These get the most eyeballs and carry the most ranking weight; localize examples and any currency/region details rather than translating them literally.
Confirm hreflang is symmetric and self-referencing, with valid ISO codes (en, zh). Most international hreflang bugs come from non-reciprocal annotations.
Compare in-article metrics before and after translation publishes. If session duration on translated pages is below 60% of the source, the language probably reads stiff.
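Generating hreflang from one function is the simplest way to keep annotations symmetric: every language version emits the same set of alternate links, including one pointing at itself. A sketch assuming a hypothetical `https://example.com/{lang}/{slug}` URL scheme (matching the `/en/` and `/zh/` path prefixes used in the link-rewrite rule), plus a checker for pages whose alternates do not link back:

```javascript
// Build the <link rel="alternate"> tags for one article. Every language version calls
// this with the same slug and language list, so each page references itself and
// the annotations are reciprocal by construction.
function hreflangLinks(slug, langs, baseUrl = 'https://example.com') {
  return langs
    .map((lang) => `<link rel="alternate" hreflang="${lang}" href="${baseUrl}/${lang}/${slug}" />`)
    .join('\n');
}

// pages: crawled map of page URL -> list of hreflang target URLs on that page.
function findNonReciprocal(pages) {
  const problems = [];
  for (const [url, alternates] of Object.entries(pages)) {
    if (!alternates.includes(url)) problems.push(`${url}: no self-reference`);
    for (const target of alternates) {
      if (target !== url && !(pages[target] || []).includes(url)) {
        problems.push(`${target} does not link back to ${url}`);
      }
    }
  }
  return problems;
}
```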

Common mistakes

Translating one file at a time interactively. Cost and time explode; consistency drops.
Letting the model translate URLs inside Markdown links. Always pass URLs through verbatim.
Forgetting translationKey. Without it, the bilingual nav cannot link sister articles.
Skipping the structural validation step. A model may silently merge two headings into one — readers will not notice immediately, but search engines will.
Publishing translated articles without any human review. Even a 0.5% paraphrase error rate across 1,000 articles means 5 articles say something wrong.
Re-translating the whole corpus when you change one glossary term. Use a targeted re-translation script for affected files only.

FAQ

Which model is best for EN to ZH technical content?

As of June 2026, Claude Sonnet 4.6 is the cost/quality sweet spot for bulk work; Opus 4.7 edges it on the most nuanced prose, and GPT-5.5 and Gemini 3.1 Pro are both strong alternatives. Claude tends to keep technical jargon in English where that reads naturally; GPT translates more aggressively. Test on 5 articles in your domain before committing the whole run.

How much does it cost?

On the Batch API (50% off), roughly $3-4 per 100 typical articles on Sonnet 4.6 and about $5-7 on Opus 4.7. Even at full real-time rates you stay well under $20 per 100. That is cheaper than any human translator, and the total cost is dominated by your review hours, not the tokens.

Should I review every article?

No. Spot-check 10% plus a native-speaker pass on your top-traffic 5%. If the spot-check error rate is above 5%, re-prompt and re-translate the batch; below 5%, fix in place. Skipping review entirely is the one thing that gets a translated site flagged as machine-generated.

Do I need a CAT tool like memoQ?

No. The LLM does the work a CAT tool would have done, plus the translation. A simple folder structure with a done.txt log is enough.

What about RTL languages or scripts the model handles poorly?

Test the model first. For RTL (Arabic, Hebrew), validate that no code blocks or numbers got mirrored. For low-resource scripts, expect to hire a human reviewer on top.

Should I publish translated articles all at once or staged?

Stage them. Publish 50 articles, watch indexing and CTR in Search Console for a week, then publish the next 200. A sudden flood of new pages can trigger crawl rate limits and quality reviews.

What about updating existing translations when the source changes?

Diff the source file, send the diff plus the existing translation to the model, and ask for a minimal-edit translation of just the changed sections. This preserves human edits while syncing new content.

Should the translated site share a sitemap with the source or be separate?

One sitemap is fine; the canonical and hreflang tags do the heavy lifting. A sitemap per language can help debugging but does not change indexing.

How do I handle locale-specific content (currency, region examples)?

Mark these in the source with a comment the model can read. The system prompt then says “for marked content, adapt to the target locale’s conventions” rather than translate literally.
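The diff-then-minimal-edit approach for updated sources can be done at section level. A sketch, assuming sections are split on `## ` headings that are not inside code blocks, and that the old and new source keep their sections in the same order (inserted or removed sections shift the numbering, so treat a long change list as a signal to re-translate the whole file). `changedSections` and `buildUpdatePrompt` are hypothetical helpers, and the prompt wording is only an example:

```javascript
// Split MDX into sections at each "## " heading; text before the first heading is section 0.
function splitSections(mdx) {
  return mdx.split(/^(?=## )/m);
}

// Indexes of sections whose source text changed between two versions.
function changedSections(oldSource, newSource) {
  const before = splitSections(oldSource);
  const after = splitSections(newSource);
  const changed = [];
  for (let i = 0; i < Math.max(before.length, after.length); i++) {
    if (before[i] !== after[i]) changed.push(i);
  }
  return changed;
}

// User message asking the model to edit only the changed sections of an existing translation.
function buildUpdatePrompt(oldSource, newSource, existingTranslation) {
  const after = splitSections(newSource);
  const parts = changedSections(oldSource, newSource).map(
    (i) => `--- Section ${i} (updated source) ---\n${after[i] ?? '(section removed)'}`,
  );
  return [
    'Update the existing translation below. Sections are numbered by ## heading order,',
    'with the text before the first heading as section 0. Change ONLY the sections listed;',
    'keep every other sentence of the existing translation exactly as it is.',
    '',
    ...parts,
    '',
    '--- Existing translation ---',
    existingTranslation,
  ].join('\n');
}
```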

Tags: #Indie dev #ai-assisted #building #translation

