Translating 500+ articles by hand is a 6-month project; with AI it is a long weekend if you set the pipeline up right. This article walks through the batch flow that actually works for MDX content sites — including terminology lock-ins, frontmatter handling, and the QA pass that catches what would otherwise embarrass you.
Background
Modern frontier models (Claude Opus 4.7, GPT-5.5, Gemini 3) translate prose well enough that the bottleneck is no longer language quality — it is pipeline correctness. The actual risks are: MDX syntax breaks, frontmatter fields getting translated when they should not, terminology drifting across files, links pointing to the wrong language, and the model paraphrasing instead of translating. Solve these five and the result is publishable with a 5-10% human review pass.
The temptation when starting is to do it interactively — paste a file into ChatGPT, copy the output, save. This works for 10 files. At 500, you need a script. At 2000, the script needs caching, retries, and a resume log. Treat translation as a batch ETL job from day one; the upfront overhead of 2-3 hours of scripting saves days later.
How to tell
- You have 50+ articles in one language and want a second language version.
- You tried Google Translate or DeepL on MDX and the braces, frontmatter, or links broke.
- You have a glossary of terms that must translate consistently (product names, jargon).
- You want the translated site to be SEO-indexable, not a thin clone.
Quick verdict
Translate in batches of 5-10 files per LLM call, with a system prompt that locks terminology and forbids paraphrase. Pass frontmatter through verbatim with field-by-field rules. Run a final pass that compares EN and ZH file structure (same number of code blocks, headings, links). Spot-check 10% of articles manually. Expect 0.5-2 hours per 100 articles for review.
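The structural comparison can be sketched as a pair of small functions. This is a minimal illustration, not a full MDX parser: the names structureOf and compareStructure are hypothetical, headings are detected by leading # characters, and links by a simple [text](url) pattern (images match too, which is harmless because both sides are counted the same way).

```javascript
const FENCE = '`'.repeat(3); // built at runtime so this example can sit inside a code fence

// Count the structural features that should survive translation unchanged.
function structureOf(mdx) {
  let inFence = false;
  let headings = 0;
  let fences = 0;
  let links = 0;
  for (const line of mdx.split('\n')) {
    if (line.trimStart().startsWith(FENCE)) {
      if (!inFence) fences += 1; // count each block once, at its opening fence
      inFence = !inFence;
      continue;
    }
    if (inFence) continue; // ignore headings and links that appear inside code
    if (/^#{1,6}\s/.test(line)) headings += 1;
    links += (line.match(/\[[^\]]*\]\([^)\s]+\)/g) || []).length;
  }
  return { headings, fences, links };
}

// Return human-readable mismatches; an empty array means the structure matches.
function compareStructure(sourceMdx, translatedMdx) {
  const a = structureOf(sourceMdx);
  const b = structureOf(translatedMdx);
  return Object.keys(a)
    .filter((key) => a[key] !== b[key])
    .map((key) => `${key}: source ${a[key]}, translated ${b[key]}`);
}
```

Any file for which compareStructure returns a non-empty array would go to the manual review queue rather than being published.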
Pipeline shape
- Build a glossary file: 30-100 terms in source and target language, with notes on tone. Example: "prompt" -> "prompt" (keep English), "workflow" -> "工作流", "shipping" -> "上线" (not “运输”).
- Write a system prompt that includes: the glossary, “do not translate code blocks”, “do not translate URLs”, “do not translate frontmatter fields except title and description”, “keep the publishedAt date verbatim”, “preserve all MDX components and braces exactly”.
- Process files in batches. Each batch: 5-10 articles per LLM call. Use the source MDX as input, expect the translated MDX as output. Keep a done.txt list so you can resume after interruptions.
- Validate output structurally: same number of ## headings, same code blocks, same number of [](url) links. Any mismatch goes to a manual review queue.
- Run a brace audit. MDX breaks on stray {var} patterns. Strip or escape them in both source and target.
- Human spot-check 10% by random sample. Read the full article in the target language. Fix terminology drift in the glossary, re-run affected files.
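The brace audit step might look like the sketch below. It assumes that any unescaped { or } outside fenced and inline code is worth a look; the function name auditBraces is hypothetical. Lines that carry legitimate JSX props will also be reported, so treat the output as a review list rather than a list of errors.

```javascript
// Report lines containing an unescaped "{" or "}" outside fenced and inline code.
function auditBraces(mdx) {
  const fence = '`'.repeat(3); // built at runtime so this example can sit inside a code fence
  const findings = [];
  let inFence = false;
  mdx.split('\n').forEach((line, index) => {
    if (line.trimStart().startsWith(fence)) {
      inFence = !inFence;
      return;
    }
    if (inFence) return; // braces inside code blocks are fine
    const withoutInlineCode = line.replace(/`[^`]*`/g, '');
    // Match a brace at line start or after any character other than a backslash.
    if (/(^|[^\\])[{}]/.test(withoutInlineCode)) {
      findings.push({ line: index + 1, text: line.trim() });
    }
  });
  return findings;
}
```

Running it on both the source and the translated file shows whether a stray brace was already in the original or was introduced during translation.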
Frontmatter rules
- title and description: translate.
- urlSlug: keep identical to source so translationKey matching works.
- category, subcategory, tags: keep identical to source; these are facets, not user-facing strings.
- publishedAt, author, featured, draft: keep identical.
- lang: change to target language code.
- translationKey: set equal to urlSlug. This is what connects EN and ZH versions of the same article.
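One way to make these rules hard to violate is to build the target frontmatter in code instead of trusting the model to copy it. The sketch below assumes flat "key: value" frontmatter (nested and list lines pass through untouched, multi-line titles are not handled) and takes the frontmatter text without its --- delimiters. The function name buildTargetFrontmatter is hypothetical.

```javascript
const TRANSLATED_FIELDS = new Set(['title', 'description']);

// Apply the field rules: translate title/description, rewrite lang and
// translationKey, and copy every other line exactly as it appears in the source.
function buildTargetFrontmatter(sourceFrontmatter, translated, targetLang) {
  const out = [];
  let slug = null;
  for (const line of sourceFrontmatter.split('\n')) {
    const match = /^([A-Za-z_][\w-]*):\s*(.*)$/.exec(line);
    if (!match) {
      out.push(line); // indented or list lines (e.g. tag items) pass through
      continue;
    }
    const [, key, value] = match;
    if (key === 'urlSlug') slug = value;
    if (key === 'lang' || key === 'translationKey') continue; // rewritten below
    if (TRANSLATED_FIELDS.has(key) && translated[key] !== undefined) {
      // JSON.stringify yields a double-quoted string, which is also valid YAML.
      out.push(`${key}: ${JSON.stringify(translated[key])}`);
    } else {
      out.push(line);
    }
  }
  out.push(`lang: ${JSON.stringify(targetLang)}`);
  if (slug !== null) out.push(`translationKey: ${slug}`);
  return out.join('\n');
}
```

With this approach the model only has to return the translated title and description; every other field is copied mechanically, so it cannot drift.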
Minimal batch script
A working starter (Node + Anthropic SDK): translate up to 10 pending files per run, one API call per file, write the outputs, and log progress:
// scripts/translate-batch.mjs
import fs from 'node:fs/promises';
import path from 'node:path';
import Anthropic from '@anthropic-ai/sdk';

const SRC_DIR = 'src/content/articles/en';
const OUT_DIR = 'src/content/articles/zh';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const GLOSSARY = await fs.readFile('glossary.md', 'utf8');
const SYSTEM = `Translate EN MDX articles to ZH.
Rules:
- Keep code fences and inline code (\`x\`) verbatim. Do not translate inside.
- Frontmatter: translate ONLY title and description. Keep urlSlug, tags, category,
  publishedAt, author exactly. Set lang: "zh". Set translationKey = urlSlug.
- Keep text link slugs; only change /en/ to /zh/.
- No paraphrase. Keep H2 count and order identical.
${GLOSSARY}`;

async function translateFile(src) {
  const body = await fs.readFile(src, 'utf8');
  const res = await client.messages.create({
    model: 'claude-opus-4-7',
    max_tokens: 8000,
    system: SYSTEM,
    messages: [{ role: 'user', content: body }],
  });
  // A response cut off at max_tokens would otherwise be saved as a truncated article.
  if (res.stop_reason === 'max_tokens') throw new Error(`output truncated: ${src}`);
  return res.content[0].text;
}

// done.txt holds one finished filename per line; a missing file means nothing is done yet.
const doneLog = await fs.readFile('done.txt', 'utf8').catch(() => '');
const done = new Set(doneLog.split('\n').filter(Boolean));
const files = (await fs.readdir(SRC_DIR)).filter((f) => /\.mdx?$/.test(f) && !done.has(f));

await fs.mkdir(OUT_DIR, { recursive: true }); // make sure the target folder exists
for (const f of files.slice(0, 10)) { // up to 10 files per run, one API call each
  const out = await translateFile(path.join(SRC_DIR, f));
  await fs.writeFile(path.join(OUT_DIR, f), out);
  await fs.appendFile('done.txt', f + '\n'); // log only after the output is on disk
  console.log('translated:', f);
}
Run with node scripts/translate-batch.mjs repeatedly until done.txt covers everything. Resume is free because done.txt is the source of truth on progress.
Terminology consistency
- Lock product names and proper nouns in the glossary. Most “translations” of these should be “keep original”.
- Lock 20-50 domain terms. For a developer audience, commit / deploy / build often stay English in Chinese. For a general audience, translate to standard equivalents.
- Run a post-pass that greps the translated corpus for each glossary term. If 95% use one rendering and 5% use another, batch-replace.
- For tone words (terse, conversational, professional), include 2-3 sample sentences in the system prompt showing the target tone.
- Maintain the glossary in a versioned file (glossary.json or glossary.csv). When you update a term, you know exactly which articles need re-translation by grepping the translated corpus for the old rendering.
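The post-pass for terminology drift can be sketched as follows. It assumes the translated corpus is already loaded into an object mapping filename to text, and that you supply the candidate renderings of one term yourself; renderingUsage and driftedFiles are hypothetical names. Matching is by plain substring, so renderings that contain each other (for example 工作流 and 工作流程) would need a stricter check.

```javascript
// Count which translated files use each candidate rendering of one glossary term.
function renderingUsage(files, renderings) {
  const usage = new Map(renderings.map((rendering) => [rendering, []]));
  for (const [name, text] of Object.entries(files)) {
    for (const rendering of renderings) {
      if (text.includes(rendering)) usage.get(rendering).push(name);
    }
  }
  return usage;
}

// Files that use anything other than the most common rendering.
function driftedFiles(files, renderings) {
  const usage = renderingUsage(files, renderings);
  const ranked = [...usage.entries()].sort((a, b) => b[1].length - a[1].length);
  const [, ...minority] = ranked; // drop the dominant rendering, keep the rest
  return [...new Set(minority.flatMap(([, names]) => names))].sort();
}
```

For "shipping", calling driftedFiles(corpus, ['上线', '运输']) would list the files that still use the minority rendering, which are the candidates for batch-replace or re-translation.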
Resume and retry
- Log every successful translation to done.txt with a hash of the source file. If the source changes, the hash differs and that file goes back into the queue.
- Retry on rate limits with exponential backoff. Frontier model rate limits in 2026 are generous but not infinite; for 2000 files you will hit them.
- Cache the LLM responses by source-file hash for the first 30 days. If the script crashes mid-run, restarting only re-translates files that have not been cached yet.
- Run a structural validator on every output before saving. If output is malformed, retry once with a higher temperature, then quarantine for human review.
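The hash-based done log and the backoff retry can be sketched together. Several details here are assumptions rather than anything fixed by the pipeline above: the log format (filename, tab, hash per line), the 16-character truncated SHA-256, and the convention that a failed API call throws an error carrying a numeric status property.

```javascript
import { createHash } from 'node:crypto';

// Short content hash used as the done-log and cache key.
function sourceHash(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex').slice(0, 16);
}

// Done-log lines are assumed to look like "<file>\t<hash>" (hypothetical format).
function parseDoneLog(logText) {
  const done = new Map();
  for (const line of logText.split('\n')) {
    const [file, hash] = line.split('\t');
    if (file && hash) done.set(file, hash);
  }
  return done;
}

// A file is pending when it was never logged or its source changed since.
function isPending(done, file, sourceText) {
  return done.get(file) !== sourceHash(sourceText);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry on 429 and 5xx-style errors, doubling the wait after each failed attempt.
async function withBackoff(task, { retries = 5, baseMs = 1000, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      const retryable = Boolean(err) && (err.status === 429 || err.status >= 500);
      if (!retryable || attempt >= retries) throw err;
      await sleep(baseMs * 2 ** attempt);
    }
  }
}
```

In a batch script, each translation call would be wrapped as withBackoff(() => translateFile(src)), and a log line would be appended only after the structural validator passes. The sleep parameter is injectable so the retry schedule can be tested without waiting.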
QA pass that actually catches things
- Random sample 10% of articles. Read each one fully in the target language for naturalness, not just accuracy.
- Targeted checks: every code block matches source, every link’s URL is verbatim, every frontmatter field is the right value.
- Native-speaker spot check for the highest-traffic 5% of articles. These get the most eyeballs; quality matters most there.
- Compare engagement metrics on translated pages against their source pages once the translations are live. If session duration on translated pages is below 60% of the source pages' figure, something feels off in the language.
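Picking the 10% sample is easier to audit when it is reproducible. The sketch below uses a small seeded generator (mulberry32) and a Fisher-Yates shuffle so the same seed always selects the same files; the function name sampleForReview and the default seed are illustrative choices, not part of the pipeline above.

```javascript
// Small seeded PRNG (mulberry32) so the same seed always yields the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pick ceil(fraction * n) files (at least one) using a seeded Fisher-Yates shuffle.
function sampleForReview(files, fraction = 0.1, seed = 1) {
  if (files.length === 0) return [];
  const pool = [...files].sort(); // sort first so input order does not affect the result
  const random = mulberry32(seed);
  for (let i = pool.length - 1; i > 0; i -= 1) {
    const j = Math.floor(random() * (i + 1));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  const size = Math.max(1, Math.ceil(pool.length * fraction));
  return pool.slice(0, size);
}
```

Changing the seed for each review round gives a fresh sample while keeping every round repeatable.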
Common mistakes
- Translating one file at a time interactively. Cost and time explode; consistency drops.
- Letting the model translate URLs inside Markdown links. Always pass URLs through verbatim.
- Forgetting translationKey. Without it, the bilingual nav cannot link sister articles.
- Skipping the structural validation step. A model may silently merge two headings into one; readers will not notice immediately, but search engines will.
- Publishing translated articles without any human review. Even 0.5% paraphrase error rate across 1000 articles means 5 articles say something wrong.
- Re-translating the whole corpus when you change one glossary term. Use a targeted re-translation script for affected files only.
FAQ
- Which model is best for EN to ZH technical content?: Claude Opus 4.7 and GPT-5.5 are both excellent. Claude tends to keep technical jargon in English where natural; GPT translates more aggressively. Test on 5 articles in your domain before committing.
- How much does it cost?: Roughly $5-20 per 100 articles depending on length and model. Cheaper than any human translator, including non-native ones.
- Should I review every article?: Spot-check 10%. If the spot-check error rate is above 5%, re-prompt and re-translate the batch. Below 5%, fix in place.
- Do I need a CAT tool like memoQ?: No. The LLM does the work a CAT tool would have done, plus the translation. A simple folder structure with a done.txt log is enough.
- What about RTL languages or scripts the model handles poorly?: Test the model first. For RTL (Arabic, Hebrew), validate that no code blocks or numbers got mirrored. For low-resource scripts, expect to hire a human reviewer on top.
- Should I publish translated articles all at once or staged?: Stage them. Publish 50 articles, watch indexing and CTR in Search Console for a week, then publish the next 200. A sudden flood of new pages can trigger crawl rate limits and quality reviews.
- What about updating existing translations when the source changes?: Diff the source file, send the diff plus the existing translation to the model, ask for a minimal-edit translation of just the changed sections. This preserves human edits while syncing new content.
- Should the translated site share a sitemap with the source or be separate?: One sitemap is fine; the canonical and hreflang tags do the heavy lifting. Sitemap-per-language can help debugging but does not change indexing.
- How do I handle locale-specific content (currency, region examples)?: Mark these in the source with a comment the model can read. The system prompt then says “for marked content, adapt to the target locale’s conventions” rather than translate literally.