Bilingual Pages Drift Apart Over Time

English got updated 5 times, Chinese once. ZH version has outdated screenshots, broken cross-links, hreflang warnings. Audit drift, decide per pair, automate the sync.

You launched bilingual two years ago. Today, each English article has been updated 5 times since launch, while the Chinese versions are mostly untouched. ZH pages show outdated screenshots, lack sections that exist only in EN, and contain broken cross-links to pages that were renamed in EN. Search Console throws hreflang warnings on dozens of pairs. The bilingual site you proudly launched is actually one site with a fading shadow.

Bilingual content is a commitment, not a launch event. Drift compounds: 6 months of one-sided updates can leave a year of catch-up debt. The fix isn’t “translate everything now”. It is to audit the pairs and decide per pair, automate sync for future updates, and accept that some articles should be marked single-language rather than maintained badly.

Common causes

Ordered by hit rate, highest first.

1. Translation happened once at launch; updates only touch primary language

You translated 200 EN articles to ZH at launch. Since then, every update was English-only. ZH froze in time while EN evolved.

How to spot it: Compare git log dates on en/ vs zh/ versions of the same translationKey. If en is newer for >50% of pairs, you have systematic drift.
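
As a rough sketch of that comparison, the function below computes the share of pairs where EN is newer. The record shape (`key`, `enDate`, `zhDate`) is an assumption for illustration, the sample values are made up, and the dates are assumed to come from something like `git log -1 --format=%cI -- <path>` run once per file.

```javascript
// Share of pairs whose EN file has a later last-commit date than its ZH file.
// Assumed input: [{ key, enDate, zhDate }] with ISO 8601 date strings.
function enNewerShare(pairs) {
  if (pairs.length === 0) return 0;
  const enNewer = pairs.filter(p => new Date(p.enDate) > new Date(p.zhDate));
  return enNewer.length / pairs.length;
}

// Made-up sample data, one record per translationKey
const samplePairs = [
  { key: "chatgpt-tips", enDate: "2024-05-01T10:00:00+00:00", zhDate: "2022-06-01T10:00:00+00:00" },
  { key: "prompt-basics", enDate: "2022-06-01T10:00:00+00:00", zhDate: "2022-06-01T10:00:00+00:00" },
];

const share = enNewerShare(samplePairs);
console.log(`EN newer in ${Math.round(share * 100)}% of pairs`);
```

A share above 0.5 matches the "systematic drift" threshold described above.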

2. A file was renamed in one language only

You renamed gpt-tips.mdx to chatgpt-tips.mdx in EN. ZH still has gpt-tips.mdx. They no longer pair, so hreflang is broken for both pages.

How to spot it: Find translationKey values present in one language only:

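# grep -r recurses into subfolders, so this does not depend on the shell's ** (globstar) setting.
# Lines starting with "<" are EN-only keys; lines starting with ">" are ZH-only keys.
diff \
  <(grep -rh --include='*.mdx' "translationKey:" src/content/articles/en | sort -u) \
  <(grep -rh --include='*.mdx' "translationKey:" src/content/articles/zh | sort -u)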

3. New articles published in one language never got translated

You add 5 EN articles a month, but only translate 1-2 to ZH. The translation backlog grows by 3-4 articles every month. After a year, roughly 40 to 50 articles are EN-only.

How to spot it: Articles with translationKey in en/ but no matching zh/ file. Count these — that’s your translation debt.

4. Auto-translation got applied without review

To “fix drift,” you machine-translated the missing pieces. Now the non-English locale has fluent-sounding but contextually wrong content — translated copy that references the English button name when the UI shows the localized one, or English nouns sticking out in translated paragraphs. Bilingual presence with worse quality than monolingual.

How to spot it: Read 5 random recent ZH translations. If they have AI-translation tells (literal idioms, English remnants, mismatched UI terms), unreviewed auto-translation is the culprit.
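
One of those tells, English remnants, can be roughly screened for before the manual read. The sketch below measures how much of a ZH text consists of Latin letters. The function name is hypothetical, any cutoff you choose is an assumption to calibrate on your own content, and code identifiers or product names will legitimately raise the number.

```javascript
// Fraction of letters in a text that are Latin rather than Han characters.
// Digits, punctuation, and whitespace are ignored.
function latinShare(text) {
  const latin = (text.match(/[A-Za-z]/g) || []).length;
  const han = (text.match(/\p{Script=Han}/gu) || []).length;
  const total = latin + han;
  return total === 0 ? 0 : latin / total;
}

// Example: a ZH sentence that kept the English button name
console.log(latinShare("点击 Save 按钮"));
```

This only ranks candidates for the human read; it cannot tell a wrong UI term from a correct one.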

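5. Cross-links point to the wrong locale

A ZH article links to /en/articles/old-name/ because the EN version was written first and the link was copy-pasted without updating the locale (the reverse happens too). The ZH reader clicks and lands on the wrong language, or on a 404 if the target has since been renamed.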

How to spot it: Grep for /zh/articles/ in en/ files (and vice versa). Such links are usually wrong unless the article deliberately references the other language. Count the mismatched-locale links.
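
A minimal sketch of that check as a function, assuming the URL scheme /en/articles/... and /zh/articles/... used in this article. The function name is hypothetical, and the caller is assumed to pass in the file's text and the locale of the folder it lives in.

```javascript
// Returns every /en/articles/... or /zh/articles/... path in the text
// whose locale differs from the file's own locale.
function crossLocaleLinks(source, ownLocale) {
  const links = [];
  for (const m of source.matchAll(/\/(en|zh)\/articles\/[\w\-\/]*/g)) {
    if (m[1] !== ownLocale) links.push(m[0]);
  }
  return links;
}

// Example: an EN file that still points at a ZH URL
const enText = "See [tips](/zh/articles/old-name/) and [basics](/en/articles/basics/).";
console.log(crossLocaleLinks(enText, "en"));
```

Deliberate links to the other language will show up too, so the result is a list to review, not a list of errors.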

6. Translations diverge in content, not just freshness

EN got expanded with new examples; ZH got pruned for brevity. Now they’re not translations of each other — they’re related articles. hreflang implies they’re the same; readers find they’re not.

How to spot it: Word count ratio. If en > 1.5x zh (or vice versa) and the difference isn’t just language verbosity, content actually diverged.
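
Because EN and ZH lengths are not measured in the same unit, a raw 1.5x cutoff can misfire. The sketch below uses a variation on the rule above, not the rule itself: it compares each pair's EN/ZH length ratio with the median ratio across all pairs and flags those that deviate by more than the factor. The field names (`enLength`, `zhLength`) and how you measure length (for example EN words and ZH characters) are assumptions.

```javascript
// Median of a list of numbers (input is not mutated).
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Flags pairs whose EN/ZH length ratio is more than `factor` times
// above or below the typical (median) ratio for the site.
function flagDiverged(pairs, factor = 1.5) {
  if (pairs.length === 0) return [];
  const ratios = pairs.map(p => p.enLength / p.zhLength);
  const typical = median(ratios);
  return pairs
    .filter((p, i) => ratios[i] > typical * factor || ratios[i] < typical / factor)
    .map(p => p.key);
}

// Made-up lengths for three pairs
console.log(flagDiverged([
  { key: "a", enLength: 1000, zhLength: 1500 },
  { key: "b", enLength: 1000, zhLength: 1600 },
  { key: "c", enLength: 3000, zhLength: 1500 },
]));
```

Using the median as the baseline absorbs ordinary language verbosity, so what gets flagged is more likely to be a real content difference.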

Shortest path to fix

Ordered by ROI. Step 1 audits; subsequent steps decide what to do per drift type.

Step 1: Build a drift audit

Script the pairing check:

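// scripts/audit-bilingual.mjs
// Pairs EN and ZH articles by their frontmatter translationKey and lists the drift.
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter";

const en = collectKeys("src/content/articles/en/troubleshooting");
const zh = collectKeys("src/content/articles/zh/troubleshooting");

console.log("EN only:", [...en.keys()].filter(k => !zh.has(k)));
console.log("ZH only:", [...zh.keys()].filter(k => !en.has(k)));
console.log("Both, EN newer:", [...en.keys()].filter(k => zh.has(k) && en.get(k).mtime > zh.get(k).mtime));

// Maps translationKey -> { path, mtime } for every .mdx file directly inside dir.
// Note: file mtime is reset by a fresh git checkout, so run this on a long-lived
// working copy, or swap in the last commit date from git log.
function collectKeys(dir) {
  const map = new Map();
  // Skip subfolders and non-article files, which would break the frontmatter read
  for (const f of fs.readdirSync(dir).filter(name => name.endsWith(".mdx"))) {
    const p = path.join(dir, f);
    const { data } = matter(fs.readFileSync(p, "utf8"));
    if (data.translationKey) {
      map.set(data.translationKey, { path: p, mtime: fs.statSync(p).mtime });
    }
  }
  return map;
}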

Output: list of EN-only, ZH-only, and EN-newer pairs. This is your drift inventory.

Step 2: For each pair, decide: translate, sync, or single-language

| Pair type | Action |
|---|---|
| EN-only | (a) translate to ZH, OR (b) mark single-language and publish it without hreflang alternates (see Step 5) |
| ZH-only | Same, in reverse |
| Both, EN newer | Sync ZH from current EN |
| Both, ZH newer | Sync EN from current ZH |
| Both, diverged content | Pick a canonical, sync the other or split into distinct articles |

Not every article needs to be bilingual. “Mark single-language” is a legitimate choice — better than maintaining badly.
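
The table can be turned into a small function so the audit output maps straight to a to-do list. This is only a sketch: the input fields (`hasEn`, `hasZh`, `enDate`, `zhDate`, `diverged`) are hypothetical names, and the `diverged` flag is assumed to be set by a human or a separate check.

```javascript
// Maps one pair's state to the action from the decision table.
function decideAction(pair) {
  const { hasEn, hasZh, enDate, zhDate, diverged = false } = pair;
  if (hasEn && !hasZh) return "translate to ZH or mark single-language";
  if (hasZh && !hasEn) return "translate to EN or mark single-language";
  if (diverged) return "pick a canonical version, then sync or split";
  const en = new Date(enDate);
  const zh = new Date(zhDate);
  if (en > zh) return "sync ZH from current EN";
  if (zh > en) return "sync EN from current ZH";
  return "in sync";
}

// Example: both files exist, EN was updated later
console.log(decideAction({ hasEn: true, hasZh: true, enDate: "2024-05-01", zhDate: "2022-06-01" }));
```

The translate-or-single-language choice stays a human decision; the function only narrows down which row of the table applies.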

Step 3: Sync the high-value pairs first

Don’t try to sync 200 pairs in one weekend. Prioritize by traffic:

# In GSC: filter by URL pattern /zh/articles/*
# Sort by impressions or clicks
# Top 20 ZH pages with traffic → these get sync priority

A ZH page with 100 monthly impressions deserves sync; a ZH page with 0 impressions doesn’t justify the work.
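
If you export the GSC data, the prioritization can be sketched as a join between traffic rows and the drift inventory. The row shape (`url`, `impressions`) and the function name are assumptions, not the format of any particular export.

```javascript
// Returns the drifted URLs that have traffic, highest impressions first.
function syncPriority(gscRows, driftedUrls, topN = 20) {
  const drifted = new Set(driftedUrls);
  return gscRows
    .filter(r => drifted.has(r.url) && r.impressions > 0) // zero-traffic pages are skipped
    .sort((a, b) => b.impressions - a.impressions)
    .slice(0, topN)
    .map(r => r.url);
}

// Made-up rows for illustration
const rows = [
  { url: "/zh/articles/a/", impressions: 100 },
  { url: "/zh/articles/b/", impressions: 0 },
  { url: "/zh/articles/c/", impressions: 450 },
];
console.log(syncPriority(rows, ["/zh/articles/a/", "/zh/articles/b/", "/zh/articles/c/"]));
```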

Step 4: For new updates, automate the translation queue

Add a CI check or pre-commit hook:

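#!/usr/bin/env bash
# scripts/check-translation-sync.sh
# For every en/*.mdx changed relative to origin/main, look up the matching zh/*.mdx.
# If the ZH file exists but was not touched in the same change set, print a warning.
# The "::warning::" prefix is rendered as an annotation by GitHub Actions.
# This script only warns; it never fails the build.

BASE=origin/main
git diff --name-only "$BASE" -- 'src/content/articles/en/' | grep '\.mdx$' | while IFS= read -r f; do
  zh=$(printf '%s\n' "$f" | sed 's#/en/#/zh/#')
  # git diff --quiet exits 0 when the ZH file is unchanged against the base
  if [ -f "$zh" ] && git diff --quiet "$BASE" -- "$zh"; then
    echo "::warning::EN updated: $f; ZH may need sync: $zh"
  fi
done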

This doesn’t auto-translate; it surfaces drift at the PR stage so you can decide.

Step 5: Fix hreflang explicitly

In your sitemap / page metadata:

<url>
  <loc>https://site.com/en/articles/topic/</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://site.com/en/articles/topic/" />
  <xhtml:link rel="alternate" hreflang="zh" href="https://site.com/zh/articles/topic/" />
  <xhtml:link rel="alternate" hreflang="x-default" href="https://site.com/en/articles/topic/" />
</url>

Articles with no counterpart get no hreflang entry — single-language declaration. Don’t half-do it.
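
A sketch of generating those entries, assuming the URL scheme from the example above. The function name and parameters are hypothetical. Each language version gets its own `<url>` entry listing the same set of alternates, and an article that exists in only one language gets a bare entry with no hreflang links.

```javascript
// Builds sitemap <url> entries for one article.
// locales: the languages the article actually exists in, e.g. ["en", "zh"] or ["en"].
function sitemapEntries(base, slug, locales, defaultLocale = "en") {
  const href = l => `${base}/${l}/articles/${slug}/`;
  return locales.map(locale => {
    // Single-language articles get no alternates at all
    const alternates = locales.length < 2 ? [] : [
      ...locales.map(l => `  <xhtml:link rel="alternate" hreflang="${l}" href="${href(l)}" />`),
      `  <xhtml:link rel="alternate" hreflang="x-default" href="${href(defaultLocale)}" />`,
    ];
    return ["<url>", `  <loc>${href(locale)}</loc>`, ...alternates, "</url>"].join("\n");
  }).join("\n");
}

console.log(sitemapEntries("https://site.com", "topic", ["en", "zh"]));
```

The enclosing `<urlset>` is assumed to declare the `xhtml` namespace (`xmlns:xhtml="http://www.w3.org/1999/xhtml"`). The sketch does not XML-escape slugs, so it assumes plain URL-safe slugs.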

Step 6: Reject blind machine translation

If you must auto-translate to catch up:

- Auto-translate to a draft branch
- Human reviews each one for UI terms, idioms, and tone
- Verify cross-links are localized
- Only then publish

Bad ZH is worse than no ZH. Search engines and ad reviewers, Google Search and AdSense included, can treat unreviewed machine translation as low-quality content.

Prevention

  • Automate a CI drift check that fires whenever an EN file is updated without ZH (or vice versa)
  • Translation debt is a tracked metric; review monthly and reduce deliberately
  • Articles without bilingual maintenance get marked single-language, not left half-maintained
  • For high-traffic pages, sync is mandatory; for zero-traffic pages, single-language is fine
  • Reject auto-translation without human review; bad bilingual is worse than monolingual
  • translationKey is the contract; renaming a file requires updating its counterpart too
