You open the EN article, you open its ZH counterpart, and they barely look like translations of each other. EN has five ## sections and three fenced code blocks; ZH has three sections and one code block. EN added an FAQ block six weeks ago; ZH never got it. EN renamed a step from “Step 1: Audit” to “Step 1: Inventory” and the ZH version still says the old phrasing. The pair shares a translationKey but the content has structurally diverged.
This is different from word-count drift (where ZH is just terser by language). It is structural drift: section counts differ, code block counts differ, link targets differ, and headings translate concepts that no longer exist in the other locale. Readers who arrive through an hreflang alternate land on a thinner page than the one they expected, and Google may treat mismatched alternates as a weaker signal for both pages. The fix has three legs: audit by structure (not just timestamps), enforce translate-as-you-edit at the PR layer, and accept “single-language” as a valid declaration for low-value pages.
Common causes
1. Solo edits never trigger a translation ticket
You edit en/foo.mdx to add a new section. You commit. Nothing reminds you that zh/foo.mdx now lacks that section. Repeat for six months across 200 articles and the structural gap is huge.
How to spot it: count ## headings per file and diff across pairs.
for f in src/content/articles/en/troubleshooting/*.mdx; do
  key=$(basename "$f")
  zh="src/content/articles/zh/troubleshooting/$key"
  [ -f "$zh" ] || continue
  en_sec=$(grep -c '^## ' "$f")
  zh_sec=$(grep -c '^## ' "$zh")
  if [ "$en_sec" != "$zh_sec" ]; then
    echo "$key: en=$en_sec zh=$zh_sec"
  fi
done
The loop prints every mismatch. Treat any pair where the EN and ZH counts differ by 2 or more as structural drift, not language verbosity.
2. Edits propagate in one direction only (usually EN -> ZH, which then stalls)
Most content sites have a primary author who writes EN first. ZH gets translated weeks later, if at all. Subsequent EN edits never reach ZH. The pair starts mirrored and drifts with every PR.
How to spot it: list pairs where EN mtime is more than 30 days newer than ZH.
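A minimal sketch of that check, assuming you have already collected a last-modified timestamp (Unix seconds) for each side of a pair. The function name `findStalePairs` and the input shape are hypothetical. File mtime is often reset by a fresh clone, so the last commit time from `git log -1 --format=%ct -- <file>` is probably a more reliable source for the timestamps.

```javascript
// Hypothetical helper: flags pairs where EN was last changed more than
// `thresholdDays` after ZH. Timestamps are Unix seconds.
const SECONDS_PER_DAY = 86400;

function findStalePairs(pairs, thresholdDays = 30) {
  return pairs
    // Keep only pairs where EN is newer than ZH by more than the threshold.
    .filter(({ enTime, zhTime }) => enTime - zhTime > thresholdDays * SECONDS_PER_DAY)
    // Report the lag in whole days.
    .map(({ key, enTime, zhTime }) => ({
      key,
      lagDays: Math.floor((enTime - zhTime) / SECONDS_PER_DAY),
    }))
    // Largest lag first.
    .sort((a, b) => b.lagDays - a.lagDays);
}

// Example input with made-up timestamps.
const samplePairs = [
  { key: "foo.mdx", enTime: 1700000000, zhTime: 1700000000 - 45 * SECONDS_PER_DAY },
  { key: "bar.mdx", enTime: 1700000000, zhTime: 1700000000 - 10 * SECONDS_PER_DAY },
];
console.log(findStalePairs(samplePairs));
```

Only `foo.mdx` is reported here, because its ZH side lags by 45 days while `bar.mdx` lags by 10.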
3. Renamed translationKey or moved file breaks the pair silently
You renamed a slug in EN. The translationKey now points to a missing ZH file (or a stale one with the old key). hreflang emits a dangling pair. Nothing fails the build.
How to spot it: dump translationKeys from both locales and diff.
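# grep -r with --include avoids relying on ** (which needs globstar enabled in bash)
diff \
  <(grep -rh --include='*.mdx' '^translationKey:' src/content/articles/en | sort -u) \
  <(grep -rh --include='*.mdx' '^translationKey:' src/content/articles/zh | sort -u)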
4. New code examples added in EN never copied to ZH
You added a fenced code block in EN with a fresh shell script. ZH still shows the old version (or has no code block at all). Code blocks are language-agnostic but the surrounding prose is not, so the ZH page can end up referencing a snippet that does not appear on the page.
How to spot it: count triple-backtick fences per pair and diff.
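One way to do the count, sketched as a pure function over the file text. The names `countCodeBlocks` and `codeBlockGap` are hypothetical, and the sketch only recognizes triple-backtick fences (not `~~~` fences or indented code). It also reports an unclosed fence, which a simple "count and divide by two" would hide.

```javascript
// Built with repeat() so this example does not contain a literal fence line.
const FENCE = "`".repeat(3);

// Walk the lines and toggle on every fence marker; each opening fence is one block.
function countCodeBlocks(text) {
  let blocks = 0;
  let open = false;
  for (const line of text.split("\n")) {
    if (line.trimStart().startsWith(FENCE)) {
      if (!open) blocks += 1;
      open = !open;
    }
  }
  return { blocks, unclosed: open };
}

// Positive result: EN has more code blocks than ZH.
function codeBlockGap(enText, zhText) {
  return countCodeBlocks(enText).blocks - countCodeBlocks(zhText).blocks;
}

// Made-up article bodies: EN has two blocks, ZH has one.
const enBody = ["intro", FENCE + "sh", "echo hi", FENCE, "text", FENCE + "js", "1 + 1", FENCE].join("\n");
const zhBody = ["intro", FENCE + "sh", "echo hi", FENCE].join("\n");
console.log(codeBlockGap(enBody, zhBody)); // 1
```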
5. FAQ block added on one side only
You added a ## FAQ section with three ### Question? entries on EN. ZH never got it. The FAQ JSON-LD only emits on EN. The ZH page loses a rich result opportunity and looks thinner.
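A rough way to detect this, assuming the FAQ heading is literally `## FAQ` on EN and something like `## 常见问题` on ZH. The ZH heading text and the function names are assumptions, so adjust the pattern to whatever your articles actually use. The sketch does not skip headings that appear inside code fences.

```javascript
// Count ### entries under the FAQ heading, stopping at the next ## heading.
function faqSummary(text, faqHeading = /^## (FAQ|常见问题)\s*$/) {
  let inFaq = false;
  let hasFaq = false;
  let questions = 0;
  for (const line of text.split("\n")) {
    if (line.startsWith("## ")) {
      inFaq = faqHeading.test(line);
      if (inFaq) hasFaq = true;
    } else if (inFaq && line.startsWith("### ")) {
      questions += 1;
    }
  }
  return { hasFaq, questions };
}

// True when one side has an FAQ the other lacks, or the question counts differ.
function faqMismatch(enText, zhText) {
  const en = faqSummary(enText);
  const zh = faqSummary(zhText);
  return en.hasFaq !== zh.hasFaq || en.questions !== zh.questions;
}

// Made-up article bodies: EN has an FAQ with two questions, ZH has none.
const enArticle = "## Intro\ntext\n## FAQ\n### Why?\nBecause.\n### How?\nLike this.\n## Related";
const zhArticle = "## 介绍\n文本";
console.log(faqSummary(enArticle), faqMismatch(enArticle, zhArticle));
```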
Shortest path to fix
Step 1: Run a structural diff across all pairs
Build a script that compares structure, not just mtime:
// scripts/audit-pair-structure.mjs
import fs from "node:fs";
import path from "node:path";

const EN_DIR = "src/content/articles/en/troubleshooting";
const ZH_DIR = "src/content/articles/zh/troubleshooting";

// Count structural markers in one MDX file.
function metrics(file) {
  const txt = fs.readFileSync(file, "utf8");
  return {
    h2: (txt.match(/^## /gm) || []).length,
    h3: (txt.match(/^### /gm) || []).length,
    code: (txt.match(/^```/gm) || []).length / 2, // one opening + one closing fence per block
    lines: txt.split("\n").length,
  };
}

for (const f of fs.readdirSync(EN_DIR)) {
  if (!f.endsWith(".mdx")) continue;
  const en = path.join(EN_DIR, f);
  const zh = path.join(ZH_DIR, f);
  if (!fs.existsSync(zh)) continue; // unpaired files are a separate problem
  const a = metrics(en);
  const b = metrics(zh);
  // Flag pairs whose section or code-block counts differ by 2 or more.
  if (Math.abs(a.h2 - b.h2) >= 2 || Math.abs(a.code - b.code) >= 2) {
    console.log(`DRIFT ${f}: h2 en=${a.h2} zh=${b.h2}, code en=${a.code} zh=${b.code}`);
  }
}
The output lists every pair whose gap crosses the threshold. Sync the most divergent pairs first.
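To make the ordering explicit, one option is to score each pair and sort by that score. The sketch below assumes each row carries the `h2` and `code` counts that `metrics()` returns for both locales. The scoring rule (sum of the two gaps) and the name `rankByDrift` are assumptions, not a standard.

```javascript
// Score = heading gap + code-block gap; pairs with no gap are dropped.
function rankByDrift(rows) {
  return rows
    .map((row) => ({
      ...row,
      score: Math.abs(row.en.h2 - row.zh.h2) + Math.abs(row.en.code - row.zh.code),
    }))
    .filter((row) => row.score > 0)
    .sort((a, b) => b.score - a.score); // widest gap first
}

// Made-up counts for three pairs.
const sampleRows = [
  { file: "a.mdx", en: { h2: 5, code: 3 }, zh: { h2: 3, code: 1 } },
  { file: "b.mdx", en: { h2: 4, code: 2 }, zh: { h2: 4, code: 2 } },
  { file: "c.mdx", en: { h2: 6, code: 1 }, zh: { h2: 1, code: 1 } },
];
for (const row of rankByDrift(sampleRows)) {
  console.log(`${row.file}: score=${row.score}`);
}
```

With these sample counts, `c.mdx` (score 5) sorts ahead of `a.mdx` (score 4), and `b.mdx` is dropped because the pair matches.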
Step 2: For each drifted pair, decide sync or split
Three legitimate outcomes:
- Sync: bring the laggard up to match the leader's structure
- Split: content has legitimately diverged; remove the translationKey pair and treat as two distinct articles
- Mark single-language: low-traffic ZH; remove translationKey on ZH, drop hreflang alternate from EN
Do not “auto-translate the missing sections.” Bad MT is worse than a missing section. Either commit to a real translation or split the pair.
Step 3: Enforce translate-as-you-edit at the PR layer
Add a CI step that flags any PR touching en/*.mdx without touching the matching zh/*.mdx:
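# .github/workflows/translation-sync.yml fragment
# Assumes the checkout step fetched origin/main (e.g. actions/checkout with fetch-depth: 0).
- name: Check translation parity
  run: |
    # Three-dot diff: only files changed on this branch since it forked from main
    CHANGED_EN=$(git diff --name-only origin/main...HEAD -- 'src/content/articles/en/' | grep '\.mdx$' || true)
    for f in $CHANGED_EN; do
      zh=$(echo "$f" | sed 's|/en/|/zh/|')
      # -x -F: match the ZH path as a whole line, not as a regex or substring
      if [ -f "$zh" ] && ! git diff --name-only origin/main...HEAD | grep -qxF "$zh"; then
        echo "::warning::EN changed: $f -- but ZH not updated: $zh"
      fi
    done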
This is a warning, not a failure. The author either updates ZH in the same PR or opens an i18n ticket and acknowledges the drift explicitly.
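If you want to unit-test the pairing rule outside CI, the same logic can be written as a pure function over the list of changed paths. This is a sketch: the function name, the `zhExists` callback, and the hard-coded prefixes (taken from the paths used in this article) are assumptions about how you would wire it up.

```javascript
const EN_PREFIX = "src/content/articles/en/";
const ZH_PREFIX = "src/content/articles/zh/";

// Returns EN files that changed while their existing ZH counterpart did not.
// `zhExists` is a caller-supplied check, e.g. a wrapper around a file lookup.
function unpairedEnChanges(changedFiles, zhExists) {
  const changed = new Set(changedFiles);
  return changedFiles
    .filter((f) => f.startsWith(EN_PREFIX) && f.endsWith(".mdx"))
    .map((en) => ({ en, zh: ZH_PREFIX + en.slice(EN_PREFIX.length) }))
    .filter(({ zh }) => zhExists(zh) && !changed.has(zh));
}

// Made-up change list: a.mdx changed alone, b.mdx changed on both sides.
const knownZh = new Set([
  "src/content/articles/zh/troubleshooting/a.mdx",
  "src/content/articles/zh/troubleshooting/b.mdx",
]);
const changedInPr = [
  "src/content/articles/en/troubleshooting/a.mdx",
  "src/content/articles/en/troubleshooting/b.mdx",
  "src/content/articles/zh/troubleshooting/b.mdx",
];
console.log(unpairedEnChanges(changedInPr, (p) => knownZh.has(p)));
```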
Step 4: Backfill the worst offenders deliberately
Pick the top 20 drifted pairs by traffic (filter Search Console by /zh/articles/*). Sync those manually. Ignore the long tail until it earns the work.
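A sketch of that selection, assuming you have exported clicks per URL path from Search Console into a plain object keyed by path. The export shape, the `/zh/articles/<slug>/` path pattern, and the function name are assumptions. Pairs with no traffic data are treated as zero clicks.

```javascript
// Join the drifted slugs with click counts and keep the top `limit` by traffic.
function topDriftedByTraffic(driftedSlugs, clicksByPath, limit = 20) {
  return driftedSlugs
    .map((slug) => ({ slug, clicks: clicksByPath[`/zh/articles/${slug}/`] ?? 0 }))
    .sort((a, b) => b.clicks - a.clicks)
    .slice(0, limit);
}

// Made-up numbers for illustration only.
const sampleClicks = {
  "/zh/articles/foo/": 120,
  "/zh/articles/bar/": 900,
};
console.log(topDriftedByTraffic(["foo", "bar", "baz"], sampleClicks, 2));
```

In this example `bar` and `foo` make the cut and `baz` (no recorded clicks) falls into the long tail.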
Step 5: Verify hreflang still pairs cleanly
After sync, recheck hreflang in the sitemap. Each pair should emit:
<xhtml:link rel="alternate" hreflang="en" href="https://site.com/en/articles/slug/" />
<xhtml:link rel="alternate" hreflang="zh" href="https://site.com/zh/articles/slug/" />
If a page got marked single-language, drop the alternate entirely. Half-emitted hreflang is worse than none.
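A rough checker for half-emitted pairs, written as a sketch that scans the sitemap XML with regular expressions rather than a real XML parser. It assumes one `<url>` entry per page with `<xhtml:link>` alternates inside it, as in the snippet above. The function name and the default locale list are assumptions.

```javascript
// Returns a list of human-readable issues; an empty list means the pairs look clean.
function hreflangIssues(sitemapXml, locales = ["en", "zh"]) {
  const blocks = sitemapXml.match(/<url>[\s\S]*?<\/url>/g) || [];
  const locs = new Set();
  const pages = blocks.map((block) => {
    const locMatch = block.match(/<loc>\s*([^<\s]+)\s*<\/loc>/);
    const loc = locMatch ? locMatch[1] : "";
    locs.add(loc);
    const alternates = {};
    for (const tag of block.match(/<xhtml:link\b[^>]*>/g) || []) {
      const lang = (tag.match(/\shreflang="([^"]+)"/) || [])[1];
      const href = (tag.match(/\shref="([^"]+)"/) || [])[1];
      if (lang && href) alternates[lang] = href;
    }
    return { loc, alternates };
  });

  const issues = [];
  for (const { loc, alternates } of pages) {
    // No alternates at all is fine: that is a declared single-language page.
    if (Object.keys(alternates).length === 0) continue;
    for (const lang of locales) {
      if (!alternates[lang]) {
        issues.push(`${loc}: missing hreflang="${lang}"`);
      } else if (!locs.has(alternates[lang])) {
        issues.push(`${loc}: hreflang="${lang}" target is not in the sitemap`);
      }
    }
  }
  return issues;
}

// Made-up sitemap fragment: the ZH page lists itself but not its EN alternate.
const sampleSitemap =
  '<url><loc>https://site.com/en/articles/a/</loc>' +
  '<xhtml:link rel="alternate" hreflang="en" href="https://site.com/en/articles/a/" />' +
  '<xhtml:link rel="alternate" hreflang="zh" href="https://site.com/zh/articles/a/" /></url>' +
  '<url><loc>https://site.com/zh/articles/a/</loc>' +
  '<xhtml:link rel="alternate" hreflang="zh" href="https://site.com/zh/articles/a/" /></url>';
console.log(hreflangIssues(sampleSitemap));
```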
Prevention
- CI warning whenever EN/ZH change in isolation; author must respond
- Structural audit (h2/h3/code-block count) runs weekly in prebuild
- New ## FAQ block on one side requires an i18n ticket before merge
- Renaming a slug requires updating both locales in the same PR; lint rule enforces it
- Low-traffic pages explicitly marked single-language rather than left drifted
- Quarterly review: top 20 drifted pairs get scheduled sync work
Related
- Bilingual Pages Drift Apart Over Time
- Content Site Hreflang Tags Misconfigured
- Content Site Canonical Points to Self Wrong
- Stale Articles Not Updated
- Article Count Looks Big But Real Coverage Is Weak
- Content Site Broken Internal Link Rot
Tags: #Content ops #Site quality #Site audit #Troubleshooting #Bilingual