You check Search Console and your ZH pages show “Alternate page with proper canonical tag”: Google has decided not to index them because their canonical points elsewhere. You view-source on a ZH article and find <link rel="canonical" href="https://site.com/en/articles/foo/">, so the ZH page is canonicalizing to the EN version. Google now treats the ZH page as a duplicate of the EN URL, and the ZH version drops out of the index. Half your bilingual investment goes invisible.
This usually comes from an SEO plugin set to “use the primary language version” or a layout that always emits the EN URL. The fix is conceptually trivial: each page’s canonical must be its own URL. The execution requires careful template work, a build-time check, and a one-time view-source verification across a sample of pages.
Common causes
1. SEO plugin configured to canonicalize all translations to primary
Some plugins (and some custom layouts) have a “consolidate authority on the primary language” option. Sounds reasonable; is actually wrong. Hreflang already handles the locale relationship; canonical should be self-referential.
How to spot it: view-source on a translated page; check the canonical link target.
curl -s https://site.com/zh/articles/foo/ | grep 'rel="canonical"'
If the href points at /en/, the plugin is mis-configured.
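The same check can be scripted across many pages. Below is a minimal sketch, assuming the locale is the first path segment (as in /zh/articles/foo/); `extractCanonical`, `localeOf`, and `canonicalCrossesLocale` are hypothetical helper names, not part of any plugin.

```javascript
// Hypothetical helper: pull the canonical href out of raw HTML.
// Accepts rel and href in either attribute order.
function extractCanonical(html) {
  const tags = html.match(/<link\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    if (!/\brel\s*=\s*["']canonical["']/i.test(tag)) continue;
    const href = tag.match(/\bhref\s*=\s*["']([^"']+)["']/i);
    if (href) return href[1];
  }
  return null;
}

// Assumption: the first path segment is the locale, e.g. /zh/articles/foo/ -> "zh".
function localeOf(url) {
  return new URL(url).pathname.split("/").filter(Boolean)[0] ?? null;
}

// True when the page canonicalizes to a different locale than its own.
function canonicalCrossesLocale(pageUrl, html) {
  const canonical = extractCanonical(html);
  if (canonical === null) return false; // a missing canonical is a separate problem
  return localeOf(new URL(canonical, pageUrl).href) !== localeOf(pageUrl);
}
```

Feed it the page URL and the HTML fetched by curl; a `true` result on a translated page suggests the plugin or layout is consolidating onto the primary language.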
2. Hard-coded canonical in the layout
Someone wrote <link rel="canonical" href={`https://site.com/en/articles/${slug}/`} /> in the article layout. That worked for EN pages and silently broke ZH pages.
How to spot it: grep the layout for canonical.
grep -rn 'rel="canonical"' src/layouts/ src/components/
If the URL doesn’t include Astro.url or the current page’s locale, it’s hard-coded.
3. Canonical points at the URL without trailing slash (or vice versa)
Site serves /en/articles/foo/ but canonical declares /en/articles/foo (no trailing slash). Google sees two URLs and will usually prefer the one named in the canonical, so the URLs you actually serve are deprioritized.
How to spot it: compare canonical href to actual page URL. Trailing slash must match.
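A minimal sketch of that comparison, using the standard `URL` class; `slashMismatch` is a hypothetical name, and the function only flags the case where the two URLs differ by nothing but a trailing slash.

```javascript
// Report whether a page URL and its canonical differ only by a trailing slash.
function slashMismatch(pageUrl, canonicalHref) {
  const page = new URL(pageUrl);
  const canonical = new URL(canonicalHref, pageUrl); // resolves relative hrefs too
  if (page.origin !== canonical.origin) return false; // different site: not a slash issue
  if (page.pathname === canonical.pathname) return false; // exact match: no issue
  // Strip trailing slashes (but leave the root "/" alone) and compare again.
  const strip = (p) => (p.length > 1 ? p.replace(/\/+$/, "") : p);
  return strip(page.pathname) === strip(canonical.pathname);
}
```

For example, `slashMismatch("https://site.com/en/articles/foo/", "https://site.com/en/articles/foo")` flags the mismatch described above.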
4. Canonical includes query strings or fragments
Author shared a URL with ?utm_source=twitter and that pattern got cached or hard-coded into a template. Now canonical includes query strings that fragment indexing.
How to spot it: canonical href contains ? or #.
5. Cross-domain canonical pointing to a republished version
You syndicated an article to Medium/Substack. Someone set the Medium canonical to your site (correct) but also set YOUR site’s canonical to the Medium URL (wrong). Now your own page tells Google “I’m a copy of Medium.”
How to spot it: any canonical hostname that doesn’t match the page hostname.
6. Canonical missing entirely
No canonical tag at all. Google chooses on its own — usually fine, but with URL variants (with/without trailing slash, with utm params) you lose deterministic indexing.
How to spot it: view-source for rel="canonical". An empty result means the tag is missing.
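Causes 4 through 6 lend themselves to one mechanical check. The sketch below is illustrative only: `auditCanonical` and its problem labels are hypothetical, and it assumes you have already extracted the canonical href (or `null` if none was found).

```javascript
// Returns a list of problem labels for one page.
// An empty list means the canonical passed these three checks.
function auditCanonical(pageUrl, canonicalHref) {
  if (!canonicalHref) return ["missing"]; // cause 6
  const page = new URL(pageUrl);
  const canonical = new URL(canonicalHref, pageUrl);
  const problems = [];
  if (canonical.search !== "") problems.push("query-string"); // cause 4
  if (canonical.hash !== "") problems.push("fragment"); // cause 4
  if (canonical.hostname !== page.hostname) problems.push("cross-domain"); // cause 5
  return problems;
}
```

Locale and trailing-slash problems are deliberately left out here, since they need a comparison against the page's own path.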
Shortest path to fix
Step 1: Make canonical self-referential, per page
In your article layout, compute the canonical from the current page’s own language and slug:
---
const { article } = Astro.props;
const SITE = "https://site.com";
const canonical = `${SITE}/${article.data.lang}/articles/${article.data.urlSlug}/`;
---
<link rel="canonical" href={canonical} />
This guarantees ZH canonicalizes to ZH, EN to EN. Hreflang separately tells Google the pages are alternates.
Step 2: Lock down trailing slash and casing
In astro.config.mjs:
import { defineConfig } from "astro/config";
export default defineConfig({
  trailingSlash: "always",
  build: { format: "directory" },
  site: "https://site.com",
});
Canonical URLs in the layout must match what the server actually serves. When possible, build canonicals by resolving Astro.url.pathname against Astro.site, so the origin comes from config and the path carries the same trailing-slash form as the page being built.
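A minimal sketch of a canonical builder that enforces this policy. `canonicalFor` is a hypothetical helper; in an Astro layout its inputs would presumably be `Astro.url.pathname` and `Astro.site`. It assumes a trailingSlash: "always" policy and that it is only called for HTML pages, not for files such as feeds.

```javascript
// Build a canonical URL from a path and the site origin.
// Assumption: every page URL ends in a slash (trailingSlash: "always").
function canonicalFor(pathname, site) {
  const url = new URL(pathname, site);
  url.search = ""; // drop tracking parameters such as ?utm_source=...
  url.hash = ""; // drop fragments
  if (!url.pathname.endsWith("/")) url.pathname += "/";
  return url.href;
}
```

Normalizing in one helper keeps the trailing-slash rule in a single place instead of scattered across layouts.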
Step 3: Verify with curl + view-source on samples
Sample one article in each language and one in each major category:
for url in \
  https://site.com/en/articles/foo/ \
  https://site.com/zh/articles/foo/ \
  https://site.com/en/articles/bar/ \
  https://site.com/zh/articles/bar/
do
  echo "=== $url ==="
  curl -s "$url" | grep -E 'rel="(canonical|alternate)"'
done
Each page’s canonical should match its own URL. Each pair’s alternates should reciprocate.
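If eyeballing the output gets tedious, both rules can be checked in code. This is a sketch under assumptions: `auditPages` is a hypothetical function, and its input is a hand-built map from page URL to the canonical and hreflang alternate hrefs you collected for that page.

```javascript
// pages: hypothetical map of page URL -> { canonical, alternates: [urls] }
function auditPages(pages) {
  const problems = [];
  for (const [url, page] of Object.entries(pages)) {
    // Rule 1: the canonical must be the page's own URL.
    if (page.canonical !== url) {
      problems.push(`${url}: canonical is ${page.canonical}`);
    }
    // Rule 2: every alternate in the sample must list this page back.
    for (const alt of page.alternates) {
      if (alt === url) continue; // self-referencing hreflang entry
      const other = pages[alt];
      if (!other) continue; // not in the sample, cannot judge
      if (!other.alternates.includes(url)) {
        problems.push(`${url}: ${alt} does not link back`);
      }
    }
  }
  return problems;
}
```

An empty result means every sampled page is self-canonical and every sampled pair reciprocates.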
Step 4: Add a postbuild assertion
Catch regressions:
// scripts/audit-canonical.mjs
import fs from "node:fs";
import path from "node:path";
const distRoot = "dist";
const SITE = "https://site.com";
let problems = 0;
// Recursively check every generated index.html under dist/
function walk(dir) {
  for (const e of fs.readdirSync(dir, { withFileTypes: true })) {
    const p = path.join(dir, e.name);
    if (e.isDirectory()) { walk(p); continue; }
    if (e.name !== "index.html") continue;
    const html = fs.readFileSync(p, "utf8");
    // Assumes rel comes before href, as emitted by the layout in Step 1
    const m = html.match(/<link\s+rel="canonical"\s+href="([^"]+)"/);
    if (!m) { console.error(`MISSING canonical: ${p}`); problems++; continue; }
    // Reconstruct the expected URL from the file path, using forward slashes on every OS
    const rel = path.relative(distRoot, path.dirname(p)).split(path.sep).join("/");
    const expected = rel ? `${SITE}/${rel}/` : `${SITE}/`;
    if (m[1] !== expected) {
      console.error(`WRONG canonical: ${p} -> ${m[1]} (expected ${expected})`);
      problems++;
    }
  }
}
walk(distRoot);
// A non-zero exit code fails the build when any page is wrong
process.exit(problems > 0 ? 1 : 0);
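Wire it to a postbuild step, for example an npm "postbuild" script that runs node scripts/audit-canonical.mjs, so a wrong or missing canonical fails the build before deploy.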
Step 5: Request reindexing on previously deindexed pages
In Search Console, for pages stuck on “Alternate page with proper canonical tag,” request indexing once the canonical is fixed. The fix only takes effect on Google’s next crawl.
Prevention
- Canonical computed from current page URL, never hard-coded
- Trailing-slash policy locked in Astro config; canonical matches served URL exactly
- Postbuild audit: every page has a canonical matching its own URL
- SEO plugin (if any) configured to NOT consolidate translations
- No query strings or fragments in canonical
- Cross-domain syndication uses canonical-to-self on the original site; only the syndicated copy canonicalizes back
- Sample view-source check after major template changes
Related
- Content Site Hreflang Tags Misconfigured
- Bilingual Pages Drift Apart Over Time
- Content Site Translation Pages Mismatched
- Content Site Sitemap Not Resubmitted After Big Changes
- Search Console Low Value URLs
- Content Site FAQ Schema Not Extracted
Tags: #Content ops #Site quality #Site audit #Troubleshooting #Canonical