<link rel="canonical"> is how you tell Google “if this page has multiple versions, assign indexing and ranking signals to this URL.” Get it wrong and you hit one of two fatal outcomes: the page you actually want indexed never enters the index, or the URL that ranks is a wrong version (param-laden URL, a staging subdomain, an alternate locale, even someone else’s site).
The worst part: in the browser everything looks fine. Only Google knows you’re shooting yourself in the foot. The four patterns below are all real.
Common causes
1. Canonical points to a non-existent / 404 / noindexed URL
Typical patterns:
<!-- Template hardcoded a path, but the target URL was later deleted -->
<link rel="canonical" href="https://yourdomain.com/legacy/post" />
<!-- Domain changed but the template didn't get updated -->
<link rel="canonical" href="https://staging.yourdomain.com/article" />
<!-- Self-canonical, but page is noindex -->
<meta name="robots" content="noindex" />
<link rel="canonical" href="https://yourdomain.com/this-very-page" />
Google reads the canonical → fetches that URL → gets a 404 or noindex → at best ignores your canonical hint, at worst drops the whole group (original + canonical) from the index.
How to confirm: Search Console → URL inspection → compare “User-declared canonical” with “Google-selected canonical” and check that the URL you declared returns 200. Or from the terminal:
curl -sI "https://yourdomain.com/legacy/post" | head -1
# Want: HTTP/2 200
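If you check many canonical targets, the pass/fail rule from this section can be written down once. This is a minimal sketch, not part of the original setup: `canonicalTargetProblems` is a hypothetical helper, and it assumes you have already collected the target's HTTP status and its robots directives (meta tag or `X-Robots-Tag` header).

```javascript
// Hypothetical helper: decide whether a canonical target is safe to point at.
// `status` is the HTTP status of the canonical URL; `robots` is the content of
// its robots meta tag or X-Robots-Tag header ("" when there is none).
function canonicalTargetProblems({ status, robots = "" }) {
  const problems = [];
  if (status >= 300 && status < 400) problems.push("REDIRECTS");
  else if (status !== 200) problems.push(`STATUS-${status}`);
  // "none" is shorthand for "noindex, nofollow".
  if (/\b(noindex|none)\b/i.test(robots)) problems.push("NOINDEX");
  return problems;
}

console.log(canonicalTargetProblems({ status: 404 }));
console.log(canonicalTargetProblems({ status: 200, robots: "noindex, follow" }));
console.log(canonicalTargetProblems({ status: 200, robots: "index, follow" }));
```

An empty array means the target looks indexable; anything else is a reason to repoint or fix the canonical.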
2. Cross-domain canonical without permission or reciprocal mention
A cross-site canonical (pointing to another domain or CDN host) only works when you actually want signals to transfer to that URL — for example, syndicated content. Common screwups:
- You copied another site’s HTML template and forgot to update the canonical (still points to the original site)
- You serve through a CDN subdomain (cdn.example.com) but didn’t canonical-back to the main host
- Both www. and the apex domain exist, and the canonical points to a mix of the two
<!-- Your page lives on www.yourdomain.com -->
<!-- Wrong: splits its own authority -->
<link rel="canonical" href="https://yourdomain.com/article" />
How to confirm: Search Console → “Pages” report → look at “Duplicate, Google chose different canonical.” That status is specifically these cases.
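Before waiting for Search Console to report it, you can classify the host relationship yourself. The sketch below is an illustration under assumptions: `classifyCanonicalHost` is a hypothetical name, and treating a `www.` prefix as the only apex variant is a simplification.

```javascript
// Hypothetical helper: compare the host a page is served from with the host
// its canonical points at.
function classifyCanonicalHost(pageUrl, canonicalUrl) {
  const page = new URL(pageUrl).hostname;
  // Resolve against the page URL so a relative canonical still parses.
  const canon = new URL(canonicalUrl, pageUrl).hostname;
  if (page === canon) return "same-host";
  const stripWww = (h) => h.replace(/^www\./, "");
  if (stripWww(page) === stripWww(canon)) return "www-apex-mismatch";
  return "cross-host";
}

console.log(
  classifyCanonicalHost(
    "https://www.yourdomain.com/article",
    "https://yourdomain.com/article"
  )
); // www-apex-mismatch
console.log(
  classifyCanonicalHost("https://cdn.example.com/p", "https://example.com/p")
); // cross-host
```

A `cross-host` result is only a problem when you did not intend to hand signals to that host, so treat it as a prompt to review rather than an automatic failure.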
3. Canonical fights hreflang / robots / sitemap
hreflang requires each locale version to reference the others, but canonical must point at its own locale. Otherwise the two signals cancel:
<!-- on /zh/article -->
<link rel="canonical" href="https://yourdomain.com/zh/article/" />
<link rel="alternate" hreflang="zh" href="https://yourdomain.com/zh/article/" />
<link rel="alternate" hreflang="en" href="https://yourdomain.com/en/article/" />
<link rel="alternate" hreflang="x-default" href="https://yourdomain.com/en/article/" />
The wrong version (canonical pointing to the English page from the Chinese page) drops the entire Chinese group from the index — only the English page ranks, and it ranks badly for Chinese queries.
Another conflict: your sitemap lists URL A, but page A’s canonical points to URL B. Google trusts the canonical, so the sitemap submission is wasted.
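Both conflicts in this section can be checked per page once you have extracted the tags. A minimal sketch, assuming a hypothetical `checkLocalePage` helper and an input object you build yourself (the field names are not from any library):

```javascript
// Hypothetical check for one locale page. `url` is the page's own URL,
// `canonical` its declared canonical, `alternates` its hreflang links, and
// `inSitemap` whether the sitemap lists `url`.
function checkLocalePage({ url, canonical, alternates, inSitemap = false }) {
  const problems = [];
  if (canonical !== url) {
    problems.push("CANONICAL-NOT-SELF");
    // The sitemap submits this URL while the page says another URL is primary.
    if (inSitemap) problems.push("SITEMAP-LISTS-NON-CANONICAL");
  }
  // Each locale version should also list itself among its hreflang links.
  if (!alternates.some((a) => a.href === url)) problems.push("NO-SELF-HREFLANG");
  return problems;
}

// The wrong version described above: the Chinese page canonicals to English.
console.log(
  checkLocalePage({
    url: "https://yourdomain.com/zh/article/",
    canonical: "https://yourdomain.com/en/article/",
    alternates: [
      { hreflang: "zh", href: "https://yourdomain.com/zh/article/" },
      { hreflang: "en", href: "https://yourdomain.com/en/article/" },
    ],
    inSitemap: true,
  })
);
```

The comparison is an exact string match on purpose: a canonical that differs only by case or trailing slash is still a mismatch worth flagging.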
4. Canonical mismatches by case / trailing slash / protocol
Google treats these as different URLs:
- HTTPS://yourdomain.com/Article vs https://yourdomain.com/article
- https://yourdomain.com/article vs https://yourdomain.com/article/
- https://yourdomain.com/article vs http://yourdomain.com/article
If the sitemap, internal links, and canonical disagree on case / slash, Google tolerates one 301 hop when fetching the canonical, but two hops make the signal weak enough that Google may pick a different version as primary.
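To see which of these mismatches you actually have, compare two URLs and name the difference. This is a sketch with a hypothetical function name, `urlFormDifferences`; it only covers protocol, path case, and trailing slash, and ignores query strings.

```javascript
// Hypothetical helper: list the formal differences between two URLs.
function urlFormDifferences(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  const diffs = [];
  if (ua.protocol !== ub.protocol) diffs.push("protocol");
  if (ua.pathname.endsWith("/") !== ub.pathname.endsWith("/")) {
    diffs.push("trailing-slash");
  }
  // Compare paths with trailing slashes removed so only case is left.
  const stripSlash = (p) => (p.length > 1 ? p.replace(/\/+$/, "") : p);
  const pa = stripSlash(ua.pathname);
  const pb = stripSlash(ub.pathname);
  if (pa !== pb && pa.toLowerCase() === pb.toLowerCase()) diffs.push("case");
  return diffs;
}

console.log(urlFormDifferences("https://yourdomain.com/Article", "https://yourdomain.com/article"));
console.log(urlFormDifferences("https://yourdomain.com/article", "https://yourdomain.com/article/"));
console.log(urlFormDifferences("https://yourdomain.com/article", "http://yourdomain.com/article"));
```

Run it on sitemap URL vs canonical and on internal link vs canonical; any non-empty result is a redirect hop you are asking Google to follow.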
Shortest path to fix
Step 1: Audit canonicals across the site
This script walks your sitemap and extracts the canonical for each URL:
// scripts/audit-canonicals.mjs
// Walks the sitemap and prints each URL, its canonical, and any flagged issues.
import { XMLParser } from "fast-xml-parser";

const sitemapUrl = "https://yourdomain.com/sitemap.xml";
const expectedHost = "yourdomain.com";

const xml = await fetch(sitemapUrl).then((r) => r.text());
const { urlset } = new XMLParser().parse(xml);
// A sitemap with a single <url> entry parses to an object rather than an array.
const urls = [].concat(urlset.url).map((u) => u.loc);

for (const url of urls) {
  const html = await fetch(url).then((r) => r.text());
  // Tolerates extra attributes, but still assumes rel appears before href.
  const m = html.match(/<link\b[^>]*\brel=["']canonical["'][^>]*\bhref=["']([^"']+)["']/i);
  const canonical = m?.[1] ?? "(missing)";
  const issues = [];
  if (canonical === "(missing)") issues.push("MISSING");
  else {
    // Resolve against the page URL so a relative canonical does not throw.
    const c = new URL(canonical, url);
    if (c.host !== expectedHost) issues.push(`CROSS-HOST: ${c.host}`);
    if (c.protocol !== "https:") issues.push("NON-HTTPS");
    if (c.pathname !== new URL(url).pathname) issues.push("PATH-DIFFERS");
  }
  console.log(`${url}\t→ ${canonical}\t${issues.join(",")}`);
}
Run: node scripts/audit-canonicals.mjs > canonicals.tsv and skim the flagged lines.
Step 2: Default to self-canonical, only point elsewhere when necessary
90% of pages should canonical to themselves. In your layout:
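---
// Build from the pathname only, so query strings such as ?utm=x never leak into the canonical.
// Assumes `site` is set in astro.config.mjs.
const canonical = new URL(Astro.url.pathname, Astro.site).href;
---
<link rel="canonical" href={canonical} />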
Only point elsewhere in these specific cases:
| Case | Canonical points to |
|---|---|
| Pagination /blog?page=2 | /blog |
| Param variants /p?utm=x | /p |
| Mobile subdomain m.example.com/p | example.com/p |
| Self-syndicated (you wrote it, published elsewhere) | Your master version |
| Re-syndicated (someone else’s original) | Their master version |
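For the param-variant row, the canonical can be derived instead of hand-written. A minimal sketch: `canonicalForVariant` is a hypothetical helper, and the list of tracking parameters is an assumption you should replace with the parameters your site really receives.

```javascript
// Hypothetical helper: derive the canonical for a parameter variant by
// dropping tracking parameters and the fragment. The parameter list below is
// an assumption, not an exhaustive or official list.
const TRACKING_PARAMS = /^(utm(_.+)?|gclid|fbclid)$/i;

function canonicalForVariant(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first: deleting while iterating the live list skips entries.
  for (const key of [...url.searchParams.keys()]) {
    if (TRACKING_PARAMS.test(key)) url.searchParams.delete(key);
  }
  url.hash = "";
  return url.href;
}

console.log(canonicalForVariant("https://yourdomain.com/p?utm=x"));
console.log(canonicalForVariant("https://yourdomain.com/p?utm_source=mail&id=3"));
```

Parameters that change the content (here `id`) are kept, because the page they produce is a different document and deserves its own canonical.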
Step 3: When canonical + hreflang coexist, generate both from one helper
// Builds the canonical and the full hreflang set for one page from the same inputs.
export function buildHreflangAndCanonical(currentLang, slug, langs) {
  const base = "https://yourdomain.com";
  const canonical = `${base}/${currentLang}/${slug}/`;
  const alternates = langs.map((l) => ({
    hreflang: l,
    href: `${base}/${l}/${slug}/`,
  }));
  // x-default falls back to the English version.
  alternates.push({ hreflang: "x-default", href: `${base}/en/${slug}/` });
  return { canonical, alternates };
}
Every page goes through this function. Canonical is always the current locale; hreflang covers all locales + x-default; the loop always closes.
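One way to consume the helper's return value is to render the tags from it in a single place. The sketch below assumes a hypothetical `renderHeadLinks` function and hardcodes an input in the same shape the helper returns; it does no HTML escaping, so it assumes the URLs contain no quotes.

```javascript
// Hypothetical renderer: turn { canonical, alternates } into <link> tags.
function renderHeadLinks({ canonical, alternates }) {
  const lines = [`<link rel="canonical" href="${canonical}" />`];
  for (const { hreflang, href } of alternates) {
    lines.push(`<link rel="alternate" hreflang="${hreflang}" href="${href}" />`);
  }
  return lines.join("\n");
}

// Same shape as buildHreflangAndCanonical("zh", "article", ["zh", "en"]) returns.
const head = renderHeadLinks({
  canonical: "https://yourdomain.com/zh/article/",
  alternates: [
    { hreflang: "zh", href: "https://yourdomain.com/zh/article/" },
    { hreflang: "en", href: "https://yourdomain.com/en/article/" },
    { hreflang: "x-default", href: "https://yourdomain.com/en/article/" },
  ],
});
console.log(head);
```

Because the canonical and the alternates come out of one object, a template cannot update one and forget the other.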
Step 4: Add a build-time check
Run this right after the build (for example as a postbuild script), since it reads the generated dist/ output:
// scripts/check-canonical-build.mjs
// Scans the built HTML and fails the build on missing or wasted canonicals.
import fg from "fast-glob";
import fs from "node:fs";

const files = fg.sync("dist/**/*.html");
const issues = [];

for (const f of files) {
  const html = fs.readFileSync(f, "utf8");
  // Both patterns tolerate extra attributes but assume rel/name comes first.
  const cm = html.match(/<link\b[^>]*\brel=["']canonical["'][^>]*\bhref=["']([^"']+)["']/i);
  const robots = html.match(/<meta\b[^>]*\bname=["']robots["'][^>]*\bcontent=["']([^"']+)["']/i);
  if (!cm) issues.push(`${f}: MISSING canonical`);
  if (robots?.[1]?.toLowerCase().includes("noindex") && cm) {
    issues.push(`${f}: noindex + canonical (canonical wasted)`);
  }
}

if (issues.length) {
  console.error(issues.join("\n"));
  process.exit(1);
}
This catches “template wrote the wrong canonical” before deploy.
Step 5: Force a re-crawl on key URLs
In Search Console, “Request indexing” on your 5-10 most important URLs. A full re-evaluation typically takes 1-4 weeks.
Prevention
- All canonicals go through one buildCanonical() helper, with no hand-written tags
- Run audit-canonicals.mjs site-wide before shipping a new template
- The CI build check blocks: missing canonical, noindex + canonical coexisting, cross-host canonicals
- canonical / sitemap / internal links agree exactly on case and trailing slash: generate all URLs through a single urlFor() function
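The `urlFor()` and `buildCanonical()` helpers named in this list are not defined anywhere in the post, so here is one possible shape. Everything in it is an assumption: the `SITE_ORIGIN` constant, the choice of https on the apex host, lowercase paths, and a trailing slash. It also ignores query strings.

```javascript
// Hypothetical single source of truth for URLs. The conventions encoded here
// (https, apex host, lowercase path, trailing slash) are assumptions; pick
// your own and keep them in this one place.
const SITE_ORIGIN = "https://yourdomain.com";

function urlFor(path) {
  // Lowercase, then drop leading and trailing slashes so they can be re-added
  // in exactly one form.
  const clean = path.trim().toLowerCase().replace(/^\/+|\/+$/g, "");
  return clean ? `${SITE_ORIGIN}/${clean}/` : `${SITE_ORIGIN}/`;
}

// Self-canonical by default: the canonical is the normalized URL of the page.
function buildCanonical(path) {
  return urlFor(path);
}

console.log(urlFor("/Article"));
console.log(urlFor("zh/article/"));
console.log(buildCanonical("/"));
```

If the sitemap generator, the internal link component, and the layout's canonical tag all call `urlFor()`, the three cannot drift apart on case or trailing slash.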