Canonical URLs Explained — What to Set and What to Avoid

A clear, practical guide to canonical tags: what they do, when they help, and the five mistakes that quietly break indexing.

A canonical URL tells Google “of all the URLs that show this content, this one is the preferred version.” Set it right and duplicate URLs stop splitting your ranking signals. Set it wrong and Google can quietly drop the pages you care about from its index.

Background

The <link rel="canonical" href="..."> tag was introduced to fix the URL-duplication mess of the web: the same page reachable at www and the bare domain, with and without a trailing slash, with tracking parameters, or as an AMP copy. Canonical points all of those at one preferred URL. Google treats it as a hint, not a command, and generally follows it when your other signals (redirects, internal links, sitemap) point at the same URL.
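To make the idea concrete, here is a minimal Python sketch of the collapsing a canonical expresses: several variant URLs map to one preferred URL. It assumes, hypothetically, that yoursite.com prefers the bare https host, uses a trailing slash on every page, and treats utm_* parameters plus fbclid and gclid as tracking noise. The name normalize_url is illustrative, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical site conventions; adjust to your own.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "yoursite.com"        # bare domain chosen over www
CLICK_ID_PARAMS = {"fbclid", "gclid"}  # tracking IDs besides utm_*


def is_tracking_param(name: str) -> bool:
    """True for utm_* parameters and the listed click IDs."""
    return name.startswith("utm_") or name in CLICK_ID_PARAMS


def normalize_url(url: str) -> str:
    """Collapse common duplicate variants of a URL into one preferred form."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and drops any port
    if host == "www." + PREFERRED_HOST:
        host = PREFERRED_HOST
    # Keep parameters that change the content; drop tracking noise.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not is_tracking_param(k)]
    # Assumed convention: every page URL ends in a trailing slash.
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    # Fragments (#...) never reach the server, so they are dropped.
    return urlunsplit((PREFERRED_SCHEME, host, path, urlencode(kept), ""))


variants = [
    "http://www.yoursite.com/articles/slug",
    "https://yoursite.com/articles/slug/?utm_source=twitter",
    "https://YourSite.com/articles/slug/#comments",
]
print({normalize_url(u) for u in variants})
# {'https://yoursite.com/articles/slug/'}
```

Parameters that change what the page shows (a search query, a product filter) are kept, because those URLs are not duplicates of the clean one.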

How to tell

  • Search Console's Page indexing report shows “Duplicate without user-selected canonical” or “Duplicate, Google chose different canonical than user”.
  • site:yourdomain.com returns both www. and bare-domain versions of the same article.
  • Tracking-parameter URLs (?utm_source=...) are getting indexed instead of the clean URL.
  • You ship the same content in English and Chinese and Google is confused about which version to rank.
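If you have collected a list of indexed URLs by hand (for example from site: results), a short script can triage two of the symptoms above. This is a rough sketch: the input list and the function name find_symptoms are hypothetical, and it only looks for utm_* parameters and for paths that appear under both the www and the bare host. It does not replace the Search Console reports.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def find_symptoms(indexed_urls: list[str]) -> dict[str, list[str]]:
    """Triage a hand-collected list of indexed URLs for two duplication symptoms."""
    tracking_urls = []
    hosts_by_page = defaultdict(set)  # (bare host, path) -> hosts seen
    for url in indexed_urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        bare = host[len("www."):] if host.startswith("www.") else host
        # Grouping by bare host puts www and non-www copies of a page together.
        hosts_by_page[(bare, parts.path)].add(host)
        params = parse_qsl(parts.query, keep_blank_values=True)
        if any(name.startswith("utm_") for name, _ in params):
            tracking_urls.append(url)
    www_and_bare = sorted(
        bare + path
        for (bare, path), hosts in hosts_by_page.items()
        if len(hosts) > 1
    )
    return {"tracking_param_urls": tracking_urls, "www_and_bare": www_and_bare}


report = find_symptoms([
    "https://yoursite.com/articles/a/",
    "https://www.yoursite.com/articles/a/",
    "https://yoursite.com/articles/b/?utm_source=twitter",
])
print(report)
```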

Quick verdict

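Set a self-referencing canonical on every page, for example <link rel="canonical" href="https://yoursite.com/articles/slug/">. For true duplicates (parameterized URLs, syndicated copies), point the canonical at the master URL. Paginated pages are not duplicates: each keeps its own self-referencing canonical. Do not get clever with cross-language canonicals, or with cross-domain canonicals outside syndication: use hreflang for language versions and a 301 redirect when content moves to another domain.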
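As a sketch of that default, the Python below renders the head tags for one language version of an article: a self-referencing canonical plus hreflang alternates. It assumes a hypothetical layout of https://yoursite.com/{lang}/articles/{slug}/ with en and zh versions; the helper names are illustrative only.

```python
from html import escape

# Hypothetical production origin and URL layout: /{lang}/articles/{slug}/
SITE_ORIGIN = "https://yoursite.com"
LANGUAGES = ["en", "zh"]
DEFAULT_LANG = "en"


def page_url(lang: str, slug: str) -> str:
    """Absolute production URL of one language version of an article."""
    return f"{SITE_ORIGIN}/{lang}/articles/{slug}/"


def link_tag(rel: str, href: str, hreflang: str = "") -> str:
    """Render one <link> element with attribute values HTML-escaped."""
    lang_attr = f' hreflang="{escape(hreflang)}"' if hreflang else ""
    return f'<link rel="{rel}"{lang_attr} href="{escape(href)}">'


def head_tags(lang: str, slug: str) -> list[str]:
    """Self-referencing canonical plus hreflang alternates for one version."""
    tags = [link_tag("canonical", page_url(lang, slug))]
    # Every version lists all versions, itself included.
    for alt in LANGUAGES:
        tags.append(link_tag("alternate", page_url(alt, slug), alt))
    # x-default is the fallback for users who match none of the languages.
    tags.append(link_tag("alternate", page_url(DEFAULT_LANG, slug), "x-default"))
    return tags


print("\n".join(head_tags("zh", "slug")))
```

The zh page's canonical points at the zh URL, never at the en one; the languages are tied together only through hreflang. Building the origin from one production setting also avoids hard-coding a staging host into templates.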

Step by step

  1. Audit your current canonicals. View source on 5-10 pages and search for rel="canonical". Confirm every page has exactly one, pointing at the public, https, no-tracking-parameter version of itself.
  2. Pick one host: https://yoursite.com or https://www.yoursite.com. The other should 301 redirect to it. Canonical alone is not enough — you also need the redirect.
  3. For paginated lists (/blog/page/2), canonical each page to itself, not to page 1. Pointing all pages to page 1 hides the deeper pages and slows discovery.
  4. For URLs with tracking parameters, the canonical should drop the parameters: https://yoursite.com/articles/slug/?utm_source=twitter -> canonical https://yoursite.com/articles/slug/.
  5. For multilingual sites, do NOT canonical zh to en or vice versa. Each language version has its own self-referencing canonical, and they reference each other via hreflang.
  6. After deploy, use Search Console URL Inspection on 3-4 sample URLs and check “Google-selected canonical” matches your declared one. If it does not, your hint is being overruled — investigate why before doing anything else.
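Steps 1 and 4 can be partly automated. The sketch below uses Python's standard html.parser to collect every rel="canonical" link from a page's HTML and compare it with the clean URL you expect. It reads the HTML as served, so a canonical injected by JavaScript will not show up (URL Inspection is still the check that counts). CanonicalCollector and audit_canonical are hypothetical names, and fetching the HTML is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs; value is None for bare attributes.
        values = dict(attrs)
        rel_tokens = (values.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens:
            self.hrefs.append(values.get("href") or "")


def audit_canonical(html: str, expected_url: str) -> list[str]:
    """Return the problems found with the canonical tag(s) in one page's HTML."""
    collector = CanonicalCollector()
    collector.feed(html)
    collector.close()
    if len(collector.hrefs) != 1:
        return [f"expected exactly one canonical, found {len(collector.hrefs)}"]
    href = collector.hrefs[0]
    parts = urlsplit(href)
    problems = []
    if parts.scheme != "https":
        problems.append(f"not https: {href}")
    if parts.query:
        problems.append(f"carries query parameters: {href}")
    if href != expected_url:
        problems.append(f"does not match expected {expected_url}: {href}")
    return problems


page = '<head><link rel="canonical" href="https://yoursite.com/articles/slug/"></head>'
print(audit_canonical(page, "https://yoursite.com/articles/slug/"))  # []
```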

Common pitfalls

  • Canonical-ing every page to the homepage (a real anti-pattern from broken CMS plugins). This tells Google every page is a duplicate of the homepage and can push the rest of the site out of the index.
  • Canonical-ing zh pages to the en version. Google may drop the Chinese pages from its index, taking your Chinese-language search traffic with them.
  • Setting the canonical to a URL that returns 404 or 301. Google is likely to ignore the hint and pick its own canonical, often not the one you want.
  • Hard-coding absolute canonicals to the staging domain (staging.yoursite.com) and forgetting to switch them for production. Your live pages then canonicalize to a private staging server.
  • Mixing http/https or www/non-www between canonical and the actual URL. They must match exactly, byte for byte.
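Several of these pitfalls can be caught with plain string checks before deploy. A minimal sketch, assuming a production origin of https://yoursite.com and language versions under /en/ and /zh/ path prefixes (both hypothetical); lint_canonical is an illustrative name. Detecting a canonical that returns 404 or 301 needs a real HTTP request, so that check is not covered here.

```python
from urllib.parse import urlsplit

# Hypothetical production origin and language prefixes; adjust to your site.
PRODUCTION_SCHEME = "https"
PRODUCTION_HOST = "yoursite.com"
LANG_PREFIXES = {"en", "zh"}


def first_segment(path: str) -> str:
    """Return the first path segment, e.g. 'zh' for '/zh/articles/slug/'."""
    return path.strip("/").split("/")[0]


def lint_canonical(page_url: str, canonical: str) -> list[str]:
    """Return warnings for common canonical pitfalls on one page."""
    page, canon = urlsplit(page_url), urlsplit(canonical)
    warnings = []
    # Staging hosts, plain http, and www/non-www mismatches all fail this check.
    if canon.scheme != PRODUCTION_SCHEME or canon.hostname != PRODUCTION_HOST:
        warnings.append(f"canonical is off the production origin: {canonical}")
    # An inner page pointing at "/" claims to duplicate the homepage.
    if canon.path in ("", "/") and page.path not in ("", "/"):
        warnings.append(f"inner page canonicalizes to the homepage: {page_url}")
    # Different language prefixes mean a cross-language canonical.
    page_lang, canon_lang = first_segment(page.path), first_segment(canon.path)
    if {page_lang, canon_lang} <= LANG_PREFIXES and page_lang != canon_lang:
        warnings.append(f"cross-language canonical: {page_lang} -> {canon_lang}")
    return warnings


print(lint_canonical("https://yoursite.com/zh/articles/slug/",
                     "https://yoursite.com/en/articles/slug/"))
```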

Who this is for

Site owners who already have duplicate-content reports in Search Console, or who run any combination of pagination, tracking links, and multiple subdomains.

When to skip this

Brand-new 5-page sites with no parameters, no pagination, and one language. A self-referencing canonical is enough — you can come back to this when you scale.

FAQ

  • What is a self-referencing canonical?: A canonical that points the page at its own clean URL. Every page should have one — it tells Google “this URL is the canonical version of this content.” It is the safest default and prevents Google from picking a weird canonical on its own.
  • What if Google picks a different canonical than I declared?: It usually means your declared canonical conflicts with other signals: internal links pointing at a different URL, a sitemap listing a different URL, hreflang referencing a different URL, or a redirect sending users elsewhere. Make all signals agree and Google is much more likely to respect your hint.
  • Can I canonical across domains?: Technically yes, but only when you syndicate content and want the original to rank. In every other case use a 301 redirect instead; it is more reliable.
  • Should I canonical or noindex duplicate pages?: Use canonical if both URLs serve real users (e.g. a category page and a search-result page that show the same products). Use noindex if one URL is purely internal (e.g. a cart page or an admin page). They are not interchangeable.
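The signal-agreement advice above can be checked mechanically once you have your sitemap URLs and internal link targets in hand. This sketch assumes you have already extracted both (extraction is not shown) and that you know which URLs are duplicate variants of the page; conflicting_signals is a hypothetical helper name, and hreflang and redirect checks are left out.

```python
from collections import Counter


def conflicting_signals(canonical: str, variants: set[str],
                        sitemap_urls: set[str],
                        internal_links: list[str]) -> list[str]:
    """List signals that disagree with the declared canonical for one page.

    canonical      -- URL in the page's rel="canonical"
    variants       -- other URLs known to serve the same content
    sitemap_urls   -- every <loc> value from the sitemap
    internal_links -- href targets of internal links across the site
    """
    issues = []
    if canonical not in sitemap_urls:
        issues.append(f"sitemap does not list the canonical {canonical}")
    for url in sorted(variants & sitemap_urls):
        issues.append(f"sitemap lists duplicate variant {url}")
    # Count internal links that point at a variant instead of the canonical.
    counts = Counter(link for link in internal_links if link in variants)
    for url, n in sorted(counts.items()):
        issues.append(f"{n} internal link(s) point at variant {url}")
    return issues


clean = "https://yoursite.com/articles/slug/"
www = "https://www.yoursite.com/articles/slug/"
for issue in conflicting_signals(clean, {www}, {www}, [www, clean, www]):
    print(issue)
```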

Tags: #Indie dev #SEO #Technical SEO #Canonical