Meta Robots vs X-Robots-Tag — Which One Wins

When `<meta name="robots">` and the `X-Robots-Tag` HTTP header conflict, the outcome looks unpredictable until you know how Google merges them. Here is how to decide which to use and how to keep them in sync.

You added <meta name="robots" content="noindex"> to a staging page and a week later the production version is missing from Google. Or the opposite: you noindexed a thank-you page in the meta tag, but it still ranks. In both cases the meta is only half the story — Google reads two channels for robots directives, and they don’t have to agree. The HTTP response header X-Robots-Tag is the other channel, and when the two disagree Google does not pick one source as authoritative. It combines them by taking the most restrictive value across both.

That means a stray X-Robots-Tag: noindex from a CDN rule can silently kill an HTML page whose meta tag says index. The page looks fine in View Source. Only curl -I shows the truth.

How to identify which case you’re in

Case 1: Page has noindex in meta but is still indexed

How to spot it:

curl -s https://yourdomain.com/path | grep -i 'name="robots"'
# Shows: <meta name="robots" content="noindex">

curl -sI https://yourdomain.com/path | grep -i x-robots
# Shows: (nothing) — meta says noindex, header says nothing

Then in Search Console URL Inspection, the page reports as indexed.

Why it happens: Google has not re-crawled the page since you added the meta tag, or the page renders the meta via JavaScript and Googlebot didn’t execute that JS during this crawl. Either signal needs a re-crawl to take effect, but a header is read as soon as the URL is fetched, while a JS-injected meta also depends on the render step.

Fix: render noindex server-side in the initial HTML (not via client JS), then use URL Inspection’s “Test live URL” → confirm Google sees noindex in the rendered HTML → “Request indexing” to trigger re-crawl. See “I set noindex but the page is still in search results” for the full removal timeline.
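One way to check whether the noindex is in the initial HTML, and not injected later by JavaScript, is to pull the robots meta out of the raw response. The function below is a minimal sketch: robots_meta is a hypothetical helper name, and it assumes the tag sits on one line with double-quoted attributes and a lowercase content attribute.

```shell
# robots_meta: read raw (unrendered) HTML on stdin and print the content
# value of every <meta name="robots"> tag found.
# Assumptions: the tag is on a single line, attributes are double-quoted,
# and the attribute is spelled "content" in lowercase.
robots_meta() {
  grep -io '<meta[^>]*name="robots"[^>]*>' \
    | sed -n 's/.*content="\([^"]*\)".*/\1/p'
}
```

Pipe a page into it with curl -s https://yourdomain.com/path | robots_meta. Empty output while the browser’s DOM inspector shows a robots meta suggests the tag is being added client-side.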

Case 2: Page has no noindex in meta but disappeared from the index

How to spot it:

curl -sI https://yourdomain.com/path | grep -i x-robots
# X-Robots-Tag: noindex, nofollow

Open the page in a browser, View Source — there is no robots meta, or it says index, follow. The header is the killer.

Why it happens: typical leak paths:

  • Vercel, Netlify, or Cloudflare Pages preview deployments inject X-Robots-Tag: noindex for preview hostnames, but a misconfigured production alias also receives it.
  • A WAF or CDN rule was scoped to *.staging.example.com but the rule’s host pattern also matches your apex domain.
  • An origin server middleware adds X-Robots-Tag: noindex whenever NODE_ENV !== "production", and it fires in production because the env var isn’t set there.

Fix: find which layer injects the header. Walk it back layer by layer:

# Check the origin directly, bypassing the CDN (replace ORIGIN_IP with your origin server's IP)
curl -sI --resolve yourdomain.com:443:ORIGIN_IP https://yourdomain.com/path | grep -i x-robots
# Then check via CDN
curl -sI https://yourdomain.com/path | grep -i x-robots

If origin clean but CDN dirty → it’s a CDN rule. If both dirty → it’s the origin/app.
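The same comparison can be scripted so it is repeatable. This is a sketch under assumptions, not a definitive tool: xrobots_value and blame_layer are hypothetical helper names, and "dirty" here means only that the header value contains noindex.

```shell
# xrobots_value: read response headers on stdin (as printed by curl -sI)
# and print the value of each X-Robots-Tag header, lowercased.
xrobots_value() {
  tr -d '\r' | tr 'A-Z' 'a-z' | sed -n 's/^x-robots-tag:[[:space:]]*//p'
}

# blame_layer ORIGIN_VALUE EDGE_VALUE: name the layer that adds noindex.
# Runs in a subshell so its variables do not leak into the caller.
blame_layer() (
  case "$1" in *noindex*) origin=dirty ;; *) origin=clean ;; esac
  case "$2" in *noindex*) edge=dirty ;; *) edge=clean ;; esac
  case "$origin/$edge" in
    clean/clean) echo "no noindex header at either layer" ;;
    clean/dirty) echo "CDN or edge rule" ;;
    dirty/dirty) echo "origin or app" ;;
    dirty/clean) echo "origin sets it, edge strips it" ;;
  esac
)
```

To use it, capture each curl -sI response from the two commands above through xrobots_value, then pass the two values to blame_layer with the origin value first.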

Case 3: PDF or other non-HTML file is never indexed

How to spot it:

curl -sI https://yourdomain.com/whitepaper.pdf | grep -i x-robots

PDFs, images, and other non-HTML responses cannot carry a <meta> tag. The only robots signal Google reads for them is X-Robots-Tag. If your hosting platform or framework sets X-Robots-Tag: noindex as a blanket default for static assets, your PDFs will never be indexed.

Why it happens: a header rule is applied to every response, assets included. Typical examples are a catch-all headers rule in a Next.js config, or a default response headers policy on S3+CloudFront that adds noindex.

Fix: scope the noindex rule to HTML pages only, or to specific paths, not blanket-applied.
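To spot a blanket rule quickly, you can classify a response by its Content-Type and X-Robots-Tag together. A minimal sketch, with asset_verdict as a hypothetical helper name; it treats any Content-Type other than text/html as an asset and reports "ok" when no Content-Type header is present.

```shell
# asset_verdict: read response headers on stdin (curl -sI output) and print
#   "blocked-asset" when a non-HTML response carries noindex,
#   "ok" in every other case.
asset_verdict() {
  tr -d '\r' | tr 'A-Z' 'a-z' | awk '
    /^content-type:/ && !/text\/html/ { non_html = 1 }
    /^x-robots-tag:/ && /noindex/     { noindex = 1 }
    END {
      if (non_html && noindex) print "blocked-asset"
      else print "ok"
    }
  '
}
```

For example, curl -sI https://yourdomain.com/whitepaper.pdf | asset_verdict prints blocked-asset when the PDF is served with a noindex header.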

Case 4: Behavior is intermittent — sometimes indexed, sometimes not

How to spot it: URL Inspection shows the rendered HTML carries noindex, but the raw response (the “HTTP response” tab) does not. Or vice versa.

Why it happens: a JavaScript injection mutates the meta tag after page load. If Googlebot executes the JS during this crawl, it sees noindex. If it doesn’t, it sees the original index value. The header path is always read; the meta path depends on render.

Fix: do not toggle robots meta via client JS. Render the final value in the SSR HTML, or set X-Robots-Tag server-side.

Case 5: noindex set in both, page still indexed

How to spot it:

# Both signals present
curl -s https://yourdomain.com/path | grep -i robots  # meta says noindex
curl -sI https://yourdomain.com/path | grep -i x-robots  # X-Robots-Tag: noindex

# But also:
curl -s https://yourdomain.com/robots.txt | grep -i path
# Disallow: /path

Why it happens: robots.txt Disallow blocks crawl entirely. Google never fetches the page, never sees the meta or header, and keeps the URL in the index based on external link signals alone (the “Indexed, though blocked by robots.txt” status in Search Console). This is one of the most common noindex failure modes.

Fix: remove the Disallow line. The URL must be crawlable for Google to see the noindex and process the removal.
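Grepping robots.txt for the path misses rules that block a parent directory. The sketch below lists every Disallow rule that is a literal prefix of a given path. It is deliberately simplified and is not a full robots.txt parser: disallowed_by is a hypothetical helper name, and it ignores User-agent groups, Allow overrides, and the * and $ wildcards.

```shell
# disallowed_by PATH: read robots.txt on stdin and print each Disallow
# rule whose value is a literal prefix of PATH.
# Simplified: no User-agent grouping, no Allow rules, no wildcards.
disallowed_by() {
  tr -d '\r' | awk -v path="$1" '
    tolower($0) ~ /^[ \t]*disallow[ \t]*:/ {
      rule = $0
      sub(/^[^:]*:[ \t]*/, "", rule)   # drop the "Disallow:" field name
      sub(/[ \t]*(#.*)?$/, "", rule)   # drop trailing comment and spaces
      if (rule != "" && index(path, rule) == 1) print rule
    }
  '
}
```

Run it as curl -s https://yourdomain.com/robots.txt | disallowed_by /path. Any output is a rule worth reviewing; empty output does not prove the URL is crawlable, because wildcard rules are not evaluated.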

How Google resolves a conflict

From Google’s documented behavior:

  • Both meta robots and X-Robots-Tag are valid signals.
  • When both are present, the most restrictive value wins per directive.
  • noindex beats index. nofollow beats follow. noarchive beats absent.
  • If robots.txt blocks the URL, neither signal is ever read — the URL can still appear in results with no snippet.

So there is no “winner” between meta and header. They are merged. The practical rule: never let them disagree in production.
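The merge rule can be written down in a few lines. This is an illustration of the rule as described above, not Google’s implementation: effective_robots is a hypothetical helper name, and it handles only the index and follow directives.

```shell
# effective_robots META_CONTENT HEADER_VALUE
# Merge the two sources by taking the most restrictive value per directive.
# Only noindex and nofollow are considered; other directives are ignored.
effective_robots() (
  combined=$(printf '%s,%s' "$1" "$2" | tr 'A-Z' 'a-z')
  case "$combined" in
    *noindex*) index=noindex ;;
    *)         index=index ;;
  esac
  case "$combined" in
    *nofollow*) follow=nofollow ;;
    *)          follow=follow ;;
  esac
  printf '%s, %s\n' "$index" "$follow"
)
```

With a meta of "index, follow" and a header of "noindex", the function prints noindex, follow, which is the CDN-leak scenario from Case 2.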

Shortest fix path

In hit-rate order:

  1. Run curl -I on the affected URL → 60% of the time the header is the smoking gun. The meta tag in View Source distracts you from the real cause.
  2. Check robots.txt → 25% of the time a Disallow line is preventing Google from ever seeing the signal you set.
  3. Walk the request layers → if header looks right at origin but wrong at the edge, it’s a CDN / WAF / hosting platform rule.
  4. Render robots meta server-side, not via JS → the remaining edge cases are almost all “JS sets the meta but Googlebot didn’t run JS.”

A safe production setup

Pick one source per content type. The combination below covers most sites:

  • HTML pages: meta robots in SSR HTML. Do not set X-Robots-Tag for HTML responses unless you have a specific reason.
  • PDF, images, downloads: X-Robots-Tag header (meta isn’t available).
  • Staging hosts: X-Robots-Tag: noindex scoped to staging hostnames only.

Example: scope the staging noindex to its hostname instead of applying it everywhere.

# Nginx — only on staging
server {
  server_name staging.yourdomain.com;
  add_header X-Robots-Tag "noindex, nofollow" always;
}

server {
  server_name yourdomain.com;
  # No X-Robots-Tag here. Meta tag handles per-page noindex.
}

Prevention

  • Document per-file-type which signal is canonical. Put it in the SEO README in the repo.
  • Add a CI check: curl -I a sample of production URLs and fail the build if X-Robots-Tag appears unexpectedly.
  • Never combine noindex (meta or header) with a robots.txt Disallow for the same URL: Google will never see the noindex.
  • Don’t toggle robots meta via client-side JS — Googlebot may skip JS execution on this crawl.
  • After changing platform settings (Vercel, Netlify, Cloudflare), curl -I a sample of pages to confirm headers didn’t change.
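The CI check from the list above might look like the sketch below. The function names fetch_headers and check_urls are hypothetical, and it assumes production HTML pages should carry no X-Robots-Tag header at all, in line with the setup described earlier. Adjust the match if some URLs are meant to have one.

```shell
# fetch_headers URL: print the response headers for URL.
# Kept as a separate function so it can be stubbed in tests.
fetch_headers() { curl -sI "$1"; }

# check_urls: read URLs on stdin (one per line), report each one whose
# response carries an X-Robots-Tag header, and return non-zero if any did.
check_urls() {
  bad=0
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    # </dev/null keeps the fetch from consuming the URL list on stdin
    if fetch_headers "$url" </dev/null | tr -d '\r' | tr 'A-Z' 'a-z' \
         | grep -q '^x-robots-tag:'; then
      echo "unexpected X-Robots-Tag: $url"
      bad=1
    fi
  done
  return "$bad"
}
```

In a pipeline step, feed it a file of sample URLs, for example check_urls < urls.txt (urls.txt is a placeholder name), and let the non-zero exit status fail the build.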

FAQ

Q: I set noindex in meta but the page is still in Google. What’s the first thing to check? A: Run curl -s https://yourdomain.com/path | grep -i 'name="robots"' to confirm the noindex is in the raw HTML and not injected by JavaScript. Then check robots.txt for a Disallow on that path. If neither is the issue, it’s almost always “Google has not re-crawled yet”: wait, or use URL Inspection’s “Request indexing”.

Q: Can I use both meta and X-Robots-Tag on the same page? A: Yes, but only if they agree. Google merges them by taking the most restrictive directive. If meta says index and header says noindex, the page is noindex. Having both is redundant for HTML; pick one.

Q: Does X-Robots-Tag work for HTML pages? A: Yes, identically to meta. It’s just less common because most CMSes write the meta tag for you. The header is the right choice for non-HTML (PDF, image, ZIP) where there’s no place to put a meta tag.

Q: My staging environment leaked noindex to production. How do I undo it fast? A: Remove the header (deploy or CDN rule change), then in Search Console: URL Inspection on the affected URL → “Test live URL” → confirm header is now absent → “Request indexing”. For a homepage or top-level page, this can re-index within 24–72 hours.

Q: Does nosnippet or max-snippet interact differently between meta and header? A: No — same merge rule applies. The most restrictive value across both sources wins. If meta says max-snippet:50 and header says max-snippet:200, the page gets max 50.
