Meta Robots vs X-Robots-Tag — Which One Wins

When `<meta name="robots">` and the `X-Robots-Tag` HTTP header conflict, Google merges them and takes the most restrictive value. This guide shows how to diagnose the conflict with curl and keep both signals in sync.

Published: May 19, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

Fastest fix: run `curl -sI https://yourdomain.com/path | grep -i x-robots` on the affected URL. If an X-Robots-Tag: noindex header is present, that header is overriding your meta tag: Google takes the most restrictive value across both signals, so a header noindex beats a meta index every time. Strip the stray header at whatever layer injects it, then request a re-crawl. If there is no header, check robots.txt next: a Disallow on the path means Google never reads either signal (Case 5).

You added <meta name="robots" content="noindex"> to a staging page and a week later the production version is missing from Google. Or the opposite: you noindexed a thank-you page in the meta tag, but it still ranks. In both cases the meta is only half the story — Google reads two channels for robots directives, and they don’t have to agree. The HTTP response header X-Robots-Tag is the other channel, and when the two disagree Google does not pick one source as authoritative. As Google’s robots meta tag spec puts it: “In the case of conflicting robots rules, the more restrictive rule applies.” So it combines the two channels and takes the most restrictive value per directive.

That means a stray X-Robots-Tag: noindex from a CDN rule, a stale Vercel deployment, or an origin middleware can silently kill an HTML page whose meta tag says index. The page looks fine in View Source — browsers don’t show you response headers. Only curl -I (or the Network tab, or Search Console’s “HTTP response” view) shows the truth.

Which bucket are you in

Run the two checks below, then read off the row that matches:

| Meta (`curl -s` + grep `robots`) | Header (`curl -sI` + grep `x-robots`) | `robots.txt` | Your case |
| --- | --- | --- | --- |
| `noindex` | none | allowed | Case 1: not re-crawled yet, or meta set by JS |
| `index` / none | `noindex` | allowed | Case 2: a header is leaking `noindex` |
| n/a (PDF, image, ZIP) | `noindex` | allowed | Case 3: blanket header on non-HTML assets |
| flips between crawls | flips between crawls | allowed | Case 4: JS mutates the meta after load |
| `noindex` | `noindex` | `Disallow` | Case 5: `robots.txt` blocks the crawl entirely |

The last column tells you which fix to apply. Cases 2 and 5 cover the large majority of “I set noindex and it didn’t work” (or “I didn’t set noindex and the page vanished”) reports.
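The table can also be read as a small decision function. The sketch below is illustrative only: `classify_case` and its argument labels (`noindex`, `index`, `none`, `allowed`, `disallow`, `html`, `asset`) are hypothetical names chosen for this example, not part of any tool. Case 4 is left out because it only shows up when you compare several crawls.

```shell
# classify_case META HEADER ROBOTS [TYPE]   (hypothetical helper)
#   META:   noindex | index | none   robots meta found in the raw HTML
#   HEADER: noindex | none           X-Robots-Tag response header
#   ROBOTS: allowed | disallow       robots.txt verdict for the path
#   TYPE:   html (default) | asset   asset = PDF, image, ZIP, ...
classify_case() {
  local meta="$1" header="$2" robots="$3" type="${4:-html}"
  if [ "$robots" = "disallow" ]; then
    # A blocked crawl hides both signals, so it is checked first.
    echo "Case 5"
  elif [ "$header" = "noindex" ] && [ "$type" = "asset" ]; then
    echo "Case 3"
  elif [ "$header" = "noindex" ] && [ "$meta" != "noindex" ]; then
    echo "Case 2"
  elif [ "$meta" = "noindex" ] && [ "$header" = "none" ]; then
    echo "Case 1"
  else
    echo "no match"
  fi
}
```

For example, `classify_case index noindex allowed` prints `Case 2`. A `no match` result usually means the two signals already agree.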

How to identify which case you’re in

Case 1: Page has `noindex` in meta but is still indexed

How to spot it:

curl -s https://yourdomain.com/path | grep -i 'name="robots"'
# Shows: <meta name="robots" content="noindex">

curl -sI https://yourdomain.com/path | grep -i x-robots
# Shows: (nothing) — meta says noindex, header says nothing

Then in Search Console URL Inspection, the page reports as indexed.

Why it happens: Google has not re-crawled the page since you added the meta tag, or the page injects the meta via JavaScript and Googlebot didn’t execute that JS during this crawl. A header is read on every fetch with no rendering step; a JS-injected meta only takes effect once the page has been both re-crawled and rendered.

Fix: render noindex server-side in the initial HTML (not via client JS), then use URL Inspection’s “Test live URL” → confirm Google sees noindex in the rendered HTML → “Request indexing” to trigger re-crawl. See I set noindex but the page is still in search results for the full removal timeline.
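To check that the noindex really is in the server-rendered HTML and not added later by JavaScript, it can help to pull the robots meta value straight out of the raw response. This is a minimal sketch, assuming the tag sits on one line and uses double-quoted attributes; `meta_robots_content` is a hypothetical helper name.

```shell
# meta_robots_content: reads raw HTML on stdin and prints the content value
# of <meta name="robots">, lowercased. Prints nothing if the tag is absent.
# Sketch only: single-quoted or multi-line tags are not handled.
meta_robots_content() {
  tr 'A-Z' 'a-z' \
    | grep -o '<meta[^>]*name="robots"[^>]*>' \
    | sed -n 's/.*content="\([^"]*\)".*/\1/p'
}

# Usage against a live page (not run here):
#   curl -s https://yourdomain.com/path | meta_robots_content
```

If this prints nothing for a page whose rendered DOM shows noindex, the tag is being injected client-side.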

Case 2: Page has no `noindex` in meta but disappeared from the index

How to spot it:

curl -sI https://yourdomain.com/path | grep -i x-robots
# X-Robots-Tag: noindex, nofollow

Open the page in a browser, View Source — there is no robots meta, or it says index, follow. The header is the killer.

Why it happens: there are three typical leak paths.

Stale Vercel deployment. Vercel automatically serves X-Robots-Tag: noindex on system-generated *.vercel.app URLs and on deployments not assigned to the Production Domain — including outdated production deployments after you promote a new one (per Vercel’s KB on preview indexing). If Google had indexed a raw deployment URL, it now sees noindex. A custom domain assigned to a non-production branch does not get this header, so the trap is usually a bare *.vercel.app URL that leaked into the index.
Host-pattern over-match. A WAF or CDN rule was scoped to *.staging.example.com but the host pattern also matches your apex domain.
Env var unset in prod. Origin middleware that adds X-Robots-Tag: noindex when NODE_ENV !== "production" also fires in production, because NODE_ENV isn’t actually set there and the guard is always true.

Fix: find which layer injects the header. Walk it back layer by layer:

# Check origin directly (bypass CDN)
curl -sI --resolve yourdomain.com:443:ORIGIN_IP https://yourdomain.com/path | grep -i x-robots
# Then check via CDN
curl -sI https://yourdomain.com/path | grep -i x-robots

If origin clean but CDN dirty → it’s a CDN rule. If both dirty → it’s the origin/app.
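The same comparison can be scripted. The sketch below assumes you have captured the response headers from each layer; `xrt_value` and `blame_layer` are hypothetical helper names, and ORIGIN_IP stays a placeholder as in the commands above.

```shell
# xrt_value: reads response headers on stdin and prints the X-Robots-Tag
# value (one line per header occurrence), or nothing if the header is absent.
xrt_value() {
  awk 'tolower($0) ~ /^x-robots-tag:/ {
    sub(/\r$/, "")            # drop the CR that ends HTTP header lines
    sub(/^[^:]*:[ \t]*/, "")  # drop the header name
    print
  }'
}

# blame_layer ORIGIN_VALUE EDGE_VALUE
# Each argument is the X-Robots-Tag value seen at that layer ("" if absent).
blame_layer() {
  local origin="$1" edge="$2"
  if [ -z "$origin" ] && [ -z "$edge" ]; then
    echo "clean"
  elif [ -z "$origin" ]; then
    echo "cdn"      # origin clean, edge dirty: a CDN / WAF rule adds it
  else
    echo "origin"   # the origin itself sends the header
  fi
}

# Usage (not run here):
#   origin=$(curl -sI --resolve yourdomain.com:443:ORIGIN_IP https://yourdomain.com/path | xrt_value)
#   edge=$(curl -sI https://yourdomain.com/path | xrt_value)
#   blame_layer "$origin" "$edge"
```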

Case 3: PDF, image, or other non-HTML disappearing from search

How to spot it:

curl -sI https://yourdomain.com/whitepaper.pdf | grep -i x-robots

PDFs, images, and other non-HTML responses cannot carry a <meta> tag. The only robots signal Google reads for them is X-Robots-Tag. If your hosting platform or framework sets X-Robots-Tag: noindex as a blanket default for static assets, your PDFs will never be indexed.

Why it happens: some configurations (for example a Next.js headers() rule that matches every route, or an S3 + CloudFront setup whose response headers policy adds noindex by default) set X-Robots-Tag on all responses, including assets.

Fix: scope the noindex rule to HTML pages only, or to specific paths, not blanket-applied.
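One way to spot a blanket rule is to look at the Content-Type and X-Robots-Tag headers together. The sketch below is an assumption-laden check, not a complete audit: `asset_noindex_leak` is a hypothetical helper that treats anything other than `text/html` as an asset and flags `noindex` or `none` in the header.

```shell
# asset_noindex_leak: reads one response's headers on stdin.
# Exit status 0 = a non-HTML response carries a noindex header; 1 = otherwise.
asset_noindex_leak() {
  awk '
    { sub(/\r$/, ""); line = tolower($0) }
    line ~ /^content-type:/ { if (line ~ /text\/html/) html = 1 }
    line ~ /^x-robots-tag:/ { if (line ~ /noindex|none/) blocked = 1 }
    END { exit (blocked && !html) ? 0 : 1 }
  '
}

# Usage (not run here):
#   curl -sI https://yourdomain.com/whitepaper.pdf | asset_noindex_leak \
#     && echo "asset is noindexed by a header"
```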

Case 4: Behavior is intermittent — sometimes indexed, sometimes not

How to spot it: URL Inspection shows the rendered HTML carries noindex, but the raw response (the “HTTP response” tab) does not. Or vice versa.

Why it happens: a JavaScript injection mutates the meta tag after page load. If Googlebot executes the JS during this crawl, it sees noindex. If it doesn’t, it sees the original index value. The header path is always read; the meta path depends on render.

Fix: do not toggle robots meta via client JS. Render the final value in the SSR HTML, or set X-Robots-Tag server-side.

Case 5: `noindex` set in both, page still indexed

How to spot it:

# Both signals present
curl -s https://yourdomain.com/path | grep -i robots  # meta says noindex
curl -sI https://yourdomain.com/path | grep -i x-robots  # X-Robots-Tag: noindex

# But also:
curl -s https://yourdomain.com/robots.txt | grep -i path
# Disallow: /path

Why it happens: a robots.txt Disallow blocks the crawl entirely. Google never fetches the page, never sees the meta or header, and keeps the URL in the index based on external link signals alone (the familiar “Indexed, though blocked by robots.txt” status). This is one of the most common noindex failure modes.

Fix: remove the Disallow line. The URL must be crawlable for Google to see the noindex and process the removal.

How Google resolves a conflict

From Google’s documented behavior (robots meta tag spec):

Any rule you can put in a robots meta tag can also be sent as an X-Robots-Tag header — they are the same vocabulary on two channels, and both are valid signals.
When both channels are present, “the more restrictive rule applies” — per directive, not per source.
noindex beats index. nofollow beats follow. For snippet controls, nosnippet beats any max-snippet:[number], and a smaller max-snippet beats a larger one.
If robots.txt blocks the URL, neither signal is ever read: “any information about indexing or serving rules will not be found and will therefore be ignored.” The URL can still appear in results with no snippet.

So there is no “winner” between meta and header for the rules that matter: they are merged. The practical rule is to never let them disagree in production. A note on stale advice: noarchive and nocache are no longer supported indexing rules, so don’t reach for them. (Full directive list and exact wording: Google’s robots meta tag spec.)
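The merge rule can be sketched as a small function that pools the directives from both channels and keeps the most restrictive value for each. This is an illustration of the rules listed above, not Google's implementation. `effective_robots` is a hypothetical name, and the sketch makes three assumptions: it ignores user-agent-prefixed values such as `googlebot: noindex`, it treats `none` as `noindex, nofollow`, and it treats `max-snippet:0` like `nosnippet` and `max-snippet:-1` as no limit.

```shell
# effective_robots META_CONTENT HEADER_VALUE   (hypothetical helper)
# Prints the merged index/follow result plus the tightest snippet rule.
effective_robots() {
  local rules rule n out
  local index=index follow=follow snippet=""
  # Pool both channels, lowercase, one trimmed directive per line.
  rules=$(printf '%s,%s\n' "$1" "$2" | tr 'A-Z' 'a-z' | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
  while IFS= read -r rule; do
    case $rule in
      noindex)   index=noindex ;;
      nofollow)  follow=nofollow ;;
      none)      index=noindex; follow=nofollow ;;
      nosnippet) snippet=nosnippet ;;
      max-snippet:*)
        n=${rule#max-snippet:}
        case $n in
          ''|*[!0-9]*) ;;   # -1 (no limit) or malformed: adds no restriction
          *)
            if [ "$n" -eq 0 ]; then
              snippet=nosnippet
            elif [ "$snippet" != nosnippet ] && { [ -z "$snippet" ] || [ "$n" -lt "$snippet" ]; }; then
              snippet=$n      # keep the smaller limit
            fi ;;
        esac ;;
    esac
  done <<EOF
$rules
EOF
  out="$index, $follow"
  case $snippet in
    '')        ;;
    nosnippet) out="$out, nosnippet" ;;
    *)         out="$out, max-snippet:$snippet" ;;
  esac
  printf '%s\n' "$out"
}
```

For example, `effective_robots "index, follow" "noindex"` prints `noindex, follow`, and `effective_robots "max-snippet:50" "max-snippet:200"` prints `index, follow, max-snippet:50`.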

Shortest fix path

In hit-rate order:

Run curl -I on the affected URL → 60% of the time the header is the smoking gun. The meta tag in View Source distracts you from the real cause.
Check robots.txt → 25% of the time a Disallow line is preventing Google from ever seeing the signal you set.
Walk the request layers → if header looks right at origin but wrong at the edge, it’s a CDN / WAF / hosting platform rule.
Render robots meta server-side, not via JS → the remaining edge cases are almost all “JS sets the meta but Googlebot didn’t run JS.”

A safe production setup

Pick one source per content type. The combination below covers most sites:

HTML pages: meta robots in SSR HTML. Do not set X-Robots-Tag for HTML responses unless you have a specific reason.
PDF, images, downloads: X-Robots-Tag header (meta isn’t available).
Staging hosts: X-Robots-Tag: noindex scoped to staging hostnames only.

Example: scope the staging noindex to its hostname instead of applying it everywhere.

# Nginx — only on staging
server {
  server_name staging.yourdomain.com;
  add_header X-Robots-Tag "noindex, nofollow" always;
}

server {
  server_name yourdomain.com;
  # No X-Robots-Tag here. Meta tag handles per-page noindex.
}

How to confirm it’s fixed

Don’t trust View Source — confirm at the header level and in Google’s own view:

curl -sI https://yourdomain.com/path | grep -i x-robots returns nothing (or the value you intend). Re-run it against the CDN URL and the origin, since the edge can rewrite headers.
In Search Console, run URL Inspection on the URL → click Test Live URL. Open the HTTP response section and confirm there’s no unexpected x-robots-tag. The “Indexing allowed?” line should read Yes.
Click Request Indexing to queue a fresh crawl. Note the cap: as of June 2026, Search Console allows only roughly 10–12 manual submissions per property per day, and the button greys out for 24 hours once you hit it. For a handful of important pages this is fine; for a sitewide fix, rely on the normal crawl cycle and resubmit the sitemap instead.
Wait for re-crawl. A homepage or high-traffic page often re-indexes within 24–72 hours; deep, low-traffic pages can take a week or more.

Prevention

Document per-file-type which signal is canonical. Put it in the SEO README in the repo.
Add a CI check: curl -I a sample of production URLs and fail the build if X-Robots-Tag appears unexpectedly.
Never combine a meta noindex with a robots.txt Disallow for the same URL: Google can’t crawl the page, so it will never see the noindex.
Don’t toggle robots meta via client-side JS — Googlebot may skip JS execution on this crawl.
After changing platform settings (Vercel, Netlify, Cloudflare), curl -I a sample of pages to confirm headers didn’t change.
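The CI check mentioned in the list could look roughly like the sketch below. `fail_on_robots_header` is a hypothetical helper, the URLs in the usage comment are placeholders, and the check assumes that production HTML pages should carry no X-Robots-Tag at all; adjust it if some paths are meant to send one.

```shell
# fail_on_robots_header URL
# Reads one response's headers on stdin. Returns 1 (and reports on stderr)
# if an X-Robots-Tag header is present, 0 otherwise.
fail_on_robots_header() {
  local url="$1" value
  value=$(awk 'tolower($0) ~ /^x-robots-tag:/ { sub(/\r$/, ""); print }')
  if [ -n "$value" ]; then
    echo "FAIL $url -> $value" >&2
    return 1
  fi
  echo "ok   $url"
}

# Usage in a CI step (not run here; the URL list is a placeholder):
#   for url in https://yourdomain.com/ https://yourdomain.com/path; do
#     curl -sI "$url" | fail_on_robots_header "$url" || exit 1
#   done
```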

FAQ

Q: I set noindex in meta but the page is still in Google. What’s the first thing to check? A: Run curl -sI https://yourdomain.com/path | grep -i x-robots. If there is no header, then check robots.txt for a Disallow on that path. Those two checks resolve about 85% of cases. If neither is the issue, it’s almost always “Google has not re-crawled yet” — wait or use URL Inspection’s “Request indexing”.

Q: Can I use both meta and X-Robots-Tag on the same page? A: Yes, but only if they agree. Google merges them by taking the most restrictive directive. If meta says index and header says noindex, the page is noindex. Having both is redundant for HTML; pick one.

Q: Does X-Robots-Tag work for HTML pages? A: Yes, identically to meta. It’s just less common because most CMSes write the meta tag for you. The header is the right choice for non-HTML (PDF, image, ZIP) where there’s no place to put a meta tag.

Q: My staging environment leaked noindex to production. How do I undo it fast? A: Remove the header (deploy or CDN rule change), then in Search Console: URL Inspection on the affected URL → “Test Live URL” → confirm the header is now absent in the HTTP response view → “Request Indexing”. For a homepage or top-level page, this can re-index within 24–72 hours. Remember the ~10–12-submissions-per-day cap, so prioritize your most important URLs.

Q: A something.vercel.app URL of my site is in Google but my real domain isn’t. Why? A: Vercel sets X-Robots-Tag: noindex on system-generated *.vercel.app URLs by default, so those shouldn’t be indexed — but if Google found one via a link before it could read the header, the raw URL can linger. Fix the canonical/links so only your production domain is referenced, make sure your production domain returns no noindex header (curl -sI), and let the *.vercel.app URL drop out. Never serve your real content on the bare deployment URL as the canonical.

Q: Does nosnippet or max-snippet interact differently between meta and header? A: No — same merge rule applies. The most restrictive value across both sources wins. If meta says max-snippet:50 and header says max-snippet:200, the page gets max 50. And nosnippet overrides any max-snippet value on either channel.

Tags: #SEO #Troubleshooting #Debug #Structured data #robots.txt #X-Robots-Tag
