I Set Noindex But the Page Is Still in Search Results

You added `<meta name="robots" content="noindex">` weeks ago but the page is still in Google. The six reasons, in hit-rate order, with the curl checks to tell them apart.

Published: May 19, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

Three weeks ago you added <meta name="robots" content="noindex"> to a thank-you page, an internal dashboard, or a staging copy that leaked to production. Today you run site:yourdomain.com/that-url and Google still returns it. The meta tag is in View Source. So what is happening?

Fastest answer: Google only acts on noindex after it re-crawls the page and actually sees the tag. In Google’s own words, “if the page is blocked by a robots.txt file or the crawler can’t access the page, the crawler will never see the noindex rule, and the page can still appear in search results” (Block Search indexing). So before anything else, run one check:

# Does robots.txt block this path? If yes, that's your bug.
# A "Noindex:" line in robots.txt also shows up here; Google does not support that directive.
curl -s https://yourdomain.com/robots.txt | grep -iE 'disallow|noindex'

If a Disallow line covers the URL, remove it — that single fix clears roughly 40% of these cases (the crawler can finally reach the page and read noindex). If robots.txt is clean, work down the six cases below in hit-rate order.

Quick diagnosis

Run these against the live URL and match the result to a case:

| What you observe | Likely case | Jump to |
| --- | --- | --- |
| `robots.txt` has a `Disallow` covering the URL | robots.txt blocks the crawl | Case 1 |
| `curl` raw HTML has no robots meta, but DevTools DOM does | JS-rendered noindex | Case 2 |
| Cache-busted `curl` shows `noindex`, plain `curl` doesn’t | Stale CDN HTML | Case 3 |
| Tag is in raw HTML, “Last crawled” predates the change | Not re-crawled yet | Case 4 |
| `X-Robots-Tag` header conflicts with the meta tag | Conflicting signals | Case 5 |
| `site:` shows it, but URL Inspection says “not on Google” | `site:` lag (already fixed) | Case 6 |

How to identify which case you’re in

Case 1: `robots.txt` is blocking the crawl

This is the most common failure. Symptom in Search Console: the Page indexing report (left sidebar → Indexing → Pages) lists the URL under “Indexed, though blocked by robots.txt,” or URL Inspection shows “Blocked by robots.txt.”

How to spot it:

curl -s https://yourdomain.com/robots.txt | grep -i your-path
# Disallow: /private/

Why it happens: someone thought “to keep a page out of Google, block it in robots.txt.” That stops the crawl but does not remove the URL from the index — external links keep it there with no snippet. The meta noindex you added is invisible to Google because Google never fetches the page.

Fix: remove the Disallow line for that URL. Google must crawl the page to see the noindex and process the removal. Once the crawl is unblocked, the URL typically leaves the index within 1–4 weeks.

# robots.txt — before
User-agent: *
Disallow: /private/

# robots.txt — after (allow crawl so meta noindex takes effect)
User-agent: *

For permanent exclusion: keep the noindex meta, remove the Disallow. They are mutually exclusive — noindex requires crawl access.
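To test one specific path instead of reading the grep output by eye, a minimal sketch like the following may help. `disallow_matches` is a hypothetical helper, not part of any tool: it reads a robots.txt on stdin and prints every Disallow value that is a plain prefix of the path you pass. It deliberately ignores User-agent groups, Allow overrides, and `*` / `$` wildcards, so treat a hit as a lead to confirm in URL Inspection rather than a verdict.

```shell
# Hypothetical helper: print each Disallow rule that is a prefix of the path.
# Simplifying assumptions: no User-agent grouping, no Allow rules, no wildcards.
disallow_matches() {
  awk -v p="$1" '
    tolower($0) ~ /^[ \t]*disallow[ \t]*:/ {
      rule = $0
      sub(/^[^:]*:[ \t]*/, "", rule)    # strip the "Disallow:" label
      sub(/[ \t\r]*(#.*)?$/, "", rule)  # strip trailing comment, spaces, CR
      # An empty Disallow blocks nothing, so only non-empty prefixes count.
      if (rule != "" && index(p, rule) == 1) print rule
    }'
}
```

Usage, assuming the same placeholder domain as above: `curl -s https://yourdomain.com/robots.txt | disallow_matches /private/report.html`. Empty output means no simple prefix rule covers the path.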

Case 2: Noindex is rendered by client-side JavaScript

How to spot it:

# Server response, raw HTML
curl -s https://yourdomain.com/path | grep -i "name=\"robots\""
# (no result — meta tag missing in initial HTML)

Then in Chrome DevTools → Elements (which shows the rendered DOM after JS runs), the meta tag is present. Or in Search Console URL Inspection → “View crawled page” → the “HTML” tab shows no meta, but “Rendered HTML” shows it.

Why it happens: a client-side framework (React, Vue) injects the robots meta after hydration. Googlebot may or may not execute JS on a given crawl. When it doesn’t, it sees no noindex and keeps the URL indexed.

Fix: render noindex in the SSR HTML so it’s present in the response Googlebot fetches, not appended later. For SPAs, put the meta tag in the initial server <head>. If you can’t change the rendered HTML at all (static host, no SSR), send the directive as an HTTP response header instead — X-Robots-Tag: noindex is read before any JS runs and is the recommended fallback. To verify, confirm noindex appears in the raw curl output, not just the DevTools DOM.
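The raw-HTML check can be wrapped in a small function so it also catches meta tags that span several lines or use the `googlebot` name. This is a sketch: `has_meta_noindex` is a hypothetical helper, and it is a regex heuristic rather than a real HTML parser, so unusual markup may slip past it.

```shell
# Hypothetical helper: exit 0 if the HTML on stdin contains a robots or
# googlebot meta tag that mentions noindex; exit non-zero otherwise.
has_meta_noindex() {
  tr '\n' ' ' \
    | grep -oiE '<meta[^>]*>' \
    | grep -iE "name=[\"']?(robots|googlebot)" \
    | grep -qi 'noindex'
}
```

Usage: `curl -s https://yourdomain.com/path | has_meta_noindex && echo "noindex is in the raw HTML"`. Because curl does not run JavaScript, a success here means the tag is in the server response itself.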

Case 3: CDN is serving stale HTML

How to spot it:

# Add a cache-busting query
curl -s "https://yourdomain.com/path?cb=$(date +%s)" | grep -i robots
# noindex appears

# Plain request
curl -s "https://yourdomain.com/path" | grep -i robots
# noindex missing

Different result between cache-busted and plain requests = CDN cache.

Why it happens: Cloudflare, Vercel Edge, CloudFront, etc. cache HTML responses. If you added the noindex meta after the page was already cached, the CDN serves the stale version to Googlebot for the cache TTL (often 24h–7d, sometimes longer).
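The response headers usually say whether a cache answered the request. The sketch below filters `curl -sI` output down to the cache-related headers; `cache_headers` is a hypothetical helper, and the header list is an assumption that covers common setups (`cf-cache-status` on Cloudflare, `x-cache` on CloudFront, `x-vercel-cache` on Vercel), not an exhaustive one.

```shell
# Hypothetical helper: print only the cache-related response headers
# from `curl -sI` output supplied on stdin.
cache_headers() {
  tr -d '\r' \
    | grep -iE '^(age|cache-control|cf-cache-status|x-cache|x-vercel-cache):'
}
```

Usage: `curl -sI https://yourdomain.com/path | cache_headers`. A hit status together with an `age` value larger than the time since your change suggests the CDN is still serving the old HTML.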

Fix: purge the CDN cache for that URL.

# Cloudflare
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE/purge_cache" \
  -H "Authorization: Bearer $CF_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"files":["https://yourdomain.com/path"]}'

Vercel: a new deploy invalidates the edge cache. Netlify: a new production deploy does the same. Then curl the URL again to confirm the meta tag is now in the response.

Case 4: Google has not re-crawled yet

How to spot it: in Search Console → URL Inspection → “Last crawled” date is older than when you added noindex.

Why it happens: Google’s crawl frequency for low-traffic pages can be weeks to months — Google’s docs note “it may take months for Googlebot to revisit a page.” The meta is in the HTML, but Googlebot hasn’t fetched it since you added it.

Fix: URL Inspection → “Request indexing.” Yes, you can request indexing of a noindex page — counter-intuitive, but it just triggers a re-crawl; Google fetches the page, reads the noindex, and drops the URL from the index. Typical timeline as of June 2026: a few days for high-traffic sites, 2–4 weeks for low-traffic. Requesting indexing has a daily quota, so submit the one URL that matters rather than batching.

Case 5: Conflicting signals across meta and `X-Robots-Tag`

How to spot it:

curl -sI https://yourdomain.com/path | grep -i x-robots
# X-Robots-Tag: index, follow   <-- conflicts with meta noindex

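Why it happens: the meta tag in the HTML says noindex, but an HTTP header says index. Google combines the two and applies the most restrictive rule, so the noindex should still win in this setup. If the page stays indexed anyway, suspect the layer that sets the header: a CDN or proxy that strips or overwrites headers may also be serving HTML without the meta tag, so check both the header and the raw HTML of the live response.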

Fix: see Meta Robots vs X-Robots-Tag — which one wins for the full conflict resolution. In short: align both, or rely on only one.
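The "most restrictive wins" rule can be sketched as a tiny function. `effective_indexing` is a hypothetical helper that takes the meta `content` value and the `X-Robots-Tag` value you copied from the curl output and prints the state the combination should produce. It treats `none` as implying noindex, and it assumes plain directive lists, so a header with a user-agent prefix such as `googlebot: noindex` is not handled.

```shell
# Hypothetical helper: $1 = meta robots content, $2 = X-Robots-Tag value.
# Prints "noindex" if either source carries noindex (or none), else "index".
effective_indexing() {
  # Lowercase, drop spaces, and wrap in commas so each directive is delimited.
  combined=$(printf ',%s,%s,' "$1" "$2" | tr '[:upper:]' '[:lower:]' | tr -d ' ')
  case "$combined" in
    *,noindex,*|*,none,*) echo "noindex" ;;
    *)                    echo "index" ;;
  esac
}
```

For the conflict shown above, `effective_indexing 'noindex' 'index, follow'` prints `noindex`, which is why a stray `index` header alone should not keep the page in search results.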

Case 6: `site:` operator quirk vs actual indexing

How to spot it:

# In Google search
site:yourdomain.com/that-url
# Shows the URL

# But URL Inspection in Search Console says:
# "URL is not on Google"

Why it happens: site: results sometimes lag actual indexing state. The URL may already be removed from active search results but still appear in site: queries for a few days.

Fix: rely on URL Inspection, not site:, as the source of truth. If URL Inspection says “URL is not on Google,” the page is no longer in search results — the site: result will catch up.

Shortest fix path

In hit-rate order:

1. Check robots.txt for a Disallow on the URL → 40% of cases. Remove the Disallow so Google can crawl and see noindex.
2. curl the URL and confirm the meta tag is in the raw HTML, not added by JS → 25% of cases.
3. Purge CDN cache for the URL → 15% of cases.
4. URL Inspection → “Request indexing” → 15% of cases (just slow re-crawl).
5. For urgent removal: use the Removals tool → temporary 6-month suppression while the above works.

Using the Removals tool correctly

Search Console → Indexing → Removals → New Request → “Temporarily remove URL” → choose “Remove this URL only” or “Remove all URLs with this prefix”:

Effect: hides the URL from Google Search for about 6 months (Google’s documented duration as of June 2026).
Important: this is suppression, not removal. The URL still exists in Google’s index. After ~6 months the request expires, and if noindex hasn’t propagated, the URL reappears.
Use this only as a stopgap while the underlying noindex propagates. Google explicitly says the 6 months is meant to give you time to put a permanent solution in place.

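Common misuse: people submit a Removals request and then think they are done. Without a permanent signal behind it (noindex, a 404/410 response, or authentication on the page), the URL comes back after 6 months. A robots.txt Disallow does not count, because it blocks the crawl without removing the URL from the index. See URL Removals tool confusion.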

Permanent removal flow (the right sequence)

For a page you want completely out of Google forever:

1. Ensure the URL returns 200 (not 404/410 yet, because Google must be able to crawl it).
2. Add <meta name="robots" content="noindex"> in SSR HTML, or an X-Robots-Tag: noindex header.
3. Ensure robots.txt does NOT Disallow the URL.
4. URL Inspection → “Request indexing” to trigger a crawl.
5. Wait for Google to re-crawl (URL Inspection’s “Last crawled” date updates).
6. URL Inspection now reports “Excluded by ‘noindex’ tag”. That is the success state.
7. (Optional) Once Google has confirmed noindex, you can 410 or 404 the URL to fully retire it.

If you 404/410 before Google sees the noindex, Google may keep the URL in the index for a long time because all it sees is “this URL stopped responding,” which is not the same as “the owner says don’t index.”
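Before requesting indexing, the status code and header directive from the first two steps can be read in one pass. The following is a sketch only: `header_preflight` is a hypothetical helper that reads `curl -sI` output on stdin, takes the status from the first line, and reports whether an `X-Robots-Tag` header carries noindex (or `none`). It assumes a single response without redirects, and it says nothing about a meta tag in the body.

```shell
# Hypothetical helper: summarize status code and X-Robots-Tag state
# from `curl -sI` output on stdin (single response, no redirects assumed).
header_preflight() {
  awk '
    NR == 1 { status = $2 }                 # e.g. "HTTP/1.1 200 OK" -> 200
    { line = tolower($0) }
    line ~ /^x-robots-tag:/ && line ~ /(noindex|none)/ { tag = 1 }
    END {
      printf "status=%s x_robots_noindex=%s\n", status, (tag ? "yes" : "no")
    }'
}
```

Usage: `curl -sI https://yourdomain.com/path | header_preflight`. For the flow above you want `status=200`; `x_robots_noindex=no` is fine as long as the meta tag is present in the raw HTML instead.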

How to confirm it’s actually fixed

Don’t trust site: as your signal — it lags. Confirm in this order:

1. Raw fetch contains the directive. curl -s https://yourdomain.com/path | grep -i robots must show noindex. For an HTTP-header setup, curl -sI https://yourdomain.com/path | grep -i x-robots must show X-Robots-Tag: noindex.
2. Google fetched it after your change. In URL Inspection, “Last crawled” is newer than the date you added noindex. If it’s older, click “Request indexing” and wait.
3. The page report flips to the success state. URL Inspection (or the Page indexing report) shows the URL under “Excluded by ‘noindex’ tag.” That is the confirmed-removed state. Once it shows here, the live search result will drop within days even if site: still echoes it.

If after 8 weeks the URL is still “Indexed” in URL Inspection, the crawl is being blocked somewhere — re-run the Quick diagnosis table and look hardest at robots.txt, JS-only rendering, and CDN cache.

Prevention

Render noindex in SSR HTML — never via client JS.
Never combine noindex with robots.txt Disallow on the same URL.
After changing robots meta, purge CDN cache for the affected paths.
Use X-Robots-Tag for non-HTML responses (PDFs, images) where a meta tag isn’t possible.
For pages you never want indexed (admin, internal tools), put them behind auth — that’s stronger than any robots directive, because Google can’t see the content to index at all.

Google’s own reference for the supported syntax and behavior: Block Search indexing with noindex.

FAQ

Q: I added noindex weeks ago. How long until the URL disappears from search? A: Typical timeline: 1–4 weeks after Google’s first re-crawl. If you have not seen the URL leave search after 8 weeks, something is blocking the crawl — check robots.txt, JS-rendered meta, or CDN cache.

Q: Can I combine noindex and robots.txt Disallow for extra safety? A: No — that combination breaks noindex. The Disallow prevents the crawl, so Google never sees the meta tag. The URL stays indexed (often with the note “Indexed, though blocked by robots.txt”). Pick one: noindex (crawlable, removed from index) OR robots.txt Disallow (uncrawled, may stay in index from external signals).

Q: Does noindex lose backlink equity? A: A noindex, follow page passes link equity through outbound links but does not itself rank. noindex, nofollow blocks both. Most use cases want noindex, follow — the default when you write noindex.

Q: My URL shows “Excluded by noindex tag” in Search Console but still appears in site: results. Is it actually removed? A: Yes. “Excluded by noindex tag” is the success state. site: operator lag is normal and resolves within days. URL Inspection is the source of truth.

Q: Can the Removals tool permanently remove a URL? A: No — it’s a ~6-month suppression. For permanent removal you still need noindex (or auth, or 410). The Removals tool buys time while the underlying mechanism propagates.

Q: Will <meta name="robots"> work, or do I need the googlebot-specific tag? A: <meta name="robots" content="noindex"> applies to all compliant crawlers, including Googlebot, so it’s the right default. Use <meta name="googlebot" content="noindex"> only when you want a rule that targets Google specifically (e.g., noindex for Google but allow another engine). Don’t add both with conflicting values on the same page.

Q: Why is my page “Indexed, though blocked by robots.txt” even though I added noindex? A: That status is the smoking gun for Case 1. The Disallow line stops Googlebot from fetching the page, so it never reads your noindex. Remove the Disallow, request indexing, and the status will move to “Excluded by ‘noindex’ tag” after the next crawl.

Tags: #SEO #Troubleshooting #Debug #Structured data #noindex


