noindex vs robots.txt: Which to Use (and the Trap That Breaks Both)

noindex blocks indexing; robots.txt blocks crawling. They are not interchangeable. Here's a decision table, the conflict that silently leaves pages indexed with no snippet, and how to confirm the fix in Search Console.

Published: May 15, 2026 | Updated: Jun 21, 2026 | Author: AI Productivity Guide Team

TL;DR: To keep a page out of Google’s results, use noindex (a meta tag or an X-Robots-Tag header) and leave the page crawlable in robots.txt. To stop Google wasting crawl budget on junk URLs, use robots.txt Disallow. Never put both on the same URL: if robots.txt blocks the page, Googlebot never reads the noindex, and the URL can still get indexed and appear without a description.

One-line difference

robots.txt Disallow controls whether the page is crawled. The bot is told not to open it. It says nothing about indexing.
noindex (<meta name="robots" content="noindex"> or the X-Robots-Tag: noindex HTTP header) controls whether the page is indexed. The bot must open the page to read the rule.

The dependency is the whole story: noindex only works if the page is crawlable. Google’s own docs are blunt about it — “For the noindex rule to be effective, the page or resource must not be blocked by a robots.txt file.” Block crawling and you also block the instruction that removes the page.
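To see the dependency in practice, the sketch below answers the same question as the “Crawl allowed?” line in URL Inspection. It works locally, from the text of a robots.txt file. It is a minimal Python sketch that assumes the standard library’s urllib.robotparser. That parser applies the first matching rule in file order and ignores Google’s * and $ wildcards, so treat its answer as an approximation of Googlebot rather than a substitute for URL Inspection. The function name noindex_can_be_read and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def noindex_can_be_read(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets `user_agent` fetch `url`.

    A noindex on the page can only take effect when this is True, because the
    crawler has to download the page to see the meta tag or header.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots = """\
User-agent: *
Disallow: /old-landing-page
"""

# Blocked: Googlebot never fetches the page, so a noindex on it is never seen.
print(noindex_can_be_read(robots, "https://example.com/old-landing-page"))  # False
# Crawlable: a noindex here will be read on the next crawl.
print(noindex_can_be_read(robots, "https://example.com/pricing"))  # True
```

If this returns False for a URL you have noindexed, that URL is in the Disallow + noindex trap. Remove the Disallow before expecting the noindex to work.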

Which one do I want?

Goal	Use this	Not this
Keep a public page out of search results	`noindex` (crawlable)	`Disallow` alone won’t deindex it
Stop Google crawling a private/admin path at all	`Disallow`	`noindex` (still gets crawled)
Save crawl budget on faceted/filter/junk URLs	`Disallow`	`noindex` (Google still crawls them)
Deindex a PDF, image, or non-HTML file	`X-Robots-Tag: noindex` header	meta tag (no HTML head to put it in)
Site-search result pages	`Disallow` is fine	—
Truly secret content	server-side auth / password	neither — both are public hints

Two rules of thumb: Disallow is about crawling and crawl budget; noindex is about what appears in results. And robots.txt is a public file at https://yourdomain.com/robots.txt — never treat a Disallow line as a security control.

The trap: Disallow + noindex on the same URL

This is the single most common self-defeating combo. You add noindex to a page to remove it, then also Disallow it in robots.txt “to be safe.” The result:

Googlebot honors robots.txt and never fetches the page.
Because it never fetched the page, it never sees the noindex.
If anything links to the URL (internal or external), Google can still index the bare URL. It appears without a description, often with little more than the URL itself, plus a note such as “No information is available for this page.”

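In Search Console this shows up in the Pages (Page indexing) report as “Indexed, though blocked by robots.txt.” Because the URL is indexed, it is flagged as a warning on indexed pages rather than listed under Why pages aren’t indexed. That status is the fingerprint of this exact mistake. A page that’s correctly deindexed instead appears under Why pages aren’t indexed as “Excluded by ‘noindex’ tag.”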

Fix: decide what you actually want, then pick one mechanism.

Want it gone from results? Remove the Disallow line so the page is crawlable, keep the noindex, and wait for a recrawl.
Want it never crawled? Keep the Disallow, but accept that this does not guarantee deindexing — and remove the page from sitemaps and internal links so nothing points Google at it.
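If you take the “never crawled” route, the sitemap is the easiest pointer to miss. Below is a minimal Python sketch that lists the sitemap URLs robots.txt blocks. It assumes a single sitemaps.org <urlset> file (not a sitemap index). It uses the standard library’s urllib.robotparser as an approximation of Googlebot’s matching: no wildcard support, and the first matching rule wins. The function name and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace used by sitemaps.org <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def disallowed_sitemap_urls(sitemap_xml: str, robots_txt: str,
                            user_agent: str = "Googlebot") -> list[str]:
    """List sitemap <loc> URLs that robots.txt blocks for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    # Any URL listed here is both submitted for crawling and blocked from it.
    return [url for url in locs if not parser.can_fetch(user_agent, url)]


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/admin/login</loc></url>
</urlset>"""

robots = "User-agent: *\nDisallow: /admin/\n"

print(disallowed_sitemap_urls(sitemap, robots))  # ['https://example.com/admin/login']
```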

Don’t put `noindex` in robots.txt

A related dead end: some old guides show a Noindex: line inside robots.txt. Google dropped support for that on September 1, 2019 — it does nothing now. noindex only works as a meta tag or an X-Robots-Tag HTTP header on the page itself. See meta robots vs X-Robots-Tag for which to use where.
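Since a leftover Noindex: line fails silently, it can be worth scanning robots.txt for one. Below is a minimal Python sketch that assumes you have the file as text. It only flags lines whose field name is Noindex (case-insensitive). The function name is hypothetical.

```python
def find_robots_noindex_lines(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line_number, rule) for every Noindex: rule in robots.txt.

    Google stopped honoring these on September 1, 2019, so each hit is a rule
    that looks active but does nothing. Move it to a meta tag or X-Robots-Tag.
    """
    hits = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        # Drop any trailing comment, then compare the field name case-insensitively.
        rule = line.split("#", 1)[0].strip()
        field, sep, _ = rule.partition(":")
        if sep and field.strip().lower() == "noindex":
            hits.append((number, rule))
    return hits


robots = """\
User-agent: *
Disallow: /admin/
Noindex: /old-campaign/   # copied from a pre-2019 guide
"""

for number, rule in find_robots_noindex_lines(robots):
    print(f"line {number}: {rule!r} is ignored by Google")
```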

How to apply each one

noindex on an HTML page — in the <head>:

<meta name="robots" content="noindex">

Add nofollow only if you also want Google to ignore the page’s links: content="noindex, nofollow". For most “remove from results” cases, plain noindex (which keeps follow) is what you want, so internal link equity still flows.
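A small parser can collect the directives from robots meta tags, which shows what the raw HTML actually contains (what View Source shows, before any JavaScript runs). Below is a minimal Python sketch using the standard library’s html.parser. It also reads <meta name="googlebot">, Google’s crawler-specific variant. The class and function names are hypothetical. The none directive means noindex, nofollow, so a complete check should treat it as noindex too.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Valueless attributes arrive as None; normalize them to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() in {"robots", "googlebot"}:
            # content="noindex, nofollow" -> {"noindex", "nofollow"}
            for token in attr.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_meta_directives(raw_html: str) -> set[str]:
    """Return the lowercased robots directives found in raw (unrendered) HTML."""
    parser = RobotsMetaParser()
    parser.feed(raw_html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex"></head><body>...</body></html>'
d = robots_meta_directives(page)
print("noindex" in d, "nofollow" in d)  # True False
```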

noindex on a PDF / image / non-HTML file — set an HTTP response header (no head to edit):

X-Robots-Tag: noindex
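How you set the header depends on your server or framework. As one illustration, below is a minimal Python WSGI middleware that adds X-Robots-Tag: noindex to responses for paths ending in .pdf. The middleware name, the suffix list and the choice of WSGI are all assumptions for the example. On Apache or nginx you would normally set the header in the server config instead.

```python
from typing import Callable, Iterable

# Hypothetical list of file types that should never be indexed; extend as needed.
NOINDEX_SUFFIXES = (".pdf",)


def noindex_files_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so matching files are served with X-Robots-Tag: noindex."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "").lower()

        def start_with_header(status, headers, exc_info=None):
            if path.endswith(NOINDEX_SUFFIXES):
                # A PDF has no <head>, so the response header is the only place for noindex.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in app that serves every path as a binary file."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"..."]


app = noindex_files_middleware(demo_app)
```

To try it locally, run `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` to serve the wrapped app. Then `curl -I http://localhost:8000/report.pdf` should list the X-Robots-Tag header. In production the same header is often set in the web server or CDN configuration rather than in application code.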

Disallow in robots.txt — block a path from crawling:

User-agent: *
Disallow: /admin/
Disallow: /search
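Disallow values match as path prefixes. That means Disallow: /search also blocks /search-tips, while Disallow: /admin/ does not block /admin itself. A quick way to sanity-check a rule set before deploying it is Python’s urllib.robotparser. It approximates rather than reproduces Google’s matcher: it has no * or $ wildcards, and it uses the first matching rule instead of the most specific one. The URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs to check against the rules above.
for url in [
    "https://yourdomain.com/admin/settings",   # blocked by /admin/
    "https://yourdomain.com/admin",            # allowed: no trailing slash, so /admin/ doesn't match
    "https://yourdomain.com/search?q=shoes",   # blocked by /search
    "https://yourdomain.com/search-tips",      # also blocked: Disallow is a prefix match
    "https://yourdomain.com/blog/post",        # allowed
]:
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:8} {url}")
```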

How to confirm it’s fixed

Test that the page is crawlable. In Search Console open URL Inspection, paste the URL, and check that “Crawl allowed?” is Yes. If it says “No: blocked by robots.txt,” your noindex will never take effect.
Confirm the tag is actually served. Use View Source, not the rendered DOM. Some setups inject noindex with JavaScript, which Google only sees after rendering, and rendering can lag behind the crawl. curl -I https://yourdomain.com/page shows the X-Robots-Tag header if you used that route.
Request indexing / wait for recrawl. In URL Inspection click Request Indexing. Deindexing isn’t instant — it can take days to weeks for Google to recrawl and drop the page.
Watch the status flip. Once it works, the URL moves out of “Indexed, though blocked by robots.txt” and into “Excluded by ‘noindex’ tag.” That’s the success state.
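The curl -I check can also be scripted. The minimal Python sketch below parses the text that curl -I prints (a status line, then headers). It reports whether an X-Robots-Tag header applies noindex to Googlebot. It assumes Google’s documented header forms, including the crawler-scoped form (googlebot: noindex) and none as shorthand for noindex, nofollow. The function name is hypothetical, and obsolete folded header lines are not handled.

```python
def header_says_noindex(raw_headers: str, crawler: str = "googlebot") -> bool:
    """Return True if an X-Robots-Tag header applies noindex to `crawler`.

    `raw_headers` is the text printed by `curl -I`: one status line, then
    "Name: value" lines. The first line is always skipped as the status line.
    """
    for line in raw_headers.splitlines()[1:]:
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "x-robots-tag":
            continue
        scope, colon, rest = value.partition(":")
        scope = scope.strip().lower()
        # "googlebot: noindex" targets one crawler. A colon after a comma, or the
        # colon in "unavailable_after: <date>", belongs to a rule, not a crawler name.
        if colon and "," not in scope and scope != "unavailable_after":
            if scope != crawler.lower():
                continue  # header addressed to a different crawler
            rules = rest
        else:
            rules = value
        tokens = {token.strip().lower() for token in rules.split(",")}
        if "noindex" in tokens or "none" in tokens:  # "none" = noindex, nofollow
            return True
    return False


response = """HTTP/2 200
content-type: application/pdf
x-robots-tag: noindex
"""
print(header_says_noindex(response))  # True
```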

FAQ

I added noindex but the page is still in Google. Why? Most likely the page is also Disallowed in robots.txt, so Googlebot can’t fetch it to see the tag. Remove the Disallow, confirm “Crawl allowed? Yes” in URL Inspection, then wait for a recrawl. (Other causes: the tag is JS-injected and not in the raw HTML, or Google simply hasn’t recrawled yet.)

Does Disallow in robots.txt remove a page from search? No. It stops crawling, not indexing. A Disallowed URL that’s linked from elsewhere can still appear in results — without a snippet. To remove a page, use noindex and keep it crawlable.

Can I use both on the same URL if I’m careful? No — they directly conflict. robots.txt blocking wins on the crawl, so the noindex is never read. Pick one mechanism per URL.

How do I deindex something fast? For an emergency, use Search Console’s Removals tool to hide a URL for ~6 months while your noindex (with crawling allowed) does the permanent job. Removals alone is temporary; the noindex is what makes it stick.

What about nofollow and disallow: same thing? No. nofollow is a hint about links (rel="nofollow" on a single link, or nofollow in the robots meta tag for every link on the page): don’t pass signals through them. Disallow is a crawl directive in robots.txt. noindex is an indexing directive. Three different layers; don’t substitute one for another.

Tags: #SEO #Google #Indexing #Debug

Related Articles

Hreflang "No Return Tags": Fix the Missing Reciprocal Link

JavaScript-Rendered Content Not Showing in Google Index

Indexing Dropped After Google Switched Your Site to Mobile-First

noindex,follow on Page 2+ Is Orphaning Your Deep Articles

Query-Parameter URLs Creating Duplicate Index Entries

robots.txt Blocks CSS/JS and Indexing Quality Drops
