
Noindex vs Nofollow vs Disallow: When to Use Each

Three SEO controls, three different jobs. Pick the wrong one and you leak pages into the index, waste crawl budget, or hide content so thoroughly Google can never drop it.

Published: May 23, 2026 Updated: Jun 04, 2026 Author: AI Productivity Guide Team

noindex, nofollow, and Disallow look interchangeable in the docs and behave wildly differently in production. One keeps a page out of search results, one is a hint about link signals, and one tells crawlers not to fetch the page at all. Use the wrong one and you either leak pages into the index, waste crawl budget on junk, or hide a page from Google so completely it cannot even drop it.

TL;DR

Want a page gone from results: use noindex (meta tag or X-Robots-Tag header). Make sure robots.txt does not block the same URL, or Google never sees the directive.
Want crawlers to skip a heavy path entirely: use Disallow in robots.txt, and accept that the URL may still appear without a snippet if other sites link to it.
Want to stop vouching for a link: use rel="nofollow", or the more precise rel="sponsored" / rel="ugc". As of June 2026 these are hints, not hard directives.
Never put noindex and Disallow on the same URL. The block stops Google from reading the noindex, so the URL gets stuck in the index.

Three mechanisms, three problems

These controls grew out of three separate problems at three different times.

Control	Where it lives	What it does	Crawl stage
`Disallow`	`robots.txt`	“Do not fetch this path” (original 1994 robots standard)	Before fetch
`noindex`	`<meta>` tag or `X-Robots-Tag` HTTP header	“Fetch it, but keep it out of search results”	After fetch
`nofollow` / `sponsored` / `ugc`	`rel` attribute on a single link	“Do not vouch for / do not pass trust through this link”	Per link

They live in different files, fire at different stages, and answer different questions. The mistake is treating “block crawling” and “block indexing” as the same lever. They are opposites: Disallow stops the fetch, noindex requires the fetch.
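To make “Disallow stops the fetch, noindex requires the fetch” concrete, here is a toy model of a single crawl in Python. It is not how Googlebot is built: `Page`, `crawl_outcome`, the prefix list standing in for robots.txt, and the sample paths are all hypothetical. It only shows the ordering: the robots.txt check runs before any fetch, the noindex is read after it, and rel values apply per link.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a fetched HTML page."""
    noindex: bool = False
    # Outbound links as (href, set of rel tokens) pairs.
    links: list[tuple[str, set[str]]] = field(default_factory=list)


QUALIFIERS = {"nofollow", "sponsored", "ugc"}


def crawl_outcome(path: str, disallowed_prefixes: list[str], site: dict[str, Page]) -> dict:
    """Toy model of the three stages; not Googlebot's real pipeline."""
    # Before fetch: the robots.txt Disallow check runs first.
    if any(path.startswith(prefix) for prefix in disallowed_prefixes):
        # No fetch happened, so a noindex on this page can never be seen.
        return {"fetched": False, "saw_noindex": False, "vouched_links": []}

    # After fetch: only now is the page's noindex directive readable.
    page = site[path]

    # Per link: rel qualifiers decide which links the page vouches for.
    vouched = [href for href, rel in page.links if not rel & QUALIFIERS]
    return {"fetched": True, "saw_noindex": page.noindex, "vouched_links": vouched}


# Hypothetical two-page site.
site = {
    "/thank-you": Page(noindex=True),
    "/blog/post": Page(links=[("https://example.com", {"nofollow"}), ("/about", set())]),
}

# Disallowed and noindexed: the noindex is never read.
print(crawl_outcome("/thank-you", ["/thank-you"], site))
# {'fetched': False, 'saw_noindex': False, 'vouched_links': []}

# Crawlable and noindexed: the directive is seen.
print(crawl_outcome("/thank-you", [], site))
# {'fetched': True, 'saw_noindex': True, 'vouched_links': []}
```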

How to tell which problem you have

A staging, cart, or thank-you page is showing up in site:yoursite.com results.
Your Search Console Pages report flags “Indexed, though blocked by robots.txt” — the worst-of-both-worlds state.
You added Disallow to robots.txt to hide a page, and the URL still appears in results, just without a description.
Internal links to login, cart, or admin pages are bleeding link equity into non-indexable surfaces.

The Disallow vs noindex trap

This is the single most common bug, and Google’s own docs call it out directly: “For the noindex rule to be effective, the page must not be blocked by a robots.txt file.”

Someone wants to hide a page, so they add it to Disallow. Google obeys and stops crawling — but the URL was already indexed, and now Google cannot fetch the page to see the noindex you also added. The URL stays in the index indefinitely, listed without a snippet.

The fix has a strict order:

Remove the Disallow from robots.txt.
Add noindex (meta tag or header) and let Google re-crawl.
Once the URL has dropped, then you may re-add Disallow if you also want to block crawling.
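The three steps above can be sketched as a small decision helper, assuming Python's standard-library `urllib.robotparser` as the robots.txt parser. `next_fix_step`, `load_robots`, and the URLs are hypothetical, and whether a URL is still indexed is something you read from Search Console yourself. The stdlib parser does plain prefix matching from the original standard and does not implement Google's `*` and `$` wildcards, so treat its answer as an approximation of Googlebot's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with the same kind of rules as this section's example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search?
Disallow: /admin/
"""


def load_robots(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse from a string instead of fetching
    return parser


def next_fix_step(robots: RobotFileParser, url: str, has_noindex: bool,
                  still_indexed: bool, agent: str = "Googlebot") -> str:
    """Hypothetical helper: the next step of the remove/noindex/re-block order."""
    crawlable = robots.can_fetch(agent, url)
    if still_indexed and not crawlable:
        # Step 1: the block hides any noindex, so lift it first.
        return "remove Disallow"
    if still_indexed and not has_noindex:
        # Step 2: the page is fetchable, so give Google a noindex to read.
        return "add noindex"
    if still_indexed:
        # Step 2, waiting: noindex is in place; the URL drops after a recrawl.
        return "wait for recrawl"
    if crawlable:
        # Step 3: the URL is gone; blocking crawling again is optional.
        return "optionally re-add Disallow"
    return "done"


robots = load_robots(ROBOTS_TXT)
url = "https://yoursite.com/admin/login"
print(robots.can_fetch("Googlebot", url))  # False: /admin/ is disallowed
print(next_fix_step(robots, url, has_noindex=True, still_indexed=True))  # remove Disallow
```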

<!-- Page-level noindex: the safe default for "do not show this" -->
<meta name="robots" content="noindex, follow">

# robots.txt — blocks crawling, NOT indexing
User-agent: *
Disallow: /search?
Disallow: /admin/

If you only care about Google specifically, you can target its crawler: <meta name="googlebot" content="noindex">. The plain robots value applies to every compliant crawler, so prefer it unless you have a reason not to.
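To check which directives a given crawler would act on, a sketch built on Python's `html.parser` can collect the generic `robots` tag plus a crawler-specific one. `directives_for` and `is_noindexed` are hypothetical helpers. Taking the union approximates Google's documented rule that the more restrictive directive applies when tags conflict, and the sketch ignores the `X-Robots-Tag` header.

```python
from html.parser import HTMLParser


class MetaRobotsCollector(HTMLParser):
    """Collects directive tokens from <meta name="..." content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.by_name: dict[str, set[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: value or "" for key, value in attrs}
        name = attributes.get("name", "").strip().lower()
        tokens = {t.strip().lower() for t in attributes.get("content", "").split(",") if t.strip()}
        if name and tokens:
            self.by_name.setdefault(name, set()).update(tokens)


def directives_for(html: str, crawler: str) -> set[str]:
    """Union of the generic robots tag and the crawler-specific tag."""
    collector = MetaRobotsCollector()
    collector.feed(html)
    collector.close()
    return collector.by_name.get("robots", set()) | collector.by_name.get(crawler.lower(), set())


def is_noindexed(directives: set[str]) -> bool:
    # "none" is shorthand for "noindex, nofollow".
    return bool(directives & {"noindex", "none"})


page = '<head><meta name="googlebot" content="noindex"></head>'
print(is_noindexed(directives_for(page, "googlebot")))  # True
print(is_noindexed(directives_for(page, "bingbot")))    # False
```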

When nofollow is the right answer

nofollow is link-level, not page-level. It says “I do not vouch for where this link goes,” while sponsored marks a paid placement and ugc marks user-generated content. Since Google’s September 2019 change, nofollow, sponsored, and ugc are treated as hints rather than absolute commands, and from March 1, 2020 that also covers crawling and indexing: Google may still crawl the target and use the link for context. In practice they keep most links from passing ranking trust, but they no longer guarantee that Google ignores the link.

Use them on outbound links you do not editorially vouch for:

<a href="https://example.com" rel="nofollow">untrusted destination</a>
<a href="https://partner.com" rel="sponsored">paid placement</a>
<a href="https://forum-comment.com" rel="ugc">user comment</a>

When a link is both user-generated and paid, you can combine values: rel="ugc sponsored". Do not use nofollow to “save link equity” by capping outbound links — that pattern stopped working years ago and now mostly reads as manipulation.
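A quick audit helps find outbound links that still pass trust. This sketch, built on Python's `html.parser`, lists external links that carry none of the three qualifiers; `unqualified_outbound`, `LinkCollector`, and the `yoursite.com` host are hypothetical. Treat the output as a review list: a flagged link may be an editorial vouch that should stay as it is. Hosts are compared exactly, so `www.yoursite.com` counts as external here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

QUALIFIERS = {"nofollow", "sponsored", "ugc"}


class LinkCollector(HTMLParser):
    """Records every <a href> together with its space-separated rel tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, set[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:
            # rel="ugc sponsored" becomes {"ugc", "sponsored"}.
            rel = set((attributes.get("rel") or "").lower().split())
            self.links.append((href, rel))


def unqualified_outbound(html: str, own_host: str) -> list[str]:
    """External links that carry none of nofollow / sponsored / ugc."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    flagged = []
    for href, rel in collector.links:
        host = urlparse(href).hostname  # None for relative links like "/about"
        if host and host != own_host and not rel & QUALIFIERS:
            flagged.append(href)
    return flagged


html = """
<a href="https://example.com" rel="nofollow">untrusted destination</a>
<a href="https://partner.com" rel="ugc sponsored">paid user post</a>
<a href="https://docs.python.org/3/">reference docs</a>
<a href="/about">about us</a>
"""
print(unqualified_outbound(html, "yoursite.com"))  # ['https://docs.python.org/3/']
```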

Decision table

Goal	Use this	Caveat
Keep a page out of results, Google can still crawl	`noindex` only	Robots.txt must NOT block the URL
Crawler should never fetch a heavy/infinite path	`Disallow` only	URL may still appear without a snippet
Stop passing trust through one link	`nofollow` / `sponsored` / `ugc`	Hints, not directives, as of June 2026
Make a page truly gone	`noindex` first, recrawl, then optionally `Disallow`	Or return `410 Gone` / delete the page
Deindex a whole staging site	`noindex` via `X-Robots-Tag` header, or HTTP auth	Never use bare `Disallow: /` for this
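For the “make a page truly gone” row, a `410 Gone` status tells clients the removal is intentional and likely permanent. A minimal WSGI sketch, assuming a Python app serves the site; `serve_gone`, `GONE_PATHS`, and `site_app` are hypothetical names.

```python
# Hypothetical set of URLs that were deliberately removed.
GONE_PATHS = {"/old-promo", "/2019-sale"}


def serve_gone(app, gone_paths=frozenset(GONE_PATHS)):
    """Wrap a WSGI app so removed paths answer 410 Gone instead of rendering."""

    def wrapped(environ, start_response):
        if environ.get("PATH_INFO", "") in gone_paths:
            start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"This page has been permanently removed.\n"]
        return app(environ, start_response)

    return wrapped


def site_app(environ, start_response):
    """Hypothetical app that renders every other URL."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Hello</h1>"]


application = serve_gone(site_app)
```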

X-Robots-Tag for non-HTML files

noindex lives in two places: the <meta> tag (HTML pages) and the X-Robots-Tag response header (everything else). PDFs, JSON endpoints, images, and any URL whose response is not HTML cannot carry a meta tag, so use the header at the server or CDN layer.

X-Robots-Tag: noindex, follow

Google’s docs state the two methods “have the same effect,” so pick whichever is convenient for the content type. Where to set it:

Firebase Hosting: in the headers block of firebase.json with a source glob (for example **/*.pdf).
Nginx: add_header X-Robots-Tag "noindex, follow"; inside the relevant location block.
Cloudflare: a transform rule or a Worker that appends the header.
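If the files are served by a Python WSGI application rather than a CDN, the same header can be added in middleware. This is a sketch under that assumption: `x_robots_tag`, the `*.pdf` pattern, and `files_app` are hypothetical, and patterns are matched against the request path with `fnmatchcase`, so `*` also spans slashes.

```python
from fnmatch import fnmatchcase


def x_robots_tag(app, patterns=("*.pdf",), value="noindex, follow"):
    """Wrap a WSGI app so responses for matching paths carry X-Robots-Tag."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")
        matched = any(fnmatchcase(path, pattern) for pattern in patterns)

        def start_with_header(status, headers, exc_info=None):
            if matched:
                # Replace any existing X-Robots-Tag rather than sending two.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", value))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def files_app(environ, start_response):
    """Hypothetical app that serves a PDF."""
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.7 ..."]


application = x_robots_tag(files_app)
```

For a staging deployment, passing `patterns=("*",)` sends the header on every response, which covers the “deindex a whole staging site” row of the decision table.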

How long each signal takes to apply

noindex removal: once Googlebot re-crawls and sees the tag, the URL typically drops within 1–7 days. The catch is the recrawl itself — Google’s docs note “it may take months for Googlebot to revisit a page.” Push it with URL Inspection → Request indexing on the noindexed URL.
Disallow effect: applies once Google refetches robots.txt, which it generally caches for up to 24 hours. From then on, the blocked URLs are not fetched.
nofollow effect: trust stops flowing on the next crawl. PageRank already passed is not clawed back.

If you need something gone now, the Removals tool in Search Console hides a URL fast — but only for about 6 months. It is a temporary mask, not a removal. Pair it with a permanent signal (noindex, a 410 Gone status, or deletion) before the window expires, or the URL reappears.

A common surprise runs the other way: you remove noindex hoping a page indexes fast, but Google still has to recrawl to notice. Request indexing on a representative URL to push it sooner; the rest follow over the next couple of weeks.

Common mistakes

Disallow + noindex on the same URL. The block stops the crawl, Google never reads the noindex, the URL stays indexed. The top cause of “Indexed, though blocked by robots.txt.”
Treating nofollow as a “do not index this target” signal. It governs link trust, not the destination’s indexing.
Shipping Disallow: / from staging to production. The live site blocks every compliant crawler, often for weeks, until someone notices traffic flatlining.
Leaving noindex in a shared layout template after a temporary block, then wondering why the entire site dropped from search.
Adding noindex to legitimately indexable variants, such as paginated ?page=2 URLs or language versions, and deindexing valid content by accident.
Relying on the Removals tool alone. It expires in ~6 months; without a permanent fix the page comes back.
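The two sitewide mistakes in this list, a blanket `Disallow: /` and a leftover template `noindex`, are cheap to catch before deploy. Here is a hypothetical pre-deploy check in Python, assuming you can load the production robots.txt, the rendered homepage HTML, and its response headers; `predeploy_problems` is an illustrative name, and wiring it into CI is up to you. It only inspects the homepage, so a noindex scoped to one section would slip past it.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def _directive_names(value: str) -> set[str]:
    # "noindex, follow" -> {"noindex", "follow"}; "googlebot: noindex" -> {"noindex"}.
    return {part.split(":")[-1].strip().lower() for part in value.split(",") if part.strip()}


class _RobotsMeta(HTMLParser):
    """Collects directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = {key: value or "" for key, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            self.directives |= _directive_names(attributes.get("content", ""))


def predeploy_problems(robots_txt: str, homepage_html: str,
                       homepage_headers: dict[str, str]) -> list[str]:
    """Hypothetical check for the two sitewide mistakes before a production deploy."""
    problems = []

    # Staging robots.txt shipped to production: the homepage itself is blocked.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", "/"):
        problems.append("robots.txt blocks / for Googlebot")

    # noindex left in a shared layout template or a sitewide header.
    meta = _RobotsMeta()
    meta.feed(homepage_html)
    meta.close()
    header_value = next(
        (v for k, v in homepage_headers.items() if k.lower() == "x-robots-tag"), ""
    )
    if (meta.directives | _directive_names(header_value)) & {"noindex", "none"}:
        problems.append("homepage is noindexed (meta tag or X-Robots-Tag)")

    return problems


print(predeploy_problems("User-agent: *\nDisallow: /\n", "<html></html>", {}))
# ['robots.txt blocks / for Googlebot']
```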

FAQ

If I Disallow a URL, will it still appear in search?: Sometimes, yes. If other sites link to it, Google may list the bare URL without a description. Use noindex if your goal is “do not appear in search at all.”
Does noindex, nofollow make sense together?: Rarely. noindex already removes the page from results, and adding nofollow seals off internal link flow to your own content. Default to noindex, follow unless you deliberately want links sealed.
What is rel="sponsored" versus nofollow?: Both signal “no trust passed,” but sponsored specifically declares a paid placement. Google prefers the precise attribute when it is accurate, and all three are hints rather than hard rules as of June 2026.
How long until a noindex page leaves the index?: Usually 1–7 days after Google re-crawls and sees the tag, but the recrawl itself can take weeks or even months. Speed it up with URL Inspection → Request indexing.
How is the Removals tool different from noindex?: The Removals tool is a fast, temporary hide (~6 months). noindex is the permanent fix. Use both together when you need something gone immediately and for good.
Should I noindex thin tag and category pages?: Only if they genuinely add no value. A thin tag page with three articles is a candidate for merging, not noindexing. Treat noindex as the last resort.

External references: Google’s noindex documentation and Google’s guide to qualifying outbound links.

Tags: #Indie dev #SEO #Technical SEO #robots.txt #Indexing

Related Articles

Internal Search Result Pages: Index or Noindex?

Canonical URLs Explained — What to Set and What to Avoid

hreflang for Bilingual Sites — The Parts That Actually Matter

robots.txt — What to Put, What to Never Put (2026)

Should Your Category Pages Be Indexed?

Should Tag Pages Be Noindex? (For Most Sites, Yes)