Noindex, nofollow, and Disallow look interchangeable in the docs and behave wildly differently in production. One keeps a page out of search results, one is a hint about link signals, and one tells crawlers not to fetch the page at all. Use the wrong one and you either leak pages into the index, waste crawl budget on junk, or hide pages from Google so thoroughly it cannot even drop them.
Background
Three separate mechanisms grew out of three separate problems. Disallow (in robots.txt) was the original 1994 standard: do not fetch this path. noindex (meta tag or HTTP header) came later for “fetch it, but do not show it in search results.” nofollow (originally an anti-spam signal on links) asks Google not to pass PageRank through a link; since 2019 Google treats it as a hint rather than a strict rule. They live in different places, apply at different stages of crawling and indexing, and answer different questions.
How to tell
- A staging or thank-you page is showing up in site:yoursite.com results.
- Your Search Console Pages report flags “Indexed, though blocked by robots.txt”: the worst of both worlds.
- You added Disallow to robots.txt to hide a page, and the URL still appears in search results without a snippet.
- Internal links to login, cart, or admin pages are bleeding PageRank to non-indexable pages.
Quick verdict
Use noindex for pages you do not want in search results (cart, admin, internal tools, thin tag pages). Use Disallow only for paths you want crawlers to skip entirely, usually large dynamic surfaces (search, faceted filters). Use nofollow on links you do not vouch for (user-generated content, paid links). Never put noindex and Disallow on the same URL at the same time.
Disallow vs noindex — the trap
The most common bug: someone wants to hide a page, so they add it to Disallow in robots.txt. Google obeys and stops crawling, but the URL was already indexed, and now Google cannot fetch the page to see the noindex you also added. The URL can stay in the index indefinitely, listed without a description. The fix is to remove the Disallow, let Google recrawl the page and see the noindex, and wait for the URL to drop out cleanly. Only then, if you still want to block crawling, re-add the Disallow, accepting that if other sites link to the URL it can reappear as a bare listing, because Google can no longer see the noindex.
<!-- Page-level noindex (the safe default for "do not show this") -->
<meta name="robots" content="noindex, follow">
# robots.txt — block crawling, not indexing
User-agent: *
Disallow: /search?
Disallow: /admin/
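A minimal sketch of an audit for this trap, assuming you already have the robots.txt text and the page's HTML in hand. It uses Python's standard urllib.robotparser and html.parser; the names audit_url and RobotsMetaParser and the verdict strings are hypothetical. urllib.robotparser applies the original prefix-matching rules, not Google's * and $ wildcards, and the sketch checks only the meta tag (not an X-Robots-Tag header), so treat its answer as an approximation of what Googlebot does.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def audit_url(robots_txt, url, html, agent="Googlebot"):
    """Hypothetical helper: report how robots.txt and the meta robots tag interact."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    crawlable = robots.can_fetch(agent, url)

    meta = RobotsMetaParser()
    meta.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    noindex = bool(meta.directives & {"noindex", "none"})

    if not crawlable and noindex:
        return "conflict: Disallow hides the noindex; remove the Disallow first"
    if not crawlable:
        return "blocked: not fetched, but the bare URL can still be indexed"
    if noindex:
        return "noindex: crawlable, drops from results after the next crawl"
    return "indexable"


ROBOTS_TXT = """\
User-agent: *
Disallow: /search?
Disallow: /admin/
"""
NOINDEX_PAGE = '<meta name="robots" content="noindex, follow">'

print(audit_url(ROBOTS_TXT, "https://example.com/admin/users", NOINDEX_PAGE))
print(audit_url(ROBOTS_TXT, "https://example.com/thank-you", NOINDEX_PAGE))
```

Feeding it the URLs from Search Console's “Indexed, though blocked by robots.txt” list is one way to find the conflict cases first.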
When nofollow is the right answer
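nofollow is link-level, not page-level. It says “I do not vouch for where this link goes” or “this is a paid placement.” Use it on outbound links in user-generated content, paid review links, and affiliate placements. Google also accepts more specific values: rel="sponsored" for paid and affiliate links, and rel="ugc" for user-generated content. Do not use nofollow to “save PageRank” by nofollowing links you would otherwise leave followed. That pattern, known as PageRank sculpting, stopped working years ago: Google counts nofollowed links when dividing a page's PageRank, so the share assigned to a nofollowed link is lost rather than redistributed to the others.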
<a href="https://example.com" rel="nofollow">untrusted destination</a>
<a href="https://partner.com" rel="sponsored">paid placement</a>
<a href="https://forum-comment.com" rel="ugc">user comment</a>
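To see which links on a page already carry one of these hints, a small scan is enough. A minimal sketch using Python's html.parser; NO_EQUITY_HINTS, LinkCollector, and no_equity_hints are hypothetical names. The rel attribute is a space-separated, case-insensitive token list, so rel="Sponsored noopener" still counts as sponsored. Because Google treats all three values as hints, this reports what the page asks for, not what Google will do.

```python
from html.parser import HTMLParser

# rel values that ask Google not to pass link equity (hints, not guarantees).
NO_EQUITY_HINTS = {"nofollow", "sponsored", "ugc"}


class LinkCollector(HTMLParser):
    """Records (href, rel tokens) for every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel is a space-separated, case-insensitive list of tokens.
        tokens = set(attr.get("rel", "").lower().split())
        self.links.append((attr.get("href", ""), tokens))


def no_equity_hints(html):
    """Hypothetical helper: list each link with the no-equity hints it carries."""
    collector = LinkCollector()
    collector.feed(html)
    return [(href, sorted(tokens & NO_EQUITY_HINTS)) for href, tokens in collector.links]


page = (
    '<a href="https://example.com" rel="nofollow">untrusted destination</a>'
    '<a href="https://partner.com" rel="Sponsored noopener">paid placement</a>'
    '<a href="/docs/">our own docs</a>'
)
for href, hints in no_equity_hints(page):
    print(href, hints or "passes equity")
```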
Decision table
- Page should not appear in search results, but Google can crawl it: noindex only.
- Crawler should never fetch this path (heavy, low-value, or infinite): Disallow only, and accept that URLs Google already knows about may still appear without a snippet.
- You want to stop passing trust through a link: nofollow, or sponsored/ugc.
- You want a page truly gone: noindex first, wait for the recrawl, then optionally Disallow.
- You want to keep an entire site out of the index temporarily (staging): noindex at the response-header level, or HTTP auth; never just Disallow.
HTTP header variant for non-HTML
noindex lives in two places: the <meta> tag (HTML pages only) and the X-Robots-Tag response header (any response type). PDFs, JSON endpoints, images, and any URL whose response is not HTML cannot carry a meta tag, so use the header form on the server or CDN layer.
X-Robots-Tag: noindex, follow
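If your own Python application serves these responses, the header can be set in code rather than in host configuration. A minimal sketch as standard WSGI middleware; the class name NoindexNonHtml, the default value, and the rule of tagging every response whose Content-Type is not text/html are assumptions for illustration. Responses that already set X-Robots-Tag are left alone.

```python
class NoindexNonHtml:
    """Hypothetical WSGI middleware: add X-Robots-Tag to non-HTML responses."""

    def __init__(self, app, value="noindex, follow"):
        self.app = app
        self.value = value

    def __call__(self, environ, start_response):
        def tagging_start_response(status, headers, exc_info=None):
            names = {name.lower(): val for name, val in headers}
            content_type = names.get("content-type", "").lower()
            # HTML can carry a meta tag; everything else (PDF, JSON, images) gets the header.
            if not content_type.startswith("text/html") and "x-robots-tag" not in names:
                headers = list(headers) + [("X-Robots-Tag", self.value)]
            return start_response(status, headers, exc_info)

        return self.app(environ, tagging_start_response)


def pdf_app(environ, start_response):
    """Toy WSGI app standing in for a PDF download endpoint."""
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.7"]


app = NoindexNonHtml(pdf_app)
```

To try it locally, pass the wrapped app to the standard library's wsgiref.simple_server.make_server("", 8000, app) and call serve_forever() on the result; the PDF response should then include the header.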
Most hosts let you set this header in configuration: on Firebase Hosting, in the headers block of firebase.json; on Nginx, with add_header inside a location block; on Cloudflare, via a Worker or a response header Transform Rule. The header means the same thing as the meta tag. If a page carries both, Google combines them, and where they conflict the more restrictive directive wins.
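The combination rule can be sketched as a small function. This is a simplified model under stated assumptions: the names parse_directives and effective_robots are hypothetical, and the sketch handles only the index/follow family (plus "none"), ignoring user-agent-scoped headers such as X-Robots-Tag: googlebot: noindex and other directives like nosnippet.

```python
def parse_directives(value):
    """Split a robots value such as 'noindex, follow' into lowercase tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def effective_robots(header_value="", meta_content=""):
    """Hypothetical helper: merge X-Robots-Tag and meta robots, most restrictive wins.

    Returns (indexable, links_followed). No directive at all means the
    defaults: index, follow.
    """
    tokens = parse_directives(header_value) | parse_directives(meta_content)
    if "none" in tokens:  # "none" is shorthand for "noindex, nofollow"
        tokens |= {"noindex", "nofollow"}
    # A restrictive token anywhere beats its permissive counterpart.
    return ("noindex" not in tokens, "nofollow" not in tokens)


# Header says noindex, meta says index: the page is not indexable.
print(effective_robots("noindex, follow", "index, follow"))  # (False, True)
```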
How long each signal takes to apply
- noindex removal from the index: 1-3 weeks after the next crawl. Speed it up by requesting indexing on the noindexed URL.
- Disallow effect: applies once Google refetches robots.txt, which it generally caches for up to 24 hours.
- nofollow effect: link equity stops flowing once Google recrawls the page carrying the link and honors the hint. PageRank already passed does not get clawed back.
A common surprise: you remove noindex from a page hoping it indexes quickly. Google still has to recrawl to see the change. Request indexing on a representative URL to push it sooner; the rest follow within a couple of weeks.
Common mistakes
- Adding Disallow and noindex together on the same URL. The Disallow blocks crawling, Google never sees the noindex, and the URL stays indexed.
- Treating nofollow as a “do not index this link target” signal. It controls link equity, not indexing.
- Using Disallow: / on staging and not removing it before launch. The launched site silently turns crawlers away, sometimes for weeks.
- Leaving noindex in a shared layout template after a temporary block, then wondering why the entire site dropped from search.
- Adding noindex to paginated pages (?page=2) or language alternates and accidentally deindexing valid content.
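Two of the mistakes above, a leftover Disallow: / and a noindex baked into a shared template, are easy to check for mechanically before launch. A minimal sketch, assuming you can fetch the production robots.txt and the rendered HTML of a handful of sample pages; NoindexFinder, prelaunch_problems, its input shape (a dict from path to HTML), and the messages are hypothetical. It looks only at the meta tag, so an X-Robots-Tag header would need a separate check, and urllib.robotparser uses plain prefix matching rather than Google's wildcard rules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class NoindexFinder(HTMLParser):
    """Sets .noindex if the page has <meta name="robots"> with noindex or none."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "").lower() for name, value in attrs}
        tokens = {t.strip() for t in attr.get("content", "").split(",")}
        if attr.get("name") == "robots" and tokens & {"noindex", "none"}:
            self.noindex = True


def prelaunch_problems(robots_txt, pages):
    """Hypothetical check; pages maps a URL path to that page's rendered HTML."""
    problems = []

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", "/"):
        problems.append("robots.txt blocks '/': leftover staging 'Disallow: /'?")

    flagged = []
    for path, html in pages.items():
        finder = NoindexFinder()
        finder.feed(html)
        if finder.noindex:
            flagged.append(path)
    if pages and len(flagged) == len(pages):
        # Every sample is noindexed: most likely a shared layout template.
        problems.append("noindex on every sampled page: check the shared layout template")
    elif flagged:
        problems.append("noindex on: " + ", ".join(sorted(flagged)))
    return problems


problems = prelaunch_problems(
    "User-agent: *\nDisallow: /\n",
    {
        "/": '<meta name="robots" content="noindex">',
        "/pricing": '<meta name="robots" content="noindex">',
    },
)
for problem in problems:
    print(problem)
```

A single flagged page, such as a thank-you page, is usually intentional; the same flag on every sample is the template symptom.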
FAQ
- If I Disallow a URL, will it still appear in search? Yes, sometimes. If other sites link to it, Google may list the URL without a description. Use noindex if your goal is “do not appear in search,” and leave the URL crawlable so Google can see it.
- Does noindex, nofollow make sense together? Rarely. noindex already removes the page from results; adding nofollow blocks internal link flow to your own content. Default to noindex, follow (plain noindex behaves the same, since follow is the default) unless you explicitly want links sealed off.
- What is rel="sponsored" versus nofollow? Both ask Google not to pass link equity, but sponsored declares “this is a paid placement” specifically. Google prefers the precise attribute when accurate.
- How long until a noindex page leaves the index? Usually 1-3 weeks after the next crawl. Speed it up with URL Inspection > Request indexing on the noindexed URL.
- Should I noindex thin tag and category pages? Only if they truly add no value. A thin tag page with 3 articles is a candidate for merging, not noindexing. Noindex is the last resort.