Internal Search Result Pages: Index or Noindex?

Most internal search result pages should be kept out of the index. Here is why, and the two exceptions where letting them in actually wins traffic.

On most sites, internal search result pages are the single biggest source of low-quality URLs leaking into Google. They are generated on the fly, effectively unlimited in number, near-duplicates of each other and of your existing listing pages, and they change faster than crawlers can keep up. Google’s own guidelines spell it out: by default, internal search results should not be indexed. Here is the rule, the two exceptions, and how to implement either side cleanly.

Background

An internal site search returns whatever the user typed: /search?q=react+hooks, /search?q=react%20hooks, /search?q=foo+bar+baz+qux. Every unique query string is a new URL, even when two of them encode the same query. Even on a small site, a crawler can discover thousands of these through internal links, sitemaps, and links from other sites. Most of them have thin content (just a list of titles), duplicate each other (same query, different formatting), or contain garbage from spam queries. Google has been telling webmasters since 2007 to keep these out of the index, and the advice has not changed.

How to tell

  • The Search Console Pages report shows /search?q=... URLs under “Crawled - currently not indexed” or “Duplicate, Google chose different canonical than user.”
  • site:yoursite.com inurl:search returns hundreds of pages you did not intend to publish.
  • A spammer has been pinging your search endpoint with spam queries and those URLs are getting picked up.
  • Your sitemap accidentally lists ?q= URLs because a crawler-based sitemap generator followed links to them.
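To size the problem, you can run the URLs from a Pages report export, your sitemap, or your server logs through a short script. The sketch below is in Python. It assumes your search lives at /search and takes the query in a q parameter. `SEARCH_PATH`, `QUERY_PARAM`, `normalized_query`, and `group_search_urls` are illustrative names and are not part of any tool.

```python
from collections import defaultdict
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumptions: change these to match your own search endpoint.
SEARCH_PATH = "/search"
QUERY_PARAM = "q"


def normalized_query(url: str) -> Optional[str]:
    """Return the normalized query of an internal search URL, or None for other URLs."""
    parts = urlsplit(url)
    if parts.path != SEARCH_PATH:
        return None
    values = parse_qs(parts.query).get(QUERY_PARAM)
    if not values:
        return None
    # parse_qs decodes both "+" and "%20" to a space; also fold case and extra spaces.
    return " ".join(values[0].split()).lower()


def group_search_urls(urls: list[str]) -> dict[str, list[str]]:
    """Group internal search URLs by normalized query to expose formatting duplicates."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        query = normalized_query(url)
        if query is not None:
            groups[query].append(url)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "https://example.com/search?q=react+hooks",
        "https://example.com/search?q=react%20hooks",
        "https://example.com/search?q=React++Hooks&page=2",
        "https://example.com/search/react-hooks",
    ]
    for query, variants in group_search_urls(sample).items():
        print(f"{query!r}: {len(variants)} URLs")
```

In the sample, the three formatting variants collapse to a single key, `react hooks`. The curated /search/react-hooks page is not counted as a search URL.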

Quick verdict

Default: noindex all internal search result pages. Add a meta robots noindex to the template and let Google crawl those URLs freely so it can see the directive. Do not Disallow them in robots.txt. That stops Google from fetching the pages, so it never sees the noindex, and the URLs can still end up indexed as URL-only listings from links pointing at them. The exceptions are below.
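The ordering problem is easy to model. This Python sketch is a simplified audit helper, not Google's actual pipeline. It assumes you already know whether robots.txt blocks the URL. It ignores JavaScript rendering and crawler-specific X-Robots-Tag prefixes such as `googlebot:`. `has_noindex` and `crawler_outcome` are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            self.directives.append(attributes.get("content") or "")


def has_noindex(headers: dict[str, str], body: str) -> bool:
    """True if an X-Robots-Tag header or a robots meta tag contains noindex."""
    values = [v for k, v in headers.items() if k.lower() == "x-robots-tag"]
    parser = RobotsMetaParser()
    parser.feed(body)
    values.extend(parser.directives)
    # Split "noindex, follow" into tokens; "googlebot:"-style prefixes are not handled.
    tokens = {token.strip().lower() for value in values for token in value.split(",")}
    return "noindex" in tokens


def crawler_outcome(blocked_by_robots: bool, headers: dict[str, str], body: str) -> str:
    """Simplified model of what happens to one URL."""
    if blocked_by_robots:
        # The page is never fetched, so a noindex on it is never read.
        return "not crawled: noindex unseen, URL can still be indexed from links"
    if has_noindex(headers, body):
        return "crawled: noindex seen, dropped from index"
    return "crawled: indexable"


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(crawler_outcome(False, {}, page))  # the recommended setup
    print(crawler_outcome(True, {}, page))   # the Disallow mistake
```

The same page produces opposite results depending on robots.txt. When the URL is blocked, its noindex is never read.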

The two exceptions

There are two cases where letting search pages into the index can actually win traffic.

Exception 1: curated landing pages. If you can pre-generate a small set of high-value queries as real pages — /search/react-hooks, /search/python-async — those are not really search results, they are landing pages with unique titles, descriptions, and edited copy. Treat them as full articles. The URL pattern looks search-like, but the content is curated.

Exception 2: e-commerce category-like queries. If your “search” is really a filter on a product catalog with stable inventory (/search?category=running-shoes&size=10), and you have real demand for that combination, you may want it indexed. But only the combinations that match real search intent — not every possible filter combination.

Everything else: noindex.
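Put together, the rule and its two exceptions reduce to a small allowlist check in the search template. The Python sketch below makes two assumptions: curated landing pages live under /search/<slug>, and facet searches use query parameters. `CURATED_SLUGS`, `INDEXABLE_FACETS`, and `robots_directive` are hypothetical names. The allowlist entries are placeholders that you would fill from your own demand data.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical allowlists: fill these from real search demand, never from every possible value.
CURATED_SLUGS = {"react-hooks", "python-async"}
INDEXABLE_FACETS = {
    frozenset({("category", "running-shoes"), ("size", "10")}),
}


def robots_directive(url: str) -> str:
    """Return the meta robots value for a search-shaped URL."""
    parts = urlsplit(url)
    # Exception 1: curated landing pages that only look like search URLs.
    if parts.path.startswith("/search/") and parts.path[len("/search/"):] in CURATED_SLUGS:
        return "index, follow"
    # Exception 2: allowlisted facet combinations, matched regardless of parameter order.
    if parts.path == "/search" and frozenset(parse_qsl(parts.query)) in INDEXABLE_FACETS:
        return "index, follow"
    # Everything else: free-text ?q= queries, pagination, unknown slugs or facets.
    return "noindex, follow"


if __name__ == "__main__":
    for url in (
        "https://example.com/search/react-hooks",
        "https://example.com/search?size=10&category=running-shoes",
        "https://example.com/search?category=running-shoes&size=10&page=2",
        "https://example.com/search?q=react+hooks",
    ):
        print(url, "->", robots_directive(url))
```

The facet check compares the full parameter set. As a result, adding page=2 or any extra parameter falls back to noindex, which is consistent with the pagination answer in the FAQ.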

Implementing noindex correctly

<!-- In the search template, conditional on the page being a real query -->
<meta name="robots" content="noindex, follow">
<!-- Or send it as an HTTP response header (the only option for non-HTML responses such as PDFs) -->
X-Robots-Tag: noindex, follow

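Make sure the directive is in the response the server sends for the search URL itself. Do not add it afterwards with client-side JavaScript, which crawlers may not apply as expected. Once crawlers refetch a page and see the noindex, they drop it from the index, typically over 1-3 weeks depending on how often each URL is recrawled. Keep follow so that links to real articles on the results page can still pass equity.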
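Served from the application, both forms can come from the same response. The sketch below is a minimal WSGI app using only the Python standard library. It assumes search is served at /search, and it is illustrative rather than a drop-in template. The header applies to any response type, and the meta tag covers anything that reads only the HTML.

```python
PAGE = (
    "<!doctype html><html><head>"
    '<meta name="robots" content="noindex, follow">'
    "<title>Search results</title></head>"
    "<body><!-- result list goes here --></body></html>"
)


def search_app(environ, start_response):
    """Serve /search with noindex in both the HTTP header and the HTML."""
    if environ.get("PATH_INFO") != "/search":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = PAGE.encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            # Header form of the directive; it also covers non-HTML responses.
            ("X-Robots-Tag", "noindex, follow"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, pass `search_app` to `wsgiref.simple_server.make_server("", 8000, search_app)` and call `serve_forever()` on the result. Then check the headers with `curl -I "http://localhost:8000/search?q=react+hooks"`.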

If you have already indexed thousands of search URLs and want to clean up:

  1. Add noindex to the template.
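  2. In URL Inspection, run a live test on a representative search URL to confirm Google sees the noindex, then use Request indexing to trigger a recrawl of that URL. The rest drop out as Google recrawls them on its own schedule.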
  3. Watch the Pages report shrink over the next month.
  4. After the cleanup, consider adding Disallow: /search? to robots.txt to save crawl budget — but only after the noindex has cleared.
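Before step 4, check exactly which URLs the rule covers. Google treats a robots.txt rule path as a prefix of the URL path plus query string, and the longest matching rule wins. So Disallow: /search? catches /search?q=... but leaves curated pages at /search/react-hooks crawlable. The Python sketch below implements only that prefix logic for a single group of rules. It ignores User-agent grouping and the * and $ wildcards. `parse_rules` and `is_allowed` are hypothetical helpers.

```python
def parse_rules(robots_txt: str) -> list[tuple[str, str]]:
    """Extract (directive, path) pairs from the Allow/Disallow lines of one rule group."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow"):
            rules.append((field.strip().lower(), value.strip()))
    return rules


def is_allowed(rules: list[tuple[str, str]], path_and_query: str) -> bool:
    """Prefix matching: the longest matching rule wins, and Allow wins a tie."""
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, rule_path in rules:
        if not rule_path or not path_and_query.startswith(rule_path):
            continue  # an empty Disallow value blocks nothing
        is_allow = directive == "allow"
        if len(rule_path) > best_len or (len(rule_path) == best_len and is_allow):
            best_len, allowed = len(rule_path), is_allow
    return allowed


if __name__ == "__main__":
    rules = parse_rules("User-agent: *\nDisallow: /search?\n")
    for path in ("/search?q=react+hooks", "/search/react-hooks", "/search"):
        print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Suppose you index allowlisted facet URLs under /search? (Exception 2). Those URLs then need a longer Allow line, such as Allow: /search?category=running-shoes, or the Disallow will block them too.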

A noindex on the page is the policy layer. Sitemaps and internal links are the discovery layer. If they keep pointing at search URLs, crawlers keep fetching them — wasted budget even if nothing ends up indexed.

  • Audit your sitemap generator. If it walks the site looking for links, it will pick up ?q= URLs from the search box’s example queries and ship them in the sitemap. Filter the pattern out at build time.
  • Reduce in-template links to search. A persistent search box that submits to /search?q= is fine; a “popular searches” list that hardcodes 10 example URLs into the footer is not.
  • After cleanup, run site:yoursite.com inurl:search weekly for a month to confirm the count is dropping. If it plateaus, find what still links to the URLs and remove the link.
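Filtering at build time can be a single predicate applied before the XML is written. The Python sketch below uses the standard library's xml.etree.ElementTree. It assumes your generator hands over a list of absolute URLs and that a search URL is anything on /search carrying a q parameter. `build_sitemap` and `is_search_url` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def is_search_url(url: str) -> bool:
    """True for internal search result URLs such as /search?q=react+hooks (assumed pattern)."""
    parts = urlsplit(url)
    return parts.path == "/search" and "q" in parse_qs(parts.query, keep_blank_values=True)


def build_sitemap(urls: list[str]) -> str:
    """Return sitemap XML for the given URLs, skipping internal search results."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in dict.fromkeys(urls):  # de-duplicate while keeping order
        if is_search_url(url):
            continue  # filtered here, whatever the link crawler discovered
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")
```

Write the returned string to disk as UTF-8 so that the file matches its declaration.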

Spam query protection

If your search endpoint reflects user input into the page title or H1, you have a free SEO injection vector. Spammers ping your search with their target keyword (often in Russian, Japanese, or pharma terms) and the resulting URL becomes a thin page on your domain with their keyword in the title — exactly the artifact they want indexed.

Two defenses, both cheap. First, escape and truncate the user query before rendering, and never let it appear in <title> or H1 as raw text. Second, rate-limit the endpoint at the CDN even if the search template already carries noindex. A single IP hitting /search?q= 500 times an hour is not a real user. Return 429 Too Many Requests so those URLs are never served in the first place.
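Both defenses fit in a few lines. In the Python sketch below, `render_query` escapes and truncates the query before it reaches the page. `SlidingWindowLimiter` shows the per-IP counting logic behind a threshold like 500 requests an hour. As the text says, the limit belongs in your CDN or edge rules in production. This in-process version, the 80-character cap, and all names are illustrative assumptions.

```python
import time
from collections import defaultdict, deque
from html import escape
from typing import Optional


def render_query(query: str, max_len: int = 80) -> str:
    """Collapse whitespace, truncate, then HTML-escape a user query for display."""
    cleaned = " ".join(query.split())
    if len(cleaned) > max_len:
        cleaned = cleaned[:max_len].rstrip() + "..."
    # Escape last, so truncation can never cut an entity such as &amp; in half.
    return escape(cleaned)


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client IP within `window` seconds."""

    def __init__(self, limit: int = 500, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        """Record a request; False means respond with 429 Too Many Requests."""
        now = time.monotonic() if now is None else now
        hits = self.hits[client_ip]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests that have left the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Counters held in one process are not shared across servers. That is one more reason to enforce the real limit at the CDN, where every request passes through the same place.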

Common mistakes

  • Disallow: /search? in robots.txt without noindex first. Google cannot crawl the page, cannot see the noindex, and the URL stays indexed as a URL-only listing.
  • Linking heavily to internal search from the homepage, header, or footer. Every link is a crawl invitation. If you noindex, also reduce the internal link count.
  • Including ?q= URLs in your sitemap because a crawler-based generator picked them up. Filter them out at sitemap build time.
  • Misreading Search Console performance reports. When indexed /search?q= URLs appear there as pages, the q= values are searches run on your site, not queries typed into Google.
  • Allowing user-submitted queries to be reflected raw in the page title (Search results for "spam-phrase-here"). Spammers exploit this for SEO injection.

FAQ

  • Should I noindex or canonical search results?: noindex. Canonical is for “this is the same content as another URL.” Search results are not duplicates of one master page; they are thin pages that should not exist in the index at all.
  • What about pagination of search results?: Same rule. noindex every page. The pagination itself is also noise.
  • My site search is powered by Algolia / Meilisearch / client-side JS. Does any of this matter?: If the URL changes (?q=) when a user searches, then yes — Google still sees the URL and tries to crawl it. If your search is entirely client-side with no URL change, you have nothing to noindex.
  • Will noindexing search hurt my SEO?: No. These pages were unlikely to rank for anything competitive. Removing them improves crawl efficiency and average quality across your indexed URLs.
  • Can I let only some search pages index?: Yes — see the curated landing pages exception. But that is no longer “internal search” in any meaningful sense; it is content with a search-shaped URL.
  • What about facet / filter URLs on an e-commerce site?: Same logic. Noindex by default. Whitelist a small set of high-traffic combinations as real pages with hand-written copy.
  • Will Bing handle this the same way?: Bing follows the same noindex and robots.txt semantics. The behavior is consistent across the major crawlers.

Tags: #Indie dev #SEO #Technical SEO #Indexing #search