Pages Dropped From Google's Index: Diagnose and Recover


A page that was indexed, ranked, and getting traffic suddenly shows 'URL is not on Google'. Here's how to tell technical regression from a core-update quality drop, and how to get it back.

Published: May 17, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

A page that was indexed, ranked, and getting traffic suddenly disappears. site:yourdomain.com/that-url returns nothing, URL Inspection says “URL is not on Google,” and as far as you can tell there is no noindex, 404, or robots.txt block. Google has removed (deindexed) it.

Fastest path: open the URL in Search Console’s URL Inspection tool and read the exact status. If it says Excluded by 'noindex' tag, Not found (404), or Blocked by robots.txt, you have a technical regression: fix the server/template and click “Request indexing” (usually re-crawled within a day or two). If it says Crawled - currently not indexed, or a bare URL is not on Google with no technical cause, this is a quality/algorithmic call, and the only real fix is making the page genuinely better, then waiting weeks.

This is completely different from “never indexed.” It used to be in, now it’s not, which means either you changed something, or Google’s assessment of your site changed. The diagnostic below tells the two apart fast, then fixes each.

Is this part of the 2026 deindexing wave?

Before blaming your own code, check the timing. As of June 2026 there is an ongoing, widely reported deindexing wave:

The March 2026 Core Update rolled out March 27 and finished April 8, 2026 — described by multiple SEO trackers as one of the most volatile core updates on record.
Since late April 2026, large numbers of site owners have reported previously stable pages flipping to Crawled - currently not indexed or Excluded with no manual action and no crawl error. The discussion was started by ex-Googler Pedro Dias, who asked whether others were seeing higher deindexing rates; the answer was overwhelmingly yes.
Google’s position: John Mueller said on Bluesky, “Some sites go up, some sites go down. I don’t see anything exceptional there.” Google has not confirmed any change in indexing behavior.

What this means for you: if many pages dropped at once, near these dates, with no noindex/404/robots cause, treat it as algorithmic — not a bug to patch. Do not bulk-noindex pages, rename URL paths, or rebuild your site structure to “reset” — those moves can turn a temporary dip into permanent damage. Cross-check your drop date against Google’s update list and searchengineroundtable.com before touching anything.

Common causes

1. Technical regression: accidental noindex / 404 / redirect

Easiest to miss, easiest to fix. Common scenarios:

A template change added <meta name="robots" content="noindex"> and no one caught it
A deploy changed server config, so the URL now 301s elsewhere
A CMS upgrade flipped those pages to draft / private
The canonical now points to a different URL

How to confirm:

# Is the page actually accessible and 200?
curl -sI https://yourdomain.com/that-url | head -5

# Any noindex (also check the X-Robots-Tag response header)?
curl -sL https://yourdomain.com/that-url | grep -i noindex
curl -sI https://yourdomain.com/that-url | grep -i x-robots-tag

# Is the canonical self-referential?
curl -sL https://yourdomain.com/that-url | grep -oE '<link rel="canonical" href="[^"]+"'

# Is robots.txt blocking it? Disallow rules match by path prefix, so also look for rules on parent directories
curl -s https://yourdomain.com/robots.txt | grep -iE "^[[:space:]]*disallow:"

Any anomaly here is your root cause. Note: noindex can live in the HTML head OR in an X-Robots-Tag HTTP header. The `grep -i x-robots-tag` command above catches the header form, which is invisible if you only view source.
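A plain `grep -i noindex` on the HTML also fires when the word merely appears in body text (an article about noindex, for example). The sketch below narrows the check to robots meta tags and to the `X-Robots-Tag` header. The function names are made up for illustration, and it assumes each meta tag sits on a single line, which is common but not guaranteed.

```shell
# Reads HTML on stdin. Exit status 0 if a robots/googlebot meta tag contains "noindex".
# Assumption: the whole <meta ...> tag is on one line (grep works line by line).
has_meta_noindex() {
  grep -ioE '<meta[^>]*>' \
    | grep -iE "name=[\"']?(robots|googlebot)[\"']?" \
    | grep -i 'noindex' >/dev/null
}

# Reads raw response headers on stdin. Exit status 0 if X-Robots-Tag contains "noindex".
# Note: this also matches directives scoped to another crawler, so read the header if it fires.
has_header_noindex() {
  grep -i '^x-robots-tag:' | grep -i 'noindex' >/dev/null
}
```

Usage against a live page would look like `curl -sL "$URL" | has_meta_noindex && echo "meta noindex found"` and `curl -sI "$URL" | has_header_noindex && echo "header noindex found"`.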

2. Merge / restructure picked a different URL

If you recently merged similar articles, changed URL structure, or shipped a lot of 301s, Google may have re-evaluated and tagged a URL you wanted to keep as “consolidated into X,” removing the original.

How to confirm: Search Console → URL Inspection → read “Google-selected canonical.” If it points to another URL, Google consolidated.
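The Google-selected canonical is only visible in Search Console, but the user-declared side can be checked locally before you compare the two. A minimal sketch, assuming the `<link>` tag is on one line and uses a lowercase `href`; `declared_canonical` and `canonical_matches` are illustrative names, not part of any tool.

```shell
# Reads HTML on stdin and prints the href of the first rel="canonical" link tag.
# Works whether rel comes before or after href.
declared_canonical() {
  grep -ioE "<link[^>]*rel=[\"']?canonical[\"']?[^>]*>" \
    | head -n 1 \
    | sed -nE "s/.*href=[\"']([^\"']*)[\"'].*/\1/p"
}

# usage: canonical_matches URL < page.html
# Exit status 0 only on an exact string match, so a trailing-slash difference counts as a mismatch.
canonical_matches() {
  [ "$(declared_canonical)" = "$1" ]
}
```

For example, `curl -sL "$URL" | canonical_matches "$URL" || echo "declared canonical differs from the URL"`.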

3. Site-wide quality assessment changed (core update)

Google runs major updates several times a year (core updates and spam updates; the standalone “Helpful Content Update” was folded into the core ranking system in 2024). If many pages deindex at once and the timing matches a known rollout, that is algorithmic re-evaluation, not a bug.

How to confirm: check the Google Search Status Dashboard or searchengineroundtable.com for recent rollouts and compare to your deindex date.

4. Content judged “unhelpful / low value”

Related to #3 but more specific. Pages most at risk:

Bulk AI-generated content with no editing or verification
Content highly similar to other sites (or to your own other pages)
A first paragraph that is generic platitudes
No firsthand information, original data, or genuine expertise

5. Domain penalized (manual action)

Rare but serious. Search Console → Security & Manual Actions → Manual actions. Any notice there is a human-issued penalty. Common triggers: buying backlinks, cloaking, thin affiliate content, unnatural link schemes. A manual action shows an explicit reason and a “Request Review” button — algorithmic drops never do.

6. Persistent server 5xx errors

If your server returned 500/503/504 frequently for a stretch, repeated failed fetches make Google temporarily drop pages. This usually auto-recovers within 1-2 weeks after the errors stop.

How to confirm: Search Console → Settings → Crawl stats → “By response,” and look for a spike in 5xx responses in the period before the drop.
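If you have server access logs, you can cross-check the Crawl stats view from your side. The sketch below counts 5xx responses served to requests whose user agent mentions Googlebot, grouped by day. It assumes the common "combined" log format (date in field 4, status in field 9); adjust the field numbers if your format differs. Matching on the user-agent string does not prove the request really came from Google.

```shell
# Reads access-log lines on stdin (combined log format assumed).
# Prints "DD/Mon/YYYY count" for each day that served 5xx responses to Googlebot.
googlebot_5xx_by_day() {
  awk '/Googlebot/ && $9 ~ /^5[0-9][0-9]$/ { count[substr($4, 2, 11)]++ }
       END { for (day in count) print day, count[day] }' | sort
}
```

Run it as `googlebot_5xx_by_day < /path/to/access.log`, where the log path is whatever your server uses. The output is sorted as text, so days are not in calendar order across months.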

7. Duplicate content consolidated to another site

Like #2, but the scenario is “your page is too similar to another site,” and Google picked the other site as canonical. Common with syndicated or scraped-then-reposted content.

Shortest path to fix

Step 1: URL Inspection for Google’s exact status

Open Search Console → the search bar at the top → enter the lost URL → wait for the result. Read:

Page indexing — what status?
Last crawl — when did Google last fetch it?
User-declared canonical vs Google-selected canonical — do they differ?
Indexing allowed? — is it “No”?

Status → fix mapping:

Status in URL Inspection	Likely cause	Fix
`Excluded by 'noindex' tag`	You added noindex (head or X-Robots-Tag)	Remove the tag/header
`Not found (404)`	URL really broken	Fix server / restore the route
`Blocked by robots.txt`	robots.txt disallow	Edit robots.txt
`Alternate page with proper canonical tag`	Your own canonical points elsewhere	Fix the canonical
`Duplicate, Google chose different canonical than user`	Consolidated to another URL/site	Strengthen signals, or accept
`Crawled - currently not indexed`	Quality / value judgment	Deepen the page; see Step 6
`URL is not on Google` (no clear reason)	Core-update assessment	Site-wide quality work

Step 2: Check server / canonical / robots.txt (the three technical points)

# One-shot check script
URL="https://yourdomain.com/lost-page"

echo "=== HTTP status ==="
curl -sI "$URL" | head -3

echo "=== noindex (HTML + header) ==="
curl -sL "$URL" | grep -i noindex || echo "none in HTML"
curl -sI "$URL" | grep -i x-robots-tag || echo "no x-robots-tag header"

echo "=== canonical ==="
curl -sL "$URL" | grep -oE '<link rel="canonical" href="[^"]+"' || echo "none"

echo "=== robots.txt blocking ==="
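# Disallow rules match by path prefix, so a rule on a parent directory also blocks this URL.
# Print the path and every Disallow line, then compare them.
ORIGIN=$(echo "$URL" | grep -oE '^https?://[^/]+')
PATH_PART="${URL#"$ORIGIN"}"
echo "path: ${PATH_PART:-/}"
curl -s "$ORIGIN/robots.txt" | grep -iE '^[[:space:]]*disallow:' || echo "no Disallow rules found"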

Fix any anomaly here first, before assuming a quality problem.
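robots.txt rules match by path prefix, so searching for the full URL path can miss a `Disallow` on a parent directory. A rough sketch of a prefix check follows. `robots_disallow_match` is a hypothetical helper, and it deliberately ignores `User-agent` grouping, `Allow` overrides, and `*` / `$` wildcards, so treat a hit as a lead to verify rather than a verdict.

```shell
# usage: robots_disallow_match /path/to/page < robots.txt
# Prints each Disallow rule whose value is a prefix of the path.
# Exit status 0 if at least one rule matched, 1 otherwise.
robots_disallow_match() {
  tr -d '\r' \
    | sed -nE 's/^[[:space:]]*[Dd]isallow:[[:space:]]*([^[:space:]#]+).*/\1/p' \
    | awk -v p="$1" 'index(p, $0) == 1 { print "Disallow: " $0; hit = 1 }
                     END { exit !hit }'
}
```

For example, `curl -s https://yourdomain.com/robots.txt | robots_disallow_match /blog/lost-page`. Search Console's own robots.txt report remains the authoritative answer.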

Step 3: Compare the lost URL’s content against its history

Open the Wayback Machine and compare the pre-loss snapshot with the current page:

Did word count drop?
Did the topic shift?
Did key signals (H1, title, main image) change?

If a rewrite trimmed the page or shifted its topic, that is likely the direct cause. Restore the lost substance.
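To put a number on "did word count drop," you can save the old snapshot and the current page as HTML files and compare rough word counts. This is a crude sketch: it strips tags with sed instead of parsing HTML, so script and style text is counted too. Since both files are measured the same way, the comparison is still indicative. Both function names are illustrative.

```shell
# Reads HTML on stdin and prints an approximate word count with all tags removed.
visible_word_count() {
  tr '\n' ' ' | sed -E 's/<[^>]*>/ /g' | wc -w | tr -d ' '
}

# usage: word_change_pct OLD_COUNT NEW_COUNT
# Prints the signed percentage change from old to new, rounded to a whole number.
word_change_pct() {
  awk -v old="$1" -v cur="$2" 'BEGIN {
    if (old == 0) { print "n/a"; exit }
    printf "%+.0f\n", (cur - old) * 100 / old
  }'
}
```

With the two pages saved locally, `word_change_pct "$(visible_word_count < old.html)" "$(visible_word_count < new.html)"` prints something like `-40` for a page that lost 40% of its words. Archived copies can include extra markup added by the archive, so treat small differences as noise.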

Step 4: Check the core-update timing window

Look up Google’s announcements in the past 90 days against your deindex date:

The March 2026 Core Update (March 27 – April 8, 2026)
Any spam update or unannounced ranking shift on the status dashboard

If your deindex date falls inside a rollout window (core updates typically run 1-3 weeks), treat it as algorithmic. Technical fixes won’t move it — only site-wide quality work will.
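The window check itself is simple date arithmetic. A small sketch, assuming dates are written as YYYY-MM-DD; `in_rollout_window` is a made-up helper name.

```shell
# usage: in_rollout_window DATE START END   (all YYYY-MM-DD)
# Exit status 0 if START <= DATE <= END, otherwise 1.
in_rollout_window() {
  day=$(printf '%s' "$1" | tr -d '-')      # 2026-04-02 -> 20260402
  start=$(printf '%s' "$2" | tr -d '-')
  end=$(printf '%s' "$3" | tr -d '-')
  [ "$day" -ge "$start" ] && [ "$day" -le "$end" ]
}
```

For the dates cited above: `in_rollout_window 2026-04-02 2026-03-27 2026-04-08 && echo "inside the March 2026 core update window"`. A drop noticed shortly after a rollout ends may still be related, so the window is a guide and not a hard cutoff.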

Step 5: If it’s a technical regression, request indexing after the fix

After removing the noindex, fixing the canonical, or restoring the 404’d route, run “Test Live URL” in URL Inspection to confirm the page now passes, then click “Request indexing.” Google usually re-crawls within a day or two. (Request-indexing has a per-day quota, so use it on the fixed URLs that matter, not the whole site.)

Step 6: Quality issues need weeks, not a button

If it’s Crawled - currently not indexed or a core-update drop, there is no shortcut. Google’s own guidance: improvements “can take effect in a few days, but it could take several months,” and if nothing changes after a few months you may have to wait for the next core update — with no guarantee. Concretely:

Merge or noindex the thinnest 20% of pages, chosen page by page rather than as a bulk reset, so crawl effort concentrates on the ones worth keeping
For retained pages, add what only you can: original examples, exclusive data, first-person experience, real comparison tables
Tighten internal linking so kept pages are no more than 2-3 clicks from the homepage
Earn 3-5 new relevant backlinks
Check Core Web Vitals (Search Console → Core Web Vitals, plus PageSpeed Insights) and fix poor LCP / INP / CLS; these are cheap, concrete signals to clean up

Realistically, expect to see a meaningful share of dropped pages return only after the next crawl cycle or update — often 4-12 weeks out, not days.
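One way to shortlist the "thinnest 20%" is to sort pages by a size measure and take the bottom fifth. The sketch below assumes you already have a tab-separated list of word count and URL per line. Word count is only a proxy for thinness and an assumption of this example, so use the output as a list to review by hand, not one to noindex automatically.

```shell
# Reads "WORDCOUNT<TAB>URL" lines on stdin.
# Prints the fifth of the rows with the lowest word counts (at least one row).
thinnest_fifth() {
  sort -n | awk '{ rows[NR] = $0 }
    END {
      n = int(NR / 5)
      if (n < 1 && NR > 0) n = 1
      for (i = 1; i <= n; i++) print rows[i]
    }'
}
```

Usage: `thinnest_fifth < page-sizes.tsv`, where `page-sizes.tsv` is a hypothetical file you build from your own crawl or CMS export.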

How to confirm it’s fixed

Re-run URL Inspection on the affected URL. For a technical fix, “Test Live URL” should now show “URL is available to Google.”
A day or two after “Request indexing,” the status should read URL is on Google (or Submitted and indexed).
site:yourdomain.com/that-url returns the page again.
In Search Console → Indexing → Pages, the indexed-page count stops falling and trends back up.

If URL Inspection passes live but the page still won’t index after two weeks, the cause is quality/algorithmic, not technical — return to Step 6.

Prevention

Watch Search Console → Indexing → Pages → indexed-count trend weekly; a 10%+ drop triggers an investigation
After every deploy, smoke-test critical pages with curl: 200 + no noindex (head and header) + correct canonical
Before a CMS upgrade or template overhaul, back up the current sitemap and indexed count so you can diff afterward
Run a daily CI health check: sample ~50 production URLs and assert 200 + no noindex
Log each Google core-update announcement date, so a future drop is easy to attribute
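The post-deploy smoke test and the daily CI check can share one assertion. Below is a minimal sketch: `page_is_healthy` holds the checks (200 status, no noindex in header or meta tag) and `check_url` is a thin curl wrapper around it. Both names are illustrative, the canonical assertion is left out to keep it short, and the meta check assumes one tag per line.

```shell
# usage: page_is_healthy STATUS_CODE HEADERS_FILE BODY_FILE
# Prints OK or the first failure; exit status 0 only when every check passes.
page_is_healthy() {
  [ "$1" = "200" ] || { echo "FAIL: status $1"; return 1; }
  if grep -i '^x-robots-tag:' "$2" | grep -i noindex >/dev/null; then
    echo "FAIL: noindex in X-Robots-Tag header"; return 1
  fi
  if grep -ioE '<meta[^>]*>' "$3" | grep -iE "name=[\"']?robots" | grep -i noindex >/dev/null; then
    echo "FAIL: noindex in meta robots"; return 1
  fi
  echo "OK"
}

# usage: check_url URL   (needs network access; redirects are not followed, so a 301 fails)
check_url() {
  tmp=$(mktemp -d) || return 1
  status=$(curl -s -o "$tmp/body" -D "$tmp/headers" -w '%{http_code}' "$1")
  if page_is_healthy "$status" "$tmp/headers" "$tmp/body"; then rc=0; else rc=1; fi
  rm -rf "$tmp"
  return "$rc"
}
```

In CI this could loop over a sampled URL list, for example `while read -r u; do check_url "$u" || failed=1; done < urls.txt`, and fail the job when `failed` is set. The file name `urls.txt` is a placeholder.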

FAQ

Q: Is “deindexed” the same as a penalty?

No. A penalty (manual action) shows an explicit notice in Search Console → Security & Manual Actions with a reason and a “Request Review” button. Most deindexing in 2026 is algorithmic: a quality call with no notice and no review button.

Q: My pages dropped right around the March 2026 core update. Should I rush to change things?

No. Google and most practitioners advise against reactive moves like bulk-noindex, URL renames, or structural rebuilds to “reset” pages, because those can make a temporary drop permanent. Confirm there’s no technical regression, genuinely improve the weakest pages, and wait out the rollout.

Q: How long until a fixed page comes back?

A technical fix plus “Request indexing” is usually re-crawled within a day or two. A quality recovery takes weeks: Google says days to several months, and you may need to wait for the next core update, with no guarantee.

Q: Does “Request indexing” help if the cause is quality?

Barely. It pushes Google to re-crawl, but if the page is judged low-value it will be crawled and dropped again. Improve the page first, then request indexing.

Q: Why did only some of my pages drop and not others?

Algorithmic quality assessment is page- and section-level. Thin, duplicative, or template-heavy pages go first; pages with original depth and inbound links survive. That pattern itself is a signal of where to invest.

Q: How do I check for a `noindex` I can’t see in the page source?

A `noindex` can be sent as an `X-Robots-Tag` HTTP response header instead of an HTML meta tag. Run `curl -sI https://yourdomain.com/that-url | grep -i x-robots-tag` to catch it.

Tags: #SEO #Google #Search Console #Indexing