A page that was indexed, ranking, and getting traffic suddenly drops out of the index one day. site:yourdomain.com/that-url returns nothing, URL Inspection says “URL is not on Google,” and you never intentionally added noindex, returned a 404, or blocked it in robots.txt. As far as you can tell, Google removed (deindexed) it on its own.
This is completely different from “never indexed”: the page used to be in the index and now isn’t, which means either you changed something or Google’s overall assessment of your site changed. Below is a step-by-step diagnostic.
Common causes
1. Technical regression: accidental noindex / 404 / redirect
Easiest to miss, easiest to fix. Common scenarios:
- A template change added `<meta name="robots" content="noindex">` and no one caught it
- A server config change in a deploy now 301-redirects the URL elsewhere
- CMS upgrade flipped those pages to draft / private
- Canonical changed to point to a different URL
How to confirm:
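```bash
# Is the page actually accessible + 200? (-I sends a HEAD request and doesn't follow redirects, so a 301 shows up here)
curl -sI https://yourdomain.com/that-url | head -5
# Any noindex, in the HTML or in an X-Robots-Tag response header?
curl -sL https://yourdomain.com/that-url | grep -i noindex
curl -sIL https://yourdomain.com/that-url | grep -i '^x-robots-tag'
# Is canonical self? (matches rel="canonical" wherever it sits in the tag)
curl -sL https://yourdomain.com/that-url | grep -oiE '<link[^>]*rel="canonical"[^>]*>'
# Is robots.txt blocking? Disallow rules are path prefixes, so list them and check whether
# any is a prefix of your path (e.g. "Disallow: /blog/" also blocks /blog/that-url)
curl -s https://yourdomain.com/robots.txt | grep -iE '^(user-agent|allow|disallow):'
```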
Any anomaly here is the most likely root cause.
2. Merge / restructure picked a different URL
If you recently merged similar articles, changed your URL structure, or shipped lots of 301s, Google may have re-evaluated those URLs, treated one you wanted to keep as “consolidated into X,” and dropped the original from the index.
How to confirm: Search Console → URL Inspection → check “Google-selected canonical.” If it points to a different URL, Google has consolidated your page into that one.
3. Site-wide quality assessment changed (Core Update / Helpful Content Update)
Google runs major algorithm updates several times a year (Core Update, Helpful Content Update, Spam Update). If many pages deindex at once and the timing matches a known update, algorithmic re-evaluation is the most likely explanation.
How to confirm: Check searchengineroundtable.com or search.google/updates for recent rollouts and compare their dates with your deindex date.
4. Content flagged “unhelpful / low value”
Related to #3 but more specific:
- Lots of bulk AI-generated content
- Content highly similar to other sites
- First paragraph is generic platitudes
- No firsthand information
5. Domain penalized (Manual Action)
Rare but serious. Check Search Console → Security & Manual Actions → Manual actions; any notice there means a manual penalty. Common triggers: buying backlinks, cloaking, thin affiliate content, unnatural links.
6. Persistent server 5xx errors
If your server frequently returned 500/503/504 errors over a period, repeated fetch failures can cause Google to temporarily drop pages from the index. Pages usually recover on their own within 1-2 weeks after the errors are fixed.
How to confirm: Search Console → Settings → Crawl stats → “By response”: is there a heavy share of 5xx?
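Crawl stats only shows aggregates. If you have raw access logs, a per-day tally of Googlebot requests that got a 5xx shows exactly when the errors happened, so you can line them up against the deindex date. A minimal sketch, assuming the common Apache/Nginx “combined” log format (timestamp in field 4, status code in field 9); `googlebot_5xx_by_day` is a hypothetical name, and it matches Googlebot by user-agent string only. User agents can be spoofed, so treat the numbers as approximate.

```bash
# googlebot_5xx_by_day LOGFILE: per day, count Googlebot requests and how many got a 5xx.
# Assumes combined log format: field 4 = "[10/Oct/2026:13:55:36", field 9 = status code.
googlebot_5xx_by_day() {
  awk '
    /Googlebot/ {
      day = substr($4, 2, 11)                 # "[10/Oct/2026:13:55:36" -> "10/Oct/2026"
      if (!(day in total)) order[++n] = day   # remember first-seen order (logs are chronological)
      total[day]++
      if ($9 ~ /^5[0-9][0-9]$/) errors[day]++
    }
    END {
      for (i = 1; i <= n; i++) {
        d = order[i]
        printf "%s 5xx=%d total=%d\n", d, errors[d], total[d]
      }
    }' "$1"
}
```

Usage: `googlebot_5xx_by_day /path/to/access.log` prints one line per day; a run of days with a high 5xx count just before the drop supports this cause.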
7. Duplicate content consolidated to another site
Like #2, but here your page is too similar to a page on another site, and Google picked the other site’s URL as the canonical.
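Google doesn’t publish how it measures “too similar,” but a rough proxy helps you judge whether #7 is plausible: put the visible text of your page and the competing page into two files and compute the Jaccard similarity of their 3-word shingles (shared 3-word sequences divided by all distinct 3-word sequences). This is a generic near-duplicate heuristic, not Google’s method; the function name and the 3-word window are assumptions.

```bash
# shingle_jaccard FILE_A FILE_B: Jaccard similarity (0.00-1.00) of 3-word shingles.
shingle_jaccard() {
  awk '
    FNR == 1 { f++ }                          # f = 1 while reading FILE_A, 2 for FILE_B
    {
      for (i = 1; i <= NF; i++) {
        w = tolower($i)
        gsub(/[.,;:!?()"]/, "", w)            # strip common punctuation
        if (w != "") words[f, ++n[f]] = w
      }
    }
    END {
      for (k = 1; k <= 2; k++)
        for (i = 1; i + 2 <= n[k]; i++) {
          s = words[k, i] " " words[k, i + 1] " " words[k, i + 2]
          seen[k, s] = 1
          all[s] = 1
        }
      for (s in all) {
        u++                                   # size of the union
        if (((1, s) in seen) && ((2, s) in seen)) both++   # size of the intersection
      }
      printf "%.2f\n", (u ? both / u : 0)
    }' "$1" "$2"
}
```

For example, save `curl -sL https://yourdomain.com/that-url | sed 's/<[^>]*>/ /g'` to `mine.txt`, do the same for the other URL into `theirs.txt`, then run `shingle_jaccard mine.txt theirs.txt`. Values near 1.00 mean near-identical text; where to draw the line is a judgment call.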
Shortest path to fix
Step 1: URL Inspection for Google’s specific status
Open Search Console → top input → enter the lost URL → wait. Look at:
- Page indexing: what status?
- Crawl: last crawl time
- Google-selected canonical: pointing elsewhere?
- Indexing allowed: No?
Status → fix mapping:
| Status (URL Inspection) | Likely cause | Fix |
|---|---|---|
| Excluded by ‘noindex’ tag | You added noindex | Remove the noindex meta tag or X-Robots-Tag header |
| Not found (404) | URL really broken | Restore the page / fix the server |
| Blocked by robots.txt | robots.txt rule blocks it | Edit robots.txt |
| Duplicate, Google chose different canonical than user | Consolidated into another URL | Strengthen canonical signals, or accept the consolidation |
| Crawled - currently not indexed | Content quality | Deepen the content |
| URL is not on Google (no clear reason) | Site-wide quality assessment (e.g., Core Update) | Site-wide quality work |
Step 2: Check server / canonical / robots.txt (the three technical points)
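```bash
# One-shot check script
URL="https://yourdomain.com/lost-page"
ORIGIN=$(printf '%s\n' "$URL" | grep -oE '^https?://[^/]+')
PATH_PART=${URL#"$ORIGIN"}
echo "=== HTTP status (redirects not followed) ==="
curl -sI "$URL" | head -3
echo "=== noindex (X-Robots-Tag header or meta tag) ==="
NOINDEX=$(curl -sIL "$URL" | grep -i '^x-robots-tag:.*noindex'; curl -sL "$URL" | grep -io '<meta[^>]*noindex[^>]*>')
echo "${NOINDEX:-none}"
echo "=== canonical ==="
curl -sL "$URL" | grep -oiE '<link[^>]*rel="canonical"[^>]*>' || echo "none"
echo "=== robots.txt blocking ==="
# Disallow values are path prefixes: "Disallow: /blog/" also blocks /blog/lost-page.
# Rough check only: ignores User-agent groups, Allow rules, and * / $ wildcards.
curl -s "$ORIGIN/robots.txt" | tr -d '\r' | awk -v p="${PATH_PART:-/}" '
  tolower($1) == "disallow:" && $2 != "" && index(p, $2) == 1 { print "blocked by: " $0; hit = 1 }
  END { if (!hit) print "no Disallow prefix matches" }'
```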
Fix any anomaly before moving on.
Step 3: Compare the lost URL’s current content with its history
Open the Wayback Machine snapshot from before the drop and compare it with the current page:
- Word count dropped?
- Topic shifted?
- Key signals (H1, title, image) changed?
If a rewrite trimmed the page down or shifted its topic, that is the likely direct cause: restore the earlier version.
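To make the before/after comparison less eyeball-driven, a small helper can pull the same rough signals (word count, `<title>`, first `<h1>`) out of both versions. A minimal sketch, assuming plain HTML on stdin; `page_signals` is a hypothetical name, and since the tag matching is regex-based, nested markup inside `<title>`/`<h1>` isn’t handled and inline script text counts toward the word count.

```bash
# page_signals: read HTML on stdin, print a rough word count, the <title>, and the first <h1>.
# The body is a ( ) subshell so its variables don't leak into the caller.
page_signals() (
  html=$(tr '\n' ' ')    # flatten so tags split across lines still match
  words=$(printf '%s' "$html" | sed 's/<[^>]*>/ /g' | wc -w | tr -d ' ')
  title=$(printf '%s' "$html" | grep -oiE '<title[^>]*>[^<]*' | head -n 1 | sed 's/^<[^>]*>//')
  h1=$(printf '%s' "$html" | grep -oiE '<h1[^>]*>[^<]*' | head -n 1 | sed 's/^<[^>]*>//')
  printf 'words: %s\ntitle: %s\nh1: %s\n' "$words" "$title" "$h1"
)
```

In bash you can diff the two versions with process substitution: `diff <(curl -sL "https://web.archive.org/web/TIMESTAMPid_/https://yourdomain.com/that-url" | page_signals) <(curl -sL https://yourdomain.com/that-url | page_signals)`, replacing `TIMESTAMP` with the 14-digit timestamp of the snapshot you picked. The `id_` suffix asks the Wayback Machine for the archived HTML without its injected toolbar, which would otherwise inflate the word count.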
Step 4: Check the Core Update timing window
Look up Google’s announcements in the past 90 days:
- 2026 March Core Update
- 2026 Spam Update
- …
If your deindex date falls within an update’s rollout window (rollouts typically take about 2 weeks), the drop is most likely algorithmic. In that case technical fixes won’t help; site-wide quality work is the only path.
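Once you’ve noted the rollout windows, the date check itself is mechanical. A minimal sketch: keep one `START END LABEL` line per update with ISO dates (which compare correctly as plain strings) and ask which windows contain your deindex date. The function name, the line format, and the file name `updates.txt` are assumptions; the dates have to come from Google’s announcements.

```bash
# match_update_windows DATE: read "START END LABEL" lines (YYYY-MM-DD) on stdin and
# print the LABEL of every window that contains DATE (both ends inclusive).
match_update_windows() {
  awk -v day="$1" '
    NF < 3 || $1 ~ /^#/ { next }           # skip blank, short, and comment lines
    day >= $1 && day <= $2 {               # ISO dates compare correctly as strings
      label = $3
      for (i = 4; i <= NF; i++) label = label " " $i   # labels may contain spaces
      print label
    }'
}
```

Usage: `match_update_windows 2026-03-12 < updates.txt` prints the label of each matching update, or nothing if the date falls outside every window.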
Step 5: For a technical regression, request indexing after the fix
After fixing the noindex, canonical, or 404 issue, click “Request indexing” in URL Inspection; Google usually re-crawls within 24h.
Step 6: Quality issues need 4-12 weeks of rework
If it’s a Helpful Content–type problem:
- Merge or noindex the thinnest 20% of pages
- For retained pages, add: original cases, exclusive data, first-person experience, comparison tables
- Get 3-5 new backlinks
- 4-8 weeks later, expect 30-60% of dropped pages back in the index
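Picking “the thinnest 20%” is easier with numbers. A minimal sketch, assuming you already have one `WORDS URL` line per page (from a crawler export or a curl loop) and that word count is a reasonable first proxy for thinness, which it only partly is: sort ascending and keep the bottom share, rounded up. `thinnest_share` is a hypothetical helper.

```bash
# thinnest_share [PERCENT]: read "WORDS URL" lines on stdin and print the lowest-word-count
# PERCENT of them (default 20, rounded up), thinnest first.
thinnest_share() {
  sort -n -k1,1 | awk -v pct="${1:-20}" '
    { rows[NR] = $0 }
    END {
      keep = int((NR * pct + 99) / 100)    # integer ceil(NR * pct / 100)
      for (i = 1; i <= keep; i++) print rows[i]
    }'
}
```

Usage: `thinnest_share 20 < wordcounts.txt` lists candidates to review for merging or noindexing. Read them before acting, since a short page can still be genuinely useful.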
Prevention
- Check the indexed-count trend in Search Console → Pages weekly; treat a drop of 10% or more as a trigger to investigate
- After each deploy, smoke-test critical pages with curl: 200 + no noindex + correct canonical
- Before CMS upgrade / template overhaul, back up the current sitemap + indexed count
- Have CI run a daily production health check: validate 50 random URLs with curl for a 200 status and no noindex
- Log Google’s Core Update announcement dates for later attribution
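The post-deploy smoke test, the sitemap backup, and the daily CI check can share one small script. A minimal sketch in plain sh (plus curl and a grep that supports `-o`), assuming each URL’s expected canonical is the URL itself, compared as an exact string, so trailing-slash or http/https differences count as failures. `check_page`, `smoke_test`, and `sitemap_urls` are hypothetical names; the check looks at the meta robots tag only (not the X-Robots-Tag header), and `sitemap_urls` reads a plain sitemap, not a sitemap index.

```bash
# check_page URL STATUS HTML: print "OK URL" or "FAIL URL: problems"; exit status 1 on failure.
# ( ) body = subshell, so variables stay local and `exit` only ends this function.
check_page() (
  url=$1 status=$2
  html=$(printf '%s' "$3" | tr '\n' ' ')    # flatten so tags split across lines still match
  problems=""
  [ "$status" = 200 ] || problems="$problems status=$status"
  if printf '%s' "$html" | grep -qiE '<meta[^>]*noindex'; then
    problems="$problems noindex"
  fi
  # href of the first <link rel="canonical">, quotes stripped
  canonical=$(printf '%s' "$html" | grep -oiE '<link[^>]*rel="canonical"[^>]*>' | head -n 1 |
    grep -oiE 'href="[^"]*"' | sed 's/^[^"]*"//; s/"$//')
  [ "$canonical" = "$url" ] || problems="$problems canonical=${canonical:-missing}"
  if [ -z "$problems" ]; then
    printf 'OK %s\n' "$url"
  else
    printf 'FAIL %s:%s\n' "$url" "$problems"
    exit 1
  fi
)

# smoke_test: read URLs (one per line) on stdin, check each; exit 1 if any page fails.
smoke_test() (
  failed=0
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url")   # no -L: a new 301 should fail
    html=$(curl -s "$url")
    check_page "$url" "$status" "$html" || failed=1
  done
  exit "$failed"
)

# sitemap_urls: read sitemap XML on stdin, print the <loc> values one per line.
sitemap_urls() {
  tr '\n' ' ' | grep -oE '<loc>[^<]*</loc>' | sed 's#</*loc>##g'
}
```

Usage: before a CMS upgrade, save the URL list with `curl -s https://yourdomain.com/sitemap.xml | sitemap_urls > "sitemap-$(date +%F).txt"`; after a deploy, run `smoke_test < critical-urls.txt`; in CI, `curl -s https://yourdomain.com/sitemap.xml | sitemap_urls | shuf -n 50 | smoke_test` (`shuf` is GNU coreutils). Because `smoke_test` exits non-zero on any failure, CI can fail the job on it directly.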
Tags: #SEO #Google #Search Console #Indexing