How to Use AI to Detect Thin Content (Before Google Does)

A 3-pass AI workflow that scores every page for thin content, surfaces the worst offenders, and proposes one of: expand, merge, noindex, delete.

What this covers

The pain: your site has hundreds of pages, Search Console traffic is uneven, and you suspect a long tail of thin pages is dragging down the rest. Going page-by-page is impractical and your gut feeling is unreliable. This 3-pass AI workflow scores every page, surfaces the worst offenders, and proposes exactly one action per page — expand, merge, noindex, or delete — with a logged record of what changed so you can measure index and traffic recovery 30 days out.

Who this is for

Content site owners with roughly 100 to 2,000 pages who suspect a chunk of the catalog is thin enough to drag site-wide rankings. Also useful for affiliate sites, recipe sites, documentation sites, and any catalog of generated or templated pages. Less useful for sites with under 50 pages — at that size, manual review is faster than building the workflow.

When to reach for it

  • Before an AdSense or affiliate program application. Both reject sites with too many thin pages.
  • After a content migration where short stub pages may have been imported.
  • After a Google Helpful Content or core update where you notice site-wide ranking drops not tied to specific keywords.
  • Every quarter as a hygiene pass, because thin content reappears as topics drift and templates get cloned.

Before you start

  • Build a page inventory CSV before opening any AI tool. Columns needed: URL, word count, last-modified date, internal links in, organic clicks last 90 days, organic impressions last 90 days, top query.
  • Pull the impressions and clicks data from Search Console; pull word count and internal links from a crawler (Screaming Frog, custom script, or your CMS).
  • Decide your impressions-without-clicks threshold for “candidate for review”. A reasonable default is fewer than 50 clicks and more than 200 impressions over 90 days.
  • Have a redirect strategy: where deleted pages will 301 to, and which pages are good merge targets.
  • Take a Search Console performance screenshot today. You will compare against it 30 days after changes.
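
The inventory and the review threshold above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the column names (`url`, `word_count`, `links_in`, `clicks_90d`, `impressions_90d`, `top_query`) are hypothetical and should be renamed to match your crawler and Search Console exports. It uses the default threshold of fewer than 50 clicks and more than 200 impressions.

```python
import csv
import io

# Hypothetical column names; rename them to match your own exports.
FIELDS = ["url", "word_count", "last_modified", "links_in",
          "clicks_90d", "impressions_90d", "top_query"]

def merge_inventory(crawl_rows, gsc_rows):
    """Join crawler rows and Search Console rows on URL."""
    gsc_by_url = {row["url"]: row for row in gsc_rows}
    merged = []
    for row in crawl_rows:
        gsc = gsc_by_url.get(row["url"], {})
        merged.append({
            "url": row["url"],
            "word_count": int(row["word_count"]),
            "last_modified": row.get("last_modified", ""),
            "links_in": int(row.get("links_in", 0)),
            # Pages missing from the Search Console export count as zero.
            "clicks_90d": int(gsc.get("clicks_90d", 0)),
            "impressions_90d": int(gsc.get("impressions_90d", 0)),
            "top_query": gsc.get("top_query", ""),
        })
    return merged

def is_candidate(page, max_clicks=50, min_impressions=200):
    """Default review threshold: under 50 clicks, over 200 impressions."""
    return (page["clicks_90d"] < max_clicks
            and page["impressions_90d"] > min_impressions)

def write_inventory(pages, handle):
    """Write the merged inventory as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(pages)
```

Joining on URL only works if both exports normalize URLs the same way (trailing slashes, protocol, host), so check a few rows by hand before trusting the merge.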

Step by step

  1. Build the page inventory CSV. Sort by clicks ascending. The bottom 30% is your initial pool.
  2. Pass 1: feed batches of 50 rows to AI with this prompt: “Score each page 1 to 5 for thin-content risk. Flag any page with fewer than 300 words OR fewer than 2 substantive paragraphs OR zero clicks despite over 200 impressions. Output JSON with URL, score, and primary reason for each page.”
  3. Pass 2: for every flagged page, give AI the actual HTML and ask: “Does this page answer one specific search intent well, or does it dance around the topic?” Flag intent-mismatch separately from depth-mismatch.
  4. Pass 3: action proposal. AI picks one of four actions per page:
    • expand — clear intent, just short. Worth investing 60 to 90 minutes to deepen.
    • merge — overlaps with a sibling URL. Merge to the higher-traffic URL and 301 the loser.
    • noindex — utility page (tag listing, author page, thank-you) with no search value. Robots-meta noindex and remove from sitemap.
    • delete — no traffic, no merge candidate, no expansion potential. 410 or redirect to closest topical hub.
  5. Apply in priority order: delete first (the easiest wins), then noindex (free, low-risk), then merge (requires rewrite), then expand (most labor).
  6. Log every change in an action CSV: old URL, new URL or status code, date applied. Take a fresh Search Console screenshot 30 days later.
  7. Repeat quarterly. Thin content reappears as new content gets created, templates clone, and topics drift.
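
Steps 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions: the row keys (`url`, `word_count`, `clicks_90d`, `impressions_90d`) and the reply keys (`url`, `score`, `primary_reason`) are hypothetical, and no particular AI API is called. The sketch only builds the prompt text and validates whatever reply you paste or pipe back in. The paragraph-count rule is left to the model because it needs the page HTML.

```python
import json

BATCH_SIZE = 50

def initial_pool(pages, percent=30):
    """Step 1: sort by clicks ascending and keep the bottom share."""
    ranked = sorted(pages, key=lambda p: p["clicks_90d"])
    return ranked[:max(1, len(ranked) * percent // 100)]

def batches(rows, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def rule_flags(page):
    """Step 2 rules that can be checked without a model."""
    reasons = []
    if page["word_count"] < 300:
        reasons.append("fewer than 300 words")
    if page["clicks_90d"] == 0 and page["impressions_90d"] > 200:
        reasons.append("zero clicks despite over 200 impressions")
    return reasons

def build_prompt(batch):
    """Assemble the pass-1 prompt for one batch of inventory rows."""
    return (
        "Score each page 1 to 5 for thin-content risk. "
        "Flag any page with fewer than 300 words OR fewer than 2 substantive "
        "paragraphs OR zero clicks despite over 200 impressions. "
        "Output a JSON array of objects with keys: url, score, primary_reason."
        "\n\n" + json.dumps(batch, indent=2)
    )

def parse_scores(reply_text, batch):
    """Validate the model's reply before trusting it."""
    expected = {p["url"] for p in batch}
    scores = json.loads(reply_text)
    for item in scores:
        if item["url"] not in expected:
            raise ValueError(f"unknown URL in reply: {item['url']}")
        if item["score"] not in (1, 2, 3, 4, 5):
            raise ValueError(f"score out of range for {item['url']}")
        if not item["primary_reason"].strip():
            raise ValueError(f"missing primary reason for {item['url']}")
    return scores
```

Comparing `rule_flags` against the model's scores is a cheap sanity check: a page that trips a hard rule but comes back with a low risk score deserves a second look.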

First-run exercise

Run this on one subfolder first, not the whole site. Pick a category that you suspect is the worst, usually tag pages, author pages, or an early-days topic that never matured. Run all three passes, apply changes, and watch the 30-day Search Console signal. A typical split is 30 to 60% of the suspected subfolder getting noindex or delete, with the remainder divided roughly evenly between merge and expansion (about 20% each when the first bucket reaches 60%). The data from this first pass calibrates your thresholds for the rest of the site.

Quality check

  • Every flagged page has a primary reason recorded. “Looks thin” is not a reason; “200 words, no headers, duplicates content from /sibling-page” is.
  • Action proposals are mutually exclusive — each page has exactly one. AI sometimes suggests “expand or delete”.
  • No page with rising impressions over the last 30 days is in the delete bucket. Rising impressions mean Google is starting to rank it; let it cook.
  • Redirects are mapped before deletes happen. A 410 Gone is fine; a 404 is not.
  • The action CSV is committed to your repo or saved. Without the log, you cannot tell which change caused which recovery.
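
Several of these checks are mechanical and can run before any change ships. The following is a minimal sketch, assuming each proposal is a dict with the hypothetical keys `url`, `action`, `primary_reason`, `target` (a new URL or a status code such as `410`), `impressions_last_30d`, and `impressions_prev_30d`. None of these names come from a specific tool.

```python
import csv
import io
from datetime import date

# Apply order from the workflow: delete, then noindex, merge, expand.
PRIORITY = {"delete": 0, "noindex": 1, "merge": 2, "expand": 3}

def validate(proposal):
    """Return a list of problems; an empty list means the proposal passes."""
    problems = []
    action = proposal["action"]
    if action not in PRIORITY:
        # Catches answers such as "expand or delete".
        problems.append("action must be exactly one of: " + ", ".join(PRIORITY))
    if not proposal.get("primary_reason", "").strip():
        problems.append("missing primary reason")
    if (action == "delete"
            and proposal["impressions_last_30d"] > proposal["impressions_prev_30d"]):
        problems.append("impressions are rising; do not delete")
    if action in ("delete", "merge") and not proposal.get("target"):
        problems.append("no redirect target or status code mapped")
    return problems

def apply_order(proposals):
    """Sort validated proposals into the order they should be applied."""
    return sorted(proposals, key=lambda p: PRIORITY[p["action"]])

def log_change(writer, proposal, applied_on):
    """Append one row to the action CSV: old URL, new URL or status, date."""
    outcome = proposal.get("target") or proposal["action"]
    writer.writerow([proposal["url"], outcome, applied_on.isoformat()])
```

Running `validate` over every proposal and refusing to apply anything with a non-empty result keeps the mutual-exclusivity and rising-impressions rules from depending on memory.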

How to reuse this workflow

  • Save three prompts as snippets: thin-score, intent-check, action-proposal. Each needs only a one-line tweak per batch.
  • Keep the action CSV growing over time. After 4 quarters you have a corpus of which actions actually moved metrics on your site.
  • Build a recurring monitor: any new page that goes 90 days with fewer than 50 clicks and over 200 impressions is auto-flagged for review.
  • Diff each quarterly inventory against the previous. New thin pages are usually one of three patterns: template cloning, half-finished drafts, or topic drift. Address the source, not just the symptom.
  • Share the action CSV with one peer running a similar site. Comparing actions surfaces blind spots in your taxonomy.
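
The recurring monitor and the quarterly diff can be sketched as a comparison of two inventories. This is a rough sketch, assuming each inventory is a list of dicts with the hypothetical keys `url`, `clicks_90d`, and `impressions_90d`. Grouping by top-level folder is one assumed way to spot a shared source such as a cloned template; it is not the only one.

```python
from collections import Counter
from urllib.parse import urlsplit

def flagged_urls(inventory, max_clicks=50, min_impressions=200):
    """URLs meeting the review threshold: few clicks, many impressions."""
    return {
        page["url"] for page in inventory
        if page["clicks_90d"] < max_clicks
        and page["impressions_90d"] > min_impressions
    }

def new_thin_pages(previous, current):
    """URLs flagged this quarter that were not flagged last quarter."""
    return sorted(flagged_urls(current) - flagged_urls(previous))

def section_counts(urls):
    """Count URLs per top-level folder to expose a common source."""
    counts = Counter()
    for url in urls:
        segments = [seg for seg in urlsplit(url).path.split("/") if seg]
        counts[segments[0] if segments else "/"] += 1
    return counts
```

If most newly flagged URLs fall under one folder, the cause is more likely a template or a publishing habit than the individual pages.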

Page inventory CSV (URL + word count + clicks + impressions + links-in) → batch thin-content scoring in groups of 50 → intent check on flagged pages → action proposal (expand / merge / noindex / delete) → apply in priority order (delete first) → log every change → check Search Console 30 days later → repeat quarterly.

Common mistakes

  • Deleting without redirects: internal links die silently and you lose link equity.
  • Merging without rewriting: Google may treat partially merged content as duplicate, so rewrite the canonical version.
  • Treating “short” as “thin”: a 250-word page that answers a sharp question can rank fine. The signal is intent fit, not length.
  • Bulk-noindexing without removing from the sitemap: Google keeps crawling pages it will not index, wasting crawl budget.
  • Trusting AI’s pass-1 score without an intent check: a page with 500 words can be thinner than a 250-word page that nails one intent.
  • Skipping the 30-day re-check: without measurement, you do not know if the changes helped, hurt, or were noise.
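
The sitemap mistake is easy to catch mechanically. The sketch below assumes a standard sitemaps.org XML sitemap and a list of URLs you have already noindexed or deleted (for example, taken from the action CSV). It does not handle sitemap index files, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }

def stale_sitemap_entries(xml_text, removed_urls):
    """URLs that were noindexed or deleted but are still in the sitemap."""
    return sorted(sitemap_urls(xml_text) & set(removed_urls))
```

An empty result means the sitemap no longer advertises pages you have asked Google to drop.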

FAQ

  • Is there a word count threshold? There is no magic number. The 300-word flag in pass 1 only triggers a review; intent fit and depth matter more than length. A 250-word answer page can outrank a 1,500-word listicle on the same query.
  • What about AdSense impact? AdSense rejects sites with too many thin pages, so running this audit before applying should improve your approval odds.
  • Should I use noindex or 410? Noindex for pages with internal utility (tag pages users follow). 410 for pages that should not exist anymore. 301 redirect for pages whose users still need to land somewhere.
  • What if AI flags a page as thin that I know is high quality? Override it. The workflow proposes; you decide. Track these false positives in the action CSV and refine the prompt.
  • How long does the whole audit take? Roughly 30 minutes of setup, then 10 minutes per 50-row batch, plus the time to actually apply changes. A 500-page site is a half-day project.
  • Will this work for non-English sites? Yes. Be explicit about language in the prompt and adjust word count expectations. Chinese pages, for example, routinely communicate more in fewer characters.
