Tag Archive Pages With Zero Articles: Empty Pages Bloating the Sitemap

Tag pages render but have 0 published articles after a cleanup. Audit tag counts, set a minimum-per-tag prebuild rule, and 410 the empty archives so Google drops them fast.

Published: May 24, 2026 | Updated: Jun 18, 2026 | Author: AI Productivity Guide Team

You audit your tag pages and find 23 of them showing “No articles found.” The tags were used by articles you later deleted, or by drafts that never published. The empty pages still render, still sit in the sitemap, and Google still crawls them. In Search Console they show up under Page indexing as Crawled - currently not indexed or, worse, Soft 404 — Google sees a live 200 OK page with no real content and treats it as a non-page.

Fastest fix: stop generating tag pages that fall below a minimum article count (set MIN = 3 in your tag route’s getStaticPaths), drop those tags from the sitemap, and return 410 Gone for any tag URL Google already indexed. The 410 is the part that makes Google remove them quickly — as of June 2026 a 410 is typically dropped within 1-2 weeks (sometimes 3-5 days on frequently crawled sites), versus 2-4 weeks for a 404.

The full fix has two layers: backfill (audit existing empty tags, then merge or 410 them) and prevention (a prebuild rule so an empty tag page can never be generated again).

Which bucket are you in

Run the audit script in Step 1 first, then match what you see:

| Symptom | Likely cause | Go to |
| --- | --- | --- |
| Tag count is `0`, articles were recently removed | Articles using the tag were deleted | Cause 1 |
| Tag count is `0`, but files still exist | Those articles are all `draft: true` | Cause 2 |
| Two near-identical tags, one empty | Typo / singular-plural duplicate | Cause 3 |
| Many tags with exactly `1` article | Generator builds a page per frontmatter value, no floor | Cause 4 |
| Old tag URL 404s, no article references it | Tag renamed without a redirect | Cause 5 |

Common causes

1. Articles using a tag were deleted

You bulk-deleted thin articles. Some of them were the only users of a particular tag, so that tag’s archive page now has zero content.

How to spot it: list every tag used in frontmatter and check the article count (see Step 1). Any tag that still has a route generated but a count of 0 is an orphan.

2. Articles using a tag were all set to draft

A quality push set some articles to draft: true. If those were the last published users of a tag, the tag page renders the “no articles” empty state even though the .mdx files still exist on disk.

How to spot it: same script, but exclude draft: true from the count. A tag that has files but a published count of 0 is in this bucket.

3. Tag taxonomy has typos creating duplicates

You have ai-tools and ai-tool as separate tags. The plural has articles; the singular has one stray article or none. One archive page is empty, the other holds the content.

How to spot it: dump all tag values, sort them, and scan for near-duplicates:

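```
# Assumes inline-array frontmatter such as: tags: ["ai-tools", "seo"]
grep -rhoE '^tags:.*' src/content/articles/ | sed 's/^tags://' | tr -d '[]"' | tr ',' '\n' | sed 's/^ *//; s/ *$//; /^$/d' | sort | uniq -c | sort -n
```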
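The sorted dump still depends on someone spotting the near-duplicates by eye. As a rough sketch of automating that scan, the snippet below flags tag pairs whose edit distance is at most 1, which covers singular/plural pairs and single-character typos. The function names and the sample tag list are illustrative assumptions, not part of any existing tooling.

```javascript
// Levenshtein distance using a rolling row (O(len(a) * len(b)) time).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Returns every pair of distinct tags within maxDistance edits of each other.
function nearDuplicateTags(tags, maxDistance = 1) {
  const unique = [...new Set(tags.map((t) => t.trim().toLowerCase()))].sort();
  const pairs = [];
  for (let i = 0; i < unique.length; i++) {
    for (let j = i + 1; j < unique.length; j++) {
      if (editDistance(unique[i], unique[j]) <= maxDistance) {
        pairs.push([unique[i], unique[j]]);
      }
    }
  }
  return pairs;
}

// Hypothetical sample; in practice feed in the tag values dumped from frontmatter.
const sampleTags = ["ai-tools", "ai-tool", "chatgpt", "chat-gpt", "seo"];
console.log(nearDuplicateTags(sampleTags));
```

Very short tags can produce false positives at distance 1 (two unrelated two-letter tags, for example), so treat the output as a review list rather than an automatic merge list.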

4. Auto-generated tag pages for every frontmatter value

Your tag generator walks frontmatter and creates a page for every distinct tag value with no minimum count. One-off tags (tags: ["experimental-feature"] on a single article) get a tag page that will probably stay near-empty forever.

How to spot it: count articles per tag. With a floor of 3, a tag with 1 or 2 articles is a “thin tag”; 0 is an orphan. Both are weak archive pages Google will treat as low value.

5. Tag rename without a redirect

You renamed chatgpt to chat-gpt (or back). The old tag page no longer matches any article, so it disappears from the build but stays in the sitemap and in Google’s index — now a 404 or soft 404.

How to spot it: compare tag slugs that appear in your sitemap or in Search Console’s indexed URLs against the tags present in current article frontmatter. Anything indexed but no longer referenced is a stranded rename.
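That comparison can be sketched in a few lines. The example below assumes tag URLs follow the `/{lang}/tags/{tag}/` pattern used elsewhere in this article and that you already have a set of `lang:tag` keys from current frontmatter; `tagKeysInSitemap`, `strandedTags`, and the sample sitemap are hypothetical names and data for illustration.

```javascript
// Collect "lang:tag" keys from sitemap <loc> entries shaped like /{lang}/tags/{tag}/.
function tagKeysInSitemap(sitemapXml) {
  const keys = new Set();
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  for (const [, loc] of sitemapXml.matchAll(locPattern)) {
    const m = new URL(loc).pathname.match(/^\/([^/]+)\/tags\/([^/]+)\/?$/);
    if (m) keys.add(`${m[1]}:${decodeURIComponent(m[2])}`);
  }
  return keys;
}

// Keys advertised in the sitemap that no current article references.
function strandedTags(sitemapXml, currentKeys) {
  return [...tagKeysInSitemap(sitemapXml)].filter((k) => !currentKeys.has(k)).sort();
}

// Hypothetical data: "chatgpt" was renamed to "chat-gpt" without a redirect.
const sampleSitemap = `
  <urlset>
    <url><loc>https://yoursite.com/en/tags/chatgpt/</loc></url>
    <url><loc>https://yoursite.com/en/tags/chat-gpt/</loc></url>
    <url><loc>https://yoursite.com/en/articles/some-post/</loc></url>
  </urlset>`;
const currentTagKeys = new Set(["en:chat-gpt"]);
console.log(strandedTags(sampleSitemap, currentTagKeys));
```

The same function works on a list of indexed URLs exported from Search Console if you wrap each URL in `<loc>` tags or adapt the pattern.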

Shortest path to fix

Step 1: Inventory orphan and thin tags

Build one tag-usage report so you can decide per tag instead of guessing. This counts published articles per tag and flags everything below the threshold:

```js
// scripts/audit-tag-usage.mjs  ->  run with: node scripts/audit-tag-usage.mjs
import fs from "node:fs";
import path from "node:path";
import matter from "gray-matter";

const MIN = 3; // keep in sync with the route + sitemap threshold
const counts = new Map(); // key: `${lang}:${tag}` -> published article count

function walk(dir) {
  for (const e of fs.readdirSync(dir, { withFileTypes: true })) {
    const p = path.join(dir, e.name);
    if (e.isDirectory()) { walk(p); continue; }
    if (!p.endsWith(".mdx")) continue;
    const { data } = matter(fs.readFileSync(p, "utf8"));
    const lang = data.lang || "en";
    for (const t of (data.tags || [])) {
      const key = `${lang}:${t}`;
      // Drafts add 0, so a tag used only by drafts is still registered
      // and shows up as ORPHAN instead of silently disappearing.
      counts.set(key, (counts.get(key) || 0) + (data.draft ? 0 : 1));
    }
  }
}
walk("src/content/articles");

// Lowest counts first; anything below MIN needs a decision in Step 2.
for (const [key, n] of [...counts.entries()].sort((a, b) => a[1] - b[1])) {
  if (n < MIN) console.log(`${n < 1 ? "ORPHAN" : "THIN"}  ${key}  ->  ${n} article(s)`);
}
```

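Orphan tags have 0 published articles; thin tags have 1 or 2. Both fall below the floor you will enforce in Step 3. A tag that no longer appears in any frontmatter (because its articles were deleted outright) cannot be counted by this script; find those by comparing the sitemap against current tags, as described under Cause 5.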

Step 2: Decide per tag — merge, deindex, or backfill

For each orphan or thin tag, pick one:

- Merge into a similar tag: rename it in every article's frontmatter, then 410 the old tag URL.
- Deindex the tag page: stop generating it (Step 3) + drop it from the sitemap (Step 4) + 410 it.
- Backfill: if the tag is genuinely valuable, write enough real articles to reach the floor (3 published articles with MIN = 3).

Merging is usually best. It collapses near-duplicates, concentrates internal link authority on one archive, and removes the stranded URL in the same move. Reserve backfill for tags you actually want as a topic hub.
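The rename half of a merge is easy to get subtly wrong when an article already carries both tags. A minimal sketch of that step, assuming tags are a plain array of strings per article and tag URLs follow `/{lang}/tags/{tag}/`; `mergeTag` and `retiredTagUrls` are hypothetical helpers, and reading or writing the frontmatter files is left out:

```javascript
// Replace `from` with `to` in one article's tag list, keeping order and dropping duplicates.
function mergeTag(tags, from, to) {
  const out = [];
  for (const t of tags || []) {
    const next = t === from ? to : t;
    if (!out.includes(next)) out.push(next);
  }
  return out;
}

// The old archive URLs that should return 410 once the merge ships.
function retiredTagUrls(langs, from) {
  return langs.map((lang) => `/${lang}/tags/${from}/`);
}

console.log(mergeTag(["ai-tool", "seo"], "ai-tool", "ai-tools"));
console.log(mergeTag(["ai-tool", "ai-tools"], "ai-tool", "ai-tools"));
console.log(retiredTagUrls(["en", "zh"], "ai-tool"));
```

Deduplicating matters because an article tagged with both the old and the new slug would otherwise end up listing the surviving tag twice.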

Step 3: Add a prebuild rule — a tag needs `N+` articles to render

Enforce a minimum (3 published articles is a reasonable floor) in the tag route’s getStaticPaths, so an empty tag page can never be built in the first place:

```astro
---
// src/pages/[lang]/tags/[tag].astro
import { getCollection } from "astro:content";

export async function getStaticPaths() {
  const all = await getCollection("articles");
  const MIN = 3; // same number as the audit + sitemap
  const counts = new Map(); // key: "lang:tag" -> published article count
  for (const a of all) {
    if (a.data.draft) continue; // drafts never count toward the floor
    for (const t of (a.data.tags || [])) {
      const key = `${a.data.lang || "en"}:${t}`; // same default language as the audit script
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .filter(([, n]) => n >= MIN)
    .map(([key]) => {
      const i = key.indexOf(":"); // split on the first colon only, in case a tag contains one
      return { params: { lang: key.slice(0, i), tag: key.slice(i + 1) } };
    });
}
---
```

Now only tags with 3+ published articles get a page. Empty and thin tag pages are never built, so Google has no new below-floor tag URLs to discover.

Step 4: Drop deindexed and removed tags from the sitemap

Whatever generates your sitemap must apply the same MIN filter, or you will keep advertising URLs that no longer build:

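```ts
// src/pages/sitemap.xml.ts (sketch): reuse the exact filter from Step 3.
// Assumes `counts` (Map of "lang:tag" -> published count) and MIN are built or imported as in getStaticPaths.
const tagKeys = [...counts.entries()].filter(([, n]) => n >= MIN).map(([key]) => key);
// Emit a <url> only for those keys; never list a tag below the floor.
```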

A clean sitemap stops new crawls of dead tags. But URLs Google already indexed will linger until it re-crawls and gets a removal signal — that is Step 5.
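One way to keep the route and the sitemap from drifting is to have both call the same helper. The sketch below is an assumption about how that could look, not the site's actual code: the helper names, the article shape (`{ lang, tags, draft }`), and the sample data are all hypothetical, and the URL pattern is the `/{lang}/tags/{tag}/` form used in this article.

```javascript
const MIN = 3; // single source of truth for route, sitemap, and audit

// articles: [{ lang, tags, draft }] -> Map of "lang:tag" to published count
function publishedTagCounts(articles) {
  const counts = new Map();
  for (const a of articles) {
    if (a.draft) continue;
    for (const t of a.tags || []) {
      const key = `${a.lang || "en"}:${t}`;
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

// Keys that clear the floor; both getStaticPaths and the sitemap would consume this list.
function buildableTagKeys(articles, min = MIN) {
  return [...publishedTagCounts(articles)].filter(([, n]) => n >= min).map(([key]) => key);
}

// Sitemap <url> entries for buildable tags only.
function tagSitemapEntries(articles, origin) {
  return buildableTagKeys(articles).map((key) => {
    const i = key.indexOf(":");
    const lang = key.slice(0, i);
    const tag = key.slice(i + 1);
    return `<url><loc>${origin}/${lang}/tags/${encodeURIComponent(tag)}/</loc></url>`;
  });
}

// Hypothetical sample: "seo" has 3 published articles, the others fall below the floor.
const sampleArticles = [
  { lang: "en", tags: ["seo"] },
  { lang: "en", tags: ["seo", "experimental-feature"] },
  { lang: "en", tags: ["seo"] },
  { lang: "en", tags: ["seo", "ai-tools"], draft: true },
];
console.log(tagSitemapEntries(sampleArticles, "https://yoursite.com"));
```

With a shared helper, changing the floor means editing one constant instead of hunting for every copy of the filter.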

Step 5: Return `410 Gone` for already-indexed tag URLs

A page that simply stops building usually returns your host’s 404. A 404 works, but 410 Gone (“permanently removed, not coming back”) is the stronger, faster signal — Google drops 410s in roughly 1-2 weeks as of June 2026.

The catch most people miss: how you serve a 410 depends on your host, and most static _redirects files cannot emit 410.

Netlify: `_redirects` (or a `[[redirects]]` block in `netlify.toml`) does support a forced 410. The trailing `!` forces the status even if a file matches:

```
/en/tags/deprecated-tag/  /en/tags/deprecated-tag/  410!
/zh/tags/deprecated-tag/  /zh/tags/deprecated-tag/  410!
```

Cloudflare Pages: its `_redirects` file only supports 3xx codes (301, 302, 303, 307, 308) as of June 2026, so the `410!` syntax is not available. Use a Pages Function (or a Worker route) that returns `new Response("Gone", { status: 410 })` for the dead paths.

Firebase Hosting: the `redirects` key in `firebase.json` only supports 301, 302, 308. For a real 410 you must rewrite the path to a Cloud Function / Cloud Run service that calls `res.status(410).send("Gone")`.

If you cannot easily emit a 410, a plain 404 plus a clean sitemap still gets the URLs removed — just a week or two slower. Do not robots.txt-block a still-indexed URL: that stops Google from crawling it, which means it never sees the 410/404 and the page can stay indexed.
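For the Cloudflare Pages route, the handler can be quite small. The following is a sketch under stated assumptions: the list of retired paths is hard-coded and hypothetical (in practice it would come from the Step 1 audit), and the `onRequest` function follows the Pages middleware shape but is shown without `export` so the snippet can run as a plain script.

```javascript
// Hypothetical retired tag paths; keep this list in sync with the audit output.
const GONE_PATHS = new Set([
  "/en/tags/deprecated-tag/",
  "/zh/tags/deprecated-tag/",
]);

// True when the URL's path (with or without a trailing slash) is a retired tag archive.
function isGone(url) {
  const { pathname } = new URL(url);
  const normalized = pathname.endsWith("/") ? pathname : `${pathname}/`;
  return GONE_PATHS.has(normalized);
}

// Cloudflare Pages middleware shape (for example functions/_middleware.js).
// When deploying, this would be exported: `export async function onRequest(context) { ... }`.
async function onRequest(context) {
  if (isGone(context.request.url)) {
    return new Response("Gone", { status: 410 });
  }
  return context.next(); // fall through to the static asset or the next handler
}

console.log(isGone("https://yoursite.com/en/tags/deprecated-tag")); // true
console.log(isGone("https://yoursite.com/en/tags/seo/")); // false
```

Normalizing the trailing slash means both URL forms return 410, which avoids leaving one variant behind as a 404 or a redirect.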

Step 6: (Optional) Speed up removal with the GSC Removals tool

If a dead tag URL is ranking or visible right now and you want it hidden today, open Search Console → Removals → New request → Temporarily remove URL. This hides it from Search and clears the cached snippet for about 6 months — but it is temporary. It only buys time; the permanent removal still comes from the 410/404. Pair the two: temporary removal to hide it immediately, 410 to deindex it for good.

How to confirm it’s fixed

- Re-run the audit (Step 1). Every tag that still has a route should report >= MIN; no ORPHAN/THIN lines should map to a built page.
- Build and grep the output. After a build, confirm the dead tag’s HTML is gone: `find dist -path '*tags/deprecated-tag*'` should return nothing.
- Check the status code. `curl -I https://yoursite.com/en/tags/deprecated-tag/` should return `HTTP/2 410` (or `404` if you went that route), not `200`.
- Check the sitemap. Open `/sitemap.xml` (or your sitemap index) and confirm no `<loc>` points at a below-threshold tag.
- Watch Search Console. Over the next 1-2 weeks the count under Soft 404 and Crawled - currently not indexed for tag URLs should fall. Use URL Inspection on one removed tag to confirm Google now sees the 410.
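If there are many retired URLs, checking status codes one by one gets tedious. A small sketch of batching that check, assuming Node 18+ (for the built-in `fetch`); `removalVerdict` and `checkTagUrls` are hypothetical names, and the network call is defined but not invoked here:

```javascript
// Map an HTTP status to what it means for a retired tag URL.
function removalVerdict(status) {
  if (status === 410) return "gone";
  if (status === 404) return "not-found";
  if (status >= 300 && status < 400) return "redirect";
  if (status >= 200 && status < 300) return "still-live";
  return "unexpected";
}

// Issue a HEAD request per URL without following redirects, then classify the result.
async function checkTagUrls(urls) {
  const results = [];
  for (const url of urls) {
    const res = await fetch(url, { method: "HEAD", redirect: "manual" });
    results.push({ url, status: res.status, verdict: removalVerdict(res.status) });
  }
  return results;
}

console.log(removalVerdict(410), removalVerdict(404), removalVerdict(200));
// Example call (performs real requests):
// checkTagUrls(["https://yoursite.com/en/tags/deprecated-tag/"]).then(console.table);
```

Anything reported as `still-live` or `redirect` deserves a second look, since neither tells Google the page is gone.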

Prevention

- Tag route and sitemap share a single MIN constant (3 is a reasonable floor) so they can never drift apart.
- The build fails CI if any tag drops below the floor unexpectedly: add the Step 1 script to prebuild and exit non-zero on a new orphan.
- A tag rename must update every article that uses the tag in the same PR, and add the 410 for the old slug.
- Quarterly tag audit: merge near-duplicates, prune thin tags, retire one-off tags.
- Validate frontmatter tag values against a controlled vocabulary so typos like ai-tool vs ai-tools are rejected before they ever create a page.
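The controlled-vocabulary check can be as simple as a set lookup run before the build. A minimal sketch, assuming articles have already been parsed into `{ file, tags }` objects; the allowed list, the helper name, and the sample articles are hypothetical:

```javascript
// Hypothetical vocabulary; in a real repo this might live in a shared JSON or config file.
const ALLOWED_TAGS = new Set(["ai-tools", "chatgpt", "seo"]);

// Return every (file, tag) pair whose tag is not in the vocabulary.
function unknownTags(articles, allowed = ALLOWED_TAGS) {
  const problems = [];
  for (const a of articles) {
    for (const t of a.tags || []) {
      if (!allowed.has(t)) problems.push({ file: a.file, tag: t });
    }
  }
  return problems;
}

const problems = unknownTags([
  { file: "a.mdx", tags: ["ai-tools", "seo"] },
  { file: "b.mdx", tags: ["ai-tool"] }, // typo: singular form is not in the vocabulary
]);
for (const p of problems) console.error(`Unknown tag "${p.tag}" in ${p.file}`);
// In a prebuild script, fail the build when anything is reported:
// if (problems.length > 0) process.exitCode = 1;
```

Adding a new tag then becomes a deliberate change to the vocabulary rather than a side effect of a typo in one article.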

FAQ

Q: Should I use 410 or 404 for a removed tag page?

Both deindex the URL. 410 Gone says “permanently removed” and Google typically acts on it faster (about 1-2 weeks vs 2-4 for 404, as of June 2026). Use 410 when the tag is gone for good; use 404 if you might revive it later. Never leave it returning 200.

Q: Why not just add noindex to the empty tag page and keep it?

You can: noindex drops a page in about 3-7 days after the next crawl. But an empty archive with zero articles is dead weight even if deindexed: it still wastes crawl budget and offers users nothing. Removing the route entirely (Step 3) plus a 410 is cleaner than keeping a hollow page around.

Q: My _redirects file ignores the 410. Why?

Almost certainly because you are on Cloudflare Pages or Firebase Hosting, whose static redirect files only support 3xx codes (and Firebase adds 404 via a custom page). Only Netlify’s _redirects supports the forced `410!` syntax. On the others, return 410 from a Pages Function / Worker (Cloudflare) or a rewrite to a Cloud Function (Firebase).

Q: Will deindexing the empty tags hurt the rest of my site?

No, the opposite. Removing zero- and low-content archive pages improves your average page quality and concentrates crawl budget on real articles. The only risk is accidentally returning 410 for a tag that does have articles, so confirm the count in Step 1 before retiring anything.

Q: How long until Search Console stops flagging them?

Expect the Soft 404 / Crawled - currently not indexed counts for tag URLs to drop over 1-2 weeks once Google re-crawls and sees the 410 and the cleaned sitemap. You can nudge a specific URL with URL Inspection → Request indexing so Google re-fetches it sooner.

Tags: #Content ops #Site quality #Site audit #Troubleshooting #Tag page
