Redirect Map Grew to Thousands of Entries and Slows Builds

Your _redirects file is 3,000 lines. Builds are slower, crawlers chase chains, and nobody knows which rules are still needed. How to audit, collapse, and prune.

Two years ago your _redirects file had 40 entries. Today it has 3,200. Every URL slug change, every taxonomy rework, every typo-fix on a published slug added a line and nobody ever removed one. Now builds spend 8 extra seconds parsing the file, edge functions hit a memory ceiling on cold start, and Search Console’s “Page with redirect” bucket keeps growing because Googlebot is walking 3-hop chains that finally end at a 410. The redirect map has become its own technical-debt artifact.

This article walks through how the bloat happens, how to safely audit and collapse it, and the rules that prevent it from refilling.

Common causes

Ordered by how often each cause turns up, most frequent first.

1. Every slug rename added a forward rule, never a backward one

Editorial renamed /why-claude-is-better/ to /claude-vs-gpt-comparison/. A redirect went in. Six months later they renamed it again to /claude-4-vs-gpt-5/. Another redirect. Now the original URL sits behind a 2-hop chain: /why-claude-is-better/ → /claude-vs-gpt-comparison/ → /claude-4-vs-gpt-5/ (200). One more rename makes it three hops.

How to spot it: Run curl -sIL <old-url> on a sample of redirect sources. If you see more than one Location header before the final status, you have chains.
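
To check more than a handful of URLs, the same hop-by-hop walk can be scripted. This is a minimal sketch, assuming Node 18+ (for the global fetch) and a host that answers HEAD requests. followChain and its options are illustrative names, not part of any platform API.

```javascript
// Follow redirects one hop at a time and record every Location header.
// `fetchImpl` is injectable so the function can be exercised without a network;
// it defaults to the global fetch available in Node 18+.
async function followChain(url, { maxHops = 10, fetchImpl = fetch } = {}) {
  const hops = [];
  let current = url;
  for (let i = 0; i < maxHops; i++) {
    const res = await fetchImpl(current, { method: 'HEAD', redirect: 'manual' });
    const location = res.headers.get('location');
    // Anything that is not a 3xx with a Location header ends the chain.
    if (res.status < 300 || res.status >= 400 || !location) {
      return { hops, finalUrl: current, finalStatus: res.status };
    }
    // Location may be relative, so resolve it against the URL just requested.
    current = new URL(location, current).href;
    hops.push({ status: res.status, to: current });
  }
  // Hop budget exhausted: probably a loop.
  return { hops, finalUrl: current, finalStatus: null, gaveUp: true };
}

// Usage (example.com is a placeholder host):
// const r = await followChain('https://example.com/why-claude-is-better/');
// if (r.hops.length > 1) console.log('chain:', r.hops.map((h) => h.to).join(' -> '));
```

Any result with hops.length greater than 1 is a chain. A result with gaveUp set is worth checking by hand.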

2. Bulk taxonomy rework dumped hundreds of rules in one PR

You renamed /category/ai-tools/ to /ai-applications/. The PR added 400 redirects covering every child URL. None of those rules were collapsed into a wildcard.

How to spot it: grep -c "^/category/ai-tools" _redirects. If the count is in the hundreds and every line maps to a sibling under /ai-applications/, a wildcard would replace them.
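
If you do not already know which prefix to grep for, counting rules per leading path prefix surfaces the bulk groups. A rough sketch follows. countByPrefix and the depth parameter are hypothetical names, and the sample sources are made up for illustration.

```javascript
// Count redirect sources by their first `depth` path segments,
// returning [prefix, count] pairs sorted with the largest group first.
function countByPrefix(sources, depth = 2) {
  const counts = new Map();
  for (const source of sources) {
    const prefix = '/' + source.split('/').filter(Boolean).slice(0, depth).join('/');
    counts.set(prefix, (counts.get(prefix) ?? 0) + 1);
  }
  return [...counts].sort((a, b) => b[1] - a[1]);
}

// Illustrative input; in practice pass the first column of _redirects.
const sampleSources = [
  '/category/ai-tools/chatgpt',
  '/category/ai-tools/claude',
  '/blog/post-1',
];
console.log(countByPrefix(sampleSources));
```

A prefix with hundreds of entries near the top of the list is the first place to look for a wildcard replacement.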

3. Redirects for URLs Google has already forgotten

You added a redirect in 2023 for a URL that got 12 visits total and hasn’t been crawled in 18 months. The rule still ships on every deploy.

How to spot it: Cross-reference redirect source paths with the last 12 months of Search Console + analytics. Sources with zero impressions and zero clicks are candidates for removal once you have also confirmed that nothing still links to them.

4. Trailing-slash, www, and protocol normalizations live in the map instead of at the edge

Each of /foo, /foo/, http://, https://, www., non-www. got its own redirect line per page. That’s 6x rules per URL that should be one host-level rule.

How to spot it: Same destination appearing 4-6 times with only host/slash differences in the source.
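
One way to find these groups mechanically is to reduce every source to a comparison key and look for keys that several rules share. This sketch assumes rules already parsed into { from, to } objects. normalizeSource and findNormalizationGroups are illustrative names.

```javascript
// Reduce a source to its path without a trailing slash.
// Absolute sources keep only their path, so scheme and host differences are ignored.
function normalizeSource(source) {
  const path = /^https?:\/\//.test(source) ? new URL(source).pathname : source;
  return path.replace(/\/+$/, '') || '/';
}

// Group rules that share a normalized source and a destination.
// Groups with more than one member differ only by scheme, host, or slash.
function findNormalizationGroups(rules) {
  const groups = new Map();
  for (const { from, to } of rules) {
    const key = `${normalizeSource(from)} -> ${to}`;
    groups.set(key, [...(groups.get(key) ?? []), from]);
  }
  return [...groups].filter(([, sources]) => sources.length > 1);
}

// Illustrative rules; www.example.com is a placeholder host.
const sampleRules = [
  { from: '/foo', to: '/bar/' },
  { from: '/foo/', to: '/bar/' },
  { from: 'http://www.example.com/foo', to: '/bar/' },
  { from: '/other', to: '/x/' },
];
console.log(findNormalizationGroups(sampleRules));
```

Every group it reports is a set of rules that one host-level or slash-level setting could replace.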

5. Imported legacy redirects from a CMS migration nobody owns

When you migrated off WordPress, you exported the entire redirect plugin DB. 600 rules of “old-old-CMS” URLs that nobody has linked to in 5 years.

How to spot it: Look for source patterns inconsistent with your current URL structure (/?p=1234, /index.php?cat=..., /wp-content/...).
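
The three patterns above translate directly into a filter. This is a sketch only: LEGACY_PATTERNS covers just the examples named here, and you would extend it with whatever your old CMS actually produced.

```javascript
// Source shapes left over from a WordPress-style URL structure.
const LEGACY_PATTERNS = [
  /^\/\?p=\d+/,       // numeric permalinks such as /?p=1234
  /^\/index\.php\?/,  // query-string routing such as /index.php?cat=...
  /^\/wp-content\//,  // uploaded-asset paths
];

// Return the sources that match any legacy pattern.
function findLegacySources(sources) {
  return sources.filter((s) => LEGACY_PATTERNS.some((re) => re.test(s)));
}

// Illustrative input.
console.log(findLegacySources(['/?p=1234', '/wp-content/uploads/a.png', '/blog/post/']));
```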

6. Duplicate rules with conflicting targets

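Same source path appears twice with different destinations. Which one wins depends on how the rules are loaded: Netlify applies the first matching rule, while custom code that reads the file into a map line by line (a hand-written Cloudflare Worker, or the resolver script in Step 2) keeps the last one.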

How to spot it: awk '{print $1}' _redirects | sort | uniq -d.
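
The awk one-liner lists every repeated source, including harmless exact duplicates. To isolate the rules whose destinations actually conflict, something like the following works. It is a sketch that assumes whitespace-separated "source destination [status]" lines with # comments, and findConflicts is an illustrative name.

```javascript
// Report sources that map to more than one distinct destination.
function findConflicts(text) {
  const targets = new Map(); // source -> Set of distinct destinations
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (!line || line.startsWith('#')) continue; // skip blanks and comments
    const [from, to] = line.split(/\s+/);
    if (!targets.has(from)) targets.set(from, new Set());
    targets.get(from).add(to);
  }
  return [...targets]
    .filter(([, dests]) => dests.size > 1)
    .map(([from, dests]) => ({ from, destinations: [...dests] }));
}

// Illustrative input: /a conflicts, /d is only an exact duplicate.
const sampleText = [
  '/a  /b  301',
  '/a  /c  301',
  '/d  /e  301',
  '/d  /e  301',
].join('\n');
console.log(findConflicts(sampleText));
```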

7. Soft-404 redirects to homepage

When a page was deleted, somebody redirected to /. Now hundreds of unrelated URLs all 301 to the homepage. Google treats these as soft-404s and reports them as “Submitted URL seems to be a Soft 404.”

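How to spot it: grep -cE '[[:space:]]/[[:space:]]+30[12][[:space:]]*$' _redirects (this also matches column-aligned rules with more than one space between fields). If it’s more than a handful, this is a pattern, not an exception.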

Shortest path to fix

Step 1: Snapshot and version the current map

Before touching anything, copy _redirects to a dated backup so you can compare and roll back:

cp public/_redirects public/_redirects.snapshot-$(date +%Y%m%d).txt
git add public/_redirects.snapshot-*.txt
git commit -m "snapshot: redirect map before audit"

Step 2: Resolve all rules to final destinations

Walk every redirect end-to-end and write the resolved chain to a TSV. This is the single most useful artifact in the whole cleanup.

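// scripts/resolve-redirects.mjs
// Resolve every rule in public/_redirects to its final destination; print a TSV.
import fs from 'node:fs';

const lines = fs.readFileSync('public/_redirects', 'utf8').split('\n');
const map = new Map();
for (const raw of lines) {
  const line = raw.trim();
  // Skip blanks and "# ..." comments so annotations are not parsed as rules.
  if (!line || line.startsWith('#')) continue;
  // Source, destination, optional status (assumed to default to 301, as on Netlify).
  const m = line.match(/^(\S+)\s+(\S+)(?:\s+(\d{3}))?/);
  if (m) map.set(m[1], { to: m[2], code: m[3] ?? '301' });
}

// Follow the map from `path`; give up after 10 hops and flag the row as a loop.
function resolve(path, hops = 0) {
  if (hops > 10) return { final: path, hops, loop: true };
  const rule = map.get(path);
  if (!rule) return { final: path, hops };
  return resolve(rule.to, hops + 1);
}

// Columns: source, final destination, hop count, LOOP flag.
for (const [from] of map) {
  const r = resolve(from);
  process.stdout.write(`${from}\t${r.final}\t${r.hops}\t${r.loop ? 'LOOP' : ''}\n`);
}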

Any row with hops > 1 is a chain to collapse. Any LOOP row is a bug; fix it immediately.

Step 3: Collapse chains in one pass

Rewrite every multi-hop rule to point directly at the final destination. This usually cuts 20-30% of crawler-visible latency and removes the soft-404 risk where chains end at deleted pages.
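
The rewrite itself can be automated. Below is a minimal sketch, assuming rules parsed into { from, to, code } objects with exact-match sources (no wildcards). collapseChains is an illustrative name, and each rule keeps its own status code, which you may want to review where a chain mixed 301s and 302s.

```javascript
// Point every rule straight at the end of its chain.
// Rules that are part of a loop are returned unchanged for manual review.
function collapseChains(rules, maxHops = 10) {
  const bySource = new Map(rules.map((r) => [r.from, r]));
  return rules.map((rule) => {
    let target = rule.to;
    const seen = new Set([rule.from]);
    // Keep following while the current target is itself a redirect source.
    while (bySource.has(target)) {
      if (seen.has(target) || seen.size > maxHops) return rule; // loop or runaway chain
      seen.add(target);
      target = bySource.get(target).to;
    }
    return { ...rule, to: target };
  });
}

// Illustrative input: /a -> /b -> /c collapses; /x <-> /y is a loop and is left alone.
const chained = [
  { from: '/a', to: '/b', code: '301' },
  { from: '/b', to: '/c', code: '301' },
  { from: '/x', to: '/y', code: '301' },
  { from: '/y', to: '/x', code: '301' },
];
console.log(collapseChains(chained));
```

Write the result back in the original order so that platforms with first-match semantics behave the same as before.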

Step 4: Replace bulk rules with patterns

Most edge platforms support wildcards. Replace 400 sibling redirects with one rule:

# Before
/category/ai-tools/chatgpt        /ai-applications/chatgpt        301
/category/ai-tools/claude         /ai-applications/claude         301
# ... 398 more

# After
/category/ai-tools/*              /ai-applications/:splat         301

Verify with curl -sI on 5-10 samples before deleting the originals.
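
Before the live curl check, you can confirm offline that the wildcard reproduces every rule it replaces. This sketch handles only the trailing /* and :splat form shown above. applySplat and uncoveredRules are hypothetical helpers, not a reimplementation of any platform's matcher.

```javascript
// Apply a "prefix/*" rule to a path; return the rewritten path or null if no match.
function applySplat(pattern, target, path) {
  if (!pattern.endsWith('/*')) return path === pattern ? target : null;
  const prefix = pattern.slice(0, -1); // drop "*", keep the trailing slash
  if (!path.startsWith(prefix)) return null;
  return target.replace(':splat', path.slice(prefix.length));
}

// Return the original rules that the wildcard would NOT reproduce exactly.
function uncoveredRules(originals, pattern, target) {
  return originals.filter((r) => applySplat(pattern, target, r.from) !== r.to);
}

// Illustrative input: the second rule renames the slug, so it must stay explicit.
const originals = [
  { from: '/category/ai-tools/chatgpt', to: '/ai-applications/chatgpt' },
  { from: '/category/ai-tools/claude', to: '/ai-applications/claude-ai' },
];
console.log(uncoveredRules(originals, '/category/ai-tools/*', '/ai-applications/:splat'));
```

Anything this returns has to remain as an explicit rule, placed above the wildcard on first-match platforms.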

Step 5: Prune dead rules

Cross-reference redirect sources with:

  • Search Console: any impressions in 12 months?
  • Analytics: any sessions in 12 months?
  • Internal links: does anything on the live site point at this URL?
  • Backlinks: any external referrer in your backlink tool?

If all four are “no” AND the rule is older than 12 months, delete it. The URL is gone from the indexable web.
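
Once the four signals are exported per source path, the delete decision is a single predicate. A sketch under stated assumptions: the signals object and its field names are hypothetical (fill them from your own exports), and rule.added is the date the rule was introduced.

```javascript
// A rule is prunable only when all four signals are zero AND it is older than 12 months.
function isPrunable(rule, signals, now = new Date()) {
  const twelveMonthsMs = 365 * 24 * 60 * 60 * 1000;
  const ageMs = now - new Date(rule.added);
  const noSignals =
    signals.impressions === 0 &&
    signals.sessions === 0 &&
    signals.internalLinks === 0 &&
    signals.backlinks === 0;
  return noSignals && ageMs > twelveMonthsMs;
}

// Illustrative call with made-up values.
console.log(
  isPrunable(
    { from: '/old-post/', added: '2023-01-15' },
    { impressions: 0, sessions: 0, internalLinks: 0, backlinks: 0 },
    new Date('2024-06-01')
  )
);
```

Keeping the age check separate from the traffic check means a brand-new redirect with no data yet is never deleted by accident.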

Step 6: Move normalization off the redirect map

Trailing slash, host, and protocol enforcement belongs in edge config or framework config, not as per-page rules. On Astro:

// astro.config.mjs
import { defineConfig } from 'astro/config';

export default defineConfig({
  trailingSlash: 'always',
  site: 'https://example.com',
});

Then delete every per-page slash variant from _redirects.

Step 7: Lock in a cap and a TTL

Add a CI check that fails the build if _redirects exceeds N lines or contains rules older than M months without an annotation:

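// scripts/check-redirects.mjs
import fs from 'node:fs';

const MAX_LINES = 800;
// Count rule lines only: blank lines and "#" annotations do not count toward the cap.
const lines = fs.readFileSync('public/_redirects', 'utf8')
  .split('\n')
  .map((line) => line.trim())
  .filter((line) => line && !line.startsWith('#'));
if (lines.length > MAX_LINES) {
  console.error(`Redirect map has ${lines.length} rule lines (cap: ${MAX_LINES})`);
  process.exit(1);
}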

Pair with a comment convention: every rule must have # added: YYYY-MM-DD reason: ... on the line above. The scheduled review then removes anything stale.
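
The script above enforces only the cap. The annotation half of the check could look like the sketch below, which interprets the rule as "every rule needs an annotation on the line directly above it, and the annotated date must be within M months". checkAnnotations and maxAgeMonths are illustrative names, and that interpretation is an assumption.

```javascript
// Return a list of problems: rules with no annotation, or with a stale one.
function checkAnnotations(text, maxAgeMonths, now = new Date()) {
  const cutoff = new Date(now);
  cutoff.setMonth(cutoff.getMonth() - maxAgeMonths);
  const problems = [];
  let previous = '';
  for (const raw of text.split('\n')) {
    const line = raw.trim();
    if (line && !line.startsWith('#')) {
      // The annotation must sit on the line directly above the rule.
      const m = previous.match(/^#\s*added:\s*(\d{4}-\d{2}-\d{2})\s+reason:\s*\S/);
      if (!m) {
        problems.push(`${line}: missing "# added: YYYY-MM-DD reason:" annotation`);
      } else if (new Date(m[1]) < cutoff) {
        problems.push(`${line}: added ${m[1]}, older than ${maxAgeMonths} months`);
      }
    }
    previous = line;
  }
  return problems;
}

// Illustrative input: one stale rule, one unannotated rule, one healthy rule.
const annotated = [
  '# added: 2020-01-01 reason: slug rename',
  '/old /new 301',
  '/bare /target 301',
  '# added: 2024-05-01 reason: taxonomy rework',
  '/cat/* /apps/:splat 301',
].join('\n');
console.log(checkAnnotations(annotated, 12, new Date('2024-06-01T00:00:00Z')));
```

In CI, exit non-zero when the returned list is not empty, the same way the cap check does.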

When this is not on you

A redirect map is a ledger of past decisions. Some bloat is the cost of legitimate URL hygiene over years. The goal is not zero; it is “auditable and not chain-y.” If your business genuinely renamed sections four times, you will have several hundred rules, and that is fine as long as the chains are collapsed and each rule is justified.

Easy to misdiagnose as

  • “Builds are slow because of MDX.” Often the redirect map parse is the actual hot path; profile before you blame content.
  • “Crawl budget problem on content pages.” When Googlebot reports lots of “Page with redirect,” it is spending budget on your redirect map, not on new content.
  • “Edge function memory ceiling needs upgrade.” If the function loads _redirects into memory on cold start, shrinking the file is cheaper than upsizing the runtime.

Prevention

  • Collapse multi-hop chains in the same PR that adds the new redirect (write a script, not a process).
  • Wildcards for bulk taxonomy renames; never enumerate siblings by hand.
  • Annotate every rule with # added: YYYY-MM-DD reason: so future-you knows whether to delete it.
  • Quarterly job: drop any rule older than 12 months with zero impressions, zero clicks, zero inlinks, zero backlinks.
  • Push protocol/host/slash normalization to edge config, never per-page.
  • CI line-count cap so the file cannot silently grow past your threshold.

FAQ

  • Will deleting old redirects hurt SEO? Only if the source URL still has impressions, sessions, inlinks, or backlinks. If all four signals are zero for 12+ months, the URL is gone from the indexable web and the rule is dead weight.
  • 301 vs 302 for slug renames? 301 (permanent) for renames you intend to keep. 302 only for genuinely temporary moves, which on a content site is rare.

Tags: #Content ops #Troubleshooting #SEO #redirects #site-performance #Crawl budget