Use AI to Find Broken Links Before Google Does

A monthly workflow to surface broken internal + external links using AI + simple tooling.

By the time Search Console tells you about broken links, the affected pages have been bleeding rankings for weeks. The fix is a 30-minute monthly workflow that combines a deterministic checker with AI clustering — the checker finds the URLs, AI groups them by cause and proposes fixes in priority order. This walks through the full loop.

What this covers

A monthly maintenance workflow for content sites with 100+ articles. Output: a triaged list of broken internal + external links, grouped by root cause (404 / redirect chain / typo / dead external), with a fix proposal per cluster. The AI does the grouping and prioritization; the link checker is the source of truth.

Who this is for

Content site owners with 100+ articles. SEO leads at small companies. Indie devs running affiliate or blog sites where dead external links are both an SEO drag and a credibility problem.

When to reach for it

Monthly maintenance. Before any major SEO push (you don’t want fixable 404s eating crawl budget when Google notices the push). After a domain migration, slug rename, or content cleanup. After importing content from another platform.

Before you start

  • Install one link checker: linkinator (npm), lychee (Rust, very fast), or broken-link-checker (npm). Lychee tends to cope well with large numbers of external URLs.
  • Have access to your sitemap (/sitemap.xml): it lists the pages to check. A recursive crawler such as linkinator can also start from the homepage and follow links from there.
  • Decide on the threshold for action: external 404s with under 10 backlinks may not be worth fixing; internal 404s are always worth fixing.

Step by step

  1. Run the link checker against production. Examples:
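# Lychee (fast; not recursive, so pass it the page URLs listed in the sitemap):
curl -s https://yoursite.com/sitemap.xml \
  | grep -oE '<loc>[^<]+' | sed 's/<loc>//' \
  | xargs lychee --include-fragments --max-concurrency 20 > broken.txt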

# Linkinator:
npx linkinator https://yoursite.com --recurse \
  --format JSON > broken.json

Cross-check what the checker found against a Codex sitemap review so you don’t miss URLs the checker never crawled.

  2. Pass the output to the AI as context. For 100+ findings, paste a CSV; for fewer, paste the raw output.
  3. Prompt the AI:
Here are N broken-link findings from a link checker.
Cluster by root cause: internal 404, internal redirect
chain (>2 hops), typo in URL (extra slash, wrong case),
dead external (host unreachable), external 410 (gone).
For each cluster, propose the fix priority (HIGH/MED/LOW)
based on: how many source articles link to it, whether it
is internal or external, whether it returns 410 vs 404.
  4. For internal 404s (always HIGH priority): grep the source content for the broken URL and change it to the correct slug. Most are typos or post-rename leftovers.
  5. For internal redirect chains: update the link in the source content to point straight at the final URL. Every extra hop adds a round trip for readers and crawlers.
  6. For dead external links: replace if you can (similar content elsewhere), or remove with a footnote acknowledging the dead reference. Never delete silently: other pages may have linked to yours citing that source.
  7. Re-run the checker to confirm. The diff between runs is your fix evidence.
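The grep step for internal 404s and the before/after comparison can each be wrapped in a small shell helper. This is a minimal sketch under assumptions: find_sources and fixed_since are hypothetical names, content/ is an assumed location for your source files, and both inputs to fixed_since are assumed to be plain text files with one broken URL per line, extracted from two checker runs.

```shell
#!/bin/sh
# Hypothetical helpers; names, paths and input format are assumptions.

# List every source file and line that still contains a broken URL.
# -F treats the URL as a literal string, so "." and "?" are not regex operators.
find_sources() {
  url=$1
  dir=${2:-content}
  grep -rnF -- "$url" "$dir"
}

# Print URLs that were broken in the earlier run but are absent from the later one.
fixed_since() {
  before=$1
  after=$2
  tmp_b=$(mktemp) && tmp_a=$(mktemp) || return 1
  sort -u "$before" > "$tmp_b"
  sort -u "$after" > "$tmp_a"
  # comm -23 keeps only the lines unique to the first file.
  comm -23 "$tmp_b" "$tmp_a"
  rm -f "$tmp_b" "$tmp_a"
}
```

For example, find_sources "/blog/old-slug" content prints file:line:text for each leftover reference, and fixed_since run1.txt run2.txt lists what the fixes actually cleared.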

First-run exercise

  1. Run the checker for the first time. Don’t fix anything yet — just read the report.
  2. Cluster manually first, before involving AI. This calibrates you on what AI gets right vs wrong.
  3. Pick the highest-volume cluster (usually “rename leftovers” — one slug rename that broke 30 links). Fix that cluster end-to-end.
  4. Re-run the checker. Confirm those 30 are gone. Now you have ground truth on the workflow.

Quality check

  • Are the AI’s clusters actually meaningful, or did it group disparate issues together? Manually spot-check 5 findings from each cluster.
  • Did the checker miss URLs that exist in your content? Compare against a grep -roE "href=\"[^\"]+\"" src/ to verify completeness.
  • For external links flagged as dead, did the checker hit them at a bad moment? Re-test after 24 hours — temporary 5xx errors flap.
  • Did the AI propose fixes you can apply without manual judgment? “Replace with similar source” requires you to find the similar source — not a fix, a task.
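The completeness check above can be turned into a repeatable comparison. The sketch below assumes you can export the list of URLs the checker visited to a text file, one per line, and that those URLs are written in the same form as the href values in your source (both root-relative, or both absolute). unchecked_links is a hypothetical name.

```shell
#!/bin/sh
# Print link targets that appear in the source tree but not in the checker's list.
# $1: source directory (src/ in the article)
# $2: file of URLs the checker visited, one per line (assumed export)
unchecked_links() {
  src_dir=$1
  checked=$2
  tmp_src=$(mktemp) && tmp_chk=$(mktemp) || return 1
  # -h drops file names, -o prints only the match; sed strips the href="..." wrapper.
  grep -rhoE 'href="[^"]+"' "$src_dir" \
    | sed -e 's/^href="//' -e 's/"$//' \
    | sort -u > "$tmp_src"
  sort -u "$checked" > "$tmp_chk"
  # Lines unique to the source list are links the checker never saw.
  comm -23 "$tmp_src" "$tmp_chk"
  rm -f "$tmp_src" "$tmp_chk"
}
```

An empty result means every href in the source was covered by the run; anything printed is worth checking by hand.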

How to reuse this workflow

  • Schedule the checker via cron or GitHub Actions, monthly. Email the report; don’t wait for someone to remember to run it.
  • Save the AI clustering prompt as a Custom GPT or saved prompt. The prompt is stable; the input changes.
  • Track findings over time. Recurring categories (for example, rename leftovers every month) show that your slug-rename process itself needs a step that updates links, not just a regular cleanup.
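A cron-driven version of the schedule could look roughly like the wrapper below. Everything specific here is a placeholder or an assumption: the script name, the report directory, the recipient address, and the presence of a working local mail command. It reuses the linkinator invocation from the step-by-step section.

```shell
#!/bin/sh
# monthly-link-check.sh: hypothetical wrapper script; the site, paths and
# recipient below are placeholders.
#
# Example crontab entry (03:00 on the 1st of every month):
#   0 3 1 * * /usr/local/bin/monthly-link-check.sh --run

# Build a dated report path such as <dir>/broken-YYYY-MM.json.
report_path() {
  dir=$1
  month=${2:-$(date +%Y-%m)}
  echo "$dir/broken-$month.json"
}

run_monthly_check() {
  site="https://yoursite.com"
  recipient="you@example.com"
  report=$(report_path "$HOME/link-reports")
  mkdir -p "$(dirname "$report")"

  # The checker is expected to exit non-zero when it finds broken links,
  # so its exit status is deliberately ignored here.
  npx linkinator "$site" --recurse --format JSON > "$report" || true

  # Assumes a configured local mail command (mailx or similar).
  mail -s "Monthly broken-link report for $site" "$recipient" < "$report"
}

# Only run when asked to, so the functions can be loaded without side effects.
if [ "${1:-}" = "--run" ]; then
  run_monthly_check
fi
```

Keeping one dated report per month also gives you the history needed to track recurring categories.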

A typical run: checker tool → 50 findings → AI clusters into 4 groups (10 internal 404s, 15 redirect chains, 18 dead external, 7 typos) → fix internal first → replace or remove externals → re-run checker → 3 findings remain (flapping externals) → schedule re-check next month. Total time: 90 minutes. Treat broken-link sweeps as one row in a larger stack-tailored technical SEO checklist so the monthly cadence sits next to schema, hreflang and render-mode checks.

Common mistakes

  • Running checker only after Search Console complains — by then you’ve lost weeks of crawl efficiency.
  • Silently deleting broken external links — readers and citing pages expect them. Mark as “[archived]” or use a Wayback Machine link instead.
  • Not setting up monthly cadence — drift returns within 60 days on an active site.
  • Fixing external 404s by removing the link without finding a replacement — half the value of citations is gone.
  • Trusting the AI’s “fix” output literally without verifying the target page exists.
  • Ignoring redirect chains — they don’t show as broken, but they cost crawl budget and time-to-first-byte.

FAQ

  • Which checker is most accurate? Lychee for speed and accuracy on external URLs. Linkinator for simple Node setups. Both work; pick one and stick with it.
  • What about JavaScript-rendered links? Most checkers see only static HTML. If your site is SPA-heavy, use a checker that renders pages in a headless browser, such as a Playwright-based tool.
  • How fast should I fix? Internal 404s within 7 days (crawl budget). External within 30 days. Redirect chains within 60 days.
  • Will AI replace the link checker? No. The checker is deterministic; AI is for the layer above (clustering, prioritization, fix proposals). Use both.
  • What about checking links inside MDX or markdown content? Run a separate pre-build script that resolves internal links against your content collection. The link checker only sees the deployed site, not the source.
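The pre-build check from the last answer could look roughly like this. It is a sketch that assumes a simple layout in which a link to /blog/my-post maps to a file named blog/my-post.md or blog/my-post.mdx under the content directory; check_internal_links is a hypothetical name. Targets containing a dot (images, PDFs, protocol-relative URLs) are skipped, and routes generated outside the content directory would show up as false positives.

```shell
#!/bin/sh
# Report root-relative markdown links whose target has no matching source file.
# Returns 1 if any link is unresolved, so it can fail a build step.
check_internal_links() {
  content_dir=$1
  status=0
  matches=$(mktemp) || return 1

  # Each match looks like "file:line:](/path"; anchors and query strings are cut off.
  grep -rnoE '\]\(/[^)#? ]*' "$content_dir" > "$matches" || true

  while IFS=: read -r file line rest; do
    target=${rest#??}        # drop the leading "]("
    slug=${target%/}         # drop a trailing slash
    [ -n "$slug" ] || continue             # link to the site root
    case $slug in *.*) continue ;; esac    # asset or external-looking target
    if [ ! -f "$content_dir$slug.md" ] && [ ! -f "$content_dir$slug.mdx" ]; then
      echo "$file:$line -> $target (no matching file)"
      status=1
    fi
  done < "$matches"

  rm -f "$matches"
  return "$status"
}
```

Calling check_internal_links content before the build catches a renamed slug at the source, before the deployed site ever serves the 404.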

Tags: #Tutorial #SEO #AI coding #Broken links