By the time Search Console tells you about broken links, the affected pages have been bleeding rankings for weeks. The fix is a 30-minute monthly workflow that combines a deterministic checker with AI clustering — the checker finds the URLs, AI groups them by cause and proposes fixes in priority order. This walks through the full loop.
What this covers
A monthly maintenance workflow for content sites with 100+ articles. Output: a triaged list of broken internal + external links, grouped by root cause (404 / redirect chain / typo / dead external), with a fix proposal per cluster. The AI does the grouping and prioritization; the link checker is the source of truth.
Who this is for
Content site owners with 100+ articles. SEO leads at small companies. Indie devs running affiliate or blog sites where dead external links are both an SEO drag and a credibility problem.
When to reach for it
Monthly maintenance. Before any major SEO push (you don’t want fixable 404s eating crawl budget when Google notices the push). After a domain migration, slug rename, or content cleanup. After importing content from another platform.
Before you start
- Install one link checker: linkinator (npm), lychee (Rust, very fast), or broken-link-checker. Lychee is the most accurate for external URLs.
- Have access to your sitemap (/sitemap.xml); the checker uses it as the entry point.
- Decide on the threshold for action: external 404s with under 10 backlinks may not be worth fixing; internal 404s are always worth fixing.
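Writing the action threshold down as a rule keeps it from drifting month to month. A minimal sketch, assuming each finding records whether it is internal and how many backlinks it has; the function name `worth_fixing` and the parameter names are hypothetical, and the default of 10 is just the rule of thumb from the list above.

```python
def worth_fixing(is_internal: bool, backlinks: int, min_backlinks: int = 10) -> bool:
    """Return True when a broken link clears the action threshold."""
    if is_internal:
        # Internal 404s are always worth fixing.
        return True
    # External 404s only count once enough backlinks are at stake.
    return backlinks >= min_backlinks
```

Tune `min_backlinks` to your own site; the point is that the decision is made once, not re-argued per finding.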
Step by step
- Run the link checker against production. Examples:
# Lychee (fastest):
lychee --include-fragments --max-concurrency 20 \
https://yoursite.com/sitemap.xml > broken.txt
# Linkinator:
npx linkinator https://yoursite.com --recurse \
--format JSON > broken.json
Cross-check what the checker found against a Codex sitemap review so you don’t miss URLs the checker never crawled.
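The cross-check itself is a set difference: URLs listed in the sitemap minus URLs the checker actually visited. A rough sketch, assuming you can pull the list of crawled page URLs out of your checker's report; the helper names are hypothetical, and trailing slashes are ignored when comparing.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text}


def never_crawled(sitemap_xml: str, crawled: set[str]) -> set[str]:
    """Return sitemap URLs that do not appear in the checker's crawled set."""
    seen = {u.rstrip("/") for u in crawled}
    return {u for u in sitemap_urls(sitemap_xml) if u.rstrip("/") not in seen}
```

Anything `never_crawled` returns is a page whose links were not checked at all, which is a different problem from a page whose links are broken.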
- Pipe the output to AI as context. For 100+ findings, paste a CSV; for fewer, paste raw output.
- Prompt the AI:
Here are N broken-link findings from a link checker.
Cluster by root cause: internal 404, internal redirect
chain (>2 hops), typo in URL (extra slash, wrong case),
dead external (host unreachable), external 410 (gone).
For each cluster, propose the fix priority (HIGH/MED/LOW)
based on: how many source articles link to it, whether it
is internal or external, whether it returns 410 vs 404.
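Building the prompt in a script keeps the count N and the pasted CSV in sync. A minimal sketch, assuming findings have already been parsed into dictionaries with `source`, `url`, and `status` keys; those field names and the `build_prompt` helper are illustrative and not part of any checker's output format.

```python
import csv
import io

PROMPT = """Here are {n} broken-link findings from a link checker.
Cluster by root cause: internal 404, internal redirect
chain (>2 hops), typo in URL (extra slash, wrong case),
dead external (host unreachable), external 410 (gone).
For each cluster, propose the fix priority (HIGH/MED/LOW)
based on: how many source articles link to it, whether it
is internal or external, whether it returns 410 vs 404."""


def build_prompt(findings: list[dict]) -> str:
    """Return the clustering prompt followed by the findings as CSV."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["source", "url", "status"],
        extrasaction="ignore",  # drop any extra keys the checker emitted
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(findings)
    return PROMPT.format(n=len(findings)) + "\n\n" + buf.getvalue()
```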
- For internal 404s (HIGH priority always): grep source content for the broken URL, fix to the correct slug. Most are typos or post-rename leftovers.
- For internal redirect chains: shorten to one hop in the source content. Every additional hop is an extra request round trip, so chains over 2 hops add up in latency.
- For dead external links: replace if you can (similar content elsewhere), or remove with a footnote acknowledging the dead reference. Never silently delete — readers may have linked back to your page citing that source.
- Re-run the checker to confirm. The diff between runs is your fix evidence.
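The diff between two runs comes down to three sets: fixed, remaining, and new. A small sketch, assuming each finding is identified by a (source page, broken URL) pair; the function name `diff_runs` is hypothetical.

```python
Finding = tuple[str, str]  # (source page, broken URL)


def diff_runs(before: set[Finding], after: set[Finding]) -> dict[str, set[Finding]]:
    """Compare two checker runs and split findings by what happened to them."""
    return {
        "fixed": before - after,      # present last run, gone now
        "remaining": before & after,  # still broken
        "new": after - before,        # broke since the last run
    }
```

A non-empty `new` set right after a fix pass usually means a replacement URL was itself wrong.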
First-run exercise
- Run the checker for the first time. Don’t fix anything yet — just read the report.
- Cluster manually first, before involving AI. This calibrates you on what AI gets right vs wrong.
- Pick the highest-volume cluster (usually “rename leftovers” — one slug rename that broke 30 links). Fix that cluster end-to-end.
- Re-run the checker. Confirm those 30 are gone. Now you have ground truth on the workflow.
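To find the highest-volume cluster without reading the whole report, count how many findings point at each broken target. A minimal sketch, assuming findings are (source page, broken URL) pairs; `top_broken_targets` is an illustrative name.

```python
from collections import Counter


def top_broken_targets(findings: list[tuple[str, str]], n: int = 5) -> list[tuple[str, int]]:
    """Return the n broken URLs that the most source pages link to."""
    counts = Counter(url for _source, url in findings)
    return counts.most_common(n)
```

A single target with dozens of sources is the typical signature of a slug rename.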
Quality check
- Are the AI’s clusters actually meaningful, or did it group disparate issues together? Manually spot-check 5 findings from each cluster.
- Did the checker miss URLs that exist in your content? Compare against the output of grep -roE "href=\"[^\"]+\"" src/ to verify completeness.
- For external links flagged as dead, did the checker hit them at a bad moment? Re-test after 24 hours; temporary 5xx errors flap.
- Did the AI propose fixes you can apply without manual judgment? “Replace with similar source” requires you to find the similar source — not a fix, a task.
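The spot-check is easier to do honestly when the sample is drawn at random instead of from the top of each cluster. A sketch, assuming the AI's clusters have been parsed into a mapping of cluster name to findings; the function name and the fixed seed (used only to make the sample reproducible) are assumptions.

```python
import random


def spot_check_sample(
    clusters: dict[str, list[str]], k: int = 5, seed: int = 0
) -> dict[str, list[str]]:
    """Pick up to k random findings from each cluster for manual review."""
    rng = random.Random(seed)
    return {
        name: rng.sample(items, min(k, len(items)))
        for name, items in clusters.items()
    }
```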
How to reuse this workflow
- Schedule the checker via cron or GitHub Actions, monthly. Email the report; don’t wait for someone to remember to run it.
- Save the AI clustering prompt as a Custom GPT or saved prompt. The prompt is stable; the input changes.
- Track findings over time. Recurring categories (always rename leftovers) reveal that your slug-rename process needs a fix step, not just a fixing cadence.
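Tracking over time can be as simple as one count per category per month. A rough sketch, assuming you keep a history keyed by month; the data shape, the `recurring_categories` name, and the three-month cutoff are illustrative choices, not a standard.

```python
from collections import Counter


def recurring_categories(
    history: dict[str, dict[str, int]], min_months: int = 3
) -> list[str]:
    """Return categories that had findings in at least min_months months.

    history maps a month label such as "2025-01" to {category: count}.
    """
    months_seen: Counter[str] = Counter()
    for counts in history.values():
        for category, count in counts.items():
            if count > 0:
                months_seen[category] += 1
    return sorted(c for c, m in months_seen.items() if m >= min_months)
```

Categories that show up here point to a process gap upstream, not to a backlog that another sweep will clear.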
Recommended workflow
Checker tool → 50 findings → AI clusters into 4 groups (10 internal 404s, 15 redirect chains, 18 dead external, 7 typos) → fix internal first → replace or remove externals → re-run checker → 3 findings remain (flapping externals) → schedule re-check next month. Total time: 90 minutes. Treat broken-link sweeps as one row in a larger stack-tailored technical SEO checklist so the monthly cadence sits next to schema, hreflang and render-mode checks.
Common mistakes
- Running checker only after Search Console complains — by then you’ve lost weeks of crawl efficiency.
- Silently deleting broken external links — readers and citing pages expect them. Mark as “[archived]” or use a Wayback Machine link instead.
- Not setting up monthly cadence — drift returns within 60 days on an active site.
- Fixing external 404s by removing the link without finding a replacement — half the value of citations is gone.
- Trusting the AI’s “fix” output literally without verifying the target page exists.
- Ignoring redirect chains — they don’t show as broken, but they cost crawl budget and time-to-first-byte.
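Redirect chains are easy to overlook because every hop succeeds. A minimal sketch of counting hops, assuming you already have a map of source URL to redirect target (for example, exported from your server or CMS redirect rules); the function name and the hop limit are hypothetical.

```python
def redirect_hops(url: str, redirects: dict[str, str], limit: int = 10) -> int:
    """Count redirects followed from url until a URL with no redirect is reached.

    Raises ValueError on a loop or when the chain exceeds limit hops.
    """
    hops = 0
    seen = {url}
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > limit:
            raise ValueError("redirect loop or chain longer than limit")
        seen.add(url)
    return hops
```

Any internal link where `redirect_hops` returns more than 2 belongs in the redirect-chain cluster, even though no checker reports it as broken.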
FAQ
- Which checker is most accurate?: Lychee for speed and accuracy on external URLs. Linkinator for simple Node setups. Both work; pick one and stick with it.
- What about JavaScript-rendered links?: Most checkers see only static HTML. If your site is SPA-heavy, use a tool that renders pages in a headless browser (for example, a Playwright-based crawler), and confirm in its documentation that it executes JavaScript before relying on the results.
- How fast should I fix?: Internal 404s within 7 days (crawl budget). External within 30 days. Redirect chains within 60 days.
- Will AI replace the link checker?: No. The checker is deterministic; AI is for the layer above (clustering, prioritization, fix proposals). Use both.
- What about checking links inside MDX or markdown content?: Run a separate pre-build script that resolves internal links against your content collection — the link checker only sees the deployed site, not the source.
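The pre-build check from the last answer can be sketched in a few lines. This version assumes internal links are written as inline markdown links with root-relative targets, and that you can list the valid paths of your content collection; the regex, the function name, and the trailing-slash handling are all simplifying assumptions.

```python
import re

# Inline markdown links whose target starts with a single "/",
# e.g. [text](/blog/post/). Fragments and query strings are cut off.
# Image targets written the same way will also match.
INTERNAL_LINK = re.compile(r"\]\((/(?!/)[^)\s#?]*)")


def unresolved_links(markdown: str, known_paths: set[str]) -> list[str]:
    """Return internal link targets that match no known content path."""

    def normalize(path: str) -> str:
        return path.rstrip("/") or "/"

    known = {normalize(p) for p in known_paths}
    return [t for t in INTERNAL_LINK.findall(markdown) if normalize(t) not in known]
```

Run it over each source file before the build and fail the build when the list is non-empty; that catches rename leftovers before they ever reach the deployed site.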
Related
- SEO audit prompts
- Run site content audit
- AI Check Your Hreflang Setup (Bilingual / Multi-Locale)
- AI Astro content audit tutorial
- Codex sitemap review tutorial