How to Use AI to Audit an Astro Content Site (Without Reading Every File)

A repeatable AI audit workflow for Astro content sites — catches broken slugs, missing translations, dead internal links, draft leakage, and config drift.

Past 50 articles, an Astro content site develops invisible drift — duplicate slugs across languages, translation keys that don’t match, dead internal links from renamed pages, draft files quietly shipping to production. Reading every file by hand isn’t the answer. This walks through an AI-driven audit that codifies your frontmatter contract into a script, fixes findings in safe batches, and wires the script into prebuild so the drift can’t return.

What this covers

A reproducible AI audit workflow for Astro content sites that catches broken slugs, missing translations, dead internal links, draft leakage, and config drift. The output is not just a list of problems; it’s a script that runs on every build and a fix plan ordered by severity.

Key tools and concepts:

  • Astro: A modern frontend framework optimized for content-heavy static and hybrid sites. Content collections live in src/content/, typically as MDX with YAML frontmatter.
  • Frontmatter contract: The set of fields every content file must have (urlSlug, translationKey, lang, draft, etc.). Most Astro audits start by formalizing this.
  • Drift: The slow accumulation of new content that violates the contract you set six months ago. It is inevitable without automated enforcement.
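To make the frontmatter contract concrete, here is a minimal sketch of one expressed as data, with a function that reports missing or disallowed fields. The field names come from the list above; the allowed lang values (en, zh) and the checkContract helper are assumptions for illustration, not part of Astro.

```javascript
// Hypothetical contract: required fields plus allowed values for some of them.
const contract = {
  required: ["urlSlug", "translationKey", "lang", "draft"],
  allowed: { lang: ["en", "zh"] }, // assumed locales
};

// Returns a list of finding strings for one file's already-parsed frontmatter.
function checkContract(frontmatter, file) {
  const problems = [];
  for (const field of contract.required) {
    const value = frontmatter[field];
    // `draft: false` is a valid value, so only undefined, null and "" count as missing.
    if (value === undefined || value === null || value === "") {
      problems.push(`BLOCKER ${file}: missing required field "${field}"`);
    }
  }
  for (const [field, values] of Object.entries(contract.allowed)) {
    const value = frontmatter[field];
    if (value !== undefined && !values.includes(value)) {
      problems.push(
        `BLOCKER ${file}: "${field}" is "${value}", expected one of ${values.join(", ")}`
      );
    }
  }
  return problems;
}
```

Keeping the contract as plain data means the review in the steps that follow is a diff of one small object rather than a reread of the whole script.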

Who this is for

Indie devs and content engineers running an Astro content site that has grown past 50 articles, especially bilingual or multi-locale sites where translation pairing creates extra invariants to verify.

When to reach for it

Before a launch. After a refactor. Quarterly. After importing content from another platform. Anywhere you need to verify “nothing broke while we were adding content.” Once article-level audits are clean, run the same playbook one level up with the AI category-page audit tutorial — category pages are the surface most teams forget and they quietly drag down SEO.

Before you start

  • Be on a clean working tree. Some fixes are bulk renames; you want to be able to revert.
  • Pick an AI with file access: Claude Code or Codex (best), or Cursor’s agent mode. Pure-chat AI works but the copy-paste cost is real.
  • Have the build command working locally (npm run build). The audit ends by wiring the script into prebuild, and you need a green baseline.

Step by step

  1. Discover shape. Run find src/content -name "*.mdx" | wc -l to confirm count. Pipe the file list to the agent as context: “These are all my MDX files.”
  2. Ask Claude Code or Codex: “Read 5 random MDX files and write me the frontmatter contract — which fields are required, which are optional, what values are allowed for lang/category/subcategory.”
  3. Review the contract. Tighten any field that’s actually required but the AI marked optional.
  4. Generate an audit script. Use this prompt:
     Write scripts/audit-content.mjs that walks src/content/**/*.mdx
     and flags:
     - missing required frontmatter fields per the contract above
     - duplicate urlSlug within the same lang
     - translationKey present in one lang but not the other
     - draft: true files (warn only)
     - internal links (/en/articles/... or /zh/articles/...) that
       do not resolve to an existing urlSlug
     Output: one finding per line, prefixed by SEVERITY (BLOCKER /
     WARN / INFO), grouped at the end with totals.
  5. Run the script. Capture output (node scripts/audit-content.mjs > audit.txt).
  6. Feed the audit output back to the AI: “Here are N findings. Group by root cause. Propose the smallest fix per group.”
  7. For each blocker group, ask the AI to write the actual patch (frontmatter edit, file rename, or link rewrite). Apply in small batches; re-run the script after each batch.
  8. Wire the script into prebuild in package.json, and make it exit with a non-zero code on any BLOCKER, so any future drift fails the build before deploy.
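The checks the prompt asks for can be sketched as one function over already-parsed entries. This is a minimal sketch under stated assumptions, not the script the AI will produce: the entry shape (file, lang, urlSlug, translationKey, draft, links), the function names, and the two-language default are all hypothetical.

```javascript
// Each entry is assumed to look like:
// { file, lang, urlSlug, translationKey, draft, links: ["/en/articles/foo", ...] }
function auditEntries(entries, langs = ["en", "zh"]) {
  const findings = [];
  const slugOwner = new Map(); // "lang:urlSlug" -> first file that used it
  const keyLangs = new Map(); // translationKey -> Set of langs that have it

  for (const e of entries) {
    const slugId = `${e.lang}:${e.urlSlug}`;
    if (slugOwner.has(slugId)) {
      findings.push(
        `BLOCKER ${e.file}: duplicate urlSlug "${e.urlSlug}" in ${e.lang} (also in ${slugOwner.get(slugId)})`
      );
    } else {
      slugOwner.set(slugId, e.file);
    }
    if (!keyLangs.has(e.translationKey)) keyLangs.set(e.translationKey, new Set());
    keyLangs.get(e.translationKey).add(e.lang);
    if (e.draft === true) findings.push(`WARN ${e.file}: draft: true`);
  }

  // Every translationKey must exist in every language.
  for (const [key, present] of keyLangs) {
    for (const lang of langs) {
      if (!present.has(lang)) {
        findings.push(`BLOCKER translationKey "${key}" has no ${lang} version`);
      }
    }
  }

  // Internal links of the form /<lang>/articles/<urlSlug> must resolve to a known slug.
  const linkPattern = /^\/([a-z]{2})\/articles\/([^/#?]+)/;
  for (const e of entries) {
    for (const link of e.links ?? []) {
      const m = linkPattern.exec(link);
      if (m && !slugOwner.has(`${m[1]}:${m[2]}`)) {
        findings.push(`BLOCKER ${e.file}: dead internal link ${link}`);
      }
    }
  }
  return findings;
}

// True when the build should fail.
function hasBlockers(findings) {
  return findings.some((f) => f.startsWith("BLOCKER"));
}
```

In the real script, setting process.exitCode = 1 when hasBlockers returns true is what lets the build gate work. npm runs a script named prebuild automatically before build, so a package.json entry such as "prebuild": "node scripts/audit-content.mjs" is enough to hook it in.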

First-run exercise

  1. Run the audit and DON’T fix anything yet. Just read the report.
  2. Pick the single most common finding (usually translationKey mismatches or duplicate urlSlugs). Fix only that category, then re-run.
  3. Confirm the count for that category dropped by exactly the number you fixed. If it didn’t, suspect the script first: check it for a bug and fix it before fixing more content.
  4. Now move to the next category. Iterative beats one-shot because each fix can introduce new findings.
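Comparing counts between two runs is easier with a small tally helper than by eye. The sketch below assumes each finding line begins with BLOCKER, WARN, or INFO, as the prompt requests; the function names are hypothetical, and any totals block your script prints at the end may need to be excluded if it uses the same prefixes.

```javascript
// Counts finding lines per severity in a saved report such as audit.txt.
function tallyReport(reportText) {
  const counts = { BLOCKER: 0, WARN: 0, INFO: 0 };
  for (const line of reportText.split("\n")) {
    const severity = line.trim().split(/\s+/)[0];
    // Object.hasOwn avoids matching inherited names such as "constructor".
    if (Object.hasOwn(counts, severity)) counts[severity] += 1;
  }
  return counts;
}

// Difference between two runs; positive numbers mean findings were resolved.
function resolvedBetween(before, after) {
  const delta = {};
  for (const severity of Object.keys(before)) {
    delta[severity] = before[severity] - after[severity];
  }
  return delta;
}
```

If you fixed five findings and the matching delta is not five, stop and inspect the script before touching more content.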

Quality check

  • Does the script flag findings you can verify by grep-ing the repo manually? If you grep for the slug and the script’s claim doesn’t hold, the script has false positives.
  • Are findings deterministic across runs? Random ordering of results is fine; random count of results is a bug.
  • Did the script catch a known-broken case you planted as a test (e.g., temporarily set draft: true on a published article)? If not, the rule isn’t checking what you thought.
  • Does it run fast enough for prebuild? Under 3 seconds for 1000 files is a reasonable target; a much longer run usually points to inefficient I/O.
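One way to test the determinism point is to sort findings into a stable order and compare two runs. The ordering itself is optional, but it makes reports diffable. This is a sketch; the finding shape (severity, file, message) and the helper names are assumptions.

```javascript
const SEVERITY_RANK = { BLOCKER: 0, WARN: 1, INFO: 2 };

// Plain code-unit comparison, so the order does not depend on the machine's locale.
function compareStrings(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// findings: [{ severity, file, message }] (hypothetical shape)
function sortFindings(findings) {
  return [...findings].sort(
    (a, b) =>
      SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
      compareStrings(a.file, b.file) ||
      compareStrings(a.message, b.message)
  );
}

// Two runs agree when their sorted findings serialize identically.
function sameFindings(runA, runB) {
  const serialize = (run) =>
    sortFindings(run)
      .map((f) => `${f.severity} ${f.file}: ${f.message}`)
      .join("\n");
  return serialize(runA) === serialize(runB);
}
```

Running the audit twice on an unchanged tree and passing both results to sameFindings should return true; a false result means some rule depends on something other than the files.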

How to reuse this workflow

  • Save scripts/audit-content.mjs in version control. Don’t regenerate it each time — evolve it.
  • Add new rules incrementally as you find new drift patterns. The script gets stronger over the life of the site.
  • Re-run the audit before any major content batch (e.g., before adding 50 articles), not just after, so you start from a known-clean state.
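Adding rules incrementally is simpler when each rule is a small function in a list. The sketch below shows that pattern; the rule names, the entry shape, and the kebab-case slug convention are assumptions, not something the original script is known to contain.

```javascript
// Each rule is a small function: (entry, allEntries) => array of finding strings.
const rules = [
  function draftLeak(entry) {
    return entry.draft === true ? [`WARN ${entry.file}: draft: true`] : [];
  },
];

// Adding a rule later is one more array element; nothing else changes.
rules.push(function slugFormat(entry) {
  // Assumed convention: lowercase letters, digits and single hyphens only.
  return /^[a-z0-9]+(-[a-z0-9]+)*$/.test(entry.urlSlug)
    ? []
    : [`BLOCKER ${entry.file}: urlSlug "${entry.urlSlug}" is not kebab-case`];
});

// Runs every rule against every entry and flattens the results.
function runRules(entries) {
  return entries.flatMap((entry) => rules.flatMap((rule) => rule(entry, entries)));
}
```

Because each rule is isolated, a new drift pattern becomes one reviewed function in version control rather than a regenerated script.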

The whole workflow in one line: list MDX → discover frontmatter contract → generate audit script → run → AI groups findings + proposes fixes → apply in batches → re-run after each batch → wire into prebuild. Once frontmatter and slugs are healthy, layer on a technical SEO checklist generated for your stack so render mode, hreflang and schema items get checked alongside content.

Common mistakes

  • Doing the audit by reading files manually — you’ll miss things, and you can’t run it again next quarter.
  • Applying all fixes at once — if the build breaks, you can’t tell which fix did it. Small batches.
  • Not wiring into prebuild — the same drift returns next month because nothing prevents it.
  • Letting AI write rules without examples — give it 3-5 known-bad files; rules built from real cases are sharper.
  • Treating WARN-level findings as BLOCKER — you’ll never ship. Triage honestly.
  • Re-running the script before all fixes are committed — uncommitted state confuses the next round.

FAQ

  • Which AI works best?: Claude Code or Codex with direct file access. Pure-chat AI works but you copy-paste a lot, and that erodes the workflow.
  • Bilingual sites?: Treat each lang as a separate audit pass, then run a paired-key pass at the end (every translationKey in /en has a match in /zh, and vice versa).
  • What about non-MDX content?: Same workflow, different glob. JSON content files, Astro markdown, even YAML data files all benefit from the same contract-then-script approach.
  • How often should I run this?: Quarterly minimum. Before any major content import. After any refactor of the content schema.
  • My audit script is hitting a memory limit on 1000+ files.: Process files one at a time instead of loading every file’s contents at once, and keep only the parsed frontmatter in memory. Or split the audit into category passes.
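For the memory question, a minimal sketch of the one-file-at-a-time approach: a generator reads a file, keeps only its frontmatter block, yields findings, and moves on. The readText and checkFile parameters are hypothetical injected functions so the sketch needs no imports, and the regex is a simplification; a real script would more likely hand the block to a YAML parser.

```javascript
// Pulls only the YAML frontmatter block out of a file's text; the body is discarded.
function frontmatterBlock(text) {
  const m = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  return m ? m[1] : null;
}

// Reads one file per iteration, so at most one file's text is held at a time.
// readText(path) returns the file's text; checkFile(path, block) returns findings.
function* auditOneAtATime(paths, readText, checkFile) {
  for (const path of paths) {
    const block = frontmatterBlock(readText(path));
    if (block === null) {
      yield `BLOCKER ${path}: no frontmatter block found`;
      continue;
    }
    yield* checkFile(path, block);
  }
}
```

In a Node script, readText could be (p) => fs.readFileSync(p, "utf8"). Cross-file checks such as duplicate slugs still need a small per-file record, but that is a few strings per article rather than the full body.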

Tags: #Tutorial #SEO #AI coding #Astro #Audit