How to Use AI to Audit an Astro Content Site (Without Reading Every File)

A reproducible AI audit for Astro 5/6 content sites: catch duplicate slugs, missing translations, dead internal links, and draft leakage with one script wired into prebuild.

Published: May 17, 2026 Updated: Jun 04, 2026 Author: AI Productivity Guide Team

Past 50 articles, an Astro content site develops invisible drift: a duplicate urlSlug across two languages, a translationKey that points nowhere, an internal link left dangling after a page rename, a file with draft: true that quietly ships anyway. Reading every file by hand doesn’t scale and isn’t repeatable. This tutorial walks through an AI-driven audit that turns your frontmatter rules into one audit-content.mjs script, fixes findings in severity-ordered batches, and wires the script into prebuild so the same drift can’t return on the next deploy.

TL;DR

Have an agent with real file access (Claude Code or OpenAI Codex CLI) read 5-10 of your MDX files and write down your frontmatter contract.
Turn that contract into a single scripts/audit-content.mjs that parses frontmatter with gray-matter and flags four classes of problem: missing required fields, duplicate urlSlug per language, broken translationKey pairing, and dead internal links.
Have the script call process.exit(1) on hard problems and only warn on soft ones, then run node scripts/audit-content.mjs and fix the findings in small batches.
Add the script to prebuild in package.json so npm run build fails before a broken page reaches production.

What this covers

The output here is not just a list of problems; it’s a script that runs on every build plus a fix plan ordered by severity. On Astro 5 and 6, content lives in collections defined by src/content.config.ts (renamed from src/content/config.ts in Astro 5.0), loaded with the built-in glob() loader, and each entry exposes an id rather than the old slug field. Your audit reads the raw .mdx source directly, so it doesn’t depend on Astro’s runtime at all.

Key terms:

Frontmatter contract: the fields every content file must carry (urlSlug, translationKey, lang, draft, category, and so on), plus the allowed values for each.
Drift: new content slowly violating the contract you set six months ago. Without an enforced check, it is guaranteed.
BLOCKER vs WARN: a BLOCKER breaks the build (duplicate slug, missing required field); a WARN is logged but ships (an unmatched translationKey while a translation is still in progress).
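To make the contract concrete, here is a minimal sketch of it as plain data plus one check function. The field names come from the list above; the allowed values for lang, the shape of entry ({ file, data }), and treating a disallowed value as a BLOCKER are all assumptions for illustration. data stands in for whatever your frontmatter parser returns (for example, the data property of a gray-matter result).

```javascript
// Hypothetical contract. Replace the allowed values with your own.
const contract = {
  required: ['urlSlug', 'translationKey', 'lang'],
  allowed: {
    lang: ['en', 'zh'], // assumption: a two-locale site
  },
};

const isEmpty = (value) => value === undefined || value === null || value === '';

// entry is assumed to be { file, data }, where data is the parsed frontmatter.
function checkContract(entry, rules = contract) {
  const findings = [];

  // Missing required fields are BLOCKERs, per the definitions above.
  for (const field of rules.required) {
    if (isEmpty(entry.data[field])) {
      findings.push({ severity: 'BLOCKER', file: entry.file, rule: 'missing-field', detail: field });
    }
  }

  // Assumption: a value outside the allowed list is also a BLOCKER.
  for (const [field, values] of Object.entries(rules.allowed)) {
    const value = entry.data[field];
    if (!isEmpty(value) && !values.includes(value)) {
      findings.push({ severity: 'BLOCKER', file: entry.file, rule: 'bad-value', detail: `${field}=${value}` });
    }
  }

  return findings;
}
```

Keeping the contract as data rather than hard-coded conditions means a new required field is a one-line change, which fits the "evolve it, don't regenerate it" approach.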

Who this is for

Indie devs and content engineers running an Astro content site past 50 articles, especially bilingual or multi-locale sites where every translation pair adds invariants to verify. If your repo already has a prebuild step, this drops straight in.

When to reach for it

Before a launch. After a content-schema refactor. Quarterly. After importing content from another CMS. Anytime you need to prove “nothing broke while we added content.” Once article-level audits are clean, run the same playbook one level up with the AI category-page audit tutorial — category and tag pages are the surface most teams forget, and they quietly drag down crawl efficiency.

Pick the right AI agent

The workflow depends on an agent that can read and edit files in place, not a chat window you paste into. As of June 2026, the two practical choices:

Agent	Model(s)	File access	Edit modes
Claude Code	Anthropic only (Sonnet 4.6 / Opus 4.7)	Full project tree, approval before writes	Plan, then apply patches on approval
OpenAI Codex CLI	GPT-5.5	Sandboxed to the current project directory	Suggest (default) / auto-edit / full-auto

Both ship as terminal CLIs and read a project memory file (CLAUDE.md for Claude Code, AGENTS.md for Codex) you can use to pin the frontmatter contract. Cursor’s agent mode works too if you live in the editor. Pure-chat models work in theory, but the copy-paste tax erodes the whole point of a repeatable audit. See the Claude Code SEO audit tutorial for how the same agent extends to technical SEO checks. Official docs: Claude Code and Codex CLI.

Before you start

Be on a clean working tree (git status shows nothing). Several fixes are bulk renames, so you want a clean state to revert to.
Confirm the build works locally. The audit will later run as a prebuild step, so you need a green baseline first. If the build OOMs at scale, raise the heap: NODE_OPTIONS=--max-old-space-size=8192 npm run build.
Install a frontmatter parser once: npm i -D gray-matter. It handles YAML edge cases (multi-line strings, lists) that a hand-rolled regex will miss.

Step by step

Count the surface. Run find src/content -name "*.mdx" | wc -l so you have a number to validate the script against later.
Extract the contract. Ask the agent: “Read 8 random MDX files under src/content and write the frontmatter contract — which fields are required, which are optional, and the allowed values for lang, category, and subcategory.” Paste the result into CLAUDE.md / AGENTS.md.
Tighten the contract. The agent will mark some required fields as optional because it only saw a sample. Promote anything that is actually mandatory (urlSlug, translationKey, lang).
Generate the audit script with this prompt:

Write scripts/audit-content.mjs (ESM). Walk src/content/**/*.mdx,
parse each file's frontmatter with the `gray-matter` package, and flag:
- missing required fields per the contract above   -> BLOCKER
- duplicate urlSlug within the same lang            -> BLOCKER
- translationKey present in one lang but missing    -> WARN
  in the other
- draft: true on a non-WIP file                     -> WARN
- internal links (/en/articles/<slug>/ or /zh/...)  -> BLOCKER
  that don't resolve to an existing urlSlug
Print one finding per line, prefixed by severity, then a totals
block. Call process.exit(1) if any BLOCKER fired; exit 0 otherwise.

Run it and capture output: node scripts/audit-content.mjs | tee audit.txt.
Group by root cause. Feed audit.txt back: “Here are N findings. Group them by root cause and propose the smallest fix per group.” One renamed page often explains a dozen dead-link findings at once.
Patch in batches. For each BLOCKER group, have the agent write the actual change — a frontmatter edit, a file rename, or a link rewrite — then re-run the script after each batch. Stop if the count goes up instead of down.
Wire it into the build. Add to package.json:

"scripts": {
  "audit:content": "node scripts/audit-content.mjs",
  "prebuild": "node scripts/audit-content.mjs"
}

Now any future drift makes npm run build exit non-zero before a broken page is generated.
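For orientation, here is a rough sketch of what two of the generated rules might look like, plus a mechanical version of the "group by root cause" step. It is not the script an agent will actually produce. It assumes entries shaped like { file, data, body } (parsed frontmatter plus the raw MDX body), assumes both locales use the /<lang>/articles/<slug>/ URL shape, and assumes slugs are lowercase letters, digits, and hyphens.

```javascript
// Duplicate urlSlug within the same lang -> BLOCKER.
function findDuplicateSlugs(entries) {
  const seen = new Map(); // "lang/urlSlug" -> first file that claimed it
  const findings = [];
  for (const { file, data } of entries) {
    if (!data.lang || !data.urlSlug) continue; // left to the required-field rule
    const key = `${data.lang}/${data.urlSlug}`;
    if (seen.has(key)) {
      findings.push({ severity: 'BLOCKER', file, rule: 'duplicate-slug', detail: `${key} also in ${seen.get(key)}` });
    } else {
      seen.set(key, file);
    }
  }
  return findings;
}

// Assumed link shape: /en/articles/<slug>/ or /zh/articles/<slug>/.
const LINK_PATTERN = /\/(en|zh)\/articles\/([a-z0-9-]+)\/?/g;

// Internal links that do not resolve to an existing urlSlug -> BLOCKER.
function findDeadLinks(entries) {
  const known = new Set(
    entries
      .filter(({ data }) => data.lang && data.urlSlug)
      .map(({ data }) => `${data.lang}/${data.urlSlug}`)
  );
  const findings = [];
  for (const { file, body } of entries) {
    for (const match of body.matchAll(LINK_PATTERN)) {
      const target = `${match[1]}/${match[2]}`;
      if (!known.has(target)) {
        findings.push({ severity: 'BLOCKER', file, rule: 'dead-link', detail: target });
      }
    }
  }
  return findings;
}

// Groups findings by detail, so one renamed page shows up as one group
// with many files instead of many unrelated-looking lines.
function groupByDetail(findings) {
  const groups = new Map();
  for (const finding of findings) {
    const files = groups.get(finding.detail) ?? [];
    files.push(finding.file);
    groups.set(finding.detail, files);
  }
  return groups;
}
```

Running groupByDetail(findDeadLinks(entries)) gives one entry per missing target, which is usually the unit you want to fix.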

First-run exercise

Run the audit and fix nothing. Read the report cold.
Pick the single most common finding (usually translationKey mismatches or duplicate urlSlugs). Fix only that category, then re-run.
Confirm the count dropped by the number you expected. If it didn’t, and no single fix legitimately cleared several findings at once, suspect a bug in the script and fix it before touching more content.
Plant a known-bad case on purpose: temporarily set draft: true on a published article and confirm the audit catches it. If it doesn’t, the rule isn’t checking what you think it is.
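The planted-case check can also be done in memory, without editing a real article. The sketch below is one way to do that. The draft rule is illustrative only: the source does not define what counts as a WIP file, so treating anything under a wip/ folder as work in progress is a hypothetical convention, and plantedCaseFires is a made-up helper name.

```javascript
// Hypothetical draft rule: draft: true outside a wip/ folder -> WARN.
// The wip/ folder convention is an assumption for this sketch.
function checkDraft(entry) {
  const isWip = entry.file.includes('/wip/');
  if (entry.data.draft === true && !isWip) {
    return [{ severity: 'WARN', file: entry.file, rule: 'draft-leak', detail: 'draft: true' }];
  }
  return [];
}

// Runs a rule on a clean entry and on a copy with `patch` merged into its
// frontmatter. Returns true only if the rule is silent on the clean entry
// and fires on the planted one.
function plantedCaseFires(rule, cleanEntry, patch) {
  const planted = { ...cleanEntry, data: { ...cleanEntry.data, ...patch } };
  return rule(cleanEntry).length === 0 && rule(planted).length > 0;
}

const published = { file: 'src/content/en/some-article.mdx', data: { draft: false } };
console.log(plantedCaseFires(checkDraft, published, { draft: true }));
```

If this logs false for a rule you believe is working, the rule is not checking what you think it is, which is the same signal the manual exercise gives you.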

Quality check

Can you reproduce a flagged finding with a manual grep? If grep for the slug contradicts the script, you have a false positive in the rule.
Are results deterministic across runs? Random ordering is fine; a random count is a bug.
Is it fast enough for prebuild? Under ~3 seconds for 1,000 files is normal — Astro 5’s content layer made Markdown builds up to 5x faster, but your audit runs outside that, so keep its I/O lean.
Does the totals block match find ... | wc -l for total files seen? A mismatch means the glob missed files.
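Two of the checks above (deterministic output and a totals block you can compare against the file count) are easier to satisfy if the report is built in one place. A minimal sketch, assuming findings are objects with severity, file, rule, and detail, and that filesSeen is the number of files the glob returned; buildReport is a hypothetical name:

```javascript
// Builds the report text and the exit code from a list of findings.
function buildReport(findings, filesSeen) {
  const toLine = (f) => `${f.severity} ${f.file} ${f.rule} ${f.detail}`;

  // The default sort compares UTF-16 code units, so the order does not
  // depend on the order in which files were read or on the system locale.
  const lines = findings.map(toLine).sort();

  const blockers = findings.filter((f) => f.severity === 'BLOCKER').length;
  const warns = findings.filter((f) => f.severity === 'WARN').length;
  lines.push(`TOTAL files=${filesSeen} blockers=${blockers} warns=${warns}`);

  // Non-zero exit only when at least one BLOCKER fired.
  return { text: lines.join('\n'), exitCode: blockers > 0 ? 1 : 0 };
}
```

In the real script you would print report.text and pass report.exitCode to process.exit. Sorting the lines also makes audit.txt diffable between runs, so a changed count stands out immediately.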

How to reuse this workflow

Commit scripts/audit-content.mjs to version control. Don’t regenerate it each time — evolve it.
Add rules incrementally as you find new drift patterns (orphan tags, missing hreflang, oversized descriptions). The script compounds in value over the life of the site.
Run the audit before a big content batch, not only after, so you start from a known-clean state.
Once frontmatter and slugs are healthy, layer on a technical SEO checklist generated for your stack so render mode, hreflang, and schema get checked alongside content.

Common mistakes

Auditing by reading files manually. You’ll miss cases, and you can’t re-run it next quarter.
Applying all fixes at once. If the build breaks you can’t tell which fix did it. Batch and re-run.
Skipping the prebuild wiring. The same drift returns next month because nothing prevents it.
Letting the agent write rules with no examples. Give it 3-5 known-bad files; rules grown from real cases are sharper.
Treating every WARN as a BLOCKER. You’ll never ship. A translation still in progress is a WARN, not a build failure.
Re-running before committing fixes. Uncommitted state confuses the next grouping pass.

FAQ

Which AI works best for this?: Claude Code (Anthropic models) or OpenAI Codex CLI (GPT-5.5), both with direct file access. Codex sandboxes writes to the project directory by default; Claude Code prompts for approval before each write. Pure-chat models work but the copy-paste cost erodes the workflow.
Does this work on Astro 5 and 6?: Yes. The audit reads raw .mdx source, so it’s independent of Astro’s runtime. Just remember the config file is src/content.config.ts (renamed in Astro 5.0) and entries expose id, not the old slug.
How do I handle bilingual sites?: Run each lang as its own pass for slug uniqueness, then a final paired-key pass asserting every translationKey in /en has a match in /zh. Treat a missing pair as WARN while a translation is in progress, BLOCKER once it ships.
What about non-MDX content?: Same workflow, different glob. JSON content files, plain Astro Markdown, and YAML data files all benefit from the contract-then-script approach — point gray-matter (or JSON.parse) at the right files.
How often should I run this?: It runs on every build via prebuild. Run it manually before any large content import and after any content-schema refactor.
What if my audit script hits a memory limit on 1,000+ files?: Stream the file list and read files one at a time instead of loading every body into memory, or split the audit into per-category passes. Raising Node’s heap with NODE_OPTIONS=--max-old-space-size=8192 is a stopgap, not a fix.
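The paired-key pass from the bilingual answer can be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: entries are { file, data } objects, the locale list defaults to en and zh, and shippedKeys is a hypothetical, caller-supplied set of translationKey values whose translations are meant to be live. The source does not say how "shipped" is recorded, so adapt that part to your own convention.

```javascript
// Missing translation pair -> WARN while in progress, BLOCKER once shipped.
function checkTranslationPairs(entries, langs = ['en', 'zh'], shippedKeys = new Set()) {
  const byKey = new Map(); // translationKey -> Map(lang -> file)
  for (const { file, data } of entries) {
    if (!data.translationKey || !data.lang) continue; // left to the required-field rule
    if (!byKey.has(data.translationKey)) byKey.set(data.translationKey, new Map());
    byKey.get(data.translationKey).set(data.lang, file);
  }

  const findings = [];
  for (const [key, files] of byKey) {
    for (const lang of langs) {
      if (files.has(lang)) continue;
      findings.push({
        severity: shippedKeys.has(key) ? 'BLOCKER' : 'WARN',
        file: [...files.values()][0], // an existing file that carries this key
        rule: 'missing-translation',
        detail: `${key} has no ${lang} entry`,
      });
    }
  }
  return findings;
}
```

Passing an empty shippedKeys set keeps every unmatched pair at WARN, which matches the default behaviour described in the tutorial.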

Tags: #Tutorial #SEO #AI coding #Astro #Audit
