Most site bugs are not interesting — they are missing alt tags, dead internal links, and pages with 200 words that should not have shipped. AI is excellent at finding these if you give it a real checklist and you cross-check what it returns. Below are the exact prompts and the shell verifiers that prove the findings.
Background
Site QA is the part of indie-dev that nobody is excited about, which is exactly why it gets skipped. An AI agent reading your built output against a known checklist takes 10 minutes and catches what a human reviewer would miss across 200 pages. The key is making the checklist concrete: “find pages under 400 words” is useful; “review my site” is not.
How to tell
- You have not run a full site check in 60+ days.
- You restructured routes, renamed slugs, or migrated content recently.
- Search Console reports increasing coverage issues.
- You added new pages quickly and suspect some shipped with bugs.
Quick verdict
Build, point the AI at dist/, ask six narrow questions, verify a sample with shell, fix in batches.
Before you start
- Local dist/ build available.
- Codex / Claude Code with file-read.
- grep, linkinator, xmllint, jq installed.
Step by step
- Build locally:
npm run build
ls dist/ # confirm pages
- Broken-link QA. Prompt:
[CONTEXT] Build output is dist/ (Astro static). Internal links look like /en/articles/<slug>/ or /zh/articles/<slug>/.
[TASK] List every internal link in dist/**/*.html whose href points to a path that does NOT have a corresponding index.html in dist/.
Output: file, broken_href
Shell verifier (canonical source of truth):
# Build the set of live URLs (every directory that has an index.html)
find dist -name 'index.html' | sed 's|^dist||; s|/index\.html$|/|' | sort > /tmp/live-urls.txt
# Extract internal article hrefs from every built HTML page
grep -RhoE --include='*.html' 'href="/[a-z]+/articles/[a-z0-9-]+/"' dist | sed 's/^href="//; s/"$//' \
  | sort -u > /tmp/used-urls.txt
# Used minus live = broken
comm -23 /tmp/used-urls.txt /tmp/live-urls.txt | head
- Missing alt tags. Prompt:
[TASK] In dist/**/*.html, list every <img> tag that lacks an alt attribute or has alt="".
Output: file, line_excerpt
Verifier:
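# -o prints each <img> tag on its own line, so one good alt cannot hide a bad tag on the same line
grep -RHnoE --include='*.html' '<img[^>]*>' dist | grep -vE '[[:space:]]alt="[^"]+"' | head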
- Thin pages. Prompt:
[TASK] For every dist/**/index.html, extract the visible body text (strip nav, header, footer, scripts) and count words.
List pages with body word count < 400.
Output: file, word_count
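Verifier (a rough sketch, not a definitive implementation). The function names `word_count` and `thin_pages` are hypothetical, and the code assumes the built HTML uses lowercase `<script>` and `<style>` tags. It counts every visible word on the page, nav and footer included, so its number is an upper bound on the body count the prompt asks for. Any page it reports under the threshold is certainly thin; a page the AI flags that does not appear here needs a manual look.

```shell
# word_count FILE: rough count of visible words in one built HTML page.
# Assumption: tags are lowercase and attribute values contain no '<' or '>'.
word_count() {
  # Flatten to one line, then start a new record at every '<' so each record
  # looks like "tagname attrs>text that follows the tag".
  tr '\n' ' ' < "$1" | tr '<' '\n' | awk '
    /^(script|style)[ >]/ { skip = 1 }     # entering a non-visible block
    /^\/(script|style)>/  { skip = 0 }     # leaving it
    !skip { sub(/^[^>]*>/, ""); n += NF }  # drop the tag part, count the words
    END { print n + 0 }'
}

# thin_pages DIR [MIN]: print "file<TAB>word_count" for pages under MIN words (default 400).
thin_pages() {
  find "$1" -name 'index.html' | sort | while read -r f; do
    n=$(word_count "$f")
    if [ "$n" -lt "${2:-400}" ]; then printf '%s\t%s\n' "$f" "$n"; fi
  done
}
```

Typical use after a build would be `thin_pages dist | head`, or `thin_pages dist 300` to try a stricter threshold.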
- Orphan detection. Prompt:
[TASK] For every article page in dist/, count incoming internal links from other pages in dist/.
List pages with 0 incoming internal links (orphans).
Output: file, incoming_count
Verifier:
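# Pages never referenced as a link target by any other page
while read -r url; do
  # -F matches the URL literally; -x drops only the page's own file, so "/" is handled too
  count=$(grep -RlF --include='*.html' "href=\"$url\"" dist | grep -vxF "dist${url}index.html" | wc -l)
  [ "$count" -eq 0 ] && echo "ORPHAN: $url"
done < /tmp/live-urls.txt | head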
- Title sanity. Prompt:
[TASK] For every dist/**/*.html, list pages where <title> is:
- missing or empty
- > 65 characters
- duplicated across pages
Output: file, issue, title_text
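Verifier (a sketch under stated assumptions). `page_title` and `title_issues` are hypothetical names, and the code assumes each page's first `<title>` element is the document title and contains no nested markup. Note that `length()` counts bytes rather than characters in some awk implementations, so non-ASCII titles (for example under /zh/) may be reported as longer than they are.

```shell
# page_title FILE: print the text of the first <title> element, or nothing if absent.
page_title() {
  tr '\n' ' ' < "$1" | awk '{
    if (match($0, /<title[^>]*>[^<]*<\/title>/)) {
      t = substr($0, RSTART, RLENGTH)
      sub(/^<title[^>]*>/, "", t); sub(/<\/title>$/, "", t)
      gsub(/^ +| +$/, "", t)   # trim surrounding spaces
      print t
    }
  }'
}

# title_issues DIR [MAX]: print "file<TAB>issue<TAB>title" (MAX defaults to 65).
title_issues() {
  find "$1" -name '*.html' | sort | while read -r f; do
    printf '%s\t%s\n' "$f" "$(page_title "$f")"
  done | awk -F '\t' -v max="${2:-65}" '
    { file[NR] = $1; title[NR] = $2; seen[$2]++ }
    END {
      for (i = 1; i <= NR; i++) {
        if (title[i] == "") { print file[i] "\tmissing\t"; continue }
        if (length(title[i]) > max) { print file[i] "\ttoo_long\t" title[i] }
        if (seen[title[i]] > 1)     { print file[i] "\tduplicate\t" title[i] }
      }
    }'
}
```

Running `title_issues dist | sort -t "$(printf '\t')" -k2,2` groups the output by issue, which makes it easier to compare against the AI's list.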
- Frontmatter consistency on source. A short check:
// scripts/frontmatter-consistency.mjs
// Flags .mdx articles whose frontmatter lacks a required key.
// Sets a non-zero exit code so a CI step running it fails on any finding.
import { readdirSync, readFileSync } from 'node:fs';
import matter from 'gray-matter';

const REQUIRED = ['title', 'description', 'urlSlug', 'category', 'tags',
  'publishedAt', 'lang', 'translationKey'];

for (const lang of ['en', 'zh']) {
  for (const cat of readdirSync(`src/content/articles/${lang}`)) {
    for (const f of readdirSync(`src/content/articles/${lang}/${cat}`)) {
      if (!f.endsWith('.mdx')) continue;
      const { data } = matter(readFileSync(`src/content/articles/${lang}/${cat}/${f}`, 'utf8'));
      // A key counts as missing when it is absent, null, or an empty string.
      const missing = REQUIRED.filter((k) => data[k] == null || data[k] === '');
      if (missing.length) {
        console.log(`${lang}/${cat}/${f}: missing ${missing.join(',')}`);
        process.exitCode = 1;
      }
    }
  }
}
- Open one issue per category. Wire the critical ones into CI to fail builds:
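# .github/workflows/qa.yml (excerpt)
- name: QA gates
  run: |
    node scripts/frontmatter-consistency.mjs   # always
    # linkinator serves dist/ on localhost, so skip every URL that is NOT localhost;
    # --recurse makes it follow links beyond the root page
    npx linkinator dist --recurse --skip '^(?!http://localhost)' --silent   # internal-link gate
    node scripts/audit-pillars.mjs   # orphan gate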
Implementation checklist
- AI prompts target dist/, one concern each.
- Shell verifiers cross-check at least 3 of the 6 categories.
- Frontmatter consistency runs in CI.
- Broken-link gate fails the build on any new break.
- Issues are opened per category, not per file.
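The gating items in this checklist reduce to one rule: a verifier that prints findings should fail the build. A minimal sketch of such a wrapper, assuming each verifier prints one finding per line and nothing when the site is clean. The name `gate` is hypothetical and not part of any tool mentioned here.

```shell
# gate NAME COMMAND [ARGS...]: run a verifier and fail if it prints anything.
gate() {
  name=$1; shift
  findings=$("$@")
  if [ -n "$findings" ]; then
    count=$(printf '%s\n' "$findings" | wc -l | tr -d ' ')
    printf 'FAIL %s: %s finding(s)\n%s\n' "$name" "$count" "$findings"
    return 1
  fi
  printf 'PASS %s\n' "$name"
}
```

In a CI step this could be called as `gate broken-links comm -23 /tmp/used-urls.txt /tmp/live-urls.txt || exit 1`, reusing the two URL lists built by the broken-link verifier.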
After-launch verification
- A re-run of each prompt after fixes returns 0 (or near-0) results.
- Search Console coverage issues trend down 4-8 weeks later.
- Lighthouse SEO + Accessibility hit 100 on samples.
Common pitfalls
- Running the QA on source instead of build output. Many bugs only appear after rendering: missing meta from null frontmatter, broken :key interpolations, etc.
- Trusting “looks fine to me” from the AI. Always ask for explicit lists. If the prompt does not produce a list, you cannot verify it.
- Fixing items inline without tracking. You will repeat the same bugs in a month.
- Skipping the orphan check. Orphan pages are common after restructures and tank ranking quietly.
- Treating word count alone as a quality signal. A 200-word page with a clear answer is fine; the QA flag is a starting point, not a verdict.
FAQ
- Can I automate this in CI?: Yes. Build a script that emits the lists and fails the build on critical categories (broken internal links, missing canonicals). AI is best for the first pass; once you know the categories, codify them.
- What about external link rot?: Use a dedicated link checker (e.g., lychee or linkinator) for external links — AI is overkill and slower.
- How often should I run a full QA?: Monthly for active sites, quarterly otherwise. Always after a major change.
- Are missing alt tags really a ranking issue?: Indirectly — they hurt accessibility and image search. Fix them, but do not panic.
- AI counts 23 broken links but my verifier counts 18 — which to trust?: Always the verifier. Use the AI’s list to investigate, your script to gate the build.
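When the two counts disagree, diffing the lists shows exactly where. A sketch, assuming the AI's answer has been saved with one href per line; `reconcile` and the file names in the usage note are hypothetical.

```shell
# reconcile AI_LIST VERIFIER_LIST: each file holds one href per line, in any order.
reconcile() {
  ai_sorted=$(mktemp); ver_sorted=$(mktemp)
  sort -u "$1" > "$ai_sorted"
  sort -u "$2" > "$ver_sorted"
  echo "AI only (check by hand, likely false positives):"
  comm -23 "$ai_sorted" "$ver_sorted"
  echo "Verifier only (missed by the AI):"
  comm -13 "$ai_sorted" "$ver_sorted"
  rm -f "$ai_sorted" "$ver_sorted"
}
```

For example, save the verifier output with `comm -23 /tmp/used-urls.txt /tmp/live-urls.txt > /tmp/verifier-broken.txt`, then run `reconcile /tmp/ai-broken.txt /tmp/verifier-broken.txt`. Entries in the first group are usually hrefs the AI misread or that fall outside the verifier's pattern; entries in the second group are real breaks the AI skipped.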