Site QA with AI — Broken Links, Missing Tags, Thin Pages

Catch the boring site-health issues with a real checklist: copy-paste AI prompts, shell verifiers, and a CI script that fails builds on critical failures.

Most site bugs are not interesting: they are missing alt attributes, dead internal links, and 200-word pages that should not have shipped. AI is good at finding these if you give it a concrete checklist and cross-check what it returns. Below are the prompts, plus the shell verifiers that confirm the findings.

Background

Site QA is the part of indie development that nobody is excited about, which is exactly why it gets skipped. An AI agent reading your built output against a known checklist takes roughly 10 minutes and catches issues a human reviewer tends to miss across 200 pages. The key is making the checklist concrete: “find pages under 400 words” is useful; “review my site” is not.

How to tell

  • You have not run a full site check in 60+ days.
  • You restructured routes, renamed slugs, or migrated content recently.
  • Search Console reports increasing coverage issues.
  • You added new pages quickly and suspect some shipped with bugs.

Quick verdict

Build, point the AI at dist/, ask one narrow question per category, verify a sample with shell, fix in batches.

Before you start

  • Local dist/ build available.
  • Codex or Claude Code with file-read access to the project.
  • grep, linkinator, xmllint, jq installed.

Step by step

  1. Build locally:
npm run build
ls dist/   # confirm pages
  2. Broken-link QA. Prompt:
[CONTEXT] Build output is dist/ (Astro static). Internal links look like /en/articles/<slug>/ or /zh/articles/<slug>/.
[TASK] List every internal link in dist/**/*.html whose href points to a path that does NOT have a corresponding index.html in dist/.
Output: file, broken_href

Shell verifier (canonical source of truth):

# Build set of live URLs
find dist -name 'index.html' | sed 's|dist||;s|/index.html|/|' | sort > /tmp/live-urls.txt
# Extract internal hrefs from every page
grep -RhoE 'href="(/[a-z]+/articles/[a-z0-9-]+/)"' dist | sed 's/href="//;s/"//' \
  | sort -u > /tmp/used-urls.txt
# Used minus live = broken
comm -23 /tmp/used-urls.txt /tmp/live-urls.txt | head
  3. Missing alt tags. Prompt:
[TASK] In dist/**/*.html, list every <img> tag that lacks an alt attribute or has alt="".
Output: file, line_excerpt

Verifier:

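# -o prints each <img> tag separately, so several tags on one (minified) line are checked one by one.
# Tags that span multiple lines are not seen by this line-based check.
grep -RHoE '<img[^>]*>' dist | grep -vE 'alt="[^"]+"' | head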
  4. Thin pages. Prompt:
[TASK] For every dist/**/index.html, extract the visible body text (strip nav, header, footer, scripts) and count words.
List pages with body word count < 400.
Output: file, word_count
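
Verifier (a rough sketch, not a canonical implementation). The function names body_words and thin_pages are made up for this example. The stripping is a heuristic: it skips <head>, <script>, <style>, <nav>, <header>, and <footer> blocks, then counts the words that remain. It assumes those elements are not nested inside themselves and that visible text contains no stray "<".

```shell
# body_words FILE: approximate visible word count of one HTML page.
# Splitting on "<" turns every record into "tagname attrs>text after the tag".
body_words() {
  awk '
    BEGIN { RS = "<"; skip = "" }
    NR == 1 { next }                      # anything before the first tag
    {
      gt = index($0, ">")
      if (gt == 0) next                   # not a tag (e.g. "<" inside a script)
      tag = tolower(substr($0, 1, gt - 1))
      closing = (substr(tag, 1, 1) == "/")
      if (closing) tag = substr(tag, 2)
      name = (match(tag, /^[a-z0-9]+/) ? substr(tag, 1, RLENGTH) : "")
      if (skip != "") {                   # inside a skipped block: wait for its close
        if (closing && name == skip) { skip = ""; print substr($0, gt + 1) }
        next
      }
      if (!closing && name ~ /^(head|script|style|nav|header|footer)$/) { skip = name; next }
      print substr($0, gt + 1)            # text that follows a kept tag
    }
  ' "$1" | wc -w | tr -d ' '
}

# thin_pages DIR [MIN_WORDS]: print "file,word_count" for pages under the limit.
thin_pages() {
  find "$1" -name 'index.html' | sort | while read -r f; do
    n=$(body_words "$f")
    if [ "$n" -lt "${2:-400}" ]; then echo "$f,$n"; fi
  done
}
```

Run thin_pages dist to list pages under 400 words. Expect small differences from the AI's counts, since the two will not strip exactly the same markup.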
  5. Orphan detection. Prompt:
[TASK] For every article page in dist/, count incoming internal links from other pages in dist/.
List pages with 0 incoming internal links (orphans).
Output: file, incoming_count

Verifier:

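# Article pages that no other page links to (reuses /tmp/live-urls.txt from the broken-link step)
grep '/articles/' /tmp/live-urls.txt | while read -r url; do
  # -F matches the href literally; -x excludes only the page's own file
  count=$(grep -RlF "href=\"$url\"" dist | grep -vFx "dist${url}index.html" | wc -l)
  [ "$count" -eq 0 ] && echo "ORPHAN: $url"
done | head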
  6. Title sanity. Prompt:
[TASK] For every dist/**/*.html, list pages where <title> is:
  - missing or empty
  - > 65 characters
  - duplicated across pages
Output: file, issue, title_text
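
Verifier (a sketch under stated assumptions). page_title and title_issues are hypothetical names. The check reads the first <title> element of each page, so an SVG <title> later in the body is ignored. Length is measured with awk's length(), which counts bytes rather than characters in some awk builds, so non-ASCII titles can look longer than they are.

```shell
# page_title FILE: print the text of the first <title> element, whitespace-trimmed.
page_title() {
  awk 'BEGIN { RS = "<" }
       tolower($0) ~ /^title[ \t\n>]/ {
         t = substr($0, index($0, ">") + 1)
         gsub(/[ \t\r\n]+/, " ", t); sub(/^ /, "", t); sub(/ $/, "", t)
         print t; exit
       }' "$1"
}

# title_issues DIR [MAX_LEN]: print "file,issue,title_text" for each problem found.
title_issues() {
  list=$(mktemp)
  # One "file<TAB>title" row per page
  find "$1" -name '*.html' | sort | while read -r f; do
    printf '%s\t%s\n' "$f" "$(page_title "$f")"
  done > "$list"
  awk -F '\t' -v max="${2:-65}" '
    { seen[$2]++; file[NR] = $1; title[NR] = $2 }
    END {
      for (i = 1; i <= NR; i++) {
        if (title[i] == "") { print file[i] ",missing_or_empty,"; continue }
        if (length(title[i]) > max) print file[i] ",too_long," title[i]
        if (seen[title[i]] > 1)     print file[i] ",duplicate," title[i]
      }
    }' "$list"
  rm -f "$list"
}
```

Run title_issues dist and compare the rows with the AI's list. Titles that contain commas will need a different separator.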
  7. Frontmatter consistency on source. A short Node check that exits non-zero when a required key is missing, so CI can gate on it:
// scripts/frontmatter-consistency.mjs
import { readdirSync, readFileSync } from 'node:fs';
import { join } from 'node:path';
import matter from 'gray-matter';

const REQUIRED = ['title', 'description', 'urlSlug', 'category', 'tags',
                  'publishedAt', 'lang', 'translationKey'];
const ROOT = 'src/content/articles';

for (const lang of ['en', 'zh']) {
  // Only descend into category directories; skip stray files.
  const cats = readdirSync(join(ROOT, lang), { withFileTypes: true }).filter((d) => d.isDirectory());
  for (const { name: cat } of cats) {
    for (const f of readdirSync(join(ROOT, lang, cat))) {
      if (!f.endsWith('.mdx')) continue;
      const { data } = matter(readFileSync(join(ROOT, lang, cat, f), 'utf8'));
      // Treat undefined, null, and empty strings as missing.
      const missing = REQUIRED.filter((k) => data[k] == null || data[k] === '');
      if (missing.length) {
        console.log(`${lang}/${cat}/${f}: missing ${missing.join(',')}`);
        process.exitCode = 1; // fail the CI step
      }
    }
  }
}
  8. Open one issue per category. Wire the critical ones into CI to fail builds:
# .github/workflows/qa.yml (excerpt)
- name: QA gates
  run: |
    node scripts/frontmatter-consistency.mjs    # always
    npx linkinator dist --recurse --skip '^(?!http://localhost)' --silent  # internal-link gate
    node scripts/audit-pillars.mjs              # orphan gate
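
The shell verifier in the broken-link step pipes its findings through head and always exits 0, so on its own it cannot fail a build. A minimal sketch of a gating variant follows. The function name broken_links is an assumption, and it reuses the same /<lang>/articles/<slug>/ href pattern, so links outside that shape are not checked.

```shell
# broken_links DIST_DIR: print internal article hrefs with no matching index.html.
# Returns 1 when at least one broken href is found, 0 otherwise.
broken_links() {
  live=$(mktemp); used=$(mktemp)
  # Every directory that contains an index.html is a live URL
  ( cd "$1" && find . -name 'index.html' ) \
    | sed 's|^\.||; s|/index\.html$|/|' | LC_ALL=C sort -u > "$live"
  # Every article-style href used anywhere in the build
  grep -RhoE 'href="/[a-z]+/articles/[a-z0-9-]+/"' "$1" \
    | sed 's/^href="//; s/"$//' | LC_ALL=C sort -u > "$used"
  # Used minus live = broken
  broken=$(LC_ALL=C comm -23 "$used" "$live")
  rm -f "$live" "$used"
  if [ -n "$broken" ]; then
    printf '%s\n' "$broken"
    return 1
  fi
  return 0
}
```

Calling broken_links dist as the last command of a workflow step makes the non-zero return fail that step, while the printed hrefs show up in the job log.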

Implementation checklist

  • AI prompts target dist/, one concern each.
  • Shell verifiers cross-check at least 3 of the 6 categories.
  • Frontmatter consistency runs in CI.
  • Broken-link gate fails the build on any new break.
  • Issues are opened per category, not per file.

After-launch verification

  • A re-run of each prompt after fixes returns 0 (or near-0) results.
  • Search Console coverage issues trend down 4-8 weeks later.
  • Lighthouse SEO and Accessibility scores reach 100 on sampled pages.

Common pitfalls

  • Running the QA on source instead of build output. Many bugs only appear after rendering — missing meta from null frontmatter, broken :key interpolations, etc.
  • Trusting “looks fine to me” from the AI. Always ask for explicit lists. If the prompt does not produce a list, you cannot verify it.
  • Fixing items inline without tracking. You will repeat the same bugs in a month.
  • Skipping the orphan check. Orphan pages are common after restructures and tank ranking quietly.
  • Treating word count alone as a quality signal. A 200-word page with a clear answer is fine; the QA flag is a starting point, not a verdict.
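
An explicit list from the AI is only useful once you diff it against the verifier's output. A minimal sketch, assuming both lists have been saved as text files with one item per line; compare_lists and the file names in the usage line are hypothetical.

```shell
# compare_lists AI_LIST VERIFIER_LIST: show where the two lists disagree.
compare_lists() {
  a=$(mktemp); v=$(mktemp)
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$v"
  # Items only the AI reported: possible false positives, or cases the verifier's pattern misses
  LC_ALL=C comm -23 "$a" "$v" | sed 's/^/AI_ONLY: /'
  # Items only the verifier reported: things the AI overlooked
  LC_ALL=C comm -13 "$a" "$v" | sed 's/^/VERIFIER_ONLY: /'
  rm -f "$a" "$v"
}
```

For example, compare_lists ai-broken.txt verifier-broken.txt prints nothing when the two lists agree. Every AI_ONLY row is worth a manual look before it goes into an issue.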

FAQ

  • Can I automate this in CI? Yes. Build a script that emits the lists and fails the build on critical categories (broken internal links, missing canonicals). AI is best for the first pass; once you know the categories, codify them.
  • What about external link rot? Use a dedicated link checker (e.g., lychee or linkinator) for external links. AI is overkill and slower.
  • How often should I run a full QA? Monthly for active sites, quarterly otherwise. Always after a major change.
  • Are missing alt tags really a ranking issue? Indirectly: they hurt accessibility and image search. Fix them, but do not panic.
  • AI counts 23 broken links but my verifier counts 18. Which to trust? Always the verifier. Use the AI’s list to investigate and your script to gate the build.
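
The first FAQ answer names missing canonicals as a critical category. A minimal sketch of such a gate, assuming every built page should carry a <link rel="canonical"> tag and that the tag is written on a single line; missing_canonicals is a hypothetical name.

```shell
# missing_canonicals DIR: print HTML files that have no canonical link tag.
# Returns 1 when at least one such file exists, 0 otherwise.
missing_canonicals() {
  # grep -L lists files without a match; its exit status varies by grep version,
  # so the decision below is based on the captured output instead.
  missing=$(find "$1" -name '*.html' \
    -exec grep -LiE '<link[^>]*rel="?canonical' {} + | sort) || true
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
```

Pages that are deliberately non-canonical (redirect stubs, for example) would need to be excluded before this is used to fail a build.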

Tags: #Indie dev #AI-assisted build #Workflow #Technical SEO