Codex is very good at reading your repo and spotting SEO bugs you would otherwise find six months later in Search Console. The trick is asking it to check specific things, not “review my SEO.” Below are the exact prompts to send, plus the shell scripts that verify what the agent reports.
Background
Technical SEO is the part of indie-dev that is genuinely boring but compounds heavily — one missing canonical or a broken hreflang and you bleed traffic forever. An agent like Codex can scan your build output and source in minutes and flag the systematic mistakes (inconsistent canonicals, missing alt text, wrong sitemap entries) that a human reviewer would skip. What it cannot do is judge intent — whether your H1 actually matches the search intent — so the workflow is: agent finds mechanical issues, you fix the judgment calls.
How to tell
- Your site has more than 20 pages and you have not done a formal SEO audit.
- You see indexed pages drop in Search Console without an obvious cause.
- You launched recently and want to catch issues before they compound.
- You migrated stacks or restructured routes and want to verify nothing broke.
Quick verdict
Build locally, point the agent at dist/, ask narrow questions per SEO concern, verify a sample with grep or curl, then treat the rest as a worklist.
Before you start
- A local dist/ build that represents production output.
- Codex / Claude Code / similar with file-read access to the repo.
- grep and xmllint (or jq for JSON-LD) installed for verification.
Step by step
- Build locally and aim the agent at dist/, not source:
npm run build
ls dist/ # confirm pages exist
du -sh dist/ # sanity check size
- Canonical tag audit. Prompt:
[CONTEXT] Astro static site; build output is in dist/. Each page is index.html under a slug directory.
[TASK] Walk dist/ recursively. For every *.html, find <link rel="canonical">. Report:
- any file MISSING the tag
- any file with a canonical href that does NOT match its own URL path
- any file with MULTIPLE canonical tags
Output as CSV: file,issue,detail
[CONSTRAINTS] Do not modify any files. Read only.
Verify the report with grep:
# count pages without canonical (recursive grep avoids breaking on paths with spaces)
grep -rL --include='*.html' 'rel="canonical"' dist | wc -l
# files with no canonical pointing at yourdomain.com (-I skips binaries, -L lists non-matching files)
grep -rIL --include='*.html' '<link rel="canonical" href="https://yourdomain.com' dist | head
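The grep checks above catch missing tags, but not the second item in the prompt: a canonical href that does not match the page's own URL path. The following is a minimal sketch of that comparison, not a definitive implementation. It assumes one index.html per slug directory and the attribute order used in the grep pattern above; `check_canonicals` is a hypothetical helper name, and the site origin is passed in as a placeholder argument.

```shell
# Sketch: compare each page's canonical href with the URL implied by its path.
# Assumes: pages are <dir>/index.html, and the tag is written as
# <link rel="canonical" href="..."> (rel before href, double quotes).
# check_canonicals is a hypothetical helper name, not part of any tool.
check_canonicals() {
  root=$1   # build directory without trailing slash, e.g. dist
  base=$2   # site origin without trailing slash, e.g. https://yourdomain.com
  echo "file,issue,detail"
  find "$root" -name 'index.html' | sort | while IFS= read -r f; do
    # URL the page should declare: strip the root prefix and "index.html"
    rel=${f#"$root"}
    expected="$base${rel%index.html}"
    # All canonical hrefs found in the file, one per line
    hrefs=$(grep -o '<link rel="canonical" href="[^"]*"' "$f" \
      | sed 's/.*href="\([^"]*\)"/\1/')
    count=$(printf '%s' "$hrefs" | grep -c . || true)
    if [ "$count" -eq 0 ]; then
      echo "$f,missing,"
    elif [ "$count" -gt 1 ]; then
      echo "$f,multiple,$count tags"
    elif [ "$hrefs" != "$expected" ]; then
      echo "$f,mismatch,$hrefs (expected $expected)"
    fi
  done
}
```

Run it as `check_canonicals dist https://yourdomain.com` and compare the rows against the agent's CSV. If your templates emit the attributes in a different order or with single quotes, adjust the pattern first.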
- hreflang audit. Prompt:
[TASK] For every dist/**/index.html that has hreflang tags, verify:
- the page references its own URL via hreflang="<its-lang>"
- the page references its translation via hreflang="<other-lang>"
- the other-lang URL actually exists in dist/
Report mismatches as CSV: file,expected_other,actual_other_or_missing
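To verify the hreflang report, one option is to check the third item yourself: does every hreflang target exist in the build? This is a rough sketch under stated assumptions: tags are written with hreflang before href and double quotes, URLs are directory-style, and every target lives on the same origin. `check_hreflang_targets` is a hypothetical helper name.

```shell
# Sketch: confirm every hreflang target resolves to a file in the build.
# Assumes tags look like <link rel="alternate" hreflang="xx" href="...">
# (hreflang before href, double quotes) and directory-style URLs.
# check_hreflang_targets is a hypothetical helper name.
check_hreflang_targets() {
  root=$1   # build directory without trailing slash, e.g. dist
  base=$2   # site origin without trailing slash
  echo "file,lang,href,problem"
  find "$root" -name 'index.html' | sort | while IFS= read -r f; do
    grep -o 'hreflang="[^"]*" href="[^"]*"' "$f" \
      | sed 's/hreflang="\([^"]*\)" href="\([^"]*\)"/\1 \2/' \
      | while read -r lang href; do
          # Map the URL back to a path under the build directory
          path=${href#"$base"}
          target="$root${path%/}/index.html"
          [ -f "$target" ] || echo "$f,$lang,$href,target not in build"
        done
  done
}
```

Targets on a different origin will also show up as "target not in build", so treat those rows as a prompt to look, not as confirmed bugs.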
- Title / meta description audit. Prompt:
[TASK] For every dist/**/index.html, extract <title> and <meta name="description">.
Report pages where:
- title is missing or empty
- description is missing or empty
- title length < 25 or > 65 characters
- description length < 80 or > 170 characters
- duplicate title or description appears on more than one page
Output: file,issue,title,desc_len
Sanity check:
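# duplicate titles (-o prints only the <title> element, so minified one-line pages still work)
grep -rho --include='*.html' '<title>[^<]*</title>' dist | sort | uniq -c | awk '$1 > 1' | head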
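The duplicate check does not cover the length thresholds from the prompt. A small sketch for the title side, using the same 25 and 65 character limits, might look like the following. It assumes each title sits on a single line, and `check_title_lengths` is a hypothetical helper name. Lengths are counted by the shell, so HTML entities count as several characters and non-ASCII text may be counted in bytes depending on your shell and locale.

```shell
# Sketch: flag titles outside the 25-65 character range used in the prompt.
# Assumes each <title>...</title> sits on a single line.
# check_title_lengths is a hypothetical helper name.
check_title_lengths() {
  root=$1   # build directory, e.g. dist
  echo "file,issue,title_len"
  find "$root" -name '*.html' | sort | while IFS= read -r f; do
    # First <title> element in the file, with the tags stripped
    title=$(grep -o '<title>[^<]*</title>' "$f" | head -n 1 \
      | sed 's/<title>\(.*\)<\/title>/\1/')
    len=${#title}
    if [ "$len" -eq 0 ]; then
      echo "$f,title missing or empty,0"
    elif [ "$len" -lt 25 ]; then
      echo "$f,title too short,$len"
    elif [ "$len" -gt 65 ]; then
      echo "$f,title too long,$len"
    fi
  done
}
```

Call it as `check_title_lengths dist` and compare a few rows with the agent's output. The same pattern extends to meta descriptions by swapping the grep expression and thresholds.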
- Sitemap diff. Prompt:
[TASK] Parse dist/sitemap-index.xml (and any referenced sitemap files).
Compare URLs in the sitemap against the set of *.html files actually present in dist/.
Report:
- URLs in sitemap with no corresponding file
- HTML files NOT in any sitemap (potential indexing leak)
You can cross-check with:
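# page URLs listed in the sitemaps (drop entries that point at other sitemap files)
xmllint --xpath '//*[local-name()="loc"]/text()' dist/sitemap*.xml \
| grep -v '\.xml$' | sort > /tmp/sitemap-urls.txt
# URLs implied by the files in dist/
find dist -name 'index.html' | sed 's|^dist|https://yourdomain.com|; s|/index\.html$|/|' \
| sort > /tmp/file-urls.txt
diff /tmp/sitemap-urls.txt /tmp/file-urls.txt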
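The diff output mixes both directions together. If you want the two lists the prompt asks for as separate blocks, `comm` can split them. This is a sketch that takes the two URL files produced above as arguments; `sitemap_diff` is a hypothetical helper name.

```shell
# Sketch: split the sitemap/file comparison into the two lists the prompt asks for.
# Inputs are two files with one URL per line.
# sitemap_diff is a hypothetical helper name.
sitemap_diff() {
  sitemap_urls=$1   # e.g. /tmp/sitemap-urls.txt
  file_urls=$2      # e.g. /tmp/file-urls.txt
  a=$(mktemp)
  b=$(mktemp)
  # comm needs both inputs sorted the same way; force byte order
  LC_ALL=C sort -u "$sitemap_urls" > "$a"
  LC_ALL=C sort -u "$file_urls" > "$b"
  echo "# in sitemap, no file in build:"
  LC_ALL=C comm -23 "$a" "$b"
  echo "# file in build, not in any sitemap:"
  LC_ALL=C comm -13 "$a" "$b"
  rm -f "$a" "$b"
}
```

Usage: `sitemap_diff /tmp/sitemap-urls.txt /tmp/file-urls.txt`. Pages you exclude from the sitemap on purpose (a 404 page, for example) will appear in the second list, so expect a few legitimate entries there.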
- Structured-data validator. Prompt:
[TASK] Find every <script type="application/ld+json"> block in dist/**/*.html.
For each, parse the JSON. Validate against schema.org Article and BreadcrumbList minimal required fields:
Article: @context, @type, headline, datePublished, author
BreadcrumbList: @context, @type, itemListElement (array)
Report any parse errors or missing required fields.
Output: file, ld_type, problem
Verify a sample:
# extract the first JSON-LD block from a page and pretty-print it
# (assumes the <script> and </script> tags sit on their own lines)
sed -n '/<script type="application\/ld+json">/,/<\/script>/{p;/<\/script>/q;}' \
dist/en/articles/some-slug/index.html \
| sed '1d;$d' | jq .
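Pretty-printing shows that the JSON parses, but not whether the required fields are present. As a sketch, jq's array subtraction can list which of the Article fields named in the prompt are missing. It assumes the input on stdin is a single JSON object (not an array or an @graph wrapper), and `missing_article_fields` is a hypothetical helper name.

```shell
# Sketch: list required Article fields missing from one JSON-LD object on stdin.
# Assumes the input is a single JSON object; requires jq.
# missing_article_fields is a hypothetical helper name.
missing_article_fields() {
  # Subtract the object's keys from the required list; print what is left
  jq -r '["@context", "@type", "headline", "datePublished", "author"] - keys | .[]'
}
```

Pipe the output of the extraction command above into `missing_article_fields` in place of `jq .`. An empty result means all five fields are present. It says nothing about whether their values are sensible.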
- Spot-check 5 findings. If 5/5 are real bugs, trust the rest as a worklist; if only 2/5 are, narrow the prompt and rerun.
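Picking the five rows at random keeps you from only checking the findings that look plausible. A small sketch, assuming the agent's report was saved as a CSV with a header row; `sample_findings` is a hypothetical helper name.

```shell
# Sketch: print the header plus a random sample of rows from an audit CSV.
# sample_findings is a hypothetical helper name.
sample_findings() {
  csv=$1        # CSV file with a header row
  n=${2:-5}     # sample size, defaults to 5
  head -n 1 "$csv"
  # Prefix each data row with a random key, sort by it, keep the first n rows
  tail -n +2 "$csv" | awk 'BEGIN { srand() } { print rand() "\t" $0 }' \
    | sort -n | head -n "$n" | cut -f 2-
}
```

Usage: `sample_findings canonical-report.csv 5`, where the file name is a placeholder for wherever you saved the report.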
- Open one tracked issue per category, not per file, to keep the cleanup focused:
Issue: 23 articles missing canonical tag in dist/
- See attached CSV
- Fix in ArticleLayout.astro and rebuild
- Re-run audit prompt 2 to verify zero remaining
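To get the per-category counts for issue titles like the one above, you can tally the second column of the audit CSV. This sketch assumes the file,issue,detail layout from the canonical prompt, a header row, and no commas inside the first two columns; `summarize_findings` is a hypothetical helper name.

```shell
# Sketch: count findings per issue type in an audit CSV (file,issue,detail).
# Assumes a header row and no commas inside the first two columns.
# summarize_findings is a hypothetical helper name.
summarize_findings() {
  awk -F, 'NR > 1 && $2 != "" { count[$2]++ }
           END { for (k in count) printf "%d %s\n", count[k], k }' "$1" \
    | sort -rn
}
```

Each output line is a count followed by the issue label, largest first, which maps directly onto one tracked issue per line.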
Implementation checklist
- Build output (dist/) is what gets reviewed, not source.
- Each prompt asks about one specific SEO concern, not “everything”.
- Findings cross-checked with grep, xmllint, or jq before treating as truth.
- Findings are tracked as issues, not fixed inline.
- The same prompts are saved for re-running after each major change.
After-launch verification
- Re-running the same prompts after fixes returns an empty (or much shorter) list.
- Search Console URL Inspection on sample pages shows the canonical and hreflang values you expect.
- Lighthouse SEO score = 100 on at least 3 sample articles.
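Before waiting on Search Console, you can check what the deployed pages actually serve. The following sketch reads HTML on stdin and prints the canonical and hreflang values it finds. It assumes double-quoted attributes in the same order as the earlier grep patterns, and `extract_seo_links` is a hypothetical helper name.

```shell
# Sketch: print the canonical href and hreflang pairs found in HTML on stdin.
# Assumes double-quoted attributes, with rel/hreflang written before href.
# extract_seo_links is a hypothetical helper name.
extract_seo_links() {
  html=$(cat)
  printf '%s\n' "$html" | grep -o '<link rel="canonical" href="[^"]*"' \
    | sed 's/.*href="\([^"]*\)"/canonical \1/'
  printf '%s\n' "$html" | grep -o 'hreflang="[^"]*" href="[^"]*"' \
    | sed 's/hreflang="\([^"]*\)" href="\([^"]*\)"/hreflang \1 \2/'
}
```

Usage against a live page: `curl -sL https://yourdomain.com/en/articles/some-slug/ | extract_seo_links`. The output should match what the same page in dist/ declares. If it differs, something between build and deploy is rewriting the head.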
Common pitfalls
- Asking “is my SEO good?” — generic answer that misses your actual bugs. Always ask about a specific tag, file, or route.
- Trusting the agent on search intent or keyword strategy. It does not see your Search Console data and will make plausible-but-wrong suggestions.
- Running it on source code instead of built HTML. Many SEO issues only appear post-build (e.g., empty meta tags from undefined frontmatter).
- Letting it auto-fix issues without a diff. Have it propose patches; review and apply manually.
- Skipping the grep/xmllint verification step. Agents do hallucinate, especially on file counts.
FAQ
- Codex or Claude Code — does it matter?: Either works. The key is that the agent can read files in your repo. For pure HTML inspection, even ChatGPT with file uploads can do a one-shot review.
- Can it replace tools like Screaming Frog?: Not really. Crawlers find issues across links and redirects systematically; an agent finds template-level bugs. Use both.
- What about Core Web Vitals?: Agents are bad at performance audits — use PageSpeed Insights and real Lighthouse runs. Code review can spot obvious issues (huge unoptimized images, blocking scripts) but cannot replace runtime measurement.
- How often should I rerun this?: After any structural change (new layout, new route pattern, content schema update), and otherwise quarterly.
- Can the agent generate the fix as well?: Yes, but ask for a diff, not direct file edits, so you can review.