Use Codex to Review Your Sitemap

Use Codex to spot-check sitemap correctness, freshness, and coverage.

What this covers

After any meaningful site change — a URL restructure, a language addition, a content migration — your sitemap quietly drifts. Draft pages sneak in, lastmod dates stop reflecting reality, trailing-slash conventions split between sections, and Search Console starts complaining a week later. This walkthrough uses Codex as a focused auditor against the sitemap source and build output, with a prompt template that catches the common failures and reports them as a checklist you can act on.

Who this is for

Owners of any generated sitemap.xml: Astro sites, Next.js sites, Hugo, WordPress with a plugin, custom generators. Especially useful for bilingual or multi-language sites where hreflang and language-specific sitemaps multiply the failure modes.

When to reach for it

After major site changes: URL restructure, language addition, content migration, big content prune, or a category rename. Also useful as a quarterly hygiene pass even when nothing big has changed — drift accumulates silently.

Before you start

  • Confirm you have a build step that produces the sitemap in dist/ (or your build output directory). Codex needs to read the output, not just the source.
  • Identify the source files that generate the sitemap. For Astro, usually src/pages/sitemap-*.ts or an astro-sitemap config. For Next.js, app/sitemap.ts or next-sitemap.config.js.
  • Decide your trailing-slash convention up front. Mixed conventions are the single biggest source of indexing chaos.
  • Know your draft/noindex policy. Drafts must never appear in the sitemap.
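
To make the draft policy concrete before the audit, here is a minimal sketch that flags draft pages which still appear in a sitemap. It assumes Markdown sources with a leading `---` frontmatter block and a literal `draft: true` line. The names `is_draft` and `leaked_drafts`, the sample pages, and the hand-built URL-to-source mapping are all illustrative; how a URL maps to a source file is project-specific.

```python
import re

# Leading frontmatter block delimited by --- lines (assumed convention).
FRONTMATTER = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)
DRAFT_TRUE = re.compile(r"^draft:\s*true\s*$", re.MULTILINE | re.IGNORECASE)


def is_draft(markdown_text):
    """True when the leading frontmatter block contains `draft: true`."""
    match = FRONTMATTER.match(markdown_text)
    return bool(match and DRAFT_TRUE.search(match.group(1)))


def leaked_drafts(pages, sitemap_urls):
    """pages maps a URL to its Markdown source; return draft URLs found in the sitemap."""
    listed = set(sitemap_urls)
    return sorted(url for url, text in pages.items() if is_draft(text) and url in listed)


# Hypothetical sample data; a real run would read files from the content directory.
pages = {
    "https://example.com/blog/published/": "---\ntitle: Published\ndraft: false\n---\nBody\n",
    "https://example.com/blog/wip/": "---\ntitle: WIP\ndraft: true\n---\nBody\n",
    "https://example.com/blog/notes/": "---\ntitle: Notes\ndraft: true\n---\nBody\n",
}
sitemap_urls = [
    "https://example.com/blog/published/",
    "https://example.com/blog/wip/",
]

leaked = leaked_drafts(pages, sitemap_urls)
print(leaked)
```

Anything this prints is a draft that reached the sitemap. The fix belongs in the generator's filter, not in the build output.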

Step by step

  1. Open Codex in your repo. Run a fresh build so dist/sitemap-index.xml reflects the current source.
  2. Send a focused review prompt that names exact files and rules:
Read src/pages/sitemap-*.ts and dist/sitemap-index.xml (after build).

Verify and report findings:
- Every category page is included for every language.
- Every article URL ends with a trailing slash (project convention).
- No draft URLs appear (exclude any page whose frontmatter sets `draft: true`).
- lastmod dates are within the last 90 days for active pages.
- hreflang pairs exist between en/zh translations where translationKey matches.
- No 404 URLs (sample 10 random URLs and verify the source file exists).

For each finding, cite the file and line in source, not in dist/. Suggest the source change, not a dist edit.
  3. Codex returns a list of anomalies. Read every finding; some will be false positives (especially around hreflang heuristics).
  4. Fix in source (the sitemap generation logic, frontmatter, or content audit script), never in dist/. Edits to dist get overwritten on the next build.
  5. Re-run build, then re-run Codex with the same prompt. The findings should shrink to zero or a documented set of intentional exceptions.
  6. Commit the source fixes with a clear message: “fix(sitemap): drop drafts, normalize trailing slashes.”
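
While triaging, it can help to cross-check one of the mechanical rules yourself. The sketch below is one way to test the 90-day lastmod rule from the prompt using only the Python standard library. It assumes a plain `<urlset>` file; if your dist/sitemap-index.xml is a sitemap index, run it against one of the child sitemaps instead. It also assumes every lastmod value starts with a full YYYY-MM-DD date. The function names and the sample XML are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_entries(xml_text):
    """Return (loc, lastmod or None) for each <url> in a urlset document."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


def stale_entries(entries, today, max_age_days=90):
    """List URLs whose lastmod is missing or older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for loc, lastmod in entries:
        # Only the leading YYYY-MM-DD part is compared (assumed to be present).
        if lastmod is None or date.fromisoformat(lastmod[:10]) < cutoff:
            stale.append(loc)
    return stale


# Invented sample; a real run would read a file from the build output.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-05-20</lastmod></url>
  <url><loc>https://example.com/blog/old-post/</loc><lastmod>2023-01-15T08:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

entries = read_entries(SAMPLE)
stale = stale_entries(entries, today=date(2024, 6, 1))
print(stale)
```

A URL flagged here is not automatically wrong: an archived page can legitimately carry an old lastmod. Treat the output as a list to compare against Codex's findings, not as a verdict.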

First-run exercise

  1. Pick the easiest category to verify: the homepage and one well-known section.
  2. Run the prompt scoped to just those URLs first. This confirms that Codex understands your source structure before the full sweep.
  3. Run the full prompt. Count the findings. A first-run audit on a year-old site typically produces 10-30 anomalies.
  4. Fix the top three. Re-run. Watch the count drop. The remaining items become a follow-up ticket.

Quality check

  • Did every finding cite a source file, not just a dist URL? If not, Codex did not fully understand the build pipeline.
  • Are the suggested fixes idempotent, that is, safe to apply to a file that is already correct? If a fix would break a correct file, refine it.
  • Spot-check 5 random URLs in the sitemap by opening them in a browser. 404s, redirects, or wrong-language pages are red flags.
  • Verify hreflang pairs in Search Console after the next crawl. Codex catches structural issues; Search Console confirms search-engine behavior.
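
The structural half of the hreflang check can be done locally before the next crawl. The sketch below assumes alternates are declared inside the sitemap as `xhtml:link` elements with `rel="alternate"`; if your site declares hreflang only in page `<head>` tags, this does not apply as written. It reports pages whose alternate does not link back. The function names and sample XML are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def read_alternates(xml_text):
    """Map each <loc> to its {hreflang: href} alternates."""
    root = ET.fromstring(xml_text)
    alternates = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        alternates[loc] = {
            link.get("hreflang"): link.get("href")
            for link in url.findall("xhtml:link", NS)
            if link.get("rel") == "alternate"
        }
    return alternates


def missing_return_links(alternates):
    """List (page, target) pairs where target does not link back to page."""
    problems = []
    for loc, langs in alternates.items():
        for href in langs.values():
            if href == loc:
                continue  # self-reference, nothing to verify
            if loc not in alternates.get(href, {}).values():
                problems.append((loc, href))
    return problems


# Invented sample: the zh page forgets to list its en counterpart.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/guide/</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/guide/"/>
    <xhtml:link rel="alternate" hreflang="zh" href="https://example.com/zh/guide/"/>
  </url>
  <url>
    <loc>https://example.com/zh/guide/</loc>
    <xhtml:link rel="alternate" hreflang="zh" href="https://example.com/zh/guide/"/>
  </url>
</urlset>"""

problems = missing_return_links(read_alternates(SAMPLE))
print(problems)
```

Each reported pair names the page that declares the alternate and the target that fails to reciprocate, which is usually enough to locate the missing translationKey in source.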

How to reuse this workflow

  • Save the prompt as a snippet, with named placeholders for project-specific paths. Run quarterly, after any URL change, and after any content cleanup.
  • Add a CI step that builds the sitemap and runs a simplified version of the checks (file count, draft filter, trailing slash). Codex catches the rest.
  • Maintain a sitemap-conventions.md documenting your trailing-slash rule, draft policy, and hreflang setup. Name it in the prompt so Codex audits against your documented rules.
  • Re-test after major Astro/Next.js/CMS upgrades — sitemap generation often shifts.
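
One simple way to keep the prompt as a snippet with named placeholders is Python's `string.Template`. This is a sketch, not a required format: the placeholder names (`source_glob`, `sitemap_path`, `max_age_days`, `languages`, `build_dir`) are invented here, and the prompt text is a shortened version of the one in the walkthrough.

```python
from string import Template

# Shortened audit prompt with $named placeholders for project-specific values.
PROMPT = Template("""\
Read $source_glob and $sitemap_path (after build).

Verify and report findings:
- Every article URL ends with a trailing slash (project convention).
- No draft URLs appear.
- lastmod dates are within the last $max_age_days days for active pages.
- hreflang pairs exist between $languages translations.

For each finding, cite the file and line in source, not in $build_dir/.
""")

# substitute() raises KeyError if any placeholder is left unfilled.
prompt = PROMPT.substitute(
    source_glob="src/pages/sitemap-*.ts",
    sitemap_path="dist/sitemap-index.xml",
    max_age_days=90,
    languages="en/zh",
    build_dir="dist",
)
print(prompt)
```

Using `substitute` rather than `safe_substitute` is deliberate: a forgotten path fails loudly instead of sending Codex a prompt with a literal placeholder in it.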

Build → Codex prompt with full checklist → triage findings → fix in source → rebuild → re-verify → commit → submit sitemap to Search Console.

Common mistakes

  • Editing dist/sitemap.xml directly. The next build overwrites your fix.
  • Not specifying trailing-slash convention. Codex will report mixed conventions but cannot tell which side is canonical.
  • Including draft or noindex pages in the sitemap. Search engines crawl them, then report errors when the URLs return 404 or carry a noindex tag.
  • Stale lastmod dates. If every URL says the same lastmod, crawlers stop trusting the field.
  • Missing hreflang pairs for translated content. Search Console flags these but Codex catches them earlier.
  • Forgetting to resubmit the sitemap to Search Console after a fix. Google recrawls on its own schedule, so the corrected sitemap may not be picked up promptly.
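
Two of these mistakes are easy to measure before deciding what to tell Codex. The sketch below counts trailing-slash usage, so you can see which convention is dominant and declare it, and detects a lastmod value shared by nearly every URL. The 0.9 threshold, the function names, and the sample data are assumptions chosen for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def slash_counts(urls):
    """Count page URLs whose path does or does not end with a slash."""
    counts = Counter()
    for url in urls:
        path = urlsplit(url).path
        if "." in path.rsplit("/", 1)[-1]:
            continue  # skip file-like paths such as /feed.xml
        counts["slash" if path.endswith("/") else "no_slash"] += 1
    return counts


def dominant_lastmod(lastmods, threshold=0.9):
    """Return (date, share) if one date covers at least `threshold` of URLs, else None."""
    dates = [value[:10] for value in lastmods if value]
    if not dates:
        return None
    value, count = Counter(dates).most_common(1)[0]
    share = count / len(dates)
    return (value, share) if share >= threshold else None


# Invented sample data.
urls = [
    "https://example.com/",
    "https://example.com/blog/first-post/",
    "https://example.com/blog/second-post",
    "https://example.com/feed.xml",
]
lastmods = ["2024-06-01T00:00:00Z"] * 19 + ["2024-03-10"]

counts = slash_counts(urls)
dominant = dominant_lastmod(lastmods)
print(dict(counts), dominant)
```

A non-empty result from `dominant_lastmod` usually means the generator stamps the build date on every URL rather than the content's real modification date.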

FAQ

  • What if my site has tens of thousands of URLs? Sample. Ask Codex to verify rules on a random 100 URLs and report patterns rather than every URL.
  • Can Codex actually fix the sitemap? Yes. Let it open a PR with source-level fixes, and review it like any other PR.
  • What about robots.txt and meta noindex? Adjacent concerns. Worth a separate audit prompt; do not mix into the sitemap one.
  • My sitemap is generated by a CMS plugin. Can Codex still help? Yes, it reviews the output and the config. Source fixes go in the plugin config, not the generated XML.
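
For the large-site case, the sample itself can be drawn outside Codex and pasted into the prompt. A minimal sketch, assuming you already have the URL list in memory; the function name, the seed value, and the generated URLs are illustrative.

```python
import random


def sample_urls(urls, k=100, seed=None):
    """Pick up to k distinct URLs; a fixed seed makes the sample repeatable."""
    unique = sorted(set(urls))  # sorted so the same seed yields the same sample
    rng = random.Random(seed)
    return rng.sample(unique, min(k, len(unique)))


# Stand-in for a large sitemap.
all_urls = [f"https://example.com/articles/{n}/" for n in range(25000)]
batch = sample_urls(all_urls, k=100, seed=2024)
print(len(batch), "URLs selected")
```

Fixing the seed lets you re-run the audit on the same 100 URLs after a fix. Change the seed on the next pass to widen coverage.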

Tags: #Tutorial #SEO #AI coding #Codex #Sitemap