Most “content audit” articles online describe an audit so heavy you do it once and never again. For an indie site, you need a lighter audit you can actually run every quarter — backed by scripts so the work is mostly automated. The goal isn’t perfection; it’s catching the regressions before they compound.
Background
A content audit is a join: your URL inventory plus Search Console data plus a few heuristics. If you write the join and heuristics as scripts, every subsequent audit takes hours instead of days. This article gives you the scripts.
How to tell
- Your last audit was more than 6 months ago (or never).
- Search Console “Submitted vs Indexed” ratio is below 90%.
- You have 100+ articles and you can no longer say what is on the site without looking.
- Internal link checker has not been run in months.
- You suspect you have duplicates but cannot point to them.
Quick verdict
Run a light audit every quarter, not a heavy one yearly. Frequent small audits surface issues while they’re still cheap to fix.
Before you start
- Search Console API access (OAuth) — without it the audit is mostly guessing.
- Articles stored as files with front matter (a content collection or another file-based content layer), so the inventory can be generated from disk.
- A spreadsheet or CSV format you will reuse — the audit becomes a baseline each time.
Step by step
- Generate the URL inventory. A 30-line Node script:
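// scripts/audit-step1-inventory.mjs
// Walks src/content/articles/<lang>/<category>/*.mdx and writes one CSV row per article.
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
import matter from 'gray-matter';

const ROOT = 'src/content/articles';
// Quote every field and double embedded quotes so titles containing " or , stay valid CSV.
const csvField = (v) => `"${String(v ?? '').replace(/"/g, '""')}"`;
// Unquoted YAML dates may arrive as Date objects; normalise them to YYYY-MM-DD.
const isoDate = (v) => (v instanceof Date ? v.toISOString().slice(0, 10) : v);

const rows = [];
for (const lang of ['en', 'zh']) {
  for (const cat of readdirSync(join(ROOT, lang))) {
    for (const f of readdirSync(join(ROOT, lang, cat))) {
      if (!f.endsWith('.mdx')) continue;
      const { data, content } = matter(readFileSync(join(ROOT, lang, cat, f), 'utf8'));
      rows.push({
        url: `https://yourdomain.com/${lang}/articles/${data.urlSlug}/`,
        lang, category: cat,
        slug: data.urlSlug,
        title: data.title,
        primaryKeyword: data.primaryKeyword || '',
        publishedAt: isoDate(data.publishedAt),
        // Whitespace-delimited count; it undercounts zh text, which has no spaces between words.
        words: content.split(/\s+/).filter(Boolean).length,
      });
    }
  }
}
writeFileSync('audit-inventory.csv',
  'url,lang,category,slug,title,primaryKeyword,publishedAt,words\n' +
  rows.map(r => Object.values(r).map(csvField).join(',')).join('\n') + '\n');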
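Splitting on whitespace works for English but not for Chinese, which has no spaces between words, so most zh articles would look thin in the words column. One way to handle mixed-script text is sketched below, in Python for brevity. The name count_words and the rule "one CJK ideograph counts as one word" are assumptions for illustration, not part of the script above.

```python
import re

# CJK Unified Ideographs, basic block only (an assumption that covers most Chinese text).
CJK = re.compile(r'[\u4e00-\u9fff]')
# Runs of Latin letters or digits, allowing inner apostrophes and hyphens.
LATIN = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")

def count_words(text: str) -> int:
    """Count each CJK ideograph as one word, plus each Latin-script word."""
    return len(CJK.findall(text)) + len(LATIN.findall(text))

print(count_words('hello world'))      # prints 2
print(count_words('内容审计 guide'))    # prints 5
```

The count is still approximate for MDX, because markup and component names are counted along with the prose. A thin-page threshold should be set per language rather than shared.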
- Pull 28-day Search Console data and join. Per URL: clicks, impressions, average position:
# $SITE must be URL-encoded, e.g. https%3A%2F%2Fyourdomain.com%2F
# startDate and endDate are both inclusive, so Apr 25 to May 22 is exactly 28 days
curl -s -X POST "https://www.googleapis.com/webmasters/v3/sites/$SITE/searchAnalytics/query" \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  --data '{
    "startDate":"2026-04-25","endDate":"2026-05-22",
    "dimensions":["page"],"rowLimit":25000
  }' \
  | jq -r '.rows[]? | [.keys[0],.clicks,.impressions,.position] | @csv' \
  > gsc-28d.csv
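Because the Search Console API treats both startDate and endDate as inclusive, a 28-day window starts 27 days before the end date, not 28. A small sketch of that arithmetic, so the dates do not have to be worked out by hand each quarter (gsc_window is a hypothetical helper name):

```python
from datetime import date, timedelta

def gsc_window(end: date, days: int = 28) -> tuple[str, str]:
    """Return (startDate, endDate) as ISO strings for an inclusive window of `days` days."""
    start = end - timedelta(days=days - 1)  # both endpoints count, hence days - 1
    return start.isoformat(), end.isoformat()

print(gsc_window(date(2026, 5, 22)))  # prints ('2026-04-25', '2026-05-22')
```

Keeping the window length fixed is what makes one quarter's numbers comparable with the next.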
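Join the two files with a few lines of Python:
python3 -c "
import csv
# gsc-28d.csv has no header row: page URL, clicks, impressions, position.
# The join key is the exact URL string, so trailing slashes must match the inventory.
gsc = {r[0]: r for r in csv.reader(open('gsc-28d.csv', newline=''))}
with open('audit-inventory.csv', newline='') as src, open('audit-joined.csv', 'w', newline='') as dst:
    rows = csv.reader(src)
    out = csv.writer(dst)
    out.writerow(next(rows) + ['clicks', 'impressions', 'position'])  # header row
    for row in rows:
        # URLs missing from Search Console had no impressions in the window
        out.writerow(row + gsc.get(row[0], ['', '0', '0', '0'])[1:4])
"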
- Flag dead pages. Live > 90 days with zero impressions:
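# cutoff = 90 days before the audit date; assumes publishedAt ($7) is an ISO YYYY-MM-DD string
# and that no title contains a comma (awk -F, does not understand CSV quoting)
awk -F, -v cutoff=2026-02-21 'NR>1 && $7<cutoff && $10==0 {print $4}' audit-joined.csv
# slugs needing decision (merge / refresh / noindex / delete)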
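The awk one-liner splits on every comma, so a title such as "Audits, quickly" shifts the columns and the row is misread. A more careful sketch using Python's csv module, which also applies the 90-day age check explicitly. The column positions follow the inventory header order, and the function name dead_pages and the sample rows are made up for illustration:

```python
import csv
import io
from datetime import date, timedelta

# Column positions in audit-joined.csv (0-based), following the inventory header order.
SLUG, PUBLISHED_AT, IMPRESSIONS = 3, 6, 9

def dead_pages(lines, today, min_age_days=90):
    """Return slugs live longer than min_age_days that had zero impressions."""
    cutoff = today - timedelta(days=min_age_days)
    rows = csv.reader(lines)
    next(rows, None)  # skip the header row
    dead = []
    for row in rows:
        # Assumes publishedAt starts with an ISO date (YYYY-MM-DD).
        published = date.fromisoformat(row[PUBLISHED_AT][:10])
        if published < cutoff and float(row[IMPRESSIONS] or 0) == 0:
            dead.append(row[SLUG])
    return dead

sample = io.StringIO(
    'url,lang,category,slug,title,primaryKeyword,publishedAt,words,clicks,impressions,position\n'
    'https://yourdomain.com/en/articles/a/,en,seo,a,"Audits, quickly",audit,2025-01-10,900,0,0,0\n'
    'https://yourdomain.com/en/articles/b/,en,seo,b,New post,launch,2026-05-01,700,0,0,0\n'
    'https://yourdomain.com/en/articles/c/,en,seo,c,Old hit,links,2024-03-02,1200,14,350,9.2\n'
)
print(dead_pages(sample, today=date(2026, 5, 22)))  # prints ['a']
```

Article b has no impressions either, but it is only three weeks old, so it is not flagged yet.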
- Flag near-rankers. Position 8-20 with impressions > 100:
awk -F, 'NR>1 && $11>=8 && $11<=20 && $10>100' audit-joined.csv \
| sort -t, -k10 -rn | head -30
# refresh list, sorted by impressions
- Flag duplicates. Group by primary keyword:
awk -F, 'NR>1 && $6!="" {print $6}' audit-joined.csv | sort | uniq -c \
  | awk '$1 > 1' | sort -rn
# any count > 1 = duplicate-intent group
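A count per keyword says a collision exists but not which articles collide, and "Content Audit" and "content  audit" would land in different groups. The sketch below lists the slugs in each group after normalizing case and spacing, and groups per language so that translations are not reported as duplicates. The function name duplicate_groups and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

LANG, SLUG, KEYWORD = 1, 3, 5  # 0-based columns in audit-joined.csv

def duplicate_groups(lines):
    """Map (lang, normalized keyword) to slugs, keeping only groups with 2+ slugs."""
    groups = defaultdict(list)
    rows = csv.reader(lines)
    next(rows, None)  # skip the header row
    for row in rows:
        keyword = ' '.join(row[KEYWORD].lower().split())  # ignore case and extra spaces
        if keyword:  # articles without a primary keyword cannot be compared
            groups[(row[LANG], keyword)].append(row[SLUG])
    return {key: slugs for key, slugs in groups.items() if len(slugs) > 1}

sample = io.StringIO(
    'url,lang,category,slug,title,primaryKeyword,publishedAt,words,clicks,impressions,position\n'
    'https://yourdomain.com/en/articles/a/,en,seo,a,Post A,Content Audit,2025-01-10,900,3,120,11\n'
    'https://yourdomain.com/en/articles/b/,en,seo,b,Post B,content  audit,2025-02-11,800,1,40,18\n'
    'https://yourdomain.com/zh/articles/c/,zh,seo,c,Post C,content audit,2025-03-12,700,0,0,0\n'
    'https://yourdomain.com/en/articles/d/,en,seo,d,Post D,,2025-04-13,600,0,0,0\n'
)
print(duplicate_groups(sample))  # prints {('en', 'content audit'): ['a', 'b']}
```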
- Run a broken-link checker over the built site. Use linkinator or your own walker:
npx linkinator https://yourdomain.com \
--recurse --concurrency 5 --skip 'http(s)?://[^/]+/$' \
--format CSV > linkinator-report.csv
awk -F, '$2 != "200"' linkinator-report.csv | head
- Flag thin pages. Word count < 400 with no special reason:
awk -F, 'NR>1 && $8<400 {print $4, $8}' audit-joined.csv
- Write decisions back to the CSV. Add a decision column with values like keep, refresh, merge:<target-slug>, noindex, delete. Commit the CSV; it is the baseline for next quarter.
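Hand-typed decisions drift: a typo in a value, or a merge target that is itself scheduled for deletion. A minimal validation sketch that could run before the CSV is committed. The allowed values come from the list above; the slug pattern (lowercase letters, digits, hyphens) and the name check_decisions are assumptions.

```python
import re

# Slug pattern is an assumption: lowercase letters, digits and hyphens.
DECISION = re.compile(r'^(keep|refresh|noindex|delete|merge:(?P<target>[a-z0-9-]+))$')

def check_decisions(decisions: dict[str, str]) -> list[str]:
    """decisions maps slug -> decision string; returns human-readable problems."""
    problems = []
    for slug, decision in decisions.items():
        match = DECISION.match(decision)
        if not match:
            problems.append(f'{slug}: unknown decision {decision!r}')
            continue
        target = match.group('target')
        if target is None:
            continue  # not a merge, nothing more to check
        if target == slug:
            problems.append(f'{slug}: merges into itself')
        elif target not in decisions:
            problems.append(f'{slug}: merge target {target!r} is not in the inventory')
        elif decisions[target] in ('delete', 'noindex') or decisions[target].startswith('merge:'):
            problems.append(f'{slug}: merge target {target!r} is itself being retired')
    return problems

for problem in check_decisions({
    'a': 'keep', 'b': 'merge:a', 'c': 'merge:zzz',
    'd': 'archive', 'e': 'merge:f', 'f': 'delete',
}):
    print(problem)
```

In this example, slugs c, d and e are reported. An empty result means every decision is a known value and every merge points at an article that will still exist.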
Implementation checklist
- All scripts live in scripts/ and are runnable with npm run audit.
- Inventory CSV is regenerated from the file system, not maintained by hand.
- Search Console pull uses 28-day window consistently.
- Decisions are recorded in the CSV before any actual content changes.
- A diff between this quarter’s CSV and last quarter’s is reviewable.
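A plain git diff of the CSV works, but it gets noisy when impressions and positions change on every row. As a rough sketch, a keyed comparison can report only new URLs, removed URLs and changed decisions. It assumes both files have a header with url and decision columns; diff_audits and load are hypothetical names.

```python
import csv
import io

def load(lines, key='url'):
    """Index the rows of an audit CSV by URL."""
    return {row[key]: row for row in csv.DictReader(lines)}

def diff_audits(previous, current, fields=('decision',)):
    """Compare two audit CSVs keyed by URL; returns (added, removed, changed)."""
    old, new = load(previous), load(current)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for url in old.keys() & new.keys():
        delta = {f: (old[url].get(f), new[url].get(f))
                 for f in fields if old[url].get(f) != new[url].get(f)}
        if delta:
            changed[url] = delta
    return added, removed, changed

last_quarter = io.StringIO('url,decision\nu1,keep\nu2,refresh\nu3,delete\n')
this_quarter = io.StringIO('url,decision\nu1,refresh\nu2,refresh\nu4,keep\n')
print(diff_audits(last_quarter, this_quarter))
# prints (['u4'], ['u3'], {'u1': {'decision': ('keep', 'refresh')}})
```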
After-launch verification
- After 4-8 weeks, the share of submitted pages that Search Console reports as indexed rises, because dead pages were either fixed or removed.
- Re-running the audit produces a shorter list of dead pages.
- Linkinator reports zero non-200 internal links on the latest build.
Common pitfalls
- Auditing without writing decisions down. By next quarter you’ll re-discover the same problems.
- Refusing to retire anything. The audit becomes a list of “things to fix someday” instead of decisions.
- Trying to fix everything in one sitting. Spread fixes over the next few weeks; the audit is the diagnosis, not the surgery.
- Skipping the audit because “things look fine”. Search Console always shows surprises.
- Using Google Sheets when a CSV in the repo would be diff-able and scriptable.
FAQ
- How long should this take?: 4-8 hours for 200-500 articles the first time. Half that on the next round once the tooling is in place.
- Can I do this with AI?: Yes for triage (flagging candidates), but keep a human in the loop for retire/keep decisions. AI is bad at judging how an article fits into an interconnected site.
- What if I have no Search Console data?: Connect Search Console first and wait 28 days. Auditing without impression data is mostly guessing.
- How aggressive should I be with retiring?: On a healthy site, retiring 5-10% of articles per audit is normal. Retiring 30%+ in one pass suggests deeper problems with the original content strategy.
- What about translated content?: Audit each language separately. Different markets ≠ duplicates.
Related
- Managing a content site after 1,000 articles
- When to refresh old articles
- Site QA with AI
- Avoid content duplication when scaling fast
- SEO review with Codex