Article Count Looks Big But Real Coverage Is Weak

"We have 800 articles!" But about 65% are bilingual duplicates, drafts, or thin redirects. Count what matters: unique, indexable, substantive URLs.

You publish “800 articles!” on your homepage. Investors love it. AdSense reviewers count indexable, substantive URLs and see 280. The gap isn’t lying; it’s metric confusion. Bilingual pages double the count without doubling content. Drafts, redirects, archive pages, and tag pages inflate the number further. The 800 figure isn’t wrong by your definition; it’s just not the number anyone evaluating your site uses.

Article count is one of the easiest metrics to inflate accidentally, and one of the worst to optimize for. Below: how to compute the honest count, decide what to do about the gap, and pick metrics that actually correlate with site value.

Common causes

Ordered by hit rate, highest first.

1. Counting both languages as separate articles

Your articles/en/post.mdx and articles/zh/post.mdx are one article in two languages, not two articles. Counting them as two doubles your number without doubling content.

How to spot it: Compare the total file count with the number of unique translationKey values. If the file count is close to twice the key count, you’re double-counting languages.

2. Including drafts, redirects, or noindexed pages

draft: true files, 301 redirects you haven’t cleaned up, noindex pages, archive listing pages — none of these are “articles” by SEO or AdSense definition.

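How to spot it: find src/content/articles -name "*.mdx" -exec grep -l "draft: true" {} + | wc -l counts the drafts (find avoids relying on ** expansion, which bash only does recursively with globstar enabled). Run the same check for noindex: true and count the entries in your redirect config. Subtract all of these from your headline number.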

3. Auto-generated category / tag pages inflating the number

If your stack auto-generates a page for every tag, you might claim 800 articles when 600 are tag/category pages. AdSense reviewers treat those as navigation, not content.

How to spot it: List pages by template/type. Article-template pages vs category/tag-template pages should be tracked separately.
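If your site publishes a sitemap, one rough way to separate templates is to group URLs by their first path segment. This is a minimal sketch, assuming one absolute URL per line on stdin and that sections such as /tag/ or /articles/ sit directly under the domain. Both are assumptions about your stack, and count_by_section is a hypothetical helper name.

```shell
# count_by_section: read absolute URLs on stdin (one per line) and print
# "<count> <first path segment>", largest group first.
# Assumes URLs look like https://host/<section>/...; adjust the field
# number if your site puts a language prefix first.
count_by_section() {
  awk -F/ '
    NF >= 4 {
      section = ($4 == "") ? "(home)" : $4
      count[section]++
    }
    END { for (s in count) print count[s], s }
  ' | sort -rn
}
```

Usage: count_by_section < urls.txt, where urls.txt is whatever URL list you can export. Sections that turn out to be tag or category templates belong in a separate tally from articles.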

4. Stub / one-paragraph “articles” counted as full content

You wrote 50 SEO-targeting stubs (“What is X?” with 80 words) to capture long-tail queries. They count as articles in your file system; AdSense calls them “low value.”

How to spot it: Word count distribution. If a meaningful fraction of articles is <300 words, those stubs inflate the count without contributing substance.
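A quick way to see the distribution is to bucket files by word count. The sketch below assumes .mdx files and uses the 300 and 500 word thresholds mentioned in this article. word_buckets is a hypothetical helper name, and the counts include frontmatter, so treat the edges as approximate.

```shell
# word_buckets DIR: print how many .mdx files under DIR fall into each
# word-count bucket. wc -w counts frontmatter too, so results are approximate.
word_buckets() {
  find "$1" -name '*.mdx' -type f -exec wc -w {} + |
    awk '
      $2 == "total" { next }        # skip the summary line wc adds for multiple files
      $1 < 300      { stub++; next }
      $1 < 500      { thin++; next }
                    { full++ }
      END {
        printf "under 300 words: %d\n", stub
        printf "300-499 words:   %d\n", thin
        printf "500+ words:      %d\n", full
      }'
}
```

Usage: word_buckets src/content/articles/en. If the first bucket holds a large share of the total, the headline count is mostly stubs.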

5. Old re-prints / syndicated content from another site

You imported 100 articles from a defunct partner blog years ago. They’re still there, technically articles, but unoriginal and aged out. They inflate count without contributing trust.

How to spot it: Look for articles with no edits in 3+ years or bylines attributed to former staff or partners. Some are good pieces that have simply aged; others were imported and forgotten, so review each before deciding.
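If the content lives in a git repository, the last commit date per file is a reasonable proxy for "last edited". This is a sketch under that assumption. stale_articles is a hypothetical helper name, and bulk reformatting commits or shallow clones will skew the dates.

```shell
# stale_articles DIR [YEARS]: list .mdx files under DIR whose most recent
# commit is older than YEARS (default 3). Run from inside the repository.
stale_articles() {
  dir=$1
  years=${2:-3}
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || {
    echo "stale_articles: not inside a git repository" >&2
    return 1
  }
  cutoff=$(( $(date +%s) - years * 365 * 24 * 3600 ))
  find "$dir" -name '*.mdx' -type f | while IFS= read -r f; do
    # %ct is the committer date as a Unix timestamp.
    last=$(git log -1 --format=%ct -- "$f")
    # Files that were never committed have no timestamp; skip them.
    [ -n "$last" ] || continue
    if [ "$last" -lt "$cutoff" ]; then
      echo "$f"
    fi
  done
}
```

Usage: stale_articles src/content/articles 3. The output is a review list, not a delete list.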

6. Translations done by machine without review

You ran 400 English articles through Google Translate to get 400 “Chinese articles.” Total: 800. Google’s spam policies cover automated translation published at scale without human review, and pages like these tend to be treated as low value by both Search and AdSense.

How to spot it: Read 5 random ZH articles. If they have telltale machine-translation phrasing (literal idioms, awkward word order), they aren’t really articles for the ZH audience.
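To keep the sample honest, let the shell pick the files instead of choosing them yourself. This is a minimal sketch using only find and awk. sample_articles is a hypothetical helper name, and the zh directory in the usage line is an assumed layout.

```shell
# sample_articles DIR N: print N randomly chosen .mdx paths under DIR.
# awk's srand() seeds from the clock, so two runs within the same second
# return the same sample.
sample_articles() {
  find "$1" -name '*.mdx' -type f |
    awk 'BEGIN { srand() } { printf "%.10f\t%s\n", rand(), $0 }' |
    sort -n |
    head -n "$2" |
    cut -f2-
}
```

Usage: sample_articles src/content/articles/zh 5, then read those five with a native speaker if you can.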

Shortest path to fix

Ordered by ROI. Step 1 reveals the gap; the rest decide what to do about it.

Step 1: Compute the honest count

Run an audit script:

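# Run from the repo root. find is used instead of ** because bash only
# expands ** recursively when "shopt -s globstar" is set.
ARTICLES=src/content/articles

# Total MDX files
find "$ARTICLES" -name "*.mdx" | wc -l

# Unique translationKeys
find "$ARTICLES" -name "*.mdx" -exec grep -h "^translationKey:" {} + | sort -u | wc -l

# Non-draft, non-noindex
find "$ARTICLES" -name "*.mdx" | while IFS= read -r f; do
  grep -q "draft: false" "$f" && ! grep -q "noindex: true" "$f" && echo "$f"
done | wc -l

# Substantive (>= 500 words; the count includes frontmatter)
find "$ARTICLES/en" -name "*.mdx" -exec wc -w {} + \
  | awk '$2 != "total" && $1 >= 500' | wc -l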

Result: total file count, unique articles, publishable count, substantive count. Pick which one is “real” for your context (usually substantive count).

Step 2: Update public claims to match the honest number

Homepage / about / pitch deck:

Bad: "800 articles"
Good: "280 in-depth guides across 12 topics"
Best: "280 guides, updated within the last 12 months"

Honest numbers don’t shrink credibility. They build it once readers compare you with competitors whose inflated counts turn out to hide less content.

Step 3: Decide per gap source: upgrade or prune

For each inflation source:

| Source | Action |
|---|---|
| Stubs (<300 words) | Upgrade or merge into parent topic |
| Auto-translated ZH | Human review or noindex / delete |
| Tag pages | Keep but exclude from article count |
| Imported old content | Audit each; prune the rotted |
| Stale drafts | Publish or delete; no "someday maybe" |

The goal isn’t always more articles — sometimes fewer, better articles raise quality.

Step 4: Stop using article count as a KPI

Replace with metrics that correlate with value:

| Metric | What it measures |
|---|---|
| Indexed URLs in GSC | Pages Google chose to index |
| Avg position improvement | Ranking trend over time |
| Top-10 rankings count | Queries you actually win |
| Click-through rate on indexed pages | Whether your search listings draw clicks |
| Avg time on page | Content depth signal |

Count is vanity; rankings + clicks are reality.
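The top-10 count can be pulled from an exported query report. The sketch below assumes a CSV with a header row whose last column is average position, as in a Search Console performance export. Check your own file's header first. top10_count is a hypothetical helper name.

```shell
# top10_count FILE: count rows whose last column (average position) is
# between 1 and 10. Reading the last field keeps the count correct even
# when a quoted query contains commas.
top10_count() {
  awk -F, '
    NR == 1 { next }                 # skip the header row
    { sub(/\r$/, "") }               # tolerate Windows line endings
    $NF + 0 > 0 && $NF + 0 <= 10 { n++ }
    END { print n + 0 }
  ' "$1"
}
```

Usage: top10_count Queries.csv. Track the number month over month; the trend matters more than any single value.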

Step 5: Define metrics per language

Track separately:

- en/ unique articles: 280
- zh/ unique articles: 230
- en/ articles with no zh counterpart: 50 (translation debt)
- zh/ articles with no en counterpart: 0

Per-language tracking surfaces translation debt and lets each language have its own quality bar.
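Translation debt is the set of translationKey values present in one language and missing from the other. This is a minimal sketch, assuming each language has its own subdirectory (en/, zh/) and each file has a top-level translationKey line in its frontmatter. keys_for_lang and translation_debt are hypothetical helper names.

```shell
# keys_for_lang ROOT LANG: print the sorted, unique translationKey values
# found in .mdx files under ROOT/LANG (surrounding quotes are stripped).
keys_for_lang() {
  find "$1/$2" -name '*.mdx' -type f -exec grep -h '^translationKey:' {} + |
    sed -e 's/^translationKey:[[:space:]]*//' -e 's/[[:space:]]*$//' |
    tr -d "\"'" |
    sort -u
}

# translation_debt ROOT FROM TO: print keys that exist in FROM but not in TO.
translation_debt() {
  tmp_from=$(mktemp)
  tmp_to=$(mktemp)
  keys_for_lang "$1" "$2" > "$tmp_from"
  keys_for_lang "$1" "$3" > "$tmp_to"
  # comm -23 keeps lines unique to the first (sorted) file.
  comm -23 "$tmp_from" "$tmp_to"
  rm -f "$tmp_from" "$tmp_to"
}
```

Usage: translation_debt src/content/articles en zh | wc -l gives the "en with no zh counterpart" figure; swap the last two arguments for the reverse direction.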

Step 6: Audit metric definitions quarterly

Definitions drift. “Article” can mean different things to ops, marketing, and SEO. Pin the definition:

# metric-definitions.md

## "Article" =
- Lives in `src/content/articles/`
- `draft: false`
- Not `noindex`
- ≥500 words (en) / ≥400 chars (zh)
- Has a `translationKey`

Pin this once; everyone’s “article count” is now the same number.
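A written definition holds up better when a script enforces it. The sketch below turns the checklist above into a single check. is_article is a hypothetical helper name, the length checks include frontmatter, and the zh character count assumes a UTF-8 locale, so treat both thresholds as approximate.

```shell
# is_article FILE LANG: exit 0 when FILE meets the pinned definition.
is_article() {
  file=$1
  lang=$2
  # Lives in src/content/articles/
  case "$file" in
    src/content/articles/*|*/src/content/articles/*) ;;
    *) return 1 ;;
  esac
  # draft: false, not noindex, has a translationKey
  grep -q '^draft: false' "$file" || return 1
  if grep -q 'noindex: true' "$file"; then return 1; fi
  grep -q '^translationKey:' "$file" || return 1
  # Length floor: characters for zh, words for everything else.
  if [ "$lang" = "zh" ]; then
    size=$(( $(wc -m < "$file") ))
    [ "$size" -ge 400 ]
  else
    size=$(( $(wc -w < "$file") ))
    [ "$size" -ge 500 ]
  fi
}
```

Usage: find src/content/articles/en -name '*.mdx' | while IFS= read -r f; do is_article "$f" en && echo "$f"; done | wc -l. Whatever number that prints is the one to report.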

Prevention

  • Define “article” once in metric-definitions.md; everyone reports against the same definition
  • Track total URLs, indexable URLs, per-language originals separately — don’t conflate
  • Audit metric definitions quarterly; drift is the default
  • Replace article count as a KPI with indexed-URL count + ranking distribution
  • Don’t double-count bilingual pages; one translationKey = one article
  • Prune or upgrade stubs and machine-translated content; counting them inflates without informing

Tags: #Content ops #Site quality #Site audit #Troubleshooting