Building a Markdown / MDX Content Site That Scales

Structure a Markdown / MDX content site to scale from 50 to 1000 articles. Includes Content Collections schema, component map, and link-check script.

Markdown is easy. Markdown at 500 articles is hard. The fix is to put the boring rules in place before article 50, not after — a strict frontmatter schema, a central component map, and a CI link checker. The configs below are the ones that actually keep a site healthy at scale.

Background

Almost every Astro content site eventually hits the same scaling problems: inconsistent frontmatter, slug collisions, components used differently in every article, image paths that drift. These are not bugs in Markdown — they are bugs in the workflow around it. A few light conventions, enforced by code, prevent most of them.

Markdown vs MDX: what each gives you

The first scaling decision is which format authors actually write in. The two look similar; what they enable is very different.

Markdown

Pure text. Headings, lists, links, code blocks, images. That’s the whole surface area.

  • Pros: portable everywhere — Ghost, WordPress, Substack, Notion all consume it. Easy to grep, diff, and merge in git. Readable as plain text without rendering. LLMs rarely break it.
  • Cons: no components, no embedded interactivity, no logic. Anything beyond typography lives outside the file.

MDX

Markdown plus JSX, imports, and components. You can embed React (or your framework’s) components directly inside posts.

  • Pros: call-outs, charts, interactive embeds, custom shortcodes, design-system reuse inside content. Lets the content layer share visual language with the rest of the site.
  • Cons: not portable to a plain CMS; only MDX-aware build tools consume it. Harder to grep and lint. Requires a build pipeline. LLMs sometimes break MDX by escaping braces incorrectly, which fails the build (this site has hit that bug repeatedly; see fix-mdx-braces.mjs in the repo).

One-line picker

If you only need typography, use Markdown. If you need to embed components in posts, use MDX.

How to tell you need these conventions

  • You plan more than 100 articles.
  • You expect to add or rename fields over time.
  • You want consistent components (callouts, code blocks, FAQ) across articles.
  • You may want to translate or re-export content later.

Quick verdict

Pick MDX over plain Markdown if you’ll use any components. Pick plain Markdown only if you’ll never embed components and want maximum portability.

Before you start

  • Decide MDX vs MD first — schema and components depend on it.
  • Pick a slug convention up front (kebab-case, no dates).
  • Have Astro Content Collections enabled if you’re on Astro.

Step by step

  1. Define a strict frontmatter schema via Content Collections. src/content/config.ts:
import { defineCollection, z } from 'astro:content';

const HUBS = ['ai-applications', 'ai-tools', 'indie-dev',
               'prompt-library', 'troubleshooting'] as const;

export const collections = {
  articles: defineCollection({
    type: 'content',
    schema: ({ image }) => z.object({
      title:          z.string().min(8).max(80),
      description:    z.string().min(80).max(170),
      urlSlug:        z.string().regex(/^[a-z0-9-]+$/),
      category:       z.enum(HUBS),
      subcategory:    z.string().optional(),
      tags:           z.array(z.string()).max(8),
      publishedAt:    z.date(),
      updatedAt:      z.date().optional(),
      author:         z.string().default('AI Productivity Guide Team'),
      featured:       z.boolean().default(false),
      draft:          z.boolean().default(false),
      lang:           z.enum(['en', 'zh']),
      translationKey: z.string(),
      primaryKeyword: z.string().optional(),
      hero:           image().optional(),     // image() helper for optimization
      schemaVersion:  z.literal(2).default(2),
    }),
  }),
};

schemaVersion gives later migrations something to key on: when you rename a field, you bump the version and update every article in one pass.
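
The lang and translationKey fields in the schema also make translation coverage checkable. A minimal sketch, assuming every article is meant to exist in both languages (that policy is an assumption, not something the schema enforces) and that `entries` is a list of already-validated frontmatter objects; the function name is hypothetical:

```javascript
// Sketch: report translationKeys that lack an article in one of the expected languages.
const LANGS = ['en', 'zh']; // mirrors the `lang` enum in the schema

function findMissingTranslations(entries, langs = LANGS) {
  // translationKey -> set of languages that already have an article
  const seen = new Map();
  for (const { translationKey, lang } of entries) {
    if (!seen.has(translationKey)) seen.set(translationKey, new Set());
    seen.get(translationKey).add(lang);
  }
  const missing = [];
  for (const [key, have] of seen) {
    for (const lang of langs) {
      if (!have.has(lang)) missing.push({ translationKey: key, lang });
    }
  }
  return missing;
}

// Usage with hypothetical entries:
const gaps = findMissingTranslations([
  { translationKey: 'mdx-at-scale', lang: 'en' },
  { translationKey: 'mdx-at-scale', lang: 'zh' },
  { translationKey: 'link-checking', lang: 'en' },
]);
console.log(gaps); // [ { translationKey: 'link-checking', lang: 'zh' } ]
```

Whether a gap should fail the build or only print a report depends on how strictly you want translations to track the originals.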

  1. Centralize MDX components. src/components/mdx/index.ts:
import Callout from './Callout.astro';
import FAQ from './FAQ.astro';
import VideoEmbed from './VideoEmbed.astro';
import { Image } from 'astro:assets';

export const mdxComponents = {
  Callout,
  FAQ,
  VideoEmbed,
  img: Image,        // override default <img> with optimized one
};

In the article layout:

---
import { mdxComponents } from '@/components/mdx';
const { Content } = await Astro.props.article.render();
---
<Content components={mdxComponents} />

Authors call <Callout type="warn">…</Callout> without per-file imports.

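  1. Pick a single slug convention and enforce it. The schema regex above (^[a-z0-9-]+$) rejects a urlSlug such as My_Article-2024-01 because of the uppercase letters and the underscore. It does not reject dates on its own (my-article-2024-01 passes), so the no-dates rule needs a stricter pattern or a review step. Add a prebuild check that compares urlSlug to the filename: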
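// scripts/check-slug-matches-filename.mjs
// Fails the build when a file name does not match its frontmatter urlSlug.
import { readdirSync, readFileSync } from 'node:fs';
import { basename, join } from 'node:path';
import matter from 'gray-matter';

const ROOT = 'src/content/articles';
let failed = false;

// `recursive: true` needs Node 18.17+ / 20.1+; it covers every language and hub folder.
for (const rel of readdirSync(ROOT, { recursive: true })) {
  if (!rel.endsWith('.mdx')) continue;
  const { data } = matter(readFileSync(join(ROOT, rel), 'utf8'));
  const expected = `${data.urlSlug}.mdx`;
  if (basename(rel) !== expected) {
    console.error(`MISMATCH: ${rel} ≠ ${expected}`);
    failed = true; // keep going so one run reports every mismatch
  }
}
if (failed) process.exit(1);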
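
A filename check works one file at a time, so it cannot notice two files in different hub folders that declare the same urlSlug. A collision check could look roughly like this; the entry shape (`file`, `lang`, `urlSlug`) and the function name are assumptions made for the sketch:

```javascript
// Sketch: find urlSlugs declared by more than one file within the same language.
function findSlugCollisions(entries) {
  const byKey = new Map(); // "lang/urlSlug" -> list of file paths
  for (const { file, lang, urlSlug } of entries) {
    const key = `${lang}/${urlSlug}`;
    byKey.set(key, [...(byKey.get(key) ?? []), file]);
  }
  return [...byKey]
    .filter(([, files]) => files.length > 1)
    .map(([key, files]) => ({ key, files }));
}

// Usage with hypothetical paths:
const collisions = findSlugCollisions([
  { file: 'en/indie-dev/mdx-at-scale.mdx', lang: 'en', urlSlug: 'mdx-at-scale' },
  { file: 'en/ai-tools/mdx-at-scale.mdx', lang: 'en', urlSlug: 'mdx-at-scale' },
  { file: 'zh/indie-dev/mdx-at-scale.mdx', lang: 'zh', urlSlug: 'mdx-at-scale' },
]);
console.log(collisions.length); // 1
```

The same slug in en and zh is treated as fine here, since the two languages live under different URL prefixes.
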
  1. Store images in src/assets/ and use the image() helper. Never /public/... for content images — you lose responsive optimization:
---
hero: ../../assets/hero.jpg
---

import { Picture } from 'astro:assets';
import diagram from '../../assets/diagram.png';

<Picture src={diagram} alt="architecture diagram" widths={[400, 800, 1200]} sizes="(max-width: 800px) 100vw, 800px" formats={['avif', 'webp']} />
  1. Add a build-time link checker. Fail the build on broken internal links:
// scripts/check-mdx-links.mjs (excerpt)
import { readFileSync, readdirSync } from 'node:fs';
import { join } from 'node:path';

const known = new Set(/* all slugs */);
let failed = false;

function walk(dir) {
  for (const f of readdirSync(dir, { withFileTypes: true })) {
    const full = join(dir, f.name);
    if (f.isDirectory()) walk(full);
    else if (f.name.endsWith('.mdx')) {
      const md = readFileSync(full, 'utf8');
      for (const m of md.matchAll(/\]\(\/[a-z]+\/articles\/([a-z0-9-]+)\/\)/g)) {
        if (!known.has(m[1])) {
          console.error(`BROKEN: ${full} → ${m[1]}`); failed = true;
        }
      }
    }
  }
}
walk('src/content/articles');
if (failed) process.exit(1);
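
The excerpt leaves the contents of `known` open. One way to fill it, sketched under the assumption that every article declares urlSlug on a single frontmatter line, is to read the slug with a small regex rather than a YAML parser; both function names are hypothetical:

```javascript
// Sketch: pull urlSlug out of the leading frontmatter block without a YAML parser.
function slugFromFrontmatter(source) {
  const block = source.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!block) return null; // no frontmatter at the top of the file
  const line = block[1].match(/^urlSlug:[ \t]*["']?([a-z0-9-]+)["']?\s*$/m);
  return line ? line[1] : null;
}

// Build the set of known slugs from the raw text of every article file.
function buildKnownSlugs(sources) {
  const slugs = new Set();
  for (const source of sources) {
    const slug = slugFromFrontmatter(source);
    if (slug) slugs.add(slug);
  }
  return slugs;
}

// Usage with hypothetical file contents:
const knownSlugs = buildKnownSlugs([
  '---\ntitle: A\nurlSlug: mdx-at-scale\n---\nBody text',
  'No frontmatter here',
]);
console.log(knownSlugs.has('mdx-at-scale')); // true
```

If the project already depends on gray-matter, parsing with it is the sturdier choice; the regex version only keeps the sketch dependency-free.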

Wire into package.json:

{
  "scripts": {
    "prebuild": "node scripts/audit-content.mjs && astro check && node scripts/check-mdx-links.mjs"
  }
}
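  1. Guard against the MDX brace bug. LLMs love to write {var} in prose, which MDX parses as a JavaScript expression and then fails on when the variable does not exist. A scan in CI:
# fail build on unescaped braces in prose (fenced code blocks are skipped)
# heuristic: it also flags legitimate JSX props such as src={diagram}, so review the hits
# find avoids relying on bash globstar; FNR gives per-file line numbers
find src/content/articles -name '*.mdx' -exec \
  awk 'FNR==1{f=0} /^```/{f=!f; next} !f && /\{[a-z]/ {print FILENAME":"FNR}' {} + \
  | tee /tmp/brace-hits.txt
test ! -s /tmp/brace-hits.txt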
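
An awk one-liner works line by line and cannot tell prose from a JSX prop such as src={diagram}. A Node version with fewer false positives might look like the sketch below. The skip rules (frontmatter, fenced code, inline code spans, import/export lines, lines that start with a tag) are heuristics chosen for illustration, and the function name is hypothetical:

```javascript
// Sketch: return 1-based line numbers where prose contains `{` followed by a lowercase letter.
function findBraceHits(source) {
  const hits = [];
  let inFence = false;
  let inFrontmatter = false;
  source.split(/\r?\n/).forEach((line, i) => {
    // Frontmatter: from a leading `---` to the next `---`.
    if (i === 0 && line.trim() === '---') { inFrontmatter = true; return; }
    if (inFrontmatter) { if (line.trim() === '---') inFrontmatter = false; return; }
    // Fenced code blocks toggle on every fence line.
    if (/^\s*```/.test(line)) { inFence = !inFence; return; }
    if (inFence) return;
    // ESM statements and lines that start with a JSX/HTML tag are not prose.
    if (/^\s*(import|export)\s/.test(line) || /^\s*</.test(line)) return;
    const prose = line.replace(/`[^`]*`/g, ''); // drop inline code spans
    if (/\{[a-z]/.test(prose)) hits.push(i + 1);
  });
  return hits;
}

// Usage with a hypothetical article body:
const sample = [
  '---', 'title: x', '---',
  'Use {name} here.',
  '```js', 'const a = {b};', '```',
  'Inline `{ok}` is fine.',
  '<Image src={diagram} />',
].join('\n');
console.log(findBraceHits(sample)); // [ 4 ]
```

A component used in the middle of a prose line would still be flagged, so hits are best treated as a review list.
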
  1. Document the editorial rules in CONTENT.md. Authors should not have to reverse-engineer your schema from error messages.

Implementation checklist

  • Content Collections schema validates every frontmatter field.
  • MDX components are centralized; articles do not import them per-file.
  • Slug regex + filename match enforced in prebuild.
  • Internal link checker runs in CI and fails on dead links.
  • Brace-scan guard catches MDX gotchas before deploy.

After-launch verification

  • astro check + prebuild scripts pass on every PR.
  • A new article that violates the schema fails the build with a clear error.
  • Sitemap entries match the slugs declared in frontmatter.
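
The sitemap check can be scripted too. The sketch below assumes article URLs follow the /{lang}/articles/{slug}/ pattern that the link checker's regex implies, and that you pass in only the slugs of non-draft articles; example.com and the function names are placeholders:

```javascript
// Sketch: collect article slugs from the <loc> entries of a sitemap XML string.
function slugsInSitemap(xml) {
  const slugs = new Set();
  const pattern = /<loc>[^<]*\/[a-z]+\/articles\/([a-z0-9-]+)\/?<\/loc>/g;
  for (const m of xml.matchAll(pattern)) slugs.add(m[1]);
  return slugs;
}

// Compare the sitemap against the slugs declared in frontmatter.
function diffSitemap(xml, declaredSlugs) {
  const inSitemap = slugsInSitemap(xml);
  const declared = new Set(declaredSlugs);
  return {
    missingFromSitemap: [...declared].filter((s) => !inSitemap.has(s)),
    unknownInSitemap: [...inSitemap].filter((s) => !declared.has(s)),
  };
}

// Usage with a hypothetical sitemap:
const sitemapXml = `
<urlset>
  <url><loc>https://example.com/en/articles/mdx-at-scale/</loc></url>
  <url><loc>https://example.com/en/articles/old-post/</loc></url>
</urlset>`;
console.log(diffSitemap(sitemapXml, ['mdx-at-scale', 'link-checking']));
// { missingFromSitemap: [ 'link-checking' ], unknownInSitemap: [ 'old-post' ] }
```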

Common pitfalls

  • Letting MDX components be imported per-file — every article ends up styled slightly differently.
  • Hard-coding image paths to /public/... and breaking responsive optimization.
  • Using long-prose .md files for everything when MDX would let you embed structured FAQ or table data.
  • Not versioning your frontmatter schema — when you rename a field, half your articles break silently. The schemaVersion literal above is the cheapest defense.
  • Treating Markdown as a free-for-all instead of as a content database.
  • Forgetting to add the brace-scan guard — LLM-assisted edits will eventually break the build.

FAQ

  • MDX or plain Markdown? MDX if you want any embedded components or interactive elements. Plain Markdown if portability and CMS compatibility matter more.
  • Should I commit images to git? Yes for site-critical visuals; use a CDN for heavy media. Keeping the source images in the repo keeps the build reproducible.
  • How do I keep frontmatter consistent across many authors? Schema enforcement plus a documented CONTENT.md. Reject PRs that violate the schema.
  • What about internationalization? Use parallel folders like src/content/articles/en/ and src/content/articles/zh/, sharing the same schema and a translationKey field.
  • How do I migrate when I rename a frontmatter field? Bump schemaVersion, write a one-off migration script in scripts/, run it, and commit the result in one PR.
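
The core of such a migration script can stay small. The sketch below assumes a made-up rename from `keyword` (version 1) to `primaryKeyword` (version 2); the old field name and the function name are hypothetical, and reading and writing the files (for example with gray-matter) is left out:

```javascript
// Sketch: migrate one frontmatter object from schemaVersion 1 to 2.
function migrateFrontmatter(data) {
  if (data.schemaVersion === 2) return data; // already migrated, leave untouched
  const { keyword, ...rest } = data;         // drop the old field name
  const next = { ...rest, schemaVersion: 2 };
  if (keyword !== undefined) next.primaryKeyword = keyword;
  return next;
}

// Usage with a hypothetical version 1 entry:
console.log(migrateFrontmatter({ title: 'MDX at scale', keyword: 'mdx' }));
// { title: 'MDX at scale', schemaVersion: 2, primaryKeyword: 'mdx' }
```

Returning already-migrated entries unchanged makes the script safe to run twice, which helps when a migration PR has to be rebased.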

Tags: #Indie dev #Astro #MDX #Content Collections #Content ops