Sitemap and robots.txt Basics in Next.js

A correctly served sitemap and robots.txt are non-negotiable for SEO. Here is the App Router idiom for both in 2026.

Sitemaps and robots.txt are boring infrastructure until Google reports “Discovered, currently not indexed” because your sitemap is missing pages, or your robots.txt accidentally blocks your CSS. App Router gives you two clean ways to handle both: static files in public/, or generated app/sitemap.ts and app/robots.ts. Pick one and own it.

Background

Google still discovers most new content via internal links, but a sitemap meaningfully accelerates indexing for new sites — especially when paired with a Search Console submission. robots.txt is the polite signal to crawlers, not a security mechanism. Both files need to be reachable at the site root, served as text/plain (robots) or application/xml (sitemap).

How to tell

  • You launched a Next.js site and Google has indexed fewer than 30% of your articles after two weeks.
  • Search Console shows “Sitemap could not be read” or “0 discovered URLs”.
  • You see Disallow: / in your deployed robots.txt and panic — yes, that blocks everything.
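
The last symptom is easy to check mechanically. The following is a minimal sketch, not a full robots.txt parser: `blocksEverything` is a hypothetical helper name, and it deliberately ignores Allow overrides and wildcard patterns. It only answers whether a group that applies to the given user agent contains a bare `Disallow: /`.

```typescript
// Sketch: detect a blanket "Disallow: /" for a user agent in a robots.txt body.
// `blocksEverything` is a hypothetical helper, not part of Next.js.
function blocksEverything(robotsTxt: string, userAgent = '*'): boolean {
  const wanted = userAgent.toLowerCase();
  let groupAgents: string[] = []; // user agents named by the group being read
  let inRules = false;            // true once the current group has a rule line

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue; // blank or malformed line

    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line after rule lines starts a new group.
      if (inRules) { groupAgents = []; inRules = false; }
      groupAgents.push(value.toLowerCase());
    } else if (field === 'allow' || field === 'disallow') {
      inRules = true;
      if (field === 'disallow' && value === '/' && groupAgents.includes(wanted)) {
        return true;
      }
    }
  }
  return false;
}
```

Running a check like this against the deployed robots.txt in CI is one way to catch a leftover development rule before launch.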

Step by step

  1. Decide static vs dynamic. Under ~100 pages a static public/robots.txt + public/sitemap.xml is fine. For anything generated at build time, use App Router’s app/sitemap.ts and app/robots.ts.

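  2. Static robots.txt: drop this file at public/robots.txt. Keep all rules for a user agent in one group, and leave /_next/ crawlable, because it serves the CSS and JS Google needs to render your pages:

User-agent: *
Allow: /
Disallow: /api/
Disallow: /preview/
Disallow: /drafts/

Sitemap: https://yourdomain.com/sitemap.xml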

  3. Dynamic app/robots.ts: the App Router idiom, fully typed:

// app/robots.ts
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: '*', allow: '/', disallow: ['/api/', '/drafts/', '/preview/'] },
      { userAgent: 'GPTBot', disallow: '/' },          // opt out of AI scraping
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
    host: 'https://yourdomain.com',
  };
}

  4. Dynamic app/sitemap.ts: generate from the same content source your pages use. For a bilingual MDX site:

// app/sitemap.ts
import type { MetadataRoute } from 'next';
import { getAllArticles } from '@/lib/content';

const SITE = 'https://yourdomain.com';

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const articles = await getAllArticles();

  const staticPaths: MetadataRoute.Sitemap = [
    { url: `${SITE}/`,       changeFrequency: 'daily',   priority: 1.0 },
    { url: `${SITE}/about/`, changeFrequency: 'monthly', priority: 0.5 },
  ];

  // Emit one entry per language version; each entry lists the full set of alternates.
  const articlePaths: MetadataRoute.Sitemap = articles.flatMap(a => {
    const languages = {
      en: `${SITE}/en/articles/${a.slug}/`,
      zh: `${SITE}/zh/articles/${a.slug}/`,
      'x-default': `${SITE}/en/articles/${a.slug}/`,
    };
    return [languages.en, languages.zh].map(url => ({
      url,
      lastModified: a.updatedAt ?? a.publishedAt,
      changeFrequency: 'weekly' as const,
      priority: 0.8,
      alternates: { languages },
    }));
  });

  return [...staticPaths, ...articlePaths];
}

  5. Verify content type and shape after deploy:

curl -sI https://yourdomain.com/robots.txt | grep -i content-type
# content-type: text/plain; charset=utf-8

curl -sI https://yourdomain.com/sitemap.xml | grep -i content-type
# content-type: application/xml; charset=utf-8

curl -s  https://yourdomain.com/sitemap.xml | grep -c '<loc>'
# should roughly match total article count + static pages

  6. The sitemap XML Next emits should look like this, with each URL carrying its hreflang alternates:

<url>
  <loc>https://yourdomain.com/en/articles/foo/</loc>
  <lastmod>2026-05-22T00:00:00.000Z</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
  <xhtml:link rel="alternate" hreflang="en" href="https://yourdomain.com/en/articles/foo/" />
  <xhtml:link rel="alternate" hreflang="zh" href="https://yourdomain.com/zh/articles/foo/" />
  <xhtml:link rel="alternate" hreflang="x-default" href="https://yourdomain.com/en/articles/foo/" />
</url>

  7. Submit https://yourdomain.com/sitemap.xml in Search Console → Sitemaps. The status typically changes to “Success” within a day or two.

  8. Re-check Search Console weekly for the first month. Coverage should climb from a handful of pages to most of the site. If it plateaus, run one affected URL through the URL Inspection tool to see the reason Google reports.
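
The content-type and `<loc>` count checks from the verification step can also be scripted, for example as a post-deploy smoke test. This is a sketch under assumptions: the helper names are hypothetical, and the caller is expected to fetch /sitemap.xml and pass in the content-type header and the response body.

```typescript
// Hypothetical helpers mirroring the curl checks; names are illustrative.

// Compare only the media type, ignoring parameters such as "; charset=utf-8".
function hasContentType(header: string | null, expected: string): boolean {
  if (header === null) return false;
  const mediaType = (header.split(';')[0] ?? '').trim().toLowerCase();
  return mediaType === expected.toLowerCase();
}

// Count <loc> elements in the sitemap body.
function countLocs(xml: string): number {
  return (xml.match(/<loc>/g) ?? []).length;
}

// Return a list of human-readable problems; an empty list means the checks passed.
function auditSitemapResponse(
  contentType: string | null,
  body: string,
  expectedMin: number,
): string[] {
  const problems: string[] = [];
  if (!hasContentType(contentType, 'application/xml')) {
    problems.push(`unexpected content-type: ${contentType}`);
  }
  const found = countLocs(body);
  if (found < expectedMin) {
    problems.push(`only ${found} <loc> entries, expected at least ${expectedMin}`);
  }
  return problems;
}
```

Keeping the audit free of network calls makes it easy to unit test; the expected minimum would come from your own article count plus static pages.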

Common pitfalls

  • Putting Disallow: / in robots.txt during dev and forgetting to remove it before launch — the all-time classic SEO disaster.
  • Returning HTML instead of XML for the sitemap because of a route handler typo. Search Console then reports that the sitemap could not be read, and none of its URLs are discovered.
  • Inconsistent trailing slashes between sitemap and canonical URLs. Google treats /foo and /foo/ as different URLs.
  • Letting the sitemap go stale because it is hand-written — generate from your content source.
  • Including paginated URLs (?page=2) or filter URLs in the sitemap. List only the canonical URLs you want indexed; parameter variants should be canonicalized or noindex-ed, not advertised.
  • Forgetting to include both language versions — /en/foo and /zh/foo are different URLs in Google’s eyes.
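
Two of these pitfalls (query-string URLs and inconsistent trailing slashes) can be guarded against with a small filter applied before the sitemap is returned. The sketch below assumes your canonical URLs end with a trailing slash, as in the examples above; `cleanSitemapUrls` is a hypothetical helper name.

```typescript
// Hypothetical helper: keep only clean, canonical-looking URLs for the sitemap.
// Assumes canonical URLs use a trailing slash; invert the check if yours do not.
function cleanSitemapUrls(urls: string[]): string[] {
  const seen = new Set<string>();
  for (const raw of urls) {
    const u = new URL(raw);
    if (u.search !== '') continue; // drop ?page=2, filter URLs and similar
    u.hash = '';                   // fragments never belong in a sitemap
    if (!u.pathname.endsWith('/')) u.pathname += '/';
    seen.add(u.toString());        // the Set removes duplicates after normalizing
  }
  return Array.from(seen);
}
```

For example, passing both /en/foo and /en/foo/ yields a single /en/foo/ entry, and any ?page=2 URL is dropped.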

Who this is for

Any Next.js content site that wants to be indexed. Mandatory for new sites, helpful for established ones.

When to skip this

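Sites that are intentionally not indexed (staging, internal tools) should serve Disallow: / and skip the sitemap. Because robots.txt is only a request to crawlers, put anything sensitive behind authentication as well.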
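
One way to keep staging blocked without risking a stray Disallow: / in production is to derive the rules from the environment. This is a sketch, not the article's method: `buildRobots` is a hypothetical helper, and the local types stand in for the subset of Next's MetadataRoute.Robots shape used here so the example type-checks on its own.

```typescript
// Local stand-ins for the parts of MetadataRoute.Robots this sketch uses.
type RobotsRule = { userAgent: string; allow?: string; disallow?: string | string[] };
type RobotsConfig = { rules: RobotsRule[]; sitemap?: string };

// Hypothetical helper: the caller decides what counts as production.
function buildRobots(isProduction: boolean, site: string): RobotsConfig {
  if (!isProduction) {
    // Staging and previews: block everything and advertise no sitemap.
    return { rules: [{ userAgent: '*', disallow: '/' }] };
  }
  return {
    rules: [{ userAgent: '*', allow: '/', disallow: ['/api/', '/drafts/', '/preview/'] }],
    sitemap: `${site}/sitemap.xml`,
  };
}
```

In app/robots.ts the default export could return the result of this helper, with the production flag read from whatever environment variable your host provides (the exact variable name is an assumption you would need to fill in).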

FAQ

  • Does Google need a sitemap if I have good internal linking?: For an established site, internal links are usually enough. For a new site, a sitemap meaningfully accelerates first-time discovery.
  • How often should the sitemap update?: On every publish. If you use app/sitemap.ts reading from your content collection, it updates automatically on every build.
  • Should I include lastModified?: Yes when accurate — it helps Google prioritize re-crawling changed content. Do not fake it to the current date on every URL or you teach Google to ignore it.
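  • Can I have multiple sitemaps?: Yes. Create a sitemap index (for example sitemap-index.xml) that lists the sub-sitemaps. It is helpful past a few thousand URLs and required once a single file would exceed the sitemap protocol's limit of 50,000 URLs.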
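
Splitting into sub-sitemaps comes down to chunking the URL list. The sketch below shows only that arithmetic, with a hypothetical `chunkUrls` helper; Next.js also provides a generateSitemaps function for emitting several sitemap files, which this example does not depend on.

```typescript
// Per-file URL limit from the sitemap protocol.
const MAX_URLS_PER_SITEMAP = 50000;

// Hypothetical helper: split entries into groups small enough for one sitemap file each.
function chunkUrls<T>(entries: T[], size: number = MAX_URLS_PER_SITEMAP): T[][] {
  if (size < 1) throw new RangeError('size must be at least 1');
  const chunks: T[][] = [];
  for (let i = 0; i < entries.length; i += size) {
    chunks.push(entries.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk would become one sub-sitemap, and the index file lists one URL per chunk. A smaller chunk size than the protocol limit is a reasonable choice if you want faster-to-generate files.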

Tags: #Indie dev #Next.js #SEO #Technical SEO #robots.txt #Indexing