Sitemaps and robots.txt are boring infrastructure, until Google says “Discovered, currently not indexed” because your sitemap is missing pages or your robots.txt accidentally blocks your CSS. The App Router gives you two clean ways to handle both: static files or typed route files. Pick one and own it.
Background
Google still discovers most new content via internal links, but a sitemap meaningfully accelerates indexing for new sites — especially when paired with a Search Console submission. robots.txt is the polite signal to crawlers, not a security mechanism. Both files need to be reachable at the site root, served as text/plain (robots) or application/xml (sitemap).
How to tell
- You launched a Next.js site and Google has indexed fewer than 30% of your articles after two weeks.
- Search Console shows “Sitemap could not be read” or “0 discovered URLs”.
- You see Disallow: / in your deployed robots.txt and panic. Yes, that blocks everything.
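Why does one line do so much damage? Under RFC 9309, a crawler reads the group that names its user agent (falling back to the * group), and the longest matching path wins, with Allow winning ties. The sketch below is a simplified evaluator built on that reading. The function names are hypothetical, and it handles exact user-agent tokens and plain path prefixes only, not the * and $ wildcards Google also supports. It also shows why blocking /_next/ hides your CSS.

```typescript
// Simplified robots.txt evaluator: exact user-agent tokens, prefix paths only
// (no * or $ wildcards). An illustration, not a spec-complete parser.
type Rule = { allow: boolean; path: string };

function parseGroups(robotsTxt: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let inAgentRun = false; // true while reading consecutive User-agent lines
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === 'user-agent') {
      if (!inAgentRun) agents = []; // a User-agent line after rules starts a new group
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
      inAgentRun = true;
      continue;
    }
    inAgentRun = false;
    // An empty Disallow means "nothing disallowed", so it adds no rule.
    if ((key === 'allow' || key === 'disallow') && value !== '') {
      for (const agent of agents) {
        groups.get(agent)?.push({ allow: key === 'allow', path: value });
      }
    }
  }
  return groups;
}

function isAllowed(robotsTxt: string, userAgent: string, path: string): boolean {
  const groups = parseGroups(robotsTxt);
  // Use the group that names this crawler, else fall back to the '*' group.
  const rules = groups.get(userAgent.toLowerCase()) ?? groups.get('*') ?? [];
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    // Longest matching path wins; on equal length, Allow beats Disallow.
    const longer = !best || rule.path.length > best.path.length;
    const tieGoesToAllow =
      best !== undefined && rule.path.length === best.path.length && rule.allow;
    if (longer || tieGoesToAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule means the path is allowed
}

const leaked = 'User-agent: *\nDisallow: /\n';
const withNext = 'User-agent: *\nAllow: /\nDisallow: /api/\nDisallow: /_next/\n';
console.log(isAllowed(leaked, 'Googlebot', '/en/articles/foo/')); // false
console.log(isAllowed(withNext, 'Googlebot', '/_next/static/css/app.css')); // false
console.log(isAllowed(withNext, 'Googlebot', '/en/articles/foo/')); // true
```

Pasting your deployed robots.txt into isAllowed and probing a few real paths (an article, a stylesheet under /_next/static/) is a quick sanity check before you reach for Search Console.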
Step by step
- Decide static vs dynamic. Under ~100 pages, a static public/robots.txt + public/sitemap.xml is fine. For anything generated at build time, use the App Router’s app/sitemap.ts and app/robots.ts.
- Static robots.txt: drop this file at public/robots.txt:
User-agent: *
Allow: /
Disallow: /api/
Disallow: /preview/
Disallow: /drafts/
# Do not block /_next/: it serves the CSS and JS Google needs to render your pages.
Sitemap: https://yourdomain.com/sitemap.xml
- Dynamic app/robots.ts: the App Router idiom, fully typed:
// app/robots.ts
import type { MetadataRoute } from 'next';
export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      { userAgent: '*', allow: '/', disallow: ['/api/', '/drafts/', '/preview/'] },
      // Opts out of OpenAI's GPTBot only; other AI crawlers use their own user agents.
      { userAgent: 'GPTBot', disallow: '/' },
    ],
    sitemap: 'https://yourdomain.com/sitemap.xml',
    // Non-standard Host directive: harmless, but Google ignores it.
    host: 'https://yourdomain.com',
  };
}
- Dynamic app/sitemap.ts: generate from the same content source your pages use. For a bilingual MDX site, emit one entry per language version, each listing the full set of alternates:
// app/sitemap.ts
import type { MetadataRoute } from 'next';
import { getAllArticles } from '@/lib/content';
const SITE = 'https://yourdomain.com';
export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const articles = await getAllArticles();
  const staticPaths: MetadataRoute.Sitemap = [
    { url: `${SITE}/`, changeFrequency: 'daily', priority: 1.0 },
    { url: `${SITE}/about/`, changeFrequency: 'monthly', priority: 0.5 },
  ];
  // One <url> per language version; each lists every alternate, itself included.
  const articlePaths = articles.flatMap((a): MetadataRoute.Sitemap => {
    const en = `${SITE}/en/articles/${a.slug}/`;
    const zh = `${SITE}/zh/articles/${a.slug}/`;
    const shared = {
      lastModified: a.updatedAt ?? a.publishedAt,
      changeFrequency: 'weekly' as const,
      priority: 0.8,
      alternates: { languages: { en, zh, 'x-default': en } },
    };
    return [
      { url: en, ...shared },
      { url: zh, ...shared },
    ];
  });
  return [...staticPaths, ...articlePaths];
}
- Verify content type and shape after deploy:
curl -sI https://yourdomain.com/robots.txt | grep -i content-type
# content-type: text/plain; charset=utf-8
curl -sI https://yourdomain.com/sitemap.xml | grep -i content-type
# content-type: application/xml; charset=utf-8
curl -s https://yourdomain.com/sitemap.xml | grep -o '<loc>' | wc -l
# counts every <loc>, even if the XML is on one line; should match static pages plus each article URL in each language
- The sitemap XML Next emits should look roughly like this; each article URL carries its hreflang alternates:
<url>
<loc>https://yourdomain.com/en/articles/foo/</loc>
<lastmod>2026-05-22T00:00:00.000Z</lastmod>
<changefreq>weekly</changefreq>
<priority>0.8</priority>
<xhtml:link rel="alternate" hreflang="en" href="https://yourdomain.com/en/articles/foo/" />
<xhtml:link rel="alternate" hreflang="zh" href="https://yourdomain.com/zh/articles/foo/" />
<xhtml:link rel="alternate" hreflang="x-default" href="https://yourdomain.com/en/articles/foo/" />
</url>
- Submit https://yourdomain.com/sitemap.xml in Search Console → Sitemaps. The status usually moves to “Success” within 1-2 days.
- Re-check Search Console weekly for the first month. The indexed count in the Page indexing report should climb from a handful of pages to most of the site. If it plateaus, run one of the missing URLs through the URL Inspection tool to see the reason Google reports.
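The verification steps above can be rolled into one script you run after each deploy. This is a sketch, assuming Node 18+ for the global fetch and the article's placeholder domain; the helper names are hypothetical, and the regex parsing is only meant for small, machine-generated sitemaps. Besides content types and the <loc> count, it checks that hreflang links are reciprocal, which Google requires: if the en entry lists zh as an alternate, the zh entry must list en back.

```typescript
// Post-deploy checks for robots.txt and sitemap.xml. Needs Node 18+ (global fetch).
// The regex parsing is naive and only meant for small, generated sitemaps.
type FileKind = 'robots' | 'sitemap';

// Compare the MIME type only, so a "; charset=utf-8" suffix still passes.
function contentTypeOk(header: string | null, kind: FileKind): boolean {
  const mime = (header ?? '').replace(/;.*$/, '').trim().toLowerCase();
  if (kind === 'robots') return mime === 'text/plain';
  return mime === 'application/xml' || mime === 'text/xml'; // both are XML media types
}

// Counts <loc> tags, even when the whole file is on one line.
function countLocs(xml: string): number {
  return (xml.match(/<loc>/g) ?? []).length;
}

// Maps each <url>'s <loc> to the set of hreflang hrefs listed inside it.
function alternatesByLoc(xml: string): Map<string, Set<string>> {
  const result = new Map<string, Set<string>>();
  for (const block of xml.match(/<url>[\s\S]*?<\/url>/g) ?? []) {
    const loc = block.match(/<loc>([^<]+)<\/loc>/)?.[1]?.trim();
    if (!loc) continue;
    const hrefs = new Set<string>();
    for (const m of block.matchAll(/<xhtml:link[^>]*href="([^"]+)"/g)) {
      if (m[1]) hrefs.add(m[1]);
    }
    result.set(loc, hrefs);
  }
  return result;
}

// Hreflang must be reciprocal: if A lists B as an alternate, B must list A.
function missingReturnLinks(xml: string): string[] {
  const alternates = alternatesByLoc(xml);
  const problems: string[] = [];
  for (const [loc, hrefs] of alternates) {
    for (const href of hrefs) {
      if (href === loc) continue; // a self-reference is expected
      const back = alternates.get(href);
      if (!back) problems.push(`${href} (listed by ${loc}) has no <url> entry`);
      else if (!back.has(loc)) problems.push(`${href} does not link back to ${loc}`);
    }
  }
  return problems;
}

async function verifyDeploy(site: string): Promise<void> {
  const robots = await fetch(`${site}/robots.txt`);
  const sitemap = await fetch(`${site}/sitemap.xml`);
  const xml = await sitemap.text();
  console.log('robots.txt content-type ok:', contentTypeOk(robots.headers.get('content-type'), 'robots'));
  console.log('sitemap.xml content-type ok:', contentTypeOk(sitemap.headers.get('content-type'), 'sitemap'));
  console.log('<loc> entries:', countLocs(xml));
  for (const problem of missingReturnLinks(xml)) console.log('hreflang:', problem);
}

// Example: verifyDeploy('https://yourdomain.com');
```

A sitemap that lists only the /en/ URL, with /zh/ appearing only as an alternate, shows up here as a missing entry, which is the same gap the language pitfall below describes.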
Common pitfalls
- Putting Disallow: / in robots.txt during dev and forgetting to remove it before launch: the all-time classic SEO disaster.
- Returning HTML instead of XML for the sitemap because of a route handler typo: Google rejects it, and unless you check the Sitemaps report, nothing tells you.
- Inconsistent trailing slashes between sitemap and canonical URLs: Google treats /foo and /foo/ as different URLs.
- Letting the sitemap go stale because it is hand-written: generate it from your content source.
- Including paginated URLs (?page=2) or filter URLs in the sitemap: list canonical URLs only; filter variants should be noindex-ed or canonicalized, not advertised.
- Forgetting to include both language versions: /en/foo and /zh/foo are different URLs in Google’s eyes, and each needs its own entry.
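Several of these pitfalls can be caught before deploy by linting the URL list your sitemap produces, for example in a unit test that calls your sitemap() function. A minimal sketch, assuming this article's conventions: every path ends in a trailing slash, and localized pages live under /en/ or /zh/. lintSitemapUrls and LOCALES are hypothetical names.

```typescript
// Hypothetical sitemap URL lint, assuming this article's conventions:
// every path ends in "/", and localized pages live under /en/ or /zh/.
const LOCALES: readonly string[] = ['en', 'zh'];

function lintSitemapUrls(urls: string[]): string[] {
  const issues: string[] = [];
  const paths = new Set(urls.map((u) => new URL(u).pathname));
  for (const raw of urls) {
    const url = new URL(raw);
    // Paginated and filtered variants are not canonical pages.
    if (url.search !== '') issues.push(`query string: ${raw}`);
    // Must match the trailing-slash style of your canonical URLs.
    if (!url.pathname.endsWith('/')) issues.push(`no trailing slash: ${raw}`);
    // Every localized URL needs its counterpart in each other locale.
    const segments = url.pathname.split('/'); // "/en/a/" -> ["", "en", "a", ""]
    const locale = segments[1] ?? '';
    if (!LOCALES.includes(locale)) continue;
    for (const other of LOCALES) {
      if (other === locale) continue;
      const counterpart = ['', other, ...segments.slice(2)].join('/');
      if (!paths.has(counterpart)) issues.push(`missing /${other}/ version: ${raw}`);
    }
  }
  return issues;
}
```

Failing the build when the returned list is non-empty turns these pitfalls into errors you see before Google does.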
Who this is for
Any Next.js content site that wants to be indexed. Mandatory for new sites, helpful for established ones.
When to skip this
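Sites intentionally not indexed (staging, internal tools) should serve Disallow: / and skip the sitemap. Because robots.txt only asks crawlers not to fetch pages, anything private also belongs behind authentication.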
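One way to make that the default rather than a manual step is to branch robots.ts on the deployment environment, so staging and preview builds block everything while production gets the real rules. A minimal sketch: robotsFor is a hypothetical helper, the local Robots type stands in for the subset of MetadataRoute.Robots used here so the code runs without Next.js, and reading VERCEL_ENV assumes you deploy on Vercel.

```typescript
// Local stand-in for the subset of MetadataRoute.Robots used here.
type RobotsRule = { userAgent: string; allow?: string | string[]; disallow?: string | string[] };
type Robots = { rules: RobotsRule[]; sitemap?: string };

const SITE = 'https://yourdomain.com';

// Production gets the real rules; every other deployment blocks all crawling
// and advertises no sitemap.
function robotsFor(isProduction: boolean): Robots {
  if (!isProduction) {
    return { rules: [{ userAgent: '*', disallow: '/' }] };
  }
  return {
    rules: [
      { userAgent: '*', allow: '/', disallow: ['/api/', '/drafts/', '/preview/'] },
      { userAgent: 'GPTBot', disallow: '/' },
    ],
    sitemap: `${SITE}/sitemap.xml`,
  };
}

// In app/robots.ts (assuming Vercel, which sets VERCEL_ENV):
// export default function robots(): MetadataRoute.Robots {
//   return robotsFor(process.env.VERCEL_ENV === 'production');
// }
```

Defaulting to "blocked" means a forgotten flag fails safe on staging, and production only opens up when the environment explicitly says so.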
FAQ
- Does Google need a sitemap if I have good internal linking?: For an established site, internal links are usually enough. For a new site, a sitemap meaningfully accelerates first-time discovery.
- How often should the sitemap update?: On every publish. If app/sitemap.ts reads from your content collection, it updates automatically on every build.
- Should I include lastModified?: Yes, when it is accurate, because it helps Google prioritize re-crawling changed content. Do not set it to the current date on every URL, or you teach Google to ignore it.
- Can I have multiple sitemaps?: Yes. Create a sitemap index (for example sitemap-index.xml) that lists the sub-sitemaps. Splitting is already helpful past a few thousand URLs, and required past 50,000 URLs or 50 MB uncompressed per file. In the App Router, generateSitemaps splits one sitemap.ts into several files.
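For a hand-rolled index, the mechanics are small: split the URL list into chunks under the per-file limit and list each chunk's URL in a <sitemapindex>. A sketch under stated assumptions: the /sitemap/<n>.xml naming and all helper names are hypothetical, and you still need routes that serve each chunk.

```typescript
// Sketch of splitting a large URL list and writing a sitemap index by hand.
// The /sitemap/<n>.xml naming is an assumption; use whatever your routes serve.
const MAX_URLS_PER_SITEMAP = 50_000; // per-file limit in the sitemaps.org protocol

function chunk<T>(items: readonly T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// <loc> values must be XML-escaped; a raw "&" makes the file invalid.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function renderSitemapIndex(sitemapUrls: readonly string[]): string {
  const entries = sitemapUrls
    .map((u) => `  <sitemap>\n    <loc>${escapeXml(u)}</loc>\n  </sitemap>`)
    .join('\n');
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    `${entries}\n</sitemapindex>\n`
  );
}

// One index entry per chunk of at most MAX_URLS_PER_SITEMAP URLs.
function sitemapIndexFor(site: string, urls: readonly string[]): string {
  const parts = chunk(urls, MAX_URLS_PER_SITEMAP);
  return renderSitemapIndex(parts.map((_, i) => `${site}/sitemap/${i}.xml`));
}
```

For 120,000 URLs this yields three entries, /sitemap/0.xml through /sitemap/2.xml; submit the index URL in Search Console instead of the individual files.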
Related
- Next.js Content-Site SEO: The Things That Bite
- Submit Sitemap in Search Console
- Submit a New Site to Google in 2026
Tags: #Indie dev #Next.js #SEO #Technical SEO #robots.txt #Indexing