Sitemap Over 50,000 URLs: Split It With a Sitemap Index

A single sitemap.xml caps at 50,000 URLs and 50 MB uncompressed. Search Console says "Couldn't fetch" or only reads the first 50,000. How to split correctly and resubmit.

Published: May 24, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You generated a single sitemap.xml from your content collection. The file is 18 MB and contains 73,000 URLs. Search Console either reports Couldn't fetch or shows the parsed count stuck at exactly 50,000: it read up to the limit and silently dropped the rest. Pages past entry 50,000 never get discovered through the sitemap.

Fastest fix: the per-file cap is hard, so stop trying to shrink one file. Split your URLs into multiple child sitemaps of <= 50,000 URLs each (aim for 25,000), write a sitemap index that lists them, keep that index at the URL robots.txt already points to (/sitemap.xml), and resubmit the index in Search Console. Below: how to split, name, validate, and confirm Google processed every child.

The hard limits (verified June 2026)

The sitemaps.org protocol is explicit, and Google enforces all of it:

Limit	Value	Applies to
URLs per sitemap	`<= 50,000` `<loc>` entries	one urlset file
Size per sitemap	`<= 50 MB` (52,428,800 bytes) uncompressed	one urlset file
Child sitemaps per index	`<= 50,000`	one sitemapindex file
Size per index	`<= 50 MB` uncompressed	one sitemapindex file
Sitemap index files per Search Console property	up to 500 submitted	the whole property

Two facts catch people out. First, gzip is fine for transport but Google checks the uncompressed size against the 50 MB cap, so a 6 MB .gz that expands to 70 MB still fails. Second — and this reverses a common assumption — only <loc> URLs count toward the 50,000 limit. <xhtml:link rel="alternate" hreflang="..."> annotations nested inside a <url> block do not increment the URL counter (Google confirmed this; the counter ticks on <loc>, not on alternates). They do still add bytes toward the 50 MB cap.

Sources: sitemaps.org protocol, Google: Build and submit a sitemap, Google: Manage sitemaps with index files.
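
A minimal sketch of checking one file against both caps at once. It assumes the whole sitemap fits in memory as a string, and `checkSitemapLimits` is a hypothetical helper name, not part of any library. It counts literal `<loc>` tags only, so `<xhtml:link>` and `<image:loc>` do not move the URL counter but still add to the byte total.

```javascript
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024; // 52,428,800 bytes, uncompressed

// Hypothetical helper: measure one sitemap string against the two per-file caps.
function checkSitemapLimits(xml) {
  // "<image:loc>" and "</loc>" do not match the literal "<loc>" pattern.
  const locCount = (xml.match(/<loc>/g) || []).length;
  const bytes = Buffer.byteLength(xml, 'utf8');
  return {
    locCount,
    bytes,
    overCount: locCount > MAX_URLS,
    overSize: bytes > MAX_BYTES,
  };
}
```

The same two checks apply to an index file, where each `<loc>` is a child sitemap rather than a page.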

Which bucket are you in

Symptom	Likely cause	Go to
Parsed count frozen at 50,000	over the URL cap	split by count
`Couldn't fetch` on a file under 50k URLs	over 50 MB uncompressed	split by size
File big but `<loc>` count looks low	hreflang/images bloating bytes	split by size, not count
Index submitted, still only one child processed	index lists one file or lists itself	fix the index
One split file still over the cap	split by alphabet, not count	re-split by count
Google ignores your new files	`robots.txt` still names the old file	fix robots.txt

Common causes

1. Single sitemap generator with no chunking

The build script writes every URL into one sitemap.xml regardless of count. Fine until your collection crosses 50k <loc> entries.

How to spot it: grep -o '<loc>' public/sitemap.xml | wc -l. Over 50,000? You are over the URL cap. (grep -c and plain wc -l are unreliable here: both count lines, and minified XML can put every URL on one line.)
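
Line-based counters such as grep -c report matching lines, not matching tags, so they undercount minified XML. The sketch below contrasts the two approaches on a tiny one-line sitemap; both function names are made up for illustration.

```javascript
// Counts lines that contain the needle (what a line-based counter reports).
function countMatchingLines(xml, needle) {
  return xml.split('\n').filter((line) => line.includes(needle)).length;
}

// Counts every occurrence of the needle, regardless of line breaks.
function countOccurrences(xml, needle) {
  return xml.split(needle).length - 1;
}

// A minified sitemap: three URLs, zero newlines.
const minified =
  '<urlset>' +
  '<url><loc>https://example.com/a/</loc></url>'.repeat(3) +
  '</urlset>';

console.log(countMatchingLines(minified, '<loc>')); // 1
console.log(countOccurrences(minified, '<loc>'));   // 3
```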

2. Sitemap is over 50 MB uncompressed even with fewer URLs

Each entry can carry a long <loc>, a <lastmod>, several <xhtml:link> hreflang tags, and <image:image> blocks. With heavy annotations you can hit 50 MB well before 50,000 URLs.

How to spot it: ls -lh public/sitemap.xml. Over 50 MB uncompressed? Split regardless of URL count.
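
To see how annotations pull the effective URL ceiling below 50,000, divide the byte budget by your average entry size. This is a back-of-the-envelope sketch; `maxUrlsForEntrySize` and the 200-byte wrapper overhead are assumptions chosen for illustration.

```javascript
const MAX_URLS = 50000;
const MAX_BYTES = 50 * 1024 * 1024; // 52,428,800

// Hypothetical helper: how many entries of a given average size fit in one file.
// overheadBytes approximates the XML declaration and <urlset> wrapper (assumed value).
function maxUrlsForEntrySize(avgEntryBytes, overheadBytes = 200) {
  const byBytes = Math.floor((MAX_BYTES - overheadBytes) / avgEntryBytes);
  return Math.min(MAX_URLS, byBytes);
}

console.log(maxUrlsForEntrySize(600));  // 50000 (the URL cap binds first)
console.log(maxUrlsForEntrySize(1500)); // 34952 (the byte cap binds first)
```

At 1,500 bytes per entry, which a few hreflang and image tags can easily reach, the size cap is hit at roughly 35,000 URLs.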

3. Hreflang inflates the file size (but not the URL count)

A bilingual en/zh site adds 2-3 <xhtml:link> tags per <url>. Those alternates do not count toward the 50,000 URL limit — only the <loc> does — but they bloat the bytes and can push you over 50 MB. Note the separate case: if you emit en and zh as two distinct <url> blocks (two <loc> values), each one does count, so a 30k-page bilingual site really has ~60k <loc> entries and is over the cap.

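How to spot it: compare grep -o '<loc>' public/sitemap.xml | wc -l (the real URL count) against grep -o '<xhtml:link' public/sitemap.xml | wc -l (the alternates). If the <loc> count is near or over 50k you have a count problem; if the <loc> count is fine but the file is large and the alternates outnumber it several times over, it's a size problem.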
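
The count-versus-size distinction can be sketched as a small classifier. `classifySitemap` is a hypothetical name, and the limits are parameters only so the function is easy to try on small inputs.

```javascript
// Count every occurrence of a literal substring.
const count = (xml, needle) => xml.split(needle).length - 1;

// Hypothetical helper: decide whether a file has a count problem or a size problem.
function classifySitemap(xml, { maxUrls = 50000, maxBytes = 50 * 1024 * 1024 } = {}) {
  const locs = count(xml, '<loc>');              // what the URL cap counts
  const alternates = count(xml, '<xhtml:link');  // bytes only, not URLs
  const bytes = Buffer.byteLength(xml, 'utf8');
  let problem = 'none';
  if (locs > maxUrls) problem = 'count';
  else if (bytes > maxBytes) problem = 'size';
  return { locs, alternates, bytes, problem };
}
```

A high `alternates` to `locs` ratio alongside `problem: 'size'` points at hreflang as the source of the bloat.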

4. Sitemap index pointing to itself or missing

You created an index but it lists only one child (the original 73k file), or it accidentally lists itself.

How to spot it: cat public/sitemap.xml. It should be a <sitemapindex> with multiple <sitemap><loc> children pointing at distinct child files, each under the caps. No child should point back at the index.
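
A rough sketch of the same inspection in code. It uses a regular expression rather than a real XML parser, which is an assumption that holds for simple generated indexes but not for arbitrary XML; `inspectIndex` is a hypothetical name.

```javascript
// Hypothetical helper: sanity-check a sitemap index string.
// indexUrl is the public URL the index itself is served from.
function inspectIndex(indexXml, indexUrl) {
  const isIndex = /<sitemapindex[\s>]/.test(indexXml);
  // "<sitemapindex" does not match "<sitemap>", so only child entries are captured.
  const children = [...indexXml.matchAll(/<sitemap>\s*<loc>\s*([^<\s]+)\s*<\/loc>/g)]
    .map((m) => m[1]);
  return {
    isIndex,
    children,
    selfReference: children.includes(indexUrl),
    hasDuplicates: new Set(children).size !== children.length,
    singleChild: children.length === 1,
  };
}
```

A healthy index reports `isIndex: true`, more than one child, and `selfReference: false`.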

5. Splits by alphabet rather than count

Naive split: sitemap-a.xml, sitemap-b.xml by slug first letter. If 80k of your 300k URLs start with “p”, that file still blows the limit.

How to spot it: for f in public/sitemap-*.xml; do echo "$f $(grep -o '<loc>' "$f" | wc -l)"; done. Any file over 50,000? Your split key is wrong. Split by count, not by letter.
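
To preview the skew before generating any files, you can tally slugs by first letter and flag groups that would exceed the cap. Both function names below are hypothetical, and the input is assumed to be a plain array of slug strings.

```javascript
// Tally how many slugs fall under each first letter.
function groupSizesByFirstLetter(slugs) {
  const sizes = new Map();
  for (const slug of slugs) {
    const key = (slug[0] || '_').toLowerCase();
    sizes.set(key, (sizes.get(key) || 0) + 1);
  }
  return sizes;
}

// List the letters whose group would exceed the per-file URL limit.
function oversizedGroups(sizes, limit = 50000) {
  return [...sizes].filter(([, n]) => n > limit).map(([key]) => key);
}
```

Any non-empty result from `oversizedGroups` means an alphabetical split cannot satisfy the cap, while fixed-size chunks always can.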

6. Compressed file under 50 MB but uncompressed over

Google checks uncompressed size. An 8 MB sitemap.xml.gz that expands to 80 MB fails.

How to spot it: gzip -l public/sitemap.xml.gz prints compressed and uncompressed bytes. The uncompressed column must be under 52,428,800.
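
The gzip format (RFC 1952) stores the uncompressed length, modulo 2^32, as a little-endian integer in the last four bytes of the stream, so a build script can read it without decompressing. This is a sketch that assumes a single-member gzip file under 4 GiB; `gzipDeclaredSize` is a hypothetical name.

```javascript
const MAX_BYTES = 50 * 1024 * 1024; // 52,428,800

// Hypothetical helper: read the ISIZE trailer field from gzip bytes
// (a Uint8Array or Node Buffer holding the whole .gz file).
function gzipDeclaredSize(gzBytes) {
  // 10-byte header + 8-byte trailer is the smallest possible framing.
  if (gzBytes.length < 18 || gzBytes[0] !== 0x1f || gzBytes[1] !== 0x8b) {
    throw new Error('not a gzip stream');
  }
  const view = new DataView(gzBytes.buffer, gzBytes.byteOffset, gzBytes.byteLength);
  return view.getUint32(gzBytes.byteLength - 4, true); // little-endian
}

function gzipWithinCap(gzBytes) {
  return gzipDeclaredSize(gzBytes) <= MAX_BYTES;
}
```

Because the field is only a declaration written by the compressor, decompressing and measuring remains the authoritative check.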

7. Sitemap files don’t match robots.txt

You split into sitemap-1.xml, sitemap-2.xml, but robots.txt still says Sitemap: https://example.com/sitemap.xml, and that URL now serves the old urlset or nothing at all.

How to spot it: curl -s https://yoursite.com/robots.txt | grep -i sitemap. It should list the sitemap-index URL.
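
The same check as a small function, useful in a deploy script. It assumes you already have the robots.txt body as a string; `sitemapLinesFromRobots` is a hypothetical name.

```javascript
// Hypothetical helper: extract every "Sitemap:" URL from a robots.txt body.
// The directive name is matched case-insensitively.
function sitemapLinesFromRobots(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*sitemap:\s*(\S+)/i))
    .filter(Boolean)
    .map((m) => m[1]);
}

const robots = 'User-agent: *\nAllow: /\n\nSitemap: https://example.com/sitemap.xml\n';
console.log(sitemapLinesFromRobots(robots)); // [ 'https://example.com/sitemap.xml' ]
```

An empty array, or a URL that is not your index, is the mismatch described above.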

Shortest path to fix

Step 1: Decide a chunk size and split scheme

Conservative target: 25,000 <loc> URLs or 25 MB per file — half the hard limit, so a growth spurt or a fat lastmod batch never tips you over between deploys. Group by content type so a failing child is easy to identify:

sitemap-articles-1.xml … sitemap-articles-N.xml
sitemap-categories.xml
sitemap-tags.xml
sitemap-pages.xml (static pages)

Step 2: Generate child sitemaps in chunks

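// scripts/generate-sitemaps.mjs
// Splits articles.json into child sitemaps of at most CHUNK URLs each.
import fs from 'node:fs';

const CHUNK = 25000; // half the 50,000 hard cap, leaving room for growth
const articles = JSON.parse(fs.readFileSync('articles.json', 'utf8'));
const total = articles.length;
const numFiles = Math.ceil(total / CHUNK);

// Escape the characters XML reserves so a slug containing & or < stays valid.
const escapeXml = (s) => String(s)
  .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;').replace(/'/g, '&apos;');

fs.mkdirSync('public', { recursive: true }); // make sure the output directory exists

for (let i = 0; i < numFiles; i++) {
  const chunk = articles.slice(i * CHUNK, (i + 1) * CHUNK);
  // One <url> per line keeps the output easy to grep and diff.
  const urls = chunk.map(a => `<url><loc>https://example.com/articles/${escapeXml(a.slug)}/</loc><lastmod>${escapeXml(a.modifiedAt)}</lastmod></url>`);
  fs.writeFileSync(
    `public/sitemap-articles-${i + 1}.xml`,
    `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${urls.join('\n')}\n</urlset>\n`
  );
}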
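
The script above chunks by URL count only. If your entries carry hreflang or image tags, a chunker that also respects a byte budget is safer. The sketch below is one way to do that; `chunkEntries` is a hypothetical helper, it takes already-rendered `<url>...</url>` strings, and it ignores the small fixed overhead of the XML declaration and `<urlset>` wrapper.

```javascript
// Hypothetical helper: group rendered <url> entries so each chunk stays
// within both a URL count limit and a byte limit.
function chunkEntries(entries, maxCount = 25000, maxBytes = 25 * 1024 * 1024) {
  const chunks = [];
  let current = [];
  let bytes = 0;
  for (const entry of entries) {
    const size = Buffer.byteLength(entry, 'utf8');
    // Start a new chunk when adding this entry would break either limit.
    if (current.length > 0 && (current.length >= maxCount || bytes + size > maxBytes)) {
      chunks.push(current);
      current = [];
      bytes = 0;
    }
    current.push(entry);
    bytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

A single entry larger than `maxBytes` still gets its own chunk, so the function never drops input.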

Step 3: Generate a sitemap index

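// Continues scripts/generate-sitemaps.mjs (reuses fs and numFiles from Step 2).
// Build time is used for <lastmod>; swap in each child's real last-modified time if you track it.
const builtAt = new Date().toISOString();

const indexEntries = [];
for (let i = 1; i <= numFiles; i++) {
  indexEntries.push(`<sitemap><loc>https://example.com/sitemap-articles-${i}.xml</loc><lastmod>${builtAt}</lastmod></sitemap>`);
}
// Children generated elsewhere are listed by hand; make sure each file actually exists.
indexEntries.push(`<sitemap><loc>https://example.com/sitemap-categories.xml</loc></sitemap>`);

// The index replaces the old single urlset at the same path.
fs.writeFileSync(
  'public/sitemap.xml',
  `<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${indexEntries.join('\n')}\n</sitemapindex>\n`
);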

Keep the top-level file named sitemap.xml so existing robots.txt references still resolve — it’s now a <sitemapindex>, not a <urlset>. The index has no URLs of its own; never add a <sitemap> entry that points back at sitemap.xml.
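
If you would rather have each index entry's `<lastmod>` reflect when that child's content last changed, instead of the build time, one option is to take the newest `modifiedAt` in the chunk. This is a sketch under the assumption that `modifiedAt` is an ISO 8601 string that `Date.parse` accepts; `newestLastmod` and `indexEntry` are hypothetical names.

```javascript
// Hypothetical helper: newest modifiedAt in a chunk, as an ISO 8601 string,
// or null when no entry has a parseable date.
function newestLastmod(chunk) {
  let newest = null;
  for (const article of chunk) {
    const t = Date.parse(article.modifiedAt);
    if (!Number.isNaN(t) && (newest === null || t > newest)) newest = t;
  }
  return newest === null ? null : new Date(newest).toISOString();
}

// Render one <sitemap> entry, omitting <lastmod> when no date is known.
function indexEntry(loc, chunk) {
  const lastmod = newestLastmod(chunk);
  const lastmodTag = lastmod ? `<lastmod>${lastmod}</lastmod>` : '';
  return `<sitemap><loc>${loc}</loc>${lastmodTag}</sitemap>`;
}
```

With this approach a child's `<lastmod>` only moves when one of its articles changes, rather than on every deploy.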

Step 4: Update robots.txt

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

The index lives at the same URL, so this line can stay as-is — just confirm it points at the index, not a leftover single urlset.

Step 5: Validate every file

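for f in public/sitemap*.xml; do
  # grep -o prints one match per line, so this counts <loc> tags even in minified XML
  printf '%s: locs=%s bytes=%s\n' "$f" "$(grep -o '<loc>' "$f" | wc -l)" "$(wc -c < "$f")"
  xmllint --noout "$f" && echo "  valid XML"
done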

Each child must have <= 50,000 <loc> entries and be under 50 MB uncompressed. The index lists <sitemap> entries, not <url> entries. xmllint (from libxml2) catches unescaped &, broken tags, and encoding problems that make Google report Couldn't fetch.
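
Unescaped ampersands in query strings are one of the more common causes of invalid sitemap XML. If xmllint is not available in your build environment, a rough pre-check like the one below can flag them. It is a regex sketch, not a validator, and `findUnescapedAmpersands` is a hypothetical name.

```javascript
// Hypothetical helper: return the string offsets of every "&" that does not
// start a predefined XML entity or a numeric character reference.
function findUnescapedAmpersands(xml) {
  const re = /&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/g;
  const offsets = [];
  let match;
  while ((match = re.exec(xml)) !== null) {
    offsets.push(match.index);
  }
  return offsets;
}
```

Any offset returned marks a URL that should have been written with `&amp;` in place of the bare `&`.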

Step 6: Resubmit in Search Console

Search Console → Indexing → Sitemaps. Remove the old single-sitemap entry. In Add a new sitemap, enter the path after your domain (for example sitemap.xml) and Submit. Submitting the index alone is enough — Google discovers and processes the children automatically. You can also fetch it on demand from the sitemap testing flow in the Sitemaps report.

Step 7: Confirm it’s fixed

In the Sitemaps report, the index row shows Status: Success and a Discovered URLs total that is the sum across all children (duplicates counted once). Watch it climb past 50,000 over a few days.
Click into each child sitemap; none should read Couldn't fetch. If one does, open that file’s URL directly in a browser and check the HTTP status and XML.
Click See page indexing on the index (or a child) to filter the Page Indexing report to just those URLs and watch indexed counts rise.

Discovery is not indexing — Google still decides what to index — but every valid <loc> should now at least be discovered, which the old truncated file prevented.
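
To know what total to expect, you can compute the number of distinct URLs across all children at build time. A minimal sketch, assuming you have each child's URLs as an array of strings; `expectedDiscovered` is a hypothetical name.

```javascript
// Hypothetical helper: distinct URL count across all child sitemaps,
// counting a URL that appears in several children only once.
function expectedDiscovered(childUrlLists) {
  return new Set(childUrlLists.flat()).size;
}

console.log(expectedDiscovered([
  ['https://example.com/a/', 'https://example.com/b/'],
  ['https://example.com/b/', 'https://example.com/c/'],
])); // 3
```

Search Console's figure may lag this number for a few days while Google processes the children.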

When this is not on you

For sites under 50,000 URLs whose sitemap is also well under 50 MB, splitting won’t help. The caps only bite at scale; don’t shard a 5k-URL site.

Easy to misdiagnose as

A crawl-budget problem. Crawl budget is real but mainly matters for sites with millions of URLs. The 50k-per-sitemap cap is a sharper, simpler issue — rule it out first.

Prevention

Generate sitemaps with a hard chunk size (for example 25k <loc> per file).
Validate XML in CI before deploy; fail the build if any file exceeds 40 MB or 40k URLs.
Keep robots.txt pointing at one canonical sitemap-index URL.
Log each sitemap file’s size and <loc> count in build output so you see the trend before it crosses a limit.
Gzip for transport, but assert the uncompressed size stays under 52,428,800 bytes.
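
The CI guard from the list above could look roughly like this. It assumes an earlier build step has already measured each file and produced `{ name, locCount, bytes }` records; that record shape and the name `assertWithinBudget` are made up for illustration.

```javascript
// Hypothetical CI guard: throw if any sitemap file exceeds the soft budget,
// otherwise return one log line per file for the build output.
function assertWithinBudget(files, { maxUrls = 40000, maxBytes = 40 * 1024 * 1024 } = {}) {
  const failures = files.filter((f) => f.locCount > maxUrls || f.bytes > maxBytes);
  if (failures.length > 0) {
    const names = failures.map((f) => f.name).join(', ');
    throw new Error(`sitemap budget exceeded: ${names}`);
  }
  return files.map((f) => `${f.name}: ${f.locCount} URLs, ${f.bytes} bytes`);
}
```

Throwing from the build script makes the process exit with a non-zero status, which most CI systems treat as a failed build.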

FAQ

Do hreflang alternate URLs count toward the 50,000 limit? No. Only <loc> URLs count. <xhtml:link rel="alternate"> annotations inside a <url> block don’t increment the counter, but they do add to the 50 MB file-size budget.
Can I submit several sitemaps separately instead of an index? Yes, but an index is the standard approach, lists up to 50,000 child sitemaps in a single submitted file, and is far easier to maintain than submitting dozens of files by hand.
Does the 50 MB limit apply before or after gzip? Before — Google checks the uncompressed size (52,428,800 bytes). Gzip only helps transfer speed.
Does Bing have the same 50,000 limit? Yes. The 50,000 / 50 MB cap is the official sitemaps.org protocol, and all major search engines follow it.
How many child sitemaps can one index list? Up to 50,000, and the index itself must stay under 50 MB. For most sites you’ll never get close.

Tags: #SEO #Troubleshooting #Indexing #Search Console #Sitemap #xml