You shipped a batch of 50 new articles. A week later Search Console still shows the same URL count it had before the launch, and your new pages do not appear in site: searches. Sitemap generation succeeded, the file is on the live site, but Google has not noticed. Search engines do not poll sitemaps aggressively — they re-fetch on a schedule that depends on prior crawl history, recent change signals, and explicit submission. A sitemap that exists is not the same as a sitemap that has been read.
This article covers how to confirm the sitemap is correctly published, how to force a re-fetch, and the automation that makes “submission” not a manual step.
Common causes
1. Sitemap generated but old version still served
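The build wrote a new sitemap.xml, but the CDN edge cache still returns the old version. The crawler fetches it, sees only the old URLs, and never learns about the new ones.
How to judge: curl -s https://yoursite.com/sitemap.xml | grep -o '<url>' | wc -l counts entries. (grep -c counts matching lines, and many generators write the whole sitemap on a single line.) Compare with the same count on your local dist/sitemap.xml.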
2. Sitemap URL not declared in robots.txt
If robots.txt does not have a Sitemap: line, crawlers may not find the sitemap unless explicitly submitted in Search Console.
How to judge: curl -s https://yoursite.com/robots.txt | grep -i sitemap. Empty = sitemap not advertised.
3. Sitemap was submitted once and never re-pinged
Google fetches submitted sitemaps but the frequency depends on observed change rate. For a site with low historical update frequency, re-fetch can be 1-2 weeks apart.
How to judge: Search Console > Sitemaps > look at “Last read” column.
4. lastmod dates not updated
Crawlers use <lastmod> in sitemap entries to decide what to re-crawl. If your build keeps stale lastmod values, the crawler sees no signal that anything changed.
How to judge: open sitemap.xml and check whether new articles have a recent <lastmod>, and existing articles’ lastmod reflects their latest edit.
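On a large sitemap that check is easier to script. The sketch below parses a <urlset> with Python's standard library, lists entries that have no <lastmod>, and flags the case where every URL carries the same date, which usually means the build is stamping build time rather than edit time. `audit_lastmod` is a hypothetical helper, and the sketch assumes a plain urlset rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace used by every sitemaps.org <urlset>.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_lastmod(sitemap_xml: str) -> dict:
    """Summarise <lastmod> usage in a <urlset> sitemap (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    urls = root.findall("sm:url", NS)
    missing = []
    dates = set()
    for url in urls:
        loc = (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        lastmod = (url.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        if lastmod:
            dates.add(lastmod)
        else:
            missing.append(loc)
    return {
        "missing": missing,
        "distinct_dates": len(dates),
        # Many URLs sharing one timestamp usually means "build time", not "edit time".
        "single_shared_date": len(urls) > 1 and not missing and len(dates) == 1,
    }
```

An empty `missing` list with `single_shared_date` set to True is the pattern to look for after a deploy that only touched a few articles.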
5. Sitemap index out of sync
Big sites use a sitemap index pointing to multiple per-section sitemaps. If new articles land in a section sitemap that the index does not reference (for example a newly added or renamed file), the crawler never reads that file and does not discover the new content.
How to judge: curl -s https://yoursite.com/sitemap-index.xml and confirm every sub-sitemap is listed and reachable.
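The listing half of that check can be scripted: collect every <loc> in the index and compare it with the section sitemaps the build actually produced. A sketch assuming you already have the index XML as a string and a set of expected sub-sitemap URLs (for example, derived from the files in your build output); `index_gaps` is a hypothetical name, and reachability still needs a separate fetch of each URL.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def index_gaps(index_xml: str, expected: set[str]) -> dict:
    """Compare a <sitemapindex> against the sub-sitemaps you expect it to list."""
    root = ET.fromstring(index_xml)
    listed = {
        (el.text or "").strip()
        for el in root.findall("sm:sitemap/sm:loc", NS)
    }
    return {
        # Built but not referenced: crawlers following the index never see these.
        "unlisted": sorted(expected - listed),
        # Referenced but not built: likely 404s, worth fetching to confirm.
        "stale": sorted(listed - expected),
    }
```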
6. New URLs blocked by robots.txt or noindex
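The sitemap lists the URLs, but each one carries <meta name="robots" content="noindex"> or is disallowed in robots.txt. A disallowed URL is never fetched at all; a noindex page is fetched, seen as noindex, and dropped from the index.
How to judge: curl -s <new-article-url> | grep -i 'noindex' should not match, and curl -sI <new-article-url> | grep -i 'x-robots-tag' should not show noindex (the directive can also arrive as an HTTP header). Check robots.txt for a Disallow: rule covering the new URL path.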
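Those checks (robots.txt rules, a robots meta tag, and an X-Robots-Tag response header) can be bundled into one function once you have fetched the robots.txt body, the page HTML and its response headers. A standard-library Python sketch; `indexing_blockers` is a hypothetical name. Python's urllib.robotparser applies rules in file order rather than Google's longest-match precedence, so treat its verdict as a first pass and confirm borderline cases with URL Inspection.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaFinder(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = {k.lower(): (v or "") for k, v in attrs}
        if a.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(a.get("content", "").lower())


def indexing_blockers(url, robots_txt, html, headers, agent="Googlebot"):
    """Return reasons a URL may not be indexed (an empty list means none were found)."""
    reasons = []
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    if not rp.can_fetch(agent, url):
        reasons.append("disallowed by robots.txt")
    finder = _RobotsMetaFinder()
    finder.feed(html)
    if any("noindex" in d for d in finder.directives):
        reasons.append("noindex in <meta> tag")
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        reasons.append("noindex in X-Robots-Tag header")
    return reasons
```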
Before you start
- Confirm the new articles are actually on the live site (open one in a browser).
- Note when the deploy completed.
- Have Search Console access for the property in question.
Information to collect
- Output of curl -s https://yoursite.com/sitemap.xml | head -50 (first entries and structure).
- Total <url> count from the sitemap on the live site.
- robots.txt content.
- Search Console > Sitemaps > status and last-read date.
- Search Console > Indexing > Pages (formerly the Coverage report): whether the new URLs appear at all, for example as “Discovered - currently not indexed” or “Crawled - currently not indexed”.
Step-by-step fix
Step 1: Verify the sitemap on disk and on the live site match
# Local (grep -o | wc -l counts entries even if the file is one long line)
grep -o '<url>' dist/sitemap.xml | wc -l
head -c 300 dist/sitemap.xml
# Live
curl -s https://yoursite.com/sitemap.xml | grep -o '<url>' | wc -l
curl -s https://yoursite.com/sitemap.xml | head -c 300
The counts should match. If the live count is lower, you have a CDN cache issue; purge the cache as described in the CDN stale-content article.
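A count mismatch tells you the files differ; diffing the actual <loc> values tells you which URLs the live copy is missing, which is what you need when deciding whether a purge worked. A standard-library Python sketch; how you obtain the two strings (reading dist/sitemap.xml, fetching the live URL) is left to your own tooling, and `compare_sitemaps` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def url_locs(sitemap_xml: str) -> set[str]:
    """Return every <loc> inside <url> entries, independent of line breaks."""
    root = ET.fromstring(sitemap_xml)
    return {(el.text or "").strip() for el in root.findall("sm:url/sm:loc", NS)}


def compare_sitemaps(local_xml: str, live_xml: str) -> dict:
    local, live = url_locs(local_xml), url_locs(live_xml)
    return {
        "local_count": len(local),
        "live_count": len(live),
        # In the build but not served: the classic stale-CDN symptom.
        "missing_live": sorted(local - live),
        "extra_live": sorted(live - local),
    }
```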
Step 2: Make robots.txt advertise the sitemap
public/robots.txt:
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap-index.xml
Use the index URL if you have one, otherwise sitemap.xml. Multiple Sitemap: lines are allowed.
After rebuild, curl -s https://yoursite.com/robots.txt must show the Sitemap line.
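To make that check part of a script, Python's urllib.robotparser can read the Sitemap: lines for you (RobotFileParser.site_maps(), available since Python 3.8). A minimal sketch; fetching robots.txt is left out so the function can be tested offline, and `advertised_sitemaps` is a hypothetical name.

```python
from urllib.robotparser import RobotFileParser


def advertised_sitemaps(robots_txt: str) -> list[str]:
    """Return the Sitemap: URLs declared in a robots.txt body (empty list if none)."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return rp.site_maps() or []
```

A CI step can fail the build when this returns an empty list, or when the expected index URL is not in it.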
Step 3: Resubmit in Search Console
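Open search.google.com/search-console, pick the property, and go to Sitemaps (left nav).
Under “Add a new sitemap”, enter:
https://yoursite.com/sitemap-index.xml
and click Submit. (For a URL-prefix property the field already contains the prefix, so enter only sitemap-index.xml.)
For an active property Google often reads it within minutes; a newer property can take longer.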
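If a sitemap is already listed and shows “Couldn’t fetch” or a stale “Last read”:
- Submit the same URL again; this asks Google to re-fetch it.
- If the status does not change, remove it from the list and re-add it with the same URL.
- Either way this requests a fresh fetch; it does not guarantee one.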
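Step 4: Do not rely on ping endpoints
Older guides recommend a programmatic re-ping:
# Google (deprecated)
curl "https://www.google.com/ping?sitemap=https://yoursite.com/sitemap-index.xml"
# Bing (deprecated)
curl "https://www.bing.com/ping?sitemap=https://yoursite.com/sitemap-index.xml"
Google deprecated its sitemap ping endpoint in 2023, and it no longer triggers a fetch. Bing has likewise deprecated anonymous ping submission and points publishers to IndexNow. Treat these calls as no-ops: the Sitemap: line in robots.txt, accurate <lastmod> values, and submission through Search Console (or Bing Webmaster Tools) are the supported signals.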
Step 5: Make sure <lastmod> is meaningful
Build config (Astro example, astro.config.mjs):
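import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";
export default defineConfig({
  // @astrojs/sitemap needs the canonical site URL to build absolute <loc> values.
  site: "https://yoursite.com",
  integrations: [
    sitemap({
      // Stamps every URL with the build time; see the note below.
      lastmod: new Date(),
      changefreq: "weekly",
      priority: 0.7,
    }),
  ],
});
A single lastmod: new Date() marks every URL as changed on every build, which hides real changes just as badly as stale dates do; Google uses <lastmod> only when it is consistently accurate. For per-article precision, set lastmod per URL from the article frontmatter’s updatedAt (or publishedAt if the article was never edited), for example through the integration’s serialize option. Google ignores changefreq and priority, so they are optional.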
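Outside Astro, per-URL precision amounts to one rule: each <url> gets the date that article last changed, falling back to its publish date. A standard-library Python sketch; the article records and their `url`, `updated_at` and `published_at` fields are hypothetical stand-ins for your frontmatter, and `build_sitemap` is a hypothetical name.

```python
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(articles: list[dict]) -> str:
    """Render a <urlset> where each <lastmod> is that article's own last-edit date.

    Each article dict is assumed to have "url", "published_at" (a date) and an
    optional "updated_at" (a date or None); these field names are hypothetical.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for art in articles:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = art["url"]
        # Prefer the real edit date; fall back to the publish date if never edited.
        changed = art.get("updated_at") or art["published_at"]
        ET.SubElement(url, "lastmod").text = changed.isoformat()
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {"url": "https://yoursite.com/articles/new-article-slug",
         "published_at": date(2024, 5, 1), "updated_at": None},
    ]))
```

Because dates only move when content does, a deploy that adds five articles changes five <lastmod> values rather than all of them.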
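Step 6: Check the sitemap automatically on deploy
Add a GitHub Actions step that runs after every successful production deploy and fails the job if the live sitemap or robots.txt is wrong (it assumes the job still has the build output in dist/):
- name: Verify live sitemap and robots.txt
  run: |
    live_count=$(curl -fsS https://yoursite.com/sitemap.xml | grep -o '<url>' | wc -l)
    build_count=$(grep -o '<url>' dist/sitemap.xml | wc -l)
    test "$live_count" -eq "$build_count"
    curl -fsS https://yoursite.com/robots.txt | grep -qi '^sitemap:'
If you also want CI to submit the sitemap explicitly, do it through the Search Console API rather than the ping URLs. Together these take sitemap publication off the manual release checklist.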
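For explicit submission from CI, the Search Console API has a sitemaps.submit method, an HTTP PUT on the property’s sitemaps collection. A sketch assuming you already hold an OAuth 2.0 access token with the https://www.googleapis.com/auth/webmasters scope for an account (for example a service account) that has been added to the property; token acquisition is out of scope, and `submit_url` / `submit_sitemap` are hypothetical helper names. Domain properties use the sc-domain:yoursite.com form as the site identifier.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://www.googleapis.com/webmasters/v3"


def submit_url(site_url: str, sitemap_url: str) -> str:
    """Build the sitemaps.submit endpoint; both path parts must be fully URL-encoded."""
    site = urllib.parse.quote(site_url, safe="")
    feed = urllib.parse.quote(sitemap_url, safe="")
    return f"{API_ROOT}/sites/{site}/sitemaps/{feed}"


def submit_sitemap(site_url: str, sitemap_url: str, access_token: str) -> int:
    """PUT the submission and return the HTTP status (urlopen raises on 4xx/5xx)."""
    req = urllib.request.Request(
        submit_url(site_url, sitemap_url),
        data=b"",  # empty body, so urllib sends Content-Length: 0 with the PUT
        method="PUT",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

As with the UI, a successful response means the submission was accepted, not that the sitemap has already been read; “Last read” in Search Console remains the confirmation.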
Step 7: Use Search Console URL Inspection for priority pages
For especially important new pages, use URL Inspection in Search Console:
Top search bar in Search Console > paste URL > Inspect
> "URL is not on Google" > Request Indexing
Request Indexing is capped by a small daily quota (roughly 10 URLs per property), so save it for cornerstone content.
Verify
- curl -s https://yoursite.com/sitemap.xml | grep -o '<url>' | wc -l matches your expected count.
- Search Console > Sitemaps > “Last read” updates to today’s date.
- Search Console > Sitemaps > “Discovered pages” climbs to your new total within a few days.
- A site:yoursite.com/articles/new-article-slug search shows the page, usually within 3-7 days (site: results are approximate; URL Inspection is the more reliable check).
- New articles appear as indexed in Search Console > Indexing > Pages, usually within 1-2 weeks.
Long-term prevention
- Always advertise the sitemap in robots.txt; it is one line and removes a whole class of discovery failures.
- Automate post-deploy checks of the live sitemap and robots.txt, and submit through Search Console (or its API) rather than the deprecated ping endpoints.
- Use accurate <lastmod> per URL; stale lastmod tells crawlers “nothing changed, don’t bother”.
- For sites over 50K URLs, use a sitemap index with per-section sitemaps to keep each file under the 50MB (uncompressed) / 50K URL limit.
- Review the Search Console Page indexing report weekly; sudden drops or stalled discovery are early warning signs.
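Splitting is mechanical: chunk the URL list into files of at most 50,000 entries and write an index that points at each chunk. A Python sketch under stated assumptions: chunk files are named sitemap-0.xml, sitemap-1.xml and so on (a hypothetical naming scheme), they are served from `base_url`, and the 50MB size limit is not checked. `split_sitemaps` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # protocol limit per sitemap file


def _urlset(locs):
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in locs:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = loc
    return ET.tostring(root, encoding="unicode")


def split_sitemaps(locs, base_url, max_urls=MAX_URLS):
    """Return ({filename: urlset_xml}, index_xml) with at most max_urls per file."""
    files = {}
    for i in range(0, len(locs), max_urls):
        files[f"sitemap-{i // max_urls}.xml"] = _urlset(locs[i:i + max_urls])
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        # The index lists absolute URLs of each chunk file.
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + name
    return files, ET.tostring(index, encoding="unicode")
```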
Common pitfalls
- Submitting a sitemap URL without first confirming it returns HTTP 200; if the URL 404s, Search Console shows “Couldn’t fetch” and none of its URLs are read.
- Listing URLs in the sitemap that are blocked by robots.txt — crawler reports “Submitted URL blocked by robots.txt” errors.
- Resubmitting on a tight loop (for example every minute); it does not make Google fetch any faster, and the Search Console API enforces usage quotas.
- Using XML files larger than 50MB (uncompressed) or with more than 50K URLs; split them into multiple sitemaps via an index.
- Mixing http:// and https:// URLs in the sitemap; pick canonical and stick with it.
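A quick guard for that last pitfall is to compare every <loc> against the canonical origin. A Python sketch; `off_canonical` is a hypothetical name, and it checks only scheme and hostname (ports and paths are ignored).

```python
from urllib.parse import urlsplit


def off_canonical(locs, canonical_origin):
    """Return sitemap URLs whose scheme or host differ from the canonical origin.

    canonical_origin is e.g. "https://yoursite.com"; comparing scheme and
    hostname catches http:// vs https:// and www vs bare-domain mixes.
    """
    want = urlsplit(canonical_origin)
    bad = []
    for loc in locs:
        got = urlsplit(loc)
        # urlsplit lowercases .hostname, so case differences are not flagged.
        if (got.scheme, got.hostname) != (want.scheme, want.hostname):
            bad.append(loc)
    return bad
```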
FAQ
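Q: How fast will Google index new sitemap URLs? A: Discovery often happens within 24-48 hours after submission. Full indexing depends on site authority and crawl history, typically 3-14 days, and is never guaranteed.
Q: Does Bing also use sitemap pings? A: Not anymore; Bing has deprecated anonymous ping submission. Submit the sitemap in Bing Webmaster Tools, declare it in robots.txt, and use IndexNow if you want to notify Bing of individual URL changes.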
Q: Do I need a sitemap for a small site? A: For under 100 pages, internal linking is usually enough. But a sitemap is cheap insurance against discovery gaps.
Q: Should I include images in the sitemap? A: Optional. Image sitemaps help image search but are not required for normal page indexing. Add them once the core sitemap is healthy.