Content Site Sitemap Not Resubmitted After Update

You added 50 new articles, but Search Console still shows last month's count: the sitemap was regenerated, but the crawler never read the new version.

You shipped a batch of 50 new articles. A week later Search Console still shows the same URL count it had before the launch, and your new pages do not appear in site: searches. Sitemap generation succeeded, the file is on the live site, but Google has not noticed. Search engines do not poll sitemaps aggressively — they re-fetch on a schedule that depends on prior crawl history, recent change signals, and explicit submission. A sitemap that exists is not the same as a sitemap that has been read.

This article covers how to confirm the sitemap is published correctly, how to prompt a re-fetch, and how to automate submission so it is no longer a manual step.

Common causes

1. Sitemap generated but old version still served

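The build wrote a new sitemap.xml, but the CDN edge cache still returns the old version. The crawler fetches it, sees only the old URLs, and never learns about the new ones.

How to judge: curl -s https://yoursite.com/sitemap.xml | grep -o '<url>' | wc -l counts the entries. (grep -c counts matching lines, so it reports 1 for a minified single-line sitemap.) Compare the result with the same count on your local dist/sitemap.xml.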

2. Sitemap URL not declared in robots.txt

If robots.txt does not have a Sitemap: line, crawlers may not find the sitemap unless explicitly submitted in Search Console.

How to judge: curl -s https://yoursite.com/robots.txt | grep -i sitemap. Empty = sitemap not advertised.

3. Sitemap was submitted once and never resubmitted

Google re-fetches submitted sitemaps on its own schedule, which depends on how often the site has changed in the past. For a site that has historically updated rarely, re-fetches can be 1-2 weeks apart.

How to judge: Search Console > Sitemaps > look at “Last read” column.

4. lastmod dates not updated

Crawlers use <lastmod> in sitemap entries to decide what to re-crawl. If your build keeps stale lastmod values, the crawler sees no signal that anything changed.

How to judge: open sitemap.xml and check whether new articles have a recent <lastmod>, and existing articles’ lastmod reflects their latest edit.

5. Sitemap index split out of sync

Large sites use a sitemap index that points to multiple per-section sitemaps. If new articles went into a section sitemap that the index does not reference (or references under an old URL), the crawler does not discover the new content.

How to judge: curl -s https://yoursite.com/sitemap-index.xml and confirm every sub-sitemap is listed and reachable.
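The index check can also be scripted. The following is a minimal Python sketch using only the standard library; the function names and the idea of passing in an "expected" set of sub-sitemap URLs are assumptions made for illustration, and it assumes the index uses the standard sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def index_locations(index_xml: str) -> set[str]:
    """Return every <sitemap><loc> URL listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", NS)
        if loc.text
    }


def missing_from_index(index_xml: str, expected: set[str]) -> set[str]:
    """Return the expected sub-sitemap URLs that the index does not list."""
    return expected - index_locations(index_xml)
```

Pass it the body of the live index and the set of section sitemaps your build is supposed to emit; a non-empty result names the section files the index has lost track of. Reachability of each listed file still needs a separate HTTP check.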

6. New URLs blocked by robots.txt or noindex

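The sitemap lists the URLs, but each page either carries a noindex directive (<meta name="robots" content="noindex"> or an X-Robots-Tag: noindex response header) or is disallowed in robots.txt. With noindex, the crawler fetches the page and then keeps it out of the index; with a Disallow rule, it never fetches the page at all.

How to judge: curl -s <new-article-url> | grep -i 'noindex' should not match, and curl -sI <new-article-url> | grep -i 'x-robots-tag' should not show noindex. Check robots.txt for a Disallow: rule covering the new URL path.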
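For more than a handful of URLs, both checks can be scripted. This is a rough Python sketch using only the standard library; the function names are illustrative. One caveat: urllib.robotparser applies the first matching rule, which can differ from Google's longest-match behavior on robots.txt files that mix overlapping Allow and Disallow rules, so treat the result as a hint rather than a verdict.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.directives.append((attr.get("content") or "").lower())


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the page opts out of indexing via meta tag or X-Robots-Tag value."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives + [x_robots_tag.lower()])


def blocked_by_robots(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """True if the robots.txt body disallows `agent` from fetching `url`."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, url)
```

Fetch each new article and robots.txt once, then run both functions over the results; any URL flagged by either one should be fixed or removed from the sitemap.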

Before you start

  • Confirm the new articles are actually on the live site (open one in a browser).
  • Note when the deploy completed.
  • Have Search Console access for the property in question.

Information to collect

  • Output of curl -s https://yoursite.com/sitemap.xml | head -50 — first entries and structure.
  • Total <url> count from sitemap on the live site.
  • robots.txt content.
  • Search Console > Sitemaps > status and last-read date.
  • Search Console > Indexing > Pages (formerly Coverage) > whether new URLs appear under “Discovered - currently not indexed” or “Crawled - currently not indexed”.

Step-by-step fix

Step 1: Verify the sitemap on disk and on the live site match

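# Local: count <url> entries and peek at the start of the file.
# grep -o | wc -l counts occurrences, so it also works on minified single-line XML.
grep -o '<url>' dist/sitemap.xml | wc -l
head -c 300 dist/sitemap.xml

# Live: same checks against the deployed file.
curl -s https://yoursite.com/sitemap.xml | grep -o '<url>' | wc -l
curl -s https://yoursite.com/sitemap.xml | head -c 300

The two counts should match. If the live count is lower, the CDN is serving a cached copy: purge it (see the CDN stale article).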
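A count only shows that something is missing, not which entries. The sketch below, in Python with only the standard library, compares the <loc> values of two sitemap documents. The function names are illustrative, and it assumes a plain urlset file (not a sitemap index) in the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Return the set of <url><loc> values in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }


def missing_on_live(local_xml: str, live_xml: str) -> set[str]:
    """Return URLs present in the local build but absent from the live file."""
    return sitemap_urls(local_xml) - sitemap_urls(live_xml)
```

Feed it the contents of dist/sitemap.xml and the body fetched from https://yoursite.com/sitemap.xml. An empty result means the live file already contains every URL from the local build.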

Step 2: Make robots.txt advertise the sitemap

public/robots.txt:

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap-index.xml

Use the index URL if you have one, otherwise sitemap.xml. Multiple Sitemap: lines are allowed.

After rebuild, curl -s https://yoursite.com/robots.txt must show the Sitemap line.
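The same check can run as a post-deploy assertion. A minimal Python sketch, assuming Python 3.8 or later (where RobotFileParser.site_maps() is available); the function name is illustrative.

```python
from urllib.robotparser import RobotFileParser


def advertised_sitemaps(robots_txt: str) -> list[str]:
    """Return the URLs from every `Sitemap:` line in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None (not an empty list) when no Sitemap line exists.
    return parser.site_maps() or []
```

An empty list from the live robots.txt means the Sitemap line did not survive the build or is being served from a stale cache.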

Step 3: Resubmit in Search Console

search.google.com/search-console > pick the property > Sitemaps (left nav).

Add a new sitemap:
  https://yoursite.com/sitemap-index.xml

Submit

Google usually reads it within minutes for an active property; a newer property can take longer.

If a sitemap is already listed and shows “Couldn’t fetch” or stale “Last read”:

  1. Remove it from the list.
  2. Re-add it with the same URL.

Removing and re-adding the entry usually prompts a fresh fetch.

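Step 4: Notify search engines directly

The unauthenticated ping endpoints that older guides recommend no longer work: Google retired https://www.google.com/ping?sitemap=... in 2023, and Bing retired its anonymous ping in 2022. The authenticated replacements are:

# Google: resubmit through the Search Console API.
# $TOKEN is a placeholder for an OAuth 2.0 access token with access to the property
# (shown here for a URL-prefix property, https://yoursite.com/).
curl -X PUT -H "Authorization: Bearer $TOKEN" -H "Content-Length: 0" \
  "https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fyoursite.com%2F/sitemaps/https%3A%2F%2Fyoursite.com%2Fsitemap-index.xml"

# Bing: announce a changed URL via IndexNow.
# $INDEXNOW_KEY is a placeholder for a key whose key file is hosted on your site.
curl "https://www.bing.com/indexnow?url=https://yoursite.com/articles/new-article-slug&key=$INDEXNOW_KEY"

Neither call guarantees an immediate crawl, and neither replaces an accurate sitemap, but both give the engine an explicit change signal.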

Step 5: Make sure <lastmod> is meaningful

Build config (Astro example, astro.config.mjs):

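import { defineConfig } from "astro/config";
import sitemap from "@astrojs/sitemap";

export default defineConfig({
  // @astrojs/sitemap needs the canonical origin to build absolute URLs.
  site: "https://yoursite.com",
  integrations: [
    sitemap({
      // Stamps every URL with the build time: a baseline only,
      // because it marks unchanged pages as modified on every build.
      lastmod: new Date(),
      changefreq: "weekly",
      priority: 0.7,
    }),
  ],
});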

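For per-article precision, source lastmod from the article frontmatter’s updatedAt (or publishedAt if the article has never been edited). Google ignores changefreq and priority, so lastmod is the field worth getting right.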
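To confirm the generated file actually carries useful dates, a check along these lines can run after the build. It is a Python sketch using only the standard library; the function names and the "since" cutoff (for example, the date the batch was published) are assumptions for illustration. Only the date part of each <lastmod> is compared, and values that fail to parse are treated as missing.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmods(xml_text: str) -> dict[str, Optional[date]]:
    """Map each <loc> to the date part of its <lastmod>, or None if absent."""
    result: dict[str, Optional[date]] = {}
    for url in ET.fromstring(xml_text).findall("sm:url", NS):
        loc = (url.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        raw = (url.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        try:
            # Keep only YYYY-MM-DD so date-only and full timestamps compare alike.
            result[loc] = date.fromisoformat(raw[:10]) if raw else None
        except ValueError:
            result[loc] = None
    return result


def stale_or_missing(xml_text: str, urls: list[str], since: date) -> list[str]:
    """Return the given URLs that are absent, undated, or dated before `since`."""
    mods = lastmods(xml_text)
    return [u for u in urls if mods.get(u) is None or mods[u] < since]
```

Run stale_or_missing with the list of new article URLs and the publish date; anything it returns is a page whose sitemap entry gives crawlers no reason to visit.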

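Step 6: Automate resubmission on deploy

The old unauthenticated ping URLs (https://www.google.com/ping?sitemap=... and its Bing equivalent) have been retired, so a deploy hook should call the Search Console API instead. Add a GitHub Actions step that runs after every successful production deploy. It assumes an earlier step has minted a short-lived OAuth access token and exported it to the job environment as GSC_TOKEN (a placeholder name):

- name: Resubmit sitemap to Google
  run: |
    curl -sf -X PUT \
      -H "Authorization: Bearer $GSC_TOKEN" -H "Content-Length: 0" \
      "https://www.googleapis.com/webmasters/v3/sites/https%3A%2F%2Fyoursite.com%2F/sitemaps/https%3A%2F%2Fyoursite.com%2Fsitemap-index.xml"

This removes manual submission from the release checklist. For Bing, add an IndexNow call for the changed URLs to the same job.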
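If the deploy job runs a script rather than raw curl, the Search Console API's sitemaps.submit call (an authenticated PUT) is one option that does not depend on the ping URLs. The Python sketch below only builds the request; sending it and obtaining the OAuth token are left out. The function names are illustrative, and the site URL shown assumes a URL-prefix property.

```python
import urllib.request
from urllib.parse import quote

API_ROOT = "https://www.googleapis.com/webmasters/v3"


def submit_endpoint(site_url: str, sitemap_url: str) -> str:
    """Build the sitemaps.submit endpoint; both URLs must be fully percent-encoded."""
    return (
        f"{API_ROOT}/sites/{quote(site_url, safe='')}"
        f"/sitemaps/{quote(sitemap_url, safe='')}"
    )


def build_submit_request(site_url: str, sitemap_url: str, token: str) -> urllib.request.Request:
    """Prepare (but do not send) the authenticated PUT request."""
    return urllib.request.Request(
        submit_endpoint(site_url, sitemap_url),
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the result to urllib.request.urlopen sends the submission. The percent-encoding is the part that most often goes wrong when the URL is assembled by hand, which is why it is isolated in its own function here.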

Step 7: Use Search Console URL Inspection for priority pages

For especially important new pages, use URL Inspection in Search Console:

Top search bar in Search Console > paste URL > Inspect
> "URL is not on Google" > Request Indexing

Google applies a daily quota to these requests (roughly 10 per property in practice; the exact number is not published), so reserve them for cornerstone content.

Verify

  • curl -s https://yoursite.com/sitemap.xml | grep -o '<url>' | wc -l matches your expected count.
  • Search Console > Sitemaps > “Last read” updates to today’s date.
  • Search Console > Sitemaps > “Discovered URLs” climbs to your new total within a few days.
  • site:yoursite.com/articles/new-article-slug shows the page within 3-7 days.
  • New articles appear as indexed under Indexing > Pages (formerly Coverage) within 1-2 weeks.

Long-term prevention

  • Always advertise the sitemap in robots.txt; it is one line and removes a whole class of discovery failures.
  • Automate sitemap resubmission (Search Console API for Google, IndexNow for Bing) as part of your deploy pipeline; the old anonymous ping URLs are retired.
  • Use accurate <lastmod> per URL — stale lastmod tells crawlers “nothing changed, don’t bother”.
  • For sites over 50K URLs, use a sitemap index with per-section sitemaps to keep individual files under the 50MB / 50K URL limit.
  • Monitor the Search Console Pages report (formerly Coverage) weekly; sudden drops or stalled discovery are early warning signs.

Common pitfalls

  • Submitting a sitemap URL without first checking that it returns HTTP 200; if the URL 404s, the submission is accepted and only later shows “Couldn’t fetch”.
  • Listing URLs in the sitemap that are blocked by robots.txt — crawler reports “Submitted URL blocked by robots.txt” errors.
  • Resubmitting far more often than the content changes (for example every minute); it does not speed up crawling and can run into rate limits.
  • Using XML files larger than 50MB or with more than 50K URLs; split into multiple sitemaps via an index.
  • Mixing http:// and https:// URLs in the sitemap; pick canonical and stick with it.
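Two of these pitfalls, the size limits and mixed schemes, are easy to catch before deploy. A minimal Python sketch using only the standard library; the function name and message wording are illustrative, and it assumes an uncompressed urlset file in the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def sitemap_problems(xml_bytes: bytes) -> list[str]:
    """Return human-readable problems found in one sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    root = ET.fromstring(xml_bytes)
    locs = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the 50,000 limit")
    schemes = sorted({urlsplit(u).scheme for u in locs})
    if len(schemes) > 1:
        problems.append("mixed URL schemes: " + ", ".join(schemes))
    return problems
```

Running it over every file in dist/ as a build step turns these pitfalls into a failed build instead of a Search Console error discovered a week later.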

FAQ

Q: How fast will Google index new sitemap URLs? A: Discovery typically happens within 24-48 hours of submission. Full indexing depends on site authority and typically takes 3-14 days.

Q: Does Bing also use sitemap pings? A: Not anymore; Bing retired its anonymous ping URL. Submit the sitemap in Bing Webmaster Tools, or notify Bing of changed URLs through IndexNow.

Q: Do I need a sitemap for a small site? A: For under 100 pages, internal linking is usually enough. But a sitemap is cheap insurance against discovery gaps.

Q: Should I include images in the sitemap? A: Optional — image sitemaps help image search but are not required for normal page indexing. Add them once core sitemap is healthy.

Tags: #content-site #Ops #Troubleshooting