You updated an article. Days pass. Google’s cached version still shows the old content. dateModified in your JSON-LD is correct and the page itself renders the new content, but Google hasn’t re-crawled. One likely reason: your server isn’t returning a Last-Modified HTTP header (or returns the same value for every request regardless of content change). Without a reliable Last-Modified, Google has one less signal that the page changed since its last crawl, and may de-prioritize recrawling it in favor of other pages.
This is most common on static sites: Vercel and Netlify usually set it correctly, but Firebase Hosting and some self-hosted setups don’t.
Common causes
Ordered by hit rate, highest first.
1. Hosting platform doesn’t set Last-Modified
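Firebase Hosting, GitHub Pages, and some custom nginx configs may not set a useful Last-Modified for static files: it can be missing, or it can reflect deploy time rather than content changes. Platform behavior changes over time, so check with curl rather than assuming.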
How to spot it:
curl -sI "https://yoursite.com/article" | grep -i last-modified
Empty result = not set.
2. CDN strips the header
Origin returns Last-Modified. Cloudflare or another CDN strips it for caching reasons.
How to spot it: Two curl -I commands — one against your origin (bypassing CDN), one normal. If origin has Last-Modified but the CDN-served version doesn’t, CDN is the issue.
3. Last-Modified is always the same value (deploy time)
Some static-site generators set Last-Modified to the build time, so every page on the site has the same value and that value changes on every deploy. Because it no longer tracks real content changes, Google is likely to treat it as a non-signal.
How to spot it: curl -I two different pages. If they share the same Last-Modified, this is the bug.
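The same check can be scripted across more than two pages. The sketch below is one possible way to do it, not a standard tool: `findSharedLastModified` and `checkPages` are hypothetical names, and the built-in `fetch` assumes Node 18 or later. Two pages that were genuinely edited in the same second would also be flagged, so treat a hit as a prompt to look closer.

```javascript
// Group page URLs by the Last-Modified value each one returned.
// `lastModifiedByUrl` is a hypothetical plain object: { url: headerValue or null }.
function findSharedLastModified(lastModifiedByUrl) {
  const groups = new Map();
  for (const [url, value] of Object.entries(lastModifiedByUrl)) {
    if (!value) continue; // a missing header is a different problem (cause 1)
    if (!groups.has(value)) groups.set(value, []);
    groups.get(value).push(url);
  }
  // Keep only the values that appear on more than one page.
  return [...groups.entries()]
    .filter(([, urls]) => urls.length > 1)
    .map(([value, urls]) => ({ value, urls }));
}

// Usage sketch: HEAD each URL and report values shared between pages.
async function checkPages(urls) {
  const seen = {};
  for (const url of urls) {
    const res = await fetch(url, { method: 'HEAD' });
    seen[url] = res.headers.get('last-modified');
  }
  return findSharedLastModified(seen);
}
```

An empty array from `checkPages` means every page that sent the header sent a distinct value.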
4. Cache-Control: no-store overrides
If you set Cache-Control: no-store, Last-Modified becomes meaningless because clients won’t cache anyway. Google may also discount the signal.
How to spot it:
curl -sI "https://yoursite.com/article" | grep -iE 'cache-control|last-modified'
If cache-control: no-store and last-modified are both present, the two signals conflict.
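For scripted checks, the same conflict can be detected by parsing the Cache-Control directives instead of eyeballing grep output. This is a minimal sketch; `parseCacheControl` and `hasNoStoreConflict` are hypothetical helper names, and the `headers` argument is assumed to be a plain object with lower-cased header names.

```javascript
// Split a Cache-Control value into a set of lower-cased directive names.
// "public, max-age=3600" -> Set { 'public', 'max-age' }
function parseCacheControl(value) {
  return new Set(
    (value || '')
      .split(',')
      .map((directive) => directive.trim().toLowerCase().split('=')[0])
      .filter(Boolean)
  );
}

// True when a response sends Last-Modified but also forbids storing it.
// `headers` is assumed to be a plain object keyed by lower-cased header name.
function hasNoStoreConflict(headers) {
  const directives = parseCacheControl(headers['cache-control']);
  return directives.has('no-store') && Boolean(headers['last-modified']);
}
```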
5. Using ETag instead, but missing Last-Modified
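ETag on its own is valid: Google’s crawlers support both ETag / If-None-Match and Last-Modified / If-Modified-Since, and when both are present the ETag takes precedence, as the HTTP standard requires. If you only emit ETag, you’re partially covered but missing the date-based signal that other crawlers and tools rely on.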
How to spot it: Headers have ETag but no Last-Modified. Add both.
6. Page is served by a worker / SSR function that doesn’t set headers
A Cloudflare Worker or Vercel Edge function that generates the page doesn’t add Last-Modified by default.
How to spot it: Page is dynamically generated (check cf-ray or server header) AND Last-Modified is missing.
Shortest path to fix
Step 1: Verify the missing header
curl -sI "https://yoursite.com/article" | head -10
Look for Last-Modified line. Absent = need to add.
Step 2: Enable per-platform
Vercel — works by default for static files. For SSR routes, add to response:
return new Response(html, {
  headers: { 'Last-Modified': new Date(article.modifiedAt).toUTCString() },
});
Netlify — works by default. Custom headers via _headers file:
/article/*
  Last-Modified: ...
(Note: dynamic value isn’t supported in _headers; use Netlify Functions for that.)
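A Netlify Function that sets the header per article might look like the sketch below. It uses the classic `handler(event)` signature that returns `statusCode`, `headers`, and `body`. The `ARTICLES` store and its paths are hypothetical stand-ins for a CMS or database, and it assumes `event.path` carries the requested article path (for example via a rewrite to the function).

```javascript
// Hypothetical in-memory article store; a real site would query a CMS or database.
const ARTICLES = {
  '/article/hello': { html: '<h1>Hello</h1>', modifiedAt: '2025-10-21T07:28:00Z' },
};

// Classic Netlify Functions handler: takes an event, returns a response object.
const handler = async (event) => {
  const article = ARTICLES[event.path]; // assumption: path matches a store key
  if (!article) {
    return { statusCode: 404, body: 'Not found' };
  }
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'text/html; charset=utf-8',
      // Per-article value derived from the content's own timestamp.
      'Last-Modified': new Date(article.modifiedAt).toUTCString(),
    },
    body: article.html,
  };
};

module.exports = { handler };
```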
Firebase Hosting — explicit headers in firebase.json:
{
  "hosting": {
    "headers": [{
      "source": "**/*.html",
      "headers": [
        { "key": "Last-Modified", "value": "Tue, 21 Oct 2025 07:28:00 GMT" }
      ]
    }]
  }
}
Firebase static doesn’t compute it dynamically; you can set a per-deploy value via build script.
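One way to stamp that per-deploy value is a small Node script that rewrites firebase.json before `firebase deploy`. This is a sketch under assumptions: `withLastModified` and `updateFirebaseJson` are hypothetical names, and the rule is keyed on the same `**/*.html` source used above. Every HTML page still gets the same value, so cause 3 applies; this only fits sites where a deploy really does mean the content changed.

```javascript
const fs = require('fs');

// Return a copy of a firebase.json config whose "**/*.html" header rule
// carries a Last-Modified value for `date`, replacing any previous one.
function withLastModified(config, date) {
  const source = '**/*.html';
  const hosting = config.hosting || {};
  // Copy the rules so the input object is left untouched.
  const rules = (hosting.headers || []).map((rule) => ({
    ...rule,
    headers: [...(rule.headers || [])],
  }));
  let rule = rules.find((r) => r.source === source);
  if (!rule) {
    rule = { source, headers: [] };
    rules.push(rule);
  }
  // Drop a stale Last-Modified entry, keep any other headers on the rule.
  rule.headers = rule.headers.filter((h) => h.key.toLowerCase() !== 'last-modified');
  rule.headers.push({ key: 'Last-Modified', value: date.toUTCString() });
  return { ...config, hosting: { ...hosting, headers: rules } };
}

// Usage sketch: call this from the build script just before deploying.
function updateFirebaseJson(path = 'firebase.json', date = new Date()) {
  const config = JSON.parse(fs.readFileSync(path, 'utf8'));
  const updated = withLastModified(config, date);
  fs.writeFileSync(path, JSON.stringify(updated, null, 2) + '\n');
}
```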
GitHub Pages — doesn’t support custom headers. Move off it if Last-Modified matters.
Cloudflare Workers — manually add to Response:
return new Response(body, {
  headers: {
    'Last-Modified': lastModifiedDate.toUTCString(),
    'Cache-Control': 'public, max-age=3600',
  },
});
Step 3: Make the value reflect actual content change
Per article:
const lastModified = new Date(article.modifiedAt || article.publishedAt).toUTCString();
NOT build time for all pages.
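The one-liner above assumes the article always has a valid timestamp. A slightly more defensive version, sketched below with a hypothetical `lastModifiedHeader` helper, returns null when neither field parses (so the caller can omit the header) and caps future dates at the current time, since an origin should not send a Last-Modified later than the moment it generates the response.

```javascript
// Build a Last-Modified header value from an article record.
// Returns null when no usable timestamp exists, so the caller can skip the header.
function lastModifiedHeader(article, now = new Date()) {
  const raw = article.modifiedAt || article.publishedAt;
  if (!raw) return null; // new Date(null) would silently become 1970
  const date = new Date(raw);
  if (Number.isNaN(date.getTime())) return null; // unparseable timestamp
  // Never advertise a modification time in the future.
  const capped = date > now ? now : date;
  return capped.toUTCString();
}
```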
Step 4: Don’t let CDN strip it
Cloudflare → Rules → Transform Rules → check that no response header rule removes Last-Modified. By default it’s preserved.
Step 5: Verify both origin and CDN
# Origin (replace with your origin IP or skip-CDN URL)
curl -sI "https://origin.yoursite.com/article" | grep -i last-modified
# CDN-served
curl -sI "https://yoursite.com/article" | grep -i last-modified
Both should show Last-Modified. If origin has it but CDN doesn’t, CDN is stripping.
Step 6: Use If-Modified-Since to test caching behavior
LM=$(curl -sI "https://yoursite.com/article" | grep -i '^last-modified:' | cut -d' ' -f2- | tr -d '\r')
curl -sI -H "If-Modified-Since: $LM" "https://yoursite.com/article"
Should return 304 Not Modified if the page hasn’t changed. If it returns 200, the server (or the function generating the page) isn’t honoring conditional requests.
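Static hosts usually answer If-Modified-Since on their own, but a Worker or SSR function has to do the comparison itself. The sketch below shows one way to do that with the standard `Request` and `Response` objects (available in Workers, edge runtimes, and Node 18+). `isNotModified` and `respond` are hypothetical names, and the `article` shape is assumed to match the earlier snippets.

```javascript
// Decide whether a conditional request can be answered with 304.
// HTTP dates have one-second resolution, so compare whole seconds.
function isNotModified(ifModifiedSince, lastModified) {
  if (!ifModifiedSince) return false;
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return false; // invalid date: ignore the header
  return Math.floor(lastModified.getTime() / 1000) <= Math.floor(since / 1000);
}

// Answer with 304 and no body when the client's copy is still current.
function respond(request, article) {
  const lastModified = new Date(article.modifiedAt);
  const headers = { 'Last-Modified': lastModified.toUTCString() };
  if (isNotModified(request.headers.get('If-Modified-Since'), lastModified)) {
    return new Response(null, { status: 304, headers });
  }
  return new Response(article.html, {
    status: 200,
    headers: { ...headers, 'Content-Type': 'text/html; charset=utf-8' },
  });
}
```

A client whose If-Modified-Since is older than the article’s timestamp gets the full 200 response with the new Last-Modified.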
Prevention
- Right after every deploy, curl -I three representative pages and confirm Last-Modified is present and per-page distinct.
- Set Last-Modified from the article’s actual modifiedAt field, not from build time.
- Don’t strip the header in CDN config.
- Pair Last-Modified with ETag (both is best practice).
- For CDN-served HTML, add cache rules that respect Last-Modified and If-Modified-Since.
Related
- Pagination canonical confusion
- Article date JSON-LD mismatch
- Structured data rich results warning
- Stale articles not updated
Tags: #SEO #Troubleshooting