Google Crawls My Homepage But Never the Article Pages

Search Console shows the homepage gets crawled regularly but inner article pages stay at "Discovered — currently not indexed" or are never crawled.

Search Console → Crawl Stats shows Googlebot hits your homepage dozens of times per day, but /articles/* combined gets just a handful. New articles aren’t crawled for weeks after publishing. The sitemap was submitted long ago and doesn’t help.

Two root causes: Google can’t find article URLs from the homepage (discovery failure), or it deliberately won’t spend budget on article pages (allocation failure). Different fixes apply.

Symptoms

  • Crawl Stats shows daily homepage hits but rare hits to /articles/*
  • New articles take 2-4 weeks to be discovered (well over the normal 3-7 days)
  • Sitemap is submitted but Google rarely fetches its URLs
  • Article URLs in URL Inspection are mostly “Discovered” or “URL unknown to Google”

Quick verdict

Either Google can’t find article URLs from the homepage, or it has decided not to spend budget on them. Both are fixable, but the fixes differ. Step 1 below splits the diagnosis.

Common causes

1. Homepage uses JS-rendered article links Google can’t see

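This is the most common cause. The “latest articles” list is rendered client-side by React/Vue/Svelte, so the links are missing from the initial HTML. Google can render JavaScript, but rendering is deferred and less reliable than reading plain <a href> links, so these URLs are discovered late or not at all.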

How to confirm:

# Look at homepage without executing JS
curl -sL https://yourdomain.com/ > home.html

# Count article links
grep -oE 'href="/articles/[^"]+"' home.html | wc -l
# 0 or very few = links rendered by JS
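
The same check can be scripted, for example as part of a deploy check. The sketch below is a minimal version that assumes the /articles/ URL prefix used throughout this guide and works on an HTML string you have already fetched; the two sample strings are made up for illustration.

```javascript
// Count distinct article links present in raw (un-rendered) HTML.
// `html` is the response body as a string; "/articles/" matches this guide's URL layout.
function countArticleLinks(html) {
  const pattern = /href="(\/articles\/[^"]+)"/g;
  const unique = new Set();
  for (const match of html.matchAll(pattern)) {
    unique.add(match[1]);
  }
  return unique.size;
}

// Hypothetical samples: a server-rendered list vs. an empty mount point that JS fills in later.
const serverRendered =
  '<ul><li><a href="/articles/foo/">Foo</a></li><li><a href="/articles/bar/">Bar</a></li></ul>';
const clientOnly = '<div id="latest-posts"></div><script src="/app.js"></script>';

console.log(countArticleLinks(serverRendered)); // 2
console.log(countArticleLinks(clientOnly));     // 0
```

A count of 0 on the raw HTML of a page that visibly lists articles in the browser points to client-side rendering.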

2. Homepage shows only 5-10 latest, older articles 3+ clicks deep

Homepage (5 latest) → /blog (pagination page 1) → /blog/page/2 → article

Pages three or more clicks from the homepage tend to be treated as low priority.
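
Click depth can be measured rather than guessed. The sketch below runs a breadth-first search over an internal-link graph; the `links` object is hypothetical data shaped like the chain above, and in practice you would build it from a crawl of your own site.

```javascript
// Breadth-first search over an internal-link graph to get each URL's click depth from a start page.
// `links` maps a page path to the paths it links to.
function clickDepths(links, start) {
  const depth = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length > 0) {
    const page = queue.shift();
    for (const target of links[page] ?? []) {
      if (!depth.has(target)) {
        depth.set(target, depth.get(page) + 1);
        queue.push(target);
      }
    }
  }
  return depth;
}

// Hypothetical site: one new article linked from the homepage, one old article behind pagination.
const links = {
  '/': ['/articles/new-post/', '/blog'],
  '/blog': ['/blog/page/2'],
  '/blog/page/2': ['/articles/old-post/'],
};

const depths = clickDepths(links, '/');
console.log(depths.get('/articles/new-post/')); // 1
console.log(depths.get('/articles/old-post/')); // 3
```

Any article that comes back with a depth of 3 or more is a candidate for a direct link from an index or hub page.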

3. Article URLs not in sitemap, or sitemap too large to parse

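# Count article URLs in the sitemap (grep -o counts every match, even if the file is a single line)
curl -s https://yourdomain.com/sitemap.xml | grep -oE '<loc>[^<]*/articles/[^<]*</loc>' | wc -l

# Should equal your actual article count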

Hand-maintained sitemaps often miss new articles. A single sitemap file is limited to 50 MB uncompressed and 50,000 URLs; Google cannot process files beyond those limits.
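
A count only tells you that something is missing, not which URLs. A minimal sketch of a coverage check follows; it assumes you can produce a list of published article URLs from your CMS or build output (the `published` array here is made up), and it uses a regular expression rather than a full XML parser, so it does not decode XML entities.

```javascript
// Extract every <loc> value from a sitemap XML string.
function sitemapLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map((m) => m[1]);
}

// Return the published URLs that the sitemap does not list.
function missingFromSitemap(xml, publishedUrls) {
  const listed = new Set(sitemapLocs(xml));
  return publishedUrls.filter((url) => !listed.has(url));
}

// Hypothetical data: two published articles, only one of them listed.
const sitemapXml = `
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourdomain.com/</loc></url>
  <url><loc>https://yourdomain.com/articles/foo/</loc></url>
</urlset>`;
const published = [
  'https://yourdomain.com/articles/foo/',
  'https://yourdomain.com/articles/bar/',
];

console.log(missingFromSitemap(sitemapXml, published));
// [ 'https://yourdomain.com/articles/bar/' ]
```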

4. Article pages too thin — Google deprioritizes after one visit

If Googlebot’s first crawl finds thin articles (under roughly 300 words, or mostly template content), it may conclude that the URL pattern isn’t worth revisiting and lower crawl priority for all of /articles/*.

5. Internal links use generic anchor text

<a href="/articles/foo/">read more</a>
<a href="/articles/bar/">view post</a>

Generic anchors = weak discovery signals. Google knows a link exists but not what topic it points to.

6. Article URLs blocked in robots.txt

Disallow: /articles/draft/

If a rule is too broad (e.g., Disallow: /a), it also blocks /articles/, because robots.txt rules match by path prefix.
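
The prefix behavior is easy to see in a few lines. This is a deliberately simplified sketch: it only handles plain Disallow prefixes and ignores Allow rules, wildcards (*), and end anchors ($), all of which a real robots.txt parser also evaluates.

```javascript
// Simplified robots.txt check: a Disallow value blocks any path that starts with it.
// Returns the rules that block `path` (an empty array means nothing matched).
function blockedBy(disallowRules, path) {
  return disallowRules.filter((rule) => rule !== '' && path.startsWith(rule));
}

console.log(blockedBy(['/a'], '/articles/foo/'));               // [ '/a' ]
console.log(blockedBy(['/articles/draft/'], '/articles/foo/')); // []
```

For a definitive answer on a live URL, the URL Inspection tool reports whether robots.txt blocks it.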

7. Server slow / 5xx-ing for Googlebot

If the server is slow (> 3 seconds) or frequently returns 5xx to Googlebot, Google reduces its crawl rate.

How to confirm: Crawl Stats → Crawl responses → “Average response time.” > 1000ms is a problem.
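
If you also keep server access logs, the same two signals can be checked on your side. The sketch below assumes you have already parsed Googlebot requests into `{ status, ms }` records; that record shape and the sample values are hypothetical, and the 1000ms threshold is the one used in this guide.

```javascript
// Summarize Googlebot requests: average response time and share of 5xx responses.
// `hits` is an array of { status, ms } records parsed from your own access logs.
function summarizeCrawlHealth(hits) {
  if (hits.length === 0) return { avgMs: 0, errorRate: 0, slow: false };
  const totalMs = hits.reduce((sum, h) => sum + h.ms, 0);
  const errors = hits.filter((h) => h.status >= 500 && h.status <= 599).length;
  const avgMs = totalMs / hits.length;
  return { avgMs, errorRate: errors / hits.length, slow: avgMs > 1000 };
}

// Hypothetical sample of four Googlebot requests.
const sample = [
  { status: 200, ms: 400 },
  { status: 200, ms: 1800 },
  { status: 503, ms: 2000 },
  { status: 200, ms: 600 },
];

console.log(summarizeCrawlHealth(sample));
// { avgMs: 1200, errorRate: 0.25, slow: true }
```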

Shortest path to fix

Step 1: Distinguish discovery failure vs. allocation failure

# Fetch the homepage HTML without executing JS and count distinct article links
curl -sL https://yourdomain.com/ | grep -oE 'href="/articles/[^"]+"' | sort -u | wc -l
  • 0 or < 5: discovery failure. Go to Steps 2-3.
  • Normal count: allocation failure. Go to Steps 4-7.

Step 2: Switch listing components to SSR / SSG

In Next.js:

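// Wrong: links only exist after a client-side fetch
import { useEffect, useState } from 'react';

function LatestPosts() {
  const [posts, setPosts] = useState([]);
  useEffect(() => { fetch('/api/posts').then(r => r.json()).then(setPosts); }, []);
  return posts.map(p => <a key={p.url} href={p.url}>{p.title}</a>);
}

// Right: getStaticProps (Pages Router) puts the links into the built HTML
// getAllPosts is your own data-loading helper
export async function getStaticProps() {
  const posts = await getAllPosts();
  return { props: { posts } };
}

export default function Home({ posts }) {
  return posts.map(p => <a key={p.url} href={p.url}>{p.title}</a>);
}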

Astro is SSG by default. React-only sites can pre-render via react-snap or similar.

Step 3: Add an /articles index that lists every article

---
// src/pages/articles/index.astro
import { getCollection } from 'astro:content';
const posts = await getCollection('posts');
const sorted = posts.sort((a, b) => b.data.publishedAt.valueOf() - a.data.publishedAt.valueOf());
---
<h1>All articles ({sorted.length})</h1>
<ul>
  {sorted.map(p => (
    <li>
      <a href={`/articles/${p.slug}/`}>{p.data.title}</a>
      <span>{p.data.publishedAt.toLocaleDateString()}</span>
    </li>
  ))}
</ul>

Link to /articles/ from both the homepage and the main nav. That puts every article at most 2 clicks from any page.

Step 4: Fix the sitemap

Every article URL should have a real lastmod:

<url>
  <loc>https://yourdomain.com/articles/foo/</loc>
  <lastmod>2026-05-21</lastmod>
</url>

lastmod must be the real modification time, not “today” for every URL. Google only trusts lastmod when it is consistently accurate.
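
One way to keep lastmod honest is to generate it from the date stored with each article instead of the build time. A minimal sketch, assuming each article record carries a `path` and an `updatedAt` Date (both field names are hypothetical) and that paths contain no characters needing XML escaping:

```javascript
// Build one <url> entry whose <lastmod> comes from the article's own modification date.
// `article.path` and `article.updatedAt` are hypothetical field names; use whatever your CMS stores.
function sitemapEntry(baseUrl, article) {
  const lastmod = article.updatedAt.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return [
    '<url>',
    `  <loc>${baseUrl}${article.path}</loc>`,
    `  <lastmod>${lastmod}</lastmod>`,
    '</url>',
  ].join('\n');
}

const entry = sitemapEntry('https://yourdomain.com', {
  path: '/articles/foo/',
  updatedAt: new Date('2026-05-21T09:30:00Z'),
});
console.log(entry);
```

Because the date comes from the record, rebuilding the site without touching an article leaves its lastmod unchanged.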

For sitemaps above roughly 5,000 URLs, split by content type into a sitemap index (each file may hold at most 50,000 URLs):

<?xml version="1.0" encoding="UTF-8"?>
<!-- sitemap-index.xml -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://yourdomain.com/sitemap-articles.xml</loc></sitemap>
  <sitemap><loc>https://yourdomain.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>
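
When one content type alone outgrows a single file, the split can be generated. The sketch below chunks a URL list and builds the matching index; the file name pattern sitemap-articles-N.xml and the generated post URLs are assumptions for illustration, and the chunk size of 5,000 follows the guideline above.

```javascript
// Split a list of URLs into chunks of at most `size` entries, one chunk per sitemap file.
function chunk(urls, size) {
  const chunks = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

// Build the sitemap index that points at each chunk file.
// The file name pattern sitemap-articles-N.xml is an assumption for this sketch.
function sitemapIndex(baseUrl, chunkCount) {
  const entries = [];
  for (let n = 1; n <= chunkCount; n++) {
    entries.push(`  <sitemap><loc>${baseUrl}/sitemap-articles-${n}.xml</loc></sitemap>`);
  }
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</sitemapindex>',
  ].join('\n');
}

// Hypothetical site with 12,000 articles.
const urls = Array.from({ length: 12000 }, (_, i) => `https://yourdomain.com/articles/post-${i}/`);
const parts = chunk(urls, 5000);
console.log(parts.length); // 3
console.log(sitemapIndex('https://yourdomain.com', parts.length));
```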

Step 5: Audit the 10 most recent articles’ quality baseline

Each must have:

  • Exactly one real <h1>
  • Intro paragraph (80+ words)
  • 600+ word body
  • ≥2 internal links from other articles pointing to it
  • ≥1 image with alt text
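
Most of this checklist can be checked mechanically. The sketch below is a rough audit based on regular expressions, which is enough for a quick pass but less robust than a real HTML parser; the inbound link count is passed in as a number because it has to come from a crawl of the rest of the site.

```javascript
// Rough baseline audit of one article's HTML using regular expressions.
// `inboundInternalLinks` is the number of other articles linking here, measured separately.
function auditArticle(html, inboundInternalLinks) {
  const wordCount = (text) => {
    const words = text.replace(/<[^>]+>/g, ' ').trim().split(/\s+/);
    return words[0] === '' ? 0 : words.length;
  };
  const h1Count = (html.match(/<h1[\s>]/gi) ?? []).length;
  const firstParagraph = (html.match(/<p[^>]*>([\s\S]*?)<\/p>/i) ?? ['', ''])[1];
  const imagesWithAlt = (html.match(/<img\b[^>]*\balt="[^"]+"/gi) ?? []).length;
  return {
    singleH1: h1Count === 1,
    introLongEnough: wordCount(firstParagraph) >= 80,
    bodyLongEnough: wordCount(html) >= 600,
    enoughInboundLinks: inboundInternalLinks >= 2,
    hasImageWithAlt: imagesWithAlt >= 1,
  };
}

// Hypothetical thin article: short text, empty alt attribute, no inbound links.
const thin = '<h1>Foo</h1><p>Too short.</p><img src="/a.png" alt="">';
console.log(auditArticle(thin, 0));
// { singleH1: true, introLongEnough: false, bodyLongEnough: false,
//   enoughInboundLinks: false, hasImageWithAlt: false }
```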

Improve any article that falls below this bar, and hold new ones until they meet it; otherwise they can drag down crawl priority for all of /articles/*.

Step 6: Replace generic anchors with topic-bearing anchor text

<!-- Bad -->
<a href="/articles/foo/">read more</a>

<!-- Good -->
<a href="/articles/foo/">Astro Deploy Vercel Complete Guide</a>

Topic-bearing anchors = strong discovery signal.
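
Generic anchors are easy to find automatically. The sketch below scans an HTML string for links whose visible text is one of a small set of generic phrases; the phrase list is an assumption to extend for your own site, and the regex-based matching is a rough approximation of what an HTML parser would give you.

```javascript
// Flag links whose anchor text carries no topic information.
// The list of generic phrases is an assumption; extend it for your site.
const GENERIC_ANCHORS = new Set(['read more', 'view post', 'more', 'read', 'click here', 'here']);

function genericAnchors(html) {
  const flagged = [];
  for (const m of html.matchAll(/<a\b[^>]*href="([^"]+)"[^>]*>([\s\S]*?)<\/a>/gi)) {
    // Strip nested tags, collapse whitespace, and compare case-insensitively.
    const text = m[2].replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim().toLowerCase();
    if (GENERIC_ANCHORS.has(text)) flagged.push({ href: m[1], text });
  }
  return flagged;
}

const snippet = `
<a href="/articles/foo/">read more</a>
<a href="/articles/foo/">Astro Deploy Vercel Complete Guide</a>`;

console.log(genericAnchors(snippet));
// [ { href: '/articles/foo/', text: 'read more' } ]
```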

Step 7: Fix server response time for Googlebot

# Send a request with Googlebot's user agent and time it
curl -sL -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" \
  -w "%{time_total}\n" -o /dev/null https://yourdomain.com/articles/foo/

# Should be < 1.5s

If it is slow: put a CDN in front, add caching, and reduce server render time.

When this is not on you

On very large sites, Google intentionally throttles crawl rate per directory. That doesn’t mean the site is broken; it means Google has decided how much budget to spend. The way out is to earn backlinks and raise the site’s overall authority so that Google allocates more budget.

Easy to misdiagnose

  • “Setting <priority> in the sitemap helps”: no, Google ignores the value
  • “Resubmitting the sitemap repeatedly helps”: no, Google already knows the sitemap; finding the sitemap isn’t the problem
  • “Adding keywords to the homepage helps”: no, keyword stuffing triggers spam signals instead
  • “Pinging the sitemap speeds things up”: no, Google deprecated the official ping endpoint in 2023

Prevention

  • Never rely solely on JavaScript-rendered links for important pages — always SSR / SSG <a href>
  • Every article ≤2 clicks from anywhere (Homepage → /articles index → article)
  • Build internal link clusters around topics, with hub pages concentrating links
  • Internal link anchors are always topic words — no “more” / “read”
  • Monitor Crawl Stats → response time; treat anything above 1000ms as a trigger to optimize right away

FAQ

Q: Does pinging the sitemap help? A: No. Google deprecated the official ping endpoint in 2023. The biggest signal is whether the URL can be reached directly from authoritative pages on the site.

Q: Should I switch from HTML to JSON sitemap? A: No. Sitemap format isn’t the bottleneck, and Google reads XML, RSS/Atom, and plain-text sitemaps, not JSON.

Q: Can Cloudflare caching improve Googlebot response time? A: Yes — especially for statically-cached articles, response time can drop from 800ms to 50ms on cache hit.

Tags: #SEO #Google #Search Console #Indexing #Troubleshooting #Crawl budget