New Site Stuck in the Discovery Phase

Your new domain has been live for 4-8 weeks. You submitted the sitemap, and URL Inspection says “Discovered – currently not indexed,” but Googlebot never comes back to crawl, let alone index. This is Google’s “wait and see” state for new domains: it knows the URL exists but isn’t spending crawl budget on fetching it.

The fix isn’t to force Google to crawl a URL it doesn’t believe in. It’s to lift the whole site’s authority signals so Google decides the site deserves budget.

Symptoms

  • Many URLs in the “Pages” report show “Discovered – currently not indexed”
  • Crawl Stats shows very few requests per day (10-50 URLs/day; an active site should see hundreds to thousands)
  • No manual action, no obvious errors
  • In URL Inspection, “Discovered” URLs show “Last crawl: N/A” or a date weeks in the past
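
If you have server access logs, you can cross-check the Crawl Stats number yourself. A minimal sketch, assuming logs in combined log format; the function name googlebotHitsPerDay is hypothetical, and it trusts the user-agent string, which can be spoofed (it does not verify Googlebot by reverse DNS):

```javascript
// Count requests per day whose user agent claims to be Googlebot.
// Assumption: combined log format, with a timestamp like [12/Mar/2025:06:25:24 +0000].
function googlebotHitsPerDay(logLines) {
  const perDay = new Map();
  for (const line of logLines) {
    if (!/Googlebot/i.test(line)) continue; // keep only lines that mention Googlebot
    const m = line.match(/\[(\d{2}\/[A-Za-z]{3}\/\d{4}):/);
    if (!m) continue; // skip lines without a parsable timestamp
    perDay.set(m[1], (perDay.get(m[1]) || 0) + 1);
  }
  return Object.fromEntries(perDay);
}

// Usage (in a Node script, after reading the log file yourself):
// googlebotHitsPerDay(fs.readFileSync("access.log", "utf8").split("\n"))
```

Daily counts that stay in the low double digits match the symptom above.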

Quick verdict

New domains routinely sit in a discovery / sandbox phase. Google is sampling pages and building site authority signals. The fix is to make every signal you can control strong, then wait.

Common causes

1. No backlinks (or almost none)

Backlinks are one of Google’s strongest “how much crawl budget should I spend on this site” signals. Zero backlinks → Googlebot barely visits.

How to confirm: Check ahrefs.com/webmaster-tools (free) for Referring Domains — new sites are often 0-2.

2. URLs appear only in the sitemap, with no internal links

When Google decides whether to crawl a URL, it counts internal links pointing to it. The sitemap is a baseline signal but weighted lightly. A URL that appears only in the sitemap, with no site-internal links, sits at the bottom of the Discovered queue.

How to confirm: Pick a few “Discovered” URLs and grep -r "/that-url/" src/ to see how many places in your code reference each.
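
Grepping source files misses links that only exist after the build. A rough sketch of the same check against built HTML: it lists sitemap URLs that no other page links to. The function names and the pages object (page path mapped to its HTML) are hypothetical, and paths are compared literally, so /a and /a/ count as different:

```javascript
// Extract the path of every <loc> entry in a sitemap XML string.
function sitemapPaths(sitemapXml) {
  const locs = [...sitemapXml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)].map(m => m[1]);
  return locs.map(u => new URL(u).pathname);
}

// Count how many distinct pages link to each root-relative path.
// `pages` maps a page path to its built HTML (hypothetical input shape).
function inboundLinkCounts(pages) {
  const counts = new Map();
  for (const [from, html] of Object.entries(pages)) {
    const targets = new Set();
    for (const m of html.matchAll(/<a\b[^>]*\bhref="(\/(?!\/)[^"#?]*)/gi)) {
      if (m[1] !== from) targets.add(m[1]); // ignore self-links
    }
    for (const t of targets) counts.set(t, (counts.get(t) || 0) + 1);
  }
  return counts;
}

// Sitemap paths with zero inbound internal links.
function sitemapOnlyPaths(sitemapXml, pages) {
  const counts = inboundLinkCounts(pages);
  return sitemapPaths(sitemapXml).filter(p => !counts.has(p));
}
```

Anything this returns is reachable only through the sitemap and is a candidate for the bottom of the queue.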

3. Most pages are thin or templated

If your first 30 articles are under 300 words each or are heavily templated AI content, Google may flag the site as “bulk low-quality” and lower indexing priority across the board.
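
One way to put a number on “heavily templated” is to compare the visible text of two articles by overlapping word triples (Jaccard similarity). This is an illustrative heuristic, not how Google measures anything; the function names are hypothetical and any cutoff you pick is your own assumption:

```javascript
// Build the set of overlapping word n-grams ("shingles") in a text.
function shingles(text, size = 3) {
  const words = text.toLowerCase().match(/[\p{L}\p{N}']+/gu) || [];
  const set = new Set();
  for (let i = 0; i + size <= words.length; i++) {
    set.add(words.slice(i, i + size).join(" "));
  }
  return set;
}

// Jaccard similarity: shared shingles divided by all distinct shingles (0 to 1).
function jaccard(a, b) {
  const sa = shingles(a);
  const sb = shingles(b);
  if (sa.size === 0 && sb.size === 0) return 0;
  let shared = 0;
  for (const s of sa) if (sb.has(s)) shared++;
  return shared / (sa.size + sb.size - shared);
}
```

Run it over pairs of article bodies; pairs that score close to 1 share most of their wording and deserve a manual look.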

4. Sitemap is submitted, but there’s no user-activity proof of life

No traffic = no validation that “this site is useful” = Google stays conservative. A perfect sitemap doesn’t compensate for zero traffic signals.

5. Domain is an expired / previously penalized domain

If you bought an expired domain, you may have inherited the previous owner’s spam history.

How to confirm: Check the archive.org Wayback Machine. If snapshots show gambling / adult / spam content, the domain may well have inherited a penalty.
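
To skim a domain’s history quickly, you can ask the Wayback Machine’s CDX endpoint for roughly one capture per year and open each one. A sketch under assumptions: the helper names are hypothetical, and the query parameters (output, fl, collapse) are used as I understand the CDX server API, so verify them against its documentation:

```javascript
// Build a CDX query that returns at most one capture per year for a domain.
function waybackCdxUrl(domain) {
  const params = new URLSearchParams({
    url: domain,
    output: "json",
    fl: "timestamp,original,statuscode",
    collapse: "timestamp:4", // collapse on the first 4 timestamp digits (the year)
  });
  return `https://web.archive.org/cdx/search/cdx?${params}`;
}

// With output=json the response is an array of rows; the first row holds field names.
function snapshotLinks(rows) {
  if (rows.length === 0) return []; // no captures on record
  const [header, ...data] = rows;
  const ts = header.indexOf("timestamp");
  const orig = header.indexOf("original");
  return data.map(r => `https://web.archive.org/web/${r[ts]}/${r[orig]}`);
}

// Usage (Node 18+, inside an async function):
// const rows = await (await fetch(waybackCdxUrl("example.com"))).json();
// console.log(snapshotLinks(rows).join("\n"));
```

Open a few of the links from the years before you owned the domain and judge the content by eye.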

Shortest path to fix

Ordered by effect (not by ease).

Step 1: Get at least 5 dofollow backlinks

Action list by hit rate:

Method | Difficulty | Expected links
Post a valuable question / resource on relevant Reddit / HN (link in profile or comments, not body) | Easy | 1-3
Cross-link with a friend / coworker / ex-coworker’s personal blog or company site | Easy | 1-5
Submit to a relevant awesome-* GitHub list | Medium | 1-3
Guest post on a relevant blog (even 1k followers counts) | Hard | 1-2
Submit to directories / tool listings (Product Hunt etc.) | Easy | 0-3

With fewer than 5 dofollow links, the whole site’s crawl budget stays low. This step matters more than any technical change.
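
Not every link you earn is dofollow; user-generated platforms often add rel values that tell crawlers not to pass signals. A small sketch for checking a referring page’s HTML, assuming double-quoted attributes; the function name is hypothetical, and treating nofollow, ugc, and sponsored as “not followed” is a simplification:

```javascript
// Classify links on a referring page that point at your domain.
// A link counts as "followed" unless its rel contains nofollow, ugc, or sponsored.
function classifyBacklinks(html, yourDomain) {
  const result = { followed: [], notFollowed: [] };
  for (const m of html.matchAll(/<a\b[^>]*>/gi)) {
    const tag = m[0];
    const href = (tag.match(/\bhref="([^"]*)"/i) || [])[1];
    if (!href) continue;
    let host;
    try {
      host = new URL(href).hostname;
    } catch {
      continue; // relative or malformed href, so not a link to another site
    }
    if (host !== yourDomain && !host.endsWith("." + yourDomain)) continue;
    const rel = ((tag.match(/\brel="([^"]*)"/i) || [])[1] || "").toLowerCase().split(/\s+/);
    const blocked = rel.some(v => v === "nofollow" || v === "ugc" || v === "sponsored");
    (blocked ? result.notFollowed : result.followed).push(href);
  }
  return result;
}
```

Count only the followed entries toward the five-link target.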

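Step 2: Link every article from the homepage or an index page

Common new-site mistake: the homepage shows only the latest 5 articles, so older articles lose their only internal link as new ones publish.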

Fix:

---
// src/pages/index.astro
import { getCollection } from "astro:content";
const all = await getCollection("posts");
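// Newest first. valueOf() keeps the subtraction type-safe when publishedAt is a Date.
const sorted = [...all].sort((a, b) => b.data.publishedAt.valueOf() - a.data.publishedAt.valueOf());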
---
<h2>Latest</h2>
<ul>{sorted.slice(0, 10).map(p => <li><a href={`/articles/${p.slug}/`}>{p.data.title}</a></li>)}</ul>

<h2>All articles ({sorted.length})</h2>
<ul>{sorted.map(p => <li><a href={`/articles/${p.slug}/`}>{p.data.title}</a></li>)}</ul>

Or build a standalone /articles/ index and link to it from the homepage. Either way every article is ≤2 clicks from anywhere.
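
To check click depth rather than eyeball it, run a breadth-first search over the internal link graph. A minimal sketch measuring depth from the homepage only (a simplification of “from anywhere”); the links object, mapping each page path to the paths it links to, is a hypothetical input you would build from your own HTML:

```javascript
// Breadth-first search over the internal link graph.
// `links` maps each page path to the list of paths it links to.
function clickDepths(links, start = "/") {
  const depth = new Map([[start, 0]]);
  const queue = [start];
  while (queue.length) {
    const page = queue.shift();
    for (const next of links[page] || []) {
      if (!depth.has(next)) {
        depth.set(next, depth.get(page) + 1); // first visit is the shortest path
        queue.push(next);
      }
    }
  }
  return depth;
}
```

Any article with a depth above 2, or missing from the result entirely, needs a link from the homepage or the index page.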

Step 3: Each article: 600+ words real content, real H1, structured

Minimum bar:

  • Exactly 1 <h1>, includes the primary keyword
  • At least 3 <h2> for section structure
  • 600+ words of body (800-1500 is the sweet spot)
  • At least 1 image (with alt text)
  • At least 3 internal links (to related articles or hub pages)
  • At least 1 outbound link to an authoritative source (Wikipedia, official docs, known site)

Enforce with a script:

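// scripts/check-thin.mjs
// Flags built article pages that are thin, have the wrong <h1> count,
// or have too few internal links. Run it after the build.
import fg from "fast-glob";
import fs from "node:fs";

const issues = [];
for (const f of fg.sync("dist/articles/**/*.html")) {
  const html = fs.readFileSync(f, "utf8");
  // Drop <script>/<style> bodies so code and CSS don't count as words.
  // Note: nav and footer text still count, so the word total runs slightly high.
  const text = html
    .replace(/<(script|style)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  const words = text ? text.split(" ").length : 0; // an empty page is 0 words, not 1
  const h1s = (html.match(/<h1[\s>]/gi) || []).length;
  // Count only <a> links to root-relative paths; skips <link> assets and "//" URLs.
  const intLinks = (html.match(/<a\b[^>]*\bhref="\/(?!\/)[^"]*"/gi) || []).length;
  if (words < 600) issues.push(`THIN (${words}w): ${f}`);
  if (h1s !== 1) issues.push(`H1=${h1s}: ${f}`);
  if (intLinks < 3) issues.push(`internal links=${intLinks}: ${f}`);
}
if (issues.length) {
  console.log(issues.join("\n"));
  process.exitCode = 1; // fail the CI step so the check is actually enforced
}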
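
The script above covers word count, H1, and internal links. The remaining checklist items (H2 count, image alt text, one outbound link) could be checked along the same lines. A regex-based sketch with a hypothetical checkStructure function; it assumes double-quoted attributes and flags every image without alt text, even though purely decorative images can legitimately use an empty alt:

```javascript
// Check the remaining "minimum bar" items for one article's HTML.
// `siteHost` is your own hostname, so links to it don't count as outbound.
function checkStructure(html, siteHost) {
  const problems = [];
  const h2s = (html.match(/<h2[\s>]/gi) || []).length;
  if (h2s < 3) problems.push(`h2=${h2s}`);

  const imgs = html.match(/<img\b[^>]*>/gi) || [];
  if (imgs.length === 0) problems.push("no image");
  const missingAlt = imgs.filter(tag => !/\balt="[^"]+"/i.test(tag)).length;
  if (missingAlt > 0) problems.push(`images without alt=${missingAlt}`);

  // Absolute http(s) links whose host is not your own site.
  const outbound = [...html.matchAll(/<a\b[^>]*\bhref="https?:\/\/([^/"]+)/gi)]
    .filter(m => m[1].toLowerCase() !== siteHost).length;
  if (outbound < 1) problems.push("no outbound link");
  return problems;
}
```

An empty array means the page clears these three items; push the returned strings into the same issues list if you merge the two checks.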

Step 4: Run Lighthouse, fix critical perf + crawl warnings

npx lighthouse https://yourdomain.com/some-article --quiet --chrome-flags="--headless"

Fix these specifically:

  • LCP > 2.5s → optimize main image loading, reduce JS
  • CLS > 0.1 → set fixed width/height on images
  • Crawlability — any robots.txt warning
  • “Document has a meta description” — add if missing
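
If you save the report as JSON (add --output=json --output-path=report.json to the command), the same four checks can run unattended. A sketch with a hypothetical lighthouseFailures function; the audit IDs are the ones I recall from Lighthouse reports, so confirm them against your own report.json:

```javascript
// `lhr` is the parsed Lighthouse JSON report.
// Returns a list of human-readable failures for the checks listed above.
function lighthouseFailures(lhr) {
  const failures = [];
  const lcpMs = lhr.audits["largest-contentful-paint"]?.numericValue; // milliseconds
  const cls = lhr.audits["cumulative-layout-shift"]?.numericValue;
  if (lcpMs > 2500) failures.push(`LCP ${(lcpMs / 1000).toFixed(1)}s > 2.5s`);
  if (cls > 0.1) failures.push(`CLS ${cls.toFixed(2)} > 0.1`);
  // Pass/fail audits score 1 or 0; a missing or not-applicable audit is skipped.
  for (const id of ["is-crawlable", "robots-txt", "meta-description"]) {
    const audit = lhr.audits[id];
    if (audit && audit.score === 0) failures.push(`failed audit: ${id}`);
  }
  return failures;
}

// Usage (in a Node script):
// lighthouseFailures(JSON.parse(fs.readFileSync("report.json", "utf8")))
```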

Step 5: Wait 8-12 weeks

Realistically, 8-12 weeks for a new domain to fully enter the index. During that window:

  • Keep publishing on cadence (2-3 articles/week)
  • Every 4 weeks revisit Search Console → Pages and look at the Indexed count growth curve
  • Resist the urge to thrash canonical / robots / sitemap “to speed things up”
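
To see the growth curve rather than a single number, keep the counts from each four-week check and compute the indexed share and the change since the previous check. A tiny sketch; the indexingTrend name and the snapshot shape are hypothetical, and the numbers are ones you copy by hand from Search Console → Pages:

```javascript
// snapshots: [{ week, indexed, total }], oldest first.
// Returns the indexed percentage and the change in indexed pages per check.
function indexingTrend(snapshots) {
  return snapshots.map((s, i) => {
    const pct = s.total > 0 ? Math.round((s.indexed / s.total) * 100) : 0;
    const prev = i > 0 ? snapshots[i - 1].indexed : s.indexed;
    return { week: s.week, pct, delta: s.indexed - prev };
  });
}
```

A delta that turns positive and keeps growing is the signal to keep waiting rather than start changing things.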

When this is not on you

Sandboxing is under-documented but well-observed. Even technically sound new sites with good content need 6-12 weeks before indexing rates climb steadily. If by week 9 you start seeing trickle indexing and by week 12 you’re at 30-50%, that’s healthy.

Easy to misdiagnose

  • Resubmitting sitemap daily, hammering URL Inspection: doesn’t skip the sandbox, wastes Search Console quota
  • Changing URL structure to “force re-evaluation”: resets every weak signal you’ve accumulated — slower, not faster
  • Publishing more AI content to “boost signals”: bulk low-quality content is what SpamBrain is built to catch, so it pushes the site’s signals in the wrong direction
  • Buying backlinks: paid links are easy for algorithms to flag — net negative

Prevention

  • Start with a clear topic focus — Google trusts focused sites faster (10 articles about “Astro deployment” beats 50 scattered topics)
  • A few quality backlinks in month one beats any technical tweak
  • Install Analytics + Search Console on launch day so you have trend data
  • After launch, give the site a 60-90 day “quiet window” — no big structural changes

FAQ

Q: How long until a new site is “out” of the sandbox? A: Usually 8-16 weeks, varies widely. Technically strong + good content + a few backlinks often clears at 8-10 weeks; sitemap-only with no traffic or links can take 4-6 months.

Q: Does Google publicly acknowledge the sandbox? A: Officially described as “cautious treatment for new domains,” not a separate sandbox system. The behavior is real either way.

Q: Is migrating from a subdirectory to a new independent domain still “new site”? A: Yes. Even with identical content, Google rebuilds authority signals from zero. Don’t change domains unless necessary.

Q: Are expired domains useful? A: High risk. You can inherit the previous owner’s spam history, though some link equity occasionally carries over. Before buying, check the Wayback Machine and the Ahrefs historical backlink record.

Tags: #SEO #Google #Search Console #Indexing #Troubleshooting #Discovery