You inherited a robots.txt that disallows /assets/, /_next/static/, or /wp-content/plugins/. The argument was “Googlebot doesn’t need to crawl static assets — save crawl budget.” But Googlebot needs to FETCH your CSS and JS to render the page. Without them, Googlebot sees a layout-less, half-broken page, can’t determine main content vs boilerplate, can’t run the JS that injects content, and may flag the page as mobile-unfriendly. Search Console URL Inspection will show “Page resources couldn’t be loaded” warnings.
Google has been explicit since 2014: don’t block CSS, JS, or images that affect rendering. The “crawl budget” intuition is wrong here. Google’s renderer caches CSS and JS aggressively, so blocking them saves very little crawl while costing you the render.
Common causes
1. Legacy “block /assets/” rule from a 2010-era SEO guide
The directive looked clean: stop wasting crawl on bundles. It made sense in 2010 when Google didn’t render JS. It’s been outdated since 2014, when Google started rendering pages and told site owners to stop blocking CSS and JS.
How to spot it: curl https://yoursite.com/robots.txt. If you see Disallow: /assets/, Disallow: /static/, Disallow: /_next/, or Disallow: /wp-content/plugins/, you’ve found it.
2. Wildcard rules accidentally block rendering resources
A rule like Disallow: /*.json$, written to block API responses, also blocks manifest.json, webpack-runtime.json, and any JSON config files the page loads at runtime.
How to spot it: List any Disallow: /*.<extension> rules. Test each against your actual static file paths.
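Testing a wildcard rule against real paths can be scripted. The sketch below assumes the wildcard semantics Google documents for robots.txt (* matches any run of characters, a trailing $ anchors the end of the URL, everything else is a literal prefix match). The function names and the file inventory are hypothetical.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex.

    '*' matches any run of characters and a trailing '$' pins the end of
    the URL. Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def blocked_paths(disallow_pattern: str, paths: list[str]) -> list[str]:
    """Return the paths that the given Disallow pattern would match."""
    rx = robots_pattern_to_regex(disallow_pattern)
    return [p for p in paths if rx.match(p)]

# Hypothetical inventory; replace with paths from your own build output.
inventory = [
    "/manifest.json",
    "/_next/static/chunks/webpack-runtime.json",
    "/api/page/foo",
    "/assets/main.css",
]
hits = blocked_paths("/*.json$", inventory)
print(hits)
```

Run it once per wildcard rule. Any rendering-critical file that shows up in the output is a rule to narrow.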
3. CDN subdomain’s own robots.txt disallows everything
Your assets live on cdn.example.com. The CDN’s own robots.txt (which Googlebot checks when fetching CSS from there) disallows everything because no one configured it.
How to spot it: curl https://cdn.example.com/robots.txt. If it returns User-agent: * followed by Disallow: /, your assets are blocked there even though the main site’s robots.txt looks fine.
4. Fastly / Cloudflare WAF rule blocking Googlebot from /static/
Not robots.txt, but functionally equivalent. A WAF rule blocks bot user-agents from asset paths to prevent hotlinking. Googlebot gets 403.
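How to spot it: curl -A "Googlebot/2.1 (+http://www.google.com/bot.html)" https://yoursite.com/static/main.css. If the status is 403 or 401, the WAF is probably blocking. Because the request comes from your IP rather than Google’s, a WAF that verifies bots may reject your spoofed request while still letting the real Googlebot through, so confirm with a live test in Search Console URL Inspection.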
5. Multiple robots.txt files, wrong one served at the root
Crawlers only read robots.txt at the root of a host, so a file at https://yoursite.com/app/robots.txt is ignored. The real risk is a routing or file-serving setup that returns the wrong file (for example, the restrictive one written for /app/) at https://yoursite.com/robots.txt.
How to spot it: Multiple robots.txt files in your codebase. There should be exactly one per host, served at the root.
6. Disallow /api/ blocks JS data-fetch endpoints
Common for SPAs: the page renders only after fetching /api/page/foo. Blocking /api/ in robots.txt means Googlebot can’t render the page client-side.
How to spot it: Page is mostly client-rendered AND robots.txt has Disallow: /api/. Rendering will fail.
7. Disallow: /*? blocks query strings on resources
This rule is meant to deduplicate parameter URLs. Side effect: cache-busted assets like main.css?v=abc123 get blocked.
How to spot it: Check version-tagged asset URLs. If they include ?, the wildcard query rule blocks them.
Shortest path to fix
Step 1: Audit current robots.txt
curl -s https://yoursite.com/robots.txt
Print, share with your team, identify every Disallow: line. For each, ask: “Does Googlebot need to fetch this to render or understand the page?”
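For a long file, a small script can do the first pass of that review. This is a rough sketch: the ASSET_HINTS list is a hypothetical heuristic built from the paths named in this article, not an official list, so treat a flag as "look at this line", not as a verdict.

```python
# Hypothetical substrings suggesting a rule may touch rendering resources.
ASSET_HINTS = ("/assets/", "/static/", "/_next/", "/wp-content/", "/api/",
               ".css", ".js", "?")

def audit_disallows(robots_txt: str) -> list[tuple[int, str, bool]]:
    """Return (line_number, path, needs_review) for every Disallow line."""
    findings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()        # drop trailing comments
        field, _, value = line.partition(":")
        if field.strip().lower() != "disallow":
            continue
        path = value.strip()
        findings.append((number, path, any(h in path for h in ASSET_HINTS)))
    return findings

# Sample input; in practice, pass in the body fetched with curl.
sample = """User-agent: *
Disallow: /admin/
Disallow: /assets/  # legacy rule
Disallow: /*?
"""
for number, path, needs_review in audit_disallows(sample):
    print(number, path, "REVIEW" if needs_review else "ok")
```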
Step 2: Allow critical resources explicitly
Whitelist rendering-critical paths:
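User-agent: *
Disallow: /admin/
Disallow: /private/
# Keep Allow rules in the same group as the Disallow rules they refine.
Allow: /assets/
Allow: /static/
Allow: /_next/static/
Allow: /api/articles/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/
Allow: overrides Disallow: when its path is the longer (more specific) match, so these lines matter wherever a broader Disallow would otherwise apply. Keep both in one group: a crawler obeys only the most specific User-agent group that names it, so a separate User-agent: Googlebot group containing only Allow lines would make Googlebot ignore the Disallow rules under User-agent: *.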
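Group selection is easy to get wrong once a file has more than one User-agent block. The sketch below shows which rules a given crawler actually receives. It is deliberately simplified (exact token match with a fallback to *, which is not Google’s full matching logic), and the function names are hypothetical.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not last_was_agent:
                current_agents = []            # a new group starts here
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups

def rules_for(robots_txt: str, agent: str) -> list[tuple[str, str]]:
    """Rules from the agent's own group, or from '*' if it has none."""
    groups = parse_groups(robots_txt)
    return groups.get(agent.lower(), groups.get("*", []))

split_layout = """User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: Googlebot
Allow: /assets/
"""
print(rules_for(split_layout, "Googlebot"))
print(rules_for(split_layout, "Bingbot"))
```

With this layout the Googlebot group carries no Disallow rules at all, so /admin/ and /private/ are open to it. Only crawlers without a group of their own fall back to the * rules.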
Step 3: Test rendering with Search Console
Search Console → URL Inspection → “Test live URL” → “View tested page” → “Screenshot” + “More info” → “Page resources.” Any resource marked “Blocked by robots.txt” is hurting rendering. Re-test after each robots.txt change.
Step 4: Verify CDN robots.txt
curl https://cdn.example.com/robots.txt
If it returns a global disallow, fix the CDN-side robots.txt (or remove it entirely so it 404s — Googlebot treats 404 as “no rules, allowed”).
Step 5: Whitelist Googlebot in your WAF
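In Cloudflare, exempt verified bots from the blocking rule (Security → Bots, or a WAF custom rule that checks the cf.client.bot field; menu names vary by plan). In Fastly, add a VCL rule that lets verified Googlebot requests through. In both cases, verify Googlebot with a reverse-DNS check, not by user-agent string alone, which is spoofable.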
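The reverse-DNS check can be sketched in a few lines. This follows the two-step pattern Google describes (reverse lookup, check the domain, then a forward lookup that must return the same IP). The function names are hypothetical, the suffix list covers only googlebot.com and google.com, and socket.gethostbyname_ex is IPv4-only, so treat it as a starting point.

```python
import socket

# Domains Google documents for Googlebot reverse-DNS hostnames.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_hostname(hostname: str) -> bool:
    """True if the hostname ends in one of the expected Google domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str,
                     reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm it.

    The resolver functions are injectable so the logic can be tested
    without network access.
    """
    try:
        hostname = reverse(ip)[0]
        if not is_google_hostname(hostname):
            return False
        # The forward lookup must map back to the same IP address.
        return ip in forward(hostname)[2]
    except OSError:          # covers socket.herror and socket.gaierror
        return False

# Usage (performs real DNS lookups):
#   verify_googlebot("66.249.66.1")
```

The forward-confirm step matters because anyone who controls reverse DNS for their own IP range can make it claim a googlebot.com hostname.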
Step 6: Avoid wildcard rules unless tested
Replace overly broad Disallow: /*.json$ with targeted paths: Disallow: /api/admin.json. Test the wildcard against your actual file inventory before shipping.
Step 7: Resubmit and watch
After fixing robots.txt, request indexing on a few key pages. Within 1-2 weeks, “Page resources blocked” warnings in Search Console should drop, and pages should re-render correctly.
When this is not on you
Crawl-budget concerns are valid for sites with millions of URLs and tight server capacity, but should be handled via internal linking and clean, accurate sitemaps (Google ignores the sitemap priority field), not by blocking rendering resources. Don’t apply enterprise-scale tactics to a 5k-page site.
Easy to misdiagnose as
A “JS not rendered” issue. The symptoms overlap (blank rendered HTML), but root cause is different: here Googlebot couldn’t fetch the JS at all, not that the JS ran and didn’t render. Check robots.txt first when JS rendering issues appear.
Prevention
- Default robots.txt should allow everything except /admin/, /private/, and search results pages.
- Never disallow /assets/, /static/, /_next/, /wp-content/themes/, or /wp-content/plugins/.
- Run a CI check on robots.txt against a list of known-required asset paths.
- Verify CDN robots.txt during launches.
- Audit robots.txt yearly as your asset paths evolve.
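The CI check from the list above could look something like this. It is a minimal sketch under stated assumptions: it treats the whole file as a single group (so it suits files with one User-agent: * block), it applies the longest-match precedence Google documents, and the required paths are hypothetical placeholders.

```python
import re

def _to_regex(pattern: str) -> re.Pattern:
    """robots.txt pattern to regex: '*' is a wildcard, trailing '$' anchors."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def parse_rules(robots_txt: str) -> list[tuple[str, str]]:
    """Collect (directive, path) pairs, ignoring User-agent grouping."""
    rules = []
    for raw in robots_txt.splitlines():
        field, _, value = raw.split("#", 1)[0].partition(":")
        field = field.strip().lower()
        if field in ("allow", "disallow"):
            rules.append((field, value.strip()))
    return rules

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins, Allow wins ties, no match is allowed."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:                  # an empty Disallow matches nothing
            continue
        if _to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

def blocked_required_paths(robots_txt: str, required: list[str]) -> list[str]:
    """Return the required paths that robots.txt would block."""
    rules = parse_rules(robots_txt)
    return [p for p in required if not is_allowed(rules, p)]

# Hypothetical required paths; generate these from your build manifest.
REQUIRED = [
    "/assets/main.css",
    "/_next/static/chunks/app.js",
    "/wp-content/themes/site/style.css",
]
```

In a pipeline, read the deployed robots.txt, call blocked_required_paths, and fail the job if the returned list is non-empty.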
FAQ
- Will Google penalize me for excessive crawling if I allow everything? No — Google scales crawl rate automatically. Blocking assets doesn’t reduce crawl meaningfully.
- Should I disallow /wp-admin/? Yes, that’s fine to block: it’s not needed for rendering and contains private endpoints.
Related
- JavaScript-Rendered Content Not Indexed
- Mobile-First Switch Indexing Drop
- Not Mobile-Friendly Indexing Issue
- Discovered Currently Not Indexed
- Crawled Currently Not Indexed
- Pages Dropped from Index
- Search Console Pages Report Drops
- Meta Robots vs X-Robots-Tag — Which One Wins
- Noindex vs robots.txt
- Indexing Coverage Drop After Redesign
Tags: #SEO #Troubleshooting #Indexing #Search Console #robots-txt #rendering