robots.txt Blocks CSS and JS and Indexing Quality Drops

You disallowed `/assets/` or `/_next/static/` in robots.txt to "save crawl budget." Googlebot can no longer render your page. Rankings and rich results decline.

You inherited a robots.txt that disallows /assets/, /_next/static/, or /wp-content/plugins/. The argument was “Googlebot doesn’t need to crawl static assets — save crawl budget.” But Googlebot needs to FETCH your CSS and JS to render the page. Without them, Googlebot sees a layout-less, half-broken page, can’t determine main content vs boilerplate, can’t run the JS that injects content, and may flag the page as mobile-unfriendly. Search Console URL Inspection will show “Page resources couldn’t be loaded” warnings.

Google has been explicit since 2014: don’t block CSS, JS, or images that affect rendering. The “crawl budget” intuition is wrong here: Google’s renderer caches static resources, so blocking them saves little crawl capacity and costs you the rendered page.

Common causes

1. Legacy “block /assets/” rule from a 2010-era SEO guide

The directive looked clean: stop wasting crawl on bundles. It made sense in 2010 when Google didn’t render JS. It has been outdated since Google updated its guidelines in 2014.

How to spot it: curl https://yoursite.com/robots.txt. If you see Disallow: /assets/, Disallow: /static/, Disallow: /_next/, or Disallow: /wp-content/plugins/, you’ve found it.
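The same check can be scripted against a saved copy of the file. The following is a minimal sketch, not a full robots.txt parser: it ignores user-agent groups and wildcards, and the prefix list and the sample file are illustrative assumptions drawn from the paths named in this article.

```python
# Rendering-critical prefixes named in this article (illustrative; adjust to your site).
RISKY_PREFIXES = (
    "/assets/",
    "/static/",
    "/_next/",
    "/wp-content/plugins/",
    "/wp-content/themes/",
)


def find_risky_disallows(robots_txt: str) -> list[str]:
    """Return Disallow paths that cover, or sit inside, a rendering-critical prefix."""
    risky = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() != "disallow":
            continue
        path = value.strip()
        if not path:
            continue  # an empty Disallow blocks nothing
        # Prefix semantics: "/wp-content/" covers "/wp-content/plugins/",
        # and "/_next/static/" sits inside "/_next/".
        if any(p.startswith(path) or path.startswith(p) for p in RISKY_PREFIXES):
            risky.append(path)
    return risky


# Hypothetical robots.txt content for demonstration.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
Disallow: /_next/static/   # legacy rule
"""

print(find_risky_disallows(SAMPLE))  # ['/assets/', '/_next/static/']
```

Every path it reports is a candidate for removal, but confirm each one in Search Console before and after the change.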

2. Wildcard rules accidentally block rendering resources

A rule like Disallow: /*.json$ meant to block API responses also blocks manifest.json, webpack-runtime.json, and any other JSON file the page fetches while rendering.

How to spot it: List any Disallow: /*.<extension> rules. Test each against your actual static file paths.
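Testing a wildcard rule against a file inventory can be sketched as below. This assumes the wildcard semantics Google documents (* matches any run of characters, a trailing $ anchors the end of the URL); the helper names and the path list are hypothetical.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    # Escape literal segments and join them with ".*" where the rule had "*".
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def blocked_paths(rule_path: str, paths: list[str]) -> list[str]:
    """Return the paths that the given Disallow pattern would match."""
    regex = rule_to_regex(rule_path)
    return [p for p in paths if regex.match(p)]  # match() anchors at the start


# Hypothetical inventory of URLs the page loads.
ASSET_PATHS = [
    "/manifest.json",
    "/_next/static/chunks/webpack-runtime.json",
    "/assets/main.css",
    "/api/users.json",
]

print(blocked_paths("/*.json$", ASSET_PATHS))
# ['/manifest.json', '/_next/static/chunks/webpack-runtime.json', '/api/users.json']
```

Only the last entry was the intended target. The first two are collateral damage from the broad pattern.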

3. CDN subdomain blocked by its own robots.txt

Your assets live on cdn.example.com. The CDN’s own robots.txt (which Googlebot checks when fetching CSS from there) disallows everything because no one configured it.

How to spot it: curl https://cdn.example.com/robots.txt. If it returns User-agent: * followed by Disallow: /, your assets are blocked there even though the main site’s robots.txt looks fine.
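To check a fetched CDN robots.txt programmatically, a rough sketch follows. It assumes the crawler uses the group for its own user-agent token if one exists and falls back to the * group otherwise; wildcard paths are not interpreted, and the function names are hypothetical.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercase user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:
                current_agents = []  # a new group starts here
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            for agent in current_agents:
                groups[agent].append((field, value))
            last_was_agent = False
    return groups


def blocks_everything(robots_txt: str, agent: str = "googlebot") -> bool:
    """True if the group that applies to `agent` is a bare 'Disallow: /'."""
    groups = parse_groups(robots_txt)
    rules = groups.get(agent, groups.get("*", []))
    has_root_disallow = ("disallow", "/") in rules
    has_any_allow = any(d == "allow" and p for d, p in rules)
    return has_root_disallow and not has_any_allow


print(blocks_everything("User-agent: *\nDisallow: /\n"))  # True
```

A True result for the CDN host means every asset served from it is off limits to the crawler, whatever the main site's file says.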

4. Fastly / Cloudflare WAF rule blocking Googlebot from /static/

Not robots.txt, but functionally equivalent. A WAF rule blocks bot user-agents from asset paths to prevent hotlinking. Googlebot gets 403.

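How to spot it: curl -s -o /dev/null -w "%{http_code}\n" -A "Googlebot/2.1 (+http://www.google.com/bot.html)" https://yoursite.com/static/main.css. If the status is 403 or 401, something is blocking bot user-agents. A WAF may treat a spoofed user-agent differently from verified Googlebot traffic, so confirm with a live test in Search Console URL Inspection.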

5. Stray robots.txt served at the root

Crawlers only read robots.txt from the root of each host, so a file at https://yoursite.com/app/robots.txt is ignored. The real risk is a routing or deploy mistake: a second robots.txt in the codebase (from a sub-app, a plugin, or a staging config) ends up being the one served at /robots.txt.

How to spot it: Search the codebase for multiple robots.txt files. There should be exactly one per host, and the response at the domain root should match it.

6. Disallow /api/ blocks JS data-fetch endpoints

Common for SPAs: the page renders only after fetching /api/page/foo. Blocking /api/ in robots.txt means Googlebot can’t render the page client-side.

How to spot it: Page is mostly client-rendered AND robots.txt has Disallow: /api/. Rendering will fail.

7. Disallow: /*? blocks query strings on resources

Tries to deduplicate parameter URLs. Side effect: cache-busted assets like main.css?v=abc123 get blocked.

How to spot it: Check version-tagged asset URLs. If they include ?, the wildcard query rule blocks them.
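Listing the query-string assets a page references can be sketched with the standard-library HTML parser. The sample markup and class name are hypothetical; a rule like Disallow: /*? would match every URL this returns.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect URLs from <script src> and <link rel="stylesheet" href> tags."""

    def __init__(self):
        super().__init__()
        self.assets: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("src"):
            self.assets.append(attr["src"])
        elif (
            tag == "link"
            and "stylesheet" in (attr.get("rel") or "").lower()
            and attr.get("href")
        ):
            self.assets.append(attr["href"])


def query_string_assets(html: str) -> list[str]:
    """Return asset URLs that contain a query string."""
    collector = AssetCollector()
    collector.feed(html)
    return [url for url in collector.assets if "?" in url]


# Hypothetical page source for demonstration.
SAMPLE_HTML = """
<link rel="stylesheet" href="/assets/main.css?v=abc123">
<script src="/assets/app.js"></script>
<script src="/assets/vendor.js?v=9f2"></script>
"""

print(query_string_assets(SAMPLE_HTML))
# ['/assets/main.css?v=abc123', '/assets/vendor.js?v=9f2']
```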

Shortest path to fix

Step 1: Audit current robots.txt

curl -s https://yoursite.com/robots.txt

Print, share with your team, identify every Disallow: line. For each, ask: “Does Googlebot need to fetch this to render or understand the page?”

Step 2: Allow critical resources explicitly

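Whitelist rendering-critical paths. Keep the Allow lines in the same group as the Disallow lines: Googlebot obeys only the most specific user-agent group that matches it, so a separate User-agent: Googlebot group would make it ignore everything under User-agent: *, including the /admin/ and /private/ rules.

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /assets/
Allow: /static/
Allow: /_next/static/
Allow: /api/articles/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/

When an Allow and a Disallow both match a URL, the rule with the longest path wins, and Allow wins a tie. The Allow lines only matter where a broader Disallow remains in the group; if you can simply delete the offending Disallow, do that instead.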
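How Allow and Disallow interact can be sketched as follows. As Google documents it, the rule with the longest matching path wins, and on a tie the less restrictive rule (Allow) applies. This sketch handles plain prefix rules only, and the rule list is a hypothetical example.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Resolve Allow/Disallow by longest matching path; Allow wins ties."""
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, rule_path in rules:
        if not rule_path or not path.startswith(rule_path):
            continue  # empty or non-matching rules have no effect
        is_allow = directive.lower() == "allow"
        if len(rule_path) > best_len or (len(rule_path) == best_len and is_allow):
            best_len = len(rule_path)
            allowed = is_allow
    return allowed


# Hypothetical rules from a single user-agent group.
RULES = [
    ("Disallow", "/wp-content/"),
    ("Allow", "/wp-content/themes/"),
    ("Disallow", "/api/"),
    ("Allow", "/api/articles/"),
]

print(is_allowed("/wp-content/themes/site/style.css", RULES))  # True
print(is_allowed("/wp-content/uploads/report.pdf", RULES))     # False
print(is_allowed("/api/articles/42", RULES))                   # True
print(is_allowed("/api/users", RULES))                         # False
```

The order of lines in the file does not matter under this model. Only path length and the tie-break decide the outcome.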

Step 3: Test rendering with Search Console

Search Console → URL Inspection → “Test live URL” → “View tested page” → “Screenshot” + “More info” → “Page resources.” Any resource marked “Blocked by robots.txt” is hurting rendering. Re-test after each robots.txt change.

Step 4: Verify CDN robots.txt

curl https://cdn.example.com/robots.txt

If it returns a global disallow, fix the CDN-side robots.txt (or remove it entirely so it 404s — Googlebot treats 404 as “no rules, allowed”).

Step 5: Whitelist Googlebot in your WAF

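In Cloudflare: Security → Bots → “Verified Bot” allowlist. In Fastly: a VCL rule that lets Googlebot through to asset paths. Either way, do not trust the user-agent string alone, since it is spoofable. Confirm the client with a reverse-DNS lookup (the hostname should end in googlebot.com or google.com) followed by a forward lookup that returns the same IP.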
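The reverse-DNS check can be sketched as a forward-confirmed lookup. This is an illustration under stated assumptions: the accepted hostname suffixes are the ones Google documents for Googlebot, the function name is hypothetical, and the resolvers are injectable so the demo runs without network access.

```python
import socket
from typing import Callable

# Hostname suffixes Google documents for Googlebot; verify against current docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
    forward_lookup: Callable[[str], list[str]] = lambda host: socket.gethostbyname_ex(host)[2],
) -> bool:
    """Forward-confirmed reverse DNS: PTR must be a Google host that resolves back to ip."""
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False


# Offline demo with stub resolvers (addresses and hostnames are illustrative).
ptr_records = {
    "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
    "203.0.113.9": "host.example.net",
}
a_records = {"crawl-66-249-66-1.googlebot.com": ["66.249.66.1"]}

for candidate in ptr_records:
    ok = is_verified_googlebot(
        candidate,
        reverse_lookup=lambda ip: ptr_records[ip],
        forward_lookup=lambda host: a_records.get(host, []),
    )
    print(candidate, ok)
```

The forward lookup matters because anyone who controls reverse DNS for their own IP range can publish a PTR record that claims to be googlebot.com.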

Step 6: Avoid wildcard rules unless tested

Replace overly broad Disallow: /*.json$ with targeted paths: Disallow: /api/admin.json. Test the wildcard against your actual file inventory before shipping.

Step 7: Resubmit and watch

After fixing robots.txt, request indexing on a few key pages. Within 1-2 weeks, “Page resources couldn’t be loaded” warnings in Search Console should drop, and pages should re-render correctly.

When this is not on you

Crawl-budget concerns are valid for sites with millions of URLs and tight server capacity, but they should be handled by pruning low-value and duplicate URLs, tightening internal linking, and keeping sitemaps accurate, not by blocking rendering resources. (Google ignores the sitemap priority field.) Don’t apply enterprise-scale tactics to a 5k-page site.

Easy to misdiagnose as

A “JS not rendered” issue. The symptoms overlap (blank rendered HTML), but root cause is different: here Googlebot couldn’t fetch the JS at all, not that the JS ran and didn’t render. Check robots.txt first when JS rendering issues appear.

Prevention

  • Default robots.txt should allow everything except /admin/, /private/, search results pages.
  • Never disallow /assets/, /static/, /_next/, /wp-content/themes/ or /wp-content/plugins/.
  • Run a CI check on robots.txt against a list of known-required asset paths.
  • Verify CDN robots.txt during launches.
  • Audit robots.txt yearly as your asset paths evolve.
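The CI check from the list above could look roughly like this. It is a coarse sketch using Python's standard-library parser, which applies rules in file order and does not interpret * and $ wildcards the way Googlebot does, so it suits simple prefix rules only. The required-path list and file content are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_required_paths(
    robots_txt: str, required_paths: list[str], agent: str = "Googlebot"
) -> list[str]:
    """Return the required paths that robots.txt forbids for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in required_paths if not parser.can_fetch(agent, p)]


# Hypothetical robots.txt and asset inventory for demonstration.
ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
"""

REQUIRED = [
    "/assets/main.css",
    "/_next/static/chunks/main.js",
    "/wp-content/themes/site/style.css",
]

failures = blocked_required_paths(ROBOTS, REQUIRED)
print(failures)  # ['/assets/main.css']
```

In a pipeline, read the robots.txt from the build output and fail the job when the returned list is non-empty.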

FAQ

  • Will Google penalize me for excessive crawling if I allow everything? No — Google scales crawl rate automatically. Blocking assets doesn’t reduce crawl meaningfully.
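  • Should I disallow /wp-admin/? Yes, that’s fine to block; it’s not needed for rendering and contains private endpoints. Keep /wp-admin/admin-ajax.php allowed, as WordPress’s default robots.txt does, because some themes and plugins load front-end content through it.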

Tags: #SEO #Troubleshooting #Indexing #Search Console #robots-txt #rendering