Finding Content Gaps with AI — a Repeatable Workflow

A 2026 workflow for using AI to spot the articles your site should have but doesn't, built on your own sitemap, Search Console data, and a topic pillar map.

Content gaps are where a site stalls. You know there is more to write about, but you cannot see what is missing. AI is great at this if you feed it the right map.

Background

Most “find content gaps” advice tells you to scrape competitors. That works once. The better long-term approach is to feed AI your own pillar/cluster structure plus the queries you already partly rank for, then ask it where the holes are. This works because AI is genuinely good at completing a structured pattern when given enough of it.

How to tell

  • You publish regularly but feel like you are running out of obvious topics.
  • You can describe your site’s pillar topics in 3-5 sentences.
  • You have Search Console access and at least 3 months of query data.
  • You have not done a formal gap analysis in the past 60 days.

Step by step

  1. Export sitemap to a flat CSV. Run in your terminal:

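    # Pull sitemap, extract URL + last-modified.
    # Assumes one <loc> or <lastmod> tag per line, and a <lastmod> for every URL.
    curl -s https://yoursite.com/sitemap.xml \
      | grep -E "<(loc|lastmod)>" \
      | sed -E 's/<\/?(loc|lastmod)>//g; s/^[[:space:]]+//' \
      | paste - - \
      > sitemap_urls.tsv
    
    # Pull <title> for each URL via Python (~5 minutes)
    # Needs: pip install requests beautifulsoup4
    python3 -c "
    import csv, requests
    from bs4 import BeautifulSoup
    # newline='' stops the csv module from writing blank rows on Windows
    with open('sitemap_urls.tsv') as f, open('articles.csv', 'w', newline='') as out:
        w = csv.writer(out)
        w.writerow(['url','slug','title','lastmod'])
        for line in f:
            # partition tolerates a row with no lastmod column
            url, _, lastmod = line.strip().partition('\t')
            html = requests.get(url, timeout=10).text
            tag = BeautifulSoup(html, 'html.parser').title
            # Pages without a <title> get an empty title instead of crashing
            title = tag.get_text(strip=True) if tag else ''
            slug = url.rstrip('/').split('/')[-1]
            w.writerow([url, slug, title, lastmod])
    "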

    Output articles.csv has 4 columns: url, slug, title, lastmod.
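
The grep/sed pipeline above depends on the sitemap having one tag per line. If yours is minified or omits lastmod on some URLs, a small XML parse is sturdier. This is a minimal sketch, assuming a standard sitemaps.org urlset; the function name parse_sitemap and the sample XML are illustrative, not part of the original workflow.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files (sitemaps.org protocol 0.9).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of (url, lastmod) tuples; lastmod is '' when absent."""
    root = ET.fromstring(xml_text)
    rows = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url_el.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if loc:
            rows.append((loc, lastmod))
    return rows


# Made-up sample: minified rows, and the second URL has no <lastmod>.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://yoursite.com/en/articles/indie-dev/foo/</loc>"
    "<lastmod>2026-01-05</lastmod></url>"
    "<url><loc>https://yoursite.com/en/articles/ai-tools/bar/</loc></url>"
    "</urlset>"
)
sitemap_rows = parse_sitemap(sample)
```

A sitemap index file (one that lists other sitemaps) would need one more loop over its child sitemaps; this sketch does not handle that case.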

  2. Tag pillar / cluster. Add pillar and cluster columns in Sheets / Excel and fill them in manually (500 articles takes under an hour). If your URLs already encode the category:

    # If URL is https://yoursite.com/en/articles/indie-dev/foo/, the pillar is field 6 when split on "/"
    awk -F'/' 'NR==1{print $0",pillar"; next} {print $0","$6}' articles.csv > articles_with_pillar.csv
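
Counting awk fields is brittle when URL depth varies. The same idea can be sketched in Python by parsing the path explicitly. The function name pillar_from_url and the ("en", "articles") prefix are assumptions taken from the example URL above; adjust them to your own structure.

```python
from urllib.parse import urlparse


def pillar_from_url(url, prefix=("en", "articles")):
    """Return the path segment right after `prefix`, or '' if the URL does not fit.

    Requires a pillar segment followed by at least one more segment (the slug),
    so index pages such as /en/articles/ are left untagged.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    n = len(prefix)
    if tuple(parts[:n]) == tuple(prefix) and len(parts) >= n + 2:
        return parts[n]
    return ""


tagged = pillar_from_url("https://yoursite.com/en/articles/indie-dev/foo/")
untagged = pillar_from_url("https://yoursite.com/about/")
```

Rows that come back empty are the ones to tag by hand in the spreadsheet.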

  3. Pull the top 200 GSC queries. In Search Console: Performance → Queries → Export → “Download CSV”. Or via the API (pip install searchconsole):

    import searchconsole
    account = searchconsole.authenticate(client_config='client_secret.json')
    webproperty = account['https://yoursite.com/']
    report = webproperty.query.range('today', days=-90).dimension('query').limit(200).get()
    report.to_dataframe().to_csv('gsc_top_queries.csv', index=False)

    The CSV contains query, clicks, impressions, ctr, position. Focus on rows with impressions > 100 AND position > 10: the query has reach but the site is not ranking on the first page, which is a coverage gap signal.
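
The filter described above can be sketched in a few lines of Python. This assumes the lowercase column names listed above (the export from the Search Console UI may use different headers); gap_candidates and the sample rows are illustrative and the numbers are made up.

```python
import csv
import io


def gap_candidates(rows, min_impressions=100, min_position=10.0):
    """Keep rows with impressions above the floor and average position worse than min_position."""
    keep = []
    for row in rows:
        impressions = float(row["impressions"])
        position = float(row["position"])
        if impressions > min_impressions and position > min_position:
            keep.append(row)
    # Highest-reach queries first.
    keep.sort(key=lambda r: float(r["impressions"]), reverse=True)
    return keep


# Made-up sample in the same shape as gsc_top_queries.csv.
sample_csv = """query,clicks,impressions,ctr,position
app store screenshot review ai,4,850,0.0047,14.2
indie dev pricing page,40,900,0.044,3.1
obscure query,0,12,0.0,35.0
ai changelog generator,2,310,0.0065,22.8
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
candidates = gap_candidates(rows)
print([r["query"] for r in candidates])
```

To run it on the real export, replace the io.StringIO sample with open('gsc_top_queries.csv', newline='').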

  4. Feed to AI for gap analysis. Open Claude / GPT-5.5 in long-context mode, attach both files, and paste this prompt:

    Attachment 1: articles_with_pillar.csv (all current articles + pillar)
    Attachment 2: gsc_top_queries.csv (last 90 days of high-impression queries)
    
    Run a gap analysis:
    
    1. Per pillar, list the 5 queries from the GSC Top 200 that have impressions but that no current article directly answers
       - "Directly answers" = article title or URL slug contains the query's core terms
       - Output: | pillar | query | current impressions | current position | existing coverage (none/partial/full) |
    
    2. Per pillar, list 5 sub-topics NOT in GSC but that should be there
       - Reason: a competitor ranks for it, OR your existing article mentions it but doesn't have a dedicated page
       - Output: | pillar | missing sub-topic | why it should exist | 1 seed title |
    
    3. List 5 cross-pillar bridge topics — content that connects 2 pillars
       - Example: indie-dev pillar + ai-tools pillar → "Use AI to audit App Store screenshots"
       - These typically have the strongest differentiation
    
    Do NOT invent existing articles or GSC queries. Cite only what is in the attachments, and mark anything you propose as a suggestion.
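
Before trusting the AI's "existing coverage" column, it can help to compute a rough version yourself using the prompt's own definition of "directly answers" (title or slug contains the query's core terms). The sketch below is one possible reading of that rule, not the author's implementation: the stopword list, the 0.5 threshold for "partial", and the function names are all assumptions, and there is no stemming.

```python
import re

# Small illustrative stopword list; extend it for your niche.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "in", "on", "with", "and", "how", "what", "is"}


def core_terms(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in STOPWORDS}


def coverage(query, articles):
    """Return 'full', 'partial', or 'none' for a query against (slug, title) pairs."""
    terms = core_terms(query)
    if not terms:
        return "none"
    best = 0.0
    for slug, title in articles:
        haystack = core_terms(slug) | core_terms(title)
        best = max(best, len(terms & haystack) / len(terms))
    if best == 1.0:
        return "full"
    return "partial" if best >= 0.5 else "none"


articles = [("ai-app-store-screenshot-review", "Use AI to audit App Store screenshots")]
full = coverage("app store screenshot review ai", articles)
partial = coverage("app store pricing tiers", articles)
none = coverage("changelog generator", articles)
```

Where this check and the AI disagree, open the article and decide by hand; term overlap says nothing about whether the page actually answers the query.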

  5. Filter with domain knowledge. For every AI suggestion, ask:

    - Do people actually search for this, or is it phrasing the AI made up?
    - Can I write this better / more concretely than the top 3 results?
    - Does it have commercial value (affiliate / conversion / brand)?

    A “no” to any of these → drop. For the survivors, decide scope: will 1 page do, or do I need a 3-5 article cluster?

  6. Cross-pillar bridges, second pass. Follow up in the same conversation with:

    Based on the previous gap analysis, give me 8 cross-pillar bridge topics:
    
    - Each topic must naturally belong to 2 pillars (not forced)
    - For each:
      - Title (verb-led or number-led)
      - Core question (≤10 words)
      - Primary audience (pillar A folks + pillar B folks)
      - Internal-link opportunities (which existing articles it can link to)
    
    Bridge content captures the "cares about both" audience — competitors rarely sit on both sides.

  7. Write an “angle card” per gap topic. For each one, prompt:

    For this gap topic: <title>
    
    1. Search the current Google top 10 — note their titles, opening lines, structure
    2. List 3 common blind spots (all dodge X / all answer Y shallowly / all are stale)
    3. Give me 1 "counter-conventional + first-person + number" opening angle
    4. List 5 pieces of specific evidence I must collect before publishing (numbers / screenshots / tool versions)

  8. Write the results back to a content queue, content_pipeline.csv:

    slug,title,pillar,cluster,target_keyword,intent,angle_note,evidence_needed,priority
    ai-app-store-screenshot-review,Use AI to audit App Store screenshots,indie-dev,app-store-launch,app store screenshot review ai,how-to,counter-conventional + measured,3 before/after sets,P1
    ...

    priority = P1 (high-impression GSC query + your strongest angle) / P2 (bridge topics) / P3 (exploratory). Pick 3 P1 topics per week to actually write.
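
Picking the week's topics from the queue can be sketched as below. It assumes the column layout shown above and that rows are already in the order you want to write them; weekly_picks is a hypothetical helper, and every sample row except the first is placeholder data.

```python
import csv
import io


def weekly_picks(rows, per_week=3):
    """Return up to per_week rows whose priority is P1, preserving file order."""
    p1 = [r for r in rows if r["priority"].strip().upper() == "P1"]
    return p1[:per_week]


# First row is the example from the queue above; the rest are placeholders.
queue_csv = """slug,title,pillar,cluster,target_keyword,intent,angle_note,evidence_needed,priority
ai-app-store-screenshot-review,Use AI to audit App Store screenshots,indie-dev,app-store-launch,app store screenshot review ai,how-to,counter-conventional + measured,3 before/after sets,P1
example-bridge-topic,Example bridge topic,ai-tools,example-cluster,example keyword,how-to,tbd,tbd,P2
example-p1-b,Example P1 topic B,indie-dev,example-cluster,example keyword b,how-to,tbd,tbd,P1
example-p1-c,Example P1 topic C,ai-tools,example-cluster,example keyword c,how-to,tbd,tbd,P1
example-p1-d,Example P1 topic D,ai-tools,example-cluster,example keyword d,how-to,tbd,tbd,P1
"""
queue = list(csv.DictReader(io.StringIO(queue_csv)))
picks = weekly_picks(queue)
print([r["slug"] for r in picks])
```

Keeping the selection rule this simple is deliberate: the ordering inside P1 is your call, since the AI cannot see your edge or your effort cost.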

Common pitfalls

  • Letting the AI suggest topics without your data. It will produce generic “things people search in your niche” lists that miss what your site is actually positioned for.
  • Skipping the manual pillar tagging. Without structure, AI cannot give structured gaps.
  • Pursuing every gap. Most “gaps” are gaps because nobody searches for them. Cross-check intent.
  • Doing this only once. Search Console data shifts; gap analysis should run quarterly.
  • Trusting AI volume estimates. It hallucinates traffic numbers freely.

Who this is for

Established sites (50+ articles) with a real pillar structure and Search Console history.

When to skip this

Brand-new sites with no Search Console data — focus on writing the first 30 articles before optimizing gaps.

FAQ

  • Should I scrape competitors too? Yes, as a supplement, but your own data is a stronger signal than competitor mimicry. Use competitor scraping to validate, not to seed.
  • How many gap topics should I queue at once? 8-15 is a healthy backlog. With more than that, you start picking less-promising topics just to clear the list.
  • Can the AI prioritize the gaps for me? It can give an opinion, but final prioritization should reflect your edge and effort cost, neither of which the AI sees.
  • What if AI suggests gaps already on my list? Good signal: it means the gap is real. Move it up the queue.
