Fix Weak, Generic Gemini Document Summaries

Gemini summaries miss key numbers and skip risk sections? Tell it what to extract and how to structure it. Tested prompts, model-picker fix, and verification steps for June 2026.

Published: May 17, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You upload an 80-page 10-K filing, type summarize, and get back five paragraphs that miss the key numbers, skip the risk factors, and read like a press release. This is the default LLM behavior on long documents: when you don’t say what you care about, the model surfaces the obvious section headings and skims past everything else.

Fastest fix: stop sending a one-word prompt. Tell Gemini exactly what to extract (decisions, risks, numbers) and how to structure it (a table or fixed headings), and make sure you’re on the Pro model, not Fast. That alone fixes most weak summaries. The rest of this page covers the long-document cases where that isn’t enough.

Which bucket are you in

Symptom	Most likely cause	Jump to
Output is a vague overview with no specifics	Prompt was just “summarize”	Step 1 and Step 6
First and last pages covered, middle missing	Long-context recall drop on a huge doc	Step 3
Summary cites no numbers from a doc full of tables	Tables flattened during PDF parsing	Step 4
Shallow reasoning, generic phrasing	Running on `Fast` (Gemini 3 Flash)	Step 5
Numbers in the summary look wrong	Hallucinated figure	Step 7
You need citations you can trust	Wrong tool for the job	Use NotebookLM instead

Common causes

1. The prompt is just “summarize” (most common)

A prompt like “Summarize this document” tells the model nothing about whether you want decisions, risks, numbers, or strategy, so it defaults to a non-specific overview.

How to tell: your prompt is a single verb.

2. Long-context recall drops in the middle of very large docs

As of June 2026, Pro in the Gemini app runs Gemini 3.1 Pro with a 1M-token context window (roughly 1,500 pages of text), so a typical PDF now fits whole. But “fits in context” is not the same as “fully recalled.” On documents near the top of that window, recall is strongest at the beginning and end and weakest in the middle, a well-documented “lost in the middle” effect. Critical appendix or mid-document data still gets thin treatment.

How to tell: mid-document topics are completely absent from the summary while the intro and conclusion are well covered.
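For a quick sanity check on how much of the window a document occupies, the page figure above turns into simple arithmetic. This is a minimal Python sketch, assuming the rough ratio of 1M tokens to 1,500 pages quoted above holds for your document (dense tables or scanned pages will differ). The function names are made up for illustration.

```python
# Rule of thumb taken from the figures above: 1M tokens is roughly 1,500 pages.
CONTEXT_WINDOW_TOKENS = 1_000_000
PAGES_PER_WINDOW = 1_500


def estimate_tokens(pages: int) -> int:
    """Rough token estimate for a text document of `pages` pages."""
    return round(pages * CONTEXT_WINDOW_TOKENS / PAGES_PER_WINDOW)


def window_fraction(pages: int) -> float:
    """Share of the context window the document would occupy (1.0 = full)."""
    return estimate_tokens(pages) / CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    for pages in (80, 400, 1_200):
        print(pages, estimate_tokens(pages), f"{window_fraction(pages):.0%}")
```

By this estimate an 80-page filing is around 53,000 tokens, about 5% of the window, so the recall drop described here is mainly a concern for much larger documents.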

3. Tables, figures, and code blocks lost in parsing

Complex tables flatten or vanish during PDF parsing, so Gemini never sees the structure and the summary cites no specific numbers from them.

How to tell: the source has key tables (financials, comparisons) but the summary contains no figures from them.

4. No output structure specified

Free-form prose makes the information you actually want hard to locate, and lets the model quietly skip whatever it didn’t surface.

5. The document itself is low-information

Marketing or SEO content is generic to begin with, and the summary inherits that.

6. You’re on `Fast` (Gemini 3 Flash), not `Pro`

Fast is tuned for speed and is noticeably shallower than Pro on long-document reasoning.

Shortest path to fix

Step 1: Outline first, then drill

Don’t ask for the summary in one shot. Round 1:

Read this document. Give me ONLY a section-by-section outline (no summary yet):
- Section title
- Section length (pages)
- Key claim / topic (one sentence)

Review the outline, pick the 5-10 sections that matter, then Round 2:

Now give me a detailed summary of these sections only:
{section names}

For each:
- Key facts (with numbers)
- Decisions / recommendations
- Risks mentioned
- Direct quotes for critical claims

This forces the model to commit to coverage instead of paraphrasing the whole doc at low resolution.
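If you run this two-round flow often, you can fill the Round 2 prompt from your picked sections instead of retyping it. A small Python sketch, assuming you copy section titles out of the Round 1 outline by hand; `build_round_two_prompt` is a hypothetical helper, not part of any Gemini tooling.

```python
ROUND_TWO_TEMPLATE = """Now give me a detailed summary of these sections only:
{section_names}

For each:
- Key facts (with numbers)
- Decisions / recommendations
- Risks mentioned
- Direct quotes for critical claims"""


def build_round_two_prompt(sections: list[str]) -> str:
    """Fill the Round 2 prompt with the sections picked from the outline."""
    # Drop blank entries and stray whitespace from copy-pasted titles.
    cleaned = [name.strip() for name in sections if name.strip()]
    if not cleaned:
        raise ValueError("pick at least one section from the outline")
    bullet_list = "\n".join(f"- {name}" for name in cleaned)
    return ROUND_TWO_TEMPLATE.format(section_names=bullet_list)


if __name__ == "__main__":
    # Section titles here are placeholders; use the ones from your outline.
    print(build_round_two_prompt(["Risk Factors", "Liquidity and Capital Resources"]))
```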

Step 2: Give it a structured output template

Don’t let Gemini write prose. Hand it slots to fill:

Summarize this 10-K filing using this exact structure:

## Business Overview
- Main revenue segments + % of total
- Geographic mix

## Financial Highlights
| Metric | This year | Last year | YoY change |
|---|---|---|---|
| Revenue | | | |
| Operating margin | | | |
| Free cash flow | | | |
| Headcount | | | |

## Risk Factors (top 5)
1. ... (with page reference)

## Strategic Initiatives
- ...

## Management Tone Indicators
- Words used more / less than last year's filing

Explicit format, table slots, and numeric requirements mean the model can’t quietly leave gaps.
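If you paste the response into a script, you can check for blank slots mechanically instead of by eye. A rough Python sketch, assuming Gemini returns the table in the same pipe-delimited markdown as the template; `parse_markdown_table` and `unfilled_metrics` are hypothetical helper names, and the figures in the demo are invented.

```python
def parse_markdown_table(text: str) -> list[list[str]]:
    """Return the rows of every pipe-delimited table line, cells stripped."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table line
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip the header separator row, e.g. |---|---|
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    return rows


def unfilled_metrics(summary: str) -> list[str]:
    """Names of table rows (first column) that still have an empty cell."""
    rows = parse_markdown_table(summary)
    body = rows[1:]  # the first parsed row is the header
    return [row[0] for row in body if any(cell == "" for cell in row[1:])]


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    sample = """
| Metric | This year | Last year | YoY change |
|---|---|---|---|
| Revenue | 10 | 8 | +25% |
| Headcount | | 500 | |
"""
    print(unfilled_metrics(sample))
```

Any metric this returns is a slot the model left blank, which is a good cue to ask a follow-up for that row specifically.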

Step 3: Chunk very long docs

The 1M-token window means most single PDFs fit, but if you’re seeing the middle drop out (cause 2) on a doc of several hundred pages, split it:

1. Split the document into batches of about 30 pages (for an 80-page filing: pages 1-30, 31-60, 61-80).
2. Upload each batch separately and request a summary using the Step 2 template.
3. Finally, paste the batch summaries back and ask Gemini to consolidate them and extract cross-batch themes.

Smaller batches keep every page in the model’s high-recall zone. The Gemini app accepts up to 10 files per prompt and 100MB per file (non-video) as of June 2026, so multi-file uploads are practical.
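Working out the page ranges for a longer document is easy to get off by one. A minimal Python sketch of the batching arithmetic; `page_batches` is a hypothetical helper and the default batch size of 30 is simply the one used in the example above. Splitting the PDF itself is left to whatever tool you already use.

```python
def page_batches(total_pages: int, batch_size: int = 30) -> list[tuple[int, int]]:
    """Return inclusive (first_page, last_page) ranges covering the document."""
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be positive")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


if __name__ == "__main__":
    print(page_batches(80))  # [(1, 30), (31, 60), (61, 80)]
```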

Step 4: Extract tables separately first

If the key information lives in tables, pull them before you ask for meaning:

Extract every table from this document.
For each:
- Table title
- Headers (row + column)
- All cell values as markdown
- Page number

Get the numbers into clean markdown first, then run the semantic summary on top, so the figures survive parsing.

Step 5: Use `Pro` (Gemini 3.1 Pro), not `Fast`

In the Gemini app, the model picker at the top of the chat shows three options as of June 2026: Fast (Gemini 3 Flash), Thinking, and Pro (Gemini 3.1 Pro). For document analysis, choose Pro.

Model picker (top of chat) -> Pro

If you have Google AI Pro or Ultra, Pro also exposes a Thinking level toggle (Standard / Extended); pick Extended for dense filings where you want deeper reasoning per response. Pro is meaningfully deeper than Fast on long-document summarization.

Step 6: The “decisions / risks / numbers” template

A general-purpose extraction prompt that suits most business documents:

Extract from this document:

DECISIONS: What did the author decide or recommend?
RISKS: What risks are mentioned? Use original phrasing.
NUMBERS: All quantitative claims (dates, percentages, dollar amounts) with surrounding context.
GAPS: What questions does the document raise but not answer?

The GAPS line is the one most people skip, and it’s the one that surfaces what a generic summary hides.
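To confirm the response actually filled all four slots, you can split it on the labels and flag anything missing or empty. A rough Python sketch, assuming the model echoes each label at the start of a line followed by a colon, as in the prompt (it may not always do so); the helper names are made up for illustration.

```python
import re

LABELS = ("DECISIONS", "RISKS", "NUMBERS", "GAPS")
# Matches a label at the start of a line, e.g. "RISKS:".
LABEL_RE = re.compile(rf"^({'|'.join(LABELS)}):[ \t]*", re.MULTILINE)


def split_sections(response: str) -> dict[str, str]:
    """Map each label found in the response to the text that follows it."""
    matches = list(LABEL_RE.finditer(response))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next label or the end of the response.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        sections[match.group(1)] = response[match.end():end].strip()
    return sections


def missing_sections(response: str) -> list[str]:
    """Labels that are absent from the response or have no content."""
    found = split_sections(response)
    return [label for label in LABELS if not found.get(label)]
```

If `missing_sections` returns `GAPS`, re-ask for that line on its own rather than rerunning the whole prompt.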

Step 7: Verify critical numbers

LLMs occasionally hallucinate figures in long-document summaries (uncommon, but worth checking on anything you’ll act on):

1. Pick the 5 most critical numeric claims.
2. Ctrl+F (or Cmd+F) each one in the source PDF.
3. For any mismatch, ask Gemini to re-extract: “The figure you reported for X is wrong. Find the actual value on page Y and quote the surrounding sentence.”
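The Ctrl+F pass can be roughed out in a script when there are many figures to check. This is a minimal Python sketch, assuming you already have the source as plain text (for example, copied out of the PDF). It only does a literal match after stripping thousands separators, so a figure restated in different units ($4.2 billion vs. $4,210 million) will be flagged even if it is correct. `unverified_numbers` is a hypothetical helper and the sample sentences are invented.

```python
import re

# Integers or decimals, with optional thousands separators: 2025, 4,210, 8.5
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def normalize(number: str) -> str:
    """Drop thousands separators so 4,210 and 4210 compare equal."""
    return number.replace(",", "")


def unverified_numbers(summary: str, source_text: str) -> list[str]:
    """Numbers that appear in the summary but nowhere in the source text."""
    source_numbers = {normalize(n) for n in NUMBER_RE.findall(source_text)}
    flagged = []
    for number in NUMBER_RE.findall(summary):
        if normalize(number) not in source_numbers and number not in flagged:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    # Invented sentences, for illustration only.
    source = "Revenue was $4,210 million in 2025, up 8.5% from 2024."
    summary = "Revenue reached $4,210 million (up 9.5%) in 2025."
    print(unverified_numbers(summary, source))
```

Treat anything it flags as a candidate for the re-extraction prompt in step 3, not as proof of a hallucination.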

Alternative: use NotebookLM for grounded citations

If your real need is a summary you can trust and cite, the Gemini app is the wrong tool. NotebookLM (notebooklm.google) is built for source-grounded analysis: it stays inside the documents you give it and attaches inline citations that link back to the exact passage, so you can verify every claim. It accepts up to 500,000 words per source, with the per-notebook source count rising by plan (the free tier caps lower; paid tiers allow several hundred). Notebooks are now shared between Gemini and NotebookLM, so you can move a project between them. Use NotebookLM when verifiability matters; use the Gemini app when you want web search and broader tools around the doc.

How to confirm it’s fixed

A summary is good enough to act on when all of these are true:

It cites specific numbers (revenue, margins, dates), not just adjectives.
It covers mid-document sections, not only the intro and conclusion.
The figures you spot-checked in Step 7 match the source.
It answers the question you actually asked (the DECISIONS / RISKS / NUMBERS you requested), not a generic overview.

Prevention

Outline first, drill second. Never run a single-pass summarize.
Use a structured output template (tables plus bullets) to prevent prose drift.
For very long docs where the middle drops out, chunk into 30-40 page batches.
Extract tables separately before the semantic summary so numbers don’t vanish.
Keep document analysis on Pro (Gemini 3.1 Pro), not Fast.
Verify critical numbers yourself, or use NotebookLM when you need citations.

Tags: #Gemini #Debug #Troubleshooting
