

AI Systematic Literature Review Tutorial Without Hallucination

Run a PRISMA-grade systematic review with AI as a screening and extraction layer — never as a citation generator. Tools, prompts, and disclosure rules for June 2026.

Published: May 23, 2026 Updated: Jun 04, 2026 Author: AI Productivity Guide Team

Asking an AI to “write a literature review” is how people get a four-page essay with seven made-up citations and one accidental retraction. A real systematic review still needs you to define inclusion criteria, run the actual database searches, screen, and extract. AI earns its place in three spots only: triaging abstracts, pulling structured data from full texts you have downloaded, and helping draft synthesis where every claim resolves to a numbered paper. This tutorial walks the loop that keeps PRISMA-grade rigor and cuts the busywork roughly in half.

TL;DR

AI does screening, extraction, and synthesis drafting. It does not run your database search or generate citations.
As of June 2026, the reporting standard for AI-assisted reviews is PRISMA-trAIce (14 items, published in JMIR AI on December 10, 2025): record every tool, version, prompt, and which stage it touched.
Long-context models earn their keep: Claude Opus 4.7 / Sonnet 4.6 (1M tokens), Gemini 3.1 Pro (1M), GPT-5.5. Short-context models drop mid-paper detail during extraction.
LLM abstract screening hits very high sensitivity (up to 100% in 2025 Cochrane-corpus tests) but low precision — treat it as a second reviewer biased toward inclusion, never as the sole gatekeeper.
Cochrane’s stated position (2025): current evidence does not support generative-AI use in evidence synthesis without human oversight. Build the audit trail accordingly.

What this covers

A real systematic-review workflow with AI in three roles only: screening abstracts against your inclusion criteria, extracting structured data from full texts you have actually downloaded, and helping draft the synthesis where every claim points back to a numbered paper. The workflow assumes PRISMA-2020 discipline plus PRISMA-trAIce-style AI disclosure, not “skim and summarize.”

Who this is for

PhD students writing the review chapter, postdocs producing a meta-analysis, evidence-synthesis teams in medicine, policy, or ed-research, and any consultant who needs a defensible “what does the literature say” section. Less useful for casual reading — for that, the AI paper-reading workflow is faster.

When to reach for it

When you have a defined research question, access to the right databases (PubMed, Scopus, Web of Science, ACM, Semantic Scholar), and a body of literature too large to read linearly. Not the right tool when the field has fewer than 20 candidate papers — read those by hand — or when the target journal bans AI-assisted screening; check author guidelines first.

Pick your tools first

Two layers matter: a workflow tool that holds the audit trail (deduplication, dual screening, PRISMA flow diagram) and a long-context LLM that does the reading. Do not collapse them into one chat window — reviewers want the artifact, not a transcript. Pricing as of June 2026:

Tool	Role	Free tier	Paid (individual)	Notes
Rayyan	Screening + audit trail	3 active reviews, 2 reviewers	Essential $4.99/seat/mo, Advanced $8.33/seat/mo (billed annually)	Advanced adds AI PICO extraction; Academic license $25/mo
Covidence	Full SR workflow	Limited trial review	~$340/year per reviewer	Cochrane’s standard; built-in PRISMA reporting
Elicit	Discovery + extraction	Limited monthly credits	$20/mo	Searches 200M+ papers via the Semantic Scholar corpus
ASReview	Active-learning screening	Free, open source	—	Self-hosted; good for huge candidate sets
Claude / ChatGPT / Gemini	Extraction + synthesis	Free tiers (capped)	$20/mo (Pro/Plus)	Long context is what you pay for; see model table below

Rayyan plus one long-context chat model covers most solo reviews. Cochrane-protocol teams default to Covidence.

Pick your model

Model	Context window	Best for	Watch out
Claude Opus 4.7	1M tokens	Nuanced extraction, contested fields	Higher cost ($5 input / $25 output per 1M tokens via the API)
Claude Sonnet 4.6	1M tokens	The default workhorse	—
Gemini 3.1 Pro	1M tokens	Very long PDFs, multi-paper batches	Verify table extraction
GPT-5.5	~320 pages in-app (Plus); full 1M only on $200 Pro	Quick screening passes	In-app context is tighter than the API

For extraction, prefer a 1M-token model so a 30-page methods-heavy paper fits without truncation. If you script extraction with Claude Code, note that it runs Anthropic models only.

Before you start

Write your PICO or equivalent. Population, intervention, comparator, outcome — or your field’s framing. Without it, inclusion screening drifts into vibes.
Decide your databases and search strings in advance. The AI does not run the database search; you do.
Lock the extraction columns before screening: design, sample, method, primary result, effect size with CI, limitations.
Open a disclosure log now. PRISMA-trAIce wants the tool name, version (for example “Claude Opus 4.7, June 2026”), the stage it touched, and the verbatim prompt. Retrofitting this after the review is painful.
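One low-effort way to keep that log is a JSON Lines file with one record per AI step. The sketch below is an assumption about layout, not a prescribed format: the file name `disclosure_log.jsonl`, the function names, and the field names are ours, chosen to mirror the four items above. Check the PRISMA-trAIce checklist itself for the exact items your journal asks for.

```python
import json
from datetime import date
from pathlib import Path

# Hypothetical log location: one JSON object per line (JSON Lines).
LOG_PATH = Path("disclosure_log.jsonl")


def log_ai_step(tool: str, version: str, stage: str, prompt: str,
                path: Path = LOG_PATH) -> dict:
    """Append one AI-assisted step to the disclosure log and return the record."""
    record = {
        "date": date.today().isoformat(),  # when the step ran
        "tool": tool,                      # e.g. "Claude"
        "version": version,                # e.g. "Opus 4.7, June 2026"
        "stage": stage,                    # e.g. "title/abstract screening"
        "prompt": prompt,                  # verbatim prompt, exactly as sent
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log(path: Path = LOG_PATH) -> list[dict]:
    """Load every logged step, e.g. to build the disclosure table at submission."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Call `log_ai_step` immediately after each screening batch or extraction run, so the prompt is captured before you edit it for the next round.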

Step by step

Run the database search yourself. PubMed, Scopus, Semantic Scholar — whatever your field uses. Export the hits as RIS or CSV. The AI does not search; it screens. This separation is the whole reason the review is defensible.
Deduplicate before screening. Rayyan and Covidence do this automatically; if you are working in a chat window, dedupe in your reference manager first. Duplicate hits silently inflate your counts and your PRISMA flow diagram.
Title and abstract screening with AI as a second reviewer. Paste 20-50 abstracts at a time with your inclusion criteria. Ask: “for each, output INCLUDE, EXCLUDE, or UNCLEAR, with a one-line rationale referencing my criteria.” 2025 diagnostic-accuracy tests on the Cochrane corpus found LLMs reach near-100% sensitivity but low precision, so bias the prompt toward inclusion and treat UNCLEAR as automatic full-text review.
Reconcile against your own first-reviewer pass. Disagreements are signal — they expose ambiguous criteria. Most reviews need 1-2 criteria rewrites here. Log inter-reviewer agreement; PRISMA-trAIce expects you to separate human exclusions from AI exclusions.
Download full texts for the included set. Non-negotiable. You cannot extract data from a paper you have not downloaded. Keep them in one folder and name each file firstauthor_year_id.pdf.
Run extraction one paper at a time. Upload the PDF. Ask: “Extract design, sample size, intervention, comparator, primary outcome, effect size with CI, and the single biggest limitation the authors acknowledge. Return as one row of pipe-separated values matching this column order.” The extraction step here is the structured cousin of pass 2 in the AI paper-reading workflow.
Spot-check 20 percent of extracted rows against the source. Open the PDF. Find the number. If the AI got effect size or sample size wrong, you have a calibration problem — switch models or shorten the prompt.
Synthesis draft. Cluster included papers by design or by intervention. Ask the AI to draft a paragraph per cluster, citing only by your numeric IDs. Never let it invent author names or years; run the result through an AI citation check workflow before it touches the manuscript.
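If you deduplicate outside Rayyan or Covidence, a rough first pass can be scripted over exported records. This sketch assumes each record is a dict with `title`, `year`, and an optional `doi` key (for example, rows read from a CSV export). It matches on normalized DOI and falls back to normalized title plus year. It will miss a pair where only one record carries a DOI, so treat it as a supplement to your reference manager's duplicate finder, not a replacement.

```python
import re
import unicodedata


def norm_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    t = unicodedata.normalize("NFKD", title)
    t = "".join(ch for ch in t if not unicodedata.combining(ch))
    t = re.sub(r"[^a-z0-9 ]+", " ", t.lower())
    return " ".join(t.split())


def dedupe_key(rec: dict) -> str:
    """Prefer the DOI; otherwise use normalized title plus year."""
    doi = (rec.get("doi") or "").strip().lower()
    if doi:
        # Drop resolver prefixes so "https://doi.org/10.x/y" matches "10.x/y".
        doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi).strip()
        return "doi:" + doi
    return f"title:{norm_title(rec.get('title', ''))}|{rec.get('year', '')}"


def dedupe(records: list[dict]) -> tuple[list[dict], int]:
    """Keep the first record per key; return (unique records, duplicates removed)."""
    seen, unique = set(), []
    for rec in records:
        key = dedupe_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique, len(records) - len(unique)
```

The second return value is the duplicates-removed count for your PRISMA flow diagram.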
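The title-and-abstract screening step is easy to script once the model answers in a fixed line format. This is a minimal sketch that assumes you ask for one line per abstract as `ID | DECISION | rationale`. That format, the helper names, and the default batch size of 50 (the top of the 20-50 range) are our assumptions, and the code does not call any model API. Unknown labels and abstracts the model skipped both become UNCLEAR, so nothing is excluded by a parsing accident.

```python
from dataclasses import dataclass

VALID = {"INCLUDE", "EXCLUDE", "UNCLEAR"}


@dataclass
class Decision:
    record_id: str
    decision: str
    rationale: str
    to_full_text: bool  # True for INCLUDE and UNCLEAR


def batches(records: list[tuple[str, str]], size: int = 50):
    """Yield consecutive slices of (id, abstract) pairs, at most `size` each."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_prompt(criteria: str, batch: list[tuple[str, str]]) -> str:
    """Assemble one screening prompt that is biased toward inclusion."""
    header = (
        f"Inclusion criteria:\n{criteria}\n\n"
        "For each abstract, output one line: ID | INCLUDE, EXCLUDE, or UNCLEAR | "
        "a one-line rationale referencing my criteria. "
        "When in doubt, answer INCLUDE or UNCLEAR.\n\n"
    )
    return header + "\n\n".join(f"[{rid}] {text}" for rid, text in batch)


def parse_reply(reply: str, expected_ids: list[str]) -> list[Decision]:
    """Parse 'ID | DECISION | rationale' lines into one Decision per expected ID."""
    seen: dict[str, Decision] = {}
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) < 2 or not parts[0]:
            continue  # skip blank lines and stray prose
        rid, label = parts[0].strip("[]"), parts[1].upper()
        rationale = parts[2] if len(parts) == 3 else ""
        if label not in VALID:
            label = "UNCLEAR"  # unknown label: send to full text, never exclude
        seen[rid] = Decision(rid, label, rationale, label != "EXCLUDE")
    # Abstracts the model skipped also go to full-text review.
    for rid in expected_ids:
        if rid not in seen:
            seen[rid] = Decision(rid, "UNCLEAR", "no decision returned", True)
    # IDs the model invented are dropped.
    return [seen[rid] for rid in expected_ids]
```

Keep the raw reply next to the parsed decisions; the reconciliation step needs the model's rationale, not only its label.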
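Pipe-separated extraction replies can be checked mechanically before they enter the sheet. The sketch below assumes the column order from the extraction prompt above and uses our own key names. A wrong cell count (often a stray pipe inside a cell) raises an error instead of silently shifting values into the wrong columns. The CI check is a crude keyword test, not a validator, and it does not replace opening the PDF.

```python
import re

# Column order from the extraction prompt; the key names are ours.
COLUMNS = ["design", "sample_size", "intervention", "comparator",
           "primary_outcome", "effect_size_ci", "limitation"]


def parse_row(paper_id: str, line: str) -> dict:
    """Split one pipe-separated reply into a dict keyed by COLUMNS."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(
            f"{paper_id}: expected {len(COLUMNS)} cells, got {len(cells)}")
    row = dict(zip(COLUMNS, cells))
    row["paper_id"] = paper_id
    return row


def effect_size_ok(cell: str) -> bool:
    """Crude check: the cell mentions a CI or explicitly says 'not reported'."""
    text = cell.lower()
    return "not reported" in text or re.search(r"\bci\b", text) is not None
```

Rows that fail `effect_size_ok` are good candidates to include in the manual spot-check regardless of the random sample.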

First-run exercise

Pick a sub-question of your real review — narrow enough that 10-15 papers cover it. Run the full loop end-to-end on that sub-question first. Time each phase. Most teams find screening compresses the most, extraction modestly, and synthesis the least. Use the per-phase timing to budget the full review; the sub-question also tests whether your inclusion criteria are crisp enough.
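Turning pilot timings into a budget is simple arithmetic, as long as each phase is scaled by its own unit: screening by candidate abstracts, extraction by included papers, synthesis by clusters. The sketch below assumes time grows linearly with those counts, which is optimistic for synthesis. The phase names and every number in the example are hypothetical.

```python
def budget_hours(phases: dict[str, tuple[float, int, int]]) -> dict[str, float]:
    """Scale pilot timings to the full review.

    phases maps a phase name to (pilot minutes, pilot units, full-review units),
    where a "unit" is whatever that phase scales with (abstracts, papers, clusters).
    Assumes time is linear in units; returns estimated hours per phase.
    """
    hours = {}
    for name, (minutes, pilot_units, full_units) in phases.items():
        if pilot_units <= 0:
            raise ValueError(f"{name}: pilot units must be positive")
        per_unit = minutes / pilot_units          # minutes per abstract/paper/cluster
        hours[name] = per_unit * full_units / 60  # convert to hours
    return hours


if __name__ == "__main__":
    # Hypothetical pilot: 150 abstracts screened, 12 papers extracted, 3 clusters drafted.
    estimate = budget_hours({
        "screening": (60, 150, 1500),
        "extraction": (180, 12, 100),
        "synthesis": (90, 3, 8),
    })
    for phase, h in estimate.items():
        print(f"{phase}: {h:.1f} h")
```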

Quality check

Every cell in the extraction sheet matches a sentence or table in the source PDF — spot-check 20 percent.
Screening disagreements between you and the AI were logged, not silently overridden.
No citation in the final synthesis is invented — every numeric ID resolves to a paper in your downloaded folder.
Effect sizes report confidence intervals or “not reported” — never a single number with no spread.
Your disclosure log covers tool, version, stage, and prompt for every AI step (the PRISMA-trAIce minimum).
Synthesis paragraphs have a clear shape: established, contested, gap. If everything reads “established,” you are flattering the field.
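Drawing the 20 percent spot-check sample at random, rather than checking whichever rows come first, keeps the check honest. A minimal sketch, assuming paper IDs are strings; the fixed seed and rounding up to at least one row are our choices. Recording the seed in your disclosure log lets a reviewer re-draw the same sample.

```python
import math
import random


def spot_check_sample(paper_ids: list[str], fraction: float = 0.2,
                      seed: int = 2026) -> list[str]:
    """Pick a reproducible random subset (at least one row) for manual checking."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not paper_ids:
        return []
    k = max(1, math.ceil(len(paper_ids) * fraction))  # round up, never zero
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return sorted(rng.sample(paper_ids, k))
```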
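The "every numeric ID resolves" check can be automated against your downloads folder. This sketch assumes citations appear as bracketed numbers such as `[3]` or `[3, 7]`, and that files follow the firstauthor_year_id.pdf pattern with a numeric ID and no underscore in the author part. Both patterns are assumptions to adapt. The check only confirms that each ID exists, not that the paper supports the claim, so the citation-check workflow still applies.

```python
import re
from pathlib import Path

CITE = re.compile(r"\[(\d+(?:\s*[,;]\s*\d+)*)\]")  # [3], [3, 7], [3; 7]
FILENAME = re.compile(r"^[^_]+_\d{4}_(\d+)\.pdf$")  # firstauthor_year_id.pdf


def cited_ids(text: str) -> set[int]:
    """Collect every numeric ID cited in the synthesis text."""
    ids: set[int] = set()
    for group in CITE.findall(text):
        ids.update(int(x) for x in re.split(r"\s*[,;]\s*", group))
    return ids


def folder_ids(folder: Path) -> set[int]:
    """Collect the numeric IDs of downloaded PDFs that match the naming pattern."""
    ids: set[int] = set()
    for pdf in folder.glob("*.pdf"):
        m = FILENAME.match(pdf.name)
        if m:
            ids.add(int(m.group(1)))
    return ids


def unresolved(text: str, folder: Path) -> set[int]:
    """IDs cited in the text with no matching PDF: each one must be fixed."""
    return cited_ids(text) - folder_ids(folder)
```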

How to reuse this workflow

Save inclusion criteria, extraction columns, and the screening prompt as review_template.md. New question, new search string, same scaffolding.
Keep a model-calibration log: which model got effect sizes right at what rate, across which fields. This compounds across reviews.
Keep the screening reconciliation log. Reviewers asking “how did you handle ambiguous cases” want to see this artifact.
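The reconciliation log pairs naturally with an agreement statistic. Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), is one common choice for two raters; the sketch below is a plain standard-library implementation over parallel lists of human and AI labels, plus a helper that lists the disagreements to log. Whether kappa is the statistic your protocol or journal expects is an assumption to check.

```python
from collections import Counter


def cohens_kappa(human: list[str], ai: list[str]) -> float:
    """Cohen's kappa for two raters labeling the same items in the same order."""
    if len(human) != len(ai) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    observed = sum(h == a for h, a in zip(human, ai)) / n  # p_o
    h_counts, a_counts = Counter(human), Counter(ai)
    # p_e: chance agreement from each rater's label frequencies
    expected = sum(h_counts[k] * a_counts[k] for k in h_counts) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is 0/0 here,
        # and agreement is total, so report 1.0.
        return 1.0
    return (observed - expected) / (1 - expected)


def disagreements(ids: list[str], human: list[str], ai: list[str]) -> list[tuple[str, str, str]]:
    """Rows for the reconciliation log: every item where the two reviewers differ."""
    return [(rid, h, a) for rid, h, a in zip(ids, human, ai) if h != a]
```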
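The model-calibration log needs little more than a tally of spot-check outcomes. A minimal sketch, assuming you record one tuple per checked cell as (model, research domain, extraction column, correct?); the tuple layout and names are ours. Appending each review's tuples to the same file lets the rates accumulate across reviews.

```python
from collections import defaultdict


def calibration(checks: list[tuple[str, str, str, bool]]) -> dict[tuple[str, str, str], float]:
    """Accuracy per (model, domain, column) from spot-checked cells."""
    tally: dict[tuple[str, str, str], list[int]] = defaultdict(lambda: [0, 0])
    for model, domain, column, correct in checks:
        key = (model, domain, column)
        tally[key][0] += int(correct)  # cells the model got right
        tally[key][1] += 1             # cells checked
    return {key: right / checked for key, (right, checked) in tally.items()}
```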

Recommended workflow

PICO question → search string → database hits → dedupe → AI second-reviewer screening → reconcile + log → download full texts → structured extraction → 20 percent spot-check → cluster and synthesize → cite by numeric IDs → citation check → PRISMA-trAIce disclosure. Plan roughly one week for a 100-paper review with AI assistance, versus three weeks linear.

Common mistakes

Asking the AI to “find the relevant papers” — it cannot replace your database search and will invent citations.
Skipping the extraction spot-check — confident-sounding errors enter the sheet and survive into the meta-analysis.
Trusting AI screening as a solo gatekeeper. High sensitivity, low precision: it is great at not missing papers, bad at confidently excluding them.
Letting the AI cluster papers without your judgment — clusters end up by surface topic rather than by mechanism.
Treating UNCLEAR as exclude — you lose the borderline papers that are usually the most interesting.
Using a short-context model on long papers — the back half gets summarized away.
Forgetting to record prompt versions and model versions. PRISMA-trAIce and your reviewers will both ask.

FAQ

Will my journal accept AI-assisted reviews?

Most accept AI for screening and extraction if you disclose it; few accept AI-generated prose without revision. As of June 2026, the expected disclosure format is the PRISMA-trAIce checklist (14 items): record tool, version, stage, and prompts. Check the specific author guidelines too.

Is it safe to let AI do the screening alone?

No. 2025 Cochrane-corpus tests show LLMs reach near-100% sensitivity but low precision, and Cochrane’s stated position is that current evidence does not support generative-AI use without human oversight. Use it as a second reviewer.

Which model for extraction?

Long-context preferred. Claude Sonnet 4.6 or Opus 4.7 for nuanced fields; Gemini 3.1 Pro for very long PDFs. All offer a 1M-token window as of June 2026.

How many abstracts per screening batch?

20-50. Beyond 50, the model starts averaging the criteria.

What about non-English papers?

AI translation helps for screening but is risky for extraction. For included non-English papers, get a human translation of the methods section.

Should I use Rayyan or Covidence?

Use one of them alongside the LLM, not instead of it. They hold the audit trail and PRISMA flow diagram that a chat window cannot. Rayyan has a free tier (3 active reviews); Covidence (~$340/year per reviewer) is the Cochrane-protocol standard.

Tags: #lit-review #Research #Tutorial


Related Articles

AI Competitive Research Tutorial: 5 Competitors in 30 Minutes

AI Historical Archive Research: A Primary-Sources-First Workflow

AI Market Sizing Tutorial: TAM/SAM/SOM From Top-Down + Bottom-Up

How to Check AI Citations and Sources: A 4-Pass Verification Workflow

AI Fact-Check Workflow: Verify a Claim in 3 Minutes

AI Industry Research Workflow: Deep Research, End to End