ChatGPT Image Generation: From First Image to a Consistent Series

How ChatGPT Images 2.0 works in June 2026, the smallest prompt that gives clean results, and how to keep a character or style consistent across an 8-image set.

Published: May 16, 2026 Updated: Jun 06, 2026 Author: AI Productivity Guide Team

ChatGPT’s image generation runs differently from Midjourney: there are no --ar flags, the model reasons about your request before it renders, and it can keep a character consistent across a whole set in one conversation. This guide walks you from “type the prompt” to a publishable image, then to a style-consistent series for social channels.

TL;DR

As of June 2026, ChatGPT image generation runs on ChatGPT Images 2.0 (model gpt-image-2), which OpenAI shipped April 21, 2026 to replace GPT Image 1.5.
Just type your request in the normal chat box. Words like “generate”, “draw”, “image”, or “create a picture of” trigger it; you don’t open a separate tool.
Free users get roughly 2-3 images per day on the base model. Plus ($20/mo) unlocks Thinking mode (reasoning, character consistency across up to 8 images) at roughly 50 images per rolling 3-hour window.
Aspect ratios run 3:1 to 1:3 at up to 2K resolution, set in plain language (“make it 4:5”, “vertical 9:16”). No Midjourney parameters.
The real edge over Midjourney: it remembers the previous image in the same chat, so “same character, new scene” actually holds.

What ChatGPT Images 2.0 changed

The April 2026 update is worth knowing because the workflow advice below depends on it. The table compares the new behavior with GPT Image 1.5, the model it replaced:

Capability	Before (GPT Image 1.5)	Now (ChatGPT Images 2.0)
Images per prompt	1	Up to 8, with character/object continuity
Aspect ratios	Limited presets	3:1 to 1:3, recomposed (not cropped)
Reasoning step	None	Plans layout and text placement before rendering (Plus+ Thinking)
In-image text	Often garbled	Readable typography, including Chinese, Japanese, Korean
Reference editing	Coarse	Upload an image, select a region, describe the change

The reasoning step matters most: on Plus and above, the model interprets spatial relationships and text placement first, so infographics, menus, and labeled diagrams come out legible instead of as gibberish.

Where it lives and which tier you need

There is no separate image panel. You generate from the regular chat box.

Tier	Price (June 2026)	Image access	Rough limit
Free	$0	Base model only	~2-3 images/day
Go	$8/mo	Base model	Higher than Free
Plus	$20/mo	Base + Thinking mode	~50 / 3-hr window
Pro	$200/mo	Base + Thinking + ImageGen Pro	Highest

OpenAI does not publish exact image caps, and they shift; treat these as community-observed ranges as of June 2026. The practical line: if you want consistent characters across a set, you need Plus or above so Thinking mode is available.

The minimum prompt that works

You don’t need Midjourney parameters, but structure still beats a vague sentence. Order the prompt like this:

Subject + Style + Lens / POV + Lighting + Mood + Aspect

Example:

A 30-year-old man drinking coffee by a cafe window, realistic photography style, 50mm f/1.8 lens, natural window light, warm and quiet mood, 4:5 aspect.

The two parts most people skip are lens and lighting direction, and those are exactly what separate a real-looking photo from “AI stock”. A named focal length (50mm, 85mm) plus a light source (“side-light from a window”, “overcast soft light”) does more than ten mood adjectives.
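
If you write many prompts, a small template keeps the six parts in order and makes it obvious when lens or lighting is missing. This is a minimal sketch, not an OpenAI tool: `ImagePrompt` and its field names are hypothetical, and the output is just text to paste into the chat box.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ImagePrompt:
    """Hypothetical container for the six prompt parts, in the order above."""

    subject: str
    style: str
    lens: str
    lighting: str
    mood: str
    aspect: str

    def render(self) -> str:
        # Refuse to render if any part is blank, since skipped parts
        # (usually lens and lighting) are what make results look generic.
        missing = [name for name, value in asdict(self).items() if not value.strip()]
        if missing:
            raise ValueError(f"missing prompt parts: {', '.join(missing)}")
        parts = [
            self.subject,
            self.style,
            self.lens,
            self.lighting,
            self.mood,
            f"{self.aspect} aspect",
        ]
        return ", ".join(part.strip() for part in parts) + "."


prompt = ImagePrompt(
    subject="A 30-year-old man drinking coffee by a cafe window",
    style="realistic photography style",
    lens="50mm f/1.8 lens",
    lighting="natural window light",
    mood="warm and quiet mood",
    aspect="4:5",
)
print(prompt.render())
```

Rendering this instance reproduces the example prompt above word for word.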

For aspect ratio, just say it in words. ChatGPT recomposes the frame for the new ratio rather than cropping, so the same idea ships cleanly as 1:1, 9:16, and 16:9 for different placements.
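
Before you ask for an unusual ratio, you can check that it falls inside the 3:1 to 1:3 range quoted above. The helper below is a sketch under that assumption: `aspect_phrase` is a hypothetical name, and the bounds are the figures from this article rather than values read from any API.

```python
from fractions import Fraction

# Bounds quoted in this article (assumption: both ends are inclusive).
MAX_RATIO = Fraction(3, 1)
MIN_RATIO = Fraction(1, 3)


def aspect_phrase(ratio: str) -> str:
    """Turn 'W:H' into a plain-language phrase, or raise if out of range."""
    width_text, height_text = ratio.split(":")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError("ratio parts must be positive integers")

    value = Fraction(width, height)  # exact arithmetic, no float rounding
    if not MIN_RATIO <= value <= MAX_RATIO:
        raise ValueError(f"{ratio} is outside the 3:1 to 1:3 range")

    if value > 1:
        orientation = "horizontal"
    elif value < 1:
        orientation = "vertical"
    else:
        orientation = "square"
    return f"{orientation} {width}:{height}"


for ratio in ("1:1", "9:16", "16:9"):
    print(aspect_phrase(ratio))
```

The returned phrase ("vertical 9:16") is the kind of wording to drop into the prompt in place of a Midjourney-style flag.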

How to keep a series consistent

This is ChatGPT’s killer feature: it remembers the previous image in the same chat. To use it:

Generate image 1, then name exactly what to preserve: “keep this lighting and color grade, same character and outfit”.
For each new image, anchor to the prior one: “Based on the previous image, same person, now in a kitchen at night.”
After 3-4 iterations, freeze the style into a fixed block of text and only change the subject line. Paste the same block every time.
On Plus or above, ask for the whole set at once: “Generate 4 images of this character in different scenes, consistent face and outfit.” Thinking mode holds continuity across up to 8 images in a single request.

Two things that break consistency: starting a new conversation (style resets) and editing the frozen style block mid-series. Keep one chat, keep the block stable.
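
The "frozen block" habit can be sketched as a tiny script that leaves the style text untouched and changes only the subject line. Everything here is illustrative: `series_prompts`, `ANCHOR`, and the sample style text are hypothetical, and the output is plain text to paste into one chat, one message at a time.

```python
# Anchor phrase that ties each new image to the prior one in the same chat.
ANCHOR = "Based on the previous image, "

# Example frozen style block (assumption: you wrote this after 3-4 iterations).
STYLE_BLOCK = (
    "Keep this lighting and color grade, same character and outfit, "
    "realistic photography style, 50mm f/1.8 lens, natural window light."
)


def series_prompts(style_block: str, subject_lines: list[str]) -> list[str]:
    """Pair each changing subject line with the same unchanged style block."""
    if not style_block.strip():
        raise ValueError("style block must not be empty")
    return [f"{ANCHOR}{subject}\n\n{style_block}" for subject in subject_lines]


prompts = series_prompts(
    STYLE_BLOCK,
    [
        "same person, now in a kitchen at night.",
        "same person, now on a rainy street.",
        "same person, now reading on a train.",
    ],
)
for text in prompts:
    print(text)
    print("---")
```

Because the style block is a single constant, there is no opportunity to edit it by accident mid-series, which is one of the two failure modes named above.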

Editing an existing image

Upload a photo and you can either select a region and describe the change, or describe a broader edit in conversation. Plain-language follow-ups steer the result without starting over: “remove the cup”, “change the jacket to navy”, “fix the hands”. Region selection is not pixel-perfect, so for tight masks, expect a couple of passes.

What it’s good for

Xiaohongshu / blog covers, banners, and thumbnail variants
Personal avatars and virtual-character exploration
Product mock-ups and scene shots for design conversations
App / website hero references and storyboards
Infographics and labeled diagrams (the readable-text upgrade makes this newly viable)

What it’s not good for

Vector, scalable, or fully editable final deliverables (logos, icons) — it outputs raster pixels, not editable paths
Assets that need strict IP clearance, such as trademarks or anything requiring documented licensing
Exact text on a tight deadline: typography is far better now, but it still needs proofreading

On rights: under OpenAI’s current usage terms, you own the images you create and can use them commercially, but don’t lean on AI for logos, trademarks, or anything needing strict IP clearance. Check the latest terms before a commercial launch.

FAQ

Where is ChatGPT image generation in the interface? It lives in the regular chat box on every tier (Free, Go, Plus, and Pro); there’s no separate panel. It triggers on words like “generate”, “draw”, “image”, or “create a picture of”. Free users are limited to roughly 2-3 images per day.

Do I need Midjourney-style parameters like --ar? No. ChatGPT doesn’t parse --ar, --v, or --sref. Use natural language for ratio (“4:5”, “vertical 9:16”). It supports 3:1 to 1:3 at up to 2K resolution, and it recomposes the frame for the new ratio instead of cropping.

How do I keep a 4-image series visually consistent? Generate everything in one conversation and reference the previous image explicitly (“same character, same outfit, now in a kitchen”). On Plus or above, Thinking mode can produce up to 8 images with continuity in a single request. Starting a new chat resets the style.

Why does my image look generic or “AI-stock”? The prompt is missing specifics. Concrete lens and lighting direction beat generic adjectives every time. Add a 50mm f/1.8, side-light from a window, and a named mood before re-running.

What’s the difference between Free and Plus for images? Free runs the base model only (~2-3 images/day). Plus ($20/mo) adds Thinking mode — the reasoning step that plans layout and text, plus character consistency across up to 8 images — at roughly 50 images per 3-hour window. Pro adds an ImageGen Pro layer on top.

Tags: #ChatGPT #Image generation #Prompt
