AI Image Wrong Perspective or Scale

Table has no horizon, stairs face the wrong way, head-to-body ratio is off. Add focal length + viewpoint + perspective style to the prompt opening.

You generated a kitchen scene and the table is flat against the camera — no horizon, no depth. Or the staircase goes “up” but the perspective lines point sideways. Or a person’s head is 50% the height of their body. The image content is right, but spatially it makes no sense.

Perspective and scale problems happen because the prompt gives the model no spatial anchor: no focal length, no viewpoint, no perspective style. Without them, the model blends many plausible spatial setups into one inconsistent image.

Common causes

Ordered by hit rate, highest first.

1. No focal length specified

Without a lens term such as wide-angle, telephoto, 35mm, or 85mm, the model picks a generic mid-range focal length. That is fine for close-ups but tends to produce flat or distorted environments.

How to spot it: the prompt has no lens or focal length word. Add one.

2. No viewpoint specified

Pick one: eye level, low angle, high angle, bird's eye view, or dutch angle. Without one, the model defaults to a flat eye-level view that often kills depth.

How to spot it: the prompt has no viewpoint word.
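The two "how to spot it" checks above can be automated with a simple keyword scan. The following is a minimal sketch, not a complete detector: the function name `missing_camera_anchors` and both word lists are assumptions made for illustration, built only from the terms this article mentions.

```python
import re

# Assumed vocabulary, taken from the terms used in this article.
# Extend both patterns to match your own prompt style.
FOCAL_PATTERN = re.compile(
    r"\b(?:\d{2,3}\s?mm|wide[- ]angle|telephoto)\b",
    re.IGNORECASE,
)
VIEWPOINT_PATTERN = re.compile(
    r"\b(?:eye level|low angle|high angle|bird's eye view|"
    r"worm's eye view|dutch angle|top-down)\b",
    re.IGNORECASE,
)


def missing_camera_anchors(prompt: str) -> list[str]:
    """Return which spatial anchors ("focal length", "viewpoint") are absent."""
    # Normalize curly apostrophes so "bird’s" matches "bird's".
    text = prompt.replace("\u2019", "'")
    missing = []
    if not FOCAL_PATTERN.search(text):
        missing.append("focal length")
    if not VIEWPOINT_PATTERN.search(text):
        missing.append("viewpoint")
    return missing


print(missing_camera_anchors("a kitchen with a wooden table"))
print(missing_camera_anchors("wide-angle 24mm, eye level, a kitchen"))
```

An empty list means both anchors are present. A keyword scan cannot tell whether the chosen words suit the scene, only that they exist.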

3. Too many objects fighting for depth

Each object has its own implied scale. A vase + chair + sofa + window + plant + cat all imply different distances. The model struggles to compose them consistently, and perspective breaks.

How to spot it: the prompt has more than 5 distinct named objects.
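A rough way to spot a busy prompt is to count its comma-separated items. This is a crude sketch: the helper names `count_listed_items` and `is_too_busy` are hypothetical, the comma heuristic is an assumption (it does not parse nouns), and the cap of 5 follows the guidance in this article. Run it on the subject part of the prompt only, since camera terms also contain commas.

```python
def count_listed_items(subject: str) -> int:
    """Count non-empty comma-separated segments as a proxy for named objects."""
    return len([segment for segment in subject.split(",") if segment.strip()])


def is_too_busy(subject: str, cap: int = 5) -> bool:
    """Flag a subject description that lists more items than the cap."""
    return count_listed_items(subject) > cap


busy = (
    "a kitchen with a table, four chairs, fridge, stove, microwave, "
    "coffee maker, blender, sink, window, plant"
)
focused = (
    "a kitchen scene featuring a wooden dining table "
    "with four chairs in soft morning light"
)

print(count_listed_items(busy), is_too_busy(busy))
print(count_listed_items(focused), is_too_busy(focused))
```

The count undercounts objects joined by "and" or "with", so treat a low number as a hint rather than proof that the scene is simple.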

4. Aspect ratio mismatch to subject

A wide environment shot at 9:16 vertical forces the model to crop perspective awkwardly. A tall figure at 16:9 horizontal does the opposite.

How to spot it: the aspect ratio is the opposite of what the subject naturally suggests.

5. Conflicting perspective cues

"bird's eye view of a person standing tall, looking down at the camera"

Bird’s eye view = camera above the subject. A subject looking down at the camera = camera below. They contradict, so the model picks one or blends both.

How to spot it: viewpoint words in the prompt don’t agree with subject angle words.

6. Style anchor that breaks normal perspective

Cubist, surreal, M.C. Escher, Dalí: these styles intentionally distort perspective. Isometric and axonometric are parallel projections that remove vanishing points altogether. If you used one accidentally, that’s why.

How to spot it: a style word in your prompt evokes broken perspective.

Shortest path to fix

Step 1: Add a 5-7 word “camera block” at the prompt opening

Template:

[focal length] + [viewpoint] + [perspective style] + [your subject]

Examples:

# Architecture / environment
"wide-angle 24mm, eye level, two-point perspective, ..."

# Portrait
"85mm portrait lens, eye level with subject, shallow depth, ..."

# Cinematic landscape
"35mm anamorphic wide, low angle from ground, three-point perspective, ..."

# Top-down product / flat-lay
"top-down shot 90 degrees overhead, no perspective, flat, ..."

# Interior architecture
"24mm wide, eye level standing 5ft above floor, two-point perspective, ..."

As a rough rule of thumb, this single addition fixes 60-70% of perspective issues.
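The template can also be assembled in code so the camera block always comes first. This is a minimal sketch; `build_prompt` is a hypothetical helper name, and it only joins strings in the template's order without checking that the terms make sense together.

```python
def build_prompt(focal_length: str, viewpoint: str, perspective: str, subject: str) -> str:
    """Join the camera block and subject in template order, skipping blanks."""
    parts = [focal_length, viewpoint, perspective, subject]
    return ", ".join(part.strip() for part in parts if part.strip())


prompt = build_prompt(
    focal_length="wide-angle 24mm",
    viewpoint="eye level",
    perspective="two-point perspective",
    subject="a sunlit kitchen with a wooden dining table",
)
print(prompt)
```

Keeping the three camera fields as separate arguments makes it harder to forget one of them when writing prompts in bulk.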

Step 2: Pick ONE viewpoint, commit

Common viewpoints to choose from:

eye level                — natural, default
low angle                — looking up, heroic
high angle               — looking down, vulnerable
bird's eye view          — straight down or near-vertical
worm's eye view          — straight up
dutch angle              — tilted camera, dynamic
three-quarter view       — 30-45° offset
straight-on              — flat, head-on, no perspective
isometric                — engineering-style flat angled

Don’t mix incompatible ones, such as low angle with bird’s eye view.
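A quick check for mixed viewpoints can look like the sketch below. The grouping is an assumption made for illustration: the five terms in `HEIGHT_TERMS` all fix the camera's height, so more than one of them in a prompt is treated as a conflict. Terms such as dutch angle or three-quarter view are left out because they describe tilt or rotation and can combine with a height term. The function name `conflicting_viewpoints` is hypothetical.

```python
# Assumed grouping: each term sets the camera height, so only one should appear.
HEIGHT_TERMS = [
    "eye level",
    "low angle",
    "high angle",
    "bird's eye view",
    "worm's eye view",
]


def conflicting_viewpoints(prompt: str) -> list[str]:
    """Return the camera-height terms found if the prompt contains more than one."""
    # Lowercase and normalize curly apostrophes before matching substrings.
    text = prompt.lower().replace("\u2019", "'")
    found = [term for term in HEIGHT_TERMS if term in text]
    return found if len(found) > 1 else []


print(conflicting_viewpoints("bird's eye view of a plaza, low angle"))
print(conflicting_viewpoints("low angle, dutch angle, a skater mid-jump"))
```

An empty result means at most one height term was found, not that the prompt is free of every possible contradiction.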

Step 3: Drop object count if scene is busy

Cap the prompt at 3-5 named objects. The fewer the objects, the more consistent the perspective.

# Before — too many
"a kitchen with a table, four chairs, fridge, stove, microwave, coffee maker, blender, sink, window, plant"

# After — focused
"a kitchen scene featuring a wooden dining table with four chairs in soft morning light"

Step 4: Match aspect ratio to scene type

# Match
- Wide landscape / environment → 16:9 or 21:9 (landscape)
- Standing person / portrait → 4:5 or 9:16 (portrait)
- Flat-lay / top-down → 1:1 (square)
- Architecture wide → 21:9 (cinematic)
- Architecture tall → 9:16 (vertical)
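The mapping above can be kept as a lookup table and checked before generating. This is a sketch under stated assumptions: the scene-type keys are labels invented for this example, and `ratio_fits` and `orientation` are hypothetical helper names.

```python
# Scene-type labels are hypothetical; the ratios are the ones listed above.
RECOMMENDED_RATIOS = {
    "landscape": ["16:9", "21:9"],
    "portrait": ["4:5", "9:16"],
    "flat-lay": ["1:1"],
    "architecture wide": ["21:9"],
    "architecture tall": ["9:16"],
}


def orientation(ratio: str) -> str:
    """Classify a 'W:H' ratio as landscape, portrait, or square."""
    width, height = (int(value) for value in ratio.split(":"))
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


def ratio_fits(scene_type: str, ratio: str) -> bool:
    """Check whether a ratio is one of the recommended ones for the scene type."""
    return ratio in RECOMMENDED_RATIOS[scene_type]


print(orientation("9:16"), ratio_fits("landscape", "9:16"))
print(orientation("21:9"), ratio_fits("architecture wide", "21:9"))
```

An unknown scene type raises `KeyError`, which is intentional here so that typos in the label surface immediately.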

Step 5: Add explicit perspective-style words

two-point perspective       — standard architectural
one-point perspective       — single vanishing point, hallway / road
three-point perspective     — looking up/down at tall buildings
forced perspective          — exaggerated depth
isometric perspective       — engineering, no vanishing
no perspective              — flat, top-down or straight-on
deep depth of field         — everything in focus, perspective visible

Step 6: Use ControlNet Depth for strict perspective

When you have a reference image with the perspective you want:

# ComfyUI / Forge
1. Load ControlNet Depth or LineArt
2. Provide reference image with desired perspective
3. Strength: 0.6-0.8 — locks perspective, model fills content

Or use a quick 3D render from SketchUp / Blender as the ControlNet input for very precise architectural perspective.

Prevention

  • Open every prompt with a 5-7 word camera block: focal length + viewpoint + perspective style
  • Default focal lengths: 24mm wide for environments, 50mm normal for half-body, 85mm for portrait
  • Match aspect ratio to scene type (landscape for environments, portrait for people)
  • For architectural / product work, use ControlNet Depth by default

Tags: #Image generation #Debug #Troubleshooting