Your image has technically everything you asked for — the cat, the coffee cup, the book, the laptop, the window, the houseplant, the morning light — but it reads as visual chaos. The eye doesn’t know where to land. Every object is rendered at similar size, sharpness, and prominence, so the brain reads “noise” instead of “scene.”
Cluttered composition is rarely a “model can’t compose” problem. It’s almost always a prompt problem: you listed seven things and gave the model no priority signal.
Common causes
Ordered by hit rate, highest first.
1. Too many objects with equal weight in the prompt
cat, coffee, book, laptop, plant, window light, cozy morning — seven nouns, no hierarchy. The model treats them as equally important and tries to render all of them at central prominence.
How to spot it: count the concrete nouns in your prompt. More than 3 without weighting → cluttered output likely.
2. No depth-of-field cue
Without depth-of-field (DOF) instructions, the model tends to render the scene as if it were shot at a medium aperture, with everything roughly in focus. Even peripheral elements then compete with the subject for attention.
How to spot it: your prompt has none of "shallow depth of field", "bokeh", "f/1.4", "out of focus", or "blurred background". Add one.
3. No explicit hero subject
You said the cat is in the scene, but you didn’t say the cat is the subject. Models need that hierarchy hint, especially when multiple nouns are listed.
How to spot it: your prompt doesn’t contain "hero subject", "main subject", "centered", "dominant", or a size modifier such as "large cat" or "tiny coffee cup in background".
4. Wide framing with detailed scene words
Wide shot + words like cozy, interior, room, still life, lifestyle scene invite the model to fill the frame with stuff. Tighter framing or single-noun composition prevents it.
How to spot it: prompt is wide + scene/lifestyle/interior-style words.
5. Style anchor implies clutter
Specific styles bake in clutter:
- still life painting: multiple objects on a table
- cozy aesthetic: many props, soft layered detail
- flat lay photography: busy by definition
- wes anderson: symmetrical maximalist
- studio ghibli interior: busy lived-in spaces
How to spot it: your style anchor evokes a busy scene on its own.
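The five checks above can be roughed out as a small prompt linter. This is a minimal sketch, not a real tool: `lint_prompt` and the constant names are hypothetical, the keyword lists are copied from the "How to spot it" notes, and counting comma-separated chunks is only an assumed stand-in for counting concrete nouns. Plain substring matching is crude (it will not catch size modifiers such as "large cat"), so treat the warnings as hints.

```python
# Keyword lists taken from the "How to spot it" notes; extend them for your own prompts.
DOF_CUES = ("shallow depth of field", "bokeh", "f/1.4",
            "out of focus", "out-of-focus", "blurred background")
HERO_CUES = ("hero subject", "main subject", "centered", "dominant")
WIDE_CUES = ("wide shot",)
SCENE_WORDS = ("cozy", "interior", "room", "still life", "lifestyle scene")
BUSY_STYLES = ("still life painting", "cozy aesthetic", "flat lay photography",
               "wes anderson", "studio ghibli interior")


def lint_prompt(prompt: str) -> list[str]:
    """Return a warning for each clutter cause the prompt seems to trigger."""
    text = prompt.lower()
    warnings = []

    # Assumption: each comma-separated chunk names roughly one object.
    items = [chunk for chunk in text.split(",") if chunk.strip()]
    if len(items) > 3:
        warnings.append("too many equally weighted items")

    if not any(cue in text for cue in DOF_CUES):
        warnings.append("no depth-of-field cue")

    if not any(cue in text for cue in HERO_CUES):
        warnings.append("no explicit hero subject")

    if any(w in text for w in WIDE_CUES) and any(s in text for s in SCENE_WORDS):
        warnings.append("wide framing with scene words")

    if any(style in text for style in BUSY_STYLES):
        warnings.append("style anchor implies clutter")

    return warnings


if __name__ == "__main__":
    cluttered = ("a cat, a coffee cup, a book, a laptop, a houseplant, "
                 "a window with morning light, a cozy desk scene")
    for warning in lint_prompt(cluttered):
        print("warning:", warning)
```

Running it on the seven-item desk prompt flags the first three causes, which matches the order of the list above.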
Shortest path to fix
Step 1: Cut to one hero + max 2 secondary objects
Before:
a cat, a coffee cup, a book, a laptop, a houseplant, a window with morning light, a cozy desk scene
After:
a ginger cat sitting on a desk, soft morning window light in the background,
one out-of-focus coffee cup beside the cat
One hero (cat), one secondary (coffee cup, explicitly out-of-focus), and atmosphere (window light) instead of an object.
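If you assemble prompts in a script, the "one hero, at most two secondary objects" rule can be enforced at build time. The helper below is a hypothetical sketch (`build_scene` and its parameters are not from any library); it simply refuses to build a prompt with more than two secondary objects and always writes the hero first.

```python
def build_scene(hero: str, secondary: list[str],
                atmosphere: str = "", max_secondary: int = 2) -> str:
    """Join hero, atmosphere, and secondary objects into one prompt, hero first."""
    if len(secondary) > max_secondary:
        raise ValueError(
            f"{len(secondary)} secondary objects; cut to {max_secondary} or make a series"
        )
    parts = [hero]
    if atmosphere:
        parts.append(atmosphere)  # atmosphere is a mood cue, not another object
    parts.extend(secondary)
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_scene(
        "a ginger cat sitting on a desk",
        ["one out-of-focus coffee cup beside the cat"],
        atmosphere="soft morning window light in the background",
    ))
```

Called with the values from the "After" prompt, it reproduces that prompt as a single line.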
Step 2: Add explicit hero subject + size modifiers
Prompt patterns that work:
"[hero] is the main subject, centered, large in frame"
"close-up of [hero], everything else small and out of focus"
"[hero] in sharp focus, [other objects] blurred in the background"
Step 3: Add depth of field
This single line transforms most “everything is sharp” cluttered images into “subject pops”:
"shallow depth of field, f/1.4, creamy bokeh, only [hero] in focus"
For Midjourney specifically:
"... --style raw --ar 4:5"
--style raw reduces auto-stylization that adds clutter; tall aspect ratio reduces background coverage.
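For scripted prompts, the depth-of-field line and the Midjourney flags can be appended mechanically. A minimal sketch, assuming you keep the base prompt and the hero name as separate strings; `add_depth_of_field` is a hypothetical helper, and the template and flags are the ones quoted above.

```python
DOF_TEMPLATE = "shallow depth of field, f/1.4, creamy bokeh, only {hero} in focus"
MIDJOURNEY_FLAGS = "--style raw --ar 4:5"


def add_depth_of_field(prompt: str, hero: str, midjourney: bool = False) -> str:
    """Append the depth-of-field line, plus the Midjourney flags if requested."""
    base = prompt.rstrip(", ")  # avoid a doubled comma when the prompt ends with one
    full = f"{base}, {DOF_TEMPLATE.format(hero=hero)}"
    if midjourney:
        full = f"{full} {MIDJOURNEY_FLAGS}"  # flags go after the prompt text
    return full


if __name__ == "__main__":
    print(add_depth_of_field("a ginger cat sitting on a desk", "the cat", midjourney=True))
```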
Step 4: Add negative space wording
Words to add (pick 1-2):
- minimalist composition
- large negative space
- breathing room around the subject
- clean composition with simple background
- Japanese minimalist aesthetic (if it fits your style)
Step 5: Negative-prompt the clutter (SD-family)
cluttered, busy composition, many objects, crowded scene,
multiple subjects, ornate, baroque, maximalist, busy background,
overlapping objects
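If you drive Stable Diffusion from code, the negative terms can live in one list and be joined on demand. The sketch below only builds the two strings; `sd_request` is a hypothetical helper. The dictionary keys `prompt` and `negative_prompt` are chosen to match the keyword arguments that Hugging Face diffusers' Stable Diffusion pipelines accept, which is an assumption about your setup; in a web UI you would paste the joined string into the negative prompt field instead.

```python
NEGATIVE_TERMS = [
    "cluttered", "busy composition", "many objects", "crowded scene",
    "multiple subjects", "ornate", "baroque", "maximalist",
    "busy background", "overlapping objects",
]


def sd_request(prompt: str, extra_negative: tuple[str, ...] = ()) -> dict[str, str]:
    """Pair a prompt with the anti-clutter negative prompt, dropping duplicate terms."""
    # dict.fromkeys keeps first-seen order while removing repeats.
    terms = dict.fromkeys([*NEGATIVE_TERMS, *extra_negative])
    return {"prompt": prompt, "negative_prompt": ", ".join(terms)}


if __name__ == "__main__":
    request = sd_request("a ginger cat sitting on a desk", extra_negative=("text",))
    print(request["negative_prompt"])
```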
Step 6: Sketch the composition first, then prompt
For art-directed work, do a rough sketch (paper, iPad, ControlNet’s Scribble) of the composition you want, then feed it in as a ControlNet input. The model fills in detail while staying close to your layout; how closely depends on the ControlNet strength.
# ComfyUI / Forge ControlNet
- Load ControlNet Scribble or Canny
- Provide your composition sketch
- Strength: 0.6-0.8 (lower = more model creativity, higher = strict)
Prevention
- Decide the hero subject BEFORE writing the prompt; write it first in the sentence
- Default to 3 nouns max per image; if you need more, do a series, not one image
- Set a rule: every prompt with 3+ nouns must include a depth-of-field or focus modifier
- For series work, keep a reusable “minimalist composition” snippet at the end of every prompt
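For series work, the first and last rules can be baked into a small generator so every prompt starts with the hero and ends with the same snippet. A minimal sketch: `series_prompts` is a hypothetical helper, and the snippet wording is an assumption drawn from the negative-space phrases in Step 4.

```python
# Assumed snippet wording; swap in whichever negative-space phrases suit your style.
SERIES_SNIPPET = "minimalist composition, large negative space"


def series_prompts(hero: str, variations: list[str]) -> list[str]:
    """Build one prompt per variation: hero first, shared snippet last."""
    return [f"{hero}, {variation}, {SERIES_SNIPPET}" for variation in variations]


if __name__ == "__main__":
    for prompt in series_prompts(
        "a ginger cat sitting on a desk",
        ["soft morning window light", "warm lamp light at night"],
    ):
        print(prompt)
```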