You generate a hero shot of your product. The lighting reads premium, the background reads agency-quality — but the bottle has a curve where there should be a straight edge, the box has a wobble in the side panel, and the logo on the front melted into an approximation. For real products, AI cannot invent your packaging accurately. The fix is not better prompting. It is constraining the silhouette with ControlNet, masking the product area for inpaint, or compositing the real product photo on top of the AI background.
Common causes
Ordered by how often each is the actual root cause.
1. Model has never seen your product
Diffusion models know “bottle”, “box”, “phone” as abstract categories. They cannot reproduce your specific product’s silhouette, embossing, or label layout from text alone. Every text-only generation of a branded product is a hallucination of plausible packaging.
How to spot it: If you are trying to generate a real, branded product from text prompts alone, this is the root cause, and no amount of prompt engineering will fix it.
2. No structural constraint on the silhouette
The model picks a generic shape that fits “bottle” or “box” and improvises. Without ControlNet Canny, Depth, or a reference image, every render has a different silhouette.
3. Stylization fighting realism
“Product photo of a Coke bottle in the style of Andy Warhol” — the model will warp the bottle to be more Warhol-like. The bottle silhouette degrades.
4. Low resolution for fine details
Embossed logos, fine type on labels, and thin product features (handles, spouts, caps) need pixel budget. A 1024x1024 hero shot of a tall thin bottle can leave the bottle only about 200 pixels wide (for example, a bottle with a 1:4 width-to-height ratio that fills 80% of the frame height), which is not enough for label fidelity.
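The arithmetic behind that estimate can be sketched in a few lines. The 80% frame fill, the 1:4 bottle proportions, and the 1536-pixel portrait canvas are illustrative assumptions, not measurements from any particular product or model.

```python
def product_width_px(canvas_h: int, frame_fill: float, width_to_height: float) -> int:
    """Estimate how many pixels wide the product renders.

    canvas_h        -- canvas height in pixels
    frame_fill      -- fraction of the canvas height the product occupies (assumed)
    width_to_height -- product width divided by product height (assumed)
    """
    product_h = canvas_h * frame_fill
    return round(product_h * width_to_height)


# Hypothetical bottle: fills 80% of the frame height, 1:4 width-to-height ratio.
square = product_width_px(1024, 0.8, 0.25)    # square 1024x1024 canvas
portrait = product_width_px(1536, 0.8, 0.25)  # taller portrait canvas (assumed size)

print(square, portrait)
```

Under these assumptions the bottle gains roughly 50% more width simply from moving to the taller canvas, before any upscaling.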
5. Wrong camera angle or perspective
Models trained on catalog product shots know straight-on and slight-three-quarter views well. Top-down, worm’s-eye-view, or extreme angles push the model into low-data territory and shapes distort.
6. Multiple products in one frame
Group hero shots (a six-pack, a starter set) ask the model to render every product correctly. Even with constraints, each unit competes for accuracy budget.
Before you start
- Decide whether the product must be the real product or whether a “product-like” stand-in is acceptable. Concept work tolerates stand-ins; commercial work does not.
- Save the seed, prompt, model, and tier of the broken image.
- Have at least one clean reference photo of the real product on a neutral background.
- Confirm the use case. Print needs more accuracy than web; packaging mockups need pixel-perfect labels.
Information to collect
- Full prompt, model, seed, sampler, steps, aspect ratio.
- A reference photo of the real product, ideally on a clean background.
- The product’s key features that must read correctly (logo position, label color, silhouette curve).
- Intended deliverable size and use (web hero, print poster, social card).
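One way to keep this information together is a small record saved next to each render. The sketch below is only a suggestion: the `GenerationRecord` class, its field names, and every example value are hypothetical, chosen to mirror the list above.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class GenerationRecord:
    """Hypothetical container for the details listed above."""
    prompt: str
    model: str
    seed: int
    sampler: str
    steps: int
    aspect_ratio: str
    reference_photo: str  # path to the clean reference photo
    must_read_features: list[str] = field(default_factory=list)
    deliverable: str = ""


# Example values are placeholders, not recommendations.
record = GenerationRecord(
    prompt="product photography, studio lighting, matte white bottle",
    model="example-model",
    seed=12345,
    sampler="example-sampler",
    steps=30,
    aspect_ratio="2:3",
    reference_photo="refs/bottle_front.png",
    must_read_features=["logo position", "label color", "silhouette curve"],
    deliverable="web hero",
)

record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

Storing the JSON alongside the broken image makes it possible to reproduce the failure later and to compare settings after each fix.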
Step-by-step fix
Ordered by ROI. Step 1 plus Step 4 is the standard production workflow.
Step 1: Constrain the silhouette with ControlNet
The single biggest move is replacing text-only generation with a structural constraint from the real product:
- SDXL / A1111 / ComfyUI: Load a Canny ControlNet and feed it an edge map extracted from a product photo with a clean silhouette. Use weight 1.0 for strict adherence.
- SDXL alternative: Use Depth ControlNet for products with rounded surfaces (bottles, jars).
- Flux: Flux ControlNet (Union variant) supports Canny and Depth in ComfyUI.
- Midjourney: Use --sref with the product photo and --sw 100 for strict style, and --cref for character / object reference.
This locks the silhouette. The model fills in lighting, environment, and surface — but the shape is yours.
Step 2: Raise pixel budget for the product region
Pick framing that gives the product more pixels:
- For tall products, vertical aspect ratio.
- For wide products, horizontal.
- For square products, square.
- Crop tighter to the product if the environment is not the story.
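The framing rule above amounts to picking the canvas whose aspect ratio is closest to the product's. A minimal sketch follows, assuming three canvas presets; the preset names and pixel sizes are illustrative, so substitute whatever sizes your model actually supports.

```python
import math

# Assumed canvas presets as (width, height); replace with your model's sizes.
PRESETS = {
    "portrait": (832, 1216),
    "square": (1024, 1024),
    "landscape": (1216, 832),
}


def pick_preset(product_w: float, product_h: float, presets=PRESETS) -> str:
    """Return the preset whose aspect ratio is closest to the product's.

    Ratios are compared on a log scale so that 1:2 and 2:1 count as
    equally far from square.
    """
    target = product_w / product_h

    def distance(name: str) -> float:
        w, h = presets[name]
        return abs(math.log((w / h) / target))

    return min(presets, key=distance)


print(pick_preset(1, 4))  # tall bottle
print(pick_preset(1, 1))  # cube-shaped box
print(pick_preset(3, 1))  # wide tray
```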
Step 3: Remove style fights from the prompt
Drop heavy style modifiers. Lead with photographic descriptors:
product photography, studio lighting, sharp focus, clean background,
[product description], shot on Hasselblad
Style modifiers go to the background and lighting, not the product surface.
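A saved template makes this rule harder to break by accident. The sketch below is one possible shape for it: `build_prompt`, `style_fights`, and the `STYLE_WORDS` list are hypothetical helpers, and the "background and lighting:" phrasing is only a convention for keeping style words away from the product description. Models do not guarantee that a modifier stays scoped to the background.

```python
# Hypothetical list of phrases that tend to pull the product away from realism.
STYLE_WORDS = ("in the style of", "warhol", "watercolor", "anime", "surreal")


def build_prompt(product_description: str, scene_style: str = "") -> str:
    """Assemble the photographic prompt, keeping style out of the product text."""
    parts = [
        "product photography",
        "studio lighting",
        "sharp focus",
        "clean background",
        product_description,
        "shot on Hasselblad",
    ]
    if scene_style:
        parts.append(f"background and lighting: {scene_style}")
    return ", ".join(parts)


def style_fights(product_description: str) -> list[str]:
    """Return any style phrases found inside the product description."""
    lowered = product_description.lower()
    return [word for word in STYLE_WORDS if word in lowered]


print(build_prompt("matte white 500 ml bottle", scene_style="warm sunset tones"))
print(style_fights("Coke bottle in the style of Andy Warhol"))
```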
Step 4: Composite the real product photo
This is the standard production workflow for commercial hero work:
- Generate the AI image with a generic stand-in for the product (a “blank white bottle” or “blank box”).
- Export the AI background and lighting.
- Cut the real product from its reference photo.
- Place the cut product into the AI background.
- Match lighting direction with a soft drop shadow and a global color grade.
For most commercial hero work this is faster and more accurate than fighting ControlNet to perfection.
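The placement step is standard alpha-over compositing: each cutout pixel is blended onto the background in proportion to its alpha. In practice an image editor or imaging library does this for you; the pure-Python sketch below only illustrates the blend on tiny hand-made pixel grids, and the `composite` function is a hypothetical helper.

```python
def composite(background, product, top, left):
    """Alpha-over: paste an RGBA cutout onto an RGB background at (top, left).

    background -- rows of (r, g, b) tuples
    product    -- rows of (r, g, b, a) tuples, alpha in 0..255
    Returns a new image; the background is not modified.
    """
    out = [row[:] for row in background]
    for y, row in enumerate(product):
        for x, (r, g, b, a) in enumerate(row):
            by, bx = top + y, left + x
            if not (0 <= by < len(out) and 0 <= bx < len(out[0])):
                continue  # cutout pixel falls outside the background
            alpha = a / 255
            br, bg, bb = out[by][bx]
            out[by][bx] = (
                round(r * alpha + br * (1 - alpha)),
                round(g * alpha + bg * (1 - alpha)),
                round(b * alpha + bb * (1 - alpha)),
            )
    return out


# 3x3 black background; 1x2 white cutout with one opaque, one half-transparent pixel.
background = [[(0, 0, 0)] * 3 for _ in range(3)]
cutout = [[(255, 255, 255, 255), (255, 255, 255, 128)]]

result = composite(background, cutout, top=1, left=1)
print(result[1])
```

The half-transparent pixel is why a clean cut matters: soft alpha at the product edge blends into the AI background, while a hard, jagged mask leaves a visible outline.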
Step 5: Mask and inpaint the product region
If composition is locked but the product shape is off:
- Mask the product area.
- Run img2img inpaint with the ControlNet still active.
- Reduce denoising strength to 0.3-0.4 to preserve the silhouette while refining surface detail.
Step 6: Switch to a stronger model
Flux Pro, Midjourney v7, and Imagen 3 hold product silhouettes meaningfully better than SDXL base or older models. For label fidelity, Ideogram 2 is strongest because it handles short text best.
How to confirm the fix
- Overlay the generated product on a reference photo at the same scale. Silhouettes should match within 2-3 pixels at the edge.
- Inspect the logo and label region at 100%. Letters should be readable and correctly spelled.
- Check straight edges on boxes — they should be straight, not wobbled.
- Check axisymmetric features on bottles and jars — the two sides of the silhouette should mirror.
- For multi-unit shots, every unit should pass the above checks.
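The silhouette and symmetry checks can be automated once both images are reduced to binary masks (1 for product, 0 for background) at the same scale. The sketch below assumes the masks already exist as nested lists; how they are produced (background removal, thresholding) is outside its scope, and the function names are hypothetical.

```python
def row_edges(mask):
    """For each row, return (left, right) silhouette columns, or None if empty."""
    edges = []
    for row in mask:
        cols = [x for x, v in enumerate(row) if v]
        edges.append((cols[0], cols[-1]) if cols else None)
    return edges


def max_edge_deviation(generated, reference):
    """Largest left/right edge difference, in pixels, between two masks."""
    worst = 0
    for g, r in zip(row_edges(generated), row_edges(reference)):
        if g is None and r is None:
            continue
        if g is None or r is None:
            return float("inf")  # product present in one mask but not the other
        worst = max(worst, abs(g[0] - r[0]), abs(g[1] - r[1]))
    return worst


def max_asymmetry(mask, axis_x):
    """Largest mismatch between left and right half-widths about column axis_x."""
    worst = 0
    for edge in row_edges(mask):
        if edge is None:
            continue
        left, right = edge
        worst = max(worst, abs((axis_x - left) - (right - axis_x)))
    return worst


reference = [
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
]
generated = [
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],  # bulges one pixel to the right
    [0, 1, 1, 1, 1, 1, 0],
]

deviation = max_edge_deviation(generated, reference)
print(deviation, max_asymmetry(reference, 3), max_asymmetry(generated, 3))
```

A render passes the silhouette check when `deviation` stays within the 2-3 pixel tolerance above. This only compares outer edges per row, so it does not replace the 100% zoom inspection of the label.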
Long-term prevention
- For any commercial product work, default to ControlNet plus composite. Do not rely on text-only generation.
- Maintain a clean reference photo library of every product you generate hero shots for.
- Use Flux Pro or Midjourney v7 for product work. Avoid SD 1.5 entirely.
- Build a saved prompt template for product photography that excludes heavy style modifiers from the product itself.
- For labels and logos, composite type from the real product image — never trust the model.
Common pitfalls
- Re-rolling 20 seeds hoping the model gets your product right. It will not.
- Adding “accurate product shape” to the prompt expecting magic. It does almost nothing.
- Compositing the product without matching lighting direction. The composite reads fake.
- Forgetting to inspect the label at 100% zoom. Label errors are easy to miss at thumbnail size.
FAQ
Q: Can the model learn my product if I train a LoRA? A: Yes for SDXL and Flux. A product LoRA trained on 20-50 photos of your product holds silhouette and label fidelity well. Worth the setup for products you generate often.
Q: Does ControlNet Canny work for transparent products like glass bottles? A: Partially. Canny detects edges; transparent products have weak edges. Use Depth ControlNet or train a LoRA instead.
Q: What about generating packaging mockups from scratch? A: Use a dedicated mockup workflow (Smart Mockups, Placeit, or Photoshop mockups with smart objects) and overlay the AI-generated label. Cleaner than asking the model to invent packaging.
Q: Why does my logo come out almost right but slightly off? A: The model treats a logo as image texture to approximate, not as exact geometry to reproduce. For brand-critical work, composite the real logo from a vector file on top of the AI render.