The portrait is great. The pose is great. Then your eye drifts down to the hands and there are six fingers, or a thumb that grew out of a wrist, or a fist that has fused into a single mitten shape. Hands are the second-hardest body part for diffusion models after teeth. The fix is rarely a prompt tweak. It is workflow: hide hands when you can, give them enough pixels when you cannot, and run a dedicated inpaint pass when the composition forces them into view.
Common causes
Ordered by how often each is the actual root cause.
1. Hands occupy too few pixels
A standing full-body shot at 1024x1536 gives each hand roughly 60-90 vertical pixels. That is too few for the model to articulate five distinct fingers reliably. Below 128 pixels per hand, expect failure.
How to spot it: Crop to one hand at 100% zoom and measure its bounding box. Under 100x100 pixels, the hand is clearly undersized.
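The measurement can be turned into a quick check. The sketch below is a minimal illustration, assuming you already have a hand bounding box in pixels from a manual crop or any detector. The function name is hypothetical, the 100-pixel default mirrors the rule of thumb above, and treating "either side too short" as undersized is an assumption.

```python
def hand_is_undersized(bbox, min_side=100):
    """Return True if the hand bounding box is shorter than min_side on either axis.

    bbox is (left, top, right, bottom) in pixels of the full-resolution image.
    min_side=100 mirrors the 100x100 rule of thumb; it is not a hard limit.
    """
    left, top, right, bottom = bbox
    width = right - left
    height = bottom - top
    if width <= 0 or height <= 0:
        raise ValueError("bbox must have positive width and height")
    return width < min_side or height < min_side


# A hand of 70x80 px, in the range of the full-body example above.
print(hand_is_undersized((400, 900, 470, 980)))  # True
# A hand of 160x180 px from a tighter crop.
print(hand_is_undersized((300, 600, 460, 780)))  # False
```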
2. Hand is in motion or partially occluded
“Holding a coffee cup”, “waving hello”, “playing guitar” — the model has to compose a hand and an object simultaneously. Articulated objects (instruments, tools, weapons) are the hardest case; smooth objects (cups, balls) are easier.
3. Multiple hands in frame
Two people shaking hands, group photos with visible hands, hand-on-shoulder poses. The model splits its already-thin hand budget across all of them.
4. Extreme angle on the hand
Hands viewed from the side, from above the knuckles, or fully foreshortened (palm directly at camera) push the model into low-data territory.
5. Stylized prompts fighting realism
“Photorealistic hand in the style of Picasso” — the model averages styles and hands fail first because they have the least stable training signal.
6. Older model or low-step generation
SD 1.5 hands are notoriously bad. SDXL base is meaningfully better. Flux, Imagen 3, and Midjourney v7 are better still. Low step counts (under 20 on SDXL) skip the iterations that would have refined finger geometry.
Before you start
- Decide whether the hands need to be visible at all. Cropping them out is often the fastest fix.
- Save the seed, prompt, model, and tier of the broken image.
- Generate 4 seeds. If 4 of 4 break, the problem is structural: it lies in the prompt or the composition. If only 1 of 4 breaks, it is noise.
- Confirm other anatomy is not also breaking. If feet, faces, and hands all break, the root cause is pixel budget, not hands specifically.
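The four-seed test can be written down as a small triage helper. This is a sketch only: the function name and the "clean" and "mixed" labels are assumptions added for the cases the rule above does not cover (0, 2, or 3 broken seeds out of 4).

```python
def triage_seed_results(broken_flags):
    """Classify a small batch of seeds by how many produced a broken hand.

    broken_flags: list of booleans, True where that seed's hand was broken
    (judged by eye at 100% zoom).
    """
    if not broken_flags:
        raise ValueError("need at least one seed result")
    total = len(broken_flags)
    broken = sum(1 for flag in broken_flags if flag)
    if broken == 0:
        return "clean"        # nothing to fix
    if broken == total:
        return "structural"   # prompt or composition problem
    if broken == 1:
        return "noise"        # an unlucky seed
    return "mixed"            # in between: inspect more seeds before deciding


print(triage_seed_results([True, True, True, True]))    # structural
print(triage_seed_results([False, True, False, False])) # noise
```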
Information to collect
- Full prompt, negative prompt, model, seed, sampler, steps, aspect ratio.
- A 100% crop of the broken hand with pixel measurement.
- Whether the hand is interacting with an object or empty.
- Intended use case — print product hero needs more accuracy than a blog header.
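One way to keep this information together is a small record saved next to the broken image. The sketch below is illustrative: the class name, field names, and every value in the example are placeholders, not output from a real generation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HandBugReport:
    """Everything needed to reproduce and triage one broken hand."""
    prompt: str
    negative_prompt: str
    model: str
    seed: int
    sampler: str
    steps: int
    aspect_ratio: str
    hand_crop_px: tuple      # (width, height) of the 100% crop of the hand
    holding_object: bool     # True if the hand interacts with an object
    use_case: str            # e.g. "print product hero" or "blog header"


# Placeholder values for illustration only.
report = HandBugReport(
    prompt="full-body portrait, standing, holding a coffee cup",
    negative_prompt="",
    model="SDXL base",
    seed=12345,
    sampler="example-sampler",
    steps=30,
    aspect_ratio="2:3",
    hand_crop_px=(70, 80),
    holding_object=True,
    use_case="blog header",
)

# JSON is easy to store beside the image file and to diff between attempts.
print(json.dumps(asdict(report), indent=2))
```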
Step-by-step fix
Ordered by ROI. Step 1 and Step 2 together clear about 70% of cases.
Step 1: Hide or crop hands when possible
The cheapest fix is composition. If hands are not the story:
- Cut framing at the chest, not at the waist. No hands in frame.
- Put hands behind back, in pockets, or out of frame.
- Use props that hide hands (mug held with the back of the hand toward camera, sleeves over knuckles).
For commercial portrait work, “hands out of frame or in pockets” is the standard delivery default.
Step 2: Raise pixel budget for the hand region
If hands must be visible, give them more pixels:
- Switch to a vertical aspect ratio so the body has more vertical pixels.
- Crop tighter — half-body instead of full-body.
- Generate at higher base resolution. Flux at 1024x1280, SDXL at 1024x1536.
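The effect of these changes can be estimated with simple arithmetic before spending any generations. The sketch below assumes the subject scales linearly with how much of the body the frame shows, which is only an approximation. Both function names are hypothetical, and the 128-pixel target comes from the failure threshold given earlier.

```python
def scale_needed(current_hand_px, target_hand_px=128):
    """Factor by which the hand must grow to reach the target size."""
    if current_hand_px <= 0:
        raise ValueError("current_hand_px must be positive")
    return target_hand_px / current_hand_px


def hand_px_after_reframe(current_hand_px, old_visible_fraction, new_visible_fraction):
    """Estimate hand size after reframing at the same image height.

    The fractions describe how much of the body height the frame shows:
    1.0 for full-body, about 0.5 for half-body.
    """
    if new_visible_fraction <= 0:
        raise ValueError("new_visible_fraction must be positive")
    return current_hand_px * old_visible_fraction / new_visible_fraction


# A 75 px hand in a full-body shot, reframed to half-body at the same height.
print(hand_px_after_reframe(75, 1.0, 0.5))  # 150.0
# Growth factor required to take a 75 px hand to the 128 px target.
print(round(scale_needed(75), 2))           # 1.71
```

If the estimate still lands under the target, a tighter crop alone will not be enough and the inpaint pass becomes the realistic option.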
Step 3: Add hand-specific negative prompts
For SDXL and SD-family models, a hand negative block helps:
extra fingers, missing fingers, fused fingers, malformed hand,
mutated hand, six fingers, deformed hand, extra thumb,
wrong hand anatomy
This will not save a totally broken hand but it reduces failure rate at the population level.
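To reuse the block across prompts without duplicating terms, it can be kept as a saved snippet and merged into whatever negative prompt is already in place. A minimal sketch in plain Python; the helper name is hypothetical and the term list is copied from the block above.

```python
HAND_NEGATIVES = [
    "extra fingers", "missing fingers", "fused fingers", "malformed hand",
    "mutated hand", "six fingers", "deformed hand", "extra thumb",
    "wrong hand anatomy",
]


def with_hand_negatives(existing_negative=""):
    """Append the hand block to a comma-separated negative prompt.

    Terms already present (case-insensitive) are not added a second time.
    """
    terms = [t.strip() for t in existing_negative.split(",") if t.strip()]
    seen = {t.lower() for t in terms}
    for term in HAND_NEGATIVES:
        if term.lower() not in seen:
            terms.append(term)
            seen.add(term.lower())
    return ", ".join(terms)


# "extra fingers" is already present, so it appears only once in the result.
print(with_hand_negatives("blurry, extra fingers"))
```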
Step 4: Run a hand-specific inpaint pass
This is the highest-ROI step for any visible hand that is still wrong after steps 1-3.
- SDXL / A1111: Enable ADetailer with the hand_yolov8n.pt model. Default settings handle roughly 70% of cases. Use denoise 0.5 to start.
- ComfyUI: Use the Hand Detailer node, or chain MediaPipe Hand Mesh + Inpaint for an explicit hand mask.
- Midjourney: Vary (Region) on the hand, prompt with “well-defined hand, five fingers, natural finger anatomy, sharp focus”.
- Flux: ComfyUI Flux Fill with a manual hand mask.
- Photoshop: Generative Fill on the hand region with a focused prompt.
If ADetailer still fails, repeat with denoise raised to 0.7 and a tighter mask.
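The retry logic can be sketched independently of any tool: start at denoise 0.5 with a comfortable margin around the hand, then retry at 0.7 with a tighter mask. The helper names and the padding ratios (0.25, then 0.10) are assumptions for illustration, not ADetailer settings. The resulting box would be handed to whichever inpaint tool you use.

```python
def padded_mask_box(bbox, image_size, pad_ratio=0.25):
    """Grow a hand bounding box by pad_ratio on each side, clamped to the image.

    bbox is (left, top, right, bottom); image_size is (width, height).
    """
    left, top, right, bottom = bbox
    img_w, img_h = image_size
    pad_x = int((right - left) * pad_ratio)
    pad_y = int((bottom - top) * pad_ratio)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(img_w, right + pad_x),
        min(img_h, bottom + pad_y),
    )


def inpaint_attempts():
    """Yield (denoise, pad_ratio) pairs in the order they should be tried."""
    yield 0.5, 0.25   # first pass: moderate denoise, roomy mask
    yield 0.7, 0.10   # second pass: stronger denoise, tighter mask


hand_bbox = (400, 900, 470, 980)   # illustrative 70x80 px hand
for denoise, pad in inpaint_attempts():
    box = padded_mask_box(hand_bbox, (1024, 1536), pad)
    print(f"denoise={denoise} mask={box}")
```

Shrinking the mask as denoise rises keeps the stronger pass from spilling onto the wrist and forearm.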
Step 5: Switch to a stronger model for hands
If the model itself is weak on hands:
- Flux Pro: currently strongest on hand anatomy in realistic work.
- Midjourney v7: very strong with Vary (Region) for fixes.
- Imagen 3: solid hands, especially in soft / editorial lighting.
- Avoid SD 1.5 for any hand-visible work.
How to confirm the fix
- Count fingers on every visible hand. Five per hand. Verify thumbs are in the right place.
- Check knuckle direction. Fingers should bend toward the palm side, not backward.
- Check for fused fingers. Each finger should be separable by visible shadow lines.
- Get a second pair of eyes. Self-blindness is real, especially on your own renders.
- Regenerate 4 candidates at the fixed prompt. All four should pass the count and bend checks.
Long-term prevention
- For any portrait series, design framing to hide hands by default unless they are the story.
- Build a saved negative prompt snippet for hands and paste it into every realistic portrait.
- Always run a hand-specific inpaint pass on portraits where hands are visible — treat it as a workflow step.
- Standardize on Flux Pro or Midjourney v7 for any hand-critical work.
- Maintain a personal blacklist of poses and props that historically break hands in your favorite model.
Common pitfalls
- Re-rolling 20 seeds hoping for a good hand. After 4 bad rolls, change workflow.
- Trusting the thumbnail. Always inspect hands at 100% zoom.
- Forgetting that setting ADetailer denoise too high also reshapes the wrist and forearm.
- Adding “perfect hands” to the positive prompt expecting magic. It does almost nothing alone.
FAQ
Q: Why are hands so much harder than faces? A: Faces have a strong symmetric prior — two eyes, one nose, one mouth, in fixed positions. Hands have variable poses, articulations, occlusions, and 14 finger joints each. Training data covers far more frontal faces than well-posed hands.
Q: Will newer models eliminate this problem? A: They are improving fast. Flux Pro and Midjourney v7 are markedly better than 2023 models. But the problem will not be fully solved at the generation step for several more model generations.
Q: Does adding “five fingers” to the positive prompt help? A: Marginal at best. Negative prompts plus inpaint passes do the real work.
Q: Can I use a hand-pose ControlNet? A: Yes. For SDXL, OpenPose-Hand and DWPose-Hand ControlNets force a specific hand shape. This is the most reliable approach for hand-critical work but adds setup overhead.