AI Video — Hands Disappear or Morph During Motion

A character's hands vanish, fuse into the torso, or grow extra fingers the moment they start moving. Why hands are the worst region for AI video and how to keep them visible.

Your reference image has clean, anatomically correct hands. The first frame of the generated clip still looks fine. Then the character reaches for a cup, waves, or just walks past the camera, and the hands smear into the sleeve, fuse with the torso, sprout a sixth finger, or vanish entirely for 8 frames before snapping back. This is one of the most reliable failure modes in current AI video: hands are small, articulated, fast-moving, and self-occluding, which makes them exactly the kind of region where a diffusion model has the least signal to work from.

This article covers why hands break specifically during motion (not stillness), how to phrase prompts that minimize it, and how to recover a shot when re-rolling is not an option.

Common causes

Ordered by how often each cause turns out to be the culprit, most frequent first.

1. Hand is small relative to frame and crosses the motion blur threshold

When a hand occupies less than ~3% of the frame and moves faster than the motion the model saw at that scale during training, it smears into a blur that the decoder cannot reconstruct as an anatomically valid hand. The model settles on “looks like a sleeve” over “looks like a half-resolved hand.”

How to spot it: Step through the clip four frames at a time. Hands are intact when stationary, degrade only during the motion segment, and recover when motion stops.

2. Prompt focuses on the action, not the hand

Prompts like “person waves hello” or “barista pours coffee” describe the action. The model interprets the action holistically and treats the hand as a means to the verb, not a region to preserve. Hands get optimized away in favor of the dominant motion vector.

How to spot it: Your prompt names the verb but never the noun “hand” or “fingers.” A prompt that explicitly mentions five visible fingers gives the model a region to defend.
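The check described above can be automated before a prompt is submitted. The following is a minimal sketch, assuming a hypothetical term list (`HAND_TERMS`) and a hypothetical helper name (`names_hand_region`); neither comes from any video tool's API. It only tests whether a hand-related noun appears as a whole word, so "handle" does not count as "hand".

```python
import re

# Hypothetical list of hand-related nouns; extend it for your own prompts.
HAND_TERMS = ("hand", "hands", "finger", "fingers", "palm", "palms", "thumb", "thumbs")


def names_hand_region(prompt: str) -> bool:
    """Return True if the prompt mentions a hand-related noun as a whole word."""
    # Split into lowercase alphabetic words so "handle" is not mistaken for "hand".
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return any(term in words for term in HAND_TERMS)


if __name__ == "__main__":
    for prompt in ("person waves hello",
                   "barista pours coffee, both hands visible"):
        print(prompt, "->", names_hand_region(prompt))
```

A prompt that fails this check is not guaranteed to produce broken hands, and one that passes is not guaranteed to fix them; it only flags the action-only phrasing this cause describes.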

3. Hand crosses in front of a similarly-colored region

The hand passes in front of the face, the torso, or a same-tone background. The model loses the boundary between hand and background for a few frames, and the hand visually fuses into whatever it crossed.

How to spot it: The disappearance happens exactly when the hand overlaps a same-color region. Move it across high-contrast space and the artifact stops.

4. Image-to-video extension of a closed-fist or hidden-hand starting frame

If the starting image hides the hands (pockets, behind back, closed fist), the model has no anchor for finger count, knuckle position, or palm orientation. The moment hands enter frame, the model invents them from scratch, often badly.

How to spot it: Reference frame has hands hidden, and the artifact appears precisely when hands first become visible.

5. Holding an object — fingers wrap the object incorrectly

Pen, cup, phone, steering wheel — anything the hand grips. The model has to simultaneously render correct grip geometry and consistent object size. It usually fails at one. Fingers pass through the object, the cup floats, or the pen warps.

How to spot it: Hands look fine when empty; fail only when grasping. The object’s shape distorts in sync with the finger errors.

6. Motion segment exceeds ~2 seconds of continuous hand action

Most current models stay anatomically stable for ~1.5–2 seconds of complex hand motion, then drift. Long takes with continuous hand work (typing, sign language, gestures) accumulate error.

How to spot it: Hands are correct for the first ~40 frames (about 1.7 seconds if the clip runs at 24 fps), then degrade progressively. Shortening the clip eliminates the issue.

7. Wide-angle lens spec amplifies hand distortion

“Wide-angle lens,” “fisheye,” or “GoPro” in the prompt cues the model to exaggerate near-camera elements. Hands closest to the lens absorb most of that exaggeration, and models tend to render it as anatomical drift rather than true perspective distortion.

How to spot it: Removing the lens spec while keeping every other prompt term fixed produces normal hands.

Shortest path to fix

Step 1: Add explicit hand language to the prompt

Don’t just describe the action. Add structural anchors:

"a barista pours espresso, both hands visible,
five fingers on each hand, fingers wrap naturally
around the cup handle, hands occupy lower-third of frame"

The phrase “five fingers” alone tends to reduce extra-finger artifacts in most models because it gives the model an explicit count to honor.

Step 2: Keep hands large enough in frame

Reframe the shot so hands occupy at least 8–10% of the frame area during the motion segment. Medium shot beats wide shot for any clip where hand motion is the subject.

If you can’t reframe, generate at higher resolution (1080p → 4K) and downscale. The model has more pixels to spend on hand detail at higher resolution.
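To check a planned framing against the area figures in this article (~3% as the risk zone from cause 1, 8–10% as the target here), you can measure the hand's bounding box in a storyboard frame or reference image. This is a minimal sketch; the function names, the `(x0, y0, x1, y1)` box convention, and the three labels are assumptions for illustration, and the box itself has to come from you (drawn by hand or from any detector you trust).

```python
def hand_area_fraction(box, frame_width, frame_height):
    """Fraction of the frame covered by a hand bounding box.

    box is (x0, y0, x1, y1) in pixels; parts outside the frame are clipped.
    """
    x0, y0, x1, y1 = box
    x0, x1 = max(0, min(x0, frame_width)), max(0, min(x1, frame_width))
    y0, y1 = max(0, min(y0, frame_height)), max(0, min(y1, frame_height))
    width, height = max(0, x1 - x0), max(0, y1 - y0)
    return (width * height) / (frame_width * frame_height)


def framing_verdict(fraction, risky_below=0.03, target=0.08):
    """Label a fraction using the article's rough thresholds (not hard limits)."""
    if fraction < risky_below:
        return "too small"
    if fraction < target:
        return "borderline"
    return "ok"


if __name__ == "__main__":
    fraction = hand_area_fraction((800, 600, 1100, 900), 1920, 1080)
    print(f"{fraction:.1%}", framing_verdict(fraction))
```

A bounding box overestimates the area of the hand itself, so treat a "borderline" result as a reason to tighten the shot rather than as a pass.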

Step 3: Avoid hand-overlap-with-similar-tone regions

If the action requires the hand to cross the body, change at least one of these:

  • Wardrobe color (high-contrast sleeve vs. background).
  • Hand position (cross higher or lower to avoid the torso midline).
  • Lighting (rim light separates hand from background).
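When choosing between wardrobe or background options, a rough numeric comparison of tones can help. The sketch below borrows the WCAG 2.x contrast-ratio formula, which was designed for text legibility, not for video models; using it as a proxy for hand/background separation is an assumption, and the sample RGB values are made up for illustration. No particular ratio is known to be "enough" for any model, so compare candidates against each other rather than against a fixed threshold.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """WCAG relative luminance of an (R, G, B) color with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio; 1.0 means identical luminance, 21.0 is black vs. white."""
    la, lb = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    skin = (224, 172, 105)       # illustrative skin tone
    beige_shirt = (222, 184, 135)
    navy_shirt = (0, 0, 128)
    print("skin vs beige:", round(contrast_ratio(skin, beige_shirt), 2))
    print("skin vs navy: ", round(contrast_ratio(skin, navy_shirt), 2))
```

Luminance contrast ignores hue, so two colors with very different hues but similar brightness will still score low here; judge those cases by eye.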

Step 4: Anchor with a hands-visible starting frame for image-to-video

If you’re doing image-to-video, the reference frame must show the hands you want, in the position they’ll start from. For image-to-video generations, a starting pose with closed fists, hands in pockets, or hands behind the back is the single biggest predictor of hand drift in the generated motion.

Step 5: Shorten and stitch

Split a 4-second hand-heavy clip into two 2-second clips with a cut. Each shorter generation will hold anatomy better, and a clean cut between them reads as seamless if the action continues across it. Avoid asking one model pass to keep hands stable for more than ~2 seconds of continuous action.
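Planning the cuts is simple frame arithmetic. The sketch below splits a shot into near-equal segments that each stay within the ~2-second limit; the function name and the choice to balance segment lengths (rather than leave one short tail segment) are assumptions, and the frame rate is whatever your tool outputs.

```python
import math


def split_into_segments(total_frames, fps, max_seconds=2.0):
    """Return (start, end) frame ranges, end exclusive, each at most max_seconds long."""
    if total_frames <= 0 or fps <= 0 or max_seconds <= 0:
        raise ValueError("total_frames, fps and max_seconds must be positive")
    max_len = max(1, int(fps * max_seconds))    # longest allowed segment, in frames
    count = math.ceil(total_frames / max_len)   # fewest segments that fit the limit
    base, extra = divmod(total_frames, count)   # spread frames as evenly as possible
    segments, start = [], 0
    for i in range(count):
        length = base + (1 if i < extra else 0)
        segments.append((start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    # A 4-second shot at 24 fps becomes two 2-second generations.
    print(split_into_segments(total_frames=96, fps=24))
```

Balanced segments avoid ending on a very short clip, which can be awkward to generate and to cut into.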

Step 6: Mask-and-regenerate just the hand region

If re-rolling the whole clip isn’t viable, several tools (Runway Inpaint, Kling Local Edit) let you mask the hand region only and regenerate that area while keeping the rest of the clip frozen. This costs less than a full re-roll and preserves the parts that worked.

Mask: hand bounding box + 20px feather
Prompt: "five-fingered hand, natural anatomy, holding cup"
Strength: 0.7 (keep some motion from original)
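If your tool accepts an uploaded mask instead of drawing one in the UI, the "bounding box + 20px feather" setting can be approximated as follows. This is a sketch under stated assumptions: the linear falloff, the function names, and the list-of-rows mask format are illustrative, and they are not the documented behavior or API of Runway, Kling, or any other tool.

```python
def feather_alpha(x, y, box, feather_px=20):
    """Mask opacity at pixel (x, y).

    1.0 inside the box (x0, y0, x1, y1), fading linearly to 0.0
    at feather_px pixels outside it.
    """
    x0, y0, x1, y1 = box
    dx = max(x0 - x, 0, x - x1)   # horizontal distance outside the box
    dy = max(y0 - y, 0, y - y1)   # vertical distance outside the box
    distance = (dx * dx + dy * dy) ** 0.5
    if feather_px <= 0:
        return 1.0 if distance == 0 else 0.0
    return max(0.0, 1.0 - distance / feather_px)


def build_mask(width, height, box, feather_px=20):
    """Build a height x width grid of opacities (rows of floats in [0, 1])."""
    return [[feather_alpha(x, y, box, feather_px) for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    mask = build_mask(64, 48, box=(20, 15, 40, 30), feather_px=10)
    print(len(mask), "rows x", len(mask[0]), "columns")
```

Per-frame masks would need the box to follow the hand; how to track it is outside what this sketch covers.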

Step 7: Hide the failure with intentional motion blur or a cut

If all else fails: add post-production motion blur over the bad frames (radial blur centered on the hand), or cut to a different angle for the 8–12 frames where the hand breaks. Audiences forgive a cut; they do not forgive a six-fingered hand on screen for half a second.

When this is not on you

Hands during motion are a known weak point across every current frontier video model (Sora, Veo, Runway Gen-3, Kling, Hailuo, Pika). Some shots — like sign language, juggling, or hands-only close-ups during fast motion — are simply not yet achievable with one-shot generation. Plan around the limitation.

Easy to misdiagnose as

  • “Bad seed.” Re-rolling rarely fixes hands-during-motion; it just shuffles which frames break. Address the cause, not the variance.
  • “Model is bad.” Hands break across models in the same class of motion. Switching models without changing the prompt or framing usually reproduces the problem.
  • “Prompt is wrong.” The prompt may be fine; the issue is often framing, duration, or starting-frame visibility — none of which the prompt alone controls.

Prevention

  • Default to medium shots for any clip with prominent hand motion.
  • Add “five fingers on each hand, both hands visible” to your hand-motion prompt template.
  • Keep continuous hand-action segments under 2 seconds; stitch longer takes.
  • If starting from a reference image, never use a hidden-hand starting pose.
  • Build a “hand-safe” prompt module you reuse across all character clips, separate from the action description.
  • For client work, plan a cut-away shot at the hand region as fallback B-roll.
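The reusable “hand-safe” module from the list above can be as simple as a constant that is appended to every action description. A minimal sketch, assuming hypothetical names (`HAND_SAFE_MODULE`, `build_prompt`) and comma-joined prompt syntax; adjust the separator if your tool expects something else.

```python
# Reusable hand language, kept separate from any one action description.
HAND_SAFE_MODULE = "both hands visible, five fingers on each hand"


def build_prompt(action, extras=()):
    """Join an action description, the hand-safe module, and optional shot-specific anchors."""
    parts = [action.strip().rstrip(",").strip(), HAND_SAFE_MODULE]
    parts.extend(e.strip() for e in extras if e.strip())
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "a barista pours espresso",
        extras=["fingers wrap naturally around the cup handle"],
    ))
```

Keeping the module in one place means a wording change that works better for a given model only has to be made once.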

FAQ

  • Why are hands worse than feet? Hands are more articulated, move faster, draw more viewer attention, and have more training-data variance (different sleeves, gloves, accessories). Feet are usually static and partially occluded by floor or pants, and distortion there is far less noticeable.
  • Does upscaling fix hands? Upscaling sharpens existing pixels but cannot invent correct anatomy. If the hand is broken at 720p, it will be broken at 4K. Fix the generation, then upscale.

Tags: #ai-video #troubleshooting #video-generation #hands #motion-artifacts #anatomy