AI Image-to-Video Drifts From Reference

Start with image A, end with someone else. Motion strength + identity anchors fix it.

You fed Runway / Kling / Pika a clean reference image (your character, your product, your scene) and the first frame of the generated clip looks great. By frame 30 the face has shifted, the outfit color has drifted, the product silhouette has changed. By frame 120 you are looking at a different person or product entirely. Image-to-video drift is one of the most commonly reported issues in 2025-2026 video generation. Fix it with the right combination of motion strength, clip length, and explicit identity anchors.

Common causes

Ordered by what causes drift most often.

1. Motion strength too high

Every image-to-video model has a knob that controls how much movement to add. Runway calls it “Motion Brush strength” or “Camera Motion” intensity; Pika has a 0-4 motion slider; Kling has “subtle/medium/intense” presets. Set too high, the model invents motion that requires inventing new geometry, and identity collapses.

How to spot it: Re-run at the lowest motion setting. If the drift drops dramatically, motion strength was the culprit.

2. Clip longer than identity coherence window

Each model has a “coherence window”: the number of frames it can hold the subject before identity drifts. Rough figures for 2025-2026 models:

  • Runway Gen-3 Alpha: ~80 frames (~3.3s at 24fps) before noticeable drift
  • Kling 1.6: ~96 frames (~4s) with high subject coherence mode
  • Pika 1.5: ~72 frames (~3s) without identity anchor
  • Sora: ~120 frames (~5s) for tight close-ups, less for full-body

Request a 10s clip and you are guaranteed to exceed the window.
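The arithmetic behind that claim can be sketched in a few lines. This is a minimal example that assumes the approximate frame counts listed above and 24fps output; the dictionary keys are made-up labels for illustration, not official model identifiers.

```python
import math

# Approximate coherence windows in frames, copied from the list above.
# Rough figures only, not vendor specifications. Keys are hypothetical labels.
COHERENCE_FRAMES = {
    "runway-gen3-alpha": 80,
    "kling-1.6": 96,
    "pika-1.5": 72,
    "sora": 120,
}


def window_seconds(model: str, fps: int = 24) -> float:
    """Return the coherence window of a model in seconds."""
    return COHERENCE_FRAMES[model] / fps


def segments_needed(target_seconds: float, model: str, fps: int = 24) -> int:
    """Return how many clips are needed so each one stays inside the window."""
    return math.ceil(target_seconds * fps / COHERENCE_FRAMES[model])


if __name__ == "__main__":
    for name in COHERENCE_FRAMES:
        print(name, round(window_seconds(name), 2), segments_needed(10, name))
```

With these figures a 10s target needs at least two segments on every listed model, and four on the shortest window.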

3. Reference image too low resolution

If the reference image is 512x512 or has heavy JPEG compression, the model interprets blurry edges as semantic ambiguity (“is that a collar or a scarf?”) and resolves them differently every frame. The result reads as drift.

How to spot it: Open the reference image at 100%. Are edges crisp? Any compression artifacts? File size under 500KB for a 1024px image suggests heavy compression.
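The size checks can also be automated as a rough preflight. The sketch below assumes the 1024px and 500KB thresholds from this guide; the function names are illustrative, and the dimension reader only handles PNG files (it reads the IHDR chunk at the start of the file). It does not replace looking at the image at 100%, since a flat-color image can be small without being heavily compressed.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def reference_warnings(width: int, height: int, size_bytes: int) -> list[str]:
    """Return heuristic warnings; thresholds are the ones used in this guide."""
    warnings = []
    if min(width, height) < 1024:
        warnings.append("short side under 1024px")
    if max(width, height) >= 1024 and size_bytes < 500 * 1024:
        warnings.append("file under 500KB; check for heavy compression")
    return warnings
```

To use it, read the file with `open(path, "rb").read()`, pass the bytes to `png_dimensions`, and pass the result plus `len(data)` to `reference_warnings`.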

4. Prompt contradicts reference

Reference image shows a blonde woman; prompt says “young woman with auburn hair.” The model has two conflicting signals and resolves them inconsistently across frames.

How to spot it: Read your prompt next to the reference. Any attribute named in the prompt that does not match the image? That is a fight.

5. Subject too small in reference

If the subject occupies less than 30% of the reference image, the model has limited identity anchor data to work from and drifts faster.
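A quick way to check the 30% figure is to compare the subject's bounding box with the full frame. This is a minimal sketch: the box is assumed to be measured by hand or by any detector you already use, and bounding-box area is only an approximation of how much of the image the subject really covers.

```python
def subject_fraction(box, image_size):
    """Fraction of the image area covered by the subject's bounding box.

    box is (left, top, right, bottom) in pixels; image_size is (width, height).
    """
    left, top, right, bottom = box
    width, height = image_size
    return ((right - left) * (bottom - top)) / (width * height)


def needs_crop(box, image_size, minimum=0.30):
    """True when the subject covers less than the minimum share of the frame."""
    return subject_fraction(box, image_size) < minimum
```

For example, a 512x512 subject box inside a 1024x1024 reference covers 25% of the frame, so the reference should be cropped tighter before use.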

6. Multiple subjects in reference

With two or more people or objects in the reference, the model can swap which one it tracks across frames. Group reference images are the highest-risk case.

Before you change anything

  • Save the reference image, full prompt, motion settings, and the drifting output clip.
  • Note which model and tier you are on (Pika 1.5 vs 1.0, Runway Gen-3 Alpha vs Turbo).
  • Decide your target clip length and how much identity drift is acceptable for the use case (B-roll tolerates more than hero shots).
  • Confirm the reference image is at least 1024px on the short side and crisp.
  • Commit or back up the current reference image and prompt before changing them.

Information to collect

  • Reference image at native resolution, full prompt, motion strength, clip length.
  • Model name and version.
  • A side-by-side of the first frame vs the drifted frame to quantify the gap.
  • Whether the same reference produces drift on a different model.
  • Final-cut requirement: hero, B-roll, or background — different tolerances apply.
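One way to keep these items together is a small record that serializes to JSON. The sketch below is only a suggested layout; the class and field names are invented for illustration and do not correspond to any tool's format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DriftReport:
    """Everything needed to reproduce one drifting generation."""
    model: str                   # model name and version
    reference_path: str          # reference image at native resolution
    prompt: str                  # full prompt text
    motion_setting: str          # motion strength or preset as shown in the UI
    clip_seconds: float          # requested clip length
    drifted_clip_path: str       # the output that shows the drift
    shot_role: str               # "hero", "b-roll", or "background"
    drifts_on_other_model: Optional[bool] = None  # None means not tested yet


def to_json(report: DriftReport) -> str:
    """Serialize the report so it can be saved next to the clip."""
    return json.dumps(asdict(report), indent=2)
```

Saving one such file per attempt makes it easier to compare settings later and to share a complete reproduction.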

Shortest path to fix

Step 1: Re-export the reference at native resolution

Make sure the reference is at least 1024px on the shortest side, saved as a PNG (not JPEG), with the subject centered and clearly visible. Crop out background clutter, watermarks, or text overlays. The reference is the most important variable; under-investing here makes every other step harder.

For people: head and shoulders or chest-up framing, neutral pose. For products: clean background, single object, no reflections from other objects.

Step 2: Set motion strength to the lowest preset

  • Runway: Motion Brush strength 1-2, Camera Motion “static” or “slow”
  • Pika: motion slider at 0 or 1 on its 0-4 scale, not 2+
  • Kling: “subtle” preset
  • Sora: shortest duration

Then regenerate. If identity holds, dial up gradually. Most drift cases are solved here.
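The "start low, dial up gradually" loop can be written down as a simple search. In this sketch `holds_identity` is a hypothetical callback standing in for "generate a clip at this setting and judge whether identity held"; that judgment is manual in practice, and no vendor API is assumed.

```python
def highest_safe_setting(levels, holds_identity):
    """Walk motion settings from lowest to highest.

    Returns the last level where identity still held, or None if even the
    lowest level drifts. Stops at the first failing level.
    """
    best = None
    for level in levels:
        if not holds_identity(level):
            break
        best = level
    return best
```

For Kling's presets this would be called with `["subtle", "medium", "intense"]`. A result of `None` means motion strength is not the only cause and the later steps apply.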

Step 3: Cap clip length at 3 seconds

Generate 3-second clips, then concatenate. Each 3s segment can use the previous segment’s last frame as the next segment’s reference image, preserving identity across the full sequence.

Clip A: image-to-video (reference = original image, 3s)
Export last frame of Clip A as image
Clip B: image-to-video (reference = last frame of A, 3s)
Concatenate in CapCut / Premiere

This “chained reference” workflow gets you to 10-20s of coherent output that single-shot generation cannot.
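If you prefer a scripted alternative to an editor for the frame export and the concatenation, ffmpeg can handle both. The sketch below only builds the command lines; it assumes ffmpeg is installed, that all clips share codec, resolution, and frame rate (required for stream copy), and the helper names and file names are illustrative. The generation step itself still happens in the video tool.

```python
def last_frame_cmd(clip: str, image: str) -> list[str]:
    """Command that writes the final frame of a clip to an image file."""
    # -sseof -1 starts reading one second before the end of the input.
    # -update 1 keeps overwriting the same image, so the last write is the last frame.
    return ["ffmpeg", "-y", "-sseof", "-1", "-i", clip, "-update", "1", image]


def concat_list(clips: list[str]) -> str:
    """Contents of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{c}'\n" for c in clips)


def concat_cmd(list_file: str, output: str) -> list[str]:
    """Command that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]


if __name__ == "__main__":
    # Print the commands for a two-clip chain; nothing is executed here.
    print(" ".join(last_frame_cmd("clip_a.mp4", "clip_a_last.png")))
    print(concat_list(["clip_a.mp4", "clip_b.mp4"]), end="")
    print(" ".join(concat_cmd("clips.txt", "chained.mp4")))
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Export the last frame as PNG so the next segment's reference does not pick up extra JPEG compression.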

Step 4: Add explicit identity description to the prompt

Even with a reference image, add a text description that names the subject:

the same blonde woman from the reference image, red leather jacket, 
slight head turn, no camera movement, identity preserved across frames

For products:

the same red ceramic mug from the reference, rotating slowly on its axis, 
shape and color preserved, no morphing

This dual-anchor approach (image + text) significantly reduces drift.
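To keep the identity wording consistent across every segment of a chain, the prompt can be assembled from parts. This is a minimal sketch with an illustrative helper name; the phrasing follows the examples above and is not a syntax that any model requires.

```python
def identity_prompt(subject, attributes, motion, guards):
    """Build a prompt that names the subject before describing motion.

    subject:    short description matching the reference, e.g. "blonde woman"
    attributes: key visual attributes that must not change
    motion:     the single motion you want
    guards:     explicit preservation phrases
    """
    parts = [f"the same {subject} from the reference image", *attributes, motion, *guards]
    return ", ".join(p for p in parts if p)


if __name__ == "__main__":
    print(identity_prompt(
        "blonde woman",
        ["red leather jacket"],
        "slight head turn",
        ["no camera movement", "identity preserved across frames"],
    ))
```

Reusing the same subject and attribute strings for every clip avoids accidentally introducing a prompt that contradicts the reference.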

Step 5: Switch to a model with stronger identity preservation

If drift persists at lowest motion + shortest clip + sharpened reference, the model itself is the bottleneck. As of 2025-2026:

  • For human identity: Kling 1.6 “high subject coherence” mode
  • For product identity: Runway Gen-3 with Motion Brush locked to background only
  • For full-scene preservation: try Sora at shortest tier

Step 6: Use Runway Motion Brush or Kling reference lock

Both Runway and Kling expose region-level motion control (a “motion brush” or “lock subject” feature): you mark the area that is allowed to move, and everything outside it is held still. For talking-head shots, allow motion on the head only and keep the body static.

How to confirm the fix

  • Compare frame 1 and the last frame side-by-side. The subject should be recognizably the same.
  • Watch the clip at 25% speed. Any frame-to-frame jumps in face, color, or shape are drift.
  • Three clips generated at the same settings should all hold identity, not just one lucky output.
  • A teammate seeing only the final clip (no reference) should be able to match it back to the reference image.
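For a rough numeric check alongside the visual comparison, the first and last frames can be reduced to a coarse perceptual hash and compared. The sketch below implements a simple average hash in plain Python. It assumes each frame has already been decoded to a 2D list of grayscale values (0-255) at least 8x8 in size, using whatever image library you have. It measures overall layout change, not face identity, so treat it as a tripwire rather than a verdict.

```python
def average_hash(gray, size=8):
    """Return size*size bits: 1 where a grid cell is brighter than the mean cell."""
    rows, cols = len(gray), len(gray[0])
    cells = []
    for r in range(size):
        r0, r1 = r * rows // size, (r + 1) * rows // size
        for c in range(size):
            c0, c1 = c * cols // size, (c + 1) * cols // size
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if v > mean else 0 for v in cells]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def drift_score(first_frame, last_frame):
    """0 means the coarse layout is unchanged; 64 is the maximum for 8x8."""
    return hamming(average_hash(first_frame), average_hash(last_frame))
```

Compare scores across the three clips generated at the same settings; a clip whose score is far above the others is the one to inspect at 25% speed.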

If it still fails

  1. Reduce the clip to 2 seconds and re-run at the lowest motion setting. If 2s still drifts, the reference image itself is most likely the problem.
  2. Try a much more constrained prompt, such as “static shot, minimal motion, identity preserved”, and remove all camera moves.
  3. Use a different reference image of the same subject — sometimes a different angle or framing produces dramatically better coherence.
  4. Switch to a fundamentally different model.
  5. Package the reference, prompt, motion settings, and the drifted clip before posting to community channels.

Prevention

  • Always start at the strictest motion setting and loosen up only after you verify identity holds.
  • Standardize reference image format: 1024-1536px, PNG, neutral background, single subject.
  • For any clip over 3s, plan as a chain of 3s segments, not one long generation.
  • For brand or product video, lock identity with both reference image AND text description naming key attributes.
  • Maintain a per-model “coherence window” doc so you do not request longer clips than the model can hold.

Tags: #Prompt #Debug #Troubleshooting #Video generation #Image-to-video