Image-to-Video Doesn't Follow the Source Image

Your output looks nothing like the input image. Raise image fidelity (Kling Relevance / CFG), cut the text prompt to motion-only, and feed a source at output resolution or larger with the subject filling 40%+ of frame.

Published: May 17, 2026 Updated: Jun 18, 2026 Author: AI Productivity Guide Team

You upload a specific image to Runway / Kling / Pika / Luma, write a prompt about gentle motion, and the output video doesn’t really look like your input. The subject is slightly different. The lighting changed. The composition shifted. It’s “inspired by” your image rather than animating it.

Fastest fix (works for ~70% of cases): cut the text prompt down to a short motion-only hint, drop motion to its lowest setting, and feed a source image that is at least as large as your output resolution with the subject filling most of the frame. The text prompt fighting your image is the single most common cause. If your tool has a fidelity control (Kling’s Creativity/Relevance slider, ComfyUI/WAN CFG, or a denoise/strength value), push it toward “follow the image.”

Image-to-video tools have a few knobs that together decide how strictly the video follows the source: the start-frame fidelity, motion strength (which deforms the subject), text-prompt weight (which can override the image), and source-image quality. Each tool weights them differently, and the controls have moved as models updated through 2026.

What changed in 2026 (read this first)

Model names in older guides are stale. As of June 2026:

Tool	Current model (June 2026)	Fidelity / strength control
Runway	Gen-4.5 (Gen-4 Turbo for speed)	No literal “image strength” slider; fidelity comes from a motion-only prompt + `Fixed Seed`. Upload sets the first frame.
Kling	3.0 (also 2.6, 2.5 Turbo still selectable)	`Creativity` vs `Relevance` slider (CFG). Push toward `Relevance` for strict adherence. `Professional` mode renders at highest fidelity.
Pika	2.5 flagship (2.2 still available; Pikaframes on both)	Strength via “Image conditioning”; keep prompt minimal.
Luma	Dream Machine on Ray3 / Ray3.14 (Ray2 retired)	First-frame conditioning; control via low motion + motion-only prompt. Ray3.14 adds native 1080p.

Note: OpenAI’s Sora consumer app (sora.com plus the iOS/Android apps) was discontinued on April 26, 2026, and the Sora 2 / Sora 2 Pro API shuts down September 24, 2026, after which API generation stops entirely. Sora is no longer a viable image-to-video target. Use Kling, Runway, or Luma instead. See OpenAI’s Sora discontinuation notice.

Common causes

Ordered by hit rate, highest first.

1. Text prompt overrides the image

This is the top cause and the one people miss. Every major tool now treats the text prompt as a strong signal. A descriptive prompt (“a beautiful woman in a red dress walking through a city”) tells the model to generate that scene, and it will happily overwrite your actual subject’s face, clothing, and background. Runway’s own Gen-4.5 guidance is explicit: for image-to-video, the prompt should describe the motion of the scene, not its visual contents (that is the difference from text-to-video, where you describe both).

How to spot it: your text prompt describes things visible in the image (clothing, hair, expression, setting). It’s competing with the image instead of animating it.

2. Image fidelity / strength too low

Where a fidelity control exists, defaults lean creative. In Kling, the Creativity/Relevance slider defaults toward the middle; for strict adherence push it to High Relevance. In img2img-style local pipelines (WAN, LTX, ComfyUI), check which direction each value runs, because a denoise value and a conditioning strength point opposite ways. For an image-to-video first frame you want CFG/structure conditioning high, not in the 0.2-0.5 range used for loose style edits, and first-frame denoise low.

How to spot it: the tool has a Relevance/CFG/strength control and it is at default or mid-range. Move it toward “follow the image”: raise Relevance/CFG/strength, lower denoise.

3. Motion strength too high

High motion overrides whatever fidelity you set. Even with the image locked, a high motion value will deform the subject as the model invents movement it has no reference for.

How to spot it: the motion slider is at default or above. Drop it to the minimum for your first generation.

4. Source image smaller than the output resolution

The model can only anchor on the pixels you give it. If you upload a 512×512 image but render at 720p or 1080p, the model upscales and reinvents detail. Vendor guidance (Runway, LTX) is consistent: the input should match the output resolution or be larger.

How to spot it: source short side is < 1024px, or smaller than your render resolution.

5. Subject too small in the source image

If the subject is 20% of the frame, the model has few pixels to anchor identity on. Aim for the subject filling > 40% of the frame.

How to spot it: subject is small in the source. Crop tighter before uploading.
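
The 40% guideline can be checked with simple arithmetic once you have a rough bounding box around the subject. A minimal sketch, assuming you measured the box by hand in an image editor; the function name and the example numbers are illustrative and not taken from any tool:

```python
def subject_fill_fraction(image_size, subject_box):
    """Return the share of the frame area covered by the subject's bounding box.

    image_size  -- (width, height) of the source image in pixels
    subject_box -- (left, top, right, bottom) of the subject in pixels
    """
    width, height = image_size
    left, top, right, bottom = subject_box
    # Clamp the box to the image so a sloppy measurement cannot exceed 100%.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    box_area = max(0, right - left) * max(0, bottom - top)
    return box_area / (width * height)


# A 400x600 subject box inside a 1920x1080 frame.
fraction = subject_fill_fraction((1920, 1080), (760, 240, 1160, 840))
print(f"{fraction:.0%}")  # prints 12%
```

A bounding box overestimates the true subject area, so a result near the threshold probably still calls for a tighter crop.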

6. Heavily stylized / illustration source

Anime, painting, and sketch sources translate to video worse than photos because the model has to interpret an art style, not just animate pixels.

How to spot it: source is an illustration; the output drifts toward a realistic version that doesn’t match the original style.

Shortest path to fix

Step 1: Verify source image quality

# Source image checklist
- Resolution: >= your output resolution, and >= 1024px on the short side (1536+ is safer)
- Format: PNG (avoid heavily compressed JPEG)
- File size: under the tool cap (Kling rejects files over 10 MB)
- Subject fills > 40% of frame
- Subject in focus and sharp
- Clear lighting; no extreme shadows hiding features

If your source fails any of these, fix the source before touching tool settings.
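
The measurable items in the checklist can be turned into a quick pre-upload check. This is a sketch under stated assumptions: `SourceImage` and `check_source` are hypothetical names, the values are entered by hand rather than read from the file, and the 10 MB default is interpreted as 10 × 1024 × 1024 bytes. Sharpness and lighting still need a visual check.

```python
from dataclasses import dataclass


@dataclass
class SourceImage:
    width: int
    height: int
    file_format: str         # e.g. "PNG" or "JPEG"
    size_bytes: int
    subject_fraction: float  # share of the frame the subject fills, 0.0-1.0


def check_source(img, output_short_side, max_bytes=10 * 1024 * 1024):
    """Return a list of checklist items the source image fails."""
    problems = []
    short_side = min(img.width, img.height)
    # The source must cover both the 1024px floor and the render resolution.
    required = max(1024, output_short_side)
    if short_side < required:
        problems.append(f"short side {short_side}px is below {required}px")
    if img.file_format.upper() != "PNG":
        problems.append(f"format is {img.file_format}; PNG is preferred")
    if img.size_bytes > max_bytes:
        problems.append("file exceeds the upload cap")
    if img.subject_fraction <= 0.40:
        problems.append(f"subject fills only {img.subject_fraction:.0%} of the frame")
    return problems


# A small, compressed source with a distant subject, rendered at 1080p.
for problem in check_source(SourceImage(512, 512, "JPEG", 300_000, 0.20), 1080):
    print("-", problem)
```

An empty list means the source passes the numeric checks; anything else is worth fixing before changing tool settings.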

Step 2: Strip the text prompt to motion only

This is the highest-leverage change. Describe what moves, never what is in the frame.

# Bad - overrides the image, rewrites your subject
"a beautiful woman in a red dress walking confidently through a vibrant city street, cinematic, warm sunset"

# Good - motion only
"slight head turn, gentle smile, hair moves in light breeze"

# Also fine - camera motion only
"slow push-in, subtle parallax"

# Strongest adherence in some tools - empty prompt
""   # let the image speak; add motion only if it stays too static

Keep it to 5-10 words about motion or camera, not the subject.
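
If you reuse prompts across many clips, a small lint pass can catch descriptive wording before it reaches the tool. The sketch below is illustrative only: the word list is a hypothetical starter set built from the bad example above, and a real list would need to reflect your own subjects.

```python
import re

# Hypothetical starter list: words that describe what is in the frame
# rather than how it moves. Extend it for your own subjects.
DESCRIPTIVE_WORDS = {
    "woman", "man", "dress", "city", "street", "sunset", "beautiful",
    "cinematic", "red", "warm", "vibrant", "wearing", "background",
}


def lint_motion_prompt(prompt, max_words=10):
    """Return warnings for a prompt that is too long or describes the scene."""
    words = re.findall(r"[a-z']+", prompt.lower())
    warnings = []
    if len(words) > max_words:
        warnings.append(f"{len(words)} words; aim for {max_words} or fewer")
    flagged = sorted(set(words) & DESCRIPTIVE_WORDS)
    if flagged:
        warnings.append("describes the scene: " + ", ".join(flagged))
    return warnings


print(lint_motion_prompt("slight head turn, gentle smile, hair moves in light breeze"))
print(lint_motion_prompt("a beautiful woman in a red dress walking through a city"))
```

The first call returns an empty list; the second flags the subject and setting words. A keyword match is crude, so treat a clean result as a hint rather than proof that the prompt is motion-only.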

Step 3: Set the fidelity / strength control toward the image

# Kling 3.0 / 2.6 / 2.5
- Creativity vs Relevance slider -> push to Relevance ("High Relevance")
- Use Professional mode (not Turbo) for the highest-fidelity render

# Runway Gen-4.5 / Gen-4
- No strength slider: rely on motion-only prompt + enable Fixed Seed
- Re-roll the seed if the first frame drifts

# Pika 2.5 / 2.2
- Use "Image conditioning" and keep the prompt minimal

# Local / img2img pipelines (WAN, LTX, ComfyUI)
- Raise CFG / structure conditioning; keep the first-frame denoise low

Step 4: Drop motion to the minimum

# Runway: lowest motion / minimal camera prompt
# Kling: keep camera-move terms out; let it animate subtly
# Pika: motion 0.2-0.3
# Luma: low motion

Less motion means less deformation of the source subject. Add motion back only after you confirm the subject is being preserved.

If you only need part of the image to move (a face, a flag, water), use Kling’s Motion Brush to mask just that region. The unpainted area stays locked to the source, which is the most reliable way to keep identity while still getting motion.

Step 5: Crop the source so the subject fills the frame

# Before upload
- Open the source in Photoshop / Preview / Pixelmator
- Crop so the subject fills at least 50% of the frame (this leaves margin over the 40% minimum)
- Pad with matching-color background if you need a specific aspect ratio
- Re-save as a high-quality PNG at >= the output resolution
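
The crop rectangle can be worked out before opening an editor. A minimal sketch, assuming you know the subject's bounding box in pixels and want to keep the source aspect ratio; `tight_crop` is a hypothetical helper, and it only computes coordinates without touching the image file:

```python
import math


def tight_crop(image_size, subject_box, target_fraction=0.5):
    """Return (left, top, right, bottom) of a crop that keeps the source
    aspect ratio, contains the subject box, and makes the box fill about
    target_fraction of the cropped frame where geometry allows."""
    width, height = image_size
    left, top, right, bottom = subject_box
    sub_w, sub_h = right - left, bottom - top
    aspect = width / height

    # Largest crop height at which the subject still fills target_fraction.
    fill_limit = math.sqrt(sub_w * sub_h / (target_fraction * aspect))
    # Smallest crop height that still contains the whole subject box.
    contain_min = max(sub_h, sub_w / aspect)
    crop_h = min(height, max(contain_min, fill_limit))
    crop_w = crop_h * aspect

    # Center the crop on the subject, then slide it back inside the image.
    center_x, center_y = (left + right) / 2, (top + bottom) / 2
    crop_left = min(max(0, center_x - crop_w / 2), width - crop_w)
    crop_top = min(max(0, center_y - crop_h / 2), height - crop_h)
    return (round(crop_left), round(crop_top),
            round(crop_left + crop_w), round(crop_top + crop_h))


# A centered 400x400 subject in a 2000x2000 source.
print(tight_crop((2000, 2000), (800, 800, 1200, 1200)))
```

When the subject's shape differs a lot from the frame's (a tall figure in a 16:9 frame, for example), the tightest crop that still contains the subject may fall short of the target; the function then returns that tightest crop. Remember that the cropped result still has to meet the output resolution.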

Step 6: For illustration sources, match the model to the style

# Anime / illustration -> video
- Kling with a stylized prompt (it preserves illustrated styles well)
- Luma Dream Machine for stylized motion exploration

# If you actually want a photoreal conversion
- Accept that the source art style won't be preserved
- Use a reference/structure pass (ControlNet-style) to keep composition only

How to confirm it’s fixed

Pause the output on frame 1. It should be visually identical to your uploaded image (same face, clothing, background). If frame 1 already differs, the model never honored the start frame: re-check the fidelity setting and confirm you uploaded to the image slot, not a text-only field.
Scrub to the middle and end. The subject’s identity (face, hair, outfit) should hold. If it drifts only later, lower motion further.
If frame 1 matches but the look shifts later in the clip, you’re motion-bound: drop motion and leave fidelity where it is.
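
The checks above amount to a small decision table. A rough sketch, assuming you have exported the source image and frame 1 as equal-sized grayscale pixel lists (how you extract them is up to you) and that you judge later-frame identity by eye, since pixel differences cannot tell intended motion from identity drift. The tolerance is a guess to absorb compression noise, not a vendor figure.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel difference between two grayscale frames (values 0-255)."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def diagnose(source, first_frame, identity_holds_later, tolerance=8.0):
    """Map the frame checks to the setting worth revisiting."""
    if mean_abs_diff(source, first_frame) > tolerance:
        return "start frame not honored: re-check fidelity and the image slot"
    if not identity_holds_later:
        return "drift after frame 1: lower motion, leave fidelity alone"
    return "subject preserved"


# Toy 4x4 frames: frame 1 is close to the source, but identity drifts later.
source = [100] * 16
first_frame = [103] * 16
print(diagnose(source, first_frame, identity_holds_later=False))
```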

Prevention

Prep every source to >= 1024px (and >= your render resolution) with the subject filling > 40% before uploading.
Default first generations to minimum motion + maximum fidelity; raise motion only if the clip is too static.
Keep text prompts motion-only; never re-describe the subject.
For multi-clip projects on the same source, save the prepped source and reuse the exact file plus a fixed seed where the tool supports it.
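
One way to make sure every clip really uses the exact same file is to record a hash of the prepped source next to the seed and prompt. A minimal sketch using only the Python standard library; the manifest layout and the `record_run` helper are assumptions for illustration, not part of any tool's workflow:

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_run(manifest_path, source_path, seed, prompt):
    """Append one generation's inputs to a JSON manifest and return the entry."""
    manifest = Path(manifest_path)
    entries = json.loads(manifest.read_text()) if manifest.exists() else []
    entry = {
        "source": str(source_path),
        "sha256": file_sha256(source_path),
        "seed": seed,
        "prompt": prompt,
    }
    entries.append(entry)
    manifest.write_text(json.dumps(entries, indent=2))
    return entry
```

If two entries for the same project show different `sha256` values, the source was re-exported or edited between clips, which is a likely reason for the subject looking different from one clip to the next.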

FAQ

Why does the output ignore my image but follow my text? Because the text prompt outranks the image in most 2026 models. If your prompt describes the subject, the model regenerates that subject from scratch. Cut the prompt to motion-only words and the image takes over.

Which tool follows the source image most strictly? For a single locked first frame, Kling 3.0 (Relevance slider high, Professional mode) and Luma Dream Machine (Ray3 / Ray3.14) are the most consistent. Runway Gen-4.5 also holds the first frame well when you use a motion-only prompt and Fixed Seed. If you need surgical control, Kling’s Motion Brush lets you animate one region and freeze the rest.

My input is 4K but the output still drifts. Why? Resolution alone isn’t enough. Check three things: the prompt isn’t describing the subject, motion isn’t maxed, and the subject fills enough of the frame. A pristine 4K image with the subject at 15% of frame still gives the model little to anchor on.

Should the source image match the output aspect ratio? Yes where possible. A mismatch forces the tool to crop or pad, which can cut off or reframe your subject. Crop the source to the target aspect ratio (16:9, 9:16, 1:1) before uploading.
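
The crop for a target aspect ratio is simple integer arithmetic. A short sketch, assuming a centered crop is acceptable; `center_crop_to_ratio` is a hypothetical helper that returns coordinates only, and a centered crop can still cut into an off-center subject, so check the result against where your subject sits:

```python
def center_crop_to_ratio(image_size, ratio_w, ratio_h):
    """Return (left, top, right, bottom) of the largest centered crop
    with aspect ratio ratio_w:ratio_h."""
    width, height = image_size
    # Compare width/height with ratio_w/ratio_h using integers only.
    if width * ratio_h > height * ratio_w:
        # Source is wider than the target: trim the sides.
        crop_w, crop_h = height * ratio_w // ratio_h, height
    else:
        # Source is taller than (or equal to) the target: trim top and bottom.
        crop_w, crop_h = width, width * ratio_h // ratio_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


print(center_crop_to_ratio((1920, 1080), 9, 16))  # prints (656, 0, 1263, 1080)
print(center_crop_to_ratio((1000, 1500), 1, 1))   # prints (0, 250, 1000, 1250)
```

Note how much is lost in the first case: a 16:9 source cropped to 9:16 keeps only a 607px-wide strip, which will usually fall below the output resolution. In that situation, padding or a different source is the safer choice.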

Is Sora still an option for image-to-video? No. The Sora consumer app (sora.com plus the mobile apps) was discontinued April 26, 2026, and the Sora 2 / Sora 2 Pro API ends September 24, 2026; after that date API generation stops and account data is deleted. Export anything you still need and move workflows to Kling, Runway Gen-4.5, or Luma.
