Character Motion Video Workflow

How to make AI character clips where the character moves naturally rather than looking "AI uncanny".

What this tutorial solves

AI character motion is the hardest part of video gen. Faces flicker, limbs glitch, walking cycles distort. The fix is constraining the motion and the camera.

Who this is for

Indie animators, comic / story creators, anyone making character-driven AI video.

When to reach for it

A character must move on-screen — walking, gesturing, talking, reacting.

When this is NOT the right tool

Complex multi-character action (still beyond AI); precise lip-sync to existing audio (use a lip-sync tool, not raw generation).

Step by step

Tools: Runway Gen-3 / Kling / Luma Dream Machine / Pika / Sora (image-to-video mode). Copy the prompts as-is and replace each <...> placeholder with your own details.

  1. First, make the character reference image: use Midjourney / FLUX / Nano Banana to produce a front-facing, natural-light, medium-shot (waist up) portrait at ≥1024×1024. Prompt example:

    <character description: age/gender/clothing/hair color/distinguishing features>, neutral expression, looking at camera, medium shot from waist up, soft window light from camera-left, plain light gray studio background, sharp focus on face, 35mm lens, photo-realistic, 9:16

    Save as character_ref_v1.jpg. Use this same file as the image-to-video input for every clip.

  2. One motion per clip. The four most stable motions, with copy-paste prompts (the image-to-video input is the reference image; the text is the prompt):

    • Walk side-on across frame (3s):

      character walks from left edge to right edge of frame, natural side-profile gait, one full stride per second (3 strides total), fixed camera, no zoom, no pan, character maintains identical face and clothing throughout, soft window light, plain background, 24fps, 3 seconds
    • Turn head toward camera (2s):

      character starts facing 3/4 right, slowly turns head toward camera, eyes meet lens at 1.5s, subtle smile, eyebrow micro-lift, no body movement, fixed camera, 2 seconds
    • Sit down on chair (4s):

      character is standing, looks down at chair, lowers body smoothly into seated posture, hands settle on knees, single fluid motion, no glitching limbs, fixed camera at chest height, side angle, 4 seconds
    • Reach for object (3s):

      character extends right arm forward and slightly down to pick up a small object from desk, fingers close around object, brings hand back to neutral, no other body movement, fixed close-up on torso and arm, 3 seconds
  3. Must be image-to-video, not pure text-to-video. Entry points per tool:

    • Runway: sidebar Generative Video → Image to Video → upload reference image
    • Kling: home → Image to Video tab
    • Luma: generate page, toggle “Image start frame”
    • Sora: upload reference frame to Start frame
  4. Lock the camera. Force one of these into every prompt:

    fixed camera, no pan, no zoom, no dolly
    locked-off tripod shot, no camera movement
    static wide shot, camera stationary

    Skip this and the tool tends to default to a slow Ken Burns-style push-in, which stacks with the character motion and doubles the drift.

  5. Clip length 3-5s. Tool defaults:

    • Runway Gen-3: default 5s, extendable +5s (don’t — identity drifts)
    • Kling: 5s / 10s (pick 5s)
    • Luma: 5s
    • Sora: 5s / 10s / 15s (pick 5s)

    Need longer? Stitch several 5s clips in the edit (3 × 5s for 15s, for example) and hide the cuts at head turns or occlusion moments.

  6. Batch generate and expect roughly a 1-in-8 hit rate. Run each motion 6-10 times (same reference image and same prompt, regenerated each time). Per-tool cost:

    • Runway Standard: ~10 credits per 5s (≈ $0.50)
    • Kling Pro: ~5 credits per 5s (≈ $0.20-0.30)
    • Sora 1080p: ~100 credits per 5s (subscription-dependent)

    Budget for 8 attempts × unit cost, expect 1-2 usable.

  7. Per-clip acceptance checklist: run each new clip through these five checks and discard it if any one fails:

    [ ] Face is the reference character's face, not a distant cousin
    [ ] Clothing / hair color / distinguishing features unchanged across the clip
    [ ] Correct number of fingers, no clipping through the body
    [ ] Natural gait (not glide / spasm)
    [ ] Single motion type (no "walking then suddenly running")
  8. Stitch into the final cut. Bring the best take of each motion into DaVinci Resolve / CapCut / Premiere:

    • Hide cuts at the final frame before a head-turn or occlusion
    • Add a 2-3 frame cross-dissolve between clips to absorb tiny color jumps
    • Apply one unifying LUT across the whole edit so 8 independent clips read as one shoot
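
For planning the final runtime in step 8, the arithmetic can be sketched as below. This is a rough sketch, assuming each generated clip is used in full with no spare frames, so every cross-dissolve overlaps the two neighbouring clips and shortens the total; editors that trim or pad clips differently will give other numbers. The function name `stitched_duration` and the 24fps default (borrowed from the walk prompt) are illustrative choices.

```python
def stitched_duration(clip_seconds: list[float],
                      dissolve_frames: int = 2,
                      fps: int = 24) -> float:
    """Total runtime when adjacent clips overlap by one cross-dissolve per cut.

    Assumption: clips have no handles, so each dissolve eats into the runtime.
    """
    if not clip_seconds:
        return 0.0
    overlap = dissolve_frames / fps          # seconds lost at each cut
    cuts = len(clip_seconds) - 1             # n clips have n - 1 cuts
    return sum(clip_seconds) - overlap * cuts


if __name__ == "__main__":
    print(f"{stitched_duration([5, 5, 5]):.2f}s")
```

Under this assumption, three 5s clips joined with 2-frame dissolves at 24fps come out at about 14.83s.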

Worked example, a 3s character walk clip: reference image → image-to-video → “walks left to right, single stride per second, static camera, soft window light” → 8 generations → 2 usable → pick the best.
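
The budgeting from step 6 can be sketched in a few lines. This is a minimal sketch, assuming every generation succeeds independently at the same hit rate, which is a simplification; the tutorial gives only the rough 1-in-8 figure. The function names are hypothetical, and the $0.50 unit cost is the Runway estimate quoted above.

```python
def batch_cost(attempts: int, unit_cost: float) -> float:
    """Total spend for one motion: attempts x cost per generation."""
    return attempts * unit_cost


def expected_usable(attempts: int, hit_rate: float) -> float:
    """Average number of usable clips in a batch."""
    return attempts * hit_rate


def chance_of_at_least_one(attempts: int, hit_rate: float) -> float:
    """Probability that a batch yields one or more usable clips.

    Assumes independent attempts (an assumption, not a measured fact).
    """
    return 1.0 - (1.0 - hit_rate) ** attempts


if __name__ == "__main__":
    attempts, hit_rate, unit_cost = 8, 1 / 8, 0.50
    print(f"cost: ${batch_cost(attempts, unit_cost):.2f}")
    print(f"expected usable: {expected_usable(attempts, hit_rate):.1f}")
    print(f"P(at least one): {chance_of_at_least_one(attempts, hit_rate):.2f}")
```

Under these assumptions, 8 attempts at a 1-in-8 hit rate still leave roughly a one-in-three chance of getting nothing usable, which is a reason to budget toward the upper end of the 6-10 range.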

Common mistakes

  • Complex motion in long clips. Drift wins.
  • Combining camera movement and character movement. Pick one.
  • Asking for facial expressions in long shots. AI gets faces wrong from far away.
  • Skipping the reference image. Text-only character motion is much harder.
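
One way to catch the camera mistake before spending credits is to lint the prompt text. The sketch below uses plain substring matching: the lock phrases come from step 4, while the list of movement phrases and the name `lint_prompt` are hypothetical and far from exhaustive.

```python
# Lock phrases from step 4; any one of them counts as a locked camera.
CAMERA_LOCKS = (
    "fixed camera",
    "locked-off tripod shot",
    "no camera movement",
    "camera stationary",
)

# Illustrative (not exhaustive) phrases that ask for camera movement.
CAMERA_MOVES = ("camera pans", "camera zooms", "dolly in", "tracking shot", "orbit")


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of warnings; an empty list means the prompt passes."""
    text = prompt.lower()
    warnings = []
    if not any(lock in text for lock in CAMERA_LOCKS):
        warnings.append("no camera lock phrase")
    for move in CAMERA_MOVES:
        if move in text:
            warnings.append(f"requests camera movement: {move}")
    return warnings
```

A prompt containing "fixed camera, no pan, no zoom, no dolly" returns an empty list, while "character walks while the camera pans right" is flagged twice: once for the missing lock and once for the pan.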

Advanced tips

  • When making several clips of one character, use the SAME reference image as the image-to-video input for each. This keeps the identity consistent.
  • Side-angle walking handles drift better than head-on or back-shot.
  • For dialogue / lip-sync, generate visuals separately, then use a lip-sync tool to add the audio match.
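
Keeping the reference image and prompt constant across retries is easier with a job list written up front. The sketch below only builds that list; `Job` and `build_jobs` are hypothetical names, and submitting the jobs is left out because the tools' upload interfaces are not covered here.

```python
from dataclasses import dataclass

REFERENCE_IMAGE = "character_ref_v1.jpg"  # the file saved in step 1


@dataclass(frozen=True)
class Job:
    motion: str
    attempt: int
    image: str
    prompt: str


def build_jobs(prompts: dict[str, str], attempts: int = 8) -> list[Job]:
    """One job per (motion, attempt); image and prompt never vary."""
    return [
        Job(motion, n, REFERENCE_IMAGE, prompt)
        for motion, prompt in prompts.items()
        for n in range(1, attempts + 1)
    ]


if __name__ == "__main__":
    prompts = {
        "head_turn": "character starts facing 3/4 right, slowly turns head "
                     "toward camera, no body movement, fixed camera, 2 seconds",
    }
    for job in build_jobs(prompts, attempts=3):
        print(job.motion, job.attempt, job.image)
```

Freezing the dataclass is a small guard against editing a prompt or image path halfway through a batch, which would make the takes no longer comparable.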

Output checklist

  • Character identity holds throughout the clip.
  • Motion is natural (no limb glitches, no skipped frames).
  • Face doesn’t flicker if visible.
  • Single motion type per clip.
  • Reference image used to seed generation.
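
The output checklist is an all-or-nothing gate, and recording each manual review in a small structure makes discards easy to trace. This is a sketch only: the field names are hypothetical, and every value is judged by eye rather than detected automatically.

```python
from dataclasses import dataclass, fields


@dataclass
class ClipReview:
    identity_holds: bool
    motion_natural: bool
    face_stable: bool
    single_motion: bool
    seeded_from_reference: bool

    def failures(self) -> list[str]:
        """Names of the checks that failed."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def accepted(self) -> bool:
        """A clip is kept only if every check passes."""
        return not self.failures()
```

For example, `ClipReview(True, False, True, True, True)` is rejected, and its `failures()` list names `motion_natural` as the reason.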

FAQ

  • Can I generate a 30-second character monologue? Not reliably. Generate 5-second chunks and edit them together, with cuts hiding the seams.
  • What about lip sync? Raw generation rarely lip-syncs accurately. Generate the visuals, then run a dedicated lip-sync tool over the top.
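
Splitting a long target into short chunks is simple arithmetic, sketched below. The function name `plan_chunks` is hypothetical, and the 5-second default follows the clip length recommended in step 5.

```python
import math


def plan_chunks(target_seconds: float, chunk_seconds: float = 5.0) -> list[float]:
    """Split a target duration into chunk lengths no longer than chunk_seconds."""
    if target_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(target_seconds / chunk_seconds)
    chunks = [chunk_seconds] * (count - 1)
    # The last chunk takes whatever time is left over.
    chunks.append(target_seconds - chunk_seconds * (count - 1))
    return chunks


if __name__ == "__main__":
    chunks = plan_chunks(30)
    print(len(chunks), "chunks,", len(chunks) * 8, "generations at 8 attempts each")
```

A 30-second monologue becomes six 5-second chunks, so at 8 attempts per chunk the plan calls for 48 generations. A leftover chunk shorter than 3 seconds (as with a 12-second target) is probably better absorbed by rebalancing the chunk lengths by hand.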

Tags: #Tutorial #Video generation #Workflow