You generated a video with audio and the soundtrack drifts visibly out of sync with the picture. The drum hits land a beat early, the dialogue arrives after the lips move, or the footstep sounds disappear entirely halfway through. This is not the same as bad lip-sync from a talking-head generator. Most often the audio and video are two separate generations stitched together with no shared clock. Fix it either by re-rendering with a model that produces synced audio in one pass, or by manually nudging the audio in CapCut, Premiere Pro, or DaVinci Resolve.
Common causes
Ordered by hit rate.
1. Audio generated in a separate pass
Tools like Runway Gen-3, Pika 2.0, and Kling 1.6 generate silent video, and you added audio later from ElevenLabs, Suno, or a stock library. The two passes share no timecode, so nothing ties an audio cue to its visual event. Small timing errors between cue points accumulate across the clip.
How to spot it: Open the project file. If the audio track was imported separately from the video track, you have a two-pass setup and should expect drift.
2. Frames-per-second mismatch on export
The model rendered at 24 fps but the export pipeline assumes 30 fps. Audio plays in real time while the video plays faster or slower, so the two diverge linearly. After 10 seconds the drift is obvious.
How to spot it: Check the source clip metadata with ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of default=noprint_wrappers=1 input.mp4. Compare the result to the sequence fps in your editor.
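The size of the drift from a mismatch is simple arithmetic. The following is a minimal Python sketch that assumes you paste in the rate string ffprobe prints. The helper names `parse_rate` and `drift_after` are hypothetical. The sketch turns the rate string into an exact fraction and computes how far the picture runs ahead of the audio when frames rendered at one rate are played back at another.

```python
from fractions import Fraction


def parse_rate(rate: str) -> Fraction:
    """Parse an ffprobe rate such as '24/1', '24000/1001', '30' or
    'r_frame_rate=24/1' into an exact fraction of frames per second."""
    value = rate.strip().split("=")[-1]  # drop an optional 'key=' prefix
    num, _, den = value.partition("/")
    return Fraction(int(num), int(den or 1))


def drift_after(source_fps, assumed_fps, seconds):
    """Seconds by which the picture runs ahead of the audio (negative means
    behind) after `seconds` of footage rendered at `source_fps` is played
    back at `assumed_fps`, with the audio playing in real time."""
    frames = Fraction(source_fps) * Fraction(seconds)
    playback_seconds = frames / Fraction(assumed_fps)
    return float(Fraction(seconds) - playback_seconds)
```

For example, `drift_after(parse_rate("24/1"), parse_rate("30/1"), 10)` gives 2.0. After 10 seconds of 24 fps footage played as 30 fps, the picture is two full seconds ahead, which is why this kind of drift is hard to miss.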
3. Synced-audio model dropped the audio mid-clip
Sora with audio, Veo 3, and Kling Audio occasionally output a clip whose audio track ends early. For example, the audio stops at second 4 of an 8-second clip while the video continues silently.
How to spot it: Scrub the audio waveform. If it visibly stops mid-clip while the video continues, this is a generation defect, not a sync issue.
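If scrubbing is tedious, you can locate the point where the audio goes quiet with a script. The following is a minimal sketch that assumes you first extract the track as 16-bit mono WAV with ffmpeg -i input.mp4 -vn -ac 1 -c:a pcm_s16le audio.wav. The function names and the default `threshold` of 500 (roughly -36 dBFS) are hypothetical choices, not calibrated values. The sketch scans backwards for the last sample above the threshold.

```python
import array
import sys
import wave


def read_mono_16bit(path):
    """Read a 16-bit mono PCM WAV file into an array of signed samples."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples, rate


def last_sound_time(samples, sample_rate, threshold=500):
    """Time in seconds of the last sample whose magnitude exceeds
    `threshold`, or 0.0 if the whole track stays below it."""
    for i in range(len(samples) - 1, -1, -1):
        if abs(samples[i]) > threshold:
            return i / sample_rate
    return 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    samples, rate = read_mono_16bit(sys.argv[1])
    print(f"last sound at {last_sound_time(samples, rate):.2f} s")
```

A result far short of the clip duration (for example 4 s on an 8 s clip) points to a generation defect rather than drift. Quiet room tone can fall below the threshold, so confirm the result by ear before regenerating.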
4. Variable frame rate (VFR) source
Screen recordings of the generation UI made on a phone, and downloads from some platforms, are often VFR. Many editors handle VFR poorly, so drift accumulates.
How to spot it: Run ffprobe -v error -select_streams v:0 -show_entries frame=best_effort_timestamp_time -of csv=p=0 input.mp4 and check whether the intervals between frame timestamps are constant. If they are not, the source is VFR.
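Checking the intervals by eye is impractical for long clips. The following is a minimal Python sketch that assumes you saved the ffprobe output above to text. The function names are hypothetical. The default `tolerance` of 2 ms is also an assumption. It is loose enough to absorb the rounding in ffprobe's six-decimal timestamps and in millisecond-timescale containers, but you may need to adjust it.

```python
def read_timestamps(text):
    """Parse ffprobe `-of csv=p=0` output into floats, skipping blank,
    'N/A' and other non-numeric rows."""
    values = []
    for line in text.splitlines():
        field = line.split(",")[0].strip()
        try:
            values.append(float(field))
        except ValueError:
            continue
    return values


def is_vfr(timestamps, tolerance=0.002):
    """True if consecutive frame intervals differ by more than `tolerance`
    seconds, i.e. the frame rate is not constant."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return False
    return max(gaps) - min(gaps) > tolerance
```

The timestamps are sorted first so the check does not depend on the order in which ffprobe reports frames.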
5. Codec or container remux corrupted the timestamp track
You ran the clip through HandBrake or an online converter, and the audio timestamps (PTS) were rewritten incorrectly.
How to spot it: Sync is fine on the original download but broken after conversion. Always test the raw file first.
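One quick numeric check is to compare where the audio and video streams start in each file. Run ffprobe -v error -show_entries stream=codec_type,start_time -of json on both the original and the converted file. The following is a minimal sketch that parses that JSON. The function name `av_start_gap` is hypothetical. A changed start gap is only one symptom of timestamp corruption, so a matching gap does not prove the conversion is clean.

```python
import json


def av_start_gap(ffprobe_json: str):
    """Audio start_time minus video start_time in seconds, using the first
    audio and first video stream, or None if either is missing."""
    starts = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        kind = stream.get("codec_type")
        value = stream.get("start_time")
        if kind in ("audio", "video") and kind not in starts and value not in (None, "N/A"):
            starts[kind] = float(value)
    if "audio" not in starts or "video" not in starts:
        return None
    return starts["audio"] - starts["video"]
```

If `av_start_gap(converted) - av_start_gap(original)` is more than a frame or so, the converter shifted the audio relative to the picture.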
Shortest path to fix
Step 1: Identify drift direction and magnitude
Pick a sharp visual cue (a hand clap, a door slam, a drum hit) and find its corresponding peak in the audio waveform. Measure the offset in frames.
# In Premiere / Resolve / CapCut
- Zoom timeline to single-frame resolution
- Mark video event with marker on V1
- Mark audio peak with marker on A1
- Difference = drift in frames
- Divide by fps to get seconds: 6 frames at 24 fps = 0.25s
Repeat the measurement on a second cue near the end of the clip. If the drift is the same at both points, it is a constant offset. If it grows over time, it is an fps mismatch or a VFR source.
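The two-cue comparison can be written down as a small calculation. The following is a minimal sketch under these assumptions: drift is measured as audio peak minus video event in frames (positive means the audio is late), and cue times are in seconds. The function names and the one-frame measurement tolerance are hypothetical.

```python
def frames_to_seconds(frames, fps):
    """Convert a drift measured in frames to seconds."""
    return frames / fps


def classify_drift(early_frames, late_frames, tolerance_frames=1):
    """Compare drift at an early and a late cue; allow about one frame of
    measurement error before calling it growing drift."""
    if abs(late_frames - early_frames) <= tolerance_frames:
        return "constant offset"
    return "growing drift (fps mismatch or VFR)"


def drift_rate(t1, d1_frames, t2, d2_frames, fps):
    """Seconds of extra audio lag gained per second of timeline between
    cues at times t1 and t2 (seconds)."""
    return (d2_frames - d1_frames) / fps / (t2 - t1)
```

For example, `frames_to_seconds(6, 24)` is 0.25, matching the worked figure above. A drift of 0 frames at 1 s and 24 frames at 9 s gives a `drift_rate` of 0.125 s per second.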
Step 2: Apply audio offset for constant drift
# CapCut
- If the audio is embedded in the video clip, right-click it -> Detach audio so the audio sits on its own track
- Drag audio earlier or later by measured frames
- Or use keyboard: select clip, press . or , for 1-frame nudge
# Premiere Pro
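- Unlink audio from video first (right-click -> Unlink) so only the audio moves
- Select the audio clip and press Alt+Shift+Left/Right to nudge 5 frames, or Alt+Left/Right for 1 frame (default Windows shortcuts; check Keyboard Shortcuts for the macOS equivalents)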
# DaVinci Resolve
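- Turn off Linked Selection, then select the audio clip
- Type + or - followed by the measured frame count (e.g. -6) and press Enter to move the clip by that many frames
- Change Clip Speed is for drift that grows over time, not for a constant offset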
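You can also apply a constant offset without an editor by running ffmpeg on the same file twice, once for the picture and once for the shifted audio. The following is a minimal Python sketch that only builds the command. The function name is hypothetical. The approach is an assumption rather than the article's workflow. `-itsoffset` delays the audio input. `-ss` on the audio input trims its start so the audio lands earlier. The audio is re-encoded to AAC, which costs one generation of quality, while the video is copied untouched.

```python
import shlex


def offset_audio_cmd(src, dst, offset_seconds):
    """ffmpeg argv that shifts the audio of `src` by `offset_seconds`
    (positive = later, negative = earlier) and copies the video untouched."""
    if offset_seconds >= 0:
        # Delay the second input's timestamps, so the audio starts later.
        audio_input = ["-itsoffset", f"{offset_seconds:.3f}", "-i", src]
    else:
        # Skip the first |offset| seconds of the audio, so it lands earlier.
        audio_input = ["-ss", f"{-offset_seconds:.3f}", "-i", src]
    return (["ffmpeg", "-i", src] + audio_input +
            ["-map", "0:v:0", "-map", "1:a:0",
             "-c:v", "copy", "-c:a", "aac", dst])


if __name__ == "__main__":
    # Audio peak measured 6 frames late at 24 fps: move the audio 0.25 s earlier.
    print(shlex.join(offset_audio_cmd("input.mp4", "fixed.mp4", -6 / 24)))
```

The sign follows the Step 1 measurement. Audio that lands late gets a negative offset, and audio that lands early gets a positive one.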
Step 3: Conform fps for variable or mismatched rates
# Convert VFR to CFR with ffmpeg (re-encodes the video; use the source's nominal fps in place of 24)
ffmpeg -i input.mp4 -vf fps=24 -c:a copy output.mp4
# Or use HandBrake
- Video tab -> Framerate (FPS): change Same as source to 24
- Select Constant Framerate (not Peak Framerate)
# Then reimport into editor with sequence set to 24 fps
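Sometimes the picture is correct and the audio track itself runs long or short, for example a separately generated track. In that case you can retime the audio instead of conforming the video. The following is a minimal sketch that assumes the drift grows linearly. Here `rate_per_second` is the extra audio lag gained per second of video, which you can calculate from the two cue measurements in Step 1. The function names are hypothetical. The factor is passed to ffmpeg's atempo filter, which changes speed without changing pitch. The sketch keeps to the 0.5 to 2.0 range that older ffmpeg builds accept. Any remaining constant offset still has to be removed as in Step 2.

```python
def atempo_for_drift(rate_per_second):
    """Tempo factor that cancels a linearly growing drift. An audio event
    heard at (1 + k) * t belongs at video time t, so the audio must play
    (1 + k) times faster."""
    factor = 1.0 + rate_per_second
    if not 0.5 <= factor <= 2.0:
        raise ValueError("factor outside atempo's conservative 0.5-2.0 range")
    return factor


def retime_audio_cmd(src, dst, rate_per_second):
    """ffmpeg argv that copies the video and re-encodes the retimed audio."""
    factor = atempo_for_drift(rate_per_second)
    return ["ffmpeg", "-i", src, "-map", "0:v:0", "-map", "0:a:0",
            "-c:v", "copy", "-af", f"atempo={factor:.6f}", "-c:a", "aac", dst]
```

A rate of 0.125 s per second yields `atempo=1.125000`. If the audio runs ahead instead (negative rate), the factor drops below 1 and the audio slows down.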
Step 4: Regenerate with synced-audio model
If you have budget for a re-render, switch to a model that bakes audio at generation:
# Veo 3 / Veo 3.1
- Native audio generation per clip
- Ambient + dialogue + foley all locked to picture
# Sora with audio
- Synced dialogue from storyboard prompt
- Reliable lip-sync on talking heads
# Kling Audio (Kling 1.6 Audio mode)
- One-pass video + ambient audio
- Less reliable for dialogue, strong for sfx
Re-prompt example:
A wooden door slams shut as a person enters a quiet room.
Synced audio: the slam should land on the visual contact frame.
Ambient room tone after, no music.
Step 5: Strip audio and remux as a fallback
If the synced audio still drifts and you have a cleaner alternative track:
# Strip the broken audio
ffmpeg -i broken.mp4 -an -c:v copy video_only.mp4
# Mux in fresh audio
ffmpeg -i video_only.mp4 -i new_audio.wav -c:v copy -c:a aac -shortest final.mp4
This works well when the AI-generated audio is wrong but the visuals are keepers.
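The two commands above can be collapsed into one. Mapping streams explicitly means the broken file can be used directly, because only its video stream is selected. One caveat applies: -shortest ends the output at the shorter stream, so a replacement track that is shorter than the video will cut the picture. The following is a minimal sketch that builds the command and checks durations first. The function names and the 50 ms tolerance are hypothetical. The durations are assumed to come from ffprobe -v error -show_entries format=duration -of csv=p=0 on each file.

```python
def remux_cmd(video_src, audio_src, dst):
    """One-pass strip and mux: keep the first video stream of `video_src`
    untouched and take the first audio stream of `audio_src`."""
    return ["ffmpeg", "-i", video_src, "-i", audio_src,
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "copy", "-c:a", "aac", "-shortest", dst]


def shortest_cuts_video(video_seconds, audio_seconds, tolerance=0.05):
    """True if -shortest would end the output noticeably before the video
    does, i.e. the new audio is shorter than the picture."""
    return audio_seconds + tolerance < video_seconds
```

If `shortest_cuts_video` returns True, pad the new audio or drop -shortest rather than lose the end of the clip.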
Prevention
- Lock your project sequence fps to the source clip fps before importing.
- Prefer one-pass synced-audio models (Veo 3, Sora) for talking-head and SFX-critical clips.
- Always test playback in the editor at full resolution, not just the preview proxy.
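- Convert any VFR source to CFR before editing, for example with ffmpeg -i input.mp4 -vf fps=24 -c:a copy output.mp4.
- Keep raw downloads in a master folder, and never edit on a HandBrake-converted file without first checking its sync against the raw download.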