AI Video Audio Out of Sync With Visuals: Fix

Lips move, drums hit, but audio drifts ahead of or behind the picture. Fix with a one-command ffmpeg offset, a VFR-to-CFR conform, or a one-pass synced-audio model.

Published: May 24, 2026 Updated: Jun 18, 2026 Author: AI Productivity Guide Team

You generated a video with audio and the soundtrack drifts visibly against the picture. The drum hits land a beat early, the dialogue arrives after the lips move, or the footstep slaps disappear halfway through. This is not a lip-sync defect from a talking-head generator. Usually the audio and video are two separate generations stitched together with no shared clock, or a frame-rate mismatch is stretching one against the other.

Fastest fix: if the audio is off by a fixed amount across the whole clip, nudge it with one ffmpeg command (no re-encode):

# Push audio LATER by 0.25s (audio was arriving early)
ffmpeg -i in.mp4 -itsoffset 0.25 -i in.mp4 -map 0:v:0 -map 1:a:0 -c copy out.mp4

# Pull audio EARLIER by 0.25s (audio was arriving late): delay the VIDEO input instead
ffmpeg -itsoffset 0.25 -i in.mp4 -i in.mp4 -map 0:v:0 -map 1:a:0 -c copy out.mp4

-itsoffset N delays the timestamps of the input that follows it by N seconds. To move the audio later, take the audio from the delayed input. To move it earlier, take the video from the delayed input and the audio from the undelayed one. -c copy means no re-encode, so there is no quality loss and it finishes in seconds. If the drift instead grows over the clip, skip ahead to the fps conform in Step 3.
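If you script this, a small bash wrapper can pick which input to delay from the sign of the offset. This is a minimal sketch. shift_audio and the DRY_RUN switch are hypothetical names, and it assumes ffmpeg is on your PATH. Positive seconds push the audio later, and negative seconds pull it earlier by delaying the video input.

```shell
#!/usr/bin/env bash
# shift_audio IN OUT SECONDS  (hypothetical helper, not part of ffmpeg)
#   SECONDS > 0 -> audio plays LATER   (delay the input the audio is taken from)
#   SECONDS < 0 -> audio plays EARLIER (delay the input the video is taken from)
# Both streams are stream-copied (-c copy), so nothing is re-encoded.
# DRY_RUN=1 prints the command instead of running it.
shift_audio() {
  local in="$1" out="$2" secs="$3"
  local -a cmd
  case "$secs" in
    -*)
      # Audio was late: delay input 0 (video source) by |SECONDS|.
      cmd=(ffmpeg -itsoffset "${secs#-}" -i "$in" -i "$in"
           -map 0:v:0 -map 1:a:0 -c copy "$out")
      ;;
    *)
      # Audio was early: delay input 1 (audio source) by SECONDS.
      cmd=(ffmpeg -i "$in" -itsoffset "$secs" -i "$in"
           -map 0:v:0 -map 1:a:0 -c copy "$out")
      ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 shift_audio in.mp4 out.mp4 -0.25 prints the pull-earlier command so you can check it before running it for real.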

Which bucket are you in

Symptom	Likely cause	Go to
Audio off by the same amount start to finish	Constant offset (two-pass stitch)	Step 2 / `-itsoffset` above
Drift small at the start, large by the end	Frame-rate mismatch or VFR	Step 3
Audio just stops mid-clip, video continues	Model dropped audio mid-generation	Step 4 (regenerate)
Fine on the original, broke after converting	Remux rewrote the timestamps	re-test the raw file, then Step 5
Lips move wrong on a single talking head	Lip-sync defect, not a sync offset	see AI video talking head lip sync off

Common causes

Ordered by hit rate.

1. Audio generated in a separate pass

You generated silent video (Pika 2.5, older Runway Gen-3, or any “video only” run) and added audio later from ElevenLabs, Suno, or a stock library. The two passes share no timecode, so nothing keeps an audio cue aligned with its visual event. Any misplacement carries through the rest of the clip.

How to spot it: open the project file. If the audio track was imported separately from the video track, you have a two-pass setup and drift will accumulate.

Note for June 2026: most current flagship models now bake audio in one pass (Sora 2, Veo 3.1, Kling 3.0, Runway Gen-4), so this cause shows up mainly when you deliberately generated silent video or stitched a stock track on top.

2. Frames-per-second mismatch on export

The model rendered at 24 fps but the export pipeline or editor sequence assumes 30 fps. Audio plays at real time, video plays slower or faster, and they diverge linearly. After 10 seconds the drift is obvious.

How to spot it: check the source clip with ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate -of csv=p=0 input.mp4. Compare that to the project sequence fps in your editor.
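You can estimate how bad a rate mismatch gets with a rough bash and awk model. It assumes the simplest failure: each source frame is shown once at the wrong rate (no frames dropped or duplicated) while the audio plays in real time. Real editors often conform by dropping or repeating frames instead, so treat the result as a ballpark. fps_drift is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# fps_drift SRC_FPS PLAYBACK_FPS CONTENT_SECONDS  (hypothetical helper)
# Simplified model: every source frame is shown once at PLAYBACK_FPS while audio
# plays in real time. Prints the drift in seconds after CONTENT_SECONDS of picture:
# positive = audio behind the picture, negative = audio ahead of it.
# Rates may be plain numbers or ffprobe-style fractions such as 24/1 or 30000/1001.
fps_drift() {
  awk -v src="$1" -v dst="$2" -v t="$3" '
    function rate(s,   p) { return (split(s, p, "/") == 2) ? p[1] / p[2] : s + 0 }
    BEGIN { printf "%.3f\n", t * (1 - rate(src) / rate(dst)) }'
}
```

fps_drift 24/1 30 10 prints 2.000. By the time 10 seconds of 24 fps content has played at 30 fps, the audio is about 2 s behind. You can pass in the r_frame_rate value that the ffprobe command above prints.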

3. Model dropped audio mid-clip

Generators occasionally output a clip where the audio track ends partway through, for example at second 5 of a 10-second clip, while the video keeps going silently. This has been reported across Sora 2, Veo 3.1, and Kling 3.0 on longer prompts.

How to spot it: scrub the audio waveform. If it visibly flatlines mid-clip while the video continues, this is a generation defect, not a sync offset. No amount of nudging fixes it; regenerate (Step 4).
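Scrubbing works for one clip. For a batch of clips, ffmpeg's silencedetect filter can print where each silent stretch starts. This is a minimal sketch that assumes ffmpeg is installed. The -50dB threshold and one-second minimum are guesses you should tune, and silence_starts / parse_silence_starts are hypothetical helper names. A silence that begins partway through and runs to the end of the clip matches the flatline pattern.

```shell
#!/usr/bin/env bash
# silence_starts FILE  (hypothetical helper)
# Runs silencedetect on the first audio stream and prints the time (seconds)
# where each silent stretch begins. noise=-50dB and d=1 are assumed thresholds.
silence_starts() {
  ffmpeg -hide_banner -nostats -i "$1" -map 0:a:0 \
    -af silencedetect=noise=-50dB:d=1 -f null - 2>&1 | parse_silence_starts
}

# parse_silence_starts: extract N from "silence_start: N" log lines on stdin.
# silencedetect logs to stderr, which is why silence_starts redirects 2>&1.
parse_silence_starts() {
  sed -n 's/.*silence_start: \(-\{0,1\}[0-9.]*\).*/\1/p'
}
```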

4. Variable frame rate (VFR) source

Phone screen recordings of the generation UI, or downloads from some platforms, ship VFR. Editors conform VFR poorly and drift accumulates non-linearly.

How to spot it: run ffprobe -v error -select_streams v:0 -show_entries frame=pts_time -of csv=p=0 input.mp4 and check whether the intervals between consecutive timestamps are constant. If they wander, the source is VFR.
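The interval check can be automated. This sketch reads the timestamps the ffprobe frame query prints and compares the shortest and longest gap between consecutive frames. is_vfr is a hypothetical name. The 0.001 s tolerance is an assumption, chosen to absorb the 6-decimal rounding of rates like 30000/1001.

```shell
#!/usr/bin/env bash
# is_vfr [TOLERANCE_SECONDS]  (hypothetical helper)
# Reads one frame timestamp (pts_time) per line on stdin.
# Exit 0 ("looks VFR") if consecutive frame intervals vary by more than the
# tolerance (assumed default 0.001 s); exit 1 if constant or too few frames.
is_vfr() {
  awk -F, -v tol="${1:-0.001}" '
    $1 !~ /^[0-9.]+$/ { next }        # skip N/A and any non-timestamp lines
    {
      t = $1 + 0
      if (n++) {                      # from the second frame on, measure the gap
        d = t - prev
        if (n == 2 || d < min) min = d
        if (n == 2 || d > max) max = d
      }
      prev = t
    }
    END {
      if (n < 3) exit 1
      printf "frame interval: min %.6fs, max %.6fs\n", min, max
      if (max - min > tol) exit 0
      exit 1
    }'
}
```

Usage: ffprobe -v error -select_streams v:0 -show_entries frame=pts_time -of csv=p=0 input.mp4 | is_vfr && echo "VFR: conform to CFR (Step 3)". Frame-level queries decode the whole clip, so long files take a while.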

5. Codec or container remux corrupted the timestamp track

You ran the clip through HandBrake or an online converter and the audio PTS got rewritten incorrectly.

How to spot it: sync is fine on the original download but broken after conversion. Always test the raw file first.

Shortest path to fix

Step 1: Measure drift direction and magnitude

Pick a sharp visual cue (a hand clap, a door slam, a drum hit) and find its matching peak in the audio waveform. Measure the offset in frames.

# In Premiere / Resolve / CapCut
- Zoom timeline to single-frame resolution
- Mark the video event with a marker on V1
- Mark the matching audio peak with a marker on A1
- Difference = drift in frames
- Divide by fps to get seconds: 6 frames at 24 fps = 0.25s

If drift is constant across the clip, it is an offset (Step 2). If drift grows over time, it is an fps mismatch (Step 3).
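The Step 1 arithmetic and the constant-versus-growing test fit in a few lines. This sketch assumes you measured drift in frames at one cue near the start and one near the end. drift_report is a hypothetical name, and the 1-frame tolerance is an assumed noise margin.

```shell
#!/usr/bin/env bash
# drift_report FPS START_FRAMES END_FRAMES  (hypothetical helper)
# Each drift value = audio peak frame - video event frame (positive = audio late).
# FPS may be a plain number or an ffprobe-style fraction such as 30000/1001.
drift_report() {
  awk -v fps="$1" -v a="$2" -v b="$3" '
    function rate(s,   p) { return (split(s, p, "/") == 2) ? p[1] / p[2] : s + 0 }
    function abs(x) { return x < 0 ? -x : x }
    BEGIN {
      r = rate(fps)
      printf "start %+.3fs, end %+.3fs: ", a / r, b / r
      # A change of 1 frame or less is treated as measurement noise (assumption).
      if (abs(b - a) <= 1) print "constant offset, use Step 2"
      else print "drift grows over the clip, use Step 3"
    }'
}
```

drift_report 24 6 6 prints start +0.250s, end +0.250s: constant offset, use Step 2. A positive value means the audio is late and needs to be pulled earlier by that amount.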

Step 2: Apply a constant offset

The one-command ffmpeg fix at the top is the fastest. To do it by hand in an editor instead:

# CapCut
- Right-click the video clip -> Detach audio (the generated audio is embedded in the video clip)
- Drag audio earlier or later by the measured frames
- Or select the clip and press , or . for a 1-frame nudge

# Premiere Pro
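- Select only the audio clip (Alt-click it, Option-click on Mac, or right-click -> Unlink first)
- Press Alt+Left/Right (Cmd+Left/Right on Mac) to nudge it 1 frame; add Shift to nudge 5 frames
- Avoid Effect Controls -> Time Remapping, which changes speed; for a pure shift, just slide the clip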

# DaVinci Resolve (Edit page)
- Turn off Linked Selection so the audio moves without its video
- Select the audio clip and press , or . to nudge it 1 frame left or right
- Or drag the clip; the offset shows in the viewer's frame counter

Step 3: Conform fps for variable or mismatched rates

# Convert VFR (or a mismatched rate) to true CFR with ffmpeg.
# -fps_mode cfr is the current flag (FFmpeg 5.1 and later); it replaced the deprecated -vsync 1.
ffmpeg -i input.mp4 -fps_mode cfr -r 24 -c:a copy output.mp4

# If audio also drifts slowly, let aresample pad or trim it to match its timestamps
# (async=1 fills and trims; larger values such as async=1000 also allow stretching):
ffmpeg -i input.mp4 -fps_mode cfr -r 24 -af aresample=async=1 output.mp4

# Or in HandBrake (Video tab)
- Framerate (FPS) -> pick 24 (or match the model's render rate)
- Select "Constant Framerate"

# Then reimport into the editor with the sequence also set to 24 fps

Set the target rate (-r 24) to the model’s actual render rate from the ffprobe check in cause 2, not a guess. The -fps_mode flag and the deprecated -vsync are both documented in the official ffmpeg manual; as of June 2026 -vsync 1 still runs but prints a deprecation warning, so prefer -fps_mode cfr.
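If you conform many clips, a wrapper can read the source rate with the same ffprobe query as cause 2 and then build the command. This sketch rests on some assumptions. conform_cfr, AUDIO_ASYNC, and DRY_RUN are hypothetical names, and -fps_mode needs an FFmpeg build recent enough to have it. Also, ffprobe's r_frame_rate is not always the model's render rate and may differ on a VFR phone recording, so pass the rate explicitly when you know it.

```shell
#!/usr/bin/env bash
# conform_cfr IN OUT [RATE]  (hypothetical helper)
# Re-encodes the video to constant frame rate with -fps_mode cfr -r RATE.
# RATE defaults to ffprobe's r_frame_rate for the first video stream.
# AUDIO_ASYNC=1 swaps "-c:a copy" for "-af aresample=async=1" (re-encodes audio).
# DRY_RUN=1 prints the command instead of running it.
conform_cfr() {
  local in="$1" out="$2" rate="${3:-}"
  if [ -z "$rate" ]; then
    rate=$(ffprobe -v error -select_streams v:0 -show_entries stream=r_frame_rate \
      -of csv=p=0 "$in") || return 1
    rate=${rate%%,*}   # keep only the first CSV field
  fi
  local -a audio=(-c:a copy)
  if [ "${AUDIO_ASYNC:-0}" = 1 ]; then
    audio=(-af aresample=async=1)
  fi
  local -a cmd=(ffmpeg -i "$in" -fps_mode cfr -r "$rate" "${audio[@]}" "$out")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, conform_cfr input.mp4 output.mp4 24 runs the first command from Step 3, and AUDIO_ASYNC=1 conform_cfr input.mp4 output.mp4 24 runs the second.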

Step 4: Regenerate with a one-pass synced-audio model

If the audio dropped mid-clip, or it drifts no matter what you do, regenerate with a model that bakes audio at generation time. As of June 2026:

# Sora 2 (OpenAI)
- Synchronized dialogue, SFX, and music in one pass; base clip ~15s, ~25s on Sora 2 Pro
- Reliable lip-sync on talking heads

# Veo 3.1 (Google)
- Native 48kHz audio (ambient + dialogue + foley) locked to picture; 4K output
- Base clip 8s; scene-extension chains up to ~140s while keeping audio synced

# Kling 3.0 (Omni One)
- One-pass video + audio with lip-sync; clips up to ~10s at 1080p, multilingual dialogue

# Runway Gen-4
- Native audio (lip-sync + environmental SFX) synthesized alongside video, added May 3 2026

Re-prompt example that calls the cue out explicitly:

A wooden door slams shut as a person enters a quiet room.
Synced audio: the slam lands on the visual contact frame.
Ambient room tone after, no music.

Step 5: Strip audio and remux as a fallback

If the generated audio is simply wrong but the visuals are keepers, and you have a cleaner alternative track:

# Strip the broken audio
ffmpeg -i broken.mp4 -an -c:v copy video_only.mp4

# Mux in the fresh audio
ffmpeg -i video_only.mp4 -i new_audio.wav -c:v copy -c:a aac -shortest final.mp4

If the replacement track still needs a small shift, put -itsoffset before -i new_audio.wav to delay the audio, or before -i video_only.mp4 to pull the audio earlier. This is the same principle as the fastest-fix commands above.

How to confirm it is fixed

Reload the rendered file into the editor (do not trust the old timeline cache).
Jump to your reference cue (the clap or door slam) at single-frame zoom; the audio peak should sit on the contact frame, within 1 frame.
Jump to a cue near the very end of the clip and check again. If the start is locked but the end drifts, you fixed an offset but still have an fps mismatch (return to Step 3).
Play back at full resolution, not the proxy preview, since proxy playback can mask or invent drift.
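For exported files, a quick ffprobe check can flag gross problems before you scrub. It compares the audio and video stream durations. Matching durations do not prove the cues line up, but a large gap after export suggests a rate change or a truncated track. This is a sketch. duration_gap and check_duration_gap are hypothetical names, and the 0.05 s tolerance is an assumption.

```shell
#!/usr/bin/env bash
# duration_gap FILE [TOLERANCE_SECONDS]  (hypothetical helper)
# Exit 0 if audio and video stream durations agree within the tolerance
# (assumed default 0.05 s), 1 if they do not, 2 if either duration is missing.
duration_gap() {
  ffprobe -v error -show_entries stream=codec_type,duration \
    -of default=noprint_wrappers=1 "$1" | check_duration_gap "${2:-0.05}"
}

# check_duration_gap TOLERANCE: reads ffprobe "key=value" lines on stdin.
check_duration_gap() {
  awk -F= -v tol="$1" '
    $1 == "codec_type" { type = $2 }
    # keep the first stream of each type; skip N/A durations
    $1 == "duration" && $2 != "N/A" && !(type in dur) { dur[type] = $2 + 0 }
    END {
      if (!("video" in dur) || !("audio" in dur)) { print "missing audio or video duration"; exit 2 }
      gap = dur["audio"] - dur["video"]
      printf "video %.3fs, audio %.3fs, gap %+.3fs\n", dur["video"], dur["audio"], gap
      if (gap < -tol || gap > tol) exit 1
      exit 0
    }'
}
```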

Prevention

Lock the project sequence fps to the source clip fps before importing.
Prefer one-pass synced-audio models (Sora 2, Veo 3.1, Kling 3.0, Runway Gen-4) for talking-head and SFX-critical clips.
Convert any VFR source to CFR before editing (ffmpeg -fps_mode cfr -r <rate>).
Keep raw downloads in a master folder; never edit on a HandBrake-converted file, and re-test the raw file first if sync looks broken.
Always verify playback at full resolution, not just the preview proxy.

FAQ

Why does the drift only show up after about 10 seconds? That is the signature of a frame-rate mismatch (or VFR), not an offset. A constant offset is wrong by the same amount from frame one; an fps mismatch starts near-perfect and diverges linearly. Conform to CFR (Step 3) rather than nudging.

My audio just stops partway through. Is that a sync problem? No. That is the model dropping the audio track mid-generation (cause 3). Nudging or remuxing the existing track will not bring back audio that was never rendered. Regenerate the clip (Step 4), ideally with a shorter prompt or a one-pass model.

Does -itsoffset re-encode and hurt quality? No. With -c copy it only rewrites timestamps and copies the streams, so there is no quality loss and it finishes in seconds. Only the fps conform (-r) re-encodes the video.

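Should I use -async to fix drift? The old -async audio option is deprecated. For slow drift, conform the video to CFR and add -af aresample=async=1, which pads or trims the audio to match its timestamps. Larger values (for example async=1000) also let ffmpeg stretch or squeeze the audio by up to that many samples per second.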

The fix works in my editor but breaks again after export. Why? Your export setting is changing the frame rate or re-introducing VFR. Match the export fps to the sequence fps, choose a constant-frame-rate preset, and re-run the confirmation steps on the exported file, not the timeline.

Tags: #ai-video #Troubleshooting #audio-sync
