What this covers
Generate your first AI video in about 15 minutes: how to structure the prompt and how long to make the clip.
Key tools and concepts:
- Sora: OpenAI’s text-to-video generation model.
- Veo: Google’s text-to-video generation model, available through Gemini and Vertex products.
What Sora and Veo are, and how to actually try them
Sora
Sora is OpenAI’s text-to-video model. As of 2026, you reach it two ways: sora.com (standalone site) or the Sora tool inside ChatGPT. Both require ChatGPT Plus or Pro. Pro gives you longer clips, a faster queue, and higher concurrency.
What it does best:
- Complex camera moves: dolly, tracking, aerial, low-flying drone, one-take long shots.
- Stylized looks: golden hour, 35mm film grain, neon cyberpunk, low-saturation cinematic color.
- Abstract / surreal subjects: shattering glass, fluids, smoke, morphing forms, slow-motion physics.
Sample prompt:
A woman in a red trench coat slowly turns toward camera
on a neon-lit Tokyo street in the rain,
slow motion, 35mm film grain,
camera dollies in slowly from waist height,
neon reflections on wet pavement.
What it cannot do well: face identity drifts across multiple shots; hands, fingers, and on-screen text are often distorted; it does not yet generate synced spoken dialogue (you’ll need to add audio in post).
Veo
Veo is Google’s text-to-video model — Veo 3 is the current main version in 2026. Three entry points: the Video tool inside the Gemini app (requires Google AI Pro or Ultra), Google AI Studio (free developer quota), or Vertex AI (enterprise API). Veo 3’s killer feature: it can generate synced dialogue, ambient audio, and music together with the video, which Sora cannot do yet.
What it does best:
- Realistic physics and natural-light scenes: street, documentary, indoor conversations, sun / wind / water.
- Shots that need audio: a person speaking with lip-sync + room ambience, no post dubbing needed.
- Photoreal humans and animals: skin, fur, micro-expressions hold up better.
Sample prompt:
An elderly woman at a Paris café table,
smiles at the camera and says "Bonjour",
natural light, ambient street audio, 35mm film grain,
eye-level fixed camera.
What it cannot do well: more conservative on stylization and surreal warping; default clip length is short (around 8s on most plans); celebrity and copyrighted-character filters are strict; pricing differs a lot between Gemini app, AI Studio, and Vertex API.
One-line picker
- Want cinematic / complex camera / stylized → Sora.
- Want realistic + native synced audio → Veo.
- First time? Use whichever subscription you already have — don’t sign up for a second one just to test.
Who this is for
New to AI video.
When to reach for it
You’ve seen the demos and want to try.
Step by step
- Pick scope: 2-4 seconds first try (Sora defaults near 5s, Veo near 8s — start short either way).
- Prompt covers four things: subject + action verb + camera move + lighting. For Veo, you can also add a dialogue: or ambient: line and it will generate the audio with the video.
- Iterate one variable at a time: change the subject, or the camera move, or the lighting, or the style. Never change two at once, or you can’t tell which change moved the result.
- Stitch clips in an editor (CapCut, Premiere, DaVinci). Sora clips need audio added in post; Veo clips ship with an audio track — be careful not to mute it when cutting.
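The prompt structure and the one-variable-at-a-time rule from the steps above can be sketched in a few lines of Python. This is only an illustration for keeping prompts organized: `ShotPrompt` and `vary_one` are hypothetical names, nothing here calls Sora or Veo, and the rendered text is something you would paste into either tool by hand.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ShotPrompt:
    """Hypothetical container for the four prompt parts (not a Sora or Veo API)."""
    subject: str
    action: str
    camera: str
    lighting: str
    dialogue: Optional[str] = None  # optional extra line, Veo only
    ambient: Optional[str] = None   # optional extra line, Veo only

    def render(self) -> str:
        # Subject and action verb first, then camera move, then lighting.
        lines = [f"{self.subject} {self.action}, {self.camera}, {self.lighting}."]
        if self.dialogue:
            lines.append(f"dialogue: {self.dialogue}")
        if self.ambient:
            lines.append(f"ambient: {self.ambient}")
        return "\n".join(lines)


def vary_one(base: ShotPrompt, field: str, options: list[str]) -> list[ShotPrompt]:
    """Return copies of `base` that differ from it in exactly one field."""
    return [replace(base, **{field: value}) for value in options]


base = ShotPrompt(
    subject="A woman in a red trench coat",
    action="slowly turns toward camera",
    camera="camera dollies in slowly from waist height",
    lighting="neon reflections on wet pavement",
)

# Only the camera move changes; subject, action, and lighting stay fixed.
variants = vary_one(base, "camera", ["handheld tracking shot", "slow aerial pull-back"])
for variant in variants:
    print(variant.render())
```

Because each variant differs from the base in a single field, any difference in the resulting clip can be traced back to that one change.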
Recommended workflow
Short clip → tweak → next clip → editor.
Common mistakes
- Long clip on first try: waiting 5 minutes for a 15-second clip you end up throwing away hurts more than three 3-second tries.
- No verb in prompt: “a sunset by the sea” is a still frame — add “camera dollies slowly along the coastline” and now it’s video.
- Asking Sora for synced dialogue: it doesn’t generate audio yet. Send dialogue scenes to Veo.
- Asking Veo for stylized surreal shots: Veo skews realistic — “a glass person melting into neon” will look stiff. Use Sora for that.
- Cross-tool comparisons that change prompt and model at the same time: fix the prompt, then swap the model, otherwise you won’t know whether the prompt or the model moved the result.
FAQ
Q: Which should I pick for my first AI video — Sora or Veo? A: Pick by aesthetic. Sora handles stylized, surreal, or expressive shots better. Veo skews realistic — better for product video or grounded scenes. If you have access to both, run the same prompt through each and compare.
Q: How long can a single clip be? A: Both default to short clips — typically 5-10 seconds for free / standard tiers, with Pro plans unlocking longer runs and higher concurrency. Plan your storyboard in 8-second beats and stitch in the editor.
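To make the 8-second-beat planning from the answer above concrete, here is a small arithmetic sketch. `plan_beats` is a hypothetical helper, and the 8-second default is simply the figure quoted above; adjust it to whatever clip length your plan actually allows.

```python
import math


def plan_beats(total_seconds: float, beat_seconds: float = 8.0) -> list:
    """Split a target runtime into (start, end) beats of at most `beat_seconds`."""
    if total_seconds <= 0 or beat_seconds <= 0:
        raise ValueError("durations must be positive")
    total = float(total_seconds)
    beat = float(beat_seconds)
    count = math.ceil(total / beat)  # number of clips to generate
    # The last beat is clamped so it does not run past the target runtime.
    return [(i * beat, min((i + 1) * beat, total)) for i in range(count)]


# A 30-second video needs four clips: three full beats and one 6-second beat.
for start, end in plan_beats(30):
    print(f"{start:>4.0f}s to {end:>4.0f}s")
```

Each (start, end) pair is one clip to generate separately and then stitch in the editor.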
Q: Why does my AI video look stiff or jittery? A: Two common causes: over-detailed motion prompts (“walks left, then turns, then sits”) that the model can’t choreograph, and short clips that get stretched past their natural length. Simplify the action and aim shorter.
Q: Can I A/B compare prompts across Sora and Veo fairly? A: Yes, but change one variable at a time. Fix the prompt, then swap the model. If you change both, you can’t tell whether the prompt or the model moved the result. Save each generation with its exact prompt.
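One way to save each generation with its exact prompt is a plain JSON Lines log, sketched below. The function name `log_generation`, the field names, and the file names are all assumptions made for illustration. Neither tool produces this log; you would record each run yourself after downloading the clip.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_generation(log_path, model, prompt, output_file):
    """Append one record per generated clip (hypothetical log format)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt": prompt,            # stored verbatim, never paraphrased
        "output_file": output_file,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


# A temporary folder keeps the example self-contained; use your project folder in practice.
log_file = Path(tempfile.mkdtemp()) / "generations.jsonl"

# Same prompt, two models: the model is the only variable that changes.
prompt = "An elderly woman at a Paris café table, smiles at the camera, natural light, eye-level fixed camera."
for model in ("sora", "veo"):
    log_generation(log_file, model, prompt, f"{model}_take1.mp4")
```

With the prompt stored alongside the model name, two clips can be compared later without guessing which wording produced which result.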