Sora / Veo Beginner Guide

First AI video in 15 minutes — Sora and Veo, prompt structure, clip length limits, and the realism vs surrealism split that decides which model to use.

What this covers

Generate your first AI video in 15 minutes: how to structure a prompt and how long to make each clip.

Key tools and concepts:

  • Sora: OpenAI’s text-to-video generation model.
  • Veo: Google’s text-to-video generation model, available through the Gemini app, Google AI Studio, and Vertex AI.

What Sora and Veo are, and how to actually try them

Sora

Sora is OpenAI’s text-to-video model. As of 2026, you can reach it two ways: sora.com (the standalone site) or the Sora tool inside ChatGPT. Both require ChatGPT Plus or Pro; Pro gives you longer clips, a faster queue, and higher concurrency.

What it does best:

  • Complex camera moves: dolly, tracking, aerial, low-flying drone, one-take long shots.
  • Stylized looks: golden hour, 35mm film grain, neon cyberpunk, low-saturation cinematic color.
  • Abstract / surreal subjects: shattering glass, fluids, smoke, morphing forms, slow-motion physics.

Sample prompt:

A woman in a red trench coat slowly turns toward camera
on a neon-lit Tokyo street in the rain,
slow motion, 35mm film grain,
camera dollies in slowly from waist height,
neon reflections on wet pavement.

What it cannot do well: a character’s face does not stay consistent across multiple shots; hands, fingers, and on-screen text are often distorted; and it does not yet generate synced spoken dialogue (you’ll need to add audio in post).

Veo

Veo is Google’s text-to-video model; Veo 3 is the main version in 2026. There are three entry points: the Video tool inside the Gemini app (requires Google AI Pro or Ultra), Google AI Studio (free developer quota), and Vertex AI (enterprise API). Veo 3’s standout feature is that it generates synced dialogue, ambient audio, and music together with the video, which Sora cannot do yet.

What it does best:

  • Realistic physics and natural-light scenes: street, documentary, indoor conversations, sun / wind / water.
  • Shots that need audio: a person speaking with lip-sync + room ambience, no post dubbing needed.
  • Photoreal humans and animals: skin, fur, and micro-expressions hold up better.

Sample prompt:

An elderly woman at a Paris café table,
smiles at the camera and says "Bonjour",
natural light, ambient street audio, 35mm film grain,
eye-level fixed camera.

What it cannot do well: it is conservative with stylization and surreal warping; the default clip length is short (around 8s on most plans); celebrity and copyrighted-character filters are strict; and pricing differs a lot between the Gemini app, AI Studio, and the Vertex API.

One-line picker

  • Want cinematic / complex camera / stylized → Sora.
  • Want realistic + native synced audio → Veo.
  • First time? Use whichever subscription you already have — don’t sign up for a second one just to test.
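The picker is small enough to write down as a rule. A minimal Python sketch, assuming each shot is described by two yes/no questions; `pick_model` and its parameters are hypothetical planning helpers, not part of any Sora or Veo API, and the function never calls either service:

```python
from __future__ import annotations


def pick_model(wants_stylized: bool, needs_synced_audio: bool,
               subscriptions: set[str] | None = None) -> str:
    """Suggest "Sora" or "Veo" for a shot, following the one-line picker.

    Hypothetical planning helper; it does not call either service.
    """
    # First time with only one subscription: use it, don't sign up for a second.
    if subscriptions is not None and len(subscriptions) == 1:
        return next(iter(subscriptions))
    # Native synced dialogue / ambient audio points to Veo.
    if needs_synced_audio:
        return "Veo"
    # Cinematic, complex camera moves, stylized or surreal looks lean Sora.
    if wants_stylized:
        return "Sora"
    # Otherwise the shot is realistic and grounded, which Veo skews toward.
    return "Veo"


print(pick_model(wants_stylized=True, needs_synced_audio=False))   # Sora
print(pick_model(wants_stylized=False, needs_synced_audio=True))   # Veo
print(pick_model(wants_stylized=False, needs_synced_audio=True,
                 subscriptions={"Sora"}))                          # Sora
```

When both subscriptions are available, the audio question wins over style, because per this guide only Veo produces synced sound natively; with only Sora, you add the audio in post.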

Who this is for

New to AI video.

When to reach for it

You’ve seen the demos and want to try it yourself.

Step by step

  1. Pick the scope: keep your first try to 2-4 seconds (Sora defaults to about 5s and Veo to about 8s, so start short either way).
  2. Write a prompt that covers four things: subject + action verb + camera move + lighting. For Veo, you can also add a "dialogue:" or "ambient:" line, and it will generate that audio along with the video.
  3. Iterate one variable at a time: change the subject, or the camera move, or the lighting, or the style — never two at once, or you can’t tell which change moved the result.
  4. Stitch the clips together in an editor (CapCut, Premiere, DaVinci). Sora clips need audio added in post; Veo clips ship with an audio track, so be careful not to mute it when cutting.

Short clip → tweak → next clip → editor.
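The four-part prompt structure from step 2 maps naturally onto a small template. A minimal Python sketch; `ShotPrompt` and `render` are hypothetical names, and the "dialogue:" / "ambient:" line format is just this guide’s convention for Veo, not a documented syntax. It only builds text you paste into the prompt box; it does not call Sora or Veo:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShotPrompt:
    """The four parts every prompt covers, plus optional Veo audio lines."""
    subject: str                     # who or what is on screen
    action: str                      # the verb that turns a still frame into video
    camera: str                      # camera move
    lighting: str                    # light and look
    dialogue: Optional[str] = None   # Veo only: a spoken line
    ambient: Optional[str] = None    # Veo only: background sound

    def __post_init__(self) -> None:
        # Catch the "no verb" mistake early: all four core parts are required.
        for name in ("subject", "action", "camera", "lighting"):
            if not getattr(self, name).strip():
                raise ValueError(f"prompt is missing its {name}")

    def render(self, model: str) -> str:
        lines = [f"{self.subject} {self.action},", f"{self.camera},", f"{self.lighting}."]
        # Sora clips get audio in post, so audio lines are only added for Veo.
        if model == "Veo":
            if self.dialogue:
                lines.append(f'dialogue: "{self.dialogue}"')
            if self.ambient:
                lines.append(f"ambient: {self.ambient}")
        return "\n".join(lines)


shot = ShotPrompt(
    subject="An elderly woman at a Paris café table",
    action="smiles at the camera",
    camera="eye-level fixed camera",
    lighting="natural light, 35mm film grain",
    dialogue="Bonjour",
    ambient="street noise",
)
print(shot.render("Veo"))
```

Rendering the same shot for "Sora" drops the two audio lines, which keeps the visual prompt identical across models when you compare them.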

Common mistakes

  • Long clip on the first try: waiting 5 minutes for 15 seconds of unusable footage is more painful than three quick 3-second tries.
  • No verb in prompt: “a sunset by the sea” is a still frame — add “camera dollies slowly along the coastline” and now it’s video.
  • Asking Sora for synced dialogue: it doesn’t generate audio yet. Send dialogue scenes to Veo.
  • Asking Veo for stylized surreal shots: Veo skews realistic — “a glass person melting into neon” will look stiff. Use Sora for that.
  • Cross-tool comparisons that change prompt and model at the same time: fix the prompt, then swap the model, otherwise you won’t know whether the prompt or the model moved the result.
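Step 3 and the last mistake above share one discipline: each new run changes exactly one thing. A minimal Python sketch of that rule for prompt parts, assuming a prompt is stored as a dict of its four parts; `one_change_variants` and the field names are hypothetical:

```python
from typing import Dict, Iterator, List, Tuple


def one_change_variants(
    base: Dict[str, str], options: Dict[str, List[str]]
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (changed_field, prompt_parts) pairs that each differ from base in exactly one field."""
    for field, values in options.items():
        if field not in base:
            raise KeyError(f"unknown prompt field: {field}")
        for value in values:
            if value == base[field]:
                continue  # same as the baseline, so nothing to learn from it
            variant = dict(base)  # copy so the baseline stays untouched
            variant[field] = value
            yield field, variant


base = {
    "subject": "A woman in a red trench coat",
    "action": "slowly turns toward camera",
    "camera": "camera dollies in slowly from waist height",
    "lighting": "neon reflections on wet pavement",
}
options = {
    "camera": ["camera dollies in slowly from waist height", "static camera at eye level"],
    "lighting": ["soft overcast daylight"],
}
for changed, parts in one_change_variants(base, options):
    print(changed, "->", parts[changed])
```

Each variant is one generation; if the result moves, you know which field moved it.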

FAQ

Q: Which should I pick for my first AI video — Sora or Veo? A: Pick by aesthetic. Sora handles stylized, surreal, or expressive shots better. Veo skews realistic — better for product video or grounded scenes. If you have access to both, run the same prompt through each and compare.

Q: How long can a single clip be? A: Both default to short clips — typically 5-10 seconds for free / standard tiers, with Pro plans unlocking longer runs and higher concurrency. Plan your storyboard in 8-second beats and stitch in the editor.
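Planning in 8-second beats is simple arithmetic: a 30-second piece is three full beats plus a 6-second tail, so four clips. A minimal Python sketch; `plan_beats` is a hypothetical helper, and the 8-second default follows the advice above rather than a fixed limit of either model:

```python
from typing import List


def plan_beats(total_seconds: float, beat_seconds: float = 8.0) -> List[float]:
    """Split a target runtime into clip lengths no longer than beat_seconds."""
    if total_seconds <= 0 or beat_seconds <= 0:
        raise ValueError("durations must be positive")
    beats: List[float] = []
    remaining = total_seconds
    while remaining > 0:
        beat = min(beat_seconds, remaining)  # last beat takes whatever is left
        beats.append(beat)
        remaining -= beat
    return beats


print(plan_beats(30))  # [8.0, 8.0, 8.0, 6.0]
```

Each entry becomes one generation in your storyboard; stitch them in order in the editor.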

Q: Why does my AI video look stiff or jittery? A: Two common causes: over-detailed motion prompts (“walks left, then turns, then sits”) that the model can’t choreograph, and clips stretched or extended past the length the model handles well. Simplify the action and aim shorter.

Q: Can I A/B compare prompts across Sora and Veo fairly? A: Yes, but change one variable at a time. Fix the prompt, then swap the model. If you change both, you can’t tell whether the prompt or the model moved the result. Save each generation with its exact prompt.
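Saving every generation with its exact prompt makes the one-variable check mechanical. A minimal Python sketch, assuming you keep a simple JSON Lines log; `Generation`, `is_fair_comparison`, and the clip file paths are hypothetical names for illustration:

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Set


@dataclass(frozen=True)
class Generation:
    model: str        # which tool produced the clip, e.g. "Sora" or "Veo"
    prompt: str       # the exact prompt text, copied verbatim
    output_file: str  # where the downloaded clip was saved (hypothetical path)


def changed_variables(a: Generation, b: Generation) -> Set[str]:
    """Return which experiment variables differ between two generations."""
    return {name for name in ("model", "prompt") if getattr(a, name) != getattr(b, name)}


def is_fair_comparison(a: Generation, b: Generation) -> bool:
    # Fair means exactly one variable moved: either the prompt or the model.
    return len(changed_variables(a, b)) == 1


def to_jsonl(runs: List[Generation]) -> str:
    # One JSON object per line, so the log is easy to append to and diff.
    return "\n".join(json.dumps(asdict(run), ensure_ascii=False) for run in runs)


prompt = "A woman in a red trench coat slowly turns toward camera, neon-lit Tokyo street in the rain"
sora_run = Generation("Sora", prompt, "clips/trench_sora_01.mp4")
veo_run = Generation("Veo", prompt, "clips/trench_veo_01.mp4")
print(is_fair_comparison(sora_run, veo_run))  # True: only the model changed
print(to_jsonl([sora_run, veo_run]))
```

Two runs with the same prompt and the same model count as a re-roll rather than a comparison, so `is_fair_comparison` returns False for them.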

Tags: #Tutorial #Video generation #Getting started