Your prompt asked for a sign that says “OPEN” or a phone screen showing “MESSAGE.” What came back is hieroglyphic letterforms that jitter across frames, with a different spelling on each frame. Most current AI video models handle in-frame text poorly: they treat letters as texture rather than language. As of this writing, Veo 3 and Sora handle it best; most other models mangle text. To fix it, add the text in post with CapCut, Premiere, or DaVinci Resolve; switch to a text-capable model; or composite a static graphic over the AI-generated background.
Common causes
Ordered from most to least common.
1. Model does not represent text as language
Runway Gen-3, Pika 2.0, Kling 1.6, Hailuo, Luma: all of these are built to render visual scenes, not to spell. They produce shapes that look like text but are not.
How to spot it: Generate any clip with the word “HELLO” on a sign. If the output reads “H3LL0” or “HEILO,” or changes from frame to frame, the model cannot do text.
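To make this test repeatable across models, you can run OCR on a few sampled frames and compare the readings with the word you asked for. The sketch below assumes you already have one OCR string per frame from whatever OCR tool you use (that step is not shown); the function name and the three verdict labels are hypothetical.

```python
def classify_text_rendering(expected: str, frame_readings: list[str]) -> str:
    """Classify how a model rendered `expected` across sampled frames.

    frame_readings: one OCR result per sampled frame (assumed to come from an
    external OCR tool, not shown here).
    """
    # Normalize case and surrounding whitespace so "hello " counts as "HELLO".
    target = expected.strip().upper()
    readings = [r.strip().upper() for r in frame_readings]
    if not readings:
        raise ValueError("need at least one frame reading")

    if all(r == target for r in readings):
        return "correct"        # right spelling on every sampled frame
    if len(set(readings)) > 1:
        return "inconsistent"   # spelling changes from frame to frame
    return "misspelled"         # stable across frames, but wrong


print(classify_text_rendering("HELLO", ["HELLO", "HELLO"]))  # correct
print(classify_text_rendering("HELLO", ["H3LL0", "H3LL0"]))  # misspelled
print(classify_text_rendering("HELLO", ["HEILO", "HELL0"]))  # inconsistent
```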
2. Text exists per-frame but not across frames
Even when a model gets the spelling right at frame 1, nothing forces it to keep the letters identical in later frames. By frame 30 the same word can have different kerning, color, or shape.
How to spot it: Pause at frame 1 and frame 30. If the text “wobbles” between them, the model is redrawing it rather than holding it fixed like a physical object.
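If eyeballing the two frames is not conclusive, a rough number helps: crop the text region from both frames and measure how much it changed. This is a hypothetical sketch that assumes a locked-off camera (or crops you have already aligned), grayscale pixel values 0-255, and frames exported and cropped by you beforehand; the default threshold is an arbitrary starting point, not a calibrated value.

```python
def mean_abs_diff(crop_a: list[list[int]], crop_b: list[list[int]]) -> float:
    """Average absolute per-pixel difference between two same-size grayscale crops."""
    if len(crop_a) != len(crop_b) or any(len(a) != len(b) for a, b in zip(crop_a, crop_b)):
        raise ValueError("crops must have identical dimensions")
    diffs = [abs(pa - pb) for ra, rb in zip(crop_a, crop_b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def text_wobbles(crop_first: list[list[int]], crop_later: list[list[int]],
                 threshold: float = 8.0) -> bool:
    """True if the text region changed by more than `threshold` gray levels on average.

    The default threshold is an arbitrary assumption; tune it per project.
    """
    return mean_abs_diff(crop_first, crop_later) > threshold


frame_1 = [[0, 255, 0], [0, 255, 0]]    # toy 2x3 crop of a letter stroke
frame_30 = [[0, 0, 255], [0, 0, 255]]   # same stroke, drifted one pixel
print(text_wobbles(frame_1, frame_30))  # True
```

On a static shot, genuinely fixed text should score close to zero, so any sizable value points at the model redrawing the glyphs.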
3. Text is small in frame
Smaller text means fewer pixels per letter, so the model has less room to form correct letterforms. Big headline text on a wall is easier than small button labels on a UI.
How to spot it: Estimate the text height in pixels at your output resolution. Under 40 pixels tall? Expect garbled text. Over 200 pixels tall? Some models can manage it.
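A quick way to estimate height without measuring in an editor is to multiply the fraction of the frame height the text occupies by the output resolution. The helper below is a hypothetical sketch: the 40 px and 200 px cut-offs come from the rule of thumb above, while the label for the band in between is an assumption, since the rule does not cover it.

```python
def text_height_px(frame_height_px: int, text_fraction: float) -> float:
    """Approximate letter height in pixels when text spans `text_fraction` of the frame height."""
    return frame_height_px * text_fraction


def expected_legibility(height_px: float) -> str:
    # Cut-offs from the rule of thumb: under 40 px garbled, over 200 px possible.
    if height_px < 40:
        return "expect garbled"
    if height_px > 200:
        return "some models can manage"
    return "borderline; test before committing"  # assumed label for the middle band


button = text_height_px(1080, 0.03)    # small UI label in a 1080p frame
headline = text_height_px(1080, 0.25)  # large wall headline in the same frame
print(round(button, 1), expected_legibility(button))
print(round(headline, 1), expected_legibility(headline))
```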
4. Multiple text instances in one clip
A street scene with three signs, two posters, and a license plate asks the model to render text in five places at once. Expect most of them to come out garbled.
How to spot it: Count text regions. More than one and you should plan to do them in post.
5. Stylized fonts requested
“Cursive neon,” “graffiti tag,” “1920s film title card” all push text into a stylized space where even Veo 3 / Sora may slip.
How to spot it: Reduce to plain sans-serif uppercase. If that works and stylized doesn’t, the style was the problem.
Shortest path to fix
Step 1: Switch to a text-capable model if regeneration is feasible
# Veo 3 / Veo 3.1
- Best in-frame text rendering as of 2026
- Works for short words, signage, basic UI elements
- Prompt:
"A wooden shop sign with the text OPEN in clear black letters,
sharp focus, no other text in scene."
# Sora
- Strong for medium-length text in clear typography
- Storyboard mode lets you specify exact text per shot
- Prompt:
"Vintage diner sign with the word DINER in red neon, glowing steadily,
no flicker, no other text."
# Kling 2.0
- Improved text vs 1.6, still inconsistent
- Try only as fallback
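If you generate many text shots, it can help to template the prompt pattern used in the Veo and Sora examples: surface, exact word, typography, then an explicit “no other text” constraint. The hypothetical helper below does that; with default arguments it reproduces the Veo 3 example prompt word for word, and with `plain=True` it produces the plain sans-serif uppercase baseline from cause 5, so you can A/B a stylized prompt against it. The wording is only a pattern taken from the examples above, not an official prompt syntax for either model, and no prompt guarantees correct spelling.

```python
def text_prompt(surface: str, word: str, typography: str = "clear black letters",
                plain: bool = False) -> str:
    """Build a text-rendering prompt: surface, exact word, typography, no-other-text constraint."""
    if plain:
        # Baseline for the stylized-font check: plain sans-serif, uppercase.
        word = word.upper()
        typography = "plain sans-serif uppercase letters"
    return (f"{surface} with the text {word} in {typography}, "
            "sharp focus, no other text in scene.")


stylized = text_prompt("A neon bar sign", "OPEN", typography="cursive pink neon")
baseline = text_prompt("A neon bar sign", "OPEN", plain=True)
print(stylized)
print(baseline)
```

If the baseline renders cleanly and the stylized version does not, the style request is the problem, not the word.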
Step 2: Plan AI video without text, add text in post
This is the production-grade approach. Generate the clip with a blank surface where text should go, then composite text on top:
# Prompt the AI video for a blank surface
"A wooden shop sign hanging from a chain, blank surface, no text, no markings,
clean weathered wood ready for signage."
# In CapCut
- Add Text -> place over the blank sign
- Animate position to track the sign's apparent motion
- Match perspective with 3D Layer if tilted
- Export
# In Premiere Pro
- Essential Graphics -> Text
- Track with manual keyframes or Mocha tracking
- Apply Drop Shadow to ground it in the scene
# In DaVinci Resolve Fusion
- Text+ node
- Tracker node bound to the surface
- Merge over the AI footage
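When you track by hand with keyframes, the editor fills in the frames between your keyframes for you. The sketch below shows the simplest version of that idea, linear interpolation of an (x, y) text position; real editors typically also offer eased or Bezier interpolation, which this does not model. The function name and keyframe values are hypothetical, for illustration only.

```python
from bisect import bisect_right


def interpolate_position(keyframes: list[tuple[int, float, float]],
                         frame: int) -> tuple[float, float]:
    """Linearly interpolate an (x, y) position between manual keyframes.

    keyframes: (frame_number, x, y) tuples with strictly increasing frame numbers.
    Frames before the first or after the last keyframe hold that keyframe's position.
    """
    if not keyframes:
        raise ValueError("need at least one keyframe")
    frames = [k[0] for k in keyframes]
    if frame <= frames[0]:
        return keyframes[0][1], keyframes[0][2]
    if frame >= frames[-1]:
        return keyframes[-1][1], keyframes[-1][2]
    # Index of the first keyframe strictly after `frame`.
    i = bisect_right(frames, frame)
    f0, x0, y0 = keyframes[i - 1]
    f1, x1, y1 = keyframes[i]
    t = (frame - f0) / (f1 - f0)  # 0.0 at the earlier keyframe, 1.0 at the later one
    return x0 + t * (x1 - x0), y0 + t * (y1 - y0)


# Hypothetical sign positions (in pixels) read off the footage at frames 0, 30, 60.
keys = [(0, 960, 540), (30, 1000, 520), (60, 1060, 530)]
print(interpolate_position(keys, 15))  # (980.0, 530.0)
```

Linear motion between keyframes is why hand tracking needs more keyframes wherever the camera accelerates or changes direction.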
Step 3: Use static graphic overlay for fixed-camera shots
If the AI clip has no camera motion and the text region is static, just slap a PNG on top:
# Create the text in Photoshop, Figma, or Affinity Designer
- Match the implied lighting of the scene
- Add slight noise / grain to match camera response
- Export as PNG with transparency
# Composite in any editor
- Place PNG on track above AI video
- Match position to the intended surface
- Add a subtle grain layer at 5-10% opacity over both to unify them
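Placing a transparent PNG over footage is the standard “over” composite, out = fg * alpha + bg * (1 - alpha), and a grain layer at 5-10% opacity is the same blend with a small alpha. The sketch below applies both to a single row of pixels in pure Python. Assumptions: straight (not premultiplied) alpha, uniform random gray noise, and Normal blend mode for the grain; editors often put grain in Overlay or Soft Light mode instead, which shifts brightness less. The function names are hypothetical.

```python
import random


def over(top: float, alpha: float, bottom: float) -> float:
    """'Over' composite of one channel value, using straight alpha in [0, 1]."""
    return top * alpha + bottom * (1.0 - alpha)


def composite_with_grain(fg_rgba, bg_rgb, grain_opacity=0.07, seed=0):
    """Composite a row of RGBA overlay pixels onto an RGB background row, then add grain.

    fg_rgba: list of (r, g, b, a) with channels 0-255 and a in [0, 1].
    bg_rgb:  list of (r, g, b) with channels 0-255.
    grain_opacity: opacity of the grain layer (0.05-0.10 per the guideline above).
    """
    rng = random.Random(seed)  # seeded so the grain is reproducible
    out = []
    for (r1, g1, b1, a), (r0, g0, b0) in zip(fg_rgba, bg_rgb):
        # 1. Place the text PNG over the AI footage.
        merged = [over(r1, a, r0), over(g1, a, g0), over(b1, a, b0)]
        # 2. One random gray grain value per pixel, Normal-blended at low opacity.
        noise = rng.uniform(0, 255)
        grained = [over(noise, grain_opacity, c) for c in merged]
        out.append(tuple(min(255, max(0, round(c))) for c in grained))
    return out


# An opaque white text pixel and a fully transparent pixel over dark footage.
row = composite_with_grain([(255, 255, 255, 1.0), (255, 255, 255, 0.0)],
                           [(40, 40, 40), (40, 40, 40)])
print(row)
```

Because the grain sits above both layers, the text and the footage pick up the same noise, which is what makes the overlay read as part of the shot.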
Step 4: Mask and patch garbled text with post tools
If you already have the AI clip with bad text and re-rendering is not an option:
# DaVinci Resolve Fusion
- Mask the garbled text region
- Patch Replacer node samples adjacent clean surface
- Composite new clean text on top via Text+ node
# Adobe After Effects
- Content-Aware Fill on the masked region
- Add layer of correct text on top
- Track with Mocha to match motion
# CapCut Pro
- Object Removal -> brush over garbled text
- Add new Text element with desired wording
- Position over the cleaned region
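To see what “sampling the adjacent clean surface” means at its most basic, the sketch below fills each masked run of pixels by blending linearly between the nearest clean pixels on the same row. It is a deliberately crude, hypothetical stand-in: Patch Replacer and Content-Aware Fill use far more sophisticated methods, and this version handles only a single grayscale frame. It behaves best on smooth surfaces such as plain wood or painted walls; on textured surfaces the result will look smeared.

```python
def patch_fill_rows(image: list[list[int]], mask: list[list[bool]]) -> list[list[int]]:
    """Fill masked pixels by interpolating between the nearest clean pixels on each row.

    image: rows of grayscale values; mask: same shape, True marks a pixel to replace.
    Returns a new image; the input is not modified.
    """
    out = [list(row) for row in image]
    for y, row in enumerate(mask):
        width = len(row)
        x = 0
        while x < width:
            if not row[x]:
                x += 1
                continue
            start = x                  # first masked pixel of this run
            while x < width and row[x]:
                x += 1
            end = x                    # one past the last masked pixel
            left = out[y][start - 1] if start > 0 else None
            right = out[y][end] if end < width else None
            if left is None and right is None:
                continue               # whole row masked: nothing clean to sample
            if left is None:
                left = right           # run touches the left edge: copy from the right
            if right is None:
                right = left           # run touches the right edge: copy from the left
            span = end - start + 1
            for i in range(start, end):
                t = (i - start + 1) / span
                out[y][i] = round(left + t * (right - left))
    return out


garbled = [[10, 99, 3, 40]]           # middle two pixels are bad text
mask = [[False, True, True, False]]
print(patch_fill_rows(garbled, mask))  # [[10, 20, 30, 40]]
```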
Step 5: Last resort — accept stylized illegibility
For purely decorative background signage (alley shots, distant billboards), illegible text reads as “atmospheric foreign language,” and audiences tend to accept it. Only do this when the text is decorative, never when it carries information.
# Decision rule:
- Hero text (logo, headline, dialogue card)? Fix it with Step 2 or 3.
- Background atmosphere text (neon signs, posters in distance)? Acceptable as-is.
- UI / button text (phone screens, computer monitors)? Must be added in post.
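The decision rule is simple enough to encode as a lookup, which can help when triaging a shot list with many text elements. This is a hypothetical sketch: `TextRole`, `text_plan`, and the `locked_off` flag (borrowed from the Step 3 condition of no camera motion and a static text region) are illustrative names, and hero text could also be regenerated with a text-capable model (Step 1) instead.

```python
from enum import Enum


class TextRole(Enum):
    HERO = "hero"              # logo, headline, dialogue card
    BACKGROUND = "background"  # neon signs, posters in the distance
    UI = "ui"                  # phone screens, computer monitors


def text_plan(role: TextRole, locked_off: bool) -> str:
    """Suggest a fix for one text element.

    locked_off: True if the clip has no camera motion and the text region is static.
    """
    if role is TextRole.BACKGROUND:
        return "accept as-is"
    # Hero and UI text both get added in post; camera motion decides which step.
    if locked_off:
        return "Step 3: static graphic overlay"
    return "Step 2: blank surface + tracked text"


for role, locked in [(TextRole.HERO, False), (TextRole.UI, True), (TextRole.BACKGROUND, False)]:
    print(role.value, "->", text_plan(role, locked))
```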
Prevention
- Plan video and graphics as separate layers from the start.
- Generate AI clips with blank surfaces specifically for text overlay.
- Build a library of branded text templates in your editor.
- Reserve Veo 3 / Sora for shots where in-frame text is unavoidable.
- For UI / app demo videos, use screen recordings or mockups composited in post rather than generating the screens.