The task
You’re shooting a 5-10 minute YouTube video this week. Writing the script eats 2-4 hours, and the part that matters most — the first 30 seconds — is the easiest to get wrong. A good AI workflow lands a first-draft script with a real hook, retention beats, B-roll cues, and a clear payoff in 15 minutes, leaving you time to actually edit and shoot.
This guide is for solo creators, educators, indie SaaS founders, and anyone using YouTube for top-of-funnel content.
When AI is the right tool
Use AI when you know your audience and your one-sentence promise, and you need structure plus first-pass language. Models are excellent at chunking content into act 1 / 2 / 3, suggesting B-roll, and writing cold opens with specific examples.
It also shines for repurposing: turn a 60-minute podcast into a 7-minute YouTube script by feeding the transcript and asking for the three sharpest beats.
When not to rely on AI alone
If you’re on camera, never read AI prose verbatim — it sounds like nobody. Rewrite every line in your voice. AI also doesn’t know your channel’s running jokes, your unique POV, or the inside info that makes a video feel singular.
For sensitive topics (mental health, finance, medical), don’t publish an AI-only script: have a human subject-matter expert review it first.
What to feed the AI
- Topic and the one specific angle you’re taking
- Audience: who, what they already know, what they don’t
- Promise: what the viewer can do or believe after watching
- Length: target minutes and target word count (~150 words/minute spoken)
- Two channels whose tone you like, two things that bore you
The “what bores you” list is underused and quietly important.
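The word-count item in the list above is simple arithmetic, sketched here in Python. The 150 words/minute figure is the guide's rough estimate; the function name `word_budget` is made up for illustration, and your own speaking pace may differ.

```python
WORDS_PER_MINUTE = 150  # rough speaking rate from the list above; measure your own


def word_budget(target_minutes: float, wpm: int = WORDS_PER_MINUTE) -> int:
    """Return the approximate spoken word count for a video of this length."""
    return round(target_minutes * wpm)


# Budgets for the 5-10 minute range this guide targets.
for minutes in (5, 7, 10):
    print(f"{minutes} min -> about {word_budget(minutes)} words")
```

If you time yourself reading a known passage, pass your measured pace as `wpm` instead of relying on the default.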
Copy-ready prompt
You are a YouTube scriptwriter. Write a {target_minutes}-minute script.
Topic: {topic}
Angle: {sharp_angle}
Audience: {who_they_are_and_what_they_know}
Promise: {what_they_can_do_after}
Tone references: {channel_1}, {channel_2}
Avoid: {what_bores_you}
Structure:
- 0:00-0:15 cold open: a concrete moment or surprising claim. No "in this video..."
- 0:15-0:30 hook: state the promise + a tease of the payoff.
- Act 1 (~25% of runtime): set up the problem with a specific example.
- Act 2 (~50%): the meat — 3 sub-points, each with one concrete demo or visual.
- Act 3 (~20%): payoff + one actionable next step.
- Final 0:30 outro: CTA, what to watch next.
Every 60-90 seconds, add a retention beat (pattern interrupt, callback, or question).
Mark B-roll cues inline as [B-ROLL: ...].
Mark on-screen text as [TEXT: ...].
Word count target: {target_minutes * 150}.
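If you reuse the prompt often, filling the placeholders in code avoids leaving one blank. The sketch below is one possible approach, not a required step: it uses a shortened stand-in for the prompt, renames two placeholders to shorter hypothetical names (`audience`, `promise`), and swaps the arithmetic placeholder for a precomputed `word_count`, because Python's `str.format` does not evaluate expressions. All field values are made-up examples.

```python
# Shortened stand-in for the prompt above; paste the full text in practice.
PROMPT_TEMPLATE = """You are a YouTube scriptwriter. Write a {target_minutes}-minute script.
Topic: {topic}
Angle: {sharp_angle}
Audience: {audience}
Promise: {promise}
Tone references: {channel_1}, {channel_2}
Avoid: {what_bores_you}
Word count target: {word_count}."""


def build_prompt(target_minutes: int, wpm: int = 150, **fields: str) -> str:
    """Fill the template; a missing field raises KeyError instead of passing silently."""
    return PROMPT_TEMPLATE.format(
        target_minutes=target_minutes,
        word_count=target_minutes * wpm,  # ~150 spoken words per minute
        **fields,
    )


# Example values are invented for illustration only.
prompt = build_prompt(
    7,
    topic="Home espresso on a budget",
    sharp_angle="The grinder matters more than the machine",
    audience="Beginners who own a basic machine",
    promise="Pull a balanced shot with the gear they already have",
    channel_1="Channel A",
    channel_2="Channel B",
    what_bores_you="Long gear lists, generic intros",
)
print(prompt)
```

Failing loudly on a missing field is deliberate: a prompt with an empty "Avoid" or "Promise" line tends to produce exactly the generic output you are trying to prevent.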
Recommended output structure
A timed script with timestamp markers, B-roll cues, and on-screen text directions inline. Keep paragraphs short (1-3 sentences) so you can read them off a teleprompter without losing your place.
End with the CTA and “next video” suggestion clearly labeled so your editor can find them.
How to check the output
Read it aloud with a timer. If you’re off target by more than 20%, cut or expand. Check the first 30 seconds specifically: does the opening line work without context? Would you keep watching at second 15?
Then count retention beats: there should be one every ~75 seconds. If they’re all questions or all callbacks, vary them.
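Before the read-aloud pass, a quick script can give a rough first estimate. The sketch below assumes the draft uses the `[B-ROLL: ...]` and `[TEXT: ...]` markers from the prompt and a pace of about 150 words per minute; the function name and report keys are hypothetical. It cannot detect retention beats, since the prompt does not mark them, so it only reports how many to expect at one per ~75 seconds.

```python
import re

# Matches the inline cue markers requested in the prompt.
CUE_PATTERN = re.compile(r"\[(B-ROLL|TEXT):[^\]]*\]")


def check_script(script: str, target_minutes: float, wpm: int = 150) -> dict:
    """Estimate runtime from word count and tally inline cues."""
    cues = [m.group(1) for m in CUE_PATTERN.finditer(script)]
    spoken = CUE_PATTERN.sub(" ", script)  # cues are not read aloud
    # Timestamp markers still count as words, so this slightly overestimates.
    words = len(spoken.split())
    est_minutes = words / wpm
    deviation = (est_minutes - target_minutes) / target_minutes
    return {
        "spoken_words": words,
        "estimated_minutes": round(est_minutes, 2),
        "off_target": abs(deviation) > 0.20,  # the 20% rule above
        "broll_cues": cues.count("B-ROLL"),
        "text_cues": cues.count("TEXT"),
        "expected_retention_beats": round(est_minutes * 60 / 75),
    }


# Synthetic draft: 1050 filler words plus one cue of each kind.
draft = "word " * 1050 + "[B-ROLL: close-up of the grinder] [TEXT: Step 1]"
print(check_script(draft, target_minutes=7))
```

Treat the estimate as a filter, not a verdict: the timed read-aloud is still the real test, because pauses, demos, and B-roll stretches change the runtime.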
Common mistakes
- “In this video, I’m going to tell you…” openings (kills retention)
- No retention beats, so viewers tend to drop off around the 60s and 180s marks
- B-roll cues missing, so editing takes 3x longer
- A promise the video doesn’t actually deliver
- AI prose read verbatim, which sounds robotic on camera
Next steps to keep improving
Track 30-second retention in YouTube Studio. After each video, paste your script back into the model along with the retention curve and ask “where did retention dip, and why?” Use the feedback for the next script.
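To point the model at the right moments, it can help to find the steepest dips yourself first. The sketch below assumes you have copied the curve out by hand as `(seconds, percent_still_watching)` pairs; it does not call any YouTube API, the function name is hypothetical, and the sample numbers are invented for illustration.

```python
def steepest_drops(curve, top_n=3):
    """Return the largest drops between consecutive samples as (start_s, end_s, points_lost).

    curve: list of (seconds, percent_still_watching) tuples, sorted by time.
    """
    drops = []
    for (t0, p0), (t1, p1) in zip(curve, curve[1:]):
        drops.append((p0 - p1, t0, t1))
    drops.sort(reverse=True)  # biggest loss first
    return [(t0, t1, loss) for loss, t0, t1 in drops[:top_n] if loss > 0]


# Invented sample curve, not real channel data.
sample_curve = [(0, 100), (15, 82), (30, 74), (60, 70), (90, 61), (120, 59)]
for start, end, loss in steepest_drops(sample_curve, top_n=2):
    print(f"{start}s-{end}s: lost {loss} points")
```

Match each flagged interval to the script lines spoken during it, then include those lines in your follow-up question to the model.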
Practical depth notes
For a retention-focused YouTube script, the difference between a usable AI result and a generic one is the input packet. Give the model the audience, the current draft or raw material (such as a podcast transcript), the desired format, the decision you need to make, and two examples: one of output you consider good and one you consider bad. Ask it to preserve facts first, then improve structure or wording second.
After the first response, do a separate review pass. Look for missing constraints, invented details, weak calls to action, and language that sounds plausible but does not match the real situation. The final output should be usable immediately: a clear owner, a clear next step, and no hidden assumptions that someone else has to untangle.
One final check: write a single sentence comparing the finished script against the original goal. If that sentence is hard to write, the output is probably polished but unfocused. Tighten the goal, remove decorative language, and rerun only the weak section instead of regenerating the entire piece.
FAQ
- Can AI write the thumbnail copy too? Yes — ask for 5 thumbnail text options and pick the one that contradicts an expectation or promises a concrete payoff.
- How do I keep my voice? Rewrite the cold open and outro yourself. Those carry the most personality.
- What about Shorts? Different format, different beats. See the Shorts prompts under Related below.
Related
For 60-second formats, use YouTube Shorts script prompts and the broader short video script with AI workflow. For Reels and TikTok openings, swap in reel hook prompts.