Auto-generated tests often test implementation details and pad coverage. These prompts target behavior-based tests.
Who this is for
Engineers adding tests before a refactor, contributors trying to land a PR with required coverage, indie devs needing a safety net before launch, anyone who inherited untested code.
When not to use these prompts
Don’t generate tests for code you don’t plan to keep; write the spec first instead. Avoid these prompts for throwaway scripts and trivial getters/setters, where the cost of the test dwarfs the cost of the bug it would catch.
Prompt anatomy / structure formula
A test-generation prompt should always carry six elements:
- Subject under test: the function / module / endpoint with its signature and types.
- Test taxonomy: unit / integration / e2e / property — never “write some tests”.
- Behavior contract: what the code MUST do, not what it currently does (avoids change-detector tests).
- Coverage scope: happy path + N edge cases + 1 invalid input — exact counts force completeness.
- Framework + style: vitest/pytest/go test, plus mocks-vs-real expectations.
- Output shape: runnable code only, no explanatory prose unless asked.
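One way to keep all six elements present is to assemble the prompt from named fields instead of free text. The sketch below is illustrative only: `PromptSpec`, its field names, and the `slugify` subject are hypothetical and not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """The six elements of a test-generation prompt (hypothetical field names)."""
    subject: str      # function / module / endpoint with signature and types
    taxonomy: str     # unit / integration / e2e / property
    contract: str     # what the code MUST do, not what it currently does
    edge_cases: int   # N in "happy path + N edge cases + 1 invalid input"
    framework: str    # framework plus mocks-vs-real expectations
    output_shape: str = "runnable code only, no explanatory prose"

    def render(self) -> str:
        """Return the prompt as six labelled lines, one per element."""
        if not self.contract.strip():
            raise ValueError("behavior contract is required")
        return "\n".join([
            f"Subject under test: {self.subject}",
            f"Test type: {self.taxonomy}",
            f"Behavior contract: {self.contract}",
            f"Coverage: happy path + {self.edge_cases} edge cases + 1 invalid input",
            f"Framework and style: {self.framework}",
            f"Output: {self.output_shape}",
        ])


# Hypothetical example subject; swap in your own function and contract.
prompt = PromptSpec(
    subject="def slugify(title: str) -> str",
    taxonomy="unit",
    contract="returns a lowercase, hyphen-separated, URL-safe slug",
    edge_cases=3,
    framework="pytest, no mocks",
)
print(prompt.render())
```

Raising on an empty contract is a deliberate choice here: the contract is the element most often left out, and the one that prevents change-detector tests.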
Best for
- New feature tests
- Regression tests
- Edge-case coverage
- Pre-refactor safety net
- Tightening a flaky test suite
13 copy-ready prompt templates
1. Behavior-first unit tests
For {function}, write unit tests by observable behavior, not internal state. Cover: happy path, 3 edge cases, 1 invalid input. Use {test framework}.
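For orientation, this is roughly what a good answer to the prompt above might look like. The subject `unique_sorted` is a hypothetical function invented for this sketch, and the tests use plain `assert` in pytest-style functions so they run with or without pytest installed.

```python
def unique_sorted(items):
    """Hypothetical subject under test: return the unique strings, sorted."""
    if any(not isinstance(item, str) for item in items):
        raise TypeError("all items must be strings")
    return sorted(set(items))


# Happy path
def test_returns_sorted_unique_strings():
    assert unique_sorted(["pear", "apple", "pear"]) == ["apple", "pear"]


# Edge case 1: empty input
def test_empty_input_returns_empty_list():
    assert unique_sorted([]) == []


# Edge case 2: single item
def test_single_item_is_returned_as_is():
    assert unique_sorted(["kiwi"]) == ["kiwi"]


# Edge case 3: every item is a duplicate
def test_all_duplicates_collapse_to_one():
    assert unique_sorted(["x", "x", "x"]) == ["x"]


# Invalid input
def test_non_string_item_is_rejected():
    try:
        unique_sorted(["a", 1])
    except TypeError:
        return
    raise AssertionError("expected TypeError")
```

None of the tests look at how the function deduplicates or sorts, so swapping `set` for another approach would leave them all green.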
2. Regression test from a bug report
Bug: {description}. Failing repro: {steps}. Write 1 minimal failing test that captures this. The test must fail before the fix and pass after it.
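The fail-before, pass-after requirement can be checked mechanically. A minimal sketch, assuming a hypothetical bug where a price parser stops at the thousands separator; both `parse_price_*` functions are invented stand-ins for the code before and after the fix.

```python
def parse_price_buggy(text: str) -> float:
    """Hypothetical pre-fix code: keeps only the part before the first comma."""
    return float(text.split(",")[0])


def parse_price_fixed(text: str) -> float:
    """Hypothetical post-fix code: drops thousands separators before parsing."""
    return float(text.replace(",", ""))


def check_thousands_separator(parse_price) -> None:
    """The regression test: one input from the bug report, one expected result."""
    assert parse_price("1,299.00") == 1299.0


def fails(test, subject) -> bool:
    """Return True if the test raises AssertionError for this subject."""
    try:
        test(subject)
    except AssertionError:
        return True
    return False


print(fails(check_thousands_separator, parse_price_buggy))  # True
print(fails(check_thousands_separator, parse_price_fixed))  # False
```

In a real repository the two versions would be two commits, and the same check becomes: run the new test on the pre-fix commit, confirm it fails, then run it on the fix.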
3. Property-based test ideas
For {function}, identify 3 properties that should always hold (e.g., "output sorted regardless of input order"). Write property-based test stubs.
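A sketch of what three such properties can look like in code. To stay dependency-free it uses a hand-rolled loop over random inputs from the standard library rather than a property-testing library such as Hypothesis; `top_n` is a hypothetical subject invented for the example.

```python
import random


def top_n(scores: list[int], n: int) -> list[int]:
    """Hypothetical subject under test: the n largest scores, highest first."""
    return sorted(scores, reverse=True)[:n]


def check_properties(trials: int = 200, seed: int = 0) -> int:
    """Check three properties on random inputs; return the number of trials run."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        scores = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        n = rng.randint(0, 25)
        result = top_n(scores, n)

        # Property 1: never more items than requested, never more than exist.
        assert len(result) == min(n, len(scores))

        # Property 2: output is ordered highest first.
        assert all(a >= b for a, b in zip(result, result[1:]))

        # Property 3: input order does not affect the output.
        shuffled = scores[:]
        rng.shuffle(shuffled)
        assert top_n(shuffled, n) == result
    return trials


check_properties()
```

A dedicated library adds input shrinking and smarter generation, but the properties themselves carry over unchanged.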
4. Boundary-input tests
For {function with type info}, generate tests for boundary inputs: empty, single, max, very long, special chars, unicode, negative. Mark which currently fail.
5. Integration tests for a flow
Below is a flow involving {N components}. Write 3 integration tests covering: golden path, one failure injected per step, recovery.
{paste flow}
6. Mock vs real strategy
For {feature}, advise: which dependencies to mock, which to keep real. Justify each choice based on stability + speed trade-off.
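One possible outcome of this prompt, sketched under assumptions: the email client is external, so it is replaced with `unittest.mock.Mock`, while the in-process formatter stays real. `send_receipt`, `format_total`, and the `mailer.send(to=..., body=...)` interface are hypothetical names made up for the example.

```python
from unittest.mock import Mock


def format_total(cents: int) -> str:
    """In-process pure function: fast and stable, so the test keeps it real."""
    return f"${cents // 100}.{cents % 100:02d}"


def send_receipt(email: str, cents: int, mailer) -> None:
    """Hypothetical feature: format a total and hand it to an email client."""
    mailer.send(to=email, body=f"Total: {format_total(cents)}")


def test_receipt_contains_formatted_total():
    mailer = Mock()  # external dependency (email): mocked
    send_receipt("a@example.com", 1299, mailer)
    # The assertion covers the real formatter's output via the mocked boundary.
    mailer.send.assert_called_once_with(to="a@example.com", body="Total: $12.99")


test_receipt_contains_formatted_total()
```

Mocking `format_total` as well would make the test pass even if the formatting were wrong, which is the "mocking the thing you're testing" trap.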
7. Snapshot test critique
Below are existing snapshot tests. For each: is it useful or noise? Suggest replacements or deletions.
{paste}
8. Flaky test diagnosis
Test {name} is flaky. Likely causes (in priority order): network, timing, shared state, randomness, ordering. Read the test + tested code; identify the most likely cause and the fix.
{paste}
9. Tests for an API endpoint
For the endpoint `{METHOD /path}` (handler pasted below), write integration tests in {framework} covering: (1) happy path with valid auth, (2) 401 unauth, (3) 403 wrong role, (4) 400 invalid body — name the specific invalid field, (5) 404 resource missing, (6) idempotency under retry (same key, second call). Tests must seed/teardown DB state per case.
{paste handler + schema}
Variables to swap: framework (supertest+vitest, pytest+httpx, etc.)
Optimization: If you have OpenAPI / Zod schemas, paste them too — AI will derive invalid-input cases automatically.
10. React / UI component tests
For the React component below, write tests in {framework, e.g. RTL + vitest}. Cover: (1) renders given a typical prop set, (2) calls the correct callback on user interaction, (3) handles loading state, (4) handles error state, (5) accessibility — focusable, labelled, role-correct. Test by what the USER sees, not by component internals.
{paste component}
11. Test-pyramid balancer
Run when your test suite feels heavy but useless.
Below is my test directory layout + a list of test files (paste). Analyze the test pyramid: ratio of unit / integration / e2e. Identify (1) where the pyramid is inverted (too many e2e), (2) tests that should be pushed down (e2e → integration → unit), (3) duplicate coverage between layers, (4) 5 specific test moves that would cut CI time by ≥30% without losing safety.
{paste tree + sample tests}
12. Coverage-gap finder (no tools)
Don't run coverage tools. Below is the module under test + its current test file. Qualitatively identify: (1) the 3 branches with no test coverage, (2) any error path that isn't exercised, (3) any data shape used in production but not in tests, (4) the 5 highest-ROI tests I should add this week, ranked by likelihood-of-bug × user-impact.
Module: {paste}
Tests: {paste}
13. Test naming & structure cleanup
Below are 20 of my test names + bodies. Rewrite for readability: (1) name format `describe<unit>().it<scenario>().expect<outcome>`, (2) remove "tests" / "should" prefixes that add no info, (3) collapse duplicated setup into a `beforeEach`, (4) flag any 2 tests that are testing the same thing and suggest a merge.
{paste tests}
Common mistakes
- Tests that mirror implementation step-by-step
- “100% coverage” without testing behavior
- Flaky tests left in main
- Generating tests AFTER fixing a bug instead of BEFORE — you lose the regression guarantee
- Mocking the thing you’re supposed to be testing
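The first mistake is easiest to see next to a refactor. In this sketch, `WordCounterV1` and `WordCounterV2` are hypothetical classes with identical public behavior and different internals; the brittle test inspects a private attribute, the behavior test does not.

```python
from collections import Counter


class WordCounterV1:
    """Hypothetical original implementation: plain dict in `_counts`."""
    def __init__(self):
        self._counts = {}

    def add(self, word: str) -> None:
        self._counts[word] = self._counts.get(word, 0) + 1

    def most_common(self) -> str:
        return max(self._counts, key=self._counts.get)


class WordCounterV2:
    """Hypothetical refactor: same behavior, internals moved to a Counter."""
    def __init__(self):
        self._tally = Counter()

    def add(self, word: str) -> None:
        self._tally[word] += 1

    def most_common(self) -> str:
        return self._tally.most_common(1)[0][0]


def brittle_test(cls) -> None:
    counter = cls()
    counter.add("a")
    assert counter._counts == {"a": 1}  # mirrors the implementation


def behavior_test(cls) -> None:
    counter = cls()
    for word in ["a", "b", "a"]:
        counter.add(word)
    assert counter.most_common() == "a"  # observable behavior only


def passes(test, cls) -> bool:
    """Return True if the test completes without any exception."""
    try:
        test(cls)
    except Exception:
        return False
    return True


print(passes(brittle_test, WordCounterV1), passes(brittle_test, WordCounterV2))
print(passes(behavior_test, WordCounterV1), passes(behavior_test, WordCounterV2))
```

The brittle test breaks on the refactor even though no user-visible behavior changed; the behavior test passes on both versions.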
How to push results further
- Always state the framework and assertion library — otherwise AI mixes Jest with Mocha syntax.
- For each test ask “what bug would this catch?” — if you can’t answer, delete the test.
- Generate the failing test before the fix. The test must fail on `main` and pass on your branch; only then is it a real regression test.
- Prefer table-driven test generation prompts (one row = one case). Easier to extend, lower duplication.
- For Claude Code / Cursor, let the agent `Read` neighboring tests so style matches; pasted-style tests look foreign in PR review.
- Cap test count per prompt at 8-10; beyond that, AI starts duplicating semantics with renamed variables.
- Run the generated tests in a scratch branch first; auto-applying broken tests poisons CI for the team.
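The table-driven shape mentioned in the list above can be sketched as follows. `truncate` is a hypothetical subject and the rows are illustrative boundary inputs; adding a case means adding one row, not one function.

```python
def truncate(text: str, limit: int) -> str:
    """Hypothetical subject: cut text to at most `limit` chars, '...' when cut."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    if len(text) <= limit:
        return text
    if limit <= 3:
        return text[:limit]  # no room for an ellipsis
    return text[: limit - 3] + "..."


CASES = [
    # (name, text, limit, expected)
    ("empty", "", 5, ""),
    ("single char", "a", 5, "a"),
    ("exactly at limit", "hello", 5, "hello"),
    ("one over limit", "hello!", 5, "he..."),
    ("unicode", "日本語のテキスト", 5, "日本..."),
    ("zero limit", "hello", 0, ""),
]


def run_cases() -> list[str]:
    """Run every row and return a readable message per failing case."""
    failures = []
    for name, text, limit, expected in CASES:
        actual = truncate(text, limit)
        if actual != expected:
            failures.append(f"{name}: expected {expected!r}, got {actual!r}")
    return failures


print(run_cases())
```

Collecting failures instead of stopping at the first one also gives you the "which cases currently fail" list that template 4 asks for. With pytest, the same table would typically feed `pytest.mark.parametrize`.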
FAQ
- How do I avoid change-detector tests?: Always describe the BEHAVIOR contract in the prompt (“function returns sorted unique strings”), not the implementation. The test should still pass after a refactor.
- What test framework should I tell AI to use?: Whatever your project already uses. Mixing frameworks adds cognitive load and breaks CI runners. Look at one existing test file and copy its imports into the prompt.
- Should AI mock or use real dependencies?: Mock external (HTTP, payments, email). Use real for in-process (parsers, formatters, pure functions). Tell AI this rule in the prompt — defaults vary.
- Can AI catch flaky tests?: Sometimes. Template 8 (flaky diagnosis) works when you paste the test + tested code together. Pure-from-name flake hunting is mostly guessing.
- How many tests per function?: Happy path + 3 edge cases + 1 invalid input is a solid baseline. Add more only when a known bug taught you a missing case.
- My AI-generated tests pass even when I break the code. Why?: AI tested the AI’s mental model, not your code. Always mutate one line and re-run — if tests still pass, the tests are noise. Use template 12 to find gaps.
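The "mutate one line and re-run" check from the last answer can be sketched by hand. Everything here is hypothetical: `is_adult` is the subject, `is_adult_mutant` is the same function with one operator changed, and the two suites stand in for a weak and a stronger generated test file.

```python
def is_adult(age: int) -> bool:
    """Hypothetical subject under test."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """The same function with one operator mutated: >= became >."""
    return age > 18


def weak_suite(fn) -> None:
    """Only checks values far from the boundary."""
    assert fn(30) is True
    assert fn(5) is False


def strong_suite(fn) -> None:
    """Also pins the boundary on both sides."""
    assert fn(30) is True
    assert fn(5) is False
    assert fn(18) is True
    assert fn(17) is False


def survives(suite, mutant) -> bool:
    """Return True if the suite still passes against the mutated code."""
    try:
        suite(mutant)
    except AssertionError:
        return False
    return True


print(survives(weak_suite, is_adult_mutant))    # True: the suite is noise here
print(survives(strong_suite, is_adult_mutant))  # False: the mutant is caught
```

A surviving mutant points at a missing case, here the boundary value 18. Mutation-testing tools automate the same loop across a whole codebase.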