Test Generation Prompts (Integration / E2E / Snapshot): 13 Templates

13 prompts for integration, E2E, snapshot, and contract tests — for unit-test prompts specifically, see the unit-test article. Tests that catch real bugs, not noise.

Auto-generated tests often test implementation details and pad coverage. These prompts target behavior-based tests.

Who this is for

Engineers adding tests before a refactor, contributors trying to land a PR with required coverage, indie devs needing a safety net before launch, and anyone who inherited untested code.

When not to use these prompts

Don’t generate tests for code you don’t plan to keep; write the spec first instead. Skip these prompts for throwaway scripts and trivial getters/setters too: the cost of writing and maintaining the test dwarfs the cost of the bug it would catch.

Prompt anatomy / structure formula

A test-generation prompt should always carry six elements:

  • Subject under test: the function / module / endpoint with its signature and types.
  • Test taxonomy: unit / integration / e2e / property — never “write some tests”.
  • Behavior contract: what the code MUST do, not what it currently does (avoids change-detector tests).
  • Coverage scope: happy path + N edge cases + 1 invalid input — exact counts force completeness.
  • Framework + style: vitest / pytest / go test, plus mocks-vs-real expectations.
  • Output shape: runnable code only, no explanatory prose unless asked.
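One way to keep all six elements in every prompt is to assemble the prompt from a small structure rather than typing it freehand. The sketch below is illustrative only: `PromptSpec`, `build_test_prompt`, and the field names are hypothetical, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """The six elements of a test-generation prompt (field names are illustrative)."""
    subject: str      # function / module / endpoint, with signature and types
    taxonomy: str     # "unit", "integration", "e2e", or "property"
    contract: str     # what the code MUST do, not what it currently does
    edge_cases: int   # exact number of edge cases to cover
    framework: str    # e.g. "pytest", plus mocks-vs-real expectations
    output_shape: str = "runnable code only, no explanatory prose"


ALLOWED_TAXONOMIES = {"unit", "integration", "e2e", "property"}


def build_test_prompt(spec: PromptSpec) -> str:
    """Render the spec as a prompt; reject a vague or unknown test type."""
    if spec.taxonomy not in ALLOWED_TAXONOMIES:
        raise ValueError(f"unknown test taxonomy: {spec.taxonomy!r}")
    return "\n".join([
        f"Subject under test: {spec.subject}",
        f"Test type: {spec.taxonomy}",
        f"Behavior contract: {spec.contract}",
        f"Coverage: happy path + {spec.edge_cases} edge cases + 1 invalid input",
        f"Framework and style: {spec.framework}",
        f"Output: {spec.output_shape}",
    ])
```

Rejecting anything outside the four test types is a deliberate choice here: it makes "write some tests" impossible to send by accident.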

Best for

  • New feature tests
  • Regression tests
  • Edge-case coverage
  • Pre-refactor safety net
  • Tightening a flaky test suite

13 copy-ready prompt templates

1. Behavior-first unit tests

For {function}, write unit tests by observable behavior, not internal state. Cover: happy path, 3 edge cases, 1 invalid input. Use {test framework}.

2. Regression test from a bug report

Bug: {description}. Failing repro: {steps}. Write 1 minimal failing test that captures this bug. The test must fail before the fix and pass after it.
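As a rough illustration of what a good answer to this prompt looks like, here is a sketch built around an invented bug report ("`1,299.00` is parsed as 1"). Both `parse_price_*` functions are hypothetical stand-ins for the code before and after the fix; the point is that the same single assertion fails on one and passes on the other.

```python
from decimal import Decimal


def parse_price_buggy(text: str) -> Decimal:
    """Hypothetical pre-fix code: stops at the thousands separator."""
    return Decimal(text.split(",")[0])


def parse_price_fixed(text: str) -> Decimal:
    """Hypothetical post-fix code: strips thousands separators before parsing."""
    return Decimal(text.replace(",", ""))


def check_thousands_separator(parse_price) -> None:
    """Minimal regression test: one input, one assertion, taken from the bug report."""
    assert parse_price("1,299.00") == Decimal("1299.00")
```

In a real repository there is only one `parse_price`; you would run the test on main (expect red), then on the fix branch (expect green).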

3. Property-based test ideas

For {function}, identify 3 properties that should always hold (e.g., "output sorted regardless of input order"). Write property-based test stubs.
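For orientation, the kind of stub this prompt should produce can be sketched with the standard library alone. This is a hand-rolled random check, not a property-testing library such as Hypothesis, and `sorted_unique` is a hypothetical function under test.

```python
import random


def sorted_unique(items: list[int]) -> list[int]:
    """Hypothetical function under test: sorted, de-duplicated copy of the input."""
    return sorted(set(items))


def check_properties(fn, runs: int = 200, seed: int = 0) -> None:
    """Check three properties on randomly generated inputs (seeded for reproducibility)."""
    rng = random.Random(seed)
    for _ in range(runs):
        items = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        out = fn(items)
        # Property 1: output is strictly increasing (sorted, no duplicates).
        assert all(a < b for a, b in zip(out, out[1:])), items
        # Property 2: no elements are invented or lost.
        assert set(out) == set(items), items
        # Property 3: input order does not change the result.
        shuffled = items[:]
        rng.shuffle(shuffled)
        assert fn(shuffled) == out, items
```

Each assertion carries the generated input as its message, so a failure reports the case that broke the property.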

4. Boundary-input tests

For {function with type info}, generate tests for boundary inputs: empty, single, max, very long, special chars, unicode, negative. Mark which currently fail.
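A table with one row per boundary case is a convenient shape to ask for, because the "mark which currently fail" step becomes a simple loop. The sketch below assumes a hypothetical `initials` function and a hypothetical `run_boundary_cases` helper; neither comes from a real library.

```python
def initials(name: str) -> str:
    """Hypothetical function under test: upper-case first letter of each word."""
    return "".join(word[0].upper() for word in name.split())


# One row per boundary case: (label, input, expected output).
BOUNDARY_CASES = [
    ("empty", "", ""),
    ("whitespace only", "   ", ""),
    ("single word", "ada", "A"),
    ("unicode", "émile zola", "ÉZ"),
    ("special chars", "o'neil-smith jr.", "OJ"),
    ("very long", "a " * 10_000, "A" * 10_000),
]


def run_boundary_cases(fn) -> list[str]:
    """Return the labels of failing cases instead of stopping at the first one."""
    failures = []
    for label, given, expected in BOUNDARY_CASES:
        try:
            ok = fn(given) == expected
        except Exception:
            ok = False  # a crash on a boundary input counts as a failure
        if not ok:
            failures.append(label)
    return failures
```

Adding a case is one new row, which keeps duplication low as the table grows.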

5. Integration tests for a flow

Below is a flow involving {N components}. Write 3 integration tests covering: golden path, one failure injected per step, recovery.

{paste flow}

6. Mock vs real strategy

For {feature}, advise: which dependencies to mock, which to keep real. Justify each choice based on stability + speed trade-off.
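To make the trade-off concrete, here is a minimal sketch of one common split: mock the external HTTP call, keep the in-process parser real. The `convert` feature, the `http_get` parameter, and the `rates.example` URL are all invented for illustration; only `unittest.mock.Mock` is a real API.

```python
import json
from unittest.mock import Mock


def parse_rate(payload: str) -> float:
    """In-process pure function: fast and stable, so it stays real in tests."""
    return float(json.loads(payload)["rate"])


def convert(amount: float, currency: str, http_get) -> float:
    """Hypothetical feature: fetch a rate over HTTP, then convert the amount."""
    body = http_get(f"https://rates.example/latest?to={currency}")
    return round(amount * parse_rate(body), 2)


def test_convert_uses_fetched_rate() -> None:
    # Mock only the external boundary (the HTTP call); the parser runs for real.
    fake_get = Mock(return_value='{"rate": 0.5}')
    assert convert(10.0, "EUR", fake_get) == 5.0
    fake_get.assert_called_once_with("https://rates.example/latest?to=EUR")
```

Because `parse_rate` is not mocked, a bug in the parser still fails this test, which is the reason for keeping it real.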

7. Snapshot test critique

Below are existing snapshot tests. For each: is it useful or noise? Suggest replacements or deletions.

{paste}

8. Flaky test diagnosis

Test {name} is flaky. Likely causes (in priority order): network, timing, shared state, randomness, ordering. Read the test + tested code; identify the most likely cause and the fix.

{paste}
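When the diagnosis comes back as "timing", the usual fix is to stop reading the real clock inside the test. The sketch below shows that pattern with an injected clock; `Token` and `FakeClock` are hypothetical names, and the approach is one option among several, not the only fix.

```python
import time
from typing import Callable


class Token:
    """Hypothetical class under test: a token that expires after ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + ttl

    def is_expired(self) -> bool:
        return self._clock() >= self._expires_at


class FakeClock:
    """Manually advanced clock, so the test never sleeps or races the scheduler."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def test_token_expires_after_ttl() -> None:
    clock = FakeClock()
    token = Token(ttl=30.0, clock=clock)
    clock.advance(29.0)
    assert not token.is_expired()
    clock.advance(1.0)
    assert token.is_expired()
```

Production code keeps the default `time.monotonic`; only the test swaps in the fake, so the result no longer depends on how fast the CI machine runs.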

9. Tests for an API endpoint

For the endpoint `{METHOD /path}` (handler pasted below), write integration tests in {framework} covering: (1) happy path with valid auth, (2) 401 unauth, (3) 403 wrong role, (4) 400 invalid body — name the specific invalid field, (5) 404 resource missing, (6) idempotency under retry (same key, second call). Tests must seed/teardown DB state per case.

{paste handler + schema}

Variables to swap: framework (supertest+vitest, pytest+httpx, etc.)

Optimization: If you have OpenAPI / Zod schemas, paste them too. The AI can then derive invalid-input cases from the schema instead of guessing them.

10. React / UI component tests

For the React component below, write tests in {framework, e.g. RTL + vitest}. Cover: (1) renders given a typical prop set, (2) calls the correct callback on user interaction, (3) handles loading state, (4) handles error state, (5) accessibility — focusable, labelled, role-correct. Test by what the USER sees, not by component internals.

{paste component}

11. Test-pyramid balancer

Run when your test suite feels heavy but useless.

Below is my test directory layout + a list of test files (paste). Analyze the test pyramid: ratio of unit / integration / e2e. Identify (1) where the pyramid is inverted (too many e2e), (2) tests that should be pushed down (e2e → integration → unit), (3) duplicate coverage between layers, (4) 5 specific test moves that would cut CI time by ≥30% without losing safety.

{paste tree + sample tests}

12. Coverage-gap finder (no tools)

Don't run coverage tools. Below is the module under test + its current test file. Qualitatively identify: (1) the 3 branches with no test coverage, (2) any error path that isn't exercised, (3) any data shape used in production but not in tests, (4) the 5 highest-ROI tests I should add this week, ranked by likelihood-of-bug × user-impact.

Module: {paste}
Tests: {paste}

13. Test naming & structure cleanup

Below are 20 of my test names + bodies. Rewrite for readability: (1) name format `describe<unit>().it<scenario>().expect<outcome>`, (2) remove "tests" / "should" prefixes that add no info, (3) collapse duplicated setup into a `beforeEach`, (4) flag any 2 tests that are testing the same thing and suggest a merge.

{paste tests}

Common mistakes

  • Tests that mirror implementation step-by-step
  • “100% coverage” without testing behavior
  • Flaky tests left in main
  • Generating tests AFTER fixing a bug instead of BEFORE — you lose the regression guarantee
  • Mocking the thing you’re supposed to be testing

How to push results further

  • Always state the framework and assertion library — otherwise AI mixes Jest with Mocha syntax.
  • For each test ask “what bug would this catch?” — if you can’t answer, delete the test.
  • Generate the failing test before the fix. The test must fail on main and pass on your branch — only then is it a real regression test.
  • Prefer table-driven test generation prompts (one row = one case). Easier to extend, lower duplication.
  • For Claude Code / Cursor, let the agent read neighboring tests so the style matches; tests written in a different style look foreign in PR review.
  • Cap test count per prompt at 8-10; beyond that, the AI tends to repeat the same case with renamed variables.
  • Run the generated tests in a scratch branch first; auto-applying broken tests poisons CI for the team.

FAQ

  • How do I avoid change-detector tests? Always describe the BEHAVIOR contract in the prompt (“function returns sorted unique strings”), not the implementation. The test should still pass after a refactor.
  • What test framework should I tell AI to use? Whatever your project already uses. Mixing frameworks adds cognitive load and breaks CI runners. Look at one existing test file and copy its imports into the prompt.
  • Should AI mock or use real dependencies? Mock external dependencies (HTTP, payments, email). Use real ones for in-process code (parsers, formatters, pure functions). State this rule in the prompt, because defaults vary.
  • Can AI catch flaky tests? Sometimes. Template 8 (flaky diagnosis) works when you paste the test and the tested code together. Hunting for flakes from the test name alone is mostly guessing.
  • How many tests per function? Happy path + 3 edge cases + 1 invalid input is a solid baseline. Add more only when a known bug taught you a missing case.
  • My AI-generated tests pass even when I break the code. Why? The AI tested its own mental model of the code, not your code. Always mutate one line and re-run; if the tests still pass, they are noise. Use template 12 to find gaps.
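The "mutate one line and re-run" check from the last answer can be sketched in a few lines. Everything here is hypothetical (`clamp`, the hand-written mutant, the two suites, and the `kills` helper); it is a manual version of the idea, not a mutation-testing tool.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical function under test."""
    return max(low, min(value, high))


def clamp_mutant(value: int, low: int, high: int) -> int:
    """Hand-made mutant: the upper bound is no longer enforced."""
    return max(low, value)


def weak_suite(fn) -> None:
    # Only exercises an in-range value, so it cannot tell the mutant apart.
    assert fn(5, 0, 10) == 5


def strong_suite(fn) -> None:
    assert fn(5, 0, 10) == 5
    assert fn(-3, 0, 10) == 0
    assert fn(99, 0, 10) == 10


def kills(suite, mutant) -> bool:
    """True if the suite fails on the mutant, i.e. the tests detect the breakage."""
    try:
        suite(mutant)
    except AssertionError:
        return True
    return False
```

Here `kills(weak_suite, clamp_mutant)` is `False` and `kills(strong_suite, clamp_mutant)` is `True`: the weak suite passes on broken code, which is exactly the symptom described above.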

Tags: #Prompt #AI coding #Test generation