Prompt Library

Test Generation Prompts (Integration / E2E / Snapshot): 13 Templates

13 prompts for integration, E2E, snapshot, and contract tests with Vitest 3, pytest 9, and Playwright. Tests that catch real bugs, not coverage noise (June 2026).

Published: May 17, 2026 Updated: Jun 14, 2026 Author: AI Productivity Guide Team

Ask GPT-5.5 or Claude Sonnet 4.6 to “write tests” and you get coverage padding: assertions that mirror the implementation line by line and pass even when the code is broken. These 13 prompts force behavior-based tests instead. They are tuned for the frameworks teams actually ship in mid-2026: Vitest 3, Jest 30, pytest 9, and Playwright.

TL;DR

The default failure mode of AI test generation is the change-detector test: it re-asserts what the code does today, so it breaks on every refactor and never catches a real bug. Every prompt below pins the behavior contract instead.
State your framework and version. Vitest 3 and Jest 30 share roughly 95% of the same API, but the mocking and config differ enough that a mixed-syntax file will not run.
Treat the model as a draft author, not the final author. Generate, then mutate one line of the code under test and re-run; if every test still passes, the tests are noise.
For unit tests specifically, see the companion Unit Test Generation Prompts. This page covers integration, E2E, API, component, snapshot, and suite-health prompts.

Who this is for

Engineers adding a safety net before a refactor, contributors trying to land a PR that requires coverage, indie devs who need confidence before launch, and anyone who inherited a codebase with no tests.

When not to use these prompts

Skip them for code you do not plan to keep — write the spec first instead. Skip them for throwaway scripts and trivial getters/setters, where the test costs more than the bug it would catch. And do not let a model auto-apply tests straight into CI: a green-but-meaningless suite is worse than no suite, because it tells the team a refactor is safe when it is not.

Prompt anatomy: the six elements

A test-generation prompt should always carry six things. Drop any one and the model fills the gap with a guess.

Element	What to specify	Why it matters
Subject under test	Function / module / endpoint, with signature and types	Without types the model invents inputs
Test taxonomy	unit / integration / e2e / property — never “some tests”	Sets the right tool and isolation level
Behavior contract	What the code MUST do, not what it currently does	Prevents change-detector tests
Coverage scope	Happy path + N edge cases + 1 invalid input	Exact counts force completeness
Framework + style	`vitest`, `pytest`, `go test`, plus mock-vs-real rules	Stops Jest/Mocha syntax mixing
Output shape	Runnable code only, no prose unless asked	Keeps the answer paste-ready
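
One way to enforce the six elements is to assemble the prompt in code and refuse to render it when any element is blank. The sketch below is only an illustration: `PromptSpec` and its field names are hypothetical, not part of any library or of the templates on this page.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical container for the six prompt elements."""

    subject: str            # function / module / endpoint, with signature and types
    taxonomy: str           # unit, integration, e2e, or property
    behavior_contract: str  # what the code MUST do
    coverage_scope: str     # exact case counts
    framework_style: str    # framework, major version, mock-vs-real rules
    output_shape: str       # e.g. "runnable code only"

    def render(self) -> str:
        # A blank element is a gap the model would fill with a guess,
        # so fail loudly instead of sending the prompt.
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError(f"missing prompt elements: {', '.join(missing)}")
        return "\n".join(
            f"{f.name.replace('_', ' ').title()}: {getattr(self, f.name)}"
            for f in fields(self)
        )
```

Calling `render()` on a complete spec yields one labelled line per element; a spec with an empty `taxonomy` raises `ValueError` before anything reaches the model.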

Pick the right framework first

Tell the model the framework AND the major version. The runners diverged enough by 2026 that “write Jest tests” against a Vitest project produces a file that will not run.

Framework	Version teams ship (mid-2026)	Best for	One thing to tell the AI
Vitest	3.x widely deployed (4.x is current)	Vite / TypeScript apps, browser mode via Playwright	Use `vi.mock`, not `jest.mock`; ESM by default
Jest	30 (shipped June 2025)	Legacy CommonJS, React Native, large monorepos	Confirm ESM vs CJS; React Native has no Vitest support
pytest	9.0.x (9.0 shipped Nov 2025)	Python services and libraries	Prefer fixtures over setup/teardown; use `parametrize` for tables
Playwright	pytest-playwright 0.8.0 / `@playwright/test`	Browser E2E, cross-browser	Assert on user-visible state, auto-wait, no `sleep()`

13 copy-ready prompt templates

Swap each [bracketed] placeholder with your specifics before sending.

1. Behavior-first unit tests

For [function], write unit tests by observable behavior, not internal state.
Cover: happy path, 3 edge cases, 1 invalid input. Use [test framework].
The tests must still pass after a behavior-preserving refactor.
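
To make the difference concrete, here is a minimal sketch of a change-detector test next to two behavior-first tests. `Cart` is a hypothetical class invented for this example; the tests use plain `assert`, so pytest would collect them as written.

```python
class Cart:
    """Hypothetical class under test."""

    def __init__(self) -> None:
        self._lines: list[tuple[str, int, int]] = []  # (sku, unit_cents, qty)

    def add(self, sku: str, unit_cents: int, qty: int = 1) -> None:
        if qty <= 0:
            raise ValueError("qty must be positive")
        self._lines.append((sku, unit_cents, qty))

    def total_cents(self) -> int:
        return sum(unit * qty for _, unit, qty in self._lines)


def test_change_detector():
    # Anti-pattern: pins the private storage layout. Switching _lines to a
    # dict keyed by sku breaks this test although behavior is unchanged.
    cart = Cart()
    cart.add("apple", 50, 2)
    assert cart._lines == [("apple", 50, 2)]


def test_total_reflects_added_items():
    # Behavior-first: only the public result is asserted.
    cart = Cart()
    cart.add("apple", 50, 2)
    cart.add("pear", 30)
    assert cart.total_cents() == 130


def test_rejects_non_positive_quantity():
    cart = Cart()
    try:
        cart.add("apple", 50, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for qty=0")
```

The last two tests survive any refactor of the storage layout; the first one does not, and it would not catch a wrong total either.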

2. Regression test from a bug report

Bug: [description]. Failing repro: [steps].
Write 1 minimal failing test that captures this.
It must fail on main (before the fix) and pass on my branch (after the fix).
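
A sketch of what "fail on main, pass on the branch" means in practice. Both `parse_price_*` functions are hypothetical stand-ins for the same function before and after a fix; in a real repo you would run one test against two checkouts instead.

```python
def parse_price_before_fix(text: str) -> float:
    # Hypothetical behavior on main: parsing stops at the thousands separator.
    return float(text.split(",")[0])


def parse_price_after_fix(text: str) -> float:
    # Hypothetical fix on the branch: drop separators before parsing.
    return float(text.replace(",", ""))


def regression_check(parse_price) -> bool:
    """One minimal assertion taken straight from the bug report."""
    return parse_price("1,299.00") == 1299.0
```

The check is only worth keeping if `regression_check(parse_price_before_fix)` is false and `regression_check(parse_price_after_fix)` is true. A check that passes on both versions proves nothing about the bug.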

3. Property-based test ideas

For [function], identify 3 properties that must always hold
(e.g. "output sorted regardless of input order").
Write property-based test stubs using [fast-check / Hypothesis].
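
As a rough illustration of the three-properties idea, the sketch below checks a hypothetical `dedupe_sorted` function against randomly generated inputs. It uses the standard library's `random` module as a stand-in so it runs without extra packages; with Hypothesis you would typically replace the hand-written loop with `@given(st.lists(st.text()))`.

```python
import random
import string


def dedupe_sorted(items: list[str]) -> list[str]:
    """Hypothetical function under test: returns sorted unique strings."""
    return sorted(set(items))


def random_strings(rng: random.Random) -> list[str]:
    # Short strings over a small alphabet make duplicates likely.
    size = rng.randint(0, 20)
    return [
        "".join(rng.choices(string.ascii_lowercase[:3], k=rng.randint(0, 3)))
        for _ in range(size)
    ]


def check_properties(trials: int = 200, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        items = random_strings(rng)
        result = dedupe_sorted(items)

        # Property 1: output does not depend on input order.
        shuffled = items[:]
        rng.shuffle(shuffled)
        assert dedupe_sorted(shuffled) == result

        # Property 2: output is strictly increasing (sorted, no duplicates).
        assert all(a < b for a, b in zip(result, result[1:]))

        # Property 3: idempotent, and no element is lost or invented.
        assert dedupe_sorted(result) == result
        assert set(result) == set(items)
```

None of the three properties mentions how the function is implemented, which is what keeps them stable across refactors.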

4. Boundary-input tests

For [function with type info], generate tests for boundary inputs:
empty, single, max, very long, special chars, unicode, negative.
Mark which currently fail.
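
The output of this prompt usually looks like a case table. The sketch below assumes a hypothetical `truncate` function and walks the table with a plain loop; under pytest the same rows would normally feed `@pytest.mark.parametrize`.

```python
def truncate(text: str, limit: int) -> str:
    """Hypothetical function under test: cut text to at most `limit` characters."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return text[:limit]


# One row per boundary case: (label, text, limit, expected).
BOUNDARY_CASES = [
    ("empty", "", 5, ""),
    ("single", "a", 5, "a"),
    ("exactly at limit", "abcde", 5, "abcde"),
    ("very long", "x" * 10_000, 5, "xxxxx"),
    ("special chars", "a\tb\n!", 3, "a\tb"),
    ("unicode", "日本語テキスト", 2, "日本"),
    ("zero limit", "abc", 0, ""),
]


def test_boundaries():
    for label, text, limit, expected in BOUNDARY_CASES:
        # The label is the assertion message, so a failure names its row.
        assert truncate(text, limit) == expected, label


def test_negative_limit_is_rejected():
    try:
        truncate("abc", -1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative limit")
```

Adding a case is one new row, which makes it cheap to record the boundary that a real bug later exposes.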

5. Integration tests for a flow

Below is a flow involving [N components].
Write integration tests in 3 groups: the golden path, one injected failure per step, and recovery from each failure.

[paste flow]

6. Mock vs real strategy

For [feature], advise which dependencies to mock and which to keep real.
Justify each choice on the stability vs speed trade-off.
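
A common answer is "mock the network, keep the parser real". The sketch below shows that split using `unittest.mock` from the standard library; `convert`, `parse_rate`, and the `/rates/usd-eur` path are hypothetical names made up for the example.

```python
import json
from unittest import mock


def parse_rate(payload: str) -> float:
    """In-process pure code: keep it real in tests."""
    return float(json.loads(payload)["rate"])


def convert(amount: float, fetch) -> float:
    """Hypothetical feature: `fetch` is the HTTP boundary, injected by the caller."""
    return round(amount * parse_rate(fetch("/rates/usd-eur")), 2)


def test_convert_uses_fetched_rate():
    fake_fetch = mock.Mock(return_value='{"rate": "0.5"}')  # mocked boundary
    assert convert(10.0, fake_fetch) == 5.0                 # real parser runs
    fake_fetch.assert_called_once_with("/rates/usd-eur")


def test_convert_surfaces_malformed_payload():
    fake_fetch = mock.Mock(return_value="{}")
    try:
        convert(10.0, fake_fetch)
    except KeyError:
        return
    raise AssertionError("expected KeyError for a payload without 'rate'")
```

Because the parser is not mocked, the second test exercises a real failure path instead of one the mock was told to produce.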

7. Snapshot test critique

Below are existing snapshot tests. For each, decide: useful, or noise?
Suggest a targeted assertion to replace any snapshot that exists only to detect change.

[paste]
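
For illustration, this is the kind of replacement the critique should suggest. `render_receipt` and the stored snapshot string are hypothetical; the point is the contrast between comparing the whole output and asserting the one line that carries the contract.

```python
def render_receipt(order_id: str, total_cents: int) -> str:
    """Hypothetical function under test."""
    return (
        f"Receipt #{order_id}\n"
        f"Total: ${total_cents / 100:.2f}\n"
        "Thank you for shopping!"
    )


# Snapshot-style expectation: the full output, copied from a past run.
SNAPSHOT = "Receipt #A1\nTotal: $12.50\nThank you for shopping!"


def test_receipt_snapshot():
    # Breaks when the footer wording changes, even though nothing a
    # customer relies on is wrong.
    assert render_receipt("A1", 1250) == SNAPSHOT


def test_total_is_formatted_as_dollars():
    # Targeted: only the line that must stay correct is asserted.
    assert "Total: $12.50" in render_receipt("A1", 1250).splitlines()
```

Both tests pass today. Only the second one keeps passing after a copy change and still fails if the total is formatted wrongly.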

8. Flaky test diagnosis

Test [name] is flaky. Likely causes in priority order:
network, timing, shared state, randomness, ordering.
Read the test plus the tested code; name the most likely cause and the fix.

[paste]
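
Timing is one of the most frequent causes, and the usual fix is to inject the clock instead of sleeping. The sketch below assumes a hypothetical `Token` class; a flaky version of the test would call `time.sleep` and race the real clock.

```python
import time
from typing import Callable


class Token:
    """Hypothetical class under test: a token that expires after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = clock() + ttl

    def is_valid(self) -> bool:
        return self._clock() < self._expires_at


class FakeClock:
    """Deterministic clock that the test advances by hand."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


def test_token_expires_after_ttl():
    clock = FakeClock()
    token = Token(ttl=30.0, clock=clock)
    assert token.is_valid()
    clock.now = 29.9          # just before expiry
    assert token.is_valid()
    clock.now = 30.0          # exactly at expiry
    assert not token.is_valid()
```

With the clock injected, the test runs in microseconds and gives the same result on a loaded CI runner as on a laptop.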

9. Tests for an API endpoint

For the endpoint `[METHOD /path]` (handler pasted below), write integration tests
in [framework] covering: (1) happy path with valid auth, (2) 401 unauth,
(3) 403 wrong role, (4) 400 invalid body — name the specific invalid field,
(5) 404 resource missing, (6) idempotency under retry (same key, second call).
Tests must seed and tear down DB state per case.

[paste handler + schema]

Variables to swap: framework (supertest + Vitest, pytest + httpx, etc.)

Optimization: If you have an OpenAPI or Zod schema, paste it too. The model derives invalid-input cases automatically from the schema.

10. React / UI component tests

For the React component below, write tests in [framework, e.g. RTL + Vitest].
Cover: (1) renders given a typical prop set, (2) calls the correct callback
on user interaction, (3) handles loading state, (4) handles error state,
(5) accessibility — focusable, labelled, role-correct.
Query by role and label, not by test id or component internals.

[paste component]

11. Test-pyramid balancer

Run this when your suite feels heavy but useless.

Below is my test directory layout plus a list of test files.
Analyze the test pyramid: ratio of unit / integration / e2e.
Identify (1) where the pyramid is inverted (too many e2e),
(2) tests that should be pushed down (e2e to integration to unit),
(3) duplicate coverage between layers,
(4) 5 specific test moves that would cut CI time by at least 30% without losing safety.

[paste tree + sample tests]

12. Coverage-gap finder (no tools)

Do not run coverage tools. Below is the module under test plus its current test file.
Qualitatively identify: (1) the 3 branches with no test coverage,
(2) any error path that is not exercised,
(3) any data shape used in production but not in tests,
(4) the 5 highest-ROI tests I should add this week,
ranked by likelihood-of-bug times user-impact.

Module: [paste]
Tests: [paste]

13. Test naming and structure cleanup

Below are 20 of my test names plus bodies. Rewrite for readability:
(1) name format "unit / scenario / expected outcome",
(2) remove "tests" and "should" prefixes that add no information,
(3) collapse duplicated setup into a beforeEach (or pytest fixture),
(4) flag any 2 tests covering the same thing and suggest a merge.

[paste tests]

Generating tests inside an agent (Claude Code / Cursor)

When the model can read your repo, the workflow beats copy-paste. Let it match your existing style instead of inventing one.

Point the agent at a neighboring test file first: “Read [path/to/example.test.ts], then write tests for [target] in the same style.” Tests written in a style that does not match the repo look foreign in PR review.
Ask it to run the suite after generating, and to report the exact failing assertions rather than silently editing them away.
Work on a scratch branch. Auto-applied broken tests poison CI for the whole team.
After it finishes, mutate one line of the code under test yourself and re-run. If the suite stays green, send the gaps back through template 12.
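
The mutate-and-re-run check in the last step can be sketched in a few lines. Everything here is hypothetical: `is_adult` stands in for your code, `is_adult_mutant` for the same code with one operator changed by hand, and the two suites for a generated test file before and after you add the boundary case.

```python
def is_adult(age: int) -> bool:
    """Hypothetical code under test."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """The same function with one operator mutated (>= became >)."""
    return age > 18


def weak_suite(fn) -> bool:
    # Only checks values far from the boundary.
    return fn(40) is True and fn(5) is False


def strong_suite(fn) -> bool:
    # Adds the boundary cases that the contract actually hinges on.
    return weak_suite(fn) and fn(18) is True and fn(17) is False


def survives(suite, mutant) -> bool:
    """A mutant survives when the suite still passes against it."""
    return suite(mutant)
```

Both suites pass against `is_adult`. The mutant survives `weak_suite` and is caught by `strong_suite`, which is the signal that the extra assertions are doing real work.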

Claude Code runs Anthropic models only (Sonnet 4.6 is the workhorse; Opus 4.7 for the hardest reasoning). Cursor can route the same job to Sonnet 4.6, GPT-5.5, or Gemini 3.1 Pro, which is useful when you want a second model to check the first one’s assertions. If the agent quietly drops a red test, see why Claude Code skips failing tests.

Common mistakes

Tests that mirror the implementation step by step (the change-detector trap).
Chasing “100% coverage” without ever asserting behavior.
Leaving flaky tests in main, where they train the team to ignore red CI.
Writing the regression test AFTER fixing the bug — you lose the proof that it ever caught anything.
Mocking the exact thing you are supposed to be testing.
Trusting a self-healing E2E runner that silently adapts to a change that was supposed to fail.

How to push results further

Always state the framework AND assertion library, or the model mixes Jest with Mocha syntax in one file.
For each generated test, ask “what bug would this catch?” If you cannot answer, delete it.
Generate the failing test before the fix. It must fail on main and pass on your branch — only then is it a real regression test.
Prefer table-driven prompts: one row equals one case (`it.each` in Vitest/Jest, `@pytest.mark.parametrize` in pytest). Lower duplication, easier to extend.
Cap test count per prompt at 8 to 10. Past that, the model starts duplicating semantics with renamed variables.

FAQ

How do I avoid change-detector tests? Describe the BEHAVIOR contract in the prompt (“function returns sorted unique strings”), never the implementation. A good test still passes after a refactor and only fails when behavior changes.
Which framework version should I tell the model? Whatever your project already pins, with the major version. Vitest 3 and Jest 30 are about 95% API-compatible, but `vi.mock` vs `jest.mock` and ESM vs CJS config differ enough to break a run. Paste one existing test file’s imports into the prompt.
Should the AI mock or use real dependencies? Mock external boundaries (HTTP, payments, email). Use real for in-process pure code (parsers, formatters). State the rule in the prompt, because model defaults vary.
Can AI catch flaky tests? Sometimes. Template 8 works when you paste the test plus the tested code together. Hunting flake from the test name alone is mostly guessing.
How many tests per function? Happy path + 3 edge cases + 1 invalid input is a solid baseline. Add more only when a real bug teaches you a missing case.
My AI-generated tests pass even when I break the code. Why? The model tested its mental model, not your code. Mutate one line and re-run — if the tests still pass, they are noise. Use template 12 to find the real gaps. Industry guidance in 2026 is consistent on this: treat the model as a draft author and own the critical assertions yourself.

Tags: #Prompt #AI coding #Test generation