E2E Test Plan Prompts: 13 Templates for Playwright / Cypress

Turn a flaky, screenshot-heavy e2e suite into a small, fast, deterministic plan. 13 copy-ready prompts for journeys, selectors, auth fixtures, flake triage, and PR coverage, tuned for Playwright 1.59 (2026).

Published: May 19, 2026 | Updated: Jun 06, 2026 | Author: AI Productivity Guide Team

Most e2e suites die from rot, not bugs: flaky selectors, brittle waits, and a login that times out the moment CI gets busy. A good e2e-plan prompt picks the right user journeys (not every page), names a selector strategy (role / test-id), and forbids the patterns that cause flake (`waitForTimeout`, `networkidle` on dynamic pages). The 13 prompts below give an AI coding agent enough structure to plan a suite you will actually keep running.

TL;DR

Cap e2e at 5-8 user journeys. Everything else belongs in unit / component tests.
Default to Playwright for new suites: as of June 2026 it pulls ~33M weekly npm downloads to Cypress’s ~6.5M and runs roughly 23% faster in independent benchmarks.
Use `getByRole` + accessible name as the primary selector. Switching off CSS / XPath kills more flake than any other single change.
Replace `waitForTimeout` with web-first assertions (`expect(locator).toBeVisible()`), which auto-retry until they pass or time out.
Run a 3-test smoke subset on every PR; run the full suite on `main` and nightly, sharded with `--shard` and `fullyParallel: true`.
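
To make the selector and assertion points above concrete, here is a minimal sketch of a single smoke test. The `/login` route, the field labels, the button and heading names, and the `@smoke` tag are placeholders invented for illustration, and the sketch assumes `baseURL` is set in `playwright.config.ts`.

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical journey: route, labels, and credentials are placeholders.
test('user can sign in @smoke', async ({ page }) => {
  await page.goto('/login'); // resolved against baseURL from the config

  // Label and role locators instead of CSS or XPath.
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('not-a-real-password');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Web-first assertion: retries until the heading is visible or the
  // assertion timeout expires, so no waitForTimeout is needed.
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```

With a tag in the title, `npx playwright test --grep @smoke` would run only the smoke subset on a PR.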

Who this is for

Frontend leads choosing a Playwright / Cypress strategy, QA engineers writing a test plan before implementation, and indie devs pointing Claude Code or Cursor at an existing flaky suite.

When not to use these prompts

Don’t use these to test internal component logic; that’s unit / component-test territory. Don’t use them on flows that change weekly, where the maintenance cost outweighs the bug-catching value.

Why Playwright is the 2026 default

Both tools work, but the gap has widened. Pick based on this, not habit (figures as of June 2026):

Signal	Playwright	Cypress
Weekly npm downloads	~33M	~6.5M
Browser engines	Chromium, Firefox, WebKit (one API)	Chrome-family + Firefox
Languages	JS/TS, Python, Java, .NET	JS/TS
Parallelism	Native workers + `--shard`	Paid Cloud or plugins
Latest release	1.59 (April 1, 2026)	14.x

Choose Cypress only if you have a large existing investment or your team values its in-browser time-travel runner above all else. For greenfield, Playwright wins on cross-browser, native parallelism, and multi-tab support. New in 1.59: Playwright Test Agents and a CLI trace debugger (`npx playwright trace`) that lets a coding agent analyze a failed run from the terminal, no UI download required.

Prompt anatomy / structure formula

Every e2e plan prompt should carry six elements:

Role: who the AI plays (release captain / QA lead / SRE / staff engineer).
Context: repo / framework / runtime / branch / diff / failing logs.
Goal: one concrete deliverable (checklist, plan, test file, review notes, root cause, or ticket list).
Constraints: what the AI MUST NOT do (don’t auto-fix, don’t silently rewrite, don’t guess versions).
Output format: numbered findings, markdown table, JSON schema, unified diff, or runnable code.
Signal: 1-2 examples of “good” output, or a note on what bad output looks like.
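
One way to keep the six elements and the `{variable}` placeholders honest is to assemble prompts in code. The sketch below is only an illustration: the `PromptParts` shape and the `fillTemplate` / `buildPrompt` helpers are hypothetical names, not part of any tool mentioned here.

```typescript
// Hypothetical shape for the six elements; field names are illustrative.
interface PromptParts {
  role: string;
  context: string;
  goal: string;
  constraints: string[];
  outputFormat: string;
  signal: string;
}

// Replace every {name} placeholder; throw if a variable is missing so an
// unfilled template never reaches the model.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match, name: string) => {
    const value = vars[name];
    if (value === undefined) {
      throw new Error(`Missing template variable: ${name}`);
    }
    return value;
  });
}

// Join the six elements in a fixed order, then fill the placeholders.
function buildPrompt(parts: PromptParts, vars: Record<string, string>): string {
  const sections = [
    `Role: ${parts.role}`,
    `Context: ${parts.context}`,
    `Goal: ${parts.goal}`,
    `Constraints:\n${parts.constraints.map((c) => `- ${c}`).join('\n')}`,
    `Output format: ${parts.outputFormat}`,
    `Signal: ${parts.signal}`,
  ];
  return fillTemplate(sections.join('\n\n'), vars);
}

const prompt = buildPrompt(
  {
    role: 'You are a QA lead.',
    context: 'App description: {appDescription}',
    goal: 'List the 5-8 user journeys worth e2e-testing.',
    constraints: ['Stop at 8.', "Don't guess versions."],
    outputFormat: 'Markdown table.',
    signal: 'A good row names one user goal and one failure mode.',
  },
  { appDescription: 'A two-sided marketplace for used bikes' },
);
console.log(prompt);
```

Throwing on a missing variable is a deliberate choice: a prompt that still contains a literal `{diff}` tends to produce confident but ungrounded output.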

Best for

Choosing the 5-8 user journeys worth e2e-testing
Setting selector / fixture / auth conventions before writing tests
Stabilising a flaky suite without throwing it away
Adding e2e coverage on a single PR (not the full suite)
Migrating Cypress → Playwright (or vice versa)

13 copy-ready prompt templates

1. Journey selection

You are a QA lead. Given this app description: {appDescription}, list the 5-8 user journeys worth e2e-testing. For each: (1) one-line user goal, (2) the failure mode that would lose us revenue / users, (3) entry and exit URLs, (4) data setup needed. Stop at 8. Anything else belongs in unit / component tests.

Variables to swap: appDescription

2. Selector strategy

Audit these existing e2e tests for selector strategy. For each test, mark: ROLE (good — `getByRole("button", { name })`), TEST-ID (acceptable — `[data-testid]`), TEXT (acceptable for unique copy), or CSS / XPATH (bad). Replace CSS / XPATH selectors with role / text. Output a diff plan.
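
As a signal of what a good audit looks like, the grading in this prompt can be approximated with a few patterns. This is a rough heuristic sketch, not a parser: the `gradeSelector` helper is hypothetical, and grouping `getByLabel` and `getByPlaceholder` under TEXT is a judgment call made for this example.

```typescript
type SelectorGrade = 'ROLE' | 'TEST-ID' | 'TEXT' | 'CSS/XPATH';

// Inspects the source text of a locator call. Assumes Playwright-style
// calls; Cypress commands would need additional patterns.
function gradeSelector(source: string): SelectorGrade {
  if (/\bgetByRole\(/.test(source)) return 'ROLE';
  if (/\bgetByTestId\(/.test(source) || /\[data-testid[=\]]/.test(source)) {
    return 'TEST-ID';
  }
  // Text-like locators; lumped together here for simplicity.
  if (/\bgetBy(Text|Label|Placeholder)\(/.test(source)) return 'TEXT';
  // Anything else is treated as a raw CSS or XPath selector.
  return 'CSS/XPATH';
}

const samples = [
  `page.getByRole('button', { name: 'Save' })`,
  `page.locator('[data-testid="cart"]')`,
  `page.getByText('Order confirmed')`,
  `page.locator('div.nav > ul li:nth-child(3)')`,
];
for (const s of samples) {
  console.log(`${gradeSelector(s).padEnd(10)} ${s}`);
}
```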

3. Auth fixture

Design an auth fixture for Playwright that: (1) signs in once per worker, (2) reuses the storage state across tests, (3) skips UI login for tests that don't exercise the login flow itself. Show the fixture code, the playwright.config.ts entry, and one example test consuming it.
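
For reference, good output for this prompt might resemble the worker-scoped fixture below, which follows the pattern in Playwright's authentication docs. It is a sketch under assumptions: the file name `fixtures.ts`, the login endpoint `https://app.example.com/api/login`, its form fields, and the `E2E_USER_<n>` / `E2E_PASSWORD_<n>` environment variables are all hypothetical.

```typescript
// fixtures.ts (hypothetical file name)
import { test as baseTest, request } from '@playwright/test';
import fs from 'fs';
import path from 'path';

export * from '@playwright/test';

export const test = baseTest.extend<{}, { workerStorageState: string }>({
  // Every test in this worker starts from the worker's saved state.
  storageState: ({ workerStorageState }, use) => use(workerStorageState),

  workerStorageState: [
    async ({}, use) => {
      // parallelIndex identifies each concurrently running worker.
      const id = test.info().parallelIndex;
      const fileName = path.resolve(
        test.info().project.outputDir,
        `.auth/${id}.json`,
      );

      if (fs.existsSync(fileName)) {
        await use(fileName); // reuse state saved earlier in this run
        return;
      }

      // Sign in over HTTP instead of driving the login UI.
      // Endpoint, form fields, and env var names are assumptions.
      const context = await request.newContext({ storageState: undefined });
      const response = await context.post('https://app.example.com/api/login', {
        form: {
          user: process.env[`E2E_USER_${id}`] ?? '',
          password: process.env[`E2E_PASSWORD_${id}`] ?? '',
        },
      });
      if (!response.ok()) {
        throw new Error(`Login failed with status ${response.status()}`);
      }
      await context.storageState({ path: fileName });
      await context.dispose();
      await use(fileName);
    },
    { scope: 'worker' },
  ],
});
```

Tests would then import `test` and `expect` from `./fixtures` instead of `@playwright/test`. A spec that exercises the login flow itself can opt out with `test.use({ storageState: { cookies: [], origins: [] } })`. With this fixture approach no extra `playwright.config.ts` entry is strictly needed; a setup-project variant would add one.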

4. Network stub strategy

For this test {testName}, decide for each network call: STUB (third-party / brittle / slow), REAL (the system-under-test's own API), or RECORD-REPLAY (rarely-changing reference data). Output a table: endpoint | strategy | reason. Don't stub our own backend except for explicit error-path tests.

Variables to swap: testName

5. Flake taxonomy + fix

Read these flaky test results from the last 7 days: {flakeLog}. Classify each flake as: TIMING (needs `expect(locator).toBeVisible()`, not an arbitrary wait), NETWORK (needs a stub or retry), STATE (leak from a previous test), ENVIRONMENT (CI vs local). For each, write a one-line fix recipe. Don't patch with retries; patch the root cause.

Variables to swap: flakeLog
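
A cheap pre-pass can bucket error messages before the log goes to the model, which keeps `{flakeLog}` shorter. The keyword rules below are invented for this sketch and will need tuning against real logs; `classifyFlake` is a hypothetical helper.

```typescript
type FlakeClass = 'TIMING' | 'NETWORK' | 'STATE' | 'ENVIRONMENT' | 'UNKNOWN';

interface Rule {
  cls: Exclude<FlakeClass, 'UNKNOWN'>;
  pattern: RegExp;
  fix: string;
}

// Heuristic patterns (assumptions, not an exhaustive list). Order matters:
// the first matching rule wins.
const RULES: Rule[] = [
  {
    cls: 'NETWORK',
    pattern: /ECONNRESET|ETIMEDOUT|net::ERR_|\b50[234]\b/i,
    fix: 'Stub the third-party call or retry at the request layer.',
  },
  {
    cls: 'TIMING',
    pattern: /Timeout \d+ms exceeded|waitForTimeout|not visible|not attached/i,
    fix: 'Replace the wait with a web-first assertion on the awaited element.',
  },
  {
    cls: 'STATE',
    pattern: /already exists|duplicate key|unique constraint/i,
    fix: 'Create fresh data per test and clean up in afterEach.',
  },
  {
    cls: 'ENVIRONMENT',
    pattern: /ENOSPC|out of memory|executable doesn't exist/i,
    fix: 'Align the CI image, browser version, and resources with local.',
  },
];

function classifyFlake(errorMessage: string): { cls: FlakeClass; fix: string } {
  for (const rule of RULES) {
    if (rule.pattern.test(errorMessage)) {
      return { cls: rule.cls, fix: rule.fix };
    }
  }
  return { cls: 'UNKNOWN', fix: 'Read the trace before choosing a fix.' };
}

console.log(classifyFlake('locator.click: Timeout 30000ms exceeded.'));
```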

6. PR-scoped e2e coverage

For this diff {diff}, decide whether new e2e tests are needed. Criteria: changes touch a critical journey (template 1) AND change observable user behaviour. If yes, draft 1-3 e2e test outlines (not full code). If no, say "unit / component test is sufficient" and stop.

Variables to swap: diff

7. Mobile + responsive coverage

Add mobile coverage to this Playwright config. (1) Add 1 mobile project (Pixel 5) and 1 small-screen Chromium. (2) Pick 2 journeys from the existing suite to run on mobile (sign-up, checkout). (3) Use `test.use({ viewport })` for per-test overrides. Don't run the full suite on mobile.
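
A config along these lines would satisfy the prompt. It is a sketch, assuming the two mobile journeys carry a `@mobile` tag in their titles; the project names and the 800x600 viewport are arbitrary choices for illustration.

```typescript
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,
  projects: [
    // Full suite runs on desktop only.
    { name: 'desktop-chromium', use: { ...devices['Desktop Chrome'] } },
    {
      // Mobile emulation, limited to tests tagged @mobile (assumed tag).
      name: 'mobile-pixel5',
      use: { ...devices['Pixel 5'] },
      grep: /@mobile/,
    },
    {
      // Small-screen Chromium; viewport size is an arbitrary example.
      name: 'small-chromium',
      use: {
        ...devices['Desktop Chrome'],
        viewport: { width: 800, height: 600 },
      },
      grep: /@mobile/,
    },
  ],
});
```

Restricting the mobile projects with `grep` is what keeps the full suite from running three times.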

8. Hermetic test data

For this test {testName}, propose a data-setup strategy that doesn't depend on prod state: (1) create user via API not UI, (2) seed needed records via fixture, (3) clean up in `afterEach` even if the test fails. Show one example using API factories.

Variables to swap: testName

9. CI sharding plan

Our Playwright suite takes 25 minutes. Design a sharding plan to bring it under 7 minutes on 4 shards: (1) Group tests by file (default) or by tag, (2) Avoid auth-state contention across shards, (3) Don't shard the smoke subset. Output the workflow YAML diff.
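
Before asking for YAML, it helps to check that the target is reachable: 25 minutes over 4 shards is 6.25 minutes at a perfect split, so a 7-minute goal leaves under a minute of slack for imbalance and startup. The sketch below groups files greedily, longest first, to estimate the slowest shard. File names and durations are made up, and this is a planning aid, not a description of how `--shard` itself assigns tests.

```typescript
interface TestFile {
  file: string;
  minutes: number;
}

// Longest-first greedy assignment: each file goes to the currently
// lightest shard. A common heuristic, not guaranteed optimal.
function planShards(files: TestFile[], shardCount: number): TestFile[][] {
  const shards: TestFile[][] = Array.from({ length: shardCount }, () => []);
  const totals = new Array<number>(shardCount).fill(0);
  const sorted = [...files].sort((a, b) => b.minutes - a.minutes);
  for (const f of sorted) {
    const lightest = totals.indexOf(Math.min(...totals));
    shards[lightest].push(f);
    totals[lightest] += f.minutes;
  }
  return shards;
}

// Hypothetical per-file durations summing to 25 minutes.
const files: TestFile[] = [
  { file: 'checkout.spec.ts', minutes: 4.5 },
  { file: 'signup.spec.ts', minutes: 4 },
  { file: 'search.spec.ts', minutes: 3.5 },
  { file: 'profile.spec.ts', minutes: 3 },
  { file: 'cart.spec.ts', minutes: 3 },
  { file: 'orders.spec.ts', minutes: 2.5 },
  { file: 'settings.spec.ts', minutes: 2.5 },
  { file: 'invite.spec.ts', minutes: 2 },
];

const plan = planShards(files, 4);
const shardMinutes = plan.map((s) => s.reduce((sum, f) => sum + f.minutes, 0));
const slowest = Math.max(...shardMinutes);
console.log(shardMinutes, 'slowest shard (min):', slowest);
```

If the estimated slowest shard already sits near 7 minutes, the prompt should ask for more shards or for splitting the longest file, since per-shard setup time comes on top.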

10. Visual regression scope

Pick the 3 screens worth visual-regression-snapshotting: (1) homepage hero, (2) the one screen where layout drift hurts conversions most, (3) any screen with a CSS variable change in the diff. Don't snapshot pages that change with real data (lists, dashboards). Output the test stubs.

11. Accessibility checks in e2e

Add `@axe-core/playwright` to 3 critical pages: home, sign-up, checkout. For each: assert no violations of severity `serious` or higher. Allow `moderate` for now with a ticket comment. Don't fail on `minor` — too noisy. Show the helper and the 3 tests.
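
The severity rule in this prompt reduces to a small filter over axe results. The sketch below defines a minimal local `Violation` type so it runs on its own; it assumes axe-core's `impact` values (`minor`, `moderate`, `serious`, `critical`), and `splitByImpact` is a hypothetical helper name.

```typescript
// Minimal local type with only the fields used here.
interface Violation {
  id: string;
  impact?: 'minor' | 'moderate' | 'serious' | 'critical' | null;
}

// blocking: fail the test. ticketed: allowed for now, needs a ticket.
// minor (and unrated) violations are ignored as too noisy.
function splitByImpact(violations: Violation[]): {
  blocking: Violation[];
  ticketed: Violation[];
} {
  const blocking = violations.filter(
    (v) => v.impact === 'serious' || v.impact === 'critical',
  );
  const ticketed = violations.filter((v) => v.impact === 'moderate');
  return { blocking, ticketed };
}

// Sample data for illustration only.
const sample: Violation[] = [
  { id: 'color-contrast', impact: 'serious' },
  { id: 'region', impact: 'moderate' },
  { id: 'example-minor-rule', impact: 'minor' },
];
const result = splitByImpact(sample);
console.log(result.blocking.map((v) => v.id), result.ticketed.map((v) => v.id));
```

Inside a Playwright test, the input would come from `(await new AxeBuilder({ page }).analyze()).violations`, with `AxeBuilder` imported from `@axe-core/playwright`, followed by `expect(blocking).toEqual([])`.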

12. Cypress → Playwright migration

I have {nTests} Cypress tests. Migrate them to Playwright in priority order: (1) journeys from template 1 first, (2) high-flake tests next (rewrite, don't port), (3) low-value tests last (consider deleting). Output a migration tracker with status per test.

Variables to swap: nTests

13. Test-plan markdown for stakeholders

Turn the full e2e plan into a 1-page markdown doc for non-engineers: (1) Which journeys are tested, (2) Roughly which devices / browsers, (3) Approximate run time, (4) What we deliberately DON'T test (and why). Plain English. No "Playwright" / "fixtures" jargon.

Common mistakes

Trying to e2e-test every page: leads to a 2-hour suite that everyone skips.
Using CSS / XPath selectors: a markup refactor breaks the test even when user-visible behaviour is unchanged.
Sleeping with `await page.waitForTimeout(2000)` to “fix” flake: masks the root cause.
Logging in through the UI in every test: slow and brittle.
Stubbing the system-under-test’s own API: tests pass when prod is broken.
Snapshotting full pages: every dynamic value causes false failures.
Sharing test users across tests: causes order-dependent failures.

Which AI model to drive these prompts

For planning work (journey selection, flake triage, migration trackers) any frontier model is fine. For agents that read your repo and write the actual test files, prefer one strong at code: as of June 2026, Claude Opus 4.7 leads SWE-bench Verified at 87.6%, with GPT-5.5 and Gemini 3.1 Pro close behind. Claude Code (Opus 4.7 / Sonnet 4.6) and Cursor both run these models and can execute `npx playwright test` to verify their own output. A practical loop: paste prompt 1 to pick journeys, then hand the agent prompts 3, 4, and 8 so it scaffolds auth, network, and data fixtures before any test body. Microsoft now recommends the Playwright CLI over MCP for coding agents because the CLI uses about 4x fewer tokens per session.

How to push results further

Cap e2e at 5-8 journeys. Move everything else to component / unit.
Use `getByRole` + accessible name. Selectors stay valid through refactors and improve a11y at the same time.
Use `expect(locator).toBeVisible({ timeout })` instead of `waitForTimeout`; web-first assertions auto-retry.
Sign in once per worker via API; reuse `storageState`.
Stub third-party APIs (Stripe / OAuth / analytics); let your own API run real.
Tag flaky tests with @flaky, run them in a separate workflow, fix or delete them weekly.
Run smoke (3 tests) on every PR; full suite on main / nightly with duration-aware sharding (Playwright 1.49+).

FAQ

How many e2e tests are too many? When the suite runs over 10 minutes on a 4-core runner, or when devs start ignoring red builds. Trim back to journeys and push the rest down to component / unit tests.
Cypress or Playwright in 2026? Playwright unless you have a heavy Cypress investment. As of June 2026 it holds ~45% framework adoption to Cypress’s ~14%, with cleaner multi-tab, mobile emulation, and native parallelism.
Should e2e block merge? Smoke subset yes; full suite no, it’s too slow. Run the full suite on `main` and revert on red.
Can AI write the tests too? Yes for the skeleton, no for data setup. Agents invent seed data that doesn’t exist, so wire fixtures (prompt 8) first.
How do I deal with auth on third-party SSO? Use a programmatic token endpoint or a `storageState` fixture. Never drive Google’s or Okta’s real login UI in CI.
Do I need visual regression? For about 3 stable screens, yes. Beyond that the false-failure cost from dynamic data exceeds the bug-catching value.
How do I cut a slow suite’s CI time? Shard it. Pass `--shard=1/4` across 4 runners with `fullyParallel: true`; teams report 60-minute suites dropping to roughly 8 minutes with the right split.

External references: the Playwright best-practices guide and the test-sharding docs are the canonical sources for the selector and parallelism advice above.

Tags: #Prompt #Coding #Testing #E2E #Playwright