
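E2E Test Plan Prompts: 13 Playwright / Cypress Templates

Turn a brittle, screenshot-heavy e2e suite into a small, fast, deterministic plan: 13 copy-paste-ready prompts covering journeys, selectors, login fixtures, flake triage, and PR-level coverage, tuned for Playwright 1.59 (2026).

Published 2026/05/19, updated 2026/06/06, by AI Productivity Guide Team

Most e2e suites don't die from bugs; they die from rot: brittle selectors, unreliable waits, and logins that time out whenever CI gets busy. A good e2e planning prompt picks the right user journeys (not every page), specifies a selector strategy (role / test-id), and bans the patterns that cause flake (waitForTimeout, networkidle on dynamic pages). The 13 prompts below give an AI coding agent enough structure to plan a suite you will actually want to keep running.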

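TL;DR

Keep e2e to 5-8 user journeys; everything else goes into unit / component tests.
Default to Playwright for new suites: as of June 2026 it sees about 33 million weekly npm downloads versus about 6.5 million for Cypress, and runs about 23% faster in independent benchmarks.
Make getByRole + accessible name your primary selector: dropping CSS / XPath eliminates more flake than any other single change.
Replace waitForTimeout with auto-retrying web-first assertions (expect(locator).toBeVisible()), which keep polling until they pass or time out.
Run a 3-test smoke subset on every PR; run the full suite only on main and nightly, sharded with --shard plus fullyParallel: true.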

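Who it's for

Frontend leads choosing a Playwright / Cypress strategy, QA engineers writing a test plan before implementation, and solo developers using Claude Code or Cursor to take over a flaky suite.

When not to prompt this way

Don't use e2e to test component-internal logic; that belongs to unit / component tests. Skip e2e for flows that change every week, too: the maintenance cost far exceeds the value of the bugs it catches.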

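Why Playwright is the default in 2026

Both work, but the gap has widened. Choose from this table rather than by habit (figures as of June 2026):

Metric	Playwright	Cypress
Weekly npm downloads	~33 million	~6.5 million
Browser engines	Chromium, Firefox, WebKit (one API)	Chrome-family + Firefox
Languages	JS/TS, Python, Java, .NET	JS/TS
Parallelism	Native workers + `--shard`	Paid Cloud or plugins
Latest version	1.59 (April 1, 2026)	14.x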

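Pick Cypress only if you already have a large Cypress investment or the team particularly values its in-browser time-travel runner. On greenfield projects Playwright wins on cross-browser coverage, native parallelism, and multi-tab testing. Version 1.59 added Playwright Test Agents and CLI trace debugging (npx playwright trace), so a coding agent can analyze a failed run from the terminal instead of downloading the trace and opening the UI.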
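For reference, a minimal `playwright.config.ts` sketch of the two advantages in the table: one API across three engines, plus native parallelism. The `./e2e` directory and the four CI workers are assumptions, not settings this guide prescribes.

```typescript
// playwright.config.ts: a minimal sketch; testDir and worker counts are assumptions.
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  // Parallelize tests inside each file too, not only across files.
  fullyParallel: true,
  // Fail CI if a stray test.only slips into a commit.
  forbidOnly: !!process.env.CI,
  // Assumed 4-core CI runner; locally Playwright picks a default.
  workers: process.env.CI ? 4 : undefined,
  // One project per engine; the same test code runs against all three.
  projects: [
    { name: "chromium", use: { ...devices["Desktop Chrome"] } },
    { name: "firefox", use: { ...devices["Desktop Firefox"] } },
    { name: "webkit", use: { ...devices["Desktop Safari"] } },
  ],
});
```

Locally, `npx playwright test --project=webkit` runs a single engine; in CI the same config can be split across machines with `--shard=1/4` through `--shard=4/4`.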

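The prompt formula

Every e2e planning prompt should carry these six elements:

Role: who the AI plays (Release Captain / QA Lead / SRE / staff engineer).
Context: repo / framework / runtime / branch / diff / failure logs.
Goal: one concrete deliverable (checklist, plan, test file, review notes, root cause, ticket list).
Constraints: what the AI must not do (no auto-fixing, no silent rewrites, no guessing version numbers).
Output format: numbered list, markdown table, JSON schema, unified diff, or ready-to-run code.
Signals: 1-2 examples of "good output", or a description of what bad output looks like.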
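To make the six elements concrete, here is a small TypeScript sketch that assembles them into one prompt string. `PromptSpec` and `buildPrompt` are hypothetical names and the sample context is invented; the point is that every prompt carries all six parts in a fixed order.

```typescript
// Hypothetical helper: one field per element of the prompt formula.
interface PromptSpec {
  role: string;
  context: string;
  goal: string;
  constraints: string[];
  outputFormat: string;
  signals: string[]; // 1-2 examples of good (or bad) output
}

function buildPrompt(spec: PromptSpec): string {
  if (spec.signals.length < 1 || spec.signals.length > 2) {
    throw new Error("Provide 1-2 signal examples.");
  }
  const sections: string[] = [
    `You are a ${spec.role}.`,
    `Context: ${spec.context}`,
    `Goal: ${spec.goal}`,
    `Constraints:\n${spec.constraints.map((c) => `- ${c}`).join("\n")}`,
    `Output format: ${spec.outputFormat}`,
    `Examples:\n${spec.signals.map((s) => `- ${s}`).join("\n")}`,
  ];
  // Blank lines between sections keep each element visually separate.
  return sections.join("\n\n");
}

const prompt = buildPrompt({
  role: "QA lead",
  context: "React SPA, Playwright 1.59, branch feat/checkout, diff attached",
  goal: "a list of 5-8 e2e journeys",
  constraints: ["Don't guess version numbers", "Don't auto-fix tests"],
  outputFormat: "markdown table: journey | goal | failure mode",
  signals: ["Good: 'Checkout | pay with saved card | lost revenue'"],
});
```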

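Where these prompts fit

Picking the 5-8 user journeys worth e2e-testing
Settling selector / fixture / login conventions before writing any tests
Stabilizing a suite without rewriting it
Per-PR e2e coverage (without running the whole suite)
Cypress → Playwright migration (or the reverse)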

13 copy-paste prompt templates

1. Journey selection

You are a QA lead. Given this app description: {appDescription}, list the 5-8 user journeys worth e2e-testing. For each: (1) one-line user goal, (2) the failure mode that would lose us revenue / users, (3) entry and exit URLs, (4) data setup needed. Stop at 8. Anything else belongs in unit / component tests.

Variable: appDescription (a one-line description of the app)
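Asking for a fixed shape makes Prompt 1's answer checkable. A sketch, assuming you have the model return JSON matching this hypothetical `Journey` type; the validator enforces the 5-8 cap and flags journeys that name no failure mode.

```typescript
// Hypothetical shape for Prompt 1's output; fields follow items (1)-(4).
interface Journey {
  goal: string;        // (1) one-line user goal
  failureMode: string; // (2) how a break here loses revenue or users
  entryUrl: string;    // (3) where the journey starts...
  exitUrl: string;     //     ...and where it ends
  dataSetup: string[]; // (4) data the test must create first
}

function validateJourneys(journeys: Journey[]): string[] {
  const problems: string[] = [];
  if (journeys.length < 5) {
    problems.push(`only ${journeys.length} journeys; aim for 5-8`);
  }
  if (journeys.length > 8) {
    problems.push(`${journeys.length} journeys; cap is 8, move the rest to unit / component tests`);
  }
  journeys.forEach((j, i) => {
    if (j.failureMode.trim() === "") {
      problems.push(`journey ${i + 1} (${j.goal}) has no failure mode`);
    }
  });
  return problems;
}
```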

2. Selector strategy

Audit these existing e2e tests for selector strategy. For each test, mark: ROLE (good — `getByRole("button", { name })`), TEST-ID (acceptable — `[data-testid]`), TEXT (acceptable for unique copy), or CSS / XPATH (bad). Replace CSS / XPATH selectors with role / text. Output a diff plan.
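A quick pre-pass can grade selectors before the model proposes replacements. This is a rough line-based heuristic, not an AST parser; treating `getByLabel` as role-grade and `getByPlaceholder` / `getByAltText` / `getByTitle` as text-grade is an assumption on top of the prompt's four categories.

```typescript
type SelectorGrade = "ROLE" | "TEST-ID" | "TEXT" | "CSS/XPATH";

interface SelectorFinding {
  line: number; // 1-based line number in the test file
  grade: SelectorGrade;
  text: string;
}

// Only lines containing a selector call (Playwright or Cypress) get graded.
const SELECTOR_CALL = /getBy\w+\(|\.locator\(|cy\.get\(|cy\.contains\(/;

function gradeLine(line: string): SelectorGrade {
  if (/getByRole\(|getByLabel\(/.test(line)) return "ROLE";
  if (/getByTestId\(|\[data-testid/.test(line)) return "TEST-ID";
  if (/getByText\(|getByPlaceholder\(|getByAltText\(|getByTitle\(|cy\.contains\(/.test(line)) return "TEXT";
  // Anything else (class chains, nth-child, //xpath, xpath=) is treated as brittle.
  return "CSS/XPATH";
}

function auditSelectors(source: string): SelectorFinding[] {
  return source
    .split("\n")
    .map((text, i) => ({ text: text.trim(), line: i + 1 }))
    .filter(({ text }) => SELECTOR_CALL.test(text))
    .map(({ text, line }) => ({ line, text, grade: gradeLine(text) }));
}
```

Feeding the model only the `CSS/XPATH` findings keeps its diff plan focused on the selectors that actually need replacing.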

3. Login fixture

Design an auth fixture for Playwright that: (1) signs in once per worker, (2) reuses the storage state across tests, (3) skips UI login for tests that don't exercise the login flow itself. Show the fixture code, the playwright.config.ts entry, and one example test consuming it.
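A sketch of what Prompt 3 should produce, adapted from the "one account per parallel worker" pattern in the Playwright authentication docs but signing in through the API instead of the UI. It assumes the app sets a session cookie on a hypothetical `POST /api/login`, and that `BASE_URL`, `E2E_USER_<n>_EMAIL`, and `E2E_USER_PASSWORD` are environment variables you define. One account per worker also avoids tests sharing a user.

```typescript
// fixtures.ts: worker-scoped API login, reused via storageState.
// Hypothetical: POST /api/login, BASE_URL, E2E_USER_<n>_EMAIL, E2E_USER_PASSWORD.
import { test as base, request } from "@playwright/test";
import fs from "node:fs";
import path from "node:path";

export * from "@playwright/test";

export const test = base.extend<{}, { workerStorageState: string }>({
  // Every test in this worker starts from the saved, signed-in state.
  storageState: ({ workerStorageState }, use) => use(workerStorageState),

  workerStorageState: [
    async ({}, use, workerInfo) => {
      // One state file (and one account) per parallel worker.
      const id = workerInfo.parallelIndex;
      const file = path.resolve(workerInfo.project.outputDir, `.auth/${id}.json`);

      if (!fs.existsSync(file)) {
        fs.mkdirSync(path.dirname(file), { recursive: true });
        const api = await request.newContext({ baseURL: process.env.BASE_URL });
        const res = await api.post("/api/login", {
          data: {
            email: process.env[`E2E_USER_${id}_EMAIL`],
            password: process.env.E2E_USER_PASSWORD,
          },
        });
        if (!res.ok()) throw new Error(`API login failed: ${res.status()}`);
        // Save the cookies set by the login response for the browser to reuse.
        await api.storageState({ path: file });
        await api.dispose();
      }
      await use(file);
    },
    { scope: "worker" },
  ],
});
```

Specs then import `test` and `expect` from `./fixtures` instead of `@playwright/test`. A spec that exercises the login flow itself can opt out with `test.use({ storageState: { cookies: [], origins: [] } })`, and the config entry only needs `use: { baseURL: process.env.BASE_URL }` so that `page.goto("/account")` resolves.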

4. Network stub strategy

For this test {testName}, decide for each network call: STUB (third-party / brittle / slow), REAL (the system-under-test's own API), or RECORD-REPLAY (rarely-changing reference data). Output a table: endpoint | strategy | reason. Don't stub our own backend except for explicit error-path tests.

Variable: testName
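The decision table in Prompt 4 is a short ordered rule list. A sketch with hypothetical `Endpoint` flags (`rarelyChanges`, `forcedError`) and a caller-supplied list of your own API hosts; the output matches the prompt's `endpoint | strategy | reason` table.

```typescript
type StubStrategy = "STUB" | "REAL" | "RECORD-REPLAY";

interface Endpoint {
  url: string;
  rarelyChanges?: boolean; // reference data such as country or currency lists
  forcedError?: boolean;   // this test deliberately simulates a failure here
}

interface Decision {
  endpoint: string;
  strategy: StubStrategy;
  reason: string;
}

// Hypothetical rules mirroring Prompt 4, checked top to bottom.
function decide(ep: Endpoint, ownHosts: string[]): Decision {
  const row = (strategy: StubStrategy, reason: string): Decision => ({ endpoint: ep.url, strategy, reason });
  if (ep.forcedError) return row("STUB", "explicit error-path test");
  if (ownHosts.includes(new URL(ep.url).hostname)) return row("REAL", "system under test");
  if (ep.rarelyChanges) return row("RECORD-REPLAY", "rarely-changing reference data");
  return row("STUB", "third-party: brittle, slow, or billable");
}

function toTable(rows: Decision[]): string {
  const header = ["| endpoint | strategy | reason |", "| --- | --- | --- |"];
  return header.concat(rows.map((r) => `| ${r.endpoint} | ${r.strategy} | ${r.reason} |`)).join("\n");
}
```

In Playwright, STUB rows typically become `page.route(pattern, (route) => route.fulfill({ json }))`, and RECORD-REPLAY rows map naturally onto `page.routeFromHAR()` with a recorded HAR file.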

5. Flake triage + fixes

Read these flaky test results from the last 7 days: {flakeLog}. Classify each flake as: TIMING (needs a web-first assertion such as `expect(locator).toBeVisible()`, not an arbitrary wait), NETWORK (needs a stub or retry), STATE (state leaked from a previous test), ENVIRONMENT (CI vs local). For each, write a one-line fix recipe. Don't patch with retries; patch the root cause.

Variable: flakeLog (the flake log from the last 7 days)
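Before handing the log to the model, a cheap keyword pass can pre-sort the obvious cases. A sketch whose patterns are assumptions loosely modeled on common Playwright and Chromium error text; tune them to your own CI output and let the model handle whatever lands in UNKNOWN.

```typescript
type FlakeClass = "TIMING" | "NETWORK" | "STATE" | "ENVIRONMENT" | "UNKNOWN";

// Checked in order; the first matching pattern wins. Patterns are assumptions.
const RULES: Array<[FlakeClass, RegExp]> = [
  ["NETWORK", /ECONNRESET|ECONNREFUSED|net::ERR_|status(?: code)? 5\d\d/i],
  ["TIMING", /Timeout \d+ms exceeded|waitForTimeout/i],
  ["STATE", /already exists|duplicate key|unique constraint/i],
  ["ENVIRONMENT", /Executable doesn't exist|ENOSPC|out of memory/i],
];

// One-line fix recipes; none of them is "add a retry".
const FIX: Record<FlakeClass, string> = {
  TIMING: "replace the fixed wait with a web-first assertion, e.g. expect(locator).toBeVisible()",
  NETWORK: "stub the third-party call (Prompt 4) or fix the flaky dependency",
  STATE: "give the test its own data via fixtures (Prompt 8)",
  ENVIRONMENT: "align CI with local: browser build, disk, memory",
  UNKNOWN: "open the trace before changing the test",
};

function triageFlake(logLine: string): { cls: FlakeClass; fix: string } {
  const hit = RULES.find(([, pattern]) => pattern.test(logLine));
  const cls: FlakeClass = hit ? hit[0] : "UNKNOWN";
  return { cls, fix: FIX[cls] };
}
```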

6. PR-level e2e coverage

For this diff {diff}, decide whether new e2e tests are needed. Criteria: changes touch a critical journey (template 1) AND change observable user behaviour. If yes, draft 1-3 e2e test outlines (not full code). If no, say "unit / component test is sufficient" and stop.

Variable: diff
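Prompt 6's gate is an AND of two conditions. A sketch of a pre-filter a CI step could run before calling the model; the path-to-journey map is hypothetical, and whether the diff changes observable behaviour is passed in as a judgment (from the model or a reviewer), not computed.

```typescript
// Hypothetical map from critical journeys (Prompt 1) to the source paths they exercise.
const JOURNEY_PATHS: Record<string, string[]> = {
  checkout: ["src/checkout/", "src/payments/"],
  signup: ["src/auth/signup/"],
};

function touchedJourneys(changedFiles: string[]): string[] {
  return Object.entries(JOURNEY_PATHS)
    .filter(([, prefixes]) => changedFiles.some((f) => prefixes.some((p) => f.startsWith(p))))
    .map(([journey]) => journey);
}

// Both conditions must hold: a critical journey is touched AND user-visible behaviour changes.
function e2eDecision(changedFiles: string[], changesUserBehaviour: boolean): string {
  const journeys = touchedJourneys(changedFiles);
  if (journeys.length > 0 && changesUserBehaviour) {
    return `draft 1-3 e2e outlines for: ${journeys.join(", ")}`;
  }
  return "unit / component test is sufficient";
}
```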

7. Mobile / responsive

Add mobile coverage to this Playwright config. (1) Add 1 mobile project (Pixel 5) and 1 small-screen Chromium. (2) Pick 2 journeys from the existing suite to run on mobile (sign-up, checkout). (3) Use `test.use({ viewport })` for per-test overrides. Don't run the full suite on mobile.
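One way the resulting config could look. A sketch, assuming the two mobile journeys are tagged by putting `@mobile` in their test titles and that 360x640 is a reasonable small screen for your audience; the project names are arbitrary.

```typescript
// playwright.config.ts (excerpt): a sketch; the @mobile tag and 360x640 viewport are assumptions.
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  projects: [
    { name: "desktop-chromium", use: { ...devices["Desktop Chrome"] } },
    // Only tests with "@mobile" in the title (e.g. sign-up, checkout) run on the phone profile.
    { name: "mobile-pixel5", use: { ...devices["Pixel 5"] }, grep: /@mobile/ },
    {
      name: "small-screen-chromium",
      use: { ...devices["Desktop Chrome"], viewport: { width: 360, height: 640 } },
      grep: /@mobile/,
    },
  ],
});
```

Per-test overrides then live in the spec file, for example `test.use({ viewport: { width: 375, height: 667 } })` inside a `test.describe` block.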

8. Isolated test data

For this test {testName}, propose a data-setup strategy that doesn't depend on prod state: (1) create user via API not UI, (2) seed needed records via fixture, (3) clean up in `afterEach` even if the test fails. Show one example using API factories.

Variable: testName
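A framework-agnostic sketch of the create-via-API, clean-up-even-on-failure pattern. `ApiClient`, its three methods, and the SKU are hypothetical stand-ins for whatever test endpoints your backend exposes; `try` / `finally` plays the role of `afterEach`.

```typescript
// Hypothetical client for backend test endpoints; substitute your own.
interface ApiClient {
  createUser(email: string): Promise<{ id: string }>;
  createOrder(userId: string, sku: string): Promise<{ id: string }>;
  deleteUser(id: string): Promise<void>; // assumed to cascade to the user's orders
}

interface SeededUser {
  id: string;
  email: string;
}

async function withSeededUser<T>(api: ApiClient, body: (user: SeededUser) => Promise<T>): Promise<T> {
  // A unique address per call keeps parallel workers from sharing a user.
  const email = `e2e-${Date.now()}-${Math.random().toString(36).slice(2, 8)}@example.test`;
  // (1) create the user through the API, not the sign-up UI
  const user = await api.createUser(email);
  try {
    // (2) seed the records this test needs
    await api.createOrder(user.id, "SKU-E2E-1");
    return await body({ id: user.id, email });
  } finally {
    // (3) runs whether body() resolved or threw, like afterEach
    await api.deleteUser(user.id);
  }
}
```

In Playwright the same shape fits a test-scoped fixture: seed before `await use(user)` and delete after it; Playwright runs the code after `use` even when the test fails.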

9. CI sharding plan

Our Playwright suite takes 25 minutes. Design a sharding plan to bring it under 7 minutes on 4 shards: (1) Group tests by file (default) or by tag, (2) Avoid auth-state contention across shards, (3) Don't shard the smoke subset. Output the workflow YAML diff.
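The arithmetic behind Prompt 9's target: 25 minutes over 4 shards is about 6.25 minutes per shard at best, so the split has to be close to even to land under 7. A generic greedy sketch (longest file first, always onto the lightest shard) shows what balancing by duration means. It is an illustration, not Playwright's own `--shard` algorithm, and the file timings are made up.

```typescript
interface TestFile {
  file: string;
  minutes: number; // e.g. taken from the previous run's report
}

interface Shard {
  files: string[];
  minutes: number;
}

// Longest-processing-time-first: a classic greedy load-balancing heuristic.
function planShards(files: TestFile[], shardCount: number): Shard[] {
  const shards: Shard[] = Array.from({ length: shardCount }, () => ({ files: [], minutes: 0 }));
  const longestFirst = [...files].sort((a, b) => b.minutes - a.minutes);
  for (const f of longestFirst) {
    // Pick the shard with the least work so far (the first one wins ties).
    let lightest = shards[0];
    for (const s of shards) {
      if (s.minutes < lightest.minutes) lightest = s;
    }
    lightest.files.push(f.file);
    lightest.minutes += f.minutes;
  }
  return shards;
}

const shards = planShards(
  [
    { file: "checkout.spec.ts", minutes: 5 },
    { file: "signup.spec.ts", minutes: 4 },
    { file: "search.spec.ts", minutes: 4 },
    { file: "account.spec.ts", minutes: 3 },
    { file: "cart.spec.ts", minutes: 3 },
    { file: "billing.spec.ts", minutes: 2 },
    { file: "profile.spec.ts", minutes: 2 },
    { file: "logout.spec.ts", minutes: 1 },
  ],
  4,
);
```

These sample files total 24 minutes and every shard lands at exactly 6. Each resulting file list could be passed to its CI job as positional file filters to `npx playwright test`.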

10. Visual regression scope

Pick the 3 screens worth visual-regression-snapshotting: (1) homepage hero, (2) the one screen where layout drift hurts conversions most, (3) any screen with a CSS variable change in the diff. Don't snapshot pages that change with real data (lists, dashboards). Output the test stubs.

11. Accessibility checks in e2e

Add `@axe-core/playwright` to 3 critical pages: home, sign-up, checkout. For each: assert no violations of severity `serious` or higher. Allow `moderate` for now with a ticket comment. Don't fail on `minor` — too noisy. Show the helper and the 3 tests.
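The severity policy in Prompt 11 is easy to encode once results come back. A sketch of the filtering step only, with a minimal local type mirroring the `id` / `impact` fields of axe-core's violation objects; in a Playwright spec, `violations` would come from `new AxeBuilder({ page }).analyze()` in `@axe-core/playwright`.

```typescript
// Minimal local mirror of the axe-core fields used here; the real type has more.
type Impact = "minor" | "moderate" | "serious" | "critical";

interface Violation {
  id: string;
  impact?: Impact | null;
}

const FAILS_BUILD: ReadonlySet<Impact> = new Set<Impact>(["serious", "critical"]);

function triageViolations(violations: Violation[]): { blocking: Violation[]; ticketed: Violation[] } {
  return {
    // serious or higher: fail the test
    blocking: violations.filter((v) => v.impact != null && FAILS_BUILD.has(v.impact)),
    // moderate: allowed for now, each one needs a ticket comment
    ticketed: violations.filter((v) => v.impact === "moderate"),
    // minor (and missing impact) is dropped as too noisy
  };
}
```

Inside each of the three tests, the assertion then reads `expect(triageViolations(results.violations).blocking).toEqual([])`.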

12. Cypress → Playwright migration

I have {nTests} Cypress tests. Migrate them to Playwright in priority order: (1) journeys from template 1 first, (2) high-flake tests next (rewrite, don't port), (3) low-value tests last (consider deleting). Output a migration tracker with status per test.

Variable: nTests (number of tests)

13. A test plan for non-engineers
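Prompt 12's ordering is mechanical once each test carries a few flags. A sketch with hypothetical fields (`isJourney` from Prompt 1, a recent flake rate, a low-value flag) and a made-up 5% cutoff for "high-flake"; placing ordinary tests between high-flake and low-value is also an assumption, since the prompt names only three groups.

```typescript
type Action = "port" | "rewrite" | "consider deleting";

interface CypressTest {
  name: string;
  isJourney: boolean; // one of the Prompt 1 journeys
  flakeRate: number;  // share of recent runs that failed, then passed on retry
  lowValue: boolean;  // e.g. duplicates unit coverage
}

interface TrackerRow {
  name: string;
  priority: number; // 1 = migrate first
  action: Action;
  status: "todo" | "in progress" | "done";
}

const HIGH_FLAKE = 0.05; // hypothetical cutoff

function priorityOf(t: CypressTest): number {
  if (t.isJourney) return 1;
  if (t.flakeRate > HIGH_FLAKE) return 2;
  if (t.lowValue) return 4;
  return 3; // assumed: ordinary tests sit between high-flake and low-value
}

function actionFor(t: CypressTest): Action {
  if (t.flakeRate > HIGH_FLAKE) return "rewrite"; // don't port flaky logic 1:1
  if (t.lowValue && !t.isJourney) return "consider deleting";
  return "port";
}

function migrationTracker(tests: CypressTest[]): TrackerRow[] {
  return tests
    .map((t): TrackerRow => ({ name: t.name, priority: priorityOf(t), action: actionFor(t), status: "todo" }))
    .sort((a, b) => a.priority - b.priority);
}
```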

Turn the full e2e plan into a 1-page markdown doc for non-engineers: (1) Which journeys are tested, (2) Roughly which devices / browsers, (3) Approximate run time, (4) What we deliberately DON'T test (and why). Plain English. No "Playwright" / "fixtures" jargon.

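Common pitfalls

Trying to e2e-test every page: the suite takes 2 hours and nobody wants to run it.
Using CSS / XPath selectors: they break with the first refactor.
Fixing flake with waitForTimeout(2000): it only hides the root cause.
Logging in through the UI in every test: slow and brittle.
Stubbing your own backend API: production breaks while the tests stay green.
Full-page snapshots: dynamic values cause false failures every day.
Sharing users between tests: creates execution-order dependencies.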

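Which AI model to drive these prompts with

Any frontier model handles the planning work (picking journeys, flake triage, migration trackers). When you want an agent to read your repo and actually write test files, pick one with strong coding ability: as of June 2026, Claude Opus 4.7 leads SWE-bench Verified at 87.6%, with GPT-5.5 and Gemini 3.1 Pro close behind. Claude Code (Opus 4.7 / Sonnet 4.6) and Cursor can both run these models and execute npx playwright test to verify their own output. A practical loop: paste Prompt 1 to pick the journeys, then hand Prompts 3, 4, and 8 to the agent so it builds the login, network, and data fixtures before touching any test body. Microsoft now recommends that coding agents use the Playwright CLI rather than MCP, because the CLI uses about 4x fewer tokens per session.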

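Optimization tips

Keep e2e to 5-8 journeys; everything else goes into unit / component tests.
Use getByRole + accessible name: it survives refactors and improves a11y as a side effect.
Replace waitForTimeout with expect(locator).toBeVisible({ timeout }) (web-first assertions retry automatically).
Log in once per worker through the API and reuse storageState.
Stub third-party APIs (Stripe / OAuth / analytics); leave your own APIs real.
Tag flaky tests with @flaky, run them in a separate pipeline, and fix or delete them every week.
Run smoke (3 tests) on every PR; run the full suite only on main / nightly, with duration-aware sharding (Playwright 1.49+).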
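The @flaky quarantine lane from these tips can live in the same config as the regular suite. A sketch, assuming tests are tagged by putting `@flaky` or `@smoke` in their titles; the project names are arbitrary.

```typescript
// playwright.config.ts (excerpt): a sketch; tag conventions and project names are assumptions.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  fullyParallel: true,
  projects: [
    // Regular lane: everything except quarantined tests.
    { name: "main", grepInvert: /@flaky/ },
    // Quarantine lane, run by its own pipeline until each test is fixed or deleted.
    { name: "quarantine", grep: /@flaky/ },
  ],
});
```

A PR job can then run `npx playwright test --project=main --grep @smoke`, the main / nightly job runs `--project=main` with `--shard`, and the weekly clean-up reviews the `--project=quarantine` results.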

FAQ

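How many e2e tests is too many?: When the suite takes more than 10 minutes on a 4-core runner, or developers start ignoring red builds. Cut back to the core journeys and push everything else down to component / unit tests.
Cypress or Playwright in 2026?: With no legacy investment, go straight to Playwright. As of June 2026 it holds about 45% framework adoption versus about 14% for Cypress, and it handles multiple tabs, mobile emulation, and native parallelism more cleanly.
Should e2e tests gate merges?: The smoke subset should; the full suite should not, because it's too slow. Run the full suite on the main branch and roll back immediately when it goes red.
Can AI write the tests?: The skeleton, yes; the data setup, no. Agents will invent seed data that doesn't exist, so wire up the fixtures with Prompt 8 first.
What about third-party SSO logins?: Use a token endpoint or a `storageState` fixture; never drive the real Google or Okta login UI in CI.
Do I need visual regression testing?: For about 3 stable screens it's worth it; beyond that it doesn't pay off, because false positives from dynamic data cost more than the bugs it catches.
How do I cut CI time for a slow suite?: Shard. Pass `--shard=1/4` through `--shard=4/4` across 4 runners and enable `fullyParallel: true`; one team reported a 60-minute suite dropping to about 8 minutes after sharding it properly.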