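Data Interpretation Prompts: Table / Chart / Statistics Templates

Keep data interpretation free of spin: separate description from inference, report effect sizes honestly, hunt for confounders, detect chart misuse, and build a narrative fit for a paper.

A table does not explain itself. The 15 prompts below walk you through a careful reading of data: separating description from inference, surfacing effect sizes, catching confounders, spotting visualization misuse, and producing a narrative that can be both checked and written up.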

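Who this is for

Students of statistics and research methods, researchers reading a new dataset, journalists doing data reporting, business analysts building analysis decks, and anyone who has to write the data section of a paper.

When not to write prompts this way

Trivial data (a "5-person survey") does not warrant this much machinery. And when serious statistical analysis is needed, do not over-trust AI: it can suggest interpretations, but it cannot replace a statistician in making the key decisions.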

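Prompt structure formula

A data interpretation prompt should always carry these six elements:

  • Role: who the AI plays, such as research mentor, peer reviewer, exam coach, debate opponent, or librarian.
  • Context: level, discipline, deadline, number of papers, citation style, course or project.
  • Goal: one concrete deliverable, such as 12 questions, a 1-page literature matrix, 5 counterarguments, or a 4-week review plan.
  • Constraints: word count, depth, permitted source types, what to skip, what never to claim.
  • Output format: numbered list, table, JSON, or tiered blocks (E / M / H) that paste cleanly into Notion / Anki / Word.
  • Examples / signals: 1-2 reference passages or counterexamples ("do not explain it the way Wikipedia does").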

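Where these prompts fit

  • Statistics homework / lab reports
  • Results sections of papers
  • Data journalism explainers
  • Answers to exam chart questions
  • Data narratives for business decks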

15 copy-ready prompt templates

1. Description first

The default first read: get the description solid before inferring anything.

You are a data tutor. Below is a data table / chart description. (1) Describe what it shows in 3 plain sentences (no inference yet). (2) List the 3 most striking patterns. (3) Identify what we cannot conclude from this data alone. No causal language until the next step.

{paste data}

Variables to swap in: data description

Refinement tip: if the AI jumps ahead to inference, append: "Strictly descriptive in this step. Any sentence that uses cause / effect / leads / drives should be removed."

2. Effect size honesty

Below is a result: {paste statistic, e.g., r=0.18, p < .05, n=420}. Translate this into plain language: what the effect size means in practice, how confident I should be, and what the p-value does and does not tell me. End with: "this finding should change your behavior by..." or "...should not change your behavior because...".
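
To see what the example statistic in this template amounts to, the sketch below converts r and n into shared variance and an approximate 95% interval. It is an illustration only, not part of the template: the helper name `describe_correlation` is hypothetical, and the interval relies on the Fisher z approximation, which assumes roughly bivariate-normal data.

```python
import math

def describe_correlation(r: float, n: int, z_crit: float = 1.96) -> dict:
    """Summarize a Pearson r: shared variance plus an approximate 95% CI."""
    r_squared = r ** 2                 # share of variance the two variables have in common
    z = math.atanh(r)                  # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)        # approximate standard error on the z scale
    ci_low = math.tanh(z - z_crit * se)   # back-transform the lower bound to the r scale
    ci_high = math.tanh(z + z_crit * se)  # back-transform the upper bound to the r scale
    return {"r_squared": r_squared, "ci_low": ci_low, "ci_high": ci_high}

summary = describe_correlation(r=0.18, n=420)
print(f"shared variance: {summary['r_squared']:.1%}")
print(f"approx. 95% CI for r: {summary['ci_low']:.2f} to {summary['ci_high']:.2f}")
```

For r=0.18 and n=420 this gives about 3% shared variance and an interval of roughly 0.09 to 0.27: detectably non-zero, but small.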

3. Confounder hunt

I observed that {variable A} correlates with {variable B} in dataset {context}. List 5 plausible confounders, why each could explain the correlation, and what additional data would help distinguish them.

4. Chart misuse audit

Below is a description of a chart. Audit it for common visual misuse: truncated y-axis, dual axes without justification, area-vs-length confusion (3D pies), cherry-picked baseline, misleading color scales. Suggest a fixed version.

{paste chart description}
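
A truncated y-axis is the easiest item on that audit list to quantify. The sketch below compares how tall two bars look with and without a zero baseline. The function name `bar_height_ratio` and the values 50, 55, and 45 are hypothetical, chosen only for illustration, and the check assumes positive values drawn as upward bars.

```python
def bar_height_ratio(value_a: float, value_b: float, axis_start: float) -> float:
    """Ratio of the drawn bar heights when the y-axis begins at axis_start."""
    if axis_start >= min(value_a, value_b):
        raise ValueError("axis_start must lie below both values")
    return (value_b - axis_start) / (value_a - axis_start)

# Hypothetical chart: bar A = 50, bar B = 55.
honest = bar_height_ratio(50, 55, axis_start=0)      # y-axis starts at zero
truncated = bar_height_ratio(50, 55, axis_start=45)  # y-axis starts at 45

print(f"zero-based axis: bar B is drawn {honest:.2f}x as tall as bar A")
print(f"truncated axis:  bar B is drawn {truncated:.2f}x as tall as bar A")
```

The data differ by 10%, yet on the truncated axis bar B is drawn twice as tall as bar A.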

5. Confidence interval explainer

Explain this confidence interval in plain language for a {audience — undergraduate / executive / journalist}: {paste CI}. Cover: what it means, what it does not mean (common misinterpretation), one practical implication.
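
The most common misinterpretation is reading a 95% CI as "there is a 95% chance the true value lies in this particular interval." The 95% describes the procedure: across many repeated samples, about 95% of the intervals built this way contain the true value. The simulation below is a minimal sketch of that idea. The population parameters, sample size, and function name are all assumptions made for illustration, and it uses a known-sigma z-interval to keep the code short.

```python
import random
import statistics

def coverage_of_95_ci(true_mean=100.0, sigma=15.0, n=30, repeats=2000, seed=0):
    """Fraction of simulated 95% z-intervals that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5  # known-sigma interval, for simplicity
    hits = 0
    for _ in range(repeats):
        # Draw one sample and build its interval around the sample mean.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = statistics.fmean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / repeats

coverage = coverage_of_95_ci()
print(f"share of intervals covering the true mean: {coverage:.3f}")
```

Any single interval either contains the true mean or it does not; the coverage figure is a property of the method over many samples.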

6. "What is missing" probe

Below is a data summary. List 5 things that are missing or unclear: denominator, time window, sample frame, missing-data handling, outlier treatment. For each: how it could change the interpretation if addressed.

{paste summary}

7. Comparing two results

Compare these two results: {result A} and {result B}. Note: (a) which has the larger effect, (b) which is more precise, (c) which is more generalizable, (d) which deserves more weight in a decision and why.

8. Skeptic stress test

Pretend you are a skeptical reviewer. Below is my interpretation of the data. List 5 alternative interpretations consistent with the same data and 1 piece of additional evidence that would discriminate between them.

{paste my interpretation}

9. Common-sense Bayesian check

A study found {result}. Apply a common-sense Bayesian update: what was a reasonable prior before the study, how strong is this evidence, what should my posterior be? Express each step in plain language, no formulas required.
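
The prompt asks for a formula-free answer, but it helps to know the arithmetic you are checking it against. One way to sketch the update is the odds form of Bayes' rule: posterior odds = prior odds × likelihood ratio. The prior of 0.10 and the likelihood ratio of 4 below are made-up inputs, and `bayes_update` is a hypothetical helper name.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability from a prior probability and a likelihood ratio."""
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1 - prior)                # probability -> odds
    posterior_odds = prior_odds * likelihood_ratio  # Bayes' rule in odds form
    return posterior_odds / (1 + posterior_odds)    # odds -> probability

# Made-up inputs: a 10% prior, and evidence 4x likelier if the claim is true.
posterior = bayes_update(prior=0.10, likelihood_ratio=4.0)
print(f"posterior = {posterior:.2f}")
```

With these inputs, evidence that sounds strong moves a 10% prior to about 31%, which is why a surprising single study rarely settles a question.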

10. Multiple-testing alert

The paper tested {N} hypotheses and reported {M} significant at p < .05. Estimate how many we would expect to be "significant" by chance alone. Discuss whether the authors corrected for multiple comparisons and what to look for in the methods.
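
The chance-alone estimate this prompt asks for is simple arithmetic, sketched below under two stated assumptions: every null hypothesis is true, and the tests are independent. The function names and the value N = 20 are hypothetical. Bonferroni is shown as one common correction, not necessarily the one a given paper used.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' results if every null hypothesis is true."""
    return n_tests * alpha

def prob_at_least_one(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

n_tests = 20  # hypothetical N
print(f"expected false positives: {expected_false_positives(n_tests):.1f}")
print(f"P(at least one):          {prob_at_least_one(n_tests):.2f}")
print(f"Bonferroni threshold:     {bonferroni_threshold(n_tests):.4f}")
```

With 20 tests, one "significant" result is expected by chance alone, and the probability of at least one is about 64%.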

11. Plain-language story

Translate this data result into a 150-word plain-language story for a non-technical reader: setup, what was found, what it means, one caveat. Do not omit numbers; humanize them with comparisons (per 1000 people, per year, etc.).

{paste}

12. Decision-relevant takeaways

For a {decision-maker role} reading this data, what 3 takeaways actually matter for action? For each: the data point, the recommended action, the threshold at which the action should be revisited.

{paste data}

13. Results section draft

Draft a 200-word results section paragraph for a {social science / clinical / engineering} paper based on the following findings: {paste numbers}. Use neutral, descriptive academic voice; cite the statistical test, effect size, and CI / p-value.

14. Better visualization

Describe a better way to visualize this data, given my audience is {audience}: type of chart, key annotations, what to highlight, what to drop. Justify each choice in 1 sentence.

{paste data}

15. Limitations paragraph

Write a 150-word limitations paragraph for this data analysis: sampling, measurement, missing data, generalizability. End with the single most important caveat a reader should remember.

{paste study summary}

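Common pitfalls

  • Shouting "X causes Y" at the first sign of a correlation.
  • Reporting only the p-value without the effect size: a tiny effect can be "significant" and still useless in practice.
  • Dropping the denominator: is "30%" out of 10 people or out of 10,000?
  • Trusting AI arithmetic blindly: any number that feeds a decision must be checked against the raw data.
  • Ignoring missing data: what was dropped often matters more than what was reported.
  • Treating one study as the final word: a meta-analysis usually carries more weight than any single result.
  • Reading only the title and abstract while skipping the methods and figures.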
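
The denominator pitfall in the list above is easy to make concrete. The sketch below computes a 95% Wilson score interval for the same 30% observed in 10 people and in 10,000 people. The Wilson interval is one reasonable choice for proportions, picked here for illustration; the counts are the hypothetical ones from the bullet.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

small = wilson_interval(3, 10)         # 30% of 10 people
large = wilson_interval(3000, 10000)   # 30% of 10,000 people

print(f"n=10:     {small[0]:.1%} to {small[1]:.1%}")
print(f"n=10000:  {large[0]:.1%} to {large[1]:.1%}")
```

The headline percentage is identical, but with 10 people the plausible range runs from about 11% to 60%, while with 10,000 it stays within about a point of 30%.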

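Refinement tips

  • Always describe first (template 1) before using inferential language.
  • For every result, ask "compared with what?" and "in what units?"
  • Recompute decision-relevant numbers with a calculator or spreadsheet.
  • Plot the data yourself: reading a table and reading a chart yield different insights.
  • Run template 8 (stress test) on every conclusion you plan to publish or present.
  • Humanize numbers for general audiences (template 11): "per 1,000 people" or "per year" is more concrete than a percentage.
  • Keep a personal library of common chart misuses; pattern recognition saves time.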

FAQ

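  • Can AI run statistical analyses?: It can suggest which test to use, interpret output, and write the results narrative. Do the computation itself in a dedicated tool (R / Python / SPSS / Stata).
  • How do I tell whether the AI's interpretation is right?: Check the effect size, denominator, and units against the raw data. Run important conclusions through the template 8 stress test first.
  • What is the most common mistake in data interpretation?: Equating "statistically significant" with "practically important." Always report effect sizes in the original units.
  • Should I use AI for the data section of a paper?: Use it to draft and check the narrative; do not let it run the analysis or make the key decision about which test fits the data.
  • How do I handle confounding without a controlled experiment?: List the plausible confounders (template 3), stratify or match wherever you can, and write the rest into the limitations.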
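
Stratification, as mentioned in the last FAQ answer, can be sketched in a few lines. The counts below are made up purely for illustration (they come from no real study), and the stratum and group names are hypothetical. They are arranged so that the pooled comparison and the within-stratum comparisons point in opposite directions, which is the pattern a confounder can produce.

```python
# Made-up counts: (successes, total) for each group within each stratum.
strata = {
    "low_risk":  {"treated": (18, 20), "control": (64, 80)},
    "high_risk": {"treated": (32, 80), "control": (6, 20)},
}

def rate(successes: int, total: int) -> float:
    return successes / total

def pooled_difference(strata: dict) -> float:
    """Treated minus control success rate, ignoring the stratifying variable."""
    totals = {"treated": [0, 0], "control": [0, 0]}
    for groups in strata.values():
        for name, (s, n) in groups.items():
            totals[name][0] += s
            totals[name][1] += n
    return rate(*totals["treated"]) - rate(*totals["control"])

def stratified_differences(strata: dict) -> dict:
    """Treated minus control success rate, computed inside each stratum."""
    return {
        name: rate(*groups["treated"]) - rate(*groups["control"])
        for name, groups in strata.items()
    }

print(f"pooled difference: {pooled_difference(strata):+.2f}")
for name, diff in stratified_differences(strata).items():
    print(f"{name}: {diff:+.2f}")
```

In these invented numbers the treated group looks 20 points worse when pooled, yet 10 points better inside each risk stratum, because most treated cases sit in the high-risk stratum.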

Tags: #Prompt #Learning #Research