ChatGPT Data Analysis Workflow — Real Numbers, Not Vibes

Advanced Data Analysis turns ChatGPT into a real Python notebook. Here is how to use it without getting fooled by pretty charts.

A chart from ChatGPT looks authoritative, with neat axes, clean colors, and plausible numbers, and that is the danger. The code behind it can be silently wrong: it may have summed the wrong column, or the “outlier filter” may have quietly dropped half your data. This workflow is the friction that catches those errors before you ship the chart into a deck or a board meeting. It is aimed at non-analysts (ops, PMs, indie founders) who use ChatGPT for occasional analysis and want to stop being fooled by good-looking output.

What this tutorial solves

ChatGPT can write Python and run it on your CSV, but it easily produces confident-looking results from wrong code. This workflow forces verification at each step.

Who this is for

Anyone with CSV / Excel files who is not a full-time data analyst — operators, PMs, indie founders, researchers.

When to reach for it

Cleaning a messy CSV, producing a one-off chart, comparing two datasets, generating a summary table for a report.

When this is NOT the right tool

Production data pipelines, anything involving PII, datasets larger than a few hundred thousand rows, or work that must be reproducible by others.

Step by step

  1. Use the Advanced Data Analysis tool (Plus or higher). Without it, ChatGPT only reasons about your data in text; it does not actually compute anything.
  2. Upload one file at a time. Ask it to describe columns, types, row count, and any nulls before you ask analysis questions.
  3. For every chart, ask ChatGPT to show the underlying code. If the code is wrong, the chart is wrong; pretty does not mean correct.
  4. Cross-check totals: ask ChatGPT to print the sum or count of the column being analyzed, and verify against your raw file.
  5. For grouping or filtering, ask: “Show 5 sample rows from each group” so you can sanity-check the bucket.
  6. Export the cleaned dataset before the chat ends — sessions expire and you lose work.
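
The profiling, total, and per-group checks in steps 2, 4, and 5 can also be run on your own machine, which gives you numbers to compare against what ChatGPT reports. The sketch below is one way to do that with only Python's standard library. The function names (`profile_csv`, `column_total`, `sample_per_group`) and the `SAMPLE` data with its `order_id`, `month`, and `revenue` columns are hypothetical stand-ins, not part of any ChatGPT feature.

```python
import csv
import io
from collections import defaultdict

def profile_csv(text):
    """Return row count, column names, and blank-cell counts per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    nulls = {
        c: sum(1 for r in rows if (r[c] or "").strip() == "")
        for c in columns
    }
    return {"rows": len(rows), "columns": columns, "nulls": nulls}

def column_total(text, column):
    """Sum a numeric column, skipping blanks; also return how many values were summed."""
    total, used = 0.0, 0
    for row in csv.DictReader(io.StringIO(text)):
        value = (row[column] or "").strip()
        if value:
            total += float(value)
            used += 1
    return total, used

def sample_per_group(text, key, n=5):
    """Collect the first n rows of each group (first seen, not random)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if len(groups[row[key]]) < n:
            groups[row[key]].append(row)
    return dict(groups)

# Hypothetical sample data standing in for your raw file.
SAMPLE = """order_id,month,revenue
1,2024-01,100.0
2,2024-01,
3,2024-02,250.5
"""

report = profile_csv(SAMPLE)
total, used = column_total(SAMPLE, "revenue")
groups = sample_per_group(SAMPLE, "month")

print(report)
print(total, used)
for month, rows in groups.items():
    print(month, rows)
```

To run it on a real file, read the file's text with `open(path, newline="", encoding="utf-8").read()` (where `path` is your own file) and pass it in place of `SAMPLE`. If the total or row count differs from what ChatGPT printed, treat that as a reason to inspect its code before trusting the chart.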

Worked example, sales CSV cleanup: upload → ask for column summary → fix data types → ask “show me 5 rows where revenue is null” → decide drop or impute → group by month → export cleaned CSV + summary chart.
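
As a rough illustration of the sales cleanup above, here is what the type-fix, null-check, drop, group-by-month, and export steps might look like in plain Python. The column names (`date`, `revenue`), the sample values, and the decision to drop rather than impute are all assumptions made for the example. The code ChatGPT writes for your file will most likely use pandas and look different. The chart step is left out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: revenue arrives as text, one value is blank.
RAW = """date,revenue
2024-01-05,"1,200.50"
2024-01-20,
2024-02-03,300
2024-02-17,99.50
"""

def parse_revenue(raw):
    """Turn a string such as '1,200.50' into a float; return None for blanks."""
    cleaned = (raw or "").replace(",", "").strip()
    return float(cleaned) if cleaned else None

# Fix data types: revenue becomes a float or None.
rows = list(csv.DictReader(io.StringIO(RAW)))
for row in rows:
    row["revenue"] = parse_revenue(row["revenue"])

# Look at the null rows before deciding what to do with them.
null_rows = [r for r in rows if r["revenue"] is None]
print("rows with null revenue:", null_rows)

# Decision made for this example: drop the null rows (imputing is the alternative).
kept = [r for r in rows if r["revenue"] is not None]

# Group by month, using the YYYY-MM prefix of the date string.
monthly = defaultdict(float)
for r in kept:
    monthly[r["date"][:7]] += r["revenue"]
print(dict(monthly))

# Export the cleaned rows as CSV text (write this to a file to keep it).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["date", "revenue"])
writer.writeheader()
writer.writerows(kept)
cleaned_csv = out.getvalue()
```

The point of keeping `null_rows` as its own variable is that the drop-or-impute decision stays visible and reversible, instead of being buried inside a one-line filter.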

Common mistakes

  • Trusting a chart without seeing the code or the totals it was built from.
  • Uploading PII or financial data into a public ChatGPT account. Use a workspace with privacy guarantees, or strip identifiers first.
  • Letting ChatGPT decide what “outliers” are without showing you the rule.
  • Asking for “insights” instead of specific questions. You will get a generic-sounding paragraph.
  • Skipping the column-profile step and going straight to charting. Wrong type inference (numbers parsed as strings) silently breaks every aggregation downstream.
  • Letting the sandbox expire without exporting the cleaned data. Sessions die; “I can reload this tomorrow” is a lie you tell yourself.
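
When you ask ChatGPT to show its outlier rule, the answer should be something you can restate as code. One common convention, used here purely as an example and not as the rule ChatGPT necessarily applies, is to drop values more than 1.5 interquartile ranges outside the quartiles. The sketch below returns the cutoffs and the dropped values alongside the kept ones, so nothing disappears silently. The name `iqr_filter` and the sample numbers are hypothetical.

```python
import statistics

def iqr_filter(values, k=1.5):
    """Split values into kept and dropped using the k * IQR rule.

    Returns (kept, dropped, (low, high)) so the rule and its effect are visible.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    dropped = [v for v in values if not (low <= v <= high)]
    return kept, dropped, (low, high)

# Hypothetical order values with one obvious extreme.
values = [10, 12, 11, 13, 12, 11, 10, 500]
kept, dropped, bounds = iqr_filter(values)

print("cutoffs:", bounds)
print("dropped", len(dropped), "of", len(values), "values:", dropped)
```

Whatever rule is actually used, the useful habit is the last line: always see how many rows a filter removed, and which ones, before accepting the result.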

Advanced tips

  • Ask for the cleaned data as a CSV download at multiple checkpoints — recovery insurance.
  • For repeat analyses, save the final Python code and run it yourself next time. Way faster and reproducible.
  • Use “Show me the SQL equivalent of this analysis” to learn or to migrate to a real database.
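
To see what a SQL equivalent could look like, the sketch below runs a monthly revenue summary in SQLite through Python's built-in `sqlite3` module. The `sales` table, its columns, and the rows are made up for illustration. A SQL translation that ChatGPT gives you would need the same cross-checks as its Python.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2024-01-05", 1200.5),
        ("2024-01-20", None),   # missing revenue, stored as NULL
        ("2024-02-03", 300.0),
        ("2024-02-17", 99.5),
    ],
)

# Monthly totals, excluding rows with no revenue; n shows how many rows fed each total.
query = """
    SELECT substr(date, 1, 7) AS month,
           SUM(revenue)       AS total,
           COUNT(*)           AS n
    FROM sales
    WHERE revenue IS NOT NULL
    GROUP BY month
    ORDER BY month
"""
rows = conn.execute(query).fetchall()
for month, total, n in rows:
    print(month, total, n)

conn.close()
```

Including the row count `n` next to each total is a small design choice that makes the SQL version self-checking in the same way as asking ChatGPT to print counts.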

Output checklist

  • Every number in the final summary can be traced back to a column and a Python operation you saw.
  • You have spot-checked at least 5 raw rows against the summary stats.
  • No PII left in the uploaded file.
  • Cleaned dataset exported and saved locally.
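
The PII item on the checklist can be partly automated before upload. The following is a crude heuristic sketch, not a guarantee: it flags headers containing a few suspicious words and cell values that look like email addresses. The word list, the regular expression, and the `pii_warnings` name are assumptions for illustration, and an empty result does not prove a file is free of personal data.

```python
import csv
import io
import re

# Deliberately loose pattern: something@something.something
EMAIL = re.compile(r"[^@\s,]+@[^@\s,]+\.[^@\s,]+")
# Hypothetical list of header fragments worth a second look.
SUSPECT_HEADERS = ("name", "email", "phone", "address", "ssn")

def pii_warnings(text):
    """Return human-readable warnings for identifying headers and email-like cells."""
    reader = csv.DictReader(io.StringIO(text))
    warnings = []
    for col in reader.fieldnames or []:
        if any(word in col.lower() for word in SUSPECT_HEADERS):
            warnings.append(f"header looks identifying: {col}")
    for row in reader:
        for col, value in row.items():
            # Skip blanks and the list DictReader produces for surplus fields.
            if isinstance(value, str) and EMAIL.search(value):
                warnings.append(
                    f"line {reader.line_num}, column {col}: email-like value"
                )
    return warnings

# Hypothetical file with an email column and an email hidden in a free-text note.
SAMPLE = """order_id,customer_email,note
1,a@example.com,ok
2,,contact bob@example.org
"""

warnings_found = pii_warnings(SAMPLE)
for w in warnings_found:
    print(w)
```

Free-text columns are the usual place identifiers hide, which is why the sketch scans every cell and not just the columns whose headers look suspicious.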

FAQ

  • Does ChatGPT really run Python?: Yes. Advanced Data Analysis runs sandboxed Python on OpenAI servers. The free tier does not have this.
  • Is my CSV stored?: Files are tied to your account. Disable training in settings if that matters to you, and avoid uploading PII.
  • Reasoning model or fast model for analysis?: Reasoning model for anything past basic groupings, especially when the analysis has multiple steps you cannot easily verify.
  • What if the file is too big?: Sample it locally first (say, 100k rows). If you actually need full-dataset analysis, switch to a real notebook, because ChatGPT may silently analyze only a sample of an oversized file.
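
Sampling locally, as the last answer suggests, can be done without loading the whole file into memory. The sketch below uses reservoir sampling to keep a fixed number of data lines plus the header. `sample_lines` is a hypothetical helper, and it assumes one record per line, so it is not safe for CSVs with line breaks inside quoted fields.

```python
import random

def sample_lines(lines, k, seed=0):
    """Reservoir-sample k data lines from an iterable of lines, keeping the header.

    Every data line has the same chance of being kept, and only k lines
    are held in memory at a time. A fixed seed makes the sample repeatable.
    """
    rng = random.Random(seed)
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return []
    reservoir = []
    for i, line in enumerate(it):
        if i < k:
            reservoir.append(line)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return [header] + reservoir

# Hypothetical input: a header followed by 1000 data lines.
lines = ["id\n"] + [f"{n}\n" for n in range(1000)]
sample = sample_lines(lines, 100)
print(len(sample) - 1, "data lines kept")
```

With a real file you could pass the open file object directly, for example `sample_lines(open(path, encoding="utf-8"), 100_000)`, and write the returned lines to a new file. The sampled lines are not guaranteed to stay in their original order, so sort them afterwards if order matters.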
