ChatGPT Data Analysis Basics

Use ChatGPT's data analysis (with code) to read CSVs and make charts.

You have a CSV and a question, and you do not want to open a Jupyter notebook for it. ChatGPT’s Advanced Data Analysis runs real Python on your file, draws the chart, and explains what it did, but the output is only reliable if you ask specific, checkable questions. This is the entry guide: upload, describe, one chart at a time, and the sanity checks that catch most bad outputs.

What this covers

Use ChatGPT’s data analysis (Code Interpreter / Advanced Data Analysis) to read CSV and Excel files, profile columns, and produce single-question charts without firing up a notebook.

Key tools and concepts:

  • ChatGPT: OpenAI’s conversational AI assistant — the product that brought the GPT models to a mass audience.
  • Advanced Data Analysis: A feature on Plus and higher tiers (at the time of writing) that runs Python on uploaded files in a sandbox. Without it, the model only reasons about your data.
  • Data profile: A quick summary of columns, types, row count, nulls. Ask for this before any analysis.
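
For orientation, a data profile usually comes down to a few pandas calls. The sketch below is an assumption about what such code might look like, not the code ChatGPT is guaranteed to run; the inline sample data and the `profile` helper are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for an uploaded CSV.
SAMPLE_CSV = """channel,revenue,created_at
paid,120.5,2024-01-03
organic,,2024-01-04
paid,80.0,2024-01-05
email,45.25,2024-01-05
"""


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: inferred dtype, null count, and a sample value."""
    rows = []
    for name in df.columns:
        non_null = df[name].dropna()
        rows.append(
            {
                "column": name,
                "dtype": str(df[name].dtype),
                "nulls": int(df[name].isna().sum()),
                "sample": non_null.iloc[0] if len(non_null) else None,
            }
        )
    return pd.DataFrame(rows)


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = profile(df)
print(f"rows: {len(df)}")
print(summary.to_string(index=False))
```

In this sketch `created_at` is read as text rather than as a date, because no date parsing was requested. That is the kind of type mismatch the profile step is meant to surface.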

Who this is for

Anyone with occasional CSV or XLSX work who is not a full-time analyst — ops, PMs, indie devs, students. If you do this daily in a real notebook, this guide will feel slow.

When to reach for it

A one-off question about a small file. “What is the conversion rate by channel last month?” “Plot signups per day.” “Find rows where revenue is null.” For anything past ~100k rows or needing reproducibility, switch to a notebook.

When this is NOT the right tool

Files with PII or financial data that you are not allowed to upload to OpenAI. Datasets well beyond roughly 100k rows. Anything that has to run on a schedule or be re-run identically by someone else.

Before you start

  • Save a small, clean copy of the file: drop unrelated sheets, strip identifiers, and keep it under 50MB.
  • Write your specific question in one sentence before opening ChatGPT. Vague uploads get vague output.
  • Know the answer roughly. If your manual estimate disagrees with the chart by 10x, assume the chart is wrong until you have checked the code behind it.
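
The clean-copy step can be done locally with the standard library. This is a minimal sketch under stated assumptions: the column names in `DROP_COLUMNS`, the file names, and the `write_clean_copy` helper are all hypothetical, and dropping columns alone does not guarantee a file is free of sensitive data.

```python
import csv
import os
import tempfile

# Hypothetical identifier columns to strip before upload.
DROP_COLUMNS = {"email", "customer_id"}
MAX_BYTES = 50 * 1024 * 1024  # the 50MB ceiling suggested above


def write_clean_copy(src_path: str, dst_path: str, drop: set) -> int:
    """Copy a CSV without the listed columns; return the new file size in bytes."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        kept = [name for name in (reader.fieldnames or []) if name not in drop]
        # extrasaction="ignore" silently skips the dropped columns in each row.
        writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return os.path.getsize(dst_path)


# Demo on a throwaway file so the sketch runs end to end.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "raw.csv")
clean_path = os.path.join(workdir, "clean.csv")
with open(raw_path, "w", newline="", encoding="utf-8") as f:
    f.write("customer_id,email,channel,revenue\n")
    f.write("1,a@example.com,paid,120.5\n")
    f.write("2,b@example.com,organic,80\n")

size = write_clean_copy(raw_path, clean_path, DROP_COLUMNS)
print(f"{size} bytes, ok to upload: {size <= MAX_BYTES}")
```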

Step by step

  1. Open a chat in a model that supports file analysis (Plus or higher). Upload one file. Wait for the “ready” indicator.
  2. First message — never analysis, always profile: “Describe each column: type, sample values, null count, row total.” Read it carefully. Most bad analyses trace back to wrong type inference.
  3. For each chart, ask one specific question. “Plot the distribution of revenue for rows where channel = paid.” One chart per ask.
  4. After every chart, request the underlying Python code or the totals. “Print sum of revenue grouped by channel — does it match the chart?”
  5. For comparisons, give it the axes: “Plot signups over time, grouped by country, line chart, last 90 days only.”
  6. Export anything you might need later — the cleaned CSV, the chart PNG. Session sandboxes expire.
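
The totals check in step 4 can be sketched in pandas as follows. The data is hypothetical, and this is one plausible shape for the check rather than the exact code ChatGPT will produce.

```python
import io

import pandas as pd

# Hypothetical sample data; one row has a missing revenue value.
SAMPLE_CSV = """channel,revenue
paid,100.0
paid,50.0
organic,30.0
email,
organic,20.0
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Totals per channel: the numbers a bar chart of revenue by channel should show.
totals = df.groupby("channel")["revenue"].sum()

# Two independent checks: the groups must add up to the grand total,
# and every row must land in some group.
grand_total = df["revenue"].sum()
rows_per_channel = df.groupby("channel").size()

print(totals)
print("grand total:", grand_total)
print("rows counted:", int(rows_per_channel.sum()), "of", len(df))
```

Note that `email` comes out as 0.0 rather than missing, because `sum` skips nulls by default. A chart built from these totals would show a zero bar for a channel that actually has no recorded revenue.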

Example prompts that work

1. "Describe each column. Mark which look like dates,
   numbers, or categories. Flag any with >5% nulls."

2. "Group rows by month using created_at. Sum revenue
   per group. Show the result as a table, not a chart."

3. "Plot daily active users, last 30 days. X-axis = day,
   Y-axis = count. Annotate the highest and lowest days."

4. "Find rows where status = 'failed' and amount > 100.
   Show me 10 sample rows."
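
For comparison, prompt 2 maps onto a short pandas pipeline. A minimal sketch, assuming a `created_at` column in ISO date format and a numeric `revenue` column (both taken from the prompt; the sample rows are made up):

```python
import io

import pandas as pd

# Hypothetical sample data.
SAMPLE_CSV = """created_at,revenue
2024-01-03,100.0
2024-01-20,50.0
2024-02-01,30.0
2024-02-15,20.0
2024-03-09,10.0
"""

# parse_dates turns created_at into real datetimes instead of text.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["created_at"])

# Convert each timestamp to its calendar month, then sum revenue per month.
monthly = (
    df.assign(month=df["created_at"].dt.to_period("M"))
      .groupby("month")["revenue"]
      .sum()
)
print(monthly)
```

When you ask for the code behind a result, this is roughly the level of detail to expect, and the grouping line is the one worth reading closely.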

First-run exercise

  1. Pick a real small CSV you have lying around — under 10MB, no PII.
  2. Profile it (step 2 above). Read the column summary in detail.
  3. Ask one specific chart question. Ask for the code afterward.
  4. Cross-check one number against the raw file (open it in Excel or a pager such as less). If they match, that is good evidence the chart is right. If not, find out why.
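
If you prefer to script the cross-check in step 4, the standard library is enough, and using a different tool than the sandbox makes the check more independent. In this sketch the raw data, the `sum_by` helper, and `REPORTED_PAID_TOTAL` are hypothetical stand-ins for your file and the number ChatGPT reported.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical raw file contents; in practice, open the CSV you uploaded.
RAW_CSV = """channel,revenue
paid,100.0
paid,50.0
organic,30.0
email,
organic,20.0
"""

# Hypothetical number read off the chart or ChatGPT's summary.
REPORTED_PAID_TOTAL = 150.0


def sum_by(lines, key, value):
    """Sum a numeric column per group with the csv module; count blank values."""
    totals = defaultdict(float)
    blanks = 0
    for row in csv.DictReader(lines):
        text = (row[value] or "").strip()
        if not text:
            blanks += 1
            continue
        totals[row[key]] += float(text)
    return dict(totals), blanks


totals, blanks = sum_by(io.StringIO(RAW_CSV), "channel", "revenue")
matches = math.isclose(totals["paid"], REPORTED_PAID_TOTAL, rel_tol=1e-9)
print(totals, "blank values:", blanks, "paid matches:", matches)
```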

Quality check

  • Does the row count in the profile match what you expect? Wrong file uploaded is the #1 silent error.
  • Do column types make sense? Numbers parsed as strings will silently break aggregations.
  • For every chart, can you point to the line of Python that computed the headline number?
  • Does the chart actually answer your question, or just a paraphrase of it?
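
The column-type check is easiest to appreciate with a concrete case. The sketch below uses hypothetical data in which revenue carries currency formatting, so pandas reads it as text; the cleaning rule (strip `$` and `,`) is an assumption that fits this sample only.

```python
import io

import pandas as pd

# Hypothetical export where revenue carries currency symbols and separators.
SAMPLE_CSV = '''channel,revenue
paid,"$1,200.50"
organic,$300
email,unknown
'''

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# The column is text, not numeric, so numeric aggregations on it are unreliable.
is_numeric_before = pd.api.types.is_numeric_dtype(df["revenue"])

# Strip the formatting, then convert; unparseable values become NaN instead of raising.
cleaned = df["revenue"].str.replace(r"[$,]", "", regex=True)
df["revenue_num"] = pd.to_numeric(cleaned, errors="coerce")

print("numeric before cleaning:", is_numeric_before)
print(df)
print("total:", df["revenue_num"].sum())
```

If the profile shows a money or count column as text, ask ChatGPT to convert it and to report how many values failed to parse before you request any chart.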

How to reuse this workflow

  • Save the profile-first prompt as a template. It works for almost every CSV regardless of domain.
  • Build a Custom GPT called “Data peek” with the profile prompt as the default starter.
  • For files you re-analyze monthly, save the Python code that worked and re-run it yourself next time — way faster.

Upload → profile columns → one specific question → one chart → ask for code or totals → annotate findings → export the cleaned file. Total time for a small analysis: 10-20 minutes.

Common mistakes

  • Uploading huge files (>50MB) — uploads time out or get sampled silently.
  • Skipping the profile step. You will analyze the wrong columns and not notice.
  • Trusting visuals without checking axis labels and scale. Log-scale charts look clean but compress large differences, so confirm which scale was used.
  • Asking “give me insights” instead of a specific question. Generic prompt = generic paragraph.
  • Letting nulls and duplicates go unhandled. Ask explicitly: “How are nulls handled in this aggregation?”
  • Closing the chat without exporting the cleaned data. The sandbox expires and your work is gone.
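
To see why the nulls-and-duplicates question matters, here is a small sketch with hypothetical order data. It shows pandas defaults only; which treatment is correct depends on what a null means in your file.

```python
import io

import pandas as pd

# Hypothetical data: one missing amount and one exact duplicate row.
SAMPLE_CSV = """order_id,amount
1,100.0
2,
3,50.0
3,50.0
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# pandas skips nulls by default: this mean is over 3 values, not 4 rows.
mean_skipping_nulls = df["amount"].mean()

# Treating nulls as zero is a different choice and gives a different answer.
mean_nulls_as_zero = df["amount"].fillna(0).mean()

# Exact duplicate rows inflate totals; compare the sum before and after removing them.
deduped = df.drop_duplicates()

print("mean, nulls skipped:", round(mean_skipping_nulls, 2))
print("mean, nulls as zero:", mean_nulls_as_zero)
print("sum with duplicates:", df["amount"].sum())
print("sum without duplicates:", deduped["amount"].sum())
```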

FAQ

  • Does ChatGPT actually run code? Yes, with Advanced Data Analysis on Plus or higher. Free tier only reasons about data, no execution.
  • Will it remember my data across chats? No. Each new chat is a fresh sandbox. Use Projects if you need persistent context (but not persistent data).
  • What if my file is 500MB? Too big. Sample it locally first, or switch to a real notebook. ChatGPT will sample silently if you push it.
  • Can it handle Excel files with multiple sheets? Yes, but tell it which sheet to use. Default behavior picks the first.
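
For the oversized-file case, one way to sample locally is reservoir sampling, which reads the file once and never holds more than the sample in memory. This is a sketch of that technique, not something the guide prescribes; `sample_csv_rows` and the generated demo data are hypothetical.

```python
import csv
import io
import random


def sample_csv_rows(lines, k, seed=0):
    """Reservoir-sample k data rows from an iterable of CSV lines, keeping the header."""
    reader = csv.reader(lines)
    header = next(reader)
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(reader):
        if index < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep this row with probability k / (index + 1), replacing a random pick.
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = row
    return header, reservoir


# Demo on generated data; with a real file, pass open(path, newline="") instead.
big = io.StringIO("id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000)))
header, rows = sample_csv_rows(big, k=50)
print(header, len(rows), "rows sampled")
```

Write the header and sampled rows to a new CSV and upload that, and mention in your first prompt that the file is a random sample so that totals are not read as full-population numbers.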
