You have a CSV and a question, and you do not want to open a Jupyter notebook for it. ChatGPT’s Advanced Data Analysis runs real Python on your file, draws the chart, and explains what it did — but only if you ask in the right shape. This is the entry guide: upload, describe, one chart at a time, and the three sanity checks that catch most bad outputs.
What this covers
Use ChatGPT’s data analysis (Code Interpreter / Advanced Data Analysis) to read CSV and Excel files, profile columns, and produce single-question charts without firing up a notebook.
Key tools and concepts:
- ChatGPT: OpenAI’s conversational AI assistant — the product that brought the GPT models to a mass audience.
- Advanced Data Analysis: A Plus-tier feature that actually runs Python on uploaded files in a sandbox. Without it, the model only reasons about your data.
- Data profile: A quick summary of columns, types, row count, nulls. Ask for this before any analysis.
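If you want to see what a data profile amounts to, here is a minimal local sketch. It is not what ChatGPT runs in its sandbox; it uses only the Python standard library, the type guess is deliberately crude, and `profile_csv`, `infer_type`, and the sample data are hypothetical names made up for illustration.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty values (deliberately crude)."""
    if not values:
        return "empty"
    try:
        for v in values:
            float(v)
        return "number"
    except ValueError:
        return "text"


def profile_csv(text):
    """Return the row count plus a type guess, null count, and samples per column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = {}
    for name in (rows[0].keys() if rows else []):
        cells = [row[name] for row in rows]
        # Treat missing and whitespace-only cells as nulls.
        filled = [c for c in cells if c is not None and c.strip() != ""]
        columns[name] = {
            "type": infer_type(filled),
            "nulls": len(cells) - len(filled),
            "sample": filled[:3],
        }
    return {"rows": len(rows), "columns": columns}


# Hypothetical sample data, only for illustration.
SAMPLE = """channel,revenue,created_at
paid,120.50,2024-01-03
organic,,2024-01-04
paid,80,2024-02-11
"""

report = profile_csv(SAMPLE)
print("rows:", report["rows"])
for name, info in report["columns"].items():
    print(name, info)
```

Note that the date column comes back as "text" here. That is exactly the kind of type guess you want to catch in the profile before asking for a chart over time.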
Who this is for
Anyone with occasional CSV or XLSX work who is not a full-time analyst — ops, PMs, indie devs, students. If you do this daily in a real notebook, this guide will feel slow.
When to reach for it
A one-off question about a small file. “What is the conversion rate by channel last month?” “Plot signups per day.” “Find rows where revenue is null.” For anything past ~100k rows or needing reproducibility, switch to a notebook.
When this is NOT the right tool
Files with PII or financial data you cannot upload to OpenAI. Datasets well past roughly 100k rows. Anything that has to run on a schedule or be re-run identically by someone else.
Before you start
- Save a small clean copy of the file — drop unrelated sheets, strip identifiers, keep <50MB.
- Write your specific question in one sentence before opening ChatGPT. Vague uploads get vague output.
- Know the answer roughly. If your manual estimate disagrees with the chart by 10x, suspect the chart first and check how it was computed before you doubt your gut.
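The clean-copy step can be done locally before anything leaves your machine. Below is one possible sketch, assuming the identifiers live in named columns; `make_clean_copy`, the `email` column, and the sample rows are hypothetical. Keeping only the first rows is a shortcut for shrinking the file, not a random sample.

```python
import csv
import io


def make_clean_copy(text, drop_columns, max_rows=None):
    """Return CSV text without the listed columns, optionally truncated."""
    reader = csv.DictReader(io.StringIO(text))
    kept = [name for name in reader.fieldnames if name not in drop_columns]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns on write.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for index, row in enumerate(reader):
        if max_rows is not None and index >= max_rows:
            break
        writer.writerow(row)
    return out.getvalue()


# Hypothetical raw export with an identifier column.
RAW = (
    "email,channel,revenue\n"
    "a@example.com,paid,120\n"
    "b@example.com,organic,40\n"
    "c@example.com,paid,75\n"
)

clean = make_clean_copy(RAW, drop_columns={"email"}, max_rows=2)
print(clean)
```

Dropping a column only removes identifiers that sit in that column. Free-text fields can still contain names or emails, so skim the result before uploading.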
Step by step
- Open a chat in a model that supports file analysis (Plus or higher). Upload one file. Wait for the “ready” indicator.
- First message — never analysis, always profile: “Describe each column: type, sample values, null count, row total.” Read it carefully. Most bad analyses trace back to wrong type inference.
- For each chart, ask one specific question. “Plot the distribution of revenue for rows where channel = paid.” One chart per ask.
- After every chart, request the underlying Python code or the totals. “Print sum of revenue grouped by channel. Does it match the chart?”
- For comparisons, give it the axes: “Plot signups over time, grouped by country, line chart, last 90 days only.”
- Export anything you might need later: the cleaned CSV, the chart PNG. Session sandboxes expire.
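The "ask for the totals" step is easy to reproduce yourself when you want an independent number to compare against the chart. This is a rough sketch, not the code ChatGPT would produce; `totals_by` and the sample columns are hypothetical, and blank cells are skipped and counted so you can see how many rows were left out.

```python
import csv
import io
from collections import defaultdict


def totals_by(text, group_column, value_column):
    """Sum value_column per group, skipping blank cells and counting them."""
    totals = defaultdict(float)
    skipped = 0
    for row in csv.DictReader(io.StringIO(text)):
        raw = (row[value_column] or "").strip()
        if raw == "":
            skipped += 1
            continue
        totals[row[group_column]] += float(raw)
    return dict(totals), skipped


# Hypothetical sample data, only for illustration.
DATA = """channel,revenue
paid,100
organic,40
paid,60.5
organic,
"""

totals, skipped = totals_by(DATA, "channel", "revenue")
print(totals, "blank rows skipped:", skipped)
```

If these totals and the bars on the chart disagree, ask the chat which rows it filtered out or how it treated blanks before trusting either number.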
Example prompts that work
1. "Describe each column. Mark which look like dates,
numbers, or categories. Flag any with >5% nulls."
2. "Group rows by month using created_at. Sum revenue
per group. Show the result as a table, not a chart."
3. "Plot daily active users, last 30 days. X-axis = day,
Y-axis = count. Annotate the highest and lowest days."
4. "Find rows where status = 'failed' and amount > 100.
Show me 10 sample rows."
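For reference, prompt 2 boils down to something like the sketch below. It assumes `created_at` holds ISO dates (`YYYY-MM-DD`) and `revenue` is always filled in; your file may differ, which is why the profile step comes first. `revenue_by_month` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def revenue_by_month(text):
    """Sum revenue per calendar month, keyed as 'YYYY-MM'."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: dates look like 2024-01-03. Other formats raise ValueError.
        month = datetime.strptime(row["created_at"], "%Y-%m-%d").strftime("%Y-%m")
        totals[month] += float(row["revenue"])
    return dict(sorted(totals.items()))


# Hypothetical sample data, only for illustration.
ORDERS = """created_at,revenue
2024-01-03,100
2024-01-28,50
2024-02-11,80
"""

monthly = revenue_by_month(ORDERS)
for month, total in monthly.items():
    print(month, total)
```

A date in an unexpected format fails loudly here instead of landing in the wrong month, which is the behavior you want from a cross-check.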
First-run exercise
- Pick a real small CSV you have lying around — under 10MB, no PII.
- Profile it (step 2 above). Read the column summary in detail.
- Ask one specific chart question. Ask for the code afterward.
- Cross-check one number against the raw file (open it in Excel or a terminal pager such as less). If they match, you have verified that number and have good reason to trust the chart. If not, find out why.
Quality check
- Does the row count in the profile match what you expect? Uploading the wrong file is one of the most common silent errors.
- Do column types make sense? Numbers parsed as strings will silently break aggregations.
- For every chart, can you point to the line of Python that computed the headline number?
- Does the chart actually answer your question, or just a paraphrase of it?
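The column-type check matters because text that looks like numbers still compares and sorts as text. Here is a small illustration with made-up values; `to_number` is a hypothetical helper, and it assumes US-style formatting with `$` and comma thousands separators.

```python
# Hypothetical revenue cells that were read as text instead of numbers.
raw_revenue = ["9", "10", "250"]

# As text, comparison is character by character, so "9" beats "250".
text_max = max(raw_revenue)

# As numbers, the answer is what you would expect.
numeric = [float(v) for v in raw_revenue]
numeric_max = max(numeric)


def to_number(cell):
    """Convert a text cell like '$1,200.50' to float; return None if blank."""
    cleaned = cell.strip().replace("$", "").replace(",", "")
    return float(cleaned) if cleaned else None


print("max as text:", text_max)
print("max as numbers:", numeric_max)
print("parsed:", to_number("$1,200.50"))
```

If the profile reports a money column as text, currency symbols or thousands separators are a likely cause. Ask the chat to convert the column explicitly and to report how many cells failed to parse.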
How to reuse this workflow
- Save the profile-first prompt as a template. It works for almost every CSV regardless of domain.
- Build a Custom GPT called “Data peek” with the profile prompt as the default starter.
- For files you re-analyze monthly, save the Python code that worked and re-run it yourself next time — way faster.
Recommended workflow
Upload → profile columns → one specific question → one chart → ask for code or totals → annotate findings → export the cleaned file. Total time for a small analysis: 10-20 minutes.
Common mistakes
- Uploading huge files (>50MB) — uploads time out or get sampled silently.
- Skipping the profile step. You will analyze the wrong columns and not notice.
- Trusting visuals without checking axis labels and scale. A log-scale axis can make a 10x gap look like a small step.
- Asking “give me insights” instead of a specific question. Generic prompt = generic paragraph.
- Letting nulls and duplicates go unhandled. Ask explicitly: “How are nulls handled in this aggregation?”
- Closing the chat without exporting the cleaned data. The sandbox expires and your work is gone.
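To see why the nulls-and-duplicates question is worth asking, compare how one average shifts under different handling. The rows below are made up, and `dedupe` is a hypothetical helper that only removes exact duplicates.

```python
from statistics import mean

# Hypothetical (order_id, amount) rows with one null and one exact duplicate.
rows = [
    ("order-1", 100.0),
    ("order-2", None),
    ("order-3", 50.0),
    ("order-3", 50.0),
]


def dedupe(items):
    """Drop exact duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(items))


unique = dedupe(rows)
amounts = [amount for _, amount in unique]

# Option A: ignore rows with no amount.
mean_skipping_nulls = mean(a for a in amounts if a is not None)

# Option B: count missing amounts as zero.
mean_nulls_as_zero = mean(0.0 if a is None else a for a in amounts)

print("unique rows:", len(unique))
print("mean, nulls skipped:", mean_skipping_nulls)
print("mean, nulls as zero:", mean_nulls_as_zero)
```

Both averages are defensible and they differ by a third, so the chart is only meaningful once you know which choice was made.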
FAQ
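- Does ChatGPT actually run code?: Yes, when the data analysis tool is available in your chat. It has been tied to Plus and higher plans, and availability and limits change over time, so check what your tier allows. Without code execution the model only reasons about the data.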
- Will it remember my data across chats?: No. Each new chat is a fresh sandbox. Use Projects if you need persistent context (but not persistent data).
- What if my file is 500MB?: Too big — sample it first locally, or switch to a real notebook. ChatGPT will sample silently if you push it.
- Can it handle Excel files with multiple sheets?: Yes, but tell it which sheet to use by name. If you do not, it will usually read the first sheet.
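For the "sample it first locally" advice, one option is a single-pass random sample that never loads the whole file into memory. The sketch below uses reservoir sampling, which is my choice of technique, not something ChatGPT does for you. `sample_csv_lines` is a hypothetical helper, and it assumes one record per line, so it will break on CSVs with line breaks inside quoted fields.

```python
import io
import random


def sample_csv_lines(lines, k, seed=0):
    """Keep the header plus k data lines chosen uniformly at random in one pass."""
    rng = random.Random(seed)
    iterator = iter(lines)
    header = next(iterator)
    reservoir = []
    for index, line in enumerate(iterator):
        if index < k:
            # Fill the reservoir with the first k data lines.
            reservoir.append(line)
        else:
            # Replace an existing entry with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = line
    return [header] + reservoir


# Stand-in for a large file: 1000 generated rows (hypothetical data).
big = io.StringIO("id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000)))

sample = sample_csv_lines(big, k=50)
print(len(sample), "lines kept, including the header")
```

With a real file you would pass an open file handle instead of the `StringIO` stand-in and write the returned lines to a new, smaller CSV. The fixed seed keeps the sample repeatable.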