ChatGPT Reads CSV But Reports Wrong Column Names or Merged Columns

Upload a CSV and ChatGPT lists weird columns, merges two into one, or treats the header as data. Delimiter and encoding detection is the usual culprit. Here is the fast fix.

Published: May 24, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team

You upload a CSV, ask “what columns are in this file,” and ChatGPT comes back with garbage: one giant column, the header treated as data, or two clearly separate columns smushed together. The cause is almost always delimiter or encoding detection. ChatGPT’s Advanced Data Analysis sandbox loads the file with pandas.read_csv, and pandas defaults to a comma separator. The moment your file uses semicolons, tabs, or has an unescaped comma inside a field, the read goes sideways.

Fastest fix (most cases): tell ChatGPT the delimiter explicitly. Paste this into the chat:

Reload my file with pandas using sep=";" and encoding="utf-8-sig",
then print df.columns.tolist() and df.shape before doing anything else.

Swap sep=";" for sep="\t" (tab) or sep="|" (pipe) depending on what you have. If you do not know the delimiter, jump to Step 1 below to find it. If you control the source file, the cleanest fix is to re-export as comma-delimited UTF-8 (no BOM) or paste a small table as markdown so no detection happens at all.

Which bucket are you in?

Run one diagnostic first. Ask ChatGPT:

Show me df.columns.tolist() and df.shape, and print repr() of the first raw line of the file.

Then match the output to the table:

What you see	Cause	Jump to
One column named `Name;Email;Country`	Semicolon delimiter read as comma	Cause 1
One column with tabs visible as `\t` in the raw line	Tab-delimited (TSV)	Cause 1
`ParserError: Error tokenizing data. C error: Expected N fields...`	Unescaped comma inside a field	Cause 2
First column name prints as `\ufeffid` (or `ï»¿id`) and `df["id"]` raises `KeyError`	UTF-8 BOM	Cause 3
Column names look like a sentence (`Report: Q3 Sales`)	Header row is not line 1	Cause 4
Row count is way off, garbled splits	Mixed line endings	Cause 5

Common causes

Ordered by hit rate.

1. Non-comma delimiter (semicolon, tab, pipe)

European Excel exports use ; by default because the comma is the decimal separator in those locales. Database dumps often use a tab. Older systems use |. Pandas read_csv defaults to sep=",". There is no auto-detection unless you pass sep=None, which forces the slower Python engine and uses Python’s built-in csv.Sniffer to guess. The Sniffer is unreliable on quoted fields and single-column files (it can raise Could not determine delimiter). ChatGPT usually makes the default comma read, so each row of a non-comma file collapses into one column.

How to spot it: Ask ChatGPT to “show me df.columns and df.shape.” If you see one column name that looks like Name;Email;Country, that’s a semicolon CSV being read as comma.
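A minimal sketch of that symptom, using an in-memory sample instead of a real upload. The sample rows are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited sample standing in for an uploaded file.
raw = "Name;Email;Country\nAlice;a@x.com;US\nBob;b@x.com;DE\n"

# Default read: pandas assumes sep="," and finds no commas to split on.
collapsed = pd.read_csv(io.StringIO(raw))
print(collapsed.columns.tolist())  # one column holding the whole header line
print(collapsed.shape)

# Explicit delimiter: the three columns come back.
fixed = pd.read_csv(io.StringIO(raw), sep=";")
print(fixed.columns.tolist())
print(fixed.shape)
```

If the first printout shows a single name with semicolons in it, you are in this bucket.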

2. Unescaped commas inside fields

A column like Address containing 123 Main St, Suite 4 is fine if the field is quoted ("123 Main St, Suite 4"), but breaks if the export didn’t quote it. The comma in the address becomes a column boundary, and every subsequent value shifts one column right.

How to spot it: pandas throws ParserError: Error tokenizing data. C error: Expected 4 fields in line 12, saw 5. The line number tells you exactly which row has the rogue comma. If the bad rows are few and disposable, on_bad_lines="warn" reads the rest and tells you which lines it dropped. Do not use on_bad_lines="skip" on data you care about, because it deletes rows silently.
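Here is a small reproduction, assuming pandas 1.3 or later (the version that introduced on_bad_lines). The rows are invented. The bad row is deliberately not the first data row, since pandas can interpret an over-long first data row as an index column instead of raising.

```python
import io

import pandas as pd

# Hypothetical sample: the first row quotes its address, the second does not.
raw = (
    "id,name,address,city\n"
    '1,Bob,"9 Elm Rd, Apt 2",Boston\n'
    "2,Alice,123 Main St, Suite 4,Austin\n"
    "3,Cara,5 Oak Ave,Denver\n"
)

# Strict read (the default): the unquoted comma on file line 3 adds a fifth field.
error_message = None
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    error_message = str(exc)
print(error_message)

# Lenient read: keep the well-formed rows and report the dropped line.
partial = pd.read_csv(io.StringIO(raw), on_bad_lines="warn")
print(len(partial), "of 3 data rows kept")
print(partial["address"].tolist())
```

The quoted address survives intact; only the unquoted one is rejected.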

3. BOM or encoding header pollution

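UTF-8 with BOM puts an invisible byte-order mark (U+FEFF) at the start of the file. If the mark is not stripped on read, it becomes part of the first column name, so id becomes \ufeffid (displayed as ï»¿id when the bytes are decoded as Latin-1 or cp1252) and any df["id"] lookup fails. In Excel on Windows, the “CSV UTF-8 (Comma delimited)” option writes a BOM. The plain “CSV (Comma delimited)” option writes no BOM but uses the system ANSI code page, which can garble non-ASCII characters instead.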

How to spot it: First column name has a weird prefix, or df["id"] raises KeyError: 'id' even though id clearly exists. Fix it with encoding="utf-8-sig", which strips the BOM on read.
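To see what the two codecs do with the mark, here is a short sketch on hand-built bytes. The sample content is hypothetical; the BOM bytes are the standard UTF-8 ones.

```python
import io

import pandas as pd

# Hypothetical file contents: the three BOM bytes, then an ordinary CSV.
raw = b"\xef\xbb\xbfid,name\n1,Alice\n2,Bob\n"

# Plain "utf-8" keeps the mark as a real character in front of "id".
print(repr(raw.decode("utf-8").splitlines()[0]))

# "utf-8-sig" drops it while decoding.
print(repr(raw.decode("utf-8-sig").splitlines()[0]))

# The same codec name works directly in read_csv.
df = pd.read_csv(io.BytesIO(raw), encoding="utf-8-sig")
print(df.columns.tolist())
print(df["id"].tolist())
```

utf-8-sig also reads files that have no BOM, so it is a safe default for uploads of unknown origin.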

4. Header row not on line 1

Some exports put title metadata on lines 1-3 and the real header on line 4. Pandas uses line 1 as the header, so column names look like Report: Q3 Sales and the actual headers become data rows.

How to spot it: Column names look like English sentences instead of short identifiers. Fix with header=3 (zero-indexed: skips the first 3 lines, uses line 4 as the header) or skiprows=3.
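Both options can be checked side by side on a small sample. The title lines and values below are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical export: three title lines, then the real header on line 4.
raw = (
    "Report: Q3 Sales\n"
    "Generated 2026-06-01\n"
    "Source: CRM export\n"
    "id,region,revenue\n"
    "1,EMEA,100\n"
    "2,APAC,200\n"
)

# header=3 uses the fourth line (index 3) as the column names.
by_header = pd.read_csv(io.StringIO(raw), header=3)

# skiprows=3 discards the first three lines before looking for the header.
by_skiprows = pd.read_csv(io.StringIO(raw), skiprows=3)

print(by_header.columns.tolist())
print(by_header.equals(by_skiprows))
```

Count the title lines in the raw file first; an off-by-one here promotes a data row to the header.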

5. Mixed line endings (CR inside a CRLF file)

Old Mac files use \r, Windows uses \r\n, Unix uses \n. Mixing them inside one file (rare, but happens with copy-paste between systems) breaks row splitting. The C parser may see the whole thing as one or two giant rows.

How to spot it: df.shape shows a row count far from what you expect. Passing lineterminator is not enough here, because it accepts only a single terminator character; re-save the file with consistent endings, or open and re-export it (Step 3).
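If re-saving by hand is not practical, the endings can be rewritten in code before pandas sees the file. This is a rough sketch: normalize_newlines is a hypothetical helper, and it also rewrites line breaks that sit inside quoted fields, which may or may not be what you want.

```python
import io

import pandas as pd


def normalize_newlines(raw: bytes) -> bytes:
    """Rewrite CRLF and bare CR line endings as LF."""
    # Order matters: collapse CRLF first so it does not become two newlines.
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical file mixing CRLF, bare CR, and LF endings.
mixed = b"id,name\r\n1,Alice\r2,Bob\n3,Cara\r\n"

clean = normalize_newlines(mixed)
df = pd.read_csv(io.BytesIO(clean))
print(df.shape)
```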

Shortest path to fix

Step 1: Confirm the delimiter by reading the raw first line

Before ChatGPT loads the file with pandas, ask:

Open the file in binary mode, read the first 200 bytes, print as repr().
Don't use pandas yet.

You will see something like b'name;email;country\r\nalice;a@x.com;US\r\n'. The ; between fields is the delimiter, the \r\n is the line ending, and a leading \xef\xbb\xbf (if present) is the BOM. Now you know all three.
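The same inspection can be scripted. The sketch below is one possible shape for it: inspect_head and the candidate list are assumptions, not something ChatGPT runs by default, and the delimiter counts ignore quoting, so treat the highest count as a hint.

```python
import os
import tempfile

CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]
UTF8_BOM = b"\xef\xbb\xbf"


def inspect_head(path, n_bytes=200):
    """Report BOM, line ending, and delimiter counts from the first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(n_bytes)

    has_bom = head.startswith(UTF8_BOM)
    body = head[len(UTF8_BOM):] if has_bom else head

    if b"\r\n" in body:
        line_ending = "CRLF"
    elif b"\r" in body:
        line_ending = "CR"
    elif b"\n" in body:
        line_ending = "LF"
    else:
        line_ending = "unknown"

    lines = body.splitlines()
    first_line = lines[0] if lines else b""
    counts = {d: first_line.count(d.encode()) for d in CANDIDATE_DELIMITERS}
    return {"has_bom": has_bom, "line_ending": line_ending, "delimiter_counts": counts}


# Demo on a throwaway file: the sample bytes from this step, with a BOM added.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(UTF8_BOM + b"name;email;country\r\nalice;a@x.com;US\r\n")
    tmp_path = tmp.name

report = inspect_head(tmp_path)
os.remove(tmp_path)
print(report)
```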

Step 2: Re-read with the correct delimiter

import pandas as pd
df = pd.read_csv("data.csv", sep=";", encoding="utf-8-sig")
print(df.columns.tolist())
print(df.shape)
print(df.head())

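encoding="utf-8-sig" strips the BOM. sep=";" handles European CSVs; for tab use sep="\t", for pipe use sep="|". If fields have leading spaces after the delimiter, add skipinitialspace=True. For messy quoting that still fails, try engine="python", which is slower but more feature-complete than the C parser. If stray quote characters inside fields are the problem, quoting=csv.QUOTE_NONE (you’ll need import csv first) makes pandas treat quotes as ordinary text.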

If you genuinely do not know the delimiter and Step 1 was inconclusive, let pandas sniff it: pd.read_csv("data.csv", sep=None, engine="python"). This is a fallback, not a default, because the sniffer guesses wrong on quoted or single-column files.
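A quick sketch of the sniffing fallback on a made-up sample. It works here because the first line contains no commas or tabs for the Sniffer to prefer; real files can be less cooperative, so always print the result.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited sample.
raw = "name;email;country\nalice;a@x.com;US\nbob;b@x.com;DE\n"

# sep=None asks the Python engine to sniff the delimiter.
df = pd.read_csv(io.StringIO(raw), sep=None, engine="python")

# Always check the guess before trusting it.
print(df.columns.tolist())
print(df.shape)
```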

Step 3: If re-export is easier, save as proper UTF-8 CSV

In Excel: File - Save As - CSV UTF-8 (Comma delimited) (.csv). In Google Sheets: File - Download - Comma-separated values (.csv). Google Sheets writes comma-delimited UTF-8 with no BOM, which pandas reads cleanly with zero extra args. Excel’s “CSV UTF-8” adds a BOM, so still pass encoding="utf-8-sig" for Excel exports.

Step 4: For tiny tables, paste the first 5 rows as markdown

If the file is small or you only need a quick analysis, skip the upload entirely:

Here are the first 5 rows of my data:

| id | name  | email       | country |
|----|-------|-------------|---------|
| 1  | Alice | a@x.com     | US      |
| 2  | Bob   | b@x.com     | DE      |

Answer based on the schema above.

ChatGPT reads a markdown table directly as text, so no delimiter or encoding detection is involved.

Step 5: Convert to XLSX as a last resort

If the CSV is a mess and you cannot fix the source, open it once in Excel using Data - Get & Transform Data - From Text/CSV (the modern Power Query importer, which replaced the old Text Import Wizard). In the preview pane you can set the delimiter and the file origin (encoding) by hand before loading. Save the result as .xlsx and upload that. XLSX has explicit column types and no delimiter ambiguity. The sandbox reads it with pd.read_excel.

Note the upload ceiling: as of June 2026 ChatGPT accepts spreadsheets up to roughly 50 MB. Spreadsheets are exempt from the text token limit, so a large CSV that fails as a “document” may upload fine as data, but it still has to fit under the file-size cap.

How to confirm the fix

Once you have a clean read, run these three checks before trusting any analysis:

# 1. Row count matches the source
print("expected:", 12453)
print("got:    ", len(df))

# 2. Column count and names match
print(df.columns.tolist())

# 3. Spot-check three random rows
print(df.sample(3, random_state=42))

If all three agree with the source spreadsheet, the read is correct and you can trust everything downstream.
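The first two checks can also be wrapped in a small function that reports mismatches instead of relying on you to eyeball the printout. validate_read is a hypothetical helper, and the expected values are whatever you know about your own source file.

```python
import pandas as pd


def validate_read(df, expected_rows, expected_columns):
    """Return a list of problems; an empty list means the read looks right."""
    problems = []
    if len(df) != expected_rows:
        problems.append(f"row count: expected {expected_rows}, got {len(df)}")
    actual = df.columns.tolist()
    if actual != list(expected_columns):
        problems.append(f"columns: expected {list(expected_columns)}, got {actual}")
    return problems


# Hypothetical frames: one clean read and one collapsed by a wrong delimiter.
good = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"]})
collapsed = pd.DataFrame({"id;name": ["1;Alice", "2;Bob"]})

print(validate_read(good, 2, ["id", "name"]))
print(validate_read(collapsed, 2, ["id", "name"]))
```

The spot-check of random rows still needs a human; a function cannot tell whether the values are in the right columns.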

FAQ

Why does pandas not just auto-detect the delimiter? By default read_csv uses the fast C parser, which has no delimiter detection at all and assumes a comma. Detection only happens if you pass sep=None, which switches to the Python engine and csv.Sniffer. The Sniffer reads a sample and guesses, but it is unreliable on quoted fields and single-column files, so passing the delimiter explicitly is always safer.

ChatGPT keeps re-reading the file with a comma even after I tell it the delimiter. Why? ChatGPT often rewrites the load step from scratch in a later code block. If that block calls pd.read_csv("data.csv") without your sep= argument, it reverts to comma. Tell it: “reuse the df we already loaded with sep=";" and do not re-read the file.” Or ask it to print the exact read_csv line it ran so you can confirm the separator.

My first column is named \ufeffid (or ï»¿id) and lookups fail. What is that? That is a UTF-8 byte-order mark (BOM) glued to the first column name. Re-read with encoding="utf-8-sig" and the prefix disappears. Excel adds it when you save as “CSV UTF-8”.

I got Error tokenizing data. C error: Expected N fields in line X. What now? One row has more (or fewer) separators than the header, usually an unescaped comma inside a text field. Line X is the culprit; open it and check. The real fix is to re-export with every field quoted (quoting=csv.QUOTE_ALL). To read the rest and see what is being dropped, use on_bad_lines="warn".

Does this also apply to Excel (.xlsx) uploads? No. XLSX stores columns and types explicitly, so there is no delimiter or encoding to guess. If your CSV is unfixable, converting to XLSX (Step 5) sidesteps the whole class of problem.

Prevention

Standardize exports: always UTF-8, comma-delimited, no BOM, header on line 1.
For data with addresses, names, or any free text, quote every field (quoting=csv.QUOTE_ALL in Python’s csv module) so embedded commas never break the column count.
When sharing data with ChatGPT, prefer XLSX or markdown tables over CSV for anything under 1000 rows.
Always ask ChatGPT to print df.columns.tolist() and df.head(3) as the very first analysis step, which confirms the read worked before you trust any conclusions.
If you control the export pipeline, write a tiny validator that round-trips the CSV through pandas and asserts the column count matches expected.
For ongoing dashboards, write a one-time normalizer that takes any input CSV (any delimiter, any encoding) and emits a canonical UTF-8 comma CSV. Run it as a pre-step on every upload.
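A normalizer along those lines might look like the sketch below. normalize_csv is a hypothetical helper built on Python's csv module, and it assumes you pass the delimiter and encoding explicitly instead of guessing them, in keeping with the advice above.

```python
import csv
import io


def normalize_csv(raw: bytes, delimiter: str, encoding: str = "utf-8-sig") -> str:
    """Re-emit CSV bytes as comma-delimited text with every field quoted."""
    text = raw.decode(encoding)  # utf-8-sig also accepts files without a BOM
    # newline="" lets the csv module handle line endings inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical input: BOM, semicolons, CRLF endings, and a comma inside a field.
raw = b'\xef\xbb\xbfid;address\r\n1;"123 Main St, Suite 4"\r\n'
canonical = normalize_csv(raw, delimiter=";")
print(canonical)

# To save the result as UTF-8 without a BOM:
# with open("clean.csv", "w", encoding="utf-8", newline="") as fh:
#     fh.write(canonical)
```

Quoting every field on output means the embedded comma in the address can no longer shift columns on the next read.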

Tags: #ChatGPT #ChatGPT files #Troubleshooting #Debug #csv
