ChatGPT Reads CSV But Reports Wrong Column Names or Merged Columns

Upload a CSV and ChatGPT lists weird columns, merges two into one, or treats the header row as data — delimiter detection is the usual culprit.

You upload a CSV, ask “what columns are in this file,” and ChatGPT comes back with garbage: one giant column, the header treated as data, or two clearly separate columns smushed together. The cause is almost always the delimiter. Code Interpreter typically loads the file with pandas read_csv, which assumes a comma unless told otherwise, so it misreads semicolon-delimited European CSVs, tab-delimited TSVs, and files where a single field contains an unquoted comma. The fastest fix is to re-export as a clean comma-delimited UTF-8 CSV, or hand ChatGPT the first 5 rows pasted as a markdown table so it stops guessing.

Common causes

Ordered by hit rate.

1. Non-comma delimiter (semicolon, tab, pipe)

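European Excel exports use ; by default. Database dumps often use tab. Older systems use |. Pandas read_csv defaults to sep="," and does not guess on its own. Passing sep=None with engine="python" makes it fall back to Python's csv.Sniffer, which infers the delimiter from the start of the file and can guess wrong once you have quoted fields or odd whitespace. When the delimiter is wrong, the whole row is read as one column.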

How to spot it: Ask ChatGPT to “show me df.columns and df.shape.” If you see one column name that looks like Name;Email;Country, that’s a semicolon CSV being read as comma.
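The check above can also be done without pandas. The following is a minimal sketch using the standard-library csv.Sniffer; the helper name guess_delimiter, the candidate list, and the sample text are assumptions made for illustration, and the fallback is a crude heuristic rather than a guarantee.

```python
import csv

def guess_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample (hypothetical helper)."""
    try:
        # csv.Sniffer inspects the sample and returns a Dialect object.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide: fall back to whichever candidate
        # appears most often on the first line.
        first_line = sample.splitlines()[0] if sample else ""
        return max(candidates, key=first_line.count)

# Illustrative sample: a semicolon-delimited export.
sample = "Name;Email;Country\nAlice;a@x.com;US\nBob;b@x.com;DE\n"
detected = guess_delimiter(sample)
print(repr(detected))
```

The detected character can then be passed to pd.read_csv as sep. Treat the result as a hint and still look at df.columns afterwards.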

2. Unescaped commas inside fields

A column like Address containing "123 Main St, Suite 4" is fine if the field is quoted ("..."), but breaks if the export didn’t quote it. The comma in the address becomes a column boundary, and every later value in that row shifts one column right.

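How to spot it: The header has the expected number of columns, but pandas raises a ParserError along the lines of “Expected N fields in line M, saw N+1,” or, if bad lines are being skipped, the row count comes up short. In other cases one row simply has values landing under the wrong header.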
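To find the offending rows directly, you can count fields per row with the standard-library csv module. This is a minimal sketch; find_ragged_rows and the sample data are made up for illustration, and the row numbers it reports are parsed-row numbers, which differ from physical line numbers if quoted fields contain line breaks.

```python
import csv
import io

def find_ragged_rows(text: str, delimiter: str = ","):
    """Return (header_width, [(row_number, width), ...]) for rows whose
    field count differs from the header's (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    expected = len(rows[0])
    bad = [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if row and len(row) != expected  # ignore blank lines
    ]
    return expected, bad

text = (
    "id,name,address\n"
    '1,Alice,"123 Main St, Suite 4"\n'  # quoted comma: still 3 fields
    "2,Bob,45 Oak Ave, Apt 2\n"         # unquoted comma: 4 fields
)
print(find_ragged_rows(text))
```

Only the unquoted row is reported, which points at the export's quoting rather than at the reader.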

3. BOM or encoding header pollution

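UTF-8 with BOM puts an invisible byte-order mark (bytes EF BB BF, the character U+FEFF) at the start of the file. If the loader does not strip it, it is read as part of the first column name, so id becomes \ufeffid (or ï»¿id when the file is decoded as Latin-1) and any df["id"] lookup fails. Excel’s “CSV UTF-8” export writes a BOM.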

How to spot it: First column name has a weird prefix or df["id"] raises KeyError even though id clearly exists.
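A quick way to confirm a BOM is to look at the raw bytes. The sketch below uses only the standard library; the sample bytes are constructed for illustration and stand in for the start of a real file.

```python
import codecs

# Illustrative file content: a UTF-8 BOM followed by a two-column CSV.
raw = codecs.BOM_UTF8 + b"id,name\n1,Alice\n"

# The BOM is the three bytes EF BB BF at the very start of the file.
has_bom = raw.startswith(codecs.BOM_UTF8)

# Decoding as plain utf-8 keeps U+FEFF glued to the first header name;
# decoding as utf-8-sig strips it.
header_plain = raw.decode("utf-8").splitlines()[0].split(",")
header_clean = raw.decode("utf-8-sig").splitlines()[0].split(",")

print(has_bom)
print(header_plain)
print(header_clean)
```

The same utf-8-sig codec name is what pd.read_csv accepts in its encoding argument.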

4. Header row not on line 1

Some exports put title metadata on lines 1-3 and the real header on line 4. Pandas uses line 1 as header, so column names look like Report: Q3 Sales and the actual headers become data rows.

How to spot it: Column names look like English sentences instead of short identifiers.
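One rough way to locate the real header is to find the first row that has several non-empty cells. The sketch below assumes that title and metadata lines contain no delimiter, which will not hold for every export; find_header_row and the sample report are hypothetical.

```python
import csv
import io

def find_header_row(text: str, delimiter: str = ",", min_fields: int = 2):
    """Return the 0-based index of the first row with at least
    min_fields non-empty cells, or None (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    for index, row in enumerate(reader):
        if sum(1 for cell in row if cell.strip()) >= min_fields:
            return index
    return None

# Illustrative export: two title lines and a blank line before the header.
text = (
    "Report: Q3 Sales\n"
    "Generated by the billing system\n"
    "\n"
    "id,name,total\n"
    "1,Alice,10\n"
)
header_index = find_header_row(text)
print(header_index)
```

The resulting index is a candidate value for the skiprows argument of pd.read_csv. Print df.columns afterwards to confirm the header landed where you expect.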

5. Mixed line endings (CRLF inside a CR file)

Old Mac files use \r, Windows uses \r\n, Unix uses \n. Mixing them inside one file (rare, but happens with copy-paste) breaks row splitting.
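To check for mixed endings, count each kind in the raw bytes. This is a minimal sketch; count_line_endings and the sample bytes are illustrative.

```python
def count_line_endings(raw: bytes) -> dict:
    """Count CRLF, bare CR, and bare LF line endings (hypothetical helper)."""
    crlf = raw.count(b"\r\n")
    # Every CRLF contains one CR and one LF, so subtract those to get
    # the bare occurrences.
    return {
        "CRLF": crlf,
        "CR": raw.count(b"\r") - crlf,
        "LF": raw.count(b"\n") - crlf,
    }

# Illustrative content mixing all three styles.
raw = b"id,name\r\n1,Alice\r\n2,Bob\n3,Carol\r"
counts = count_line_endings(raw)
print(counts)

# One possible normalization: convert everything to LF. Note that this
# also rewrites line breaks inside quoted fields.
normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

More than one non-zero count means the file mixes styles and is worth normalizing before parsing.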

Shortest path to fix

Step 1: Confirm the delimiter by asking for the raw first line

Before ChatGPT loads the file with pandas, ask:

Open the file in binary mode, read the first 200 bytes, print as repr().
Don't use pandas yet.

You will see something like b'name;email;country\r\nalice;a@x.com;US\r\n'. The ; is the delimiter. Now you know.
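The code ChatGPT runs for that request might look roughly like the sketch below. peek_bytes is a hypothetical helper, and the temporary file only exists so the example is self-contained; in practice you would pass the path of the uploaded file.

```python
import os
import tempfile

def peek_bytes(path: str, n: int = 200) -> bytes:
    """Return the first n raw bytes of a file, with no decoding."""
    with open(path, "rb") as handle:
        return handle.read(n)

# Demo only: write a small semicolon-delimited file to a temp location.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"name;email;country\r\nalice;a@x.com;US\r\n")
    demo_path = tmp.name

head = peek_bytes(demo_path)
print(repr(head))  # repr() makes \r, \n, \t and any BOM bytes visible
os.remove(demo_path)
```

Reading in binary mode matters here: text mode would silently translate line endings and could hide a BOM.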

Step 2: Re-read with the correct delimiter

import pandas as pd
df = pd.read_csv("data.csv", sep=";", encoding="utf-8-sig")
print(df.columns.tolist())
print(df.shape)
print(df.head())

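encoding="utf-8-sig" strips the BOM. sep=";" handles European CSVs. For tab, use sep="\t". If you are unsure of the delimiter, sep=None with engine="python" lets pandas sniff it. Quoted fields are handled by default; if the file quotes with something other than a double quote, set quotechar.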

Step 3: If re-export is easier, save as proper UTF-8 CSV

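In Excel: File - Save As - CSV UTF-8 (Comma delimited) (.csv). In Google Sheets: File - Download - Comma-separated values (.csv). Both produce UTF-8. Excel’s version starts with a BOM and, on systems whose regional list separator is a semicolon, may still use ; as the delimiter, so read it with encoding="utf-8-sig" and check the first line. The Google Sheets export is comma-delimited, and pandas reads it with no extra arguments.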

Step 4: For tiny tables, paste the first 5 rows as markdown

If the file is small or you only need a quick analysis, skip the upload entirely:

Here are the first 5 rows of my data:

| id | name  | email       | country |
|----|-------|-------------|---------|
| 1  | Alice | a@x.com     | US      |
| 2  | Bob   | b@x.com     | DE      |

Answer based on the schema above.

ChatGPT reads a markdown table straight from the prompt, so no file parsing or delimiter detection is involved.

Step 5: Convert to XLSX as a last resort

If the CSV is a mess and you cannot fix the source, open it once in Excel with the Import wizard (specify delimiter manually), save as .xlsx, and upload that. XLSX has explicit column types and no delimiter ambiguity. Code Interpreter reads it with pd.read_excel.

How to confirm the fix

Once you have a clean read, run these three checks before trusting any analysis:

# 1. Row count matches the source
expected_rows = 12453  # placeholder: use the row count from your source file
print("expected:", expected_rows)
print("got:    ", len(df))

# 2. Column count and names match
print(df.columns.tolist())

# 3. Spot-check three random rows
print(df.sample(3, random_state=42))

If all three agree with the source spreadsheet, you’re good.
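If you do not know the expected row count, one option is to count records independently of pandas. The sketch below uses the standard-library csv module; count_data_rows is a hypothetical helper, and it assumes exactly one header row and that blank lines should be ignored.

```python
import csv
import io

def count_data_rows(text: str, delimiter: str = ",") -> int:
    """Count non-blank records after the header (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    records = [row for row in reader if row]  # drop blank lines
    return max(len(records) - 1, 0)           # subtract the header row

# Illustrative data: the quoted field spans two physical lines but is
# still a single record.
text = 'id;note\n1;"first line\nsecond line"\n2;ok\n\n'
rows_found = count_data_rows(text, delimiter=";")
print(rows_found)
```

Counting parsed records rather than physical lines avoids overcounting when quoted fields contain line breaks.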

Prevention

  • Standardize exports: always UTF-8, comma-delimited, no BOM, header on line 1.
  • For data with addresses, names, or any free text — quote every field (quoting=csv.QUOTE_ALL in Python’s csv module).
  • When sharing data with ChatGPT, prefer XLSX or markdown tables over CSV for anything under 1000 rows.
  • Always ask ChatGPT to print df.columns.tolist() and df.head(3) as the very first analysis step — confirms the read worked before you trust any conclusions.
  • If you control the export pipeline, write a tiny validator that round-trips CSV through pandas and asserts column count matches expected.
  • For ongoing dashboards, write a one-time normalizer that takes any input CSV (any delimiter, any encoding) and emits a canonical UTF-8 comma CSV. Run it as a pre-step on every upload.
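The validator and normalizer ideas in the list above can be combined in a small function. The following is a minimal sketch using only the standard library, not a production tool: normalize_csv is a hypothetical name, the cp1252 fallback and the candidate delimiters are assumptions, and delimiter sniffing can still guess wrong on unusual files.

```python
import codecs
import csv
import io

def normalize_csv(raw: bytes, candidates: str = ",;\t|") -> bytes:
    """Re-emit CSV bytes as UTF-8, comma-delimited, no BOM, all fields quoted."""
    try:
        text = raw.decode("utf-8-sig")  # strips a BOM if one is present
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is a Windows export in cp1252.
        text = raw.decode("cp1252", errors="replace")

    try:
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=candidates).delimiter
    except csv.Error:
        # Crude fallback: most frequent candidate on the first line.
        first_line = text.splitlines()[0] if text else ""
        delimiter = max(candidates, key=first_line.count)

    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = [row for row in reader if row]

    # Validation step: every row must have the same number of fields.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"inconsistent column counts: {sorted(widths)}")

    out = io.StringIO(newline="")
    csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
    return out.getvalue().encode("utf-8")

# Illustrative input: BOM, semicolons, CRLF line endings.
messy = codecs.BOM_UTF8 + b"id;name;city\r\n1;Alice;Berlin\r\n2;Bob;Paris\r\n"
clean = normalize_csv(messy)
print(clean.decode("utf-8"))
```

Raising on inconsistent column counts is a deliberate choice here: it stops a malformed file before it reaches ChatGPT instead of letting values shift silently.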
