ChatGPT vision (GPT-5.5 / 5-class) is genuinely useful but lossy: images are downscaled to the model’s max tile size (typically ≤ 2048px on the long edge) before processing, so small text, low contrast, and complex charts can all be misread. “It doesn’t understand” almost never means total blindness; it is usually low resolution and a vague prompt combining to cost precision. Upload at high resolution, then write a directed prompt that focuses attention on the right region, and most misreads are fixable.
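To make the downscaling loss concrete, here is a rough sketch of how tall a line of text ends up after the image is shrunk. The 2048px limit is the typical figure quoted above, not a guaranteed value, and the function name is made up for illustration.

```python
def text_height_after_downscale(width: int, height: int, text_px: float,
                                max_long_edge: int = 2048) -> float:
    """Approximate pixel height of text once the image has been downscaled."""
    long_edge = max(width, height)
    # Assumption: images already within the limit pass through unchanged.
    scale = min(1.0, max_long_edge / long_edge)
    return text_px * scale


# A 4096x2304 screenshot with 12px body text ends up with roughly 6px text.
print(text_height_after_downscale(4096, 2304, 12))  # 6.0
```

Text that lands well below 10px after scaling is a good candidate for cropping before upload.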
Common causes
Ordered by hit rate, highest first.
1. Resolution too low / image got compressed
The most common failure. A phone screenshot sent through WhatsApp and re-downloaded is now a thumbnail with long edge < 800px. Text gets blurry and is misread.
How to spot it: Click the image in ChatGPT to view original size. Long edge < 1024px = likely cause. Re-upload the original without any compression pipeline.
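If you would rather check the size without opening the image, the dimensions of a PNG can be read straight from its header. This is a minimal sketch that handles PNG only; the 1024px threshold is the one given above, and the helper names are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the IHDR chunk at the start of a PNG."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR begins with width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_likely_too_small(width: int, height: int, min_long_edge: int = 1024) -> bool:
    """True when the long edge is below the threshold from the text."""
    return max(width, height) < min_long_edge


# Synthetic 24-byte header standing in for the first bytes of a real file.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 750, 422)
size = png_size(header)
print(size, is_likely_too_small(*size))  # (750, 422) True
```

For a real file, pass the first 24 bytes read in binary mode to `png_size`.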
2. JPEG compression / blur on small text
JPEG produces ringing artifacts on small text / edges. OCR reads “0” as “O”, “5” as “S”.
How to spot it: Misreads concentrate on small text (< 12pt) = compression issue. Switch to PNG and re-upload.
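When the transcription contains IDs or amounts, it can help to list every character from the confusable pairs mentioned above so you know where to look. A small sketch, assuming only the two pairs named in the text ("0"/"O" and "5"/"S"); extend the set for your own fonts.

```python
# Only the pairs named in the text; this set is an illustrative starting point.
CONFUSABLE = set("0O5S")


def flag_confusable(text: str) -> list[tuple[int, str]]:
    """Return (index, character) for every character worth a manual check."""
    return [(i, ch) for i, ch in enumerate(text) if ch in CONFUSABLE]


print(flag_confusable("SKU-50O1"))  # [(0, 'S'), (4, '5'), (5, '0'), (6, 'O')]
```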
3. Dark-mode screenshot, low contrast
Dark background + dark grey text (IDE / Notion default themes) drops vision recognition accuracy noticeably.
How to spot it: Re-screenshot in light mode and accuracy jumps = contrast issue.
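To put a number on "low contrast", one option is the WCAG contrast ratio between the text color and the background color of your theme. This is a sketch only: WCAG is a human-readability guideline, and using its 4.5:1 threshold as a proxy for vision-model accuracy is an assumption, not something the model documents. The example colors are made up.

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance as defined by WCAG 2.x."""
    r, g, b = rgb
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0."""
    lighter, darker = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 1))  # 21.0
# Hypothetical dark theme: mid-grey text on a near-black background.
print(contrast_ratio((106, 106, 106), (30, 30, 30)) < 4.5)  # True
```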
4. Prompt doesn’t specify what to extract
“Look at this image” → model returns a vague description, never reads specifics. “Read every y-axis tick number” → it scans that region specifically.
How to spot it: Your prompt is just “look at / describe / what is this” = too generic.
5. The chart / data visual itself is missing context
No axis labels, no legend, similar colors, overlapping stacked bars — even a human can’t read it; the model can’t either.
How to spot it: Show your image to a colleague for 1 second; if they can’t read the numbers either, the image itself lacks signal.
6. Handwriting / archaic characters / rare fonts
Vision performs much worse on handwriting, traditional Chinese, Japanese kanji, and cursive than on standard printed text.
How to spot it: Recognition < 50% and all errors are these content types = current capability boundary. Transcribe first.
7. Screen glare / perspective distortion
Phone photos of computer screens have moiré patterns; photos of paper documents have perspective skew.
How to spot it: Regular striped pattern / text not horizontal / hotspot reflections = capture problem. Take a clean screenshot or shoot straight-on.
Before you start
- Confirm whether this happens in a plain chat, a Project, or a Custom GPT — vision capability is the same but quotas may differ.
- Back up the chat and original image before retesting so history doesn’t pollute the next diagnostic.
- Confirm your plan: Free users have limited daily vision calls — over-quota requests fail.
Info to collect
- Image’s actual resolution (W × H), file size (KB), format (PNG / JPG / HEIC).
- Origin: own photo, screenshot, sent by someone, downloaded.
- Full prompt text + the misread reply (which characters / numbers were wrong).
- Current model + whether in Project / Custom GPT.
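The format and file size from the list above can be gathered with a few lines of standard-library code. A sketch, assuming the file's leading bytes are enough to tell the three formats apart; the HEIF brand list is not exhaustive and `describe` is a hypothetical helper name.

```python
import os


def sniff_format(head: bytes) -> str:
    """Guess the image format from the first 12 bytes of the file."""
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if head.startswith(b"\xff\xd8\xff"):
        return "JPG"
    # HEIF containers carry an 'ftyp' box; these are common brands, not all of them.
    if head[4:8] == b"ftyp" and head[8:12] in (b"heic", b"heix", b"mif1"):
        return "HEIC"
    return "unknown"


def describe(path: str) -> dict:
    """Collect file name, detected format, and size in KB for a report."""
    with open(path, "rb") as f:
        head = f.read(12)
    return {
        "file": os.path.basename(path),
        "format": sniff_format(head),
        "size_kb": round(os.path.getsize(path) / 1024, 1),
    }


print(sniff_format(b"\xff\xd8\xff\xe0" + b"\x00" * 8))  # JPG
```

Sniffing the bytes rather than trusting the extension matters here because chat apps often re-encode a PNG as JPG while keeping the original file name.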
Shortest fix path
Ordered by ROI. The first two solve ~70% of cases.
Step 1: Re-export at 1500px+, use PNG
Low resolution is the most common and easiest fix:
- Use the system screenshot tool (macOS Cmd+Shift+4, Windows Win+Shift+S) and save the original without any compression hop.
- Move phone screenshots via AirDrop / Telegram-uncompressed / iCloud, not WhatsApp / WeChat.
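- For already-low-res images where the original is gone, upscale as a last resort: macOS Preview → Tools → Adjust Size → 1500px long edge, 300 dpi. Upscaling cannot restore detail that compression already destroyed, so re-exporting the original is always the better option.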
- PNG for text / screenshots, JPG for photos (quality 95+).
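If you script the resize instead of using Preview, the target dimensions for the 1500px long edge can be computed like this. It is a sketch of the arithmetic only; the actual resampling would be done by whatever image tool you use.

```python
def upscale_target(width: int, height: int, min_long_edge: int = 1500) -> tuple[int, int]:
    """Return new (width, height) so the long edge reaches min_long_edge."""
    long_edge = max(width, height)
    if long_edge >= min_long_edge:
        return width, height  # already large enough, leave untouched
    scale = min_long_edge / long_edge
    # Same factor on both axes keeps the aspect ratio.
    return round(width * scale), round(height * scale)


print(upscale_target(750, 422))  # (1500, 844)
```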
Step 2: Crop to the region you actually care about
Full-window screenshot → model attention is diluted. Crop tightly to the area that contains the actual question (even a region as small as 200×200) and recognition jumps:
# macOS built-in
Cmd+Shift+4 → drag a region → auto-saved to Desktop
# Dedicated tools
Snipaste / ShareX / Skitch (with annotation)
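When cropping is automated (for example, over a batch of screenshots), the crop rectangle has to stay inside the image. A minimal sketch that computes a square box around a point of interest and clamps it to the image bounds; the function name and the square shape are assumptions for illustration. The (left, top, right, bottom) order matches the box convention used by common imaging libraries such as Pillow.

```python
def crop_box(center_x: int, center_y: int, size: int,
             img_w: int, img_h: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for a square crop kept inside the image."""
    half = size // 2
    # Shift the box back inside when the center is near an edge.
    left = max(0, min(center_x - half, img_w - size))
    top = max(0, min(center_y - half, img_h - size))
    right = min(img_w, left + size)
    bottom = min(img_h, top + size)
    return left, top, right, bottom


# Point of interest near the top-right corner of a 1920x1080 screenshot.
print(crop_box(1900, 50, 200, 1920, 1080))  # (1720, 0, 1920, 200)
```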
Step 3: Directed prompt to focus attention
Don’t ask “what’s in this image.” Give a specific extraction task:
Text:
Transcribe every word visible in this image, in reading order
(top-to-bottom, left-to-right). Use "[unclear]" for any character
you cannot read with high confidence.
Chart data:
This is a bar chart. List each x-axis label and its corresponding
y-axis value as a two-column table. State your confidence (high /
medium / low) for each row.
UI screenshot:
Read the text inside the red box only. Ignore everything outside it.
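If you send these prompts often, keeping them as templates avoids drifting back to "what's in this image." A small sketch that stores the three prompts above and fills in the variable parts; the `build_prompt` name and the placeholder fields are illustrative choices.

```python
PROMPTS = {
    "text": (
        "Transcribe every word visible in this image, in reading order "
        "(top-to-bottom, left-to-right). Use \"[unclear]\" for any character "
        "you cannot read with high confidence."
    ),
    "chart": (
        "This is a {chart_type}. List each x-axis label and its corresponding "
        "y-axis value as a two-column table. State your confidence (high / "
        "medium / low) for each row."
    ),
    "ui": "Read the text inside the {color} box only. Ignore everything outside it.",
}


def build_prompt(task: str, **fields: str) -> str:
    """Fill the template for the given task; raises KeyError on unknown tasks."""
    return PROMPTS[task].format(**fields)


print(build_prompt("ui", color="red"))
```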
Step 4: Mark important regions on the image
Vision pays noticeably more attention to boxed / arrowed / highlighted areas:
- After screenshot, draw a red box / arrow with Preview / Snipaste / Skitch.
- In the prompt say “focus on the red box / arrow.”
- Multiple regions → different colors + prompt mapping (“green box = section A, red box = section B”).
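For the multi-region case, the color-to-section mapping can be generated rather than typed by hand, which keeps the prompt and the annotations in sync. A sketch with a hypothetical helper name and wording:

```python
def region_prompt(regions: dict[str, str]) -> str:
    """Build a prompt sentence mapping each box color to what it marks."""
    mapping = ", ".join(f"{color} box = {label}" for color, label in regions.items())
    return f"Focus only on the marked regions ({mapping}). Report each region separately."


print(region_prompt({"green": "section A", "red": "section B"}))
```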
Step 5: Invert dark-mode screenshots
Dark IDE / Notion screenshots recognize poorly:
- macOS Preview → Tools → Adjust Color → Invert.
- Or temporarily switch your IDE to light theme and re-screenshot.
- Or boost screen brightness + font size before capture.
In PowerPoint (Picture Format → Corrections) or Keynote (Format → Image) you can also adjust brightness / contrast.
Step 6: Transcribe handwriting / rare fonts first
Don’t fight recognition. Handwritten notes / archaic text / business cards:
- Handwriting: Apple Notes built-in handwriting → text conversion; GoodNotes / Notability export PDF + OCR.
- Printed classical text: Google Lens / ABBYY FineReader.
- Paste the transcribed text into ChatGPT — much more reliable than “read the image.”
How to confirm the fix
- Open a fresh chat, upload the same image, ask the same question — accurate = truly fixed (not a lucky guess last time).
- Have it describe where in the image the key numbers / characters are (“top right corner, just below the logo”) — position match means it actually read them.
- Have a colleague follow the same capture pipeline and upload — consistent accuracy means the pipeline is stable.
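The fresh-chat and colleague checks above both come down to comparing two transcriptions of the same image. One way to score that is a similarity ratio from the standard library; the sample strings are invented, and any pass threshold you pick is a judgment call rather than a rule.

```python
from difflib import SequenceMatcher


def agreement(first: str, second: str) -> float:
    """Similarity between two transcriptions, from 0.0 to 1.0."""
    return SequenceMatcher(None, first, second, autojunk=False).ratio()


run_a = "Total: 1,250 units"
run_b = "Total: 1,25O units"  # second run misread the final zero as the letter O

print(agreement(run_a, run_a))  # 1.0
print(agreement(run_a, run_b) < 1.0)  # True
```

A ratio of exactly 1.0 across fresh chats suggests the read is stable; anything lower points to the characters that still need a manual look.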
If still broken
- Cut the image to the minimum: keep only the area you need to read (even one character), see if the smallest case works.
- Swap image source: screen capture → re-photograph paper → AI-generate a sample of the same content — rules out a source-quality issue.
- Switch model: 4o vs GPT-5 vs Claude vs Gemini vision — strengths differ on handwriting / tables / charts.
- Package source image + prompt + error symptom + expected output, file a ticket at help.openai.com.
Prevention
- Always use system screenshot tools and save originals — never route through chat apps that recompress.
- For high-stakes recognition (contracts / financial tables): PNG + long edge ≥ 2000px + light background.
- For paper documents, use a scanner app (Adobe Scan / CamScanner) that corrects perspective and contrast — much better than a straight photo.
- Always require “state confidence per item” for numerical / text reads and double-check low-confidence ones manually.
- Before asking vision to read a chart, ask yourself: can I get the raw CSV / data? If yes, skip the image entirely.
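The per-item confidence habit from the list above is easier to keep if low-confidence rows are pulled out automatically. A sketch, assuming the reply comes back as a pipe-delimited table with three columns (label, value, confidence); the model is not guaranteed to use that layout, so treat the parser as a starting point.

```python
def low_confidence_rows(reply: str,
                        levels: tuple[str, ...] = ("low",)) -> list[tuple[str, str, str]]:
    """Return (label, value, confidence) rows whose confidence is in `levels`."""
    rows = []
    for line in reply.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # not a three-column table row
        label, value, confidence = cells
        # Header and separator rows fall through because they match no level.
        if confidence.lower() in levels:
            rows.append((label, value, confidence.lower()))
    return rows


reply = """| Label | Value | Confidence |
|---|---|---|
| Q1 | 120 | high |
| Q2 | 135 | low |
| Q3 | 98 | medium |"""

print(low_confidence_rows(reply))  # [('Q2', '135', 'low')]
```

Pass `levels=("low", "medium")` to widen the manual review to medium-confidence rows as well.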