Ollama Pull Stalls or Resets Mid-Download — Fixes

ollama pull freezes at a percentage, the bar runs backwards, or you see max retries exceeded: EOF. Diagnose network, disk, and partial-blob causes and resume cleanly.

Published: May 25, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You run ollama pull llama3.3:70b on a fast connection, it crawls to 47%, and the bar stops refreshing. Sometimes it runs backwards and the total size shrinks; sometimes it dies with max retries exceeded: EOF or connection reset by peer. This is almost never the registry being down. As of June 2026 (Ollama v0.30.x), pulls are split into up to 16 parallel byte-range parts that redirect from registry.ollama.ai to Cloudflare R2 object storage, so the failure is usually on your side of that transfer: a proxy or VPN that kills long streams, a full disk, or a leftover -partial chunk that fails its checksum.

Fastest fix (works for most stalls): disable any VPN, delete leftover partial files, and re-run the exact same ollama pull command — Ollama resumes completed parts and only re-fetches what failed:

# 1. turn off VPN / split-tunnel ollama if you must keep it on
# 2. clear half-written parts (NOT whole blobs)
find ~/.ollama/models/blobs -name '*-partial-*' -delete
# 3. resume
ollama pull llama3.3:70b

If that does not move it, work the buckets below in order.

Which bucket are you in?

| Symptom you see | Most likely cause | Jump to |
| --- | --- | --- |
| Bar freezes at the same % every retry | Corrupt `-partial` chunk failing checksum | Cause 3 |
| `max retries exceeded: EOF` after a long pause | TCP stream to R2 dropped (proxy/VPN/idle timeout) | Cause 1 |
| Stalls only behind office network or VPN | Proxy/VPN not routing `*.r2.cloudflarestorage.com` | Cause 4 |
| `no space left on device` or silent freeze near end | Disk or inodes full mid-write | Cause 2 |
| Slow but never errors, writes to a NAS/SMB path | `OLLAMA_MODELS` on a network mount | Cause 5 |
| Bar visibly runs backwards then recovers | Normal: a failed part is discarded and re-fetched | FAQ |

Common causes

Ordered by hit rate, highest first.

1. The TCP stream to Cloudflare R2 gets dropped

Each blob layer is fetched as parallel HTTP range requests, and the large layers redirect from registry.ollama.ai to *.r2.cloudflarestorage.com. Corporate proxies, home routers with aggressive idle-connection timeouts, and many VPNs silently drop a long-lived TLS connection after 30-120 seconds. Ollama’s downloader treats a part with no bytes written for 30 seconds as stalled and retries it (exponential backoff, up to 6 attempts per part as of v0.30.x); when all 6 are exhausted you get max retries exceeded: EOF.
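The retry pattern described above can be pictured with a small shell sketch. This is not Ollama's downloader, only an illustration of "retry with exponential backoff, give up after N attempts". The function name retry_with_backoff and the BASE_DELAY variable are hypothetical; the only number taken from the text is the limit of 6 attempts.

```shell
# Illustrative only: retry a command with exponential backoff.
# retry_with_backoff and BASE_DELAY are hypothetical names, not Ollama settings.
retry_with_backoff() {
  max_attempts=$1
  shift
  attempt=1
  delay=${BASE_DELAY:-1}          # seconds to wait before the first retry
  while :; do
    if "$@"; then
      return 0                    # the command succeeded
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "max retries exceeded after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))          # double the wait each round
    attempt=$((attempt + 1))
  done
}
```

A call such as retry_with_backoff 6 curl -sS -o /dev/null https://registry.ollama.ai/v2/ shows why a link that drops every connection ends in a hard failure: each attempt waits longer, and once the sixth fails there is nothing left to try.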

How to spot it. Watch the live connection while the pull hangs:

# Linux (-n prints IP addresses, not hostnames, so match on the owning process via -p)
sudo ss -tnp state established '( dport = :443 )' | grep -i ollama
# macOS
netstat -an | grep '\.443 .*ESTABLISHED'

If the connection to the R2 host disappears for tens of seconds at a time, the TLS session is being dropped. VPNs are the single most common trigger — disable yours and retry before anything else.

2. The target disk fills up mid-write

Ollama writes blobs to ~/.ollama/models/blobs/ (macOS), /usr/share/ollama/.ollama/models/blobs/ (Linux system service), or C:\Users\%username%\.ollama\models\blobs\ (Windows). If the volume runs out of space — or out of inodes — mid-download, writes block silently or surface as no space left on device.

How to spot it. Run both while the pull hangs:

df -h ~/.ollama       # is Use% at/near 100%?
df -i ~/.ollama       # is IUse% at 100%? (inode exhaustion)

llama3.3:70b at the default Q4_K_M quant is roughly a 40 GB download; the Q8_0 variant is about 70 GB. Keep free space of at least 1.2x the model size, which works out to about 48 GB and 84 GB respectively.
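The 1.2x rule is easy to check before a pull. The sketch below is one way to do it, assuming whole-number sizes in GiB and a POSIX df; the helper names required_kb, free_kb, and has_headroom are made up for this example.

```shell
# required_kb MODEL_GB: KiB needed for a model of that size at 1.2x headroom.
required_kb() {
  echo $(( $1 * 1048576 * 12 / 10 ))
}

# free_kb PATH: KiB available on the filesystem that holds PATH.
# -P forces the one-line-per-filesystem POSIX layout, -k reports KiB.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_headroom MODEL_GB PATH: succeeds when PATH has at least 1.2x MODEL_GB free.
has_headroom() {
  [ "$(free_kb "$2")" -ge "$(required_kb "$1")" ]
}
```

For example, has_headroom 40 "$HOME/.ollama" || echo "free up space before pulling" warns before the download starts instead of at 95%. It checks space only; inode exhaustion still needs df -i.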

3. Corrupt `-partial` chunk from an interrupted pull

A killed or crashed pull leaves resumable part files named sha256-<digest>-partial-* in the blobs directory. On the next pull Ollama scans for these and resumes them — but if a part was half-flushed and its bytes do not match the expected range checksum, the layer can wedge at the same percentage on every retry.

How to spot it. List leftover parts:

ls -lh ~/.ollama/models/blobs/ | grep -- '-partial-'

Any *-partial-* file from a prior session that keeps stalling at the same point is suspect. Deleting just the partials forces a clean re-fetch of only the failed layer (completed blobs are kept).
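Before deleting anything, it can help to see exactly which files the pattern matches. The sketch below is a dry run built on the same find pattern used in this article; list_partials and count_partials are hypothetical helper names, and nothing is removed.

```shell
# list_partials DIR: print leftover part files under DIR without deleting them.
list_partials() {
  find "$1" -type f -name '*-partial-*' -print
}

# count_partials DIR: number of leftover part files under DIR.
count_partials() {
  list_partials "$1" | wc -l | tr -d ' '
}
```

Running count_partials ~/.ollama/models/blobs before and after the cleanup confirms that the delete touched only part files and that completed sha256-<digest> blobs are still in place.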

4. Proxy or VPN only covers the registry, not the R2 CDN

This is the office-network classic. Your proxy rules or VPN split-tunnel allow registry.ollama.ai, so the manifest downloads instantly, but the large layer requests redirect to *.r2.cloudflarestorage.com, which is not whitelisted — so the first big layer hangs or resets.

How to spot it. Follow the redirect manually and see where the bytes actually come from:

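# request a blob without following the redirect and print where it points (expect an R2 host)
curl -s -o /dev/null -w 'status=%{http_code}\nredirect=%{redirect_url}\n' \
  "https://registry.ollama.ai/v2/library/llama3.3/blobs/<digest-from-debug>"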

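Fix. Route both hosts. Ollama uses HTTPS only for pulls, so set HTTPS_PROXY (the docs explicitly warn against HTTP_PROXY, which Ollama does not use for model pulls). The download runs inside the Ollama server process, not the ollama pull client, so the variables must be present in the environment that starts the server:

export HTTPS_PROXY=http://your-proxy:port
# or bypass the proxy for Ollama traffic entirely
export NO_PROXY=registry.ollama.ai,.r2.cloudflarestorage.com
# stop the background service or app, then start the server from this shell
ollama serve
# in a second terminal
ollama pull llama3.3:70b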

If your proxy does TLS interception, its CA must be in the system trust store or the R2 handshake fails.
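On a Linux install that runs Ollama as a systemd service, the proxy variable has to reach the server process, which does not inherit variables exported in your shell. One way to do that is a drop-in file, sketched below. The helper name write_proxy_dropin and the file name proxy.conf are assumptions made for this example; the drop-in directory is the standard systemd location for a unit named ollama.service, and http://your-proxy:port is the same placeholder used above.

```shell
# write_proxy_dropin PROXY_URL [DIR]: write a systemd drop-in that sets
# HTTPS_PROXY for the Ollama service. Hypothetical helper; run it as root
# when DIR is left at its default.
write_proxy_dropin() {
  proxy_url=$1
  dropin_dir=${2:-/etc/systemd/system/ollama.service.d}
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/proxy.conf" <<EOF
[Service]
Environment="HTTPS_PROXY=$proxy_url"
EOF
}
```

After writing the file, sudo systemctl daemon-reload followed by sudo systemctl restart ollama makes the service pick it up. Running sudo systemctl edit ollama.service and typing the same two lines by hand has the same effect.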

5. `OLLAMA_MODELS` points at a network-mounted path

If you set OLLAMA_MODELS=/mnt/nas/ollama or similar, sustained 5-40 GB writes over NFS or SMB can buffer and then block when the server’s write cache fills, which the downloader experiences as a stall.

How to spot it. echo $OLLAMA_MODELS — if it resolves to a network share, pull to a local disk first, then move the models/ directory across.
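A path does not always reveal that it sits on a share, so it can be worth asking the kernel which filesystem backs it. The sketch below is Linux-only and assumes findmnt from util-linux is installed. The helper names and the list of filesystem types are illustrative, not exhaustive.

```shell
# is_network_fstype TYPE: succeeds for common network filesystem type names.
is_network_fstype() {
  case $1 in
    nfs|nfs4|cifs|smb3|fuse.sshfs|9p) return 0 ;;
    *) return 1 ;;
  esac
}

# models_fstype PATH: filesystem type of the mount that holds PATH (Linux only).
models_fstype() {
  findmnt -n -o FSTYPE -T "$1"
}
```

To use it, pass the result of models_fstype "${OLLAMA_MODELS:-$HOME/.ollama/models}" to is_network_fstype. A match on nfs4 or cifs means the stall is probably the share rather than the download.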

6. Real-time antivirus scanning the blob writes

Real-time AV scanning every multi-gigabyte file as it lands can throttle disk I/O enough to trip the 30-second stall timeout on each part.

How to spot it. Temporarily disable real-time protection, restart the pull, and watch throughput. On Windows, add %USERPROFILE%\.ollama to Microsoft Defender exclusions (Settings → Privacy & security → Windows Security → Virus & threat protection → Manage settings → Exclusions).

Shortest path to fix

Step 1: Clear partial chunks and verify disk headroom

# remove half-written resumable parts (keeps completed blobs)
find ~/.ollama/models/blobs -name '*-partial-*' -delete

# delete any zero-byte blobs left behind
find ~/.ollama/models/blobs -type f -size 0 -delete

# confirm space AND inodes
df -h ~/.ollama
df -i ~/.ollama

Step 2: Disable VPN, then re-run with debug logging

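OLLAMA_DEBUG=1 ollama serve     # the server reads this flag; stop the running service or app first
ollama pull llama3.3:70b        # in a second terminal

With OLLAMA_DEBUG=1 the server log records each layer URL and byte range. If the pull wedges on a specific layer hash, note that hash for Step 3. Disabling any VPN first removes the most common stall trigger.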

Step 3: Test the real download path (R2), not just the registry

# registry should answer instantly
curl -sI https://registry.ollama.ai/v2/ | head -1

# follow a manifest/blob redirect to R2 and measure throughput
curl -sL -o /dev/null --max-time 60 -w 'speed=%{speed_download} B/s\nfinal=%{url_effective}\n' \
  "https://registry.ollama.ai/v2/library/llama3.3/blobs/<digest-from-debug>"

If final= lands on an *.r2.cloudflarestorage.com URL and speed is under ~1 MB/s or times out, the bottleneck is your path to R2 — go to Cause 4 (proxy/VPN) or try a different network.
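curl reports speed_download in bytes per second, which is awkward to compare against the ~1 MB/s threshold by eye. The sketch below converts the number and applies that threshold; speed_verdict is a hypothetical helper, and treating 1 MB as 1,000,000 bytes is an assumption.

```shell
# speed_verdict BYTES_PER_SEC: convert a curl speed_download value to MB/s
# and label it "slow" when it is under 1 MB/s.
speed_verdict() {
  awk -v bps="$1" 'BEGIN {
    mbps = bps / 1000000
    verdict = (mbps < 1) ? "slow" : "ok"
    printf "%.1f MB/s %s\n", mbps, verdict
  }'
}
```

For instance, speed_verdict 300000 prints 0.3 MB/s slow, which points at the path to R2, while speed_verdict 12500000 prints 12.5 MB/s ok and suggests looking at disk or partial-chunk causes instead.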

Step 4: Pull over a clean network or fix proxy routing

# bypass the proxy for both hosts (set this where the Ollama server starts, then restart it)
export NO_PROXY=registry.ollama.ai,.r2.cloudflarestorage.com
ollama pull llama3.3:70b

A phone hotspot is a fast way to prove the corporate network is the problem.

Step 5: Move OLLAMA_MODELS to local fast storage

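# use a LOCAL NVMe path, not a NAS
# server started from your shell: add this to ~/.bashrc or ~/.zshrc
export OLLAMA_MODELS=/path/to/local/ssd/ollama

# Linux systemd service: it does not read ~/.bashrc; add Environment="OLLAMA_MODELS=..."
# with sudo systemctl edit ollama.service, then sudo systemctl restart ollama
# macOS app: launchctl setenv OLLAMA_MODELS /path/to/local/ssd/ollama, then quit and relaunch
ollama pull llama3.3:70b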

Step 6: Prove the stack works with a tiny model first

ollama pull llama3.2:3b
ollama run llama3.2:3b "say hello"

If a 2 GB model pulls and answers cleanly, your install, disk, and registry auth are all fine — the 70B stall is purely a network-duration problem, so focus on Causes 1 and 4.

How to confirm it’s fixed

# 1. the model is registered
ollama list | grep llama3.3

# 2. no stray partial chunks remain
find ~/.ollama/models/blobs -name '*-partial-*'   # should print nothing

# 3. it actually loads and generates
ollama run llama3.3:70b "reply with the single word: ready"

A clean ollama list entry plus a real generated token means the blobs and the manifest both wrote successfully.
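For extra assurance you can also check a blob against its own name, since blob files are named sha256-<digest> after the hash of their contents. The sketch below assumes either sha256sum (Linux) or shasum (macOS) is installed; sha256_of and verify_blob are hypothetical helper names.

```shell
# sha256_of FILE: print the hex SHA-256 of FILE using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# verify_blob FILE: succeeds when the digest in the file name matches the content.
verify_blob() {
  expected=${1##*/}              # drop the directory part
  expected=${expected#sha256-}   # drop the prefix, leaving the hex digest
  [ "$(sha256_of "$1")" = "$expected" ]
}
```

Run verify_blob on a completed blob, for example the largest file in ~/.ollama/models/blobs, and skip anything with -partial in its name. Hashing a 40 GB file takes a while, so this is a spot check, not something to run after every pull.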

Prevention

Disable VPN (or split-tunnel registry.ollama.ai and *.r2.cloudflarestorage.com out of it) before pulling models over 10 GB.
Keep at least 1.2x the model size free; check df -h and df -i ~/.ollama before a large pull.
Behind a corporate proxy, set HTTPS_PROXY (never HTTP_PROXY) and whitelist both the registry and the R2 CDN.
Put OLLAMA_MODELS on local NVMe, never a network share.
Add ~/.ollama (or %USERPROFILE%\.ollama on Windows) to your antivirus exclusions.
Run the server with OLLAMA_DEBUG=1 for any pull over 20 GB so the log shows exactly which layer stalls.
After any interrupted pull, run find ~/.ollama/models/blobs -name '*-partial-*' -delete before retrying to avoid a checksum loop.

FAQ

Q: Does Ollama resume a download after a network drop, or restart from zero? A: It resumes. As of v0.30.x, Ollama stores resumable part files (sha256-<digest>-partial-*) and, on the next pull, skips completed blobs and continues the in-progress layer from where it stopped. The catch is corruption: if a part was half-flushed and fails its checksum, that layer is discarded and re-fetched. Deleting only the *-partial-* files (not whole blobs) gives the cleanest resume.

Q: Why does the progress bar run backwards and the total size shrink? A: Pulls run up to 16 parts in parallel and the displayed percentage is an aggregate. When one part fails its checksum or its TCP stream drops, that part’s bytes are discarded and re-downloaded, so the aggregate drops. It is annoying but expected behavior, not corruption — let it recover.

Q: My pull works at home but stalls on the office network. What’s different? A: Almost always the Cloudflare R2 redirect. Your proxy allows registry.ollama.ai (so the manifest loads) but blocks or throttles *.r2.cloudflarestorage.com where the real bytes live. Whitelist both hosts, or pull at home and copy the models/ directory in.

Q: Can I pre-download on another machine and copy the files over? A: Yes. Copy the entire ~/.ollama/models/ directory — both blobs/ and manifests/ — to the target machine. Run ollama list and the model appears with no new network download. Paths differ per OS (Linux system installs use /usr/share/ollama/.ollama/models).

Q: It downloaded to 100% but ollama list doesn’t show the model — why? A: All blobs landed but the final manifests/ write failed, usually from running out of space or a permission issue at the very end. Check ~/.ollama/models/manifests/ for the entry; if it’s missing, free space, fix permissions, and re-pull (completed blobs are reused).

Q: How do I pull a specific quantization instead of the default? A: Use the tag syntax, e.g. ollama pull llama3.3:70b-instruct-q4_K_M. The full tag list (Q4_K_M, Q5_K_M, Q8_0, etc.) is on the model’s library page at https://ollama.com/library/llama3.3/tags. Smaller quants download faster and fail less often on flaky links.
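The copy-between-machines answer above depends on both subdirectories arriving together. A minimal sketch of a copy that refuses to run when either one is missing follows; copy_models is a hypothetical helper, and it uses plain cp -R, so for a transfer over the network you would substitute whatever copy tool you normally use.

```shell
# copy_models SRC DEST: copy an Ollama models directory, requiring that both
# blobs/ and manifests/ exist under SRC.
copy_models() {
  src=$1
  dest=$2
  [ -d "$src/blobs" ] && [ -d "$src/manifests" ] || {
    echo "expected blobs/ and manifests/ under $src" >&2
    return 1
  }
  mkdir -p "$dest" && cp -R "$src/blobs" "$src/manifests" "$dest/"
}
```

For example, copy_models ~/.ollama/models /mnt/usb/models stages the files on a removable drive. On a Linux system install the copied files must also be readable by the account the service runs as, so check ownership after copying into /usr/share/ollama/.ollama/models.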

Tags: #local-llm #ollama #Troubleshooting
