Large File in History Blocks the Push

GitHub rejects your push because a file over 100 MB exists in history. Remove it with git filter-repo and push cleanly.

You committed a 250 MB dataset, realized the mistake, deleted it, committed the deletion, and ran git push origin main — only to see remote: error: File dataset.csv is 250.00 MB; this exceeds GitHub's file size limit of 100.00 MB. Deleting a file in a new commit does not remove it from Git history; every clone must still download all historical versions of the blob. The file must be surgically excised from every commit in history, which rewrites the commit SHAs — a destructive operation that requires coordination with teammates who have clones.
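The point that a deletion commit leaves the blob in history can be checked in a throwaway repository. The following is a minimal sketch, not part of the original workflow: the temporary directory, the tiny stand-in file, and the demo identity are all assumptions made for illustration.

```shell
#!/bin/sh
# Demo in a throwaway repository: a deleted file's blob stays reachable from history.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
# Identity is set per command so the demo does not depend on global git config.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

printf 'pretend this is 250 MB\n' > dataset.csv
git add dataset.csv
commit "add dataset"
blob=$(git rev-parse HEAD:dataset.csv)   # object ID of the file's contents

git rm -q dataset.csv
commit "delete dataset"

# The file is gone from the latest snapshot...
in_head=$(git ls-tree -r --name-only HEAD | grep -c '^dataset.csv$' || true)
# ...but the blob is still listed among the objects reachable from history.
in_history=$(git rev-list --objects --all | grep -c "^$blob " || true)
echo "in HEAD: $in_head, in history: $in_history"   # prints: in HEAD: 0, in history: 1
```

Because a push transfers every object reachable from the pushed refs, the remote still receives the blob and still rejects it.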

Common causes

Ordered by hit rate, highest first.

1. Large dataset or model weights committed directly

Data scientists often commit CSVs, HDF5 files, or ONNX model weights directly to the repo for “convenience,” unaware of GitHub’s 100 MB per-file limit or the effect on clone size.

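How to spot it: git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(objecttype) %(objectsize) %(rest)' | sort -k3 -n | tail -20 lists the 20 largest objects in history, with size in bytes in the third column and the path last. The %(rest) atom is required: without it, cat-file treats each "<sha> <path>" input line as a single object name and reports it as missing.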

2. Build artifact accidentally committed

A compiled binary (*.jar, *.exe, dist/*.js.gz larger than 100 MB) was committed as part of a “full release” commit. The CI server now fails on every fresh clone.

How to spot it: Check .gitignore; the large file’s extension is missing. git ls-files -z | xargs -0 ls -lh | sort -k5 -rh | head -10 shows the largest tracked files in the working tree (the -z/-0 pair keeps paths containing spaces intact).

3. Database dump or log file committed

A 2 GB pg_dump output (production.sql) was committed “temporarily” for a migration and never removed from the working tree before the next commit.

How to spot it: File extension is .sql, .log, .dump, or .bak. If the file is still present in HEAD, git cat-file -s HEAD:production.sql prints the blob size in bytes; otherwise substitute a commit that contains it for HEAD.

4. LFS migration started but not completed before push

Developer ran git lfs track "*.psd" and committed the .gitattributes change, but the existing large .psd blobs in history were never migrated from regular blobs to LFS pointers.

How to spot it: git lfs ls-files shows no files, but .gitattributes has LFS track rules. The large blobs are still regular objects.
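The mismatch described above (LFS rules present, blobs still regular objects) can also be detected without the git-lfs binary. The sketch below is an assumption-laden illustration: the function name unmigrated_lfs_files is hypothetical, and it only inspects files currently in the index, not older history. It relies on two facts: git check-attr reports the filter attribute from .gitattributes, and an LFS pointer file begins with the line "version https://git-lfs.github.com/spec/v1".

```shell
# unmigrated_lfs_files: print staged/tracked paths that .gitattributes assigns to
# LFS (filter=lfs) but whose stored blob is not an LFS pointer.
unmigrated_lfs_files() {
  git ls-files | while IFS= read -r f; do
    attr=$(git check-attr filter -- "$f")
    case "$attr" in
      *": filter: lfs")
        # Read the first 42 bytes of the indexed blob: the length of the pointer header.
        first=$(git cat-file -p ":$f" | head -c 42 | tr -d '\0')
        case "$first" in
          "version https://git-lfs.github.com/spec/v1"*) ;;   # already a pointer
          *) printf '%s\n' "$f" ;;                            # still a regular blob
        esac
        ;;
    esac
  done
}
```

Any path it prints still needs converting. Git LFS ships git lfs migrate import (for example with --include="*.psd" --everything) for rewriting existing history into pointers; like filter-repo, that changes commit SHAs.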

5. Submodule accidentally bundled instead of referenced

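A developer copied a dependency repo into the project directory and committed its entire contents as regular blobs. Git never tracks a directory named .git, so this only happens when the nested .git was deleted or renamed first; a renamed copy (for example .git.bak, a hypothetical name) drags the dependency’s whole object store into your history as loose files.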

How to spot it: git ls-tree -r -l HEAD | sort -k4 -n | tail -20 lists the 20 largest files in the current tree; the fourth column is the blob size in bytes.

Shortest path to fix

Step 1: Install git-filter-repo (preferred over BFG)

pip install git-filter-repo
# or: brew install git-filter-repo
git filter-repo --version   # confirm installation

Step 2: Identify all large blobs across all history

git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { sub(/^blob [0-9a-f]+ /, ""); print }' \
  | sort -rn \
  | head -20

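Each output line is a size in bytes followed by a path. Note the paths of files over 50 MB: GitHub warns at that size and rejects anything over 100 MB.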
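To list only the blobs above a chosen threshold instead of reading through the top 20, the same pipeline can be wrapped in a small function. This is a sketch under assumptions: the name blobs_over and its byte-count argument are hypothetical, not part of Git.

```shell
# blobs_over BYTES: print "size path" for every blob in history larger than BYTES,
# largest first.
blobs_over() {
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
    | awk -v limit="$1" '$1 == "blob" && $2 > limit { sub(/^blob /, ""); print }' \
    | sort -rn
}
```

For example, blobs_over 52428800 prints every blob over 50 MB; an empty result means nothing in history crosses that size.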

Step 3: Back up the repository before rewriting

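cd ..
cp -r myrepo myrepo-backup   # full copy, including .git, untouched by the rewrite
cd myrepo
# A backup tag inside myrepo would not help: filter-repo rewrites tags along with every other ref.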

Step 4: Remove the large file from all history

git filter-repo --path dataset.csv --invert-paths --force

For multiple files at once:

git filter-repo \
  --path dataset.csv \
  --path models/weights.onnx \
  --path dumps/production.sql \
  --invert-paths --force
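After the rewrite it is worth confirming that the removed paths no longer appear in any commit before pushing. A minimal sketch, assuming a hypothetical helper named path_in_history:

```shell
# path_in_history PATH: succeed (exit 0) if any commit on any ref touches PATH.
path_in_history() {
  [ -n "$(git log --all --format=%H -- "$1")" ]
}

# Hypothetical usage after the rewrite in Step 4:
#   if path_in_history dataset.csv; then echo "still present"; else echo "gone from every ref"; fi
```

The check walks every ref (--all), so it also catches a copy of the file that survives only on a side branch or tag.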

Step 5: Add the file to .gitignore to prevent recurrence

echo "dataset.csv" >> .gitignore
echo "*.onnx" >> .gitignore
git add .gitignore
git commit -m "chore: ignore large files that must not be committed"

Step 6: Force-push all branches and tags

git remote add origin <remote-url>   # if filter-repo removed the remote
git push --force --all               # --force-with-lease has no remote-tracking refs to compare against once the remote is re-added
git push --force --tags

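Notify all teammates: because every SHA has changed, they must re-clone or run git fetch --all && git reset --hard origin/main in every local checkout. The reset discards local commits and uncommitted changes on main, so anything unpushed should be saved first.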

Prevention

  • Add a pre-commit hook that blocks files over a size threshold:
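#!/bin/sh
# .git/hooks/pre-commit  (chmod +x)
LIMIT=52428800  # 50 MB in bytes
# Staged files that were added, copied, or modified (deletions have no staged blob).
git diff --cached --name-only --diff-filter=ACM | while IFS= read -r f; do
  size=$(git cat-file -s ":$f" 2>/dev/null || echo 0)   # size of the staged blob
  if [ "$size" -gt "$LIMIT" ]; then
    echo "ERROR: $f is $((size / 1048576)) MB, which exceeds the 50 MB limit." >&2
    exit 1   # exits the pipeline subshell; as the last command, its status fails the hook
  fi
done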
  • Use Git LFS for all binary assets over 10 MB: git lfs track "*.psd" "*.onnx" "*.zip".
  • Store datasets in object storage (S3, GCS, Azure Blob) and reference them by URL or hash in the repo.
  • Configure git config http.postBuffer 524288000 only as a last resort for legitimate large binary pushes — never as a workaround for files that should not be committed.
  • Add a .gitignore template at repo initialization time that covers common large-file extensions.
  • Run git-secrets or trufflehog as a pre-push hook to catch committed secrets. These tools do not check file size, so pair them with a size check such as the pre-commit hook above.
  • For monorepos, enforce a per-directory size budget via CI checks that fail the build if any committed blob exceeds the threshold.
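The CI size check from the last bullet could look roughly like the following. This is a simplified sketch: the function name check_tree_budget is hypothetical, it applies one repository-wide limit instead of per-directory budgets, and it inspects only the tree of a single revision instead of all history.

```shell
#!/bin/sh
# check_tree_budget LIMIT_BYTES [REV]: fail if any file in REV (default HEAD)
# is larger than LIMIT_BYTES.
check_tree_budget() {
  limit=$1
  rev=${2:-HEAD}
  # ls-tree -l prints: mode type object size<TAB>path
  offenders=$(git ls-tree -r -l "$rev" | awk -v limit="$limit" '$2 == "blob" && $4 > limit')
  if [ -n "$offenders" ]; then
    printf 'Files over %s bytes in %s:\n%s\n' "$limit" "$rev" "$offenders" >&2
    return 1
  fi
}
```

A CI job might call check_tree_budget 52428800 and let the non-zero status fail the build; a per-directory variant could run the same check against a path-limited ls-tree.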

FAQ

Q: Can I use BFG Repo Cleaner instead of git-filter-repo? A: Yes, BFG handles simple cases with a single command: bfg --delete-files dataset.csv. However, git filter-repo is the tool the Git project’s own documentation recommends in place of git filter-branch, handles edge cases better, and is actively maintained.

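Q: After filter-repo, my push is rejected with “stale info.” What do I do? A: git filter-repo removes the origin remote as a safety measure. Once you re-add it with git remote add origin <url>, the repository has no remote-tracking refs, so --force-with-lease has nothing to compare against and rejects the push. Push with --force instead.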

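Q: Teammates cloned before I rewrote history. What should they do? A: The safest approach is a fresh clone. Alternatively, teammates can run git fetch --all followed by git rebase --onto origin/main origin/main@{1} main to replay their local-only commits on top of the rewritten history. Here origin/main@{1} is the pre-rewrite position of origin/main taken from the reflog, so it is only correct immediately after the fetch that picked up the rewrite.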

Q: The large file was only in one commit months ago. Do I still need to rewrite all history? A: Yes. Git history is a chain of SHA-linked snapshots. Even if the file was “deleted” in commit 50, it still exists as a blob object referenced by commit 5. Removing it requires rewriting every commit from commit 5 onward.
