Monorepo Partial Clone Has Stale or Missing Objects

A blobless monorepo clone shows empty, outdated, or missing files after a pull. Backfill the missing blobs, re-cone the sparse checkout, and stop it recurring.

Published: May 25, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You cloned a large monorepo with git clone --filter=blob:none --sparse to check out only the packages/billing subtree. After a git pull, some files in that subtree are empty, some still show content from three months ago, and git status reports a clean working tree even though your editor is clearly displaying a stale package.json. The cause is almost always lazy fetching: a blobless clone downloads commits and trees but not file contents, and the blobs are only pulled on demand. When a checkout, a CI step, or a flaky network never triggers (or never completes) that fetch, you are left with empty or out-of-date files and no error.

Fastest fix (works in most cases): bulk-download every missing blob for your sparse paths in large batches, then re-run the checkout.

git fetch origin
git backfill --sparse        # Git 2.49+ (March 2025); pulls all missing blobs for sparse paths
git checkout HEAD -- .       # re-materialize the working tree from the now-complete object store

git backfill (introduced in Git 2.49.0) is the modern, supported way to fetch missing blobs in batches instead of one slow request per file. If your Git is older than 2.49, use the Step 2 fallback below. The rest of this guide diagnoses why objects went missing so you can pick the right fix and prevent a recurrence.

Which bucket are you in?

Symptom	Most likely cause	Jump to
Files empty / old; `git rev-list --missing=print` lists `?` objects	Blobs never lazy-fetched after the pull	Step 2
A whole moved/new directory is absent	Sparse cone does not cover the new path	Step 3
`cat-file -e` fails only in CI, works locally	`GIT_NO_LAZY_FETCH=1` set in the build env	Step 4
`git fetch` errors with a connection failure	Promisor remote URL is stale	Step 5
Random failures right after a large force-push	Server-side filter cache lagging HEAD	Cause 3 (wait it out)

Common causes

Ordered by hit rate, highest first.

1. Blobs were never lazy-fetched after the pull

A --filter=blob:none clone defers blob downloads until something reads the file. A plain git pull brings in new commit and tree objects but does not download the new blobs unless a checkout (or a command that reads file content) triggers the lazy fetch. If that never happens, the file stays at its last-fetched content or empty.

How to spot it: git ls-files --error-unmatch packages/billing/package.json returns without error (Git knows the file), but the blob is absent. Confirm without triggering a fetch:

git --no-lazy-fetch cat-file -e HEAD:packages/billing/package.json && echo present || echo MISSING

--no-lazy-fetch (equivalent to GIT_NO_LAZY_FETCH=1) makes cat-file -e report whether the object is locally available rather than silently fetching it.
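
To check several files at once, the same test can be wrapped in a small helper. This is a minimal sketch, not part of Git: `blob_state`, `check_paths`, and the `GIT` override variable are hypothetical names, and it uses the `GIT_NO_LAZY_FETCH=1` environment form so that no network request is made.

```shell
# Hypothetical helper: report whether an object is already on disk.
# GIT can be overridden (for example in tests); it defaults to the real git.
blob_state() {
  # $1 is an object spec such as HEAD:packages/billing/package.json
  if GIT_NO_LAZY_FETCH=1 "${GIT:-git}" cat-file -e "$1" 2>/dev/null; then
    echo "present  $1"
  else
    echo "MISSING  $1"
  fi
}

# Read one path per line on stdin and check each one against HEAD.
check_paths() {
  while IFS= read -r path; do
    [ -n "$path" ] && blob_state "HEAD:$path"
  done
  return 0
}
```

A possible invocation is `git ls-files packages/billing | check_paths | grep MISSING`, which prints only the files whose blobs are not yet local.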

2. Sparse-checkout cone does not include an updated directory

The initial checkout set packages/billing, but a refactor moved some code into packages/shared/billing-utils. A cone pattern that still only covers packages/billing never checks out the moved files, so they look “missing.”

How to spot it: git sparse-checkout list shows only packages/billing. Run git log --oneline -- packages/shared/billing-utils — if recent commits appear, the path is active but outside your cone.
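
The cone check can also be scripted. The sketch below assumes cone mode and reads the directory list that `git sparse-checkout list` prints; `dir_in_cone` is a hypothetical name, and the helper deliberately ignores the files that cone mode also includes directly inside each parent directory of a listed path.

```shell
# Hypothetical helper: succeed if directory $1 lies inside one of the cone
# directories read from stdin (one per line).
dir_in_cone() {
  target="${1%/}/"                 # normalize to exactly one trailing slash
  while IFS= read -r cone; do
    [ -n "$cone" ] || continue
    case "$target" in
      "${cone%/}/"*) return 0 ;;   # target is the cone directory or below it
    esac
  done
  return 1
}
```

For example, `git sparse-checkout list | dir_in_cone packages/shared/billing-utils || echo "outside the cone"` flags the moved directory from the scenario above. Comparing with a trailing slash keeps a sibling such as `packages/billing-old` from matching `packages/billing`.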

3. The server-side partial-filter cache is stale

On large monorepos hosted on GitHub Enterprise or GitLab, the server-side filter cache can briefly lag the true HEAD. A clone or fetch made within a minute or two of a large force-push may see an inconsistent set of filtered objects.

How to spot it: git fsck --connectivity-only reports missing objects immediately after a force-push, and the same fetch succeeds a few minutes later once the cache catches up. git fsck is promisor-aware, so it does not flag objects that are merely waiting to be lazy-fetched. Anything it reports here is genuinely missing, not just deferred.

4. `GIT_NO_LAZY_FETCH=1` blocks on-demand blob downloads

Some CI environments set GIT_NO_LAZY_FETCH=1 (or run commands as git --no-lazy-fetch ...) to prevent surprise network calls mid-build. In a partial clone, that setting blocks the lazy resolution that normally happens on first read, leaving files empty or absent.

How to spot it: echo "$GIT_NO_LAZY_FETCH" returns 1 in the build environment, and the failure reproduces only in CI, not on a developer machine.

5. Object database fragmented by concurrent `git gc` and `git fetch`

On a shared CI agent that reuses a partial-clone workspace, a git gc --prune=now can run alongside a git fetch and delete freshly downloaded blobs before anything references them. The default prune grace period normally protects recent objects, so look for jobs that override it.

How to spot it: git fsck reports missing or dangling blobs, and the CI log shows a gc process running in parallel with the fetch.

6. Promisor remote URL changed but the clone config did not

The promisor remote (the source for lazy-fetched blobs) was moved to a new hostname, but the clone’s remote.origin.url still points at the old one. Lazy fetches then fail.

How to spot it: git config remote.origin.url returns the old hostname, and git fetch returns a connection error.
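
A script can compare the configured host with the one you expect. The sketch below is an illustration only: `remote_host` is a hypothetical helper, it handles `scheme://` and scp-like `user@host:path` URLs, and it does not try to parse IPv6 literals.

```shell
# Hypothetical helper: print the host part of a Git remote URL.
remote_host() {
  url="$1"
  case "$url" in
    *://*)
      rest="${url#*://}"      # drop the scheme
      rest="${rest%%/*}"      # keep only the authority part
      rest="${rest##*@}"      # drop optional user info
      echo "${rest%%:*}"      # drop optional port
      ;;
    *@*:*)
      rest="${url#*@}"        # drop the user
      echo "${rest%%:*}"      # keep the text before the colon
      ;;
    *)
      return 1                # local path or unrecognized form
      ;;
  esac
}
```

Usage might look like `[ "$(remote_host "$(git config remote.origin.url)")" = "new-git-host.example.com" ] || echo "origin points at a different host"`, where the expected hostname is whatever your team has migrated to.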

Shortest path to fix

Step 1: Diagnose missing and stale objects

git fsck --connectivity-only 2>&1 | head -20
git rev-list --objects --all --missing=print | grep -c '^?'   # count of missing objects
git status

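If the rev-list count is above zero, blobs are still waiting to be fetched: go to Step 2. Any missing-object lines from git fsck are more serious, because fsck skips objects the promisor remote has promised (see causes 3 and 5), but Step 2 is still the first thing to try. If git status is clean and nothing is missing but files still look wrong, suspect the sparse cone and go to Step 3.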
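
The count and the routing decision can be wrapped in two small functions. This is a sketch under simple assumptions: `count_missing` and `next_step` are hypothetical names, and the routing covers only the two outcomes of this step.

```shell
# Hypothetical helper: count missing objects in `--missing=print` output.
# Reads rev-list output on stdin; missing objects are the lines that start
# with "?".
count_missing() {
  # grep -c exits non-zero when the count is 0, so ignore its exit status.
  grep -c '^?' || true
}

# Map the count in $1 to the step to try next.
next_step() {
  if [ "$1" -gt 0 ]; then
    echo "Step 2: backfill the missing blobs"
  else
    echo "Step 3: check the sparse-checkout cone"
  fi
}
```

For example: `next_step "$(git rev-list --objects --all --missing=print | count_missing)"`.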

Step 2: Backfill the missing blobs for your sparse paths

On Git 2.49.0 or newer, this takes a few large batched fetches rather than one request per file:

git fetch origin
git backfill --sparse        # downloads only blobs that match your sparse-checkout cone
git checkout HEAD -- .

git backfill is currently marked experimental but is the supported path; the default batch size is 50,000 objects (--min-batch-size). On older Git (pre-2.49), force the lazy fetch with a targeted checkout instead:

git fetch --filter=blob:none origin
git checkout HEAD -- packages/billing   # reading the files triggers lazy blob resolution
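
If the same script has to run on machines with different Git versions, the choice between the two paths can be automated. The following is a minimal sketch: `supports_backfill` and `fetch_sparse_blobs` are hypothetical names, and the version parsing assumes the usual `git version X.Y.Z` output.

```shell
# Hypothetical helper: succeed if the version string in $1
# (the output of `git --version`) is at least 2.49.
supports_backfill() {
  ver="${1#git version }"          # e.g. "2.49.0" or "2.39.3 (Apple Git-146)"
  major="${ver%%.*}"
  rest="${ver#*.}"
  minor="${rest%%.*}"
  for part in "$major" "$minor"; do
    case "$part" in
      ''|*[!0-9]*) return 1 ;;     # unparseable: assume backfill is unavailable
    esac
  done
  [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 49 ]; }
}

# Fetch blobs with whichever mechanism the installed Git offers.
# Arguments are the paths to check out on the fallback route.
fetch_sparse_blobs() {
  if supports_backfill "$(git --version)"; then
    git backfill --sparse
  else
    git checkout HEAD -- "$@"      # reading the files triggers the lazy fetch
  fi
}
```

Calling `fetch_sparse_blobs packages/billing` then works on either side of the 2.49 boundary. Treating an unparseable version as "no backfill" is a deliberate choice, since the checkout fallback works on any Git that supports partial clone.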

Step 3: Update the sparse-checkout cone to include new paths

git sparse-checkout set --cone \
  packages/billing \
  packages/shared/billing-utils \
  packages/shared/types
git sparse-checkout reapply
git checkout HEAD -- packages/shared/billing-utils

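git sparse-checkout set already updates the working tree to match the new cone. git sparse-checkout reapply is a safety net: it re-applies the patterns to any paths that an earlier merge, rebase, or local modification left out of sync.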

Step 4: Unset GIT_NO_LAZY_FETCH if it is blocking fetches

unset GIT_NO_LAZY_FETCH
git checkout HEAD -- packages/billing/package.json

In CI, remove the variable from the environment, or add an explicit blob-prefetch step before the build so no blob has to be fetched mid-build:

git fetch origin
git backfill --sparse        # prefetch all sparse blobs up front, then build
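
A CI job that keeps the variable can at least fail early with a clear message. This is a sketch, not an established convention: `lazy_fetch_disabled`, `guard_lazy_fetch`, and the `PREFETCH_DONE` flag are hypothetical, and it assumes the variable is read as a Git boolean (1, true, yes, on).

```shell
# Hypothetical guard: succeed if the value in $1 would disable lazy fetching.
lazy_fetch_disabled() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    1|true|yes|on) return 0 ;;
    *) return 1 ;;
  esac
}

# Return non-zero, with a message, when lazy fetching is disabled and the
# prefetch step has not set PREFETCH_DONE=1.
guard_lazy_fetch() {
  if lazy_fetch_disabled "${GIT_NO_LAZY_FETCH:-}" && [ "${PREFETCH_DONE:-0}" != 1 ]; then
    echo "GIT_NO_LAZY_FETCH is set but blobs were not prefetched" >&2
    return 1
  fi
}
```

Running `guard_lazy_fetch || exit 1` before the build turns a silent empty-file failure into an explicit one.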

Step 5: Update the promisor remote URL

git remote set-url origin https://new-git-host.example.com/org/monorepo.git
git fetch origin
git backfill --sparse

Step 6: Confirm it is fixed

git fsck --connectivity-only                              # no missing-object lines
git rev-list --objects --all --missing=print | grep -c '^?'   # counts all history; anything left should be outside the sparse cone
git --no-lazy-fetch cat-file -e HEAD:packages/billing/package.json && echo present
git diff HEAD -- packages/billing                         # no spurious diffs
git status                                                # clean

When cat-file -e reports present with --no-lazy-fetch, the blob is genuinely on disk and not just lazy-fetchable. That is the real signal the working tree is now complete.

Prevention

After cloning with --filter=blob:none --sparse, run git backfill --sparse immediately so your target paths are fully materialized before you start work.
In CI, prefer a full-blob policy for the checked-out commit. With actions/checkout@v5 you can set filter: blob:none plus a sparse-checkout list (cone mode is on by default), then add a git backfill --sparse step so the build never waits on a mid-job lazy fetch.
For build agents that do not need blame/log on old content, --filter=tree:0 (a treeless clone) is worth benchmarking against blob:none: the checkout still fetches the trees and blobs of the checked-out commit, but any command that walks history has to fetch trees on demand and becomes much slower.
Pin the cone paths in a checked-in script (scripts/setup-sparse.sh) so every developer and CI job uses the same cone definition.
Never set GIT_NO_LAZY_FETCH=1 in an environment that uses partial clones unless you also prefetch — the lazy fetch is what keeps a partial clone functional.
Add --recurse-submodules --also-filter-submodules to partial-clone commands so nested repos do not end up in their own incomplete state.
After a large force-push to a monorepo’s main branch, give the server-side filter cache a minute or two before triggering fresh clones in CI.
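
The checked-in cone script mentioned above could look roughly like this. It is a sketch under assumptions: the list file (for example `scripts/cone-paths.txt`) and the function names are hypothetical, and the cone directories are kept one per line with `#` comments allowed.

```shell
# Sketch of scripts/setup-sparse.sh.

# Print the cone directories listed in file $1, skipping blank lines and
# lines that start with "#".
read_cone_paths() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" || true
}

# Apply the shared cone definition to the current repository.
apply_cone() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
  read_cone_paths "$1" | git sparse-checkout set --cone --stdin
}
```

Developers and CI jobs would then both run `apply_cone scripts/cone-paths.txt`, so a directory added to the list file reaches every checkout the next time the script runs.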

FAQ

Q: How do I count how many blobs are still missing from my partial clone? A: git rev-list --objects --all --missing=print | grep -c '^?'. Each object ID prefixed with ? is missing and would be lazy-fetched on demand. After git backfill --sparse, that count should drop to zero for everything inside your sparse cone.

Q: What is git backfill and do I need a recent Git for it? A: git backfill (Git 2.49.0, March 2025) downloads missing blobs in large batches instead of one slow request per file, which also gives better delta compression. With --sparse it only fetches blobs that match your sparse-checkout cone. On Git older than 2.49, fall back to git checkout HEAD -- <path> to trigger the lazy fetch. Check your version with git --version.

Q: Why does git status show “clean” when files clearly have old content? A: The index matches HEAD, so Git’s metadata is internally consistent, but the blob recorded at that commit was never downloaded (or the file on disk was written from an earlier state). git status compares HEAD with the index and the index with the working tree using cached stat data; it never consults the remote and does not verify that every blob is present locally. Run git log --oneline -3 to confirm HEAD is where you expect, then git backfill --sparse followed by git checkout HEAD -- . to pull and write out the real content.

Q: Can I convert a partial clone to a full clone without re-cloning? A: Yes. Remove the filter and refetch: git config --unset remote.origin.partialclonefilter, then git fetch --refetch origin to pull every object the filter previously skipped. For very large repos this can take a long time, so prefer git backfill if you only need the blobs for your sparse paths.

Q: Our monorepo has 50,000+ files. Is sparse + blobless actually faster? A: On developer machines, yes — sparse checkout keeps git status and git diff fast because they only scan the cone. For CI, --filter=blob:none cuts initial clone time but each job still pays for blob downloads, so a git backfill --sparse (or --filter=tree:0) up front usually beats death-by-a-thousand lazy fetches mid-build. Benchmark both for your workload.

Tags: #git #version-control #Troubleshooting
