MongoDB Aggregation With $lookup + $group Runs for 30 Seconds

A MongoDB pipeline with $lookup + $group crawls in production. Read explain('executionStats'), index the join field, push $match first, and read the new EQ_LOOKUP strategy.

Published: May 24, 2026 Updated: Jun 18, 2026 Author: AI Productivity Guide Team

The new dashboard works on dev with 10k docs. In production with 5M docs the same aggregation takes 30 seconds and pegs CPU on the primary. The fix, fastest first: run explain("executionStats"), then add an index on the $lookup foreign field, move $match to the very first stage, and $project away fields you don’t need before the join. On MongoDB 8.x (8.3.4 is current as of June 2026), the slot-based engine shows the $lookup as an EQ_LOOKUP node with a strategy field — IndexedLoopJoin is what you want; NestedLoopJoin means it is scanning the foreign collection once per input document.

These four changes turn a 30-second dashboard query into a sub-second one in most cases. The rest of this page tells you how to confirm which bucket you are in.

Which bucket are you in

Run the diagnosis below first, then jump to the matching cause.

| Symptom in explain output | Most likely cause | See |
| --- | --- | --- |
| `EQ_LOOKUP` shows `strategy: "NestedLoopJoin"` | No index on the `$lookup` foreign field | Cause 1 |
| First pipeline stage is not `$match` / `$geoNear` | Filtering happens after the join | Cause 2 |
| `usedDisk: true` on a `$group`, or `BSONObjectTooLarge` | Huge intermediate documents from `$push` | Cause 3 |
| `nReturned` is far below `totalDocsExamined` | Index not selective enough | Cause 4 |
| `SORT` stage with `usedDisk: true` | Sort cannot use the index | Cause 5 |

Common causes

Ordered by hit rate.

1. Missing index on the `$lookup` foreign field

Without an index on the foreignField in the joined collection, MongoDB scans the entire foreign collection for every input document — an O(n) pass becomes O(n x m). If foreignField is _id, the default _id index covers it; any other field needs an explicit index.

How to spot it (MongoDB 8.x): in explain("executionStats"), the $lookup stage appears as EQ_LOOKUP with "strategy": "NestedLoopJoin". A healthy join shows "strategy": "IndexedLoopJoin" and an indexName. The EQ_LOOKUP node is emitted only when the slot-based execution engine (SBE) runs the join. If the pipeline falls back to the classic engine (for example because the $lookup specifies a pipeline to run against the foreign collection, or the join field uses numeric path components), you will not see EQ_LOOKUP at all. In that case read the counters reported on the $lookup stage itself: collectionScans above zero together with an empty indexesUsed array points to the same root cause, no index on the foreign field.
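To check every join in a long explain document without scrolling through it, a small walker can collect the EQ_LOOKUP nodes. The sketch below is plain JavaScript; the function name findLookupStrategies is hypothetical, and the sample object is hand-written to mimic the fields named above rather than copied from a real server.

```javascript
// Recursively collect every EQ_LOOKUP node found in an explain() result.
// Assumes SBE plan nodes carry `stage`, `strategy`, and (when indexed) `indexName`.
function findLookupStrategies(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) findLookupStrategies(item, found);
  } else if (node !== null && typeof node === "object") {
    if (node.stage === "EQ_LOOKUP") {
      found.push({ strategy: node.strategy, indexName: node.indexName ?? null });
    }
    for (const value of Object.values(node)) findLookupStrategies(value, found);
  }
  return found;
}

// Hand-written fragment for illustration, not real server output.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      queryPlan: {
        stage: "EQ_LOOKUP",
        strategy: "NestedLoopJoin",
        inputStage: { stage: "IXSCAN" },
      },
    },
  },
};

const joins = findLookupStrategies(sampleExplain);
const unindexedJoins = joins.filter((j) => j.strategy === "NestedLoopJoin");
console.log(joins, unindexedJoins.length);
```

Any entry left in unindexedJoins is a join worth indexing first.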

Note on hash joins: on MongoDB 8.0 and later, the SBE planner may pick "strategy": "HashJoin" instead of NestedLoopJoin when three conditions hold: no usable index exists, allowDiskUse: true is set, and the foreign collection is under all three size limits. Those limits are internalQueryCollectionMaxNoOfDocumentsToChooseHashJoin (default 10,000 docs), internalQueryCollectionMaxDataSizeBytesToChooseHashJoin (default 100 MB), and internalQueryCollectionMaxStorageSizeBytesToChooseHashJoin (default 100 MB). HashJoin is fine for small lookup tables, but for anything large you still want a real index so you get IndexedLoopJoin.
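The decision described above can be written down as a small predictor, which is handy for guessing which strategy to expect before running explain. This is only a sketch of the rules as stated in this article, not the server's planner code: the function name is hypothetical, 100 MB is assumed to mean 100 * 1024 * 1024 bytes, and the comparison at the exact limit is assumed to be exclusive.

```javascript
// Default limits as quoted in the text (byte interpretation is an assumption).
const HASH_JOIN_LIMITS = {
  maxDocs: 10000,
  maxDataSizeBytes: 100 * 1024 * 1024,
  maxStorageSizeBytes: 100 * 1024 * 1024,
};

// Predict the $lookup strategy from facts about the foreign collection.
function expectedLookupStrategy({ hasUsableIndex, allowDiskUse, docs, dataSizeBytes, storageSizeBytes }) {
  if (hasUsableIndex) return "IndexedLoopJoin";
  const smallEnough =
    docs < HASH_JOIN_LIMITS.maxDocs &&
    dataSizeBytes < HASH_JOIN_LIMITS.maxDataSizeBytes &&
    storageSizeBytes < HASH_JOIN_LIMITS.maxStorageSizeBytes;
  return allowDiskUse && smallEnough ? "HashJoin" : "NestedLoopJoin";
}

// A 5M-document foreign collection without an index (sizes are made up).
const bigUnindexed = expectedLookupStrategy({
  hasUsableIndex: false,
  allowDiskUse: true,
  docs: 5000000,
  dataSizeBytes: 2 * 1024 * 1024 * 1024,
  storageSizeBytes: 1024 * 1024 * 1024,
});
console.log(bigUnindexed);
```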

2. `$match` placed after `$lookup` or `$group`

The pipeline reads the whole collection, joins everything, then filters. Filter first instead. The aggregation optimizer moves some $match stages earlier automatically, but it cannot reorder a $match that references a field produced by $lookup/$group, so write it in the right order yourself.

How to spot it: the first executed stage of the pipeline is anything other than $match or $geoNear. Compare your written pipeline against the stages array in explain() to see what the optimizer actually ran.
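The same check can be run on the pipeline as written, before it ever reaches the server. The sketch below is plain JavaScript with hypothetical names (lintPipeline, lateMatches); it only looks at stage order and knows nothing about what the optimizer will do.

```javascript
const FILTER_STAGES = new Set(["$match", "$geoNear"]);
const RESHAPING_STAGES = new Set(["$lookup", "$group"]);

// Each pipeline stage is an object with exactly one operator key.
function stageName(stage) {
  return Object.keys(stage)[0];
}

function lintPipeline(pipeline) {
  const names = pipeline.map(stageName);
  const startsWithFilter = names.length > 0 && FILTER_STAGES.has(names[0]);
  // Positions of $match stages that run after a $lookup or $group.
  const lateMatches = [];
  let seenReshaping = false;
  names.forEach((name, index) => {
    if (RESHAPING_STAGES.has(name)) seenReshaping = true;
    else if (name === "$match" && seenReshaping) lateMatches.push(index);
  });
  return { startsWithFilter, lateMatches };
}

const slowPipeline = [
  { $lookup: { from: "users", localField: "user_id", foreignField: "_id", as: "user" } },
  { $match: { tenant_id: "acme" } },
];
const report = lintPipeline(slowPipeline);
console.log(report); // startsWithFilter is false, lateMatches is [1]
```

A late $match is not always a mistake: one that filters on a joined or grouped field has to stay where it is. The report only tells you which stages to look at.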

3. Huge intermediate documents

A $group with $push collects all matching docs into an array. If a single group has 500k entries, the resulting document exceeds the 16 MiB BSON limit, or the $group stage trips the 100 MB in-memory cap and starts spilling to disk.

How to spot it: BSONObjectTooLarge, or a $group/$sort stage in explain with usedDisk: true. Note: since MongoDB 6.0 the server parameter allowDiskUseByDefault is true by default, so a heavy stage usually spills to disk and runs slowly rather than throwing QueryExceededMemoryLimitNoDiskUseAllowed. You only see the hard error when an operator runs setParameter allowDiskUseByDefault false or you pass { allowDiskUse: false }. Either way, spilling is a symptom, not a fix — shrink the intermediate instead (see Step 5).
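A quick back-of-the-envelope check shows how easily a $push array crosses the document limit. The sketch below uses the 500k-entry group from the text; the 40 bytes per entry is a made-up figure, and BSON array overhead is ignored, so treat the result as a lower bound.

```javascript
const BSON_LIMIT_BYTES = 16 * 1024 * 1024; // 16 MiB = 16,777,216 bytes

// Rough size of one group's $push array (ignores BSON array overhead).
function pushArrayEstimate(entries, avgBytesPerEntry) {
  const estimatedBytes = entries * avgBytesPerEntry;
  return { estimatedBytes, fits: estimatedBytes <= BSON_LIMIT_BYTES };
}

const worstGroup = pushArrayEstimate(500000, 40); // 40 bytes/entry is hypothetical
console.log(worstGroup); // 20,000,000 bytes estimated, which does not fit
```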

4. Index not selective enough

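The index is on status alone, but the query filters on both status and tenant_id, so the index lets through every document with that status across all tenants and the server discards the rest after fetching them. The “most selective field” intuition still applies: a compound index { tenant_id: 1, status: 1, created_at: -1 } serves the whole predicate.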

How to spot it: nReturned is much smaller than totalDocsExamined (the index let too many docs through and the server filtered the rest in memory).

5. Sort in memory because the index does not match

$sort after $match cannot use the index when the sort key does not directly follow the equality fields in that index. MongoDB then sorts in memory and spills to disk once the sort exceeds 100 MB.

How to spot it: a SORT stage in explain (rather than the order coming straight from the IXSCAN), often with usedDisk: true. When the index supplies the order, the plan contains no SORT stage at all.

Shortest path to fix

Step 1: Read the plan

db.orders.aggregate([
  { $match: { tenant_id: "acme", status: "paid", created_at: { $gte: ISODate("2026-05-01") } } },
  { $lookup: { from: "users", localField: "user_id", foreignField: "_id", as: "user" } },
  { $unwind: "$user" },
  { $group: { _id: "$user.country", revenue: { $sum: "$amount" } } },
  { $sort: { revenue: -1 } },
], { allowDiskUse: true }).explain("executionStats");

Inspect for each stage:

IXSCAN vs COLLSCAN on the base collection.
The $lookup node: on MongoDB 8.x look for EQ_LOOKUP and its strategy (IndexedLoopJoin is good); on the classic engine read the totalKeysExamined, collectionScans, and indexesUsed counters on the $lookup stage.
totalDocsExamined vs nReturned ratio (closer to 1.0 is better).
totalKeysExamined vs totalDocsExamined (a gap means the index isn’t covering the filter).
usedDisk on any $group/$sort stage (true means it spilled).
executionTimeMillisEstimate per stage to find the long pole.

Step 2: Add the right compound index

Rule of thumb (Equality, Sort, Range):

// orders: equality on tenant_id and status, range on created_at
db.orders.createIndex({ tenant_id: 1, status: 1, created_at: -1 });

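// users: _id is indexed by default, so the example join needs no extra index.
// Creating { _id: 1 } again would be a no-op; just confirm the default index is listed.
// For any other foreignField, index that field on the foreign collection (see below).
db.users.getIndexes();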
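The Equality, Sort, Range rule is mechanical enough to express as a helper that assembles the key document in the right order. This is a sketch in plain JavaScript: buildEsrIndex is a hypothetical name, and the input simply lists which fields the query uses in each role.

```javascript
// Build an index key spec in Equality -> Sort -> Range order.
// `sort` maps field name to direction (1 or -1); the others are field lists.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keys)) keys[field] = direction;
  }
  for (const field of range) {
    // A field that is both sorted and range-filtered keeps its sort position.
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

// Filter on tenant_id and status, range and sort on created_at (newest first).
const ordersIndex = buildEsrIndex({
  equality: ["tenant_id", "status"],
  sort: { created_at: -1 },
  range: ["created_at"],
});
console.log(JSON.stringify(ordersIndex));
// {"tenant_id":1,"status":1,"created_at":-1}
```

The result matches the orders index created above and could be passed to createIndex as is.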

For lookups on non-_id fields, the index goes on the foreign collection’s join field:

db.events.createIndex({ user_id: 1 });
db.users.aggregate([{ $lookup: { from: "events", localField: "_id", foreignField: "user_id", as: "events" } }]);

Re-run the explain: the $lookup should now report "strategy": "IndexedLoopJoin" with the indexName you just created.

Step 3: Push `$match` to the front and `$project` early

Reorder the pipeline so the most selective filter is the first stage, and drop unneeded fields before the join so intermediate docs are small.

db.orders.aggregate([
  // 1. Filter aggressively first
  { $match: {
      tenant_id: "acme",
      status: "paid",
      created_at: { $gte: ISODate("2026-05-01"), $lt: ISODate("2026-06-01") }
  } },
  // 2. Project only the fields you need (smaller intermediates)
  { $project: { user_id: 1, amount: 1 } },
  // 3. Lookup against an indexed foreign field, projecting only what you need from it
  { $lookup: { from: "users", localField: "user_id", foreignField: "_id", as: "user", pipeline: [{ $project: { country: 1 } }] } },
  { $unwind: "$user" },
  { $group: { _id: "$user.country", revenue: { $sum: "$amount" } } },
  { $sort: { revenue: -1 } },
]);

Combining localField/foreignField with a pipeline inside $lookup (MongoDB 5.0+) lets you $match and $project against the foreign collection, so you carry only the foreign fields you actually use and the intermediates stay much smaller. One caveat: a $lookup that specifies a pipeline is executed by the classic engine, so the EQ_LOOKUP node disappears from explain even though the index on foreignField is still used. Verify the join with totalKeysExamined > 0 and a non-empty indexesUsed array on the $lookup stage rather than looking for the strategy field.
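When the strategy field is not available, the verification can be done on the $lookup entry of the explain stages array. The sketch below is plain JavaScript and rests on an assumption about the output shape: that a classic-engine $lookup entry reports totalKeysExamined, collectionScans, and indexesUsed as plain numbers and an array next to the $lookup spec. The function name and the sample entry are hypothetical.

```javascript
// Decide whether a classic-engine $lookup stage entry used an index.
// Assumed shape: { $lookup: {...}, totalKeysExamined, collectionScans, indexesUsed }
function classicLookupUsesIndex(stage) {
  if (stage === null || typeof stage !== "object" || !("$lookup" in stage)) {
    throw new Error("not a $lookup stage entry");
  }
  const keysExamined = stage.totalKeysExamined ?? 0;
  const collectionScans = stage.collectionScans ?? 0;
  const indexesUsed = stage.indexesUsed ?? [];
  return keysExamined > 0 && collectionScans === 0 && indexesUsed.length > 0;
}

// Hand-written entry for illustration, not real server output.
const lookupStage = {
  $lookup: { from: "users", as: "user", localField: "user_id", foreignField: "_id" },
  totalDocsExamined: 3000,
  totalKeysExamined: 3000,
  collectionScans: 0,
  indexesUsed: ["_id_"],
  nReturned: 3000,
};
const joinIsIndexed = classicLookupUsesIndex(lookupStage);
console.log(joinIsIndexed);
```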

Step 4: Use `$facet` for parallel branches

If your dashboard wants three independent rollups, do them in a single aggregation with $facet so the input is scanned once.

db.orders.aggregate([
  { $match: { tenant_id: "acme", created_at: { $gte: ISODate("2026-05-01") } } },
  { $facet: {
      byCountry: [ { $group: { _id: "$country", n: { $sum: 1 } } } ],
      byStatus:  [ { $group: { _id: "$status",  n: { $sum: 1 } } } ],
      topUsers:  [ { $group: { _id: "$user_id", n: { $sum: 1 } } }, { $sort: { n: -1 } }, { $limit: 10 } ],
  } },
]);

One pass over the input set. Caveat: each $facet sub-pipeline still has its own 100 MB cap, and $facet itself cannot use an index for its sub-stages, so keep the input set small with the leading $match. If any branch is heavy it will spill to disk.

Step 5: Avoid huge `$push` arrays

Instead of $push followed by $slice, use $topN/$bottomN (MongoDB 5.2+):

{ $group: {
    _id: "$user_id",
    recent: { $topN: { n: 5, sortBy: { created_at: -1 }, output: { id: "$_id", amount: "$amount" } } }
} }

output takes any expression, so pull exactly the fields you need per element. Bounded by n, the group stays small and you cannot blow the 16 MiB document size limit.
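The reason a bounded accumulator stays small is easier to see outside the database. The sketch below mimics the idea in plain JavaScript over an in-memory array: each group keeps at most n entries no matter how many documents arrive. It illustrates the principle only and is not how the server implements $topN; the function name and sample data are hypothetical.

```javascript
// Keep only the n highest-sortKey docs per group, discarding the rest as we go.
function topNPerGroup(docs, groupKey, n, sortKey) {
  const groups = new Map();
  for (const doc of docs) {
    const kept = groups.get(doc[groupKey]) ?? [];
    kept.push(doc);
    kept.sort((a, b) => b[sortKey] - a[sortKey]); // descending, like sortBy: { created_at: -1 }
    if (kept.length > n) kept.pop(); // drop the lowest, so the array never exceeds n
    groups.set(doc[groupKey], kept);
  }
  return groups;
}

// created_at is a plain number here to keep the example short.
const sampleOrders = [
  { user_id: "u1", created_at: 1, amount: 10 },
  { user_id: "u1", created_at: 3, amount: 30 },
  { user_id: "u2", created_at: 9, amount: 90 },
  { user_id: "u1", created_at: 2, amount: 20 },
  { user_id: "u1", created_at: 4, amount: 40 },
];
const recent = topNPerGroup(sampleOrders, "user_id", 2, "created_at");
console.log(recent.get("u1").map((d) => d.created_at)); // [ 4, 3 ]
```

With $push the array for u1 would hold all four orders and keep growing with the data; with the bounded version it holds two.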

Step 6: Confirm it’s fixed

Re-run the explain after each change:

db.orders.aggregate([ /* your stages */ ]).explain("executionStats")

You’re done when all of these hold:

Every base-collection access is IXSCAN, not COLLSCAN.
The $lookup reports "strategy": "IndexedLoopJoin" (or HashJoin for a small foreign table) — never NestedLoopJoin.
totalKeysExamined / nReturned is below 5.
No $group/$sort stage shows usedDisk: true.
executionTimeMillis is under 1 second for dashboard queries.
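The checklist above can be turned into a small gate for a review script or CI job. This is a sketch in plain JavaScript: checkPlan and the summary fields are hypothetical, and the summary is assumed to be filled in by hand (or by your own tooling) from the explain output.

```javascript
// Return the list of checklist items a plan summary fails; empty means done.
function checkPlan(summary) {
  const failures = [];
  if (summary.collScans > 0) failures.push("COLLSCAN on a base collection");
  if (summary.lookupStrategies.includes("NestedLoopJoin")) failures.push("NestedLoopJoin in $lookup");
  if (summary.nReturned > 0 && summary.totalKeysExamined / summary.nReturned >= 5) {
    failures.push("totalKeysExamined / nReturned is 5 or more");
  }
  if (summary.usedDisk) failures.push("a $group or $sort stage spilled to disk");
  if (summary.executionTimeMillis >= 1000) failures.push("executionTimeMillis is 1 second or more");
  return failures;
}

// Made-up numbers for a plan after the fixes.
const afterFix = {
  collScans: 0,
  lookupStrategies: ["IndexedLoopJoin"],
  totalKeysExamined: 3200,
  nReturned: 3000,
  usedDisk: false,
  executionTimeMillis: 180,
};
console.log(checkPlan(afterFix)); // []
```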

For a live production check, watch it under load with the database profiler or db.currentOp() rather than trusting a one-off explain.

Prevention

Every $lookup foreign field has an index; verify the join reports IndexedLoopJoin in explain.
Compound indexes follow Equality - Sort - Range order.
Pipelines start with $match (or $geoNear); $project early to shrink intermediates.
Use $facet for parallel rollups; use $topN/$bottomN instead of unbounded $push.
Catch slow queries early with db.setProfilingLevel(1, { slowms: 100 }), then review system.profile (or Atlas Performance Advisor) weekly.

Reference: MongoDB’s own $lookup docs state plainly that a $lookup equality match “will likely have poor performance” without an index on the foreignField, and the aggregation pipeline limits page documents the 100 MB blocking-stage cap and the 16 MiB BSON document limit.

FAQ

Why is it fast on dev but slow in production?

Dev has 10k docs, so even a COLLSCAN or a NestedLoopJoin finishes instantly and hides the missing index. At 5M docs the same plan is O(n x m). Always run explain("executionStats") against production-scale data, not the dev dataset.

My explain shows `HashJoin`, not `IndexedLoopJoin`. Is that a problem?

Not necessarily. Since MongoDB 8.0 the slot-based engine picks HashJoin when there is no usable index, allowDiskUse: true is on, and the foreign collection stays under all three size limits (internalQueryCollectionMaxNoOfDocumentsToChooseHashJoin, default 10,000 docs; plus 100 MB data-size and 100 MB storage-size caps). For small lookup tables it is fast. For a large foreign collection, add an index so you get IndexedLoopJoin instead.

Does `allowDiskUse: true` fix the slowness?

No. It only stops the QueryExceededMemoryLimitNoDiskUseAllowed error by spilling intermediate results to disk, which is far slower than RAM. Since MongoDB 6.0 it is on by default anyway (allowDiskUseByDefault: true). Treat usedDisk: true as a warning sign and shrink the stage with an earlier $match/$project or $topN.

What order should fields go in a compound index?

Equality, Sort, Range (ESR). Put fields you match with $eq first, then the field you sort on, then range filters like $gte/$lt last. For { tenant_id: "acme", status: "paid", created_at: { $gte: ... } } sorted by created_at, use { tenant_id: 1, status: 1, created_at: -1 }.

I added the index but `$lookup` still shows `NestedLoopJoin`. Why?

A few usual suspects. The index must be on the foreign collection’s foreignField, not the local one. A collation mismatch between the query/collection and the index disqualifies it. Numeric path components in localField/foreignField (for example joining on tags.0) force the classic engine and skip EQ_LOOKUP entirely. And if from is a view or a sharded collection, SBE won’t run the join, so you won’t see IndexedLoopJoin either. Confirm the index exists with db.users.getIndexes() and re-check the foreignField spelling against it.

How do I know the optimizer reordered my pipeline?

Compare your written stages with the stages array (or the queryPlanner.winningPlan object) in explain(). The aggregation optimizer pulls some $match/$sort stages forward automatically, but it will not move a $match that depends on a field created by $lookup/$group, so write those in the optimal order yourself.

Tags: #Backend #Troubleshooting #mongodb
