The new dashboard works on dev with 10k docs. In production with 5M docs the same aggregation takes 30 seconds and pegs CPU on the primary. The classic culprits are a $lookup whose foreign field has no index, a $match placed after $lookup or $group instead of before, and giant intermediate documents that blow past the in-memory stage limit. Fix by reading the pipeline plan with explain('executionStats'), indexing the foreign field of every $lookup, moving $match to the top, and using $facet to compute independent rollups from a single filtered input set.
Common causes
Ordered by hit rate.
1. Missing index on the $lookup foreign field
$lookup scans the foreign collection once per input document unless foreignField is indexed on the foreign collection. An index on localField does not speed up the join itself.
How to spot it: explain('executionStats') shows COLLSCAN inside the $lookup stage.
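A rough way to automate that check is to walk whatever explain('executionStats') returned and collect every node whose stage is COLLSCAN. This is only a sketch: findCollScans and the sample object (including its plan key) are hypothetical, and real explain output differs in shape between server versions, which is why the walk makes no assumption about nesting.

```javascript
// Hypothetical helper: recursively collect the path of every plan node
// with stage === "COLLSCAN", wherever it sits in the explain document.
function findCollScans(node, path = "explain", hits = []) {
  if (Array.isArray(node)) {
    node.forEach((child, i) => findCollScans(child, `${path}[${i}]`, hits));
  } else if (node !== null && typeof node === "object") {
    if (node.stage === "COLLSCAN") hits.push(path);
    for (const [key, value] of Object.entries(node)) {
      findCollScans(value, `${path}.${key}`, hits);
    }
  }
  return hits;
}

// Made-up, heavily trimmed explain output, for illustration only.
const sampleExplain = {
  stages: [
    { $cursor: { queryPlanner: { winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } } } } },
    { $lookup: { from: "users" }, plan: { stage: "COLLSCAN" } },
  ],
};

console.log(findCollScans(sampleExplain));
```

An empty array means no collection scan was found anywhere in the object you passed in.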
2. $match placed after $lookup or $group
The pipeline reads the whole collection, joins everything, and only then filters. Filter first, so that the $match can use an index and every later stage sees fewer documents.
How to spot it: First stage of the pipeline is anything other than $match or $geoNear.
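The same check can be run as a small lint over pipeline definitions before they ship. A minimal sketch, assuming pipelines are available as plain arrays; checkFirstStage is a hypothetical helper, and the allowed set simply mirrors the two stages named above.

```javascript
// Stages accepted as a filtering first stage, per the rule above.
const FILTERING_FIRST_STAGES = new Set(["$match", "$geoNear"]);

// Hypothetical lint helper: reports the first stage and whether it filters.
function checkFirstStage(pipeline) {
  if (pipeline.length === 0) return { first: null, filtersFirst: false };
  const first = Object.keys(pipeline[0])[0];
  return { first, filtersFirst: FILTERING_FIRST_STAGES.has(first) };
}

const joinThenFilter = [
  { $lookup: { from: "users", localField: "user_id", foreignField: "_id", as: "user" } },
  { $match: { status: "paid" } },
];
const filterThenJoin = [joinThenFilter[1], joinThenFilter[0]];

console.log(checkFirstStage(joinThenFilter)); // first stage is $lookup
console.log(checkFirstStage(filterThenJoin)); // first stage is $match
```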
3. Huge intermediate documents
A $group with $push collects all matching docs into an array. If a single group has 500k entries, the doc exceeds the 16 MB BSON limit or trips the 100 MB in-memory aggregation cap.
How to spot it: a BSONObjectTooLarge or "Exceeded memory limit for $group" error. allowDiskUse: true gets past the memory cap at the cost of speed, but it does nothing for the 16 MB document limit.
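A back-of-the-envelope estimate shows which limit a big $push hits first. The sketch below assumes an average of 40 bytes per pushed array entry, which is an illustrative guess rather than a measurement; estimatePushedArray is a hypothetical helper.

```javascript
const BSON_DOC_LIMIT_BYTES = 16 * 1024 * 1024;      // 16 MB document limit
const STAGE_MEMORY_LIMIT_BYTES = 100 * 1024 * 1024; // 100 MB per-stage cap

// Rough size of one group's pushed array: entries * assumed bytes per entry.
function estimatePushedArray(entries, avgBytesPerEntry) {
  const bytes = entries * avgBytesPerEntry;
  return {
    bytes,
    exceedsDocLimit: bytes > BSON_DOC_LIMIT_BYTES,
    exceedsStageCap: bytes > STAGE_MEMORY_LIMIT_BYTES,
  };
}

// 500k entries at an assumed 40 bytes each is about 20 MB.
const pushEstimate = estimatePushedArray(500000, 40);
console.log(pushEstimate);
```

Under that assumption the single group already exceeds the document limit while staying below the stage cap, so spilling to disk would not rescue it.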
4. Index not selective enough
Index on status only, but the query filters on status + tenant_id. The Postgres-style rule "use the most selective field first" applies: a compound index {tenant_id: 1, status: 1, created_at: -1} is much better.
How to spot it: nReturned is much smaller than totalDocsExamined.
5. Sort in memory because the index does not match
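$sort after $match cannot use the index when the sort fields do not directly follow the index's leading equality fields. MongoDB then performs a blocking sort in memory, which fails past 100 MB unless the sort is allowed to spill to disk.
How to spot it: a SORT stage in the explain output; usedDisk: true on that stage means the sort spilled to disk.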
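Whether an index can hand back documents already sorted can be approximated with a simple rule. The sketch below is a simplification of what the planner does, not its actual logic: it skips leading index fields pinned by equality, then requires the sort fields to match the next index fields in order, with directions either all equal or all reversed. indexCoversSort is a hypothetical helper.

```javascript
// Approximate check: can this index provide the requested sort order?
function indexCoversSort(indexKeys, equalityFields, sortSpec) {
  const index = Object.entries(indexKeys);
  // Sorting on a field pinned to a single value is a no-op, so ignore it.
  const sort = Object.entries(sortSpec).filter(([field]) => !equalityFields.includes(field));
  let start = 0;
  while (start < index.length && equalityFields.includes(index[start][0])) start++;
  if (sort.length > index.length - start) return false;
  let sameDirection = true;
  let reversedDirection = true;
  for (let j = 0; j < sort.length; j++) {
    const [indexField, indexDir] = index[start + j];
    const [sortField, sortDir] = sort[j];
    if (indexField !== sortField) return false;
    if (indexDir !== sortDir) sameDirection = false;
    if (indexDir !== -sortDir) reversedDirection = false;
  }
  return sameDirection || reversedDirection;
}

const compoundIndex = { tenant_id: 1, status: 1, created_at: -1 };
console.log(indexCoversSort(compoundIndex, ["tenant_id", "status"], { created_at: -1 })); // true
console.log(indexCoversSort({ status: 1 }, ["status"], { created_at: -1 }));              // false
```

A false result suggests the query will need a blocking sort; confirm with explain rather than trusting the approximation.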
Shortest path to fix
Step 1: Read the plan
db.orders.aggregate([
  { $match: { tenant_id: "acme", status: "paid", created_at: { $gte: ISODate("2026-05-01") } } },
  { $lookup: { from: "users", localField: "user_id", foreignField: "_id", as: "user" } },
  { $unwind: "$user" },
  { $group: { _id: "$user.country", revenue: { $sum: "$amount" } } },
  { $sort: { revenue: -1 } },
], { allowDiskUse: true }).explain("executionStats");
Inspect for each stage:
- IXSCAN vs COLLSCAN
- totalDocsExamined vs nReturned ratio (closer to 1.0 is better)
- executionTimeMillisEstimate
- For $lookup, the embedded plan inside lookup.queryPlanner
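To compare stages side by side, the explain document can be flattened into one row per stage. A minimal sketch, assuming the output has a top-level stages array whose entries carry one "$"-prefixed key plus nReturned and executionTimeMillisEstimate; that shape is an assumption to verify against your server version, and the sample numbers are made up. In mongosh these counters may arrive as numeric wrapper types that need converting first.

```javascript
// Hypothetical helper: one summary row per pipeline stage.
function summarizeStages(explain) {
  return (explain.stages ?? []).map((entry) => ({
    stage: Object.keys(entry).find((key) => key.startsWith("$")),
    nReturned: Number(entry.nReturned),
    millisEstimate: Number(entry.executionTimeMillisEstimate),
  }));
}

// Made-up explain fragment, for illustration only.
const explainFragment = {
  stages: [
    { $cursor: {}, nReturned: 120000, executionTimeMillisEstimate: 900 },
    { $lookup: {}, nReturned: 120000, executionTimeMillisEstimate: 27500 },
    { $group: {}, nReturned: 42, executionTimeMillisEstimate: 27600 },
  ],
};

const stageSummary = summarizeStages(explainFragment);
console.table(stageSummary);
```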
Step 2: Add the right compound index
Rule of thumb (Equality, Sort, Range):
// orders: equality on tenant_id and status, range on created_at
db.orders.createIndex({ tenant_id: 1, status: 1, created_at: -1 });
// users: _id is indexed automatically, so this lookup needs no extra index.
// When foreignField is anything other than _id, index that field on the foreign collection.
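The Equality, Sort, Range ordering can be applied mechanically once you know which role each field plays in the query. A minimal sketch: buildEsrIndex is a hypothetical helper, the caller supplies the roles, and the example assumes the dashboard also sorts newest-first on created_at, which is why that field carries -1.

```javascript
// Build an index key pattern in Equality, Sort, Range order.
// Relies on JavaScript preserving insertion order for non-numeric string keys.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keys)) keys[field] = direction;
  }
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1; // direction is irrelevant for a pure range
  }
  return keys;
}

const esrKeys = buildEsrIndex({
  equality: ["tenant_id", "status"],
  sort: { created_at: -1 }, // assumed sort order
  range: ["created_at"],
});
console.log(JSON.stringify(esrKeys)); // {"tenant_id":1,"status":1,"created_at":-1}
```

The resulting object can be passed to createIndex as the key pattern.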
For lookups on non-_id fields:
db.events.createIndex({ user_id: 1 });
db.users.aggregate([{ $lookup: { from: "events", localField: "_id", foreignField: "user_id", as: "events" } }]);
Step 3: Push $match to the front
Reorder the pipeline so the most selective filter is the first stage.
db.orders.aggregate([
  // 1. Filter aggressively first
  { $match: {
      tenant_id: "acme",
      status: "paid",
      created_at: { $gte: ISODate("2026-05-01"), $lt: ISODate("2026-06-01") }
  } },
  // 2. Project only the fields you need (smaller intermediates)
  { $project: { user_id: 1, amount: 1 } },
  // 3. Lookup against an indexed foreign field
  { $lookup: {
      from: "users",
      localField: "user_id",
      foreignField: "_id",
      as: "user",
      pipeline: [{ $project: { country: 1 } }]
  } },
  { $unwind: "$user" },
  { $group: { _id: "$user.country", revenue: { $sum: "$amount" } } },
  { $sort: { revenue: -1 } },
]);
Combining localField/foreignField with a pipeline inside $lookup (MongoDB 5.0+) lets you project only the foreign fields you need, which keeps the intermediate documents much smaller.
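Why the order matters can be seen in a toy in-memory model that counts how many foreign lookups each ordering performs. This is an illustration with made-up data, not a description of MongoDB internals; runJoin and the sample collections are hypothetical.

```javascript
// Toy data: users keyed by id, plus a handful of orders.
const toyUsers = new Map([[1, { country: "DE" }], [2, { country: "US" }]]);
const toyOrders = [
  { user_id: 1, status: "paid", amount: 10 },
  { user_id: 2, status: "refunded", amount: 20 },
  { user_id: 2, status: "paid", amount: 30 },
  { user_id: 1, status: "refunded", amount: 40 },
];

// Run the filter and the join in either order and count the lookups performed.
function runJoin(filterFirst) {
  let lookups = 0;
  const join = (order) => {
    lookups++;
    return { ...order, user: toyUsers.get(order.user_id) };
  };
  const isPaid = (order) => order.status === "paid";
  const rows = filterFirst
    ? toyOrders.filter(isPaid).map(join)
    : toyOrders.map(join).filter(isPaid);
  return { lookups, rows };
}

console.log(runJoin(true).lookups);  // 2
console.log(runJoin(false).lookups); // 4
```

Both orderings return the same rows; filtering first simply does half the join work on this data, and the gap widens as the filter gets more selective.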
Step 4: Use $facet for independent rollups
If your dashboard wants three independent rollups, compute them in a single aggregation with $facet.
db.orders.aggregate([
  { $match: { tenant_id: "acme", created_at: { $gte: ISODate("2026-05-01") } } },
  { $facet: {
      byCountry: [ { $group: { _id: "$country", n: { $sum: 1 } } } ],
      byStatus: [ { $group: { _id: "$status", n: { $sum: 1 } } } ],
      topUsers: [ { $group: { _id: "$user_id", n: { $sum: 1 } } }, { $sort: { n: -1 } }, { $limit: 10 } ],
  } },
]);
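One pass over the input set: every sub-pipeline receives the same filtered documents. $facet does not run the branches in parallel, and its sub-pipelines cannot use indexes, so keep the selective $match in front of it. Watch the 100 MB stage cap (set allowDiskUse: true if any branch is heavy), and remember that the result comes back as a single document bound by the 16 MB limit.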
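The result shape of $facet is easy to picture with an in-memory analogy: each named branch receives the same input documents and contributes one array to a single output object. The sketch mimics that shape only and says nothing about how the server executes the stage; facet and countBy are hypothetical helpers, and the documents are made up.

```javascript
// Apply every named branch to the same input and collect one array per branch.
function facet(docs, branches) {
  const output = {};
  for (const [name, branch] of Object.entries(branches)) {
    output[name] = branch(docs);
  }
  return output;
}

// Branch factory: count documents per distinct value of `field`.
function countBy(field) {
  return (docs) => {
    const counts = new Map();
    for (const doc of docs) counts.set(doc[field], (counts.get(doc[field]) ?? 0) + 1);
    return [...counts].map(([_id, n]) => ({ _id, n }));
  };
}

const facetInput = [
  { status: "paid", country: "DE" },
  { status: "paid", country: "US" },
  { status: "refunded", country: "DE" },
];

const facetResult = facet(facetInput, {
  byCountry: countBy("country"),
  byStatus: countBy("status"),
});
console.log(JSON.stringify(facetResult));
```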
Step 5: Avoid huge $push arrays
Instead of $push followed by $slice, use $topN/$bottomN (MongoDB 5.2+):
{ $group: {
    _id: "$user_id",
    recent: { $topN: { n: 5, sortBy: { created_at: -1 }, output: "$_id" } }
} }
The array is bounded by n, so it cannot grow toward the document size limit the way an unbounded $push can.
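The reason a bounded accumulator is safe is that it never holds more than n items per group, no matter how many documents flow through. A minimal in-memory sketch of that idea follows; makeTopN is a hypothetical helper, not the server's implementation, and it uses numeric created_at values for simplicity.

```javascript
// Keep at most n documents, ordered by `key` descending.
function makeTopN(n, key) {
  const kept = [];
  return {
    add(doc) {
      kept.push(doc);
      kept.sort((a, b) => b[key] - a[key]);
      if (kept.length > n) kept.pop(); // drop the smallest, so size never exceeds n
    },
    result: () => kept.slice(),
  };
}

const recentThree = makeTopN(3, "created_at");
for (const created_at of [3, 10, 1, 8, 9, 2]) {
  recentThree.add({ created_at });
}
console.log(recentThree.result().map((doc) => doc.created_at)); // [ 10, 9, 8 ]
```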
Step 6: Verify after each change
db.orders.aggregate([...]).explain("executionStats")
Target: every collection access is IXSCAN, totalKeysExamined / nReturned is below 5, executionTimeMillis is under 1 s for dashboard queries.
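Those targets can be turned into a repeatable check to run after each change. A sketch, assuming you pass in the three counters read from the executionStats section; checkTargets is a hypothetical helper, the thresholds are copied from the targets above, and the before/after numbers are made up for illustration.

```javascript
// Rule-of-thumb thresholds taken from the targets above.
const MAX_KEYS_PER_RETURNED = 5;
const MAX_EXECUTION_MILLIS = 1000;

function checkTargets({ totalKeysExamined, nReturned, executionTimeMillis }) {
  // Avoid dividing by zero when the query returned nothing.
  const ratio = nReturned === 0
    ? (totalKeysExamined === 0 ? 0 : Infinity)
    : totalKeysExamined / nReturned;
  return {
    ratio,
    ratioOk: ratio < MAX_KEYS_PER_RETURNED,
    timeOk: executionTimeMillis < MAX_EXECUTION_MILLIS,
  };
}

// Illustrative numbers only, not measurements.
const beforeFix = checkTargets({ totalKeysExamined: 5000000, nReturned: 1200, executionTimeMillis: 30000 });
const afterFix = checkTargets({ totalKeysExamined: 3600, nReturned: 1200, executionTimeMillis: 180 });
console.log(beforeFix, afterFix);
```

The IXSCAN requirement is not covered here; it still needs a look at the plan itself.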
Prevention
- Every $lookup foreign field has an index.
- Compound indexes follow Equality - Sort - Range order.
- Pipelines start with $match (or $geoNear); $project early to shrink intermediates.
- Use $facet for independent rollups; use $topN/$bottomN instead of unbounded $push.
- Monitor slow queries via db.setProfilingLevel(1, { slowms: 100 }); review weekly.