Scheduled Cron Job Skipped Silently With No Error Logged
A scheduled job never fired and nothing showed up in logs. Fix by going UTC-only, adding heartbeat metrics, and alerting on missed execution counts.
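A minimal sketch of the heartbeat idea, assuming the job records a UTC timestamp each time it runs and fires on a fixed interval. The names `count_missed_runs`, `heartbeats`, and `grace` are hypothetical and used only for illustration; a real setup would read heartbeats from whatever metrics store is in use.

```python
from datetime import datetime, timedelta, timezone


def count_missed_runs(heartbeats, start, now, interval, grace=timedelta(0)):
    """Count schedule slots between start and now that have no heartbeat.

    All datetimes must be timezone-aware (UTC-only avoids DST gaps and
    repeated hours). A slot is only judged once it has fully elapsed,
    plus an optional grace period for jobs that start late.
    """
    for ts in (start, now, *heartbeats):
        if ts.tzinfo is None:
            raise ValueError("naive datetime found; use timezone-aware UTC values")

    missed = 0
    slot = start
    while slot + interval + grace <= now:
        slot_end = slot + interval
        # A slot counts as executed if any heartbeat falls inside it.
        if not any(slot <= hb < slot_end for hb in heartbeats):
            missed += 1
        slot = slot_end
    return missed


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    beats = [
        start + timedelta(seconds=5),           # 00:00 slot ran
        start + timedelta(hours=1, seconds=3),  # 01:00 slot ran
        start + timedelta(hours=3, seconds=4),  # 03:00 slot ran; 02:00 is absent
    ]
    now = start + timedelta(hours=4)
    missed = count_missed_runs(beats, start, now, timedelta(hours=1))
    if missed > 0:
        print(f"ALERT: {missed} scheduled run(s) missed")
```

The alert condition is the count of empty slots rather than the presence of an error, since a job that never fires writes no error to the log.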
Firebase permissions, rewrites, Functions, Supabase RLS, CORS, edge functions.
Common Firebase / Supabase errors aren't complex, but they need to be checked in order: env → auth → config → quota. This hub reduces them to one-screen checklists.
An ALTER TABLE migration hangs forever on prod. Find the blocker in pg_stat_activity, terminate it, and re-run the migration with a lock_timeout.
Containers restart with exit 137. The OOM killer hit your --memory limit. Find the leak, profile heap, set sensible limits, and stop the bleeding.
gRPC clients return DEADLINE_EXCEEDED when traffic rises. Propagate deadlines, set sensible per-RPC timeouts, and add a retry policy plus circuit breaker.
JWT verification fails intermittently with 'token expired' even on tokens issued seconds ago. Fix the server clock drift with NTP and add JWT leeway.
MongoDB pipelines with $lookup + $group crawl. Use explain('executionStats'), add compound indexes, push $match early, and split with $facet.
RabbitMQ shows healthy consumer connections but the queue keeps growing. Fix prefetch, unacked messages, and dead-letter routing for stuck consumers.
AWS S3 presigned URL works for small files but 403s mid-upload on large ones. Fix with longer TTL, multipart upload, or the SDK upload manager.
You added more consumer pods. Lag is still going up. The bottleneck is almost never "not enough consumers" — it is partition count, poison messages, or commit-offset drift.
Table bloat grows, queries get slower, and pg_stat_progress_vacuum shows nothing running. A single forgotten transaction is holding back the entire vacuum horizon.
One slow resolver triggers rate limiting that cascades to every query sharing the upstream. Fix by adding per-resolver complexity costs, DataLoader batching, and circuit breakers.
DLQ growing in SQS / RabbitMQ / Kafka without bound. Fix by classifying failures, fixing root-cause poison messages, and adding retry-with-backoff.
Postgres throws 'remaining connection slots reserved' under traffic. Fix by sizing the pool, adding PgBouncer, and killing long-idle connections.
Master node went down but no replica gets promoted; cluster stays in fail state. Fix by checking quorum, network partitions, and replica priority settings.
Sign-in flow redirects to localhost or the wrong domain. Configure allowed redirect URIs.
You configured rewrites for SPA / functions but they don't trigger.
Works locally, breaks on Vercel or Render. Diff env vars, runtime version, filesystem case-sensitivity, network, and build process — the systematic checklist.
Stripe, Twilio, SendGrid all return 429 once traffic ramps. Read Retry-After, add exponential back-off, cache GETs, and batch writes — in that order.
S3 / Firebase Storage / Supabase Storage upload 403s — IAM, signed URL, or bucket policy.
Supabase URL or anon key undefined in prod — host env config or prefix mismatch.
Provider says webhook delivered, your endpoint never sees it.