Agent Rollback Was Incomplete

Code is reverted, but the agent left behind a stale `dist/`, an applied migration, dangling feature flags, and mutated state in third-party tools. Roll back by domain, not by `git reset`.

You asked the agent to undo the last hour’s work. It ran git reset --hard HEAD~3 and reported “rolled back.” The code is back. But the database has an extra column from the migration that ran 20 minutes ago, the dist/ folder still holds the new bundle (the one Vercel deployed), your Stripe webhook URL was changed in the dashboard and now points at a path that no longer exists, and the BILLING_V2_ENABLED feature flag is still on in PostHog.

“Rollback” only undoes what’s in git. Everything else — generated artifacts, database state, third-party configuration, environment variables, feature flags, caches — has its own lifecycle. A complete rollback is a multi-domain checklist, not a git command. Below: the six domains agents typically leave dirty, plus the prompts that catch them.

Common causes

Ordered by hit rate, highest first.

1. The agent only reverted git-tracked code

git reset --hard resets tracked files to the target commit and nothing else; it doesn’t even delete untracked or ignored files such as a gitignored dist/. The agent confidently says “rolled back” because its concept of state is the repo, not the system.

How to spot it: After the rollback, the code on disk matches the target commit, but the running app behaves like the new code. Some output or state outside the repo no longer matches the code.

2. A database migration was applied and never reversed

The new feature added a column. The agent ran the migration. Now the code is rolled back but the column still exists, and the old code may break against a schema it doesn’t know about. For example, a new NOT NULL column without a default makes every INSERT from the old code fail.

How to spot it: pnpm db:status or prisma migrate status shows the migration as applied. Or your model definitions in the old code don’t match the live DB schema.

3. Built artifacts in dist/, .next/, .svelte-kit/ weren’t rebuilt

Code is back to the old version, but the dev server is serving cached compiled output. Browser reload shows the new behavior. Worse, your deploy pipeline pushed dist/ to a CDN that’s still serving it.

How to spot it: Code says X, browser shows Y. ls -la dist/ shows files newer than the last code commit.
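The ls -la check can be scripted. A minimal sketch, assuming you have already collected each artifact’s modification time (for example with Node’s `fs.statSync(path).mtimeMs`) and the checked-out commit’s timestamp (for example from `git log -1 --format=%ct`, which prints seconds); the function name `findStaleArtifacts` and the sample timestamps are hypothetical:

```typescript
// One built file and its last-modified time in epoch milliseconds
// (the unit of Node's fs.statSync(path).mtimeMs).
interface ArtifactFile {
  path: string;
  mtimeMs: number;
}

// Returns, sorted, the artifacts written after the commit that is now checked
// out. Unless you rebuilt after the rollback, these came from newer code.
// commitEpochSeconds is what `git log -1 --format=%ct` prints (seconds).
function findStaleArtifacts(files: ArtifactFile[], commitEpochSeconds: number): string[] {
  const commitMs = commitEpochSeconds * 1000;
  return files
    .filter((f) => f.mtimeMs > commitMs)
    .map((f) => f.path)
    .sort();
}

// Made-up timestamps: the checked-out commit is at 1_700_000_000 s.
const stale = findStaleArtifacts(
  [
    { path: "dist/index.js", mtimeMs: 1_700_000_600_000 }, // 10 minutes after the commit
    { path: "dist/vendor.js", mtimeMs: 1_699_999_000_000 }, // before the commit
  ],
  1_700_000_000,
);
console.log(stale); // [ 'dist/index.js' ]
```

It is a heuristic: a clean rebuild after the rollback also produces files newer than the commit, so run it before rebuilding.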

4. Third-party configuration was changed and not reverted

The new feature required updating webhook URLs in Stripe, redirect URIs in the Google OAuth console, and DNS records in Cloudflare. These live entirely outside your repo, and the agent had no concept of them.

How to spot it: External integrations break post-rollback because they’re calling URLs that no longer exist in the rolled-back code.

5. Environment variables / feature flags still set

NEW_FEATURE_ENABLED=true is still set in your .env.production or still on in PostHog/LaunchDarkly. The new code is gone, so usually nothing reads the flag and it has no effect. Or the old code does read it (or validates its environment strictly) and fails because it doesn’t expect that flag at all.

How to spot it: vercel env ls (or your platform’s equivalent) shows variables that don’t appear in the rolled-back .env.example.
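A minimal sketch of that comparison, assuming you have already copied the variable names out of vercel env ls (or your platform’s equivalent) into a list. The function names are hypothetical, and the parser handles only simple `KEY=value` lines (optionally prefixed with `export`), not every dotenv feature:

```typescript
// Extracts variable names from .env-style text: KEY=value lines, optionally
// prefixed with `export`. Blank lines and # comments are skipped.
function parseEnvNames(envText: string): Set<string> {
  const names = new Set<string>();
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (match) names.add(match[1]);
  }
  return names;
}

// Names set on the platform but not documented in the rolled-back
// .env.example: likely leftovers from the reverted change.
function findOrphanedEnvVars(deployedNames: string[], envExample: string): string[] {
  const expected = parseEnvNames(envExample);
  return deployedNames.filter((name) => !expected.has(name)).sort();
}

// Hypothetical rolled-back .env.example contents.
const example = [
  "# Rolled-back .env.example",
  "DATABASE_URL=",
  "export STRIPE_SECRET_KEY=sk_test_placeholder",
].join("\n");

console.log(findOrphanedEnvVars(["DATABASE_URL", "STRIPE_SECRET_KEY", "BILLING_V2_API_KEY"], example));
// [ 'BILLING_V2_API_KEY' ]
```

Platforms also inject their own variables, so in practice you would likely filter out a known prefix or allowlist before diffing.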

6. Cache (CDN, Redis, browser) still has new responses

Cloudflare cached the new API responses for 1 hour. Redis cached the new computed values for 24 hours. Even with code rolled back, users get the new data until the cache expires.

How to spot it: a hard refresh, a different browser, or curl shows the old behavior, but your normal browser still shows the new one. Whichever cache sits between those two requests is holding stale data.
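The comparison becomes systematic if you probe each layer from the origin outward and stop at the first one that still shows the new behavior. A sketch, assuming you can reach the origin without the CDN (for example through a deployment-specific URL, which may not exist in your setup) and record by hand whether each probe shows the old behavior; the probe names and `locateStaleLayer` are hypothetical:

```typescript
// Each probe records whether it saw the rolled-back ("old") behavior.
// Probes are ordered from the origin outward.
interface ProbeResults {
  originDirect: boolean;  // request that skips the CDN (hypothetical origin URL)
  curlViaCdn: boolean;    // public URL through the CDN, no browser cache
  normalBrowser: boolean; // regular page load in your usual browser
}

type StaleLayer = "origin (app cache or old deploy)" | "cdn" | "browser" | "none";

// Walks outward from the origin: the first probe that still shows the new
// behavior points at the layer holding stale data.
function locateStaleLayer(p: ProbeResults): StaleLayer {
  if (!p.originDirect) return "origin (app cache or old deploy)";
  if (!p.curlViaCdn) return "cdn";
  if (!p.normalBrowser) return "browser";
  return "none";
}

// curl through the CDN shows the old behavior, your browser does not:
console.log(locateStaleLayer({ originDirect: true, curlViaCdn: true, normalBrowser: false })); // browser
```

An origin result here covers both a stale app-side cache such as Redis and a deploy that never picked up the revert.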

Shortest path to fix

Ordered by ROI. Step 1 is the framework; steps 2-6 are domain-specific recoveries.

Step 1: Enumerate every domain the original change touched

Before any rollback, ask the agent (or yourself) for a “blast radius” inventory:

For the last 3 commits, list every domain affected:
1. Git-tracked code: which files
2. Built artifacts: which dist/build dirs
3. Database: which migrations applied, what schema changes
4. Environment variables: which added/changed in which env
5. Feature flags: which created/flipped in which tool
6. Third-party config: which Stripe/OAuth/DNS/etc. changes
7. Caches: which CDN/Redis/browser caches may hold new state

For each, list the exact rollback action and its inverse command.

You’ll see immediately which domains need attention beyond git revert.
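If you ask the agent to return the inventory as structured data rather than prose, the “beyond git revert” list falls out mechanically. A sketch; the `InventoryItem` shape, the domain names, `beyondGitRevert`, and the sample entries (including the `Invoice.billingV2Id` column) are all hypothetical:

```typescript
type Domain = "code" | "artifacts" | "database" | "env" | "flags" | "thirdParty" | "cache";

interface InventoryItem {
  domain: Domain;
  change: string;   // what the original work did
  rollback: string; // the inverse action
}

// Groups every non-code item by domain: these are the changes that
// `git revert` will not undo and that need their own rollback step.
function beyondGitRevert(items: InventoryItem[]): Map<Domain, InventoryItem[]> {
  const grouped = new Map<Domain, InventoryItem[]>();
  for (const item of items) {
    if (item.domain === "code") continue;
    const bucket = grouped.get(item.domain) ?? [];
    bucket.push(item);
    grouped.set(item.domain, bucket);
  }
  return grouped;
}

const inventory: InventoryItem[] = [
  { domain: "code", change: "src/billing/v2/*", rollback: "git revert" },
  { domain: "database", change: "added column Invoice.billingV2Id", rollback: "drop column" },
  { domain: "flags", change: "BILLING_V2_ENABLED on in PostHog", rollback: "turn flag off" },
];

// Prints one line per domain that needs attention beyond git.
beyondGitRevert(inventory).forEach((items, domain) => {
  console.log(domain, items.map((i) => i.rollback));
});
```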

Step 2: Revert the code with git revert, not git reset --hard

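git reset --hard is destructive: it silently discards uncommitted changes, and publishing the rewritten history takes a force push that pulls commits out from under everyone else on the branch. Use git revert, which undoes the work with new commits: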

# Revert the last 3 commits (HEAD~3 itself is excluded; git reverts newest first)
git revert HEAD~3..HEAD --no-edit
git push origin HEAD

Then deploy the reverted code so production matches.

Step 3: Roll back the database migration

If the migration was non-destructive (it only added nullable columns or new tables), the old code may still work: leave the schema alone and just stop using it. If it was destructive (renamed or dropped columns), you need an explicit down migration, and any data in a dropped column only comes back from a backup:

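# Prisma has no down command, and migrate resolve --rolled-back only
# applies to migrations that failed. Diff the live DB against the
# rolled-back schema.prisma, review the SQL, then apply it:
pnpm prisma migrate diff \
  --from-schema-datasource prisma/schema.prisma \
  --to-schema-datamodel prisma/schema.prisma \
  --script > revert.sql
pnpm prisma db execute --file revert.sql --schema prisma/schema.prisma

# Drizzle / raw SQL
pnpm db:rollback   # if your stack has it
# else apply a hand-written undo migration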

If you have backups, restoring one taken before the migration is sometimes simpler than a hand-rolled down migration, but it also discards every write made since the backup.
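Whichever tool you use, a hand-written undo comes down to running each applied migration’s inverse, newest first, back to the last migration you want to keep. A minimal sketch of that ordering logic, assuming you keep a down statement next to every up; the migration names, the `Invoice.billingV2Id` column, and `planRollback` are hypothetical, and the function only plans the SQL rather than executing it:

```typescript
interface Migration {
  name: string;
  up: string;   // SQL applied going forward
  down: string; // hand-written SQL that undoes `up`
}

// Returns the down statements needed to go from the applied migrations back
// to `target` (the last migration that should remain applied), newest first,
// so each down runs against the schema its own up produced.
function planRollback(all: Migration[], applied: string[], target: string): string[] {
  const cut = applied.indexOf(target);
  if (cut === -1) throw new Error(`target ${target} is not applied`);
  const byName = new Map<string, Migration>();
  for (const m of all) byName.set(m.name, m);
  return applied
    .slice(cut + 1)
    .reverse()
    .map((name) => {
      const m = byName.get(name);
      if (!m) throw new Error(`no down migration recorded for ${name}`);
      return m.down;
    });
}

const migrations: Migration[] = [
  { name: "001_init", up: 'CREATE TABLE "Invoice" ("id" TEXT PRIMARY KEY);', down: 'DROP TABLE "Invoice";' },
  { name: "002_add_billing_v2", up: 'ALTER TABLE "Invoice" ADD COLUMN "billingV2Id" TEXT;', down: 'ALTER TABLE "Invoice" DROP COLUMN "billingV2Id";' },
  { name: "003_billing_v2_index", up: 'CREATE INDEX "Invoice_billingV2Id_idx" ON "Invoice" ("billingV2Id");', down: 'DROP INDEX "Invoice_billingV2Id_idx";' },
];

// Undo everything after 001_init: the index goes first, then the column.
console.log(planRollback(migrations, ["001_init", "002_add_billing_v2", "003_billing_v2_index"], "001_init"));
```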

Step 4: Rebuild and redeploy fresh artifacts

# Local
rm -rf dist .next .svelte-kit node_modules/.cache
pnpm install
pnpm build

# Push
vercel deploy --prod
# or trigger your CI redeploy

# Invalidate CDN
curl -X POST "https://api.cloudflare.com/client/v4/zones/<zone>/purge_cache" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"purge_everything":true}'

Don’t trust “the deploy will pick up the revert” — force a clean rebuild + cache purge.
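If you would rather script the purge than paste curl commands, the same Cloudflare purge_cache endpoint can be called with fetch. A sketch that only builds the request, so it can be checked without network access; the zone ID and token are placeholders you supply, and `buildPurgeRequest` is a hypothetical helper:

```typescript
// Minimal request shape, declared here so the sketch does not depend on DOM
// or Node type definitions; it can be passed to fetch(url, init).
interface PurgeRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds a Cloudflare purge_cache call: everything when no URLs are given,
// otherwise only the listed URLs.
function buildPurgeRequest(zoneId: string, apiToken: string, files: string[] = []): PurgeRequest {
  const payload = files.length === 0 ? { purge_everything: true } : { files };
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(payload),
    },
  };
}

// Usage (Node 18+ provides a global fetch):
// const { url, init } = buildPurgeRequest("<zone>", process.env.CF_API_TOKEN ?? "");
// const res = await fetch(url, init);
```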

Step 5: Revert third-party config and env vars

Walk each external system. Have a checklist (or build one):

- Stripe webhook URL: restore https://app.example.com/api/webhooks/billing (the agent changed it to /api/v2/webhook)
- Google OAuth redirect URI: confirm it is back to https://app.example.com/auth/callback
- DNS records modified: api.example.com CNAME (restore the previous target)
- PostHog flag BILLING_V2_ENABLED: turn off
- Vercel env vars added: BILLING_V2_API_KEY (remove)
- Cron schedules: hourly billing-sweep (deactivate)

These manual reverts are the ones agents miss most. If your team manages any of this with Terraform or Pulumi, git revert followed by terraform apply (or pulumi up) handles it; otherwise it’s clicking through dashboards.
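Without Terraform or Pulumi, the next best thing is to record the external settings you are about to touch before the change and compare afterwards. A sketch, assuming you flatten each system’s settings into key/value strings yourself (copied from the dashboards or read through each vendor’s API); the key naming scheme, the sample values, and `diffSnapshots` are hypothetical:

```typescript
type ConfigSnapshot = Record<string, string>;

interface ConfigDrift {
  changed: string[]; // present in both snapshots, different value
  added: string[];   // present now, absent before the change
  removed: string[]; // present before, absent now
}

const has = (o: ConfigSnapshot, k: string) => Object.prototype.hasOwnProperty.call(o, k);

// Compares the snapshot taken before the agent's change with the current
// state; every key reported here still needs a manual revert.
function diffSnapshots(before: ConfigSnapshot, now: ConfigSnapshot): ConfigDrift {
  const drift: ConfigDrift = { changed: [], added: [], removed: [] };
  for (const key of Object.keys(now)) {
    if (!has(before, key)) drift.added.push(key);
    else if (before[key] !== now[key]) drift.changed.push(key);
  }
  for (const key of Object.keys(before)) {
    if (!has(now, key)) drift.removed.push(key);
  }
  drift.changed.sort();
  drift.added.sort();
  drift.removed.sort();
  return drift;
}

const before: ConfigSnapshot = {
  "stripe.webhook.url": "https://app.example.com/api/webhooks/billing",
  "posthog.flag.BILLING_V2_ENABLED": "false",
};
const now: ConfigSnapshot = {
  "stripe.webhook.url": "https://app.example.com/api/v2/webhook",
  "posthog.flag.BILLING_V2_ENABLED": "true",
  "vercel.env.BILLING_V2_API_KEY": "set",
};

console.log(diffSnapshots(before, now));
```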

Step 6: Force cache expiry where it matters

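# Cloudflare: full purge (the Step 4 call), or list only the affected URLs
curl -X POST "https://api.cloudflare.com/client/v4/zones/<zone>/purge_cache" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"files":["https://app.example.com/api/billing"]}'

# Redis specific keys (or full flush in dev); -r skips del when nothing matches
redis-cli --scan --pattern "billing:v2:*" | xargs -r redis-cli del

# Browser: tell users to hard-refresh, or bump asset version in HTML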

For user-facing caches with a short TTL (minutes), letting entries expire naturally is often the right move. For longer TTLs, force-invalidate.
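The TTL judgment is simple arithmetic: an entry written by the new code just before the rollback went live can survive one full TTL after it. A sketch of that rule; the 10-minute tolerance is an arbitrary example threshold, the function names are hypothetical, and it ignores serve-stale features some caches offer:

```typescript
// The last response cached from the new code was stored no later than the
// moment the rollback went live, so it can survive until that moment + TTL.
function staleUntil(rollbackLiveMs: number, ttlSeconds: number): number {
  return rollbackLiveMs + ttlSeconds * 1000;
}

// Purge if waiting would leave stale data around longer than you accept.
function shouldForcePurge(
  rollbackLiveMs: number,
  ttlSeconds: number,
  nowMs: number,
  maxAcceptableWaitSeconds: number,
): boolean {
  const remainingMs = staleUntil(rollbackLiveMs, ttlSeconds) - nowMs;
  return remainingMs > maxAcceptableWaitSeconds * 1000;
}

const rollbackLive = Date.parse("2024-01-01T12:00:00Z");
const tolerance = 10 * 60; // accept up to 10 minutes of stale data
console.log(shouldForcePurge(rollbackLive, 60 * 60, rollbackLive, tolerance));      // true: 1 h CDN TTL
console.log(shouldForcePurge(rollbackLive, 24 * 60 * 60, rollbackLive, tolerance)); // true: 24 h Redis TTL
console.log(shouldForcePurge(rollbackLive, 5 * 60, rollbackLive, tolerance));       // false: 5 min TTL
```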

Prevention

  • Before any risky multi-domain change, take a “rollback inventory” — snapshot of every domain that may be touched
  • Make migrations idempotent and reversible — every up has a matching down you’ve tested
  • Use feature flags for risky launches so you can disable in seconds without rolling back code
  • Manage third-party config in code (Terraform, Pulumi) so revert is also a git operation
  • Document “rollback runbook” per feature — agents and humans both follow it
  • Don’t accept “rolled back” after a git reset --hard as a complete answer; always re-verify each domain

Tags: #Troubleshooting #Claude Code #Debug #Rollback