Prevent Unsafe AI Edits: Guardrails That Actually Work

AI agents sometimes touch files they shouldn't. These are the guardrails that actually stop them.

What this covers

The first time an AI agent edits prod.config.ts, deletes an .env.local, or “tidies up” a migration file, you discover that polite instructions don’t generalize. This guide is the set of guardrails that actually hold: written rules in repo, filesystem-level deny-lists, branch protection, and the workflow shape that makes unsafe edits visible before they land.

Who this is for

Anyone giving Cursor, Claude Code, Codex, or another agentic tool write access to a real repository - solo devs, small teams, and especially anyone whose repo touches secrets, infra, or production data. If your only current guardrail is “I asked the agent nicely,” this is for you.

When to reach for it

Before letting an agent run autonomously on a real repo for the first time. Also after any incident where the agent touched something it shouldn’t have - that’s the moment your rules will actually get written down.

Before you start

  • Audit what’s already in the repo that’s sensitive: .env*, credentials, infra config, migrations, deployment scripts. List them.
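  • Confirm .gitignore already excludes secrets. Pre-commit hooks only inspect what git is about to commit, not the rest of the filesystem, so .gitignore is what keeps local secret files from being staged in the first place.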
  • Install at least one secret scanner (gitleaks, detect-secrets, GitHub secret scanning). Pre-commit hook preferred.
  • Pick which branch is “human-only” (usually main). Plan branch protection to enforce review on it.
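
The audit in the first bullet can be partly automated. The sketch below walks the working tree and lists files whose names match a handful of patterns. The pattern list and the skipped directories are illustrative assumptions, not a complete inventory; adjust them to your repo.

```python
import fnmatch
import os
from pathlib import Path

# Hypothetical starter patterns; extend with whatever is sensitive in your repo.
SENSITIVE_PATTERNS = [".env", ".env.*", "credentials.json", "*.pem", "*.tfvars"]
# Directories that are not worth scanning (assumed names).
SKIP_DIRS = {".git", "node_modules"}


def find_sensitive_files(root, patterns=SENSITIVE_PATTERNS):
    """Return sorted repo-relative paths whose file name matches any pattern."""
    root = Path(root)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if any(fnmatch.fnmatchcase(name, pat) for pat in patterns):
                hits.append((Path(dirpath) / name).relative_to(root).as_posix())
    return sorted(hits)
```

Calling `find_sensitive_files(".")` from the repo root gives a first draft of the list; matching is by file name only, so directories such as infra/ or migrations/ still need to be added by hand.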

Step by step

  1. List do-not-touch paths in CLAUDE.md / AGENTS.md at repo root. Be specific: infra/, migrations/, .env*, secrets/, **/credentials.json. Vague rules (“be careful with config”) don’t generalize.
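  2. Use your agent's own ignore mechanism (.cursorignore in Cursor; other tools use their own ignore file or permission settings, so check the docs for the one you run) so the agent never loads the files into context. What it can't read, it is far less likely to edit. An agent with shell access can still reach those files, so treat this as one layer, not a wall.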
  3. Require human review for critical files via CODEOWNERS plus branch protection on main. The agent can propose, you approve.
  4. Run agents on a clean tree (no uncommitted local edits) so any change shown in git status post-run is the agent’s, not yours mixed in.
  5. Use pre-commit hooks (gitleaks, pre-commit framework) - the last line of defense if the agent and you both miss a secret.
  6. Enable branch protection on main: require PR, require status checks, disallow force push, require at least one human review. With admin bypass disabled, the agent cannot push to main directly, whatever it decides to do.
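
Step 4 is easy to check mechanically. The following is a minimal sketch that reads `git status --porcelain` and returns the paths with uncommitted or untracked changes. The function names are hypothetical, and the parser assumes the default (v1) porcelain format with unquoted paths.

```python
import subprocess


def parse_porcelain(output):
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in output.splitlines():
        if not line:
            continue
        # Each line is two status characters, one space, then the path.
        entry = line[3:]
        # Renames are printed as "old -> new"; keep the new path.
        if " -> " in entry:
            entry = entry.split(" -> ", 1)[1]
        paths.append(entry)
    return paths


def dirty_paths(repo="."):
    """Return paths with uncommitted or untracked changes in `repo`."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)
```

Run `dirty_paths()` before starting the agent and refuse to start if the list is non-empty. Run it again afterwards and everything it returns is the agent's work.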

A concrete CLAUDE.md block

## Files the agent must NOT edit
- .env, .env.*, .env.local, .env.production
- infra/**, terraform/**, k8s/**
- migrations/**
- src/lib/secrets/**
- prisma/schema.prisma (read OK, edit requires human PR)
- package-lock.json (only `npm install` may modify)
- VERSION, CHANGELOG.md (humans only)

## Operations forbidden
- git push --force, git push --force-with-lease
- git rebase on any branch already pushed to origin
- npm publish, pnpm publish, yarn publish
- any rm -rf in the project root

Concrete paths matter. Vague guidance becomes vibes.
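
Written rules hold better when something checks them. As a sketch, the path list above can be mirrored in a small checker that a pre-commit hook or CI job calls with the changed file names. The names `PROTECTED`, `is_protected`, and `blocked` are hypothetical, and the matching is a simplification: Python's `fnmatch` lets `*` cross `/`, so `infra/**` here simply means "anything under infra/".

```python
import fnmatch
from pathlib import PurePosixPath

# Mirrors the do-not-edit paths in the CLAUDE.md block; keep the two in sync.
PROTECTED = [
    ".env", ".env.*",
    "infra/**", "terraform/**", "k8s/**",
    "migrations/**",
    "src/lib/secrets/**",
    "prisma/schema.prisma",
]


def is_protected(path, patterns=PROTECTED):
    """True if a repo-relative POSIX path matches any protected pattern."""
    name = PurePosixPath(path).name
    for pat in patterns:
        # Patterns without a slash match the file name anywhere in the tree;
        # patterns with a slash match the full repo-relative path.
        target = path if "/" in pat else name
        if fnmatch.fnmatchcase(target, pat):
            return True
    return False


def blocked(changed_paths, patterns=PROTECTED):
    """Return the subset of changed paths that touch protected files."""
    return [p for p in changed_paths if is_protected(p, patterns)]
```

In a hook, feed it the output of `git diff --cached --name-only` and exit non-zero when `blocked(...)` is non-empty, so the commit stops before it is created.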

Why “ask nicely” fails

  • Agents reset between sessions; verbal rules don’t persist.
  • Long context degrades; rules near the top get echoed, rules in the middle get forgotten.
  • Different agents (Cursor / Claude Code / Codex) honor different conventions.
  • Even with rules, an agent will rationalize edge cases (“I had to touch this to fix the bug”).
  • Rules without enforcement (hooks, branch protection) are suggestions, not guardrails.

The workflow shape: guardrails (written rules + ignore files + hooks + branch protection) -> run agent on a clean tree -> review diff -> human commits or approves PR -> push. Each layer is cheap, and each one catches what the previous layer missed.
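
For the "review diff" step, a quick summary helps before reading the full diff. This is a rough sketch that parses `git diff --name-status` output and pulls out deletions and renames, which are the changes most likely to be an agent "tidying up". The function names are hypothetical, and the choice of which statuses to flag is an assumption.

```python
def parse_name_status(output):
    """Parse `git diff --name-status` output into (status, path) pairs."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0][0]  # "R100" -> "R"
        path = fields[-1]      # for renames and copies, the last field is the new path
        changes.append((status, path))
    return changes


def needs_close_look(changes):
    """Return paths that were deleted or renamed."""
    return [path for status, path in changes if status in {"D", "R"}]
```

Anything this returns deserves a deliberate yes or no from a human before the commit or PR approval.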

FAQ

  • Do I really need branch protection if I’m solo? - Yes. It also stops you from accidentally pushing in a panic at 2am.
  • What if I want the agent to edit a normally-protected file? - Temporarily remove the rule, do the work, restore the rule, and keep the rule changes and the work in separate commits so the exception is visible in history. Don’t soften the rule “just for this PR.”
  • Are pre-commit hooks worth the friction? - After you’ve shipped a .env even once, yes. Install gitleaks and forget it.
  • What about agents that can shell out? - Use a sandbox / container, restrict the shell tool to allow-listed commands. Never let an agent run sudo or production deploy commands.
  • CODEOWNERS for AI projects? - Yes; mark infra/, migrations/, and security-sensitive files as requiring a human reviewer.
  • What if the agent edits a protected file anyway? - A pre-commit hook that checks paths should block it (a secret scanner alone won’t); if it lands, git revert the commit and add the path to the ignore file. Don’t blame the agent twice.
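
The allow-list idea from the shell question can be sketched as a gate placed in front of whatever executes the agent's commands. The names and the allowed set below are hypothetical, and this is deliberately conservative: it rejects any command containing shell metacharacters, even inside quotes. It is a filter, not a sandbox, so it complements a container and does not replace one.

```python
import shlex

# Hypothetical allow-list; keep it as short as your workflow permits.
ALLOWED = {"ls", "cat", "npm", "node", "git"}
# Characters that enable chaining, redirection, or substitution.
SHELL_METACHARS = set(";|&><`$")


def is_allowed(command, allowed=ALLOWED):
    """True only for a single, unchained command whose program is allow-listed."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # e.g. unbalanced quotes
    if not tokens:
        return False
    return tokens[0] in allowed
```

Note that allowing `git` as a program still permits `git push --force`; a real setup would also need per-subcommand rules for the forbidden operations listed earlier.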

Common mistakes

  • Trusting verbal “don’t touch X” - rules need to be written in repo, or they decay within one session.
  • No CLAUDE.md / AGENTS.md - the agent invents conventions per session.
  • Running agents on a dirty tree - your edits get mixed with the agent’s, review becomes impossible.
  • Letting the agent push directly - removes the only human checkpoint.
  • Skipping pre-commit hooks - “I’ll catch it in review” eventually fails.
  • Vague rules (“be careful”) - they generalize unpredictably; concrete paths only.
