Joining a new codebase used to mean a week of clicking through files, two weeks of pairing, and a month before you could land a non-trivial PR. With an agent that can read the repo, the same orientation collapses into roughly one focused day — if you ask the right questions in the right order and write down what you learn. This is the day-one ritual for any developer joining a new project or returning to an old one after months away.
What this tutorial solves
Most new-repo confusion is not lack of code reading skill — it is not knowing which 50 files (of 5,000) actually matter. AI agents with repo access can answer that triage question in minutes. This guide gives you the exact five-question sequence, the format for capturing answers, and the verification steps so you do not ship a tour built entirely on AI hallucinations.
Who this is for
Developers joining a new project, ramping back into an old project after a leave, or auditing a codebase before signing on as a contractor. Also useful for staff and senior engineers asked to “go take a look at the X service” without prior context. Less useful for tiny repos (under 5,000 lines) where you can just read everything in an afternoon.
When to reach for it
Day 1 of a new job or new team rotation. Also: when revisiting your own project after 3+ months away, when picking up an abandoned internal tool, or when a code-review request comes in for a repo you have never touched. Run this before you write a single line of new code.
When this is NOT the right tool
Codebases with no documentation AND no agent access (you cannot upload, you cannot share, you can only paste 2,000 lines at a time): there the workflow takes longer than just reading the code. Also skip hosted agents for proprietary code you cannot share with any third-party AI; in that case use a locally hosted model or, if even a self-hosted model is off-limits, a paired human walkthrough instead.
Before you start
- Decide which AI you will use: Claude Code in a checked-out repo, Cursor with its indexer warmed up, a GitHub-connected ChatGPT, or Gemini Code Assist. Pick one — switching mid-tour costs context.
- Have the repo checked out locally and the dev environment runnable. You will want to verify claims by running things.
- Create an empty `CODEBASE-TOUR.md` in your scratch space. This is your output artifact.
- Block 2-3 uninterrupted hours. The tour is finished when the doc is done, not when the timer runs out.
Step by step
- Connect the AI to the codebase. Claude Code launched in the repo root, Cursor with the index fully built, or a GitHub connector pointed at the branch. Plain chat without access is much slower and you cannot verify the answers as easily.
- Question 1: Important files. “What are the 5 most important files in this codebase? For each, one sentence on why. Cite the file path.”
- Question 2: Request lifecycle. “Walk me through the request lifecycle, from URL or entry point to response. Cite `file:line` for each hop.” For non-web codebases, swap “request lifecycle” for “primary data flow” or “the path from CLI invocation to output”.
- Question 3: Testing. “How is testing organized? What is the test command for one file vs. all files? Which directories or modules have weak coverage based on test file ratio?”
- Question 4: Unwritten conventions. “What conventions are unwritten? Look for patterns repeated three or more times that are not in any README or CONTRIBUTING. Examples: naming, error handling, logging, transaction scope, file layout.”
- Question 5: Dragons. “Where are the dragons? Files that look risky to change. Look for old TODOs, comments with `HACK`, `FIXME`, or `DO NOT TOUCH`, deeply nested conditionals, and modules with high git churn.”
- Verify each answer. Open each cited file at the cited line. If the AI claims `auth.ts:42` is the entry point and `auth.ts` is 200 lines long but contains no auth logic, the agent guessed; re-run with a more specific prompt.
- Write up your tour. Put answers into `CODEBASE-TOUR.md` in your own words with the verified `file:line` cites. This becomes your first PR and proof that you understood.
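The marker half of Question 5 can be cross-checked without the agent. Below is a minimal Python sketch, assuming a repo small enough to walk on disk; the function name `count_markers`, the suffix list, and the skipped directories are illustrative choices, not part of any tool. It does not measure git churn or nesting depth.

```python
import re
from collections import Counter
from pathlib import Path

# Marker words taken from the Question 5 prompt; extend for your team's habits.
MARKERS = re.compile(r"\b(TODO|HACK|FIXME|DO NOT TOUCH)\b")
# Assumed list of directories that hold third-party or generated code.
SKIP_DIRS = {".git", "node_modules", "vendor"}


def count_markers(root: str, suffixes: tuple = (".py", ".ts", ".js", ".go")) -> Counter:
    """Count marker words per source file under root."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        rel = path.relative_to(root)
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if SKIP_DIRS.intersection(rel.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than fail the scan
        hits = len(MARKERS.findall(text))
        if hits:
            counts[rel.as_posix()] = hits
    return counts
```

Calling `count_markers(".").most_common(10)` from the repo root gives a short list to compare against the agent's dragons answer. A file the agent flagged that has no markers is not necessarily wrong, but it is worth opening first.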
First-run exercise
- Pick a single subsystem (auth, billing, search, one job worker) rather than the whole repo.
- Run the five questions scoped to that subsystem only.
- Verify every cite by clicking through. Note which ones the AI got right, wrong, or hallucinated.
- Change one variable for the second pass — usually the agent (Claude Code vs. Cursor) — and see whether the failure modes change. They will.
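When the scoped pass reaches Question 3, a rough test-file ratio per directory gives you a number to hold the agent's coverage claim against. This is a sketch under stated assumptions: the naming patterns in `is_test_file` are common conventions rather than a rule, the names `is_test_file` and `ratio_by_dir` are hypothetical, and tests kept in a separate top-level `tests/` directory will not be attributed to the module they cover.

```python
from collections import defaultdict
from pathlib import Path


def is_test_file(path: Path) -> bool:
    """Heuristic only: common test naming patterns; adjust for your stack."""
    name = path.name
    return name.startswith("test_") or name.endswith(
        ("_test.py", "_test.go", ".test.ts", ".test.js", ".spec.ts", ".spec.js")
    )


def ratio_by_dir(root: str, suffixes: tuple = (".py", ".ts", ".js", ".go")) -> dict:
    """Return test files divided by source files for each top-level directory."""
    tests = defaultdict(int)
    sources = defaultdict(int)
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root)
        # Files directly in the root are grouped under ".".
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        if is_test_file(path):
            tests[top] += 1
        else:
            sources[top] += 1
    # Directories with no source files are omitted to avoid dividing by zero.
    return {d: tests[d] / n for d, n in sources.items()}
```

A ratio near zero for a directory the agent described as well covered is a cue to ask it to list the actual test files.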
Quality check
- For every cited path, does the file actually contain what the AI said? Wrong paths and stale line numbers are the most common failure.
- Can you run the test command the AI gave you and have it pass green? If not, the AI guessed the command; find the actual one in `package.json`, `Makefile`, or CI config.
- Does the “dragons” answer match what existing team members say? Ask the most senior person on the team to spot-check that one section.
- Did the AI describe code that does not exist (a function that “handles” something, that you cannot find)? Re-prompt with “quote the exact code” — hallucinated functions vanish under that constraint.
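The first check in this list, whether each cited path and line exists at all, is mechanical enough to script before you start clicking through. The sketch below assumes cites are written as `path/to/file.ext:123`; the regular expression and the name `check_cites` are illustrative. It only catches cites that cannot be right. Whether the line says what the AI claimed still needs a human read.

```python
import re
from pathlib import Path

# Matches cites such as src/auth.ts:42. The pattern is an assumption about
# how the tour doc writes them; host:port strings with a dot also match.
CITE = re.compile(r"([\w./-]+\.\w+):(\d+)")


def check_cites(tour_text: str, repo_root: str) -> list:
    """Return (path, line, problem) for each cite that cannot be correct."""
    problems = []
    for path_str, line_str in CITE.findall(tour_text):
        line_no = int(line_str)
        target = Path(repo_root) / path_str
        if not target.is_file():
            problems.append((path_str, line_no, "file not found"))
            continue
        text = target.read_text(encoding="utf-8", errors="ignore")
        total = len(text.splitlines())
        if not 1 <= line_no <= total:
            problems.append(
                (path_str, line_no, f"line out of range (file has {total} lines)")
            )
    return problems
```

Run it as `check_cites(Path("CODEBASE-TOUR.md").read_text(), ".")` from the repo root; an empty list means every cite at least points at a real line.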
How to reuse this workflow
- Save your five prompts as a `codebase-tour.prompts.md` file you carry between jobs.
- After three tours you will know which prompts give weak answers in your stack and can rewrite them; for example, “request lifecycle” is wrong for a CLI tool, so use “command dispatch path” instead.
- Re-run a mini-tour (3 questions only) after any major refactor in a repo you already know. Codebases drift.
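One way to make the saved prompts portable is to parameterize the two parts that change between repos: the scope and the name of the main flow. The sketch below is a hypothetical layout, not a required format. The `{scope}` and `{flow}` placeholders and the function `render_prompts` are additions for illustration, and the prompt wording is lightly adapted from the five questions.

```python
# The five tour prompts with two placeholders added for reuse.
PROMPTS = [
    "What are the 5 most important files in {scope}? "
    "For each, one sentence on why. Cite the file path.",
    "Walk me through the {flow} in {scope}, from entry point to final output. "
    "Cite file:line for each hop.",
    "How is testing organized in {scope}? What is the test command for one file "
    "vs. all files? Which directories or modules have weak coverage based on "
    "test file ratio?",
    "What conventions are unwritten in {scope}? Look for patterns repeated three "
    "or more times that are not in any README or CONTRIBUTING.",
    "Where are the dragons in {scope}? Look for old TODOs, comments with HACK, "
    "FIXME, or DO NOT TOUCH, deeply nested conditionals, and modules with high "
    "git churn.",
]


def render_prompts(scope: str = "this codebase", flow: str = "request lifecycle") -> list:
    """Fill the placeholders so the same prompts fit a whole repo or one subsystem."""
    return [p.format(scope=scope, flow=flow) for p in PROMPTS]
```

For a CLI tool you might call `render_prompts("the export command", "command dispatch path")`; for a mini-tour after a refactor, slice the result down to the three questions you care about.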
Recommended workflow
Day 1 at a new job: 2-3 hour focused session, agent connected to the repo, five questions in sequence, output a CODEBASE-TOUR.md with verified file:line cites and your own annotations. Open a PR adding it to docs/ or your personal wiki. Day 2 has roughly 70% less “where do I start” friction, and you have a reusable artifact for the next person.
Common mistakes
- Asking high-level questions like “explain everything” — you get vague summaries that read well but help nothing.
- Trusting AI claims without clicking through to verify `file:line`. Agents hallucinate line numbers more often than they admit.
- Skipping the “dragons” question because it feels rude on day one. These are the bugs you will trip on in month two.
- Treating the tour as a finished product. Real understanding only comes from changing code; the tour is the map, not the territory.
- Letting the agent invent module names. If a path does not exist, say so and re-prompt; do not paper over it.
- Running the tour on a stale branch. Pull `main` first; otherwise you are touring last quarter’s code.
Advanced tips
- For each “important file”, read the first 200 lines yourself the same day. AI hint plus human read equals real understanding.
- After the tour, pick the smallest viable contribution — a typo fix, a doc update, a missing test — and ship it. Validates your dev environment and creates a paper trail of competence.
- Re-run the full tour after major refactors or any time the team adds a new core dependency. The architecture diagram in the README is almost certainly out of date.
- Pair-program the tour with a teammate on day two. They will correct anything the AI got wrong, and you will look prepared, not lost.
FAQ
- Cursor or Claude Code?: Cursor for IDE-native exploration where you want to jump to definition mid-conversation. Claude Code for agentic multi-step questions where the agent does its own file reads. Both work; pick the one you already pay for.
- What about NotebookLM?: Good if the codebase has substantial written design docs you can upload. For raw source code, agents with direct repo access are better because they can navigate imports.
- How big a codebase before this stops working?: Above roughly 500k lines or 10k files, the agent’s context window starts to bite. Scope the tour to one subsystem at a time.
- Should I share the tour doc with the team?: Yes, after a teammate spot-checks the dragons section. Onboarding docs from outsiders often catch unwritten conventions long-timers cannot see.
- What if the codebase has no tests at all?: That itself is the answer to question 3. Note it in the tour as a risk, and consider whether you want to land here at all.
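- Can I do this for a private repo without uploading code?: Yes, with a self-hosted model running on hardware you control. An agent CLI that runs on your machine, such as Claude Code, still sends the files it reads to a hosted model API, so running locally is not the same as keeping code local. Verify your AI provider’s data handling before sharing any proprietary code.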
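To decide whether the size guidance above applies to your repo, a quick count of files and lines is enough. This is a rough sketch: the thresholds are the approximate figures from the FAQ, the skipped directories are an assumed list, and the names `repo_size` and `should_scope` are hypothetical. It counts newline-delimited lines in every file it finds, binaries included, so treat the result as an upper bound.

```python
from pathlib import Path

# Assumed list of directories that do not count as code you would tour.
SKIP_DIRS = {".git", "node_modules", "vendor", "dist", "build"}


def repo_size(root: str) -> tuple:
    """Return (file_count, line_count) for files under root, skipping SKIP_DIRS."""
    files = 0
    lines = 0
    for path in Path(root).rglob("*"):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        try:
            with path.open("rb") as fh:
                lines += sum(1 for _ in fh)
        except OSError:
            continue  # unreadable file: leave it out of both counts
        files += 1
    return files, lines


def should_scope(files: int, lines: int,
                 max_files: int = 10_000, max_lines: int = 500_000) -> bool:
    """True when the repo exceeds either rough threshold from the FAQ."""
    return files > max_files or lines > max_lines
```

If `should_scope(*repo_size("."))` is true, run the five questions against one subsystem directory at a time rather than the repo root.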