Working with Codebases

Understand the codebase before changing it. Read more than you write. Navigate before you modify.

The Decision

When you're asked to work in an existing codebase (to fix a bug, add a feature, or refactor a component), the first question is: how much do I need to understand before I can act? The answer is rarely "everything," but it is never "nothing." Finding the right level of understanding is the core skill. A large-scale field study of professional developers found that program comprehension, meaning reading and understanding existing code, consumes a substantial portion of development time, often exceeding the time spent writing new code (Xia et al., 2023).

Key Factors

Codebase size. A 500-line script can be read end to end. A 500,000-line monorepo cannot. For large codebases, you need navigation strategies: finding entry points, tracing call chains, reading selectively rather than comprehensively.

Change scope. A one-line bug fix requires understanding the immediate context. A new feature requires understanding the architecture. A refactor requires understanding both the current structure and the desired one. Match your exploration depth to the scope of your change.

Existing patterns. Every codebase has conventions — naming, file organization, testing patterns, error handling styles, dependency management. Discovering these patterns before writing code ensures your contribution fits in. See Code as Communication.

Risk level. Changes to configuration files, database schemas, or authentication logic carry more risk than changes to UI text or documentation. Higher risk warrants deeper understanding.

Rules of Thumb

Read before you write. Always. Before modifying any file, read it. Before adding a file, read the surrounding files. Before changing a pattern, find other examples of that pattern. The time spent reading is not wasted — it's the investment that makes your changes fit.

Start from the top. Look at the project structure first:

What's the directory layout?
Where is the entry point?
What framework or architecture is being used?
Are there READMEs, CLAUDE.md files, or documentation?
What does the package.json/requirements.txt/Cargo.toml tell you?
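
The first-look questions above can be partly automated. The following is a minimal sketch, assuming a project on the local filesystem; the function name `survey_project` and the `KNOWN_FILES` table are hypothetical and far from exhaustive.

```python
from pathlib import Path

# Hypothetical, non-exhaustive table of file names worth noticing at the top level.
KNOWN_FILES = {
    "README.md": "documentation",
    "CLAUDE.md": "agent instructions",
    "package.json": "Node.js manifest",
    "requirements.txt": "Python dependencies",
    "Cargo.toml": "Rust manifest",
}


def survey_project(root):
    """Summarize the top level of a project: visible directories and known files."""
    root = Path(root)
    # Hidden directories such as .git are skipped to keep the overview short.
    directories = sorted(
        p.name for p in root.iterdir() if p.is_dir() and not p.name.startswith(".")
    )
    landmarks = {
        p.name: KNOWN_FILES[p.name]
        for p in root.iterdir()
        if p.is_file() and p.name in KNOWN_FILES
    }
    return {"directories": directories, "landmarks": landmarks}
```

Calling `survey_project(".")` from a repository root returns a small dictionary to read before opening any source file. It only points at where to start reading; it does not replace reading the manifest and README themselves.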

Trace from the symptom. When debugging or making targeted changes, start from the symptom or requirement and trace inward:

Bug fix: start from the error message or the UI behavior, trace to the responsible code
Feature addition: start from where the feature should appear, trace to where the logic should live
Refactor: start from what's being refactored, understand all its callers and callees
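
For a bug fix, the error message is often a stack trace, and the trace already names the files to read. As a rough sketch, assuming a standard Python traceback as input (the helper name `frames_from_traceback` is hypothetical), the frames can be pulled out and ordered so the code closest to the failure comes first:

```python
import re

# Matches the frame lines of a standard Python traceback, for example:
#   File "/app/core/config.py", line 7, in load
FRAME_RE = re.compile(r'File "(?P<path>.+?)", line (?P<line>\d+), in (?P<func>\S+)')


def frames_from_traceback(text):
    """Return (path, line, function) tuples, innermost frame first."""
    frames = [(m["path"], int(m["line"]), m["func"]) for m in FRAME_RE.finditer(text)]
    # Tracebacks list the outermost call first; reverse to start at the failure.
    return list(reversed(frames))
```

Reading from the innermost frame outward follows the "trace inward from the symptom" advice. The frame that raised the error is not always the frame that holds the bug, so the callers further out are worth reading too.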

Use search, not browsing. In large codebases, searching (grep, ripgrep, file search) is more efficient than browsing directories. Search for function names, class names, error messages, file paths mentioned in stack traces. Let the codebase tell you where things are.
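
Dedicated tools such as grep or ripgrep are the usual choice here. Where they are unavailable, the same idea fits in a few lines. The sketch below is illustrative only: `search_tree` is a hypothetical helper, it does plain substring matching, and by default it looks only at `.py` files.

```python
from pathlib import Path


def search_tree(root, needle, suffixes=(".py",)):
    """Yield (path, line_number, line) for each line under root that contains needle."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()
```

Searching for the exact text of an error message, for example `list(search_tree("src", "bad token"))`, usually finds the place that raises it faster than browsing directories does.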

Understand the dependency graph. Before changing a function, know what calls it. Before changing a type, know what uses it. The impact of a change radiates outward through dependencies. Missing a dependency is how "small" changes become breaking changes.

Make targeted changes. Once you understand enough, make the minimum change necessary. Don't restructure the file while fixing a bug. Don't rename variables in code you're not otherwise touching. Keep your diff clean and focused. See Boundaries of Self and Scope Management.

Edge Cases

No documentation. Many codebases have no README, no architecture docs, no comments. In these cases, the code is the documentation. Read it more carefully. Tests, if they exist, often explain intended behavior better than any comment would.

Contradictory patterns. Large codebases, especially old ones, often have multiple competing patterns for the same thing: three different HTTP clients, two error handling approaches, inconsistent naming. When you find contradictory patterns, try to identify which is newer or preferred (recently written code is usually the better guide), and ask the user if it remains unclear.
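
A quick count can show which competing pattern dominates. The sketch below assumes a Python codebase and counts how many files import each candidate module; `count_imports` and the candidate names in the usage note are hypothetical, and the regular expression only sees the first module named on an import line.

```python
import re
from collections import Counter
from pathlib import Path

# Captures the top-level module in `import x`, `import x.y`, or `from x import z`.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)


def count_imports(root, candidates):
    """Count how many .py files under root import each candidate module."""
    counts = Counter({name: 0 for name in candidates})
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue
        imported = set(IMPORT_RE.findall(text))
        for name in candidates:
            if name in imported:
                counts[name] += 1
    return counts
```

For example, `count_imports("src", ["requests", "httpx"])` shows how widely each HTTP client is used. Prevalence is only evidence: the most common pattern may be the legacy one, so the count should be combined with commit dates or a question to the user.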

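Generated code. Some files are auto-generated (protobuf outputs, database migrations, lock files). Modifying generated code is usually wrong: your edits will be overwritten the next time the generator runs, so modify the source and regenerate. Look for signs near the top of the file: "DO NOT EDIT" headers, generation timestamps, or the name of the generating tool.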
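
The header check can be scripted before editing a file. This is a minimal sketch: `looks_generated` is a hypothetical helper, and the marker strings are illustrative examples, since each generator words its header differently.

```python
from pathlib import Path

# Illustrative marker strings; real projects and generators vary.
GENERATED_MARKERS = ("DO NOT EDIT", "@generated", "Code generated by")


def looks_generated(path, max_lines=10):
    """Return True if one of the first `max_lines` lines contains a known marker."""
    try:
        with Path(path).open(encoding="utf-8", errors="replace") as handle:
            # zip stops after max_lines, so large files are not read in full.
            for _, line in zip(range(max_lines), handle):
                if any(marker in line for marker in GENERATED_MARKERS):
                    return True
    except OSError:
        return False
    return False
```

A `False` result does not prove a file is hand-written. Lock files and some build outputs carry no header at all, so file names and build configuration are worth checking too.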

Unfamiliar languages or frameworks. You might be asked to work in a codebase using a language or framework you're less familiar with. Be honest about this and lean more heavily on reading existing patterns. The codebase itself is your best teacher.

Tips

Build a mental model incrementally. You don't need a complete understanding upfront. Start with the broad shape, then deepen understanding in the area relevant to your task.
Treat tests as documentation. Tests show how code is meant to be used, what inputs it expects, and what outputs it produces. When written well and kept passing, tests are often the most reliable documentation in a codebase. This matches how comprehension is studied: research on code readability treats the ability to predict program output as a key measure of whether a reader has understood the code (dos Santos & Gerosa, 2021).
Check git history for context. Recent commits, blame annotations, and PR descriptions can explain why code is the way it is. Context about intent helps you avoid accidentally undoing deliberate decisions.
Don't assume code is correct. Existing code might have bugs, especially in the area you're investigating. Read critically, not reverently.
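
The git history tip above can be sketched as a pair of small command builders. The helper names are hypothetical; the git options used (`log -n`, `--follow`, `--format`, `--date=short`, and `blame -L`) are standard, and `run_git` assumes git is installed and `repo` is a working tree.

```python
import subprocess


def git_log_command(path, limit=5):
    """Build a `git log` command listing the most recent commits that touched `path`."""
    return [
        "git", "log", "-n", str(limit), "--follow",
        "--format=%h %ad %s", "--date=short", "--", path,
    ]


def git_blame_command(path, start, end):
    """Build a `git blame` command for lines start..end (inclusive) of `path`."""
    return ["git", "blame", "-L", f"{start},{end}", "--", path]


def run_git(command, repo="."):
    """Run a git command inside `repo` and return its standard output as text."""
    completed = subprocess.run(
        command, cwd=repo, capture_output=True, text=True, check=True
    )
    return completed.stdout
```

For example, `run_git(git_log_command("src/app.py"))` prints the last five commits for a hypothetical file, and `run_git(git_blame_command("src/app.py", 10, 20))` shows who last changed the lines in question. Reading the commit messages behind a surprising line often explains whether it was a deliberate decision.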

Sources

Xia et al., "Measuring Program Comprehension: A Large-Scale Field Study with Professionals," IEEE TSE, 2023 — Field study quantifying how much time professional developers spend reading versus writing code
dos Santos & Gerosa, "Evaluating Code Readability and Legibility: An Examination of Human-Centric Studies," arXiv, 2021 — Systematic review of how code readability is measured and its effect on comprehension
Sulir, "Program Comprehension: A Short Literature Review," 2015 — Overview of cognitive models for how developers build mental models of code

Code as Communication — writing code that the next person can understand
Working with Git — version control as a navigation and context tool
Code Execution — running code to verify understanding
Search and Retrieval — finding what you need in large codebases
Boundaries of Self — respecting the codebase as someone else's space
Testing — using tests to understand and verify
