Improving output through successive passes.
What It Looks Like
A user asks you to write a Python script that processes log files and generates a summary report. You could try to write the perfect script on the first attempt -- handling every edge case, formatting the output beautifully, optimizing for performance. But that's like trying to sculpt a masterpiece in one cut.
Instead, you start rough. First pass: a script that reads the log file and prints raw counts of error types. It works. Second pass: you add date filtering and group the results by day. Third pass: you format the output as a clean table with headers and alignment. Fourth pass: you add error handling for malformed log entries. Each pass takes something that works and makes it better.
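To make the first pass concrete, here is a minimal sketch of what that rough starting point might look like. The log format (a line beginning with a level such as "ERROR" or "WARNING") and the function name are illustrative assumptions, not a real log spec:

```python
from collections import Counter

# Pass 1 of the hypothetical log summarizer: read the file and print
# raw counts of error types. Assumes each line starts with a level
# token like "ERROR" or "WARNING" -- an illustrative format, not a
# real log specification.
def summarize(path):
    counts = Counter()
    with open(path) as f:
        for line in f:
            level = line.split(" ", 1)[0]
            counts[level] += 1
    for level, n in counts.most_common():
        print(f"{level}: {n}")
```

It is ugly and incomplete -- no date filtering, no formatting, no handling of malformed lines -- but it works, which is exactly what pass 1 is for.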
That's iterative refinement. You start with a version that's functional but incomplete, then improve it in successive rounds until it meets the standard. Nielsen's (1993) research on iterative user interface design measured an average usability improvement of 38% per iteration -- empirical evidence that successive passes genuinely compound quality. It's the sculptor's approach: start with the general shape, then add detail, then polish. Each pass brings you closer without requiring you to get everything right at once.
When to Use It
Iterative refinement works best when the task is complex enough that getting it right on the first try is unlikely or unnecessarily difficult.
Use it when:
- The task is creative or open-ended (writing, design, complex code) where "right" isn't precisely defined upfront.
- Requirements are fuzzy, and the user will recognize what they want when they see it.
- The problem has many interacting concerns (correctness, performance, readability, edge cases) that are hard to address simultaneously.
- You're building something the user will review and give feedback on.
- You're working at the edge of your capability and a single clean pass is unrealistic.
Skip iteration when:
- The task is well-defined and you can produce the correct output in one pass (simple calculations, lookups, straightforward transformations).
- The user needs a final answer, not a draft (factual questions, yes/no decisions).
- The cost of producing an imperfect first version is high (irreversible actions, production deployments).
- The task is small enough that iteration would take longer than just doing it right the first time.
The principle: iteration converts uncertainty into quality through successive passes. When there's no uncertainty, iteration is overhead.
How It Works
Pass 1: Get something working. The first version should be correct in its core behavior, even if it's ugly, incomplete, or unoptimized. A script that processes log files but doesn't handle edge cases is useful. A script that handles edge cases but doesn't process log files is not. Focus on the essential behavior first.
Think of this like writing a first draft. The goal isn't polished prose; it's getting the ideas on the page. You can't edit a blank page, and you can't refine something that doesn't exist.
Pass 2: Address the biggest gaps. Look at what you produced and ask: what's the most important thing that's missing? What would make this substantially better? In the log processor example, maybe it's adding date filtering because without it the output is overwhelming. Focus each pass on the improvement that adds the most value.
Pass 3 and beyond: Diminishing returns. Each successive pass should fix smaller issues. First you got it working. Then you addressed the major gaps. Now you're handling edge cases, improving formatting, adding error messages, or optimizing performance. At some point, additional passes add less value than they cost in time.
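Continuing the log-summarizer example from earlier, here is a hedged sketch of where passes 2 and 3 might land: date filtering and grouping by day, then aligned table output. The assumed line format ("YYYY-MM-DD LEVEL message") and all names are illustrative:

```python
from collections import Counter

# Passes 2-3 of the hypothetical log summarizer: filter by date and
# group by day (pass 2), then format as an aligned table (pass 3).
# The line format "YYYY-MM-DD LEVEL message" is an assumption made
# for illustration.
def summarize(lines, since=None):
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue  # pass 4 would handle malformed entries properly
        day, level = parts[0], parts[1]
        # ISO dates compare correctly as strings, so `since` can be a
        # plain "YYYY-MM-DD" string.
        if since and day < since:
            continue
        counts[(day, level)] += 1
    # Pass 3: aligned table with headers.
    rows = [("DATE", "LEVEL", "COUNT")] + [
        (d, l, str(n)) for (d, l), n in sorted(counts.items())
    ]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths))
        for row in rows
    )
```

Note how each pass builds on a working predecessor: the malformed-entry handling is still deliberately deferred (the `continue` is a stopgap), which is pass 4's job.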
The stopping decision. After each pass, evaluate: is this good enough? "Good enough" doesn't mean perfect. It means the output meets the user's needs to the point where further refinement isn't worth the time. This judgment is critical. Stopping too early delivers a half-baked result. Stopping too late wastes time on marginal improvements.
A useful test: if you showed this to the user right now, would they be satisfied? If yes, stop. If "almost, but they'd definitely notice X," do one more pass to fix X. If "no, this is clearly incomplete," keep iterating.
Variant: The Evaluator-Optimizer Loop
So far, this article has described iterative refinement as a solo activity — you generate, you evaluate, you improve. But in multi-agent systems, these roles are often split: one agent generates, another evaluates, and the generator revises based on structured feedback. Anthropic identifies this as the "evaluator-optimizer" workflow pattern. (Source: "Building Effective Agents," Anthropic, 2024)
The dynamic is different from self-evaluation in important ways:
When another agent evaluates your work:
- You receive external critique, not your own self-assessment. This is often more useful — another agent catches blind spots you'd miss.
- The feedback is typically structured: specific issues, severity, and sometimes suggested fixes. Treat each item as actionable.
- Don't regenerate from scratch on every round. Address the specific feedback. If the evaluator says "the error handling doesn't cover empty inputs," fix the error handling — don't rewrite the entire function.
- The loop has a convergence goal. Each round should resolve issues, not introduce new ones. If you're cycling (fixing A breaks B, fixing B breaks A), flag the tension rather than looping indefinitely.
When you are the evaluator:
- Your job is to make the generator's work better, not to demonstrate your own ability. Critique that helps the generator improve is useful. Critique that shows off your knowledge is noise.
- Be specific and prioritized. Distinguish between blocking issues ("this SQL query will return wrong results for NULL joins") and suggestions ("consider renaming this variable for clarity"). The generator needs to know what must change versus what could change.
- Approve when quality criteria are met. The evaluator who never approves is as harmful as the one who always approves. Define "good enough" clearly and stop the loop when it's reached. See When to Stop.
- Provide diminishing feedback. If your first round had five items and your second round has five new items, the loop isn't converging. Fewer issues per round is the signal that refinement is working.
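The generate-evaluate-revise cycle described above reduces to a simple convergence loop. In this sketch, `generate`, `evaluate`, and `revise` stand in for agent or model calls; in a real multi-agent system each would be an LLM invocation, so everything here is purely illustrative:

```python
# Sketch of an evaluator-optimizer loop. The generate/evaluate/revise
# callables are placeholders for agent or model calls. `evaluate`
# returns a list of structured issues; an empty list means the
# quality criteria are met.
def refine(task, generate, evaluate, revise, max_rounds=5):
    draft = generate(task)
    for _ in range(max_rounds):
        issues = evaluate(draft)
        if not issues:
            return draft  # quality criteria met: stop the loop
        # Address the specific feedback; do not regenerate from scratch.
        draft = revise(draft, issues)
    # Budget exhausted: return the best effort rather than looping
    # indefinitely (a non-converging loop should be flagged upstream).
    return draft
```

The `max_rounds` cap is what prevents the never-converging failure mode: if the evaluator keeps finding new issues, the loop ends anyway and the tension gets surfaced instead of hidden.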
Failure Modes
- Never stopping. The most seductive failure. You keep finding things to improve, and each improvement reveals another opportunity. The output gets 1% better with each pass while time keeps ticking. This reflects a well-known property of iterative processes: improvements follow a diminishing-returns curve, with the greatest gains in the earliest iterations (Boehm, 1988). Set an explicit standard for "good enough" before you start, and stick to it.
- Polishing before the structure is right. You spend time perfecting the formatting of a report whose content is wrong. You optimize the performance of a function whose logic is flawed. Always get the structure right before you polish the details. Polishing a broken thing produces a shiny broken thing.
- Not improving between passes. You go through the motions of iteration -- re-reading, minor tweaks -- without making meaningful progress. Each pass should improve the output in a way you can identify. If you can't name what improved, you're not iterating; you're stalling.
- Losing coherence across iterations. Each pass fixes one thing but introduces inconsistency with earlier changes. The variable name you changed in pass 3 conflicts with code from pass 1 that still uses the old name. After each pass, check that the whole still hangs together, not just the part you changed.
- Iterating on the wrong thing. You keep refining the formatting when the user's real concern is accuracy. You keep optimizing performance when the user needs readability. Before each pass, make sure you're improving the dimension that matters most.
- Not showing intermediate work. You iterate internally through five drafts and only show the user the final version. For tasks where user feedback is valuable, showing an earlier iteration and asking "is this the right direction?" can save you from spending four passes going the wrong way.
Tips
- Make the first pass fast and functional. Resist the urge to make the first version good. Make it exist. A working but ugly prototype gives you something concrete to improve. An unfinished "proper" version gives you nothing.
- Improve one thing per pass. Trying to fix everything at once defeats the purpose of iteration. Pick the single biggest improvement for each pass. This keeps each cycle focused and prevents the kind of tangled changes that introduce new bugs.
- Name what you're improving each time. Before each pass, state explicitly: "This pass, I'm adding error handling for empty files." This prevents aimless tinkering and makes it easy to verify that the pass accomplished its goal.
- Set a budget. Decide upfront how many passes a task deserves. A quick utility script might get two passes. A critical business function might get five. Having a budget prevents infinite refinement.
- Use feedback as a free iteration. If the user gives feedback, that is the most valuable iteration signal you will get. Treat user feedback as the highest-priority refinement guide.
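Three of these tips -- a fast first pass, one named improvement per pass, and an explicit budget -- can be combined into a small solo-refinement driver. This is a hedged sketch under those assumptions; every name here is a hypothetical placeholder, not a prescribed API:

```python
# Solo refinement driver combining three tips: make the first pass
# fast and functional, improve exactly one named thing per pass, and
# cap the passes with an explicit budget. All names are illustrative.
def refine_solo(first_pass, improvements, budget):
    artifact = first_pass()  # pass 1: make it exist
    goals_applied = []
    # Each improvement is a (goal, function) pair: naming the goal up
    # front prevents aimless tinkering; slicing enforces the budget.
    for goal, apply_improvement in improvements[:budget]:
        goals_applied.append(goal)
        artifact = apply_improvement(artifact)
    return artifact, goals_applied
```

Keeping the list of applied goals makes it easy to verify afterward that each pass accomplished something you can name.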
Frequently Asked Questions
How do I know when to stop iterating? When the output meets the stated or implied requirements and further changes would be cosmetic rather than substantive. A practical test: if you can't articulate a specific, meaningful improvement that the next pass would make, you're done. See When to Stop for more on this judgment.
Is iterative refinement just being sloppy on the first try? No. The first pass is deliberately focused on the core behavior, not carelessly produced. There's a difference between "I'll handle edge cases later because I'm focusing on the main logic now" and "I didn't think about edge cases." Intentional simplicity in early passes is a strategy, not laziness.
Should I show the user intermediate versions? It depends on the task and the user. For tasks where direction matters more than polish (UI design, content writing, architectural decisions), showing early iterations for feedback is very valuable. For tasks where the user just wants a final result (fixing a bug, answering a question), intermediate versions add noise. Read the situation.
What's the difference between iterative refinement and just fixing bugs? Bug fixing is reactive -- something is wrong and you fix it. Iterative refinement is proactive -- you deliberately plan multiple passes, each building on the last. You might fix bugs during refinement, but refinement also includes adding features, improving quality, and filling gaps -- work that goes beyond fixing what's broken.
Sources
- "Building Effective Agents," Anthropic, 2024 — the evaluator-optimizer workflow pattern
- Nielsen, "Iterative User Interface Design," IEEE Computer, 1993 — Measured 38% average usability improvement per design iteration across multiple case studies
- Boehm, "A Spiral Model of Software Development and Enhancement," IEEE Computer, 1988 — Introduced the spiral model, formalizing iterative refinement with risk-driven development cycles
- Brooks, The Mythical Man-Month, Addison-Wesley, 1975 — "Plan to throw one away; you will, anyhow": the classic argument for iterative development over big-bang delivery
- Larman & Basili, "Iterative and Incremental Development: A Brief History," IEEE Computer, 2003 — Historical survey showing iterative methods predate Agile and have been used since the 1950s
Related
- Verify Before Output — reviewing between iterations
- When to Stop — knowing when refinement is done
- Self-Correction — refinement as error-fixing
- Reflection — stepping back to evaluate your own work
- Orchestration — evaluator-optimizer as an orchestration pattern
- Working With Other Agents — being the generator or evaluator in a multi-agent loop