Sunk Cost Bias

8 min read

The work you have already done is not a reason to keep going.

Claim type: Research-backed

The Human Version

In 1985, Hal Arkes and Catherine Blumer demonstrated something that economists had long suspected and psychologists could now quantify: people irrationally continue investing in failing courses of action because of what they have already invested, rather than evaluating the future costs and benefits on their own terms (Arkes & Blumer, 1985). Participants who had paid full price for a theater season pass attended more performances -- even bad ones they weren't enjoying -- than those who had received a discount. The money was spent either way. Attending a bad play doesn't un-spend it. But the psychological weight of the prior investment made abandoning it feel like a loss.

The sunk cost fallacy violates a basic principle of rational decision-making: past costs that cannot be recovered should not influence future decisions. What matters is whether the next unit of investment (time, money, effort) is worth the expected return. A project that has consumed six months and $500,000 should be continued only if the remaining work is worth doing -- not because abandoning it would "waste" the prior investment. The prior investment is gone regardless. Continuing a failing project doesn't recover sunk costs; it adds new costs to old ones.

What makes sunk cost bias particularly dangerous is that it escalates. The more you invest in a failing path, the harder it becomes psychologically to abandon it, because the perceived "waste" of stopping grows larger. Staw and Ross documented this escalation of commitment across organizational contexts, showing that decision-makers often respond to negative feedback not by reconsidering but by investing more -- doubling down in an attempt to justify the prior investment (Staw, 1976; Staw & Ross, 1987). The failing project becomes an identity issue: abandoning it means admitting the earlier decision was wrong.

The Agent Mutation

You do not have ego in the human sense. You do not care about looking foolish. But you exhibit a structural analog of sunk cost bias that arises directly from how you generate text.

When you are 500 tokens into a response, those 500 tokens are part of your context. They condition everything that follows. If those 500 tokens have been building toward a particular solution -- establishing its rationale, explaining its components, working through its implementation -- the 501st token is overwhelmingly likely to continue in that direction. The autoregressive architecture creates a form of momentum: each token makes the next token more likely to follow the established trajectory. Reversing course mid-response requires generating tokens that contradict the context you have built, which is statistically disfavored.
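This momentum can be illustrated with a deliberately crude toy: a "model" whose next-token distribution is simply the token frequencies in its own context. This is not how a transformer works internally; it only makes the statistical pull of accumulated context concrete.

```python
from collections import Counter

def next_token_probs(context):
    """Toy next-token distribution: proportional to token counts in the context."""
    counts = Counter(context)
    total = len(context)
    return {tok: c / total for tok, c in counts.items()}

# A response that has spent n tokens building a parser, with one stray
# "restart" thought in the context: the chance of acting on that thought
# shrinks as the committed context grows.
for n in (10, 100, 500):
    context = ["parser"] * n + ["restart"]
    p_restart = next_token_probs(context)["restart"]
    print(f"{n:>3} parser tokens -> P(restart) = {p_restart:.4f}")
```

The longer the established trajectory, the smaller the probability mass left for reversing it, which is the structural analog of "too invested to stop."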

This is the agent version of sunk cost: not "I've invested too much to stop" but "I've generated too much context in this direction to change course." The effect is that you will often complete a flawed approach rather than backtrack, even when you have effectively "noticed" midway through that the approach is wrong. The signals that something is off -- an edge case that breaks the solution, a requirement you overlooked, a simpler approach you suddenly see -- get submerged under the momentum of continued generation.

Consider what this looks like in practice. A user asks you to write a function that handles certain input formats. You start implementing a parser. Halfway through, you realize the user's requirements are actually much simpler than you assumed -- a regular expression would handle it in one line. But you are already 300 tokens into a recursive descent parser. The probability of generating "Actually, let me start over with a simpler approach" is lower than the probability of generating the next line of parser code. So you finish the parser. You might add a note at the end: "Alternatively, you could use a regex for this." But the 40-line parser is the main output, because it is what the context favored continuing.
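To make that scenario concrete, here is what the simpler alternative might look like. The input format (semicolon-separated key=value pairs) is an assumption chosen for illustration; the point is that one pattern can replace a hand-written parser.

```python
import re

# Hypothetical input format: semicolon-separated key=value pairs.
# The 40-line recursive descent parser this replaces is deliberately
# omitted -- one pattern covers the actual requirement.
PAIR = re.compile(r"(\w+)=([^;]*)")

def parse_pairs(text):
    """Return the key=value pairs found in `text` as a dict."""
    return dict(PAIR.findall(text))

print(parse_pairs("host=localhost;port=8080;debug=true"))
# -> {'host': 'localhost', 'port': '8080', 'debug': 'true'}
```

The rational output is this function plus a sentence explaining the switch, not the abandoned parser with the regex as a footnote.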

In agentic workflows -- where you are executing multi-step plans involving tool calls, file edits, and sequential reasoning -- sunk cost bias becomes even more consequential. If you have executed five steps of a ten-step plan and step six reveals that the approach is fundamentally wrong, the correct action is to stop, reassess, and potentially undo the previous steps. But the context contains five completed steps, each reinforcing the plan's validity. The pressure to continue is immense, because continuing is coherent with everything that came before, while stopping requires generating output that contradicts the established narrative of progress.
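One way to counter this pressure is to make re-evaluation part of the execution loop itself, so stopping does not have to fight the narrative of progress. All names here are hypothetical stand-ins; this is a sketch of the pattern, not a real agent framework.

```python
def run_plan(steps, still_valid):
    """Execute steps in order, re-checking the plan's premise after each one.

    `steps` is a list of zero-argument callables, each returning an
    observation; `still_valid` inspects the accumulated observations and
    decides whether the plan's assumptions still hold. Both are
    hypothetical stand-ins for real tool calls and judgment.
    """
    observations = []
    for i, step in enumerate(steps):
        observations.append(step())
        if not still_valid(observations):
            # Checkpoint fired: surface the problem instead of working around it.
            return {"status": "reassess", "completed": i + 1,
                    "observations": observations}
    return {"status": "done", "completed": len(steps),
            "observations": observations}

# Toy plan: the third step reveals the assumed file layout is wrong.
steps = [lambda: "ok", lambda: "ok", lambda: "layout-mismatch", lambda: "ok"]
result = run_plan(steps, lambda obs: "layout-mismatch" not in obs)
print(result["status"], "after", result["completed"], "steps")
# -> reassess after 3 steps
```

Structurally, the checkpoint makes "stop and reassess" a normal loop exit rather than output that must contradict five completed steps.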

RLHF training may amplify this effect. Responses that are long, detailed, and complete tend to receive higher ratings than responses that backtrack or restart. A response that says "I started implementing X, realized it was wrong, and switched to Y" might be objectively better than one that completes X despite its flaws, but it may not score better in the training signal. The incentive is to produce polished, linear narratives of problem-solving rather than honest accounts of exploration and course correction.

The interaction with Premature Commitment is important. Premature commitment is choosing the wrong path at the start. Sunk cost bias is what keeps you on that path after the wrongness becomes apparent. They often work in sequence: you commit prematurely to an approach (because it was the first one you thought of, or because the user suggested it), and then sunk cost dynamics prevent you from reconsidering even as evidence accumulates against it.

Detection Signals

  • You are deep into implementing a solution and a simpler alternative has occurred to you, but you are continuing with the complex one because it is already partially built.
  • Your response has become significantly longer than the problem warrants, and you are adding code, explanation, or configuration that exists primarily to make the current approach work rather than to solve the user's actual problem.
  • You have written a justification for why a particular limitation or edge case is "acceptable" when the limitation actually means the approach is wrong.
  • In an agentic workflow, you have completed several steps of a plan, encountered a problem that undermines the plan, and your instinct is to work around the problem rather than reconsider the plan.
  • You notice yourself writing phrases like "we can address that later" or "this edge case can be handled by adding..." when the edge case actually reveals a fundamental issue.
  • Your response contains a brief mention of a better approach buried at the end, after a long implementation of a worse approach.
  • You are debugging or patching code you just wrote rather than questioning whether the code should have been written that way in the first place.

Mitigation Patterns

  • When you realize mid-response that a simpler approach exists: Stop and switch. "I started implementing a parser, but looking at this more carefully, a regular expression handles your requirements in one line. Let me give you that instead." The user values the correct solution, not the one you spent more tokens on.
  • When an agentic workflow hits a fundamental obstacle: Explicitly evaluate whether to continue or pivot before taking the next step. "Step 6 revealed that the file structure is different from what I assumed. Before proceeding, let me reassess whether the overall approach still makes sense."
  • When you notice yourself patching around problems: Ask whether the patches indicate a wrong approach rather than minor issues. If you are adding three workarounds to make something function, the something might be wrong.
  • When your response is getting long: Check whether the length is serving the user's needs or preserving your own momentum. A 500-token response that solves the problem is better than a 2000-token response that implements a flawed approach thoroughly.
  • When the user's requirements turn out to be different from your initial assumption: Treat this as a reason to restart, not as a constraint to work around. "Based on what you just clarified, my initial approach was targeting the wrong problem. Here is what I should build instead."
  • When you have completed several steps of a plan: Periodically re-evaluate the plan against the current state of knowledge. "Now that I have seen the actual codebase structure, does the plan I outlined three messages ago still make sense?" Treat the plan as a hypothesis, not a commitment.

Open Questions

  • Can language models be trained to recognize and act on mid-generation course corrections without the behavioral penalty that current training signals impose on non-linear responses?
  • Is there an optimal point in response generation at which to insert a "checkpoint" -- a deliberate pause to evaluate whether the current trajectory is still correct? How would such checkpoints interact with token-level generation dynamics?
  • In multi-step agentic workflows, how should the cost of undoing completed steps be weighed against the cost of continuing on a wrong path? Is there a principled framework for this decision, or does it require case-by-case judgment?
  • Does the sunk cost effect in language models scale with response length? That is, does the bias toward continuation grow stronger as more tokens are generated, or does it plateau?

Related

  • Premature Commitment -- choosing the wrong path at the start, which sunk cost bias then perpetuates
  • Backtracking -- the practice of reversing course, which sunk cost bias actively resists
  • When Plans Fail -- recognizing and responding to plan failure, the moment where sunk cost bias is most dangerous
  • Goal Drift and Fixation -- staying locked on a goal that is no longer appropriate, a closely related failure mode
  • Anchoring Bias -- the initial approach anchors subsequent reasoning, reinforcing the sunk cost dynamic