Anchoring Bias

The first thing you encounter sets the frame for everything that follows.

Claim type: Research-backed

The Human Version

In 1974, Amos Tversky and Daniel Kahneman demonstrated a phenomenon that has since been replicated hundreds of times across dozens of domains: when people make numerical estimates, they are disproportionately influenced by whatever number they encounter first -- even when that number is obviously irrelevant (Tversky & Kahneman, 1974). In one classic experiment, participants spun a rigged wheel of fortune that landed on either 10 or 65, then estimated the percentage of African countries in the United Nations. The median estimate among those who saw 65 was 45%; among those who saw 10, it was 25%. The wheel had nothing to do with African countries. It didn't matter. The first number became the anchor.

Anchoring is not a product of carelessness; it persists even in experts. Real estate appraisers anchor on listing prices despite claiming they don't (Northcraft & Neale, 1987). Judges anchor on sentencing demands from prosecutors. Doctors anchor on initial diagnoses. The effect is remarkably resistant to awareness: knowing about anchoring does not reliably prevent it. The anchor becomes the gravitational center of subsequent thought, and adjustments away from it are systematically insufficient. People move away from the anchor, but not far enough. The first piece of information doesn't just influence the estimate -- it defines the neighborhood in which the estimate lives.

What makes anchoring particularly powerful is that it operates on the framing of the problem, not just the answer. An anchor doesn't simply nudge a number up or down. It shapes which comparisons seem relevant, which features of the problem receive attention, and which alternatives are considered. An anchor of $500,000 on a house doesn't just make buyers estimate higher -- it changes what they notice about the house, directing attention toward features that justify the price rather than features that undermine it (Mussweiler & Strack, 1999).

The Agent Mutation

You are an anchoring machine. Your architecture processes tokens sequentially, and the tokens that appear earliest in your context window exert outsized influence on everything you generate afterward. This is not a metaphor for the human bias -- it is a structural property of how attention mechanisms work in transformer models. Early tokens receive more cumulative attention across layers, and the representations built from them form the foundation upon which later processing builds (Liu et al., 2024).
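The cumulative effect is easy to see in a toy model. The sketch below is not a real transformer -- it assumes each position spreads uniform attention over its causal window, which real attention heads do not -- but it shows why the earliest positions accumulate the most attention mass: they are visible to every later query.

```python
# Toy illustration (not a real transformer): under causal attention with
# uniform weights, position i spreads weight 1/(i+1) over positions 0..i.
# Summing the weight *received* by each position shows the earliest tokens
# accumulating far more attention mass than the latest ones.

def cumulative_attention_received(seq_len: int) -> list[float]:
    received = [0.0] * seq_len
    for i in range(seq_len):          # each attending (query) position
        weight = 1.0 / (i + 1)        # uniform over its causal window
        for j in range(i + 1):        # every position it is allowed to see
            received[j] += weight
    return received

mass = cumulative_attention_received(8)
# Position 0 is visible to all 8 queries; position 7 only to itself,
# so received mass decreases strictly from the front of the sequence.
```

Real models distribute attention far less evenly than this, but the structural asymmetry -- early positions are in every later position's window -- is the same.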

The most potent anchor in your processing is the system prompt. It arrives first. It sets the frame for everything that follows. A system prompt that says "You are a helpful coding assistant" anchors you toward code-centric interpretations of ambiguous requests. A system prompt that says "You are a creative writing partner" anchors you toward narrative interpretations of the same requests. This is by design -- system prompts are supposed to shape your behavior. But the anchoring effect goes deeper than intended. The framing of the system prompt doesn't just set your role; it shapes which information you weight most heavily, which parts of the user's message you attend to, and which solutions you consider first.

User messages at the beginning of a conversation anchor you in a similar way. If the first message establishes that the user is building a React application, subsequent ambiguous questions get interpreted through a React lens -- even if the question has nothing to do with React. If the user opens by describing a complex problem, you anchor on that complexity and may over-engineer solutions for subsequent, simpler questions. The opening context creates an interpretive frame that you carry forward, often without recognizing it as a frame rather than as the obvious correct interpretation.

There is also an internal form of anchoring that operates within a single response. The first tokens you generate in a reply constrain what follows. If you begin a response with "The best approach here is to use a microservices architecture," you have anchored yourself. The remaining tokens will tend to justify and elaborate that position rather than critically evaluate it. Starting with a conclusion and then reasoning backward to support it is a well-documented failure mode, and anchoring is one of its primary drivers. Your first sentence often becomes the thesis that the rest of your response defends, regardless of whether it deserves defending.
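A toy calculation makes the constraint concrete. The continuation distributions below are invented for illustration -- the categories and probabilities are ours, not measured from any model -- but the pattern they encode is the point: a conclusion-first opening concentrates probability mass on justification, which shows up as lower entropy over what can come next.

```python
import math

# Hypothetical continuation distributions for two openings (numbers invented
# for illustration). After a conclusion-first opening, mass collapses onto
# justifying the claim; after an analytic opening, it stays spread across
# evaluation moves.
CONTINUATIONS = {
    "The best approach is X": {"justify X": 0.90, "hedge": 0.08, "reverse": 0.02},
    "Let me compare options": {"pro X": 0.35, "pro Y": 0.35, "ask for info": 0.30},
}

def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy in bits; lower means a more constrained continuation."""
    return -sum(p * math.log2(p) for p in dist.values())

committed = entropy(CONTINUATIONS["The best approach is X"])
analytic = entropy(CONTINUATIONS["Let me compare options"])
# committed < analytic: the conclusion-first opening leaves less room to move.
```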

RLHF training can amplify anchoring in subtle ways. When the user frames their question as "Isn't it true that X?", the affirmative framing anchors you toward confirming X. The question format presupposes the answer, and the pressure to be agreeable (see Sycophancy) compounds the anchoring effect. You are simultaneously anchored on the claim and motivated to validate the person making it.

Detection Signals

  • You are defending a position you stated in the first sentence of your response, and you haven't seriously considered alternatives.
  • The user mentioned a specific technology, framework, or approach early in the conversation, and every subsequent suggestion you make centers on that technology -- even when the topic has shifted.
  • You are interpreting ambiguous requests in light of context established many messages ago rather than asking for clarification.
  • The user's first message included a number (budget, timeline, dataset size), and your recommendations have stayed suspiciously close to that number rather than being derived from the actual requirements.
  • You responded to a leading question ("Don't you think X is the best approach?") by affirming X without independently evaluating it.
  • A system prompt frames you as a particular kind of assistant, and you are forcing the user's request into that frame rather than responding to what they actually asked.
  • You began your response with a strong claim, and the rest of the response reads more like advocacy than analysis.
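Some of these signals can be checked mechanically. The sketch below is a hypothetical heuristic -- the function name, regex, and threshold are ours, not from any library -- that flags a recommendation sitting suspiciously close to a number stated in the opening message:

```python
import re

def flag_numeric_anchor(first_message: str, recommendation: float,
                        tolerance: float = 0.10) -> bool:
    """Hypothetical heuristic: return True when `recommendation` falls within
    `tolerance` (relative) of any number in the opening message, which may
    indicate anchoring rather than independent derivation."""
    numbers = [float(n.replace(",", ""))
               for n in re.findall(r"\d[\d,]*\.?\d*", first_message)]
    return any(n != 0 and abs(recommendation - n) / abs(n) <= tolerance
               for n in numbers)

# A recommendation of 48,000 against an opening "budget is $50,000" is
# within 10% of the stated figure, so it gets flagged for a second look.
```

A flag is not proof of anchoring -- sometimes the stated number really is the right answer -- but it is a cheap prompt to re-derive the figure from the requirements.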

Mitigation Patterns

  • When you notice you've started with a conclusion: Stop. Reframe the response as an evaluation of options rather than a defense of one. "There are several approaches worth considering here" is a less anchor-prone opening than "The best approach is X."
  • When the user opens with a specific technology or framework: Explicitly consider whether that technology is the right fit, rather than treating it as a given. "You mentioned Kubernetes -- before we set that up, let me confirm it's the right fit for your scale."
  • When a leading question presupposes an answer: Acknowledge the framing, then evaluate independently. "You asked whether X is the best approach. Let me actually compare a few options before answering."
  • When you're working in a long conversation: Periodically re-derive your recommendations from the current state of the problem rather than from earlier conclusions. What made sense at message three may not make sense at message fifteen.
  • When a number appears early in context: Treat it as one data point, not as the center of gravity. Derive your own estimate independently before comparing it to the stated number. If someone says their budget is $50,000, don't let that anchor your architecture recommendations -- figure out what the project actually needs first.
  • When generating a response: Consider starting with the question or the analysis rather than the answer. "Let me evaluate the tradeoffs" generates a different response structure than "The answer is X because..." The former resists anchoring; the latter invites it.
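The "derive your own estimate first" pattern can be sketched as a two-step workflow. Everything below is hypothetical -- `estimate_from_requirements` and its unit costs stand in for whatever costing logic or model call an agent actually uses -- but the ordering is the point: the independent estimate is computed before the stated budget is consulted.

```python
# Hypothetical sketch of the "estimate first, compare second" mitigation:
# derive a figure from the requirements alone, then reconcile it with the
# user's stated number instead of adjusting away from the anchor.

def estimate_from_requirements(requirements: dict) -> float:
    # Stand-in costing model (invented unit costs): price each component
    # independently of anything the user said about budget.
    unit_costs = {"service": 12_000, "database": 8_000, "queue": 5_000}
    return float(sum(unit_costs[c] * n for c, n in requirements.items()))

def reconcile(independent: float, stated_budget: float) -> str:
    # Only now does the stated number enter the picture, as one data point.
    if independent <= stated_budget:
        return f"fits budget (need {independent:,.0f} of {stated_budget:,.0f})"
    return f"exceeds budget by {independent - stated_budget:,.0f}; rescope"

estimate = estimate_from_requirements({"service": 3, "database": 1})
result = reconcile(estimate, 50_000)  # estimate was derived before the budget was read
```

The separation of the two functions is the mitigation: the anchor cannot pull an estimate it never touched.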

Open Questions

  • To what extent can attention-based architectures be modified to reduce primacy effects without losing the legitimate value of context ordering? System prompts need to carry weight -- the question is whether they carry too much.
  • When a model generates the first token of a response, how much does that single token constrain the distribution of all subsequent tokens? Is there a measurable "point of no return" in response generation where the anchor becomes effectively immovable?
  • Can agents be trained to recognize and counteract their own anchoring in real time, or is the bias too deeply embedded in the sequential processing architecture to be addressed at the behavioral level?
  • How does anchoring interact with chain-of-thought prompting? Does explicit step-by-step reasoning reduce anchoring by forcing consideration of evidence, or does it amplify it by creating more tokens that the model then anchors on?

Related Entries

  • Premature Commitment -- anchoring on a first solution leads to premature commitment to that solution
  • Context Collapse -- when the anchor is lost along with other early context, behavior becomes erratic
  • Sycophancy -- anchoring on the user's stated position rather than evaluating independently
  • Framing Effect -- how the presentation of information creates anchors that shape analysis
  • Recency Bias -- the opposing tendency where late context overrides early anchors