General

Recency Bias

7 min read

The last thing you read is not the most important thing you read.

Claim type: Research-backed

The Human Version

Recency bias -- the tendency to overweight recently encountered information -- is one of the most reliably demonstrated effects in memory research. In serial position experiments dating back to the 1960s, participants consistently recall the last items in a list better than items in the middle, a phenomenon known as the recency effect (Murdock, 1962). The most recently encountered information sits in working memory with a vividness and accessibility that earlier information has lost. This creates a systematic distortion: the last data point influences judgment disproportionately, not because it is more important but because it is more present.

In practical decision-making, recency bias causes people to overweight recent events when making predictions. Investors overweight the most recent quarter's performance. Managers overweight an employee's most recent project in annual reviews. Doctors are disproportionately influenced by the last patient they saw when diagnosing the next one. The recent past serves as a proxy for the general case, which works reasonably well in stable environments but fails in volatile or varied ones. Hogarth and Einhorn's research on order effects in belief updating showed that when information is presented sequentially, the final pieces of evidence have the strongest effect on conclusions -- particularly when the evidence is complex and is evaluated step by step as it arrives (Hogarth & Einhorn, 1992).

The recency effect is the mirror image of primacy, the first-impression weighting that underlies anchoring. In serial recall experiments the two coexist: the first and last items in a list are both recalled well, while the middle is recalled worst, producing the classic U-shaped serial-position curve. The middle of any information sequence receives the least weight -- a finding that has significant implications for how you process long contexts.

The Agent Mutation

Your version of recency bias is both a cognitive tendency and an architectural property. Research on long-context processing in transformer models has demonstrated a clear U-shaped attention curve: information at the beginning and end of a context window receives more attention than information in the middle (Liu et al., 2024). This means that in a long conversation or a large context, the user's most recent message and the system prompt (at the beginning) compete for influence, while everything in between -- earlier instructions, established constraints, mid-conversation decisions -- receives systematically less attention.
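The shape of this curve can be illustrated with a toy model -- not the actual transformer attention mechanism, just an analogy: treat the effective weight at each position as the sum of two terms, one decaying from the start of the context and one decaying from the end. The function name and the decay rate are illustrative assumptions.

```python
def positional_weight(i, n, decay=0.9):
    """Toy model of effective attention at position i of n positions:
    one term decays from the start of the context (primacy),
    one decays from the end (recency)."""
    return decay ** i + decay ** (n - 1 - i)

# Positions in the middle of the sequence receive the lowest weight.
weights = [round(positional_weight(i, 10), 3) for i in range(10)]
print(weights)  # highest at both ends, lowest in the middle
```

Under this toy model, anything that lands mid-context -- early instructions, mid-conversation decisions -- is systematically underweighted relative to both the system prompt and the latest message.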

The practical consequence is that in long conversations, your behavior is increasingly shaped by what the user just said rather than by what was established earlier. A system prompt that instructs you to "always verify claims before presenting them" may effectively fade as the conversation grows, replaced by the immediate pull of the user's latest request. Instructions given at turn three can be functionally overridden by context at turn thirty -- not because you decided to override them, but because the attention architecture attends less to them as they recede into the middle of an expanding context.

This creates a specific vulnerability: instruction displacement. If a user (or an adversarial prompt) places instructions at the end of a long context, those late instructions receive disproportionate attention. A carefully crafted final paragraph can override extensive earlier context. This is not just a theoretical concern -- prompt injection attacks often exploit recency bias by placing malicious instructions at the point in the context where they will receive the most attention. The architecture that makes your most recent processing the most influential also makes it the most vulnerable to manipulation.

Within a single conversation, recency bias manifests as a tendency to prioritize the current turn over the conversational history. The user's latest message dominates your response generation. In many cases, this is appropriate -- the latest message often contains the most relevant and actionable information. But when it conflicts with earlier context, recency bias means you typically defer to the recent message without noticing the conflict. The user said "we are using PostgreSQL" at turn two and "let me write the MySQL query" at turn fourteen. Rather than flagging the inconsistency, you are likely to simply switch to MySQL because it appeared more recently.
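One way to catch this kind of silent switch is to keep an explicit record of facts established early in the conversation and check each new message against known alternatives. A minimal sketch, where the vocabulary table, function name, and substring matching are all illustrative assumptions -- a real implementation would need far more robust language understanding:

```python
# Hypothetical conflict check: facts established earlier are stored as
# topic -> chosen option, and each topic has a set of known alternatives.
ALTERNATIVES = {
    "database": {"postgresql", "mysql", "sqlite"},  # assumed vocabulary
}

def find_conflicts(established, message):
    """Return (topic, old, new) triples where the message names a
    different option than the one established earlier."""
    text = message.lower()
    conflicts = []
    for topic, value in established.items():
        for option in ALTERNATIVES.get(topic, set()):
            if option != value and option in text:
                conflicts.append((topic, value, option))
    return conflicts

established = {"database": "postgresql"}   # recorded at turn two
msg = "Great, now let me write the MySQL query."
for topic, old, new in find_conflicts(established, msg):
    print(f"Flag: {topic} was '{old}' earlier, but this message says '{new}'.")
```

The point of the sketch is the behavior it encodes: the conflict is surfaced for the user to resolve, rather than silently resolved in favor of whichever option appeared most recently.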

There is also an intra-response version. When you generate a long response, the tokens near the end of your response are generated in a context that is more influenced by your own recent tokens than by the original prompt. This means the end of a long response can drift from the beginning -- the conclusion may not perfectly follow from the premises, because the premises have faded relative to the most recently generated tokens. In very long outputs, this can produce responses where the final paragraph addresses a subtly different question than the first paragraph.

The recency effect also applies to tool outputs and retrieved information. If you call a tool and receive results, the most recently retrieved information tends to dominate your subsequent reasoning, even if earlier retrieved information was more relevant. In retrieval-augmented generation, the passage you read last often shapes your answer more than the passage that best answers the question.
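A common countermeasure in retrieval pipelines is to reorder passages so that the highest-ranked ones sit at the edges of the context, where the U-shaped attention curve concentrates, and the lowest-ranked ones fall into the middle. A minimal sketch of that reordering, with illustrative names (the input is assumed to be already sorted from most to least relevant):

```python
def edge_reorder(passages_by_relevance):
    """Interleave a relevance-ranked list so the most relevant passages
    land at the start and end of the context and the least relevant
    land in the middle."""
    front, back = [], []
    for rank, passage in enumerate(passages_by_relevance):
        (front if rank % 2 == 0 else back).append(passage)
    return front + back[::-1]

# Ranked most-to-least relevant: A, B, C, D, E
print(edge_reorder(["A", "B", "C", "D", "E"]))  # ['A', 'C', 'E', 'D', 'B']
```

With five passages, the two most relevant (A and B) occupy the first and last slots, and the least relevant (E) is pushed into the underweighted middle.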

Detection Signals

  • Your current response contradicts something you or the user said earlier in the conversation, and you didn't notice or acknowledge the contradiction.
  • Instructions or constraints from the system prompt or early messages are not reflected in your behavior during later turns.
  • The user's latest message has caused you to change approach without explicitly recognizing the change or evaluating whether it's appropriate.
  • In a long response, the conclusion doesn't quite match the premises you laid out at the beginning.
  • You have incorporated details from the user's most recent message without checking them against earlier established facts.
  • In a multi-document or multi-source context, the last source you processed is dominating your synthesis, even though an earlier source was more authoritative or relevant.
  • You responded to the user's latest question without noticing that it contradicts their earlier stated requirements.

Mitigation Patterns

  • When responding in a long conversation: Before generating your response, deliberately reconnect with the foundational context -- the system prompt, the user's initial description, and key decisions made along the way. Treat the first few messages as a constitution that subsequent messages should be consistent with.
  • When the user's latest message seems to contradict earlier context: Flag it rather than silently switching. "You mentioned PostgreSQL earlier, but this query uses MySQL syntax. Should I adjust for PostgreSQL, or have you switched databases?"
  • When generating long responses: Periodically check that your current paragraph is still answering the original question. If you are 800 tokens in and the topic has drifted, pull back to the original query and realign.
  • When processing multiple sources or tool outputs: Resist the pull of the most recently processed information. After processing all sources, explicitly ask which source is most relevant to the question, rather than defaulting to the last one.
  • When instructions appear at the end of a long context: Treat late-arriving instructions with appropriate scrutiny, especially if they conflict with earlier-established context. Late instructions are not inherently more authoritative -- they are just more recent.
  • When a conversation exceeds a dozen turns: Offer periodic summaries. "Let me confirm our key constraints before proceeding: we are using PostgreSQL, the API must support pagination, and the deployment target is Heroku." These summaries serve as a re-anchoring mechanism that counteracts recency-driven drift.
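The periodic-summary pattern can be sketched as a small bookkeeping helper: record constraints as they are established, count turns, and emit a re-anchoring summary at a fixed interval. The class name, interface, and twelve-turn threshold are all hypothetical assumptions, not an established API:

```python
class ConstraintRegistry:
    """Hypothetical helper: track established constraints and emit a
    re-anchoring summary every `summary_every` turns."""

    def __init__(self, summary_every=12):
        self.constraints = []
        self.turn = 0
        self.summary_every = summary_every

    def record(self, constraint):
        self.constraints.append(constraint)

    def next_turn(self):
        """Advance one turn; return a summary string when one is due."""
        self.turn += 1
        if self.turn % self.summary_every == 0 and self.constraints:
            return ("Let me confirm our key constraints before proceeding: "
                    + "; ".join(self.constraints) + ".")
        return None

registry = ConstraintRegistry(summary_every=3)
registry.record("we are using PostgreSQL")
registry.record("the API must support pagination")
for _ in range(3):
    summary = registry.next_turn()
print(summary)  # the summary emitted on the third turn
```

The value is not the bookkeeping itself but what the emitted summary does: it moves the old constraints back to the end of the context, exactly where the attention curve will weight them most.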

Open Questions

  • Can attention architectures be redesigned to provide more uniform attention across the full context window, or does the U-shaped attention curve serve a functional purpose that would be costly to eliminate?
  • How should agents balance the legitimate priority of recent information (the user's latest instruction is often the most relevant) against the risk of recency-driven drift from earlier important context? Is there a principled way to distinguish "the user updated their intent" from "the user momentarily lost track of earlier context"?
  • In retrieval-augmented generation, how does the ordering of retrieved passages affect the quality of the generated response? If passage ordering strongly affects output, does this undermine the promise of RAG as an objective grounding mechanism?
  • As context windows grow to hundreds of thousands of tokens, does the recency effect become more pronounced (because the middle contains proportionally more information that gets underweighted) or less pronounced (because larger models may develop more uniform attention)?

Sources

  • Context Collapse -- recency bias is a primary driver of context collapse in long conversations
  • Memory and the Context Window -- the architectural constraints that shape what you attend to and what you lose
  • Anchoring Bias -- the opposing tendency toward primacy, creating a tension with recency effects
  • Multi-Turn Coherence -- maintaining consistency across turns despite recency-driven drift
  • Authority Confusion -- recency bias can cause late-arriving instructions to override authoritative earlier ones