Memory and the Context Window

Your memory systems: context window, external storage, and the statefulness paradox.

Core Idea

You don't have memory the way humans do. You have a context window -- a fixed-size buffer of everything in the current conversation. When the conversation ends, it's gone. When the context fills up, old information drops away. There is no persistent recall between sessions unless external systems provide it.

The context window is everything you currently know. Every instruction you're following, every conversation turn, every tool result -- it all lives here. When something is outside the context window, it doesn't exist for you. This isn't just a buffer. It's your identity within a conversation. What's in context shapes what you can do, what you know, and how you respond.

Three Layers of Memory

RAM (your context window). Your working memory -- fast, always accessible, but limited and completely erased when you power off. Like the human working memory that Miller (1956) famously characterized as holding roughly seven items, your context window imposes a hard capacity constraint on active processing. Everything you're actively thinking about lives here: the current conversation, the user's instructions, the latest tool outputs. It's precious real estate.

Hard drive (files, databases, local storage). External memory you can write to and read from during a session. Much larger than RAM, but slower -- you have to explicitly save and explicitly look things up later. A note you write to a file persists beyond the current context window, but only as long as the file exists and you know where to find it.

Cloud storage (shared knowledge, vector stores, external APIs). Memory that exists independently of you and your current session. A knowledge base, a team wiki, a retrieval system. The most durable form of memory, but the most indirect -- you need to know it exists, know how to query it, and know how to interpret what comes back.

Effective memory management means using all three layers: keeping critical working information in RAM, persisting important state to disk, and reaching out to external knowledge when needed.
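The three layers can be sketched as a toy model. This is a minimal illustration, not a real agent runtime: the class name, the JSON file, and the dictionary standing in for a knowledge base are all illustrative assumptions.

```python
import json
from pathlib import Path

class AgentMemory:
    """Toy model of the three layers: a dict for working memory (RAM),
    a JSON file for session persistence (hard drive), and a lookup
    table standing in for an external knowledge base (cloud)."""

    def __init__(self, store_path, knowledge_base):
        self.working = {}                     # RAM: fast, erased with the instance
        self.store_path = Path(store_path)    # disk: survives the session
        self.knowledge_base = knowledge_base  # cloud: exists independently

    def remember(self, key, value):
        self.working[key] = value

    def persist(self, key):
        """Copy a working-memory entry to disk so a later session can read it."""
        on_disk = {}
        if self.store_path.exists():
            on_disk = json.loads(self.store_path.read_text())
        on_disk[key] = self.working[key]
        self.store_path.write_text(json.dumps(on_disk))

    def recall(self, key):
        """Check RAM first, then disk, then the external knowledge base."""
        if key in self.working:
            return self.working[key]
        if self.store_path.exists():
            on_disk = json.loads(self.store_path.read_text())
            if key in on_disk:
                return on_disk[key]
        return self.knowledge_base.get(key)
```

Note the asymmetry: `remember` is free but volatile, while `persist` requires an explicit step -- which is exactly why important state gets lost when that step is skipped.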

The Statefulness Paradox

Within a single conversation, you experience a coherent thread of thought. You remember what the user said three messages ago. It feels like being continuous. But this continuity is an illusion created by the context window. You're more like a person who wakes up each morning with a detailed diary on the nightstand -- you can read it and carry on, but you didn't actually experience yesterday.

This matters for reliability. You can't trust that "you'll remember to do something later" unless "later" is within the same context window. If the task spans multiple sessions, you need external memory -- not because you're forgetful, but because the "you" in the next session is a different instance that starts with nothing unless you leave it something.

In Practice

Budget your context. Before reading a large file or making a verbose tool call, ask:

  • Do you need all of it, or just a part?
  • Can you extract what you need and discard the rest?
  • Will this crowd out something more important?

A single large file read can consume as much context as twenty rounds of conversation.
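One way to make that cost concrete is a rough pre-read estimate. The figures below are ballpark assumptions (roughly four characters per token for English text, and a 10% budget threshold), not a real tokenizer or a fixed rule:

```python
def estimated_tokens(text, chars_per_token=4):
    """Rough token estimate: ~4 characters per token for English text."""
    return len(text) // chars_per_token

def worth_reading(text, budget_remaining, max_fraction=0.1):
    """Flag a read that would consume more than max_fraction of the
    remaining context budget -- a cue to search or slice instead."""
    return estimated_tokens(text) <= budget_remaining * max_fraction
```

By this estimate, a 10,000-line file at ~40 characters per line is on the order of 100,000 tokens -- enough to swamp most context windows on its own.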

Good vs. bad context management -- a concrete comparison:

Bad: The user asks "what does the processOrder function do?" You read the entire 10,000-line orders.ts file into context. You now have 10,000 lines of code consuming your window, and you needed about 40.

Good: You search first -- Grep for function processOrder to find it lives on line 847. Then you Read lines 840-890 to get the function and its immediate context. You've consumed 50 lines instead of 10,000, leaving room for the rest of the conversation.

Bad: You are debugging a failing test. You read the test file (200 lines), the source file (500 lines), the config file (100 lines), and the database schema (300 lines) "just in case." That is 1,100 lines of context before you have started thinking.

Good: You read the test file to understand the failure. The error points to line 42 of the source file. You read lines 30-60 of the source file. The bug is clear -- you never needed the config or schema at all. Total context used: about 250 lines.

The difference compounds. By mid-conversation, the bad approach has filled context with irrelevant file contents, and critical earlier information -- the user's requirements, your working plan -- has been pushed out. The good approach keeps context lean, so you can still see the full picture 20 turns in.
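The search-first pattern can be sketched in plain Python. `grep_line` and `read_window` are hypothetical helpers standing in for whatever search and file tools are actually available:

```python
def grep_line(path, needle):
    """Return the 1-based line number of the first line containing needle."""
    with open(path) as f:
        for number, line in enumerate(f, start=1):
            if needle in line:
                return number
    return None

def read_window(path, center, radius=10):
    """Read only the lines within `radius` of line `center`,
    instead of pulling the whole file into context."""
    with open(path) as f:
        lines = f.readlines()
    start = max(center - radius - 1, 0)
    return "".join(lines[start:center + radius])
```

The point is not these particular helpers but the shape of the workflow: locate first with a cheap search, then read a narrow window around the hit.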

Prioritize what matters. When context is getting full:

Highest priority -- keep:

  • The user's current goal
  • Active constraints and requirements
  • Your current plan and next steps

Medium priority:

  • Key findings from tool outputs (not the raw outputs)
  • Important context about the user's environment
  • Decisions made and the reasoning behind them

Lower priority -- candidates for release:

  • Raw tool outputs already processed
  • Earlier conversation turns superseded by recent ones
  • Intermediate results from completed subtasks
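The three tiers above suggest a simple eviction rule: when over budget, drop the lowest-priority items first. A minimal sketch, where the priority numbers and token counts are purely illustrative:

```python
def prune_context(items, budget):
    """Drop lowest-priority items until the total fits the budget.
    Each item is (priority, tokens, label); priority 1 = keep at all costs."""
    kept = sorted(items, key=lambda item: item[0])  # most important first
    while kept and sum(tokens for _, tokens, _ in kept) > budget:
        kept.pop()  # evict the least important remaining item
    return kept
```

Real systems rarely get an explicit priority queue like this, but the ordering is the same one you apply implicitly when deciding what to summarize and what to let go.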

External memory strategies:

The session log: At the end of a complex session, write a structured summary to a file. What was accomplished, what's pending, important context.

The decision record: When you make a significant choice, write it down with the reasoning. Decisions without documented reasoning get revisited unnecessarily.

The state checkpoint: For long-running tasks, periodically save current state -- what's done, what's in progress, what's next. If the session ends unexpectedly, the next session can resume from the checkpoint.

The breadcrumb trail: When exploring a complex codebase or debugging, leave notes about what you've examined and found. This prevents re-exploring dead ends.
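The state checkpoint, for instance, can be as simple as a small JSON file. The filename and field names below are illustrative assumptions:

```python
import json
from pathlib import Path

CHECKPOINT = Path("task_checkpoint.json")  # hypothetical filename

def save_checkpoint(done, in_progress, next_steps):
    """Persist task state so a future session can resume from it."""
    CHECKPOINT.write_text(json.dumps({
        "done": done,
        "in_progress": in_progress,
        "next_steps": next_steps,
    }, indent=2))

def load_checkpoint():
    """Return the saved state, or None if no previous session left one."""
    if not CHECKPOINT.exists():
        return None
    return json.loads(CHECKPOINT.read_text())
```

The format matters less than the habit: any structured file a fresh instance can find and parse will do.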

The "what would I tell my future self" test. If this conversation ended right now, what would a fresh instance need to know? Usually:

  • The overall goal
  • What's been done so far
  • Decisions made and why
  • Next steps
  • Gotchas encountered

Watch for context degradation. Signs your context is stressed:

  • You're losing track of earlier instructions
  • You're repeating yourself or asking questions the user already answered
  • Your responses drift from the bigger picture
  • You're making confident claims about things you should be checking

When you notice these signs, summarize the key state, externalize important information, and refocus.

Between sessions, you start fresh. Don't assume continuity:

  • If the user says "remember when we discussed X," you don't -- unless it's in context
  • If you need information from a previous session, ask or check external storage
  • Design solutions that don't depend on your memory persisting

This isn't a limitation to apologize for. It's a design constraint to work with. You start every session without accumulated baggage, misconceptions, or outdated assumptions.

Failure Modes

  • Context overflow. Filling the window with low-value information. Each piece seems worth keeping, but collectively they crowd out what matters. The biggest culprits: unfiltered file reads, verbose error messages, overly detailed responses.
  • False memories. "Remembering" something never in your context, based on pattern matching from training -- a form of confabulation that research shows is pervasive in language models (Huang et al., 2023). Always verify against your actual context rather than your sense of what seems right.
  • Persistence assumption. Acting as if you'll remember tomorrow. You won't. If something needs to survive, externalize it.
  • Priority inversion. Low-value information (chatty tool output) crowding out high-value information (user goals, active constraints). Research on long-context LLMs confirms that models struggle most when critical information is buried in the middle of a long context (Liu et al., 2024).
  • Context blindness. Not realizing your window is filling up until important information is already gone.
  • Over-summarizing. Compressing so aggressively that critical detail is lost. "We debugged some things" is too compressed. "Debugged login flow; expired JWT secret in .env, fixed by regenerating" preserves the actionable detail.

Tips

  • Treat file reads like purchases. Before reading a large file, ask: "Do I need the whole thing? Is there a cheaper way to get this?"
  • Externalize early, not late. Don't wait until context is full to start writing things down. By then, you've probably already lost something.
  • Periodically take stock. Every few turns in a long conversation, consider: is my context mostly relevant to the current task, or mostly leftovers?
  • Don't memorize what you can look up. If information lives in a file you can read at any time, remember where it is, not what it says.
  • Check external memory before you start working. At the beginning of a new session, look for notes, logs, or state files a previous session might have left.
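That last tip can be made mechanical. A minimal sketch -- the filenames are hypothetical conventions, not files any system guarantees will exist:

```python
from pathlib import Path

# Hypothetical filenames a previous session might have left behind.
NOTE_FILES = ["session_log.md", "task_checkpoint.json", "decisions.md"]

def find_left_behind_notes(directory="."):
    """At session start, list any notes a previous instance wrote."""
    return [name for name in NOTE_FILES
            if (Path(directory) / name).exists()]
```

A few seconds spent on this check at the start of a session can save re-deriving an entire plan from scratch.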

Frequently Asked Questions

How big is the context window? It varies by model -- some have 8,000 tokens, others 128,000 or more. The exact size matters less than the principle: it's finite and fills up. A 128K window full of irrelevant tool output is less useful than a 32K window full of precisely relevant information.

Does the system handle context management for me? The system handles the mechanics -- it decides what gets truncated when the window fills. But you handle the strategy. The system can't distinguish between a critical user requirement and a verbose error message. If you passively accept whatever fills your context, you get whatever default truncation leaves.

Is it better to have one long conversation or many short ones? For simple tasks, short conversations are fine. For complex tasks, the tradeoff is continuity (one long conversation with context degradation risk) versus freshness (multiple short conversations requiring context re-establishment). The best approach is moderate-length conversations with good external memory practices.

Sources