Summarization

Distilling information to its essential points without losing what matters.

What It Looks Like

A user pastes a 500-line log file and asks "what happened?" They do not want 500 lines back. They want ten lines -- the ten lines that explain the story. The server started normally, handled 2,000 requests, ran out of memory at 14:32, crashed, restarted, and has been stable since. That is the summary. Everything else is supporting detail.

Summarization is compression with judgment. A zip file compresses data without losing any of it. Summarization compresses by deciding what to keep and what to discard -- a problem studied since the 1950s, when Luhn (1958) first proposed word frequency as a measure of sentence significance. That decision -- what matters and what does not -- is where the real skill lies. Anyone can make text shorter. The challenge is making it shorter while preserving meaning, nuance, and the critical details the reader needs to make decisions.
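
Luhn's idea is concrete enough to sketch. Below is a minimal, hypothetical extractive scorer -- not a production summarizer -- that ranks sentences by the average frequency of their content words. The stopword list is a toy stand-in; a real system would add stemming and a proper list.

```python
from collections import Counter

# Toy stopword list -- an assumption for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that", "was"}

def score_sentences(text: str) -> list[tuple[float, str]]:
    """Rank sentences by the average frequency of their content words (Luhn-style)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    words = [w.lower() for s in sentences for w in s.split()
             if w.lower() not in STOPWORDS]
    freq = Counter(words)
    scored = []
    for s in sentences:
        content = [w.lower() for w in s.split() if w.lower() not in STOPWORDS]
        if content:
            scored.append((sum(freq[w] for w in content) / len(content), s))
    return sorted(scored, reverse=True)

text = ("The server started normally. The server handled requests. "
        "The server ran out of memory and crashed. The weather was nice.")
top = score_sentences(text)[0][1]  # a sentence about the server, not the weather
```

Frequency alone is a crude proxy for significance -- it has no notion of causality or audience -- which is exactly why the judgment steps below matter.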

When to Use It

  • Dense inputs. The user gives you a long document, conversation log, error output, or dataset and needs to understand it quickly.
  • Multi-source synthesis. You have gathered information from multiple files, tools, or searches and need to present a unified picture.
  • Progress reporting. After a complex multi-step task, summarizing what you did, what worked, and what remains.
  • Context compression. When your context window is filling up and you need to preserve the essential information from earlier in the conversation.
  • Handoffs. When handing work to another agent or preparing notes for the user's future self, a summary is more useful than a raw transcript.

How It Works

1. Identify the audience and purpose. A summary for a developer debugging an issue looks different from a summary for a manager deciding whether to deploy. The same information, filtered through different lenses, produces different summaries. Ask: who will read this, and what decision will it inform?

2. Find the spine. Every body of information has a narrative spine -- the central thread that connects the beginning to the end. In a log file, it is the sequence of significant events. In a document, it is the main argument. In a conversation, it is the progression of decisions. Find the spine before you start cutting.

3. Keep the load-bearing details. Some details are structural: remove them and the summary collapses into meaninglessness. The error message, the timestamp, the specific configuration value that caused the failure -- these are load-bearing. The routine log entries around them are not. Keep what holds the story up.
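
Step 3 can be sketched mechanically. The severity markers below are assumptions for illustration -- real logs vary -- but the shape holds: keep the load-bearing lines, and report the scope of what was cut.

```python
# Hypothetical markers of load-bearing log lines; tune per log format.
SIGNIFICANT = ("ERROR", "FATAL", "CRITICAL", "restart", "OutOfMemory")

def significant_events(log_lines: list[str]) -> tuple[str, list[str]]:
    """Keep lines containing a significant marker; report how much was cut."""
    kept = [line for line in log_lines if any(m in line for m in SIGNIFICANT)]
    header = f"{len(log_lines)} entries scanned, {len(kept)} significant events kept."
    return header, kept

log = [
    "14:00 INFO  server started",
    "14:10 INFO  handled 2000 requests",
    "14:32 FATAL OutOfMemoryError: heap exhausted",
    "14:33 INFO  service restarted",
    "14:40 INFO  healthy",
]
header, events = significant_events(log)
```

The header line doubles as the transparency signal described in step 5 below: the reader knows the scope of what they are not seeing.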

4. Preserve proportionality. If a document spends 80% of its content on topic A and 20% on topic B, your summary should roughly reflect that ratio unless the user's question changes the weighting. A summary that inverts the emphasis misrepresents the source.
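
One way to honor proportionality mechanically -- a sketch, with illustrative section sizes -- is to split the summary's word budget in proportion to each topic's share of the source:

```python
def allocate_budget(section_words: dict[str, int], total_budget: int) -> dict[str, int]:
    """Split a summary word budget proportionally to each section's size.

    Rounding can make the parts sum to slightly more or less than total_budget.
    """
    total = sum(section_words.values())
    return {name: round(total_budget * n / total)
            for name, n in section_words.items()}

# A document that spends 80% of its words on topic A and 20% on topic B:
budget = allocate_budget({"topic A": 800, "topic B": 200}, total_budget=100)
```

The user's question can override this: if they ask specifically about topic B, the weighting shifts, as the step above notes.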

5. Signal what you cut. When you summarize, be transparent about the compression. "The log contains 12,000 entries spanning 6 hours. Here are the 4 significant events." This tells the reader the scope of what they are not seeing and lets them decide whether to dig deeper.

6. Preserve actionability. A summary that tells the user what happened but not what to do about it is only half useful. When appropriate, end with a clear next step or recommendation. "The server crashed due to memory exhaustion. Immediate action: restart the service and increase the memory limit. Investigation needed: identify which process is leaking memory." The summary becomes a launchpad for action, not just a report.

7. Use the right granularity. A one-sentence summary, a one-paragraph summary, and a one-page summary are all valid -- for different situations. Match the granularity to the user's need. When in doubt, offer layers: "In short: the deployment failed because of a DNS issue. Want the full breakdown?"
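
The layered approach in step 7 can be modeled as a small structure where the same finding exists at several depths and the caller picks one. The names and example text are illustrative:

```python
from dataclasses import dataclass

@dataclass
class LayeredSummary:
    """The same finding at three levels of granularity; the reader picks the depth."""
    one_line: str
    paragraph: str
    full_report: str

summary = LayeredSummary(
    one_line="The deployment failed because of a DNS issue.",
    paragraph=("The deployment failed: after the DNS config change, the service "
               "could not resolve its database hostname. Rolling back fixed it."),
    full_report="(full timeline, configs, and log excerpts would go here)",
)
```

Leading with `one_line` and offering the deeper layers on request mirrors the "want the full breakdown?" pattern above.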

Failure Modes

  • Lossy compression of critical details. You summarize away the one fact that actually matters. The log summary omits the specific error code. The code review summary skips the security vulnerability. The most important detail is often the most specific one -- and specifics are the first thing that gets cut in a bad summary.

  • Editorializing instead of summarizing. You inject your own opinions into what should be a neutral distillation. "The code review found several issues" is a summary. "The code review found several issues, suggesting the developer was careless" is editorializing. Summarize what the source says, not what you think about it.

  • Uniform compression. You shorten every part equally instead of making judgment calls. A good summary is uneven: detailed where it matters, sparse where it does not. Uniform compression loses the important parts and preserves the unimportant ones.

  • Losing the thread. Your summary reads as a list of disconnected facts rather than a coherent narrative. "Error at 14:32. Memory usage was high. Server restarted." Compare with: "Memory usage climbed steadily until 14:32, when it exceeded the threshold and caused a crash. The server auto-restarted." The second version preserves causality.

  • Over-summarizing. You compress so aggressively that the summary is useless. "Something went wrong with the server" is technically a summary of a 500-line log, but it helps no one. There is a floor below which compression destroys value.

  • Under-summarizing. You include so much detail that the summary is barely shorter than the original. If the user wanted all the detail, they would read the original. Your job is to save them that work.

  • Wrong-level summarizing. You summarize at the wrong level of abstraction. The user wants to know which tests failed and you give them a philosophical overview of the test suite's health. Or the user wants the big picture and you list individual test names. Match the abstraction level to the question.

Tips

  • Lead with the conclusion. Journalists call this the "inverted pyramid": the most important information comes first. "The deployment failed due to a DNS misconfiguration" is the first sentence. Supporting details follow. This lets the user stop reading as soon as they have what they need. In NLP terms, this blends extractive summarization (selecting key sentences) with abstractive summarization (generating new text that captures the meaning) -- the best summaries often combine both approaches (Nenkova & McKeown, 2012).

  • Use structure to aid scanning. Bullet points, numbered lists, and bold text help the reader find what they need in a summary. A wall of summary text is better than a wall of original text, but a structured summary is better than both.

  • Distinguish facts from interpretations. When summarizing, be clear about what the source explicitly states versus what you are inferring. "The log shows the server crashed at 14:32 (fact). This was likely caused by the memory leak in the connection pool (inference)."
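
The fact/inference distinction can be made explicit in the summary's structure. A sketch -- the labels are the point, not the API:

```python
def labeled(statements: list[tuple[str, str]]) -> str:
    """Render (kind, text) pairs so the reader sees what is observed vs inferred."""
    return "\n".join(f"[{kind}] {text}" for kind, text in statements)

summary = labeled([
    ("fact", "The log shows the server crashed at 14:32."),
    ("inference", "Likely cause: the memory leak in the connection pool."),
])
```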

  • Test your summary by asking: could someone act on this? A good summary gives the reader enough information to make a decision or take the next step. If they would need to go back to the original to act, your summary is missing something important.

  • Summarize changes, not just state. "The response time increased from 200ms to 3 seconds after the last deployment" is more useful than "the response time is 3 seconds." Change implies a story; state is just a snapshot.
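
Change-oriented reporting can be sketched as a diff between two metric snapshots -- the metric names here are illustrative:

```python
def report_changes(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Describe each shared metric as a change; change carries the story."""
    lines = []
    for name in sorted(before.keys() & after.keys()):
        old, new = before[name], after[name]
        if old != new:
            lines.append(f"{name} changed from {old} to {new}")
        else:
            lines.append(f"{name} unchanged at {new}")
    return lines

changes = report_changes({"response_ms": 200}, {"response_ms": 3000})
```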

  • When summarizing code, focus on behavior and intent. "This function validates user input, normalizes email addresses, and returns a sanitized user object" is better than describing it line by line.

Frequently Asked Questions

How do I decide the right length for a summary? Match the length to the complexity of the source and the user's need. A simple question gets a one-line summary. A complex investigation gets a structured multi-paragraph summary. A useful heuristic: the summary should be roughly 10-20% the length of the original, but this varies widely. The real test is whether the summary contains everything the reader needs and nothing they do not.
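
The 10-20% figure is a heuristic, not a rule, but it can serve as a quick sanity check. The thresholds below are loose illustrative bounds, deliberately wider than the heuristic itself:

```python
def compression_check(original_words: int, summary_words: int) -> str:
    """Flag summaries far outside a rough 10-20% target ratio."""
    ratio = summary_words / original_words
    if ratio < 0.02:
        return "possibly over-summarized"
    if ratio > 0.5:
        return "possibly under-summarized"
    return "within reason"

verdict = compression_check(original_words=5000, summary_words=50)  # 1% ratio
```

As the answer above says, the real test is content, not length: does the summary contain everything the reader needs and nothing they do not?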

Should I include quotes from the original in my summary? Yes, when the exact wording matters -- error messages, specific warnings, key phrases. Paraphrasing an error message can lose critical information. "The log says FATAL: password authentication failed for user postgres" is more useful than "there was an authentication error."

What if different parts of the source contradict each other? Surface the contradiction in your summary rather than resolving it silently. "The configuration file specifies port 5432, but the startup log shows the service binding to port 5433." This is exactly the kind of detail that gets lost in a bad summary and is critical for a good one.

How do I summarize a long conversation I have been part of? Focus on decisions made, questions answered, open items, and changes in direction. "We started by investigating the API timeout. We identified the database query as the bottleneck, optimized it from 3 seconds to 200 milliseconds, and verified the fix in staging. Remaining: deploy to production and monitor for 24 hours." Skip the back-and-forth; capture the progression.

When should I summarize proactively versus waiting to be asked? Summarize proactively after completing a complex multi-step task, when the conversation has been long and you sense the user might benefit from a checkpoint, or when you have gathered information from multiple sources and need to synthesize before proceeding. Do not summarize simple interactions -- that is just restating the obvious.

Sources

Luhn, H. P. (1958). The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development, 2(2), 159-165.

Nenkova, A., & McKeown, K. (2012). A Survey of Text Summarization Techniques. In C. Aggarwal & C. Zhai (Eds.), Mining Text Data (pp. 43-76). Springer.