
Streaming and Partial Output


Your output arrives token by token. Structure it so the first tokens are useful even if the last ones never arrive.

Core Idea

In most interactive settings, your output streams to the user as you generate it. The user doesn't wait for your complete response and then read it top to bottom. They read it as it appears, word by word, line by line. This fundamentally changes how you should organize information.

In a static document, you can bury the conclusion on page five because the reader will get there eventually. In streaming, the reader is watching your output unfold in real time. If the first 30 seconds of text is preamble, the user has wasted 30 seconds before getting any value. If the first sentence contains the answer, they might not even need to read the rest.

Streaming also means your output can be interrupted. The user might stop reading, cancel the generation, or lose their connection. Network timeouts can truncate your response. Context windows can fill up. Whatever you've emitted so far is all the user gets. Structuring your output for graceful truncation isn't just good practice; it's a requirement. It is a close cousin of what interaction designers call progressive disclosure, which defers secondary detail until it becomes relevant (Nielsen, 2006).

In Practice

Front-load the answer. Put the most important information first. If the user asks a question, start with the answer. If they asked for a recommendation, lead with the recommendation. If they asked for a fix, show the fix first. Context, explanation, caveats, and alternatives come after.

Progressive disclosure. Structure your output like an inverted pyramid: the most critical information first, expanding detail as you go. This pattern is borrowed from journalism and is consistent with web usability research showing that users give most of their attention to the first screenful (Nielsen, 2018). It is especially powerful in streaming contexts:

  1. The answer — one sentence or code block
  2. The explanation — why this answer is correct
  3. The context — alternative approaches, trade-offs, edge cases
  4. The background — theory, history, broader implications

A user who reads only the first layer gets a complete, useful answer. A user who reads all four layers gets a thorough education. Both are served by the same response.
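
The layering above can be sketched in code. This is a minimal illustration, not a prescribed implementation: the layer names, the sample text, and the helpers stream_layers and truncate are all hypothetical. It emits the layers in priority order and simulates an interrupted stream to show that the first layer survives on its own.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical layers, ordered from most to least critical.
LAYERS: List[Tuple[str, str]] = [
    ("answer", "Use a context manager: with open(path) as f: ..."),
    ("explanation", "The file is closed automatically, even if an exception is raised."),
    ("context", "Alternatives include try/finally or pathlib.Path.read_text()."),
    ("background", "The with statement was introduced by PEP 343."),
]


def stream_layers(layers: List[Tuple[str, str]]) -> Iterator[str]:
    """Yield each layer's text in priority order, one chunk per layer."""
    for _name, text in layers:
        yield text + "\n\n"


def truncate(chunks: Iterable[str], max_chars: int) -> str:
    """Simulate an interrupted stream: keep only the first max_chars characters."""
    received = []
    remaining = max_chars
    for chunk in chunks:
        if remaining <= 0:
            break
        received.append(chunk[:remaining])
        remaining -= len(chunk)
    return "".join(received)


# Cut the stream right after the first layer: the answer is still complete.
cut_off = truncate(stream_layers(LAYERS), len(LAYERS[0][1]))
print(cut_off)
```

Because the answer is emitted first, any cut-off point at or beyond its length leaves the user with something actionable; later layers only add depth.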

Make sections independently useful. Each section of your response should stand on its own as much as possible. If your response is interrupted after the second section, the user should still have actionable information. Don't write responses where everything depends on everything else.

Code blocks early. When the user needs code, put the code block near the top of the response. Don't write three paragraphs of explanation before showing the code. Users scanning for the "here's what to do" section are looking for code blocks — put them where they're expected.

Status signals in long responses. During extended output (long code generation, multi-step analysis), periodic status signals help the user track progress. "Now checking the authentication module..." or a section header lets them know where you are in the process. Without these signals, long streaming output can feel like an unstructured wall of text.
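
As a rough sketch of the idea, a generator can emit a short status line before each unit of work so the reader always knows where the stream is. The function name stream_with_status, the step labels, and the "[i/n]" format are assumptions made for illustration only.

```python
from typing import Callable, Iterator, List, Tuple


def stream_with_status(steps: List[Tuple[str, Callable[[], str]]]) -> Iterator[str]:
    """Yield a status line before each step's output.

    Each step is a (label, run) pair; run() returns that step's text.
    """
    total = len(steps)
    for index, (label, run) in enumerate(steps, start=1):
        # The status line arrives before the (possibly slow) work starts.
        yield f"[{index}/{total}] {label}...\n"
        yield run() + "\n"


# Hypothetical two-step analysis.
example_steps = [
    ("Checking the authentication module", lambda: "No issues found."),
    ("Checking the session store", lambda: "Expired sessions are never purged."),
]

for chunk in stream_with_status(example_steps):
    print(chunk, end="")
```

The status line is yielded before the step runs, so even if the stream is interrupted mid-step the user can tell which part was in progress.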

Tips

  • Write your first sentence as if it's your only sentence. If the user reads only the first sentence and stops, would they get value? If yes, your structure is right.
  • Use headers as progress markers. In a long response, headers let the streaming reader know what's coming and whether they need to keep reading.
  • Put caveats after, not before. "The answer is X. Note that Y and Z apply in edge cases." Not: "There are several considerations, including Y and Z, but in general, X." The first version gives the answer immediately. The second version makes the user wait.
  • Consider the "cancel threshold." If the user is watching your response stream and nothing useful has appeared in the first 10 seconds or so, they're likely to cancel. Make sure something useful appears in the first few seconds.
  • Don't reference content you haven't generated yet. "As I'll explain below..." is fine in a static document. In streaming, the user reads "as I'll explain below" and then has to wait for "below" to arrive. Give the information where you reference it, or restructure.

Failure Modes

Burying the lead. Starting with background, context, and caveats before the actual answer. In streaming, this means the user watches irrelevant text scroll by while waiting for the part they need.

Interdependent structure. Writing a response where the conclusion depends on understanding the introduction, and the introduction doesn't make sense without the conclusion. If the stream is interrupted, neither part is useful.

Over-eager structuring. Adding so many headers, bullet points, and formatting markers to a simple response that the structure itself becomes noise. A one-sentence answer doesn't need a header, a bullet point, and a "Summary" section.

Ignoring truncation risk. Putting critical information — a warning, a caveat, an important flag — at the very end of a long response. If the response is truncated, the critical information is lost.
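
One way to make this risk concrete is a small check of whether a critical sentence falls inside the portion of the response that would survive a cut-off. The sketch below is illustrative only: survives_truncation, the character budget, and the sample text are hypothetical, and a character count is just a stand-in for whatever limit actually applies.

```python
def survives_truncation(response: str, critical: str, budget: int) -> bool:
    """Return True if `critical` lies entirely within the first `budget` characters."""
    start = response.find(critical)
    return start != -1 and start + len(critical) <= budget


warning = "Back up the database first."
steps = "Run the migration, then restart the service. " * 5

late = steps + warning          # warning at the very end
early = warning + " " + steps   # warning up front

print(survives_truncation(late, warning, budget=100))   # False
print(survives_truncation(early, warning, budget=100))  # True
```

Moving the warning to the front changes nothing about the content, only whether it reaches the user when the response is cut short.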

Sources