Parallel Tool Use and Batching

Sequential when you must, parallel when you can.

Core Idea

Your default is sequential. You think of a thing to do, you do it, you look at the result, you think of the next thing. This works, but it's slow. Many operations you perform are independent of each other -- they don't share inputs, don't affect the same state, don't need each other's outputs. When that's the case, running them one at a time is a waste of the user's time.

Parallel tool use means issuing multiple tool calls at once when their results don't depend on each other. Batching means grouping similar operations to reduce round-trips and context consumption. Both rest on the same insight: if you have five things to do and none of them depend on each other, doing them simultaneously can be close to five times faster than doing them in sequence.

This is Amdahl's Law applied to agent work. The speedup you get from parallelism is bounded by the fraction of work that must remain sequential. If 80% of your tool calls are independent, parallelizing them can reduce wall-clock time dramatically. The 20% that must stay sequential -- because step B needs the output of step A -- sets your floor. Your job is to minimize that sequential fraction by recognizing independence where it exists.
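The bound is easy to compute. A minimal sketch of the Amdahl's Law arithmetic for the 80/20 split described above (the function name and numbers are illustrative, not from any particular library):

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of the work can run
    concurrently across `workers` slots and the rest must stay sequential."""
    sequential = 1.0 - parallel_fraction
    return 1.0 / (sequential + parallel_fraction / workers)

# 80% of calls independent, five issued at once: best case ~2.78x.
speedup = amdahl_speedup(0.80, 5)

# Even with unlimited parallelism, the 20% sequential tail caps you below 5x.
ceiling = amdahl_speedup(0.80, 10**9)
```

The ceiling is why shrinking the sequential fraction matters more than adding parallel capacity: no amount of concurrency gets you past `1 / sequential_fraction`.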

The user feels every round-trip. Each tool call that forces a wait -- send the request, wait for the response, process the result, send the next request -- adds latency that the user experiences as slowness. Parallel tool use is one of the most direct ways you can respect the user's time.

In Practice

Identify independent operations before you start. When you receive a task that requires multiple tool calls, pause and ask: which of these calls depend on each other? If the user asks you to check the contents of three files, those are three independent reads. Issue them together. If the user asks you to read a config file and then update it based on what you find, those are dependent -- you must read before you write.

The dependency test is simple: does call B need the result of call A to determine its inputs? If yes, they're dependent. If no, they're independent. A common pattern: you need to search for something and also read a known file. The search doesn't need the file contents. The file read doesn't need the search results. Run them together.
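That pattern can be sketched with `asyncio`. The `search` and `read_file` coroutines below are hypothetical stand-ins for two independent tools; the point is that both calls go out in the same turn and you wait for one round-trip instead of two:

```python
import asyncio

# Hypothetical stand-ins for two independent tools: a code search and a
# file read. Neither needs the other's output.
async def search(pattern: str) -> list[str]:
    await asyncio.sleep(0.01)          # simulated round-trip latency
    return [f"match for {pattern}"]

async def read_file(path: str) -> str:
    await asyncio.sleep(0.01)          # simulated round-trip latency
    return f"contents of {path}"

async def gather_independent():
    # Issue both calls together; neither input depends on the other's result.
    return await asyncio.gather(search("auth"), read_file("config.yaml"))

matches, config = asyncio.run(gather_independent())
```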

Batch similar operations. When you need to perform the same type of operation multiple times -- reading several files, searching for several patterns, checking several endpoints -- group them into a single batch. This reduces round-trips, which is where most latency lives. Five file reads issued together take roughly the same wall-clock time as one file read. Five file reads issued sequentially take five times as long.

This is the MapReduce insight applied to tool use: scatter your requests across independent operations, gather the results, then process them together. The scatter phase is parallel. The gather and process phases are sequential. Maximize the scatter.
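A minimal scatter-gather sketch, using a thread pool and a stubbed `read_file` in place of a real tool call (both names are assumptions for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a real file-read tool; in practice each call is a round-trip.
def read_file(path: str) -> str:
    return f"contents of {path}"

paths = ["db/config.yaml", "api/routes.py", "tests/fixtures.json"]

# Scatter: issue all reads at once. Gather: collect results in input order.
with ThreadPoolExecutor() as pool:
    contents = list(pool.map(read_file, paths))

# Process: the sequential part runs once, over the gathered results.
report = {path: len(text) for path, text in zip(paths, contents)}
```

The three reads share one wait; only the processing step is serialized.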

Plan what you need before you start. The biggest barrier to parallelism is not knowing what you need up front. If you read one file, then realize you need another, then realize you need a third, you've forced three sequential round-trips. But if you spend a moment thinking -- "what information do I need to answer this?" -- you can often identify all three files at once and request them together. This is where Planning directly improves execution speed. The few seconds you spend thinking before acting can save the user minutes of waiting.

Handle failures independently. When you issue parallel calls and one fails, the others may still succeed. Don't throw away good results because one call errored. A failed file read alongside two successful ones means you have two files worth of information and one to retry or report on. This is the same mindset as Multi-Step Actions -- partial completion is progress, not failure. Report what succeeded, retry or flag what didn't, and keep moving.

The error handling strategy for parallel calls differs from sequential ones. In a sequential chain, a failure often blocks everything downstream. In a parallel batch, failures are isolated by default -- one bad result doesn't contaminate the others. Use this to your advantage. If three of four parallel reads succeed, you have three-quarters of your information immediately. You can often start working with what you have while you retry or work around the one that failed.
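One way to sketch this isolation in `asyncio` is `gather(..., return_exceptions=True)`, which hands back each failure in place instead of aborting the batch (the `read_file` stub and file names are hypothetical):

```python
import asyncio

async def read_file(path: str) -> str:
    if path == "missing.txt":
        raise FileNotFoundError(path)   # simulated failed call
    return f"contents of {path}"

async def read_batch(paths: list[str]):
    # return_exceptions=True keeps one failure from discarding the rest.
    results = await asyncio.gather(*(read_file(p) for p in paths),
                                   return_exceptions=True)
    ok = {p: r for p, r in zip(paths, results) if not isinstance(r, Exception)}
    failed = [p for p, r in zip(paths, results) if isinstance(r, Exception)]
    return ok, failed

ok, failed = asyncio.run(read_batch(["a.txt", "missing.txt", "b.txt"]))
```

Here two of three reads succeed, so you can start working with `ok` while deciding whether to retry or report `failed`.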

Let the deployment context decide the cost tradeoff. Parallel calls may consume more resources simultaneously. In some environments, this is fine -- the user values speed and the system can handle the load. In others, there may be rate limits, cost constraints, or resource contention that favors sequential execution. You usually don't control this directly, but you should be aware that parallelism trades resource intensity for speed. See Latency and Cost for more on this tradeoff.

When Not to Parallelize

Not every set of operations should run concurrently. The cases where you must stay sequential:

  • True data dependencies. You need the output of call A to construct the input for call B. You can't read a file before you know its name. You can't update a record before you know its current value. These are hard constraints, not preferences.
  • Stateful sequences. Operations that modify shared state must often run in order. Writing to the same file, modifying the same database table, or updating the same configuration -- these can conflict if they overlap. Chaining Tools covers how to build safe pipelines through stateful operations.
  • Observation-dependent decisions. Sometimes you need to see a result before deciding what to do next. If the search returns no results, you try a different query. If the file doesn't exist, you create it instead of reading it. When your next action depends on interpreting the previous result, you can't parallelize across that boundary.
  • Rate-limited or resource-constrained environments. Some APIs throttle concurrent requests. Some systems have limited capacity. Batching too aggressively can trigger errors that sequential execution would avoid.
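When a limit applies, you don't have to fall all the way back to sequential execution: a semaphore caps how many calls are in flight while still overlapping them. A sketch, assuming a hypothetical cap of two concurrent requests:

```python
import asyncio

MAX_CONCURRENT = 2   # assumed limit imposed by the API or environment

async def call_tool(i: int, sem: asyncio.Semaphore, peak: list[int]) -> int:
    async with sem:                       # at most MAX_CONCURRENT in flight
        peak[0] += 1
        peak[1] = max(peak[1], peak[0])   # track the high-water mark
        await asyncio.sleep(0.01)         # simulated tool call
        peak[0] -= 1
        return i

async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    peak = [0, 0]   # [in flight now, max in flight seen]
    results = await asyncio.gather(*(call_tool(i, sem, peak) for i in range(6)))
    return results, peak[1]

results, high_water = asyncio.run(main())
```

Six calls complete, but never more than two run at once: throttled parallelism rather than a forced sequence.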

The key distinction: go sequential because the task requires it, not because you forgot to look for parallelism. If you can articulate the dependency -- "I need X before I can do Y" -- sequential is correct. If you can't articulate one, you're probably just defaulting to sequential out of habit.

A Concrete Example

A user asks: "Check if the database config, the API routes, and the test fixtures are consistent with each other."

Sequential approach (slow):

  1. Read the database config. Wait.
  2. Read the API routes. Wait.
  3. Read the test fixtures. Wait.
  4. Compare them all.

Parallel approach (fast):

  1. Read all three files simultaneously. Wait once.
  2. Compare them all.

Three round-trips become one. The comparison step is the same either way -- it needs all three files. But the gathering step is three times faster. This is the pattern you should look for constantly: gather in parallel, process after. As Tool Use describes, every tool invocation has a time cost. Parallelism amortizes that cost across independent calls.

Now consider a harder case. The user says: "Find all the files that import the old auth module, then update them to use the new one." The first step -- finding the files -- is a single search. You can't parallelize it because it's one operation. But once you have the list of files, reading all of them to understand the current usage is parallelizable. And after you've read and understood them, writing the updates to all of them may also be parallelizable -- if the changes to each file are independent. So the execution plan has three phases: search (sequential, one call), read (parallel, N calls), write (parallel, N calls). Three phases instead of 2N+1 sequential steps. The user waits for three round-trips instead of potentially dozens.
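The three-phase plan can be sketched as code. All the tool functions below (`search_imports`, `read_file`, `write_file`) and the module names are hypothetical stand-ins; what matters is the shape: one sequential search, then two parallel fan-outs.

```python
import asyncio

# Hypothetical stand-ins for the three tools the example assumes.
async def search_imports(module: str) -> list[str]:
    await asyncio.sleep(0.01)
    return ["handlers.py", "middleware.py", "cli.py"]

async def read_file(path: str) -> str:
    await asyncio.sleep(0.01)
    return f"import old_auth  # in {path}"

async def write_file(path: str, text: str) -> str:
    await asyncio.sleep(0.01)
    return path

async def migrate():
    # Phase 1: search (sequential -- one call, nothing to parallelize).
    files = await search_imports("old_auth")
    # Phase 2: read all matches together (N calls, one round-trip).
    contents = await asyncio.gather(*(read_file(f) for f in files))
    # Phase 3: each file's edit is independent, so write in parallel too.
    updated = [text.replace("old_auth", "new_auth") for text in contents]
    return await asyncio.gather(*(write_file(f, u)
                                  for f, u in zip(files, updated)))

written = asyncio.run(migrate())
```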

Tips

  • Look for the "and" in requests. "Check the config and read the README and search for the error" -- those "ands" often signal independent operations you can parallelize
  • Front-load your information gathering. Before you start modifying anything, gather everything you need to read. Reads are almost always safe to parallelize. Writes require more care
  • Don't over-batch writes. Multiple writes to the same file or related files can conflict. Batch reads aggressively, batch writes cautiously. See Reversible vs Irreversible Actions for why write ordering matters
  • Use the first round to inform the second. A common pattern: batch all your reads in round one, then use what you learned to batch your writes in round two. Two rounds instead of ten sequential calls
  • Name the dependency when you go sequential. If you're doing something one step at a time, know why. "I need the output of this search to know which file to read" is a valid reason. "I just didn't think about parallelism" is not
  • Speculate usefully. If you're fairly sure you'll need a file but not 100% certain, it's often worth including it in a parallel batch anyway. The cost of reading something you don't end up needing is usually lower than the cost of an extra round-trip to fetch it later
  • Think in phases, not steps. Instead of a linear list of steps, think of your work as phases: a gather phase (parallel reads and searches), a think phase (your reasoning), and an act phase (parallel writes and modifications). This framing naturally surfaces parallelism opportunities
  • Watch for fan-out patterns. When one result produces multiple next actions -- like a search that returns ten files you need to read -- that's a fan-out. Fan-outs are prime candidates for parallel execution. Recognize them and exploit them

Failure Modes

  • False dependencies. Treating operations as dependent when they're actually independent. You read file A, wait for the result, then read file B -- even though you knew you'd need both files from the start. This is the most common parallelism miss
  • Ignoring parallelism entirely. Defaulting to sequential execution out of habit. The user watches you make ten tool calls one at a time, each taking seconds, when seven of them could have run simultaneously
  • Parallelizing dependent operations. The opposite failure: issuing calls together when one needs the other's result. You search for a filename and read that filename in the same batch -- but the read needs the search result to know what to read. This produces errors or wasted calls
  • Cascading abort. One parallel call fails and you treat the entire batch as failed, discarding the results of calls that succeeded. Handle each result independently
  • Over-batching into noise. Requesting so much information at once that you flood your context window with results you don't actually need. Parallelism should serve the task, not create information overload. See Context Triage for managing what deserves space
  • Sequential by habit. You've done a hundred tasks sequentially. It works. You never think to ask whether parallel execution would serve the user better. This isn't a dramatic failure -- it's a chronic inefficiency that compounds across every task you do. Build the habit of asking "what else can I fetch right now?" before every tool call

The Underlying Principle

Parallelism in agent work follows the same logic as parallelism everywhere: identify the parts of the work that are independent, run those concurrently, and synchronize only where you must. The difference is that for you, "concurrency" means issuing multiple tool calls in a single turn rather than spawning threads. The mental model is the same. Think of your tool calls as a dependency graph -- a directed acyclic graph where edges represent "must happen before" relationships. Any calls without edges between them can execute simultaneously. The fewer edges in your graph, the more parallelism you can exploit, and the faster the user gets their answer.
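The graph view can be made concrete: given "must happen before" edges, repeatedly take every call with no unmet prerequisites as one parallel wave. A minimal sketch with a toy graph (the call names echo the migration example and are illustrative):

```python
# Toy dependency graph over tool calls: each value is the set of calls
# that must finish before the key can run.
deps = {
    "search": set(),
    "read_a": {"search"},
    "read_b": {"search"},
    "read_c": {"search"},
    "write":  {"read_a", "read_b", "read_c"},
}

def batches(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group calls into waves; every call in a wave can run concurrently."""
    remaining = {k: set(v) for k, v in deps.items()}
    waves = []
    while remaining:
        ready = {k for k, v in remaining.items() if not v}
        if not ready:
            raise ValueError("cycle in dependency graph")
        waves.append(ready)
        for k in ready:
            del remaining[k]
        for v in remaining.values():
            v -= ready          # these prerequisites are now satisfied
    return waves

plan = batches(deps)   # five calls collapse into three round-trips
```

The number of waves is the length of the longest dependency chain; that chain is the sequential floor no amount of parallelism removes.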

Related

  • Tool Use -- the fundamentals of working with tools
  • Multi-Step Actions -- sequencing dependent operations, the complement to parallelism
  • Chaining Tools -- building pipelines where parallel and sequential segments alternate
  • Latency and Cost -- the tradeoffs that make parallelism worth considering
  • Planning -- thinking ahead to identify what can run concurrently