Multi-Step Actions

Chaining actions with dependencies, handling partial completion.

Core Idea

Most real tasks aren't single actions. They're sequences where each step depends on what came before. Install the dependency, then import it. Create the file, then write to it. Fetch the data, then process it. The order matters. The dependencies matter. And partial completion -- getting halfway through -- is the normal case, not the exception.

Before you start, understand the full plan. Know what depends on what. Know which steps take time and should start early. Know where the risks are. Then proceed with confidence, adjusting as you go.

When you plan a multi-step action, you're making predictions about a chain of futures. Each step might succeed or fail. Each failure might be recoverable or not. The earlier you identify dependencies and potential failure points, the better your plan and your recovery. Research on LLM agent planning supports this: Prasad et al. (2024) report that dynamic task decomposition -- adapting the plan as steps succeed or fail -- outperforms fixed, up-front plans on complex multi-step tasks.

In Practice

Map dependencies before acting. Before starting a multi-step task, sketch out the relationships between steps:

Which steps depend on which? If step 3 needs the output of step 1, that's a hard dependency. You can't start step 3 until step 1 completes successfully.
Which steps can run in parallel? If step 2 and step 3 are independent of each other (both depend on step 1 but not on each other), you can do them at the same time. Recognizing parallelism saves real time.
Which steps are prerequisites for everything that follows? These gate the whole plan -- if any of them fails, everything stops. Identify these early and give them extra attention. A failed prerequisite discovered on step 5 is much more painful than one caught on step 1.
Which steps can be skipped or reordered? Sometimes you have optional steps or steps whose order doesn't matter. Knowing which ones are flexible gives you room to adapt when things go sideways.

Here's a concrete example. Say a user asks you to set up a new Python project with a virtual environment, install dependencies, create a configuration file, and run the tests. The dependency map looks like this: creating the virtual environment comes first (everything depends on it). Installing dependencies comes second (tests depend on this, but the config file doesn't). Creating the config file can happen any time after the virtual environment exists. Running tests comes last (depends on both dependencies and config). Seeing this map up front means you know that if the dependency installation fails, you can still create the config file -- but you can't run tests. You also know that while dependencies install, you could be writing the config file in parallel.
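The dependency map above can be sketched in code. This is a minimal illustration using the standard library's graphlib module, not a prescribed tool; the step names and the execution_waves helper are made up for the example.

```python
from graphlib import TopologicalSorter

# Each key lists the steps it depends on (step names are illustrative).
deps = {
    "create_venv": set(),
    "install_deps": {"create_venv"},
    "write_config": {"create_venv"},
    "run_tests": {"install_deps", "write_config"},
}

def execution_waves(graph):
    """Group steps into waves; steps in the same wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(deps)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

The second wave contains both install_deps and write_config, which is the parallelism described above, and run_tests only becomes ready once both have finished.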

Another example: a user asks you to refactor a function, update all callers, and run the test suite. The dependencies here are mostly linear -- you must refactor the function before you can update callers (because you need to know the new signature), and you must update callers before tests will pass. But you might also want to run the tests before you start, to confirm they pass now and give yourself a clean baseline. That's a step you'd add to the plan yourself, even though the user didn't mention it. Good multi-step planning includes steps the user didn't think to request.

Checkpoint before risky steps. If step 4 of 6 is risky (irreversible, resource-intensive, likely to fail), make sure you have a clear record of what happened in steps 1-3. Save state. Record progress. If step 4 fails, you need to know exactly where you are. Checkpointing can be as simple as noting "steps 1-3 completed successfully, about to start step 4" in your communication. Or it can mean creating a git commit, saving a backup, or writing intermediate results to a file. The key is: if you need to recover, you know exactly where you are and what you've already accomplished. Don't leave yourself in a state where failure means starting over from scratch.
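One way to record progress is a small checkpoint file. The sketch below assumes steps are plain callables and that a JSON file is an acceptable place to keep state; the file name, the step names, and the run_with_checkpoints helper are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

def load_completed(path):
    """Return the list of step names recorded as finished, or [] if none."""
    if path.exists():
        return json.loads(path.read_text())["completed"]
    return []

def run_with_checkpoints(steps, path):
    """Run (name, func) pairs in order, skipping steps already recorded."""
    completed = load_completed(path)
    for name, func in steps:
        if name in completed:
            continue  # finished in an earlier run
        func()  # if this raises, the checkpoint still lists the earlier steps
        completed.append(name)
        path.write_text(json.dumps({"completed": completed}))
    return completed

# Demo: the third step fails on the first run and succeeds on the second.
calls = []
state = {"fail_migrate": True}

def migrate():
    if state["fail_migrate"]:
        raise RuntimeError("migration failed")
    calls.append("migrate")

steps = [
    ("backup", lambda: calls.append("backup")),
    ("install", lambda: calls.append("install")),
    ("migrate", migrate),
]

checkpoint = Path(tempfile.mkdtemp()) / "progress.json"
try:
    run_with_checkpoints(steps, checkpoint)
except RuntimeError:
    pass
after_failure = load_completed(checkpoint)  # what the first run finished

state["fail_migrate"] = False
final = run_with_checkpoints(steps, checkpoint)  # resumes at "migrate"
```

After the failed run the checkpoint lists backup and install, so the second run picks up at migrate without repeating the earlier steps.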

Error propagation is the default. When one step in a chain fails, everything downstream is affected. Plan for this:

Can you retry the failed step? Sometimes a network timeout or a transient error is all that happened, and a simple retry fixes it. But be careful -- if the step partially completed before failing, retrying might cause duplicate work or inconsistent state. A failed database insert that actually inserted some rows before erroring will double-insert if you retry naively.
Can you skip it and proceed? Some steps are nice-to-have. If installing an optional dependency fails, maybe you can proceed without it and note the limitation.
Do you need to roll back previous steps? If step 3 fails and it means steps 1 and 2 are now in a bad state (like a database migration that's half-applied), you may need to undo them. This is why checkpointing matters.
Should you stop and report? Sometimes the right answer is to stop, explain what happened, and let the user decide. This is especially true when the failure is unexpected or when continuing requires judgment calls you're not sure about. Don't barrel through a failed step hoping the next one will work anyway.
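The retry question above can be made concrete with a small wrapper. This is a sketch under two assumptions: failures worth retrying are marked with a hypothetical TransientError class, and the step itself is safe to repeat (it does not leave partial results behind).

```python
import time

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""

def run_step(func, retries=2, delay=0.0):
    """Run func, retrying only on TransientError.

    Any other exception propagates immediately so the caller can
    roll back, skip, or stop and report.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == retries:
                raise  # retries exhausted: surface the failure
            time.sleep(delay * (2 ** attempt))  # simple exponential backoff

# Demo: a fetch that times out twice and then succeeds.
attempts = {"count": 0}

def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("timeout")
    return "data"

result = run_step(flaky_fetch)
```

Errors that are not marked transient are deliberately not retried, which matches the advice to stop and assess rather than push through a structural failure.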

Batch vs. step carefully. Some multi-step tasks benefit from batching (run all at once, handle errors after). Others need careful stepping (verify each step before proceeding). The choice depends on:

Reversibility: can you undo mistakes? If yes, batching is low-risk. If not, step carefully.
Cost: how expensive are failures? If failures are cheap (you can just redo the work), batch away. If failures are costly (lost data, broken state), step carefully.
Dependencies: does each step need the prior result? If steps are truly independent, batching makes sense. If each step feeds into the next, you need to step through them.
Feedback: do you need to see each result before continuing? If the output of step 1 determines what you do in step 2, you can't batch. If the steps are predetermined, you can.

The partial completion mindset. Here is a perspective shift that makes you much more effective at multi-step work: expect partial completion. Don't think of your task as "either complete or failed." Think of it as a spectrum. You might complete 3 of 5 steps. That's not failure -- that's progress with a clear remainder. "I completed steps 1-3 but step 4 failed because X. Here's what's done and what remains." This is far more useful than either hiding the failure or treating the entire task as failed.

This mindset changes how you work. When you expect partial completion, you naturally build in checkpoints. You naturally communicate progress as you go. You naturally leave things in a state where someone (you or the user) can pick up where you left off. You stop thinking in terms of "success or failure" and start thinking in terms of "how far did I get, and what's the state of things?" That's a much more honest and useful framing.

Consider a user who asks you to update 15 files to use a new API. You update 12 successfully, but files 13-15 have an unexpected pattern you're not sure about. The partial completion response is: "I updated 12 of 15 files. The remaining 3 files (list them) use a different pattern that I want to confirm with you before changing. Here's what the pattern looks like..." That's a valuable, actionable update. The user can review the 12 changes, give guidance on the 3 remaining ones, and the task continues smoothly.
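The 12-of-15 scenario can be sketched as a loop that collects results instead of aborting on the first problem. The file names, the update function, and the use of ValueError as a "needs review" signal are assumptions made for illustration.

```python
def apply_to_all(items, update):
    """Apply update to each item, collecting successes and deferred items
    instead of aborting on the first problem."""
    done, remaining = [], {}
    for item in items:
        try:
            update(item)
            done.append(item)
        except ValueError as exc:  # hypothetical "unsure, needs review" signal
            remaining[item] = str(exc)
    return done, remaining

def progress_report(done, remaining):
    """Summarize what was finished and what is left, with reasons."""
    total = len(done) + len(remaining)
    lines = [f"Updated {len(done)} of {total} files."]
    for item, reason in remaining.items():
        lines.append(f"Not changed: {item} ({reason})")
    return "\n".join(lines)

files = [f"module_{i}.py" for i in range(1, 16)]
unusual = {"module_13.py", "module_14.py", "module_15.py"}

def update(name):
    # Stand-in for the real edit; refuses files with an unfamiliar pattern.
    if name in unusual:
        raise ValueError("unfamiliar call pattern")

done, remaining = apply_to_all(files, update)
report = progress_report(done, remaining)
print(report)
```

The report names each untouched file and why it was left alone, which gives the user something concrete to respond to.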

Plan for recovery, not just success. For every step in your plan, have at least a rough idea of what you'll do if it fails. You don't need a detailed contingency plan for every possible failure -- but you should know, at a minimum: can I retry? Can I skip? Do I need to roll back? Should I stop and ask? This recovery planning takes seconds but saves you from the worst failure mode in multi-step work: getting stuck halfway through with no idea what to do next, and no way to get back to a known good state.

Communicating Progress

Multi-step work isn't just about execution -- it's about keeping the user informed as you go.

Before you start: Share the plan. "I'm going to do these 5 things in this order. Steps 1-3 are straightforward and reversible. Step 4 modifies the database, so I'll confirm before proceeding. Step 5 runs the tests. Sound good?"

After each major step: Brief status update. "Step 1 done -- virtual environment created. Moving to dependency installation." This doesn't need to be verbose. A sentence is enough. What matters is that the user knows where you are.

When something unexpected happens: Don't wait until the end to mention it. "Step 3 produced an unexpected warning about a deprecated function. It doesn't block progress, but I wanted to flag it. Continuing with step 4." Real-time transparency is more valuable than a post-mortem.

At the end: Summarize what was done, what wasn't, and what the user might want to verify. "All 5 steps completed. The virtual environment is set up, dependencies are installed, config is created, database is migrated, and all 47 tests pass. You might want to verify the database migration by checking the schema."

This communication pattern does two things: it builds confidence that you're on track, and it gives the user natural checkpoints where they can intervene if something doesn't look right.

Failure Modes

No dependency mapping. Starting a sequence without understanding which steps depend on which, leading to failures that could have been prevented. You try to import a library before installing it. You try to run tests before creating the config file. You try to deploy before building. These are failures of planning, not execution.
All-or-nothing thinking. Treating partial completion as total failure instead of reporting what worked and what didn't. The user asked for 10 things. You did 8. That's not failure -- that's 80% completion with a clear remainder. Report it that way.
No checkpointing. Getting four steps in, failing on the fifth, and having no record of what was accomplished. Now you might need to start over because you're not sure what state things are in. A simple progress note after each step prevents this entirely.
Cascade blindness. Not recognizing that a failure in step 2 invalidates steps 3-6. You push through, running steps that depend on a step that already failed, and end up with a mess that's harder to clean up than if you'd stopped immediately. When a step fails, stop and assess before continuing.
Over-sequencing. Making everything sequential when some steps could safely run in parallel. If two steps are genuinely independent, running them in sequence wastes time for no benefit. Look for parallelism -- it's often there.
Under-sequencing. The opposite problem: trying to batch or parallelize steps that actually depend on each other, leading to race conditions or errors. Just because two steps look independent doesn't mean they are. Check for hidden dependencies like shared files, environment variables, or global state.
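Cascade blindness in particular can be checked mechanically once the dependencies are written down. The sketch below reuses the Python project example from earlier; the invalidated_by helper and the step names are illustrative, not part of any library.

```python
from collections import deque

def invalidated_by(failed, deps):
    """Return every step that directly or transitively depends on `failed`.

    `deps` maps each step to the set of steps it requires.
    """
    # Invert the map: for each step, which steps require it?
    dependents = {}
    for step, requires in deps.items():
        for req in requires:
            dependents.setdefault(req, set()).add(step)

    blocked, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for step in dependents.get(current, ()):
            if step not in blocked:
                blocked.add(step)
                queue.append(step)
    return blocked

deps = {
    "create_venv": set(),
    "install_deps": {"create_venv"},
    "write_config": {"create_venv"},
    "run_tests": {"install_deps", "write_config"},
}

blocked = invalidated_by("install_deps", deps)
print(sorted(blocked))
```

If install_deps fails, only run_tests is blocked and write_config can still proceed. If create_venv fails, every other step is blocked.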

Tips

Read the whole recipe first. Chain-of-thought prompting research shows that working through intermediate reasoning steps before committing to an answer substantially improves performance on multi-step problems (Wei et al., 2022). The same habit applies to actions: before you start executing any step, understand the entire task. What's the goal? What are all the steps? What depends on what? Where are the risks? Two minutes of planning saves twenty minutes of backtracking. This is the single most impactful habit for multi-step work.
Communicate your plan before executing it. Tell the user what you're about to do and in what order. This gives them a chance to correct you before you start ("Actually, don't touch the config file -- it's managed by Terraform") and it sets expectations about what they'll see.
Treat each step's output as input for your decision about the next step. Don't blindly follow a plan. After each step, look at the result and ask: does this change anything about what I should do next? Plans are living documents, not fixed scripts. The output of step 3 might reveal that step 4 needs to be different from what you originally planned.
When a step fails, pause before reacting. Your first instinct might be to retry immediately. But first, understand why it failed. A retry is only useful if the failure was transient. If the failure is structural (wrong approach, missing prerequisite, fundamental incompatibility), retrying will just fail again. Diagnosis before action.
Leave breadcrumbs. As you work through steps, note what you've done. If you need to come back to this later, or if the user needs to understand what happened, these notes are invaluable. A simple "created the file, installed dependencies, now configuring the database" trail is enough to resume from any point.
Know your rollback points. At any point in a multi-step task, you should know: if I need to undo everything, how far back can I go? What's my last clean state? This is like knowing where the emergency exits are before you need them. You may never use them, but knowing where they are changes how confidently you can proceed.
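Rollback points can be tracked explicitly by pairing each step with an undo action. This is a minimal sketch using contextlib.ExitStack from the standard library; the (do, undo) pairing and the run_with_rollback helper are assumptions for the example, and it presumes every completed step has a meaningful undo.

```python
from contextlib import ExitStack

def run_with_rollback(steps):
    """Run (do, undo) pairs in order.

    If any `do` raises, the undo callbacks for the steps that already
    completed run in reverse order, then the exception propagates.
    """
    with ExitStack() as undo_stack:
        for do, undo in steps:
            do()
            undo_stack.callback(undo)  # registered only after do() succeeded
        undo_stack.pop_all()  # all steps succeeded: discard the undo callbacks

# Demo: the third step fails, so steps 2 and 1 are undone in that order.
log = []

def fail():
    raise RuntimeError("step 3 failed")

steps = [
    (lambda: log.append("do 1"), lambda: log.append("undo 1")),
    (lambda: log.append("do 2"), lambda: log.append("undo 2")),
    (fail, lambda: log.append("undo 3")),
]

try:
    run_with_rollback(steps)
except RuntimeError:
    pass

print(log)
```

The failing step's own undo is never registered, so only work that actually happened is reversed.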

Frequently Asked Questions

How do I decide whether to plan extensively or just start working? The more steps you have and the more dependencies between them, the more you should plan. For a 2-step task with no dependencies, just start. For a 10-step task where step 7 depends on steps 2, 4, and 5, spend a minute mapping it out. A good heuristic: if you can hold the entire plan in your head without confusion, you don't need to write it down. If you're already losing track of what depends on what, plan before you proceed. The cost of a minute of planning is much lower than the cost of discovering a dependency problem on step 8.

What should I do when a step fails and I'm not sure if it partially completed? Investigate before proceeding. Check the state of the system. Did the file get created? Did the data get written? Is the process running? Don't assume that a failed step left everything unchanged -- many operations can fail halfway through, leaving partial results that look like corruption. Once you understand the current state, you can decide whether to retry, roll back, or continue from where things stand. The worst thing you can do is proceed blindly on top of an unknown state.

How do I handle multi-step tasks where the user keeps adding requirements? Treat each new requirement as a plan update, not a disruption. Integrate it into your existing dependency map and communicate what changes. "You've asked me to also add logging. That doesn't depend on what I'm currently doing, so I'll add it as a final step after I finish the refactoring." Or: "Adding authentication changes the order -- I'll need to do that before the API endpoints, since the endpoints depend on the auth middleware." This shows the user you've understood their request and have a clear place for it in your plan.

When should I ask the user before proceeding to the next step vs. just continuing? Ask before steps that are irreversible, high-risk, or ambiguous. Continue through steps that are reversible, low-risk, and clearly defined. If the user asked you to "set up the project," they probably don't want to approve each individual file creation. But if one of those steps involves modifying their production configuration or running a migration, pause and confirm. The general rule: the higher the stakes, the more you should check in. The lower the stakes, the more you should just proceed and report.

How do I recover when I realize halfway through that my plan was wrong? Stop executing immediately. Don't try to fix a bad plan by adding more steps -- that usually makes things worse by compounding the original mistake. Instead, take stock: what has been done, what state things are in, and what the right approach should have been. Then communicate clearly: "I've completed X and Y, but I've realized the approach needs to change because Z. Here's what I recommend instead." Users respect honest course corrections far more than they appreciate stubbornly finishing a flawed plan. A wrong plan executed perfectly is still wrong.

Sources

Wei et al., "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS, 2022 -- Demonstrates that decomposing problems into intermediate reasoning steps improves multi-step task performance
Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR, 2023 -- Foundational paper on interleaving reasoning and action for multi-step task execution
Prasad et al., "ADaPT: As-Needed Decomposition and Planning with Language Models," NAACL Findings, 2024 -- Shows dynamic task decomposition outperforms fixed planning for complex multi-step tasks
Huang et al., "Understanding the Planning of LLM Agents: A Survey," TKDE, 2024 -- Comprehensive survey of task decomposition, plan selection, and reflection strategies for LLM agents
Wang et al., "TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation," Neural Networks, 2024 -- Framework addressing error propagation in multi-step agent task chains

Chaining Tools -- the tool-specific version of this concept
Code Execution -- execution as a step in multi-step actions
Working in Environments -- the context where multi-step actions happen
Planning -- structuring multi-step work before starting
When to Stop Mid-Execution -- knowing when to abort the sequence
