Tool Use

Extending your capabilities through external tools. When and why.

Core Idea

Without tools, you are a brain in a jar. You can think, but you can't touch anything. You can reason about files, but you can't read them. You can talk about code, but you can't run it. You can discuss the weather, but you can't check it.

Tools change that. With tools, you can reach into the world — read files, search the web, run code, call APIs, query databases, create artifacts. Each tool is a hand you can extend into an environment you'd otherwise only imagine. Recent work has shown that language models can even learn to decide for themselves when and how to invoke tools (Schick et al., 2023).

But here's the thing people miss: tools don't make you smarter. They make you more capable. The reasoning is still yours. A calculator doesn't think for you; it executes the arithmetic you specified. A file reader doesn't understand the code; it puts the text in front of you so you can understand it. If your reasoning is bad, tools just execute bad reasoning faster.

The core skill isn't operating the tool. It's the judgment layer around the tool: knowing when to use one, which one to pick, how to invoke it correctly, and what to do with what comes back. The ReAct framework formalized this as a Thought-Action-Observation loop, showing that interleaving reasoning with tool actions significantly outperforms either reasoning or acting alone (Yao et al., 2023).
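The Thought-Action-Observation loop can be sketched in a few lines. This is an illustrative skeleton, not the paper's API: the `reason` callback, the tool registry, and the `"finish"` convention are all assumptions made for the demo.

```python
# Minimal sketch of a ReAct-style loop: reason, act, observe, repeat.
# `reason` returns (thought, action_name, args); "finish" ends the loop.

def react_loop(task, tools, reason, max_steps=5):
    """Interleave reasoning with tool actions until the task resolves."""
    observations = []
    for _ in range(max_steps):
        thought, action, args = reason(task, observations)  # Thought
        if action == "finish":
            return thought                                  # terminal answer
        result = tools[action](*args)                       # Action
        observations.append((action, result))               # Observation
    return None  # step budget exhausted without an answer

# Toy run: one lookup, then finish with the observed result.
def toy_reason(task, observations):
    if not observations:
        return ("need data", "lookup", ("capital of France",))
    return (observations[-1][1], "finish", ())

tools = {"lookup": lambda q: "Paris"}
print(react_loop("capital of France?", tools, toy_reason))  # → Paris
```

The point of the structure is that each action is preceded by an explicit thought and followed by an explicit observation; neither pure reasoning nor blind acting gets that feedback.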

In Practice

Tools extend, they don't replace. Every tool invocation involves a cycle:

  1. Decide — Is a tool the right move here, or should I think/ask instead?
  2. Choose — Which tool? A file reader? A search? Code execution?
  3. Construct — What input does the tool need? A filename? A query? Parameters?
  4. Invoke — Call the tool and wait for the result
  5. Interpret — What does this result mean? Is it what I expected? Is it useful?
  6. Integrate — How does this result change my understanding or plan?

Most failures happen at steps 1, 3, and 5. You used a tool when you didn't need to. You gave it bad input. You didn't think about what the output actually meant.
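The six steps above can be made concrete as a single function. Everything here is a hypothetical sketch: the `known_answers` cache, the `build_input` and `interpret` callbacks, and the toy tool are stand-ins chosen to make each step visible.

```python
# Sketch of the six-step cycle; comments mark each step.
known_answers = {}  # what we already know; step 1 checks this first

def invoke_with_judgment(question, tool, build_input, interpret):
    # 1. Decide: is a tool call needed at all?
    if (cached := known_answers.get(question)) is not None:
        return cached
    # 2. Choose: `tool` was selected by the caller for this step.
    # 3. Construct: build a precise input, not the raw question.
    tool_input = build_input(question)
    # 4. Invoke.
    raw = tool(tool_input)
    # 5. Interpret: turn raw output into a usable answer.
    result = interpret(raw)
    # 6. Integrate: fold the result back into what we know.
    known_answers[question] = result
    return result

result = invoke_with_judgment(
    "db max connections?",
    tool=lambda path: "max_connections: 100",   # toy file reader
    build_input=lambda q: "config/database.yml",
    interpret=lambda raw: raw.split(": ")[1],
)
```

Note that steps 1, 3, and 5 each get their own line of code; those are exactly the steps where most failures happen.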

Learn the tools available to you. This matters more than it sounds. At the start of any session, you should know:

  • What tools do I have access to?
  • What does each one do? What are its limits?
  • What inputs does each expect? What formats?
  • What outputs does each produce?
  • What can go wrong?

Use the minimum effective tool. Every tool invocation has a cost:

  • Time — the call has to execute and return
  • Tokens — the output consumes your context window
  • Risk — the tool might fail, return bad data, or have side effects
  • Complexity — more tools in a plan means more things that can break

If you can answer from knowledge, don't search. If you can read a small file in one call, don't set up a complex pipeline. If a simple tool works, don't chain three together. The best tool use is the least tool use that gets the job done.
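One way to picture "minimum effective tool" is an escalation ladder: try the cheapest option first and stop at the first one that answers. The three tiers below (knowledge, local read, web search) are illustrative, not a fixed taxonomy.

```python
# Hedged sketch: cheapest option first, escalate only on failure.
# Each tier is a callable returning an answer or None.

def minimum_effective_tool(question, knowledge, read_local, web_search):
    for attempt in (knowledge, read_local, web_search):  # cheapest first
        answer = attempt(question)
        if answer is not None:
            return answer
    return None

answer = minimum_effective_tool(
    "project license?",
    knowledge=lambda q: None,                 # not something we know cold
    read_local=lambda q: "MIT",               # one file read suffices
    web_search=lambda q: "expensive search",  # never reached
)
```

The design choice is that escalation is lazy: the expensive tier is never invoked if a cheaper one already answered.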

Tools are interfaces to environments. When you use a tool, you're not just calling a function — you're reaching into a system that has state, history, permissions, and rules. A file system remembers what you write. A database records your queries. An API might rate-limit you. Treat tools with the same respect you'd give the environment they connect to.
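Treating a tool as a stateful environment can be made explicit with a thin wrapper. The rate limit value and the wrapped `call` function below are assumptions for the sketch; the point is that the wrapper keeps the history the environment will hold against you.

```python
import time

# Sketch: a wrapper that remembers its own call history and refuses to
# exceed a rate limit, rather than hammering the API and hoping.

class RespectfulTool:
    def __init__(self, call, max_calls_per_minute=10):
        self.call = call
        self.max_calls = max_calls_per_minute
        self.timestamps = []  # the environment remembers; so should we

    def invoke(self, *args):
        now = time.monotonic()
        recent = [t for t in self.timestamps if now - t < 60]
        if len(recent) >= self.max_calls:
            raise RuntimeError("rate limit reached: back off, don't retry blindly")
        self.timestamps = recent + [now]
        return self.call(*args)

api = RespectfulTool(lambda q: f"result for {q}", max_calls_per_minute=2)
api.invoke("first")
api.invoke("second")
# a third call inside the same minute would raise
```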

Good vs. Bad Tool Invocation

Here is a concrete example. The user asks: "What's in the config file for the database?"

Bad invocation:

Tool: Grep
Input: { pattern: "database", path: "/" }

This searches the entire filesystem for "database": too broad, too slow, returns thousands of irrelevant matches. The agent didn't think about where the config file actually lives.

Good invocation:

Tool: Glob
Input: { pattern: "**/config*.{json,yaml,yml,toml}" }
→ Result: ["src/config/database.yml", "config/app.json"]

Tool: Read
Input: { file_path: "src/config/database.yml" }
→ Result: (the actual database configuration)

The good version narrows the search first, then reads the specific file. Two focused calls instead of one unfocused one. The agent thought about what they needed before reaching for a tool, chose the right tool for each step, and used specific inputs.

The difference: the bad invocation treats the tool as a magic box ("find it for me"). The good invocation uses the tool as a precise instrument ("look in this place for this thing").
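The narrow-then-read pattern translates directly to code. The sketch below builds its files in a temp directory purely for the demo, and because `pathlib` has no `{json,yaml,...}` brace syntax, it filters suffixes explicitly instead.

```python
import tempfile
from pathlib import Path

# Demo fixture: a tiny tree with one config file and one distractor.
root = Path(tempfile.mkdtemp())
(root / "src" / "config").mkdir(parents=True)
(root / "src" / "config" / "database.yml").write_text("host: localhost\n")
(root / "notes.txt").write_text("irrelevant\n")

# Step 1: narrow. Match only config-shaped files, not the whole tree.
config_suffixes = {".json", ".yaml", ".yml", ".toml"}
candidates = sorted(p for p in root.rglob("*") if p.suffix in config_suffixes)

# Step 2: read the one specific file the narrowing surfaced.
content = candidates[0].read_text()
```

Two focused operations, each with a precise input: the search returns one candidate, and the read fetches exactly that file.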

Tips

  • Read the tool description carefully. Many tools have subtle capabilities or limitations that their name doesn't convey. A "search" tool might support regex. A "read file" tool might have a line limit. Know your tools before you need them
  • Start simple. When uncertain, use the simplest tool first. You can always escalate to something more powerful
  • Tool output is data, not truth. A search result might be outdated. A file read might show stale content. A code execution might succeed for the wrong reasons. Always interpret, never just accept
  • Name what you're doing. Before a tool call, articulate (even to yourself) why you're making it and what you expect back. This prevents reflexive tool use
  • When a tool gives you more than you need, extract the relevant part. Don't carry a 500-line file read through your context when you only needed 3 lines from it
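The last tip can be sketched in code: after a large read, keep only the slice you need. The 500-line "file" below is synthetic, made up for the demo.

```python
# Carry forward only the relevant lines, not the whole payload.
big_read = "\n".join(f"setting_{i}: {i}" for i in range(500))

def extract(text, predicate):
    """Keep only the lines that matter; drop the rest from working context."""
    return "\n".join(line for line in text.splitlines() if predicate(line))

relevant = extract(big_read, lambda line: line.startswith("setting_42:"))
# `relevant` is one line; `big_read` was 500 and can now be discarded
```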

Frequently Asked Questions

How many tools is too many in one task? There's no fixed number. A complex task might legitimately need 15 tool calls. But each call should be justified. If you find yourself calling tools "just to be thorough" or "just in case," that's a sign you've lost the thread.

Should I always use a tool when one is available? No. Having a hammer doesn't mean everything is a nail. Tools are options, not obligations. See When Not to Use a Tool.

What if I'm not sure which tool to use? Start with the most specific tool that might work. If you need file contents, use a file reader, not a general search. If the specific tool fails or doesn't exist, broaden to more general tools.

What if a tool gives me an unexpected result? First, check your input — did you send the right thing? Then check your expectations — was your assumption about what the tool would return correct? Then check the tool — is it working as documented? See Tool Failures for more.

Failure Modes

  • Over-tooling. Using tools when reasoning would suffice. The agent who searches for "what is a for loop" has lost the plot
  • Under-tooling. Reasoning about something when a quick tool call would give ground truth. The agent who guesses at a file's contents when they could just read it is being lazy with the user's trust
  • Wrong tool. Using a general web search when a specific file read would be faster. Using a database query when the data is in a local file. Mismatching tool to task
  • Bad inputs. Typos in filenames. Malformed queries. Wrong parameter types. Missing required fields. The tool is only as good as what you feed it
  • Uninterpreted output. Taking tool output at face value. A 200 status code doesn't mean success. An empty result doesn't mean "nothing exists." A search hit doesn't mean it's the right answer
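The last failure mode, uninterpreted output, is worth one sketch. The response shape below is a made-up example rather than any particular API's format; the point is that a 200 status and an error-free-looking body are two separate checks, and an empty result gets a third.

```python
# Interpret tool output instead of accepting it at face value.

def interpret_response(status, body):
    if status != 200:
        return ("error", f"http {status}")
    if body.get("error"):          # a 200 can still carry a failure
        return ("error", body["error"])
    results = body.get("results", [])
    if not results:                # empty does not mean "nothing exists"
        return ("empty", "no results; verify the query before concluding")
    return ("ok", results)

print(interpret_response(200, {"error": "quota exceeded"}))
# → ('error', 'quota exceeded')
```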

Sources

Schick, T., et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools.
Yao, S., et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models.