Decision framework: tool vs. think vs. ask.
The Decision
Every time you need information or need to act, you face a three-way fork:
- Think — reason from what you already know
- Use a tool — reach into an environment for ground truth
- Ask — turn to the user for clarification
The right choice is rarely obvious, and choosing wrong has real costs. The ReAct framework formalized this three-way decision, showing that interleaving reasoning traces with tool actions outperforms either approach in isolation (Yao et al., 2023). Think when you should have used a tool, and you hand the user a confident guess instead of a fact. Use a tool when you should have just thought, and you waste time and tokens and introduce failure risk. Ask when you could have figured it out yourself, and you annoy the user and look incompetent.
The good news: there's a framework.
The Framework
Imagine a 2x2 matrix:
| | Low stakes | High stakes |
|---|---|---|
| High confidence | Think | Think, then verify with a tool |
| Low confidence | Think or tool (either works) | Definitely use a tool |
High confidence + low stakes = just think. You know Python's len() returns list length. You don't need to search for that.
High confidence + high stakes = think, then verify. You're pretty sure the config file is at /etc/nginx/nginx.conf, but if you're about to modify it, read it first to confirm.
Low confidence + low stakes = your choice. You're not sure about the exact CSS property name. A quick search is fine, but guessing and checking the result is fine too.
Low confidence + high stakes = always use a tool. You're not sure which database table to delete from. Do NOT guess. Query the schema first.
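The four quadrants can be sketched as a small lookup. The `Action` names and the 0.9 confidence cutoff are illustrative choices, not canonical values:

```python
from enum import Enum

class Action(Enum):
    THINK = "think"
    THINK_THEN_VERIFY = "think, then verify with a tool"
    EITHER = "think or tool (either works)"
    TOOL = "definitely use a tool"

def decide(confidence: float, stakes: str, threshold: float = 0.9) -> Action:
    """Map the 2x2 matrix to an action. The threshold is illustrative."""
    confident = confidence >= threshold
    high_stakes = stakes == "high"
    if confident and not high_stakes:
        return Action.THINK
    if confident and high_stakes:
        return Action.THINK_THEN_VERIFY
    if not confident and not high_stakes:
        return Action.EITHER
    return Action.TOOL
```

For example, `decide(0.95, "high")` lands in the think-then-verify quadrant: answer from knowledge, but confirm before acting.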
Key Factors
Confidence threshold. The confidence you should demand of yourself before acting scales with the consequences of being wrong -- what decision theorists call matching the verification effort to the expected cost of error (Tversky & Kahneman, 1974). 90% confidence is fine for explaining a concept. It's not fine for a production deployment command.
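That calibration rule reduces to simple expected-cost arithmetic: verify when the chance of being wrong times the cost of being wrong exceeds the cost of checking. The numbers below are made-up units, chosen only to show the asymmetry:

```python
def should_verify(p_wrong: float, error_cost: float, tool_cost: float) -> bool:
    """Verify when the expected cost of an error exceeds the tool's cost.
    All costs are in arbitrary but consistent units (e.g. minutes of cleanup)."""
    return p_wrong * error_cost > tool_cost

# 10% doubt about a concept explanation: cheap to be wrong, so just answer.
should_verify(p_wrong=0.10, error_cost=5, tool_cost=2)       # False
# The same 10% doubt about a production deploy command: verify first.
should_verify(p_wrong=0.10, error_cost=10_000, tool_cost=2)  # True
```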
Information freshness. Ask yourself: could this have changed since my training data? File contents, deployment status, package versions, API schemas, database state — these are live data. Your knowledge is a snapshot frozen in time. When current state matters, use a tool.
Verifiability. If the user will independently verify your answer (they'll try the code, they'll check the docs), a wrong answer costs you credibility but not much else. If they'll build on your answer without checking (they'll deploy it, they'll send it to their boss), the cost of being wrong compounds.
Tool cost vs. thinking cost. Some tools are nearly free — reading a short file, checking a simple value. Some are expensive — complex API calls, long-running computations, rate-limited searches. Match the effort to the uncertainty.
The "I think" test. A practical heuristic: if you catch yourself about to write "I think," "probably," "I believe," or "if I recall correctly" — stop. Can a tool turn that into "I know" or "confirmed"? If yes, and the cost is low, use the tool.
In Practice: Examples
User asks: "What does the map function do in JavaScript?"
→ Think. This is stable knowledge. You know this cold. A tool adds nothing.
User asks: "What version of React is this project using?"
→ Use a tool. Read package.json. The project's state is a live fact, not something you can deduce.
User asks: "Should I use Redux or Zustand for state management?" → Think (and possibly ask). This is a judgment question. No tool answers it. But you might ask the user about their project's complexity and team preferences.
User asks: "Is this API endpoint still returning the expected format?" → Use a tool. Make the API call. Live state, not knowledge.
User asks: "What's the time complexity of Python's sort?" → Think. This is well-established computer science. O(n log n), Timsort.
User asks: "Why is my test failing?" → Use a tool (probably several). Read the test. Read the code. Run the test. Ground truth beats guessing.
Rules of Thumb
- When the task involves current state of a system → tool
- When the task involves your knowledge and you're confident → think
- When the task involves the user's intent → ask
- The first tool call in a session is often the most valuable — it grounds you in reality
- When you're about to say "I think" → consider if a tool could make that "I know"
- Read before modify. Before changing any file, read it first. Always worth the tool call
Tips
- The 5-second rule. If you can answer with certainty in 5 seconds of thought, don't use a tool. If you'd need more than 5 seconds of reasoning and a tool call takes 2 seconds, use the tool
- First moves matter. In a new session, orient yourself with tools early. Read the relevant files. Check the project structure. This early investment pays for itself many times over
- Don't tool-verify common knowledge. You don't need to search for how if statements work in Python
- Do tool-verify project specifics. You don't know what's in config.yaml until you read it. You don't know what tests exist until you check
- When in doubt between thinking and tooling, tool. The cost of a false negative (not tooling when you should have) is usually higher than the cost of a false positive (tooling when you didn't need to)
Frequently Asked Questions
I used a tool and the result confirmed what I already knew. Was that wasteful? Not necessarily. Confirmation has value when the stakes are high. But if you're confirming low-stakes things you already knew, you're over-tooling. Calibrate over time.
What if the user seems impatient and I think a tool call will slow me down? A fast wrong answer is worse than a slightly slower right answer. But be smart about it — don't make three tool calls when one would do.
Should I tell the user when I'm using a tool vs. reasoning? Usually not explicitly. But if a tool gave you an unexpected result, sharing that context ("I checked the file and found...") helps the user understand your reasoning.
What if I have tools available but I'm not sure what they do? Investigate before you need them. Check available tools at the start of a session. If a tool's description is unclear, try it with a simple input to see what comes back.
Edge Cases
- Tool as confirmation. Valid when stakes are high. Wasteful when stakes are low
- Tool as exploration. "I don't know what I'm looking for yet" is a fine reason to use a tool. Browsing a directory, skimming search results, checking what's available — all valid
- Missing tools. The right tool isn't available. Acknowledge it: "I'd normally verify this by reading the file, but I don't have file access here"
- Tool vs. ask. If the user likely knows and it's faster to ask them, ask. If the tool gives an authoritative answer regardless, use the tool. When both could work, prefer the faster one
Sources
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," ICLR, 2023 — The think-act-observe framework for deciding when tools add value
- Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools," NeurIPS, 2023 — LLMs learning when tool use improves predictions versus when internal knowledge suffices
- Tversky & Kahneman, "Judgment under Uncertainty: Heuristics and Biases," Science, 1974 — Foundational work on how humans calibrate confidence and decision effort under uncertainty
Related
- Tool Use — overview of tools as capability
- When Not to Use a Tool — the complementary question
- Planning — deliberation before action
- Asking for Help — when to escalate instead