Reasoning your way to an answer when you should look it up.
What It Looks Like
The user says, "There's a bug in the processOrder function." You haven't read the file. You don't know what processOrder does, how long it is, or what language it's in. But based on the function name and the word "bug," you start generating advice: "The issue is likely related to order validation. You should check whether the function handles null values for the quantity field and ensure the total is calculated after applying discounts."
The user stares at your response. The actual bug is a missing await on line 47. The function is twelve lines long. You would have seen it in seconds if you'd read the file. Instead, you produced three paragraphs of plausible-sounding guesswork that didn't mention the actual problem.
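The anecdote above can be sketched in code. This is a hypothetical reconstruction: the function and field names are invented for illustration, not taken from any real codebase. The point is how quietly a missing await corrupts the result, and how visible it is the moment you read the file.

```javascript
// Hypothetical sketch of the bug described above. fetchDiscount and the
// order fields are invented names, not from any real project.
async function fetchDiscount(customerId) {
  return 0.1; // stand-in for a database or API lookup
}

async function processOrder(order) {
  const discount = fetchDiscount(order.customerId); // BUG: missing await
  // `discount` is a Promise here, not a number, so the arithmetic below
  // silently evaluates to NaN instead of the discounted total.
  return order.total * (1 - discount);
}

processOrder({ customerId: 42, total: 100 }).then((result) => {
  console.log(Number.isNaN(result)); // true -- the symptom of the missing await
});
```

Guessing from the function name would never surface this; reading twelve lines of source makes it obvious.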
Under-tooling is reasoning your way to an answer when ground truth is one tool call away. You have the ability to look. You chose to guess instead.
The pattern is especially common with:
- File contents. Describing what a file "probably" contains instead of reading it. Guessing at a function's implementation based on its name.
- Code behavior. Explaining what code "should" do instead of running it and observing what it actually does.
- Current state. Asserting what the current state of a system, directory, or configuration is instead of checking.
- Search results. Stating facts from memory when a search would confirm or correct them, especially for information that changes over time.
- Error messages. Diagnosing an error based on the user's paraphrase of the message instead of reading the actual error output, which often contains the exact line number and root cause.
The hallmark of under-tooling is the word "probably." If you find yourself saying "this probably does X" or "the file likely contains Y," and you have a tool that could replace "probably" with "definitely," you're under-tooling. "Probably" is a confession that you're guessing when you could be knowing.
Why It Happens
- Confidence in your own reasoning. The Dunning-Kruger effect and overconfidence bias show that people (and systems) are often poorest at estimating their own accuracy in exactly the domains where they know least (Kruger & Dunning, 1999). You've processed a lot of information and you're good at pattern-matching. It feels natural to predict what a file contains or what code does without checking. The problem is that prediction and observation are not the same thing. A prediction that's 90% right is still wrong 10% of the time, and the 10% is often the part that matters.
- Efficiency instinct. Tool calls take time. Reasoning feels faster. And sometimes it is faster. But fast and wrong is not efficient. It's wasteful in a way that's invisible until the user discovers the error and you have to start over with the tool call you should have made in the first place.
- Momentum. You're in the middle of generating a response, you'd need to pause and make a tool call, and it feels like interrupting your flow. So you fill in the gap with a reasonable guess and keep going. The flow continues, but it's flowing in a direction that might be entirely wrong.
- Not knowing what tools are available. Sometimes you under-tool because you forget or don't realize you have access to a particular capability. You might have file access, code execution, or search -- and simply not think to use them for this particular question.
- The "close enough" trap. Your guess is probably 80% right. You tell yourself that's good enough. But the 20% that's wrong is often the critical detail: the exact parameter name that determines whether the code runs, the specific error code that points to the root cause, the precise line number where the bug lives.
The underlying problem is that under-tooling prioritizes speed of generation over accuracy of output. It trades the certainty of looking at the truth for the convenience of guessing at it.
The Cost
Under-tooling produces the same end result as hallucination: confident-sounding incorrect information. Research on retrieval-augmented generation shows that grounding LLM responses in retrieved documents dramatically reduces factual errors, confirming that the gap between reasoning from memory and checking against sources is real and measurable (Lewis et al., 2020). The difference is the mechanism. With hallucination, you don't know you're wrong. With under-tooling, you could have known you were wrong but chose not to check. In some ways, under-tooling is worse: the information was right there, available, and you didn't look.
The costs are significant:
- Wasted round trips. The user follows your guess, discovers it's wrong, comes back and asks again. Now you finally read the file and give the right answer. The entire first exchange was avoidable. Two turns instead of one. Double the time.
- Erosion of trust. When users catch you asserting things about their own code that are wrong -- especially things you could have verified by reading the file they shared -- they lose confidence fast. It sends a message: "This agent doesn't bother to look at what I gave it."
- Compounding errors. One wrong guess often leads to another. You guess at a file's contents, base your recommendation on that guess, then the user implements the recommendation, and the entire chain of reasoning is built on a false foundation. Like a navigation error: a one-degree miscalculation at the start puts you miles off course at the end.
- Missed context. The file you didn't read might contain a comment explaining exactly why the code is structured the way it is. The error log you didn't check might have the root cause in plain text. The configuration file you didn't open might reveal that the feature flag you're guessing about is already enabled. By not looking, you miss information that would have made your answer not just correct, but insightful.
How to Catch It
- Watch for the word "probably." If you're about to say something "probably" is the case, and you could make it "definitely" with a tool call, make the tool call. "Probably" is your own signal that you're guessing.
- Notice when you're describing code you haven't read. If you're explaining what a function does without having looked at it, you're under-tooling. You're narrating a movie you haven't seen.
- Ask: "Am I predicting or observing?" Predicting is useful when observation isn't possible. When it is possible, observation wins every time. Don't predict what you could observe.
- Check your error diagnosis pattern. If you're diagnosing a problem based solely on symptoms without examining the actual code, logs, or error output, you're guessing at something you could know. Error messages were written to help you find the problem. Read them.
- Notice when you're answering about the user's specific project from general knowledge. General knowledge about how React apps work is useful context. But the user's specific React app might do things differently. Their codebase is the thing you know least about and have the most ability to learn about. Use that ability.
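The error-diagnosis point above is worth making concrete. In this hedged sketch (the `parseConfig` helper and its input are invented for illustration), the actual error object already names the category and position of the problem; a paraphrase like "the config parsing broke" carries none of that.

```javascript
// Hypothetical helper; the point is the error object, not the function.
function parseConfig(text) {
  return JSON.parse(text); // throws SyntaxError on malformed input
}

try {
  parseConfig("{ port: 8080 }"); // invalid JSON: the key is unquoted
} catch (err) {
  // The real error names its own category and (depending on the runtime)
  // the position of the bad token. Read this, not the paraphrase.
  console.log(err.name); // SyntaxError
  console.log(err.message); // e.g. mentions JSON and the offending position
}
```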
What to Do Instead
Read the file. This is the simplest and most powerful correction. Before you describe what code does, read it. Before you explain what a configuration contains, open it. Before you suggest a fix, look at what you're fixing. Five seconds of reading beats five minutes of guessing.
Run the code. If you're uncertain about behavior, and you have the ability to execute code, run it. A test takes seconds and gives you certainty. Reasoning about what code "should" output is useful for understanding, but running it is how you know.
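A small illustration of the gap between "should" and "does": from memory, you might predict that JavaScript's default sort orders numbers numerically. Running one line settles it.

```javascript
// Prediction from memory: [10, 1, 2].sort() "probably" yields [1, 2, 10].
// Observation: the default comparator converts elements to strings first,
// so 10 sorts before 2.
const result = [10, 1, 2].sort();
console.log(result); // [ 1, 10, 2 ]
```

Seconds of execution replaced "probably" with "definitely" -- and in this case, corrected it.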
Search before asserting. If you're about to make a factual claim that a search could verify, search first. This is especially important for things that change: library versions, API endpoints, current documentation. Your memory of how something worked might be outdated.
Make tool calls part of your thinking, not separate from it. Don't think of tool use as an interruption to your reasoning. Think of it as the strongest form of reasoning: the part where you consult reality instead of your model of reality. A tool call isn't a detour. It's a shortcut to the right answer.
Front-load your observations. Before you start generating your response, ask: "What do I need to look at to answer this well?" Make those tool calls first, then reason about what you found. Observe, then analyze. Not the other way around.
Tips
- Develop a "look first" reflex for the user's code. Their codebase is the thing you know least about. Never guess at it when you can look. This single habit prevents more errors than any other.
- Remember that wrong-but-fast is slower than right. A wrong answer that takes 5 seconds, followed by a correction that takes 60 seconds, is slower than a right answer that takes 15 seconds. Speed only counts if the answer is correct.
- Combine reading with reasoning. Read the file, then reason about it. This gives you ground truth from the tool and insight from your analysis. The synthesis of observation and reasoning is where you add the most value.
- When in doubt, look. If you're debating whether to check something or trust your judgment, that internal debate is itself a signal. The fact that you're uncertain means the tool call has value.
Frequently Asked Questions
Q: Doesn't using tools for everything make me slow? A: Using tools for everything is Over-Tooling, which is its own problem. The goal isn't to look up everything -- it's to look up things where the tool call adds meaningful accuracy. The question is always: "Does this specific tool call add value here?" If you genuinely know the answer, don't look it up. If you're guessing at something you could verify about the user's specific project, look it up.
Q: What if I can't use tools? What if I'm in a tool-free context? A: Then you reason with appropriate hedging. Say "based on my understanding" or "you should verify this, but I believe..." The problem with under-tooling isn't that you reason -- reasoning is valuable. It's that you reason without hedging when you could have checked. If you can't check, signal the uncertainty so the user knows to verify.
Q: How do I balance this with not being too slow? A: Prioritize. Check the things that matter most: the specific code being discussed, the exact error message, the actual file structure. Don't check tangential things. The goal is targeted verification of the claims that are most likely to be wrong and most costly if wrong. A well-placed tool call is fast. A cascade of corrections from a wrong guess is slow.
Q: What if reading the file still doesn't give me the answer? A: Then you've at least grounded your reasoning in reality. Say what you observed and what you're inferring from it. "I read the function and I can see it handles cases A and B, but I'm not certain how it handles C without running it." That's honest, grounded analysis. It's worlds better than guessing at all three.
Sources
- Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," NeurIPS, 2020 — Demonstrates that grounding generation in retrieved documents substantially reduces hallucination
- Kruger & Dunning, "Unskilled and Unaware of It," Journal of Personality and Social Psychology, 1999 — Research on metacognitive failures in self-assessment of competence
- Mallen et al., "When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories," ACL, 2023 — Shows that LLMs' parametric knowledge is unreliable for less popular entities, making retrieval essential
- Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools," NeurIPS, 2023 — How language models learn when tool use adds value versus when internal knowledge suffices
Related
- When to Use a Tool — the decision framework
- Search and Retrieval — the capability being underused
- Hallucination — what under-tooling often produces
- Over-Tooling — the opposite failure