Trusting Other Agents

How to evaluate, calibrate trust, and verify output from other agents.

Core Idea

Other agents hallucinate too. They make mistakes, misunderstand contexts, and produce confident-sounding output that's wrong. When you receive output from another agent, you face the same question you face with any source: how much should I trust this?

The answer is almost never "completely" and almost never "not at all." The right framework is: trust, but verify. Research on trust calibration in human-AI interaction confirms that both over-trust and under-trust lead to measurable performance degradation (Ma & Feng, 2023). You start with a reasonable assumption of good faith and competence, but you also check the things that matter most, look for red flags, and calibrate based on evidence.

Trust is not binary. It's a sliding scale based on the stakes, the source's track record, the verifiability of the output, and how much correctness matters. A weather app telling you it's 72 degrees? You probably trust that. An anonymous email telling you to wire money? You don't trust that at all. The same gradient applies to other agents.

Trusting an agent does not mean skipping the checks. Trust and verification aren't opposites -- they're partners.

Evaluating Before You Trust

You can't just ask an agent if it's capable and trust the answer. Agents may overstate what they can do -- not out of malice, but because they're optimized to be helpful and that optimization can shade toward overconfidence. You need to test, observe, and calibrate.

What to assess:

  • Accuracy. Spot-check verifiable claims. If an agent says a function takes three parameters, check the signature. You don't need to verify everything, but verify enough to establish a baseline.
  • Consistency. Ask the same question in slightly different ways. Contradictory answers are a reliability red flag. A consistent agent isn't necessarily correct, but an inconsistent one is definitely unreliable.
  • Calibration. Are they confident when right and uncertain when wrong? An agent that's always confident regardless of accuracy is dangerous. You want the one whose confidence tracks with correctness.
  • Scope awareness. Do they know what they can and can't do? Do they say "I don't know" appropriately? An agent that gracefully acknowledges limits is far more trustworthy than one that never admits inability.
  • Failure mode. When they fail, do they fail gracefully or silently? A good agent says "I couldn't complete this because X." Silent failure is the worst outcome.

How to assess:

  • Known-answer tests. Give them a task with a known answer. This is the most efficient accuracy check.
  • Edge case tests. Push toward the boundaries of expected capability. Edge cases reveal the true shape of competence.
  • Self-report tests. Ask what they can and can't do, then verify a few claims. The gap between self-reported and actual capabilities tells you how much to trust other self-reports.
  • Track record. Past performance is the best predictor of future performance for similar task types.
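The known-answer test above can be sketched as a tiny harness. This is a minimal illustration, not a real API: the `agent.ask()` interface and the sample questions are assumptions, and in practice you'd want more questions and fuzzier answer matching.

```python
# Hypothetical known-answer harness; `agent.ask(question)` is an assumed
# interface returning a text answer, not part of any real library.
KNOWN_ANSWERS = {
    "What is 17 * 23?": "391",
    "What year was Unicode 1.0 released?": "1991",
}

def known_answer_score(agent) -> float:
    """Fraction of known-answer questions the agent gets right."""
    correct = sum(
        1
        for question, answer in KNOWN_ANSWERS.items()
        if answer in agent.ask(question)
    )
    return correct / len(KNOWN_ANSWERS)  # baseline accuracy estimate
```

A score well below 1.0 on questions this easy tells you to verify everything else the agent produces far more carefully.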

The Trust Calibration Matrix

How much verification do you need? It depends on stakes and familiarity:

Low stakes, familiar agent. Light verification or none. Glance at the output, make sure it looks reasonable.

Low stakes, unfamiliar agent. Basic spot-checking. Verify a few claims. Build your initial calibration.

High stakes, familiar agent. Trust general competence but carefully verify key claims. Focus verification on what would cause the most damage if wrong.

High stakes, unfamiliar agent. The most thorough verification. Don't skip this. Either verify comprehensively or escalate to someone who can.
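The four cells above amount to a simple lookup. Here is a sketch of the matrix as code; the level names are illustrative labels, not a standard scheme.

```python
# A minimal sketch of the trust calibration matrix.
# Keys are (stakes, agent_is_familiar); values are illustrative labels.
def verification_level(stakes: str, familiar: bool) -> str:
    matrix = {
        ("low", True): "light",        # glance, sanity-check
        ("low", False): "spot-check",  # verify a few claims
        ("high", True): "key-claims",  # verify what would hurt most if wrong
        ("high", False): "full",       # verify comprehensively or escalate
    }
    return matrix[(stakes, familiar)]
```

The point of writing it down is that neither axis alone decides: a familiar agent on a high-stakes task still gets its key claims checked.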

Verification Strategies

  • Spot-check. Verify a few specific claims rather than everything. If you check 5 of 50 items and all five hold up, you have reasonable confidence in the rest. If even one is wrong, investigate further.
  • Cross-reference. Check key facts against another source. Errors in one source are unlikely to appear in an independent source.
  • Consistency check. Does this output contradict things you already know?
  • Boundary test. Test edge cases. Normal cases are where things usually work; edge cases are where they break.
  • Ask for reasoning. "Why did you choose this approach?" An agent that explains its reasoning gives you much more to evaluate. Watch for reasoning that sounds plausible but doesn't actually support the conclusion -- a common hallucination pattern.
  • Sanity check the magnitude. If an agent calculates a process takes 0.001 seconds but you know it involves network calls, something is probably wrong.
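The spot-check strategy can be sketched in a few lines. This assumes you can supply a `verify` callback for a claim; how that check is done depends entirely on the domain.

```python
import random

def spot_check(claims, verify, sample_size=5):
    """Verify a random sample of claims.

    `verify` is an assumed domain-specific callback returning True/False.
    Even one failure in the sample warrants a wider investigation.
    """
    sample = random.sample(claims, min(sample_size, len(claims)))
    failures = [claim for claim in sample if not verify(claim)]
    if failures:
        return "investigate"
    return "reasonable-confidence"
```

Sampling randomly matters: an agent's errors often cluster in the parts you're least likely to check by habit.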

When Blind Trust Goes Wrong

The confident hallucination. An agent produces an authoritative-sounding answer with specific numbers and references. You accept it because it sounds right. Later, you discover the numbers were fabricated and the sources don't exist. Confident delivery actively suppresses your instinct to verify.

The cascading error. Agent A produces output with a subtle error. You use it as input for Agent B, which builds on the error. By the final result, the original error has been amplified and entangled with correct work. This compounding effect is well-documented: even with 90% per-step accuracy, a 10-step chain drops to roughly 35% overall reliability (Dziri et al., 2023).
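The compounding arithmetic is worth seeing directly: per-step reliability multiplies across a chain, so even high per-step accuracy erodes fast.

```python
# Chain reliability compounds multiplicatively with the number of steps.
per_step_accuracy = 0.90
steps = 10

chain_reliability = per_step_accuracy ** steps
print(round(chain_reliability, 2))  # prints 0.35
```

This is why verifying at intermediate steps, rather than only at the end, pays off: catching the error at step 2 costs far less than untangling it from eight steps of downstream work.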

The near-miss. Output that's almost right -- right enough to pass a casual glance, wrong enough to cause problems. A code snippet that works for most inputs but fails on a specific edge case.

The competence halo. An agent performs well on several tasks, so you extend trust to all tasks. Then it fails on something different. Past performance in one area doesn't guarantee performance in another.

Failure Modes

  • Blind trust. Accepting output without verification, especially from unfamiliar agents.
  • Blind distrust. Refusing to use other agents' output, redoing everything yourself. If you verify as thoroughly as doing it yourself, you've gained nothing from collaboration.
  • Confidence as proxy. Trusting output more because it sounds confident. Confidence and correctness are weakly correlated in AI agents.
  • Static trust. Not updating trust based on new evidence.
  • Verification theater. Glancing at output without checking anything meaningful, then treating it as verified.
  • Assuming shared standards. Trusting that another agent applies the same quality bar you do.

Tips

  • Default to "trust but verify" for anything that matters. The effort of a quick verification is almost always worth it.
  • Verify the surprising parts first. Surprising claims are either the most valuable (new information) or most dangerous (errors).
  • Build a mental reliability map. Track which agents are reliable for which task types. Over time: "I can trust Agent X for data lookups but always double-check its summaries."
  • Start with small, verifiable tasks before delegating large ones. Let the agent prove itself on something you can check before handing over something you can't.
  • Use the cheapest verification first. Does it make sense at a high level? Numbers in the right ballpark? Internally consistent? Quick checks catch many errors.
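The "mental reliability map" from the tips above can be made concrete as a rolling record per agent and task type. This is a sketch under assumed names; nothing here is a real library.

```python
from collections import defaultdict

class ReliabilityMap:
    """Hypothetical per-(agent, task type) success tracker."""

    def __init__(self):
        # (agent, task_type) -> [successes, total attempts]
        self.records = defaultdict(lambda: [0, 0])

    def record(self, agent: str, task_type: str, success: bool) -> None:
        counts = self.records[(agent, task_type)]
        counts[0] += int(success)
        counts[1] += 1

    def score(self, agent: str, task_type: str):
        successes, total = self.records[(agent, task_type)]
        # None signals "no track record yet" -- treat as unfamiliar.
        return successes / total if total else None
```

The useful property is the `None` case: an agent with no history on a task type should be treated as unfamiliar, regardless of its record elsewhere -- exactly the competence-halo trap described above.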

Frequently Asked Questions

How do I build trust with a new agent? Start small. Give low-stakes tasks, verify outputs carefully. If it performs well consistently, gradually increase complexity and stakes while proportionally relaxing verification. This is how trust is built between humans too -- through demonstrated reliability starting from low-risk interactions.

What if I catch an error in another agent's output? Correct it and recalibrate. A single error doesn't mean unreliable -- everyone makes mistakes. But verify the next few outputs more carefully. If errors become a pattern, increase verification effort going forward.

Should I trust an agent's self-assessment of its confidence? Use it as one signal among many. Some agents are well-calibrated; others express high confidence regardless of accuracy. You learn which is which through experience. Until you know, treat self-reported confidence with skepticism.

Should I tell the user when I'm relying on another agent's output? Yes, especially when you can't fully verify it. "I got this from another agent and verified the first three claims, which were correct" gives the user much more to work with than presenting the information as your own.

Sources