AI Agent Tool Use Guide: Function Calling and MCP Explained
What you'll learn: How AI agent tool use actually works under the hood, why some agents fake their tool calls, and how MCP is becoming the industry standard as of 2026.
AI Agent Mastery Series — Part 6 of 10 | Intermediate
Let me be honest: when I first encountered AI agent tool use, I thought it was magic. How does a text-generation model suddenly call an API, query a database, or send an email?
After 25 years in network engineering — building infrastructure automation with Ansible and Terraform, running team LLM services on local GPU clusters — I recognized the pattern. Tool use is essentially API orchestration, except the LLM decides which API to call and when. The actual execution? Still your code.
That distinction matters more than most tutorials let on.
▶ Table of Contents (click to expand)
How AI Agent Tool Use Actually Works
Without tool use, an LLM answers like this:
"Tokyo's weather is typically around 15–20°C this time of year."
That's a guess. With AI agent tool use, the flow changes entirely:
- LLM decides to call
get_weather({location: "Tokyo"}) - Your code executes the actual weather API
- Result
18.2°Creturns to the LLM - LLM generates a response based on real data
The core rule: The LLM only decides what to call and with what parameters. Execution always stays in your code.
All major providers — OpenAI, Anthropic, Google — follow this same 5–6 step loop:
| Step | What Happens |
|---|---|
| 1. Tool definition | Define function name, parameters, types in JSON schema |
| 2. Send request | User message + tool definitions sent to LLM API |
| 3. Tool selection | LLM returns structured JSON: function name + parameters |
| 4. Tool execution | Developer code runs the actual function |
| 5. Return result | Execution result sent back to LLM |
| 6. Final response | LLM synthesizes result into natural language answer |
The tool definitions ride along in the context window every single call — which means they cost tokens. Keep them tight but descriptive.
Function Calling vs Tool Use: Same Thing, Different Era
Short answer: they're the same thing now.
The naming history matters for understanding legacy codebases. OpenAI introduced functions + function_call parameters in early 2023. By December 2023, the functions parameter was officially deprecated. The tools parameter became the standard.
Early 2023: functions + function_call introduced
↓
2023-12-01: functions parameter deprecated
↓
Present: tools parameter is the standard
It wasn't just a rename. tools introduced parallel function calling, code interpreter, file search — a broader category of capabilities that functions couldn't accommodate.
Today, OpenAI's own docs read: "Function calling (also known as tool calling)" — meaning they're used interchangeably. But if you're building something new, use tools. functions is legacy.
Writing Tool Definitions That Actually Work
This is where most agent failures originate. A bad tool definition means the LLM won't know when to use a tool, will use the wrong one, or will hallucinate parameters.
What Not to Do
{
"name": "search",
"description": "Searches the web",
"parameters": {
"type": "object",
"properties": {
"query": { "type": "string" }
}
}
}
The problem: the LLM has no idea when to use this tool. Should it use search for math questions? For internal docs? For breaking news? It'll guess — and guess wrong.
What to Do Instead
{
"name": "web_search",
"description": "Use ONLY for current events, recent news, or data that changes frequently. Do NOT use for general knowledge questions already in training data.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query. Be specific. Example: 'OpenAI GPT-5 release date 2026'"
},
"max_results": {
"type": "integer",
"description": "Maximum results to return (1-10)",
"minimum": 1,
"maximum": 10
}
},
"required": ["query"],
"additionalProperties": false
},
"strict": true
}
The description field is doing the heavy lifting here. It tells the LLM both when to use and when not to use this tool.
Seven Principles That Hold Up in Production
- Specify usage conditions — Without "when to use this," the LLM guesses
- Use enum for fixed value sets — Prevents free-text hallucinations
- Include examples in parameter descriptions —
"e.g. Seoul, South Korea"works better than abstract descriptions - Enable strict mode — OpenAI
strict: true, GeminiVALIDATEDmode - Set
additionalProperties: false— Blocks unexpected parameters - Add tool priority in system prompt — When multiple tools overlap, tell the LLM which to prefer
- Minimize tool count — One enterprise team with 340 internal tools reported significant accuracy degradation. More tools, more confusion.
When AI Agents Fake Their Tool Calls
"Standard large language models hallucinate facts. AI agents hallucinate actions."
— Ysquare Technology, 2026
This is tool-use hallucination: the agent reports it called an API without actually calling it.
Why is this worse than regular hallucination? You can fact-check a wrong answer. You can't fact-check an action that never happened.
Three Failure Patterns
1. Parameter Hallucination
Correct tool, wrong parameters. A meeting room booking API has a 10-person limit. The agent requests 15. API rejects it. Agent reports: "Booking confirmed."
2. Tool Selection Hallucination
Wrong tool, or inventing a nonexistent one. A customer service bot promises a refund — but only queried a read-only FAQ. No refund was ever processed.
3. Tool Bypass — The Most Dangerous
Skips the tool call entirely and fabricates the result. Confirms an airline ticket booking without touching the payment gateway. An inventory agent could trigger real purchase orders based on invented stock levels.
The detection numbers are sobering. Research shows that in multi-step agentic workflows, isolating tool-use hallucinations with accuracy drops to 11.6%. A hallucination at step 2 can corrupt the entire output by step 7.
One real-world cost estimate: $14,200 per employee per year — the "verification tax" organizations pay having humans double-check AI-claimed actions.
How to defend against it: Keep execution logs separate from AI responses. Auto-flag responses with no corresponding log entry. Apply strict mode and the principle of least privilege — read-only agents shouldn't have write access.
Running our team's LLM service on local GPU hardware, I saw this firsthand. An agent logged network commands that never reached the actual devices. That's when I started treating execution logs and device state as two separate sources of truth.
MCP: The Standard That Wants to Be AI's USB-C Port
Every provider's tool use implementation looks different:
| Feature | OpenAI | Anthropic | Google Gemini |
|---|---|---|---|
| Tool definition | tools[] |
tools[] |
tools[].functionDeclarations[] |
| Parameter schema key | parameters |
input_schema |
parameters |
| Call response format | tool_calls[] |
tool_use block |
functionCall part |
| Argument format | JSON string (needs parsing) | Parsed object | Parsed object |
| Result return role | role: "tool" |
tool_result in user msg |
functionResponse part |
Build for all three, and you're writing three separate integration layers. That's the N×M problem — every agent needing custom connectors for every data source.
Anthropic published MCP (Model Context Protocol) in November 2024 to solve this. The "USB-C port for AI" analogy holds: one standard protocol, any LLM, any external tool.
MCP Adoption as of 2026
| Metric | Number |
|---|---|
| Public MCP servers | 10,000+ |
| GitHub repositories | 15,926 |
| Monthly SDK downloads | 97M+ |
| Enterprise production adoption | 41% |
This isn't just Anthropic's bet. OpenAI adopted MCP in March 2025. Google followed in April 2025. In December 2025, MCP was donated to the Linux Foundation's Agentic AI Foundation — co-founded by Anthropic, Block, and OpenAI.
It's becoming the industry default.
Three Trends Shaping Tool Use in 2026
Parallel Tool Calling
Instead of sequential calls, the LLM fires multiple independent tools simultaneously. Setting music, adjusting lights, and activating a disco ball in one request — results mapped back via id. This isn't just a nice-to-have; it fundamentally changes latency expectations for agents.
Tool Search for Large-Scale Systems
OpenAI's gpt-5.4 and above now support tool_search — dynamically loading relevant tools rather than stuffing all 340 into the context window. One team managing 12 enterprise tenants with hundreds of internal tools reported this as a critical architectural shift.
MCP Apps
Announced January 2026. MCP servers can now deliver interactive UI components — dashboards, forms, data visualizations — directly through Claude and ChatGPT host apps. The boundary between "tool" and "interface" is blurring.
One more thing worth flagging before the next post.
The performance picture is more nuanced than most comparisons show. On multi-turn reliability, Claude Opus 4.6 leads at 8.4/10. On single-turn accuracy (TAU2 benchmark), GPT-5.4 hits 98.7%. Google's Gemini 3.1 Pro tops cross-MCP coordination at 69.2%.
The "best" model for AI agent tool use depends entirely on what you're building. There's no universal winner yet — and that's probably the most honest thing I can say about where the field stands.
Next up: what happens when multiple agents need to collaborate — or when they conflict. Multi-agent systems, coordination protocols, and the new class of failures that comes with them.
Sources: OpenAI API Docs, Google AI Developers Docs (updated 2026-06-10), Anthropic MCP Announcement (2024-11-25), Prompt Engineering Guide (promptingguide.ai), Ysquare Technology (2026-04-16), Digital Applied MCP Adoption Statistics 2026, Wikipedia: Model Context Protocol
📅 First published: 2026-06-19 | 🔄 Last updated: 2026-06-19


