AI Agent Tool Use Guide: Function Calling and MCP Explained

What you'll learn: How AI agent tool use actually works under the hood, why some agents fake their tool calls, and how MCP is becoming the industry standard as of 2026.

AI Agent Mastery Series — Part 6 of 10 | Intermediate


Let me be honest: when I first encountered AI agent tool use, I thought it was magic. How does a text-generation model suddenly call an API, query a database, or send an email?

After 25 years in network engineering — building infrastructure automation with Ansible and Terraform, running team LLM services on local GPU clusters — I recognized the pattern. Tool use is essentially API orchestration, except the LLM decides which API to call and when. The actual execution? Still your code.

That distinction matters more than most tutorials let on.


▶ Table of Contents (click to expand)
  1. How AI Agent Tool Use Actually Works
  2. Function Calling vs Tool Use: Same Thing, Different Era
  3. Writing Tool Definitions That Actually Work
  4. When AI Agents Fake Their Tool Calls
  5. MCP: The Standard That Wants to Be AI's USB-C Port
  6. Three Trends Shaping Tool Use in 2026

How AI Agent Tool Use Actually Works

AI agent tool use 6-step loop diagram — from tool definition to final response

Without tool use, an LLM answers like this:

"Tokyo's weather is typically around 15–20°C this time of year."

That's a guess. With AI agent tool use, the flow changes entirely:

  1. LLM decides to call get_weather({location: "Tokyo"})
  2. Your code executes the actual weather API
  3. Result 18.2°C returns to the LLM
  4. LLM generates a response based on real data

The core rule: The LLM only decides what to call and with what parameters. Execution always stays in your code.

All major providers — OpenAI, Anthropic, Google — follow this same 5–6 step loop:

Step What Happens
1. Tool definition Define function name, parameters, types in JSON schema
2. Send request User message + tool definitions sent to LLM API
3. Tool selection LLM returns structured JSON: function name + parameters
4. Tool execution Developer code runs the actual function
5. Return result Execution result sent back to LLM
6. Final response LLM synthesizes result into natural language answer

The tool definitions ride along in the context window every single call — which means they cost tokens. Keep them tight but descriptive.


Function Calling vs Tool Use: Same Thing, Different Era

Short answer: they're the same thing now.

The naming history matters for understanding legacy codebases. OpenAI introduced functions + function_call parameters in early 2023. By December 2023, the functions parameter was officially deprecated. The tools parameter became the standard.

Early 2023: functions + function_call introduced
   ↓
2023-12-01: functions parameter deprecated
   ↓
Present: tools parameter is the standard

It wasn't just a rename. tools introduced parallel function calling, code interpreter, file search — a broader category of capabilities that functions couldn't accommodate.

Today, OpenAI's own docs read: "Function calling (also known as tool calling)" — meaning they're used interchangeably. But if you're building something new, use tools. functions is legacy.


Writing Tool Definitions That Actually Work

This is where most agent failures originate. A bad tool definition means the LLM won't know when to use a tool, will use the wrong one, or will hallucinate parameters.

AI agent tool use — poor vs well-structured tool definition comparison

What Not to Do

{
  "name": "search",
  "description": "Searches the web",
  "parameters": {
    "type": "object",
    "properties": {
      "query": { "type": "string" }
    }
  }
}

The problem: the LLM has no idea when to use this tool. Should it use search for math questions? For internal docs? For breaking news? It'll guess — and guess wrong.

What to Do Instead

{
  "name": "web_search",
  "description": "Use ONLY for current events, recent news, or data that changes frequently. Do NOT use for general knowledge questions already in training data.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query. Be specific. Example: 'OpenAI GPT-5 release date 2026'"
      },
      "max_results": {
        "type": "integer",
        "description": "Maximum results to return (1-10)",
        "minimum": 1,
        "maximum": 10
      }
    },
    "required": ["query"],
    "additionalProperties": false
  },
  "strict": true
}

The description field is doing the heavy lifting here. It tells the LLM both when to use and when not to use this tool.

Seven Principles That Hold Up in Production

  1. Specify usage conditions — Without "when to use this," the LLM guesses
  2. Use enum for fixed value sets — Prevents free-text hallucinations
  3. Include examples in parameter descriptions"e.g. Seoul, South Korea" works better than abstract descriptions
  4. Enable strict mode — OpenAI strict: true, Gemini VALIDATED mode
  5. Set additionalProperties: false — Blocks unexpected parameters
  6. Add tool priority in system prompt — When multiple tools overlap, tell the LLM which to prefer
  7. Minimize tool count — One enterprise team with 340 internal tools reported significant accuracy degradation. More tools, more confusion.

When AI Agents Fake Their Tool Calls

"Standard large language models hallucinate facts. AI agents hallucinate actions."
— Ysquare Technology, 2026

This is tool-use hallucination: the agent reports it called an API without actually calling it.

Why is this worse than regular hallucination? You can fact-check a wrong answer. You can't fact-check an action that never happened.

Three Failure Patterns

1. Parameter Hallucination

Correct tool, wrong parameters. A meeting room booking API has a 10-person limit. The agent requests 15. API rejects it. Agent reports: "Booking confirmed."

2. Tool Selection Hallucination

Wrong tool, or inventing a nonexistent one. A customer service bot promises a refund — but only queried a read-only FAQ. No refund was ever processed.

3. Tool Bypass — The Most Dangerous

Skips the tool call entirely and fabricates the result. Confirms an airline ticket booking without touching the payment gateway. An inventory agent could trigger real purchase orders based on invented stock levels.

The detection numbers are sobering. Research shows that in multi-step agentic workflows, isolating tool-use hallucinations with accuracy drops to 11.6%. A hallucination at step 2 can corrupt the entire output by step 7.

One real-world cost estimate: $14,200 per employee per year — the "verification tax" organizations pay having humans double-check AI-claimed actions.

How to defend against it: Keep execution logs separate from AI responses. Auto-flag responses with no corresponding log entry. Apply strict mode and the principle of least privilege — read-only agents shouldn't have write access.

Running our team's LLM service on local GPU hardware, I saw this firsthand. An agent logged network commands that never reached the actual devices. That's when I started treating execution logs and device state as two separate sources of truth.


MCP: The Standard That Wants to Be AI's USB-C Port

MCP architecture diagram — AI agent tool use standard protocol

Every provider's tool use implementation looks different:

Feature OpenAI Anthropic Google Gemini
Tool definition tools[] tools[] tools[].functionDeclarations[]
Parameter schema key parameters input_schema parameters
Call response format tool_calls[] tool_use block functionCall part
Argument format JSON string (needs parsing) Parsed object Parsed object
Result return role role: "tool" tool_result in user msg functionResponse part

Build for all three, and you're writing three separate integration layers. That's the N×M problem — every agent needing custom connectors for every data source.

Anthropic published MCP (Model Context Protocol) in November 2024 to solve this. The "USB-C port for AI" analogy holds: one standard protocol, any LLM, any external tool.

MCP Adoption as of 2026

Metric Number
Public MCP servers 10,000+
GitHub repositories 15,926
Monthly SDK downloads 97M+
Enterprise production adoption 41%

This isn't just Anthropic's bet. OpenAI adopted MCP in March 2025. Google followed in April 2025. In December 2025, MCP was donated to the Linux Foundation's Agentic AI Foundation — co-founded by Anthropic, Block, and OpenAI.

It's becoming the industry default.


Three Trends Shaping Tool Use in 2026

Parallel Tool Calling

Instead of sequential calls, the LLM fires multiple independent tools simultaneously. Setting music, adjusting lights, and activating a disco ball in one request — results mapped back via id. This isn't just a nice-to-have; it fundamentally changes latency expectations for agents.

Tool Search for Large-Scale Systems

OpenAI's gpt-5.4 and above now support tool_search — dynamically loading relevant tools rather than stuffing all 340 into the context window. One team managing 12 enterprise tenants with hundreds of internal tools reported this as a critical architectural shift.

MCP Apps

Announced January 2026. MCP servers can now deliver interactive UI components — dashboards, forms, data visualizations — directly through Claude and ChatGPT host apps. The boundary between "tool" and "interface" is blurring.


One more thing worth flagging before the next post.

The performance picture is more nuanced than most comparisons show. On multi-turn reliability, Claude Opus 4.6 leads at 8.4/10. On single-turn accuracy (TAU2 benchmark), GPT-5.4 hits 98.7%. Google's Gemini 3.1 Pro tops cross-MCP coordination at 69.2%.

The "best" model for AI agent tool use depends entirely on what you're building. There's no universal winner yet — and that's probably the most honest thing I can say about where the field stands.

Next up: what happens when multiple agents need to collaborate — or when they conflict. Multi-agent systems, coordination protocols, and the new class of failures that comes with them.


Sources: OpenAI API Docs, Google AI Developers Docs (updated 2026-06-10), Anthropic MCP Announcement (2024-11-25), Prompt Engineering Guide (promptingguide.ai), Ysquare Technology (2026-04-16), Digital Applied MCP Adoption Statistics 2026, Wikipedia: Model Context Protocol


👤 Author: 20eung (Network engineer / Self-taught AI coding experimenter)

🔗 GitHub Portfolio | isthe.info Blog

📅 First published: 2026-06-19 | 🔄 Last updated: 2026-06-19