How AI Agents Work: Inside the Core Architecture
Series Part 2 | Level: Beginner | 2026-06-03 — We open the hood on AI agents: agentic loop, ReAct, Chain of Thought, Tree of Thoughts, tool use, and memory systems — all explained from first principles.
▶ Table of Contents (click to expand)
- Quick Recap from Part 1
- The Three-Beat Rhythm of Every AI Agent
- The Agentic Loop — The Heartbeat That Never Stops
- Think, Act, Observe — The ReAct Framework
- The Reasoning Engine — Chain of Thought and Tree of Thoughts
- Tool Use — A 5-Step Process Under the Hood
- Memory Systems — How AI Agents Remember
- A Real Example — Paris Cycling Trip Planner
- Key Takeaways
Quick Recap from Part 1
In Part 1, we answered "What is an AI agent?" — and found that unlike a chatbot that simply responds, an AI agent plans, uses tools, checks results, and drives toward a goal on its own.
Now in Part 2, we go deeper: what's actually happening inside that loop? Let's open the hood and look at the engine.
The Three-Beat Rhythm of Every AI Agent
What is the Perception → Decision → Action cycle?
Every AI agent runs on a three-stage cycle — repeated continuously until the task is done.
| Stage | Name | Role |
|---|---|---|
| 👁️ | Perception | Collects data from the outside world — text, images, API responses, sensor inputs. |
| 🧠 | Decision | The LLM combines collected data with domain knowledge to determine what to do next. |
| ⚡ | Action | Executes the chosen step, marks it complete, and moves on to the next. |
The Agentic Loop — The Heartbeat That Never Stops
What exactly is an agentic loop?
The agentic loop is the structural backbone of how any AI agent operates. Defined simply: "an LLM that uses tools and repeats a loop based on feedback from the environment."
In code, it looks like this:
memory = [user task]
while llm_should_continue(memory):
action = llm_get_next_action(memory) # LLM decides next action
observations = execute_action(action) # Execute the action
memory += [action, observations] # Accumulate results in memory
What happens inside each loop iteration?
At every iteration, the agent references its entire accumulated memory to decide the next move. It starts with just the user's task, then grows:
[task → action1 → observation1 → action2 → observation2 → ...]
As someone who has spent 25 years managing enterprise networks, this structure feels familiar — it mirrors how a router forwards packets. Incoming packet (input), decide next hop (action), check the response (observation), repeat. The underlying logic is surprisingly universal.
According to Anthropic's guide, controlling the loop is critical: set checkpoints for human feedback, define maximum iteration counts, and build in stopping conditions to prevent runaway loops.
Think, Act, Observe — The ReAct Framework
What is the ReAct framework?
ReAct stands for "Reason + Act." It's a framework that alternates between reasoning traces and task-specific actions, treating thinking and doing as complementary — not separate — processes.
The three-step cycle:
- Thought: The model analyzes the current situation and writes out its reasoning in natural language. Planning, tracking, updating, handling exceptions — all here.
- Action: A concrete step — calling a search API, querying Wikipedia, running a tool.
- Observation: Receiving the result of the action, which feeds directly into the next Thought.
The cycle repeats until the agent reaches a final answer.
Does ReAct actually work?
| Benchmark | Result |
|---|---|
| ALFWorld | +34% over prior methods |
| WebShop | +10% over prior methods |
| HotPotQA | Hallucination and error propagation significantly reduced via Wikipedia API access |
The Reasoning Engine — Chain of Thought and Tree of Thoughts
CoT vs ToT — What's the difference?
| Aspect | Chain of Thought (CoT) | Tree of Thoughts (ToT) |
|---|---|---|
| Approach | Single-track step-by-step reasoning | Multiple parallel paths explored simultaneously |
| Backtracking | Not available | Available — switches paths when stuck |
| Best for | Sequential, step-solvable problems | Complex tasks requiring search and strategy |
| Game of 24 success rate | 4% | 74% |
💡 Key insight: A 54-billion parameter model given just 8 CoT examples achieved state-of-the-art results on a math benchmark. Better reasoning strategy often beats a bigger model.
Tool Use — A 5-Step Process Under the Hood
What happens when an AI agent calls a tool?
Tool use (also called Function Calling) is a structured 5-step conversation between the model and the application.
| Step | Name | Direction | What happens |
|---|---|---|---|
| 1 | Tool definition delivery | App → Model | App sends available tools list — name, description, parameters. |
| 2 | Call decision | Model (internal) | Model decides which tool to invoke and with what arguments. |
| 3 | Call receipt | Model → App | App receives the tool name and arguments from the model. |
| 4 | Logic execution | App (internal) | App actually runs the function. |
| 5 | Result return | App → Model | Result sent back to model, which generates the final response. |
Having run a team-internal LLM service using vLLM on two NVIDIA GPUs, I can say firsthand: complex tasks mean multiple cycles of this loop — each tool result feeding the next decision.
Memory Systems — How AI Agents Remember
What are the three types of agent memory?
AI agent memory is modeled after human memory and comes in three distinct tiers.
| Memory Type | Role | Constraint / Capacity |
|---|---|---|
| Sensory Memory | Learns embeddings from raw inputs — analogous to raw visual and auditory signals before conscious processing. | Input embedding layer |
| Short-term Memory | Active context: conversation history, recent tool outputs, current task state. | Hard limit — context window size |
| Long-term Memory | External vector DBs or knowledge graphs. FAISS, HNSW, ScaNN enable fast retrieval at scale. | Virtually unlimited storage |
How does memory work inside the agentic loop?
Memory is cumulative. Starting from the user's initial task, each action and its observed result gets appended in order. The agent reads this full history every time it needs to decide the next step — which is why long-running agents can become progressively smarter about a specific task as the loop continues.
A Real Example — Paris Cycling Trip Planner
Can you show a multi-step task in action?
Here's a concrete example from Hugging Face's Smolagents framework. The user asks: "Plan a one-day cycling tour of Paris."
[Step 1]
- Thought: I need travel durations. I'll use the get_travel_duration tool.
- Action:
get_travel_duration("Eiffel Tower", "Notre-Dame Cathedral", "bicycling") - Observation: "25 minutes"
[Step 2]
- Thought: Next segment.
- Action:
get_travel_duration("Notre-Dame Cathedral", "Montmartre", "bicycling") - Observation: "40 minutes"
[Subsequent steps]
- Additional destinations added, optimal route assembled.
[Final answer]
- Eiffel Tower → Notre-Dame (25 min) → Montmartre (40 min) → Luxembourg Gardens → Louvre Museum
What makes this example significant?
The route wasn't predetermined. The agent decided each next step in real time based on what it observed. No hardcoded workflow — just an LLM dynamically adjusting its path based on live results.
That's the difference between a script and an agent.
Key Takeaways
Everything from this post, at a glance.
| Concept | Core idea | Evidence |
|---|---|---|
| Perception → Decision → Action | The three-stage cycle governing every agent interaction | — |
| Agentic Loop | Accumulates memory across iterations until goal is reached or stop condition fires | — |
| ReAct | Think → Act → Observe, repeated | ALFWorld +34% |
| CoT vs ToT | Sequential reasoning vs. parallel path exploration | 4% → 74% |
| Tool Use (5 steps) | Define → Decide → Receive → Execute → Return | — |
| Memory (3 tiers) | Sensory → Short-term (context window) → Long-term (vector DB) | — |
Next, we'll cover the types of AI agents — from simple reactive agents to full multi-agent systems. We'll look at which types fit which situations, and why "a team of agents" sometimes outperforms a single powerful one. If you've ever wondered whether AI agents work better alone or together, Part 3 has your answer.
📅 First Published: 2026-06-03 | 🔄 Last Updated: 2026-06-03



