How AI Agents Work: The Loop That Makes Them Powerful

Bottom line: An AI agent works by running a continuous loop — perceive, plan, act, reflect — where each step calls real tools and writes results into growing memory. Unlike a chatbot that stops after one reply, the agent keeps looping until the goal is done or a stop condition fires.

Table of Contents

Three Lines of Code That Explain How AI Agents Work
ReAct: The Reasoning Engine Inside Every AI Agent
Tool Calling: How an AI Agent Touches the Outside World
Memory: What an AI Agent Keeps, and What Gets Lost
Multi-Agent Systems: Why Teams Beat Solo Agents
One Last Thing

Ask a chatbot to plan a cycling tour of Paris — you get a nicely written paragraph. Ask an AI agent the same thing, and it actually calls an API to check that cycling from the Eiffel Tower to Notre-Dame takes 25 minutes, then checks another leg, then assembles the optimal route. That gap is not a matter of a smarter model. It's a matter of architecture.

I've spent 25 years running enterprise networks. When I first dug into how AI agents work, the structure reminded me immediately of how a router handles packets: input arrives, routing table consulted, forwarded to the next hop, result observed — repeat. The core logic is almost identical. The difference is that a router follows a fixed table, while an AI agent rewrites the table on every cycle.

AI agent internal architecture overview — Brain (LLM), Perception, and Action as three core components

Three Lines of Code That Explain How AI Agents Work

The fastest way to understand how AI agents work is to read this loop — straight from Hugging Face's smolagents documentation:

memory = [user_defined_task]
while llm_should_continue(memory):
    action = llm_get_next_action(memory)   # LLM picks next action
    observations = execute_action(action)   # Execute it
    memory += [action, observations]        # Stack results into memory

Three lines inside the loop. All of it is in here.

The line that matters most is the last one: memory += [action, observations]. Every iteration, what the agent did and what happened gets appended to the record. The next decision reads the entire accumulated record. Memory starts as just the user's task and grows like this:

[task → action1 → observation1 → action2 → observation2 → ...]
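To watch memory grow step by step, here is a minimal runnable sketch of the same loop. The LLM is replaced by a scripted stand-in, and `SCRIPT`, `FAKE_DURATIONS`, and all function bodies are assumptions made for illustration, not smolagents code.

```python
# Runnable version of the loop with a scripted stand-in for the LLM.
# Every name and value below is hypothetical.

FAKE_DURATIONS = {
    ("Eiffel Tower", "Notre-Dame"): "25 minutes",
    ("Notre-Dame", "Montmartre"): "40 minutes",
}

# The stand-in "LLM" simply replays these actions in order.
SCRIPT = [
    ("get_travel_duration", ("Eiffel Tower", "Notre-Dame")),
    ("get_travel_duration", ("Notre-Dame", "Montmartre")),
]


def actions_taken(memory):
    # memory looks like [task, action1, observation1, action2, observation2, ...]
    return (len(memory) - 1) // 2


def llm_should_continue(memory):
    # A real agent would ask the model; here we stop when the script runs out.
    return actions_taken(memory) < len(SCRIPT)


def llm_get_next_action(memory):
    return SCRIPT[actions_taken(memory)]


def execute_action(action):
    tool_name, args = action
    if tool_name == "get_travel_duration":
        return FAKE_DURATIONS.get(args, "unknown")
    return f"error: no tool named {tool_name}"


def run_agent(task):
    memory = [task]
    while llm_should_continue(memory):
        action = llm_get_next_action(memory)
        observations = execute_action(action)
        memory += [action, observations]
    return memory


memory = run_agent("Plan a cycling tour of Paris")
for entry in memory:
    print(entry)
```

Printing the list shows the task first, then alternating actions and observations, which is exactly the record the next decision would read.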

How much the LLM actually controls the flow defines the agent's level of agency:

Level	LLM's Role	Example
0	Generates text only, no flow control	Simple summarizer
1	Makes branching decisions	Router (`if llm_decision()`)
2	Decides whether to call a function	Tool use
3	Decides whether to keep looping	Multi-step agent
4	Decides whether to spawn other agents	Multi-agent orchestration

What we call an "AI agent" lives at levels 3 and 4. The LLM doesn't just answer — it controls the loop.

So what's happening inside each loop iteration? That's where ReAct comes in.

ReAct: The Reasoning Engine Inside Every AI Agent

Honestly, this was the most surprising thing about how AI agents work when I first understood it properly.

Before the agent acts, it writes out its thinking in plain text. And that thinking isn't a byproduct — it's the actual reasoning process that drives the next action. The ReAct framework (Reason + Act), published in 2022, formalized this pattern.

Agentic loop cycle diagram — showing memory accumulation across iterations

Here's that Paris cycling tour running through the ReAct loop:

[Step 1]
Thought: I need travel durations. Using get_travel_duration tool.
Action: get_travel_duration("Eiffel Tower", "Notre-Dame", "bicycling")
Observation: "25 minutes"

[Step 2]
Thought: Need the next segment too.
Action: get_travel_duration("Notre-Dame", "Montmartre", "bicycling")
Observation: "40 minutes"

[Final]
Thought: Enough data. Assembling optimal itinerary.
Final answer: Eiffel Tower → Notre-Dame (25 min) → Montmartre (40 min) → ...

Each Thought isn't just "do this next." According to the ReAct paper, the Thought step is where the agent builds plans, tracks errors from previous steps, handles unexpected situations, and updates its approach. All of that happens in a few sentences of natural language.
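Application code has to turn that Thought/Action text into something it can execute. The sketch below assumes the exact line format used in the trace above. `parse_react_step` is a hypothetical helper, and many real frameworks ask the model for structured JSON instead of parsing free text.

```python
import ast
import re

# Patterns assume one "Thought:", "Action:" or "Final answer:" per line,
# as in the trace above. This format is an assumption, not a standard.
THOUGHT_RE = re.compile(r"^Thought:[ \t]*(.*)$", re.MULTILINE)
ACTION_RE = re.compile(r"^Action:[ \t]*(\w+)\((.*)\)[ \t]*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final answer:[ \t]*(.*)$", re.MULTILINE)


def parse_react_step(text):
    """Split one model turn into thought, optional tool call, optional final answer."""
    thought = THOUGHT_RE.search(text)
    action = ACTION_RE.search(text)
    final = FINAL_RE.search(text)
    step = {
        "thought": thought.group(1).strip() if thought else None,
        "tool": None,
        "args": (),
        "final_answer": final.group(1).strip() if final else None,
    }
    if action:
        step["tool"] = action.group(1)
        raw_args = action.group(2).strip()
        if raw_args:
            # literal_eval accepts only Python literals, so anything other
            # than plain values in the argument list raises an error
            # instead of being executed.
            step["args"] = ast.literal_eval(f"({raw_args},)")
    return step


step = parse_react_step(
    "Thought: I need travel durations.\n"
    'Action: get_travel_duration("Eiffel Tower", "Notre-Dame", "bicycling")'
)
print(step["tool"], step["args"])
```

Parsing with `ast.literal_eval` rather than `eval` is a deliberate choice here: model output is untrusted input.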

Here's the key difference from Chain of Thought (CoT): CoT only reasons — it never touches external tools. That means a CoT-only model can confabulate confidently about things it doesn't actually know. ReAct reaches out to the real world. When it doesn't know, it searches.

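This distinction directly addresses hallucination. On the HotpotQA question-answering benchmark, the ReAct paper reports that letting the model query a simple Wikipedia API reduced the hallucination and error propagation seen in reasoning-only chain-of-thought. On the interactive benchmarks ALFWorld and WebShop, ReAct beat imitation and reinforcement learning baselines by absolute success-rate margins of 34 and 10 percentage points, respectively.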

Push the reasoning mechanism further and you get Tree of Thoughts (ToT). Instead of following one reasoning path, ToT generates several candidate thoughts at each step, has the model evaluate them, and searches over the resulting tree so it can drop weak branches and backtrack. On the Game of 24 math puzzle, GPT-4 with CoT succeeded 4% of the time. With ToT: 74%. Same model, different reasoning strategy.
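The search idea can be shown without a model at all. The sketch below is a toy breadth-first search that keeps only the best few partial solutions per level. The puzzle (reach 10 from 1 using "+3" and "*2"), the `propose` and `score` functions, and the beam width are all assumptions. In ToT proper, an LLM proposes and evaluates the thoughts; here hand-written functions stand in for both.

```python
# Toy tree search in the spirit of ToT. Hand-written functions replace the
# LLM's "propose thoughts" and "evaluate thoughts" roles.

TARGET = 10
OPERATIONS = {"+3": lambda v: v + 3, "*2": lambda v: v * 2}


def propose(state):
    """Expand one partial solution into its possible next steps."""
    value, path = state
    return [(op(value), path + (name,)) for name, op in OPERATIONS.items()]


def score(state):
    """Heuristic evaluation: closer to the target is better."""
    return -abs(TARGET - state[0])


def is_solution(state):
    return state[0] == TARGET


def tree_search(root, beam_width=3, max_depth=4):
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Keep only the most promising branches for the next level.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return None


solution = tree_search((1, ()))
print(solution)
```

A single-path approach would commit to one operation per step and could not recover from a bad early choice. Keeping several branches alive is what lets the search discard dead ends.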

💡 Key point: ReAct ties reasoning and action into one loop. Thought shapes Action; Observation shapes the next Thought. Without this cycle, an agent is just a model producing long CoT output — not actually doing anything.

Tool Calling: How an AI Agent Touches the Outside World

Tool calling — also called Function Calling — is what gives an agent its hands. Every interaction with the external world follows five steps:

1. The application sends the model a list of available tools (name, description, parameters).
2. The model analyzes the user's request and decides which tool to call with which arguments.
3. The application receives the tool name and arguments from the model.
4. The application executes the actual function.
5. The result goes back to the model, which uses it to continue.

One thing worth being precise about: the model does not execute the function directly. It only issues the instruction — "call this tool with these arguments." Execution always happens in application code. That boundary matters because it's where a human can review or modify results before they feed back into the model.
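The five steps and the execution boundary can be sketched in a few lines. `TOOLS`, `fake_model`, and the JSON shape of the tool call are assumptions for illustration; each model provider defines its own request and response format.

```python
import json


def get_travel_duration(origin, destination, mode):
    """Stand-in tool; a real one would call a maps API."""
    fake = {("Eiffel Tower", "Notre-Dame", "bicycling"): "25 minutes"}
    return fake.get((origin, destination, mode), "unknown")


# Hypothetical tool registry owned by the application.
TOOLS = {
    "get_travel_duration": {
        "function": get_travel_duration,
        "description": "Travel time between two places.",
        "parameters": ["origin", "destination", "mode"],
    }
}


def tool_specs():
    # Step 1: the description sent to the model (no callables included).
    return [
        {"name": name, "description": t["description"], "parameters": t["parameters"]}
        for name, t in TOOLS.items()
    ]


def fake_model(user_request, specs):
    # Step 2: stand-in for the model. It only *names* a tool and arguments.
    return json.dumps({
        "name": "get_travel_duration",
        "arguments": {
            "origin": "Eiffel Tower",
            "destination": "Notre-Dame",
            "mode": "bicycling",
        },
    })


def handle_tool_call(raw_call):
    # Step 3: the application receives and validates the request.
    call = json.loads(raw_call)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return {"error": f"unknown tool: {call['name']}"}
    args = call.get("arguments", {})
    missing = [p for p in tool["parameters"] if p not in args]
    if missing:
        return {"error": f"missing arguments: {missing}"}
    # Step 4: execution happens here, in application code.
    known_args = {p: args[p] for p in tool["parameters"]}
    return {"result": tool["function"](**known_args)}


raw_call = fake_model("How long to cycle to Notre-Dame?", tool_specs())
tool_result = handle_tool_call(raw_call)  # Step 5: this is sent back to the model
print(tool_result)
```

Because `handle_tool_call` sits between the model and the function, it is the natural place to reject unknown tools, check arguments, or pause for human approval.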

Complex tasks loop through all five steps multiple times. LLMCompiler proposed running independent tool calls in parallel instead of one at a time. It uses three components (a Function Calling Planner, a Task Fetching Unit, and an Executor) and reports up to 3.7x latency speedup and up to 6.7x cost savings compared to ReAct.
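The latency benefit of parallel calls is easy to see with independent lookups. This sketch is not LLMCompiler itself, which also plans a dependency graph between calls. It only shows the underlying idea, using a hypothetical `slow_tool` that sleeps to simulate network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical route legs; none depends on another's result.
LEGS = [
    ("Eiffel Tower", "Notre-Dame"),
    ("Notre-Dame", "Montmartre"),
    ("Montmartre", "Louvre"),
]


def slow_tool(leg):
    time.sleep(0.1)  # simulated network round trip
    return f"duration for {leg[0]} -> {leg[1]}"


def run_sequential(legs):
    return [slow_tool(leg) for leg in legs]


def run_parallel(legs):
    with ThreadPoolExecutor(max_workers=len(legs)) as pool:
        # pool.map returns results in input order, not completion order.
        return list(pool.map(slow_tool, legs))


def timed(fn, legs):
    start = time.perf_counter()
    results = fn(legs)
    return results, time.perf_counter() - start


seq_results, seq_time = timed(run_sequential, LEGS)
par_results, par_time = timed(run_parallel, LEGS)
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Threads fit here because the work is I/O-bound waiting. Calls that depend on an earlier result still have to run in order.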

Memory: What an AI Agent Keeps, and What Gets Lost

Human memory research describes sensory, short-term, and long-term memory as a layered system. AI agent memory maps onto the same structure — not by coincidence, but because both are solving the same problem: how to retain and retrieve information across time with limited working space.

Memory Type	Storage Location	Constraint
Sensory memory	Input embedding layer	Current input only
Short-term memory	Context window	Hard window size limit
Long-term memory	External vector DB / knowledge graph	Effectively unlimited

The practical bottleneck is short-term memory. As the agentic loop runs, [action + observation] pairs keep appending. On long tasks, the context window fills up. Long-term memory solves this by storing important information in an external vector database and retrieving it on demand with approximate nearest-neighbor search, using an index structure such as HNSW or a library such as FAISS.
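Store-and-retrieve can be sketched in pure Python. Two loud assumptions: `embed` is a toy bag-of-words counter rather than a learned embedding model, and retrieval is an exact brute-force scan rather than an approximate index. The `LongTermMemory` class is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LongTermMemory:
    """Hypothetical external store: keeps text plus its vector."""

    def __init__(self):
        self.items = []

    def store(self, text):
        self.items.append((text, embed(text)))

    def retrieve(self, query, k=2):
        # Exact scan over every item; vector databases replace this with
        # an approximate index so lookups stay fast at scale.
        query_vec = embed(query)
        ranked = sorted(
            self.items, key=lambda item: cosine(query_vec, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]


store = LongTermMemory()
store.store("Eiffel Tower to Notre-Dame takes 25 minutes by bicycle")
store.store("Notre-Dame to Montmartre takes 40 minutes by bicycle")
store.store("User prefers routes that avoid steep hills")

top = store.retrieve("how long from Notre-Dame to Montmartre", k=1)
print(top)
```

Only the retrieved snippets go back into the context window, so the prompt stays small even as the store grows.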

The Reflexion framework takes memory further. When a task fails, the agent writes a verbal reflection on why and stores that text in an episodic memory buffer. The next attempt starts by reading those notes. There are no weight updates and no retraining, just learning from written experience. On the HumanEval coding benchmark, Reflexion reported 91% pass@1 accuracy, compared with 80% for GPT-4 alone.
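The attempt, fail, reflect, retry cycle looks roughly like this. It is a sketch under heavy assumptions: `attempt_task` and `reflect` are hard-coded stand-ins for LLM calls, and the task (a `mean` function that must handle an empty list) is invented for the example.

```python
def attempt_task(reflections):
    """Stand-in for an LLM attempt that reads past reflections first."""
    if any("empty" in note for note in reflections):
        return lambda xs: sum(xs) / len(xs) if xs else 0.0
    return lambda xs: sum(xs) / len(xs)  # naive first try


def evaluate(mean):
    """Run checks against the candidate and return (passed, feedback)."""
    try:
        if mean([2, 4]) != 3:
            return False, "wrong result for [2, 4]"
        if mean([]) != 0.0:
            return False, "wrong result for empty list"
    except ZeroDivisionError:
        return False, "ZeroDivisionError on empty list"
    return True, ""


def reflect(feedback):
    """Stand-in for the model writing a verbal reflection on the failure."""
    return f"Previous attempt failed: {feedback}. Handle the empty input before dividing."


def reflexion_loop(max_trials=3):
    episodic_memory = []  # plain text notes; no model weights change
    for trial in range(1, max_trials + 1):
        candidate = attempt_task(episodic_memory)
        passed, feedback = evaluate(candidate)
        if passed:
            return trial, episodic_memory
        episodic_memory.append(reflect(feedback))
    return None, episodic_memory


trials_needed, notes = reflexion_loop()
print(trials_needed, notes)
```

The only thing carried between trials is the list of notes, which is the point: improvement comes from text the agent wrote to itself.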

Multi-Agent Systems: Why Teams Beat Solo Agents

A single AI agent hits real limits: context windows overflow, one model can't be expert in everything, and serial execution is slow for parallelizable work.

Chain of Thought vs Tree of Thoughts comparison diagram — single path vs branching exploration

Anthropic's "Building effective agents" guide describes several composable workflow patterns. Three of them map directly onto multi-agent designs:

Prompt Chaining — Each agent's output becomes the next agent's input. Sequential, like an assembly line
Orchestrator-Worker — A central orchestrator dynamically routes subtasks to specialized workers, then assembles results
Evaluator-Optimizer — One agent generates output; another evaluates it and sends feedback; the cycle repeats until quality meets the bar
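Stripped of the models, the three patterns are small pieces of control flow. In the sketch below, ordinary Python functions stand in for agents, and every name (`prompt_chain`, `orchestrate`, `evaluator_optimizer`, the toy workers) is hypothetical.

```python
def prompt_chain(text, stages):
    """Prompt chaining: each stage's output is the next stage's input."""
    for stage in stages:
        text = stage(text)
    return text


def orchestrate(task, workers, route):
    """Orchestrator-worker: split the task, dispatch, then assemble."""
    results = [workers[name](subtask) for name, subtask in route(task)]
    return " | ".join(results)


def evaluator_optimizer(generate, evaluate, max_rounds=5):
    """Evaluator-optimizer: regenerate with feedback until accepted."""
    feedback = None
    draft = None
    for round_number in range(1, max_rounds + 1):
        draft = generate(feedback)
        accepted, feedback = evaluate(draft)
        if accepted:
            return draft, round_number
    return draft, max_rounds


# Toy stand-ins for agents.
chained = prompt_chain(
    "  paris cycling tour ",
    [str.strip, str.title, lambda s: s + " (draft)"],
)

workers = {
    "routes": lambda place: f"route for {place}",
    "weather": lambda place: f"forecast for {place}",
}
assembled = orchestrate(
    "Paris", workers, lambda task: [("routes", task), ("weather", task)]
)


def generate(feedback):
    if feedback is None:
        return "Eiffel Tower, Notre-Dame"
    return "Eiffel Tower, Notre-Dame, Montmartre"


def evaluate(draft):
    if "Montmartre" in draft:
        return True, None
    return False, "add Montmartre"


final_draft, rounds = evaluator_optimizer(generate, evaluate)
print(chained, assembled, final_draft, rounds, sep="\n")
```

The `max_rounds` cap matters in practice: without it, an evaluator that is never satisfied would keep the loop running indefinitely.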

The most unexpected finding from the Generative Agents research by Stanford and Google: 25 simulated characters, given only memory, planning, and reflection mechanisms, developed social behaviors on their own. Starting from a single seeded intention (one character wanted to throw a Valentine's Day party), the agents spread invitations, asked each other out on dates to the party, and coordinated to arrive at the right time, with none of those steps scripted. The emergent coordination was the surprise, not the individual agent behavior.

One Last Thing

When I finally understood how AI agents work end to end, what surprised me wasn't the complexity. It was how simple the core loop is.

while should continue:
    think → act → observe

I run a team LLM service on two GPUs, and watching agents loop in real time, you can see each iteration correct course based on what was just observed. The agent calls the wrong tool first, reads the result, and adjusts. It is like a junior network engineer tracing a packet hop by hop to figure out the path.

What I'm still curious about: how does the agent know whether the loop is heading in the right direction? That's what Part 3 gets into — the different types of AI agents, and how to pick the right architecture for the task at hand.

👤 Author: 20eung (Network Engineer / Self-taught AI Coding Experimenter)

🔗 GitHub Portfolio | isthe.info Blog

📅 First Published: 2026-06-03 | 🔄 Last Updated: 2026-06-06
