What Is an AI Agent? The Mechanics, Explained

Ask ten people what an AI agent is and you will get ten answers, most of them about "autonomy" and none of them about how the thing actually works. The honest version is simpler, and more useful: an AI agent is a model, in a loop, with tools. Every turn of that loop is a single API call to a language model. Understand that one call and you understand agents better than most of the conversation.

This is the anatomy, in plain English, with no hand-waving. By the end you could whiteboard one. Here is the whole thing as a scrollable deck, and the written breakdown follows.

What Exactly Is an AI Agent? The full deck (Part 1 of 3), page 1 — What Exactly Is an AI Agent? The full deck (Part 1 of 3)Download PDF

What Exactly Is an AI Agent? The full deck (Part 1 of 3), page 2 — What Exactly Is an AI Agent? The full deck (Part 1 of 3)Download PDF

A model, in a loop, with tools

Those four words do all the work. An agent has four parts. The model is the reasoning, a large language model that turns text into the next step. The tools are how it reaches past its own text to act on the world. The loop is your code calling the model again and again, the engine. And the goal is what you put in the prompt, plus how it knows it is finished. Everything else is detail on those four.

The model is a stateless text predictor

The "model" is a large language model: give it text, it returns the most plausible continuation. That is the whole brain. The one piece of jargon worth knowing is stateless: the model keeps no memory between calls. You send a message, it answers, and it instantly forgets the entire exchange. Ask a tenth question and it has no idea the first nine happened, unless you send them again.

Picture a brilliant doctor with total amnesia. Every appointment starts cold, so you bring the whole case file each time. Accept that the provider remembers nothing, and the rest of how agents work becomes a logical consequence.

What you actually send: the message array

Every call is a system prompt plus a messages array of alternating turns, each made of typed content blocks. That payload is the model's entire universe. There is nothing else it "knows."

system: standing instructions set once at the top of every call, who the model is and what it is allowed to do. Because it never changes, it is the first thing caching reuses.
messages: the conversation in order, an ordered list of user and assistant turns. Re-sending this whole array on every call is the only reason the model "remembers" anything.
content blocks: a message is not just text. Each block declares a type, text, image, tool_use (the model asking to run a tool), or tool_result (what your code sent back). That is how tools and images ride inside one conversation.

The consequence is the part most explanations skip: memory lives in your code. The model appears to remember only because your client rebuilds and re-sends that array on every call. Who decides what goes into it is exactly what turns a chatbot into an agent.

Tools: the model asks, your code acts

A model can only produce text. It cannot touch a database or call an API. A "tool call" is four steps your code runs around that limit: you list the available tools in the request, the model emits a tool call (a name plus JSON arguments) and stops, your code runs the real operation, and you append the result to the array and call the model again with it in hand.

The model never has its hands on production. It can only ask, and your code decides what is actually wired up. A single turn can ask for several tools at once; your runtime runs them together and returns all the results before the model continues.

The loop is the agent

Here is the engine. Send the array. The model returns either a final answer (stop) or a tool request (continue). You run the tool, append the result, and send again. That self-directed stopping, the model deciding the goal is met and returning a final answer with no tool call attached, is the line between an agent and a plain chatbot.

The goal is simply what you asked for in the prompt; the loop runs on its own until the model decides it has reached it. The one guardrail that matters is a max-iterations cap, so a confused model cannot loop, and bill, forever. When a tool errors, the result still goes back and the model retries, so a bad tool call does not crash the run.

The economics: statelessness is not free

Because every call re-sends the full history, two costs follow. Per call, cost grows linearly with the length of the conversation. But across a long run the earliest tokens get reprocessed on every turn, so the cumulative work grows roughly quadratically. That is why long agents get slower and pricier, and it is the direct price of statelessness, not a bug.

Two mechanisms manage it. Compaction trims the array, folding older turns into a short digest or dropping them outright, an honest tradeoff because summaries are lossy. Caching is the provider keeping the processed beginning of your request for a few minutes and barely charging you to re-read it, like a barista who already knows your usual. Caching is a discount, not memory: you still send the whole array, it is just cheaper to re-read.

The whole machine, and its two limits

That is an agent, completely: a model, tools, a loop, the message array that serves as its memory, and the two cost mechanisms. Five parts, no magic.

But one loop in one process runs into two hard walls. It cannot outgrow one context window: a job too big to fit, or one that needs many things happening at once, will not run in a single loop. And it cannot outlive its process: the array lives in one running process's memory, so a crash, a timeout, or a routine deploy erases it, with no resume. Those two limits are where agents get genuinely hard, and they are the subject of Part 2 and Part 3 of this series.

Where Clausey fits

This is not theory for us. It is the machine Clausey is built on. Clausey runs agents over your own documents: it reads and structures them, answers questions with citations, checks them against your policies, and automates the work that follows, all on the loop described above, with the durability and orchestration those two limits demand.

So the next time an AI "reads your contracts and flags the exceptions," you will know exactly what is happening underneath: a model, in a loop, with tools.

See it work on your own documents.