Not all AI agents are created equal. Here’s a guide to the main ways of building them.
Oct 3, 2025
TL;DR
Start with a simple ReAct agent (reason → act → observe) and expand only if needed. Workflows add structure, orchestrators coordinate specialists, and multi-agent networks let agents collaborate directly. The hard part isn't execution; it's context management. Fewer tools, tighter memory, and clear goals usually win over complexity.
Nowadays, every single company wants to build its own AI Agent, and there are entire companies built just to help you do it, because this is the new hype. But there are a lot of different flavors of AI Agents, and I'm going to explain my favorite ones, how they work, and how they compare to each other.
This is not a cookbook; it's a window into four methods of building AI Agents. In production, you'll want (and need) to combine the principles of each of them to end up with a SOTA AI Agent. So, let's dive in.
At this point, I have no idea how many agent structures are out there, but we are going to focus on the four most popular ones, based on my experience and a lot of resources I've read. I'll go through them in order of complexity.
The Simple ReAct Agent.
This is what I recommend you start with. It's the simplest agent structure and the foundation for every other one. The ReAct pattern follows a loop of three steps: Reasoning (think about what to do), Action (execute a tool or take an action), and Observation (examine the result from the environment). This cycle repeats until the goal is achieved.
The key insight of ReAct is that the agent doesn't just plan and execute, it continuously learns from the environment. After each action, the agent observes what happened and uses that information to reason about the next step. For example, if a tool returns an error, the agent observes the error message and reasons about how to fix it. If a database query returns unexpected data, the agent observes the results and decides whether to refine the query or move to the next step.
So it's a simple loop: Think, Act, Observe, repeat until the goal is achieved. There's not much more to it, really; you can build one in 10 minutes. You just need to remember: Reason, Act, Observe, repeat.
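To make this concrete, here's a minimal sketch of the loop in Python. Everything here is illustrative: `call_llm` is a stand-in for whatever model API you use, and the tools are hypothetical placeholders.

```python
# A minimal ReAct loop: Reason -> Act -> Observe, repeated until done.
# `call_llm` is a placeholder for your model API of choice.

import json

def call_llm(messages: list[dict]) -> str:
    """Stand-in for a real LLM call (OpenAI, Anthropic, etc.)."""
    raise NotImplementedError("plug in your model API here")

TOOLS = {
    # Hypothetical single-purpose tools; each does one thing only.
    "run_sql": lambda query: f"rows for: {query}",
    "final_answer": lambda text: text,
}

def react_agent(goal: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content":
            'You solve the goal step by step. Reply with JSON: '
            '{"thought": "...", "tool": "run_sql|final_answer", "input": "..."}'},
        {"role": "user", "content": goal},
    ]
    for _ in range(max_steps):  # hard cap so the loop always terminates
        decision = json.loads(call_llm(messages))            # Reason
        result = TOOLS[decision["tool"]](decision["input"])  # Act
        if decision["tool"] == "final_answer":
            return result
        messages.append({"role": "assistant", "content": json.dumps(decision)})
        messages.append({"role": "user", "content": f"Observation: {result}"})  # Observe
    return "Stopped after max_steps without reaching the goal."
```

That's the whole pattern; everything else in this post is scaffolding around this loop.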
Some fundamentals that I like to follow when building this type of agent:
Keep it simple: Don't spread the agent across tens of different responsibilities. If you are building a simple ReAct agent, give it a simple goal and a simple set of tools.
Build for one goal: If your goal is to build Querio, for example, do not build a general agent that can do anything. For Querio, we want to write SQL, write Python, analyze the result, and answer the user. Nothing more than that. If you add dashboard creation, for example, the agent now has two distinct goals, and this type of agent is not the best fit for that.
Keep the tools simple: One tool should do one thing and one thing only; the narrower the tool, the better. But don't add too many tools, or the agent can get confused (see the sketch after this list).
No strings attached: The reason this kind of agent works is that it can think about what to do next without a strict workflow. You can still guide the agent, like gating some tools so they can only be used after others, but try to avoid it. Keep it simple.
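To illustrate "one tool, one thing", here's how a narrow tool might be described to the model. The shape mirrors the JSON-schema convention most tool-calling APIs use; the exact field names depend on your provider.

```python
# One narrow tool, described so the model can't misuse it.
# The schema style mirrors common tool-calling APIs; adapt to your provider.
run_sql_tool = {
    "name": "run_sql",
    "description": "Execute ONE read-only SQL query against the analytics "
                   "database and return the rows. Does not write data, "
                   "does not create charts, does not run Python.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "A single SELECT statement."},
        },
        "required": ["query"],
    },
}
```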
The Workflow Agent.
This is probably the one that you want, and it builds on top of the Simple ReAct Agent. Here, you have a strict workflow that the agent follows, and you can guide it to do what you want. Of course, a workflow can have multiple branches, where one workflow leads into another, but let's focus on a simple one.
When thinking about AI agents, usually we think about the agent doing everything, but in reality we can think of the agent as "smart if/else" statements, and a strict workflow is just a set of if/else statements. This pattern of using a classifier to route requests to specialized handlers is a well-established practice in AI systems, where you're essentially using an LLM as an intelligent router that decides which path to take. Let's imagine Querio's new feature, Embedded AI Agent. It has two workflows:
General Answer: If the question is a general question, we need to answer it with a simple answer. We don't need to consult the database for that.
Data Analysis: If the question is a data analysis question, we need to consult the database for that.
So, right from the start we have an if/else statement, and a single LLM call can decide which workflow to follow. If the first workflow is chosen, the agent generates an answer and that's it. If the second workflow is chosen, that's where it gets more interesting: we enter a different path, where we connect to the database and need to write SQL and analyze the result. We can add multiple steps and multiple workflows, but the principle is the same. Notice that the workflow is not necessarily an agent; it's just a set of if/else statements guided by an LLM. You can have a ReAct Agent inside a workflow. I wasn't going to include this as one of the main types of agents, but it will help you understand the next ones better.
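Here's a minimal sketch of that router, reusing the `call_llm` placeholder from the ReAct sketch. The route names mirror Querio's two workflows; the second workflow is left as a hypothetical stub.

```python
# The workflow entry point is one classification call plus plain control flow.
# `call_llm` is the same placeholder stub from the ReAct sketch above.

def run_data_analysis_workflow(question: str) -> str:
    """Hypothetical second path: connect to the DB, write SQL, analyze.
    This could itself contain a ReAct agent."""
    raise NotImplementedError

def answer_question(question: str) -> str:
    route = call_llm([
        {"role": "system", "content":
            "Classify the question. Reply with exactly one word: "
            "'general' or 'data_analysis'."},
        {"role": "user", "content": question},
    ]).strip()

    if route == "general":
        # First workflow: answer directly, no database needed.
        return call_llm([{"role": "user", "content": question}])
    # Second workflow: the interesting path.
    return run_data_analysis_workflow(question)
```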
The Orchestrator Agent.
This is built on top of the Simple ReAct Agent and the workflow. Here, we have an orchestrator that, as the name suggests, orchestrates the flow of the agent. It's basically an agent that decides which workflow to follow and gives the instructions to the next step. Think of the orchestrator as a simple ReAct Agent that thinks, then acts, but instead of having all the tools available, it has a limited set of tools, each specialized in a specific task.
So, instead of having a broad set of hundreds of tools, we have a limited set, like 10, each specialized in a specific task. Instead of having three tools to make a chart, for example, we can have a single tool that calls a workflow and returns a result. The main difference between a workflow and an orchestrator is that the orchestrator goes back to itself and thinks again, allowing it to run multiple workflows.
The beauty of the orchestrator is that you can orchestrate multiple agents, giving them specific tasks. You can think of the orchestrator as a Product Manager, who knows the main goal and creates tasks for the experts (agents) to do. The Product Manager doesn't do the work and doesn't know how to do it properly, but they know who knows how to do it.
That specialization is the point. The Python tool knows exactly how to write Python: it knows the libraries to use and the best practices to follow. The orchestrator doesn't know any of that; it just knows that it needs to call the Python tool to write Python code.
The advantage of the orchestrator is specialization and context management. Each sub-agent knows exactly what it needs to do, because the orchestrator gives it just enough context to do its job. It doesn't know everything, but it knows what it needs to do now. This is an advantage over the ReAct Agent: an agent working with less context performs better than one drowning in it. Imagine you are a product manager and all of your employees have ADHD; if you tell them everything, they will get lost, but if you give them just enough context, they will do their job. This is a double-edged sword, though, because now you face the biggest problem of building an agent: context engineering. We'll talk about that later. An orchestrator is also much more limited in what it can do. It's not as limited as a single workflow, but it's not as flexible as a ReAct Agent. With an orchestrator, everything should be extremely tight and structured, while with a ReAct Agent we just let it be.
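Structurally, the orchestrator is the ReAct loop from earlier with sub-agents in place of raw tools. A sketch, again with illustrative names and the same `call_llm` placeholder:

```python
# The orchestrator is a ReAct loop whose "tools" are specialized sub-agents.
# Each sub-agent gets a fresh, minimal brief -- not the whole history.

import json

def sql_agent(brief: str) -> str:
    """Hypothetical specialist: writes and runs SQL for the given brief."""
    raise NotImplementedError

def python_agent(brief: str) -> str:
    """Hypothetical specialist: writes and runs Python for the given brief."""
    raise NotImplementedError

SPECIALISTS = {"sql": sql_agent, "python": python_agent}

def orchestrate(goal: str, max_steps: int = 8) -> str:
    completed: list[str] = []  # compact summaries, not raw transcripts
    for _ in range(max_steps):
        decision = json.loads(call_llm([
            {"role": "system", "content":
                'Plan the next step. Reply with JSON: '
                '{"specialist": "sql|python|done", "brief": "..."}'},
            {"role": "user", "content":
                f"Goal: {goal}\nDone so far: {completed or 'nothing yet'}"},
        ]))
        if decision["specialist"] == "done":
            return decision["brief"]  # final answer to the user
        result = SPECIALISTS[decision["specialist"]](decision["brief"])
        # The orchestrator only ever sees a one-line summary of each step.
        completed.append(f'{decision["specialist"]}: {result[:200]}')
    return "Stopped after max_steps."
```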
The Multi-Agent Network.
This is my favorite one. It's exactly like an Orchestrator, but without the Orchestrator. Yeah, that sounds strange, but think with me for a second. The advantage of the Orchestrator is that it works with less context than a simple ReAct Agent, and we can specialize and define specific workflows that the agent can follow. This is good, we like that. What we don't like are the disadvantages: an orchestrator is much more limited in what it can do, it lacks understanding of exactly what was done, and it creates a single point of failure where the orchestrator itself can become a bottleneck. If the orchestrator makes a poor routing decision, the entire system suffers. But that's the whole point, right? The Python agent doesn't need to know how to write SQL and doesn't need to know about the 3 queries that failed before the final query was written. I agree, 95% of the time. But it's the other 5% that makes it not the best strategy.
Imagine you are the product manager: you asked for a SQL query and got the result. Now you send the result to the Python expert, and it tells you the result is incomplete. What do you do? You go back and forth with the SQL expert until it gets it right. The problem is, you are the middleman: you don't know what was done, you just have the results. And if you don't implement a persistent memory for each expert (agent), they won't know what happened before, so they can make the same error over and over again. So what if we remove the middleman?
That's the beauty of the multi-agent network. Imagine you are at a very small company and you, the manager, need a chart. You go to the data guy and ask for a chart. The data guy doesn't know how to create a chart, but he knows how to write SQL and he knows who can create charts, so what does he do? He writes the SQL, then asks the chart expert to create the chart, passing along everything that happened so far. The chart expert actually can't create the chart because some computation is needed, so he asks the Python expert to do the computation and, once that's done, creates the chart. In the end, you get the chart you asked for, with no idea of what was done to get it.
This is the gold standard (well, for me, at least), because it keeps everything good from the ReAct agent: you can implement workflows and you can still have the specialized agents. Of course, it doesn't come for free, and the cost is context management, which is our next topic of conversation.
Context Engineering
This is where things start to get ugly. So far we've talked about execution, how to execute a task, but we haven't talked about memory. A really good engineer who has Alzheimer's and forgets everything he did or needs to do is not ideal, and that's exactly what LLMs are: if you don't manage the context, i.e., its memory, it's not going to remember and it's going to get lost. This is especially true for the orchestrator and the multi-agent network.
Context engineering has become the most critical skill when building AI agents. As agent tasks accumulate context over multiple turns, several problems emerge. Context rot is the big one: recent research shows that LLM performance degrades as input length increases, even on simple tasks. Models don't process their context uniformly; a model that performs well at 1,000 tokens may struggle at 10,000 tokens on the exact same task. This degradation happens across all major models, including the latest GPT, Claude, and Gemini releases. The implication is clear: simply having a long context window doesn't mean the model can use it effectively. This is why aggressive context pruning, summarization, and strategic information placement are essential; they're not just helpful optimizations but fundamental requirements for reliable agent performance.
The key strategies for managing context fall into four categories:
write: what to store;
select: what to retrieve;
compress: how to summarize;
isolate: how to separate concerns.
We will talk about each one of them in detail, but for now, just know that you need to manage the context in a way that is efficient and effective.
The ReAct agent can simply keep all of its messages, so you don't need any context engineering; the workflow can have no memory at all; but the other two need a way to remember what happened before, and what should be done.
So, let's take them one at a time.
The Orchestrator
The way the orchestrator works is that it should know:
What it needs to do
What it did
What is the next step
The orchestrator doesn't need to know what happened inside a specific step. If the specialized agent failed to write the code 3 times before succeeding, the orchestrator doesn't need to know that; it just needs to know that the code was written successfully. For that, the best approach is to own your own context. But keep it simple: try to keep the structure as close as possible to the messages structure (the one OpenAI has and everyone copied), and store everything that happened in the context. Then you can show the orchestrator only the relevant information, without showing everything.
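One way to read "own your own context": store a full event log internally, and render a filtered view per consumer. A sketch, with illustrative event fields:

```python
# Store everything; show each consumer only what it needs.
# The event fields here are illustrative, not a fixed schema.

from dataclasses import dataclass, field

@dataclass
class Event:
    step: str        # e.g. "sql", "python"
    summary: str     # one-line description of what happened
    detail: str      # full transcript: errors, retries, ...
    ok: bool

@dataclass
class AgentContext:
    question: str
    events: list[Event] = field(default_factory=list)

    def orchestrator_view(self) -> str:
        """Only summaries and outcomes -- never the messy details."""
        lines = [f"Question: {self.question}"]
        lines += [f"[{'ok' if e.ok else 'FAILED'}] {e.step}: {e.summary}"
                  for e in self.events]
        return "\n".join(lines)

    def specialist_view(self, step: str) -> str:
        """A specialist retrying its own step gets its full detail back."""
        return "\n".join(e.detail for e in self.events if e.step == step)
```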
Let's do a small dissection of the context of Querio's Agent V7 (the one in production at this moment). The agent was built to power the explore page, so some features are relevant just because the page has them. So, let's talk about those features.
The v7 is a hybrid of the Workflow and the Orchestrator agent. It's a workflow that follows a strict pattern, and at the end of this workflow, an orchestrator is called to answer the user. I won't talk about the workflow; let's focus on the orchestrator. We have some features that are relevant for the explore page:
Steps: We clearly divide the agent run into actionable steps: one step for SQL, one step for Python, one step for visualization, one step for answering the user. This gives us the "edit step" feature, where we tell the agent exactly what to edit, without having to redo everything.
Versions: We have versions for each output. If you ask the agent for something and then need to modify a step (or the whole code), a new version is created, like a thread. We keep track of everything in each version, so we can roll back to a previous version if needed.
Focus on a single output: Instead of a chat that can have multiple outputs with no strings attached, we focus on a simple 1 input -> 1 output. Any new message in the chat is treated as a new output.
Follow-up: The ability to continue a thread of conversation.
So, thinking about that, our context should support both features. Without showing any code, I'll guide you through the decisions made. First, let's ignore the follow-up so it's a little simpler to understand. Our context includes a question, which is the original question the user asked, like "How many sales happened in the last 30 days?", and an array of versions, where each version is an edit to the output. Each version contains all steps, each step being a sub-agent. Each step contains:
The tool that was called: SQL, Python, Visualization, Table, or Final Answer
The result of the tool: Whether it succeeded or failed, and what was the output
This structure allows us to have a clear view of what happened in each version, and we can easily show the orchestrator only what it needs to know. When the orchestrator is planning the next step, we don't show it every single message that happened inside the Python agent, we just show it: "Python was executed, here's what it did, and here's the result". This is the key to context management in an orchestrator.
But why this specific structure? Let's break it down:
Why versions instead of a single history? Because when you edit a step, you don't want to lose the previous work. Imagine you wrote SQL and Python code, created a visualization, and answered the user. Now the user asks to change the visualization. You don't want to run the SQL and Python again, you just want to edit the visualization step. With versions, we can create a new version that copies the successful SQL and Python steps, and only re-runs the visualization. This is much faster and cheaper than running everything again. It's like having multiple branches in git, each edit creates a new branch, but you keep the previous ones.
Why steps instead of a flat message history? Because steps are semantic units of work. When the orchestrator is planning, it doesn't think in terms of "I sent 3 messages and got 2 responses", it thinks in terms of "I wrote SQL, then I wrote Python, now I need to create a visualization". Steps also allow us to implement the "edit step" feature, you can tell the agent "edit the SQL step" and it knows exactly what to edit and what to keep.
What about the orchestrator's actual context? When the orchestrator is running, its context looks something like this (illustrative, not the exact production format):
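```
Question: "How many sales happened in the last 30 days?"
Plan: 1) write SQL  2) analyze in Python  3) create visualization  4) answer

Version 2 (current):
  [ok] SQL: queried sales for the last 30 days (copied from version 1)
  [ok] Python: aggregated daily totals (copied from version 1)
  [ok] Visualization: line chart of daily sales (edited in this version)
  Next step: Final Answer
```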
Notice how clean this is? The orchestrator doesn't see the 10 back-and-forth messages that happened inside the SQL agent when it was fixing syntax errors. It just sees "SQL was executed, here's what it did, success". This is the power of context engineering, you own your context, you decide what to show and what to hide.
The orchestrator can also see the plan it made at the beginning, so it can check "did I do what I planned to do?" and decide if it needs to do more work or if it's done. This is crucial for the orchestrator to stay on track and not get lost.
Why do we track the original question separately? This is important for follow-up questions. When the user asks "now show me last week", we need to know that the original question was "How many sales happened in the last 30 days?" so we can understand the context of "now show me last week" (show me sales last week). The original question is the anchor that keeps the conversation grounded.
This structure (a question, containing versions, each containing steps) gives us the perfect balance between flexibility and structure. It's flexible enough to support edits and follow-ups, but structured enough that we can control exactly what the orchestrator sees.
The Multi-Agent Network
The multi-agent network is similar to the orchestrator, but without an orchestrator. The main difference is that a multi-agent network doesn't have a "hub" it can go back to in order to decide the next steps. Instead, each agent should be aware of what happened before and of the main goal. This structure makes it easier to work without a planner, for example, and can produce agents that are faster and more flexible. It's a little harder to implement, but it's worth it. The principle: imagine there's a task, like "Build a Querio Clone", and you have 5 engineers to work on it. First, you need to understand what each engineer is good at, so you can talk to the right engineer to complete the task. Then, you should be able to define the immediate next step to finish the task. For that, every single engineer needs a clear understanding of the main goal and needs to know who they are working with. So the back-end engineer should know who the front-end engineer is and what they're good at, and the front-end engineer should know who the back-end engineer is and what they're good at.
Then, the easiest way to implement this is with a simple ReAct Agent. Yes, in the end, it's all ReAct. But the difference is that instead of calling tools, the agent calls other agents, and those agents have their own context. If we wanted, we could simply append every agent's messages to the context and implement a ReAct Agent like that, but it gets messy very quickly and we hit context rot. The guy making the design system doesn't need to know that the back-end engineer installed tRPC. The front-end engineer doing the back-end integration, on the other hand, needs to know that we are using tRPC, but doesn't need to know exactly how tRPC was installed.
For that, one thing that I like to implement is summaries (part of the compress strategy in context engineering). Instead of appending every single message to the context, we append only a summary of what each agent did. For example, instead of the back-end engineer appending 15 new messages to the context, we have another LLM summarize the work done in a single message. This guarantees that every single agent knows what was done, but not necessarily how it was done.
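A minimal sketch of that compression step, again using the `call_llm` placeholder. The prompt wording is just an example, and it already nudges toward the rule of thumb in the next paragraph: keep outputs, summarize process.

```python
# Compress a specialist's raw transcript into one message for the shared context.
# `call_llm` is the same placeholder stub from the ReAct sketch above.

def summarize_work(agent_name: str, transcript: list[dict]) -> dict:
    raw = "\n".join(f'{m["role"]}: {m["content"]}' for m in transcript)
    summary = call_llm([
        {"role": "system", "content":
            "Summarize what this agent accomplished in at most 3 sentences. "
            "Quote final outputs (queries, endpoints, results) exactly; "
            "compress retries and debugging into one clause."},
        {"role": "user", "content": raw},
    ])
    # One message replaces the 15 that the specialist actually produced.
    return {"role": "user", "content": f"[{agent_name}] {summary}"}
```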
But summaries are not perfect. The biggest problem is that you lose information. If the front-end engineer needs to know the exact API endpoint structure and the summary says "built a tRPC API", that's not enough information. So, you need to be smart about what to summarize and what to keep. A good rule of thumb is to keep the outputs and summarize the process. For example, keep the final tRPC router structure, but summarize how it was built. Keep the SQL query that was written, but summarize the 5 attempts that failed. Keep the Python analysis results, but summarize the debugging process.
Another approach is to use scratchpads and long-term memories: external memory stores that persist information across agent steps or even across sessions. Think of a scratchpad as a shared workspace where agents can write structured information (the write strategy) that other agents can selectively retrieve (the select strategy). The SQL agent writes "I created a query that returns X columns with Y rows" to the scratchpad, and the Python agent can query "what did the SQL agent produce?" and get that specific information, without seeing the entire conversation history. This implements the isolate strategy, each agent has its own view into the shared memory, seeing only what's relevant to its task.
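A scratchpad can be as simple as a shared key-value store with per-agent reads. A sketch, with made-up keys:

```python
# A minimal shared scratchpad: agents write structured notes (the "write"
# strategy) and read back only the keys relevant to them ("select"/"isolate").

class Scratchpad:
    def __init__(self):
        self._notes: dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self._notes[key] = value

    def select(self, keys: list[str]) -> dict[str, str]:
        # Each agent sees only its own slice of the shared memory.
        return {k: self._notes[k] for k in keys if k in self._notes}

pad = Scratchpad()
pad.write("sql.output_schema", "columns: day (date), total_sales (int); 30 rows")
pad.write("sql.final_query", "SELECT ... FROM sales WHERE ...")

# The Python agent asks only for what it needs, not the whole history.
relevant = pad.select(["sql.output_schema"])
```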
The beauty of the multi-agent network is emergent behavior, a well-documented characteristic of multi-agent systems where complex, unpredicted patterns arise from simple agent interactions. Because each agent has full context and can decide who to talk to next, you get interesting patterns that you wouldn't see in an orchestrator. For example, the Python agent might realize the SQL query is insufficient and directly call the SQL agent to fix it, without going back to a central orchestrator. The visualization agent might call the Python agent to transform the data in a specific way before creating the chart. This creates a more natural flow, similar to how humans work together, where solutions emerge from direct collaboration rather than top-down planning.
So, what should we keep in mind when building a multi-agent network?
Context explosion: Without careful management, the context grows out of control. Agent A calls Agent B, which calls Agent C, which calls Agent A again; now everyone needs to know what everyone did. This is solved with summaries and smart context pruning.
Circular dependencies: Agent A asks Agent B for help, Agent B asks Agent C, Agent C asks Agent A, and so on. This can cause infinite loops and is hard to debug; you need to implement safeguards to detect and break these cycles (see the sketch after this list). Remember: if the LLM sees a pattern, it will repeat the pattern.
Lost in translation: When agents communicate through summaries, important details can be lost. You need to be strategic about what to keep and what to summarize.
Debugging nightmare: When something goes wrong, it's harder to trace the issue because the flow is not linear. You need excellent logging and visibility into what each agent is doing.
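For the circular-dependency problem above, one simple safeguard is to track the delegation chain and refuse repeats or excessive depth. A sketch; the exact policy (depth limit, no-revisit rule) is illustrative:

```python
# Guard against circular agent calls: cap the depth and reject revisits.
# The policy here (depth 5, no agent twice in one chain) is illustrative.

class CycleGuard:
    def __init__(self, max_depth: int = 5):
        self.max_depth = max_depth

    def check(self, call_chain: list[str], next_agent: str) -> None:
        if len(call_chain) >= self.max_depth:
            raise RuntimeError(f"Max delegation depth hit: {call_chain}")
        if next_agent in call_chain:
            raise RuntimeError(
                f"Cycle detected: {' -> '.join(call_chain + [next_agent])}")

guard = CycleGuard()
guard.check(["data", "chart"], "python")   # fine
# guard.check(["data", "chart"], "data")   # would raise: data -> chart -> data
```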
What to build
Ok, now my favorite part: what should I build for my new project? Well, I don't have the answer, but I have a path. When I'm developing agents where I don't know exactly what the end result should look like, I always start with a simple ReAct Agent. From a ReAct Agent, it's very simple to expand into a multi-agent network if needed, but you'll be surprised how far a simple ReAct Agent can take you. In general, I tend to follow these principles:
The fewer tools, the better: It's way easier to pick the right tool when there are only 3 tools available instead of 100. Make simple, clear tools.
The less context the better: It's very easy to make an agent that stores too much context in its memory, but the problem is, when there's a massive amount of context, we face the "lost in the middle" problem. Research has shown that language models exhibit a U-shaped performance curve: they're best at retrieving and using information at the beginning or end of their context window, but performance significantly degrades when the model must access information in the middle of long contexts. This means that simply having more context doesn't help, it can actually hurt performance. The solution is aggressive context pruning, summarization, and strategic placement of critical information at the boundaries of the context window.
Agents are really good at making decisions, let them do it: The agent should be able to freely decide what to do next based on what happened before and what the final goal is.
Bad input === Bad output: Agents aren't magic; they're just very good. If you give them bad input, i.e., bad context, bad tools, and bad instructions, the result will be bad.
So, when you start building your agent, start with a simple ReAct Agent, and then expand to a multi-agent network if needed.
One thing that I like to have too is a planner step, to make sure the agent is going to do what it's supposed to do. But the planner can be a problem in this type of agent because of the emergent behavior of the agents: the agent can decide that after step 3, for example, we need an extra step before step 4, and this breaks the plan. I like the approach Cursor takes, where the planner creates a todo list, but this todo list can be modified by the agent whenever it wants. The agent can add new steps, remove steps, and change the order of the steps. This means that, instead of a "planner", every agent can decide what the plan should be. This ensures the agent follows a specific set of steps, but at the same time it's not too rigid.
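To close, here's a sketch of that mutable plan, in the spirit of the Cursor-style todo list described above. The operations exposed to the agent are illustrative:

```python
# A plan the agent itself can rewrite: steps keep the run on track,
# but nothing is set in stone. Operation names are illustrative.

from dataclasses import dataclass, field

@dataclass
class Todo:
    text: str
    done: bool = False

@dataclass
class MutablePlan:
    todos: list[Todo] = field(default_factory=list)

    def add(self, text: str, position: int | None = None) -> None:
        index = position if position is not None else len(self.todos)
        self.todos.insert(index, Todo(text))

    def remove(self, index: int) -> None:
        del self.todos[index]

    def complete(self, index: int) -> None:
        self.todos[index].done = True

    def render(self) -> str:
        # Shown to the agent every turn so it can check itself against the plan.
        return "\n".join(f"[{'x' if t.done else ' '}] {i}. {t.text}"
                         for i, t in enumerate(self.todos))

plan = MutablePlan()
plan.add("write SQL")
plan.add("analyze in Python")
plan.add("answer user")
plan.complete(0)
plan.add("create visualization", position=2)  # the agent inserts a new step
print(plan.render())
```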