And how they power Querio
Apr 11, 2025
TL;DR
WebSockets are an underrated gem for building real-time, responsive apps. In this post, Pedro shares how his team used a custom router-based WebSocket system to power Querio’s new multi-agent architecture—solving problems like unreliable streaming, messy async logic, and reconnections. It’s faster, more structured, and (with some extra effort) way better than HTTP for the job.
Heyo, it's me again, Pedro, and today I want to talk about the technology of the year: WebSockets.
Yes, I know WebSockets are older than me, but they're an incredible piece of technology that is completely underrated, and not many people talk about them (at least not enough).
So, this is not an introduction to WebSockets nor a tutorial on how to use them; I'm just going to share how we're using WebSockets to power our new Agent V7. But first, I need to explain why WebSockets in the first place.
Networks are literally decades older than me, so of course people WAY SMARTER than me have already figured out the best way to communicate over networks, and basic Networks 101 tells us that we should use REST over HTTP, and I agree with that. Kind of.
The standard and boring way: HTTP
Why should we use HTTP? 99% of the applications that we use every day are built using only HTTP, and for a good reason. It's simple, it's easy to understand, and it's a very good protocol for most cases. But it lacks a few things:
It's strictly request-response: the client sends a request and the server returns a response with the outcome. The server can't push data to the client without the client asking for it.
If the request takes a long time to complete, the client will just hang there waiting for the server to respond.
So, imagine Querio using only HTTP requests: you ask a question, the agent takes 25 seconds to answer, and you keep staring at the loading wheel for 25 seconds waiting for the answer, without any feedback on what's happening. That's of course not a good experience, so how can we solve this?
The second problem has a solution, and it's called streaming. When you make an HTTP request, the server can stream the response to the client, and the client can subscribe to this stream and receive the data as soon as it's available. But this API is REALLY bad. For example, it's very hard to stream multiple structured objects: because of how streaming works, you can receive a partial object and have to keep track of it until the server sends the rest. Then the server might send the last part of that object and, in the same chunk, the start of another partial one, so you end up writing a script to stitch it all back together. In the end it's a nightmare; it's really, really, really bad.
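To make that concrete, here's a rough sketch of what consuming a stream of JSON objects over plain HTTP tends to look like on the client. The endpoint and the newline-delimited framing are assumptions for illustration, not Querio's actual code; the point is that chunks arrive at arbitrary boundaries, so you have to buffer partial objects yourself:

```ts
// Sketch: reading newline-delimited JSON from a streaming HTTP response.
// The URL and message shapes are placeholders.
async function consumeStream(url: string, onObject: (obj: unknown) => void): Promise<void> {
  const response = await fetch(url);
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // A single chunk can contain the tail of one object, a few whole objects,
    // and the head of the next one, so we split on newlines and keep the
    // trailing partial piece around for the next iteration.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";
    for (const line of lines) {
      if (line.trim().length > 0) onObject(JSON.parse(line));
    }
  }
}
```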
But then I started thinking: Bro, game developers are WAY AHEAD of us in this game, web dev is decades behind game dev in every single aspect, so let's look at some techniques that they implement in games that we can use to improve our product. Because at the end of the day, in a multiplayer game you send an action, and the server needs to send all of the information that changed back to the client, and it should also be able to send information from other players to the client, even if the client doesn't request it, with extremely low latency. So how do they do it?
Game Developers are way more intelligent than Web Developers: The UDP way
Well, Game Developers use something called UDP, a protocol that is way simpler and faster than HTTP. It works like this (there's a quick sketch right after the list):
You "connect" to a UDP server (there's no handshake).
You send a message to the server.
The server sends N responses (packets) to the client.
If your connection is bad, you can have packet loss, so the client loses this information.
You can receive data in any order, so the client needs to be able to handle this.
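For a feel of how bare-bones that is, here's a tiny Node.js sketch using the built-in dgram module. The host, port, and message are placeholders; the takeaway is that there's no handshake and no connection, you just fire packets and hope they arrive:

```ts
import dgram from "node:dgram";

// No handshake: create a socket and start sending datagrams immediately.
const socket = dgram.createSocket("udp4");

socket.on("message", (msg, rinfo) => {
  // Packets can arrive out of order, or not at all.
  console.log(`got ${msg.length} bytes from ${rinfo.address}:${rinfo.port}`);
});

socket.send(Buffer.from("player:move x=3 y=7"), 27015, "game.example.com", (err) => {
  if (err) console.error("send failed", err);
});
```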
When we make an HTTP request, every request needs a handshake to make sure we're connected to the server before we can send it. With UDP there's no handshake and, in fact, no connection at all, which means you can send and receive messages almost instantly. This is very important for a multiplayer game: CS:GO, for example, runs at 64 ticks per second by default, meaning the server sends an update roughly every 15.6ms. On tournament servers this increases to 128 ticks per second, or roughly one update every 7.8ms. That cadence is only possible with this kind of connection; you would NEVER be able to do it with HTTP.
But the thing is, UDP is a protocol that is not reliable, and it's not designed to be used in modern web apps. It lacks some features that we need for a reliable web app, like:
Ensuring that packets are received in order.
Error checking so that the packets are not corrupted.
Congestion control so that the server doesn't overload the client.
So: we want the speed and efficiency of UDP, we want the server to be able to talk to the client without being asked, and at the same time we want all of the reliability features that HTTP provides. What can we do?
Well, here's where TCP comes in handy.
TCP: The hero we wanted (and needed)
TCP is, roughly speaking, UDP with a handshake: you establish a connection to the server first, and then you can send and receive messages over it. It also has all of the features we just listed that UDP lacks, and while it's a little slower than UDP, it's still VERY fast; and when I say very fast, I mean it. To use TCP from a web app, we can use something called WebSockets, which run on top of a persistent TCP connection.
I won't go into detail on how WebSockets work—there are plenty of articles written by people WAY smarter than me online—but I'll explain how we use them to power our new Agent V7.
The new Agent V7 is way more complex than the previous version. While V6 was a single agent, V7 uses a multi-agent architecture, and each agent has a different purpose: one plans the actions, one writes SQL, one writes Python, one builds visualizations, and so on. Coordinating all of those agents and sending their data to the client would be a nightmare with HTTP, but with WebSockets it's 10x easier. Each agent runs as a step, and every step carries a lot of data. All we need to do is send the step's state to the client, in a structured way, every time it updates, and the client just refreshes the UI with the new data. It's way more reliable than streaming, the development experience is way better, and the UI is way more responsive.
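As a rough illustration of that "push the step state on every change" idea (the types, the typeName, and the use of the ws package are assumptions, not Agent V7's real schema), the server side can be as simple as:

```ts
import { WebSocketServer, type WebSocket } from "ws";

// Hypothetical shape of a step's state; the real Agent V7 state is richer.
type StepState = {
  stepId: string;
  agent: "planner" | "sql" | "python" | "visualization";
  status: "running" | "done" | "error";
  output?: unknown;
};

const wss = new WebSocketServer({ port: 8080 });

// Every time a step's state changes, push the full structured state to the
// client; the UI just re-renders from the latest snapshot.
function sendStepUpdate(socket: WebSocket, state: StepState): void {
  socket.send(JSON.stringify({ typeName: "agent/step:update", payload: state }));
}

wss.on("connection", (socket) => {
  sendStepUpdate(socket, { stepId: "1", agent: "planner", status: "running" });
});
```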
But the WebSockets ecosystem sucks. It's not as popular as HTTP, so not many developers spend time building a good ecosystem of libraries and tools around it. Which means we had to do it ourselves, and that's what we did.
I won't go into detail on how we implemented it, because I want to turn this into an open source library that everyone can use, and when I do, I'll write about it properly. But I'll share some highlights of the implementation, which is built around a router-based architecture.
Router-based WebSockets
With WebSockets you send events and receive events, and those events aren't related to each other AT ALL, meaning you can't tell whether an incoming event is the answer to a particular request or not. For that, I made routers.
A router here works just like an HTTP router, but instead of a path it has a typeName. In a shared library (used by both the server and the client) you create a type for your route, specifying the typeName and the parameters it needs. Then, on the server, you create a handler that looks at the typeName of each incoming event and calls the function that handles that route. On the front-end, you implement a custom send function that knows all the types, and you just call it with the parameters it needs.
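Here's a minimal sketch of the idea. The route names (other than orchestrator/query, which shows up below), parameter shapes, and helper names are assumptions; the real library will look different:

```ts
// Shared between server and client: every route's typeName and its parameters.
type Routes = {
  "orchestrator/query": { question: string };
  "orchestrator/cancel": { queryId: string }; // hypothetical second route
};

type RouteEvent = { [K in keyof Routes]: { typeName: K; params: Routes[K] } }[keyof Routes];

// Server side: register a handler per typeName and dispatch incoming events to it.
const handlers: { [K in keyof Routes]?: (params: Routes[K]) => void } = {};

function route<K extends keyof Routes>(typeName: K, handler: (params: Routes[K]) => void): void {
  handlers[typeName] = handler;
}

function dispatch(raw: string): void {
  const event = JSON.parse(raw) as RouteEvent;
  // Cast needed because TypeScript can't correlate the key with its handler here.
  handlers[event.typeName]?.(event.params as never);
}

// Client side: a typed send that only accepts known typeNames and their parameters.
function send<K extends keyof Routes>(socket: WebSocket, typeName: K, params: Routes[K]): void {
  socket.send(JSON.stringify({ typeName, params }));
}
```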
But this doesn't solve the problem of the events not being related to each other. For that, I created a custom event system. If the route is called orchestrator/query, for example, on the server we can define multiple responses for that route, like orchestrator/query:response, orchestrator/query:error, orchestrator/query:progress, and so on. Then, the server can send N events to the client, making it really easy to understand where each event comes from.
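Continuing the sketch above, a handler for that route could emit any number of namespaced events back to the client. Here, emit and runAgents are placeholders for whatever the server actually uses to push events and run the agents:

```ts
// Placeholders for the server's event push and agent execution.
declare function emit(typeName: string, payload: unknown): void;
declare function runAgents(question: string): Promise<unknown>;

route("orchestrator/query", async ({ question }) => {
  emit("orchestrator/query:progress", { step: "planning" });
  try {
    const result = await runAgents(question);
    emit("orchestrator/query:response", { result });
  } catch (error) {
    emit("orchestrator/query:error", { message: String(error) });
  }
});
```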
There's another problem with WebSockets: you can send an event, but you can't await a response. This means that if you want to simulate an HTTP request over WebSockets, you have to send the event and then register a separate handler for the response. That's not a good experience, and if it's not a good experience then I don't want to do it. So, how can we solve this?
I racked my brain a little, but I found a solution. I implemented a sendAsync function: every time we send an event through it, it appends a messageId to the event and creates a promise to be resolved later. The server can then send N events back to the client tagged with that messageId, and the client appends each one to a list. When the request ends, the server sends an event with the typeName finish and the same messageId, and once the client receives that finish event, it resolves the promise with all of the messages that were sent to it.
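Here's a rough sketch of how that can be wired up. The message shape and the bookkeeping are simplified assumptions, not Querio's actual code:

```ts
type WsMessage = { typeName: string; messageId?: string; payload?: unknown };

// Requests still waiting for their "finish" event, keyed by messageId.
const pending = new Map<string, { messages: WsMessage[]; resolve: (m: WsMessage[]) => void }>();

function sendAsync(socket: WebSocket, typeName: string, params: unknown): Promise<WsMessage[]> {
  const messageId = crypto.randomUUID();
  return new Promise((resolve) => {
    pending.set(messageId, { messages: [], resolve });
    socket.send(JSON.stringify({ typeName, messageId, params }));
  });
}

// Collect every event tagged with our messageId, and resolve the promise once
// the server sends the "finish" event for that messageId.
function handleIncoming(raw: string): void {
  const message = JSON.parse(raw) as WsMessage;
  if (!message.messageId) return;
  const entry = pending.get(message.messageId);
  if (!entry) return;

  if (message.typeName === "finish") {
    pending.delete(message.messageId);
    entry.resolve(entry.messages);
  } else {
    entry.messages.push(message);
  }
}
```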
Example usage:
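(The call below is hypothetical; the route name matches the earlier examples and the payload is made up.)

```ts
// Inside an async function, using the sendAsync sketch from above.
const messages = await sendAsync(socket, "orchestrator/query", {
  question: "What was revenue last month?",
});

// messages now holds everything the server sent for this request, e.g. a few
// "orchestrator/query:progress" events followed by "orchestrator/query:response".
for (const message of messages) {
  console.log(message.typeName, message.payload);
}
```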
This is a much nicer way to handle events: we receive each message as soon as it's available, and we can still await the final result, just like an HTTP request. It's also completely type safe, because every response starts with the typeName and then the parameters, meaning we can have a function signature like this:
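(A sketch of what such a signature could look like. TypedAgentWsMessage is the shared union type mentioned below; RouteTypeName and RouteParams are assumed names for the other shared types, stubbed out here just so the signature type-checks.)

```ts
// Assumed shared-library types, reduced to one route for illustration.
type RouteTypeName = "orchestrator/query";
type RouteParams = { "orchestrator/query": { question: string } };
type TypedAgentWsMessage =
  | { typeName: "orchestrator/query:progress"; payload: unknown }
  | { typeName: "orchestrator/query:response"; payload: unknown }
  | { typeName: "orchestrator/query:error"; payload: unknown };

declare function sendAsync<T extends RouteTypeName>(
  typeName: T,
  params: RouteParams[T]
): Promise<Array<Extract<TypedAgentWsMessage, { typeName: `${T}:${string}` }>>>;
```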
As you can see, it's going to return every single TypedAgentWsMessage whose typeName matches ${T}:${string}, with T being the typeName of the event.
Connection
Here's another fun challenge websockets throw our way: keeping that connection alive!
Think about it. If you're a developer working on Agent V6 with streaming, life is pretty chill. Server restarts? No problem! Since there's no persistent connection needed, you can just keep working on your questions without any problems. But with WebSockets? Every server restart means bye-bye to all your connections. Not great for development.
So how do we fix this? We need a solid reconnection system. And honestly, it's not rocket science. We just re-use our connection function, but with a twist: when it's a reconnection, we send the old agent state back to the server. And guess what? This elegant little solution actually kills three birds with one stone! We can use this same function to implement the "continue chat from here" feature AND the "edit element" feature. So by solving one problem, we get two more solutions for free. Pretty sweet deal.
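A stripped-down sketch of that reconnection loop (the backoff values and the "session/restore" typeName are assumptions, not the actual implementation):

```ts
function connectWithRetry(url: string, getAgentState: () => unknown): void {
  let attempt = 0;

  const connect = () => {
    const socket = new WebSocket(url);

    socket.addEventListener("open", () => {
      attempt = 0;
      const state = getAgentState();
      // On reconnect, replay the old agent state so the server can pick up
      // where it left off; the same path can power "continue chat from here".
      if (state) socket.send(JSON.stringify({ typeName: "session/restore", params: state }));
    });

    socket.addEventListener("close", () => {
      // Exponential backoff, capped at 10 seconds.
      const delay = Math.min(1000 * 2 ** attempt, 10_000);
      attempt += 1;
      setTimeout(connect, delay);
    });
  };

  connect();
}
```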
Downsides
Let's be honest: there's a reason why we don't use websockets everywhere. Working with WebSockets is not all rainbows and unicorns. The asynchronous and event-driven nature of WebSockets creates some real debugging nightmares. You'll often find yourself piecing together confusing logs, trying to figure out what happened and when. Infrastructure challenges? Absolutely. Those persistent connections can be problematic, as we already saw with our reconnection issues. Every time the server restarts, boom, connections lost.
And the learning curve is steep, my friends. Without the robust ecosystem of tools and libraries that HTTP enjoys, you're often left building custom solutions for everything: reconnection logic, error handling, state management, you name it. This all adds up to significantly more development time and complexity. Buuuut with the structure we're using right now, it's not much harder than HTTP; you just need to do a little more work and be a little more careful.
Conclusion
As you can see, there's no perfect solution: every protocol has its strengths and downsides, and WebSockets are no exception. But I believe we were able to overcome most of the problems with our implementation, and this structure is going to power Querio for a looooong time.