Engineering
Product
Code Execution Environment
Feb 21, 2025
TL;DR
Pedro discusses improvements to the Code Execution Environment at Querio, focusing on enhancing performance, security, and user experience by keeping the Jupyter Kernel alive, utilizing Abstract Syntax Trees for better code handling, and introducing new cell types for SQL and visualizations. Future plans include integrating the notebook feature into the app and developing a Copilot for user assistance.
Heyo Querio Users, I’m Pedro, the first engineer at Querio. I built the last six versions of the Agent, the Code Execution Environment, and most of the front-end. I joined Querio after working with Javi for almost a year at another company, and when the opportunity arose, he asked me to join him and Rami on this new adventure.
I’ve been coding “for real” for almost six years now, but when I was a kid I loved developing games and modding Minecraft. So when I started coding professionally, it was really easy for me to get on track. I’ve always enjoyed teaching friends; when I was a Tech Lead, I gave classes to our developers and always had a pupil to share everything I knew with. I even got kind of big on Twitter (10k followers at its peak) by helping people, teaching about programming, and developing viral apps. My best-known app was Piggy, a PFM that got 70k likes on Twitter, but I also built smaller projects, like a bot in 2021 that got more than 20k followers.
In general, I love building things and sharing what I know, and that’s exactly what I’m going to try to do with my blog posts.

So ok, let’s stop yapping, let’s get real. Every day at Querio, the first thing I do in the morning is ask, "How can I improve our product today?"

This question drives me to innovate and build new features. While our current Agent works well with our customers' data sources, there are many areas where we can make enhancements. In this blog post, we'll explore the challenges of the Code Execution Environment, review its current implementation, and discuss our plans to improve it, ensuring we deliver the best product possible.

Code Execution Environment
One of the core principles of our agent is that it can write code. That’s the most trivial thing the agent does: it writes SQL and Python to answer users' questions. But even though this is a trivial part of the app, it’s also one of the places where we can achieve the most wins. The current production implementation works, but it’s rough around the edges. I didn’t have the necessary experience to plan and deliver a state-of-the-art Code Execution Environment (CEE), so let’s now perform an autopsy and see why it died.

Current Notebook Implementation
We have a pretty bare-bones notebook implementation, and it works in a simple way: you create a conversationId and send code along with it. The application then opens the Jupyter Notebook, adds a new cell, and returns the response. However, since this was my first time working with nbformat and nbclient, I missed some problems with this approach, including:
1. Cold Starts and Performance
Every time we send code, the Jupyter Kernel restarts. We have to wait around 2 to 3 seconds just for the kernel to start, and then we need to run every cell in the notebook before we can run the newest cell. If the agent writes five pieces of code, this can add up to 15 seconds of query execution time. As you can imagine, this has a significant impact on performance.
2. Output
Currently, every time a cell runs, we only return a single response. The problem is that a cell can produce several outputs, for example when the code triggers a warning or when the agent uses multiple print statements, yet we still return only one output per cell. This causes us to lose a lot of context: if the agent uses multiple print statements, it won’t see every output. Occasionally (less than 1% of the time), the algorithm even fails to capture the right output. At the time, I thought this was the best decision, but only because I couldn’t see the flaws.
3. String Manipulation
Every piece of code we receive is a string, and strings are inherently problematic. They’re hard to manipulate, and a single point of failure can be exploited to create code injection. Of course, I thought a lot about code injection, and as of now, there hasn’t been a single vulnerability on that front. But the worst part is that it’s very hard to understand what the string is trying to do. For example, consider the following code:
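(A stand-in example; the exact code the agent writes varies, but the shape is the same.)

```python
import pandas as pd

# The agent builds a DataFrame from the query results...
df = pd.DataFrame({"region": ["NA", "EU", "APAC"], "revenue": [120, 95, 70]})

# ...and the last line is a bare expression, so the notebook displays it
df
```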
Looking at this code, it’s clear that it creates a DataFrame and then displays it. However, to extract the DataFrame’s value, we need to dump the results and access them outside the kernel. This requires analyzing the intent of the code; if it’s displaying a DataFrame, we create a new string that performs df.to_dict(). This can get very messy very fast, especially if the last line is a multi-line expression, because we don’t know exactly where the expression starts or ends.
If we want to protect a variable, that’s another challenge. The simplest approach is to check if the variable’s name is in the string, but that isn’t very safe.
4. Security
We use RestrictedPython, so we know for sure that the code runs in a safe environment. However, RestrictedPython is another dependency we have to manage, and it gets complex very fast. That’s yet another point of failure that often shoots us in the foot: it’s too strict, yet necessary for the current implementation.

New Notebook Implementation
The new notebook implementation aims to fix all those problems. The biggest goals are performance and security. It needs to be very fast (the target is to run code in less than 100ms), and it needs to be secure. We can’t afford to leak any client data; this is the most sensitive topic we’re always guarding against. So, how are we going to do that?
1. Keep Kernel Alive
Instead of cold starting for every new piece of code, we now keep the Jupyter Kernel alive. This means it does one cold start, taking from 1.5 to 3 seconds (usually not more than 2), and every subsequent code run avoids that extra delay. This change alone helped us hit our performance targets, with code being executed in less than 10ms. Now the agent’s experience is nearly instantaneous: it writes the code and instantly gets the result. SQL calls still take time because they need to access the database and, if there's too much data, they can be slow. But normal Python code is instant.
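To make the idea concrete, here’s a minimal sketch of the pattern using jupyter_client; our production setup wraps a lot more around it, but the principle is the same: start the kernel once and reuse it for every run.

```python
from jupyter_client.manager import KernelManager

# Pay the cold start exactly once
km = KernelManager(kernel_name="python3")
km.start_kernel()

client = km.client()
client.start_channels()
client.wait_for_ready()

# Every subsequent run reuses the warm kernel, so imports and variables
# defined earlier are still there and there is no startup delay
client.execute_interactive("import pandas as pd")
client.execute_interactive("df = pd.DataFrame({'a': [1, 2, 3]})")
client.execute_interactive("print(df.shape)")

client.stop_channels()
km.shutdown_kernel()
```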
2. AST
String manipulation is hard, but we don’t need to complicate our lives. Every piece of code is a string, but the language that runs the code doesn’t work on strings: it tokenizes them and compiles the result into instructions it can execute. The first step of that compilation is extracting the Abstract Syntax Tree (AST) from the code. The AST is a structure that defines exactly what the code is doing and how it works. Take a look at the following piece of code:
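(Any small DataFrame snippet works as an illustration.)

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Bruno"], "age": [31, 27]})
df.head()
```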
It first imports pandas as pd, then creates a DataFrame, and finally runs df.head() to return the first five rows of the DataFrame. Let’s take a look at the AST of the code:
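(You can reproduce this yourself with the standard library’s ast module; the dump below is trimmed for readability.)

```python
import ast

code = '''
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Bruno"], "age": [31, 27]})
df.head()
'''

print(ast.dump(ast.parse(code), indent=2))
# Module(
#   body=[
#     Import(names=[alias(name='pandas', asname='pd')]),
#     Assign(targets=[Name(id='df', ctx=Store())], value=Call(...)),
#     Expr(value=Call(func=Attribute(value=Name(id='df'), attr='head'), ...))],
#   type_ignores=[])
```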
You can see that the body contains everything the code is doing: using Import to bring in pandas, Assign to define variables, and Expr to call df.head(). With this, we can traverse the tree and extract details like all defined variables, dependencies, and the code’s intent, even for multi-line expressions. This gives us much more control, allowing us to protect variables and detect their usage.
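As a simplified sketch of what that traversal buys us (the helper below is illustrative, not our production code):

```python
import ast

def analyze(code: str):
    """Find the variables a snippet defines and whether its last
    top-level statement is a bare expression (i.e. a display)."""
    tree = ast.parse(code)

    # Every name that appears as an assignment target
    defined = [
        target.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Assign)
        for target in node.targets
        if isinstance(target, ast.Name)
    ]

    # If the last statement is a bare expression, that is what the cell displays
    last = tree.body[-1] if tree.body else None
    displayed = last.value if isinstance(last, ast.Expr) else None
    return defined, displayed

defined, displayed = analyze(
    "import pandas as pd\n"
    "df = pd.DataFrame({'a': [1, 2]})\n"
    "df.head()"
)
print(defined)                    # ['df']
print(type(displayed).__name__)   # 'Call' -> the df.head() expression, however many lines it spans
```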
3. New Cell Structure
A normal Jupyter Notebook has only one kind of executable cell: the code cell. For us, that’s not enough: we need to let the user write SQL as well as Python code, but how do we make it feel natural? We introduced a few types of cells. Every cell inherits from the BaseCell, which is defined as follows:
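(Sketched here with plain dataclasses; the concrete types and defaults in production differ.)

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

class CellType(str, Enum):
    CODE = "code"
    TABLE = "table"
    SQL = "sql"
    VISUALIZATION = "visualization"

@dataclass
class BaseCell:
    cell_id: str                     # unique identifier for the cell
    position: int                    # where the cell sits in the notebook
    state: str                       # e.g. idle / running / done / error
    result: Optional[Any] = None     # the cell output, if any
    cell_type: CellType = CellType.CODE
```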
As you can see, it stores the cell_id, position, state, result (the cell output), and cell_type. From the cell_type, we have four variations: CodeCell, TableCell, SQLCell, and VisualizationCell. Let’s take a deep dive into each of them.
Code Cell

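(Continuing the BaseCell sketch above; an approximation, not the exact production class.)

```python
from dataclasses import dataclass, field

# BaseCell comes from the sketch above
@dataclass
class Variable:
    name: str
    # one of: "tbd", "int", "float", "str", "bool", "list", "dict",
    # "tuple", "set", "function", "class", "series", "unknown"
    type: str = "tbd"

@dataclass
class CodeCell(BaseCell):
    source: str = ""                                             # Python source, run unchanged
    definitions: list[Variable] = field(default_factory=list)    # variables this cell defines
    dependencies: list[Variable] = field(default_factory=list)   # variables it needs from other cells
```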
As you can see, it has the source code, a list of definitions, and a list of dependencies. The definitions are just a list of Variables, which can be one of the following: "tbd", "int", "float", "str", "bool", "list", "dict", "tuple", "set", "function", "class", "series", "unknown". When we run this cell, we execute the source code unchanged, and we can return multiple outputs, not just one.
Table cell

This is the most controversial yet simplest cell type. When a user defines a DataFrame through a Python cell and returns it, we opted not to show the full DataFrame directly. Instead, we mimic a normal Jupyter Notebook by showing only the first and last five rows. This decision was made to reduce complexity and create a more authentic notebook experience. But how do we show full DataFrames? Through the Table Cell:
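(Again building on the sketches above; the filter_state and the computed source are simplified here.)

```python
from dataclasses import dataclass, field
from typing import Optional

# Variable and BaseCell come from the sketches above
@dataclass
class DataFrameVariable(Variable):
    type: str = "dataframe"

@dataclass
class TableCell(BaseCell):
    variable: Optional[DataFrameVariable] = None        # the DataFrame to display in full
    filter_state: dict = field(default_factory=dict)     # reserved for future filtering

    @property
    def source(self) -> str:
        # Computed code: pull the data out of the kernel so it can be
        # sent to the front-end
        return f"dump_dataframe({self.variable.name})"
```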
It has a variable that must be a DataFrameVariable, a filter_state (which currently does nothing but may be used for filtering in the future), and a computed source. The source code calls the function dump_dataframe with the variable_name, allowing us to access the data outside the kernel and send it to the front-end.
SQL cell

This is how we let the user write SQL. The SQL cell is defined as:
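(Same sketched form as before; run_query is an illustrative placeholder for however the datasource is actually queried.)

```python
from dataclasses import dataclass, field
from typing import Optional

# DataFrameVariable and BaseCell come from the sketches above
@dataclass
class SQLCell(BaseCell):
    variable: Optional[DataFrameVariable] = None         # where the query result is stored
    query: str = ""                                       # the SQL written by the user
    filter_state: dict = field(default_factory=dict)

    @property
    def source(self) -> str:
        # Computed code: define the variable from the datasource query,
        # then dump it so the full table shows up without a TableCell
        return (
            f"{self.variable.name} = run_query({self.query!r})\n"
            f"dump_dataframe({self.variable.name})"
        )
```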
As you can see, it also includes a variable, a SQL query, a filter_state, and the source. The source defines the variable and queries the datasource, returning the full table. This SQLCell automatically shows its output without needing the TableCell.
Visualization cell
This cell is expected to be really complex. It will eventually allow the user to define what kind of visualization they want (bar chart, pie chart, etc.), choose the datasource, and select the columns for categories/values. For now, it works similarly to the TableCell: you define a Plotly chart through a CodeCell, and it displays via the VisualizationCell.
Security
We improved security a lot with the AST, which blocks a lot of malicious code from the get-go. But at the end of the day, this is an RCE machine and the user can write whatever they want. So, we're adding measures to prevent users from accessing the file system, accessing someone else's data, making requests, creating a reverse shell, and so on. I’m not going to go into details because every bit we share could help an attacker, but we’re pretty sure the application is safe. If any of our clients don’t trust us to handle security, they can self-host their own code execution engine to ensure all the data lives only on their machine.

Next steps
Now that we've built a robust notebook structure, how do we turn it into a product? The first step is to integrate the Notebook as a feature in our app, a plan we're definitely moving forward with. We’ll allow power users, those who know how to write Python and SQL, to use these notebooks. For users who aren’t as proficient, we’re developing a Copilot that has full context of the datasource to help write Python and SQL. There will also be an agent in every notebook that you can ask for help and that will do things for you. This feature is aimed at power users, but our final goal is for everyone to use it.
Another big change is the complete overhaul of the current Explore Tab. We’re working on an agent that understands how to use the notebook; that’s exactly what we’re cooking up at this moment. We’re developing a new MultiAgent workflow that will improve the quality of the product by a ton. I’d be glad to share more details about our new agent in my next blog post, coming next month.