Text-2-SQL is solved.
Aug 20, 2025
Text-to-SQL (giving an LLM a natural-language prompt and a clean database schema, then checking whether it can write correct SQL unaided) is one of the biggest unlocks that LLMs provide.
Data work is one of the last transactional relationships left internally at companies. Everyone needs data, whether dashboards or Excel sheets, and in the background there is a data team sifting through tickets and handling every single data request the company generates. They are frequently dubbed the most understaffed function at any company.
This led to an obsession with text-to-SQL. Automating some of the most time-consuming, labor-intensive work of data teams would cause a data renaissance. The data world began benchmarking every model to see how well it could do, and an explosion of evaluation datasets followed, from Spider to newer ones like the benchmark from our friends over at defog.ai. Every new model comes with new scores and new hopes.
In fact, Querio runs its own evaluations too. Although we purposefully cheat. You see, benchmarks are meant to be a blind test. You don’t give the AI any idea of what’s in the data, you don’t give it explanations or previous queries as examples, and you see how well it can get things right. Well, in reality, we give Querio a lot of context. So while a model was achieving 56% on a blind test in Nov 2024, we were already at 99% on our internal tests with early agents.
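The gap between a blind benchmark and a context-rich setup can be sketched as a difference in prompt construction. This is a minimal illustration, not Querio's actual pipeline; the schema, column note, and example query below are invented for the sake of the sketch:

```python
# Sketch: a blind benchmark prompt vs. a context-rich prompt.
# All schema details, column notes, and past queries here are hypothetical.

SCHEMA = "CREATE TABLE orders (id INT, customer_id INT, total NUMERIC, created_at DATE);"
QUESTION = "What was total revenue last month?"

# Blind benchmark: the model sees only the schema and the question.
blind_prompt = f"{SCHEMA}\n\nQuestion: {QUESTION}\nWrite the SQL."

# Context-rich: add column explanations and previous queries as examples,
# the kind of information a real data team accumulates over time.
CONTEXT = (
    "-- 'total' is stored in cents; divide by 100 for dollars.\n"
    "-- Example past query:\n"
    "-- SELECT SUM(total) / 100 FROM orders WHERE created_at >= '2025-07-01';"
)
rich_prompt = f"{SCHEMA}\n\n{CONTEXT}\n\nQuestion: {QUESTION}\nWrite the SQL."

# The rich prompt strictly extends the blind one with extra context.
print(len(rich_prompt) > len(blind_prompt))
```

The point of the sketch: the model and the question are identical in both cases; the only variable is how much domain context travels with the prompt, which is exactly what blind benchmarks exclude.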
However, something crazy happened in Q1 of 2025. Companies kept fine-tuning models to improve on the base models: a base model would get 70%, and a fine-tune could push it to 90%+. Then Anthropic came out with Claude 3.7 and it… just hit 97% on blind tests with no fine-tuning.

Suddenly, this became solved. Models can reliably write good SQL, making the most of whatever information you provide. They can take the prompt and get it right! This was very exciting for us at Querio. But maybe you’re thinking, “hey, if the model can do text-to-SQL natively, doesn’t that mean you do nothing special?” Well, no. We never made models or fine-tuned them. We built a product that assumed one day a base model would be able to do this.
This is true for a large subset of companies, actually. Cluely, Cursor, Lovable, Opus: these are companies that were building products for models that didn’t exist yet. Building products in problem spaces that LLMs can ‘grow into,’ rather than building products that current models will eventually ‘grow out’ of, is what separates great companies from dying ones. This assumes, of course, that you also build a product that improves as model performance improves.
So now we’re here, in a position where Querio can really achieve its vision of enabling anyone, no matter their technical level, to work with data. Getting here took the last two years of work. We’ve seen our revenue spike, our customers use the product more and more, and engagement soar, because for the first time there’s a new way to work with data.
It's quite ambitious to create a data product from scratch that redefines how people work with data.