Javier Bonilla

Machines, files and the right amount of insanity

We've already run this experiment in software. Data hasn't caught up yet; here's what that looks like.

Software engineering is the frontrunner when it comes to adopting LLMs in day-to-day workflows. It makes sense: development was fertile ground for a piece of tech like LLMs to take over. The reason isn't just that there's plenty of tinkerers and a whole lot of money. It's that the work was already happening in files, locally. Code is text, the terminal is text, the codebase itself is a whole lot of context, and your computer is a computer. Agents didn't have to translate anything; they stepped into a world they already knew. That's why the transition was smooth and the rise meteoric.

It feels like all our workflows have been disrupted and everyone is living in an agentic world. The reality is that the vast majority of workflows outside of coding haven't changed all that much; we're barely at the beginning. With that in mind, here's my personal account of how things went for coding. We started by pasting a bunch of text into ChatGPT, the thing spitting out code, us copying and pasting it back into our codebase. Over and over. We were the agent loop and the tooling interface. That was a trash workflow, so we freed the LLM from the webapp and brought it to our IDE, looped it on itself and got it working directly on our machines. The models got better, and we got the agents closer to the machine where the work is happening, the machine for which text has always been the default language. There was one key problem, which at the time felt like a feature: Copilot would ask for permission and Cursor would show you every diff. They were too cautious. Claude Code dropped in, and one of its key innovations was no more diffs and auto-approve. It sounded a bit insane… then it worked. Don't manually approve every damn grep and file write; just let it do its thing and review it at the end (or not). That's taking us to some pretty crazy places, where we spin up multiple agents in parallel, let them collaborate, and sometimes have them always be awake, proactively taking on work. At every single one of these steps, there's been a safer, more governed version that lost.

Why? Because it plugged in beautifully with the workflow we humans already had: it works straight against files, uses your computer's tools, and is the right amount of insane. It's not an MCP routing calls through a server, it's not a browser agent clicking buttons as if it were a human, it's not a multitude of tools wrapping a product's thousands of REST APIs, and it most certainly is not a highly governed tool that gives you certainty nothing will go wrong. I think all of us who have been building agents for a while have felt it: as the models improve and you remove your tools, barriers and orchestration, the agent becomes more reliable and effective. The idea is fairly simple. Give an agent bash and a filesystem and it already knows what to do. That knowledge, like reflexes for humans, is baked in at training scale.
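The "bash plus filesystem" loop really is that small. Here's a minimal sketch of it; the model call is stubbed with a scripted policy (an assumption for illustration, not a real LLM API) so the loop itself runs offline. The auto-approve part is the one line that executes the command with no permission prompt.

```python
# A minimal sketch of the agent loop: give the model bash and a working
# directory, run whatever it says, feed the output back. The LLM is faked
# with a scripted policy so this is runnable without any API.
import subprocess
import tempfile

def run_bash(command: str, cwd: str) -> str:
    # The only tool the agent gets: a shell on the machine where the work lives.
    result = subprocess.run(command, shell=True, cwd=cwd,
                            capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

def make_fake_model():
    # Stand-in for an LLM call: writes a file, reads it back, then stops.
    script = iter(["echo 'revenue: 150' > metrics.txt",
                   "cat metrics.txt",
                   "DONE"])
    return lambda observation: next(script)

def agent_loop(model, workdir: str) -> str:
    observation = ""
    while True:
        action = model(observation)
        if action == "DONE":
            return observation
        # Auto-approve: no diff review, no permission prompt, just run it.
        observation = run_bash(action, cwd=workdir)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(agent_loop(make_fake_model(), d))
```

Everything product-specific is gone: no tool registry, no API wrapper, no middleware. The loop gets more capable as the model does, without the harness changing.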

The data industry has technical debt, conflicts of interest and fear. There's a myriad of closed products mapping their existing APIs into tools and building agents that use their product the same way human users do: through the front door, one endpoint at a time. This isn't a bad implementation of a good idea; it's a bad mental model, driven by that technical debt, those conflicts of interest and that fear. The technical debt comes from the way they've built their products: the further back they were founded, the more noticeable it is and the harder it will be to pay off. The conflict of interest comes from the fact that they've already placed their bet and are actively reaping the benefits. Take Looker: the bet was LookML, and LookML makes them money, so getting any agent to write code that doesn't use LookML clearly conflicts with a fundamental truth about their business. It's Kodak and the digital camera all over again. The fear is driven more by data teams than by the companies making data products. It's a fear of data inaccuracies, as if humans had never produced an incorrect analysis before. This fear is what has made governance layers so widespread: pass every interaction with your warehouse through a piece of middleware that ensures accuracy but is a pain in the ass to stand up and maintain. This is the "ask for permission at every step" fear, and it makes agents second-class citizens. They can interact, but only through the hoops the product designed for its human workflows and fears. Logic and workflows that live in a file don't have this problem; any agent already knows how to read, modify and run them.

I read a blog post recently that reminded me of this. Someone was genuinely excited to share how cool it was that they were using Claude Code to manage a certain product's semantic layer. They would write the YAML locally with Claude Code, push it to the product via API, and then have to check in the UI to see if anything broke. Not only did the API not have the decency to tell you whether the YAML you just pushed was syntactically correct, there was also no way to run a validation query through it to make sure the mapping was accurate. Once they found an error, they'd copy it from the UI, paste it into Claude Code, rinse and repeat. I don't know about you, but that sounds awfully similar to what programming with ChatGPT was like… absolutely abysmal, and safe. This is what happens when an old architecture is quickly retrofitted to "work with Claude Code": you make a part of the product code-accessible, but the seam creates an absolute workflow mess. Shit workflows and context switching are the symptom; logic trapped behind a proprietary boundary is the disease.

What should we build instead? The same thing as with software engineering: an agent with access to a machine with some files, and people willing to do things that might feel insane at first. The same way Claude Code doesn't call GitHub's API to write a file, it just writes one, an analytics agent shouldn't call a BI tool's API to update a metric. It should write the file. The machine and the filesystem provide the right infrastructure for agents to read, search, modify and execute without having to cross a product boundary. Your metrics live in a file, your notebook is a file, and your logic is searchable, editable and runnable from a terminal. Auto-approving, letting the agent loose to do its thing, is what makes this feel like the future. The insane part isn't just letting the agent loose; it's no longer gating access to analysts and scientists only, and letting regular business folk use these powerful agents. With this comes a need for good semantic context, with strong naming, documentation and structure, and there is room in there for governed layers that ensure accuracy in certain scenarios, just not as a requirement for every single execution. This is what an analytics codebase looks like: the workspace for data work, designed the same way a software project is.
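To make "your metrics live in a file" concrete, here's a hedged sketch of what one metric in such a codebase could look like. The file name, schema and parser are all assumptions for illustration; in a real repo the definition would sit in something like metrics/revenue.yaml, and the key property is that any agent (or human) can read it, edit it, and validate it by running it from the terminal. An in-memory SQLite database stands in for the warehouse so the sketch is self-contained.

```python
# Hypothetical sketch: a metric defined as a plain text file, validated by
# simply executing it. The schema and file layout are assumptions, not any
# particular product's format.
import sqlite3
import textwrap

# In an analytics codebase this would be its own file, e.g. metrics/revenue.yaml;
# inlined here so the sketch runs as-is.
METRIC_FILE = textwrap.dedent("""\
    name: total_revenue
    description: Sum of completed order amounts
    sql: SELECT SUM(amount) FROM orders WHERE status = 'completed'
""")

def parse_metric(text: str) -> dict:
    # Minimal key: value parsing to avoid a YAML dependency in the sketch.
    metric = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        metric[key.strip()] = value.strip()
    return metric

def run_metric(metric: dict, conn: sqlite3.Connection) -> float:
    # The metric is just SQL in a file: the cheapest validation is running it.
    return conn.execute(metric["sql"]).fetchone()[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
    conn.execute("CREATE TABLE orders (amount REAL, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(100.0, "completed"), (50.0, "completed"),
                      (999.0, "cancelled")])
    metric = parse_metric(METRIC_FILE)
    print(run_metric(metric, conn))  # prints 150.0
```

Contrast this with the semantic-layer story above: no push-and-pray API round trip, no checking a UI for breakage. The definition, the edit and the validation all happen in one place, with tools both the agent and grep already understand.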

It took some time, but software engineering took the leap. Data is still reviewing every diff; that won't last. Some startups are already building in this direction, and some data teams are taking the leap. The tools and teams that win won't be the ones that make agents safer by weakening them. They'll be the ones with machines, files, and the right amount of insanity.
