Nikolay Alexandrov

Engineering

How to Enable Self-Service Analytics

If You're Already Anthropic

When the time came for another issue of my Querio-mandated blog post, the impossible finally happened. Anthropic dropped this bombshell into the villa and I began packing my stuff and saying my goodbyes. As I wiped the tears from my eyes it occurred to me to break with procedure and actually read the article that I am furiously sharing on LinkedIn. Maybe I'm getting old.

As a quick summary, Anthsloppic suggests:

  1. Clean up your goddamn dimensional model! Lazy!

  2. No lunch today - build a semantic layer, do not skimp on the metadata - it's very important.

  3. Write skills to document processes. Two are enough - a knowledge skill contains literally all your knowledge, so don't forget anything. The runbook skill will capture common analysis patterns (retention, funnels, etc). Easy!

  4. To keep all of that consistent...ermmmm... move everything into one repo and set up some pre-commit hooks (GitHub Actions checks if you don't feel like using your M4 MacBook today for anything more than 50 Chrome tabs).

  5. Build evals - fire a question with a known answer, and make sure you have a snapshot of the data so the eval doesn't drift. Alternative: grade the SQL directly, by using up more tokens.

  6. To increase accuracy by 6%, spend more tokens and wait 2x the time.

Thanks, Anthropic! Let's break it down.

step one: be anthropic

Brushing aside the fear of stating the obvious: neither Anthropic nor another LLM lab is solving the substrate problem for your data org. Success following this process requires an ambitious amount of refactoring, discipline and more than a few extra tasks.

bad news: you own the hard part

We have worked with many a warehouse and database here at Querio, and they come in a lot of beautiful shapes and sizes. Some databases are documented only in Mandarin; some are still pipelining into outdated fact tables when you're supposed to use NEW___X. There is no LLM-shaped solution to these problems. Standing up and maintaining a robust dimensional model may have become slightly less nightmarish with dbt, but it's far from perfect. You and I have seen it.

An effective semantic layer requires owned definitions, baked-in grain and joins, canonical segment filters, a compile-to-SQL step, and tests against the blessed dashboard - then a human to re-bless all of it every time the business redefines a word. Anthropic tried to auto-generate this with an LLM and admits it failed.
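To make that shopping list concrete, here is a minimal sketch of what one "owned definition with baked-in grain, canonical filters, a compile-to-SQL step and a test against the blessed dashboard" might look like. Everything in it is an assumption for illustration: the `Metric` class, the `compile_sql` function, the `orders` table and the "blessed" figures are made up, joins are left out to keep it short, and it is not how Anthropic or any particular semantic-layer tool does it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One owned definition: what to aggregate, at what grain, with which filters."""
    name: str
    owner: str                     # the human who re-blesses this when a word changes
    table: str
    expression: str                # e.g. "SUM(amount)"
    grain: tuple[str, ...]         # columns the metric is reported by (at least one)
    filters: tuple[str, ...] = ()  # canonical segment filters, ANDed together


def compile_sql(metric: Metric) -> str:
    """Compile a metric definition into a single SELECT statement."""
    group_cols = ", ".join(metric.grain)
    where = f" WHERE {' AND '.join(metric.filters)}" if metric.filters else ""
    return (
        f"SELECT {group_cols}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table}{where} "
        f"GROUP BY {group_cols} ORDER BY {group_cols}"
    )


# Hypothetical warehouse table with one test order that must never count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (month TEXT, amount REAL, is_test INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2024-01", 100.0, 0),
        ("2024-01", 50.0, 0),
        ("2024-01", 999.0, 1),
        ("2024-02", 70.0, 0),
    ],
)

revenue = Metric(
    name="revenue",
    owner="finance",
    table="orders",
    expression="SUM(amount)",
    grain=("month",),
    filters=("is_test = 0",),
)

rows = conn.execute(compile_sql(revenue)).fetchall()

# Hypothetical numbers copied off the dashboard everyone already trusts.
blessed_dashboard = [("2024-01", 150.0), ("2024-02", 70.0)]
assert rows == blessed_dashboard, "metric no longer matches the blessed dashboard"
```

The code is the easy part. The `owner` field and the `blessed_dashboard` list are where the actual work hides, because somebody has to keep both true.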

From what we have seen, the feedback loop between the business and data teams is an organisational boundary that is difficult to cross, and the difficulty scales with org size. Putting everything in a single repo and slapping pre-commit checks on it is great, and I'm all for it, but it will not solve the procedural and communication silos in your org.

Reminder, Anthropic still washes its hands of all of these problems.

some assembly required

The most masterful feint in the article comes when it sidesteps all the additional things you have to build just to enable self-serve the way it recommends. The author claims impressive accuracy gains on a set of evaluations, but after reading that sentence something is staring you and me in the face: evaluations of what, exactly? A programmable harness, of course.

Claude.ai chat (Team and Enterprise included) has no scripted path. The only programmable surfaces are Claude Code headless (claude -p --output-format json), the Agent SDK, or the raw API.

The braggadocious accuracy numbers require a test runner, a grader, a telemetry table and CI wiring that you build. Around the harness that you also build.
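For a sense of scale, here is roughly the smallest version of that runner, grader and telemetry table I can imagine, as a sketch under loud assumptions. The `fake_agent` function is a stand-in for whatever harness you end up building (it is where a call to a headless CLI or an API would go), and the snapshot data, the eval cases and the `eval_runs` table are all invented for the example.

```python
import sqlite3
import time
from typing import Callable


def make_snapshot() -> sqlite3.Connection:
    """A frozen copy of the data, so known answers do not drift. Data is made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signups (day TEXT, plan TEXT)")
    conn.executemany(
        "INSERT INTO signups VALUES (?, ?)",
        [("2024-03-01", "free"), ("2024-03-01", "pro"), ("2024-03-02", "pro")],
    )
    # Telemetry table: one row per question asked during an eval run.
    conn.execute(
        "CREATE TABLE eval_runs (question TEXT, sql TEXT, passed INTEGER, seconds REAL)"
    )
    return conn


# Questions with known answers against the snapshot above (hypothetical).
EVAL_CASES = [
    {"question": "How many pro signups?", "expected": [(2,)]},
    {"question": "How many signups in total?", "expected": [(3,)]},
]


def run_evals(conn: sqlite3.Connection, agent: Callable[[str], str]) -> float:
    """Test runner plus grader: ask, execute the returned SQL, compare, log."""
    passed = 0
    for case in EVAL_CASES:
        started = time.perf_counter()
        sql = agent(case["question"])
        try:
            actual = conn.execute(sql).fetchall()
        except sqlite3.Error:
            actual = None  # SQL that does not run counts as a failure
        ok = actual == case["expected"]
        conn.execute(
            "INSERT INTO eval_runs VALUES (?, ?, ?, ?)",
            (case["question"], sql, int(ok), time.perf_counter() - started),
        )
        passed += int(ok)
    return passed / len(EVAL_CASES)


def fake_agent(question: str) -> str:
    """Stand-in for the real harness call; deliberately wrong on one question."""
    if "pro" in question:
        return "SELECT COUNT(*) FROM signups WHERE plan = 'pro'"
    return "SELECT COUNT(DISTINCT day) FROM signups"  # counts days, not signups


conn = make_snapshot()
accuracy = run_evals(conn, fake_agent)
print(f"accuracy: {accuracy:.0%}")
```

A CI job would fail the build when `accuracy` drops below a threshold you pick. Note that none of this exists until you write it, which is rather the point.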

It's the same story for serving the insights: the IDE is the only surface you get for free. The Slack and Teams "claude" apps are generic chat: not your skills, not your connectors, not evaluable. Every real serving surface is a bot you build, maintain and wire to the one harness you also built. The "sync to marketplace + blobs + MCP" section is duct tape stretched over a missing product (if only there were such a product!).

business context ghost

Another favourite of mine is the "company knowledge graph" (indexed docs, roadmaps, decision logs, org chart). This ethereal magic thing is doing all the lifting: resolving what the user actually meant by "Q2 launch". Short of dropping an LLM lab across every one of your org's communication channels and watching it blow up its own context window and begin replying in Turkish I am not quite sure what to do here. But I do know that it will require the same patience, discipline and headcount to maintain a trustworthy and up-to-date "company knowledge graph".

we infer, you maintain

The LLM lab's business model is selling you tokens. Suppressing the chuckles and assuming for a moment that your neighbourhood LLM lab's valuation isn't inflated up the wazoo, it can only be justified to investors as labour replacement. Let's trace just what kind of labour replacement is happening here.

the new deal

For data models with high churn, LLM accuracy will rot in weeks. Naturally, CI guardrails can help stop the rot reaching prod, but they also carry a subtle workflow change: someone has to maintain the bag of domain documentation and skills to keep the model effective. That is an unavoidable tax if you're opting into involving an LLM in your data org.
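One plausible shape for such a guardrail, sketched under stated assumptions: skill docs mention columns as backticked `table.column` references (a convention I am inventing here), and a check compares them against the current schema so that a renamed or dropped column fails the build instead of quietly poisoning answers. The `stale_references` function, the schema and the doc text are all hypothetical.

```python
import re

# Matches backticked references of the form `table.column` (assumed convention).
COLUMN_REF = re.compile(r"`([a-z_][a-z0-9_]*)\.([a-z_][a-z0-9_]*)`")


def stale_references(doc_text: str, schema: dict[str, set[str]]) -> list[str]:
    """Return documented table.column references that no longer exist in the schema."""
    stale = set()
    for table, column in COLUMN_REF.findall(doc_text):
        if column not in schema.get(table, set()):
            stale.add(f"{table}.{column}")
    return sorted(stale)


# Hypothetical current state of the warehouse.
schema = {"orders": {"order_id", "amount", "created_at"}}

# Hypothetical skill doc that was accurate a few sprints ago.
skill_doc = (
    "Revenue is the sum of `orders.amount`. "
    "Exclude rows where `orders.is_test` is set. "
    "Join to `customers.id` for segments."
)

problems = stale_references(skill_doc, schema)
print(problems)  # a pre-commit hook or CI step would exit non-zero if this is non-empty
```

This catches the rot, it does not fix it. Somebody still has to open the doc and write down what the business means now.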

Claude Code will likely manage to fill this role reasonably well. Externalising the lore out of analysts' heads and into governed artefacts is exactly the right move. The swerve is pretending those artefacts are just incidental prompt fuel, rather than a first-class layer that needs its own workflow, surface and ownership model. Anthropic gets to publish rough guidelines and keep the roadmap; practitioners get to build the surfaces, harnesses and maintenance loops those guidelines quietly imply, at least until Anthropic decides to package that missing layer up and sell it back to them. That is the move this article keeps making.

to infer is to…err

The article's stack is optimised for a narrow class of work: natural language questions over governed data. As we've learned, it's not enough to, well, govern the data; you're required to build a layer on top that exposes this data downstream to the LLM for digestion. After that the magic of LLM inference begins beating us down with its light.

It may just be me, but once the work involves repeatable, inspectable, shareable artifacts across people and time, the article gets a lot less helpful. The more the work looks like a product rather than a chat conversation, the less it has to say.

This is where a missing work surface starts to matter. The moment a one-off question has to become something durable - revised, shared, scheduled, watched for drift - it needs somewhere to live. Someone has to build that place, and that someone is you.

data is not software, allegedly

The article really wants you to believe data is not software. Okay. And listen, I get it. Code lives in an open solution space: there are multiple correct paths through. Analytics is either right or it's wrong because of the relational algebra or something.

Oddly, right after this statement the article instructs us to be more diligent about expressing our data in software then quietly implies we build more software for the model to have a good time. It seems to this author that the escape from inference hell is not to accept that data is different to software, but to drag more of the data workflow into software - a generational effort that has long been underway.

The more I think about it, "data is not software" starts to sound like an excuse once you notice that every accuracy step-change in the article comes from dragging more of the data work into software. The model works when the repo is clean, everything is defined, some poor bastard is updating the skill definitions on the daily, the evals are wired and paid for, the harness has no bugs. The triumph of inference in the flesh.

To be fair to the labs, people writing software were foolish enough to share it publicly for free for 30 years. Data work does not have such vast public exhaust. Its artifacts are private, political and scattered across internal tools. "Data is not software" sounds profound until you notice it also neatly describes a domain where the labs have far less high-quality exposure than they do in code.

there is no work surface

The more reliable Anthropic's setup becomes, the less it looks like "just ask Claude" and the more it looks like a real governed workflow: cleaned-up models, semantic definitions, skill docs, evals, review loops, CI, ownership. I really did hope an LLM would take my job so I could learn gardening! Until then I have to ask: where exactly is all of this work supposed to live?

I don't know much but I do know that syncing markdown around and letting users stitch the rest together is probably not the answer.

To be fair, the article hints at the solution. Externalising business logic into governed artefacts is the right move. But those artefacts are not incidental prompt fuel. They need a first-class work surface of their own: somewhere to onboard onto, to maintain, to test, to review, to serve and evolve the actual workflow. Anthropic describes that workflow in enough detail to make the need obvious, then leaves the product as an exercise for the reader.

I may suggest some answers to this exercise next month, or whenever I am forced to write another blog post. Until then, I know of this one product that has given this workflow a home. Now what was the name again?
