The layer is coupled to the model
Everything above is tuned to one model's view of the world: gemini-3-flash, at low reasoning. The layer is not generic. Hand it to a different model and it might do fine (90%, 98%, who knows). But the edge-case tuning that bought the last few points to 100% is 100% specific to the model I tuned against. The LLM, the data, and the layer are a single system, and the layer is the artifact where they meet. You cannot pull the three apart.
The coupling itself isn't new; I said as much up top. What's easy to miss is that this time it's hidden. LookML and DAX wore the lock-in on the outside; you always knew which tool you'd married. The new layer is just plain text (no DSL, no proprietary modeling language), so it looks free, but it's not. You won't feel the binding until the model changes underneath you, or until a newer model that is faster, cheaper, or more accurate drops and you want to switch.
That's just the shape of an AI-native semantic layer. It's not a problem to solve, but it has consequences I didn't fully appreciate until I was building on top of them.
You can't let users freely pick models. If finance runs one model and marketing runs another, they get different answers from the same data and the same layer. Consistency was the entire reason semantic layers exist. So you own the model interface, or you maintain one layer per model. There's no third option where everyone picks their favorite and the numbers still agree.
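One way to picture "owning the model interface" is to store the model next to the layer and refuse anything else at request time. This is a minimal sketch, not what I actually run: `LayerManifest`, `ModelMismatch`, and `resolve_model` are hypothetical names, and the only values taken from this post are the model and reasoning setting the layer was tuned against.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerManifest:
    """Hypothetical record of what one layer version was tuned against."""
    layer_version: str
    tuned_model: str
    reasoning_effort: str


class ModelMismatch(Exception):
    """Raised when a caller asks for a model this layer was not tuned for."""


def resolve_model(manifest: LayerManifest, requested_model: Optional[str] = None) -> str:
    """Return the model to serve with, rejecting any per-user override."""
    if requested_model is not None and requested_model != manifest.tuned_model:
        raise ModelMismatch(
            f"layer {manifest.layer_version} is tuned for "
            f"{manifest.tuned_model}, not {requested_model}"
        )
    return manifest.tuned_model


# Illustrative manifest; the version string is made up.
MANIFEST = LayerManifest(
    layer_version="v1",
    tuned_model="gemini-3-flash",
    reasoning_effort="low",
)

if __name__ == "__main__":
    print(resolve_model(MANIFEST))
```

The other branch of the choice, one layer per model, would amount to keeping several manifests like this, each with its own tuned layer behind it.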
Versioning is a moving target. This is the strange one to me. A third-party model can change underneath you — Opus shifts behavior through system-prompt updates fairly often — with no change to your data and no change to your layer. Questions that were right start coming back differently. There's no way to detect this drift except by continuously running the eval. And there's no "roll back," because the thing that changed isn't yours. The layer is a living thing you run and re-tune, not a definition you write once and freeze.
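In practice, "continuously running the eval" can be as plain as replaying recorded questions on a schedule and diffing the answers against what they were when the layer was last tuned. The sketch below assumes exact-match comparison on the final answer, which is a simplification; `EvalCase`, `find_drift`, `pass_rate`, and the stub model are all hypothetical, and the questions and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected: str  # answer recorded the last time the layer was tuned


def find_drift(cases: List[EvalCase], ask: Callable[[str], str]) -> List[str]:
    """Return the questions whose current answer differs from the recorded one."""
    drifted = []
    for case in cases:
        if ask(case.question).strip() != case.expected.strip():
            drifted.append(case.question)
    return drifted


def pass_rate(cases: List[EvalCase], ask: Callable[[str], str]) -> float:
    """Fraction of cases that still match their recorded answer."""
    if not cases:
        return 1.0
    return 1 - len(find_drift(cases, ask)) / len(cases)


# Invented baseline, plus a stand-in for the hosted model after a silent update.
CASES = [
    EvalCase("revenue last quarter", "1200"),
    EvalCase("active users in March", "340"),
]


def stub_model_after_update(question: str) -> str:
    answers = {"revenue last quarter": "1200", "active users in March": "352"}
    return answers[question]


if __name__ == "__main__":
    print(find_drift(CASES, stub_model_after_update))
    print(pass_rate(CASES, stub_model_after_update))
```

Nothing in the data or the layer changed between the baseline and this run, so any question that shows up in the drift list points at the model. In a real setup `ask` would call the full pipeline (retrieval, layer, model, SQL execution) rather than a dictionary.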
This is an argument for owning the model. Not because a smaller open-source model is smarter (it's not), but because it's controllable. It won't change out from under you on someone else's release schedule. That's a real reason to consider running your own, and it's a genuine trade-off to weigh against just using the best hosted model and re-tuning when it moves.
I want to be careful here, because I've spent a while arguing that frontier models are basically commodities for SQL work — pick any of them, you land in the same place. Both things are true, at different layers. The recipe is portable: hierarchical retrieval, an LLM-authored layer, a scriptable refinement loop — that works on any model. The tuned artifact is coupled. The serving model is a cheap, swappable commodity right up until you swap it, at which point you're not changing a setting, you're re-running the loop. You should pick the model deliberately.