A Practical Guide to Context Management for Data Agents
2026/04/23

TL;DR: Data agents can query your warehouse in plain English, but without context they return inconsistent answers. English-based context outperforms semantic models for accuracy and ease of maintenance. SQL snippets speed up specific query patterns. Start with a blank context file and build it up as you find gaps.
Why context matters for data agents
Data agents let you query a data warehouse in plain English, but the same question asked ten times can produce ten different answers. LLMs don't understand your business definitions, data semantics, or domain logic unless you spell them out. This session covers what good context looks like and how to manage it as your team grows.
Three formats compared: English, semantic models, and SQL
Using benchmark data from the Dabstep sample dataset, the session tests three context formats against each other. English context — plain definitions and schema explanations — gets full accuracy because LLMs handle natural language best. Semantic models like Cube, LookML, and Omni perform worst because LLMs weren't trained on those structured languages. SQL context, using question-answer pairs, matches English on accuracy and runs faster for queries that resemble the provided examples.
Write context in English, add SQL snippets for frequent or complex queries, and if you already have a semantic model, translate it to English rather than feeding it to an LLM raw.
Writing effective context
Good context adds information the LLM can't guess from column names. Don't restate the obvious. Describing "company_name" as "the name of the company" wastes tokens and adds nothing. Define business-specific terms and clarify ambiguous metrics instead. Negative prompting ("don't use average") tends to backfire on current models, making them use the wrong function more often. Frame instructions positively. Database views work well as SQL context because LLMs can both reference and introspect them through the information schema.
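As a sketch, a context file following these guidelines might look like the following (the table names, metric definitions, and thresholds are invented for illustration):

```markdown
# Context for the sales warehouse (illustrative names)

## Business definitions
- **Revenue**: sum of `orders.amount`, excluding rows where `status = 'refunded'`.
  Shipping fees are not revenue.
- **Active customer**: a customer with a login event in the last 90 days,
  not merely anyone with an account.

## Metric guidance
- For "typical order size", use the median of `orders.amount`; the
  distribution is heavily skewed by a few enterprise deals.
```

Note that the last entry frames its instruction positively ("use the median") rather than negatively ("don't use average"), and none of the entries restate what the column names already say.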
Scaling context across your organization
The session breaks adoption into three stages. Solo explorers can start with a blank context file and build it up as they find gaps. Teams need shared context through GitHub sync, MotherDuck's MCP server, or tools like Virgil, plus lightweight ownership over who updates what. Organizations supporting both explorers and pedestrians — non-technical users who expect reliable answers — need mandatory test suites, granular access control, and alerts when questions fall outside test coverage. For more on building analytics agents with MotherDuck, see the docs.
FAQs
What is context management for data agents?
Context management means giving your data agent enough background — business definitions, how your tables relate to each other, what "revenue" actually means at your company — so it writes correct SQL instead of guessing.
Without that background, you get a familiar problem: ask the same question twice, get two different answers. The model doesn't know your org excludes refunds from revenue, or that "active customer" means someone who logged in within 90 days, not just anyone with an account. So it improvises, and it improvises differently each time.
Get the context right and the agent becomes useful for real business decisions. Get it wrong and you're debugging hallucinated joins at 11pm.
What's the best format for providing context to data agents — English, SQL, or semantic models?
LLMs were trained on massive amounts of natural language, so they parse English definitions well. Plain English descriptions of your data model give the best accuracy and are the easiest to maintain.
Adding SQL context — sample queries for your most common questions — matches English on accuracy and runs faster because the model spends less time exploring. Semantic model formats like Cube or LookML tend to do worse; LLMs just haven't seen enough of those languages to handle them reliably.
Write your definitions in plain English and include SQL snippets for the queries people actually ask.
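A SQL context file can pair a common question with a verified query, and can also define a view the agent is able to both reference and introspect. A minimal sketch, with invented table and column names:

```sql
-- Question: "What was monthly revenue last quarter?"
-- Verified answer pattern:
SELECT date_trunc('month', order_date) AS month,
       SUM(amount) AS revenue
FROM orders
WHERE status != 'refunded'
  AND order_date >= '2026-01-01' AND order_date < '2026-04-01'
GROUP BY month
ORDER BY month;

-- A view that encodes the revenue definition once; the agent can
-- discover its columns through the information schema.
CREATE OR REPLACE VIEW recognized_revenue AS
SELECT order_id, order_date, amount
FROM orders
WHERE status != 'refunded';
```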
How should I start adding context for data agents?
Start with no context. Run questions against your data, see where the LLM gets answers wrong, and add context to fix those specific failures. Focus on what the LLM can't figure out from your column and table names alone — how your business defines a metric, which tables join to which, what "active" actually means in your schema. Don't dump everything you have into the context file. Irrelevant context adds noise and slows things down. You can start with MotherDuck's MCP server or a markdown file on your machine.
How do I scale context management from one analyst to a full team?
Share your context file through GitHub or a shared workspace like MotherDuck. Pick someone to own keeping it current. Once non-technical users start relying on your data agents, add test cases that check answers against known correct results. Track questions that fall outside your test coverage so you can expand it. If your context files grow unwieldy, automate some of the curation.
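One lightweight way to express such test cases is as question/reference-query pairs: a scheduled job runs the agent on each question, runs the hand-verified reference SQL, and alerts when the two diverge. A sketch with an illustrative schema:

```sql
-- Test case 1
-- Question: "How many active customers do we have?"
-- Reference query (hand-verified against the warehouse):
SELECT COUNT(DISTINCT customer_id) AS active_customers
FROM logins
WHERE login_at >= current_date - INTERVAL 90 DAY;

-- Test case 2
-- Question: "What was total revenue in Q1?"
-- Reference query (hand-verified against the warehouse):
SELECT SUM(amount) AS q1_revenue
FROM orders
WHERE status != 'refunded'
  AND order_date >= '2026-01-01' AND order_date < '2026-04-01';
```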
Can I use my existing semantic layer as context for data agents?
You can, but you'll get better results if you translate it to English first. Semantic modeling languages like Cube and LookML predate LLMs, and LLMs aren't great at interpreting them. If you feed your raw semantic model straight to an LLM, you'll get worse answers than if you write out the same definitions in plain English. It's an extra step, but converting your semantic model into natural language descriptions makes a real difference in response accuracy.
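As a rough illustration of the translation step, here is a LookML-style measure alongside the English rendering you would put in the context file instead (the measure itself is invented):

```text
LookML-style definition:
  measure: gross_margin {
    type: number
    sql: (${total_revenue} - ${total_cost}) / NULLIF(${total_revenue}, 0) ;;
  }

English translation for the context file:
  Gross margin is total revenue minus total cost, divided by total revenue,
  expressed as a fraction; treat it as undefined when revenue is zero.
```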