A Practical Guide to Context Management for Data Agents
2026/04/23

TL;DR: Data agents can query your warehouse in plain English, but without context they return inconsistent answers. English-based context outperforms semantic models for accuracy and ease of maintenance. SQL snippets speed up specific query patterns. Start with a blank context file and build it up as you find gaps.
Why context matters for data agents
Data agents let you query a data warehouse in plain English, but the same question asked ten times can produce ten different answers. LLMs don't understand your business definitions, data semantics, or domain logic unless you spell them out. This session covers what good context looks like and how to manage it as your team grows.
Three formats compared: English, semantic models, and SQL
Using benchmark data from the Dabstep sample dataset, the session tests three context formats against each other. English context — plain definitions and schema explanations — gets full accuracy because LLMs handle natural language best. Semantic models like Cube, LookML, and Omni perform worst because LLMs weren't trained on those structured languages. SQL context, using question-answer pairs, matches English on accuracy and runs faster for queries that resemble the provided examples.
Write context in English, add SQL snippets for frequent or complex queries, and if you already have a semantic model, translate it to English rather than feeding it to an LLM raw.
Writing effective context
Good context adds information the LLM can't guess from column names. Don't restate the obvious. Describing "company_name" as "the name of the company" wastes tokens and adds nothing. Define business-specific terms and clarify ambiguous metrics instead. Negative prompting ("don't use average") tends to backfire on current models, making them use the wrong function more often. Frame instructions positively. Database views work well as SQL context because LLMs can both reference and introspect them through the information schema.
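As a sketch, a context file following these guidelines might look like the following (the table names, metric definitions, and thresholds are invented for illustration):

```markdown
# Context for the sales warehouse (illustrative names)

## Business definitions
- **Revenue**: sum of `orders.amount`, excluding rows where `status = 'refunded'`.
  Shipping fees are not revenue.
- **Active customer**: a customer with a login event in the last 90 days,
  not merely anyone with an account.

## Metric guidance
- For "typical order size", use the median of `orders.amount`; the
  distribution is heavily skewed by a few enterprise deals.
```

Note that the last entry frames its instruction positively ("use the median") rather than negatively ("don't use average"), and none of the entries restate what the column names already say.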
Scaling context across your organization
The session breaks adoption into three stages. Solo explorers can start with a blank context file and build it up as they find gaps. Teams need shared context through GitHub sync, MotherDuck's MCP server, or tools like Virgil, plus lightweight ownership over who updates what. Organizations supporting both explorers and pedestrians — non-technical users who expect reliable answers — need mandatory test suites, granular access control, and alerts when questions fall outside test coverage. For more on building analytics agents with MotherDuck, see the docs.
FAQs
What is context management for data agents?
Context management means giving your data agent enough background — business definitions, how your tables relate to each other, what "revenue" actually means at your company — so it writes correct SQL instead of guessing.
Without that background, you get a familiar problem: ask the same question twice, get two different answers. The model doesn't know your org excludes refunds from revenue, or that "active customer" means someone who logged in within 90 days, not just anyone with an account. So it improvises, and it improvises differently each time.
Get the context right and the agent becomes useful for real business decisions. Get it wrong and you're debugging hallucinated joins at 11pm.
What's the best format for providing context to data agents — English, SQL, or semantic models?
LLMs were trained on massive amounts of natural language, so they parse English definitions well. Plain English descriptions of your data model give the best accuracy and are the easiest to maintain.
Adding SQL context — sample queries for your most common questions — matches English on accuracy and runs faster because the model spends less time exploring. Semantic model formats like Cube or LookML tend to do worse; LLMs just haven't seen enough of those languages to handle them reliably.
Write your definitions in plain English and include SQL snippets for the queries people actually ask.
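A SQL context file can pair a common question with a verified query, and can also define a view the agent is able to both reference and introspect. A minimal sketch, with invented table and column names:

```sql
-- Question: "What was monthly revenue last quarter?"
-- Verified answer pattern:
SELECT date_trunc('month', order_date) AS month,
       SUM(amount) AS revenue
FROM orders
WHERE status != 'refunded'
  AND order_date >= '2026-01-01' AND order_date < '2026-04-01'
GROUP BY month
ORDER BY month;

-- A view that encodes the revenue definition once; the agent can
-- discover its columns through the information schema.
CREATE OR REPLACE VIEW recognized_revenue AS
SELECT order_id, order_date, amount
FROM orders
WHERE status != 'refunded';
```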
How should I start adding context for data agents?
Start with no context. Run questions against your data, see where the LLM gets answers wrong, and add context to fix those specific failures. Focus on what the LLM can't figure out from your column and table names alone — how your business defines a metric, which tables join to which, what "active" actually means in your schema. Don't dump everything you have into the context file. Irrelevant context adds noise and slows things down. You can start with MotherDuck's MCP server or a markdown file on your machine.
How do I scale context management from one analyst to a full team?
Share your context file through GitHub or a shared workspace like MotherDuck. Pick someone to own keeping it current. Once non-technical users start relying on your data agents, add test cases that check answers against known correct results. Track questions that fall outside your test coverage so you can expand it. If your context files grow unwieldy, automate some of the curation.
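One lightweight way to express such test cases is as question/reference-query pairs: a scheduled job runs the agent on each question, runs the hand-verified reference SQL, and alerts when the two diverge. A sketch with an illustrative schema:

```sql
-- Test case 1
-- Question: "How many active customers do we have?"
-- Reference query (hand-verified against the warehouse):
SELECT COUNT(DISTINCT customer_id) AS active_customers
FROM logins
WHERE login_at >= current_date - INTERVAL 90 DAY;

-- Test case 2
-- Question: "What was total revenue in Q1?"
-- Reference query (hand-verified against the warehouse):
SELECT SUM(amount) AS q1_revenue
FROM orders
WHERE status != 'refunded'
  AND order_date >= '2026-01-01' AND order_date < '2026-04-01';
```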
Can I use my existing semantic layer as context for data agents?
You can, but you'll get better results if you translate it to English first. Semantic modeling languages like Cube and LookML predate LLMs, and LLMs aren't great at interpreting them. If you feed your raw semantic model straight to an LLM, you'll get worse answers than if you write out the same definitions in plain English. It's an extra step, but converting your semantic model into natural language descriptions makes a real difference in response accuracy.
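As a rough illustration of the translation step, here is a LookML-style measure alongside the English rendering you would put in the context file instead (the measure itself is invented):

```text
LookML-style definition:
  measure: gross_margin {
    type: number
    sql: (${total_revenue} - ${total_cost}) / NULLIF(${total_revenue}, 0) ;;
  }

English translation for the context file:
  Gross margin is total revenue minus total cost, divided by total revenue,
  expressed as a fraction; treat it as undefined when revenue is zero.
```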