MotherDuck as an AI Agent Data Layer: How to Power LLM Workflows with DuckDB and the MCP Server

19 min read
MotherDuck as an AI Agent Data Layer: How to Power LLM Workflows with DuckDB and the MCP Server

TL;DR

  • The MotherDuck MCP Server gives AI agents read and write SQL access to your warehouse through one org-wide setup, with query for reads and query_rw for writes.
  • DuckDB runs in-process with columnar vectorized execution, so agents retrieve context in sub-second time instead of waiting on network round-trips.
  • Hypertenancy isolates compute per agent, and Pulse instances meter per query rather than by wall-clock time, so exploratory or bursty agent queries cost you per query, not per hour of idle compute.
  • AheadComputing cut dashboard turnaround from 3 months to 1.5 hours, and Together AI runs 40 read replicas plus agent workloads without the cost spikes that hit compute-heavy warehouses under agent load.
  • Unlike Snowflake Cortex, MotherDuck stays model-agnostic and supports any model the day it ships.

Why your AI agent needs a data layer (and what goes wrong without one)

An AI agent without a structured data layer invents answers because it has nowhere to look them up. When you ask an agent about last quarter's revenue by region, and it has no warehouse to query, it either hallucinates a plausible-sounding number or gives up. The agent needs a place to retrieve real, current facts on demand, and that place is a database it can query directly.

The second failure shows up when you try to connect agent activity to the rest of your business. Agents generate prompts, responses, and telemetry, but that data usually lands in logs that never touch your revenue tables or user records. You end up unable to answer basic questions like which prompts drive conversions, because the prompt data and the business data live in separate places that never join.

The third failure is cost, and it hits hardest on warehouses that bill for compute time. Agents fire off exploratory queries constantly, most of them small and unpredictable, and a warehouse that spins up compute clusters for each one turns curiosity into a bill. Pablo Ferrari, Director of Data Engineering at Together AI, put it plainly. "On top of compute-heavy warehouses, agent-driven queries would create a serious cost problem. MotherDuck's architecture is an excellent fit for the kind of exploration and unpredictable queries that agents need to be useful."

MotherDuck backed by DuckDB closes all three gaps. Agents query real data through the MCP Server instead of guessing, they write prompt and telemetry data back into the same warehouse that holds your business metrics, and Hypertenancy isolates compute per agent so a burst of exploratory queries is billed per query instead of spiking your compute costs. The rest of this guide shows how each piece works.

What makes DuckDB the right engine for agent context retrieval

DuckDB earns its place in agent pipelines because of three properties, and "it's fast" only describes one of them. When an agent needs context to answer a question, it fires off small, exploratory queries in rapid succession. The engine underneath has to return results in the time a user waits for a chat reply, not the time a batch job takes.

The first property is in-process execution. DuckDB runs inside the same process that issues the query, so there is no network hop to a separate database server before work begins. That matters for agents because a single agent turn might trigger several schema lookups and follow-up queries. Each round-trip to a remote server adds latency the user feels, and eliminating those hops keeps a multi-query reasoning loop feeling instant.

The second property is columnar, vectorized execution. DuckDB stores data by column and processes it in batches rather than row by row, which is exactly the shape analytical questions take. An agent asking "what were error rates by model version last week" scans a few columns across many rows, and columnar layout means the engine reads only the columns it needs. Vectorized processing then runs those scans in tight, cache-friendly batches, which is how DuckDB returns sub-second answers on analytical queries.

The third property is how MotherDuck takes that engine into production. You already know DuckDB is fast on your laptop. MotherDuck runs it as a managed cloud service and scales it both up, with larger instances for heavy queries, and out, across many concurrent workloads. The mechanism that makes this work for agents is Hypertenancy, which gives each agent its own isolated compute.

Isolated compute per agent solves the two problems that kill agentic workloads on shared warehouses. Concurrent agents no longer contend for the same resources, so one agent running an expensive join does not slow down every other agent's queries. Pulse instances take this further by metering per query rather than by wall-clock time, so an agent that fires ten exploratory queries and then goes idle costs you ten queries, not an hour of a running warehouse. That billing model keeps unpredictable agent behavior from turning into runaway wall-clock charges, which is exactly the concern Together AI raised before choosing MotherDuck.

Setting up the MotherDuck MCP Server

The MotherDuck MCP Server is a standard Model Context Protocol endpoint that connects an AI agent directly to your MotherDuck warehouse. Once it's running, agents like Claude, Cursor, Claude Code, and Copilot can inspect your schemas, run queries, create Dives, and write results back. Because it speaks standard MCP, any model that supports the protocol works the day it ships. AheadComputing ran Opus 4.6 through the MotherDuck MCP the morning it released, while the same model wasn't yet available to them in Snowflake Cortex.

Setup happens once for your whole organization. You register the MCP Server against your MotherDuck token, and any authenticated Claude user (web, desktop, or Claude Code) gets warehouse access automatically. There's no per-user provisioning to manage and no separate credentials for each seat. A single admin configuration covers everyone who authenticates against your MotherDuck account.

The two MCP tools: query and query_rw

The MCP Server exposes two tools, and the distinction between them decides how much power you hand an agent. The query tool is read-only. It runs SELECT statements and nothing else, so an agent can explore tables, join context, and pull metrics without any chance of modifying your data. Use query as the default for anything that reads structured context back into a prompt.

The query_rw tool is read-write, and it opens up INSERT, UPDATE, DELETE, CREATE TABLE, and schema changes. That's what lets an agent store embeddings, log LLM telemetry to a table, save a view for later, or build a new table from a transformation it just reasoned through. Grant query_rw when an agent needs to persist its own output, and keep it scoped to agents you actually want writing to production. MotherDuck's write-access guide walks through the patterns for doing this safely.

The safest way to give an agent write freedom is a zero-copy clone. Each agent gets its own isolated playground database cloned from your real data, so it can create, drop, and rewrite tables without touching anything shared. The clone costs nothing to spin up because it references the same underlying storage until the agent writes, which means you can hand every agent a full sandbox and destroy it when the session ends.

Connecting your tools

The connection config is the same regardless of which client you use. You point the client's MCP settings at the MotherDuck endpoint with your token, and Claude, Cursor, Claude Code, or Copilot picks it up on restart. From there the agent sees your warehouse as an available tool and starts issuing SQL on its own. For the full walkthrough, including how MCP fits into a broader ingest-and-analyze loop, MotherDuck's MCP and DuckDB introduction covers the wiring in detail.

Code: querying MotherDuck from an AI agent

The MCP Server exposes two tools your agent calls with SQL. The query tool runs read-only SELECT statements, and query_rw handles writes. Here is what each looks like in practice.

Reading structured context with query

When an agent needs recent conversation history joined to your business data, it calls query with a single SELECT. The following pulls the last day of prompt/response logs and attaches account tier from your metrics table.

Copy code

SELECT l.session_id, l.prompt, l.response, l.created_at, m.account_tier, m.monthly_active_users FROM llm_logs AS l JOIN account_metrics AS m ON l.account_id = m.account_id WHERE l.created_at >= now() - INTERVAL 1 DAY ORDER BY l.created_at DESC LIMIT 50;

Because DuckDB runs this in-process, the agent gets structured context back in well under a second, and the join to business metrics happens in the same query rather than in application code.

Writing telemetry back with query_rw

Agents should record what they did so you can measure quality and cost later. The query_rw tool lets an agent insert telemetry rows into a logging table after each turn.

Copy code

INSERT INTO llm_telemetry (session_id, model, prompt_tokens, completion_tokens, latency_ms, created_at) VALUES ('sess_8842', 'opus-4.6', 1240, 380, 910, now());

The same tool creates tables, so an agent can stand up an embeddings store on first run and populate it as it works.

Copy code

CREATE TABLE IF NOT EXISTS doc_embeddings ( doc_id VARCHAR, chunk TEXT, embedding FLOAT[1536] ); INSERT INTO doc_embeddings VALUES ('doc_017', 'MotherDuck runs DuckDB in the cloud.', [0.021, -0.113, /* ... */]);

Grant query_rw deliberately, since it also permits UPDATE, DELETE, and schema changes. The write-access guide covers the tradeoffs in more detail.

Handing an agent an isolated sandbox

Zero-copy clones give each agent its own database to experiment in without touching production. A clone shares the underlying storage until the agent writes, so it costs nothing to create and isolates any destructive mistakes.

Copy code

CREATE DATABASE agent_sandbox FROM production_warehouse;

Point the agent at agent_sandbox, and it can create tables, rewrite data, and run exploratory queries freely. When it finishes, drop the clone and the experiment disappears with it.

Copy code

DROP DATABASE agent_sandbox;

Pair a clone with query_rw when you want an agent to iterate on schema or transformations, and keep production behind query alone. That split gives you the write-back power these patterns need without exposing your live warehouse to an agent still figuring out what it wants to do.

Architectural patterns: MotherDuck as structured agent memory

Once your agent can read and write SQL, three patterns show up again and again in agentic pipelines. Each one solves a distinct failure mode, and each one has a concrete table shape you can build today.

Warehouse as structured memory

Agents forget everything between sessions, so they re-derive context they already computed or hallucinate facts they never had. A memory table fixes this. Store a agent_memory table keyed on entity_id, fact_type, value, and updated_at, and let the agent query it before it answers rather than reconstructing state from scratch. AheadComputing does a version of this with Agent Skills, markdown files that encode table relationships and metric definitions so Claude Code knows that "latest release model" maps to a specific Unix path instead of guessing. Skills are version-controlled and portable across every model, so the memory travels with the agent, not the vendor.

Event logging for LLM telemetry

Without a log, you cannot see which prompts fail, how much a workflow costs, or whether a model regressed after an update. Give the agent a llm_events table and let it write to it through query_rw. A useful shape carries event_id, session_id, model, prompt_tokens, completion_tokens, latency_ms, cost_usd, and a created_at timestamp. Because MotherDuck runs analytical queries in sub-second time, you can query this table live to spot a spike in token spend or a slow model the moment it happens. The agent logs its own behavior, and you query that behavior with the same engine it uses for everything else.

Joining prompt/response data with business metrics

Telemetry in isolation tells you the agent ran. It does not tell you whether the agent moved a number that matters. The join is where value shows up. Keep your llm_events table alongside your existing orders, signups, or support_tickets tables, and join on a shared key like session_id or user_id. A query that groups conversion rate by the model that handled each session answers a question no telemetry dashboard can. You learn which agent decisions changed the business, not only which ones ran.

Keeping these tables current is where Flights do the work. Flights are agent-native Python pipelines you build, deploy, and schedule from a single prompt, a SQL function, or the MotherDuck UI. Point one at your Postgres tables, an S3 bucket of Parquet, or a HubSpot list, and it refreshes on a schedule with retries and an audit log. Your memory tables stay fresh, your telemetry keeps flowing, and the business tables you join against reflect reality instead of last month. Skills teach the agent what your data means, and Flights keep that data arriving, so the three patterns run as a system rather than a one-time setup.

MotherDuck vs Snowflake Cortex AI for agent workflows

The core difference between MotherDuck and Snowflake Cortex AI is that Cortex is a proprietary agent layer, and MotherDuck exposes a standard MCP endpoint any model can talk to. Cortex decides which models you get access to and when. MotherDuck's MCP Server speaks the Model Context Protocol, so Claude, Cursor, Claude Code, and Copilot all connect the same way, and any model works the day it ships.

That day-one support is not theoretical. When a new Claude model shipped, AheadComputing ran it through the MotherDuck MCP the same morning but couldn't yet use it in Snowflake Cortex. If your agents depend on the strongest available reasoning model to write correct SQL and reason over schemas, waiting on your warehouse vendor to certify each release is a real ceiling on quality.

Cortex also fragments the agent surface by tool type. You provision one agent for natural-language-to-SQL, a separate one for document queries, and separate endpoints for each. MotherDuck gives you a single MCP setup for the whole org. Any authenticated Claude user gets warehouse access automatically, with no per-user provisioning and no separate endpoint to wire up for each capability.

Permissions are where the gap gets painful in practice. An AheadComputing engineer described Snowflake's role model plainly: "For every single thing I wanted to do, I had to grant permission." Every table, every operation, another grant. MotherDuck's query and query_rw tools, combined with zero-copy clones for isolated agent sandboxes, let you give an agent a scoped playground without hand-authoring a permission matrix first.

The cost model is the deciding factor for exploratory agent workloads. Agents run unpredictable queries, and a lot of them. On a compute-heavy warehouse billed by cluster uptime, that pattern gets expensive fast. That was the exact concern Together AI's Pablo Ferrari raised before choosing MotherDuck: compute-heavy warehouses turn agent-driven queries into a serious cost problem. MotherDuck's Hypertenancy isolates compute per agent, and Pulse instances meter per query rather than by wall-clock time, so concurrent and exploratory queries are metered per query instead of forcing you into rate limits and quotas.

Best for

Choose Snowflake Cortex AI if your data already lives in Snowflake and you want a bundled, vendor-managed answer without assembling the pieces yourself. The tradeoff is model choice, setup simplicity, and cost control for agent traffic.

Choose MotherDuck if you want model freedom, one MCP setup instead of an endpoint per tool type, and a cost model that survives the unpredictable query volume agents generate. For teams building agentic pipelines rather than extending an existing Snowflake footprint, MotherDuck gives you lower total cost of ownership and a shorter path to production.

Warehouse options for AI agent workflows: comparison table

Each warehouse handles agent workloads differently, and the gap shows up most in how they price exploratory queries and isolate compute per agent. The table below compares the five options data teams actually shortlist for agentic pipelines.

MotherDuckSnowflake Cortex AIBigQueryRedshiftClickHouse
MCP supportNative, model-agnostic endpointProprietary agent layer, separate endpoints per toolNo native MCPNo native MCPCommunity MCP tooling
Model agnosticismAny model, same-day supportLimited to supported models, release-day lagN/AN/AN/A
Compute isolation per agentHypertenancy plus zero-copy clonesWarehouse-scoped, shared computeSlot-based sharingCluster-scopedShared cluster
Query latency profileSub-second on analytical queriesSecond-to-minutesSecondsSeconds to minutesSub-second, needs SQL rewrites
Cost model for exploratory queriesMetered per query, no idle chargesCompute-heavy, spikes under agent loadPer-byte scannedProvisioned cluster hoursProvisioned nodes
Write-back supportquery_rw for inserts, tables, telemetrySupported with RBAC overheadSupportedSupportedLimited

Best for

  • MotherDuck: developers who want model freedom, sub-second context retrieval, and costs that track usage rather than spiking under exploratory agent load.
  • Snowflake Cortex AI: teams already all-in on Snowflake who want a bundled agent answer and accept the model lag.
  • BigQuery: Google Cloud shops that price by bytes scanned and don't need native MCP yet.
  • Redshift: AWS-centric teams with steady, predictable query volume rather than bursty agent traffic.
  • ClickHouse: raw-speed use cases where you control the SQL and can absorb the rewrites its dialect demands.

Real-world evidence: AheadComputing and Together AI

AheadComputing built its data platform around agents from the first day, and the numbers show why the pattern works. The RISC-V CPU startup put Claude Code in the middle and MotherDuck at the MCP endpoint, not behind a BI dashboard. Hardware engineers ask Claude Code a question, Claude Code queries MotherDuck over MCP, and the answer comes back in the interface the engineer already uses. Dashboard requests that used to take three months take 1.5 hours, and 74% of 121 engineers onboarded within weeks.

AheadComputing evaluated Snowflake first and walked away for concrete reasons. Cortex is a proprietary agent layer that limits which models you can run, and it needs a separate MCP endpoint for each tool type, so natural-language SQL and docs queries live behind different agents. New Claude models weren't available in Cortex on release day the way they were through the MotherDuck MCP. On top of that, RBAC turned every task into a permission grant. MotherDuck gave them one MCP setup where any model works on release day, and that difference decided the evaluation.

The production impact went past dashboards. CI wait times dropped from eight hours to one or two using MotherDuck and Claude Code Dives, and a power-analytics tool that ran up to $50K per six-month license got displaced. Ben Holtzman summed up the shift: "Every meeting now is shifted from trying to find answers to what to do about them."

Together AI picked MotherDuck for a different reason, and the benchmark decided it. The $3.3B AI infrastructure company ran a 100GB TPC-DS test against Redshift, Athena, and ClickHouse, and MotherDuck came out fastest. ClickHouse couldn't run standard ANSI SQL without rewrites, and only 10 to 20 of the TPC-DS queries ran out of the box, so it never made it past the first round.

What matters for agent workloads is what happened after the benchmark. Together AI runs 40 read-scaling replicas alongside 128 Hex users across product teams, the C-suite, and the board, all self-serving on gold tables. Secoda, Hex AI, and custom-built agents fire smaller exploratory queries on top of that, and none of it triggers a cost spike. The serverless model means no rate limits or quotas gate the agent queries, and real-time Kinesis streams flow straight in at roughly 5 to 6 TB per month on the larger tables.

The cost model is what made that possible: agent queries that would need rate-limiting on a pricier warehouse run freely here. Pablo Ferrari, Director of Data Engineering, was blunt about the result: "The MotherDuck-Hex combo is fantastic. It's been transformative for the company." Both companies landed on the same conclusion from opposite starting points. AheadComputing wanted model freedom and simplicity, Together AI wanted exploratory queries that stay cheap, and MotherDuck delivered on both in production.

Getting started: from zero to agent queries in a day

You can go from an empty account to an agent running SQL against your warehouse in an afternoon. The path has five steps, and each one builds on the last.

Start by creating a MotherDuck account. Load a table or two so your agent has something real to query, or point a Flight at an existing source if you already have data sitting in Postgres or S3.

Next, connect the MCP Server to whichever agent you use, whether that's Claude, Cursor, Claude Code, or Copilot. One setup covers your whole org, so every authenticated user gets warehouse access without per-user provisioning.

Once the connection is live, run a query tool call. Ask your agent a question in plain language and watch it inspect the schema, write the SELECT, and hand back results. Seeing that first round-trip work is the moment the whole pattern clicks.

From there, add Agent Skills for your domain. Write markdown skill files that teach the agent your table relationships, metric definitions, and business terminology so it stops guessing and starts answering with your definitions.

Finally, schedule a Flight for recurring ingestion. A Flight keeps your warehouse fresh on a schedule, so the context your agent retrieves reflects the current state of your business rather than last week's snapshot.

The MotherDuck cookbook has copy-paste recipes for each of these steps, and the analytics agents guide walks through building self-service agents on top of this foundation.

Start using MotherDuck now!

FAQS

Without a queryable data layer, an agent has nowhere to look up real facts, so it hallucinates or gives up. It also can't connect its own prompt and telemetry data to business metrics, and its constant exploratory queries get expensive on warehouses that bill for compute time. A warehouse the agent can query directly through MotherDuck's MCP Server closes all three gaps.

It's a standard Model Context Protocol endpoint that connects AI agents like Claude, Cursor, Claude Code, and Copilot directly to your MotherDuck warehouse to inspect schemas, run SQL, create Dives, and write results back. It's configured once for the whole organization, and because it speaks standard MCP, any model that supports the protocol works the day it ships.

query is read-only — it runs SELECT statements only, so an agent can safely explore tables and pull context with no chance of modifying data. query_rw is read-write, enabling INSERT, UPDATE, DELETE, CREATE TABLE, and schema changes so an agent can persist telemetry, store embeddings, or build new tables. Grant query_rw deliberately, only to agents you want writing to production.

Hypertenancy gives each agent its own isolated compute, so concurrent agents don't contend for resources. Pulse instances meter per query rather than by wall-clock time, so an agent that fires ten exploratory queries and then goes idle costs ten queries, not an hour of a running warehouse. That's why bursty agent traffic doesn't create the cost spikes it would on compute-heavy warehouses.

Combine query_rw (scoped to specific agents) with a zero-copy clone — an isolated sandbox database cloned from your real data. The clone costs nothing to create because it shares the underlying storage until the agent writes, so the agent can create, drop, and rewrite tables without touching production. Drop the clone when the session ends.

Three properties. It runs in-process, so there's no network hop to a separate database server on each query. It uses columnar, vectorized execution, reading only the columns a query needs in cache-friendly batches. And MotherDuck runs it as a managed cloud service with Hypertenancy for per-agent isolation. Together these return sub-second results — the latency profile agents need when firing many small queries in a reasoning loop.

Cortex is a proprietary agent layer that limits which models you can use (with release-day lag), needs a separate endpoint per tool type, and requires an RBAC grant for nearly every operation. MotherDuck exposes a single, model-agnostic MCP endpoint (any model, same-day support), uses query and query_rw plus zero-copy clones instead of a hand-authored permission matrix, and meters per query so exploratory agent traffic stays affordable.

AheadComputing cut dashboard turnaround from 3 months to 1.5 hours and CI waits from 8 hours to 1–2, onboarding 74% of 121 engineers within weeks. Together AI, after MotherDuck won a 100GB TPC-DS benchmark against Redshift, Athena, and ClickHouse, runs 40 read-scaling replicas plus 128 Hex users and agent workloads (Secoda, Hex, and custom agents) with no rate limits or cost spikes.

Create a MotherDuck account and load a table, or point a Flight at Postgres or S3. Connect the MCP Server to your agent — one org-wide setup covers every authenticated user. Run a query call and watch the agent inspect the schema and return results. Then add Agent Skills for your domain and schedule a Flight for recurring ingestion so retrieved context stays current.