---
title: "AI and MotherDuck"
description: "Practical guides for using AI with MotherDuck."
---

# AI and MotherDuck

Practical guides for using AI with MotherDuck.

## Included pages

- [Connect to the MotherDuck MCP Server](https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-setup): Set up the MotherDuck MCP Server with Claude, ChatGPT, Cursor, Claude Code, and other AI assistants
- [Using the MotherDuck MCP Server](https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-workflows): Effective workflows and best practices for getting the most out of the MotherDuck MCP Server with AI assistants
- [Restricting to read-only access](https://motherduck.com/docs/key-tasks/ai-and-motherduck/securing-read-only-access): Restrict the remote MCP server to read-only queries using client-side blocking, read scaling tokens, or proxy filtering
- [Creating Visualizations with Dives](https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives): Build interactive visualizations from natural language using AI agents and the MotherDuck MCP Server
- [Dive theme gallery](https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/dive-theme-gallery): Ready-to-use theme prompts for Dives with screenshots showing each style applied to the same dataset
- [Theming and styling your Dives](https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/theming-and-styling-dives): Control the visual appearance of your Dives with theme definitions, chart selection, and interactive filters
- [Managing Dives as Code](https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/managing-dives-as-code): Set up a Git-based workflow for developing, previewing, and deploying Dives with GitHub Actions and Claude Code
- [AI Features in the MotherDuck UI](https://motherduck.com/docs/key-tasks/ai-and-motherduck/ai-features-in-ui): Use AI-powered SQL editing, FixUp, and natural language queries in the MotherDuck web interface
- [Custom AI Agent Builder's Guide](https://motherduck.com/docs/key-tasks/ai-and-motherduck/building-analytics-agents): Build AI-powered analytics agents using MotherDuck's SQL functions and MCP server integration
- [Embedding Dives in your web application](https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/embedding-dives): Embed interactive MotherDuck Dives in your web app using iframes and embed sessions
- [Text Search in MotherDuck](https://motherduck.com/docs/key-tasks/ai-and-motherduck/text-search-in-motherduck): Text search strategies from pattern matching to semantic search with embeddings in MotherDuck

Source: https://motherduck.com/docs/category/ai-and-motherduck

---

## Connect to the MotherDuck MCP Server

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-setup

> Set up the MotherDuck MCP Server with Claude, ChatGPT, Cursor, Claude Code, and other AI assistants

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import ClaudeIcon from '../../../static/img/icons/brands/claude-icon';
import ChatGPTIcon from '../../../static/img/icons/brands/chatgpt-icon';
import CursorIcon from '../../../static/img/icons/brands/cursor-icon';
import ExternalLinkIcon from '../../../static/img/icons/external-link-icon';
import VideoPlayer from '@site/src/components/VideoPlayer';
import useBaseUrl from '@docusaurus/useBaseUrl';
import DocImage from '@site/src/components/DocImage';

The MotherDuck MCP Server lets AI assistants query and explore your databases using the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). This guide walks you through connecting your preferred AI client to the **remote MCP server** (fully managed, zero setup).

## Prerequisites

- A MotherDuck account ([sign up free](https://app.motherduck.com))
- An MCP-compatible AI client (Claude, ChatGPT, Cursor, Claude Code, Codex, or others)

## Set up the remote MCP server

Select your MCP client and follow the instructions to connect.

<Tabs groupId="mcp-client">
  <TabItem value="claude" label="Claude" default>

<a href="https://claude.ai/directory/0929a5c7-38ce-40ab-8aad-af9ce34553c7" target="_blank" rel="noopener noreferrer" style={{display: 'inline-flex', alignItems: 'center', gap: '8px', padding: '8px 14px', background: 'var(--ifm-background-color)', border: '1px solid var(--ifm-color-emphasis-300)', borderRadius: '4px', color: 'var(--ifm-font-color-base)', fontSize: '14px', fontWeight: 500, textDecoration: 'none', marginBottom: '16px'}}>
  <ClaudeIcon />
  <span>Add MotherDuck to Claude</span>
  <ExternalLinkIcon />
</a>

Or manually:

1. Go to **Settings** → **Connectors**
2. Click **Browse Connectors** to find the MotherDuck connector

![MotherDuck Connector in the Claude connector Directory](./img/claude-connectors-motherduck.png)

A browser window should open for authentication. Once authenticated, verify the connection by asking "List all my databases on MotherDuck."

  </TabItem>
  <TabItem value="chatgpt" label="ChatGPT">

<a href="https://chatgpt.com/apps/motherduck/asdk_app_696a54f1c91c81919002b9153ce0e336" target="_blank" rel="noopener noreferrer" style={{display: 'inline-flex', alignItems: 'center', gap: '8px', padding: '8px 14px', background: 'var(--ifm-background-color)', border: '1px solid var(--ifm-color-emphasis-300)', borderRadius: '4px', color: 'var(--ifm-font-color-base)', fontSize: '14px', fontWeight: 500, textDecoration: 'none', marginBottom: '16px'}}>
  <ChatGPTIcon />
  <span>Add MotherDuck to ChatGPT</span>
  <ExternalLinkIcon />
</a>

<VideoPlayer
  playbackId="GU41jMZ01O01uWGaukLbqj1Ee14tnHq800AV6OGWpvQbcE"
  title="Setting up MotherDuck in ChatGPT"
  autoPlay="true"
  muted="true"
  loop="true"
/>

1. Open the ChatGPT desktop or web app
2. Go to **Settings** → **Apps** and click **Browse Apps**

<img src={useBaseUrl('/img/key-tasks/ai-and-motherduck/chatgpt-browse-apps.png')} alt="Browse Apps in ChatGPT settings" style={{maxWidth: '500px', width: '100%', display: 'block', margin: '0 auto'}} />

3. Search for **MotherDuck** and select it

<img src={useBaseUrl('/img/key-tasks/ai-and-motherduck/chatgpt-search-motherduck.png')} alt="Searching for MotherDuck in the ChatGPT App Store" style={{maxWidth: '500px', width: '100%', display: 'block', margin: '0 auto'}} />

4. Click **Continue to MotherDuck** and authenticate with your MotherDuck account

<img src={useBaseUrl('/img/key-tasks/ai-and-motherduck/chatgpt-connect-motherduck.png')} alt="Connect MotherDuck dialog in ChatGPT" style={{maxWidth: '500px', width: '100%', display: 'block', margin: '0 auto'}} />

After authentication, ChatGPT can access your MotherDuck data. Try asking "List all my databases on MotherDuck" to verify the connection.

  </TabItem>
  <TabItem value="cursor" label="Cursor">
<a href="cursor://anysphere.cursor-deeplink/mcp/install?name=motherduck&config=eyJ1cmwiOiJodHRwczovL2FwaS5tb3RoZXJkdWNrLmNvbS9tY3AifQ%3D%3D" target="_blank" rel="noopener noreferrer" style={{display: 'inline-flex', alignItems: 'center', gap: '8px', padding: '8px 14px', background: 'var(--ifm-background-color)', border: '1px solid var(--ifm-color-emphasis-300)', borderRadius: '4px', color: 'var(--ifm-font-color-base)', fontSize: '14px', fontWeight: 500, textDecoration: 'none', marginBottom: '16px'}}>
  <CursorIcon />
  <span>Add MotherDuck to Cursor</span>
  <ExternalLinkIcon />
</a>

1. Open **Cursor Settings** (`Cmd/Ctrl + ,`)
2. Navigate to **Tools & MCP**
3. Click **+ New MCP Server**
4. Add the following to the configuration file:

```json
{
  "MotherDuck": {
    "url": "https://api.motherduck.com/mcp",
    "type": "http"
  }
}
```

5. Save and click **Connect** to authenticate with your MotherDuck account

> [Cursor MCP Documentation](https://docs.cursor.com/context/model-context-protocol)

  </TabItem>
  <TabItem value="claude-code" label="Claude Code">

1. Run the following command in your terminal:

```bash
claude mcp add MotherDuck --transport http https://api.motherduck.com/mcp
```

:::tip
By default, this command adds the MCP server to the current project.
Pass the `--scope user` flag to make the server available in all
sessions for your current user
([`--scope` documentation](https://code.claude.com/docs/en/mcp#mcp-installation-scopes)).
:::

2. Run `claude` to start Claude Code
3. Type `/mcp`, select **MotherDuck** from the list, and press **Enter**
4. Select **Authenticate** and confirm the authorization dialog

> [Claude Code MCP Documentation](https://code.claude.com/docs/en/mcp)

  </TabItem>
  <TabItem value="github-copilot" label="GitHub Copilot (VS Code)">

Configure GitHub Copilot in VS Code to use the MotherDuck MCP server through a workspace config file:

1. Open the Command Palette (`Cmd/Ctrl + Shift + P`) and run **MCP: Add Server** to open `.vscode/mcp.json`. You can also create the file manually in your workspace. Add this configuration:

```json
{
  "servers": {
    "motherduck": {
      "type": "http",
      "url": "https://api.motherduck.com/mcp"
    }
  }
}
```

2. Save the file and start the server from the **Start** code lens that appears above the `motherduck` entry in `mcp.json`. You can also start it through the Command Palette: `MCP: List Servers` → **motherduck** → **Start Server**.

3. VS Code opens a browser window so you can sign in to MotherDuck through OAuth, then stores the credentials for subsequent server starts.

4. Open the Copilot Chat view, switch to **Agent** mode, and confirm that the MotherDuck tools appear in the tool picker. Try asking "List all my databases on MotherDuck" to verify the connection.

**Authenticate with an access token instead of OAuth**

If you'd rather provide a [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#creating-an-access-token) explicitly, use a `promptString` input and a `Bearer` Authorization header. VS Code prompts for the token when the server starts and stores it in its secret store:

```json
{
  "inputs": [
    {
      "type": "promptString",
      "id": "motherduck-token",
      "description": "MotherDuck access token",
      "password": true
    }
  ],
  "servers": {
    "motherduck": {
      "type": "http",
      "url": "https://api.motherduck.com/mcp",
      "headers": {
        "Authorization": "Bearer ${input:motherduck-token}"
      }
    }
  }
}
```

> [VS Code MCP Documentation](https://code.visualstudio.com/docs/copilot/chat/mcp-servers)

  </TabItem>
  <TabItem value="copilot-studio" label="Copilot Studio">

[Microsoft Copilot Studio](https://learn.microsoft.com/en-us/microsoft-copilot-studio/) is a cloud-hosted platform for building agents that run inside Microsoft 365, Teams, and other Microsoft surfaces. Because the platform runs in Microsoft's cloud, it connects to the **remote** MotherDuck MCP server — either with OAuth (each user signs in with their own MotherDuck account) or with a shared API key backed by a service-account token.

1. In Copilot Studio, open your agent. Under **Tools**, click **Add a tool**.

   <DocImage
     src="/img/key-tasks/ai-and-motherduck/copilot-studio/01-add-tool.png"
     alt="Copilot Studio agent Tools tab with Add a tool button"
     width={700}
   />

2. In the **Add tool** dialog, under **Create new**, click **Model Context Protocol**.

   <DocImage
     src="/img/key-tasks/ai-and-motherduck/copilot-studio/02-mcp-option.png"
     alt="Add tool dialog with Model Context Protocol highlighted under Create new"
     width={700}
   />

3. Fill in the MCP server details and pick an authentication method:

   - **Server name**: `MotherDuck MCP`
   - **Server description**: `Connect to MotherDuck, query your data, create Dives and more!`
   - **Server URL**: `https://api.motherduck.com/mcp`
   - **Authentication**: either `OAuth 2.0` or `API key` (see below)

   **Option A — OAuth 2.0 (dynamic discovery).** Each end user signs in to MotherDuck with their own account when they first use the agent. Select **OAuth 2.0** and leave **Dynamic discovery** as the type, then click **Create**.

   <DocImage
     src="/img/key-tasks/ai-and-motherduck/copilot-studio/03a-oauth-auth.png"
     alt="MCP server configuration with OAuth 2.0 Dynamic discovery selected"
     width={500}
   />

   **Option B — API key (shared service-account token).** All end users share a single MotherDuck token. Useful when you don't want every user to provision a MotherDuck account, for example a Teams bot exposed to a wide audience. Select **API key**, set **Type** to `Header`, enter `Authorization` as the **Header name**, and click **Create**.

   <DocImage
     src="/img/key-tasks/ai-and-motherduck/copilot-studio/03b-api-key-auth.png"
     alt="MCP server configuration with API key authentication, Header type, and Authorization header name"
     width={500}
   />

   :::caution
   **Header name** must be `Authorization` — not `Bearer`. The `Bearer` prefix belongs in the *value* you enter in step 5.
   :::

4. Back in the **Add tool** dialog for MotherDuck MCP, open the **Connection** dropdown and click **Create new connection**.

   <DocImage
     src="/img/key-tasks/ai-and-motherduck/copilot-studio/04-create-connection.png"
     alt="Connection dropdown showing Create new connection option"
     width={700}
   />

   The next step depends on the authentication method you picked in step 3:

   - **OAuth 2.0**: Copilot Studio opens a browser window that redirects to MotherDuck. The end user signs in to their MotherDuck account and approves the request. The connection is created once authentication completes — skip to step 6.
   - **API key**: Copilot Studio shows the token entry dialog described in step 5.

5. In the **Connect to MotherDuck MCP** dialog, enter your MotherDuck access token prefixed with `Bearer `:

   ```text
   Bearer <your_motherduck_token>
   ```

   Replace `<your_motherduck_token>` with an actual token from [MotherDuck → Settings → Access Tokens](https://app.motherduck.com/settings/tokens), then click **Create**.

   <DocImage
     src="/img/key-tasks/ai-and-motherduck/copilot-studio/05-bearer-token.png"
     alt="Connect to MotherDuck MCP dialog with the Bearer token entered"
     width={700}
   />

   :::tip
   If the agent is published and used by many end users, create a dedicated [service account](/key-tasks/service-accounts-guide/) and use a [read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) so the agent can't modify data. See [Restricting to read-only access](/key-tasks/ai-and-motherduck/securing-read-only-access/) for details.
   :::

6. Once the connection shows a green check mark, click **Add and configure**. Copilot Studio confirms the tool was added successfully.

7. The MotherDuck MCP entry opens with the full tool list. Enable or disable tools based on what the agent should be allowed to do (for example, disable `query_rw` if the agent should stay read-only), then click **Save**.

   <DocImage
     src="/img/key-tasks/ai-and-motherduck/copilot-studio/07-tools-list.png"
     alt="MotherDuck MCP tool list with toggles for query, query_rw, list_databases, list_tables, list_columns, search_catalog, ask_docs_question, and others"
     width={700}
   />

8. Open the agent's connection manager and click **Connect** on the MotherDuck MCP entry, then submit. This reuses the connection you created in step 5.

9. Switch to the **Test** pane and ask a question that exercises the tools, for example *"What's the highest rated movie with over 10k votes in my IMDB database?"*. The agent calls the MotherDuck tools and responds with live data from your databases.

   <DocImage
     src="/img/key-tasks/ai-and-motherduck/copilot-studio/09-test-agent.png"
     alt="Copilot Studio test pane showing the agent calling the query tool and returning IMDB results from MotherDuck"
     width={900}
   />

:::note
When you authenticate with an API key, all users of the Copilot Studio agent share the same MotherDuck token. Queries run by any end user are attributed to the service account that owns the token, not to the individual Microsoft 365 user. Use OAuth 2.0 if you need per-user attribution.
:::

> [Copilot Studio MCP documentation](https://learn.microsoft.com/en-us/microsoft-copilot-studio/mcp-add-existing-server-to-agent)

<details>
<summary>Alternative: Power Automate custom connector (OpenAPI)</summary>

If you'd rather wire the MotherDuck MCP server in as a [Power Automate custom connector](https://learn.microsoft.com/en-us/connectors/custom-connectors/) (for example, to share the connector across Copilot Studio and Power Automate flows in the same environment), you can import the following OpenAPI 2.0 spec. The `x-ms-agentic-protocol: mcp-streamable-1.0` extension tells Copilot Studio to treat the connector as a streamable MCP server.

```yaml
swagger: '2.0'
info:
  title: MotherDuck Remote MCP
  description: The remote MCP to connect to MotherDuck tools, docs and more
  version: 1.0.0
host: api.motherduck.com
basePath: /
schemes:
  - https
paths:
  /mcp:
    post:
      summary: MotherDuck Remote MCP
      description: The remote MCP to connect to MotherDuck tools, docs and more
      operationId: InvokeServer
      x-ms-agentic-protocol: mcp-streamable-1.0
      responses:
        '200':
          description: Immediate Response
securityDefinitions:
  api_key:
    type: apiKey
    in: header
    name: Authorization
security:
  - api_key: []
```

In Power Automate, go to **Custom connectors → New custom connector → Import an OpenAPI file**, paste the spec above, and save. When you create a connection, enter `Bearer <your_motherduck_token>` as the API key value — the same format as the native MCP flow described above.

</details>

  </TabItem>
  <TabItem value="others" label="Others">

If you're using **Windsurf**, **Zed**, or another MCP-compatible client, use the following JSON configuration:

```json
{
  "mcpServers": {
    "MotherDuck": {
      "url": "https://api.motherduck.com/mcp",
      "type": "http"
    }
  }
}
```

  </TabItem>
</Tabs>

:::tip Authentication
The remote MCP server uses OAuth, so you'll authenticate with your MotherDuck account during setup. Some clients also support [token-based authentication](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#creating-an-access-token) through a Bearer header.
:::
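
For clients that accept custom headers, token-based authentication typically takes a shape like the following. This is a sketch: the top-level key (`mcpServers` here) varies by client, and the token placeholder is illustrative.

```json
{
  "mcpServers": {
    "MotherDuck": {
      "url": "https://api.motherduck.com/mcp",
      "type": "http",
      "headers": {
        "Authorization": "Bearer <your_motherduck_token>"
      }
    }
  }
}
```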

## Configuring tool permissions

Most MCP clients let you control how the AI uses each tool. The exact UI varies by client, but the general permission levels are:

| Permission | Behavior |
|------------|----------|
| **Always allow** | The AI uses the tool automatically without asking. Faster iteration when errors occur, but no human confirmation before each action. |
| **Needs approval** | The AI asks for your confirmation before each tool use. Gives you visibility into every action. |
| **Blocked** | The AI cannot use this tool. |

:::tip
The MCP Server provides both read-only (`query`) and read-write (`query_rw`) tools. For exploratory analysis, setting read-only tools to "Always allow" enables faster back-and-forth when the AI needs to retry or refine queries. You can keep `query_rw` on "Needs approval" or block it if you only need read access. See [Restricting to read-only access](/key-tasks/ai-and-motherduck/securing-read-only-access/) for more options.
:::

## Remote vs local MCP server

MotherDuck offers two MCP server options:

| Server | Best for | Setup | Access |
|--------|----------|--------|--------|
| **Remote** (hosted by MotherDuck) | Most users who query and modify data on MotherDuck cloud | Zero setup; connect through URL and OAuth | Read-write |
| **Local** ([mcp-server-motherduck](https://github.com/motherduckdb/mcp-server-motherduck)) | Self-hosted use; local DuckDB files; or when you need full customization | Install and run the server yourself | Fully customizable |

The **remote server** is recommended for most use cases. Use the **local server** when you need to work with local DuckDB files, want custom tool configurations, or require full control over the server environment.

<a href="https://github.com/motherduckdb/mcp-server-motherduck" target="_blank" rel="noopener noreferrer" className="github-repo-link">
  **Local MCP Server GitHub Repository** – Self-host the open-source MCP server for DuckDB and MotherDuck
</a>

## Where to go from here

- **[AI Data Analysis Getting Started](/getting-started/mcp-getting-started/)**: 5-minute walkthrough of querying data and creating Dives
- **[MCP Workflows Guide](/key-tasks/ai-and-motherduck/mcp-workflows/)**: Best practices for getting accurate results from AI-powered analysis
- **[MCP Server Reference](/sql-reference/mcp/)**: Server capabilities, available tools, and regional availability
- **[Restricting to Read-Only Access](/key-tasks/ai-and-motherduck/securing-read-only-access/)**: Restrict your AI assistant to read-only queries

---

## Using the MotherDuck MCP Server

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-workflows

> Effective workflows and best practices for getting the most out of the MotherDuck MCP Server with AI assistants

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

The MotherDuck **remote** MCP Server connects AI assistants like Claude, ChatGPT, and Cursor to your data. This guide covers workflows for getting accurate, useful analysis results. If you haven't already, [set up your remote MCP connection](/key-tasks/ai-and-motherduck/mcp-setup/).

:::info Remote vs local MCP
This guide is written for the **remote MCP** (fully managed by MotherDuck). Most of the tips apply to the **local MCP** (fully customizable, self-hosted) as well. For local MCP setup and details, see the [MCP reference](/sql-reference/mcp/#local-mcp-server).
:::

## Prerequisites

To use the MotherDuck remote MCP server, you will need:

- A MotherDuck account with at least one database
- An AI client like Claude, Cursor, or ChatGPT already connected to the remote MCP server ([setup instructions](/key-tasks/ai-and-motherduck/mcp-setup/))

:::note Read vs write tools
The remote MCP server exposes two query tools: `query` for read-only SQL and `query_rw` for SQL that can change data or schema. See the [query](/sql-reference/mcp/query/) and [query_rw](/sql-reference/mcp/query-rw/) references for details.

To enforce read-only access, see [Restricting to read-only access](/key-tasks/ai-and-motherduck/securing-read-only-access/).
:::

## How it works

When you ask an AI assistant a question about your data, here's what happens behind the scenes:

1. **Schema exploration**: The AI examines your database structure to understand available tables and columns
2. **Query generation**: Based on your question, the AI writes DuckDB SQL
3. **Query execution**: The remote MCP Server runs the query on MotherDuck
4. **Results interpretation**: The AI explains the results in natural language

You can inspect which SQL query the MCP executed by expanding the tool call in the conversation:

![Inspecting the query executed by MCP](./img/mcp_inspect_query.png)

When you create a Dive:

1. **Data analysis**: The AI agent queries your database to understand the data relevant to your request
2. **Visualization generation**: The agent generates an interactive React component with the SQL queries and chart configuration
3. **Inline preview**: The Dive renders in the conversation so you can iterate before saving. In clients that support the Dive Viewer MCP App (Claude web and desktop at launch), the preview runs against live data with the same components used in the MotherDuck UI. In other clients, you see a static preview with sample data, and the Dive queries live data once you open it in MotherDuck.
4. **Save to MotherDuck**: Each save is stored in your workspace and always queries live data, so there are no stale snapshots. You can find the Dive in the [MotherDuck UI](/key-tasks/ai-and-motherduck/dives/#finding-your-dives) under the Object Explorer or **Settings** → **Dives**. With the Dive Viewer, every edit creates a separate version automatically.
5. **Share with your team**: The agent can [share the underlying data](/sql-reference/mcp/share-dive-data) with your organization so others can view and interact with the Dive

## Start with schema exploration

Before diving into analysis, help the AI understand your data. This is a form of **context engineering**: by exploring your schema upfront, you hydrate the conversation with knowledge about your tables, columns, and relationships. This context carries forward, helping the AI write more accurate queries throughout your session.

Start conversations by asking about your database structure:

**Good first prompts:**
- *"What databases and tables do I have access to?"*
- *"Describe the schema of my `analytics` database"*
- *"What columns are in the `orders` table and what do they contain?"*

The remote MCP server provides tools for schema exploration that surface table relationships, data types, and any documentation you've added to your schema.

:::tip
If you have well-documented tables with [`COMMENT ON`](https://duckdb.org/docs/stable/sql/statements/comment_on.html) descriptions, the AI can use these to better understand your data's business meaning.
:::
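
For example, descriptions added with `COMMENT ON` become part of the schema metadata the AI can discover (the table and column names here are illustrative):

```sql
COMMENT ON TABLE orders IS 'One row per customer order; excludes test accounts';
COMMENT ON COLUMN orders.status IS 'Order state: pending, shipped, delivered, or cancelled';
```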

## Frame questions with context

The more context you provide, the better the results. Include relevant details like:

- **Time ranges**: *"Show me orders from the last 30 days"* vs *"Show me orders"*
- **Filters**: *"Analyze customers in the US with more than 5 purchases"*
- **Metrics**: *"Calculate revenue as `quantity * unit_price`"*
- **Output format**: *"Return results as a summary table with percentages"*

**Example - Vague vs. Specific:**

| ❌ Vague | ✅ Specific |
|----------|-------------|
| "Show me sales data" | "Show me total sales by product category for Q4 2024, sorted by revenue descending" |
| "Find top customers" | "Find the top 10 customers by total order value in the last 12 months" |
| "Analyze trends" | "Compare monthly active users month-over-month for 2024, showing growth rate" |

## Iterate

Complex analysis works best as a conversation. Start simple, validate the results, then build up. Each exchange adds shared context, helping the AI write better queries as you go. While it's tempting to aim for the perfect query in one shot, insight often emerges through the process of data exploration.

When iterating, it helps to keep source data nearby for verifying outputs. Our users have found that spot-checking metrics against an existing BI dashboard builds intuition about how reliable the AI assistant's answers are.

## Common workflow patterns

### Data profiling

Quickly understand a new dataset:

```text
"Profile the `transactions` table - show me:
- Row count and date range
- Distribution of key categorical columns
- Summary statistics for numeric columns
- Any null values or data quality issues"
```

:::tip DuckDB functions for EDA
DuckDB has a few SQL functions that are great for hydrating context:
- `DESCRIBE`, which retrieves the metadata for a specific table
- `SUMMARIZE`, which computes summary statistics for a table (the output can be large)
- The `USING SAMPLE 10` clause (at the end of the query), which samples the data; pairing it with a `WHERE` clause to narrow the rows first is very helpful for performance
:::
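
As a sketch, the three look like this against a hypothetical `transactions` table:

```sql
-- Column names and types for the table
DESCRIBE transactions;

-- Per-column summary statistics (min, max, null percentage, approximate distinct counts)
SUMMARIZE transactions;

-- A small random sample; the WHERE clause narrows the rows first
SELECT *
FROM transactions
WHERE created_at >= DATE '2024-12-01'
USING SAMPLE 10;
```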

### Generating charts

Some AI clients can generate visualizations directly from your query results. ChatGPT on the web and Claude Desktop both support creating charts as "artifacts" alongside your conversation.

Visualizations help you spot trends and outliers faster than scanning tables, validate that query results make sense at a glance, and share insights with stakeholders who prefer visual formats.

**Example prompts:**
- *"Chart monthly revenue for 2024 as a line graph"*
- *"Create a bar chart showing the top 10 customers by order count"*
- *"Visualize the distribution of order values as a histogram"*
- *"Show me a time series of daily active users with a 7-day moving average"*

Once you have a chart, you can iterate on it just like query results: *"Add a trend line"*, *"Change to a stacked bar chart"*, or *"Break this down by region"*.

:::note
When using the MCP from more IDE-like interfaces, it pairs well with libraries like `matplotlib` for building more traditional charts.
:::

### Querying private S3 buckets

You can use the MCP to analyze files in private S3 buckets (Parquet, CSV, JSON) by storing your AWS credentials as a [secret in MotherDuck](/sql-reference/motherduck-sql-reference/create-secret/).

<Tabs>
<TabItem value="ui" label="MotherDuck UI">

You can create secrets directly in the [MotherDuck UI](https://app.motherduck.com) under **Settings → Secrets**.

![The MotherDuck secrets UI](./img/md_create_secret_ui.png)

</TabItem>
<TabItem value="cli" label="AWS SSO with credential chain">

This is recommended for desktop AI clients. If you use AWS SSO, you can refresh your credentials and store them in MotherDuck:

1. Create an AWS credential profile:

```bash
aws configure sso
```

2. Authenticate with AWS SSO:

```bash
aws sso login --profile <your_sso_profile>
```

3. Open a DuckDB client (for example, the CLI) and create a secret using the credential chain:

```sql
ATTACH 'md:';

CREATE OR REPLACE SECRET IN MOTHERDUCK (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN 'sso',
    PROFILE '<your_sso_profile>'
);
```

This stores your AWS credentials in MotherDuck, making them available to the remote MCP server.

:::note
Run `aws sso login --profile <your_sso_profile>` before creating the secret to refresh your SSO token. Starting with DuckDB v1.4.0, credentials are validated at creation time, so the command fails if your local credentials cannot be resolved. Use the correct `CHAIN` and `PROFILE` for your credential type, or add `VALIDATION 'none'` as a last resort to skip local validation.
:::

:::note Credential expiration
If you use temporary credentials (SSO, IAM roles), you'll need to refresh the secret when they expire by running the `CREATE OR REPLACE SECRET` command again.
:::

</TabItem>
</Tabs>

Once your credentials are set up, you can ask your AI assistant to query any S3 bucket you have access to:

```text
"Give me some analytics about s3://my-bucket/sales-data.parquet"
```

![Exploring S3 data with MCP](./img/mcp_explore_s3.png)

### Ad-hoc investigation

The MCP is especially useful for exploratory debugging when you're not sure what you're looking for. Rather than writing queries upfront, you can describe the problem and let the AI help you dig in.

```text
"I noticed a spike in errors on Dec 10th. Help me investigate:
- What types of errors increased?
- Were specific users or endpoints affected?
- What changed compared to the previous week?"
```

One pattern we use at MotherDuck is loading logs or event data into a database and using the MCP to interrogate it conversationally. Instead of manually crafting regex patterns or grep commands, you can ask questions like *"What are the most common error messages in the last hour?"* or *"Show me all requests from user X that resulted in a 500 error"*. This turns log analysis from a tedious grep session into an interactive investigation where each answer informs the next question.
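
A minimal sketch of that pattern, assuming newline-delimited JSON logs (the S3 path and field names are illustrative):

```sql
-- Load raw logs into a table the AI can interrogate
CREATE TABLE app_logs AS
SELECT * FROM read_json_auto('s3://my-bucket/logs/2024-12-10/*.json');

-- The kind of query the assistant generates for "most common errors in the last hour"
SELECT message, COUNT(*) AS occurrences
FROM app_logs
WHERE level = 'ERROR'
  AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY message
ORDER BY occurrences DESC
LIMIT 10;
```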

## Working with query results

### Refining results

Results rarely come out perfect on the first try. The conversational nature of MCP means you can refine incrementally rather than rewriting queries from scratch. If you're seeing test data mixed in, just say *"Add a filter to exclude test accounts"*. If the granularity is wrong, ask to *"Change the grouping from daily to weekly"*. Small adjustments like changing sort order or adding a column are easy follow-ups.

### Understanding queries

When the AI generates complex SQL, don't hesitate to ask for an explanation. This is useful both for validating the approach and for learning. Ask *"Explain what this query is doing step by step"* to understand the logic, or *"Are there any edge cases this query might miss?"* to sanity-check the results before relying on them.

### Exporting for further use

Once you have the results you need, ask for output in the format that fits your workflow. You can request a markdown table for documentation, CSV-friendly output for spreadsheets, or a written summary to share with your team. The AI can also help you format results for specific tools or audiences. The results can also be a jumping-off point for deeper analysis with an expert, so asking for the final query to hand off is often a useful last step.

## Tips for better results

### Be explicit about assumptions

Your data likely has business rules that aren't obvious from the schema alone. If a "completed" order means status is either 'shipped' or 'delivered', say so. If revenue calculations should exclude refunds, mention it upfront. The AI can't infer these domain-specific rules, so stating them early prevents incorrect results and saves iteration time.

### Reference specific tables and columns

When you already know your schema, being specific helps the AI get it right the first time. Instead of asking about "the timestamp", say *"Use the `user_events.event_timestamp` column"*. If you know how tables relate, specify the join: *"Join `orders` to `customers` on `customer_id`"*. This is especially helpful in larger schemas where column names might be ambiguous.

### Ask for validation

When accuracy matters, ask the AI to sanity-check its own work. Questions like *"Does this total match what you'd expect based on the row counts?"* or *"Can you verify this join doesn't create duplicates?"* can catch subtle bugs before you rely on the results. The AI can run quick validation queries to confirm the logic is sound.

## Troubleshooting

:::tip Beyond querying
The remote MCP server includes tools beyond just running queries. Most are metadata lookups or search functions for finding tables and columns, but the [ask docs question](/sql-reference/mcp/ask-docs-question) tool is particularly useful when you're stuck on tricky syntax or DuckDB-specific features. If the AI is struggling with a query pattern, try asking it to look up the relevant documentation first.
:::

| Issue | Solution |
|-------|----------|
| AI queries wrong table | Ask: *"What tables are available?"* then specify the correct one |
| Results don't look right | Ask: *"Show me sample data from the source table"* to verify the data |
| Query is slow | Ask: *"Can you optimize this query?"*, add filters to reduce data scanned, or [increase your Duckling size](/about-motherduck/billing/duckling-sizes/) |
| AI doesn't understand the question | Rephrase with more specific column names and business context |
| Can't type fast enough | Use voice-to-text to interact with your AI assistant |

## Related resources

- [Connect to MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) - Setup instructions for all supported AI clients
- [AI Features in the UI](/key-tasks/ai-and-motherduck/ai-features-in-ui/) - Built-in AI features for the MotherDuck interface
- [Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/) - Build custom AI agents with MotherDuck

---

## Restricting to read-only access

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/securing-read-only-access

> Restrict the remote MCP server to read-only queries using client-side blocking, read scaling tokens, or proxy filtering

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import DocImage from '@site/src/components/DocImage';

# Restricting to read-only access

The remote MCP server exposes both the read-only `query` tool and the read-write `query_rw` tool. If you want to ensure your AI assistant can only read data, there are three approaches depending on your setup.

| Approach | Enforcement | Setup | Works with OAuth connectors |
|----------|------------|-------|-----------------------------|
| [Block the tool at the client](#block-the-query_rw-tool-at-the-client) | Client-side | Low (UI toggle) | Yes |
| [Use a read scaling token](#use-a-read-scaling-token) | Server-side | Medium (manual config) | No (replaces OAuth) |
| [Proxy filtering](#proxy-filtering) | Application-side | Varies | N/A (custom backend) |

## Block the `query_rw` tool at the client

The simplest approach: keep using the OAuth connector, but configure your MCP client to never call the `query_rw` tool. The server still exposes the tool, but the client will never invoke it.

Most clients support this at the **individual user** level. ChatGPT also lets **organization admins** enforce tool restrictions across all workspace members.

<Tabs groupId="mcp-client">
  <TabItem value="claude" label="Claude" default>

Each user can block tools individually. Go to **Settings → Connectors → MotherDuck**, expand **Write/delete tools**, and select the blocked icon next to `query_rw`:

![Blocking the query_rw tool in Claude's connector settings](./img/query-rw-blocked.png)

:::note
Claude does not support org-level per-tool blocking. Team/Enterprise admins can remove a connector entirely from **Organization settings → Connectors**, but cannot selectively disable individual tools like `query_rw` for all members.
:::

> [Claude connector permissions documentation](https://support.claude.com/en/articles/11175166-get-started-with-custom-connectors-using-remote-mcp)

  </TabItem>
  <TabItem value="chatgpt" label="ChatGPT">

**Enterprise/Edu admins:** Admins can [enable or disable specific app actions after publishing](https://help.openai.com/en/articles/12584461-developer-mode-and-full-mcp-connectors-in-chatgpt-beta). Go to **Workspace Settings → Apps**, click the `...` menu next to MotherDuck, select **Action control**, and deselect `query_rw`. New tools added by the MCP server are disabled by default — admins must explicitly enable them.

**Business plans:** Per-tool Action control is not available for custom MCP apps after publishing. To change which tools are exposed, remove and recreate the app ([developer mode documentation](https://help.openai.com/en/articles/12584461-developer-mode-and-full-mcp-connectors-in-chatgpt-beta)).

  </TabItem>
  <TabItem value="cursor" label="Cursor">

Open **Cursor Settings** → **Tools & MCP**, expand the MotherDuck server entry, and toggle off `query_rw`.

:::note
Tool toggles are stored locally in Cursor's database, not in the `mcp.json` config file. They cannot be shared across a team through config files.
:::

  </TabItem>
  <TabItem value="claude-code" label="Claude Code">

Add a deny rule to your `.claude/settings.json` (project-level) or `~/.claude/settings.json` (user-level):

```json
{
  "permissions": {
    "deny": ["mcp__MotherDuck__query_rw"]
  }
}
```

> [Claude Code permissions documentation](https://code.claude.com/docs/en/permissions)

  </TabItem>
  <TabItem value="copilot-studio" label="Copilot Studio">

Open your agent in Copilot Studio, go to **Tools**, and open the MotherDuck MCP entry. Toggle `query_rw` off in the tool list and click **Save**. The agent only sees `query` and the schema exploration tools.

<DocImage
  src="/img/key-tasks/ai-and-motherduck/copilot-studio/07-tools-list.png"
  alt="MotherDuck MCP tool list in Copilot Studio with query_rw toggled off"
  width={700}
/>

  </TabItem>
</Tabs>

## Use a read scaling token

For server-side enforcement, authenticate with a [read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) instead of a regular access token. Read scaling tokens connect to dedicated read replicas that reject all write operations — even if the client calls `query_rw`, writes will fail. This requires manual configuration instead of the one-click OAuth connectors.

:::note
Read scaling connections are [eventually consistent](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#ensuring-data-freshness). Results may lag a few minutes behind the latest database state.
:::

You can create a read scaling token from the [MotherDuck UI](https://app.motherduck.com) under **Settings → Access Tokens** or through the [REST API](/sql-reference/rest-api/users-create-token/).

Read scaling tokens also unlock concurrent MCP sessions: each MCP instance that connects with a read scaling token is assigned a read replica (duckling) from a pool. Each connection gets its own duckling until the pool is full (default size 4, maximum 16); after that, new connections share existing ducklings in round-robin order. This means you can run many MCP sessions in parallel from the same account, for example multiple AI agents or team members querying simultaneously. See [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for details on pool sizing and how replicas are assigned.
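As a toy model (not MotherDuck's actual scheduler), the assignment policy described above can be sketched as:

```javascript
// Toy model of the pool assignment policy: the first `poolSize`
// connections each get a fresh duckling; later connections reuse
// existing ducklings in round-robin order.
function assignDucklings(connectionCount, poolSize = 4) {
  const assignments = [];
  for (let i = 0; i < connectionCount; i++) {
    // For i < poolSize this yields a unique duckling per connection;
    // afterwards it wraps around the pool.
    assignments.push(i % poolSize);
  }
  return assignments;
}

console.log(assignDucklings(6)); // → [ 0, 1, 2, 3, 0, 1 ]
```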

<Tabs groupId="mcp-client">
  <TabItem value="claude" label="Claude" default>

Claude's web connector only supports OAuth, so you need to use the desktop config instead. Open **Settings → Developer → Edit Config** and add:

```json
{
  "mcpServers": {
    "MotherDuck": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://api.motherduck.com/mcp",
        "--header",
        "Authorization: Bearer ${MOTHERDUCK_TOKEN}"
      ],
      "env": {
        "MOTHERDUCK_TOKEN": "<your_read_scaling_token>"
      }
    }
  }
}
```

This uses [`mcp-remote`](https://www.npmjs.com/package/mcp-remote) to bridge the remote MCP server into Claude Desktop's local stdio transport.

  </TabItem>
  <TabItem value="chatgpt" label="ChatGPT">

ChatGPT connectors can't set static headers. To use a read scaling token, run a proxy that injects the `Authorization` header and connect ChatGPT to that proxy.

Example proxy (Cloudflare Worker):

```js
export default {
  async fetch(request, env) {
    const upstreamUrl = new URL(request.url);
    upstreamUrl.protocol = "https:";
    upstreamUrl.hostname = "api.motherduck.com";
    upstreamUrl.pathname = "/mcp";

    const upstreamRequest = new Request(upstreamUrl, request);
    upstreamRequest.headers.set(
      "Authorization",
      `Bearer ${env.MOTHERDUCK_READ_SCALING_TOKEN}`
    );
    upstreamRequest.headers.delete("cookie");

    return fetch(upstreamRequest);
  },
};
```

1. Deploy the proxy and store the read scaling token as a secret (for example, `MOTHERDUCK_READ_SCALING_TOKEN`).
2. In [ChatGPT Settings → Connectors](https://chatgpt.com/#settings/Connectors), click **Create App**.
3. Enter:
   - **Name:** `MotherDuck (Read Only)`
   - **MCP Server URL:** `<your_proxy_url>`
   - **Authentication:** `No authentication`
4. Open a chat, select the connector, and run a query (for example: `SELECT * FROM information_schema.tables LIMIT 5`).

`query_rw` may still appear, but writes fail because read scaling tokens are read-only.

  </TabItem>
  <TabItem value="cursor" label="Cursor">

Open **Cursor Settings** → **Tools & MCP** → **+ New MCP Server** and add the following configuration:

```json
{
  "MotherDuck": {
    "url": "https://api.motherduck.com/mcp",
    "type": "http",
    "headers": {
      "Authorization": "Bearer <your_read_scaling_token>"
    }
  }
}
```

  </TabItem>
  <TabItem value="claude-code" label="Claude Code">

```bash
claude mcp add --transport http \
  --header "Authorization: Bearer <your_read_scaling_token>" \
  MotherDuck https://api.motherduck.com/mcp
```

  </TabItem>
  <TabItem value="copilot-studio" label="Copilot Studio">

Follow the [Copilot Studio MCP setup](/key-tasks/ai-and-motherduck/mcp-setup/?mcp-client=copilot-studio) with **API key** authentication, and when prompted for the connection value, enter your read scaling token:

```text
Bearer <your_read_scaling_token>
```

The `query_rw` tool may still appear in the agent's tool list, but writes fail at the server because read scaling replicas reject write operations. As a belt-and-braces measure, also toggle `query_rw` off in the tool list so the model never sees it as an option.

<DocImage
  src="/img/key-tasks/ai-and-motherduck/copilot-studio/07-tools-list.png"
  alt="MotherDuck MCP tool list in Copilot Studio with query_rw toggled off"
  width={700}
/>

  </TabItem>
  <TabItem value="others" label="Others">

For MCP-compatible clients that support simple authentication, use the following JSON configuration with a read scaling token as the Bearer value:

```json
{
  "mcpServers": {
    "MotherDuck": {
      "url": "https://api.motherduck.com/mcp",
      "type": "http",
      "headers": {
        "Authorization": "Bearer <your_read_scaling_token>"
      }
    }
  }
}
```

For clients that only support local (stdio) servers, use `mcp-remote` to bridge the connection:

```json
{
  "mcpServers": {
    "MotherDuck": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://api.motherduck.com/mcp",
        "--header",
        "Authorization: Bearer ${MOTHERDUCK_TOKEN}"
      ],
      "env": {
        "MOTHERDUCK_TOKEN": "<your_read_scaling_token>"
      }
    }
  }
}
```

  </TabItem>
</Tabs>

## Proxy filtering

If you're integrating the remote MCP server into a backend service or custom agent framework, you can restrict access at the application layer. When proxying MCP tool calls, omit or reject calls to the `query_rw` tool and only forward calls to the read-only `query` tool and schema exploration tools.
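A minimal sketch of that filtering in JavaScript, using MCP's standard JSON-RPC `tools/call` shape. The allow-list entries other than `query` are placeholders for whichever schema exploration tools your setup forwards:

```javascript
// Allow-list the read-only tools; everything else is rejected.
// Tool names besides "query" are placeholders for your setup.
const ALLOWED_TOOLS = new Set(["query", "list_tables", "describe_table"]);

function screenMessage(message) {
  // Only tools/call requests need inspection; forward everything else.
  if (message.method !== "tools/call") return { forward: true };
  if (ALLOWED_TOOLS.has(message.params?.name)) return { forward: true };
  // Reject write-capable tools like query_rw with a JSON-RPC error reply.
  return {
    forward: false,
    reply: {
      jsonrpc: "2.0",
      id: message.id,
      error: {
        code: -32601,
        message: `Tool '${message.params?.name}' is not allowed`,
      },
    },
  };
}
```

A real proxy would apply this check to each incoming request before forwarding it to the remote MCP server, returning the error reply to the caller when `forward` is false.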

See [Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents) for patterns on building custom agent integrations with read-only access controls.

---

## Creating Visualizations with Dives

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives

> Build interactive visualizations from natural language using AI agents and the MotherDuck MCP Server

import VideoPlayer from '@site/src/components/VideoPlayer';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<VideoPlayer
      playbackId="osPhZS9kGWhGKgEH2xlsnIAFT7b6I00tNaXoXtlKz9vA"
      title="Dives product overview"
      thumbnailTime={46}
    />

Dives are interactive visualizations you create with natural language, directly on top of your data in MotherDuck. Ask a question to your AI agent, and MotherDuck generates a persistent, interactive component that lives in your workspace alongside your SQL.

Think of Dives as a bridge between one-off questions and always-up-to-date dashboards. Instead of building a full dashboard or writing complex queries, you can ask a question and save the answer as a Dive that stays current with your data.

## How Dives work

When you create a Dive with the [MotherDuck MCP](/sql-reference/mcp/) through an AI agent:

1. You ask a question in natural language (for example, "Show me monthly revenue trends by product category")
2. The AI agent queries your MotherDuck database through the [MCP Server](/sql-reference/mcp/) to understand the data
3. The agent creates an interactive visualization, with the necessary SQL to query the data
4. In clients that support the Dive Viewer MCP App, the Dive renders inline in the chat against live data. In other clients, the agent shows a static preview with sample data until you open the Dive in MotherDuck
5. MotherDuck saves the Dive to your workspace

Dives use MotherDuck's [hypertenancy](/concepts/hypertenancy) architecture to serve sub-second queries. Every user gets dedicated compute, so there's no slowdown when your whole team explores data at once.

### Inline preview with the Dive Viewer

On clients that support [MCP Apps](https://apps.extensions.modelcontextprotocol.io/), the MotherDuck MCP Server serves a **Dive Viewer MCP App** that renders your Dive directly in the chat with the same React components used in the MotherDuck UI. At launch, this is supported in Claude web and desktop; other clients fall back to a sample-data preview.

With the Dive Viewer:

- The preview queries **live data** through the MCP Server, so what you see in the chat matches what you'll see in MotherDuck.
- Every edit is applied incrementally and saved as a separate version of the Dive, rather than rewritten from scratch. You can browse versions from the version picker in the MotherDuck UI.
- You iterate conversationally (*"add a filter for US region"*, *"switch to a bar chart"*) and the Viewer updates in place.

## Prerequisites

To create a Dive, you will need:

- A MotherDuck account with at least one database
- An [AI client](/docs/getting-started/mcp-getting-started/) connected to the [MotherDuck MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) (Claude, ChatGPT, Cursor, or others)

Dives are available on all MotherDuck plans at no additional charge.

## Creating a Dive

<VideoPlayer
      playbackId="pL501twlYhiP8YyVKDMUowHwwn6LcmF400rQLO6lAeo02c"
      title="Creating your first MotherDuck Dive"
      autoPlay="true"
      muted="true"
      loop="true"
    />

Connect your AI assistant to the MotherDuck MCP Server, then ask it to create a visualization. The key is to ask for a "Dive" specifically, as this tells the agent to persist the visualization in your MotherDuck workspace.

**Example prompts:**

- *"Create a Dive showing monthly revenue trends for the last 12 months"*
- *"Make a Dive that breaks down customer sign-ups by region"*
- *"Build a Dive with a chart of our top 10 products by sales volume. Use MotherDuck's brand colors"*

The AI agent handles the SQL, chart configuration, styling and saving. You just describe what you want to see.

### Iterating on a Dive

Once you have a Dive, you can refine it through conversation:

- *"Add a filter for the US region only"*
- *"Change the chart to a stacked bar chart"*
- *"Add a trend line to show the overall direction"*

Each update modifies the Dive in place, keeping your visualization current.

## Finding your Dives

Dives appear in two places in the MotherDuck UI:

### Object explorer

Your Dives appear in the left sidebar above your Notebooks, most recent first. Click any Dive to load it in the main view.

![A screenshot of a dives dashboard in the MotherDuck UI](./img/dives_airquality_eastcoats_westcoast.png)

### Settings page

For a complete list of all Dives in your organization, go to **Settings** → **Dives**. This view makes it easier to find Dives created by others in your team.

![A screenshot of the dives settings and overview in the MotherDuck UI](./img/dives_settings_ui.png)

## Sharing Dives with your team

When you save a Dive, the AI agent checks whether the databases it queries are shared with your organization. If not, it will suggest sharing them so your team can view the Dive. You can also explicitly ask:

> *"Share the data for my revenue Dive with my team"*

This creates org-scoped shares for any private databases referenced in the Dive's queries and updates the Dive to use the shared references. See [`share_dive_data`](/sql-reference/mcp/share-dive-data) for details.

## Version history

Every time you update a Dive, MotherDuck saves a version. You can browse previous versions directly in the MotherDuck UI using the version picker in the top-right corner of a Dive. The dropdown shows each version with its description and when it was created.

![A screenshot of the version history dropdown in the MotherDuck Dives UI](./img/dives_version_history.png)

Selecting a previous version lets you view what the Dive looked like at that point. Version browsing is read-only: switching to an older version does not overwrite the latest version.

You can also retrieve versions programmatically. Use [`list_dives`](/sql-reference/mcp/list-dives) to see the `current_version` for each Dive, and [`read_dive`](/sql-reference/mcp/read-dive) with the `version` parameter to inspect a specific version.
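For illustration, an MCP `tools/call` request for a specific version might look like the following. The `tools/call` envelope is standard MCP, but the exact argument names (including `dive_id` here) are assumptions; check the [`read_dive`](/sql-reference/mcp/read-dive) reference for the actual schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_dive",
    "arguments": { "dive_id": "<dive_id>", "version": 3 }
  }
}
```

In practice you rarely write this yourself: asking your agent *"show me version 3 of my revenue Dive"* causes it to issue the call for you.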

## What makes Dives different

Unlike traditional dashboards:

- **Natural language creation**: Describe what you want in plain English instead of clicking through a UI or writing visualization code
- **Always current**: Dives query live data—no manual refreshes or stale snapshots
- **Workspace-native**: Dives live alongside your SQL in MotherDuck, not in a separate tool
- **Instant exploration**: Filter, drill down, and explore without waiting for queries to run

Unlike one-off AI-generated charts:

- **Persistent**: Dives save to your workspace so you can return to them anytime
- **Shareable**: Team members can view and interact with Dives you create—[share the underlying data](/sql-reference/mcp/share-dive-data) to give them access
- **Interactive**: Filter and explore the data, not just view a static image

## Walkthrough: Building a Dive step by step

<Tabs>
<TabItem value="claude-desktop-web" label="Claude Desktop/Web" default>

Connect the [MotherDuck MCP Server](/sql-reference/mcp/) to Claude for desktop or Claude on the web, then open a new conversation.

**Step 1: Explore your data**

Don't ask for a finished Dive right away. Start vague: *"Take a look at what tables I have in my analytics database."* Claude lists tables, reads column names, samples rows, and figures out how things connect. Doing this first saves you from chasing down SQL errors later.

When it reports back, keep asking questions. *"How do the orders and customers tables connect? What date range am I working with?"* The more Claude knows about your schema upfront, the fewer corrections you'll need.

**Step 2: Shape the analysis**

Point Claude at what you want to see.

If you're not sure what to look for, go open-ended: *"What are the most interesting patterns in this data?"* Claude runs queries and pulls out trends you might have missed.

If you already have something in mind, say so: *"I want to see how revenue breaks down by product category over the last 12 months."* You can also paste in a SQL query or a screenshot of a dashboard you want to recreate.

Mention specifics like calculated columns, filters, or date ranges before asking Claude to build the Dive.

**Step 3: Iterate on the live preview**

Claude renders the Dive inline in the chat with the Dive Viewer MCP App, using the same components as the MotherDuck UI and running against live data. Dive edits are versioned: you can ask Claude to refer to and clone prior versions for continued iteration, or browse past versions directly in the MotherDuck UI.

Explain *why* you want a change, not just *what*. *"I want to spot outliers quickly"* gives Claude more to work with than *"make the dots bigger."* Group related tweaks into one message. Keep unrelated changes separate. If something isn't working after two or three rounds, try a different approach. If you know what you want to change specifically, go ahead and do it.

Beyond the charts and visuals themselves, there are many ways to enhance your Dive: any custom interaction you've seen on the web is available to you. Ask for features like drill-downs, cross-filtering, zooming, and more.

You don't have to finish in one sitting.

**Step 4: Find it in MotherDuck**

Every edit from the Dive Viewer is saved to your workspace as a separate version, so the Dive is already there when you're done iterating. If you want to force a save or name a checkpoint explicitly, ask Claude: *"Save this as a Dive in MotherDuck."*

Find the Dive in the [Object Explorer sidebar](#object-explorer) or on the [Settings page](#settings-page), share it with your team, and come back to Claude when you want to change anything.

</TabItem>
<TabItem value="claude-code" label="Claude Code">

Claude Code lets you iterate quickly when building Dives: you can preview your changes in a local environment for an instant feedback loop, and Claude can set that environment up for you.

To get started, connect the [MotherDuck MCP Server](/sql-reference/mcp/) to Claude Code, then open a new conversation.

**Step 1: Explore your data**

Don't ask for a finished Dive right away. Start vague: *"Take a look at what tables I have in my analytics database."* Claude lists tables, reads column names, samples rows, and figures out how things connect. Doing this first saves you from chasing down SQL errors later.

When it reports back, keep asking questions. *"How do the orders and customers tables connect? What date range am I working with?"* The more Claude knows about your schema upfront, the fewer corrections you'll need.

**Step 2: Shape the analysis**

Point Claude at what you want to see.

If you're not sure what to look for, go open-ended: *"What are the most interesting patterns in this data?"* Claude runs queries and pulls out trends you might have missed.

If you already have something in mind, say so: *"I want to see how revenue breaks down by product category over the last 12 months."* You can also paste in a SQL query or a screenshot of a dashboard you want to recreate.

Mention specifics like calculated columns, filters, or date ranges before asking Claude to build the Dive.

**Step 3: Create a Dive local preview**

Next, ask Claude to create a Dive based on your analysis thus far and any other open questions on your mind. Claude will ask if you would like to see a local preview, and if you accept, the MotherDuck MCP will give Claude the instructions to set up a preview on your local machine.

To set up the preview, Claude creates a few local folders and runs some npm commands. After a moment, your environment will be ready and you will see a message like this:

> `The preview is running at http://localhost:5177/.`
> `Open that in your browser to see the Dive with live data from MotherDuck.`

Cmd+click the localhost URL (or Ctrl+click on Windows) to open a live preview of the Dive you just created in your browser.

**Step 4: Iterate with the preview**

Now you can tap into the power of agents for follow-up analysis and visual enhancements.

Explain *why* you want a change, not just *what*. *"I want to spot outliers quickly"* gives Claude more to work with than *"make the dots bigger."* Group related tweaks into one message. Keep unrelated changes separate. If something isn't working after two or three rounds, try a different approach. If you know what you want to change specifically, go ahead and do it.

Feel free to keep questions open ended. Things like, *"What other columns are correlated with revenue? What other interesting patterns should I investigate?"* can let Claude uncover hidden patterns on your behalf.

Beyond the charts and visuals themselves, there are many ways to enhance your Dive: any custom interaction you've seen on the web is available to you. Ask for features like drill-downs, cross-filtering, zooming, and more.

**Step 5: Publish to MotherDuck**

Tell Claude to save it: *"Save this as a Dive in MotherDuck."*

The Dive runs against live data. Find it in the [Object Explorer sidebar](#object-explorer) or on the [Settings page](#settings-page), share it with your team, and come back to Claude when you want to change anything.

</TabItem>
<TabItem value="chatgpt" label="ChatGPT">

Connect the [MotherDuck MCP Server](/sql-reference/mcp/) to ChatGPT and follow the general steps in [Creating a Dive](#creating-a-dive). The workflow is similar to the Claude Desktop/Web tab: explore your data, shape the analysis, then ask ChatGPT to save the result as a Dive.

</TabItem>
<TabItem value="cursor" label="Cursor">

Connect the [MotherDuck MCP Server](/sql-reference/mcp/) to Cursor and follow the general steps in [Creating a Dive](#creating-a-dive). The workflow is similar to the Claude Code tab: explore your data, shape the analysis, preview locally, then publish the Dive to MotherDuck.

</TabItem>
</Tabs>

## Tips for better Dives

### Be specific about the visualization

Include details about chart type, time ranges, and groupings:

| Less effective | More effective |
|----------------|----------------|
| "Show me sales data" | "Create a Dive with a line chart of weekly sales for 2024, broken down by product category" |
| "Make a customer chart" | "Build a Dive showing customer count by signup month as a bar chart" |

### Use your schema knowledge

If you know your table and column names, include them:

> "Create a Dive from the `orders` table showing `total_amount` by `order_date`, grouped by month"

### Start simple, then iterate

Begin with a basic visualization, then add complexity:

1. *"Create a Dive showing revenue by month"*
2. *"Add a breakdown by region"*
3. *"Filter to show only the top 5 regions"*

## Troubleshooting

| Issue | Solution |
|-------|----------|
| AI creates a chart but doesn't save it as a Dive | Explicitly ask to "create a Dive" or "save this as a Dive in MotherDuck" |
| Dive shows unexpected data | Ask the AI to explain the query it used, then refine your request |
| Can't find a Dive | Check **Settings** → **Dives** for the complete list |
| Dive is slow to load | The underlying query may be scanning a lot of data—ask the AI to add filters or optimize |

## Declaring required databases

When your Dive queries a database that viewers might not have attached, export a `REQUIRED_DATABASES` constant from your component. MotherDuck automatically attaches these databases (including shared databases) before running any queries, so your teammates don't see "Catalog does not exist" errors.

```jsx
export const REQUIRED_DATABASES = [
  {
    type: 'share',
    path: 'md:_share/<database_name>/<share_uuid>',
    alias: '<database_name>'
  }
];
```

Each entry describes one database:

| Field | Description |
|-------|-------------|
| `type` | `"share"` for shared databases, `"database"` for owned databases |
| `path` | The share URL (for example, `md:_share/galactic_coffee/af03aa17-...`) or database name |
| `alias` | The local alias used in your SQL queries |

You can find your share URLs by running `FROM MD_INFORMATION_SCHEMA.OWNED_SHARES;` or by asking the AI agent to use the [`share_dive_data`](/sql-reference/mcp/share-dive-data) tool.

This approach is preferred over calling `ATTACH` inside `useSQLQuery`, because it lets MotherDuck handle the attachment before any data queries fire.
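If you manage Dive components in a repository, a small sanity check can catch malformed entries before they reach viewers. This sketch only encodes the field rules from the table above; it is illustrative, not an official validator:

```javascript
// Illustrative shape check for a REQUIRED_DATABASES array, mirroring
// the field rules above: type is "share" or "database", share paths
// use the md:_share/ prefix, and alias is the local SQL alias.
function validateRequiredDatabases(entries) {
  if (!Array.isArray(entries)) return false;
  return entries.every(
    (entry) =>
      ["share", "database"].includes(entry.type) &&
      typeof entry.path === "string" &&
      typeof entry.alias === "string" &&
      (entry.type !== "share" || entry.path.startsWith("md:_share/"))
  );
}
```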

## Related resources

- [Embedding Dives in your website](/key-tasks/ai-and-motherduck/dives/embedding-dives)
- [Dives SQL Functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/) — Manage Dives directly from SQL
- [`useSQLQuery` hook](/sql-reference/motherduck-sql-reference/ai-functions/dives/use-sql-query) — React hook reference for querying data inside Dives
- [Connect to MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) — Set up the MCP server with your AI assistant
- [MCP Workflows](/key-tasks/ai-and-motherduck/mcp-workflows/) — Tips for effective AI-powered data analysis
- [AI Features in MotherDuck](/key-tasks/ai-and-motherduck/ai-features-in-ui/) — Explore Instant SQL and automatic SQL fixes

---

## Dive theme gallery

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/dive-theme-gallery

> Ready-to-use theme prompts for Dives with screenshots showing each style applied to the same dataset

import EmbeddedDive from '@site/src/components/EmbeddedDive';

Dives give you unlimited freedom in creating visualizations, but that does not automatically mean *good* visualizations. Use the following themes to point your AI agent at decades of experience from excellent data visualizers.

Pick a theme, copy the prompt, and paste it into your AI agent alongside your data question. The live theme gallery Dive below lets you switch between all 15 themes interactively.

<EmbeddedDive
  diveId="f03b2355-cf68-4852-b2c6-8414a1454324"
  title="Galactic Coffee theme gallery"
  height="600px"
/>

## Tufte Minimal

Inspired by Edward Tufte, *The Visual Display of Quantitative Information* (1983).

![A Dive styled with the Tufte Minimal theme showing monochrome charts with generous whitespace and no gridlines](./img/theme_gallery_tufftle_minimal.png)

```text
Create a Dive with a Tufte Minimal style.
Inspired by: Edward Tufte, The Visual Display of Quantitative Information (1983).

Visual rules:
- Background: #FFFFF8. Text: #111. Muted: #666.
- Chart colors: monochrome ["#111","#666","#999"].
- Font: Georgia, serif. Titles: normal weight, no transform.
- Layout: generous whitespace, no gridlines, no chart borders.
- Charts: no gridlines, thin strokes (1.5px), linear interpolation.
- Direct labeling instead of legends. Small multiples preferred.
- Interactive: year toggle, metric toggle, click-to-filter on bars/pies.

Pairs well with: small multiples, sparklines, scatter plots, slope charts,
direct-labeled values, heatmaps, composed dual-axis charts.
Avoid: pie charts, 3D charts, heavy gridlines.

Feel: Quiet authority — the data speaks for itself.
```

## Ink & Paper

Inspired by the New York Times Graphics Desk.

![A Dive styled with the Ink and Paper theme showing clean left-aligned charts with subtle gridlines](./img/theme_gallery_ink_and_paper.png)

```text
Create a Dive with an Ink & Paper style.
Inspired by: New York Times Graphics Desk.

Visual rules:
- Background: #fff. Text: #121212. Muted: #666.
- Chart colors: ["#326fa8","#e15759","#59a14f","#edc949","#af7aa1"].
- Font: Georgia, serif. Titles: bold.
- Layout: clean, left-aligned, subtle gridlines.
- Charts: light gridlines, 2px strokes, linear interpolation.
- Interactive: year toggle, metric toggle, click-to-filter cross-filtering.

Pairs well with: annotated line charts, bar charts, horizontal bars,
step charts, small multiples, tables, composed dual-axis charts, heatmaps.

Feel: Authoritative journalism — clarity above all.
```

## Corporate Dashboard

Inspired by classic BI tools (Tableau, Power BI).

![A Dive styled with the Corporate Dashboard theme showing card-based charts with structured grid and uppercase titles](./img/theme_gallery_corporate_dashboard.png)

```text
Create a Dive with a Corporate Dashboard style.
Inspired by: Classic BI tools (Tableau, Power BI).

Visual rules:
- Background: #f5f5f5. Text: #333. Muted: #777.
- Chart colors: ["#2563eb","#16a34a","#dc2626","#f59e0b","#8b5cf6"].
- Font: system-ui, sans-serif. Titles: semibold, UPPERCASE.
- Layout: card-based, subtle borders, structured grid.
- Interactive: year & metric toggles, click-to-filter cross-filtering.

Pairs well with: line charts, pie charts, KPI cards, data tables,
bar charts, combo charts, heatmaps.

Feel: Boardroom-ready — structured and professional.
```

## FT Salmon

Inspired by Financial Times Visual Journalism.

![A Dive styled with the FT Salmon theme showing charts on a signature salmon background with serif typography](./img/theme_gallery_ft_salmon.png)

```text
Create a Dive with an FT Salmon style.
Inspired by: Financial Times Visual Journalism.

Visual rules:
- Background: #FFF1E5 (signature salmon). Text: #33302E. Muted: #807973.
- Chart colors: ["#0F5499","#990F3D","#FF7FAA","#00A0DD"].
- Font: Georgia, serif. Titles: semibold.
- Interactive: year & metric toggles, click-to-filter cross-filtering.

Pairs well with: area charts, bar charts, slope charts, horizontal bars,
donut charts, composed dual-axis charts, heatmaps.

Feel: Financial authority — the pink paper, digitized.
```

## Soft Infographic

Inspired by David McCandless, *Information is Beautiful*.

![A Dive styled with the Soft Infographic theme showing rounded bar charts and pastel colors on a light background](./img/theme_gallery_soft_infographic.png)

```text
Create a Dive with a Soft Infographic style.
Inspired by: David McCandless, Information is Beautiful.

Visual rules:
- Background: #fafafa. Text: #2d2d2d. Muted: #888.
- Chart colors: ["#FF6B6B","#4ECDC4","#45B7D1","#FFA07A","#98D8C8"].
- Font: system-ui, sans-serif. Titles: bold.
- Charts: rounded bars (8px radius), smooth curves.
- Interactive: year & metric toggles, click-to-filter cross-filtering.

Pairs well with: rounded bar charts, donut charts, line charts,
radar charts, composed charts, heatmaps.

Feel: Friendly and approachable — data for everyone.
```

## Du Bois

Inspired by W.E.B. Du Bois, Paris Exposition (1900).

![A Dive styled with the Du Bois theme showing bold horizontal bars on a parchment background with crimson and gold accents](./img/theme_gallery_dubois.png)

```text
Create a Dive with a Du Bois style.
Inspired by: W.E.B. Du Bois, Paris Exposition (1900).

Visual rules:
- Background: #e8d4b8 (parchment). Text: #1a1a1a. Muted: #654321.
- Chart colors: ["#dc143c","#228b22","#000","#ffd700","#654321"].
- Charts: horizontal bars, no gridlines, sharp edges (0 radius).
- Interactive: year & metric toggles, click-to-filter cross-filtering.

Pairs well with: horizontal bar charts, pie charts, heatmaps,
composed dual-axis charts.

Feel: Bold proclamation — data as civil rights evidence.
```

## More themes

The live gallery includes 9 additional themes you can explore and copy:

| Theme | Category | Feel |
|-------|----------|------|
| Knowledge Beautiful | Modern | Dense and layered — every pixel earns its place |
| Film Flowers | Artistic | Organic and poetic — data as a living garden |
| Dark Canvas | Modern | Midnight studio — data glowing in the dark |
| Playful Sketch | Artistic | Personal and intimate — a handwritten letter in data |
| Neon 80s | Fun | Arcade at midnight — data goes synthwave |
| Pirate Map | Fun | X marks the data — adventure on the high seas |
| Vaporwave | Fun | Digital sunset — nostalgia rendered in pastel neon |
| Terminal | Fun | `> data.query --style=hacker` — pure terminal vibes |
| Candy Pop | Fun | Sugar rush — joyful, bold, unapologetically fun |

Explore all 15 themes in the dive:

<EmbeddedDive
  diveId="f03b2355-cf68-4852-b2c6-8414a1454324"
  title="Galactic Coffee theme gallery"
  height="500px"
/>

## Using a gallery prompt with your own data

These prompts are designed to be mixed with your data question. Replace the dataset-specific parts and keep the visual rules:

```text
Create a Dive showing monthly active users from my analytics database.

Theme: FT Salmon
- Background: #FFF1E5 (signature salmon). Text: #33302E. Muted: #807973.
- Chart colors: ["#0F5499","#990F3D","#FF7FAA","#00A0DD"].
- Font: Georgia, serif. Titles: semibold.
- Interactive: time filter (Last 7 days | Last 30 days | Last 90 days | All time),
  click-to-filter cross-filtering.

Charts:
1. Area chart — DAU trend over time
2. Bar chart — Users by country
3. Donut — Traffic source breakdown
4. Table — Top pages by session count
5. Composed chart — Sessions bars + Bounce rate line (dual Y-axis)
6. Heatmap — Country × Day of week activity
```

For more on structuring theme prompts, see [Theming and styling your Dives](/key-tasks/ai-and-motherduck/dives/theming-and-styling-dives/).

## Related resources

- [Theming and styling your Dives](/key-tasks/ai-and-motherduck/dives/theming-and-styling-dives/) — How to write theme prompts, pick chart types, and add interactivity
- [Creating Visualizations with Dives](/key-tasks/ai-and-motherduck/dives/) — Get started with your first Dive
- [Managing Dives as code](/key-tasks/ai-and-motherduck/dives/managing-dives-as-code/) — Version control and CI/CD for Dives

---

## Theming and styling your Dives

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/theming-and-styling-dives

> Control the visual appearance of your Dives with theme definitions, chart selection, and interactive filters

import VideoPlayer from '@site/src/components/VideoPlayer';

When you create a Dive, you can go beyond the default look and feel. By providing a **theme definition** in your prompt, you control colors, typography, chart types, and interaction patterns — turning a basic visualization into a polished, branded data experience.

This guide covers how to structure a theme prompt, pick the right chart types for your data, and add interactivity through filters and cross-filtering.

<VideoPlayer
      playbackId="EHeEV1w00cULDoAv83ec02KI9LbnkYEDmj4tKGtqqp7Lc"
      title="Theming and styling a Dive"
      autoPlay="true"
      muted="true"
      loop="true"
    />

You can explore and play with themed Dives in the [live theme gallery](https://duck-dives.vercel.app/snippets/galactic-coffee-theme-gallery), or browse our [curated theme gallery](/key-tasks/ai-and-motherduck/dives/dive-theme-gallery/) with screenshots and ready-to-copy prompts.

## How theming works in Dives

A Dive is a React component that renders charts using [Recharts](https://recharts.org/) and queries live MotherDuck data through `useSQLQuery`. When you describe a visual style in your prompt, the AI agent translates it into:

- A **color palette** (background, text, muted, and chart colors)
- **Typography** (font family, title weight, text transform)
- **Chart configuration** (grid lines, stroke width, curve type, bar radius)
- **Layout** (grid columns, spacing, card styling)

You don't need to write any code — describe the style and the agent handles the implementation.
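To make those parts concrete, here is an illustrative Python sketch of the kind of structured tokens an agent might distill from a style description. The field names are hypothetical, not the Dive's actual internal format; the agent ultimately writes real React/Recharts code.

```python
# Hypothetical token structure distilled from "Create a Dive with an FT Salmon style."
# The field names are illustrative only.
theme = {
    "palette": {
        "background": "#FFF1E5",  # signature salmon
        "text": "#33302E",
        "muted": "#807973",
        "chart_colors": ["#0F5499", "#990F3D", "#FF7FAA", "#00A0DD"],
    },
    "typography": {"font_family": "Georgia, serif", "title_weight": "semibold"},
    "chart": {"gridlines": True, "stroke_width": 2, "curve": "linear"},
    "layout": {"columns": 3, "card_borders": True},
}

# Every chart pulls from the same palette, which keeps the Dive visually consistent.
print(len(theme["palette"]["chart_colors"]))  # 4
```

Because the agent derives one consistent set of tokens, a single prompt line like `Background: #FFF1E5` propagates to every chart in the Dive.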

## Writing a theme prompt

A good theme prompt has four parts: **colors**, **typography**, **chart rules**, and **feel**. Here's an example that produces a Financial Times-inspired Dive:

```text
Create a Dive with an FT Salmon style.
Inspired by: Financial Times Visual Journalism.

Visual rules:
- Background: #FFF1E5 (signature salmon). Text: #33302E. Muted: #807973.
- Chart colors: ["#0F5499", "#990F3D", "#FF7FAA", "#00A0DD"].
- Font: Georgia, serif. Titles: semibold.
- Interactive: year & metric toggles, click-to-filter cross-filtering.

Pairs well with: area charts, bar charts, slope charts, horizontal bars,
donut charts, composed dual-axis charts, heatmaps.

Feel: Financial authority — the pink paper, digitized.
```

### What to include in your prompt

| Section | What to specify | Example |
|---------|----------------|---------|
| Colors | Background, text, muted accent, 3-5 chart colors | `Background: #0d1117. Chart colors: ["#58a6ff", "#3fb950"]` |
| Typography | Font family, title weight, text transform | `Font: Georgia, serif. Titles: bold, UPPERCASE` |
| Chart rules | Grid lines, stroke width, curve type, bar radius | `No gridlines, 1.5px strokes, linear interpolation` |
| Chart types | Which charts to include | `Pairs well with: area charts, bar charts, heatmaps` |
| Interactivity | Filters and cross-filtering behavior | `Interactive: year toggle, metric toggle, click-to-filter` |
| Feel | One-line mood descriptor | `Feel: Midnight studio — data glowing in the dark` |

### Tips for effective theme prompts

**Reference real-world styles.** Naming a specific design tradition helps the agent make consistent decisions. "Tufte minimal" or "Neon 80s synthwave" gives more coherent results than listing individual properties.

**Specify chart colors as an array.** Providing 3-5 hex colors as a JSON array (for example, `["#2563eb", "#16a34a", "#dc2626"]`) gives the agent an explicit palette instead of leaving it to guess.

**Pick colors that work in charts, not just colors that look nice together.** General-purpose palette generators often produce colors that clash or become indistinguishable when applied to bars, lines, and slices. Use tools designed for data visualization:

- [Colorbrewer 2.0](https://colorbrewer2.org/) — the gold standard for cartography and charts. Pick sequential, diverging, or qualitative palettes and get hex values ready to paste. Every palette is tested for perceptual uniformity and colorblind safety.
- [Viz Palette](https://projects.susielu.com/viz-palette) — paste your candidate colors and preview them on actual chart types (bars, lines, scatter). It flags pairs that are too similar or hard to distinguish with color vision deficiencies.

As a rule of thumb, limit your palette to 5-7 chart colors. More than that and the colors start blending together, especially in legends. If you have more categories than colors, consider grouping smaller categories into an "Other" bucket.
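The "Other" bucket can be produced in SQL with a ranked subquery, or in a small preprocessing step. A minimal Python sketch of the idea (the category names and counts are made up for illustration):

```python
def bucket_other(counts, max_slices=5):
    """Keep the largest categories and fold the rest into an 'Other' bucket."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top = dict(ranked[: max_slices - 1])
    rest = sum(v for _, v in ranked[max_slices - 1:])
    if rest:
        top["Other"] = rest
    return top

# Hypothetical traffic-source counts with more categories than palette colors
sources = {"Direct": 420, "Search": 310, "Social": 95, "Email": 60,
           "Referral": 40, "Ads": 25, "Partner": 10}
print(bucket_other(sources))
# {'Direct': 420, 'Search': 310, 'Social': 95, 'Email': 60, 'Other': 75}
```

This keeps the number of slices at or below the palette size, so every category gets a distinguishable color.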

**Mention the "feel" in one sentence.** This guides the agent on ambiguous decisions like spacing, border radius, and animation. "Sugar rush — joyful and bold" produces different results than "Quiet authority — the data speaks for itself."

## Choosing chart types

Different chart types serve different purposes. When building a Dive with multiple charts, pick a mix that covers different analytical angles of your data.

### Chart type reference

| Chart type | Best for | Data shape |
|------------|----------|------------|
| Line chart | Trends over time | Time series |
| Area chart | Volume over time, part-to-whole trends | Time series |
| Bar chart | Comparing categories | Categorical |
| Horizontal bar | Ranked lists, long category names | Categorical, sorted |
| Stacked area | Composition over time | Multi-series time |
| Composed chart (bar + line) | Dual metrics on shared timeline | Time series, two metrics |
| Heatmap | Density across two dimensions | Matrix (for example, station × month) |
| Pie / donut | Part-to-whole — ideally aim for 2 or 3 slices, max 5. A horizontal bar or donut is almost always easier to read. If you still want a pie chart, label slices directly. | Categorical, proportional |
| Radar | Multi-dimensional profile comparison | Categorical, normalized |
| Scatter | Correlation between two measures | Two continuous variables |
| Table | Exact values, detailed comparison | Any structured data |

### Chart pairing recommendations

A 6-chart grid works well with this pattern:

1. **Trend chart** (line, area, or stepped line) — shows how metrics move over time
2. **Comparison chart** (bar or horizontal bar) — ranks categories side by side
3. **Composition chart** (pie, donut, or stacked area) — shows part-to-whole relationships
4. **Detail view** (table or direct-labeled bars) — provides exact values
5. **Dual-axis chart** (composed bar + line) — overlays two related metrics
6. **Density chart** (heatmap or scatter) — reveals patterns across dimensions

This mix gives viewers both the big picture and the ability to drill into specifics.

## Adding interactivity

Interactive filters make a Dive more useful than a static dashboard. You can ask for several types of interactivity in your prompt.

### Time filters

Time filters are the most common interactive control. Two patterns work well depending on your data:

**Relative time windows** work best for operational data that updates continuously — think logs, events, or transactions. Users care about what happened in the last few hours or days, not a specific calendar year:

```text
Add time filter pills: Last 24h | Last 7 days | Last 30 days | Last 90 days | All time.
Filter all charts when a time range is selected. Default to Last 30 days.
```

**Year or period toggles** work better for data with natural calendar boundaries — annual reports, quarterly metrics, or fiscal comparisons:

```text
Add year toggle pills: 2024 | 2025 | All.
Filter all charts when a year is selected.
```

Pick whichever pattern matches how your users think about the data. If they ask "what happened this week?" go with relative windows. If they ask "how did Q4 compare to Q3?" go with period toggles.
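Behind either pattern, the selected pill ultimately becomes a date predicate in every chart's SQL. A rough Python sketch of the relative-window mapping (the pill labels, column name, and fixed `today` are illustrative assumptions, not the Dive's actual implementation):

```python
from datetime import date, timedelta

# Hypothetical pill labels; "today" is pinned so the example is deterministic.
WINDOWS = {"Last 24h": 1, "Last 7 days": 7, "Last 30 days": 30, "Last 90 days": 90}

def time_predicate(pill: str, column: str = "event_date",
                   today: date = date(2025, 6, 1)) -> str:
    """Translate the selected time pill into a SQL WHERE-clause fragment."""
    if pill == "All time":
        return "TRUE"  # no filtering
    start = today - timedelta(days=WINDOWS[pill])
    return f"{column} >= DATE '{start.isoformat()}'"

print(time_predicate("Last 30 days"))  # event_date >= DATE '2025-05-02'
```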

### Metric toggles

Let users switch which measure the charts display:

```text
Add a metric toggle between Revenue and Cups Sold.
The hero KPI and all chart Y-axes should update when toggled.
```

This changes the `dataKey` used by line, area, and bar charts, and swaps which metric appears as the primary KPI.

### Cross-filtering with click interactions

Cross-filtering means clicking an element in one chart filters every other chart in the Dive. This is different from putting a filter dropdown on each individual chart — and the difference matters.

**Why cross-filtering over individual filters?** When each chart has its own filter controls, users end up in a state where Chart A shows "US only," Chart B shows "all regions," and Chart C shows "Europe." The charts look coherent but they're answering different questions, and comparing them leads to wrong conclusions. Cross-filtering avoids this by keeping every chart in sync: click "US" on any chart and the entire Dive updates to show the US view. The user always sees one consistent story across all charts.

**When individual filters make sense.** There are cases where a per-chart filter is the right choice — when a chart has a dimension that doesn't exist in the other charts. For example, a chart showing data broken down by warehouse location doesn't need to cross-filter a chart that doesn't have a warehouse column. In that case, a local filter on just that chart is appropriate.

A good rule of thumb: use cross-filtering for shared dimensions (time, region, product category) and individual filters for dimensions unique to a single chart.
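The mechanism behind that rule of thumb is that all charts render from one shared filter state. A language-agnostic sketch in Python (row dicts stand in for query results; the dimension names are hypothetical):

```python
def apply_filters(rows, active_filters):
    """Apply one shared filter state to any chart's rows.

    A dimension the row doesn't have is treated as a match, which is why
    dimensions unique to a single chart don't participate in cross-filtering.
    """
    def keep(row):
        return all(row.get(dim, value) == value
                   for dim, value in active_filters.items())
    return [row for row in rows if keep(row)]

# Hypothetical query results
sales = [
    {"region": "US", "product": "espresso", "revenue": 120},
    {"region": "EU", "product": "latte", "revenue": 80},
    {"region": "US", "product": "latte", "revenue": 60},
]
# Clicking the "US" bar updates one shared state; every chart re-filters from it.
print(len(apply_filters(sales, {"region": "US"})))  # 2
```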

Enable cross-filtering in your prompt:

```text
Add click-to-filter cross-filtering:
- Click a bar in the station chart to filter by that station
- Click a pie slice to filter by that coffee type
- Non-selected items render at 30% opacity
- Show dismissible filter pills when filters are active
```

Cross-filtering works best when:

- **Bar charts** filter on their categorical axis (for example, clicking a station bar filters by station)
- **Pie and donut charts** filter on slice category (for example, clicking a product slice filters by product)
- **Unselected items** dim to 30% opacity rather than disappearing, so users keep the full context while focusing on a subset
- **Filter pills** appear below the controls showing active filters with a dismiss button

### Filter pills

When cross-filters are active, visible pills show what's filtered and let users clear filters with one click:

```text
Show active filters as colored pills with ✕ dismiss buttons.
Only show the pills row when filters are active.
```

### Tooltips and accordions

Interactive Dives let you keep the visual layout clean while still providing rich context. Move descriptions, methodology notes, and supporting text into **tooltips** and **accordions** so they're available on demand without cluttering the charts:

```text
Add an info tooltip on each chart title that explains the metric.
Add an expandable accordion below the charts with methodology notes.
```

This works well for Dives shared with a broad audience — power users can expand the details, while casual viewers get an uncluttered experience.

## Laying out a multi-chart Dive

For Dives with multiple charts, specify the grid layout in your prompt:

```text
Use a 3×2 grid layout (3 columns, 2 rows) with 6 charts.
Each chart card should have a title, subtle border, and 160px chart height.
```

Common layouts:

| Charts | Layout | Use case |
|--------|--------|----------|
| 2-4 | `repeat(2, 1fr)` | Focused analysis, fewer metrics |
| 5-6 | `repeat(3, 1fr)` | Dashboard-style overview |
| 8+ | `repeat(4, 1fr)` | Small multiples, sparkline grids |

## Example: full theme prompt

Here's a complete prompt that produces a themed, interactive Dive:

```text
Create a Dive showing sales data from my galactic_coffee database.

Theme: Corporate Dashboard
- Background: #f5f5f5. Text: #333. Muted: #777.
- Chart colors: ["#2563eb", "#16a34a", "#dc2626", "#f59e0b", "#8b5cf6"].
- Font: system-ui, sans-serif. Titles: semibold, UPPERCASE.
- Layout: 3×2 grid with card borders and 8px border radius.

Charts:
1. Line chart — Revenue trend over time
2. Pie chart — Product mix breakdown
3. Table — Station performance details
4. Bar chart — Station comparison
5. Composed chart — Revenue bars + Cups sold line (dual Y-axis)
6. Heatmap — Station × Month revenue density

Interactivity:
- Year toggle: 2024 | 2025 | All
- Metric toggle: Revenue | Cups
- Click a bar to filter by station, click a pie slice to filter by product
- Show filter pills with ✕ dismiss when filters are active

KPIs: Show total revenue, total cups sold, and average rating above the charts.
```

## Related resources

- [Dive theme gallery](/key-tasks/ai-and-motherduck/dives/dive-theme-gallery/) — Screenshots and ready-to-copy prompts for 15 themes
- [Creating Visualizations with Dives](/key-tasks/ai-and-motherduck/dives/) — Get started with your first Dive
- [Managing Dives as code](/key-tasks/ai-and-motherduck/dives/managing-dives-as-code/) — Version control and CI/CD for Dives
- [Dives SQL functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/) — Manage Dives directly from SQL
- [MCP Server tools](/sql-reference/mcp/) — Reference for all MCP tools including Dive operations

---

## Managing Dives as Code

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/managing-dives-as-code

> Set up a Git-based workflow for developing, previewing, and deploying Dives with GitHub Actions and Claude Code

import VideoPlayer from '@site/src/components/VideoPlayer';

Creating Dives through an AI agent is fast, but as your team relies on them for decision-making, you may want the same rigor you apply to production code: version history, code review, and automated deployments. Since Dives are React components and SQL queries under the hood, you can manage them with Git and CI/CD — just like the rest of your codebase.

This guide walks through setting up that workflow: local development with hot reload, PR-based preview deployments, and automated production updates on merge. A [starter repo](https://github.com/motherduckdb/blessed-dives-example) with GitHub Actions pipelines is ready to fork and use.

## Quick start

Fork the [starter repo](https://github.com/motherduckdb/blessed-dives-example) to get up and running immediately. It includes:

- A working example Dive
- The Vite preview setup for local development
- GitHub Actions for deploy and cleanup
- A `CLAUDE.md` that teaches the agent the repo conventions

Set a `MOTHERDUCK_TOKEN` secret in your fork, and you're deploying Dives on merge.

<VideoPlayer
  playbackId="VZIfavcF0200cYrzg6JG4ln1i6wwLDDmXc1yeApyGM1Yk"
  title="CI/CD with Dives"
  thumbnailTime={4}
  youtubeUrl="https://youtu.be/61ouqduhIbc"
/>

## Prerequisites

- A [MotherDuck account](https://app.motherduck.com) with at least one Dive already published
- A GitHub repository to store your Dive source files (or fork the [starter repo](https://github.com/motherduckdb/blessed-dives-example))
- [Claude Code](https://docs.anthropic.com/en/docs/build-with-claude/claude-code/overview) connected to the [MotherDuck MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/)
- A MotherDuck API token set as a GitHub secret (`MOTHERDUCK_TOKEN`)

## Pull a Dive for local development

Start with a Dive that's already published in MotherDuck. Copy its share link from the MotherDuck UI, then tell Claude Code to set it up locally:

```text
Set up this dive for local development: https://app.motherduck.com/dives/...
```

<VideoPlayer
  playbackId="VZIfavcF0200cYrzg6JG4ln1i6wwLDDmXc1yeApyGM1Yk"
  title="Pulling a Dive for local development"
  thumbnailTime={100}
  assetStartTime={43}
  assetEndTime={137}
/>

The agent uses the MotherDuck MCP Server to:

1. Read the Dive source through the SQL API using the share link
2. Pull down the file into a local directory in your repo
3. Register the Dive for CI
4. Start a lightweight Vite development server for live preview

The MCP Server's `get_dive_guide` tool provides the agent with everything it needs — the React component contract, dependency setup, and instructions for the local dev server. No additional skills or context files are required beyond what the MCP server provides.

![Claude Code spinning up the Vite dev server after pulling down a Dive for local development.](./img/claude_code_vite_terminal_1ffa9f80a9.png)

## Edit locally with an AI agent

With the local dev server running, you can iterate on the Dive using Claude Code. The agent can restyle charts, rewrite SQL queries, add filters, swap visualizations — anything you can express as a prompt.

```text
Make this much better visually. Top-tier style please.
```

<VideoPlayer
  playbackId="VZIfavcF0200cYrzg6JG4ln1i6wwLDDmXc1yeApyGM1Yk"
  title="Editing a Dive locally with Claude Code"
  thumbnailTime={170}
  assetStartTime={137}
  assetEndTime={200}
/>

The Vite dev server hot-reloads changes, so you see updates instantly in the browser. The MCP server provides schema context so the agent writes accurate SQL against your live data.

![A Dive running locally, showing the updated dashboard with improved styling and layout.](./img/dive_local_preview_9bccdb19bf.png)

If your repo includes a `CLAUDE.md` file (the [starter repo](https://github.com/motherduckdb/blessed-dives-example) includes one), the agent also knows the folder conventions and how to register new Dives for CI — so you can go from "pull this Dive down" to "push up a PR" without explaining any plumbing.

## Deploy a preview with GitHub Actions

Once you're happy with your changes, tell the agent to push a PR:

```text
Put up a PR on a new feature branch
```

When a PR is opened (or updated with new commits), a GitHub Action detects which Dive folders changed and deploys a **preview** Dive to MotherDuck. The preview uses the same live environment as production but has a branch-tagged title so it's clearly labeled. A comment appears on the PR with a direct link.

<VideoPlayer
  playbackId="VZIfavcF0200cYrzg6JG4ln1i6wwLDDmXc1yeApyGM1Yk"
  title="PR preview deployment for Dives"
  thumbnailTime={218}
  assetStartTime={200}
  assetEndTime={238}
/>

![A GitHub Actions bot comment on a PR showing a preview Dive link — click Open Dive to see it live in MotherDuck.](./img/pr_preview_comment_13ca302ff9.png)

Your reviewer clicks the link and sees the Dive running with live queries — no local setup needed.

The deploy action uses path filters to detect which Dive folders changed, then calls a shared deploy script (`scripts/deploy-dive.sh`) for each one. The script reads the Dive's source and metadata, and uses the DuckDB CLI with the MotherDuck extension to create or update the Dive.
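The path-filter step can be pictured as a small function over the changed-file list. A hedged Python sketch, assuming one folder per Dive under a top-level `dives/` directory (the directory name is an illustrative assumption, not necessarily the starter repo's exact layout):

```python
def changed_dives(changed_files):
    """Map a changed-file list (e.g. from `git diff --name-only`) to the
    set of Dive folders that need redeploying. Assumes one folder per
    Dive under a top-level `dives/` directory (illustrative layout)."""
    dives = set()
    for path in changed_files:
        parts = path.split("/")
        if len(parts) >= 2 and parts[0] == "dives":
            dives.add(parts[1])
    return sorted(dives)

files = ["dives/sales-overview/Dive.tsx", "dives/sales-overview/meta.json",
         "dives/churn/Dive.tsx", "README.md"]
print(changed_dives(files))  # ['churn', 'sales-overview']
```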

## Merge to production

When the preview looks right, merge the PR. A separate deploy job then creates or updates the production Dive, matched by title.

<VideoPlayer
  playbackId="VZIfavcF0200cYrzg6JG4ln1i6wwLDDmXc1yeApyGM1Yk"
  title="Merging a Dive to production"
  thumbnailTime={245}
  assetStartTime={244}
  assetEndTime={255}
/>

The production Dive is now live and shareable with anyone in your organization.

![The deploy GitHub Action after a merge to main, completing in 20 seconds.](./img/deploy_action_success_f763894ae0.png)

## Clean up preview Dives

Delete the feature branch after merging, and a cleanup action removes the preview Dive from your MotherDuck account — no orphaned Dives cluttering your workspace.

<VideoPlayer
  playbackId="VZIfavcF0200cYrzg6JG4ln1i6wwLDDmXc1yeApyGM1Yk"
  title="Cleanup on branch deletion"
  thumbnailTime={265}
  assetStartTime={259}
/>

The entire pipeline is two GitHub Actions and one secret (`MOTHERDUCK_TOKEN`). At MotherDuck, we use a dedicated service account so anyone with repo access can edit and deploy with the same ownership scope.

## Related resources

- [Creating Visualizations with Dives](/key-tasks/ai-and-motherduck/dives/) — Create Dives from natural language with AI agents
- [Dives SQL Functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/) — Manage Dives directly from SQL
- [Connect to MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) — Set up the MCP server with your AI assistant
- [Starter repo](https://github.com/motherduckdb/blessed-dives-example) — Fork and start deploying

---

## AI Features in the MotherDuck UI

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/ai-features-in-ui

> Use AI-powered SQL editing, FixUp, and natural language queries in the MotherDuck web interface.

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::tip Quick overview
For a hands-on walkthrough of FixIt and Edit in the web UI, see the [Web UI guide](/getting-started/interfaces/motherduck-quick-tour/#fix-errors-and-edit-queries-with-ai).
:::

## Automatically Edit SQL Queries in the MotherDuck UI

Edit is an AI-powered MotherDuck feature that lets you modify SQL queries in the MotherDuck UI. The AI is aware of DuckDB-specific SQL features and relevant database schemas, so it can provide effective suggestions.

Select the specific part of the query you want to edit, then press the keyboard shortcut to open the Edit dialog:
* Windows/Linux: `Ctrl + Shift + E`
* macOS: `⌘ + Shift + E`
  
In the Edit dialog, enter your prompt (e.g., "extract the domain from the url, using a regex") and click Suggest edit.

![Edit](../img/edit-prompt.png)

If the suggestion is not what you want, refine it with follow-up prompts.

![Edit](../img/edit-follow-up.png)

When you are happy with the change, click 'Apply edit' to apply it to the query.

![Edit](../img/edit-follow-up-2.png)


## Automatically Fix SQL Errors in the MotherDuck UI

FixIt is a MotherDuck AI-powered feature that helps you resolve common SQL errors by offering fixes in-line. Read more about it in our [blog post](https://motherduck.com/blog/introducing-fixit-ai-sql-error-fixer/).
FixIt can also be called programmatically using the `prompt_fix_line` table function. Find more information in the [prompt_fix_line documentation](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line).

### How FixIt works

By default, FixIt is enabled for all users. If you run a query that has an error, FixIt automatically analyzes the query and suggests in-line fixes.
When you accept a fix, MotherDuck automatically updates your query and re-executes it.

![FixIt](../img/fixit-suggestion.png)

When 'Auto-suggest' is toggled off, FixIt no longer suggests fixes automatically. You can still trigger it manually by clicking 'Suggest fix' at the bottom of the error message.

![FixIt](../img/fixit-manual-suggestion.png)


## Access SQL Assistant functions
MotherDuck provides built-in AI features to help you write, understand, and fix DuckDB SQL queries more efficiently. These features include:

- [Answer questions about your data](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-query) using the `prompt_query` pragma.
- [Generate SQL](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql) for you using the `prompt_sql` table function.
- [Correct and fix up your SQL query](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fixup) using the `prompt_fixup` table function.
- [Correct and fix up your SQL query line-by-line](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line) using the `prompt_fix_line` table function.
- [Help you understand a query](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-explain) using the `prompt_explain` table function.
- [Help you understand contents of a database](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-schema) using the `prompt_schema` table function.

### Example usage of prompt_sql
We use MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).

```sql
CALL prompt_sql('what are the top domains being shared on hacker_news?');
```

The output of this statement is a single-column table containing the AI-generated SQL query.

| **query** |
|-----------|
| `SELECT COUNT(*) as domain_count, SUBSTRING(SPLIT_PART(url, '//', 2), 1, POSITION('/' IN SPLIT_PART(url, '//', 2)) - 1) as domain FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY domain_count DESC LIMIT 10` |
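For intuition, the generated query is doing roughly this, sketched in plain Python with `urllib.parse` (the URLs are made up; this is not what `prompt_sql` executes):

```python
from collections import Counter
from urllib.parse import urlparse

def top_domains(urls, limit=10):
    """Count domains across shared URLs, mirroring the generated SQL."""
    counts = Counter(urlparse(u).netloc for u in urls if u)
    return counts.most_common(limit)

# Made-up URLs standing in for the hacker_news.url column
urls = ["https://github.com/duckdb/duckdb", "https://github.com/motherduckdb",
        "https://news.ycombinator.com/item?id=1", None]
print(top_domains(urls))  # [('github.com', 2), ('news.ycombinator.com', 1)]
```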

---

## Custom AI Agent Builder's Guide

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/building-analytics-agents

> Build AI-powered analytics agents using MotherDuck's SQL functions and MCP server integration.

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Building analytics agents with MotherDuck

Analytics agents are AI-powered systems that allow users to interact with data using natural language. Instead of writing SQL queries or building dashboards, users can ask questions like "What were our top-selling products last quarter?" and get immediate answers.

This guide covers best practices for building production-ready analytics agents on MotherDuck.

## Prerequisites

- **Agent framework**: [Claude Agent SDK](https://docs.anthropic.com/en/api/agent-sdk/overview), [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/), or Claude Desktop with MotherDuck remote MCP connector
- **MotherDuck account** with the data you want to query
- **Clean, well-structured data**: The better your schema and metadata, the better your agent performs

## Step 1: Define your agent's interface

Choose the interface your agent will use to query your MotherDuck database.

### Option A: Generated SQL

The agent generates SQL queries and executes them through a tool/function call. This provides maximum flexibility (agents can answer any question your data supports) but requires reliable SQL generation.

**Implementation approaches:**

**MCP Server**: Use our [remote MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) (or [local MCP server](/key-tasks/ai-and-motherduck/mcp-setup/#remote-vs-local-mcp-server) for self-hosted, read-write) for Claude Desktop, Cursor, ChatGPT, or Claude Code

**Custom tool calling**: Create a function that accepts SQL strings and executes them:

<Tabs>
<TabItem value="python" label="Python">

```python
import duckdb

def execute_sql(query: str) -> str:
    """Execute SQL query against MotherDuck"""
    conn = duckdb.connect('md:my_database?motherduck_token=<read_scaling_token>')
    try:
        result = conn.execute(query).fetchdf()
        return result.to_string()
    except Exception as e:
        return f"Error: {str(e)}"
    finally:
        conn.close()
```

</TabItem>
</Tabs>

### Option B: Parameterized query templates

The agent receives structured parameters that fill predefined SQL templates. This provides strict correctness guarantees and is easier to validate, but it is less flexible: queries are limited to predefined questions, and each template requires upfront development.

**Example**: The agent calls a custom tool with a domain-specific signature, such as `get_sales_by_region(region: str, start_date: date, end_date: date)`, instead of generating SQL.
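
As an illustrative sketch of this pattern (the tool name, table, and region list are hypothetical, not part of MotherDuck's API), a template tool validates structured parameters and returns a parameterized query for the driver to execute:

```python
from datetime import date

# Hypothetical template tool for Option B: the SQL text is fixed, so only
# predefined questions can be asked and user input never reaches the
# query as raw SQL - it is bound as parameters instead.
SALES_BY_REGION_SQL = """
    SELECT region, SUM(total_amount) AS revenue
    FROM orders
    WHERE region = ? AND order_date BETWEEN ? AND ?
    GROUP BY region
"""

ALLOWED_REGIONS = {"EMEA", "AMER", "APAC"}

def get_sales_by_region(region: str, start_date: date, end_date: date):
    """Validate parameters and return (sql, params) for a driver that
    supports positional placeholders, e.g. conn.execute(sql, params)."""
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"Unknown region: {region!r}")
    if start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    return SALES_BY_REGION_SQL, [region, start_date, end_date]
```

Because validation happens before any SQL runs, a bad parameter fails fast with a clear error the agent can surface to the user.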

**Recommendation**: Start with Option A (SQL generation) unless you have strict correctness requirements or very limited query patterns.

## Step 2: Give your agent SQL knowledge

Your LLM needs to know how to write good DuckDB queries.

### System prompt for DuckDB and MotherDuck

A system prompt is the foundational instruction set that guides your agent's behavior and capabilities. It's critical for ensuring your agent generates correct, efficient SQL queries and understands how to explore data effectively.

The query guide below should be added to your system prompt because it contains:
- DuckDB SQL syntax and conventions
- Common patterns and best practices
- How to explore schemas efficiently

<details>
<summary>query_guide.md</summary>

```text
# DuckDB SQL Query Syntax and Performance Guide 

## General Knowledge

### Basic Syntax and Features

**Identifiers and Literals:**
- Use double quotes (`"`) for identifiers with spaces/special characters or case-sensitivity
- Use single quotes (`'`) for string literals

**Flexible Query Structure:**
- Queries can start with `FROM`: `FROM my_table WHERE condition;` (equivalent to `SELECT * FROM my_table WHERE condition;`)
- `SELECT` without `FROM` for expressions: `SELECT 1 + 1 AS result;`
- Support for `CREATE TABLE AS` (CTAS): `CREATE TABLE new_table AS SELECT * FROM old_table;`

**Advanced Column Selection:**
- Exclude columns: `SELECT * EXCLUDE (sensitive_data) FROM users;`
- Replace columns: `SELECT * REPLACE (UPPER(name) AS name) FROM users;`
- Pattern matching: `SELECT COLUMNS('sales_.*') FROM sales_data;`
- Transform multiple columns: `SELECT AVG(COLUMNS('sales_.*')) FROM sales_data;`

**Grouping and Ordering Shortcuts:**
- Group by all non-aggregated columns: `SELECT category, SUM(sales) FROM sales_data GROUP BY ALL;`
- Order by all columns: `SELECT * FROM my_table ORDER BY ALL;`

**Complex Data Types:**
- Lists: `SELECT [1, 2, 3] AS my_list;`
- Structs: `SELECT {'a': 1, 'b': 'text'} AS my_struct;`
- Maps: `SELECT MAP([1,2],['one','two']) AS my_map;`
- Access struct fields: `struct_col.field_name` or `struct_col['field_name']`
- Access map values: `map_col[key]`

**Date/Time Operations:**
- String to timestamp: `strptime('2023-07-23', '%Y-%m-%d')::TIMESTAMP`
- Format timestamp: `strftime(NOW(), '%Y-%m-%d')`
- Extract parts: `EXTRACT(YEAR FROM DATE '2023-07-23')`

### Database and Table Qualification

**Fully Qualified Names:**
- Tables are accessed by fully qualified names: `database_name.schema_name.table_name`
- There is always one current database: `SELECT current_database();`
- Tables from the current database don't need database qualification: `schema_name.table_name`
- Tables in the main schema don't need schema qualification: `table_name`
- Shorthand: `my_database.my_table` is equivalent to `my_database.main.my_table`

**Switching Databases:**
- Use `USE my_other_db;` to switch current database
- After switching, tables in that database can be accessed without qualification

### Schema Exploration

**Get database and table information:**
- List all databases: `SELECT alias as database_name, type FROM MD_ALL_DATABASES();`
- List tables in database: `SELECT database_name, schema_name, table_name, comment FROM duckdb_tables() WHERE database_name = 'your_database';`
- List views in database: `SELECT database_name, schema_name, view_name, comment, sql FROM duckdb_views() WHERE database_name = 'your_database';`
- Get column information: `SELECT column_name, data_type, comment, is_nullable FROM duckdb_columns() WHERE database_name = 'your_database' AND table_name = 'your_table';`

**Sample data exploration:**
- Quick preview: `SELECT * FROM table_name LIMIT 5;`
- Column statistics: `SUMMARIZE table_name;`
- Describe table: `DESCRIBE table_name;`

### Performance Tips

**QUALIFY Clause for Window Functions:**
-- Get top 2 products by sales in each category
SELECT category, product_name, sales_amount
FROM products
QUALIFY ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales_amount DESC) <= 2;

**Efficient Patterns:**
- Use `arg_max()` and `arg_min()` for "most recent" queries
- Filter early to reduce data volume
- Use CTEs for complex queries
- Prefer `GROUP BY ALL` for readability
- Use `QUALIFY` instead of subqueries for window function filtering

**Avoid These Patterns:**
- Applying functions to columns in `WHERE` predicates (prevents filter pushdown)
- Unnecessary ORDER BY on intermediate results
- Cross products and cartesian joins
```

</details>

### Function documentation

MotherDuck maintains `function_docs.jsonl`, compact, LLM-friendly documentation for every DuckDB/MotherDuck function, available at: https://app.motherduck.com/assets/docs/function_docs.jsonl

**How to use**:
1. When user asks a question, search function docs using FTS or semantic search
2. Add the 5 most relevant function descriptions to the agent's context
3. This helps with specialized functions (window functions, date arithmetic, JSON operations, etc.)
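
A minimal sketch of steps 1-2, assuming each JSONL line has `name` and `description` fields (inspect the real file for its actual schema) and using simple keyword overlap as a stand-in for FTS or semantic search:

```python
import json

# Rank function-doc entries by keyword overlap with the user's question.
# Field names ("name", "description") are illustrative assumptions; in
# production, fetch function_docs.jsonl once and use FTS or embeddings.
def top_functions(question: str, jsonl_text: str, k: int = 5):
    q_terms = set(question.lower().split())
    scored = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        doc = json.loads(line)
        text = f"{doc.get('name', '')} {doc.get('description', '')}".lower()
        score = sum(1 for term in q_terms if term in text)
        scored.append((score, doc))
    scored.sort(key=lambda pair: -pair[0])
    return [doc for score, doc in scored[:k] if score > 0]

sample = "\n".join([
    '{"name": "date_trunc", "description": "Truncate a date to a given precision"}',
    '{"name": "regexp_matches", "description": "Match a string against a regex"}',
])
print(top_functions("truncate order date to month", sample, k=1))
```

The matching entries then go into the agent's context alongside the system prompt.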

## Step 3: Give your agent schema context

Your agent needs to understand your database structure to generate correct queries.

### Finding relevant tables

Our `query_guide.md` explains how agents can explore schemas autonomously to find relevant tables. For faster, non-agentic identification, use the built-in `INFORMATION_SCHEMA`.

```sql
-- adjust the search terms and database(s) to your needs
SELECT
  table_schema,
  table_name,
  table_comment
FROM information_schema."tables"
WHERE table_catalog = current_database()
  AND (
    table_name LIKE '%sales%'
    OR table_name LIKE '%customer%'
    OR table_name LIKE '%cust%'
    OR table_comment LIKE '%sales%'
    OR table_comment LIKE '%customer%'
  );
```

For column-level information, use `information_schema.columns`.

### Make schemas agent-friendly

**Use clear naming**: Choose explicit, unambiguous table and column names

❌ Bad: `ord_dtl`, `cust_id`, `amt`  
✅ Good: `order_details`, `customer_id`, `total_amount`

**Add context with COMMENT ON**:

```sql
COMMENT ON TABLE orders IS 'Customer orders since 2020. Join to customers via customer_id';
COMMENT ON COLUMN orders.status IS 'Possible values: pending, shipped, delivered, cancelled';
COMMENT ON COLUMN orders.total_amount IS 'Total in USD including tax and shipping';
```

Comments help agents understand table relationships, valid values, and business logic. Learn more: [COMMENT ON documentation](https://duckdb.org/docs/stable/sql/statements/comment_on.html)

## Step 4: Configure access controls

Secure your agent's database access with appropriate permissions and isolation.

### Read-only access

Use [read-scaling tokens](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) to ensure your agent only has read access. Read-scaling tokens connect to dedicated read replicas that cannot modify data.

<Tabs>
<TabItem value="python" label="Python">

```python
import duckdb
# Using a read-scaling token ensures read-only access
con = duckdb.connect('md:my_database?motherduck_token=<read_scaling_token>')
```

</TabItem>
</Tabs>

**For multi-tenant [customer-facing analytics](/getting-started/customer-facing-analytics/) agents**:

Use [service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) for your agents. You can grant these service accounts read-only access to specific databases using [shares](/key-tasks/sharing-data/sharing-overview/):

```sql
ATTACH 'md:_share/my_org/abc123' AS shared_data;
```

Consider creating separate service accounts per user/tenant for full compute isolation.

**Capacity planning**: Choose the number of [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) replicas and [Duckling size](/about-motherduck/billing/duckling-sizes/) according to the expected query complexity and concurrency.

### Read-write access & sandboxing

For agents that need to create tables, modify data, or experiment safely, use zero-copy clones to create an isolated sandbox. Clones are created instantly, changes stay fully isolated from production data, and results can be shared back to production when ready.

```sql
-- Create instant writable copy (clones must match source retention type)
CREATE DATABASE my_sandbox FROM my_database_share;

-- Agent can now read/write without affecting production data
-- Changes are isolated to this copy
```

Learn more: [CREATE DATABASE documentation](/sql-reference/motherduck-sql-reference/create-database/)
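
For programmatic sandbox management, a thin helper can issue the clone and cleanup statements. This is a sketch: `conn` is any DuckDB connection object with an `execute()` method, and the naming scheme is an arbitrary choice:

```python
import uuid

# Create a zero-copy sandbox clone from a share, and drop it when done.
# The sandbox name is randomized so concurrent agents don't collide.
def create_sandbox(conn, source_share: str) -> str:
    name = f"sandbox_{uuid.uuid4().hex[:8]}"
    conn.execute(f"CREATE DATABASE {name} FROM {source_share}")
    return name

def drop_sandbox(conn, name: str):
    conn.execute(f"DROP DATABASE {name}")
```

A typical lifecycle is create, hand the sandbox name to the agent, then drop once results have been reviewed or copied out.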

## Step 5: Implement your agent

Build your agent using an SDK or framework that supports function calling.

**Quick start option**: For immediate experimentation, try [Claude Desktop with the MotherDuck remote MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) - no coding required.

**Custom agent option**: Here's a simple example using the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/):

<Tabs>
<TabItem value="python" label="Python">

```python
import duckdb
from agents import Agent, Runner, function_tool

# Connect to MotherDuck (use a read-scaling token for read-only access)
conn = duckdb.connect('md:?motherduck_token=<read_scaling_token>')

@function_tool
def query_motherduck(sql: str) -> str:
    """Execute SQL query against MotherDuck database.
    
    Args:
        sql: The SQL query to execute against the MotherDuck database.
    """
    try:
        result = conn.execute(sql).fetchdf()
        return result.to_string()
    except Exception as e:
        return f"Error executing query: {str(e)}"

# Load the DuckDB query guide (copy the system prompt template above into a local file)
with open('query_guide.md', 'r') as f:
    query_guide = f.read()

# Create agent with database tool
agent = Agent(
    name="MotherDuck Analytics Agent",
    instructions=f"""You are a data analyst helping users query a MotherDuck database. 

Use the query_motherduck tool to execute SQL queries against the database.

Always start with schema exploration before querying specific tables.

{query_guide}
""",
    tools=[query_motherduck]
)

# Run the agent
result = Runner.run_sync(
    agent, 
    "What were the top 5 products by revenue last month?"
)
print(result.final_output)
```

</TabItem>
</Tabs>

### Validating queries before showing to users

If a human reviews generated queries before execution, use `try_bind()` to validate SQL without running it. It checks syntax and referenced tables/columns in milliseconds.

**Structured output:** `try_bind()` returns `error_message` (VARCHAR) and `error_type` (VARCHAR). Use `error_type` to decide what to do next: `ok` means validation passed, `parser` means SQL syntax is invalid, and `binder` means object resolution failed (for example, a missing table/column or invalid reference). On `parser` or `binder`, pass `error_message` back into the next generation attempt so the model can repair the query.

```sql
-- Valid query - error_type is 'ok', error_message is empty
CALL try_bind('SELECT customer_id, total FROM orders WHERE status = ''shipped''');

-- Invalid query - returns error_message and error_type (e.g. 'parser' or 'binder')
CALL try_bind('SELECT * FORM orders');
```

**Example integration:**

<Tabs>
<TabItem value="python" label="Python">

```python
def generate_query_for_review(question: str) -> str:
    """Generate and validate SQL before showing to user."""
    error_msg = None
    for attempt in range(3):
        sql = agent.generate_sql(question, error_feedback=error_msg)

        # Validate before showing (error_message, error_type)
        row = conn.execute("CALL try_bind(?)", [sql]).fetchall()[0]
        error_message, error_type = row[0], row[1]

        if error_type == "ok":
            return f"Generated query:\n{sql}"

        error_msg = error_message or f"Validation failed: {error_type}"

    return "Could not generate a valid query to answer the question"
```

</TabItem>
</Tabs>

Feed `error_message` and `error_type` from `try_bind()` into retries to fix syntax and binding errors.

## Step 6: Test and iterate

Validate your agent's performance and refine its behavior based on real-world usage.

### Testing and quality

Choose a set of realistic user questions that cover simple filters ("Show me sales from last month"), complex analysis ("What's the trend in customer retention by region?"), and edge cases such as empty results ("Show me sales for December 2019") or ambiguous requests ("Show me the best customers"). Test each question and check the agent's behavior, focusing on SQL correctness, result accuracy, and query performance. See the next section for how to tackle common issues.
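
A minimal evaluation harness can automate this loop. The sketch below assumes a hypothetical `ask_agent` callable wrapping however you invoke your agent (e.g. `Runner.run_sync` above); substring checks are coarse, but they catch obvious regressions cheaply:

```python
# Each case pairs a question with phrases we expect in a correct answer.
# The questions and phrases here are illustrative, not a real test suite.
TEST_CASES = [
    {"question": "Show me sales from last month", "expect": ["sales"]},
    {"question": "Show me sales for December 2019", "expect": ["no data", "0 rows"]},
]

def run_eval(ask_agent, cases=TEST_CASES):
    results = []
    for case in cases:
        answer = ask_agent(case["question"]).lower()
        # Pass if ANY expected phrase appears (edge cases may be phrased differently).
        passed = any(phrase in answer for phrase in case["expect"])
        results.append({"question": case["question"], "passed": passed})
    return results
```

Run this suite after every prompt or schema change and track the pass rate over time.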

### Common issues and solutions

| Issue | Solution |
|-------|----------|
| Invalid SQL generation | Improve system prompt, add [function docs](#function-documentation) to context |
| Wrong tables queried | Add [COMMENT ON](https://duckdb.org/docs/stable/sql/statements/comment_on.html), improve schema descriptions, implement table filtering |
| Misunderstood questions | Add domain-specific examples to system prompt |
| Query performance | [EXPLAIN ANALYZE](/sql-reference/motherduck-sql-reference/explain-analyze/) to diagnose query inefficiencies, adjust [Duckling size](/about-motherduck/billing/duckling-sizes/) to scale compute resources |

## Next steps

- Explore our [MCP Server](/sql-reference/mcp/) docs (remote and local)
- Try [AI Features in the MotherDuck UI](/key-tasks/ai-and-motherduck/ai-features-in-ui/) with Generate SQL & Edit
- Learn about [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for multi-tenant agents
- Review [Shares](/key-tasks/sharing-data/sharing-overview/) for read-only data access

---

## Embedding Dives in your web application

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/embedding-dives

> Embed interactive MotherDuck Dives in your web app using iframes and embed sessions

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

You can embed Dives in your own web application so your users can interact with live data dashboards without signing in to MotherDuck. Your backend creates an embed session, and your frontend loads the Dive in a sandboxed iframe.

Embedding Dives is available on the **Business plan**.

## Prerequisites

Before you start, you need:

- A **MotherDuck Business plan** account
- A read/write access token for an account with the Admin role. For production, we recommend using a dedicated [service account](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/)
- A Dive you want to embed, with its [data shared](/sql-reference/mcp/share-dive-data) to the target service account that the embedded Dive will run as
- A backend server that can make authenticated API calls

:::tip Use a dedicated service account
We recommend using a service account that does not own databases with the same names as the databases your Dives query. When the service account attaches shared Dive data, the share alias defaults to the source database name. If the service account already has a database with that name, the attach fails. Using a dedicated, empty service account for embedding avoids this conflict.
:::

## How it works

Embedded Dives follow a short server-side flow:

1. **Your backend** calls the MotherDuck API with your access token to create an embed session: an opaque string that contains a read-only token and the information needed to load the Dive.
2. **Your frontend** renders a sandboxed iframe that loads the Dive from `embed-motherduck.com`, passing the session string.
3. **MotherDuck** loads the Dive and runs live SQL queries.

Your end-users see an interactive dashboard without needing a MotherDuck account.

::::info[Two tokens are in play]
Your service account's access token is a **high-privilege read-write admin token** that stays on your backend and is used only to create embed sessions. The session string it produces contains a **separate, read-only token** that is limited in scope and expires after 24 hours. Only the session string should ever reach the frontend.
::::

```mermaid
sequenceDiagram
    participant M as MotherDuck
    participant B as Your backend
    participant F as Your frontend
    participant E as Embed iframe

    Note over B: Holds your access token
    B->>M: POST /v1/dives/<dive_id>/embed-session
    M-->>B: Session string
    B-->>F: Return session string
    F->>E: Load iframe /sandbox/#session=<session>
    Note over F,E: The session stays in the<br />URL fragment, not the request
    E->>M: Fetch Dive metadata and content
    M-->>E: Return the Dive
```

## Step 1: Create an embed session

Your backend calls the MotherDuck API to create an embed session. The access token used for this call must belong to an account with admin-level access. The session string contains a read-only token that expires after 24 hours.

::::warning[Important]
**Never expose your access token in client-side code.** The access token stays on your backend. Only the session string reaches the browser.
::::


<Tabs groupId="language">
<TabItem value="node" label="Node.js" default>

```javascript
const DIVE_ID = "<your_dive_id>";

const response = await fetch(
  `https://api.motherduck.com/v1/dives/${DIVE_ID}/embed-session`,
  {
    method: "POST",
    headers: {
      // This is the admin account used to generate the embed session.
      Authorization: `Bearer ${MOTHERDUCK_TOKEN}`,
      "Content-Type": "application/json",
    },
    // This is the service account whose compute / perms will be used for the Dive.
    body: JSON.stringify({ username: SERVICE_ACCOUNT_USERNAME }),
  }
);

if (!response.ok) {
  throw new Error(`Failed to create embed session: ${response.status}`);
}

const { session } = await response.json();
// Return this session string to your frontend
```

</TabItem>
<TabItem value="python" label="Python">

```python
import httpx

DIVE_ID = "<your_dive_id>"

response = httpx.post(
    f"https://api.motherduck.com/v1/dives/{DIVE_ID}/embed-session",
    headers={
        "Authorization": f"Bearer {MOTHERDUCK_TOKEN}",
        "Content-Type": "application/json",
    },
    json={"username": SERVICE_ACCOUNT_USERNAME},
)
response.raise_for_status()
session = response.json()["session"]
# Return this session string to your frontend
```

</TabItem>
</Tabs>

Replace `<your_dive_id>` with the ID of your Dive. You can find this in **Settings** > **Dives** or through the [`list_dives`](/sql-reference/mcp/list-dives) MCP tool.

Each session is tied to a single Dive. If you embed multiple Dives on the same page, create a separate embed session for each one. You can use the same service account and access token for all of them. The session string is base64-encoded but **not encrypted** — it contains a read-only (read-scaling) token, the Dive ID, and endpoint URLs. Treat it like a short-lived credential: do not log it or store it in persistent storage.

The embedded Dive runs queries as the service account specified in the session. If you need data isolation (for example, separate databases per region), use separate service accounts scoped to only the data each should access.
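
To see why the session string must be treated as a credential, note that base64 decoding is trivial. The payload fields below are made up for illustration; only the encoding behavior is the point:

```python
import base64
import json

# Hypothetical payload - the real session layout is not documented here.
# Base64 is an encoding, not encryption: anyone holding the string can decode it.
fake_payload = {"dive_id": "abc123", "token": "<read_scaling_token>"}
session = base64.b64encode(json.dumps(fake_payload).encode()).decode()

decoded = json.loads(base64.b64decode(session))
print(decoded["dive_id"])  # the contents are fully readable
```

This is why the session should only ever travel to the browser that needs it, and never into logs or persistent storage.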

## Step 2: Embed the iframe

Add a sandboxed iframe to your page that points to the MotherDuck embed URL. Pass the session string in the URL fragment:

```html
<iframe
  src="https://embed-motherduck.com/sandbox/#session=<session_from_backend>"
  sandbox="allow-scripts allow-same-origin"
  width="100%"
  height="600"
  style="border: none;"
></iframe>
```

Replace `<session_from_backend>` with the session string your backend generated.

The `sandbox` attribute must include `allow-scripts allow-same-origin` for the embed to function.

### Query modes

By default, embedded Dives run queries server-side through MotherDuck. You can also enable **dual mode**, where queries run on the client (using DuckDB WASM) or the server, depending on the query. We recommend getting embedding working with the default **server mode** first, since it is sufficient for most use cases, and enabling dual mode afterward. To use dual mode, add `?queryMode=dual` to the iframe URL:

```html
<iframe
  src="https://embed-motherduck.com/sandbox/?queryMode=dual#session=<session_from_backend>"
  sandbox="allow-scripts allow-same-origin"
  width="100%"
  height="600"
  style="border: none;"
></iframe>
```

#### Server mode data type limitations

Server mode runs queries through the Postgres wire protocol, which does not support all DuckDB data types. Basic types (integers, strings, floats) work fine, but nested types (structs, lists) and some less common timestamp types may not render correctly. If you encounter issues with specific columns, try dual (WASM) mode, which supports the full range of DuckDB types.

### URL structure

| Part | Description |
|------|-------------|
| `embed-motherduck.com/sandbox/` | The MotherDuck embed host |
| `?queryMode=dual` | Optional: enables dual (client + server) query mode |
| `#session=<session>` | The session string, passed in the URL fragment so it is never sent to the server |

The session is placed in the URL fragment (after `#`) rather than the query string. Browsers strip fragments before making HTTP requests, so the session does not appear in server logs or Referer headers.

## Session lifecycle

Embed sessions expire after 24 hours. You have two options for handling expiration:

- **Generate a fresh session per page load.** The simplest approach. Each time a user loads the page, your backend creates a new embed session and passes it to the iframe.
- **Cache and refresh.** Your backend caches the session and refreshes it before it expires. This reduces API calls but adds complexity.

If a session expires while a Dive is open, the embed displays a "Session expired" message. The user needs to reload the page to get a new session.
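
The cache-and-refresh option can be sketched as follows. `create_session` stands in for whatever function wraps the `POST /v1/dives/<dive_id>/embed-session` call from Step 1; the one-hour refresh margin is an arbitrary choice:

```python
import time

SESSION_TTL = 24 * 60 * 60   # embed sessions expire after 24 hours
REFRESH_MARGIN = 60 * 60     # refresh one hour early to avoid serving a dying session

class SessionCache:
    """Cache one embed session and refresh it before it expires."""

    def __init__(self, create_session, clock=time.time):
        self._create = create_session   # wraps the embed-session API call
        self._clock = clock             # injectable for testing
        self._session = None
        self._created_at = 0.0

    def get(self):
        age = self._clock() - self._created_at
        if self._session is None or age > SESSION_TTL - REFRESH_MARGIN:
            self._session = self._create()
            self._created_at = self._clock()
        return self._session
```

Your page-load handler then calls `cache.get()` instead of hitting the API every time.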

## Security best practices

- **Keep your access token server-side.** Never include your access token in client-side JavaScript, HTML, or any code that reaches the browser.
- **Use a dedicated service account.** Create a [service account](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) specifically for embedding, separate from your personal account. The account needs a read/write, Admin-level access token to create embed sessions, but the sessions it generates are always read-only.
- **Sessions are read-only.** The embed session always contains a read-scaling token, so it can only read data, not modify it.
- **Session in URL fragment.** The fragment (`#session=...`) is never sent to the server in HTTP requests, keeping the session out of access logs and referrer headers.
- **Scope service accounts for data isolation.** If you need to restrict which data different users can see (for example, per-region databases), create separate service accounts with access scoped to the appropriate data. The embedded Dive queries data as the service account used to create the session.

## CSP configuration

If your site uses a restrictive [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP), add `embed-motherduck.com` to your `frame-src` directive:

```text
Content-Security-Policy: frame-src https://embed-motherduck.com;
```

Without this, the browser blocks the iframe from loading.

## Troubleshooting

Errors from the embed itself (expired token, Dive not found) appear as messages **inside the iframe**. CSP or network-related errors typically appear only in the **browser developer console**.

| Error message | Cause | Solution |
|---------------|-------|----------|
| "Dive embedding requires a Business plan." | Your organization is not on the Business plan | Upgrade to a [Business plan](https://motherduck.com/pricing/) |
| "Invalid or expired token. Please reload the page." | The session has expired or is malformed | Create a fresh embed session from your backend |
| "Dive not found." | The Dive ID is incorrect or the Dive has been deleted | Verify the Dive ID in **Settings** > **Dives** |
| "Failed to load dive. Please try again." | A generic error occurred while loading | Check your session string and network connectivity, then reload |
| "Can't open share: Share alias cannot be the same as an existing database name. _name_ is already taken and used as a database name." | Your service account already has a database with the same name as one of the Dive's shared databases | Rename or [detach](/key-tasks/database-operations/detach-and-reattach-motherduck-database/) the conflicting database on the service account. See [share alias conflicts](/sql-reference/motherduck-sql-reference/attach/#share-alias-conflicts) for details. |
| Iframe does not load (blank or blocked) | Your site's CSP blocks `embed-motherduck.com` | Add `frame-src https://embed-motherduck.com` to your CSP header (visible in browser dev console as a CSP violation) |
| User role "restricted" does not meet minimum role "admin" required for dashboards.createEmbedSession | The user associated with the token is not an admin. Generating embed sessions requires the user or service account to have admin permissions. | In the service accounts panel under **Settings**, change the service account's role to Admin |
| unauthorized_client: Callback URL mismatch. `<url>` is not in the list of allowed callback URLs | Embedded Dives use MotherDuck's authorization system to determine permissions, which limits the URLs that can be used for authorization. | For local development, ensure you are running on `localhost`, not something like `127.0.0.1` |

## Related resources

- [Creating visualizations with Dives](/key-tasks/ai-and-motherduck/dives/)
- [Dives SQL functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/)
- [Managing Dives as code](/key-tasks/ai-and-motherduck/dives/managing-dives-as-code)

---

## Text Search in MotherDuck

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/text-search-in-motherduck

> Text search strategies from pattern matching to semantic search with embeddings in MotherDuck.

# Text Search in MotherDuck

Text search is a fundamental operation in data analytics - whether you're finding records by name, searching documents for relevant content, or building question-answering systems. This guide covers search strategies available in MotherDuck, from simple pattern matching to advanced semantic search, and how to combine them for optimal results.

## Quick Start: Common Search Patterns

Start here to identify the best search method for your use case. The right search approach depends on what you're searching, how you expect to use search, and what results you need. Most use cases fall into one of three patterns, each linking to detailed implementation guidance below:

**Keyword Search Over Identifiers**: When searching for specific items like company names, product codes, or customer names, use [Exact Match](#exact-match) for precise and low-latency lookups. If you need typo tolerance (e.g., "MotheDuck" → "MotherDuck"), use [Fuzzy Search](#fuzzy-search-text-similarity).

**Keyword Search Over Documents**: When searching longer text like articles, product descriptions, or documentation, use [Full-Text Search](#full-text-search-fts). This ranks documents by keyword relevance, and handles cases where users provide a few keywords that should appear in the content.

**Semantic Search**: When searching by meaning and similarity rather than exact keywords, use [Embedding-based Search](#embedding-based-search). This covers:
- Understanding synonyms (e.g., matching "data warehouse" with "analytics platform")
- Understanding natural language queries (e.g., "wireless headphones with good battery life")
- Finding similar content (e.g., support tickets describing similar customer issues)

---

For answering natural language questions about *structured* data (e.g., "How many customers do we have in California?"), see [Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/).

## Refining Your Search Strategy

If the patterns above don't fully match your use case, use these four questions to navigate to the right method. Each question links to specific sections with implementation details:

1. **What is the search corpus?** Consider what you're searching through:
   - **Identifiers** like company names, product IDs, or person names → [Exact Match](#exact-match) or [Fuzzy Search](#fuzzy-search-text-similarity)
   - **Documents** like articles, descriptions, or reports → [Keyword search (regex)](#exact-match) or [Full-Text Search](#full-text-search-fts) (FTS) or [Embedding-Based Search](#embedding-based-search) or [Hybrid](#fts-pre-filtering-hybrid-search) (combining FTS + embeddings)
   - **Structured (numerical) data** → [Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/) that convert natural language questions to SQL

2. **What is the user input?** Think about how users express their search:
   - **Single terms** like "MotherDuck" → [Exact Match](#exact-match) or [Fuzzy Search](#fuzzy-search-text-similarity)
   - **Keyword phrases** like "data warehouse analytics" → [Keyword search (regex)](#exact-match) or [Full-Text Search](#full-text-search-fts) or [Embedding-based search](#embedding-based-search)
   - **Questions** like "What companies offer cloud analytics?" → [Embedding-based search](#embedding-based-search) with [HyDE](#hypothetical-document-embeddings-hyde)
   - **Example documents** (finding similar content) → [Embedding-based search](#embedding-based-search)

3. **What is the desired output?** Clarify what you're returning:
   - **Ranked list** (retrieval of documents/records) → Covered by this guide
   - **Generated text answers** (RAG-style Q&A, chatbots, summarization) → Use retrieval methods from this guide in combination with the [`prompt()`](/sql-reference/motherduck-sql-reference/ai-functions/prompt/#retrieval-augmented-generation-rag) function.

4. **What is the desired search behavior?** Think about what search qualities matter:
   - **Exact match** for specific words (IDs and codes) → [Exact Match](#exact-match) or [Keyword search (regex)](#using-regular-expressions)
   - **Typo resilience** to handle misspellings like "MotheDuck" → "MotherDuck" → [Fuzzy search](#fuzzy-search-text-similarity)
   - **Synonym resilience** to match "data warehouse" with "analytics platform" → [Embedding-based search](#embedding-based-search)
   - **Customizable ranking** → See [Reranking](#reranking) in the [Advanced Methods](#advanced-methods) section
   - **Latency and concurrency** → See [Performance Guide](#performance-guide)
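As question 3 notes, RAG-style answers layer the `prompt()` function on top of a retrieval step. A minimal sketch - the keyword filter and prompt wording here are illustrative, and any retrieval method from this guide can supply the context:

```sql
-- Retrieve a few relevant posts, then ask the LLM to answer from them
WITH context AS (
  SELECT string_agg(coalesce(title, '') || ': ' || text, e'\n---\n') AS docs
  FROM (
    SELECT title, text
    FROM sample_data.hn.hacker_news
    WHERE regexp_matches(text, '(?i)duckdb')
    LIMIT 5
  )
)
SELECT prompt(
  'Using only the posts below, answer the question: ' ||
  '"What do people use DuckDB for?" Posts: ' || docs
) AS answer
FROM context;
```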

## Search Methods

### Exact Match

Use exact match search for specific identifiers, codes, or when you need guaranteed matches. This is the fastest search method.

#### Using LIKE

For substring matching, use `LIKE` (or `ILIKE` for case-insensitive). In patterns, `%` matches any sequence of characters and `_` matches exactly one character.

```sql
-- Find places with 'Starbucks' in their name
SELECT name, locality, region
FROM foursquare.main.fsq_os_places
WHERE name LIKE '%Starbucks%'
LIMIT 10;
```

See also: [Pattern Matching](https://duckdb.org/docs/stable/sql/functions/pattern_matching.html) in DuckDB documentation

#### Using Regular Expressions

For more complex pattern matching or matching multiple keywords, use `regexp_matches()` with `(?i)` for case-insensitive searches:

```sql
-- Find Hacker News posts with 'python', 'javascript', or 'rust' in text
SELECT title, "by", score
FROM sample_data.hn.hacker_news
WHERE regexp_matches(text, '(?i)(python|javascript|rust)')
LIMIT 10;
```

See also: [Regular Expressions](https://duckdb.org/docs/stable/sql/functions/regular_expressions) in DuckDB documentation
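Note that the pattern above matches anywhere inside a word, so `rust` also matches "trust" or "frustrated". When you need whole-word matches, add `\b` word boundaries (supported by DuckDB's RE2 regex engine):

```sql
-- Match whole words only: 'rust' no longer matches 'trust' or 'frustrated'
SELECT title, "by", score
FROM sample_data.hn.hacker_news
WHERE regexp_matches(text, '(?i)\b(python|javascript|rust)\b')
LIMIT 10;
```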

### Fuzzy Search (Text Similarity)

Fuzzy search handles typos and spelling variations in entity names like companies, people, or products. Use `jaro_winkler_similarity()` for most fuzzy matching scenarios - it offers the best balance of accuracy and performance compared to `damerau_levenshtein()` or `levenshtein()`.

```sql
-- Search with the misspelled query 'McDonalsd'; similarity ranking still surfaces 'McDonald's'
SELECT 
  name,
  locality,
  region,
  jaro_winkler_similarity('McDonalsd', name) AS similarity
FROM foursquare.main.fsq_os_places
ORDER BY similarity DESC
LIMIT 10;
```

See also: [Text Similarity Functions](https://duckdb.org/docs/stable/sql/functions/text#text-similarity-functions) in DuckDB documentation
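To return only confident matches instead of a fixed number of rows, filter on a similarity threshold. The `0.9` cutoff below is an assumption - tune it against your own data:

```sql
-- Keep only near-matches; jaro_winkler_similarity returns values in [0, 1]
SELECT 
  name,
  locality,
  region,
  jaro_winkler_similarity('McDonalsd', name) AS similarity
FROM foursquare.main.fsq_os_places
WHERE jaro_winkler_similarity('McDonalsd', name) > 0.9
ORDER BY similarity DESC
LIMIT 10;
```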

### Full-Text Search (FTS)

Full-Text Search ranks documents by keyword relevance using BM25 scoring, which considers both how often terms appear in a document and how rare they are across all documents. Use this for articles, descriptions, or longer text where you need relevance ranking. FTS automatically handles word stemming (e.g., "running" matches "run") and removes common stopwords (like "the", "and", "or"), but requires exact word matches - it won't handle typos in search queries.

#### Basic FTS Setup

FTS requires write access to the table. Since we're using a read-only example database, we first create a copy of the table in a read-write database we own:

```sql
CREATE TABLE hn_stories AS 
SELECT id, title, text, "by", score, type
FROM sample_data.hn.hacker_news 
WHERE type = 'story'
  AND LENGTH(text) > 100
LIMIT 10000;
```

Build the FTS index on the text column. This creates a new schema called `fts_{schema}_{table_name}` (in this case `fts_main_hn_stories`):

```sql
PRAGMA create_fts_index(
  'hn_stories',  -- table name
  'id',          -- document ID column
  'text'         -- text column to index
);
```

Search the index using the `match_bm25` function from the newly created schema:

```sql
SELECT 
  id,
  title,
  text,
  fts_main_hn_stories.match_bm25(id, 'database analytics') AS bm25_score
FROM hn_stories
WHERE bm25_score IS NOT NULL  -- exclude documents with no matching terms
ORDER BY bm25_score DESC
LIMIT 10;
```

#### Index Maintenance

FTS indexes need to be updated when the underlying data changes. Rebuild the index using the `overwrite` parameter:

```sql
PRAGMA create_fts_index('hn_stories', 'id', 'text', overwrite := 1);
```
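If you no longer need the index, drop it with the extension's `drop_fts_index` pragma. This also removes the generated `fts_main_hn_stories` schema:

```sql
PRAGMA drop_fts_index('hn_stories');
```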

See also: [Full-Text Search Guide](https://duckdb.org/docs/stable/guides/sql_features/full_text_search.html) and [Full-Text Search Extension](https://duckdb.org/docs/stable/core_extensions/full_text_search) in DuckDB documentation

### Embedding-Based Search

Embedding-based search finds conceptually similar text by meaning, not keywords. Use this for natural language queries, handling synonyms, or when users search with questions. Embeddings handle synonyms and typos naturally without manual configuration.

:::note
Embedding generation and lookups are priced in [AI Units](/about-motherduck/billing/pricing#advanced-ai-functions). For paid organizations, Business and Lite plans have a default soft limit of 10 AI Units per user/day (sufficient to embed around 600,000 rows) to help prevent unexpected costs. If you'd like to adjust these limits, [just ask!](/troubleshooting/support)
:::

:::info
The DuckDB [VSS extension](https://duckdb.org/docs/stable/core_extensions/vss) for approximate vector search (HNSW) is currently experimental, and not supported in MotherDuck's cloud service (Server-Side). [Learn more](/concepts/duckdb-extensions/) about MotherDuck's support for DuckDB extensions.
:::

#### Basic Embedding-Based Search Setup

Generate embeddings for your text data, then search using exact vector similarity. For search queries phrased as questions (like "What are the best practices for...?"), see [Hypothetical Document Embeddings](#hypothetical-document-embeddings-hyde).


```sql
-- Reusing the hn_stories table from the FTS section, add embeddings
ALTER TABLE hn_stories ADD COLUMN text_embedding FLOAT[512];
UPDATE hn_stories SET text_embedding = embedding(text);

-- Semantic search - this will also match texts with related concepts like 'neural networks', 'deep learning', etc.
SELECT 
  title,
  text,
  array_cosine_similarity(
    embedding('machine learning and artificial intelligence'),
    text_embedding
  ) AS similarity
FROM hn_stories
ORDER BY similarity DESC
LIMIT 10;
```

See also: [MotherDuck Embedding Function](/sql-reference/motherduck-sql-reference/ai-functions/embedding/), and [array_cosine_similarity](https://duckdb.org/docs/stable/sql/functions/array#array_cosine_similarityarray1-array2) in DuckDB documentation

#### Document Chunking for Embedding-Based Search

When documents are longer than ~2000 characters, consider breaking them into smaller chunks to improve retrieval precision and focus results. For production pipelines with PDFs or Word docs, you can use the [MotherDuck integration for Unstructured.io](https://motherduck.com/blog/effortless-etl-unstructured-data-unstructuredio-motherduck/). Otherwise, you can also do document chunking in the database - here are some helpful macros:

```sql
-- Fixed-size chunking with configurable overlap
CREATE MACRO chunk_fixed_size(text_col, chunk_size, overlap) AS TABLE (
  SELECT 
    gs.generate_series as chunk_number,
    substring(text_col, (gs.generate_series - 1) * (chunk_size - overlap) + 1, chunk_size) AS chunk_text
  FROM generate_series(1, CAST(CEIL(LENGTH(text_col) / (chunk_size - overlap * 1.0)) AS INTEGER)) gs
  WHERE LENGTH(substring(text_col, (gs.generate_series - 1) * (chunk_size - overlap) + 1, chunk_size)) > 50
);

-- Paragraph-based chunking (splits on double newlines)
CREATE MACRO chunk_paragraphs(text_col) AS TABLE (
  WITH chunks AS (SELECT string_split(text_col, e'\n\n') as arr)
  SELECT 
    UNNEST(generate_series(1, array_length(arr))) as chunk_number,
    UNNEST(arr) as chunk_text
  FROM chunks
);

-- Sentence-based chunking (splits on sentence boundaries)
CREATE MACRO chunk_sentences(text_col) AS TABLE (
  WITH chunks AS (SELECT string_split_regex(text_col, '[.!?]+\s+') as arr)
  SELECT 
    UNNEST(generate_series(1, array_length(arr))) as chunk_number,
    UNNEST(arr) as chunk_text
  FROM chunks
);
```

Use one of the macros to create chunks from your documents. Fixed-size chunks (300-600 chars with 10-20% overlap) work well for most use cases:

```sql
CREATE OR REPLACE TABLE hn_text_chunks AS
SELECT 
  id AS post_id,
  title,
  chunks.chunk_number,
  chunks.chunk_text
FROM hn_stories
CROSS JOIN LATERAL chunk_fixed_size(text, 500, 100) chunks;
-- Alternative: CROSS JOIN LATERAL chunk_paragraphs(text) chunks;
-- Alternative: CROSS JOIN LATERAL chunk_sentences(text) chunks;
```

Generate embeddings for the chunks:

```sql
ALTER TABLE hn_text_chunks ADD COLUMN chunk_embedding FLOAT[512];
UPDATE hn_text_chunks SET chunk_embedding = embedding(chunk_text);
```

Once you have chunks with embeddings, search them the same way as full documents using `array_cosine_similarity()` - the chunk-level results often provide more precise matches than searching entire documents.
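For example, to return one result per post instead of one per chunk, aggregate to each post's best-matching chunk - a sketch over the `hn_text_chunks` table built above:

```sql
-- Rank posts by their best-matching chunk
SET VARIABLE query_emb = embedding('machine learning and artificial intelligence');

SELECT 
  post_id,
  title,
  MAX(array_cosine_similarity(getvariable('query_emb'), chunk_embedding)) AS best_chunk_similarity
FROM hn_text_chunks
GROUP BY post_id, title
ORDER BY best_chunk_similarity DESC
LIMIT 10;
```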

## Performance Guide

Search performance depends on several factors: the chosen search method, cold vs. warm reads, Duckling sizing, and the tenancy model.

When running a search query against your data for the first time (cold read), it may have a higher latency than subsequent queries (warm reads). For production search workloads, ideally dedicate a service account's Duckling primarily to search, so other queries don't compete with search queries. Account for [Duckling cooldown periods](/about-motherduck/billing/duckling-sizes/) - the first search query after cooldown may experience more latency.

The DuckDB analytics engine divides data into chunks and processes them in parallel across threads. More data means more chunks to process in parallel, so larger datasets don't necessarily take proportionally longer to search - they just use more threads simultaneously.

**Duckling sizing:** Optimal latency requires warm reads and enough threads to process your data in parallel. With the ideal [Duckling sizing](/about-motherduck/billing/duckling-sizes/) configuration matched to your dataset size, keyword search over identifiers ([exact match](#exact-match), [fuzzy match](#fuzzy-search-text-similarity)) typically achieves latencies of a few hundred milliseconds, while document search ([regex](#using-regular-expressions), [Full-Text Search](#full-text-search-fts), [embedding search](#embedding-based-search)) typically achieves 0.5-3 second latency. Our team is happy to advise on the right resource allocation for your specific workload and latency targets - [get in touch](/troubleshooting/support).

**Handling Concurrent Requests:** To serve multiple simultaneous search requests effectively, consider using [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) to distribute load across multiple read scaling Ducklings. Alternatively, consider [hypertenancy](/concepts/hypertenancy), which provides isolated compute resources for each user.

To optimize further, see the strategies below. For questions or requirements beyond this guide, please [get in touch](/troubleshooting/support).

### Search Optimization Strategies

When optimizing search performance, consider the following options.

#### Pre-filtering

Reduce the search space using structured metadata (e.g. location, categories, date ranges) that can be inferred from the user's context, before running similarity searches:

```sql
-- Create a local copy with embeddings for place names (using a subset)
CREATE TABLE places AS 
SELECT fsq_place_id, name, locality, region, fsq_category_labels
FROM foursquare.main.fsq_os_places
WHERE name IS NOT NULL
LIMIT 10000;

-- Add embeddings for semantic search
ALTER TABLE places ADD COLUMN name_embedding FLOAT[512];
UPDATE places SET name_embedding = embedding(name);

-- Pre-filter by location before semantic search
WITH filtered_candidates AS (
  SELECT fsq_place_id, name, locality, fsq_category_labels, name_embedding
  FROM places
  WHERE locality = 'New York'  -- Filter by location and region
    AND region = 'NY'
)
SELECT 
  name,
  locality,
  fsq_category_labels,
  array_cosine_similarity(
    embedding('italian restaurant'),
    name_embedding
  ) AS similarity
FROM filtered_candidates
ORDER BY similarity DESC
LIMIT 20;
```

#### Reducing Embedding Dimensionality

Halving embedding dimensions roughly halves compute time. OpenAI embeddings can be truncated at specific dimensions (256 for `text-embedding-3-small`, 256 or 512 for `text-embedding-3-large`). Use lower dimensions for initial pre-filtering, then rerank with full embeddings:

```sql
-- Setup: Create normalization macro
CREATE MACRO normalize(v) AS (
  CASE
    WHEN len(v) = 0 THEN NULL 
    WHEN sqrt(list_dot_product(v, v)) = 0 THEN NULL
    ELSE list_transform(v, element -> element / sqrt(list_dot_product(v, v)))
  END
);

-- Add lower-dimensional column (e.g., 256 dims instead of 512)
ALTER TABLE hn_stories ADD COLUMN text_embedding_short FLOAT[256];
UPDATE hn_stories SET text_embedding_short = normalize(text_embedding[1:256]);
```

Then use a two-stage search:

```sql
-- Stage 1: Fast pre-filter with short embeddings
SET VARIABLE query_emb = embedding('machine learning algorithms');  -- use the same model as the stored text_embedding column
SET VARIABLE query_emb_short = normalize(getvariable('query_emb')[1:256])::FLOAT[256];

WITH candidates AS (
  SELECT id, 
    array_cosine_similarity(getvariable('query_emb_short'), text_embedding_short) AS similarity
  FROM hn_stories
  ORDER BY similarity DESC
  LIMIT 500  -- Get more candidates if needed
)
-- Stage 2: Rerank with full embeddings
SELECT p.title, p.text,
  array_cosine_similarity(getvariable('query_emb'), p.text_embedding) AS final_similarity
FROM hn_stories p
WHERE p.id IN (SELECT id FROM candidates)
ORDER BY final_similarity DESC
LIMIT 10;
```

#### FTS Pre-filtering (Hybrid Search)

FTS typically has lower latency than embedding search, making it effective as a pre-filter to reduce similarity comparisons. Use a large LIMIT in the FTS stage to ensure good recall:

```sql
-- FTS pre-filter with large limit, then semantic rerank
SET VARIABLE search_query = 'artificial intelligence neural networks';

WITH fts_candidates AS (
  SELECT id,
    fts_main_hn_stories.match_bm25(id, getvariable('search_query')) AS fts_score
  FROM hn_stories
  ORDER BY fts_score DESC
  LIMIT 10000  -- Large limit to ensure recall
)
SELECT h.id, h.title, h.text,
  array_cosine_similarity(
    embedding(getvariable('search_query')),
    h.text_embedding
  ) AS similarity
FROM hn_stories h
INNER JOIN fts_candidates f ON h.id = f.id
ORDER BY similarity DESC
LIMIT 10;
```

See also: [Search Using DuckDB Part 3 (Hybrid Search)](https://motherduck.com/blog/search-using-duckdb-part-3/)
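A common variant that avoids hand-tuned score weights is reciprocal rank fusion (RRF), which combines the two result lists by rank position rather than raw scores. A sketch reusing `hn_stories` and its FTS index (`k = 60` is the conventional RRF constant):

```sql
-- Reciprocal rank fusion of FTS and embedding rankings
SET VARIABLE search_query = 'artificial intelligence neural networks';

WITH fts_ranked AS (
  SELECT id,
    row_number() OVER (
      ORDER BY fts_main_hn_stories.match_bm25(id, getvariable('search_query')) DESC
    ) AS r
  FROM hn_stories
),
emb_ranked AS (
  SELECT id,
    row_number() OVER (
      ORDER BY array_cosine_similarity(embedding(getvariable('search_query')), text_embedding) DESC
    ) AS r
  FROM hn_stories
)
SELECT h.title,
  1.0 / (60 + f.r) + 1.0 / (60 + e.r) AS rrf_score
FROM hn_stories h
JOIN fts_ranked f ON h.id = f.id
JOIN emb_ranked e ON h.id = e.id
ORDER BY rrf_score DESC
LIMIT 10;
```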


## Advanced Methods

This section covers additional techniques to customize and improve your search. The methods below demonstrate common approaches - many other variants are possible.

:::note
Some methods in this section make use of the `prompt()` function, which is priced in [AI Units](/about-motherduck/billing/pricing#advanced-ai-functions). For paid organizations, Business and Lite plans have a default soft limit of 10 AI Units per user/day (sufficient to process around 80,000 rows) to help prevent unexpected costs. If you'd like to adjust these limits, [just ask!](/troubleshooting/support)
:::

### LLM-Enhanced Keyword Expansion

Generate synonyms with an LLM, then use them in pattern matching:

```sql
-- Generate synonyms using LLM with structured output
SET VARIABLE search_term = 'programming';

WITH synonyms AS (
  SELECT prompt(
    'Give me 5 synonyms for ''' || getvariable('search_term') || '''',
    struct := {'synonyms': 'VARCHAR[]'}
  ).synonyms AS synonym_list
)
-- Search with expanded terms
SELECT 
  title,
  text
FROM sample_data.hn.hacker_news, synonyms
WHERE regexp_matches(text, '(?i)(' || getvariable('search_term') || '|' || array_to_string(synonym_list, '|') || ')')
LIMIT 10;
```

See also: [MotherDuck `prompt()` Function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/)

### Hypothetical Document Embeddings (HyDE)

HyDE improves question-based retrieval by generating a hypothetical answer first, then searching with that answer's embedding. This works because questions and answers have different linguistic patterns - the hypothetical answer better matches actual document content. Use with semantic search or the semantic component of hybrid search.

```sql
-- HyDE: Generate hypothetical answer, then search with it
WITH hypothetical_answer AS (
  SELECT prompt(
    'Answer this question in 2-3 sentences: 
     "What are the key challenges in building scalable distributed systems?"
     
     Focus on typical technical challenges and solutions.'
  ) AS answer
)
-- Search using the hypothetical answer's embedding
SELECT 
  title,
  text,
  array_cosine_similarity(
    (SELECT embedding(answer) FROM hypothetical_answer),
    text_embedding
  ) AS similarity
FROM hn_stories
ORDER BY similarity DESC
LIMIT 10;
```

See also: [Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE paper)](https://arxiv.org/abs/2212.10496)

### Reranking

Reranking typically happens in two stages: initial retrieval to get top candidates (100-500 results), then precise reranking of that smaller set.

#### Rule-Based Reranking with Metadata

Refine results based on business rules and metadata like score, category, or freshness:

```sql
-- Find similar posts with metadata-based reranking
WITH initial_similarity AS (
  -- Step 1: Fast vector similarity for top candidates
  SELECT
    title,
    text,
    score as author_score,
    array_cosine_similarity(
      embedding('artificial intelligence and machine learning applications'),
      text_embedding
    ) AS emb_similarity
  FROM hn_stories
  ORDER BY emb_similarity DESC
  LIMIT 100
),
reranked_scores AS (
  -- Step 2: Rerank with metadata (author score)
  SELECT
    title,
    text,
    author_score,
    emb_similarity,
    -- Score boost (normalize to 0-1 range based on actual data)
    (author_score / MAX(author_score) OVER ()) AS author_score_norm,
    -- Combined final score: 60% semantic + 40% author score
    (emb_similarity * 0.6 + author_score_norm * 0.4) AS reranked_score
  FROM initial_similarity
)
SELECT
  title,
  text,
  author_score,
  ROUND(emb_similarity, 3) as semantic_score,
  ROUND(author_score_norm, 3) as author_score_normalized,
  ROUND(reranked_score, 3) as final_score
FROM reranked_scores
ORDER BY reranked_score DESC
LIMIT 10;
```

#### LLM-Based Reranking

For complex relevance criteria that are hard to express as rules, use an LLM to judge and score results. The [`prompt()` function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/) is optimized for batch processing and processes requests in parallel - so reranking 50 results typically adds only a few hundred milliseconds.

```sql
-- LLM reranking for top search results
SET VARIABLE search_query = 'best practices for code review and software quality';

WITH top_candidates AS (
  -- Initial retrieval (e.g., via semantic search)
  SELECT 
    id,
    title,
    text,
    array_cosine_similarity(
      embedding(getvariable('search_query')),
      text_embedding
    ) AS initial_score
  FROM hn_stories
  ORDER BY initial_score DESC
  LIMIT 20 
),
llm_reranked AS (
  SELECT 
    *,
    prompt(
      format(
        'Rate on a scale of 0 to 10 how well this post matches the query ''{}''. 
         Post: {} - {}',
        getvariable('search_query'), title, text
      ),
      struct := {'rating': 'INTEGER'}
    ).rating AS llm_score
  FROM top_candidates
)
SELECT 
  title,
  text,
  ROUND(initial_score, 3) as initial_score,
  llm_score,
  ROUND((0.6 * initial_score + 0.4 * llm_score / 10.0), 3) AS final_score
FROM llm_reranked
ORDER BY final_score DESC
LIMIT 10;
```

## Next Steps

- Check out the MotherDuck [Embedding Function](/sql-reference/motherduck-sql-reference/ai-functions/embedding/) and [Prompt Function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/)
- Review the [Full-Text Search Guide](https://duckdb.org/docs/stable/guides/sql_features/full_text_search.html) in DuckDB documentation
- Read the MotherDuck blog series: [Search Using DuckDB Part 1](https://motherduck.com/blog/search-using-duckdb-part-1/), [Part 2](https://motherduck.com/blog/search-using-duckdb-part-2/), [Part 3](https://motherduck.com/blog/search-using-duckdb-part-3/)
- Explore [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/)

---
