Source: https://motherduck.com/docs/about-motherduck/about-motherduck
---
title: About MotherDuck
sidebar_class_name: about-motherduck-icon
description: About MotherDuck
---
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/about-motherduck/billing/billing
---
title: MotherDuck Billing
description: Learn more about MotherDuck's pricing model and how to manage billing.
---
import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';
MotherDuck offers free and paid [billing](https://motherduck.com/pricing/) plans.
You can view your Organization's incurred usage, track your spend, and review your invoices.
All new users start on a 21-day Free Trial.
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/about-motherduck/billing/duckling-sizes
---
sidebar_position: 3
title: Duckling Sizes
description: Learn about MotherDuck Duckling (compute instance) sizes and their optimal use cases.
---
MotherDuck implements a distinct tenancy architecture that diverges from traditional database systems.
The platform utilizes a per-user tenancy model, which provisions isolated read-write Ducklings (compute instances) for each Organization member.
This architecture ensures dedicated compute resources and Duckling-level configuration at the individual user level, allowing users to independently optimize performance parameters according to their specific workload requirements. Each Duckling size has different performance characteristics and [billing implications](/about-motherduck/billing/pricing/#compute-pricing).
MotherDuck uses fast SSDs for spill space, so queries can exceed their memory limits with minimal performance impact. DuckDB caches data in memory, and MotherDuck uses fast local disks for storage, which improves cold start times.
## Duckling Sizes
| Duckling Size | Plans | Use Case | Cooldown Period | Startup Time | Read-Write Duckling Enabled? | Read Scaling Duckling Enabled? |
|---------------|------------|----------|-----------------|--------------------|-------------------------------|---------------------------------|
| Pulse | Free, Lite, Business | Good for small workloads | 1 second | ~100 ms | Yes | Yes |
| Standard | Lite, Business | Good for most data loading workloads | 60 seconds | ~100 ms | Yes | Yes |
| Jumbo | Business | Better for large, complex transformations during loading | 60 seconds | ~100 ms | Yes | Yes |
| Mega | Business | Optimal for demanding jobs with even larger scale and volumes than a Jumbo can handle | 5 minutes | A few minutes | Yes | Yes |
| Giga | Business, and in [Free Trial on request](https://motherduck.com/contact-us/product-expert/) | Best for your largest and toughest workloads, like batch jobs that run overnight or on weekends | 10 minutes | A few minutes | Yes | No |
- We recommend keeping these cooldown periods in mind when planning batch sizes.
### PULSE
**Optimized for ad-hoc analytics and read-only workloads**
Pulse Ducklings are auto-scaling and designed for efficiency, making them ideal for:
- Running ad-hoc queries. (**Note:** complex queries involving [spatial analysis](https://duckdb.org/docs/stable/extensions/spatial/functions.html) or regex-like functions may perform better on larger Duckling sizes.)
- Read-optimized workflows with high concurrent user access, such as customer-facing analytics.
- Powering data apps and embedded analytics where quick, short queries are common.
[Learn how Pulse Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)
### STANDARD
**Production-grade Duckling designed for analytical processing and reporting**
Standard Ducklings offer a balance of resources for consistent performance, suited for:
- Core analytical workflows requiring balanced performance metrics.
- Development and validation environments for production workflows.
- Standard ETL/ELT pipeline implementation, including:
- Parallel execution of incremental ingestion jobs.
- Multi-threaded transformation processing.
[Learn how Standard Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)
### JUMBO
**A larger Duckling built for high-throughput processing and faster performance**
Jumbo Ducklings provide resources for heavy workloads, including:
- Large-scale batch processing and ingestion operations.
- Complex query execution on high-volume datasets.
- Advanced join operations and aggregations.
- RAM-intensive processing of deeply-nested JSON structures or other large data objects.
[Learn how Jumbo Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)
### MEGA
**Built for high-throughput processing on demanding jobs at even larger scale than a Jumbo can handle**
Mega Ducklings provide compute resources to help expedite large-scale transformations and complex operations, perfect for:
- Batch processing and high-volume ingestion operations.
- Running a weekly job that rebuilds all of your tables and needs to finish in minutes, not hours.
- Complex queries on high-volume datasets that a Jumbo Duckling can't handle within a time crunch.
- Advanced operations for users with 10x the data volume of other users, who require low-latency performance.
[Learn how Mega Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)
### GIGA
**Our largest Duckling, built for the toughest workloads with massive scale and complexity**
Giga Ducklings provide compute resources for the most demanding tasks, perfect for:
- Complex, large-scale workloads and jobs that won't run on any other Duckling size.
- Running one-time jobs that need to complete overnight or over the weekend, like restating revenue actuals for 10 years' worth of high-volume data.
- Huge volumes of advanced join operations and aggregations.
- Very large amounts of RAM-intensive processing of deeply-nested JSON structures or other large data objects.
[Learn how Giga Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)
## Changing Duckling Sizes
Duckling sizes can be changed in the MotherDuck UI by clicking on the icon in the top right, or under "Settings > Ducklings". Here you can choose the desired Read/Write and Read Scaling sizes. Changing Duckling size can take a few minutes while your new Duckling wakes up.

The Duckling size for a user or service account can also be set using the [`Set user Ducklings` REST API](/sql-reference/rest-api/ducklings-set-duckling-config-for-user/).
**Note:** Changing Duckling size in the UI or via our [REST API](/sql-reference/rest-api/motherduck-rest-api/) takes
* **2 minutes** for Pulse, Standard and Jumbo
* **5 minutes** for Mega
* **10 minutes** for Giga
---
Source: https://motherduck.com/docs/about-motherduck/billing/managing-billing
---
sidebar_position: 2
title: Managing your bill
description: Learn how to manage your MotherDuck spend, choose plans, monitor usage, and view invoices.
---
import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';
This guide explains how to manage your MotherDuck billing, including selecting a plan that suits your needs, keeping track of your usage, and understanding your invoices.
## Choosing Your Billing Plan
MotherDuck offers a variety of [plans with different features and pricing](/about-motherduck/billing/pricing/). During your initial 21-day Free Trial, you can explore MotherDuck's capabilities. Afterwards, or at any time during the trial, you can select a plan by navigating to the [Plans page](https://app.motherduck.com/settings/plans) in Settings within the MotherDuck UI:
- **Transition to Free Plan**: If you select "Free" at the end of the Free Trial (or at any point during it), your organization will be placed on the [Free Plan](/about-motherduck/billing/pricing/#1-free-plan), subject to its limits on storage, compute, and users.
- **Upgrade to Lite Plan**: If you select "Lite", your organization will transition to the [Lite Plan](/about-motherduck/billing/pricing/#2-lite-plan). This plan offers unlimited compute and storage on a pay-as-you-go basis. You will be prompted to add payment information if you haven't already.
- **Upgrade to Business Plan**: Selecting "Business" moves your organization to the [Business Plan](/about-motherduck/billing/pricing/#3-business-plan), designed for more extensive use with features like unlimited users and access to five Duckling sizes. Payment information will be required.
For details on the features and allowances of each plan, please refer to our [Pricing Model documentation](/about-motherduck/billing/pricing/).
## Monitoring Usage
You can monitor your organization's Compute and Storage usage from the [Billing page](https://app.motherduck.com/settings/billing) in the MotherDuck UI.
- **Compute usage** is displayed in Compute Unit-hours (CU-hours). Learn more about [how compute is priced](/about-motherduck/billing/pricing/#compute-pricing).
- **Storage usage** is displayed in prorated GB-days. Learn more about [how storage is priced](/about-motherduck/billing/pricing/#storage-pricing).
GB-days are calculated based on the amount of data stored per day. For example, storing 10 GB of data for a full 30-day month would equate to 300 GB-days. Data recoverability for the past 7 days also contributes to your storage bill.
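The GB-day arithmetic above can be sketched in a few lines (the `gb_days` helper is illustrative, not a MotherDuck API):

```python
def gb_days(gb_stored: float, days: float) -> float:
    """Prorated storage usage: GB stored multiplied by days stored."""
    return gb_stored * days

# Storing 10 GB for a full 30-day month, as in the example above:
usage = gb_days(10, 30)
print(usage)  # 300 GB-days
```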

## Viewing Your Invoice
The [Billing page](https://app.motherduck.com/settings/billing) also enables you to view your past invoices, as well as the current month's invoice thus far.
- **Lite & Business Plan users** see their actual invoices reflecting their usage and any plan fees.
- **[Free Trial](/about-motherduck/billing/pricing/#free-trial) users** see estimated invoices, which are fully discounted during the trial period.
- Invoices are not generated for organizations on the **[Free Plan](/about-motherduck/billing/pricing/#free-plan)**.
Incurred Storage and Compute costs are broken down per-user and per-service-account, as well as aggregated for the entire organization.
:::note
For organizations with more than 500 users and service accounts, invoices may show aggregated usage rather than a full per-user breakdown to maintain clarity.
:::
---
Source: https://motherduck.com/docs/about-motherduck/billing/pricing
---
sidebar_position: 1
title: Understanding the pricing model
description: Details of MotherDuck's pricing model.
---
import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';
## MotherDuck Pricing Model
MotherDuck is a serverless cloud data warehouse. We believe in providing our users with simple, straightforward pricing.
MotherDuck offers two paid plans, Lite and Business, and a Free Plan.
:::note
MotherDuck is currently available on AWS in two regions, **US East (N. Virginia)** - `us-east-1` and **Europe (Frankfurt)** - `eu-central-1`. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
:::
### 1. Free Plan
If you're a casual user, student or hobbyist, our Free Plan may be a good fit.
**The Free Plan gives you access to the following features:**
- A limited amount of Compute (up to 10 CU hours / month)
- A limited amount of MotherDuck-provided storage (up to 10 GB)
- One Duckling size for compute: Pulse - our smallest compute instance, optimized for efficiency
- Up to 5 users, including service accounts, can be invited to your organization
### 2. Lite Plan
The Lite Plan is a monthly, pay-as-you-go plan that is perfect for hobbyists and small organizations.
**The Lite Plan gives you access to the following features:**
- Compute is charged on a [per-second usage basis](https://motherduck.com/docs/about-motherduck/billing/pricing/#compute-pricing) depending on the Duckling size, with access provided for **two Duckling sizes**: Pulse and Standard
- Storage is charged in line with the [following rates](https://motherduck.com/docs/about-motherduck/billing/pricing/#storage-pricing)
- Platform access is provided for a flat $25 per month
- Up to 5 users, including service accounts, can be invited to your organization
More details can be found in the official [pricing table](https://motherduck.com/pricing/).
### 3. Business Plan
The Business plan is a monthly, pay-as-you-go plan that is the most popular choice for MotherDuck customers.
- Compute is charged on a [per-second usage basis](https://motherduck.com/docs/about-motherduck/billing/pricing/#compute-pricing) depending on the Duckling size, with access provided for **five Duckling sizes**: Pulse, Standard, Jumbo, Mega, and Giga
- Storage is charged in line with the [following rates](https://motherduck.com/docs/about-motherduck/billing/pricing/#storage-pricing)
- Platform access is provided for a flat $100 per month
- There are no limits to the number of users, including service accounts, that can be invited to your organization
More details can be found in the official [pricing table](https://motherduck.com/pricing/).
### Compute Pricing
A **Compute Unit (CU)** in MotherDuck is defined as a measure of CPU and memory usage over time. A **Duckling** in MotherDuck is a compute instance. Each Duckling has a cooldown period, which is the amount of time the Duckling will remain active after completing the last query.
Depending on its size, a Duckling's compute is metered either on demand (per query) or per second of runtime:
- **Pulse:** An on-demand, auto-scaling Duckling, metered per query based on the actual CPU seconds consumed and memory usage over time (Compute Units).
- **Standard, Jumbo, Mega, and Giga:** Metered per second the Duckling is running, based on wall-clock time, at a rate that depends on the pricing plan and selected Duckling size. The cooldown period keeps the Duckling warm in case of follow-up queries that may benefit from MotherDuck's intelligent storage and caching.
- **[Pulse](/about-motherduck/billing/duckling-sizes/#pulse)**
  - A burstable, auto-scaling Duckling, metered per query in **Compute Units (CUs)**: the total CPU seconds consumed and memory usage over time
- Unlike other Duckling sizes, Pulse is billed based on resources consumed, not wall-clock time
- Optimized for small, bursty queries, read-heavy workloads, and frontend scenarios. [Learn more about Pulse Duckling use cases and optimizations.](/about-motherduck/billing/duckling-sizes/#pulse)
- **NOTE:** For long-running, compute-heavy queries, consider using a Standard Duckling instead - Pulse Ducklings may consume high volumes of CUs when scaling up for intensive workloads
- **Example 1 - Simple query:** A small read query with minimal compute needs:
- Query runtime: 2 seconds with low CPU usage = **2 CU seconds billed**
- **Example 2 - Mixed workload:** Running 100 small write operations:
- Each operation: ~2 CU seconds × 100 queries = **200 CU seconds billed**
- **[Standard](/about-motherduck/billing/duckling-sizes/#standard)**
- A fixed-spec Duckling, metered on a per-second basis
- It has a cooldown period of 60 seconds for billing
- Our versatile workhorse Duckling for general purpose data warehouse workloads. [Learn more about Standard Duckling use cases and optimizations.](/about-motherduck/billing/duckling-sizes/#standard)
- As an example, if you run 5 queries consecutively that **each** take 30 seconds to return results, you will be billed as follows:
    - *100 ms* of startup time (average)
- 30 seconds * 5 queries = *150 seconds* of total query running time
- *60 seconds* cooldown period
- **Total billed**: 210 seconds
- **[Jumbo](/about-motherduck/billing/duckling-sizes/#jumbo)**
- A larger fixed-spec Duckling, metered on a per-second basis
- It has a cooldown period of 60 seconds for billing
- Designed for faster performance on large-scale data warehouse workloads. [Learn more about Jumbo Duckling use cases and optimizations.](/about-motherduck/billing/duckling-sizes/#jumbo)
- As an example, if you run 2 queries that **each** take 8 minutes to return results, you will be billed as follows:
    - *100 ms* of startup time (average)
- 8 minutes * 2 queries = *16 minutes* of total query running time
- *60 seconds* cooldown period
- **Total billed**: 17 minutes or 1020 seconds
**Note:** Changing Duckling size between Pulse, Standard and Jumbo in the UI or via our [REST API](../../../sql-reference/rest-api/motherduck-rest-api) can take up to two minutes to take effect.
- **[Mega](/about-motherduck/billing/duckling-sizes/#mega)**
- A larger fixed-spec Duckling, metered on a per-second basis
- It has a cooldown period of 5 minutes for billing
- Designed for demanding jobs and data warehouse workloads with even larger scale. [Learn more about Mega Duckling use cases and optimizations.](/about-motherduck/billing/duckling-sizes/#mega)
- As an example, if you run 2 queries that **each** take 8 minutes to return results, you will be billed as follows:
    - *A few minutes* of startup time (average)
- 8 minutes * 2 queries = *16 minutes* of total query running time
- *5 minutes* cooldown period
- **Total billed**: 21 minutes or 1260 seconds
**Note:** Changing Duckling size to Mega in the UI or via our [REST API](../../../sql-reference/rest-api/motherduck-rest-api) can take up to **5 minutes** to take effect.
- **[Giga](/about-motherduck/billing/duckling-sizes/#giga)**
- Our largest fixed-spec Duckling, metered on a per-second basis
- It has a cooldown period of 10 minutes for billing
- Designed for the largest and toughest data warehouse workloads like batch jobs that need additional capacity to run overnight or complete over the weekend. [Learn more about Giga Duckling use cases and optimizations.](/about-motherduck/billing/duckling-sizes/#giga)
- As an example, if you run 2 queries that **each** take 5 minutes to return results, you will be billed as follows:
    - *A few minutes* of startup time (average)
- 5 minutes * 2 queries = *10 minutes* of total query running time
- *10 minutes* cooldown period
- **Total billed**: 20 minutes or 1200 seconds
**Note:** Changing Duckling size to Giga in the UI or via our [REST API](../../../sql-reference/rest-api/motherduck-rest-api) can take up to **10 minutes** to take effect.
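The fixed-spec examples above all follow the same pattern: startup time, plus total consecutive query runtime, plus one cooldown period. A minimal sketch of that arithmetic (the function is illustrative, not a MotherDuck API):

```python
def billed_seconds(query_seconds: list[float],
                   startup_seconds: float,
                   cooldown_seconds: float) -> float:
    """Wall-clock billing for fixed-spec Ducklings:
    startup + total consecutive query runtime + one cooldown period."""
    return startup_seconds + sum(query_seconds) + cooldown_seconds

# Standard example: five consecutive 30-second queries, ~100 ms startup, 60 s cooldown.
standard = billed_seconds([30] * 5, startup_seconds=0.1, cooldown_seconds=60)
print(round(standard))  # ~210 seconds, matching the Standard example above

# Jumbo example: two 8-minute queries, ~100 ms startup, 60 s cooldown.
jumbo = billed_seconds([8 * 60] * 2, startup_seconds=0.1, cooldown_seconds=60)
print(round(jumbo))  # ~1020 seconds (17 minutes)
```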
#### **Compute**
**Business Plan**
| AWS Region | Pulse | Standard | Jumbo | Mega | Giga |
|---------|--------|-----------|--------|--------|--------|
| **US East (N. Virginia)** - `us-east-1` | $0.40 per CU hour | $1.80 per hour | $3.60 per hour | $10.80 per hour | [*Available on request*](https://motherduck.com/contact-us/product-expert/) |
| **Europe (Frankfurt)** - `eu-central-1` | $0.49 per CU hour | $2.20 per hour | $4.40 per hour | $13.19 per hour | [*Available on request*](https://motherduck.com/contact-us/product-expert/) |
**Lite Plan**
| AWS Region | Pulse | Standard |
|---------|------|--------|
| **US East (N. Virginia)** - `us-east-1` | $0.25 per CU hour | $1.20 per hour |
| **Europe (Frankfurt)** - `eu-central-1` | $0.31 per CU hour | $1.47 per hour |
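Combining billed wall-clock time with the rates above gives the dollar cost for a fixed-spec Duckling. A rough sketch (the rate is copied from the Business Plan table for `us-east-1`; the helper is illustrative, not a MotherDuck API):

```python
STANDARD_USD_PER_HOUR = 1.80  # Business Plan, us-east-1 (from the table above)

def compute_cost(billed_seconds: float, usd_per_hour: float) -> float:
    """Per-second metering: convert billed wall-clock seconds to dollars."""
    return billed_seconds / 3600 * usd_per_hour

# 210 billed seconds on a Standard Duckling (see the Standard billing example above):
print(round(compute_cost(210, STANDARD_USD_PER_HOUR), 3))  # 0.105
```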
### Storage Pricing
Under the hood, MotherDuck uses DuckDB's compression algorithms to reduce the storage footprint and optimize performance.
MotherDuck charges for data stored in its managed storage system, metered per day in GB-days (GB stored multiplied by days stored) and billed monthly at the rates in the table below.
For example, if your MotherDuck Organization is in `us-east-1` and your December usage comes to 20,000 GB-days, the bill is computed as follows:
- 20,000 (GB-days) * $0.0025685 (price per GB-day in `us-east-1`) = **$51.37**
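The same calculation as a short sketch (rate copied from the Storage table below; the helper is illustrative, not a MotherDuck API):

```python
US_EAST_1_USD_PER_GB_DAY = 0.0025685  # rate for us-east-1 (see the Storage table below)

def storage_cost(gb_days: float, usd_per_gb_day: float) -> float:
    """Monthly storage bill: total GB-days times the per-GB-day rate."""
    return gb_days * usd_per_gb_day

print(round(storage_cost(20_000, US_EAST_1_USD_PER_GB_DAY), 2))  # 51.37
```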
#### What counts towards my storage bill?
- **Standard databases:** By default, MotherDuck provides data recoverability by storing and billing for every byte inserted, modified, or deleted over the past 7 days.
- **Transient databases:** Databases can be set as `TRANSIENT` [at database creation](https://motherduck.com/docs/concepts/Storage-lifecycle/#storage-management). Transient databases are billed for active data stored and a 1 day failsafe minimum. Data is not retained as failsafe bytes beyond this minimum, which is ideal for temporary or easily reproducible datasets like intermediate job outputs.
#### What does not count towards my storage bill?
- [Shares](/key-tasks/sharing-data) do not incur additional data storage as they are a zero-copy operation.
- Using the [CREATE DATABASE X FROM DATABASE Y](/sql-reference/motherduck-sql-reference/create-database/) command is also a zero-copy operation. Only incremental changes made to the new database are added to storage.
- Any data you manage in your own object storage bucket (e.g. S3, Blob, GCS), even when MotherDuck processes it.
- Data on your laptop accessed via `duckdb -ui`, even when signed into MotherDuck.
#### What changes can I make to optimize my storage bill?
The right approach to optimize storage usage in MotherDuck varies by use case and implementation. Please reach out to us at support@motherduck.com for additional guidance on how to optimize your storage effectively for your needs.
#### **Storage**
| AWS Region | Cost per GB-day: MotherDuck-Native storage | Cost per GB: MotherDuck-Native storage |
|---------|---------|---------------------------|
| **US East (N. Virginia)** - `us-east-1` | $0.0025685 per GB per day | $0.08 / GB (31-day month) |
| **Europe (Frankfurt)** - `eu-central-1` | $0.0027742 per GB per day | $0.086 / GB (31-day month) |
### AI Function Pricing
MotherDuck enhances your analytical capabilities with integrated AI functions. These functions leverage powerful large language models (LLMs), fine-tuned to assist with SQL tasks and unlock new OLAP use cases.
AI functions are categorized and priced as follows:
- **SQL Assistant Functions**: metered per call, with some free features.
- **Advanced AI Functions**: metered per token consumed for both input and output, priced in AI Units (1 AI Unit = $1.00).
### SQL Assistant Functions
These features, including [FixIt](https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour/#writing-sql-with-confidence-using-fixit) and [Text-to-SQL](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql/), help you write, understand, and correct SQL queries.
On the Free Plan, we include a maximum of 200 calls per day for FixIt and SQL Assistant features.
| SQL Assistant Functions | Price | Unit |
| :--------------------------------------------- | :-------- | :------------ |
| FixIt | FREE | per call |
| SQL Assistant (Text-to-SQL, Explain SQL, etc.) | 1 AI Unit | for 60 calls |
### Advanced AI Functions
These functions provide access to powerful generative AI models for tasks like embedding generation and complex prompting. They are metered based on token usage, with costs calculated in AI Units (1 AI Unit = $1.00).
:::note
For Lite and Business plans, there is a default soft limit on Advanced AI Function consumption of 10 AI Units per day to help control costs. This limit can be increased or removed by contacting support@motherduck.com.
:::
**Embedding Models**
| Embedding Model Name | Price | Tokens per AI Unit |
| :------------------------------------ | :-------- | :------------------ |
| OpenAI text-embedding-3-small | 1 AI Unit | 15,000,000 tokens |
| OpenAI text-embedding-3-large | 1 AI Unit | 3,000,000 tokens |
**Generative Prompt Models**
| Provider | Model Name | Price | Input Tokens (per AI Unit) | Output Tokens (per AI Unit) | Blended Tokens (per AI Unit) |
| :------- | :--------------- | :-------- | :----------------------------- | :------------------------------ | :------------------------------- |
| OpenAI | GPT-5 | 1 AI Unit | 240,000 | 30,000 | 100,000 |
| OpenAI | GPT-5-mini | 1 AI Unit | 1,200,000 | 150,000 | 500,000 |
| OpenAI | GPT-5-nano | 1 AI Unit | 6,000,000 | 750,000 | 2,500,000 |
| OpenAI | GPT-4.1 | 1 AI Unit | 150,000 | 37,500 | 93,750 |
| OpenAI | GPT-4.1-mini | 1 AI Unit | 750,000 | 187,500 | 468,750 |
| OpenAI | GPT-4.1-nano | 1 AI Unit | 3,000,000 | 750,000 | 1,875,000 |
| OpenAI | GPT-4o | 1 AI Unit | 120,000 | 30,000 | 75,000 |
| OpenAI | GPT-4o-mini | 1 AI Unit | 2,000,000 | 500,000 | 1,250,000 |
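The token metering above can be expressed as a small calculation. This is a sketch of the pricing arithmetic only (figures copied from the tables above; it assumes input and output tokens are metered separately and summed, inferred from the separate per-unit columns):

```python
# Tokens-per-AI-Unit figures from the tables above (1 AI Unit = $1.00).
EMBEDDING_SMALL_TOKENS_PER_UNIT = 15_000_000  # OpenAI text-embedding-3-small
GPT_4O_INPUT_TOKENS_PER_UNIT = 120_000        # OpenAI GPT-4o
GPT_4O_OUTPUT_TOKENS_PER_UNIT = 30_000

def embedding_units(tokens: int, tokens_per_unit: int) -> float:
    """AI Units consumed by an embedding workload."""
    return tokens / tokens_per_unit

def prompt_units(input_tokens: int, output_tokens: int,
                 input_per_unit: int, output_per_unit: int) -> float:
    """AI Units consumed by a generative prompt workload
    (input and output metered separately, then summed -- an assumption)."""
    return input_tokens / input_per_unit + output_tokens / output_per_unit

# Embedding 30M tokens with text-embedding-3-small:
print(embedding_units(30_000_000, EMBEDDING_SMALL_TOKENS_PER_UNIT))  # 2.0 AI Units ($2.00)

# A GPT-4o workload with 240k input tokens and 30k output tokens:
print(prompt_units(240_000, 30_000,
                   GPT_4O_INPUT_TOKENS_PER_UNIT,
                   GPT_4O_OUTPUT_TOKENS_PER_UNIT))  # 3.0 AI Units ($3.00)
```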
## Incentive Programs
### Free Trial
New users who sign up for MotherDuck and create an organization automatically get access to a 21-day Free Trial without entering a credit card. [Learn how to manage your plan after the trial has ended.](/about-motherduck/billing/managing-billing/#choosing-your-billing-plan)
At any point during your Free Trial, you may choose to set up billing and become a paid customer. You may also choose the Free Plan at the end of the Free Trial. [Learn more about managing your bill](/about-motherduck/billing/managing-billing/#choosing-your-billing-plan).
### Free Plan
At any point during a 21-day Free Trial, you may select the Free Plan for your organization. Free Plan customers are not required to set up billing or enter a credit card to use MotherDuck.
Every Free Plan organization receives access to the following each month:
- 10 Gigabytes of MotherDuck Storage
- 10 Compute Unit (CU) hours of Compute on the Pulse Duckling
- A maximum of 5 users, including service accounts
If the data volume stored in MotherDuck Storage exceeds the Free Plan limit of 10 GB, you will lose the ability to query data on MotherDuck Storage. Only `DROP` and `DELETE` SQL commands are permitted until the overage is resolved.
You may resolve Free Plan storage overages by navigating to "Settings > Plans" in the MotherDuck Web UI and upgrading to one of our paid plans, Lite or Business.
### Startup Program
Qualifying startups get 50% off an annual contract on our Business Plan, in addition to the 21-day trial. No feature gating, and no hidden fees. Apply by filling out [this short form](https://motherduck.com/startups/#apply-now).
---
Source: https://motherduck.com/docs/about-motherduck/legal
---
sidebar_position: 13
title: Legal
---
## Product Terms of Service
[MotherDuck Product Terms of Service](https://motherduck.com/terms-of-service/)
[Products and Fees Addendum](https://motherduck.com/fees-addendum/)
[Acceptable Use Policy](https://motherduck.com/acceptable-use-policy/)
[Support Policy](https://motherduck.com/support-policy/)
---
Source: https://motherduck.com/docs/about-motherduck/release-notes
---
sidebar_position: 1
title: Release notes
---
# Release Notes
Welcome to our release notes! We're excited to hear about your experience 😃
:::info
💁 If you have any questions, please connect with us directly in our [Community Slack support channel](https://slack.motherduck.com/) or send a note to support@motherduck.com.
:::
## January 8, 2026
- **Giga Ducklings on Business plan:** Users on any MotherDuck Business plan can now access [Giga Ducklings](../billing/duckling-sizes/#giga), our largest compute Duckling size, built to tackle the largest, toughest, most complex data transformations. Configure your Duckling size in [Settings > Ducklings](https://app.motherduck.com/settings/ducklings).
## December 17, 2025
- **MotherDuck MCP Server:** Your favorite AI assistant can now talk directly to your data. Connect Claude, ChatGPT, Cursor, or any MCP-compatible client to MotherDuck using the MotherDuck MCP Server at `https://api.motherduck.com/mcp`. Your agent can explore schemas, run read-only SQL queries, and answer questions about your databases through natural conversation. Learn more in the [announcement blog](https://motherduck.com/blog/analytics-agents), and [MCP Server documentation](/sql-reference/mcp/).
## December 16, 2025
- **DuckDB 1.4.3:** MotherDuck supports DuckDB 1.4.3, a bugfix release. Learn more in the [official DuckDB Labs 1.4.3 announcement](https://duckdb.org/2025/12/09/announcing-duckdb-143) and [changelog](https://github.com/duckdb/duckdb/releases/tag/v1.4.3).
- **PlanetScale Postgres integration:** Users of PlanetScale Postgres can now use [pg_duckdb](/concepts/pgduckdb/) to push analytical queries to MotherDuck. Analytical queries run up to 200x faster with MotherDuck, keeping your Postgres cluster optimized for transactions. Learn more in the [announcement blog](https://motherduck.com/blog/motherduck-planetscale-integration) and [integration documentation](/integrations/databases/planetscale).
- **MotherDuck destination for Artie CDC**: Artie now supports MotherDuck as a destination for CDC. Users of Artie can now stream changes from OLTP databases like PostgreSQL, MySQL, and MongoDB to MotherDuck in real-time. Learn more in the [announcement blog](https://motherduck.com/blog/motherduck-artie-integration/), and [Artie documentation](https://www.artie.com/docs/destinations/motherduck).
- **Recent Queries added to `MD_INFORMATION_SCHEMA`:** Organization admins on MotherDuck Business plans can now access a more realtime view of all currently running or recently completed queries across their full organization using the [`RECENT_QUERIES` view](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/). This view offers detail for queries not yet captured in the [`QUERY_HISTORY` view](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/). Both views are accessible in the [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction/).
- **New columns for query attribution in query history:** The [`QUERY_HISTORY` view](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) along with the new [`RECENT_QUERIES` view](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) in the [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction/) now contain `session_name` and `duckling_id` columns, making it easy to identify which Duckling executed each query, and group read scaling queries by [`session_hint`](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#session-affinity-with-session_hint).
- **MotherDuck Wasm SDK 0.8:** The [MotherDuck Wasm Client](https://www.npmjs.com/package/@motherduck/wasm-client) now leverages a different mechanism for loading the MotherDuck Wasm extension, which makes it easier to control which version of the extension is loaded. Refer to the [documentation](/sql-reference/wasm-client/) to learn more.
## December 12, 2025
- **Query Scheduling Improvements:** Small queries now complete faster without getting stuck waiting behind large, resource-intensive queries, even when heavy queries are processing in the background
- **Search Enhancements:** The search bar in the top left pane of the Object Explorer can now be used to search for schemas, tables, and columns in addition to databases and [shares](https://motherduck.com/docs/key-tasks/sharing-data/)
- **Dvorak Keyboard Support:** Dvorak keyboard shortcuts are now supported in the MotherDuck UI
- **Column Comments added to the Table Summary:** In the Table Summary, users can now hover over any column name to view its comments alongside the column name and type
## December 4, 2025
- **Transient storage filter in Settings:** The Databases page in Settings in the MotherDuck UI now supports filtering by [storage type](https://motherduck.com/docs/concepts/Storage-lifecycle/#transient-databases)
- **`DESCRIBE` and `SUMMARIZE` exports:** Downloading the results of `DESCRIBE` and `SUMMARIZE` queries is now supported in the MotherDuck UI
- **DuckLake option in the Add Database menu:** MotherDuck users can create a new [DuckLake](https://motherduck.com/docs/integrations/file-formats/ducklake/) in the 'Add Database' modal in the left hand pane of the object explorer in the MotherDuck UI
- **Inline Docs are now available in the Query Editor:** Notebook cells in the MotherDuck [query editor](https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour/#getting-sql-function-help-with-inline-docs) provide function information on hover, showing function signatures, parameter types, return types, and descriptions without leaving the notebook. Inline Docs can be toggled on and off by going to the Preferences page in Settings.
## November 14, 2025
- **DuckDB 1.4.2:** MotherDuck supports DuckDB 1.4.2, a bugfix release. Learn more in the [official DuckDB Labs 1.4.2 announcement](https://duckdb.org/2025/11/12/announcing-duckdb-142) and [changelog](https://github.com/duckdb/duckdb/releases/tag/v1.4.2).
- **Full command menu now at `Cmd/Ctrl+K`:** Access common MotherDuck UI actions from your keyboard, including generating query edits, adding notebook cells, creating notebooks, and navigating between pages. Open the command menu with `Cmd/Ctrl+K` and search for options. For quick access to [generate query edits](../../key-tasks/ai-and-motherduck/ai-features-in-ui#automatically-edit-sql-queries-in-the-motherduck-ui), use `Cmd/Ctrl+Shift+E`. (Note: `Cmd/Ctrl+Shift+P` no longer opens the command menu.)
- **Run queries across multiple notebooks:** You can now run cells across multiple MotherDuck UI notebooks, allowing each to queue and run independently. Hover over any notebook in the left sidebar to see how many cells are running or queued. Query cancellation is also more reliable across all notebooks.
## November 6, 2025
- **MotherDuck extends cloud coverage to Europe:** MotherDuck is now [available on AWS in Frankfurt `eu-central-1`](https://motherduck.com/docs/concepts/architecture-and-capabilities/#the-motherduck-cloud-service); users are able to create new Organizations in Europe for lower latency and regional data residency
- **Expanded AI functions support for `PROMPT()`:** The [prompt](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/prompt/) function now supports additional parameters; MotherDuck users can now interact with Large Language Models (LLMs) directly from SQL with more customization and improved support for struct arrays, timestamps, and date and time values:
- **`return_type`:** Generate strongly-typed outputs by specifying the exact SQL type to return
- **`reasoning_effort`:** Control reasoning depth when using GPT-5 models with the `prompt()` function
- **MotherDuck Wasm SDK 0.7.0:** The [MotherDuck Wasm Client](https://www.npmjs.com/package/@motherduck/wasm-client) now supports `attach_mode='single'`, simplifying query execution and improving resource predictability when working with a single database. Refer to the [documentation](https://motherduck.com/docs/sql-reference/wasm-client/) to learn more.
- **Usernames added to Database listings in Settings:** MotherDuck Admins can now see the usernames for human users and service accounts on the Databases page in [Settings](https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour/#settings) for more intuitive lookups
- **New export options for `EXPLAIN`:** MotherDuck notebook cells now support copying or exporting [`EXPLAIN` results](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/explain/) to simplify query inspection
- **Enhanced Column Explorer experience for UUIDs:** The [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/) now supports UUIDs and fields that default to top-N values, for improved column-level insights and schema exploration. Refer to the [documentation](https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour/#diving-into-your-data-with-column-explorer) to learn more.
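The `prompt()` additions above could be used along these lines. This is a sketch: the `reviews` table and its column are hypothetical, and the exact value format for `return_type` is per the linked documentation.

```sql
-- Ask for a strongly-typed result instead of free-form text
SELECT prompt(
    'How many stars (1-5) does this review imply? ' || review_text,
    return_type := 'INTEGER'
) AS stars
FROM reviews;
```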
## October 24, 2025
- **Duplicate MotherDuck notebook cells:** Duplicate cells in MotherDuck UI notebooks using the cell options menu or command menu. Access the duplicate option from the three-dot options menu on any cell, or use `Cmd/Ctrl + Shift + P` to open the command menu and search for "duplicate."
## October 9, 2025
MotherDuck now supports DuckDB versions 1.4.0 and 1.4.1, and DuckLake version 0.3 🎉
DuckDB 1.4 delivers performance gains with improvements like a rewritten sorting engine, more efficient small writes, and new SQL syntax including the MERGE statement. Learn more in the DuckDB [1.4.0](https://github.com/duckdb/duckdb/releases/tag/v1.4.0) and [1.4.1](https://github.com/duckdb/duckdb/releases/tag/v1.4.1) changelogs.
**Performance improvements**
- **[Sorting is 2x+ faster:](https://github.com/duckdb/duckdb/pull/17584)** Complete rewrite of sorting uses less memory and scales better across threads for ORDER BY, window functions, and list sorting
- **[More efficient small writes:](https://github.com/duckdb/duckdb/pull/18829)** Appending small numbers of rows now writes far fewer bytes
- **[5x faster checkpointing:](https://github.com/duckdb/duckdb/pull/18390)** Reuses table metadata when tables aren't altered during checkpoint
- **[Parallel connection creation:](https://github.com/duckdb/duckdb/pull/18079)** Connections from instance cache can be created in parallel
- **[Faster scalar functions on dictionary data:](https://github.com/duckdb/duckdb/pull/18127)** Functions on dictionary-compressed data only run once per unique value
**SQL syntax updates**
- **[`MERGE INTO` statement:](https://github.com/duckdb/duckdb/pull/18135)** Standard SQL upserts without requiring primary keys or indexes
- **[`FILL()` window function:](https://duckdb.org/2025/09/16/announcing-duckdb-140.html#fill-window-function)** Interpolate missing values in ordered data
- **[Python-style macro arguments:](https://github.com/duckdb/duckdb/pull/18684)** Macros accept positional or named arguments for any parameter
- **[`STRUCT` to `MAP` cast:](https://github.com/duckdb/duckdb/pull/17799)** Direct casting between struct and map types
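As a quick illustration of `MERGE INTO`, here is a hedged sketch of an upsert over hypothetical `inventory` and `new_stock` tables; no primary key or index is required:

```sql
MERGE INTO inventory AS t
USING new_stock AS s
ON t.sku = s.sku
WHEN MATCHED THEN UPDATE SET qty = t.qty + s.qty
WHEN NOT MATCHED THEN INSERT (sku, qty) VALUES (s.sku, s.qty);
```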
**Parquet improvements**
- **[`VARIANT` type reading:](https://github.com/duckdb/duckdb/pull/18187)** Read Parquet `VARIANT` types for faster semi-structured data processing
- **[Native geometry type writes:](https://github.com/duckdb/duckdb/pull/18832)** Write native Parquet geometry types
- **[Auto-globbing for directories:](https://github.com/duckdb/duckdb/pull/18760)** Automatically treats paths as directories and retries with glob patterns when no file is found
Learn more in the official DuckDB Labs announcements for [1.4.0](https://duckdb.org/2025/09/16/announcing-duckdb-140.html) and [1.4.1](https://duckdb.org/2025/10/07/announcing-duckdb-141.html).
While you can continue using your current version of DuckDB with MotherDuck, we encourage you to [upgrade your DuckDB clients to 1.4.1](https://duckdb.org/install) as soon as you can to take advantage of the fixes and performance improvements.
**[Preview] DuckLake 0.3**
As we announced earlier this year, MotherDuck now supports [DuckLake](https://ducklake.select), an integrated data lake and catalog format. DuckLake 0.3 makes working with DuckLake more robust, including [`CHECKPOINT` for easy maintenance](https://github.com/duckdb/ducklake/pull/406), new paths for Iceberg interoperability, [spatial geometry types](https://github.com/duckdb/ducklake/pull/412), and [`MERGE INTO` support](https://github.com/duckdb/ducklake/pull/351).
Learn more about using DuckLake databases in MotherDuck in the [documentation](/integrations/file-formats/ducklake), and the recent improvements in the [DuckDB Labs announcement for DuckLake 0.3](https://ducklake.select/2025/09/17/ducklake-03/).
## September 30, 2025
- **Get help from MotherDuck Experts:** Get a human helping hand with technical questions, troubleshooting, and best practices directly in the MotherDuck UI. Open "Expert help" from the Help menu to talk with our team, and you'll be notified of responses. Expert help is available with Business and Lite plans.
- **Transient option for database storage retention:** Databases can now be created with transient retention, which provides a minimal retention period and no failsafe storage. This option can be useful for intermediate datasets or data easily reconstructed from external sources. Create transient databases in the UI or via [`CREATE DATABASE db_name (TRANSIENT)`](../../sql-reference/motherduck-sql-reference/create-database/#syntax). Transient databases are available with Business and Lite plans. Learn more in the [storage management documentation](/concepts/Storage-lifecycle/#storage-management).
- **Duplicate notebooks:** Copy existing SQL notebooks to reuse query templates or create variations of your analysis. Find the duplicate option in any notebook's options menu in the left sidebar.
- **Monitor database storage in the MotherDuck UI:** Organization admins can now review database storage metrics in the updated [Databases](https://app.motherduck.com/settings/databases) page, showing current and cumulative database storage footprint over time. Learn more in the [storage lifecycle documentation](/concepts/Storage-lifecycle/#breaking-down-storage-usage).
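Using the `CREATE DATABASE` syntax above, creating a transient database for throwaway intermediate results looks like this (the database name is hypothetical):

```sql
CREATE DATABASE scratch_etl (TRANSIENT);
```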
## September 10, 2025
- **Instances are now called Ducklings:** We've updated our name for instances to better reflect their purpose as dedicated and scalable DuckDB instances that provide isolated, on-demand compute for each user's analytics workload in MotherDuck. Find the familiar instance controls now in [Settings > Ducklings](https://app.motherduck.com/settings/ducklings). This release does not affect the [Admin REST API methods for instances](../../sql-reference/rest-api/motherduck-rest-api/). Learn more about how [Ducklings](../billing/duckling-sizes/) are different from standard data warehouse instances in [this blog post](https://motherduck.com/blog/scaling-duckdb-with-ducklings/).
- **Rename Notebooks from the Object Explorer:** SQL notebooks can now be renamed directly from the left sidebar using a notebook's options menu.
- **Enum support in `prompt` function:** The `PROMPT` SQL function now supports enum types for consistent classification outputs. See the [function documentation](../../sql-reference/motherduck-sql-reference/ai-functions/prompt/#classification-with-enums) for details and examples.
- **Command menu in the MotherDuck UI:** Navigate the MotherDuck UI from your keyboard using the new command menu. Quickly access common actions like adding notebook cells, creating notebooks, and navigating between pages. Try it out with "Open command menu" in the top-left Organization dropdown, or use `Cmd/Ctrl + Shift + P`
## September 4, 2025
- **Pre-filled names for service accounts and tokens:** When creating service accounts and tokens in the [Settings > Service Accounts](/key-tasks/service-accounts-guide/) page, names are now pre-filled with the following format to help differentiate between them:
- _Service Accounts:_ `{creator_username}_service_account_{number}`
- _Read-Write Tokens:_ `{sa_username}_read_write_token_{number}`
- _Read-Scaling Tokens:_ `{sa_username}_read_scaling_token_{number}`
- **DuckLake database icon in the MotherDuck UI:** [DuckLake-backed databases](/concepts/ducklake/) now display a distinct icon to easily distinguish them from databases using MotherDuck native storage.
## August 21, 2025
- **Support for H3 Spatial Indexing Extension:** MotherDuck now supports the [H3 DuckDB Extension](https://duckdb.org/community_extensions/extensions/h3.html), which adds support for the [H3 hierarchical hexagonal grid system](https://h3geo.org/) for geospatial analysis. This extension comes pre-installed in MotherDuck, so no installation is required.
## August 13, 2025
- **GPT 5 Support in `prompt` function**: The `PROMPT` function now supports OpenAI's GPT 5 series models. Refer to the [function documentation](../../sql-reference/motherduck-sql-reference/ai-functions/prompt/) for more details.
## August 12, 2025
- **Display Preformatted VARCHAR values:** VARCHAR results in the MotherDuck UI data value pane now support display of preformatted text.
- **Format SQL in MotherDuck Notebook:** Format any SQL statement using the new **Format** button in the notebook cell options menu, or with `Option/Alt + Cmd/Ctrl + O`. When text is selected, only the selection is formatted.
## August 8, 2025
- **Test S3 Credentials:** MotherDuck users can now test S3 credentials directly in the MotherDuck UI on the Secrets page in Settings when adding new S3 secrets.
- **Support for DuckDB Configuration Options:** With this release, MotherDuck now correctly respects [DuckDB configuration options](https://duckdb.org/docs/stable/configuration/overview.html) and their local defaults, including extension settings like TimeZone. Broader coverage of additional configuration options is planned for the upcoming [DuckDB 1.4 release](https://duckdb.org/release_calendar.html).
## July 31, 2025
- **Updated FixIt Keyboard Shortcut:** The `Escape` key can now be used to reject [FixIt](https://motherduck.com/docs/key-tasks/ai-and-motherduck/ai-features-in-ui/#automatically-fix-sql-errors-in-the-motherduck-ui) suggestions, providing a quicker way to dismiss generated SQL fixes.
- **Generate Notebook Names:** Get descriptive, context-aware names for notebooks in the MotherDuck UI based on their SQL content. Click the new "Generate name from SQL" button to the left of a notebook's name to try it out. Available for users in MotherDuck's Business and Lite plans.
## July 25, 2025
- **Data Grid UX Improvements:** Data grids now include row numbers to make it easier to explore query results and reference specific rows. Users can now select multiple rows by clicking row numbers with the shift-key modifier.
- **New UX for FixIt:** [FixIt](https://motherduck.com/docs/key-tasks/ai-and-motherduck/ai-features-in-ui/#automatically-fix-sql-errors-in-the-motherduck-ui) now includes keybindings for the toggles to accept and reject suggestions and turn automatic suggestions on and off.
- **`Cmd/Ctrl + Enter`:** Accept suggestion and run query
- **`Cmd/Ctrl + Shift + Backspace`:** Reject suggestion
## July 16, 2025
- **NEW - Larger Compute Instances:** MotherDuck now offers two new memory-rich compute Duckling (instance) types, **Mega** and **Giga**, built to run at high capacity for the largest, most demanding jobs. Learn more in the [launch blog](https://motherduck.com/blog/announcing-mega-giga-instance-sizes-huge-scale) and [Docs](https://motherduck.com/docs/about-motherduck/billing/duckling-sizes/).
## July 14, 2025
- **DuckDB 1.3.2:** MotherDuck supports DuckDB 1.3.2, a bugfix release. Additional details are available in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.3.2).
- **The Settings Button has Moved to the Org Dropdown:** Settings has moved from the left sidebar into the Organization dropdown at the top left for easier access and a cleaner layout.
- **Admin Experience Enhancements:** With this week's release, MotherDuck organization admins can do more to manage their Org directly from the MotherDuck UI, thanks to better visibility and admin-specific functionality for managing tokens, service accounts, and storage.
- **New Service Accounts Page in Settings:** Organization admins can now view, create, and manage service accounts and service account tokens in the [Service Accounts](/key-tasks/service-accounts-guide/) section of MotherDuck settings.
- **Impersonation of Service Accounts:** Organization admins can now temporarily [impersonate a service account](/key-tasks/service-accounts-guide/#impersonate-service-accounts-ui-only) while using the MotherDuck UI.
- **Storage Usage History added to `MD_INFORMATION_SCHEMA`:** Organization admins can now access up to 30 days of historical storage data using the [`STORAGE_INFO_HISTORY` view](/sql-reference/motherduck-sql-reference/md_information_schema/storage_info/) in the [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction/). Each record includes a `result_ts` timestamp showing when the storage metrics were calculated.
## July 01, 2025
**[Preview] DuckLake Support**: MotherDuck now supports [DuckLake](https://ducklake.select), an integrated data lake and catalog format.
- MotherDuck currently provides two options for creating and integrating with DuckLake databases:
- **Fully managed**: MotherDuck manages both data storage and metadata
- **Bring your own bucket**: Connect your S3-compatible object storage with options for:
- MotherDuck compute + MotherDuck catalog
- Bring-your-own-compute (BYOC) + MotherDuck catalog
Learn more in the [documentation](/integrations/file-formats/ducklake) and [announcement blog](https://motherduck.com/blog/announcing-ducklake-support-motherduck-preview/).
## June 26, 2025
- **Chat Widget Optimization:** Users can now view their inline edit history in a more compact chat widget and quickly request follow-up changes when needed.
- **Improved Boolean cell styling:** Boolean values in the data grid now have distinct visual weights to make it easier to visually scan result sets and prevent confusion with empty cells.
## June 18, 2025
- **DuckDB 1.3.1:** MotherDuck supports DuckDB 1.3.1, a bugfix release. Additional details are available in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.3.1).
- **`PIVOT` statements in MotherDuck UI:** The MotherDuck UI now supports [`PIVOT` statements](https://duckdb.org/docs/stable/sql/statements/pivot.html), with pivot columns also appearing in the Column Explorer. `PIVOT` transforms distinct column values into separate columns with aggregated data.
- **New `STORAGE_INFO` View in `MD_INFORMATION_SCHEMA`:** Organization admins can now review detailed storage breakdowns per database using the new [`STORAGE_INFO` view](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/storage_info/) in the [`MD_INFORMATION_SCHEMA`](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/introduction/).
## June 12, 2025
- **Improved query execution UX:** After 5 seconds, the run button now displays a timer showing how long the query has been running. It also offers clearer visual cues for canceling a query on mouseover and focus.
## June 5, 2025
- **Overwrite a database with a zero-copy clone:** The new [`COPY FROM DATABASE (OVERWRITE)` command](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/copy-database-overwrite/) replaces all data in the target database with the source’s contents in a single atomic operation, waiting for active writes to finish and blocking new ones during the process.
- **Copy SQL definitions for views from the Object Explorer:** The dropdown menu for views in the left-hand panel of the MotherDuck UI now lets you copy the associated SQL definition without opening the table summary.
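A minimal sketch of the overwrite clone described above; the database names are hypothetical, and the exact option syntax is per the linked documentation:

```sql
-- Atomically replace prod_db with a zero-copy clone of staging_db
COPY FROM DATABASE staging_db TO prod_db (OVERWRITE);
```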
## May 29, 2025
MotherDuck now supports DuckDB version 1.3.0 🎉
DuckDB 1.3.0 improves performance in real-world scenarios with faster queries, new SQL syntax, and smarter Parquet file handling. Learn more in the [changelog](https://github.com/duckdb/duckdb/releases/tag/v1.3.0).
**Performance improvements**
- **[New `TRY()` expression for safer queries:](https://duckdb.org/2025/05/21/announcing-duckdb-130.html#try-expression)** More graceful handling for bad data by returning `NULL` instead of an error on problematic rows
- **[Pushdown of arbitrary expressions into scans:](https://github.com/duckdb/duckdb/pull/16430)** Reductions in unnecessary data processing to deliver up to 30x faster queries
- **[Pushdown of inequality conditions into joins:](https://github.com/duckdb/duckdb/pull/16508)** Major speedups for incremental dbt models and join-heavy queries
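For example, `TRY()` lets a cast over dirty data yield `NULL` for problematic rows instead of failing the whole query (table and column names are hypothetical):

```sql
SELECT TRY(CAST(raw_value AS INTEGER)) AS parsed
FROM staging_data;
```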
**SQL syntax updates**
- **[Python-style lambda syntax:](https://github.com/duckdb/duckdb/pull/17235)** You can now use `lambda x: x + 1` instead of `x -> x + 1`; the old syntax is deprecated, but still supported.
- **[`cast_to_type()` function:](https://github.com/duckdb/duckdb/pull/17209)** Dynamically cast values to match column types - useful in generic expressions and `CASE` statements when writing macros.
- **[Recursive JSON access:](https://github.com/duckdb/duckdb/pull/17406)** New `json_each()` and `json_tree()` functions make it easier to traverse nested JSON structures.
- **[Struct field updates:](https://github.com/duckdb/duckdb/pull/17003)** Individual fields in structs can now be modified using `ALTER`; all fields are rewritten even if only one is updated.
- **[Prepared statements metadata:](https://github.com/duckdb/duckdb/pull/16541)** The `duckdb_prepared_statements()` function returns all prepared statements in the session.
- **[More flexible type definitions:](https://github.com/duckdb/duckdb/pull/17404)** Support has been added for `CREATE OR REPLACE TYPE`, `CREATE TYPE IF NOT EXISTS`, and `CREATE TEMPORARY TYPE`.
- **[Preserved order for `OR` filters:](https://github.com/duckdb/duckdb/pull/17180)** Execution now preserves the order of clauses in `WHERE` conditions using `OR`.
- **[Function alias visibility:](https://github.com/duckdb/duckdb/pull/16600)** `duckdb_functions()` now lists function aliases alongside the primary function entries.
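The new lambda syntax, side by side with the deprecated but still-supported arrow form:

```sql
SELECT list_transform([1, 2, 3], lambda x: x + 1);  -- new Python-style syntax
SELECT list_transform([1, 2, 3], x -> x + 1);       -- old arrow syntax
```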
**Parquet improvements**
- **[Late materialization:](https://github.com/duckdb/duckdb/pull/17036)** Queries are 3–10x faster with `LIMIT` due to deferred column loading
- **[~15% average speedup on reads:](https://github.com/duckdb/duckdb/pull/16595)** New scan and filter efficiency improvements
- **[30%+ faster write throughput:](https://github.com/duckdb/duckdb/pull/17061)** Improved multithreaded export performance
- **[Better compression for large strings:](https://github.com/duckdb/duckdb/pull/17164)** Large string values are now dictionary-compressed
- **[Smarter rowgroup combining:](https://github.com/duckdb/duckdb/pull/17118)** Files are more efficient due to merging small rowgroups at write time
Learn more in the official [DuckDB Labs 1.3.0 announcement](https://duckdb.org/2025/05/21/announcing-duckdb-130.html).
While you can continue using your current version of DuckDB, we encourage you to [upgrade your DuckDB clients to 1.3.0](https://duckdb.org/docs/installation/?version=stable&environment=cli&platform=macos&download_method=package_manager) as soon as you can to take advantage of the fixes and performance improvements.
Additional updates from this release:
- Query results now display in a redesigned table that delivers enhanced performance when viewing and exploring data. Column headers now include type information for better context. Additional table functionality, including sorting and filtering of results, is coming in future releases.
## May 22, 2025
- **Faster queries on complex filters and wide tables:** We've significantly boosted performance for queries with IN filters, selective joins, and LIMIT clauses. Expect noticeable speedups on wide tables or those with large string or JSON columns.
- **New keybindings for power users:**
- Toggle Instant SQL for the current SQL cell: `Cmd/Ctrl+Shift+.`
- Toggle Object Explorer: `Cmd/Ctrl+B`
- Toggle Inspector (Column Explorer): `Cmd/Ctrl+I`
- Toggle worksheet mode for the current SQL cell: `Cmd/Ctrl+E`
- **Org-wide Active Accounts:** Organization admins can now view all active accounts and their associated ducklings in the [Active Accounts](https://app.motherduck.com/settings/active-accounts) section of MotherDuck settings.
- **Smarter Instant SQL caching:** Instant SQL now accounts for filters in your WHERE clause when building its cache, offering a greater number of relevant rows as you work.
- **Full row count in flat table results:** SQL cells now display a full result row count when viewing results in "flat" table mode.
- **GPT 4.1 Support in `prompt` function**: The `PROMPT` function now supports OpenAI's GPT 4.1 series models. Refer to the [function documentation](../../sql-reference/motherduck-sql-reference/ai-functions/prompt/) for more details.
## May 16, 2025
- **Multiple SQL statements now supported in Instant SQL:** Execute individual statements within multi-statement SQL cells by clicking on the desired statement while [Instant SQL](https://motherduck.com/blog/introducing-instant-sql/) is enabled.
- **Copy Table Names directly from Object Explorer:** Use the options menu on any table in the Object Explorer to copy its name to your clipboard. Paste exact table references into any SQL editor, eliminating typos and saving time when writing queries.
## May 1, 2025
- **Window Functions in Instant SQL:** MotherDuck now offers improved window function support in [Instant SQL](https://motherduck.com/blog/introducing-instant-sql/)
- **Copy-Paste in the Object Explorer:** Detailed comments and error messages can now be copied with one click from Object Explorer tooltips
## April 23, 2025
- **[Preview] AI-powered SQL editing:** MotherDuck users can now access inline AI-powered SQL suggestions within the MotherDuck UI. To try it out, select SQL text in your notebook, then use `Cmd/Ctrl+Shift+E` to generate or edit a SQL query by writing an instruction in plain language.
- **[Preview] Introducing Instant SQL:** A new way to write SQL that updates your result set as you type to expedite query building and debugging – all with zero-latency, no run button required. Read more about Instant SQL in the [MotherDuck Blog](https://motherduck.com/blog/introducing-instant-sql/).
## April 22, 2025
- **Increasing read scaling replica maximum:** MotherDuck Business Plan users can now set a Read Scaling replica pool size of up to 16 database replicas that can be read concurrently. When connecting with a read scaling token, each concurrent end user connects to a read scaling replica of the database that is served by its own duckling. Refer to the [documentation](../../key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for more details.
- **Fresh new look for SQL notebook cells:** The run button, database selection, and other cell options have moved, making more space to focus on your SQL.

## April 17, 2025
- **Now in Preview:** Organization admins on MotherDuck's Business plan can now use the [`QUERY_HISTORY` view](../../sql-reference/motherduck-sql-reference/md_information_schema/query_history/) to get a consolidated view of all queries run across their full organization.
- **Org-wide Databases and Shares:** Organization admins can now view their Organization's [Databases](/sql-reference/motherduck-sql-reference/create-database/) and [Shares](/sql-reference/motherduck-sql-reference/create-share/) in the updated Settings section of the MotherDuck web UI.
- **Txt file uploads:** MotherDuck users can now upload `.txt` files to their MotherDuck organization.
## April 10, 2025
- MotherDuck supports DuckDB 1.2.2, a bugfix release. More details in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.2.2).
- We've updated MotherDuck's timezone handling to use `UTC` as the default, replacing the prior `America/New_York` default. When converting values to the "[Timestamp with Time Zone](https://duckdb.org/docs/stable/sql/data_types/timestamp.html#time-zones)" type, UTC will now be applied by default. A custom timezone for the active connection can be set temporarily using the `SET TimeZone = '';` command ([see available timezone values](https://duckdb.org/docs/stable/sql/data_types/timezones.html)). Your DuckDB client's local timezone will still be used for other time-related query operations. For more details on DuckDB's timezone handling, see the [DuckDB Time Zone documentation](https://duckdb.org/docs/stable/sql/data_types/timestamp.html#time-zone-support).
- MotherDuck users can now specify an alias when [attaching a SHARE](https://motherduck.com/docs/key-tasks/sharing-data/sharing-overview/). Refer to the [`ATTACH` documentation](/sql-reference/motherduck-sql-reference/attach/) for more information, and reach out to us in our [Community Slack](https://slack.motherduck.com) if you have any questions or feedback.
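A sketch of the timezone behavior described above; the timezone value is illustrative:

```sql
-- Override the UTC default for the active connection
SET TimeZone = 'America/New_York';
SELECT '2025-04-10 12:00:00'::TIMESTAMPTZ;  -- interpreted in the session timezone
```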
## April 3, 2025
- **Access Control for Shares**: MotherDuck users can now create shares with a [RESTRICTED](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-share/#access-clause) access setting, allowing share owners to precisely control access by granting or revoking permissions for individual MotherDuck users or a list of specified users through [GRANT](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/grant-access/) and [REVOKE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/revoke-access/) commands. When first created, a [RESTRICTED](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-share/#access-clause) share is only accessible by the share owner.
- **Manual Data Refresh for Read-Scaling Replicas**: MotherDuck users can now update data more frequently on [read-scaling replicas](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) by using the [CREATE SNAPSHOT OF](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-snapshot/) command to manually trigger snapshot creation, followed by [REFRESH DATABASE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/refresh-database/) on the read-scaling replica. This provides access to the freshest data without waiting for automatic updates. Note that manual snapshot creation blocks new write queries on the read-write database from starting while the snapshot is taken.
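The manual refresh flow above, sketched end to end; the database name is hypothetical and the exact statement forms are per the linked docs:

```sql
-- On a read-write connection: force a fresh snapshot
CREATE SNAPSHOT OF my_db;
-- On the read scaling replica connection: pick up the new snapshot
REFRESH DATABASE my_db;
```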
## March 20, 2025
- Users can now search & filter for notebooks, databases, and shares in the left sidebar with our object search in the top left navigation.
- Introducing performance improvements to the databases section of the sidebar: The attached databases section now scales efficiently to handle very large numbers of databases, schemas, and tables.
## March 6, 2025
- MotherDuck now supports Indexes for query acceleration, in addition to their use in constraints. Learn more about DuckDB Indexes [here](https://duckdb.org/docs/stable/guides/performance/indexing.html#art-index-scans).
- MotherDuck supports DuckDB 1.2.1, a bugfix release. More details in the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.2.1).
- Support for DuckDB versions 0.10.2, 0.10.3, and 1.0.0 has ended.
- Introducing a smoother local file experience: Persist files across sessions, view metadata directly in the Object Explorer, and convert files to tables.
## February 19, 2025
- Added [EXPLAIN ANALYZE](https://duckdb.org/docs/guides/meta/explain_analyze) support for profiling hybrid queries.
- Added a "Running Queries" page in settings to monitor active long-running queries.
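Profiling a hybrid query then looks like this (the local and cloud table names are hypothetical):

```sql
EXPLAIN ANALYZE
SELECT count(*)
FROM local_events e
JOIN cloud_db.main.users u ON e.user_id = u.id;
```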
## February 11, 2025
With today's release, we're introducing a number of features to support businesses building production-grade analytics. See the [blog post](https://motherduck.com/blog/introducing-motherduck-for-business-analytics/) for more details.
**New Plan Options:**
MotherDuck now has two platform plans to choose from, **Lite** and **Business**, alongside our **Free** Plan.
* The **Free Plan** is designed for hobbyists and experimenters with small-scale analytics needs.
* The **Lite Plan** is most useful for small team use cases and individuals. Maybe your small team is building out some early analytics, or your hobby project is growing into something more.
* The **Business Plan** is ideal for businesses with complex needs and larger teams.
**[New Instances](https://motherduck.com/docs/about-motherduck/billing/duckling-sizes/) and compute pricing options:**
_**Pay Per Instance**_: We're adding new choices for MotherDuck compute, with Pay Per Instance **Standard** and **Jumbo** instances.
* The _Pay Per Instance_ model is based on uptime, which provides more predictable costs you can compare to other data warehouse products.
* The **Standard** instance is great for everyday tasks, and balanced performance.
* The **Jumbo** instance is often useful for heavy workflows, like batch ETL pipelines or complex transformations.
* When you run a query, your instance spins up within milliseconds.
* You pay for the seconds that the instance is running, with a minimum of one minute.
_**Pay Per Query**_: Our existing instances are now called **Pulse**.
* These instances are capped in size; however, they are billed on our existing _Pay Per Query_ model, metered in Compute Unit seconds.
* The **Pulse** instance enables lightweight, fully serverless analytics.
* This can be very useful for applications where you have data partitioned by user, ad-hoc query execution, or incremental data processing with smaller data sizes.
**Read Scaling Controls:**
* Users with access to [Read Scaling](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling) in their organization can now set the Read Scaling replica pool size, letting you control the maximum concurrency threshold for your read replicas.
* Users can set their Read Scaling [Instance type](https://motherduck.com/docs/about-motherduck/billing/duckling-sizes/) independently of the Read/Write Instance type.
## February 6, 2025
MotherDuck supports DuckDB's newly released version 1.2.0 🎉
DuckDB 1.2.0 is packed with improvements that make using MotherDuck even easier, like a better CSV reader, friendlier SQL, and improved performance!
Read more about DuckDB 1.2.0 in the [MotherDuck Blog](https://motherduck.com/blog/announcing-duckdb-12-on-motherduck-cdw), and review the official [DuckDB Labs 1.2.0 announcement](https://duckdb.org/2025/02/05/announcing-duckdb-120.html) for notes on breaking changes and detailed updates.
## January 8, 2025
- MotherDuck clients now verify the server's TLS certificate.
- MotherDuck now automatically opens the browser to facilitate authentication in Windows environments.
## December 12, 2024
- [Preview] Introducing MotherDuck's REST API: Organizations with large numbers of users have struggled to manage them through the MotherDuck UI. We've received requests for a programmatic interface, and we've listened! We are launching a User Management REST API for managing Users and Access Tokens. Through the REST API, MotherDuck users can now easily create separate users for BI or data ingestion/processing workloads, and enable new experiences for app developers (e.g., issuing temporary short-lived read scaling tokens). See [the documentation](documentation/sql-reference/rest-api/motherduck-rest-api.info.mdx) for more information, and reach out to us in our community Slack channel if you have any questions or feedback!
## December 4, 2024
- [Preview] Introducing support for read scaling: With the launch of read scaling tokens, MotherDuck accounts now support scaling up to 4 replicas of your database that can be read concurrently. When connecting with a read scaling token, each concurrent end user connects to a read scaling replica of the database that is served by its own duckling. See [our documentation](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for more information.
- Auto-sync of new and deleted attachments: Users who connect to MotherDuck through two different clients concurrently (for example, the DuckDB CLI and the MotherDuck UI) will now see changes made by one client in the other. For example, if you create a new database in the CLI, the MotherDuck UI will automatically update to reflect it, and vice versa. Similarly, new attachments, detachments, and database deletions are synced.
- Create databases directly from the Object Explorer: Users can now create a new attached database from the Object Explorer panel on the left side of the MotherDuck web UI. Previously, you could only do so by issuing an SQL command.
## November 21, 2024
- Introducing the **Table Summary**. Customers have told us that they love the Column Explorer, but wished there was an easy way to see it for tables in their database lists without having to write SQL. So we built the Table Summary. You can activate it by clicking on a table or view in the Object Explorer, which reveals a panel showing the Column Explorer (the column names, types, distributions, and null percentages for the selected table or view). You can also get a quick preview of the table and see the DDL statement that defines it. We're excited to see how you use it!
- **A resizable, responsive Column Explorer**. To make the table summary work well, we made the Column Explorer both resizable and responsive. This also means the inspector – the right side panel that expands and shows the Column Explorer for your result sets – can be resized. As the panel gets smaller, we responsively hide the null percentage and the distribution plots, giving more room for the column name.
- Introducing the **[MD_INFORMATION_SCHEMA](documentation/sql-reference/motherduck-sql-reference/md_information_schema/introduction.md)**. The MotherDuck MD_INFORMATION_SCHEMA views are read-only, system-defined views that provide metadata information about your MotherDuck objects. The current views that you can query to retrieve metadata information are: databases, owned_shares, and shared_with_me.
## November 7, 2024
- MotherDuck now supports DuckDB 1.1.3 clients, a bugfix release. More info on the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.1.3).
- DuckDB recently [introduced a change](https://github.com/duckdb/duckdb/pull/13372) that allows for much more efficient concurrent bulk ingestion. We completed the necessary infrastructure changes and collaborated on [some bug fixes](https://github.com/duckdb/duckdb/pull/14467), and that optimization is now enabled on our backends.
## October 31, 2024
- MotherDuck introduces `Admin` and `Member` roles for organizations. `Admin` users can change the roles of other users in the organization or [remove](documentation/key-tasks/managing-organizations/managing-organizations.mdx#removing-users) a user from the organization.
- MotherDuck & Hydra announced the first release of [pg_duckdb](https://github.com/duckdb/pg_duckdb), a PostgreSQL extension that allows you to run DuckDB (and connect to MotherDuck!) within PostgreSQL. Read more about it [here](https://motherduck.com/blog/pgduckdb-beta-release-duckdb-postgres/).
## October 17, 2024
- MotherDuck now supports DuckDB 1.1.2 clients, a bugfix release. More info on the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.1.2).
## October 14, 2024
- Shares now support [auto-updating](documentation/sql-reference/motherduck-sql-reference/create-share.md). Automatically updated shares no longer require running explicit `UPDATE SHARE` commands. Instead, changes to the underlying database are automatically published to the share within at most 5 minutes after writes have completed. The option for manually updated shares remains available and continues to be the default, allowing users who prefer finer control over their update lifecycle to keep their usual workflow. The auto-updating property is defined at share creation time, and share owners can force an explicit update at any time on both types of shares by running [`UPDATE SHARE`](documentation/sql-reference/motherduck-sql-reference/update-share.md).
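For example, an auto-updating share can be declared at creation time; a minimal sketch (the `UPDATE AUTOMATIC` option follows the CREATE SHARE reference linked above; share and database names are illustrative):

```sql
-- Create a share that republishes changes automatically (within ~5 minutes).
CREATE SHARE sales_share FROM sales_db (UPDATE AUTOMATIC);

-- Manually updated shares remain the default; owners of either kind
-- can still force an immediate refresh:
UPDATE SHARE sales_share;
```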
## October 9, 2024
We are excited to introduce a new SQL [prompt](/documentation/sql-reference/motherduck-sql-reference/ai-functions/prompt.md) function, currently in preview, that enables text generation directly within SQL queries. This feature leverages LLMs to process and generate text based on provided prompts.
Features:
* Text Generation: Use the prompt function in your SQL queries to generate text, for example, `SELECT prompt('Write a poem about ducks');`.
* Model Selection: Specify the LLM model type with the model parameter. Available models include `gpt-4o-mini` (default) and `gpt-4o-2024-08-06`.
* Structured Outputs: Opt for structured responses using the `struct` or `json_schema` parameters to tailor the output format to your needs.
Check out more snippets [here](/documentation/sql-reference/motherduck-sql-reference/ai-functions/prompt.md#text-generation).
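A brief sketch of the options above (the `struct` argument shape follows the prompt documentation linked above; the field names are illustrative):

```sql
-- Plain text generation with the default model:
SELECT prompt('Write a haiku about ducks');

-- Structured output: ask the model for typed fields instead of free text.
SELECT prompt(
    'Summarize: MotherDuck adds a prompt() SQL function',
    struct := {summary: 'VARCHAR', sentiment: 'VARCHAR'}
) AS result;
```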
## October 2, 2024
- MotherDuck now supports [monitoring](documentation/sql-reference/motherduck-sql-reference/connection-management/monitor-connections.md) and [interrupting](documentation/sql-reference/motherduck-sql-reference/connection-management/interrupt-connections.md) server-side queries.
- Various stability and usability improvements.
## September 25, 2024
- MotherDuck now supports DuckDB 1.1.1, a bugfix release. More info on the changelog [here](https://github.com/duckdb/duckdb/releases/tag/v1.1.1).
- In the MotherDuck Web UI, users can easily view and copy the contents of a cell from their query results.
## September 16, 2024
MotherDuck now supports DuckDB version 1.1.0. 🎉
This release includes a number of new features and many performance improvements.
Here are some non-exhaustive key updates:
**New features**
- [SQL variables](https://duckdb.org/2024/09/09/announcing-duckdb-110#friendly-sql)
- [query and query_table functions](https://duckdb.org/2024/09/09/announcing-duckdb-110#query-and-query_table-functions)
- [GeoParquet (Spatial extension features)](https://duckdb.org/2024/09/09/announcing-duckdb-110#spatial-features)
**Performance improvements**
- [Dynamic Filter Pushdown from Joins](https://duckdb.org/2024/09/09/announcing-duckdb-110#dynamic-filter-pushdown-from-joins)
- [Automatic CTE Materialization](https://duckdb.org/2024/09/09/announcing-duckdb-110#automatic-cte-materialization)
- [Parallel Streaming Queries](https://duckdb.org/2024/09/09/announcing-duckdb-110#parallel-streaming-queries)
Read more on [DuckDB's 1.1.0 blog](https://duckdb.org/2024/09/09/announcing-duckdb-110.html).
## September 5, 2024
- New MotherDuck users are optionally guided through running and analyzing a query upon first logging in to the Web UI.
## August 21, 2024
- MotherDuck now supports the [Full Text Search (FTS) extension](https://duckdb.org/docs/extensions/full_text_search.html). You can now create a text search index on tables in your MotherDuck databases and search them. (Note: Creating the FTS index is currently not supported from the MotherDuck Wasm client or app.motherduck.com, but it is supported from all other clients.)
## August 14, 2024
- MotherDuck now has an [embedding()](documentation/sql-reference/motherduck-sql-reference/ai-functions/embedding.md) function to compute `FLOAT[512]` text embeddings based on OpenAI's text-embedding-3-small model. Read more about it in our [announcement blog post](https://motherduck.com/blog/sql-embeddings-for-semantic-meaning-in-text-and-rag/)!
- MotherDuck now supports [sequences](https://duckdb.org/docs/sql/statements/create_sequence.html), with one small limitation: Table column definitions that refer to a sequence by a fully qualified catalog name are rejected. Note that cross-catalog references are already disallowed by DuckDB.
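The `embedding()` function above can be sketched as follows (the second query is an illustrative semantic-search pattern over a hypothetical `articles` table with a `vec` embedding column):

```sql
-- Compute a FLOAT[512] embedding for a string:
SELECT embedding('I like ducks') AS vec;

-- Illustrative: rank rows by cosine similarity to a query embedding,
-- using DuckDB's array_cosine_similarity for fixed-size arrays.
SELECT title
FROM articles
ORDER BY array_cosine_similarity(vec, embedding('waterfowl')) DESC
LIMIT 5;
```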
## August 7, 2024
- MotherDuck now supports [foreign keys](https://duckdb.org/docs/sql/constraints.html#foreign-keys). Foreign keys define a column, or set of columns, that refer to a primary key or unique constraint from another table. The constraint enforces that the key exists in the other table.
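For example, using standard DuckDB constraint syntax (table and column names are illustrative):

```sql
CREATE TABLE ducks (id INTEGER PRIMARY KEY, name VARCHAR);
CREATE TABLE sightings (
    duck_id INTEGER REFERENCES ducks (id),  -- foreign key to ducks.id
    seen_at TIMESTAMP
);
-- This would fail, since no duck with id 42 exists:
-- INSERT INTO sightings VALUES (42, now());
```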
## July 24, 2024
- In the MotherDuck Web UI, users can now drop, rename, and comment on tables/views and columns from the Object Explorer.
- Users can now see the logical size of their MotherDuck databases using `FROM pragma_database_size()`.
## July 10, 2024
- **Access Tokens**: Users can now create multiple access tokens and revoke them as needed. Tokens can also be configured to expire after a set number of days. [Learn more](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck).
- **Organization domain invites**: Organizations can be configured so that anyone with the organization's email domain automatically receives an invitation upon signing up.
- **CREATE SHARE with conflict mode**: Database shares can now be created with a conflict mode: if a share with the same name already exists, `IF NOT EXISTS` will not throw an error, and `OR REPLACE` will replace the existing share.
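A minimal sketch of the two conflict modes (share and database names are illustrative):

```sql
-- No-op instead of an error if myshare already exists:
CREATE SHARE IF NOT EXISTS myshare FROM my_db;

-- Replace an existing share of the same name with a new one:
CREATE OR REPLACE SHARE myshare FROM my_db;
```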
## June 26, 2024
- **Delta Lake support**: You can now query Delta Lake tables in MotherDuck. [Learn more](/integrations/file-formats/delta-lake).
- In the MotherDuck Web UI, the Object Explorer interface (that catalogs shares and databases on the left side of the UI) has been revamped.
- ACH has been added as a billing method, in addition to credit card billing.
- Resolved an issue affecting large SQL queries in both the MotherDuck UI and the Wasm SDK.
## June 20, 2024
- New MotherDuck users are now treated to a "Welcome to MotherDuck!" notebook upon first logging on to the Web UI.
- In the MotherDuck Web UI, the legacy notebook called "My Notebook" can now be renamed and/or deleted, and notebooks can now be closed.
- In the MotherDuck Web UI, helpful links and drop-down menus have been improved.
- MotherDuck now supports DuckDB's [Spatial Extension](https://duckdb.org/docs/extensions/spatial.html). The extension is pre-installed in MotherDuck, so users do not need to install it. Currently, the `GEOMETRY` type does not render in the MotherDuck Web UI. More details to come.
## June 13, 2024
- Free Plan compute usage limits are now being enforced. Queries for users on the Free Plan may be throttled. [Learn more](/about-motherduck/billing/pricing#free-plan)
## June 11, 2024
- MotherDuck is now Generally Available!
## June 6, 2024
- MotherDuck now supports [organization-scoped and discoverable shares](/key-tasks/sharing-data/sharing-overview).
- MotherDuck now supports storing [Hugging Face type secrets](/sql-reference/motherduck-sql-reference/create-secret).
## June 3, 2024
- MotherDuck now supports DuckDB version 1.0.0. If you have upgraded to 0.10.2+, you can connect with clients that are either of version 0.10.2, 0.10.3, or 1.0.0.
## May 30, 2024
- MotherDuck now supports DuckDB version 0.10.3. If you have upgraded to 0.10.2+, you can connect with clients that are either of version 0.10.2 or 0.10.3.
- Added support to read datasets directly from HuggingFace. Learn more about this new feature [here](https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb.html).
- Added support for the [ARRAY type](https://duckdb.org/docs/sql/data_types/array.html) in the MotherDuck UI.
- MotherDuck UI now supports multiple notebooks.
- Fixed a bug in which running the `UPDATE SHARE` command would kill ongoing queries.
## May 15, 2024
- MotherDuck now supports DuckDB 0.10.2. All new MotherDuck users default to DuckDB version 0.10.2, and all existing users can now permanently migrate to it. DuckDB 0.10.2 features a large number of stability and performance improvements, and all users are encouraged to migrate.
- Starting with DuckDB 0.10.2, MotherDuck now supports multiple versions of DuckDB at once. For example, you could use DuckDB version 0.10.3 in the CLI and DuckDB version 1.0 in Python.
- MotherDuck now supports [Multi-Statement Transactions](https://duckdb.org/docs/sql/statements/transactions.html). You must be on DuckDB version 0.10.2 or above.
- MotherDuck now supports [Indexes](https://duckdb.org/docs/sql/indexes.html) for the purpose of constraints of types `UNIQUE` or `PRIMARY KEY`. For example, you can leverage `INSERT ON CONFLICT` to dedupe or upsert your data. [Learn more](https://duckdb.org/docs/sql/statements/insert#on-conflict-clause). Indexes are not yet being utilized in MotherDuck for query acceleration.
- MotherDuck now supports Secrets syntax consistent with DuckDB 0.10 and above. [Learn more](/sql-reference/motherduck-sql-reference/create-secret).
- [FixIt](/getting-started/interfaces/motherduck-quick-tour#writing-sql-with-confidence-using-fixit-and-edit) is now 2-3x faster.
- Improved reliability of the service during releases. Moving forward, MotherDuck releases should not disrupt ongoing queries and workloads for users.
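The `INSERT ON CONFLICT` upsert mentioned above can be sketched as (table and column names are illustrative):

```sql
CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO users VALUES (1, 'Ana');

-- Upsert: update the existing row instead of failing on the duplicate key.
INSERT INTO users VALUES (1, 'Anabel')
ON CONFLICT (id) DO UPDATE SET name = excluded.name;
```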
## May 8, 2024
- You can now preview DuckDB version 0.10.2 in MotherDuck.
- You can now [choose your organization's pricing plan](/about-motherduck/billing/managing-billing#choosing-your-billing-plan) using the [Plans](https://app.motherduck.com/settings/plans) page in the Settings section of the MotherDuck Web UI.
- You can now configure your organization's payment method in the [Billing](https://app.motherduck.com/settings/billing) page in the Settings section of the MotherDuck Web UI. Free Plan customers are not required to configure a payment method.
## May 1, 2024
- Fixed a bug in which MotherDuck releases would kill running queries. Releases no longer disrupt ongoing queries and workloads.
- A number of under-the-hood stability improvements.
## April 25, 2024
- Improved reliability of `ATTACH` operations.
- Various reliability and polish improvements.
## April 24, 2024
- **[Preview]** The MotherDuck [Wasm SDK](/sql-reference/wasm-client) is now available for app developers. Read more about the SDK in the [blog announcement](https://motherduck.com/blog/building-data-applications-with-motherduck/).
## April 17, 2024
- [Billing Portal](./billing/managing-billing.mdx) is now available in the MotherDuck Web UI. You can use the Billing Portal to view your organization's incurred usage and current and past invoices.
- You can now invite your teammates to [Organizations](../key-tasks/managing-organizations/managing-organizations.mdx). Currently, Organizations are useful to group users together to monitor incurred usage in the Billing Portal, and additional capabilities will land in coming weeks.
- Fixed an issue in which MotherDuck releases would cancel running queries.
## April 10, 2024
- Catalog changes in one MotherDuck client will now automatically propagate to other clients.
- MotherDuck now supports indexes on temporary tables.
## March 20, 2024
- Fixed an issue in which users' runtimes could become unresponsive.
- In the MotherDuck UI, improved how row counts and query times are calculated.
- A variety of additional bug fixes and infrastructure-level improvements.
## March 7, 2024
- Operations on all databases that create shares (using `CREATE SHARE`), create databases (using `CREATE DATABASE`), or update shares (using `UPDATE SHARE`) are now metadata-only and copy no data.
## February 29, 2024
- A variety of fixes and improvements across the product.
## February 22, 2024
- Numerous bug fixes and stability improvements across the entire product.
## February 14, 2024
- In the MotherDuck web UI, you can now visualize your tables and query results with the [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/).
- For any database created starting today, operations on these databases that create shares (using `CREATE SHARE`), create databases (using `CREATE DATABASE`), and update shares (using `UPDATE SHARE`) are metadata-only and copy no data.
## February 13, 2024
- You are no longer required to provide a share name when creating shares. In this case, the created share is named after the source database. For example, executing `CREATE SHARE FROM mydb` creates a share named `mydb`; if your current database is `db`, then `CREATE SHARE` creates a share named `db`. See [`CREATE SHARE`](../sql-reference/motherduck-sql-reference/create-share.md) syntax.
- In CLI or Python, MotherDuck no longer displays the authentication token by default. You can retrieve the authentication token by running [`PRAGMA PRINT_MD_TOKEN`](../sql-reference/motherduck-sql-reference/print-md-token.md).
- Support for DuckDB version 0.9.1 has ended.
## January 04, 2024
New Features:
- MotherDuck now supports [DuckDB macros](../sql-reference/duckdb-sql-reference/duckdb-statements/create-macro.md).
- MotherDuck now supports [DuckDB ENUM data types](../sql-reference/duckdb-sql-reference/enum.md).
- Fully qualified column names in SELECT clauses are now supported. For example:
```sql
SELECT schema.table.column FROM schema.table
```
Updates and Fixes:
- Fixed a bug in which prepared statements for INSERT operations did not work.
- In the MotherDuck web UI, data exports are now faster.
- Rolled out major infrastructure improvements in hybrid query execution, resulting in faster and more reliable hybrid queries.
## January 03, 2024
- [FixIt](/getting-started/interfaces/motherduck-quick-tour#writing-sql-with-confidence-using-fixit-and-edit) helps you resolve common SQL errors by offering fixes in-line.
## November 30, 2023
- In the MotherDuck web UI, you can now copy query results to the clipboard or export query results as CSV, TSV, Parquet, or JSON files.

- In the MotherDuck web UI, query error messages are now easier to read.

## November 15, 2023
- MotherDuck has been upgraded to DuckDB 0.9.2. You can use either DuckDB 0.9.1 or DuckDB 0.9.2, but not both, until December 6th.
## November 3, 2023
- You can now [query Iceberg tables](../integrations/file-formats/apache-iceberg.mdx) on object storage.
- Improved stability of share attaches.
- In the MotherDuck web UI, a new database selector now enables you to use a specific database for each notebook cell.
## October 25, 2023
- In the MotherDuck web UI, you can now move and reorder individual notebook cells.
- In the MotherDuck web UI, the MotherDuck-specific SQL syntax is now highlighted.
- In the MotherDuck web UI, column histograms are now opt-in on a per-result basis, rather than a global opt-out via Settings.
- Improved how the MotherDuck web UI displays datetime data types, matching formatting in the CLI.
- In the MotherDuck web UI, you can now easily copy-paste a rectangular selection of query results into Google Sheets or Excel.
## October 16, 2023
MotherDuck has been upgraded to DuckDB 0.9.1 🎉
Please see the migrations guide for more info!
- You can now query Azure object storage. See [documentation](../integrations/cloud-storage/azure-blob-storage.mdx) for more info.
- You can now easily load AWS credentials used locally into MotherDuck. Please see syntax for [`CREATE SECRET`](../sql-reference/motherduck-sql-reference/create-secret.md) for more info.
- Better performance and reliability with lower memory usage.
- More intelligent parsing of CSV files.
## September 21, 2023
- The MotherDuck web UI supports attaching and detaching databases and shows detached databases.
- The MotherDuck web UI now loads significantly faster. This is an additional improvement over August 30, 2023.
- When a user updates a shared database, all consumers automatically receive the update within 1 minute.
- Support `CREATE OR REPLACE DATABASE` and `CREATE IF NOT EXISTS DATABASE`.
- Fixed a bug in which queries with long commit times would result in the dreaded "`Invalid Error: RPC 'SETUP_PLAN_FRAGMENTS' failed: Deadline Exceeded (DEADLINE_EXCEEDED)`" error.
- Performance and stability of uploads has been improved.
- The MotherDuck web UI now displays decimals correctly.
## August 30, 2023
- The MotherDuck web UI now loads significantly faster.
- The MotherDuck web UI now supports autocomplete. As you write SQL in the UI, on every keystroke autocomplete brings up query syntax suggestions. You can turn off autocomplete in Web UI settings, found under the gear icon in the top right.
- In the MotherDuck web UI, you can now execute multiple SQL statements in the same SQL cell.
## August 23, 2023
- Fixed a bug in which large uploads and downloads would fail.
- Improved performance of uploading data into MotherDuck from all supported sources.
- Added [SHOW ALL DATABASES](../sql-reference/motherduck-sql-reference/show-databases.md) DDL command. This command enables you to list all database types, including MotherDuck databases, DuckDB databases, and databases that were created from shares.
- In the MotherDuck web UI, you can now cancel queries.

- In the MotherDuck web UI, you can now add files of type JSON and files with arbitrary postfixes.
- In the MotherDuck web UI, under the 'Help' menu, you can now find the service specific Terms of Service.
## August 17, 2023
- Numerous stability and performance improvements across the entire product.
- Added more descriptive error messages in a number of areas.
- Better timestamp support in the MotherDuck UI.
## August 01, 2023
- You can now copy a MotherDuck database through [CREATE DATABASE](/sql-reference/motherduck-sql-reference/create-database) using `CREATE DATABASE cloud_db FROM another_cloud_db`.
- Fixed an HTTPS certificate error that appeared on Windows machines when downloading/loading the MotherDuck extension through the CLI.
- Fixed a bug where [DESCRIBE SHARE](../sql-reference/motherduck-sql-reference/describe-share.md) was not returning the actual database name.
## July 26, 2023
- You can now use MotherDuck in CLI or Python with the Windows operating system.
- The LIST SHARES and DESCRIBE SHARE SQL commands now return the database name instead of the snapshot name.
- Improved resilience of large uploads.
- Added more descriptive error messages for DDL queries.
## July 21, 2023
- Added DDL for [`DESCRIBE SHARE`](/sql-reference/motherduck-sql-reference/describe-share) and [`UPDATE SHARE`](/sql-reference/motherduck-sql-reference/update-share).
- Added DDL for [`CREATE [OR REPLACE] SECRET`](/sql-reference/motherduck-sql-reference/create-secret) and [`DROP SECRET`](/sql-reference/motherduck-sql-reference/delete-secret).
- Added `RESTRICT` and `CASCADE` options to the `DROP DATABASE` DDL. See [documentation](/sql-reference/motherduck-sql-reference/drop-database).
- The current database, set with USE DATABASE, is now persisted across sessions in the web UI.
- Data uploads and downloads have been accelerated by roughly 3x by compressing data over the wire.
- Numerous stability and performance improvements across the entire product.
- Added more descriptive error messages in a number of areas.
## June 29, 2023
- You can now use AI to help you write SQL with the `prompt_sql` function, answer questions about your data with the `prompt_query` pragma, describe your data with the `prompt_schema` pragma, and fix your SQL with the `prompt_fixup` function. See [documentation](/key-tasks/ai-and-motherduck/ai-features-in-ui).
## June 27, 2023
- Added support for [`DROP SHARE [IF EXISTS]`](/sql-reference/motherduck-sql-reference/drop-share),
[`LIST SHARES`](/sql-reference/motherduck-sql-reference/list-shares), and
[`LIST SECRETS`](/sql-reference/motherduck-sql-reference/list-secrets) operations.
Previously these operations were supported via table functions.
- The MotherDuck web UI now supports creating, deleting, and listing S3 secrets.
- Numerous improvements to the MotherDuck web UI.
- Fixed a bug in which the share URL was not returned after running the `CREATE SHARE` command in the CLI.
- Referencing database objects is now case insensitive. For example, if a database `DuCkS` exists, you can now reference it as `ducks` or `DUCKS`. When listing databases, you will see `DuCkS`.
## June 23, 2023
- Numerous fixes to improve the stability and reliability of our authentication process and token expiry.
- In the MotherDuck web UI, there is now a new drop-down menu on the User Profile (upper right) with options to access settings, send an invite, and log out.
- Added support for `IF EXISTS` option to the `DROP DATABASE` SQL command. See [documentation](/sql-reference/motherduck-sql-reference/drop-database).
- Added support for allowing the `motherduck_token` parameter in the connection string.
- Added md_list_secrets() table function. Because MotherDuck currently only supports a single secret, this function returns either `TRUE` or `FALSE` depending on whether a secret exists. See [documentation](/sql-reference/motherduck-sql-reference/list-secrets).
- Fixed a bug in the MotherDuck web UI where tables were rendered incorrectly.
## June 21, 2023
- In the MotherDuck web UI, the interactive query results panel now supports all DuckDB data types.
- Easier signup flow for new users.
- Performance of loading data into MotherDuck has been improved.
- Added support for `CREATE [OR REPLACE | IF NOT EXISTS] DATABASE` and `CREATE DATABASE FROM CURRENT_DATABASE()`.
- A concurrency issue on dropping and recreating shares has been resolved.
- Timeout handling for hybrid queries has been improved.
- The MotherDuck connection parameter `deny_local_access` has been renamed to `saas_mode` and now sets both `enable_external_access=false` and `lock_configuration=true` DuckDB properties. In practice, this means that when connecting to MotherDuck with the `deny_local_access=true` parameter, users will _not_ be able to read/write local files, read/write local DuckDB databases, install/load any extensions or update any configuration. See [documentation](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-saas-mode).
- Numerous other improvements.
## June 15, 2023
- MotherDuck now supports DuckDB [0.8.1](https://github.com/duckdb/duckdb/releases/tag/v0.8.1). Currently, MotherDuck only supports a single version of DuckDB at a time so you must upgrade your DuckDB instances to 0.8.1.
- Performance of loading data into MotherDuck has been drastically improved.
- The database name in the `CREATE DATABASE` SQL command is now an identifier rather than a string literal. Leave the name unquoted or use double quotes; single-quoted names are no longer supported. For example:
- Supported: `CREATE DATABASE ducks;`
- Supported: `CREATE DATABASE "ducks";`
- No longer supported: `CREATE DATABASE 'ducks';`
- You can now create a share using the `CREATE SHARE` statement, in addition to previously supported table function `md_create_database_share()`:
- Supported: `CREATE SHARE myshare FROM ducks;`
- Supported: `CALL md_create_database_share( 'myshare' , 'ducks' );`
- You can now write data to S3 using the `COPY TO` command.
- In the web UI, entering and exiting full screen mode has been simplified. You can also choose to display only the query editor or the query results using the overflow menu.
- In the web UI, you can now work with compound data types from JSON in interactive query results.
- You can now use both lowercase and uppercase versions of the environment variable `motherduck_token` (e.g. `MOTHERDUCK_TOKEN`).
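The S3 export mentioned above can be sketched as (bucket and path are placeholders; assumes AWS credentials have already been configured via a secret):

```sql
-- Export a table to S3 as Parquet; the FORMAT option is explicit here,
-- though DuckDB also infers it from the file extension.
COPY my_table TO 's3://my-bucket/exports/my_table.parquet' (FORMAT PARQUET);
```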
## June 7, 2023
- Views are now supported.
- Query results in the web UI are now interactive. Powered by [Tad](https://www.tadviewer.com/) and DuckDB in WASM, you can now quickly sort, filter and pivot results of a SQL query. Click on column headers to sort, or the pivot icon to open the control surface.

- Query results now include interactive column histograms for numeric columns. The gray background area of the column histogram is a brush that can be dragged to interactively filter results.

- The MotherDuck extension for CLI and Python now auto-updates itself. Users no longer need to run `FORCE INSTALL motherduck` to update their MotherDuck-powered DuckDB instances.
Note: to pick up this change, we ask you to run `FORCE INSTALL motherduck` one last time.
- Various stability and usability improvements.
## May 31, 2023
**Summary**
- SQL queries in the web UI are now automatically saved in local storage in your web browser and restored when you reload the page.
- The MotherDuck extension is now available for Linux on ARM64!
- Support [ON CONFLICT](https://duckdb.org/docs/sql/statements/insert.html#on-conflict-clause) clause.
- New setting `deny_local_access` to lock down filesystem and extension loading (note: does not prevent DuckDB database access).
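As an illustration of the new `ON CONFLICT` support, the following sketch (table and column names are hypothetical) performs an upsert:

```sql
CREATE TABLE ducks (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO ducks VALUES (1, 'Mallard');

-- Insert that falls back to an update when the key already exists
INSERT INTO ducks VALUES (1, 'Wood Duck')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;
```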
## May 24, 2023
**Summary**
- Various stability improvements and bug fixes
## May 22, 2023
**Summary**
- The MotherDuck service is upgraded to DuckDB 0.8.0
- Catalog schemas are now supported.
- Querying `md_databases()` no longer returns snapshots.
- Shares that you create are no longer auto-attached. As the creator, you can attach them via `ATTACH`.
- Various stability improvements and bug fixes
**_Known issues_**
- Some shares appear as "empty" databases. Please report to [support@motherduck.com](mailto:support@motherduck.com) if you spot a sharing issue.
## May 17, 2023
- The DuckDB ICU [extension](https://duckdb.org/docs/extensions/overview.html#all-available-extensions) is now enabled by default. This extension adds support for time zones and collations using the ICU library.
- The web UI now displays your avatar instead of initials in the user menu
- The first database alphabetically is now used for querying by default in web UI. CLI behavior has not changed – if you don't pass a specific database through the connection string, the default database _my_db_ will be used for querying.
NOTE: this will change once we upgrade to the just-released DuckDB 0.8.0
- Output of query EXPLAIN is now more user-friendly
- Various stability improvements and bugfixes
## May 5, 2023
- Fixed a bug in which users were unable to supply the authentication token inline in the connection string, for instance `.open md:?token=123123` or `duckdb md:?token=3333`.
- DELETE and UPDATE table operations are now supported.
## May 3, 2023
- Stability of DML and DDL operations has been greatly improved
- Hybrid query execution has now been upgraded to execute many query types more efficiently
- ~~You can now upload your current DuckDB database using the `CREATE DATABASE FROM 'CURRENT_DATABASE'` operation~~ (no longer supported as of October 2025)
- In the web UI you can now find a link to MotherDuck's technical documentation
- In the web UI you can now upload files from your local computer to MotherDuck
- In programmatic interfaces (JDBC, CLI, Python) you can now connect to a specific database using syntax `md:` or `motherduck:`
- MotherDuck now creates a default database called `my_db` for you. This is the database you connect to if you do not specify a database when connecting to MotherDuck
## April 26, 2023
- You can now work with multiple databases, cloud or local, and query across them
- You can now save your S3 credentials in MotherDuck using the MD_CREATE_SECRET operation
- You can now upload DuckDB databases to MotherDuck using the CREATE DATABASE FROM operation
- MotherDuck UI now has improved notebook experience
## April 19, 2023
- Various stability, performance, and UI improvements
## April 12, 2023
- The JSON extension to DuckDB is now pre-installed automatically in the web UI.
- The table viewer component in the Web UI is now a simple table (rather than an interactive pivot table). This should greatly improve time to first render on query results, especially for small queries. We plan to re-enable the pivot table in an upcoming release, once some underlying performance issues are resolved.
- The duck feet are paddling very hard underwater (numerous stability and performance improvements).
## March 30, 2023
- Fixed: [auto_detection of schema of .csv fails in WASM](https://lindie.app/share/92ac65cc6e006bff2fb60417388294965ef2d4c7)
- Fixed: intermittent "Error reading catalog: Cancelling all calls" error
- Numerous stability and performance improvements
## March 22, 2023
- CLI uses the same database by default as the web app (first sorted alphabetically)
- Multiple improvements in the MotherDuck UI
- Numerous stability and performance improvements
- Enabled query EXPLAIN for queries that execute in hybrid mode
## March 8, 2023
- Numerous stability and performance improvements
- Vastly improved performance of loading multiple CSVs in the same command
- Fixed a bug in CLI, in which authentication via browser would fail
## March 1, 2023
> Even more goodies!
- Delivered major improvements to hybrid execution, resulting in better efficiency, stability, and performance
- Fixed a bug in UI, in which dropping and creating a database with the same name displayed incorrect information
- Migrated to DuckDB 0.7.1
- Fixed an error message when running MotherDuck commands in the CLI without running .open
## January 26, 2023
> We're back with more exciting improvements!
- Addressed server timeouts associated with long-running queries. We are still triaging other potential issues with long-running queries, but network-tier issues should be mitigated to a large degree.
- Empty databases now appear in the catalog in UI
- Added an MD_VERSION Pragma function
- Implemented Oauth sign-in flow from native client
- Upgraded MotherDuck-hosted DuckDB to version 0.6.1
- Fixed a number of bugs across the entire service
## December 23, 2022
> Our first release! Duckies first steps 🦆
---
Source: https://motherduck.com/docs/concepts/Storage-lifecycle
---
title: Storage Lifecycle and Management
sidebar_position: 3
description: Understand how MotherDuck manages data storage across different lifecycle stages and how this affects your billing and data management strategies.
---
# Storage Lifecycle and Management
Understanding MotherDuck's storage lifecycle is crucial for optimizing costs and managing data effectively. Unlike traditional databases where deleted data is immediately freed, MotherDuck implements a sophisticated multi-stage storage system that ensures data safety while providing cost transparency. This system is particularly important for organizations that share data, use zero-copy cloning, or need to understand their storage footprint for billing purposes.
## Storage Lifecycle Overview
The following diagram documents MotherDuck's storage lifecycle.
```mermaid
graph LR;
A[Active Bytes]-->|bytes deleted or updated|B[Historical Bytes];
B-->|shares dropped|C[Kept for Cloned Bytes];
C-->|bytes deleted or updated by cloned databases|D[Failsafe Bytes];
D-->|7 day retention|E[Deleted];
```
There are 4 distinct stages of the storage lifecycle:
- **Active bytes**: Actively referenced bytes of the database. These bytes are accessible by directly querying the database.
- **Historical bytes**: Non-active bytes referenced by a share of this database
- **Kept for cloned bytes**: Bytes referenced by other databases (via zero-copy clone) that are no longer referenced by this database as active or historical bytes
- **Failsafe bytes**: Bytes that are no longer referenced by any database or share that are retained for some period of time as system backups
MotherDuck runs a periodic job that reclassifies data to the proper storage lifecycle stage.
Data flows through the storage lifecycle unidirectionally, from left to right.
The following conditions can trigger data to be reclassified to a new stage:
- **Active bytes:** when the data is deleted from the database
- **Historical bytes:** when all shares referencing the data are dropped or updated
- **Kept for cloned bytes:** when the data is deleted from all zero-copy-cloned databases
- **Failsafe bytes:** after the failsafe retention period (7 days)
An organization is billed for the sum of active, historical, kept for cloned, and failsafe bytes across all of their databases.
### How This Affects Your Data Strategy
Understanding the storage lifecycle helps you make informed decisions about:
- **Data deletion strategies**: When you delete data, it doesn't immediately reduce your bill due to the retention stages
- **Sharing considerations**: Shared data remains in historical bytes until shares are updated or dropped
- **Cloning decisions**: Zero-copy clones can keep data in kept for cloned bytes even after deletion from the source
- **Cost optimization**: Different lifecycle stages have different cost implications and management strategies
For more information on data sharing, see [Sharing Data](/key-tasks/sharing-data/sharing-overview). For details on zero-copy cloning, refer to [MotherDuck Architectural Concepts](/concepts/database-concepts/#motherduck-architectural-concepts).
## Storage Management
MotherDuck supports two ways to configure storage retention for native storage-backed databases.
### Standard Databases
| Plan | Failsafe Period - Standard Databases | Minimum (Default) Historical Retention |
|------------|--------------------------------------------------|-----------------------------|
| Business | 7 days | 1 day |
| Lite | 7 days | 1 day |
| Free | 7 days | zero days |
### Transient Databases
For use cases that don't require the default failsafe retention period (7 days), a MotherDuck database can be set as `TRANSIENT` [at database creation](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-database/#database-options) to enforce a 1-day failsafe minimum. This setting can only be defined at database creation and **is not** modifiable.
| Plan | Failsafe Period - Transient Databases | Minimum (Default) Historical Retention |
|------------|---------------------------------------|-----------------------------|
| Business | 1 day | 1 day |
| Lite | 1 day | 1 day |
| Free | 1 day | zero days |
**Transient databases can be helpful for the following datasets:**
* Datasets that are the intermediate output of a job (write once, read once)
* Datasets that can be easily reconstructed from an external data source
## Breaking Down Storage Usage
To better understand your organization's storage bill, start with the [`STORAGE_INFO` view](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/storage_info/) in the [MD_INFORMATION_SCHEMA](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/introduction/). This view returns an overview of the storage footprint by lifecycle stage for the databases in your organization.
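A minimal query against the view might look like the following (this assumes the view is addressed through the `md_information_schema` namespace; see the linked reference for the exact columns):

```sql
-- Per-database storage footprint, broken down by lifecycle stage
SELECT * FROM md_information_schema.storage_info;
```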
If **Active bytes** are higher than expected, consider whether you need all of the data stored in that database. Some common ways to decrease active bytes are to delete the data or optimize sorting and data types.
If **Historical bytes** are higher than expected, consider whether there are outstanding manually updated shares that reference this database in the organization. This footprint will decrease as the shares are updated (UPDATE SHARE) or dropped. You can find all shares that reference some database by using the [OWNED_SHARES](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/owned_shares/) view in the [MD_INFORMATION_SCHEMA](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/introduction/).
If **Kept for cloned bytes** are higher than expected, consider whether there are other databases that were zero-copy cloned from this database that are still referencing deleted data. This footprint will decrease as you delete the cloned data from these other databases.
**Failsafe bytes** result from deleting data. This footprint should drop if this was a one-time deletion of data. If failsafe bytes remain consistently high, it is likely that you are overwriting or updating data too frequently. Common workloads that tend to delete a lot of data (via overwrites or updates) are: create-or-replace tables, truncate-and-insert, updates, and deletes. Avoiding these workload patterns can reduce your failsafe footprint. Transient databases won't have failsafe bytes.
If you have the Admin role, you can view your organization's storage breakdown on the [databases page](https://app.motherduck.com/settings/databases).
If you need help understanding or reducing your storage bill, please reach out to [MotherDuck support](https://motherduck.com/contact-us/support/).
---
Source: https://motherduck.com/docs/concepts/architecture-and-capabilities
---
sidebar_position: 1
title: Architecture and capabilities
---
import Image from '@theme/IdealImage';
import Versions from '@site/src/components/Versions';
MotherDuck is a serverless cloud analytics service with a unique architecture that combines the power and scale of the cloud with the efficiency and convenience of DuckDB.
MotherDuck's key components are:
- The MotherDuck cloud service
- MotherDuck's DuckDB SDK
- Dual Execution
- The MotherDuck web UI

### The MotherDuck cloud service
The MotherDuck cloud service enables you to store structured data, query that data with SQL, and share it with others. A key MotherDuck product principle is ease of use.
**Serverless execution model**—You don't need to configure or spin up instances, clusters, or warehouses. You simply write and submit SQL. MotherDuck takes care of the rest. Under the hood, MotherDuck runs DuckDB and speaks DuckDB's SQL dialect.
**Managed storage**—you can load data into MotherDuck storage to be queried or shared. MotherDuck storage is durable, secure, and automatically optimized for best performance. MotherDuck storage is surfaced to you via the **catalog** and logical primitives such as database, schema, table, and view. In addition, MotherDuck can query data outside of MotherDuck storage—such as data on Amazon S3, via HTTPS endpoints, or on your laptop.
**The service layer**—MotherDuck provides key capabilities like secure identity, authorization, administration, and monitoring.
:::note
MotherDuck is currently available on two AWS regions:
- **US East (N. Virginia):** `us-east-1`, supporting DuckDB versions between and .
- **Europe (Frankfurt):** `eu-central-1`, supporting DuckDB versions between and .
You can choose in which region to create your organization, and organizations can only exist within a single cloud region currently.
We are working on expanding to other regions and cloud providers.
:::
### MotherDuck's DuckDB SDK
If you're using DuckDB in Python or CLI, you can connect to MotherDuck with a single line of code, `ATTACH 'md:';`. After you run this command, your DuckDB instance becomes supercharged by MotherDuck. MotherDuck's Dual Execution is enabled, and your DuckDB instance gets additional capabilities like sharing, secrets storage, better interoperability with S3, and cloud persistence.
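A typical first session from the DuckDB CLI or Python might look like this sketch (it assumes the `motherduck_token` environment variable is already set, so no browser-based login is needed):

```sql
ATTACH 'md:';      -- authenticate and connect to MotherDuck
SHOW DATABASES;    -- cloud databases now appear alongside any local ones
```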
### Dual Execution
When connected together, DuckDB and MotherDuck form a different type of distributed system. The two nodes work in concert so you can query data wherever it lives, in the most efficient way possible. This query execution model, called **Dual Execution** (formerly known as Hybrid Execution), automatically routes the various stages of query execution to the most opportune locations, covering scenarios such as:
- If a SQL query queries data on your laptop, MotherDuck routes the query to your local DuckDB instance
- If a SQL query queries data in MotherDuck or S3, MotherDuck routes that query to MotherDuck
- If a SQL query executes a join between data on your laptop and data in MotherDuck, MotherDuck finds the best way to efficiently join the two
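The third case can be sketched as follows (file and table names are hypothetical): the local CSV is scanned by your local DuckDB, the cloud table is scanned in MotherDuck, and the planner decides where the join itself runs:

```sql
SELECT o.order_id, c.name
FROM 'local_orders.csv' AS o      -- data on your laptop
JOIN my_db.main.customers AS c    -- data in MotherDuck storage
  ON o.customer_id = c.customer_id;
```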
### The MotherDuck web UI
You can use MotherDuck's web UI to analyze and share data and to perform administrative tasks. Currently MotherDuck's UI consists of a lightweight notebook, a SQL IDE, and a data catalog. Uniquely, MotherDuck caches query results in a highly interactive query results panel, enabling you to sort, filter, and even pivot data quickly.
## Summary of capabilities
Currently with MotherDuck you can:
- Use serverless DuckDB in the cloud to store data and execute DuckDB SQL
- Load data into MotherDuck from your personal computer, https, or S3
- Join datasets on your computer with datasets in MotherDuck or in S3
- Copy DuckDB databases between local and MotherDuck locations
- Materialize query results into local or MotherDuck locations, or S3
- Work with data in MotherDuck’s notebook UI, standard DuckDB CLI, or standard DuckDB Python package
- Share databases with your teammates
- Securely save S3 credentials in MotherDuck
Additionally, MotherDuck supports connectivity to third party tools via:
- JDBC
- Go
- sqlalchemy
## Considerations and limitations
MotherDuck does not yet support the full range of SQL of DuckDB. We are continuously working on improving coverage of DuckDB in MotherDuck. If you need specific features enabled, please let us know.
Below is the list of DuckDB features that MotherDuck does not yet support:
- Custom Python / Native user defined functions.
- Server-side attach of postgres, sqlite, etc.
- Custom or community extensions.
---
Source: https://motherduck.com/docs/concepts/concepts
---
title: Concepts
description: Concepts
sidebar_class_name: architecture-icon
---
This section contains a collection of high level views of concepts & features.
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/concepts/database-concepts
---
sidebar_position: 2
title: Database Concepts
sidebar_label: Database Concepts
description: MotherDuck Database Concepts
---
## MotherDuck Architectural Concepts
:::note
MotherDuck is a cloud-native data warehouse, built on top of DuckDB, a fast in-process analytical database. It inherits some features from DuckDB that present opportunities to think differently about data warehousing methods in order to achieve high levels of performance and simplify the experience.
:::
- **Isolated Compute Tenancy**: Each user is allocated their own "Duckling," which is an isolated piece of compute that sits on top of the MotherDuck storage layer. MotherDuck is designed this way to lessen contention between users, which is a common challenge with other data warehouses. Each Duckling has under 100 ms of cold start time, as MotherDuck keeps Ducklings on warm standby.
- **Aggressively Serverless**: Unlike conventional data warehouses, DuckDB automatically parallelizes the work that you send to it. The implication of this is that scheduling multiple queries at-a-time does not meaningfully increase throughput, as DuckDB has already parallelized the workload across all available resources.
- **Database-level security model**: MotherDuck has a simplified access model: users either have access to an entire database, or nothing at all. As a result, users frequently interact with data at the database level. This is unusual compared to other databases, which often treat multiple database files as single concepts from an interactivity perspective.
- **Database Sharing**: MotherDuck separates storage and compute, which means that one user cannot see another user's writes to a database until the share is updated for that user. As such, MotherDuck has its own concept called ["SHARES"](/key-tasks/sharing-data/sharing-overview/) within Organizations, which are zero-copy clones of the main database for read-only use, enabling high scalability of analytics workloads.
- **Dual Execution**: Every MotherDuck client is also a DuckDB engine, so you can efficiently query local data and (JOIN, UNION) with data that's stored in your MotherDuck data warehouse. [The query planner automatically decides](/concepts/architecture-and-capabilities#dual-execution) the best place to execute each part of your query.
---
Source: https://motherduck.com/docs/concepts/duckdb-extensions
---
sidebar_position: 8
title: DuckDB Extensions in MotherDuck
keywords:
- DuckDB extensions
- MotherDuck extensions
- extension support
- server-side extensions
- web UI extensions
- compatibility
---
# DuckDB Extensions in MotherDuck
MotherDuck supports a wide array of DuckDB extensions to enhance your analytics workflows. Support varies depending on whether you are using the DuckDB CLI, the MotherDuck cloud service (server-side), or the MotherDuck Web UI.
## Extension Support
### MotherDuck Web UI
The MotherDuck Web UI supports a subset of extensions optimized for interactive analytics and data exploration directly in your browser. Some extensions can be loaded in the Web UI but are not supported server-side (i.e., they are invoked and run only in the browser).
### MotherDuck Cloud (Server-Side)
MotherDuck's cloud service supports a curated set of extensions for optimized, secure, and scalable query execution. These extensions are available for all queries running against the MotherDuck service.
### DuckDB CLI
When connected to MotherDuck through the local DuckDB CLI, **all** DuckDB extensions are available. These extensions are loaded locally, giving you access to the entire DuckDB ecosystem for development and testing.
## Extension Support Matrix
The following table summarizes the current support for DuckDB extensions across MotherDuck environments as it relates to execution context: extensions supported only server-side use server-side compute exclusively, whereas extensions also supported in the Web UI can use local compute as well.
The environments are **MD Web UI**, located at https://app.motherduck.com, **MD Cloud**, which runs on MotherDuck infrastructure when you connect via `md:`, and **DuckDB UI / CLI** which run on local environments where the DuckDB client is installed.
| Extension | MD UI* | MD Cloud | DuckDB UI / CLI |
|----------------------|--------|----------|-----------------|
| autocomplete | ✅ | ❌ | ✅ |
| avro | ✅ | ✅ | ✅ |
| aws | ❌ | ❌ | ✅ |
| azure | ❌ | ✅ | ✅ |
| delta | ❌ | ✅ | ✅ |
| ducklake | ✅ | ✅ | ✅ |
| encodings | ❌ | ✅ | ✅ |
| excel | ✅ | ❌ | ✅ |
| fts | ✅ | ✅ | ✅ |
| httpfs | ✅ | ✅ | ✅ |
| h3 | ✅ | ✅ | ✅ |
| iceberg | ❌ | ✅ | ✅ |
| icu | ✅ | ✅ | ✅ |
| inet | ✅ | ✅ | ✅ |
| jemalloc | ❌ | ❌ | ✅ |
| json | ✅ | ✅ | ✅ |
| mysql | ❌ | ❌ | ✅ |
| parquet | ✅ | ✅ | ✅ |
| postgres | ❌ | ❌ | ✅ |
| spatial | ✅ | ✅ | ✅ |
| sqlite | ✅ | ❌ | ✅ |
| tpcds | ✅ | ✅ | ✅ |
| tpch | ✅ | ✅ | ✅ |
| ui | ❌ | ❌ | ✅ |
| vss | ✅ | ❌ | ✅ |
| community extensions | ❌ | ❌ | ✅ |
:::note
*Not all features of extensions in the MotherDuck UI (Wasm) are supported.
:::
:::note
For some extensions (such as `h3`), you should load it before loading the `motherduck` extension if you want to use it
on local data without routing the query to MotherDuck.
```sql
-- Install and load the h3 extension before MotherDuck
INSTALL h3 FROM community;
LOAD h3;
LOAD motherduck;
ATTACH 'md:';
```
:::
Extensions listed as supported by DuckDB UI / CLI, such as `aws`, `arrow`, `postgres_scanner`, and `vss`, can also be used through a local DuckDB instance connected to MotherDuck.
## Future Development
MotherDuck's extension support is continuously evolving. The team regularly evaluates and adds support for new extensions based on user demand and technical feasibility. If you need specific extensions enabled, please reach out to the MotherDuck team.
---
Source: https://motherduck.com/docs/concepts/ducklake
---
sidebar_position: 8
title: DuckLake
description: Understanding DuckLake - A high-performance open table format for petabyte-scale analytics
---
import Admonition from '@theme/Admonition';
import Versions from '@site/src/components/Versions';
# DuckLake
::::warning[Preview Feature]
Preview features may be operationally incomplete and may offer limited backward compatibility.
::::
::::info
MotherDuck currently supports DuckDB . In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client version through .
::::
DuckLake is an open table format for large-scale analytics that provides data management capabilities similar to Apache Iceberg and Delta Lake. It organizes data into partitions based on column values like date or region for efficient querying, with actual data files stored on object storage systems. DuckLake innovates by storing metadata in database tables rather than files, enabling faster lookups through database indexes and more efficient partition pruning via SQL queries, while the columnar data itself resides on scalable object storage infrastructure.
MotherDuck provides support for managed DuckLake, enabling you to back MotherDuck databases with a DuckLake catalog and storage for petabyte-scale data workloads.
:::tip
Looking for **code examples?** Check out the [integration guide](/integrations/file-formats/ducklake/) to see how easy it is to start using DuckLake with MotherDuck.
:::
### Key Characteristics
**Database-backed metadata**: DuckLake stores table metadata in a transactional database (PostgreSQL, MySQL) rather than files, providing:
- Faster metadata lookups through database indexes
- Efficient filtering of data by skipping irrelevant partitions using SQL WHERE clauses
- Simplified writes without the performance overhead of manifest file merging
**Multi-table transactions**: Unlike other lake formats that operate on individual tables, DuckLake supports ACID transactions across multiple related tables, better reflecting how organizations think about databases as collections of inter-related tables.
**Simplified architecture**: No additional catalog server required—just a standard transactional database that most organizations already have expertise managing.
## DuckLake vs. Other Lake Formats
### Performance Differences
Table formats like Apache Iceberg and Delta Lake store metadata in file-based structures. Read and write operations must traverse these file-based metadata structures, which can create latency that increases with scale.
**File-based metadata challenges**:
- Sequential file scanning for metadata discovery
- Complex manifest file merging for writes
- Limited query optimization due to metadata access patterns
- Catalog server complexity for coordination
**DuckLake approach**:
- Database indexes provide faster metadata lookups
- Transactional writes reduce manifest merging overhead
- SQL-based partition pruning and query optimization
- Standard database operations for metadata management
### Scale and Capability Comparison
| Capability | DuckLake | Iceberg/Delta Lake |
| ---------- | -------- | ------------------ |
| **Data Scale** | Petabytes | Petabytes |
| **Metadata Storage** | Database tables with indexed access | File-based structures requiring sequential traversal |
| **Metadata Performance** | Database index lookups | Additional catalog required |
| **Write Operations** | Database transactions | Manifest file merging |
| **Multi-table Operations** | Full ACID transactions across tables | Limited cross-table coordination |
| **Infrastructure Requirements** | Standard transactional databases | Separate catalog servers |
| **Schema Evolution** | Coordinated multi-table schema evolution | Individual table-level changes |
## Use Cases and Applications
### When to Choose DuckLake as your Open Table Format
DuckLake is particularly well-suited for:
**Large-scale analytics**: Organizations with petabytes of historical data, high-volume event streams, or analytics requirements that exceed traditional data warehouse storage or processing capabilities.
**Multi-table workloads**: Applications requiring coordinated schema evolution, cross-table constraints, or transactional consistency across related tables.
**Metadata-intensive workloads**: Scenarios where file-based metadata access patterns may impact query performance.
**Reduced infrastructure complexity**: Organizations seeking lake-scale capabilities with fewer separate catalog servers and metadata management components.
### Storage Comparison: MotherDuck Native vs DuckLake Storage
For loading data, MotherDuck and DuckLake perform very similarly.
However, when reading data, MotherDuck native storage format is 2x-10x faster than DuckLake, for both cold & hot runs.
### Migration Considerations
**From data warehouses**: DuckLake provides a scaling option when warehouse storage limits or costs become constraining, while maintaining SQL interfaces and compatibility.
**From other lake formats**: DuckLake may provide performance improvements for metadata-intensive workloads, though migration requires consideration of existing tooling and processes.
**Hybrid architectures**: Organizations can use MotherDuck for traditional data warehouse workloads while graduating specific databases to DuckLake as scale requirements increase.
## Performance Characteristics
### Metadata Operations
DuckLake's database-backed metadata provides different performance characteristics:
- **Partition discovery**: Index-based vs. file scanning
- **Schema evolution**: Transactional vs. eventual consistency
- **Query planning**: Index-based vs. file traversal
- **Concurrent access**: Database locks vs. file coordination
## Data Inlining
DuckLake supports data inlining, an optimization that stores small data changes directly in the metadata catalog rather than creating individual Parquet files. This feature is particularly valuable for high-frequency, small-batch inserts common in streaming and transactional workloads.
For implementation details and examples, see the [DuckLake integration guide](/integrations/file-formats/ducklake/#data-inlining).
## Future Capabilities
MotherDuck continues expanding DuckLake support with planned features including:
**External catalog integration**: Access to customer-managed DuckLake catalogs hosted in cloud databases
**Local storage access**: Direct access to MotherDuck-managed storage from local DuckDB instances for hybrid workloads
**Enhanced Iceberg support**: Continued improvements to Iceberg integration alongside DuckLake development
## Architecture Implications
### Catalog Database Requirements
DuckLake catalogs require a transactional database with:
- ACID transaction support
- Concurrent read/write access
- Standard SQL interface
- Backup and recovery capabilities
Thankfully, this is all supported as part of MotherDuck without adding an additional catalog, although in self-hosted scenarios, an alternative database like Postgres, MySQL, or SQLite can be used.
### Storage Considerations
DuckLake data storage follows similar patterns to other lake formats:
- Columnar file formats (Parquet)
- Partitioned directory structures
- Object storage compatibility
- Compression and encoding optimizations
---
Source: https://motherduck.com/docs/concepts/object-name-resolution
---
sidebar_position: 5
title: Object name resolution
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Object name resolution
## Fully qualified naming convention
Fully qualified names (FQN) in MotherDuck are of the form `<database>.<schema>.<object>`. The fully qualified naming convention allows you to query objects in MotherDuck regardless of context. Queryable objects can be tables and views.
For example:
```sql
SELECT * FROM mydatabase.myschema.mytable;
```
Fully qualified naming convention is useful when you want your SQL to execute reliably across multiple interfaces, by various users, or in programmatic scripts.
## Relative naming convention
For convenience, MotherDuck enables you to omit database or schema when querying objects.
When **database is omitted**, MotherDuck will attempt to resolve the query by using the current database:
```sql
SELECT * FROM myschema.mytable;
```
When **both database and schema are omitted**, MotherDuck will first attempt to find the object in the current schema. Thereafter, it will attempt to find the object in other schemas in the current database. If the object name is ambiguous - for example if multiple tables with the same name exist in the database - MotherDuck will return an error:
```sql
SELECT * FROM mytable;
```
You may also choose to **omit just the schema**. MotherDuck will first search the current schema, and thereafter will search for the object across all other schemas in the specified database:
```sql
SELECT * FROM mydatabase.mytable;
```
---
Source: https://motherduck.com/docs/concepts/pgduckdb
---
sidebar_position: 3
title: pg_duckdb Extension
---
[pg_duckdb](https://github.com/duckdb/pg_duckdb) is an open-source Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres.
Main features include:
- SELECT queries executed by the DuckDB engine can directly read Postgres tables
- Read and Write support for object storage (AWS S3, Cloudflare R2, or Google GCS)
- Read and Write support for data stored in MotherDuck
For more information about functionality and installation, check out the [repository's README](https://github.com/duckdb/pg_duckdb/blob/main/README.md).
## Connect with MotherDuck
To enable this support you first need to [generate an access token][md-access-token] and then add the following line to your `postgresql.conf` file:
```ini
duckdb.motherduck_token = 'your_access_token'
```
NOTE: If you don't want to store the token in your `postgresql.conf` file, you can also store it in the `motherduck_token` environment variable and then explicitly enable MotherDuck support in your `postgresql.conf` file:
```ini
duckdb.motherduck_enabled = true
```
If you installed `pg_duckdb` in a different Postgres database than the default one named `postgres`, then you also need to add the following line to your `postgresql.conf` file:
```ini
duckdb.motherduck_postgres_database = 'your_database_name'
```
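Putting the pieces together, a `postgresql.conf` using the environment-variable token and a non-default database (the database name here is a hypothetical example) might contain:

```ini
# Token is read from the motherduck_token environment variable,
# so MotherDuck support is enabled explicitly
duckdb.motherduck_enabled = true
# Hypothetical name of the Postgres database where pg_duckdb is installed
duckdb.motherduck_postgres_database = 'my_app_db'
```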
After doing this (and possibly restarting Postgres), you can create tables in the MotherDuck database by using the `duckdb` [Table Access Method][tam] like this:
```sql
CREATE TABLE orders(id bigint, item text, price NUMERIC(10, 2)) USING duckdb;
CREATE TABLE users_md_copy USING duckdb AS SELECT * FROM users;
```
[tam]: https://www.postgresql.org/docs/current/tableam.html
Any tables that you already had in MotherDuck are automatically available in Postgres. Since DuckDB and MotherDuck allow accessing multiple databases from a single connection and Postgres does not, we map database+schema in DuckDB to a schema name in Postgres.
This is done in the following way:
1. Each schema in your default MotherDuck database is simply merged with the Postgres schema of the same name.
2. Except for the `main` DuckDB schema in your default database, which is merged with the Postgres `public` schema.
3. Tables in other databases are put into dedicated DuckDB-only schemas. These schemas are of the form `ddb$<db_name>$<schema_name>` (including the literal `$` characters).
4. Except for the `main` schema in those other databases. That schema should be accessed using the shorter name `ddb$<db_name>` instead.
An example of each of these cases is shown below:
```sql
INSERT INTO my_table VALUES (1, 'abc'); -- inserts into my_db.main.my_table
INSERT INTO your_schema.tab1 VALUES (1, 'abc'); -- inserts into my_db.your_schema.tab1
SELECT COUNT(*) FROM ddb$my_shared_db.aggregated_order_data; -- reads from my_shared_db.main.aggregated_order_data
SELECT COUNT(*) FROM ddb$sample_data$hn.hacker_news; -- reads from sample_data.hn.hacker_news
```
[md]: https://motherduck.com/
[md-access-token]: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token
---
Source: https://motherduck.com/docs/concepts/results
---
title: Results
description: Results
sidebar_class_name: cache-icon
---
:::warning
`RESULT`s are a feature in preview.
:::
**RESULT** provides asynchronous query execution with a transparent cache. Create a RESULT to run a SELECT in the
background, then query it like a table while controlling its lifecycle (pause, resume, cancel, drop). You can think of
a result as a view with an attached cache that is used whenever possible to speed up queries.
Results are stored in memory and will only remain visible until your client-side DuckDB session is restarted.
## Core Concepts
### What is a RESULT?
```sql
CREATE RESULT <name> AS <SELECT statement>;
FROM <name>;
```
A RESULT is a named relation in your DuckDB database that:
- Runs the provided `SELECT` in the background (creation is non-blocking)
- Caches rows produced by that statement as it runs
- Provides lifecycle management (pause, resume, cancel, drop)
- Can be queried like a regular table
- Maintains execution state and progress information
### Result States
Results can be in one of three states:
- **BUILDING**: Query is actively running and appending rows to the cache
- **PAUSED**: Query execution is temporarily paused
- **DONE**: Query execution has completed, which can occur for three reasons:
1. Query finished successfully
2. Query was preemptively stopped (e.g., aborted by the user)
3. Query encountered an error
## Interacting with Results
### Creating Results
As soon as you create a RESULT, the provided `SELECT` starts running in the background. You can query the result like a normal
table at any time.
```sql
-- Basic syntax
CREATE RESULT <name> AS <SELECT statement>;
-- With conflict resolution
CREATE RESULT IF NOT EXISTS <name> AS <SELECT statement>;
CREATE OR REPLACE RESULT <name> AS <SELECT statement>;
-- Accessing the result
FROM <name> LIMIT <n>;
```
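A concrete sketch of the pattern (the `orders` table and result name are hypothetical examples):

```sql
-- Start an aggregation in the background
CREATE RESULT top_customers AS
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
  ORDER BY total DESC;
-- Query it immediately, like a table; rows are served from the cache when possible
FROM top_customers LIMIT 10;
```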
### Accessing Results
You can query a result like a table. The relation appears very quickly after creation, but that does not mean the background `SELECT` statement associated with the result has completed running.
```sql
FROM <name> LIMIT <n>;
```
There is **no guarantee** the cache is complete when you query a result. Depending on the state of the `RESULT` and your query, the system
may read from the cache, wait for additional rows, or bypass the cache and re-run the original `SELECT`.
The decision tree below shows how the query `FROM my_result LIMIT 100` behaves when accessing the RESULT `my_result`.
```mermaid
flowchart TD
start(("FROM my_result LIMIT 100")) -->|Completed successfully| cache(((Read from cache)))
start -->|"RESULT is not running (PAUSED, DONE with error)"| enough
start -->|RESULT is BUILDING| enough_building
enough_building{"Has enough data? (cache > 100)"} -->|Yes| cache
enough_building -->|No| access_limit
access_limit{"access limit < 500,000 (100 < 500,000)"} -->|Yes| delay
access_limit -->|No| rerun
delay(Wait for 100 rows to be cached, or result to complete.) --> cache
enough{"Has enough data (cache > 100) OR DONE without error?"} -->|Yes| cache
enough -->|No| rerun(((Re-run query)))
```
### Lifecycle Management
On creation, new results start in the **BUILDING** state. While building, you can **PAUSE**, **RESUME**, **CANCEL**, or **DROP** the result.
Pause suspends execution; resume continues from where it stopped. Cancel stops the job permanently, and it cannot
be resumed. Canceled results can still be queried, but they will not append any new rows to the cache.
When a result is dropped, it is permanently deleted and can no longer be queried. Dropping a result also removes its associated cache.
```mermaid
stateDiagram-v2
[*] --> BUILDING: Result Created
BUILDING --> PAUSED: PAUSE RESULT
PAUSED --> BUILDING: RESUME RESULT
BUILDING --> DONE: SELECT statement completes
BUILDING --> DONE: CANCEL RESULT
PAUSED --> DONE: CANCEL RESULT
note right of BUILDING
Query is actively running
end note
note right of PAUSED
Query execution paused
Can be resumed
end note
note right of DONE
Execution finished because either:
Successful completion
Error occurred
Manual cancellation
end note
note left of DONE
PAUSE/RESUME will error
when in DONE state
end note
```
#### Pause Result
```sql
PAUSE RESULT <name>;
PAUSE RESULT IF EXISTS <name>;
```
#### Resume Result
```sql
RESUME RESULT <name>;
RESUME RESULT IF EXISTS <name>;
```
#### Cancel Result
```sql
CANCEL RESULT <name>;
CANCEL RESULT IF EXISTS <name>;
```
#### Drop Result
```sql
DROP RESULT <name>;
DROP RESULT IF EXISTS <name>;
```
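A typical lifecycle, end to end, might look like the following sketch (the result name is a hypothetical example):

```sql
-- Suspend a long-running build to free compute for other work
PAUSE RESULT my_result;
-- Later, continue from where it stopped
RESUME RESULT my_result;
-- Stop it permanently; the partial cache stays queryable, but no new rows are added
CANCEL RESULT my_result;
-- Delete the result and its cache when no longer needed
DROP RESULT IF EXISTS my_result;
```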
### Introspecting Results
Use `SHOW ALL RESULTS` to list all your results alongside their status and progress. The returned table includes:
1. `name`: The name of the result
2. `error`: Any error message associated with the result (empty if no error occurred)
3. `status`: The current status of the result (BUILDING, PAUSED, DONE)
4. `row_count`: The number of rows currently in the result cache. This grows as the result builds, so it is not
stable even within the same transaction.
```sql
SHOW ALL RESULTS;
--| name | error | status | row_count |
--|-------|---------------------------------------------------------------------|----------|-----------|
--| foo | (empty) | DONE | 100,000 |
--| bar | INTERRUPT Error: The RESULT "bar" has been manually canceled. | DONE | 10,000 |
--| hello | (empty) | PAUSED | 1,000 |
--| world | (empty) | BUILDING | 100 |
```
If you want to order the results, filter them, or limit the output, you can use the `MD_SHOW_RESULTS` table function:
```sql
FROM MD_SHOW_RESULTS() WHERE name = 'foo';
--| name | error | status | row_count |
--|------|---------|--------|-----------|
--| foo | (empty) | DONE | 100,000 |
```
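Since `MD_SHOW_RESULTS` behaves like any table function, ordering and limiting compose in the usual way. A sketch, assuming `status` is returned as text as shown above:

```sql
-- The three largest caches among results that are still unfinished
FROM MD_SHOW_RESULTS()
WHERE status != 'DONE'
ORDER BY row_count DESC
LIMIT 3;
```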
## Best practices
- Use `LIMIT` when you need only a small sample of rows, so that the `RESULT` can serve them quickly from the cache.
- Prefer deterministic `SELECT` statements for predictable caching and reuse.
- Pause or cancel long-running results you do not need immediately and remember to drop them when no longer in use.
## Notes and limitations
- `RESULT` accepts `SELECT` statements only.
- The cache may be partial while the result is building. Queries may wait briefly, use the cache, or re-run the `SELECT`.
- A canceled result cannot be resumed.
- Results are stored in memory and will not persist across client restarts.
## See also
- [Building data applications with MotherDuck](https://motherduck.com/blog/building-data-applications-with-motherduck/)
- [MotherDuck wasm npm package](https://www.npmjs.com/package/@motherduck/wasm-client?activeTab=readme)
- [MotherDuck wasm example repository](https://github.com/motherduckdb/wasm-client)
---
Source: https://motherduck.com/docs/getting-started/customer-facing-analytics
---
sidebar_position: 3
title: Customer-Facing Analytics Overview
sidebar_label: Customer-Facing Analytics
description: Using MotherDuck to ship analytics to customers
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Versions from '@site/src/components/Versions';
Customer-Facing Analytics (CFA) has requirements that traditional data architectures rarely meet. CFA demands sub-second response times, per-customer isolation, and integration with operational applications while serving many concurrent end users.
MotherDuck addresses these needs through two architectural capabilities:
- **[Per-user tenancy model](#1-per-user-tenancy-model)**: Each customer gets their own dedicated DuckDB instance (Duckling), providing full compute isolation (so no resource contention between users), predictable performance, and the ability to scale resources independently based on individual customer needs.
- **[Dual execution](#2-dual-execution-for-zero-latency-exploration)**: Enabled by DuckDB's lightweight architecture, queries can run both in the cloud and directly in the client's browser via WebAssembly, delivering near-instantaneous data exploration and filtering.
This guide explains how MotherDuck's architecture addresses the [core CFA challenges](#the-cfa-challenge) and provides [implementation patterns](#implementation-patterns) you can ship.
## What is Customer-Facing Analytics?
**Customer-Facing Analytics (CFA)** embeds analytics directly into operational applications for external users—customers, partners, or end-users—rather than internal stakeholders. Traditional BI targets internal teams, runs on batch-processed data models, serves a small number of users, and tolerates higher-latency queries.
| Dimension | Traditional BI | Customer-Facing Analytics |
|-----------|---------------|---------------------------|
| **Audience** | Internal (analysts, executives) | External (customers, partners) |
| **Delivery** | BI tools (Tableau, Looker) | Embedded in application |
| **Latency** | Seconds to minutes acceptable | Milliseconds to low seconds required |
| **Scale** | Dozens to hundreds of users | Thousands to millions of users |
| **Isolation** | Shared warehouse | Per-customer isolation needed |
| **Tech Stack** | Python, BI tools | JavaScript, embedded SDKs |
:::info
**What about AI-driven analytics?**
AI-driven analytics enables natural language interactions with data, allowing users to ask conversational questions like "What were our top-selling products last quarter?" and get immediate answers. MotherDuck's per-user tenancy and dual execution make it well-suited for building AI-driven analytics solutions. Learn how to [build analytics agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/).
:::
## The CFA Challenge
Building customer-facing analytics systems presents three core challenges:
### Challenge 1: Technology Stack Mismatch
For many applications, the data sits in a transactional database (OLTP database) like Postgres or MySQL. Engineers building CFA features often run analytical queries directly in a multi-tenant transactional database, which works until it fails at scale. Row-based storage and transactional databases are not designed for efficient analytical querying.

Operational applications often live in JavaScript/TypeScript, but traditional data tools are Python-centric. Operational teams work with OLTP databases built for transactions, while data teams use OLAP systems tuned for analytics but with their own challenges. Analytical workloads spike with user activity, while transactional loads need steady compute.
### Challenge 2: Latency Requirements
Users expect sub-second response times—typical for OLTP systems. Anything slower degrades the application experience. Distributed OLAP systems (BigQuery, Snowflake, Databricks) often have cold starts and coordination overhead that keep them above those targets, even for small datasets.
Teams often add caching layers or refresh pipelines between OLTP and OLAP. That adds complexity, introduces another failure point, and delays data freshness.
### Challenge 3: Multi-Tenancy at Scale
Switching to an analytics engine is the first step. Many legacy OLAP engines were designed for internal analytics and are provisioned as a single instance or cluster for all customer data, leading to downstream complexities:

- **Overprovisioning**: Resources sized for peak load sit idle most of the time
- **Noisy neighbors**: Large customers impact small customers
- **Resource contention**: Concurrency limits affect everyone
- **Unpredictable performance**: Query times vary based on load
- **Security concerns**: All customer data in one shared system
## Why MotherDuck for Customer-Facing Analytics?
MotherDuck's architecture aligns with the requirements of Customer-Facing Analytics. Two architectural advantages set it apart:
### 1. Per-User Tenancy Model
MotherDuck provisions a Duckling (DuckDB instance) for each customer (or even for each customer's users). This per-user tenancy model isolates customer data and delivers consistent DuckDB performance to each user.

**Why single-node beats distributed compute clusters for CFA**
Traditional data warehouses use distributed computing with coordination overhead, data shuffling, and network latency. Even a fast query typically takes a second or more because of this overhead.
DuckDB and MotherDuck use single-node, optimized columnar execution:
- Zero network hops
- Zero coordination overhead
- Optimized vectorized execution
For CFA workloads that query one customer's data at a time, single-node execution is usually faster than distributed, and MotherDuck can reach **subsecond performance**.
#### Scaling Analytics Up and Out
Each customer (and possibly each of their users) has their **own MotherDuck Duckling** (DuckDB instance). One account could run hundreds or thousands of Ducklings at a time, or none. This serverless model underpins MotherDuck's advantage versus other engines.
MotherDuck's **cold start time is ~1 second**, and **per-second billing** (1-second minimum) keeps individual queries cost-efficient.
:::note
While MotherDuck supports provisioning one Duckling per user, start simpler. Begin with a single Duckling and introduce per-user isolation and dedicated [read scaling tokens](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) as your user base grows beyond 100 users or when tighter performance guarantees are needed.
:::

This isolated Duckling approach with vertical scaling delivers:
- **Perfect isolation**: No noisy neighbors
- **Predictable performance**: Dedicated resources per customer
- **Cost-effective**: Pay only for what each customer needs
- **Easy scaling**: Vertically scale individual ducklings as needed
Scale vertically by upgrading (or downgrading) the Duckling size your application uses for each customer, giving more power to higher-priority customers. If you need more compute or higher concurrency, launch [read scaling Ducklings](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for compute-hungry customers.
MotherDuck offers several [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for larger workloads.
For programmatic changes to user settings, refer to our [API docs](/sql-reference/rest-api/motherduck-rest-api/).
### 2. Dual Execution for Zero-Latency Exploration
As you build Customer-Facing Analytics into your product, you need sub-second response times so customers can explore their data quickly. Distributed data warehouses rarely meet that bar.
Because MotherDuck is built on DuckDB, you can connect from any DuckDB client. DuckDB is an in-process database, so it **can run on your server (3-tier) or directly in the client's browser through WebAssembly (1.5-tier)**.
This enables "dual execution": combining local data and compute with cloud data and compute in a single query, giving you flexibility to optimize for performance and cost.
**Traditional approach has multiple network hops:**

**DuckDB-Wasm enables client-side execution:**

Because the same DuckDB SQL engine runs on both MotherDuck Ducklings and on your customers' machines, you can offload data processing to their laptops and provide fast data exploration, filtering, and sorting using SQL. Customers do not need to install anything because DuckDB runs inside the web browser using WebAssembly (Wasm).
You can see this experience in [Column Explorer](/getting-started/interfaces/motherduck-quick-tour/) and [Instant SQL](https://motherduck.com/blog/introducing-instant-sql/) in the MotherDuck UI. Here's a teaser of it in action:

## Implementation Patterns
MotherDuck enables two distinct architectural patterns for customer-facing analytics:
### 3-Tier Architecture
**Best for:** Applications requiring server-side authorization, business logic, or deployments to stateful platforms.
**Typical web application architecture:**
```mermaid
flowchart LR
Frontend["Browser (React Frontend)"]
Backend["Application Server (Express / FastAPI)"]
MotherDuck[("MotherDuck (Cloud Database)")]
Frontend -->|"API Requests"| Backend
Backend -->|"Persistent Connection SQL Queries"| MotherDuck
style Frontend fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style Backend fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style MotherDuck fill:#fff3e0,stroke:#f57c00,stroke-width:2px
```
**Key Benefits:**
- Persistent database connection (connection pooling saves ~200ms per request)
- Fast query performance (~50-100ms)
- Server-side security and authorization
- Works with any DuckDB client (Node.js, Python, Go, Rust, Java)
**Performance optimizations:**
1. Intermediate result tables: pre-aggregate data on MotherDuck for faster queries
2. Prefer one well-structured SQL statement that returns all needed metrics (using SELECT with multiple aggregates, CASE/FILTER, or UNION ALL).
3. For multi-step workflows, wrap statements in a BEGIN … COMMIT transaction to ensure atomicity.
4. For data movement, use bulk operations (COPY, INSERT … SELECT) instead of many row-by-row calls.
5. Application Caching: Cache rarely-changing data on your server to avoid any extra queries on MotherDuck
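Optimization 2 above can be sketched as a single statement that returns several dashboard metrics in one round trip. The `orders` table, columns, and named parameter are hypothetical examples:

```sql
-- One query, many metrics: conditional aggregates via FILTER
SELECT
  COUNT(*)                                       AS total_orders,
  COUNT(*) FILTER (WHERE status = 'shipped')     AS shipped_orders,
  SUM(amount)                                    AS revenue,
  AVG(amount)                                    AS avg_order_value
FROM orders
WHERE customer_id = $customer_id;  -- bound per customer by the app server
```

Returning all metrics at once avoids paying connection and query overhead several times per dashboard load.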
**When to use:**
- You need server-side authorization and business logic
- You want a traditional, battle-tested architecture
- You're deploying to stateful services (Cloud Run, ECS, Kubernetes)
- Your team works with multiple languages
#### Want to get started? Jump to the **[Builder's Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/)**
### 1.5-Tier Architecture (DuckDB-Wasm)
**Best for:** Read-heavy dashboards with `<1GB` data per user where you need maximum performance.
**Architecture:**
```mermaid
flowchart LR
Browser["Browser (React + MotherDuck Wasm SDK)"]
MotherDuck[("MotherDuck (Cloud Database)")]
Browser -->|"Initial data fetch Query execution"| MotherDuck
style Browser fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style MotherDuck fill:#fff3e0,stroke:#f57c00,stroke-width:2px
```
**Key Benefits:**
- Sub-10ms query latency (queries run locally in browser)
- Near-zero server costs (just data transfer)
- Offline support after initial data load
- Infinite scalability (users provide compute)
**Performance optimizations:**
1. **Optimize Initial Load**: Use Parquet compression, limit to `<50MB`
2. **IndexedDB Persistence**: Data survives page reloads
3. **Incremental Sync**: Only fetch new data since last sync
**When to use:**
- Read-heavy dashboards with frequent filtering/drilling
- Want `<10ms` query latency
- Data per user is `<1GB`
- Want to minimize server costs
:::info
WebAssembly applications using multi-threading (including DuckDB-Wasm) require cross-origin isolation. This means your page must be served with specific headers (`Cross-Origin-Embedder-Policy: require-corp` and `Cross-Origin-Opener-Policy: same-origin`), and resources from different origins must include a `Cross-Origin-Resource-Policy: cross-origin` header.
If you're building a new application, a dedicated page is easier to manage within these constraints. If you have existing dependencies (iframes, third-party scripts, etc.) and need to integrate analytics into an existing page, the 3-tier architecture is recommended.
:::
#### Hands-on Example
See our [1.5-tier architecture example](https://github.com/motherduckdb/wasm-client/tree/main/examples/nypd-complaints) demonstrating best practices for building a 1.5-tier analytics application using TypeScript, React and the MotherDuck Wasm SDK.
### 3-Tier vs 1.5-Tier
| Factor | 3-Tier | 1.5-Tier (DuckDB-Wasm) |
|--------|--------|------------------------|
| **Query latency** | ~50-100ms | ~5-20ms ⚡ |
| **Server cost** | $$ (per request) | $ (data transfer only) |
| **Scalability** | High (auto-scaling) | ♾️ Unlimited |
| **Data per user** | Any size | `<1GB` optimal |
| **Offline support** | ❌ No | ✅ Yes |
| **Server-side logic** | ✅ Yes | ❌ Limited |
| **Best for** | Complex logic, auth | Read-heavy dashboards |
### Next Steps
1. **Sign up for MotherDuck:** [motherduck.com](https://motherduck.com)
2. Follow the hands-on [Builder's Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/).
### Additional Resources
- [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/)
- [Read Scaling Ducklings](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- [Duckling Sizes](/about-motherduck/billing/duckling-sizes/)
---
Source: https://motherduck.com/docs/getting-started/data-warehouse
---
sidebar_position: 2
title: Data Warehousing Overview
sidebar_label: Data Warehousing
description: Learn to use MotherDuck as a Data Warehouse
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Versions from '@site/src/components/Versions';
## Introduction to MotherDuck for Data Warehousing
MotherDuck is a cloud-native data warehouse built on top of [DuckDB](https://duckdb.org/docs/sql/introduction) that adds enterprise features like cloud storage, sharing, and collaboration to DuckDB's fast analytical engine. The platform serves these needs through its serverless architecture, sharing model, and WASM capabilities. It benefits data analysts with AI-assisted SQL, data engineers with familiar tools like dbt, and data scientists with hybrid local-cloud processing.

MotherDuck integrates with popular data tools including [Estuary](https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/), [Fivetran](https://fivetran.com/docs/destinations/motherduck#motherduck), and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) for data ingestion, [dbt](/docs/integrations/transformation/dbt) for transformations, [Tableau](/integrations/bi-tools/tableau/) and [PowerBI](/integrations/bi-tools/powerbi/) for visualization, and [Airflow](https://airflow.apache.org/docs/) and [Dagster](https://docs.dagster.io/examples/bluesky) for orchestration. This enables teams to build data warehousing solutions using their existing tools.
## Data Ingestion
An easy way to get into MotherDuck is using [ecosystem partners](/integrations/ingestion/) like [Estuary](https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/), [Fivetran](https://fivetran.com/docs/destinations/motherduck), [dlthub](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck), and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) but you can also create custom data engineering pipelines.
MotherDuck is very flexible with how to load your data:
- **From data you have on your filesystem:** If you have CSVs, JSON files, or DuckDB databases sitting around, it's easy to load them into your MotherDuck data warehouse.
- **From a data lake on a cloud object store:** If you already have your data in a data lake, as Parquet, Delta, Iceberg, or other formats, DuckDB has abstractions for Secrets, Object Storage, and many file types. Combined, these mean that many file types can be read into DuckDB from Object Storage with only SQL. Though not as performant as MotherDuck's native storage layer, you can also query your infrequently-accessed data directly from your data lake with MotherDuck.
- **Using Native APIs in many languages:** DuckDB supports numerous languages such as C++, Python, and Java, in addition to its own mostly Postgres-compatible SQL dialect. Using these languages, Data Engineers and Developers can easily integrate with MotherDuck without having to pick up yet-another-language.
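For the filesystem case above, a minimal load can be a single `CREATE TABLE ... AS SELECT`. The file, database, and table names here are hypothetical examples:

```sql
-- Load a local CSV into a MotherDuck table, letting DuckDB infer the schema
CREATE TABLE my_db.main.events AS
  SELECT * FROM read_csv_auto('events.csv');
```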
### Best Practices for Programmatic Loading
The fastest way to load data is to load single tables in large batches, saturating the network connection between MotherDuck and the source data. DuckDB is incredibly good at handling both files and some kinds of in-memory objects, like Arrow dataframes. As an aside, Parquet files compress at 5-10x compared to CSV, which means you can get 5-10x more throughput simply by using Parquet files. Similarly, open table formats like Delta & Iceberg share those performance gains.
On the other hand, small writes on multiple tables will lead to suboptimal performance. While MotherDuck does indeed offer [ACID compliance](https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid.html), it is not an OLTP system like Postgres! Significantly better performance can be achieved by using queues to batch writes to tables. While some latency is introduced with this methodology, the improvement in throughput should far outweigh the cost of doing small writes.
Streaming workloads are best handled by placing queues in front of MotherDuck.
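The batching advice above amounts to preferring one large set-based statement per table over many row-by-row inserts. A sketch, with a hypothetical bucket and table:

```sql
-- One bulk load saturates the network and the columnar engine;
-- thousands of single-row INSERTs would not
INSERT INTO orders
  SELECT * FROM read_parquet('s3://my-bucket/orders/*.parquet');
```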
## Transforming Data
Once data is loaded into MotherDuck, it must be transformed into a model that matches the business purpose and needs. This can be done directly in MotherDuck using the powerful library of SQL functions offered by [DuckDB](https://duckdb.org/docs/sql/introduction.html). Many data engineers prefer to use data transformation tools like the open source [dbt Core](https://github.com/dbt-labs/dbt-core). More details specifically about using dbt with MotherDuck can be read in the [blog on this topic](https://motherduck.com/blog/duckdb-dbt-e2e-data-engineering-project-part-2/).
For more in-depth reading, the free **[DuckDB in Action eBook](https://motherduck.com/duckdb-book-brief/)** explores these concepts with real-world examples.
## Sharing Data
Once your data is loaded into MotherDuck and appropriately transformed for use by your analysts, you can make that data available using MotherDuck's [sharing capabilities](/key-tasks/sharing-data/sharing-overview/). This can allow every user in your organization to access the data warehouse in the MotherDuck UI, in their Python code or with other tools. Admins don't need to worry that the queries run by users will impact their data pipelines as users have isolated compute.
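At the SQL level, publishing a database for your organization can be as simple as creating a share. The database and share names below are hypothetical examples:

```sql
-- Publish a database as a share
CREATE SHARE sales_share FROM sales_db;
-- Teammates attach it using the share URL returned by CREATE SHARE:
-- ATTACH '<share URL>' AS sales;
```

See the [sharing documentation](/key-tasks/sharing-data/sharing-overview/) for access scopes and update behavior.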
## Serving Data Analytics
Do you want to serve reports or dashboards for your users? MotherDuck provides tokens that can be used with [popular tools](/integrations/bi-tools/) like Tableau & Power BI to access your data warehouse to serve business intelligence to end users.
### Ducks all the Way Down: Building Data Apps
MotherDuck is built on DuckDB because it is an extremely efficient SQL engine inside a ~20MB executable. This allows you to run the same DuckDB engine which powers your data warehouse inside your web browser, creating highly-interactive visualizations with near-zero latency. This enhances your experience when using the [Column Explorer](/getting-started/interfaces/motherduck-quick-tour/#diving-into-your-data-with-column-explorer) in the MotherDuck UI.
One thing that is unique to MotherDuck is its capabilities for serving data into the web layer via [WASM](/sql-reference/wasm-client). These capabilities enable novel analytical user actions, including very intensive queries that would be prohibitively expensive in other query engines. It also supports data mashup from various sources, so that data in the warehouse can easily be combined with other sources, like files in CSV, JSON, or Parquet.
## Scaling up & out for DWH use cases
MotherDuck has a unique scaling model with four key concepts relevant to Data Warehousing.
### Vertical Scaling
Compute can scale up with larger DuckDB compute instances called Ducklings. Currently, we offer 5 sizes: [Pulse, Standard, Jumbo, Mega, and Giga](/about-motherduck/billing/duckling-sizes/).
Unlike other data warehouses, every Duckling (compute instance) is isolated from the others: one user's queries will not prevent another user's from completing. This per-user tenancy concept ensures you can size your warehouse correctly and use your resources very efficiently.
### Horizontal Scaling
For serving data to BI tools or other spiky consumers, [Read-Scaling Replicas](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) can absorb the loads and maintain low latency on user interactivity. These should be owned by the same user or service accounts that run production jobs, although they can also leverage [`SHARES`](/key-tasks/sharing-data/sharing-overview/) depending on preferences.
### Per-user tenancy
Especially for production runs, users should leverage separate user or [service accounts](https://motherduck.com/docs/key-tasks/service-accounts-guide/) with their own dedicated compute for updating and maintaining core tables.
### Distributed DuckDB
DuckDB and MotherDuck work together as a distributed system that automatically optimizes query execution between local and cloud resources through Dual Execution, enabling efficient data access regardless of location.
## Orchestration
To keep data up to date inside MotherDuck, an orchestrator like [Airflow](https://airflow.apache.org/) or [Dagster](https://dagster.io/) is often used. It runs jobs in a specific order to load and transform data, as well as managing workflows and observability, which is necessary for more complex data engineering pipelines.
If this is your first data warehouse, you might consider starting with something as simple as [GitHub actions](https://github.com/features/actions) or cron jobs to orchestrate your data pipelines.
:::info
For a more in-depth guide, check out the [Data Warehousing Guide](/key-tasks/data-warehousing/)
:::
## Need Help Along the Way?
Please do not hesitate to **[contact us](https://motherduck.com/customer-support/)** if you need help along your journey. We are here to help you succeed with your data warehouse!
---
Source: https://motherduck.com/docs/getting-started/e2e-tutorial/e2e-tutorial
---
title: "MotherDuck Tutorial"
sidebar_label: "MotherDuck Tutorial"
description: "Complete end-to-end tutorial to get started with MotherDuck and DuckDB"
---
# MotherDuck Tutorial
This comprehensive guide will take you from your first query to sharing databases with your team.
## What You'll Learn
This tutorial is in 3 parts. You'll discover how to:
- 🔍 **[1. Query shared data](./part-1)** - Run your first SQL queries on publicly available datasets
- 📊 **[2. Load your own data](./part-2)** - Upload and work with your own data from files and datasets
- 🤝 **[3. Share databases](./part-3)** - Collaborate by sharing databases with team members
:::tip
Each part of this tutorial builds on the previous one, but you can also jump to specific sections if you're looking to learn particular features.
:::
## Prerequisites
To follow this tutorial, you'll need:
- A **MotherDuck account** ([sign up for free](https://app.motherduck.com))
- Basic **SQL knowledge** (we'll guide you through the queries)
- You have several ways to run the queries:
* Execute them directly on this documentation website 🪄
* Use the [MotherDuck UI](https://app.motherduck.com) for the full interface experience
* Connect with any [DuckDB client](../interfaces/) (Python, Java, DuckDB CLI) of your choice
**⏱️ Estimated time:** 20-30 minutes for the complete tutorial
Let's get started! 🚀
---
Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-1
---
sidebar_position: 1
title: "1 - Running Your First Query"
sidebar_label: "1 - First Query"
description: "Learn MotherDuck and DuckDB by running your first queries on shared data"
---
import MotherDuckSQLEditor from '@site/src/components/MotherDuckSQLEditor';
import Versions from '@site/src/components/Versions';
In this multi-part tutorial, you will go through a full end-to-end example on how to use MotherDuck and DuckDB, **push** and **share** data, take advantage of **hybrid query** execution and query data using SQL through the **MotherDuck UI** or **DuckDB CLI**.
:::note
MotherDuck currently supports DuckDB . In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client version through .
:::
## Running your first query
### Query from a shared database
Before loading your own data, let's run a couple of simple queries on the shared sample database. This database contains a series of MotherDuck's public datasets and is *auto-attached* for each user, meaning it's accessible directly within your MotherDuck session without any additional setup.
We will query the NYC 311 dataset first. This dataset contains over thirty million complaints citizens have filed with the New York City government. We'll select several columns and look at the complaints filed over a few days to demonstrate the [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/) feature of the MotherDuck UI.
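A query along these lines works well; the table name `sample_data.nyc.service_requests` and the column names follow the NYC 311 dataset as published in the sample database, but verify them with `DESCRIBE` if your results differ:

```sql
SELECT created_date, agency_name, complaint_type, descriptor, incident_address
FROM sample_data.nyc.service_requests
WHERE created_date BETWEEN '2022-03-01' AND '2022-03-03';
```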
Want to explore the full interface? Try running this query in the [MotherDuck UI](https://app.motherduck.com/) to experience the complete dashboard, visual query builder, and advanced analytics features.
:::info
In the MotherDuck UI, the Column Explorer provides quick visual summaries of your data, helping you understand distributions and patterns at a glance.

:::
For the remainder of this tutorial, we'll focus on the NYC taxi data and perform aggregation queries representative of the types of queries often performed in analytics databases. We will first get the average fare based on the number of passengers. The source dataset covers data for the whole month of November 2022.
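The aggregation looks like this (the table name `sample_data.nyc.taxi` is how the taxi dataset is exposed in the sample database; adjust if yours differs):

```sql
SELECT passenger_count, avg(total_amount) AS average_fare
FROM sample_data.nyc.taxi
GROUP BY passenger_count
ORDER BY passenger_count;
```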
:::info
The `sample_data` database is auto-attached but for any other shared database you would like to read, you would need to use the `ATTACH` statement. Read more about querying a shared MotherDuck database **[here](/key-tasks/sharing-data/sharing-data.mdx).**
:::
:::tip
**Using a DuckDB client?** You can run these same queries in any DuckDB client after connecting with `ATTACH 'md:';` - you'll be prompted to authenticate if no `motherduck_token` is found as an environment variable.
:::
### Query from S3
Our shared sample database is great to play with but you probably want to use your own data on AWS S3. Let's see how to do that.
The sample database source data is actually available on our public AWS S3 bucket. Let's run the exact same query but instead of pointing to a MotherDuck table, we will point to a parquet file on S3.
For a secured bucket, we need to pass the AWS credentials - check [authenticating to S3](../../integrations/cloud-storage/amazon-s3.mdx) for more information.
Here's the updated query while reading from S3:
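A representative version of that query follows; the bucket is our public one, but the exact Parquet path shown is an assumption, so substitute the real object key:

```sql
SELECT passenger_count, avg(total_amount) AS average_fare
FROM 's3://us-prd-motherduck-open-datasets/nyc_taxi/yellow_tripdata_2022-11.parquet'  -- path is illustrative
GROUP BY passenger_count
ORDER BY passenger_count;
```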
:::info
DuckDB automatically detects the appropriate reader based on file extension, so there’s no need to explicitly specify a function. However, if you need more control over how files are read, you can use the corresponding functions directly:
```sql
SELECT * FROM read_parquet('my_data.parquet');
SELECT * FROM read_csv_auto('my_data.csv');
SELECT * FROM read_json_auto('my_data.json');
```
These functions allow you to customize parsing behavior or override automatic detection when needed.
:::
## Next Steps
Great! You've successfully run your first queries on MotherDuck. You've learned how to:
✅ Query shared databases like `sample_data`
✅ Read data directly from S3
👉 **[Continue to Part 2: Loading Your Dataset →](../part-2)**
---
Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-2
---
sidebar_position: 2
title: "2 - Loading Your Data"
sidebar_label: "2 - Loading Data"
description: "Learn how to load your own datasets into MotherDuck"
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import MotherDuckSQLEditor from '@site/src/components/MotherDuckSQLEditor';
In this section, you'll learn how to load your own data into MotherDuck and run powerful hybrid queries that combine local and cloud data.
👈 **[Go back to Part 1: Running Your First Query](../part-1)**
## Loading your data
### Loading Data using CREATE TABLE AS SELECT
The `CREATE TABLE AS SELECT` (CTAS) pattern creates a new table and populates it with data in a single operation:
```sql
CREATE OR REPLACE TABLE docs_playground.my_table AS SELECT * FROM 'my_data.csv';
```
### Loading Data using INSERT INTO
The `INSERT INTO` pattern allows you to append data to existing tables, update specific records, and manage data incrementally:
```sql
-- First, create the table structure
CREATE TABLE docs_playground.my_table AS SELECT * FROM 'my_data.csv' LIMIT 0;
-- Then load data incrementally
INSERT INTO docs_playground.my_table SELECT * FROM 'new_data.csv';
INSERT OR REPLACE INTO docs_playground.my_table SELECT * FROM 'updated_data.csv';
```
:::tip
While `CREATE TABLE AS SELECT` is convenient for one-time loads or small datasets, for larger datasets and production workflows, we recommend using `INSERT INTO`. This approach provides better control over data loading, allows for incremental updates, and is more efficient for ongoing data management.
:::
There are several ways to get your data into MotherDuck, depending on where your data currently lives:
### From Local File System
To load data files from your file system into MotherDuck, you'll need:
1. A valid MotherDuck token stored as the `motherduck_token` environment variable
2. A DuckDB client (DuckDB CLI, Python, etc.)
To create a MotherDuck token, navigate to the MotherDuck UI, click your organization name in the top left, then go to **Settings > Integrations > Access Token**. For detailed instructions, see our [authentication guide](../../key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).
Install the DuckDB CLI for macOS/Linux. For other operating systems, see the [DuckDB installation guide](https://duckdb.org/docs/installation/).
```bash
curl -s https://install.motherduck.com | sh
```
Launch the DuckDB CLI:
```bash
duckdb
```
```sql
-- Connect to MotherDuck
ATTACH 'md:';
-- Load CSV data from your local file into the playground database
CREATE TABLE docs_playground.popular_currency_rate_dollar AS
SELECT * FROM './popular_currency_rate_dollar.csv';
```
Install DuckDB using your preferred package manager, such as pip:
```bash
pip install duckdb
```
```python
import duckdb
# Connect to MotherDuck
conn = duckdb.connect('md:')
# Load data into the playground database (automatically created)
conn.execute("""
CREATE TABLE docs_playground.popular_currency_rate_dollar AS
SELECT * FROM './popular_currency_rate_dollar.csv'
""")
```
Use the `Create table from file` button in the MotherDuck UI to upload your file directly. This works great for smaller files and provides a visual interface.


### From Remote Storage (S3, GCS, etc.)
For data already stored in cloud storage, you have multiple options:
You can run queries directly against remote storage using our interactive SQL editor:
```sql
ATTACH 'md:';
CREATE TABLE docs_playground.popular_currency_rate_dollar AS
SELECT * FROM 's3://us-prd-motherduck-open-datasets/misc/csv/popular_currency_rate_dollar.csv';
```
```python
import duckdb
conn = duckdb.connect('md:')
conn.execute("""
CREATE TABLE docs_playground.popular_currency_rate_dollar AS
SELECT * FROM 's3://your-bucket/your-file.csv'
""")
```
:::info
For private AWS S3 buckets, you'll need to configure AWS credentials. Check our [AWS S3 authentication guide](../../integrations/cloud-storage/amazon-s3.mdx) for details.
:::
### Querying Your Data
Once your data is loaded, you can query it from any interface:
```sql
ATTACH 'md:';
FROM docs_playground.popular_currency_rate_dollar LIMIT 10;
```
```python
import duckdb
# Connect to MotherDuck
conn = duckdb.connect('md:')
# Query your data
result = conn.sql("FROM docs_playground.popular_currency_rate_dollar LIMIT 10").fetchall()
print(result)
```
👉 **[Continue to Part 3: Sharing Your Database →](../part-3)**
---
Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-3
---
sidebar_position: 3
title: "3 - Sharing Your Database"
sidebar_label: "3 - Sharing Data"
description: "Learn how to share your databases and collaborate with your team"
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import MotherDuckSQLEditor from '@site/src/components/MotherDuckSQLEditor';
In this section, you'll learn how to share your databases with colleagues and collaborate effectively using MotherDuck's sharing features.
👈 **[Go back to Part 2: Loading Your Dataset](../part-2)**
## Creating and Sharing Your Data
Let's create a table with sample data in your playground database, then share it with others. The `docs_playground` database is automatically created when you connect, so you can start experimenting right away!
First, let's populate your playground database with some currency exchange data:
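For example, a small table matching the `currency_rates` table queried later in this tutorial (the exchange-rate values are made up for illustration):

```sql
CREATE OR REPLACE TABLE docs_playground.currency_rates AS
SELECT * FROM (VALUES
    ('EUR', 0.92),
    ('GBP', 0.79),
    ('JPY', 149.50),
    ('CAD', 1.36)
) AS t(currency_code, rate_to_usd);
```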
## Sharing your database
With your database and sample data in place, you can now share this dataset with others. MotherDuck shares create a point-in-time snapshot of your database that can be accessed by specified users or groups.
When creating a share, the most important parameters control **access scope**, **visibility**, and **update behavior**. By default, shares use `ACCESS ORGANIZATION` (only your organization members can access), `VISIBILITY DISCOVERABLE` (appears in your organization's shared database list), and `UPDATE MANUAL` (creates a static snapshot that doesn't auto-update).
The syntax to create a share visible to everyone in your Organization is `CREATE SHARE <share_name> FROM <database_name>;`.
You can also create shares through the MotherDuck UI by clicking the dropdown menu next to your database and selecting the share option. This will open a window to configure your share settings.


Once created, all members of your organization will be able to view this share in the MotherDuck UI under "Shared with me".
Learn more about sharing in MotherDuck [here](../../key-tasks/sharing-data/sharing-within-org.md).
## Understanding Share Configuration
When creating shares, you can control three key aspects: **who can access** the data, **how users discover** the share, and **when the data updates**. Each parameter has specific options that determine the sharing behavior.
### ACCESS - Who Can Access the Share
- **`ACCESS ORGANIZATION`** (default): Only members of your organization can access the share
- **`ACCESS UNRESTRICTED`**: All MotherDuck users in the same cloud region as your Organization can access the share
- **`ACCESS RESTRICTED`**: Only the share owner has initial access; additional users must be granted access via `GRANT` commands
### VISIBILITY - How Users Discover the Share
- **`VISIBILITY DISCOVERABLE`** (default): The share appears in your organization's "Shared with me" section for easy discovery
- **`VISIBILITY HIDDEN`**: Share can only be accessed via direct URL; not listed in any user interface
:::info Important Visibility Rules
- Organization and Restricted shares default to `DISCOVERABLE`
- Unrestricted shares can only be `HIDDEN`
- Hidden shares can only be used with `ACCESS RESTRICTED`
:::
### UPDATE - When Share Data Updates
- **`UPDATE MANUAL`** (default): Share content only updates when you run the `UPDATE SHARE` command
- **`UPDATE AUTOMATIC`**: Share automatically reflects database changes within ~5 minutes
### Example Share Configurations
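Some representative combinations follow. The share and database names are placeholders, and the parenthesized option-list form is a sketch; check the SQL reference for the exact syntax:

```sql
-- defaults: organization-wide access, discoverable, manual snapshot
CREATE SHARE currency_share FROM docs_playground;

-- restricted share that auto-updates; grant access explicitly afterwards
CREATE SHARE team_share FROM docs_playground
    (ACCESS RESTRICTED, VISIBILITY HIDDEN, UPDATE AUTOMATIC);

-- unrestricted shares must be hidden and are reached by direct URL
CREATE SHARE public_share FROM docs_playground
    (ACCESS UNRESTRICTED, VISIBILITY HIDDEN);
```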
## Querying Shared Data
After creating a share, authorized users can access the shared database in two ways: by using the share URL directly or by attaching it as a database alias:
```sql
-- Attach a shared database
ATTACH 'md:_share/docs_playground/b556630d-74f1-435c-9459-cfb87d349cb3' AS shared_currency;
-- Query the shared data
SELECT * FROM shared_currency.currency_rates
WHERE rate_to_usd < 1.0
ORDER BY rate_to_usd DESC;
```
## Managing Shares
You can also manage your existing shares:
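For instance (the share name is a placeholder; see the sharing reference for the full command set):

```sql
-- refresh a manual share to the database's current state
UPDATE SHARE currency_share;

-- stop sharing entirely
DROP SHARE currency_share;
```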
## Going further
Now that you've mastered the basics, here are some next steps to explore:
- Learn about [MotherDuck's Dual Execution](/key-tasks/running-hybrid-queries/) feature
- Connect to your favorite BI tools: [Tableau](../../integrations/bi-tools/tableau.md), [Power BI](../../integrations/bi-tools/powerbi.md) and learn more about [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- Set up data pipelines with [dbt](../../integrations/transformation/dbt.md)
- Look at our [supported integrations](/integrations) to integrate with your data stack.
---
Source: https://motherduck.com/docs/getting-started/getting-started
---
title: MotherDuck Docs
sidebar_class_name: getting-started-icon
description: Getting started with MotherDuck serverless cloud data warehouse.
---
import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';
import IconGrid from '@site/src/components/IconGrid';
import HorizontalLayout from '@site/src/components/HorizontalLayout';
import HorizontalDivider from '@site/src/components/HorizontalDivider';
import useBaseUrl from '@docusaurus/useBaseUrl';
import Admonition from '@theme/Admonition';
{/* Desktop version - top right */}
New to MotherDuck? Start here.
{/* Mobile version - below title */}
Watch Product Tour
}
variant="square"
items={[
{
icon: "book.svg",
title: "MotherDuck Tutorial",
description: "Build end-to-end analytics workflows with data loading, transformation, and sharing",
link: "/docs/getting-started/e2e-tutorial/"
},
{
icon: "database.svg",
title: "Data Warehousing Overview",
description: "Build a modern data warehouse with seamless ingestion and transformation",
link: "/docs/getting-started/data-warehouse/"
},
{
icon: "cfa.svg",
title: "Customer-Facing Analytics Overview",
description: "Build powerful analytics applications with MotherDuck Wasm SDK",
link: "/docs/getting-started/customer-facing-analytics/"
},
]}
/>
::::note
MotherDuck currently supports DuckDB .
- In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through .
- In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck is compatible with client versions through .
::::
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/c
---
title: C
Description: MotherDuck + C
sidebar_class_name: hidden
---
The MotherDuck integration with C is no different than DuckDB. For more information, see [C](https://duckdb.org/docs/stable/clients/c/overview.html) in DuckDB Documentation.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/connect-query-from-python/choose-database
---
sidebar_position: 2
title: Specify MotherDuck database
description: Specify MotherDuck database
---
When you connect to MotherDuck you can specify a database name or omit the database name and connect to the default database.
- If you use `md:` without a database name, you connect to a default MotherDuck database called `my_db`.
- If you use `md:<database_name>`, you connect to the specified database.
After you establish the connection, either the default database or the one you specify becomes the current database.
You can run the `USE` command to switch the current database, as shown in the following example.
```python
#list the current database
con.sql("SELECT current_database()").show()
# ('database1')
#switch the current database to database2
con.sql("USE database2")
```
To query a table in the current database, you can specify just the table name. To query a table in a different database, you can include the database name when you specify the table. You don't need to switch the current database. The following examples demonstrate each method.
```python
#querying a table in the current database
con.sql("SELECT count(*) FROM mytable").show()
#querying a table in another database
con.sql("SELECT count(*) FROM another_db.another_table").show()
```
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/connect-query-from-python/installation-authentication
---
sidebar_position: 1
title: "DuckDB Python Client: Installation & authentication"
sidebar_label: Installation & authentication
description: How to install DuckDB and connect to MotherDuck
hide_title: true
---
import Versions, { duckdbVersionRanges } from '@site/src/components/Versions';
# Installation & authentication
## Prerequisites
The MotherDuck Python client supports the following operating systems:
- Linux (x64, glibc v2.31+, equivalent to ubuntu v20.04+)
- Mac OSX 11+ (M1/ARM or x64)
- Python 3.4 or later
Please let us know if your configuration is unsupported.
## Installing DuckDB
:::note
MotherDuck currently supports DuckDB . In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client version through .
:::
Use the following `pip` command to install the supported version of DuckDB:
{`pip install duckdb==${ duckdbVersionRanges["us-east-1"].max }`}
## Connect to MotherDuck
You can connect to and work with multiple local and MotherDuck-hosted DuckDB databases at the same time. Currently, the connection syntax varies depending on how you’re opening local DuckDB and MotherDuck.
### Authenticating to MotherDuck
You can authenticate to MotherDuck using either browser-based authentication or an access token. Here are examples of both methods:
#### Using browser-based authentication
```python
import duckdb
# connect to MotherDuck using 'md:' or 'motherduck:'
con = duckdb.connect('md:')
```
When you run this code:
1. A URL and a code will be displayed in your terminal.
2. Your default web browser will automatically open to the URL.
3. You'll see a confirmation request to approve the connection.
4. Once approved, if you're not already logged in to MotherDuck, you'll be prompted to do so.
5. Finally, you can close the browser tab and return to your Python environment.
This method is convenient for interactive sessions and doesn't require managing access tokens.
#### Using an access token
For automated scripts or environments where browser-based auth isn't suitable, you can use an access token:
```python
import duckdb
# Initiate a MotherDuck connection using an access token
con = duckdb.connect('md:?motherduck_token=<token>')
```
Replace `<token>` with an actual token generated from the MotherDuck UI.
To learn more about creating and managing access tokens, as well as other authentication options, see our guide on [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).
### Connecting to MotherDuck
Once you've authenticated, you can connect to MotherDuck and start working with your data. Let's look at a few common scenarios.
#### Connecting directly to MotherDuck
Here's how to connect to MotherDuck and run a simple query:
```python
import duckdb
# Connect to MotherDuck via browser-based authentication
con = duckdb.connect('md:my_db')
# Run a query to verify the connection
con.sql("SHOW DATABASES").show()
```
:::tip
When connecting to MotherDuck, you need to specify a database name (like `my_db` in the example). If you're a new user, a default database called `my_db` is automatically created when your account is first set up. You can query any table in your connected database by just using its name. To switch databases, use the `USE` command.
:::
#### Working with both MotherDuck and local databases
MotherDuck allows you to work with both cloud and local databases simultaneously. Here's how:
````python
import duckdb
# Connect to MotherDuck first, specifying a database
con = duckdb.connect('md:my_db')
# Then attach local DuckDB databases
con.sql("ATTACH 'local_database1.duckdb'")
con.sql("ATTACH 'local_database2.duckdb'")
# List all connected databases
con.sql("SHOW DATABASES").show()
````
#### Adding MotherDuck to an existing local connection
If you're already working with a local DuckDB database, you can easily add a MotherDuck connection:
````python
import duckdb
# Start with a local DuckDB database
local_con = duckdb.connect('local_database.duckdb')
# Add a MotherDuck connection, specifying a database
local_con.sql("ATTACH 'md:my_db'")
````
This is another approach to give you the flexibility to work with both local and cloud data in the same session.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/connect-query-from-python/loading-data-into-md
---
sidebar_position: 3
title: Loading data into MotherDuck with Python
sidebar_label: Loading data into MotherDuck
---
## Copying a table from a local DuckDB database into MotherDuck
You can currently use `CREATE TABLE AS SELECT` to load CSV, Parquet, and JSON files into MotherDuck from local, Amazon S3, or HTTPS sources, as shown in the following examples.
```python
# load from local machine into table mytable of the current/active used database
con.sql("CREATE TABLE mytable AS SELECT * FROM '~/filepath.csv'");
# load from an S3 bucket into table mytable of the current/active database
con.sql("CREATE TABLE mytable AS SELECT * FROM 's3://bucket/path/*.parquet'")
```
If the source data matches the table's schema exactly, you can also use `INSERT INTO`, as shown in the following example.
```python
# append to table mytable in the currently selected database from S3
con.sql("INSERT INTO mytable SELECT * FROM 's3://bucket/path/*.parquet'")
```
## Copying an entire local DuckDB database to MotherDuck
MotherDuck supports copying your currently opened DuckDB database into a MotherDuck database. The following example copies a local DuckDB database named `localdb` into a MotherDuck-hosted database named `clouddb`.
```python
# open the local db
local_con = duckdb.connect("localdb.ddb")
# connect to MotherDuck
local_con.sql("ATTACH 'md:'")
# The FROM clause indicates the source to copy; CURRENT_DATABASE() refers to the currently open database
local_con.sql("CREATE DATABASE clouddb FROM CURRENT_DATABASE()")
```
A local DuckDB database can also be copied by its file path:
```python
local_con = duckdb.connect("md:")
local_con.sql("CREATE DATABASE clouddb FROM 'localdb.ddb'")
```
See [Loading Data into MotherDuck](/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck.mdx) for more detail.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/connect-query-from-python/query-data
---
sidebar_position: 4
title: Query data
---
For more information about database manipulation, see [MotherDuck SQL reference](/docs/sql-reference/motherduck-sql-reference/).
MotherDuck uses DuckDB under the hood, so nearly all [DuckDB SQL](https://duckdb.org/docs/) works in MotherDuck without differences.
MotherDuck leverages "hybrid execution" to decide the best location to execute queries, including across multiple locations. For example, if your data lives on your laptop, MotherDuck executes queries against that data on your laptop. Similarly, if you are joining data on your laptop to data on Amazon S3, MotherDuck executes each part of the query where data lives before bringing it together to be joined locally.
## Querying data in MotherDuck
You can query data loaded into MotherDuck the same way you query data in your DuckDB databases. MotherDuck executes these queries using resources in the cloud.
```python
# table table_name is in MotherDuck storage
con.sql("SELECT * FROM table_name").show();
```
## Querying data on your machine
You can use MotherDuck to query files on your local machine. These queries execute using your machine's resources.
```python
# query a Parquet file on your local machine
con.sql("SELECT * FROM '~/file.parquet'").show();
# query a table in a local DuckDB database
con.sql("SELECT * FROM local_table").show();
```
## Joining data across multiple locations
You can use MotherDuck to join data:
- In MotherDuck
- On S3 or other cloud object stores (Azure, GCS, R2, etc)
- On your local machine
## What's next?
Ready to share your DuckDB data with your colleagues? Read up on [Sharing In MotherDuck](/key-tasks/sharing-data/sharing-data.mdx).
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/go
---
title: Go
Description: MotherDuck + GoLang
sidebar_class_name: hidden
---
The MotherDuck integration with Go is no different than DuckDB. For more information, see [Go](https://duckdb.org/docs/stable/clients/go.html) in DuckDB Documentation.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/java
---
title: Java (JDBC)
Description: MotherDuck + Java
sidebar_class_name: hidden
---
The MotherDuck integration with Java is no different than DuckDB. For more information, see [Java](https://duckdb.org/docs/stable/clients/java.html) in DuckDB Documentation.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/nodejs
---
title: Node.js
Description: MotherDuck + Node.js
sidebar_class_name: hidden
---
The MotherDuck integration with Node.js uses the `@duckdb/node-api` package. For more information, see [Node.js (Neo)](https://duckdb.org/docs/stable/clients/node_neo/overview.html) in DuckDB Documentation. This package replaces the deprecated `duckdb` npm package.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/odbc
---
title: ODBC
Description: MotherDuck + ODBC
sidebar_class_name: hidden
---
The MotherDuck integration with ODBC is no different than DuckDB. For more information, see [ODBC](https://duckdb.org/docs/stable/clients/odbc/overview.html) in DuckDB Documentation.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other
---
title: Others
Description: MotherDuck + Other Client APIs
---
DuckDB supports various client APIs, such as Java (JDBC), ODBC, and C.
To see all client APIs that work with DuckDB, please look at the [DuckDB Documentation](https://duckdb.org/docs/stable/clients/overview.html).
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/r
---
title: R
Description: MotherDuck + R
sidebar_class_name: hidden
---
The MotherDuck integration with R is no different than DuckDB. For more information, see [R](https://duckdb.org/docs/stable/clients/r.html) in DuckDB Documentation.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/rust
---
title: Rust
Description: MotherDuck + Rust
sidebar_class_name: hidden
---
The MotherDuck integration with Rust is no different than DuckDB. For more information, see [Rust](https://duckdb.org/docs/stable/clients/rust.html) in DuckDB Documentation.
---
Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/wasm
---
title: WebAssembly (Wasm)
Description: MotherDuck + WebAssembly
sidebar_class_name: hidden
---
MotherDuck offers its own fork of DuckDB Wasm, which is [documented here](/sql-reference/wasm-client/).
For more information about DuckDB Wasm, see [WebAssembly](https://duckdb.org/docs/stable/clients/wasm/overview.html) in DuckDB Documentation.
---
Source: https://motherduck.com/docs/getting-started/interfaces/connect-query-from-duckdb-cli
---
sidebar_position: 3
title: "DuckDB CLI: Installation and Connecting to MotherDuck"
sidebar_label: DuckDB CLI
description: Learn to connect and query databases using MotherDuck from the DuckDB CLI
hide_title: true
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import DownloadLink from '@site/src/components/DownloadLink';
import Versions from '@site/src/components/Versions';
# DuckDB CLI
## Installation
:::note
MotherDuck currently supports DuckDB . In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client version through .
:::
Download and install the DuckDB binary, depending on your operating system.
1. Download the 64-bit Windows binary
2. Extract the Zip File.
The best way to install the CLI is to use the MotherDuck install script:
### Install with bash
```bash
curl -s https://install.motherduck.com | sh
```
1. Download the Linux binary:
- For 64-bit, download the binary
- For arm64/aarch64, download the binary
2. Extract the Zip File.
For more information, see the [DuckDB installation documentation](https://duckdb.org/docs/installation/).
## Run the DuckDB CLI
Run DuckDB using the command:
```sh
./duckdb
```
By default, DuckDB will start with an in-memory database and any changes will not be persisted. To create a persistent database in the DuckDB CLI, you can specify a new filename as the first argument to the `duckdb` command.
Example:
```sh
./duckdb mydatabase.ddb
```
## Connect to MotherDuck
You can connect to MotherDuck by executing the following in DuckDB CLI. DuckDB will automatically download and load the signed MotherDuck extension.
```bash
ATTACH 'md:';
```
DuckDB will prompt you to authenticate with MotherDuck using your default web browser. Follow the instructions displayed in the terminal.
Test your MotherDuck connection using the following command. It will run in the cloud to display a list of your MotherDuck databases.
```sql
show databases;
```
Congrats 🎉 You are connected!
Now you can create databases and switch between them. You can also connect to your local DuckDB databases alongside databases hosted in MotherDuck, and interact with both!
To learn more about how to persist your authentication credentials, read [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).
:::info
Note you can also connect to MotherDuck directly when starting DuckDB CLI by running the following command:
```bash
duckdb "md:"
```
:::
## Accessing the MotherDuck UI from the CLI
You can access the MotherDuck UI from the CLI by executing the following command in the terminal:
```bash
duckdb -ui
```
If you are already in a DuckDB session, you can instead use `CALL start_ui();`
## Upgrading MotherDuck via the DuckDB CLI
If you have previously installed the extension and the service has since been upgraded, you may need to run the `FORCE INSTALL` command, as shown in the following example.
```sql
FORCE INSTALL motherduck;
```
---
Source: https://motherduck.com/docs/getting-started/interfaces/interfaces
---
title: MotherDuck Interfaces
description: MotherDuck offers a variety of interfaces (APIs) for integration
---
## Client Interfaces
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour
---
sidebar_position: 3
title: MotherDuck Web UI
description: Learn to use the MotherDuck Web UI to configure and query databases
---
## Login
To log in to MotherDuck UI, please go to [app.motherduck.com](https://app.motherduck.com). You will be redirected to our web UI.
:::info
Note you can also connect to the MotherDuck UI directly when starting DuckDB CLI by running the following command:
```bash
duckdb "md:" -ui
```
:::
## Main Window

## Executing a sample query
After you log in, run the following SQL query:
```sql
SELECT
    country_name,
    city,
    pm25_concentration AS pm25_pollution
FROM sample_data.who.ambient_air_quality
WHERE year = 2019 AND pm25_concentration IS NOT NULL
ORDER BY pm25_pollution ASC;
```
This query accesses the [Sample Data Database](/getting-started/sample-data-queries/datasets) which is [attached](/key-tasks/sharing-data/sharing-data.mdx) by default.
MotherDuck executes this query in the cloud. Query results are returned to your browser in an interactive panel for fast data exploration, with sorting, filtering, and pivoting.

You can also click the Expand button on the top right of each cell to expand the editor and results area.

## Diving into your data with Column Explorer
### Exploring tables or resultsets
The Column Explorer allows you to see stats on either a selected table or the resultset from the selected notebook cell.
### Seeing value frequencies
For each column, you'll see the column type, the most commonly occurring values, and the percentage of values that are NULL.
If the values are numerical, you'll also see a histogram visualization.
### Charting data over time
If you have timestamp data, you'll also see a chart in the Column Explorer with automatic binning over time.
The Column Explorer is collapsible by clicking the toggle on the top right.

### Dig into your results in the Cell Content Pane
Click on a cell in your results to see its full contents.

#### Interact with JSON values
Expand, collapse, and copy content from JSON type columns. You can also copy the keypath to a specific value, or the value itself!

## Writing SQL with confidence using FixIt and Edit
[FixIt](/key-tasks/ai-and-motherduck/ai-features-in-ui#automatically-fix-sql-errors-in-the-motherduck-ui) helps you resolve common SQL errors by offering fixes in-line.

[Edit](/key-tasks/ai-and-motherduck/ai-features-in-ui#automatically-edit-sql-queries-in-the-motherduck-ui) helps you edit SQL queries with natural language prompts.

## Writing queries with Autocomplete
The MotherDuck Web UI supports autocomplete. As you write SQL in the UI, autocomplete offers query syntax suggestions on every keystroke. You can turn off autocomplete in the Web UI settings, found by clicking your profile in the top-left and choosing "Settings", then "Preferences."
## Getting SQL function help with Inline Docs
Inline Docs help users understand DuckDB and MotherDuck SQL functions without leaving the query editor. Hover over any SQL function name in the editor to see details including the function description, parameter types, and return types. Clicking the "Docs" link in the tooltip opens the function's reference documentation.
You can turn off Inline Docs by heading to "Settings" under the dropdown at the top-left, selecting "Preferences" under "My Account", and toggling off "Enable Inline Docs".
## Settings
MotherDuck settings are found by clicking your profile at the top-left. These settings are specific to each MotherDuck user and organization.
### General: Access Tokens
This section allows you to create access tokens, which can be used for programmatically [authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck). Tokens can have expiry dates.
### Organization
This section allows you to change the display name of the organization. You can also enable all users in your email domain to join you in your MotherDuck organization. See ["Managing Organizations"](/key-tasks/managing-organizations) for more information.
### Members
Displays members with access to the organization and allows you to invite new members to join your MotherDuck organization. Members are defined as human users with logins and passwords as well as [service accounts](https://motherduck.com/docs/key-tasks/service-accounts-guide/#overview).
### Preferences: UI settings
* Enable [autocomplete when typing](#writing-queries-with-autocomplete)
* Enable inline [SQL error fix suggestions](/key-tasks/ai-and-motherduck/ai-features-in-ui#automatically-fix-sql-errors-in-the-motherduck-ui)
* Enable [Inline Docs](#getting-sql-function-help-with-inline-docs) for SQL functions
### Secrets
MotherDuck enables you to query cloud blob stores without supplying credentials each time. Currently, credentials are supported for [AWS S3](/integrations/cloud-storage/amazon-s3), [Azure Blob Storage](/integrations/cloud-storage/azure-blob-storage), [Google Cloud Storage (GCS)](/integrations/cloud-storage/google-cloud-storage), Cloudflare R2, and Hugging Face.
### Plans
Shows your current plan (Free, Standard) and allows you to switch plans.
### Billing
Displays your current plan, primary billing email address and estimated invoices and usage during free trial. After the free trial, you can see actual usage and access your invoices.
### Service Accounts (Admin only)
Lists the service accounts in your organization, and lets you create, manage, and impersonate service accounts to test or troubleshoot workflows. Service accounts enable you to run automated workflows and integrations without using a personal user account.
### Ducklings
Use this section to manage the Ducklings associated with your user account.
#### Duckling Size
Set the Read/Write and Read Scaling [Duckling sizes](../../../about-motherduck/billing/duckling-sizes/#duckling-sizes) for your user account.
#### Read Scaling replica pool size
Set the pool size for your user account's read-scaling replicas. [Learn more about Read Scaling](../../../key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/).
#### Version Information
Shows the MotherDuck remote version of DuckDB, the MotherDuck client version, and the MotherDuck UI version in active use.
#### Reset Duckling
This option allows you to reset your account's Read/Write Duckling (compute instance) for troubleshooting purposes. Reach out to [MotherDuck support](../../../troubleshooting/support/) for help with troubleshooting when resetting your Duckling.
## Keyboard shortcuts
MotherDuck supports the following keyboard shortcuts.
Use `Ctrl` for Windows/Linux and `⌘` (Command) for Mac.
Use `Alt` for Windows/Linux and `⌥` (Option) for Mac.
| Command | Action |
|---------|--------|
| `Ctrl`/`⌘` + `k` | Open the command menu. |
| `Ctrl`/`⌘` + `Enter` | Run the current cell. |
| `Ctrl`/`⌘` + `Shift` + `Enter` | Run selected text in the current cell. If no text is selected, run the whole cell. |
| `Shift` + `Enter` or `Alt`/`⌥` + `Enter` | Run the current cell, then advance to the next cell, creating a new one if necessary. |
| `Tab` | When editing a query, indent current line. When navigating the notebook, advance to next UI element/button. |
| `Shift` + `Tab` | When editing a query, de-indent current line. When navigating the notebook, move to previous UI element/button. |
| `Esc` | Change `Tab` key behavior to navigate the UI instead of indent/de-indent editor text. Once another cell is selected, `Tab` behavior reverts to indent/de-indent. |
| `Ctrl`/`⌘` + `Shift` + `e` | Generate an ['Edit'](/key-tasks/ai-and-motherduck/ai-features-in-ui#automatically-edit-sql-queries-in-the-motherduck-ui) for your current cell or selected editor text. |
| `Ctrl`/`⌘` + `/` | Toggle line comments on/off - prepends `--` to the front of each selected line. |
| `Ctrl`/`⌘` + `z` | Undo query edits within currently selected cell. |
| `Ctrl`/`⌘` + `Alt`/`⌥` + `o` | Format SQL in the current cell. When text is selected, only the selection is formatted. |
| `Ctrl`/`⌘` + `Shift` + `z` | Redo query edits within currently selected cell. |
| `Ctrl`/`⌘` + `e` | Toggle between worksheet and notebook view for the active cell. |
| `Ctrl`/`⌘` + `Shift` + `.` | Toggle Instant SQL mode on/off for the active cell. |
| `Ctrl`/`⌘` + `↑` | Move currently selected cell up. |
| `Ctrl`/`⌘` + `↓` | Move currently selected cell down. |
| `Ctrl`/`⌘` + `i` | Toggle the results inspect (right-hand panel) on/off. |
| `Ctrl`/`⌘` + `b` | Toggle the notebook & database browser (left-hand panel) on/off. |
:::note
Press `Esc` to change `Tab` key behavior from indenting/de-indenting text to navigating UI elements. Once another cell is selected, `Tab` behavior reverts to indenting/de-indenting.
:::
---
Source: https://motherduck.com/docs/getting-started/sample-data-queries/air-quality
---
sidebar_position: 3
title: Air Quality
description: Sample data from the WHO Ambient Air Quality Database to use with DuckDB and MotherDuck
---
## About the dataset
The [WHO Ambient Air Quality Database](https://www.who.int/publications/m/item/who-ambient-air-quality-database-(update-2023)) (6th edition, released in **May 2023**) compiles annual mean concentrations of nitrogen dioxide (NO2) and particulate matter (PM10, PM2.5) from ground measurements across over 8600 human settlements in more than 120 countries. This data, updated every 2-3 years since **2011**, primarily represents city or town averages and is used to monitor the Sustainable Development Goal Indicator 11.6.2, Air quality in cities.
Here's the schema:
| column_name | column_type | null | key | default | extra |
|--------------------|-------------|------|-----|---------|-------|
| who_region | VARCHAR | YES | | | |
| iso3 | VARCHAR | YES | | | |
| country_name | VARCHAR | YES | | | |
| city | VARCHAR | YES | | | |
| year | BIGINT | YES | | | |
| version | VARCHAR | YES | | | |
| pm10_concentration | BIGINT | YES | | | |
| pm25_concentration | BIGINT | YES | | | |
| no2_concentration | BIGINT | YES | | | |
| pm10_tempcov | BIGINT | YES | | | |
| pm25_tempcov | BIGINT | YES | | | |
| no2_tempcov | BIGINT | YES | | | |
| type_of_stations | VARCHAR | YES | | | |
| reference | VARCHAR | YES | | | |
| web_link | VARCHAR | YES | | | |
| population | VARCHAR | YES | | | |
| population_source | VARCHAR | YES | | | |
| latitude | FLOAT | YES | | | |
| longitude | FLOAT | YES | | | |
| who_ms | BIGINT | YES | | | |
To read from the `sample_data` database, see [attaching the sample datasets database](./datasets.mdx).
## Example queries
### Annual city air quality rating
This query assesses the average annual air quality in different cities per year based on WHO guidelines. It calculates the average concentrations of PM2.5, PM10, and NO2, then assigns an air quality rating of 'Good', 'Moderate', or 'Poor'. 'Good' indicates all pollutants are within WHO recommended levels, 'Poor' indicates all pollutants exceed WHO recommended levels, and 'Moderate' refers to any other scenario. The results are grouped and ordered by city and year.
```sql
SELECT
    city,
    year,
    CASE
        WHEN AVG(pm25_concentration) <= 10
            AND AVG(pm10_concentration) <= 20
            AND AVG(no2_concentration) <= 40
            THEN 'Good'
        WHEN AVG(pm25_concentration) > 10
            AND AVG(pm10_concentration) > 20
            AND AVG(no2_concentration) > 40
            THEN 'Poor'
        ELSE 'Moderate'
    END AS airqualityrating
FROM sample_data.who.ambient_air_quality
GROUP BY city, year
ORDER BY city, year;
```
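The three-way rating logic in the `CASE` expression above can be sketched in plain Python. The `rate()` helper below is hypothetical (not part of MotherDuck), but uses the same WHO thresholds: 'Good' when all pollutants are within limits, 'Poor' when all exceed them, 'Moderate' otherwise.

```python
def rate(avg_pm25, avg_pm10, avg_no2):
    """Mirror the SQL CASE: 'Good' if all pollutants are within WHO limits,
    'Poor' if all exceed them, 'Moderate' for any mixed result."""
    within = (avg_pm25 <= 10, avg_pm10 <= 20, avg_no2 <= 40)
    if all(within):
        return "Good"
    if not any(within):
        return "Poor"
    return "Moderate"

print(rate(8, 15, 30))   # all within limits -> Good
print(rate(12, 25, 50))  # all exceed limits -> Poor
print(rate(8, 25, 50))   # mixed -> Moderate
```

Note that a city with, say, only PM2.5 over the limit lands in 'Moderate', since 'Poor' requires every pollutant to exceed its threshold.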
### Yearly average pollutant concentrations of a city
This query calculates the yearly average concentrations of PM2.5, PM10, and NO2 in a given city, here `Berlin`.
```sql
SELECT
    year,
    AVG(pm25_concentration) AS avg_pm25,
    AVG(pm10_concentration) AS avg_pm10,
    AVG(no2_concentration) AS avg_no2
FROM sample_data.who.ambient_air_quality
WHERE city = 'Berlin'
GROUP BY year
ORDER BY year DESC;
```
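For intuition, the filter-group-average pattern above can be sketched in plain Python. The `yearly_averages()` helper and the `city`/`year`/`pm25` dict keys are hypothetical stand-ins for the table columns:

```python
from collections import defaultdict

def yearly_averages(rows, city="Berlin"):
    """Filter to one city, then average pm25 per year -- the WHERE/GROUP BY/AVG above."""
    sums = defaultdict(lambda: [0.0, 0])  # year -> [running total, row count]
    for r in rows:
        if r["city"] == city and r["pm25"] is not None:
            s = sums[r["year"]]
            s[0] += r["pm25"]
            s[1] += 1
    # Sort years descending, like ORDER BY year DESC.
    return {year: total / n for year, (total, n) in sorted(sums.items(), reverse=True)}

rows = [
    {"city": "Berlin", "year": 2019, "pm25": 10},
    {"city": "Berlin", "year": 2019, "pm25": 14},
    {"city": "Paris", "year": 2019, "pm25": 20},
]
print(yearly_averages(rows))  # {2019: 12.0}
```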
---
Source: https://motherduck.com/docs/getting-started/sample-data-queries/datasets
---
title: Example Datasets
description: A collections of open datasets and queries to get you started with DuckDB and MotherDuck
---
We have prepared a series of datasets for you to play with and dive into MotherDuck!
The database `sample_data` is readily available for all new users, as it's automatically attached to your account.
Other databases are available for you to attach, using the share URL from the table below:
```sql
ATTACH '<share URL>' AS <database_name>;
```
| Database | `schema.table` | Description | Share URL | Attached by default |
|-----------------|---------------------------------------------|-----------------------------------------------------------------|------------------------------------------------------------------|---------------------|
| sample_data | [`who.ambient_air_quality`](air-quality.md) | Historical air quality data from the World Health Organization. | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_data | [`nyc.taxi`](nyc-311-data.md) | Taxi ride data from November 2020 | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_data | [`nyc.rideshare`](nyc-311-data.md) | Ride share trips (Lyft, Uber, etc.) in NYC | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_data | [`nyc.service_requests`](nyc-311-data.md) | Requests to NYC's 311 complaint hotline via phone and web | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_data | [`hn.hacker_news`](hacker-news.md) | Sample of comments from [Hacker News](https://news.ycombinator.com/) | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_data | `kaggle.movies` | Sample of the movies dataset from [Kaggle](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset) | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_data | `stackoverflow_survey.survey_results` | Survey results from 2017 to 2024 | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| sample_data | `stackoverflow_survey.survey_schemas` | Survey schemas (questions from the survey) from 2017 to 2024 | `'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6'` | Yes |
| stackoverflow | [`main.badges`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.comments`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.post_links`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.posts`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.tags`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.votes`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| stackoverflow | [`main.users`](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | `'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5'` | No |
| duckdb_stats | [PyPI data on the DuckDB project](pypi.md) | Download data from PyPI for the `duckdb` Python package, refreshed daily | `'md:_share/duckdb_stats/1eb684bf-faff-4860-8e7d-92af4ff9a410'` | No |
| hacker_news | [`hacker_news.hacker_news`](hacker-news.md) | Full [Hacker News](https://news.ycombinator.com/) datasets from 2016 to 2025 | `'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b'` | No |
| foursquare | [`foursquare.fsq_os_places`](foursquare.md) | A global dataset of over 100 million points of interest (POIs) with detailed location, business, and contact information. | `'md:_share/foursquare/0cbf467d-03b0-449e-863a-ce17975d2c0b'` | No |
| foursquare | [`foursquare.fsq_os_categories`](foursquare.md) | A hierarchical classification of POIs with up to six levels, detailing category names and IDs.| `'md:_share/foursquare/0cbf467d-03b0-449e-863a-ce17975d2c0b'` | No |
---
Source: https://motherduck.com/docs/getting-started/sample-data-queries/foursquare
---
sidebar_position: 4
title: Foursquare
description: Foursquare Open Source Places (FSQ OS Places) is a global, open-source dataset of over 100 million points of interest (POI)
---
## About the dataset
[Foursquare](https://docs.foursquare.com/data-products/docs/fsq-places-open-source) Open Source Places (FSQ OS Places) is a global, open-source dataset of over 100 million points of interest (POI), featuring 22 core attributes, updated monthly, and designed to support geospatial applications with a collaborative, AI- and human-powered data curation system.
The upstream dataset is updated monthly; we host a snapshot from 2025-01-10.
There are two tables:
- `fsq_os_places` (Places): a global dataset of over 100 million points of interest (POIs) with detailed location, business, and contact information.
- `fsq_os_categories` (Categories): a hierarchical classification of POIs with up to six levels, detailing category names and IDs.
You can attach the `foursquare` database to your account by running the following command:
```sql
ATTACH 'md:_share/foursquare/0cbf467d-03b0-449e-863a-ce17975d2c0b' AS foursquare;
```
## Schema
### fsq_os_places - Places Dataset
| Column Name | Type | Description |
|--------------------|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| fsq_place_id | String | The unique identifier of a Foursquare POI. Use this ID to view a venue at: `foursquare.com/v/{fsq_place_id}` |
| name | String | Business name of a POI |
| latitude/longitude | Decimal | Decimal coordinates (WGS84 datum) up to 6 decimal places. Derived from third-party sources, user input, and corrections. Default geocode type: front door or rooftop. |
| address | String | User-entered street address of the venue |
| locality | String | City, town, or equivalent where the POI is located |
| region | String | State, province, or territory. Abbreviations used in US, CA, AU, BR; full names elsewhere |
| postcode | String | Postal code or equivalent, formatted based on country (e.g., 5-digit US ZIP code) |
| admin_region | String | Additional sub-division (e.g., Scotland) |
| post_town | String | Town/place used in postal addressing (may differ from geographic location) |
| po_box | String | Post Office Box |
| country | String | 2-letter ISO Country Code |
| date_created | Date | Date the POI entered the database (not necessarily the opening date) |
| date_refreshed | Date | Last date any reference was refreshed via crawl, users, or validation |
| date_closed | Date | Date the POI was marked closed in the database (not necessarily actual closure date) |
| tel | String | Telephone number with local formatting |
| website | String | URL to the POI’s (or chain’s) website |
| email | String | Primary contact email address, if available |
| facebook_id | String | POI's Facebook ID, if available |
| instagram | String | POI's Instagram handle, if available |
| twitter | String | POI's Twitter handle, if available |
| fsq_category_ids | Array (String) | ID(s) of the most granular category(ies). See the Categories page for details |
| fsq_category_labels| Array (String) | Label(s) of the most granular category(ies). See the Categories page for details |
| placemaker_url | String | Link to the POI’s review page in PlaceMaker Tools for suggesting edits or reviewing pending changes |
| geom | wkb | Geometry of the POI in WKB format for visualization through the vector tiling service |
| bbox | struct | An area defined by two longitudes and two latitudes: latitude is a decimal number between -90.0 and 90.0; longitude is a decimal number between -180.0 and 180.0. `bbox:struct xmin:double ymin:double xmax:double ymax:double` |
---
### fsq_os_categories - Category Dataset
| Column Name | Type | Description |
|----------------------|---------|-----------------------------------------------------------------------------------------------------|
| category_id | String | Unique identifier of the Foursquare category (BSON format) |
| category_level | Integer | Hierarchy depth of the category (1-6) |
| category_name | String | Name of the most granular category |
| category_label | String | Full category hierarchy separated by `>` |
| level1_category_id | String | Unique ID of the first-level category |
| level1_category_name | String | Name of the first-level category |
| level2_category_id | String | Unique ID of the second-level category |
| level2_category_name | String | Name of the second-level category |
| level3_category_id | String | Unique ID of the third-level category |
| level3_category_name | String | Name of the third-level category |
| level4_category_id | String | Unique ID of the fourth-level category |
| level4_category_name | String | Name of the fourth-level category |
| level5_category_id | String | Unique ID of the fifth-level category |
| level5_category_name | String | Name of the fifth-level category |
| level6_category_id | String | Unique ID of the sixth-level category |
| level6_category_name | String | Name of the sixth-level category |
---
Source: https://motherduck.com/docs/getting-started/sample-data-queries/hacker-news
---
sidebar_position: 2
title: Hacker News
description: Sample data from Hacker News stories to use for SQL querying of DuckDB and MotherDuck databases.
---
## About the dataset
[Hacker News](https://news.ycombinator.com/) is a social news website focusing on computer science and entrepreneurship. It is run by Y Combinator, a startup accelerator, and it's known for its minimalist interface. Users can post stories (such as links to articles), comment on them, and vote them up or down, affecting their visibility.
There are two ways to access the dataset:
- Through the `sample_data` database, which contains a sample of the data (from **January 2022** to **November 2022**)
- Through the `hacker_news` database, which contains the full dataset (from **2016** to **2025**)
To attach the `sample_data` database, you can use the following command:
```sql
ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
```
To attach the `hacker_news` database, you can use the following command:
```sql
ATTACH 'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b' AS hacker_news;
```
## Schema
| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|-----|---------|-------|
| title | VARCHAR | YES | | | |
| url | VARCHAR | YES | | | |
| text | VARCHAR | YES | | | |
| dead | BOOLEAN | YES | | | |
| by | VARCHAR | YES | | | |
| score | BIGINT | YES | | | |
| time | BIGINT | YES | | | |
| timestamp | TIMESTAMP | YES | | | |
| type | VARCHAR | YES | | | |
| id | BIGINT | YES | | | |
| parent | BIGINT | YES | | | |
| descendants | BIGINT | YES | | | |
| ranking | BIGINT | YES | | | |
| deleted | BOOLEAN | YES | | | |
To read from the `sample_data` database, see [attaching the sample datasets database](./datasets.mdx).
## Example queries
### Most shared websites
This query returns the top domains being shared on Hacker News.
```sql
SELECT
    regexp_extract(url, 'http[s]?://([^/]+)/', 1) AS domain,
    count(*) AS count
FROM sample_data.hn.hacker_news
WHERE url IS NOT NULL
    AND regexp_extract(url, 'http[s]?://([^/]+)/', 1) != ''
GROUP BY domain
ORDER BY count DESC
LIMIT 20;
```
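The `regexp_extract` pattern above carries over directly to Python's `re` module, where group 1 likewise captures the domain. The `top_domains()` helper below is a hypothetical sketch of the same extract-filter-count logic:

```python
import re
from collections import Counter

# Same pattern as the SQL regexp_extract above; group 1 captures the domain.
DOMAIN = re.compile(r"http[s]?://([^/]+)/")

def top_domains(urls, n=20):
    """Count domains across URLs, skipping NULLs and non-matches, like the WHERE clause."""
    matches = (DOMAIN.search(u) for u in urls if u)
    return Counter(m.group(1) for m in matches if m).most_common(n)

urls = [
    "https://github.com/duckdb/duckdb",
    "https://github.com/x/y",
    "http://example.com/a",
    None,  # corresponds to a NULL url in the table
]
print(top_domains(urls))  # [('github.com', 2), ('example.com', 1)]
```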
### Most Commented Stories Each Month
This query calculates the total number of comments for each story and identifies the most commented story of each month.
```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        descendants AS nb_comments,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER() OVER (
            PARTITION BY YEAR(timestamp), MONTH(timestamp)
            ORDER BY descendants DESC
        ) AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)
SELECT
    year,
    month,
    title,
    hn_url,
    nb_comments
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```
### Most monthly voted stories
This query determines the most voted story for each month.
```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        score,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER() OVER (
            PARTITION BY YEAR(timestamp), MONTH(timestamp)
            ORDER BY score DESC
        ) AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)
SELECT
    year,
    month,
    title,
    hn_url,
    score
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```
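The `ROW_NUMBER() ... WHERE rn = 1` pattern above amounts to "keep the best row per group." A minimal Python sketch (the `top_story_per_month()` helper and dict keys are hypothetical, mirroring the table columns):

```python
def top_story_per_month(stories):
    """Keep the highest-scoring story per (year, month), like ROW_NUMBER() ... WHERE rn = 1."""
    best = {}
    for s in stories:
        key = (s["year"], s["month"])
        if key not in best or s["score"] > best[key]["score"]:
            best[key] = s
    # Return winners ordered by (year, month), like the final ORDER BY.
    return [best[k] for k in sorted(best)]

stories = [
    {"title": "A", "year": 2022, "month": 1, "score": 50},
    {"title": "B", "year": 2022, "month": 1, "score": 90},
    {"title": "C", "year": 2022, "month": 2, "score": 10},
]
print([s["title"] for s in top_story_per_month(stories)])  # ['B', 'C']
```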
### Keyword analysis
This query counts the monthly mentions of a keyword (here `duckdb`) in the title or text of Hacker News posts, organized by year and month.
```sql
SELECT
    YEAR(timestamp) AS year,
    MONTH(timestamp) AS month,
    COUNT(*) AS keyword_mentions
FROM sample_data.hn.hacker_news
WHERE title LIKE '%duckdb%' OR text LIKE '%duckdb%'
GROUP BY year, month
ORDER BY year ASC, month ASC;
```
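The same count-by-month logic can be sketched in plain Python with a `Counter`. The `keyword_mentions()` helper is hypothetical; note that, like the SQL `LIKE` pattern above, the substring match here is case-sensitive:

```python
from collections import Counter

def keyword_mentions(posts, keyword="duckdb"):
    """Count posts mentioning the keyword in title or text, grouped by (year, month)."""
    counts = Counter()
    for p in posts:
        text = (p.get("title") or "") + " " + (p.get("text") or "")
        if keyword in text:  # case-sensitive, like LIKE '%duckdb%'
            counts[(p["year"], p["month"])] += 1
    return dict(sorted(counts.items()))

posts = [
    {"title": "duckdb 1.0 released", "text": None, "year": 2022, "month": 6},
    {"title": "Show HN", "text": "built on duckdb", "year": 2022, "month": 6},
    {"title": "Unrelated", "text": "", "year": 2022, "month": 7},
]
print(keyword_mentions(posts))  # {(2022, 6): 2}
```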
---
Source: https://motherduck.com/docs/getting-started/sample-data-queries/nyc-311-data
---
sidebar_position: 4
title: NYC 311 Complaint Data
description: New York City provides data from 311 call service requests. This data can be used as sample data for DuckDB and MotherDuck SQL queries.
---
## About the dataset
The [New York City 311 Service Requests Data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9) provides information on requests to the city's complaint service from 2010 to the present.
NYC311 responds to thousands of inquiries, comments and requests from customers every single day. This dataset represents only service requests that can be directed to specific agencies. This dataset is updated daily and expected values for many fields will change over time. The lists of expected values associated with each column are not exhaustive. Each row of data contains information about the service request, including complaint type, responding agency, and geographic location. However the data does not reveal any personally identifying information about the customer who made the request.
This dataset describes site-specific non-emergency complaints (also known as “service requests”) made by customers across New York City about a variety of topics, including noise, sanitation, and street quality.
The columns have been renamed to `lower_case_underscore` format for ease of typing. For more column details than are provided below, see the associated data dictionary (an Excel file) linked from the dataset page above.
| column_name | column_type | null | description |
|--------------------------------|---------------|--------|-------------|
| unique_key | BIGINT | YES | Unique identifier of a Service Request (SR) in the open data set. Each 311 service request is assigned a number that distinguishes it as a separate case incident. |
| created_date | TIMESTAMP | YES | The date and time that a Customer submits a Service Request. |
| closed_date | TIMESTAMP | YES | The date and time that an Agency closes a Service Request. |
| agency | VARCHAR | YES | Acronym of responding City Government Agency or entity responding to 311 Service Request. |
| agency_name | VARCHAR | YES | Full agency name of responding City Government Agency, or entity responding to 311 service request. |
| complaint_type | VARCHAR | YES | This is the first level of a hierarchy identifying the topic of the incident or condition. Complaint Type broadly describes the topic of the incident or condition and are defined by the responding agencies. |
| descriptor | VARCHAR | YES | This is associated to the Complaint Type, and provides further detail on the incident or condition. Descriptor values are dependent on the Complaint Type, and are not always required in the service request. |
| location_type | VARCHAR | YES | Describes the type of location used in the address information |
| incident_zip | VARCHAR | YES | Zip code of the incident address |
| incident_address | VARCHAR | YES | House number and street name of incident address |
| street_name | VARCHAR | YES | Street name of incident address |
| cross_street_1 | VARCHAR | YES | First Cross street based on the geo validated incident location.|
| cross_street_2 | VARCHAR | YES | Second Cross Street based on the geo validated incident location |
| intersection_street_1 | VARCHAR | YES | First intersecting street based on geo validated incident location |
| intersection_street_2 | VARCHAR | YES | Second intersecting street based on geo validated incident location |
| address_type | VARCHAR | YES | Type of information available about the incident location: Address; Block face; Intersection; LatLong; Placename |
| city | VARCHAR | YES | In this dataset, City can refer to a borough or neighborhood: MANHATTAN, BROOKLYN, BRONX, STATEN ISLAND, or, in QUEENS, the specific neighborhood name |
| landmark | VARCHAR | YES | If the incident location is identified as a Landmark the name of the landmark will display here. Can refer to any noteworthy location, including but not limited to, parks, hospitals, airports, sports facilities, performance spaces, etc. |
| facility_type | VARCHAR | YES | If applicable, this field describes the type of city facility associated to the service request: DSNY Garage, Precinct, School, School District, N/A |
| status | VARCHAR | YES | Current status of the service request submitted: Assigned, Canceled, Closed, Pending |
| due_date | TIMESTAMP | YES | Date when responding agency is expected to update the SR. This is based on the Complaint Type and internal Service Level Agreements (SLAs) |
| resolution_description | VARCHAR | YES | Describes the last action taken on the service request by the responding agency. May describe next or future steps. |
| resolution_action_updated_date | TIMESTAMP | YES | Date when responding agency last updated the service request. |
| bbl | VARCHAR | YES | Parcel number that identifies the location of the building or property associated with the service request. The block is a subset of a borough. The lot is a subset of a block unique within a borough and block. |
| borough | VARCHAR | YES | The borough number is: 1. Manhattan (New York County) 2. Bronx (Bronx County) 3. Brooklyn (Kings County) 4. Queens (Queens County) 5. Staten Island (Richmond County) |
| x_coordinate_state_plane | VARCHAR | YES | Geo validated, X coordinate of the incident location. X coordinate of the incident location. For more information about NY State Plane Coordinate Zones: https://data.gis.ny.gov/datasets/ny-state-plane-coordinate-system-zones/explore |
| y_coordinate_state_plane | VARCHAR | YES | Geo validated, Y coordinate of the incident location. Y coordinate of the incident location. For more information about NY State Plane Coordinate Zones: https://data.gis.ny.gov/datasets/ny-state-plane-coordinate-system-zones/explore |
| open_data_channel_type | VARCHAR | YES | Indicates how the service request was submitted to 311: Phone, Online, Other (submitted by other agency) |
| park_facility_name | VARCHAR | YES | If the incident location is a Parks Dept facility and service requests pertains to a facility managed by NYC Parks (DPR), the name of the facility will appear here |
| park_borough | VARCHAR | YES | The borough of incident if the service request is pertaining to a NYC Parks Dept facility (DPR) |
| vehicle_type | VARCHAR | YES | Data provided if service request pertains to a vehicle managed by the Taxi and Limousine Commission (TLC): Ambulette / Paratransit; Car Service; Commuter Van; Green Taxi |
| taxi_company_borough | VARCHAR | YES | Data provided if service request pertains to a vehicle managed by the Taxi and Limousine Commission (TLC). |
| taxi_pick_up_location | VARCHAR | YES | If the incident pertains to a vehicle managed by the Taxi and Limousine Commission (TLC), this field displays the taxi pick up location |
| bridge_highway_name | VARCHAR | YES | If the incident is identified as a Bridge/Highway, the name will be displayed here |
| bridge_highway_direction | VARCHAR | YES | If the incident is identified as a Bridge/Highway, the direction where the issue took place would be displayed here. |
| road_ramp | VARCHAR | YES | If the incident location was Bridge/Highway this column differentiates if the issue was on the Road or the Ramp. |
| bridge_highway_segment | VARCHAR | YES | Additional information on the section of the Bridge/Highway where the incident took place. |
| latitude | DOUBLE | YES | Geo based Latitude of the incident location in decimal degrees |
| longitude | DOUBLE | YES | Geo based Longitude of the incident location in decimal degrees |
| community_board | VARCHAR | YES | Community boards are local representative bodies. There are 59 community boards throughout the City, each representing a distinct geography. For more information on Community Boards: [NYC government website](https://www.nyc.gov/site/cau/community-boards/community-boards.page) |
To read from the `sample_data` database, see [attach the sample datasets database](./datasets.mdx).
## Example queries
### The most common complaints in 2018
```sql
SELECT
UPPER(complaint_type),
COUNT(1)
FROM sample_data.nyc.service_requests
WHERE DATE_PART('year', created_date) = 2018
GROUP BY 1
HAVING COUNT(*) > 1000
ORDER BY 2 DESC;
```
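A variation on the query above, sketched from the column names in the schema, counts requests by borough and submission channel (results depend on the dataset):

```sql
SELECT
    borough,
    open_data_channel_type,
    COUNT(*) AS request_count
FROM sample_data.nyc.service_requests
GROUP BY ALL
ORDER BY request_count DESC;
```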
---
Source: https://motherduck.com/docs/getting-started/sample-data-queries/pypi
---
sidebar_position: 5
title: PyPi Data
description: Want to know how users find and install software you've developed for the Python Community? This DuckDB and MotherDuck database allows you to use SQL to perform data analysis on PyPi data.
---
## About the dataset
PyPI is the Python Package Index, a repository of software packages for the Python programming language. It is a central repository that allows users to find and install software developed and shared by the Python community.
The dataset includes information about packages, releases, and downloads on the `duckdb` python package.
It's refreshed **weekly**, and you can visit the dashboard at [duckdbstats.com](https://duckdbstats.com).
## How to query the dataset
A dedicated shared database is maintained to query the dataset. To attach it to your workspace, you can use the following command:
```sql
ATTACH 'md:_share/duckdb_stats/1eb684bf-faff-4860-8e7d-92af4ff9a410' AS duckdb_stats;
```
## Schema
### pypi_file_downloads
This table contains the raw data. Each row represents a download from PyPI.
| column_name | column_type | null |
|--------------|----------------------------------------------------------------------------------------------------------------|------|
| timestamp | TIMESTAMP | YES |
| country_code | VARCHAR | YES |
| url | VARCHAR | YES |
| project | VARCHAR | YES |
| file | STRUCT(filename VARCHAR, project VARCHAR, "version" VARCHAR, "type" VARCHAR) | YES |
| details | STRUCT("installer" STRUCT("name" VARCHAR, "version" VARCHAR), "python" VARCHAR, "implementation" STRUCT("name" VARCHAR, "version" VARCHAR), "distro" STRUCT("name" VARCHAR, "version" VARCHAR, "id" VARCHAR, "libc" STRUCT("lib" VARCHAR, "version" VARCHAR)), "system" STRUCT("name" VARCHAR, "release" VARCHAR), "cpu" VARCHAR, "openssl_version" VARCHAR, "setuptools_version" VARCHAR, "rustc_version" VARCHAR, "ci" BOOLEAN) | YES |
| tls_protocol | VARCHAR | YES |
| tls_cipher | VARCHAR | YES |
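Since `file` and `details` are STRUCT columns, nested fields can be reached with dot notation. A sketch based on the schema above (results depend on the current data):

```sql
SELECT
    file.version AS duckdb_version,
    details.installer.name AS installer,
    COUNT(*) AS downloads
FROM duckdb_stats.main.pypi_file_downloads
GROUP BY ALL
ORDER BY downloads DESC
LIMIT 10;
```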
### pypi_daily_stats
This table is a daily aggregation of the raw data. It contains the following columns:
| column_name | column_type | null |
|-------------------|-------------|------|
| load_id | VARCHAR | YES |
| download_date | DATE | YES |
| system_name | VARCHAR | YES |
| system_release | VARCHAR | YES |
| version | VARCHAR | YES |
| project | VARCHAR | YES |
| country_code | VARCHAR | YES |
| cpu | VARCHAR | YES |
| python_version | VARCHAR | YES |
| daily_download_sum| BIGINT | YES |
## Example queries
The following queries assume that the current database connected is `duckdb_stats`. Run `use duckdb_stats` to switch to it.
### Get weekly download stats
```sql
SELECT
DATE_TRUNC('week', download_date) AS week_start_date,
version,
country_code,
python_version,
SUM(daily_download_sum) AS weekly_download_sum
FROM
duckdb_stats.main.pypi_daily_stats
GROUP BY
ALL
ORDER BY
week_start_date
```
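Along the same lines, a sketch ranking Python versions by total downloads (results vary with the live data):

```sql
SELECT
    python_version,
    SUM(daily_download_sum) AS total_downloads
FROM duckdb_stats.main.pypi_daily_stats
GROUP BY python_version
ORDER BY total_downloads DESC
LIMIT 10;
```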
---
Source: https://motherduck.com/docs/getting-started/sample-data-queries/stackoverflow-survey
---
sidebar_position: 5
title: StackOverflow Survey Data
description: Data from the StackOverflow Developer Survey from 2017 to 2024.
---
## About the dataset
Each year, [Stack Overflow conducts a survey](https://survey.stackoverflow.co/) of developers to understand the trends in the developer community. The survey covers a wide range of topics, including programming languages, frameworks, databases, and platforms, as well as developer demographics, education, and career satisfaction.
Starting in 2017, Stack Overflow has provided a consistent schema and data format for the survey data, making it a great dataset for analyzing trends in the developer community over the years.
The source data is a series of CSV files that have been merged into a single schema with two tables for easy querying.
## How to query the dataset
This dataset is available as part of the `sample_data` database. This database is auto attached to any new user's workspace.
To re-attach the database, you can use the following command:
```sql
ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
```
## Schema
### stackoverflow_survey.survey_results
This table contains all the survey results from 2017 to 2024. Each column represents a question from the survey. As questions change from year to year, the columns may vary a bit and the table is quite large.
### stackoverflow_survey.survey_schema
This table contains the schema of the survey results. `qname` is the name of the question which is also the column name in the `survey_results` table. `question` is the full question text.
| Column Name | Column Type |
|---------------|-------------|
| qname | VARCHAR |
| question | VARCHAR |
| qid | VARCHAR |
| force_resp | VARCHAR |
| type | VARCHAR |
| selector | VARCHAR |
| year | VARCHAR |
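To look up the full wording behind a column in `survey_results`, you can query `survey_schema` by `qname`. A sketch, using one of the survey columns referenced later on this page (output depends on the dataset):

```sql
SELECT year, question
FROM sample_data.stackoverflow_survey.survey_schema
WHERE qname = 'LanguageHaveWorkedWith'
ORDER BY year;
```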
## Example queries
### List the most popular programming languages in 2024
```sql
SELECT
language,
COUNT(*) AS count
FROM (
SELECT UNNEST(STRING_SPLIT(LanguageHaveWorkedWith, ';')) AS language
FROM sample_data.stackoverflow_survey.survey_results
WHERE year = '2024'
) AS languages
GROUP BY language
ORDER BY count DESC;
```
### Top 10 Countries with the Most Respondents in 2024
```sql
SELECT
Country,
COUNT(*) AS Respondents
FROM sample_data.stackoverflow_survey.survey_results
WHERE year = '2024'
GROUP BY Country
ORDER BY Respondents DESC
LIMIT 10;
```
### Correlation Between Remote Work and Job Satisfaction in 2024
```sql
SELECT RemoteWork,
AVG(CAST(JobSat AS DOUBLE)) AS AvgJobSatisfaction,
COUNT(*) AS RespondentCount
FROM sample_data.stackoverflow_survey.survey_results
WHERE JobSat NOT IN ('NA',
'Slightly satisfied',
'Neither satisfied nor dissatisfied',
'Very dissatisfied',
'Very satisfied',
'Slightly dissatisfied')
AND RemoteWork NOT IN ('NA')
AND year = '2024'
GROUP BY ALL
```
---
Source: https://motherduck.com/docs/getting-started/sample-data-queries/stackoverflow
---
sidebar_position: 5
title: StackOverflow Data
description: Sample data from StackOverflow to use with DuckDB and MotherDuck to understand SQL-based data analytics.
---
## About the dataset
[Stack Overflow](https://stackoverflow.com/) is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions.
The dataset includes a complete **data dump up to May 2023**, covering posts, comments, users, badges, and related metrics.
You can read more about the dataset in our blog series [part 1](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-1/) and [part 2](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-2/).
## How to query the dataset
As this dataset is quite large, it's not part of the `sample_data` database. Instead, you can find it as a dedicated shared database.
To attach it to your workspace, you can use the following command:
```sql
ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
```
## Schema
### Badges
| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| UserId | BIGINT | YES | | | |
| Name | VARCHAR | YES | | | |
| Date | TIMESTAMP | YES | | | |
| Class | BIGINT | YES | | | |
| TagBased | BOOLEAN | YES | | | |
### Comments
| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostId | BIGINT | YES | | | |
| Score | BIGINT | YES | | | |
| Text | VARCHAR | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| UserId | BIGINT | YES | | | |
| ContentLicense | VARCHAR | YES | | | |
### Post Links
| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| PostId | BIGINT | YES | | | |
| RelatedPostId | BIGINT | YES | | | |
| LinkTypeId | BIGINT | YES | | | |
### Posts
| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostTypeId | BIGINT | YES | | | |
| AcceptedAnswerId | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| Score | BIGINT | YES | | | |
| ViewCount | BIGINT | YES | | | |
| Body | VARCHAR | YES | | | |
| OwnerUserId | BIGINT | YES | | | |
| LastEditorUserId | BIGINT | YES | | | |
| LastEditorDisplayName | VARCHAR | YES | | | |
| LastEditDate | TIMESTAMP | YES | | | |
| LastActivityDate | TIMESTAMP | YES | | | |
| Title | VARCHAR | YES | | | |
| Tags | VARCHAR | YES | | | |
| AnswerCount | BIGINT | YES | | | |
| CommentCount | BIGINT | YES | | | |
| FavoriteCount | BIGINT | YES | | | |
| CommunityOwnedDate | TIMESTAMP | YES | | | |
| ContentLicense | VARCHAR | YES | | | |
### Tags
| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| TagName | VARCHAR | YES | | | |
| Count | BIGINT | YES | | | |
| ExcerptPostId | BIGINT | YES | | | |
| WikiPostId | BIGINT | YES | | | |
### Votes
| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostId | BIGINT | YES | | | |
| VoteTypeId | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
### Users
| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| Reputation | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| DisplayName | VARCHAR | YES | | | |
| LastAccessDate | TIMESTAMP | YES | | | |
| AboutMe | VARCHAR | YES | | | |
| Views | BIGINT | YES | | | |
| UpVotes | BIGINT | YES | | | |
| DownVotes | BIGINT | YES | | | |
## Example queries
The following queries assume that the current database connected is `stackoverflow`. Run `use stackoverflow` to switch to it.
### List the top 5 posts that received the most votes
```sql
SELECT posts.Title, COUNT(votes.Id) AS VoteCount
FROM posts
JOIN votes ON posts.Id = votes.PostId
GROUP BY posts.Title
ORDER BY VoteCount DESC
LIMIT 5;
```
### Find the top 5 posts with the highest view count
```sql
SELECT Title, ViewCount
FROM posts
ORDER BY ViewCount DESC
LIMIT 5;
```
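As one more sketch against the schema above, the most-used tags (assuming the current database is `stackoverflow`):

```sql
SELECT TagName, "Count"
FROM tags
ORDER BY "Count" DESC
LIMIT 5;
```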
---
Source: https://motherduck.com/docs/integrations/bi-tools/evidence
---
sidebar_position: 3
title: Evidence
---
import BlockWithBacktick from '@site/src/components/BlockWithBacktick';
[Evidence](https://evidence.dev/) is an open source, code-based alternative to drag-and-drop BI tools. Build polished data products with just SQL and markdown.
## Getting started
Head over to [their installation page](https://docs.evidence.dev/getting-started/install-evidence) and start with their template to get you started.
## Authenticate to MotherDuck
During development, you can configure the connection manually through the UI under "Settings". If you are running Evidence locally, this is typically at [http://localhost:3000/settings](http://localhost:3000/settings).

Then select 'DuckDB' as the connection type and, for the filename, use `'md:?motherduck_token=xxxx'` where `xxxx` is your [access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token). Finally, for the extension, select "No extension" and click `Save`.

In production, you can set [global environment variables](https://docs.evidence.dev/deployment/environments#prod-environment); you need to set two:
- `EVIDENCE_DUCKDB_FILENAME='md:?motherduck_token=xxxx'`
- `EVIDENCE_DATABASE=duckdb`
## Displaying some data through SQL and Markdown
Once done, you can add a new page in the `pages` folder and add the following code blocks to `stackoverflow.md` file:
First, we simply add some Markdown headers.
```md
---
title: Evidence & MotherDuck
---
# Stories with most score
```
Then, we query our data from the [HackerNews sample_data database](/getting-started/sample-data-queries/hacker-news.md) in MotherDuck. The query is fetching the top stories (posts) from HackerNews.
```sql new_items
SELECT id,
       title,
       score,
       "by",
       strftime('%Y-%m-%d', to_timestamp(time)) AS date
FROM sample_data.hn.hacker_news
WHERE type = 'story'
ORDER BY score DESC
LIMIT 20;
```
Finally, we use the query result `new_items` to create a list generated in Markdown. The list contains the title (with the URL of the story), the date, the score, and the author of each story.
```md
{#each new_items as item}
* [{item.title}](https://news.ycombinator.com/item?id={item.id}) {item.date} ⬆ {item.score} by [{item.by}](https://news.ycombinator.com/user?id={item.by})
{/each}
```
Head over then to this page you created and you should see the final result that looks like this:

---
Source: https://motherduck.com/docs/integrations/bi-tools/excel
---
sidebar_position: 7
sidebar_label: Microsoft Excel
title: Connect MotherDuck to Excel
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Use Excel's 'Get Data' flow with the DuckDB ODBC driver to load MotherDuck data into Excel. This setup works well for recurring reporting, analysis, ad hoc SQL exploration, finance models, and operational dashboards without having to rely on exported CSVs.
## Before you start
To get started you'll need the following.
- Windows + Excel (ODBC is Windows-only for this flow)
- A MotherDuck access token (create one in the [MotherDuck token page](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token))
- Admin rights on your computer to install the ODBC driver
## Installation steps
### 1. Install the DuckDB ODBC driver
1. Download the latest DuckDB ODBC driver for Windows (amd64):
- [duckdb_odbc-windows-amd64.zip](https://github.com/duckdb/duckdb-odbc/releases/latest/download/duckdb_odbc-windows-amd64.zip)
2. Extract the `.zip` file and run `odbc_install.exe` as Administrator (right click -> Run as administrator).
### 2. Configure the DuckDB System DSN
1. Open the ODBC Data Source Administrator:
- 64-bit Excel: Start menu -> ODBC Data Sources (64-bit)
- 32-bit Excel: Start menu -> ODBC Data Sources (32-bit)

2. Go to System DSN, select DuckDB, and click Configure. 
3. Set Database to one of the following:
- Recommended (scoped): `md:your_database_name`
- Open scope: `md:` (allows access to any database)
4. Click OK to save.

### 3. Connect from Excel (Get Data)
1. In Excel, go to Data -> Get Data -> From Other Sources -> From ODBC.

2. Choose DuckDB from the DSN dropdown and click OK.

3. On the credentials screen, choose Default or Custom and add this to the Connection string properties field:
```
motherduck_token=
```

4. Click Connect.
### 4. Load or transform data
Use the Navigator window to select tables and choose Load to bring data into Excel, or Transform Data to shape it in Power Query before loading.
## Excel ODBC on macOS
Direct ODBC connectivity between Excel and MotherDuck is **not currently supported on macOS** due to a driver incompatibility.
### Why it doesn't work
Excel on macOS uses the **iODBC** driver manager, but the DuckDB ODBC driver is built for **unixODBC**. These drivers are incompatible at the binary level. This is a [known issue](https://github.com/duckdb/duckdb-odbc/issues/40) being tracked by the DuckDB team.
If necessary, you can build this driver yourself.
### Alternatives for macOS users
#### Option 1: Export directly to Excel with DuckDB (CLI and drivers)
DuckDB has an [Excel extension](https://duckdb.org/docs/stable/core_extensions/excel) that can write `.xlsx` files directly. This works with DuckDB CLI or any DuckDB driver, but cannot be used in the MotherDuck UI as we cannot currently export `.xlsx` files to your local file system.
```sql
-- Connect to MotherDuck and export to Excel
ATTACH 'md:';
COPY (SELECT * FROM my_database.my_table) TO 'output.xlsx' WITH (FORMAT xlsx, HEADER true);
```
Or via command line:
```bash
duckdb -c "ATTACH 'md:'; COPY (SELECT * FROM my_database.my_table) TO 'output.xlsx' WITH (FORMAT xlsx, HEADER true);"
```
#### Option 2: Use the MotherDuck Web UI
Query your data in the [MotherDuck Web UI](https://app.motherduck.com) and export results:
1. Run your query in the MotherDuck UI
2. Click the download button to export as CSV
3. Open the CSV in Excel
#### Option 3: Export to CSV via DuckDB CLI
Use the DuckDB CLI to export query results to CSV:
```bash
duckdb -c "ATTACH 'md:'; COPY (SELECT * FROM my_database.my_table) TO 'output.csv' (HEADER, DELIMITER ',');"
```
## Tips
- If you change your MotherDuck token, update the connection string properties in Excel.
- If you use multiple databases, create separate DSNs (e.g., `DuckDB - analytics`, `DuckDB - finance`) with different `md:database` values.
## Troubleshooting
### How do I delete an existing MotherDuck connection in Excel?
1. In Excel, go to Data -> Queries & Connections.
2. Find the connection you want to remove, right click it, and choose Delete.
### How do I modify an existing MotherDuck connection?
1. In Excel, go to Data -> Queries & Connections.
2. Right click the connection and choose Properties.
3. Open the Definition tab and update the connection string (for example, update `motherduck_token=...`) and save.
If you don't see the Definition tab, use Data -> Get Data -> Data Source Settings, select your DuckDB connection, then choose Change Source or Edit Permissions as needed.
---
Source: https://motherduck.com/docs/integrations/bi-tools/hex
---
sidebar_position: 1
title: Hex
---
import Image from '@theme/IdealImage';
[Hex](https://hex.tech/) is a software platform for collaborative data science and analytics using Python, SQL and no-code.
You have two ways to connect to MotherDuck using Hex:
- **Using SQL cells with a data connection**: MotherDuck is a supported [data connection in Hex](https://learn.hex.tech/docs/connect-to-data/data-connections/data-connections-introduction#supported-data-sources).
- **Using Python cells**: You can use Python cells to connect to MotherDuck and query data using DuckDB.
## Using SQL cells with a data connection
:::tip
When many human users query through the same MotherDuck data connection, consider using a [read scaling token](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/).
Hex will then route the queries to a dedicated Duckling per Hex kernel, up to the maximum flock size determined by your organization admin.
What this means in practice:
* Each workbook gets a stable backend for each unique data connection. Multiple users collaborating on the same workbook share the Duckling, so queries run faster on warm data caches.
* In a published app, each user gets a stable backend for each data connection to power their own unique exploration.
:::
To add a new data connection, head over the Data browser in a new notebook and click on `Add data connection`.

Select `MotherDuck` as the data source and fill in the required fields. The most important is the MotherDuck token, which you can find in the [MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token).

Once done, you can use the data browser to explore the tables and columns and directly specify your data connection in your SQL cell.


### Query some data
Add another cell and run the same query used in the Python cell example below:
```sql
SELECT dayname(tpep_pickup_datetime) AS day_of_week, strftime('%H', tpep_pickup_datetime) AS hour_of_day, COUNT(*) AS trip_count
FROM sample_data.nyc.taxi
GROUP BY day_of_week, hour_of_day
ORDER BY day_of_week, hour_of_day;
```
This produces both a table and a DataFrame, which you can use in the same way as demonstrated with Python to generate data visualizations.

## Using Python cells
If you prefer programming in Python, you can use Python cells to connect to MotherDuck and start querying data. You can jump directly to the [Hex notebook](https://app.hex.tech/c0083b53-a04f-47b1-bff7-a9ff12590a9f/hex/5c85b3e2-3df7-4011-87a0-1fff63787d03/draft/logic) for a quickstart.
The notebook highlights how you can query data using Python or SQL cells and display charts!
### Storing your MotherDuck token
The first step is to safely store your MotherDuck token. You can do this by [creating a new secret in Hex.](https://learn.hex.tech/docs/environment-configuration/environment-views#secrets)

Let's add your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md#authentication-using-an-access-token) under the name `motherduck_token`.

Once done, add the following Python cell to export your `motherduck_token` as an environment variable. It will be detected by SQL/Python processes when authenticating to MotherDuck.
```python
# Passing the secrets as environment variable for Python/SQL cell auth
# Fill in your token as a Hex project secret https://learn.hex.tech/docs/environment-configuration/environment-views#secret
import os
os.environ["motherduck_token"] = motherduck_token
```
### Connecting to MotherDuck
Connecting to MotherDuck is straightforward as DuckDB is already pre-installed in the Hex environment!
Add a Python cell and run the following code:

```python
import duckdb
# Connect to MotherDuck using Python
conn = duckdb.connect('md:')
```
### Query some data and display a chart
We can now easily query some data based on the [sample_data database](/getting-started/sample-data-queries/datasets.mdx). We will run a simple query and return it as a pandas dataframe in order to display it as a chart.
This database is auto-attached to any MotherDuck user, so you can query it directly.
Add another Python cell and run the following code:
```python
# Query sample_data database and convert it to a pandas dataframe for dataviz
peak_hours = conn.sql("""
SELECT dayname(tpep_pickup_datetime) AS day_of_week, strftime('%H', tpep_pickup_datetime) AS hour_of_day, COUNT(*) AS trip_count
FROM sample_data.nyc.taxi
GROUP BY day_of_week, hour_of_day
ORDER BY day_of_week, hour_of_day;""").to_df()
```
Now we can display the chart using the Visualization cell. Add a new Visualization cell, type `Chart` and select the dataframe we just created `peak_hours`.

Finally, play with the parameters to obtain the following chart which gives you a weekly view of the peak hours in New York City for the yellow cabs.

---
Source: https://motherduck.com/docs/integrations/bi-tools/index
---
title: Business Intelligence Tools
description: Use MotherDuck as a data source in tools for interactive data analysis and presentation
---
import DocCardList from '@theme/DocCardList';
# Business Intelligence Tools
MotherDuck integrates with popular business intelligence tools to help you analyze and visualize your data.
---
Source: https://motherduck.com/docs/integrations/bi-tools/metabase
---
sidebar_position: 5
title: Metabase
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
[Metabase](https://www.metabase.com/) is an open source analytics/BI platform that provides intuitive data visualization and exploration capabilities. This guide details how to connect Metabase to both local DuckDB databases and MotherDuck.
## Prerequisites
- Metabase installed (self-hosted)
- Admin access to your Metabase instance
- For MotherDuck connections: valid MotherDuck token
## Metabase Cloud
Metabase Cloud does not currently support installing custom drivers. Support for the DuckDB/MotherDuck driver on Metabase Cloud is under development.
Until Cloud support is available, use Self-hosted Metabase to connect to DuckDB or MotherDuck.
## Self-hosted Metabase
### Install the DuckDB driver
1. Create a `Dockerfile` that includes the latest Metabase plus the DuckDB driver:
```dockerfile
FROM eclipse-temurin:21-jre
ENV MB_PLUGINS_DIR=/plugins
RUN mkdir -p ${MB_PLUGINS_DIR} /app
# Latest Metabase
ADD https://downloads.metabase.com/latest/metabase.jar /app/metabase.jar
# Latest DuckDB driver
ADD https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar ${MB_PLUGINS_DIR}/
EXPOSE 3000
CMD ["java", "-jar", "/app/metabase.jar"]
```
2. Build and run:
```bash
docker build -t metabase-duckdb:latest .
docker run -d --name metaduck -p 3000:3000 -e MB_PLUGINS_DIR=/plugins metabase-duckdb:latest
```
Tip: For reproducible builds, pin versions instead of `latest`:
```dockerfile
# Example of pinning versions (replace X.Y.Z)
ADD https://downloads.metabase.com/vX.Y.Z/metabase.jar /app/metabase.jar
ADD https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/download/1.X.Y/duckdb.metabase-driver.jar ${MB_PLUGINS_DIR}/
```
Note: Use a Debian/Ubuntu-based JRE image (not Alpine) to avoid glibc issues with the DuckDB driver.
1. Download the latest DuckDB driver `.jar`:
```bash
curl -L -o duckdb.metabase-driver.jar \
https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar
```
1. Copy it to the Metabase plugins directory:
- Standard installation (example): If your `metabase.jar` is at `~/app/metabase.jar`, place the driver in `~/app/plugins/`
```bash
mkdir -p ~/app/plugins
mv duckdb.metabase-driver.jar ~/app/plugins/
```
- On Mac: the plugins directory is `~/Library/Application Support/Metabase/Plugins/`
```bash
mkdir -p "${HOME}/Library/Application Support/Metabase/Plugins/"
mv duckdb.metabase-driver.jar "${HOME}/Library/Application Support/Metabase/Plugins/"
```
- Custom location or Docker: set `MB_PLUGINS_DIR` to point Metabase at your plugins directory and place the `.jar` there.
1. Restart Metabase so it picks up the new plugin.
1. SSH to the host and download to the plugins directory. Replace user/host and adjust `MB_PLUGINS_DIR` as needed.
```bash
ssh user@your-host 'bash -s' <<'EOF'
set -euo pipefail
MB_PLUGINS_DIR="${MB_PLUGINS_DIR:-/app/plugins}"
mkdir -p "$MB_PLUGINS_DIR"
if command -v wget >/dev/null; then
  wget -qO "$MB_PLUGINS_DIR/duckdb.metabase-driver.jar" \
    https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar
else
  curl -L -o "$MB_PLUGINS_DIR/duckdb.metabase-driver.jar" \
    https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar
fi
EOF
```
2. Restart Metabase on the remote host:
- systemd: `ssh user@your-host 'sudo systemctl restart metabase'`
- Docker: `ssh user@your-host 'docker restart '`
:::important
Restart required: Metabase must be restarted after adding or upgrading plugins. Hot-reload of drivers is not supported.
:::
:::tip
Compatibility and upgrades: New DuckDB driver releases are designed to be backward compatible with recent Metabase versions. Upgrading to the latest driver is recommended for bug fixes and stability. If you run a significantly older Metabase version, validate in staging first.
:::
### Add your database connection
After installing the driver, you can add MotherDuck as a data source in Metabase.
1. Log in to Metabase with admin credentials
2. Navigate to **Admin Settings** > **Databases** > **Add Database**
3. Select **DuckDB** as the database type
:::note
Since DuckDB does not do implicit casting by default, the `old_implicit_casting` config is currently necessary for datetime filtering in Metabase to function. It's recommended to keep it set.
:::
#### Connecting to MotherDuck
To connect to MotherDuck:
1. **Database name**: In the Database file field, enter `md:[database_name]` where `[database_name]` is your MotherDuck database name
2. **MotherDuck token**: Paste your MotherDuck token (retrieve from the [MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md))
3. **Configuration**: Enable `old_implicit_casting` (recommended) for proper datetime handling

### DuckLake on Metabase
DuckLake is supported with the DuckDB driver in Metabase. Use the latest DuckDB driver release and a DuckDB version that supports DuckLake (DuckDB v1.3.2 or newer is recommended).
#### MotherDuck-managed DuckLake
If your DuckLake database is managed by MotherDuck, you can connect the same way you connect to any MotherDuck database:
1. Select DuckDB as the database type
2. Database file: `md:[ducklake_database_name]`
3. MotherDuck token: paste your token
4. Keep `old_implicit_casting` enabled (recommended)
No extra Init SQL is required. Query your tables normally in Metabase.
#### Own compute + DuckLake catalog (attach in Init SQL)
If you want Metabase’s embedded DuckDB to query a DuckLake stored externally, attach the DuckLake catalog in the connection’s Init SQL. This works for both MotherDuck-managed catalogs and self-managed catalogs.
- Init SQL for a MotherDuck-managed DuckLake catalog:
```sql
-- Attaches the DuckLake metadata catalog hosted in MotherDuck
ATTACH 'ducklake:md:__ducklake_metadata_[database_name]' AS dl1;
```
- Init SQL for a self-managed DuckLake catalog (local metadata DB) with S3 data path:
```sql
-- Replace the path to your DuckLake metadata DB and bucket prefix
ATTACH 'ducklake:/duckdb/my_ducklake_metadata.ducklake' AS dl1 (
DATA_PATH 's3://my_bucket/lake/'
);
```
Once attached, reference tables with the alias, for example: `FROM dl1.my_table`.
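The two `ATTACH` forms above differ only in the catalog locator and an optional `DATA_PATH`. As a hedged sketch, a small helper can render either statement (the function name and parameters are illustrative; only the generated SQL mirrors the examples above):

```python
# Illustrative helper that renders the Init SQL shown above for either a
# MotherDuck-managed or a self-managed DuckLake catalog.
def ducklake_attach_sql(catalog, alias="dl1", managed=True, data_path=None):
    if managed:
        # MotherDuck-managed: the metadata catalog is hosted in MotherDuck
        return f"ATTACH 'ducklake:md:__ducklake_metadata_{catalog}' AS {alias};"
    # Self-managed: local metadata DB, optionally with an object-store data path
    options = f" (DATA_PATH '{data_path}')" if data_path else ""
    return f"ATTACH 'ducklake:{catalog}' AS {alias}{options};"

print(ducklake_attach_sql("sales"))
# ATTACH 'ducklake:md:__ducklake_metadata_sales' AS dl1;
```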
### Connecting to a Local DuckDB database
To connect to a local DuckDB database:
1. **Database file**: Enter the full path to your DuckDB file (e.g., `/path/to/database.db`)
2. **Configuration**: Enable `old_implicit_casting` (recommended) to ensure proper datetime filtering
3. **Additional settings**:
   - **Read only**: Toggle as appropriate for your use case
   - **Naming strategy**: Choose your preferred table/field naming strategy
:::note
DuckDB's concurrency model allows either a single process with read-write access, or multiple processes with read-only access, but not both at the same time. This means you cannot open a local DuckDB file in read-only mode in one process and in read-write mode in another.
:::

## Configuration Best Practices
- **Connection pooling**: For production instances, set an appropriate connection pool size based on expected concurrent users
- **Query timeouts**: Configure timeouts in Metabase settings to prevent long-running queries from affecting system performance
- **Data access**: Use database-level permissions in Metabase to control who can access which data sources
## Troubleshooting
| Issue | Solution |
|-------|----------|
| Driver not detected | Ensure driver is in the correct plugins directory and Metabase has been restarted |
| Connection failures | Verify database path (local) or database name and token (MotherDuck) |
| Permission errors | Check file permissions for local databases |
| Datetime filtering issues | Enable `old_implicit_casting` in the connection settings |
| `Add MotherDuck token in the connection string` error | Provide a valid MotherDuck token and make sure the database name follows the `md:` prefix |
### Updating the MotherDuck token
Metabase keeps long-lived database connections alive. When you update only the MotherDuck token while an existing connection is still cached, Metabase raises `Connection error: Can't open a connection to same database file with a different configuration than existing connections`.
Use one of the following approaches to refresh the token successfully:
1. **Add a cache buster while editing the database.** Edit the connection under **Admin Settings** > **Databases**, then update both the **Database file** field and the **MotherDuck Token** field with a small cache-busting change (for example, append `?refresh=20250917`). Updating both values at the same time forces Metabase to treat the configuration as new. Save the connection, then optionally revert the fields to their clean values once the change is persisted.
2. **Restart Metabase before updating the token.** Restart the Metabase service and, immediately after it starts, go straight to `/admin/databases` to update the token field. Do not open the Metabase home screen before editing the database connection, or the previous connection (with the old token) will be re-established.
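The cache-busting edit in option 1 is just a throwaway query suffix appended to the Database file value. A minimal sketch (the helper is illustrative; Metabase only sees the edited field value):

```python
# Illustrative: append a date-stamped suffix so Metabase treats the
# connection configuration as new, forcing a fresh connection.
from datetime import date

def cache_busted(database_file: str) -> str:
    return f"{database_file}?refresh={date.today():%Y%m%d}"

print(cache_busted("md:my_db"))  # e.g. md:my_db?refresh=20250917
```

After the change persists, you can revert the field to its clean value.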

---
Source: https://motherduck.com/docs/integrations/bi-tools/powerbi
---
sidebar_position: 6
sidebar_label: Microsoft Power BI
title: Power BI with DuckDB and MotherDuck
---
[Power BI](https://www.microsoft.com/en-us/power-platform/products/power-bi) is an interactive data visualization product developed by Microsoft. MotherDuck has built an open-source [DuckDB Power Query Connector](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector) that you can use to connect Power BI to DuckDB and MotherDuck.
## Installing
1. Download the latest [DuckDB ODBC driver for Windows](https://github.com/duckdb/duckdb-odbc/releases/latest) that matches your CPU architecture (for most machines this is `duckdb_odbc-windows-amd64.zip`; use `duckdb_odbc-windows-arm64.zip` on ARM devices). For more information about the Windows ODBC Driver, see the [DuckDB Docs page on DuckDB ODBC API on Windows](https://duckdb.org/docs/stable/clients/odbc/windows).
2. Extract the `.zip` archive. Run `odbc_install.exe` - if Windows displays a security warning, click "More information" then "Run Anyway".
3. Optionally, verify the installation in the Registry Editor:
- Open Registry Editor by running `regedit`
- Navigate to `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\DuckDB`
- Confirm the Driver field shows your installed version
- If incorrect, delete the `DuckDB` registry key and reinstall
4. Configure Power BI security settings to allow loading of custom extensions:
- Go to File -> Options and settings -> Options -> Security -> Data Extensions
- Enable "Allow any extensions to load without validation or warning"
5. Download the latest version of the DuckDB Power Query extension:
- [duckdb-power-query-connector.mez](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/releases/latest/download/duckdb-power-query-connector.mez)
6. Create the Custom Connectors directory if it does not yet exist:
- Navigate to `[Documents]\Power BI Desktop\Custom Connectors`
- Create this folder, if it doesn't exist
- Note: If this location does not work you may need to place this in your OneDrive Documents folder instead
7. Copy the `duckdb-power-query-connector.mez` file into the Custom Connectors folder
8. Restart Power BI Desktop
## How to use with Power BI
1. In Power BI Desktop, click "Get Data" -> "More..."
2. Search for "DuckDB" in the connector search box and select the DuckDB connector

3. For MotherDuck connections, you'll need to provide:
- Database Location: Use the `md:` prefix followed by your database name (e.g., `md:my_database`). This can also be a local file path (e.g., `~\my_database.db`)
- MotherDuck Token: Get your token from [MotherDuck's token page](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token)
- Read Only (Optional): Set to `true` if you only need read access

4. Click "OK".
5. Click "Connect".

6. Select the table(s) you want to import. Click "Load".

7. You can now query your data and create visualizations!

8. After connecting, you can:
- Browse and select tables from your MotherDuck or DuckDB database
- Use "Transform Data" to modify your queries before loading
- Write custom SQL queries using the "Advanced Editor"
- Import multiple tables in one go
9. Power BI will maintain the connection to your MotherDuck or DuckDB database, allowing you to:
- Refresh data automatically or on-demand
- Create relationships between tables
- Build visualizations and dashboards
- Share reports with other users (requires proper gateway setup)
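The Database Location field in step 3 accepts either form. This tiny sketch (the helper is purely illustrative) captures the distinction the connector makes:

```python
# Illustrative: the md: prefix selects a MotherDuck database; anything
# else is treated as a local DuckDB file path.
def classify_database_location(location: str) -> str:
    if location.startswith("md:"):
        return "motherduck"   # a MotherDuck token is required in the next field
    return "local file"       # e.g. ~\my_database.db

print(classify_database_location("md:my_database"))       # motherduck
print(classify_database_location(r"~\my_database.db"))    # local file
```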
## Use custom data connectors with an on-premises data gateway
You can use custom data connectors with an on-premises data gateway to connect to data sources that are not supported by default. To do this, you need to install the on-premises data gateway and configure it to use the custom data connector. For more information, see [Use custom data connectors with an on-premises data gateway in Power BI](https://learn.microsoft.com/en-us/power-bi/connect-data/service-gateway-custom-connectors).
Note that there are some limitations when using a custom connector with an on-premises data gateway:
- Make sure the folder you create is accessible to the background gateway service. Typically, folders under your users' Windows folders or system folders aren't accessible. The on-premises data gateway app shows a message if the folder isn't accessible. This limitation doesn't apply to the on-premises data gateway (personal mode).
- If your custom connector is on a network drive, include the fully qualified path in the on-premises data gateway app.
- You can only use one custom connector data source when working in DirectQuery mode. Multiple custom connector data sources don't work with DirectQuery.
## Additional information
- [Power BI Documentation](https://learn.microsoft.com/en-us/power-bi/connect-data/)
- [DuckDB Power Query Connector](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector)
## Troubleshooting
### Missing VCRUNTIME140.dll
If you receive an error about missing `VCRUNTIME140.dll`, you need to install the Microsoft Visual C++ Redistributable. You can download it from [Microsoft's download page](https://www.microsoft.com/en-us/download/details.aspx?id=52685).
### Visual C++ and ODBC Issues
:::note
These steps are particularly relevant for Windows Server environments, especially for Windows Server 2019, but may also help resolve issues on other Windows versions.
:::
If you encounter issues with ODBC connectivity or receive errors related to Visual C++ libraries, try these troubleshooting steps:
1. Reinstall the Microsoft Visual C++ Redistributable:
- Download the latest version from [Microsoft's official website](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170) for your architecture
- Run the installer with administrator privileges
- Restart your computer after installation
- Try connecting to MotherDuck again
2. If you're still experiencing issues, you can use the ODBC Test tool to diagnose the connection:
- Open the ODBC Test tool (typically available in Windows SDK)
- Look for a dropdown menu labeled "hstmt 1: ..."
- Select this option to run test queries
- If queries work in the ODBC Test tool but not in Power BI, this indicates a Power BI-specific configuration issue
If you continue to experience problems after trying these steps, please:
- Verify that your MotherDuck token is valid and hasn't expired
- Check that your network allows connections to MotherDuck's services
- Ensure you have the latest version of the DuckDB Power Query Connector installed
If you're still experiencing issues, please reach out to us at [support@motherduck.com](mailto:support@motherduck.com) and we'll be happy to help you troubleshoot the issue.
---
Source: https://motherduck.com/docs/integrations/bi-tools/superset-preset
---
sidebar_position: 4
title: Superset & Preset
---
[Apache Superset](https://superset.apache.org/) is a powerful, open-source data exploration and visualization platform designed to be intuitive and interactive. It allows data professionals to quickly integrate and analyze data from various sources, creating insightful dashboards and charts for better decision making.
[Preset](https://preset.io/) is a cloud-native, user-friendly platform built on Apache Superset. It offers enhanced capabilities and managed services to leverage the power of Superset without needing to handle installation and maintenance.
In this guide, we'll cover how you can use MotherDuck with either Superset or Preset.
## Superset
### Setup
The easiest way to get started locally with Superset is to use their [docker-compose configurations](https://superset.apache.org/docs/installation/installing-superset-using-docker-compose/).
### Adding a database connection to MotherDuck
To make Superset work with DuckDB and MotherDuck, you will have to install three extra Python packages in your local Superset environment:
- DuckDB SQLAlchemy driver [duckdb-engine](https://github.com/Mause/duckdb_engine)
- DuckDB [duckdb](https://github.com/duckdb/duckdb)
- Flask AppBuilder [flask_appbuilder](https://github.com/dpgaspar/Flask-AppBuilder)
1. First, you will have to clone the [Superset repository](https://github.com/apache/superset):
```bash
git clone https://github.com/apache/superset.git
```
2. Then create a new file in `superset/docker/requirements-local.txt` and add the following packages:
```
duckdb-engine==0.17.0
duckdb==1.2.2
flask_appbuilder==4.6.3
```
3. Then build or run the Docker container, depending on whether this is the first time you run it, with the following command:
```bash
# First time running it
docker-compose up --build
# Subsequent runs
docker-compose up
```
4. Once the container is running, you can access the Superset UI at [http://localhost:8088](http://localhost:8088) or at the address you specified in the `docker-compose.yml` file.
5. Once you are logged in, head over to "Settings" and click on "Database Connections"
6. Click on "+ Database"
7. In the Dropdown, pick "MotherDuck"
:::note
If MotherDuck isn't listed, there's probably an error in the installation of `duckdb-engine`. Review the installation steps under (2) to install this extra Python package.
:::
8. Enter the database name that you want to connect to and the MotherDuck token of the user or service account that you want to use to connect to MotherDuck
:::info
`Database name` is **optional**. Instead of specifying a database name, you can leave it empty to connect to all databases.
:::
9. Finally, you can test your token/connection is valid by clicking "Test connection" and click "Connect".
Now your MotherDuck database is available in Superset and you can start querying data and making some dashboards!
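If you prefer configuring the connection with a SQLAlchemy URI instead of the dropdown (for example in automated setups), `duckdb-engine` accepts a `duckdb:///` URI with the `md:` path and the token passed as a query parameter. A hedged sketch, assuming the token lives in an environment variable named `MD_TOKEN` (an assumption, not a Superset convention):

```python
# Sketch: build the SQLAlchemy URI that duckdb-engine accepts for MotherDuck.
# MD_TOKEN is an assumed environment-variable name; adjust to your setup.
import os

def motherduck_uri(database: str = "", token_env: str = "MD_TOKEN") -> str:
    token = os.environ.get(token_env, "")
    # An empty database name connects to all databases, as noted above.
    return f"duckdb:///md:{database}?motherduck_token={token}"

os.environ["MD_TOKEN"] = "example-token"
print(motherduck_uri("my_db"))
# duckdb:///md:my_db?motherduck_token=example-token
```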
## Preset
### Setup
You can register a Preset account for [free](https://preset.io/pricing/) (up to 5 users).
Upon your account creation, you will need to create a workspace and be prompted to connect to your data source.
### Adding your first database connection to MotherDuck
When you first set up Preset, you will be prompted to create a connection to a database. Preset has a direct integration with MotherDuck, making the connection process simpler.
1. In the Database Connection Dropdown in "Connect your first database", you can now select "MotherDuck" directly
2. Enter your MotherDuck credentials and database information
3. Click "Connect" to verify your connection is valid.
:::note
The Database Name needs to be prefixed with `md:` to connect to MotherDuck.
The Access Token is the token you created in the [MotherDuck dashboard](https://app.motherduck.com).
:::
Now your MotherDuck database is available in Preset and you can start creating dashboards immediately!
:::info
You can connect to multiple databases using a single MotherDuck connection.
:::
### Adding additional database connections
When adding more database connections to Preset, you can choose the option of "Get MotherDuck token". This will generate a new token from the MotherDuck account you are currently logged into.
1. Add a database connection by going to "Settings", then "Database Connections"
2. In the Database Connections page, click on "+ Database" in the top right corner
3. In the dropdown, select "MotherDuck" (see above)
4. Enter your MotherDuck credentials and database information
1. Here you have the option to generate a new token via the `Get MotherDuck token` button or use a token you previously created.
:::caution
Because BI tools such as Preset and Superset usually connect through service accounts, we recommend the "Get MotherDuck token" option only for testing, not for production systems.
For production systems the recommended approach is to generate an access token for the dedicated service account using the MotherDuck REST API and connect this account to Preset instead.
:::
---
Source: https://motherduck.com/docs/integrations/bi-tools/tableau
---
sidebar_position: 5
title: Tableau
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
[Tableau](https://www.tableau.com/) is a widely-used business intelligence and data visualization platform that enables data analysts to build interactive dashboards and reports. MotherDuck supports both Tableau Cloud (via the Tableau Cloud Bridge) and Tableau Server.
## How to use Tableau Cloud with MotherDuck via Tableau Bridge
### Setup
This guide assumes you have:
- a [Tableau account](https://www.tableau.com/)
- a Tableau Cloud Site
- a Tableau Desktop installation (with the same version as the Tableau Cloud Server Version) set up with the DuckDB JDBC Driver and Tableau Connector.
If you don't, sign up for a free trial or ask your organization to purchase a plan.
### Obtain a PAT token
Follow [Tableau's instructions on creating a personal access token (PAT)](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm). This token must belong to a site admin.
### Setup Bridge client
Use the instructions [here](https://help.tableau.com/current/online/en-us/to_bridge_client.htm) to install and set up the Bridge client.
1. Make sure the machine where the Bridge client is installed has access to the Database used in the above steps.
Important notes:
> Network access - Because Bridge facilitates connections between your private network data and Tableau Cloud, it requires the ability to make outbound connections through the internet. After the initial outbound connection, communication is bidirectional.
> Required ports - Tableau Bridge uses port 443 to make outbound internet requests to Tableau Cloud and port 80 for certificate validation.
2. Install the Bridge client and make sure it is logged in to your Tableau Cloud site. You can download the installer [here](https://www.tableau.com/support/releases/bridge).
3. Install the driver and taco files as outlined [here](https://help.tableau.com/current/online/en-us/to_sync_local_data.htm#connectors-and-data-types).
- [Windows Server] The driver also needs to be installed here: `C:\Program Files\Tableau\Tableau Bridge\Drivers`
- [Windows Server] The connector also needs to be installed here: `C:\Program Files\Tableau\Connectors`
> Note: Tableau Bridge can be deployed on both Windows or Linux.
### Running Bridge on Linux using Docker (advanced)
If you want to run Bridge centrally on a Linux host, the official guidance recommends running it inside a Docker container, as described in Tableau’s documentation on [installing Bridge for Linux in containers](https://help.tableau.com/current/online/en-us/to_bridge_linux_install.htm).
Below is an **example Dockerfile** you can use as a starting point; it shows where to add JDBC drivers and the **DuckDB/MotherDuck** `.taco` file. It is provided for inspiration and may require updates to match your environment or newer versions of the software.
Example Dockerfile
```dockerfile
FROM registry.access.redhat.com/ubi8/ubi:latest
RUN yum update -y
RUN yum install -y glibc-langpack-en
# This is the latest version of Tableau Bridge that is known working with the MotherDuck connector
RUN curl -o /tmp/TableauBridge.rpm -L \
https://downloads.tableau.com/tssoftware/TableauBridge-20243.25.0114.1153.x86_64.rpm && \
ACCEPT_EULA=y yum install -y /tmp/TableauBridge.rpm && \
rm /tmp/TableauBridge.rpm
# Drivers
RUN mkdir -p /opt/tableau/tableau_driver/jdbc
# Connectors (tacos)
RUN mkdir -p /root/Documents/My_Tableau_Bridge_Repository/Connectors
# Download DuckDB JDBC driver and signed taco
RUN curl -o /opt/tableau/tableau_driver/jdbc/duckdb_jdbc-1.3.0.0.jar \
-L https://repo1.maven.org/maven2/org/duckdb/duckdb_jdbc/1.3.0.0/duckdb_jdbc-1.3.0.0.jar && \
curl -o /root/Documents/My_Tableau_Bridge_Repository/Connectors/duckdb_jdbc-v1.1.1-signed.taco \
-L https://github.com/motherduckdb/duckdb-tableau-connector/releases/download/v1.1.1/duckdb_jdbc-v1.1.1-signed.taco
ENV TZ=Europe/Berlin
ENV LC_ALL=en_US.UTF-8
# ----- user specific settings -----
ENV USER_EMAIL=""
ENV PAT_ID=BridgeToken
ENV CLIENT_NAME=""
ENV SITE_NAME=""
ENV POOL_ID=""
# -----------------------------------
CMD /opt/tableau/tableau_bridge/bin/run-bridge.sh -e \
--patTokenId=$PAT_ID \
--userEmail=$USER_EMAIL \
--client=$CLIENT_NAME \
--site=$SITE_NAME \
--patTokenFile="/home/documents/token.txt" \
--poolId=$POOL_ID
```
Key points:
* Build an image that **installs the Bridge RPM** and then copies the DuckDB JDBC driver to `/opt/tableau/tableau_driver/jdbc` and the connector to `/root/Documents/My_Tableau_Bridge_Repository/Connectors`.
* Start the bridge by calling `run-bridge.sh` and pass the following flags:
* `--patTokenFile /run/secrets/pat.json`
  * `--patTokenId`
  * `--site`
  * `--poolId` (optional – see note on pools below)
* **PAT naming rule** – the *name* you give the Personal Access Token in Tableau **must** be a valid JSON key and must be used **verbatim**:
  1. as the key in `pat.json` → `{"<pat-name>": "<pat-secret>"}`
  2. in the `--patTokenId` flag.
A mismatch will result in a silent authentication failure.
* The latest Bridge **2025.1** builds contain a regression that prevents the MotherDuck connector (and several others) from loading. Until Tableau fixes this, pin the image to the **20243.25.0114.1153** release (see discussion [here](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/issues/22)).
* Bridge listens only on outbound **443/tcp**, so you do **not** need to publish any container ports. If you run a host firewall (e.g. `ufw`) remember that Docker bypasses it [[Docker docs](https://docs.docker.com/engine/network/packet-filtering-firewalls/#docker-and-ufw)]. Restrict egress traffic to Tableau Cloud CIDR blocks if your security policy requires it.
* Logs written to `stdout` are useful, but the *detailed* logs live in `/root/Documents/My_Tableau_Bridge_Repository/Log`. Mount this path as a volume or use a side-car to ship the logs to your observability stack.
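The PAT naming rule can be enforced mechanically: derive the `--patTokenId` value from the same name used as the JSON key, so the two can never diverge. A hedged sketch (the helper is illustrative; only the one-key JSON layout of the token file comes from the notes above):

```python
# Illustrative: write the Bridge PAT file and return the matching
# --patTokenId value from the same name, keeping the two in sync.
import json, os, tempfile

def write_pat_file(path: str, pat_name: str, pat_secret: str) -> str:
    # The file maps the PAT *name* (verbatim) to its secret value.
    with open(path, "w") as f:
        json.dump({pat_name: pat_secret}, f)
    return pat_name  # pass this exact value as --patTokenId

path = os.path.join(tempfile.mkdtemp(), "pat.json")
token_id = write_pat_file(path, "BridgeToken", "example-secret")
print(token_id)  # BridgeToken
```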
### Tableau Cloud Bridge Pool setup
By default, Tableau places the Bridge in the default pool.
1. In Settings → Bridge page, make sure the Bridge client is connected in the connection Status.
2. In the "Private Network Allowlist" add the domain of the database and select the pool.
> **Pool Gotcha**: Some users report that a Linux containerised Bridge never shows up under a custom site pool. If that happens, simply leave `POOL_ID` blank when starting the client – it will join the legacy **Default** pool and still work with live connections.
### Create Embedded Data Source (Live) and Workbook
1. Open Tableau desktop and login to a Tableau Cloud site.
> Note: Make sure the Tableau Desktop and [Tableau Cloud version](https://help.tableau.com/current/server/en-us/version_server_view.htm) match.
2. Create new Workbook and select the database connector.
3. Connect to the database.
4. Setup Datasource to use live connectivity.
5. Create a worksheet with the data.
### Publish the Workbook to Tableau Cloud
1. Click on "Server > Publish Workbook".
2. Select "Publish Separately" under Publish Type and "Embedded password" under Authentication. Select "Maintain connection to a live data source".
3. Click "Publish Workbook & 1 Data Source".
### (Important step!) Update Tableau Bridge client in data source
1. Navigate to the newly published data source in Tableau Cloud (in your browser) and click on the "i" icon to open Data Source Details.
2. Click on "Change Bridge Client..."
3. Change the bridge client from "Site client pool" to your bridge client (the one you set up in the previous section). Click "Save" and close the dialog.
4. Check that the data source now shows up in your Tableau Bridge status dialog. This dialog is located in the Windows Start bar (in the Icon panel).
5. You can now access your Published Workbook on your Tableau Cloud Site, or you can create a new Tableau Workbook using the Published Data Source.
## Tableau Desktop DuckDB/MotherDuck Setup
1. Download a [recent version of the DuckDB JDBC driver](https://repo1.maven.org/maven2/org/duckdb/duckdb_jdbc/) and copy it into the Tableau Drivers directory:
* MacOS: `~/Library/Tableau/Drivers/`
* Windows: `C:\Program Files\Tableau\Drivers`
* Linux: `/opt/tableau/tableau_driver/jdbc`
2. Download the signed tableau connector (aka "Taco file") file from the [latest available release](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/releases) and copy it into the Connectors directory:
* Desktop Windows: `C:\Users\[YourUser]\Documents\My Tableau Repository\Connectors`
* Desktop MacOS: `/Users/[YourUser]/Documents/My Tableau Repository/Connectors`
* Server Windows: `C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\Connectors`
* Server Linux: `[Your Tableau Server Install Directory]/data/tabsvc/vizqlserver/Connectors`
## Connecting
Once the Taco is installed, and you have launched Tableau, you can create a new connection by choosing "DuckDB by MotherDuck":

### Local DuckDB database
If you wish to connect to a local DuckDB database, select "Local file" as DuckDB Server option, and use the file picker:


### In-Memory Database
The driver can be used with an in-memory database by selecting the `In-memory database` DuckDB Server option.

The data will then need to be provided by an Initial SQL string e.g.,
```sql
CREATE VIEW my_parquet AS
SELECT *
FROM read_parquet('/path/to/file/my_file.parquet');
```
You can then access it by using the Tableau Data Source editing controls.
### MotherDuck
To connect to MotherDuck, you have two authentication options:
* Token -- provide the value that you [get from MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token).
* No Authentication -- unless the `motherduck_token` environment variable is available to Tableau at startup, you will be prompted to authenticate at connection time.
To work with a MotherDuck database in Tableau, you have to provide the database to use when issuing queries.
In `MotherDuck Database` field, provide the name of your database. You don't have to prefix it with `md:`:


## Additional information
* [Tableau Documentation](https://help.tableau.com/current/pro/desktop/en-us/gettingstarted_overview.htm)
* [Tableau Exchange Connector DuckDB/MotherDuck](https://exchange.tableau.com/en-gb/products/1021)
* [DuckDB Tableau Connector](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/)
---
Source: https://motherduck.com/docs/integrations/cloud-storage/amazon-s3
---
sidebar_position: 1
title: Amazon S3
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
## Configure Amazon S3 credentials
You can safely store your Amazon S3 credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command. Secrets are scoped to your user account and are not shared with other users in your organization.
### Create a SECRET object
```sql
-- to configure a secret manually:
CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 'access_key',
SECRET 'secret_key',
REGION 'us-east-1',
SCOPE 's3://my-bucket-path'
);
```
:::note
When creating a secret using the `CONFIG` (default) provider, be aware that the credential might be temporary. If so, a `SESSION_TOKEN` field also needs to be set for the secret to work correctly.
:::
```sql
-- to store a secret configured through `aws configure`:
CREATE SECRET aws_secret IN MOTHERDUCK (
TYPE S3,
PROVIDER credential_chain
);
```
```sql
-- test the s3 credentials
SELECT count(*) FROM 's3:///';
```
```python
import duckdb
con = duckdb.connect('md:')
con.sql("CREATE SECRET IN MOTHERDUCK (TYPE S3, KEY_ID 'access_key', SECRET 'secret_key', REGION 'your_bucket_region')");
# testing that our s3 credentials work
con.sql("SELECT count(*) FROM 's3:///'").show()
# 42
```
Click on your profile to access the `Settings` panel and click on `Secrets` menu.


Then click on `Add secret` in the secrets section.

You will then be prompted to enter your Amazon S3 credentials.

You can update your secret by executing [CREATE OR REPLACE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command to overwrite your secret.
### Delete a SECRET object
You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command.
```sql
DROP SECRET ;
```
Click on your profile and access the `Settings` menu. Click on the bin icon to delete your current secrets.

### Amazon S3 credentials as **temporary** secrets
MotherDuck supports DuckDB syntax for providing S3 credentials.
```sql
CREATE SECRET (
TYPE S3,
KEY_ID 's3_access_key',
SECRET 's3_secret_key',
REGION 'us-east-1'
);
```
:::note
Local/In-memory secrets are not persisted across sessions.
:::
## Troubleshooting
For detailed troubleshooting steps, see our [AWS S3 Secrets Troubleshooting](/documentation/troubleshooting/aws-s3-secrets.md) guide.
---
Source: https://motherduck.com/docs/integrations/cloud-storage/azure-blob-storage
---
sidebar_position: 1
title: Azure Blob Storage
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
## Configure Azure Blob Storage Credentials
You can safely store your Azure Blob Storage credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command.
:::note
See [Azure docs](https://learn.microsoft.com/en-gb/azure/storage/common/storage-configure-connection-string#configure-a-connection-string-for-an-azure-storage-account) to find the correct connection string format.
:::
### Create a SECRET object
```sql
-- to configure a secret manually:
CREATE SECRET IN MOTHERDUCK (
TYPE AZURE,
CONNECTION_STRING '[your_connection_string]'
);
```
```sql
-- to store a secret configured through `az configure`:
CREATE SECRET az_secret IN MOTHERDUCK (
TYPE AZURE,
PROVIDER credential_chain,
ACCOUNT_NAME 'some-account'
);
```
```sql
-- test the azure credentials
SELECT count(*) FROM 'azure://[container]/[file]'
SELECT * FROM 'azure://[container]/*.csv';
```
```python
import duckdb
con = duckdb.connect('md:')
con.sql("CREATE SECRET IN MOTHERDUCK (TYPE AZURE, CONNECTION_STRING '[your_connection_string]')");
# testing that our Azure credentials work
con.sql("SELECT count(*) FROM 'azure://[container]/[file]'").show()
con.sql("SELECT * FROM 'azure://[container]/*.csv'").show()
```
Click on your profile to access the `Settings` panel and click on `Secrets` menu.


Then click on `Add secret` in the secrets section.

You will then be prompted to enter your Azure Blob Storage credentials.

### Delete a SECRET object
You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command.
```sql
DROP SECRET ;
```
Click on your profile and access the `Settings` menu. Click on the bin icon to delete the secret.

### Azure credentials as **temporary** secrets
MotherDuck supports DuckDB syntax for providing Azure credentials.
```sql
CREATE SECRET (
TYPE AZURE,
CONNECTION_STRING '[your_connection_string]'
);
```
Or, if your credentials are stored in the `az` CLI:
```sql
CREATE SECRET az_secret (
TYPE AZURE,
PROVIDER credential_chain,
ACCOUNT_NAME 'some-account'
);
```
:::note
Local/In-memory secrets are not persisted across sessions.
:::
---
Source: https://motherduck.com/docs/integrations/cloud-storage/cloudflare-r2
---
sidebar_position: 1
title: Cloudflare R2
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
## Configure Cloudflare R2 credentials
You can safely store your Cloudflare R2 credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command.
:::note
See [Cloudflare docs](https://developers.cloudflare.com/r2/api/s3/tokens/) to create a Cloudflare access token.
:::
### Create a SECRET object
```sql
CREATE SECRET IN MOTHERDUCK (
TYPE R2,
KEY_ID 'your_key_id',
SECRET 'your_secret_key',
ACCOUNT_ID 'your_account_id'
);
```
:::note
The account_id can be found when generating the API token on the endpoint URL `https://.r2.cloudflarestorage.com`
:::
```sql
-- test the R2 credentials
SELECT count(*) FROM 'r2://[bucket]/[file]'
```
```python
import duckdb
con = duckdb.connect('md:')
con.sql("CREATE SECRET IN MOTHERDUCK ( TYPE R2, KEY_ID 'your_key_id', SECRET 'your_secret_key', ACCOUNT_ID 'your_account_id' )");
# testing that our R2 credentials work
con.sql("SELECT count(*) FROM 'r2://[bucket]/[file]'").show()
```
Click on your profile to access the `Settings` panel and click on the `Secrets` menu.


Then click on `Add secret` in the secrets section.

Select the Secret Type `R2` and fill in the required fields.
### Delete a SECRET object
You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command.
```sql
DROP SECRET ;
```
Click on your profile and access the `Settings` menu. Click on the bin icon to delete the secret.

### R2 credentials as **temporary** secrets
MotherDuck supports DuckDB syntax for providing R2 credentials.
```sql
CREATE SECRET (
TYPE R2,
KEY_ID 'your_key_id',
SECRET 'your_secret_key',
ACCOUNT_ID 'your_account_id'
);
```
:::note
Local/In-memory secrets are not persisted across sessions.
:::
---
Source: https://motherduck.com/docs/integrations/cloud-storage/google-cloud-storage
---
sidebar_position: 1
title: Google Cloud Storage
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
With MotherDuck, you can access files in a private Google Cloud Storage (GCS) bucket. This leverages GCS's S3-compatible connection.
## Google Cloud Storage Connection Process
1. Create an [HMAC key](https://docs.cloud.google.com/storage/docs/authentication/hmackeys) for the service account: Cloud Storage → Settings → Interoperability → Create a key for a service account
2. Save the Access ID and Secret (shown once)
3. Create the DuckDB secret using the HMAC credentials as described below
## Configure Google Cloud Storage credentials
You can safely store your Google Cloud Storage credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command.
### Create a SECRET object
```sql
CREATE SECRET IN MOTHERDUCK (
TYPE GCS,
KEY_ID 'HMAC_ACCESS_ID',
SECRET 'HMAC_SECRET'
);
-- test GCS credentials
SELECT count(*) FROM 'gcs:///';
```
```python
import duckdb
con = duckdb.connect('md:')
con.sql("CREATE SECRET IN MOTHERDUCK (TYPE GCS, KEY_ID 'access_key', SECRET 'secret_key')");
# test GCS
con.sql("SELECT count(*) FROM 'gcs:///'").show()
# 42
```
Click on your profile to access the `Settings` panel and click on the `Secrets` menu.


Then click on `Add secret` in the secrets section.

You will then be prompted to enter your Google Cloud Storage credentials.

You can update your secret by executing [CREATE OR REPLACE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command to overwrite your secret.
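For example, rotating the HMAC key pair in place might look like the following (the key values are placeholders for your own credentials):

```sql
CREATE OR REPLACE SECRET IN MOTHERDUCK (
    TYPE GCS,
    KEY_ID 'NEW_HMAC_ACCESS_ID',
    SECRET 'NEW_HMAC_SECRET'
);
```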
### Delete a SECRET object
You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command.
```sql
DROP SECRET ;
```
Click on your profile and access the `Settings` menu. Click on the bin icon to delete your current secrets.

### Google Cloud Storage credentials as **temporary** secrets
MotherDuck supports DuckDB syntax for providing GCS credentials.
```sql
CREATE SECRET (
TYPE GCS,
KEY_ID 'HMAC_ACCESS_ID',
SECRET 'HMAC_SECRET'
);
```
:::note
Local/In-memory secrets are not persisted across sessions.
:::
## Additional resources
- [Using the S3 compatible connection in GCS](https://docs.cloud.google.com/storage/docs/aws-simple-migration)
- [HMAC Keys in Google Cloud](https://docs.cloud.google.com/storage/docs/authentication/hmackeys)
---
Source: https://motherduck.com/docs/integrations/cloud-storage/hetzner-object-storage
---
sidebar_position: 5
title: Hetzner Object Storage
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
## Configure Hetzner Object Storage credentials
You can safely store your Hetzner Object Storage credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command.
:::note
See [Hetzner docs](https://docs.hetzner.com/storage/object-storage/getting-started/generating-s3-keys/) to create S3 access keys. Make sure to save your secret key immediately as it cannot be viewed again after creation.
:::
### Create a SECRET object
```sql
CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 'your_access_key', -- provided by Hetzner
SECRET 'your_secret_key', -- provided by Hetzner
ENDPOINT 'fsn1.your-objectstorage.com', -- provided by Hetzner
SCOPE 'your_bucket_scope' -- Example: s3://test-bucket
);
```
:::note
The endpoint must include the location (e.g., fsn1, nbg1, or hel1). Available endpoints:
- `fsn1.your-objectstorage.com` (Falkenstein)
- `nbg1.your-objectstorage.com` (Nuremberg)
- `hel1.your-objectstorage.com` (Helsinki)
:::
```sql
-- test the Hetzner Object Storage credentials
SELECT count(*) FROM 's3://[bucket]/[file]'
```
```python
import duckdb
con = duckdb.connect('md:')
con.sql("CREATE SECRET IN MOTHERDUCK ( TYPE S3, KEY_ID 'your_access_key', SECRET 'your_secret_key', ENDPOINT 'fsn1.your-objectstorage.com', SCOPE 'your_bucket_scope' )");
# testing that our Hetzner credentials work
con.sql("SELECT count(*) FROM 's3://[bucket]/[file]'").show()
```
Click on your profile to access the `Settings` panel and click on the `Secrets` menu.


Then click on `Add secret` in the secrets section.

Select the Secret Type `S3` and fill in the required fields. Make sure to add the endpoint URL (e.g., `fsn1.your-objectstorage.com`) in the endpoint field.
### Delete a SECRET object
You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command.
```sql
DROP SECRET ;
```
Click on your profile and access the `Settings` menu. Click on the bin icon to delete the secret.

### Hetzner Object Storage credentials as temporary secrets
MotherDuck supports DuckDB syntax for providing Hetzner Object Storage credentials.
```sql
CREATE SECRET (
TYPE S3,
KEY_ID 'your_access_key',
SECRET 'your_secret_key',
ENDPOINT 'fsn1.your-objectstorage.com',
SCOPE 'your_bucket_scope'
);
```
:::note
Local/In-memory secrets are not persisted across sessions.
:::
### Multiple locations configuration
If you have buckets in different Hetzner locations, create a scoped secret for each:
```sql
-- Secret for Falkenstein location
CREATE SECRET hetzner_fsn1 IN MOTHERDUCK (
TYPE S3,
KEY_ID 'access_key_1',
SECRET 'secret_key_1',
ENDPOINT 'fsn1.your-objectstorage.com',
SCOPE 's3://my-bucket-fsn1'
);
-- Secret for Nuremberg location
CREATE SECRET hetzner_nbg1 IN MOTHERDUCK (
TYPE S3,
KEY_ID 'access_key_2',
SECRET 'secret_key_2',
ENDPOINT 'nbg1.your-objectstorage.com',
SCOPE 's3://my-bucket-nbg1'
);
```
:::tip
By default, each key pair is automatically valid for every bucket within the same Hetzner project. Use bucket policies to restrict access if needed.
:::
---
Source: https://motherduck.com/docs/integrations/cloud-storage/index
---
title: Cloud Storage
description: Use MotherDuck with your favorite cloud storage services
---
import DocCardList from '@theme/DocCardList';
# Cloud Storage
MotherDuck integrates with popular cloud storage services to help you manage and store your data.
---
Source: https://motherduck.com/docs/integrations/cloud-storage/tigris
---
sidebar_position: 5
title: Tigris
---
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
With MotherDuck, you can access files in a private Tigris bucket. Tigris is a globally distributed S3-compatible object storage service that provides low latency anywhere in the world.
## Tigris Requirements
To get started using Tigris with MotherDuck, you need to:
1. Create a new bucket at [storage.new](https://storage.new) if you don't have one
2. Create an access keypair for that bucket at [storage.new/accesskey](https://storage.new/accesskey)
3. Configure MotherDuck to use Tigris
4. Query files in Tigris
When creating a bucket, you can select from different storage tiers:
- Standard (default) - Best for general use cases
- Infrequent Access - Cheaper than Standard, but charges per gigabyte of retrieval
- Instant Retrieval Archive - For long-term storage with urgent access needs
- Archive - For long-term storage where retrieval time is not critical
## Configure Tigris credentials
### Create a SECRET object
:::note
If you are using multiple secrets, the `SCOPE` parameter will make sure MotherDuck knows which one to use. You can validate which secret to use with [`which_secret`](https://duckdb.org/docs/stable/configuration/secrets_manager).
As an example, see below:
```sql
FROM which_secret('s3://my-other-bucket/file.parquet', 's3');
```
:::
```sql
CREATE OR REPLACE PERSISTENT SECRET tigris (
TYPE s3,
PROVIDER config,
KEY_ID 'tid_access_key_id',
SECRET 'tsec_secret_access_key',
REGION 'auto',
ENDPOINT 't3.storage.dev',
URL_STYLE 'vhost',
SCOPE 's3://my_bucket'
);
-- test Tigris credentials
SELECT count(*) FROM 's3:///';
```
```python
import duckdb
con = duckdb.connect('md:')
con.sql("""
CREATE OR REPLACE PERSISTENT SECRET tigris (
TYPE s3,
PROVIDER config,
KEY_ID 'tid_access_key_id',
SECRET 'tsec_secret_access_key',
REGION 'auto',
ENDPOINT 't3.storage.dev',
URL_STYLE 'vhost',
SCOPE 's3://my_bucket'
)
""")
# test Tigris
con.sql("SELECT count(*) FROM 's3:///'").show()
```
Adding Tigris secrets via the UI is not supported. Please add them using SQL statements.
### Delete a SECRET object
```sql
DROP SECRET tigris;
```
### Tigris credentials as **temporary** secrets
You can also create temporary secrets that are not persisted across sessions:
```sql
CREATE OR REPLACE SECRET (
TYPE s3,
PROVIDER config,
KEY_ID 'tid_access_key_id',
SECRET 'tsec_secret_access_key',
REGION 'auto',
ENDPOINT 't3.storage.dev',
URL_STYLE 'vhost'
);
```
:::note
Local/In-memory secrets are not persisted across sessions.
:::
---
Source: https://motherduck.com/docs/integrations/data-quality/index
---
title: Data Quality Tools
description: Monitor and maintain data quality in MotherDuck
---
import DocCardList from '@theme/DocCardList';
# Data Quality Tools
Ensure data quality and reliability in MotherDuck using these integrated tools.
---
Source: https://motherduck.com/docs/integrations/data-science-ai/index
---
title: Data Science & AI
description: Use MotherDuck with your favorite data science and AI tools
---
import DocCardList from '@theme/DocCardList'
# Data Science & AI Tools
MotherDuck integrates with popular data science and AI tools to help you build powerful machine learning and AI applications.
---
Source: https://motherduck.com/docs/integrations/data-science-ai/marimo
---
sidebar_position: 7
title: Marimo
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# marimo
[marimo](https://marimo.io/) is a reactive notebook for Python and SQL that models notebooks as dataflow graphs. When you run a cell or interact with a UI element, marimo automatically runs affected cells (or marks them as stale), keeping code and outputs consistent and preventing bugs before they happen. Every marimo notebook is stored as pure Python, executable as a script, and deployable as an app.
## Getting Started
### Installation
First, install marimo with SQL support:
```bash
pip install "marimo[sql]"
```
```bash
uv pip install "marimo[sql]"
```
```bash
conda install -c conda-forge marimo duckdb polars
```
### Authentication
There are two ways to authenticate:
1. **Interactive Authentication**: When you first connect to MotherDuck (e.g. `ATTACH 'md:my_db'`), marimo will open a browser window for authentication.
2. **Token-based Authentication**: Set your MotherDuck token as an environment variable:
```bash
export motherduck_token="your_token"
```
You can find your token in the MotherDuck UI under Account Settings.
## Using MotherDuck
First, open your first notebook:
```bash
marimo edit my_notebook.py
```
### 1. Connecting and Database Discovery
```sql
ATTACH IF NOT EXISTS 'md:my_db'
```
```python
import duckdb
# Connect to MotherDuck
duckdb.sql("ATTACH IF NOT EXISTS 'md:my_db'")
```
You will be prompted to authenticate with MotherDuck when you run the above cell. This will open a browser window where you can log in and authorize your marimo notebook to access your MotherDuck database. In order to avoid being prompted each time you open a notebook, you can set the `motherduck_token` environment variable:
```bash
export motherduck_token="your_token"
marimo edit my_notebook.py
```
Once connected, your MotherDuck tables are automatically discovered in the Datasources Panel:

Browse your MotherDuck databases
### 2. Writing SQL Queries
You can query your MotherDuck db using SQL cells in marimo. Here's an example of how to query a table and display the results using marimo:

Query a MotherDuck table
marimo's reactive execution model extends into SQL queries, so changes to your SQL will automatically trigger downstream computations for dependent cells (or optionally mark cells as stale for expensive computations).

### 3. Mixing SQL and Python
marimo allows you to seamlessly combine SQL queries with Python code:

Mixing SQL and Python
## Example Notebook
For a full example of using MotherDuck with marimo, check out this [example notebook](https://github.com/marimo-team/marimo/blob/main/examples/sql/connect_to_motherduck.py).
---
Source: https://motherduck.com/docs/integrations/databases/bigquery
---
sidebar_position: 1
title: BigQuery
---
BigQuery is Google Cloud's fully-managed, serverless data warehouse that enables SQL queries using the processing power of Google's infrastructure.
To load data into MotherDuck, there are two options:
1. **[Using the `duckdb-bigquery` community extension](#1-using-the-duckdb-bigquery-community-extension)** (easiest to use) - Simple SQL-based approach for quick data transfers and exploration.
2. **[Using Google's BigQuery Python SDK](#2-using-googles-bigquery-python-sdk)** - For performance-optimized ETL pipelines with advanced control over data loading.
## Prerequisites
- DuckDB installed (via CLI or Python).
- Access to a GCP project with BigQuery enabled.
- Valid Google Cloud credentials via:
- `GOOGLE_APPLICATION_CREDENTIALS` environment variable, or
- `gcloud auth application-default login`.
Minimum required IAM roles:
- `BigQuery Data Editor`
- `BigQuery Job User`
## 1. Using the DuckDB BigQuery Community Extension
The following examples use the [DuckDB CLI](/getting-started/interfaces/connect-query-from-duckdb-cli.mdx), but you can use any [DuckDB/MotherDuck clients](/getting-started/interfaces/interfaces.mdx).
### Install and Load the Extension
```sql
INSTALL bigquery FROM community;
LOAD bigquery;
```
:::info
A new experimental scan is now available and offers significantly improved performance. To enable it by default, run: `SET bq_experimental_use_incubating_scan=TRUE`
:::
### Attach BigQuery Project
To read data from your project, attach it just like you would attach a DuckDB database:
```sql
ATTACH 'project=my-gcp-project' AS bq (TYPE bigquery, READ_ONLY);
```
To read from a public dataset, use the following syntax:
```sql
ATTACH 'project=bigquery-public-data dataset=pypi billing_project=my-gcp-project'
AS bq_public (TYPE bigquery, READ_ONLY);
```
### Query a Table
Once attached, you can query BigQuery tables directly using standard SQL syntax:
```sql
SELECT * FROM bq.dataset_name.table_name LIMIT 10;
```
#### Alternative Query Functions
Behind the scenes, the above query uses `bigquery_scan`. The extension provides two explicit functions for more control over data retrieval:
**`bigquery_scan`** - Efficient for reading entire tables or simple queries:
```sql
SELECT * FROM bigquery_scan('my_gcp_project.my_dataset.my_table');
```
**`bigquery_query`** - Execute custom [GoogleSQL](https://cloud.google.com/bigquery/docs/introduction-sql) queries within your BigQuery project. Recommended for querying large tables with complex filters.
```sql
SELECT * FROM bigquery_query(
'my_gcp_project',
'SELECT * FROM `my_gcp_project.my_dataset.my_table` WHERE column = "value"'
);
```
### Loading Data to MotherDuck
Ensure the `motherduck_token` environment variable is set:
```sql
ATTACH 'md:';
```
You can use the `CREATE TABLE ... AS` syntax to create a new table, or `INSERT INTO ... SELECT` to append data to an existing table.
```sql
CREATE DATABASE IF NOT EXISTS pypi_playground;
USE pypi_playground;
CREATE TABLE IF NOT EXISTS duckdb_sample AS
SELECT *
FROM bq_public.pypi.file_downloads
WHERE project = 'duckdb'
AND timestamp = TIMESTAMP '2025-05-26 00:00:00'
LIMIT 100;
```
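To append further rows to the same table, an `INSERT INTO ... SELECT` along the same lines (same table and filter style as above; the date is illustrative) might look like:

```sql
INSERT INTO duckdb_sample
SELECT *
FROM bq_public.pypi.file_downloads
WHERE project = 'duckdb'
  AND timestamp = TIMESTAMP '2025-05-27 00:00:00'
LIMIT 100;
```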
---
## 2. Using Google's BigQuery Python SDK
For optimized ETL pipeline performance—especially when working with large tables and filter pushdown—we recommend using the [Google Cloud BigQuery Python SDK](https://cloud.google.com/python/docs/reference/bigquery/latest/index.html), which streams results efficiently directly to an Arrow table, enabling zero-copy loading to DuckDB.
### Install Required Libraries
```bash
pip install google-cloud-bigquery[bqstorage] duckdb
```
The "extras" option `[bqstorage]` installs `google-cloud-bigquery-storage`. By default, the `google-cloud-bigquery` client uses the **standard BigQuery API** to read query results. This is fine for small results, but **much slower and less efficient** for large datasets.
### Python end-to-end pipeline example
The pipeline below consists of three functions:
- `get_bigquery_client()` - Authenticates and returns a BigQuery client using service account credentials or default authentication.
- `get_bigquery_result()` - Executes a BigQuery SQL query and returns the results as a PyArrow table.
- `create_duckdb_table_from_arrow()` - Creates a DuckDB table from PyArrow data in either local DuckDB or MotherDuck.
```python
import os
from google.cloud import bigquery
from google.oauth2 import service_account
from google.auth.exceptions import DefaultCredentialsError
import logging
import time
import pyarrow as pa
import duckdb
GCP_PROJECT = 'my-gcp-project'
DATASET_NAME = 'my_dataset'
TABLE_NAME = 'my_table'
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
def get_bigquery_client(project_name: str) -> bigquery.Client:
"""Get Big Query client"""
try:
service_account_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
if service_account_path:
credentials = service_account.Credentials.from_service_account_file(
service_account_path
)
bigquery_client = bigquery.Client(
project=project_name, credentials=credentials
)
return bigquery_client
raise EnvironmentError(
"No valid credentials found for BigQuery authentication."
)
except DefaultCredentialsError as creds_error:
raise creds_error
def get_bigquery_result(
query_str: str, bigquery_client: bigquery.Client
) -> pa.Table:
"""Get query result from BigQuery and yield rows as dictionaries."""
try:
# Start measuring time
start_time = time.time()
# Run the query and load the result directly into an Arrow table
logging.info(f"Running query: {query_str}")
pa_tbl = bigquery_client.query(query_str).to_arrow()
# Log the time taken for query execution and data loading
elapsed_time = time.time() - start_time
logging.info(
f"BigQuery query executed and data loaded in {elapsed_time:.2f} seconds")
# Return the Arrow table
return pa_tbl
except Exception as e:
logging.error(f"Error running query: {e}")
raise
def create_duckdb_table_from_arrow(
pa_table: pa.Table,
table_name: str,
db_path: str,
database_name: str = "bigquery",
) -> None:
"""
Create a DuckDB table from PyArrow table data.
Args:
pa_table: PyArrow table containing the data
table_name: Name of the table to create in DuckDB
database_name: Name of the database to create/use (default: bigquery)
db_path: Database path - use 'md:' prefix for MotherDuck, file path for local or just :memory: for in-memory
"""
try:
# Connect to DuckDB
if db_path.startswith("md:"):
# check env var motherduck_token
if not os.environ.get("motherduck_token"):
raise EnvironmentError(
"motherduck_token environment variable is not set")
conn = duckdb.connect(db_path)
# Create database if not exists
conn.sql(f"CREATE DATABASE IF NOT EXISTS {database_name}")
conn.sql(f"USE {database_name}")
# Create table from PyArrow table
conn.sql(
f"CREATE OR REPLACE TABLE {table_name} AS SELECT * FROM pa_table")
logging.info(
f"Successfully created table '{table_name}' in database '{database_name}' with {len(pa_table)} rows to {db_path}")
except Exception as e:
logging.error(f"Error creating DuckDB table: {e}")
raise
if __name__ == "__main__":
# Run the pipeline
bigquery_client = get_bigquery_client(GCP_PROJECT)
pa_table = get_bigquery_result(f"""SELECT * FROM `{GCP_PROJECT}.{DATASET_NAME}.{TABLE_NAME}`""", bigquery_client)
create_duckdb_table_from_arrow(
pa_table=pa_table, table_name=TABLE_NAME, db_path="md:")
```
---
Source: https://motherduck.com/docs/integrations/databases/index
---
title: databases
description: Use MotherDuck with your favorite databases
---
import DocCardList from '@theme/DocCardList';
# Databases
MotherDuck integrates directly with popular databases to help you build data pipelines and applications.
---
Source: https://motherduck.com/docs/integrations/databases/planetscale
---
sidebar_position: 2
title: PlanetScale
description: Connect PlanetScale Postgres to MotherDuck using pg_duckdb extension or the Postgres connector for analytical query acceleration
---
PlanetScale offers hosted PostgreSQL and MySQL (Vitess) databases. MotherDuck supports PlanetScale Postgres via the [pg_duckdb extension](/concepts/pgduckdb), as well as the [Postgres Connector](/integrations/databases/postgres/). In our internal benchmarking, pg_duckdb offers 100x or greater query acceleration for analytical queries when compared to vanilla Postgres.
## Prerequisites
Before connecting PlanetScale to MotherDuck, ensure you have:
- A PlanetScale account with a Postgres database created
- The `pg_duckdb` extension enabled in your PlanetScale database (see [PlanetScale extension documentation](https://planetscale.com/docs/postgres/extensions/pg_duckdb))
- A MotherDuck account and authentication token (get your token from the [MotherDuck dashboard](https://app.motherduck.com))
- Database connection credentials from your PlanetScale dashboard (host, port, username, password, database name)
## Connecting pg_duckdb to MotherDuck
To run pg_duckdb, make sure to add it to your [extensions in PlanetScale](https://planetscale.com/docs/postgres/extensions/pg_duckdb).
:::tip
Review the configuration parameters before deploying the extension. Once deployed, you can connect to MotherDuck with the following SQL statements.
:::
```sql
-- Grant necessary permissions to the PlanetScale superuser
GRANT CREATE ON SCHEMA public TO pscale_superuser;
-- Create the pg_duckdb extension in your Postgres database
CREATE EXTENSION pg_duckdb;
-- Enable a MotherDuck connection with your authentication token
CALL duckdb.enable_motherduck();
```
To swap tokens, you can drop the MotherDuck connection and then re-add with:
```sql
-- Remove the existing MotherDuck server connection
DROP SERVER motherduck CASCADE;
-- Re-enable MotherDuck with a new authentication token
CALL duckdb.enable_motherduck();
```
### Using Read Replicas with PlanetScale
:::info
pg_duckdb will automatically round-robin between your replicas when you use a read-only token. When switching between a read-write and a read-only token, you will want to snapshot your database and then force sync as part of the hand-off.
:::
Switching from read-write to read-only is done with the following SQL statements in Postgres:
```sql
-- Create a snapshot of your MotherDuck database to ensure consistency
SELECT * FROM duckdb.raw_query('CREATE SNAPSHOT OF ');
-- Drop the existing MotherDuck connection
DROP SERVER motherduck CASCADE;
-- Re-enable MotherDuck with your read-only token
CALL duckdb.enable_motherduck();
-- Refresh the database to sync with the snapshot
SELECT * FROM duckdb.raw_query('REFRESH DATABASE ');
```
### Reading from MotherDuck
:::info
By default, data in [MotherDuck is mapped to Postgres in two different ways](https://github.com/duckdb/pg_duckdb/blob/main/docs/motherduck.md#schema-mapping). This is because MotherDuck is designed to hold many databases in its global catalog, while Postgres traditionally has a single database in its catalog.
- For data in `my_db.main`, it is mapped directly to the `public` schema in the Postgres database.
- For data in any other database & schema, it is mapped to `ddb$database$schema` in the Postgres database.
:::
Once the catalog is in sync between MotherDuck and Postgres, the data can be queried directly from Postgres. If it is out of sync for any reason, it can be re-synced with the following SQL command:
```sql
-- Terminate the pg_duckdb sync worker to force a re-sync
SELECT * FROM pg_terminate_backend((
SELECT pid FROM pg_stat_activity WHERE backend_type = 'pg_duckdb sync worker'
));
```
#### Sample MotherDuck Queries
Once the catalog is synchronized to Postgres, we can query the data as if it were native Postgres data.
```sql
-- Query data from a MotherDuck database and schema
-- Note: Non-main schemas use the ddb$database$schema naming convention
SELECT *
FROM "ddb$sample_data$nyc".taxi
ORDER BY tpep_dropoff_datetime DESC
LIMIT 10;
```
Of course, we can also join with data in Postgres.
```sql
-- Join MotherDuck data with local Postgres tables
SELECT a.col1, b.col2
-- MotherDuck table from a non-main schema
FROM "ddb$my_database$my_schema".my_table AS a
-- Local Postgres table in the public schema
LEFT JOIN public.another_table AS b ON a.key = b.key
```
The DuckDB `iceberg_scan` function works as well:
```sql
-- Use DuckDB's iceberg_scan function to query Iceberg tables
SELECT COUNT(*)
FROM iceberg_scan('https://motherduck-demo.s3.amazonaws.com/iceberg/lineitem_iceberg', allow_moved_paths := true)
```
:::info
Two special helper functions exist to run queries directly with DuckDB:
- **`duckdb.query`**: Returns tabular data, use for SELECT queries
- **`duckdb.raw_query`**: Returns void, use for DDL queries such as Snapshot Creation and Database Refresh. This function keeps the database in-sync when handing off between read and write nodes.
:::
```sql
-- Use duckdb.query for SELECT queries that return tabular data
-- This example lists all databases in MotherDuck
SELECT * FROM duckdb.query('FROM md_databases()')
```
```sql
-- Use duckdb.raw_query for DDL queries that return void
-- This example drops a table in MotherDuck
SELECT * FROM duckdb.raw_query('DROP TABLE my_database.my_schema.some_table')
```
### Replicating data to MotherDuck
:::tip
For smaller tables, data can be replicated using simple SQL statements.
:::
```sql
-- Create a table in MotherDuck and populate it with data from Postgres
-- Replace my_database and my_schema with your target database and schema names
CREATE TABLE "ddb$my_database$my_schema".my_table USING duckdb AS
SELECT * FROM public.my_table
```
:::tip
For larger tables, state management, and tighter SLAs & requirements, MotherDuck offers [integrations to various other ingestion partners](/integrations/ingestion/).
:::
### Further reading
The [pg_duckdb github repo](https://github.com/duckdb/pg_duckdb) contains [further documentation](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md) of all available functions.
For ease of finding the documentation, a table of the documentation sections is below:
| Topic | Description |
|-------|-------------|
| [**Functions**](https://github.com/duckdb/pg_duckdb/blob/main/docs/functions.md) | Complete reference for all available functions |
| [**Syntax Guide & Gotchas**](https://github.com/duckdb/pg_duckdb/blob/main/docs/gotchas_and_syntax.md) | Quick reference for common SQL patterns and things to know |
| [**Types**](https://github.com/duckdb/pg_duckdb/blob/main/docs/types.md) | Supported data types and type mappings |
| [**Extensions**](https://github.com/duckdb/pg_duckdb/blob/main/docs/extensions.md) | DuckDB extension installation and usage |
| [**Settings**](https://github.com/duckdb/pg_duckdb/blob/main/docs/settings.md) | Configuration options and parameters |
| [**Transactions**](https://github.com/duckdb/pg_duckdb/blob/main/docs/transactions.md) | Transaction behavior and limitations |
## Connecting with the Postgres Extension
You can also connect to PlanetScale Postgres with the DuckDB Postgres extension. This approach allows you to query PlanetScale data directly from DuckDB or MotherDuck.
### Install and Load the Extension
```sql
-- Install the Postgres extension from DuckDB's extension registry
INSTALL postgres;
-- Load the extension to enable Postgres connectivity
LOAD postgres;
-- Attach your PlanetScale database using a connection string
ATTACH '' AS postgres_db (TYPE postgres);
```
### Connection String Format
The connection string format follows PostgreSQL's standard connection parameters. Here's an example with explanations:
```sql
ATTACH 'host= port= user= password= dbname= sslmode=require'
AS planetscale (TYPE postgres);
```
**Connection Parameters:**
- `host`: Your PlanetScale database hostname (found in your PlanetScale dashboard)
- `port`: The database port (typically 3306 for MySQL or 5432 for Postgres)
- `user`: Your PlanetScale database username
- `password`: Your PlanetScale database password
- `dbname`: The name of your database in PlanetScale
- `sslmode=require`: Ensures SSL encryption is used (required for PlanetScale)
:::info
The above connection string works with DuckDB. PlanetScale suggests also using the `sslnegotiation` and `sslrootcert` keys when connecting to Postgres, but these keys are not supported by the `libpq` version that is included in DuckDB. The `sslmode=require` parameter is sufficient for secure connections.
:::
---
Source: https://motherduck.com/docs/integrations/databases/postgres
---
sidebar_position: 1
title: PostgreSQL
---
[PostgreSQL](https://www.postgresql.org) is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later.
As explained by DuckDB Labs' Hannes Mühleisen in the [explainer blog post](https://duckdb.org/2022/09/30/postgres-scanner.html):
> PostgreSQL is designed for traditional transactional use cases, "OLTP", where rows in tables are created, updated and removed concurrently, and it excels at this. But this design decision makes PostgreSQL far less suitable for analytical use cases, "OLAP", where large chunks of tables are read to create summaries of the stored data. Yet there are many use cases where both transactional and analytical use cases are important, for example when trying to gain the latest business intelligence insights into transactional data.
MotherDuck supports two PostgreSQL-native ways to interact with the database:
- [The postgres scanner extension for DuckDB](/key-tasks/loading-data-into-motherduck/loading-data-from-postgres)
- [pg_duckdb](/concepts/pgduckdb), a PostgreSQL extension that embeds DuckDB inside Postgres.
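As a quick illustration of the first option, attaching a Postgres database from DuckDB looks like the following sketch (the connection values are placeholders for your own server):

```sql
INSTALL postgres;
LOAD postgres;
-- Attach read-only; replace the connection parameters with your own
ATTACH 'host=localhost port=5432 dbname=mydb user=postgres password=secret'
  AS pg (TYPE postgres, READ_ONLY);
SELECT * FROM pg.public.my_table LIMIT 10;
```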
---
Source: https://motherduck.com/docs/integrations/dev-tools/index
---
title: Development Tools
description: Developer tools and utilities that work with MotherDuck
---
import DocCardList from '@theme/DocCardList';
# Development Tools
Use MotherDuck with various development tools and utilities to enhance your workflow.
---
Source: https://motherduck.com/docs/integrations/file-formats/apache-iceberg
---
sidebar_position: 1
title: Apache Iceberg
---
MotherDuck supports querying data in the [Apache Iceberg format](https://iceberg.apache.org/). The [Iceberg DuckDB extension](https://duckdb.org/docs/extensions/iceberg.html) is loaded automatically when any of the supported Iceberg functions are called.
## Iceberg functions
| Function Name | Description |
| :--- | :--- |
| `iceberg_scan` | Query Iceberg data |
| `iceberg_metadata` | Query Iceberg metadata, such as the snapshot status, data format, and number of records. |
| `iceberg_snapshots` | Information about the snapshots available in the data folder. |
:::note
The available functions are only for reading Iceberg data. Creating or updating data in Iceberg format is not yet supported.
:::
## Examples
```sql
-- query data
SELECT count(*)
FROM iceberg_scan('path-to-iceberg-folder',
allow_moved_paths=true);
-- query metadata
SELECT *
FROM iceberg_metadata('path-to-iceberg-folder',
allow_moved_paths=true);
-- query snapshots
SELECT *
FROM iceberg_snapshots('path-to-iceberg-folder');
```
### Query Iceberg data stored in Amazon S3
```sql
SELECT count(*)
FROM iceberg_scan('s3://<bucket>/<path-to-iceberg-folder>',
allow_moved_paths=true);
```
:::note
To query data in a secure Amazon S3 bucket, you will need to configure your [Amazon S3 credentials](../../cloud-storage/amazon-s3).
:::
Example using the MotherDuck Iceberg sample dataset:
```sql
SELECT count(*)
FROM iceberg_scan('s3://us-prd-motherduck-open-datasets/iceberg/lineitem_iceberg',
allow_moved_paths=true)
```
---
Source: https://motherduck.com/docs/integrations/file-formats/delta-lake
---
sidebar_position: 1
title: Delta Lake
---
MotherDuck supports querying data in the [Delta Lake format](https://delta.io/). The [Delta DuckDB extension](https://duckdb.org/docs/extensions/delta.html) is loaded automatically when any of the supported Delta Lake functions are called.
## Delta function
| Function Name | Description | Supported parameters |
| :--- | :--- | :--- |
| `delta_scan` | Query Delta Lake data | All the `parquet_scan` parameters plus `delta_file_number`. |
:::note
The available functions are only for reading Delta Lake data. Creating or updating data in Delta format is not yet supported.
:::
## Examples
```sql
-- query data
SELECT COUNT(*) FROM delta_scan('path-to-delta-folder');
-- query data with parameters
FROM delta_scan('path-to-delta-folder', delta_file_number=1, file_row_number=1);
```
### Query Delta data stored in Amazon S3
:::warning
At the moment, querying Delta tables stored in Amazon S3 from **public** buckets is not supported.
:::
[Create an S3 secret](/sql-reference/motherduck-sql-reference/create-secret.md) in MotherDuck using the secret manager:
```sql
CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 's3_access_key',
SECRET 's3_secret_key',
REGION 's3-region'
);
```
Query Delta data stored in S3:
```sql
SELECT count(*)
FROM delta_scan('s3://<bucket>/<path-to-delta-folder>');
```
:::note
To query data in an Amazon S3 bucket, you will need to configure your [Amazon S3 credentials](../../cloud-storage/amazon-s3).
:::
Example using the MotherDuck Delta sample dataset:
```sql
SELECT COUNT(*)
FROM delta_scan('s3://us-prd-motherduck-open-datasets/file_format_demo/delta_lake/dat/out/reader_tests/generated/basic_append/delta');
```
---
Source: https://motherduck.com/docs/integrations/file-formats/ducklake
---
sidebar_position: 1
title: DuckLake
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
import Versions from '@site/src/components/Versions';
::::note
MotherDuck currently supports DuckDB . In **US East (N. Virginia)** - `us-east-1`, MotherDuck is compatible with client versions through . In **Europe (Frankfurt)** - `eu-central-1`, MotherDuck supports client versions through .
::::
[DuckLake](https://ducklake.select) is an integrated data lake and catalog format. DuckLake delivers advanced data lake features without traditional lakehouse complexity by using Parquet files and a SQL database.
MotherDuck provides two main options for creating and integrating with DuckLake databases:
- **[Fully managed](#creating-a-fully-managed-ducklake-database)**: Create a DuckLake database where MotherDuck manages both data storage and metadata
- **[Bring your own bucket](#bring-your-own-bucket)**: Connect your own S3-compatible object storage for data storage with:
- **[MotherDuck compute + MotherDuck catalog](#using-motherduck-compute)**: Use MotherDuck for both compute and catalog services
- **[Own compute + MotherDuck catalog](#using-own-compute)**: Use your own DuckDB client for compute while MotherDuck provides catalog services
## Creating a fully managed DuckLake database
Create a fully managed DuckLake with the following command:
```sql
CREATE DATABASE my_ducklake (TYPE DUCKLAKE);
```
MotherDuck stores both data and metadata in MotherDuck-managed storage (not externally accessible at the moment), providing a streamlined way to evaluate DuckLake functionality.
The `my_ducklake` database can be accessed like any other MotherDuck database.
## Data Inlining
::::warning
Data inlining is currently experimental and requires explicit enablement using the `DATA_INLINING_ROW_LIMIT` parameter during database creation.
::::
Data inlining is an optimization feature that stores small data changes directly in the metadata catalog rather than creating individual Parquet files for every insert operation. This eliminates the overhead of creating small Parquet files while maintaining full query and update capabilities.
### How Data Inlining Works
When you enable data inlining with the `DATA_INLINING_ROW_LIMIT` parameter, any insert operation writing fewer rows than your specified threshold is automatically stored as inlined data in the metadata catalog. Larger inserts continue to use Parquet files as usual.
For example, if you set `DATA_INLINING_ROW_LIMIT` to 100, inserts with fewer than 100 rows are stored inline, while inserts with 100 or more rows create Parquet files.
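The routing rule can be sketched in Python to make the threshold semantics explicit (a hypothetical helper for illustration, not part of MotherDuck's API):

```python
def storage_for_insert(row_count: int, inlining_row_limit: int) -> str:
    """Mirror the DATA_INLINING_ROW_LIMIT routing rule: inserts strictly
    below the limit are inlined into the metadata catalog; everything
    else is written as a Parquet file."""
    return "inline" if row_count < inlining_row_limit else "parquet"

# With DATA_INLINING_ROW_LIMIT set to 100:
print(storage_for_insert(99, 100))   # a 99-row insert is inlined
print(storage_for_insert(100, 100))  # a 100-row insert creates a Parquet file
```

Note that the comparison is strict: an insert of exactly `DATA_INLINING_ROW_LIMIT` rows goes to a Parquet file.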
### Creating a DuckLake Database with Data Inlining
To create a (fully managed) DuckLake database with data inlining enabled:
```sql
CREATE DATABASE my_ducklake (
TYPE DUCKLAKE,
DATA_INLINING_ROW_LIMIT 100
);
```
This configuration will inline all inserts with fewer than 100 rows directly into the metadata catalog.
### Flushing Inlined Data
You can manually convert inlined data to Parquet files using the `ducklake_flush_inlined_data` function.
```sql
-- Flush inlined data for a specific table
SELECT ducklake_flush_inlined_data('my_ducklake.my_schema.my_table');
-- Flush all inlined data in a schema
SELECT ducklake_flush_inlined_data('my_ducklake.my_schema');
-- Flush all inlined data in the database
SELECT ducklake_flush_inlined_data('my_ducklake');
```
### When to Use Data Inlining
Data inlining is particularly beneficial for:
- **High-frequency, small-batch inserts**: Applications with streaming data or frequent small updates
- **Incremental data loading**: ETL processes that append small batches of data regularly
- **Transactional workloads**: Systems where individual transactions insert small numbers of rows
## Bring your own bucket
You can use MotherDuck as a compute engine and managed DuckLake catalog while connecting your own S3-compatible object store (such as [AWS S3](/integrations/cloud-storage/amazon-s3/), [GCS](/integrations/cloud-storage/google-cloud-storage/), [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2/), and [Tigris](/integrations/cloud-storage/tigris/)) for data storage. Additionally, you can bring your own compute (BYOC) using your DuckDB client to query and write data directly to your DuckLake.
### Setup
Configure a custom data path when creating your DuckLake to use your own S3-compatible object storage:
:::note
MotherDuck is currently available on AWS in two regions, **US East (N. Virginia)** - `us-east-1` and **Europe (Frankfurt)** - `eu-central-1`. For optimal performance and costs, we recommend using an S3 bucket in the same region as your MotherDuck Organization.
:::
```sql
CREATE DATABASE my_ducklake (
TYPE DUCKLAKE,
DATA_PATH 's3://mybucket/my_optional_path/'
);
```
[Create a corresponding secret](/sql-reference/motherduck-sql-reference/create-secret/) in MotherDuck to allow MotherDuck compute to access your bucket:
```sql
CREATE SECRET my_secret IN MOTHERDUCK (
TYPE S3,
KEY_ID 'my_s3_access_key',
SECRET 'my_s3_secret_key',
REGION 'my-bucket-region',
SCOPE 'my-bucket-path'
);
```
:::info
For service workloads like ETL, static keys are typically required. For queries in the DuckDB UI or CLI, we recommend using `aws sso login` to generate temporary credentials. See [Create Secrets: Amazon S3](/integrations/cloud-storage/amazon-s3/#create-a-secret-object) for details.
:::
You can then create DuckLake tables as you would with a standard DuckDB database using either MotherDuck or local compute as shown in the examples below.
#### Required IAM Permissions for DuckLake
The minimum required IAM permissions are:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": "${s3_bucket_arn}"
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject"
],
"Resource": "${s3_bucket_arn}/*"
}
]
}
```
### Using MotherDuck compute
Connect to MotherDuck:
```bash
./duckdb md:
```
Create your first DuckLake table from a hosted Parquet file:
```sql
CREATE TABLE my_ducklake.air_quality AS
SELECT * FROM 's3://us-prd-motherduck-open-datasets/who_ambient_air_quality/parquet/who_ambient_air_quality_database_version_2024.parquet';
```
Query using MotherDuck:
```sql
SELECT
year,
AVG(pm25_concentration) AS avg_pm25,
AVG(pm10_concentration) AS avg_pm10,
AVG(no2_concentration) AS avg_no2
FROM my_ducklake.air_quality
WHERE city = 'Berlin'
GROUP BY year
ORDER BY year DESC;
```
### Using own compute
To use your own compute (e.g., your DuckDB client), you must:
1. Ensure you have appropriate S3-compatible credentials in your compute environment to read/write to your defined `DATA_PATH` (specified at database creation)
2. Attach the metadata database
Create a secret in your compute environment if you have authenticated using `aws sso login`:
```sql
CREATE OR REPLACE SECRET my_secret IN MOTHERDUCK (
TYPE S3,
PROVIDER credential_chain
);
```
Alternatively, provide static AWS keys:
```sql
CREATE SECRET my_secret IN MOTHERDUCK (
TYPE S3,
KEY_ID 'my_s3_access_key',
SECRET 'my_s3_secret_key',
REGION 'my-bucket-region',
SCOPE 'my-bucket-path'
);
```
Attach the metadata database to your DuckDB session:
```sql
ATTACH 'ducklake:md:__ducklake_metadata_<database_name>' AS <alias>;
```
Every DuckLake database in MotherDuck has a corresponding **metadata database** that stores internal state, including schema definitions, snapshots, file mappings, and more.
Create a table using your own compute:
```sql
CREATE TABLE <alias>.air_quality AS
SELECT * FROM 's3://us-prd-motherduck-open-datasets/who_ambient_air_quality/parquet/who_ambient_air_quality_database_version_2024.parquet';
```
With this configuration, your own compute can directly access or write data to your DuckLake (assuming appropriate credentials are configured). Data uploaded via your own compute will appear in the MotherDuck catalog and be queryable as a standard MotherDuck database.
## Performing metadata operations on a DuckLake
DuckLake databases provide additional metadata operations for introspection and maintenance. These operations can be performed from both MotherDuck and your own compute environments. For example, you can [list the snapshots](https://ducklake.select/docs/stable/duckdb/usage/snapshots) backing your DuckLake.
## Current limitations
- **Limited sharing options**: Read-only sharing is supported through the [existing share functionality](https://motherduck.com/docs/key-tasks/sharing-data/), restricted to auto-update shares only
- **Single-account write access**: Write permissions are currently limited to one account per database. This account can perform multiple concurrent writes, as long as they are append-only. If multiple queries attempt to update or delete from the same table concurrently, only the first to commit will succeed. Concurrent DDL operations are also not allowed. Support for *multi-account* write access is planned for a future release.
:::info
For multiple concurrent readers to a MotherDuck DuckLake database, you can create a [read scaling token](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/).
:::
[Data file maintenance](https://ducklake.select/docs/stable/duckdb/maintenance/recommended_maintenance) is not automatically performed by MotherDuck. You can manually trigger these maintenance functions as needed from either MotherDuck or your own compute environments.
---
Source: https://motherduck.com/docs/integrations/file-formats/index
---
title: File Formats
description: Load data into MotherDuck using various file formats
---
import DocCardList from '@theme/DocCardList';
# File Formats
Load data into MotherDuck using various file formats.
---
Source: https://motherduck.com/docs/integrations/how-to-integrate
---
sidebar_position: 999
title: Creating a New Integration
---
Integration with MotherDuck is almost the same as integrating with DuckDB, which means you can do it from any language or framework!
There are a few differences:
1) Use the `"md:"` or `"md:my_database"` connection string instead of a local filesystem path.
2) Pass the `motherduck_token` configuration property (through the config dictionary, a connection string parameter, or an environment variable).
3) Pass `custom_user_agent` to identify the new integration.
### User-agent guidelines
* The format is `integration/version(custom-metadata1;custom-metadata2)` where the version and metadata sections are optional.
* Avoid using spaces in integration and version sections.
* Multiple custom metadata sections should be separated by semicolons.
Some examples:
* `my-integration`
* `my-integration/2.9.0`
* `my-integration/2.9.0(linux_amd64)`
* `my-integration/2.9.0(linux_amd64;us-east-1)`
* `my-integration/2.9.0(linux_amd64;eu-central-1)`
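As a sketch, the format above can be checked with a regular expression (this regex is our reading of the guidelines, not an official MotherDuck validator):

```python
import re

# integration name, optional /version, optional (metadata;metadata;...)
# section; no spaces allowed in the integration or version parts.
_UA_PATTERN = re.compile(
    r"^[^\s/();]+"                     # integration name
    r"(/[^\s/();]+)?"                  # optional /version
    r"(\([^();\s]+(;[^();\s]+)*\))?$"  # optional (metadata;...)
)

def is_valid_user_agent(ua: str) -> bool:
    return _UA_PATTERN.fullmatch(ua) is not None

print(is_valid_user_agent("my-integration/2.9.0(linux_amd64;us-east-1)"))  # True
print(is_valid_user_agent("my integration/2.9.0"))                         # False
```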
## Language / Framework examples
### Python
```python
import duckdb

con = duckdb.connect("md:my_database", config={
    "motherduck_token": token,
    "custom_user_agent": "INTEGRATION_NAME"
})
```
### Python with SQLAlchemy
```python
eng = create_engine("duckdb:///md:my_database", connect_args={
    'config': {
        'motherduck_token': token,
        'custom_user_agent': 'INTEGRATION_NAME'
    }
})
```
### Java / JDBC
```java
Properties config = new Properties();
config.setProperty("motherduck_token", token);
config.setProperty("custom_user_agent", "INTEGRATION_NAME");
Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:my_database", config);
```
### NodeJS
```javascript
import { DuckDBInstance } from '@duckdb/node-api';

const instance = await DuckDBInstance.create('md:my_database', {
    'motherduck_token': token,
    'custom_user_agent': 'INTEGRATION_NAME'
});
const conn = await instance.connect();
```
### Go
```go
db, err := sql.Open("duckdb", "md:my_database?custom_user_agent=INTEGRATION_NAME")
```
## Implementation best practices
If you use DuckDB/MotherDuck in a shared environment where multiple users are served by the same process, the connection string (e.g. URL for JDBC, Database for Python/ODBC) must be unique per user.
You can disambiguate the connection string with a unique-per-user substring, for example `md:database_name?user=unique_user_name`.
If using the `motherduck_token` in the connection string, make sure not to log it in plaintext.
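Both practices can be sketched together in Python (the helper names are hypothetical; only the `md:` connection string format comes from the docs above):

```python
import re
from urllib.parse import urlencode

def md_connection_string(database: str, user: str, token: str = "") -> str:
    """Build a MotherDuck connection string that is unique per user,
    so a shared process opens a separate connection for each user."""
    params = {"user": user}
    if token:
        params["motherduck_token"] = token
    return f"md:{database}?{urlencode(params)}"

def redacted(conn_str: str) -> str:
    """Mask the token before the connection string is ever logged."""
    return re.sub(r"(motherduck_token=)[^&]+", r"\1***", conn_str)

conn = md_connection_string("my_db", "alice", token="secret")
print(redacted(conn))  # md:my_db?user=alice&motherduck_token=***
```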
---
Source: https://motherduck.com/docs/integrations/ingestion/dlt
[dlt](https://dlthub.com/docs/intro) is an open-source Python library that loads data from various, often messy data sources into well-structured, live datasets. It offers a lightweight interface for extracting data from REST APIs, SQL databases, cloud storage, Python data structures, and many more.
dlt is designed to be easy to use, flexible, and scalable:
* dlt infers schemas and data types, normalizes the data, and handles nested data structures.
* dlt supports a variety of popular destinations and has an interface to add custom destinations to create reverse ETL pipelines.
* dlt can be deployed anywhere Python runs, be it on Airflow, serverless functions, or any other cloud deployment of your choice.
* dlt automates pipeline maintenance with schema evolution and schema and data contracts.
dlt integrates well with DuckDB (the dlt team also uses it as a local [cache](https://dlthub.com/blog/dltplus-project-cache-in-early-access)) and therefore with MotherDuck.
You can read more about the MotherDuck integration in the [official documentation](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck).
## Authentication
To authenticate with MotherDuck, you have two options:
1. **Environment variable:** export your `motherduck_token` as an environment variable:
```bash
export motherduck_token="your_motherduck_token"
```
2. **Local development:** add the token to `.dlt/secrets.toml`:
```toml
[destination.motherduck.credentials]
password = "my_motherduck_token"
```
## Minimal example
Below is a minimal example of using dlt to load data from a REST API (with fake data) into a DuckDB (MotherDuck) database:
```python
import dlt
from typing import Dict, Iterator, List, Optional, Sequence
import random
from datetime import datetime
from dlt.sources import DltResource
@dlt.source(name="dummy_github")
def dummy_source(repos: Optional[List[str]] = None) -> Sequence[DltResource]:
"""
A minimal DLT source that generates dummy GitHub-like data.
Args:
repos (List[str]): A list of dummy repository names.
Returns:
Sequence[DltResource]: A sequence of resources with dummy data.
"""
if repos is None:
repos = ["dummy/repo1", "dummy/repo2"]
return (
dummy_repo_info(repos),
dummy_languages(repos),
)
@dlt.resource(write_disposition="replace")
def dummy_repo_info(repos: List[str]) -> Iterator[Dict]:
"""
Generates dummy repository information.
Args:
repos (List[str]): List of repository names.
Yields:
Iterator[Dict]: An iterator over dummy repository data.
"""
for repo in repos:
owner, name = repo.split("/")
yield {
"id": random.randint(10000, 99999),
"name": name,
"full_name": repo,
"owner": {"login": owner},
"description": f"This is a dummy repository for {repo}",
"created_at": datetime.now().isoformat(),
"updated_at": datetime.now().isoformat(),
"stargazers_count": random.randint(0, 1000),
"forks_count": random.randint(0, 500),
}
@dlt.resource(write_disposition="replace")
def dummy_languages(repos: List[str]) -> Iterator[Dict]:
"""
Generates dummy language data for repositories in an unpivoted format.
Args:
repos (List[str]): List of repository names.
Yields:
Iterator[Dict]: An iterator over dummy language data.
"""
languages = ["Python", "JavaScript", "TypeScript", "C++", "Rust", "Go"]
for repo in repos:
# Generate 2-4 random languages for each repo
num_languages = random.randint(2, 4)
selected_languages = random.sample(languages, num_languages)
for language in selected_languages:
yield {
"repo": repo,
"language": language,
"bytes": random.randint(1000, 100000),
"check_time": datetime.now().isoformat(),
}
def run_minimal_example():
"""
Runs a minimal example pipeline that loads dummy GitHub data to MotherDuck.
"""
# Define some dummy repositories
repos = ["example/repo1", "example/repo2", "example/repo3"]
# Configure the pipeline
pipeline = dlt.pipeline(
pipeline_name="minimal_github_pipeline",
destination='motherduck',
dataset_name="minimal_example",
)
# Create the data source
data = dummy_source(repos)
# Run the pipeline with all resources
info = pipeline.run(data)
print(info)
# Show what was loaded
print("\nLoaded data:")
print(f"- {len(repos)} repositories")
print(f"- Languages for {len(repos)} repositories")
if __name__ == "__main__":
run_minimal_example()
```
dlt revolves around three core concepts:
* Sources: Define where the data comes from.
* Resources: Represent structured units of data within a source.
* Pipelines: Manage the data loading process.
In the example above:
* dummy_source defines a source that simulates GitHub-like data.
* dummy_repo_info and dummy_languages are resources producing repository and language data.
* A pipeline loads this data into MotherDuck.
The core integration with MotherDuck is defined in the pipeline configuration:
```python
pipeline = dlt.pipeline(
pipeline_name="minimal_github_pipeline",
destination="motherduck",
dataset_name="minimal_example",
)
```
Setting `destination="motherduck"` tells dlt to load the data into MotherDuck.
---
Source: https://motherduck.com/docs/integrations/ingestion/index
---
title: Ingestion
description: Configure MotherDuck as the destination for your data in the following data ingestion tools
---
import DocCardList from '@theme/DocCardList';
# Ingestion Tools
Configure MotherDuck as the destination for your data in the following data ingestion tools.
---
Source: https://motherduck.com/docs/integrations/ingestion/streamkap
# Streamkap
[Streamkap](http://streamkap.com) is a stream processing platform built for Change Data Capture (CDC) and event sources. It makes it easy to move operational data into analytics systems like MotherDuck with low latency and high reliability. Streamkap offers various sources, including PostgreSQL, MySQL, SQL Server, a range of SQL and NoSQL databases, Kafka, and other storage systems.
Streamkap is designed to get you streaming in minutes without a heavy setup. You focus on your business, and Streamkap handles the hard parts:
* Lightweight in-stream transformations let you preprocess, clean, and enrich data with minimal latency and cost.
* Automatically adapts to schema changes—added or removed fields, renamed columns, evolving data types, and nested structures.
* Built-in observability and automated recovery reduce operational overhead.
* Fully managed via API or Terraform, integrates with CI/CD workflows, and automates environment provisioning.
* Deploy multiple service versions to isolate workloads—logically (per microservice or environment) or physically (across regions or infrastructure).
* Choose from Streamkap Cloud or BYOC (Bring Your Own Cloud) for maximum flexibility and security.
You can explore Streamkap's MotherDuck integration and examples in the [official documentation](https://docs.streamkap.com/docs/motherduck).
# **Overview**
This guide explains how to stream data from Streamkap into a MotherDuck database using Amazon S3 as an intermediary. We'll use the S3 connector to first stream data into an S3 bucket, then configure MotherDuck to read from the bucket and ingest the data into your database.
* Streamkap to S3: Streamkap is Kafka-based, so Kafka messages are streamed into an Amazon S3 bucket via an existing dedicated S3 connector. Please refer to Streamkap's [Kafka to S3 Streaming Guide](https://docs.streamkap.com/docs/s3) for detailed instructions.
* S3 to MotherDuck: MotherDuck is configured to read the data from the S3 bucket and load it into the database.
# **Prerequisites**
* Amazon S3 Bucket: A bucket in Amazon S3 where data from Streamkap will be streamed.
* MotherDuck Account: A valid MotherDuck account and database setup where the data will be loaded.
* Streamkap’s Kafka S3 Connector: Your Kafka to S3 connector configured and running.
# **MotherDuck Setup**
Once data is available in the S3 bucket, you can configure MotherDuck to read from the S3 bucket and load it into your database. Follow these steps:
## **Configure the S3 Source in MotherDuck**
To read data from the S3 bucket into MotherDuck, you need to configure a data source that points to the S3 bucket. This involves creating a connection between MotherDuck and your S3 bucket using AWS credentials.
1. Log in to MotherDuck and navigate to your workspace or database.
2. Go to the Secrets section.
3. Add a new secret and choose Amazon S3 as the secret type.
4. Provide the necessary details to access the S3 bucket:
* Secret Name: The name of your source connection details.
* Region: The region of your S3 bucket (e.g., us-west-2).
* Access Key ID: Your AWS Access Key ID.
* Secret Access Key: Your AWS Secret Access Key.
### **SQL Command for Secret Configuration**
Alternatively, you can configure the secret using SQL. Below is an example configuration for setting up the secret:
```sql
CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 'access_key',
SECRET 'secret_key',
REGION 'us-east-1'
);
```
### **Verify Existing Secrets**
To check your existing secrets, you can run the following SQL command:
```sql
FROM duckdb_secrets();
```

## **Query Data from the S3 Bucket**
Once the connection between MotherDuck and your S3 bucket is established, you can define a schema and table in MotherDuck or simply query the data directly from the S3 bucket.
Since your Kafka stream might be writing multiple files to the S3 bucket, we recommend using a wildcard `*` to read all files in a folder. This will enable MotherDuck to automatically pick up new files as they are written to the S3 bucket.
Here is an example SQL query to read data from your S3 bucket (using a wildcard for streaming):
```sql
SELECT key.id, value.name, value.note
FROM read_parquet('s3://streamkap-s3-test-bucket/parquet_test/*')
```

---
Source: https://motherduck.com/docs/integrations/integrations
---
title: Integrations
description: Integrations that work with MotherDuck from the modern data stack
sidebar_class_name: integration-icon
---
import { IntegrationsTable } from "./integrations.table.js";
import "./integrations.css";
MotherDuck integrates with a lot of common tools from the modern data stack.
If you would like to create a new integration, see [this guide](how-to-integrate).
Below, you will find a comprehensive list of integrations that work with MotherDuck. Each integration includes links to either our own detailed tutorials, the integrator's documentation, or insightful articles and blogs that can help you get started.
:::info
When working with integrations, it may be useful to be aware of the [different connection string parameters](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#using-connection-string-parameters) you can use to connect to MotherDuck.
:::
## Supported Integrations
Use the search box to find specific integrations or click on category tags to filter the table.
:::note
See [DuckDB documentation](https://duckdb.org/docs/api/overview.html) for the full list of supported client APIs and drivers.
:::
## Diagram: Modern Duck Stack

---
Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/go-driver
---
sidebar_position: 1
title: Go driver
---
The [go-duckdb driver](https://github.com/duckdb/duckdb-go) supports MotherDuck out of the box!
To connect, you need a dependency on the driver in your `go.mod` file:
```go
require github.com/duckdb/duckdb-go/v2 v2.5.1
```
Your code can then open a connection using the standard [database/sql](https://pkg.go.dev/database/sql) package, or any other mechanisms supported by [go-duckdb](https://github.com/duckdb/duckdb-go/blob/master/README.md):
```go
db, err := sql.Open("duckdb", "md:my_db?motherduck_token=<your_token>")
```
---
Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/index
---
title: Language APIs & Drivers
description: Connect to MotherDuck using your preferred programming language
---
import DocCardList from '@theme/DocCardList';
# Language APIs & Drivers
Connect to MotherDuck using official drivers and APIs for various programming languages.
---
Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/jdbc-driver
---
sidebar_position: 1
title: JDBC driver
---
The official [DuckDB JDBC driver](https://duckdb.org/docs/api/java.html) supports MotherDuck out of the box!
To connect, you need a dependency on the driver. For example, in your Maven pom.xml file:
```xml
<dependency>
    <groupId>org.duckdb</groupId>
    <artifactId>duckdb_jdbc</artifactId>
    <version>1.4.1.0</version>
</dependency>
```
Your code can then create a `Connection` by using `jdbc:duckdb:md:databaseName` connection string format:
```java
Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db");
```
This `Connection` can then be [used directly](https://docs.oracle.com/en/java/javase/17/docs/api/java.sql/java/sql/Connection.html) or through any framework built on `java.sql` JDBC abstractions.
There are two main ways to programmatically authenticate with a valid MotherDuck token:
1) Passing it in through the connection configuration
```java
Properties config = new Properties();
config.setProperty("motherduck_token", token);
Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:mdw", config);
```
2) Passing the token as a connection string parameter:
```java
Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db?motherduck_token="+token);
```
See [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md) for more details.
---
Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/python/python-overview
---
title: Python
description: Connect to MotherDuck using Python
---
Check out our [Python tutorial](/getting-started/interfaces/client-apis/connect-query-from-python/installation-authentication).
---
Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/python/sqlalchemy
---
sidebar_position: 3
title: SQLAlchemy with DuckDB and MotherDuck
sidebar_label: SQLAlchemy
---
[SQLAlchemy](https://www.sqlalchemy.org/) is a SQL toolkit and Object-Relational Mapping (ORM) system for Python, providing full support for SQL expression language constructs and various database dialects.
Many Business Intelligence tools support SQLAlchemy out of the box.
Using the [DuckDB SQLAlchemy driver](https://github.com/Mause/duckdb_engine), we can connect to MotherDuck via an SQLAlchemy URI.
## Install the DuckDB SQLAlchemy driver
```bash
pip install --upgrade duckdb-engine
```
## Configuring the database connection to a local DuckDB database
A local DuckDB database can be accessed using the SQLAlchemy URI:
```bash
duckdb:///path/to/file.db
```
## Configuring the database connection to MotherDuck
The general pattern for the SQLAlchemy URI to access a MotherDuck database is:
```bash
duckdb:///md:<database_name>?motherduck_token=<motherduck_token>
```
:::info
The database name `<database_name>` in the connection string is **optional**. This makes it possible to query multiple databases with one connection to MotherDuck.
:::
Connecting and authentication can be done in several ways:
1. If no token is available, the process will direct you to a web login for authentication, which will allow you to obtain a token.
```python
from sqlalchemy import create_engine, text
eng = create_engine("duckdb:///md:my_db")
with eng.connect() as conn:
result = conn.execute(text("show databases"))
for row in result:
print(row)
```
When running the above, you will see something like this to authenticate:

2. The `MOTHERDUCK_TOKEN` is already set as an environment variable:
```python
from sqlalchemy import create_engine, text
eng = create_engine("duckdb:///md:my_db")
with eng.connect() as conn:
result = conn.execute(text("show databases"))
for row in result:
print(row)
```
3. Using configuration dictionary
```python
from sqlalchemy import create_engine, text
config = {}
token = 'asdfwerasdf' # Fill in your token
config["motherduck_token"] = token;
eng = create_engine(
"duckdb:///md:my_db",
connect_args={ 'config': config}
)
with eng.connect() as conn:
result = conn.execute(text("show databases"))
for row in result:
print(row)
```
4. Passing the token as a connection string parameter
```python
from sqlalchemy import create_engine, text
token = 'asdfwerasdf' # Fill in your token
eng = create_engine(f"duckdb:///md:my_db?motherduck_token={token}")
with eng.connect() as conn:
result = conn.execute(text("show databases"))
for row in result:
print(row)
```
:::info
While the DuckDB Python API has a `.sql()` method on the connection API, SQLAlchemy does not. However, they both share the `.execute()` function and concept. More info on the SQLAlchemy connection [here](https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Connection)
:::
---
Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/r
---
sidebar_position: 1
title: R
---
[R](https://www.r-project.org/) is a language for statistical analysis.
To connect to MotherDuck from an R program, you need to first install DuckDB:
```
install.packages("duckdb")
```
You'll then need to load the `motherduck` extension and `ATTACH 'md:'` to connect to all of your databases.
To connect to only one database, use `ATTACH 'md:my_db'` syntax.
```
library("DBI")
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "INSTALL 'motherduck'")
dbExecute(con, "LOAD 'motherduck'")
dbExecute(con, "ATTACH 'md:'")
dbExecute(con, "USE my_db")
res <- dbGetQuery(con, "SHOW DATABASES")
print(res)
```
Once connected, any R syntax described in [DuckDB's documentation](https://duckdb.org/docs/api/r.html) should work.
:::note
Extension autoloading is turned off in R duckdb distributions, so `dbdir = "md:"` style connections do not connect to MotherDuck.
:::
## Considerations and limitations
### Windows integration
The MotherDuck extension is not currently available on Windows. As a workaround, you can use [WSL](https://learn.microsoft.com/en-us/windows/wsl/about) (Windows Subsystem for Linux).
---
Source: https://motherduck.com/docs/integrations/orchestration/index
---
title: Orchestration
description: Orchestrate data pipelines with MotherDuck
---
import DocCardList from '@theme/DocCardList';
# Orchestration Tools
Build and manage data pipelines with MotherDuck using these orchestration tools.
---
Source: https://motherduck.com/docs/integrations/reverse-etl/index
---
title: Reverse ETL
description: Reverse ETL tools and utilities that work with MotherDuck
---
import DocCardList from '@theme/DocCardList';
# Reverse ETL Tools
Use these reverse ETL tools and utilities to move data from MotherDuck into your operational systems.
---
Source: https://motherduck.com/docs/integrations/sql-ides/datagrip
---
sidebar_position: 5
title: DataGrip
---
JetBrains [DataGrip](https://www.jetbrains.com/datagrip/) is a cross-platform IDE for working with SQL and NoSQL databases.
It includes a DuckDB integration, which makes connecting to MotherDuck easy.
## Connecting to MotherDuck in DataGrip
Create a new data source and choose the **DuckDB** driver. DataGrip opens the **Data Sources and Drivers** window where you configure the connection.
### Token Authentication
To retrieve a MotherDuck token, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).
1. In **Data Sources and Drivers > General**, set **Authentication** to **No auth**.
2. Populate the **URL** field with the MotherDuck connection string, replacing `my_db` with your database name or omitting it to connect to the default catalog:
```sh
jdbc:duckdb:md:[my_db]
```

3. Open the **Advanced** tab and add a new parameter named `motherduck_token`, setting its value to the token you generated earlier.

Click "OK" to begin querying MotherDuck!
:::note
The default schema filtering configuration of DataGrip may hide some of the schemas that exist in your MotherDuck account. Reconfigure to display all schemas following [DataGrip documentation](https://www.jetbrains.com/help/datagrip/schemas.html).
:::
## Update the DuckDB Driver Version
DataGrip bundles a DuckDB JDBC driver, but you can replace it with another version if needed.
1. Visit the [DuckDB JDBC maven repository](https://mvnrepository.com/artifact/org.duckdb/duckdb_jdbc).
2. Select the DuckDB release you want to use and download the `.jar` file listed under **Files**.
3. In the **Data Sources and Drivers** window, switch to the **Drivers** pane and select **DuckDB**.
4. On the **General** tab, find **Driver files**, click the **+** icon, and choose the `.jar` file you downloaded.
5. Remove the existing bundled DuckDB driver file from the **Driver files** list so that the new driver takes effect (the new `.jar` must be first in the list).
6. (Optional) To restore the default driver, click the **+** icon and select **DuckDB** from the available drivers.
DataGrip now uses the updated DuckDB driver for MotherDuck connections.
---
Source: https://motherduck.com/docs/integrations/sql-ides/dbeaver
---
sidebar_position: 5
title: DBeaver
---
[DBeaver Community](https://dbeaver.io/) is a free cross-platform database integrated development environment (IDE).
It includes a DuckDB integration, so it is a great choice for querying MotherDuck.
## DBeaver DuckDB Setup
DBeaver uses the official [DuckDB JDBC driver](https://duckdb.org/docs/api/java.html), which supports MotherDuck out of the box!
To install DBeaver and the DuckDB driver, first follow the [DuckDB DBeaver guide](https://duckdb.org/docs/guides/sql_editors/dbeaver).
That guide creates a local, in-memory DuckDB connection.
After completing those steps, follow the steps below to add a MotherDuck connection as well.
## Connecting DBeaver to MotherDuck
### Browser Authentication
Create a new DuckDB connection in DBeaver.
When entering the connection string in DBeaver, instead of using `:memory:` for an in-memory DuckDB, use `md:my_db`.
Replace `my_db` with the name of the target MotherDuck database as needed.
Clicking either "Test Connection" or "Finish" will open the default browser and display an authorization prompt.
Click "Confirm", then return to DBeaver to begin querying MotherDuck!
### Token Authentication
To avoid the authentication prompt when opening DBeaver, a MotherDuck access token can be included as a connection string parameter.
To retrieve a token, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).
Then, create a new DuckDB connection in DBeaver.
Include the token as a query string parameter in the connection string following this format, appending the access token from the prior step after `motherduck_token=` and replacing `my_db` with the target MotherDuck database:
```sh
md:my_db?motherduck_token=
```
Click "Finish" to begin querying MotherDuck!
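If you generate these connection strings programmatically, it can help to URL-encode the token value. Below is a minimal sketch, assuming the `md:` format shown above; the helper name is illustrative, and since MotherDuck access tokens are typically URL-safe, the encoding is a precaution rather than a requirement:

```python
from urllib.parse import quote

def dbeaver_connection_string(database, token):
    """Build an `md:` connection string with an embedded access token."""
    return f"md:{database}?motherduck_token={quote(token, safe='')}"

print(dbeaver_connection_string("my_db", "abc123"))
# → md:my_db?motherduck_token=abc123
```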
---
Source: https://motherduck.com/docs/integrations/sql-ides/index
---
title: SQL IDEs
description: Use MotherDuck with your favorite SQL development environments
---
import DocCardList from '@theme/DocCardList';
# SQL IDEs
Connect to MotherDuck using popular SQL development environments and query editors.
---
Source: https://motherduck.com/docs/integrations/transformation/dbt-cloud
---
sidebar_position: 20
title: dbt cloud with MotherDuck via pg_duckdb
description: For dbt cloud users, pg_duckdb can be used as a shim for MotherDuck
sidebar_label: dbt cloud
---
[dbt cloud](https://www.getdbt.com/product/dbt-cloud) is a managed service for dbt core. MotherDuck is used with dbt cloud by deploying a Postgres proxy with [pg_duckdb](/concepts/pgduckdb) installed.
## Getting Started
You will need the following items to get started:
1. A Postgres instance with pg_duckdb installed.
2. A [MotherDuck token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token).
3. A dbt cloud account.
## Configuring pg_duckdb
The full documentation for pg_duckdb can be found on [GitHub](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md), but a simple way to set it up is using Docker on EC2.
In our testing, we have used m7g.xlarge, which is a 4-core, 16GB instance. Since Postgres acts only as a proxy for MotherDuck, it needs just enough working space to stream results back to dbt. Even smaller instances, such as a1.large, may suffice, although they have not been tested thoroughly. The memory limits set below assume a 16GB instance.
Once you have added your MotherDuck Token and Postgres password to your environment, you can execute the `docker run` statement below:
```sh
docker run -d \
  --name pgduckdb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
  -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
  -v ~/pgduckdb_data_v17:/var/lib/postgresql/data \
  --restart unless-stopped \
  --memory=12288m \
  pgduckdb/pgduckdb:17-main
```
:::note
The default configuration of Postgres is sub-optimal for m7g.xlarge. Consider making the following changes to the `postgresql.conf` file.
```
# Memory configuration optimized for AWS m7g.xlarge with more conservative settings
work_mem = '32MB' # Per-operation memory for sorts, joins, etc.
maintenance_work_mem = '512MB' # Memory for maintenance operations
shared_buffers = '2GB' # ~12.5% of RAM for shared buffer cache
effective_cache_size = '6GB' # Conservative estimate of OS cache
max_connections = 100 # Reduced maximum concurrent connections
```
:::
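The values in that note follow common Postgres sizing rules of thumb (`shared_buffers` at roughly 12.5% of RAM, a conservative `effective_cache_size`). Below is a minimal sketch that derives comparable values for other instance sizes; the ratios are assumptions taken from the comments above, not MotherDuck recommendations:

```python
def pg_memory_settings(ram_gb):
    """Derive conservative postgresql.conf memory values from instance RAM.

    The ratios mirror the m7g.xlarge (16GB) example above:
    shared_buffers at ~12.5% of RAM, effective_cache_size at ~37.5%.
    """
    return {
        "work_mem": "32MB",
        "maintenance_work_mem": "512MB",
        "shared_buffers": f"{int(ram_gb * 0.125)}GB",
        "effective_cache_size": f"{int(ram_gb * 0.375)}GB",
        "max_connections": 100,
    }

# For a 16GB instance this reproduces the values in the note above.
print(pg_memory_settings(16))
```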
### Upgrading to newer builds of pg_duckdb
New pg_duckdb containers are built on every release. Since the container runs under Docker, the pg_duckdb server can be stopped, pruned, and rebuilt with the `docker run` command above. We recommend scripting this rebuild on a regular cadence, with Terraform or a similar tool handling the maintenance process.
An example shell script:
```sh
#!/bin/bash
# Error handling function
handle_error() {
  local line_no=$1
  local exit_code=$2
  echo "ERROR: An error occurred at line ${line_no}, exit code ${exit_code}"
  exit ${exit_code}
}

# Set up error trap
trap 'handle_error ${LINENO} $?' ERR

# Script to install Docker and run PGDuckDB with MotherDuck on AWS EC2
# Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh

# Detect OS
if grep -q 'Amazon Linux release 2023' /etc/os-release; then
  OS_VERSION="Amazon Linux 2023"
elif grep -q 'Amazon Linux release 2' /etc/os-release; then
  OS_VERSION="Amazon Linux 2"
elif grep -q 'Ubuntu' /etc/os-release; then
  OS_VERSION="Ubuntu"
else
  OS_VERSION="Linux"
fi

echo "Starting setup for PGDuckDB with MotherDuck on $OS_VERSION..."

# Check if required environment variables are set
if [ -z "$POSTGRES_PASSWORD" ]; then
  echo "ERROR: POSTGRES_PASSWORD environment variable is not set."
  echo "Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh"
  exit 1
fi
if [ -z "$MOTHERDUCK_TOKEN" ]; then
  echo "ERROR: MOTHERDUCK_TOKEN environment variable is not set."
  echo "Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh"
  exit 1
fi

# Update package lists - continue even if there are errors with some repositories
echo "Updating package lists..."
if [[ "$OS_VERSION" == "Ubuntu" ]]; then
  sudo apt-get update -y || true
elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
  sudo dnf update -y || true
else
  sudo yum update -y || true
fi

# Check if Docker is already installed
if command -v docker &>/dev/null; then
  echo "Docker is already installed, skipping installation."
else
  # Install prerequisites based on OS
  echo "Installing prerequisites..."
  if [[ "$OS_VERSION" == "Ubuntu" ]]; then
    sudo apt-get install -y \
      apt-transport-https \
      ca-certificates \
      curl \
      gnupg \
      lsb-release
  elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
    # Use --allowerasing to handle curl package conflicts
    sudo dnf install -y --allowerasing \
      device-mapper-persistent-data \
      lvm2 \
      ca-certificates
  else
    sudo yum install -y \
      device-mapper-persistent-data \
      lvm2 \
      ca-certificates
  fi

  # Install Docker based on OS
  echo "Installing Docker..."
  if [[ "$OS_VERSION" == "Ubuntu" ]]; then
    # Add Docker's official GPG key
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    # Set up the repository
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    # Update and install
    sudo apt-get update -y
    sudo apt-get install -y docker-ce docker-ce-cli containerd.io
  elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
    # Amazon Linux 2023 - use the standard package
    sudo dnf install -y docker
  elif [[ "$OS_VERSION" == "Amazon Linux 2" ]]; then
    # Amazon Linux 2 - use extras
    sudo amazon-linux-extras install -y docker
  else
    # Fallback
    sudo yum install -y docker
  fi

  # Verify Docker was installed
  if ! command -v docker &>/dev/null; then
    echo "ERROR: Docker installation failed."
    exit 1
  fi
fi

# Start Docker service
echo "Starting Docker service..."
sudo systemctl start docker || sudo service docker start
sudo systemctl enable docker || sudo chkconfig docker on

# Add current user to docker group to avoid using sudo with docker commands
echo "Adding current user to docker group..."
sudo usermod -aG docker "$USER"

# Create a new data directory for PostgreSQL 17
echo "Creating new data directory for PostgreSQL 17..."
mkdir -p ~/pgduckdb_data_v17

# Fix permissions on the data directory
echo "Setting correct permissions on data directory..."
sudo chown -R 999:999 ~/pgduckdb_data_v17  # 999 is the standard UID for postgres user in Docker
sudo chmod 700 ~/pgduckdb_data_v17

# Check architecture
ARCH=$(uname -m)
echo "Detected architecture: $ARCH"
if [[ "$ARCH" == "aarch64" || "$ARCH" == "arm64" ]]; then
  echo "Using ARM64 architecture (Graviton3)..."
else
  echo "Using x86_64 architecture..."
fi

# Check if container already exists and remove it if necessary
if sudo docker ps -a | grep -q pgduckdb; then
  echo "Found existing pgduckdb container. Removing it..."
  sudo docker stop pgduckdb || true
  sudo docker rm pgduckdb || true
fi

# Pull the Docker image
echo "Pulling Docker image..."
sudo docker pull pgduckdb/pgduckdb:17-main

# Check available system memory
echo "Checking system memory..."
TOTAL_MEM_KB=$(grep MemTotal /proc/meminfo | awk '{print $2}')
TOTAL_MEM_MB=$((TOTAL_MEM_KB / 1024))
echo "Total system memory: ${TOTAL_MEM_MB}MB"

# Calculate 75% of system memory for Docker container limit
DOCKER_MEM_LIMIT=$((TOTAL_MEM_MB * 75 / 100))
echo "Setting Docker container memory limit to: ${DOCKER_MEM_LIMIT}MB"

# Run the Docker container with memory limit
echo "Starting PostgreSQL container..."
sudo docker run -d \
  --name pgduckdb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
  -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
  -v ~/pgduckdb_data_v17:/var/lib/postgresql/data \
  --restart unless-stopped \
  --memory=${DOCKER_MEM_LIMIT}m \
  pgduckdb/pgduckdb:17-main

# Wait for PostgreSQL to start
echo "Waiting for PostgreSQL to start..."
sleep 10

# Configure PostgreSQL
echo "Configuring PostgreSQL and DuckDB..."

# Append settings to the main PostgreSQL configuration file
echo "Appending settings to PostgreSQL configuration file..."
sudo docker exec -i pgduckdb bash -c "cat >> /var/lib/postgresql/data/postgresql.conf << 'EOT'
# DuckDB integration settings
duckdb.motherduck_enabled = true
# Memory configuration optimized for AWS m7g.xlarge with more conservative settings
work_mem = '32MB' # Per-operation memory for sorts, joins, etc.
maintenance_work_mem = '512MB' # Memory for maintenance operations
shared_buffers = '2GB' # ~12.5% of RAM for shared buffer cache
effective_cache_size = '6GB' # Conservative estimate of OS cache
max_connections = 100 # Reduced maximum concurrent connections
# Detailed query logging
log_min_duration_statement = 0 # Log all queries
log_statement = 'all' # Log all SQL statements
log_duration = on # Log duration of each SQL statement
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u ' # Prefix format
EOT"

# Restart PostgreSQL to apply all configuration settings
echo "Restarting PostgreSQL container to apply all configuration settings..."
sudo docker restart pgduckdb

# Wait for PostgreSQL to restart
echo "Waiting for PostgreSQL container to restart..."
sleep 10

# Verify PostgreSQL is running with new settings
echo "Verifying PostgreSQL configuration..."
sudo docker exec -i pgduckdb psql -U postgres << EOF
-- Check if PostgreSQL is running
SELECT version();
EOF

# Create monitoring script
echo "Creating monitoring script..."
cat > ~/monitor_pg.sh << 'EOF'
#!/bin/bash
echo "=== PostgreSQL Container Status ==="
docker ps -a -f name=pgduckdb
echo -e "\n=== Resource Usage ==="
docker stats --no-stream pgduckdb
echo -e "\n=== Recent Logs ==="
docker logs --tail 10 pgduckdb
echo -e "\n=== Connection Test ==="
docker exec -it pgduckdb pg_isready -U postgres
if [ $? -eq 0 ]; then
  echo "PostgreSQL is accepting connections."
else
  echo "PostgreSQL is not accepting connections."
fi
EOF
chmod +x ~/monitor_pg.sh

# Create startup script
echo "Creating startup script..."
cat > ~/start_pg.sh << 'EOF'
#!/bin/bash
echo "Starting PostgreSQL container..."
docker start pgduckdb
echo "Container status:"
docker ps -a -f name=pgduckdb
EOF
chmod +x ~/start_pg.sh

# Check if container is running or restarting
echo "Checking container status..."
CONTAINER_STATUS=$(sudo docker inspect -f '{{.State.Status}}' pgduckdb 2>/dev/null || echo "not_found")
if [[ "$CONTAINER_STATUS" == "restarting" ]]; then
  echo "WARNING: Container is restarting. Checking logs for errors..."
  sudo docker logs pgduckdb
  echo ""
  echo "Try reducing the memory settings in the PostgreSQL configuration if the container keeps restarting."
  echo "You can manually adjust settings by connecting to the container once it's stable."
elif [[ "$CONTAINER_STATUS" != "running" && "$CONTAINER_STATUS" != "not_found" ]]; then
  echo "WARNING: Container is not running (status: $CONTAINER_STATUS). Checking logs for errors..."
  sudo docker logs pgduckdb
fi

# Final status check
echo "=== Setup Complete ==="
echo "PostgreSQL with DuckDB is now running."
echo "Container status:"
sudo docker ps -a -f name=pgduckdb
echo -e "\n=== Connection Information ==="
echo "Host: localhost"
echo "Port: 5432"
echo "User: postgres"
echo "Password: [The password you provided]"
echo "Database: postgres"
echo -e "\n=== Useful Commands ==="
echo "Monitor status: ./monitor_pg.sh"
echo "Start after reboot: ./start_pg.sh"
echo "Connect to PostgreSQL: docker exec -it pgduckdb psql -U postgres"
echo "View logs: docker logs pgduckdb"
echo -e "\n=== Note ==="
echo "You may need to log out and log back in for the docker group changes to take effect."
echo "After that, you can run docker commands without sudo."
```
## dbt cloud configuration
dbt cloud is configured as standard Postgres, with a couple of key details.
1. You will need to create a schema in MotherDuck for each user as well as for production, since using pg_duckdb to create new schemas in MotherDuck is not supported.
2. You will need to set an environment variable for `DBT_SCHEMA` that uses the pg_duckdb schema format, `ddb$[database]$[schema]`, since Postgres supports only a single database per instance. This must be set for each user as well as for production, and referenced with `{{ env_var('DBT_SCHEMA') }}`.
3. The recommended thread count follows our dbt-core recommendation, which is 4 threads.
If dbt is configured incorrectly, data may be written to Postgres, which is much slower than MotherDuck. In that case, the easiest fix is to rebuild the Docker container as described above, to ensure that no data accidentally ends up in Postgres.
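The `ddb$[database]$[schema]` format in point 2 can be generated and parsed with a small helper. This is a minimal sketch; the function names are illustrative, not part of pg_duckdb:

```python
def to_pgduckdb_schema(database, schema):
    """Encode a MotherDuck database/schema pair in pg_duckdb's naming format."""
    return f"ddb${database}${schema}"

def from_pgduckdb_schema(name):
    """Decode a pg_duckdb schema name back into (database, schema)."""
    prefix, database, schema = name.split("$", 2)
    if prefix != "ddb":
        raise ValueError(f"not a pg_duckdb schema name: {name!r}")
    return database, schema

print(to_pgduckdb_schema("my_db", "analytics"))
# → ddb$my_db$analytics
```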
## Usage notes
There are a few things to know about using dbt cloud with pg_duckdb that are unusual.
1. You write Postgres-dialect SQL that is executed against DuckDB. As such, there are some idiosyncrasies that are neither Postgres nor DuckDB, but a secret, third thing (pg_duckdb SQL). The details are described in the [pg_duckdb documentation](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md).
2. Views are stored only in Postgres, with no artifacts in MotherDuck. They can be used for interim data but not for final datasets consumed by end users. Changing the materialization type from view to table, or table to view, is therefore a hybrid MotherDuck & Postgres transaction, and is unsupported.
3. Running on multiple threads can occasionally cause deadlocks with the pg_duckdb catalog maintenance service. This can be resolved with `dbt retry` in your production pipeline runs.
4. DuckDB types are more specific than Postgres types, so model builds using numeric types may throw errors that can be resolved with explicit typing.
5. From time to time the Postgres catalog can get out of sync and show tables that do not exist in MotherDuck. To resolve this, create the missing object in MotherDuck, e.g. `CREATE TABLE my_schema.model_name AS SELECT 1;`, which will unblock your dbt model.
---
Source: https://motherduck.com/docs/integrations/transformation/dbt
---
sidebar_position: 1
title: dbt with DuckDB and MotherDuck
description: DuckDB and MotherDuck both support using dbt to manage data loading and transformation
sidebar_label: dbt core
---
[Data Build Tool](https://www.getdbt.com/) (dbt) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouses by defining SQL in model files. It brings the composability of programming languages to SQL while automating the mechanics of updating tables.
[dbt-duckdb](https://github.com/jwills/dbt-duckdb) is the adapter which allows dbt to use DuckDB and MotherDuck. The adapter also supports [DuckDB extensions](https://duckdb.org/docs/extensions/overview) and any of the additional [DuckDB configuration options](https://duckdb.org/docs/sql/configuration).
## Installation
Since dbt is a Python library, it can be installed through pip:
```sh
pip3 install dbt-duckdb
```
This installs both `dbt` and `duckdb`.
## Configuration for Local DuckDB
This configuration allows you to connect to S3 and perform read/write operations on Parquet files using an AWS access key and secret.
`profiles.yml`
```yaml
default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      threads: 4
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
  target: dev
```
:::tip
The `path` attribute specifies where your DuckDB database file will be created. By default, this path is relative to your `profiles.yml` file location. If the database doesn't exist at the specified path, DuckDB will automatically create it.
:::
You can find more information about these connections profiles in the [dbt documentation](https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles).
## Configuration for MotherDuck
The only change needed for MotherDuck is the `path:` setting.
```yaml
default:
  outputs:
    dev:
      type: duckdb
      path: "md:my_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}"
      threads: 4
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
  target: dev
```
This assumes that you have set up `MOTHERDUCK_TOKEN` as an environment variable. To learn more about persisting your authentication credentials, read [Authenticating to MotherDuck using an access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token). If you don't set `motherduck_token` in your path, you will be prompted to authenticate to MotherDuck when running your `dbt run` command.

Follow the instructions and it will export the service account variable for the current `dbt run` process.
DuckDB will parallelize a single write query as much as possible, so the gains from running more than one query at a time are minimal on the database side. That being said, our testing indicates that setting `threads: 4` typically leads to the best performance.
## Attaching Additional Databases
dbt-duckdb supports attaching additional databases to your main DuckDB connection, allowing you to work with multiple databases simultaneously. This is particularly useful when you need to reference data from different sources or when working with separate databases for different purposes.
### Configuration
To attach additional databases, add an `attach` section to your profile configuration:
```yaml
default:
  outputs:
    dev:
      type: duckdb
      path: "md:my_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}"
      threads: 4
      extensions:
        - httpfs
        - parquet
      attach:
        - path: "md:other_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}"
          alias: other_db
        - path: "md:third_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}"
          alias: third_db
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
  target: dev
```
:::tip
The `alias` parameter is optional. If not specified, dbt-duckdb will use the filename (without extension) as the alias for the attached database.
:::
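As the tip notes, when `alias` is omitted the attached database's name is derived from the path's filename without its extension. Below is a minimal sketch of that derivation, shown for illustration rather than as dbt-duckdb's actual implementation:

```python
import os

def default_alias(path):
    """Derive an attach alias from a database path: filename minus extension."""
    filename = os.path.basename(path)
    alias, _ext = os.path.splitext(filename)
    return alias

print(default_alias("/data/warehouse/other.duckdb"))
# → other
```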
### Usage Example
Once you have attached databases, you can use the `database` config parameter in your dbt models to specify which database to write to:
```sql
-- models/my_model.sql
{{ config(database='other_db') }}

SELECT
    id,
    name,
    created_at
FROM {{ ref('source_table') }}
WHERE created_at >= '2024-01-01'
```
You can also specify the database for source tables in your `sources.yml` file:
```yaml
# models/sources.yml
version: 2
sources:
  - name: external_data
    database: other_db
    tables:
      - name: customers
        description: Customer data from external database
      - name: orders
        description: Order data from external database
```
Then reference these sources in your models, from the correct database:
```sql
-- models/combined_data.sql
SELECT
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.order_date
FROM {{ source('external_data', 'customers') }} c
JOIN {{ source('external_data', 'orders') }} o ON c.customer_id = o.customer_id
```
## Extra resources
Take a look at our video guide on DuckDB and dbt provided below, along with the corresponding [demo tutorial on GitHub](https://github.com/mehd-io/dbt-duckdb-tutorial).
VIDEO
---
Source: https://motherduck.com/docs/integrations/transformation/index
---
title: Data Transformation
description: Transform your data inside MotherDuck
---
import DocCardList from '@theme/DocCardList';
# Data Transformation
Use MotherDuck to transform your data.
---
Source: https://motherduck.com/docs/integrations/web-development/index
---
title: Web Development
description: Build web applications with MotherDuck
---
import DocCardList from '@theme/DocCardList';
# Web Development
Use MotherDuck to power your web applications and services.
---
Source: https://motherduck.com/docs/integrations/web-development/vercel
---
sidebar_position: 1
title: Vercel
description: Hosting a web application with MotherDuck Wasm SDK on Vercel
sidebar_label: Vercel
---
[Vercel](https://vercel.com/) is a cloud platform for static sites and serverless functions. It is a great platform for hosting web applications using [MotherDuck Wasm SDK](/sql-reference/wasm-client).
Vercel typically provides two ways to integrate with third-party services:
- Native integration: create a new account on the third-party service and connect it to Vercel. Billing and setup are managed by Vercel.
- Non-native integration (connectable accounts): connect existing third-party accounts to Vercel.
:::info
Vercel supports native integration with MotherDuck; support for non-native integration is coming soon.
:::
## Native Integration
To kickstart the integration, you can either:
- Install the integration on an existing Vercel project from [Vercel's marketplace](https://vercel.com/marketplace/motherduck).
- Deploy a new project from [MotherDuck's Vercel template](https://vercel.com/motherduck-marketing/~/integrations/motherduck), which includes snippets to get started with MotherDuck and your Next.js project.
### How to install
To install the MotherDuck Native Integration from the Vercel Marketplace:
1. Navigate to the Vercel Marketplace or to the Integrations Console on your Vercel Dashboard.
2. Locate the MotherDuck integration.
3. Click Install.
4. On the Install MotherDuck modal, you are presented with two plan options.

5. On the next modal, you are prompted to give your database a name. Note that a new installation creates a new account and database within a new MotherDuck organization.

6. You are all set! You now have a new account and database within a new organization. In addition, tokens ([access token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token) and [read scaling token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#understanding-read-scaling-tokens)) are automatically generated and stored in Vercel's environment variables.

You can head to the `Getting Started` section on the integration page for more information on how to use the integration.

### Project templates
Learn more about how to setup your projects by using the following templates:
- [MotherDuck's Vercel template](https://github.com/MotherDuck-Open-Source/nextjs-motherduck-wasm-analytics-quickstart): a fully fledged template that includes a Next.js project and a MotherDuck Wasm setup, with sample data integration and an interactive data visualization example.
- [MotherDuck's Vercel template minimal](https://github.com/MotherDuck-Open-Source/nextjs-motherduck-wasm-analytics-quickstart-minimal): a minimal template that includes a Next.js project and a MotherDuck Wasm setup with some sample data integration.
---
Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/ai-features-in-ui
---
sidebar_position: 1
title: AI Features in the MotherDuck UI
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Automatically Edit SQL Queries in the MotherDuck UI
Edit is a MotherDuck AI-powered feature that lets you edit SQL queries in the MotherDuck UI. The AI is aware of DuckDB-specific SQL features and of the relevant database schemas, so it can provide effective suggestions.
Select the specific part of the query you want to edit, then press the keyboard shortcut to open the Edit dialog:
* Windows/Linux: `Ctrl + Shift + E`
* macOS: `⌘ + Shift + E`
In the Edit dialog, enter your prompt (e.g., "extract the domain from the url, using a regex") and click Suggest edit.

If the suggestion is not as desired, it can be further clarified with follow-up prompts.

When happy with the change, click 'Apply edit', and the change will be applied to the query.

## Automatically Fix SQL Errors in the MotherDuck UI
FixIt is a MotherDuck AI-powered feature that helps you resolve common SQL errors by offering fixes in-line. Read more about it in our [blog post](https://motherduck.com/blog/introducing-fixit-ai-sql-error-fixer/).
FixIt can also be invoked programmatically using the `prompt_fix_line` table function. Find more information [here](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line).
### How FixIt works
By default, FixIt is enabled for all users. If you run a query that has an error, FixIt will automatically analyze the query and suggest in-line fixes.
When accepting a fix, MotherDuck will automatically update your query and re-execute it.

When 'Auto-suggest' is toggled off, FixIt no longer suggests fixes automatically. It can still be triggered manually by clicking 'Suggest fix' at the bottom of the error message.

## Access SQL Assistant functions
MotherDuck provides built-in AI features to help you write, understand and fix DuckDB SQL queries more efficiently. These features include:
- [Answer questions about your data](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-query) using the `prompt_query` pragma.
- [Generate SQL](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql) for you using the `prompt_sql` table function.
- [Correct and fix up your SQL query](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fixup) using the `prompt_fixup` table function.
- [Correct and fix up your SQL query line-by-line](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line) using the `prompt_fix_line` table function.
- [Help you understand a query](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-explain) using the `prompt_explain` table function.
- [Help you understand contents of a database](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-schema) using the `prompt_schema` table function.
### Example usage of prompt_sql
We use MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).
```sql
CALL prompt_sql('what are the top domains being shared on hacker_news?');
```
The output of this SQL statement is a single-column table containing the AI-generated SQL query.
| **query** |
|-----------------|
| SELECT COUNT(*) as domain_count, SUBSTRING(SPLIT_PART(url, '//', 2), 1, POSITION('/' IN SPLIT_PART(url, '//', 2)) - 1) as domain FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY domain_count DESC LIMIT 10 |
---
Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/building-analytics-agents
---
title: Custom AI Agent Builder's Guide
---
import MotherDuckSQLEditor from '@site/src/components/MotherDuckSQLEditor';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Building Analytics Agents with MotherDuck
Analytics agents are AI-powered systems that allow users to interact with data using natural language. Instead of writing SQL queries or building dashboards, users can ask questions like "What were our top-selling products last quarter?" and get immediate answers.
This guide covers best practices for building production-ready analytics agents on MotherDuck.
## Prerequisites
- **Agent framework**: [Claude Agent SDK](https://docs.anthropic.com/en/api/agent-sdk/overview), [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/), or Claude Desktop with MotherDuck MCP connector
- **MotherDuck account** with the data you want to query
- **Clean, well-structured data**: The better your schema and metadata, the better your agent performs
## Step 1: Define Your Agent's Interface
Choose the interface your agent will use to query your MotherDuck database.
### Option A: Generated SQL
The agent generates SQL queries and executes them via a tool/function call. This provides maximum flexibility - agents can answer any question your data supports - but requires good SQL generation capabilities.
**Implementation approaches:**
**MCP Server**: Use our [MCP Server](/sql-reference/mcp/) for Claude Desktop, Cursor, ChatGPT, or Claude Code
**Custom tool calling**: Create a function that accepts SQL strings and executes them:
```python
import duckdb

def execute_sql(query: str) -> str:
    """Execute a SQL query against MotherDuck."""
    conn = duckdb.connect('md:my_database?motherduck_token=')
    try:
        result = conn.execute(query).fetchdf()
        return result.to_string()
    except Exception as e:
        return f"Error: {str(e)}"
```
### Option B: Parameterized Query Templates
The agent receives structured parameters that fill predefined SQL templates. This provides strict correctness guarantees and is easier to validate, but is less flexible, requires more upfront development, and limits the agent to predefined questions.
**Example**: Agent chooses calling a custom tool with a domain-specific signature like `get_sales_by_region(region: str, start_date: date, end_date: date)` instead of generating custom SQL.
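A minimal sketch of this template approach (the table and column names, like `sales` and `amount`, are hypothetical and only for illustration):

```python
from datetime import date

# Hypothetical vetted template: the agent supplies structured arguments and
# never writes raw SQL. Parameters are bound via placeholders by the caller.
SALES_BY_REGION_SQL = (
    "SELECT region, SUM(amount) AS total_sales "
    "FROM sales WHERE region = ? AND sale_date BETWEEN ? AND ? "
    "GROUP BY region"
)

ALLOWED_REGIONS = {"us", "emea", "apac"}

def get_sales_by_region(region: str, start_date: date, end_date: date):
    """Validate structured agent arguments and return (sql, bind_params)."""
    if region.lower() not in ALLOWED_REGIONS:
        raise ValueError(f"unknown region: {region}")
    if start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    # The caller then runs: conn.execute(sql, params)
    return SALES_BY_REGION_SQL, [region.lower(), start_date, end_date]

sql, params = get_sales_by_region("US", date(2024, 1, 1), date(2024, 3, 31))
print(params[0])  # us
```

Because the SQL is fixed ahead of time, validation reduces to checking the parameters, which is what makes this option easy to reason about.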
**Recommendation**: Start with Option A (SQL generation) unless you have strict correctness requirements or very limited query patterns.
## Step 2: Give Your Agent SQL Knowledge
Your LLM needs to know how to write good DuckDB queries.
### System Prompt for DuckDB and MotherDuck
A system prompt is the foundational instruction set that guides your agent's behavior and capabilities. It's critical for ensuring your agent generates correct, efficient SQL queries and understands how to explore data effectively.
The query guide below should be added to your system prompt because it contains:
- DuckDB SQL syntax and conventions
- Common patterns and best practices
- How to explore schemas efficiently
query_guide.md
```text
# DuckDB SQL Query Syntax and Performance Guide
## General Knowledge
### Basic Syntax and Features
**Identifiers and Literals:**
- Use double quotes (`"`) for identifiers with spaces/special characters or case-sensitivity
- Use single quotes (`'`) for string literals
**Flexible Query Structure:**
- Queries can start with `FROM`: `FROM my_table WHERE condition;` (equivalent to `SELECT * FROM my_table WHERE condition;`)
- `SELECT` without `FROM` for expressions: `SELECT 1 + 1 AS result;`
- Support for `CREATE TABLE AS` (CTAS): `CREATE TABLE new_table AS SELECT * FROM old_table;`
**Advanced Column Selection:**
- Exclude columns: `SELECT * EXCLUDE (sensitive_data) FROM users;`
- Replace columns: `SELECT * REPLACE (UPPER(name) AS name) FROM users;`
- Pattern matching: `SELECT COLUMNS('sales_.*') FROM sales_data;`
- Transform multiple columns: `SELECT AVG(COLUMNS('sales_.*')) FROM sales_data;`
**Grouping and Ordering Shortcuts:**
- Group by all non-aggregated columns: `SELECT category, SUM(sales) FROM sales_data GROUP BY ALL;`
- Order by all columns: `SELECT * FROM my_table ORDER BY ALL;`
**Complex Data Types:**
- Lists: `SELECT [1, 2, 3] AS my_list;`
- Structs: `SELECT {'a': 1, 'b': 'text'} AS my_struct;`
- Maps: `SELECT MAP([1,2],['one','two']) AS my_map;`
- Access struct fields: `struct_col.field_name` or `struct_col['field_name']`
- Access map values: `map_col[key]`
**Date/Time Operations:**
- String to timestamp: `strptime('2023-07-23', '%Y-%m-%d')::TIMESTAMP`
- Format timestamp: `strftime(NOW(), '%Y-%m-%d')`
- Extract parts: `EXTRACT(YEAR FROM DATE '2023-07-23')`
### Database and Table Qualification
**Fully Qualified Names:**
- Tables are accessed by fully qualified names: `database_name.schema_name.table_name`
- There is always one current database: `SELECT current_database();`
- Tables from the current database don't need database qualification: `schema_name.table_name`
- Tables in the main schema don't need schema qualification: `table_name`
- Shorthand: `my_database.my_table` is equivalent to `my_database.main.my_table`
**Switching Databases:**
- Use `USE my_other_db;` to switch current database
- After switching, tables in that database can be accessed without qualification
### Schema Exploration
**Get database and table information:**
- List all databases: `SELECT alias as database_name, type FROM MD_ALL_DATABASES();`
- List tables in database: `SELECT database_name, schema_name, table_name, comment FROM duckdb_tables() WHERE database_name = 'your_database';`
- List views in database: `SELECT database_name, schema_name, view_name, comment, sql FROM duckdb_views() WHERE database_name = 'your_database';`
- Get column information: `SELECT column_name, data_type, comment, is_nullable FROM duckdb_columns() WHERE database_name = 'your_database' AND table_name = 'your_table';`
**Sample data exploration:**
- Quick preview: `SELECT * FROM table_name LIMIT 5;`
- Column statistics: `SUMMARIZE table_name;`
- Describe table: `DESCRIBE table_name;`
### Performance Tips
**QUALIFY Clause for Window Functions:**
-- Get top 2 products by sales in each category
SELECT category, product_name, sales_amount
FROM products
QUALIFY ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales_amount DESC) <= 2;
**Efficient Patterns:**
- Use `arg_max()` and `arg_min()` for "most recent" queries
- Filter early to reduce data volume
- Use CTEs for complex queries
- Prefer `GROUP BY ALL` for readability
- Use `QUALIFY` instead of subqueries for window function filtering
**Avoid These Patterns:**
- Applying functions to columns on the left side of `WHERE` comparisons (prevents filter pushdown)
- Unnecessary ORDER BY on intermediate results
- Cross products and cartesian joins
```
### Function Documentation
MotherDuck maintains `function_docs.jsonl` - compact, LLM-friendly documentation for every DuckDB/MotherDuck function available at: https://app.motherduck.com/assets/docs/function_docs.jsonl
**How to use**:
1. When user asks a question, search function docs using FTS or semantic search
2. Add the 5 most relevant function descriptions to the agent's context
3. This helps with specialized functions (window functions, date arithmetic, JSON operations, etc.)
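A rough sketch of that retrieval step, assuming hypothetical `name` and `description` fields (the real schema of `function_docs.jsonl` is in preview and may differ):

```python
import json

# Hypothetical example: field names ("name", "description") are assumptions,
# not the documented schema of function_docs.jsonl.
def top_matching_functions(jsonl_text: str, question: str, k: int = 5):
    """Rank function docs by naive keyword overlap with the user question."""
    docs = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    terms = set(question.lower().split())

    def score(doc):
        text = (doc.get("name", "") + " " + doc.get("description", "")).lower()
        return sum(term in text for term in terms)

    return sorted(docs, key=score, reverse=True)[:k]

sample = "\n".join([
    '{"name": "date_trunc", "description": "Truncate a date to a given precision"}',
    '{"name": "json_extract", "description": "Extract a value from a JSON document"}',
])
print(top_matching_functions(sample, "truncate date to month")[0]["name"])  # date_trunc
```

In production you would replace the keyword overlap with FTS or embedding similarity, as the steps above suggest.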
:::info Preview Feature
Function documentation is in 'Preview' and will change in the future. The schema, content, and availability of `function_docs.jsonl` will change over time as we improve the resource.
:::
## Step 3: Give Your Agent Schema Context
Your agent needs to understand your database structure to generate correct queries.
### Finding Relevant Tables
Our `query_guide.md` explains how agents can explore schemas autonomously to find relevant tables. For faster, non-agentic identification, use the built-in `__MD_FILTER_TABLES()` function with fuzzy keyword search:
```sql
-- Use MotherDuck's smart table filtering function
-- Replace 'sales', 'customer' with relevant search terms
SELECT
    database_name,
    schema_name,
    table_name,
    table_comment,
    column_comments,
    table_similarity,
    column_matches
FROM __MD_FILTER_TABLES(['sales', 'customer'], current_database())
ORDER BY table_similarity DESC
LIMIT 15;
```
:::info Preview Feature
The `__MD_FILTER_TABLES` function is in 'Preview'. The function name, signature, or filtering behavior may change in the future.
:::
If you want to build your own filtering function, these are the raw tables to start with: [`duckdb_tables()`](https://duckdb.org/docs/stable/sql/meta/duckdb_table_functions.html#duckdb_tables), [`duckdb_columns()`](https://duckdb.org/docs/stable/sql/meta/duckdb_table_functions.html#duckdb_columns), [`duckdb_views()`](https://duckdb.org/docs/stable/sql/meta/duckdb_table_functions.html#duckdb_views), and [`MD_ALL_DATABASES()`](/sql-reference/motherduck-sql-reference/show-databases/).
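If you go that route, a minimal sketch of the scoring logic (the catalog rows here are hard-coded for illustration; in practice they would come from querying those table functions):

```python
# Hypothetical DIY table filter: rank tables by how many search keywords
# appear in the table name, its comment, or its column names.
def filter_tables(catalog_rows, keywords):
    """catalog_rows: iterable of (table_name, comment, column_names)."""
    scored = []
    for name, comment, columns in catalog_rows:
        haystack = " ".join([name, comment or "", *columns]).lower()
        score = sum(kw.lower() in haystack for kw in keywords)
        if score:
            scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]

rows = [
    ("order_details", "Line items per order", ["order_id", "customer_id", "amount"]),
    ("web_logs", "Raw HTTP access logs", ["ts", "path", "status"]),
]
print(filter_tables(rows, ["customer", "order"]))  # ['order_details']
```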
### Make Schemas Agent-Friendly
**Use clear naming**: Choose explicit, unambiguous table and column names
❌ Bad: `ord_dtl`, `cust_id`, `amt`
✅ Good: `order_details`, `customer_id`, `total_amount`
**Add context with COMMENT ON**:
```sql
COMMENT ON TABLE orders IS 'Customer orders since 2020. Join to customers via customer_id';
COMMENT ON COLUMN orders.status IS 'Possible values: pending, shipped, delivered, cancelled';
COMMENT ON COLUMN orders.total_amount IS 'Total in USD including tax and shipping';
```
Comments help agents understand table relationships, valid values, and business logic. Learn more: [COMMENT ON documentation](https://duckdb.org/docs/stable/sql/statements/comment_on.html)
## Step 4: Configure Access Controls
Secure your agent's database access with appropriate permissions and isolation.
### Read-Only Access
Use [read-scaling tokens](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) to ensure your agent only has read access. Read-scaling tokens connect to dedicated read replicas that cannot modify data.
```python
import duckdb
# Using a read-scaling token ensures read-only access
con = duckdb.connect('md:my_database?motherduck_token=')
```
**For multi-tenant [customer-facing analytics](/getting-started/customer-facing-analytics/) agents**:
Use [service accounts](/key-tasks/service-accounts-guide/#guide---create-and-configure-a-service-account) for your agents. You can grant these service accounts read-only access to specific databases using [shares](/key-tasks/sharing-data/sharing-overview/):
```sql
ATTACH 'md:_share/my_org/abc123' AS shared_data;
```
Consider creating separate service accounts per user/tenant for full compute isolation.
**Capacity planning**: Choose the number of [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) replicas and [Duckling size](/about-motherduck/billing/duckling-sizes/) according to the expected query complexity and concurrency.
### Read-Write Access & Sandboxing
For agents that need to create tables, modify data, or experiment safely, use zero-copy clones to create an isolated sandbox. Clones are created instantly through zero-copy operations and are completely isolated from production data, so agents can create tables, modify data, and experiment freely, then share results back to production when ready.
```sql
-- Create instant writable copy (clones must match source retention type)
CREATE DATABASE my_sandbox FROM my_database_share;
-- Agent can now read/write without affecting production data
-- Changes are isolated to this copy
```
Learn more: [CREATE DATABASE documentation](/sql-reference/motherduck-sql-reference/create-database/)
## Step 5: Implement Your Agent
Build your agent using an SDK or framework that supports function calling.
**Quick start option**: For immediate experimentation, try [Claude Desktop with the MotherDuck MCP Server](/sql-reference/mcp/) - no coding required.
**Custom agent option**: Here's a simple example using the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/):
```python
import duckdb
from agents import Agent, Runner, function_tool

# Connect to MotherDuck (use a read-scaling token for read-only access)
conn = duckdb.connect('md:?motherduck_token=')

@function_tool
def query_motherduck(sql: str) -> str:
    """Execute SQL query against MotherDuck database.

    Args:
        sql: The SQL query to execute against the MotherDuck database.
    """
    try:
        result = conn.execute(sql).fetchdf()
        return result.to_string()
    except Exception as e:
        return f"Error executing query: {str(e)}"

# Load the DuckDB query guide (copy the system prompt template above into a local file)
with open('query_guide.md', 'r') as f:
    query_guide = f.read()

# Create agent with database tool
agent = Agent(
    name="MotherDuck Analytics Agent",
    instructions=f"""You are a data analyst helping users query a MotherDuck database.
Use the query_motherduck tool to execute SQL queries against the database.
Always start with schema exploration before querying specific tables.
{query_guide}
""",
    tools=[query_motherduck],
)

# Run the agent
result = Runner.run_sync(
    agent,
    "What were the top 5 products by revenue last month?"
)
print(result.final_output)
```
### Validating Queries Before Showing to Users
If your agent has a human in the loop to review, edit, or learn from generated queries, you face a challenge: you don't want to show users queries that are syntactically wrong or use non-existing table or column names, but you also don't want to execute queries without human approval. Use `try_bind()` to check for errors without execution—it validates syntax and verifies all referenced tables/columns exist in just a few milliseconds.
```sql
-- Valid query - empty result means success
CALL try_bind('SELECT customer_id, total FROM orders WHERE status = ''shipped''');
-- Invalid query - returns error message
CALL try_bind('SELECT * FORM orders');
```
**Example integration:**
```python
def generate_query_for_review(question: str) -> str:
    """Generate and validate SQL before showing it to the user."""
    error_msg = None
    for attempt in range(3):
        sql = agent.generate_sql(question, error_feedback=error_msg)
        # Validate before showing: try_bind returns no rows on success
        validation = conn.execute("CALL try_bind(?)", [sql]).fetchone()
        if validation is None:  # empty result means the query bound successfully
            return f"Generated query:\n{sql}"
        error_msg = validation[0]  # pass the error to the agent for the next attempt
    return "Could not generate a valid query to answer the question"
```
The agent should incorporate the error feedback from `try_bind()` into subsequent generation attempts to fix syntax errors or incorrect table/column references.
## Step 6: Test and Iterate
Validate your agent's performance and refine its behavior based on real-world usage.
### Testing and Quality
Choose a set of realistic user questions that cover simple filters ("Show me sales from last month"), complex analysis ("What's the trend in customer retention by region?"), and edge cases like empty results ("Show me sales for December 2019") or ambiguous requests ("Show me the best customers"). Test each question and check the agent's behavior. Focus on SQL correctness, result accuracy and query performance. See the next section for how to tackle common issues.
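One way to sketch such a test harness (assuming a hypothetical `run_agent` callable that returns the agent's final answer as text):

```python
# Hypothetical evaluation harness: each case pairs a question with a cheap
# check on the answer text. Real checks might compare against known-good
# query results instead.
TEST_CASES = [
    ("Show me sales from last month", lambda a: "sales" in a.lower()),
    ("Show me sales for December 2019", lambda a: "no" in a.lower() or "0" in a),
]

def evaluate(run_agent, cases=TEST_CASES):
    """Return the fraction of test questions whose answers pass their check."""
    passed = sum(check(run_agent(question)) for question, check in cases)
    return passed / len(cases)

# Example with a trivial stub standing in for the real agent:
print(evaluate(lambda q: "Total sales: 0 rows found"))  # 1.0
```

Re-running the same question set after each prompt or schema change gives you a quick regression signal.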
### Common Issues and Solutions
| Issue | Solution |
|-------|----------|
| Invalid SQL generation | Improve system prompt, add [function docs](#function-documentation) to context |
| Wrong tables queried | Add [COMMENT ON](https://duckdb.org/docs/stable/sql/statements/comment_on.html), improve schema descriptions, implement table filtering |
| Misunderstood questions | Add domain-specific examples to system prompt |
| Query performance | [EXPLAIN ANALYZE](/sql-reference/motherduck-sql-reference/explain-analyze/) to diagnose query inefficiencies, adjust [Duckling size](/about-motherduck/billing/duckling-sizes/) to scale compute resources |
## Next Steps
- Explore our [MCP Server](/sql-reference/mcp/) docs
- Try [AI Features in the MotherDuck UI](/key-tasks/ai-and-motherduck/ai-features-in-ui/) with Generate SQL & Edit
- Learn about [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for multi-tenant agents
- Review [Shares](/key-tasks/sharing-data/sharing-overview/) for read-only data access
---
Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-workflows
---
sidebar_position: 10
title: Using the MotherDuck MCP Server
description: Effective workflows and best practices for getting the most out of the MotherDuck MCP Server with AI assistants
---
The MotherDuck MCP Server connects AI assistants like Claude, ChatGPT, and Cursor to your data. This guide covers effective workflows for getting accurate, useful results from your AI-powered data analysis. If you haven't already, [set up your MCP connection](/sql-reference/mcp/). By the end, you'll be able to guide an AI assistant of your choice to write accurate, insightful queries.
## Prerequisites
To use the MotherDuck remote MCP server, you will need:
- A MotherDuck account with at least one database
- An AI client like Claude, Cursor, or ChatGPT already connected to the remote MCP server ([setup instructions](/sql-reference/mcp/))
:::note Read-only access
The MotherDuck MCP Server provides read-only access to your databases. You can explore schemas and run SELECT queries, but the AI cannot modify your data. If you need to run write operations like UPDATE or CREATE TABLE AS statements, ask the AI to generate the query and then run it yourself in the MotherDuck UI or your preferred client. For fully automated write operations, use the [self-hosted MCP server](/sql-reference/mcp/#self-hosted-mcp-server).
:::
## Start with schema exploration
Before diving into analysis, help the AI understand your data. This is a form of **context engineering**: by exploring your schema upfront, you hydrate the conversation with knowledge about your tables, columns, and relationships. This context carries forward, helping the AI write more accurate queries throughout your session.
Start conversations by asking about your database structure:
**Good first prompts:**
- *"What databases and tables do I have access to?"*
- *"Describe the schema of my `analytics` database"*
- *"What columns are in the `orders` table and what do they contain?"*
The MCP server provides tools for schema exploration that surface table relationships, data types, and any documentation you've added to your schema.
:::tip
If you have well-documented tables with [`COMMENT ON`](https://duckdb.org/docs/stable/sql/statements/comment_on.html) descriptions, the AI can use these to better understand your data's business meaning.
:::
## Frame questions with context
The more context you provide, the better the results. Include relevant details like:
- **Time ranges**: *"Show me orders from the last 30 days"* vs *"Show me orders"*
- **Filters**: *"Analyze customers in the US with more than 5 purchases"*
- **Metrics**: *"Calculate revenue as `quantity * unit_price`"*
- **Output format**: *"Return results as a summary table with percentages"*
**Example - Vague vs. Specific:**
| ❌ Vague | ✅ Specific |
|----------|-------------|
| "Show me sales data" | "Show me total sales by product category for Q4 2024, sorted by revenue descending" |
| "Find top customers" | "Find the top 10 customers by total order value in the last 12 months" |
| "Analyze trends" | "Compare monthly active users month-over-month for 2024, showing growth rate" |
## Iterate
Complex analysis works best as a conversation. Start simple, validate the results, then build up. Each exchange adds shared context, helping the AI write better queries as you go. While there is a temptation to get the perfect query in one shot, often insight comes as part of the process of data exploration.
When iterating, it can be helpful to have source data nearby to help verify outputs. Our users have noted that using their existing BI dashboard to quickly validate that metrics are correct helps to develop intuition about the information provided by the AI assistants.
## Common workflow patterns
### Data profiling
Quickly understand a new dataset:
```
"Profile the `transactions` table - show me:
- Row count and date range
- Distribution of key categorical columns
- Summary statistics for numeric columns
- Any null values or data quality issues"
```
:::tip DuckDB functions for EDA
DuckDB has a few SQL functions that are great for hydrating context:
- `DESCRIBE`, which retrieves the metadata for a specific table
- `SUMMARIZE`, which computes summary statistics for a table (output can be large)
- The `USING SAMPLE 10` clause (at the end of the query), which samples the data (output can be large); combining it with a `WHERE` clause to narrow the scan helps performance
:::
### Generating charts
Some AI clients can generate visualizations directly from your query results. ChatGPT on the web and Claude Desktop both support creating charts as "artifacts" alongside your conversation.
Visualizations help you spot trends and outliers faster than scanning tables, validate that query results make sense at a glance, and share insights with stakeholders who prefer visual formats.
**Example prompts:**
- *"Chart monthly revenue for 2024 as a line graph"*
- *"Create a bar chart showing the top 10 customers by order count"*
- *"Visualize the distribution of order values as a histogram"*
- *"Show me a time series of daily active users with a 7-day moving average"*
Once you have a chart, you can iterate on it just like query results: *"Add a trend line"*, *"Change to a stacked bar chart"*, or *"Break this down by region"*.
:::note
When using the MCP with more IDE-like interfaces, the MCP plays very nicely with libraries like `matplotlib` for building more traditional charts.
:::
### Ad-hoc investigation
The MCP is especially useful for exploratory debugging when you're not sure what you're looking for. Rather than writing queries upfront, you can describe the problem and let the AI help you dig in.
```
"I noticed a spike in errors on Dec 10th. Help me investigate:
- What types of errors increased?
- Were specific users or endpoints affected?
- What changed compared to the previous week?"
```
One pattern we use at MotherDuck is loading logs or event data into a database and using the MCP to interrogate it conversationally. Instead of manually crafting regex patterns or grep commands, you can ask questions like *"What are the most common error messages in the last hour?"* or *"Show me all requests from user X that resulted in a 500 error"*. This turns log analysis from a tedious grep session into an interactive investigation where each answer informs the next question.
## Working with query results
### Refining results
Results rarely come out perfect on the first try. The conversational nature of MCP means you can refine incrementally rather than rewriting queries from scratch. If you're seeing test data mixed in, just say *"Add a filter to exclude test accounts"*. If the granularity is wrong, ask to *"Change the grouping from daily to weekly"*. Small adjustments like changing sort order or adding a column are easy follow-ups.
### Understanding queries
When the AI generates complex SQL, don't hesitate to ask for an explanation. This is useful both for validating the approach and for learning. Ask *"Explain what this query is doing step by step"* to understand the logic, or *"Are there any edge cases this query might miss?"* to sanity-check the results before relying on them.
### Exporting for further use
Once you have the results you need, ask for output in the format that fits your workflow. You can request a markdown table for documentation, CSV-friendly output for spreadsheets, or a written summary to share with your team. The AI can also help you format results for specific tools or audiences. The results can also be a great jumping-off point for further analysis with an expert, so asking for the final query to hand off is a useful last step.
## Tips for better results
### Be explicit about assumptions
Your data likely has business rules that aren't obvious from the schema alone. If a "completed" order means status is either 'shipped' or 'delivered', say so. If revenue calculations should exclude refunds, mention it upfront. The AI can't infer these domain-specific rules, so stating them early prevents incorrect results and saves iteration time.
### Reference specific tables and columns
When you already know your schema, being specific helps the AI get it right the first time. Instead of asking about "the timestamp", say *"Use the `user_events.event_timestamp` column"*. If you know how tables relate, specify the join: *"Join `orders` to `customers` on `customer_id`"*. This is especially helpful in larger schemas where column names might be ambiguous.
### Ask for validation
When accuracy matters, ask the AI to sanity-check its own work. Questions like *"Does this total match what you'd expect based on the row counts?"* or *"Can you verify this join doesn't create duplicates?"* can catch subtle bugs before you rely on the results. The AI can run quick validation queries to confirm the logic is sound.
## Troubleshooting
:::tip Beyond querying
The MCP server includes tools beyond just running queries. Most are metadata lookups or search functions for finding tables and columns, but the [ask docs question](/sql-reference/mcp/ask-docs-question) tool is particularly useful when you're stuck on tricky syntax or DuckDB-specific features. If the AI is struggling with a query pattern, try asking it to look up the relevant documentation first.
:::
| Issue | Solution |
|-------|----------|
| AI queries wrong table | Ask: *"What tables are available?"* then specify the correct one |
| Results don't look right | Ask: *"Show me sample data from the source table"* to verify the data |
| Query is slow | Ask: *"Can you optimize this query?"*, add filters to reduce data scanned, or [increase your Duckling size](/about-motherduck/billing/duckling-sizes/) |
| AI doesn't understand the question | Rephrase with more specific column names and business context |
| Can't type fast enough | Use voice-to-text to interact with your AI assistant |
## Related resources
- [MCP Server Setup](/sql-reference/mcp/) - Installation and configuration
- [AI Features in the UI](/key-tasks/ai-and-motherduck/ai-features-in-ui/) - Built-in AI features for the MotherDuck interface
- [Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/) - Build custom AI agents with MotherDuck
---
Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/text-search-in-motherduck
---
title: Text Search in MotherDuck
---
# Text Search in MotherDuck
Text search is a fundamental operation in data analytics - whether you're finding records by name, searching documents for relevant content, or building question-answering systems. This guide covers search strategies available in MotherDuck, from simple pattern matching to advanced semantic search, and how to combine them for optimal results.
## Quick Start: Common Search Patterns
Start here to identify the best search method for your use case. The right search approach depends on what you're searching, how you expect to use search, and what results you need. Most use cases fall into one of three patterns, each linking to detailed implementation guidance below:
**Keyword Search Over Identifiers**: When searching for specific items like company names, product codes, or customer names, use [Exact Match](#exact-match) for precise and low-latency lookups. If you need typo tolerance (e.g., "MotheDuck" → "MotherDuck"), use [Fuzzy Search](#fuzzy-search-text-similarity).
**Keyword Search Over Documents**: When searching longer text like articles, product descriptions, or documentation, use [Full-Text Search](#full-text-search-fts). This ranks documents by keyword relevance, and handles cases where users provide a few keywords that should appear in the content.
**Semantic Search**: When searching by meaning and similarity rather than exact keywords, use [Embedding-based Search](#embedding-based-search). This covers:
- Understanding synonyms (e.g., matching "data warehouse" with "analytics platform")
- Understanding natural language queries (e.g., "wireless headphones with good battery life")
- Finding similar content (e.g., support tickets describing similar customer issues)
---
For answering natural language questions about *structured* data (e.g., "How many customers do we have in California?"), see [Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/).
## Refining Your Search Strategy
If the patterns above don't fully match your use case, use these four questions to navigate to the right method. Each question links to specific sections with implementation details:
1. **What is the search corpus?** Consider what you're searching through:
- **Identifiers** like company names, product IDs, or person names → [Exact Match](#exact-match) or [Fuzzy Search](#fuzzy-search-text-similarity)
- **Documents** like articles, descriptions, or reports → [Keyword search (regex)](#exact-match) or [Full-Text Search](#full-text-search-fts) (FTS) or [Embedding-Based Search](#embedding-based-search) or [Hybrid](#fts-pre-filtering-hybrid-search) (combining FTS + embeddings)
- **Structured (numerical) data** → [Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/) that convert natural language questions to SQL
2. **What is the user input?** Think about how users express their search:
- **Single terms** like "MotherDuck" → [Exact Match](#exact-match) or [Fuzzy Search](#fuzzy-search-text-similarity)
- **Keyword phrases** like "data warehouse analytics" → [Keyword search (regex)](#exact-match) or [Full-Text Search](#full-text-search-fts) or [Embedding-based search](#embedding-based-search)
- **Questions** like "What companies offer cloud analytics?" → [Embedding-based search](#embedding-based-search) with [HyDE](#hypothetical-document-embeddings-hyde)
- **Example documents** (finding similar content) → [Embedding-based search](#embedding-based-search)
3. **What is the desired output?** Clarify what you're returning:
- **Ranked list** (retrieval of documents/records) → Covered by this guide
- **Generated text answers** (RAG-style Q&A, chatbots, summarization) → Use retrieval methods from this guide in combination with the [`prompt()`](/sql-reference/motherduck-sql-reference/ai-functions/prompt/#retrieval-augmented-generation-rag) function.
4. **What is the desired search behavior?** Think about what search qualities matter:
- **Exact match** for specific words (IDs and codes) → [Exact Match](#exact-match) or [Keyword search (regex)](#using-regular-expressions)
- **Typo resilience** to handle misspellings like "MotheDuck" → "MotherDuck" → [Fuzzy search](#fuzzy-search-text-similarity)
- **Synonym resilience** to match "data warehouse" with "analytics platform" → [Embedding-based search](#embedding-based-search)
- **Customizable ranking** → See [Reranking](#reranking) in the [Advanced Methods](#advanced-methods) section
- **Latency and concurrency** → See [Performance Guide](#performance-guide)
## Search Methods
### Exact Match
Use exact match search for specific identifiers, codes, or when you need guaranteed matches. This is the fastest search method.
#### Using LIKE
For substring matching, use `LIKE` (or `ILIKE` for case-insensitive). In patterns, `%` matches any sequence of characters and `_` matches exactly one character.
```sql
-- Find places with 'Starbucks' in their name
SELECT name, locality, region
FROM foursquare.main.fsq_os_places
WHERE name LIKE '%Starbucks%'
LIMIT 10;
```
See also: [Pattern Matching](https://duckdb.org/docs/stable/sql/functions/pattern_matching.html) in DuckDB documentation
#### Using Regular Expressions
For more complex pattern matching or matching multiple keywords, use `regexp_matches()` with `(?i)` for case-insensitive searches:
```sql
-- Find Hacker News posts with 'python', 'javascript', or 'rust' in text
SELECT title, "by", score
FROM sample_data.hn.hacker_news
WHERE regexp_matches(text, '(?i)(python|javascript|rust)')
LIMIT 10;
```
See also: [Regular Expressions](https://duckdb.org/docs/stable/sql/functions/regular_expressions) in DuckDB documentation
### Fuzzy Search (Text Similarity)
Fuzzy search handles typos and spelling variations in entity names like companies, people, or products. Use `jaro_winkler_similarity()` for most fuzzy matching scenarios - it offers the best balance of accuracy and performance compared to `damerau_levenshtein()` or `levenshtein()`.
```sql
-- Find places similar to 'McDonalds' (handles typo 'McDonalsd')
SELECT
name,
locality,
region,
jaro_winkler_similarity('McDonalsd', name) AS similarity
FROM foursquare.main.fsq_os_places
ORDER BY similarity DESC
LIMIT 10;
```
See also: [Text Similarity Functions](https://duckdb.org/docs/stable/sql/functions/text#text-similarity-functions) in DuckDB documentation
### Full-Text Search (FTS)
Full-Text Search ranks documents by keyword relevance using BM25 scoring, which considers both how often terms appear in a document and how rare they are across all documents. Use this for articles, descriptions, or longer text where you need relevance ranking. FTS automatically handles word stemming (e.g., "running" matches "run") and removes common stopwords (like "the", "and", "or"), but requires exact word matches - it won't handle typos in search queries.
#### Basic FTS Setup
FTS requires write access to the table. Since we're using a read-only example database, we first create a copy of the table in a read-write database we own:
```sql
CREATE TABLE hn_stories AS
SELECT id, title, text, "by", score, type
FROM sample_data.hn.hacker_news
WHERE type = 'story'
AND LENGTH(text) > 100
LIMIT 10000;
```
Build the FTS index on the text column. This creates a new schema called `fts_{schema}_{table_name}` (in this case `fts_main_hn_stories`):
```sql
PRAGMA create_fts_index(
'hn_stories', -- table name
'id', -- document ID column
'text' -- text column to index
);
```
Search the index using the `match_bm25` function from the newly created schema:
```sql
SELECT
id,
title,
text,
fts_main_hn_stories.match_bm25(id, 'database analytics') AS score
FROM hn_stories
ORDER BY score DESC
LIMIT 10;
```
#### Index Maintenance
FTS indexes need to be updated when the underlying data changes. Rebuild the index using the `overwrite` parameter:
```sql
PRAGMA create_fts_index('hn_stories', 'id', 'text', overwrite := 1);
```
See also: [Full-Text Search Guide](https://duckdb.org/docs/stable/guides/sql_features/full_text_search.html) and [Full-Text Search Extension](https://duckdb.org/docs/stable/core_extensions/full_text_search) in DuckDB documentation
### Embedding-Based Search
Embedding-based search finds conceptually similar text by meaning, not keywords. Use this for natural language queries, handling synonyms, or when users search with questions. Embeddings handle synonyms and typos naturally without manual configuration.
:::note
Embedding generation and lookups are priced in [AI Units](/about-motherduck/billing/pricing#advanced-ai-functions). Business and Lite plans have a default soft limit of 10 AI Units per user/day (sufficient to embed around 600,000 rows) to help prevent unexpected costs. If you'd like to adjust these limits, [just ask!](/troubleshooting/support)
:::
:::info
The DuckDB [VSS extension](https://duckdb.org/docs/stable/core_extensions/vss) for approximate vector search (HNSW) is currently experimental, and not supported in MotherDuck's cloud service (Server-Side). [Learn more](/concepts/duckdb-extensions/) about MotherDuck's support for DuckDB extensions.
:::
#### Basic Embedding-Based Search Setup
Generate embeddings for your text data, then search using exact vector similarity. For search queries phrased as questions (like "What are the best practices for...?"), see [Hypothetical Document Embeddings](#hypothetical-document-embeddings-hyde).
```sql
-- Reusing the hn_stories table from the FTS section, add embeddings
ALTER TABLE hn_stories ADD COLUMN text_embedding FLOAT[512];
UPDATE hn_stories SET text_embedding = embedding(text);
-- Semantic search - this will also match texts with related concepts like 'neural networks', 'deep learning', etc.
SELECT
title,
text,
array_cosine_similarity(
embedding('machine learning and artificial intelligence'),
text_embedding
) AS similarity
FROM hn_stories
ORDER BY similarity DESC
LIMIT 10;
```
See also: [MotherDuck Embedding Function](/sql-reference/motherduck-sql-reference/ai-functions/embedding/), and [array_cosine_similarity](https://duckdb.org/docs/stable/sql/functions/array#array_cosine_similarityarray1-array2) in DuckDB documentation
#### Document Chunking for Embedding-Based Search
When documents are longer than ~2000 characters, consider breaking them into smaller chunks to improve retrieval precision and focus results. For production pipelines with PDFs or Word docs, you can use the [MotherDuck integration for Unstructured.io](https://motherduck.com/blog/effortless-etl-unstructured-data-unstructuredio-motherduck/). Otherwise, you can also do document chunking in the database - here are some helpful macros:
```sql
-- Fixed-size chunking with configurable overlap
CREATE MACRO chunk_fixed_size(text_col, chunk_size, overlap) AS TABLE (
SELECT
gs.generate_series as chunk_number,
substring(text_col, (gs.generate_series - 1) * (chunk_size - overlap) + 1, chunk_size) AS chunk_text
FROM generate_series(1, CAST(CEIL(LENGTH(text_col) / (chunk_size - overlap * 1.0)) AS INTEGER)) gs
WHERE LENGTH(substring(text_col, (gs.generate_series - 1) * (chunk_size - overlap) + 1, chunk_size)) > 50
);
-- Paragraph-based chunking (splits on double newlines)
CREATE MACRO chunk_paragraphs(text_col) AS TABLE (
WITH chunks AS (SELECT string_split(text_col, '\n\n') as arr)
SELECT
UNNEST(generate_series(1, array_length(arr))) as chunk_number,
UNNEST(arr) as chunk_text
FROM chunks
);
-- Sentence-based chunking (splits on sentence boundaries)
CREATE MACRO chunk_sentences(text_col) AS TABLE (
WITH chunks AS (SELECT string_split_regex(text_col, '[.!?]+\s+') as arr)
SELECT
UNNEST(generate_series(1, array_length(arr))) as chunk_number,
UNNEST(arr) as chunk_text
FROM chunks
);
```
Use one of the macros to create chunks from your documents. Fixed-size chunks (300-600 chars with 10-20% overlap) work well for most use cases:
```sql
CREATE OR REPLACE TABLE hn_text_chunks AS
SELECT
id AS post_id,
title,
chunks.chunk_number,
chunks.chunk_text
FROM hn_stories
CROSS JOIN LATERAL chunk_fixed_size(text, 500, 100) chunks;
-- Alternative: CROSS JOIN LATERAL chunk_paragraphs(text) chunks;
-- Alternative: CROSS JOIN LATERAL chunk_sentences(text) chunks;
```
Generate embeddings for the chunks:
```sql
ALTER TABLE hn_text_chunks ADD COLUMN chunk_embedding FLOAT[512];
UPDATE hn_text_chunks SET chunk_embedding = embedding(chunk_text);
```
Once you have chunks with embeddings, search them the same way as full documents using `array_cosine_similarity()` - the chunk-level results often provide more precise matches than searching entire documents.
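As a sketch, a chunk-level search can retrieve the best-matching chunk per post from the `hn_text_chunks` table above (the query text here is purely illustrative):

```sql
-- Search chunks and keep only the best-matching chunk per post
SELECT
  post_id,
  title,
  MAX(array_cosine_similarity(
    embedding('challenges of scaling distributed systems'),
    chunk_embedding
  )) AS best_chunk_similarity
FROM hn_text_chunks
GROUP BY post_id, title
ORDER BY best_chunk_similarity DESC
LIMIT 10;
```

Aggregating with `MAX` deduplicates posts that match in several chunks; you could instead return individual chunks if you want to highlight the matching passage.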
## Performance Guide
Search performance depends on several factors: the chosen search method, cold vs. warm reads, Duckling sizing, and the tenancy model.
When running a search query against your data for the first time (cold read), it may have a higher latency than subsequent queries (warm reads). For production search workloads, ideally dedicate a service account's Duckling primarily to search, so other queries don't compete with search queries. Account for [Duckling cooldown periods](/about-motherduck/billing/duckling-sizes/) - the first search query after cooldown may experience more latency.
The DuckDB analytics engine divides data into chunks and processes them in parallel across threads. More data means more chunks to process in parallel, so larger datasets don't necessarily take proportionally longer to search - they just use more threads simultaneously.
**Duckling sizing:** Optimal latency requires warm reads and enough threads to process your data in parallel. With the ideal [Duckling sizing](/about-motherduck/billing/duckling-sizes/) configuration matched to your dataset size, keyword search over identifiers ([exact match](#exact-match), [fuzzy match](#fuzzy-search-text-similarity)) typically achieves latencies in the range of a few hundred milliseconds, while document search ([regex](#using-regular-expressions), [Full-Text Search](#full-text-search-fts), [embedding search](#embedding-based-search)) typically achieves 0.5-3 second latency. Our team is happy to help advise on the right resource allocation for your specific workload and latency targets - [get in touch](/troubleshooting/support) to discuss how we can meet your needs.
**Handling Concurrent Requests:** For handling multiple simultaneous search requests effectively, consider using [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) to distribute load across multiple read scaling Ducklings. Alternatively, consider [per-user tenancy](/getting-started/data-warehouse/#per-user-tenancy), providing isolated compute resources for each user.
To optimize further, see the strategies below. For questions or requirements beyond this guide, please [get in touch](/troubleshooting/support).
### Search Optimization Strategies
When optimizing search performance, consider the following options.
#### Pre-filtering
Reduce the search space using structured metadata (e.g. location, categories, date ranges) that can be inferred from the user's context, before running similarity searches:
```sql
-- Create a local copy with embeddings for place names (using a subset)
CREATE TABLE places AS
SELECT fsq_place_id, name, locality, region, fsq_category_labels
FROM foursquare.main.fsq_os_places
WHERE name IS NOT NULL
LIMIT 10000;
-- Add embeddings for semantic search
ALTER TABLE places ADD COLUMN name_embedding FLOAT[512];
UPDATE places SET name_embedding = embedding(name);
-- Pre-filter by location before semantic search
WITH filtered_candidates AS (
SELECT fsq_place_id, name, locality, fsq_category_labels, name_embedding
FROM places
WHERE locality = 'New York' -- Filter by location and region
AND region = 'NY'
)
SELECT
name,
locality,
fsq_category_labels,
array_cosine_similarity(
embedding('italian restaurant'),
name_embedding
) AS similarity
FROM filtered_candidates
ORDER BY similarity DESC
LIMIT 20;
```
#### Reducing Embedding Dimensionality
Halving embedding dimensions roughly halves compute time. OpenAI embeddings can be truncated at specific dimensions (256 for `text-embedding-3-small`, 256 or 512 for `text-embedding-3-large`). Use lower dimensions for initial pre-filtering, then rerank with full embeddings:
```sql
-- Setup: Create normalization macro
CREATE MACRO normalize(v) AS (
CASE
WHEN len(v) = 0 THEN NULL
WHEN sqrt(list_dot_product(v, v)) = 0 THEN NULL
ELSE list_transform(v, element -> element / sqrt(list_dot_product(v, v)))
END
);
-- Add lower-dimensional column (e.g., 256 dims instead of 512)
ALTER TABLE hn_stories ADD COLUMN text_embedding_short FLOAT[256];
UPDATE hn_stories SET text_embedding_short = normalize(text_embedding[1:256]);
```
Then use a two-stage search:
```sql
-- Stage 1: Fast pre-filter with short embeddings
SET VARIABLE query_emb = embedding('machine learning algorithms', 'text-embedding-3-large');
SET VARIABLE query_emb_short = normalize(getvariable('query_emb')[1:256])::FLOAT[256];
WITH candidates AS (
SELECT id,
array_cosine_similarity(getvariable('query_emb_short'), text_embedding_short) AS similarity
FROM hn_stories
ORDER BY similarity DESC
LIMIT 500 -- Get more candidates if needed
)
-- Stage 2: Rerank with full embeddings
SELECT p.title, p.text,
array_cosine_similarity(getvariable('query_emb'), p.text_embedding) AS final_similarity
FROM hn_stories p
WHERE p.id IN (SELECT id FROM candidates)
ORDER BY final_similarity DESC
LIMIT 10;
```
#### FTS Pre-filtering (Hybrid Search)
FTS typically has lower latency than embedding search, making it effective as a pre-filter to reduce similarity comparisons. Use a large LIMIT in the FTS stage to ensure good recall:
```sql
-- FTS pre-filter with large limit, then semantic rerank
SET VARIABLE search_query = 'artificial intelligence neural networks';
WITH fts_candidates AS (
SELECT id,
fts_main_hn_stories.match_bm25(id, getvariable('search_query')) AS fts_score
FROM hn_stories
ORDER BY fts_score DESC
LIMIT 10000 -- Large limit to ensure recall
)
SELECT h.id, h.title, h.text,
array_cosine_similarity(
embedding(getvariable('search_query')),
h.text_embedding
) AS similarity
FROM hn_stories h
INNER JOIN fts_candidates f ON h.id = f.id
ORDER BY similarity DESC
LIMIT 10;
```
See also: [Search Using DuckDB Part 3 (Hybrid Search)](https://motherduck.com/blog/search-using-duckdb-part-3/)
## Advanced Methods
This section covers additional techniques to customize and improve your search. The methods below demonstrate common approaches - many other variants are possible.
:::note
Some methods in this section make use of the `prompt()` function, which is priced in [AI Units](/about-motherduck/billing/pricing#advanced-ai-functions). Business and Lite plans have a default soft limit of 10 AI Units per user/day (sufficient to process around 80,000 rows) to help prevent unexpected costs. If you'd like to adjust these limits, [just ask!](/troubleshooting/support)
:::
### LLM-Enhanced Keyword Expansion
Generate synonyms with an LLM, then use them in pattern matching:
```sql
-- Generate synonyms using LLM with structured output
SET VARIABLE search_term = 'programming';
WITH synonyms AS (
SELECT prompt(
'Give me 5 synonyms for ''' || getvariable('search_term') || '''',
struct := {'synonyms': 'VARCHAR[]'}
).synonyms AS synonym_list
)
-- Search with expanded terms
SELECT
title,
text
FROM sample_data.hn.hacker_news, synonyms
WHERE regexp_matches(text, getvariable('search_term') || '|' || array_to_string(synonym_list, '|'))
LIMIT 10;
```
See also: [MotherDuck `prompt()` Function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/)
### Hypothetical Document Embeddings (HyDE)
HyDE improves question-based retrieval by generating a hypothetical answer first, then searching with that answer's embedding. This works because questions and answers have different linguistic patterns - the hypothetical answer better matches actual document content. Use with semantic search or the semantic component of hybrid search.
```sql
-- HyDE: Generate hypothetical answer, then search with it
WITH hypothetical_answer AS (
SELECT prompt(
'Answer this question in 2-3 sentences:
"What are the key challenges in building scalable distributed systems?"
Focus on typical technical challenges and solutions.'
) AS answer
)
-- Search using the hypothetical answer's embedding
SELECT
title,
text,
array_cosine_similarity(
(SELECT embedding(answer) FROM hypothetical_answer),
text_embedding
) AS similarity
FROM hn_stories
ORDER BY similarity DESC
LIMIT 10;
```
See also: [Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE paper)](https://arxiv.org/abs/2212.10496)
### Reranking
Reranking typically happens in two stages: initial retrieval to get top candidates (100-500 results), then precise reranking of that smaller set.
#### Rule-Based Reranking with Metadata
Refine results based on business rules and metadata like score, category, or freshness:
```sql
-- Find similar posts with metadata-based reranking
WITH initial_similarity AS (
-- Step 1: Fast vector similarity for top candidates
SELECT
title,
text,
score as author_score,
array_cosine_similarity(
embedding('artificial intelligence and machine learning applications'),
text_embedding
) AS emb_similarity
FROM hn_stories
ORDER BY emb_similarity DESC
LIMIT 100
),
reranked_scores AS (
-- Step 2: Rerank with metadata (author score)
SELECT
title,
text,
author_score,
emb_similarity,
-- Score boost (normalize to 0-1 range based on actual data)
(author_score / MAX(author_score) OVER ()) AS author_score_norm,
-- Combined final score: 60% semantic + 40% author score
(emb_similarity * 0.6 + author_score_norm * 0.4) AS reranked_score
FROM initial_similarity
)
SELECT
title,
text,
author_score,
ROUND(emb_similarity, 3) as semantic_score,
ROUND(author_score_norm, 3) as author_score_normalized,
ROUND(reranked_score, 3) as final_score
FROM reranked_scores
ORDER BY reranked_score DESC
LIMIT 10;
```
#### LLM-Based Reranking
For complex relevance criteria that are hard to express as rules, use an LLM to judge and score results. The [`prompt()` function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/) is optimized for batch processing and processes requests in parallel - so reranking 50 results typically adds only a few hundred milliseconds.
```sql
-- LLM reranking for top search results
SET VARIABLE search_query = 'best practices for code review and software quality';
WITH top_candidates AS (
-- Initial retrieval (e.g., via semantic search)
SELECT
id,
title,
text,
array_cosine_similarity(
embedding(getvariable('search_query')),
text_embedding
) AS initial_score
FROM hn_stories
ORDER BY initial_score DESC
LIMIT 20
),
llm_reranked AS (
SELECT
*,
prompt(
format(
'Rate on a scale from 1 to 10 how well this post matches the query ''{}''.
Post: {} - {}',
getvariable('search_query'), title, text
),
struct := {'rating': 'INTEGER'}
).rating AS llm_score
FROM top_candidates
)
SELECT
title,
text,
ROUND(initial_score, 3) as initial_score,
llm_score,
ROUND((0.6 * initial_score + 0.4 * llm_score / 10.0), 3) AS final_score
FROM llm_reranked
ORDER BY final_score DESC
LIMIT 10;
```
## Next Steps
- Check out the MotherDuck [Embedding Function](/sql-reference/motherduck-sql-reference/ai-functions/embedding/) and [Prompt Function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/)
- Review the [Full-Text Search Guide](https://duckdb.org/docs/stable/guides/sql_features/full_text_search.html) in DuckDB documentation
- Read the MotherDuck blog series: [Search Using DuckDB Part 1](https://motherduck.com/blog/search-using-duckdb-part-1/), [Part 2](https://motherduck.com/blog/search-using-duckdb-part-2/), [Part 3](https://motherduck.com/blog/search-using-duckdb-part-3/)
- Explore [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/)
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/attach-modes
---
title: Attach Modes
description: Understand Workspace and Single attach modes
---
## **MotherDuck Attach Modes: Workspace and Single modes**
This guide explains MotherDuck's two connection modes: **workspace** and **single**. Workspace mode is designed for working with multiple databases persistently across sessions, while single mode connects to just one database.
### **Connection Modes**
MotherDuck offers two connection modes: workspace and single. The mode you use determines how your attachments and detachments are handled and whether these changes are saved for future sessions.
* **Workspace Mode**: This is the default mode when you want to work with all attached MotherDuck databases. When you attach or detach a database in this mode, that change is remembered for your next session. This is useful when you consistently work with the same set of databases. Parallel connections to MotherDuck in workspace mode keep their attachments in sync: for example, detaching a database in one workspace-mode client will detach it in all other workspace-mode clients.
* **Single Mode**: This mode is for when you only want to work with a specific MotherDuck database. Any databases you attach or detach during this session will not affect your saved workspace for the next time you connect, nor interfere with the attachment state of other parallel connections to MotherDuck. Single mode is useful, for example, when connecting from BI tools that support only a single attached database.
:::tip
You can't switch between modes in the middle of a session. The mode is set by the first command you use to connect to MotherDuck.
:::
### **Connecting to MotherDuck with a connection string**
When you first connect to MotherDuck in a session, the connection string you use determines the attach mode. This applies to most clients, like the DuckDB CLI (`duckdb 'md:...'`) and Python (`duckdb.connect('md:...')`).
* **To connect in Workspace Mode (default):**
* Use `md:` or `md:<database_name>`.
* This connects to your MotherDuck workspace, attaching *all* databases from your last saved session.
* If you specify a database name, it becomes the active database.
* Any changes to attachments (attaching or detaching databases) are saved and will be restored in your next workspace session.
* **To connect in Single Mode:**
* Use `md:<database_name>?attach_mode=single`.
* This connects *only* to the specified database, ignoring your saved workspace.
* Attachment changes are *temporary* and will *not* be saved.
* Note: You must specify a database name to use single mode. Connecting with `md:?attach_mode=single` is not allowed, as this mode requires a specific database target.
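For example, with the DuckDB CLI (using a hypothetical database named `my_db`):

```shell
# Workspace mode (default): attaches all databases from your saved workspace
duckdb "md:my_db"

# Single mode: attaches only my_db; attachment changes are not persisted
duckdb "md:my_db?attach_mode=single"
```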
### **Connecting to MotherDuck using the ATTACH command**
If you are already in a DuckDB session, but **not** connected to MotherDuck yet, your first ATTACH command that targets MotherDuck establishes the attach mode for that session.
* **To connect in Workspace Mode:**
* Use `ATTACH 'md:'`.
* This attaches your entire saved workspace.
* The session is now in workspace mode, and any subsequent attachment changes will be persisted for future sessions.
* **To connect in Single Mode:**
* Use `ATTACH 'md:<database_name>'`.
* This attaches *only* the specified database.
* The session is implicitly set to single mode. Attachment changes are not saved.
* Once in single mode, you cannot attach the entire workspace using `ATTACH 'md:'`.
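A minimal sketch of how the first `ATTACH` fixes the mode for the session (assuming a database named `my_db` exists in your account):

```sql
-- First MotherDuck ATTACH in this session targets a single database,
-- so the session is implicitly in single mode from here on
ATTACH 'md:my_db';
-- Attaching the entire workspace later in the same session would fail:
-- ATTACH 'md:';
```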
### **Tips & Tricks**
Further Notes:
* You can also explicitly set the attach mode before connecting to MotherDuck.
```sql
LOAD motherduck;
SET motherduck_attach_mode = 'workspace'; -- or 'single'
ATTACH 'md:foo';
```
* The MotherDuck UI always connects in workspace mode.
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-and-connecting-to-motherduck
---
title: Authenticating and connecting to MotherDuck
description: Learn how to authenticate and connect to MotherDuck
---
# Authenticating and connecting to MotherDuck
These pages explain how to connect to MotherDuck using the CLI, Python, JDBC and NodeJS.
First, you need to [authenticate to MotherDuck](./authenticating-to-motherduck) by [manual authentication](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#manual-authentication) via the Web UI, or automatic authentication via an [access token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token).
To connect to a MotherDuck database, you can [create a connection](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/).
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck
---
sidebar_position: 1
title: Authenticating to MotherDuck
description: Authenticate to a MotherDuck account
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Authenticating to MotherDuck
MotherDuck supports two types of authentication:
- Manual authentication, typically used by the MotherDuck UI
- Authentication using an access token, more convenient for Python, CLI or other clients.
## Manual authentication
The MotherDuck UI supports several authentication methods:
- Google
- GitHub
- Username and password
You can leverage multiple modes of authentication in your account. For example, you can authenticate both via Google and via username and password as you see fit.
When authenticating from the CLI or Python, you will be redirected to an authentication web page. Currently, this happens every session. To avoid having to re-authenticate, you can save your access token, as described in the [Authenticate With an Access Token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token) section.
## Authentication using an access token
If you are using Python or CLI and don't want to authenticate every session, you can securely save your credentials locally.
### Creating an access token
To create an access token:
- Go to the [MotherDuck UI](https://app.motherduck.com)
- In the top left, click on your organization name and then `Settings`
- Click `+ Create token`
- Specify a name for the token that you'll recognize (like "DuckDB CLI on my laptop")
- Specify the type of token you want. Tokens can be Read/Write (default) or [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/).
- Choose whether you want the token to expire and then click on `Create token`
- Copy the access token to your clipboard by clicking on the copy icon

### Storing the access token as an environment variable
You can save the access token as `motherduck_token` in your environment variables.
An example of setting this in a terminal:
```bash
export motherduck_token=''
```
You can also add this line to your `~/.zprofile` or `~/.bash_profile`, or store it in a `.env` file in your project root.
Once this is done, your authentication token is saved and you can connect to MotherDuck with the following connection string:
```bash
duckdb "md:my_db"
```
:::info
This is the best practice for security reasons. The token is sensitive information and should be kept safe. Do not share it with others.
:::
Alternatively, you can specify an access token in the MotherDuck connection string: `md:my_db?motherduck_token=`.
```bash
duckdb "md:my_db?motherduck_token="
```
When in the DuckDB CLI, you can use the `.open` command and specify the connection string as an argument.
```CLI
.open md:my_db?motherduck_token=
```
## Using connection string parameters
### Authentication using SaaS mode
You can limit MotherDuck's ability to interact with your local environment using `SaaS Mode`:
- Disable reading or writing local files
- Disable reading or writing local DuckDB databases
- Disable installing or loading any DuckDB extensions locally
- Disable changing any DuckDB configurations locally
This mode is useful for third-party tools, such as BI vendors, that host DuckDB themselves and require additional security controls to protect their environments.
:::info
Using this parameter requires using `.open` in the DuckDB CLI or `duckdb.connect` in Python. This initiates a new connection to MotherDuck and will detach any existing connection to a local DuckDB database.
:::
```cli
.open md:[]?[motherduck_token=]&saas_mode=true
```
```python
conn = duckdb.connect("md:[]?[motherduck_token=]&saas_mode=true")
```
### Using attach mode
By default, when you connect to MotherDuck, you will be connected to all databases you have access to.
To limit the connection to only one database, use the `attach_mode` parameter with the value `single`.
For example, to connect to a database named `my_database`, run:
```bash
duckdb 'md:my_database?attach_mode=single'
```
:::note
A database name that starts with a number cannot be connected to directly. Connect without specifying a database, then `CREATE` and `USE` it with a double-quoted name, e.g. `USE DATABASE "1database"`
:::
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck
---
sidebar_position: 2
title: Connecting to MotherDuck
description: Create one or more connections to a MotherDuck database
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
A single DuckDB connection executes one query at a time, aiming to maximize the performance of that query, making reuse of a single connection both simple and performant.
We recommend starting with the simplest way of connecting to MotherDuck and running queries, and if that does not meet your requirements, to explore the advanced use-cases described in subsequent sections.
## Create a connection
The below code snippets show how to create a connection to a MotherDuck database from the CLI, Python, JDBC and NodeJS language APIs.
To connect to your MotherDuck database, use `duckdb.connect("md:my_database_name")`, which returns a `DuckDBPyConnection` object that you can use to interact with your database.
```python
import duckdb
# Create connection to your default database
conn = duckdb.connect("md:my_db")
# Run query
conn.sql("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
conn.sql("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
res = conn.sql("SELECT * FROM items")
# Close the connection
conn.close()
```
To connect to your MotherDuck database, you can create a `Connection` by using the `"jdbc:duckdb:md:databaseName"` connection string format. For authentication, you need to provide a MotherDuck token.
There are two ways to provide the token:
1. As a connection property:
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.ResultSet;
import java.util.Properties;
// Create properties with your MotherDuck token
Properties props = new Properties();
props.setProperty("motherduck_token", "");
// Create connection to your database
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db", props);
Statement stmt = conn.createStatement()) {
stmt.executeUpdate("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
stmt.executeUpdate("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");
try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
while (rs.next()) {
System.out.println("Item: " + rs.getString(1) + " costs " + rs.getDouble(2));
}
}
}
```
2. As part of the connection string:
```java
// Create connection with token in the connection string
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db?motherduck_token=");
Statement stmt = conn.createStatement()) {
stmt.executeUpdate("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
stmt.executeUpdate("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");
try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
while (rs.next()) {
System.out.println("Item: " + rs.getString(1) + " costs " + rs.getDouble(2));
}
}
}
```
:::info
For security reasons, it's generally recommended to use environment variables to store your MotherDuck token rather than hardcoding it in your application. If an environment variable named `motherduck_token` is set, it will be used automatically.
:::
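For instance, with `motherduck_token` exported in the environment, the token can be omitted from the connection entirely (a sketch following the same pattern as the snippets above):

```java
// Assumes the motherduck_token environment variable is set;
// no Properties object or token query parameter is needed
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db");
     Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
    while (rs.next()) {
        System.out.println("Item: " + rs.getString(1));
    }
}
```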
To connect to your MotherDuck database, you can create a `DuckDBInstance` with the `'md:databaseName'` connection string format:
```javascript
import { DuckDBInstance } from '@duckdb/node-api';
// Create connection to your default database
const instance = await DuckDBInstance.create("md:mydb");
const conn = await instance.connect();
// Run queries
await conn.run('CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)');
await conn.run("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");
const result = await conn.runAndReadAll('SELECT * FROM items');
console.table(result.getRowObjects());
```
To connect to your MotherDuck database from the command line, run `duckdb` with an `md:` connection string.
```shell
duckdb "md:my_db"
```
This opens the DuckDB interactive shell, connected to your database.
```sql
D CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER);
D INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2);
D SELECT * FROM items;
```
## Read Scaling With Session Hints
If you are planning on multiple end users connecting with a [Read Scaling Token](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/read-scaling.md), ensure each user can get a dedicated backend (up to the maximum configured flock size) by passing a [`session_hint`](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#read-scaling-with-session-hints) in the connection string.
Session hints make sure that all the queries from the same end user are routed to the same backend Duckling, even if they originate from different services/servers. This allows for optimal caching and resource allocation for each specific user's needs.
After establishing the connection, it can be used the same way as any DuckDB/MotherDuck connection -- to run queries, and then either be closed explicitly or go out of scope, as in the examples above.
Note that this is a harmless no-op if the connection is made with a regular read/write token.
```python
import duckdb
# Create a connection and allocate a stable backend for user123.
con = duckdb.connect(
    "md:my_db?session_hint=user123",
    config={'motherduck_token': ''},
)
```
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.ResultSet;
import java.util.Properties;
// Create properties with your MotherDuck token
Properties props = new Properties();
props.setProperty("motherduck_token", "");
// Create a connection and allocate a stable backend for user123.
try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db?session_hint=user123", props)) {
// ...
}
```
```javascript
import { DuckDBInstance } from '@duckdb/node-api';
// Create a connection and allocate a stable backend for user123.
const instance = await DuckDBInstance.create(
    'md:my_db?session_hint=user123',
    { motherduck_token: '' });
// ...
```
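The guarantee can be pictured as a deterministic mapping from hint to backend. Below is a minimal sketch of that property in Python; the hashing here is hypothetical (MotherDuck's actual routing happens server-side) and only illustrates why identical hints always land on the same Duckling.

```python
# Illustrative only: MotherDuck's real routing is server-side and opaque.
# This sketch shows the property session hints provide: the same hint
# always maps to the same backend, keeping per-user caches warm.
import hashlib

def route_to_backend(session_hint: str, flock_size: int) -> int:
    """Deterministically map a session hint to one of flock_size backends."""
    digest = hashlib.sha256(session_hint.encode()).hexdigest()
    return int(digest, 16) % flock_size

# Queries carrying the same hint land on the same Duckling,
# no matter which service or server they originate from.
```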
## Multiple Connections and the Database Instance Cache
DuckDB clients in Python, R, JDBC, and ODBC avoid redundant reinitialization by caching the database-global context, keyed by the database path.
Other language APIs are likely to get similar functionality over time.
When connecting to MotherDuck, the instance is cached for an additional 15 minutes after the last connection is closed (see [Setting Custom Database Instance Cache TTL](#setting-custom-database-instance-cache-time-ttl) for how to override this value).
For an application that creates and closes connections frequently, this could provide a significant speedup for connection creation, as the same catalog data can be reused across connections.
This means that only the first of multiple connections to the same database will take the time to load the MotherDuck extension, verify its signature, and fetch the catalog metadata.
```python
con1 = duckdb.connect("md:my_db")  # MotherDuck catalog fetched
con2 = duckdb.connect("md:my_db")  # MotherDuck catalog reused
```
```java
// Create properties with your MotherDuck token
Properties props = new Properties();
props.setProperty("motherduck_token", "");
try (var con1 = DriverManager.getConnection("jdbc:duckdb:md:my_db", props); // MotherDuck catalog fetched
var con2 = DriverManager.getConnection("jdbc:duckdb:md:my_db", props); // MotherDuck catalog reused
) {
// ...
}
```
```javascript
const instance = await DuckDBInstance.fromCache("md:sample_data");
const connection1 = await instance.connect();
const connection2 = await instance.connect();
```
## Setting Custom Database Instance Cache Time (TTL)
By default, DuckDB APIs that support database instance caching reuse the same database instance for 15 minutes after the last connection to MotherDuck is closed.
In some cases, you may want to make that period longer (to avoid the redundant reinitialization) or shorter (to connect to the same database with a different configuration).
The database TTL value can be set either at the initial connection time, or by using the `SET` command at any point.
Any valid [DuckDB interval part specifiers](https://duckdb.org/docs/stable/sql/functions/datepart.html#part-specifiers-usable-as-date-part-specifiers-and-in-intervals) can be used for the TTL value, for example '5s', '3m', or '1h'.
:::note
The examples below assume you have configured your MotherDuck token using one of the authentication methods described in the [Create a connection](#create-a-connection) section above.
:::
```python
con = duckdb.connect("md:my_db?dbinstance_inactivity_ttl=1h")
con.close()
# different database connection string (without `?dbinstance_inactivity_ttl=1h`), no instance cached; TTL is 15 minutes (default)
con2 = duckdb.connect("md:my_db")
# allow the database instance to expire immediately
con2.execute("SET motherduck_dbinstance_inactivity_ttl='0s'")
# the database instance can only expire after the last connection is closed
con2.close()
# new database instance with a new TTL (the 15 minute default)
con3 = duckdb.connect("md:my_db")
con3.close()
# the last TTL for this database was 15 minutes; the cached database instance will be reused
con4 = duckdb.connect("md:my_db")
```
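For intuition, the shorthand values above ('5s', '3m', '1h') map to durations as follows. This toy parser is purely illustrative; DuckDB parses these interval strings natively and accepts many more part specifiers.

```python
# Toy parser for TTL shorthand such as '5s', '3m', '1h' (illustration only;
# DuckDB parses these natively, including longer forms like '15 minutes').
def ttl_to_seconds(ttl: str) -> int:
    units = {"s": 1, "m": 60, "h": 3600}
    return int(ttl[:-1]) * units[ttl[-1]]

# '1h' -> 3600 seconds, so the instance outlives its last connection by an hour.
```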
The TTL can be set either through the connection string or through Properties. However, be careful when using Properties as the database instance cache is keyed by the connection string. This means that if you change the TTL in Properties between connections, you'll get an error as it's trying to connect to the same database with different configurations.
Here's an example that will fail:
```java
Properties props = new Properties();
props.setProperty("motherduck_dbinstance_inactivity_ttl", "2m");
// First connection works fine
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) {
    // TTL is set to 2m
}
// Changing the TTL in properties will fail
props.setProperty("motherduck_dbinstance_inactivity_ttl", "5m");
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) {
    // This will throw: "Can't open a connection to same database file
    // with a different configuration than existing connections"
}
```
For this reason, it's generally safer to set the TTL through the connection string:
```java
// Set TTL through connection string
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db?dbinstance_inactivity_ttl=1h")) {
    // TTL is set to 1h
}
// A different TTL creates a new instance
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db?dbinstance_inactivity_ttl=30m")) {
    // This works: creates a new instance with a 30m TTL
}
// You can also set the TTL using SQL
try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db");
     var st = con.createStatement()) {
    // Allow the database instance to expire immediately
    st.executeUpdate("SET motherduck_dbinstance_inactivity_ttl='0s'");
}
```
:::note
When using Properties, you must include the `motherduck_` prefix for the TTL property name (i.e., `motherduck_dbinstance_inactivity_ttl`). This prefix is only optional when passing the TTL through the connection string.
:::
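The failure mode above follows directly from the cache being keyed by the connection string alone. Here is a simplified Python model of that behavior (not DuckDB's implementation; the error message mirrors the one raised by the JDBC driver):

```python
# Simplified model of the database instance cache (not DuckDB's code):
# instances are keyed by the connection string, so reconnecting with the
# same string but a *different* configuration conflicts with the cache.
_instance_cache = {}

def connect(conn_string, config=None):
    config = config or {}
    if conn_string in _instance_cache:
        if _instance_cache[conn_string] != config:
            raise RuntimeError(
                "Can't open a connection to same database file "
                "with a different configuration than existing connections")
        return conn_string  # cached instance reused
    _instance_cache[conn_string] = config
    return conn_string  # new instance created
```

This is why moving the TTL into the connection string sidesteps the error: a different string is simply a different cache key.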
## Connect to multiple databases
If you need to connect to MotherDuck and run one or more queries in succession on the same account, you can use a [single database connection](#create-a-connection). If you want to connect to another database in the same account, you can either [reuse the same connection](#example-1-reuse-the-same-duckdb-connection), or [create copies](#example-2-create-copies-of-the-initial-duckdb-connection) of the connection.
If you need to connect to multiple databases, you can either directly reuse the same `DuckDBPyConnection` instance, or create copies of the connection using the `.cursor()` method.
:::note
`FROM tbl` is a shorthand version of
`SELECT * FROM tbl`.
:::
### Example 1: Reuse the same DuckDB Connection
To connect to different databases in the same MotherDuck account, you can use the same connection object and simply fully qualify the names of the tables in your query.
```python
conn = duckdb.connect("md:my_db")
res1 = conn.sql("FROM my_db1.main.tbl")
res2 = conn.sql("FROM my_db2.main.tbl")
res3 = conn.sql("FROM my_db3.main.tbl")
conn.close()
```
### Example 2: Create copies of the initial DuckDB Connection
`conn.cursor()` returns a copy of the DuckDB connection, with a reference to the existing DuckDB database instance. Closing the original connection also closes all associated cursors.
```python
conn = duckdb.connect("md:my_db")
cur1 = conn.cursor()
cur2 = conn.cursor()
cur3 = conn.cursor()
cur1.sql("USE my_db1")
cur2.sql("USE my_db2")
cur3.sql("USE my_db3")
res = []
for cur in [cur1, cur2, cur3]:
res.append(cur.sql("SELECT * FROM tbl"))
# This closes the original DuckDB connection and all cursors
conn.close()
```
:::note
`duckdb.connect(path)` creates and caches a DuckDB instance. Subsequent calls with the same path reuse this instance. New connections to the same instance are independent, similar to `conn.cursor()`, but closing one doesn't affect others. To create a new instance instead of using the cached one, make the path unique (e.g., `md:my_db?user=`).
:::
### Example 3: Create multiple connections
You can also create multiple connections to the same MotherDuck account using different DuckDB instances. However, keep in mind that each connection takes time to establish, and if connection times are an important factor for your application, it might be beneficial to consider [Example 1](#example-1-reuse-the-same-duckdb-connection) or [Example 2](#example-2-create-copies-of-the-initial-duckdb-connection).
:::note
If you need to run queries on separate connections in quick succession, instead of opening and closing a connection for every query, we recommend using a Connection Pool ([Python](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python#connection-pooling), [JDBC](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-jdbc#connection-pooling) or [NodeJS](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-nodejs#connection-pooling)).
:::
```python
conn1 = duckdb.connect("md:my_db1")
conn2 = duckdb.connect("md:my_db2")
conn3 = duckdb.connect("md:my_db3")
res1 = conn1.sql("SELECT * FROM tbl")
res2 = conn2.sql("SELECT * FROM tbl")
res3 = conn3.sql("SELECT * FROM tbl")
conn1.close()
conn2.close()
conn3.close()
```
If you need to connect to multiple databases, you typically won't need to create multiple DuckDB instances. You can either directly reuse the same `DuckDBConnection` instance, or create copies of the connection using the `.duplicate()` method.
```java
// Create connection with your MotherDuck token
Properties props = new Properties();
props.setProperty("motherduck_token", "");
try (DuckDBConnection duckdbConn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) {
    Connection conn1 = duckdbConn.duplicate();
    Connection conn2 = duckdbConn.duplicate();
    Connection conn3 = duckdbConn.duplicate();
    // ...
}
```
If you need to connect to multiple databases, you can re-use the same `DuckDBInstance` and connection.
```javascript
import { DuckDBInstance } from '@duckdb/node-api';
const instance = await DuckDBInstance.create('md:');
const conn = await instance.connect();
const result1 = await conn.runAndReadAll('FROM my_db1.main.tbl');
const result2 = await conn.runAndReadAll('FROM my_db2.main.tbl');
```
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-jdbc
---
sidebar_position: 2
title: Multithreading and Parallelism with JDBC and MotherDuck
sidebar_label: JDBC
description: Performance tuning via multithreading with multiple connections to MotherDuck with JDBC
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Multithreading and parallelism with JDBC
Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create [connections in multiple threads](#connections-in-multiple-threads). For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling). If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.
## Connections in multiple threads
If you have multiple parallelizable queries you want to run in quick succession, you could benefit from concurrency.
:::note
Concurrency is supported by DuckDB, across multiple threads, as described in the [Concurrency](https://duckdb.org/docs/connect/concurrency.html) documentation page. However, be mindful when using this approach, as parallelism does not always lead to better performance. Read the notes on [Parallelism](https://duckdb.org/docs/guides/performance/how_to_tune_workloads.html#parallelism-multi-core-processing) in the DuckDB documentation to understand the specific scenarios in which concurrent queries can be beneficial.
:::
First, let's create a class `MultithreadingExample` and get the MotherDuck token from your environment variables.
```java
package com.example;
import org.duckdb.DuckDBConnection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
/**
* Examples for multithreading and connection pooling
*/
public class MultithreadingExample {
private static final String token = System.getenv("motherduck_token");
private final static Logger logger = LoggerFactory.getLogger(MultithreadingExample.class);
```
To use multiple threads, pass the connection object to each thread, and create a copy of the connection with the `.duplicate()` method to run a query:
```java
private static void runQueryFromThread(String label, Connection conn, String query) {
    try (Connection dupConn = ((DuckDBConnection) conn).duplicate();
         Statement st = dupConn.createStatement();
         ResultSet rs = st.executeQuery(query)) {
        if (rs.next()) {
            logger.info("{}: found at least one row", label);
        } else {
            logger.info("{}: no rows found", label);
        }
    } catch (SQLException e) {
        throw new RuntimeException("can't run query", e);
    }
}
```
You can then use a thread pool executor to run the queries using the `runQueryFromThread` method:
```java
public static void main(String[] args) throws SQLException, InterruptedException {
    // Check that a motherduck_token exists
    if (token == null) {
        throw new IllegalArgumentException(
                "Please provide `motherduck_token` environment variable");
    }
    // Add MotherDuck token to config
    Properties config = new Properties();
    config.setProperty("motherduck_token", token);
    // Create list of queries to run in multiple threads
    List<String> queries = new ArrayList<>();
    queries.add("SELECT 42;");
    queries.add("SELECT 'Hello World!';");
    int numQueries = queries.size();
    // Create thread pool executor and run queries
    ExecutorService executor = Executors.newFixedThreadPool(numQueries);
    try (Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:my_db", config)) {
        for (int i = 0; i < numQueries; i++) {
            String label = "query " + i;
            String query = queries.get(i);
            executor.submit(() -> runQueryFromThread(label, mdConn, query));
        }
        executor.shutdown();
        boolean success = executor.awaitTermination(30, TimeUnit.SECONDS);
        if (success) {
            logger.info("successfully ran {} queries in threads", numQueries);
        }
    }
}
}
```
## Connection pooling
If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts. To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.
For connection pools, we recommend using [HikariCP](https://github.com/brettwooldridge/HikariCP). Below is an example implementation. For this implementation, you can connect to a user account by providing a `motherduck_token` in your database path.
The goal of this implementation is to distribute operations across multiple databases in a round-robin fashion. This `HikariMultiPoolManager` class manages multiple `HikariDataSource`s (connection pools) which each connect to a different connection url, and rotates between them when `getConnection()` is called. You can specify a pool size which is applied to all `HikariDataSource`s.
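The rotation itself is only a few lines. Here is the same round-robin selection sketched in Python, with plain URL strings standing in for the `HikariDataSource` pools (logic illustration only):

```python
# Round-robin selection across several pools (logic sketch only; the Java
# class uses HikariDataSource objects where this uses plain URL strings).
from itertools import count

class MultiPoolRouter:
    def __init__(self, urls):
        self.urls = list(urls)
        self._counter = count()  # the Java version uses an AtomicInteger here

    def next_url(self):
        # Each call rotates to the next pool, wrapping around at the end.
        return self.urls[next(self._counter) % len(self.urls)]
```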
```java
package com.example;
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.duckdb.DuckDBConnection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
/**
* Example DuckDB connection pool implementation
*/
public class HikariMultiPoolManager implements AutoCloseable {
    private static final String token = System.getenv("motherduck_token");
    private final List<HikariDataSource> dataSources;
    private final AtomicInteger index;
    private final static Logger logger = LoggerFactory.getLogger(HikariMultiPoolManager.class);

    public HikariMultiPoolManager(List<String> urls, int maximumPoolSize) {
        // Create Hikari datasources from urls
        this.dataSources = new ArrayList<>();
        for (String url : urls) {
            HikariDataSource ds = new HikariDataSource();
            ds.setMaximumPoolSize(maximumPoolSize);
            ds.setJdbcUrl(url);
            dataSources.add(ds);
        }
        this.index = new AtomicInteger(0);
    }

    public Connection getConnection() throws SQLException {
        int ind = index.getAndIncrement() % dataSources.size();
        HikariDataSource ds = dataSources.get(ind);
        return ds.getConnection();
    }

    public void evict() throws Exception {
        for (HikariDataSource ds : dataSources) {
            HikariPoolMXBean poolBean = ds.getHikariPoolMXBean();
            if (poolBean != null) {
                poolBean.softEvictConnections();
            }
        }
    }

    @Override
    public void close() throws Exception {
        for (HikariDataSource ds : dataSources) {
            ds.close();
        }
    }
```
### How to set `urls`
The `HikariMultiPoolManager` takes a list of `urls` and a `maximumPoolSize` argument. Each path in the list gets its own `HikariDataSource` in the pool, which readers can use to query the database(s) it connects to. If `maximumPoolSize` is larger than 1, each data source will hand out multiple thread-safe connections. This gives you a few options for configuring the pool.
:::note
To learn more about database instances and connections, see [Connect to multiple databases](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#connect-to-multiple-databases).
:::
To create a connection pool with 3 connections to **the same database**, you can pass a single database path, and set `maximumPoolSize=3`:
```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 3);
```
Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.
You can also create multiple connections to **the same database** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths and make them unique by adding `&user=` to the database path:
```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=1");
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=2");
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=3");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```
Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.
You can also create multiple connections to **separate databases** in **the same MotherDuck account** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path:
```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db1?motherduck_token=" + token + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db2?motherduck_token=" + token + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db3?motherduck_token=" + token + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```
Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.
You can also create multiple connections to **separate databases** in **separate MotherDuck accounts** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path:
```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db1?motherduck_token=" + token1 + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db2?motherduck_token=" + token2 + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db3?motherduck_token=" + token3 + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```
Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.
### How to run queries with a thread pool
You can then fetch connections from the pool, for example, to run queries from a queue. The example below uses a thread pool with one worker per URL to fetch connections from the pool and run the queries using a `queryString` method:
```java
private static String queryString(HikariMultiPoolManager pool, String query) throws SQLException {
    try (Connection conn = pool.getConnection();
         Statement ps = conn.createStatement();
         ResultSet rs = ps.executeQuery(query)) {
        logger.info("connection = {}", conn);
        String res = rs.next() ? rs.getString(1) : "[not found]";
        logger.info("Got: {}", res);
        return res;
    }
}

public static void main(String[] args) throws Exception {
    if (token == null) {
        throw new IllegalArgumentException(
                "Please provide `motherduck_token` environment variable");
    }
    List<String> queries = new ArrayList<>();
    // Add queries here
    // Example:
    queries.add("SELECT 42;");
    queries.add("SELECT 'Hello World!';");
    List<String> urls = new ArrayList<>();
    // Add urls here
    // Example:
    urls.add("jdbc:duckdb:md:my_db?user=1&motherduck_token=" + token);
    urls.add("jdbc:duckdb:md:my_db?user=2&motherduck_token=" + token);
    urls.add("jdbc:duckdb:md:my_db?user=3&motherduck_token=" + token);
    // Create thread pool and run queries
    try (HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1)) {
        ExecutorService executor = Executors.newFixedThreadPool(urls.size());
        for (String query : queries) {
            executor.submit(() -> queryString(pool, query));
        }
        executor.shutdown();
        boolean success = executor.awaitTermination(30, TimeUnit.SECONDS);
        if (success) {
            logger.info("successfully ran {} queries in threads with connection pool", queries.size());
        }
    }
}
}
```
Reset the connection pool at least once every 24 hours by soft-evicting all connections. This ensures that you are always running on the latest version of MotherDuck.
```java
pool.evict();
```
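One way to automate the daily reset is a background timer that calls `evict()` and reschedules itself. The sketch below is in Python for brevity; `pool` is a stand-in for your own pool object (in the Java example above you might instead use a `ScheduledExecutorService`):

```python
# Sketch: call pool.evict() periodically on a background timer.
# `pool` is hypothetical here; substitute your own pool object.
import threading

def schedule_evict(pool, interval_seconds=24 * 3600):
    def _run():
        pool.evict()
        schedule_evict(pool, interval_seconds)  # reschedule the next run
    timer = threading.Timer(interval_seconds, _run)
    timer.daemon = True  # don't keep the process alive just for this
    timer.start()
    return timer
```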
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-nodejs
---
sidebar_position: 3
title: Multithreading and Parallelism with NodeJS and MotherDuck
sidebar_label: NodeJS
description: Performance tuning via multithreading with multiple connections to MotherDuck with NodeJS
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Multithreading and parallelism with NodeJS
For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling). Depending on the needs of your data application, you can use thread-based parallelism for improved performance, for example, if the queries are hybrid with CPU intensive work done locally. To enable thread-based parallelism, you can use [Node worker threads](https://nodejs.org/api/worker_threads.html#worker-threads) with one database connection in each thread.
If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.
## Connection pooling
If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use, so you can avoid the overhead of creating a new database object for each query. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts. To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.
For connection pools, we recommend using [generic-pool](https://www.npmjs.com/package/generic-pool) with [@duckdb/node-api](https://www.npmjs.com/package/@duckdb/node-api), overriding the `release` function to destroy any connection that has been in use for too long, which helps optimize resource usage.
First, let's create a file `md_connection_pool.js` to implement the connection pool class. Note that we are adding a new config option, `recycleTimeoutMillis`, that will help us recreate any connections (active or idle) that have been open for a given time. This is different from `idleTimeoutMillis`, which only destroys idle connections.
```javascript
import { DuckDBInstance } from "@duckdb/node-api";
import * as genericPool from "generic-pool";
export class RecyclingPool extends genericPool.Pool {
constructor(Evictor, Deque, PriorityQueue, factory, options) {
super(Evictor, Deque, PriorityQueue, factory, options);
// New _config option for when to recycle a non-idle connection
this._config['recycleTimeoutMillis'] = (typeof options.recycleTimeoutMillis == 'undefined') ? undefined : parseInt(options.recycleTimeoutMillis);
this._config['motherduckToken'] = (typeof options.motherduckToken == 'undefined') ? undefined : options.motherduckToken;
console.log('Creating a RecyclingPool');
}
release(resource) {
const loan = this._resourceLoans.get(resource);
const creationTime = typeof loan == 'undefined' ? 0 : loan.pooledResource.creationTime;
// If the connection has been in use for longer than the recycleTimeoutMillis, then destroy it instead of releasing it back into the pool.
// If that deletion brings the pool size below the min, a new connection will automatically be created within the destroy method.
if (new Date(creationTime + this._config.recycleTimeoutMillis) <= new Date()) {
return this.destroy(resource);
}
return super.release(resource);
}
}
```
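The release-or-destroy decision above reduces to a single timestamp comparison. The same check, restated in Python for clarity (illustrative only):

```python
# The recycle check from RecyclingPool.release, in isolation:
# a connection older than recycle_timeout is destroyed on release
# rather than returned to the pool.
import time

def should_recycle(creation_time, recycle_timeout, now=None):
    if now is None:
        now = time.time()
    return creation_time + recycle_timeout <= now
```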
You can then create an `MDFactory` class to create the connection in the pool, and use it with `createRecyclingPool` (equivalent to the `createPool` function from `generic-pool`).
```javascript
export class MDFactory {
constructor(opts) {
this.opts = opts
}
async create() {
console.log("Creating a connection");
const instance = await DuckDBInstance.create(`md:my_db?motherduck_token=` + this.opts.motherduckToken);
const connection = await instance.connect();
// Run any connection initialization commands here
// For example, you can set THREADS = 1 if you want to limit duckdb to run on a single thread
await connection.run("SET THREADS='1';");
return connection;
}
async destroy(connection) {
console.log("Destroying a connection");
return connection.close();
}
};
export function createRecyclingPool(config) {
const factory = new MDFactory(config);
return new RecyclingPool(genericPool.DefaultEvictor, genericPool.Deque, genericPool.PriorityQueue, factory, config);
}
```
To try out the connection pool, you can create a file `md_connection_pool_test.js` that creates a `RecyclingPool` and submits a list of queries.
To create the pool instance, first set the configuration options specified by `generic-pool` and pass them to the `createRecyclingPool` function. You can find the list of options in the [docs](https://www.npmjs.com/package/generic-pool). Below are a few example values that we recommend for using with MotherDuck.
```javascript
import { createRecyclingPool } from "./md_connection_pool.js";
// If an idle eviction would bring us below the min pool size, a new connection is made after the eviction
const opts = {
    max: 10,
    min: 3,
    // Background idle connection destruction process runs every evictionRunIntervalMillis.
    // We don't want all connections to be evicted at the same time, so only destroy one at a time.
    // A connection must be idle for softIdleTimeoutMillis before it is recycled.
    // (Additionally, we implemented recycleTimeoutMillis to also recycle active connections.)
    evictionRunIntervalMillis: 30000,
    numTestsPerEvictionRun: 1,
    softIdleTimeoutMillis: 90000,
    // Do not start to use a connection that is older than 20 minutes. Recreate it first.
    // Set this higher than recycleTimeoutMillis below so that recycling happens proactively rather than delaying query execution.
    idleTimeoutMillis: 1200000,
    // Before returning a resource to the pool, check if it has existed longer than this timeout and if so, destroy it.
    // New connections will be added up to the min pool size during the destroy process, so this is proactive rather than reactive.
    recycleTimeoutMillis: 900000,
    // We don't want all the connections to recycle at the same time, so randomize it slightly.
    // This number should be smaller than recycleTimeoutMillis.
    recycleTimeoutJitter: 60000,
    // This gets your MotherDuck token from an environment variable.
    motherduckToken: process.env.motherduck_token,
};
const myPool = createRecyclingPool(opts);
```
Then, you can use the pool to asynchronously acquire connections from the pool and run a list of queries.
```javascript
let promiseArray = [];
let queries = ["SELECT 42", "SELECT 'Hello World!'"];
for (let i = 0; i < queries.length; i++) {
    // Promise is resolved once a resource becomes available
    console.log("Acquire connection from pool");
    promiseArray.push(myPool.acquire());
    promiseArray[i]
        .then(async function (client) {
            console.log("Starting query");
            const reader = await client.runAndReadAll(queries[i]);
            console.log("Results: ", reader.getRowObjects()[0]);
            await new Promise(r => setTimeout(r, 200)); // Delay for testing
            // Release the connection (or destroy it if it has exceeded recycleTimeoutMillis)
            myPool.release(client);
        })
        .catch(function (err) {
            console.log(err);
        });
}
```
You can easily create additional connection pools that connect to different MotherDuck databases by changing the MotherDuck token.
```javascript
const opts2 = { ...opts, motherduckToken: process.env.motherduck_token_2};
const myPool2 = createRecyclingPool(opts2);
```
To shut down a pool and stop using it, you can optionally run the following code in your application:
```javascript
myPool.drain().then(function () {
  myPool.clear();
});
```
To test the pool, run:
```bash
npm install @duckdb/node-api
npm install generic-pool
export motherduck_token="" # Add your MotherDuck token here
node md_connection_pool_test.js
```
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python
---
sidebar_position: 1
title: Multithreading and Parallelism with Python and MotherDuck
sidebar_label: Python
description: Performance tuning via multithreading with multiple connections to MotherDuck with Python
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Multithreading and parallelism with Python
Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create [connections in multiple threads](#connections-in-multiple-threads). For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling). If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.
## Connections in multiple threads
If you have multiple parallelizable queries you want to run in quick succession, you could benefit from concurrency.
:::note
Concurrency is supported by DuckDB, across multiple Python threads, as described in the [Multiple Python Threads](https://duckdb.org/docs/guides/python/multiple_threads.html) documentation page. However, be mindful when using this approach, as parallelism does not always lead to better performance. Read the notes on [Parallelism](https://duckdb.org/docs/guides/performance/how_to_tune_workloads.html#parallelism-multi-core-processing) in the DuckDB documentation to understand the specific scenarios in which concurrent queries can be beneficial.
:::
A single DuckDB connection [is not thread-safe](https://duckdb.org/docs/api/python/overview.html#using-connections-in-parallel-python-programs). To use multiple threads, pass the connection object to each thread, and create a copy of the connection with the `.cursor()` method to run a query:
```python
import duckdb
from threading import Thread

duckdb_con = duckdb.connect('md:my_db')

def query_from_thread(duckdb_con, query):
    cur = duckdb_con.cursor()
    result = cur.execute(query).fetchall()
    print(result)
    cur.close()

queries = ["SELECT 42", "SELECT 'Hello World!'"]
threads = []
for i in range(len(queries)):
    threads.append(Thread(target=query_from_thread,
                          args=(duckdb_con, queries[i]),
                          name='query_' + str(i)))
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```
## Connection pooling
If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts. To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.
For connection pools, we recommend using [SQLAlchemy](https://docs.sqlalchemy.org/14/core/pooling.html). Below is an example implementation, which connects to an account by providing a `motherduck_token` in the database path.
```python
import logging
from itertools import cycle
from threading import Lock
import duckdb
import sqlalchemy.pool as pool
from sqlalchemy.engine import make_url
_log = logging.getLogger(__name__)
logging.basicConfig(level=logging.DEBUG)
class DuckDBPool(pool.QueuePool):
    """Connection pool for DuckDB databases (MD or local).

    When you run con = pool.connect(), it will return a cached copy of one of the
    database connections in the pool.

    When you run con.close(), it doesn't close the connection, it just
    returns it to the pool.

    Args:
        database_paths: A list of unique databases to connect to.
    """

    def __init__(
        self,
        database_paths,
        max_overflow=0,
        timeout=60,
        reset_on_return=None,
        *args,
        **kwargs
    ):
        self.database_paths = database_paths
        self.gen_database_path = cycle(database_paths)
        self.pool_size = kwargs.pop("pool_size", len(database_paths))
        self.lock = Lock()
        super().__init__(
            self._next_conn,
            *args,
            max_overflow=max_overflow,
            pool_size=self.pool_size,
            reset_on_return=reset_on_return,
            timeout=timeout,
            **kwargs
        )

    def _next_conn(self):
        with self.lock:
            path = next(self.gen_database_path)
            duckdb_conn = duckdb.connect(path)
            url = make_url(f"duckdb:///{path}")
            _log.debug(f"Connected to database: {url.database}")
            return duckdb_conn
```
### How to set `database_paths`
The `DuckDBPool` takes a list of `database_paths` and an optional `pool_size` argument (defaulting to the number of paths). Each path in the list gets a DuckDB connection in the pool that readers can use to query the database(s) it points to. If `pool_size` is larger than the number of paths, the pool returns thread-safe copies of those connections. This gives you a few options for configuring the pool.
:::note
To learn more about database instances and connections, see [Connect to multiple databases](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#connect-to-multiple-databases).
:::
To create a connection pool with 3 connections to **the same database**, you can pass a single database path, and set `pool_size=3`:
```python
path = "md:my_db?motherduck_token=<token>&access_mode=read_only"
conn_pool = DuckDBPool([path], pool_size=3)
```
Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases.
You can also create multiple connections to **the same database** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths and make them unique by adding `&user=` to the database path:
```python
paths = [
    "md:my_db?motherduck_token=<token>&access_mode=read_only&user=1",
    "md:my_db?motherduck_token=<token>&access_mode=read_only&user=2",
    "md:my_db?motherduck_token=<token>&access_mode=read_only&user=3",
]
conn_pool = DuckDBPool(paths)
```
Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases.
You can also create multiple connections to **separate databases** in **the same MotherDuck account** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path:
```python
paths = [
    "md:my_db1?motherduck_token=<token>&access_mode=read_only",
    "md:my_db2?motherduck_token=<token>&access_mode=read_only",
    "md:my_db3?motherduck_token=<token>&access_mode=read_only",
]
conn_pool = DuckDBPool(paths)
```
Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases.
You can also create multiple connections to **separate databases** in **separate MotherDuck accounts** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path and the token of the account that owns it:
```python
paths = [
    "md:my_db1?motherduck_token=<token_account_1>&access_mode=read_only",
    "md:my_db2?motherduck_token=<token_account_2>&access_mode=read_only",
    "md:my_db3?motherduck_token=<token_account_3>&access_mode=read_only",
]
conn_pool = DuckDBPool(paths)
```
Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases.
### How to run queries with a thread pool
You can then fetch connections from the pool, for example to run queries from a queue. The example below uses a `ThreadPoolExecutor` with 3 workers to fetch connections from the pool and run each query with a `run_query` function:
```python
from concurrent.futures import ThreadPoolExecutor

def run_query(conn_pool: DuckDBPool, query: str):
    _log.debug(f"Run query: {query}")
    conn = conn_pool.connect()
    rows = conn.execute(query)
    res = rows.fetchall()
    conn.close()
    _log.debug(f"Done running query: {query}")
    return res

with ThreadPoolExecutor(max_workers=3) as executor:
    conn_pool = DuckDBPool(database_paths)
    futures = [executor.submit(run_query, conn_pool, query) for query in queries]
    for future, query in zip(futures, queries):
        result = future.result()
        print(f"Query [{query}] num rows: {len(result)}")
```
Reset the connection pool at least once every 24 hours by closing and reopening all connections. This ensures that you are always running on the latest version of MotherDuck.
```python
conn_pool.dispose()
conn_pool = conn_pool.recreate()  # note: recreate() returns a new pool instance
```
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism
---
title: Multithreading and parallelism
description: Learn how to use multithreading and parallelism for special cases to read data from MotherDuck
---
DuckDB supports two concurrency models:
- Single-process read/write where one process can both read and write to the database.
- Multi-process read-only (`access_mode = 'READ_ONLY'`), where multiple processes can read from the database, but none can write.
This approach provides significant performance benefits for analytics databases. You can find more details on how to handle multiple process writes (or multiple read + write connections) in the [DuckDB documentation](https://duckdb.org/docs/stable/connect/concurrency.html).
## Closing Database Instances
The Python snippets below show how to manage database instance lifetime. You can force a fresh instance (rather than reusing a cached one) by adding a unique `cache_buster` parameter to the connection string:
```py
con = duckdb.connect("md:my_db?cache_buster=123", config={"motherduck_token": my_other_token})
```
Or you can set the `dbinstance_inactivity_ttl` setting to zero:
```py
con = duckdb.connect("md:my_db", config={"motherduck_token": token})
con.sql("SET motherduck_dbinstance_inactivity_ttl='0ms'")
```
Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create connections in multiple threads. For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use connection pooling. Implementation details can be seen in the cards linked below:
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/read-scaling
---
title: Read Scaling
description: Learn how to scale your data applications using read scaling tokens
---
Connecting read-heavy applications or BI tools with many concurrent users through a single MotherDuck account can sometimes lead to performance bottlenecks. By default, all connections using the same account share a single cloud DuckDB instance, called a "duckling". In addition to your read/write duckling, you can use Read Scaling to spin up additional read-only ducklings for read-heavy workloads.
These replicas are **eventually consistent**. Results may lag a few minutes behind the latest database state. This tradeoff prioritizes high availability and performance while achieving near real-time synchronization across all replicas.

## Configuring a Read Scaling Duckling Pool
### Creating a Read Scaling token
To use Read Scaling, create a read scaling access token in the **MotherDuck UI** when [generating an access token][md-access-token], or via the [REST API](/docs/sql-reference/rest-api/users-create-token/).
### Connect with a read scaling token
Once you have a read scaling token, you can use it to connect to MotherDuck from any DuckDB client as you would with any other authorization token. See [Connecting to MotherDuck](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#read-scaling-with-session-hints).
### Duckling Assignment
Read scaling ducklings remain idle until a connection is initialized from a DuckDB client. When a DuckDB client connects to MotherDuck with a read scaling token, the connection is assigned to one of the read scaling replicas. As more users connect, additional ducklings are spun up until you reach your Read Scaling Duckling Pool size.
If the number of connections exceeds your pool size, new connections are assigned to existing ducklings in a round-robin fashion.
The default Read Scaling Duckling Pool Size is 4 and can be increased up to 16. This is a soft limit, so if you need more ducklings in your pool, please [contact support](https://motherduck.com/contact-us/support/).
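The round-robin behavior described above can be pictured with a short sketch (plain Python, not MotherDuck internals; the duckling names are made up):

```python
from itertools import cycle

# With a pool size of 4, the first four connections each get their own
# duckling; further connections wrap around over the existing ones.
pool = ["duckling-1", "duckling-2", "duckling-3", "duckling-4"]
assign = cycle(pool)
assignments = [next(assign) for _ in range(6)]
print(assignments)
# ['duckling-1', 'duckling-2', 'duckling-3', 'duckling-4', 'duckling-1', 'duckling-2']
```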
### Permissions
A read scaling token grants permission for **read operations** (`SELECT`) while restricting write and administrative operations (updating tables, creating new databases, attaching or detaching databases).
## Ensuring Data Freshness
In read scaling mode, ducklings sync changes from the primary read-write instance within a few minutes, which works for most use cases.
If your application requires stricter synchronization, you can manually trigger an immediate sync by:
1. Calling [CREATE SNAPSHOT](/sql-reference/motherduck-sql-reference/create-snapshot.md) on the writer duckling
2. Calling [REFRESH DATABASES](/sql-reference/motherduck-sql-reference/refresh-database.md) on any read scaling ducklings
This approach guarantees that readers see the most recent snapshot.
::::warning[Watch Out]
Creating a snapshot of a database will interrupt any ongoing queries interacting with that database.
::::
## Best Practices
Here are a few tips to get the most out of MotherDuck's read scaling capabilities.
### Optimize your Read Scaling Duckling Pool size
For the best experience, aim for one duckling per concurrent user to take advantage of DuckDB's single-node power and efficiency. You can scale up as much as you need by configuring a maximum pool size based on expected concurrency and cost considerations. Users can also share ducklings if needed. While the maximum pool size is 16 replicas, this is a soft limit. [Get in touch with MotherDuck support](https://motherduck.com/contact-us/support/) if you need more.
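As a back-of-envelope sizing rule (a sketch, not an official formula), you can target one duckling per expected concurrent user, capped at the soft limit:

```python
def read_scaling_pool_size(expected_concurrent_users: int, soft_limit: int = 16) -> int:
    """One duckling per expected concurrent user, capped at the soft limit.

    The 16-replica cap is a soft limit that MotherDuck support can raise.
    """
    return max(1, min(expected_concurrent_users, soft_limit))

print(read_scaling_pool_size(3))   # 3
print(read_scaling_pool_size(40))  # 16
```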
### Leverage local processing where possible
Consider using DuckDB WASM to run client instances directly in the browser when possible to fully utilize client resources.
### Maintain user-duckling affinity with `session_hint`
To ensure users consistently connect to the same replica (improving caching and consistency), the DuckDB connection string supports the [`session_hint`](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#read-scaling-with-session-hints) parameter:
- Clients providing the same `session_hint` value are directed to the same replica. This improves caching effectiveness, provides a more consistent view of data across queries for that user and offers better isolation between concurrent users.
- This parameter can be set to the ID of a user session, a user ID, or a hashed value for privacy.
By leveraging read scaling tokens and `session_hint`, you can efficiently scale read operations and group user sessions for optimal performance.
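For example, a stable hint can be derived by hashing a user ID, so the same user always lands on the same replica without exposing the raw identifier (a sketch; the hashing scheme and `my_db` database name are illustrative):

```python
import hashlib

def session_hint_for(user_id: str) -> str:
    # Stable, privacy-preserving hint: the same user ID always
    # produces the same 16-character value.
    return hashlib.sha256(user_id.encode()).hexdigest()[:16]

hint = session_hint_for("kate@goose.inc")
conn_str = f"md:my_db?session_hint={hint}"
```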
### Instance caching with `dbinstance_inactivity_ttl`
Some DuckDB client library integrations support an *instance cache* to keep connections to the same database instance alive for a short period after use. This improves read scaling by helping maintain session affinity even across separate queries or short connection gaps. This caching behavior boosts the effectiveness of `session_hint`, making it more likely that frequent queries from the same client land on the same duckling, even with short breaks between connections. See [Connecting to MotherDuck](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#setting-custom-database-instance-cache-time-ttl) for more details.
[md-access-token]: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token
---
Source: https://motherduck.com/docs/key-tasks/cloud-storage/cloud-storage
---
title: Interacting with cloud storage
description: Learn how to work with databases and MotherDuck
---
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/key-tasks/cloud-storage/querying-s3-files
---
sidebar_position: 5
title: Querying Files in Amazon S3
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Since MotherDuck is hosted in the cloud, one of its benefits is better and faster interoperability with Amazon S3. MotherDuck's "hybrid mode" automatically routes queries that read from Amazon S3 to MotherDuck's execution runtime in the cloud rather than executing them locally.
:::note
MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2).
:::
MotherDuck supports the [DuckDB dialect](https://duckdb.org/docs/guides/import/s3_import) to query data stored in Amazon S3. Such queries are automatically routed to MotherDuck's cloud execution engines for faster and more efficient execution.
Here are some examples of querying data in Amazon S3:
```sql
SELECT * FROM read_parquet('s3://<bucket>/<file>');
SELECT * FROM read_parquet(['s3://<bucket>/<file1>', ... ,'s3://<bucket>/<fileN>']);
SELECT * FROM read_parquet('s3://<bucket>/*');
SELECT * FROM 's3://<bucket>/<path>/*';
SELECT * FROM iceberg_scan('s3://<bucket>/<iceberg_table_path>', ALLOW_MOVED_PATHS=true);
SELECT * FROM delta_scan('s3://<bucket>/<delta_table_path>');
```
See [Apache Iceberg](/integrations/file-formats/apache-iceberg.mdx) for more information on reading Iceberg data.
See [Delta Lake](/integrations/file-formats/delta-lake.mdx) for more information on reading Delta Lake data.
## Accessing private files in Amazon S3
Protected Amazon S3 files require an AWS access key and secret, which you can configure in MotherDuck using [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md).
### SSL Certificate Verification and S3 Bucket Names
Because of SSL certificate verification requirements, S3 bucket names that contain dots (.) cannot be accessed using virtual-hosted style URLs. This is due to AWS's SSL wildcard certificate (*.s3.amazonaws.com) which only validates single-level subdomains. When a bucket name contains dots, it creates multi-level subdomains that don't match the wildcard pattern, causing SSL verification to fail.
If your bucket name contains dots, you have two options:
1. **Rename your bucket** to remove dots (e.g., use dashes instead)
2. **Use path-style URLs** by adding the `URL_STYLE 'path'` option to your secret:
```sql
CREATE OR REPLACE SECRET my_secret IN MOTHERDUCK (
TYPE s3,
URL_STYLE 'path',
SCOPE 's3://my.bucket.with.dots'
);
```
For more information, see [Amazon S3 Virtual Hosting documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html).
---
Source: https://motherduck.com/docs/key-tasks/cloud-storage/writing-to-s3
---
sidebar_position: 5
title: Writing Data to Amazon S3
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
You can use MotherDuck to transform files on Amazon S3 or export data from MotherDuck to Amazon S3.
:::note
MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2).
:::
MotherDuck supports the [DuckDB dialect](https://duckdb.org/docs/guides/import/s3_export) to write data to Amazon S3. The examples here write data in Parquet format; for more options, refer to the [documentation for DuckDB's COPY command](https://duckdb.org/docs/stable/sql/statements/copy.html).
## Syntax
```sql
COPY <table_name> TO 's3://<bucket>/[<optional_path>/]<filename>';
```
## Example usage
```sql
-- write the entire ducks_table table to a Parquet file in S3
COPY ducks_table TO 's3://ducks_bucket/ducks.parquet';
-- writing the output of a query also works
COPY (SELECT * FROM ducks_table LIMIT 100) TO 's3://ducks_bucket/ducks_head.parquet';
```
---
Source: https://motherduck.com/docs/key-tasks/customer-facing-analytics/3-tier-cfa-guide
---
title: Customer-Facing Analytics Guide (3-tier Architecture)
sidebar_label: Builder's Guide
description: Step-by-step guide to building a 3-tier customer-facing analytics application with MotherDuck.
slug: /key-tasks/customer-facing-analytics/3-tier-cfa-guide/
---
To build a **Customer-Facing Analytics (CFA) application** on MotherDuck, use this step-by-step guide. This guide will focus on patterns for traditional 3-tier architecture, but you can also run 1.5-tier apps using Wasm, as seen [here](/getting-started/customer-facing-analytics/#15-tier-architecture-duckdb-wasm).
You'll know you're done when:
- Your application (`B2B Tool`) can run analytics queries for a customer (`Goose Inc`) against MotherDuck from a backend service.
- Data from a transactional database is synced into a per-customer MotherDuck database on a schedule using your orchestrator.
- You understand when to add more service accounts, databases, and read-scaling capacity as your product grows.
Use this guide when you want to:
- Build a 3-tier web app (browser → app server → MotherDuck) with embedded analytics.
- Use per-customer service accounts and databases to isolate data and compute.
- Keep analytics data in MotherDuck in sync with your transactional database.
Before starting, ensure you have:
- A MotherDuck account and an organization you can use for development.
- Basic familiarity with Python and SQL.
- Access to a PostgreSQL database (or a test instance) with an `orders`-style schema.
- Python installed locally (DuckDB supports recent Python 3 versions).
> This guide assumes you've read the conceptual overview [**Customer-Facing Analytics Getting Started**](/getting-started/customer-facing-analytics).
## 1. Understand the 3-Tier CFA Architecture
In this guide, you are building `B2B Tool`, a SaaS product that serves analytics to employees at many customer companies. Each customer company gets:
- Its own **service account** in MotherDuck.
- Its own **database(s)** for analytics tables.
- Its own **compute** (Ducklings) for queries and data loading.
Your high-level architecture:
```mermaid
graph LR;
subgraph Users["End Users"]
U1["Kate (Goose Inc)"];
U2["John (Goose Inc)"];
U3["Hari (Duck Co)"];
end
subgraph App["Your Application"]
FE["Frontend"];
BE["Backend API"];
TX["Transactional DB"];
end
MDORG["MotherDuck"];
U1 --> FE;
U2 --> FE;
U3 --> FE;
FE -->|"HTTP / JSON APIs"| BE;
BE -->|"User + Company lookup"| TX;
BE -->|"Analytics queries via service account tokens"| MDORG;
style U1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style U2 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style U3 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style FE fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style BE fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style TX fill:#ede7f6,stroke:#5e35b1,stroke-width:2px
style MDORG fill:#fff3e0,stroke:#f57c00,stroke-width:2px
```
Hyper-tenancy here means each company (`Goose Inc`, `Duck Co`) owns its own MotherDuck database(s), which store only that company's analytics data; compute is isolated (each company has its own Ducklings), so heavy workloads for one customer cannot slow down others.
You will:
1. Set up a dev organization and add other developers on the team.
2. Create a service account for your first customer company (`Goose Inc`).
3. Sync data from your transactional DB, such as Postgres, into Goose Inc’s MotherDuck analytical database using your chosen replication method.
4. Connect your backend service to MotherDuck with a **read token** to serve analytics queries.
5. Plan how to scale to many customer companies and higher concurrency.
### Alternative to per-customer service accounts
The per-customer service account pattern is the strongest isolation model. Some teams, especially B2C or lighter multi-tenant apps, opt for a simpler setup:
- Keep a **single writer service account** that owns all customer databases.
- Create a **[read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)** for that account and configure the flock size to target one duckling per concurrent end user (default max 16, adjustable via support). For cost control, users can share a duckling, but that increases contention.
- Have each end user connect in **[single attach mode](/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/)** to the one database they should see (`md:?attach_mode=single`), which avoids carrying other attachments from the workspace.
- Use [`session_hint`](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#read-scaling-with-session-hints) in the connection string to keep an end user pinned to the same read-scaling duckling for cache reuse and steadier latency.
This model trades away service-account isolation in favor of operational simplicity. Ensure your security and compliance needs allow a shared service account before choosing it.
Read-scaling replicas are eventually consistent. If you need fresher reads on demand, combine `CREATE SNAPSHOT` on the writer with `REFRESH DATABASE` on the read-scaling connections.
Example connection string for an end user:
```text
md:customer_db?attach_mode=single&session_hint=<value>
```
## 2. Set Up Your Dev Environment and Organization
Prepare your dev environment:
1. **Create your dev organization and account**
1. Go to `https://motherduck.com` and sign up or log in with your work email (for example, `manager@b2btool.com`).
2. Create or select an organization you’ll use for development (for example, `B2B Tool Co`).
3. In the MotherDuck UI, open the default database (`my_db`) and confirm you can run a simple query such as:
```sql
SELECT 1;
```
You should see a single row with the value `1`.
2. **Upload a small CSV to confirm data ownership and access**
1. In the MotherDuck web UI, upload a small example CSV (for example, `orders_sample.csv`) into `my_db`. If this step is unclear, check out the [MotherDuck tutorial on loading data](/getting-started/e2e-tutorial/part-2/#loading-your-data).
2. Run a query like:
```sql
SELECT COUNT(*) AS row_count FROM orders_sample;
```
You should see the number of rows you uploaded.
3. **Invite a second developer and share data**
1. Invite `devlead@b2btool.com` to your `B2B Tool Co` organization.
2. Create a new database in your personal account (for example, `b2btool_dev`) and copy or create a simple table.
3. Share that database with your colleague following the [**Sharing Data** guide](/key-tasks/sharing-data/sharing-overview/).
4. Ask your colleague to query the shared database from their account.
At this point:
- You have a dev org with two human users.
- You’ve seen how database ownership and read-only sharing works.
Conceptually, your dev setup looks like this:
```mermaid
graph LR;
DM["devlead@b2btool.com"] <-->|"read/write"| DB1[("DB: b2btool_dev")];
DB1 -->|"read only"| DC["Colleague"];
style DM fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style DC fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style DB1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
```
## 3. Create a Service Account for a Customer Company
For customer-facing analytics, your customers usually do **not** log into MotherDuck directly. Instead:
- Your application mediates access.
- Each customer company gets a **service account** in your MotherDuck organization.
- Your backend uses that service account’s tokens to load and query data.
In this guide, you’ll create a service account for your first customer company: `Goose Inc`.
### 3.1 Create a service account in the UI
1. In the MotherDuck UI, go to the **Service Accounts** section for your organization.
2. Click **Create Service Account**.
3. Name it something like `goose-inc-service-account`.
4. Save the generated access token in your secret manager or a secure store.
For more detail, see the [**Service Accounts Guide**](/key-tasks/service-accounts-guide/).
### 3.2 (Optional) Create service accounts via REST API
Later, you will likely automate service account creation. To create a service account programmatically:
- Use the [`users-create-service-account`](/sql-reference/rest-api/users-create-service-account/) REST API endpoint.
- Use the [`users-create-token`](/sql-reference/rest-api/users-create-token/) endpoint to create an access token for that service account.
Your provisioning workflow should: **(1)** detect a new customer signup, **(2)** call `users-create-service-account` for that company, **(3)** call `users-create-token`, and **(4)** store the token metadata (or an alias) in your transactional database so your backend can look it up later.
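The four steps above can be sketched as plain orchestration logic. Here the MotherDuck REST calls are hidden behind a hypothetical `api` object whose method names and return shapes are illustrative; see the linked endpoint docs for the real request paths and payloads:

```python
def provision_customer(api, token_store: dict, company: str) -> dict:
    """Provision MotherDuck resources for a newly signed-up customer company.

    `api` is a hypothetical wrapper around the MotherDuck REST endpoints;
    `token_store` stands in for your transactional DB or secret manager.
    """
    # (2) create a service account for the company
    account = api.create_service_account(name=f"{company}-service-account")
    # (3) create an access token for that service account
    token = api.create_token(account_id=account["id"])
    # (4) persist token metadata (or an alias) so the backend can look it up later
    record = {"account_id": account["id"], "token_alias": token["alias"]}
    token_store[company] = record
    return record
```

Step (1), detecting the signup, is whatever trigger your application already uses (a webhook, a signup handler, a nightly job); it simply calls `provision_customer` with the new company's name.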
## 4. Model and Load Customer Data in MotherDuck
Next, populate data for `Goose Inc` into its own MotherDuck database.
Assume:
- Your transactional system (`B2B Tool`) uses PostgreSQL.
- Each customer company is an ecommerce store with:
- `orders` table: order-level facts.
- `fulfillments` table: shipment or delivery events.
Example schema:
```sql
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    company_id BIGINT,
    order_date TIMESTAMP,
    customer_email TEXT,
    total_amount NUMERIC(18, 2),
    status TEXT
);

CREATE TABLE fulfillments (
    fulfillment_id BIGINT PRIMARY KEY,
    order_id BIGINT REFERENCES orders(order_id),
    fulfilled_at TIMESTAMP,
    carrier TEXT,
    status TEXT
);
```
Example data:
```sql
INSERT INTO orders
SELECT
    row_number() OVER () AS order_id,
    (random() * 9 + 1)::BIGINT AS company_id,
    current_timestamp - INTERVAL (random() * 365) DAY AS order_date,
    'customer' || (random() * 999 + 1)::INT || '@example.com' AS customer_email,
    (random() * 9999 + 1)::NUMERIC(18, 2) AS total_amount,
    (['pending', 'processing', 'shipped', 'delivered', 'cancelled'])[(random() * 4)::INT + 1] AS status
FROM range(1000);

INSERT INTO fulfillments
SELECT
    row_number() OVER () AS fulfillment_id,
    (random() * 999 + 1)::BIGINT AS order_id,
    current_timestamp - INTERVAL (random() * 300) DAY AS fulfilled_at,
    (['UPS', 'FedEx', 'USPS', 'DHL', 'Amazon Logistics'])[(random() * 4)::INT + 1] AS carrier,
    (['pending', 'in_transit', 'out_for_delivery', 'delivered', 'failed'])[(random() * 4)::INT + 1] AS status
FROM range(1000);
```
:::info
Use your [orchestrator](/integrations/orchestration/) and [ingestion tool](/integrations/ingestion/) to keep this data in sync for each customer company.
:::
### 4.1 Create a MotherDuck database for `Goose Inc`
Use the `Goose Inc` service account’s token to create a database for that customer:
```sql
CREATE DATABASE goose_inc;
```
Run this in the UI while impersonating the `Goose Inc` service account, or connect as that service account from Python and issue the `CREATE DATABASE` statement.
:::note
To move forward, replicate your data into `goose_inc`. [This page](/key-tasks/data-warehousing/Replication/postgres/) shows a simple example for replicating a Postgres database to MotherDuck.
:::
## 5. Run Analytics Queries from Your Backend
With data in Goose Inc’s MotherDuck database, your backend can run analytics queries.
At a high level:
1. Your user (`Kate` at Goose Inc) logs into `B2B Tool`.
2. Your backend authenticates Kate and determines she belongs to the `Goose Inc` customer company.
3. Your backend looks up Goose Inc’s **read token** for its service account from your transactional database or secret store.
4. Your backend uses that read token to run analytics queries against the `goose_inc` database in MotherDuck.
### 5.1 Create a read token for Goose Inc
For production, you’ll usually create a token dedicated to **reading** analytics data:
1. In the MotherDuck UI, impersonate the Goose Inc service account.
2. Create a new access token intended only for read workloads.
3. Store this token securely and associate it with Goose Inc in your transactional database.
You can also create tokens via the REST API using the [`users-create-token`](/sql-reference/rest-api/users-create-token/) endpoint.
### 5.2 Connect from Python using DuckDB
Your backend service connects to MotherDuck using the DuckDB client and the `md:` connection string. Typically, you:
- Set the `MOTHERDUCK_TOKEN` (or `motherduck_token`) environment variable to the Goose Inc read token.
- Connect to the `goose_inc` database using DuckDB.
Example helper in your backend (for example, `analytics_client.py`):
```python
import os
import duckdb

def get_customer_connection(customer_id: str):
    """
    Get a DuckDB connection to a customer's MotherDuck database.

    Args:
        customer_id: Identifier for the customer (e.g., 'goose_inc', 'duck_co')

    Returns:
        DuckDB connection to the customer's database
    """
    # Look up the customer's read token from your secret store or environment
    # In production, you'd fetch this from your transactional DB or secret manager
    token_env_var = f"{customer_id.upper().replace('-', '_')}_READ_TOKEN"
    read_token = os.environ.get(token_env_var)
    if not read_token:
        raise ValueError(f"Read token not found for customer: {customer_id}")

    # Set the token for this connection
    os.environ["MOTHERDUCK_TOKEN"] = read_token

    # Connect to the customer's database on MotherDuck
    # Database name typically matches the customer_id
    conn = duckdb.connect(f"md:{customer_id}")
    return conn
```
Then, a simple analytics function in your API service:
```python
def get_customer_kpis(customer_id: str):
    conn = get_customer_connection(customer_id)
    query = """
        SELECT
            date_trunc('day', order_date) AS day,
            COUNT(*) AS orders_count,
            SUM(total_amount) AS gross_revenue
        FROM orders
        WHERE order_date >= current_date - INTERVAL 30 DAY
        GROUP BY 1
        ORDER BY 1
    """
    result = conn.execute(query).fetch_df()
    # Convert to JSON-serializable structure for your frontend
    return result.to_dict(orient="records")
```
Expose this from a REST endpoint such as `/api/customers/{customer_id}/kpis` and render the results in your frontend dashboards. The same code works for any customer by passing their identifier.
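The endpoint wiring depends on your web framework; as a framework-agnostic sketch of the routing step (`handle_request` is an illustrative name, and the KPI callable is assumed to behave like `get_customer_kpis` above):

```python
import re
from typing import Callable, Optional, Tuple

# Path pattern mirroring /api/customers/{customer_id}/kpis
KPI_ROUTE = re.compile(r"^/api/customers/(?P<customer_id>[A-Za-z0-9_-]+)/kpis$")

def handle_request(path: str, get_kpis: Callable[[str], list]) -> Tuple[int, Optional[list]]:
    """Return an (http_status, body) pair for a KPI request path."""
    match = KPI_ROUTE.match(path)
    if match is None:
        return 404, None
    # Delegate to the analytics helper with the parsed customer id
    return 200, get_kpis(match.group("customer_id"))
```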
The runtime query flow looks like:
```mermaid
sequenceDiagram
participant User as Kate (Goose Inc)
participant FE as B2B Tool Frontend
participant BE as B2B Tool Backend
participant MD as MotherDuck (Goose Inc DB)
User->>FE: Opens analytics dashboard
FE->>BE: GET /api/customers/goose-inc/kpis
BE->>BE: Lookup Goose Inc read token
BE->>MD: Analytics query using DuckDB + md:goose_inc
MD-->>BE: Result rows
BE-->>FE: JSON KPIs
FE-->>User: Render charts
```
## 6. Scaling to Many Customer Companies
As your product grows, add more customer companies. For each new company:
1. **Create a service account** (via UI or REST API).
2. **Create one or more databases** for that company’s analytics data.
3. **Configure your orchestrator** to run a `dlt` pipeline (or equivalent) for that company.
4. **Create a read token** for the company and store it in your transactional database.
Your architecture naturally scales horizontally:
```mermaid
graph LR;
subgraph Org["Your MotherDuck Org"]
SA1["Service Account: Goose Inc"];
SA2["Service Account: Swan Gmbh"];
SA3["Service Account: Duck Co"];
DB1["DB: goose_inc"];
DB2["DB: swan_gmbh"];
DB3["DB: duck_co"];
end
SA1 --> DB1;
SA2 --> DB2;
SA3 --> DB3;
style Org fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style SA1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style SA2 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style SA3 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style DB1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
style DB2 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
style DB3 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
```
Each service account and database pair has its own compute, minimizing noisy neighbors and making performance a per-customer concern.
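The four onboarding steps above can be sketched as a single function (every `create_*` callable and `store_secret` here is a hypothetical stand-in for your REST API and secret-store wrappers):

```python
from typing import Callable

def onboard_customer(
    company: str,
    create_service_account: Callable[[str], dict],
    create_database: Callable[[dict, str], None],
    create_read_token: Callable[[dict], str],
    store_secret: Callable[[str, str], None],
) -> str:
    """Provision one customer company; returns the database name created."""
    # Derive a SQL-friendly database name from the company name
    db_name = company.lower().replace(" ", "_")
    # 1. Create a service account for the company
    account = create_service_account(company)
    # 2. Create the company's analytics database
    create_database(account, db_name)
    # 3. Create a read token for serving queries
    token = create_read_token(account)
    # 4. Store the token where the backend can look it up
    store_secret(f"{db_name.upper()}_READ_TOKEN", token)
    return db_name
```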
## 7. Scaling a Single Customer to High Concurrency
When a customer (for example, `Goose Inc`) grows to hundreds or thousands of simultaneous users, use these levers:
1. **Increase the Duckling size** for the service account’s default compute Duckling to handle heavier transformation jobs (vertical scaling).
2. **Use read scaling** for high-concurrency read workloads:
- Refer to [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) to create read-scaling Ducklings for Goose Inc's read token.
- Point your backend’s analytics queries at the read-scaling token instead of the main read/write token.
3. **Optimize queries and models**:
- Pre-aggregate frequently-used metrics.
- Use summary tables to avoid scanning the full `orders` table on every request.
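For example, a pre-aggregated daily summary over the `orders` table from this guide (a sketch; refresh it on your load schedule):

```sql
CREATE OR REPLACE TABLE daily_order_summary AS
SELECT
    date_trunc('day', order_date) AS day,
    COUNT(*) AS orders_count,
    SUM(total_amount) AS gross_revenue
FROM orders
GROUP BY 1;
```

Dashboards can then read `daily_order_summary` instead of scanning `orders` on every request.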
For most applications, you start with a single Duckling per customer and introduce read scaling only when your monitoring shows sustained high concurrency or latency issues.
## 8. Troubleshooting and When to Add More Service Accounts
As you operate your CFA deployment, you may run into several common situations.
### 8.1 Queries are slow or time out for one customer
If you see slow queries or timeouts for a specific customer:
- **Check query patterns**:
- Are you scanning too much data on every request?
- Can you pre-aggregate or cache common metrics?
- **Scale compute for that customer**:
- Increase the size for the service account’s Duckling.
  - Add read-scaling Ducklings, or increase the size of the Duckling used by that customer's read token.
You rarely need to change the number of service accounts in this case; focus on scaling and optimizing the existing one.
### 8.2 Data loads interfere with reads
If your hourly (or more frequent) data load jobs are locking tables and causing read queries to queue:
- Consider:
- Scheduling heavy load jobs during off-peak times.
- Using zero-copy cloning (`CREATE SNAPSHOT` and `REFRESH DATABASE`) patterns so that readers query a snapshot database while writers update the primary.
- Ensure you are using a **dedicated read token** and read-scaling configuration for user-facing queries.
### 8.3 When to add more service accounts
In most B2B scenarios:
- You create **one service account per customer company**.
- All users at that company share the same analytics data and compute via your application.
You should consider adding **additional service accounts** when:
- You need hard isolation between different environments (for example, separate service accounts for `Prod`, `Staging`, and `Sandbox` within the same customer).
- A customer has sub-tenants of their own and you want to isolate compute and data at that sub-tenant level (for example, separate service accounts per region or per major business unit).
When you add new service accounts:
1. Create the service account (UI or REST API).
2. Create dedicated databases for the new scope.
3. Create tokens and wire them into your application’s configuration.
### 8.4 Common token and permission issues
If you see authentication or permission errors:
- **Token expired or revoked**:
- Rotate the token in MotherDuck and update your secret store.
- **Permission denied on database or table**:
- Confirm that the service account owns the database or has the necessary privileges.
- Re-check sharing settings if you are using shared data.
## 9. Next Steps
Once you have a basic 3-tier CFA deployment working:
- **Automate provisioning**:
- Automate service account and token creation using the [REST APIs](/sql-reference/rest-api/motherduck-rest-api/).
- Automate database and schema creation for new customer companies.
- **Automate data loading**:
- Move your `dlt` jobs fully into your orchestrator so that new companies are onboarded with little manual work.
- Monitor load durations and adjust scheduling as your data grows.
- **Enhance your frontend**:
- Add charts and drill-downs powered by MotherDuck.
- Consider additional guides under `Customer-Facing Analytics` for advanced topics in your docs set.
For a high-level conceptual overview and architecture comparison, see the [**Customer-Facing Analytics Getting Started**](/getting-started/customer-facing-analytics/) page.
---
Source: https://motherduck.com/docs/key-tasks/customer-facing-analytics/customer-facing-analytics
---
sidebar_position: 14
title: Building your First App with Customer-Facing Analytics
sidebar_label: Customer-Facing Analytics
---
To build your first application with **Customer-Facing Analytics (CFA)** on MotherDuck, use this overview as a starting point.
You'll know you're done when:
- Each of your customer tenants (or organizations) has its own service account and database(s) in MotherDuck.
- Your application can query customer-specific analytics data with predictable performance and isolation.
- You understand which detailed guide to follow next for implementation.
Use this overview to choose a **tenancy model** and learn the building blocks before the step-by-step 3-tier guide.
## Customer Provisioning
Every [Duckling](https://motherduck.com/blog/scaling-duckdb-with-ducklings/) is an isolated bucket of compute. For Customer-Facing Analytics, this usually means:
- Each **customer tenant or organization** has **one service account** dedicated to serving analytics (and often also ingestion and transformation).
- Your backend mediates all access; customers typically do not log into MotherDuck directly.
You manage service accounts and tokens using:
- [`users-create-service-account`](/sql-reference/rest-api/users-create-service-account/) – create a service account per customer tenant.
- [`users-create-token`](/sql-reference/rest-api/users-create-token/) – create tokens for ingestion and read workloads.
With accounts and tokens in place, you can:
- Create databases under each service account.
- Load data into those databases via your orchestrator.
- Use dedicated read tokens from your application to serve analytics.
For a concrete example of this pattern in a 3-tier web app, see the **[CFA Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/)**.
## Data Modeling and Loading
One database per customer tenant or organization scales cleanly because:
- Each database is tied to a tenant's service account.
- Each tenant's workloads are isolated from the others.
- You can scale Duckling (compute instance) sizes independently based on tenant needs using [different sizes (Pulse, Standard, etc)](/about-motherduck/billing/duckling-sizes/).
You can also:
- Use a single "landing" service account to ingest raw data from upstream systems.
- Use [ATTACH](/sql-reference/motherduck-sql-reference/attach.md) and [zero-copy cloning](/key-tasks/sharing-data/sharing-overview/#consuming-shared-data) to fan that data out into per-customer databases owned by their respective service accounts.
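A rough sketch of the fan-out step, assuming one session can see both the landing database and the customer database, for example via a share (names here are illustrative):

```sql
ATTACH 'md:landing_db';
ATTACH 'md:goose_inc';

-- Copy only this customer's slice into their database
CREATE OR REPLACE TABLE goose_inc.main.orders AS
SELECT * FROM landing_db.main.orders
WHERE company_id = 1;
```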
High-level patterns for data pipelines:
```mermaid
graph LR;
A[Source Systems]-->D[(Landing Database)];
D-->F[(Transform & Clone)];
F-->G[(Customer DB A)];
F-->H[(Customer DB B)];
F-->I[(Customer DB C)];
subgraph App
E[Serve Analytics]
end
G-->E;
H-->E;
I-->E;
```
Check out the detailed [Builder's Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/) for instructions on loading data into per-customer MotherDuck databases and orchestrating customer-facing analytics pipelines.
## Other Considerations
Since MotherDuck [Shares](/key-tasks/sharing-data/sharing-overview/) are read-only, in near-real-time scenarios it may make sense to use:
- [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot/) to force a checkpoint on the writer.
- [`REFRESH DATABASE`](/sql-reference/motherduck-sql-reference/refresh-database/) to get the latest version of the data on the reader.
This pattern can help enforce consistency between writer and reader databases that power your customer-facing dashboards.
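The handoff then looks roughly like this (a sketch with an illustrative database name; confirm the exact statement forms on the linked reference pages):

```sql
-- On the writer, after a batch of loads completes:
CREATE SNAPSHOT OF goose_inc;

-- On the reader serving dashboards, pick up the new version:
REFRESH DATABASE goose_inc;
```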
For high-scale, high-concurrency workloads, MotherDuck offers [Read Scaling Replicas](https://motherduck.com/blog/read-scaling-preview/) for applications that send hundreds or thousands of queries in a few seconds, such as BI tools or busy embedded dashboards. Read replicas:
- Can be created and modified in the UI.
- Can be managed via the [MotherDuck REST API](/sql-reference/rest-api/motherduck-rest-api/).
- Follow the same consistency considerations as Shares, and can be checkpointed and refreshed more frequently if needed.
When you're ready to implement a full 3-tier architecture with per-customer service accounts, scheduled data loading, and a backend API, continue to the [**Customer-Facing Analytics Guide**](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/).
---
Source: https://motherduck.com/docs/key-tasks/data-warehousing/Orchestration/github-action-cron
---
sidebar_position: 1
title: Github Actions
---
# Orchestrating Queries with Github Action
GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform that allows you to automate your build, test, and deployment pipeline. You can create workflows that build and test every pull request to your repository, or deploy merged pull requests to production.
For the purposes of data warehousing, we can use GitHub Actions to extract, load, and transform data as a simple cron job. You can learn more about [Github Actions on the documentation pages](https://docs.github.com/en/actions).
## Triggering GitHub Actions
This How-to guide will cover two invocation examples: Actions invoked via `workflow dispatch` (manually triggered by a button in GitHub) and via a scheduled job. After reviewing the invocation methods, it continues by showing the definition of a container, the installation of DuckDB, and the execution of some basic operations in MotherDuck. Note that this is not intended to be a complete reference; rather, it is a narrow slice of useful code that can be directly applied to the types of problems that can be solved with MotherDuck.
### Manually triggered actions
The most basic way to use Github Actions is to use `workflow dispatch` so that the action can be triggered by clicking a button in GitHub. Detailed documentation about this can be found on the [Github website](https://docs.github.com/en/actions/managing-workflow-runs-and-deployments/managing-workflow-runs/manually-running-a-workflow).
Using `workflow dispatch` in practice looks like this:
```yml
name: manual_build
on:
  workflow_dispatch:
    inputs:
      name:
        # Friendly description to be shown in the UI instead of 'name'
        description: 'What is the reason to trigger this manually?'
        # Default value if no value is explicitly provided
        default: 'testing github actions'
        # Whether the input must be provided for the workflow to run
        required: false
jobs:
  ...
```
### Running cron jobs
Many types of jobs are better suited for scheduled orchestration. This can be done with the `schedule` attribute, which will use [traditional cron syntax](https://healthchecks.io/docs/cron/) to determine when to run the job.
Using `schedule` can look like this:
```yml
name: 'Scheduled Run'
on:
  schedule:
    - cron: '0 10 * * *' # This line sets the job to run every day at 10am UTC
jobs:
  ...
```
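The two triggers can also be combined, so the same workflow runs on a schedule but can still be launched manually from the GitHub UI:

```yml
name: 'Scheduled or Manual Run'
on:
  workflow_dispatch:
  schedule:
    - cron: '0 10 * * *'
jobs:
  ...
```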
## Defining Jobs & Steps
After the invocation method is defined, the jobs must be defined. A job contains the specific steps required to accomplish it. For this example, we will define the container, install DuckDB, and then run a script against MotherDuck.
Job definition can look like this:
```yml
jobs:
  deploy:
    name: 'Deploy'
    runs-on: ubuntu-latest
```
We have now defined the Action environment, which is the latest stable version of Ubuntu. There are of course other environments these jobs can run on, but the Ubuntu container is a great starting point because it can also be easily shared with GitHub Codespaces, which makes testing easier.
:::note
Github Actions are composable, but for simplicity this guide will not cover how to link actions to each other, or other more advanced steps. This can all be found in the [documentation on Github](https://docs.github.com/en/actions).
:::
After the job is defined, we add the steps. Since this is YAML, the spacing is significant, which is why the steps are indented.
```yml
    steps:
      # check out master using the "Checkout" action
      - name: Check out
        uses: actions/checkout@master

      # install duckdb binary
      - name: Install DuckDB
        run: |
          wget https://github.com/duckdb/duckdb/releases/download/v1.4.2/duckdb_cli-linux-amd64.zip
          unzip duckdb_cli-linux-amd64.zip
          rm duckdb_cli-linux-amd64.zip

      # run sql script with a specific token
      - name: Run SQL script
        env:
          MOTHERDUCK_TOKEN: ${{ secrets.MOTHERDUCK_TOKEN }}
        run: ./duckdb < script.sql
```
The example script invoked above looks like this:
```sql
-- attach to motherduck
ATTACH 'md:';
-- set the database
USE my_db;
-- create the table if it doesn't exist
CREATE TABLE IF NOT EXISTS target (
source VARCHAR(255),
timestamp TIMESTAMP
);
-- insert a row
INSERT INTO target (source, timestamp)
VALUES ('github action', CURRENT_TIMESTAMP);
```
## Other Considerations
In order to use this Action as currently written, you will need to create a secret in your repo called `MOTHERDUCK_TOKEN` with a [token generated from your MotherDuck account](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token).
## Handling More Complex Workflows
The example on this page covers very simple, single step orchestration. For more complex requirements, please check out our [orchestration partners](https://motherduck.com/ecosystem/?category=Orchestration). An overview of the MotherDuck Ecosystem is shown below.

---
Source: https://motherduck.com/docs/key-tasks/data-warehousing/Replication/flat-files
---
sidebar_position: 10
title: Flat Files
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import DownloadLink from '@site/src/components/DownloadLink';
# Replicating Flat Files to MotherDuck
The goal of this guide is to show users simple examples of loading data from flat file sources into MotherDuck. Examples are shown for both the MotherDuck Web UI and the DuckDB CLI. To install the DuckDB CLI, [check out the instructions first.](/getting-started/interfaces/connect-query-from-duckdb-cli)
## CSV
From the UI, follow these steps:
1. Navigate to the **Add Data** section.
2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
3. Execute the generated query which will create a table for you.
1. Modify the query as needed to suit the correct Database / Schema / Table name.
In the CLI, you can load a CSV file using the `read_csv` function. For example:
### Local File
```sql
CREATE TABLE my_table AS
SELECT * FROM read_csv('path/to/local_file.csv');
```
### S3 File
To load from S3, ensure your DuckDB instance is configured with [S3 secrets](/documentation/integrations/cloud-storage/amazon-s3.mdx). Then:
```sql
CREATE TABLE my_table AS
SELECT * FROM read_csv('s3://bucket-name/path-to-file.csv');
```
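`read_csv` also accepts glob patterns, which helps when a bucket prefix or folder contains many files of the same shape (`union_by_name` aligns files whose columns appear in different orders):

```sql
CREATE TABLE my_table AS
SELECT * FROM read_csv('s3://bucket-name/path/*.csv', union_by_name = true);
```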
## JSON
From the UI, follow these steps:
1. Navigate to the **Add Data** section.
2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
3. Execute the generated query which will create a table for you.
1. Modify the query as needed to suit the correct Database / Schema / Table name.
In the CLI, use the `read_json` function to load JSON files.
### Local File
```sql
CREATE TABLE my_table AS
SELECT * FROM read_json('path/to/local_file.json');
```
### S3 File
Make sure S3 support is enabled as described in the [S3 secrets documentation](/documentation/integrations/cloud-storage/amazon-s3.mdx).
```sql
CREATE TABLE my_table AS
SELECT * FROM read_json('s3://bucket-name/path-to-file.json');
```
## Parquet
From the UI, follow these steps:
1. Navigate to the **Add Data** section.
2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
3. Execute the generated query which will create a table for you.
1. Modify the query as needed to suit the correct Database / Schema / Table name.
In the CLI, use the `read_parquet` function to load Parquet files.
### Local File
```sql
CREATE TABLE my_table AS
SELECT * FROM read_parquet('path/to/local_file.parquet');
```
### S3 File
Ensure S3 support is enabled as described in the [S3 secrets documentation](/documentation/integrations/cloud-storage/amazon-s3.mdx).
```sql
CREATE TABLE my_table AS
SELECT * FROM read_parquet('s3://bucket-name/path-to-file.parquet');
```
## Handling More Complex Workflows
Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some offering native Python support. An overview of the MotherDuck Ecosystem is shown below.

---
Source: https://motherduck.com/docs/key-tasks/data-warehousing/Replication/postgres
---
sidebar_position: 1
title: PostgreSQL
---
# Replicating PostgreSQL tables to MotherDuck
This page shows basic patterns for connecting to PostgreSQL with the [`postgres_scanner`](https://duckdb.org/docs/extensions/postgres.html) extension, connecting to MotherDuck, and then writing the data from PostgreSQL into MotherDuck. For more complex replication scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion).
If you are looking for the [pg_duckdb extension](https://github.com/duckdb/pg_duckdb), head on over to the [pg_duckdb explainer page](/concepts/pgduckdb).
To skip the documentation and look at the entire script, expand the element below:
SQL script
```sql
-- install pg extension in DuckDB
INSTALL postgres;
LOAD postgres;
-- attach pg as pg_db
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS pg_db (TYPE POSTGRES, READ_ONLY);
-- connect to MotherDuck
ATTACH 'md:my_db';
-- insert data into MotherDuck
CREATE OR REPLACE TABLE my_db.main.postgres_table AS
SELECT * FROM pg_db.public.some_table;
```
## Loading the PostgreSQL Extension & Authenticating
:::info
MotherDuck does not yet support the PostgreSQL and MySQL extensions, so you need to perform the following steps on your own computer or cloud computing resource. We are working on supporting the PostgreSQL extension on the server side so that this can happen within the MotherDuck app in the future with improved performance.
:::
The first step to connect to Postgres is to install & load the postgres extension using the [DuckDB CLI](/getting-started/interfaces/connect-query-from-duckdb-cli):
```sql
INSTALL postgres;
LOAD postgres;
```
Once this is completed, you can connect to postgres by attaching it to your duckdb session:
```sql
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS pg_db (TYPE POSTGRES, READ_ONLY);
```
More detailed information can be found on the [DuckDB documentation](https://duckdb.org/docs/extensions/postgres.html#connecting).
## Connecting to MotherDuck & inserting the table
Once you are connected to your postgres database, you need to connect to MotherDuck. To learn more about authentication, [go here](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck).
```sql
ATTACH 'md:my_db';
```
Once you have authenticated, you can execute CTAS in SQL to replicate data from postgres into MotherDuck.
```sql
CREATE OR REPLACE TABLE my_db.main.postgres_table AS
SELECT * FROM pg_db.public.some_table;
```
Congratulations! You have now replicated data from Postgres into MotherDuck.
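Re-running the CTAS copies the whole table each time. A light incremental variant (a sketch that assumes `some_table` has a monotonically increasing `id` column) only appends new rows:

```sql
INSERT INTO my_db.main.postgres_table
SELECT *
FROM pg_db.public.some_table
WHERE id > (SELECT coalesce(max(id), 0) FROM my_db.main.postgres_table);
```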
## Handling More Complex Workflows
Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some offering native Python support. An overview of the MotherDuck Ecosystem is shown below.

---
Source: https://motherduck.com/docs/key-tasks/data-warehousing/Replication/spreadsheets
---
sidebar_position: 20
title: Excel and Google Sheets
---
# Using Excel and Google Sheets Data in MotherDuck
Key bits of data and side schedules often exist in spreadsheets like Excel and Google Sheets. It is nice to be able to easily add that data to your data warehouse and query it. This guide aims to show you how to perform this workflow using the DuckDB CLI for both [Excel](#microsoft-excel) and [Google Sheets](#google-sheets).
:::tip
In order to use these extensions, you will first need to install the DuckDB CLI. [Instructions can be found here](/getting-started/interfaces/connect-query-from-duckdb-cli).
:::
## Microsoft Excel
:::note
The purpose of this guide is to show you how to _load_ data from Excel into MotherDuck. If you'd like to _retrieve_ MotherDuck data in Excel, you can [follow this guide](/integrations/bi-tools/excel/).
:::
To read from an Excel spreadsheet, open the DuckDB CLI by typing `duckdb 'md:'` in your terminal.
This will ask you for access to your MotherDuck account if you haven't already provided it.
You can now read Excel files directly with a simple `SELECT * FROM 'movies.xlsx'` which will automatically load the
DuckDB Excel extension. If you want to get more control you can use
[the `read_xlsx` function](https://duckdb.org/docs/stable/core_extensions/excel) directly.
```sql
SELECT * FROM read_xlsx('movies.xlsx', sheet = 'Action Movies');
```
The previous query simply returns the data set to the terminal, but the query can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS).
```sql
CREATE OR REPLACE TABLE my_db.main.my_movies AS -- use fully qualified table name
SELECT *
FROM "C:\users\documents\movies.xlsx";
```
Of course, sometimes there is data in multiple tabs. In that case, you can use the `sheet` parameter to pass the tab names, and depending on the context, even union multiple tabs into a single table.
```sql
CREATE OR REPLACE TABLE my_db.main.my_movies AS -- use fully qualified table name
SELECT *
FROM read_xlsx('C:\users\documents\movies.xlsx', sheet = 'Action Movies')
UNION ALL
SELECT *
FROM read_xlsx('C:\users\documents\movies.xlsx', sheet = 'Romance Movies');
```
## Google Sheets
::::info
While the Excel extension is a core DuckDB extension, the Google Sheets extension is a community extension maintained by Evidence.
::::
The first step to handle Google Sheets is to install the [duckdb-gsheets](https://duckdb-gsheets.com/) extension. That is done with these commands after starting the DuckDB CLI with `duckdb 'md:'`:
```sql
INSTALL gsheets FROM community;
LOAD gsheets;
```
Since Google Sheets is a hosted application, we need to use [DuckDB Secrets](https://duckdb.org/docs/configuration/secrets_manager.html)
to handle authentication. This is as simple as:
```sql
CREATE SECRET (TYPE gsheet);
```
:::note
Using this workflow will require interactivity with a browser, so if you need to run it from a job (e.g., Airflow or similar), consider setting up a [Google API access token](https://duckdb-gsheets.com/#getting-a-google-api-access-token).
:::
In order to read from a Google Sheet, we need at minimum the sheet id, which is found in the URL, for example `https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit`. The string between `d/` and `/edit` represents the spreadsheet id. It can therefore be queried with:
```sql
SELECT *
FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit');
```
The previous query simply returns the data set to the terminal, but the query can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS).
```sql
CREATE OR REPLACE TABLE my_db.main.my_table AS -- use fully qualified table name
SELECT *
FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit');
```
For convenience, you can also pass just the spreadsheet id.
```sql
SELECT *
FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8');
```
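If you build these queries from Python instead of the CLI, a small helper can pull the id out of a full URL (`sheet_id_from_url` is an illustrative name; `read_gsheet` also accepts the full URL directly):

```python
import re

def sheet_id_from_url(url: str) -> str:
    """Extract the spreadsheet id between 'd/' and the next slash in a
    Google Sheets URL; return the input unchanged if it is already a bare id."""
    match = re.search(r"/d/([A-Za-z0-9_-]+)", url)
    return match.group(1) if match else url
```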
To query data from multiple tabs, pass the tab name via the `sheet` parameter to select the preferred tab.
```sql
SELECT * FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8', sheet='Sheet2');
```
For more detailed documentation, including writing to Google Sheets, review the [duckdb-gsheets documentation](https://duckdb-gsheets.com/#getting-a-google-api-access-token).
## Handling More Complex Workflows
Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some offering native Python support. An overview of the MotherDuck Ecosystem is shown below.

---
Source: https://motherduck.com/docs/key-tasks/data-warehousing/Replication/sql-server
---
sidebar_position: 2
title: SQL Server
---
# Replicating SQL Server tables to MotherDuck
This page shows basic patterns for using Python to connect to SQL Server, read data into a dataframe, connect to MotherDuck, and then write the data from the dataframe into MotherDuck. For more complex replication scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion).
To skip the documentation and look at the entire script, expand the element below:
Python script
```py
import pyodbc
import pandas as pd
import duckdb

# Define your connection parameters
server = 'ip_address'
database = 'master'  # or use your database name
username = 'your_username'
password = 'your_password'  # consider using a secret manager or .env
port = 1433  # default SQL Server port

# Define the connection string for ODBC Driver 17
connection_string = (
    f"DRIVER={{ODBC Driver 17 for SQL Server}};"
    f"SERVER={server},{port};"
    f"DATABASE={database};"
    f"UID={username};"
    f"PWD={password};"
)

# Connect to SQL Server to verify the credentials
try:
    connection = pyodbc.connect(connection_string)
    print("Connection successful.")
except pyodbc.Error as e:
    print(f"Error: {e}")
else:
    connection.close()

# Reconnect and read the table into a DataFrame
try:
    connection = pyodbc.connect(connection_string)
    query = "SELECT * FROM AdventureWorks2022.Production.BillOfMaterials"

    # Execute the query using pyodbc
    cursor = connection.cursor()
    cursor.execute(query)

    # Fetch the column names and data
    columns = [column[0] for column in cursor.description]
    data = cursor.fetchall()

    # Convert the data into a DataFrame
    df = pd.DataFrame.from_records(data, columns=columns)
finally:
    connection.close()

# Attach using the MOTHERDUCK_TOKEN
motherduck_token = 'your_token'
duckdb.sql(f"ATTACH 'md:my_db?MOTHERDUCK_TOKEN={motherduck_token}'")

# Create or replace table in the attached database
duckdb.sql(
    """
    CREATE OR REPLACE TABLE my_db.main.BillOfMaterials AS
    SELECT * FROM df
    """
)
```
## SQL Server Authentication
SQL Server supports [multiple methods of authentication](https://learn.microsoft.com/en-us/sql/relational-databases/security/choose-an-authentication-mode?view=sql-server-ver16) - for the purpose of this example, we will use username/password authentication and [pyodbc](https://github.com/mkleehammer/pyodbc/), along with [ODBC Driver 17 for SQL Server](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16). It should be noted that 'ODBC Driver 18 for SQL Server' is also available and includes support for some newer SQL Server features, but for the sake of compatibility, this example will use 17.
Consider the following authentication example:
```py
import pyodbc

# Define your connection parameters
server = 'ip_address'
database = 'master'  # or use your database name
username = 'your_username'
password = 'your_password'  # consider using a secret manager or .env
port = 1433  # default SQL Server port

# Define the connection string for ODBC Driver 17
connection_string = (
    f"DRIVER={{ODBC Driver 17 for SQL Server}};"
    f"SERVER={server},{port};"
    f"DATABASE={database};"
    f"UID={username};"
    f"PWD={password};"
)

# Connect to SQL Server
try:
    connection = pyodbc.connect(connection_string)
    print("Connection successful.")
except pyodbc.Error as e:
    print(f"Error: {e}")
else:
    connection.close()
```
This sets your credentials, attempts to connect to your server with `pyodbc.connect`, and prints an error if the connection fails.
## Reading a SQL Server table into a dataframe
Once you have authenticated, you can define arbitrary queries and execute them over the `connection` object with a cursor. For the purpose of this example, we are using SQL Server 2022 along with the AdventureWorks OLTP database.
:::note
While `pandas` is a great library, it is not particularly well-suited for very large tables. To learn more about using buffers and alternative libraries, check out [this link](/key-tasks/loading-data-into-motherduck/loading-data-md-python/).
:::
```py
import pandas as pd

connection = None
try:
    connection = pyodbc.connect(connection_string)
    query = "SELECT * FROM AdventureWorks2022.Production.BillOfMaterials"

    # Execute the query using pyodbc
    cursor = connection.cursor()
    cursor.execute(query)

    # Fetch the column names and data
    columns = [column[0] for column in cursor.description]
    data = cursor.fetchall()

    # Convert the data into a DataFrame
    df = pd.DataFrame.from_records(data, columns=columns)
finally:
    # Only close if the connection was actually established
    if connection is not None:
        connection.close()
```
## Inserting the table into MotherDuck
Now that the data has been loaded into a dataframe object, we can connect to MotherDuck and insert the table.
:::note
You will need to [generate a token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token) in your MotherDuck account. For production use cases, make sure to use a secret manager and never commit your token to your codebase.
:::
```py
import duckdb

motherduck_token = 'your_token'

# Attach using the MOTHERDUCK_TOKEN
duckdb.sql(f"ATTACH 'md:my_db?MOTHERDUCK_TOKEN={motherduck_token}'")

# Create or replace table in the attached database
duckdb.sql(
    """
    CREATE OR REPLACE TABLE my_db.main.BillOfMaterials AS
    SELECT * FROM df
    """
)
```
This will create the table, or replace it if the table already exists.
## Handling More Complex Workflows
Production use cases tend to be much more complex, involving things like incremental builds and state management. For those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some offering native Python support. An overview of the MotherDuck Ecosystem is shown below.

---
Source: https://motherduck.com/docs/key-tasks/data-warehousing/data-warehousing
---
title: Data Warehousing How-to
description: Data Warehousing How-to guides
---
import DocCardList from '@theme/DocCardList';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Versions from '@site/src/components/Versions';
## Introduction to MotherDuck for Data Warehousing
MotherDuck is a cloud-native data warehouse built on top of [DuckDB](https://duckdb.org/docs/sql/introduction), a fast in-process analytical database. While DuckDB provides the core analytical engine capabilities, MotherDuck adds cloud storage, sharing, and collaboration features that make it a complete data warehouse solution. Key advantages include its serverless architecture that eliminates infrastructure management, an intuitive interface that simplifies data analysis, and hybrid execution that intelligently processes queries across local and cloud resources.
MotherDuck is an ideal choice for organizations seeking a modern data warehouse solution. It excels at ad-hoc analytics by providing instant compute resources for each user, serves well as a departmental data mart with its simplified sharing model, and enables powerful embedded analytics through its WASM capabilities. Different personas benefit uniquely - data analysts get an intuitive SQL interface with AI assistance, engineers can leverage familiar APIs and tools like dbt, and data scientists can seamlessly combine local and cloud data processing.

The modern data stack with MotherDuck integrates seamlessly with popular tools across the ecosystem. As shown in the ecosystem diagram, this includes ingestion tools like [Fivetran](https://fivetran.com/docs/destinations/motherduck#motherduck) and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) for loading data, transformation tools like [dbt](/docs/integrations/transformation/dbt) for modeling, BI tools like [Tableau](/integrations/bi-tools/tableau/) and [PowerBI](/integrations/bi-tools/powerbi/) for visualization, and orchestration tools like [Airflow](https://airflow.apache.org/docs/) and [Dagster](https://docs.dagster.io/examples/bluesky) for pipeline management. This comprehensive integration enables teams to build complete data warehousing solutions while leveraging their existing tooling investments.
## MotherDuck Basics: Concepts to Understand Before You Start

MotherDuck's core architecture is built on a serverless foundation that eliminates infrastructure management overhead. The platform handles data storage with enterprise-grade durability and security, while optimizing performance through intelligent data organization. Each user gets their own isolated compute resource called a "Duckling" that sits on top of the storage layer, and the separation of storage and compute enables independent scaling of these resources based on workload demands.
The [dual execution model](/concepts/architecture-and-capabilities/#dual-execution) is a unique capability that allows MotherDuck to seamlessly query both local and cloud data. The query planner intelligently determines the optimal execution path, deciding whether to process data locally, in the cloud, or using a hybrid approach. This enables efficient querying across data sources while minimizing data movement and optimizing for performance.
MotherDuck follows a familiar hierarchical structure with databases containing schemas and tables. Databases serve as the primary unit of organization and access control, while schemas help logically group related tables together. This structure provides a clean way to organize data while maintaining compatibility with common [SQL patterns](https://duckdb.org/docs/sql/introduction) and tools.
Authentication in MotherDuck is handled through secure [token-based access](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token), with comprehensive user and organization management capabilities. The platform uses a simplified access model where users either have full access to a database or none at all. The [SHARES](/key-tasks/sharing-data/managing-shares/) feature enables secure data sharing within organizations and with external parties through zero-copy clones that maintain data consistency and security.
The [MotherDuck user interface](/getting-started/interfaces/motherduck-quick-tour/) provides a modern notebook-style environment for data interaction. The SQL IDE includes powerful features like intelligent autocomplete, AI-powered query suggestions and fixes, and an interactive Column Explorer that helps users understand and analyze their data structure. These features combine to create an intuitive and productive environment for data analysis.
While MotherDuck is designed for analytical workloads, it's important to note that it's not optimized for high-frequency small transactions like traditional OLTP databases. The platform works best with batch operations and [analytical queries](https://duckdb.org/docs/sql/introduction), and users should consider using queues for streaming workloads to achieve optimal performance. Additionally, the database-level security model means access cannot be controlled at the schema or table level.
## Data Ingestion: Getting Your Data In
MotherDuck provides multiple strategies for ingesting data into your data warehouse. The platform leverages DuckDB's powerful data loading capabilities while adding cloud-native features for seamless data ingestion at scale. You can load data through direct file imports, cloud storage connections, database migrations, or specialized ETL tools like [Fivetran](https://fivetran.com/docs/destinations/motherduck#motherduck) and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) depending on your needs. The [MotherDuck Web UI](/getting-started/interfaces/motherduck-quick-tour/) provides an intuitive interface for data loading and exploration.
### Loading local data
Loading data from local files is straightforward with support for common formats like CSV, Parquet, and JSON. The [MotherDuck UI](/getting-started/interfaces/motherduck-quick-tour/) provides an intuitive interface for uploading files directly, while the [Python client](https://duckdb.org/docs/api/python/overview) enables programmatic loading using DuckDB's native functions. For example, you can use [read_csv()](https://duckdb.org/docs/data/csv), [read_parquet()](https://duckdb.org/docs/data/parquet), or [read_json()](https://duckdb.org/docs/data/json) to efficiently load data files while taking advantage of DuckDB's parallel processing capabilities.
### Interacting with cloud storage (S3, GCS, etc)
Cloud storage integration allows you to directly query and load data from major providers including [AWS S3](https://duckdb.org/docs/guides/import/s3_import), [Google Cloud Storage](https://duckdb.org/docs/guides/import/gcs_import), [Azure Blob Storage](https://duckdb.org/docs/stable/extensions/azure), and [Cloudflare R2](https://duckdb.org/docs/guides/import/s3_import). Using SQL commands like `SELECT * FROM read_parquet('s3://bucket/file.parquet')`, you can seamlessly access cloud data. MotherDuck handles credential management securely through [environment variables](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) or configuration settings.
### Database-to-database data loading
For database migrations, MotherDuck supports importing data from other databases like [PostgreSQL](https://duckdb.org/docs/guides/import/query_postgres.html) and [MySQL](https://duckdb.org/docs/guides/import/query_mysql). You can directly connect to these sources using database connectors and execute queries to extract and load data. Existing [DuckDB databases](https://duckdb.org/docs/stable/data/multiple_files/overview) can be imported efficiently since MotherDuck is built on DuckDB's core engine.
### Fetching data from APIs
[Data ingestion](/integrations/ingestion/) tools like Fivetran, Airbyte, dltHub and Estuary integrate with MotherDuck to provide automated, reliable data pipelines. These tools handle complex ETL workflows, data validation, and transformation while offering features like scheduling, monitoring and error handling that simplify ongoing data operations.
For real-time data needs, MotherDuck works with streaming partners like [Estuary](https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/) to enable continuous data ingestion. While DuckDB is optimized for batch operations, these integrations allow you to build streaming pipelines that buffer and load data in micro-batches for near real-time analytics.
### Unstructured data integrations
When working with unstructured data like documents, emails or images, tools like [Unstructured.io](https://motherduck.com/blog/effortless-etl-unstructured-data-unstructuredio-motherduck/) can pre-process and structure the data before loading into MotherDuck. This enables you to analyze unstructured data alongside your structured data warehouse tables.
### Loading Performance Notes
For optimal performance, follow DuckDB's recommended practices around batch sizes and data types. Load data in reasonably sized batches (at least 122k rows) to balance memory usage and throughput. Use appropriate data types like TIMESTAMP for datetime values and avoid unnecessary type conversions. Sort data by columns that are frequently queried together, such as TIMESTAMPs. Monitor [recent queries](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) during large loads and adjust batch sizes accordingly.
## Data Transformation: Shaping Your Data for Analysis
Data transformation is a critical step in the data warehousing process that converts raw data into analysis-ready formats. MotherDuck provides powerful SQL capabilities inherited from DuckDB for transforming data directly within the warehouse. You can leverage DuckDB's rich library of SQL functions to clean, reshape, and model your data through operations like filtering, joining, aggregating and pivoting.
### Transformation Tools
- **[dbt (data build tool)](/integrations/transformation/dbt/)**
* Native MotherDuck adapter for seamless integration with dbt Core
* Enables version controlled, modular SQL transformations
* Supports testing, documentation and lineage tracking
* Recommended for complex transformation workflows
* See our [blog post](https://motherduck.com/blog/duckdb-dbt-e2e-data-engineering-project-part-2/) for detailed examples
- **[SQLMesh](https://sqlmesh.readthedocs.io/en/stable/integrations/engines/motherduck/)**
* Compatible with MotherDuck through DuckDB support
* Provides data pipeline and transformation management
* Enables incremental processing and scheduling
- **[Paradime](https://docs.paradime.io/app-help/documentation/settings/connections/scheduler-environment/duckdb)**
* Modern data transformation platform built for DuckDB/MotherDuck
* Offers collaborative development environment
* Includes version control and deployment tools
## Orchestration: Automating Your Data Pipelines
Orchestration is essential for keeping data up to date with MotherDuck. Scheduling data loads and transformations ensures your data warehouse stays current by running ingestion jobs at appropriate intervals to capture new data from your sources. Managing dependencies between tasks allows you to create reliable pipelines where transformations only run after their prerequisite data loads complete successfully. Monitoring and alerting capabilities help you track pipeline health and quickly address any issues that arise.
For orchestrating MotherDuck workflows, you have several options:
Popular workflow orchestration platforms like [Airflow, Dagster, Kestra, Prefect and Bacalhau](/integrations/orchestration/) provide robust scheduling, dependency management and monitoring capabilities.
For simpler use cases, basic scheduling tools like cron jobs or [GitHub Actions](/key-tasks/data-warehousing/Orchestration/github-action-cron/) can effectively orchestrate straightforward data pipelines.
Many ingestion & transformation tools also come with built-in orchestration features, allowing you to schedule and monitor data loads without additional tooling.
When orchestrating MotherDuck pipelines, follow these best practices:
- Design idempotent jobs that can safely re-run without duplicating or corrupting data.
- Implement proper error handling and retries to gracefully handle temporary failures.
- Set up logging and monitoring to maintain visibility into pipeline health and performance.
## Connecting BI Tools and Data Applications
MotherDuck provides robust support for business intelligence and reporting through its cloud data warehouse capabilities. The platform enables organizations to build scalable analytics solutions by connecting their data warehouse to popular visualization and reporting tools. With isolated compute tenancy per user, analysts can run complex queries without impacting other users' performance.
For connecting popular BI tools, MotherDuck offers several integration options. Tableau users can connect via the [cloud and server connectors](/integrations/bi-tools/tableau/), with support for both token-based and environment variable authentication methods. The platform works with both live and extracted connections, and Tableau Bridge enables cloud connectivity. [Microsoft Power BI](/integrations/bi-tools/powerbi/) integration is achieved through the DuckDB ODBC driver and Power Query connector, supporting both import and DirectQuery modes. Other supported BI tools include Omni, Metabase, Preset/Superset, and Rill, typically connecting through standard JDBC/ODBC interfaces.
MotherDuck seamlessly integrates with data science and AI tools through its native APIs and connectors. Python users can leverage the DuckDB SDK and Pandas integration for data analysis workflows. The platform supports R for statistical computing, while AI applications can be built using LangChain or LlamaIndex integrations. Notebook tools like Hex and Jupyter provide both hosted and on-prem environments for data exploration.
For building [custom data applications](/getting-started/customer-facing-analytics/), MotherDuck's unique architecture enables novel approaches through its WASM-powered 1.5-tier architecture. The platform runs DuckDB in the browser via WebAssembly, allowing for highly interactive visualizations with near-zero latency. Developers can use MotherDuck's APIs and SDKs in languages like Python and Go to create custom data applications that leverage both local and cloud-based data processing.
## Advanced Topics & Best Practices
### Performance Tuning and Optimization in MotherDuck
MotherDuck inherits DuckDB's powerful query optimization capabilities. You can analyze query performance using the `EXPLAIN` command to view execution plans and identify bottlenecks. While DuckDB doesn't use traditional indexes, it automatically creates statistics and metadata to optimize query execution with row groups. As a result, [sorting the data on insert](https://duckdb.org/2025/05/14/sorting-for-fast-selective-queries.html) is a very effective way to improve query performance.
### Data Sharing and Collaboration
MotherDuck implements a straightforward data sharing model through SHARES, which provide read-only access to specific databases. To create a share, use the [`CREATE SHARE`](/sql-reference/motherduck-sql-reference/create-share/) command and specify the database you want to share. Recipients can then access the shared data through their own MotherDuck account while maintaining data isolation.
### Monitoring and Logging MotherDuck Usage
DuckDB's meta-queries like `EXPLAIN ANALYZE` provide detailed query execution statistics. You can also use the platform's built-in profiling capabilities to monitor query performance and resource utilization, helping identify optimization opportunities and troubleshoot performance issues. [Recent queries](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) and [historical queries](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) can be observed as well, in order to further optimize the warehouse load.
### Cost Management
While MotherDuck's pricing model is still evolving, you can optimize costs by efficiently managing compute resources. Consider implementing data lifecycle policies to archive or delete old data. Monitor query patterns to identify opportunities for optimization and avoid unnecessary data processing.
### Security Best Practices for Your MotherDuck Warehouse
- Implement robust security practices by following MotherDuck's database-level security model.
- Use token-based authentication for all connections and avoid sharing credentials.
- When integrating with tools, leverage environment variables for secure credential management.
- Regularly audit database access and maintain an inventory of active shares.
### Leveraging AI Features within MotherDuck
MotherDuck enhances DuckDB with AI-powered features to improve productivity. The platform includes a [SQL AI fixer](/getting-started/interfaces/motherduck-quick-tour/#writing-sql-with-confidence-using-fixit-and-edit) that helps identify and correct query syntax issues. The `prompt()` function enables natural language interactions with your data warehouse, allowing users to generate SQL queries from plain English descriptions. These are just a few of the AI capabilities that help make data analysis more accessible while maintaining the power and flexibility of SQL.
## Further Guides
## Appendix
### Troubleshooting Common Issues
When working with MotherDuck, you may encounter challenges around data loading, query performance, or connectivity. For data loading issues, refer to our [best practices for programmatic loading](/key-tasks/data-warehousing/) which covers optimizing batch sizes and file formats. For query performance, review our [dual execution capabilities](/concepts/architecture-and-capabilities/#dual-execution) to understand how MotherDuck optimizes query execution across local and cloud resources. For connectivity problems, check our [authentication guides](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck) and ensure you're following the recommended connection patterns.
### Useful SQL Snippets for MotherDuck
MotherDuck supports a wide range of SQL functionality inherited from DuckDB. For data ingestion, refer to our [PostgreSQL replication examples](/key-tasks/data-warehousing/Replication/postgres) which demonstrate common patterns for loading data. For building customer facing analytics, check our [guide](/getting-started/customer-facing-analytics) which includes examples of data processing and visualization queries. The [DuckDB SQL documentation](https://duckdb.org/docs/sql/introduction.html) provides comprehensive reference for the SQL dialect.
### Links to Further Resources (MotherDuck Docs, Community)
To deepen your understanding of data warehousing with MotherDuck, explore our [data warehousing concepts guide](/key-tasks/data-warehousing/) which covers architectural principles and best practices. For hands-on examples, the free [DuckDB in Action eBook](https://motherduck.com/duckdb-book-brief/) provides real-world scenarios and solutions. If you need help, don't hesitate to [contact our support team](https://motherduck.com/customer-support/) or explore our [ecosystem integrations](/integrations/) for additional tools and capabilities.
Please do not hesitate to **[contact us](https://motherduck.com/customer-support/)** if you need help along your journey.
---
Source: https://motherduck.com/docs/key-tasks/database-operations/basics-operations
---
sidebar_position: 1
title: Basics database operations
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
While embedded DuckDB uses files on your local filesystem to represent databases, MotherDuck implements SQL syntax for creating, listing and dropping databases.
## Create database
```sql
-- [OR REPLACE] and [IF NOT EXISTS] are optional modifiers.
CREATE [OR REPLACE | IF NOT EXISTS] DATABASE <database name>;
USE <database name>;
```
Creating copies of databases in MotherDuck in this manner is a metadata-only operation that copies no data. Learn more in the [`CREATE DATABASE`](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-database/) overview documentation.
## Listing databases
```sql
-- returns all connected local and remote databases
SHOW DATABASES;
-- returns current database
SELECT current_database();
```
Learn more in the [`SHOW ALL DATABASES`](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/show-databases/) overview documentation.
## Delete database
```sql
USE <another database>;
DROP DATABASE <database name>;
```
Example usage:
```sql
> SHOW DATABASES;
test01
-- Let's put two different t1 tables into two different databases
> CREATE TABLE test01.t1 AS (SELECT range AS r FROM range(12));
> SELECT * FROM t1;
-- now for the other database
> CREATE DATABASE test02;
> CREATE TABLE test02.t1 AS (SELECT 'test02' AS dbname);
-- show the databases we've created
> SHOW DATABASES;
test01
test02
```
Learn more in the [`DROP DATABASE`](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/show-databases/) overview documentation.
---
Source: https://motherduck.com/docs/key-tasks/database-operations/copying-databases
---
sidebar_position: 10
title: Copying DuckDB Databases
---
# Copying MotherDuck and DuckDB Databases
The `COPY FROM DATABASE` statement creates an exact duplicate of an existing database, including both schema and data. This functionality enables the following operations:
[Interact with MotherDuck Databases](#copy-a-motherduck-database-to-a-motherduck-database)
- Copy between MotherDuck databases
[Interact with Local Databases](#interacting-with-local-databases)
- Import local database to MotherDuck
- Export MotherDuck database to local filesystem
- Copy between local databases
The `COPY FROM DATABASE` command is implemented as a multi-statement macro, which is not supported in WebAssembly. As a result, simultaneous schema and data copying is not available in the MotherDuck Web UI. However, the Web UI supports copying schema only (`SCHEMA` option) or data only (`DATA` option). All functionality is available in other drivers, including the DuckDB CLI.
:::caution No zero-copy clone
`COPY FROM DATABASE` creates a *physical* copy of both the schema and the data. It **does not** use MotherDuck's zero-copy cloning, so the operation may take longer to run and will consume additional storage proportional to the size of the source database.
:::
## Syntax
The syntax for `COPY FROM DATABASE` is:
```sql
COPY FROM DATABASE <source database> TO <target database> [ (SCHEMA) | (DATA) ]
```
### Parameters
- `<source database>`: The name or path of the source database to copy from
- `<target database>`: The name or path of the target database to create
- `(SCHEMA)`: Optional parameter to copy only the database schema without data
- `(DATA)`: Optional parameter to copy only the database data without schema
## Example Usage
### Copy a MotherDuck database to a MotherDuck database
This is the same as [creating a new database from an existing one](/sql-reference/motherduck-sql-reference/create-database.md).
```sql
COPY FROM DATABASE my_db TO my_db_copy;
```
### Interacting with Local Databases
These operations require access to the local filesystem, e.g. from inside the DuckDB CLI.
#### Copy a local database to a MotherDuck database
```sql
ATTACH 'local_database.db';
ATTACH 'md:';
CREATE DATABASE md_database;
COPY FROM DATABASE local_database TO md_database;
```
#### Copy a MotherDuck database to a local database
Copying a MotherDuck database to a local database requires some extra steps.
```sql
ATTACH 'md:';
ATTACH 'local_database.db' as local_db;
COPY FROM DATABASE my_db TO local_db;
```
#### Copy a local database to a local database
To copy a local database to a local database, please see the [DuckDB documentation](https://duckdb.org/docs/stable/sql/statements/copy.html#copy-from-database--to).
### Copying the Database Schema
```sql
COPY FROM DATABASE my_db TO my_db_copy (SCHEMA);
```
This will copy the schema of the database, but not the data.
### Copying the Database Data
```sql
COPY FROM DATABASE my_db TO my_db_copy (DATA);
```
This will copy the data of the database, but not the schema.
---
Source: https://motherduck.com/docs/key-tasks/database-operations/database-operations
---
title: Database operations
description: Learn how to work with databases and MotherDuck
---
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/key-tasks/database-operations/detach-and-reattach-motherduck-database
---
sidebar_position: 12
title: Detach and re-attach a MotherDuck database
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
After [creating a remote MotherDuck database](/sql-reference/motherduck-sql-reference/create-database.md),
the [`DETACH` command](/sql-reference/motherduck-sql-reference/detach.md) may be used to detach it.
This will prevent access and modifications to the database until it is re-attached using the [`ATTACH` command](/sql-reference/motherduck-sql-reference/attach.md).
This pattern can be used to isolate queries and changes to a specific set of databases.
Note that this is a convenience feature and not a security feature, as a MotherDuck database may be reattached at any time.
Database shares behave slightly differently than non-shared databases, so if you want to `ATTACH` and `DETACH` shares, please have a look at how to [manage shared MotherDuck databases](/key-tasks/sharing-data/sharing-data.mdx).
## Creating, detaching, and re-attaching a database
This guide will show how to `CREATE`, `DETACH`, and `ATTACH` a database using the CLI and the UI.
```sql
CREATE DATABASE my_new_md_database;
DETACH my_new_md_database;
ATTACH 'my_new_md_database';
-- OR
ATTACH 'md:my_new_md_database';
```
To create a database, add a new cell and enter the SQL command `CREATE DATABASE <database name>`.
Click the Run button.

Click on the menu of the database you would like to detach and select `Detach`.

The database will be moved to the "Detached Databases" section of the object explorer.

To re-attach, click on the menu of the database in the "Detached Databases" section and select `Attach`.

The database will be returned to the "My Databases" section.

## Show All Databases
To see all databases, both attached and detached, use the [`SHOW ALL DATABASES` command](/sql-reference/motherduck-sql-reference/show-databases.md).
```sql
SHOW ALL DATABASES;
```
Example output:
```bash
┌──────────────────────────────────────────┬─────────────┬──────────────────┬─────────────────────────────────────────────────────────────────────────────────────────┐
│ alias │ is_attached │ type │ fully_qualified_name │
│ varchar │ boolean │ varchar │ varchar │
├──────────────────────────────────────────┼─────────────┼──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │ true │ motherduck │ md:TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │
│ TEST_DB_62b53d968a4f4b6682ed117a7251b814 │ true │ motherduck │ md:TEST_DB_62b53d968a4f4b6682ed117a7251b814 │
│ base │ false │ motherduck │ md:base │
│ base2 │ true │ motherduck │ md:base2 │
│ db1 │ false │ motherduck │ md:db1 │
│ integration_test_001 │ false │ motherduck │ md:integration_test_001 │
│ my_db │ true │ motherduck │ md:my_db │
│ my_share_1 │ true │ motherduck share │ md:_share/integration_test_001/18d6dbdb-e130-4cdf-97c4-60782ed5972b │
│ sample_data │ false │ motherduck │ md:sample_data │
│ source_db │ true │ motherduck │ md:source_db │
│ test_db_115 │ false │ motherduck │ md:test_db_115 │
│ test_db_28d │ false │ motherduck │ md:test_db_28d │
│ test_db_cc9 │ false │ motherduck │ md:test_db_cc9 │
│ test_share │ true │ motherduck share │ md:_share/source_db/b990b424-2f9a-477a-b216-680a22c3f43f │
│ test_share_002 │ true │ motherduck share │ md:_share/integration_test_001/06cc5500-e49a-4f62-9203-105e89a4b8ae │
├──────────────────────────────────────────┴─────────────┴──────────────────┴─────────────────────────────────────────────────────────────────────────────────────────┤
│ 15 rows (15 shown) 4 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
---
Source: https://motherduck.com/docs/key-tasks/database-operations/specifying-different-databases
---
sidebar_position: 2.2
title: Specifying different databases
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
MotherDuck enables you to specify an active/current database and an active/current schema within that database.
Queryable objects (e.g. tables) that belong to the current database are resolved with just `<table_name>`.
MotherDuck will automatically search all schemas within the current database.
If there are overlapping names within different schemas, objects can be qualified with `<schema_name>.<table_name>`.
Queryable objects in your account outside of the active/current database are resolved with `<database_name>.<table_name>`.
However, if a schema in the current database shares the same name as another database, the fully qualified name must be used: `<database_name>.<schema_name>.<table_name>` (an error will be thrown to indicate the ambiguity).
This applies both to databases that live in MotherDuck and to those in your local DuckDB environment.
For example:
```sql
-- check your current database
SELECT current_database();
dbname
-- check your current schema
SELECT current_schema();
main
-- query a table mytable that exists in the current database dbname
SELECT count(*) FROM mytable;
34
-- query a table mytable2 that exists in the database dbname2
SELECT count(*) FROM dbname2.mytable2;
41
-- query a table mytable3 that exists in schema2
-- note that the syntax is identical to the database name syntax above and
-- MotherDuck will detect whether a database or schema is involved
SELECT count(*) FROM schema2.mytable3;
42
-- query a table in another database when a schema exists with the same name in the current database
-- (overlappingname is both a database name and a schema name)
SELECT count(*) FROM overlappingname.myschemaname.mytable4;
43
```
You can also reference local databases in the same MotherDuck queries. This type of query is known as a [hybrid query](/key-tasks/running-hybrid-queries.md).
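As a minimal sketch (all database and table names here are hypothetical), a hybrid query simply references local and MotherDuck data in a single statement:

```sql
-- local_db is attached locally; my_db lives in MotherDuck (illustrative names)
SELECT c.id, l.note
FROM my_db.main.cloud_table AS c
JOIN local_db.main.local_table AS l ON c.id = l.id;
```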
To change the active database, schema, or database/schema combination, execute a `USE` command.
See the documentation on [switching the current database](./switching-the-current-database.md) for details.
---
Source: https://motherduck.com/docs/key-tasks/database-operations/switching-the-current-database
---
sidebar_position: 3
title: Switching the current database
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Below are examples of how to determine the current/active database and schema and switch between different databases and schemas:
```sql
-- check your current database
SELECT current_database();
dbname
-- list all tables in the current database
SHOW TABLES;
table1
table2
-- list all databases
SHOW DATABASES;
dbname
dbname2
-- switch to database named 'dbname2'
USE dbname2;
-- verify that you've successfully switched databases
SELECT current_database();
dbname2
-- check your current schema
SELECT current_schema();
main
-- list all schemas across all databases
SELECT * FROM duckdb_schemas();
```
| oid | database_name | database_oid | schema_name | internal | sql |
|------|---------------|--------------|--------------------|----------|------|
| 986 | my_db | 989 | information_schema | true | NULL |
| 974 | my_db | 989 | main | false | NULL |
| 972 | my_db | 989 | my_schema | false | NULL |
| 987 | my_db | 989 | pg_catalog | true | NULL |
| 1508 | system | 0 | information_schema | true | NULL |
| 0 | system | 0 | main | true | NULL |
| 1509 | system | 0 | pg_catalog | true | NULL |
| 1510 | temp | 1453 | information_schema | true | NULL |
| 1454 | temp | 1453 | main | true | NULL |
| 1511 | temp | 1453 | pg_catalog | true | NULL |
```sql
-- switch to schema my_schema within the same database
USE my_schema;
-- verify that you've successfully switched schemas
SELECT current_schema();
my_schema
-- switch to database my_db and schema main
USE my_db.main;
-- verify that both the database and schema have been changed
SELECT current_database(), current_schema();
```
| current_database() | current_schema() |
|--------------------|------------------|
| my_db | main |
---
Source: https://motherduck.com/docs/key-tasks/how-to-guides
---
title: How-to guides
sidebar_class_name: how-to-guide-icon
description: How-to guides
---
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/considerations-for-loading-data
---
sidebar_position: 0.5
title: Loading Data Best Practices
description: Understanding trade-offs and performance implications when loading data into MotherDuck
---
# Loading Data Best Practices
When loading data into MotherDuck, understanding the trade-offs between different approaches helps you make informed decisions that optimize for your specific use case. This guide explains the key considerations that impact performance, cost, and reliability.
## Data Loading Methods and Their Trade-offs
### File Format Considerations
The choice of file format significantly impacts loading performance:
**Parquet (Recommended)**
- **Compression**: 5-10x better compression than CSV
- **Performance**: 5-10x more throughput due to compression
- **Schema**: Self-describing with embedded metadata
- **Use Case**: Production data loading, large datasets
**CSV**
- **Compression**: Minimal compression benefits
- **Performance**: Slower loading, especially for large files
- **Schema**: Requires manual type inference or specification
- **Use Case**: Simple data exploration, small datasets
**JSON**
- **Compression**: Moderate compression
- **Performance**: Slower than Parquet due to parsing overhead
- **Schema**: Flexible but requires careful type handling
- **Use Case**: Semi-structured data, API responses
## Performance Optimization Strategies
### Batch Size Optimization
MotherDuck can handle both **batch operations** and **real-time streaming**. For optimal insert performance, use the section below as a guide.
The size of your data batches directly impacts performance and resource usage:
**Optimal Batch Size: 1,000,000+ rows**
- DuckDB operates in row groups of 122,880 rows (the row group size)
- A 1.2M-row insert will parallelize across 10 threads automatically
- 100k-row and 1M-row inserts will perform roughly the same due to parallelization overhead
- The minimum effective batch size for optimal performance is >1M rows
:::tip
Load data in batches of at least 1M rows to leverage DuckDB's parallelization. Smaller batches (like 100k rows) don't provide meaningful performance benefits and may actually be slower due to overhead.
:::
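As an illustration (table and column names are hypothetical), generating and inserting one large batch in a single statement lets DuckDB split the work across row groups:

```sql
-- Illustrative only: a single ~1.2M-row insert parallelizes well,
-- whereas many small inserts pay repeated per-batch overhead.
CREATE TABLE IF NOT EXISTS events (id BIGINT, payload VARCHAR);
INSERT INTO events
SELECT range AS id, 'event_' || range AS payload
FROM range(1200000);
```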
### Memory Management
Effective memory management is crucial for large data loads:
**Data Type Optimization**
- Use explicit schemas to avoid type inference overhead
- Choose appropriate data types (e.g., TIMESTAMP for dates)
- Avoid unnecessary type conversions
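For example (the file path and column definitions are hypothetical), passing an explicit schema to `read_csv` skips the type-inference pass:

```sql
-- Declaring column types up front avoids CSV sniffing overhead
CREATE OR REPLACE TABLE orders AS
SELECT * FROM read_csv(
    'orders.csv',
    columns = {'order_id': 'BIGINT', 'ordered_at': 'TIMESTAMP', 'amount': 'DECIMAL(10,2)'}
);
```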
**Sorting Strategy**
- Sort data by frequently queried columns during loading
- To re-sort existing tables, use `CREATE OR REPLACE` with the preferred sorting method
- Improves query performance through better data locality
- Consider the trade-off between loading speed and query performance
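For instance (table and column names are illustrative), re-sorting an existing table with `CREATE OR REPLACE` looks like:

```sql
-- Rewrite the table ordered by a frequently filtered column
CREATE OR REPLACE TABLE events AS
SELECT * FROM events ORDER BY event_date;
```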
### Network and Location Considerations
**Data Location**
- MotherDuck is currently available on AWS in two regions, **US East (N. Virginia)** - `us-east-1` and **Europe (Frankfurt)** - `eu-central-1`
- For optimal performance, consider locating source data in the same region as your MotherDuck Organization
- Consider network latency when loading from remote sources
**Cloud Storage Integration**
- Direct integration with S3, GCS, Azure Blob Storage
- Use [cloud storage](/integrations/cloud-storage/) to leverage network speeds for better performance
- Reduces local storage requirements
## Duckling Sizing
**Duckling Selection**
For data sets under 100 GB in size, use Jumbo Ducklings to load the data. For larger data sizes, use [Mega or Giga](/about-motherduck/billing/duckling-sizes/).
## Summary
The key to successful data loading in MotherDuck is understanding the trade-offs between different approaches and optimizing for your specific use case. Focus on:
1. **Batch inserts** of at least 1,000,000 rows for the fastest performance.
2. Use **Parquet** for compression and speed, if you can control how files are written at the source.
3. Write data to **S3** for speedy reads.
4. Use **larger Duckling sizes (Jumbo or bigger)** for loading bigger datasets.
By following these guidelines and understanding the underlying principles, you can build efficient, reliable data loading pipelines that scale with your needs while managing costs effectively.
---
Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-cloud-or-https
---
sidebar_position: 2
title: From Cloud Storage or over HTTPS
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# From Cloud Storage or over HTTPS
## From Public Cloud Storage
MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2).
:::note
MotherDuck is currently available on AWS in two regions, **US East (N. Virginia)** - `us-east-1` and **Europe (Frankfurt)** - `eu-central-1`. For an optimal experience, we strongly encourage you to locate your data in the same region as your MotherDuck Organization.
:::
The following example features Amazon S3.
Connect to MotherDuck if you haven't already by doing the following:
```sql
-- assuming the db my_db exists
ATTACH 'md:my_db';
```
```sql
-- CTAS a table from a publicly available demo dataset stored in s3
CREATE OR REPLACE TABLE pypi_small AS
SELECT * FROM 's3://motherduck-demo/pypi.small.parquet';
-- JOIN the demo dataset against a larger table to find the most common duplicate urls
-- Note you can directly refer to the url as a table!
SELECT pypi_small.url, COUNT(*)
FROM pypi_small
JOIN 's3://motherduck-demo/pypi_downloads.parquet' AS s3_pypi
ON pypi_small.url = s3_pypi.url
GROUP BY pypi_small.url
ORDER BY COUNT(*) DESC
LIMIT 10;
```
## From a Secure Cloud Storage Provider
MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2). In order to access them securely, you must first [create a secret](/sql-reference/motherduck-sql-reference/create-secret/).
You can set cloud storage secrets directly from the UI under Settings -> Integrations -> Secrets.

When adding a secret, you have the option to select your cloud storage provider (S3, R2, GCS, Azure).

Depending on the cloud provider you'll need to provide at minimum an access key and secret for your service account.
When adding S3 credentials you can immediately test and verify your connection.
To create a secret in MotherDuck using the CLI or SQL notebooks, you'll need to explicitly add the `IN MOTHERDUCK` clause.
```sql
CREATE SECRET IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'access_key',
    SECRET 'secret_key',
    REGION 'us-east-1',
    SCOPE 'my-bucket-path'
);
-- Now you can query from a secure S3 bucket
CREATE OR REPLACE TABLE mytable AS SELECT * FROM 's3://...';
```
## Over HTTPS
MotherDuck supports loading data over HTTPS.
```sql
-- Reads the Central Park Squirrel Data
SELECT * FROM read_csv_auto('https://docs.google.com/spreadsheets/d/e/2PACX-1vQUZR6ikwZBRXWWQsFaUceEiYzJiVw4OQNGtwGBfcMfVatpCyfxxaWPdoKJIHlwNM-ow1oeW_2F-pO5/pub?gid=2035607922&single=true&output=csv');
```
## Related Content
- [Troubleshooting AWS S3 Secrets](/docs/troubleshooting/aws-s3-secrets/)
---
Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-local-machine
---
sidebar_position: 0.9
title: From Your Local Machine
description: Moving data from local to MotherDuck through the UI or programmatically.
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Single file
Using the CLI, you can connect to MotherDuck, create a database, and load a single local file (JSON, Parquet, CSV, etc.) to a MotherDuck table.
First, connect to MotherDuck using the `ATTACH` command.
```sql
ATTACH 'md:';
```
Create a cloud database (or point to any existing one) and load a local file into a table.
```sql
CREATE DATABASE test01;
USE test01;
CREATE OR REPLACE TABLE orders as SELECT * from 'orders.csv';
```
In the MotherDuck UI, you can add JSON, CSV, or Parquet files directly using the **Add Files** button in the top left of the UI.
See the [Getting Started Tutorial](../../../getting-started/e2e-tutorial/part-2#loading-your-data) for details.
## Multiple files or database
To upload multiple files at once, or data in other formats supported by DuckDB, you can use the DuckDB CLI or any other supported [DuckDB client](https://duckdb.org/docs/data/multiple_files/overview.html).
If all of your files contain data for a single table, you can use the [glob syntax to load all files into a single table](https://duckdb.org/docs/data/multiple_files/overview.html).
For example, to load all CSV files from a directory into a single table, you can use the following SQL command:
```sql
ATTACH 'md:';
CREATE DATABASE test01;
USE test01;
CREATE OR REPLACE TABLE orders as SELECT * from 'dir/*.csv';
```
If your files are in different formats or you want to load them into different tables, you can first load the files into different tables in a local DuckDB database and then copy the entire database into MotherDuck.
To copy the entire local DuckDB database into MotherDuck, you can use the following SQL commands:
```sql
ATTACH 'md:';
```
```sql
ATTACH 'local.ddb';
CREATE DATABASE cloud_db FROM 'local.ddb';
```
---
Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-postgres
---
sidebar_position: 11
title: From a PostgreSQL or MySQL Database
description: Learn to load a table from your PostgreSQL or MySQL database into MotherDuck.
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Using the PostgreSQL or MySQL DuckDB Extensions
DuckDB's [PostgreSQL extension](https://duckdb.org/docs/extensions/postgres.html) and [MySQL extension](https://duckdb.org/docs/extensions/mysql.html) make it extremely easy to connect to and access data stored in your OLTP databases. Once connected, you can just as easily export the data to MotherDuck to offload analytical queries while benefiting from data centralization, persistence, and data sharing capabilities. In this guide we demonstrate this workflow with the PostgreSQL extension. Consult the [DuckDB MySQL extension documentation](https://duckdb.org/docs/extensions/mysql) to adjust the steps for MySQL databases.
:::info
MotherDuck does not yet support the PostgreSQL and MySQL extensions, so you need to perform the following steps on your own computer or cloud computing resource. We are working on supporting the PostgreSQL extension on the server side so that this can happen within the MotherDuck app in the future with improved performance.
:::
### Prerequisites
- **PostgreSQL Database Credentials**: Ensure you have access details to the PostgreSQL database, including host address, port, and user credentials. You can put the user credentials in the [PostgreSQL Password File](https://www.postgresql.org/docs/current/libpq-pgpass.html), [store them in environment variables](https://duckdb.org/docs/extensions/postgres.html#configuring-via-environment-variables), or pass them inline in the script below.
- **Network Connectivity**: Your machine must be able to connect to the target PostgreSQL database.
- **MotherDuck Credentials**: MotherDuck credentials should be set up. If not, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md).
- **DuckDB**: Either the DuckDB command-line interface or Python + the DuckDB package should be installed and operational. See the [Getting Started tutorials](../../getting-started/getting-started.mdx) for instructions to install DuckDB.
### Steps
The following SQL script reads from a table in the PostgreSQL database and writes it to the table named `my_db.pg_data_schema.first_pg_table` in MotherDuck.
Fill in the placeholders `<dbname>`, `<host>`, `<user>`, `<password>`, `<postgres_schema>`, `<postgres_table>`, and `<limit>` with the appropriate values and save the script to a file, e.g., `ingest_data_from_postgres.sql`.
```sql
-- Connect to a MotherDuck database.
ATTACH 'md:';
USE 'my_db';
-- Optionally create a schema, by default MotherDuck uses the main schema;
CREATE SCHEMA IF NOT EXISTS pg_data_schema;
-- Ingest data from PostgreSQL to a MotherDuck table
CREATE OR REPLACE TABLE pg_data_schema.first_pg_table AS
SELECT * FROM
postgres_scan('dbname=<dbname> host=<host> user=<user> password=<password> connect_timeout=10', '<postgres_schema>', '<postgres_table>')
-- optionally limit the number of rows ingested
LIMIT <limit>;
-- Optional: Verify the number of rows in the MotherDuck table
SELECT count(1) FROM pg_data_schema.first_pg_table;
```
#### Run with DuckDB CLI
After filling out the placeholders, you can either execute the statements line by line in the DuckDB CLI, or save the commands in a file, e.g., `ingest_data_from_postgres.sql`, and run:
```sh
duckdb < ingest_data_from_postgres.sql
```
#### Run with Python
You can also execute it using Python with the DuckDB package.
```python
import duckdb
with open("ingest_data_from_postgres.sql", 'r') as f:
    s = f.read()
duckdb.sql(s)
```
After completing these steps, you should see the new table show up in the MotherDuck Web UI.
## Using MotherDuck Integration Partners
MotherDuck collaborates with various integration partners to facilitate data transfer in diverse ways—including change data capture (CDC)—from your PostgreSQL or MySQL database to MotherDuck.
For example, you can refer to our [Estuary guide](https://motherduck.com/blog/streaming-data-to-motherduck/) that demonstrates how to stream data from Neon, a PostgreSQL-based database, to MotherDuck.
To explore the full range of solutions tailored to your needs, visit our [MotherDuck ecosystem partners page](https://motherduck.com/ecosystem/).
---
Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck
---
title: Loading Data into MotherDuck
description: Learn how to load data into MotherDuck from various sources
---
You can leverage MotherDuck's managed storage to persist your data. MotherDuck storage provides a high level of manageability and abstraction, optimizing your data for secure, durable, performant, and efficient use. There are several ways to load data into MotherDuck storage.
## Before You Start: Understanding Trade-offs
Before choosing a loading method, it's important to understand the performance implications and trade-offs involved. Our [Considerations for Loading Data](./considerations-for-loading-data.mdx) guide explains:
- **Batch vs. streaming approaches** and when to use each
- **File format choices** and their impact on performance
- **Optimal batch sizes** for different scenarios
- **Cost implications** of different loading strategies
- **Common performance pitfalls** and how to avoid them
This understanding will help you make informed decisions that optimize for your specific use case.
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-md-python
---
sidebar_position: 1
title: Loading data to MotherDuck with Python
---
# Loading data to MotherDuck with Python
As you ingest data using Python, typically from APIs or other sources, you have several options for loading it into MotherDuck.
1. (fast) Use a Pandas/Polars/PyArrow dataframe as an in-memory buffer before bulk loading into MotherDuck.
2. (fast) Write to a temporary file and load it into MotherDuck using a `COPY` command.
3. (slow) Use the `executemany` method to perform several `INSERT` statements in a single transaction.
Option `1` is the easiest, as dataframe libraries are optimized for bulk inserts.
Option `2` involves writing to disk, but the `COPY` command is faster than individual `INSERT` statements.
Option `3` should be discouraged unless the data is very small (< 500 rows).
:::tip
No matter which option you pick, we recommend loading data in chunks (typically `100k` rows, to roughly match the row group size) to avoid memory issues and to keep each transaction small enough to finish in about a minute at most.
:::
:::info
In addition to the recommendations below, we suggest reading our guidelines around [connections](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md) and [threading](/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python.md), which will help you optimize your data loading process.
:::
## 1. Using Pandas/Polars/PyArrow to load data to MotherDuck
When using a dataframe library you can load data to MotherDuck in a single transaction.
```python
import duckdb
import pyarrow as pa
# Create a PyArrow table
data = {
'id': [1, 2, 3, 4, 5],
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
}
arrow_table = pa.table(data)
con = duckdb.connect('md:')
con.sql('CREATE TABLE my_table AS SELECT * FROM arrow_table')
```
### Buffering data
When you have a large dataset, it's recommended you chunk your data and load it in batches. This will help you to avoid memory issues and make sure your transaction is not too large.
Here's an example class that loads data in chunks using PyArrow and DuckDB.
```python
import duckdb
import os
import pyarrow as pa
import logging

# Setup basic configuration for logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

class ArrowTableLoadingBuffer:
    def __init__(
        self,
        duckdb_schema: str,
        pyarrow_schema: pa.Schema,
        database_name: str,
        table_name: str,
        destination="local",
        chunk_size: int = 100_000,  # Default chunk size
    ):
        self.duckdb_schema = duckdb_schema
        self.pyarrow_schema = pyarrow_schema
        self.database_name = database_name
        self.table_name = table_name
        self.total_inserted = 0
        self.conn = self.initialize_connection(destination, duckdb_schema)
        self.chunk_size = chunk_size

    def initialize_connection(self, destination, sql):
        if destination == "md":
            logging.info("Connecting to MotherDuck...")
            if not os.environ.get("motherduck_token"):
                raise ValueError(
                    "MotherDuck token is required. Set the environment variable 'MOTHERDUCK_TOKEN'."
                )
            conn = duckdb.connect("md:")
            logging.info(
                f"Creating database {self.database_name} if it doesn't exist"
            )
            conn.execute(f"CREATE DATABASE IF NOT EXISTS {self.database_name}")
            conn.execute(f"USE {self.database_name}")
        else:
            conn = duckdb.connect(database=f"{self.database_name}.db")
        conn.execute(sql)  # Execute schema setup on initialization
        return conn

    def insert(self, table: pa.Table):
        total_rows = table.num_rows
        for batch_start in range(0, total_rows, self.chunk_size):
            batch_end = min(batch_start + self.chunk_size, total_rows)
            chunk = table.slice(batch_start, batch_end - batch_start)
            self.insert_chunk(chunk)
            logging.info(f"Inserted chunk {batch_start} to {batch_end}")
        self.total_inserted += total_rows
        logging.info(f"Total inserted: {self.total_inserted} rows")

    def insert_chunk(self, chunk: pa.Table):
        self.conn.register("buffer_table", chunk)
        insert_query = f"INSERT INTO {self.table_name} SELECT * FROM buffer_table"
        self.conn.execute(insert_query)
        self.conn.unregister("buffer_table")
```
Using the above class, you can load your data in chunks.
```python
import pyarrow as pa

# Define the explicit PyArrow schema
pyarrow_schema = pa.schema([
    ('id', pa.int32()),
    ('name', pa.string())
])

# Sample data to create a PyArrow table based on the schema
data = {
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
}
arrow_table = pa.table(data, schema=pyarrow_schema)

# Define the DuckDB schema as a DDL statement
duckdb_schema = "CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name VARCHAR)"

# Initialize the loading buffer
loader = ArrowTableLoadingBuffer(
    duckdb_schema=duckdb_schema,
    pyarrow_schema=pyarrow_schema,
    database_name="my_db",  # The DuckDB database filename or MotherDuck database name
    table_name="my_table",  # The name of the table in DuckDB or MotherDuck
    destination="md",  # Set "md" for MotherDuck or "local" for a local DuckDB database
    chunk_size=2  # Example chunk size for illustration
)

# Load the data
loader.insert(arrow_table)
```
### Typing your dataset
When working with production pipelines, it's recommended to type your dataset explicitly to avoid issues with type inference.
PyArrow is our recommendation for typing your dataset, as it's the easiest approach, especially for complex data types.
In the above example, the schema is defined explicitly on both the PyArrow table and the DuckDB table.
```python
# Initialize the loading buffer
loader = ArrowTableLoadingBuffer(
    duckdb_schema=duckdb_schema,  # prepare a DuckDB DDL statement
    pyarrow_schema=pyarrow_schema,  # define your PyArrow schema explicitly
    database_name="my_db",
    table_name="my_table",
    destination="md",
    chunk_size=2
)
```
## 2. Write to a temporary file and load it to MotherDuck using a `COPY` command
When you have a large dataset, another method is to write your data to temporary files and load them into MotherDuck using a `COPY` command. This also works well if you have existing data in blob storage such as AWS S3, Google Cloud Storage, or Azure Blob Storage, as you will benefit from cloud network speeds.
```python
import pyarrow as pa
import pyarrow.parquet as pq
import duckdb
import os
# Step 1: Define the schema and create a large PyArrow table
schema = pa.schema([
('id', pa.int32()),
('name', pa.string())
])
# Example data - multiply the data to simulate a large dataset
data = {
'id': list(range(1, 1000001)), # Simulating 1 million rows
'name': ['Name_' + str(i) for i in range(1, 1000001)]
}
# Create the PyArrow table with the schema
large_table = pa.table(data, schema=schema)
# Step 2: Write the large PyArrow table to a Parquet file
parquet_file = "large_data.parquet"
pq.write_table(large_table, parquet_file)
# Step 3: Load the Parquet file into MotherDuck using the COPY command
conn = duckdb.connect("md:") # Connect to MotherDuck
conn.execute("CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name VARCHAR)")
# Use the COPY command to load the Parquet file into MotherDuck
conn.execute(f"COPY my_table FROM '{os.path.abspath(parquet_file)}' (FORMAT 'parquet')")
print("Data successfully loaded into MotherDuck")
```
---
Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-duckdb-database
---
sidebar_position: 4
title: Load a DuckDB database into MotherDuck
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
MotherDuck supports uploading local DuckDB databases in the cloud as referenced by the [CREATE DATABASE](/sql-reference/motherduck-sql-reference/create-database.md) statement.
To create a remote database from the current active local database, execute the following command:
```sql
CREATE OR REPLACE DATABASE remote_database_name FROM CURRENT_DATABASE();
```
To upload an attached local DuckDB database, execute the following commands:
```sql
ATTACH '/path/to/local/database.ddb' AS local_db_name;
ATTACH 'md:';
CREATE OR REPLACE DATABASE remote_database_name FROM local_db_name;
```
To upload a DuckDB database file from disk:
```sql
ATTACH 'md:';
CREATE OR REPLACE DATABASE remote_database_name FROM '/path/to/local/database.ddb';
```
Here's a full end-to-end example:
```sql
-- Let's generate some data based on the tpch extension (which will be autoloaded automatically).
-- This will create a couple of tables in the current database.
CALL dbgen(sf=0.1);
-- Connect to MotherDuck
ATTACH 'md:';
CREATE OR REPLACE DATABASE remote_tpch FROM CURRENT_DATABASE();
```
:::note
Uploading a database does not change your execution context: after the upload you are still in the local context, and subsequent queries will run locally.
:::
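To run subsequent queries against the uploaded copy instead, switch contexts explicitly. For example, using the `remote_tpch` database created above (where `dbgen` also created the TPC-H `lineitem` table):

```sql
-- Switch to the uploaded MotherDuck database
USE remote_tpch;
SELECT count(*) FROM lineitem;
```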
---
Source: https://motherduck.com/docs/key-tasks/managing-organizations/managing-organizations
---
title: Managing Organizations
description: Learn how to manage your Organization with MotherDuck
---
import Versions from '@site/src/components/Versions';
An Organization is a top-level entity in MotherDuck that enables you to perform administrative functions, such as managing users, setting up billing, configuring sharing, monitoring security, and so on. A MotherDuck user can only belong to a single Organization at a time.
Currently, Organizations are helpful for:
- Grouping users together for tracking usage and billing.
- Sharing data with other users of the same organization.
:::note
MotherDuck is currently available on two AWS regions:
- **US East (N. Virginia):** `us-east-1`, supporting DuckDB versions between and .
- **Europe (Frankfurt):** `eu-central-1`, supporting DuckDB versions between and .
You can choose in which region to create your organization, and organizations can only exist within a single cloud region currently.
We are working on expanding to other regions and cloud providers.
:::
## Creating an Organization
If you already have a MotherDuck account, an Organization was already created for you by MotherDuck.
If you are a new MotherDuck user, during sign-up you will be prompted to create a new Organization.

:::note
If another coworker at your company already has an organization, you can create your own organization to get started with MotherDuck right away, and then ask them to invite you to their organization later (See ["Joining an Existing Organization"](#joining-an-existing-organization) below).
:::
## Inviting Users to Your Organization
You can check if your teammates are in your Organization by navigating to the MotherDuck Web UI -> "Settings" -> "Members". There you may also invite your teammates to join your Organization.
You may invite both teammates without a MotherDuck account and existing MotherDuck users.

## Joining an Existing Organization
If you'd like to join your teammates' existing MotherDuck Organization, you must be invited by an Administrator in that Organization. Once an invite is generated, you will receive an email with a link to join the Organization.
## Roles
Within an Organization, a user can have an "Admin" or "Member" role. The first user in an Organization is assigned the "Admin" role, and subsequent users receive the "Member" role.
"Admin" users can change the roles of other users in the organization or "Remove" a user from the organization.
:::note
In the future, sending invitations, changing between plans, or updating billing information will require an "Admin" role.
:::
## Removing Users
If a user leaves your team or no longer needs access, "Admin" users can "Remove" them from the organization to restrict data access or clean up resources that are no longer used. This is done from the context menu in the ["Members" table](https://app.motherduck.com/settings/members).
:::warning
Because a user can only belong to one organization, removing them from the organization permanently deletes the user and all of their data. This action cannot be undone.
:::
## Limitations and Upcoming Improvements
Currently Organizations have the following limitations:
- It is not possible to explore existing Organizations. Please reach out to other MotherDuck users at your company or [contact us](../../troubleshooting/support.md) if you would like to find other users at your company.
---
Source: https://motherduck.com/docs/key-tasks/running-hybrid-queries
---
sidebar_position: 8
title: Running dual execution (or hybrid) queries
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
MotherDuck can use local data and remote data in the same query.
**Example:**
Run the DuckDB CLI.
```bash
duckdb
```
Connect to MotherDuck.
You may be prompted to sign in if you aren't already.
```sql
ATTACH 'md:';
```
Run the following in a MotherDuck notebook.
Create a local database in memory.
```sql
ATTACH ':memory:' AS local_db;
CREATE TABLE local_db.pricing AS
FROM (VALUES ('A', 1.4), ('B', 1.12), ('C', 2.552), ('D', 5.23))
pricing(item, price);
FROM local_db.pricing;
```
```bash
┌─────────┬──────────────┐
│ item │ price │
│ varchar │ decimal(4,3) │
├─────────┼──────────────┤
│ A │ 1.400 │
│ B │ 1.120 │
│ C │ 2.552 │
│ D │ 5.230 │
└─────────┴──────────────┘
```
Create a remote database in MotherDuck.
```sql
CREATE OR REPLACE DATABASE remote_db;
CREATE TABLE remote_db.sales AS
SELECT
'ABCD'[floor(random() * 3.999)::int + 1] AS item,
current_date() - interval (random() * 100) days AS dt,
floor(random() * 50)::int AS tally
FROM generate_series(1000);
FROM remote_db.sales LIMIT 10;
```
```bash
┌─────────┬─────────────────────┬───────┐
│ item │ dt │ tally │
│ varchar │ timestamp │ int32 │
├─────────┼─────────────────────┼───────┤
│ D │ 2024-11-29 00:00:00 │ 0 │
│ A │ 2024-10-04 00:00:00 │ 17 │
│ A │ 2024-10-13 00:00:00 │ 0 │
│ C │ 2024-11-05 00:00:00 │ 49 │
│ A │ 2024-09-30 00:00:00 │ 12 │
│ B │ 2024-09-27 00:00:00 │ 47 │
│ C │ 2024-11-23 00:00:00 │ 47 │
│ B │ 2024-09-18 00:00:00 │ 13 │
│ A │ 2024-11-18 00:00:00 │ 40 │
│ C │ 2024-09-18 00:00:00 │ 4 │
├─────────┴─────────────────────┴───────┤
│ 10 rows 3 columns │
└───────────────────────────────────────┘
```
Join the remote sales table to our local pricing data
to get revenue by month.
```sql
SELECT
date_trunc('month', dt) AS mo,
round(sum(price * tally),2) AS rev
FROM remote_db.sales
JOIN (FROM local_db.pricing WHERE price > 2) pricing
ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;
```
```bash
┌────────────┬───────────────┐
│ mo │ rev │
│ date │ decimal(38,2) │
├────────────┼───────────────┤
│ 2024-09-01 │ 9241.39 │
│ 2024-10-01 │ 14226.12 │
│ 2024-11-01 │ 13136.55 │
│ 2024-12-01 │ 7783.26 │
└────────────┴───────────────┘
```
To see what is running locally and remotely, you can use `EXPLAIN`:
```sql
EXPLAIN
SELECT
date_trunc('month', dt) AS mo,
round(sum(price * tally),2) AS rev
FROM remote_db.sales
JOIN (FROM local_db.pricing WHERE price > 2) pricing
ON sales.item = pricing.item
GROUP BY mo ORDER BY mo;
```
In each operator of the plan, `(L)` indicates local while `(R)`
indicates remote. Data is transferred using sinks and sources.
```bash
┌─────────────────────────────┐
│┌───────────────────────────┐│
││ Physical Plan ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│ DOWNLOAD_SOURCE (L) │
│ ──────────────────── │
│ bridge_id: 1 │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ BATCH_DOWNLOAD_SINK (R) │
│ ──────────────────── │
│ bridge_id: 1 │
│ parallel: true │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ ORDER_BY (R) │
│ ──────────────────── │
│ date_trunc('month', sales │
│ .dt) ASC │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R) │
│ ──────────────────── │
│ 0 │
│ rev │
│ │
│ ~125 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_GROUP_BY (R) │
│ ──────────────────── │
│ Groups: #0 │
│ Aggregates: sum(#1) │
│ │
│ ~125 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R) │
│ ──────────────────── │
│ mo │
│ (CAST(price AS DECIMAL(14 │
│ ,3)) * CAST(tally AS │
│ DECIMAL(14,0))) │
│ │
│ ~250 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (R) │
│ ──────────────────── │
│ #0 │
│ #1 │
│ #2 │
│__internal_compress_string_│
│ utinyint(#3) │
│ #4 │
│ │
│ ~250 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ HASH_JOIN (R) │
│ ──────────────────── │
│ Join Type: INNER │
│ │
│ Conditions: ├──────────────┐
│ item = item │ │
│ │ │
│ ~250 Rows │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ SEQ_SCAN (R) ││ UPLOAD_SOURCE (R) │
│ ──────────────────── ││ ──────────────────── │
│ sales ││ bridge_id: 2 │
│ ││ │
│ Projections: ││ │
│ item ││ │
│ dt ││ │
│ tally ││ │
│ ││ │
│ ~1001 Rows ││ │
└───────────────────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ BATCH_UPLOAD_SINK (L) │
│ ──────────────────── │
│ bridge_id: 2 │
│ parallel: true │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION (L) │
│ ──────────────────── │
│ item │
│ price │
│ │
│ ~1 Rows │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN (L) │
│ ──────────────────── │
│ pricing │
│ │
│ Projections: │
│ price │
│ item │
│ │
│ Filters: │
│ price>2.000 AND price IS │
│ NOT NULL │
│ │
│ ~1 Rows │
└───────────────────────────┘
```
A dual execution (or hybrid) query can be run against any database format supported by
DuckDB, including
[SQLite](https://duckdb.org/docs/extensions/sqlite_scanner),
[PostgreSQL](https://duckdb.org/docs/extensions/postgres_scanner),
and many others.
---
Source: https://motherduck.com/docs/key-tasks/service-accounts-guide
---
id: service-accounts-guide
title: "Managing Service Accounts"
description: "A step-by-step guide to creating, configuring, and managing service accounts and their tokens."
sidebar_label: "Service Accounts Guide"
custom_edit_url: null
---
import Admonition from '@theme/Admonition';
import Heading from "@theme/Heading";
import Tabs from '@theme/Tabs';
import TabItem from "@theme/TabItem";
## Guide - Create and configure a service account
This guide walks you through creating service accounts and their associated access tokens, and configuring Ducklings (compute instances), either in the MotherDuck UI or programmatically with the MotherDuck Admin REST API.
:::note
**Prerequisites:** All actions described below are only available to **Admin users** in your MotherDuck Organization.
API calls must be authenticated using an access token generated by an **Admin user** in your MotherDuck Organization. Pass this token in the `Authorization` header as `Bearer YOUR_ADMIN_TOKEN`.
:::
```mermaid
flowchart LR
A[Admin User] --> B[Create Service Account]
B --> C["Generate Token (Optional)"]
C --> D[Configure Ducklings]
D --> E[Service Account Ready]
F[Admin Token] --> G[Create Service Account]
G --> H["Generate Token (Optional)"]
H --> I[Configure Ducklings]
I --> E
subgraph UI["via MotherDuck UI"]
B
C
D
end
subgraph API["via REST API (curl, Python, etc.)"]
G
H
I
end
style UI stroke:#4CAF50, stroke-width:2px
style API stroke-dasharray: 5 5, stroke:#2196F3, stroke-width:2px
```
This guide involves three main steps:
1. **Create a Service Account:** Use the "Create new user" endpoint.
2. **Create an Access Token:** Generate an access token for the newly created service account.
3. **Configure Ducklings:** Set the size of read-write and read-scaling Ducklings for the service account.
### Step 1: Create a New Service Account

1. Navigate to the MotherDuck Web UI -> *Settings* -> *Service Accounts*
2. Click **Create service account**
3. Enter a username for the account (usernames can only contain letters, numbers, and underscores)
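Before calling the API, it can help to validate a candidate username locally. A minimal sketch (the pattern below is an assumption based on the rule above; the example usernames in this guide also use hyphens, so the pattern permits them too, and the server-side rule is authoritative):

```python
import re

# Rough client-side check: letters, numbers, underscores, plus hyphens
# (hyphens appear in this guide's example usernames). Hypothetical helper,
# not part of the MotherDuck API.
def is_valid_service_account_name(name: str) -> bool:
    return re.fullmatch(r"[A-Za-z0-9_-]+", name) is not None

print(is_valid_service_account_name("my-service-account-001"))  # True
print(is_valid_service_account_name("bad name!"))               # False
```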
To create a service account, you will use the [Create New User (service account)](../../sql-reference/rest-api/users-create-service-account) endpoint (the exact endpoint is `POST /v1/users`).
Refer to the [endpoint documentation](../../sql-reference/rest-api/users-create-service-account) for full details on request parameters and responses.
**Key details:**
* You will define a `username` for your service account.
* **Important:** For all subsequent API calls and identification, use the `username` you defined.
Example:
```bash
curl -X POST \
https://api.motherduck.com/v1/users \
-H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"username": "my-service-account-001"
}'
```
The same request can be made in Python:
```python
# /// script
# dependencies = ["requests"]
# ///
import requests

response = requests.post(
    'https://api.motherduck.com/v1/users',
    headers={
        'Authorization': 'Bearer YOUR_ADMIN_TOKEN',
        'Content-Type': 'application/json'
    },
    json={
        'username': 'my-service-account-001'
    }
)

if response.status_code == 200:
    service_account = response.json()
    print(f"Service account created: {service_account}")
else:
    print(f"Error: {response.status_code} - {response.text}")
```
### Step 2: Create an Access Token for the Service Account

1. Click on the service account username to open details
2. Click **Create token**
3. Provide a token name
4. For organizations on the Business plan, select a token type. Select **Read Scaling Token** to leverage MotherDuck's [Read Scaling](../authenticating-and-connecting-to-motherduck/read-scaling) feature
5. (Optional) Select **Automatically expire this token** to set the token's time-to-live
6. Click **Create token**. Immediately copy the token from the modal and store it securely. It won't be shown again once the modal is closed
Additional tokens can be created at any time from the service account's details.
Once the service account is created, you should generate an access token for it using the [Create an Access Token](../../sql-reference/rest-api/users-create-token) endpoint (the exact endpoint is `POST /v1/users/:username/tokens`).
Refer to the [endpoint documentation](../../sql-reference/rest-api/users-create-token) for full details.
**Key details:**
* The `:username` in the path refers to the `username` you chose in Step 1 (e.g., `my-service-account-001`).
* The path parameter `:username` is a placeholder; replace it with the actual username. For example, `/v1/users/my-service-account-001/tokens`.
* The `token_type` should be set only if you are on the business plan and want to leverage MotherDuck's [Read Scaling](../authenticating-and-connecting-to-motherduck/read-scaling) feature
Example:
```bash
curl -X POST \
  https://api.motherduck.com/v1/users/my-service-account-001/tokens \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "token-for-my-service-account-001",
    "token_type": "read_write"
  }'
```
Set `"token_type": "read_scaling"` to create a read scaling token instead. To make the token expire automatically, add the optional `"ttl"` field with a time-to-live in seconds (for example, `"ttl": 300` for 5 minutes); when `ttl` is omitted, the token does not expire.
The response will contain the access token for your service account. Securely store this token as it will be used by your service account to authenticate with MotherDuck.
The same request can be made in Python. To automatically expire the token, set the optional `ttl` parameter to the token's time-to-live in seconds:
```python
# /// script
# dependencies = ["requests"]
# ///
import requests

response = requests.post(
    'https://api.motherduck.com/v1/users/my-service-account-001/tokens',
    headers={
        'Authorization': 'Bearer YOUR_ADMIN_TOKEN',
        'Content-Type': 'application/json'
    },
    json={
        'name': 'token-for-my-service-account-001',
        'token_type': 'read_write',  # or 'read_scaling'
        'ttl': 300  # optional: token expires after 300 seconds (5 minutes)
    }
)

if response.status_code == 200:
    token_data = response.json()
    print("Token created successfully!")
    print(f"Token: {token_data['token']}")
    # Securely store this token
else:
    print(f"Error: {response.status_code} - {response.text}")
```
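With the token in hand, a service can authenticate non-interactively by passing it in the DuckDB connection string (`md:?motherduck_token=...`), so no browser login is required. A minimal sketch (the environment variable name is illustrative):

```python
import os

# Read the service-account token from the environment (never hard-code it).
token = os.environ.get("MOTHERDUCK_TOKEN", "YOUR_SERVICE_ACCOUNT_TOKEN")

# Embed the token in the connection string:
conn_str = f"md:?motherduck_token={token}"

# import duckdb
# con = duckdb.connect(conn_str)   # connects as the service account
# con.sql("SELECT 42").show()
print(conn_str.startswith("md:?motherduck_token="))  # True
```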
### Step 3: Set Account Ducklings (Configure Compute)

1. Set the read/write Duckling size for the account using the dropdown under the **Read/Write Duckling** header
2. For organizations on the Business plan using [read scaling](../authenticating-and-connecting-to-motherduck/read-scaling), set the account's read scaling Duckling size and replica pool size using the respective dropdowns.
To define the size of Ducklings provisioned for the service account, use the "set user Ducklings" endpoint. (The exact endpoint is `PUT /v1/users/:username/instances`).
Refer to the [Set User Ducklings](../../sql-reference/rest-api/ducklings-set-duckling-config-for-user) endpoint documentation for details on the correct payload and available Duckling sizes.
**Key details:**
* The `:username` in the path is the service account `:username` from Step 1.
* This endpoint sets both the read-write and read-scaling Ducklings
* You always need to pass the entire body, even if you only want to change one of the two Duckling configs
Example:
```bash
curl -X PUT \
https://api.motherduck.com/v1/users/my-service-account-001/instances \
-H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"config": {
"read_write": {
"instance_size": "jumbo"
},
"read_scaling": {
"instance_size": "pulse",
"flock_size": 16
}
}
}'
```
The same request can be made in Python:
```python
# /// script
# dependencies = ["requests"]
# ///
import requests

response = requests.put(
    'https://api.motherduck.com/v1/users/my-service-account-001/instances',
    headers={
        'Authorization': 'Bearer YOUR_ADMIN_TOKEN',
        'Content-Type': 'application/json'
    },
    json={
        'config': {
            'read_write': {
                'instance_size': 'jumbo'
            },
            'read_scaling': {
                'instance_size': 'pulse',
                'flock_size': 16
            }
        }
    }
)

if response.status_code == 200:
    print("Ducklings configured successfully!")
else:
    print(f"Error: {response.status_code} - {response.text}")
```
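Because the endpoint replaces the whole configuration, it is easy to drop one section by mistake when changing the other. A hypothetical helper (the function name and defaults are illustrative, not part of the API) that always emits both sections:

```python
def duckling_config(read_write_size: str = "jumbo",
                    read_scaling_size: str = "pulse",
                    flock_size: int = 16) -> dict:
    # Always build the full body: both read_write and read_scaling sections.
    return {
        "config": {
            "read_write": {"instance_size": read_write_size},
            "read_scaling": {
                "instance_size": read_scaling_size,
                "flock_size": flock_size,
            },
        }
    }

# Changing only the flock size still produces a complete payload:
body = duckling_config(flock_size=4)
print(sorted(body["config"].keys()))  # ['read_scaling', 'read_write']
```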
### Summary
By following these steps, you can create and configure service accounts for your MotherDuck organization. Remember to:
* Use an Admin account or token for all management operations.
* Securely store the generated service account tokens.
* Use the chosen service account `username` in any API calls.
---
:::note
The REST API methods for managing service accounts are in 'Preview' and may change in the future.
For detailed information on each API call, always refer to the specific endpoint documentation.
:::
## Impersonate Service Accounts (UI Only)
Admin users can log into the MotherDuck UI as a service account in the organization using the **Impersonation** feature. Impersonation allows admins to view and interact with the MotherDuck Web UI by impersonating the service account, which is useful for manually performing read-write actions, monitoring ongoing query activity, or testing and troubleshooting service account-specific resources.

1. Click the trident (⋮) next to the service account you want to impersonate
2. Select **Impersonate this account** from the dropdown
3. The MotherDuck UI will refresh, and you will be logged into the MotherDuck Web UI as that service account. While impersonating, a persistent banner will be shown at the top of the UI, with options to **Refresh session** or **Return to admin**
4. Impersonation sessions expire after two hours. Refresh the browser tab to reset the expiry countdown

:::tip
You can bookmark the URL while in an impersonation session to generate a new impersonation session using that same service account at a future time.
You must be logged into the MotherDuck Web UI as an Admin user for the URL to successfully start a new impersonation session.
:::
Service account impersonation is currently only available through the MotherDuck Web UI. Use service account tokens to authenticate with service accounts outside of the MotherDuck Web UI.
## Managing Service Accounts and Tokens
### Accounts
Navigate to the MotherDuck Web UI -> *Settings* -> *Service Accounts*

- New service accounts can be created by clicking the **Create service account** button above the account list
- The Duckling settings for each account can be managed using the Duckling size and pool size dropdowns in the list
- To view a service account's tokens and details, click the account's username in the list, or click the trident (⋮) next to the service account, and select **View details**
- Service accounts can be deleted by clicking the trident (⋮) next to the service account, and selecting **Delete account**
- When a service account is deleted, all tokens associated with the service account are immediately revoked
- To view the service accounts associated with your MotherDuck account, use the [List Access Tokens](../../sql-reference/rest-api/users-list-tokens) endpoint.
- Be sure to refer to the [API reference docs](../../sql-reference/rest-api/users-list-tokens) for full details on request parameters and responses.
``` python
# /// script
# dependencies = ["requests"]
# ///
import requests
import pprint

url = "https://api.motherduck.com/v1/users/"
headers = {
    'Accept': 'application/json',
    'Authorization': 'Bearer YOUR_ADMIN_TOKEN'  # Replace with your admin token
}
admin_user = "YOUR_ADMIN_USERNAME"  # Replace with your admin username

response = requests.get(url + admin_user + "/tokens", headers=headers)
pprint.pprint(response.json())  # Easier to read with many tokens
```
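The JSON response can then be post-processed locally, for example to flag tokens that expire soon. A sketch under stated assumptions (the field names below are illustrative, not the actual response schema; check the endpoint reference):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical token records; real field names may differ -- see the API reference.
tokens = [
    {"id": "tok_1", "expires_at": "2099-01-01T00:00:00+00:00"},
    {"id": "tok_2", "expires_at": None},  # non-expiring token
]

def expires_within(token: dict, days: int = 7) -> bool:
    # Non-expiring tokens never need rotation on a schedule.
    if token["expires_at"] is None:
        return False
    expiry = datetime.fromisoformat(token["expires_at"])
    return expiry <= datetime.now(timezone.utc) + timedelta(days=days)

print([t["id"] for t in tokens if expires_within(t)])  # []
```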
- To delete a service account, use the [Delete a User](../../sql-reference/rest-api/users-delete) endpoint.
- Be sure to refer to the [API reference docs](../../sql-reference/rest-api/users-delete) for full details on request parameters and responses.
:::note
When deleting accounts, please double-check the username. This operation cannot be undone!
:::
``` python
# /// script
# dependencies = ["requests"]
# ///
import requests

url = "https://api.motherduck.com/v1/users/"
headers = {
    'Accept': 'application/json',
    'Authorization': 'Bearer YOUR_ADMIN_TOKEN'  # Replace with your admin token
}
target_user = "SERVICE_ACCOUNT_USERNAME"  # Replace with the account to delete

response = requests.delete(url + target_user, headers=headers)
print(response.text)
```
### Tokens
Navigate to the MotherDuck Web UI -> *Settings* -> *Service Accounts*

- To view a service account's tokens, click the account's username in the list, or click the trident (⋮) next to the service account, and select **View details**
- Each valid token for a service account is listed in the service account details, along with its type (Read/Write or Read Scaling), creation time, and expiry time
- To revoke a token for a service account, click the three-dots (…) next to the token, and select **Revoke token**. A confirmation prompt will appear; select **Revoke token** to confirm

- To view a service account's tokens, use the [List Access Tokens](../../sql-reference/rest-api/users-list-tokens) endpoint. From there, you can see the token type, token ID, creation time, and expiration time.
- Be sure to refer to the [API reference docs](../../sql-reference/rest-api/users-list-tokens) for full details on request parameters and responses.
``` python
# /// script
# dependencies = ["requests"]
# ///
import requests
import pprint

url = "https://api.motherduck.com/v1/users/"
headers = {
    'Accept': 'application/json',
    'Authorization': 'Bearer YOUR_ADMIN_TOKEN'  # Replace with your admin token
}
target_user = "SERVICE_ACCOUNT_USERNAME"  # Replace with the account's username

response = requests.get(url + target_user + "/tokens", headers=headers)
pprint.pprint(response.json())  # Easier to read with many tokens
```
- To revoke a token, use the [Invalidate a Token](../../sql-reference/rest-api/users-delete-token) endpoint. You will need the token ID and its associated username.
- Be sure to refer to the [API reference docs](../../sql-reference/rest-api/users-delete-token) for full details on request parameters and responses.
:::note
When deleting tokens, please double-check the username and token ID. This operation cannot be undone!
:::
``` python
# /// script
# dependencies = ["requests"]
# ///
import requests

url = "https://api.motherduck.com/v1/users/"
headers = {
    'Accept': 'application/json',
    'Authorization': 'Bearer YOUR_ADMIN_TOKEN'  # Replace with your admin token
}
target_user = "SERVICE_ACCOUNT_USERNAME"  # Replace with the account's username
token_id = "TOKEN_ID"  # Replace with the ID of the token to revoke

response = requests.delete(url + target_user + "/tokens/" + token_id, headers=headers)
print(response.text)
```
---
Source: https://motherduck.com/docs/key-tasks/sharing-data/managing-shares
---
sidebar_position: 4
title: Managing shares
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Getting details about a share
You can learn more about a specific share that you've created by using the [`DESCRIBE SHARE`](/sql-reference/motherduck-sql-reference/describe-share.md) command. For example:
```sql
-- if you are the share owner, use the database name
DESCRIBE SHARE "duckshare";
-- if you are the share viewer, use the full url
DESCRIBE SHARE "md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6";
```
In the UI you can roll over a share to see a tooltip that tells you the share owner, when it was last updated, and access scope.
## Listing Shares
You can list the shares you have created via the [`LIST SHARES`](/sql-reference/motherduck-sql-reference/list-shares.md) statement. For example:
```sql
LIST SHARES;
```
1. You can see shares that you've created under "Shares I've created".
2. You can find **Discoverable** **Organization** shares that members of your Organization created under "Shared with me".
To view the URLs of shares created by others that you have currently attached, use the [`SHOW ALL DATABASES`](/sql-reference/motherduck-sql-reference/show-databases/) command. The `fully_qualified_name` column gives you the share URL of the attached share.
## Deleting a share
Shares can be deleted with the [`DROP SHARE`](/sql-reference/motherduck-sql-reference/drop-share.md) or `DROP SHARE IF EXISTS` statement. Users who have [`ATTACH`](/sql-reference/motherduck-sql-reference/attach.md)-ed the share will lose access. For example:
```sql
DROP SHARE "share1";
```
1. Roll over the share you'd like to delete.
2. Click on the "trident" on the right side.
3. Select "Drop".
4. Confirm.
## Updating a share
Sharing a database creates a point-in-time snapshot of the database at the time it is shared.
To publish changes, you need to explicitly run `UPDATE SHARE`.
When updating a share with the same database, the URL does not change.
```sql
UPDATE SHARE <share_name>;
```
In the following example, the database 'mydb' was previously shared by creating a share 'myshare', and 'mydb' has been updated since. The owner of the database would like their colleagues to receive the new version:
```sql
-- 'myshare' was previously created on the database 'mydb'
UPDATE SHARE "myshare";
```
If you lost your share URL, you can use the `LIST SHARES` command to list all your shares, or `DESCRIBE SHARE <share_name>` to get details about a given share.
## Editing/Altering a share
You can change the configuration of shares you've created in the UI. The SQL operation `ALTER SHARE` is in the works.
1. Roll over the share you'd like to edit.
2. Click on the "trident" on the right side.
3. Select "Alter".
4. Change the share configuration as you see fit.
5. Confirm "Alter share".
**Error handling:** If you don't see the trident icon, you may not have permission to edit this share.
---
Source: https://motherduck.com/docs/key-tasks/sharing-data/sharing-data
---
title: Sharing data in MotherDuck
description: Learn how to securely share data in MotherDuck
---
:::note
Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
MotherDuck is currently available on AWS in two regions:
- **US East (N. Virginia):** `us-east-1`
- **Europe (Frankfurt):** `eu-central-1`
:::
You can easily and securely share data in MotherDuck. MotherDuck's sharing model is specifically optimized for the following scenarios:
- Sharing data with everyone in your Organization for easy discovery and low-friction access. Typical of small highly collaborative data teams.
- Sharing data with specific accounts in your Organization. Popular with data application builders needing to isolate tenants.
- Sharing data publicly with anyone with a MotherDuck account in the same cloud region as your Organization, including users outside your Organization.
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/key-tasks/sharing-data/sharing-overview
---
sidebar_position: 1
title: Sharing concepts and overview
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Sharing data in MotherDuck
MotherDuck's data sharing model currently has the following key characteristics:
- Sharing is at the granularity of a MotherDuck database.
- Sharing is read-only.
- Sharing is done through **share** objects.
- You can make shares easily discoverable and queryable by all users in your [Organization](../managing-organizations/managing-organizations.mdx).
- You can create restricted shares, where access to each is managed with an [Access Control List (ACL)](./sharing-with-users.md).
- Alternatively, you can use hidden share URLs to limit access to specific people in your organization you share the URL with.
- You can also configure the URL of a hidden share to be accessible by anyone with a MotherDuck account in the same cloud region as your Organization.
:::note
Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
MotherDuck is currently available on AWS in two regions:
- **US East (N. Virginia):** `us-east-1`
- **Europe (Frankfurt):** `eu-central-1`
:::
Sharing in MotherDuck works as follows:
1. The **data provider** shares their database in MotherDuck by creating a share.
2. The **data consumer** attaches said share, which creates a database clone in their workspace. The data consumer can now query this database.
3. The **data provider** periodically updates the share to push updates to the database to **data consumers**.
## Creating a share
The first step in sharing databases in MotherDuck is to create a share, which can be done in both UI and SQL. Creating a share does not incur additional costs, and no actual data is copied or transferred - creating a share is a zero-copy, metadata-only operation.
Click on the "trident" next to the database you'd like to share. Select "share". Then:

1. Optionally, choose a share name. Default will be the database name.
2. Choose whether the share should only be accessible by all users in your Organization, specified users in your Organization, or any MotherDuck user in the same cloud region who has access to the share link.
3. Choose whether the share should be automatically updated or not. Default is `MANUAL`.
The following example creates a share from database "birds":
- Share is also named "birds".
- This share can only be accessed by accounts authenticated in your [Organization](../managing-organizations/managing-organizations.mdx).
- This share is discoverable. Users in your Organization will be able to easily find this share.
```sql
USE birds;
CREATE SHARE; -- Shorthand syntax. Share name is optional. By default, shares are Organization-scoped and Discoverable.
CREATE SHARE IF NOT EXISTS birds FROM birds
(ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE, UPDATE MANUAL); -- Identical to the previous statement, with explicit options.
```
Learn more about the [CREATE SHARE](/sql-reference/motherduck-sql-reference/create-share.md) SQL command.
### Organization shares
When creating a share, you may choose the scope of access to the share:
- **Organization**. Only users authenticated in your Organization will have access to this share.
- **Restricted**. Only the share owner and users specified with `GRANT` commands can access the share.
- **Unrestricted**. Any user signed into any MotherDuck organization in the same cloud region can access this share using the share URL.
### Discoverable shares
When creating a share, you may choose to make this share **Discoverable**. All authenticated users in your Organization will be able to easily find this share in the UI.
You can create **Discoverable** shares that are **Unrestricted**, but only members of your Organization can find this share in the UI. Non-members can still access this share using the share URL.
### Share URLs
When you create a share, a URL for this share is generated:
- If the share is **Discoverable**, members of your Organization will easily be able to find this share without the share URL. Alternatively, they can use the URL directly.
- If the share is **Hidden** (e.g. not Discoverable), other users will not be able to find the share URL. You will need to send this URL directly to the users with whom you want to share this data.
## Consuming shared data
The **data consumer** needs to attach the share to their workspace, thereby creating a read-only zero-copy clone of the source database. This is a free, metadata-only operation.
### Consuming discoverable shares
If the **data provider** created a Discoverable share you have access to, you should be able to find this share in the UI.
1. Select the share you want under "Shared with me".
2. Optionally roll over the share to see the tooltip that tells you the share owner, when it was last updated, and share access scope.
3. Click "attach".
4. You can now query the resulting database.
### Consuming hidden shares
If the **data provider** created a Hidden (e.g. non-Discoverable) share, they need to pass the share URL to the **data consumer**. The **data consumer**, in turn, needs to attach the share URL.
```sql
ATTACH 'md:_share/ducks/0a9a026ec5a55946a9de39851087ed81' AS birds; -- attaches the share as database `birds`
```
## Updating shared data
If during creation of the share, the **data provider** chooses to have the share update automatically, the share will be updated periodically.
If the share was created with `MANUAL` updates, the **data provider** needs to manually update the share.
```sql
UPDATE SHARE birds;
```
Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md) and [data replication timing and checkpoints](./updating-shares.md).
## Consuming updated data
By default, shares automatically update every minute. However, if you need the most up-to-date data sooner, the consumer can manually refresh the share after the producer executes `UPDATE SHARE`.
To manually refresh the data:
```sql
REFRESH DATABASES; -- Refreshes all connected databases and shares
REFRESH DATABASE my_share; -- Alternatively, refresh a specific database/share
```
Learn more about [REFRESH DATABASES](/sql-reference/motherduck-sql-reference/refresh-database.md).
---
Source: https://motherduck.com/docs/key-tasks/sharing-data/sharing-with-users
---
sidebar_position: 3
title: Sharing data with specific users
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
MotherDuck enables you to securely share data with specific users. Common scenarios include:
- Building data applications, in which each tenant should only have access to their own data.
- Sharing sensitive data within your Organization.
- Sharing data outside of your Organization.
:::note
Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
MotherDuck is currently available on AWS in two regions:
- **US East (N. Virginia):** `us-east-1`
- **Europe (Frankfurt):** `eu-central-1`
:::
Sharing data with individuals is easy. MotherDuck supports two approaches:
- Creating a share with **Restricted** access, limiting access to a list of specified users within your organization (known as an "ACL" or "Access Control List").
- Creating a **Hidden** share and providing individuals with the share URL.
## Creating a share with restricted access (ACL)
**Overview**
1. **Data provider** creates a share with **Restricted** access.
2. **Data provider** _(share owner)_ specifies which **data consumers** _(users)_ can read from the share.
3. **Data consumer** **attaches** the share.
4. **Data provider** periodically updates the share to push new data to **data consumers**.
Anyone within your organization that is _not_ included in the list will **not** be able to access the share, even if they have a share link.
Click on the "trident" next to the database you'd like to share. Select "Share".
1. Optionally name the share.
2. Under "Who has access", choose "Specified users with the share link". Search for and add the users within your Organization who should have read access to the share.
3. Choose whether the share should be [automatically updated or not](../sharing-overview/#updating-shared-data). The default is `MANUAL`.
4. Create the share.
5. For the specified users, the share will appear in their UI under "Shared with me" and can be attached.
```sql
use birds;
CREATE SHARE birds FROM birds
(ACCESS RESTRICTED); -- This query creates a share accessible only by organization users specified with GRANT commands
GRANT READ ON SHARE birds TO duck1, duck2; -- Gives the users with usernames 'duck1' and 'duck2' access to the share 'birds'
```
**Data consumer** must `ATTACH` the restricted share before querying the share. See [consuming restricted shares](./#consuming-restricted-shares).
:::note
Restricted shares default to **Discoverable** visibility for users who have been granted access to the share. (Learn more about ["Discoverable shares"](../sharing-overview/#discoverable-shares)).
:::
### Consuming restricted shares
The **data consumers** in your Organization with access to the restricted share can use the UI or SQL to **attach** the share and start querying it.
1. Select the restricted share you want to attach under "Shared with me"
2. Click "attach" and optionally name the resulting database.
3. You can now query the resulting database.
Run the `ATTACH` command to attach the share as a queryable database. This is a zero-cost metadata-only operation.
```sql
ATTACH 'md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa'; -- Creates a zero-copy clone database called birds
```
Learn more about [ATTACH](/sql-reference/motherduck-sql-reference/attach.md).
### Modifying share access
**Data providers** _(share owners)_ can modify which users within your Organization have access to the share.
1. Find the target share in the "Shares I've created" section of the Object Explorer, and choose the 'Alter' option from the context menu.
2. From here, you can add and remove users with access to the share.
3. You may also alter the share to use a different **access** scope. Learn more about [share access scopes](../sharing-overview/#organization-shares).
For more details on how to configure access controls for restricted shares, see the [`GRANT READ ON SHARE` reference page](/sql-reference/motherduck-sql-reference/grant-access/).
```sql
GRANT READ ON SHARE birds TO duck3; -- Gives the user with username 'duck3' access to the share 'birds'
REVOKE READ ON SHARE birds FROM penguin; -- Revokes access to the share 'birds' from the user with username 'penguin'
```
## Creating hidden shares
**Overview**
1. **Data provider** creates the share URL and passes this URL to the **data consumer**.
2. **Data consumer** **attaches** the share.
3. **Data provider** periodically updates the share to push new data to **data consumers**.
To share a database, first create a Hidden share. No actual data is copied and no additional costs are incurred in this process.
Click on the "trident" next to the database you'd like to share. Select "share".
1. Optionally name the share.
2. To share the data with MotherDuck users inside or outside of your Organization, choose the "Anyone with the share link" option. This will enable anyone with the share link in the same cloud region to attach and query the share, including users outside your Organization.
3. Create the share.
4. Copy the resulting **ATTACH** command to your clipboard and send it to your **data consumers**.
```sql
use birds;
CREATE SHARE birds FROM birds
(ACCESS UNRESTRICTED, VISIBILITY HIDDEN); -- Creates a Hidden share accessible by anyone with the share link in the same cloud region, including users outside your Organization
> md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa
```
Save the returned share URL and pass it to **data consumers**.
### Consuming hidden shares
The **data consumer**, inside or outside your Organization, can use SQL to attach the share and start querying it!
Run the `ATTACH` command to attach the share as a queryable database. This is a zero-cost metadata-only operation.
```sql
ATTACH 'md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa'; -- Creates a zero-copy clone database called birds
```
Learn more about [ATTACH](/sql-reference/motherduck-sql-reference/attach.md).
## Updating shared data
If the **data provider** chose automatic updates when creating the share, the share is updated periodically.
If the share was created with `MANUAL` updates, the **data provider** needs to manually update the share.
```sql
UPDATE SHARE birds;
```
Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md) and [data replication timing and checkpoints](./updating-shares.md).
---
Source: https://motherduck.com/docs/key-tasks/sharing-data/sharing-within-org
---
sidebar_position: 2
title: Sharing data with your organization
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Sharing data with your organization
MotherDuck makes it easy to share data with all members of your Organization and to make that data easily discoverable and queryable. This is a common use case for small, highly collaborative data teams.
1. **Data provider** creates an **Organization** scoped, **Discoverable** share.
2. **Data consumers** easily find the share and **attach** it.
3. **Data provider** periodically updates the share to push new data to **data consumers**.
:::note
Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
MotherDuck is currently available on AWS in two regions:
- **US East (N. Virginia):** `us-east-1`
- **Europe (Frankfurt):** `eu-central-1`
:::
## 1. Creating organization-scoped, discoverable shares
To share a database with your Organization, create a share. No actual data is copied and no additional costs are incurred in this process.

Click on the "trident" next to the database you'd like to share. Select "share". Then:
1. Optionally, choose a share name. The default is the database name.
2. Choose whether the share should be accessible by all users in your Organization, by specified users in your Organization, or by any MotherDuck user in the same cloud region who has the share link.
3. Choose whether the share should be automatically updated or not. The current default is `MANUAL`.
```sql
use birds;
CREATE SHARE; -- Shorthand syntax. Share name is optional. By default, shares are Organization-scoped and Discoverable.
CREATE SHARE birds FROM birds
(ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE); -- Equivalent to the previous query, with the defaults spelled out explicitly
```
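The auto-update choice in step 3 can also be made in SQL. A minimal sketch, assuming the `UPDATE AUTOMATIC` option of `CREATE SHARE` (see the `CREATE SHARE` reference for the exact form; by default, shares use `MANUAL` updates):

```sql
-- Create an Organization-scoped, Discoverable share that is updated
-- automatically, so the provider does not need to run UPDATE SHARE by hand.
-- (The UPDATE AUTOMATIC option is an assumption here.)
CREATE SHARE birds FROM birds
    (ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE, UPDATE AUTOMATIC);
```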
## 2. Finding and consuming shares
The **data consumer** in your Organization can use the UI to find the share, attach it, and start querying it!
1. Select the share you want under "Shared with me"
2. Click "attach" and optionally name the resulting database.
3. You can now query the resulting database.
:::note
The ability to list and discover Discoverable shares in SQL is coming shortly.
:::
## 3. Updating shared data
If the **data provider** chose automatic updates when creating the share, the share is updated periodically.
If the share was created with `MANUAL` updates, the **data provider** needs to manually update the share.
```sql
UPDATE SHARE birds;
```
Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md) and [data replication timing and checkpoints](./updating-shares.md).
---
Source: https://motherduck.com/docs/key-tasks/sharing-data/updating-shares
---
sidebar_position: 5
title: Updating shares
description: Learn about data replication timing, checkpoints, and how to ensure your latest data is available in shares and read-only Ducklings.
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Data replication speed
**Use this when you need to:** Understand how quickly data changes become available in shares and read-only Ducklings.
**Prerequisites:** You should have shares or read-only Ducklings configured in your MotherDuck environment.
**You'll know you're done when:** You understand the timing characteristics and can optimize data availability when needed.
MotherDuck automatically replicates data to shares and read-only Ducklings with the following timing characteristics:
### Auto-updated shares
For shares configured with auto-update enabled, MotherDuck polls for new data **once per minute**. When new data is detected, it becomes available in the share after the next checkpoint occurs.
### Checkpoints and data availability
Data is written to shares whenever there is a checkpoint. Checkpoints occur automatically based on your database's configuration. For read-scaling Ducklings, you can force a snapshot using [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot/) to make data available sooner.
For read-scaling Ducklings, to force a snapshot and make data immediately available:
```sql
CREATE SNAPSHOT OF <database_name>;
```
**Expected result:** A new read-only snapshot is created, ensuring read-scaling connections can access the most up-to-date data.
**Use case:** Run this when you need to ensure the latest data is available to read-scaling Ducklings immediately.
**Important:** This command will wait for any ongoing write queries to complete and prevent new ones from starting during snapshot creation.
1. Navigate to your database in the MotherDuck interface
2. Look for snapshot options in the database management section
3. Trigger a snapshot to ensure your latest data is available in read-scaling Ducklings immediately
**Expected result:** Your latest data becomes immediately available in all read-scaling Ducklings.
### Read-only Ducklings
Data replication to read-only Ducklings within the same account follows the same timing as shares: data becomes available after checkpoints, with polling occurring once per minute for auto-updated configurations.
## Manual share updates
**Use this when you need to:** Publish recent changes from your database to make them available in the share.
**Prerequisites:** You must be the owner of the share and have made changes to the source database since the last share update.
**You'll know you're done when:** The share reflects the latest version of your database and the last updated timestamp changes.
Sharing a database creates a point-in-time snapshot of the database at the time it is shared. To publish changes, you need to explicitly run `UPDATE SHARE <share_name>`.
Updating a share does not change its URL.
```sql
UPDATE SHARE <share_name>;
```
**Example:** Database 'mydb' was previously shared by creating a share 'myshare', and the database 'mydb' has been updated since. The owner wants colleagues to receive the new version:
```sql
-- 'myshare' was previously created on the database 'mydb'
UPDATE SHARE "myshare";
```
**Expected result:** The share is updated with the latest data from the source database.
**Recovery:** If you lost your share URL, you can use the `LIST SHARES` command to list all your shares, or `DESCRIBE SHARE <share_name>` to get details about a specific share.
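As a quick sketch, the recovery commands read:

```sql
LIST SHARES;              -- Lists all shares you own, including their URLs
DESCRIBE SHARE "myshare"; -- Shows details for the share named 'myshare'
```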
## Refreshing shared data (consumer side)
**Use this when you need to:** Get the most up-to-date data from a share or read-scaling Duckling after the producer has made updates.
**Prerequisites:** You must have attached a share or be connected to a read-scaling Duckling.
**You'll know you're done when:** Your local copy reflects the latest data from the producer.
By default, shares and read-scaling Ducklings _automatically sync every minute_. However, if you need the most up-to-date data sooner, you can manually refresh after the producer executes their update command.
### Complete workflow for maximum freshness
For the freshest possible data, follow this two-step process:
1. **Producer side:** Either wait for normal checkpoints or force an update
2. **Consumer side:** Run `REFRESH DATABASE` to pull the latest changes
**Producer (writer connection):**
```sql
-- Make your changes
INSERT INTO my_db.my_table VALUES (...);
-- Option 1: Wait for normal checkpoint (automatic)
-- Data becomes available after the next checkpoint occurs
-- Option 2: Force a snapshot to make data immediately available
CREATE SNAPSHOT OF my_db;
```
**Consumer (read-scaling connection):**
```sql
-- Refresh to get the latest snapshot
REFRESH DATABASES; -- Refreshes all connected databases and shares
-- OR
REFRESH DATABASE my_db; -- Refresh just one specific database
```
**Producer (share owner):**
```sql
-- Make your changes
INSERT INTO my_db.my_table VALUES (...);
-- Option 1: Wait for normal checkpoint (automatic)
-- Data becomes available after the next checkpoint occurs
-- Option 2: Force a share update to make data immediately available
UPDATE SHARE "myshare";
```
**Consumer (share recipient):**
```sql
-- Refresh to get the latest share data
REFRESH DATABASES; -- Refreshes all connected databases and shares
-- OR
REFRESH DATABASE my_share; -- Refresh just one specific share
```
### Understanding the refresh output
When you run `REFRESH DATABASES`, you'll see output showing which databases were refreshed:
```sql
REFRESH DATABASES;
┌─────────┬───────────────────┬──────────────────────────┬───────────┐
│ name │ type │ fully_qualified_name │ refreshed │
│ varchar │ varchar │ varchar │ boolean │
├─────────┼───────────────────┼──────────────────────────┼───────────┤
│ my_db │ motherduck │ md:my_db │ false │
│ myshare │ motherduck share │ md:_share/myshare/uuid │ true │
└─────────┴───────────────────┴──────────────────────────┴───────────┘
```
The `refreshed` column shows `true` for databases that were successfully refreshed with new data.
Learn more about [`REFRESH DATABASE`](/sql-reference/motherduck-sql-reference/refresh-database.md).
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/_include-thing-for-differences-with-duckdb
{props.thing} in MotherDuck {props.verb} differences from DuckDB. When referencing information about {props.thing} in DuckDB Documentation at {props.ddburl}, consider the differences listed in this topic.
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/_include-thing-for-parity-with-duckdb
{props.thing} in MotherDuck {props.verb} no different than in DuckDB. For more information, see {props.ddburl} in DuckDB Documentation.
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/aggregate-functions
---
sidebar_position: 6
title: Aggregate functions
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Aggregate Functions} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/configurations
---
sidebar_position: 8
title: Configurations
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Configuration} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/constraints
---
sidebar_position: 9
title: Constraints
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Constraints} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/data-types
---
sidebar_position: 3
title: Data types
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Data Types} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-sql-reference
---
title: DuckDB SQL
description: DuckDB SQL Reference
---
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/alter-table
---
title: ALTER TABLE
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
ALTER TABLE} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/attach-detach
---
title: ATTACH/DETACH
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
ATTACH/DETACH} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/call
---
title: CALL
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
CALL} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/comment-on
---
title: COMMENT ON
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
COMMENT ON} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/copy
---
title: COPY
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
COPY} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-index
---
title: CREATE INDEX
---
# CREATE INDEX
The `CREATE INDEX` statement in MotherDuck has differences from DuckDB. While the syntax is supported, indexes are not currently utilized for query acceleration in MotherDuck. This is generally not a concern as MotherDuck is already highly optimized for analytical workloads and provides excellent query performance through optimized data storage and processing.
## Key Differences
- Indexes can be created but do not provide performance benefits
- Queries that would use an index scan in DuckDB will use a sequential scan in MotherDuck instead
## Example
```sql
-- Create a table and an index
CREATE TABLE users(id INTEGER, name VARCHAR);
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
CREATE INDEX idx_user_id ON users(id);
-- This query will use a sequential scan in MotherDuck
-- even though an index scan would be used in DuckDB
SELECT * FROM users WHERE id = 1;
```
You can verify this behavior using the [EXPLAIN](/sql-reference/motherduck-sql-reference/explain/) statement:
```sql
EXPLAIN SELECT * FROM users WHERE id = 100;
-- Will show SEQ_SCAN in MotherDuck
-- Would show INDEX_SCAN in DuckDB
```
:::note
While queries that would benefit from index acceleration in DuckDB will use different execution plans in MotherDuck, MotherDuck's architecture is designed to provide fast analytical query performance even without indexes. The platform uses various optimizations and a cloud-native architecture to ensure efficient query execution.
:::
Additionally, it's worth noting that indexes can significantly slow down `INSERT` operations, as the index needs to be updated with each new record. Since indexes don't provide query acceleration benefits in MotherDuck, creating them will only add this overhead without any corresponding advantages.
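Because of this write overhead, an index created earlier can simply be dropped without affecting query results:

```sql
-- Remove the index from the earlier example; in MotherDuck, queries are
-- unaffected since indexes are not used for acceleration anyway.
DROP INDEX idx_user_id;
```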
For reference, you can learn more about how indexes work in DuckDB in their [Indexes documentation](https://duckdb.org/docs/sql/indexes).
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-macro
---
title: CREATE MACRO
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
CREATE MACRO} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-table
---
title: CREATE TABLE
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
CREATE TABLE} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/delete
---
title: DELETE
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
DELETE} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/drop
---
title: DROP
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
DROP} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/duckdb-statements
---
title: DuckDB statements
description: DuckDB statements
---
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/export
---
title: EXPORT
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
Export & Import Database} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/insert
---
title: INSERT
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
INSERT Statement} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/pivot
---
title: PIVOT
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
PIVOT Statement} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/select
---
title: SELECT
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
SELECT Statement} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/set-reset
---
title: SET/RESET
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
SET/RESET} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/unpivot
---
title: UNPIVOT
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
UNPIVOT Statement} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/update
---
title: UPDATE
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
UPDATE Statement} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/use
---
title: USE
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
USE} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/vacuum
---
title: VACUUM
---
import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx';
VACUUM} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/enum
---
sidebar_position: 3
title: Enum data type
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
enum data type} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/expressions
---
sidebar_position: 3
title: Expressions
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Expressions} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/functions
---
sidebar_position: 5
title: Functions
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Functions} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/information-schema
---
sidebar_position: 10
title: Information schema
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Information Schema} />
If you want to query information about your MotherDuck entities, take a look at [md_information_schema](/sql-reference/motherduck-sql-reference/md_information_schema/introduction).
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/metadata-functions
---
sidebar_position: 11
title: Metadata functions
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
DuckDB_% Metadata Functions} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/pragma-statements
---
sidebar_position: 12
title: PRAGMA statements
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Pragmas} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/query-syntax
---
sidebar_position: 2
title: Query syntax
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
SELECT Clause} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/sample
---
sidebar_position: 13
title: SAMPLE
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Samples} />
---
Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/window-functions
---
sidebar_position: 7
title: Window functions
---
import PartialExample from './_include-thing-for-parity-with-duckdb.mdx';
Window Functions} />
---
Source: https://motherduck.com/docs/sql-reference/mcp/ask-docs-question
---
sidebar_position: 7
title: ask_docs_question
description: Ask questions about DuckDB or MotherDuck documentation
---
# ask_docs_question
Ask a question about DuckDB or MotherDuck and get answers from official documentation.
## Description
The `ask_docs_question` tool queries the official DuckDB and MotherDuck documentation to answer questions about SQL syntax, features, best practices, and more. This is useful when you need help with DuckDB-specific SQL syntax or MotherDuck features.
The tool uses MotherDuck's documentation assistant to provide accurate answers based on official documentation sources.
## Input Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `question` | string | Yes | Question about DuckDB or MotherDuck |
## Output Schema
```json
{
"success": boolean,
"question": string, // Original question (on success)
"answer": string, // Documentation-based answer (on success)
"sources": string, // Source references (optional, on success)
"error": string // Error message (on failure)
}
```
## Example Usage
**Ask about DuckDB syntax:**
```
How do I use window functions in DuckDB?
```
The AI assistant will call the tool with:
```json
{
"question": "How do I use window functions in DuckDB?"
}
```
**Ask about MotherDuck features:**
```
How do I create a share in MotherDuck?
```
```json
{
"question": "How do I create a share in MotherDuck?"
}
```
**Ask about data types:**
```
What's the difference between LIST and ARRAY types in DuckDB?
```
```json
{
"question": "What's the difference between LIST and ARRAY types in DuckDB?"
}
```
## Success Response Example
```json
{
"success": true,
"question": "How do I use window functions in DuckDB?",
"answer": "Window functions in DuckDB allow you to perform calculations across a set of rows related to the current row. Here's how to use them:\n\n**Basic syntax:**\n```sql\nSELECT \n column,\n SUM(value) OVER (PARTITION BY category ORDER BY date) as running_total\nFROM table_name;\n```\n\n**Common window functions:**\n- `ROW_NUMBER()` - assigns unique row numbers\n- `RANK()` and `DENSE_RANK()` - ranking with/without gaps\n- `LAG()` and `LEAD()` - access previous/next rows\n- `FIRST_VALUE()` and `LAST_VALUE()` - first/last value in window\n\n**Using QUALIFY:**\nDuckDB supports the QUALIFY clause to filter window function results:\n```sql\nSELECT *\nFROM sales\nQUALIFY ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) = 1;\n```\n\nThis returns only the top sale per region.",
"sources": "https://duckdb.org/docs/sql/window_functions"
}
```
## Tips for Good Questions
- Be specific about what you want to know
- Include context about what you're trying to accomplish
- Mention specific functions or features if known
---
Source: https://motherduck.com/docs/sql-reference/mcp/list-columns
---
sidebar_position: 4
title: list_columns
description: List columns of a table or view with types and comments
---
# list_columns
List all columns of a table or view with their types and comments.
## Description
The `list_columns` tool returns detailed column information for a specified table or view, including data types, nullability, and any comments. This is useful for understanding table structure before writing queries.
## Input Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `table` | string | Yes | Table or view name |
| `database` | string | Yes | Database name |
| `schema` | string | No | Schema name (defaults to `main`) |
## Output Schema
```json
{
"success": boolean,
"database": string, // Database name
"schema": string, // Schema name
"table": string, // Table or view name
"objectType": "table" | "view", // Whether it's a table or view
"columns": [ // List of columns (on success)
{
"name": string, // Column name
"type": string, // Data type
"nullable": boolean, // Whether nulls are allowed
"comment": string | null // Column comment if set
}
],
"columnCount": number, // Number of columns
"error": string // Error message (on failure)
}
```
## Example Usage
**Get columns for a table:**
```
What columns does the customers table have in my_database?
```
The AI assistant will call the tool with:
```json
{
"table": "customers",
"database": "my_database"
}
```
**Get columns in a specific schema:**
```
Show me the schema of staging.raw_events in analytics_db
```
```json
{
"table": "raw_events",
"database": "analytics_db",
"schema": "staging"
}
```
## Success Response Example
```json
{
"success": true,
"database": "my_database",
"schema": "main",
"table": "customers",
"objectType": "table",
"columns": [
{
"name": "id",
"type": "INTEGER",
"nullable": false,
"comment": "Primary key"
},
{
"name": "email",
"type": "VARCHAR",
"nullable": false,
"comment": "Customer email address"
},
{
"name": "name",
"type": "VARCHAR",
"nullable": true,
"comment": "Full name"
},
{
"name": "created_at",
"type": "TIMESTAMP",
"nullable": false,
"comment": null
},
{
"name": "metadata",
"type": "JSON",
"nullable": true,
"comment": "Additional customer attributes"
}
],
"columnCount": 5
}
```
## Error Response Example
```json
{
"success": false,
"error": "Catalog Error: Table \"nonexistent_table\" does not exist"
}
```
---
Source: https://motherduck.com/docs/sql-reference/mcp/list-databases
---
sidebar_position: 1
title: list_databases
description: List all databases in your MotherDuck account
---
# list_databases
List all databases in your MotherDuck account with their names and types.
## Description
The `list_databases` tool returns all databases accessible to your MotherDuck account, including both owned databases and attached shared databases. This is useful for discovering what data is available before running queries.
## Input Parameters
This tool takes no input parameters.
## Output Schema
```json
{
"success": boolean,
"databases": [ // List of databases (on success)
{
"alias": string, // Database name/alias
"is_attached": boolean, // Whether the database is currently attached
"type": string // Database type (e.g., "motherduck", "memory")
}
],
"error": string // Error message (on failure)
}
```
## Example Usage
**List available databases:**
```
What databases do I have access to?
```
The AI assistant will call the tool with no parameters.
## Success Response Example
```json
{
"success": true,
"databases": [
{
"alias": "my_db",
"is_attached": true,
"type": "motherduck"
},
{
"alias": "analytics",
"is_attached": true,
"type": "motherduck"
},
{
"alias": "shared_sales_data",
"is_attached": true,
"type": "motherduck"
}
]
}
```
---
Source: https://motherduck.com/docs/sql-reference/mcp/list-shares
---
sidebar_position: 2
title: list_shares
description: List database shares that have been shared with you
---
# list_shares
List all database [shares](/key-tasks/sharing-data/sharing-overview) that have been shared with you.
## Description
The `list_shares` tool returns all database shares that have been shared with you by other users. Each share includes its name and URL, which can be used to attach the share as a database using the `query` tool.
To attach a share, execute: `ATTACH '<share_url>' AS my_alias;`
To detach a share: `DETACH my_alias;`
## Input Parameters
This tool takes no input parameters.
## Output Schema
```json
{
"success": boolean,
"shares": [ // List of shares (on success)
{
"name": string, // Share name
"url": string // Share URL for attaching
}
],
"error": string // Error message (on failure)
}
```
## Example Usage
**List available shares:**
```
What shares have been shared with me?
```
The AI assistant will call the tool with no parameters.
**Attach a share after listing:**
```
Attach the sales_data share so I can query it
```
After getting the share URL from `list_shares`, the AI will use the `query` tool:
```json
{
"database": "my_db",
"sql": "ATTACH 'md:_share/org123/sales_data' AS sales_data"
}
```
## Success Response Example
```json
{
"success": true,
"shares": [
{
"name": "sales_data",
"url": "md:_share/org123/sales_data"
},
{
"name": "product_catalog",
"url": "md:_share/org456/product_catalog"
},
{
"name": "analytics_benchmark",
"url": "md:_share/org789/analytics_benchmark"
}
]
}
```
## Empty Response Example
When no shares have been shared with you:
```json
{
"success": true,
"shares": []
}
```
## Related
- [Sharing Overview](/key-tasks/sharing-data/sharing-overview) - Learn about MotherDuck's data sharing capabilities
- [Managing Shares](/key-tasks/sharing-data/managing-shares) - How to create and manage shares
---
Source: https://motherduck.com/docs/sql-reference/mcp/list-tables
---
sidebar_position: 3
title: list_tables
description: List tables and views in a MotherDuck database
---
# list_tables
List all tables and views in a MotherDuck database with their comments.
## Description
The `list_tables` tool returns all tables and views in a specified database, including their schema, type (table or view), and any comments that have been added. You can optionally filter by schema.
## Input Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `database` | string | Yes | Database name to list tables from |
| `schema` | string | No | Schema name to filter by (defaults to all schemas) |
## Output Schema
```json
{
"success": boolean,
"database": string, // Database name
"schema": string, // Schema filter used ("all" if not specified)
"tables": [ // List of tables and views (on success)
{
"schema": string, // Schema name
"name": string, // Table or view name
"type": "table" | "view", // Object type
"comment": string | null // Table/view comment if set
}
],
"tableCount": number, // Number of tables
"viewCount": number, // Number of views
"error": string // Error message (on failure)
}
```
## Example Usage
**List all tables in a database:**
```
Show me all tables in my_database
```
The AI assistant will call the tool with:
```json
{
"database": "my_database"
}
```
**List tables in a specific schema:**
```
What tables are in the staging schema of analytics_db?
```
```json
{
"database": "analytics_db",
"schema": "staging"
}
```
## Success Response Example
```json
{
"success": true,
"database": "my_database",
"schema": "all",
"tables": [
{
"schema": "main",
"name": "customers",
"type": "table",
"comment": "Customer master data"
},
{
"schema": "main",
"name": "orders",
"type": "table",
"comment": "Order transactions"
},
{
"schema": "main",
"name": "monthly_sales",
"type": "view",
"comment": "Aggregated monthly sales view"
},
{
"schema": "staging",
"name": "raw_events",
"type": "table",
"comment": null
}
],
"tableCount": 3,
"viewCount": 1
}
```
## Error Response Example
```json
{
"success": false,
"error": "Catalog Error: Database \"nonexistent_db\" does not exist"
}
```
---
Source: https://motherduck.com/docs/sql-reference/mcp/mcp
---
sidebar_position: 0
title: MCP Server
description: Connect AI assistants to your MotherDuck data using the Model Context Protocol
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import DocCardList from '@theme/DocCardList';
import ClaudeIcon from '../../../static/img/icons/brands/claude-icon';
import CursorIcon from '../../../static/img/icons/brands/cursor-icon';
import ExternalLinkIcon from '../../../static/img/icons/external-link-icon';
# MotherDuck MCP Server
The MotherDuck MCP Server enables AI assistants to query and explore your MotherDuck databases using the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). Connect your favorite AI agent to explore schemas, run SQL queries, and ask documentation questions with natural language.
## Quick Start
Select your MCP client and follow the instructions to connect to the MotherDuck MCP server.
**Claude**
Add MotherDuck to Claude, or manually:
1. Go to **Settings** → **Connectors**
2. Click **Browse Connectors** to find the MotherDuck connector
A browser window should open for authentication. After authenticating, you can double-check the connection by asking "List all my databases on MotherDuck."
**ChatGPT**
If you haven't already, first turn on 'Developer Mode' on chatgpt.com under [Settings → Apps → Advanced](https://chatgpt.com/#settings/Connectors/Advanced).
1. Go to [ChatGPT Settings → Connectors](https://chatgpt.com/#settings/Connectors)
2. Click **Create App**
3. Enter the following:
- **Name:** `MotherDuck`
- **MCP Server URL:** `https://api.motherduck.com/mcp`
4. Click **Create** and authenticate with your MotherDuck account
→ [ChatGPT Connectors Documentation](https://help.openai.com/en/articles/11487775-connectors-in-chatgpt)
**Cursor**
Add MotherDuck to Cursor, or manually:
1. Open **Cursor Settings** (`Cmd/Ctrl + ,`)
2. Navigate to **Tools & MCP**
3. Click **+ New MCP Server**
4. Add the following to the configuration file:
```json
{
"MotherDuck": {
"url": "https://api.motherduck.com/mcp",
"type": "http"
}
}
```
5. Save and click **Connect** to authenticate with your MotherDuck account
→ [Cursor MCP Documentation](https://docs.cursor.com/context/model-context-protocol)
**Claude Code**
1. Run the following command in your terminal:
```bash
claude mcp add MotherDuck --transport http https://api.motherduck.com/mcp
```
2. Run `claude` to start Claude Code
3. Type `/mcp`, select **MotherDuck** from the list, and press **Enter**
4. Select **Authenticate** and confirm the authorization dialog
→ [Claude Code MCP Documentation](https://code.claude.com/docs/en/mcp)
**Other MCP clients**
If you're using **Windsurf**, **Zed**, or another MCP-compatible client, use the following JSON configuration:
```json
{
"mcpServers": {
"MotherDuck": {
"url": "https://api.motherduck.com/mcp",
"type": "http"
}
}
}
```
### Authentication
The MCP server supports OAuth authentication. When you add the server to your AI client, you'll be redirected to authenticate with your MotherDuck account. The server uses your MotherDuck credentials to access your databases with the same permissions as your account.
Some clients also support simple authentication; in that case, you can provide your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#creating-an-access-token) as a Bearer header.
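For simple authentication, a client sends the token in an `Authorization` header. A minimal Python sketch (the helper name and the `MOTHERDUCK_TOKEN` environment variable are illustrative assumptions, not a documented API):

```python
import os

# Sketch only: build the headers a client could send when using simple
# (token-based) authentication instead of OAuth.
def mcp_auth_headers(token: str) -> dict:
    """Carry a MotherDuck access token as a Bearer credential."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

headers = mcp_auth_headers(os.environ.get("MOTHERDUCK_TOKEN", "md_example_token"))
```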
## Server Capabilities
With the MCP server, your agent can:
- Execute read-only SQL queries against your databases
- Explore database schemas, tables, and columns
- Attach and detach [shares](/key-tasks/sharing-data/sharing-overview) that have been shared with you
- Ask questions about DuckDB and MotherDuck documentation
**Example prompts:**
- *"Analyze monthly revenue trends and identify our fastest-growing product categories"*
- *"Compare customer retention rates across different acquisition channels"*
- *"Build a cohort analysis showing user engagement over their first 90 days"*
For clients that [support MCP instructions](https://modelcontextprotocol.io/clients#feature-support-matrix), the server provides detailed [query guidelines](https://app.motherduck.com/assets/docs/mcp_server_instructions.md) to help AI assistants write effective DuckDB SQL.
Learn more about [using the MotherDuck MCP Server](/key-tasks/ai-and-motherduck/mcp-workflows).
## Available Tools
The MCP server provides the following tools for AI assistants:
## Regional Availability
The MotherDuck MCP server is available in all MotherDuck regions. Requests are routed to the MCP server closest to where the client runs:
- **Desktop clients** (Cursor, Claude Code): Routed based on your physical location
- **Web-based agents** (Claude.ai, ChatGPT): Routed based on the agent provider's server location
Your data is always processed in your MotherDuck organization's region. However, query results transit through the MCP server. If you have strict data residency requirements, ensure your MCP client runs within your region.
## Self-Hosted MCP Server
For local DuckDB databases or self-hosted scenarios, MotherDuck maintains an open-source MCP server that you can run locally.
📦 **mcp-server-motherduck** - Open-source MCP server for DuckDB and MotherDuck
The self-hosted server supports:
- **Read-write operations** on local and cloud databases
- Local DuckDB databases (no cloud connection required)
- MotherDuck cloud databases with your access token
- Custom configurations and security settings
## Related Resources
- [MCP User Guide](/key-tasks/ai-and-motherduck/mcp-workflows) - Tips and workflows for using the MotherDuck MCP Server
- [Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents) - Guide to building AI agents with MotherDuck
- [MCP Specification (2025-06-18)](https://modelcontextprotocol.io/specification/2025-06-18) - Official protocol documentation
---
Source: https://motherduck.com/docs/sql-reference/mcp/query
---
sidebar_position: 6
title: query
description: Execute read-only SQL queries against MotherDuck databases
---
# query
Execute read-only DuckDB SQL queries against MotherDuck databases.
## Description
The `query` tool executes SQL queries against your MotherDuck databases. For cross-database queries, use fully qualified names: `database.schema.table` (or `database.table` for the main schema).
This tool is read-only. The following operations are blocked:
- `CREATE TABLE` / `DROP TABLE` / `ALTER TABLE`
- `INSERT INTO` / `MERGE INTO`
- `CREATE DATABASE` / `DROP DATABASE`
- `CREATE SHARE` / `DROP SHARE`
- `CREATE SECRET` / `DROP SECRET`
- `CREATE SNAPSHOT` / `REFRESH DATABASE`
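Conceptually, the guard behaves like a statement blocklist. The sketch below is illustrative only; the MCP server's real enforcement is server-side and more robust (it understands SQL statements rather than pattern-matching text):

```python
import re

# Simplified read-only guard in the spirit of the blocklist above.
BLOCKED_PATTERNS = [
    r"\bCREATE\s+TABLE\b", r"\bDROP\s+TABLE\b", r"\bALTER\s+TABLE\b",
    r"\bINSERT\s+INTO\b", r"\bMERGE\s+INTO\b",
    r"\bCREATE\s+DATABASE\b", r"\bDROP\s+DATABASE\b",
    r"\bCREATE\s+SHARE\b", r"\bDROP\s+SHARE\b",
    r"\bCREATE\s+SECRET\b", r"\bDROP\s+SECRET\b",
    r"\bCREATE\s+SNAPSHOT\b", r"\bREFRESH\s+DATABASE\b",
]

def is_read_only(sql: str) -> bool:
    """Return False if the statement matches any blocked write/DDL pattern."""
    return not any(re.search(p, sql, re.IGNORECASE) for p in BLOCKED_PATTERNS)
```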
## Input Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `database` | string | Yes | Database name to query |
| `sql` | string | Yes | DuckDB SQL query to execute |
## Output Schema
```json
{
"success": boolean,
"columns": string[], // Column names (on success)
"columnTypes": string[], // Column types (on success)
"rows": any[][], // Query results (on success)
"rowCount": number, // Number of rows returned (on success)
"error": string, // Error message (on failure)
"errorType": string // Error type (on failure)
}
```
## Limits
- **Result limit:** Maximum 2,048 rows and 50,000 characters. Results exceeding these limits will be truncated with a truncation message.
- **Query timeout:** 55 seconds, to stay within common client timeouts. Queries exceeding this limit will be cancelled server-side and the tool will respond with an error message.
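The row and character limits can be pictured with a small sketch. This is illustrative only: the actual truncation happens server-side, and how rows are rendered to characters is not specified here:

```python
# Limits as documented above.
MAX_ROWS = 2048
MAX_CHARS = 50_000

def truncate_result(rows, render=str):
    """Keep at most MAX_ROWS rows and MAX_CHARS rendered characters."""
    kept, chars, truncated = [], 0, False
    for row in rows:
        if len(kept) >= MAX_ROWS:
            truncated = True
            break
        chars += len(render(row))
        if chars > MAX_CHARS:
            truncated = True
            break
        kept.append(row)
    return kept, truncated
```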
## Example Usage
**Simple query:**
```
Query the top 5 customers by total orders from my_database
```
The AI assistant will call the tool with:
```json
{
"database": "my_database",
"sql": "SELECT customer_name, COUNT(*) as order_count FROM orders GROUP BY customer_name ORDER BY order_count DESC LIMIT 5"
}
```
**Cross-database query:**
```
Join the users table from auth_db with orders from sales_db
```
```json
{
"database": "auth_db",
"sql": "SELECT u.name, o.order_id, o.amount FROM auth_db.main.users u JOIN sales_db.main.orders o ON u.id = o.user_id LIMIT 100"
}
```
## Success Response Example
```json
{
"success": true,
"columns": ["customer_name", "order_count"],
"columnTypes": ["VARCHAR", "BIGINT"],
"rows": [
["Acme Corp", 150],
["TechStart Inc", 89],
["Global Services", 72]
],
"rowCount": 3
}
```
## Error Response Example
```json
{
"success": false,
"error": "This query type is not permitted in read-only mode. CREATE DATABASE, DROP DATABASE, CREATE SHARE, DROP SHARE, CREATE SECRET, DROP SECRET, CREATE SNAPSHOT, and REFRESH DATABASE are blocked.",
"errorType": "ForbiddenQueryError"
}
```
---
Source: https://motherduck.com/docs/sql-reference/mcp/search-catalog
---
sidebar_position: 5
title: search_catalog
description: Fuzzy search across databases, schemas, tables, columns, and shares
---
# search_catalog
Search the catalog for databases, schemas, tables, columns, and shares using fuzzy matching.
## Description
The `search_catalog` tool performs fuzzy search across your entire MotherDuck catalog. It finds matching objects by name using partial matching, supporting underscores, dots, and multi-word queries. This is useful for discovering available data when you don't know exact names.
The search uses Jaro-Winkler similarity scoring and returns results ranked by relevance. Results are limited per category to provide a balanced view across different object types.
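To make the scoring concrete, here is a reference implementation of Jaro-Winkler similarity. MotherDuck's actual scorer is server-side and may differ in details (e.g., how underscores, dots, and multi-word queries are split):

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matches within a sliding window, penalizing transpositions."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(len(s1), len(s2)) // 2 - 1
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions: matched characters that appear in a different order.
    transpositions, k = 0, 0
    for i in range(len(s1)):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Boost the Jaro score for strings sharing a common prefix (max 4 chars)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)
```

The prefix boost is why `sales_data` ranks higher against the query `sales` than an unrelated name with a few shared letters.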
## Input Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `query` | string | Yes | Search term to find in object names (supports partial matching, underscores, dots) |
| `object_types` | string[] | No | Filter results to specific types: `"database"`, `"schema"`, `"table"`, `"column"`, `"share"` |
## Output Schema
```json
{
"success": boolean,
"query": string, // Search query used
"resultCount": number, // Total results found
"results": [ // Search results (on success)
{
"type": "database" | "schema" | "table" | "column" | "share",
"name": string, // Object name
"fullyQualifiedName": string, // Full path (e.g., "db.schema.table.column")
"database": string | null, // Database (null for shares)
"schema": string | null, // Schema (null for databases/shares)
"table": string | null, // Table (only for columns)
"dataType": string | null, // Data type (columns) or URL (shares)
"comment": string | null, // Object comment if set
"relevanceScore": number // Match score 0-1 (higher is better)
}
],
"error": string, // Error message (on failure)
"errorType": string // Error type (on failure)
}
```
## Result Limits
Results are limited per object type to provide balanced coverage:
- Shares: 10 results
- Columns: 40 results
- Tables: 30 results
- Schemas: 20 results
- Databases: 20 results
Maximum total results: 100
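A sketch of how such per-category capping could work, using the limits listed above (illustrative only; the server's actual selection and tie-breaking logic is not published here):

```python
# Per-category caps and overall cap, as documented above.
CATEGORY_LIMITS = {"share": 10, "column": 40, "table": 30, "schema": 20, "database": 20}
MAX_TOTAL_RESULTS = 100

def cap_results(results):
    """Keep the highest-scoring results per category, up to the overall cap."""
    kept, counts = [], {}
    for r in sorted(results, key=lambda r: r["relevanceScore"], reverse=True):
        t = r["type"]
        if counts.get(t, 0) < CATEGORY_LIMITS.get(t, 0) and len(kept) < MAX_TOTAL_RESULTS:
            counts[t] = counts.get(t, 0) + 1
            kept.append(r)
    return kept
```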
## Example Usage
**Search for tables with "sales" in the name:**
```
Find all tables related to sales data
```
The AI assistant will call the tool with:
```json
{
"query": "sales"
}
```
**Search only for columns:**
```
Find columns containing "email"
```
```json
{
"query": "email",
"object_types": ["column"]
}
```
**Search with qualified name:**
```
Find anything matching analytics.events
```
```json
{
"query": "analytics.events"
}
```
## Success Response Example
```json
{
"success": true,
"query": "sales",
"resultCount": 8,
"results": [
{
"type": "table",
"name": "sales_data",
"fullyQualifiedName": "analytics.main.sales_data",
"database": "analytics",
"schema": "main",
"table": null,
"dataType": null,
"comment": "Daily sales transactions",
"relevanceScore": 0.95
},
{
"type": "table",
"name": "monthly_sales",
"fullyQualifiedName": "analytics.main.monthly_sales",
"database": "analytics",
"schema": "main",
"table": null,
"dataType": null,
"comment": null,
"relevanceScore": 0.89
},
{
"type": "column",
"name": "total_sales",
"fullyQualifiedName": "analytics.main.revenue.total_sales",
"database": "analytics",
"schema": "main",
"table": "revenue",
"dataType": "DECIMAL(18,2)",
"comment": "Total sales amount",
"relevanceScore": 0.87
},
{
"type": "share",
"name": "regional_sales_share",
"fullyQualifiedName": "regional_sales_share",
"database": "regional_sales_share",
"schema": null,
"table": null,
"dataType": "md:_share/org123/regional_sales_share",
"comment": null,
"relevanceScore": 0.82
}
]
}
```
## Error Response Example
```json
{
"success": false,
"error": "Search query cannot be empty",
"errorType": "ValidationError"
}
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/ai-functions
---
sidebar_position: 0
title: AI Functions
---
import DocCardList from '@theme/DocCardList';
# AI Functions
MotherDuck AI functions reference. These functions leverage AI models to perform various tasks including text generation, embeddings, and SQL assistance.
For more practical guidance, see our [AI and MotherDuck](/category/ai-and-motherduck/) how-to guides.
Costs can be found on the [Pricing Page](/about-motherduck/billing/pricing/#ai-function-pricing). Information about regional data processing of AI functions can be found at the bottom of the individual function pages.
## Available Functions
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/embedding
---
sidebar_position: 1
title: EMBEDDING
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
::::warning[Preview Feature]
This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.
::::
## Embedding Function
The `embedding` function allows you to generate vector representations (embeddings) of text directly from SQL. These embeddings capture semantic meaning, enabling powerful [semantic search](/key-tasks/ai-and-motherduck/text-search-in-motherduck/#embedding-based-search) and other natural language processing tasks.
The function uses OpenAI's models: `text-embedding-3-small` (default) with 512 dimensions or `text-embedding-3-large` with 1024 dimensions. Both models support single- and multi-row inputs, enabling batch processing.
The maximum input size is 2,048 characters; larger inputs will be truncated.
Consumption is measured in [AI Units](/about-motherduck/billing/pricing#ai-function-pricing). One AI Unit equates to approximately:
- 60,000 embedding rows with `text-embedding-3-small`
- 12,000 embedding rows with `text-embedding-3-large`
These estimates assume an input size of 1,000 characters.
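Under these approximations, expected consumption is simple arithmetic. A back-of-envelope estimator (actual consumption is measured by MotherDuck, not by this formula):

```python
# Approximate rows per AI Unit at ~1,000 input characters, from the figures above.
ROWS_PER_AI_UNIT = {
    "text-embedding-3-small": 60_000,
    "text-embedding-3-large": 12_000,
}

def estimate_ai_units(row_count: int, model: str = "text-embedding-3-small") -> float:
    """Estimate AI Units consumed by embedding `row_count` rows."""
    return row_count / ROWS_PER_AI_UNIT[model]
```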
### Syntax
```sql
SELECT embedding(my_text_column) FROM my_table; -- returns FLOAT[512] column
```
### Parameters
The `embedding` function accepts parameters using named parameter syntax with the `:=` operator.
| **Parameter** | **Required** | **Description** |
|--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
| `text_input` | Yes | The text to be converted into an embedding vector |
| `model` | No | Model type, either `'text-embedding-3-small'` (default) or `'text-embedding-3-large'` |
### Return Types
The `embedding` function returns different array sizes depending on the model used:
- With `text-embedding-3-small`: Returns `FLOAT[512]`
- With `text-embedding-3-large`: Returns `FLOAT[1024]`
### Examples
#### Basic Embedding Generation
```sql
-- Generate embeddings using the default model (text-embedding-3-small)
SELECT embedding('This is a sample text') AS text_embedding;
-- Generate embeddings using the larger model for higher dimensionality
SELECT embedding('This is a sample text', model:='text-embedding-3-large') AS text_embedding;
```
#### Batch Processing
```sql
-- Generate embeddings for multiple rows at once
SELECT
  title,
  embedding(overview) AS overview_embeddings
FROM kaggle.movies
LIMIT 10;
```
### Use Cases
#### Creating an Embedding Database
This example uses the sample movies dataset from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).
```sql
-- Create a new table with embeddings for the first 100 overview entries
CREATE TABLE my_db.movies AS
SELECT title,
       overview,
       embedding(overview) AS overview_embeddings
FROM kaggle.movies
LIMIT 100;
```
If write access to the source table is available, the embedding column can also be added in place:
```sql
-- Update the existing table to add a new column for embeddings
ALTER TABLE my_db.movies ADD COLUMN overview_embeddings FLOAT[512];
-- Populate the column with embeddings
UPDATE my_db.movies
SET overview_embeddings = embedding(overview);
```
The movies table now contains a new column `overview_embeddings` with vector representations of each movie description:
```sql
SELECT * FROM my_db.movies;
```
| **title** | **overview** | **overview_embeddings** |
| ----------------- | ----------------- |----------------------------------------------------|
| 'Toy Story 3' | 'Led by Woody, Andy's toys live happily in [...]' | [0.023089351132512093, -0.012809964828193188, ...] |
| 'Jumanji' | 'When siblings Judy and Peter discover an [...]' | [-0.005538413766771555, 0.0799209326505661, ...] |
| ... | ... | ... |
#### Semantic Similarity Search
The `array_cosine_similarity` function computes the similarity between two embeddings.
This enables semantic search: retrieving entries that are conceptually or semantically similar to a query, even if they don't share the same keywords.
```sql
-- Find movies similar to "Toy Story" based on semantic similarity
SELECT
title,
overview,
array_cosine_similarity(
embedding('Led by Woody, Andy''s toys live happily [...]'),
overview_embeddings
) AS similarity
FROM kaggle.movies
WHERE title != 'Toy Story'
ORDER BY similarity DESC
LIMIT 5;
```
| **title** | **overview** | **similarity** |
|-----------------|-----------------|-----------------|
|'Toy Story 3'|'Woody, Buzz, and the rest of Andy's toys haven't [...]'|0.7372807860374451|
|'Toy Story 2'|'Andy heads off to Cowboy Camp, leaving his toys [...]'|0.7222828269004822|
|... |... |... |
For advanced similarity search techniques including document chunking, hybrid search, and performance optimization, see the [Embedding-Based Search](/key-tasks/ai-and-motherduck/text-search-in-motherduck/#embedding-based-search) section in the Text Search guide.
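For intuition, the metric behind the `similarity` scores above is plain cosine similarity, which can be sketched in a few lines of Python (in MotherDuck you would call `array_cosine_similarity` directly in SQL):

```python
import math

# Cosine of the angle between two equal-length vectors:
# dot(a, b) / (|a| * |b|), where 1.0 means identical direction.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```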
#### Building a Recommendation System
Embeddings can be used to build content-based recommendation systems:
```sql
-- Create a macro to recommend similar movies
CREATE OR REPLACE MACRO recommend_similar_movies(movie_title) AS TABLE (
  WITH target_embedding AS (
    SELECT embedding(overview) AS emb
    FROM sample_data.kaggle.movies
    WHERE title = movie_title
    LIMIT 1
  )
  SELECT
    m.title AS recommended_title,
    m.overview,
    array_cosine_similarity(t.emb, m.overview_embeddings) AS similarity
  FROM
    sample_data.kaggle.movies m,
    target_embedding t
  WHERE
    m.title != movie_title
  ORDER BY
    similarity DESC
  LIMIT 5
);
-- Use the macro to get recommendations
SELECT * FROM recommend_similar_movies('The Matrix');
```
#### Retrieval-Augmented Generation (RAG)
Embeddings are a key component in building [RAG](https://motherduck.com/blog/search-using-duckdb-part-2/) systems,
which can be combined with the [`prompt` function](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/prompt/#retrieval-augmented-generation-rag) for powerful question-answering capabilities:
```sql
-- Create a reusable macro for question answering
CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
  SELECT question_text AS question, prompt(
    'User asks the following question:\n' || question_text || '\n\n' ||
    'Here is some additional information:\n' ||
    STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' ||
    'Please answer the question based only on the additional information provided.',
    model := 'gpt-4o'
  ) AS response
  FROM (
    SELECT title, overview
    FROM sample_data.kaggle.movies
    ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC
    LIMIT 3
  )
);
-- Use the macro to answer questions
SELECT question, response
FROM ask_question('Can you recommend some good sci-fi movies about AI?');
```
### Security Considerations
When passing free-text arguments from external sources to the embedding function (e.g., user questions in a RAG application), always use prepared statements to prevent SQL injection.
```python
# Using prepared statements in Python
import duckdb

# Connect to MotherDuck (requires a motherduck_token in your environment)
con = duckdb.connect("md:")
user_query = "Led by Woody, Andy's toys live happily [...]"
con.execute("""
    SELECT title, overview, array_cosine_similarity(embedding(?), overview_embeddings) AS similarity
    FROM kaggle.movies
    ORDER BY similarity DESC
    LIMIT 5""", [user_query])
```
### Error Handling
When usage limits have been reached or an unexpected error occurs while computing embeddings,
the function will not fail the entire query but will return `NULL` values for the affected rows.
To check if all embeddings were computed successfully:
```sql
-- Check for NULL values in embedding column
SELECT count(*)
FROM my_db.movies
WHERE overview_embeddings IS NULL AND overview IS NOT NULL;
```
Missing values can be filled in with a separate query:
```sql
-- Fill in missing embedding values
UPDATE my_db.movies
SET overview_embeddings = embedding(overview)
WHERE overview_embeddings IS NULL AND overview IS NOT NULL;
```
### Performance Considerations
- **Batch Processing**: when processing multiple rows, consider using `LIMIT` to control the number of API calls.
- **Model Selection**: use `text-embedding-3-small` for faster, less expensive embeddings when the highest precision isn't critical.
- **Caching**: results are not cached between queries, so consider storing embeddings in tables for repeated use.
- **Dimensionality**: higher dimensions (using `text-embedding-3-large`) provide more precise semantic representation but require more storage and computation time.
### Notes
These capabilities are provided by MotherDuck's integration with Azure OpenAI and inputs to the embedding function will be processed by Azure OpenAI.
For availability and usage limits, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model).
Usage limits are in place to safeguard your spend, not because of throughput limitations. MotherDuck has the capacity to handle high-volume embedding workloads and is always open to working alongside customers to support any type of workload and model requirements.
If you need higher usage limits or have specific requirements, please see our [support page](/troubleshooting/support/).
#### Regional Processing
Requests are processed based on the region of the MotherDuck organization according to the table below. Functions that are not available within the region (no checkmark) will be processed with global compute resources.
| Function | Global | Europe |
|----------|--------|-------------------------|
| `EMBEDDING` (`text-embedding-3-small`) | ✓ | |
| `EMBEDDING` (`text-embedding-3-large`) | ✓ | ✓ |
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/prompt
---
sidebar_position: 1
title: PROMPT
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
::::warning[Preview Feature]
This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.
::::
## Prompt Function
The `prompt` function allows you to interact with Large Language Models (LLMs) directly from SQL. You can generate both free-form text and structured data outputs.
The function supports OpenAI's `gpt-5` series (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`), `gpt-4o-mini` (default), `gpt-4o`, and the `gpt-4.1` series. All models support single- and multi-row inputs, enabling batch processing.
Consumption is measured in [AI Units](/about-motherduck/billing/pricing#ai-function-pricing). When reasoning over table rows, one AI Unit equates to approximately:
- 480 rows with `gpt-4o`
- 8,000 rows with `gpt-4o-mini`
- 600 rows with `gpt-4.1`
- 3,000 rows with `gpt-4.1-mini`
- 12,000 rows with `gpt-4.1-nano`
- 720 rows with `gpt-5`
- 3,600 rows with `gpt-5-mini`
- 18,000 rows with `gpt-5-nano`
These estimates assume an input size of 1,000 characters and response size of 250 characters.
## Syntax
```sql
SELECT prompt('Write a poem about ducks'); -- returns a single cell table with the response
```
### Parameters
| **Parameter** | **Required** | **Description** |
|--------------------|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `prompt_text` | Yes | The text input to send to the model |
| `model` | No | Model type: `'gpt-5'`, `'gpt-5-mini'`, `'gpt-5-nano'`, `'gpt-4o-mini'` (default), `'gpt-4o'`, `'gpt-4.1'`, `'gpt-4.1-mini'`, or `'gpt-4.1-nano'` |
| `temperature` | No | Model temperature value between `0` and `1`, default: `0.1`. Lower values produce more deterministic outputs. **Not supported with GPT-5 models** (use `reasoning_effort` instead). |
| `reasoning_effort` | No | Controls reasoning depth for GPT-5 models only. Valid values: `'minimal'` (default), `'low'`, `'medium'`, `'high'`. Higher effort may improve accuracy for complex tasks. **Only available for GPT-5 series models**. |
| `return_type` | No | Specifies the exact SQL type to return (e.g., `'INTEGER'`, `'BOOLEAN'`, `'DATE'`, `'VARCHAR[]'`, `'STRUCT(name VARCHAR, age INTEGER)'`). Supports most DuckDB types including primitives, arrays, structs, and enums. Mutually exclusive with `struct` and `json_schema`. |
| `struct` | No | Output schema as struct, e.g. `{summary: 'VARCHAR', persons: 'VARCHAR[]'}`. Will result in `STRUCT` output. Mutually exclusive with `return_type` and `json_schema`. |
| `struct_descr` | No | Descriptions for struct fields that will be added to the model's context, e.g. `{summary: 'a 1 sentence summary of the text', persons: 'an array of all persons mentioned in the text'}` |
| `json_schema` | No | A JSON schema that adheres to [OpenAI's structured output guide](https://platform.openai.com/docs/guides/structured-outputs/supported-schemas). Provides more flexibility than the struct/struct_descr parameters. Will result in `JSON` output. Mutually exclusive with `return_type` and `struct`. |
**Note**: The `return_type` and `struct` parameters support enum types for classification tasks. Define enum types first using `CREATE TYPE`, then reference them in the struct schema (e.g., `sentiment: 'sentiment_enum'` or `categories: 'category_enum[]'` for arrays).
### Return Types
The `prompt` function can return different data types depending on the parameters used:
- Without structure parameters: Returns `VARCHAR`
- With `return_type` parameter: Returns the exact SQL type specified (e.g., `INTEGER`, `BOOLEAN`, `DATE`, `VARCHAR[]`, `STRUCT(...)`)
- With `struct` parameter: Returns a `STRUCT` with the specified schema
- With `json_schema` parameter: Returns `JSON`
**Note**: The `return_type`, `struct`, and `json_schema` parameters are mutually exclusive - only one can be used at a time.
## Example Usage
### Basic Text Generation
```sql
-- Call gpt-4o-mini (default) to generate text
SELECT prompt('Write a poem about ducks') AS response;
-- Call gpt-4o with higher temperature for more creative outputs
SELECT prompt('Write a poem about ducks', model:='gpt-4o', temperature:=1) AS response;
```
### Structured Output with Struct
```sql
-- Extract structured information from text using struct parameter
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
    struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'},
    struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review;
```
This returns a `STRUCT` value that can be accessed with dot notation:
```sql
SELECT
    zoo_review.summary,
    zoo_review.favourite_animals,
    zoo_review.star_rating
FROM (
    SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
        struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'},
        struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review
);
```
### Structured Output with JSON Schema
```sql
-- Extract structured information using JSON schema
SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
    json_schema := '{
        "name": "zoo_visit_review",
        "schema": {
            "type": "object",
            "properties": {
                "summary": { "type": "string" },
                "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] },
                "animals_seen": { "type": "array", "items": { "type": "string" } }
            },
            "required": ["summary", "sentiment", "animals_seen"],
            "additionalProperties": false
        },
        "strict": true
    }') AS json_review;
```
This returns a `JSON` value that, if saved, can be accessed using JSON extraction functions:
```sql
SELECT
    json_extract_string(json_review, '$.summary') AS summary,
    json_extract_string(json_review, '$.sentiment') AS sentiment,
    json_extract(json_review, '$.animals_seen') AS animals_seen
FROM (
    SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.',
        json_schema := '{ ... }') AS json_review
);
```
### Typed Output with Return Type
The `return_type` parameter allows you to specify the exact SQL type for the model's response, providing strong typing for single-value extractions:
```sql
-- Extract an integer from text
SELECT prompt('The answer is 42', return_type := 'INTEGER') AS answer;
-- Returns: 42 (as INTEGER type)
-- Extract a boolean
SELECT prompt('Is the sky blue?', return_type := 'BOOLEAN') AS is_blue;
-- Returns: true (as BOOLEAN type)
-- Extract a date
SELECT prompt('When is January 15, 2025?', return_type := 'DATE') AS event_date;
-- Returns: 2025-01-15 (as DATE type)
-- Extract multiple structured fields
SELECT prompt(
    'John is 30 years old and lives in NYC',
    return_type := 'STRUCT(name VARCHAR, age INTEGER, city VARCHAR)'
) AS person_info;
-- Returns: {'name': 'John', 'age': 30, 'city': 'NYC'} (as STRUCT type)
-- Extract arrays
SELECT prompt('List the days of the week', return_type := 'VARCHAR[]') AS weekdays;
-- Returns: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
```
The `return_type` parameter supports most DuckDB types including:
- **Primitives**: `VARCHAR`, `INTEGER`, `BIGINT`, `DOUBLE`, `BOOLEAN`, `DATE`, `TIMESTAMP`, etc.
- **Arrays**: `INTEGER[]`, `VARCHAR[]`, `DOUBLE[]`, etc.
- **Structs**: `STRUCT(field1 TYPE1, field2 TYPE2, ...)`
- **Enums**: Custom enum types created with `CREATE TYPE`
### GPT-5 Reasoning Effort
The `reasoning_effort` parameter controls how much computational effort GPT-5 models spend on reasoning. This is only available for GPT-5 series models (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`):
```sql
-- Use minimal reasoning (fastest, default)
SELECT prompt('What is 2+2?', 'gpt-5-mini',
    reasoning_effort := 'minimal',
    return_type := 'INTEGER') AS result;
-- Use low reasoning for simple tasks
SELECT prompt('Count the letters in "hello"', 'gpt-5-nano',
    reasoning_effort := 'low',
    return_type := 'INTEGER') AS letter_count;
-- Use medium reasoning for moderate complexity
SELECT prompt('Calculate 5 factorial', 'gpt-5-mini',
    reasoning_effort := 'medium',
    return_type := 'INTEGER') AS factorial;
-- Use high reasoning for complex tasks
SELECT prompt('Solve this logic puzzle: ...', 'gpt-5',
    reasoning_effort := 'high') AS solution;
```
**Note**: The `reasoning_effort` parameter cannot be used with non-GPT-5 models, and `temperature` cannot be used with GPT-5 models. The two parameters are mutually exclusive ways of controlling model behavior: use `temperature` for non-GPT-5 models and `reasoning_effort` for GPT-5 models.
## Use Cases
### Text Generation
Using the prompt function to write a poem about ducks:
```sql
-- Prompt LLM to write a poem about ducks
SELECT prompt('Write a poem about ducks') AS response;
```
| **response** |
|------------------------------------------------------------------------------------------------------------------|
| 'Beneath the whispering willow trees, Where ripples dance with wayward breeze, A symphony of quacks arise [...]' |
### Summarization
We use the prompt function to create a one-sentence summary of movie descriptions.
The example is based on the sample movies dataset from [MotherDuck's sample data database](/docs/getting-started/interfaces/client-apis/connect-query-from-python/query-data).
```sql
-- Create a new table with summaries for the first 100 overview texts
CREATE TABLE my_db.movies AS
SELECT title,
       overview,
       prompt('Summarize this movie description in one sentence: ' || overview) AS summary
FROM kaggle.movies
LIMIT 100;
```
If write access to the source table is available, the summary column can also be added in place:
```sql
-- Update the existing table to add new column for summaries
ALTER TABLE my_db.movies ADD COLUMN summary VARCHAR;
-- Populate the column with summaries
UPDATE my_db.movies
SET summary = prompt('Summarize this movie description in one sentence: ' || overview);
```
The movies table now contains a new column `summary` with one-sentence summaries of the movies:
```sql
SELECT title, overview, summary
FROM my_db.movies;
```
| **title** | **overview** | **summary** |
|-----------|----------------------------------------------|------------------------------------------------------|
| Toy Story | Led by Woody, Andy's toys live happily [...] | In "Toy Story," Woody's jealousy of the new [...] |
| Jumanji | When siblings Judy and Peter discover [...] | In this thrilling adventure, siblings Judy and [...] |
| ... | ... | ... |
### Structured Data Extraction
The prompt function can be used to extract structured data from text.
The example is based on the same sample movies dataset from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). This time, we aim to extract structured metadata from each movie's overview description.
We are interested in the main characters mentioned in the descriptions, as well as the movie's genre and a rating of how much action the movie contains, on a scale from 1 (no action) to 5 (a lot of action).
For this, we make use of the `struct` and `struct_descr` parameters, which will result in structured output.
```sql
-- Update the existing table to add new column for structured metadata
ALTER TABLE my_db.movies ADD COLUMN metadata STRUCT(main_characters VARCHAR[], genre VARCHAR, action INTEGER);
-- Populate the column with structured information
UPDATE my_db.movies
SET metadata = prompt(
    overview,
    struct:={main_characters: 'VARCHAR[]', genre: 'VARCHAR', action: 'INTEGER'},
    struct_descr:={
        main_characters: 'an array of the main character names mentioned in the movie description',
        genre: 'the primary genre of the movie based on the description',
        action: 'rate on a scale from 1 (no action) to 5 (high action) how much action the movie contains'
    }
);
```
The resulting `metadata` field is a `STRUCT` that can be accessed as follows:
```sql
SELECT title,
       overview,
       metadata.main_characters,
       metadata.genre,
       metadata.action
FROM my_db.movies;
```
| **title** | **overview** | **metadata.main_characters** | **metadata.genre** | **metadata.action** |
|-----------|----------------------------------------------|-------------------------------------------------------------------------|------------------------------|------------|
| Toy Story | Led by Woody, Andy's toys live happily [...] | ['"Woody"', '"Buzz Lightyear"', '"Andy"', '"Mr. Potato Head"', '"Rex"'] | Animation, Adventure, Comedy | 3 |
| Jumanji | When siblings Judy and Peter discover [...] | ['"Judy Shepherd"', '"Peter Shepherd"', '"Alan Parrish"'] | Adventure, Fantasy, Family | 4 |
| ... | ... | ... | ... | ... |
### Classification with Enums
The `prompt` function supports enum types for classification tasks, ensuring consistent and constrained outputs. This is particularly useful for sentiment analysis, categorization, and other classification scenarios.
#### Sentiment Analysis
```sql
-- Define an enum for sentiment classification
CREATE TYPE sentiment_type AS ENUM ('positive', 'negative', 'neutral');
-- Classify customer reviews
SELECT
    review_text,
    prompt(
        'Classify the sentiment of this review: ' || review_text,
        struct := {sentiment: 'sentiment_type'}
    ).sentiment AS sentiment
FROM (
    VALUES
        ('The product is amazing, I love it!'),
        ('Terrible quality, waste of money.'),
        ('It works fine, nothing special.')
) AS reviews(review_text);
```
This returns:
| **review_text** | **sentiment** |
|-----------------|---------------|
| The product is amazing, I love it! | positive |
| Terrible quality, waste of money. | negative |
| It works fine, nothing special. | neutral |
#### Extracting Multiple Categories
Use enum arrays to extract multiple instances of the same category from text:
```sql
-- Define enums for different types of skills mentioned in text
CREATE TYPE skill_type AS ENUM ('sql', 'python', 'javascript', 'react', 'aws', 'docker', 'git');
CREATE TYPE topic_type AS ENUM ('database', 'frontend', 'backend', 'devops', 'analytics', 'security');
-- Extract skills and topics from job descriptions
SELECT
    description,
    prompt(
        'Extract the technical skills and topics mentioned in this text: ' || description,
        struct := {
            skills: 'skill_type[]',
            topics: 'topic_type[]'
        }
    ) AS extracted
FROM (
    VALUES
        ('Looking for a developer with Python and SQL experience for database analytics work'),
        ('Frontend role using React and JavaScript, plus Git for version control'),
        ('DevOps engineer needed for AWS and Docker deployment automation')
) AS jobs(description);
```
This returns arrays of enum values:
| **description** | **extracted.skills** | **extracted.topics** |
|-----------------|---------------------|---------------------|
| Looking for a developer with Python and SQL experience for database analytics work | ['python', 'sql'] | ['database', 'analytics'] |
| Frontend role using React and JavaScript, plus Git for version control | ['javascript', 'react', 'git'] | ['frontend'] |
| DevOps engineer needed for AWS and Docker deployment automation | ['aws', 'docker'] | ['devops'] |
### Retrieval-Augmented Generation (RAG)
The `prompt` function can be combined with [similarity search on embeddings](/docs/sql-reference/motherduck-sql-reference/ai-functions/embedding/) to build a [RAG](https://motherduck.com/blog/search-using-duckdb-part-2/) pipeline. For advanced retrieval strategies including hybrid search, reranking, and HyDE, see the [Text Search guide](/key-tasks/ai-and-motherduck/text-search-in-motherduck/).
```sql
-- Create a reusable macro for question answering
CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
    SELECT question_text AS question, prompt(
        'User asks the following question:\n' || question_text || '\n\n' ||
        'Here is some additional information:\n' ||
        STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' ||
        'Please answer the question based only on the additional information provided.',
        model := 'gpt-4o'
    ) AS response
    FROM (
        SELECT title, overview
        FROM kaggle.movies
        ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC
        LIMIT 3
    )
);
-- Use the macro to answer questions
SELECT question, response
FROM ask_question('Can you recommend some good sci-fi movies about AI?');
```
This will result in the following output:
| **question** | **response** |
|-----------------------------------------------------|-----------------------------------------------------------------------------------|
| Can you recommend some good sci-fi movies about AI? | Based on the information provided, here are some sci-fi movies about AI that you might enjoy: [...] |
:::warning
When passing free-text arguments from external sources to the prompt function (e.g., user questions in a RAG application), always use prepared statements to prevent SQL injection.
:::
Using prepared statements in [Python](/docs/getting-started/interfaces/client-apis/connect-query-from-python/query-data/):
```python
# First register the macro
con.execute("""
    CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
        -- Macro definition as above
    );
""")

# Then use prepared statements for user input
user_query = "Can you recommend some good sci-fi movies about AI?"
result = con.execute("""
    SELECT response FROM ask_question(?)
""", [user_query]).fetchone()
print(result[0])
```
## Batch Processing
The `prompt` function can process multiple rows in a single query:
```sql
-- Process multiple rows at once
SELECT
    title,
    prompt('Write a tagline for this movie: ' || overview) AS tagline
FROM kaggle.movies
LIMIT 10;
```
## Error Handling
When usage limits have been reached or an unexpected error occurs while computing prompt responses,
the function does not fail the entire query; instead, it returns `NULL` for the affected rows.
To verify that all responses were computed successfully, check the resulting column for `NULL` values.
```sql
-- Check for NULL values in response column
SELECT count(*)
FROM my_db.movies
WHERE response IS NULL AND overview IS NOT NULL;
```
Missing values can be filled in with a separate query:
```sql
-- Fill in missing prompt responses
UPDATE my_db.movies
SET response = prompt('Summarize this movie description in one sentence: ' || overview)
WHERE response IS NULL AND overview IS NOT NULL;
```
## Performance Considerations
- **Batch Processing**: When processing multiple rows, consider using `LIMIT` to control the number of API calls.
- **Model Selection**: Use `gpt-4o-mini` for faster, less expensive responses when high accuracy isn't critical.
- **Caching**: Results are not cached between queries, so consider storing results in tables for repeated use.
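For example, prompt results can be materialized once and then read from the stored column, so repeated queries incur no further model calls. A sketch reusing the sample tables from the examples above (`my_db.taglines` is a hypothetical table name):

```sql
-- Materialize prompt results once
CREATE TABLE my_db.taglines AS
SELECT title,
       prompt('Write a tagline for this movie: ' || overview) AS tagline
FROM kaggle.movies
LIMIT 10;

-- Subsequent queries read the cached column; no additional API calls are made
SELECT title, tagline FROM my_db.taglines;
```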
## Notes
These capabilities are provided by MotherDuck's integration with Azure OpenAI. Inputs to the prompt function will be processed by Azure OpenAI.
For availability and usage limits, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model).
Usage limits are in place to safeguard your spend, not because of throughput limitations. MotherDuck has the capacity to handle high-volume workloads and is always open to working with customers to support any type of workload and model requirements.
If you need higher usage limits or have specific requirements, please see our [support page](/troubleshooting/support/).
### Regional Processing
Requests are processed based on the region of the MotherDuck organization according to the table below. Functions that are not available within the region (no checkmark) will be processed with global compute resources.
| Function | Global | Europe |
|----------|--------|-------------------------|
| `PROMPT` | ✓ | ✓ |
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/index
---
sidebar_position: 0
title: SQL Assistant
---
import DocCardList from '@theme/DocCardList';
# SQL Assistant
Built-in SQL functions that use AI to help you work with SQL. Generate SQL queries, execute read-only questions directly, fix errors, explain queries, and more.
These functions can be useful building blocks for [AI-driven analytics solutions](/key-tasks/ai-and-motherduck/building-analytics-agents/) or used stand-alone on all MotherDuck surfaces (including the CLI).
To use external tools like Claude Desktop or Cursor with MotherDuck, see [MCP Server](/sql-reference/mcp/).
## Available Functions
## Notes
SQL assistant functions operate on your current database by evaluating the schemas and contents of the database. You can specify which tables and columns should be considered using the optional `include_tables` parameter. By default, all tables in the current database are considered.
To point the SQL assistant functions at a specific database, execute the `USE database` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)).
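For instance, a typical session first switches the current database and then calls an assistant function scoped to specific tables. A sketch using `prompt_explain` (the database and table names `my_db`, `orders`, and `customers` are hypothetical):

```sql
-- Make my_db the current database so the assistant reads its schema
USE my_db;
-- Restrict the assistant's context to two tables
CALL prompt_explain('SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id',
    include_tables=['orders', 'customers']);
```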
These capabilities are provided by MotherDuck's integration with Azure OpenAI.
For availability and pricing, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model).
If you have further questions or specific requirements, please see our [support page](/troubleshooting/support/).
### Regional Processing
Requests are processed based on the region of the MotherDuck organization according to the table below. Functions that are not available within the region (no checkmark) will be processed with global compute resources.
| Function | Global | Europe |
|----------|--------|-------------------------|
| SQL Assistant Functions | ✓ | ✓ |
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-explain
---
sidebar_position: 0.9
title: PROMPT_EXPLAIN
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
## Explain a query
The `prompt_explain` table function allows MotherDuck AI to analyze and explain SQL queries in plain English. This feature helps you understand complex queries, verify that a query does what you intend, and learn SQL concepts through practical examples.
::::tip
This function is particularly useful for understanding queries written by others or for automatically documenting your own queries for future reference.
::::
### Syntax
```sql
CALL prompt_explain('<query>', [include_tables=['<table_1>', '<table_2>']]);
```
### Parameters
| **Parameter** | **Required** | **Description** |
|--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
| `query` | Yes | The SQL query to explain |
| `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database). Can also be a dictionary in the format `{'table_name': ['column1', 'column2']}` to specify which columns to include for each table. |
### Example usage
Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).
#### Explaining a complex query
```sql
CALL prompt_explain('
    SELECT COUNT(*) as domain_count,
           SUBSTRING(SPLIT_PART(url, ''//'', 2), 1, POSITION(''/'' IN SPLIT_PART(url, ''//'', 2)) - 1) as domain
    FROM hn.hacker_news
    WHERE url IS NOT NULL GROUP BY domain ORDER BY domain_count DESC LIMIT 10;
');
```
**Output**: when you run a `prompt_explain` query, you'll receive a single-column table with a detailed explanation:
| **explanation** |
|-----------------|
|The query retrieves the top 10 most frequent domains from the `url` field in the `hn.hacker_news` table. It counts the occurrences of each domain by extracting the domain part from the URL (after the '//' and before the next '/'), groups the results by domain, and orders them in descending order of their count. The result includes the count of occurrences (`domain_count`) and the domain name itself (`domain`). |
#### Using dictionary format for include_tables
You can specify which columns to include for each table using the dictionary format:
```sql
CALL prompt_explain('
    SELECT u.id, u.name, COUNT(s.id) AS story_count
    FROM hn.users u
    LEFT JOIN hn.stories s ON u.id = s.user_id
    GROUP BY u.id, u.name
    HAVING COUNT(s.id) > 5
    ORDER BY story_count DESC
    LIMIT 20;
', include_tables={'hn.users': ['id', 'name'], 'hn.stories': ['id', 'user_id']});
```
This approach allows you to focus the explanation on only the relevant columns, which can be helpful for tables with many columns.
#### How it works
The `prompt_explain` function processes your query in several steps:
1. **Parsing**: analyzes the SQL syntax to understand the query structure
2. **Schema analysis**: examines the referenced tables and columns to understand the data model
3. **Operation analysis**: identifies the operations being performed (filtering, joining, aggregating, etc.)
4. **Translation**: converts the technical SQL into a clear, human-readable explanation
5. **Context addition**: adds relevant context about the purpose and expected results of the query
### Best practices
For the best results with `prompt_explain`:
1. **Provide complete queries**: include all parts of the query for the most accurate explanation
2. **Use table aliases consistently**: this helps the function understand table relationships
3. **Specify relevant tables**: use the `include_tables` parameter for large databases
4. **Review explanations**: verify that the explanation matches your understanding of the query
5. **Use for documentation**: save explanations as comments in your code for future reference
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line
---
sidebar_position: 0.9
title: PROMPT_FIX_LINE
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
## Fix your query line-by-line
The `prompt_fix_line` table function allows MotherDuck AI to correct specific lines in your SQL queries that contain syntax or spelling errors. Unlike [`prompt_fixup`](../prompt-fixup), which rewrites the entire query, this function targets only the problematic lines, making it faster and more precise for localized errors.
::::tip
This function is ideal for fixing minor syntax errors in large queries where you want to preserve most of the original query structure and formatting.
::::
### Syntax
```sql
CALL prompt_fix_line('<query>', [error='<error message>'], [include_tables=['<table_1>', '<table_2>']]);
```
### Parameters
| **Parameter** | **Required** | **Description** |
|--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
| `query` | Yes | The SQL query that needs correction |
| `error` | No | The error message from the SQL parser (helps identify the problematic line) |
| `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database) |
### Example usage
Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).
#### Fixing simple syntax errors
```sql
-- Fixing a misspelled keyword with error message
CALL prompt_fix_line('SEELECT COUNT(*) as domain_count FROM hn.hackers', error='
Parser Error: syntax error at or near "SEELECT"
LINE 1: SEELECT COUNT(*) as domain_count FROM h...
        ^');
-- Fixing a typo in a column name
CALL prompt_fix_line('SELECT user_id, titlee, score FROM hn.stories LIMIT 10');
-- Fixing incorrect operator usage
CALL prompt_fix_line('SELECT * FROM hn.stories WHERE score => 100');
```
#### Fixing errors in multi-line queries
```sql
-- Fixing a specific line in a complex query
CALL prompt_fix_line('SELECT
user_id,
COUNT(*) AS post_count,
AVG(scor) AS average_score
FRUM hn.stories
GROUP BY user_id
ORDER BY post_count DESC
LIMIT 10', error='
Parser Error: syntax error at or near "FRUM"
LINE 5: FRUM hn.stories
        ^');
```
### Example output
When you run a `prompt_fix_line` query, you'll receive a two-column table with the line number and corrected content:
| **line_number** | **line_content** |
|-----------------|-------------------------------------------------|
| 1 | SELECT COUNT(*) as domain_count FROM hn.hackers |
For multi-line queries, only the problematic line is corrected:
| **line_number** | **line_content** |
|-----------------|-------------------------------------------------|
| 5 | FROM hn.stories |
#### How it works
The `prompt_fix_line` function processes your query in a targeted way:
1. **Error localization**: uses the error message (if provided) to identify the specific line with issues
2. **Context analysis**: examines surrounding lines to understand the query's structure and intent
3. **Targeted correction**: fixes only the problematic line while preserving the rest of the query
4. **Line replacement**: returns the corrected line with its line number for easy integration
For example, when fixing a syntax error in a single line:
```sql
CALL prompt_fix_line('SEELECT COUNT(*) as domain_count FROM hn.hackers', error='
Parser Error: syntax error at or near "SEELECT"
LINE 1: SEELECT COUNT(*) as domain_count FROM h...
        ^');
```
The function will focus only on line 1, correcting the misspelled keyword:
| **line_number** | **line_content** |
|-----------------|-------------------------------------------------|
| 1 | SELECT COUNT(*) as domain_count FROM hn.hackers |
For multi-line queries with an error on a specific line:
```sql
CALL prompt_fix_line('SELECT
user_id,
COUNT(*) AS post_count,
AVG(scor) AS average_score
FRUM hn.stories
GROUP BY user_id
ORDER BY post_count DESC
LIMIT 10', error='
Parser Error: syntax error at or near "FRUM"
LINE 5: FRUM hn.stories
        ^');
```
The function will only correct line 5, leaving the rest of the query untouched:
| **line_number** | **line_content** |
|-----------------|-------------------------------------------------|
| 5 | FROM hn.stories |
This allows you to apply the fix by replacing just the problematic line in your original query, which is especially valuable for large, complex queries where a complete rewrite would be disruptive.
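Applying the returned fix client-side is a simple string operation on the original query. A minimal sketch in plain Python (`apply_line_fix` is a hypothetical helper, not part of any MotherDuck API; it assumes the 1-based `line_number` and `line_content` values come from the function's output table):

```python
def apply_line_fix(query: str, line_number: int, line_content: str) -> str:
    """Replace the 1-based line `line_number` of `query` with the
    corrected `line_content` returned by prompt_fix_line."""
    lines = query.split("\n")
    lines[line_number - 1] = line_content
    return "\n".join(lines)

broken = """SELECT
user_id,
COUNT(*) AS post_count,
AVG(scor) AS average_score
FRUM hn.stories
GROUP BY user_id"""

# Suppose prompt_fix_line returned line_number=5, line_content='FROM hn.stories'
fixed = apply_line_fix(broken, 5, "FROM hn.stories")
print(fixed)
```

Only the targeted line changes; the rest of the query, including any remaining errors, is left untouched for the next round of fixes.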
When multiple errors exist, you would run `prompt_fix_line` multiple times, fixing one line at a time:
```sql
-- First fix
CALL prompt_fix_line('SELECT
user_id,
COUNT(*) AS post_count,
AVG(scor) AS average_score
FRUM hn.stories
GROUP BY user_id
ORDER BY post_count DESC
LIMIT 10', error='
Parser Error: syntax error at or near "FRUM"
LINE 5: FRUM hn.stories
        ^');
-- After applying the first fix, run again for the second error
CALL prompt_fix_line('SELECT
user_id,
COUNT(*) AS post_count,
AVG(scor) AS average_score
FROM hn.stories
GROUP BY user_id
ORDER BY post_count DESC
LIMIT 10', error='
Parser Error: column "scor" does not exist
LINE 4: AVG(scor) AS average_score
            ^');
```
The second call would return:
| **line_number** | **line_content** |
|-----------------|-------------------------------------------------|
| 4 | AVG(score) AS average_score |
Note: you need to run `prompt_fix_line` multiple times to fix all errors.
### Best practices
For the best results with `prompt_fix_line`:
1. **Include the error message**: the parser error helps pinpoint the exact issue
2. **Preserve query structure**: use this function when you want to maintain most of your original query
3. **Fix one error at a time**: to address multiple errors, run `prompt_fix_line` multiple times
4. **Include context**: provide the complete query, not just the problematic line
5. **Be specific with table names**: use the `include_tables` parameter for large databases
### Limitations
While `prompt_fix_line` is efficient, be aware of these limitations:
- Only fixes syntax errors, not logical errors in query structure
- Relies on an accurate error message to locate the problematic line; a missing or vague error message degrades the output
- May not be able to fix errors that span multiple lines
- Cannot fix issues related to missing tables or columns in your database
- Works best with standard SQL patterns and common table structures
### Troubleshooting
If you're not getting the expected results:
- Ensure you've included the complete error message
- Check that the line numbers in the error message match your query
- For complex errors, try using `prompt_fixup` instead
- If multiple lines need fixing, address them one at a time
- Verify that your database schema is accessible to the function
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fixup
---
sidebar_position: 0.9
title: PROMPT_FIXUP
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
## Fix up your query
The `prompt_fixup` table function allows MotherDuck AI to correct and **completely rewrite** SQL queries that have logical or severe syntactical issues. This powerful feature analyzes your problematic query, identifies issues, and generates a corrected version that follows proper SQL syntax and semantics.
::::tip
For minor syntax errors or typos in large queries, consider using the [`prompt_fix_line`](../prompt-fix-line) function instead, which is faster and more precise as it only rewrites the problematic line.
::::
### Syntax
```sql
CALL prompt_fixup('<query>', [include_tables=['<table_1>', '<table_2>']]);
```
### Parameters
| **Parameter** | **Required** | **Description** |
|--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
| `query` | Yes | The SQL query that needs correction |
| `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database) |
### Example Usage
Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).
#### Fixing syntax errors
```sql
-- Fixing misspelled keywords
CALL prompt_fixup('SEELECT COUNT(*) as domain_count FROM hn.hackers');
-- Fixing incorrect table names
CALL prompt_fixup('SELECT * FROM hn.stories WHERE score > 100 ODER BY score DESC');
-- Fixing missing clauses
CALL prompt_fixup('SELECT AVG(score) hn.hacker_news GROUP score > 10');
```
#### Fixing logical errors
```sql
-- Fixing incorrect join syntax
CALL prompt_fixup('SELECT u.name, s.title FROM hn.users u, hn.stories s WHERE u.id = s.user_id ORDER BY s.score');
-- Fixing aggregation issues
CALL prompt_fixup('SELECT user_id, AVG(score) FROM hn.stories GROUP BY score');
-- Fixing complex query structure
CALL prompt_fixup('SELECT COUNT(*) FROM hn.stories WHERE timestamp > "2020-01-01" AND timestamp < "2020-12-31" WITH score > 100');
```
### Example output
When you run a `prompt_fixup` query, you'll receive a single-column table with the corrected SQL:
| **query** |
|-----------------|
| SELECT COUNT(*) as domain_count FROM hn.hacker_news |
#### How it works
The `prompt_fixup` function processes your query in several steps:
1. **Analysis**: examines your query to identify syntax errors, logical issues, and structural problems
2. **Schema validation**: checks your query against the database schema to ensure table and column references are valid
3. **Correction**: applies fixes based on the identified issues and your likely intent
4. **Rewriting**: generates a complete, corrected version of your query that maintains your original goal
For example, when fixing this query with multiple issues:
```sql
CALL prompt_fixup('SEELECT AVG(scor) FRUM hn.stories WERE timestamp > "2020-01-01" GRUP BY user_id');
```
The function will:
- Correct misspelled keywords (`SEELECT` → `SELECT`, `FRUM` → `FROM`, `WERE` → `WHERE`, `GRUP` → `GROUP`)
- Fix column name typos (`scor` → `score`)
- Ensure proper clause ordering and syntax
Resulting in a properly formatted query:
| **query** |
|-----------------|
| SELECT AVG(score) FROM hn.stories WHERE timestamp > '2020-01-01' GROUP BY user_id |
For logical errors, the process is similar but focuses on semantic correctness:
```sql
CALL prompt_fixup('SELECT user_id, AVG(score) FROM hn.stories GROUP BY score');
```
Will be corrected to:
| **query** |
|-----------------|
| SELECT user_id, AVG(score) FROM hn.stories GROUP BY user_id |
The function recognized that grouping should be by `user_id` (the non-aggregated column) rather than by `score` (which is being averaged).
### Best practices
For the best results with `prompt_fixup`:
1. **Include the entire query**: even if only part of it has issues
2. **Be specific with table names**: use the `include_tables` parameter for large databases
3. **Review the fixed query**: always check that the corrected query matches your intent
4. **Use for complex issues**: prefer this function for logical errors or major syntax problems
5. **Consider alternatives**: for simple typos, `prompt_fix_line` may be more efficient
### Limitations
While `prompt_fixup` is powerful, be aware of these limitations:
- May change query logic if the original intent isn't clear
- Performance depends on the complexity of your query
- Works best with standard SQL patterns and common table structures
- May not preserve exact formatting or comments from the original query
- Cannot fix issues related to missing tables or columns in your database
### Troubleshooting
If you're not getting the expected results:
- Check that you've included all relevant tables in the `include_tables` parameter
- Ensure your database schema is accessible to the function
- For very complex queries, try breaking them into smaller parts
- If the fixed query doesn't match your intent, try providing more context in comments
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-query
---
sidebar_position: 0.1
title: PROMPT_QUERY
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
## Answer questions about your data
The `prompt_query` pragma allows you to ask questions about your data in natural language. This feature translates your plain English questions into SQL, executes the query, and returns the results.
Under the hood, MotherDuck analyzes your database schema, generates appropriate SQL and executes the query on your behalf. This makes data exploration and analysis accessible to users of all technical levels.
For comprehensive guidance on building analytics agents, including best practices and implementation patterns, see [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/).
::::info
The `prompt_query` pragma is a read-only operation and does not allow queries that modify the database.
::::
### Syntax
```sql
PRAGMA prompt_query('<natural language question>')
```
### Parameters
| **Parameter** | **Required** | **Description** |
|--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
| `question` | Yes | The natural language question about your data |
### Example usage
Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).
`prompt_query` can be used to answer both simple and complex questions.
#### Basic questions
```sql
-- Find the most shared domains
PRAGMA prompt_query('what are the top domains being shared on hacker_news?')
-- Analyze posting patterns
PRAGMA prompt_query('what day of the week has the most posts?')
-- Identify trends
PRAGMA prompt_query('how has the number of posts changed over time?')
```
#### Complex questions
```sql
-- Multi-part analysis
PRAGMA prompt_query('what are the top 5 domains with the highest average score, and how many stories were posted from each?')
-- Time-based analysis
PRAGMA prompt_query('compare the average score of posts made during weekdays versus weekends')
-- Conditional filtering
PRAGMA prompt_query('which users have posted the most stories about artificial intelligence or machine learning?')
```
### Best practices
For the best results with `prompt_query`:
1. **Be specific**: clearly state what information you're looking for
2. **Provide context**: include relevant details about the data you want to analyze
3. **Use natural language**: phrase your questions as you would ask a data analyst
4. **Start simple**: begin with straightforward questions and build to more complex ones
5. **Refine iteratively**: if results aren't what you expected, try rephrasing your question
### Limitations
While `prompt_query` is powerful, be aware of these limitations:
- Only performs read operations (`SELECT` queries)
- Works best with well-structured data with clear column names
- Complex statistical analyses will likely require you (or an LLM) to write SQL
- Performance depends on the complexity of your question and database size
- May not understand highly domain-specific terminology without you giving more context
### Troubleshooting
If you're not getting the expected results:
- Check that you're connected to the correct database
- Ensure your question is clear and specific
- Try rephrasing your question using different terms
- For complex analyses, break down into multiple simpler questions
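For instance, the last suggestion can look like this: rather than one compound prompt, ask one question per metric (the prompt wording is illustrative, using the same Hacker News dataset as the examples above):

```sql
-- One question per metric is easier to answer than a compound prompt
PRAGMA prompt_query('which stories about artificial intelligence have the highest scores?')
PRAGMA prompt_query('what day of the week are stories about artificial intelligence posted most often?')
```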
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-schema
---
sidebar_position: 0.9
title: PROMPT_SCHEMA
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
## Describe contents of a database
The `prompt_schema` table function allows MotherDuck AI to analyze and describe the contents of your current database in plain English. This feature helps you understand the structure, purpose, and relationships between tables in your database without having to manually inspect each table's schema.
::::tip
This function is particularly useful when working with unfamiliar databases or when you need a high-level overview of a complex database structure.
::::
### Syntax
```sql
CALL prompt_schema([include_tables=['<table_name>', '<table_name>']]);
```
### Parameters
| **Parameter** | **Required** | **Description** |
|--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------|
| `include_tables` | No | Array of table names to consider for analysis (defaults to all tables in current database) |
### Example usage
Here are several examples using MotherDuck's [sample data database](/getting-started/sample-data-queries/datasets).
#### Describing the entire database
```sql
CALL prompt_schema();
```
#### Example output
When you run a `prompt_schema` query, you'll receive a single-column table with a detailed description:
| **summary** |
|-----------------|
| The database contains tables related to ambient air quality data, Stack Overflow survey results, NYC taxi and service requests, rideshare data, movie information with embeddings, and Hacker News articles, capturing a wide range of information from environmental metrics to user-generated content and transportation data. |
#### Describing specific tables
```sql
CALL prompt_schema(include_tables=['hn.hacker_news', 'hn.stories']);
```
| **summary** |
|-----------------|
| The database contains information about Hacker News posts, including details such as the title, URL, content, author, score, time of posting, type of post, and various identifiers and status flags. |
#### How it works
The `prompt_schema` function processes your database in several steps:
1. **Schema extraction**: examines the structure of tables, including column names and data types
2. **Data sampling**: analyzes sample data to understand the content and purpose of each table
3. **Relationship detection**: identifies potential relationships between tables based on column names and values
4. **Domain recognition**: categorizes tables into domains or subject areas based on their content
5. **Summary generation**: creates a human-readable description of the database structure and purpose
### Best practices
For the best results with `prompt_schema`:
1. **Focus on relevant tables**: use the `include_tables` parameter to analyze specific parts of large databases
2. **Run on updated databases**: ensure your database is up-to-date for the most accurate description
3. **Use for documentation**: save the output as part of your database documentation
4. **Combine with other tools**: use alongside `DESCRIBE` and `SHOW` commands for complete understanding
5. **Share with team members**: use the output to help new team members understand the database structure
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql
---
sidebar_position: 0.8
title: PROMPT_SQL
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Overview
The `prompt_sql` function allows you to generate SQL queries using natural language. Simply describe what you want to analyze in plain English, and MotherDuck AI will translate your request into a valid SQL query based on your database schema and content.
This function helps users who are less familiar with SQL syntax to generate queries and experienced SQL users save time when working with unfamiliar schemas.
For comprehensive guidance on building analytics agents, including best practices and implementation patterns, see [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/).
## Syntax
```sql
CALL prompt_sql('<natural language question>'[, include_tables=<array or map>]);
```
## Parameters
| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `natural language question` | STRING | Your query in plain English describing the data you want to analyze | Yes |
| `include_tables` | ARRAY or MAP | Specifies which tables and columns to consider for query generation. When not provided, all tables in the current database will be considered. | No |
### Include tables parameter
You can specify which tables and columns should be considered during SQL generation using the `include_tables` parameter. This is particularly useful when:
- You want to focus on specific tables in a large database
- You want to improve performance by reducing the schema analysis scope
The parameter accepts three formats:
1. **Array of table names**: include all columns from specified tables:
```sql
include_tables=['table1', 'table2']
```
2. **Map of tables to columns**: include only specific columns from tables:
```sql
include_tables={'table1': ['column1', 'column2'], 'table2': ['column3']}
```
3. **Map with column regex patterns**: include columns matching patterns:
```sql
include_tables={'table1': ['column_prefix.*', 'exact_column']}
```
## Examples
### Basic example
Let's start with a simple example using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news):
```sql
CALL prompt_sql('what are the top domains being shared on hacker_news?');
```
Output:
| **query** |
|-----------------|
| SELECT regexp_extract(url, 'https?://([^/]+)') AS domain, COUNT(*) AS count FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY count DESC; |
### Intermediate example
This example demonstrates how to generate a more complex query with filtering, aggregation, and time-based analysis:
```sql
CALL prompt_sql('Show me the average score of stories posted by each author who has posted at least 5 stories in 2022, sorted by average score');
```
Output:
| **query** |
|-----------------|
| SELECT "by", AVG(score) AS average_score FROM hn.hacker_news WHERE EXTRACT(YEAR FROM "timestamp") = 2022 GROUP BY "by" HAVING COUNT(id) >= 5 ORDER BY average_score; |
### Advanced Example: Multi-table Analysis with Specific Columns
This example shows how to generate a query that focuses on specific columns:
```sql
CALL prompt_sql(
'Find the top 10 users who submitted the most stories with the highest average scores in 2023',
include_tables={
'hn.hacker_news': ['id', 'by', 'score', 'timestamp', 'type', 'title']
}
);
```
Output:
| **query** |
|-----------------|
| SELECT "by", AVG(score) AS avg_score, COUNT(*) AS story_count FROM hn.hacker_news WHERE "type" = 'story' AND EXTRACT(YEAR FROM "timestamp") = 2023 GROUP BY "by" ORDER BY story_count DESC, avg_score DESC LIMIT 10; |
### Expert example
This example demonstrates generating a complex query with subqueries, window functions, and complex logic:
```sql
CALL prompt_sql('For each month in 2022, show me the top 3 users who posted stories with the highest scores, and how their average score compares to the previous month');
```
Output:
| **query** |
|-----------------|
| WITH monthly_scores AS ( SELECT "by" AS user, DATE_TRUNC('month', "timestamp") AS month, AVG(score) AS avg_score FROM hn.hacker_news WHERE "type" = 'story' AND DATE_PART('year', "timestamp") = 2022 GROUP BY user, month ), ... |
## Failure example
This example shows that for some complex queries, the model might not generate a valid SQL query. In that case, the output will be the following error message:
```sql
CALL prompt_sql('Identify the most discussed technology topics in Hacker News stories from the past year based on title keywords, and show which days of the week have the highest engagement for each topic');
```
Output:
| **query** |
|-----------------|
| Invalid Input Error: The AI could not generate valid SQL. Try re-running the command or rephrasing your question. |
To generate a valid SQL query, try breaking the question down into simpler parts.
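As a sketch, the failing compound question above could be split into two simpler `prompt_sql` calls (the prompt wording is illustrative):

```sql
-- Step 1: surface candidate topics from title keywords
CALL prompt_sql('what are the most common keywords in story titles from the past year?');
-- Step 2: drill into weekday engagement for one topic at a time
CALL prompt_sql('for stories with AI in the title, which day of the week has the highest average score?');
```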
## Best practices
1. **Be specific in your questions**: the more specific your natural language query, the more accurate the generated SQL will be.
2. **Start simple and iterate**: begin with basic queries and gradually add complexity as needed.
3. **Use the `include_tables` parameter**: when working with large databases, specify relevant tables to improve performance and accuracy.
4. **Review generated SQL**: always review the generated SQL before executing it, especially for complex queries.
5. **Understand your schema**: knowing your table structure helps you phrase questions that align with available data.
6. **Use domain-specific terminology**: include field names in your questions when possible.
7. **Provide context in your questions**: mention time periods, specific metrics, or business context to get more relevant results.
## Notes
- By default, all tables in the current database are considered. Use the `include_tables` parameter to narrow the scope.
- To target a specific database, first execute the `USE <database_name>` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)).
- The quality of generated SQL depends on the clarity of your natural language question and the quality of your database schema (table and column names).
## Troubleshooting
If you encounter issues with the `prompt_sql` function, consider the following troubleshooting steps:
1. **Check your database schema**: ensure that the tables and columns you're querying are present in the current database.
2. **Be specific in your questions**: the more specific your natural language query, the more accurate the generated SQL will be.
3. **Use the `include_tables` parameter**: when working with large databases, specify relevant tables to improve performance and accuracy.
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/attach
---
sidebar_position: 1
title: ATTACH
---
# ATTACH
The `ATTACH` command in MotherDuck can be used to:
- Attach a local database to access local data
- Re-attach a previously detached MotherDuck database or [share](https://motherduck.com/docs/key-tasks/sharing-data/sharing-overview/)
- Attach a public [share](https://motherduck.com/docs/key-tasks/sharing-data/) created by any MotherDuck user in the same cloud region as your Organization, including users outside your Organization
:::note "Aliasing Databases"
Aliasing behavior in MotherDuck considers your relationship to the database itself. For databases owned by your user, aliases are not allowed. You will see an error that says `Database aliases are not yet supported by MotherDuck in workspace mode` when attempting to do this.
For shares, database aliases are _optional_; the default name is the path segment following `_share/`, i.e. `ATTACH 'md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa';` will have the alias `birds`. When attaching a share, the alias name remains in effect for as long as the database is attached. If the database is detached for any reason, the associated alias name is automatically cleared as well.
:::
## Attaching Databases
### Syntax for Databases
```sql
ATTACH 'md:<database_name>'
```
Parameters:
* `database_name`: The name of the database to which to connect. If omitted, it defaults to 'workspace', which connects to all databases.
:::note
Shares are region-scoped based on your Organization's cloud region. MotherDuck Organizations are currently scoped to a single cloud region that must be chosen at Org creation when signing up.
:::
### Examples of Database Attachment
```sql
-- Connect to a specific MotherDuck database
ATTACH 'md:<database_name>';
-- Connect to all MotherDuck databases in the workspace:
ATTACH 'md:';
-- Connect to a local database
ATTACH '/path/to/my_database.duckdb';
ATTACH 'a_new_local_duckdb';
```
### Important Notes for Database Attachment
* Local database `ATTACH` operations:
* Are temporary and last only for the current session
* Data stays local and isn't uploaded to MotherDuck
* Use file paths instead of share URLs
* MotherDuck database `ATTACH` operations:
* Are persistent, as they attach the database/share to your MotherDuck account.
* Require read/write permissions for the database.
* The database must have been created by the active user and must have already been detached.
* If the remote database was not detached prior to running the `ATTACH` command, using the `md:` prefix will produce an error rather than creating a local database and attaching it.
* For a remote MotherDuck database, the database name is used to indicate what to attach and no alias is permitted.
## Attaching Shares
Sharing in MotherDuck is done through shares. Recipients of a share must `ATTACH` the share, which creates a read-only database. This is a zero-copy, zero-cost, metadata-only operation. [Learn more about sharing in MotherDuck](/key-tasks/sharing-data/sharing-overview.md).
### Syntax for Shares
```sql
ATTACH '<share URL>' [AS <database_name>];
```
### Shorthand Convention for Shares
You may choose to name the new database by using `AS <database_name>`. If you omit this clause, the new database will be given the same name as the source database that's being shared.
### Examples of Attaching Shares
```sql
ATTACH 'md:_share/ducks/0a9a026ec5a55946a9de39851087ed81' AS birds; -- attaches the share as database "birds"
ATTACH 'md:_share/ducks/0a9a026ec5a55946a9de39851087ed81'; -- attaches the share as database "ducks"
```
## Troubleshooting
### Finding Share URL
The Share URL is provided when you [create a share](/key-tasks/sharing-data/sharing-overview/#creating-a-share).
You can also see shares available for attachment by querying the [`MD_INFORMATION_SCHEMA.SHARED_WITH_ME`](/sql-reference/motherduck-sql-reference/md_information_schema/shared_with_me/) view.
### Handling name conflicts between local and remote databases
In case of name conflict between a local database and a remote database, there are two possible paths:
1. Attach the local database with a different name using an alias with `AS`. For instance: `ATTACH 'my_db.db' AS my_new_name`
2. Create a share out of your remote database and attach it with an alias. Shares are read-only.
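A minimal sketch of the first path, assuming both a local file and a MotherDuck database named `analytics` (the names and the `events` table are placeholders):

```sql
ATTACH 'analytics.db' AS analytics_local; -- local database, renamed via alias
ATTACH 'md:analytics';                    -- remote database keeps its original name
-- Fully qualify references to disambiguate:
SELECT count(*) FROM analytics_local.main.events;
SELECT count(*) FROM analytics.main.events;
```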
### Using `SHARES` with [Read-Scaling Replicas](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
A database cannot be attached with a [read_scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#understanding-read-scaling-tokens) token. Databases should first be attached by an account + token with read_write permission, then accessed via read_scaling tokens.
For more information, see the [Attach & Detach](/key-tasks/database-operations/detach-and-reattach-motherduck-database) guide.
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/connection-management/connection-duckdb-id
---
sidebar_position: 3
title: Identify client connection and DuckDB ID
---
:::info
This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.
:::
# Identify client connection and DuckDB ID
`md_current_client_connection_id` and `md_current_client_duckdb_id` are two scalar functions that can be used to identify the current `client_connection_id` and `client_duckdb_id`.
## Syntax
```sql
SELECT md_current_client_connection_id();
SELECT md_current_client_duckdb_id();
```
## Example usage
To [interrupt](/sql-reference/motherduck-sql-reference/connection-management/interrupt-connections.md) all server-side connections that are initiated by the current client DuckDB instance, we can use:
```sql
SELECT md_interrupt_server_connection(client_connection_id)
FROM md_active_server_connections()
WHERE client_duckdb_id = md_current_client_duckdb_id()
AND client_connection_id != md_current_client_connection_id();
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/connection-management/interrupt-connections
---
sidebar_position: 2
title: Interrupting active server connections
---
:::info
This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.
:::
# Interrupting active server connections
The `md_interrupt_server_connection` scalar function can be used to interrupt an active transaction on a server-side connection.
This will interrupt and fail / rollback the active transaction (when executing for example a long-running query), but will allow the connection to be used for future transactions and queries.
The function takes as input the `client_connection_id`, i.e. the unique identifier for the client DuckDB connection that initiated the server connection.
## Syntax
```sql
SELECT md_interrupt_server_connection('<client_connection_id>');
```
## Example usage
Interrupting a specific connection:
```sql
SELECT md_interrupt_server_connection('2601e799-51b3-47a7-a64f-18688d148887');
```
Using `md_interrupt_server_connection` in conjunction with [`md_active_server_connections`](/sql-reference/motherduck-sql-reference/connection-management/monitor-connections.md) to interrupt a subset or all of the currently active connections:
```sql
-- Interrupt all connections where a `CREATE TABLE` query is running
SELECT md_interrupt_server_connection(client_connection_id)
FROM md_active_server_connections()
WHERE starts_with(client_query, 'CREATE TABLE');
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/connection-management/monitor-connections
---
sidebar_position: 1
title: Monitoring active server connections
---
:::info
This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.
:::
# Monitoring active server connections
The `md_active_server_connections` table function can be used to list all server-side connections that have active transactions.
## Syntax
```sql
FROM md_active_server_connections();
```
This returns a list of active server connections, with the following information:
| **column_name** | **column_type** | **description** |
|-------------------------------------|-----------------|----------------------------------------------------------------------------------|
| client_duckdb_id | UUID | Unique identifier for the client DuckDB instance that initiated the connection |
| client_user_agent | VARCHAR | User agent for the client |
| client_duckdb_version | USMALLINT[3] | DuckDB version from the client |
| client_connection_id | UUID | Unique identifier for the client DuckDB connection that initiated the connection |
| client_transaction_id | UBIGINT | Identifier for the transaction within the current connection |
| server_transaction_stage | VARCHAR | Stage the server-side transaction is in |
| server_transaction_elapsed_time | INTERVAL | How long the server-side transaction has been in the current stage |
| client_query_id | UBIGINT | Identifier for the query within the current transaction |
| client_query | VARCHAR | Query string (possibly truncated) |
| server_query_elapsed_time | INTERVAL | How long the query has been running on the server-side |
| server_query_execution_elapsed_time | INTERVAL | How long the query has been executing on the server side |
| server_query_progress | DOUBLE | Progress information (value between 0.0 and 1.0) |
| server_interrupt_elapsed_time | INTERVAL | How long the connection has been interrupted |
| server_interrupt_reason | VARCHAR | Why the connection was interrupted |
| query_total_upload_size | UBIGINT | Data uploaded in Bytes |
| query_total_download_size | UBIGINT | Data downloaded in Bytes |
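The columns above can be combined to inspect problem queries; for example, to list queries that have been running on the server for more than a minute (the threshold is arbitrary):

```sql
SELECT client_connection_id,
       client_query,
       server_query_elapsed_time,
       server_query_progress
FROM md_active_server_connections()
WHERE server_query_elapsed_time > INTERVAL 1 MINUTE
ORDER BY server_query_elapsed_time DESC;
```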
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/copy-database-overwrite
---
sidebar_position: 1
title: COPY FROM DATABASE (OVERWRITE)
---
# COPY FROM DATABASE (OVERWRITE)
The `COPY FROM DATABASE ... (OVERWRITE)` statement makes the target database (`target_db`) contain exactly the same data as the source database (`source_db`) via zero-copy cloning, effectively overwriting it.
This command will wait on any ongoing write transactions on the target database to complete, and prevent new ones from starting while it is in progress.
:::note
The syntax is supported in MotherDuck only, as it operates on a MotherDuck metadata level.
:::
:::tip Zero-copy clone
This command operates purely at the MotherDuck metadata layer, so it is a **zero-copy clone**. The operation is almost instantaneous and does not duplicate any underlying data.
:::
## Syntax
```sql
COPY FROM DATABASE <source_database> (OVERWRITE) [ TO <target_database> ]
```
### Parameters
- `<source_database>`: The name or path of the source database to copy from; it can be either a MotherDuck database or a share.
- `<target_database>`: The name or path of the target database to create; it must be a MotherDuck database that the user owns.
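For example, to reset a staging database so it exactly matches production (both database names are illustrative):

```sql
COPY FROM DATABASE prod_db (OVERWRITE) TO staging_db;
```

Because this is a zero-copy clone, the command completes almost instantly regardless of database size.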
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/copy-database
---
sidebar_position: 1
title: COPY FROM DATABASE
hide_title: true
description: Copy a database from one location to another in MotherDuck
keywords:
- copy database
- clone database
- duplicate database
- database copy
- database clone
---
# COPY FROM DATABASE
The `COPY FROM DATABASE` statement creates a new database from an existing one by copying its structure and data. This command can be used to:
[Interact with MotherDuck Databases](#copy-a-motherduck-database-to-a-motherduck-database):

- Copy MotherDuck databases to MotherDuck databases

[Interact with Local Databases](#interacting-with-local-databases):

- Copy local databases to MotherDuck databases
- Copy MotherDuck databases to local databases
- Copy local databases to local databases
The `COPY FROM DATABASE` command is a multiple statement macro. Multiple statement macros are not supported in Wasm and as a result, this command will not work in the MotherDuck Web UI when copying both schema and data. However, the command works in the MotherDuck Web UI if either the `(DATA)` option is specified or the `(SCHEMA)` option is specified. All other drivers support this command, including the DuckDB CLI.
:::caution No zero-copy clone
`COPY FROM DATABASE` creates a *physical* copy of both the schema and the data. It **does not** use MotherDuck's zero-copy cloning, so the operation may take longer to run and will consume additional storage proportional to the size of the source database. If you want to zero-copy clone a database, use the [`COPY FROM DATABASE (OVERWRITE)`](/sql-reference/motherduck-sql-reference/copy-database-overwrite.md) or the [`CREATE DATABASE ... FROM` statement](/sql-reference/motherduck-sql-reference/create-database.md).
:::
## Syntax
```sql
COPY FROM DATABASE <source_database> TO <target_database> [ (SCHEMA) | (DATA) ]
```
### Parameters
- `<source_database>`: The name or path of the source database to copy from
- `<target_database>`: The name or path of the target database to create
- `(SCHEMA)`: Optional parameter to copy only the database schema without data
- `(DATA)`: Optional parameter to copy only the database data without schema
## Example Usage
### Copy a MotherDuck database to a MotherDuck database
This is the same as [creating a new database from an existing one](/sql-reference/motherduck-sql-reference/create-database.md).
```sql
COPY FROM DATABASE my_db TO my_db_copy;
```
### Interacting with Local Databases
These operations can be done with access to the local filesystem, i.e. inside the DuckDB CLI.
#### Copy a local database to a MotherDuck database
```sql
ATTACH 'md:';
ATTACH 'local_database.db';
CREATE DATABASE md_database;
COPY FROM DATABASE local_database TO md_database;
```
#### Copy a MotherDuck database to a local database
Copying a MotherDuck database to a local database requires some extra steps.
```sql
ATTACH 'md:';
ATTACH 'local_database.db' as local_db;
COPY FROM DATABASE my_db TO local_db;
```
#### Copy a local database to a local database
To copy a local database to a local database, please see the [DuckDB documentation](https://duckdb.org/docs/stable/sql/statements/copy.html#copy-from-database--to).
### Copying the Database Schema
```sql
COPY FROM DATABASE my_db TO my_db_copy (SCHEMA);
```
This will copy the schema of the database, but not the data.
### Copying the Database Data
```sql
COPY FROM DATABASE my_db TO my_db_copy (DATA);
```
This will copy the data of the database, but not the schema.
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-database
---
sidebar_position: 1
title: CREATE DATABASE
hide_title: true
description: Create a database, zero-copy clone from an existing database, or import a local DuckDB file into MotherDuck.
---
## CREATE DATABASE
The `CREATE DATABASE` statement creates a new MotherDuck database. You can also use it to:
- Create a MotherDuck database from a local DuckDB database.
- Create a MotherDuck database from another MotherDuck database or [share](https://motherduck.com/docs/key-tasks/sharing-data/sharing-overview/) via zero-copy clone (without physically copying data).
:::note Copy to local database
To copy a MotherDuck database to a local database, use the [`COPY FROM DATABASE`](/sql-reference/motherduck-sql-reference/copy-database.md) statement.
:::
::::tip Zero-copy clone
When the source is another MotherDuck database or a share, `CREATE DATABASE ... FROM` performs a zero-copy clone. The command completes almost instantly because no data is physically duplicated. When the source is a local file, data is physically copied to MotherDuck.
::::
## Syntax
```sql
CREATE [ OR REPLACE ] DATABASE [ IF NOT EXISTS ] <database_name>
[
    FROM <attached_database_name> |
    FROM '<path_to_local_duckdb_file>' |
    FROM 'md:_share/...' |
    FROM CURRENT_DATABASE() -- Important: this command does not work with attached shares
]
[(DATABASE OPTIONS)];
```
You can also pass the name of an attached share or a share URL as the source, for example `CREATE DATABASE <database_name> FROM my_share` or `CREATE DATABASE <database_name> FROM 'md:_share/...'`.
If the database name already exists, the statement returns an error unless you specify `IF NOT EXISTS`.
Similar to DuckDB table name conventions, database names that start with a number or contain special characters must be double-quoted when used. Example: `CREATE DATABASE "123db"`
Creating a database does not change the active database. Run `USE <database_name>` to switch.
## Database Options
For native storage-backed databases, MotherDuck supports two ways for users to configure historical retention periods. At database creation, users can either set the database type as transient or standard (default). Each database type comes with its own failsafe period.
| Name | Storage Format | Description |
|------------|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| STANDARD | native storage format | Leave blank; any database created in MotherDuck will default to a standard, native storage-backed database. |
| TRANSIENT | native storage format | Specify `TRANSIENT` at database creation to enable it to use transient storage. Refer to [Storage Lifecycle Management](/concepts/Storage-lifecycle#storage-management) for more details. |
| DUCKLAKE | integrated data lake and catalog format | Specify `TYPE DUCKLAKE` at database creation to create a fully managed DuckLake. Refer to the [DuckLake Overview](https://motherduck.com/docs/integrations/file-formats/ducklake/) for more details. |
Once these properties are set, they cannot be changed. However, MotherDuck supports cloning databases and copying data content between transient and standard databases. Note that the following syntax `CREATE DATABASE empty_duck FROM non_empty_duck (TRANSIENT);` is not supported. Please refer to the examples below for supported methods.
## Example Usage
To create an empty database:
```sql
CREATE DATABASE empty_ducks;
```
If the database name already exists, the statement fails unless you use `OR REPLACE` or `IF NOT EXISTS`.
```sql
CREATE DATABASE ducks;
-- Succeeds if 'ducks' does not exist
CREATE DATABASE ducks;
-- Error: Failed to create database: database with name 'ducks' already exists
CREATE OR REPLACE DATABASE ducks; -- Replaces existing 'ducks' with an empty database
CREATE DATABASE IF NOT EXISTS ducks; -- No-op if 'ducks' already exists
```
To *copy* an entire database from your local DuckDB instance into MotherDuck:
```sql
USE ducks_db;
CREATE DATABASE ducks FROM CURRENT_DATABASE();
-- Or alternatively:
CREATE OR REPLACE DATABASE ducks FROM ducks_db;
```
To configure database options in MotherDuck:
```sql
-- Create a transient database:
CREATE DATABASE cloud_db (TRANSIENT);
-- Create a DuckLake:
CREATE DATABASE cloud_ducklake (TYPE DUCKLAKE);
```
To copy content between standard and transient databases:
```sql
-- Option 1: Clone with inherited properties
-- The new database inherits the transient/standard property from the source
CREATE OR REPLACE DATABASE dest_db FROM source_db;
-- Option 2: Copy content while preserving destination properties
-- Replaces the contents of dest_db without changing its configuration
CREATE DATABASE dest_db (TRANSIENT);
COPY FROM DATABASE source_db (OVERWRITE) TO dest_db;
```
:::note Limitations
You cannot copy data from a transient database into a standard database using `COPY FROM DATABASE (OVERWRITE)`, as this would violate the standard database's 7-day failsafe guarantee. To copy transient data into a standard database, use Option 1.
:::
To zero-copy clone a database that is already attached in MotherDuck:
```sql
CREATE DATABASE cloud_db FROM another_cloud_db;
```
To upload a local DuckDB database file:
```sql
CREATE DATABASE flying_ducks FROM './databases/local_ducks.db';
```
To upload an attached local DuckDB database:
```sql
ATTACH './databases/local_ducks.db';
CREATE DATABASE flying_ducks FROM local_ducks;
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-secret
---
sidebar_position: 1
title: CREATE SECRET
description: Create a secret in MotherDuck
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# CREATE SECRET
MotherDuck enables you to store your cloud storage credentials for convenience, using the familiar DuckDB `CREATE SECRET` syntax. See [DuckDB CREATE SECRET documentation](https://duckdb.org/docs/sql/statements/create_secret.html).
Make sure to add the `PERSISTENT` or `IN MOTHERDUCK` keyword to create MotherDuck secrets. Secrets stored in MotherDuck are fully encrypted and scoped to the user who created them. They are not shared with other users in your organization.
:::note
You can use the `PERSISTENT` keyword to create a local file persistent secret in local DuckDB as well. It gets stored unencrypted in the `~/.duckdb/stored_secrets` directory.
When you've loaded the MotherDuck extension, `PERSISTENT` secrets are stored encrypted in MotherDuck. Locally persisted secrets are not impacted.
You can still create locally persisted secrets when using MotherDuck by specifying the secret storage backend: `CREATE SECRET IN LOCAL_FILE`.
:::
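For example, the following sketch writes a secret to local-file storage even while connected to MotherDuck (the secret name and key values are placeholders):

```sql
-- Persist this secret to the local file store (~/.duckdb/stored_secrets)
-- rather than to MotherDuck, even with the MotherDuck extension loaded:
CREATE SECRET my_local_secret IN LOCAL_FILE (
    TYPE S3,
    KEY_ID 'my_key',
    SECRET 'my_secret_value'
);
```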
When using MotherDuck, the statement below creates a cloud-persistent secret stored in MotherDuck.
## Syntax
```sql
CREATE [OR REPLACE] (PERSISTENT SECRET [secret_name] | SECRET [secret_name] IN MOTHERDUCK)
(
    TYPE <secret_type>,
    <secret_parameters>
);
```
## Secret parameters
Supported parameters for S3, GCS, and R2 secrets:
| Name | Description | Secret | Type | Default |
|------|-------------|--------|------|---------|
| ENDPOINT | Specify a custom S3 endpoint | S3, GCS, R2 | STRING | s3.amazonaws.com for S3 |
| KEY_ID | The ID of the key to use | S3, GCS, R2 | STRING | - |
| REGION | The region used for authentication (this should match the region of the bucket to query) | S3, GCS, R2 | STRING | Orgs will default to the region that was chosen at signup: us-east-1 or eu-central-1 |
| SECRET | The secret of the key to use | S3, GCS, R2 | STRING | - |
| SESSION_TOKEN | Optionally, a session token can be passed to use temporary credentials | S3, GCS, R2 | STRING | - |
| URL_COMPATIBILITY_MODE | Can help when URLs contain problematic characters | S3, GCS, R2 | BOOLEAN | true |
| URL_STYLE | Either vhost or path | S3, GCS, R2 | STRING | vhost for S3, path for R2 and GCS |
| USE_SSL | Whether to use HTTPS or HTTP | S3, GCS, R2 | BOOLEAN | true |
| ACCOUNT_ID | The R2 account ID to use for generating the endpoint URL | R2 | STRING | - |
| KMS_KEY_ID | AWS KMS (Key Management Service) key for Server Side Encryption S3 | S3 | STRING | - |
| SCOPE | Scope of secret resolution; In the case of multiple matching secrets, the longest prefix is chosen | S3, GCS, R2 | STRING | - |
::::info
Because of SSL certificate verification requirements, S3 bucket names that contain dots (.) cannot be accessed using vhost style URLs. This is due to AWS's SSL wildcard certificate (*.s3.amazonaws.com) which only validates single-level subdomains. To resolve this SSL issue, use `URL_STYLE path` in your secret.
::::
## Examples
### Manually defined S3 secret
To manually create an S3 secret in MotherDuck:
```sql
CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 's3_access_key',
SECRET 's3_secret_key',
REGION 'us-east-1',
SCOPE 'my-bucket-path'
);
```
This creates a new secret with a default name (for S3, `__default_s3`) and a default scope (i.e., `[s3://, s3n://, s3a://]`) used for path matching explained below.
::::info
DuckDB uses the `SCOPE` parameter to determine which secret to use. When using persistent secrets or public buckets, scoping the secrets is important so that the database uses the correct secret. Imprecise scoping will lead to authentication errors.
Learn more in the [DuckDB documentation](https://duckdb.org/docs/stable/configuration/secrets_manager.html#creating-multiple-secrets-for-the-same-service-type).
::::
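As a sketch of scope matching (the bucket names and credentials here are hypothetical), two secrets of the same type can coexist, and the one with the longest matching scope prefix is used:

```sql
-- Secret used for everything under s3://analytics-bucket/
CREATE SECRET analytics_secret IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'analytics_key',
    SECRET 'analytics_secret_value',
    SCOPE 's3://analytics-bucket'
);
-- More specific secret for a single prefix; it wins on longest-prefix match
CREATE SECRET raw_events_secret IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'events_key',
    SECRET 'events_secret_value',
    SCOPE 's3://analytics-bucket/raw-events'
);
-- Reads from s3://analytics-bucket/raw-events/... use raw_events_secret;
-- other paths in the bucket use analytics_secret.
```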
### Secret providers
MotherDuck supports the same [secret providers](https://duckdb.org/docs/configuration/secrets_manager.html#secret-providers) as DuckDB.
To create a secret by automatically fetching credentials using mechanisms provided by the AWS SDK, see [AWS CREDENTIAL_CHAIN provider](https://duckdb.org/docs/extensions/httpfs/s3api#credential_chain-provider).
To create a secret by automatically fetching credentials using mechanisms provided by the Azure SDK, see [Azure CREDENTIAL_CHAIN provider](https://duckdb.org/docs/extensions/azure#credential_chain-provider).
To create a secret by automatically fetching credentials using mechanisms provided by the Hugging Face CLI, see [Hugging Face CREDENTIAL_CHAIN provider](https://duckdb.org/docs/extensions/httpfs/hugging_face#authentication).
To store a secret from a given secret provider in MotherDuck, specify the `PERSISTENT` or `IN MOTHERDUCK` keyword in addition.
### Provider examples
To store a secret configured through `aws configure`:
```sql
CREATE PERSISTENT SECRET aws_secret (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN
);
```
To store a secret configured through `az configure`:
```sql
CREATE SECRET azure_secret IN MOTHERDUCK (
TYPE AZURE,
PROVIDER CREDENTIAL_CHAIN,
ACCOUNT_NAME 'some-account'
);
```
## Querying with secrets
[Secret scope](https://duckdb.org/docs/configuration/secrets_manager.html#creating-multiple-secrets-for-the-same-service-type) is supported in the same way as in DuckDB to allow multiple secrets of the same type to be stored in MotherDuck.
When there are multiple local (i.e., in-memory or local-file) and remote (i.e., MotherDuck) secrets of the same type, scope matching (the secret scope against the file path) determines which secret is used to open a file. Both local and remote secrets are considered in scope matching.
When multiple secrets match, the secret with the longest matching scope prefix is chosen.
When multiple secrets stored in different secret storages share the same scope (e.g., the default scope if none is specified), the secret is chosen in the following order of precedence: local temp secret > local_file secret > MotherDuck secret.
To see which secret (local or remote) MotherDuck is using, call the DuckDB `which_secret` table function with a path and the secret type.
### Example Usage
To see which secret is used to open a file:
```sql
FROM which_secret('s3://my-bucket/my_dataset.parquet', 's3');
┌───────────────────────┬────────────┬────────────┐
│ name │ persistent │ storage │
│ varchar │ varchar │ varchar │
├───────────────────────┼────────────┼────────────┤
│ __default_s3 │ PERSISTENT │ motherduck │
└───────────────────────┴────────────┴────────────┘
```
## Troubleshooting
If you encounter issues creating or using secrets, check out our troubleshooting guides:
- **[AWS S3 Secrets Troubleshooting](/documentation/troubleshooting/aws-s3-secrets.md)** - Common issues with AWS S3 authentication and credentials
- **[Error Messages](/documentation/troubleshooting/error_messages.md)** - Understanding MotherDuck error messages
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-share
---
sidebar_position: 1
title: CREATE SHARE
---
# CREATE SHARE
The `CREATE SHARE` statement creates a new share from a database. This command is used to share databases with other users. [Learn more about sharing in MotherDuck](/key-tasks/sharing-data/sharing-overview.md).
## Syntax
```sql
CREATE [ OR REPLACE ] SHARE [ IF NOT EXISTS ] [<share_name>] [FROM <database_name>] (
[ACCESS ORGANIZATION | UNRESTRICTED | RESTRICTED],
[VISIBILITY DISCOVERABLE | HIDDEN],
[UPDATE MANUAL | AUTOMATIC]
);
```
If a share with that name already exists, no new share is created and the query returns an error.
The error is silenced when you specify `IF NOT EXISTS`.
This statement returns a share URL of the form `md:_share/<share_name>/<share_id>`.
- If the share is **Hidden**, you must pass this URL to the **data consumer**, who will need to [`ATTACH`](attach.md) the share.
- If the share is **Discoverable**, passing the URL to the **data consumer** is optional.
### _OR REPLACE_ Clause
When you use the `OR REPLACE` clause to create or replace a share named `foo`, the share's **URL changes**. This means that
any clients currently connected to the old share URL will be **disconnected within a few minutes**.
To continue using the share named `foo`, clients must **re-attach** to it using the **new URL** provided by the `CREATE SHARE`
command. The old share URL will no longer be valid.
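A sketch of the resulting re-attach flow, where the new share URL is a placeholder for the one returned by `CREATE SHARE`:

```sql
-- Creator: replacing the share invalidates the old URL and returns a new one
CREATE OR REPLACE SHARE foo;
-- Consumer: detach the stale share, then re-attach using the new URL
DETACH foo;
ATTACH 'md:_share/foo/<new_share_id>';
```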
### _ACCESS_ Clause
You can configure the share's scope of access:
- `ACCESS ORGANIZATION` (default) - only members of your Organization can access the share.
- `ACCESS UNRESTRICTED` - all MotherDuck users in the same cloud region as the share creator can access the share.
- `ACCESS RESTRICTED` - initially, the share owner is the only user with access to the share. Access for other users can be granted or revoked via the [`GRANT`](grant-access.md) and [`REVOKE`](revoke-access.md) commands.
If omitted, defaults to `ACCESS ORGANIZATION`.
:::note
Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
MotherDuck is currently available on AWS in two regions:
- **US East (N. Virginia):** `us-east-1`
- **Europe (Frankfurt):** `eu-central-1`
:::
### _VISIBILITY_ Clause
For Organization-scoped shares **only**, you may choose to make them Discoverable:
- `VISIBILITY DISCOVERABLE` (default) - all members of your Organization will be able to list/find the share in the UI or SQL.
- `VISIBILITY HIDDEN` - the share can only be accessed directly by the share URL, and is not listed to other users. A Share can be hidden only if it has its `ACCESS` set to `RESTRICTED`.
If omitted, Organization-scoped and Restricted shares default to `VISIBILITY DISCOVERABLE`. Unrestricted shares can only be **Hidden**.
### _UPDATE_ Clause
Shares can be automatically or manually updated by the share creator.
- `UPDATE MANUAL` (default) - shares are only updated via the [`UPDATE SHARE`](update-share.md) command.
- `UPDATE AUTOMATIC` - the share is automatically updated when the underlying database changes. Typically changes on the underlying database will automatically be published to the share within at most 5 minutes, after writes have completed. Ongoing overlapping writes may prolong share updating.
If omitted, defaults to `UPDATE MANUAL`.
### Shorthand Convention
- If the database name is omitted, a share will be created from the current/active database.
- If the share name is omitted, the share will be named after the source database.
- If both the database and share names are omitted, the share is created from the current/active database and named after it.
## Example Usage
```sql
-- If ducks_share exists, it will be replaced with a new share.
--A new share URL is returned.
CREATE OR REPLACE SHARE ducks_share;
-- If ducks_share exists, nothing is done. Its existing share URL is returned.
--Otherwise, a new share is created and its share URL is returned.
CREATE SHARE IF NOT EXISTS ducks_share;
```
```sql
USE mydb;
-- Using shorthand: Create a share named 'mydb' from the current database 'mydb'.
-- Defaults: ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE, UPDATE MANUAL
CREATE SHARE;
-- Using shorthand: Create a share named 'db2' from the specified database 'db2'.
-- Defaults: ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE, UPDATE MANUAL
CREATE SHARE FROM db2;
-- Explicitly create a share named 'birds_share' from database 'birds'.
-- Set specific access, visibility, and update behavior.
CREATE SHARE birds_share FROM birds (
ACCESS RESTRICTED, -- Only the share owner has initial access
VISIBILITY HIDDEN, -- Not listed; requires direct URL access
UPDATE AUTOMATIC -- Automatically updates with source DB changes
);
```
:::note
All shares created prior to June 6, 2024 are Unrestricted and Hidden. To make these legacy shares Organization-scoped and Discoverable, you can alter them in the UI or delete and create new shares.
:::
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/create-snapshot
---
sidebar_position: 1
title: CREATE SNAPSHOT
---
# CREATE SNAPSHOT
`CREATE SNAPSHOT OF <database_name>` creates a new read-only snapshot of the specified database for read-scaling Ducklings. Only one database can be snapshotted per command.
In the background, a snapshot of each database is taken every minute to sync changes with read-scaling Ducklings. If write queries are active on a database, the snapshot is skipped to avoid disruption.
To force a snapshot, run `CREATE SNAPSHOT` manually. This command waits for any ongoing write queries on the database to complete and prevents new ones from starting. Once all ongoing write queries have completed, the command creates the snapshot, ensuring that read-scaling connections can access the most up-to-date data.
Read-scaling Ducklings pick up the latest available snapshot every minute. To minimize delay and ensure access to the latest
data, use `CREATE SNAPSHOT` on the writer connection, followed by `REFRESH DATABASE <database_name>` on the read-scaling connection.
```sql
CREATE SNAPSHOT OF <database_name>;
```
Learn more about [REFRESH DATABASES](/sql-reference/motherduck-sql-reference/refresh-database.md).
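A minimal sketch of this workflow, using a hypothetical database `my_db`:

```sql
-- On the read-write (writer) connection: force a snapshot
CREATE SNAPSHOT OF my_db;
-- On the read-scaling connection: pick up the snapshot immediately
REFRESH DATABASE my_db;
```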
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/delete-secret
---
sidebar_position: 1
title: DROP SECRET
---
# DROP SECRET
The DuckDB `DROP SECRET` statement (see the DuckDB [DROP SECRET documentation](https://duckdb.org/docs/sql/statements/create_secret#syntax-for-drop-secret)) works in MotherDuck to delete a secret previously created with the `CREATE SECRET` statement.
# Syntax
```sql
DROP SECRET <secret_name>;
```
When there are multiple secrets with the same name stored in different secret storages (e.g., in memory vs. in MotherDuck), specify either the persistence type or the secret storage type to remove ambiguity when dropping the secret.
# Example Usage
Disambiguate by specifying the storage type when dropping a secret:
```sql
DROP SECRET __default_s3 FROM motherduck;
```
Disambiguate by specifying the persistence type when dropping a secret:
```sql
DROP PERSISTENT SECRET __default_s3;
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/describe-share
---
sidebar_position: 1
title: DESCRIBE SHARE
---
# DESCRIBE SHARE
The `DESCRIBE SHARE` statement is used to get details about a specific share.
:::info
The **creator** of the share object can execute this statement by passing the **share name**. The **receiver** of the share object can execute this statement by passing the **share link**.
:::
# Syntax
```sql
DESCRIBE SHARE [<share_name> | '<share_url>'];
```
# Example
Let's use the `sample_data` database, which is auto-attached for MotherDuck users, to illustrate the command:
```sql
DESCRIBE SHARE 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6';
```
It returns a table with the following columns:
| column_name | column_type | description |
|---------------| ----------- |-----------------------------|
| name | VARCHAR | Name of the share |
| url | VARCHAR | URL of the share |
| source_db_name | VARCHAR | Name of the database shared |
| source_db_uuid | UUID | UUID of the database shared |
| access | VARCHAR | Whether anyone (referred to as UNRESTRICTED) within the same cloud region or only organization members (referred to as ORGANIZATION) can attach to the share by its share_url |
| visibility | VARCHAR | Whether the share is DISCOVERABLE or HIDDEN |
| update | VARCHAR | The share’s update mode (MANUAL vs. AUTOMATIC) |
| created_ts | TIMESTAMP WITH TIME ZONE | The share’s creation time |
You can select specific columns by using the `md_describe_database_share` table function:
```sql
SELECT name, url, source_db_name FROM md_describe_database_share('md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6');
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/detach
---
sidebar_position: 1
title: DETACH
---
# DETACH
The `DETACH` command in MotherDuck can be used to:
- Detach a local DuckDB database
- Detach a remote MotherDuck database
- Detach a shared database
:::info
Database aliases are not persisted when [Shares](/key-tasks/sharing-data/) are detached.
:::
## Detaching Databases
After a database has been created, it can be detached. This will prevent queries from accessing or modifying that database while it is detached. This command may be used on both local DuckDB databases and remote MotherDuck databases.
For a local database, specify the name of the database to detach and not the full path.
In the case of a remote MotherDuck database, the [`ATTACH`](attach.md) command can be used to re-attach at any point, so this is designed to be a convenience feature, not a security feature. `DETACH` can be used to isolate work on specific databases, while preserving the contents of the detached databases.
To see all databases, both attached and detached, use the [`SHOW ALL DATABASES` command](show-databases.md).
### Syntax for Databases
```sql
DETACH <database_name>;
```
### Examples of Database Detachment
```sql
-- Prior command:
-- ATTACH '/path/to/local_database.duckdb';
DETACH local_database;
-- Prior command:
-- CREATE DATABASE my_md_database;
DETACH my_md_database;
```
## Detaching Shares
Attached shares are sticky, and will continue to appear in your catalog unless you explicitly detach them.
### Syntax for Shares
```sql
DETACH <share_name>;
```
### Examples of Share Detachment
```sql
DETACH ducks;
```
For more information, see the [Attach & Detach](/key-tasks/database-operations/detach-and-reattach-motherduck-database) guide.
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/drop-database
---
sidebar_position: 1
title: DROP DATABASE
---
# DROP DATABASE
The `DROP` statement removes a database entry added previously with the `CREATE` command.
By default (or if the `RESTRICT` clause is provided), the entry will not be dropped if there are any existing database shares that were created from it. If the `CASCADE` clause is provided then all the shares that are dependent on the database will be dropped as well.
# Syntax
```sql
DROP DATABASE [IF EXISTS] <database_name> [CASCADE | RESTRICT];
```
# Example usage
```sql
DROP DATABASE ducks; -- drops database named `ducks`
DROP DATABASE ducks CASCADE; -- drops database named `ducks` and all the shares created from `ducks`
```
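As a sketch, the `IF EXISTS` and `RESTRICT` clauses work as follows:

```sql
DROP DATABASE IF EXISTS ducks;  -- no error if `ducks` does not exist
DROP DATABASE ducks RESTRICT;   -- explicit form of the default: fails if any shares were created from `ducks`
```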
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/drop-share
---
sidebar_position: 1
title: DROP SHARE
---
# DROP SHARE
`DROP SHARE` deletes a share and can only be run by the share creator. Users who have attached the share will lose access.
The statement throws an error if the share does not exist; use `DROP SHARE IF EXISTS` to suppress this error.
Shares can be attached with an alias name.
To **drop a share** with `DROP SHARE`, you must reference its **original name**, which you can find by running `LIST SHARES`.
If you want to remove a share from your workspace without deleting it completely, use [`DETACH`](/sql-reference/motherduck-sql-reference/detach.md) instead.
# Syntax
```sql
DROP SHARE "<share_name>";
DROP SHARE IF EXISTS "<share_name>";
```
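For example, assuming a share originally named `ducks_share` (a hypothetical name), look up the original name first, then drop it:

```sql
LIST SHARES;  -- shows each share's original name, not its attach alias
DROP SHARE IF EXISTS "ducks_share";
```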
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/explain-analyze
---
sidebar_position: 1
title: EXPLAIN ANALYZE
---
# EXPLAIN ANALYZE
`EXPLAIN ANALYZE` displays and executes the query plan, showing performance metrics and cardinality information for each operator.
:::note
The [query profiling guide on DuckDB](https://duckdb.org/docs/stable/dev/profiling.html) is a great place to start with this topic.
:::
## Syntax
Prefix any query with `EXPLAIN ANALYZE` to execute it and display the profiled physical plan.
```sql
EXPLAIN ANALYZE <query>;
```
## Example Usage
```sql
EXPLAIN ANALYZE
SELECT 1 AS col
UNION ALL
FROM (SELECT vendorId
FROM sample_data.nyc.taxi LIMIT 1);
```
This returns the analyzed plan, showing which parts of the query run locally and which run remotely on MotherDuck.
```
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE SELECT 1 AS col UNION ALL FROM (SELECT vendorId FROM sample_data.nyc.taxi LIMIT 1)
-- MD_SQL_METADATA: {"source":"hatchling","purpose":"runCellStatement","containsUserSQL":true}
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││ Total Time: 0.0955s ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│ QUERY │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXTENSION │
│ ──────────────────── │
│ md_type: │
│ HYBRID_STATS_COLLECTOR │
│ │
│ 0 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXPLAIN_ANALYZE │
│ ──────────────────── │
│ 0 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXTENSION │
│ ──────────────────── │
│ md_type: │
│ HYBRID_RUNNER │
│ │
│ 0 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXTENSION │
│ ──────────────────── │
│ md_type: │
│ DOWNLOAD_SOURCE │
│ │
│ bridge_id: 1 │
│ │
│ 2 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ EXTENSION │
│ ──────────────────── │
│ md_type: │
│ DOWNLOAD_SINK │
│ │
│ bridge_id: 1 │
│ parallel: false │
│ │
│ 0 Rows │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ UNION │
│ ──────────────────── │
│ 2 Rows ├──────────────┐
│ (0.00s) │ │
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ PROJECTION ││ STREAMING_LIMIT │
│ ──────────────────── ││ ──────────────────── │
│ col ││ │
│ ││ │
│ 1 Rows ││ 1 Rows │
│ (0.00s) ││ (0.00s) │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ DUMMY_SCAN ││ TABLE_SCAN │
│ ──────────────────── ││ ──────────────────── │
│ ││ Table: taxi │
│ ││ Type: Sequential Scan │
│ ││ Projections: VendorID │
│ ││ │
│ 1 Rows ││ 4096 Rows │
│ (0.00s) ││ (0.00s) │
└───────────────────────────┘└───────────────────────────┘
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/explain
---
sidebar_position: 1
title: EXPLAIN
---
# EXPLAIN
The `EXPLAIN` statement shows the physical query plan that will be executed. It displays a tree of operators that will run in sequence to produce the query results. The query optimizer transforms this plan to improve performance.
On MotherDuck queries, `(L)` indicates queries that are executed **locally** and `(R)` indicates the queries are executed **remotely**.
For more detailed query analysis, review the documentation on [`EXPLAIN ANALYZE`](/sql-reference/motherduck-sql-reference/explain-analyze/).
:::note
The [query profiling guide on DuckDB](https://duckdb.org/docs/stable/dev/profiling.html) is a great place to start with this topic.
:::
## Syntax
Prefix any query with `EXPLAIN` to display its physical query plan.
```sql
EXPLAIN <query>;
```
## Example Usage
```sql
EXPLAIN
SELECT 1 AS col
UNION ALL
SELECT 2
```
This will return the physical plan, which executes entirely locally.
```
Physical Plan
┌───────────────────────────┐
│ UNION (L) ├──────────────┐
└─────────────┬─────────────┘ │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ PROJECTION (L) ││ PROJECTION (L) │
│ ──────────────────── ││ ──────────────────── │
│ col ││ 2 │
│ ││ │
│ ~1 Rows ││ ~1 Rows │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│ DUMMY_SCAN (L) ││ DUMMY_SCAN (L) │
└───────────────────────────┘└───────────────────────────┘
```
### `EXPLAIN` in MotherDuck compared to DuckDB
The MotherDuck `EXPLAIN` plan is similar to the DuckDB `EXPLAIN` plan, with two main differences:
* Operations that run locally are marked as (L), and operations running remotely on the MotherDuck service are marked as (R).
* The MotherDuck DuckDB extension adds four new types of custom operators to exchange data between your local DuckDB and the MotherDuck service:
* The **`UploadSink`** operator runs locally and sends data from your local DuckDB to the remote MotherDuck service.
* The **`UploadSource`** operator runs remotely in the DuckDB on the MotherDuck side and consumes the uploaded data.
* The **`DownloadSink`** operator runs remotely on the MotherDuck side and prepares the data to be downloaded by the local DuckDB.
* The **`DownloadSource`** operator runs in your local DuckDB, fetching the data from the MotherDuck service made available via the remote DownloadSink.
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/grant-access
---
sidebar_position: 1
title: GRANT READ ON SHARE
---
# GRANT READ ON SHARE
For restricted shares, use the `GRANT` command to explicitly give users access to the share. After a user has been granted access, they still need to run an `ATTACH` command before they can query the shared database. Only the owner of the share can use the `GRANT` command to give access to others.
## Syntax
```sql
GRANT READ ON SHARE <share_name> TO <username> [, <username>, ...];
```
## Example usage
```sql
-- Owner: gives the user with username 'duck' access to the share 'birds'
GRANT READ ON SHARE birds TO duck;
-- Owner: gives the users with usernames 'user_1' and 'user_2' access to the share 'taxis'
GRANT READ ON SHARE taxis TO user_1, user_2;
```
If a username contains special characters, such as '@', it must be enclosed in double quotes (`"`).
## Complete workflow example
Below is a complete workflow showing how to share a database with a restricted audience and how recipients can access it. We will first create a share and grant access to specific users. Then, we will show how recipients can attach and query the shared database. For more information on each step, refer to the [CREATE SHARE](create-share.md), [LIST SHARES](list-shares.md), and [ATTACH](attach.md) documentation.
### 1. Owner creates a share and grants access
```sql
-- Owner: create a restricted share of database 'analytics'
-- Using CREATE OR REPLACE allows updating the share if it already exists.
-- ACCESS RESTRICTED is required to use GRANT/REVOKE.
CREATE OR REPLACE SHARE analytics_share FROM analytics (ACCESS RESTRICTED);
-- Owner: Grant access to specific users.
-- If a username contains special characters like '@', enclose it in double quotes.
GRANT READ ON SHARE analytics_share TO user_1, "user_2@example-com";
-- Owner retrieves the share URL to provide to recipients.
-- The URL uniquely identifies the share.
LIST SHARES;
-- Example output contains URL like: md:_share/analytics/0a9a026ec5a55946a9de39851087ed81
-- Owner shares the full URL (e.g., 'md:_share/analytics/0a9a026ec5a55946a9de39851087ed81') with the granted users.
```
### 2. Recipient attaches and queries the shared database
```sql
-- Recipient: attach the shared database using the full URL provided by the owner.
-- Using the full URL prevents naming conflicts.
ATTACH 'md:_share/analytics/0a9a026ec5a55946a9de39851087ed81' AS analytics_data;
-- Recipient: switch to the attached database
USE analytics_data;
-- Recipient: query the shared database
SELECT * FROM customer_metrics LIMIT 10;
```
When the share is attached, it creates a read-only reference to the shared database that doesn't consume additional storage for the recipient. Recipients can query the data but cannot modify it.
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/list-secrets
---
sidebar_position: 1
title: LIST SECRETS
---
# LIST SECRETS
Secrets can be listed in the same way as in DuckDB by using the table function `duckdb_secrets()`.
# Syntax
```sql
FROM duckdb_secrets();
```
| name | type | provider | persistent | storage | scope | secret_string |
|-----------------|-------|------------------|------------|------------|-------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| __default_azure | azure | credential_chain | false | memory | [azure://, az://] | name=__default_azure;type=azure;provider=credential_chain;serializable=true;scope=azure://,az://;account_name=some-account |
| __default_s3 | s3 | credential_chain | false | memory | [s3://, s3n://, s3a://] | name=__default_s3;type=s3;provider=credential_chain;serializable=true;scope=s3://,s3n://,s3a://;endpoint=s3.amazonaws.com;key_id=my_key;region=us-east-1;secret=redacted;session_token=redacted |
| __default_r2 | r2 | config | true | motherduck | [r2://] | name=__default_r2;type=r2;provider=config;serializable=true;scope=r2://;endpoint=my_account.r2.cloudflarestorage.com;key_id=my_key;region=us-east-1;s3_url_compatibility_mode=0;secret=redacted;session_token=redacted;url_style=path;use_ssl=1 |
| __default_gcs | gcs | config | true | motherduck | [gcs://, gs://] | name=__default_gcs;type=gcs;provider=config;serializable=true;scope=gcs://,gs://;endpoint=storage.googleapis.com;key_id=my_key;region=us-east-1;s3_url_compatibility_mode=0;secret=redacted;session_token=redacted;url_style=path;use_ssl=1 |
:::note
DuckDB allows you to specify `redact` when listing secrets (it is set to `true` by default). However, MotherDuck secrets are always redacted for security reasons, regardless of the flag's value.
:::
## Example Usage
To inspect specific field(s) in `secret_string`:
```sql
select
name, storage,
list_filter(split(secret_string,';'), x -> starts_with(x, 'region'))[1]
from duckdb_secrets(redact=false) where name='__default_s3';
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/list-shares
---
sidebar_position: 1
title: LIST SHARES
---
# LIST SHARES
The `LIST SHARES` statement lists all shares created by the current user.
It provides the same information as querying the [`MD_INFORMATION_SCHEMA.OWNED_SHARES`](/sql-reference/motherduck-sql-reference/md_information_schema/owned_shares/) view. You can also use the table function `md_list_database_shares()` for a subset of this information.
:::tip
To see shares that have been shared *with* you (by others), query the [`MD_INFORMATION_SCHEMA.SHARED_WITH_ME`](/sql-reference/motherduck-sql-reference/md_information_schema/shared_with_me/) view instead.
:::
## Syntax
```sql
-- Using DDL (lists all owned shares with details)
LIST SHARES;
-- Equivalent to querying the information schema view
SELECT * FROM MD_INFORMATION_SCHEMA.OWNED_SHARES;
-- Using table function (returns a subset of columns)
SELECT name, url, source_db_name FROM md_list_database_shares();
```
## Output
The `LIST SHARES` statement and `SELECT * FROM MD_INFORMATION_SCHEMA.OWNED_SHARES` return a table with the following columns:
| Column Name | Data Type | Value |
| ---------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| NAME | STRING | The name of the share |
| URL | STRING | The `share_url` which can be used to attach the share |
| SOURCE_DB_NAME | STRING | The name of the database where this share was created from |
| SOURCE_DB_UUID | UUID | UUID of the database where this share was created from |
| ACCESS | STRING | Whether anyone within the same cloud region (`UNRESTRICTED`) or only organization members (`ORGANIZATION`) can attach to the share by its `share_url`. `RESTRICTED` shares are hidden from the list. |
| VISIBILITY | STRING | Whether the share is `DISCOVERABLE` or `HIDDEN` |
| UPDATE | STRING | The share's update mode (`MANUAL` vs. `AUTOMATIC`) |
| CREATED_TS | TIMESTAMP | The share's creation time |
:::note
Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
MotherDuck is currently available on AWS in two regions:
- **US East (N. Virginia):** `us-east-1`
- **Europe (Frankfurt):** `eu-central-1`
:::
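Because the information-schema form returns an ordinary table, it composes with regular SQL. A sketch that lists only the shares anyone in the region can attach (using the `ACCESS` values documented above):

```sql
-- List owned shares that are attachable by anyone in the same cloud region
SELECT name, url
FROM MD_INFORMATION_SCHEMA.OWNED_SHARES
WHERE access = 'UNRESTRICTED';
```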
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md-run-parameter
---
sidebar_position: 1
title: MD_RUN parameter
---
# MD_RUN parameter
For certain DuckDB **table functions**, MotherDuck provides an additional parameter, `MD_RUN`, that gives explicit control over where the query is executed.
This parameter is available to the following functions:
- `read_csv()`
- `read_csv_auto()`
- `read_json()`
- `read_json_auto()`
- `read_parquet()` and its alias `parquet_scan()`
To leverage the `MD_RUN` parameter, choose one of the following values:
- `MD_RUN=LOCAL` executes the function in your local DuckDB environment
- `MD_RUN=REMOTE` executes the function in MotherDuck-hosted DuckDB runtimes in the cloud
- `MD_RUN=AUTO` executes all `s3://`, `http://`, and `https://` requests remotely, except those to localhost/127.0.0.1. This is the default option.
The following is an example of invoking this parameter to execute the function remotely:
```sql
SELECT *
FROM read_csv_auto(
'https://github.com/duckdb/duckdb/raw/main/data/csv/ips.csv.gz',
MD_RUN=REMOTE)
LIMIT 100
```
In this example, `MD_RUN=REMOTE` is redundant: omitting it implies `MD_RUN=AUTO`, and since this is a non-local `https://` resource, MotherDuck would automatically choose remote execution anyway.
You can force local execution with `MD_RUN=LOCAL`. Be aware that DuckDB-WASM does not yet support reading compressed files, so in the web browser this particular file would produce an error because it is gzip-compressed (ips.csv**.gz**); it does work locally from the CLI or, for example, a Python notebook.
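For completeness, forcing local execution of the same query looks like this (same file as the example above; note the DuckDB-WASM caveat about compressed files):

```sql
-- Force local execution: the file is fetched and parsed by the local DuckDB
SELECT *
FROM read_csv_auto(
    'https://github.com/duckdb/duckdb/raw/main/data/csv/ips.csv.gz',
    MD_RUN=LOCAL)
LIMIT 100;
```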
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/database_size
---
sidebar_position: 5
title: PRAGMA database_size
---
# Database Size
Database size can be fetched with `PRAGMA database_size;`. The result contains attributes that provide insight into the sizes of MotherDuck databases.
### Alternative invocations
This Pragma can also be invoked as a tabular function with:
- `FROM pragma_database_size();`
## Schema
`PRAGMA database_size` has the following schema:
| Column Name | Data Type | Value |
|-----------------------|-------------|-----------------------------------|
| database_name | VARCHAR | name of the database |
| database_size | VARCHAR | database size, reported in human-readable units (e.g. `153.9 GiB`) |
| block_size | BIGINT | _not currently returned_ |
| used_blocks | BIGINT | _not currently returned_ |
| total_blocks | BIGINT | _not currently returned_ |
| free_blocks | BIGINT | _not currently returned_ |
| wal_size | VARCHAR | _not currently returned_ |
| memory_usage | VARCHAR | _not currently returned_ |
| memory_limit | VARCHAR | _not currently returned_ |
## Example Usage
```sql
PRAGMA database_size;
```
Example result:
| database_name | database_size | block_size | used_blocks | total_blocks | free_blocks | wal_size | memory_usage | memory_limit |
|---------------|---------------|------------|-------------|--------------|-------------|----------|--------------|--------------|
| my_db | 153.9 GiB | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
| another_database | 3.1 TiB | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
In some cases, you may want to filter the dataset, in which case you can use a tabular function in your `FROM` clause. An example is shown below:
```sql
SELECT *
FROM pragma_database_size()
WHERE database_name = 'my_db'
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/databases
---
sidebar_position: 1
title: DATABASES view
---
# DATABASES view
The `MD_INFORMATION_SCHEMA.DATABASES` view provides information about the databases created by the current user.
## Schema
When you query the `MD_INFORMATION_SCHEMA.DATABASES` view, the query results contain one row for each database that the current user created.
The `MD_INFORMATION_SCHEMA.DATABASES` view has the following schema:
| Column Name | Data Type | Value |
|-------------|-----------|-----------------------------------|
| NAME | STRING | The name or alias of the database |
| UUID | STRING | The UUID of the database |
| CREATED_TS | TIMESTAMP | The database’s creation time |
## Example usage
```sql
from MD_INFORMATION_SCHEMA.DATABASES;
```
| name | uuid | created_ts |
|----------------------|--------------------------------------|------------------------|
| tpch_sf1000_template | 2c80b37d-d307-44d8-aff6-33ea2294bd35 | 2024-10-21 14:26:30-04 |
| db1 | 445864c7-5758-42a2-9a5c-2f16620ebc9f | 2024-09-15 09:32:05-04 |
| foo | 4d829a9e-e0da-408c-aafa-0fc50186a588 | 2024-09-03 13:32:10-04 |
| tpch_sf1000 | fc4bf9f4-80d1-4fd9-b6fe-d6d71f40ef42 | 2024-10-21 14:26:30-04 |
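The view composes with ordinary SQL; for instance, a sketch that lists recently created databases (the 30-day window is illustrative):

```sql
-- Databases created in the last 30 days, newest first
SELECT name, created_ts
FROM MD_INFORMATION_SCHEMA.DATABASES
WHERE created_ts > now() - INTERVAL 30 DAYS
ORDER BY created_ts DESC;
```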
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/introduction
---
title: Introduction to MD_INFORMATION_SCHEMA
description: Introduction to MD_INFORMATION_SCHEMA
---
# Introduction to MD_INFORMATION_SCHEMA
The MotherDuck `MD_INFORMATION_SCHEMA` views are read-only, system-defined views that provide metadata information
about your MotherDuck objects.
The following table lists all `MD_INFORMATION_SCHEMA` views that you can query to retrieve metadata information:
| Resource Type | MD_INFORMATION_SCHEMA View |
|-----------------|----------------------------------|
| Database | [DATABASES](databases.md) |
| Database Size | [PRAGMA database_size](database_size.md) |
| Database Shares | [OWNED_SHARES](owned_shares.md) [SHARED_WITH_ME](shared_with_me.md) |
| Storage | [STORAGE_INFO](storage_info.md) _includes history_ |
| Queries | [RECENT_QUERIES](recent_queries.md) |
| Queries | [QUERY_HISTORY](query_history.md) |
## Example usage
```sql
-- list all databases you created
from md_information_schema.databases;
-- list all shares you created
from md_information_schema.owned_shares;
-- select specific columns
select name, url, access, visibility from md_information_schema.owned_shares;
-- set md_information_schema as the current database
use md_information_schema;
-- list all the views in md_information_schema
show tables;
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/owned_shares
---
sidebar_position: 2
title: OWNED_SHARES view
---
# OWNED_SHARES view
The `MD_INFORMATION_SCHEMA.OWNED_SHARES` view provides information about shares created by the current user.
:::note
Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
MotherDuck is currently available on AWS in two regions:
- **US East (N. Virginia):** `us-east-1`
- **Europe (Frankfurt):** `eu-central-1`
:::
## Schema
Querying the `MD_INFORMATION_SCHEMA.OWNED_SHARES` view will return query results that contain one row for each share created by the current user.
The `MD_INFORMATION_SCHEMA.OWNED_SHARES` view has the following schema:
| Column Name | Data Type | Value |
|-------------|-----------|-----------------------------------|
| NAME | STRING | The name of the share |
| URL | STRING | The share_url which can be used to attach the share |
| SOURCE_DB_NAME | STRING | The name of the database where this share was created from |
| SOURCE_DB_UUID | UUID | UUID of the database where this share was created from |
| ACCESS | STRING | Whether anyone (referred to as UNRESTRICTED) or only organization members (referred to as ORGANIZATION) can attach to the share by its share_url |
| GRANTS | STRUCT(username VARCHAR, access VARCHAR)[] | A list of all grants that are active for the share |
| VISIBILITY | STRING | Whether the share is DISCOVERABLE or HIDDEN |
| UPDATE | STRING | The share’s update mode (MANUAL vs. AUTOMATIC) |
| CREATED_TS | TIMESTAMP | The share’s creation time |
## Example usage
```sql
from MD_INFORMATION_SCHEMA.OWNED_SHARES;
select name, url, created_ts from MD_INFORMATION_SCHEMA.OWNED_SHARES;
```
| name | url | source_db_name |
|----------|---------------------------------------------------------|----------------|
| my_share | md:_share/my_share/2ef6b580-2445-4f4f-bce8-c13a85812464 | db1 |
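Because the `GRANTS` column is a list of structs, it can be expanded with DuckDB's `unnest` to get one row per active grant; a minimal sketch:

```sql
-- One row per (share, grantee) pair; recursive := true also
-- expands each grant struct into username/access columns
SELECT name, unnest(grants, recursive := true)
FROM MD_INFORMATION_SCHEMA.OWNED_SHARES;
```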
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/query_history
---
sidebar_position: 1
title: QUERY_HISTORY view
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
::::warning[Preview Feature]
This is a preview feature only available on Business plans. Note that preview features may be operationally incomplete and may offer limited backward compatibility. This feature is only available for organization admins.
::::
# QUERY_HISTORY view
The `MD_INFORMATION_SCHEMA.QUERY_HISTORY` view provides organization admins with a consolidated view of all queries run across their full organization.
## Schema
When you query the `MD_INFORMATION_SCHEMA.QUERY_HISTORY` view, the query results contain one row for each query that was run in the organization. Note that the information in this view is subject to some delay. A more real-time view of ongoing and recently completed queries that have not yet been captured in `QUERY_HISTORY` is provided by the [`MD_INFORMATION_SCHEMA.RECENT_QUERIES`](recent_queries.md) view.
The `MD_INFORMATION_SCHEMA.QUERY_HISTORY` view has the following schema:
| Column Name | Data Type | Value |
|-----------------------|-------------|-----------------------------------|
| QUERY_ID | UUID | A unique ID representing the particular query run |
| QUERY_TEXT | STRING | Query SQL text (up to 100k chars) |
| START_TIME | TIMESTAMPTZ | Start time of the query |
| END_TIME | TIMESTAMPTZ | End time of the query |
| EXECUTION_TIME | INTERVAL | Duration where the query is actively executing |
| WAIT_TIME | INTERVAL | Duration where the query is waiting on resources to become available. For example a query needs to wait because other queries are using all available execution threads, or a query might be waiting on data to become available (in case of data upload). |
| TOTAL_ELAPSED_TIME | INTERVAL | Total duration of the query |
| ERROR_MESSAGE | STRING | Error message, if the query returned an error |
| ERROR_TYPE | STRING | Error type, if the query returned an error |
| USER_AGENT | STRING | User agent of the client |
| USER_NAME | STRING | Identifier for the MotherDuck user in their organization |
| QUERY_NR | UBIGINT | ID of the query within the transaction that ran the query. Number that just increments for each query that is run within a given transaction |
| TRANSACTION_NR | UBIGINT | ID of the transaction that contained the query. Number that just increments for each new transaction on a given connection |
| CONNECTION_ID | UUID | Unique ID for the [client DuckDB connection](../connection-management/connection-duckdb-id.md) where the query was issued |
| DUCKDB_ID | UUID | Unique ID for the [client DuckDB instance](../connection-management/connection-duckdb-id.md) where the query was issued |
| DUCKDB_VERSION | STRING | Client DuckDB version that issued the query |
| INSTANCE_TYPE | STRING | The size of Duckling that the query was run on (Pulse / Standard / Jumbo / Mega / Giga / ...) |
| QUERY_TYPE | STRING | The nature of the query (DDL / DML / QUERY / ...) |
| BYTES_UPLOADED | UBIGINT | Number of bytes uploaded from client to server (relevant for hybrid queries) |
| BYTES_DOWNLOADED | UBIGINT | Number of bytes downloaded from server to client (relevant for hybrid queries) |
| BYTES_SPILLED_TO_DISK | UBIGINT | Total number of bytes [spilled to disk](https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads.html#spilling-to-disk) for "larger than in-memory" workloads |
| DUCKLING_ID | STRING | Identifies the duckling that ran the query. It is composed of the user name and a qualifier (`rw` for read-write ducklings, or `rs.0`, `rs.1`, ... for the respective read-scaling duckling) |
| SESSION_NAME | STRING | The [`session_hint`](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#read-scaling-with-session-hints) that was supplied for connecting to the read-scaling duckling |
Note that the fields `START_TIME`, `END_TIME`, `TOTAL_ELAPSED_TIME`, `ERROR_MESSAGE`, and `ERROR_TYPE` are currently captured only on the server (i.e. when the query starts and ends on the server). In the future they will also incorporate client information, taking the full hybrid context better into account.
## Example usage
```sql
from MD_INFORMATION_SCHEMA.QUERY_HISTORY limit 10;
```
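Since the view is queried with plain SQL, admins can filter and aggregate over it; for example, a sketch that surfaces the most recent failed queries across the organization:

```sql
-- Most recent failed queries, using the ERROR_* columns documented above
SELECT start_time, user_name, error_type, error_message
FROM MD_INFORMATION_SCHEMA.QUERY_HISTORY
WHERE error_message IS NOT NULL
ORDER BY start_time DESC
LIMIT 20;
```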
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries
---
sidebar_position: 1
title: RECENT_QUERIES view
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
::::warning[Preview Feature]
This is a preview feature only available on Business plans. Note that preview features may be operationally incomplete and may offer limited backward compatibility. This feature is only available for organization admins.
::::
# RECENT_QUERIES view
The `MD_INFORMATION_SCHEMA.RECENT_QUERIES` view provides organization admins with a consolidated view of all currently running or recently completed queries across their full organization. It complements the [`MD_INFORMATION_SCHEMA.QUERY_HISTORY`](query_history.md) view, which is geared towards analytics of past events and has some delays, with a more realtime view of recent queries (active and completed) that are not yet exposed in `QUERY_HISTORY`.
## Schema
When you query the `MD_INFORMATION_SCHEMA.RECENT_QUERIES` view, the query results contain one row for each query that is running or has recently completed in the organization. Note that the information in this view is updated every couple of seconds.
The `MD_INFORMATION_SCHEMA.RECENT_QUERIES` view shares the same schema as the [`MD_INFORMATION_SCHEMA.QUERY_HISTORY`](query_history.md) view. The main difference is that for queries that have not completed yet the `END_TIME` field is null, and all other fields represent ongoing metrics that will be updated every few seconds.
Full schema:
| Column Name | Data Type | Value |
|-----------------------|-------------|-----------------------------------|
| QUERY_ID | UUID | A unique ID representing the particular query run |
| QUERY_TEXT | STRING | Query SQL text (up to 100k chars) |
| START_TIME | TIMESTAMPTZ | Start time of the query |
| END_TIME | TIMESTAMPTZ | End time of the query, if the query is completed |
| EXECUTION_TIME | INTERVAL | Duration where the query is actively executing |
| WAIT_TIME | INTERVAL | Duration where the query is waiting on resources to become available. For example a query needs to wait because other queries are using all available execution threads, or a query might be waiting on data to become available (in case of data upload). |
| TOTAL_ELAPSED_TIME | INTERVAL | Total duration of the query (the sum of execution time and wait time) |
| ERROR_MESSAGE | STRING | Error message, if the query returned an error |
| ERROR_TYPE | STRING | Error type, if the query returned an error |
| USER_AGENT | STRING | User agent of the client |
| USER_NAME | STRING | Identifier for the MotherDuck user in their organization |
| QUERY_NR | UBIGINT | ID of the query within the transaction that ran the query. Number that just increments for each query that is run within a given transaction |
| TRANSACTION_NR | UBIGINT | ID of the transaction that contained the query. Number that just increments for each new transaction on a given connection |
| CONNECTION_ID | UUID | Unique ID for the [client DuckDB connection](../connection-management/connection-duckdb-id.md) where the query was issued |
| DUCKDB_ID | UUID | Unique ID for the [client DuckDB instance](../connection-management/connection-duckdb-id.md) where the query was issued |
| DUCKDB_VERSION | STRING | Client DuckDB version that issued the query |
| INSTANCE_TYPE | STRING | The type of duckling that the query was run on (Pulse / Standard / Jumbo / Mega / Giga / ...) |
| QUERY_TYPE | STRING | The nature of the query (DDL / DML / QUERY / ...) |
| BYTES_UPLOADED | UBIGINT | Number of bytes uploaded from client to server (relevant for hybrid queries) |
| BYTES_DOWNLOADED | UBIGINT | Number of bytes downloaded from server to client (relevant for hybrid queries) |
| BYTES_SPILLED_TO_DISK | UBIGINT | Total number of bytes [spilled to disk](https://duckdb.org/docs/stable/guides/performance/how_to_tune_workloads.html#spilling-to-disk) for "larger than in-memory" workloads |
| DUCKLING_ID | STRING | Identifies the duckling that ran the query. It is composed of the user name and a qualifier (`rw` for read-write ducklings, or `rs.0`, `rs.1`, ... for the respective read-scaling duckling) |
| SESSION_NAME | STRING | The [`session_hint`](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#read-scaling-with-session-hints) that was supplied for connecting to the read-scaling duckling |
Note that the fields `START_TIME`, `END_TIME`, `TOTAL_ELAPSED_TIME`, `ERROR_MESSAGE`, and `ERROR_TYPE` are currently captured only on the server (i.e. when the query starts and ends on the server). In the future they will also incorporate client information, taking the full hybrid context better into account.
## Example usage
```sql
from MD_INFORMATION_SCHEMA.RECENT_QUERIES where end_time is null limit 10;
```
## Limitations
The `RECENT_QUERIES` view has been optimized for quickly answering questions such as "Which ongoing queries in my organization are taking a long time to complete?". Query results of this view are therefore limited to 1000 rows, but the view supports filter pushdown, so this limit only applies after some basic filters. Take for example the following query:
```sql
from MD_INFORMATION_SCHEMA.RECENT_QUERIES where end_time is null and total_elapsed_time > '5 seconds';
```
The 1000-row limit only applies after the `end_time` and `total_elapsed_time` filters have been applied, showing at most 1000 queries that are still ongoing and have been running for longer than 5 seconds. To check which filters are pushed down and applied before the row limit, inspect the "filters" section of the `MD_SERVER_RECENT_QUERIES` table scan operator in the [query plan explain output](https://duckdb.org/docs/stable/guides/meta/explain).
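To inspect the plan, prefix the query with `EXPLAIN` (a sketch; the exact plan output varies by DuckDB version):

```sql
-- The scan operator's "filters" section shows which predicates
-- are pushed down and applied before the 1000-row limit
EXPLAIN
FROM MD_INFORMATION_SCHEMA.RECENT_QUERIES
WHERE end_time IS NULL
  AND total_elapsed_time > '5 seconds';
```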
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/shared_with_me
---
sidebar_position: 3
title: SHARED_WITH_ME view
---
# SHARED_WITH_ME view
The `MD_INFORMATION_SCHEMA.SHARED_WITH_ME` view provides information about all shares that the current user can attach to (excluding their own created shares).
:::note
Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is currently scoped to a single cloud region that must be chosen at Org creation when signing up.
MotherDuck is currently available on AWS in two regions:
- **US East (N. Virginia):** `us-east-1`
- **Europe (Frankfurt):** `eu-central-1`
:::
## Schema
When you query the `MD_INFORMATION_SCHEMA.SHARED_WITH_ME` view, the query results contain one row for each share that the current user can discover.
The `MD_INFORMATION_SCHEMA.SHARED_WITH_ME` view has the following schema:
| Column Name | Data Type | Value |
|-------------|-----------|-----------------------------------|
| NAME | STRING | The name of the share |
| URL | STRING | The share_url which can be used to attach the share |
| CREATED_TS | TIMESTAMP | The share’s creation time |
| UPDATE | STRING | The share’s update mode (MANUAL vs. AUTOMATIC) |
| ACCESS | STRING | Whether anyone (referred to as UNRESTRICTED) or only organization members (referred to as ORGANIZATION) can attach to the share by its share_url |
## Example usage
```sql
from MD_INFORMATION_SCHEMA.SHARED_WITH_ME;
```
| name | url | created_ts |update | access |
|-----------------------------|----------------------------------------------------------------------------|------------------------|----------|-----------|
| efs_ia_benchmark | md:_share/efs_ia_benchmark/11597119-359a-4e02-8e5c-bc2b9b8c1908 | 2024-07-16 15:09:11-04 | MANUAL | ORGANIZATION |
| hf_load_test_share | md:_share/hf_load_test_share/f76062a5-f1f5-4024-987d-fc2eea48311b | 2024-07-29 09:07:33-04 | MANUAL | ORGANIZATION |
| mdw | md:_share/mdw/87be4635-fbfd-4d4b-9cae-b629842733d5 | 2024-10-16 17:04:42-04 | AUTOMATIC | ORGANIZATION |
| my_sample_share | md:_share/my_sample_share/c4ee2a30-2fb6-4cb5-b664-9030ae43ffdc | 2024-09-24 17:53:23-04 | MANUAL | ORGANIZATION |
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/md_information_schema/storage_info
---
sidebar_position: 1
title: STORAGE_INFO views
description: View storage footprint, billing, and lifecycle information for all databases in your MotherDuck organization
---
import Admonition from '@theme/Admonition';
# STORAGE_INFO views
:::note Admin Only Feature
This feature can only be used by Admin users.
:::
## Overview
MotherDuck provides two views for inspecting how much storage is used: a current snapshot (`STORAGE_INFO`) and the previous 30 days of history (`STORAGE_INFO_HISTORY`).
The `MD_INFORMATION_SCHEMA.STORAGE_INFO` view provides comprehensive storage information for all databases in your MotherDuck organization. This view is essential for understanding storage usage, billing calculations, and database lifecycle management.
The `MD_INFORMATION_SCHEMA.STORAGE_INFO_HISTORY` view provides storage information for up to the past 30 days of usage.
If you're an admin, you can view your organization's storage breakdown on the [databases page](https://app.motherduck.com/settings/databases). Here, you'll find the total breakdown of current bytes across all your databases, as well as a breakdown for each database. You can also click on a row to get a lifecycle breakdown for a given database.
## Syntax
To see the latest snapshot:
```sql
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO;
```
To see the history:
```sql
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO_HISTORY;
```
## Columns
The `MD_INFORMATION_SCHEMA.STORAGE_INFO` view returns one row for each database in your organization with the following columns:
| Column Name | Data Type | Description |
| ----------------------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------- |
| `database_name` | VARCHAR | Name of the database |
| `database_id` | UUID | Unique ID for the database |
| `created_ts` | TIMESTAMP | Time when the database was created |
| `deleted_ts` | TIMESTAMP | Time when the database was deleted (NULL if not deleted) |
| `username` | VARCHAR | Username of the database owner |
| `active_bytes` | BIGINT | Actively referenced bytes of the database |
| `historical_bytes` | BIGINT | Non-active bytes that are referenced by a share of this database |
| `kept_for_cloned_bytes` | BIGINT | Bytes referenced by other databases (via zero-copy clone) that are no longer referenced by this database as active or historical bytes |
| `failsafe_bytes` | BIGINT | Bytes that are no longer referenced by any database or share |
| `computed_ts` | TIMESTAMP | Time at which active_bytes, historical_bytes, etc. were computed |
The `MD_INFORMATION_SCHEMA.STORAGE_INFO_HISTORY` view has the same schema, but will return results from up to the past 30 days, so a single database might have multiple entries reflecting its state at different points in time.
## Examples
### Basic Usage
View storage information for all databases in your organization:
```sql
-- Get storage information for all databases
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO;
```
**Sample results:**
| database_name | database_id | created_ts | deleted_ts | username | active_bytes | historical_bytes | kept_for_cloned_bytes | failsafe_bytes | computed_ts |
| ------------- | ------------------------------------ | ------------------- | ---------- | -------- | ------------ | ---------------- | --------------------- | -------------- | ---------------------- |
| test_db_1 | 7ed1baf3-e4ff-42c9-a37b-9f683905ce45 | 2024-12-02 20:18:36 | NULL | bob | 82063360 | 0 | 268496896 | 0 | 2025-06-25 16:46:16.37 |
| test_db_2 | fcc16e53-d761-4e40-84ec-15570fab363e | 2024-11-12 03:38:52 | NULL | jim | 274432 | 0 | 0 | 0 | 2025-06-25 16:46:16.37 |
### Filtering and Analysis
Find databases with high storage usage:
```sql
-- Find databases using more than 1GB of active storage
SELECT
database_name,
username,
active_bytes,
ROUND(active_bytes / 1000.0 / 1000.0 / 1000.0, 2) as active_gb
FROM MD_INFORMATION_SCHEMA.STORAGE_INFO
WHERE active_bytes > 1000000000 -- 1GB in bytes
ORDER BY active_bytes DESC;
```
### Storage Cost Analysis
Analyze storage costs by user:
```sql
-- Calculate total storage usage per user
SELECT
username,
COUNT(*) as database_count,
SUM(active_bytes) as total_active_bytes,
SUM(historical_bytes) as total_historical_bytes,
SUM(kept_for_cloned_bytes) as total_cloned_bytes,
SUM(failsafe_bytes) as total_failsafe_bytes
FROM MD_INFORMATION_SCHEMA.STORAGE_INFO
GROUP BY username
ORDER BY total_active_bytes DESC;
```
Analyze active and failsafe storage footprint over the past week for a specific database:
```sql
SELECT active_bytes, failsafe_bytes, computed_ts
FROM MD_INFORMATION_SCHEMA.STORAGE_INFO_HISTORY
WHERE database_name = 'my_database'
  AND computed_ts >= now() - INTERVAL 7 DAYS
ORDER BY computed_ts DESC;
```
## Notes
- **Data Refresh**: Information in this view refreshes every 1-6 hours
- **Retention**: STORAGE_INFO_HISTORY only returns one set of results per day, even though the latest results are re-computed multiple times per day
- **Billing Data**: This view returns the underlying data used to power MotherDuck storage billing
- **Permissions**: You must have appropriate permissions to access this view
- **Organization Scope**: Only shows databases within your current organization
## Troubleshooting
### Common Issues
**Outdated information**
- Data refreshes only happen periodically, so recent changes may not be immediately visible
**Permission denied errors**
- Contact your organization administrator to ensure you have the necessary permissions
- This feature is only for Admins
- Verify your authentication token is valid and has the required scope
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/print-md-token
---
sidebar_position: 1
title: PRINT_MD_TOKEN pragma
---
# PRINT_MD_TOKEN pragma
You can retrieve your MotherDuck authentication token using the `PRINT_MD_TOKEN` pragma.
In CLI or Python, to avoid having to re-authenticate every time, you can store your token as an environment variable; for example, by running `export motherduck_token='xxxx'` in the terminal. Be sure to replace 'xxxx' with your own token!
## Syntax
```sql
PRAGMA PRINT_MD_TOKEN;
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/refresh-database
---
sidebar_position: 1
title: REFRESH DATABASE
---
# REFRESH DATABASE
There are two types of databases that can be refreshed: **database shares** and databases attached to **read-scaling**
connections.
**Read-scaling** connections sync automatically every minute. To ensure maximum freshness, run `CREATE SNAPSHOT` on the
writer, followed by `REFRESH DATABASES` on the reader. This pulls the latest snapshot.
**Database shares** can also be refreshed—either automatically or manually. In this case, the writer uses `UPDATE SHARE`
instead of `CREATE SNAPSHOT`, followed by `REFRESH DATABASES` on the reader.
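For the share case, the owner/reader sequence can be sketched as follows (`my_share` is an illustrative name; see the linked `UPDATE SHARE` and `REFRESH DATABASE` sections for the full syntax):

```sql
-- On the share owner's connection: publish the latest state of the share
UPDATE SHARE my_share;

-- On the reader's connection: pull the refreshed share
REFRESH DATABASE my_share;
```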
## Behavior by connection mode
The behavior of `REFRESH DATABASES` depends on how you connected to MotherDuck:
- **Workspace mode** (`ATTACH 'md:'`): Refreshes all databases in your workspace, including new databases created by other connections (e.g., R/W instances). This allows you to pick up databases that were created after your initial connection.
## Syntax
```sql
REFRESH DATABASES;
REFRESH DATABASE <database_name>;
```
## Examples
```sql
REFRESH DATABASES; -- Refreshes all connected databases and shares
┌──────────────┬──────────────────┬────────────────────────────────┬───────────┐
│     name     │       type       │      fully_qualified_name      │ refreshed │
│   varchar    │     varchar      │            varchar             │  boolean  │
├──────────────┼──────────────────┼────────────────────────────────┼───────────┤
│ <db name>    │ motherduck       │ md:<db name>                   │ false     │
│ <share name> │ motherduck share │ md:_share/<db name>/<share id> │ true      │
└──────────────┴──────────────────┴────────────────────────────────┴───────────┘
REFRESH DATABASE my_db; -- Alternatively, refresh a specific database
┌──────────────┬──────────────────┬────────────────────────────────┬───────────┐
│     name     │       type       │      fully_qualified_name      │ refreshed │
│   varchar    │     varchar      │            varchar             │  boolean  │
├──────────────┼──────────────────┼────────────────────────────────┼───────────┤
│ <share name> │ motherduck share │ md:_share/<db name>/<share id> │ false     │
└──────────────┴──────────────────┴────────────────────────────────┴───────────┘
```
## Related Content
- [CREATE SNAPSHOT](/sql-reference/motherduck-sql-reference/create-snapshot.md)
- [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md)
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/revoke-access
---
sidebar_position: 1
title: REVOKE READ ON SHARE
---
# REVOKE READ ON SHARE
For restricted shares, use the `REVOKE` command to explicitly remove share access from users who have an existing `GRANT`. After running a `REVOKE` command, there may be a delay of a few minutes before access is fully removed if a user is currently querying the share. Only the owner of the share can use the `REVOKE` command to remove access from others. `GRANT` and `REVOKE` do not apply to `UNRESTRICTED` shares.
## Syntax
```sql
REVOKE READ ON SHARE <share name> FROM <user1> [, <user2>, ...];
```
If a username contains special characters, such as '@', it must be enclosed in double quotes (`"`).
## Example usage
```sql
-- revokes access to the share 'birds' from the user with username 'duck'
REVOKE READ ON SHARE birds FROM duck;
-- revokes access to the share 'taxis' from the users with usernames 'usr1' and 'usr2'
REVOKE READ ON SHARE taxis FROM usr1, usr2;
-- revokes access from a user whose username contains special characters
REVOKE READ ON SHARE sensitive_data FROM "user@example-com";
```
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/show-databases
---
sidebar_position: 1
title: SHOW ALL DATABASES
---
# SHOW ALL DATABASES
The `SHOW ALL DATABASES` statement lists all databases, whether they are MotherDuck databases, local DuckDB databases, or MotherDuck shares.
It returns:
* `alias` (`db_name` or `share_alias`)
* `is_attached`: a flag indicating whether the database is attached
* `type` (e.g. DuckDB, MotherDuck, MotherDuck share)
* `fully_qualified_name` (empty, `md:_share/<db name>/<share id>`, or `md:<db name>`)
To query specific columns, you can use the table function `MD_ALL_DATABASES()`.
# Syntax
```sql
SHOW ALL DATABASES;
```
or using the table function
```sql
SELECT * FROM MD_ALL_DATABASES();
```
# Example usage
```sql
SHOW ALL DATABASES;
```
Example output:
```bash
┌──────────────────────────────────────────┬─────────────┬──────────────────┬─────────────────────────────────────────────────────────────────────────────────────────┐
│ alias │ is_attached │ type │ fully_qualified_name │
│ varchar │ boolean │ varchar │ varchar │
├──────────────────────────────────────────┼─────────────┼──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │ true │ motherduck │ md:TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │
│ TEST_DB_62b53d968a4f4b6682ed117a7251b814 │ true │ motherduck │ md:TEST_DB_62b53d968a4f4b6682ed117a7251b814 │
│ base │ false │ motherduck │ md:base │
│ base2 │ true │ motherduck │ md:base2 │
│ db1 │ false │ motherduck │ md:db1 │
│ integration_test_001 │ false │ motherduck │ md:integration_test_001 │
│ my_db │ true │ motherduck │ md:my_db │
│ my_share_1 │ true │ motherduck share │ md:_share/integration_test_001/18d6dbdb-e130-4cdf-97c4-60782ed5972b │
│ sample_data │ false │ motherduck │ md:sample_data │
│ source_db │ true │ motherduck │ md:source_db │
│ test_db_115 │ false │ motherduck │ md:test_db_115 │
│ test_db_28d │ false │ motherduck │ md:test_db_28d │
│ test_db_cc9 │ false │ motherduck │ md:test_db_cc9 │
│ test_share │ true │ motherduck share │ md:_share/source_db/b990b424-2f9a-477a-b216-680a22c3f43f │
│ test_share_002 │ true │ motherduck share │ md:_share/integration_test_001/06cc5500-e49a-4f62-9203-105e89a4b8ae │
├──────────────────────────────────────────┴─────────────┴──────────────────┴─────────────────────────────────────────────────────────────────────────────────────────┤
│ 15 rows (15 shown) 4 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
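Because `MD_ALL_DATABASES()` is a table function, its output can be filtered and projected like any other table. For example, to list only the currently attached databases:

```sql
-- Show only attached databases, using the columns returned by MD_ALL_DATABASES()
SELECT alias, type
FROM MD_ALL_DATABASES()
WHERE is_attached;
```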
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/temporary-tables
---
sidebar_position: 1
title: TEMPORARY TABLES
---
# TEMPORARY TABLES
The `CREATE TEMPORARY TABLE` statement creates a new temporary table from a SQL query. This command creates a local temporary table. [More information can be found in the DuckDB documentation.](https://duckdb.org/docs/sql/statements/create_table.html#temporary-tables)
## Syntax
```sql
CREATE [ OR REPLACE ] TEMPORARY TABLE [ IF NOT EXISTS ] <table name> AS ...
```
Temporary tables can be created traditionally with column names and types, or with `CREATE TABLE ... AS SELECT` (CTAS).
### Shorthand Convention
The word `TEMP` can be used interchangeably with `TEMPORARY`.
## Example Usage
```sql
CREATE TEMPORARY TABLE flights AS
FROM 'https://duckdb.org/data/flights.csv';
```
This creates a local table populated with data from the `flights.csv` file hosted on duckdb.org.
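Temporary tables can also be created with explicit column definitions rather than CTAS; the table and column names below are illustrative:

```sql
-- TEMP is shorthand for TEMPORARY
CREATE TEMP TABLE staging_events (
    event_id INTEGER,
    event_name VARCHAR,
    occurred_at TIMESTAMP
);
```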
## Notes
- Temporary tables in MotherDuck persist locally, not on the server, so local resource constraints should be considered when using them.
- Because they are bound to your session, any temporary tables are no longer available once your session ends.
---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/update-share
---
sidebar_position: 1
title: UPDATE SHARE
---
# UPDATE SHARE
Shares can be updated either manually or automatically by the share creator. All users of the share automatically see updates, including both DDL (like CREATE TABLE) and DML (inserts, updates, or deletes) changes, within one minute.
These updates are transactionally consistent snapshots, i.e. never partial database updates.
The share creator can have the share update automatically whenever the underlying database changes, by specifying the `UPDATE AUTOMATIC` option during [share creation](create-share.md).
Alternatively, the share creator can manually update the share with a new point-in-time snapshot of the database by running the `UPDATE SHARE` command.
# Syntax
```sql
UPDATE SHARE <share name>;
```
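For example, assuming a share named `birds` that was created without the `UPDATE AUTOMATIC` option:

```sql
-- Publish a new point-in-time snapshot of the database backing the share
UPDATE SHARE birds;
```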
---
Source: https://motherduck.com/docs/sql-reference/rest-api/ducklings-get-duckling-config-for-user.api
---
id: ducklings-get-duckling-config-for-user
title: "Get user Ducklings"
description: "Gets Duckling (instance) configuration for a user. Requires 'Admin' role."
sidebar_label: "Get user Ducklings"
hide_title: true
hide_table_of_contents: true
api: eJzlmEtv2zgQx78KoUtbwOtHm1x8c2sla6Br7yr2HtYIDFqibSYSqSWpPCrou+8MJUUPCykKG4sAOZkKh8OZP39kyEkdGTNFDZdiFjhjJ0j8+5CLvf5tz8y0+PgmxY7vr6RaaaacnhMw7Sse4yAYcs2MJlxoQ4XPiG9tk9wl2UlFKElgWJ947N+EK6bJh0kQcfGBKBmyPrgzdK+d8dopfWjntudo5ieKm2foSJ1JYg5S8R80n3J9m4FFTBWNmGFKWxuOscTUHMCjgA74wnltsx0ydpBHbg5cEHNgRKo9FaV7mNs/sIg649QxzzE60kaBDNAT0afvTOxhkvHny8tfcpv1HJUrADoblTDMAeSIpdCQMkz2eTjEn6bPm8QHRfQuCUlpDHGAyoYJg+Y0jkPu2zkGdxrHpMcJyO0d8w0MjBWut+H5jIrRYPMIMrOf25ars9H8B+vShokkwmWMk9DGiOYBVQE075JoK53brKHBuuXy1vZCQNqnSN3/E1LP2YXSv2+7gIFbyzqQyiP0MrSrn7cvXs+k4fMo65rorYTBFI0vujj4SgO7g5g251v/CMii+07pmrMvgWamFOzmcgiI+0SjOGSt2DIMLvgll9a+4W8y3XjuXyv3Zon+uNZJHm/hkSpFn2EESBjpE7Jsr0xpiFQ0g50IYiclckfycGCDU0MemWLlvuTbkNkDz5SZ1ZNaH3FQKWkFgNPAcNPWs5DoIzDx6RUJuYCJo/L8epk0LdaioWivEqRFVSn02oJoSRwdkzgTD4BrQL5BHsAgp6F+c0R2xXgOMlfzyWr5+8Kb/eNO3yWaHcJWiI5OQbQhbZ3Rbt46WP1yzOpK0OLqwII3B2kjuHPQebXwvs6mU3f+LtGsy1kx+eUUJitB60C2oOog8eKYxLk05Eom4u1hWEV2Dgbni+XmarGav8/j8UXLCsCLUwCs1KwDWGfpiL7LrtvjDICDp1BI3CK7t4VgK7xzcDibL11vPvm+uXG9v11v43rewnuXTDbFLcG8PO1C2S1v8992C7kWqRkawzGKVYc9szrj433sDB5GA3xJ60FaPuCzQVUbwNKAeihf/YkKYcjBmFiPBwMa834kQTGFdYy+LyMnq9USbpDqfBVbFYWXtURPZQUAv7eMKngHYrQoibUsdP3DToQ1EuLBvZpM/pzBSIwsF3PUH/aHSFwstYmonaUoTFwzY4sipJ5WYynSaoeeocCS52bYkxnEIeW2GGGVSwvR187DCAyt7PA7rlVOGlWZA6SC1mm6pZqtVJhl+Gd4QCis1EDzgSpOt5ZROIS4xjYs8Q5ubeyVHD8WwQefyE+KM53JlFtZ4EaG62KCX9C8Z8/1OlCGe/EAT25YUowv7574PotNbeDRiYgQvcB67eIbCq8ANW4KTnplA713BgXKWYulvGcC1CtjNPiNAWbZf01Ev3s=
sidebar_class_name: "get api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---
import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
Gets Duckling (instance) configuration for a user. Requires 'Admin' role.
---
Source: https://motherduck.com/docs/sql-reference/rest-api/ducklings-set-duckling-config-for-user.api
---
id: ducklings-set-duckling-config-for-user
title: "Set user Ducklings"
description: "Configure account-specific settings, such as Duckling sizes for service accounts. See the Service Accounts Guide for context."
sidebar_label: "Set user Ducklings"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "put api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---
import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
import Admonition from '@theme/Admonition';
::::info
This endpoint is used to configure user-specific settings, primarily for setting Duckling sizes for service accounts.
For a complete walkthrough of service account management, see Step 3 of the Service Accounts Guide.
::::
::::caution[Username Parameter]
When configuring a service account, ensure the `username` in the path (`/v1/users/:username/instances`) is the specific username defined when creating the service account. Note: The endpoint path uses "instances" for legacy reasons but configures Ducklings (compute instances).
::::
Set user-specific config, such as Duckling (instance) sizes.
::::note
Authentication for this endpoint requires an Admin token. This configuration currently applies to Duckling sizes and will in the future include read scaling tokens.
::::
---
Source: https://motherduck.com/docs/sql-reference/rest-api/motherduck-rest-api.info
---
id: motherduck-rest-api
title: "MotherDuck REST API"
description: ""
sidebar_label: Introduction
sidebar_position: 0
hide_title: true
custom_edit_url: null
---
import ApiLogo from "@theme/ApiLogo";
import Admonition from '@theme/Admonition';
import Heading from "@theme/Heading";
import SchemaTabs from "@theme/SchemaTabs";
import TabItem from "@theme/TabItem";
import Export from "@theme/ApiExplorer/Export";
import DocCardList from '@theme/DocCardList';
::::warning[Preview Feature]
The REST API methods are in 'Preview' and may change in the future.
::::
To better support scenarios that require flexibility or dynamic configuration around
managing a MotherDuck organization, we are exposing an OpenAPI endpoint with some new functionality.
At the moment it enables limited management of users and tokens via HTTP, without requiring a
DuckDB + MotherDuck client to be running.
All of the methods are authenticated using a Read/Write token of a user with the `Admin` role within your MotherDuck Organization,
passed via the `Authorization` header with a value of `Bearer {TOKEN}`.
::::info[Service Account Management]
You can use this REST API to programmatically manage service accounts, including their creation, token generation, and Duckling configuration.
For a detailed walkthrough, please see our [Service Accounts Guide](../../../key-tasks/service-accounts-guide).
::::
If you would like to generate your own OpenAPI client, the spec file is located at https://api.motherduck.com/docs/specs.
---
Source: https://motherduck.com/docs/sql-reference/rest-api/users-create-service-account.api
---
id: users-create-service-account
title: "Create new user (service account)"
description: "Create a new user, typically a service account, within your MotherDuck organization."
sidebar_label: "Create new user"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "post api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---
import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
import Admonition from '@theme/Admonition';
::::info
This endpoint is used to create new users / service accounts. For a detailed guide on managing service accounts, see the [Service Accounts Guide](../../../key-tasks/service-accounts-guide).
::::
The create-user endpoint is currently restricted to creating users with the 'Member' role.
---
Source: https://motherduck.com/docs/sql-reference/rest-api/users-create-token.api
---
id: users-create-token
title: "Create an access token for a user"
description: "Create an access token for a user, including service accounts."
sidebar_label: "Create an access token"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "post api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---
import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
import Admonition from '@theme/Admonition';
Create an access token for a user
:::note
- For service account token creation, ensure you are using an **Admin token** for authentication. The token generated by this call will be the service account's own token for its operations.
- For detailed guidance on service account token creation and best practices, see Step 2 of the Service Accounts Guide.
- If a new service account is created through the admin API, make sure to connect to this service account manually with a read/write token before using read scaling tokens.
- Each token is tied to a specific user, so make sure the exact `username` that was specified during service account creation is used in the path (`/v1/users/:username/tokens`).
- If the optional `ttl` parameter is not specified, the access token will remain valid indefinitely until revoked by an administrator.
:::
---
Source: https://motherduck.com/docs/sql-reference/rest-api/users-delete-token.api
---
id: users-delete-token
title: "Invalidate a user access token"
description: "Invalidate a user access token"
sidebar_label: "Invalidate an access token"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "delete api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---
import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
Invalidate a user access token
---
Source: https://motherduck.com/docs/sql-reference/rest-api/users-delete.api
---
id: users-delete
title: "Delete a user"
description: "Permanently delete a user and all of their data. THIS CANNOT BE UNDONE"
sidebar_label: "Delete a user"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "delete api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---
import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
import Admonition from '@theme/Admonition';
::::warning[This action cannot be undone!]
Once you delete a user or service account, all of their data is erased from MotherDuck.
There is no way to recover data or users deleted via the API.
::::
Permanently delete a user and all of their data.
---
Source: https://motherduck.com/docs/sql-reference/rest-api/users-list-tokens.api
---
id: users-list-tokens
title: "List a user's access tokens"
description: "List a user's access tokens"
sidebar_label: "List a user's access tokens"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "get api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---
import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
List a user's access tokens
---
Source: https://motherduck.com/docs/sql-reference/rest-api/zducklings-get-active-accounts.api
---
id: ducklings-get-active-accounts
title: "Get active accounts"
description: "[Preview] Get active accounts in an organization along with active Ducklings per account. Requires 'Admin' role"
sidebar_label: "Get active accounts"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "get api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---
import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
[Preview] Get active accounts in an organization along with active Ducklings per account. Requires 'Admin' role
---
Source: https://motherduck.com/docs/sql-reference/sql-reference
---
title: SQL reference
sidebar_class_name: sql-reference-icon
description: SQL reference for MotherDuck & DuckDB
---
import DocCardList from '@theme/DocCardList';
---
Source: https://motherduck.com/docs/sql-reference/wasm-client
---
position: 3
---
# MotherDuck Wasm Client
[MotherDuck](https://motherduck.com/) is a managed DuckDB-in-the-cloud service.
[DuckDB Wasm](https://github.com/duckdb/duckdb-wasm) brings DuckDB to every browser thanks to WebAssembly.
The MotherDuck Wasm Client library enables using MotherDuck through DuckDB Wasm in your own browser applications.
## Examples
Example projects and live demos can be found [here](https://github.com/motherduckdb/wasm-client).
## Status
Please note that the MotherDuck Wasm Client library is in an early stage of active development. Its structure and API may change considerably.
Our current intention is to align more closely with the DuckDB Wasm API in the future, to make using MotherDuck with DuckDB Wasm as easy as possible.
## DuckDB Version Support
- The MotherDuck Wasm Client library uses the same version of DuckDB Wasm as the MotherDuck web UI. Since the DuckDB Wasm assets are fetched dynamically, and the MotherDuck web UI is updated weekly and adopts new DuckDB versions promptly, the DuckDB version used could change even without upgrading the MotherDuck Wasm Client library. Check `pragma version` to see which DuckDB version is in use.
## Installation
`npm install @motherduck/wasm-client`
## Requirements
To facilitate efficient communication across worker threads, the MotherDuck Wasm Client library currently uses advanced browser features, including [SharedArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer).
Due to [security requirements](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/SharedArrayBuffer#security_requirements) of modern browsers, these features require applications to be [cross-origin isolated](https://developer.mozilla.org/en-US/docs/Web/API/crossOriginIsolated).
To use the MotherDuck Wasm Client library, your application must be in cross-origin isolation mode, which is enabled when it is served with the following headers:
```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
You can check whether your application is in this mode by examining the [crossOriginIsolated](https://developer.mozilla.org/en-US/docs/Web/API/crossOriginIsolated) property in the browser console.
Note that applications in this mode are restricted in [some](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cross-Origin-Opener-Policy#same-origin) [ways](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cross-Origin-Embedder-Policy#require-corp). In particular, resources from different origins can only be loaded if they are served with a [Cross-Origin-Resource-Policy (CORS)](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cross-Origin-Resource-Policy) header with the value `cross-origin`.
## Dependencies
The MotherDuck Wasm Client library depends on `apache-arrow` as a peer dependency.
If you use `npm` version 7 or later to install `@motherduck/wasm-client`, then `apache-arrow` will automatically be installed, if it is not already.
If you already have `apache-arrow` installed, then `@motherduck/wasm-client` will use it, as long as it is a compatible version (`^14.0.x` at the time of this writing).
Optionally, you can use a variant of `@motherduck/wasm-client` that bundles `apache-arrow` instead of relying on it as a peer dependency.
Don't use this option if you are using `apache-arrow` elsewhere in your application, because different copies of this library don't work together.
To use this version, change your imports to:
```ts
import '@motherduck/wasm-client/with-arrow';
```
instead of:
```ts
import '@motherduck/wasm-client';
```
## Usage
The MotherDuck Wasm Client library is written in TypeScript and exposes full TypeScript type definitions. These instructions assume you are using it from TypeScript.
Once you have installed `@motherduck/wasm-client`, you can import the main class, `MDConnection`, as follows:
```ts
import { MDConnection } from '@motherduck/wasm-client';
```
### Creating Connections
To create a `connection` to a MotherDuck-connected DuckDB instance, call the `create` static method:
```ts
const connection = MDConnection.create({
  mdToken: token
});
```
The `mdToken` parameter is required and should be set to a valid MotherDuck access token. You can create a MotherDuck access token in the MotherDuck UI. For more information, see [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token).
The `create` call returns immediately, but starts the process of loading the DuckDB Wasm assets from `https://app.motherduck.com` and starting the DuckDB Wasm worker.
This initialization process happens asynchronously. Any query evaluated before initialization is complete will be queued.
To determine whether initialization is complete, call the `isInitialized` method, which returns a promise resolving to `true` when DuckDB Wasm is initialized:
```ts
await connection.isInitialized();
```
Multiple connections can be created. Connections share a DuckDB Wasm instance, so creating subsequent connections will not repeat the initialization process.
Queries evaluated on different connections happen concurrently; queries evaluated on the same connection are queued sequentially.
### Evaluating Queries
To evaluate a query, call the `evaluateQuery` method on the `connection` object:
```ts
try {
  const result = await connection.evaluateQuery(sql);
  console.log('query result', result);
} catch (err) {
  console.log('query failed', err);
}
```
The `evaluateQuery` method returns a [promise](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/Promises) for the result. In an [async function](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/async_function), you can use the `await` syntax as above. Or, you can use the `then` and/or `catch` methods:
```ts
connection.evaluateQuery(sql).then((result) => {
  console.log('query result', result);
}).catch((reason) => {
  console.log('query failed', reason);
});
```
See [Results](#results) below for the structure of the result object.
### Prepared Statements
To create a [prepared](https://duckdb.org/docs/api/c/prepared) [statement](https://duckdb.org/docs/api/wasm/query#prepared-statements) for later evaluation, use the `prepareQuery` method:
```ts
const prepareResult = await connection.prepareQuery('SELECT v + ? FROM generate_series(0, 10000) AS t(v);');
```
This returns an [AsyncPreparedStatement](https://shell.duckdb.org/docs/classes/index.AsyncPreparedStatement.html), which can be evaluated later using the `send` method:
```ts
const arrowStream = await prepareResult.send(234);
```
Note: The `query` method of the AsyncPreparedStatement should not be used, because it can lead to deadlock when combined with the MotherDuck extension.
To immediately evaluate a prepared statement, call the `evaluatePreparedStatement` method:
```ts
const result = await connection.evaluatePreparedStatement('SELECT v + ? FROM generate_series(0, 10000) AS t(v);', [234]);
```
This returns a materialized result, as described in [Results](#results) below.
### Canceling Queries
To evaluate a query that can be canceled, use the `enqueueQuery` and `evaluateQueuedQuery` methods:
```ts
const queryId = connection.enqueueQuery(sql);
const result = await connection.evaluateQueuedQuery(queryId);
```
To cancel a query evaluated in this fashion, use the `cancelQuery` method, passing the `queryId` returned by `enqueueQuery`:
```ts
const queryWasCanceled = await connection.cancelQuery(queryId);
```
The `cancelQuery` method returns a promise for a boolean indicating whether the query was successfully canceled.
The result promise of a canceled query will be rejected with an error message. The `cancelQuery` method takes an optional second argument for controlling this message:
```ts
const queryWasCanceled = await connection.cancelQuery(queryId, 'custom error message');
```
### Streaming Results
The query methods above return fully materialized results. To evaluate a query and return a stream of results, use `evaluateStreamingQuery` or `evaluateStreamingPreparedStatement`:
```ts
const result = await connection.evaluateStreamingQuery(sql);
```
See [Results](#results) below for the structure of the result object.
### Error Handling
The query result promises returned by `evaluateQuery`, `evaluatePreparedStatement`, `evaluateQueuedQuery`, and `evaluateStreamingQuery` will be rejected in the case of an error.
For convenience, "safe" variants of these methods are provided that catch this error and always resolve to a value indicating success or failure. For example:
```ts
const result = await connection.safeEvaluateQuery(sql);
if (result.status === 'success') {
  console.log('rows', result.rows);
} else {
  console.log('error', result.err);
}
```
### Results
A successful query result may either be fully materialized, or it may contain a stream.
Use the `type` property of the result object, which is either `'materialized'` or `'streaming'`, to distinguish these.
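The distinction can be expressed as a TypeScript discriminated union. The types below are a simplified local sketch (the library's actual definitions are richer); they show how narrowing on `type` works:

```ts
// Simplified stand-ins for the library's result types (hypothetical shapes).
type MaterializedResult = { type: 'materialized'; rowCount: number };
type StreamingResult = { type: 'streaming' };
type QueryResult = MaterializedResult | StreamingResult;

function describeResult(result: QueryResult): string {
  if (result.type === 'materialized') {
    // Narrowed: rowCount is only available on the materialized variant.
    return `materialized result with ${result.rowCount} rows`;
  }
  return 'streaming result';
}
```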
#### Materialized Results
A materialized result contains a `data` property, which provides several methods for getting the results.
The number of columns and rows in the result are available through the `columnCount` and `rowCount` properties of `data`.
Column names and types can be retrieved using the `columnName(columnIndex)` and `columnType(columnIndex)` methods.
Individual values can be accessed using the `value(columnIndex, rowIndex)` method. See below for details about the forms values can take.
Several convenience methods also simplify common access patterns; see `singleValue()`, `columnNames()`, `deduplicatedColumnNames()`, and `toRows()`.
The `toRows()` method is especially useful in many cases. It returns the result as an array of row objects.
Each row object has one property per column, named after that column. (Multiple columns with the same name are deduplicated with suffixes.)
The type of each column property of a row object depends on the type of the corresponding column in DuckDB.
Many values are converted to a JavaScript primitive type, such as `boolean`, `number`, or `string`.
Some numeric values too large to fit in a JavaScript `number` (e.g., a DuckDB [BIGINT](https://duckdb.org/docs/sql/data_types/numeric#integer-types)) are converted to a JavaScript `bigint`.
Some DuckDB types, such as [DATE](https://duckdb.org/docs/sql/data_types/date), [TIME](https://duckdb.org/docs/sql/data_types/time), [TIMESTAMP](https://duckdb.org/docs/sql/data_types/timestamp), and [DECIMAL](https://duckdb.org/docs/sql/data_types/numeric#fixed-point-decimals), are converted to JavaScript objects implementing an interface specific to that type. Nested types such as DuckDB [LIST](https://duckdb.org/docs/sql/data_types/list), [MAP](https://duckdb.org/docs/sql/data_types/map), and [STRUCT](https://duckdb.org/docs/sql/data_types/struct) are also exposed through special JavaScript objects.
These objects all implement `toString` to return a string representation. For primitive types, this representation is identical to DuckDB's string conversion (e.g. using [CAST](https://duckdb.org/docs/sql/expressions/cast.html) to VARCHAR). For nested types, the representation is equivalent to the syntax used to construct these types.
They also have properties exposing the underlying value. For example, the object for a DuckDB TIME has a `microseconds` property (of type `bigint`). See the TypeScript type definitions for details.
Note that these result types differ from those returned by DuckDB Wasm without the MotherDuck Wasm Client library. The MotherDuck Wasm Client library implements custom conversion logic to preserve the full range of some types.
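The reason BIGINT values become `bigint` rather than `number` is that a 64-bit integer can exceed JavaScript's safe integer range, as this standalone snippet illustrates:

```ts
// 2^53 + 1 is exactly representable as a bigint...
const exact = BigInt('9007199254740993');
// ...but converting it to a JavaScript number loses precision:
const lossy = Number(exact); // rounds to 9007199254740992 (2^53)
console.log(lossy === 9007199254740992); // true
console.log(BigInt(lossy) === exact);    // false: the value changed
```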
#### Streaming Results
A streaming result provides three ways to consume the rows: `arrowStream`, `dataStream`, and `dataReader`. The first two (`arrowStream` and `dataStream`) implement the async iterator protocol and yield batches of rows, but they return different kinds of batch objects. Batches correspond to DuckDB DataChunks, which contain at most 2048 rows. The third (`dataReader`) wraps `dataStream` and makes consuming multiple batches easier.
The `dataStream` iterator returns a sequence of `data` objects, each of which implements the same interface as the `data` property of a materialized query result, described above.
The `dataReader` implements the same `data` interface, but also adds useful methods such as `readAll` and `readUntil`, which can be used to read at least a given number of rows, possibly across multiple batches.
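The batch-accumulation pattern behind methods like `readUntil` can be sketched with a plain async iterator. This generic helper is an illustration, not the library's implementation; it collects rows across batches until a minimum count is reached:

```ts
// Collect rows from an async stream of batches until at least
// `minRows` rows have been read (the final batch may overshoot).
async function readAtLeast<T>(
  batches: AsyncIterable<T[]>,
  minRows: number,
): Promise<T[]> {
  const rows: T[] = [];
  for await (const batch of batches) {
    rows.push(...batch);
    if (rows.length >= minRows) break;
  }
  return rows;
}

// Demo stream yielding three 2-row batches.
async function* demoBatches(): AsyncGenerator<number[]> {
  yield [1, 2];
  yield [3, 4];
  yield [5, 6];
}
```

Here `readAtLeast(demoBatches(), 3)` resolves to `[1, 2, 3, 4]`: reading stops after the second batch, once at least three rows have arrived.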
The `arrowStream` property provides access to the underlying Arrow RecordBatch stream reader. This can be useful if you need the underlying Arrow representation. Also, this stream has convenience methods such as `readAll` to materialize all batches.
Note, however, that Arrow's conversion of the underlying data to JavaScript types is sometimes lossy for certain DuckDB types, especially dates, times, and decimals.
Also, converting Arrow values to strings will not always match DuckDB's string conversion.
Note that results of remote queries are not streamed end-to-end yet.
Results of remote queries are fully materialized on the client upstream of this API.
So the first batch will not be returned from this API until all results have been received by the client.
End-to-end streaming of remote query results is on our roadmap.
### DuckDB Wasm API
To access the underlying DuckDB Wasm instance, use the `getAsyncDuckDb` function. Note that this function returns (a Promise to) a singleton instance of DuckDB Wasm also used by the MotherDuck Wasm Client.
---
Source: https://motherduck.com/docs/troubleshooting/aws-s3-secrets
---
sidebar_position: 5
title: AWS S3 Secrets Troubleshooting
keywords:
- AWS S3
- secrets
- authentication
- credentials
- troubleshooting
- IAM policy
- credential chain
---
# AWS S3 Secrets Troubleshooting
This page is for troubleshooting help with AWS S3 secrets in MotherDuck. For more information on creating a secret, see: [Create Secret](/documentation/sql-reference/motherduck-sql-reference/create-secret.md).
## Prerequisites
Before troubleshooting AWS S3 secrets, ensure you have:
- **Required**: [A valid MotherDuck Token](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md#creating-an-access-token) with access to the target database
- **Required**: [AWS credentials](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html) (access keys, SSO, or IAM role)
- **Optional**: [DuckDB](https://duckdb.org/docs/stable/clients/cli/overview.html) CLI (for troubleshooting purposes, though any DuckDB client will work)
- **Optional**: [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) (for bucket access verification)
:::note
**AWS CLI PATH**: If you installed AWS CLI manually, you may need to add it to your system PATH. Package managers like Homebrew (macOS) typically add it to PATH automatically. Verify with `which aws` (macOS/Linux) or `where aws` (Windows) - if it returns a path, you're all set!
:::
## Verify Secret Access
### Check that the secret is configured
First, make sure you're connected to MotherDuck:
```sql
-- Connect to MotherDuck (replace 'your_db' with your database name)
ATTACH 'md:your_db';
```
Then type in the following:
```sql
.mode line
SELECT secret_string, storage FROM duckdb_secrets();
```
The output should look something like this. Make sure that the output string includes values for: `key_id`, `region`, and `session_token`:
```
secret_string = name=aws_sso;type=s3;provider=credential_chain;serializable=true;scope=s3://,s3n://,s3a://;endpoint=s3.amazonaws.com;key_id=;region=us-east-1;secret=;session_token=
```
:::note
If you see no results, it means no secrets are configured. You'll need to create a secret first using [CREATE SECRET](/documentation/sql-reference/motherduck-sql-reference/create-secret.md).
:::
If your output is missing a value for `key_id`, `region`, or `session_token`, you can recreate your secret by following the directions for [CREATE OR REPLACE SECRET](/documentation/sql-reference/motherduck-sql-reference/create-secret.md).
If that query succeeded, you can confirm you have access to your AWS bucket by running these commands **in your terminal** (not in DuckDB):
```bash
# Log into AWS by running:
aws sso login
# Check bucket access:
aws s3 ls
```
**Example Output:**
```
PRE lambda-deployments/
PRE raw/
PRE ducklake/
2025-05-29 07:03:26 14695690 sample-data.csv
```
:::note
**Understanding the output**: `PRE` indicates folders/prefixes, while files show their size and modification date. If you only see `PRE` entries, your bucket contains organized data in folders. To explore deeper, use `aws s3 ls s3://your-bucket/your-prefix/` or `aws s3 ls s3://your-bucket/ --recursive` to see all files.
:::
## Configure permissions in AWS
This is an example of an IAM policy that will allow MotherDuck to access your S3 bucket. Note: if you use KMS keys, the IAM policy should also have `kms:Decrypt` in `AllowBucketListingAndLocation`.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBucketListingAndLocation",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your_bucket_name"
      ]
    },
    {
      "Sid": "AllowObjectRead",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::your_bucket_name/*"
      ]
    }
  ]
}
```
## AWS Credential Chain
MotherDuck automatically finds your AWS credentials using AWS's credential chain. This is the recommended approach, as it uses short-lived credentials (typically valid for 1 hour), which are more secure and reduce the risk of credential leakage. For most users, it works seamlessly with your existing AWS setup.
### Most Common: AWS SSO
If you use AWS SSO (like most users), run:
```bash
aws sso login
```
To create a secret using the credential chain, run:
```sql
CREATE OR REPLACE SECRET my_secret IN MOTHERDUCK (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN 'env;config' -- optional
);
```
### Other Credential Types
The credential chain also works with:
- **Access keys** stored in `~/.aws/credentials`
- **IAM roles** (if running on EC2)
- **Environment variables**
### Advanced: Role Assumption
:::note
**Only needed for**: Cross-account access, elevated permissions, or when you need to assume a different role than your current profile.
:::
If you need to assume a specific IAM role, create a profile in `~/.aws/config`:
```ini
[profile my_motherduck_role]
role_arn = arn:aws:iam::your_account_id:role/your_role_name
source_profile = your_source_profile
```
Then create a secret that uses this profile:
```sql
CREATE SECRET my_s3_secret (
    TYPE S3,
    PROVIDER credential_chain,
    PROFILE 'my_motherduck_role',
    REGION 'us-east-1' -- Use your bucket's region if different
);
```
## Common Challenges
### Scope
When using multiple secrets, the `SCOPE` parameter ensures MotherDuck knows which secret to use. You can validate which secret is being used with the `which_secret` function:
```sql
SELECT * FROM which_secret('s3://my-bucket/file.parquet', 's3');
```
### Periods in bucket name (url_style = path)
Because of SSL certificate verification requirements, S3 bucket names that contain dots (.) cannot be accessed using virtual-hosted style URLs. This is due to AWS's SSL wildcard certificate (`*.s3.amazonaws.com`) which only validates single-level subdomains.
If your bucket name contains dots, you have two options:
1. **Rename your bucket** to remove dots (e.g., use dashes instead)
2. **Use path-style URLs** by adding the `URL_STYLE 'path'` option to your secret:
```sql
CREATE OR REPLACE SECRET my_secret (
    TYPE s3,
    URL_STYLE 'path',
    SCOPE 's3://my.bucket.with.dots'
);
```
For more information, see [Amazon S3 Virtual Hosting documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html).
## What's Next
After resolving your AWS S3 secret issues:
- **[Query your S3 data](/key-tasks/cloud-storage/querying-s3-files.md)** - Learn how to query files stored in S3
- **[Load data into MotherDuck](/key-tasks/loading-data-into-motherduck/)** - Set up data loading workflows
- **[Configure additional cloud storage](/integrations/cloud-storage/)** - Set up Azure, Google Cloud, or other providers
- **[Share data with your team](/key-tasks/sharing-data/)** - Collaborate using MotherDuck's sharing features
---
Source: https://motherduck.com/docs/troubleshooting/error_messages
---
sidebar_position: 1
title: Error Messages
---
## Connection Errors
### Disallowed connections with a different configuration
If you create different connections with the same connection database path (such as `md:my_db`) but a different configuration dictionary, you may encounter the following error:
```text
Connection Error: Can't open a connection to same database file with a different configuration than existing connections
```
This validation error prevents accidental retrieval of a previously cached database connection, and can happen only in DuckDB APIs that make use of a [database instance cache](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#multiple-connections-and-the-database-instance-cache).
In file-based DuckDB, this can only happen when the previous connection is still in scope.
With MotherDuck, the database instance cache is longer lived, so you may see this error even after the previous connections have been closed.
#### How To Recover
For multiple connections that are used sequentially:
* If the configuration does not need to differ, consider unifying it, which will allow the same underlying client-side database instance to be reused.
* If the configuration differs intentionally, [set the database instance TTL to zero](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#setting-custom-database-instance-cache-time-ttl) and close the previous connections.
For multiple connections whose lifecycles need to overlap, add a differentiating suffix to the connection string, so that these connections are no longer considered to be backed by the same database.
A good differentiating string is the [`session_hint`](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#read-scaling-with-session-hints).
While it is meant to associate an individual end user to a dedicated backend when used with read scaling tokens, it can also be used to signal client-side intent for a distinct database instance when used with regular tokens.
---
Source: https://motherduck.com/docs/troubleshooting/faq
---
sidebar_position: 1
title: FAQ
keywords:
- MotherDuck version
- open vs attach
- database connection
- WAL file
- database cache
- compatibility
---
import Versions from '@site/src/components/Versions';
### What's the difference between .open md: & ATTACH 'md:' ?
`.open` initiates a new database connection (to a given database, or `my_db` by default) and can be passed different parameters in the connection string, such as `motherduck_token` or the [saas_mode](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck.md#authentication-using-saas-mode) flag. If you have a local database attached, it will be detached when using `.open`.
`ATTACH` keeps the current database connection and attaches one or more MotherDuck (cloud) databases to it. You'll need to use `USE` to select the database you want to query.
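For example, attaching a MotherDuck database and then selecting it for queries (replace `my_db` with your database name):

```sql
ATTACH 'md:my_db';
USE my_db;
SELECT current_database(); -- my_db
```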
### How do I know which version of DuckDB I should be running ?
MotherDuck currently supports DuckDB .
- In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through .
- In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck is compatible with client versions through .
Please check that you have a compatible version of DuckDB running locally.
### How do I know which version of DuckDB am I running?
You can use the `VERSION` pragma to find out which version of DuckDB you are running:
```sql
PRAGMA VERSION;
```
### How do I know what's executed locally and what's executed remote ?
If you run an [EXPLAIN](/sql-reference/motherduck-sql-reference/explain/) on your query, you will see the physical plan. In the plan, each operation is followed by either (L) = Local or (R) = Remote. More information can be found in the [documentation](/sql-reference/motherduck-sql-reference/explain/).
```sql
EXPLAIN [Your Query]
```
### I connect to both MotherDuck and a local database, why is there an uncheckpointed WAL left behind?
DuckDB keeps a [database instance cache](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#multiple-connections-and-the-database-instance-cache) for each unique connection path.
Connecting to MotherDuck extends the lifetime of the database instance to a default of 15 minutes.
If you observe a WAL file left behind for the local database after the process exits or run into the "File is already open" error when closing and reopening the connection, there are several workarounds:
* Run `CHECKPOINT "local-database-name"` in the application code.
* Run `DETACH "local-database-name"` in the application code.
* Disable the cache lifetime extension by setting `motherduck_dbinstance_inactivity_ttl` to `0s` (see [Setting Custom Database Instance Cache TTL](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#setting-custom-database-instance-cache-time-ttl)).
### Why am I not in the same Organization as my team?
If you sign up to MotherDuck directly, you will create your own Organization as a part of the sign up flow.
To join your team's Organization, reach out to your team and request that they [invite you to their Organization](../key-tasks/managing-organizations/managing-organizations.mdx#inviting-users-to-your-organization).
As an alternative, you may reach out to [MotherDuck support](./support.md) and we can search for other users within your domain.
### How do I use my team's shared databases?
Some database shares are scoped at the `ORGANIZATION` level. To use those shares, you must be in the same Organization as the person who created the share.
In addition, some shares are marked as `DISCOVERABLE`. This allows members of the same Organization to easily find those shares through the UI.
Follow the steps outlined in ["Why am I not in the same Organization as my team?"](#why-am-i-not-in-the-same-organization-as-my-team) to join your team!
### How do I delete my account?
You can delete your account and all associated information by following these steps:
1. Navigate to your personal Settings and select "Members" from the left sidebar
2. Click the three dots (⋮) next to your name
3. Select "Delete"
4. Confirm the account deletion
:::note
If you are the only member of your Organization, deleting your account will also delete the Organization.
:::
For additional assistance, please contact our [support team](./support.md).
### Why am I getting SSL errors when connecting to MotherDuck from a Docker image?
If you see SSL errors when trying to connect to MotherDuck from a Docker image, this is likely because the image does not have updated CA certificates. If the container was working and suddenly stopped, it is likely that the certificates in the image have expired. Please refer to [Docker's documentation](https://docs.docker.com/engine/network/ca-certs/) for best practices on updating CA certificates in Docker images.
Some common errors you might see indicating an issue with your CA certificates include:
* `Could not get default pem root certs.`
* `Failed to create security handshaker.`
* `Update handshaker factory failed.`
### Why don't COPY DATABASE statements work in the MotherDuck Web UI?
The MotherDuck Web UI has limitations with certain SQL statements that are implemented as multiple statement macros:
**COPY DATABASE statements** have limited support in the MotherDuck Web UI:
* The full `COPY FROM DATABASE` command is not supported when copying both schema and data simultaneously
* **Workaround**: Use the `COPY FROM DATABASE` command with specific options:
* `COPY FROM DATABASE source_db TO target_db (SCHEMA)` - copies only the database structure
* `COPY FROM DATABASE source_db TO target_db (DATA)` - copies only the database data
For full functionality with these commands, use the DuckDB CLI or other supported drivers. More information about database copying can be found in the [database operations documentation](/documentation/key-tasks/database-operations/copying-databases.md).
---
Source: https://motherduck.com/docs/troubleshooting/reinstall-md-extension
---
sidebar_position: 3
title: Reinstall MotherDuck extension
---
The MotherDuck extension is automatically downloaded and loaded as soon as you connect to MotherDuck. However, you can force a reinstallation as follows:
```sql
FORCE INSTALL motherduck;
```
In addition, make sure you are running a currently supported [version of DuckDB](../faq#how-do-i-know-which-version-of-duckdb-i-should-be-running-).
---
Source: https://motherduck.com/docs/troubleshooting/support
---
sidebar_position: 7
title: Support
---
Have a question that isn't answered in our [FAQ](./faq.md)? Join the [MotherDuck Slack Community](https://slack.motherduck.com/) or contact us at [support@motherduck.com](mailto:support@motherduck.com?subject=Support+question).
---
Source: https://motherduck.com/docs/troubleshooting/troubleshooting-access-policy
---
sidebar_position: 6
title: Troubleshooting Data Access Policy
---
In order to help you with certain kinds of MotherDuck issues, it can be helpful for us to access your MotherDuck account. For example, if a specific query on a specific dataset is triggering a bug, it may be necessary for us to access the data and SQL query, and possibly re-run a specific query, in order to reproduce the issue and diagnose the problem.
A MotherDuck employee may use our community Slack or email to request your permission to access your MotherDuck account while troubleshooting an issue.
If you give us permission to access your MotherDuck account for troubleshooting, here is what you need to know:
- Our goal is to understand the issue and resolve the problem. We will make every effort to minimize the amount of time we spend accessing your account and the amount of data we access. We will only access the data we need to investigate and troubleshoot the specific issue.
- Any access to your data will be strictly read-only.
- A MotherDuck employee may pull in other MotherDuck employees during the debugging process. By agreeing to allow us to access your account for troubleshooting an issue, other MotherDuck employees who are asked to help investigate the issue may also access your account, subject to the same terms of this policy, without requesting additional authorization from you.
- We will not share or disclose the data we access while troubleshooting the issue to any third party or non-MotherDuck employee.
- We may make temporary copies of your data while debugging the issue. Any such copies will be permanently deleted once the issue is resolved.
- We may use the data we access in your account to generate a redacted copy of the data to be used for creating a bug report or test.
- The permission you have granted to access your account lapses once this specific issue is resolved.
---
Source: https://motherduck.com/docs/troubleshooting/troubleshooting
---
title: Troubleshooting
sidebar_class_name: troubleshooting-icon
description: Troubleshooting
---
---
Source: https://motherduck.com/docs/troubleshooting/uninstall
---
sidebar_position: 2
title: Uninstall MotherDuck extension
---
### How do I uninstall MotherDuck?
* Remove `motherduck_*` from your environment variables (most likely only `motherduck_token`) [1]
* Remove any `motherduck*.duckdb_extension` file located in `~/.duckdb` [2]
[1] To view all your environment variables you may use:
```bash
$ env | grep -i motherduck
```
To unset in the current session:
```bash
$ unset motherduck_token
```
To unset the variable permanently, you may have to check your shell initialisation files (`~/.bashrc`, `~/.zshrc`, etc.)
[2] Note that those files are generally under a versioned, platform-specific subdirectory of `~/.duckdb/extensions`, e.g. `~/.duckdb/extensions/v0.9.1/osx_arm64`.
You may use this script:
```bash
$ find ~/.duckdb -name 'motherduck*.duckdb_extension' -exec rm {} \;
```
---
Source: https://motherduck.com/docs/troubleshooting/version-lifecycle-schedules
---
sidebar_position: 8
title: MotherDuck Version Lifecycle Schedules
---
import Versions from '@site/src/components/Versions';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
MotherDuck supports DuckDB versions according to a predictable lifecycle so you always know which version is safe to use.
The lifecycle schedules below form a part of MotherDuck’s Support Policies. They include Major Releases and Minor Releases to support [DuckDB](/troubleshooting/faq/#how-do-i-know-which-version-of-duckdb-i-should-be-running-) and [DuckLake](/integrations/file-formats/ducklake/) versions and specify end of life dates for both.
## Currently Supported Versions
MotherDuck currently supports DuckDB .
- In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through .
- In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck is compatible with client versions through .
MotherDuck strives to support current DuckDB minor versions in alignment with the [DuckDB](https://duckdb.org/release_calendar) and [DuckLake](https://ducklake.select/release_calendar) release calendars.
We recommend that users run the latest minor version when possible to take advantage of the most up-to-date features and functionality.
## MotherDuck Support Lifecycle Schedules
The following lifecycle schedules apply to DuckDB and DuckLake versions.
```mermaid
%%{init: {"gantt": {"fontSize": 13, "sectionFontSize": 15}}}%%
gantt
title MotherDuck Version Lifecycle Schedules
dateFormat YYYY-MM-DD
axisFormat %b %Y
section 1.0.x
DuckDB 1.0.0 :2024-06-01, 2025-07-31
section 1.1.x
DuckDB 1.1.0 :2024-09-01, 2025-07-31
DuckDB 1.1.1 :2024-09-15, 2025-07-31
DuckDB 1.1.2 :2024-10-01, 2025-07-31
DuckDB 1.1.3 :2024-11-01, 2025-07-31
section 1.2.x
DuckDB 1.2.0 :2025-02-01, 2026-01-31
DuckDB 1.2.1 :2025-03-01, 2026-01-31
DuckDB 1.2.2 :2025-04-01, 2026-01-31
section 1.3.x
DuckDB 1.3.0 :2025-06-01, 2026-03-31
DuckDB 1.3.1 :2025-06-15, 2026-03-31
DuckDB 1.3.2 :2025-07-01, 2026-03-31
section 1.4.x
DuckDB 1.4.0 :2025-09-01, 2026-05-31
DuckDB 1.4.1 :2025-10-01, 2026-05-31
DuckDB 1.4.2 :2025-11-01, 2026-05-31
section DuckLake
DuckLake 0.1 :2025-06-01, 2026-03-31
DuckLake 0.2 :2025-07-01, 2026-03-31
DuckLake 0.3 :2025-09-01, 2026-05-31
```
| DuckDB Release | DuckLake Release | Release Date | End of Life Date * |
|----------------|-------------------------------------|------------------|--------------------------|
| 1.0.0 | — | June 2024 | July 2025 |
| 1.1.0 | — | September 2024 | July 2025 |
| 1.1.1 | — | September 2024 | July 2025 |
| 1.1.2 | — | October 2024 | July 2025 |
| 1.1.3 | — | November 2024 | July 2025 |
| 1.2.0 | — | February 2025 | January 2026 |
| 1.2.1 | — | March 2025 | January 2026 |
| 1.2.2 | — | April 2025 | January 2026 |
| 1.3.0 | 0.1 (June 2025), 0.2 (July 2025) | June 2025 | March 2026 |
| 1.3.1 | 0.1 (June 2025), 0.2 (July 2025) | June 2025 | March 2026 |
| 1.3.2 | 0.1 (June 2025), 0.2 (July 2025) | July 2025 | March 2026 |
| 1.4.0 | 0.3 (September 2025) | September 2025 | May 2026 |
| 1.4.1 | 0.3 (September 2025) | October 2025 | May 2026 |
| 1.4.2 | 0.3 (September 2025) | November 2025 | May 2026 |
| 1.4.3 | 0.3 (September 2025) | December 2025 | May 2026 |
\* Beginning with DuckDB 1.3.0, MotherDuck will support each Minor Release until the date specified above.
## End of Life (EoL) Policy
When a new minor version becomes available, the previous one enters Extended Support. New features are not backported during this period, but critical fixes may be, for the greater of:
- **6 months** after the version’s release, or
- **4 months** after the release of the next minor version
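The two rules above can be sketched with the standard-library `datetime` module (a minimal illustration assuming first-of-month release dates; the function names are hypothetical, and the published End of Life dates in the table above always take precedence):

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    # Advance a date by whole months; day-of-month clamping is not
    # needed here because release dates fall on the first of the month.
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, d.day)

def min_backport_window_end(release: date, next_minor_release: date) -> date:
    """Greater of: 6 months after this version's release, or
    4 months after the next minor version's release."""
    return max(add_months(release, 6), add_months(next_minor_release, 4))

# Example: DuckDB 1.2.0 shipped in February 2025; 1.3.0 followed in June 2025.
print(min_backport_window_end(date(2025, 2, 1), date(2025, 6, 1)))  # 2025-10-01
```

For 1.2.0 the 4-months-after-next-minor rule dominates, so the minimum backport window runs into October 2025; the published EoL date (January 2026) is later still.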
When a minor version reaches its End of Life (EoL):
- Connections using that DuckDB version are blocked, requiring MotherDuck users to upgrade
- Ahead of scheduled EoL dates, MotherDuck provides in-app UI warnings, email communications, and targeted outreach to users on affected versions
## MotherDuck Extended Lifecycle Support Add-On
MotherDuck offers an **Extended Lifecycle Support Add-On** that extends ongoing technical support for a minor DuckDB version beyond its End of Life (EoL) date, giving customers peace of mind and the flexibility to upgrade later.
For more information, please [get in touch with our team](https://motherduck.com/contact-us/product-expert/).
💁 If you have additional questions about our version lifecycle, please feel free to connect with us directly in our [Community Slack support channel](https://slack.motherduck.com/) or send a note to support@motherduck.com.
---
Source: https://motherduck.com/docs/troubleshooting/windows-certs
---
sidebar_position: 4
title: Install certificate on Windows machines
---
In some circumstances, you may see an error like `Http response at 400 or 500 level, http status code: 0`.
On Windows machines, this is usually caused by the [Let's Encrypt](https://letsencrypt.org/) root certificate not being trusted.
To fix this, follow the steps below:
* Download this file: https://letsencrypt.org/certs/isrgrootx1.der
* Open it (double-click on the file)

* Click on "Install Certificate" and follow the instructions:
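Alternatively, the same installation can be scripted from an elevated Command Prompt using Windows' built-in `certutil` tool. This is a sketch of the GUI steps above (run as Administrator; it is not the officially documented procedure):

```bat
:: Download the ISRG Root X1 certificate and add it to the current
:: user's Trusted Root Certification Authorities store.
curl -o isrgrootx1.der https://letsencrypt.org/certs/isrgrootx1.der
certutil -addstore -user Root isrgrootx1.der
```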

Then try connecting again.
If it still doesn't work, check that the certificate was installed correctly by opening the Certificate Manager (typing "`cert`" in the search box should show it):

The certificate should then appear under `Trusted Root Certification Authorities\Certificates`:
