Source: https://motherduck.com/docs/about-motherduck/about-motherduck

---
title: About MotherDuck
sidebar_class_name: about-motherduck-icon
description: About MotherDuck
---

## Included pages

- [Release notes](https://motherduck.com/docs/about-motherduck/release-notes): Latest updates, new features, and improvements to MotherDuck.
- [Release notes archive](https://motherduck.com/docs/about-motherduck/release-notes-archive): Archived MotherDuck release notes.
- [Feature stages](https://motherduck.com/docs/about-motherduck/feature-stages): Understanding MotherDuck's feature lifecycle stages, Preview and Generally Available.
- [Billing](https://motherduck.com/docs/about-motherduck/billing): Learn more about MotherDuck's pricing model and how to manage billing.
- [Legal](https://motherduck.com/docs/about-motherduck/legal): Terms of service, privacy policy, and other legal documents for MotherDuck.

---

Source: https://motherduck.com/docs/about-motherduck/billing/billing

---
title: Billing
description: Learn more about MotherDuck's pricing model and how to manage billing.
---

import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';

MotherDuck offers two [paid](https://motherduck.com/pricing/) self-service plans: Lite and Business. View your Organization's incurred usage, track spend, and view your invoices. All new users start on a 7-Day Free Trial with access to the full set of Business Plan features.

## Included pages

- [Pricing model](https://motherduck.com/docs/about-motherduck/billing/pricing): Details of MotherDuck's pricing model.
- [Manage billing](https://motherduck.com/docs/about-motherduck/billing/managing-billing): Learn how to manage your MotherDuck spend, choose plans, monitor usage, and view invoices.
- [Tag workloads with custom user agents](https://motherduck.com/docs/about-motherduck/billing/tag-workloads-with-custom-user-agents): Add workload tags with custom_user_agent and use QUERY_HISTORY to group activity by workload, tenant, or pipeline.
- [Duckling sizes](https://motherduck.com/docs/about-motherduck/billing/duckling-sizes): Learn about MotherDuck Duckling (compute instance) sizes and their optimal use cases.

---

Source: https://motherduck.com/docs/about-motherduck/billing/duckling-sizes

---
sidebar_position: 3
title: Duckling sizes
description: Learn about MotherDuck Duckling (compute instance) sizes and their optimal use cases.
---

MotherDuck implements a distinct tenancy architecture that diverges from traditional database systems. The platform utilizes a [hypertenancy](/concepts/hypertenancy) model, which provisions isolated read-write Ducklings (compute instances) for each Organization member. This architecture ensures dedicated compute resources and Duckling-level configuration at the individual user level, allowing users to independently optimize performance parameters according to their specific workload requirements.

Each Duckling size has different performance characteristics and [billing implications](/about-motherduck/billing/pricing/#compute-pricing). MotherDuck uses fast SSDs for spill space, so queries can exceed their memory limits with minimal performance impact. DuckDB caches data in memory, and MotherDuck uses fast local disks for storage, which improves cold start times.

## Duckling sizes

| Duckling Size | Plans | Use Case | Default Cooldown | Configurable Cooldown Period | Startup Time | Read-Write Duckling Enabled? | Read Scaling Duckling Enabled? |
|---------------|------------|----------|------------------|------------------------------|--------------------|-----------------------------|-------------------------------|
| Pulse | Lite, Business | Good for small workloads | 1 second | N/A | ~100ms | Yes | Yes |
| Standard | Business | Good for most data loading workloads | 1 minute | 1 min – 24 hours | ~100ms | Yes | Yes |
| Jumbo | Business | Better for large, complex transformations during loading | 1 minute | 1 min – 24 hours | ~100ms | Yes | Yes |
| Mega | Business | Optimal for demanding jobs with even larger scale and volumes than a Jumbo can handle | 5 minutes | 1 min – 24 hours | ~a few minutes | Yes | Yes |
| Giga | Business, and in [Free Trial on request](https://motherduck.com/contact-us/product-expert/) | Best used for your largest and toughest workloads like batch jobs that run overnight or on weekends | 10 minutes | 1 min – 24 hours | ~a few minutes | Yes | No |

- The cooldown period is [configurable](#configuring-the-cooldown-period) for Standard, Jumbo, Mega, and Giga Ducklings
- We recommend keeping the cooldown periods in mind when planning batch sizes
- To shut down a Duckling without waiting for cooldown, use [`SHUTDOWN` or `SHUTDOWN TERMINATE`](/sql-reference/motherduck-sql-reference/shutdown-terminate/)

### PULSE

**Optimized for ad-hoc analytics and read-only workloads**

Pulse Ducklings are auto-scaling and designed for efficiency, making them ideal for:

- Running ad-hoc queries (**Note:** complex queries involving [spatial analysis](https://duckdb.org/docs/current/core_extensions/spatial/functions.html) or regex-like functions may perform better on larger Duckling sizes)
- Read-optimized workflows with high concurrent user access, such as those in customer-facing analytics.
- Powering data apps and embedded analytics where quick, short queries are common.
- High-concurrency, read-optimized workflows

[Learn how Pulse Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)

### STANDARD

**Production-grade Duckling designed for analytical processing and reporting**

Standard Ducklings offer a balance of resources for consistent performance, suited for:

- Core analytical workflows requiring balanced performance metrics.
- Development and validation environments for production workflows.
- Standard ETL/ELT pipeline implementation, including:
  - Parallel execution of incremental ingestion jobs.
  - Multi-threaded transformation processing.

[Learn how Standard Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)

### JUMBO

**A larger Duckling built for high-throughput processing and faster performance**

Jumbo Ducklings provide resources for heavy workloads, including:

- Large-scale batch processing and ingestion operations.
- Complex query execution on high-volume datasets.
- Advanced join operations and aggregations.
- RAM-intensive processing of deeply-nested JSON structures or other large data objects.

[Learn how Jumbo Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)

### MEGA

**Built for high-throughput processing on demanding jobs at even larger scale than a Jumbo's capacity**

Mega Ducklings provide compute resources to help expedite large-scale transformations and complex operations, perfect for:

- Batch processing and high-volume ingestion operations.
- Running a weekly job that rebuilds all of your tables and needs to finish quickly, in minutes - not hours.
- Complex query execution on high-volume datasets that a Jumbo Duckling won't be able to handle in a time crunch.
- Advanced operations for users with 10x the data volume of other users who require low-latency, swift performance.


[Learn how Mega Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)

### GIGA

**Our largest Duckling, built for the toughest workloads with massive scale and complexity**

Giga Ducklings provide compute resources for the most demanding tasks, perfect for:

- Complex, large-scale workloads and jobs that won't run on any other Duckling size.
- Running one-time jobs that need to complete overnight or over the weekend, like restating revenue actuals for 10 years' worth of high-volume data.
- Huge volumes of advanced join operations and aggregations.
- Very large amounts of RAM-intensive processing of deeply-nested JSON structures or other large data objects.

[Learn how Giga Ducklings are billed.](/about-motherduck/billing/pricing/#compute-pricing)

## Configuring the cooldown period

The **cooldown period** is the duration an idle Duckling stays running after the last query completes. During cooldown, the Duckling remains warm: cached data stays in memory, so follow-up queries start faster. You are billed for the cooldown period, since the Duckling is still running.

You can configure the cooldown period per user or service account through the MotherDuck UI (under **Settings > Ducklings**) or through the [`Set user Ducklings` REST API](/sql-reference/rest-api/ducklings-set-duckling-config-for-user/).

### Configurable cooldown period by Duckling type

| Duckling type | Default Cooldown | Configurable Cooldown Period |
|---------------|-----------------|------------------------------|
| Pulse | 1 second | N/A |
| Standard | 1 min | 1 min – 24 hours |
| Jumbo | 1 min | 1 min – 24 hours |
| Mega | 5 min | 1 min – 24 hours |
| Giga | 10 min | 1 min – 24 hours |

Pulse Ducklings are meant for 'bursty' workloads - as a result, they are on-demand and auto-scaling. Because they are metered on a per-query basis, with a minimum of 1 Compute Unit (CU)\* second, they do not have a configurable cooldown.


\***We define and measure the amount of CPU and memory usage over time as a Compute Unit (CU).**

### When to adjust the cooldown period

**Shorter cooldown**: reduces idle billing when queries are infrequent or spread out over long intervals. Good for batch jobs or scheduled pipelines where you know the Duckling won't be needed again immediately.

**Longer cooldown**: keeps the Duckling warm between queries, avoiding cold-start latency. Good for interactive analytics sessions, dashboards with periodic refreshes, or workloads where cache hits improve performance.

### Example: reducing costs for a nightly batch job

A Giga Duckling has a default cooldown of 10 minutes. If you run a batch job that takes 5 minutes and know there's no follow-up query, the Duckling stays idle (and billable) for 10 minutes after the job completes. By reducing the cooldown to 5 minutes, you save 5 minutes of idle billing per run. For a daily job, that's over 30 hours of saved compute per year.

To stop the Duckling without waiting for the cooldown, use [`SHUTDOWN`](/sql-reference/motherduck-sql-reference/shutdown-terminate/) at the end of your job to shut down the Duckling gracefully, or [`SHUTDOWN TERMINATE`](/sql-reference/motherduck-sql-reference/shutdown-terminate/) to force-terminate it immediately. Note that you will always be billed for the minimum cooldown time of 1 minute.

::::info
MotherDuck meters compute per-second and bills for a 1-minute minimum. While Standard, Jumbo, Mega, and Giga Ducklings are billed for *wall clock time*, Pulse Ducklings are metered on a per-query basis to support 'bursty' workloads. As a result, they are on-demand and auto-scaling.

Because Pulse Ducklings are metered on a *per-query basis, with a minimum of 1 Compute Unit (CU)\* second*, they do not have a configurable cooldown.


\***Compute Unit (CU): The amount of CPU and memory usage over time.**
::::

### Important notes

- Cooldown is **best effort**: Ducklings may be shut down before the configured cooldown expires due to lifetime limits, background operations, and maintenance upgrades.
- You are only billed for the time a Duckling is actually running. If a Duckling shuts down early, billing stops at that point.
- The UI validates the min/max bounds and shows an error if the configured value is out of range.

## Changing Duckling sizes

Duckling sizes can be changed in the MotherDuck UI by clicking on the icon in the top right, or under "Settings > Ducklings". Here you can choose the desired Read/Write and Read Scaling size. Changing Duckling size can take up to a few minutes while your new Duckling wakes up.

![Duckling Selector](img/duckling_selector.png)

The Duckling size for a user or service account can also be set using the [`Set user Ducklings` REST API](/sql-reference/rest-api/ducklings-set-duckling-config-for-user/).

**Note:** Changing Duckling size in the UI or through our [REST API](/sql-reference/rest-api/motherduck-rest-api/) takes up to:

* **2 minutes** for Pulse, Standard and Jumbo
* **5 minutes** for Mega
* **10 minutes** for Giga

---

Source: https://motherduck.com/docs/about-motherduck/billing/managing-billing

---
sidebar_position: 2
title: Manage billing
description: Learn how to manage your MotherDuck spend, choose plans, monitor usage, and view invoices.
---

import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';

This guide explains how to manage your MotherDuck billing, including selecting a plan that suits your needs, keeping track of your usage, and understanding your invoices.

## Choosing your billing plan

MotherDuck offers a variety of [plans with different features and pricing](/about-motherduck/billing/pricing/).
During your initial 7-day Free Trial of the Business Plan, you can explore the full set of MotherDuck's capabilities. Afterwards, or at any time during the trial, you can select a plan by navigating to the [Plans page](https://app.motherduck.com/settings/plans) in Settings within the MotherDuck UI:

- **Continue with Lite Plan**: If you select "Lite" your organization will continue on the [Lite Plan](/about-motherduck/billing/pricing/#plan-comparison). This plan includes 10 CU hours on Pulse and 10 GB of storage per month at no cost. Additional usage is billed on a pay-as-you-go basis.
- **Upgrade to Business Plan**: Selecting "Business" moves your organization to the [Business Plan](/about-motherduck/billing/pricing/#plan-comparison), designed for teams with features like 10 users, unlimited service accounts, access to all five Duckling sizes, a 99.9% availability SLA, [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/), and a configurable snapshot retention period of 0 - 90 days.

For details on the features and allowances of each plan, please refer to our [Pricing Model documentation](/about-motherduck/billing/pricing/).

## Monitoring usage

You can monitor your organization's Compute and Storage usage from the [Billing page](https://app.motherduck.com/settings/billing) in the MotherDuck UI.

- **Compute usage** is displayed in Compute Unit-hours (CU-hours). Learn more about [how compute is priced](/about-motherduck/billing/pricing/#compute-pricing).
- **Storage usage** is displayed as your average storage in GB over the billing period. Learn more about [how storage is priced](/about-motherduck/billing/pricing/#storage-pricing).

Your storage bill is calculated based on your average daily storage over the month. For example, if you store 10 GB for half the month and 20 GB for the other half, your average is 15 GB.
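
As a rough illustration of the averaging described above, the following Python sketch computes the average daily storage for one month. The function name and the 30-day month are assumptions made for this example; they are not part of MotherDuck's billing implementation.

```python
def average_daily_storage_gb(daily_gb):
    """Return the mean of a list of daily storage readings, in GB."""
    if not daily_gb:
        raise ValueError("at least one daily reading is required")
    return sum(daily_gb) / len(daily_gb)


# Hypothetical 30-day month: 10 GB for the first half, 20 GB for the second half.
readings = [10.0] * 15 + [20.0] * 15
average_gb = average_daily_storage_gb(readings)
print(f"Average storage: {average_gb:.1f} GB")  # prints "Average storage: 15.0 GB"
```

Multiplying the resulting average by the per-GB rate for your region gives an estimate of the monthly storage charge.
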

Historical data retention (default 1 day (Lite) or 7 days (Business) for new databases) also contributes to your storage usage.

![Usage](img/billing.png)

## Viewing your invoice

The [Billing page](https://app.motherduck.com/settings/billing) also lets you view your past invoices, as well as the current month's invoice thus far.

- **Lite Plan users** see invoices for any usage beyond the included 10 CU hours and 10 GB storage.
- **Business Plan users** see their actual invoices reflecting their usage and the $250/month platform fee.
- **[Free Trial users](/about-motherduck/billing/pricing/#free-trial)** see estimated invoices, which are fully discounted during the trial period.

Incurred Storage and Compute costs are broken down per-user and per-service-account, as well as aggregated for the entire organization.

:::note
For organizations with more than 500 users and service accounts, invoices may show aggregated usage rather than a full per-user breakdown to maintain clarity.
:::

---

Source: https://motherduck.com/docs/about-motherduck/billing/pricing

---
sidebar_position: 1
title: Pricing model
description: Details of MotherDuck's pricing model.
---

import Versions from '@site/src/components/Versions';
import DuckDBDocLink from '@site/src/components/DuckDBDocLink';
import ComputePricingTables from './_compute-pricing-tables.mdx';
import StoragePricingTable from './_storage-pricing-table.mdx';
import SqlAssistantPricingTable from './_sql-assistant-pricing-table.mdx';
import AdvancedAiPricingTables from './_advanced-ai-pricing-tables.mdx';

## MotherDuck pricing model

MotherDuck is a serverless cloud data warehouse. We believe in providing our users with simple pricing. MotherDuck offers two self-serve [plans](https://motherduck.com/pricing/): Lite and Business.

:::note
MotherDuck is available on AWS in three regions: **US East (N. Virginia)** - `us-east-1`, **US West (Oregon)** - `us-west-2`, and **Europe (Frankfurt)** - `eu-central-1`.
Each MotherDuck Organization is scoped to a single cloud region that must be chosen at Org creation when signing up.
:::

### Plan comparison

| Feature | Lite | Business | Enterprise |
|---------|------|----------|------------|
| **Best for** | Individual users, small projects | Teams and organizations | Bespoke deployments: *[Contact us](https://motherduck.com/contact-us/product-expert/)* |
| **Platform fee** | $0/month | $250/month | *Custom* |
| **Compute included** | Includes 10 CU hours / month + [Pay-as-you-go for additional usage](#compute-pricing) | [Pay-as-you-go for additional usage](#compute-pricing) | *Custom* |
| **[Duckling sizes](https://motherduck.com/docs/about-motherduck/billing/duckling-sizes/)** | Pulse only | Pulse, Standard, Jumbo, Mega, Giga | Pulse, Standard, Jumbo, Mega, Giga |
| **[Read Scaling](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)** | - | Yes | Yes |
| **[Storage included](/concepts/storage-lifecycle/)** | Includes 10 GB / month + [Pay-as-you-go for additional usage](#storage-pricing) | [Pay-as-you-go for additional usage](#storage-pricing) | *Custom* |
| **Users** | 3 active users / 2 service accounts | 10 active users / unlimited service accounts | *Custom* |
| **SLA** | - | 99.9% Availability | 99.9% Availability |
| **Backup** | 1 day (paid feature) | - [Point-in-time Restore](https://motherduck.com/docs/concepts/data-recovery/)<br/>- up to 90 day backups | *Custom* |
| **Observability** | - | [Query history](/docs/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) | [Query history](/docs/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) |

**Users** are defined as human users with a login through email + password, Google, GitHub, or [SSO](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/), while **[Service Accounts](/docs/key-tasks/service-accounts-guide/)** are defined as non-human accounts designed for programmatic access and automation workflows (for example, backend services, integrations, or customer-facing analytics).

### Compute pricing

A **Duckling** in MotherDuck is a compute instance. Each Duckling has a **cooldown period**, which is the amount of time the Duckling will remain active after completing the last query. This keeps the Duckling warm for follow-up queries that may benefit from MotherDuck's intelligent storage and caching. The cooldown period is [configurable](/about-motherduck/billing/duckling-sizes/#configuring-the-cooldown-period) for Standard, Jumbo, Mega, and Giga Ducklings.

MotherDuck meters compute per-second and bills for a 1-minute minimum. While Standard, Jumbo, Mega, and Giga Ducklings are billed for *wall clock time*, Pulse Ducklings are metered on a per-query basis to support 'bursty' workloads. As a result, they are on-demand and auto-scaling.

::::note
Because Pulse Ducklings are metered on a *per-query basis, with a minimum of 1 Compute Unit (CU) second* instead of wall-clock time, they do not have a configurable cooldown. A **Compute Unit (CU)** is defined as *the amount of CPU and memory usage over time*.
::::

If you want to group query history by integration, pipeline, or tenant, set `custom_user_agent` when connecting to MotherDuck and query [`MD_INFORMATION_SCHEMA.QUERY_HISTORY`](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/). See [Tag workloads with custom user agents](/about-motherduck/billing/tag-workloads-with-custom-user-agents/) for an example pattern. Some teams use that breakdown in internal allocation, but MotherDuck billing still follows the pricing model on this page.

The [`SHUTDOWN` and `SHUTDOWN TERMINATE`](/sql-reference/motherduck-sql-reference/shutdown-terminate/) commands can be used to shut down a Duckling without waiting for the cooldown period. `SHUTDOWN` waits for running queries to complete, and `SHUTDOWN TERMINATE` force-terminates immediately.

#### Duckling sizes

| Duckling | Billing | Default Cooldown | Configurable Cooldown Period | Details |
|----------|---------|------------------|------------------------------|---------|
| [Pulse](/about-motherduck/billing/duckling-sizes/#pulse) | Per CU (resources consumed), not wall-clock time | 1 second | N/A | Small, bursty queries, read-heavy workloads, and frontend scenarios. For compute-heavy queries, consider Standard instead.<br/>**Billing example:** 2s low CPU + 1s cooldown = 3 CU seconds, and 100 small writes × 2 CUs + 1s cooldown = 201 CU seconds. |
| [Standard](/about-motherduck/billing/duckling-sizes/#standard) | Per second | 1 minute | 1 min to 24 hours | General purpose data warehouse workloads.<br/>**Billing example:** 5 queries × 30s + 100ms startup + 60s cooldown = 210 seconds. |
| [Jumbo](/about-motherduck/billing/duckling-sizes/#jumbo) | Per second | 1 minute | 1 min to 24 hours | Large-scale data warehouse workloads.<br/>**Billing example:** 2 queries × 8min + 100ms startup + 60s cooldown = 17 minutes. |
| [Mega](/about-motherduck/billing/duckling-sizes/#mega) | Per second | 5 minutes | 1 min to 24 hours | Demanding jobs and large-scale workloads.<br/>**Billing example:** 2 queries × 8min + few min startup + 5min cooldown = ~21 minutes. |
| [Giga](/about-motherduck/billing/duckling-sizes/#giga) | Per second | 10 minutes | 1 min to 24 hours | Batch jobs and overnight or weekend processing.<br/>**Billing example:** 2 queries × 5min + few min startup + 10min cooldown = ~20 minutes. |

:::warning
For long-running, compute-heavy queries, consider using a Standard or even larger Duckling instead of a Pulse. Pulse Ducklings may consume high volumes of CUs when scaling up for intensive, bursty workloads.
:::

:::note
Changing your Duckling size to Pulse, Standard, or Jumbo through the [UI or REST API](../../../sql-reference/rest-api/motherduck-rest-api) may take up to 2 minutes. Switching to a Mega takes up to 5 minutes, while switching to a Giga takes up to 10 minutes.
:::

#### **Compute**

### Storage pricing

Under the hood, MotherDuck uses DuckDB's compression algorithms to reduce the storage footprint and optimize performance.

MotherDuck charges for data stored in its managed storage system based on your **average storage usage over the billing period**. Your monthly bill is calculated as the average of your daily storage (in GB) multiplied by the per-GB rate.

For example, if your MotherDuck Organization is in `us-east-1` and your average storage over December is 650 GB, the final bill will be computed as follows:

- 650 GB × $0.04/GB = **$26.00**

#### What counts towards my storage bill?

- **Standard databases:** MotherDuck provides point-in-time restore by retaining historical data as `historical_bytes` for organizations on paid plans.
- **Transient databases:** Databases can be set as `TRANSIENT` [at database creation](/concepts/storage-lifecycle#storage-management). Transient databases are billed for active data stored and a 1-day failsafe minimum. Data is not retained as failsafe bytes beyond this minimum, which is ideal for temporary or reproducible datasets like intermediate job outputs.
- **NOTE:** By default, for both Standard and Transient databases, new databases retain 1 day of historical data on **Lite** (paid) and 7 days of historical data on **Business**.
  - Business plan users are able to configure their `historical_bytes` retention window from 0 to 90 days.

Users are billed for active data plus historical, retained, and failsafe bytes. Refer to the [Storage Lifecycle](/concepts/storage-lifecycle) for more details.

#### What does not count towards my storage bill?

- [Shares](/key-tasks/sharing-data) do not incur additional data storage as they are a zero-copy operation.
- Using the [CREATE DATABASE X FROM DATABASE Y](/sql-reference/motherduck-sql-reference/create-database/) command is also a zero-copy operation. Only incremental changes made to the new database are added to storage as `active_bytes`, while active Shares that point to a deleted database will retain `retained_for_clone_bytes`.
- Any data managed by you in your own object storage bucket, for example S3, Blob, or GCS, that you can use to process data.
- Data on your laptop accessed through `duckdb -ui`, even when signed into MotherDuck.

#### What changes can I make to optimize my storage bill?

The right approach to optimize storage usage in MotherDuck varies by use case and implementation. Please reach out to us at support@motherduck.com for additional guidance on how to optimize your storage effectively for your needs.

#### **Storage rates**

### AI function pricing

MotherDuck enhances your analytical capabilities with integrated AI functions. These functions leverage powerful large language models (LLMs), fine-tuned to assist with SQL tasks and unlock new OLAP use cases.

AI functions are categorized and priced as follows:

- **SQL Assistant Functions**: metered per call, with some free features.
- **Advanced AI Functions**: metered per token consumed for both input and output, priced in AI Units (1 AI Unit = $1.00).


### SQL assistant functions

These features, including [FixIt](/docs/getting-started/interfaces/motherduck-quick-tour/#help-me-fix-this-broken-query--fixit) and [Text-to-SQL](/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql/), help you write, understand, and correct SQL queries. SQL Assistant features are included with both Lite and Business plans.

### Advanced AI functions

These functions provide access to powerful generative AI models for tasks like embedding generation and complex prompting. They are metered based on token usage, with costs calculated in AI Units (1 AI Unit = $1.00).

:::note
For Lite and Business plans, there is a default soft limit on Advanced AI Function consumption of 10 AI Units per day to help control costs. This limit can be increased or removed by contacting support@motherduck.com.
:::

## Incentive programs

### Free trial

New users who sign up for MotherDuck and create an organization automatically get access to a 7-day Free Trial without entering a credit card. [Learn how to manage your plan after the trial has ended.](/about-motherduck/billing/managing-billing/#choosing-your-billing-plan)

At any point during your Free Trial, you may choose to set up billing and select a plan. At the end of your trial, you can continue with the Lite plan (no credit card required) or upgrade to Business for additional features. [Learn more about managing your bill](/about-motherduck/billing/managing-billing/#choosing-your-billing-plan).

---

Source: https://motherduck.com/docs/about-motherduck/billing/tag-workloads-with-custom-user-agents

---
sidebar_position: 2
title: Tag workloads with custom user agents
description: Add workload tags with custom_user_agent and use QUERY_HISTORY to group activity by workload, tenant, or pipeline.
---

Connecting to MotherDuck with the `custom_user_agent` parameter tags your queries so you can identify which workload issued them.
That workload can represent an integration, pipeline, tenant, or internal service. Those tags appear in [`MD_INFORMATION_SCHEMA.QUERY_HISTORY`](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/), so organization admins can inspect tagged activity, group it by workload, and use that breakdown in internal reporting.

`MD_INFORMATION_SCHEMA.QUERY_HISTORY` is available on Business plans and only to organization admins.

## 1. Choose a tagging convention

Use the `custom_user_agent` format described in [Choose a `custom_user_agent` format](/integrations/how-to-integrate/#custom-user-agent-format). You can use this pattern even if you are only tagging existing workloads for reporting or allocation. You do not need to build a full customer-facing integration.

Recommended format:

- `integration/version(metadata1,metadata2)` with optional version and metadata
- Avoid spaces in the integration and version parts
- If you want to group by a single workload label later, keep that label in the first metadata position

Examples:

- `catalogsync`
- `catalogsync/5.1.5.1`
- `catalogsync/5.1.5.1(batchload,teamfinance)`
- `customerportal/5.1.5.1(tenant42,eucentral1)`

## 2. Understand what `QUERY_HISTORY` stores

`QUERY_HISTORY.USER_AGENT` stores the full DuckDB user agent, not only your custom tag. When `custom_user_agent` is set, the value looks like this:

```text
duckdb/<version>(<platform>) <api> <custom_user_agent>
```

Representative values:

| QUERY_HISTORY.USER_AGENT | Extracted `custom_tag` | Extracted `integration_name` | Extracted `metadata` |
|---|---|---|---|
| `duckdb/v1.5.1(osx_arm64) capi catalogsync/5.1.5.1(batchload,teamfinance)` | `catalogsync/5.1.5.1(batchload,teamfinance)` | `catalogsync` | `batchload,teamfinance` |
| `duckdb/v1.5.1(wasm_eh) motherduck-wasm customerportal/5.1.5.1(tenant42,eucentral1)` | `customerportal/5.1.5.1(tenant42,eucentral1)` | `customerportal` | `tenant42,eucentral1` |
| `duckdb/v1.5.1(linux_amd64) cpp` | | | `NULL` |

## 3. Set `custom_user_agent`

Example in Python:

```python
import os
import duckdb

token = os.environ["motherduck_token"]  # assumes the token is exported under this name

con = duckdb.connect("md:analytics", config={
    "motherduck_token": token,
    "custom_user_agent": "catalogsync/5.1.5.1(batchload,teamfinance)"
})
```

For other languages and frameworks, see the [language and framework examples for setting `custom_user_agent`](/integrations/how-to-integrate/#custom-user-agent-examples).

## 4. Inspect recent tagged queries

Use this query to inspect recent `QUERY_HISTORY` rows and verify that your tags are being extracted the way you expect:

```sql
with tagged_queries as (
    select
        start_time,
        user_name,
        instance_type,
        user_agent,
        regexp_extract(user_agent, '^(?:[^ ]+ ){2}(.+)$', 1) as custom_tag
    from MD_INFORMATION_SCHEMA.QUERY_HISTORY
    where regexp_matches(user_agent, '^(?:[^ ]+ ){2}.+$')
    order by start_time desc
    limit 20
),
parsed as (
    select
        start_time,
        user_name,
        instance_type,
        user_agent,
        custom_tag,
        regexp_extract(custom_tag, '^([^/( ]+)', 1) as integration_name,
        -- DuckDB string literals do not process backslash escapes,
        -- so a single backslash reaches the regex engine as-is.
        nullif(regexp_extract(custom_tag, '\(([^)]*)\)', 1), '') as metadata
    from tagged_queries
)
select
    start_time,
    user_name,
    instance_type,
    user_agent,
    custom_tag,
    integration_name,
    metadata
from parsed
order by start_time desc
```

The extraction logic is:

- `regexp_extract(user_agent, '^(?:[^ ]+ ){2}(.+)$', 1)` strips the built-in DuckDB and API tokens and returns your custom tag
- `regexp_extract(custom_tag, '^([^/( ]+)', 1)` extracts the integration name
- `regexp_extract(custom_tag, '\(([^)]*)\)', 1)` extracts the metadata payload inside parentheses

## 5. Group tagged activity by workload

This example groups tagged queries by integration, the first metadata value, and Duckling size over the last 7 days.

```sql
with tagged_queries as (
    select
        start_time,
        end_time,
        instance_type,
        regexp_extract(user_agent, '^(?:[^ ]+ ){2}(.+)$', 1) as custom_tag
    from MD_INFORMATION_SCHEMA.QUERY_HISTORY
    where start_time >= now() - interval 7 day
      and regexp_matches(user_agent, '^(?:[^ ]+ ){2}.+$')
),
parsed as (
    select
        coalesce(nullif(regexp_extract(custom_tag, '^([^/( ]+)', 1), ''), custom_tag) as integration_name,
        -- First comma-separated value inside the parentheses, if any.
        nullif(split_part(regexp_extract(custom_tag, '\(([^)]*)\)', 1), ',', 1), '') as workload_name,
        instance_type,
        date_diff('second', start_time, end_time) as elapsed_seconds
    from tagged_queries
)
select
    integration_name,
    coalesce(workload_name, 'unlabeled') as workload_name,
    instance_type,
    count(*) as queries,
    sum(elapsed_seconds) as total_elapsed_seconds,
    avg(elapsed_seconds) as avg_elapsed_seconds
from parsed
group by all
order by total_elapsed_seconds desc
```

If you want to group by the full metadata string instead, replace the `workload_name` expression with:

```sql
nullif(regexp_extract(custom_tag, '\(([^)]*)\)', 1), '') as workload_name
```

## 6. Use tagged activity for internal allocation

Some teams use tagged query history as an input to internal chargeback or cost allocation. One approach is to calculate each workload's share of tracked query time and apply that share to a monthly invoice outside of MotherDuck.

```sql
with tagged_queries as (
    select
        start_time,
        end_time,
        regexp_extract(user_agent, '^(?:[^ ]+ ){2}(.+)$', 1) as custom_tag
    from MD_INFORMATION_SCHEMA.QUERY_HISTORY
    where start_time >= date_trunc('month', now())
      and regexp_matches(user_agent, '^(?:[^ ]+ ){2}.+$')
),
workload_usage as (
    select
        coalesce(
            nullif(split_part(regexp_extract(custom_tag, '\\(([^)]*)\\)', 1), ',', 1), ''),
            regexp_extract(custom_tag, '^([^/( ]+)', 1)
        ) as workload_name,
        sum(date_diff('second', start_time, end_time)) as elapsed_seconds
    from tagged_queries
    group by 1
),
totals as (
    select sum(elapsed_seconds) as total_elapsed_seconds
    from workload_usage
)
select
    workload_name,
    elapsed_seconds,
    elapsed_seconds::double / nullif(total_elapsed_seconds, 0) as tracked_usage_share
from workload_usage, totals
order by tracked_usage_share desc
```

This is an internal accounting convention, not a MotherDuck billing feature. For the billing model itself, including Pulse compared to fixed-size ducklings and cooldown behavior, see [Understanding the pricing model](/about-motherduck/billing/pricing/) and [Duckling sizes](/about-motherduck/billing/duckling-sizes/).

--- Source: https://motherduck.com/docs/about-motherduck/feature-stages --- sidebar_position: 3 title: Feature stages description: Understanding MotherDuck's feature lifecycle stages — Preview and Generally Available. ---

# Feature stages

MotherDuck features go through lifecycle stages before they are considered stable and production-ready.

## Preview

A feature in **preview** is available for use but may be operationally incomplete. Preview features:

- May have limited backward compatibility
- Are subject to change without notice
- Are not covered by MotherDuck's SLA
- May have limited support

Preview features are a great way to try out new functionality and provide feedback. If you have questions or feedback about a preview feature, connect with us in our [Community Slack](https://slack.motherduck.com/) or email support@motherduck.com.
## Generally Available (GA) A feature that is **generally available** is stable, production-ready, and fully supported. GA features: - Have full backward compatibility guarantees - Are covered by MotherDuck's SLA - Receive full support --- Source: https://motherduck.com/docs/about-motherduck/legal --- sidebar_position: 13 title: Legal description: Terms of service, privacy policy, and other legal documents for MotherDuck. --- ## Product Terms of Service [MotherDuck Product Terms of Service](https://motherduck.com/terms-of-service/) [Products and Fees Addendum](https://motherduck.com/fees-addendum/) [Acceptable Use Policy](https://motherduck.com/acceptable-use-policy/) [Support Policy](https://motherduck.com/support-policy/) --- Source: https://motherduck.com/docs/about-motherduck/release-notes-archive --- sidebar_position: 2 title: Release notes archive description: Archived MotherDuck release notes. --- import VideoPlayer from '@site/src/components/VideoPlayer'; # Release notes archive This archive contains release notes older than May 8, 2025. For newer updates, see the [current release notes](/about-motherduck/release-notes/). ## May 1, 2025 - **Window Functions in Instant SQL:** MotherDuck now offers improved window function support in [Instant SQL](https://motherduck.com/blog/introducing-instant-sql/) - **Copy-Paste in the Object Explorer:** Detailed comments and error messages are now able to be copied and pasted in one-click within Object Explorer tooltips ## April 23, 2025 - **[Preview] AI-powered SQL editing:** MotherDuck users can now access inline AI-powered SQL suggestions within the MotherDuck UI. To try it out, select SQL text in your notebook, use `cmd/ctrl+shift+e` to generate a SQL query or edit by writing an instruction in plain language. - **[Preview] Introducing Instant SQL:** A new way to write SQL that updates your result set as you type to expedite query building and debugging – all with zero-latency, no run button required. 
Read more about Instant SQL in the [MotherDuck Blog](https://motherduck.com/blog/introducing-instant-sql/). ## April 22, 2025 - **Increasing read scaling replica maximum:** MotherDuck Business Plan users can now set a Read Scaling replica pool size of up to 16 database replicas that can be read concurrently. When connecting with a read scaling token, each concurrent end user connects to a read scaling replica of the database that is served by its own duckling. Refer to the [documentation](../../key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for more details. - **Fresh new look for SQL notebook cells:** The run button, database selection, and other cell options have moved, making more space to focus on your SQL. ![Export query results](./img/release-notes-250422-new_cell_layout.png) ## April 17, 2025 - **Now in Preview:** Organization admins on MotherDuck's Business plan can now use the [`QUERY_HISTORY` view](../../sql-reference/motherduck-sql-reference/md_information_schema/query_history/) to get a consolidated view of all queries run across their full organization. - **Org-wide Databases and Shares:** Organization admins can now view their Organization's [Databases](/sql-reference/motherduck-sql-reference/create-database/) and [Shares](/sql-reference/motherduck-sql-reference/create-share/) in the updated Settings section of the MotherDuck web UI. - **Txt Files can now be uploaded:** MotherDuck users can now upload txt files to their MotherDuck organization. ## April 10, 2025 - MotherDuck supports DuckDB 1.2.2, a bugfix release. More details in the [DuckDB 1.2.2 changelog](https://github.com/duckdb/duckdb/releases/tag/v1.2.2). - We've updated MotherDuck's timezone handling to use `UTC` as the default, replacing the prior `America/New_York` default. When converting values to the "[Timestamp with Time Zone](https://duckdb.org/docs/stable/sql/data_types/timestamp.html#time-zones)" type, UTC will now be applied by default. 
A custom timezone for the active connection can be set temporarily using the `SET TimeZone = '';` command ([see available timezone values](https://duckdb.org/docs/stable/sql/data_types/timezones.html)). Your DuckDB client's local timezone will still be used for other time-related query operations. For more details on DuckDB's timezone handling, see the [DuckDB Time Zone documentation](https://duckdb.org/docs/stable/sql/data_types/timestamp.html#time-zone-support). - MotherDuck users can now specify an alias when [attaching a SHARE](/key-tasks/sharing-data/sharing-overview/). Refer to the [ATTACH documentation](/sql-reference/motherduck-sql-reference/attach/) for more information and reach out to us in our [Community Slack](https://slack.motherduck.com) if you have any questions or feedback. ## April 3, 2025 - **Access Control for Shares**: MotherDuck users can now create shares with a [RESTRICTED](/sql-reference/motherduck-sql-reference/create-share/#access-clause) access setting, allowing share owners to precisely control access by granting or revoking permissions for individual MotherDuck users or a list of specified users through [GRANT](/sql-reference/motherduck-sql-reference/grant-access/) and [REVOKE](/sql-reference/motherduck-sql-reference/revoke-access/) commands. When first created, a [RESTRICTED](/sql-reference/motherduck-sql-reference/create-share/#access-clause) share is only accessible by the share owner. - **Manual Data Refresh for Read Scaling Replicas**: MotherDuck users can now update data more frequently on [read scaling replicas](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) by using the [CREATE SNAPSHOT OF](/sql-reference/motherduck-sql-reference/create-snapshot/) function to manually trigger snapshot creation, followed by [REFRESH DATABASE](/sql-reference/motherduck-sql-reference/refresh-database/) on the read scaling replica. This provides access to the freshest data without waiting for automatic updates. 
Note that manual snapshot creation will hold any new write queries on the read-write database from starting in order to be able to take the snapshot.

## March 20, 2025

- Users can now search & filter for notebooks, databases, and shares in the left sidebar with our object search in the top left navigation.
- Introducing performance improvements to the databases section of the sidebar: The attached databases section now scales efficiently to handle very large numbers of databases, schemas, and tables.

## March 6, 2025

- MotherDuck now supports Indexes for query acceleration, in addition to their use in constraints. Learn more about [DuckDB Indexes](https://duckdb.org/docs/stable/guides/performance/indexing.html#art-index-scans).
- MotherDuck supports DuckDB 1.2.1, a bugfix release. More details in the [DuckDB 1.2.1 changelog](https://github.com/duckdb/duckdb/releases/tag/v1.2.1).
- Support for DuckDB versions 0.10.2, 0.10.3, and 1.0.0 has ended.
- Introducing a smoother local file experience: Persist files across sessions, view metadata directly in the Object Explorer, and convert files to tables.

## February 19, 2025

- Added [EXPLAIN ANALYZE](https://duckdb.org/docs/guides/meta/explain_analyze) support for profiling hybrid queries.
- Added a "Running Queries" page in settings to monitor active long-running queries.

## February 11, 2025

With today's release, we're introducing a number of features to support businesses building production-grade analytics. See the [blog post](https://motherduck.com/blog/introducing-motherduck-for-business-analytics/) for more details.

**New Plan Options:** MotherDuck now has two platform plans to choose from, **Lite** and **Business**, alongside our **Free** Plan.

* The **Free Plan** is designed for hobbyists and experimenters with small-scale analytics needs, like hobby projects.
* The **Lite Plan** is most useful for small team use cases and individuals.
Maybe your small team is building out some early analytics, or your hobby project is growing into something more. * The **Business Plan** is ideal for businesses with complex needs, and larger teams. New Instance type options: **[New Instances](/about-motherduck/billing/duckling-sizes/) and compute pricing options:** _**Pay Per Instance**_: We're adding new choices for MotherDuck compute, with Pay Per Instance **Standard** and **Jumbo** instances. * The _Pay Per Instance_ model is based on uptime, which provides more predictable costs you can compare to other data warehouse products. * The **Standard** instance is great for everyday tasks, and balanced performance. * The **Jumbo** instance is often useful for heavy workflows, like batch ETL pipelines or complex transformations. * When you run a query, your instance spins up within milliseconds. * You pay for the seconds that the instance is running, with a minimum of one minute. _**Pay Per Query**_: Our existing instances are now called **Pulse**. * These instances are capped in size, however they are billed on our existing _Pay Per Query_ model, metered for billing on Compute Unit seconds. * The **Pulse** instance enables lightweight, fully serverless analytics. * This can be very useful for applications where you have data partitioned by user, ad-hoc query execution, or incremental data processing with smaller data sizes. **Read Scaling Controls:** * Users with access to [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling) in their organization can now set the Read Scaling replica pool size, letting you control the maximum concurrency threshold for your read replicas. * Users can set their Read Scaling [Instance type](/about-motherduck/billing/duckling-sizes/) independently of the Read/Write Instance type. 
## February 6, 2025 MotherDuck supports DuckDB's newly released version 1.2.0 🎉 DuckDB 1.2.0 is packed with improvements that make using MotherDuck even easier, like a better CSV reader, friendlier SQL, and improved performance! Read more about DuckDB 1.2.0 in the [MotherDuck Blog](https://motherduck.com/blog/announcing-duckdb-12-on-motherduck-cdw), and review the official [DuckDB Labs 1.2.0 announcement](https://duckdb.org/2025/02/05/announcing-duckdb-120.html) for notes on breaking changes and detailed updates. ## January 8, 2025 - MotherDuck clients now verify the server's TLS certificate. - MotherDuck now automatically opens the browser to facilitate authentication in Windows environments. ## December 12, 2024 - [Preview] Introducing MotherDuck's REST API: Organizations with large numbers of users have struggled to manage them through the MotherDuck UI. We've received requests for a programmatic interface, and we've listened! We are launching a User Management REST API to provide support for managing Users and Access Tokens. Through the REST API, MotherDuck users can now easily create separate users for BI or data ingestion/processing workloads, and enable new experiences for app developers (ie. issuing temporary short-lived read scaling tokens). See [the documentation](documentation/sql-reference/rest-api/motherduck-rest-api.info.mdx) for more information and reach out to us in our community Slack channel if you have any questions or feedback! ## December 4, 2024 - [Preview] Introducing support for read scaling: With the launch of read scaling tokens, MotherDuck accounts now support scaling up to 4 replicas of your database that can be read concurrently. When connecting with a read scaling token, each concurrent end user connects to a read scaling replica of the database that is served by its own duckling. See [our documentation](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for more information. 
- Auto-sync of new and deleted attachments: users who connect to MotherDuck through two different clients concurrently (for example, the DuckDB CLI and the MotherDuck UI) will now see changes made by one client in the other. For example, if you create a new database in the CLI, the MotherDuck UI will automatically be updated to reflect it and vice versa. Similarly, attaching, detaching, or deleting a database will be synced.
- Create databases directly from the Object Explorer. Users can now create a new attached database from the Object Explorer panel on the left side of the MotherDuck web UI. Previously you could only do so by issuing a SQL command.

## November 21, 2024

- Introducing the **Table Summary**. Customers have told us that they love the Column Explorer, but they wish there was an easy way to see it for tables in their database lists without having to write SQL. So we decided to build the table summary. You can activate it by clicking on a table or view in the Object Explorer, which will reveal a panel that shows the Column Explorer (the column names, types, distributions, and null percentages for the selected table or view). You can also get a quick preview of the table and see the DDL statement that defines it. We're excited to see how you use it!
- **A resizable, responsive Column Explorer**. To make the table summary work well, we made the Column Explorer both resizable and responsive. This also means the inspector – the right side panel that expands and shows the Column Explorer for your result sets – can be resized. As the panel gets smaller, we responsively hide the null percentage and the distribution plots, giving more room for the column name.
- Introducing the **[MD_INFORMATION_SCHEMA](documentation/sql-reference/motherduck-sql-reference/md_information_schema/introduction.md)**. The MotherDuck MD_INFORMATION_SCHEMA views are read-only, system-defined views that provide metadata information about your MotherDuck objects.
The current views that you can query to retrieve metadata information are: databases, owned_shares, and shared_with_me.

## November 7, 2024

- MotherDuck now supports DuckDB 1.1.3 clients, a bugfix release. More info in the [DuckDB 1.1.3 changelog](https://github.com/duckdb/duckdb/releases/tag/v1.1.3).
- DuckDB recently [introduced a change](https://github.com/duckdb/duckdb/pull/13372) that allows for much more efficient concurrent bulk ingestion. We completed the necessary infrastructure changes and collaborated on [some bug fixes](https://github.com/duckdb/duckdb/pull/14467), and that optimization is now enabled on our backends.

## October 31, 2024

- MotherDuck introduces `Admin` and `member` roles for organizations. `Admin` users can change the roles of other users in the organization or [remove](documentation/key-tasks/managing-organizations/managing-organizations.mdx#removing-users) a user from the organization.
- MotherDuck & Hydra announced the first release of [pg_duckdb](https://github.com/duckdb/pg_duckdb), a PostgreSQL extension that allows you to run DuckDB (and connect to MotherDuck!) within PostgreSQL. Read more about it in the [pg_duckdb announcement blog post](https://motherduck.com/blog/pgduckdb-beta-release-duckdb-postgres/).

## October 17, 2024

- MotherDuck now supports DuckDB 1.1.2 clients, a bugfix release. More info in the [DuckDB 1.1.2 changelog](https://github.com/duckdb/duckdb/releases/tag/v1.1.2).

## October 14, 2024

- Shares now support [auto-updating](documentation/sql-reference/motherduck-sql-reference/create-share.md). Automatically updated shares no longer require running explicit `UPDATE SHARE` commands. Instead, changes on the underlying database are automatically published to the share within at most 5 minutes after writes have completed. However, the option for manually updating shares remains available and continues to be the default setting.
This allows users who prefer finer control over their update lifecycle to maintain their usual workflow. The auto-updating property is defined at share creation time, and share owners can force an explicit update any time on both types of shares by running [`UPDATE SHARE`](documentation/sql-reference/motherduck-sql-reference/update-share.md). ## October 9, 2024 We are excited to introduce a new SQL [prompt](/documentation/sql-reference/motherduck-sql-reference/ai-functions/prompt.md) function, currently in preview, that enables text generation directly within SQL queries. This feature leverages LLMs to process and generate text based on provided prompts. Features: * Generate SQL: Use the prompt function in your SQL queries to request text generation, for example, `SELECT prompt('Write a poem about ducks');`. * Model Selection: Specify the LLM model type with the model parameter. Available models include `gpt-4o-mini` (default) and `gpt-4o-2024-08-06`. * Structured Outputs: Opt for structured responses using the struct or json_schema parameters to tailor the output format to your needs. Check out more [prompt function examples](/documentation/sql-reference/motherduck-sql-reference/ai-functions/prompt.md#text-generation). ## October 2, 2024 - MotherDuck now supports [monitoring](documentation/sql-reference/motherduck-sql-reference/connection-management/monitor-connections.md) and [interrupting](documentation/sql-reference/motherduck-sql-reference/connection-management/interrupt-connections.md) server-side queries. - Various stability and usability improvements. ## September 25, 2024 - MotherDuck now supports DuckDB 1.1.1, a bugfix release. More info in the [DuckDB 1.1.1 changelog](https://github.com/duckdb/duckdb/releases/tag/v1.1.1). - In the MotherDuck Web UI, users can easily view and copy the contents of a cell from their query results. ## September 16, 2024 MotherDuck now supports DuckDB version 1.1.0. 
🎉 This release includes a number of new features and a lot of performance improvements. Here is a non-exhaustive list of key updates:

**New features**

- [SQL variables](https://duckdb.org/2024/09/09/announcing-duckdb-110#friendly-sql)
- [`query` and `query_table` functions](https://duckdb.org/2024/09/09/announcing-duckdb-110#query-and-query_table-functions)
- [GeoParquet (Spatial extension features)](https://duckdb.org/2024/09/09/announcing-duckdb-110#spatial-features)

**Performance improvements**

- [Dynamic Filter Pushdown from Joins](https://duckdb.org/2024/09/09/announcing-duckdb-110#dynamic-filter-pushdown-from-joins)
- [Automatic CTE Materialization](https://duckdb.org/2024/09/09/announcing-duckdb-110#automatic-cte-materialization)
- [Parallel Streaming Queries](https://duckdb.org/2024/09/09/announcing-duckdb-110#automatic-cte-materialization)

Read more on [DuckDB's 1.1.0 blog](https://duckdb.org/2024/09/09/announcing-duckdb-110.html).

## September 5, 2024

- New MotherDuck users are optionally guided through running and analyzing a query upon first logging in to the Web UI.

## August 21, 2024

- MotherDuck now supports [Full Text Search - FTS extension](https://duckdb.org/docs/extensions/full_text_search.html). You can now create a text search index on tables in your MD databases and search them. (Note: Currently, creating the FTS index is not supported from the MotherDuck-WASM client or app.motherduck.com, but it is supported from all other clients.)

## August 14, 2024

- MotherDuck now has an [embedding()](documentation/sql-reference/motherduck-sql-reference/ai-functions/embedding.md) function to compute `FLOAT[512]` text embeddings based on OpenAI's text-embedding-3-small model. Read more about it in our [announcement blog post](https://motherduck.com/blog/sql-embeddings-for-semantic-meaning-in-text-and-rag/)!
- MotherDuck now supports [sequences](https://duckdb.org/docs/sql/statements/create_sequence.html), with one small limitation: Table column definitions that refer to a sequence by a fully qualified catalog name are rejected. Note that cross-catalog references are already disallowed by DuckDB.

## August 7, 2024

- MotherDuck now supports [foreign keys](https://duckdb.org/docs/sql/constraints.html#foreign-keys). Foreign keys define a column, or set of columns, that refer to a primary key or unique constraint from another table. The constraint enforces that the key exists in the other table.

## July 24, 2024

- In the MotherDuck Web UI, users can now drop, rename, and comment on tables/views and columns from the Object Explorer.
- Users can now see the logical size of their MotherDuck databases using `FROM pragma_database_size()`.

## July 10, 2024

- **Access Tokens**: Users can now create multiple access tokens and revoke them as needed. Tokens can also be configured to expire after a set number of days. [Learn more](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck).
- **Organization domain invites**: Organizations can be configured such that anyone with the organization's email domain automatically receives an invitation upon signing up.
- **CREATE SHARE with conflict mode**: Database shares can be created with a conflict mode, so if a share with the same name already exists, `IF NOT EXISTS` will not throw an error and `OR REPLACE` will replace it with a new share.

## June 26, 2024

- **Delta Lake support**: You can now query Delta Lake tables in MotherDuck. [Learn more](/integrations/file-formats/delta-lake).
- In the MotherDuck Web UI, the Object Explorer interface (that catalogs shares and databases on the left side of the UI) has been revamped.
- ACH has been added as a billing method, in addition to credit card billing.
- Resolved an issue affecting large SQL queries in both the MotherDuck UI and the Wasm SDK.
## June 20, 2024 - New MotherDuck users are now treated to a "Welcome to MotherDuck!" notebook upon first logging on to the Web UI. - In the MotherDuck Web UI, the legacy notebook called "My Notebook" can now be renamed and/or deleted, and notebooks can now be closed. - In the MotherDuck Web UI, helpful links and drop-down menus have been improved. - MotherDuck now supports DuckDB's [Spatial Extension](https://duckdb.org/docs/extensions/spatial.html). This extension is pre-installed in MotherDuck, and users are not required to install this extension. Currently, the `GEOMETRY` type has a limitation in that it does not currently render in the MotherDuck Web UI. More details to come. ## June 13, 2024 - Free Plan compute usage limits are now being enforced. Queries for users on the Free Plan may be throttled. [Learn more](/about-motherduck/billing/pricing#plan-comparison) ## June 11, 2024 - MotherDuck is now Generally Available! ## June 6, 2024 - MotherDuck now supports [organization-scoped and discoverable shares](/key-tasks/sharing-data/sharing-overview). - MotherDuck now supports storing [Hugging Face type secrets](/sql-reference/motherduck-sql-reference/create-secret). ## June 3, 2024 - MotherDuck now supports DuckDB version 1.0.0. If you have upgraded to 0.10.2+, you can connect with clients that are either of version 0.10.2, 0.10.3, or 1.0.0. ## May 30, 2024 - MotherDuck now supports DuckDB version 0.10.3. If you have upgraded to 0.10.2+, you can connect with clients that are either of version 0.10.2 or 0.10.3. - Added support to read datasets directly from HuggingFace. Learn more about this new feature in the [DuckDB HuggingFace announcement](https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb.html). - Added support for [ARRAY Type](https://duckdb.org/docs/sql/data_types/array.html#:~:text=Array%20Type%20%E2%80%93%20DuckDB&text=An%20ARRAY%20column%20stores%20fixed,ARRAY%20%2C%20LIST%20and%20STRUCT%20types.) in MotherDuck UI. 
- MotherDuck UI now supports multiple notebooks.
- Fixed a bug in which running the `UPDATE SHARE` command would kill ongoing queries.

## May 15, 2024

- MotherDuck now supports DuckDB 0.10.2. All new MotherDuck users default to DuckDB version 0.10.2, and all existing users can now permanently migrate to DuckDB version 0.10.2. DuckDB version 0.10.2 features a large number of stability and performance improvements, and all users are encouraged to migrate.
- Starting with DuckDB 0.10.2, MotherDuck now supports multiple versions of DuckDB at once. For example, you could use DuckDB version 0.10.3 in the CLI and DuckDB version 1.0 in Python.
- MotherDuck now supports [Multi-Statement Transactions](https://duckdb.org/docs/sql/statements/transactions.html). You must be on DuckDB version 0.10.2 or above.
- MotherDuck now supports [Indexes](https://duckdb.org/docs/sql/indexes.html) for the purpose of constraints of types `UNIQUE` or `PRIMARY KEY`. For example, you can leverage `INSERT ON CONFLICT` to dedupe or upsert your data. [Learn more](https://duckdb.org/docs/sql/statements/insert#on-conflict-clause).
- MotherDuck now supports Secrets syntax consistent with DuckDB 0.10 and above. [Learn more](/sql-reference/motherduck-sql-reference/create-secret).
- [FixIt](/getting-started/interfaces/motherduck-quick-tour#fix-errors-and-edit-queries-with-ai) is now 2-3x faster.
- Improved reliability of the service during releases. Moving forward, MotherDuck releases should not disrupt ongoing queries and workloads for users.

## May 8, 2024

- You can now preview DuckDB version 0.10.2 in MotherDuck.
- You can now [choose your organization's pricing plan](/about-motherduck/billing/managing-billing#choosing-your-billing-plan) using the [Plans](https://app.motherduck.com/settings/plans) page in the Settings section of the MotherDuck Web UI.
- You can now configure your organization's payment method in the [Billing](https://app.motherduck.com/settings/billing) page in the Settings section of the MotherDuck Web UI. Free Plan customers are not required to configure a payment method.

## May 1, 2024

- Fixed a bug in which MotherDuck releases would kill running queries. Releases no longer disrupt ongoing queries and workloads.
- A number of under-the-hood stability improvements.

## April 25, 2024

- Improved reliability of `ATTACH` operations.
- Various reliability and polish improvements.

## April 24, 2024

- **[Preview]** The MotherDuck [Wasm SDK](/sql-reference/wasm-client) is now available for app developers. Read more about the SDK in the [blog announcement](https://motherduck.com/blog/building-data-applications-with-motherduck/).

## April 17, 2024

- [Billing Portal](./billing/managing-billing.mdx) is now available in the MotherDuck Web UI. You can use the Billing Portal to view your organization's incurred usage and current and past invoices.
- You can now invite your teammates to [Organizations](../key-tasks/managing-organizations/managing-organizations.mdx). Currently, Organizations are useful to group users together to monitor incurred usage in the Billing Portal, and additional capabilities will land in coming weeks.
- Fixed an issue in which MotherDuck releases would cancel running queries.

## April 10, 2024

- Catalog changes in one MotherDuck client will now automatically propagate to other clients.
- MotherDuck now supports indexes on temporary tables.

## March 20, 2024

- Fixed an issue in which users' runtimes could become unresponsive.
- In the MotherDuck UI, improved how row counts and query times are calculated.
- A variety of additional bug fixes and infrastructure-level improvements.
## March 7, 2024 - Operations on all databases that create shares (using `CREATE SHARE`), create databases (using `CREATE DATABASE`), or update shares (using `UPDATE SHARE`) are now metadata-only and copy no data. ## February 29, 2024 - A variety of fixes and improvements across the product. ## February 22, 2024 - Numerous bug fixes and stability improvements across the entire product. ## February 14, 2024 - In the MotherDuck web UI, you can now visualize your tables and query results with the [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/). - For any database created starting today, operations on these databases that create shares (using `CREATE SHARE`), create databases (using `CREATE DATABASE`), and update shares (using `UPDATE SHARE`) are metadata-only and copy no data. ## February 13, 2024 - You are no longer required to provide a share name when creating shares. In this case, the created share will be named the same as the source database. For example, executing `CREATE SHARE FROM mydb` would create a share named `mydb`; if your current share is `db`, then `CREATE SHARE` would create a share named `db`. See [`CREATE SHARE`](../sql-reference/motherduck-sql-reference/create-share.md) syntax. - In CLI or Python, MotherDuck no longer displays the authentication token by default. You can retrieve the authentication token by running [`PRAGMA PRINT_MD_TOKEN`](../sql-reference/motherduck-sql-reference/print-md-token.md). - Support for DuckDB version 0.9.1 has ended. ## January 04, 2024 New Features: - MotherDuck now supports [DuckDB macros](../sql-reference/duckdb-sql-reference/duckdb-statements/create-macro.md). - MotherDuck now supports [DuckDB ENUM data types](../sql-reference/duckdb-sql-reference/enum.md). - Fully qualified column names in SELECT clauses are now supported. 
For example: ```sql SELECT schema.table.column FROM schema.table ``` Updates and Fixes: - Fixed a bug, in which prepared statements for INSERT operations did not work. - In the MotherDuck web UI, data exports are now faster. - Rolled out major infrastructure improvements in hybrid query execution, resulting in faster and more reliable hybrid queries. ## January 03, 2024 - [FixIt](/getting-started/interfaces/motherduck-quick-tour#fix-errors-and-edit-queries-with-ai) helps you resolve common SQL errors by offering fixes in-line. ## November 30, 2023 - In the MotherDuck web UI, you can now copy query results to the clipboard or export query results as CSV, TSV, Parquet, or JSON files. ![Export query results](./img/release-notes-1.15.0-export.png) - In the MotherDuck web UI, query error messages are now easier to read. ![Query error message](./img/release-notes-1.15.0-error-messages.png) ## November 15, 2023 - MotherDuck has been upgraded to DuckDB 0.9.2. You can use either DuckDB 0.9.1 or DuckDB 0.9.2, but not both, until December 6th. ## November 3rd, 2023 - You can now [query Iceberg tables](../integrations/file-formats/apache-iceberg.mdx) on object storage. - Improved stability of share attaches. - In the MotherDuck web UI, a new database selector now enables you to use a specific database for each notebook cell. ## October 25, 2023 - In the MotherDuck web UI, you can now move and reorder individual notebook cells. - In the MotherDuck web UI, the MotherDuck-specific SQL syntax is now highlighted. - In the MotherDuck web UI, column histograms are now opt-in on a per-result basis, rather than a global opt-out via Settings. - Improved how the MotherDuck web UI displays datetime data types, matching formatting in the CLI. - In the MotherDuck web UI, you can now easily copy-paste a rectangular selection of query results into Google Sheets or Excel. ## October 16, 2023 MotherDuck has been upgraded to DuckDB 0.9.1 :tada: Please see the migrations guide for more info! 
- You can now query Azure object storage. See [documentation](../integrations/cloud-storage/azure-blob-storage.mdx) for more info. - You can now easily load AWS credentials used locally into MotherDuck. Please see the syntax for [`CREATE SECRET`](../sql-reference/motherduck-sql-reference/create-secret.md) for more info. - Better performance and reliability with lower memory usage. - More intelligent parsing of CSV files. ## September 21, 2023 - The MotherDuck web UI supports attaching and detaching databases and shows detached databases. - The MotherDuck web UI now loads significantly faster. This is an additional improvement over August 30, 2023. - When a user updates a shared database, all consumers automatically receive the update within 1 minute. - Added support for `CREATE OR REPLACE DATABASE` and `CREATE IF NOT EXISTS DATABASE`. - Fixed a bug in which queries with long commit times would result in the dreaded "`Invalid Error: RPC 'SETUP_PLAN_FRAGMENTS' failed: Deadline Exceeded (DEADLINE_EXCEEDED)`" error. - Performance and stability of uploads have been improved. - The MotherDuck web UI now displays decimals correctly. ## August 30, 2023 - The MotherDuck web UI now loads significantly faster. - The MotherDuck web UI now supports autocomplete. As you write SQL in the UI, on every keystroke autocomplete brings up query syntax suggestions. You can turn off autocomplete in Web UI settings, found under the gear icon in top right. - In the MotherDuck web UI, you can now execute multiple SQL statements in the same SQL cell. ## August 23, 2023 - Fixed a bug in which large uploads and downloads would fail. - Improved performance of uploading data into MotherDuck from all supported sources. - Added the [SHOW ALL DATABASES](../sql-reference/motherduck-sql-reference/show-databases.md) DDL command. This command enables you to list all database types, including MotherDuck databases, DuckDB databases, and databases that were created from shares.
- In the MotherDuck web UI, you can now cancel queries. ![cancel query](./img/release0823_1.png) - In the MotherDuck web UI, you can now add files of type JSON and files with arbitrary postfixes. - In the MotherDuck web UI, under the 'Help' menu, you can now find the service-specific Terms of Service. ## August 17, 2023 - Numerous stability and performance improvements across the entire product. - Added more descriptive error messages in a number of areas. - Better timestamp support in the MotherDuck UI. ## August 01, 2023 - You can now copy a MotherDuck database through [CREATE DATABASE](/sql-reference/motherduck-sql-reference/create-database) using `CREATE DATABASE cloud_db FROM another_cloud_db`. - Fixed an HTTPS certificate error that appeared on Windows machines when downloading or loading the MotherDuck extension through the CLI. - Fixed a bug where [DESCRIBE SHARE](../sql-reference/motherduck-sql-reference/describe-share.md) was not returning the actual database name. ## July 26, 2023 - You can now use MotherDuck in CLI or Python with the Windows operating system. - LIST and DESCRIBE SHARES SQL commands now return the database name instead of the snapshot name. - Improved resilience of large uploads. - Added more descriptive error messages for DDL queries. ## July 21, 2023 - Added DDL for [`DESCRIBE SHARE`](/sql-reference/motherduck-sql-reference/describe-share) and [`UPDATE SHARE`](/sql-reference/motherduck-sql-reference/update-share). - Added DDL for [`CREATE [OR REPLACE] SECRET`](/sql-reference/motherduck-sql-reference/create-secret) and [`DROP SECRET`](/sql-reference/motherduck-sql-reference/delete-secret). - Added `RESTRICT` and `CASCADE` options to `DROP DATABASE` DDL. See [documentation](/sql-reference/motherduck-sql-reference/drop-database). - The current database, set with USE DATABASE, is now persisted across sessions in the web UI. - Data uploads and downloads have been accelerated by roughly 3x by compressing data over the wire.
- Numerous stability and performance improvements across the entire product. - Added more descriptive error messages in a number of areas. ## June 29, 2023 - You can now use AI to help you write SQL with the `prompt_sql` function, answer questions about your data with the `prompt_query` pragma, describe your data with the `prompt_schema` pragma, and fix your SQL with the `prompt_fixup` function. See [documentation](/key-tasks/ai-and-motherduck/ai-features-in-ui). ## June 27, 2023 - Added support for [`DROP SHARE [IF EXISTS]`](/sql-reference/motherduck-sql-reference/drop-share), [`LIST SHARES`](/sql-reference/motherduck-sql-reference/list-shares), and [`LIST SECRETS`](/sql-reference/motherduck-sql-reference/list-secrets) operations. Previously these operations were supported via table functions. The MotherDuck web UI now supports creating, deleting, and listing S3 secrets. - Numerous improvements to the MotherDuck web UI. - Fixed a bug in which the share URL was not returned after running the `CREATE SHARE` command in the CLI. - Referencing database objects is now case insensitive. For example, if a database `DuCkS` exists, you can now reference it as `ducks` or `DUCKS`. When listing databases, you will see `DuCkS`. ## June 23, 2023 - Numerous fixes to improve the stability and reliability of our authentication process and token expiry. - In the MotherDuck web UI there is now a new drop-down menu on User Profile (upper right) with options to access settings, send an invite, and log out. - Added support for the `IF EXISTS` option to the `DROP DATABASE` SQL command. See [documentation](/sql-reference/motherduck-sql-reference/drop-database). - Added support for allowing the `motherduck_token` parameter in the connection string. - Added the `md_list_secrets()` table function. Because MotherDuck currently only supports a single secret, this function returns either `TRUE` or `FALSE` depending on whether a secret exists.
See [documentation](/sql-reference/motherduck-sql-reference/list-secrets). - Fixed a bug in the MotherDuck web UI where tables were rendered incorrectly. ## June 21, 2023 - In the MotherDuck web UI, the interactive query results panel now supports all DuckDB data types. - Easier signup flow for new users. - Performance of loading data into MotherDuck has been improved. - Added support for `CREATE [OR REPLACE | IF NOT EXISTS] DATABASE` and `CREATE DATABASE FROM CURRENT_DATABASE()`. - A concurrency issue on dropping and recreating shares has been resolved. - Timeout handling for hybrid queries has been improved. - The MotherDuck connection parameter `deny_local_access` has been renamed to `saas_mode` and now sets both `enable_external_access=false` and `lock_configuration=true` DuckDB properties. In practice, this means that when connecting to MotherDuck with the `saas_mode=true` parameter, users will _not_ be able to read/write local files, read/write local DuckDB databases, install/load any extensions or update any configuration. See [documentation](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-saas-mode). - Numerous other improvements. ## June 15, 2023 - MotherDuck now supports DuckDB [0.8.1](https://github.com/duckdb/duckdb/releases/tag/v0.8.1). Currently, MotherDuck only supports a single version of DuckDB at a time so you must upgrade your DuckDB instances to 0.8.1. - Performance of loading data into MotherDuck has been drastically improved. - The database name in the `CREATE DATABASE` SQL command is now an identifier rather than a string literal. Leave the name unquoted or wrap it in double quotes; single-quoted names are no longer accepted.
For example: - Supported: `CREATE DATABASE ducks;` - Supported: `CREATE DATABASE "ducks";` - No longer supported: `CREATE DATABASE 'ducks';` - You can now create a share using the `CREATE SHARE` statement, in addition to the previously supported table function `md_create_database_share()`: - Supported: `CREATE SHARE myshare FROM ducks;` - Supported: `CALL md_create_database_share( 'myshare' , 'ducks' );` - You can now write data to S3 using the `COPY TO` command. - In the web UI, entering and exiting full screen mode has been simplified. You can also choose to only display the query editor or the query results using the overflow menu. - In the web UI, you can now work with compound data types from JSON in interactive query results. - You can now use both lowercase and uppercase versions of the environment variable `motherduck_token` (e.g. `MOTHERDUCK_TOKEN`). ## June 7, 2023 - Views are now supported. - Query results in the web UI are now interactive. Powered by [Tad](https://www.tadviewer.com/) and DuckDB in WASM, you can now quickly sort, filter and pivot results of a SQL query. Click on column headers to sort, or the pivot icon to open the control surface. ![query results](./img/release0607_1.png) - Query results now include interactive column histograms for numeric columns. The gray background area of the column histogram is a brush that can be dragged to interactively filter results. ![query results 2](./img/release0607_2.png) - The MotherDuck extension for CLI and Python auto-updates itself. Users no longer need to run 'FORCE INSTALL motherduck' to update their MotherDuck-powered DuckDB instances. Note: of course, to get this goodness, we ask you to run force install one last time. - Various stability and usability improvements. ## May 31st, 2023 **Summary** - SQL queries in the web UI are now automatically saved in local storage in your web browser and restored when you reload the page. - The MotherDuck extension is now available for Linux on ARM64!
- Support [ON CONFLICT](https://duckdb.org/docs/sql/statements/insert.html#on-conflict-clause) clause. - New setting `deny_local_access` to lock down filesystem and extension loading (note: does not prevent DuckDB database access). ## May 24, 2023 **Summary** - Various stability improvements and bug fixes ## May 22, 2023 **Summary** - The MotherDuck service is upgraded to DuckDB 0.8.0 - Catalog schemas are now supported. - Querying `md_databases()` no longer returns snapshots. - Shares that you create are no longer auto-attached. As the creator, you can attach them via `attach ` - Various stability improvements and bug fixes **_Known issues_** - Some shares appear as "empty" databases. Please report to [support@motherduck.com](mailto:support@motherduck.com) if you spot a sharing issue. ## May 17, 2023 - The DuckDB ICU [extension](https://duckdb.org/docs/extensions/overview.html#all-available-extensions) is now enabled by default. This extension adds support for time zones and collations using the ICU library. - The web UI now displays your avatar instead of initials in the user menu - The first database alphabetically is now used for querying by default in web UI. CLI behavior has not changed – if you don't pass a specific database through the connection string, the default database _my_db_ will be used for querying. NOTE: this will change once we upgrade to the just-released DuckDB 0.8.0 - Output of query EXPLAIN is now more user-friendly - Various stability improvements and bugfixes ## May 5, 2023 - Fixed a bug, in which users were unable to supply the authentication token in-line in the connection string. For instance `.open md:?token=123123` or `duckdb md:?token=3333`. - DELETE and UPDATE table operations are now supported. 
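As a minimal sketch of the `DELETE` and `UPDATE` support noted above, the statements below modify and remove rows in a small table. The table `ducks` and its columns are hypothetical and exist only for illustration; the statements are standard DuckDB SQL.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE ducks (id INTEGER PRIMARY KEY, name VARCHAR, species VARCHAR);
INSERT INTO ducks VALUES
    (1, 'Huey', 'mallard'),
    (2, 'Dewey', 'mallard'),
    (3, 'Louie', 'pekin');

-- UPDATE changes matching rows in place.
UPDATE ducks SET species = 'muscovy' WHERE id = 2;

-- DELETE removes rows that match the predicate.
DELETE FROM ducks WHERE species = 'pekin';

-- Remaining rows: (1, 'Huey', 'mallard') and (2, 'Dewey', 'muscovy').
SELECT * FROM ducks ORDER BY id;
```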
## May 3, 2023 - Stability of DML and DDL operations has been greatly improved - Hybrid query execution has now been upgraded to execute many query types more efficiently - ~~You can now upload your current DuckDB database using the `CREATE DATABASE FROM 'CURRENT_DATABASE'` operation~~ (no longer supported as of October 2025) - In the web UI you can now find a link to MotherDuck's technical documentation - In the web UI you can now upload files from your local computer to MotherDuck - In programmatic interfaces (JDBC, CLI, Python) you can now connect to a specific database using syntax `md:` or `motherduck:` - MotherDuck now creates a default database called `my_db` for you. This is the database you connect to if you do not specify a database when connecting to MotherDuck ## April 26, 2023 - You can now work with multiple databases - cloud or local. You can now query across multiple cloud or local databases - You can now save your S3 credentials in MotherDuck using the MD_CREATE_SECRET operation - You can now upload DuckDB databases to MotherDuck using the CREATE DATABASE FROM operation - MotherDuck UI now has improved notebook experience ## April 19, 2023 - Various stability, performance, and UI improvements ## April 12, 2023 - The JSON extension to DuckDB is now pre-installed automatically in the web UI. - The table viewer component in the Web UI is now a simple table (rather than an interactive pivot table). This should greatly improve time to first render on query results, especially for small queries. We plan to re-enable the pivot table in an upcoming release, once some underlying performance issues are resolved. - The duck feet are paddling very hard underwater (numerous stability and performance improvements). 
## March 30, 2023 - Fixed: [auto_detection of schema of .csv fails in WASM](https://lindie.app/share/92ac65cc6e006bff2fb60417388294965ef2d4c7) - Fixed: intermittent "Error reading catalog: Cancelling all calls" error - Numerous stability and performance improvements ## March 22, 2023 - CLI uses the same database by default as the web app (first sorted alphabetically) - Multiple improvements in the MotherDuck UI - Numerous stability and performance improvements - Enabled query EXPLAIN for queries that execute in hybrid mode ## March 8, 2023 - Numerous stability and performance improvements - Vastly improved performance of loading multiple CSVs in the same command - Fixed a bug in CLI, in which authentication via browser would fail ## March 1, 2023 > Even more goodies! - Delivered major improvements to hybrid execution, resulting in better efficiency, stability, and performance - Fixed a bug in UI, in which dropping and creating a database with the same name displayed incorrect information - Migrated to DuckDB 0.7.1 - Fixed an error message when running MotherDuck commands in the CLI without running .open ## January 26, 2023 > We're back with more exciting improvements! - Addressed server timeouts associated with long-running queries. Still triaging other potential issues with long-running queries, but network-tier issues should be mitigated to a large degree. - Empty databases now appear in the catalog in UI - Added an MD_VERSION Pragma function - Implemented OAuth sign-in flow from native client - Upgraded MotherDuck-hosted DuckDB to version 0.6.1 - Fixed a number of bugs across the entire service ## December 23, 2022 > Our first release! Duckies first steps 🦆 --- Source: https://motherduck.com/docs/about-motherduck/release-notes --- sidebar_position: 1 title: Release notes description: Latest updates, new features, and improvements to MotherDuck.
--- import VideoPlayer from '@site/src/components/VideoPlayer'; import useBaseUrl from '@docusaurus/useBaseUrl'; # Release notes Welcome to our release notes, we're excited to hear about your experience 😃 :::info 💁 If you have any questions, please connect with us directly in our [Community Slack support channel](https://slack.motherduck.com/) or send a note to support@motherduck.com. ::: For older updates, see the [release notes archive](/about-motherduck/release-notes-archive/). ## May 7, 2026 - **Data exports from Dives:** [Dives](/key-tasks/ai-and-motherduck/dives/) can now include export buttons that deliver CSV, JSON, Parquet, or Excel files, both in the MotherDuck UI and in [embedded Dives](/key-tasks/ai-and-motherduck/dives/embedding-dives/#handle-data-exports-from-embedded-dives). Wire up exports from your Dive code with the new [`exportAs` and `useExport` hooks](/sql-reference/motherduck-sql-reference/ai-functions/dives/use-sql-query/#export-query-results) from `@motherduck/react-sql-query`. Embedded exports are delivered to your host page through a `postMessage` channel and require dual mode, now the default for embedded Dives. - **MotherDuck extends cloud coverage to Oregon `us-west-2`:** MotherDuck is now [available on AWS in Oregon `us-west-2`](/concepts/architecture-and-capabilities/#the-motherduck-cloud-service); users are able to create new Organizations in Oregon for lower latency and regional data residency. - **Drizzle support via the Postgres endpoint:** You can now connect to MotherDuck from [Drizzle](https://orm.drizzle.team/) using the [Postgres endpoint](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/drizzle/). ## May 1, 2026 - **Streamlined Dives sidebar:** The [Dives](/key-tasks/ai-and-motherduck/dives/) list in the Object Explorer is now capped at your 10 most recent Dives, with a new **View all Dives** link that opens the full searchable Dives table. 
- **Wasm Client SDK no longer requires COI headers:** The [MotherDuck Wasm client](/sql-reference/wasm-client/) SDK (now at version `1.5.2`) no longer requires Cross-Origin Isolation (COI) headers, so custom DuckDB Wasm applications using `LOAD motherduck` can run in standard, non-COI environments. This change also brings faster transfers for larger payloads. - **JIT provisioning enabled by default for SSO:** When you activate [SSO](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/) for the first time, [Just-in-Time (JIT) provisioning](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/#just-in-time-jit-user-provisioning) is now on by default, so users from your verified domain can join the organization automatically on first login. - **Link handling in Dives:** When you click a link inside a [Dive](/key-tasks/ai-and-motherduck/dives/), MotherDuck surfaces a confirmation popup for external URLs so you can review the destination before opening it, and navigates directly for links that point inside the MotherDuck UI (for example, another Dive). [Embedded Dives](/key-tasks/ai-and-motherduck/dives/embedding-dives/#handle-link-navigation-from-embedded-dives) forward each click to the host page through a `postMessage` channel, so the parent app can apply its own navigation policy. ## April 22, 2026 - **Duckling Overview UI:** MotherDuck Admins now have access to a [Duckling Overview page](/getting-started/interfaces/motherduck-quick-tour/#duckling-overview) in Settings for a view of activity across every [Duckling](/about-motherduck/billing/duckling-sizes/) in the organization over the last 24 hours, including key metrics like status, disk spills, active minutes, and query-level drilldowns. 
- **Dive Viewer Inline Preview for Remote MCP Server:** The [MotherDuck remote MCP server](/sql-reference/mcp/) now includes an inline preview [Dive viewer](https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/#inline-preview-with-the-dive-viewer), allowing users and AI agents to create and view [Dives](/key-tasks/ai-and-motherduck/dives/) directly in their chat workflow. - **Removed Cross-Origin Isolation (COI) requirement for DuckDB Wasm:** MotherDuck's DuckDB Wasm integration can now run in standard iframes and non-COI third-party environments. This is already powering the MotherDuck UI and [Embedded Dives](/key-tasks/ai-and-motherduck/dives/embedding-dives/), which default to [dual execution](/key-tasks/running-hybrid-queries/) for zero-latency client-side queries. Support for `LOAD motherduck` from custom DuckDB Wasm applications will follow in a future Wasm SDK release. ## April 16, 2026 - **DuckDB 1.5.2 support:** MotherDuck supports DuckDB 1.5.2, a bugfix release. Learn more in the [official DuckDB Labs 1.5.2 announcement](https://duckdb.org/2026/04/13/announcing-duckdb-152) and [changelog](https://github.com/duckdb/duckdb/releases/tag/v1.5.2). - **DuckLake 1.0 support:** MotherDuck supports DuckLake 1.0. Learn more in the [official DuckDB Labs DuckLake 1.0 announcement](https://ducklake.select/2026/04/13/ducklake-10/). - **Concurrent checkpoints:** Checkpoints can now run concurrently with reads, insertions, and deletions. Previously they could block or be blocked by user queries and also interfere with share updates, as these require a checkpoint. Note that concurrent checkpoints require writes or deletions to be issued by DuckDB clients that are at least on [version 1.5](https://motherduck.com/docs/troubleshooting/version-lifecycle-schedules/).
## April 9, 2026 - **Postgres wire protocol endpoint support for PowerBI:** The [Postgres Endpoint](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/postgres-endpoint-jdbc/), a Postgres wire protocol-compatible MotherDuck client, now enables connectivity with [PowerBI](https://motherduck.com/docs/integrations/bi-tools/powerbi/). ## April 3, 2026 - **[Preview] Embedded Dives:** MotherDuck Dives can now be embedded directly in your customer-facing applications. Create a Dive, generate an embed session, and drop an iframe into your app. Your end-users can explore data in real time, powered by MotherDuck. Embedded Dives currently require a Business plan. Read the [announcement blog](https://motherduck.com/blog/introducing-embedded-dives/) for more details or refer to the [documentation](https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/embedding-dives/) to get started. - **Airbyte Connector Certified:** MotherDuck is now a certified Airbyte destination. Refer to the [Airbyte destination documentation](https://docs.airbyte.com/integrations/destinations/motherduck) for more details. - **Customizable Sidebar:** The Object Explorer sidebar is now fully customizable. Right-click any section to show, hide, or reorder it — or access the customization dialog from the command menu or the org menu. Your expand/collapse state is persisted across sessions. ## March 25, 2026 - **Configurable Duckling cooldown periods:** The cooldown period for Standard, Jumbo, Mega, and Giga Ducklings can now be configured from 1 minute to 24 hours via the UI, SQL, or REST API. Learn more in the [Duckling sizes](https://motherduck.com/docs/about-motherduck/billing/duckling-sizes/) documentation. - **SHUTDOWN and SHUTDOWN TERMINATE commands:** Two new SQL commands give you direct control over Duckling lifecycle without waiting for the cooldown period. 
[SHUTDOWN](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/shutdown-terminate/#shutdown) shuts down a Duckling after running queries complete; [SHUTDOWN TERMINATE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/shutdown-terminate/#shutdown) force-terminates immediately. Both are subject to a 1-minute billing minimum and are useful for cost control in batch pipelines or CI/CD workflows where you want to stop billing as soon as work is done. See the [command reference](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/shutdown-terminate/) for details. - **Dives Version History in UI:** Every time you update a [Dive](https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/), MotherDuck saves a version. Browse previous versions using the version picker in the top-right corner of any Dive or retrieve them programmatically with [list_dives](https://motherduck.com/docs/sql-reference/mcp/list-dives/) and [read_dive](https://motherduck.com/docs/sql-reference/mcp/read-dive/). Version browsing is read-only — selecting an older version does not overwrite the latest. ## March 24, 2026 - **DuckDB 1.5.1 support:** MotherDuck supports [DuckDB 1.5.1](https://duckdb.org/2026/03/23/announcing-duckdb-151) ("Variegata"). This release brings significant performance improvements including bloom filter join pushdown, stats-only min/max evaluation, faster TopN queries with late materialization, and lazy view binding. Note that the VARIANT data type and background checkpointing improvements are not yet supported in MotherDuck and will be made available in a future update. See the [version lifecycle schedules](/troubleshooting/version-lifecycle-schedules/) for supported version ranges. 
- **DuckLake 0.4:** The DuckLake open table format has been updated to [version 0.4](/integrations/file-formats/ducklake/#whats-new-in-ducklake-10), introducing deletion inlining, sorted compaction, stats-only `COUNT(*)`, TopN file pruning, expression-based default values, and macro support. ## March 12, 2026 - **Postgres wire protocol endpoint support for Tableau Cloud:** The [Postgres Endpoint](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/postgres-endpoint-jdbc/), a Postgres wire protocol-compatible MotherDuck client, now enables connectivity with [Tableau Cloud](https://motherduck.com/docs/integrations/bi-tools/tableau/tableau-cloud/). - **Dives can now be remixed:** A new 'Remix this Dive' menu option in the left-side object explorer opens ChatGPT or Claude with pre-filled metadata to help you explore and iterate on existing analysis and create new [Dives](/key-tasks/ai-and-motherduck/dives/).
## March 6, 2026 - **Postgres wire protocol endpoint:** Query your MotherDuck databases using any Postgres-compatible client, including `psql`, Python libraries like psycopg2/psycopg3, JDBC drivers, and serverless platforms like Cloudflare Workers — without installing a DuckDB client library. The Postgres endpoint is ideal for serverless environments, languages without a DuckDB SDK, thin client architectures, or any tool that supports PostgreSQL data sources. See guides for [Python](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/postgres-endpoint/), [Java (JDBC)](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/postgres-endpoint-jdbc/), and [Cloudflare Workers](https://motherduck.com/docs/getting-started/interfaces/serverless/cloudflare-workers/). ## February 27, 2026 - **Write support for the MCP server is now live:** Users can now perform write operations via the MCP Server using the new `query_rw` tool. This enables programmatic data modification workflows in addition to read queries. Refer to the [documentation](/sql-reference/mcp/query-rw/) and [setup guide for read-only access](/sql-reference/mcp/#restricting-to-read-only-access) for more details. ## February 26, 2026 - **Dives are now available on all plans:** Dives are now available on all MotherDuck plans, giving users access to shareable visualizations built by AI agents and backed by composable SQL. Refer to the [documentation](/key-tasks/ai-and-motherduck/dives/) for more details. - **Faster edits for Dive Previews in Claude:** Claude agents in [Claude Web](http://claude.ai/) and [Claude Desktop](https://code.claude.com/docs/en/desktop) now apply edits to the existing Dive preview instead of re-generating it from scratch. - **SSO support:** MotherDuck now supports Single Sign-On (SSO) with Okta, Microsoft Entra ID, and federated SAML/OIDC. 
Refer to the [documentation](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/) for more details. ## February 19, 2026 - **Visualizations with Dives:** Create interactive visualizations directly in your MotherDuck UI using natural language. To get started, connect your [AI agent](/getting-started/mcp-getting-started/) (Claude, ChatGPT, Cursor, or any MCP-compatible client) to the [MotherDuck MCP Server](/sql-reference/mcp/) and ask it to build a Dive. The agent writes the SQL, configures the charts, and saves the visualization in your MotherDuck UI. Dives stay live and up-to-date and can be shared across your organization. Read the [announcement blog](https://motherduck.com/blog/duck-dive-and-answer/) and get started in the [Dives documentation](/key-tasks/ai-and-motherduck/dives/). Dives are now available on all MotherDuck plans at no additional charge. ## February 5, 2026 - **Expanded DuckDB Extension support in the MotherDuck MCP Server:** The MotherDuck Remote MCP Server is now compatible with the DuckLake, Spatial, Iceberg, and Delta DuckDB extensions. Read more about using the MotherDuck MCP Server [in the documentation](/key-tasks/ai-and-motherduck/mcp-workflows/). - **Point-in-time Restore**: Restore databases to previous states using automatic or named snapshots within a configurable retention window of up to 90 days. Use [`UNDROP DATABASE`](/sql-reference/motherduck-sql-reference/undrop-database/) to recover deleted databases. Learn more in the [launch blog post](https://motherduck.com/blog/point-in-time-restore/) and the [data recovery documentation](/concepts/data-recovery/). - **DuckDB Database File Upload:** Upload DuckDB database files (.duckdb, .db) from your laptop to MotherDuck using the "Add data" menu in the MotherDuck UI. Preview tables and schemas in the UI before copying to MotherDuck. 
See the [documentation on loading DuckDB databases into MotherDuck](/key-tasks/loading-data-into-motherduck/loading-duckdb-database/) to learn more about using DuckDB database files. ## January 29, 2026 - **DuckDB 1.4.4:** MotherDuck supports DuckDB 1.4.4, a bugfix release. Learn more in the [official DuckDB Labs 1.4.4 announcement](https://duckdb.org/2026/01/26/announcing-duckdb-144.html) and [changelog](https://github.com/duckdb/duckdb/releases/tag/v1.4.4). ## January 23, 2026 - **Expanded MCP Server support:** The MotherDuck remote MCP Server now supports [Warp](https://www.warp.dev/), [PearAI](https://trypear.ai/), [Trae](https://www.trae.ai/), [Void](https://voideditor.com/), [Positron](https://positron.posit.co/), [Supermaven](https://supermaven.com/), [Aider](https://aider.chat/), and [JetBrains IDEs](https://www.jetbrains.com/). Use your favorite AI assistant to answer questions about your data through natural conversation. See the [MCP Server documentation](/sql-reference/mcp/) to get started. - **Add Data from Cloud Storage:** Import data from Amazon S3, Google Cloud Storage, Cloudflare R2, and others directly in the MotherDuck UI. Click "Add data" and select "From cloud storage" to browse your bucket, select files (or use Wildcard mode for patterns), preview the data, and create tables. Learn more in the [documentation for loading data from Cloud Storage](/key-tasks/loading-data-into-motherduck/loading-data-from-cloud-or-https/). ## January 8, 2026 - **Giga Ducklings on Business plan:** Users on any MotherDuck Business plan can now access [Giga Ducklings](../billing/duckling-sizes/#giga), our largest compute Duckling size, built to tackle the largest, toughest, most complex data transformations. Configure your Duckling size in [Settings > Ducklings](https://app.motherduck.com/settings/ducklings). ## December 17, 2025 - **MotherDuck MCP Server:** Your favorite AI assistant can now talk directly to your data. 
Connect Claude, ChatGPT, Cursor, or any MCP-compatible client to MotherDuck using the MotherDuck **remote** MCP Server at `https://api.motherduck.com/mcp`. Your agent can explore schemas, run read-only SQL queries, and answer questions about your databases through natural conversation. Learn more in the [announcement blog](https://motherduck.com/blog/analytics-agents), and [MCP Server documentation](/sql-reference/mcp/). ## December 16, 2025 - **DuckDB 1.4.3:** MotherDuck supports DuckDB 1.4.3, a bugfix release. Learn more in the [official DuckDB Labs 1.4.3 announcement](https://duckdb.org/2025/12/09/announcing-duckdb-143) and [changelog](https://github.com/duckdb/duckdb/releases/tag/v1.4.3). - **PlanetScale Postgres integration:** Users of PlanetScale Postgres can now use [pg_duckdb](/concepts/pgduckdb/) to push analytical queries to MotherDuck. Analytical queries run up to 200x faster on MotherDuck, which keeps your Postgres cluster optimized for transactions. Learn more in the [announcement blog](https://motherduck.com/blog/motherduck-planetscale-integration), and [integration documentation](/integrations/databases/planetscale). - **MotherDuck destination for Artie CDC**: Artie now supports MotherDuck as a destination for CDC. Users of Artie can now stream changes from OLTP databases like PostgreSQL, MySQL, and MongoDB to MotherDuck in real time. Learn more in the [announcement blog](https://motherduck.com/blog/motherduck-artie-integration/), and [Artie documentation](https://www.artie.com/docs/destinations/motherduck). - **Recent Queries added to `MD_INFORMATION_SCHEMA`:** Organization admins on MotherDuck Business plans can now access a near-real-time view of all currently running or recently completed queries across their full organization using the [`RECENT_QUERIES` view](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/).
This view offers detail for queries not yet captured in the [`QUERY_HISTORY` view](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/). Both views are accessible in the [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction/). - **New columns for query attribution in query history:** The [`QUERY_HISTORY` view](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) along with the new [`RECENT_QUERIES` view](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) in the [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction/) now contain `session_name` and `duckling_id` columns, making it easy to identify which Duckling executed each query, and group read scaling queries by [`session_name`](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#session-affinity-with-session-name). - **MotherDuck Wasm SDK 0.8:** The [MotherDuck Wasm Client](https://www.npmjs.com/package/@motherduck/wasm-client) now leverages a different mechanism for loading the MotherDuck Wasm extension, which makes it easier to control which version of the extension is loaded. Refer to the [documentation](/sql-reference/wasm-client/) to learn more. 
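As a rough sketch of how the new attribution columns might be used, the query below groups query history by Duckling and session. Only `session_name` and `duckling_id` come from the notes above; the `query_count` alias is illustrative, and access to the view may require Organization admin permissions.

```sql
-- Count queries per Duckling and session name.
-- Only session_name and duckling_id are confirmed by the release notes above;
-- query_count is an illustrative alias.
SELECT
    duckling_id,
    session_name,
    count(*) AS query_count
FROM MD_INFORMATION_SCHEMA.QUERY_HISTORY
GROUP BY duckling_id, session_name
ORDER BY query_count DESC;
```

The same shape of query should apply to `MD_INFORMATION_SCHEMA.RECENT_QUERIES`, since the notes state that both views expose these two columns.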
## December 12, 2025 - **Query Scheduling Improvements:** Small queries now complete faster without getting stuck waiting behind large, resource-intensive queries, even when heavy queries are processing in the background - **Search Enhancements:** The search bar in the top left pane of the Object Explorer can now be used to search for schemas, tables, and columns in addition to databases and [shares](/key-tasks/sharing-data/) - **Dvorak Keyboard Support:** Dvorak keyboard shortcuts are now supported in the MotherDuck UI - **Column Comments added to the Table Summary:** In the Table Summary, users can now hover over any column name to view its comments alongside the column name and type
## December 4, 2025 - **Transient storage filter in Settings:** The Databases page in Settings in the MotherDuck UI now supports filtering by [storage type](/concepts/storage-lifecycle/#transient-databases) - **`DESCRIBE` and `SUMMARIZE` exports:** Downloading the results of `DESCRIBE` and `SUMMARIZE` queries is now supported in the MotherDuck UI - **DuckLake option in the Add Database menu:** MotherDuck users can create a new [DuckLake](/integrations/file-formats/ducklake/) in the 'Add Database' modal in the left hand pane of the object explorer in the MotherDuck UI - **Inline Docs are now available in the Query Editor:** Notebook cells in the MotherDuck [query editor](/getting-started/interfaces/motherduck-quick-tour/#inline-docs) provide function information on hover, showing function signatures, parameter types, return types, and descriptions without leaving the notebook. Inline Docs can be toggled on and off by going to the Preferences page in Settings.
## November 14, 2025

- **DuckDB 1.4.2:** MotherDuck supports DuckDB 1.4.2, a bugfix release. Learn more in the [official DuckDB Labs 1.4.2 announcement](https://duckdb.org/2025/11/12/announcing-duckdb-142) and [changelog](https://github.com/duckdb/duckdb/releases/tag/v1.4.2).
- **Full command menu now at `Cmd/Ctrl+K`:** Access common MotherDuck UI actions from your keyboard, including generating query edits, adding notebook cells, creating notebooks, and navigating between pages. Open the command menu with `Cmd/Ctrl+K` and search for options. For quick access to [generate query edits](../../key-tasks/ai-and-motherduck/ai-features-in-ui#automatically-edit-sql-queries-in-the-motherduck-ui), use `Cmd/Ctrl+Shift+E`. (Note: `Cmd/Ctrl+Shift+P` no longer opens the command menu.)
- **Run queries across multiple notebooks:** You can now run cells across multiple MotherDuck UI notebooks, allowing each to queue and run. Hover over any notebook in the left sidebar to see how many cells are running or queued. Query cancellation is also more reliable across all notebooks.
## November 6, 2025

- **MotherDuck extends cloud coverage to Europe:** MotherDuck is now [available on AWS in Frankfurt `eu-central-1`](/concepts/architecture-and-capabilities/#the-motherduck-cloud-service); users are able to create new Organizations in Europe for lower latency and regional data residency
- **Expanded AI functions support for `PROMPT()`:** The [prompt](/sql-reference/motherduck-sql-reference/ai-functions/prompt/) function now supports additional parameters; MotherDuck users can now interact with Large Language Models (LLMs) directly from SQL with more customization and improved support for struct arrays, timestamps, and date and time values
  - **`return_type`:** Generate strongly-typed outputs by specifying the exact SQL type to return
  - **`reasoning_effort`:** Use GPT-5 models with the `prompt()` function
- **MotherDuck Wasm SDK 0.7.0:** The [MotherDuck Wasm Client](https://www.npmjs.com/package/@motherduck/wasm-client) now supports `attach_mode='single'`, simplifying query execution and improving resource predictability when working with a single database. Refer to the [documentation](/sql-reference/wasm-client/) to learn more.
- **Usernames added to Database listings in Settings:** MotherDuck Admins can now see the usernames for human users and service accounts on the Databases page in [Settings](/getting-started/interfaces/motherduck-quick-tour/#settings) for more intuitive lookups
- **New export options for `EXPLAIN`:** MotherDuck notebook cells now support copying or exporting [`EXPLAIN` results](/sql-reference/motherduck-sql-reference/explain/) to simplify query inspection
- **Enhanced Column Explorer experience for UUIDs:** The [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/) now has added support for UUIDs and fields that default to top-N values for improved column-level insights and schema exploration. Refer to the [documentation](/getting-started/interfaces/motherduck-quick-tour/#column-explorer) to learn more.
## October 24, 2025 - **Duplicate MotherDuck notebook cells:** Duplicate cells in MotherDuck UI notebooks using the cell options menu or command menu. Access the duplicate option from the three-dot options menu on any cell, or use `Cmd/Ctrl + Shift + P` to open the command menu and search for "duplicate."
## October 9, 2025 MotherDuck now supports DuckDB versions 1.4.0 and 1.4.1, and DuckLake version 0.3 🎉 DuckDB 1.4 delivers performance gains with improvements like a rewritten sorting engine, more efficient small writes, and new SQL syntax including the MERGE statement. Learn more in the DuckDB [1.4.0](https://github.com/duckdb/duckdb/releases/tag/v1.4.0) and [1.4.1](https://github.com/duckdb/duckdb/releases/tag/v1.4.1) changelogs. ### Performance improvements - **[Sorting is 2x+ faster:](https://github.com/duckdb/duckdb/pull/17584)** Complete rewrite of sorting uses less memory and scales better across threads for ORDER BY, window functions, and list sorting - **[More efficient small writes:](https://github.com/duckdb/duckdb/pull/18829)** Appending small numbers of rows now writes far fewer bytes - **[5x faster checkpointing:](https://github.com/duckdb/duckdb/pull/18390)** Reuses table metadata when tables aren't altered during checkpoint - **[Parallel connection creation:](https://github.com/duckdb/duckdb/pull/18079)** Connections from instance cache can be created in parallel - **[Faster scalar functions on dictionary data:](https://github.com/duckdb/duckdb/pull/18127)** Functions on dictionary-compressed data only run once per unique value ### SQL syntax updates - **[`MERGE INTO` statement:](https://github.com/duckdb/duckdb/pull/18135)** Standard SQL upserts without requiring primary keys or indexes - **[`FILL()` window function:](https://duckdb.org/2025/09/16/announcing-duckdb-140.html#fill-window-function)** Interpolate missing values in ordered data - **[Python-style macro arguments:](https://github.com/duckdb/duckdb/pull/18684)** Macros accept positional or named arguments for any parameter - **[`STRUCT` to `MAP` cast:](https://github.com/duckdb/duckdb/pull/17799)** Direct casting between struct and map types ### Parquet improvements - **[`VARIANT` type reading:](https://github.com/duckdb/duckdb/pull/18187)** Read Parquet `VARIANT` types for faster 
semi-structured data processing - **[Native geometry type writes:](https://github.com/duckdb/duckdb/pull/18832)** Write native Parquet geometry types - **[Auto-globbing for directories:](https://github.com/duckdb/duckdb/pull/18760)** Automatically treats paths as directories and retries with glob patterns when no file is found Learn more in the official DuckDB Labs announcements for [1.4.0](https://duckdb.org/2025/09/16/announcing-duckdb-140.html) and [1.4.1](https://duckdb.org/2025/10/07/announcing-duckdb-141.html). While you can continue using your current version of DuckDB with MotherDuck, we encourage you to [upgrade your DuckDB clients to 1.4.1](https://duckdb.org/install) as soon as you can to take advantage of the fixes and performance improvements. ### [Preview] DuckLake 0.3 As we announced earlier this year, MotherDuck now supports [DuckLake](https://ducklake.select), an integrated data lake and catalog format. DuckLake 0.3 makes working with DuckLake more robust, including [`CHECKPOINT` for easy maintenance](https://github.com/duckdb/ducklake/pull/406), new paths for Iceberg interoperability, [spatial geometry types](https://github.com/duckdb/ducklake/pull/412), and [`MERGE INTO` support](https://github.com/duckdb/ducklake/pull/351). Learn more about using DuckLake databases in MotherDuck in the [documentation](/integrations/file-formats/ducklake), and the recent improvements in the [DuckDB Labs announcement for DuckLake 0.3](https://ducklake.select/2025/09/17/ducklake-03/). ## September 30, 2025 - **Get help from MotherDuck Experts:** Get a human helping hand with technical questions, troubleshooting, and best practices directly in the MotherDuck UI. Open "Expert help" from the Help menu to talk with our team, and you'll be notified of responses. Expert help is available with Business and Lite plans. 
- **Transient option for database storage retention:** Databases can now be created with transient retention, which provides a minimal retention period and no failsafe storage. This option can be useful for intermediate datasets or data easily reconstructed from external sources. Create transient databases in the UI or via [`CREATE DATABASE db_name (TRANSIENT)`](../../sql-reference/motherduck-sql-reference/create-database/#syntax). Transient databases are available with Business and Lite plans. Learn more in the [storage management documentation](/concepts/storage-lifecycle#storage-management). - **Duplicate notebooks:** Copy existing SQL notebooks to reuse query templates or create variations of your analysis. Find the duplicate option in any notebook's options menu in the left sidebar. - **Monitor database storage in the MotherDuck UI:** Organization admins can now review database storage metrics in the updated [Databases](https://app.motherduck.com/settings/databases) page, showing current and cumulative database storage footprint over time. Learn more in the [storage lifecycle documentation](/concepts/storage-lifecycle#breaking-down-storage-usage).
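A minimal sketch of the transient option described above, using the `CREATE DATABASE db_name (TRANSIENT)` form from the notes. The database name, table name, and S3 path are hypothetical placeholders, not names from the MotherDuck documentation.

```sql
-- Hypothetical database name; the (TRANSIENT) option is the syntax quoted in the notes above.
CREATE DATABASE staging_scratch (TRANSIENT);

-- Load an intermediate dataset that could be rebuilt from its external source if needed.
-- The bucket path and table name are placeholders.
CREATE TABLE staging_scratch.main.daily_events AS
SELECT *
FROM read_parquet('s3://example-bucket/events/*.parquet');
```

Because transient databases have minimal retention and no failsafe storage, this pattern fits data that is cheap to regenerate rather than a system of record.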
## September 10, 2025 - **Instances are now called Ducklings:** We've updated our name for instances to better reflect their purpose as dedicated and scalable DuckDB instances that provide isolated, on-demand compute for each user's analytics workload in MotherDuck. Find the familiar instance controls now in [Settings > Ducklings](https://app.motherduck.com/settings/ducklings). This release does not affect the [Admin REST API methods for instances](../../sql-reference/rest-api/motherduck-rest-api/). Learn more about how [Ducklings](../billing/duckling-sizes/) are different from standard data warehouse instances in [this blog post](https://motherduck.com/blog/scaling-duckdb-with-ducklings/). - **Rename Notebooks from the Object Explorer:** SQL notebooks can now be renamed directly from the left sidebar using a notebook's options menu. - **Enum support in `prompt` function:** The `PROMPT` SQL function now supports enum types for consistent classification outputs. See the [function documentation](../../sql-reference/motherduck-sql-reference/ai-functions/prompt/#classification-with-enums) for details and examples. - **Command menu in the MotherDuck UI:** Navigate the MotherDuck UI from your keyboard using the new command menu. Quickly access common actions like adding notebook cells, creating notebooks, and navigating between pages. 
Try it out with "Open command menu" in the top-left Organization dropdown, or use `Cmd/Ctrl + Shift + P` ## September 4, 2025 - **Pre-filled names for service accounts and tokens:** When creating service accounts and tokens in the [Settings > Service Accounts](/key-tasks/service-accounts-guide/manage-service-accounts-and-tokens/) page, names are now pre-filled with the following format to help differentiate between them: - _Service Accounts:_ `{creator_username}_service_account_{number}` - _Read-Write Tokens:_ `{sa_username}_read_write_token_{number}` - _Read-Scaling Tokens:_ `{sa_username}_read_scaling_token_{number}` - **DuckLake database icon in the MotherDuck UI:** [DuckLake-backed databases](/concepts/ducklake/) now display a distinct icon to easily distinguish them from databases using MotherDuck native storage. ## August 21, 2025 - **Support for H3 Spatial Indexing Extension:** MotherDuck now supports the [H3 DuckDB Extension](https://duckdb.org/community_extensions/extensions/h3.html), which adds support for the [H3 hierarchical hexagonal grid system](https://h3geo.org/) for geospatial analysis. This extension is pre-installed in MotherDuck, and users are not required to install this extension. ## August 13, 2025 - **GPT 5 Support in `prompt` function**: The `PROMPT` function now supports OpenAI's GPT 5 series models. Refer to the [function documentation](../../sql-reference/motherduck-sql-reference/ai-functions/prompt/) for more details. ## August 12, 2025 - **Display Preformatted VARCHAR values:** VARCHAR results in the MotherDuck UI data value pane now support display of preformatted text. - **Format SQL in MotherDuck Notebook:** Format any SQL statement using the new **Format** button in the notebook cell options menu, or with `Option/Alt + Cmd/Ctrl + O`. When text is selected, only the selection is formatted. 
## August 8, 2025 - **Test S3 Credentials:** MotherDuck users can now test S3 credentials directly in the MotherDuck UI on the Secrets page in Settings when adding new S3 secrets. - **Support for DuckDB Configuration Options:** With this release, MotherDuck now correctly respects [DuckDB configuration options](https://duckdb.org/docs/stable/configuration/overview.html) and their local defaults, including extension settings like TimeZone. Broader coverage of additional configuration options is planned for the upcoming [DuckDB 1.4 release](https://duckdb.org/release_calendar.html). ## July 31, 2025 - **Updated FixIt Keyboard Shortcut:** The `Escape` key can now be used to reject [FixIt](/key-tasks/ai-and-motherduck/ai-features-in-ui/#automatically-fix-sql-errors-in-the-motherduck-ui) suggestions, providing a quicker way to dismiss generated SQL fixes. - **Generate Notebook Names:** Get descriptive, context-aware names for notebooks in the MotherDuck UI based on their SQL content. Click the new "Generate name from SQL" button to the left of a notebook's name to try it out. Available for users in MotherDuck's Business and Lite plans. ## July 25, 2025 - **Data Grid UX Improvements:** Data grids now include row numbers to make it easier to explore query results and reference specific rows. Users can now select multiple rows by clicking row numbers with the shift-key modifier. - **New UX for FixIt:** [FixIt](/key-tasks/ai-and-motherduck/ai-features-in-ui/#automatically-fix-sql-errors-in-the-motherduck-ui) now includes keybindings for the toggles to accept and reject suggestions and turn automatic suggestions on and off. - **`Cmd/Ctrl + Enter`:** Accept suggestion and run query - **`Cmd/Ctrl + Shift + Backspace`:** Reject suggestion ## July 16, 2025 - **NEW - Larger Compute Instances:** MotherDuck now offers two new memory-rich compute duckling (instance) types, **Mega** and **Giga**, built to run at high-capacity for the largest, most demanding jobs. 
Learn more in the [launch blog](https://motherduck.com/blog/announcing-mega-giga-instance-sizes-huge-scale) and [Docs](/about-motherduck/billing/duckling-sizes/). ## July 14, 2025 - **DuckDB 1.3.2:** MotherDuck supports DuckDB 1.3.2, a bugfix release. Additional details are available in the [DuckDB 1.3.2 changelog](https://github.com/duckdb/duckdb/releases/tag/v1.3.2). - **The Settings Button has Moved to the Org Dropdown:** Settings has moved from the left sidebar into the Organization dropdown at the top left for easier access and a cleaner layout. - **Admin Experience Enhancements:** With this week’s release, MotherDuck organization admins now have the flexibility to do more to manage their Org directly from the MotherDuck UI due to better visibility and admin-specific functionality for managing tokens, service accounts, and storage. - **New Service Accounts Page in Settings:** Organization admins can now view, create, and manage service accounts and service account tokens in the [Service Accounts](/key-tasks/service-accounts-guide/manage-service-accounts-and-tokens/) section of MotherDuck settings. - **Impersonation of Service Accounts:** Organization admins can now temporarily [impersonate a service account](/key-tasks/service-accounts-guide/impersonate-service-accounts/) while using the MotherDuck UI. - **Storage Usage History added to `MD_INFORMATION_SCHEMA`:** Organization admins can now access up to 30 days of historical storage data using the [`STORAGE_INFO_HISTORY` view](/sql-reference/motherduck-sql-reference/md_information_schema/storage_info/) in the [`MD_INFORMATION_SCHEMA`](/docs/sql-reference/motherduck-sql-reference/md_information_schema/introduction/). Each record includes a `result_ts` timestamp showing when the storage metrics were calculated. ## July 01, 2025 **[Preview] DuckLake Support**: MotherDuck now supports [DuckLake](https://ducklake.select), an integrated data lake and catalog format. 
- MotherDuck currently provides two options for creating and integrating with DuckLake databases:
  - **Fully managed**: MotherDuck manages both data storage and metadata
  - **Bring your own bucket (BYOB)**: Connect your S3-compatible object storage with options for:
    - MotherDuck compute + MotherDuck catalog
    - Bring-your-own-compute (BYOC) + MotherDuck catalog

Learn more in the [documentation](/integrations/file-formats/ducklake) and [announcement blog](https://motherduck.com/blog/announcing-ducklake-support-motherduck-preview/).

## June 26, 2025

- **Chat Widget Optimization:** Users can now view their inline edit history in a more compact chat widget and quickly request follow-up changes when needed.
- **Improved Boolean cell styling:** Boolean values in the data grid now have distinct visual weights to make it easier to visually scan result sets and prevent confusion with empty cells.

## June 18, 2025

- **DuckDB 1.3.1:** MotherDuck supports DuckDB 1.3.1, a bugfix release. Additional details are available in the [DuckDB 1.3.1 changelog](https://github.com/duckdb/duckdb/releases/tag/v1.3.1).
- **`PIVOT` statements in MotherDuck UI:** The MotherDuck UI now supports [`PIVOT` statements](https://duckdb.org/docs/stable/sql/statements/pivot.html), with pivot columns also appearing in the Column Explorer. `PIVOT` transforms distinct column values into separate columns with aggregated data.
- **New `STORAGE_INFO` View in `MD_INFORMATION_SCHEMA`:** Organization admins can now review detailed storage breakdowns per database using the new [`STORAGE_INFO` view](/sql-reference/motherduck-sql-reference/md_information_schema/storage_info/) in the [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction/).

## June 12, 2025

- **Improved query execution UX:** After 5 seconds, the run button now displays a timer showing how long the query has been running. It also offers clearer visual cues for canceling a query on mouseover and focus.
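To illustrate the `PIVOT` behavior described in the June 18 notes, here is a small self-contained sketch. The table name and all values are made up for illustration.

```sql
-- Hypothetical sample data (values are invented).
CREATE OR REPLACE TABLE city_population (
    country    VARCHAR,
    city       VARCHAR,
    year       INTEGER,
    population INTEGER
);

INSERT INTO city_population VALUES
    ('NL', 'Amsterdam', 2000, 1005),
    ('NL', 'Amsterdam', 2010, 1065),
    ('US', 'Seattle',   2000, 564),
    ('US', 'Seattle',   2010, 608);

-- Each distinct year becomes its own column, holding the summed population
-- for every (country, city) group.
PIVOT city_population
ON year
USING sum(population)
GROUP BY country, city;
```

The result has one row per city, with a column for each distinct `year` value found in the data.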
## June 5, 2025

- **Overwrite a database with a zero-copy clone:** The new [`COPY FROM DATABASE (OVERWRITE)` command](/sql-reference/motherduck-sql-reference/copy-database-overwrite/) replaces all data in the target database with the source’s contents in a single atomic operation, waiting for active writes to finish and blocking new ones during the process.
- **Copy SQL definitions for views from the Object Explorer:** The dropdown menu for views in the left-hand panel of the MotherDuck UI now lets you copy the associated SQL definition without opening the table summary.

## May 29, 2025

MotherDuck now supports DuckDB version 1.3.0 🎉

DuckDB 1.3.0 delivers faster queries in real-world scenarios, new SQL syntax, and smarter Parquet file handling. Learn more in the [changelog](https://github.com/duckdb/duckdb/releases/tag/v1.3.0).

### Performance improvements

- **[New `TRY()` expression for safer queries:](https://duckdb.org/2025/05/21/announcing-duckdb-130.html#try-expression)** More graceful handling for bad data by returning `NULL` instead of an error on problematic rows
- **[Pushdown of arbitrary expressions into scans:](https://github.com/duckdb/duckdb/pull/16430)** Reductions in unnecessary data processing to deliver up to 30x faster queries
- **[Pushdown of inequality conditions into joins:](https://github.com/duckdb/duckdb/pull/16508)** Major speedups for incremental dbt models and join-heavy queries

### SQL syntax updates

- **[Python-style lambda syntax:](https://github.com/duckdb/duckdb/pull/17235)** You can now use `lambda x: x + 1` instead of `x -> x + 1`; the old syntax is deprecated, but still supported.
- **[`cast_to_type()` function:](https://github.com/duckdb/duckdb/pull/17209)** Dynamically cast values to match column types, which is useful in generic expressions and `CASE` statements when writing macros.
- **[Recursive JSON access:](https://github.com/duckdb/duckdb/pull/17406)** New `json_each()` and `json_tree()` functions make it easier to traverse nested JSON structures. - **[Struct field updates:](https://github.com/duckdb/duckdb/pull/17003)** Individual fields in structs can now be modified using `ALTER`; all fields are rewritten even if only one is updated. - **[Prepared statements metadata:](https://github.com/duckdb/duckdb/pull/16541)** The `duckdb_prepared_statements()` function returns all prepared statements in the session. - **[More flexible type definitions:](https://github.com/duckdb/duckdb/pull/17404)** Support has been added for `CREATE OR REPLACE TYPE`, `CREATE TYPE IF NOT EXISTS`, and `CREATE TEMPORARY TYPE`. - **[Preserved order for `OR` filters:](https://github.com/duckdb/duckdb/pull/17180)** Execution now preserves the order of clauses in `WHERE` conditions using `OR`. - **[Function alias visibility:](https://github.com/duckdb/duckdb/pull/16600)** `duckdb_functions()` now returns aliases in addition to the function outputs. ### Parquet improvements - **[Late materialization:](https://github.com/duckdb/duckdb/pull/17036)** Queries are 3–10x faster with `LIMIT` due to deferred column loading - **[~15% average speedup on reads:](https://github.com/duckdb/duckdb/pull/16595)** New scan and filter efficiency improvements - **[30%+ faster write throughput:](https://github.com/duckdb/duckdb/pull/17061)** Improved multithreaded export performance - **[Better compression for large strings:](https://github.com/duckdb/duckdb/pull/17164)** Large string values are now dictionary-compressed - **[Smarter rowgroup combining:](https://github.com/duckdb/duckdb/pull/17118)** Files are more efficient due to merging small rowgroups at write time Learn more in the official [DuckDB Labs 1.3.0 announcement](https://duckdb.org/2025/05/21/announcing-duckdb-130.html). 
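Two of the DuckDB 1.3.0 syntax additions listed above can be sketched in a couple of lines. This is an illustrative example rather than an excerpt from the DuckDB release notes, and the column aliases are arbitrary.

```sql
-- Python-style lambda syntax: add 1 to every element of a list.
SELECT list_transform([1, 2, 3], lambda x: x + 1) AS incremented;  -- [2, 3, 4]

-- TRY() wraps an expression that would otherwise raise an error
-- (the logarithm of zero is undefined) and yields NULL for that row instead.
SELECT TRY(log(0)) AS safe_log;
```

The older arrow form (`x -> x + 1`) is described above as deprecated but still supported, so existing queries should continue to run after the upgrade.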
While you can continue using your current version of DuckDB, we encourage you to [upgrade your DuckDB clients to 1.3.0](https://duckdb.org/docs/installation/?version=stable&environment=cli&platform=macos&download_method=package_manager) as soon as you can to take advantage of the fixes and performance improvements.

### Additional updates from this release are outlined below

- Query results now display in a redesigned table that delivers enhanced performance when viewing and exploring data; column headers now include type information for better context. Additional table functionality, including sorting and filtering of results, is coming in future releases.

## May 22, 2025

- **Faster queries on complex filters and wide tables:** We've significantly boosted performance for queries with IN filters, selective joins, and LIMIT clauses. Expect noticeable speedups on wide tables or those with large string or JSON columns.
- **New keybindings for power users:**
  - Toggle Instant SQL for the current SQL cell: `cmd/ctrl+shift+.`
  - Toggle Object Explorer: `cmd/ctrl+b`
  - Toggle Inspector (Column Explorer): `cmd/ctrl+i`
  - Toggle worksheet mode for the current SQL cell: `cmd/ctrl+e`
- **Org-wide Active Accounts:** Organization admins can now view all active accounts and their associated ducklings in the [Active Accounts](https://app.motherduck.com/settings/active-accounts) section of MotherDuck settings.
- **Smarter Instant SQL caching:** Instant SQL now accounts for filters in your WHERE clause when building its cache, offering a greater number of relevant rows as you work.
- **Full row count in flat table results:** SQL cells now display a full result row count when viewing results in "flat" table mode.
- **GPT 4.1 Support in `prompt` function**: The `PROMPT` function now supports OpenAI's GPT 4.1 series models. Refer to the [function documentation](../../sql-reference/motherduck-sql-reference/ai-functions/prompt/) for more details.
## May 16, 2025 - **Multiple SQL statements now supported in Instant SQL:** Execute individual statements within multi-statement SQL cells by clicking on the desired statement while [Instant SQL](https://motherduck.com/blog/introducing-instant-sql/) is enabled. - **Copy Table Names directly from Object Explorer:** Use the options menu on any table in the Object Explorer to copy its name to your clipboard. Paste exact table references into any SQL editor—eliminating typos and saving time when writing queries. For earlier updates, see the [release notes archive](/about-motherduck/release-notes-archive/). --- Source: https://motherduck.com/docs/concepts/architecture-and-capabilities --- sidebar_position: 1 title: Architecture and capabilities description: MotherDuck's serverless architecture combining cloud scale with DuckDB's efficiency through Dual Execution. --- import Image from '@theme/IdealImage'; import Versions from '@site/src/components/Versions'; import InteractiveColumnDiagram from '@site/src/components/InteractiveColumnDiagram'; export const architectureColumns = [ { label: 'Clients', nodes: [ { id: 'ui', title: 'MotherDuck UI', subtitle: 'SQL IDE, notebooks, Dives', icon: 'ui', color: 'green', feature: 'Instant SQL previews on every keystroke, no explicit query run needed', href: '/docs/getting-started/interfaces/motherduck-quick-tour/', }, { id: 'sdks', title: 'DuckDB SDKs', subtitle: 'Python, Node.js, Go, Rust, R, Java', icon: 'code', color: 'green', href: '/docs/getting-started/interfaces/client-apis/', }, { id: 'cli', title: 'DuckDB CLI', subtitle: 'Local compute and storage', icon: 'terminal', color: 'green', href: '/docs/getting-started/interfaces/connect-query-from-duckdb-cli/', }, { id: 'pg', title: 'Postgres endpoint', subtitle: 'BI tools, any Postgres client', icon: 'plug', color: 'green', feature: 'Use any Postgres-compatible tool, no DuckDB install needed', href: '/docs/getting-started/interfaces/postgres-endpoint/', }, { id: 'mcp', title: 
'MCP server', subtitle: 'AI assistants', icon: 'bot', color: 'green', feature: 'Fully managed remote server for Claude, ChatGPT, Cursor, and other AI tools', href: '/docs/getting-started/mcp-getting-started/', }, ], }, { label: 'MotherDuck', nodes: [ { id: 'governance', title: 'Governance', subtitle: 'Auth, sharing, secrets, admin', icon: 'shield', color: 'yellow', href: '/docs/key-tasks/sharing-data/sharing-overview/', }, { id: 'ducklings', title: 'Ducklings', subtitle: 'Serverless DuckDB compute', icon: 'cpu', color: 'yellow', feature: 'Sub-100ms cold start with read replicas for horizontal scaling', href: '/docs/concepts/scaling-patterns/', }, { id: 'dives', title: 'Dives', subtitle: 'Interactive visualizations', icon: 'chart', color: 'yellow', feature: 'Shareable live dashboards powered by SQL, with version history', href: '/docs/key-tasks/ai-and-motherduck/dives/', }, { id: 'catalog', title: 'Catalog', subtitle: 'Databases, schemas, tables, views', icon: 'layers', color: 'yellow', href: '/docs/concepts/database-concepts/', }, { id: 'storage', title: 'Storage', subtitle: 'Managed storage and DuckLake', icon: 'harddrive', color: 'yellow', feature: 'Transactional lakehouse format with automatic optimization', href: '/docs/integrations/file-formats/ducklake/', }, ], }, { label: 'External sources', nodes: [ { id: 'cloud-storage', title: 'Cloud storage', subtitle: 'S3, GCS, Azure, R2', icon: 'cloud', color: 'sky', href: '/docs/key-tasks/cloud-storage/querying-s3-files/', }, { id: 'databases', title: 'Databases', subtitle: 'Postgres, SQLite, MySQL', icon: 'server', color: 'sky', href: '/docs/integrations/', }, { id: 'ducklake-byob', title: 'DuckLake BYOB', subtitle: 'Bring your own S3 or R2 bucket', icon: 'lake', color: 'sky', href: '/docs/integrations/file-formats/ducklake/', }, ], }, ]; export const architectureConnectors = [ { label: 'Dual\nExecution', tooltip: 'Queries are automatically routed to the optimal location: local DuckDB, MotherDuck cloud, or both', }, 
{ label: 'Query &\ningest', tooltip: 'Query external sources in place or load data into MotherDuck storage', }, ]; MotherDuck is a serverless cloud analytics service with a unique architecture that combines the power and scale of the cloud with the efficiency and convenience of DuckDB. MotherDuck's key components are: - The MotherDuck cloud service - MotherDuck's DuckDB SDK - Dual Execution - The MotherDuck web UI ### The MotherDuck cloud service The MotherDuck cloud service lets you store structured data, query that data with SQL, and share it with others. A key MotherDuck product principle is ease of use. **Serverless execution model**—You don't need to configure or spin up instances, clusters, or warehouses. You write and submit SQL. MotherDuck takes care of the rest. Under the hood, MotherDuck runs DuckDB and speaks DuckDB's SQL dialect. **Managed storage**—you can load data into MotherDuck storage to be queried or shared. MotherDuck storage is durable, secure, and automatically optimized for best performance. MotherDuck storage is surfaced to you through the **catalog** and logical primitives database, schema, table, view, and so on. In addition, MotherDuck can query data outside of MotherDuck storage—as data on Amazon S3, through HTTPS endpoints, on your laptop, and more. **The service layer**—MotherDuck provides key capabilities like secure identity, authorization, administration, and monitoring. :::note MotherDuck is available on three AWS regions: - **US East (N. Virginia):** `us-east-1`, supporting DuckDB versions between and . - **US West (Oregon):** `us-west-2`, supporting DuckDB versions between and . - **Europe (Frankfurt):** `eu-central-1`, supporting DuckDB versions between and . You can choose in which region to create your organization, and organizations can only exist within a single cloud region. We are working on expanding to other regions and cloud providers. 
:::

### MotherDuck's DuckDB SDK

If you're using DuckDB in Python or the CLI, you can connect to MotherDuck with a single line of code, `ATTACH 'md:';`. After you run this command, your DuckDB instance gains MotherDuck's capabilities: Dual Execution is enabled, and your DuckDB instance gets sharing, secrets storage, better interoperability with S3, and cloud persistence.

### Dual execution

When connected, DuckDB and MotherDuck form a different type of distributed system. The two nodes work in concert so you can query data wherever it lives, in the most efficient way possible. This query execution model, called **Dual Execution** (formerly known as Hybrid Execution), automatically routes the stages of query execution to the most suitable locations, even in complex scenarios:

- If a SQL query queries data on your laptop, MotherDuck routes the query to your local DuckDB instance.
- If a SQL query queries data in MotherDuck or cloud storage (S3, GCS, Azure, R2), MotherDuck routes that query to MotherDuck's cloud engine, which connects to your storage provider directly. MotherDuck can use both cloud-stored and local secrets to authenticate. See [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret/) for details.
- If a SQL query executes a join between data on your laptop and data in MotherDuck, MotherDuck finds the most efficient way to join the two.

### The MotherDuck web UI

You can use MotherDuck's web UI to analyze and share data and to perform administrative tasks. MotherDuck's UI consists of a lightweight notebook, a SQL IDE, and a data catalog. Uniquely, MotherDuck caches query results in a highly interactive query results panel, letting you sort, filter, and even pivot data quickly.
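
As a rough illustration of the Dual Execution cases described above, the sketch below joins a local file with a table stored in MotherDuck from a single DuckDB session. The file `local_orders.csv`, the database `my_db`, and the `customers` table with its columns are hypothetical names chosen for this example; only `ATTACH 'md:';` comes from this page.

```sql
-- Connect the local DuckDB session to MotherDuck
ATTACH 'md:';

-- Hypothetical names: local_orders.csv is a file on your laptop,
-- my_db.main.customers is a table stored in MotherDuck.
SELECT
    c.customer_id,
    c.name,
    o.product
FROM my_db.main.customers AS c
JOIN read_csv('local_orders.csv') AS o
    ON o.customer_id = c.customer_id;
```

You write one statement and leave placement to the planner: the file is read where it lives, the MotherDuck table is scanned in the cloud, and the join is planned across both.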
## Summary of capabilities With MotherDuck you can: - Use serverless DuckDB in the cloud to store data and execute DuckDB SQL - Load data into MotherDuck from your personal computer, https, or S3 - Join datasets on your computer with datasets in MotherDuck or in S3 - Copy DuckDB databases between local and MotherDuck locations - Materialize query results into local or MotherDuck locations, or S3 - Work with data in MotherDuck's notebook UI, standard DuckDB CLI, or standard DuckDB Python package - Share databases with your teammates - Securely save S3 credentials in MotherDuck Additionally, MotherDuck supports connectivity to third party tools through: - JDBC - Go - sqlalchemy ## Considerations and limitations MotherDuck does not yet support the full range of SQL of DuckDB. We are continuously working on improving coverage of DuckDB in MotherDuck. If you need specific features enabled, please let us know. Below is the list of DuckDB features that MotherDuck does not yet support: - Custom Python / Native user defined functions. - Server-side attach of postgres, sqlite, etc. - Custom or community extensions. --- Source: https://motherduck.com/docs/concepts/concepts --- title: Concepts description: Concepts sidebar_class_name: architecture-icon --- This section contains a collection of high level views of concepts & features. ## Included pages - [Architecture and capabilities](https://motherduck.com/docs/concepts/architecture-and-capabilities): MotherDuck's serverless architecture combining cloud scale with DuckDB's efficiency through Dual Execution. - [Database Concepts](https://motherduck.com/docs/concepts/database-concepts): MotherDuck Database Concepts - [Hypertenancy](https://motherduck.com/docs/concepts/hypertenancy): Learn how MotherDuck's hypertenancy model provides dedicated compute for every user through per-user Ducklings, enabling predictable performance without noisy neighbors. 
- [Resource management](https://motherduck.com/docs/concepts/resource-management): Understand MotherDuck's resource hierarchy, from organizations, accounts, tokens, and secrets down to databases and tables, and how each level provides compute isolation, data isolation, and access control. - [Database Snapshots](https://motherduck.com/docs/concepts/database-snapshots): Understand how snapshots work in MotherDuck, including retention, restore, snapshot management, and plan availability - [pg_duckdb Extension](https://motherduck.com/docs/concepts/pgduckdb): Use pg_duckdb to run DuckDB analytics within PostgreSQL and connect to MotherDuck. - [Storage Lifecycle and Management](https://motherduck.com/docs/concepts/storage-lifecycle): Understand how MotherDuck manages data storage across different lifecycle stages and how this affects your billing and data management strategies. - [Data Recovery](https://motherduck.com/docs/concepts/data-recovery): Understand MotherDuck's data recovery mechanisms - [Workload scaling patterns](https://motherduck.com/docs/concepts/scaling-patterns): Choose the right compute size, scaling approach, and connection model for your MotherDuck workload using a decision flowchart and workload-to-pattern matrix. - [Object name resolution](https://motherduck.com/docs/concepts/object-name-resolution): Fully qualified naming conventions and database resolution rules in MotherDuck. - [DuckDB Extensions in MotherDuck](https://motherduck.com/docs/concepts/duckdb-extensions): Supported DuckDB extensions for the MotherDuck cloud service, Web UI, and CLI. 
- [DuckLake](https://motherduck.com/docs/concepts/ducklake): Understanding DuckLake - A high-performance open table format for petabyte-scale analytics - [Results](https://motherduck.com/docs/concepts/results): Results --- Source: https://motherduck.com/docs/concepts/data-recovery --- title: Data Recovery sidebar_position: 4 description: Understand MotherDuck's data recovery mechanisms --- ## Overview MotherDuck provides [historical snapshots](/concepts/snapshots) to support point-in-time backup/restore mechanisms on Lite and Business plans. On the Lite plan, databases only keep the active snapshot (no historical retention) until usage limits are reached, after which Lite snapshot retention and [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) behavior apply. This page covers an example workflow with [named snapshots](/concepts/snapshots#2-named-snapshots) and outlines how to restore a database to a historical snapshot within the [snapshot retention](/concepts/snapshots#snapshot-retention) window (`snapshot_retention_days`). Refer to the [Database Snapshots](/concepts/snapshots) page for more details. :::note DuckLake databases DuckLake databases manage snapshots through [auto maintenance](/concepts/ducklake#auto-maintenance). Snapshot retention for DuckLake defaults to infinite (`NULL`) and is configured with `SNAPSHOT_RETENTION_DAYS` through [`ALTER DATABASE`](/sql-reference/motherduck-sql-reference/alter-database). See [DuckLake snapshot retention](/concepts/ducklake#snapshot-retention) for details. 
:::

### Snapshot options per plan (native storage)

| Plan | Snapshot Retention Default | Configurable Retention Period | Named Snapshots | Point-in-Time Restore | [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) Database |
|------|----------------------------|-------------------------------|-----------------|----------------------|-------------------|
| **Business** | 7 days | 0–90 days | Yes | Yes | Yes |
| **Lite (paid)** | 1 day | 1 day | No | Yes | Yes |
| **Lite (free)** | 0 days | N/A | N/A | N/A | N/A |

Snapshots can be used to create a new database from a snapshot using [`CREATE DATABASE`](/sql-reference/motherduck-sql-reference/create-database) or to [`ALTER`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) an existing database so that it reflects the contents of a specific snapshot.

- **[Automatic snapshots](/concepts/snapshots#1-automatic-snapshots)** are retained for a set period of time according to `snapshot_retention_days` after they are no longer the active snapshot for a database.
- **[Named snapshots](/concepts/snapshots#2-named-snapshots)** are created explicitly and persist until unnamed. They are not subject to automatic garbage collection.

A new database:

```sql
CREATE DATABASE FROM ( SNAPSHOT_TIME ... | SNAPSHOT_NAME ... | SNAPSHOT_ID ... )
```

An existing database:

```sql
ALTER DATABASE SET SNAPSHOT TO ( SNAPSHOT_TIME ... | SNAPSHOT_NAME ... | SNAPSHOT_ID ... )
```

Snapshots can also be used to recover a dropped database:

```sql
UNDROP DATABASE
```

Refer to the [undrop database](/sql-reference/motherduck-sql-reference/undrop-database) page for details.

Example:

```sql
-- You cannot drop the currently active database
USE some_other_db;
DROP DATABASE recovery_demo;
UNDROP DATABASE recovery_demo;
```

Refer to the [named snapshots](/sql-reference/motherduck-sql-reference/create-snapshot) page for an example.
## Restoring your database to a named snapshot

```sql
CREATE DATABASE example_db;
USE example_db;

CREATE TABLE one AS SELECT 1;
CREATE SNAPSHOT one OF example_db;

CREATE TABLE two AS SELECT 2;
CREATE SNAPSHOT two OF example_db;

CREATE TABLE three AS SELECT 3;
CREATE SNAPSHOT three OF example_db;

-- Accidentally drop data!
DROP TABLE two;

-- Restore a previous snapshot of the DB and check it's what you want
CREATE DATABASE example_restore FROM example_db (SNAPSHOT_NAME 'three');

-- The snapshot looks correct!
SELECT * FROM example_restore.two;

-- Restore the database to the old valid snapshot
ALTER DATABASE example_db SET SNAPSHOT TO (SNAPSHOT_NAME 'three');

-- We have successfully restored our data!
SELECT * FROM two;
```

## Restoring a database to a historical snapshot

To find all snapshots corresponding to your database, run the following queries.

To see the history of snapshots for a given database:

```sql
SELECT *
FROM MD_INFORMATION_SCHEMA.DATABASE_SNAPSHOTS
WHERE database_name = ''
ORDER BY created_ts DESC;
```

If you have a rough idea of the time range you want to restore your database to, you can filter the above query by `created_ts`:

```sql
SELECT snapshot_id, created_ts, active_bytes
FROM MD_INFORMATION_SCHEMA.DATABASE_SNAPSHOTS
WHERE database_name = ''
  AND created_ts >= '2024-12-02 20:00:00'
  AND created_ts <= '2024-12-02 20:05:00'
ORDER BY created_ts DESC;
```

The results should look something like this:

| snapshot_id | created_ts | active_bytes |
|-------------|------------|--------------|
| `73034f48-e832-40d6-a30f-9055eb302a2e` | `2024-12-02 20:03:30` | `2191330` |
| `c204ce3b-f3fd-4677-8a05-e8680648cf27` | `2024-12-02 20:02:05` | `2183991` |
| `63395025-b139-4c6f-8fc2-7b8c0feff748` | `2024-12-02 20:01:55` | `1847296` |

Example (restore an existing database by ID):

```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_ID '');
```

Both automated and named snapshots can be used to restore to a desired state that was captured.
Users can restore either a new or an existing database to a specific snapshot.

```sql
-- For a new database
CREATE DATABASE restored_database FROM your_database_name (SNAPSHOT_ID 'c204ce3b-f3fd-4677-8a05-e8680648cf27');
```

After running the above command, users can query `restored_database` and work with the state of the database from that earlier point in time.

Once users have identified the exact snapshot they want to restore, we recommend finding its `snapshot_id` (instead of using `snapshot_time`) and using the command:

```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_ID 'c204ce3b-f3fd-4677-8a05-e8680648cf27');
```

**Note:** Running a `SET SNAPSHOT TO` command that specifies a timestamp that doesn't exist in `md_information_schema.database_snapshots` will select the most recent snapshot created at or before the specified timestamp. In our example, the following command selects snapshot `63395025-b139-4c6f-8fc2-7b8c0feff748`, because it is the only snapshot in the information schema table that was created before `'2024-12-02 20:02:04'`:

```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_TIME '2024-12-02 20:02:04');
```

However, if you run:

```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_TIME '2024-12-02 20:02:05');
```

then snapshot `c204ce3b-f3fd-4677-8a05-e8680648cf27` will be selected because there is an exact timestamp match.
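
Before running a timestamp-based restore, it can help to preview which snapshot a given `SNAPSHOT_TIME` is likely to resolve to. The query below is a sketch that mirrors the "most recent snapshot at or before the timestamp" rule using the `database_snapshots` view and the columns shown earlier on this page. It approximates the selection rule described in the note and is not a documented preview feature; `your_database_name` is the same placeholder used in the examples above.

```sql
-- Approximate the SNAPSHOT_TIME resolution rule:
-- newest snapshot created at or before the requested timestamp.
SELECT snapshot_id, created_ts
FROM MD_INFORMATION_SCHEMA.DATABASE_SNAPSHOTS
WHERE database_name = 'your_database_name'
  AND created_ts <= '2024-12-02 20:02:04'
ORDER BY created_ts DESC
LIMIT 1;
```

With the example rows listed earlier, this returns the snapshot created at `2024-12-02 20:01:55`. You can then pass its `snapshot_id` to `ALTER DATABASE ... SET SNAPSHOT TO (SNAPSHOT_ID '...')`, so the restore does not depend on timestamp matching.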
## See also

- [Database Snapshots](/concepts/snapshots) — Understanding snapshot types, retention, and best practices
- [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot) — SQL reference for creating snapshots
- [`UNDROP DATABASE`](/sql-reference/motherduck-sql-reference/undrop-database) — Recovering dropped databases

---

Source: https://motherduck.com/docs/concepts/database-concepts

---
sidebar_position: 2
title: Database Concepts
sidebar_label: Database Concepts
description: MotherDuck Database Concepts
---

## MotherDuck Architectural Concepts

:::note
MotherDuck is a cloud-native data warehouse, built on top of DuckDB, a fast in-process analytical database. It inherits some features from DuckDB that present opportunities to think differently about data warehousing methods in order to achieve high levels of performance and simplify the experience.
:::

- **Isolated Compute Tenancy**: Each user is allocated their own "Duckling," which is an isolated piece of compute that sits on top of the MotherDuck storage layer. MotherDuck is designed this way to lessen contention between users, which is a common challenge with other data warehouses. Each Duckling has a cold start time of under 100ms because MotherDuck keeps Ducklings on warm standby.
- **Aggressively Serverless**: Unlike conventional data warehouses, DuckDB automatically parallelizes the work that you send to it. As a result, scheduling multiple queries at a time does not meaningfully increase throughput, because DuckDB has already parallelized the workload across all available resources.
- **Database level security model**: It has a simplified access model - users either have access to an entire database, or nothing at all. As a result, users will interact with data frequently at the database level. This is unusual when compared to other databases, which often treat multiple database files as single concepts from an interactivity perspective.
- **Database Sharing**: MotherDuck separates storage and compute, which means that one user cannot see another's writes into a database until that database is updated to that user. As such, it has its own concept called ["SHARES"](/key-tasks/sharing-data/sharing-overview/) within Organizations, which are zero-copy clones of the main database for read-only use, enabling high scalability of analytics workloads. - **Dual Execution**: Every MotherDuck client is also a DuckDB engine, so you can efficiently query local data and (JOIN, UNION) with data that's stored in your MotherDuck data warehouse. [The query planner automatically decides](/concepts/architecture-and-capabilities#dual-execution) the best place to execute each part of your query. --- Source: https://motherduck.com/docs/concepts/database-snapshots --- title: Database Snapshots sidebar_position: 3 slug: /concepts/snapshots description: Understand how snapshots work in MotherDuck, including retention, restore, snapshot management, and plan availability --- ## What are snapshots? Snapshots capture the complete state of a database at a specific point in time. MotherDuck creates **historical snapshots** in the background for attached databases (databases that are connected to MotherDuck and available for querying), enabling [data recovery](/concepts/data-recovery) features such as restore and [undrop](/sql-reference/motherduck-sql-reference/undrop-database). Historical snapshots come in two forms. :::note DuckLake databases DuckLake databases have their own snapshot and maintenance system. Snapshot lifecycle management for DuckLake is handled through [auto maintenance](/concepts/ducklake#auto-maintenance), not the native storage snapshot system described on this page. See [DuckLake snapshot retention](/concepts/ducklake#snapshot-retention) for details on configuring `SNAPSHOT_RETENTION_DAYS` and `AUTO_MAINTENANCE` for DuckLake databases. ::: ### 1. 
automatic snapshots {#1-automatic-snapshots} Automatic snapshots are created continuously in the background by MotherDuck whenever data changes. For paid plans every new database has automatic snapshots configured by default. For paid plans you can also set or adjust your database's snapshot retention window with: ```sql ALTER DATABASE example_database SET SNAPSHOT_RETENTION_DAYS = 4; ``` Automatic snapshots: - Are created whenever data in the database changes or explicitly with `CREATE SNAPSHOT OF ;` - Are retained as [`historical_bytes`](/concepts/storage-lifecycle) according to the database's `snapshot_retention_days` setting - Can be queried using [`md_information_schema.database_snapshots`](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots) - Are automatically removed by garbage collection when they fall outside the retention window ### 2. named snapshots {#2-named-snapshots} Named snapshots have to be explicitly created with a name using [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot). ```sql CREATE SNAPSHOT my_backup OF example_database ``` These persist indefinitely until the name is removed. Named snapshots are **not** subject to automatic garbage collection and are only available on the Business plan. Named snapshots differ from automatic snapshots: - They are **not garbage-collected** by snapshot retention - They persist even if the source database they are associated with is deleted - They can be referenced directly by name when restoring or cloning a database - Snapshot names must be unique per user - They can only be deleted by removing the name, after which they are picked up by garbage collection Named snapshots are intended for **long-lived backups** and are the recommended mechanism for durable recovery points. 
Named snapshots can be used with the [`ALTER DATABASE SET SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) command, as well as the [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) command to specify the snapshot you want to use.

## Restoring a database

You can [restore a database](/docs/sql-reference/motherduck-sql-reference/create-database/#source-database-options) from a snapshot by specifying the snapshot name, `snapshot_id` or a timestamp. When using a timestamp the latest snapshot at or before that time will be selected.

```sql
CREATE DATABASE example_db_from_snap FROM example_db (SNAPSHOT_NAME 'snap');
CREATE DATABASE example_db_from_id FROM example_db (SNAPSHOT_ID '4bfbd992-e586-48ab-9176-8dfb2d2c30b4');
CREATE DATABASE example_db_from_ts FROM example_db (SNAPSHOT_TIME '2026-01-01 00:00:01.234567');
```

## Snapshot features per plan

### Native storage databases

| Plan | Automatic snapshot retention default | Configurable retention period | Named snapshots | Point-in-time restore | [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) database |
|------|----------------------------|-------------------------------|-----------------|----------------------|-------------------|
| **Business** | 7 days | 0-90 days | Yes | Yes | Yes |
| **Lite (paid)** | 1 day | 1 day | No | Yes | Yes |
| **Lite (free)** | 0 days | N/A | N/A | N/A | N/A |

### DuckLake databases

DuckLake databases manage snapshots through [auto maintenance](/concepts/ducklake#auto-maintenance) rather than the native storage snapshot system.

| Database type | Auto maintenance default | Snapshot retention default | Configurable retention |
|------|----------------------------|-------------------------------|-----------------|
| **Fully managed** | Enabled | Infinite (`NULL`) | Yes, with `SNAPSHOT_RETENTION_DAYS` |
| **BYOB** | Disabled | Infinite (`NULL`) | Yes, after enabling `AUTO_MAINTENANCE` |

## Snapshot retention

The `snapshot_retention_days` database setting controls how long historical snapshots are retained for [data recovery](/concepts/data-recovery). This setting determines how much data is stored as [`historical_bytes`](/concepts/storage-lifecycle) in your storage footprint.

- **`0` days:** No historical snapshots are accessible; automatic snapshots are immediately eligible for garbage collection
- **`1+` days:** Automatic snapshots created within the retention window can be accessed and restored

Users can modify snapshot retention at any time using [`ALTER DATABASE`](/sql-reference/motherduck-sql-reference/alter-database):

```sql
ALTER DATABASE my_database SET SNAPSHOT_RETENTION_DAYS = 4;
```

To see your database's current snapshot retention, use [`md_information_schema.databases`](/sql-reference/motherduck-sql-reference/md_information_schema/databases) and look for the `historical_snapshot_retention` field.

::::note
Snapshot retention days are inherited when cloning a database.
::::

::::important
Increasing `snapshot_retention_days` does not restore previously deleted snapshots. Once the garbage collection process removes a snapshot, it cannot be recovered through this setting.
::::

## Working with named snapshots

Named snapshots are subject to naming rules.
- Snapshot names must be 1–255 characters long - Names are unique per user across all databases - If a name includes special characters (such as `.` or `/`), wrap it in double quotes - If you create two named snapshots in a row without any new writes, the second can fail because the latest snapshot already has a name ### Renaming a named snapshot Users can change the name of an existing named snapshot using the [`ALTER SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-snapshot) command: ```sql ALTER SNAPSHOT SET snapshot_name = ''; ``` ### Deleting (un-naming) a named snapshot To remove a name from a snapshot, run the following command: ```sql ALTER SNAPSHOT SET snapshot_name = ''; ``` Once unnamed, the snapshot will become subject to the database's `snapshot_retention_days` policy and will be deleted automatically when it falls outside the retention window. ## Historical snapshots and failsafe bytes It's important to understand the distinction between historical snapshots and failsafe data: - **Historical snapshots** are point-in-time copies of your database that you can restore yourself using SQL commands. They are stored as `historical_bytes` and controlled by your `snapshot_retention_days` setting. - **Failsafe data** is a system-managed backup that MotherDuck retains for disaster recovery. It is stored as `failsafe_bytes` and can only be restored by contacting MotherDuck support. 

| | Historical Snapshots: `historical_bytes` | Failsafe Data: `failsafe_bytes` |
|---|---|---|
| **Purpose** | User-initiated data recovery and point-in-time restore | System-level disaster recovery backup |
| **Controlled by** | `snapshot_retention_days` setting | MotherDuck system (7 days for standard databases, 1 day for transient) |
| **Recovery method** | Self-service through [`ALTER DATABASE SET SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) or [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) | Requires contacting [MotherDuck support](https://motherduck.com/contact-us/support/) |
| **Visibility** | Queryable through [`md_information_schema.database_snapshots`](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots) | Not directly visible to users |
| **Storage billing** | Billed as `historical_bytes` | Billed as `failsafe_bytes` |

For more details on storage lifecycle stages, see [Storage Lifecycle and Management](/concepts/storage-lifecycle).
## Best practices

- Use **named snapshots** for long-lived backups you may need to restore far into the future
- If you frequently overwrite your data, use a short snapshot retention window (1-7 days) to avoid storing multiple copies of the same data
- Treat failsafe restores as a precautionary, last-resort measure for exceptional scenarios only; we recommend using historical snapshots for routine recovery needs
- Do **not** use [transient databases](/concepts/storage-lifecycle#transient-databases) for critical or hard-to-reconstruct data

## Related content

- [Data Recovery](/concepts/data-recovery) — Step-by-step guide to restoring databases from snapshots
- [Storage Lifecycle and Management](/concepts/storage-lifecycle) — Understanding storage stages and billing
- [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot) — SQL reference for creating snapshots
- [`DATABASE_SNAPSHOTS` view](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots) — Query snapshot history and metadata
- [`CREATE DATABASE` from a snapshot](/docs/sql-reference/motherduck-sql-reference/create-database/#source-database-options)

---

Source: https://motherduck.com/docs/concepts/duckdb-extensions

---
sidebar_position: 8
title: DuckDB Extensions in MotherDuck
description: Supported DuckDB extensions for the MotherDuck cloud service, Web UI, and CLI.
keywords:
  - DuckDB extensions
  - MotherDuck extensions
  - extension support
  - server-side extensions
  - web UI extensions
  - compatibility
---

# DuckDB extensions in MotherDuck

MotherDuck supports a wide array of DuckDB extensions to enhance your analytics workflows. Support varies depending on whether you are using the DuckDB CLI, the MotherDuck cloud service (server-side), or the MotherDuck Web UI.

## Extension support

### MotherDuck Web UI

The MotherDuck Web UI supports a subset of extensions optimized for interactive analytics and data exploration directly in your browser.
Some extensions can be loaded in the Web UI but are not supported server-side (that is, they are invoked and run only in the browser).

### MotherDuck Cloud (server-side)

MotherDuck's cloud service supports a curated set of extensions for optimized, secure, and scalable query execution. These extensions are available for all queries running against the MotherDuck service.

### DuckDB CLI

When connected to MotherDuck through the local DuckDB CLI, **all** DuckDB extensions are available. These extensions are loaded locally, giving you access to the entire DuckDB ecosystem for development and testing.

## Extension support matrix

The following table summarizes the current support for DuckDB extensions across MotherDuck environments by execution context: extensions supported only server-side use only server-side compute, whereas extensions also supported in the Web UI use local compute as well.

The environments are **MD Web UI**, located at https://app.motherduck.com, **MD Cloud**, which runs on MotherDuck infrastructure when you connect using `md:`, and **DuckDB UI / CLI**, which run on local environments where the DuckDB client is installed.

| Extension            | MD UI* | MD Cloud | DuckDB UI / CLI |
|----------------------|--------|----------|-----------------|
| autocomplete         | ✅ | ❌ | ✅ |
| avro                 | ✅ | ✅ | ✅ |
| aws                  | ❌ | ❌ | ✅ |
| azure                | ❌ | ✅ | ✅ |
| delta                | ❌ | ✅ | ✅ |
| ducklake             | ✅ | ✅ | ✅ |
| encodings            | ❌ | ✅ | ✅ |
| excel                | ✅ | ✅ | ✅ |
| fts                  | ✅ | ✅ | ✅ |
| httpfs               | ✅ | ✅ | ✅ |
| h3                   | ✅ | ✅ | ✅ |
| iceberg              | ❌ | ✅ | ✅ |
| icu                  | ✅ | ✅ | ✅ |
| inet                 | ✅ | ✅ | ✅ |
| jemalloc             | ❌ | ❌ | ✅ |
| json                 | ✅ | ✅ | ✅ |
| mysql                | ❌ | ❌ | ✅ |
| parquet              | ✅ | ✅ | ✅ |
| postgres             | ❌ | ❌ | ✅ |
| spatial              | ✅ | ✅ | ✅ |
| sqlite               | ✅ | ❌ | ✅ |
| tpcds                | ✅ | ✅ | ✅ |
| tpch                 | ✅ | ✅ | ✅ |
| ui                   | ❌ | ❌ | ✅ |
| vss                  | ❌ | ❌ | ✅ |
| community extensions | ❌ | ❌ | ✅ |

:::note
*Not all features of extensions in the MotherDuck UI (Wasm) are supported.
:::

:::note
For some extensions (such as `h3`), load the extension before loading the `motherduck` extension if you want to use it on local data without routing the query to MotherDuck.

```sql
-- Install and load the h3 extension before MotherDuck
INSTALL h3 FROM community;
LOAD h3;
LOAD motherduck;
ATTACH 'md:';
```
:::

Extensions listed as supported by DuckDB UI / CLI, such as `aws`, `postgres`, and `vss`, can also be used through a local DuckDB instance connected to MotherDuck.

## Future development

MotherDuck's extension support is continuously evolving. The team regularly evaluates and adds support for new extensions based on user demand and technical feasibility. If you need specific extensions enabled, please reach out to the MotherDuck team.

---

Source: https://motherduck.com/docs/concepts/ducklake

---
sidebar_position: 8
title: DuckLake
description: Understanding DuckLake - A high-performance open table format for petabyte-scale analytics
feature_stage: preview
---

import Admonition from '@theme/Admonition';
import Versions from '@site/src/components/Versions';

# DuckLake

::::info
MotherDuck supports DuckDB . In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions through .
::::

DuckLake is an open table format for large-scale analytics that provides data management capabilities similar to Apache Iceberg and Delta Lake. It organizes data into partitions based on column values like date or region for efficient querying, with actual data files stored on object storage systems. DuckLake innovates by storing metadata in database tables rather than files, enabling faster lookups through database indexes and more efficient partition pruning using SQL queries, while the columnar data itself resides on scalable object storage infrastructure.

MotherDuck provides support for managed DuckLake, enabling you to back MotherDuck databases with a DuckLake catalog and storage for petabyte-scale data workloads.

:::tip
Looking for **code examples?** Check out the [integration guide](/integrations/file-formats/ducklake/) to see how easy it is to start using DuckLake with MotherDuck.
:::

## Key characteristics

**Database-backed metadata**: DuckLake stores table metadata in a transactional database (PostgreSQL, MySQL) rather than files, providing:

- Faster metadata lookups through database indexes
- Efficient filtering of data by skipping irrelevant partitions using SQL WHERE clauses
- Simplified writes without the performance overhead of manifest file merging

**Multi-table transactions**: Unlike other lake formats that operate on individual tables, DuckLake supports ACID transactions across multiple related tables, better reflecting how organizations think about databases as collections of inter-related tables.

**Simplified architecture**: No additional catalog server required, just a standard transactional database that most organizations already have expertise managing.

## DuckLake vs. other lake formats

### Performance differences

Table formats like Apache Iceberg and Delta Lake store metadata in file-based structures. Read and write operations must traverse these file-based metadata structures, which can create latency that increases with scale.

**File-based metadata challenges**:

- Sequential file scanning for metadata discovery
- Complex manifest file merging for writes
- Limited query optimization due to metadata access patterns
- Catalog server complexity for coordination

**DuckLake approach**:

- Database indexes provide faster metadata lookups
- Transactional writes reduce manifest merging overhead
- SQL-based partition pruning and query optimization
- Standard database operations for metadata management

### Scale and capability comparison

| Capability | DuckLake | Iceberg/Delta Lake |
| ---------- | -------- | ------------------ |
| **Data Scale** | Petabytes | Petabytes |
| **Metadata Storage** | Database tables with indexed access | File-based structures requiring sequential traversal |
| **Metadata Performance** | Database index lookups | Additional catalog required |
| **Write Operations** | Database transactions | Manifest file merging |
| **Multi-table Operations** | Full ACID transactions across tables | Limited cross-table coordination |
| **Infrastructure Requirements** | Standard transactional databases | Separate catalog servers |
| **Schema Evolution** | Coordinated multi-table schema evolution | Individual table-level changes |

## Use cases and applications

### When to choose DuckLake as your open table format

DuckLake is particularly well-suited for:

**Large-scale analytics**: Organizations with petabytes of historical data, high-volume event streams, or analytics requirements that exceed traditional data warehouse storage or processing capabilities.

**Multi-table workloads**: Applications requiring coordinated schema evolution, cross-table constraints, or transactional consistency across related tables.

**Metadata-intensive workloads**: Scenarios where file-based metadata access patterns may impact query performance.

**Reduced infrastructure complexity**: Organizations seeking lake-scale capabilities with fewer separate catalog servers and metadata management components.
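
To make the multi-table workloads case above more concrete, here is a minimal sketch of a transaction that writes to two related tables at once. It uses standard DuckDB transaction statements and assumes an existing DuckLake-backed database named `my_lake`. The database name, table names, and columns are hypothetical. It also assumes explicit transactions are available in the environment you run it in.

```sql
-- Assumption: my_lake is an existing DuckLake-backed database (hypothetical name)
USE my_lake;

-- Hypothetical related tables
CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer_id INTEGER);
CREATE TABLE IF NOT EXISTS order_items (order_id INTEGER, sku VARCHAR, quantity INTEGER);

-- Write to both tables in one transaction so readers see
-- either the complete order or nothing at all
BEGIN TRANSACTION;
INSERT INTO orders VALUES (1001, 42);
INSERT INTO order_items VALUES (1001, 'SKU-1', 2), (1001, 'SKU-2', 1);
COMMIT;
```

If any statement fails before `COMMIT`, issuing `ROLLBACK` discards the changes to both tables. Formats that coordinate writes per table make this harder to express.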
### Storage comparison: MotherDuck native vs DuckLake storage For loading data, MotherDuck and DuckLake perform very similarly. However, when reading data, MotherDuck native storage format is 2x-10x faster than DuckLake, for both cold & hot runs. ### Migration considerations **From data warehouses**: DuckLake provides a scaling option when warehouse storage limits or costs become constraining, while maintaining SQL interfaces and compatibility. **From other lake formats**: DuckLake may provide performance improvements for metadata-intensive workloads, though migration requires consideration of existing tooling and processes. **Hybrid architectures**: Organizations can use MotherDuck for traditional data warehouse workloads while graduating specific databases to DuckLake as scale requirements increase. ## Performance characteristics ### Metadata operations DuckLake's database-backed metadata provides different performance characteristics: - **Partition discovery**: Index-based vs. file scanning - **Schema evolution**: Transactional vs. eventual consistency - **Query planning**: Index-based vs. file traversal - **Concurrent access**: Database locks vs. file coordination ## Data inlining DuckLake supports data inlining, an optimization that stores small data changes directly in the metadata catalog rather than creating individual Parquet files. This feature is particularly valuable for high-frequency, small-batch inserts common in streaming and transactional workloads. Starting with DuckLake 0.4, **deletion inlining** extends this concept to delete operations -- small deletes are stored in the metadata catalog rather than creating separate deletion files. For implementation details and examples, see the [DuckLake integration guide](/integrations/file-formats/ducklake/#data-inlining). ## Storage lifecycle DuckLake databases follow most of the same [storage lifecycle stages](/concepts/storage-lifecycle) as native storage databases: 1. 
**Active bytes**: Data that is part of the current state of the database
2. **Historical bytes**: Data retained by snapshots that is no longer part of the active state
3. **Failsafe bytes**: Data retained as system backups after snapshots expire (7-day retention)
4. **Deleted**: Data fully removed from the system

Unlike native storage databases, DuckLake does not have a "retained for clone" stage because DuckLake does not support zero-copy cloning. Storage optimization and snapshot expiration are handled by [auto maintenance](#auto-maintenance) rather than the native storage garbage collector.

For retention defaults and plan-specific details, see [Storage lifecycle and management](/concepts/storage-lifecycle#ducklake-databases).

## Auto maintenance

MotherDuck runs background maintenance on DuckLake databases to optimize storage layout and manage data lifecycle. Maintenance runs periodically on the duckling that owns the database while it is active.

### Defaults and configuration

| Database type | Default | Description |
| ------------- | ------- | ----------- |
| Fully managed DuckLake | Enabled | Maintenance runs automatically; opt out with `ALTER DATABASE <database_name> SET AUTO_MAINTENANCE = FALSE` |
| BYOB (Bring Your Own Bucket) | Disabled | Opt in with `ALTER DATABASE <database_name> SET AUTO_MAINTENANCE = TRUE` to enable maintenance |

To disable auto maintenance:

```sql
ALTER DATABASE <database_name> SET AUTO_MAINTENANCE = FALSE;
```

To enable auto maintenance (for example, for BYOB databases):

```sql
ALTER DATABASE <database_name> SET AUTO_MAINTENANCE = TRUE;
```

### Maintenance operations

Auto maintenance runs two phases for each database. You can also run these operations manually using the [DuckLake maintenance functions](https://ducklake.select/docs/stable/duckdb/maintenance/recommended_maintenance).
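As a rough sketch of a manual maintenance round, the calls below use the function names from the DuckLake maintenance documentation linked above. `my_ducklake` is a hypothetical database name, the intervals are placeholders, and whether every function and parameter is accepted unchanged on MotherDuck is an assumption to verify against the integration guide.

```sql
-- Assumption: my_ducklake is a hypothetical DuckLake database name.

-- Phase 1: file layout optimization
CALL ducklake_flush_inlined_data('my_ducklake');    -- write inlined rows out to Parquet
CALL ducklake_merge_adjacent_files('my_ducklake');  -- combine small adjacent files
CALL ducklake_rewrite_data_files('my_ducklake');    -- rewrite files carrying deleted rows

-- Phase 2: snapshot lifecycle management
-- The one-week intervals are illustrative, not MotherDuck defaults.
CALL ducklake_expire_snapshots('my_ducklake', older_than => now() - INTERVAL '1 week');
CALL ducklake_cleanup_old_files('my_ducklake', older_than => now() - INTERVAL '1 week');
```

Running the flush before the merge mirrors the order of auto maintenance, so small files produced by the flush can be consolidated in the same round.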
#### File layout optimization These operations keep query performance high by organizing data files: | Operation | Description | | --------- | ----------- | | **[Flush inlined data](https://ducklake.select/docs/stable/duckdb/advanced_features/data_inlining#flushing-inlined-data)** | Converts small inlined data stored in the metadata catalog to Parquet files. This may produce small files, which the merge operation consolidates in the same maintenance round. | | **[Merge small files](https://ducklake.select/docs/stable/duckdb/maintenance/merge_adjacent_files)** | Combines adjacent small Parquet files into larger files, reducing the number of files scanned during queries. | | **[Rewrite data files](https://ducklake.select/docs/stable/duckdb/maintenance/rewrite_data_files)** | Rewrites data files that have accumulated deleted rows to reclaim space and remove delete overhead. | Merge and rewrite operate on disjoint sets of files: merge handles files without deletes, while rewrite handles files with deletes. #### Snapshot lifecycle management These operations manage time travel snapshots and clean up files that are no longer needed: | Operation | Description | | --------- | ----------- | | **[Expire snapshots](https://ducklake.select/docs/stable/duckdb/maintenance/expire_snapshots)** | Removes snapshots older than the configured retention period and queues their associated files for deletion. | | **[Clean up old files](https://ducklake.select/docs/stable/duckdb/maintenance/cleanup_of_files)** | Physically deletes files that have been queued for deletion by expire, merge, or rewrite operations. Files are kept for at least 12 hours after queuing, allowing in-flight queries to finish. | ### Snapshot retention Snapshot expiration is controlled by the `SNAPSHOT_RETENTION_DAYS` database option. By default, this is set to `NULL` (infinite retention), meaning snapshots are never automatically expired. 
You must explicitly configure a retention period to enable automatic snapshot expiration.

To set a retention period:

```sql
ALTER DATABASE <database_name> SET SNAPSHOT_RETENTION_DAYS = 7;
```

To revert to infinite retention:

```sql
ALTER DATABASE <database_name> SET SNAPSHOT_RETENTION_DAYS = NULL;
```

:::note
When `SNAPSHOT_RETENTION_DAYS` is `NULL`, the expire snapshots operation is skipped entirely. No snapshot data is expired unless you explicitly set a retention period. The file cleanup operation still runs to delete files queued by merge and rewrite, which are always safe to remove regardless of retention settings.
:::

### Write conflicts

The merge and rewrite operations modify table metadata, which can occasionally conflict with concurrent write transactions on the same table. If a conflict occurs, the maintenance operation is skipped for that table and retried in the next round. While these conflicts are rare, you can disable auto maintenance for the affected database if you experience elevated transaction conflicts.

### Where maintenance runs

Maintenance runs on the duckling that owns the database. It executes in the background while the duckling is active. If the duckling shuts down, any in-progress maintenance operations stop gracefully. Maintenance resumes when the duckling starts again.
## Future capabilities MotherDuck continues expanding DuckLake support with planned features including: **External catalog integration**: Access to customer-managed DuckLake catalogs hosted in cloud databases **Local storage access**: Direct access to MotherDuck-managed storage from local DuckDB instances for hybrid workloads **Enhanced Iceberg support**: Continued improvements to Iceberg integration alongside DuckLake development ## Architecture implications ### Catalog database requirements DuckLake catalogs require a transactional database with: - ACID transaction support - Concurrent read/write access - Standard SQL interface - Backup and recovery capabilities Thankfully, this is all supported as part of MotherDuck without adding an additional catalog, although in self-hosted scenarios, an alternative database like Postgres, MySQL, or SQLite can be used. ### Storage considerations DuckLake data storage follows similar patterns to other lake formats: - Columnar file formats (Parquet) - Partitioned directory structures - Object storage compatibility - Compression and encoding optimizations --- Source: https://motherduck.com/docs/concepts/hypertenancy --- sidebar_position: 2 title: Hypertenancy description: Learn how MotherDuck's hypertenancy model provides dedicated compute for every user through per-user Ducklings, enabling predictable performance without noisy neighbors. --- MotherDuck implements a unique tenancy model called **hypertenancy**: every user or service account gets their own dedicated DuckDB compute instance, called a Duckling. Unlike traditional data warehouses where all users share a single cluster, hypertenancy provides full compute isolation at the individual user level. 
## The problem with traditional multi-tenancy Traditional data warehouses and OLAP systems use a shared-compute model: ```mermaid graph TB subgraph Users["All Users"] U1{{"User A"}}:::green U2{{"User B"}}:::green U3{{"User C"}}:::green end subgraph Warehouse["Shared Data Warehouse"] Cluster["Single Compute Cluster"]:::yellow end U1 --> Cluster U2 --> Cluster U3 --> Cluster ``` This shared model creates several challenges: - **Noisy neighbors**: One user's expensive query affects everyone else's performance - **Resource contention**: Concurrency limits apply across all users - **Unpredictable performance**: Query times vary based on overall system load - **Overprovisioning**: Resources must be sized for peak aggregate load, sitting idle most of the time - **Difficult cost attribution**: Hard to track compute costs per user or customer ## How Hypertenancy works With hypertenancy, MotherDuck provisions a separate Duckling for each user: ```mermaid graph TB subgraph Users["All Users"] U1{{"User A"}}:::green U2{{"User B"}}:::green U3{{"User C"}}:::green end subgraph MotherDuck["MotherDuck"] D1["Duckling A"]:::yellow D2["Duckling B"]:::yellow D3["Duckling C"]:::yellow end U1 --> D1 U2 --> D2 U3 --> D3 ``` Each Duckling is a complete DuckDB instance with dedicated CPU, memory, and fast SSD spill space. This architecture delivers: - **Perfect isolation**: No noisy neighbors—one user's workload never impacts another - **Predictable performance**: Dedicated resources mean consistent query times - **Independent scaling**: Each user's compute can be sized to their specific needs - **Per-user billing**: Compute costs directly attributable to individual users - **Fast cold starts**: Ducklings start in approximately 1 second ## Scaling with Hypertenancy Hypertenancy supports both vertical and horizontal scaling, letting you match compute resources to actual demand. 
### Vertical scaling: Duckling sizes Each user's Duckling can be configured to different sizes based on their workload requirements: | Duckling Size | Best For | |---------------|----------| | **Pulse** | Ad-hoc queries, read-heavy workloads, high-concurrency analytics | | **Standard** | Core analytical workflows, ETL/ELT pipelines | | **Jumbo** | Large-scale batch processing, complex joins | | **Mega** | Demanding jobs with high data volumes | | **Giga** | Largest and toughest batch workloads | You can adjust Duckling size per user through the [MotherDuck UI](/about-motherduck/billing/duckling-sizes/#changing-duckling-sizes) or [REST API](/sql-reference/rest-api/ducklings-set-duckling-config-for-user/). For example, in a customer-facing analytics scenario, you might provision: - **Pulse** Ducklings for most customers running standard dashboards - **Standard** or **Jumbo** Ducklings for enterprise customers with heavier workloads - **Mega** or **Giga** Ducklings for batch data loading jobs ### Horizontal scaling: read scaling When a single user needs to handle many concurrent queries—such as a service account powering a customer-facing application—you can enable [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/). Read scaling provisions additional read-only Ducklings that share the same data but distribute query load: ```mermaid graph TB subgraph App["Application Users"] E1{{"End User 1"}}:::green E2{{"End User 2"}}:::green E3{{"End User 3"}}:::green E4{{"End User 4"}}:::green end S1[Service Account]:::watermelon subgraph MotherDuck["MotherDuck (Customer X)"] RW["Read-Write Duckling
(Data Loading)"] R1["Read Scaling Duckling 1"] R2["Read Scaling Duckling 2"] end S1 --> RW E1 --> R1 E2 --> R1 E3 --> R2 E4 --> R2 ``` Read scaling lets you serve hundreds or thousands of concurrent end users through a single service account while maintaining predictable performance. ## Hypertenancy use cases ### Customer-facing analytics Hypertenancy is particularly powerful for [customer-facing analytics](/getting-started/customer-facing-analytics/). Each of your customers can have their own service account with isolated Ducklings: - **Data isolation**: Each customer's data stays in their own database - **Compute isolation**: One customer's workload never impacts another - **Cache isolation**: Each customer's Duckling maintains its own cache, so cached query results and data remain private and predictable - **Independent sizing**: Scale resources per customer based on their tier or needs - **Predictable costs**: Bill customers accurately based on their actual compute usage For a hands-on guide to building customer-facing analytics with per-customer service accounts, see the [Builder's Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/). ### Development and production pipelines Service accounts enable clean separation between deployment environments. Each environment gets its own isolated compute: | Environment | Service Account | Duckling Size | Purpose | |-------------|-----------------|---------------|---------| | Local/Dev | `dev-pipeline` | Pulse | Interactive development and testing | | Staging | `staging-pipeline` | Standard | Pre-production validation | | Production | `prod-pipeline` | Standard/Jumbo/... 
| Production workloads |

This separation ensures:

- Development experiments never impact production performance
- Each environment has appropriately sized compute
- Clear cost attribution per environment
- Easy rollback by switching service account credentials

### Data warehouse and data pipeline workloads

For data pipelines, you can assign dedicated service accounts to different stages of your data workflow. If you're using dbt, you can run dbt models on different Duckling sizes.

| Pipeline Stage | Service Account | Duckling Size | Workload Pattern |
|----------------|-----------------|---------------|------------------|
| Ingestion | `ingest-service` | Jumbo/Mega | Bulk data loading, high I/O |
| Transformation | `transform-service-standard` / `transform-service-jumbo` | Standard/Jumbo | dbt models, ETL jobs |
| Reporting | `reporting-service` | Pulse (read scaling) | Dashboard queries, read-heavy |

This pattern provides:

- **Workload isolation**: Heavy batch ingestion jobs won't slow down interactive reporting queries
- **Right-sized compute**: Each stage gets the Duckling size optimized for its workload
- **Cost visibility**: Track compute costs per pipeline stage
- **Independent scheduling**: Run ingestion during off-peak hours without affecting daytime analysts

### Analytics and data science

For internal analytics teams, hypertenancy means analysts and data scientists each get their own compute. A data scientist running a complex ML feature extraction job won't slow down an analyst building a quick dashboard.

## Why single-node beats distributed for per-user compute

Traditional distributed data warehouses use clusters with multiple nodes that coordinate to execute queries. This architecture introduces:

- Network latency between nodes
- Coordination overhead
- Data shuffling costs

For queries that operate on one user's data at a time (the common pattern in hypertenancy), single-node execution on a Duckling eliminates this overhead entirely.
The result is often faster query performance and lower costs compared to distributed systems, especially for interactive analytics workloads.

DuckDB's efficient columnar execution, combined with MotherDuck's fast storage architecture, means queries can handle datasets larger than memory with minimal performance impact.

## Related content

- **Learn about Duckling sizes**: [Duckling Sizes](/about-motherduck/billing/duckling-sizes/)
- **Configure read scaling**: [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- **Build customer-facing analytics**: [Customer-Facing Analytics Overview](/getting-started/customer-facing-analytics/)
- **Set up per-customer service accounts**: [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/)

---

Source: https://motherduck.com/docs/concepts/object-name-resolution

---
sidebar_position: 5
title: Object name resolution
description: Fully qualified naming conventions and database resolution rules in MotherDuck.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Object name resolution

## Fully qualified naming convention

Fully qualified names (FQN) in MotherDuck are of the form `<database_name>.<schema_name>.<object_name>`. The fully qualified naming convention lets you query objects in MotherDuck regardless of context. Queryable objects can be tables and views. For example:

```sql
SELECT * FROM mydatabase.myschema.mytable;
```

The fully qualified naming convention is useful when you want your SQL to execute reliably across multiple interfaces, by various users, or in programmatic scripts.

## Relative naming convention

For convenience, MotherDuck enables you to omit database or schema when querying objects.
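Because omitted parts are filled in from the connection's current context, it can help to check or set that context before relying on relative names. A minimal sketch, assuming a database named `mydatabase` with a schema `myschema` (the hypothetical names used on this page):

```sql
-- Assumption: mydatabase, myschema, and mytable are hypothetical names.
USE mydatabase;                  -- make mydatabase the current database

SELECT current_database();       -- database used when the database part is omitted
SELECT current_schema();         -- schema searched first when the schema part is omitted

SELECT * FROM myschema.mytable;  -- relative name, resolved within the current database
```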
When **database is omitted**, MotherDuck will attempt to resolve the query by using the current database: ```sql SELECT * FROM myschema.mytable; ``` When **both database and schema are omitted**, MotherDuck will first attempt to find the object in the current schema. Thereafter, it will attempt to find the object in other schemas in the current database. If the object name is ambiguous - for example if multiple tables with the same name exist in the database - MotherDuck will return an error: ```sql SELECT * FROM mytable; ``` You may also choose to **omit just the schema**. MotherDuck will first search the current schema, and thereafter will search for the object across all other schemas in the specified database: ```sql SELECT * FROM mydatabase.mytable; ``` --- Source: https://motherduck.com/docs/concepts/pgduckdb --- sidebar_position: 3 title: pg_duckdb Extension description: Use pg_duckdb to run DuckDB analytics within PostgreSQL and connect to MotherDuck. --- [pg_duckdb](https://github.com/duckdb/pg_duckdb) is an open-source Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. Use `pg_duckdb` when you specifically need DuckDB or MotherDuck access from inside a PostgreSQL server. If you only need to connect to MotherDuck from a PostgreSQL-compatible client, use the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) instead. ## Main features - SELECT queries executed by the DuckDB engine can directly read Postgres tables - Read and write support for object storage (AWS S3, Cloudflare R2, or Google GCS) - Read and write support for data stored in MotherDuck For more information about functionality and installation, check out the [repository's README](https://github.com/duckdb/pg_duckdb/blob/main/README.md). 
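As a rough illustration of the first feature, the sketch below runs an aggregation over a regular Postgres table on the DuckDB engine. It assumes `pg_duckdb` is already installed on the server; `orders` and its columns are hypothetical. The `duckdb.force_execution` setting is described in the repository's README, which is the place to check for current behavior.

```sql
-- Assumptions: pg_duckdb is installed on this server, and orders is a
-- hypothetical regular Postgres table with item and price columns.
CREATE EXTENSION IF NOT EXISTS pg_duckdb;

-- Ask pg_duckdb to execute eligible queries with the DuckDB engine.
SET duckdb.force_execution = true;

-- The query reads the Postgres table directly but is executed by DuckDB.
SELECT item, count(*) AS order_count, sum(price) AS revenue
FROM orders
GROUP BY item
ORDER BY revenue DESC;
```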
## Connect with MotherDuck

To enable this support you first need to [generate an access token][md-access-token] and then add the following line to your `postgresql.conf` file:

```ini
duckdb.motherduck_token = 'your_access_token'
```

NOTE: If you don't want to store the token in your `postgresql.conf` file, you can also store the token in the `motherduck_token` environment variable and then explicitly enable MotherDuck support in your `postgresql.conf` file:

```ini
duckdb.motherduck_enabled = true
```

If you installed `pg_duckdb` in a different Postgres database than the default one named `postgres`, then you also need to add the following line to your `postgresql.conf` file:

```ini
duckdb.motherduck_postgres_database = 'your_database_name'
```

After doing this (and possibly restarting Postgres), you can create tables in the MotherDuck database by using the `duckdb` [Table Access Method][tam] like this:

```sql
CREATE TABLE orders(id bigint, item text, price NUMERIC(10, 2)) USING duckdb;
CREATE TABLE users_md_copy USING duckdb AS SELECT * FROM users;
```

[tam]: https://www.postgresql.org/docs/current/tableam.html

Any tables that you already had in MotherDuck are automatically available in Postgres. Since DuckDB and MotherDuck allow accessing multiple databases from a single connection and Postgres does not, we map database+schema in DuckDB to a schema name in Postgres.

This is done in the following way:

1. Each schema in your default MotherDuck database is merged with the Postgres schema that has the same name.
2. Except for the `main` DuckDB schema in your default database, which is merged with the Postgres `public` schema.
3. Tables in other databases are put into dedicated DuckDB-only schemas. These schemas are of the form `ddb$<duckdb_db_name>$<duckdb_schema_name>` (including the literal `$` characters).
4. Except for the `main` schema in those other databases. That schema should be accessed using the shorter name `ddb$<duckdb_db_name>` instead.
An example of each of these cases is shown below: ```sql INSERT INTO my_table VALUES (1, 'abc'); -- inserts into my_db.main.my_table INSERT INTO your_schema.tab1 VALUES (1, 'abc'); -- inserts into my_db.your_schema.tab1 SELECT COUNT(*) FROM ddb$my_shared_db.aggregated_order_data; -- reads from my_shared_db.main.aggregated_order_data SELECT COUNT(*) FROM ddb$sample_data$hn.hacker_news; -- reads from sample_data.hn.hacker_news ``` [md]: https://motherduck.com/ [md-access-token]: /key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token --- Source: https://motherduck.com/docs/concepts/resource-management --- sidebar_position: 2 title: Resource management description: Understand MotherDuck's resource hierarchy, from organizations, accounts, tokens, and secrets down to databases and tables, and how each level provides compute isolation, data isolation, and access control. --- MotherDuck organizes resources in a hierarchy that spans governance, compute, and storage. Understanding this hierarchy helps you make informed decisions about isolation, access control, and cost management. ## The resource hierarchy MotherDuck resources are organized into three layers: governance (who can do what), compute (where queries run), and storage (where data lives). ### Governance Organizations contain accounts (for both users and machine-to-machine/services), and accounts hold tokens and secrets. Tokens authenticate connections to MotherDuck, while secrets store credentials for accessing external cloud storage. ```mermaid flowchart LR Org["Organization"]:::yellow Org --> User Org --> SA subgraph " " User{{"User account"}}:::green SA{{"Service account"}}:::green end User --> Creds SA --> Creds subgraph Creds["Credentials (per account)"] Token1["R/W token"]:::sky Token2["Read scaling token"]:::sky Secret["Secrets"]:::sky end ``` ### Compute Each account gets a dedicated R/W Duckling. 
Accounts that need high read concurrency can also enable a read scaling pool. ```mermaid flowchart LR Token1["R/W token"]:::sky Token2["Read scaling token"]:::sky RW["R/W Duckling"]:::yellow RS["Read scaling pool"]:::yellow RSI1["Read scaling Duckling"]:::yellow RSI2["Read scaling Duckling"]:::yellow Token1 --> RW Token2 --> RS RS --> RSI1 RS --> RSI2 ``` ### Storage Databases follow the standard DuckDB hierarchy. Shares provide user, organization or public access to databases not created by that account. ```mermaid flowchart LR RW["R/W Duckling"]:::yellow DB[("Database")]:::yellow Share[("Share
(read-only clone)")]:::sky Schema["Schema"]:::green Table["Table / View"]:::green RW --> DB DB --> Schema Schema --> Table DB -. "CREATE SHARE" .-> Share ``` Ducklings can also read from and write to external cloud storage. A [secret](#secrets) provides the credentials, and the Duckling connects to the storage provider directly: ```mermaid flowchart LR RW["R/W Duckling"]:::yellow Secret["Secret"]:::sky S3[("S3 / GCS / R2 / Azure")]:::green RW -->|"SELECT FROM 's3://...'"| S3 Secret -. "authenticates" .-> S3 ``` ## What each level means ### Organization An organization is the top-level container in MotherDuck. It defines: - A **billing boundary**: all compute and storage costs roll up to the organization - A **region**: each organization lives in a single region (for example, `us-east-1`, `us-west-2`, or `eu-central-1`) - **Admin controls**: organization admins manage users, service accounts, and SSO configuration Every MotherDuck user belongs to exactly one organization. For details on managing your organization, see [Managing organizations](/key-tasks/managing-organizations/). ### Accounts: users and service accounts MotherDuck has two types of accounts: - **User accounts** represent individual people who sign in interactively - **Service accounts** represent applications, pipelines, or automated processes Both types function the same way from a resource perspective: each account gets its own dedicated [Read-Write Duckling and read scaling flock](#ducklings) and owns its own databases. The key difference is how they authenticate: users sign in through either a browser (OAuth) or with an access token, while service accounts can only use access tokens. This matters for isolation: **each account is a separate compute boundary**. Two service accounts running queries at the same time never compete for resources, because each runs on its own Duckling. 
::::tip Organization admins can [impersonate a service account](/key-tasks/service-accounts-guide/impersonate-service-accounts/) through the MotherDuck UI to view its resources, run queries, or troubleshoot issues as that account. :::: For details on creating service accounts, see [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/). ### Access tokens Tokens are the credentials that authenticate a connection to MotherDuck. Each token is scoped to a specific account and determines how the connection is routed: | Token type | Routes to | Use case | |---|---|---| | **R/W token** | The account's R/W Duckling | Data loading, writes, interactive queries | | **Read scaling token** | The account's read scaling pool | High-concurrency read workloads | Multiple connections using the same R/W token share the same Duckling. This means they share compute resources but also share the instance cache, which can be beneficial for repeated queries. For details on authentication, see [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/). ### Secrets Secrets store cloud storage credentials (for S3, GCS, Azure, R2, and Hugging Face) in MotherDuck so your Ducklings can read from and write to external storage. Secrets are: - **Encrypted**: stored fully encrypted in MotherDuck - **Account-scoped**: each secret belongs to the account that created it and is not visible to other accounts in the organization - **Scope-matched**: when multiple secrets exist for the same storage type, MotherDuck picks in alphabetical order. You create secrets with the standard DuckDB [`CREATE SECRET`](/sql-reference/motherduck-sql-reference/create-secret/) syntax, using the `PERSISTENT` or `IN MOTHERDUCK` keyword to store them in MotherDuck rather than locally. Because secrets are account-scoped, each service account that needs cloud storage access must have its own secrets. 
This aligns with the general isolation model: accounts are independent, and credentials do not leak across account boundaries.

### Ducklings

A Duckling is a dedicated DuckDB compute instance. Every account gets its own R/W Duckling, providing [hypertenancy](/concepts/hypertenancy/) to guarantee full compute isolation at the individual account level.

Ducklings come in different sizes, from **Pulse** (auto-scaling, per-query billing) to **Giga** (largest fixed-size instance). You choose the size based on workload requirements. See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for details.

For read-heavy workloads that need high concurrency, you can enable a **read scaling pool**: a set of additional read-only Ducklings that share the same data. Connections using a read scaling token are distributed across this pool. See [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for details.

### Databases, schemas, and tables

Databases follow the standard DuckDB hierarchy: a database contains schemas, and schemas contain tables and views. In MotherDuck:

- Every database is **owned by the account** that created it
- Access is **all-or-nothing at the database level**: an account either has full access to a database or no access at all
- You can grant read-only access to others through [shares](/key-tasks/sharing-data/sharing-overview/)

MotherDuck also supports [DuckLake](/concepts/ducklake/), an open table format that stores data in your own object storage while MotherDuck manages the catalog.

## Isolation boundaries

Different boundaries in the resource hierarchy provide different types of isolation.
Use this table to understand what separates what: | Boundary | Compute isolation | Data isolation | Access control | Secret isolation | |---|---|---|---|---| | Different organizations | Full | Full | Full | Full | | Different accounts (same org) | Full (separate Ducklings) | Per-database (owned separately) | Per-database (through shares) | Full (secrets are account-scoped) | | Different tokens (same account) | None (same R/W Duckling) | None (same databases) | None (same permissions) | None (same secrets) | | Read scaling pool | Read-only isolation (separate Ducklings) | Shared (read-only replicas) | Token-scoped | Shared (same account secrets) | | Different databases | N/A | Full | Share-based | N/A | Key takeaways: - **Accounts are the primary isolation boundary.** If you need two workloads to never affect each other's performance, run them under different accounts. - **Tokens do not provide isolation.** Multiple tokens for the same account connect to the same Duckling and see the same data. - **Shares provide data access without compute sharing.** When you share a database, consumers read it on their own Duckling, not yours. - **Secrets follow account boundaries.** Each account manages its own cloud storage credentials. Secrets created by one account are never visible to another. ## Common patterns ### Isolate ETL from analysts Create separate service accounts for your data pipeline and your analysts. Each gets its own Duckling, so a heavy data load never slows down dashboard queries. ```mermaid flowchart LR subgraph Org["Organization"] ETL{{"etl-pipeline"}}:::green Analyst{{"analyst-team"}}:::green end subgraph Compute["Compute"] D1["Jumbo Duckling"]:::yellow D2["Pulse Duckling"]:::yellow D3["Pulse Duckling"]:::yellow D4["Pulse Duckling"]:::yellow end ETL --> D1 Analyst --> D2 Analyst --> D3 Analyst --> D4 D1 --> DB[("Shared database
(via share)")]:::sky D2 --> DB D3 --> DB D4 --> DB ``` The ETL service account owns the database and writes to it on a large Duckling. The analyst-team account uses a [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) pool of Pulse Ducklings to handle concurrent dashboard queries. Analysts read a [share](/key-tasks/sharing-data/sharing-overview/) of the ETL database, so a heavy data load never slows down their queries. ### Separate dev and prod environments Use different service accounts per environment. Each has isolated compute and its own databases: ```mermaid flowchart LR subgraph Org["Organization"] Dev{{"dev-pipeline"}}:::green Staging{{"staging-pipeline"}}:::green Prod{{"prod-pipeline"}}:::green end subgraph Compute["Compute"] D1["Pulse Duckling"]:::yellow D2["Standard Duckling"]:::yellow D3["Jumbo Duckling"]:::yellow end Dev --> D1 Staging --> D2 Prod --> D3 D1 --> DB1[("dev-db")]:::sky D2 --> DB2[("staging-db")]:::sky D3 --> DB3[("prod-db")]:::sky ``` Right-size each environment: Pulse for development (on-demand), Standard for staging validation, and Jumbo for production workloads. ### Customer-facing analytics (3-tier) For B2B applications that embed analytics, use a service account per customer. 
Your backend mediates access, and each customer gets isolated compute and data: ```mermaid flowchart LR subgraph App["Your application (auth, sessions, routing)"] FE["Frontend UI"]:::green BE["Backend API"]:::green end subgraph Org["Organization"] SA1{{"customer-a"}}:::green SA2{{"customer-b"}}:::green end subgraph Compute["Compute"] D1["Jumbo"]:::yellow RS["Read scaling pool"]:::yellow D2["Pulse Duckling"]:::yellow D3["Pulse Duckling"]:::yellow end FE -->|"user request"| BE BE -->|"read token"| SA1 BE -->|"read token"| SA2 SA1 --> D1 SA2 --> RS RS --> D2 RS --> D3 D1 --> DB1[("customer-a-db")]:::sky D2 --> DB2[("customer-b-db")]:::sky D3 --> DB2 ``` Your application handles user authentication and session management, then routes queries to the right customer's service account using stored read tokens. Each customer's service account owns its own database and Duckling. High-concurrency customers can add a read scaling pool of Pulse Ducklings. For the full walkthrough, see the [3-tier customer-facing analytics guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/). ### Embedded analytics with DuckDB WASM For lightweight, interactive analytics embedded directly in a web page, you can skip the backend tier entirely. The browser runs DuckDB WASM and connects to MotherDuck with a read-only token: ```mermaid flowchart LR Browser["Browser + DuckDB WASM"]:::green subgraph Org["Organization"] SA{{"embed-account"}}:::green end subgraph Compute["Compute"] D1["Duckling"]:::yellow end Browser -->|"read token"| SA SA --> D1 D1 --> DB[("analytics-db")]:::sky ``` DuckDB WASM runs queries client-side or routes them to MotherDuck depending on the query. This is how [Dives](/key-tasks/ai-and-motherduck/dives/) work: each embedded Dive connects to MotherDuck through a session token and queries live data directly from the browser, with no backend needed. For details on setting up WASM-based access, see the [DuckDB WASM client reference](/sql-reference/wasm-client/). 
### Give read access to another team Use shares to grant read-only access without sharing compute: 1. Create a share of the database: `CREATE SHARE my_share FROM my_database` 2. Grant access to the other team's account: `GRANT READ ON SHARE my_share TO 'other_user'` The other team reads the shared database on their own Duckling. Your compute is not affected. ### Understand compute costs Because each account runs on its own Duckling, compute costs are directly attributable: - **Per-account billing**: you can see exactly how much compute each service account or user consumes - **Right-sizing**: assign different [Duckling sizes](/about-motherduck/billing/duckling-sizes/) based on workload needs - **Pulse for variable workloads**: use Pulse Ducklings for ad-hoc or bursty workloads to pay per query instead of per hour ## What's next MotherDuck is adding workspaces and role-based access control (RBAC) to provide finer-grained access control within organizations. These features build on the resource hierarchy described here. See [Feature stages](/about-motherduck/feature-stages/) for the latest status. ## Related content - [Architecture and capabilities](/concepts/architecture-and-capabilities/) - [Hypertenancy](/concepts/hypertenancy/) - [Duckling sizes](/about-motherduck/billing/duckling-sizes/) - [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) - [Sharing data](/key-tasks/sharing-data/sharing-overview/) - [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) - [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret/) --- Source: https://motherduck.com/docs/concepts/results --- title: Results description: Results sidebar_class_name: cache-icon feature_stage: preview --- **RESULT** provides asynchronous query execution with a transparent cache. 
Create a RESULT to run a SELECT in the background, then query it like a table while controlling its lifecycle (pause, resume, cancel, drop). You can think of a result as a view with an attached cache that is used whenever possible to speed up queries. Results are stored in memory and remain visible only until your client-side DuckDB session is restarted.

For the SQL syntax reference, see [`RESULT`](/sql-reference/motherduck-sql-reference/result).

## Core concepts

### What is a RESULT?

```sql
CREATE RESULT <result_name> AS <select_statement>;
FROM <result_name> SELECT ...;
```

A RESULT is a named relation in your DuckDB database that:

- Runs the provided `SELECT` in the background (creation is non-blocking)
- Caches rows produced by that statement as it runs
- Provides lifecycle management (pause, resume, cancel, drop)
- Can be queried like a regular table
- Maintains execution state and progress information

### Result states

Results can be in one of three states:

- **BUILDING**: Query is actively running and appending rows to the cache
- **PAUSED**: Query execution is temporarily paused
- **DONE**: Query execution has completed, which can occur for three reasons:
  1. Query finished successfully
  2. Query was preemptively stopped (e.g., aborted by the user)
  3. Query encountered an error

## Interacting with results

### Creating results

When you create a RESULT, the provided `SELECT` starts running in the background. You can query the result like a normal table at any time.

```sql
-- Basic syntax
CREATE RESULT <result_name> AS <select_statement>;

-- With conflict resolution
CREATE RESULT IF NOT EXISTS <result_name> AS <select_statement>;
CREATE OR REPLACE RESULT <result_name> AS <select_statement>;

-- Accessing the result
FROM <result_name> LIMIT <n>;
```

### Accessing results

You can query a result like a table. The relation becomes available as soon as the statement that creates the result returns, which happens very quickly. This does not mean that the `SELECT` statement associated with the result has finished running.

```sql
FROM <result_name> LIMIT <n>;
```

There is **no guarantee** the cache is complete when you query a result. Depending on the state of the `RESULT` and your query, the system may read from the cache, wait for additional rows, or bypass the cache and re-run the original `SELECT`.

The decision tree below shows how the query `FROM my_result LIMIT 100` behaves when it accesses the RESULT `my_result`.

```mermaid
flowchart TD
    start(("FROM my_result LIMIT 100")):::circle -->|Completed successfully| cache(((Read from cache))):::circle
    start -->|"RESULT is not running (PAUSED/DONE with error)"| enough
    start -->|RESULT is BUILDING| enough_building
    enough_building{"Has enough data?<br/>(cache > 100)"}:::green -->|Yes| cache
    enough_building -->|No| access_limit
    access_limit{"access limit < 500,000<br/>(100 < 500,000)"}:::green -->|Yes| delay
    access_limit -->|No| rerun
    delay(Wait for 100 rows in cache<br/>or result complete) --> cache
    enough{"Has enough data?<br/>(cache > 100) OR DONE without error?"}:::green -->|Yes| cache
    enough -->|No| rerun(((Re-run query))):::circle
```

### Lifecycle management

On creation, new results start in the **BUILDING** state. While building, you can **PAUSE**, **RESUME**, **CANCEL**, or **DROP** the result. Pause suspends execution; resume continues from where it stopped. Cancel stops the job permanently, and it cannot be resumed. Canceled results can still be queried, but they will not append any new rows to the cache. When a result is dropped, it is permanently deleted and can no longer be queried. Dropping a result also removes its associated cache.

```mermaid
stateDiagram-v2
    [*] --> BUILDING: Result Created
    BUILDING --> PAUSED: PAUSE RESULT
    PAUSED --> BUILDING: RESUME RESULT
    BUILDING --> DONE: SELECT statement completes
    BUILDING --> DONE: CANCEL RESULT
    PAUSED --> DONE: CANCEL RESULT
    note right of BUILDING
        Query is actively running
    end note
    note right of PAUSED
        Query execution paused. Can be resumed.
    end note
    note right of DONE
        Execution finished: completed, error, or canceled.
    end note
    note left of DONE
        PAUSE/RESUME will error when in DONE state
    end note
```

#### Pause result

```sql
PAUSE RESULT <result_name>;
PAUSE RESULT IF EXISTS <result_name>;
```

#### Resume result

```sql
RESUME RESULT <result_name>;
RESUME RESULT IF EXISTS <result_name>;
```

#### Cancel result

```sql
CANCEL RESULT <result_name>;
CANCEL RESULT IF EXISTS <result_name>;
```

#### Drop result

```sql
DROP RESULT <result_name>;
DROP RESULT IF EXISTS <result_name>;
```

### Introspecting results

Use `SHOW ALL RESULTS` to list all your results alongside their status and progress. The returned table includes:

1. `name`: The name of the result
2. `error`: Any error message associated with the result (empty if no error occurred)
3. `status`: The current status of the result (BUILDING, PAUSED, DONE)
4. `row_count`: The number of rows in the result cache. This value grows while the result is building, so it is not stable within a single transaction.
```sql SHOW ALL RESULTS; --| name | error | status | row_count | --|-------|---------------------------------------------------------------------|----------|-----------| --| foo | (empty) | DONE | 100,000 | --| bar | INTERRUPT Error: The RESULT "bar" has been manually canceled. | DONE | 10,000 | --| hello | (empty) | PAUSED | 1,000 | --| world | (empty) | BUILDING | 100 | ``` If you want to order the results, filter them or limit the output you can use the `MD_SHOW_RESULTS` table function: ```sql FROM MD_SHOW_RESULTS() WHERE name = 'foo'; --| name | error | status | row_count | --|------|---------|--------|-----------| --| foo | (empty) | DONE | 100,000 | ``` ## Best practices - Use `LIMIT` when you need only a small sample so that `RESULT` can serve them quickly from the cache. - Prefer deterministic `SELECT` statements for predictable caching and reuse. - Pause or cancel long-running results you do not need immediately and remember to drop them when no longer in use. ## Notes and limitations - `RESULT` accepts `SELECT` statements only. - The cache may be partial while the result is building. Queries may wait briefly, use the cache, or re-run the `SELECT`. - A canceled result cannot be resumed. - Results are stored in memory and will not persist across client restarts. ## See also - [Building data applications with MotherDuck](https://motherduck.com/blog/building-data-applications-with-motherduck/) - [MotherDuck wasm npm package](https://www.npmjs.com/package/@motherduck/wasm-client?activeTab=readme) - [MotherDuck wasm example repository](https://github.com/motherduckdb/wasm-client) --- Source: https://motherduck.com/docs/concepts/scaling-patterns --- sidebar_position: 4 title: Workload scaling patterns description: Choose the right compute size, scaling approach, and connection model for your MotherDuck workload using a decision flowchart and workload-to-pattern matrix. 
---

import ScalingPatternsDiagram from '@site/src/components/ScalingPatternsDiagram';

MotherDuck gives you several levers to scale your workloads. The right combination depends on your concurrency needs, query characteristics, and whether your workload is read-heavy or write-heavy. This page helps you match your workload to the right scaling pattern.

## How MotherDuck scales per workload

MotherDuck scales workloads through compute units called [Ducklings](/concepts/hypertenancy/). Each user or service account gets a dedicated Duckling and read scaling flock, and you can adjust three levers to match your workload:

| Scaling lever | What it does | When to use it |
|---|---|---|
| **Vertical scaling** | Resize your Duckling ([Pulse through Giga](/about-motherduck/billing/duckling-sizes)) | Queries need more CPU or memory |
| **Horizontal scaling** | Add read-only Ducklings through [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) | Many concurrent users running read queries |
| **Workload isolation** | Create separate [service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) | Teams or pipelines that should not share compute, for example to prevent a large data ingestion from slowing down analysts' queries |

These levers are complementary. For example, you might use a Jumbo Duckling (vertical) for data loading and a flock of Pulse Ducklings with read scaling (horizontal) for your dashboard users. You can connect through any [supported interface](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/), including the native DuckDB SDK, the Postgres endpoint, or DuckDB WASM.

:::tip[Not sure what you need?]
Follow the [decision flowchart](#decision-flowchart) at the bottom of this page to find the right scaling pattern for your workload.
:::

## Understanding the scaling levers

### Vertical scaling: Duckling sizes

When a single query needs more resources, move to a larger Duckling. Larger Ducklings have more CPU, more memory, and extra SSD space for spilling when a query exceeds available memory. This helps with:

- Complex joins and aggregations
- Large data loading jobs
- Queries that process more data than fits in memory

Duckling sizes range from **Pulse** (lightweight, on-demand billing) to **Giga** (maximum resources for the heaviest batch jobs). See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for the full comparison.

**When to size up**: If queries are slow and you see high values for `BYTES_SPILLED_TO_DISK` or `WAIT_TIME` in your [query history](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/), your Duckling may need more memory.

### Horizontal scaling: Read scaling

When you need to serve many concurrent read queries, [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) adds read-only Duckling replicas behind your account.

Key concepts:

- **Pool size**: The default pool is 4 Ducklings, configurable up to 16 (soft limit). [Contact support](https://motherduck.com/contact-us/support/) for higher limits.
- **Eventual consistency**: Read replicas lag a few minutes behind the primary. Use [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot/) and [`REFRESH DATABASES`](/sql-reference/motherduck-sql-reference/refresh-database/) if you need tighter synchronization.
- **One Duckling per user**: For the best performance, aim for one Duckling per concurrent user. This takes full advantage of DuckDB's single-node architecture.

### Session affinity and routing

By default, read scaling distributes connections across the pool in round-robin fashion. When the number of connections exceeds your pool size, new connections share existing Ducklings.
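
The round-robin behavior described above can be pictured with a small conceptual model. This is only a sketch of the idea, not MotherDuck's routing implementation: the function name `assign_round_robin` is hypothetical, and the default pool size of 4 is taken from the read scaling defaults mentioned earlier.

```python
from collections import Counter
from itertools import cycle


def assign_round_robin(num_connections: int, pool_size: int = 4) -> list[int]:
    """Return the index of the Duckling each connection lands on, in arrival order.

    Conceptual model only: connections are handed to Ducklings 0..pool_size-1
    in turn, wrapping around once every Duckling has received one.
    """
    ducklings = cycle(range(pool_size))
    return [next(ducklings) for _ in range(num_connections)]


# Six connections against the default pool of four Ducklings.
assignments = assign_round_robin(num_connections=6, pool_size=4)
print(assignments)                 # [0, 1, 2, 3, 0, 1]
print(dict(Counter(assignments)))  # {0: 2, 1: 2, 2: 1, 3: 1}
```

In this model, the fifth and sixth connections wrap around and share Ducklings 0 and 1 with earlier connections, which is the contention that a larger pool or session affinity is meant to reduce.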
For workloads where users run unique queries, use [`session_name`](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#session-affinity-with-session-name) to route a user's connections to the same Duckling. This improves performance because: - **Cache locality**: DuckDB caches data per instance. Routing the same user to the same Duckling means their subsequent queries benefit from a warm cache. - **Consistent view**: Queries within a session see a consistent snapshot of the data. - **Better isolation**: Concurrent users do not compete for the same Duckling's resources. Set `session_name` to a user ID, session ID, or any stable identifier to benefit from caching where possible. You can [set the Duckling cooldown period](/docs/about-motherduck/billing/duckling-sizes/#configuring-the-cooldown-period) to match your use case and keep the Duckling alive with its cache. **When to use `session_name`**: Use it when users run unique, personalized queries, for example in customer-facing analytics or multi-tenant dashboards. Skip it when all users run the same queries (such as a shared reporting dashboard), since a shared connection pool already routes efficiently. ### Workload isolation: Service accounts [Service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) give you full compute and data isolation between workloads. Each service account gets its own Duckling and read scaling pool, which makes it easier to track usage and billing. Use separate service accounts when: - Different teams should never share compute (for example, production vs. development) - A data loading pipeline should not compete with queries from analysts - You need your customers to be able to write back to a database in a [customer-facing analytics](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/) setup, or have a separate read scaling flock for each customer. 
If you need a visual interface to manage these service accounts, the UI lets you impersonate a service account and adjust settings and run queries as the service account. ## Quick reference: workload patterns Use this matrix to find the recommended pattern for common workloads. Each row represents a typical use case, with inputs describing the workload and outputs recommending a configuration. | Use case | Users | Concurrency | R/W | Overlap | Weight | Duckling size | Scaling approach | |---|---|---|---|---|---|---|---| | *Ad-hoc analyst* | 👤 | Sequential | R/W | — | ⚡/🏋️ | Pulse / Standard+ | Default (single Duckling) | | *dbt or ELT pipeline* | 👤 | Concurrent | W | — | 🏋️ | Jumbo / Mega | Default (single Duckling) | | *Scheduled ingestion job* | 👤 | Sequential | W | — | 🏋️ | Jumbo+ | Default + dedicated service account | | *BI dashboard (Omni, Hex, Metabase)* | 👥👥 | Concurrent | R | High | ⚡ | Pulse / Standard | Read scaling (shared pool) | | *Embedded analytics* | 👥👥 | Concurrent | R | Low | ⚡/🏋️ | Pulse / Standard+ | Read scaling + `session_name` | | *Customer-facing app (3-tier)* | 👥👥 | Concurrent | R | Low | ⚡ | Standard | Read scaling + `session_name` | | *Serverless function (Lambda, Workers)* | 👥👥 | Concurrent | R | Varies | ⚡ | Standard | Read scaling | | *Multi-team production* | 👥👥 | Concurrent | R/W | Low | ⚡/🏋️ | Per team | Separate service accounts | ### Reading the matrix **Input columns** describe your workload: - **Users**: How many people or clients connect: 👤 single, 👥👥 many - **Concurrency**: Whether queries run one at a time (sequential) or in parallel (concurrent) - **R/W**: Whether the workload reads (R), writes (W), or both (R/W) - **Overlap**: Whether different users tend to run the same queries (high) or unique queries (low) - **Weight**: Whether queries are light (⚡ sub-second) or heavy (🏋️ seconds to minutes) **Output columns** recommend a configuration: - **Duckling size**: Which [Duckling 
size](/about-motherduck/billing/duckling-sizes/) to use - **Scaling approach**: Which horizontal scaling method to apply ## Choosing an interface Your choice of interface does not change the scaling levers available to you, but it does affect session management and connection behavior. | Interface | Best for | Session management | |---|---|---| | Native SDK (Python, Node.js, Java) | Client applications, scripts, dbt | Instance cache, `session_name` | | [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/) | Serverless functions, BI tools, environments without DuckDB | Per-connection | | DuckDB WASM | Browser-based applications | Client-side compute | ## Cost considerations Scaling decisions affect your compute costs: - **Vertical scaling** increases the per-second cost of your Duckling. Larger Ducklings cost more but finish heavy queries faster. - **Horizontal scaling** adds Ducklings proportional to active sessions, not total users. Idle Ducklings shut down after the configured [cooldown period](/about-motherduck/billing/duckling-sizes/). - **Pulse Ducklings** use per-query billing (minimum 1 compute-unit second), making them cost-effective for sporadic, lightweight workloads. - **Cooldown tuning** balances cost against cache warmth. A longer cooldown keeps the cache warm for returning users but costs more during idle periods. See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) and [pricing](/about-motherduck/billing/pricing/) for the full cost breakdown. ## Decision flowchart If you are not sure where to start, follow this flowchart: ```mermaid flowchart TD WriteQ{"Write-heavy?
(data loading, ETL)"}:::yellow WriteQ -->|Yes| WeightQ{"Heavy queries?
(complex joins, large loads)"}:::yellow WriteQ -->|No| ConcQ{"How many concurrent
read users?"}:::yellow WeightQ -->|Yes| SizeUp["Size up your Duckling
(Jumbo / Mega / Giga)"]:::green WeightQ -->|No| StdDuckling["Standard Duckling"]:::green SizeUp --> Isolation{"Need compute isolation
between workloads?"}:::yellow StdDuckling --> Isolation Isolation -->|Yes| SvcAcct["Separate service accounts"]:::green Isolation -->|No| SingleAcct["Single service account"]:::green ConcQ -->|"1-5"| Default["Default Duckling,
size for your heaviest query"]:::green ConcQ -->|"5-50"| OverlapQ{"Do users read the
same data?"}:::yellow ConcQ -->|"50+"| OverlapQ2{"Do users read the
same data?"}:::yellow OverlapQ -->|"Yes, mostly shared"| SharedPool["Read scaling
(shared connection pool)"]:::green OverlapQ -->|"No, differs per user"| SessionHint["Read scaling
+ session_name"]:::green OverlapQ2 -->|"Yes, mostly shared"| HighConcShared["Read scaling at max
connection pool size"]:::green OverlapQ2 -->|"No, differs per user"| HighConcUnique["Read scaling at max
connection pool size
+ session_name"]:::green ``` ## Related content - [Hypertenancy](/concepts/hypertenancy/): how MotherDuck's per-user compute model works - [Duckling sizes](/about-motherduck/billing/duckling-sizes/): compare sizes and configure your Ducklings - [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/): set up read-only Duckling pools - [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/): create isolated compute for teams and pipelines - [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/): connect through the PostgreSQL wire protocol - [Customer-facing analytics](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/): build multi-tenant analytics with per-customer isolation --- Source: https://motherduck.com/docs/concepts/storage-lifecycle --- title: Storage Lifecycle and Management sidebar_position: 3 description: Understand how MotherDuck manages data storage across different lifecycle stages and how this affects your billing and data management strategies. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; Understanding MotherDuck's storage lifecycle is crucial for optimizing costs and managing data effectively. Unlike traditional databases where deleted data is immediately freed, MotherDuck implements a multi-stage storage system that ensures data safety while providing cost transparency. This system is particularly important for organizations that share data, use zero-copy cloning, or need to understand their storage footprint for billing purposes. The storage lifecycle applies to both native storage databases and [DuckLake](/concepts/ducklake) databases, with some differences in lifecycle stages and management. See [storage management](#storage-management) for retention defaults by database type. ## Storage lifecycle overview The following diagram shows the storage lifecycle for native storage databases. 
```mermaid graph LR; A[Active Bytes]-->|bytes deleted or updated|B[Historical Bytes]; B-->|shares dropped|C[Retained for Clone Bytes]; B-->|historical retention period passes, or snapshots become unnamed|D[Failsafe Bytes]; C-->|bytes deleted or updated by cloned databases|D[Failsafe Bytes]; D-->|7 day retention|E[Deleted]; ``` There are 5 distinct stages of the storage lifecycle: 1. **Active bytes**: Actively referenced bytes of the database. These bytes are accessible by directly querying the database. 2. **Historical bytes**: Non-active bytes referenced by historical [snapshots](/concepts/snapshots) or shares of this database. Used for time travel and self-service restore. 3. **Retained for clone bytes**: Bytes referenced by other databases (through zero-copy clone) that are no longer referenced by this database as active or historical bytes. This stage applies to native storage databases only. 4. **Failsafe bytes**: Bytes no longer referenced by any database or share, retained for a period as a last-resort, best-effort recovery service. Recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed to be complete. Don't rely on failsafe bytes as part of a backup plan. 5. **Deleted**: Bytes are fully removed from the system and no longer accessible. MotherDuck runs a periodic job that reclassifies data to the proper storage lifecycle stage. For DuckLake databases, [auto maintenance](/concepts/ducklake#auto-maintenance) handles file cleanup and snapshot expiration. Data can only flow through the storage lifecycle unidirectionally, from left to right. 
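
As a minimal sketch, the one-way flow can be written as a table of allowed transitions taken from the native storage diagram above. The names `Stage`, `ALLOWED`, and `can_transition` are illustrative only and do not correspond to any MotherDuck API.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, numbered in left-to-right order from the diagram."""
    ACTIVE = 1
    HISTORICAL = 2
    RETAINED_FOR_CLONE = 3
    FAILSAFE = 4
    DELETED = 5


# Allowed transitions, copied from the arrows in the native storage diagram.
ALLOWED = {
    Stage.ACTIVE: {Stage.HISTORICAL},
    Stage.HISTORICAL: {Stage.RETAINED_FOR_CLONE, Stage.FAILSAFE},
    Stage.RETAINED_FOR_CLONE: {Stage.FAILSAFE},
    Stage.FAILSAFE: {Stage.DELETED},
    Stage.DELETED: set(),
}


def can_transition(src: Stage, dst: Stage) -> bool:
    """Return True if bytes in stage `src` may be reclassified directly to `dst`."""
    return dst in ALLOWED[src]


print(can_transition(Stage.HISTORICAL, Stage.FAILSAFE))  # True
print(can_transition(Stage.FAILSAFE, Stage.ACTIVE))      # False: no way back
```

Every allowed transition moves to a later stage, so bytes that reach failsafe never return to active or historical in this model.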
The following conditions can trigger data to be reclassified to a new stage: | Trigger | State transition | |---------|------------------| | Data is deleted or updated in the database | Active → Historical | | All shares referencing the data are dropped or updated, and all historic [snapshots](/concepts/snapshots) referencing the data are deleted | Historical → Retained for Clone or Failsafe | | Data is deleted from all zero-copy-cloned databases | Retained for Clone → Failsafe | | Failsafe retention period passes (7 days for standard, 1 day for transient) | Failsafe → Deleted | An organization is billed based on the average of active, historical, retained for clone, and failsafe bytes across all of their databases over the billing period. Refer to the [data recovery](/concepts/data-recovery) overview for more details on how to manage historical snapshots. ### How this affects your data strategy Understanding the storage lifecycle helps you make informed decisions about: - **Data deletion strategies**: When you delete data, it doesn't immediately reduce your bill due to the retention stages - **Sharing considerations**: Shared data remains in historical bytes until shares are updated or dropped - **Cloning decisions**: [Zero-copy clones](/docs/sql-reference/motherduck-sql-reference/create-database/) can keep data in retained for clone bytes even after deletion from the source - **Cost optimization**: Different lifecycle stages have different cost implications and management strategies For more information on data sharing, see [Sharing Data](/key-tasks/sharing-data/sharing-overview). For details on zero-copy cloning, refer to [MotherDuck Architectural Concepts](/concepts/database-concepts/#motherduck-architectural-concepts). ## Storage management Storage retention behavior depends on the database type: standard, transient, or DuckLake. 
`SNAPSHOT_RETENTION_DAYS` controls how many days historical snapshots are retained for data recovery and time travel (see [Data Recovery](/concepts/data-recovery)). The recommended minimum is at least 1 day, so you can recover your data if you accidentally drop or overwrite it. To see the historical retention and transient status of your databases, use the [`md_information_schema.databases`](/sql-reference/motherduck-sql-reference/md_information_schema/databases) view. Lite starts in free-tier mode with no historical retention until usage limits are reached, after which Lite defaults apply. ### Standard databases | Plan | Failsafe period | Default historical retention | Min historical retention | Max historical retention | |----------|-------------------------------------|------------------------------|------------------------------|------------------------------| | **Business** | 7 days | 7 days | 0 days | 90 days | | **Lite (paid)** | 7 days | 1 day | 1 day | 1 day | | **Lite (free)** | 7 days | 0 days | 0 days | 0 days | Historical retention enables point-in-time restore for your data. Business plan users can configure retention up to 90 days for extended data recovery capabilities. ### Transient databases For use cases that don't require the default failsafe retention period (7 days), a native storage database can be set as `TRANSIENT` [at database creation](/sql-reference/motherduck-sql-reference/create-database/#database-options) to enforce a 1 day failsafe minimum. This setting can only be defined at database creation and **is not** modifiable. 
| Plan | Failsafe period | Default historical retention | Min historical retention | Max historical retention | |----------|----------------------------------|--------------------------------------------------|--------------------------------------------------|--------------------------------------------------| | **Business** | 1 day | 1 day | 0 days | 90 days | | **Lite (paid)** | 1 day | 1 day | 1 day | 1 day | | **Lite (free)** | 1 day | 0 days | 0 days | 0 days | Transient databases enforce a 1-day minimum lifetime for data, which shows up in your bill as failsafe bytes. Transient databases can be helpful for the following datasets: * Datasets that are the intermediate output of a job (write once, read once) * Datasets that can be reconstructed from an external data source ### DuckLake databases [DuckLake](/concepts/ducklake) databases follow the same lifecycle stages as native storage databases (active, historical, failsafe, deleted), except there is no "retained for clone" stage since DuckLake does not support zero-copy cloning. | Setting | Fully managed DuckLake | BYOB DuckLake | |---------|----------------------|---------------| | **Failsafe period** | 7 days | 7 days | | **Default snapshot retention** | Infinite (`NULL`) | Infinite (`NULL`) | | **Auto maintenance** | Enabled by default | Disabled by default | | **Configurable retention** | Yes, with `SNAPSHOT_RETENTION_DAYS` | Yes, after enabling `AUTO_MAINTENANCE` | DuckLake storage optimization and snapshot expiration are handled by [auto maintenance](/concepts/ducklake#auto-maintenance) rather than the native storage garbage collector. When `SNAPSHOT_RETENTION_DAYS` is set to `NULL` (the default), snapshots are retained indefinitely. To configure snapshot retention for a DuckLake database: ```sql ALTER DATABASE my_ducklake SET SNAPSHOT_RETENTION_DAYS = 7; ``` For more details on DuckLake storage management, see the [DuckLake storage lifecycle](/concepts/ducklake#storage-lifecycle) section. 
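
To make the retention settings in this section concrete, the sketch below estimates how long deleted data can remain in storage (and on your bill) before it is removed. It is a rough approximation under stated assumptions: nothing else (shares, named snapshots, or zero-copy clones) still references the bytes, and the timing of the periodic reclassification job is ignored. The function name `days_until_deleted` is hypothetical.

```python
from typing import Optional


def days_until_deleted(snapshot_retention_days: Optional[int], failsafe_days: int) -> Optional[int]:
    """Estimate the days between deleting data and its removal from storage.

    Assumes the bytes spend the full historical retention period as
    historical bytes and then the full failsafe period as failsafe bytes.
    Returns None when snapshots are retained indefinitely (NULL retention).
    """
    if snapshot_retention_days is None:
        return None  # bytes stay historical until retention is configured
    return snapshot_retention_days + failsafe_days


print(days_until_deleted(7, 7))     # Business, standard database defaults: 14
print(days_until_deleted(1, 1))     # Business, transient database defaults: 2
print(days_until_deleted(0, 7))     # Lite (free), standard database: 7
print(days_until_deleted(None, 7))  # DuckLake default (NULL retention): None
```

Under these assumptions, lowering `SNAPSHOT_RETENTION_DAYS` or using a transient database shortens the period during which deleted data still counts toward storage.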
## Backup strategies If your data can't be recreated from source, plan an explicit backup strategy. Failsafe bytes are a last-resort recovery mechanism, not a backup plan: recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed. The storage lifecycle gives you several mechanisms that you can rely on for backups: - **Automatic snapshots** for time travel and short-term restore, retained as `historical_bytes` according to `SNAPSHOT_RETENTION_DAYS`. Retention defaults and limits depend on your plan (see [Standard databases](#standard-databases)). - **Named snapshots** (Business plan) for long-lived backups that persist until you explicitly remove them. See [database snapshots](/concepts/snapshots#2-named-snapshots) for details. - **Zero-copy clones** through [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) for isolated copies without duplicating storage costs. [Transient databases](#transient-databases) skip the default 7-day failsafe retention and are appropriate for data that can be recreated from a job or external source. For recovery procedures, see [data recovery](/concepts/data-recovery). ## Breaking down storage usage :::note Admin only Storage breakdown information is only available to users with the Admin role. ::: To understand your organization's storage bill, you have two entry points: Query the [`STORAGE_INFO` and `STORAGE_INFO_HISTORY` views](/sql-reference/motherduck-sql-reference/md_information_schema/storage_info) in [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction) for a breakdown by lifecycle stage, as either a current snapshot or up to 30 days of history. ```sql -- Get current storage information for all databases SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO; ``` Open the [databases page](https://app.motherduck.com/settings/databases) in settings to see total storage across all databases and a per-database breakdown. 
Click a row to view lifecycle stages for that database.

### _Active bytes_ are higher than expected

Consider whether you need all of the data stored in that database. Some common ways to decrease active bytes are to delete the data or optimize sorting and data types.

### _Historical bytes_ are higher than expected

Look into either outstanding manually updated shares in the organization that reference this database, or your historical database snapshots.

Outstanding manually updated shares may keep historical data referenced, which prevents it from being deleted. Your historical byte footprint will decrease as the shares are updated (`UPDATE SHARE`) or dropped. You can find all shares that reference a database by using the [OWNED_SHARES](/sql-reference/motherduck-sql-reference/md_information_schema/owned_shares) view in the [MD_INFORMATION_SCHEMA](/sql-reference/motherduck-sql-reference/md_information_schema/introduction).

Otherwise, consider reducing `SNAPSHOT_RETENTION_DAYS` on your database to reduce the number of historical snapshots you retain. Note that this also shortens the window of time from which you can restore data. See [data recovery](/concepts/data-recovery) for more details on how to plan and set up a data recovery protocol for your organization.

### _Retained for clone bytes_ are higher than expected

Consider whether there are other databases that were zero-copy cloned from this database and are still referencing deleted data. This footprint will decrease as you delete the cloned data from these other databases.

### _Failsafe bytes_ are higher than expected

Failsafe bytes result from deleting data. This footprint should drop if this was a one-time deletion of data. If failsafe bytes remain consistently high, it is likely that you are overwriting or updating data too frequently.
Common workloads that tend to delete a lot of data (through overwrites or updates) are: create or replace tables, truncate and insert, updates, and deletes. Avoiding these workload patterns can reduce your failsafe footprint. You can also consider using a [`TRANSIENT` database](#transient-databases), if it supports your use case, to reduce failsafe bytes to [1 day](https://motherduck.com/docs/concepts/storage-lifecycle/#transient-databases). If you need help understanding or reducing your storage bill, reach out to [MotherDuck support](https://motherduck.com/contact-us/support/). --- Source: https://motherduck.com/docs/getting-started/customer-facing-analytics --- sidebar_position: 3 title: Customer-Facing Analytics Overview sidebar_label: Customer-Facing Analytics description: Build customer-facing embedded analytics with MotherDuck. Per-user isolation, sub-second SQL dashboards, and white-label analytics for SaaS—no complex infrastructure needed. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Versions from '@site/src/components/Versions'; import CalloutBox from '@site/src/components/CalloutBox'; Customer-facing analytics (CFA), or embedded analytics, has requirements that traditional data architectures rarely meet. If you're building SaaS analytics dashboards, white-label reporting, or embedded data visualizations, CFA demands sub-second response times, per-customer isolation, and integration with operational applications — all while serving many concurrent end users. MotherDuck addresses these needs through two architectural capabilities: - **[Hypertenancy](#1-hypertenancy)**: Each customer gets their own dedicated DuckDB instance (Duckling), providing full compute isolation (so no resource contention between users), predictable performance, and the ability to scale resources independently based on individual customer needs. 
- **[Dual execution](#2-dual-execution-for-zero-latency-exploration)**: Enabled by DuckDB's lightweight architecture, queries can run both in the cloud and directly in the client's browser through WebAssembly, delivering near-instantaneous data exploration and filtering.

This guide explains how MotherDuck's architecture addresses the [core CFA challenges](#the-cfa-challenge) and provides [implementation patterns](#implementation-patterns) you can ship.

## What is customer-facing analytics?

**Customer-Facing Analytics (CFA)** embeds analytics directly into operational applications for external users (customers, partners, or end users) rather than internal stakeholders. Traditional BI targets internal teams, runs on batch-processed data models, serves a small number of users, and tolerates higher-latency queries.

| Dimension      | Traditional BI                  | Customer-Facing Analytics            |
| -------------- | ------------------------------- | ------------------------------------ |
| **Audience**   | Internal (analysts, executives) | External (customers, partners)       |
| **Delivery**   | BI tools (Tableau, Looker)      | Embedded in application              |
| **Latency**    | Seconds to minutes acceptable   | Milliseconds to low seconds required |
| **Scale**      | Dozens to hundreds of users     | Thousands to millions of users       |
| **Isolation**  | Shared warehouse                | Per-customer isolation needed        |
| **Tech Stack** | Python, BI tools                | JavaScript, embedded SDKs            |

"Customer-facing analytics" and "embedded analytics" are often used interchangeably. Both describe integrating analytical capabilities directly into a product instead of sending users to a separate BI tool. The difference is one of emphasis: customer-facing analytics focuses on the *audience* (your customers), while embedded analytics focuses on the *delivery* (built into your app). MotherDuck supports both.
### Common use cases - **SaaS analytics dashboards:** give customers self-serve analytics within your product, covering usage metrics, performance KPIs, and ROI reporting - **White-label analytics:** offer analytics under your brand that customers can explore without leaving your app - **Embedded dashboards:** drop interactive charts and tables directly into your application UI - **Multi-tenant reporting:** serve thousands of customers from one platform while keeping each tenant's data and compute isolated :::info **What about AI-driven analytics?** AI-driven analytics enables natural language interactions with data, allowing users to ask conversational questions like "What were our top-selling products last quarter?" and get immediate answers. MotherDuck's [hypertenancy](/concepts/hypertenancy) and dual execution make it well-suited for building AI-driven analytics solutions. Learn how to [build analytics agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/). ::: ## The CFA challenge Building customer-facing analytics systems presents three core challenges: ### Challenge 1: Technology stack mismatch For many applications, the data sits in a transactional database (OLTP database) like Postgres or MySQL. Engineers building CFA features often run analytical queries directly in a multi-tenant transactional database, which works until it fails at scale. Row-based storage and transactional databases are not designed for efficient analytical querying. ![Crying Database](./img/crying_db.webp) Operational applications often live in JavaScript/TypeScript, but traditional data tools are Python-centric. Operational teams work with OLTP databases built for transactions, while data teams use OLAP systems tuned for analytics but with their own challenges. Analytical workloads spike with user activity, while transactional loads need steady compute. ### Challenge 2: Latency requirements Users expect sub-second response times—typical for OLTP systems. 
Anything slower degrades the application experience. Distributed OLAP systems (BigQuery, Snowflake, Databricks) often have cold starts and coordination overhead that keep them above those targets, even for small datasets.

Teams often add caching layers or refresh pipelines between OLTP and OLAP. That adds complexity, introduces another failure point, and makes data less fresh.

### Challenge 3: Multi-tenancy at scale

Switching to an analytics engine is only the first step. Many legacy OLAP engines were designed for internal analytics and are provisioned as a single instance or cluster for all customer data, leading to downstream complexities:

![Legacy Data Warehouse](./img/legacy_data_warehouse.png)

- **Overprovisioning**: Resources sized for peak load sit idle most of the time
- **Noisy neighbors**: Large customers impact small customers
- **Resource contention**: Concurrency limits affect everyone
- **Unpredictable performance**: Query times vary based on load
- **Security concerns**: All customer data in one shared system

## Why MotherDuck for customer-facing analytics?

MotherDuck's architecture aligns with the requirements of Customer-Facing Analytics. Two architectural advantages set it apart:

### 1. Hypertenancy

MotherDuck provisions a Duckling (DuckDB instance) for each customer (or even for each customer's users). This [hypertenancy](/concepts/hypertenancy) model isolates customer data and delivers consistent DuckDB performance to each user.

![Happy Database](./img/happy_db.webp)

**Why single-node beats distributed compute clusters for CFA**

Traditional data warehouses use distributed computing with coordination overhead, data shuffling, and network latency. Even a fast query typically takes a second or more because of this overhead.
DuckDB and MotherDuck use single-node, optimized columnar execution:

- Zero network hops
- Zero coordination overhead
- Optimized vectorized execution

For CFA workloads that query one customer's data at a time, single-node execution is usually faster than distributed execution, and MotherDuck can reach **sub-second performance**.

#### Scaling analytics up and out

Each customer (and possibly each of their users) has their **own MotherDuck Duckling** (DuckDB instance). One account could run hundreds or thousands of Ducklings at a time, or none. This serverless model underpins MotherDuck's advantage versus other engines. MotherDuck's **cold start time is ~1 second**, and **per-second billing** (1-second minimum) keeps individual queries cost-efficient.

:::note
While MotherDuck supports provisioning one Duckling per user, start simpler. Begin with a single Duckling and introduce per-user isolation and dedicated [read scaling tokens](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) as your user base grows beyond 100 users or when tighter performance guarantees are needed.
:::

![MD Router](./img/md_router.svg)

This isolated Duckling approach with vertical scaling delivers:

- **Perfect isolation**: No noisy neighbors
- **Predictable performance**: Dedicated resources per customer
- **Cost efficiency**: Pay only for what each customer needs
- **Easy scaling**: Vertically scale individual Ducklings as needed

Scale vertically by upgrading (or downgrading) the Duckling size your application uses for each customer, giving more power to higher-priority customers. If you need more compute or higher concurrency, launch [read scaling Ducklings](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for compute-hungry customers. MotherDuck offers several [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for larger workloads.

For programmatic changes to user settings, refer to our [API docs](/sql-reference/rest-api/motherduck-rest-api/).
### White-label analytics

Many SaaS companies need analytics that look and feel native to their product. MotherDuck's architecture supports white-label analytics by design:

- **Per-customer isolation:** each tenant gets a dedicated Duckling, with no shared infrastructure leaking through
- **Flexible query layer:** use any frontend charting library (Recharts, D3, Observable Plot) with MotherDuck as the SQL backend
- **No vendor branding:** unlike embedded BI tools that surface their own UI, MotherDuck powers your queries behind the scenes
- **DuckDB-Wasm for client-side execution:** ship analytics that run entirely in the browser for maximum responsiveness

### 2. Dual execution for zero-latency exploration

As you build Customer-Facing Analytics into your product, you need sub-second response times so customers can explore their data quickly. Distributed data warehouses rarely meet that bar.

Because MotherDuck is built on DuckDB, you can connect from any DuckDB client. DuckDB is an in-process database, so it **can run on your server (3-tier) or directly in the client's browser through WebAssembly (1.5-tier)**. This enables "dual execution": combining local data and compute with cloud data and compute in a single query, giving you flexibility to optimize for performance and cost.
**Traditional approach has multiple network hops:**

```mermaid
flowchart LR
    subgraph Client Side
        User{{"USER"}}:::green
        Browser["CLIENT (Browser)"]
    end
    subgraph Server Side
        Server["SERVER"]:::watermelon
        Database[("DATABASE")]:::yellow
    end
    User --> Browser
    Browser --> Server
    Server --> Database
```

**DuckDB-Wasm enables client-side execution:**

```mermaid
flowchart LR
    subgraph Client Side
        User{{"USER"}}:::green
        subgraph Browser["CLIENT (Browser)"]
            LocalDB[("DATABASE")]:::database
        end
    end
    subgraph Server Side
        CloudDB[("DATABASE")]:::database
    end
    User --> Browser
    Browser --> CloudDB
```

Because the same DuckDB SQL engine runs on both MotherDuck Ducklings and on your customers' machines, you can offload data processing to their laptops and provide fast data exploration, filtering, and sorting using SQL. Customers do not need to install anything because DuckDB runs inside the web browser using WebAssembly (Wasm).

You can see this experience in [Column Explorer](/getting-started/interfaces/motherduck-quick-tour/) and [Instant SQL](https://motherduck.com/blog/introducing-instant-sql/) in the MotherDuck UI. Here's a teaser of it in action:

![Instant SQL](./img/fast_queries.gif)

## Implementation patterns

MotherDuck enables two distinct architectural patterns for customer-facing analytics:

### 3-tier architecture

**Best for:** Applications requiring server-side authorization, business logic, or deployments to stateful platforms.
**Typical web application architecture:**

```mermaid
flowchart LR
    Frontend["Browser (React Frontend)"]
    Backend["Application Server (Express / FastAPI)"]
    MotherDuck[("MotherDuck (Cloud Database)")]:::yellow
    Frontend -->|"API Requests"| Backend
    Backend -->|"Persistent Connection, SQL Queries"| MotherDuck
```

**Key Benefits:**

- Persistent database connection (connection pooling saves ~200ms per request)
- Fast query performance (~50-100ms)
- Server-side security and authorization
- Works with any DuckDB client (Node.js, Python, Go, Rust, Java)

**Performance optimizations:**

1. Intermediate tables: pre-aggregate data on MotherDuck for faster queries.
2. Prefer one well-structured SQL statement that returns all needed metrics (using `SELECT` with multiple aggregates, `CASE`/`FILTER`, or `UNION ALL`).
3. For multi-step workflows, wrap statements in a `BEGIN … COMMIT` transaction to ensure atomicity.
4. For data movement, use bulk operations (`COPY`, `INSERT … SELECT`) instead of many row-by-row calls.
5. Application caching: cache rarely-changing data on your server to avoid extra queries on MotherDuck.

**When to use:**

- You need server-side authorization and business logic
- You want a traditional, battle-tested architecture
- You're deploying to stateful services (Cloud Run, ECS, Kubernetes)
- Your team works with multiple languages

### 1.5-tier architecture (DuckDB-Wasm)

**Best for:** Read-heavy dashboards with `<1GB` data per user where you need maximum performance. This works well for embedded dashboards with interactive charts, tables, and filters that respond in under 10ms because queries execute locally in the user's browser.

**Architecture:**

```mermaid
flowchart LR
    Browser["Browser<br/>(React + MotherDuck Wasm SDK)"]
    MotherDuck[("MotherDuck<br/>(Cloud Database)")]:::yellow
    Browser -->|"Initial data fetch<br/>Query execution"| MotherDuck
```

**Key Benefits:**

- Sub-10ms query latency (queries run locally in browser)
- Near-zero server costs (just data transfer)
- Offline support after initial data load
- Infinite scalability (users provide compute)

**Performance optimizations:**

1. **Optimize Initial Load**: Use Parquet compression, limit to `<50MB`
2. **IndexedDB Persistence**: Data survives page reloads
3. **Incremental Sync**: Only fetch new data since last sync

**When to use:**

- Your dashboards are read-heavy, with frequent filtering and drilling
- You want `<10ms` query latency
- Data per user is `<1GB`
- You want to minimize server costs

:::info
WebAssembly applications using multi-threading (including DuckDB-Wasm) require cross-origin isolation. This means your page must be served with specific headers (`Cross-Origin-Embedder-Policy: require-corp` and `Cross-Origin-Opener-Policy: same-origin`), and resources from different origins must include a `Cross-Origin-Resource-Policy: cross-origin` header.

If you're building a new application, a dedicated page is easier to manage within these constraints. If you have existing dependencies (iframes, third-party scripts, etc.) and need to integrate analytics into an existing page, the 3-tier architecture is recommended.
:::

#### Hands-on example

See our [1.5-tier architecture example](https://github.com/motherduckdb/wasm-client/tree/main/examples/nypd-complaints) demonstrating best practices for building a 1.5-tier analytics application using TypeScript, React and the MotherDuck Wasm SDK.
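
For local experimentation, the cross-origin isolation requirement described in the note above can be sketched with a small static file server. This is a minimal sketch using only Python's standard library; it is not part of the MotherDuck Wasm SDK or the linked example, and the names `CrossOriginIsolatedHandler`, `make_server`, and `probe_headers` are hypothetical. A real deployment would more likely set these headers in its web server, framework, or CDN configuration.

```python
import http.client
import http.server
import threading


class CrossOriginIsolatedHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from the current directory with cross-origin isolation headers."""

    def end_headers(self):
        # The two page-level headers named in the note above.
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        super().end_headers()

    def log_message(self, format, *args):
        # Silence per-request logging to keep the sketch quiet.
        pass


def make_server(port=8000):
    """Build (but do not start) a local server; port 0 picks a free port."""
    return http.server.ThreadingHTTPServer(
        ("127.0.0.1", port), CrossOriginIsolatedHandler
    )


def probe_headers(server):
    """Run the server in the background, request "/", and return both headers."""
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        connection = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        connection.request("GET", "/")
        response = connection.getresponse()
        response.read()
        connection.close()
        return {
            "Cross-Origin-Embedder-Policy": response.getheader(
                "Cross-Origin-Embedder-Policy"
            ),
            "Cross-Origin-Opener-Policy": response.getheader(
                "Cross-Origin-Opener-Policy"
            ),
        }
    finally:
        server.shutdown()
        server.server_close()
```

Calling `make_server().serve_forever()` from a build output directory would serve that directory on `127.0.0.1:8000` with both headers attached. Resources loaded from other origins would still need their own `Cross-Origin-Resource-Policy` header, which this sketch does not address.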
### 3-Tier vs 1.5-Tier

| Factor                | 3-Tier              | 1.5-Tier (DuckDB-Wasm) |
| --------------------- | ------------------- | ---------------------- |
| **Query latency**     | ~50-100ms           | ~5-20ms ⚡             |
| **Server cost**       | $$ (per request)    | $ (data transfer only) |
| **Scalability**       | High (auto-scaling) | ♾️ Unlimited           |
| **Data per user**     | Any size            | `<1GB` optimal         |
| **Offline support**   | ❌ No               | ✅ Yes                 |
| **Server-side logic** | ✅ Yes              | ❌ Limited             |
| **Best for**          | Complex logic, auth | Read-heavy dashboards  |

### Additional resources

- [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/)
- [Read Scaling Ducklings](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- [Duckling Sizes](/about-motherduck/billing/duckling-sizes/)

## FAQ

### What is embedded analytics?

Embedded analytics means putting data visualizations, dashboards, and interactive reports directly inside a software application. Users explore data in the product they already use rather than switching to a separate BI tool. MotherDuck powers embedded analytics with sub-second SQL queries and per-user compute isolation.

### What is the difference between embedded analytics and traditional BI?

Traditional BI is built for internal teams using standalone tools like Tableau or Looker. Embedded analytics is for your external customers and lives inside your product. That difference creates harder technical requirements: you need lower latency, higher concurrency (potentially thousands of simultaneous users), and per-tenant data isolation. MotherDuck's Duckling architecture handles all three.

### What is white-label analytics?

White-label analytics lets you offer data analytics under your own brand. Your customers see dashboards that match your product's look and feel, with no third-party logos visible. MotherDuck supports this by providing a SQL query engine (DuckDB) that runs behind your UI, so there's no user-facing vendor footprint.
### How do you add analytics to a SaaS product? Two main approaches. In a 3-tier architecture, your server queries MotherDuck and returns results to the frontend. This works well when you have complex auth or business logic. In a 1.5-tier architecture, DuckDB runs directly in the browser through WebAssembly, which is a better fit for read-heavy dashboards where each user's data stays under 1GB. Both approaches give you fast query performance. ### What is multi-tenant analytics? Multi-tenant analytics means serving multiple customers from one shared platform while keeping each customer's data separate. MotherDuck works differently, through Hypertenancy — every tenant gets a dedicated DuckDB instance (a Duckling). This avoids noisy-neighbor problems and keeps performance predictable while maintaining data isolation between each customer. --- Source: https://motherduck.com/docs/getting-started/data-warehouse --- sidebar_position: 2 title: Data Warehousing Overview sidebar_label: Data Warehousing description: Learn to use MotherDuck as a Data Warehouse --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Versions from '@site/src/components/Versions'; import CalloutBox from '@site/src/components/CalloutBox'; ## Introduction to MotherDuck for data warehousing MotherDuck is a cloud-native data warehouse built on top of [DuckDB](https://duckdb.org/docs/sql/introduction) that adds enterprise features like cloud storage, sharing, and collaboration to DuckDB's fast analytical engine. The platform serves these needs through its serverless architecture, sharing model, and WASM capabilities. It benefits data analysts with AI-assisted SQL, data engineers with familiar tools like dbt, and data scientists with hybrid local-cloud processing. 
![img_duck_stack](img/bi_tool.svg) MotherDuck integrates with popular data tools including [Estuary](https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/), [Fivetran](https://fivetran.com/docs/destinations/motherduck#motherduck), and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) for data ingestion, [dbt](/docs/integrations/transformation/dbt) for transformations, [Tableau](/integrations/bi-tools/tableau/) and [PowerBI](/integrations/bi-tools/powerbi/) for visualization, and [Airflow](https://airflow.apache.org/docs/) and [Dagster](https://docs.dagster.io/integrations/libraries/duckdb/using-duckdb-with-dagster) for orchestration. This enables teams to build data warehousing solutions using their existing tools. ## Data ingestion An easy way to get into MotherDuck is using [ecosystem partners](/integrations/ingestion/) like [Estuary](https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/), [Fivetran](https://fivetran.com/docs/destinations/motherduck), [dlthub](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck), and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) but you can also create custom data engineering pipelines. MotherDuck is very flexible with how to load your data: - **From data you have on your filesystem:** If you have CSVs, JSON files or DuckDB databases sitting around, It's easy to load it into your MotherDuck data warehouse. - **From a data lake on a cloud object store:** If you already have your data in a data lake, as parquet, delta, iceberg or other formats, DuckDB has abstractions for Secrets, Object Storage, and many file types. When combined, this means that many file types can be read into DuckDB from Object Storage with only SQL. Though not as performant as MotherDuck's native storage layer, you can also query your infrequently-accessed data directly from your data lake with MotherDuck. 
- **Using Native APIs in many languages:** DuckDB supports numerous languages such as C++, Python, and Java, in addition to its own mostly Postgres-compatible SQL dialect. Using these languages, Data Engineers and Developers can integrate with MotherDuck without having to pick up yet-another-language. ### Best practices for programmatic loading The fastest way to load data is to load single tables in large batches, saturating the network connection between MotherDuck and the source data. DuckDB is incredibly good at handling both files and some kinds of in-memory objects, like Arrow dataframes. As an aside, Parquet files compress at 5-10x compared to CSV, which means you can get 5-10x more throughput by using Parquet files. Similarly, open table formats like Delta & Iceberg share those performance gains. On the other hand, small writes on multiple tables will lead to suboptimal performance. While MotherDuck does indeed offer [ACID compliance](https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid.html), it is not an OLTP system like Postgres! Significantly better performance can be achieved by using queues to batch writes to tables. While some latency is introduced with this methodology, the improvement in throughput should far outweigh the cost of doing small writes. Streaming workloads are better suited to be handled with queues in front of MotherDuck. ## Transforming data Once data is loaded into MotherDuck, it must be transformed into a model that matches the business purpose and needs. This can be done directly in MotherDuck using the powerful library of SQL functions offered by [DuckDB](https://duckdb.org/docs/sql/introduction.html). Many data engineers prefer to use data transformation tools like the open source [dbt Core](https://github.com/dbt-labs/dbt-core). More details specifically about using dbt with MotherDuck can be read in the [blog on this topic](https://motherduck.com/blog/duckdb-dbt-e2e-data-engineering-project-part-2/).
For more in-depth reading, the free **[DuckDB in Action eBook](https://motherduck.com/duckdb-book-brief/)** explores these concepts with real-world examples. ## Sharing data Once your data is loaded into MotherDuck and appropriately transformed for use by your analysts, you can make that data available using MotherDuck's [sharing capabilities](/key-tasks/sharing-data/sharing-overview/). This can allow every user in your organization to access the data warehouse in the MotherDuck UI, in their Python code or with other tools. Admins don't need to worry that the queries run by users will impact their data pipelines as users have isolated compute. ## Serving data analytics Do you want to serve reports or dashboards for your users? MotherDuck provides tokens that can be used with [popular tools](/integrations/bi-tools/) like Tableau & Power BI to access your data warehouse to serve business intelligence to end users. ### Ducks all the way down: Building data apps MotherDuck is built on DuckDB because it is an extremely efficient SQL engine inside a ~20MB executable. This lets you run the same DuckDB engine which powers your data warehouse inside your web browser, creating highly-interactive visualizations with near-zero latency. This enhances your experience when using the [Column Explorer](/getting-started/interfaces/motherduck-quick-tour/#column-explorer) in the MotherDuck UI. One thing that is unique to MotherDuck is its capabilities for serving data into the web layer through [WASM](/sql-reference/wasm-client). These capabilities enable novel analytical user actions, including very intensive queries that would be prohibitively expensive in other query engines. It also supports data mashup from various sources, so that data in the warehouse can be combined with other sources, like files in CSV, JSON, or Parquet. ## Scaling up & out for DWH use cases Furthermore, MotherDuck has a unique scaling model, of which there are four key concepts relevant for Data Warehousing. 
### Vertical scaling

Compute can scale up with larger DuckDB compute instances called Ducklings. MotherDuck offers 5 sizes: [Pulse, Standard, Jumbo, Mega, and Giga](/about-motherduck/billing/duckling-sizes/). Unlike other data warehouses, every Duckling (compute instance) is isolated from the others: one user's queries will not prevent another user's queries from completing. This [hypertenancy](/concepts/hypertenancy) model lets you size your warehouse correctly and use your resources efficiently.

### Horizontal scaling

For serving data to BI tools or other spiky consumers, [Read Scaling Replicas](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) can absorb the load and maintain low latency for interactive use. These should be owned by the same user or service accounts that run production jobs, although they can also leverage [`SHARES`](/key-tasks/sharing-data/sharing-overview/) depending on preferences.

### Hypertenancy

Especially for production runs, use separate user accounts or [service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) with dedicated compute for updating and maintaining core tables.

### Distributed DuckDB

DuckDB and MotherDuck work together as a distributed system that automatically optimizes query execution between local and cloud resources through Dual Execution, enabling efficient data access regardless of location.

## Orchestration

To keep data up to date inside MotherDuck, you can use an orchestrator like [Airflow](https://airflow.apache.org/) or [Dagster](https://dagster.io/). An orchestrator runs jobs in a specific order to load and transform data, and also manages workflow and observability, which is necessary for handling more complex data engineering pipelines. If this is your first data warehouse, you might consider starting with something as simple as [GitHub actions](https://github.com/features/actions) or cron jobs to orchestrate your data pipelines.
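
The core idea of running jobs in dependency order can be sketched in a few lines. This is only an illustration using Python's standard-library `graphlib`, not how Airflow or Dagster are implemented; the job names in `PIPELINE` and the `run_pipeline` helper are hypothetical, and real orchestrators add scheduling, retries, and observability on top.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each job maps to the set of jobs that must finish first.
PIPELINE = {
    "load_orders": set(),
    "load_customers": set(),
    "transform_orders": {"load_orders", "load_customers"},
    "publish_dashboard_tables": {"transform_orders"},
}


def run_pipeline(jobs, dependencies):
    """Run every job once, only after all of its dependencies have run.

    `jobs` maps a job name to a zero-argument callable (for example, a
    function that runs a load or transform statement against the warehouse).
    Returns the job names in the order they were executed.
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        jobs[name]()
        completed.append(name)
    return completed
```

A cron job or GitHub Actions workflow that calls a script like this once per schedule is often enough for a first pipeline; a dedicated orchestrator becomes more useful once jobs need retries, backfills, or monitoring.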
:::info For a more in-depth guide, check out the [Data Warehousing Guide](/key-tasks/data-warehousing/) ::: --- Source: https://motherduck.com/docs/getting-started/e2e-tutorial/e2e-tutorial --- title: "MotherDuck Tutorial" sidebar_label: "MotherDuck Tutorial" description: "Complete end-to-end tutorial to get started with MotherDuck and DuckDB" --- import SignUpLink from '@site/src/components/SignUpLink'; # MotherDuck tutorial This comprehensive guide will take you from your first query to sharing databases with your team. ## What you'll learn This tutorial is in 3 parts, you'll discover how to: - 🔍 **[1. Query shared data](./part-1)** - Run your first SQL queries on publicly available datasets - 📊 **[2. Load your own data](./part-2)** - Upload and work with your own data from files and datasets - 🤝 **[3. Share databases](./part-3)** - Collaborate by sharing databases with team members :::tip Each part of this tutorial builds on the previous one, but you can also jump to specific sections if you're looking to learn particular features. ::: ## Prerequisites To follow this tutorial, you'll need: - A **MotherDuck account** (sign up for free) - Basic **SQL knowledge** (we'll guide you through the queries) - You have several ways to run the queries: * Execute them directly on this documentation website 🪄 * Use the [MotherDuck UI](https://app.motherduck.com) for the full interface experience * Connect with any [DuckDB client](../interfaces/)(Python, Java, DuckDB CLI) of your choice **⏱️ Estimated time:** 20-30 minutes for the complete tutorial Let's get started! 
🚀 --- Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-1 --- sidebar_position: 1 title: "1 - Running Your First Query" sidebar_label: "1 - First Query" description: "Learn MotherDuck and DuckDB by running your first queries on shared data" --- import MotherDuckSQLEditor from '@site/src/components/MotherDuckSQLEditor'; import Versions from '@site/src/components/Versions'; import SignUpLink from '@site/src/components/SignUpLink'; In this multi-part tutorial, you will go through a full end-to-end example on how to use MotherDuck and DuckDB, **push** and **share** data, take advantage of **hybrid query** execution and query data using SQL through the **MotherDuck UI** or **DuckDB CLI**. :::note MotherDuck supports DuckDB . In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions through . ::: ## Running your first query ### Query from a shared database Before playing with the dataset we just downloaded, let's run a couple simple queries on the shared sample database. This database contains a series of MotherDuck's public datasets and it's *auto-attached* for each user, meaning it's accessible directly within your MotherDuck session without any additional setup. We will query the NYC 311 dataset first. This dataset contains over thirty million complaints citizens have filed with the New York City government. We'll select several columns and look at the complaints filed over a few days to demonstrate the [Column Explorer](https://motherduck.com/blog/introducing-column-explorer/) feature of the MotherDuck UI. Want to explore the full interface? Try running this query in the MotherDuck UI to experience the complete dashboard, visual query builder, and advanced analytics features. 
:::info In the MotherDuck UI, the Column Explorer provides quick visual summaries of your data, helping you understand distributions and patterns at a glance. ![Column Explorer showing data distribution summaries in the MotherDuck UI](./img/demo_ui_column_explorer.png) ::: For the remainder of this tutorial, we'll focus on the NYC taxi data and perform aggregation queries representative of the types of queries often performed in analytics databases. We will first get the average fare based on the number of passengers. The source dataset covers data for the whole month of November 2022. :::info The `sample_data` database is auto-attached, but for any other shared database you would like to read, you need to use the `ATTACH` statement. Read more about [querying shared MotherDuck databases](/key-tasks/sharing-data/sharing-data.mdx). ::: :::tip **Using a DuckDB client?** You can run these same queries in any DuckDB client after connecting with `ATTACH 'md:';` - you'll be prompted to authenticate if no `motherduck_token` is found as an environment variable. ::: ### Query from S3 Our shared sample database is great to play with, but you probably want to use your own data on AWS S3. Let's see how to do that. The sample database source data is actually available on our public AWS S3 bucket. Let's run the exact same query, but instead of pointing to a MotherDuck table, we will point to a Parquet file on S3. For a secured bucket, we need to pass the AWS credentials - check [authenticating to S3](../../integrations/cloud-storage/amazon-s3.mdx) for more information. Here's the updated query while reading from S3: :::info DuckDB automatically detects the appropriate reader based on file extension, so there’s no need to explicitly specify a function.
However, if you need more control over how files are read, you can use the corresponding functions directly: ```sql SELECT * FROM read_parquet('my_data.parquet'); SELECT * FROM read_csv_auto('my_data.csv'); SELECT * FROM read_json_auto('my_data.json'); ``` These functions allow you to customize parsing behavior or override automatic detection when needed. ::: ## Next steps Great! You've successfully run your first queries on MotherDuck. You've learned how to: ✅ Query shared databases like `sample_data` ✅ Read data directly from S3 👉 **[Continue to Part 2: Loading Your Dataset →](../part-2)** --- Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-2 --- sidebar_position: 2 title: "2 - Loading Your Data" sidebar_label: "2 - Loading Data" description: "Learn how to load your own datasets into MotherDuck" --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import MotherDuckSQLEditor from '@site/src/components/MotherDuckSQLEditor'; In this section, you'll learn how to load your own data into MotherDuck and run powerful hybrid queries that combine local and cloud data. 
👈 **[Go back to Part 1: Running Your First Query](../part-1)** ## Loading your data ### Loading data using CREATE TABLE AS SELECT The `CREATE TABLE AS SELECT` (CTAS) pattern creates a new table and populates it with data in a single operation: ```sql CREATE OR REPLACE TABLE docs_playground.my_table AS SELECT * FROM 'my_data.csv'; ``` ### Loading data using INSERT INTO The `INSERT INTO` pattern lets you append data to existing tables, update specific records, and manage data incrementally: ```sql -- First, create the table structure CREATE TABLE docs_playground.my_table AS SELECT * FROM 'my_data.csv' LIMIT 0; -- Then load data incrementally INSERT INTO docs_playground.my_table SELECT * FROM 'new_data.csv'; INSERT OR REPLACE INTO docs_playground.my_table SELECT * FROM 'updated_data.csv'; ``` :::tip While `CREATE TABLE AS SELECT` is convenient for one-time loads or small datasets, for larger datasets and production workflows, we recommend using `INSERT INTO`. This approach provides better control over data loading, allows for incremental updates, and is more efficient for ongoing data management. ::: There are several ways to get your data into MotherDuck, depending on where your data lives: ### From local file system To load data files from your file system into MotherDuck, you'll need: 1. A valid MotherDuck token stored as the `motherduck_token` environment variable 2. A DuckDB client (DuckDB CLI, Python, etc.) To create a MotherDuck token, navigate to the MotherDuck UI, click your organization name in the top left, then go to **Settings > Integrations > Access Token**. For detailed instructions, see our [authentication guide](../../key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md). Install the DuckDB CLI for macOS/Linux. For other operating systems, see the [DuckDB installation guide](https://duckdb.org/docs/installation/). 
```bash curl -s https://install.motherduck.com | sh ``` Launch the DuckDB CLI: ```bash duckdb ``` ```sql -- Connect to MotherDuck ATTACH 'md:'; -- Load CSV data from your local file into the playground database CREATE TABLE docs_playground.popular_currency_rate_dollar AS SELECT * FROM './popular_currency_rate_dollar.csv'; ``` Install DuckDB using your preferred package manager, such as pip: ```bash pip install duckdb ``` ```python import duckdb # Connect to MotherDuck conn = duckdb.connect('md:') # Load data into the playground database (automatically created) conn.execute(""" CREATE TABLE docs_playground.popular_currency_rate_dollar AS SELECT * FROM './popular_currency_rate_dollar.csv' """) ``` Head over to the `Add data` button in the MotherDuck UI and upload your file directly. This works great for smaller files and provides a visual interface. ![Add file](./img/screenshot_add_data.png) ![load data](./img/screenshot_loading_data2.png) ### From remote storage (S3, GCS, etc.) For data already stored in cloud storage, you have multiple options: You can load public remote data into your playground database using our interactive SQL editor: ```sql ATTACH 'md:'; CREATE TABLE docs_playground.popular_currency_rate_dollar AS SELECT * FROM 's3://us-prd-motherduck-open-datasets/misc/csv/popular_currency_rate_dollar.csv'; ``` ```python import duckdb conn = duckdb.connect('md:') conn.execute(""" CREATE TABLE docs_playground.popular_currency_rate_dollar AS SELECT * FROM 's3://your-bucket/your-file.csv' """) ``` 1. In the left panel of the UI, click **Add data** 2. Select **From cloud storage** 3. For a publicly accessible bucket, skip creating a secret 4. Switch to **Wildcard** mode, and enter the S3 path `s3://us-prd-motherduck-open-datasets/**/popular_currency_rate_dollar.csv` 5. Name the table `popular_currency_rate_dollar` and select `docs_playground` as the destination database 6. 
Click **Create table** ![Create table from S3](./img/screenshot_ui_create_table_from_s3.png) For more details, see [Loading Data from Cloud Storage](../../key-tasks/loading-data-into-motherduck/loading-data-from-cloud-or-https.md). :::info For private AWS s3 buckets, you'll need to configure AWS credentials. Check our [AWS s3 authentication guide](../../integrations/cloud-storage/amazon-s3.mdx) for details. ::: ### Querying your data Once your data is loaded, you can query it from any interface: ```sql ATTACH 'md:'; FROM docs_playground.popular_currency_rate_dollar LIMIT 10; ``` ```python import duckdb # Connect to MotherDuck conn = duckdb.connect('md:') # Query your data result = conn.sql("FROM docs_playground.popular_currency_rate_dollar LIMIT 10").fetchall() print(result) ``` 👉 **[Continue to Part 3: Sharing Your Database →](../part-3)** --- Source: https://motherduck.com/docs/getting-started/e2e-tutorial/part-3 --- sidebar_position: 3 title: "3 - Sharing Your Database" sidebar_label: "3 - Sharing Data" description: "Learn how to share your databases and collaborate with your team" --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import MotherDuckSQLEditor from '@site/src/components/MotherDuckSQLEditor'; In this section, you'll learn how to share your databases with colleagues and collaborate effectively using MotherDuck's sharing features. 👈 **[Go back to Part 2: Loading Your Dataset](../part-2)** ## Creating and sharing your data Let's create a table with sample data in your playground database, then share it with others. The `docs_playground` database is automatically created when you connect, so you can start experimenting right away! First, let's populate your playground database with some currency exchange data: ## Sharing your database With your database and sample data in place, you can share this dataset with others. 
MotherDuck shares create a point-in-time snapshot of your database that can be accessed by specified users or groups. When creating a share, the most important parameters control **access scope**, **visibility**, and **update behavior**. By default, shares use `ACCESS ORGANIZATION` (only your organization members can access), `VISIBILITY DISCOVERABLE` (appears in your organization's shared database list), and `UPDATE MANUAL` (creates a static snapshot that doesn't auto-update). The syntax to create a share visible to everyone in your Organization is `CREATE SHARE from `. You can also create shares through the MotherDuck UI by clicking the dropdown menu next to your database and selecting the share option. This will open a window to configure your share settings. ![share 1](./img/screenshot_tutorial_share_1_2.png) ![share 2](./img/screenshot_tutorial_share_2_2.png) Once created, all members of your organization will be able to view this share in the MotherDuck UI under "Shared with me". Learn more about [sharing in MotherDuck](../../key-tasks/sharing-data/sharing-within-org.md). ## Understanding share configuration When creating shares, you can control three key aspects: **who can access** the data, **how users discover** the share, and **when the data updates**. Each parameter has specific options that determine the sharing behavior. 
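
To make the three defaults concrete, here is a minimal sketch that assembles a `CREATE SHARE` statement with every option written out. The helper name `build_create_share`, the share name `currency_share`, and the parenthesized, comma-separated option list are assumptions made for illustration; confirm the exact syntax against the MotherDuck SQL reference before relying on it.

```python
# Option values as documented in this tutorial.
ACCESS_OPTIONS = ("ORGANIZATION", "UNRESTRICTED", "RESTRICTED")
VISIBILITY_OPTIONS = ("DISCOVERABLE", "HIDDEN")
UPDATE_OPTIONS = ("MANUAL", "AUTOMATIC")


def build_create_share(share_name, database_name,
                       access="ORGANIZATION",
                       visibility="DISCOVERABLE",
                       update="MANUAL"):
    """Return a CREATE SHARE statement with every option spelled out.

    Hypothetical helper: the option-list syntax is an assumption, and
    identifiers are inserted unquoted, so only simple names are supported.
    """
    for label, value, allowed in (
        ("ACCESS", access, ACCESS_OPTIONS),
        ("VISIBILITY", visibility, VISIBILITY_OPTIONS),
        ("UPDATE", update, UPDATE_OPTIONS),
    ):
        if value not in allowed:
            raise ValueError(f"{label} must be one of {allowed}, got {value!r}")
    return (
        f"CREATE SHARE {share_name} FROM {database_name} "
        f"(ACCESS {access}, VISIBILITY {visibility}, UPDATE {update});"
    )


# With no overrides, the statement carries the three documented defaults.
print(build_create_share("currency_share", "docs_playground"))
```

The helper only builds text. Passing the returned string to `con.sql(...)` on a MotherDuck connection is what would actually create the share.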
### ACCESS - who can access the share - **`ACCESS ORGANIZATION`** (default): Only members of your organization can access the share - **`ACCESS UNRESTRICTED`**: All MotherDuck users in the same cloud region as your Organization can access the share - **`ACCESS RESTRICTED`**: Only the share owner has initial access; additional users must be granted access through `GRANT` commands ### VISIBILITY - how users discover the share - **`VISIBILITY DISCOVERABLE`** (default): The share appears in your organization's "Shared with me" section for easy discovery - **`VISIBILITY HIDDEN`**: Share can only be accessed through a direct URL; not listed in any user interface :::info Important Visibility Rules - Organization and Restricted shares default to `DISCOVERABLE` - Unrestricted shares can only be `HIDDEN` - Hidden shares can only be used with `ACCESS RESTRICTED` ::: ### UPDATE - when share data updates - **`UPDATE MANUAL`** (default): Share content only updates when you run `UPDATE SHARE` command - **`UPDATE AUTOMATIC`**: Share automatically reflects database changes within ~5 minutes ### Example share configurations ## Querying shared data After creating a share, authorized users can access the shared database in two ways: by using the share URL directly or by attaching it as a database alias: ```sql -- Attach a shared database ATTACH 'md:_share/docs_playground/b556630d-74f1-435c-9459-cfb87d349cb3' AS shared_currency; -- Query the shared data SELECT * FROM shared_currency.currency_rates WHERE rate_to_usd < 1.0 ORDER BY rate_to_usd DESC; ``` ## Managing Shares You can also manage your existing shares: ## Going further Now that you've mastered the basics, here are some next steps to explore: - Learn about [MotherDuck's Dual Execution](/key-tasks/running-hybrid-queries/) feature - Connect to your favorite BI tools: [Tableau](../../integrations/bi-tools/tableau/index.mdx), [Power BI](../../integrations/bi-tools/powerbi/index.mdx) and learn more about [read 
scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) - Set up data pipelines with [dbt](../../integrations/transformation/dbt.md) - Look at our [supported integrations](/integrations) to integrate with your data stack. --- Source: https://motherduck.com/docs/getting-started/getting-started --- title: Getting Started sidebar_class_name: getting-started-icon description: Getting started with MotherDuck serverless cloud data warehouse. --- import IconGrid from '@site/src/components/IconGrid'; import HorizontalLayout from '@site/src/components/HorizontalLayout'; import HorizontalDivider from '@site/src/components/HorizontalDivider'; import VideoPlayer from '@site/src/components/VideoPlayer'; import CalloutBox from '@site/src/components/CalloutBox'; import styles from './getting-started.module.css'; # MotherDuck documentation
MotherDuck is a serverless cloud data warehouse built on DuckDB. It's designed for fast, interactive SQL analytics without the infrastructure overhead. Build a modern data warehouse for BI, or power customer-facing analytics in your app. Develop and iterate locally, share and scale in the cloud when you need it.
--- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/index --- title: Client APIs description: Client APIs for MotherDuck --- # Client APIs MotherDuck works with all DuckDB client APIs. Choose your preferred language or driver below. ## Included pages - [Python](https://motherduck.com/docs/getting-started/interfaces/client-apis/python): Connect and query MotherDuck from Python - [Other Client APIs](https://motherduck.com/docs/getting-started/interfaces/client-apis/other): Other DuckDB client APIs that work with MotherDuck --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/c --- title: C description: MotherDuck + C sidebar_position: 1 --- The MotherDuck integration with C is no different than DuckDB. For more information, see [C](https://duckdb.org/docs/stable/clients/c/overview.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/go --- title: Go description: MotherDuck + GoLang sidebar_position: 2 --- The MotherDuck integration with Go is no different than DuckDB. For more information, see [Go](https://duckdb.org/docs/stable/clients/go.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/index --- title: Other Client APIs description: Other DuckDB client APIs that work with MotherDuck --- # Other Client APIs DuckDB supports various client APIs that work seamlessly with MotherDuck. For the complete list of client APIs, see the [DuckDB Documentation](https://duckdb.org/docs/stable/clients/overview.html). 
## Included pages - [C](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/c): MotherDuck + C - [Go](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/go): MotherDuck + GoLang - [Java (JDBC)](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/java): MotherDuck + Java - [Node.js](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/nodejs): MotherDuck + Node.js - [ODBC](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/odbc): MotherDuck + ODBC - [R](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/r): MotherDuck + R - [Rust](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/rust): MotherDuck + Rust - [WebAssembly (Wasm)](https://motherduck.com/docs/getting-started/interfaces/client-apis/other/wasm): MotherDuck + WebAssembly --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/java --- title: Java (JDBC) description: MotherDuck + Java sidebar_position: 3 --- The MotherDuck integration with Java is no different than DuckDB. For more information, see [Java](https://duckdb.org/docs/stable/clients/java.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/nodejs --- title: Node.js description: MotherDuck + Node.js sidebar_position: 4 --- The MotherDuck integration with Node.js uses the `@duckdb/node-api` package. For more information, see [Node.js (Neo)](https://duckdb.org/docs/stable/clients/node_neo/overview.html) in DuckDB Documentation. This package replaces the deprecated `duckdb` npm package. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/odbc --- title: ODBC description: MotherDuck + ODBC sidebar_position: 5 --- The MotherDuck integration with ODBC is no different than DuckDB. 
For more information, see [ODBC](https://duckdb.org/docs/stable/clients/odbc/overview.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/r --- title: R description: MotherDuck + R sidebar_position: 6 --- The MotherDuck integration with R is no different than DuckDB. For more information, see [R](https://duckdb.org/docs/stable/clients/r.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/rust --- title: Rust description: MotherDuck + Rust sidebar_position: 7 --- The MotherDuck integration with Rust is no different than DuckDB. For more information, see [Rust](https://duckdb.org/docs/stable/clients/rust.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/other/wasm --- title: WebAssembly (Wasm) description: MotherDuck + WebAssembly sidebar_position: 8 --- MotherDuck offers its own fork of DuckDB Wasm, which is [documented here](/sql-reference/wasm-client/). For more information about DuckDB Wasm, see [WebAssembly](https://duckdb.org/docs/stable/clients/wasm/overview.html) in DuckDB Documentation. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/choose-database --- sidebar_position: 2 title: Specify MotherDuck database description: Specify MotherDuck database --- When you connect to MotherDuck you can specify a database name or omit the database name and connect to the default database. - If you use `md:` without a database name, you connect to a default MotherDuck database called `my_db`. - If you use `md:`, you connect to the `` database. After you establish the connection, either the default database or the one you specify becomes the current database. You can run the `USE` command to switch the current database, as shown in the following example.
```python
# list the current database (assumes `con` is an open MotherDuck connection)
con.sql("SELECT current_database()").show()
# ('database1')

# switch the current database to database2
con.sql("USE database2")
```

To query a table in the current database, you can specify just the table name. To query a table in a different database, you can include the database name when you specify the table. You don't need to switch the current database. The following examples demonstrate each method.

```python
# querying a table in the current database
con.sql("SELECT count(*) FROM mytable").show()

# querying a table in another database
con.sql("SELECT count(*) FROM another_db.another_table").show()
```

--- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/index --- title: Python description: Connect and query MotherDuck from Python ---

# Python

Learn how to connect to MotherDuck and query your data using Python.

## Included pages

- [DuckDB Python installation and authentication](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/installation-authentication): How to install DuckDB and connect to MotherDuck
- [Specify MotherDuck database](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/choose-database): Specify MotherDuck database
- [Loading data into MotherDuck with Python](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/loading-data-into-md): Load CSV, Parquet, and JSON files into MotherDuck from local, S3, or HTTPS sources using Python.
- [Query data](https://motherduck.com/docs/getting-started/interfaces/client-apis/python/query-data): Execute SQL queries against MotherDuck using Python with hybrid local and cloud execution.
--- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/installation-authentication --- sidebar_position: 1 title: "DuckDB Python installation and authentication" sidebar_label: Installation & authentication description: How to install DuckDB and connect to MotherDuck hide_title: true --- import Versions, { duckdbVersionRanges } from '@site/src/components/Versions'; # Installation & authentication ## Prerequisites MotherDuck Python supports the following operating systems: - Linux (x64, glibc v2.31+, equivalent to ubuntu v20.04+) - Mac OSX 11+ (M1/ARM or x64) - Python 3.4 or later Please let us know if your configuration is unsupported. ## Installing DuckDB :::note MotherDuck supports DuckDB . In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions through . ::: Use the following `pip` command to install the supported version of DuckDB:

{`pip install duckdb==${ duckdbVersionRanges["us-east-1"].max }`}
## Connect to MotherDuck

You can connect to and work with multiple local and MotherDuck-hosted DuckDB databases at the same time. The connection syntax varies depending on how you’re opening local DuckDB and MotherDuck.

### Authenticating to MotherDuck

You can authenticate to MotherDuck using either browser-based authentication or an access token. Here are examples of both methods:

#### Using browser-based authentication

```python
import duckdb

# connect to MotherDuck using 'md:' or 'motherduck:'
con = duckdb.connect('md:')
```

When you run this code:

1. A URL and a code will be displayed in your terminal.
2. Your default web browser will automatically open to the URL.
3. You'll see a confirmation request to approve the connection.
4. Once approved, if you're not already logged in to MotherDuck, you'll be prompted to do so.
5. Finally, you can close the browser tab and return to your Python environment.

This method is convenient for interactive sessions and doesn't require managing access tokens.

#### Using an access token

For automated scripts or environments where browser-based auth isn't suitable, you can use an access token:

```python
import duckdb

# Initiate a MotherDuck connection using an access token
con = duckdb.connect('md:?motherduck_token=')
```

Replace `` with an actual token generated from the MotherDuck UI.

To learn more about creating and managing access tokens, as well as other authentication options, see our guide on [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md).

### Connecting to MotherDuck

Once you've authenticated, you can connect to MotherDuck and start working with your data. Let's look at a few common scenarios.
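
Before those scenarios, the two authentication methods above can be folded into one small helper. This is a minimal sketch, not part of the DuckDB or MotherDuck API: `motherduck_connection_string` is a hypothetical name, and it assumes the token is appended with the same `?motherduck_token=` query parameter shown in the access-token example.

```python
def motherduck_connection_string(database="", token=None):
    """Build an 'md:' connection string for duckdb.connect().

    Hypothetical helper. With no token, the plain 'md:' form is returned,
    which leaves authentication to the browser-based flow. With a token,
    it is appended as the `motherduck_token` query parameter.
    """
    base = f"md:{database}"
    if token:
        return f"{base}?motherduck_token={token}"
    return base


# Usage (needs the duckdb package and network access, so it is left commented):
#   import duckdb
#   con = duckdb.connect(motherduck_connection_string("my_db"))
print(motherduck_connection_string("my_db"))  # prints: md:my_db
```

Keeping the token out of source code and passing it in at runtime, for example from a secret store, avoids committing credentials alongside the script.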
#### Connecting directly to MotherDuck Here's how to connect to MotherDuck and run a simple query: ```python import duckdb # Connect to MotherDuck via browser-based authentication con = duckdb.connect('md:my_db') # Run a query to verify the connection con.sql("SHOW DATABASES").show() ``` :::tip When connecting to MotherDuck, you need to specify a database name (like `my_db` in the example). If you're a new user, a default database called `my_db` is automatically created when your account is first set up. You can query any table in your connected database by just using its name. To switch databases, use the `USE` command. ::: #### Working with both MotherDuck and local databases MotherDuck lets you work with both cloud and local databases simultaneously. Here's how: ````python import duckdb # Connect to MotherDuck first, specifying a database con = duckdb.connect('md:my_db') # Then attach local DuckDB databases con.sql("ATTACH 'local_database1.duckdb'") con.sql("ATTACH 'local_database2.duckdb'") # List all connected databases con.sql("SHOW DATABASES").show() ```` #### Adding MotherDuck to an existing local connection If you're already working with a local DuckDB database, you can add a MotherDuck connection: ````python import duckdb # Start with a local DuckDB database local_con = duckdb.connect('local_database.duckdb') # Add a MotherDuck connection, specifying a database local_con.sql("ATTACH 'md:my_db'") ```` This is another approach to give you the flexibility to work with both local and cloud data in the same session. --- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/loading-data-into-md --- sidebar_position: 3 title: Loading data into MotherDuck with Python sidebar_label: Loading data into MotherDuck description: Load CSV, Parquet, and JSON files into MotherDuck from local, S3, or HTTPS sources using Python. 
---

## Copying a table from a local DuckDB database into MotherDuck

You can use `CREATE TABLE AS SELECT` to load CSV, Parquet, and JSON files into MotherDuck from local, Amazon S3, or HTTPS sources, as shown in the following examples.

```python
# load from local machine into table mytable of the current/active database
con.sql("CREATE TABLE mytable AS SELECT * FROM '~/filepath.csv'")

# load from an S3 bucket into table mytable of the current/active database
con.sql("CREATE TABLE mytable AS SELECT * FROM 's3://bucket/path/*.parquet'")
```

If the source data matches the table’s schema exactly, you can also use `INSERT INTO ... SELECT` to append data, as shown in the following example.

```python
# append to table mytable in the currently selected database from S3
con.sql("INSERT INTO mytable SELECT * FROM 's3://bucket/path/*.parquet'")
```

:::tip
Use `INSERT INTO ... SELECT` to load data from files as shown above. Do not use single-row `INSERT INTO ... VALUES` statements in a loop: this is significantly slower because each statement incurs separate network overhead. See [Loading data best practices](/key-tasks/loading-data-into-motherduck/considerations-for-loading-data/) for more detail.
:::

## Copying an entire local DuckDB database to MotherDuck

MotherDuck supports copying your opened DuckDB database into a MotherDuck database. The following example copies a local DuckDB database named `localdb` into a MotherDuck-hosted database named `clouddb`.

```python
import duckdb

# open the local db
local_con = duckdb.connect("localdb.ddb")

# connect to MotherDuck
local_con.sql("ATTACH 'md:'")

# The FROM clause indicates the database to upload;
# CURRENT_DATABASE() refers to the database that is currently open
local_con.sql("CREATE DATABASE clouddb FROM CURRENT_DATABASE()")
```

A local DuckDB database can also be copied by its file path:

```python
import duckdb

local_con = duckdb.connect("md:")
local_con.sql("CREATE DATABASE clouddb FROM 'localdb.ddb'")
```

See [Loading Data into MotherDuck](/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck.mdx) for more detail.

--- Source: https://motherduck.com/docs/getting-started/interfaces/client-apis/python/query-data --- sidebar_position: 4 title: Query data description: Execute SQL queries against MotherDuck using Python with hybrid local and cloud execution. ---

For more information about database manipulation, see [MotherDuck SQL reference](/docs/sql-reference/motherduck-sql-reference/).

MotherDuck uses DuckDB under the hood, so nearly all [DuckDB SQL](https://duckdb.org/docs/) works in MotherDuck without differences.

MotherDuck leverages "hybrid execution" to decide the best location to execute queries, including across multiple locations. For example, if your data lives on your laptop, MotherDuck executes queries against that data on your laptop. Similarly, if you are joining data on your laptop to data on Amazon S3, MotherDuck executes each part of the query where data lives before bringing it together to be joined locally.

## Querying data in MotherDuck

You can query data loaded into MotherDuck the same way you query data in your DuckDB databases. MotherDuck executes these queries using resources in the cloud.

```python
# table table_name is in MotherDuck storage
con.sql("SELECT * FROM table_name").show()
```

## Querying data on your machine

You can use MotherDuck to query files on your local machine. These queries execute using your machine's resources.
```python
# query a Parquet file on your local machine
con.sql("SELECT * FROM '~/file.parquet'").show()

# query a table in a local DuckDB database
con.sql("SELECT * FROM local_table").show()
```

## Joining data across multiple locations

You can use MotherDuck to join data:

- In MotherDuck
- On S3 or other cloud object stores (Azure, GCS, R2, etc.)
- On your local machine

## What's next?

Ready to share your DuckDB data with your colleagues? Read up on [Sharing In MotherDuck](/key-tasks/sharing-data/sharing-data.mdx).

--- Source: https://motherduck.com/docs/getting-started/interfaces/connect-query-from-duckdb-cli --- sidebar_position: 3 title: "Install and connect with the DuckDB CLI" sidebar_label: DuckDB CLI description: Learn to connect and query databases using MotherDuck from the DuckDB CLI hide_title: true ---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import DownloadLink from '@site/src/components/DownloadLink';
import OSTabLabel from '@site/src/components/OSTabLabel';
import Versions from '@site/src/components/Versions';
import DuckDBCliTerminalDemo from '@site/src/components/DuckDBCliTerminalDemo';

# DuckDB CLI

## Installation

:::note
MotherDuck supports DuckDB . In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions through .
:::

Download and install the DuckDB binary, depending on your operating system.

1. Download the 64-bit Windows binary
2. Extract the zip file.

The recommended way to install the CLI is with the MotherDuck install script:

### Install with bash

```bash
curl -s https://install.motherduck.com | sh
```

1. Download the Linux binary:
   - For 64-bit, download the binary
   - For arm64/aarch64, download the binary
2. Extract the zip file.
For more information, see the [DuckDB installation documentation](https://duckdb.org/docs/installation/).

## Try it

Walk through starting DuckDB, attaching MotherDuck, and running your first query in the playground below. Each step explains what happens before you press Enter, so you can preview the full flow before running it on your machine.

## Step by step

### Start the DuckDB CLI

After installing, start DuckDB from your terminal:

```sh
duckdb
```

DuckDB opens an in-memory database by default, so any tables you create won't persist when you exit. Pass a filename to open or create a persistent local database:

```sh
duckdb mydatabase.duckdb
```

### Connect to MotherDuck

From inside the DuckDB CLI, attach MotherDuck:

```sql
ATTACH 'md:';
```

DuckDB downloads the signed MotherDuck extension and opens your default browser to authenticate. Follow the instructions in the terminal.

To list your MotherDuck databases and confirm the connection, run:

```sql
SHOW DATABASES;
```

You can query local DuckDB data and MotherDuck databases from the same session. For more on persisting your authentication credentials, see [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md).

:::tip
You can also connect to MotherDuck directly when starting DuckDB:

```bash
duckdb "md:"
```
:::

:::note Manual extension update
When MotherDuck releases a new extension version, you can force-reinstall the extension by running the following statement inside the DuckDB CLI:

```sql
FORCE INSTALL motherduck;
```
:::

### Open the MotherDuck UI from the CLI

Launch the MotherDuck UI from your terminal:

```bash
duckdb -ui
```

If you're already in a DuckDB session, run `CALL start_ui();` instead.
--- Source: https://motherduck.com/docs/getting-started/interfaces/interfaces --- title: MotherDuck Interfaces description: MotherDuck Offers a variety of interfaces (APIs) for integration --- ## Client Interfaces ## Included pages - [Client APIs](https://motherduck.com/docs/getting-started/interfaces/client-apis): Client APIs for MotherDuck - [Install and connect with the DuckDB CLI](https://motherduck.com/docs/getting-started/interfaces/connect-query-from-duckdb-cli): Learn to connect and query databases using MotherDuck from the DuckDB CLI - [MotherDuck Web UI](https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour): A guide to the MotherDuck Web UI — write SQL with Instant SQL, use AI to fix and edit queries, and explore your data interactively. - [Postgres endpoint (thin client)](https://motherduck.com/docs/getting-started/interfaces/postgres-endpoint): Query MotherDuck from any Postgres-compatible client without installing DuckDB --- Source: https://motherduck.com/docs/getting-started/interfaces/motherduck-quick-tour --- sidebar_position: 3 title: MotherDuck Web UI description: A guide to the MotherDuck Web UI — write SQL with Instant SQL, use AI to fix and edit queries, and explore your data interactively. --- import VideoPlayer from '@site/src/components/VideoPlayer'; import useBaseUrl from '@docusaurus/useBaseUrl'; import SignUpLink from '@site/src/components/SignUpLink'; ## Getting started To log in to the MotherDuck UI, go to app.motherduck.com. :::info You can also open the web UI directly from the DuckDB CLI: ```bash duckdb "md:" -ui ``` ::: ### Main window The MotherDuck UI is organized around a notebook-style editor with a database browser on the left and results inspection on the right. ![UI](../img/screenshot_ui.png) ## Instant SQL: write SQL with real time feedback **Instant SQL** gives you keystroke-fast query previews — results update as you type, with no run button needed. 
Under the hood, MotherDuck uses [dual execution](/concepts/architecture-and-capabilities/#dual-execution) to parse and run your query locally first, giving you immediate feedback while full cloud results load in the background. A caching indicator in the cell header shows when results are served from local cache. {/* TODO: Replace with new "Instant SQL" video once recorded — see .context/video-outlines.md for shot list */} ### Enabling instant SQL Toggle Instant SQL on or off per cell using: - The **Instant SQL toggle** in the cell header - The keyboard shortcut `Ctrl`/`⌘` + `Shift` + `.` ### What works with instant SQL - **Filtering in real time:** Add or change a `WHERE` clause and watch results narrow instantly. - **Multi-statement cells:** Click on any individual statement within a multi-statement cell to preview just that one. - **Window functions:** Window functions are fully supported in Instant SQL previews. ## Fix errors and edit queries with AI MotherDuck's AI features help you fix broken queries, rewrite SQL in plain English, and generate queries from scratch — all without leaving the editor. ### "Help me fix this broken query" — FixIt When you run a query that has an error, **FixIt** automatically analyzes the error and suggests an inline fix. Click to accept and re-run in one step. {/* TODO: Replace with new "Fix errors and edit SQL with AI" video once recorded — see .context/video-outlines.md for shot list */} By default, FixIt auto-suggests fixes whenever an error occurs. You can turn off auto-suggest and still trigger FixIt manually by clicking **Suggest fix** at the bottom of any error message. ![FixIt manual trigger](../../key-tasks/img/fixit-manual-suggestion.png) Toggle auto-suggest in **Settings → Preferences → Enable inline SQL error fix suggestions**. :::tip Free for all users FixIt is available on all plans, including the free tier. 
::: ### "Modify my SQL using plain english" — edit Select text in your query (or place your cursor anywhere) and press `Ctrl`/`⌘` + `Shift` + `E` to open the **Edit** dialog. Describe what you want to change in natural language: ![Edit prompt](../../key-tasks/img/edit-prompt.png) Review the suggestion, then iterate with follow-up prompts if needed: ![Edit follow-up](../../key-tasks/img/edit-follow-up.png) When you're happy with the result, click **Apply edit** to update your query. ![Edit applied](../../key-tasks/img/edit-follow-up-2.png) ### Going further with SQL assistant functions For programmatic AI access (text-to-SQL, query explanation, schema understanding), see the [SQL Assistant functions](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/) reference. These are available in any DuckDB client connected to MotherDuck, not just the web UI. ## Explore your results ### Interactive data grid Query results load into an interactive data grid where you can sort, filter, and pivot without writing more SQL. Click the **Expand** button at the top right of any cell to go full-screen on the editor and results. ![Expand cells](../img/screenshot_expand_cells.png) ### Column explorer The Column Explorer shows statistics for every column in a table or result set — value frequencies, NULL percentages, histograms for numeric columns, and time-series charts for timestamp columns. {/* TODO: Replace with new "Explore your data" video once recorded — see .context/video-outlines.md for shot list */} Toggle the Column Explorer with `Ctrl`/`⌘` + `I` or the toggle button at the top right of the results panel. ### Cell content pane Click any cell in the results grid to see its full contents in the Cell Content Pane. ![Cell content — long text](../img/cell_content_long_text.png) For JSON columns, you can expand and collapse nodes, copy the value, or copy the keypath to any nested field. 
![Cell content — JSON](../img/cell_content_json.png) ## Write queries faster ### Autocomplete Autocomplete suggests SQL syntax, table names, column names, and functions as you type. Turn it off in **Settings → Preferences → Enable autocomplete when typing**. ### Inline docs Hover over any SQL function in the editor to see its description, parameter types, and return type. Click the **Docs** link in the tooltip to open the full reference.
Turn off Inline Docs in **Settings → Preferences → Enable Inline Docs**. ### Format SQL Press `Ctrl`/`⌘` + `Alt`/`⌥` + `O` to auto-format the SQL in your current cell. When text is selected, only the selection is formatted. ## Navigate the workspace ### Object explorer Browse your databases, schemas, and tables in the left-hand panel. Toggle it with `Ctrl`/`⌘` + `B`. ### Command menu Press `Ctrl`/`⌘` + `K` to open the command menu for quick access to actions, notebooks, and settings. ### Notebook and worksheet views Toggle between notebook view (multiple cells) and worksheet view (single expanded cell) with `Ctrl`/`⌘` + `E`. ### Running queries The Running Queries page, found under **Settings** → **Running Queries**, lets you monitor and manage long-running queries on your Duckling. For each query, you can see: - **Query**: The SQL text of the query (click to expand the full statement). - **Status**: Whether the query is active or has completed. - **Start time**: When the query started executing. - **Elapsed time**: How long the query has been running. This is useful for identifying queries that are taking longer than expected. You can cancel a running query directly from this page. For programmatic access to active connections and query cancellation through SQL, see [`md_active_server_connections()`](/sql-reference/motherduck-sql-reference/connection-management/monitor-connections/) and [`md_interrupt_server_connection()`](/sql-reference/motherduck-sql-reference/connection-management/interrupt-connections/). For a broader view of query activity across your organization, see the [`RECENT_QUERIES`](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) and [`QUERY_HISTORY`](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) views. 
### Duckling overview The Duckling overview page, found under **Settings** → **Duckling overview**, gives organization admins an at-a-glance view of activity across every Duckling in the organization over the last 24 hours. For each Duckling, you can see: - **Account**: The MotherDuck user or service account the Duckling belongs to. - **Status**: Whether the Duckling is running normally or has encountered errors. - **Spills**: Whether queries on this Duckling spilled to disk, which indicates memory pressure from larger-than-memory workloads. - **Active minutes**: How long the Duckling was actively running queries over the last 24 hours. Click a Duckling row to drill in. A bar chart visualizes query activity over time, and a table below lists individual queries. Click a query to open a side panel with the full SQL text, or open a dedicated focus page for a single query. ![Duckling overview drill-down showing summary stats, a query activity bar chart, and a table of top queries](../img/duckling-overview-drilldown.png) Use the timezone toggle in the page header to switch between UTC and your local time. This page is admin-only and is built on the [`QUERY_HISTORY`](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) view, so it has the same ingestion delay — queries from the last few seconds may not appear yet. For a programmable view of the same data, or a more real-time view of ongoing queries, see the [`QUERY_HISTORY`](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) and [`RECENT_QUERIES`](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) views. ## Keyboard shortcuts Use `Ctrl` for Windows/Linux and `⌘` (Command) for Mac. Use `Alt` for Windows/Linux and `⌥` (Option) for Mac. ### Running queries | Command | Action | |---------|--------| | `Ctrl`/`⌘` + `Enter` | Run the current cell. | | `Ctrl`/`⌘` + `Shift` + `Enter` | Run selected text in the current cell. 
If no text is selected, run the whole cell. | | `Shift` + `Enter` or `Alt`/`⌥` + `Enter` | Run the current cell, then advance to the next cell (creates a new one if needed). | ### Editing | Command | Action | |---------|--------| | `Ctrl`/`⌘` + `z` | Undo within current cell. | | `Ctrl`/`⌘` + `Shift` + `z` | Redo within current cell. | | `Ctrl`/`⌘` + `Alt`/`⌥` + `o` | Format SQL in the current cell (or selection). | | `Ctrl`/`⌘` + `/` | Toggle line comments (`--`). | | `Tab` | Indent current line (in editor). | | `Shift` + `Tab` | De-indent current line (in editor). | ### AI features | Command | Action | |---------|--------| | `Ctrl`/`⌘` + `Shift` + `.` | Toggle [Instant SQL](#instant-sql-write-sql-with-real-time-feedback) on/off for the active cell. | | `Ctrl`/`⌘` + `Shift` + `e` | Open [Edit](#modify-my-sql-using-plain-english--edit) for your current cell or selected text. | ### Navigation and layout | Command | Action | |---------|--------| | `Ctrl`/`⌘` + `k` | Open the command menu. | | `Ctrl`/`⌘` + `/` | Search notebooks, databases and more. | | `Ctrl`/`⌘` + `b` | Toggle the Object Explorer (left panel). | | `Ctrl`/`⌘` + `i` | Toggle the Column Explorer (right panel). | | `Ctrl`/`⌘` + `e` | Toggle notebook/worksheet view for the active cell. | | `Ctrl`/`⌘` + `↑` | Move current cell up. | | `Ctrl`/`⌘` + `↓` | Move current cell down. | | `Esc` | Switch `Tab` to UI navigation mode (reverts on next cell selection). | ## Settings Settings are found by clicking your profile at the top-left. | Section | Setting | Description | |---------|---------|-------------| | **Organization** | Details | Change the display name of the organization. Enable all users in your email domain to join. See [Managing Organizations](/key-tasks/managing-organizations). | | | Plans | View your current plan (Free, Standard) and switch plans. | | | Members | View and invite members to your organization. Members include human users and [service accounts](/key-tasks/service-accounts-guide/). 
| | **My Account** | Preferences | Enable [autocomplete](#autocomplete), inline [SQL error fix suggestions](#help-me-fix-this-broken-query--fixit) (FixIt), and [Inline Docs](#inline-docs). | | | Notifications | Configure notification preferences. | | | Ducklings | Manage [Duckling sizes](/about-motherduck/billing/duckling-sizes/#duckling-sizes), [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) pool size, version information, and Duckling reset for troubleshooting. | | **Integrations** | Access Tokens | Create tokens for programmatically [authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck). Tokens can have expiry dates. | | | Secrets | Store credentials for [AWS S3](/integrations/cloud-storage/amazon-s3), [Azure Blob Storage](/integrations/cloud-storage/azure-blob-storage), [Google Cloud Storage](/integrations/cloud-storage/google-cloud-storage), Cloudflare R2, and Hugging Face. | | **Monitor** | Running Queries | View and manage active queries. | | | Duckling overview | (Admin-only) View activity across every Duckling in your organization over the last 24 hours. See [Duckling overview](#duckling-overview). | | **Data** | Databases | Browse and manage your databases. | | | Shares | View and manage [shared databases](/key-tasks/sharing-data/). | | **Content** | Dives | Manage your saved [Dives](/key-tasks/ai-and-motherduck/dives/). | --- Source: https://motherduck.com/docs/getting-started/interfaces/postgres-endpoint --- sidebar_position: 4 title: "Postgres endpoint (thin client)" sidebar_label: Postgres Endpoint description: Query MotherDuck from any Postgres-compatible client without installing DuckDB feature_stage: preview --- MotherDuck's Postgres endpoint lets you query your databases using any client that speaks the PostgreSQL wire protocol — no DuckDB installation required. This is ideal for serverless environments, BI tools, or languages without a DuckDB SDK. 
## Quick start with psql Set your access token and connect: ```bash export MOTHERDUCK_TOKEN="your_token_here" PGPASSWORD=$MOTHERDUCK_TOKEN psql \ -h pg.us-east-1-aws.motherduck.com \ -p 5432 \ -U postgres \ "dbname=sample_data sslmode=verify-full sslrootcert=system" ``` Run a query: ```sql SELECT title, score FROM sample_data.hn.hacker_news WHERE type = 'story' ORDER BY score DESC LIMIT 5; ``` ## Quick start with Python ```python # /// script # dependencies = ["psycopg"] # /// import psycopg, os conn = psycopg.connect( host="pg.us-east-1-aws.motherduck.com", port=5432, dbname="sample_data", user="postgres", password=os.environ["MOTHERDUCK_TOKEN"], sslmode="verify-full", sslrootcert="system", ) with conn.cursor() as cur: cur.execute("SELECT title, score FROM sample_data.hn.hacker_news WHERE type='story' LIMIT 5") for row in cur: print(row) conn.close() ``` ## Key things to know - You're writing **DuckDB SQL**, not PostgreSQL SQL. Queries and MotherDuck SQL that run entirely inside MotherDuck generally work, but the Postgres endpoint is not a full DuckDB client. - Commands that depend on **local files, local attachments, or extension management** are not supported over the Postgres endpoint. - The Postgres endpoint is best for query execution, DDL and DML on MotherDuck tables, metadata inspection, and server-side reads from remote storage. - Features that depend on DuckDB client session state, such as temporary tables or result creation, require a DuckDB client path instead. - Always connect with **SSL enabled** (`sslmode=verify-full` recommended). - Use your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) as the password. 
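The points above about SSL and using the access token as the password can be collected into one small helper. This is a minimal sketch, assuming the same host, port, and user as the Python quick start; `build_conn_params` is a hypothetical helper name, not part of psycopg or MotherDuck, and the host would differ for other regions.

```python
import os
from typing import Dict, Optional, Union


def build_conn_params(dbname: str, token: Optional[str] = None) -> Dict[str, Union[str, int]]:
    """Return keyword arguments for psycopg.connect(), mirroring the quick start."""
    # Fall back to the MOTHERDUCK_TOKEN environment variable used in the quick start.
    token = token or os.environ.get("MOTHERDUCK_TOKEN")
    if not token:
        raise ValueError("Set MOTHERDUCK_TOKEN or pass a token explicitly")
    return {
        "host": "pg.us-east-1-aws.motherduck.com",  # quick-start host; region-specific
        "port": 5432,
        "dbname": dbname,
        "user": "postgres",
        "password": token,  # the access token is used as the password
        "sslmode": "verify-full",  # always connect with SSL enabled
        "sslrootcert": "system",
    }


if __name__ == "__main__":
    params = build_conn_params("sample_data", token="example-token")
    # Print everything except the secret.
    print({key: value for key, value in params.items() if key != "password"})
```

With psycopg installed, the result can be passed straight through, for example `psycopg.connect(**build_conn_params("sample_data"))`. Keeping the SSL settings in one place makes it harder to open a connection without certificate verification by accident.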
## Next steps - [Postgres Endpoint reference](/sql-reference/postgres-endpoint) — connection parameters, SSL options, session options, and known limitations - [Connect from Python](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/python) — psycopg2 and psycopg3 setup - [Connect from Java](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/java) — PostgreSQL JDBC driver setup - [Connect from Node.js](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/nodejs) — node-postgres setup - [Connect from Cloudflare Workers](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/cloudflare-workers) — serverless edge deployment --- Source: https://motherduck.com/docs/getting-started/mcp-getting-started --- sidebar_position: 4 title: Talk to Your Data with AI sidebar_label: AI Data Analysis description: Get started with the MotherDuck MCP Server to analyze your data using natural language with Claude, ChatGPT, and other AI assistants --- import VideoPlayer from '@site/src/components/VideoPlayer'; import SignUpLink from '@site/src/components/SignUpLink'; The MotherDuck **remote** MCP Server lets you analyze your data using natural language and generate interactive visualizations, all without writing SQL. Connect your favorite AI assistant (Claude, ChatGPT, Cursor, or others) and start asking questions about your databases, then turn insights into shareable [Dives](/key-tasks/ai-and-motherduck/dives) with a single prompt. :::info Connection URL The remote MCP server is hosted at `https://api.motherduck.com/mcp`. Claude Desktop's connector uses this URL automatically; for clients that need manual configuration, see the [setup guide](/key-tasks/ai-and-motherduck/mcp-setup/). ::: :::note This guide covers the **remote MCP server** (fully managed by MotherDuck). 
If you need to work with local DuckDB files or want full control over the server, see the [local MCP server](/key-tasks/ai-and-motherduck/mcp-setup/#remote-vs-local-mcp-server). ::: In this guide, you'll connect the MCP server in Claude Desktop, query your data, and create a Dive visualization, all in under 5 minutes. ## What you'll learn - Connect the MotherDuck MCP Server to Claude Desktop - List your databases - Ask analytical questions about your data - Create an interactive Dive visualization from your analysis ## Prerequisites - A MotherDuck account (sign up free) - Claude Desktop installed ([download](https://claude.ai/download)) :::tip Using a different AI client? This guide uses Claude Desktop, but the remote MCP Server works with ChatGPT, Cursor, Claude Code, and other MCP-compatible clients. See the [full setup guide](/key-tasks/ai-and-motherduck/mcp-setup/) for instructions for your preferred client. ::: ## Step 1: Add the MCP server to Claude Desktop Open Claude Desktop settings and add the MotherDuck remote MCP Server: 1. Open **Claude Desktop** → **Settings** → **Connectors** 2. Click **Browse Connectors** and search for "MotherDuck" 3. Click **Add** to install the MotherDuck connector 4. A browser window opens for authentication with your MotherDuck account ## Step 2: Verify the connection and permissions After adding the connector, confirm Claude has access to the MotherDuck tools: 1. Open **Claude Desktop** → **Settings** → **Connectors** 2. Select **MotherDuck** and click on **Configure** You should see tools like `query`, `list_databases`, and `ask_docs_question` available. You can configure tool permissions to control how Claude uses each tool. See [Configuring tool permissions](/key-tasks/ai-and-motherduck/mcp-setup/#configuring-tool-permissions) for details. ## Step 3: List your databases Test the connection by asking Claude to list your databases: **Try this prompt:** ```text List all my databases on MotherDuck. 
``` Claude will use the MCP tools to connect to MotherDuck and return your database list. ## Step 4: Analyze your data Now let's run an actual analysis. If you don't have data yet, you can attach the sample Hacker News database: **Attach the sample database:** ```text Attach this db 'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b' give me some analytics. ``` The `hacker_news` database contains Hacker News stories, comments, and metadata from 2016 to 2025. You'll see that even with a minimal prompt, you get great results for a first data exploration. For more tips on effective prompting and workflow patterns, check out the [MCP Workflows Guide](/key-tasks/ai-and-motherduck/mcp-workflows/).
:::info Sample databases The `hacker_news` database is one of several sample datasets available. See [Sample Data & Queries](/getting-started/sample-data-queries/datasets) for more datasets to explore. ::: ## Step 5: Create visualizations with Dives Now that you've explored your data, turn your insights into a persistent, interactive visualization. [Dives](/key-tasks/ai-and-motherduck/dives) are shareable visualizations that live in your MotherDuck workspace and stay up to date with your data. **Try this prompt:** ```text Create a Dive based on these insights. ``` Claude renders the Dive inline in the conversation with the Dive Viewer MCP App, using the same components as the MotherDuck UI and running against live data. Iterate conversationally: *"add a filter for the last 30 days"*, *"switch to a bar chart"*. Each edit saves as a separate version. ```text Save it to MotherDuck. ``` The Dive is saved to your workspace. You can open it in the MotherDuck UI, share it with your team, and it will always query live data. ## Next steps You're now ready to analyze your data and create visualizations with AI. 
Here are some ways to go deeper: - **[MCP Workflows Guide](/key-tasks/ai-and-motherduck/mcp-workflows/)**: Best practices and workflow patterns, including [how it works under the hood](/key-tasks/ai-and-motherduck/mcp-workflows/#how-it-works) - **[Creating Visualizations with Dives](/key-tasks/ai-and-motherduck/dives/)**: Go deeper into Dives by iterating on visualizations, sharing with your team, and managing version history - **[Connect to MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/)**: Setup instructions for ChatGPT, Cursor, Claude Code, and other clients - **[MCP Server Reference](/sql-reference/mcp/)**: Server capabilities, available tools, and regional availability - **[Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/)**: Build custom AI agents that programmatically query your data --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/air-quality --- sidebar_position: 3 title: Air Quality description: Sample data from the WHO Ambient Air Quality Database to use with DuckDB and MotherDuck --- import EmbeddedDive from '@site/src/components/EmbeddedDive'; import SQLExampleEditor from '@site/src/components/SQLExampleEditor'; ## Explore the data Interactive dashboard built on the WHO air quality dataset. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). ## About the dataset The [WHO Ambient Air Quality Database](https://www.who.int/publications/m/item/who-ambient-air-quality-database-(update-2023)) (6th edition, released in **May 2023**) compiles annual mean concentrations of nitrogen dioxide (NO2) and particulate matter (PM10, PM2.5) from ground measurements across over 8600 human settlements in more than 120 countries. This data, updated every 2-3 years since **2011**, primarily represents city or town averages and is used to monitor the Sustainable Development Goal Indicator 11.6.2, Air quality in cities. 
To read from the `sample_data` database, see [attach the sample datasets database](./datasets.mdx).

## Example queries

### Annual city air quality rating

This query rates the air quality of each city for each year against WHO guidelines. It calculates the average concentrations of PM2.5, PM10, and NO2, then assigns an air quality rating of 'Good', 'Moderate', or 'Poor'. 'Good' indicates all pollutants are within WHO recommended levels, 'Poor' indicates all pollutants exceed WHO recommended levels, and 'Moderate' refers to any other scenario. The results are grouped and ordered by city and year.

```sql
SELECT
    city,
    year,
    CASE
        WHEN AVG(pm25_concentration) <= 10
            AND AVG(pm10_concentration) <= 20
            AND AVG(no2_concentration) <= 40 THEN 'Good'
        WHEN AVG(pm25_concentration) > 10
            AND AVG(pm10_concentration) > 20
            AND AVG(no2_concentration) > 40 THEN 'Poor'
        ELSE 'Moderate'
    END AS airqualityrating
FROM sample_data.who.ambient_air_quality
GROUP BY city, year
ORDER BY city, year;
```

### Yearly average pollutant concentrations of a city

This query calculates the yearly average concentrations of PM2.5, PM10, and NO2 in a given city, here `Berlin`.
{` SELECT year, AVG(pm25_concentration) AS avg_pm25, AVG(pm10_concentration) AS avg_pm10, AVG(no2_concentration) AS avg_no2 FROM sample_data.who.ambient_air_quality WHERE city = 'Berlin' GROUP BY year ORDER BY year DESC; `} ## Schema | column_name | column_type | null | key | default | extra | |--------------------|-------------|------|-----|---------|-------| | who_region | VARCHAR | YES | | | | | iso3 | VARCHAR | YES | | | | | country_name | VARCHAR | YES | | | | | city | VARCHAR | YES | | | | | year | BIGINT | YES | | | | | version | VARCHAR | YES | | | | | pm10_concentration | BIGINT | YES | | | | | pm25_concentration | BIGINT | YES | | | | | no2_concentration | BIGINT | YES | | | | | pm10_tempcov | BIGINT | YES | | | | | pm25_tempcov | BIGINT | YES | | | | | no2_tempcov | BIGINT | YES | | | | | type_of_stations | VARCHAR | YES | | | | | reference | VARCHAR | YES | | | | | web_link | VARCHAR | YES | | | | | population | VARCHAR | YES | | | | | population_source | VARCHAR | YES | | | | | latitude | FLOAT | YES | | | | | longitude | FLOAT | YES | | | | | who_ms | BIGINT | YES | | | | --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/datasets --- title: Example Datasets description: A collections of open datasets and queries to get you started with DuckDB and MotherDuck --- We have prepared a series of datasets for you to [dive](/key-tasks/ai-and-motherduck/dives/) into MotherDuck! ## sample_data The `sample_data` database is automatically attached to every MotherDuck account regardless of your region. You can start querying the following tables right away: | `schema.table` | Description | |--------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------| | [`who.ambient_air_quality`](air-quality.md) | Historical air quality data from the World Health Organization. 
| | [`nyc.taxi`](nyc-311-data.md) | Taxi ride data from November 2020 | | [`nyc.rideshare`](nyc-311-data.md) | Ride share trips (Lyft, Uber etc) in NYC | | [`nyc.service_requests`](nyc-311-data.md) | Requests to NYC's 311 complaint hotline through phone and web | | [`hn.hacker_news`](hacker-news.md) | Sample of comments from [Hacker News](https://news.ycombinator.com/) | | [`kaggle.movies`](kaggle-movies.md) | Movie titles and overviews with pre-computed embeddings from [Kaggle](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset) | | [`stackoverflow_survey.survey_results`](stackoverflow-survey.md) | Survey results from 2017 to 2024 | | [`stackoverflow_survey.survey_schemas`](stackoverflow-survey.md) | Survey schemas (questions from the survey) from 2017 to 2024 | ## Additional datasets The following datasets are available as separate shared databases. See each dataset's page for instructions on how to attach them. :::note `aws-us-east-1` region only These additional databases are only available for accounts in the `aws-us-east-1` region. ::: | Dataset | Description | |--------------------------------------------|---------------------------------------------------------------------------------------| | [StackOverflow](stackoverflow.md) | Full StackOverflow data dump up to May 2023 | | [PyPi / DuckDB Stats](pypi.md) | Python package download data for the `duckdb` package, refreshed weekly | | [Hacker News (full)](hacker-news.md) | Full [Hacker News](https://news.ycombinator.com/) dataset from 2016 to 2025 | | [Foursquare](foursquare.md) | Global dataset of over 100 million points of interest (POIs) with location and business information | ## FAQ ### How do I re-attach the sample_data database? 
The `sample_data` database is attached automatically, but if you have accidentally removed it, you can re-attach it with:

```sql
ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
```

---

Source: https://motherduck.com/docs/getting-started/sample-data-queries/foursquare

---
sidebar_position: 4
title: Foursquare
description: Foursquare Open Source Places (FSQ OS Places) is a global, open-source dataset of over 100 million points of interest (POI)
---

import EmbeddedDive from '@site/src/components/EmbeddedDive';

## Explore the data

Interactive dashboard built on the Foursquare Open Source Places dataset. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/).

## About the dataset

[Foursquare](https://docs.foursquare.com/data-products/docs/fsq-places-open-source) Open Source Places (FSQ OS Places) is a global, open-source dataset of over 100 million points of interest (POI), featuring 22 core attributes, updated monthly, and designed to support geospatial applications with a collaborative, AI- and human-powered data curation system.

The upstream dataset is updated monthly; MotherDuck hosts a snapshot from 2025-01-10.

The database has two tables:

- `fsq_os_places` (Places): a global dataset of over 100 million points of interest (POIs) with detailed location, business, and contact information.
- `fsq_os_categories` (Categories): a hierarchical classification of POIs with up to six levels, detailing category names and IDs.

:::note `aws-us-east-1` region only
This database is only available for accounts in the `aws-us-east-1` region.
:::

You can attach the `foursquare` database to your account by running the following command:

```sql
ATTACH 'md:_share/foursquare/0cbf467d-03b0-449e-863a-ce17975d2c0b' AS foursquare;
```

## Example queries

The following queries assume that `foursquare` is the current database. Run `USE foursquare;` to switch to it.
### Countries with the most places ```sql SELECT country, COUNT(*) AS places FROM fsq_os_places GROUP BY country ORDER BY places DESC LIMIT 10; ``` ## Schema ### fsq_os_places - places dataset | Column Name | Type | Description | |--------------------|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | fsq_place_id | String | The unique identifier of a Foursquare POI. Use this ID to view a venue at: `foursquare.com/v/{fsq_place_id}ud` | | name | String | Business name of a POI | | latitude/longitude | Decimal | Decimal coordinates (WGS84 datum) up to 6 decimal places. Derived from third-party sources, user input, and corrections. Default geocode type: front door or rooftop. | | address | String | User-entered street address of the venue | | locality | String | City, town, or equivalent where the POI is located | | region | String | State, province, or territory. 
Abbreviations used in US, CA, AU, BR; full names elsewhere | | postcode | String | Postal code or equivalent, formatted based on country (e.g., 5-digit US ZIP code) | | admin_region | String | Additional sub-division (e.g., Scotland) | | post_town | String | Town/place used in postal addressing (may differ from geographic location) | | po_box | String | Post Office Box | | country | String | 2-letter ISO Country Code | | date_created | Date | Date the POI entered the database (not necessarily the opening date) | | date_refreshed | Date | Last date any reference was refreshed through crawl, users, or validation | | date_closed | Date | Date the POI was marked closed in the database (not necessarily actual closure date) | | tel | String | Telephone number with local formatting | | website | String | URL to the POI’s (or chain’s) website | | email | String | Primary contact email address, if available | | facebook_id | String | POI's Facebook ID, if available | | instagram | String | POI's Instagram handle, if available | | twitter | String | POI's Twitter handle, if available | | fsq_category_ids | Array (String) | ID(s) of the most granular category(ies). See the Categories page for details | | fsq_category_labels| Array (String) | Label(s) of the most granular category(ies). See the Categories page for details | | placemaker_url | String | Link to the POI’s review page in PlaceMaker Tools for suggesting edits or reviewing pending changes | | geom | wkb | Geometry of the POI in WKB format for visualization through the vector tiling service | | bbox | struct | An area defined by two longitudes and two latitudes: latitude is a decimal number between -90.0 and 90.0; longitude is a decimal number between -180.0 and 180.0. 
`bbox:struct xmin:double ymin:double xmax:double ymax:double` | --- ### fsq_os_categories - category dataset | Column Name | Type | Description | |----------------------|---------|-----------------------------------------------------------------------------------------------------| | category_id | String | Unique identifier of the Foursquare category (BSON format) | | category_level | Integer | Hierarchy depth of the category (1-6) | | category_name | String | Name of the most granular category | | category_label | String | Full category hierarchy separated by `>` | | level1_category_id | String | Unique ID of the first-level category | | level1_category_name | String | Name of the first-level category | | level2_category_id | String | Unique ID of the second-level category | | level2_category_name | String | Name of the second-level category | | level3_category_id | String | Unique ID of the third-level category | | level3_category_name | String | Name of the third-level category | | level4_category_id | String | Unique ID of the fourth-level category | | level4_category_name | String | Name of the fourth-level category | | level5_category_id | String | Unique ID of the fifth-level category | | level5_category_name | String | Name of the fifth-level category | | level6_category_id | String | Unique ID of the sixth-level category | | level6_category_name | String | Name of the sixth-level category | --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/hacker-news --- sidebar_position: 2 title: Hacker News description: Sample data from Hacker News stories to use for SQL querying of DuckDB and MotherDuck databases. --- import EmbeddedDive from '@site/src/components/EmbeddedDive'; import SQLExampleEditor from '@site/src/components/SQLExampleEditor'; ## Explore the data Interactive dashboard built on the Hacker News sample dataset. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). 
## About the dataset

[Hacker News](https://news.ycombinator.com/) is a social news website focusing on computer science and entrepreneurship. It is run by Y Combinator, a startup accelerator, and it's known for its minimalist interface. Users can post stories (such as links to articles), comment on them, and vote them up or down, affecting their visibility.

There are two ways to access the dataset:

- Through the `sample_data` database, which contains a sample of the data (from **January 2022** to **November 2022**). This database is automatically attached to every MotherDuck account.
- Through the `hacker_news` database, which contains the full dataset (from **2016** to **2025**).

To attach the full `hacker_news` database, you can use the following command:

:::note `aws-us-east-1` region only
The `hacker_news` database is only available for accounts in the `aws-us-east-1` region.
:::

```sql
ATTACH 'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b' AS hacker_news;
```

To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx).

## Example queries

### Most shared websites

This query returns the top domains being shared on Hacker News.

```sql
SELECT
    regexp_extract(url, 'http[s]?://([^/]+)/', 1) AS domain,
    count(*) AS count
FROM sample_data.hn.hacker_news
WHERE url IS NOT NULL
  AND regexp_extract(url, 'http[s]?://([^/]+)/', 1) != ''
GROUP BY domain
ORDER BY count DESC
LIMIT 20;
```

### Most commented stories each month

This query calculates the total number of comments for each story and identifies the most commented story of each month.
```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        descendants AS nb_comments,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER() OVER (
            PARTITION BY YEAR(timestamp), MONTH(timestamp)
            ORDER BY descendants DESC
        ) AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)
SELECT year, month, title, hn_url, nb_comments
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```

### Most monthly voted stories

This query determines the most voted story for each month.

```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        score,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER() OVER (
            PARTITION BY YEAR(timestamp), MONTH(timestamp)
            ORDER BY score DESC
        ) AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)
SELECT year, month, title, hn_url, score
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```

### Keyword analysis

This query counts the monthly mentions of a keyword (here `duckdb`) in the title or text of Hacker News posts, organized by year and month.
```sql
SELECT
    YEAR(timestamp) AS year,
    MONTH(timestamp) AS month,
    COUNT(*) AS keyword_mentions
FROM sample_data.hn.hacker_news
WHERE (title LIKE '%duckdb%' OR text LIKE '%duckdb%')
GROUP BY year, month
ORDER BY year ASC, month ASC;
```

## Schema

| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|-----|---------|-------|
| title       | VARCHAR     | YES  |     |         |       |
| url         | VARCHAR     | YES  |     |         |       |
| text        | VARCHAR     | YES  |     |         |       |
| dead        | BOOLEAN     | YES  |     |         |       |
| by          | VARCHAR     | YES  |     |         |       |
| score       | BIGINT      | YES  |     |         |       |
| time        | BIGINT      | YES  |     |         |       |
| timestamp   | TIMESTAMP   | YES  |     |         |       |
| type        | VARCHAR     | YES  |     |         |       |
| id          | BIGINT      | YES  |     |         |       |
| parent      | BIGINT      | YES  |     |         |       |
| descendants | BIGINT      | YES  |     |         |       |
| ranking     | BIGINT      | YES  |     |         |       |
| deleted     | BOOLEAN     | YES  |     |         |       |

---

Source: https://motherduck.com/docs/getting-started/sample-data-queries/kaggle-movies

---
sidebar_position: 3
title: Kaggle Movies
description: A dataset of over 40,000 movies with titles, overviews, and pre-computed embeddings for semantic search.
---

import EmbeddedDive from '@site/src/components/EmbeddedDive';
import SQLExampleEditor from '@site/src/components/SQLExampleEditor';

## Explore the data

Interactive dashboard with semantic search on the Kaggle Movies sample dataset. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/).

## About the dataset

This dataset is a subset of the [Kaggle Movies Dataset](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset), containing over 40,000 movie titles and overviews. It also includes pre-computed 512-dimensional vector embeddings (generated with OpenAI's `text-embedding-3-small` model) for both the title and overview fields, making it useful for experimenting with [semantic search](/key-tasks/ai-and-motherduck/text-search-in-motherduck/) in MotherDuck.
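
As a rough illustration of how embedding-based semantic search ranks results, the sketch below computes cosine similarity between small toy vectors in plain Python. The 3-dimensional vectors, the movie names, and the helper functions `cosine_similarity` and `rank_by_similarity` are hypothetical stand-ins for illustration only. The embeddings in this dataset have 512 dimensions and would normally be compared in SQL rather than in Python.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, rows):
    """Sort (title, embedding) pairs from most to least similar to query_vec."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings" (hypothetical values, not from the dataset).
movies = [
    ("Space Film", [0.9, 0.1, 0.0]),
    ("Romance Film", [0.0, 0.2, 0.9]),
    ("Alien Film", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]

for title, score in rank_by_similarity(query, movies):
    print(f"{title}: {score:.3f}")
```

A score close to 1 means the two vectors point in nearly the same direction, which is why the most relevant rows come first when results are sorted by similarity in descending order.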
## How to query the dataset

This dataset is available as part of the `sample_data` database, which is automatically attached to every MotherDuck account.

## Example queries

### Browse movies

```sql
SELECT title, overview
FROM sample_data.kaggle.movies
LIMIT 10;
```

### Find similar movies using vector search

Use the pre-computed embeddings together with the [`embedding`](/sql-reference/motherduck-sql-reference/ai-functions/embedding/) function to find movies similar to a search query:

```sql
SELECT
    title,
    overview,
    array_cosine_similarity(
        overview_embeddings,
        embedding('a space adventure with aliens')
    ) AS similarity
FROM sample_data.kaggle.movies
WHERE overview IS NOT NULL
ORDER BY similarity DESC
LIMIT 10;
```

### Find movies similar to another movie

```sql
WITH target AS (
    SELECT overview_embeddings
    FROM sample_data.kaggle.movies
    WHERE title = 'The Matrix'
    LIMIT 1
)
SELECT
    m.title,
    m.overview,
    array_cosine_similarity(m.overview_embeddings, t.overview_embeddings) AS similarity
FROM sample_data.kaggle.movies m, target t
WHERE m.title != 'The Matrix'
ORDER BY similarity DESC
LIMIT 10;
```

## Schema

| Column Name           | Column Type | Description                                     |
|-----------------------|-------------|-------------------------------------------------|
| title                 | VARCHAR     | Movie title                                     |
| overview              | VARCHAR     | Short description or synopsis of the movie      |
| title_embeddings      | FLOAT[512]  | Pre-computed vector embedding of the title      |
| overview_embeddings   | FLOAT[512]  | Pre-computed vector embedding of the overview   |

---

Source: https://motherduck.com/docs/getting-started/sample-data-queries/nyc-311-data

---
sidebar_position: 4
title: NYC 311 Complaint Data
description: New York City provides data from 311 call service requests. This data can be used as sample data for DuckDB and MotherDuck SQL queries.
---

import EmbeddedDive from '@site/src/components/EmbeddedDive';
import SQLExampleEditor from '@site/src/components/SQLExampleEditor';

## Explore the data

Interactive dashboards built on the NYC sample datasets. Use them as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/).

## About the dataset

The [New York City 311 Service Requests Data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9) provides information on requests to the city's complaint service from 2010 to the present.

NYC311 responds to thousands of inquiries, comments and requests from customers every single day. This dataset represents only service requests that can be directed to specific agencies. This dataset is updated daily and expected values for many fields will change over time. The lists of expected values associated with each column are not exhaustive.

Each row of data contains information about the service request, including complaint type, responding agency, and geographic location. However, the data does not reveal any personally identifying information about the customer who made the request.

This dataset describes site-specific non-emergency complaints (also known as “service requests”) made by customers across New York City about a variety of topics, including noise, sanitation, and street quality.

To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx).

## Example queries

### The most common complaints in 2018

```sql
SELECT
    UPPER(complaint_type),
    COUNT(1)
FROM sample_data.nyc.service_requests
WHERE DATE_PART('year', created_date) = 2018
GROUP BY 1
HAVING COUNT(*) > 1000
ORDER BY 2 DESC;
```

## Schema

The columns have been renamed to `lower_case_underscore` format for ease of typing. For more detail on each column than the table below provides, see the data dictionary (an Excel file) available from the dataset page linked above.
| column_name | column_type | null | description | |--------------------------------|---------------|--------|-------------| | unique_key | BIGINT | YES | Unique identifier of a Service Request (SR) in the open data set. Each 311 service request is assigned a number that distinguishes it as a separate case incident. | | created_date | TIMESTAMP | YES | The date and time that a Customer submits a Service Request. | | closed_date | TIMESTAMP | YES | The date and time that an Agency closes a Service Request. | | agency | VARCHAR | YES | Acronym of responding City Government Agency or entity responding to 311 Service Request. | | agency_name | VARCHAR | YES | Full agency name of responding City Government Agency, or entity responding to 311 service request. | | complaint_type | VARCHAR | YES | This is the first level of a hierarchy identifying the topic of the incident or condition. Complaint Type broadly describes the topic of the incident or condition and are defined by the responding agencies. | | descriptor | VARCHAR | YES | This is associated to the Complaint Type, and provides further detail on the incident or condition. Descriptor values are dependent on the Complaint Type, and are not always required in the service request. 
| | location_type | VARCHAR | YES | Describes the type of location used in the address information | | incident_zip | VARCHAR | YES | Zip code of the incident address | | incident_address | VARCHAR | YES | House number and street name of incident address | | street_name | VARCHAR | YES | Street name of incident address | | cross_street_1 | VARCHAR | YES | First Cross street based on the geo validated incident location.| | cross_street_2 | VARCHAR | YES | Second Cross Street based on the geo validated incident location | | intersection_street_1 | VARCHAR | YES | First intersecting street based on geo validated incident location | | intersection_street_2 | VARCHAR | YES | Second intersecting street based on geo validated incident location | | address_type | VARCHAR | YES | Type of information available about the incident location: Address; Block face; Intersection; LatLong; Placename | | city | VARCHAR | YES | In this dataset, City can refer to a borough or neighborhood. MANHATTAN, BROOKLYN, BRONX, STATEN ISLAND, or in QUEENS, specific neighborhood name | | landmark | VARCHAR | YES | If the incident location is identified as a Landmark the name of the landmark will display here. Can refer to any noteworthy location, including but not limited to, parks, hospitals, airports, sports facilities, performance spaces, etc. | | facility_type | VARCHAR | YES | If applicable, this field describes the type of city facility associated to the service request: DSNY Garage, Precinct, School, School District, N/A | | status | VARCHAR | YES | Current status of the service request submitted: Assigned, Canceled, Closed, Pending | | due_date | TIMESTAMP | YES | Date when responding agency is expected to update the SR. This is based on the Complaint Type and internal Service Level Agreements (SLAs) | | resolution_description | VARCHAR | YES | Describes the last action taken on the service request by the responding agency. May describe next or future steps. 
|
| resolution_action_updated_date | TIMESTAMP | YES | Date when responding agency last updated the service request. |
| bbl | VARCHAR | YES | Parcel number that identifies the location of the building or property associated with the service request. The block is a subset of a borough. The lot is a subset of a block unique within a borough and block. |
| borough | VARCHAR | YES | The borough number is: 1. Manhattan (New York County) 2. Bronx (Bronx County) 3. Brooklyn (Kings County) 4. Queens (Queens County) 5. Staten Island (Richmond County) |
| x_coordinate_state_plane | VARCHAR | YES | Geo validated X coordinate of the incident location. For more information about NY State Plane Coordinate Zones: https://data.gis.ny.gov/datasets/ny-state-plane-coordinate-system-zones/explore |
| y_coordinate_state_plane | VARCHAR | YES | Geo validated Y coordinate of the incident location. For more information about NY State Plane Coordinate Zones: https://data.gis.ny.gov/datasets/ny-state-plane-coordinate-system-zones/explore |
| open_data_channel_type | VARCHAR | YES | Indicates how the service request was submitted to 311: Phone, Online, Other (submitted by other agency) |
| park_facility_name | VARCHAR | YES | If the incident location is a Parks Dept facility and the service request pertains to a facility managed by NYC Parks (DPR), the name of the facility will appear here |
| park_borough | VARCHAR | YES | The borough of the incident if the service request pertains to an NYC Parks Dept facility (DPR) |
| vehicle_type | VARCHAR | YES | Data provided if service request pertains to a vehicle managed by the Taxi and Limousine Commission (TLC): Ambulette / Paratransit; Car Service; Commuter Van; Green Taxi |
| taxi_company_borough | VARCHAR | YES | Data provided if service request pertains to a vehicle managed by the Taxi and Limousine Commission (TLC).
|
| taxi_pick_up_location | VARCHAR | YES | If the incident pertains to a vehicle managed by the Taxi and Limousine Commission (TLC), this field displays the taxi pick up location |
| bridge_highway_name | VARCHAR | YES | If the incident is identified as a Bridge/Highway, the name will be displayed here |
| bridge_highway_direction | VARCHAR | YES | If the incident is identified as a Bridge/Highway, the direction where the issue took place will be displayed here. |
| road_ramp | VARCHAR | YES | If the incident location was a Bridge/Highway, this column indicates whether the issue was on the Road or the Ramp. |
| bridge_highway_segment | VARCHAR | YES | Additional information on the section of the Bridge/Highway where the incident took place. |
| latitude | DOUBLE | YES | Geo based Latitude of the incident location in decimal degrees |
| longitude | DOUBLE | YES | Geo based Longitude of the incident location in decimal degrees |
| community_board | VARCHAR | YES | Community boards are local representative bodies. There are 59 community boards throughout the City. For more information on Community Boards: [NYC government website](https://www.nyc.gov/site/cau/community-boards/community-boards.page) |

---

Source: https://motherduck.com/docs/getting-started/sample-data-queries/pypi

---
sidebar_position: 5
title: PyPi Data
description: Want to know how users find and install software you've developed for the Python Community? This DuckDB and MotherDuck database allows you to use SQL to perform data analysis on PyPi data.
---

import EmbeddedDive from '@site/src/components/EmbeddedDive';

## Explore the data

Interactive dashboard built on the DuckDB PyPI download stats. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/).

## About the dataset

PyPi is the Python Package Index, a repository of software packages for the Python programming language.
It is a central repository that allows users to find and install software developed and shared by the Python community. The dataset includes information about packages, releases, and downloads on the `duckdb` python package. It's refreshed **weekly** and you can visit the [DuckDB Stats dashboard](https://duckdbstats.com). ## How to query the dataset A dedicated shared database is maintained to query the dataset. :::note `aws-us-east-1` region only This database is only available for accounts in the `aws-us-east-1` region. ::: To attach it to your workspace, you can use the following command: ```sql ATTACH 'md:_share/duckdb_stats/1eb684bf-faff-4860-8e7d-92af4ff9a410' AS duckdb_stats; ``` ## Example queries The following queries assume that the current database connected is `duckdb_stats`. Run `use duckdb_stats` to switch to it. ### Get weekly download stats ```sql SELECT DATE_TRUNC('week', download_date) AS week_start_date, version, country_code, python_version, SUM(daily_download_sum) AS weekly_download_sum FROM duckdb_stats.main.pypi_daily_stats GROUP BY ALL ORDER BY week_start_date ``` ## Schema ### pypi_file_downloads This table contains the raw data. Each row represents a download from PyPi. 
| column_name | column_type | null | |--------------|----------------------------------------------------------------------------------------------------------------|------| | timestamp | TIMESTAMP | YES | | country_code | VARCHAR | YES | | url | VARCHAR | YES | | project | VARCHAR | YES | | file | STRUCT(filename VARCHAR, project VARCHAR, "version" VARCHAR, "type" VARCHAR) | YES | | details | STRUCT("installer" STRUCT("name" VARCHAR, "version" VARCHAR), "python" VARCHAR, "implementation" STRUCT("name" VARCHAR, "version" VARCHAR), "distro" STRUCT("name" VARCHAR, "version" VARCHAR, "id" VARCHAR, "libc" STRUCT("lib" VARCHAR, "version" VARCHAR)), "system" STRUCT("name" VARCHAR, "release" VARCHAR), "cpu" VARCHAR, "openssl_version" VARCHAR, "setuptools_version" VARCHAR, "rustc_version" VARCHAR, "ci" BOOLEAN) | YES | | tls_protocol | VARCHAR | YES | | tls_cipher | VARCHAR | YES | ### pypi_daily_stats This table is a daily aggregation of the raw data. It contains the following columns: | column_name | column_type | null | |-------------------|-------------|------| | load_id | VARCHAR | YES | | download_date | DATE | YES | | system_name | VARCHAR | YES | | system_release | VARCHAR | YES | | version | VARCHAR | YES | | project | VARCHAR | YES | | country_code | VARCHAR | YES | | cpu | VARCHAR | YES | | python_version | VARCHAR | YES | | daily_download_sum| BIGINT | YES | --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/stackoverflow-survey --- sidebar_position: 5 title: StackOverflow Survey Data description: Data from the StackOverflow Developer Survey from 2017 to 2024. --- import EmbeddedDive from '@site/src/components/EmbeddedDive'; import SQLExampleEditor from '@site/src/components/SQLExampleEditor'; ## Explore the data Interactive dashboard built on the survey data. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). 
## About the dataset

Each year, [Stack Overflow conducts a survey](https://survey.stackoverflow.co/) of developers to understand the trends in the developer community. The survey covers a wide range of topics, including programming languages, frameworks, databases, and platforms, as well as developer demographics, education, and career satisfaction.

Since 2017, Stack Overflow has provided a consistent schema and data format for the survey data, making it a great dataset for analyzing trends in the developer community over the years.

The source data is a series of CSV files that have been merged into a single schema with two tables for easy querying.

## How to query the dataset

This dataset is available as part of the `sample_data` database, which is automatically attached to every MotherDuck account.

## Example queries

### List the most popular programming languages in 2024

```sql
SELECT language, COUNT(*) AS count
FROM (
    SELECT UNNEST(STRING_SPLIT(LanguageHaveWorkedWith, ';')) AS language
    FROM sample_data.stackoverflow_survey.survey_results
    WHERE year = '2024'
) AS languages
GROUP BY language
ORDER BY count DESC;
```

### Top 10 countries with the most respondents in 2024

```sql
SELECT Country, COUNT(*) AS Respondents
FROM sample_data.stackoverflow_survey.survey_results
WHERE year = '2024'
GROUP BY Country
ORDER BY Respondents DESC
LIMIT 10;
```

### Correlation between remote work and job satisfaction in 2024

```sql
SELECT
    RemoteWork,
    AVG(CAST(JobSat AS DOUBLE)) AS AvgJobSatisfaction,
    COUNT(*) AS RespondentCount
FROM sample_data.stackoverflow_survey.survey_results
WHERE JobSat NOT IN ('NA', 'Slightly satisfied', 'Neither satisfied nor dissatisfied', 'Very dissatisfied', 'Very satisfied', 'Slightly dissatisfied')
  AND RemoteWork NOT IN ('NA')
  AND YEAR = '2024'
GROUP BY ALL
```

## Schema

### stackoverflow_survey.survey_results

This table contains all the survey results from 2017 to 2024. Each column represents a question from the survey.
As questions change from year to year, the columns may vary a bit and the table is quite large. ### stackoverflow_survey.survey_schema This table contains the schema of the survey results. `qname` is the name of the question, which is also the column name in the `survey_results` table. `question` is the full question text. | Column Name | Column Type | |---------------|-------------| | qname | VARCHAR | | question | VARCHAR | | qid | VARCHAR | | force_resp | VARCHAR | | type | VARCHAR | | selector | VARCHAR | | year | VARCHAR | --- Source: https://motherduck.com/docs/getting-started/sample-data-queries/stackoverflow --- sidebar_position: 5 title: StackOverflow Data description: Sample data from StackOverflow to use with DuckDB and MotherDuck to understand SQL-based data analytics. --- import EmbeddedDive from '@site/src/components/EmbeddedDive'; ## Explore the data Interactive dashboard built on the full Stack Overflow archive. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/). ## About the dataset [Stack Overflow](https://stackoverflow.com/) is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions. The dataset includes a complete **data dump up to May 2023**, covering posts, comments, users, badges, and related metrics. You can read more about the dataset in our blog series [part 1](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-1/) and [part 2](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-2/). ## How to query the dataset As this dataset is quite large, it's not part of the `sample_data` database. 
Instead, you can find it as a dedicated shared database. :::note `aws-us-east-1` region only This database is only available for accounts in the `aws-us-east-1` region. ::: To attach it to your workspace, you can use the following command: ```sql ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow; ``` ## Example queries The following queries assume that the current database connected is `stackoverflow`. Run `use stackoverflow` to switch to it. ### List the top 5 posts that received the most votes ```sql SELECT posts.Title, COUNT(votes.Id) AS VoteCount FROM posts JOIN votes ON posts.Id = votes.PostId GROUP BY posts.Title ORDER BY VoteCount DESC LIMIT 5; ``` ### Find the top 5 posts with the highest view count: ```sql SELECT Title, ViewCount FROM posts ORDER BY ViewCount DESC LIMIT 5; ``` ## Schema ### Badges | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | UserId | BIGINT | YES | | | | | Name | VARCHAR | YES | | | | | Date | TIMESTAMP | YES | | | | | Class | BIGINT | YES | | | | | TagBased | BOOLEAN | YES | | | | ### Comments | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostId | BIGINT | YES | | | | | Score | BIGINT | YES | | | | | Text | VARCHAR | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | UserId | BIGINT | YES | | | | | ContentLicense | VARCHAR | YES | | | | ### Post links | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | PostId | BIGINT | YES | | | | | RelatedPostId | BIGINT | YES | | | | | LinkTypeId | BIGINT | YES | | | | ### Posts | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostTypeId | BIGINT | YES | | | | | AcceptedAnswerId | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | Score | 
BIGINT | YES | | | | | ViewCount | BIGINT | YES | | | | | Body | VARCHAR | YES | | | | | OwnerUserId | BIGINT | YES | | | | | LastEditorUserId | BIGINT | YES | | | | | LastEditorDisplayName | VARCHAR | YES | | | | | LastEditDate | TIMESTAMP | YES | | | | | LastActivityDate | TIMESTAMP | YES | | | | | Title | VARCHAR | YES | | | | | Tags | VARCHAR | YES | | | | | AnswerCount | BIGINT | YES | | | | | CommentCount | BIGINT | YES | | | | | FavoriteCount | BIGINT | YES | | | | | CommunityOwnedDate | TIMESTAMP | YES | | | | | ContentLicense | VARCHAR | YES | | | | ### Tags | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | TagName | VARCHAR | YES | | | | | Count | BIGINT | YES | | | | | ExcerptPostId | BIGINT | YES | | | | | WikiPostId | BIGINT | YES | | | | ### Votes | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | PostId | BIGINT | YES | | | | | VoteTypeId | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | ### Users | column_name | column_type | null | key | default | extra | |---|---|---|---|---|---| | Id | BIGINT | YES | | | | | Reputation | BIGINT | YES | | | | | CreationDate | TIMESTAMP | YES | | | | | DisplayName | VARCHAR | YES | | | | | LastAccessDate | TIMESTAMP | YES | | | | | AboutMe | VARCHAR | YES | | | | | Views | BIGINT | YES | | | | | UpVotes | BIGINT | YES | | | | | DownVotes | BIGINT | YES | | | | --- Source: https://motherduck.com/docs/integrations/bi-tools/evidence --- sidebar_position: 3 title: Evidence description: Build code-based data products with Evidence connected to MotherDuck using SQL and markdown. --- import BlockWithBacktick from '@site/src/components/BlockWithBacktick'; [Evidence](https://evidence.dev/) is an open source, code-based alternative to drag-and-drop BI tools. Build polished data products with just SQL and markdown. 
## Getting started

Follow the [Evidence installation page](https://docs.evidence.dev/getting-started/install-evidence) and start from their template.

## Authenticate to MotherDuck

In development, you can configure the connection manually through the Evidence UI: open the settings page, which is typically at [http://localhost:3000/settings](http://localhost:3000/settings) when you run Evidence locally.

![img](../img/evidence_settings.png)

Then select 'DuckDB' as the connection type, and as the filename, use `'md:?motherduck_token=xxxx'` where `xxxx` is your [access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token). Finally, for the extension, select "No extension". Click `Save`.

![img](../img/evidence_duckdb.png)

In production, you can set [some global environments](https://docs.evidence.dev/deployment/environments#prod-environment). You need to set two environment variables:

- `EVIDENCE_DUCKDB_FILENAME='md:?motherduck_token=xxxx'`
- `EVIDENCE_DATABASE=duckdb`

## Displaying some data through SQL and Markdown

Once done, add a new page named `stackoverflow.md` in the `pages` folder and add the following code blocks to it.

First, we simply add some Markdown headers.

```md
---
title: Evidence & MotherDuck
---

# Stories with most score
```

Then, we query our data from the [HackerNews sample_data database](/getting-started/sample-data-queries/hacker-news.md) in MotherDuck. The query fetches the top stories (posts) from HackerNews.

```sql new_items
SELECT
    id,
    title,
    score,
    "by",
    strftime('%Y-%m-%d', to_timestamp(time)) AS date
FROM sample_data.hn.hacker_news
WHERE type = 'story'
ORDER BY score DESC
LIMIT 20;
```

Finally, we reference that query result, `new_items`, to generate a list in Markdown. The list contains the title (with the URL of the story), the date, the score and the author of the story.
```md
{#each new_items as item}

* [{item.title}](https://news.ycombinator.com/item?id={item.id}) {item.date} ⬆ {item.score} by [{item.by}](https://news.ycombinator.com/user?id={item.by})

{/each}
```

Open the page you created and you should see a result that looks like this:

![img](../img/evidence_hackernews.png)

---

Source: https://motherduck.com/docs/integrations/bi-tools/excel

---
sidebar_position: 7
sidebar_label: Microsoft Excel
title: Connect MotherDuck to Excel
description: Load MotherDuck data into Excel using the DuckDB ODBC driver on Windows or export options for macOS.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Use Excel's 'Get Data' flow with the DuckDB ODBC driver to load MotherDuck data into Excel. This setup works well for recurring reporting, analysis, ad hoc SQL exploration, finance models, and operational dashboards without having to rely on exported CSVs.

## Before you start

To get started, you'll need the following:

- Windows + Excel (ODBC is Windows-only for this flow)
- A MotherDuck access token (create one in the [MotherDuck token page](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token))
- Admin rights on your computer to install the ODBC driver

## Installation steps

### 1. Install the DuckDB ODBC driver

1. Download the latest DuckDB ODBC driver for Windows (amd64):
   - [duckdb_odbc-windows-amd64.zip](https://github.com/duckdb/duckdb-odbc/releases/latest/download/duckdb_odbc-windows-amd64.zip)
2. Extract the `.zip` file and run `odbc_install.exe` as Administrator (right click -> Run as administrator).

### 2. Configure the DuckDB System DSN

1. Open the ODBC Data Source Administrator:
   - 64-bit Excel: Start menu -> ODBC Data Sources (64-bit)
   - 32-bit Excel: Start menu -> ODBC Data Sources (32-bit)

   ![ODBC Data Sources in Windows](./img/ODBC-data-sources-windows.png)

2. Go to System DSN, select DuckDB, and click Configure.
![Select the DuckDB system DSN](./img/ODBC-data-source-duckdb.png) 3. Set Database to one of the following: - Recommended (scoped): `md:your_database_name` - Open scope: `md:` (allows access to any database) 4. Click OK to save. ![DuckDB ODBC configuration for MotherDuck](./img/ODBC-data-source-configuration.png) ### 3. Connect from Excel (Get Data) 1. In Excel, go to Data -> Get Data -> From Other Sources -> From ODBC. ![Excel Get Data menu](./img/getdata-excel.png) 2. Choose DuckDB from the DSN dropdown and click OK. ![From ODBC dialog in Excel](./img/from-ODBC-driver-excel.png) 3. On the credentials screen, choose Default or Custom and add this to the Connection string properties field: ```text motherduck_token= ``` ![DuckDB ODBC driver installer](./img/ODBC-driver-excel.png) 4. Click Connect. ### 4. Load or transform data Use the Navigator window to select tables and choose Load to bring data into Excel, or Transform Data to shape it in Power Query before loading. ## Excel ODBC on macOS Direct ODBC connectivity between Excel and MotherDuck is **not currently supported on macOS** due to a driver incompatibility. ### Why it doesn't work Excel on macOS uses the **iODBC** driver manager, but the DuckDB ODBC driver is built for **unixODBC**. These drivers are incompatible at the binary level. This is a [known issue](https://github.com/duckdb/duckdb-odbc/issues/40) being tracked by the DuckDB team. If necessary, you can build this driver yourself. ### Alternatives for macOS users #### Option 1: Export directly to Excel with DuckDB (CLI and drivers) DuckDB has an [Excel extension](https://duckdb.org/docs/stable/core_extensions/excel) that can write `.xlsx` files directly. This works with DuckDB CLI or any DuckDB driver, but cannot be used in the MotherDuck UI as we cannot currently export `.xlsx` files to your local file system. 
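Since the export works with any DuckDB driver, it can also be scripted. The following is a minimal sketch using the Python client, assuming the `duckdb` package is installed, the token is available in the `motherduck_token` environment variable, and the Excel extension can be loaded in that environment. The helper names `build_xlsx_export` and `export_table_to_xlsx` are hypothetical and only used for illustration.

```python
def build_xlsx_export(table: str, output_path: str) -> str:
    """Build the COPY statement that writes a table to an .xlsx file.

    `table` is interpolated as-is, so only pass trusted identifiers.
    """
    # Escape single quotes so the path stays a valid SQL string literal
    safe_path = output_path.replace("'", "''")
    return (
        f"COPY (SELECT * FROM {table}) "
        f"TO '{safe_path}' WITH (FORMAT xlsx, HEADER true);"
    )


def export_table_to_xlsx(table: str, output_path: str) -> None:
    """Run the export against MotherDuck (needs network access and a token)."""
    import duckdb  # imported lazily so the statement builder works without it

    conn = duckdb.connect("md:")
    try:
        conn.execute(build_xlsx_export(table, output_path))
    finally:
        conn.close()


if __name__ == "__main__":
    # my_database.my_table is a placeholder; replace it with your own table
    print(build_xlsx_export("my_database.my_table", "output.xlsx"))
```

The plain SQL form of the same export is: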
```sql -- Connect to MotherDuck and export to Excel ATTACH 'md:'; COPY (SELECT * FROM my_database.my_table) TO 'output.xlsx' WITH (FORMAT xlsx, HEADER true); ``` Or via command line: ```bash duckdb -c "ATTACH 'md:'; COPY (SELECT * FROM my_database.my_table) TO 'output.xlsx' WITH (FORMAT xlsx, HEADER true);" ``` #### Option 2: Use the MotherDuck Web UI Query your data in the [MotherDuck Web UI](https://app.motherduck.com) and export results: 1. Run your query in the MotherDuck UI 2. Click the download button to export as CSV 3. Open the CSV in Excel #### Option 3: Export to CSV via DuckDB CLI Use the DuckDB CLI to export query results to CSV: ```bash duckdb -c "ATTACH 'md:'; COPY (SELECT * FROM my_database.my_table) TO 'output.csv' (HEADER, DELIMITER ',');" ``` ## Tips - If you change your MotherDuck token, update the connection string properties in Excel. - If you use multiple databases, create separate DSNs (e.g., `DuckDB - analytics`, `DuckDB - finance`) with different `md:database` values. ## Troubleshooting ### How do I delete an existing MotherDuck connection in Excel? 1. In Excel, go to Data -> Queries & Connections. 2. Find the connection you want to remove, right click it, and choose Delete. ### How do I modify an existing MotherDuck connection? 1. In Excel, go to Data -> Queries & Connections. 2. Right click the connection and choose Properties. 3. Open the Definition tab and update the connection string (for example, update `motherduck_token=...`) and save. If you don't see the Definition tab, use Data -> Get Data -> Data Source Settings, select your DuckDB connection, then choose Change Source or Edit Permissions as needed. --- Source: https://motherduck.com/docs/integrations/bi-tools/hex --- sidebar_position: 1 title: Hex description: Connect Hex notebooks to MotherDuck using SQL data connections or Python cells for interactive analytics. 
---

import Image from '@theme/IdealImage';

[Hex](https://hex.tech/) is a software platform for collaborative data science and analytics using Python, SQL and no-code.

You have two ways to connect to MotherDuck using Hex:

- **Using SQL cells with a data connection**: MotherDuck is a supported [data connection in Hex](https://learn.hex.tech/docs/connect-to-data/data-connections/data-connections-introduction#supported-data-sources).
- **Using Python cells**: You can use Python cells to connect to MotherDuck and query data using DuckDB.

## Using SQL cells with a data connection

:::tip
When many human users query through the same MotherDuck data connection, consider using a [read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/). Hex will then route the queries to a dedicated Duckling per Hex kernel, up to the maximum flock size determined by your organization admin.

What this means in practice:
* Each workbook will get a stable backend for each unique data connection. Multiple users collaborating on the same workbook will share the Duckling to query faster on warm data caches.
* In a published app, each user will get a stable backend for each data connection to power their own unique exploration.
:::

To add a new data connection, head over to the Data browser in a new notebook and click on `Add data connection`.

![hex_data_browser](../img/hex_data_browser.png)

Select `MotherDuck` as the data source and fill in the required fields. The most important is the MotherDuck token, which you can find in the [MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token).

![hex_configuration](../img/hex_configuration.png)

Once done, you can use the data browser to explore the tables and columns and directly specify your data connection in your SQL cell.
![hex_data_browser](../img/hex_data_browser_2.png)

![hex_sql_cell](../img/hex_sql_cell.png)

### Query some data

Add another cell and run the following query, the same one used in the Python cells section later on this page:

```sql
SELECT
    dayname(tpep_pickup_datetime) AS day_of_week,
    strftime('%H', tpep_pickup_datetime) AS hour_of_day,
    COUNT(*) AS trip_count
FROM sample_data.nyc.taxi
GROUP BY day_of_week, hour_of_day
ORDER BY day_of_week, hour_of_day;
```

This produces both a table and a DataFrame, which you can use to generate data visualizations in the same way as shown for Python cells.

![hex_sql_result](../img/hex_sql_result.png)

## Using Python cells

:::tip Use Python 3.12 or later
When using Python cells in your environment to connect to MotherDuck, set your Hex project's Python version to 3.12 or later to ensure you have a compatible version of DuckDB pre-installed in your Hex environment. To change your Python version, go to **Settings** --> **Environment** and select **Python 3.12** or **Latest**.
:::

If you prefer programming in Python, you can use Python cells to connect to MotherDuck and start querying data. You can jump directly to the [Hex notebook](https://app.hex.tech/c0083b53-a04f-47b1-bff7-a9ff12590a9f/hex/5c85b3e2-3df7-4011-87a0-1fff63787d03/draft/logic) for a quickstart. The notebook highlights how you can query data using Python or SQL cells and display charts!

### Storing your MotherDuck token

The first step is to safely store your MotherDuck token. You can do this by [creating a new secret in Hex](https://learn.hex.tech/docs/environment-configuration/environment-views#secrets).

![Hex secrets](../img/hex_secrets.png)

Add your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md#authentication-using-an-access-token) under the name `motherduck_token`.
![Hex secrets2](../img/hex_secrets_2.png)

Once done, add the following Python cell to export your `motherduck_token` secret as an environment variable. SQL and Python processes detect this variable when authenticating to MotherDuck.

```python
# Passing the secrets as environment variable for Python/SQL cell auth
# Fill in your token as a Hex project secret https://learn.hex.tech/docs/environment-configuration/environment-views#secret
import os

# `motherduck_token` is the Hex project secret created in the previous step
os.environ["motherduck_token"] = motherduck_token
```

### Connecting to MotherDuck

DuckDB is already pre-installed in the Hex environment, so you can connect to MotherDuck directly. Add a Python cell and run the following code:

![Hex add cell](../img/hex_add_cell.png)

```python
import duckdb

# Connect to MotherDuck using Python
conn = duckdb.connect('md:')
```

### Query some data and display a chart

You can query data from the [sample_data database](/getting-started/sample-data-queries/datasets.mdx). This database is auto-attached for every MotherDuck user, so you can query it directly. The following example runs a query and returns the result as a pandas DataFrame to display as a chart.

Add another Python cell and run the following code:

```python
# Query sample_data database and convert it to a pandas dataframe for dataviz
peak_hours = conn.sql("""
    SELECT
        dayname(tpep_pickup_datetime) AS day_of_week,
        strftime('%H', tpep_pickup_datetime) AS hour_of_day,
        COUNT(*) AS trip_count
    FROM sample_data.nyc.taxi
    GROUP BY day_of_week, hour_of_day
    ORDER BY day_of_week, hour_of_day;""").to_df()
```

Now we can display the chart using a Visualization cell. Add a new Visualization cell, choose the `Chart` type, and select the DataFrame we just created, `peak_hours`.

![Hex chart](../img/hex_chart_df.png)

Finally, play with the parameters to obtain the following chart, which gives you a weekly view of the peak hours in New York City for the yellow cabs.
![Hex chart peak hours](../img/hex_chart_peak_hours.png) --- Source: https://motherduck.com/docs/integrations/bi-tools/index --- title: Business Intelligence Tools description: Use MotherDuck as a data source in tools for interactive data analysis and presentation --- # Business Intelligence Tools MotherDuck integrates with popular business intelligence tools to help you analyze and visualize your data. ## Included pages - [Hex](https://motherduck.com/docs/integrations/bi-tools/hex): Connect Hex notebooks to MotherDuck using SQL data connections or Python cells for interactive analytics. - [Evidence](https://motherduck.com/docs/integrations/bi-tools/evidence): Build code-based data products with Evidence connected to MotherDuck using SQL and markdown. - [Superset & Preset](https://motherduck.com/docs/integrations/bi-tools/superset-preset): Build dashboards with Apache Superset or Preset connected to MotherDuck via the DuckDB SQLAlchemy driver. - [Metabase](https://motherduck.com/docs/integrations/bi-tools/metabase): Connect self-hosted Metabase to MotherDuck or local DuckDB databases using the DuckDB driver plugin. - [Tableau](https://motherduck.com/docs/integrations/bi-tools/tableau): Connect Tableau Cloud, Desktop, or Server to MotherDuck for interactive dashboards and reports. - [Connect MotherDuck to Excel](https://motherduck.com/docs/integrations/bi-tools/excel): Load MotherDuck data into Excel using the DuckDB ODBC driver on Windows or export options for macOS. - [Microsoft Power BI](https://motherduck.com/docs/integrations/bi-tools/powerbi): Connect Power BI Desktop or Power BI Service to MotherDuck for interactive dashboards and reports. --- Source: https://motherduck.com/docs/integrations/bi-tools/metabase --- sidebar_position: 5 title: Metabase description: Connect self-hosted Metabase to MotherDuck or local DuckDB databases using the DuckDB driver plugin. 
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

[Metabase](https://www.metabase.com/) is an open source analytics/BI platform that provides intuitive data visualization and exploration capabilities. This guide details how to connect Metabase to both local DuckDB databases and MotherDuck.

## Prerequisites

- Metabase installed (self-hosted)
- Admin access to your Metabase instance
- For MotherDuck connections: a valid MotherDuck token

## Metabase Cloud

Metabase Cloud does not currently support installing custom drivers. Support for the DuckDB/MotherDuck driver on Metabase Cloud is under development. Until Cloud support is available, use self-hosted Metabase to connect to DuckDB or MotherDuck.

## Self-hosted Metabase

### Install the DuckDB driver

1. Create a `Dockerfile` that includes the latest Metabase plus the DuckDB driver:

   ```dockerfile
   FROM eclipse-temurin:21-jre

   ENV MB_PLUGINS_DIR=/plugins
   RUN mkdir -p ${MB_PLUGINS_DIR} /app

   # Latest Metabase
   ADD https://downloads.metabase.com/latest/metabase.jar /app/metabase.jar

   # Latest DuckDB driver
   ADD https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar ${MB_PLUGINS_DIR}/

   EXPOSE 3000
   CMD ["java", "-jar", "/app/metabase.jar"]
   ```

2. Build and run:

   ```bash
   docker build -t metabase-duckdb:latest .
   docker run -d --name metaduck -p 3000:3000 -e MB_PLUGINS_DIR=/plugins metabase-duckdb:latest
   ```

Tip: For reproducible builds, pin versions instead of `latest`:

```dockerfile
# Example of pinning versions (replace X.Y.Z)
ADD https://downloads.metabase.com/vX.Y.Z/metabase.jar /app/metabase.jar
ADD https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/download/1.X.Y/duckdb.metabase-driver.jar ${MB_PLUGINS_DIR}/
```

Note: Use a Debian/Ubuntu-based JRE image (not Alpine) to avoid glibc issues with the DuckDB driver.

1.
Download the latest DuckDB driver `.jar`:

   ```bash
   curl -L -o duckdb.metabase-driver.jar \
     https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar
   ```

1. Copy it to the Metabase plugins directory:
   - Standard installation (example): If your `metabase.jar` is at `~/app/metabase.jar`, place the driver in `~/app/plugins/`

     ```bash
     mkdir -p ~/app/plugins
     mv duckdb.metabase-driver.jar ~/app/plugins/
     ```

   - On Mac: The plugins directory is `~/Library/Application Support/Metabase/Plugins/`

     ```bash
     mkdir -p "${HOME}/Library/Application Support/Metabase/Plugins/"
     mv duckdb.metabase-driver.jar "${HOME}/Library/Application Support/Metabase/Plugins/"
     ```

   - Custom location or Docker: set `MB_PLUGINS_DIR` to point Metabase at your plugins directory and place the `.jar` there.

1. Restart Metabase so it picks up the new plugin.

1. SSH to the host and download to the plugins directory. Replace user/host and adjust `MB_PLUGINS_DIR` as needed.

   ```bash
   # The quoted heredoc is sent to the remote shell unexpanded,
   # so every variable below is evaluated on the remote host.
   ssh user@your-host 'bash -l -s' <<'EOF'
   set -euo pipefail
   MB_PLUGINS_DIR="${MB_PLUGINS_DIR:-/app/plugins}"
   DRIVER_URL=https://github.com/MotherDuck-Open-Source/metabase_duckdb_driver/releases/latest/download/duckdb.metabase-driver.jar
   mkdir -p "$MB_PLUGINS_DIR"
   if command -v wget >/dev/null; then
     wget -qO "$MB_PLUGINS_DIR/duckdb.metabase-driver.jar" "$DRIVER_URL"
   else
     curl -L -o "$MB_PLUGINS_DIR/duckdb.metabase-driver.jar" "$DRIVER_URL"
   fi
   EOF
   ```

2. Restart Metabase on the remote host:
   - systemd: `ssh user@your-host 'sudo systemctl restart metabase'`
   - Docker: `ssh user@your-host 'docker restart '`

:::important
Restart required: Metabase must be restarted after adding or upgrading plugins. Hot-reload of drivers is not supported.
::: :::tip Compatibility and upgrades: New DuckDB driver releases are designed to be backward compatible with recent Metabase versions. Upgrading to the latest driver is recommended for bug fixes and stability. If you run a significantly older Metabase version, validate in staging first. ::: ### Add your database connection After installing the driver, you can add MotherDuck as a data source in Metabase. 1. Log in to Metabase with admin credentials 2. Navigate to **Admin Settings** > **Databases** > **Add Database** 3. Select **DuckDB** as the database type :::note Since DuckDB does not do implicit casting by default, the `old_implicit_casting` config is currently necessary for datetime filtering in Metabase to function. It's recommended to keep it set. ::: #### Connecting to MotherDuck To connect to MotherDuck: 1. **Database name**: In the Database file field, enter `md:[database_name]` where `[database_name]` is your MotherDuck database name 2. **MotherDuck token**: Paste your MotherDuck token (retrieve from the [MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md)) 3. **Configuration**: Enable `old_implicit_casting` (recommended) for proper datetime handling ![Example](../img/metabase_motherduck.png) ### DuckLake on Metabase DuckLake is supported with the DuckDB driver in Metabase. Use the latest DuckDB driver release and a DuckDB version that supports DuckLake (DuckDB v1.3.2 or newer is recommended). #### MotherDuck-managed DuckLake If your DuckLake database is managed by MotherDuck, you can connect the same way you connect to any MotherDuck database: 1. Select DuckDB as the database type 2. Database file: `md:[ducklake_database_name]` 3. MotherDuck token: paste your token 4. Keep `old_implicit_casting` enabled (recommended) No extra Init SQL is required. Query your tables normally in Metabase. 
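Before saving the connection in Metabase, it can help to confirm that the same `md:[ducklake_database_name]` value and token work from a plain DuckDB client. The following is a minimal sketch, assuming the `duckdb` Python package is installed and the token is supplied through the `motherduck_token` environment variable. The database name `my_ducklake` and the helper names are hypothetical placeholders.

```python
import os


def motherduck_database_file(database_name: str) -> str:
    """Build the value entered in Metabase's Database file field."""
    if not database_name or database_name.startswith("md:"):
        raise ValueError("pass the bare database name, without the md: prefix")
    return f"md:{database_name}"


def list_tables(database_name: str) -> list:
    """Open the database and list its tables (needs network access and a token)."""
    import duckdb  # imported lazily so the helper above works without it

    if "motherduck_token" not in os.environ:
        raise RuntimeError("set the motherduck_token environment variable first")
    conn = duckdb.connect(motherduck_database_file(database_name))
    try:
        return conn.sql("SHOW TABLES").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # "my_ducklake" is a placeholder database name
    print(motherduck_database_file("my_ducklake"))
```

If `list_tables` returns the tables you expect, the same database name and token should work in the Metabase connection form.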
#### Own compute + DuckLake catalog (attach in Init SQL)

If you want Metabase’s embedded DuckDB to query a DuckLake stored externally, attach the DuckLake catalog in the connection’s Init SQL. This works for both MotherDuck-managed catalogs and self-managed catalogs.

- Init SQL for a MotherDuck-managed DuckLake catalog:

  ```sql
  -- Attaches the DuckLake metadata catalog hosted in MotherDuck
  ATTACH 'ducklake:md:__ducklake_metadata_[database_name]' AS dl1;
  ```

- Init SQL for a self-managed DuckLake catalog (local metadata DB) with S3 data path:

  ```sql
  -- Replace the path to your DuckLake metadata DB and bucket prefix
  ATTACH 'ducklake:/duckdb/my_ducklake_metadata.ducklake' AS dl1 (
    DATA_PATH 's3://my_bucket/lake/'
  );
  ```

Once attached, reference tables with the alias, for example: `FROM dl1.my_table`.

### Connecting to a Local DuckDB database

To connect to a local DuckDB database:

1. **Database file**: Enter the full path to your DuckDB file (e.g., `/path/to/database.db`)
2. **Configuration**: Enable `old_implicit_casting` (recommended) to ensure proper datetime filtering
3. **Additional settings**:
   - **Read only**: Toggle as appropriate for your use case
   - **Naming strategy**: Choose your preferred table/field naming strategy

:::note
DuckDB's concurrency model supports either one process with read/write permissions, or multiple processes with read permissions, but not both at the same time. This means you will not be able to open a local DuckDB in read-only mode, then the same DuckDB in read-write mode in a different process.
:::

![Example](../img/metabase_local_duckdb.png)

## Configuration Best Practices

- **Connection pooling**: For production instances, set an appropriate connection pool size based on expected concurrent users
- **Query timeouts**: Configure timeouts in Metabase settings to prevent long-running queries from affecting system performance
- **Data access**: Use database-level permissions in Metabase to control who can access which data sources

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Driver not detected | Ensure driver is in the correct plugins directory and Metabase has been restarted |
| Connection failures | Verify database path (local) or database name and token (MotherDuck) |
| Permission errors | Check file permissions for local databases |
| Datetime filtering issues | Enable `old_implicit_casting` in the connection settings |
| Add MotherDuck token in the connection string | Specify a correct MotherDuck token or MotherDuck database name after the `md:` prefix |

### Updating the MotherDuck token

Metabase keeps long-lived database connections alive. When you update only the MotherDuck token while an existing connection is still cached, Metabase raises `Connection error: Can't open a connection to same database file with a different configuration than existing connections`. Use one of the following approaches to refresh the token successfully:

1. **Add a cache buster while editing the database.** Edit the connection under **Admin Settings** > **Databases**, then update both the **Database file** field and the **MotherDuck Token** field with a small cache-busting change (for example, append `?refresh=20250917`). Updating both values at the same time forces Metabase to treat the configuration as new. Save the connection, then optionally revert the fields to their clean values once the change is persisted.
2. **Restart Metabase before updating the token.** Restart the Metabase service and, immediately after it starts, go straight to `/admin/databases` to update the token field. Do not open the Metabase home screen before editing the database connection, or the previous connection (with the old token) will be re-established.

---

Source: https://motherduck.com/docs/integrations/bi-tools/powerbi/index

---
title: Microsoft Power BI
description: Connect Power BI Desktop or Power BI Service to MotherDuck for interactive dashboards and reports.
---

[Power BI](https://www.microsoft.com/en-us/power-platform/products/power-bi) is an interactive data visualization product developed by Microsoft. You can connect Power BI to MotherDuck through the built-in PostgreSQL database connector using MotherDuck's Postgres endpoint.

## Included pages

- [Power BI Desktop with MotherDuck](https://motherduck.com/docs/integrations/bi-tools/powerbi/powerbi-desktop): Connect Power BI Desktop to MotherDuck using the Postgres endpoint for dashboards and reports.
- [Power BI Service with MotherDuck](https://motherduck.com/docs/integrations/bi-tools/powerbi/powerbi-service): Publish Power BI reports to the cloud using the On-Premises Data Gateway and MotherDuck's Postgres endpoint. - [Power BI custom connector (legacy)](https://motherduck.com/docs/integrations/bi-tools/powerbi/powerbi-custom-connector): Connect Power BI to MotherDuck using the DuckDB ODBC driver and Power Query custom connector. --- Source: https://motherduck.com/docs/integrations/bi-tools/powerbi/powerbi-custom-connector --- sidebar_position: 3 sidebar_label: Power BI Custom Connector (Legacy) title: Power BI custom connector (legacy) description: Connect Power BI to MotherDuck using the DuckDB ODBC driver and Power Query custom connector. --- import DocImage from '@site/src/components/DocImage'; :::warning[Legacy] The custom connector is a legacy approach. Use the [Postgres endpoint setup](./powerbi-desktop.mdx) instead for a simpler connection that doesn't require installing drivers or custom extensions. ::: The open-source [DuckDB Power Query Connector](https://github.com/motherduckdb/duckdb-power-query-connector/) lets you connect Power BI to DuckDB and MotherDuck using the DuckDB ODBC driver. ## Installing 1. Download the latest [DuckDB ODBC driver for Windows (x86_64/AMD64)](https://github.com/duckdb/duckdb-odbc/releases/download/v1.4.4.0/duckdb_odbc-windows-amd64.zip). See [the releases page](https://github.com/duckdb/duckdb-odbc/releases) for other versions and architectures. For more information about the Windows ODBC Driver, see the [DuckDB Docs page on DuckDB ODBC API on Windows](https://duckdb.org/docs/stable/clients/odbc/windows). 2. Extract the `.zip` archive. Run `odbc_install.exe`. If Windows displays a security warning, click "More information" then "Run Anyway". 3. 
Optionally, verify the installation in the Registry Editor: - Open Registry Editor by running `regedit` - Navigate to `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\DuckDB` - Confirm the Driver field shows your installed version - If incorrect, delete the `DuckDB` registry key and reinstall 4. Configure Power BI security settings to allow loading of custom extensions: - Go to File -> Options and settings -> Options -> Security -> Data Extensions - Enable "Allow any extensions to load without validation or warning" - 5. Download the latest version of the DuckDB Power Query extension: - [duckdb-power-query-connector.mez](https://github.com/MotherDuck-Open-Source/duckdb-power-query-connector/releases/latest/download/duckdb-power-query-connector.mez) 6. Create the Custom Connectors directory if it does not yet exist: - Navigate to `[Documents]\Power BI Desktop\Custom Connectors` - Create this folder, if it doesn't exist - Note: If this location does not work you may need to place this in your OneDrive Documents folder instead 7. Copy the `duckdb-power-query-connector.mez` file into the Custom Connectors folder 8. Restart Power BI Desktop ## How to use with Power BI 1. In Power BI Desktop, click "Get Data" -> "More..." 2. Search for "DuckDB" in the connector search box and select the DuckDB connector 3. For MotherDuck connections, you'll need to provide: - Database Location: Use the `md:` prefix followed by your database name (for example, `md:my_database`). This can also be a local file path (for example, `~\my_database.db`) or an in-memory database (`:memory:`). - MotherDuck Token: Get your token from [MotherDuck's token page](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token). *For local DuckDB connections:* Enter "localtoken" to enable the connection. - Read Only (Optional): Set to `true` if you only need read access. 
- Saas_mode (Optional): Set to `true` to disable [DuckDB extensions](../../../concepts/duckdb-extensions.md). - Attach_mode (Optional): Set to `single` to scope the connection to one database (recommended for BI-tool catalog browsers, which can be confused by multiple attached databases). Leave blank to use the default workspace mode and see all databases in your workspace. See [Attach modes](/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/). 4. Click "OK". 5. Click "Connect". 6. Select the table(s) you want to import. Click "Load". 7. You can query your data and create visualizations. 8. After connecting, you can: - Browse and select tables from your MotherDuck or DuckDB database - Use "Transform Data" to modify your queries before loading - Write custom SQL queries using the "Advanced Editor" - Import multiple tables in one go 9. Power BI maintains the connection to your MotherDuck or DuckDB database, letting you: - Refresh data automatically or on-demand - Create relationships between tables - Build visualizations and dashboards - Share reports with other users (requires proper gateway setup) ## Use custom data connectors with an on-premises data gateway You can use custom data connectors with an on-premises data gateway to connect to data sources that are not supported by default. To do this, you need to install the on-premises data gateway and configure it to use the custom data connector. For more information, see [Use custom data connectors with an on-premises data gateway in Power BI](https://learn.microsoft.com/en-us/power-bi/connect-data/service-gateway-custom-connectors). There are some limitations with using a custom connector with an on-premises data gateway: - The folder you create must be accessible to the background gateway service. Folders under user Windows folders or system folders typically aren't accessible. The on-premises data gateway app shows a message if the folder isn't accessible. 
This limitation doesn't apply to the on-premises data gateway (personal mode). - If your custom connector is on a network drive, include the fully qualified path in the on-premises data gateway app. - You can only use one custom connector data source when working in DirectQuery mode. Multiple custom connector data sources don't work with DirectQuery. ## Additional information - [Power BI documentation](https://learn.microsoft.com/en-us/power-bi/connect-data/) - [DuckDB Power Query Connector](https://github.com/motherduckdb/duckdb-power-query-connector/) ## Troubleshooting ### Missing VCRUNTIME140.dll If you receive an error about missing `VCRUNTIME140.dll`, you need to install the Microsoft Visual C++ Redistributable. You can download it from [Microsoft's download page](https://www.microsoft.com/en-us/download/details.aspx?id=52685). ### Visual C++ and ODBC issues :::note These steps are particularly relevant for Windows Server environments, especially for Windows Server 2019, but may also help resolve issues on other Windows versions. ::: If you encounter issues with ODBC connectivity or receive errors related to Visual C++ libraries, try these troubleshooting steps: 1. Reinstall the Microsoft Visual C++ Redistributable: - Download the latest version from [Microsoft's official website](https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170) for your architecture - Run the installer with administrator privileges - Restart your computer after installation - Try connecting to MotherDuck again 2. If you're still experiencing issues, you can use the ODBC Test tool to diagnose the connection: - Open the ODBC Test tool (typically available in Windows SDK) - Look for a dropdown menu labeled "hstmt 1: ..." 
- Select this option to run test queries - If queries work in the ODBC Test tool but not in Power BI, this indicates a Power BI-specific configuration issue If you continue to experience problems after trying these steps: - Verify that your MotherDuck token is valid and hasn't expired - Check that your network allows connections to MotherDuck's services - Confirm you have the latest version of the DuckDB Power Query Connector installed If you're still experiencing issues, reach out to us at [support@motherduck.com](mailto:support@motherduck.com) and we'll be happy to help you troubleshoot the issue. --- Source: https://motherduck.com/docs/integrations/bi-tools/powerbi/powerbi-desktop --- sidebar_position: 1 sidebar_label: Power BI Desktop title: Power BI Desktop with MotherDuck description: Connect Power BI Desktop to MotherDuck using the Postgres endpoint for dashboards and reports. --- import DocImage from '@site/src/components/DocImage'; :::info[Preview] The Postgres endpoint is in [preview](/about-motherduck/feature-stages/). Features and behavior may change. ::: :::warning[Looking for the custom connector?] The DuckDB custom connector is a legacy approach. If you still need it, see the [legacy custom connector guide](./powerbi-custom-connector.md). ::: ## Before you start You'll need: - [Power BI Desktop](https://www.microsoft.com/en-us/power-platform/products/power-bi/desktop) installed on Windows - A [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) - Your Postgres host, which you can find at [MotherDuck Postgres settings](https://app.motherduck.com/settings/postgres) (for example, `pg.us-east-1-aws.motherduck.com`) ## Connect to MotherDuck 1. In Power BI Desktop, click **Get data**. 2. Search for **PostgreSQL database** in the connector list and select it. 3. Fill in the connection details: - **Server**: Your Postgres host (for example, `pg.us-east-1-aws.motherduck.com`). 
You can find this at [MotherDuck Postgres settings](https://app.motherduck.com/settings/postgres). - **Database**: Your database or share name in MotherDuck (for example, `sample_data`). 4. Select a data connectivity mode: - **DirectQuery**: Queries run against MotherDuck in real time. Best for dashboards that need up-to-date data. - **Import**: Loads a snapshot of the data into Power BI's in-memory model. Best when you want fast local interactions and can refresh on a schedule. 5. Click **OK**. 6. When prompted for credentials, select **Database** on the left and enter: - **User name**: `postgres` - **Password**: Your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) 7. Click **Connect**. In the Navigator, select the tables you want to use and click **Load**. 8. You can build visualizations with your MotherDuck data. ## Connection parameters | Parameter | Value | |-----------|-------| | **Server** | `pg.-aws.motherduck.com` (find yours at [Postgres settings](https://app.motherduck.com/settings/postgres)) | | **Database** | Your database name or share name | | **User name** | `postgres` | | **Password** | Your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) | ## Additional information - [Postgres endpoint reference](/sql-reference/postgres-endpoint) for connection parameters, SSL options, and limitations - [Connect through the Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) for a general how-to guide - [Power BI documentation](https://learn.microsoft.com/en-us/power-bi/connect-data/) --- Source: https://motherduck.com/docs/integrations/bi-tools/powerbi/powerbi-service --- sidebar_position: 2 sidebar_label: Power BI Service title: Power BI Service with MotherDuck description: Publish Power BI reports to the cloud using the On-Premises Data Gateway and MotherDuck's Postgres endpoint. 
--- import DocImage from '@site/src/components/DocImage'; :::info[Preview] The Postgres endpoint is in [preview](/about-motherduck/feature-stages/). Features and behavior may change. ::: Power BI Service is the cloud-based version of Power BI that lets you publish, share, and schedule refreshes for reports and dashboards. To connect Power BI Service to MotherDuck, you need a Microsoft On-Premises Data Gateway that bridges the cloud service to MotherDuck's Postgres endpoint. Both **Import** and **DirectQuery** modes work through the gateway. ## Before you start You'll need: - A published `.pbix` report connected to MotherDuck through the [Power BI Desktop setup](./powerbi-desktop.mdx) - A [Power BI Pro or Premium Per User](https://www.microsoft.com/en-us/power-platform/products/power-bi/pricing) license (required for sharing reports and using the standard gateway) - A [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) - A Windows machine to host the gateway (see [Microsoft's gateway requirements](https://learn.microsoft.com/en-us/data-integration/gateway/service-gateway-install#requirements)) ## Install the gateway 1. Download the standard gateway installer from [Microsoft's gateway download page](https://aka.ms/on-premises-data-gateway-installer). Download the **standard (enterprise) gateway**, not the personal mode gateway. 2. Run the installer and accept the default installation path. 3. After installation, the configuration wizard opens. Sign in with your **Microsoft work or school account** (the one associated with your Power BI tenant). 4. Select **Register a new gateway on this computer**. 5. Enter a gateway name (for example, `MD-PG-Gateway`) and a recovery key. Store the recovery key securely. 6. Click **Configure** and wait for registration to complete. **Verify:** The configuration wizard shows "The gateway is online and ready to be used." 
The Windows service `On-premises data gateway service` should be running in `services.msc`. ## Add a MotherDuck data source 1. In [Power BI Service](https://app.powerbi.com), click the **Settings gear** and select **Manage connections and gateways**. 2. Verify your gateway shows **Online**. 3. Click **+ New** and select **On-premises**. 4. Fill in the connection details: | Field | Value | |-------|-------| | **Gateway cluster name** | Select your gateway | | **Connection name** | A descriptive name (for example, `MotherDuck-PG-sample_data`) | | **Data Source Type** | **PostgreSQL** | | **Server** | Your Postgres host (for example, `pg.us-east-1-aws.motherduck.com`) | | **Database** | Your MotherDuck database name | | **Authentication method** | **Basic** | | **Username** | `postgres` | | **Password** | Your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) | | **Encrypted Connection** | Checked | | **Privacy Level** | Organizational | 5. Click **Create**. :::warning The **Server** and **Database** values must match your `.pbix` file character-for-character. If they differ, the published dataset won't find the gateway data source. ::: ## Publish and connect a report 1. In Power BI Desktop, publish your report: **File > Publish > Publish to Power BI** and select a workspace. 2. In Power BI Service, go to the workspace and find the semantic model (dataset). 3. Open **Settings** for the semantic model and expand **Gateway and cloud connections**. 4. Map the connection to your gateway data source. 5. Under **Data source credentials**, click **Edit credentials** and enter: - Authentication method: **Basic** - User name: `postgres` - Password: Your MotherDuck access token - Encrypted connection: Checked 6. Click **Sign in**. ## Set up scheduled refresh For reports using **Import** mode, you can configure automatic data refreshes. 1. In the semantic model settings, expand **Refresh**. 2. 
Toggle **Keep your data up to date** to **On**. 3. Set your refresh frequency and time zone. 4. Click **Apply**. To verify, trigger a manual refresh: open the semantic model's three-dot menu and select **Refresh now**. All steps should complete with green check marks. ## DirectQuery through the gateway For reports using **DirectQuery** mode, queries run against MotherDuck in real time through the gateway. No scheduled refresh is needed since data is always live. After publishing and mapping the gateway data source (steps above), DirectQuery reports work automatically in Power BI Service. ## Connection parameters | Parameter | Value | |-----------|-------| | **Server** | `pg.-aws.motherduck.com` (find yours at [Postgres settings](https://app.motherduck.com/settings/postgres)) | | **Database** | Your database name | | **Username** | `postgres` | | **Password** | Your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) | | **Encrypted Connection** | Checked | ## Troubleshooting ### Gateway shows offline Check the gateway machine is on, connected to the network, and the `On-premises data gateway service` Windows service is running. Restart the service if needed. ### Firewall blocking port 5432 If `Test-NetConnection -ComputerName pg.us-east-1-aws.motherduck.com -Port 5432` returns `TcpTestSucceeded: False`, add an outbound firewall rule allowing TCP 5432 to the MotherDuck Postgres host. ### SSL/TLS handshake failure MotherDuck uses certificates from a publicly trusted CA, so the gateway should trust them by default. If you see "The remote certificate is invalid," run Windows Update to refresh the root CA store, or manually import the ISRG Root X1 certificate into the machine-level Trusted Root Certification Authorities store. After importing, restart the gateway service. ### Credential errors - The username must be `postgres`. 
- The password is your **MotherDuck access token** (starting with `md_`), not your web UI password. - Check for trailing whitespace in the token. ### Published dataset doesn't see the gateway The **Server** and **Database** values in the gateway data source must match the `.pbix` file exactly, including case. Recreate the data source with the correct values if they differ. ## Additional information - [Postgres endpoint reference](/sql-reference/postgres-endpoint) for connection parameters, SSL options, and limitations - [Connect through the Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) for a general how-to guide - [Microsoft gateway documentation](https://learn.microsoft.com/en-us/power-bi/connect-data/service-gateway-onprem) - [Power BI Service documentation](https://learn.microsoft.com/en-us/power-bi/fundamentals/power-bi-service-overview) --- Source: https://motherduck.com/docs/integrations/bi-tools/superset-preset --- sidebar_position: 4 title: Superset & Preset description: Build dashboards with Apache Superset or Preset connected to MotherDuck via the DuckDB SQLAlchemy driver. --- import HorizontalLayout from '@site/src/components/HorizontalLayout'; [Apache Superset](https://superset.apache.org/) is a powerful, open-source data exploration and visualization platform designed to be intuitive and interactive. It allows data professionals to quickly integrate and analyze data from various sources, creating insightful dashboards and charts for better decision making. [Preset](https://preset.io/) is a cloud-native, user-friendly platform built on Apache Superset. It offers enhanced capabilities and managed services to leverage the power of Superset without needing to handle installation and maintenance. In this guide, we'll cover how you can use MotherDuck with either Superset or Preset. 
## Superset

### Setup

The easiest way to get started locally with Superset is to use their [docker-compose configurations](https://superset.apache.org/docs/installation/installing-superset-using-docker-compose/).

### Adding a database connection to MotherDuck

To make Superset work with DuckDB and MotherDuck, install two extra Python packages in your local Superset environment:

- DuckDB SQLAlchemy driver [duckdb-engine](https://github.com/Mause/duckdb_engine)
- DuckDB [duckdb](https://github.com/duckdb/duckdb)

1. Clone the [Superset repository](https://github.com/apache/superset):

```bash
git clone https://github.com/apache/superset.git
```

2. Create a new file at `superset/docker/requirements-local.txt` and add the following packages:

```text
duckdb-engine
duckdb
```

3. Build or run the Docker container, depending on whether this is the first time you run it, with the following command:

```bash
# First time running it
docker-compose up --build

# Subsequent runs
docker-compose up
```

4. Once the container is running, you can access the Superset UI at [http://localhost:8088](http://localhost:8088) or at the address you specified in the `docker-compose.yml` file.

5. Once you are logged in, head over to "Settings" and click on "Database Connections", then click on "+ Database".
![Superset Settings menu showing Database Connections option](./img/superset-database-connections-menu.png)
![Superset Add Database button](./img/superset-add-database.png)
6. In the Dropdown, pick "MotherDuck", then enter the database name that you want to connect to and the MotherDuck token of the user or service account. :::note If MotherDuck isn't listed, there's probably an error in the installation of the `duckdb-engine`. Review the installation steps under (2) to install this extra python package. ::: :::info `Database name` is **optional**. Instead of specifying a database name, you can leave it empty to connect to all databases. :::
![Superset dropdown showing MotherDuck option](./img/superset-select-motherduck.png)
![Superset MotherDuck connection form with database name and token fields](./img/superset-motherduck-connection.png)
7. Finally, click "Test connection" to check that your token and connection are valid, then click "Connect".

Now your MotherDuck database is available in Superset and you can start querying data and building dashboards!

## Preset

### Setup

You can register a Preset account for [free](https://preset.io/pricing/) (up to 5 users). After creating your account, you will need to create a workspace, and you will then be prompted to connect to your data source.

### Adding your first database connection to MotherDuck

When you first set up Preset, you will be offered the option to create a connection to a database. Preset has a direct integration with MotherDuck, making the connection process simpler.

1. In the Database Connection dropdown in "Connect your first database", select "MotherDuck" and enter your MotherDuck credentials and database information.

:::note
The Database Name needs to be prefixed with `md:` to connect to MotherDuck. The Access Token is the token you created in the [MotherDuck dashboard](https://app.motherduck.com).
:::
![Preset database connection dropdown with MotherDuck option](./img/preset-select-motherduck.png)
![Preset MotherDuck credentials form](./img/preset-motherduck-credentials.png)
2. Click "Connect" to verify your connection is valid. Now your MotherDuck database is available in Preset and you can start creating dashboards immediately! :::info You can connect to multiple databases using a single MotherDuck connection. ::: ### Adding additional database connections When adding more database connections to Preset, you can choose the option of "Get MotherDuck token". This generates a new token from the MotherDuck account you are logged into. 1. Add a database connection by going to "Settings", then "Database Connections". In the Database Connections page, click on "+ Database" in the top right corner.
![Preset Settings showing Database Connections page](./img/preset-database-connections-menu.png)
![Preset Add Database button](./img/preset-add-database.png)
2. In the dropdown, select "MotherDuck" (see above). 3. Enter your MotherDuck credentials and database information. Here you have the option to generate a new token using the `Get MotherDuck token` button or use a token you previously created. ![Preset MotherDuck credentials form with Get MotherDuck token option](./img/preset-motherduck-token.png) :::caution Given that usually BI tools such as Preset and Superset are connected to service accounts, we do not recommend the "Get MotherDuck token" option for production systems but only for testing. For production systems the recommended approach is to generate an access token for the dedicated service account using the MotherDuck REST API and connect this account to Preset instead. ::: ## Related content - [SQLAlchemy with DuckDB and MotherDuck](/docs/integrations/language-apis-and-drivers/python/sqlalchemy/) - [Authenticating to MotherDuck](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/) - [Managing Service Accounts](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/) --- Source: https://motherduck.com/docs/integrations/bi-tools/tableau/index --- title: Tableau description: Connect Tableau Cloud, Desktop, or Server to MotherDuck for interactive dashboards and reports. --- [Tableau](https://www.tableau.com/) is a widely-used business intelligence and data visualization platform that enables data analysts to build interactive dashboards and reports. You can connect Tableau Cloud to MotherDuck through the built-in PostgreSQL connector using MotherDuck's Postgres endpoint. For Tableau Desktop and Server, use the DuckDB JDBC connector. ## Included pages - [Tableau Cloud with MotherDuck](https://motherduck.com/docs/integrations/bi-tools/tableau/tableau-cloud): Connect Tableau Cloud to MotherDuck using the Postgres endpoint for dashboards and reports. 
- [Tableau Desktop and Server with MotherDuck](https://motherduck.com/docs/integrations/bi-tools/tableau/tableau-desktop): Connect Tableau Desktop or Server to MotherDuck using the DuckDB JDBC driver and Tableau connector. - [Tableau Bridge (legacy)](https://motherduck.com/docs/integrations/bi-tools/tableau/tableau-bridge): Connect Tableau Cloud to MotherDuck using Tableau Bridge and the DuckDB JDBC connector. --- Source: https://motherduck.com/docs/integrations/bi-tools/tableau/tableau-bridge --- sidebar_position: 3 sidebar_label: Tableau Bridge (Legacy) title: Tableau Bridge (legacy) description: Connect Tableau Cloud to MotherDuck using Tableau Bridge and the DuckDB JDBC connector. --- import useBaseUrl from '@docusaurus/useBaseUrl'; :::warning[Deprecated] Connecting through Tableau Bridge is a legacy approach. Use the [Postgres endpoint setup](./tableau-cloud.mdx) instead for a simpler connection that doesn't require Bridge infrastructure. ::: ## How to use Tableau Cloud with MotherDuck through Tableau Bridge ### Setup This guide assumes you have: - a [Tableau account](https://www.tableau.com/) - a Tableau Cloud Site - a Tableau Desktop installation (with the same version as the Tableau Cloud Server Version) set up with the DuckDB JDBC Driver and Tableau Connector. If you don't, sign up or ask your organization to purchase a plan, or sign up for a free trial. ### Obtain a PAT token Follow [Tableau's instructions on creating a PAT token.](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm) This token must belong to a site admin. ### Set up Bridge client Use the [Tableau Bridge client setup instructions](https://help.tableau.com/current/online/en-us/to_bridge_client.htm) to install and set up Bridge client. 1. Make sure the machine where the Bridge client is installed has access to the Database used in the above steps. 
   Important notes:

   > Network access - Because Bridge facilitates connections between your private network data and Tableau Cloud, it requires the ability to make outbound connections through the internet. After the initial outbound connection, communication is bidirectional.

   > Required ports - Tableau Bridge uses port 443 to make outbound internet requests to Tableau Cloud and port 80 for certificate validation.

2. Install Bridge client and make sure the Bridge client is signed in to the Tableau Cloud site. You can download the installer from the [Tableau Bridge releases page](https://www.tableau.com/support/releases/bridge).

3. Install the driver and taco files as outlined in the [Tableau connector setup guide](https://help.tableau.com/current/online/en-us/to_sync_local_data.htm#connectors-and-data-types).
   - [Windows Server] The driver also needs to be installed here: `C:\Program Files\Tableau\Tableau Bridge\Drivers`
   - [Windows Server] The connector also needs to be installed here: `C:\Program Files\Tableau\Connectors`

> Note: Tableau Bridge can be deployed on either Windows or Linux.

### Running Bridge on Linux using Docker (advanced)

If you want to run Bridge centrally on a Linux host, the official guidance recommends running it inside a Docker container, as described in Tableau's documentation on [installing Bridge for Linux in containers](https://help.tableau.com/current/online/en-us/to_bridge_linux_install.htm).

Below is an **example Dockerfile** you can use as a starting point. It shows where to add JDBC drivers and the **DuckDB/MotherDuck** `.taco` file. It's provided for inspiration and may require updates to match your environment or newer versions of the software.
Example Dockerfile

```dockerfile
FROM registry.access.redhat.com/ubi8/ubi:latest

RUN yum update -y
RUN yum install -y glibc-langpack-en

# This is the latest version of Tableau Bridge that is known working with the MotherDuck connector
RUN curl -o /tmp/TableauBridge.rpm -L \
    https://downloads.tableau.com/tssoftware/TableauBridge-20243.25.0114.1153.x86_64.rpm && \
    ACCEPT_EULA=y yum install -y /tmp/TableauBridge.rpm && \
    rm /tmp/TableauBridge.rpm

# Drivers
RUN mkdir -p /opt/tableau/tableau_driver/jdbc

# Connectors (tacos)
RUN mkdir -p /root/Documents/My_Tableau_Bridge_Repository/Connectors

# Download DuckDB JDBC driver and signed taco
RUN curl -o /opt/tableau/tableau_driver/jdbc/duckdb_jdbc-1.3.0.0.jar \
    -L https://repo1.maven.org/maven2/org/duckdb/duckdb_jdbc/1.3.0.0/duckdb_jdbc-1.3.0.0.jar && \
    curl -o /root/Documents/My_Tableau_Bridge_Repository/Connectors/duckdb_jdbc-v1.1.1-signed.taco \
    -L https://github.com/motherduckdb/duckdb-tableau-connector/releases/download/v1.1.1/duckdb_jdbc-v1.1.1-signed.taco

ENV TZ=Europe/Berlin
ENV LC_ALL=en_US.UTF-8

# ----- user specific settings -----
ENV USER_EMAIL=""
ENV PAT_ID=BridgeToken
ENV CLIENT_NAME=""
ENV SITE_NAME=""
ENV POOL_ID=""
# -----------------------------------

CMD /opt/tableau/tableau_bridge/bin/run-bridge.sh -e \
    --patTokenId=$PAT_ID \
    --userEmail=$USER_EMAIL \
    --client=$CLIENT_NAME \
    --site=$SITE_NAME \
    --patTokenFile="/home/documents/token.txt" \
    --poolId=$POOL_ID
```
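The `--patTokenFile` flag in the Dockerfile above points Bridge at a file that holds the personal access token. The following is a minimal sketch of how such a file could be generated, assuming it is a single JSON object whose key is the token name and whose value is the token secret. The function name `write_pat_file`, the output location, and the sample values are hypothetical and only serve as illustration; adapt them to your own secret handling.

```python
import json
import os
import tempfile
from pathlib import Path


def write_pat_file(path: Path, token_name: str, token_secret: str) -> Path:
    """Write a JSON file mapping the PAT name to the PAT secret.

    Hypothetical helper: the token name must match the value passed to
    --patTokenId exactly, so reject names with stray whitespace early.
    """
    if not token_name or token_name != token_name.strip():
        raise ValueError("token name must be non-empty without surrounding whitespace")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({token_name: token_secret}), encoding="utf-8")
    os.chmod(path, 0o600)  # restrict the file to the owning user
    return path


if __name__ == "__main__":
    # Sample values only; "BridgeToken" mirrors PAT_ID in the Dockerfile above.
    target = Path(tempfile.gettempdir()) / "bridge-pat" / "token.txt"
    written = write_pat_file(target, "BridgeToken", "example-secret")
    print(written.read_text(encoding="utf-8"))
```

Mount the generated file into the container at the path given to `--patTokenFile` instead of baking it into the image, so the secret does not end up in an image layer.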
Key points:

* Build an image that **installs the Bridge RPM** and then places the DuckDB JDBC driver in `/opt/tableau/tableau_driver/jdbc` and the connector in `/root/Documents/My_Tableau_Bridge_Repository/Connectors`, as the example Dockerfile does.
* Start the bridge by calling `run-bridge.sh` and pass the following flags:
  * `--patTokenFile /run/secrets/pat.json`
  * `--patTokenId `
  * `--site `
  * `--poolId ` (optional – see note on pools below)
* **PAT naming rule** – the *name* you give the Personal-Access-Token in Tableau **must** be a valid JSON key and must be used **verbatim**
  1. as the key in `pat.json` → `{"": ""}`
  2. in the `--patTokenId` flag.

  A mismatch will result in a silent authentication failure.
* The latest Bridge **2025.1** builds contain a regression that prevents the MotherDuck connector (and several others) from loading. Until Tableau fixes this, pin the image to the **20243.25.0114.1153** release (see discussion in [GitHub issue #22](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/issues/22)).
* Bridge only initiates outbound connections to Tableau Cloud on **443/tcp**, so you do **not** need to publish any container ports. If you run a host firewall (for example, `ufw`), remember that Docker bypasses it [[Docker docs](https://docs.docker.com/engine/network/packet-filtering-firewalls/#docker-and-ufw)]. Restrict egress traffic to Tableau Cloud CIDR blocks if your security policy requires it.
* Logs written to `stdout` are useful, but the *detailed* logs live in `/root/Documents/My_Tableau_Bridge_Repository/Log`. Mount this path as a volume or use a side-car to ship the logs to your observability stack.

### Tableau Cloud Bridge pool setup

By default, Tableau places the Bridge in the default pool.

1. On the Settings → Bridge page, make sure the Bridge client shows as connected under connection Status.
2. In the "Private Network Allowlist", add the domain of the database and select the pool.
> **Pool Gotcha**: Some users report that a Linux containerised Bridge never shows up under a custom site pool. If that happens, leave `POOL_ID` blank when starting the client – it will join the legacy **Default** pool and still work with live connections.

### Create embedded data source (live) and workbook

1. Open Tableau Desktop and sign in to a Tableau Cloud site.

   > Note: Make sure the Tableau Desktop and [Tableau Cloud version](https://help.tableau.com/current/server/en-us/version_server_view.htm) match.

2. Create a new workbook and select the database connector.
3. Connect to the database.
4. Set up the data source to use live connectivity.
5. Create a worksheet with the data.

### Publish the workbook to Tableau Cloud

1. Click on "Server > Publish Workbook".
2. Select "Publish Separately" under Publish Type and "Embedded password" under Authentication. Select "Maintain connection to a live data source".
3. Click "Publish Workbook & 1 Data Source".

### (Important step!) update Tableau Bridge client in data source

1. Navigate to the newly published data source in Tableau Cloud (in your browser) and click on the "i" icon to open Data Source Details.
2. Click on "Change Bridge Client..."
3. Change the bridge client from "Site client pool" to your bridge client (the one you set up in the previous section). Click "Save" and close the dialog.
4. Check that the data source shows up in your Tableau Bridge status dialog. This dialog is located in the Windows Start bar (in the Icon panel).
5.
You can access your Published Workbook on your Tableau Cloud Site, or you can create a new Tableau Workbook using the Published Data Source. Tableau workbook using published data source ## Additional information - [Tableau Documentation](https://help.tableau.com/current/pro/desktop/en-us/gettingstarted_overview.htm) - [Tableau Exchange Connector DuckDB/MotherDuck](https://exchange.tableau.com/en-gb/products/1021) - [DuckDB Tableau Connector](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/) --- Source: https://motherduck.com/docs/integrations/bi-tools/tableau/tableau-cloud --- sidebar_position: 1 sidebar_label: Tableau Cloud title: Tableau Cloud with MotherDuck description: Connect Tableau Cloud to MotherDuck using the Postgres endpoint for dashboards and reports. --- import DocImage from '@site/src/components/DocImage'; :::info[Preview] The Postgres endpoint is in [preview](/about-motherduck/feature-stages/). Features and behavior may change. ::: :::warning[Looking for the Tableau Bridge setup?] Connecting through Tableau Bridge is a legacy approach. If you still need it, refer to the [legacy Tableau Bridge guide](./tableau-bridge.md). ::: ## Before you start You'll need: - A [Tableau Cloud](https://www.tableau.com/) account - A [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) - Your Postgres host and port, which you can find at [MotherDuck Postgres settings](https://app.motherduck.com/settings/postgres) (for example, `pg.us-east-1-aws.motherduck.com`) ## Connect to MotherDuck 1. In a Tableau Cloud workbook, click **Connect to Data**. 2. Under the **Connectors** tab, select **PostgreSQL**. 3. Fill in the connection details: - **Server**: Your Postgres host (for example, `pg.us-east-1-aws.motherduck.com`). Find this at [MotherDuck Postgres settings](https://app.motherduck.com/settings/postgres). - **Port**: The port from your Postgres settings (for example, `5432`). 
- **Database**: Your database name in MotherDuck (for example, `sample_data`). - **Username**: `postgres` - **Password**: Your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) - Check **Require SSL**. 4. Click **Sign In**. Tableau connects to MotherDuck and shows your tables. 5. Select your tables and build visualizations with your MotherDuck data. ## Connection parameters | Parameter | Value | |-----------|-------| | **Server** | `pg.-aws.motherduck.com` (find yours at [Postgres settings](https://app.motherduck.com/settings/postgres)) | | **Port** | `5432` (find yours at [Postgres settings](https://app.motherduck.com/settings/postgres)) | | **Database** | Your database name | | **Username** | `postgres` | | **Password** | Your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) | | **Require SSL** | Checked | ## Additional information - [Postgres endpoint reference](/sql-reference/postgres-endpoint) for connection parameters, SSL options, and limitations - [Connect through the Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) for a general how-to guide - [Tableau documentation](https://help.tableau.com/current/online/en-us/to_connect_live_sql.htm) --- Source: https://motherduck.com/docs/integrations/bi-tools/tableau/tableau-desktop --- sidebar_position: 2 sidebar_label: Tableau Desktop & Server title: Tableau Desktop and Server with MotherDuck description: Connect Tableau Desktop or Server to MotherDuck using the DuckDB JDBC driver and Tableau connector. --- ## Tableau Desktop setup for DuckDB and MotherDuck 1. Download a [recent version of the DuckDB JDBC driver](https://repo1.maven.org/maven2/org/duckdb/duckdb_jdbc/) and copy it into the Tableau Drivers directory: * MacOS: `~/Library/Tableau/Drivers/` * Windows: `C:\Program Files\Tableau\Drivers` * Linux: `/opt/tableau/tableau_driver/jdbc` 2. 
Download the signed Tableau connector (aka "Taco file") from the [latest available release](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/releases) and copy it into the Connectors directory:
   * Desktop Windows: `C:\Users\[YourUser]\Documents\My Tableau Repository\Connectors`
   * Desktop MacOS: `/Users/[YourUser]/Documents/My Tableau Repository/Connectors`
   * Server Windows: `C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\Connectors`
   * Server Linux: `[Your Tableau Server Install Directory]/data/tabsvc/vizqlserver/Connectors`

## Connecting

Once the Taco is installed and you have launched Tableau, you can create a new connection by choosing "DuckDB by MotherDuck":

![Tableau connector list](../../img/tableau-connector-list.png)

### Local DuckDB database

If you wish to connect to a local DuckDB database, select "Local file" as the DuckDB Server option and use the file picker:

![DuckDB Server dropdown](../../img/tableau-connect-options-local-file.png)

![Connection Dialogue](../../img/tableau-connect-local-file.png)

### In-memory database

The driver can be used with an in-memory database by selecting the `In-memory database` DuckDB Server option.

![DuckDB Server dropdown](../../img/tableau-connect-options-in-memory.png)

The data will then need to be provided by an Initial SQL string, for example:

```sql
CREATE VIEW my_parquet AS SELECT * FROM read_parquet('/path/to/file/my_file.parquet');
```

You can then access it by using the Tableau Data Source editing controls.

### MotherDuck

To connect to MotherDuck, you have two authentication options:

* Token -- provide the value that you [get from MotherDuck UI](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token).
* No Authentication -- unless the `motherduck_token` environment variable is available to Tableau at startup, you will be prompted to authenticate at connection time.
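With the "No Authentication" option, whether Tableau prompts you depends on whether `motherduck_token` is visible in the environment Tableau is started from. The sketch below is one way to check this before launching Tableau. The function name `token_status` is hypothetical, and the check only inspects the environment variable; it does not contact MotherDuck or confirm that the token is valid.

```python
import os
from typing import Mapping, Optional


def token_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Classify the motherduck_token variable in the given environment.

    Returns "missing", "has-whitespace", or "ok". Defaults to the
    environment of the current process when no mapping is passed.
    """
    env = os.environ if env is None else env
    token = env.get("motherduck_token", "")
    if not token:
        return "missing"
    if token != token.strip():
        # Leading or trailing whitespace is an easy copy-paste mistake.
        return "has-whitespace"
    return "ok"


if __name__ == "__main__":
    print(f"motherduck_token: {token_status()}")
```

Run it from the same shell or launcher that starts Tableau, since a variable set elsewhere may not be inherited by the Tableau process.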
To work with a MotherDuck database in Tableau, you have to provide the database to use when issuing queries. In `MotherDuck Database` field, provide the name of your database. You don't have to prefix it with `md:`: ![DuckDB Server dropdown](../../img/tableau-connect-options-md.png) ![Connection Dialogue](../../img/tableau-connect-motherduck.png) ## Additional information * [Tableau Documentation](https://help.tableau.com/current/pro/desktop/en-us/gettingstarted_overview.htm) * [Tableau Exchange Connector DuckDB/MotherDuck](https://exchange.tableau.com/en-gb/products/1021) * [DuckDB Tableau Connector](https://github.com/MotherDuck-Open-Source/duckdb-tableau-connector/) --- Source: https://motherduck.com/docs/integrations/cloud-storage/amazon-s3 --- sidebar_position: 1 title: Amazon S3 description: Configure AWS S3 credentials to query files from private buckets using MotherDuck. --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; import CloudExecutionCallout from "./_cloud-execution-callout.mdx"; ## Configure S3 credentials You can safely store your Amazon S3 credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command. Secrets are scoped to your user account and are not shared with other users in your organization. ### Create a SECRET object ```sql -- to configure a secret manually: CREATE SECRET IN MOTHERDUCK ( TYPE S3, KEY_ID 'access_key', SECRET 'secret_key', REGION 'us-east-1', SCOPE 'my-bucket-path' ); ``` :::note When creating a secret using the `CONFIG` (default) provider, be aware that the credential might be temporary. If so, a `SESSION_TOKEN` field also needs to be set for the secret to work correctly. 
::: ```sql -- to store a secret using your local AWS credentials (from `aws configure` or SSO): -- if you use AWS SSO, run `aws sso login --profile ` first CREATE SECRET aws_secret IN MOTHERDUCK ( TYPE S3, PROVIDER credential_chain, -- optional: add CHAIN and PROFILE for SSO credentials CHAIN 'sso', PROFILE '' ); ``` :::note Secret validation Starting with DuckDB v1.4.0, credentials are validated at secret creation time. If your credentials are not resolvable locally (for example, expired SSO tokens or missing `~/.aws/credentials`), the `CREATE SECRET` command will fail with a `Secret Validation Failure` error. The recommended fix is to use the correct `CHAIN` and `PROFILE` for your credential type (see the SSO example above). If you need to bypass local validation, you can add `VALIDATION 'none'`, but keep in mind that this skips the local check that confirms your credentials are valid before storing them in MotherDuck. ::: ```sql -- test the s3 credentials SELECT count(*) FROM 's3:///'; -- browse objects in a bucket or prefix FROM md_list_files('s3:///'); ``` ```python import duckdb con = duckdb.connect('md:') con.sql("CREATE SECRET IN MOTHERDUCK (TYPE S3, KEY_ID 'access_key', SECRET 'secret_key', REGION 'your_bucket_region')"); # testing that our s3 credentials work con.sql("SELECT count(*) FROM 's3:///'").show() # 42 ``` Click on your profile to access the `Settings` panel and click on `Secrets` menu. ![menu_1](./img/settings_access.png) ![menu_2](./img/settings_panel.png) Then click on `Add secret` in the secrets section. ![menu_3](./img/settings_secrets_panel.png) You will then be prompted to enter your Amazon S3 credentials. ![menu_3](./img/settings_secrets_pop_up.png) You can update your secret by executing [CREATE OR REPLACE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command to overwrite your secret. 
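As a rough illustration of updating a stored secret, the sketch below builds a `CREATE OR REPLACE SECRET` statement in Python, for example when rotating access keys. The helper names `sql_literal` and `replace_s3_secret_sql`, the secret name `aws_secret`, and the key values are assumptions made for this example; the statement layout follows the `CREATE SECRET ... IN MOTHERDUCK` examples above.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def replace_s3_secret_sql(name: str, key_id: str, secret: str, region: str) -> str:
    """Build a CREATE OR REPLACE SECRET statement for an S3 secret in MotherDuck."""
    if not name.isidentifier():
        # Keep the secret name a plain identifier so it needs no quoting.
        raise ValueError(f"unsupported secret name: {name!r}")
    return (
        f"CREATE OR REPLACE SECRET {name} IN MOTHERDUCK (\n"
        f"    TYPE S3,\n"
        f"    KEY_ID {sql_literal(key_id)},\n"
        f"    SECRET {sql_literal(secret)},\n"
        f"    REGION {sql_literal(region)}\n"
        f")"
    )


if __name__ == "__main__":
    # Placeholder credentials; never hard-code real keys.
    statement = replace_s3_secret_sql("aws_secret", "access_key", "secret_key", "us-east-1")
    print(statement)
    # To apply it, pass the statement to a MotherDuck connection, for example
    # duckdb.connect('md:').sql(statement), as in the Python example above.
```

Building the statement separately from executing it makes it easy to review what will be sent, though the printed output contains the secret and should not be logged in production.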
### Delete a SECRET object You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command. ```sql DROP SECRET ; ``` Click on your profile and access the `Settings` menu. Click on the bin icon to delete your current secrets. ![menu_4](./img/secrets_delete_4.png) ### Amazon S3 credentials as **temporary** secrets MotherDuck supports DuckDB syntax for providing S3 credentials. ```sql CREATE SECRET ( TYPE S3, KEY_ID 's3_access_key', SECRET 's3_secret_key', REGION 'us-east-1' ); ``` :::note Local/In-memory secrets are not persisted across sessions. ::: ## Troubleshooting For detailed troubleshooting steps, see our [AWS S3 Secrets Troubleshooting](/documentation/troubleshooting/aws-s3-secrets.md) guide. ## Browse buckets and files To inspect storage from SQL before querying specific files: ```sql FROM md_list_buckets_for_secret('__default_s3'); FROM md_list_files('s3:///'); FROM md_list_files('s3:////'); ``` See [`MD_LIST_BUCKETS_FOR_SECRET()`](/sql-reference/motherduck-sql-reference/md-list-buckets-for-secret) and [`MD_LIST_FILES()`](/sql-reference/motherduck-sql-reference/md-list-files) for details. --- Source: https://motherduck.com/docs/integrations/cloud-storage/azure-blob-storage --- sidebar_position: 1 title: Azure Blob Storage description: Configure Azure Blob Storage credentials to query files from private containers using MotherDuck. --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; import CloudExecutionCallout from "./_cloud-execution-callout.mdx"; ## Configure Azure Blob Storage credentials You can safely store your Azure Blob Storage credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command. 
:::note
See [Azure docs](https://learn.microsoft.com/en-gb/azure/storage/common/storage-configure-connection-string#configure-a-connection-string-for-an-azure-storage-account) to find the correct connection string format.
:::

### Create a SECRET object

```sql
-- to configure a secret manually:
CREATE SECRET IN MOTHERDUCK (
    TYPE AZURE,
    CONNECTION_STRING '[your_connection_string]'
);
```

```sql
-- to store a secret configured through `az configure`:
CREATE SECRET az_secret IN MOTHERDUCK (
    TYPE AZURE,
    PROVIDER credential_chain,
    ACCOUNT_NAME 'some-account'
);
```

```sql
-- test the azure credentials
SELECT count(*) FROM 'azure://[container]/[file]';
SELECT * FROM 'azure://[container]/*.csv';

-- browse objects in a container
FROM md_list_files('azure://[container]/', limit := 50);
```

```python
import duckdb

con = duckdb.connect('md:')
con.sql("CREATE SECRET IN MOTHERDUCK (TYPE AZURE, CONNECTION_STRING '[your_connection_string]')")

# testing that our Azure credentials work
con.sql("SELECT count(*) FROM 'azure://[container]/[file]'").show()
con.sql("SELECT * FROM 'azure://[container]/*.csv'").show()
```

Click on your profile to open the `Settings` panel, then click the `Secrets` menu.

![menu_1](./img/settings_access.png)

![menu_2](./img/settings_panel.png)

Then click on `Add secret` in the secrets section.

![menu_3](./img/settings_secrets_panel.png)

You will then be prompted to enter your Azure Blob Storage credentials.

![menu_3](./img/secrets_add_azure.png)

### Delete a SECRET object

You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command.

```sql
DROP SECRET ;
```

Click on your profile and access the `Settings` menu. Click on the bin icon to delete the secret.

![menu_4](./img/secrets_delete_azure.png)

### Azure credentials as **temporary** secrets

MotherDuck supports DuckDB syntax for providing Azure credentials.
```sql CREATE SECRET ( TYPE AZURE, CONNECTION_STRING '[your_connection_string]' ); ``` or if you use the `az configure` command to store your credentials in the `az` CLI. ```sql CREATE SECRET az_secret ( TYPE AZURE, PROVIDER credential_chain, ACCOUNT_NAME 'some-account' ); ``` :::note Local/In-memory secrets are not persisted across sessions. ::: ## Browse files in Azure Blob Storage To inspect a container before querying individual files, use [`MD_LIST_FILES()`](/sql-reference/motherduck-sql-reference/md-list-files): ```sql FROM md_list_files('azure://[container]/'); FROM md_list_files('az://[container]/path/'); ``` --- Source: https://motherduck.com/docs/integrations/cloud-storage/cloudflare-r2 --- sidebar_position: 1 title: Cloudflare R2 description: Configure Cloudflare R2 credentials to query files from private buckets using MotherDuck. --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; import CloudExecutionCallout from "./_cloud-execution-callout.mdx"; ## Configure Cloudflare R2 credentials You can safely store your Cloudflare R2 credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command. :::note See [Cloudflare docs](https://developers.cloudflare.com/r2/api/s3/tokens/) to create a Cloudflare access token. ::: ### Create a SECRET object ```sql CREATE SECRET IN MOTHERDUCK ( TYPE R2, KEY_ID 'your_key_id', SECRET 'your_secret_key', ACCOUNT_ID 'your_account_id' ); ``` :::note The `ACCOUNT_ID` can be found when generating the API token on the endpoint URL `https://.r2.cloudflarestorage.com`. ::: :::note R2 buckets are regionless, so you do not need to specify a `REGION` parameter. If provided, it defaults to `auto`. 
::: ```sql -- test the R2 credentials SELECT count(*) FROM 'r2://[bucket]/[file]' ``` ```python import duckdb con = duckdb.connect('md:') con.sql("CREATE SECRET IN MOTHERDUCK ( TYPE R2, KEY_ID 'your_key_id', SECRET 'your_secret_key', ACCOUNT_ID 'your_account_id' )"); # testing that our R2 credentials work con.sql("SELECT count(*) FROM 'r2://[bucket]/[file]'").show() ``` Click on your profile to access the `Settings` panel and click on `Secrets` menu. ![menu_1](./img/settings_access.png) ![menu_2](./img/settings_panel.png) Then click on `Add secret` in the secrets section. ![menu_3](./img/settings_secrets_panel.png) Select the Secret Type `R2` and fill in the required fields. ### Delete a SECRET object You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command. ```sql DROP SECRET ; ``` Click on your profile and access the `Settings` menu. Click on the bin icon to delete the secret. ![menu_4](./img/secrets_delete_azure.png) ### R2 credentials as **temporary** secrets MotherDuck supports DuckDB syntax for providing R2 credentials. ```sql CREATE SECRET ( TYPE R2, KEY_ID 'your_key_id', SECRET 'your_secret_key', ACCOUNT_ID 'your_account_id' ); ``` :::note Local/In-memory secrets are not persisted across sessions. ::: --- Source: https://motherduck.com/docs/integrations/cloud-storage/google-cloud-storage --- sidebar_position: 1 title: Google Cloud Storage description: Query files from private Google Cloud Storage buckets using the GCS S3-compatible connection. --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; import CloudExecutionCallout from "./_cloud-execution-callout.mdx"; With MotherDuck, you can access files in a private Google Cloud Storage (GCS) bucket. This leverages the GCS S3 compatible connection. ## Google Cloud Storage connection process 1. 
Create an [HMAC key](https://docs.cloud.google.com/storage/docs/authentication/hmackeys) for the service account: Cloud Storage → Settings → Interoperability → Create a key for a service account
2. Save the Access ID and Secret (shown once)
3. Create the DuckDB secret using the HMAC credentials as described below

## Configure Google Cloud Storage credentials

You can safely store your Google Cloud Storage credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command.

### Create a SECRET object

```sql
CREATE SECRET IN MOTHERDUCK (
    TYPE GCS,
    KEY_ID 'HMAC_ACCESS_ID',
    SECRET 'HMAC_SECRET'
);

-- test GCS credentials
SELECT count(*) FROM 'gcs:///';
```

```python
import duckdb

con = duckdb.connect('md:')
con.sql("CREATE SECRET IN MOTHERDUCK (TYPE GCS, KEY_ID 'access_key', SECRET 'secret_key')")

# test GCS
con.sql("SELECT count(*) FROM 'gcs:///'").show()
# 42
```

Click on your profile to open the `Settings` panel, then click the `Secrets` menu.

![menu_1](./img/settings_access.png)

![menu_2](./img/settings_panel.png)

Then click on `Add secret` in the secrets section.

![menu_3](./img/settings_secrets_panel.png)

You will then be prompted to enter your Google Cloud Storage HMAC credentials.

![menu_3](./img/settings_secrets_pop_up.png)

You can update your secret by executing the [CREATE OR REPLACE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command to overwrite it.

### Delete a SECRET object

You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command.

```sql
DROP SECRET ;
```

Click on your profile and access the `Settings` menu. Click on the bin icon to delete your current secrets.
![menu_4](./img/secrets_delete_4.png)

### Google Cloud Storage credentials as **temporary** secrets

MotherDuck supports DuckDB syntax for providing GCS credentials.

```sql
CREATE SECRET (
    TYPE GCS,
    KEY_ID 's3_access_key',
    SECRET 's3_secret_key'
);
```

:::note
Local/In-memory secrets are not persisted across sessions.
:::

## Additional resources

- [Using the S3 compatible connection in GCS](https://docs.cloud.google.com/storage/docs/aws-simple-migration)
- [HMAC Keys in Google Cloud](https://docs.cloud.google.com/storage/docs/authentication/hmackeys)

---

Source: https://motherduck.com/docs/integrations/cloud-storage/hetzner-object-storage

---
sidebar_position: 5
title: Hetzner Object Storage
description: Configure MotherDuck to read files from Hetzner Object Storage using S3-compatible credentials.
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";
import CloudExecutionCallout from "./_cloud-execution-callout.mdx";

## Configure Hetzner Object Storage credentials

You can safely store your Hetzner Object Storage credentials in MotherDuck for convenience by creating a `SECRET` object using the [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) command.

:::note
See [Hetzner docs](https://docs.hetzner.com/storage/object-storage/getting-started/generating-s3-keys/) to create S3 access keys. Save your secret key immediately as it cannot be viewed again after creation.
:::

### Create a SECRET object

```sql
CREATE SECRET IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'your_access_key',                -- provided by Hetzner
    SECRET 'your_secret_key',                -- provided by Hetzner
    ENDPOINT 'fsn1.your-objectstorage.com',  -- provided by Hetzner
    SCOPE 'your_bucket_scope'                -- example: s3://test-bucket
);
```

:::note
The endpoint must include the location (e.g., fsn1, nbg1, or hel1).
Available endpoints: - `fsn1.your-objectstorage.com` (Falkenstein) - `nbg1.your-objectstorage.com` (Nuremberg) - `hel1.your-objectstorage.com` (Helsinki) ::: ```sql -- test the Hetzner Object Storage credentials SELECT count(*) FROM 's3://[bucket]/[file]' ``` ```python import duckdb con = duckdb.connect('md:') con.sql("CREATE SECRET IN MOTHERDUCK ( TYPE S3, KEY_ID 'your_access_key', SECRET 'your_secret_key', ENDPOINT 'fsn1.your-objectstorage.com', SCOPE 'your_bucket_scope' )"); # testing that our Hetzner credentials work con.sql("SELECT count(*) FROM 's3://[bucket]/[file]'").show() ``` Click on your profile to access the `Settings` panel and click on `Secrets` menu. ![menu_1](./img/settings_access.png) ![menu_2](./img/settings_panel.png) Then click on `Add secret` in the secrets section. ![menu_3](./img/settings_secrets_panel.png) Select the Secret Type `S3` and fill in the required fields. Ensure you add the endpoint URL (e.g., `fsn1.your-objectstorage.com`) in the endpoint field. ### Delete a SECRET object You can use the same method above, using the [DROP SECRET](/sql-reference/motherduck-sql-reference/delete-secret.md) command. ```sql DROP SECRET ; ``` Click on your profile and access the `Settings` menu. Click on the bin icon to delete the secret. ![menu_4](./img/secrets_delete_azure.png) ### Hetzner Object Storage credentials as temporary secrets MotherDuck supports DuckDB syntax for providing Hetzner Object Storage credentials. ```sql CREATE SECRET ( TYPE S3, KEY_ID 'your_access_key', SECRET 'your_secret_key', ENDPOINT 'fsn1.your-objectstorage.com', SCOPE 'your_bucket_scope' ); ``` :::note Local/In-memory secrets are not persisted across sessions. 
:::

### Multiple locations configuration

If you have buckets in different Hetzner locations, create one scoped secret per location so that each bucket resolves to the matching endpoint:

```sql
-- Secret for Falkenstein location
CREATE SECRET hetzner_fsn1 IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'access_key_1',
    SECRET 'secret_key_1',
    ENDPOINT 'fsn1.your-objectstorage.com',
    SCOPE 's3://my-bucket-fsn1'
);

-- Secret for Nuremberg location
CREATE SECRET hetzner_nbg1 IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'access_key_2',
    SECRET 'secret_key_2',
    ENDPOINT 'nbg1.your-objectstorage.com',
    SCOPE 's3://my-bucket-nbg1'
);
```

:::tip
By default, each key pair is automatically valid for every bucket within the same Hetzner project. Use bucket policies to restrict access if needed.
:::

---

Source: https://motherduck.com/docs/integrations/cloud-storage/index

---
title: Cloud Storage
description: Use MotherDuck with your favorite cloud storage services
---

# Cloud Storage

MotherDuck integrates with popular cloud storage services to help you manage and store your data.

## Included pages

- [Amazon S3](https://motherduck.com/docs/integrations/cloud-storage/amazon-s3): Configure AWS S3 credentials to query files from private buckets using MotherDuck.
- [Azure Blob Storage](https://motherduck.com/docs/integrations/cloud-storage/azure-blob-storage): Configure Azure Blob Storage credentials to query files from private containers using MotherDuck.
- [Cloudflare R2](https://motherduck.com/docs/integrations/cloud-storage/cloudflare-r2): Configure Cloudflare R2 credentials to query files from private buckets using MotherDuck.
- [Google Cloud Storage](https://motherduck.com/docs/integrations/cloud-storage/google-cloud-storage): Query files from private Google Cloud Storage buckets using the GCS S3-compatible connection.
- [Hetzner Object Storage](https://motherduck.com/docs/integrations/cloud-storage/hetzner-object-storage): Configure MotherDuck to read files from Hetzner Object Storage using S3-compatible credentials.
- [Tigris](https://motherduck.com/docs/integrations/cloud-storage/tigris): Query files from Tigris globally distributed object storage using MotherDuck. --- Source: https://motherduck.com/docs/integrations/cloud-storage/tigris --- sidebar_position: 5 title: Tigris description: Query files from Tigris globally distributed object storage using MotherDuck. --- import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; import CloudExecutionCallout from "./_cloud-execution-callout.mdx"; With MotherDuck, you can access files in a private Tigris bucket. Tigris is a globally distributed S3-compatible object storage service that provides low latency anywhere in the world. ## Tigris requirements To get started using Tigris with MotherDuck, you need to: 1. Create a new bucket at [storage.new](https://storage.new) if you don't have one 2. Create an access keypair for that bucket at [storage.new/accesskey](https://storage.new/accesskey) 3. Configure MotherDuck to use Tigris 4. Query files in Tigris When creating a bucket, you can select from different storage tiers: - Standard (default) - Best for general use cases - Infrequent Access - Cheaper than Standard, but charges per gigabyte of retrieval - Instant Retrieval Archive - For long-term storage with urgent access needs - Archive - For long-term storage where retrieval time is not critical ## Configure Tigris credentials ### Create a SECRET object :::note If you are using multiple secrets, the `SCOPE` parameter will make sure MotherDuck knows which one to use. You can validate which secret to use with [`which_secret`](https://duckdb.org/docs/stable/configuration/secrets_manager). 
As an example, see below: ```sql FROM which_secret('s3://my-other-bucket/file.parquet', 's3'); ``` ::: ```sql CREATE OR REPLACE PERSISTENT SECRET tigris ( TYPE s3, PROVIDER config, KEY_ID 'tid_access_key_id', SECRET 'tsec_secret_access_key', REGION 'auto', ENDPOINT 't3.storage.dev', URL_STYLE 'vhost', SCOPE 's3://my_bucket' ); -- test Tigris credentials SELECT count(*) FROM 's3:///'; ``` ```python import duckdb con = duckdb.connect('md:') con.sql(""" CREATE OR REPLACE PERSISTENT SECRET tigris ( TYPE s3, PROVIDER config, KEY_ID 'tid_access_key_id', SECRET 'tsec_secret_access_key', REGION 'auto', ENDPOINT 't3.storage.dev', URL_STYLE 'vhost', SCOPE 's3://my_bucket' ) """) # test Tigris con.sql("SELECT count(*) FROM 's3:///'").show() ``` Adding Tigris secrets through the UI is not supported. Please add them using SQL statements. ### Delete a SECRET object ```sql DROP SECRET tigris; ``` ### Tigris credentials as **temporary** secrets You can also create temporary secrets that are not persisted across sessions: ```sql CREATE OR REPLACE SECRET ( TYPE s3, PROVIDER config, KEY_ID 'tid_access_key_id', SECRET 'tsec_secret_access_key', REGION 'auto', ENDPOINT 't3.storage.dev', URL_STYLE 'vhost' ); ``` :::note Local/In-memory secrets are not persisted across sessions. ::: --- Source: https://motherduck.com/docs/integrations/data-quality/index --- title: Data Quality Tools description: Monitor and maintain data quality in MotherDuck --- # Data Quality Tools Ensure data quality and reliability in MotherDuck using these integrated tools. ## Included pages No included pages are currently listed for this category. --- Source: https://motherduck.com/docs/integrations/data-science-ai/index --- title: Data Science & AI description: Use MotherDuck with your favorite data science and AI tools --- # Data Science & AI Tools MotherDuck integrates with popular data science and AI tools to help you build powerful machine learning and AI applications. 
## Included pages - [Marimo](https://motherduck.com/docs/integrations/data-science-ai/marimo): Use marimo reactive notebooks with MotherDuck for interactive Python data analysis. --- Source: https://motherduck.com/docs/integrations/data-science-ai/marimo --- sidebar_position: 7 title: Marimo description: Use marimo reactive notebooks with MotherDuck for interactive Python data analysis. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; # marimo [marimo](https://marimo.io/) is a reactive notebook for Python and SQL that models notebooks as dataflow graphs. When you run a cell or interact with a UI element, marimo automatically runs affected cells (or marks them as stale), keeping code and outputs consistent and preventing bugs before they happen. Every marimo notebook is stored as pure Python, executable as a script, and deployable as an app. ## Getting Started ### Installation First, install marimo with SQL support: ```bash pip install "marimo[sql]" ``` ```bash uv pip install "marimo[sql]" ``` ```bash conda install -c conda-forge marimo duckdb polars ``` ### Authentication There are two ways to authenticate: 1. **Interactive Authentication**: When you first connect to MotherDuck (e.g. `ATTACH 'md:my_db'`), marimo will open a browser window for authentication. 2. **Token-based Authentication**: Set your MotherDuck token as an environment variable: ```bash export motherduck_token="your_token" ``` You can find your token in the MotherDuck UI under Account Settings. ## Using MotherDuck First, open your first notebook: ```bash marimo edit my_notebook.py ``` ### 1. Connecting and Database Discovery ```sql ATTACH IF NOT EXISTS 'md:my_db' ``` ```python import duckdb # Connect to MotherDuck duckdb.sql("ATTACH IF NOT EXISTS 'md:my_db'") ``` You will be prompted to authenticate with MotherDuck when you run the above cell. This will open a browser window where you can log in and authorize your marimo notebook to access your MotherDuck database. 
In order to avoid being prompted each time you open a notebook, you can set the `motherduck_token` environment variable: ```bash export motherduck_token="your_token" marimo edit my_notebook.py ``` Once connected, your MotherDuck tables are automatically discovered in the Datasources Panel: ![Browse your MotherDuck databases](../img/marimo_motherduck_db_discovery.png) _Browse your MotherDuck databases_ ### 2. Writing SQL Queries You can query your MotherDuck db using SQL cells in marimo. Here's an example of how to query a table and display the results using marimo: ![Query a MotherDuck table](../img/marimo_motherduck_sql.png) _Query a MotherDuck table_ marimo's reactive execution model extends into SQL queries, so changes to your SQL will automatically trigger downstream computations for dependent cells (or optionally mark cells as stale for expensive computations). ![img](../img/marimo_motherduck_reactivity-ezgif.com-speed.gif) ### 3. Mixing SQL and Python marimo allows you to seamlessly combine SQL queries with Python code: ![Mixing SQL and Python](../img/marimo_motherduck_python_and_sql.png) _Mixing SQL and Python_ ## Example Notebook For a full example of using MotherDuck with marimo, check out this [example notebook](https://github.com/marimo-team/marimo/blob/main/examples/sql/connect_to_motherduck.py). --- Source: https://motherduck.com/docs/integrations/databases/bigquery --- sidebar_position: 1 title: BigQuery description: Load data from Google BigQuery into MotherDuck using the duckdb-bigquery extension or Python SDK. --- BigQuery is Google Cloud's fully-managed, serverless data warehouse that enables SQL queries using the processing power of Google's infrastructure. To load data into MotherDuck, there are two options: 1. **[Using the `duckdb-bigquery` community extension](#1-using-the-duckdb-bigquery-community-extension)** (easiest to use) - Simple SQL-based approach for quick data transfers and exploration. 2. 
**[Using Google's BigQuery Python SDK](#2-using-googles-bigquery-python-sdk)** - For performance-optimized ETL pipelines with advanced control over data loading. ## Prerequisites - DuckDB installed (via CLI or Python). - Access to a GCP project with BigQuery enabled. - Valid Google Cloud credentials via: - `GOOGLE_APPLICATION_CREDENTIALS` environment variable, or - `gcloud auth application-default login`. Minimum required IAM roles: - `BigQuery Data Editor` - `BigQuery Job User` ## 1. Using the DuckDB BigQuery Community Extension The following examples use the [DuckDB CLI](/getting-started/interfaces/connect-query-from-duckdb-cli.mdx), but you can use any [DuckDB/MotherDuck clients](/getting-started/interfaces/interfaces.mdx). ### Install and Load the Extension ```sql INSTALL bigquery FROM community; LOAD bigquery; ``` :::info A new experimental scan is now available and offers significantly improved performance. To enable it by default, run:`SET bq_experimental_use_incubating_scan=TRUE` ::: ### Attach BigQuery Project To read data from your project, you attach it just like you would attach a DuckDB database with the following syntax ```sql ATTACH 'project=my-gcp-project' AS bq (TYPE bigquery, READ_ONLY); ``` To read from a public dataset, you can use the following syntax ```sql ATTACH 'project=bigquery-public-data dataset=pypi billing_project=my-gcp-project' AS bq_public (TYPE bigquery, READ_ONLY); ``` ### Query a Table Once attached, you can query BigQuery tables directly using standard SQL syntax: ```sql SELECT * FROM bq.dataset_name.table_name LIMIT 10; ``` #### Alternative Query Functions Behind the scenes, the above query uses `bigquery_scan`. 
The extension provides two explicit functions for more control over data retrieval: **`bigquery_scan`** - Efficient for reading entire tables or simple queries: ```sql SELECT * FROM bigquery_scan('my_gcp_project.my_dataset.my_table'); ``` **`bigquery_query`** - Execute custom [GoogleSQL](https://cloud.google.com/bigquery/docs/introduction-sql) queries within your BigQuery project. Recommended for querying large tables with complex filters. ```sql SELECT * FROM bigquery_query( 'my_gcp_project', 'SELECT * FROM `my_gcp_project.my_dataset.my_table` WHERE column = "value"' ); ``` ### Loading Data to MotherDuck Ensure the `motherduck_token` environment variable is set: ```sql ATTACH 'md:'; ``` You can use the `CREATE TABLE ... AS` syntax to create a new table, or `INSERT INTO ... SELECT` to append data to an existing table. ```sql CREATE DATABASE IF NOT EXISTS pypi_playground; USE pypi_playground; CREATE TABLE IF NOT EXISTS duckdb_sample AS SELECT * FROM bq_public.pypi.file_downloads WHERE project = 'duckdb' AND timestamp = TIMESTAMP '2025-05-26 00:00:00' LIMIT 100; ``` --- ## 2. Using Google's BigQuery Python SDK For optimized ETL pipeline performance—especially when working with large tables and filter pushdown—we recommend using the [Google Cloud BigQuery Python SDK](https://cloud.google.com/python/docs/reference/bigquery/latest/index.html), which streams results efficiently directly to an Arrow table, enabling zero-copy loading to DuckDB. ### Install Required Libraries ```bash pip install google-cloud-bigquery[bqstorage] duckdb ``` The "extras" option `[bqstorage]` installs `google-cloud-bigquery-storage`. By default, the `google-cloud-bigquery` client uses the **standard BigQuery API** to read query results. This is fine for small results, but **much slower and less efficient** for large datasets. 
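As a rough illustration of the difference, the sketch below fetches query results as an Arrow table and asks the client library to read them through the BigQuery Storage API. `fetch_arrow` is a hypothetical helper name, not part of any SDK. It assumes `client` is a `google.cloud.bigquery.Client` and relies on the `create_bqstorage_client` argument of `QueryJob.to_arrow()`, which is only expected to help when `google-cloud-bigquery-storage` is installed.

```python
from typing import Any


def fetch_arrow(client: Any, query: str) -> Any:
    """Run a query and return the result as a PyArrow table.

    `client` is expected to be a google.cloud.bigquery.Client.
    """
    job = client.query(query)
    # create_bqstorage_client=True asks the library to download results
    # through the BigQuery Storage API when the bqstorage extra is available.
    return job.to_arrow(create_bqstorage_client=True)
```

A call might look like `fetch_arrow(bigquery.Client(project='my-gcp-project'), 'SELECT 1')`, where the project name is a placeholder.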
### Python end-to-end pipeline example

The following example has three functions:

- `get_bigquery_client()` - Authenticates and returns a BigQuery client using service account credentials or default authentication.
- `get_bigquery_result()` - Executes a BigQuery SQL query and returns the results as a PyArrow table.
- `create_duckdb_table_from_arrow()` - Creates a DuckDB table from PyArrow data in either local DuckDB or MotherDuck.

```python
import logging
import os
import time

import duckdb
import pyarrow as pa
from google.auth.exceptions import DefaultCredentialsError
from google.cloud import bigquery
from google.oauth2 import service_account

GCP_PROJECT = 'my-gcp-project'
DATASET_NAME = 'my_dataset'
TABLE_NAME = 'my_table'

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)


def get_bigquery_client(project_name: str) -> bigquery.Client:
    """Return a BigQuery client for the given project."""
    try:
        service_account_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")

        if service_account_path:
            # Authenticate with the service account key file
            credentials = service_account.Credentials.from_service_account_file(
                service_account_path
            )
            return bigquery.Client(project=project_name, credentials=credentials)

        # Fall back to Application Default Credentials
        # (for example from `gcloud auth application-default login`)
        return bigquery.Client(project=project_name)

    except DefaultCredentialsError:
        logging.error("No valid credentials found for BigQuery authentication.")
        raise


def get_bigquery_result(
    query_str: str, bigquery_client: bigquery.Client
) -> pa.Table:
    """Run a BigQuery query and return the result as a PyArrow table."""
    try:
        # Start measuring time
        start_time = time.time()

        # Run the query and load the result directly into an Arrow table
        logging.info(f"Running query: {query_str}")
        pa_tbl = bigquery_client.query(query_str).to_arrow()

        # Log the time taken for query execution and data loading
        elapsed_time = time.time() - start_time
        logging.info(
            f"BigQuery query executed and data loaded in {elapsed_time:.2f} seconds")

        return pa_tbl

    except Exception as e:
        logging.error(f"Error running query: {e}")
        raise


def create_duckdb_table_from_arrow(
    pa_table: pa.Table,
    table_name: str,
    db_path: str,
    database_name: str = "bigquery",
) -> None:
    """
    Create a DuckDB table from PyArrow table data.

    Args:
        pa_table: PyArrow table containing the data
        table_name: Name of the table to create in DuckDB
        db_path: Database path - use 'md:' prefix for MotherDuck, file path for local
            or just :memory: for in-memory
        database_name: Name of the database to create/use (default: bigquery)
    """
    try:
        # Connect to DuckDB
        if db_path.startswith("md:"):
            # Check that the motherduck_token environment variable is set
            if not os.environ.get("motherduck_token"):
                raise EnvironmentError(
                    "motherduck_token environment variable is not set")

        conn = duckdb.connect(db_path)

        # Create database if not exists
        conn.sql(f"CREATE DATABASE IF NOT EXISTS {database_name}")
        conn.sql(f"USE {database_name}")

        # Create table from the PyArrow table (DuckDB resolves `pa_table` by name)
        conn.sql(
            f"CREATE OR REPLACE TABLE {table_name} AS SELECT * FROM pa_table")

        logging.info(
            f"Successfully created table '{table_name}' in database '{database_name}' with {len(pa_table)} rows to {db_path}")

    except Exception as e:
        logging.error(f"Error creating DuckDB table: {e}")
        raise


if __name__ == "__main__":
    # Run the pipeline
    bigquery_client = get_bigquery_client(GCP_PROJECT)

    pa_table = get_bigquery_result(
        f"SELECT * FROM `{GCP_PROJECT}.{DATASET_NAME}.{TABLE_NAME}`",
        bigquery_client,
    )

    create_duckdb_table_from_arrow(
        pa_table=pa_table, table_name=TABLE_NAME, db_path="md:")
```

---

Source: https://motherduck.com/docs/integrations/databases/index

---
title: databases
description: Use MotherDuck with your favorite databases
---

# Databases

MotherDuck integrates directly with popular databases to help you build data pipelines and applications.

## Included pages

- [BigQuery](https://motherduck.com/docs/integrations/databases/bigquery): Load data from Google BigQuery into MotherDuck using the duckdb-bigquery extension or Python SDK.
- [PostgreSQL](https://motherduck.com/docs/integrations/databases/postgres): Connect PostgreSQL and MotherDuck using the Postgres endpoint, DuckDB's PostgreSQL extension, or pg_duckdb.
- [PlanetScale](https://motherduck.com/docs/integrations/databases/planetscale): Connect PlanetScale Postgres to MotherDuck using pg_duckdb extension or the Postgres connector for analytical query acceleration

---

Source: https://motherduck.com/docs/integrations/databases/planetscale

---
sidebar_position: 2
title: PlanetScale
description: Connect PlanetScale Postgres to MotherDuck using pg_duckdb extension or the Postgres connector for analytical query acceleration
---

PlanetScale offers hosted PostgreSQL and Vitess-based MySQL databases. MotherDuck supports PlanetScale Postgres via the [pg_duckdb extension](/concepts/pgduckdb), as well as the [Postgres Connector](/integrations/databases/postgres/). In our internal benchmarking, pg_duckdb offers 100x or greater query acceleration for analytical queries when compared to vanilla Postgres.
## Prerequisites

Before connecting PlanetScale to MotherDuck, ensure you have:

- A PlanetScale account with a Postgres database created
- The `pg_duckdb` extension enabled in your PlanetScale database (see [PlanetScale extension documentation](https://planetscale.com/docs/postgres/extensions/pg_duckdb))
- A MotherDuck account and authentication token (get your token from the [MotherDuck dashboard](https://app.motherduck.com))
- Database connection credentials from your PlanetScale dashboard (host, port, username, password, database name)

## Connecting pg_duckdb to MotherDuck

To run pg_duckdb, make sure to add it to your [extensions in PlanetScale](https://planetscale.com/docs/postgres/extensions/pg_duckdb).

:::tip
Review the configuration parameters before deploying the extension. Once deployed, you can connect to MotherDuck with the following SQL statements.
:::

```sql
-- Grant necessary permissions to the PlanetScale superuser
GRANT CREATE ON SCHEMA public TO pscale_superuser;

-- Create the pg_duckdb extension in your Postgres database
CREATE EXTENSION pg_duckdb;

-- Enable a MotherDuck connection with your authentication token
CALL duckdb.enable_motherduck();
```

To swap tokens, drop the MotherDuck connection and then re-add it with:

```sql
-- Remove the existing MotherDuck server connection
DROP SERVER motherduck CASCADE;

-- Re-enable MotherDuck with a new authentication token
CALL duckdb.enable_motherduck();
```

### Using Read Replicas with PlanetScale

:::info
pg_duckdb automatically round-robins between your replicas when you use a read-only token. When switching between a read-write and a read-only token, snapshot your database and then force a sync as part of the hand-off.
::: Switching from read-write to read-only is done with the following SQL statement in Postgres: ```sql -- Create a snapshot of your MotherDuck database to ensure consistency SELECT * FROM duckdb.raw_query('CREATE SNAPSHOT OF '); -- Drop the existing MotherDuck connection DROP SERVER motherduck CASCADE; -- Re-enable MotherDuck with your read-only token CALL duckdb.enable_motherduck(); -- Refresh the database to sync with the snapshot SELECT * FROM duckdb.raw_query('REFRESH DATABASE '); ``` ### Reading from MotherDuck :::info By default, data in [MotherDuck is mapped to Postgres in two different ways](https://github.com/duckdb/pg_duckdb/blob/main/docs/motherduck.md#schema-mapping). This is because MotherDuck is designed to hold many databases in its global catalog, while Postgres traditionally has a single database in its catalog. - For data in `my_db.main`, it is mapped directly to the `public` schema in the Postgres database. - For data in any other database & schema, it is mapped to `ddb$database$schema` in the Postgres database. ::: Once the catalog is in sync between MotherDuck and Postgres, the data can be queried directly from Postgres. If it is out of sync for any reason, it can be re-sync'd with the following SQL command: ```sql -- Terminate the pg_duckdb sync worker to force a re-sync SELECT * FROM pg_terminate_backend(( SELECT pid FROM pg_stat_activity WHERE backend_type = 'pg_duckdb sync worker' )); ``` #### Sample MotherDuck Queries Once the catalog is synchronized to Postgres, we can query the data as if it was normal data in Postgres. ```sql -- Query data from a MotherDuck database and schema -- Note: Non-main schemas use the ddb$database$schema naming convention SELECT * FROM "ddb$sample_data$nyc".taxi ORDER BY tpep_dropoff_datetime DESC LIMIT 10; ``` Of course, we can also join with data in Postgres. 
```sql
-- Join MotherDuck data with local Postgres tables
SELECT
    a.col1,
    b.col2
-- MotherDuck table from a non-main schema
FROM "ddb$my_database$my_schema".my_table AS a
-- Local Postgres table in the public schema
LEFT JOIN public.another_table AS b
    ON a.key = b.key
```

The DuckDB `iceberg_scan` function also works:

```sql
-- Use DuckDB's iceberg_scan function to query Iceberg tables
SELECT COUNT(*)
FROM iceberg_scan('https://motherduck-demo.s3.amazonaws.com/iceberg/lineitem_iceberg', allow_moved_paths := true)
```

:::info
Two special helper functions exist to run queries directly with DuckDB:

- **`duckdb.query`**: Returns tabular data; use it for SELECT queries.
- **`duckdb.raw_query`**: Returns void; use it for DDL queries such as snapshot creation and database refresh. This function keeps the database in sync when handing off between read and write nodes.
:::

```sql
-- Use duckdb.query for SELECT queries that return tabular data
-- This example lists all databases in MotherDuck
SELECT * FROM duckdb.query('FROM md_databases()')
```

```sql
-- Use duckdb.raw_query for DDL queries that return void
-- This example drops a table in MotherDuck
SELECT * FROM duckdb.raw_query('DROP TABLE my_database.my_schema.some_table')
```

### Replicating data to MotherDuck

:::tip
For smaller tables, data can be replicated using simple SQL statements.
:::

```sql
-- Create a table in MotherDuck and populate it with data from Postgres
-- Replace my_database and my_schema with your target database and schema names
CREATE TABLE "ddb$my_database$my_schema".my_table USING duckdb AS
SELECT * FROM public.my_table
```

:::tip
For larger tables, state management, and tighter SLAs & requirements, MotherDuck offers [integrations to various other ingestion partners](/integrations/ingestion/).
::: ### Further reading The [pg_duckdb github repo](https://github.com/duckdb/pg_duckdb) contains [further documentation](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md) of all available functions. For ease of finding the documentation, a table of the documentation sections is below: | Topic | Description | |-------|-------------| | [**Functions**](https://github.com/duckdb/pg_duckdb/blob/main/docs/functions.md) | Complete reference for all available functions | | [**Syntax Guide & Gotchas**](https://github.com/duckdb/pg_duckdb/blob/main/docs/gotchas_and_syntax.md) | Quick reference for common SQL patterns and things to know | | [**Types**](https://github.com/duckdb/pg_duckdb/blob/main/docs/types.md) | Supported data types and type mappings | | [**Extensions**](https://github.com/duckdb/pg_duckdb/blob/main/docs/extensions.md) | DuckDB extension installation and usage | | [**Settings**](https://github.com/duckdb/pg_duckdb/blob/main/docs/settings.md) | Configuration options and parameters | | [**Transactions**](https://github.com/duckdb/pg_duckdb/blob/main/docs/transactions.md) | Transaction behavior and limitations | ## Connecting with the Postgres Extension You can also connect to PlanetScale Postgres with the DuckDB Postgres extension. This approach allows you to query PlanetScale data directly from DuckDB or MotherDuck. ### Install and Load the Extension ```sql -- Install the Postgres extension from DuckDB's extension registry INSTALL postgres; -- Load the extension to enable Postgres connectivity LOAD postgres; -- Attach your PlanetScale database using a connection string ATTACH '' AS postgres_db (TYPE postgres); ``` ### Connection String Format The connection string format follows PostgreSQL's standard connection parameters. 
Here's an example with explanations: ```sql ATTACH 'host= port= user= password= dbname= sslmode=require' AS planetscale (TYPE postgres); ``` **Connection Parameters:** - `host`: Your PlanetScale database hostname (found in your PlanetScale dashboard) - `port`: The database port (typically 3306 for MySQL or 5432 for Postgres) - `user`: Your PlanetScale database username - `password`: Your PlanetScale database password - `dbname`: The name of your database in PlanetScale - `sslmode=require`: Ensures SSL encryption is used (required for PlanetScale) :::info The above connection string works with DuckDB. PlanetScale suggests also using the `sslnegotiation` and `sslrootcert` keys when connecting to Postgres, but these keys are not supported by the `libpq` version that is included in DuckDB. The `sslmode=require` parameter is sufficient for secure connections. ::: --- Source: https://motherduck.com/docs/integrations/databases/postgres --- sidebar_position: 1 title: PostgreSQL description: Connect PostgreSQL and MotherDuck using the Postgres endpoint, DuckDB's PostgreSQL extension, or pg_duckdb. --- :::tip[Looking for a Postgres-compatible connection to MotherDuck?] Use the **[Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/)** to connect any Postgres-wire-compatible client — BI tools, ORMs, serverless runtimes, or languages without a DuckDB SDK — directly to MotherDuck. No extension required. ::: [PostgreSQL](https://www.postgresql.org) is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later. 
As explained by DuckDB Lab's Hannes Mühleisen in the [explainer blog post](https://duckdb.org/2022/09/30/postgres-scanner.html): > PostgreSQL is designed for traditional transactional use cases, "OLTP", where rows in tables are created, updated and removed concurrently, and it excels at this. But this design decision makes PostgreSQL far less suitable for analytical use cases, "OLAP", where large chunks of tables are read to create summaries of the stored data. Yet there are many use cases where both transactional and analytical use cases are important, for example when trying to gain the latest business intelligence insights into transactional data. Choose the PostgreSQL workflow based on where your query needs to run. ## Query MotherDuck from PostgreSQL-compatible clients Use the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) when an application, BI tool, or serverless runtime needs to connect to MotherDuck through the PostgreSQL wire protocol. This is the preferred path for PostgreSQL-compatible clients because it does not require installing or operating a PostgreSQL extension. ## Load PostgreSQL data into MotherDuck Use [DuckDB's PostgreSQL extension](/key-tasks/loading-data-into-motherduck/loading-data-from-postgres) when a DuckDB client needs to read from PostgreSQL and copy data into MotherDuck. This workflow is best for one-time loads, backfills, and controlled client-side movement between PostgreSQL, DuckDB, and MotherDuck. ## Run DuckDB from inside PostgreSQL Use [pg_duckdb](/concepts/pgduckdb) when queries need to run inside a PostgreSQL server with DuckDB or MotherDuck access. This is useful when PostgreSQL-local tables need to be joined with DuckDB or MotherDuck data from the PostgreSQL environment itself. 
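As a rough illustration of the join case, the sketch below builds the Postgres-side schema name for a MotherDuck table and composes a join against a local Postgres table. It assumes the pg_duckdb schema mapping described in the PlanetScale guide (`my_db.main` maps to `public`, any other database and schema maps to `ddb$database$schema`). The helper names `pg_schema_for` and `build_join_sql`, and the table and column names passed to them, are hypothetical.

```python
def pg_schema_for(database: str, schema: str, default_database: str = "my_db") -> str:
    """Return the Postgres schema under which pg_duckdb exposes a MotherDuck schema."""
    # Assumption taken from the schema-mapping note: the default database's
    # `main` schema appears as `public`; everything else as ddb$<database>$<schema>.
    if database == default_database and schema == "main":
        return "public"
    return f"ddb${database}${schema}"


def build_join_sql(md_database: str, md_schema: str, md_table: str,
                   pg_table: str, key: str) -> str:
    """Compose a join between a MotherDuck table and a local Postgres table."""
    mapped_schema = pg_schema_for(md_database, md_schema)
    # The mapped schema name is double-quoted, as in the documented examples.
    return (
        "SELECT a.*, b.*\n"
        f'FROM "{mapped_schema}".{md_table} AS a\n'
        f"LEFT JOIN public.{pg_table} AS b ON a.{key} = b.{key};"
    )


# Hypothetical names, mirroring the placeholders used elsewhere in these docs.
print(build_join_sql("my_database", "my_schema", "my_table", "another_table", "key"))
```

The generated statement would be run from a Postgres session where pg_duckdb is enabled and connected to MotherDuck; the helpers only produce text and do not open a connection.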
--- Source: https://motherduck.com/docs/integrations/dev-tools/index --- title: Development Tools description: Developer tools and utilities that work with MotherDuck --- # Development Tools Use MotherDuck with various development tools and utilities to enhance your workflow. ## Included pages - [Retool](https://motherduck.com/docs/integrations/dev-tools/retool): Connect Retool to MotherDuck to build internal tools and dashboards powered by your cloud data warehouse. --- Source: https://motherduck.com/docs/integrations/dev-tools/retool --- sidebar_position: 1 title: Retool description: Connect Retool to MotherDuck to build internal tools and dashboards powered by your cloud data warehouse. --- [Retool](https://retool.com/) is a low-code platform for building internal tools and custom business applications with drag-and-drop UI components. There are two ways to connect Retool to MotherDuck, depending on whether you use Retool Cloud or self-hosted Retool. ## Retool Cloud (native connector) Retool Cloud has a native MotherDuck resource type. To connect: 1. Go to **Resources** and select **Create new** > **Resource**. 2. Search for **MotherDuck** and select it. 3. Give the resource a descriptive name (for example, "MotherDuck analytics"). 4. Under **Resource credentials**, enter your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token). 5. Optionally enter a **Database name**. Leave it empty to use workspace mode, which lets you query across multiple databases. 6. Click **Test connection**, then **Create resource**. You can use this resource in your Retool apps to run SQL queries against your MotherDuck databases. The resource supports both SQL mode for reading data and GUI mode for write operations (insert, update, delete, upsert). 
### Connection options You can pass optional key-value pairs under **Connection options** to customize behavior: | Option | Values | Description | |--------|--------|-------------| | `access_mode` | `READ_WRITE`, `READ_ONLY` | Controls whether the connection can write data | | `attach_mode` | `single`, `workspace` | Sets the [attach mode](/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/). `single` scopes the connection to one database (useful when querying a specific tenant or to avoid catalog clutter); `workspace` (default) attaches every database in your saved workspace. | | `TimeZone` | For example, `UTC`, `America/New_York` | Sets the session time zone | | `default_null_order` | `NULLS_FIRST`, `NULLS_LAST` | Default null ordering for queries | | `default_order` | `ASC`, `DESC` | Default sort order for queries | For more details, see the [Retool MotherDuck documentation](https://docs.retool.com/data-sources/guides/connect/motherduck). ### Known limitations - `BLOB` and `ARRAY` column types are not supported by the native connector. Queries that return these types will fail. Cast these columns to a supported type (for example, using `CAST` or `list_string_agg`) or exclude them from your result set. ## Self-hosted (JDBC) If you run a self-hosted Retool instance, you can connect to MotherDuck through the [DuckDB JDBC driver](/integrations/language-apis-and-drivers/jdbc-driver/). Your instance needs network access to `motherduck.com` over HTTPS (port 443). 1. In your Retool instance, go to **Resources** and select **Create new**. 2. Choose **JDBC** as the resource type. 3. Use the following JDBC connection string: ```text jdbc:duckdb:md:?motherduck_token= ``` Replace `` with your MotherDuck database and `` with your [access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token). 4. Test the connection and save. 
For more details on the JDBC driver, see [JDBC driver](/integrations/language-apis-and-drivers/jdbc-driver/). --- Source: https://motherduck.com/docs/integrations/file-formats/apache-iceberg --- sidebar_position: 1 title: Apache Iceberg description: Query Apache Iceberg data and work with Iceberg REST catalogs from MotherDuck sessions using the Iceberg DuckDB extension. --- MotherDuck supports the [Apache Iceberg format](https://iceberg.apache.org/) through the [DuckDB Iceberg extension](https://duckdb.org/docs/stable/core_extensions/iceberg/overview). The extension is loaded automatically when Iceberg functions or catalogs are used in your current MotherDuck session. ## Iceberg REST catalogs You can attach an [Iceberg REST catalog](https://duckdb.org/docs/stable/core_extensions/iceberg/iceberg_rest_catalogs) in your current MotherDuck session and query it with standard SQL. The attached catalog is local to the current client session: it does not become a persisted MotherDuck database or workspace attachment. ### Authentication Create a secret with your catalog credentials: ```sql -- OAuth2 CREATE SECRET my_iceberg_secret ( TYPE iceberg, CLIENT_ID 'my_client_id', CLIENT_SECRET 'my_client_secret', OAUTH2_SERVER_URI 'https://my-catalog.example.com/v1/oauth/tokens' ); -- Bearer token CREATE SECRET my_iceberg_secret ( TYPE iceberg, TOKEN 'my_bearer_token' ); ``` ### Attaching a catalog ```sql ATTACH 'my_warehouse' AS my_iceberg ( TYPE iceberg, SECRET my_iceberg_secret, ENDPOINT 'https://my-catalog.example.com' ); ``` :::note This `ATTACH` adds the Iceberg catalog to your current client session only. Re-attach it in each new session. Use `DETACH my_iceberg;` to remove it from the current session. 
::: Once attached, browse and query tables using standard SQL: ```sql -- List schemas SHOW SCHEMAS IN my_iceberg; -- Query a table SELECT * FROM my_iceberg.my_schema.my_table; ``` ### Session-scoped write operations Within the attached session, DuckDB's Iceberg REST catalog support includes operations such as creating schemas and tables and inserting data: ```sql CREATE SCHEMA my_iceberg.analytics; CREATE TABLE my_iceberg.analytics.events ( event_id INT, event_type VARCHAR, created_at TIMESTAMP ); INSERT INTO my_iceberg.analytics.events VALUES (1, 'page_view', '2025-01-15 10:30:00'); ``` ### Additional DuckDB Iceberg catalog features DuckDB documents additional Iceberg REST catalog capabilities such as time travel for attached catalogs. Refer to the upstream documentation for the current support matrix and syntax details. ```sql SELECT * FROM my_iceberg.my_schema.my_table AT (VERSION => 1234567890); SELECT * FROM my_iceberg.my_schema.my_table AT (TIMESTAMP => TIMESTAMP '2025-01-15 10:30:00'); ``` ### Limitations - Attached Iceberg REST catalogs are local to the current client session and are not persisted as MotherDuck workspace attachments - `UPDATE` and `DELETE` only work on unpartitioned, unsorted tables - Only merge-on-read semantics (no copy-on-write) - `MERGE INTO` and `ALTER TABLE` are not supported - Reading from REST catalogs is limited to S3, S3 Tables, and GCS storage backends For more details, see the [DuckDB Iceberg REST catalog documentation](https://duckdb.org/docs/stable/core_extensions/iceberg/iceberg_rest_catalogs). ## Scanning individual Iceberg tables Use `iceberg_scan` to query individual Iceberg tables directly by path, without attaching a catalog: ```sql SELECT count(*) FROM iceberg_scan('s3://my-bucket/my-iceberg-table', allow_moved_paths = true); ``` :::note To query data in a secure Amazon S3 bucket, you will need to configure your [Amazon S3 credentials](../../cloud-storage/amazon-s3). 
::: ### `iceberg_scan` parameters | Parameter | Type | Default | Description | | :--- | :--- | :--- | :--- | | `allow_moved_paths` | `BOOLEAN` | `false` | Allow scanning Iceberg tables that have been moved or relocated | | `metadata_compression_codec` | `VARCHAR` | `''` | Set to `'gzip'` to read gzip-compressed metadata files | | `snapshot_from_id` | `UBIGINT` | `NULL` | Query a specific snapshot by ID | | `snapshot_from_timestamp` | `TIMESTAMP` | `NULL` | Query the latest snapshot as of a given timestamp | | `version` | `VARCHAR` | `'?'` | Explicit version string, hint file path, or `'?'` for auto-detection | | `version_name_format` | `VARCHAR` | `'v%s%s.metadata.json,%s%s.metadata.json'` | Custom metadata filename pattern | ### Time travel with `iceberg_scan` ```sql -- Query a specific snapshot SELECT * FROM iceberg_scan('s3://my-bucket/my-iceberg-table', allow_moved_paths = true, snapshot_from_id = 1234567890); -- Query as of a timestamp SELECT * FROM iceberg_scan('s3://my-bucket/my-iceberg-table', allow_moved_paths = true, snapshot_from_timestamp = TIMESTAMP '2025-01-15 10:30:00'); ``` ### Metadata and snapshot functions Use `iceberg_metadata` to inspect manifest entries (file paths, formats, record counts): ```sql SELECT * FROM iceberg_metadata('s3://my-bucket/my-iceberg-table', allow_moved_paths = true); ``` Use `iceberg_snapshots` to list available snapshots: ```sql SELECT * FROM iceberg_snapshots('s3://my-bucket/my-iceberg-table'); ``` ### Example with sample dataset ```sql SELECT count(*) FROM iceberg_scan('s3://us-prd-motherduck-open-datasets/iceberg/lineitem_iceberg', allow_moved_paths = true); ``` --- Source: https://motherduck.com/docs/integrations/file-formats/delta-lake --- sidebar_position: 1 title: Delta Lake description: Query Delta Lake tables from MotherDuck using the Delta DuckDB extension. --- MotherDuck supports querying data in the [Delta Lake format](https://delta.io/). 
The [Delta DuckDB extension](https://duckdb.org/docs/extensions/delta.html) is loaded automatically when any of the supported Delta Lake functions are called.

## Delta function

| Function Name | Description | Supported parameters |
| :--- | :--- | :--- |
| `delta_scan` | Query Delta Lake data | All the `parquet_scan` parameters plus `delta_file_number`. |

:::note
The available functions are only for reading Delta Lake data. Creating or updating data in Delta format is not yet supported.
:::

## Examples

```sql
-- query data
SELECT COUNT(*) FROM delta_scan('path-to-delta-folder');

-- query data with parameters
FROM delta_scan('path-to-delta-folder', delta_file_number=1, file_row_number=1);
```

### Query Delta data stored in Amazon S3

:::warning
At the moment, querying Delta tables stored in Amazon S3 from **public** buckets is not supported.
:::

[Create an S3 secret](/sql-reference/motherduck-sql-reference/create-secret.md) in MotherDuck using the secret manager:

```sql
CREATE SECRET IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 's3_access_key',
    SECRET 's3_secret_key',
    REGION 's3-region'
);
```

Query Delta data stored in S3:

```sql
SELECT count(*) FROM delta_scan('s3:///');
```

:::note
To query data in an Amazon S3 bucket, you will need to configure your [Amazon S3 credentials](../../cloud-storage/amazon-s3).
:::

Example using the MotherDuck Delta sample dataset:

```sql
SELECT COUNT(*) FROM delta_scan('s3://us-prd-motherduck-open-datasets/file_format_demo/delta_lake/dat/out/reader_tests/generated/basic_append/delta');
```

---

Source: https://motherduck.com/docs/integrations/file-formats/ducklake

---
sidebar_position: 1
title: DuckLake
description: Use DuckLake for transactional lakehouse analytics with MotherDuck-managed metadata and object storage.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';
import Versions from '@site/src/components/Versions';

::::note
MotherDuck supports DuckDB . In **US East (N.
Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions through . In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client version through . :::: [DuckLake](https://ducklake.select) is an integrated data lake and catalog format. DuckLake delivers advanced data lake features without traditional lakehouse complexity by using Parquet files and a SQL database. MotherDuck provides two main options for creating and integrating with DuckLake databases: - **[Fully managed](#creating-a-fully-managed-ducklake-database)**: Create a DuckLake database where MotherDuck manages both data storage and metadata - **[Bring your own bucket (BYOB)](#bring-your-own-bucket)**: Connect your own S3 or R2 bucket for data storage with: - **[MotherDuck compute + MotherDuck catalog](#using-motherduck-compute)**: Use MotherDuck for both compute and catalog services - **[Own compute + MotherDuck catalog](#using-own-compute)**: Use your own DuckDB client for compute while MotherDuck provides catalog services ## Creating a fully managed DuckLake database Create a fully managed DuckLake with the following command: ```sql CREATE DATABASE my_ducklake (TYPE DUCKLAKE); ``` MotherDuck stores both data and metadata in MotherDuck-managed storage (not externally accessible at the moment), providing a streamlined way to evaluate DuckLake functionality. The `my_ducklake` database can be accessed like any other MotherDuck database — including over the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/) for clients that don't use the DuckDB SDK. To inspect the metadata catalog backing the DuckLake, see [Performing metadata operations on a DuckLake](#performing-metadata-operations-on-a-ducklake). 
You can attach the DuckLake metadata with: ```sql ATTACH 'md:__ducklake_metadata_' AS ; ``` ::::note The metadata database can only be attached by the database owner. :::: ## Data inlining Data inlining is an optimization feature that stores small data changes directly in the metadata catalog rather than creating individual Parquet files for every insert operation. This eliminates the overhead of creating small Parquet files while maintaining full query and update capabilities. ### Creating a DuckLake database with custom inlining To create a (fully managed) DuckLake database with a custom inlining threshold: ```sql CREATE DATABASE my_ducklake ( TYPE DUCKLAKE, DATA_INLINING_ROW_LIMIT 100 ); ``` This configuration will inline all inserts with fewer than 100 rows directly into the metadata catalog. ### How data inlining works Data inlining is **enabled by default** with a threshold of 10 rows. Any insert writing fewer than 10 rows is automatically stored inline in the metadata catalog rather than creating a Parquet file. You can customize the threshold with the `DATA_INLINING_ROW_LIMIT` parameter. For example, if you set it to 100, inserts with fewer than 100 rows are stored inline, while inserts with 100 or more rows create Parquet files. Set it to 0 to disable inlining. The inlining threshold applies **per insert operation**. For example, if the limit is set to 100, four separate inserts of 50 rows each will all be stored inline (200 total rows), because each individual insert is below the threshold. When an insert exceeds the threshold, that insert writes directly to a Parquet file, but any previously inlined data remains in the metadata catalog. Larger inserts do not automatically flush existing inlined data. 
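The per-insert threshold rule described above can be sketched as a small simulation. This is an illustrative model only, not DuckLake's implementation: the `InliningModel` class and its fields are hypothetical names, and it encodes just the rules stated in this section (inserts with fewer rows than the limit are inlined, a limit of 0 disables inlining, and larger inserts leave earlier inlined rows untouched).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InliningModel:
    """Toy model of the per-insert inlining rule (hypothetical, for illustration)."""

    row_limit: int = 10  # documented default threshold
    inlined_rows: int = 0  # rows currently held in the metadata catalog
    parquet_files: List[int] = field(default_factory=list)  # row count per file written

    def insert(self, n_rows: int) -> str:
        """Record one insert and return where its rows would be stored."""
        # A limit of 0 disables inlining; otherwise only inserts strictly
        # below the limit are stored inline.
        if self.row_limit > 0 and n_rows < self.row_limit:
            self.inlined_rows += n_rows
            return "inline"
        # At or above the limit, this insert becomes its own Parquet file.
        # Previously inlined rows are not flushed by this insert.
        self.parquet_files.append(n_rows)
        return "parquet"


model = InliningModel(row_limit=100)
placements = [model.insert(50) for _ in range(4)]  # four small inserts
placements.append(model.insert(100))               # one insert at the limit
print(placements, model.inlined_rows, model.parquet_files)
```

In this model the four 50-row inserts leave 200 rows inlined even after the 100-row insert writes its own file, matching the statement that larger inserts do not flush existing inlined data.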
### Flushing inlined data

Because inlined data can accumulate, it is good practice to periodically flush it to Parquet storage using the `ducklake_flush_inlined_data` function:

```sql
-- Flush inlined data for a specific table
SELECT ducklake_flush_inlined_data('my_ducklake.my_schema.my_table');

-- Flush all inlined data in a schema
SELECT ducklake_flush_inlined_data('my_ducklake.my_schema');

-- Flush all inlined data in the database
SELECT ducklake_flush_inlined_data('my_ducklake');
```

For workloads with frequent small inserts, schedule regular flushes to prevent excessive inlined data accumulation.

> Automatic background flush operations are in active development.

### Configuring inlining

You can override the database-level inlining threshold for individual tables:

```sql
-- Disable inlining for a specific table
CALL my_ducklake.set_option('data_inlining_row_limit', 0, table_name => 'my_table');

-- Set a custom threshold for a specific table
CALL my_ducklake.set_option('data_inlining_row_limit', 50, table_name => 'my_table');
```

You can also set a session-level default that applies to new tables:

```sql
SET ducklake_default_data_inlining_row_limit = 0;
```

## DuckLake configuration

DuckLake provides configuration options that you can set at the database or table level using the `set_option` function. For example, you can adjust the `parquet_row_group_size` to control how data is organized in Parquet files:

```sql
-- Set row group size for the entire database
CALL my_ducklake.set_option('parquet_row_group_size', 50000);

-- Set row group size for a specific table
CALL my_ducklake.set_option('parquet_row_group_size', 50000, table_name => 'my_table');
```

Note that calls to `set_option` take precedence over configuration passed when creating the database.
```sql CREATE DATABASE my_ducklake ( TYPE DUCKLAKE, DATA_INLINING_ROW_LIMIT 100 -- sets database level inlining row limit to 100 ); -- overrides the prior value and sets database level row limit to 250 CALL my_ducklake.set_option('data_inlining_row_limit', 250); -- overrides prior value _ONLY_ for `my_table`. CALL my_ducklake.set_option('data_inlining_row_limit', 0, table_name => 'my_table'); ``` For the full list of available configuration options, see the [DuckLake configuration reference](https://ducklake.select/docs/stable/duckdb/usage/configuration#setting-config-values-1). ## Bring your own bucket (BYOB) {#bring-your-own-bucket} You can use MotherDuck as a compute engine and managed DuckLake catalog while connecting your own [AWS S3](/integrations/cloud-storage/amazon-s3/) or [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2/) object store for data storage. Additionally, you can bring your own compute (BYOC) using your DuckDB client to query and write data directly to your DuckLake. ### Setup Configure a custom data path when creating your DuckLake to use your own bucket. :::note Your S3 bucket must be in the same region as your MotherDuck organization: - **US organizations**: `us-east-1` (US East - N. Virginia) - **EU organizations**: `eu-central-1` (Europe - Frankfurt) Other AWS regions are not supported for BYOB. This does not affect reading files directly from S3-compatible object stores (for example, CSV or Parquet): you can still query data from buckets in any region. ::: ```sql CREATE DATABASE my_ducklake ( TYPE DUCKLAKE, DATA_PATH 's3://mybucket/my_optional_path/' ); ``` :::tip Cloudflare R2 buckets are not bound to a specific region, so you can use them with any MotherDuck organization regardless of region. When creating your R2 bucket, set a [location hint](https://developers.cloudflare.com/r2/reference/data-location/) close to your MotherDuck region to minimize latency (for example, `enam` for US organizations, `weur` for EU organizations). 
:::

```sql
CREATE DATABASE my_ducklake (
    TYPE DUCKLAKE,
    DATA_PATH 'r2://mybucket/my_optional_path/'
);
```

Create a corresponding secret in MotherDuck to allow MotherDuck compute to access your bucket. See [Cloud Storage integrations](/integrations/cloud-storage/) for instructions on creating secrets for your provider.

You can then create DuckLake tables as you would with a standard DuckDB database using either MotherDuck or local compute as shown in the examples below.

#### Required permissions for DuckLake

The minimum required IAM permissions are:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "${s3_bucket_arn}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "${s3_bucket_arn}/*"
    }
  ]
}
```

Your R2 API token needs the following permissions on the bucket:

- **Object Read** - read data files
- **Object Write** - write and delete data files
- **Bucket List** - list objects in the bucket

See the [Cloudflare R2 API tokens documentation](https://developers.cloudflare.com/r2/api/s3/tokens/) for instructions on creating an API token.

### Using MotherDuck compute

Connect to MotherDuck:

```bash
./duckdb md:
```

Create your first DuckLake table from a hosted Parquet file:

```sql
CREATE TABLE my_ducklake.air_quality AS
SELECT * FROM 'https://us.data.motherduck.com/who_ambient_air_quality/parquet/who_ambient_air_quality_database_version_2024.parquet';
```

Query using MotherDuck:

```sql
SELECT
    year,
    AVG(pm25_concentration::double) AS avg_pm25,
    AVG(pm10_concentration::double) AS avg_pm10,
    AVG(no2_concentration::double) AS avg_no2
FROM my_ducklake.air_quality
WHERE city = 'Berlin/DEU'
GROUP BY year
ORDER BY year DESC;
```

### Using own compute

To use your own compute (for example, your DuckDB client), you must:

1. Ensure you have appropriate credentials in your compute environment to read/write to your defined `DATA_PATH` (specified at database creation)
2.
Attach the DuckLake using the `ducklake:` prefix so compute runs locally against the MotherDuck-managed metadata catalog Create a secret in your compute environment: If you have authenticated using `aws sso login`: ```sql CREATE OR REPLACE SECRET my_secret IN MOTHERDUCK ( TYPE S3, PROVIDER credential_chain, CHAIN 'sso', PROFILE '' ); ``` :::note Run `aws sso login --profile ` before creating the secret to refresh your SSO token. You may need to restart your DuckDB CLI session after logging in for the credentials to be picked up. Starting with DuckDB v1.4.0, credentials are validated at creation time: if validation fails, confirm your SSO session is active and that you are using the correct `CHAIN` and `PROFILE`. ::: Alternatively, provide static AWS keys: ```sql CREATE SECRET my_secret IN MOTHERDUCK ( TYPE S3, KEY_ID 'my_s3_access_key', SECRET 'my_s3_secret_key', REGION 'my-bucket-region', SCOPE 'my-bucket-path' ); ``` ```sql CREATE SECRET my_secret IN MOTHERDUCK ( TYPE R2, KEY_ID 'your_r2_access_key', SECRET 'your_r2_secret_key', ACCOUNT_ID 'your_account_id' ); ``` Attach the DuckLake to your DuckDB session, pointing at the MotherDuck-managed metadata catalog and your data bucket: ```sql ATTACH 'ducklake:md:__ducklake_metadata_' AS (DATA_PATH ''); ``` This tells DuckLake to: - Use `ducklake:md:__ducklake_metadata_` as the metadata catalog (through MotherDuck) - Use `` for reading and writing data files - Run all compute locally on your DuckDB client rather than on MotherDuck The `ducklake:` prefix is what enables local compute. Attaching with `ATTACH 'md:__ducklake_metadata_'` (without the prefix) gives you the metadata catalog for inspection only -- see [Performing metadata operations on a DuckLake](#performing-metadata-operations-on-a-ducklake). 
Create a table using your own compute: ```sql CREATE TABLE .air_quality AS SELECT * FROM 'https://us.data.motherduck.com/who_ambient_air_quality/parquet/who_ambient_air_quality_database_version_2024.parquet'; ``` With this configuration, your own compute can directly access or write data to your DuckLake (assuming appropriate credentials are configured). Data uploaded using your own compute will appear in the MotherDuck catalog and be queryable as a standard MotherDuck database. ## What's new in DuckLake 1.0 DuckLake 1.0 is the first production-ready release, with a stable specification and backward-compatibility guarantees. Highlights include: - **Stable specification and multi-engine support**: The DuckLake 1.0 spec is stable with backward-compatibility guarantees going forward, and is designed to be used across multiple query engines. - **Full inlining for inserts, updates, and deletes**: Updates now join inserts and deletes in being inlined into the metadata catalog when under the row threshold (10 by default). Customize with `DATA_INLINING_ROW_LIMIT`. - **Clustering with `SET SORTED BY`**: Declare sort keys on columns or arbitrary SQL expressions. DuckLake applies the sort during compaction and inline flush (and optionally on insert), improving row-group and file pruning for filtered queries. - **Bucket partitioning**: Iceberg-compatible `bucket(N, column)` transforms for high-cardinality columns, giving a middle ground between traditional partitioning and avoiding the small-files problem. - **GEOMETRY enhancements**: Per-file bounding-box statistics enable file pruning on spatial filters, and `GEOMETRY` can now be nested inside `STRUCT`, `LIST`, and `MAP`. - **VARIANT type with shredded statistics**: `VARIANT` sub-fields receive file-level statistics, enabling filter pushdown and faster selective queries over semi-structured data. 
- **Deletion vectors (experimental)**: Iceberg v3-compatible deletion vectors, stored as Puffin files as an alternative to delete files. See the [DuckLake 1.0 release post](https://ducklake.select/2026/04/13/ducklake-10/) and the [MotherDuck announcement](https://motherduck.com/blog/announcing-ducklake-1-0-on-motherduck/) for more detail. ## Additional DuckLake features DuckLake on MotherDuck also supports: - **Stats-only `COUNT(*)`**: Simple `COUNT(*)` queries are answered directly from metadata statistics without scanning data files. - **TopN file pruning**: `LIMIT` queries with an `ORDER BY` skip data files that fall outside the requested range, making paginated and top-N queries faster. - **Expressions as default values**: Column defaults can use expressions like `now()`, not only literal values. - **Macros**: DuckLake catalogs can store [macros](https://duckdb.org/docs/sql/statements/create_macro.html). ## Performing metadata operations on a DuckLake DuckLake databases provide additional metadata operations for introspection and maintenance. These operations can be performed from both MotherDuck and your own compute environments. For example, you can [list the snapshots](https://ducklake.select/docs/stable/duckdb/usage/snapshots) backing your DuckLake. Every DuckLake database in MotherDuck has a corresponding **metadata database** that stores internal state, including schema definitions, snapshots, file mappings, and more. To inspect this metadata catalog directly from any DuckDB session -- this works for both fully managed and BYOB databases: ```sql ATTACH 'md:__ducklake_metadata_' AS ; ``` ::::note The metadata database can only be attached by the database owner. This form attaches the metadata catalog for inspection only. To run DuckLake compute locally against your data, use the `ducklake:` ATTACH form shown in [Using own compute](#using-own-compute). 
:::: ## Current limitations - **Limited sharing options**: Read-only sharing is supported through the [existing share functionality](/key-tasks/sharing-data/), restricted to auto-update shares only - **Single-account write access**: Write permissions are limited to one account per database. This account can perform multiple concurrent writes, as long as they are append-only. If multiple queries attempt to update or delete from the same table concurrently, only the first to commit will succeed. Concurrent DDL operations are also not allowed. Support for *multi-account* write access is planned for a future release. - **Limited BYOB storage providers**: Bring Your Own Bucket is supported for [AWS S3](/integrations/cloud-storage/amazon-s3/) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2/) storage. Other clouds are under consideration for future support. :::info For multiple concurrent readers to a MotherDuck DuckLake database, you can create a [read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/). ::: --- Source: https://motherduck.com/docs/integrations/file-formats/index --- title: File Formats description: Load data into MotherDuck using various file formats --- # File Formats Load data into MotherDuck using various file formats. ## Included pages - [Apache Iceberg](https://motherduck.com/docs/integrations/file-formats/apache-iceberg): Query Apache Iceberg data and work with Iceberg REST catalogs from MotherDuck sessions using the Iceberg DuckDB extension. - [Delta Lake](https://motherduck.com/docs/integrations/file-formats/delta-lake): Query Delta Lake tables from MotherDuck using the Delta DuckDB extension. - [DuckLake](https://motherduck.com/docs/integrations/file-formats/ducklake): Use DuckLake for transactional lakehouse analytics with MotherDuck-managed metadata and object storage. 
--- Source: https://motherduck.com/docs/integrations/how-to-integrate --- sidebar_position: 999 title: Creating a new integration description: Guidelines for integrating your application with MotherDuck, including connection strings and custom user agent configuration. --- import Tabs from '@theme/Tabs' import TabItem from '@theme/TabItem' Integrating with MotherDuck follows the same pattern as integrating with DuckDB, so you can use the same client APIs and frameworks. There are three differences: 1. Use `md:` or `md:analytics` as the connection string instead of a local filesystem path. 2. Pass `motherduck_token` through a config dictionary, connection string parameter, or environment variable. 3. Pass `custom_user_agent` so MotherDuck can identify the integration in query history. ### Choose a `custom_user_agent` format {#custom-user-agent-format} Use the format `integration/version(metadata1,metadata2)`. The version and metadata sections are optional. - Avoid spaces in the integration and version sections. - Separate multiple metadata values with commas. - If you plan to group by one workload label later, put it first in the metadata list. 
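The format rules above can be sketched as a small helper that assembles the string and rejects values that would break the format. This is an illustrative sketch only: `build_custom_user_agent` is a hypothetical name, not a MotherDuck or DuckDB function, and rejecting commas inside metadata values is an assumption made here to keep the comma-separated list unambiguous.

```python
from typing import Optional, Sequence


def build_custom_user_agent(
    integration: str,
    version: Optional[str] = None,
    metadata: Sequence[str] = (),
) -> str:
    """Format a value as integration/version(metadata1,metadata2).

    Hypothetical helper for illustration; not a MotherDuck API.
    """
    if not integration:
        raise ValueError("integration name is required")
    # Avoid spaces in the integration and version sections.
    for label, part in (("integration", integration), ("version", version or "")):
        if any(ch.isspace() for ch in part):
            raise ValueError(f"{label} must not contain spaces: {part!r}")
    # Assumption: a comma inside a metadata value would be read as a separator.
    for value in metadata:
        if "," in value:
            raise ValueError(f"metadata value must not contain commas: {value!r}")

    agent = integration
    if version:
        agent += f"/{version}"
    if metadata:
        agent += "(" + ",".join(metadata) + ")"
    return agent


# Put the workload label you plan to group by first in the metadata list.
print(build_custom_user_agent("catalogsync", "5.1.5.1", ["batchload", "useast1"]))
```

The resulting string is what you would pass as `custom_user_agent` in the connection configuration.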
Examples: - `catalogsync` - `catalogsync/5.1.5.1` - `catalogsync/5.1.5.1(batchload)` - `catalogsync/5.1.5.1(batchload,useast1)` ## Language and framework examples {#custom-user-agent-examples} ```python con = duckdb.connect("md:analytics", config={ "motherduck_token": token, "custom_user_agent": "catalogsync/5.1.5.1(batchload,useast1)" }) ``` ```python eng = create_engine("duckdb:///md:analytics", connect_args={ "config": { "motherduck_token": token, "custom_user_agent": "catalogsync/5.1.5.1(batchload,useast1)" } }) ``` ```java Properties config = new Properties(); config.setProperty("motherduck_token", token); config.setProperty("custom_user_agent", "catalogsync/5.1.5.1(batchload,useast1)"); Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:analytics", config); ``` ```javascript import { DuckDBInstance } from '@duckdb/node-api' const instance = await DuckDBInstance.create("md:analytics", { motherduck_token: token, custom_user_agent: "catalogsync/5.1.5.1(batchload,useast1)" }) const conn = await instance.connect() ``` ```go dsn := fmt.Sprintf( "md:analytics?motherduck_token=%s&custom_user_agent=%s", url.QueryEscape(token), url.QueryEscape("catalogsync/5.1.5.1(batchload,useast1)"), ) db, err := sql.Open("duckdb", dsn) ``` ## Implementation best practices If you use DuckDB or MotherDuck in a shared environment where one process serves multiple users, the connection string must be unique per user. You can disambiguate the connection string with a user-specific parameter such as `md:analytics?session_user=`. If you pass `motherduck_token` in the connection string, ensure your application does not log it in plaintext. --- Source: https://motherduck.com/docs/integrations/ingestion/dlt --- title: dlt (data load tool) description: Use dlt to extract and load data from APIs and databases into MotherDuck with automatic schema inference. 
---

[dlt](https://dlthub.com/docs/intro) is an open-source Python library that loads data from various, often messy data sources into well-structured, live datasets. It offers a lightweight interface for extracting data from REST APIs, SQL databases, cloud storage, Python data structures, and many more.

dlt is designed to be easy to use, flexible, and scalable:

* dlt infers schemas and data types, normalizes the data, and handles nested data structures.
* dlt supports a variety of popular destinations and has an interface to add custom destinations to create reverse ETL pipelines.
* dlt can be deployed anywhere Python runs, be it on Airflow, serverless functions, or any other cloud deployment of your choice.
* dlt automates pipeline maintenance with schema evolution and schema and data contracts.

dlt integrates well with DuckDB (dlt also uses DuckDB as a local [cache](https://dlthub.com/blog/dltplus-project-cache-in-early-access)) and therefore with MotherDuck. For more about the MotherDuck integration, see the [official documentation](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck).

## Authentication

To authenticate with MotherDuck, you have two options:

1. **Environment variable:** export your `motherduck_token` as an environment variable:

   ```bash
   export motherduck_token="your_motherduck_token"
   ```

2. **Local development:** add the token to `.dlt/secrets.toml`:

   ```toml
   [destination.motherduck.credentials]
   password = "my_motherduck_token"
   ```

## Minimal example

Below is a minimal example of using dlt to load data from a REST API (with fake data) into a DuckDB (MotherDuck) database:

```python
import dlt
from typing import Dict, Iterator, List, Sequence
import random
from datetime import datetime
from dlt.sources import DltResource

@dlt.source(name="dummy_github")
def dummy_source(repos: List[str] = None) -> Sequence[DltResource]:
    """
    A minimal DLT source that generates dummy GitHub-like data.
Args: repos (List[str]): A list of dummy repository names. Returns: Sequence[DltResource]: A sequence of resources with dummy data. """ if repos is None: repos = ["dummy/repo1", "dummy/repo2"] return ( dummy_repo_info(repos), dummy_languages(repos), ) @dlt.resource(write_disposition="replace") def dummy_repo_info(repos: List[str]) -> Iterator[Dict]: """ Generates dummy repository information. Args: repos (List[str]): List of repository names. Yields: Iterator[Dict]: An iterator over dummy repository data. """ for repo in repos: owner, name = repo.split("/") yield { "id": random.randint(10000, 99999), "name": name, "full_name": repo, "owner": {"login": owner}, "description": f"This is a dummy repository for {repo}", "created_at": datetime.now().isoformat(), "updated_at": datetime.now().isoformat(), "stargazers_count": random.randint(0, 1000), "forks_count": random.randint(0, 500), } @dlt.resource(write_disposition="replace") def dummy_languages(repos: List[str]) -> Iterator[Dict]: """ Generates dummy language data for repositories in an unpivoted format. Args: repos (List[str]): List of repository names. Yields: Iterator[Dict]: An iterator over dummy language data. """ languages = ["Python", "JavaScript", "TypeScript", "C++", "Rust", "Go"] for repo in repos: # Generate 2-4 random languages for each repo num_languages = random.randint(2, 4) selected_languages = random.sample(languages, num_languages) for language in selected_languages: yield { "repo": repo, "language": language, "bytes": random.randint(1000, 100000), "check_time": datetime.now().isoformat(), } def run_minimal_example(): """ Runs a minimal example pipeline that loads dummy GitHub data to MotherDuck. 
""" # Define some dummy repositories repos = ["example/repo1", "example/repo2", "example/repo3"] # Configure the pipeline pipeline = dlt.pipeline( pipeline_name="minimal_github_pipeline", destination='motherduck', dataset_name="minimal_example", ) # Create the data source data = dummy_source(repos) # Run the pipeline with all resources info = pipeline.run(data) print(info) # Show what was loaded print("\nLoaded data:") print(f"- {len(repos)} repositories") print(f"- Languages for {len(repos)} repositories") if __name__ == "__main__": run_minimal_example() ``` dlt revolves around three core concepts: * Sources: Define where the data comes from. * Resources: Represent structured units of data within a source. * Pipelines: Manage the data loading process. In the example above: * dummy_source defines a source that simulates GitHub-like data. * dummy_repo_info and dummy_languages are resources producing repository and language data. * A pipeline loads this data into MotherDuck. The core integration with MotherDuck is defined in the pipeline configuration: ```python pipeline = dlt.pipeline( pipeline_name="minimal_github_pipeline", destination="motherduck", dataset_name="minimal_example", ) ``` Setting destination="motherduck" tells dlt to load the data into MotherDuck. --- Source: https://motherduck.com/docs/integrations/ingestion/index --- title: Ingestion description: Configure MotherDuck as the destination for your data in the following data ingestion tools --- # Ingestion Tools Configure MotherDuck as the destination for your data in the following data ingestion tools. ## Included pages - [dlt (data load tool)](https://motherduck.com/docs/integrations/ingestion/dlt): Use dlt to extract and load data from APIs and databases into MotherDuck with automatic schema inference. - [Streamkap](https://motherduck.com/docs/integrations/ingestion/streamkap): Stream Change Data Capture (CDC) events from databases into MotherDuck via S3 using Streamkap. 
---

Source: https://motherduck.com/docs/integrations/ingestion/streamkap

---
title: Streamkap
description: Stream Change Data Capture (CDC) events from databases into MotherDuck via S3 using Streamkap.
---

# Streamkap

[Streamkap](http://streamkap.com) is a stream processing platform built for Change Data Capture (CDC) and event sources. It makes it easy to move operational data into analytics systems like MotherDuck with low latency and high reliability. Streamkap supports sources including PostgreSQL, MySQL, SQL Server, a range of SQL and NoSQL databases, Kafka, and other storage systems.

Streamkap is designed to get you streaming in minutes without a heavy setup. You focus on your business, and Streamkap handles the hard parts:

* Lightweight in-stream transformations let you preprocess, clean, and enrich data with minimal latency and cost.
* Automatic adaptation to schema changes: added or removed fields, renamed columns, evolving data types, and nested structures.
* Built-in observability and automated recovery reduce operational overhead.
* Fully managed via API or Terraform, integrates with CI/CD workflows, and automates environment provisioning.
* Deploy multiple service versions to isolate workloads, either logically (per microservice or environment) or physically (across regions or infrastructure).
* Choose from Streamkap Cloud or BYOC (Bring Your Own Cloud) for maximum flexibility and security.

You can explore Streamkap’s MotherDuck integration and examples in the [official documentation](https://docs.streamkap.com/motherduck).

# **Overview**

This guide explains how to stream data from Streamkap into a MotherDuck database using Amazon S3 as an intermediary. First, Streamkap's S3 connector streams data into an S3 bucket. Then, you configure MotherDuck to read from that bucket and ingest the data into your database.

* Streamkap to S3: Streamkap is Kafka-based, so Kafka messages are streamed into an Amazon S3 bucket via an existing dedicated S3 connector. See Streamkap’s [Kafka to S3 Streaming Guide](https://docs.streamkap.com/s3) for detailed instructions.
* S3 to MotherDuck: MotherDuck is configured to read the data from the S3 bucket and load it into the database.

# **Prerequisites**

* Amazon S3 Bucket: A bucket in Amazon S3 where data from Streamkap will be streamed.
* MotherDuck Account: A valid MotherDuck account and a database where the data will be loaded.
* Streamkap’s Kafka S3 Connector: Your Kafka to S3 connector, configured and running.

# **MotherDuck Setup**

Once data is available in the S3 bucket, you can configure MotherDuck to read from the bucket and load the data into your database. Follow these steps:

## Configure the S3 Source in MotherDuck

To read data from the S3 bucket into MotherDuck, create a secret that holds the AWS credentials MotherDuck uses to access the bucket.

1. Log in to MotherDuck and navigate to your workspace or database.
2. Go to **Secrets**.
3. Add a new secret and choose Amazon S3 as the secret type.
4. Provide the necessary details to access the S3 bucket:
   * Secret Name: The name of your source connection details.
   * Region: The region of your S3 bucket (e.g., us-west-2).
   * Access Key ID: Your AWS Access Key ID.
   * Secret Access Key: Your AWS Secret Access Key.

### SQL Command for Secret Configuration

Alternatively, you can configure the secret using SQL.
Below is an example configuration for setting up the secret:

```sql
CREATE SECRET IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'access_key',
    SECRET 'secret_key',
    REGION 'us-east-1'
);
```

### Verify Existing Secrets

To check your existing secrets, you can run the following SQL command:

```sql
FROM duckdb_secrets();
```

![Streamkap S3 secret configuration in MotherDuck](../img/streamkap_image1.png)

## Query Data from the S3 Bucket

Once the connection between MotherDuck and your S3 bucket is established, you can define a schema and table in MotherDuck or query the data directly from the S3 bucket.

Since your Kafka stream might be writing multiple files to the S3 bucket, we recommend using a wildcard `*` to read all files in a folder. Each query then picks up any new files that have been written to the bucket since the last run.

Here is an example SQL query to read data from your S3 bucket (using a wildcard for streaming):

```sql
SELECT key.id, value.name, value.note
FROM read_parquet('s3://streamkap-s3-test-bucket/parquet_test/*');
```

![Query results from S3 bucket in MotherDuck](../img/streamkap_image2.png)

---

Source: https://motherduck.com/docs/integrations/integrations

---
title: Integrations
description: Integrations that work with MotherDuck from the modern data stack
sidebar_class_name: integration-icon
---

import { IntegrationsTable } from "./integrations.table.js";
import "./integrations.css";

MotherDuck integrates with many common tools from the modern data stack. If you would like to create a new integration, see [this guide](how-to-integrate).

Below, you will find a comprehensive list of integrations that work with MotherDuck. Each integration includes links to either our own detailed tutorials, the integrator's documentation, or insightful articles and blogs that can help you get started.
:::info When working with integrations, it may be useful to be aware of the [different connection string parameters](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#using-connection-string-parameters) you can use to connect to MotherDuck. ::: ## Supported Integrations Use the search box to find specific integrations or click on category tags to filter the table. :::note See [DuckDB documentation](https://duckdb.org/docs/api/overview.html) for the full list of supported client APIs and drivers. ::: ## Diagram: Modern Duck Stack ![img_duck_stack](../img/md-diagram.svg) --- Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/go-driver --- sidebar_position: 1 title: Go driver description: Connect to MotherDuck from Go applications using the go-duckdb driver. --- The [go-duckdb driver](https://github.com/duckdb/duckdb-go) supports MotherDuck out of the box! To connect, you need a dependency on the driver in your `go.mod` file: ```go github.com/duckdb/duckdb-go/v2 v2.5.1 ``` Your code can then open a connection using the standard [database/sql](https://pkg.go.dev/database/sql) package, or any other mechanisms supported by [go-duckdb](https://github.com/duckdb/duckdb-go/blob/master/README.md): ```go db, err := sql.Open("duckdb", "md:my_db?motherduck_token=") ``` ## Go gotchas ### Use "motherduck_" prefixed configuration in the connection string Because `duckdb-go` parses all arguments out into a configuration dictionary, the shorthand properties such as `attach_mode` will not work. 
Use the fully qualified properties such as `motherduck_attach_mode` for the MotherDuck-specific properties: ```go db, err := sql.Open("duckdb", "md:my_db?motherduck_attach_mode=single") ``` ### Connecting to multiple accounts from the same process Because `duckdb-go` parses all arguments out into a configuration dictionary, trying to connect with multiple MotherDuck accounts (different `motherduck_token` values) from the same Go process will fail with [Can't open a connection to same database file with a different configuration](/documentation/troubleshooting/error_messages.md#disallowed-connections-with-a-different-configuration). If connecting to different accounts is a requirement, work around this by connecting to an in-memory DuckDB database first: ```go c, err := duckdb.NewConnector(":memory:?custom_user_agent=INTEGRATION_NAME/v1.2.3", func(execer driver.ExecerContext) error { bootQueries := []string{ `INSTALL motherduck`, `LOAD motherduck`, fmt.Sprintf("SET motherduck_token='%s'", token), `SET motherduck_session_name='user123'`, `ATTACH 'md:my_db'`, } for _, query := range bootQueries { _, err := execer.ExecContext(context.Background(), query, nil) if err != nil { return err } } return nil }) if err != nil { // handle the error } defer c.Close() db := sql.OpenDB(c) defer db.Close() ``` --- Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/index --- title: Language APIs & Drivers description: Connect to MotherDuck using your preferred programming language --- # Language APIs & Drivers Connect to MotherDuck using official drivers and APIs for various programming languages. ## Included pages - [Go driver](https://motherduck.com/docs/integrations/language-apis-and-drivers/go-driver): Connect to MotherDuck from Go applications using the go-duckdb driver. - [JDBC driver](https://motherduck.com/docs/integrations/language-apis-and-drivers/jdbc-driver): Connect to MotherDuck from Java applications using the official DuckDB JDBC driver. 
- [Python](https://motherduck.com/docs/integrations/language-apis-and-drivers/python/python-overview): Connect to MotherDuck using Python
- [R](https://motherduck.com/docs/integrations/language-apis-and-drivers/r): Connect to MotherDuck from R for statistical analysis using the DuckDB R package.

---

Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/jdbc-driver

---
sidebar_position: 1
title: JDBC driver
description: Connect to MotherDuck from Java applications using the official DuckDB JDBC driver.
---

import CodeBlock from '@theme/CodeBlock';
import appVersions from '@site/static/duckdb-versions.json';

The official [DuckDB JDBC driver](https://duckdb.org/docs/api/java.html) supports MotherDuck out of the box!

To connect, you need a dependency on the driver. For example, in your Maven pom.xml file:

<CodeBlock language="xml">{`<dependency>
    <groupId>org.duckdb</groupId>
    <artifactId>duckdb_jdbc</artifactId>
    <version>${appVersions.language_clients.duckdb_jdbc}</version>
</dependency>`}</CodeBlock>

Your code can then create a `Connection` by using the `jdbc:duckdb:md:databaseName` connection string format:

```java
Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db");
```

This `Connection` can then be [used directly](https://docs.oracle.com/en/java/javase/17/docs/api/java.sql/java/sql/Connection.html) or through any framework built on `java.sql` JDBC abstractions.

There are two main ways to programmatically authenticate with a valid MotherDuck token:

1) Passing the token through the connection configuration:

```java
Properties config = new Properties();
config.setProperty("motherduck_token", token);
Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:mdw", config);
```

2) Passing the token as a connection string parameter:

```java
Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db?motherduck_token="+token);
```

See [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md) for more details.
---

Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/python/python-overview

---
title: Python
description: Connect to MotherDuck using Python
---

Check out our [Python tutorial](/getting-started/interfaces/client-apis/python/installation-authentication).

---

Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/python/sqlalchemy

---
sidebar_position: 3
title: SQLAlchemy with DuckDB and MotherDuck
sidebar_label: SQLAlchemy
description: Connect to MotherDuck using SQLAlchemy and the DuckDB SQLAlchemy driver for Python applications.
---

[SQLAlchemy](https://www.sqlalchemy.org/) is a SQL toolkit and Object-Relational Mapping (ORM) system for Python, providing full support for SQL expression language constructs and various database dialects. Many Business Intelligence tools support SQLAlchemy out of the box.

Using the [DuckDB SQLAlchemy driver](https://github.com/Mause/duckdb_engine), you can connect to MotherDuck with a SQLAlchemy URI.

## Install the DuckDB SQLAlchemy driver

```bash
pip install --upgrade duckdb-engine
```

## Configuring the database connection to a local DuckDB database

A local DuckDB database can be accessed using the SQLAlchemy URI:

```bash
duckdb:///path/to/file.db
```

## Configuring the database connection to MotherDuck

The general pattern for the SQLAlchemy URI to access a MotherDuck database is:

```bash
duckdb:///md:?motherduck_token=
```

:::info
The database name `` in the connection string is **optional**. This makes it possible to query multiple databases with one connection to MotherDuck.
:::

You can connect and authenticate in several ways:

1. If no token is available, the process directs you to a web login for authentication, where you can obtain a token.
```python from sqlalchemy import create_engine, text eng = create_engine("duckdb:///md:my_db") with eng.connect() as conn: result = conn.execute(text("show databases")) for row in result: print(row) ``` When running the above, you will see something like this to authenticate: ![motherduck login](../img/sqlalchemy_auth.png) 2. The `MOTHERDUCK_TOKEN` is already set as environment variable ```python from sqlalchemy import create_engine, text eng = create_engine("duckdb:///md:my_db") with eng.connect() as conn: result = conn.execute(text("show databases")) for row in result: print(row) ``` 3. Using configuration dictionary ```python from sqlalchemy import create_engine, text config = {} token = 'asdfwerasdf' # Fill in your token config["motherduck_token"] = token; eng = create_engine( "duckdb:///md:my_db", connect_args={ 'config': config} ) with eng.connect() as conn: result = conn.execute(text("show databases")) for row in result: print(row) ``` 4. Passing the token as a connection string parameter ```python from sqlalchemy import create_engine, text token = 'asdfwerasdf' # Fill in your token eng = create_engine(f"duckdb:///md:my_db?motherduck_token={token}") with eng.connect() as conn: result = conn.execute(text("show databases")) for row in result: print(row) ``` :::info While the DuckDB Python API has a `.sql()` method on the connection API, SQLAlchemy does not. However, they both share the `.execute()` function and concept. More info in the [SQLAlchemy connection documentation](https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.Connection) ::: --- Source: https://motherduck.com/docs/integrations/language-apis-and-drivers/r --- sidebar_position: 1 title: R description: Connect to MotherDuck from R for statistical analysis using the DuckDB R package. --- [R](https://www.r-project.org/) is a language for statistical analysis. 
To connect to MotherDuck from an R program, you need to first install DuckDB:

```r
install.packages("duckdb")
```

You'll then need to load the `motherduck` extension and run `ATTACH 'md:'` to connect to all of your databases. To connect to only one database, use the `ATTACH 'md:my_db'` syntax.

```r
library("DBI")
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "INSTALL 'motherduck'")
dbExecute(con, "LOAD 'motherduck'")
dbExecute(con, "ATTACH 'md:'")
dbExecute(con, "USE my_db")
res <- dbGetQuery(con, "SHOW DATABASES")
print(res)
```

Once connected, any R syntax described in the [DuckDB documentation](https://duckdb.org/docs/api/r.html) should work.

:::note
Extension autoloading is turned off in R duckdb distributions, so `dbdir = "md:"` style connections do not connect to MotherDuck.
:::

## Considerations and limitations

### Windows integration

The MotherDuck extension is not currently available on Windows. As a workaround, you can use [WSL](https://learn.microsoft.com/en-us/windows/wsl/about) (Windows Subsystem for Linux).

---

Source: https://motherduck.com/docs/integrations/orchestration/index

---
title: Orchestration
description: Orchestrate data pipelines with MotherDuck
---

# Orchestration Tools

Build and manage data pipelines with MotherDuck using these orchestration tools.

## Included pages

No included pages are currently listed for this category.

---

Source: https://motherduck.com/docs/integrations/reverse-etl/index

---
title: Reverse ETL
description: Reverse ETL tools and utilities that work with MotherDuck
---

# Development Tools

Use MotherDuck with various development tools and utilities to enhance your workflow.

## Included pages

No included pages are currently listed for this category.
--- Source: https://motherduck.com/docs/integrations/serverless-compute/cloudflare-workers --- sidebar_position: 1 title: Cloudflare Workers description: Query MotherDuck from Cloudflare Workers using the Postgres wire protocol feature_stage: preview --- [Cloudflare Workers](https://workers.cloudflare.com/) is an edge compute platform for running serverless functions globally. Workers can connect to MotherDuck through the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) using the [`pg`](https://www.npmjs.com/package/pg) npm package. ## Connection Workers connect to MotherDuck using a standard Postgres connection string: ```typescript import { Client } from "pg"; const connectionString = `postgresql://user:${MOTHERDUCK_TOKEN}@pg.us-east-1-aws.motherduck.com:5432/${DATABASE}?sslmode=require`; const client = new Client({ connectionString }); await client.connect(); ``` Key requirements: - The `nodejs_compat` compatibility flag must be enabled in `wrangler.toml` — it provides the `node:net` module that `pg` needs for TCP connections. - Use `?sslmode=require`. MotherDuck's Postgres endpoint only accepts encrypted connections, and Cloudflare Workers delegates certificate verification to the runtime's TLS stack. - Store your MotherDuck token as a [Wrangler secret](https://developers.cloudflare.com/workers/configuration/secrets/) — never commit tokens to source code. ## Connection pooling with Hyperdrive For production workloads, [Cloudflare Hyperdrive](https://developers.cloudflare.com/hyperdrive/) provides built-in connection pooling. This reduces latency by reusing connections across Worker invocations instead of opening a new connection per request. 
```toml # wrangler.toml [[hyperdrive]] binding = "MD_HYPERDRIVE" id = "" ``` ```typescript const client = new Client({ connectionString: env.MD_HYPERDRIVE.connectionString, }); ``` For local development with `wrangler dev`, add a `localConnectionString` to the Hyperdrive binding or export `CLOUDFLARE_HYPERDRIVE_LOCAL_CONNECTION_STRING_MD_HYPERDRIVE`. That lets you test the Worker locally against MotherDuck before deploying the Hyperdrive-backed version. ## Tutorial For a step-by-step guide to building and deploying a Cloudflare Worker that queries MotherDuck, see [Connect from Cloudflare Workers](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/cloudflare-workers). --- Source: https://motherduck.com/docs/integrations/serverless-compute/index --- title: Serverless Compute description: Connect to MotherDuck from serverless and edge compute platforms --- # Serverless Compute Query MotherDuck from serverless functions and edge runtimes using the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint). Because these environments can't run native DuckDB bindings, the Postgres wire protocol provides a thin-client path to MotherDuck with no DuckDB dependencies. ## Included pages - [Cloudflare Workers](https://motherduck.com/docs/integrations/serverless-compute/cloudflare-workers): Query MotherDuck from Cloudflare Workers using the Postgres wire protocol --- Source: https://motherduck.com/docs/integrations/sql-ides/datagrip --- sidebar_position: 5 title: DataGrip description: Connect JetBrains DataGrip to MotherDuck using the built-in DuckDB integration. --- JetBrains [DataGrip](https://www.jetbrains.com/datagrip/) is a cross-platform IDE for working with SQL and noSQL databases. It includes a DuckDB integration, which makes connecting to MotherDuck easy. ## Connecting to MotherDuck in DataGrip Create a new data source and choose the **DuckDB** driver. 
DataGrip opens the **Data Sources and Drivers** window where you configure the connection.

### Token Authentication

To retrieve a MotherDuck token, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md).

1. In **Data Sources and Drivers > General**, set **Authentication** to **No auth**.
2. Populate the **URL** field with the MotherDuck connection string, replacing `my_db` with your database name or omitting it to connect to the default catalog:

   ```sh
   jdbc:duckdb:md:[my_db]
   ```

   ![config](../img/datagrip_config.png)

3. Open the **Advanced** tab and add a new parameter named `motherduck_token`, setting its value to the token you generated earlier.

   ![config](../img/datagrip_token.png)

Click "OK" to begin querying MotherDuck!

:::note
The default schema filtering configuration of DataGrip may hide some of the schemas that exist in your MotherDuck account. To display all schemas, reconfigure the filter as described in the [DataGrip documentation](https://www.jetbrains.com/help/datagrip/schemas.html).
:::

## Update the DuckDB Driver Version

DataGrip bundles a DuckDB JDBC driver, but you can replace it with another version if needed.

1. Visit the [DuckDB JDBC Maven repository](https://mvnrepository.com/artifact/org.duckdb/duckdb_jdbc).
2. Select the DuckDB release you want to use and download the `.jar` file listed under **Files**.
3. In the **Data Sources and Drivers** window, switch to the **Drivers** pane and select **DuckDB**.
4. On the **General** tab, find **Driver files**, click the **+** icon, and choose the `.jar` file you downloaded.
5. Remove the existing DuckDB driver from the **Drivers** pane. The new driver only takes effect when it is first in the list.
6. [optional] To restore the default driver, click the **+** icon and select **DuckDB** among the available drivers.

DataGrip now uses the updated DuckDB driver for MotherDuck connections.
--- Source: https://motherduck.com/docs/integrations/sql-ides/dbeaver --- sidebar_position: 5 title: DBeaver description: Connect DBeaver Community to MotherDuck using the DuckDB database integration. --- [DBeaver Community](https://dbeaver.io/) is a free cross-platform database integrated development environment (IDE). It includes a DuckDB integration, so it is a great choice for querying MotherDuck. ## DBeaver DuckDB Setup DBeaver uses the official [DuckDB JDBC driver](https://duckdb.org/docs/api/java.html), which supports MotherDuck out of the box! To install DBeaver and the DuckDB driver, first follow the [DuckDB DBeaver guide](https://duckdb.org/docs/guides/sql_editors/dbeaver). That guide will create a local DuckDB in memory connection. After completing those steps, follow the steps below to add a MotherDuck connection in addition! ## Connecting DBeaver to MotherDuck ### Browser Authentication Create a new DuckDB connection in DBeaver. When entering the connection string in DBeaver, instead of using `:memory:` for an in memory DuckDB, use `md:my_db`. Replace `my_db` with the name of the target MotherDuck database as needed. Clicking either "Test Connection" or "Finish" will open the default browser and display an authorization prompt. Click "Confirm", then return to DBeaver to begin querying MotherDuck! ### Token Authentication To avoid the authentication prompt when opening DBeaver, a MotherDuck access token can be included as a connection string parameter. To retrieve a token, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md). Then, create a new DuckDB connection in DBeaver. 
Include the token as a query string parameter in the connection string following this format, replacing `` with the access token from the prior step, and `my_db` with the target MotherDuck database: ```sh md:my_db?motherduck_token= ``` Click "Finish" to begin querying MotherDuck! --- Source: https://motherduck.com/docs/integrations/sql-ides/index --- title: SQL IDEs description: Use MotherDuck with your favorite SQL development environments --- # SQL IDEs Connect to MotherDuck using popular SQL development environments and query editors. ## Included pages - [DataGrip](https://motherduck.com/docs/integrations/sql-ides/datagrip): Connect JetBrains DataGrip to MotherDuck using the built-in DuckDB integration. - [DBeaver](https://motherduck.com/docs/integrations/sql-ides/dbeaver): Connect DBeaver Community to MotherDuck using the DuckDB database integration. --- Source: https://motherduck.com/docs/integrations/transformation/dbt-cloud --- sidebar_position: 20 title: dbt cloud with MotherDuck via pg_duckdb description: For dbt cloud users, pg_duckdb can be used as a shim for MotherDuck sidebar_label: dbt cloud --- [dbt cloud](https://www.getdbt.com/product/dbt-cloud) is a managed service for dbt core. MotherDuck is used with dbt cloud by deploying a Postgres proxy with [pg_duckdb](/concepts/pgduckdb) installed. :::note If you only need to connect to MotherDuck from a PostgreSQL-compatible client, use the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) instead. Use this `pg_duckdb` pattern when you specifically need to operate a PostgreSQL server or proxy for dbt Cloud. ::: ## Getting started You will need the following items to get started: 1. A Postgres instance with pg_duckdb installed. 2. A [MotherDuck token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token). 3. A dbt cloud account. 
## Configuring pg_duckdb

The full documentation for pg_duckdb can be found on [GitHub](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md), but a simple way to set it up is using Docker on EC2. In our testing, we have used m7g.xlarge, which is a 4-core, 16GB instance. Since Postgres only acts as a proxy for MotherDuck, it needs just enough working space to stream results back to dbt. Smaller instances such as a1.large may also suffice, although they have not been tested thoroughly. The memory limits below assume 16GB of RAM.

Once you have added your MotherDuck token and Postgres password to your environment, you can execute the `docker run` statement below:

```sh
docker run -d \
  --name pgduckdb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
  -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
  -v ~/pgduckdb_data_v17:/var/lib/postgresql/data \
  --restart unless-stopped \
  --memory=12288m \
  pgduckdb/pgduckdb:17-main
```

:::note
The default configuration of Postgres is sub-optimal for m7g.xlarge. Consider making the following changes to the `postgresql.conf` file.

```ini
# Memory configuration optimized for AWS m7g.xlarge with more conservative settings
work_mem = '32MB'                 # Per-operation memory for sorts, joins, etc.
maintenance_work_mem = '512MB'    # Memory for maintenance operations
shared_buffers = '2GB'            # ~12.5% of RAM for shared buffer cache
effective_cache_size = '6GB'      # Conservative estimate of OS cache
max_connections = 100             # Reduced maximum concurrent connections
```
:::

### Upgrading to newer builds of pg_duckdb

New containers are built for pg_duckdb on every release. Because the server runs in Docker, the pg_duckdb container can be stopped, pruned, and then recreated with the `docker run` command above. We recommend rebuilding the container on a regular cadence with a script, and using Terraform or a similar tool to handle this maintenance process. An example shell script can be seen below:
```sh
#!/bin/bash

# Error handling
function handle_error() {
    local line_no=$1
    local exit_code=$2
    echo "ERROR: An error occurred at line ${line_no}, exit code ${exit_code}"
    exit ${exit_code}
}

# Set up error trap
trap 'handle_error ${LINENO} $?' ERR

# Script to install Docker and run PGDuckDB with MotherDuck on AWS EC2
# Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh

# Detect OS
if grep -q 'Amazon Linux release 2023' /etc/os-release; then
    OS_VERSION="Amazon Linux 2023"
elif grep -q 'Amazon Linux release 2' /etc/os-release; then
    OS_VERSION="Amazon Linux 2"
elif grep -q 'Ubuntu' /etc/os-release; then
    OS_VERSION="Ubuntu"
else
    OS_VERSION="Linux"
fi

echo "Starting setup for PGDuckDB with MotherDuck on $OS_VERSION..."

# Check if required environment variables are set
if [ -z "$POSTGRES_PASSWORD" ]; then
    echo "ERROR: POSTGRES_PASSWORD environment variable is not set."
    echo "Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh"
    exit 1
fi

if [ -z "$MOTHERDUCK_TOKEN" ]; then
    echo "ERROR: MOTHERDUCK_TOKEN environment variable is not set."
    echo "Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh"
    exit 1
fi

# Update package lists - continue even if there are errors with some repositories
echo "Updating package lists..."
if [[ "$OS_VERSION" == "Ubuntu" ]]; then
    sudo apt-get update -y || true
elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
    sudo dnf update -y || true
else
    sudo yum update -y || true
fi

# Check if Docker is already installed
if command -v docker &>/dev/null; then
    echo "Docker is already installed, skipping installation."
else
    # Install prerequisites based on OS
    echo "Installing prerequisites..."
    if [[ "$OS_VERSION" == "Ubuntu" ]]; then
        sudo apt-get install -y \
            apt-transport-https \
            ca-certificates \
            curl \
            gnupg \
            lsb-release
    elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
        # Use --allowerasing to handle curl package conflicts
        sudo dnf install -y --allowerasing \
            device-mapper-persistent-data \
            lvm2 \
            ca-certificates
    else
        sudo yum install -y \
            device-mapper-persistent-data \
            lvm2 \
            ca-certificates
    fi

    # Install Docker based on OS
    echo "Installing Docker..."
    if [[ "$OS_VERSION" == "Ubuntu" ]]; then
        # Add Docker's official GPG key
        curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

        # Set up the repository
        echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

        # Update and install
        sudo apt-get update -y
        sudo apt-get install -y docker-ce docker-ce-cli containerd.io
    elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
        # Amazon Linux 2023 - use the standard package
        sudo dnf install -y docker
    elif [[ "$OS_VERSION" == "Amazon Linux 2" ]]; then
        # Amazon Linux 2 - use extras
        sudo amazon-linux-extras install -y docker
    else
        # Fallback
        sudo yum install -y docker
    fi

    # Verify Docker was installed
    if ! command -v docker &>/dev/null; then
        echo "ERROR: Docker installation failed."
        exit 1
    fi
fi

# Start Docker service
echo "Starting Docker service..."
sudo systemctl start docker || sudo service docker start
sudo systemctl enable docker || sudo chkconfig docker on

# Add current user to docker group to avoid using sudo with docker commands
echo "Adding current user to docker group..."
sudo usermod -aG docker "$USER"

# Create a new data directory for PostgreSQL 17
echo "Creating new data directory for PostgreSQL 17..."
mkdir -p ~/pgduckdb_data_v17

# Fix permissions on the data directory
echo "Setting correct permissions on data directory..."
sudo chown -R 999:999 ~/pgduckdb_data_v17  # 999 is the standard UID for postgres user in Docker
sudo chmod 700 ~/pgduckdb_data_v17

# Check architecture
ARCH=$(uname -m)
echo "Detected architecture: $ARCH"
if [[ "$ARCH" == "aarch64" || "$ARCH" == "arm64" ]]; then
    echo "Using ARM64 architecture (Graviton3)..."
else
    echo "Using x86_64 architecture..."
fi

# Check if container already exists and remove it if necessary
if sudo docker ps -a | grep -q pgduckdb; then
    echo "Found existing pgduckdb container. Removing it..."
    sudo docker stop pgduckdb || true
    sudo docker rm pgduckdb || true
fi

# Pull the Docker image
echo "Pulling Docker image..."
sudo docker pull pgduckdb/pgduckdb:17-main

# Check available system memory
echo "Checking system memory..."
TOTAL_MEM_KB=$(grep MemTotal /proc/meminfo | awk '{print $2}')
TOTAL_MEM_MB=$((TOTAL_MEM_KB / 1024))
echo "Total system memory: ${TOTAL_MEM_MB}MB"

# Calculate 75% of system memory for Docker container limit
DOCKER_MEM_LIMIT=$((TOTAL_MEM_MB * 75 / 100))
echo "Setting Docker container memory limit to: ${DOCKER_MEM_LIMIT}MB"

# Run the Docker container with memory limit
echo "Starting PostgreSQL container..."
sudo docker run -d \
    --name pgduckdb \
    -p 5432:5432 \
    -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
    -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
    -v ~/pgduckdb_data_v17:/var/lib/postgresql/data \
    --restart unless-stopped \
    --memory=${DOCKER_MEM_LIMIT}m \
    pgduckdb/pgduckdb:17-main

# Wait for PostgreSQL to start
echo "Waiting for PostgreSQL to start..."
sleep 10

# Configure PostgreSQL
echo "Configuring PostgreSQL and DuckDB..."

# Append settings to the main PostgreSQL configuration file
echo "Appending settings to PostgreSQL configuration file..."
sudo docker exec -i pgduckdb bash -c "cat >> /var/lib/postgresql/data/postgresql.conf << 'EOT'

# DuckDB integration settings
duckdb.motherduck_enabled = true

# Memory configuration optimized for AWS m7g.xlarge with more conservative settings
work_mem = '32MB'                 # Per-operation memory for sorts, joins, etc.
maintenance_work_mem = '512MB'    # Memory for maintenance operations
shared_buffers = '2GB'            # ~12.5% of RAM for shared buffer cache
effective_cache_size = '6GB'      # Conservative estimate of OS cache
max_connections = 100             # Reduced maximum concurrent connections

# Detailed query logging
log_min_duration_statement = 0    # Log all queries
log_statement = 'all'             # Log all SQL statements
log_duration = on                 # Log duration of each SQL statement
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '  # Prefix format
EOT"

# Restart PostgreSQL to apply all configuration settings
echo "Restarting PostgreSQL container to apply all configuration settings..."
sudo docker restart pgduckdb

# Wait for PostgreSQL to restart
echo "Waiting for PostgreSQL container to restart..."
sleep 10

# Verify PostgreSQL is running with new settings
echo "Verifying PostgreSQL configuration..."
sudo docker exec -i pgduckdb psql -U postgres << EOF
-- Check if PostgreSQL is running
SELECT version();
EOF

# Create monitoring script
echo "Creating monitoring script..."
cat > ~/monitor_pg.sh << 'EOF'
#!/bin/bash
echo "=== PostgreSQL Container Status ==="
docker ps -a -f name=pgduckdb
echo -e "\n=== Resource Usage ==="
docker stats --no-stream pgduckdb
echo -e "\n=== Recent Logs ==="
docker logs --tail 10 pgduckdb
echo -e "\n=== Connection Test ==="
docker exec -it pgduckdb pg_isready -U postgres
if [ $? -eq 0 ]; then
    echo "PostgreSQL is accepting connections."
else
    echo "PostgreSQL is not accepting connections."
fi
EOF
chmod +x ~/monitor_pg.sh

# Create startup script
echo "Creating startup script..."
cat > ~/start_pg.sh << 'EOF'
#!/bin/bash
echo "Starting PostgreSQL container..."
docker start pgduckdb
echo "Container status:"
docker ps -a -f name=pgduckdb
EOF
chmod +x ~/start_pg.sh

# Check if container is running or restarting
echo "Checking container status..."
CONTAINER_STATUS=$(sudo docker inspect -f '{{.State.Status}}' pgduckdb 2>/dev/null || echo "not_found")
if [[ "$CONTAINER_STATUS" == "restarting" ]]; then
    echo "WARNING: Container is restarting. Checking logs for errors..."
    sudo docker logs pgduckdb
    echo " Try reducing the memory settings in the PostgreSQL configuration if the container keeps restarting."
    echo "You can manually adjust settings by connecting to the container once it's stable."
elif [[ "$CONTAINER_STATUS" != "running" && "$CONTAINER_STATUS" != "not_found" ]]; then
    echo "WARNING: Container is not running (status: $CONTAINER_STATUS). Checking logs for errors..."
    sudo docker logs pgduckdb
fi

# Final status check
echo "=== Setup Complete ==="
echo "PostgreSQL with DuckDB is now running."
echo "Container status:"
sudo docker ps -a -f name=pgduckdb

echo -e "\n=== Connection Information ==="
echo "Host: localhost"
echo "Port: 5432"
echo "User: postgres"
echo "Password: [The password you provided]"
echo "Database: postgres"

echo -e "\n=== Useful Commands ==="
echo "Monitor status: ./monitor_pg.sh"
echo "Start after reboot: ./start_pg.sh"
echo "Connect to PostgreSQL: docker exec -it pgduckdb psql -U postgres"
echo "View logs: docker logs pgduckdb"

echo -e "\n=== Note ==="
echo "You may need to log out and log back in for the docker group changes to take effect."
echo "After that, you can run docker commands without sudo."
```
## dbt cloud configuration

dbt cloud is configured as standard Postgres, with a couple of key details.

1. You will need to create a schema in MotherDuck for each user as well as production, as using pg_duckdb to create new schemas in MotherDuck is not supported.
2. You will need to set an environment variable for `DBT_SCHEMA` that uses the pg_duckdb schema format, which is `ddb$[database]$[schema]`, because a Postgres connection can only work with a single database. This will need to be set for each user as well as production with `{{ env_var('DBT_SCHEMA')}}`.
3. The recommended thread count follows our dbt-core recommendation, which is 4 threads.

If dbt is configured incorrectly, data may be written to Postgres, which is much slower than MotherDuck. In that case, the easiest fix is to rebuild the Docker container as described above, to ensure that no data accidentally ends up in Postgres.

## Usage notes

There are a few unusual things to know about using dbt cloud with pg_duckdb.

1. You write Postgres dialect SQL that is executed against DuckDB. As such, there are some idiosyncrasies that are neither Postgres nor DuckDB, but a secret, third thing (pg_duckdb SQL). The details of this are described in the [pg_duckdb documentation](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md).
2. Views are only stored in Postgres, without any artifacts in MotherDuck. As such, they can be used for interim data but not for final datasets to be consumed by end users. For the same reason, changing the materialization type from view to table, or table to view, is a hybrid MotherDuck and Postgres transaction, and is unsupported.
3. Running on multiple threads can occasionally cause deadlocks with the pg_duckdb catalog maintenance service. This can be resolved with `dbt retry` in your production pipeline runs.
4. DuckDB types are more specific than Postgres types, so model builds using numeric types can throw errors, which can be resolved with explicit typing.
5.
From time to time the Postgres catalog can get out of sync and show tables that do not exist in MotherDuck. To resolve this, create the missing object in MotherDuck, for example `CREATE TABLE my_schema.model_name AS SELECT 1;`, which will unblock your dbt model.

---
Source: https://motherduck.com/docs/integrations/transformation/dbt
---
sidebar_position: 1
title: dbt with DuckDB and MotherDuck
description: DuckDB and MotherDuck both support using dbt to manage data loading and transformation
sidebar_label: dbt core
---

[Data Build Tool](https://www.getdbt.com/) (dbt) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouses by defining SQL in model files. It brings the composability of programming languages to SQL while automating the mechanics of updating tables.

[dbt-duckdb](https://github.com/jwills/dbt-duckdb) is the adapter which allows dbt to use DuckDB and MotherDuck. The adapter also supports [DuckDB extensions](https://duckdb.org/docs/extensions/overview) and any of the additional [DuckDB configuration options](https://duckdb.org/docs/sql/configuration).

## Installation

Since dbt is a Python library, it can be installed through pip:

```pip3 install dbt-duckdb``` will install both `dbt` and `duckdb`.

## Configuration for Local DuckDB

This configuration allows you to connect to S3 and perform read/write operations on Parquet files using an AWS access key and secret.

`profiles.yml`

```yaml
default:
  outputs:
    dev:
      type: duckdb
      path: /tmp/dbt.duckdb
      threads: 4
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
  target: dev
```

:::tip
The `path` attribute specifies where your DuckDB database file will be created. By default, this path is relative to your `profiles.yml` file location. If the database doesn't exist at the specified path, DuckDB will automatically create it.
:::

You can find more information about these connection profiles in the [dbt documentation](https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles).

## Configuration for MotherDuck

The only change needed for MotherDuck is the `path:` setting.

```yaml
default:
  outputs:
    dev:
      type: duckdb
      path: "md:my_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}"
      threads: 4
      extensions:
        - httpfs
        - parquet
      settings:
        s3_region: my-aws-region
        s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
        s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
  target: dev
```

This assumes that you have set up `MOTHERDUCK_TOKEN` as an environment variable. To learn more about how to persist your authentication credentials, read [Authenticating to MotherDuck using an access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token). If you don't set the `motherduck_token` in your path, you will be prompted to authenticate to MotherDuck when running your `dbt run` command.

![auth_md](../img/auth_dbt.png)

Follow the instructions and it will export the service account variable for the current `dbt run` process.

DuckDB will parallelize a single write query as much as possible, so the gains from running more than one query at a time are minimal on the database side. That being said, our testing indicates that setting `threads: 4` typically leads to the best performance.

## Attaching Additional Databases

dbt-duckdb supports attaching additional databases to your main DuckDB connection, allowing you to work with multiple databases simultaneously. This is particularly useful when you need to reference data from different sources or when working with separate databases for different purposes.
### Configuration To attach additional databases, add an `attach` section to your profile configuration: ```yaml default: outputs: dev: type: duckdb path: "md:my_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}" threads: 4 extensions: - httpfs - parquet attach: - path: "md:other_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}" alias: other_db - path: "md:third_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}" alias: third_db settings: s3_region: my-aws-region s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}" s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}" target: dev ``` :::tip The `alias` parameter is optional. If not specified, dbt-duckdb will use the filename (without extension) as the alias for the attached database. ::: ### Usage Example Once you have attached databases, you can use the `database` config parameter in your dbt models to specify which database to write to: ```sql -- models/my_model.sql {{ config(database='other_db') }} SELECT id, name, created_at FROM {{ ref('source_table') }} WHERE created_at >= '2024-01-01' ``` You can also specify the database for source tables in your `sources.yml` file: ```yaml # models/sources.yml version: 2 sources: - name: external_data database: other_db tables: - name: customers description: Customer data from external database - name: orders description: Order data from external database ``` Then reference these sources in your models, from the correct database: ```sql -- models/combined_data.sql SELECT c.customer_id, c.customer_name, o.order_id, o.order_date FROM {{ source('external_data', 'customers') }} c JOIN {{ source('external_data', 'orders') }} o ON c.customer_id = o.customer_id ``` ## Extra resources Take a look at our video guide on DuckDB and dbt provided below, along with the corresponding [demo tutorial on GitHub](https://github.com/mehd-io/dbt-duckdb-tutorial). 
--- Source: https://motherduck.com/docs/integrations/transformation/index --- title: Data Transformation description: Transform your data inside MotherDuck --- # Data Transformation Use MotherDuck to transform your data. ## Included pages - [dbt with DuckDB and MotherDuck](https://motherduck.com/docs/integrations/transformation/dbt): DuckDB and MotherDuck both support using dbt to manage data loading and transformation - [dbt cloud with MotherDuck via pg_duckdb](https://motherduck.com/docs/integrations/transformation/dbt-cloud): For dbt cloud users, pg_duckdb can be used as a shim for MotherDuck --- Source: https://motherduck.com/docs/integrations/web-development/index --- title: Web Development description: Build web applications with MotherDuck --- # Web Development Use MotherDuck to power your web applications and services. ## Included pages - [Vercel](https://motherduck.com/docs/integrations/web-development/vercel): Deploy MotherDuck-powered Next.js apps on Vercel using Postgres endpoint or the Wasm SDK --- Source: https://motherduck.com/docs/integrations/web-development/vercel --- sidebar_position: 1 title: Vercel description: Deploy MotherDuck-powered Next.js apps on Vercel using Postgres endpoint or the Wasm SDK sidebar_label: Vercel --- [Vercel](https://vercel.com/) is a cloud platform for static sites and serverless functions. It supports two ways to connect your Next.js application to MotherDuck: - **[Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint)** — connects from server-side API routes using the standard [`pg`](https://www.npmjs.com/package/pg) npm package. This lets you connect to MotherDuck databases like a regular Postgres database, but use MotherDuck as the fast analytics query backend. - **[Wasm SDK](/sql-reference/wasm-client)** — runs DuckDB directly in the browser using WebAssembly. Best for building analytics dashboards that are highly interactive, allowing queries to execute on the user's device. 
Both approaches work with Vercel's Native Integration for automatic token management.

Vercel typically provides two ways to integrate with third-party services:

- Native integration: create a new account on the third-party service and connect it to Vercel. Billing and setup are managed by Vercel.
- Non-native integration (connectable accounts): connect existing third-party accounts to Vercel.

:::info
Vercel supports Native Integration with MotherDuck. Support for non-native integration (connectable accounts) is not yet available. To use an existing MotherDuck account in the meantime, populate the Vercel app's environment variables with a MotherDuck token for that account.
:::

## Native integration

To kickstart the integration, you can either:

- Go to [Vercel's marketplace](https://vercel.com/marketplace/motherduck) and install the integration from there on an existing Vercel project.
- Deploy a new project from [MotherDuck's Vercel template](https://vercel.com/motherduck-marketing/~/integrations/motherduck), which includes snippets to get started with MotherDuck and your Next.js project.

### How to install

To install the MotherDuck Native Integration from the Vercel Marketplace:

1. Navigate to the Vercel Marketplace or to the Integrations Console on your Vercel Dashboard.
2. Locate the MotherDuck integration.
3. Click Install.
4. On the Install MotherDuck modal, you are presented with two plan options.

   ![modal1](./img/vercel1.png)

5. On the next modal, you are prompted to give your database a name. Note that a new installation will create a new account and database within a new MotherDuck organization.

   ![modal2](./img/vercel2.png)

6. You are all set! You now have a new account and database within a new organization.
Plus, tokens ([access token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token), and [read scaling token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#understanding-read-scaling-tokens)) are automatically generated and stored in Vercel's environment variables. ![model3](./img/vercel3.png) You can head to `Getting Started` section on the integration page to have more information on how to use the integration. ![model4](./img/vercel4.png) --- ## Connect using the Postgres endpoint Next.js API routes can connect to MotherDuck through the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) using the [`pg`](https://www.npmjs.com/package/pg) npm package. This gives you a thin-client path to query MotherDuck from serverless functions without any DuckDB dependencies. This guide walks through building a Next.js app that queries NYC taxi data from MotherDuck's built-in `sample_data` database. 
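Before the full walkthrough, it may help to see how little the connection itself involves. The sketch below only assembles a connection string for the Postgres endpoint. `buildConnectionString` and its option names are hypothetical helpers written for illustration (they are not part of `pg` or any MotherDuck SDK), and the default host and database are simply the values used later in this guide. Percent-encoding the token is a defensive assumption, not a documented requirement.

```typescript
// Hypothetical helper for illustration only; not part of `pg` or a MotherDuck SDK.
interface MotherDuckConnectionOptions {
  token: string; // MotherDuck access token, used as the Postgres password
  host?: string; // assumed default: the us-east-1 endpoint shown in this guide
  database?: string; // assumed default: the built-in sample_data database
}

function buildConnectionString(opts: MotherDuckConnectionOptions): string {
  const token = opts.token.trim();
  if (!token) {
    throw new Error("A MotherDuck token is required");
  }
  const host = opts.host ?? "pg.us-east-1-aws.motherduck.com";
  const database = opts.database ?? "sample_data";

  // Percent-encode the token and database name so that characters such as
  // "@" or "/" cannot change how the URL is parsed.
  return (
    `postgresql://user:${encodeURIComponent(token)}` +
    `@${host}:5432/${encodeURIComponent(database)}`
  );
}
```

The resulting string would be passed as `connectionString` to a `pg` `Pool` or `Client`. TLS settings such as the `ssl` option are configured separately on the pool, so they are left out of this sketch.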
### Prerequisites

- [Node.js](https://nodejs.org/) v18+
- A [Vercel account](https://vercel.com/signup)
- A [MotherDuck account](https://motherduck.com/) and [access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck)

### Project setup

Create a new Next.js project and install the Postgres client:

```bash
npx create-next-app@latest motherduck-nextjs --typescript --app
cd motherduck-nextjs
npm install pg @vercel/functions
npm install --save-dev @types/pg
```

### Store your token

For local development, create a `.env.local` file (this is gitignored by default in Next.js):

```text
MOTHERDUCK_TOKEN="your_token_here"
MOTHERDUCK_HOST="pg.us-east-1-aws.motherduck.com"
MOTHERDUCK_DB="sample_data"
```

For production, add the environment variable through the Vercel dashboard or CLI:

```bash
vercel env add MOTHERDUCK_TOKEN
```

If you installed the [Native Integration](#native-integration), your access token is already available as an environment variable.

### Create the connection pool

Create `src/lib/motherduck.ts`. The pool is initialized at module scope so it persists across requests within the same function instance. This is the recommended pattern for [connection pooling on Vercel](https://vercel.com/kb/guide/connection-pooling-with-functions).

```typescript
import { Pool, PoolClient } from "pg";
import { attachDatabasePool } from "@vercel/functions";

const token = process.env.MOTHERDUCK_TOKEN;
const host = process.env.MOTHERDUCK_HOST ?? "pg.us-east-1-aws.motherduck.com";
const db = process.env.MOTHERDUCK_DB ?? "sample_data";

if (!token) {
  throw new Error("MOTHERDUCK_TOKEN environment variable is required");
}

const pool = new Pool({
  connectionString: `postgresql://user:${token}@${host}:5432/${db}`,
  ssl: { rejectUnauthorized: true },
  max: 10,
  idleTimeoutMillis: 5000,
});

attachDatabasePool(pool);

// Borrow a client from the pool, run `fn`, and always release the client.
export async function withClient<T>(
  fn: (client: PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect();
  try {
    return await fn(client);
  } finally {
    client.release();
  }
}
```

A few things to note:

- **`attachDatabasePool(pool)`** from `@vercel/functions` ensures idle connections are cleaned up before a function instance is suspended, preventing connection leaks.
- **`idleTimeoutMillis: 5000`** closes unused connections after 5 seconds, balancing reuse during traffic bursts with prompt cleanup during quiet periods.
- **`ssl: { rejectUnauthorized: true }`** enables full certificate verification (`verify-full`). Node.js verifies the server certificate against the system CA bundle and checks that the hostname matches the certificate. MotherDuck's Postgres endpoint uses a publicly trusted certificate, so no custom CA configuration is needed.

### Write the API routes

Create two route handlers. The first returns a sample of recent taxi trips.
**`src/app/api/trips/route.ts`** ```typescript import { NextResponse } from "next/server"; import { withClient } from "@/lib/motherduck"; export async function GET() { try { const rows = await withClient(async (client) => { const result = await client.query( `SELECT tpep_pickup_datetime AS pickup, tpep_dropoff_datetime AS dropoff, passenger_count, trip_distance, fare_amount, tip_amount, total_amount FROM nyc.taxi ORDER BY tpep_pickup_datetime DESC LIMIT 20` ); return result.rows; }); return NextResponse.json(rows); } catch (error) { console.error("Failed to fetch trips:", error); return NextResponse.json( { error: "Failed to fetch trips" }, { status: 500 } ); } } ``` The second accepts date range parameters and returns aggregated fare data. It validates inputs before querying and uses parameterized queries (`$1`, `$2`) to prevent SQL injection — never interpolate user input directly into SQL strings. **`src/app/api/stats/route.ts`** ```typescript import { NextRequest, NextResponse } from "next/server"; import { withClient } from "@/lib/motherduck"; export async function GET(request: NextRequest) { const startDate = request.nextUrl.searchParams.get("start"); const endDate = request.nextUrl.searchParams.get("end"); if (!startDate || !endDate) { return NextResponse.json( { error: "Both 'start' and 'end' query parameters are required. Use YYYY-MM-DD format.", }, { status: 400 } ); } const datePattern = /^\d{4}-\d{2}-\d{2}$/; if (!datePattern.test(startDate) || !datePattern.test(endDate)) { return NextResponse.json( { error: "Invalid date format. Use YYYY-MM-DD." 
}, { status: 400 } ); } try { const data = await withClient(async (client) => { const result = await client.query( `SELECT sum(passenger_count)::INTEGER AS total_passengers, round(sum(fare_amount), 2) AS total_fare FROM nyc.taxi WHERE tpep_pickup_datetime >= $1 AND tpep_pickup_datetime < $2`, [`${startDate} 00:00:00`, `${endDate} 00:00:00`] ); return result.rows[0]; }); return NextResponse.json({ start: startDate, end: endDate, ...data, }); } catch (error) { console.error("Failed to fetch stats:", error); return NextResponse.json( { error: "Failed to fetch stats" }, { status: 500 } ); } } ``` ### Test locally ```bash npm run dev ``` Then open `http://localhost:3000/api/trips` or try the stats endpoint with a date range: ```text http://localhost:3000/api/stats?start=2022-11-01&end=2022-12-01 ``` ### Deploy ```bash vercel deploy ``` Or push to a connected Git repository — Vercel deploys automatically on every push. --- ## Connect using the Wasm SDK The Wasm SDK runs DuckDB in the browser, making it ideal for highly interactive analytics dashboards. ### Project templates Learn more about how to setup your projects by using the following templates: - [MotherDuck's Vercel template](https://github.com/MotherDuck-Open-Source/nextjs-motherduck-wasm-analytics-quickstart) : A fully-fledged template that includes a Next.js project and a MotherDuck WASM setup with sample data integration and an interactive data visualization example. - [MotherDuck's Vercel template minimal](https://github.com/MotherDuck-Open-Source/nextjs-motherduck-wasm-analytics-quickstart-minimal) : a minimal template which includes a Next.js project and MotherDuck Wasm setup with some sample data integration. --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/ai-features-in-ui --- sidebar_position: 5 title: AI Features in the MotherDuck UI description: Use AI-powered SQL editing, FixUp, and natural language queries in the MotherDuck web interface. 
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::tip Quick overview
For a hands-on walkthrough of FixIt and Edit in the web UI, see the [Web UI guide](/getting-started/interfaces/motherduck-quick-tour/#fix-errors-and-edit-queries-with-ai).
:::

## Automatically Edit SQL Queries in the MotherDuck UI

Edit is a MotherDuck AI-powered feature that lets you edit SQL queries in the MotherDuck UI. The AI is aware of DuckDB-specific SQL features and relevant database schemas, so its suggestions fit your data.

Select the specific part of the query you want to edit, then press the keyboard shortcut to open the Edit dialog:

* Windows/Linux: `Ctrl + Shift + E`
* macOS: `⌘ + Shift + E`

In the Edit dialog, enter your prompt (e.g., "extract the domain from the url, using a regex") and click 'Suggest edit'.

![Edit](../img/edit-prompt.png)

If the suggestion is not what you wanted, refine it with follow-up prompts.

![Edit](../img/edit-follow-up.png)

When you are happy with the change, click 'Apply edit' to apply it to the query.

![Edit](../img/edit-follow-up-2.png)

## Automatically Fix SQL Errors in the MotherDuck UI

FixIt is a MotherDuck AI-powered feature that helps you resolve common SQL errors by offering fixes in-line. Read more about it in our [blog post](https://motherduck.com/blog/introducing-fixit-ai-sql-error-fixer/).

FixIt can also be called programmatically using the `prompt_fix_line` table function. Find more information in the [prompt_fix_line documentation](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line).

### How FixIt works

By default, FixIt is enabled for all users. If you run a query that has an error, FixIt automatically analyzes the query and suggests in-line fixes. When you accept a fix, MotherDuck updates your query and re-executes it.

![FixIt](../img/fixit-suggestion.png)

When 'Auto-suggest' is toggled off, FixIt no longer suggests fixes automatically.
FixIt can still be manually triggered by clicking 'Suggest fix' at the bottom of the error message. ![FixIt](../img/fixit-manual-suggestion.png) ## Access SQL Assistant functions MotherDuck provides built-in AI features to help you write, understand and fix DuckDB SQL queries more efficiently. These features include: - [Answer questions about your data](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-query) using the `prompt_query` pragma. - [Generate SQL](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql) for you using the `prompt_sql` table function. - [Correct and fix up your SQL query](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fixup) using the `prompt_fixup` table function. - [Correct and fix up your SQL query line-by-line](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line) using the `prompt_fix_line` table function. - [Help you understand a query](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-explain) using the `prompt_explain` table function. - [Help you understand contents of a database](/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-schema) using the `prompt_schema` table function. ### Example usage of prompt_sql We use MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). ```sql CALL prompt_sql('what are the top domains being shared on hacker_news?'); ``` Output of this SQL statement is a single column table that contains the AI-generated SQL query. 
| **query** | |-----------------| | ```sql SELECT COUNT(*) as domain_count, SUBSTRING(SPLIT_PART(url, '//', 2), 1, POSITION('/' IN SPLIT_PART(url, '//', 2)) - 1) as domain FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY domain_count DESC LIMIT 10``` | --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/building-analytics-agents --- sidebar_position: 5 title: Custom AI Agent Builder's Guide description: Build AI-powered analytics agents using MotherDuck's SQL functions and MCP server integration. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; # Building analytics agents with MotherDuck Analytics agents are AI-powered systems that allow users to interact with data using natural language. Instead of writing SQL queries or building dashboards, users can ask questions like "What were our top-selling products last quarter?" and get immediate answers. This guide covers best practices for building production-ready analytics agents on MotherDuck. ## Prerequisites - **Agent framework**: [Claude Agent SDK](https://docs.anthropic.com/en/api/agent-sdk/overview), [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/), or Claude Desktop with MotherDuck remote MCP connector - **MotherDuck account** with the data you want to query - **Clean, well-structured data**: The better your schema and metadata, the better your agent performs ## Step 1: Define your agent's interface Choose the interface your agent will use to query your MotherDuck database. ### Option A: Generated SQL The agent generates SQL queries and executes them through a tool/function call. This provides maximum flexibility - agents can answer any question your data supports - but requires good SQL generation capabilities. 
**Implementation approaches:**

**MCP Server**: Use our [remote MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) (or [local MCP server](/key-tasks/ai-and-motherduck/mcp-setup/#remote-vs-local-mcp-server) for self-hosted, read-write) for Claude Desktop, Cursor, ChatGPT, or Claude Code.

**Custom tool calling**: Create a function that accepts SQL strings and executes them:

```python
import duckdb

def execute_sql(query: str) -> str:
    """Execute SQL query against MotherDuck"""
    # Append your MotherDuck token to the connection string
    conn = duckdb.connect('md:my_database?motherduck_token=')
    try:
        result = conn.execute(query).fetchdf()
        return result.to_string()
    except Exception as e:
        # Return the error as text so the agent can read it and retry
        return f"Error: {str(e)}"
    finally:
        conn.close()
```

### Option B: Parameterized query templates

The agent receives structured parameters that fill predefined SQL templates. This provides strict correctness guarantees and is easier to validate, but it is less flexible, requires more upfront development, and limits the agent to predefined questions.

**Example**: The agent calls a custom tool with a domain-specific signature such as `get_sales_by_region(region: str, start_date: date, end_date: date)` instead of generating custom SQL.

**Recommendation**: Start with Option A (SQL generation) unless you have strict correctness requirements or very limited query patterns.

## Step 2: Give your agent SQL knowledge

Your LLM needs to know how to write good DuckDB queries.

### System prompt for DuckDB and MotherDuck

A system prompt is the foundational instruction set that guides your agent's behavior and capabilities. It's critical for ensuring your agent generates correct, efficient SQL queries and understands how to explore data effectively.

The query guide below should be added to your system prompt because it contains:

- DuckDB SQL syntax and conventions
- Common patterns and best practices
- How to explore schemas efficiently
query_guide.md ```text # DuckDB SQL Query Syntax and Performance Guide ## General Knowledge ### Basic Syntax and Features **Identifiers and Literals:** - Use double quotes (`"`) for identifiers with spaces/special characters or case-sensitivity - Use single quotes (`'`) for string literals **Flexible Query Structure:** - Queries can start with `FROM`: `FROM my_table WHERE condition;` (equivalent to `SELECT * FROM my_table WHERE condition;`) - `SELECT` without `FROM` for expressions: `SELECT 1 + 1 AS result;` - Support for `CREATE TABLE AS` (CTAS): `CREATE TABLE new_table AS SELECT * FROM old_table;` **Advanced Column Selection:** - Exclude columns: `SELECT * EXCLUDE (sensitive_data) FROM users;` - Replace columns: `SELECT * REPLACE (UPPER(name) AS name) FROM users;` - Pattern matching: `SELECT COLUMNS('sales_.*') FROM sales_data;` - Transform multiple columns: `SELECT AVG(COLUMNS('sales_.*')) FROM sales_data;` **Grouping and Ordering Shortcuts:** - Group by all non-aggregated columns: `SELECT category, SUM(sales) FROM sales_data GROUP BY ALL;` - Order by all columns: `SELECT * FROM my_table ORDER BY ALL;` **Complex Data Types:** - Lists: `SELECT [1, 2, 3] AS my_list;` - Structs: `SELECT {'a': 1, 'b': 'text'} AS my_struct;` - Maps: `SELECT MAP([1,2],['one','two']) AS my_map;` - Access struct fields: `struct_col.field_name` or `struct_col['field_name']` - Access map values: `map_col[key]` **Date/Time Operations:** - String to timestamp: `strptime('2023-07-23', '%Y-%m-%d')::TIMESTAMP` - Format timestamp: `strftime(NOW(), '%Y-%m-%d')` - Extract parts: `EXTRACT(YEAR FROM DATE '2023-07-23')` ### Database and Table Qualification **Fully Qualified Names:** - Tables are accessed by fully qualified names: `database_name.schema_name.table_name` - There is always one current database: `SELECT current_database();` - Tables from the current database don't need database qualification: `schema_name.table_name` - Tables in the main schema don't need schema qualification: 
`table_name` - Shorthand: `my_database.my_table` is equivalent to `my_database.main.my_table` **Switching Databases:** - Use `USE my_other_db;` to switch current database - After switching, tables in that database can be accessed without qualification ### Schema Exploration **Get database and table information:** - List all databases: `SELECT alias as database_name, type FROM MD_ALL_DATABASES();` - List tables in database: `SELECT database_name, schema_name, table_name, comment FROM duckdb_tables() WHERE database_name = 'your_database';` - List views in database: `SELECT database_name, schema_name, view_name, comment, sql FROM duckdb_views() WHERE database_name = 'your_database';` - Get column information: `SELECT column_name, data_type, comment, is_nullable FROM duckdb_columns() WHERE database_name = 'your_database' AND table_name = 'your_table';` **Sample data exploration:** - Quick preview: `SELECT * FROM table_name LIMIT 5;` - Column statistics: `SUMMARIZE table_name;` - Describe table: `DESCRIBE table_name;` ### Performance Tips **QUALIFY Clause for Window Functions:** -- Get top 2 products by sales in each category SELECT category, product_name, sales_amount FROM products QUALIFY ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales_amount DESC) <= 2; **Efficient Patterns:** - Use `arg_max()` and `arg_min()` for "most recent" queries - Filter early to reduce data volume - Use CTEs for complex queries - Prefer `GROUP BY ALL` for readability - Use `QUALIFY` instead of subqueries for window function filtering **Avoid These Patterns:** - Functions on the left side of WHERE clauses (prevents pushdown) - Unnecessary ORDER BY on intermediate results - Cross products and cartesian joins ```
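The qualification rules in the guide can be made concrete with a small resolver. This is a simplified sketch for illustration only: `qualify` is a hypothetical helper, the database names are made up, and real DuckDB name resolution covers more cases (for example quoted identifiers, or a name that matches both a database and a schema).

```python
from typing import Iterable

def qualify(name: str, current_db: str, known_dbs: Iterable[str]) -> str:
    """Expand a table reference to database.schema.table (simplified rules)."""
    parts = name.split(".")
    if len(parts) == 3:
        # Already fully qualified: database.schema.table
        return name
    if len(parts) == 2:
        first, table = parts
        if first in set(known_dbs):
            # database.table is shorthand for database.main.table
            return f"{first}.main.{table}"
        # Otherwise treat the first part as a schema in the current database
        return f"{current_db}.{first}.{table}"
    if len(parts) == 1:
        # Bare table name: current database, main schema
        return f"{current_db}.main.{name}"
    raise ValueError(f"Unsupported table reference: {name!r}")

# Hypothetical databases, for illustration only
dbs = ["sales", "hr"]
print(qualify("orders", "sales", dbs))
print(qualify("hr.people", "sales", dbs))
print(qualify("staging.orders", "sales", dbs))
```

An agent tool could apply a step like this before logging or validating generated SQL, so every table reference is recorded in one canonical form.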
### Function documentation

MotherDuck maintains `function_docs.jsonl` - compact, LLM-friendly documentation for every DuckDB/MotherDuck function, available at: https://app.motherduck.com/assets/docs/function_docs.jsonl

**How to use**:

1. When a user asks a question, search the function docs using FTS or semantic search
2. Add the 5 most relevant function descriptions to the agent's context
3. This helps with specialized functions (window functions, date arithmetic, JSON operations, etc.)

## Step 3: Give your agent schema context

Your agent needs to understand your database structure to generate correct queries.

### Finding relevant tables

Our `query_guide.md` explains how agents can explore schemas autonomously to find relevant tables. For faster, non-agentic identification, use the built-in `INFORMATION_SCHEMA`.

```sql
-- adjust the search terms and database(s) to your needs
SELECT table_schema, table_name, table_comment
FROM information_schema."tables"
WHERE table_catalog = current_database()
  -- parentheses keep the catalog filter applied to every search term
  AND (
    table_name LIKE '%sales%'
    OR table_name LIKE '%customer%'
    OR table_name LIKE '%cust%'
    OR table_comment LIKE '%sales%'
    OR table_comment LIKE '%customer%'
  );
```

For column-level information you can use `information_schema.columns`.

### Make schemas agent-friendly

**Use clear naming**: Choose explicit, unambiguous table and column names

❌ Bad: `ord_dtl`, `cust_id`, `amt`

✅ Good: `order_details`, `customer_id`, `total_amount`

**Add context with COMMENT ON**:

```sql
COMMENT ON TABLE orders IS 'Customer orders since 2020. Join to customers via customer_id';
COMMENT ON COLUMN orders.status IS 'Possible values: pending, shipped, delivered, cancelled';
COMMENT ON COLUMN orders.total_amount IS 'Total in USD including tax and shipping';
```

Comments help agents understand table relationships, valid values, and business logic.
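Once comments exist, they can be folded into the agent's context as plain text. The sketch below assumes you have already fetched column metadata (for example with a `duckdb_columns()` query like the one in the query guide) and shows one possible way to format it. The `Column` type and `format_schema_context` helper are hypothetical names, and the sample rows simply reuse the comments from the example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional

class Column(NamedTuple):
    table_name: str
    column_name: str
    data_type: str
    comment: Optional[str]

def format_schema_context(
    columns: Iterable[Column], table_comments: Dict[str, str]
) -> str:
    """Render tables, columns, and their comments as compact prompt text."""
    by_table = defaultdict(list)
    for col in columns:
        by_table[col.table_name].append(col)

    lines = []
    for table in sorted(by_table):
        header = f"Table {table}"
        if table_comments.get(table):
            header += f": {table_comments[table]}"
        lines.append(header)
        for col in by_table[table]:
            line = f"  - {col.column_name} ({col.data_type})"
            if col.comment:
                line += f": {col.comment}"
            lines.append(line)
    return "\n".join(lines)

# Sample metadata, made up for illustration
sample_columns = [
    Column("orders", "status", "VARCHAR",
           "Possible values: pending, shipped, delivered, cancelled"),
    Column("orders", "total_amount", "DOUBLE", None),
]
sample_table_comments = {
    "orders": "Customer orders since 2020. Join to customers via customer_id",
}
print(format_schema_context(sample_columns, sample_table_comments))
```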
Learn more: [COMMENT ON documentation](https://duckdb.org/docs/stable/sql/statements/comment_on.html) ## Step 4: Configure access controls Secure your agent's database access with appropriate permissions and isolation. ### Read-only access Use [read-scaling tokens](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) to ensure your agent only has read access. Read-scaling tokens connect to dedicated read replicas that cannot modify data. ```python import duckdb # Using a read-scaling token ensures read-only access con = duckdb.connect('md:my_database?motherduck_token=') ``` **For multi-tenant [customer-facing analytics](/getting-started/customer-facing-analytics/) agents**: Use [service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) for your agents. You can grant these service accounts read-only access to specific databases using [shares](/key-tasks/sharing-data/sharing-overview/): ```sql ATTACH 'md:_share/my_org/abc123' AS shared_data; ``` Consider creating separate service accounts per user/tenant for full compute isolation. **Capacity planning**: Choose the number of [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) replicas and [Duckling size](/about-motherduck/billing/duckling-sizes/) according to the expected query complexity and concurrency. ### Read-write access & sandboxing For agents that need to create tables, modify data, or experiment safely, use zero-copy clones to create an isolated sandbox. This provides safe experimentation completely isolated from production data, with instant creation through zero-copy operations. Agents get full capabilities to create tables, modify data, and experiment freely, with easy sharing of results back to production when ready. 
```sql -- Create instant writable copy (clones must match source retention type) CREATE DATABASE my_sandbox FROM my_database_share; -- Agent can now read/write without affecting production data -- Changes are isolated to this copy ``` Learn more: [CREATE DATABASE documentation](/sql-reference/motherduck-sql-reference/create-database/) ## Step 5: Implement your agent Build your agent using an SDK or framework that supports function calling. **Quick start option**: For immediate experimentation, try [Claude Desktop with the MotherDuck remote MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) - no coding required. **Custom agent option**: Here's a simple example using the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/): ```python import duckdb from agents import Agent, Runner, function_tool # Connect to MotherDuck (use a read-scaling token for read-only access) conn = duckdb.connect('md:?motherduck_token=') @function_tool def query_motherduck(sql: str) -> str: """Execute SQL query against MotherDuck database. Args: sql: The SQL query to execute against the MotherDuck database. """ try: result = conn.execute(sql).fetchdf() return result.to_string() except Exception as e: return f"Error executing query: {str(e)}" # Load the DuckDB query guide (copy the system prompt template above into a local file) with open('query_guide.md', 'r') as f: query_guide = f.read() # Create agent with database tool agent = Agent( name="MotherDuck Analytics Agent", instructions=f"""You are a data analyst helping users query a MotherDuck database. Use the query_motherduck tool to execute SQL queries against the database. Always start with schema exploration before querying specific tables. {query_guide} """, tools=[query_motherduck] ) # Run the agent result = Runner.run_sync( agent, "What were the top 5 products by revenue last month?" 
) print(result.final_output) ``` ### Validating queries before showing to users If a human reviews generated queries before execution, use `try_bind()` to validate SQL without running it. It checks syntax and referenced tables/columns in milliseconds. **Structured output:** `try_bind()` returns `error_message` (VARCHAR) and `error_type` (VARCHAR). Use `error_type` to decide what to do next: `ok` means validation passed, `parser` means SQL syntax is invalid, and `binder` means object resolution failed (for example, a missing table/column or invalid reference). On `parser` or `binder`, pass `error_message` back into the next generation attempt so the model can repair the query. ```sql -- Valid query - error_type is 'ok', error_message is empty CALL try_bind('SELECT customer_id, total FROM orders WHERE status = ''shipped'''); -- Invalid query - returns error_message and error_type (e.g. 'parser' or 'binder') CALL try_bind('SELECT * FORM orders'); ``` **Example integration:** ```python def generate_query_for_review(question: str) -> str: """Generate and validate SQL before showing to user.""" error_msg = None for attempt in range(3): sql = agent.generate_sql(question, error_feedback=error_msg) # Validate before showing (error_message, error_type) row = conn.execute("CALL try_bind(?)", [sql]).fetchall()[0] error_message, error_type = row[0], row[1] if error_type == "ok": return f"Generated query:\n{sql}" error_msg = error_message or f"Validation failed: {error_type}" return "Could not generate a valid query to answer the question" ``` Feed `error_message` and `error_type` from `try_bind()` into retries to fix syntax and binding errors. ## Step 6: Test and iterate Validate your agent's performance and refine its behavior based on real-world usage. 
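One way to make this step repeatable is a small evaluation loop that runs a fixed set of questions through the agent and tallies failures per category. The sketch below is only an outline under assumptions: `EvalCase`, `run_suite`, and `failure_rate_by_category` are hypothetical names, `ask` stands in for whatever function calls your agent, and treating answers that start with `"Error"` as failures mirrors the error strings returned by the tool functions earlier in this guide rather than any built-in convention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class EvalCase:
    question: str
    category: str  # e.g. "simple", "complex", or "edge"

@dataclass
class EvalResult:
    case: EvalCase
    answer: str
    failed: bool

def run_suite(ask: Callable[[str], str], cases: List[EvalCase]) -> List[EvalResult]:
    """Run every question through the agent and record whether it failed."""
    results = []
    for case in cases:
        try:
            answer = ask(case.question)
            failed = answer.startswith("Error")
        except Exception as exc:  # an exception from the agent counts as a failure
            answer, failed = f"Error: {exc}", True
        results.append(EvalResult(case, answer, failed))
    return results

def failure_rate_by_category(results: List[EvalResult]) -> Dict[str, float]:
    """Return the fraction of failed questions for each category."""
    totals: Dict[str, int] = {}
    failures: Dict[str, int] = {}
    for r in results:
        cat = r.case.category
        totals[cat] = totals.get(cat, 0) + 1
        failures[cat] = failures.get(cat, 0) + (1 if r.failed else 0)
    return {cat: failures[cat] / totals[cat] for cat in totals}

CASES = [
    EvalCase("Show me sales from last month", "simple"),
    EvalCase("What's the trend in customer retention by region?", "complex"),
    EvalCase("Show me sales for December 2019", "edge"),
    EvalCase("Show me the best customers", "edge"),
]
```

A loop like this only flags hard failures. Result accuracy and query performance still need a manual or reference-based check.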
### Testing and quality

Choose a set of realistic user questions that cover simple filters ("Show me sales from last month"), complex analysis ("What's the trend in customer retention by region?"), and edge cases like empty results ("Show me sales for December 2019") or ambiguous requests ("Show me the best customers").

Test each question and check the agent's behavior. Focus on SQL correctness, result accuracy, and query performance. See the next section for how to tackle common issues.

### Common issues and solutions

| Issue | Solution |
|-------|----------|
| Invalid SQL generation | Improve system prompt, add [function docs](#function-documentation) to context |
| Wrong tables queried | Add [COMMENT ON](https://duckdb.org/docs/stable/sql/statements/comment_on.html), improve schema descriptions, implement table filtering |
| Misunderstood questions | Add domain-specific examples to system prompt |
| Query performance | Use [EXPLAIN ANALYZE](/sql-reference/motherduck-sql-reference/explain-analyze/) to diagnose query inefficiencies, adjust [Duckling size](/about-motherduck/billing/duckling-sizes/) to scale compute resources |

## Next steps

- Explore our [MCP Server](/sql-reference/mcp/) docs (remote and local)
- Try [AI Features in the MotherDuck UI](/key-tasks/ai-and-motherduck/ai-features-in-ui/) with Generate SQL & Edit
- Learn about [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for multi-tenant agents
- Review [Shares](/key-tasks/sharing-data/sharing-overview/) for read-only data access

---
Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/dive-theme-gallery
---
sidebar_position: 3.6
title: Dive theme gallery
description: Ready-to-use theme prompts for Dives with screenshots showing each style applied to the same dataset
---

import EmbeddedDive from '@site/src/components/EmbeddedDive';

Dives give you unlimited abilities in creating visualizations, but that does not automatically mean *good* visualizations.
Use the following themes to guide your AI agent to learn from decades of experienced, excellent data visualizers. Pick a theme, copy the prompt, and paste it into your AI agent alongside your data question. The live theme gallery Dive below lets you switch between all 15 themes interactively. ## Tufte Minimal Inspired by Edward Tufte, *The Visual Display of Quantitative Information* (1983). ![A Dive styled with the Tufte Minimal theme showing monochrome charts with generous whitespace and no gridlines](./img/theme_gallery_tufftle_minimal.png) ```text Create a Dive with a Tufte Minimal style. Inspired by: Edward Tufte, The Visual Display of Quantitative Information (1983). Visual rules: - Background: #FFFFF8. Text: #111. Muted: #666. - Chart colors: monochrome ["#111","#666","#999"]. - Font: Georgia, serif. Titles: normal weight, no transform. - Layout: generous whitespace, no gridlines, no chart borders. - Charts: no gridlines, thin strokes (1.5px), linear interpolation. - Direct labeling instead of legends. Small multiples preferred. - Interactive: year toggle, metric toggle, click-to-filter on bars/pies. Pairs well with: small multiples, sparklines, scatter plots, slope charts, direct-labeled values, heatmaps, composed dual-axis charts. Avoid: pie charts, 3D charts, heavy gridlines. Feel: Quiet authority — the data speaks for itself. ``` ## Ink & Paper Inspired by the New York Times Graphics Desk. ![A Dive styled with the Ink and Paper theme showing clean left-aligned charts with subtle gridlines](./img/theme_gallery_ink_and_paper.png) ```text Create a Dive with an Ink & Paper style. Inspired by: New York Times Graphics Desk. Visual rules: - Background: #fff. Text: #121212. Muted: #666. - Chart colors: ["#326fa8","#e15759","#59a14f","#edc949","#af7aa1"]. - Font: Georgia, serif. Titles: bold. - Layout: clean, left-aligned, subtle gridlines. - Charts: light gridlines, 2px strokes, linear interpolation. 
- Interactive: year toggle, metric toggle, click-to-filter cross-filtering. Pairs well with: annotated line charts, bar charts, horizontal bars, step charts, small multiples, tables, composed dual-axis charts, heatmaps. Feel: Authoritative journalism — clarity above all. ``` ## Corporate Dashboard Inspired by classic BI tools (Tableau, Power BI). ![A Dive styled with the Corporate Dashboard theme showing card-based charts with structured grid and uppercase titles](./img/theme_gallery_corporate_dashboard.png) ```text Create a Dive with a Corporate Dashboard style. Inspired by: Classic BI tools (Tableau, Power BI). Visual rules: - Background: #f5f5f5. Text: #333. Muted: #777. - Chart colors: ["#2563eb","#16a34a","#dc2626","#f59e0b","#8b5cf6"]. - Font: system-ui, sans-serif. Titles: semibold, UPPERCASE. - Layout: card-based, subtle borders, structured grid. - Interactive: year & metric toggles, click-to-filter cross-filtering. Pairs well with: line charts, pie charts, KPI cards, data tables, bar charts, combo charts, heatmaps. Feel: Boardroom-ready — structured and professional. ``` ## FT Salmon Inspired by Financial Times Visual Journalism. ![A Dive styled with the FT Salmon theme showing charts on a signature salmon background with serif typography](./img/theme_gallery_ft_salmon.png) ```text Create a Dive with an FT Salmon style. Inspired by: Financial Times Visual Journalism. Visual rules: - Background: #FFF1E5 (signature salmon). Text: #33302E. Muted: #807973. - Chart colors: ["#0F5499","#990F3D","#FF7FAA","#00A0DD"]. - Font: Georgia, serif. Titles: semibold. - Interactive: year & metric toggles, click-to-filter cross-filtering. Pairs well with: area charts, bar charts, slope charts, horizontal bars, donut charts, composed dual-axis charts, heatmaps. Feel: Financial authority — the pink paper, digitized. ``` ## Soft Infographic Inspired by David McCandless, *Information is Beautiful*. 
![A Dive styled with the Soft Infographic theme showing rounded bar charts and pastel colors on a light background](./img/theme_gallery_soft_infographic.png) ```text Create a Dive with a Soft Infographic style. Inspired by: David McCandless, Information is Beautiful. Visual rules: - Background: #fafafa. Text: #2d2d2d. Muted: #888. - Chart colors: ["#FF6B6B","#4ECDC4","#45B7D1","#FFA07A","#98D8C8"]. - Font: system-ui, sans-serif. Titles: bold. - Charts: rounded bars (8px radius), smooth curves. - Interactive: year & metric toggles, click-to-filter cross-filtering. Pairs well with: rounded bar charts, donut charts, line charts, radar charts, composed charts, heatmaps. Feel: Friendly and approachable — data for everyone. ``` ## Du Bois Inspired by W.E.B. Du Bois, Paris Exposition (1900). ![A Dive styled with the Du Bois theme showing bold horizontal bars on a parchment background with crimson and gold accents](./img/theme_gallery_dubois.png) ```text Create a Dive with a Du Bois style. Inspired by: W.E.B. Du Bois, Paris Exposition (1900). Visual rules: - Background: #e8d4b8 (parchment). Text: #1a1a1a. Muted: #654321. - Chart colors: ["#dc143c","#228b22","#000","#ffd700","#654321"]. - Charts: horizontal bars, no gridlines, sharp edges (0 radius). - Interactive: year & metric toggles, click-to-filter cross-filtering. Pairs well with: horizontal bar charts, pie charts, heatmaps, composed dual-axis charts. Feel: Bold proclamation — data as civil rights evidence. 
```

## More themes

The live gallery includes 9 additional themes you can explore and copy:

| Theme | Category | Feel |
|-------|----------|------|
| Knowledge Beautiful | Modern | Dense and layered — every pixel earns its place |
| Film Flowers | Artistic | Organic and poetic — data as a living garden |
| Dark Canvas | Modern | Midnight studio — data glowing in the dark |
| Playful Sketch | Artistic | Personal and intimate — a handwritten letter in data |
| Neon 80s | Fun | Arcade at midnight — data goes synthwave |
| Pirate Map | Fun | X marks the data — adventure on the high seas |
| Vaporwave | Fun | Digital sunset — nostalgia rendered in pastel neon |
| Terminal | Fun | `> data.query --style=hacker` — pure terminal vibes |
| Candy Pop | Fun | Sugar rush — joyful, bold, unapologetically fun |

Explore all 15 themes in the live gallery Dive.

## Using a gallery prompt with your own data

These prompts are designed to be combined with your data question. Replace the dataset-specific parts and keep the visual rules:

```text
Create a Dive showing monthly active users from my analytics database.

Theme: FT Salmon
- Background: #FFF1E5 (signature salmon). Text: #33302E. Muted: #807973.
- Chart colors: ["#0F5499","#990F3D","#FF7FAA","#00A0DD"].
- Font: Georgia, serif. Titles: semibold.
- Interactive: time filter (Last 7 days | Last 30 days | Last 90 days | All time), click-to-filter cross-filtering.

Charts:
1. Area chart — DAU trend over time
2. Bar chart — Users by country
3. Donut — Traffic source breakdown
4. Table — Top pages by session count
5. Composed chart — Sessions bars + Bounce rate line (dual Y-axis)
6. Heatmap — Country × Day of week activity
```

For more on structuring theme prompts, see [Theming and styling your Dives](/key-tasks/ai-and-motherduck/dives/theming-and-styling-dives/).
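If you generate Dive prompts from code, for example to apply one house theme to many data questions, the same structure can be assembled programmatically. This is a minimal sketch, not a MotherDuck API: `build_dive_prompt` is a hypothetical helper, and the rule and chart strings are copied from the FT Salmon example above.

```python
from typing import List

def build_dive_prompt(
    question: str, theme_name: str, visual_rules: List[str], charts: List[str]
) -> str:
    """Combine a data question, a theme's visual rules, and a chart list."""
    lines = [question, "", f"Theme: {theme_name}"]
    lines += [f"- {rule}" for rule in visual_rules]
    lines += ["", "Charts:"]
    lines += [f"{i}. {chart}" for i, chart in enumerate(charts, start=1)]
    return "\n".join(lines)

FT_SALMON_RULES = [
    "Background: #FFF1E5 (signature salmon). Text: #33302E. Muted: #807973.",
    'Chart colors: ["#0F5499","#990F3D","#FF7FAA","#00A0DD"].',
    "Font: Georgia, serif. Titles: semibold.",
]

prompt = build_dive_prompt(
    "Create a Dive showing monthly active users from my analytics database.",
    "FT Salmon",
    FT_SALMON_RULES,
    ["Area chart: DAU trend over time", "Bar chart: Users by country"],
)
print(prompt)
```

Keeping the visual rules in one list means every generated prompt carries the same theme, and only the question and chart list change per Dive.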
## Related resources - [Theming and styling your Dives](/key-tasks/ai-and-motherduck/dives/theming-and-styling-dives/) — How to write theme prompts, pick chart types, and add interactivity - [Creating Visualizations with Dives](/key-tasks/ai-and-motherduck/dives/) — Get started with your first Dive - [Managing Dives as code](/key-tasks/ai-and-motherduck/dives/managing-dives-as-code/) — Version control and CI/CD for Dives --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/embedding-dives --- sidebar_position: 5 title: Embedding Dives in your web application description: Embed interactive MotherDuck Dives in your web app using iframes and embed sessions feature_stage: preview --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; You can embed Dives in your own web application so your users can interact with live data dashboards without signing in to MotherDuck. Your backend creates an embed session, and your frontend loads the Dive in a sandboxed iframe. Embedding Dives is available on the **Business plan**. ## Prerequisites Before you start, you need: - A **MotherDuck Business plan** account - A read/write access token for an account with the Admin role. For production, we recommend using a dedicated [service account](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) - A Dive you want to embed, with its [data shared](/sql-reference/mcp/share-dive-data) to the target service account that the embedded Dive will run as - A backend server that can make authenticated API calls :::tip Use a dedicated service account We recommend using a service account that does not own databases with the same names as the databases your Dives query. When the service account attaches shared Dive data, the share alias defaults to the source database name. If the service account already has a database with that name, the attach fails. Using a dedicated, empty service account for embedding avoids this conflict. 
:::

## How it works

Embedded Dives follow a short server-side flow:

1. **Your backend** calls the MotherDuck API with your access token to create an embed session: an opaque string that contains a read-only token and the information needed to load the Dive.
2. **Your frontend** renders a sandboxed iframe that loads the Dive from `embed-motherduck.com`, passing the session string.
3. **MotherDuck** loads the Dive and runs live SQL queries. Your end-users see an interactive dashboard without needing a MotherDuck account.

::::info[Two tokens are in play]
Your service account's access token is a **high-privilege read-write admin token** that stays on your backend and is used only to create embed sessions. The session string it produces contains a **separate, read-only token** that is limited in scope and expires after 24 hours. Only the session string should ever reach the frontend.
::::

```mermaid
sequenceDiagram
    participant M as MotherDuck
    participant B as Your backend
    participant F as Your frontend
    participant E as Embed iframe
    Note over B: Holds your access token
    B->>M: POST /v1/dives//embed-session
    M-->>B: Session string
    B-->>F: Return session string
    F->>E: Load iframe /sandbox/#session=
    Note over F,E: The session stays in the URL fragment, not the request
    E->>M: Fetch Dive metadata and content
    M-->>E: Return the Dive
```

## Step 1: Create an embed session

Your backend calls the MotherDuck API to create an embed session. The access token used for this call must belong to an account with admin-level access. The session string contains a read-only token that expires after 24 hours.

::::warning[Important]
**Never expose your access token in client-side code.** The access token stays on your backend. Only the session string reaches the browser.
::::

```javascript
const DIVE_ID = "";

const response = await fetch(
  `https://api.motherduck.com/v1/dives/${DIVE_ID}/embed-session`,
  {
    method: "POST",
    headers: {
      // This is the admin account used to generate the embed session.
      Authorization: `Bearer ${MOTHERDUCK_TOKEN}`,
      "Content-Type": "application/json",
    },
    // This is the service account whose compute / perms will be used for the Dive.
    body: JSON.stringify({ username: SERVICE_ACCOUNT_USERNAME }),
  }
);

if (!response.ok) {
  throw new Error(`Failed to create embed session: ${response.status}`);
}

const { session } = await response.json();
// Return this session string to your frontend
```

```python
import httpx

DIVE_ID = ""

response = httpx.post(
    f"https://api.motherduck.com/v1/dives/{DIVE_ID}/embed-session",
    headers={
        "Authorization": f"Bearer {MOTHERDUCK_TOKEN}",
        "Content-Type": "application/json",
    },
    json={"username": SERVICE_ACCOUNT_USERNAME},
)
response.raise_for_status()

session = response.json()["session"]
# Return this session string to your frontend
```

Replace `` with the ID of your Dive. You can find this in **Settings** > **Dives** or through the [`list_dives`](/sql-reference/mcp/list-dives) MCP tool.

Each session is tied to a single Dive. If you embed multiple Dives on the same page, create a separate embed session for each one. You can use the same service account and access token for all of them.
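
The one-session-per-Dive rule can be sketched as a small backend helper. This is a minimal sketch, not an official client: it assumes the endpoint, request body, and `session` response field shown in the examples above, and the names `build_embed_session_request`, `create_embed_sessions`, and the `send` transport hook are hypothetical. Passing the HTTP call in as `send` keeps the helper independent of any particular HTTP library.

```python
import json
from typing import Callable, Dict, Iterable, Tuple

# Same API host as the examples above.
API_BASE = "https://api.motherduck.com/v1"

# Hypothetical transport hook: (url, headers, body) -> parsed JSON response.
Send = Callable[[str, Dict[str, str], bytes], dict]


def build_embed_session_request(
    dive_id: str, access_token: str, username: str
) -> Tuple[str, Dict[str, str], bytes]:
    """Build the POST request for one Dive, mirroring the examples above."""
    url = f"{API_BASE}/dives/{dive_id}/embed-session"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"username": username}).encode("utf-8")
    return url, headers, body


def create_embed_sessions(
    dive_ids: Iterable[str], access_token: str, username: str, send: Send
) -> Dict[str, str]:
    """Create one embed session per Dive and return {dive_id: session}."""
    sessions: Dict[str, str] = {}
    for dive_id in dive_ids:
        url, headers, body = build_embed_session_request(
            dive_id, access_token, username
        )
        # Assumes the response carries the session string under "session".
        sessions[dive_id] = send(url, headers, body)["session"]
    return sessions
```

With `httpx`, `send` could wrap `httpx.post(url, headers=headers, content=body)`, call `raise_for_status()` on the response, and return `response.json()`. The returned dictionary maps each Dive ID to its own session string, ready to hand to the matching iframe.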
The session string is base64-encoded but **not encrypted** — it contains a read-only (read scaling) token, the Dive ID, and endpoint URLs. Treat it like a short-lived credential: do not log it or store it in persistent storage. The embedded Dive runs queries as the service account specified in the session. If you need data isolation (for example, separate databases per region), use separate service accounts scoped to only the data each should access. ## Step 2: Embed the iframe Add a sandboxed iframe to your page that points to the MotherDuck embed URL. Pass the session string in the URL fragment: ```html ``` Replace `` with the session string your backend generated. The `sandbox` attribute must include `allow-scripts allow-same-origin` for the embed to function. ### Query modes Embedded Dives use **dual mode** by default, where queries can use browser DuckDB WASM or run server-side through MotherDuck depending on the query. Dual mode is required for browser DuckDB features such as data exports. You can force **server mode** for embeds that only need server-side SQL queries. To use server mode, add `?queryMode=server` to the iframe URL: ```html ``` #### Server mode data type limitations Server mode runs queries through the Postgres wire protocol, which does not support all DuckDB data types. Basic types (integers, strings, floats) work fine, but nested types (structs, lists) and some less common timestamp types may not render correctly. If you encounter issues with specific columns, try dual (WASM) mode, which supports the full range of DuckDB types. ### URL structure | Part | Description | |------|-------------| | `embed-motherduck.com/sandbox/` | The MotherDuck embed host | | `?queryMode=server` | Optional: forces server-only query mode | | `#session=` | The session string, passed in the URL fragment so it is never sent to the server | The session is placed in the URL fragment (after `#`) rather than the query string. 
Browsers strip fragments before making HTTP requests, so the session does not appear in server logs or Referer headers.

## Handle link navigation from embedded Dives

Embedded Dives run inside an isolated MotherDuck sandbox iframe. Dive code cannot directly navigate the parent page or open popups. When someone clicks a link in an embedded Dive, or Dive code calls `window.open()`, the sandbox blocks the browser navigation and sends a `postMessage` to the parent page.

The message has the following shape:

```typescript
type NavigationRequest = {
  type: "navigation-request";
  url: string;
  source: "anchor-click" | "window-open";
  target: "_blank" | "_self" | null;
  rel: string | null;
};
```

The parent page decides how to handle the request. Listen for `navigation-request`, validate the event origin and URL, and apply your own policy before opening anything. The following example uses `window.confirm`; replace it with your application's confirmation UI:

```typescript
const iframe = document.querySelector<HTMLIFrameElement>("#motherduck-dive");
if (!iframe) {
  throw new Error("MotherDuck Dive iframe not found");
}

const motherduckEmbedOrigin = new URL(iframe.src).origin;

window.addEventListener("message", (event) => {
  // Only accept messages from the embed origin and from this iframe.
  if (event.origin !== motherduckEmbedOrigin) return;
  if (event.source !== iframe.contentWindow) return;

  const message = event.data;
  if (message?.type !== "navigation-request") return;

  let url: URL;
  try {
    url = new URL(message.url);
  } catch {
    return;
  }

  if (!["https:", "http:"].includes(url.protocol)) return;

  const confirmed = window.confirm(`Open ${url.toString()}?`);
  if (!confirmed) return;

  window.open(url.toString(), "_blank", "noopener,noreferrer");
});
```

::::warning[Important]
Treat `navigation-request` as untrusted user intent from sandboxed content, not as a command. The parent page should not navigate, submit forms, mutate application state, or grant permissions based only on the message.
::::

### Use absolute URLs in Dive links

If you plan to embed a Dive, use absolute URLs in links inside the Dive. Avoid app-relative links like this:

```html
<a href="/settings/members">Settings</a>
```

In an embedded Dive, `/settings/members` resolves against the embed origin, not the MotherDuck app. The parent page receives a URL such as:

```text
https://embed-motherduck.com/settings/members
```

Use absolute URLs instead:

```html
Docs
Another Dive
```

For embedded Dives, the parent page owns the policy for whether a navigation request opens a new tab, replaces the current page, or is blocked.

## Handle data exports from embedded Dives

Dives can include export buttons created with the `exportAs` return value from `useSQLQuery()` or the `useExport()` hook. When a user starts an export, the Dive runs the export SQL with DuckDB `COPY TO` and sends the generated file to the parent page.

Because embedded Dives run in a sandboxed iframe, the iframe cannot download the file directly. Your parent page must listen for export messages, validate the event, and decide how to offer the file to your user.

Embedded exports support `csv`, `json`, `parquet`, and `xlsx` formats. Exports require dual mode because file generation uses browser DuckDB. If you force `?queryMode=server`, export controls return an error.

The parent page receives these message types:

```typescript
type ExportStarted = {
  type: "export-started";
  requestId: string;
  format: "csv" | "json" | "parquet" | "xlsx";
  title?: string;
  filename: string;
};

type ExportFile = {
  type: "export-file";
  requestId: string;
  format: "csv" | "json" | "parquet" | "xlsx";
  title?: string;
  filename: string;
  mimeType: string;
  byteLength: number;
  previewOptions?: Record<string, unknown>;
  data: ArrayBuffer;
};

type ExportError = {
  type: "export-error";
  requestId: string;
  format: "csv" | "json" | "parquet" | "xlsx";
  title?: string;
  filename?: string;
  error: string;
};
```

The following example stores the completed export and shows a host-page download button.
Replace the status and button UI with your application's pattern:

```html

```

```javascript
const iframe = document.querySelector("#motherduck-dive");
const status = document.querySelector("#dive-export-status");
const downloadButton = document.querySelector("#dive-export-download");

if (!iframe || !status || !downloadButton) {
  throw new Error("MotherDuck Dive export controls not found");
}

const motherduckEmbedOrigin = new URL(iframe.src).origin;
let pendingExport = null;

function isArrayBuffer(value) {
  return Object.prototype.toString.call(value) === "[object ArrayBuffer]";
}

function isExportFile(message) {
  return (
    message?.type === "export-file" &&
    typeof message.requestId === "string" &&
    typeof message.filename === "string" &&
    typeof message.mimeType === "string" &&
    typeof message.byteLength === "number" &&
    isArrayBuffer(message.data)
  );
}

window.addEventListener("message", (event) => {
  if (event.origin !== motherduckEmbedOrigin) return;
  if (event.source !== iframe.contentWindow) return;

  const message = event.data;

  if (message?.type === "export-started") {
    status.textContent = `Preparing ${message.filename}`;
    downloadButton.hidden = true;
    pendingExport = null;
    return;
  }

  if (message?.type === "export-error") {
    status.textContent = `Export failed: ${message.error}`;
    downloadButton.hidden = true;
    pendingExport = null;
    return;
  }

  if (!isExportFile(message)) return;

  pendingExport = message;
  status.textContent = `${message.filename} is ready to download`;
  downloadButton.hidden = false;
});

downloadButton.addEventListener("click", () => {
  if (!pendingExport) return;

  const blob = new Blob([pendingExport.data], {
    type: pendingExport.mimeType || "application/octet-stream",
  });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = pendingExport.filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  URL.revokeObjectURL(url);

  pendingExport = null;
  downloadButton.hidden = true;
  status.textContent = "Export downloaded";
});
```

::::warning[Important]
Treat export messages
as untrusted content from sandboxed Dive code. After you validate the event origin and source, use the message to offer a download to your user. Do not upload the file, attach it to another account, or trigger backend workflows based only on the message. :::: Exports run the full SQL passed by the Dive, not the rows already rendered in React. Large exports can use significant browser memory because the generated file is transferred to the parent page as an `ArrayBuffer`. For larger data delivery workflows, consider creating a server-side export flow outside the embedded Dive. ## Session lifecycle Embed sessions expire after 24 hours. You have two options for handling expiration: - **Generate a fresh session per page load.** The simplest approach. Each time a user loads the page, your backend creates a new embed session and passes it to the iframe. - **Cache and refresh.** Your backend caches the session and refreshes it before it expires. This reduces API calls but adds complexity. If a session expires while a Dive is open, the embed displays a "Session expired" message. The user needs to reload the page to get a new session. ## Security best practices - **Keep your access token server-side.** Never include your access token in client-side JavaScript, HTML, or any code that reaches the browser. - **Use a dedicated service account.** Create a [service account](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) specifically for embedding, separate from your personal account. The account needs a read/write, Admin-level access token to create embed sessions, but the sessions it generates are always read-only. - **Sessions are read-only.** The embed session always contains a read scaling token, so it can only read data, not modify it. - **Session in URL fragment.** The fragment (`#session=...`) is never sent to the server in HTTP requests, keeping the session out of access logs and referrer headers. 
- **Scope service accounts for data isolation.** If you need to restrict which data different users can see (for example, per-region databases), create separate service accounts with access scoped to the appropriate data. The embedded Dive queries data as the service account used to create the session.

## CSP configuration

If your site uses a restrictive [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP), add `embed-motherduck.com` to your `frame-src` directive:

```text
Content-Security-Policy: frame-src https://embed-motherduck.com;
```

Without this, the browser blocks the iframe from loading.

## Troubleshooting

Errors from the embed itself (expired token, Dive not found) appear as messages **inside the iframe**. CSP or network-related errors typically appear only in the **browser developer console**.

| Error message | Cause | Solution |
|---------------|-------|----------|
| "Dive embedding requires a Business plan." | Your organization is not on the Business plan | Upgrade to a [Business plan](https://motherduck.com/pricing/) |
| "Invalid or expired token. Please reload the page." | The session has expired or is malformed | Create a fresh embed session from your backend |
| "Dive not found." | The Dive ID is incorrect or the Dive has been deleted | Verify the Dive ID in **Settings** > **Dives** |
| "Failed to load dive. Please try again." | A generic error occurred while loading | Check your session string and network connectivity, then reload |
| "Can't open share: Share alias cannot be the same as an existing database name. _name_ is already taken and used as a database name." | Your service account already has a database with the same name as one of the Dive's shared databases | Rename or [detach](/key-tasks/database-operations/detach-and-reattach-motherduck-database/) the conflicting database on the service account. See [share alias conflicts](/sql-reference/motherduck-sql-reference/attach/#share-alias-conflicts) for details. |
| Links in the embedded Dive do not open | Embedded Dives cannot directly navigate the parent page or open popups from the sandbox | Listen for `navigation-request` messages in the parent page, validate the URL, and decide whether to open it |
| Export buttons do not download a file | The iframe cannot download files directly from the sandbox, or the embed is using server mode | Listen for `export-file` messages in the parent page and offer the file for download. Use dual mode for Dives that include export controls. |
| Iframe does not load (blank or blocked) | Your site's CSP blocks `embed-motherduck.com` | Add `frame-src https://embed-motherduck.com` to your CSP header (visible in browser dev console as a CSP violation) |
| "User role "restricted" does not meet minimum role "admin" required for dashboards.createEmbedSession" | The user associated with the token is not an admin. Generating embed sessions requires the user or service account to have admin permissions. | In the service accounts panel under **Settings**, change the role of the service account to Admin |
| "unauthorized_client: Callback URL mismatch. `` is not in the list of allowed callback URLs" | Embedded Dives use MotherDuck's authorization system to determine permissions, which limits the URLs that can be used for authorization. | For local development, make sure you are running on `localhost`, not an address such as `127.0.0.1` |

## Related resources

- [Creating visualizations with Dives](/key-tasks/ai-and-motherduck/dives/)
- [Dives SQL functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/)
- [Managing Dives as code](/key-tasks/ai-and-motherduck/dives/managing-dives-as-code)

---

Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/index

---
sidebar_position: 3
title: Creating Visualizations with Dives
description: Build interactive visualizations from natural language using AI agents and the MotherDuck MCP Server
feature_stage: preview
---

import VideoPlayer from '@site/src/components/VideoPlayer';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Dives are interactive visualizations you create with natural language, directly on top of your data in MotherDuck. Ask your AI agent a question, and MotherDuck generates a persistent, interactive component that lives in your workspace alongside your SQL.

Think of Dives as a bridge between one-off questions and always-up-to-date dashboards. Instead of building a full dashboard or writing complex queries, you can ask a question and save the answer as a Dive that stays current with your data.

## How Dives work

When you create a Dive with the [MotherDuck MCP](/sql-reference/mcp/) through an AI agent:

1. You ask a question in natural language (for example, "Show me monthly revenue trends by product category")
2. The AI agent queries your MotherDuck database through the [MCP Server](/sql-reference/mcp/) to understand the data
3. The agent creates an interactive visualization, with the necessary SQL to query the data
4. In clients that support the Dive Viewer MCP App, the Dive renders inline in the chat against live data. In other clients, the agent shows a static preview with sample data until you open the Dive in MotherDuck
5.
MotherDuck saves the Dive to your workspace Dives use MotherDuck's [hypertenancy](/concepts/hypertenancy) architecture to serve sub-second queries. Every user gets dedicated compute, so there's no slowdown when your whole team explores data at once. ### Inline preview with the Dive Viewer On clients that support [MCP Apps](https://apps.extensions.modelcontextprotocol.io/), the MotherDuck MCP Server serves a **Dive Viewer MCP App** that renders your Dive directly in the chat with the same React components used in the MotherDuck UI. At launch, this is supported in Claude web and desktop; other clients fall back to a sample-data preview. With the Dive Viewer: - The preview queries **live data** through the MCP Server, so what you see in the chat matches what you'll see in MotherDuck. - Every edit is applied incrementally and saved as a separate version of the Dive, rather than rewritten from scratch. You can browse versions from the version picker in the MotherDuck UI. - You iterate conversationally (*"add a filter for US region"*, *"switch to a bar chart"*) and the Viewer updates in place. ## Prerequisites To create a Dive, you will need: - A MotherDuck account with at least one database - An [AI client](/docs/getting-started/mcp-getting-started/) connected to the [MotherDuck MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) (Claude, ChatGPT, Cursor, or others) Dives are available on all MotherDuck plans at no additional charge. ## Creating a Dive Connect your AI assistant to the MotherDuck MCP Server, then ask it to create a visualization. The key is to ask for a "Dive" specifically as this tells the agent to persist the visualization in your MotherDuck workspace. **Example prompts:** - *"Create a Dive showing monthly revenue trends for the last 12 months"* - *"Make a Dive that breaks down customer sign-ups by region"* - *"Build a Dive with a chart of our top 10 products by sales volume. 
Use MotherDuck's brand colors"* The AI agent handles the SQL, chart configuration, styling and saving. You just describe what you want to see. ### Iterating on a Dive Once you have a Dive, you can refine it through conversation: - *"Add a filter for the US region only"* - *"Change the chart to a stacked bar chart"* - *"Add a trend line to show the overall direction"* Each update modifies the Dive in place, keeping your visualization current. ## Finding your Dives Dives appear in two places in the MotherDuck UI: ### Object explorer Your recent Dives appear in the left sidebar, above your Notebooks. Click any Dive to load it in the main view. The list shows your most recent Dives first. ![A screenshot of a dives dashboard in the MotherDuck UI](./img/dives_airquality_eastcoats_westcoast.png) ### Settings page For a complete list of all Dives in your organization, go to **Settings** → **Dives**. This view makes it easier to find Dives created by others in your team. ![A screenshot of the dives settings and overview in the MotherDuck UI](./img/dives_settings_ui.png) ## Sharing Dives with your team When you save a Dive, the AI agent checks whether the databases it queries are shared with your organization. If not, it will suggest sharing them so your team can view the Dive. You can also explicitly ask: > *"Share the data for my revenue Dive with my team"* This creates org-scoped shares for any private databases referenced in the Dive's queries and updates the Dive to use the shared references. See [`share_dive_data`](/sql-reference/mcp/share-dive-data) for details. ## Version history Every time you update a Dive, MotherDuck saves a version. You can browse previous versions directly in the MotherDuck UI using the version picker in the top-right corner of a Dive. The dropdown shows each version with its description and when it was created. 
![A screenshot of the version history dropdown in the MotherDuck Dives UI](./img/dives_version_history.png) Selecting a previous version lets you view what the Dive looked like at that point. Version browsing is read-only: switching to an older version does not overwrite the latest version. You can also retrieve versions programmatically. Use [`list_dives`](/sql-reference/mcp/list-dives) to see the `current_version` for each Dive, and [`read_dive`](/sql-reference/mcp/read-dive) with the `version` parameter to inspect a specific version. ## What makes Dives different Unlike traditional dashboards: - **Natural language creation**: Describe what you want in plain English instead of clicking through a UI or writing visualization code - **Always current**: Dives query live data—no manual refreshes or stale snapshots - **Workspace-native**: Dives live alongside your SQL in MotherDuck, not in a separate tool - **Instant exploration**: Filter, drill down, and explore without waiting for queries to run Unlike one-off AI-generated charts: - **Persistent**: Dives save to your workspace so you can return to them anytime - **Shareable**: Team members can view and interact with Dives you create—[share the underlying data](/sql-reference/mcp/share-dive-data) to give them access - **Interactive**: Filter and explore the data, not just view a static image ## Walkthrough: Building a Dive step by step Connect the [MotherDuck MCP Server](/sql-reference/mcp/) to Claude for desktop or Claude on the web, then open a new conversation. **Step 1: Explore your data** Don't ask for a finished Dive right away. Start vague: *"Take a look at what tables I have in my analytics database."* Claude lists tables, reads column names, samples rows, and figures out how things connect. Doing this first saves you from chasing down SQL errors later. When it reports back, keep asking questions. *"How do the orders and customers tables connect? 
What date range am I working with?"* The more Claude knows about your schema upfront, the fewer corrections you'll need. **Step 2: Shape the analysis** Point Claude at what you want to see. If you're not sure what to look for, go open-ended: *"What are the most interesting patterns in this data?"* Claude runs queries and pulls out trends you might have missed. If you already have something in mind, say so: *"I want to see how revenue breaks down by product category over the last 12 months."* You can also paste in a SQL query or a screenshot of a dashboard you want to recreate. Mention specifics like calculated columns, filters, or date ranges before asking Claude to build the Dive. **Step 3: Iterate on the live preview** Claude renders the Dive inline in the chat with the Dive Viewer MCP App, using the same components as the MotherDuck UI and running against live data. Dive edits are versioned. Users can ask their agent to refer to and clone prior versions for continued iterations. They can also browse through past versions directly in the MotherDuck UI. Explain *why* you want a change, not just *what*. *"I want to spot outliers quickly"* gives Claude more to work with than *"make the dots bigger."* Group related tweaks into one message. Keep unrelated changes separate. If something isn't working after two or three rounds, try a different approach. If you know what you want to change specifically, go ahead and do it. Even beyond the charts and visuals themselves, there are so many ways to enhance your Dive. Every type of custom interaction you've seen on the web is available to you. Ask for features like drill downs, cross-filtering, zooming, and more. You don't have to finish in one sitting. **Step 4: Find it in MotherDuck** Every edit from the Dive Viewer is saved to your workspace as a separate version, so the Dive is already there when you're done iterating. 
If you want to force a save or name a checkpoint explicitly, ask Claude: *"Save this as a Dive in MotherDuck."* Find the Dive in the [Object Explorer sidebar](#object-explorer) or on the [Settings page](#settings-page), share it with your team, and come back to Claude when you want to change anything. Claude Code can allow you to iterate very quickly when building Dives. With Claude Code, you can preview your changes in a local environment for instant feedback loops - and Claude can get that environment set up for you! To get started, connect the [MotherDuck MCP Server](/sql-reference/mcp/) to Claude Code, then open a new conversation. **Step 1: Explore your data** Don't ask for a finished Dive right away. Start vague: *"Take a look at what tables I have in my analytics database."* Claude lists tables, reads column names, samples rows, and figures out how things connect. Doing this first saves you from chasing down SQL errors later. When it reports back, keep asking questions. *"How do the orders and customers tables connect? What date range am I working with?"* The more Claude knows about your schema upfront, the fewer corrections you'll need. **Step 2: Shape the analysis** Point Claude at what you want to see. If you're not sure what to look for, go open-ended: *"What are the most interesting patterns in this data?"* Claude runs queries and pulls out trends you might have missed. If you already have something in mind, say so: *"I want to see how revenue breaks down by product category over the last 12 months."* You can also paste in a SQL query or a screenshot of a dashboard you want to recreate. Mention specifics like calculated columns, filters, or date ranges before asking Claude to build the Dive. **Step 3: Create a Dive local preview** Next, ask Claude to create a Dive based on your analysis thus far and any other open questions on your mind. 
Claude will ask if you would like to see a local preview, and if you accept, the MotherDuck MCP will give Claude the instructions to set up a preview on your local machine. To set up the preview, Claude will make some local folders and run some npm commands, and after a moment your environment will be ready. You will receive a message like this:

> `The preview is running at http://localhost:5177/.`
> `Open that in your browser to see the Dive with live data from MotherDuck.`

Cmd + click that localhost URL (or Ctrl + click on Windows) to open a live preview of the Dive you just created in your browser.

**Step 4: Iterate with the preview**

Now you can use the agent for follow-up analysis and to enhance the visual.

Explain *why* you want a change, not just *what*. *"I want to spot outliers quickly"* gives Claude more to work with than *"make the dots bigger."* Group related tweaks into one message. Keep unrelated changes separate. If something isn't working after two or three rounds, try a different approach. If you know what you want to change specifically, go ahead and do it.

Feel free to keep questions open-ended. Questions like *"What other columns are correlated with revenue? What other interesting patterns should I investigate?"* can let Claude uncover hidden patterns on your behalf.

Beyond the charts and visuals themselves, there are many ways to enhance your Dive. Every type of custom interaction you've seen on the web is available to you. Ask for features like drill-downs, cross-filtering, zooming, and more.

**Step 5: Publish to MotherDuck**

Tell Claude to save it: *"Save this as a Dive in MotherDuck."* The Dive runs against live data. Find it in the [Object Explorer sidebar](#object-explorer) or on the [Settings page](#settings-page), share it with your team, and come back to Claude when you want to change anything.
Connect the [MotherDuck MCP Server](/sql-reference/mcp/) to ChatGPT and follow the general steps in [Creating a Dive](#creating-a-dive). The workflow is similar to the Claude Desktop/Web tab: explore your data, shape the analysis, then ask ChatGPT to save the result as a Dive.

Connect the [MotherDuck MCP Server](/sql-reference/mcp/) to Cursor and follow the general steps in [Creating a Dive](#creating-a-dive). The workflow is similar to the Claude Code tab: explore your data, shape the analysis, preview locally, then publish the Dive to MotherDuck.

## Tips for better Dives

### Be specific about the visualization

Include details about chart type, time ranges, and groupings:

| Less effective | More effective |
|----------------|----------------|
| "Show me sales data" | "Create a Dive with a line chart of weekly sales for 2024, broken down by product category" |
| "Make a customer chart" | "Build a Dive showing customer count by signup month as a bar chart" |

### Use your schema knowledge

If you know your table and column names, include them:

> "Create a Dive from the `orders` table showing `total_amount` by `order_date`, grouped by month"

### Start simple, then iterate

Begin with a basic visualization, then add complexity:

1. *"Create a Dive showing revenue by month"*
2. *"Add a breakdown by region"*
3. *"Filter to show only the top 5 regions"*

## Troubleshooting

| Issue | Solution |
|-------|----------|
| AI creates a chart but doesn't save it as a Dive | Explicitly ask to "create a Dive" or "save this as a Dive in MotherDuck" |
| Dive shows unexpected data | Ask the AI to explain the query it used, then refine your request |
| Can't find a Dive | Check **Settings** → **Dives** for the complete list |
| Dive is slow to load | The underlying query may be scanning a lot of data. Ask the AI to add filters or optimize the query |

## Declaring required databases

When your Dive queries a database that viewers might not have attached, export a `REQUIRED_DATABASES` constant from your component. MotherDuck automatically attaches these databases (including shared databases) before running any queries, so your teammates don't see "Catalog does not exist" errors.

```jsx
export const REQUIRED_DATABASES = [
  { type: 'share', path: 'md:_share//', alias: '' }
];
```

Each entry describes one database:

| Field | Description |
|-------|-------------|
| `type` | `"share"` for shared databases, `"database"` for owned databases |
| `path` | The share URL (for example, `md:_share/galactic_coffee/af03aa17-...`) or database name |
| `alias` | The local alias used in your SQL queries |

You can find your share URLs by running `FROM MD_INFORMATION_SCHEMA.OWNED_SHARES;` or by asking the AI agent to use the [`share_dive_data`](/sql-reference/mcp/share-dive-data) tool.

This approach is preferred over calling `ATTACH` inside `useSQLQuery`, because it lets MotherDuck handle the attachment before any data queries fire.
## Related resources - [Embedding Dives in your website](/key-tasks/ai-and-motherduck/dives/embedding-dives) - [Dives SQL Functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/) — Manage Dives directly from SQL - [`useSQLQuery` hook](/sql-reference/motherduck-sql-reference/ai-functions/dives/use-sql-query) — React hook reference for querying data inside Dives - [Connect to MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) — Set up the MCP server with your AI assistant - [MCP Workflows](/key-tasks/ai-and-motherduck/mcp-workflows/) — Tips for effective AI-powered data analysis - [AI Features in MotherDuck](/docs/key-tasks/ai-and-motherduck/ai-features-in-ui/) — Explore instant SQL and automatic SQL fixes. --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/managing-dives-as-code --- sidebar_position: 4 title: Managing Dives as Code description: Set up a Git-based workflow for developing, previewing, and deploying Dives with GitHub Actions and Claude Code feature_stage: preview --- import VideoPlayer from '@site/src/components/VideoPlayer'; import SignUpLink from '@site/src/components/SignUpLink'; Creating Dives through an AI agent is fast, but as your team relies on them for decision-making, you may want the same rigor you apply to production code: version history, code review, and automated deployments. Since Dives are React components and SQL queries under the hood, you can manage them with Git and CI/CD — just like the rest of your codebase. This guide walks through setting up that workflow: local development with hot reload, PR-based preview deployments, and automated production updates on merge. A [starter repo](https://github.com/motherduckdb/blessed-dives-example) with GitHub Actions pipelines is ready to fork and use. ## Quick start Fork the [starter repo](https://github.com/motherduckdb/blessed-dives-example) to get up and running immediately. 
It includes: - A working example Dive - The Vite preview setup for local development - GitHub Actions for deploy and cleanup - A `CLAUDE.md` that teaches the agent the repo conventions Fork the repo, set a `MOTHERDUCK_TOKEN` secret, and you're deploying Dives on merge. ## Prerequisites - A MotherDuck account with at least one Dive already published - A GitHub repository to store your Dive source files (or fork the [starter repo](https://github.com/motherduckdb/blessed-dives-example)) - [Claude Code](https://docs.anthropic.com/en/docs/build-with-claude/claude-code/overview) connected to the [MotherDuck MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) - A MotherDuck API token set as a GitHub secret (`MOTHERDUCK_TOKEN`) ## Pull a dive for local development Start with a Dive that's already published in MotherDuck. Copy its share link from the MotherDuck UI, then tell Claude Code to set it up locally: ```text Set up this dive for local development: https://app.motherduck.com/dives/... ``` The agent uses the MotherDuck MCP Server to: 1. Read the Dive source through the SQL API using the share link 2. Pull down the file into a local directory in your repo 3. Register the Dive for CI 4. Start a lightweight Vite development server for live preview The MCP Server's `get_dive_guide` tool provides the agent with everything it needs — the React component contract, dependency setup, and instructions for the local dev server. No additional skills or context files are required beyond what the MCP server provides. ![Claude Code spinning up the Vite dev server after pulling down a Dive for local development.](./img/claude_code_vite_terminal_1ffa9f80a9.png) ## Edit locally with an AI agent With the local dev server running, you can iterate on the Dive using Claude Code. The agent can restyle charts, rewrite SQL queries, add filters, swap visualizations — anything you can express as a prompt. ```text Make this much better visually. Top-tier style please. 
```

The Vite dev server hot-reloads changes, so you see updates instantly in the browser. The MCP server provides schema context so the agent writes accurate SQL against your live data.

![A Dive running locally, showing the updated dashboard with improved styling and layout.](./img/dive_local_preview_9bccdb19bf.png)

If your repo includes a `CLAUDE.md` file (the [starter repo](https://github.com/motherduckdb/blessed-dives-example) includes one), the agent also knows the folder conventions and how to register new Dives for CI, so you can go from "pull this Dive down" to "push up a PR" without explaining any plumbing.

## Deploy a preview with GitHub Actions

Once you're happy with your changes, tell the agent to push a PR:

```text
Put up a PR on a new feature branch
```

When a PR is opened (or updated with new commits), a GitHub Action detects which Dive folders changed and deploys a **preview** Dive to MotherDuck. The preview uses the same live environment as production but has a branch-tagged title so it's clearly labeled. A comment appears on the PR with a direct link.

![A GitHub Actions bot comment on a PR showing a preview Dive link. Click Open Dive to see it live in MotherDuck.](./img/pr_preview_comment_13ca302ff9.png)

Your reviewer clicks the link and sees the Dive running with live queries, with no local setup needed.

The deploy action uses path filters to detect which Dive folders changed, then calls a shared deploy script (`scripts/deploy-dive.sh`) for each one. The script reads the Dive's source and metadata, and uses the DuckDB CLI with the MotherDuck extension to create or update the Dive.

## Merge to production

When the preview looks right, merge the PR. A separate deploy job then creates or updates the production Dive, matched by title. The production Dive is now live and shareable with anyone in your organization.
![The deploy GitHub Action after a merge to main, completing in 20 seconds.](./img/deploy_action_success_f763894ae0.png) ## Clean up preview dives Delete the feature branch after merging. A cleanup action fires that removes the preview Dive from your MotherDuck account — no orphaned Dives cluttering your workspace. The entire pipeline is two GitHub Actions and one secret (`MOTHERDUCK_TOKEN`). At MotherDuck, we use a dedicated service account so anyone with repo access can edit and deploy with the same ownership scope. ## Related resources - [Creating Visualizations with Dives](/key-tasks/ai-and-motherduck/dives/) — Create Dives from natural language with AI agents - [Dives SQL Functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/) — Manage Dives directly from SQL - [Connect to MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) — Set up the MCP server with your AI assistant - [Starter repo](https://github.com/motherduckdb/blessed-dives-example) — Fork and start deploying --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/dives/theming-and-styling-dives --- sidebar_position: 3.5 title: Theming and styling your Dives description: Control the visual appearance of your Dives with theme definitions, chart selection, and interactive filters feature_stage: preview --- import VideoPlayer from '@site/src/components/VideoPlayer'; When you create a Dive, you can go beyond the default look and feel. By providing a **theme definition** in your prompt, you control colors, typography, chart types, and interaction patterns — turning a basic visualization into a polished, branded data experience. This guide covers how to structure a theme prompt, pick the right chart types for your data, and add interactivity through filters and cross-filtering. 
You can explore and play with themed Dives in the [live theme gallery](https://duck-dives.vercel.app/snippets/galactic-coffee-theme-gallery), or browse our [curated theme gallery](/key-tasks/ai-and-motherduck/dives/dive-theme-gallery/) with screenshots and ready-to-copy prompts. ## How theming works in Dives A Dive is a React component that renders charts using [Recharts](https://recharts.org/) and queries live MotherDuck data through `useSQLQuery`. When you describe a visual style in your prompt, the AI agent translates it into: - A **color palette** (background, text, muted, and chart colors) - **Typography** (font family, title weight, text transform) - **Chart configuration** (grid lines, stroke width, curve type, bar radius) - **Layout** (grid columns, spacing, card styling) You don't need to write any code — describe the style and the agent handles the implementation. ## Writing a theme prompt A good theme prompt has four parts: **colors**, **typography**, **chart rules**, and **feel**. Here's an example that produces a Financial Times-inspired Dive: ```text Create a Dive with an FT Salmon style. Inspired by: Financial Times Visual Journalism. Visual rules: - Background: #FFF1E5 (signature salmon). Text: #33302E. Muted: #807973. - Chart colors: ["#0F5499", "#990F3D", "#FF7FAA", "#00A0DD"]. - Font: Georgia, serif. Titles: semibold. - Interactive: year & metric toggles, click-to-filter cross-filtering. Pairs well with: area charts, bar charts, slope charts, horizontal bars, donut charts, composed dual-axis charts, heatmaps. Feel: Financial authority — the pink paper, digitized. ``` ### What to include in your prompt | Section | What to specify | Example | |---------|----------------|---------| | Colors | Background, text, muted accent, 3-5 chart colors | `Background: #0d1117. Chart colors: ["#58a6ff", "#3fb950"]` | | Typography | Font family, title weight, text transform | `Font: Georgia, serif. 
Titles: bold, UPPERCASE` | | Chart rules | Grid lines, stroke width, curve type, bar radius | `No gridlines, 1.5px strokes, linear interpolation` | | Chart types | Which charts to include | `Pairs well with: area charts, bar charts, heatmaps` | | Interactivity | Filters and cross-filtering behavior | `Interactive: year toggle, metric toggle, click-to-filter` | | Feel | One-line mood descriptor | `Feel: Midnight studio — data glowing in the dark` | ### Tips for effective theme prompts **Reference real-world styles.** Naming a specific design tradition helps the agent make consistent decisions. "Tufte minimal" or "Neon 80s synthwave" gives more coherent results than listing individual properties. **Specify chart colors as an array.** Providing 3-5 hex colors as a JSON array (for example, `["#2563eb", "#16a34a", "#dc2626"]`) gives the agent an explicit palette instead of leaving it to guess. **Pick colors that work in charts, not just colors that look nice together.** General-purpose palette generators often produce colors that clash or become indistinguishable when applied to bars, lines, and slices. Use tools designed for data visualization: - [Colorbrewer 2.0](https://colorbrewer2.org/) — the gold standard for cartography and charts. Pick sequential, diverging, or qualitative palettes and get hex values ready to paste. Every palette is tested for perceptual uniformity and colorblind safety. - [Viz Palette](https://projects.susielu.com/viz-palette) — paste your candidate colors and preview them on actual chart types (bars, lines, scatter). It flags pairs that are too similar or hard to distinguish with color vision deficiencies. As a rule of thumb, limit your palette to 5-7 chart colors. More than that and the colors start blending together, especially in legends. If you have more categories than colors, consider grouping smaller categories into an "Other" bucket. 
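The "Other" bucket idea can be sketched in a few lines. This is a minimal illustration, assuming rows shaped as `{ category, value }`; the function name `groupIntoOther`, the field names, and the sample numbers are made up for the example, and in a real Dive the same grouping could just as well happen in the SQL query.

```javascript
// Collapse the long tail of categories so a chart never needs more series
// than the palette has colors. Assumes maxCategories >= 1.
function groupIntoOther(rows, maxCategories) {
  // Largest categories first, without mutating the caller's array.
  const sorted = [...rows].sort((a, b) => b.value - a.value);
  if (sorted.length <= maxCategories) return sorted;

  // Keep the top (maxCategories - 1) rows and sum the rest into one bucket.
  const kept = sorted.slice(0, maxCategories - 1);
  const otherTotal = sorted
    .slice(maxCategories - 1)
    .reduce((sum, row) => sum + row.value, 0);
  return [...kept, { category: 'Other', value: otherTotal }];
}

// Made-up sample data: five categories reduced to three chart series.
const sampleRows = [
  { category: 'Latte', value: 50 },
  { category: 'Chai', value: 6 },
  { category: 'Espresso', value: 30 },
  { category: 'Decaf', value: 4 },
  { category: 'Mocha', value: 10 },
];

console.log(groupIntoOther(sampleRows, 3));
```

With a limit of 3, the two largest categories keep their own colors and the remaining three are summed into a single "Other" entry.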
**Mention the "feel" in one sentence.** This guides the agent on ambiguous decisions like spacing, border radius, and animation. "Sugar rush — joyful and bold" produces different results than "Quiet authority — the data speaks for itself." ## Choosing chart types Different chart types serve different purposes. When building a Dive with multiple charts, pick a mix that covers different analytical angles of your data. ### Chart type reference | Chart type | Best for | Data shape | |------------|----------|------------| | Line chart | Trends over time | Time series | | Area chart | Volume over time, part-to-whole trends | Time series | | Bar chart | Comparing categories | Categorical | | Horizontal bar | Ranked lists, long category names | Categorical, sorted | | Stacked area | Composition over time | Multi-series time | | Composed chart (bar + line) | Dual metrics on shared timeline | Time series, two metrics | | Heatmap | Density across two dimensions | Matrix (for example, station x month) | | Pie / donut | Part-to-whole — ideally aim for 2 or 3 slices, max 5. A horizontal bar or donut is almost always easier to read. If you still want a pie chart, label slices directly. | Categorical, proportional | | Radar | Multi-dimensional profile comparison | Categorical, normalized | | Scatter | Correlation between two measures | Two continuous variables | | Table | Exact values, detailed comparison | Any structured data | ### Chart pairing recommendations A 6-chart grid works well with this pattern: 1. **Trend chart** (line, area, or stepped line) — shows how metrics move over time 2. **Comparison chart** (bar or horizontal bar) — ranks categories side by side 3. **Composition chart** (pie, donut, or stacked area) — shows part-to-whole relationships 4. **Detail view** (table or direct-labeled bars) — provides exact values 5. **Dual-axis chart** (composed bar + line) — overlays two related metrics 6. 
**Density chart** (heatmap or scatter) — reveals patterns across dimensions This mix gives viewers both the big picture and the ability to drill into specifics. ## Adding interactivity Interactive filters make a Dive more useful than a static dashboard. You can ask for several types of interactivity in your prompt. ### Time filters Time filters are the most common interactive control. Two patterns work well depending on your data: **Relative time windows** work best for operational data that updates continuously — think logs, events, or transactions. Users care about what happened in the last few hours or days, not a specific calendar year: ```text Add time filter pills: Last 24h | Last 7 days | Last 30 days | Last 90 days | All time. Filter all charts when a time range is selected. Default to Last 30 days. ``` **Year or period toggles** work better for data with natural calendar boundaries — annual reports, quarterly metrics, or fiscal comparisons: ```text Add year toggle pills: 2024 | 2025 | All. Filter all charts when a year is selected. ``` Pick whichever pattern matches how your users think about the data. If they ask "what happened this week?" go with relative windows. If they ask "how did Q4 compare to Q3?" go with period toggles. ### Metric toggles Let users switch which measure the charts display: ```text Add a metric toggle between Revenue and Cups Sold. The hero KPI and all chart Y-axes should update when toggled. ``` This changes the `dataKey` used by line, area, and bar charts, and swaps which metric appears as the primary KPI. ### Cross-filtering with click interactions Cross-filtering means clicking an element in one chart filters every other chart in the Dive. This is different from putting a filter dropdown on each individual chart — and the difference matters. 
**Why cross-filtering over individual filters?** When each chart has its own filter controls, users end up in a state where Chart A shows "US only," Chart B shows "all regions," and Chart C shows "Europe." The charts look coherent but they're answering different questions, and comparing them leads to wrong conclusions. Cross-filtering avoids this by keeping every chart in sync: click "US" on any chart and the entire Dive updates to show the US view. The user always sees one consistent story across all charts. **When individual filters make sense.** There are cases where a per-chart filter is the right choice — when a chart has a dimension that doesn't exist in the other charts. For example, a chart showing data broken down by warehouse location doesn't need to cross-filter a chart that doesn't have a warehouse column. In that case, a local filter on just that chart is appropriate. A good rule of thumb: use cross-filtering for shared dimensions (time, region, product category) and individual filters for dimensions unique to a single chart. 
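The rule of thumb above can be sketched as one shared filter object applied to every chart's rows. This is only an illustration of the idea, not how the agent necessarily implements it: the function `applyCrossFilters`, the row shapes, and the sample values are hypothetical, and a generated Dive may push the same filters into its SQL instead of filtering in the browser.

```javascript
// One shared filter object drives every chart.
// A value of null means "no filter on this dimension".
function applyCrossFilters(rows, activeFilters) {
  return rows.filter((row) =>
    Object.entries(activeFilters).every(([dimension, selected]) => {
      if (selected === null) return true; // filter not active
      if (!(dimension in row)) return true; // this dataset lacks the dimension
      return row[dimension] === selected;
    })
  );
}

// Made-up data for two charts: one shares the "region" dimension, one does not.
const revenueRows = [
  { region: 'US', station: 'Alpha', revenue: 120 },
  { region: 'Europe', station: 'Beta', revenue: 90 },
  { region: 'US', station: 'Gamma', revenue: 60 },
];
const warehouseRows = [
  { warehouse: 'W1', stock: 40 },
  { warehouse: 'W2', stock: 25 },
];

// Clicking "US" on any chart sets the shared state once.
const activeFilters = { region: 'US', station: null };

const filteredRevenue = applyCrossFilters(revenueRows, activeFilters);
const filteredWarehouses = applyCrossFilters(warehouseRows, activeFilters);
console.log(filteredRevenue.length, filteredWarehouses.length); // prints 2 2
```

Because every chart reads the same `activeFilters` object, charts that share a dimension stay in sync, while a chart without that dimension is left untouched and can carry its own local filter.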
Enable cross-filtering in your prompt: ```text Add click-to-filter cross-filtering: - Click a bar in the station chart to filter by that station - Click a pie slice to filter by that coffee type - Non-selected items render at 30% opacity - Show dismissible filter pills when filters are active ``` Cross-filtering works best when: - **Bar charts** filter on their categorical axis (for example, clicking a station bar filters by station) - **Pie and donut charts** filter on slice category (for example, clicking a product slice filters by product) - **Unselected items** dim to 30% opacity rather than disappearing, so users keep the full context while focusing on a subset - **Filter pills** appear below the controls showing active filters with a dismiss button ### Filter pills When cross-filters are active, visible pills show what's filtered and let users clear filters with one click: ```text Show active filters as colored pills with ✕ dismiss buttons. Only show the pills row when filters are active. ``` ### Tooltips and accordions Interactive Dives let you keep the visual layout clean while still providing rich context. Move descriptions, methodology notes, and supporting text into **tooltips** and **accordions** so they're available on demand without cluttering the charts: ```text Add an info tooltip on each chart title that explains the metric. Add an expandable accordion below the charts with methodology notes. ``` This works well for Dives shared with a broad audience — power users can expand the details, while casual viewers get an uncluttered experience. ## Laying out a multi-chart Dive For Dives with multiple charts, specify the grid layout in your prompt: ```text Use a 3×2 grid layout (3 columns, 2 rows) with 6 charts. Each chart card should have a title, subtle border, and 160px chart height. 
``` Common layouts: | Charts | Layout | Use case | |--------|--------|----------| | 2-4 | `repeat(2, 1fr)` | Focused analysis, fewer metrics | | 5-6 | `repeat(3, 1fr)` | Dashboard-style overview | | 8+ | `repeat(4, 1fr)` | Small multiples, sparkline grids | ## Example: full theme prompt Here's a complete prompt that produces a themed, interactive Dive: ```text Create a Dive showing sales data from my galactic_coffee database. Theme: Corporate Dashboard - Background: #f5f5f5. Text: #333. Muted: #777. - Chart colors: ["#2563eb", "#16a34a", "#dc2626", "#f59e0b", "#8b5cf6"]. - Font: system-ui, sans-serif. Titles: semibold, UPPERCASE. - Layout: 3×2 grid with card borders and 8px border radius. Charts: 1. Line chart — Revenue trend over time 2. Pie chart — Product mix breakdown 3. Table — Station performance details 4. Bar chart — Station comparison 5. Composed chart — Revenue bars + Cups sold line (dual Y-axis) 6. Heatmap — Station × Month revenue density Interactivity: - Year toggle: 2024 | 2025 | All - Metric toggle: Revenue | Cups - Click a bar to filter by station, click a pie slice to filter by product - Show filter pills with ✕ dismiss when filters are active KPIs: Show total revenue, total cups sold, and average rating above the charts. 
``` ## Related resources - [Dive theme gallery](/key-tasks/ai-and-motherduck/dives/dive-theme-gallery/) — Screenshots and ready-to-copy prompts for 15 themes - [Creating Visualizations with Dives](/key-tasks/ai-and-motherduck/dives/) — Get started with your first Dive - [Managing Dives as code](/key-tasks/ai-and-motherduck/dives/managing-dives-as-code/) — Version control and CI/CD for Dives - [Dives SQL functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/) — Manage Dives directly from SQL - [MCP Server tools](/sql-reference/mcp/) — Reference for all MCP tools including Dive operations --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-setup --- sidebar_position: 0 title: Connect to the MotherDuck MCP Server sidebar_label: Connect to MCP Server description: Set up the MotherDuck MCP Server with Claude, ChatGPT, Cursor, Claude Code, and other AI assistants --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import ClaudeIcon from '../../../static/img/icons/brands/claude-icon'; import ChatGPTIcon from '../../../static/img/icons/brands/chatgpt-icon'; import CursorIcon from '../../../static/img/icons/brands/cursor-icon'; import ExternalLinkIcon from '../../../static/img/icons/external-link-icon'; import VideoPlayer from '@site/src/components/VideoPlayer'; import useBaseUrl from '@docusaurus/useBaseUrl'; import DocImage from '@site/src/components/DocImage'; import SignUpLink from '@site/src/components/SignUpLink'; The MotherDuck MCP Server lets AI assistants query and explore your databases using the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). This guide walks you through connecting your preferred AI client to the **remote MCP server** (fully managed, zero setup). For local DuckDB files or self-hosted setups, see the [local MCP server](#remote-vs-local-mcp-server). :::info Connection URL The remote MCP server is hosted at `https://api.motherduck.com/mcp`. 
Most clients connect through OAuth automatically; clients that need a manual configuration use this URL with an HTTP transport. You can also authenticate with a [Bearer token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#creating-an-access-token) instead of OAuth. ::: ## Prerequisites - A MotherDuck account (sign up free) - An MCP-compatible AI client (Claude, ChatGPT, Cursor, Claude Code, Codex, or others) ## Set up the remote MCP server Select your MCP client and follow the instructions to connect. Add MotherDuck to Claude Or manually: 1. Go to **Settings** → **Connectors** 2. Click **Browse Connectors** to find the MotherDuck connector ![MotherDuck Connector in the Claude connector Directory](./img/claude-connectors-motherduck.png) A browser window should open for authentication. After authentication you can double check the connection by asking "List all my databases on MotherDuck." Add MotherDuck to ChatGPT 1. Open the ChatGPT desktop or web app 2. Go to **Settings** → **Apps** and click **Browse Apps** Browse Apps in ChatGPT settings 3. Search for **MotherDuck** and select it Searching for MotherDuck in the ChatGPT App Store 4. Click **Continue to MotherDuck** and authenticate with your MotherDuck account Connect MotherDuck dialog in ChatGPT After authentication, ChatGPT can access your MotherDuck data. Try asking "List all my databases on MotherDuck" to verify the connection. Add MotherDuck to Cursor 1. Open **Cursor Settings** (`Cmd/Ctrl + ,`) 2. Navigate to **Tools & MCP** 3. Click **+ New MCP Server** 4. Add the following to the configuration file: ```json { "MotherDuck": { "url": "https://api.motherduck.com/mcp", "type": "http" } } ``` 5. Save and click **Connect** to authenticate with your MotherDuck account > [Cursor MCP Documentation](https://docs.cursor.com/context/model-context-protocol) 1. 
Run the following command in your terminal: ```bash claude mcp add MotherDuck --transport http https://api.motherduck.com/mcp ``` :::tip By default, this command adds the MCP server to the current project. You can also pass the `--scope user` flag, and the MCP server will be available for all sessions from your current user ([`--scope` documentation](https://code.claude.com/docs/en/mcp#mcp-installation-scopes)). ::: 2. Run `claude` to start Claude Code 3. Type `/mcp`, select **MotherDuck** from the list, and press **Enter** 4. Select **Authenticate** and confirm the authorization dialog > [Claude Code MCP Documentation](https://code.claude.com/docs/en/mcp) Configure GitHub Copilot in VS Code to use the MotherDuck MCP server through a workspace config file: 1. Open the Command Palette (`Cmd/Ctrl + Shift + P`) and run **MCP: Add Server** to open `.vscode/mcp.json`. You can also create the file manually in your workspace. Add this configuration: ```json { "servers": { "motherduck": { "type": "http", "url": "https://api.motherduck.com/mcp" } } } ``` 2. Save the file and start the server from the **Start** code lens that appears above the `motherduck` entry in `mcp.json`. You can also start it through the Command Palette: `MCP: List Servers` → **motherduck** → **Start Server**. 3. VS Code opens a browser window so you can sign in to MotherDuck through OAuth, then stores the credentials for subsequent server starts. 4. Open the Copilot Chat view, switch to **Agent** mode, and confirm that the MotherDuck tools appear in the tool picker. Try asking "List all my databases on MotherDuck" to verify the connection. **Authenticate with an access token instead of OAuth** If you'd rather provide a [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#creating-an-access-token) explicitly, use a `promptString` input and a `Bearer` Authorization header. 
VS Code prompts for the token when the server starts and stores it in its secret store: ```json { "inputs": [ { "type": "promptString", "id": "motherduck-token", "description": "MotherDuck access token", "password": true } ], "servers": { "motherduck": { "type": "http", "url": "https://api.motherduck.com/mcp", "headers": { "Authorization": "Bearer ${input:motherduck-token}" } } } } ``` > [VS Code MCP Documentation](https://code.visualstudio.com/docs/copilot/chat/mcp-servers) [Microsoft Copilot Studio](https://learn.microsoft.com/en-us/microsoft-copilot-studio/) is a cloud-hosted platform for building agents that run inside Microsoft 365, Teams, and other Microsoft surfaces. Because the platform runs in Microsoft's cloud, it connects to the **remote** MotherDuck MCP server — either with OAuth (each user signs in with their own MotherDuck account) or with a shared API key backed by a service-account token. 1. In Copilot Studio, open your agent. Under **Tools**, click **Add a tool**. 2. In the **Add tool** dialog, under **Create new**, click **Model Context Protocol**. 3. Fill in the MCP server details and pick an authentication method: - **Server name**: `MotherDuck MCP` - **Server description**: `Connect to MotherDuck, query your data, create Dives and more!` - **Server URL**: `https://api.motherduck.com/mcp` - **Authentication**: either `OAuth 2.0` or `API key` (see below) **Option A — OAuth 2.0 (dynamic discovery).** Each end user signs in to MotherDuck with their own account when they first use the agent. Select **OAuth 2.0** and leave **Dynamic discovery** as the type, then click **Create**. **Option B — API key (shared service-account token).** All end users share a single MotherDuck token. Useful when you don't want every user to provision a MotherDuck account, for example a Teams bot exposed to a wide audience. Select **API key**, set **Type** to `Header`, enter `Authorization` as the **Header name**, and click **Create**. 
:::caution **Header name** must be `Authorization` — not `Bearer`. The `Bearer` prefix belongs in the *value* you enter in step 5. ::: 4. Back in the **Add tool** dialog for MotherDuck MCP, open the **Connection** dropdown and click **Create new connection**. The next step depends on the authentication method you picked in step 3: - **OAuth 2.0**: Copilot Studio opens a browser window that redirects to MotherDuck. The end user signs in to their MotherDuck account and approves the request. The connection is created once authentication completes — skip to step 6. - **API key**: Copilot Studio shows the token entry dialog described in step 5. 5. In the **Connect to MotherDuck MCP** dialog, enter your MotherDuck access token prefixed with `Bearer `: ```text Bearer ``` Replace `` with an actual token from [MotherDuck → Settings → Access Tokens](https://app.motherduck.com/settings/tokens), then click **Create**. :::tip If the agent is published and used by many end users, create a dedicated [service account](/key-tasks/service-accounts-guide/) and use a [read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) so the agent can't modify data. See [Restricting to read-only access](/key-tasks/ai-and-motherduck/securing-read-only-access/) for details. ::: 6. Once the connection shows a green check mark, click **Add and configure**. Copilot Studio confirms the tool was added successfully. 7. The MotherDuck MCP entry opens with the full tool list. Enable or disable tools based on what the agent should be allowed to do (for example, disable `query_rw` if the agent should stay read-only), then click **Save**. 8. Open the agent's connection manager and click **Connect** on the MotherDuck MCP entry, then submit. This reuses the connection you created in step 5. 9. Switch to the **Test** pane and ask a question that exercises the tools, for example *"What's the highest rated movie with over 10k votes in my IMDB database?"*. 
The agent calls the MotherDuck tools and responds with live data from your databases.

:::note
When you authenticate with an API key, all users of the Copilot Studio agent share the same MotherDuck token. Queries run by any end user are attributed to the service account that owns the token, not to the individual Microsoft 365 user. Use OAuth 2.0 if you need per-user attribution.
:::

> [Copilot Studio MCP documentation](https://learn.microsoft.com/en-us/microsoft-copilot-studio/mcp-add-existing-server-to-agent)
Alternative: Power Automate custom connector (OpenAPI)

If you'd rather wire the MotherDuck MCP server in as a [Power Automate custom connector](https://learn.microsoft.com/en-us/connectors/custom-connectors/) (for example, to share the connector across Copilot Studio and Power Automate flows in the same environment), you can import the following OpenAPI 2.0 spec. The `x-ms-agentic-protocol: mcp-streamable-1.0` extension tells Copilot Studio to treat the connector as a streamable MCP server.

```yaml
swagger: '2.0'
info:
  title: MotherDuck Remote MCP
  description: The remote MCP to connect to MotherDuck tools, docs and more
  version: 1.0.0
host: api.motherduck.com
basePath: /
schemes:
  - https
paths:
  /mcp:
    post:
      summary: MotherDuck Remote MCP
      description: The remote MCP to connect to MotherDuck tools, docs and more
      operationId: InvokeServer
      x-ms-agentic-protocol: mcp-streamable-1.0
      responses:
        '200':
          description: Immediate Response
securityDefinitions:
  api_key:
    type: apiKey
    in: header
    name: Authorization
security:
  - api_key: []
```

In Power Automate, go to **Custom connectors → New custom connector → Import an OpenAPI file**, paste the spec above, and save. When you create a connection, enter `Bearer ` as the API key value, the same format as the native MCP flow described above.
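The header rule that keeps coming up in these flows (the header name is `Authorization`, and the `Bearer` prefix belongs in the value) can be captured in a small helper. This is a sketch under assumptions: `buildAuthHeader` is a hypothetical name, and `abc123` is a placeholder rather than a real token format.

```javascript
// Hypothetical helper: builds the header object for token-based authentication.
function buildAuthHeader(token) {
  // Strip an existing "Bearer " prefix so it is never doubled.
  const raw = String(token).trim().replace(/^Bearer\s+/i, '');
  if (raw.length === 0) throw new Error('token is empty');
  // Header NAME is "Authorization"; the "Bearer " prefix goes in the VALUE.
  return { Authorization: `Bearer ${raw}` };
}

// "abc123" is a placeholder, not a real MotherDuck token.
console.log(buildAuthHeader('abc123')); // prints { Authorization: 'Bearer abc123' }
```

Normalizing the value this way means it does not matter whether the token was stored with or without the prefix.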
If you're using **Windsurf**, **Zed**, or another MCP-compatible client, use the following JSON configuration:

```json
{
  "mcpServers": {
    "MotherDuck": {
      "url": "https://api.motherduck.com/mcp",
      "type": "http"
    }
  }
}
```
:::tip Authentication The remote MCP server uses OAuth, so you'll authenticate with your MotherDuck account during setup. Some clients also support [token-based authentication](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#creating-an-access-token) through a Bearer header. ::: ## Configuring tool permissions Most MCP clients let you control how the AI uses each tool. The exact UI varies by client, but the general permission levels are: | Permission | Behavior | |------------|----------| | **Always allow** | The AI uses the tool automatically without asking. Faster iteration when errors occur, but no human confirmation before each action. | | **Needs approval** | The AI asks for your confirmation before each tool use. Gives you visibility into every action. | | **Blocked** | The AI cannot use this tool. | :::tip The MCP Server provides both read-only (`query`) and read-write (`query_rw`) tools. For exploratory analysis, setting read-only tools to "Always allow" enables faster back-and-forth when the AI needs to retry or refine queries. You can keep `query_rw` on "Needs approval" or block it if you only need read access. See [Restricting to read-only access](/key-tasks/ai-and-motherduck/securing-read-only-access/) for more options. ::: ## Remote vs local MCP server MotherDuck offers two MCP server options: | Server | Best for | Setup | Access | |--------|----------|--------|--------| | **Remote** (hosted by MotherDuck) | Most users who query and modify data on MotherDuck cloud | Zero setup; connect through URL and OAuth | Read-write | | **Local** ([mcp-server-motherduck](https://github.com/motherduckdb/mcp-server-motherduck)) | Self-hosted use; local DuckDB files; or when you need full customization | Install and run the server yourself | Fully customizable | The **remote server** is recommended for most use cases. 
Use the **local server** when you need to work with local DuckDB files, want custom tool configurations, or require full control over the server environment. **Local MCP Server GitHub Repository** – Self-host the open-source MCP server for DuckDB and MotherDuck ## Where to go from here - **[AI Data Analysis Getting Started](/getting-started/mcp-getting-started/)**: 5-minute walkthrough of querying data and creating Dives - **[MCP Workflows Guide](/key-tasks/ai-and-motherduck/mcp-workflows/)**: Best practices for getting accurate results from AI-powered analysis - **[MCP Server Reference](/sql-reference/mcp/)**: Server capabilities, available tools, and regional availability - **[Restricting to Read-Only Access](/key-tasks/ai-and-motherduck/securing-read-only-access/)**: Restrict your AI assistant to read-only queries --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-workflows --- sidebar_position: 1 title: Using the MotherDuck MCP Server description: Effective workflows and best practices for getting the most out of the MotherDuck MCP Server with AI assistants --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; The MotherDuck **remote** MCP Server, available at `https://api.motherduck.com/mcp`, connects AI assistants like Claude, ChatGPT, and Cursor to your data. This guide covers workflows for getting accurate, useful analysis results. If you haven't already, [set up your remote MCP connection](/key-tasks/ai-and-motherduck/mcp-setup/). :::info Remote vs local MCP This guide is written for the **remote MCP** (fully managed by MotherDuck). Most of the tips apply to the **local MCP** (fully customizable, self-hosted) as well. For local MCP setup and details, see the [MCP reference](/sql-reference/mcp/#local-mcp-server). 
::: ## Prerequisites To use the MotherDuck remote MCP server, you will need: - A MotherDuck account with at least one database - An AI client like Claude, Cursor, or ChatGPT already connected to the remote MCP server ([setup instructions](/key-tasks/ai-and-motherduck/mcp-setup/)) :::note Read vs write tools The remote MCP server exposes two query tools: `query` for read-only SQL and `query_rw` for SQL that can change data or schema. See the [query](/sql-reference/mcp/query/) and [query_rw](/sql-reference/mcp/query-rw/) references for details. To enforce read-only access, see [Restricting to read-only access](/key-tasks/ai-and-motherduck/securing-read-only-access/). ::: ## How it works When you ask an AI assistant a question about your data, here's what happens behind the scenes: 1. **Schema exploration**: The AI examines your database structure to understand available tables and columns 2. **Query generation**: Based on your question, the AI writes DuckDB SQL 3. **Query execution**: The remote MCP Server runs the query on MotherDuck 4. **Results interpretation**: The AI explains the results in natural language You can inspect which SQL query the MCP executed by expanding the tool call in the conversation: ![Inspecting the query executed by MCP](./img/mcp_inspect_query.png) When you create a Dive: 1. **Data analysis**: The AI agent queries your database to understand the data relevant to your request 2. **Visualization generation**: The agent generates an interactive React component with the SQL queries and chart configuration 3. **Inline preview**: The Dive renders in the conversation so you can iterate before saving. In clients that support the Dive Viewer MCP App (Claude web and desktop at launch), the preview runs against live data with the same components used in the MotherDuck UI. In other clients, you see a static preview with sample data, and the Dive queries live data once you open it in MotherDuck. 4. 
**Save to MotherDuck**: Each save is stored in your workspace and always queries live data, so there are no stale snapshots. You can find the Dive in the [MotherDuck UI](/key-tasks/ai-and-motherduck/dives/#finding-your-dives) under the Object Explorer or **Settings** → **Dives**. With the Dive Viewer, every edit creates a separate version automatically. 5. **Share with your team**: The agent can [share the underlying data](/sql-reference/mcp/share-dive-data) with your organization so others can view and interact with the Dive ## Start with schema exploration Before diving into analysis, help the AI understand your data. This is a form of **context engineering**: by exploring your schema upfront, you hydrate the conversation with knowledge about your tables, columns, and relationships. This context carries forward, helping the AI write more accurate queries throughout your session. Start conversations by asking about your database structure: **Good first prompts:** - *"What databases and tables do I have access to?"* - *"Describe the schema of my `analytics` database"* - *"What columns are in the `orders` table and what do they contain?"* The remote MCP server provides tools for schema exploration that surface table relationships, data types, and any documentation you've added to your schema. :::tip If you have well-documented tables with [`COMMENT ON`](https://duckdb.org/docs/stable/sql/statements/comment_on.html) descriptions, the AI can use these to better understand your data's business meaning. ::: ## Frame questions with context The more context you provide, the better the results. Include relevant details like: - **Time ranges**: *"Show me orders from the last 30 days"* vs *"Show me orders"* - **Filters**: *"Analyze customers in the US with more than 5 purchases"* - **Metrics**: *"Calculate revenue as `quantity * unit_price`"* - **Output format**: *"Return results as a summary table with percentages"* **Example - Vague vs. 
Specific:** | ❌ Vague | ✅ Specific | |----------|-------------| | "Show me sales data" | "Show me total sales by product category for Q4 2024, sorted by revenue descending" | | "Find top customers" | "Find the top 10 customers by total order value in the last 12 months" | | "Analyze trends" | "Compare monthly active users month-over-month for 2024, showing growth rate" | ## Iterate Complex analysis works best as a conversation. Start simple, validate the results, then build up. Each exchange adds shared context, helping the AI write better queries as you go. While there is a temptation to get the perfect query in one shot, often insight comes as part of the process of data exploration. When iterating, it can be helpful to have source data nearby to help verify outputs. Our users have noted that using their existing BI dashboard to quickly validate that metrics are correct helps to develop intuition about the information provided by the AI assistants. ## Common workflow patterns ### Data profiling Quickly understand a new dataset: ```text "Profile the `transactions` table - show me: - Row count and date range - Distribution of key categorical columns - Summary statistics for numeric columns - Any null values or data quality issues" ``` :::tip DuckDB functions for EDA DuckDB has a few SQL functions that are great for hydrating context: - `DESCRIBE` which retrieves the metadata for a specific table - `SUMMARIZE` which gets summary stats for a table (can be large) - The `USING SAMPLE 10` clause (at the end of the query) which samples the data (can be large) - using it with a where clause to narrow down is very helpful for performance ::: ### Generating charts Some AI clients can generate visualizations directly from your query results. ChatGPT on the web and Claude Desktop both support creating charts as "artifacts" alongside your conversation. 
Visualizations help you spot trends and outliers faster than scanning tables, validate that query results make sense at a glance, and share insights with stakeholders who prefer visual formats.

**Example prompts:**

- *"Chart monthly revenue for 2024 as a line graph"*
- *"Create a bar chart showing the top 10 customers by order count"*
- *"Visualize the distribution of order values as a histogram"*
- *"Show me a time series of daily active users with a 7-day moving average"*

Once you have a chart, you can iterate on it just like query results: *"Add a trend line"*, *"Change to a stacked bar chart"*, or *"Break this down by region"*.

:::note
When using the MCP with more IDE-like interfaces, the MCP plays very nicely with libraries like `matplotlib` for building more traditional charts.
:::

### Querying private S3 buckets

You can use the MCP to analyze files in private S3 buckets (Parquet, CSV, JSON) by storing your AWS credentials as a [secret in MotherDuck](/sql-reference/motherduck-sql-reference/create-secret/).

You can create secrets directly in the [MotherDuck UI](https://app.motherduck.com) under **Settings → Secrets**.

![The MotherDuck secrets UI](./img/md_create_secret_ui.png)

This is recommended for desktop AI clients.

If you use AWS SSO, you can refresh your credentials and store them in MotherDuck:

1. Create an AWS credential profile

   ```bash
   aws configure sso
   ```

2. Authenticate with AWS SSO:

   ```bash
   aws sso login --profile 
   ```

3. Open a DuckDB client (for example, the CLI) and create a secret using the credential chain:

   ```sql
   ATTACH 'md:';

   CREATE OR REPLACE SECRET IN MOTHERDUCK (
       TYPE s3,
       PROVIDER credential_chain,
       CHAIN 'sso',
       PROFILE ''
   );
   ```

This stores your AWS credentials in MotherDuck, making them available to the remote MCP server.

:::note
Run `aws sso login --profile ` before creating the secret to refresh your SSO token. Starting with DuckDB v1.4.0, credentials are validated at creation time.
If your local credentials are not resolvable, the command will fail: use the correct `CHAIN` and `PROFILE` for your credential type, or add `VALIDATION 'none'` as a last resort to skip local validation. ::: :::note Credential expiration If you use temporary credentials (SSO, IAM roles), you'll need to refresh the secret when they expire by running the `CREATE OR REPLACE SECRET` command again. ::: Once your credentials are set up, you can ask your AI assistant to query any S3 bucket you have access to: ```text "Give me some analytics about s3://my-bucket/sales-data.parquet" ``` ![Exploring S3 data with MCP](./img/mcp_explore_s3.png) ### Ad-hoc investigation The MCP is especially useful for exploratory debugging when you're not sure what you're looking for. Rather than writing queries upfront, you can describe the problem and let the AI help you dig in. ```text "I noticed a spike in errors on Dec 10th. Help me investigate: - What types of errors increased? - Were specific users or endpoints affected? - What changed compared to the previous week?" ``` One pattern we use at MotherDuck is loading logs or event data into a database and using the MCP to interrogate it conversationally. Instead of manually crafting regex patterns or grep commands, you can ask questions like *"What are the most common error messages in the last hour?"* or *"Show me all requests from user X that resulted in a 500 error"*. This turns log analysis from a tedious grep session into an interactive investigation where each answer informs the next question. ## Working with query results ### Refining results Results rarely come out perfect on the first try. The conversational nature of MCP means you can refine incrementally rather than rewriting queries from scratch. If you're seeing test data mixed in, just say *"Add a filter to exclude test accounts"*. If the granularity is wrong, ask to *"Change the grouping from daily to weekly"*. 
Small adjustments like changing sort order or adding a column are easy follow-ups.

### Understanding queries

When the AI generates complex SQL, don't hesitate to ask for an explanation. This is useful both for validating the approach and for learning. Ask *"Explain what this query is doing step by step"* to understand the logic, or *"Are there any edge cases this query might miss?"* to sanity-check the results before relying on them.

### Exporting for further use

Once you have the results you need, ask for output in the format that fits your workflow. You can request a markdown table for documentation, CSV-friendly output for spreadsheets, or a written summary to share with your team. The AI can also help you format results for specific tools or audiences. The results can also be a good jumping-off point for further analysis with an expert, so asking for the final query to hand off is a useful last step.

## Tips for better results

### Be explicit about assumptions

Your data likely has business rules that aren't obvious from the schema alone. If a "completed" order means status is either 'shipped' or 'delivered', say so. If revenue calculations should exclude refunds, mention it upfront. The AI can't infer these domain-specific rules, so stating them early prevents incorrect results and saves iteration time.

### Reference specific tables and columns

When you already know your schema, being specific helps the AI get it right the first time. Instead of asking about "the timestamp", say *"Use the `user_events.event_timestamp` column"*. If you know how tables relate, specify the join: *"Join `orders` to `customers` on `customer_id`"*. This is especially helpful in larger schemas where column names might be ambiguous.

### Ask for validation

When accuracy matters, ask the AI to sanity-check its own work.
Questions like *"Does this total match what you'd expect based on the row counts?"* or *"Can you verify this join doesn't create duplicates?"* can catch subtle bugs before you rely on the results. The AI can run quick validation queries to confirm the logic is sound. ## Troubleshooting :::tip Beyond querying The remote MCP server includes tools beyond just running queries. Most are metadata lookups or search functions for finding tables and columns, but the [ask docs question](/sql-reference/mcp/ask-docs-question) tool is particularly useful when you're stuck on tricky syntax or DuckDB-specific features. If the AI is struggling with a query pattern, try asking it to look up the relevant documentation first. ::: | Issue | Solution | |-------|----------| | AI queries wrong table | Ask: *"What tables are available?"* then specify the correct one | | Results don't look right | Ask: *"Show me sample data from the source table"* to verify the data | | Query is slow | Ask: *"Can you optimize this query?"*, add filters to reduce data scanned, or [increase your Duckling size](/about-motherduck/billing/duckling-sizes/) | | AI doesn't understand the question | Rephrase with more specific column names and business context | | Can't type fast enough | Use voice-to-text to interact with your AI assistant | ## Related resources - [Connect to MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) - Setup instructions for all supported AI clients - [AI Features in the UI](/key-tasks/ai-and-motherduck/ai-features-in-ui/) - Built-in AI features for the MotherDuck interface - [Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/) - Build custom AI agents with MotherDuck --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/securing-read-only-access --- sidebar_position: 2 title: Restricting to read-only access description: Restrict the remote MCP server to read-only queries using client-side blocking, read scaling tokens, or proxy filtering 
--- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import DocImage from '@site/src/components/DocImage'; # Restricting to read-only access The remote MCP server exposes both the read-only `query` tool and the read-write `query_rw` tool. If you want to ensure your AI assistant can only read data, there are three approaches depending on your setup. | Approach | Enforcement | Setup | Works with OAuth connectors | |----------|------------|-------|-----------------------------| | [Block the tool at the client](#block-the-query_rw-tool-at-the-client) | Client-side | Low (UI toggle) | Yes | | [Use a read scaling token](#use-a-read-scaling-token) | Server-side | Medium (manual config) | No (replaces OAuth) | | [Proxy filtering](#proxy-filtering) | Application-side | Varies | N/A (custom backend) | ## Block the `query_rw` tool at the client The simplest approach: keep using the OAuth connector, but configure your MCP client to never call the `query_rw` tool. The server still exposes the tool, but the client will never invoke it. Most clients support this at the **individual user** level. ChatGPT also lets **organization admins** enforce tool restrictions across all workspace members. Each user can block tools individually. Go to **Settings → Connectors → MotherDuck**, expand **Write/delete tools**, and select the blocked icon next to `query_rw`: ![Blocking the query_rw tool in Claude's connector settings](./img/query-rw-blocked.png) :::note Claude does not support org-level per-tool blocking. Team/Enterprise admins can remove a connector entirely from **Organization settings → Connectors**, but cannot selectively disable individual tools like `query_rw` for all members. 
::: > [Claude connector permissions documentation](https://support.claude.com/en/articles/11175166-get-started-with-custom-connectors-using-remote-mcp) **Enterprise/Edu admins:** Admins can [enable or disable specific app actions after publishing](https://help.openai.com/en/articles/12584461-developer-mode-and-full-mcp-connectors-in-chatgpt-beta). Go to **Workspace Settings → Apps**, click the `...` menu next to MotherDuck, select **Action control**, and deselect `query_rw`. New tools added by the MCP server are disabled by default — admins must explicitly enable them. **Business plans:** Per-tool Action control is not available for custom MCP apps after publishing. To change which tools are exposed, remove and recreate the app ([developer mode documentation](https://help.openai.com/en/articles/12584461-developer-mode-and-full-mcp-connectors-in-chatgpt-beta)). Open **Cursor Settings** → **Tools & MCP**, expand the MotherDuck server entry, and toggle off `query_rw`. :::note Tool toggles are stored locally in Cursor's database, not in the `mcp.json` config file. They cannot be shared across a team through config files. ::: Add a deny rule to your `.claude/settings.json` (project-level) or `~/.claude/settings.json` (user-level): ```json { "permissions": { "deny": ["mcp__MotherDuck__query_rw"] } } ``` > [Claude Code permissions documentation](https://code.claude.com/docs/en/permissions) Open your agent in Copilot Studio, go to **Tools**, and open the MotherDuck MCP entry. Toggle `query_rw` off in the tool list and click **Save**. The agent only sees `query` and the schema exploration tools. ## Use a read scaling token For server-side enforcement, authenticate with a [read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) instead of a regular access token. Read scaling tokens connect to dedicated read replicas that reject all write operations — even if the client calls `query_rw`, writes will fail. 
This requires manual configuration instead of the one-click OAuth connectors. :::note Read scaling connections are [eventually consistent](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#ensuring-data-freshness). Results may lag a few minutes behind the latest database state. ::: You can create a read scaling token from the [MotherDuck UI](https://app.motherduck.com) under **Settings → Access Tokens** or through the [REST API](/sql-reference/rest-api/users-create-token/). Read scaling tokens also unlock concurrent MCP sessions: each MCP instance that connects with a read scaling token is assigned to a read replica (duckling) from a pool. Up to the pool size (default 4, max 16), each connection gets its own duckling; once the pool is full, new connections are assigned to existing ducklings in round-robin. This means you can run many MCP sessions in parallel from the same account—for example, multiple AI agents or team members querying simultaneously. See [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for details on pool sizing and how replicas are assigned. Claude's web connector only supports OAuth, so you need to use the desktop config instead. Open **Settings → Developer → Edit Config** and add: ```json { "mcpServers": { "MotherDuck": { "command": "npx", "args": [ "mcp-remote", "https://api.motherduck.com/mcp", "--header", "Authorization: Bearer ${MOTHERDUCK_TOKEN}" ], "env": { "MOTHERDUCK_TOKEN": "" } } } } ``` This uses [`mcp-remote`](https://www.npmjs.com/package/mcp-remote) to bridge the remote MCP server into Claude Desktop's local stdio transport. ChatGPT connectors can't set static headers. To use a read scaling token, run a proxy that injects the `Authorization` header and connect ChatGPT to that proxy. 
Example proxy (Cloudflare Worker):

```js
export default {
  async fetch(request, env) {
    const upstreamUrl = new URL(request.url);
    upstreamUrl.protocol = "https:";
    upstreamUrl.hostname = "api.motherduck.com";
    upstreamUrl.pathname = "/mcp";

    const upstreamRequest = new Request(upstreamUrl, request);
    upstreamRequest.headers.set(
      "Authorization",
      `Bearer ${env.MOTHERDUCK_READ_SCALING_TOKEN}`
    );
    upstreamRequest.headers.delete("cookie");

    return fetch(upstreamRequest);
  },
};
```

1. Deploy the proxy and store the read scaling token as a secret (for example, `MOTHERDUCK_READ_SCALING_TOKEN`).
2. In [ChatGPT Settings → Connectors](https://chatgpt.com/#settings/Connectors), click **Create App**.
3. Enter:
   - **Name:** `MotherDuck (Read Only)`
   - **MCP Server URL:** ``
   - **Authentication:** `No authentication`
4. Open a chat, select the connector, and run a query (for example: `SELECT * FROM information_schema.tables LIMIT 5`).

`query_rw` may still appear, but writes fail because read scaling tokens are read-only.

Open **Cursor Settings** → **Tools & MCP** → **+ New MCP Server** and add the following configuration:

```json
{
  "MotherDuck": {
    "url": "https://api.motherduck.com/mcp",
    "type": "http",
    "headers": {
      "Authorization": "Bearer "
    }
  }
}
```

```bash
claude mcp add --transport http \
  --header "Authorization: Bearer " \
  MotherDuck https://api.motherduck.com/mcp
```

Follow the [Copilot Studio MCP setup](/key-tasks/ai-and-motherduck/mcp-setup/?mcp-client=copilot-studio) with **API key** authentication, and when prompted for the connection value, enter your read scaling token:

```text
Bearer 
```

The `query_rw` tool may still appear in the agent's tool list, but writes fail at the server because read scaling replicas reject write operations. For belt-and-braces, also toggle `query_rw` off in the tool list so the model never sees it as an option.
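
If a proxy such as the Worker above already sits between the client and `https://api.motherduck.com/mcp`, the same belt-and-braces idea can be applied there. The following is a minimal sketch, not MotherDuck's implementation. It assumes the proxy can parse each MCP message as a single JSON-RPC object (no batching, no streamed responses), and the helper names `checkClientMessage` and `filterToolsListResult` are hypothetical. The `tools/call` and `tools/list` method names are the ones defined by the MCP specification; the error code is an illustrative choice.

```javascript
// Hypothetical helpers for illustration; not part of any MotherDuck or MCP SDK.
const BLOCKED_TOOLS = new Set(["query_rw"]);

/**
 * Inspect one JSON-RPC message sent by an MCP client.
 * Returns null when the message may be forwarded upstream,
 * or a JSON-RPC error response when the call must be rejected.
 */
function checkClientMessage(message) {
  // Only tool invocations are filtered; everything else passes through.
  if (!message || message.method !== "tools/call") {
    return null;
  }
  const toolName = message.params?.name;
  if (!BLOCKED_TOOLS.has(toolName)) {
    return null;
  }
  return {
    jsonrpc: "2.0",
    id: message.id ?? null,
    // -32602 is the JSON-RPC "Invalid params" code (an assumption for this sketch).
    error: { code: -32602, message: `Tool "${toolName}" is blocked by this proxy` },
  };
}

/**
 * Remove blocked tools from the `result` of a tools/list response,
 * so the model never sees them as an option.
 */
function filterToolsListResult(result) {
  const tools = Array.isArray(result?.tools) ? result.tools : [];
  return { ...result, tools: tools.filter((tool) => !BLOCKED_TOOLS.has(tool.name)) };
}
```

In a proxy, `checkClientMessage` would run on the parsed request body before forwarding, and `filterToolsListResult` on the upstream response to a `tools/list` request. Because this filtering happens in your own code, it complements the read scaling token rather than replacing it: the token remains the server-side guarantee.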
For MCP-compatible clients that support simple authentication, use the following JSON configuration with a read scaling token as the Bearer value: ```json { "mcpServers": { "MotherDuck": { "url": "https://api.motherduck.com/mcp", "type": "http", "headers": { "Authorization": "Bearer " } } } } ``` For clients that only support local (stdio) servers, use `mcp-remote` to bridge the connection: ```json { "mcpServers": { "MotherDuck": { "command": "npx", "args": [ "mcp-remote", "https://api.motherduck.com/mcp", "--header", "Authorization: Bearer ${MOTHERDUCK_TOKEN}" ], "env": { "MOTHERDUCK_TOKEN": "" } } } } ``` ## Proxy filtering If you're integrating the remote MCP server into a backend service or custom agent framework, you can restrict access at the application layer. When proxying MCP tool calls, omit or reject calls to the `query_rw` tool and only forward calls to the read-only `query` tool and schema exploration tools. See [Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents) for patterns on building custom agent integrations with read-only access controls. --- Source: https://motherduck.com/docs/key-tasks/ai-and-motherduck/text-search-in-motherduck --- title: Text Search in MotherDuck description: Text search strategies from pattern matching to semantic search with embeddings in MotherDuck. --- # Text Search in MotherDuck Text search is a fundamental operation in data analytics - whether you're finding records by name, searching documents for relevant content, or building question-answering systems. This guide covers search strategies available in MotherDuck, from simple pattern matching to advanced semantic search, and how to combine them for optimal results. ## Quick Start: Common Search Patterns Start here to identify the best search method for your use case. The right search approach depends on what you're searching, how you expect to use search, and what results you need. 
Most use cases fall into one of three patterns, each linking to detailed implementation guidance below: **Keyword Search Over Identifiers**: When searching for specific items like company names, product codes, or customer names, use [Exact Match](#exact-match) for precise and low-latency lookups. If you need typo tolerance (e.g., "MotheDuck" → "MotherDuck"), use [Fuzzy Search](#fuzzy-search-text-similarity). **Keyword Search Over Documents**: When searching longer text like articles, product descriptions, or documentation, use [Full-Text Search](#full-text-search-fts). This ranks documents by keyword relevance, and handles cases where users provide a few keywords that should appear in the content. **Semantic Search**: When searching by meaning and similarity rather than exact keywords, use [Embedding-based Search](#embedding-based-search). This covers: - Understanding synonyms (e.g., matching "data warehouse" with "analytics platform") - Understanding natural language queries (e.g., "wireless headphones with good battery life") - Finding similar content (e.g., support tickets describing similar customer issues) --- For answering natural language questions about *structured* data (e.g., "How many customers do we have in California?"), see [Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/). ## Refining Your Search Strategy If the patterns above don't fully match your use case, use these four questions to navigate to the right method. Each question links to specific sections with implementation details: 1. 
**What is the search corpus?** Consider what you're searching through: - **Identifiers** like company names, product IDs, or person names → [Exact Match](#exact-match) or [Fuzzy Search](#fuzzy-search-text-similarity) - **Documents** like articles, descriptions, or reports → [Keyword search (regex)](#exact-match) or [Full-Text Search](#full-text-search-fts) (FTS) or [Embedding-Based Search](#embedding-based-search) or [Hybrid](#fts-pre-filtering-hybrid-search) (combining FTS + embeddings) - **Structured (numerical) data** → [Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents/) that convert natural language questions to SQL 2. **What is the user input?** Think about how users express their search: - **Single terms** like "MotherDuck" → [Exact Match](#exact-match) or [Fuzzy Search](#fuzzy-search-text-similarity) - **Keyword phrases** like "data warehouse analytics" → [Keyword search (regex)](#exact-match) or [Full-Text Search](#full-text-search-fts) or [Embedding-based search](#embedding-based-search) - **Questions** like "What companies offer cloud analytics?" → [Embedding-based search](#embedding-based-search) with [HyDE](#hypothetical-document-embeddings-hyde) - **Example documents** (finding similar content) → [Embedding-based search](#embedding-based-search) 3. **What is the desired output?** Clarify what you're returning: - **Ranked list** (retrieval of documents/records) → Covered by this guide - **Generated text answers** (RAG-style Q&A, chatbots, summarization) → Use retrieval methods from this guide in combination with the [`prompt()`](/sql-reference/motherduck-sql-reference/ai-functions/prompt/#retrieval-augmented-generation-rag) function. 4. 
**What is the desired search behavior?** Think about what search qualities matter:
   - **Exact match** for specific words (IDs and codes) → [Exact Match](#exact-match) or [Keyword search (regex)](#using-regular-expressions)
   - **Typo resilience** to handle misspellings like "MotheDuck" → "MotherDuck" → [Fuzzy search](#fuzzy-search-text-similarity)
   - **Synonym resilience** to match "data warehouse" with "analytics platform" → [Embedding-based search](#embedding-based-search)
   - **Customizable ranking** → See [Reranking](#reranking) in the [Advanced Methods](#advanced-methods) section
   - **Latency and concurrency** → See [Performance Guide](#performance-guide)

## Search Methods

### Exact Match

Use exact match search for specific identifiers, codes, or when you need guaranteed matches. This is the fastest search method.

#### Using LIKE

For substring matching, use `LIKE` (or `ILIKE` for case-insensitive). In patterns, `%` matches any sequence of characters and `_` matches exactly one character.

```sql
-- Find places with 'Starbucks' in their name
SELECT name, locality, region
FROM foursquare.main.fsq_os_places
WHERE name LIKE '%Starbucks%'
LIMIT 10;
```

See also: [Pattern Matching](https://duckdb.org/docs/stable/sql/functions/pattern_matching.html) in DuckDB documentation

#### Using Regular Expressions

For more complex pattern matching or matching multiple keywords, use `regexp_matches()` with `(?i)` for case-insensitive searches:

```sql
-- Find Hacker News posts with 'python', 'javascript', or 'rust' in text
SELECT title, "by", score
FROM sample_data.hn.hacker_news
WHERE regexp_matches(text, '(?i)(python|javascript|rust)')
LIMIT 10;
```

See also: [Regular Expressions](https://duckdb.org/docs/stable/sql/functions/regular_expressions) in DuckDB documentation

### Fuzzy Search (Text Similarity)

Fuzzy search handles typos and spelling variations in entity names like companies, people, or products.
Use `jaro_winkler_similarity()` for most fuzzy matching scenarios - it offers the best balance of accuracy and performance compared to `damerau_levenshtein()` or `levenshtein()`.

```sql
-- Find places similar to 'McDonalds' (handles typo 'McDonalsd')
SELECT
    name,
    locality,
    region,
    jaro_winkler_similarity('McDonalsd', name) AS similarity
FROM foursquare.main.fsq_os_places
ORDER BY similarity DESC
LIMIT 10;
```

See also: [Text Similarity Functions](https://duckdb.org/docs/stable/sql/functions/text#text-similarity-functions) in DuckDB documentation

### Full-Text Search (FTS)

Full-Text Search ranks documents by keyword relevance using BM25 scoring, which considers both how often terms appear in a document and how rare they are across all documents. Use this for articles, descriptions, or longer text where you need relevance ranking. FTS automatically handles word stemming (e.g., "running" matches "run") and removes common stopwords (like "the", "and", "or"), but requires exact word matches - it won't handle typos in search queries.

#### Basic FTS Setup

FTS requires write access to the table. Since we're using a read-only example database, we first create a copy of the table in a read-write database we own:

```sql
CREATE TABLE hn_stories AS
SELECT id, title, text, "by", score, type
FROM sample_data.hn.hacker_news
WHERE type = 'story'
  AND LENGTH(text) > 100
LIMIT 10000;
```

Build the FTS index on the text column. This creates a new schema called `fts_{schema}_{table_name}` (in this case `fts_main_hn_stories`):

```sql
PRAGMA create_fts_index(
    'hn_stories',  -- table name
    'id',          -- document ID column
    'text'         -- text column to index
);
```

Search the index using the `match_bm25` function from the newly created schema:

```sql
SELECT
    id,
    title,
    text,
    fts_main_hn_stories.match_bm25(id, 'database analytics') AS score
FROM hn_stories
ORDER BY score DESC
LIMIT 10;
```

#### Index Maintenance

FTS indexes need to be updated when the underlying data changes.
Rebuild the index using the `overwrite` parameter:

```sql
PRAGMA create_fts_index('hn_stories', 'id', 'text', overwrite := 1);
```

See also: [Full-Text Search Guide](https://duckdb.org/docs/stable/guides/sql_features/full_text_search.html) and [Full-Text Search Extension](https://duckdb.org/docs/stable/core_extensions/full_text_search) in DuckDB documentation

### Embedding-Based Search

Embedding-based search finds conceptually similar text by meaning, not keywords. Use this for natural language queries, handling synonyms, or when users search with questions. Embeddings handle synonyms and typos naturally without manual configuration.

:::note
Embedding generation and lookups are priced in [AI Units](/about-motherduck/billing/pricing#advanced-ai-functions). For paid organizations, Business and Lite plans have a default soft limit of 10 AI Units per user/day (sufficient to embed around 600,000 rows) to help prevent unexpected costs. If you'd like to adjust these limits, [just ask!](/troubleshooting/support)
:::

:::info
The DuckDB [VSS extension](https://duckdb.org/docs/stable/core_extensions/vss) for approximate vector search (HNSW) is currently experimental, and not supported in MotherDuck's cloud service (Server-Side). [Learn more](/concepts/duckdb-extensions/) about MotherDuck's support for DuckDB extensions.
:::

#### Basic Embedding-Based Search Setup

Generate embeddings for your text data, then search using exact vector similarity. For search queries phrased as questions (like "What are the best practices for...?"), see [Hypothetical Document Embeddings](#hypothetical-document-embeddings-hyde).

```sql
-- Reusing the hn_stories table from the FTS section, add embeddings
ALTER TABLE hn_stories ADD COLUMN text_embedding FLOAT[512];
UPDATE hn_stories SET text_embedding = embedding(text);

-- Semantic search - this will also match texts with related concepts like 'neural networks', 'deep learning', etc.
SELECT
    title,
    text,
    array_cosine_similarity(
        embedding('machine learning and artificial intelligence'),
        text_embedding
    ) AS similarity
FROM hn_stories
ORDER BY similarity DESC
LIMIT 10;
```

See also: [MotherDuck Embedding Function](/sql-reference/motherduck-sql-reference/ai-functions/embedding/), and [array_cosine_similarity](https://duckdb.org/docs/stable/sql/functions/array#array_cosine_similarityarray1-array2) in DuckDB documentation

#### Document Chunking for Embedding-Based Search

When documents are longer than ~2000 characters, consider breaking them into smaller chunks to improve retrieval precision and focus results.

For production pipelines with PDFs or Word docs, you can use the [MotherDuck integration for Unstructured.io](https://motherduck.com/blog/effortless-etl-unstructured-data-unstructuredio-motherduck/). Otherwise, you can also do document chunking in the database - here are some helpful macros:

```sql
-- Fixed-size chunking with configurable overlap
CREATE MACRO chunk_fixed_size(text_col, chunk_size, overlap) AS TABLE (
    SELECT
        gs.generate_series AS chunk_number,
        substring(text_col, (gs.generate_series - 1) * (chunk_size - overlap) + 1, chunk_size) AS chunk_text
    FROM generate_series(1, CAST(CEIL(LENGTH(text_col) / (chunk_size - overlap * 1.0)) AS INTEGER)) gs
    -- Skip trailing fragments of 50 characters or fewer
    WHERE LENGTH(substring(text_col, (gs.generate_series - 1) * (chunk_size - overlap) + 1, chunk_size)) > 50
);

-- Paragraph-based chunking (splits on double newlines)
-- E'...' is an escape string literal, so \n is a real newline;
-- a plain '\n\n' literal would match a literal backslash followed by "n"
CREATE MACRO chunk_paragraphs(text_col) AS TABLE (
    WITH chunks AS (SELECT string_split(text_col, E'\n\n') AS arr)
    SELECT
        UNNEST(generate_series(1, array_length(arr))) AS chunk_number,
        UNNEST(arr) AS chunk_text
    FROM chunks
);

-- Sentence-based chunking (splits on sentence boundaries)
CREATE MACRO chunk_sentences(text_col) AS TABLE (
    WITH chunks AS (SELECT string_split_regex(text_col, '[.!?]+\s+') AS arr)
    SELECT
        UNNEST(generate_series(1, array_length(arr))) AS chunk_number,
        UNNEST(arr) AS chunk_text
    FROM chunks
);
```

Use one of the
macros to create chunks from your documents. Fixed-size chunks (300-600 chars with 10-20% overlap) work well for most use cases: ```sql CREATE OR REPLACE TABLE hn_text_chunks AS SELECT id AS post_id, title, chunks.chunk_number, chunks.chunk_text FROM hn_stories CROSS JOIN LATERAL chunk_fixed_size(text, 500, 100) chunks; -- Alternative: CROSS JOIN LATERAL chunk_paragraphs(text) chunks; -- Alternative: CROSS JOIN LATERAL chunk_sentences(text) chunks; ``` Generate embeddings for the chunks: ```sql ALTER TABLE hn_text_chunks ADD COLUMN chunk_embedding FLOAT[512]; UPDATE hn_text_chunks SET chunk_embedding = embedding(chunk_text); ``` Once you have chunks with embeddings, search them the same way as full documents using `array_cosine_similarity()` - the chunk-level results often provide more precise matches than searching entire documents. ## Performance Guide Search performance depends on several factors, from the chosen search method, to cold vs. warm reads, Duckling sizing, and tenancy model. When running a search query against your data for the first time (cold read), it may have a higher latency than subsequent queries (warm reads). For production search workloads, ideally dedicate a service account's Duckling primarily to search, so other queries don't compete with search queries. Account for [Duckling cooldown periods](/about-motherduck/billing/duckling-sizes/) - the first search query after cooldown may experience more latency. The DuckDB analytics engine divides data into chunks and processes them in parallel across threads. More data means more chunks to process in parallel, so larger datasets don't necessarily take proportionally longer to search - they just use more threads simultaneously. **Duckling sizing:** Optimal latency requires warm reads and enough threads to process your data in parallel. 
With the ideal [Duckling sizing](/about-motherduck/billing/duckling-sizes/) configuration matched to your dataset size, keyword search over identifiers ([exact match](#exact-match), [fuzzy match](#fuzzy-search-text-similarity)) typically achieves latencies in the range of a few hundred milliseconds, while document search ([regex](#using-regular-expressions), [Full-Text Search](#full-text-search-fts), [embedding search](#embedding-based-search)) typically achieves 0.5-3 second latency. Our team is happy to help advise on the right resource allocation for your specific workload and latency targets - [get in touch](/troubleshooting/support) to discuss how we can meet your needs. **Handling Concurrent Requests:** For handling multiple simultaneous search requests effectively, consider using [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) to distribute load across multiple read scaling Ducklings. Alternatively, consider [hypertenancy](/concepts/hypertenancy), providing isolated compute resources for each user. To optimize further, see the strategies below. For questions or requirements beyond this guide, please [get in touch](/troubleshooting/support). ### Search Optimization Strategies When optimizing search performance, consider the following options. #### Pre-filtering Reduce the search space using structured metadata (e.g. 
location, categories, date ranges) that can be inferred from the user's context, before running similarity searches: ```sql -- Create a local copy with embeddings for place names (using a subset) CREATE TABLE places AS SELECT fsq_place_id, name, locality, region, fsq_category_labels FROM foursquare.main.fsq_os_places WHERE name IS NOT NULL LIMIT 10000; -- Add embeddings for semantic search ALTER TABLE places ADD COLUMN name_embedding FLOAT[512]; UPDATE places SET name_embedding = embedding(name); -- Pre-filter by location before semantic search WITH filtered_candidates AS ( SELECT fsq_place_id, name, locality, fsq_category_labels, name_embedding FROM places WHERE locality = 'New York' -- Filter by location and region AND region = 'NY' ) SELECT name, locality, fsq_category_labels, array_cosine_similarity( embedding('italian restaurant'), name_embedding ) AS similarity FROM filtered_candidates ORDER BY similarity DESC LIMIT 20; ``` #### Reducing Embedding Dimensionality Halving embedding dimensions roughly halves compute time. OpenAI embeddings can be truncated at specific dimensions (256 for `text-embedding-3-small`, 256 or 512 for `text-embedding-3-large`). 
Use lower dimensions for initial pre-filtering, then rerank with full embeddings:

```sql
-- Setup: Create normalization macro
CREATE MACRO normalize(v) AS (
    CASE
        WHEN len(v) = 0 THEN NULL
        WHEN sqrt(list_dot_product(v, v)) = 0 THEN NULL
        ELSE list_transform(v, element -> element / sqrt(list_dot_product(v, v)))
    END
);

-- Add lower-dimensional column (e.g., 256 dims instead of 512)
ALTER TABLE hn_stories ADD COLUMN text_embedding_short FLOAT[256];
UPDATE hn_stories SET text_embedding_short = normalize(text_embedding[1:256]);
```

Then use a two-stage search:

```sql
-- Stage 1: Fast pre-filter with short embeddings
-- Embed the query with the same model that produced text_embedding
-- (the default model here), so the vectors are comparable
SET VARIABLE query_emb = embedding('machine learning algorithms');
SET VARIABLE query_emb_short = normalize(getvariable('query_emb')[1:256])::FLOAT[256];

WITH candidates AS (
    SELECT
        id,
        array_cosine_similarity(getvariable('query_emb_short'), text_embedding_short) AS similarity
    FROM hn_stories
    ORDER BY similarity DESC
    LIMIT 500 -- Get more candidates if needed
)
-- Stage 2: Rerank with full embeddings
SELECT
    p.title,
    p.text,
    array_cosine_similarity(getvariable('query_emb'), p.text_embedding) AS final_similarity
FROM hn_stories p
WHERE p.id IN (SELECT id FROM candidates)
ORDER BY final_similarity DESC
LIMIT 10;
```

#### FTS Pre-filtering (Hybrid Search)

FTS typically has lower latency than embedding search, making it effective as a pre-filter to reduce similarity comparisons.
Use a large LIMIT in the FTS stage to ensure good recall: ```sql -- FTS pre-filter with large limit, then semantic rerank SET VARIABLE search_query = 'artificial intelligence neural networks'; WITH fts_candidates AS ( SELECT id, fts_main_hn_stories.match_bm25(id, getvariable('search_query')) AS fts_score FROM hn_stories ORDER BY fts_score DESC LIMIT 10000 -- Large limit to ensure recall ) SELECT h.id, h.title, h.text, array_cosine_similarity( embedding(getvariable('search_query')), h.text_embedding ) AS similarity FROM hn_stories h INNER JOIN fts_candidates f ON h.id = f.id ORDER BY similarity DESC LIMIT 10; ``` See also: [Search Using DuckDB Part 3 (Hybrid Search)](https://motherduck.com/blog/search-using-duckdb-part-3/) ## Advanced Methods This section covers additional techniques to customize and improve your search. The methods below demonstrate common approaches - many other variants are possible. :::note Some methods in this section make use of the `prompt()` function, which is priced in [AI Units](/about-motherduck/billing/pricing#advanced-ai-functions). For paid organizations, Business and Lite plans have a default soft limit of 10 AI Units per user/day (sufficient to process around 80,000 rows) to help prevent unexpected costs. 
If you'd like to adjust these limits, [just ask!](/troubleshooting/support) ::: ### LLM-Enhanced Keyword Expansion Generate synonyms with an LLM, then use them in pattern matching: ```sql -- Generate synonyms using LLM with structured output SET VARIABLE search_term = 'programming'; WITH synonyms AS ( SELECT prompt( 'Give me 5 synonyms for ''' || getvariable('search_term') || '''', struct := {'synonyms': 'VARCHAR[]'} ).synonyms AS synonym_list ) -- Search with expanded terms SELECT title, text FROM sample_data.hn.hacker_news, synonyms WHERE regexp_matches(text, getvariable('search_term') || '|' || array_to_string(synonym_list, '|')) LIMIT 10; ``` See also: [MotherDuck `prompt()` Function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/) ### Hypothetical Document Embeddings (HyDE) HyDE improves question-based retrieval by generating a hypothetical answer first, then searching with that answer's embedding. This works because questions and answers have different linguistic patterns - the hypothetical answer better matches actual document content. Use with semantic search or the semantic component of hybrid search. ```sql -- HyDE: Generate hypothetical answer, then search with it WITH hypothetical_answer AS ( SELECT prompt( 'Answer this question in 2-3 sentences: "What are the key challenges in building scalable distributed systems?" Focus on typical technical challenges and solutions.' ) AS answer ) -- Search using the hypothetical answer's embedding SELECT title, text, array_cosine_similarity( (SELECT embedding(answer) FROM hypothetical_answer), text_embedding ) AS similarity FROM hn_stories ORDER BY similarity DESC LIMIT 10; ``` See also: [Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE paper)](https://arxiv.org/abs/2212.10496) ### Reranking Reranking typically happens in two stages: initial retrieval to get top candidates (100-500 results), then precise reranking of that smaller set. 
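
The two-stage pattern is independent of where it runs. As a minimal sketch, the Python below reranks toy in-memory records: stage 1 keeps the `top_k` records closest to the query by cosine similarity, and stage 2 reorders only those candidates with a blended score. The `popularity` field, the function names, and the 60/40 weighting are assumptions made for illustration, not part of any MotherDuck API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_stage_rerank(query_vec, docs, top_k=100, final_k=10, semantic_weight=0.6):
    """Return ids of the best documents after retrieval plus reranking.

    Each doc is a dict with hypothetical keys: "id", "embedding", "popularity".
    """
    # Stage 1: initial retrieval by vector similarity only.
    scored = [(cosine_similarity(query_vec, d["embedding"]), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = scored[:top_k]

    # Stage 2: rerank the small candidate set with an extra signal,
    # normalized to the 0-1 range over the candidates.
    max_pop = max((d["popularity"] for _, d in candidates), default=0) or 1
    reranked = [
        (semantic_weight * sim + (1 - semantic_weight) * d["popularity"] / max_pop, d)
        for sim, d in candidates
    ]
    reranked.sort(key=lambda pair: pair[0], reverse=True)
    return [d["id"] for _, d in reranked[:final_k]]
```

Because the second signal is normalized over the candidates only, a record that is popular but semantically unrelated never enters the final ranking: it is dropped in stage 1.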
#### Rule-Based Reranking with Metadata Refine results based on business rules and metadata like score, category, or freshness: ```sql -- Find similar posts with metadata-based reranking WITH initial_similarity AS ( -- Step 1: Fast vector similarity for top candidates SELECT title, text, score as author_score, array_cosine_similarity( embedding('artificial intelligence and machine learning applications'), text_embedding ) AS emb_similarity FROM hn_stories ORDER BY emb_similarity DESC LIMIT 100 ), reranked_scores AS ( -- Step 2: Rerank with metadata (author score) SELECT title, text, author_score, emb_similarity, -- Score boost (normalize to 0-1 range based on actual data) (author_score / MAX(author_score) OVER ()) AS author_score_norm, -- Combined final score: 60% semantic + 40% author score (emb_similarity * 0.6 + author_score_norm * 0.4) AS reranked_score FROM initial_similarity ) SELECT title, text, author_score, ROUND(emb_similarity, 3) as semantic_score, ROUND(author_score_norm, 3) as author_score_normalized, ROUND(reranked_score, 3) as final_score FROM reranked_scores ORDER BY reranked_score DESC LIMIT 10; ``` #### LLM-Based Reranking For complex relevance criteria that are hard to express as rules, use an LLM to judge and score results. The [`prompt()` function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/) is optimized for batch processing and processes requests in parallel - so reranking 50 results typically adds only a few hundred milliseconds. ```sql -- LLM reranking for top search results SET VARIABLE search_query = 'best practices for code review and software quality'; WITH top_candidates AS ( -- Initial retrieval (e.g., via semantic search) SELECT id, title, text, array_cosine_similarity( embedding(getvariable('search_query')), text_embedding ) AS initial_score FROM hn_stories ORDER BY initial_score DESC LIMIT 20 ), llm_reranked AS ( SELECT *, prompt( format( 'Rate how well this post matches the query ''{}''. 
Post: {} - {}', getvariable('search_query'), title, text ), struct := {'rating': 'INTEGER'} ).rating AS llm_score FROM top_candidates ) SELECT title, text, ROUND(initial_score, 3) as initial_score, llm_score, ROUND((0.6 * initial_score + 0.4 * llm_score / 10.0), 3) AS final_score FROM llm_reranked ORDER BY final_score DESC LIMIT 10; ``` ## Next Steps - Check out the MotherDuck [Embedding Function](/sql-reference/motherduck-sql-reference/ai-functions/embedding/) and [Prompt Function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/) - Review the [Full-Text Search Guide](https://duckdb.org/docs/stable/guides/sql_features/full_text_search.html) in DuckDB documentation - Read the MotherDuck blog series: [Search Using DuckDB Part 1](https://motherduck.com/blog/search-using-duckdb-part-1/), [Part 2](https://motherduck.com/blog/search-using-duckdb-part-2/), [Part 3](https://motherduck.com/blog/search-using-duckdb-part-3/) - Explore [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/) --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/attach-modes --- title: Attach Modes description: Understand Workspace and Single attach modes --- ## MotherDuck attach modes: workspace and single modes This guide explains MotherDuck's two connection modes: **workspace** and **single**. Workspace mode is designed for working with multiple databases persistently across sessions, while single mode uses a non-persistent, isolated session that does not reuse your saved workspace. ### Connection modes MotherDuck offers two connection modes: workspace and single. The mode you use determines how your attachments and detachments are handled and whether these changes are saved for future sessions. * **Workspace Mode**: This is the default mode when you want to work with all attached MotherDuck databases. 
When you attach or detach a database in this mode, that change is remembered for your next session. This is useful when you consistently work with the same set of databases. Parallel connections to MotherDuck in workspace mode will keep their attachments in sync. For example, detaching a database in one client in workspace mode will detach it in all other clients that are connected in workspace mode.
* **Single Mode**: This mode is for when you want a one-time, non-persistent session that does not reuse or change your saved workspace. Any databases you attach or detach during this session will not affect your saved workspace for the next time you connect, nor interfere with the attachment state of other parallel connections to MotherDuck. You can still attach multiple databases in a single-mode session, including databases shared with you. For example, you can start with your own database and then `ATTACH 'md:_share/...'` to attach a share. Single mode is useful with BI tools that only support a single attached database at a time.

:::tip
You can't switch between modes in the middle of a session. The mode is set by the first command you use to connect to MotherDuck.
:::

### Connecting to MotherDuck with a connection string

When you first connect to MotherDuck in a session, the connection string you use determines the attach mode. This applies to most clients, like the DuckDB CLI (`duckdb 'md:...'`) and Python (`duckdb.connect('md:...')`).

* **To connect in Workspace Mode (default):**
  * Use `md:` or `md:<database_name>`.
  * This connects to your MotherDuck workspace, attaching *all* databases from your last saved session.
  * If you specify a database name, it becomes the active database.
  * Any changes to attachments (attaching or detaching databases) are saved and will be restored in your next workspace session.
* **To connect in Single Mode:**
  * Use `md:<database_name>?attach_mode=single`.
  * This connects to the specified database without using your saved workspace.
  * Attachment changes are *temporary* and will *not* be saved.
  * Note: You must specify a database name to use single mode. Connecting with `md:?attach_mode=single` is not allowed, as this mode requires a specific database target.

### Connecting to MotherDuck using the ATTACH command

If you are already in a DuckDB session, but **not** connected to MotherDuck yet, your first ATTACH command that targets MotherDuck establishes the attach mode for that session.

* **To connect in Workspace Mode:**
  * Use `ATTACH 'md:'`.
  * This attaches your entire saved workspace.
  * The session is now in workspace mode, and any subsequent attachment changes will be persisted for future sessions.
* **To connect in Single Mode:**
  * Use `ATTACH 'md:<database_name>'`.
  * This attaches the specified database without using your saved workspace.
  * The session is implicitly set to single mode. Attachment changes are not saved.
  * Once in single mode, you cannot attach the entire workspace using `ATTACH 'md:'`.

### Tips & tricks

Further Notes:

* You can also explicitly set the attach mode before connecting to MotherDuck.

  ```sql
  LOAD motherduck;
  SET motherduck_attach_mode = 'workspace'; -- or 'single'
  ATTACH 'md:foo'; -- database created by your account
  ```

* The MotherDuck UI always connects in workspace mode.

---

Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-and-connecting-to-motherduck

---
title: Authenticating and connecting to MotherDuck
description: Learn how to authenticate and connect to MotherDuck
---

# Authenticating and connecting to MotherDuck

These pages explain how to connect to MotherDuck using the CLI, Python, JDBC and NodeJS.
First, you need to [authenticate to MotherDuck](./authenticating-to-motherduck) by [manual authentication](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#manual-authentication) via the Web UI, or automatic authentication via an [access token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token). Organizations on Business or Enterprise plans can also configure [Single Sign-On (SSO)](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/) with their identity provider. To connect to a MotherDuck database, you can [create a connection](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/). ## Included pages - [Authenticating to MotherDuck](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck): Authenticate to a MotherDuck account - [Connecting to MotherDuck](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck): Create one or more connections to a MotherDuck database - [Connect via the Postgres endpoint](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint): Connect to MotherDuck using any Postgres-compatible client via the Postgres wire protocol endpoint - [Multithreading and parallelism](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism): Learn how to use multithreading and parallelism for special cases to read data from MotherDuck - [Read Scaling](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling): Learn how to scale your data applications using read scaling tokens - [Attach Modes](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes): Understand Workspace and Single 
attach modes

---

Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck

---
sidebar_position: 1
title: Authenticating to MotherDuck
description: Authenticate to a MotherDuck account
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Authenticating to MotherDuck

MotherDuck supports the following authentication methods:

- **Manual authentication**, typically used by the MotherDuck UI (Google, GitHub, or email and password)
- **Access token authentication**, more convenient for Python, CLI, or other clients
- **[Single Sign-On (SSO)](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/)**, for organizations that want to authenticate through their corporate identity provider (available on Business and Enterprise plans)

## Manual authentication

The MotherDuck UI authenticates using several methods:

- Google
- GitHub
- Username and password

You can use multiple modes of authentication in your account. For example, you can authenticate both through Google and with a username and password as you see fit.

To authenticate in the CLI or Python, you will be redirected to an authentication web page. This happens every session. To avoid having to re-authenticate, you can save your access token, as described in the [Authentication using an access token](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token) section.

## Authentication using an access token

If you are using Python or the CLI and don't want to authenticate every session, you can securely save your credentials locally.
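
As a minimal sketch of the token-based flow, assuming the access token has been exported as the `motherduck_token` environment variable, the helper below builds a connection string that does not embed the token and fails early when the variable is missing. The function name `motherduck_connection_string` and its `env` parameter are hypothetical and exist only for this illustration.

```python
import os


def motherduck_connection_string(database="my_db", env=None):
    """Build an `md:` connection string, relying on the environment for the token.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    if not env.get("motherduck_token"):
        # Fail early with a clear message instead of falling back to
        # interactive authentication in a non-interactive job.
        raise RuntimeError("motherduck_token is not set in the environment")
    # The token is deliberately not appended to the string, so it does not
    # leak into logs or shell history.
    return f"md:{database}"
```

With the `duckdb` Python package installed, the result can be passed to `duckdb.connect(...)`, for example `duckdb.connect(motherduck_connection_string("my_db"))`.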
### Creating an access token

To create an access token:

- Go to the [MotherDuck UI](https://app.motherduck.com)
- In the top left, click your organization name and then `Settings`
- Click `+ Create token`
- Specify a name for the token that you'll recognize (like "DuckDB CLI on my laptop")
- Specify the type of token you want. Tokens can be Read/Write (default) or [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/).
- Choose whether you want the token to expire and then click `Create token`
- Copy the access token to your clipboard by clicking the copy icon

![access token example](../img/creating_access_token.jpg)

### Storing the access token as an environment variable

You can save the access token as `motherduck_token` in your environment variables. An example of setting this in a terminal:

```bash
export motherduck_token='<token>'
```

You can also add this line to your `~/.zprofile` or `~/.bash_profile`, or store it in a `.env` file in your project root.

Once this is done, your authentication token is saved and you can connect to MotherDuck with the following connection string:

```bash
duckdb "md:my_db"
```

:::info
This is the best practice for security reasons. The token is sensitive information and should be kept safe. Do not share it with others.
:::

Alternatively, you can specify an access token in the MotherDuck connection string: `md:my_db?motherduck_token=<token>`.

```bash
duckdb "md:my_db?motherduck_token=<token>"
```

When in the DuckDB CLI, you can use the `.open` command and specify the connection string as an argument.
```CLI
.open md:my_db?motherduck_token=<token>
```

## Using connection string parameters

### Authentication using SaaS mode

You can limit MotherDuck's ability to interact with your local environment using `SaaS Mode`:

- Disable reading or writing local files
- Disable reading or writing local DuckDB databases
- Disable installing or loading any DuckDB extensions locally
- Disable changing any DuckDB configurations locally

This mode is useful for third-party tools, such as BI vendors, that host DuckDB themselves and require additional security controls to protect their environments.

You can enable SaaS mode in two ways:

1. **Using a configuration setting** (recommended for persistent configuration):

   ```sql
   SET motherduck_saas_mode = true;
   ```

2. **Using a connection string parameter** (for connection-time configuration):

   ```cli
   .open md:[<database_name>]?[motherduck_token=<token>]&saas_mode=true
   ```

   ```python
   conn = duckdb.connect("md:[<database_name>]?[motherduck_token=<token>]&saas_mode=true")
   ```

:::info
The connection string parameter requires using `.open` in the DuckDB CLI or `duckdb.connect` in Python. This initiates a new connection to MotherDuck and will detach any existing connection to a local DuckDB database. You cannot provide a token to `ATTACH md:` directly, only when connecting.
:::

### Using attach mode

By default, MotherDuck connects in **workspace mode**, which attaches every database in your saved workspace and keeps attachment changes in sync across parallel connections. To scope the connection to a single database instead, use **single mode** by appending `?attach_mode=single` to the connection string. Single mode is useful for BI tools and other clients that get confused by multiple attached databases. For full details, see [Attach modes](/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/).
For example, to connect to a database named `my_database` in single mode, run:

```bash
duckdb 'md:my_database?attach_mode=single'
```

:::note
A database name that starts with a number cannot be connected to directly. You will need to connect without a database specified and then `CREATE` and `USE` using a double-quoted name. For example: `USE DATABASE "1database"`
:::

---

Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/scim-provisioning

---
sidebar_position: 3
title: SCIM provisioning
description: Automate user lifecycle management in MotherDuck using SCIM with your identity provider.
draft: true
---

# SCIM provisioning

:::info
SCIM provisioning is coming soon and will be available on **Enterprise** plans. [SSO](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/) must be configured before enabling SCIM.
:::

## What is SCIM?

SCIM (System for Cross-domain Identity Management) is an open standard ([RFC 7643](https://datatracker.ietf.org/doc/rfc7643/)) for automating user lifecycle management across cloud applications. It enables your identity provider (IdP) to automatically create, update, and delete user accounts in MotherDuck.

## How SCIM complements SSO

SSO and SCIM serve different purposes:

| | SSO | SCIM |
| --- | --- | --- |
| **Purpose** | Authentication: controls *how* users log in | Provisioning: controls *which* users exist |
| **Handles** | Login redirects, session management | Account creation, updates, deprovisioning |
| **Trigger** | User-initiated (at login time) | IdP-initiated (on staff changes) |

With SSO alone, MotherDuck uses [Just-in-Time (JIT) provisioning](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/#just-in-time-jit-user-provisioning) to create accounts on first login. However, JIT does not handle changes after initial provisioning.
If an employee leaves your company, their MotherDuck account remains active unless manually removed. SCIM solves this by keeping MotherDuck user accounts in sync with your IdP. When you add, modify, or remove a user in your IdP, those changes are automatically propagated to MotherDuck. ## What SCIM enables When SCIM is configured, your identity provider automatically: - **Creates** a MotherDuck account when a user is assigned to the MotherDuck application in your IdP - **Updates** user attributes (name, email, role) when they change in your IdP - **Deprovisions** users when they are removed from the MotherDuck application or deactivated in your IdP. Deprovisioned users are disabled immediately and fully deleted after 30 days. ## SCIM attribute mapping The following attributes are synced from your IdP to MotherDuck: | Attribute | Required | Description | Example | | --- | --- | --- | --- | | `email` | Yes | User's email address (primary identifier) | `jane@acme.com` | | `name` | Yes | Display name | `Jane Doe` | | `roles` | Yes | MotherDuck role mapping | `[md:role:admin]` | | `org_id` | Yes | Organization routing (derived from domain) | `acme.org` | | `external_id` | No | Stable identifier from your IdP | `00u1234abcd` | ## Prerequisites - [SSO must be configured](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/) for your organization - **Enterprise** plan - Admin access to your identity provider ## Setup overview Once SCIM is available, the setup process will follow these steps: 1. Open your IdP's admin console and the MotherDuck **Settings** → **Authentication** page. 2. Set the provisioning mode to **Automatic Provisioning** in your IdP. 3. Configure the admin credentials using the **Tenant URL** and **Secret Token** from the MotherDuck Authentication page. 4. Click **Test Connection** in your IdP to verify connectivity. 5. Enable automatic provisioning in the MotherDuck UI. 
Detailed setup instructions will be published when SCIM support becomes available. --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup --- sidebar_position: 2 title: Setting up SSO description: Configure Single Sign-On (SSO) for your MotherDuck organization using your identity provider. --- import VideoPlayer from '@site/src/components/VideoPlayer'; # Setting up SSO Single Sign-On (SSO) allows your organization to authenticate MotherDuck users through your existing identity provider (IdP). When SSO is enabled, users with a verified email domain are automatically redirected to your corporate login page, removing the need for separate MotherDuck credentials. :::note SSO is available on **Business** and **Enterprise** plans. ::: ## How SSO works When you configure SSO, MotherDuck connects to your identity provider using either the SAML or OIDC protocol. The login flow works as follows: 1. A user enters their email on the MotherDuck login page. 2. MotherDuck looks up the email domain. If the domain is verified and SSO is enabled, the user is redirected to your corporate IdP. 3. The user authenticates with the IdP. 4. MotherDuck receives the authentication response and creates or updates the user's session. Users with personal email addresses or domains without SSO configured continue to use standard login methods (Google, GitHub, or email and password). 
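
The routing decision in step 2 of the login flow can be sketched in a few lines of Python. This is only an illustration of the logic described above, not MotherDuck's implementation: the `DomainConfig` type, the `login_route` function, and the returned labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    """Hypothetical per-domain SSO state."""
    verified: bool
    sso_enabled: bool


def login_route(email, domains):
    """Decide where a login attempt goes, based on the email's domain.

    `domains` maps a lowercase domain name to its DomainConfig.
    """
    domain = email.rsplit("@", 1)[-1].lower()
    config = domains.get(domain)
    if config is not None and config.verified and config.sso_enabled:
        # Verified domain with SSO enabled: hand off to the corporate IdP.
        return "idp_redirect"
    # Personal addresses and domains without SSO keep the standard methods.
    return "standard_login"
```

For example, with `{"acme.com": DomainConfig(verified=True, sso_enabled=True)}`, an `acme.com` address is routed to the IdP while any other domain falls through to the standard login methods.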
## Supported SSO configurations MotherDuck supports four SSO configuration options: | Configuration | Protocol | Use when | | --- | --- | --- | | **Okta** | OIDC | Your organization uses Okta Workforce Identity | | **Microsoft Entra ID** | OIDC | Your organization uses Microsoft Entra ID (formerly Azure AD) | | **SAML** | SAML | Your IdP supports SAML but is not Okta or Entra ID | | **OIDC** | OIDC | Your IdP supports OpenID Connect but is not Okta or Entra ID | The generic SAML and OIDC options allow you to connect any compatible identity provider, such as Google Workspace, PingFederate, or Keycloak. ### SAML vs. OIDC **SAML** (Security Assertion Markup Language) is an XML-based protocol widely used in enterprise environments for browser-based SSO. Most traditional enterprise IdPs support SAML. **OIDC** (OpenID Connect) is a JSON-based protocol built on top of OAuth 2.0. It is more common in cloud-native and modern environments. Both protocols achieve the same result: authenticating users through your IdP. Choose the protocol that your IdP supports or that your IT team is most familiar with. ## Prerequisites Before setting up SSO, ensure you have: - **Org Admin** role in your MotherDuck organization - A **Business** or **Enterprise** plan - Admin access to your company's identity provider - A **custom domain name** for your organization (for example, `acme.com`) and the ability to add a DNS TXT record to the domain for verification - All users in your organization use **non-aliased email addresses** (addresses like `user+tag@company.com` are not supported) :::caution SSO is supported for organizations where all users belong to a **single MotherDuck organization**. If your users are spread across multiple MotherDuck organizations (for example, separate US and EU orgs), do not enable SSO. Multi-organization SSO support is planned for a future release. ::: ## Setting up SSO ### Step 1: Start SSO configuration in MotherDuck 1. 
In the MotherDuck UI, click your organization name in the top left and select **Settings**.
2. Navigate to the **Authentication** tab.
3. Click **Set up SSO** to begin the setup process.

   ![MotherDuck Settings showing the Authentication tab with the Set up SSO button](./img/sso-authentication-settings.png)

4. Select your identity provider from the list, or choose **Custom SAML** or **Custom OIDC** if your IdP is not listed.

   ![Select your identity provider for SSO configuration](./img/sso-select-identity-provider.png)

### Step 2: Create a MotherDuck application in your identity provider

1. Log in to your identity provider's admin console.
2. Create a new application and name it **MotherDuck**.
3. Select the appropriate protocol (SAML or OIDC) based on your chosen configuration.

### Step 3: Configure the connection

The MotherDuck setup wizard provides step-by-step instructions for each provider. Follow the instructions on the SSO onboarding portal to configure the connection between your IdP and MotherDuck.

For example, the Okta configuration walks you through creating an OIDC application:

![Okta OIDC SSO configuration wizard showing the Create Application step](./img/sso-okta-create-application.png)

### Step 4: Map user attributes

In your IdP, map the following attributes to the MotherDuck application:

| Attribute | Required | Description |
| --- | --- | --- |
| `email` | Yes | The user's email address (primary login identifier) |
| `given_name` | No | The user's first name |
| `family_name` | No | The user's last name |

### Step 5: Assign users

Assign yourself (and optionally other users) to the MotherDuck application in your IdP.

### Step 6: Verify your domain

MotherDuck requires domain ownership verification before SSO can be enabled. Follow the instructions to add a DNS TXT record for your domain. Once the record is detected, your domain is verified.
![SSO configuration status showing pending domain verification](./img/sso-pending-domain-verification.png) ### Step 7: Enable SSO After domain verification succeeds, return to the setup wizard and click **Done** to complete the configuration, then click **Enable SSO** to activate the connection. ![SSO configuration dialog to confirm enabling SSO](./img/sso-enable-sso-dialog-confirmation.png) :::warning Enabling SSO is **not reversible** without contacting MotherDuck support. Before enabling, ensure that: - All users in your organization use non-aliased email addresses on the verified domain - Your users belong to **only this** MotherDuck organization - You have tested the IdP configuration by assigning yourself to the application ::: When SSO is enabled: - All existing non-SSO login methods (Google, GitHub, email/password) are **deactivated** for users with the verified domain - Any pending invitations matching the SSO domain will need to **sign up through SSO** - Users must authenticate through the configured IdP going forward ### Step 8: Test SSO login 1. Log out of MotherDuck. 2. On the login page, enter your corporate email address. 3. You should be redirected to your IdP's login page. 4. After authenticating, you are returned to the MotherDuck UI. ## Just-in-Time (JIT) user provisioning When SSO is enabled, new users from your verified domain can be automatically provisioned on their first login. This is called Just-in-Time (JIT) provisioning. JIT provisioning is enabled by default the first time you activate SSO. Admins can change this setting at any time from the organization **Settings** page (see below). 
With JIT enabled: - A user enters their corporate email on the MotherDuck login page - They are redirected to your IdP and authenticate - The user is automatically given the option to join your organization at signup ### Controlling access with JIT and invite settings Admins can configure JIT provisioning and organization invite policies from the organization **Settings** page. These two settings work together to control how new users join your organization: | Setting | When enabled | When disabled | | --- | --- | --- | | **JIT provisioning** | Users who authenticate through your IdP can join the organization on first login *(default on first SSO activation)* | New users must be invited by an Admin | | **Organization invites** | Any member can invite new users to the organization | Only Admins can invite new users, giving you tighter control over who has access | When both organization invites and JIT provisioning are disabled, new users can only join if an Admin invites them. When JIT is enabled but invites are disabled, users who have been given access in your IdP can still join on first login, but members cannot send invitations. ![invite policy](./img/org-invite-policy.png) For more information on managing organization members and roles, see [Managing organizations](/docs/key-tasks/managing-organizations/). JIT provisioning handles initial account creation only. It does not manage role changes or account deletion after provisioning. For automated user lifecycle management, SCIM provisioning support is planned for a future release. ## Managing members Managing users with SSO works the same as before. You can invite any new user by supplying their email address. If the email domain matches one of your verified domains, the user will be redirected to their IdP for authentication. ## Limitations - **Single organization only**: SSO is supported for users who belong to a single MotherDuck organization. Multi-org SSO is planned for a future release. 
- **No aliased emails**: Email addresses with aliases (for example, `user+tag@company.com`) are not supported when SSO is enabled. - **One connection per domain**: Each verified domain can have only one SSO connection. Users with an email address on that domain in any MotherDuck organization will be redirected to their IdP. - **Non-reversible**: Enabling SSO cannot be undone without contacting [MotherDuck support](mailto:support@motherduck.com). - **CLI and SDK authentication**: Users authenticating through the SDKs continue to use [access tokens](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token). SSO applies to browser-based login flows for the WebUI, CLI and MCP. --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck --- sidebar_position: 2 title: Connecting to MotherDuck description: Create one or more connections to a MotherDuck database --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import useBaseUrl from '@docusaurus/useBaseUrl'; There are two ways to connect to MotherDuck: | Method | Client needed | Best for | |--------|--------------|----------| | **DuckDB SDK** | DuckDB client library | Python, Node.js, Java, CLI — full feature set, hybrid execution, local caching | | **[Postgres Endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint)** | Any PostgreSQL client | Thin clients, serverless environments, BI tools, languages without a DuckDB SDK | This page covers connecting with the **DuckDB SDK**. For the Postgres endpoint, see [Postgres Endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint). ## Connecting with the DuckDB SDK A single DuckDB connection executes one query at a time, aiming to maximize the performance of that query, making reuse of a single connection both simple and performant. 
We recommend starting with the simplest way of connecting to MotherDuck and running queries, and if that does not meet your requirements, to explore the advanced use-cases described in subsequent sections. ## Create a connection The below code snippets show how to create a connection to a MotherDuck database from the CLI, Python, JDBC, and Node.js language APIs. :::info For security reasons, it's generally recommended to use environment variables to store your MotherDuck token rather than hardcoding it in your application. ::: :::tip The `INSERT INTO` statements below are for illustration only. For loading real data, do not insert rows one at a time — use bulk methods like `INSERT INTO ... SELECT` from files, `COPY`, or DataFrame-based approaches. See [Loading data into MotherDuck](/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck.mdx) for recommended approaches. ::: To connect to your MotherDuck database, use `duckdb.connect("md:my_database_name")`. This will return a `DuckDBPyConnection` object that you can use to interact with your database. There are two ways to provide your access token in Python to authenticate your user session. 
```python
import os

import duckdb

# Assumes the token is stored in the `motherduck_token` environment variable
token = os.environ["motherduck_token"]

# Create connection to your default database
conn = duckdb.connect("md:my_db", config={"motherduck_token": token})
# Optionally, import your token from a .env file

# Run query
conn.sql("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
conn.sql("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
res = conn.sql("SELECT * FROM items")

# Close the connection
conn.close()
```

```python
import os

import duckdb

# Assumes the token is stored in the `motherduck_token` environment variable
token = os.environ["motherduck_token"]

# Create connection to your default database, passing the token in the connection string
conn = duckdb.connect(f"md:my_db?motherduck_token={token}")
# Optionally, import your token directly from a .env file

# Run query
conn.sql("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
conn.sql("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
res = conn.sql("SELECT * FROM items")

# Close the connection
conn.close()
```

To connect to your MotherDuck database, you can create a `Connection` by using the `"jdbc:duckdb:md:databaseName"` connection string format. For authentication, you need to provide a MotherDuck token.
There are two ways to provide the token: ```java import java.sql.Connection; import java.sql.DriverManager; import java.sql.Statement; import java.sql.ResultSet; import java.util.Properties; // Create properties with your MotherDuck token Properties props = new Properties(); props.setProperty("motherduck_token", ""); // Create connection to your database try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db", props); Statement stmt = conn.createStatement()) { stmt.executeUpdate("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)"); stmt.executeUpdate("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)"); try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) { while (rs.next()) { System.out.println("Item: " + rs.getString(1) + " costs " + rs.getInt(3)); } } } ``` ```java // Create connection with token in the connection string try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db?motherduck_token="); Statement stmt = conn.createStatement()) { stmt.executeUpdate("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)"); stmt.executeUpdate("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)"); try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) { while (rs.next()) { System.out.println("Item: " + rs.getString(1) + " costs " + rs.getInt(3)); } } } ``` :::info If an environment variable named `motherduck_token` is set, it will be used automatically. ::: To connect to your MotherDuck database, you can create a `DuckDBInstance` with the `'md:databaseName'` connection string format. For authentication, you need to provide a MotherDuck token. 
There are two ways to provide the token: ```javascript import { DuckDBInstance } from '@duckdb/node-api'; // Create connection to your default database const instance = await DuckDBInstance.create('md:my_db', { motherduck_token: '', }); const conn = await instance.connect(); // Run queries await conn.run('CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)'); await conn.run("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)"); const result = await conn.runAndReadAll('SELECT * FROM items'); console.table(result.getRowObjects()); ``` ```javascript import { DuckDBInstance } from '@duckdb/node-api'; // Create connection to your default database const instance = await DuckDBInstance.create('md:my_db?motherduck_token='); const conn = await instance.connect(); // Run queries await conn.run('CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)'); await conn.run("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)"); const result = await conn.runAndReadAll('SELECT * FROM items'); console.table(result.getRowObjects()); ``` :::info If an environment variable named `motherduck_token` is set, it's used automatically. ::: To connect to your MotherDuck database, run `duckdb md:`. ```shell duckdb "md:my_db" ``` Now, you will enter the DuckDB interactive terminal to interact with your database. ```sql D CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER); D INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2); D SELECT * FROM items; ``` ## Session names The `session_name` connection string parameter lets you give your session a name. You can set it in the connection string (`md:my_db?session_name=my_label`) or as a DuckDB setting before connecting to MotherDuck (`SET motherduck_session_name='my_label'`). :::note The older `session_hint` parameter still works as an alias for `session_name`. 
::: ### Read scaling with session names If you are planning on multiple end users connecting with a [Read Scaling Token](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/read-scaling.mdx), ensure each user can get a dedicated backend (up to the maximum configured flock size) by passing a `session_name` in the connection string. Session names ensure that all the queries from the same end user are routed to the same backend duckling, even if they originate from different services/servers. This allows for optimal caching and resource allocation for each specific user's needs. After establishing the connection, it can be used the same way as any DuckDB/MotherDuck connection -- to run queries, and then either be closed explicitly or go out of scope, as in the examples above. ### Annotating queries with session names The `session_name` value appears in the `SESSION_NAME` column of [query history](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/), making it easy to identify and group queries. This works for both read scaling and read/write connections. ```python import duckdb # Create a connection and allocate a stable backend for user123. con = duckdb.connect( "md:my_db?session_name=user123", config = {'motherduck_token': ''} ) ``` ```java import java.sql.Connection; import java.sql.DriverManager; import java.sql.Statement; import java.sql.ResultSet; import java.util.Properties; // Create properties with your MotherDuck token Properties props = new Properties(); props.setProperty("motherduck_token", ""); // Create a connection and allocate a stable backend for user123. try (Connection conn = DriverManager.getConnection("jdbc:duckdb:md:my_db?session_name=user123", props)) { // ... } ``` ```javascript import { DuckDBInstance } from '@duckdb/node-api'; // Create a connection and allocate a stable backend for user123. 
const instance = await DuckDBInstance.create(
  'md:my_db?session_name=user123',
  { motherduck_token: '' }
);
// ...
```

## Multiple connections and the database instance cache

DuckDB clients in Python, Go, R, JDBC, and ODBC prevent redundant reinitialization by keeping instances of database-global context cached by the database path. When connecting to MotherDuck, the instance is cached for an additional 15 minutes after the last connection is closed (see [Setting Custom Database Instance Cache TTL](#setting-custom-database-instance-cache-time-ttl) for how to override this value). For an application that creates and closes connections frequently, this can significantly speed up connection creation, because the same catalog data is reused across connections. Only the first of multiple connections to the same database takes the time to load the MotherDuck extension, verify its signature, and fetch the catalog metadata.

```python
import duckdb

con1 = duckdb.connect("md:my_db")  # MotherDuck catalog fetched
con2 = duckdb.connect("md:my_db")  # MotherDuck catalog reused
```

```java
// Create properties with your MotherDuck token
Properties props = new Properties();
props.setProperty("motherduck_token", "");

try (var con1 = DriverManager.getConnection("jdbc:duckdb:md:my_db", props); // MotherDuck catalog fetched
     var con2 = DriverManager.getConnection("jdbc:duckdb:md:my_db", props)  // MotherDuck catalog reused
) {
    // ...
}
```

:::warning Node.js does not cache instances automatically
Unlike some other clients, the Node.js client (`@duckdb/node-api`) does **not** cache database instances by default. Each call to `DuckDBInstance.create()` creates a new instance, which means the MotherDuck extension is reloaded and the catalog metadata is re-fetched every time. Depending on the size of your catalog, this can cause significant connection delays. To avoid this, use `DuckDBInstance.fromCache()` or create a `DuckDBInstanceCache` as shown below.
::: In Node.js, you must explicitly opt in to instance caching by using `DuckDBInstance.fromCache()` instead of `DuckDBInstance.create()`. This uses a built-in default cache to ensure only one instance is created per database path, avoiding reloading the MotherDuck extension and re-fetching catalog metadata on subsequent connections. ```javascript import { DuckDBInstance } from '@duckdb/node-api'; // First call creates the instance and fetches the MotherDuck catalog const instance = await DuckDBInstance.fromCache('md:my_db', { motherduck_token: '', }); const connection1 = await instance.connect(); // Second call reuses the cached instance — no reinitialization needed const instance2 = await DuckDBInstance.fromCache('md:my_db'); const connection2 = await instance2.connect(); ``` For more control, you can create your own `DuckDBInstanceCache`: ```javascript import { DuckDBInstanceCache } from '@duckdb/node-api'; const cache = new DuckDBInstanceCache(); // Retrieves an existing instance or creates one if it doesn't exist const instance = await cache.getOrCreateInstance('md:my_db'); const connection = await instance.connect(); ``` ## Setting custom database instance cache time (TTL) By default, connections to MotherDuck established through the database instance caching supporting DuckDB APIs will reuse the same database instance for 15 minutes after the last connection is closed. In some cases, you may want to make that period longer (to avoid the redundant reinitialization) or shorter (to connect to the same database with a different configuration). The database TTL value can be set either at the initial connection time, or by using the `SET` command at any point. Any valid [DuckDB Instant part specifiers](https://duckdb.org/docs/stable/sql/functions/datepart.html#part-specifiers-usable-as-date-part-specifiers-and-in-intervals) can be used for the TTL value, for example '5s', '3m', or '1h'. 
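Because the TTL and the session name are both plain query parameters on the `md:` path, a connection string can be assembled programmatically. The following is a minimal sketch, not part of any DuckDB or MotherDuck API: `build_md_path` is a hypothetical helper name, and only the parameter names `dbinstance_inactivity_ttl` and `session_name` come from this page.

```python
from urllib.parse import urlencode


def build_md_path(database: str, **params: str) -> str:
    """Return an `md:` connection string with optional query parameters.

    Hypothetical helper for illustration only; it just concatenates strings.
    """
    path = f"md:{database}"
    if params:
        # urlencode keeps the keyword order and escapes unsafe characters
        path += "?" + urlencode(params)
    return path


# One-hour instance cache TTL plus a session name, both set in the connection string
path = build_md_path("my_db", dbinstance_inactivity_ttl="1h", session_name="user123")
print(path)
```

The resulting string can then be passed to `duckdb.connect(path)`. Keep in mind that the instance cache is keyed by the connection string, so two paths that differ in any parameter refer to different cached instances.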
:::note The examples below assume you have configured your MotherDuck token using one of the authentication methods described in the [Create a connection](#create-a-connection) section above. ::: ```python con = duckdb.connect("md:my_db?dbinstance_inactivity_ttl=1h") con.close() # different database connection string (without `?dbinstance_inactivity_ttl=1h`), no instance cached; TTL is 15 minutes (default) con2 = duckdb.connect("md:my_db") # allow the database instance to expire immediately con2.execute("SET motherduck_dbinstance_inactivity_ttl='0s'") # the database instance can only expire after the last connection is closed con2.close() # new database instance with a new TTL (the 15 minute default) con3 = duckdb.connect("md:my_db") con3.close() # the last TTL for this database was 15 minutes; the cached database instance will be reused con4 = duckdb.connect("md:my_db") ``` The TTL can be set either through the connection string or through Properties. However, be careful when using Properties as the database instance cache is keyed by the connection string. This means that if you change the TTL in Properties between connections, you'll get an error as it's trying to connect to the same database with different configurations. 
Here's an example that will fail: ```java Properties props = new Properties(); props.setProperty("motherduck_dbinstance_inactivity_ttl", "2m"); // First connection works fine try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) { // TTL is set to 2m } // Changing TTL in properties will fail props.setProperty("motherduck_dbinstance_inactivity_ttl", "5m"); try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) { // This will throw: "Can't open a connection to same database file // with a different configuration than existing connections" } ``` For this reason, it's generally safer to set the TTL through the connection string: ```java // Set TTL through connection string try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db?dbinstance_inactivity_ttl=1h")) { // TTL is set to 1h } // Different TTL creates a new instance try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db?dbinstance_inactivity_ttl=30m")) { // This works - creates a new instance with 30m TTL } // Can also set TTL using SQL try (var con = DriverManager.getConnection("jdbc:duckdb:md:my_db"); var st = con.createStatement()) { // allow the database instance to expire immediately st.executeUpdate("SET motherduck_dbinstance_inactivity_ttl='0s'"); } ``` :::note When using Properties, you must include the `motherduck_` prefix for the TTL property name (i.e., `motherduck_dbinstance_inactivity_ttl`). This prefix is only optional when passing the TTL through the connection string. 
:::

```javascript
import { DuckDBInstance } from '@duckdb/node-api';

// Set TTL to 1 hour through the connection string
const instance = await DuckDBInstance.fromCache('md:my_db?dbinstance_inactivity_ttl=1h');
const conn = await instance.connect();

// Or set the TTL using SQL after connecting
await conn.run("SET motherduck_dbinstance_inactivity_ttl='30m'");

// Allow the database instance to expire immediately after the connection closes
await conn.run("SET motherduck_dbinstance_inactivity_ttl='0s'");
```

## Connect to multiple databases

If you need to connect to MotherDuck and run one or more queries in succession on the same account, you can use a [single database connection](#create-a-connection). If you want to connect to another database in the same account, you can either [reuse the same connection](#example-1-reuse-the-same-duckdb-connection), or [create copies](#example-2-create-copies-of-the-initial-duckdb-connection) of the connection.

If you need to connect to multiple databases, you can either directly reuse the same `DuckDBPyConnection` instance, or create copies of the connection using the `.cursor()` method.

:::note
`FROM tbl` is a shorthand version of `SELECT * FROM tbl`.
:::

### Example 1: Reuse the same DuckDB connection

To connect to different databases in the same MotherDuck account, you can use the same connection object and fully qualify the names of the tables in your query.

```python
conn = duckdb.connect("md:my_db")

res1 = conn.sql("FROM my_db1.main.tbl")
res2 = conn.sql("FROM my_db2.main.tbl")
res3 = conn.sql("FROM my_db3.main.tbl")

conn.close()
```

### Example 2: Create copies of the initial DuckDB connection

`conn.cursor()` returns a copy of the DuckDB connection, with a reference to the existing DuckDB database instance. Closing the original connection also closes all associated cursors.

```python
conn = duckdb.connect("md:my_db")

cur1 = conn.cursor()
cur2 = conn.cursor()
cur3 = conn.cursor()

cur1.sql("USE my_db1")
cur2.sql("USE my_db2")
cur3.sql("USE my_db3")

res = []
for cur in [cur1, cur2, cur3]:
    res.append(cur.sql("SELECT * FROM tbl"))

# This closes the original DuckDB connection and all cursors
conn.close()
```

:::note
`duckdb.connect(path)` creates and caches a DuckDB instance. Subsequent calls with the same path reuse this instance. New connections to the same instance are independent, similar to `conn.cursor()`, but closing one doesn't affect others. To create a new instance instead of using the cached one, make the path unique (e.g., `md:my_db?user=1`).
:::

### Example 3: Create multiple connections

You can also create multiple connections to the same MotherDuck account using different DuckDB instances. However, each connection takes time to establish, so if connection time matters for your application, consider [Example 1](#example-1-reuse-the-same-duckdb-connection) or [Example 2](#example-2-create-copies-of-the-initial-duckdb-connection) instead.
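When separate instances of the same database are needed, the paths have to differ, since the instance cache is keyed by the path. As a rough sketch, assuming the `user` query parameter is used purely as a uniqueness suffix, the paths could be generated like this. `unique_instance_paths` is a hypothetical helper, not a DuckDB or MotherDuck function.

```python
from typing import List


def unique_instance_paths(database: str, count: int) -> List[str]:
    """Build `count` connection strings that differ only in the `user` parameter.

    Hypothetical helper for illustration; each distinct path maps to its own
    cached DuckDB instance because the cache is keyed by the path.
    """
    return [f"md:{database}?user={i}" for i in range(1, count + 1)]


paths = unique_instance_paths("my_db", 3)
print(paths)
```

Each path could then be passed to its own `duckdb.connect(...)` call. Every distinct path pays the full connection cost once, which is the trade-off this example describes.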
:::note If you need to run queries on separate connections in quick succession, instead of opening and closing a connection for every query, we recommend using a Connection Pool ([Python](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python#connection-pooling), [JDBC](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-jdbc#connection-pooling) or [Node.js](/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-nodejs#connection-pooling)). ::: ```python conn1 = duckdb.connect("md:my_db1") conn2 = duckdb.connect("md:my_db2") conn3 = duckdb.connect("md:my_db3") res1 = conn1.sql("SELECT * FROM tbl") res2 = conn2.sql("SELECT * FROM tbl") res3 = conn3.sql("SELECT * FROM tbl") conn1.close() conn2.close() conn3.close() ``` If you need to connect to multiple databases, you typically won't need to create multiple DuckDB instances. You can either directly reuse the same `DuckDBConnection` instance, or create copies of the connection using the `.duplicate()` method. ```java // Create connection with your MotherDuck token Properties props = new Properties(); props.setProperty("motherduck_token", ""); try (DuckDBConnection duckdbConn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:md:my_db", props)) { Connection conn1 = duckdbConn.duplicate(); Connection conn2 = duckdbConn.duplicate(); Connection conn3 = duckdbConn.duplicate(); // ... } ``` If you need to connect to multiple databases, you can re-use the same `DuckDBInstance` and connection. Use `fromCache` to ensure the instance is reused efficiently. 
```javascript import { DuckDBInstance } from '@duckdb/node-api'; const instance = await DuckDBInstance.fromCache('md:', { motherduck_token: '', }); const conn = await instance.connect(); const result1 = await conn.runAndReadAll('FROM my_db1.main.tbl'); const result2 = await conn.runAndReadAll('FROM my_db2.main.tbl'); ``` --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-jdbc --- sidebar_position: 2 title: Multithreading and parallelism with JDBC sidebar_label: JDBC description: Performance tuning via multithreading with multiple connections to MotherDuck with JDBC --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import useBaseUrl from '@docusaurus/useBaseUrl'; # Multithreading and parallelism with JDBC Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create [connections in multiple threads](#connections-in-multiple-threads). For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling). If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token. ## Connections in multiple threads If you have multiple parallelizable queries you want to run in quick succession, you could benefit from concurrency. :::note Concurrency is supported by DuckDB, across multiple threads, as described in the [Concurrency](https://duckdb.org/docs/connect/concurrency.html) documentation page. However, be mindful when using this approach, as parallelism does not always lead to better performance. 
Read the notes on [Parallelism](https://duckdb.org/docs/guides/performance/how_to_tune_workloads.html#parallelism-multi-core-processing) in the DuckDB documentation to understand the specific scenarios in which concurrent queries can be beneficial. ::: First, let's create a class `MultithreadingExample` and get the MotherDuck token from your environment variables. ```java package com.example; import org.duckdb.DuckDBConnection; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import java.sql.*; import java.util.ArrayList; import java.util.List; import java.util.Properties; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.TimeUnit; /** * Examples for multithreading and connection pooling */ public class MultithreadingExample { private static final String token = System.getenv("motherduck_token"); private final static Logger logger = LoggerFactory.getLogger(MultithreadingExample.class); ``` To use multiple threads, pass the connection object to each thread, and create a copy of the connection with the `.duplicate()` method to run a query: ```java private static void runQueryFromThread(String label, Connection conn, String query) { try (Connection dupConn = ((DuckDBConnection) conn).duplicate(); Statement st = dupConn.createStatement(); ResultSet rs = st.executeQuery(query)) { if (rs.next()) { logger.info("{}: found at least one row", label); } else { logger.info("{}: no rows found", label); } } catch (SQLException e) { throw new RuntimeException("can't run query", e); } } ``` You can then use a thread pool executor to run the queries using the `runQueryFromThread` method: ```java public static void main(String[] args) throws SQLException, InterruptedException { // Check that a motherduck_token exists if (token == null) { throw new IllegalArgumentException( "Please provide `motherduck_token` environment variable"); } // Add MotherDuck token to config Properties config = new Properties(); 
        config.setProperty("motherduck_token", token);

        // Create list of queries to run in multiple threads
        List<String> queries = new ArrayList<>();
        queries.add("SELECT 42;");
        queries.add("SELECT 'Hello World!';");
        int num_queries = queries.size();

        // Create thread pool executor and run queries
        ExecutorService executor = Executors.newFixedThreadPool(num_queries);
        // Declared outside the try block so it is still in scope afterwards
        boolean success = false;
        try (Connection mdConn = DriverManager.getConnection("jdbc:duckdb:md:my_db", config)) {
            for (int i = 0; i < num_queries; i++) {
                String label = "query " + i;
                String query = queries.get(i);
                executor.submit(() -> runQueryFromThread(label, mdConn, query));
            }
            // Wait for the submitted queries to finish before the connection is closed
            executor.shutdown();
            success = executor.awaitTermination(30, TimeUnit.SECONDS);
        }

        if (success) {
            logger.info("successfully ran {} queries in threads", num_queries);
        }
    }
}
```

## Connection pooling

If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts.

To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.

For connection pools, we recommend using [HikariCP](https://github.com/brettwooldridge/HikariCP). Below is an example implementation. For this implementation, you can connect to a user account by providing a `motherduck_token` in your database path.

The goal of this implementation is to distribute operations across multiple databases in a round-robin fashion. This `HikariMultiPoolManager` class manages multiple `HikariDataSource`s (connection pools), each of which connects to a different connection URL, and rotates between them when `getConnection()` is called.
You can specify a pool size which is applied to all `HikariDataSource`s.

```java
package com.example;

import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import org.duckdb.DuckDBConnection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Example DuckDB connection pool implementation
 */
public class HikariMultiPoolManager implements AutoCloseable {

    private static final String token = System.getenv("motherduck_token");
    private final List<HikariDataSource> dataSources;
    private final AtomicInteger index;
    private final static Logger logger = LoggerFactory.getLogger(HikariMultiPoolManager.class);

    public HikariMultiPoolManager(List<String> urls, int maximumPoolSize) {
        // Create Hikari datasources from urls
        this.dataSources = new ArrayList<>();
        for (String url : urls) {
            HikariDataSource ds = new HikariDataSource();
            ds.setMaximumPoolSize(maximumPoolSize);
            ds.setJdbcUrl(url);
            dataSources.add(ds);
        }
        this.index = new AtomicInteger(0);
    }

    public Connection getConnection() throws SQLException {
        // floorMod keeps the index non-negative even after the counter overflows
        int ind = Math.floorMod(index.getAndIncrement(), dataSources.size());
        HikariDataSource ds = dataSources.get(ind);
        return ds.getConnection();
    }

    public void evict() throws Exception {
        for (HikariDataSource ds : dataSources) {
            HikariPoolMXBean poolBean = ds.getHikariPoolMXBean();
            if (poolBean != null) {
                poolBean.softEvictConnections();
            }
        }
    }

    @Override
    public void close() throws Exception {
        for (HikariDataSource ds : dataSources) {
            ds.close();
        }
    }
```

### How to set `urls`

The `HikariMultiPoolManager` takes a list of `urls` and a `maximumPoolSize`. Each URL in the list gets its own `HikariDataSource` in the pool, which readers can use to query the database(s) it connects to.
If you have a `maximumPoolSize` that is larger than 1, the pool will return thread-safe copies of those connections. This gives you a few options on how to configure the pool.

:::note
To learn more about database instances and connections, see [Connect to multiple databases](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#connect-to-multiple-databases).
:::

To create a connection pool with 3 connections to **the same database**, you can pass a single database path, and set `maximumPoolSize=3`:

```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 3);
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.

You can also create multiple connections to **the same database** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths and make them unique by adding a distinct `&user=` value (for example, `&user=1`) to each database path:

```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=1");
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=2");
urls.add("jdbc:duckdb:md:my_db?motherduck_token=" + token + "&access_mode=read_only&user=3");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.

You can also create multiple connections to **separate databases** in **the same MotherDuck account** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish.
Create multiple paths where each uses a different database path:

```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db1?motherduck_token=" + token + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db2?motherduck_token=" + token + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db3?motherduck_token=" + token + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.

You can also create multiple connections to **separate databases** in **separate MotherDuck accounts** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path and the token of the account that owns it:

```java
List<String> urls = new ArrayList<>();
urls.add("jdbc:duckdb:md:my_db1?motherduck_token=" + token1 + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db2?motherduck_token=" + token2 + "&access_mode=read_only");
urls.add("jdbc:duckdb:md:my_db3?motherduck_token=" + token3 + "&access_mode=read_only");
HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1);
```

Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `maximumPoolSize` is larger than the number of databases.

### How to run queries with a thread pool

You can then fetch connections from the pool, for example, to run queries from a queue. You can use a fixed-size `ExecutorService` with one worker per URL to fetch connections from the pool and run the queries using a `queryString` method:

```java
    private static String queryString(HikariMultiPoolManager pool, String query) throws SQLException {
        try (Connection conn = pool.getConnection();
             Statement ps = conn.createStatement();
             ResultSet rs = ps.executeQuery(query)) {
            logger.info("connection = {}", conn);
            String res = rs.next() ?
                    rs.getString(1) : "[not found]";
            logger.info("Got: {}", res);
            return res;
        }
    }

    public static void main(String[] args) throws Exception {
        if (token == null) {
            throw new IllegalArgumentException(
                    "Please provide `motherduck_token` environment variable");
        }

        List<String> queries = new ArrayList<>();
        // Add queries here
        // Example:
        queries.add("SELECT 42;");
        queries.add("SELECT 'Hello World!';");

        List<String> urls = new ArrayList<>();
        // Add urls here
        // Example:
        urls.add("jdbc:duckdb:md:my_db?user=1&motherduck_token=" + token);
        urls.add("jdbc:duckdb:md:my_db?user=2&motherduck_token=" + token);
        urls.add("jdbc:duckdb:md:my_db?user=3&motherduck_token=" + token);

        // Create thread pool and run queries
        try (HikariMultiPoolManager pool = new HikariMultiPoolManager(urls, 1)) {
            ExecutorService executor = Executors.newFixedThreadPool(urls.size());
            for (String query : queries) {
                executor.submit(() -> queryString(pool, query));
            }
            executor.shutdown();
            boolean success = executor.awaitTermination(30, TimeUnit.SECONDS);
            if (success) {
                logger.info("successfully ran {} queries in threads with connection pool", queries.size());
            }
        }
    }
}
```

Reset the connection pool at least once every 24 hours by soft-evicting all connections. This ensures that you are always running on the latest version of MotherDuck.

```java
pool.evict();
```

--- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-nodejs --- sidebar_position: 3 title: Multithreading and parallelism with Node.js sidebar_label: Node.js description: Performance tuning via multithreading with multiple connections to MotherDuck with Node.js --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import useBaseUrl from '@docusaurus/useBaseUrl';

# Multithreading and parallelism with Node.js

For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling).
Depending on the needs of your data application, you can use thread-based parallelism for improved performance, for example, if the queries are hybrid with CPU intensive work done locally. To enable thread-based parallelism, you can use [Node worker threads](https://nodejs.org/api/worker_threads.html#worker-threads) with one database connection in each thread. If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token. ## Connection pooling If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use, so you can avoid the overhead of creating a new database object for each query. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts. To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token. For connection pools, we recommend using [generic-pool](https://www.npmjs.com/package/generic-pool) with [@duckdb/node-api](https://www.npmjs.com/package/@duckdb/node-api) and overriding the `release` function to delete a connection if it's been in use for too long to optimize resource usage. First, let's create a file `md_connection_pool.js` to implement the connection pool class. Note that we are adding a new config option, `recycleTimeoutMillis`, that will help us recreate any connections (active or idle) that have been open for a given time. This is different from `idleTimeoutMillis`, which only destroys idle connections. 
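To make the difference from `idleTimeoutMillis` concrete, the age check can be written as a small standalone function. This is only a TypeScript sketch: `shouldRecycle`, `createdAtMs`, and `nowMs` are hypothetical names that are not part of `generic-pool`, and the function assumes timestamps in epoch milliseconds.

```typescript
// Hypothetical helper for illustration; not part of generic-pool or @duckdb/node-api.
// Decides whether a connection is old enough to be destroyed instead of being
// returned to the pool. Timestamps are epoch milliseconds.
function shouldRecycle(
  createdAtMs: number,
  nowMs: number,
  recycleTimeoutMillis?: number,
): boolean {
  // Assumption: an unset or unparsable timeout disables recycling entirely.
  if (recycleTimeoutMillis === undefined || Number.isNaN(recycleTimeoutMillis)) {
    return false;
  }
  // Only the age of the connection matters, not how long it has been idle.
  return createdAtMs + recycleTimeoutMillis <= nowMs;
}

// A connection created 16 minutes ago with a 15 minute recycle timeout.
const sixteenMinutesMs = 16 * 60 * 1000;
console.log(shouldRecycle(0, sixteenMinutesMs, 900000));
```

Because the check looks only at the age of the connection, it also applies to a connection that has been busy the whole time, which an idle timeout alone would never catch.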
```javascript import { DuckDBInstance } from "@duckdb/node-api"; import * as genericPool from "generic-pool"; export class RecyclingPool extends genericPool.Pool { constructor(Evictor, Deque, PriorityQueue, factory, options) { super(Evictor, Deque, PriorityQueue, factory, options); // New _config option for when to recycle a non-idle connection this._config['recycleTimeoutMillis'] = (typeof options.recycleTimeoutMillis == 'undefined') ? undefined : parseInt(options.recycleTimeoutMillis); this._config['motherduckToken'] = (typeof options.motherduckToken == 'undefined') ? undefined : options.motherduckToken; console.log('Creating a RecyclingPool'); } release(resource) { const loan = this._resourceLoans.get(resource); const creationTime = typeof loan == 'undefined' ? 0 : loan.pooledResource.creationTime; // If the connection has been in use for longer than the recycleTimeoutMillis, then destroy it instead of releasing it back into the pool. // If that deletion brings the pool size below the min, a new connection will automatically be created within the destroy method. if (new Date(creationTime + this._config.recycleTimeoutMillis) <= new Date()) { return this.destroy(resource); } return super.release(resource); } } ``` You can then create an `MDFactory` class to create the connection in the pool, and use it with `createRecyclingPool` (equivalent to the `createPool` function from `generic-pool`). 
```javascript
export class MDFactory {
  constructor(opts) {
    this.opts = opts;
  }

  async create() {
    console.log("Creating a connection");
    const instance = await DuckDBInstance.fromCache('md:my_db', {
      motherduck_token: this.opts.motherduckToken,
    });
    const connection = await instance.connect();
    // Run any connection initialization commands here
    // For example, you can set THREADS = 1 if you want to limit duckdb to run on a single thread
    await connection.run("SET THREADS='1';");
    return connection;
  }

  async destroy(connection) {
    console.log("Destroying a connection");
    // @duckdb/node-api exposes a synchronous close on connections
    return connection.closeSync();
  }
}

export function createRecyclingPool(config) {
  const factory = new MDFactory(config);
  return new RecyclingPool(genericPool.DefaultEvictor, genericPool.Deque, genericPool.PriorityQueue, factory, config);
}
```

To try out the connection pool, you can create a file `md_connection_pool_test.js` that creates a `RecyclingPool` and submits a list of queries.

To create the pool instance, first set the configuration options specified by `generic-pool` and pass them to the `createRecyclingPool` function. You can find the list of options in the [docs](https://www.npmjs.com/package/generic-pool). Below are a few example values that we recommend for use with MotherDuck.

```javascript
import { createRecyclingPool } from "./md_connection_pool.js";

// If an idle eviction would bring us below the min pool size, a new connection is made after the eviction
const opts = {
  max: 10,
  min: 3,
  // Background idle connection destruction process runs every evictionRunIntervalMillis
  // We don't want all connections to be evicted at the same time, so only destroy one at a time
  // Connection must be idle for softIdleTimeoutMillis before it is recycled.
  // (Additionally, we implemented recycleTimeoutMillis to also recycle active connections.)
  evictionRunIntervalMillis: 30000,
  numTestsPerEvictionRun: 1,
  softIdleTimeoutMillis: 90000,
  // Do not start to use a connection that is more than 20 minutes old.
  // Recreate it first.
  // Set this higher than recycleTimeoutMillis below so that recycling will happen proactively rather than delay query execution.
  idleTimeoutMillis: 1200000,
  // Before returning resource to pool, check if it has been in existence longer than this timeout and if so, destroy it.
  // New connections will be added up to the min pool size during the destroy process, so this is proactive rather than reactive.
  recycleTimeoutMillis: 900000,
  // We don't want all the connections to recycle at the same time, so let's randomize it slightly.
  // This number should be smaller than the recycleTimeoutMillis
  recycleTimeoutJitter: 60000,
  // This gets your MotherDuck token from an environment variable.
  motherduckToken: process.env.motherduck_token,
};

const myPool = createRecyclingPool(opts);
```

Then, you can use the pool to asynchronously acquire connections and run a list of queries.

```javascript
const promiseArray = [];
const queries = ["SELECT 42", "SELECT 'Hello World!'"];

for (let i = 0; i < queries.length; i++) {
  // Promise is resolved once a resource becomes available
  console.log("Acquire connection from pool");
  promiseArray.push(myPool.acquire());
  promiseArray[i]
    .then(async function (client) {
      try {
        console.log("Starting query");
        // runAndReadAll() runs the query and reads the complete result set
        const reader = await client.runAndReadAll(queries[i]);
        console.log("Results: ", reader.getRowObjects()[0]);
        await new Promise((r) => setTimeout(r, 200)); // Delay for testing
      } finally {
        // Release the connection (or destroy if it exceeds recycleTimeoutMillis),
        // even when the query fails
        myPool.release(client);
      }
    })
    .catch(function (err) {
      console.log(err);
    });
}
```

You can create additional connection pools that connect to databases in a different MotherDuck account by changing the MotherDuck token.
```javascript const opts2 = { ...opts, motherduckToken: process.env.motherduck_token_2}; const myPool2 = createRecyclingPool(opts2); ``` To shutdown and stop using a pool, you can optionally run the following code in your application: ```javascript myPool.drain().then(function() { myPool.clear(); }); ``` To test the pool, run: ```bash npm install @duckdb/node-api npm install generic-pool export motherduck_token="" # Add your MotherDuck token here node md_connection_pool_test.js ``` --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python --- sidebar_position: 1 title: Multithreading and parallelism with Python sidebar_label: Python description: Performance tuning via multithreading with multiple connections to MotherDuck with Python --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import useBaseUrl from '@docusaurus/useBaseUrl'; # Multithreading and parallelism with Python Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create [connections in multiple threads](#connections-in-multiple-threads). For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use [connection pooling](#connection-pooling). If you need to run many concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token. ## Connections in multiple threads If you have multiple parallelizable queries you want to run in quick succession, you could benefit from concurrency. :::note Concurrency is supported by DuckDB, across multiple Python threads, as described in the [Multiple Python Threads](https://duckdb.org/docs/guides/python/multiple_threads.html) documentation page. 
However, be mindful when using this approach, as parallelism does not always lead to better performance. Read the notes on [Parallelism](https://duckdb.org/docs/guides/performance/how_to_tune_workloads.html#parallelism-multi-core-processing) in the DuckDB documentation to understand the specific scenarios in which concurrent queries can be beneficial.
:::

A single DuckDB connection [is not thread-safe](https://duckdb.org/docs/api/python/overview.html#using-connections-in-parallel-python-programs). To use multiple threads, pass the connection object to each thread, and create a copy of the connection with the `.cursor()` method to run a query:

```python
import duckdb
from threading import Thread

duckdb_con = duckdb.connect('md:my_db')


def query_from_thread(duckdb_con, query):
    # Each thread works on its own cursor, a thread-local copy of the connection
    cur = duckdb_con.cursor()
    result = cur.execute(query).fetchall()
    print(result)
    cur.close()


queries = ["SELECT 42", "SELECT 'Hello World!'"]

threads = []
for i, query in enumerate(queries):
    threads.append(Thread(target=query_from_thread,
                          args=(duckdb_con, query),
                          name='query_' + str(i)))

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()
```

## Connection pooling

If your application needs multiple read-only connections to a MotherDuck database, for example, to handle requests in a queue, you can use a Connection Pool. A Connection Pool keeps connections open for a longer period for efficient re-use. The connections in your pool can connect to one database in the same MotherDuck account, or multiple databases in one or more accounts.

To run concurrent read-only queries on the same MotherDuck account, you can use a [Read Scaling](/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) token.

For connection pools, we recommend using [SQLAlchemy](https://docs.sqlalchemy.org/14/core/pooling.html). Below is an example implementation. For this implementation, you can connect to a user account by providing a `motherduck_token` in your database path.
```python import logging from itertools import cycle from threading import Lock import duckdb import sqlalchemy.pool as pool from sqlalchemy.engine import make_url _log = logging.getLogger(__name__) logging.basicConfig(level=logging.DEBUG) class DuckDBPool(pool.QueuePool): """Connection pool for DuckDB databases (MD or local). When you run con = pool.connect(), it will return a cached copy of one of the database connections in the pool. When you run con.close(), it doesn't close the connection, it just returns it to the pool. Args: database_paths: A list of unique databases to connect to. """ def __init__( self, database_paths, max_overflow=0, timeout=60, reset_on_return=None, *args, **kwargs ): self.database_paths = database_paths self.gen_database_path = cycle(database_paths) self.pool_size = kwargs.pop("pool_size", len(database_paths)) self.lock = Lock() super().__init__( self._next_conn, *args, max_overflow=max_overflow, pool_size=self.pool_size, reset_on_return=reset_on_return, timeout=timeout, **kwargs ) def _next_conn(self): with self.lock: path = next(self.gen_database_path) duckdb_conn = duckdb.connect(path) url = make_url(f"duckdb:///{path}") _log.debug(f"Connected to database: {url.database}") return duckdb_conn ``` ### How to set `database_paths` The `DuckDBPool` takes a list of `database_paths` and an optional input argument `pool_size` (defaults to the number of paths). Each path in the list will get a DuckDB connection in the pool, that readers can use to query the database(s) they connect to. If you have a `pool_size` that is larger than the number of paths, the pool will return thread-safe copies of those connections. This gives you a few options on how to configure the pool. :::note To learn more about database instances and connections, see [Connect to multiple databases](/docs/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#connect-to-multiple-databases). 
::: To create a connection pool with 3 connections to **the same database**, you can pass a single database path, and set `pool_size=3`: ```python path = "md:my_db?motherduck_token=&access_mode=read_only" conn_pool = DuckDBPool([path], pool_size=3) ``` Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases. You can also create multiple connections to **the same database** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths and make them unique by adding `&user=` to the database path: ```python paths = [ "md:my_db?motherduck_token=&access_mode=read_only&user=1", "md:my_db?motherduck_token=&access_mode=read_only&user=2", "md:my_db?motherduck_token=&access_mode=read_only&user=3", ] conn_pool = DuckDBPool(paths) ``` Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases. You can also create multiple connections to **separate databases** in **the same MotherDuck account** using **different DuckDB instances**. However, keep in mind that each connection takes time to establish. Create multiple paths where each uses a different database path: ```python paths = [ "md:my_db1?motherduck_token=&access_mode=read_only", "md:my_db2?motherduck_token=&access_mode=read_only", "md:my_db3?motherduck_token=&access_mode=read_only", ] conn_pool = DuckDBPool(paths) ``` Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases. You can also create multiple connections to **separate databases** in **separate MotherDuck accounts** using *different DuckDB instances*. However, keep in mind that each connection takes time to establish. 
Create multiple paths where each uses a different database path: ```python paths = [ "md:my_db1?motherduck_token=&access_mode=read_only", "md:my_db2?motherduck_token=&access_mode=read_only", "md:my_db3?motherduck_token=&access_mode=read_only", ] conn_pool = DuckDBPool(paths) ``` Set `access_mode=read_only` to avoid potential write conflicts. This is especially important if your `pool_size` is larger than the number of databases. ### How to run queries with a thread pool You can then fetch connections from the pool, for example, to run queries from a queue. You can use a `ThreadPoolExecutor` with 3 workers to fetch connections from the pool and run the queries using a `run_query` function: ```python from concurrent.futures import ThreadPoolExecutor def run_query(conn_pool: DuckDBPool, query: str): _log.debug(f"Run query: {query}") conn = conn_pool.connect() rows = conn.execute(query) res = rows.fetchall() conn.close() _log.debug(f"Done running query: {query}") return res with ThreadPoolExecutor(max_workers=3) as executor: conn_pool = DuckDBPool(database_paths) futures = [executor.submit(run_query, conn_pool, query) for query in queries] for future, query in zip(futures, queries): result = future.result() print(f"Query [{query}] num rows: {len(result)}") ``` Reset the connection pool at least once every 24 hours, by closing and reopening all connections. This ensures that you are always running on the latest version of MotherDuck. ```python conn_pool.dispose() conn_pool.recreate() ``` --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism --- title: Multithreading and parallelism description: Learn how to use multithreading and parallelism for special cases to read data from MotherDuck --- DuckDB supports two concurrency models: - Single-process read/write where one process can both read and write to the database. 
- Multi-process read-only (access_mode = 'READ_ONLY') multiple processes can read from the database, but none can write. This approach provides significant performance benefits for analytics databases. You can find more details on how to handle multiple process writes (or multiple read + write connections) in the [DuckDB documentation](https://duckdb.org/docs/stable/connect/concurrency.html). ## Multi-threading and parallelism in different languages Depending on the needs of your data application, you can use multithreading for improved performance. If your queries will benefit from concurrency, you can create connections in multiple threads. This is the case when for example you have multiple users reading different sets of data, or if you are reading from separate tables or data files at the same time. For multiple long-lived connections to one or more databases in one or more MotherDuck accounts, you can use connection pooling. Implementation details can be seen in the cards linked below: ## Included pages - [Multithreading and parallelism with Python](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python): Performance tuning via multithreading with multiple connections to MotherDuck with Python - [Multithreading and parallelism with JDBC](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-jdbc): Performance tuning via multithreading with multiple connections to MotherDuck with JDBC - [Multithreading and parallelism with Node.js](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-nodejs): Performance tuning via multithreading with multiple connections to MotherDuck with Node.js --- Source: 
https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/cloudflare-workers --- sidebar_position: 5 title: Connect from Cloudflare Workers description: Query MotherDuck from Cloudflare Workers using the Postgres wire protocol feature_stage: preview --- Cloudflare Workers do not support native DuckDB bindings, but they can connect to MotherDuck through the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) using the [`pg`](https://www.npmjs.com/package/pg) npm package. This gives you a thin-client path to query MotherDuck from edge functions without any DuckDB dependencies. This guide walks through building a Worker that queries NYC taxi data from MotherDuck's built-in `sample_data` database. The full source code is available in the [motherduck-examples](https://github.com/motherduckdb/motherduck-examples/tree/main/cloudflare-workers) repository. ## Prerequisites - [Node.js](https://nodejs.org/) v18+ - A [Cloudflare account](https://dash.cloudflare.com/sign-up) - A [MotherDuck account](https://motherduck.com/) and [access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) ## Project setup Create a new directory and install dependencies: ```bash mkdir motherduck-worker && cd motherduck-worker npm init -y npm install pg@^8.16.3 npm install --save-dev wrangler @types/pg ``` ### Configure wrangler.toml ```toml name = "motherduck-taxi-stats" main = "src/index.ts" compatibility_date = "2026-04-02" compatibility_flags = ["nodejs_compat"] [vars] MOTHERDUCK_HOST = "pg.us-east-1-aws.motherduck.com" MOTHERDUCK_DB = "sample_data" ``` The `nodejs_compat` flag is required — it enables the `node:net` module that the `pg` package uses for TCP connections. Use a `compatibility_date` on or after `2024-09-23`; in practice, set it to today's date when you create the project. 
Generate the Worker binding types after you save `wrangler.toml`:

```bash
npx wrangler types
```

### Store your token as a secret

```bash
npx wrangler secret put MOTHERDUCK_TOKEN
```

This prompts you to paste your MotherDuck token. It's stored encrypted and injected as an environment variable at runtime, so it never appears in your source code or `wrangler.toml`.

For local development, create a `.dev.vars` file (add this to `.gitignore`):

```text
MOTHERDUCK_TOKEN="your_token_here"
```

## Write the Worker

Create `src/index.ts`. We'll build this in two parts: first the connection and routing, then the route handlers.

### Connect and route requests

```typescript
import { Client } from "pg";

interface Env {
  MOTHERDUCK_HOST: string;
  MOTHERDUCK_DB: string;
  MOTHERDUCK_TOKEN: string;
}

function createClient(env: Env): Client {
  return new Client({
    connectionString: `postgresql://user:${env.MOTHERDUCK_TOKEN}@${env.MOTHERDUCK_HOST}:5432/${env.MOTHERDUCK_DB}?sslmode=require`,
  });
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === "/stats") {
      return handleStats(env, url);
    }

    return handleDefault(env);
  },
};
```

The connection string is assembled from the environment variables defined in `wrangler.toml` and the secret token. The `?sslmode=require` parameter tells `pg` to open a TLS connection, and the Workers runtime performs certificate verification.

The `fetch` handler routes first and opens a database connection only inside the route handlers. As a result, validation failures on `/stats` return `400` regardless of database connectivity.

### Handle route logic

Add the two handler functions to the same file. The `/stats` route accepts date range parameters and returns aggregated fare data. It validates inputs before querying and uses parameterized queries (`$1`, `$2`) to prevent SQL injection. Never interpolate user input directly into SQL strings.
```typescript
async function handleStats(env: Env, url: URL): Promise<Response> {
  const startDate = url.searchParams.get("start");
  const endDate = url.searchParams.get("end");

  if (!startDate || !endDate) {
    return Response.json(
      { error: "Both 'start' and 'end' query parameters are required. Use YYYY-MM-DD format." },
      { status: 400 }
    );
  }

  const datePattern = /^\d{4}-\d{2}-\d{2}$/;
  if (!datePattern.test(startDate) || !datePattern.test(endDate)) {
    return Response.json(
      { error: "Invalid date format. Use YYYY-MM-DD." },
      { status: 400 }
    );
  }

  // Connect only after the inputs have passed validation
  const client = createClient(env);
  try {
    await client.connect();
    const result = await client.query(
      `SELECT
         sum(passenger_count)::INTEGER AS total_passengers,
         round(sum(fare_amount), 2) AS total_fare
       FROM nyc.taxi
       WHERE tpep_pickup_datetime >= $1
         AND tpep_pickup_datetime < $2`,
      [`${startDate} 00:00:00`, `${endDate} 00:00:00`]
    );
    return Response.json({
      start: startDate,
      end: endDate,
      ...result.rows[0],
    });
  } finally {
    await client.end();
  }
}
```

The default route returns a sample of recent taxi trips and needs no user input:

```typescript
async function handleDefault(env: Env): Promise<Response> {
  const client = createClient(env);
  try {
    await client.connect();
    const result = await client.query(
      `SELECT
         tpep_pickup_datetime AS pickup,
         tpep_dropoff_datetime AS dropoff,
         passenger_count,
         trip_distance,
         fare_amount,
         tip_amount,
         total_amount
       FROM nyc.taxi
       ORDER BY tpep_pickup_datetime DESC
       LIMIT 20`
    );
    return Response.json(result.rows);
  } finally {
    await client.end();
  }
}
```

## Test locally

```bash
npx wrangler dev
```

Then open `http://localhost:8787/` or try the stats endpoint with a date range:

```text
http://localhost:8787/stats?start=2022-11-01&end=2022-12-01
```

If `wrangler dev` starts successfully but direct Postgres queries fail locally with `Connection terminated`, switch to the Hyperdrive setup below and use a `localConnectionString` for local testing, or run `npx wrangler dev --remote` to exercise the Cloudflare runtime directly.
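The input checks in `handleStats` can also be exercised without a database or `wrangler dev`. The following is a minimal sketch, assuming a hypothetical `validateDateRange` helper that is not part of the example repository. It goes slightly beyond the regular expression used in the Worker by rejecting impossible calendar dates such as `2022-13-45` and by requiring `start` to come before `end`; both rules are assumptions about what a stricter Worker might want.

```typescript
// Hypothetical helper for illustration; the names below are not part of the Worker example.
type DateRangeResult =
  | { ok: true; start: string; end: string }
  | { ok: false; error: string };

const DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

// True only for real calendar dates in YYYY-MM-DD form.
function isCalendarDate(value: string): boolean {
  if (!DATE_PATTERN.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  if (Number.isNaN(parsed.getTime())) return false;
  // Some engines roll 2022-02-30 over into March; the round trip catches that.
  return parsed.toISOString().slice(0, 10) === value;
}

function validateDateRange(start: string | null, end: string | null): DateRangeResult {
  if (!start || !end) {
    return {
      ok: false,
      error: "Both 'start' and 'end' query parameters are required. Use YYYY-MM-DD format.",
    };
  }
  if (!isCalendarDate(start) || !isCalendarDate(end)) {
    return { ok: false, error: "Invalid date format. Use YYYY-MM-DD." };
  }
  // Assumption: an empty or reversed range is treated as a client error.
  // ISO dates in YYYY-MM-DD form compare correctly as plain strings.
  if (start >= end) {
    return { ok: false, error: "'start' must be earlier than 'end'." };
  }
  return { ok: true, start, end };
}

const checked = validateDateRange("2022-11-01", "2022-12-01");
console.log(checked.ok ? "range accepted" : checked.error);
```

In a Worker, a failed result would map to a `400` response with the `error` message, and a successful one would supply the two values bound to `$1` and `$2`.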
## Deploy ```bash npx wrangler deploy ``` ## Using Hyperdrive for connection pooling For production workloads, [Cloudflare Hyperdrive](https://developers.cloudflare.com/hyperdrive/) provides built-in connection pooling. This reduces latency by reusing connections across Worker invocations instead of opening a new connection per request. ### 1. create a Hyperdrive configuration ```bash npx wrangler hyperdrive create motherduck-db \ --connection-string="postgresql://user:$MOTHERDUCK_TOKEN@pg.us-east-1-aws.motherduck.com:5432/sample_data?sslmode=require" ``` ### 2. update wrangler.toml ```toml name = "motherduck-taxi-stats" main = "src/index.ts" compatibility_date = "2026-04-02" compatibility_flags = ["nodejs_compat"] [[hyperdrive]] binding = "MD_HYPERDRIVE" id = "" ``` ### 3. update the connection code Replace the connection string construction with: ```typescript const client = new Client({ connectionString: env.MD_HYPERDRIVE.connectionString, }); ``` Hyperdrive handles connection pooling and credential injection automatically. For local development with Hyperdrive, configure a direct connection string for `wrangler dev`: ```bash export CLOUDFLARE_HYPERDRIVE_LOCAL_CONNECTION_STRING_MD_HYPERDRIVE="postgresql://user:$MOTHERDUCK_TOKEN@pg.us-east-1-aws.motherduck.com:5432/sample_data?sslmode=require" npx wrangler dev ``` ## SSL notes Cloudflare Workers use `pg-cloudflare` for socket connections, which delegates TLS to the Workers runtime through `cloudflare:sockets`. The runtime encrypts the connection and verifies the server certificate against Cloudflare's trust store, but those verification settings are not exposed through the `pg` client. In this environment, application code uses the runtime-managed TLS configuration rather than supplying `rejectUnauthorized`, custom CA certificates, or `sslmode=verify-full`. Use `?sslmode=require` in the connection string. 
This tells `pg` to initiate TLS using STARTTLS, and the Workers runtime handles the actual certificate verification at the socket level. For standard Node.js environments where you can configure certificate verification directly, see [Connect from Node.js](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/nodejs). --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/drizzle --- sidebar_position: 6 title: "Connect from Drizzle via Postgres endpoint" sidebar_label: Drizzle description: Use Drizzle as a typed wrapper around the pg driver to query MotherDuck via the Postgres wire protocol feature_stage: preview --- [Drizzle](https://orm.drizzle.team/) is a TypeScript ORM with both relational and SQL-like query APIs. It runs in Node.js servers, Vercel functions, Cloudflare Workers, and other edge runtimes. You can use Drizzle with MotherDuck through the Postgres endpoint. Drizzle's `drizzle-orm/node-postgres` integration wraps the `pg` driver, so you get the typed `db.execute(sql\`...\`)` API and connection lifecycle management on top of the same Postgres-protocol connection covered in [Connect from Node.js](./nodejs.md). Use Drizzle here as a **typed query executor over `pg`**, not as a schema-and-migrations ORM. Drizzle's schema introspection, code-first migrations (`drizzle-kit pull` / `migrate` / `push`), and query-builder code generation all assume a Postgres backend with `pg_catalog` and Postgres DDL semantics — none of which the pg endpoint exposes. Define your MotherDuck schema separately (DuckDB client, MotherDuck UI, or SQL scripts) and use Drizzle for query execution. For connection parameters, SSL options, and limitations, see the [Postgres Endpoint reference](/sql-reference/postgres-endpoint). ## Prerequisites You'll need a [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck). 
Set it as an environment variable: ```bash export MOTHERDUCK_TOKEN="your_token_here" ``` Install Drizzle and `pg`: ```bash npm install drizzle-orm pg npm install --save-dev @types/pg ``` ## Connect Wrap a `pg` client with `drizzle()`. As with the bare `pg` client, pass SSL through the config object — do **not** put `sslrootcert=system` in a connection string, since node-postgres tries to read `system` as a file path and throws `ENOENT`. ```ts import pg from "pg"; import { drizzle } from "drizzle-orm/node-postgres"; import { sql } from "drizzle-orm"; const client = new pg.Client({ host: "pg.us-east-1-aws.motherduck.com", port: 5432, user: "postgres", password: process.env.MOTHERDUCK_TOKEN, database: "md:", ssl: { rejectUnauthorized: true }, }); await client.connect(); const db = drizzle(client); const { rows } = await db.execute(sql` SELECT title, score FROM sample_data.hn.hacker_news WHERE type = ${'story'} LIMIT 10 `); console.log(rows); await client.end(); ``` Using `md:` as the database name connects to your default database in `workspace` [attach mode](key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/attach-modes.md), so all databases attached in your MotherDuck workspace are accessible. To connect to a specific database, pass its name in `database` (e.g., `database: "my_db"`) — this uses `single` attach mode by default. The `sql` template tag is what you'll use most. It produces parameterized queries against the pg endpoint and lets you write DuckDB SQL directly, including three-part names (`database.schema.table`), DuckDB functions, and DuckDB-specific syntax. For pure dynamic SQL with no parameters, `sql.raw("...")` works too. ## Read scaling and concurrency For concurrent workloads, MotherDuck's pg endpoint can route each session to a separate read replica using the `session_hint` startup option — this dramatically improves throughput under concurrency. 
See [Session affinity and routing](/concepts/scaling-patterns/#session-affinity-and-routing) for the underlying scaling pattern. Drizzle's `Pool` doesn't expose per-connection startup options, so for read scaling you'll want a raw `pg.Client` per session: ```ts const client = new pg.Client({ host: "pg.us-east-1-aws.motherduck.com", port: 5432, user: "postgres", password: process.env.MOTHERDUCK_TOKEN, database: "md:", ssl: { rejectUnauthorized: true }, options: "-c session_hint=user_1", // unique per concurrent session }); await client.connect(); const db = drizzle(client); ``` In benchmarking, `session_hint` cut 5-user concurrent latency from ~16s to ~1.3s on the same workload. ## What doesn't work The pg endpoint speaks DuckDB SQL, not Postgres SQL, and doesn't expose Postgres system catalogs. Drizzle features that depend on either will fail: - **`drizzle-kit migrate`, `push`, `generate`** — these execute Postgres DDL and assume Postgres migration tracking. Manage your MotherDuck schema separately. - **`drizzle-kit pull` / `introspect`** — schema introspection queries `pg_catalog` tables that don't exist on the pg endpoint. - **`pgTable(...)` schema definitions for query-builder calls** (`db.select().from(...)`) work for simple cases but are brittle: Drizzle treats the table name as a single quoted identifier, so three-part DuckDB names (`database.schema.table`) need careful handling. Prefer `db.execute(sql\`...\`)` with explicit SQL until you know the shape you need. - **Standard pg endpoint limits** — local-file `COPY`, `INSTALL` / `LOAD`, `SET`, temp tables, and result-creation commands are not supported. See the [main pg endpoint reference](/sql-reference/postgres-endpoint) for the full list. ## SSL notes Setting `ssl: { rejectUnauthorized: true }` is the equivalent of `sslmode=verify-full` with `sslrootcert=system` in libpq — node-postgres uses Node's built-in trusted root store. 
For a custom CA, see the [Node.js page](./nodejs.md#ssl-notes); the same approach applies when wrapping the client with `drizzle()`. For more details on SSL options across drivers, see [SSL and certificate verification](/sql-reference/postgres-endpoint#ssl-and-certificate-verification). --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/java --- sidebar_position: 3 title: "Connect from Java via Postgres endpoint" sidebar_label: Java (JDBC) description: Connect to MotherDuck from Java using the PostgreSQL JDBC driver via the Postgres wire protocol feature_stage: preview --- You can query MotherDuck from Java using the standard [PostgreSQL JDBC driver](https://jdbc.postgresql.org/) — no DuckDB installation required. For connection parameters, SSL options, and limitations, see the [Postgres Endpoint reference](/sql-reference/postgres-endpoint). ## Prerequisites You'll need a [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck). 
Set it as an environment variable:

```bash
export MOTHERDUCK_TOKEN="your_token_here"
```

Add the PostgreSQL JDBC driver to your project:

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

```xml
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.7.5</version>
</dependency>
```

```groovy
implementation 'org.postgresql:postgresql:42.7.5'
```

## Connect

```java
import java.sql.*;

public class MotherDuckExample {
    public static void main(String[] args) throws SQLException {
        String token = System.getenv("MOTHERDUCK_TOKEN");
        String url = "jdbc:postgresql://pg.us-east-1-aws.motherduck.com:5432/md:"
            + "?sslmode=verify-full"
            + "&sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory";

        try (Connection conn = DriverManager.getConnection(url, "postgres", token);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT title, score FROM sample_data.hn.hacker_news WHERE type='story' LIMIT 10")) {

            ResultSetMetaData meta = rs.getMetaData();
            int columnCount = meta.getColumnCount();

            while (rs.next()) {
                for (int i = 1; i <= columnCount; i++) {
                    System.out.print(meta.getColumnName(i) + "=" + rs.getString(i));
                    if (i < columnCount) System.out.print(", ");
                }
                System.out.println();
            }
        }
    }
}
```

You can also configure the connection using a `Properties` object:

```java
import java.sql.*;
import java.util.Properties;

Properties props = new Properties();
props.setProperty("user", "postgres");
props.setProperty("password", System.getenv("MOTHERDUCK_TOKEN"));
props.setProperty("sslmode", "verify-full");
props.setProperty("sslfactory", "org.postgresql.ssl.DefaultJavaSSLFactory");

Connection conn = DriverManager.getConnection(
    "jdbc:postgresql://pg.us-east-1-aws.motherduck.com:5432/md:",
    props
);
```

## SSL notes

The PostgreSQL JDBC driver looks for a root certificate at `~/.postgresql/root.crt` by default. To use your JVM's built-in truststore instead (which includes standard CAs like Let's Encrypt), set `sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory`.
If certificate verification doesn't work in your environment, you can fall back to `sslmode=require`, which encrypts the connection but doesn't verify the server certificate.

For more details on SSL options, see [SSL and certificate verification](/sql-reference/postgres-endpoint#ssl-and-certificate-verification).

---

Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/nodejs

---
sidebar_position: 4
title: "Connect from Node.js via Postgres endpoint"
sidebar_label: Node.js
description: Connect to MotherDuck from Node.js using the pg (node-postgres) library via the Postgres wire protocol
feature_stage: preview
---

You can query MotherDuck from Node.js using [node-postgres](https://node-postgres.com/) (`pg`), with no DuckDB installation required.

For connection parameters, SSL options, and limitations, see the [Postgres Endpoint reference](/sql-reference/postgres-endpoint).

## Prerequisites

You'll need a [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck). Set it as an environment variable:

```bash
export MOTHERDUCK_TOKEN="your_token_here"
```

Install the `pg` package:

```bash
npm install pg
```

## Connect

Use a configuration object to connect. Do **not** pass `sslrootcert=system` in a connection string: node-postgres tries to read `system` as a file path and throws an `ENOENT` error.

```js
import pg from "pg";

const client = new pg.Client({
  host: "pg.us-east-1-aws.motherduck.com",
  port: 5432,
  user: "postgres",
  password: process.env.MOTHERDUCK_TOKEN,
  database: "md:",
  ssl: { rejectUnauthorized: true },
});

await client.connect();

const { rows } = await client.query(
  "SELECT title, score FROM sample_data.hn.hacker_news WHERE type='story' LIMIT 10"
);
console.log(rows);

await client.end();
```

## SSL notes

Node.js verifies server certificates against its bundled root CA store by default.
Setting `ssl: { rejectUnauthorized: true }` tells node-postgres to use TLS and verify the server certificate against these trusted roots — this is the equivalent of `sslmode=verify-full` with `sslrootcert=system` in libpq. If you need to specify a custom CA certificate (for example, the [ISRG Root X1](https://letsencrypt.org/certs/isrgrootx1.pem) certificate from Let's Encrypt): ```js import fs from "fs"; const client = new pg.Client({ host: "pg.us-east-1-aws.motherduck.com", port: 5432, user: "postgres", password: process.env.MOTHERDUCK_TOKEN, database: "md:", ssl: { rejectUnauthorized: true, ca: fs.readFileSync("/path/to/isrgrootx1.pem").toString(), }, }); ``` For more details on SSL options, see [SSL and certificate verification](/sql-reference/postgres-endpoint#ssl-and-certificate-verification). :::info Cloudflare Workers Cloudflare Workers use a different socket implementation (`pg-cloudflare`) that handles SSL differently. See [Connect from Cloudflare Workers](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/cloudflare-workers) for Workers-specific setup. ::: --- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/postgres-endpoint --- sidebar_position: 1 title: Connect via the Postgres endpoint description: Connect to MotherDuck using any Postgres-compatible client via the Postgres wire protocol endpoint feature_stage: preview --- MotherDuck's Postgres endpoint lets you query your databases using any client that speaks the [PostgreSQL wire protocol](https://www.postgresql.org/docs/current/protocol.html) — without installing a DuckDB client library. This is ideal for serverless environments, BI tools, or languages without a DuckDB SDK. For full-featured access — including hybrid execution, local caching, and the complete DuckDB extension ecosystem — use the [DuckDB SDK](/getting-started/interfaces/client-apis/) instead. 
## Before you start You'll need a [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck). Set it as an environment variable: ```bash export MOTHERDUCK_TOKEN="your_token_here" ``` ## Connect with psql ```bash PGPASSWORD=$MOTHERDUCK_TOKEN psql \ -h pg.us-east-1-aws.motherduck.com \ -p 5432 \ -U postgres \ "dbname=md: sslmode=verify-full sslrootcert=system" ``` ## Connect with a URI ```sh postgresql://postgres:$MOTHERDUCK_TOKEN@pg.us-east-1-aws.motherduck.com:5432/md:?sslmode=verify-full&sslrootcert=system ``` Use `md:` as the database name to connect to your default database. To connect to a specific database, replace `md:` with the database name, for example `sample_data`. :::info For security, always use environment variables for your MotherDuck token. Never hardcode tokens in your application code. ::: ## Secure your connection Always connect with SSL enabled. The recommended approach is `sslmode=verify-full` with `sslrootcert=system`, which verifies the server certificate against your operating system's trusted roots. If your client doesn't support this, you can download the [ISRG Root X1](https://letsencrypt.org/certs/isrgrootx1.pem) certificate from Let's Encrypt and set `sslrootcert` to its path. Some libraries (psycopg2, JDBC, node-postgres) handle SSL differently — see the language-specific guides below or the [SSL reference](/sql-reference/postgres-endpoint#ssl-and-certificate-verification) for details. ## Key things to know - You're writing **DuckDB SQL**, not PostgreSQL SQL. Queries and MotherDuck SQL that run entirely inside MotherDuck generally work, but the Postgres endpoint is not a full DuckDB client. - Commands that depend on **local files, local attachments, or extension management** are not supported over the Postgres endpoint. Examples: local-file `COPY`, `EXPORT DATABASE`, `IMPORT DATABASE`, `ATTACH ':memory:'`, `ATTACH '/path/to/file.duckdb'`, `CREATE DATABASE ... 
FROM '/path/to/file.duckdb'`, `MD_RUN=LOCAL` on table functions, `INSTALL`, and `LOAD`.
- Use the Postgres endpoint for query execution, DDL and DML on MotherDuck tables, metadata inspection, and server-side reads from remote storage.
- Avoid using `SET` statements, temporary tables, or result-creation commands; those are not supported in Postgres-endpoint server mode.
- Prefer **long-lived connections** rather than opening and closing per query. For high-concurrency applications, use a connection pooler.

## DuckLake databases

You can query and write to MotherDuck-managed [DuckLake](/concepts/ducklake/) databases over the Postgres endpoint the same way as native-storage MotherDuck databases: connect with a [read-write token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token) and run `SELECT`, DDL, and DML against them. The standard Postgres endpoint limitations above still apply (for example, client-side `COPY` from local files is not supported).

Using the Postgres endpoint as the metadata catalog for a self-hosted DuckLake (that is, pointing a DuckDB client running DuckLake at the endpoint as its catalog backend) is not supported yet.

## Language and platform guides

- [Connect from Python (psycopg2 / psycopg3)](./python)
- [Connect from Java (JDBC)](./java)
- [Connect from Node.js](./nodejs)
- [Connect from Cloudflare Workers](./cloudflare-workers)
- [Connect from Drizzle](./drizzle)

## Reference

For connection parameters, SSL options, session settings, and limitations, see the [Postgres Endpoint reference](/sql-reference/postgres-endpoint).
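
The advice under "Key things to know" to prefer long-lived connections could be sketched in Python roughly as follows. This is an illustration only, not part of any MotherDuck API: the `LongLivedConnection` class and its `connect` factory argument are hypothetical names, and the only assumption about the driver is that its connection object offers `close()` and a `closed` attribute, as psycopg connections do.

```python
from typing import Any, Callable, Optional


class LongLivedConnection:
    """Open a connection once and hand the same object back on later calls.

    Hypothetical helper: `connect` is any zero-argument callable that returns
    a DB-API style connection exposing `close()` and a `closed` attribute.
    """

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect
        self._conn: Optional[Any] = None

    def get(self) -> Any:
        # Reconnect only when there is no connection yet or the old one was closed.
        if self._conn is None or getattr(self._conn, "closed", False):
            self._conn = self._connect()
        return self._conn

    def close(self) -> None:
        # Close the held connection, if any, and forget it.
        if self._conn is not None:
            self._conn.close()
            self._conn = None


# Usage sketch (assumes psycopg is installed and MOTHERDUCK_TOKEN is set):
#
#   import os, psycopg
#   holder = LongLivedConnection(lambda: psycopg.connect(
#       host="pg.us-east-1-aws.motherduck.com", port=5432, dbname="md:",
#       user="postgres", password=os.environ["MOTHERDUCK_TOKEN"],
#       sslmode="verify-full", sslrootcert="system"))
#   holder.get().execute("SELECT 1")  # first call opens the connection
#   holder.get().execute("SELECT 2")  # later calls reuse it
```

Taking the connection factory as an argument keeps the helper independent of any one driver. For many concurrent users, a connection pooler in front of the endpoint plays the same role.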
--- Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/python --- sidebar_position: 2 title: "Connect from Python via Postgres endpoint" sidebar_label: Python description: Connect to MotherDuck from Python using psycopg2 or psycopg3 via the Postgres wire protocol feature_stage: preview --- You can query MotherDuck from Python using standard PostgreSQL client libraries. No DuckDB installation is required. This guide covers [psycopg2](https://www.psycopg.org/docs/) and [psycopg (v3)](https://www.psycopg.org/psycopg3/docs/). For connection parameters, SSL options, and limitations, see the [Postgres Endpoint reference](/sql-reference/postgres-endpoint). ## Prerequisites You need a [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck). Set it as an environment variable: ```bash export MOTHERDUCK_TOKEN="your_token_here" ``` import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; ## Connect ```python # /// script # dependencies = ["psycopg"] # /// import os import psycopg with psycopg.connect( host="pg.us-east-1-aws.motherduck.com", # or us-west-2-aws or eu-central-1-aws port=5432, dbname="md:", user="postgres", password=os.environ["MOTHERDUCK_TOKEN"], sslmode="verify-full", sslrootcert="system", # available in libpq 16+ ) as conn: with conn.cursor() as cur: cur.execute( """ SELECT title, score FROM sample_data.hn.hacker_news WHERE type = 'story' ORDER BY score DESC LIMIT 5 """ ) for row in cur: print(row) ``` You can also use a connection URI: ```python import os import psycopg token = os.environ["MOTHERDUCK_TOKEN"] with psycopg.connect( f"postgresql://postgres:{token}@pg.us-east-1-aws.motherduck.com:5432/md:?sslmode=verify-full&sslrootcert=system" ) as conn: with conn.cursor() as cur: cur.execute("SELECT current_database()") print(cur.fetchone()) ``` ```python # /// script # dependencies = ["psycopg2-binary", "certifi"] # /// import os import 
certifi import psycopg2 conn = psycopg2.connect( host="pg.us-east-1-aws.motherduck.com", # or us-west-2-aws or eu-central-1-aws port=5432, dbname="md:", user="postgres", password=os.environ["MOTHERDUCK_TOKEN"], sslmode="verify-full", sslrootcert=certifi.where(), ) with conn: with conn.cursor() as cur: cur.execute( """ SELECT title, score FROM sample_data.hn.hacker_news WHERE type = 'story' ORDER BY score DESC LIMIT 5 """ ) for row in cur.fetchall(): print(row) ``` Use `md:` as the database name to connect to your default database, or replace it with a specific database name such as `sample_data`. ## Loading data from Python For loading through the Postgres endpoint, the recommended pattern is server-side reads from remote storage: - Use `psycopg` or SQLAlchemy to execute `CREATE TABLE AS SELECT` or `INSERT INTO ... SELECT`. - Point `read_parquet`, `read_csv`, or `read_json` at S3, GCS, R2, Azure, or HTTPS. - Set `MD_RUN = REMOTE` on those file reads. Example with SQLAlchemy: ```python import os from sqlalchemy import create_engine, text engine = create_engine( "postgresql+psycopg://postgres:@pg.us-east-1-aws.motherduck.com:5432/md:", connect_args={ "password": os.environ["MOTHERDUCK_TOKEN"], "sslmode": "require", }, ) with engine.begin() as conn: conn.execute( text( """ CREATE OR REPLACE TABLE my_db.main.weather_events AS SELECT * FROM read_csv( 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv', HEADER = true, AUTO_DETECT = true, MD_RUN = REMOTE ) """ ) ) ``` The following patterns are not supported from Python over the Postgres endpoint: - `COPY ... FROM '/local/file.csv'` - `cursor.copy(...)` / `COPY FROM STDIN` - `psql \copy` - `MD_RUN = LOCAL` - SQLAlchemy's default `executemany` path for bulk ingest If the rows exist only in application memory and the volume is modest, prefer explicit multi-values `INSERT` statements. For large local bulk loads, switch to a DuckDB client path instead. 
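
As a rough sketch of the multi-values `INSERT` pattern mentioned above, the helper below turns in-memory rows into one statement plus a flat parameter list. The function name `build_multi_values_insert` and the table and column names are hypothetical, and the `%s` placeholders assume a psycopg-style driver. Table and column names are interpolated into the SQL text, so they should come from trusted code and never from user input.

```python
from typing import Any, List, Sequence, Tuple


def build_multi_values_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one INSERT ... VALUES (...), (...) statement and its parameters."""
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(columns)
    for row in rows:
        if len(row) != width:
            raise ValueError("every row must have one value per column")

    # One "(%s, %s, ...)" group per row; the values travel as bound parameters.
    row_placeholder = "(" + ", ".join(["%s"] * width) + ")"
    values_clause = ", ".join([row_placeholder] * len(rows))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values_clause}"

    # Flatten the rows in the same order as the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


# Usage sketch with an open psycopg cursor `cur` (table name is illustrative):
#
#   sql, params = build_multi_values_insert(
#       "my_db.main.events", ["id", "name"], [(1, "a"), (2, "b")])
#   cur.execute(sql, params)
```

Sending the whole batch in a single `execute` call avoids the per-row `executemany` path listed as unsupported above.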
See [Loading data through the Postgres endpoint](/key-tasks/loading-data-into-motherduck/loading-data-via-postgres-endpoint) for the full decision guide.

## SSL notes

- **psycopg (v3)** wraps libpq and supports `sslrootcert=system` directly (available in libpq 16+).
- **psycopg2** bundles its own statically linked OpenSSL, so `sslrootcert=system` is not supported. Use the `certifi` package to point to CA certificates, or download the [ISRG Root X1](https://letsencrypt.org/certs/isrgrootx1.pem) certificate and set `sslrootcert` to its path.

For more details on SSL options, see [SSL and certificate verification](/sql-reference/postgres-endpoint#ssl-and-certificate-verification).

---

Source: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/read-scaling

---
title: Read Scaling
description: Learn how to scale your data applications using read scaling tokens
---

import ScalingPatternsDiagram from '@site/src/components/ScalingPatternsDiagram';

Connecting read-heavy applications or BI tools with many concurrent users through a single MotherDuck account can sometimes lead to performance bottlenecks. By default, all connections using the same account share a single cloud DuckDB instance, called a "duckling". In addition to your read/write duckling, you can use Read Scaling to spin up additional read-only ducklings for read-heavy workloads.

These replicas are **eventually consistent**. Results may lag a few minutes behind the latest database state. This tradeoff prioritizes high availability and performance while achieving near real-time synchronization across all replicas.

## Configuring a read scaling duckling pool

### Creating a read scaling token

To use Read Scaling, create a read scaling access token, either in the **MotherDuck UI** when [generating an access token][md-access-token] or through the [REST API](/docs/sql-reference/rest-api/users-create-token/).
### Connect with a read scaling token {#understanding-read-scaling-tokens}

Once you have a read scaling token, you can use it to connect to MotherDuck from any DuckDB client as you would with any other authorization token. See [Connecting to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#session-names).

### Duckling assignment

Read scaling ducklings remain idle until a connection is initialized from a DuckDB client. When a DuckDB client connects to MotherDuck with a read scaling token, the connection is assigned to one of the read scaling replicas. As more users connect, additional ducklings are spun up until you reach your Read Scaling Duckling Pool size. If the number of connections exceeds your pool size, new connections are assigned to existing ducklings in a round-robin fashion.

The default Read Scaling Duckling Pool Size is 4 and can be increased up to 16. This is a soft limit, so if you need more ducklings in your pool, please [contact support](https://motherduck.com/contact-us/support/).

### Permissions

A read scaling token grants permission for **read operations** (`SELECT`) while restricting write and administrative operations (updating tables, creating new databases, attaching or detaching databases).

## Ensuring data freshness

In read scaling mode, ducklings sync changes from the primary read-write instance within a few minutes, which works for most use cases. If your application requires stricter synchronization, you can trigger updates more frequently by:

1. Calling [CREATE SNAPSHOT](/sql-reference/motherduck-sql-reference/create-snapshot.md) on the writer duckling
2. Calling [REFRESH DATABASES](/sql-reference/motherduck-sql-reference/refresh-database.md) on any read scaling ducklings

This approach guarantees that readers see the most recent snapshot.

::::warning[Watch Out]
Creating a snapshot of a database will interrupt any ongoing queries interacting with that database.
:::: ## Best practices Here are a few tips to get the most out of MotherDuck's read scaling capabilities. ### Optimize your read scaling duckling pool size For the best experience, aim for one duckling per concurrent user to take advantage of DuckDB's single-node power and efficiency. You can scale up as much as you need by configuring a maximum pool size based on expected concurrency and cost considerations. Users are also able to share ducklings if needed. While the default limit is 16 replicas, this is a soft limit. [Get in touch with MotherDuck support](https://motherduck.com/contact-us/support/) if you need more. ### Leverage local processing where possible Consider using DuckDB WASM to run client instances directly in the browser when possible to fully utilize client resources. ### Maintain user-duckling affinity with `session_name` {#session-affinity-with-session-name} To ensure users consistently connect to the same replica (improving caching and consistency), the DuckDB connection string supports the [`session_name`](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#session-names) parameter: - Clients providing the same `session_name` value are directed to the same replica. This improves caching effectiveness, provides a more consistent view of data across queries for that user and offers better isolation between concurrent users. - This parameter can be set to the ID of a user session, a user ID, or a hashed value for privacy. By leveraging read scaling tokens and `session_name`, you can efficiently scale read operations and group user sessions for optimal performance. ### Instance caching with `dbinstance_inactivity_ttl` Some DuckDB client library integrations support an *instance cache* to keep connections to the same database instance alive for a short period after use. This improves read scaling by helping maintain session affinity even across separate queries or short connection gaps. 
This caching behavior boosts the effectiveness of `session_name`, making it more likely that frequent queries from the same client land on the same duckling, even with short breaks between connections. See [Connecting to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#setting-custom-database-instance-cache-time-ttl) for more details. [md-access-token]: /key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token --- Source: https://motherduck.com/docs/key-tasks/cloud-storage/cloud-storage --- title: Interacting with cloud storage description: Learn how to work with databases and MotherDuck --- ## Included pages - [Querying Files in Amazon S3](https://motherduck.com/docs/key-tasks/cloud-storage/querying-s3-files): Query Parquet, CSV, and JSON files in S3 with automatic cloud execution routing. - [Writing Data to Amazon S3](https://motherduck.com/docs/key-tasks/cloud-storage/writing-to-s3): Export data from MotherDuck to Amazon S3 or transform S3 files in place. - [S3 Import Best Practices](https://motherduck.com/docs/key-tasks/cloud-storage/s3-import-best-practices): Optimize file size, format, and layout in Amazon S3 for fast, cost-effective data loading into MotherDuck. --- Source: https://motherduck.com/docs/key-tasks/cloud-storage/querying-s3-files --- sidebar_position: 5 title: Querying Files in Amazon S3 description: Query Parquet, CSV, and JSON files in S3 with automatic cloud execution routing. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; Since MotherDuck is hosted in the cloud, one of the benefits of MotherDuck is better and faster interoperability with Amazon S3. MotherDuck's [Dual Execution](/concepts/architecture-and-capabilities#dual-execution) automatically routes queries against cloud storage to MotherDuck's execution runtime in the cloud rather than executing them locally. 
:::note MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2). ::: :::info How MotherDuck accesses cloud storage When you query cloud storage while connected to MotherDuck (for example, `read_parquet('s3://...')`), the query runs on MotherDuck's cloud execution engine, not on your local machine. MotherDuck connects to your storage provider directly from the cloud. To authenticate, MotherDuck can use **any** of your secrets, including temporary, in-memory secrets created in your local DuckDB session. This means even if you create a secret locally without `IN MOTHERDUCK` or `PERSISTENT`, MotherDuck's cloud service can still use it to read your data. Your local DuckDB client does not connect to cloud storage directly. For details on secret storage options and how secrets are resolved, see [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret/). ::: :::tip To browse objects before you query them, use [`MD_LIST_FILES()`](/sql-reference/motherduck-sql-reference/md-list-files): ```sql FROM md_list_files('s3:////'); ``` To discover buckets exposed by an S3 secret, use [`MD_LIST_BUCKETS_FOR_SECRET()`](/sql-reference/motherduck-sql-reference/md-list-buckets-for-secret). ::: MotherDuck supports the [DuckDB dialect](https://duckdb.org/docs/guides/import/s3_import) to query data stored in Amazon S3. Such queries are automatically routed to MotherDuck's cloud execution engines for faster and more efficient execution. Here are some examples of querying data in Amazon S3: ```sql SELECT * FROM read_parquet('s3:///'); SELECT * FROM read_parquet(['s3:///', ... 
,'s3:///']); SELECT * FROM read_parquet('s3:///*'); SELECT * FROM 's3:////*'; SELECT * FROM iceberg_scan('s3:///', ALLOW_MOVED_PATHS=true); SELECT * FROM delta_scan('s3:///'); ``` See [Apache Iceberg](/integrations/file-formats/apache-iceberg.mdx) for more information on reading Iceberg data. See [Delta Lake](/integrations/file-formats/delta-lake.mdx) for more information on reading Delta Lake data. ## Accessing private files in S3 Protected Amazon S3 files require an AWS access key and secret. You can configure MotherDuck using [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret.md) ### SSL certificate verification and S3 bucket names Because of SSL certificate verification requirements, S3 bucket names that contain dots (.) cannot be accessed using virtual-hosted style URLs. This is due to AWS's SSL wildcard certificate (*.s3.amazonaws.com) which only validates single-level subdomains. When a bucket name contains dots, it creates multi-level subdomains that don't match the wildcard pattern, causing SSL verification to fail. If your bucket name contains dots, you have two options: 1. **Rename your bucket** to remove dots (e.g., use dashes instead) 2. **Use path-style URLs** by adding the `URL_STYLE 'path'` option to your secret: ```sql CREATE OR REPLACE SECRET my_secret IN MOTHERDUCK ( TYPE s3, URL_STYLE 'path', SCOPE 's3://my.bucket.with.dots' ); ``` For more information, see [Amazon S3 Virtual Hosting documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html). --- Source: https://motherduck.com/docs/key-tasks/cloud-storage/s3-import-best-practices --- sidebar_position: 6 title: S3 Import Best Practices description: Optimize file size, format, and layout in Amazon S3 for fast, cost-effective data loading into MotherDuck. tags: [s3, parquet, csv, import, best-practices] --- # S3 import best practices Loading data from Amazon S3 is one of the fastest ways to get data into MotherDuck. 
Because MotherDuck runs queries against S3 directly from the cloud, the file layout in your bucket has a significant impact on loading speed and cost. This guide covers how to organize files in S3 for optimal performance.

For general loading advice (batch sizes, memory management, Duckling sizing), see [Loading data best practices](/key-tasks/loading-data-into-motherduck/considerations-for-loading-data/).

## Choose the right file format

Parquet is the best format for most S3 imports. It compresses well, includes schema metadata, and lets DuckDB read only the columns and row groups it needs.

| Format | Best for | Avoid when |
|--------|----------|------------|
| **Parquet** | Most workloads, large files, production pipelines | Files under ~1 MB, where metadata overhead outweighs benefits |
| **CSV** | Small files (under 5 MB), quick exploration, simple schemas | Large datasets, complex types, multi-line text |
| **JSON** | Small files (under 5 MB), semi-structured data, API responses | Large files without a known schema (schema discovery is slow) |

:::tip
For very small files (under ~1 MB), CSV or JSON can be faster than Parquet because Parquet's metadata and footer add overhead that outweighs the compression benefits at small sizes. Still, avoid the "small files problem", where listing and reading many small files with the same schema becomes the bottleneck; aggregate them into one or more larger Parquet files instead.
:::

### Parquet settings

When writing Parquet files destined for MotherDuck:

- **Compression**: Use Snappy (default) or ZSTD. Snappy offers faster decompression; ZSTD gives better compression ratios for cold storage.
- **Row group size**: Aim for 100K-1M rows per row group. DuckDB processes row groups in parallel, so multiple groups per file improve throughput.
- **Column encoding**: Leave this at the writer's default. DuckDB and most Parquet writers choose efficient encodings automatically.
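
If DuckDB itself writes the files, the settings above map onto options of its `COPY ... TO` statement (`FORMAT parquet`, `COMPRESSION`, `ROW_GROUP_SIZE`). The sketch below builds such a statement as a string. The helper name `parquet_copy_statement`, the table name, and the bucket path are hypothetical, and the range check simply mirrors the 100K-1M guideline rather than any DuckDB limit.

```python
ALLOWED_COMPRESSION = {"snappy", "zstd"}


def parquet_copy_statement(
    source_table: str,
    target_path: str,
    compression: str = "snappy",
    row_group_size: int = 500_000,
) -> str:
    """Return a DuckDB COPY statement that writes `source_table` as Parquet."""
    if compression not in ALLOWED_COMPRESSION:
        raise ValueError(f"compression must be one of {sorted(ALLOWED_COMPRESSION)}")
    # Guideline from this page: 100K-1M rows per row group.
    if not 100_000 <= row_group_size <= 1_000_000:
        raise ValueError("row_group_size should be between 100,000 and 1,000,000")

    # Escape single quotes so the path stays a valid SQL string literal.
    escaped_path = target_path.replace("'", "''")
    return (
        f"COPY {source_table} TO '{escaped_path}' "
        f"(FORMAT parquet, COMPRESSION {compression}, ROW_GROUP_SIZE {row_group_size})"
    )


# Usage sketch (names are illustrative):
#
#   stmt = parquet_copy_statement(
#       "my_db.main.events", "s3://my-bucket/events/part_001.parquet",
#       compression="zstd", row_group_size=200_000)
#   # Pass `stmt` to whichever DuckDB or MotherDuck connection writes the file.
```

Column encoding is deliberately left out of the statement, in line with the recommendation to keep the writer's default.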
## Optimize file size

File size is the single most impactful factor for S3 import performance. Files that are too small create per-file overhead (HTTP requests, file listing, metadata parsing). Files that are too large limit parallelism.

| File size | Impact |
|-----------|--------|
| **Under 1 MB** | Too small. Per-file overhead dominates. Merge small files into larger ones. |
| **1-10 MB** | Acceptable for low-volume or infrequent loads. |
| **10-256 MB** | Optimal range. Good balance of parallelism and minimal overhead. |
| **Over 256 MB** | Still works fine into the multiple gigabytes, but DuckDB can only parallelize within a single file by row group. |

:::tip
Aim for **10-256 MB per file** in Parquet format. If your pipeline produces many small files (for example, one file per API call or per minute), batch them before writing to S3 or use a compaction step to merge them periodically.
:::

### Row count guidelines

Row count guidelines follow from file size, but as a rough reference:

| Rows per file | Typical file size (Parquet) | Recommendation |
|---------------|----------------------------|----------------|
| Under 1,000 | Under 100 KB | Too small, merge files |
| 1,000-100,000 | 100 KB - 10 MB | Acceptable for small tables |
| 100,000-10,000,000 | 10 MB - 500 MB | Optimal range |
| Over 10,000,000 | Over 500 MB | Consider splitting into multiple files |

## Organize your S3 bucket

A consistent file layout in S3 makes it easier to load data incrementally and query subsets efficiently.
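When auditing an existing bucket layout against the size bands in the file size table, a small classifier can help. The sketch below is an illustration under stated assumptions: it treats 1 MB as 1,048,576 bytes, the band labels are hypothetical names, and the object sizes are made up. In practice the sizes would come from an S3 listing.

```python
from collections import Counter

MB = 1024 * 1024  # assumption: the guide's "MB" is treated as 2**20 bytes


def classify_file_size(size_bytes: int) -> str:
    """Map an object size to one of the size bands from the file size table."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 1 * MB:
        return "too_small"    # per-file overhead dominates; merge
    if size_bytes < 10 * MB:
        return "acceptable"   # fine for low-volume or infrequent loads
    if size_bytes <= 256 * MB:
        return "optimal"      # target range
    return "large"            # works, but parallelism is per row group only


# Hypothetical object sizes, standing in for the result of an S3 listing
sizes = [200 * 1024, 3 * MB, 64 * MB, 128 * MB, 2048 * MB]
print(Counter(classify_file_size(s) for s in sizes))
```

A high share of `too_small` objects suggests adding a batching or compaction step before loading.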
### Use Hive-style partitioning for large datasets

If your dataset is large and you query it by date or category, partition your files using Hive-style paths:

```text
s3://my-bucket/events/year=2025/month=03/data.parquet
s3://my-bucket/events/year=2025/month=04/data.parquet
```

DuckDB automatically detects Hive partitioning and prunes partitions during queries:

```sql
SELECT *
FROM read_parquet('s3://my-bucket/events/**/*.parquet', hive_partitioning=true)
WHERE year = 2025 AND month = 3;
```

### Use consistent naming conventions

- Use lowercase paths (MotherDuck URLs are case-sensitive)
- Avoid dots in bucket names (causes [SSL issues](/key-tasks/cloud-storage/querying-s3-files/#ssl-certificate-verification-and-s3-bucket-names))
- Include timestamps or sequence numbers in file names for incremental loads:

```text
s3://my-bucket/orders/orders_20250323_001.parquet
s3://my-bucket/orders/orders_20250323_002.parquet
```

## Set up continuous loading from S3

For pipelines that continuously land files in S3, keep these guidelines in mind:

### Loading frequency

| Frequency | Recommendation |
|-----------|----------------|
| **Under 1 minute** | Not recommended. Per-file overhead and small file sizes make this inefficient. Instead, consider [DuckLake](/docs/integrations/file-formats/ducklake/), which inlines data until the batch is big enough to write to a file. |
| **1-5 minutes** | Possible for time-sensitive workloads, but files will be small. Ensure each file is at least 1 MB. |
| **5-15 minutes** | Good balance of freshness and file size for most use cases. |
| **Hourly or daily** | Ideal for batch workloads. Produces well-sized files with minimal overhead. |

:::tip
If your source system produces data continuously, buffer at least 5-15 minutes of data before writing a file to S3. This produces files in the optimal 10-256 MB range and avoids the small-file problem.
:::

### Incremental loading pattern

For incremental loads, use a landing zone pattern:

1.
Land new files in an `incoming/` prefix 2. Load them into MotherDuck with a timestamp filter or file listing 3. Move processed files to a `processed/` prefix ```sql -- Load new files from the incoming prefix INSERT INTO my_table SELECT * FROM read_parquet('s3://my-bucket/incoming/*.parquet'); ``` For more complex incremental workflows with state management, use an [ingestion tool](#use-ingestion-tools-for-production-pipelines). ## Use ingestion tools for production pipelines For production pipelines that need scheduling, error handling, retries, and schema evolution, use a dedicated ingestion tool rather than writing raw SQL scripts. Many tools support MotherDuck as a destination and handle S3 file management automatically. **Ingestion tools with MotherDuck support:** - [dlt (data load tool)](/integrations/ingestion/dlt/) supports loading from APIs, databases, and files into MotherDuck with automatic schema evolution - [Streamkap](/integrations/ingestion/streamkap/) provides real-time CDC from databases to MotherDuck **Orchestration tools** like Dagster, Airflow, Prefect, and Kestra can schedule S3-to-MotherDuck pipelines. Browse the full list of [ingestion](https://motherduck.com/ecosystem/?category=Ingestion) and [orchestration](https://motherduck.com/ecosystem/?category=Orchestration) tools in the MotherDuck ecosystem. ## Colocate data with MotherDuck MotherDuck connects to S3 directly from the cloud, so network distance between your S3 bucket and MotherDuck's region matters. - MotherDuck is available in **US East (N. 
Virginia)** (`us-east-1`), **US West (Oregon)** (`us-west-2`), and **Europe (Frankfurt)** (`eu-central-1`) - Place your S3 bucket in the **same region** as your MotherDuck organization for best performance ## Summary | Area | Recommendation | |------|----------------| | **File format** | Parquet for most workloads; CSV/JSON for files under 1 MB | | **File size** | 10-256 MB per file | | **Row count** | 100K-10M rows per file | | **Loading frequency** | 5-15 minutes minimum; hourly or daily for batch | | **Partitioning** | Hive-style for large, time-series datasets | | **Region** | Same region as your MotherDuck organization | | **Production pipelines** | Use a dedicated ingestion or orchestration tool | --- Source: https://motherduck.com/docs/key-tasks/cloud-storage/writing-to-s3 --- sidebar_position: 5 title: Writing Data to Amazon S3 description: Export data from MotherDuck to Amazon S3 or transform S3 files in place. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; You can use MotherDuck to transform files on Amazon S3 or export data from MotherDuck to Amazon S3. :::note MotherDuck supports several cloud storage providers, including [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx) and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2). ::: MotherDuck supports the [DuckDB dialect](https://duckdb.org/docs/guides/import/s3_export) to write data to Amazon S3. The examples here write data in Parquet format, for more options refer to the [documentation for DuckDB's COPY command](https://duckdb.org/docs/stable/sql/statements/copy.html). ## Syntax ```sql COPY
TO 's3:///[]/'; ``` ## Example usage ```sql -- write entire ducks_table table to parquet file in S3 COPY ducks_table to 's3://ducks_bucket/ducks.parquet'; -- writing the output of a query will also work COPY (SELECT * FROM ducks_table LIMIT 100) to 's3://ducks_bucket/ducks_head.parquet'; ``` --- Source: https://motherduck.com/docs/key-tasks/customer-facing-analytics/3-tier-cfa-guide --- title: 3-tier customer-facing analytics guide sidebar_label: Builder's Guide description: Step-by-step guide to building a 3-tier customer-facing analytics application with MotherDuck. slug: /key-tasks/customer-facing-analytics/3-tier-cfa-guide/ --- To build a **Customer-Facing Analytics (CFA) application** on MotherDuck, use this step-by-step guide. This guide will focus on patterns for traditional 3-tier architecture, but you can also run 1.5-tier apps using Wasm, as seen in the [1.5-tier architecture guide](/getting-started/customer-facing-analytics/#15-tier-architecture-duckdb-wasm). You'll know you're done when: - Your application (`B2B Tool`) can run analytics queries for a customer (`Goose Inc`) against MotherDuck from a backend service. - Data from a transactional database is synced into a per-customer MotherDuck database on a schedule using your orchestrator. - You understand when to add more service accounts, databases, and read scaling capacity as your product grows. Use this guide when you want to: - Build a 3-tier web app (browser → app server → MotherDuck) with embedded analytics. - Use per-customer service accounts and databases to isolate data and compute. - Keep analytics data in MotherDuck in sync with your transactional database. Before starting, ensure you have: - A MotherDuck account and an organization you can use for development. - Basic familiarity with Python and SQL. - Access to a PostgreSQL database (or a test instance) with an `orders`-style schema. - Python installed locally (DuckDB is compatible with the latest Python LTS version). 
> This guide assumes you've read the conceptual overview [**Customer-Facing Analytics Getting Started**](/getting-started/customer-facing-analytics).

## 1. understand the 3-tier CFA architecture

In this guide, you are building `B2B Tool`, a SaaS product that serves analytics to employees at many customer companies. Each customer company gets:

- Its own **service account** in MotherDuck.
- Its own **database(s)** for analytics tables.
- Its own **compute** (Ducklings) for queries and data loading.

Your high-level architecture:

```mermaid
graph LR;
  subgraph Users["End Users"]
    U1{{"Kate (Goose Inc)"}}:::green;
    U2{{"John (Goose Inc)"}}:::green;
    U3{{"Hari (Duck Co)"}}:::green;
  end
  subgraph App["Your Application"]
    FE["Frontend"];
    BE["Backend API"];
    TX["Transactional DB"];
  end
  MDORG["MotherDuck"];
  U1 --> FE;
  U2 --> FE;
  U3 --> FE;
  FE -->|"HTTP / JSON APIs"| BE;
  BE -->|"User + Company lookup"| TX;
  BE -->|"Analytics queries"| MDORG;
```

[Hypertenancy](/concepts/hypertenancy) here means three things: each company (`Goose Inc`, `Duck Co`) owns its MotherDuck database(s), which store only that company's analytics data; compute is isolated, because each company has its own Ducklings; and heavy workloads for one customer cannot slow down others.

You will:

1. Set up a dev organization and add other developers on the team.
2. Create a service account for your first customer company (`Goose Inc`).
3. Sync data from your transactional DB, such as Postgres, into Goose Inc’s MotherDuck analytical database using your chosen replication method.
4. Connect your backend service to MotherDuck with a **read token** to serve analytics queries.
5. Plan how to scale to many customer companies and higher concurrency.

### Alternative to per-customer service accounts

The per-customer service account pattern is the strongest isolation model. Some teams, especially B2C or lighter multi-tenant apps, opt for a simpler setup:

- Keep a **single writer service account** that owns all customer databases.
- Create a **[read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)** for that account and configure the flock size to target one Duckling per concurrent end user (default max 16, adjustable through support). For cost control, users can share a Duckling, but that increases contention.
- Have each end user connect in **[single attach mode](/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/)** to the one database they should see (`md:?attach_mode=single`), which avoids carrying other attachments from the workspace.
- Use [`session_name`](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#session-names) in the connection string to keep an end user pinned to the same read scaling Duckling for cache reuse and steadier latency.

This model trades away service-account isolation in favor of operational simplicity. Ensure your security and compliance needs allow a shared service account before choosing it. Read scaling replicas are eventually consistent. If you need fresher reads on demand, combine `CREATE SNAPSHOT` on the writer with `REFRESH DATABASE` on the read scaling connections.

Example connection string for an end user:

```text
md:customer_db?attach_mode=single&session_name=
```

## 2. set up your dev environment and organization

Prepare your dev environment:

1. **Create your dev organization and account**
   1. Go to `https://motherduck.com` and sign up or log in with your work email (for example, `manager@b2btool.com`).
   2. Create or select an organization you’ll use for development (for example, `B2B Tool Co`).
   3. In the MotherDuck UI, open the default database (`my_db`) and confirm you can run a simple query such as:

      ```sql
      SELECT 1;
      ```

      You should see a single row with the value `1`.
2. **Upload a small CSV to confirm data ownership and access**
   1. In the MotherDuck web UI, upload a small example CSV (for example, `orders_sample.csv`) into `my_db`.
      If this step is unclear, check out the [MotherDuck tutorial on loading data](/getting-started/e2e-tutorial/part-2/#loading-your-data).
   2. Run a query like:

      ```sql
      SELECT COUNT(*) AS row_count FROM orders_sample;
      ```

      You should see the number of rows you uploaded.
3. **Invite a second developer and share data**
   1. Invite `devlead@b2btool.com` to your `B2B Tool Co` organization.
   2. Create a new database in your personal account (for example, `b2btool_dev`) and copy or create a simple table.
   3. Share that database with your colleague following the [**Sharing Data** guide](/key-tasks/sharing-data/sharing-overview/).
   4. Ask your colleague to query the shared database from their account.

At this point:

- You have a dev org with two human users.
- You’ve seen how database ownership and read-only sharing work.

Conceptually, your dev setup looks like this:

```mermaid
graph LR;
  DM["manager@b2btool.com"] <-->|"read/write"| DB1[("DB: b2btool_dev")]:::database;
  DB1 -->|"read only"| DC{{Colleague}}:::green;
```

## 3. create a service account for a customer company

For customer-facing analytics, your customers usually do **not** log into MotherDuck directly. Instead:

- Your application mediates access.
- Each customer company gets a **service account** in your MotherDuck organization.
- Your backend uses that service account’s tokens to load and query data.

In this guide, you’ll create a service account for your first customer company: `Goose Inc`.

### 3.1 create a service account in the MotherDuck UI

1. In the MotherDuck UI, go to the **Service Accounts** section for your organization.
2. Click **Create Service Account**.
3. Name it something like `goose-inc-service-account`.
4. Save the generated access token in your secret manager or a secure store.

For more detail, see [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/).
### 3.2 (optional) create service accounts through REST API Later, you will likely automate service account creation. To create a service account programmatically: - Use the [`users-create-service-account`](/sql-reference/rest-api/users-create-service-account/) REST API endpoint. - Use the [`users-create-token`](/sql-reference/rest-api/users-create-token/) endpoint to create an access token for that service account. Your provisioning workflow should: **(1)** detect a new customer signup, **(2)** call `users-create-service-account` for that company, **(3)** call `users-create-token`, and **(4)** store the token metadata (or an alias) in your transactional database so your backend can look it up later. ## 4. model and load customer data in MotherDuck Next, populate data for `Goose Inc` into its own MotherDuck database. Assume: - Your transactional system (`B2B Tool`) uses PostgreSQL. - Each customer company is an e-commerce store with: - `orders` table: order-level facts. - `fulfillments` table: shipment or delivery events. 
Example schema: ```sql CREATE TABLE orders ( order_id BIGINT PRIMARY KEY, company_id BIGINT, order_date TIMESTAMP, customer_email TEXT, total_amount NUMERIC(18, 2), status TEXT ); CREATE TABLE fulfillments ( fulfillment_id BIGINT PRIMARY KEY, order_id BIGINT REFERENCES orders(order_id), fulfilled_at TIMESTAMP, carrier TEXT, status TEXT ); ``` Example data: ```sql INSERT INTO orders SELECT row_number() OVER () AS order_id, (random() * 9 + 1)::BIGINT AS company_id, current_timestamp - INTERVAL (random() * 365) DAY AS order_date, 'customer' || (random() * 999 + 1)::INT || '@example.com' AS customer_email, (random() * 9999 + 1)::NUMERIC(18, 2) AS total_amount, (['pending', 'processing', 'shipped', 'delivered', 'cancelled'])[(random() * 4)::INT + 1] AS status FROM range(1000); INSERT INTO fulfillments SELECT row_number() OVER () AS fulfillment_id, (random() * 999 + 1)::BIGINT AS order_id, current_timestamp - INTERVAL (random() * 300) DAY AS fulfilled_at, (['UPS', 'FedEx', 'USPS', 'DHL', 'Amazon Logistics'])[(random() * 4)::INT + 1] AS carrier, (['pending', 'in_transit', 'out_for_delivery', 'delivered', 'failed'])[(random() * 4)::INT + 1] AS status FROM range(1000); ``` :::info Use your [orchestrator](/integrations/orchestration/) and [ingestion tool](/integrations/ingestion/) to keep this data in sync for each customer company. ::: ### 4.1 create a MotherDuck database for `Goose Inc` Use the `Goose Inc` service account’s token to create a database for that customer: ```sql CREATE DATABASE goose_inc; ``` Run this in the UI after impersonating the `Goose Inc` service account or connect as that service account from Python and issue the `CREATE DATABASE` statement. :::note To move forward, replicate your data into `goose_inc`. [This page](/key-tasks/data-warehousing/replication/postgres/) shows a simple example for replicating a Postgres database to MotherDuck. ::: ## 5. 
run analytics queries from your backend

With data in Goose Inc’s MotherDuck database, your backend can run analytics queries. At a high level:

1. Your user (`Kate` at Goose Inc) logs into `B2B Tool`.
2. Your backend authenticates Kate and determines she belongs to the `Goose Inc` customer company.
3. Your backend looks up Goose Inc’s **read token** for its service account from your transactional database or secret store.
4. Your backend uses that read token to run analytics queries against the `goose_inc` database in MotherDuck.

### 5.1 create a read token for `Goose Inc`

For production, you’ll usually create a token dedicated to **reading** analytics data:

1. In the MotherDuck UI, impersonate the Goose Inc service account.
2. Create a new access token intended only for read workloads.
3. Store this token securely and associate it with Goose Inc in your transactional database.

You can also create tokens through the REST API using the [`users-create-token`](/sql-reference/rest-api/users-create-token/) endpoint.

### 5.2 connect from Python using DuckDB

Your backend service connects to MotherDuck using the DuckDB client and the `md:` connection string. Typically, you:

- Supply the Goose Inc read token, either through the `MOTHERDUCK_TOKEN` (or `motherduck_token`) environment variable or, when one process serves several customers, through the `motherduck_token` connection-string parameter.
- Connect to the `goose_inc` database using DuckDB.

Example helper in your backend (for example, `analytics_client.py`):

```python
import os

import duckdb


def get_customer_connection(customer_id: str) -> duckdb.DuckDBPyConnection:
    """
    Get a DuckDB connection to a customer's MotherDuck database.

    Args:
        customer_id: Identifier for the customer (e.g., 'goose_inc', 'duck_co')

    Returns:
        DuckDB connection to the customer's database
    """
    # Normalize the identifier so the URL slug 'goose-inc' and the
    # database name 'goose_inc' resolve to the same customer
    database = customer_id.lower().replace("-", "_")
    if not database.replace("_", "").isalnum():
        raise ValueError(f"Invalid customer identifier: {customer_id}")

    # Look up the customer's read token from your secret store or environment
    # In production, you'd fetch this from your transactional DB or secret manager
    token_env_var = f"{database.upper()}_READ_TOKEN"
    read_token = os.environ.get(token_env_var)

    if not read_token:
        raise ValueError(f"Read token not found for customer: {customer_id}")

    # Pass the token on the connection string rather than writing it to
    # os.environ, which is process-wide and would leak one customer's token
    # into concurrent requests for another customer
    # Database name typically matches the customer_id
    conn = duckdb.connect(f"md:{database}?motherduck_token={read_token}")
    return conn
```

Then, a simple analytics function in your API service:

```python
def get_customer_kpis(customer_id: str):
    conn = get_customer_connection(customer_id)

    query = """
        SELECT
            date_trunc('day', order_date) AS day,
            COUNT(*) AS orders_count,
            SUM(total_amount) AS gross_revenue
        FROM orders
        WHERE order_date >= current_date - INTERVAL 30 DAY
        GROUP BY 1
        ORDER BY 1
    """

    result = conn.execute(query).fetch_df()

    # Convert to JSON-serializable structure for your frontend
    return result.to_dict(orient="records")
```

Expose this from a REST endpoint such as `/api/customers/{customer_id}/kpis` and render the results in your frontend dashboards. The same code works for any customer by passing their identifier.

The runtime query flow looks like:

```mermaid
sequenceDiagram
  participant User as Kate (Goose Inc)
  participant FE as B2B Tool Frontend
  participant BE as B2B Tool Backend
  participant MD as MotherDuck (Goose Inc DB)
  User->>FE: Opens analytics dashboard
  FE->>BE: GET /api/customers/goose-inc/kpis
  BE->>BE: Lookup Goose Inc read token
  BE->>MD: Analytics query using DuckDB + md:goose_inc
  MD-->>BE: Result rows
  BE-->>FE: JSON KPIs
  FE-->>User: Render charts
```

## 6. scaling to many customer companies

As your product grows, add more customer companies. For each new company:

1.
**Create a service account** (through the UI or REST API).
2. **Create one or more databases** for that company’s analytics data.
3. **Configure your orchestrator** to run a `dlt` pipeline (or equivalent) for that company.
4. **Create a read token** for the company and store it in your transactional database.

Your architecture naturally scales horizontally:

```mermaid
graph LR;
  subgraph Org["Your MotherDuck Org"]
    SA1["Service Account: Goose Inc"];
    SA2["Service Account: Swan GmbH"];
    SA3["Service Account: Duck Co"];
    DB1[("DB: goose_inc")]:::db;
    DB2[("DB: swan_gmbh")]:::db;
    DB3[("DB: duck_co")]:::db;
  end
  SA1 --> DB1;
  SA2 --> DB2;
  SA3 --> DB3;
```

Each service account and database pair has its own compute, minimizing noisy neighbors and making performance a per-customer concern.

## 7. scaling a single customer to high concurrency

When a customer (for example, `Goose Inc`) grows to hundreds or thousands of simultaneous users, use these levers:

1. **Increase the Duckling size** for the service account’s default compute Duckling to handle heavier transformation jobs (vertical scaling).
2. **Use read scaling** for high-concurrency read workloads:
   - Refer to [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) to create read scaling Ducklings for Goose Inc's read token.
   - Point your backend’s analytics queries at the read scaling token instead of the main read/write token.
3. **Optimize queries and models**:
   - Pre-aggregate frequently used metrics.
   - Use summary tables to avoid scanning the full `orders` table on every request.

For most applications, you start with a single Duckling per customer and introduce read scaling only when your monitoring shows sustained high concurrency or latency issues.

## 8. troubleshooting and when to add more service accounts

As you operate your CFA deployment, you may run into several common situations.
### 8.1 queries are slow or time out for one customer

If you see slow queries or timeouts for a specific customer:

- **Check query patterns**:
  - Are you scanning too much data on every request?
  - Can you pre-aggregate or cache common metrics?
- **Scale compute for that customer**:
  - Increase the size of the service account’s Duckling.
  - Add read scaling Ducklings or increase the Duckling size behind that customer's read token.

You rarely need to change the number of service accounts in this case; focus on scaling and optimizing the existing one.

### 8.2 data loads interfere with reads

If your hourly (or more frequent) data load jobs are locking tables and causing read queries to queue:

- Consider:
  - Scheduling heavy load jobs during off-peak times.
  - Using zero-copy cloning (`CREATE SNAPSHOT` and `REFRESH DATABASE`) patterns so that readers query a snapshot database while writers update the primary.
- Ensure you are using a **dedicated read token** and read scaling configuration for user-facing queries.

### 8.3 when to add more service accounts

In most B2B scenarios:

- You create **one service account per customer company**.
- All users at that company share the same analytics data and compute through your application.

Consider adding **additional service accounts** when:

- You need hard isolation between different environments (for example, separate service accounts for `Prod`, `Staging`, and `Sandbox` within the same customer).
- A customer has sub-tenants of their own and you want to isolate compute and data at that sub-tenant level (for example, separate service accounts per region or per major business unit).

When you add new service accounts:

1. Create the service account (UI or REST API).
2. Create dedicated databases for the new scope.
3. Create tokens and wire them into your application’s configuration.
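One way to keep that wiring predictable is to derive every name from a single description of the scope. The sketch below is a hypothetical convention, not a MotherDuck feature: the `TenantScope` class, its fields, and the naming rules (including the `_READ_TOKEN` suffix) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantScope:
    """A customer plus the environment, region, or business unit being isolated."""

    customer: str          # e.g. "Goose Inc"
    scope: str = "prod"    # e.g. "staging", "sandbox", "emea"

    def _slug(self, sep: str) -> str:
        # Lowercase, then collapse every run of non-alphanumerics into `sep`
        raw = f"{self.customer} {self.scope}".lower()
        return re.sub(r"[^a-z0-9]+", sep, raw).strip(sep)

    @property
    def service_account(self) -> str:
        return f"{self._slug('-')}-service-account"

    @property
    def database(self) -> str:
        return self._slug("_")

    @property
    def token_env_var(self) -> str:
        return f"{self._slug('_').upper()}_READ_TOKEN"


staging = TenantScope("Goose Inc", "staging")
print(staging.service_account, staging.database, staging.token_env_var)
```

Because the service account name, database name, and token lookup key all come from the same two fields, adding a scope is a single configuration entry rather than three hand-typed names.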
### 8.4 common token and permission issues If you see authentication or permission errors: - **Token expired or revoked**: - Rotate the token in MotherDuck and update your secret store. - **Permission denied on database or table**: - Confirm that the service account owns the database or has the necessary privileges. - Re-check sharing settings if you are using shared data. ## 9. next steps Once you have a basic 3-tier CFA deployment working: - **Automate provisioning**: - Automate service account and token creation using the [REST APIs](/sql-reference/rest-api/motherduck-rest-api/). - Automate database and schema creation for new customer companies. - **Automate data loading**: - Move your `dlt` jobs fully into your orchestrator so that new companies are onboarded with little manual work. - Monitor load durations and adjust scheduling as your data grows. - **Enhance your frontend**: - Add charts and drill-downs powered by MotherDuck. - Consider additional guides under `Customer-Facing Analytics` for advanced topics in your docs set. For a high-level conceptual overview and architecture comparison, see the [**Customer-Facing Analytics Getting Started**](/getting-started/customer-facing-analytics/) page. --- Source: https://motherduck.com/docs/key-tasks/customer-facing-analytics/customer-facing-analytics --- sidebar_position: 14 title: Build a customer-facing analytics app sidebar_label: Customer-Facing Analytics description: Build customer-facing analytics applications with read scaling tokens and isolated tenant data. --- To build your first application with **Customer-Facing Analytics (CFA)** on MotherDuck, use this overview as a starting point. You'll know you're done when: - Each of your customer tenants (or organizations) has its own service account and database(s) in MotherDuck. - Your application can query customer-specific analytics data with predictable performance and isolation. - You understand which detailed guide to follow next for implementation. 
Use this overview to choose a **tenancy model** and learn the building blocks before the step-by-step 3-tier guide. ## Customer provisioning Every [Duckling](https://motherduck.com/blog/scaling-duckdb-with-ducklings/) is an isolated bucket of compute. For Customer-Facing Analytics, this usually means: - Each **customer tenant or organization** has **one service account** dedicated to serving analytics (and often also ingestion and transformation). - Your backend mediates all access; customers typically do not log into MotherDuck directly. You manage service accounts and tokens using: - [`users-create-service-account`](/sql-reference/rest-api/users-create-service-account/) – create a service account per customer tenant. - [`users-create-token`](/sql-reference/rest-api/users-create-token/) – create tokens for ingestion and read workloads. With accounts and tokens in place, you can: - Create databases under each service account. - Load data into those databases using your orchestrator. - Use dedicated read tokens from your application to serve analytics. For a concrete example of this pattern in a 3-tier web app, see the **[CFA Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/)**. ## Data modeling and loading One database per customer tenant or organization scales cleanly because: - Each database is tied to a tenant's service account. - Each tenant's workloads are isolated from the others. - You can scale Duckling (compute instance) sizes independently based on tenant needs using [different sizes (Pulse, Standard, etc)](/about-motherduck/billing/duckling-sizes/). You can also: - Use a single "landing" service account to ingest raw data from upstream systems. - Use [ATTACH](/sql-reference/motherduck-sql-reference/attach.md) and [zero-copy cloning](/key-tasks/sharing-data/sharing-overview/#consuming-shared-data) to fan that data out into per-customer databases owned by their respective service accounts. 
High-level patterns for data pipelines: ```mermaid graph LR; A[Source Systems]-->D[(Landing Database)]:::database; D-->F[(Transform & Clone)]:::database; F-->G[(Customer DB A)]:::database; F-->H[(Customer DB B)]:::database; F-->I[(Customer DB C)]:::database; subgraph App E[Serve Analytics] end G-->E; H-->E; I-->E; ``` Check out the detailed [Builder's Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/) for instructions on loading data into per-customer MotherDuck databases and orchestrating customer-facing analytics pipelines. ## Other considerations Since MotherDuck [Shares](/key-tasks/sharing-data/sharing-overview/) are read-only, in more real-time scenarios it may make sense to use: - [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot/) to force a checkpoint on the writer. - [`REFRESH DATABASE`](/sql-reference/motherduck-sql-reference/refresh-database/) to get the latest version of the data on the reader. This pattern can help enforce consistency between writer and reader databases that power your customer-facing dashboards. For high-scale, high-concurrency applications, MotherDuck offers [Read Scaling Replicas](https://motherduck.com/blog/read-scaling-preview/) for applications that send hundreds or thousands of queries in a few seconds, such as BI tools or busy embedded dashboards. Read replicas: - Can be created and modified in the UI. - Can be managed using the [MotherDuck REST API](/sql-reference/rest-api/motherduck-rest-api/). - Follow the same consistency considerations as Shares, and can be checkpointed and refreshed more frequently if needed. When you're ready to implement a full 3-tier architecture with per-customer service accounts, scheduled data loading, and a backend API, continue to the [**Customer-Facing Analytics Guide**](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/). 
--- Source: https://motherduck.com/docs/key-tasks/data-warehousing/data-warehousing --- title: Data Warehousing How-to description: Data Warehousing How-to guides --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Versions from '@site/src/components/Versions'; ## Introduction to MotherDuck for data warehousing MotherDuck is a cloud-native data warehouse built on top of [DuckDB](https://duckdb.org/docs/sql/introduction), a fast in-process analytical database. While DuckDB provides the core analytical engine capabilities, MotherDuck adds cloud storage, sharing, and collaboration features that make it a complete data warehouse solution. Key advantages include its serverless architecture that eliminates infrastructure management, an intuitive interface that simplifies data analysis, and hybrid execution that intelligently processes queries across local and cloud resources. MotherDuck is an ideal choice for organizations seeking a modern data warehouse solution. It excels at ad-hoc analytics by providing instant compute resources for each user, serves well as a departmental data mart with its simplified sharing model, and enables powerful embedded analytics through its WASM capabilities. Different personas benefit uniquely - data analysts get an intuitive SQL interface with AI assistance, engineers can leverage familiar APIs and tools like dbt, and data scientists can seamlessly combine local and cloud data processing. ![img_duck_stack](./img/md-diagram.svg) The modern data stack with MotherDuck integrates seamlessly with popular tools across the ecosystem. 
As shown in the ecosystem diagram, this includes ingestion tools like [Fivetran](https://fivetran.com/docs/destinations/motherduck#motherduck) and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) for loading data, transformation tools like [dbt](/docs/integrations/transformation/dbt) for modeling, BI tools like [Tableau](/integrations/bi-tools/tableau/) and [PowerBI](/integrations/bi-tools/powerbi/) for visualization, and orchestration tools like [Airflow](https://airflow.apache.org/docs/) and [Dagster](https://docs.dagster.io/integrations/libraries/duckdb/using-duckdb-with-dagster) for pipeline management. This comprehensive integration enables teams to build complete data warehousing solutions while leveraging their existing tooling investments. ## MotherDuck basics: concepts to understand before you start ![Architecture](./img/the-md-dwh.png) MotherDuck's core architecture is built on a serverless foundation that eliminates infrastructure management overhead. The platform handles data storage with enterprise-grade durability and security, while optimizing performance through intelligent data organization. Each user gets their own isolated compute resource called a "Duckling" that sits on top of the storage layer, and the separation of storage and compute enables independent scaling of these resources based on workload demands. The [dual execution model](/concepts/architecture-and-capabilities/#dual-execution) is a unique capability that allows MotherDuck to seamlessly query both local and cloud data. The query planner intelligently determines the optimal execution path, deciding whether to process data locally, in the cloud, or using a hybrid approach. This enables efficient querying across data sources while minimizing data movement and optimizing for performance. MotherDuck follows a familiar hierarchical structure with databases containing schemas and tables. 
Databases serve as the primary unit of organization and access control, while schemas help logically group related tables together. This structure provides a clean way to organize data while maintaining compatibility with common [SQL patterns](https://duckdb.org/docs/sql/introduction) and tools. Authentication in MotherDuck is handled through secure [token-based access](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token), with comprehensive user and organization management capabilities. The platform uses a simplified access model where users either have full access to a database or none at all. The [SHARES](/key-tasks/sharing-data/managing-shares/) feature enables secure data sharing within organizations and with external parties through zero-copy clones that maintain data consistency and security. The [MotherDuck user interface](/getting-started/interfaces/motherduck-quick-tour/) provides a modern notebook-style environment for data interaction. The SQL IDE includes powerful features like intelligent autocomplete, AI-powered query suggestions and fixes, and an interactive Column Explorer that helps users understand and analyze their data structure. These features combine to create an intuitive and productive environment for data analysis. While MotherDuck is designed for analytical workloads, it's important to note that it's not optimized for high-frequency small transactions like traditional OLTP databases. The platform works best with batch operations and [analytical queries](https://duckdb.org/docs/sql/introduction), and users should consider using queues for streaming workloads to achieve optimal performance. Additionally, the database-level security model means access cannot be controlled at the schema or table level. ## Data ingestion: getting your data in MotherDuck provides multiple strategies for ingesting data into your data warehouse. 
The platform leverages DuckDB's powerful data loading capabilities while adding cloud-native features for seamless data ingestion at scale. You can load data through direct file imports, cloud storage connections, database migrations, or specialized ETL tools like [Fivetran](https://fivetran.com/docs/destinations/motherduck#motherduck) and [Airbyte](https://docs.airbyte.com/integrations/destinations/motherduck) depending on your needs. The [MotherDuck Web UI](/getting-started/interfaces/motherduck-quick-tour/) provides an intuitive interface for data loading and exploration. ### Loading local data Loading data from local files supports common formats like CSV, Parquet, and JSON. The [MotherDuck UI](/getting-started/interfaces/motherduck-quick-tour/) provides an intuitive interface for uploading files directly, while the [Python client](https://duckdb.org/docs/api/python/overview) enables programmatic loading using DuckDB's native functions. For example, you can use [read_csv()](https://duckdb.org/docs/data/csv), [read_parquet()](https://duckdb.org/docs/data/parquet), or [read_json()](https://duckdb.org/docs/data/json) to efficiently load data files while taking advantage of DuckDB's parallel processing capabilities. ### Interacting with cloud storage (S3, GCS, etc) Cloud storage integration lets you directly query and load data from major providers including [AWS S3](https://duckdb.org/docs/guides/import/s3_import), [Google Cloud Storage](https://duckdb.org/docs/guides/import/gcs_import), [Azure Blob Storage](https://duckdb.org/docs/stable/extensions/azure), and [Cloudflare R2](https://duckdb.org/docs/guides/import/s3_import). Using SQL commands like SELECT FROM read_parquet('s3://bucket/file.parquet'), you can seamlessly access cloud data. MotherDuck handles credential management securely through [environment variables](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) or configuration settings. 
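As a rough illustration of loading from cloud storage programmatically, the sketch below wraps a `read_parquet` load in a small Python helper. The function name `load_parquet_from_s3` and the table and bucket names in the usage note are hypothetical. The connection is assumed to be a DuckDB connection to MotherDuck with any required S3 credentials already configured.

```python
import re


def load_parquet_from_s3(con, table: str, uri: str) -> str:
    """Create or replace `table` from a Parquet file (or glob) on S3.

    `con` is any object with a DuckDB-style `execute(sql, params)` method.
    """
    # The table name is interpolated into SQL, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsupported table name: {table!r}")
    # This sketch only handles S3 paths; extend the check for other providers.
    if not uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got: {uri!r}")
    # The URI is passed as a bound parameter rather than pasted into the SQL.
    statement = f"CREATE OR REPLACE TABLE {table} AS SELECT * FROM read_parquet(?)"
    con.execute(statement, [uri])
    return statement
```

With the `duckdb` package installed and `MOTHERDUCK_TOKEN` set, `con` could be the result of `duckdb.connect("md:")`, and a call might look like `load_parquet_from_s3(con, "orders_raw", "s3://my-bucket/orders/*.parquet")`, where both names are placeholders.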
### Database-to-database data loading For database migrations, MotherDuck supports importing data from other databases like [PostgreSQL](https://duckdb.org/docs/guides/import/query_postgres.html) and [MySQL](https://duckdb.org/docs/guides/import/query_mysql). You can directly connect to these sources using database connectors and execute queries to extract and load data. Existing [DuckDB databases](https://duckdb.org/docs/stable/data/multiple_files/overview) can be imported efficiently since MotherDuck is built on DuckDB's core engine. ### Fetching data from APIs [Data ingestion](/integrations/ingestion/) tools like Fivetran, Airbyte, dltHub and Estuary integrate with MotherDuck to provide automated, reliable data pipelines. These tools handle complex ETL workflows, data validation, and transformation while offering features like scheduling, monitoring and error handling that simplify ongoing data operations. For real-time data needs, MotherDuck works with streaming partners like [Estuary](https://docs.estuary.dev/reference/Connectors/materialization-connectors/motherduck/) to enable continuous data ingestion. While DuckDB is optimized for batch operations, these integrations allow you to build streaming pipelines that buffer and load data in micro-batches for near real-time analytics. ### Unstructured data integrations When working with unstructured data like documents, emails or images, tools like [Unstructured.io](https://motherduck.com/blog/effortless-etl-unstructured-data-unstructuredio-motherduck/) can pre-process and structure the data before loading into MotherDuck. This lets you analyze unstructured data alongside your structured data warehouse tables. ### Loading performance notes For optimal performance, follow DuckDB's recommended practices around batch sizes and data types. Load data in reasonably sized batches (at least 122k rows) to balance memory usage and throughput.
Use appropriate data types like TIMESTAMP for datetime values and avoid unnecessary type conversions. Sort data by columns that are frequently queried together, such as TIMESTAMPs. Monitor [recent queries](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) during large loads and adjust batch sizes accordingly. ## Data transformation: shaping your data for analysis Data transformation is a critical step in the data warehousing process that converts raw data into analysis-ready formats. MotherDuck provides powerful SQL capabilities inherited from DuckDB for transforming data directly within the warehouse. You can leverage DuckDB's rich library of SQL functions to clean, reshape, and model your data through operations like filtering, joining, aggregating and pivoting. ### Transformation tools - **[dbt (data build tool)](/integrations/transformation/dbt/)** * Native MotherDuck adapter for seamless integration with dbt Core * Enables version-controlled, modular SQL transformations * Supports testing, documentation and lineage tracking * Recommended for complex transformation workflows * See our [blog post](https://motherduck.com/blog/duckdb-dbt-e2e-data-engineering-project-part-2/) for detailed examples - **[SQLMesh](https://sqlmesh.readthedocs.io/en/stable/integrations/engines/motherduck/)** * Compatible with MotherDuck through DuckDB support * Provides data pipeline and transformation management * Enables incremental processing and scheduling - **[Paradime](https://docs.paradime.io/app-help/documentation/settings/connections/scheduler-environment/duckdb)** * Modern data transformation platform built for DuckDB/MotherDuck * Offers collaborative development environment * Includes version control and deployment tools ## Orchestration: automating your data pipelines Orchestration is essential for keeping data up to date with MotherDuck.
Scheduling data loads and transformations ensures your data warehouse stays current by running ingestion jobs at appropriate intervals to capture new data from your sources. Managing dependencies between tasks lets you create reliable pipelines where transformations only run after their prerequisite data loads complete successfully. Monitoring and alerting capabilities help you track pipeline health and quickly address any issues that arise. For orchestrating MotherDuck workflows, you have several options: Popular workflow orchestration platforms like [Airflow, Dagster, Kestra, Prefect and Bacalhau](/integrations/orchestration/) provide robust scheduling, dependency management and monitoring capabilities. For simpler use cases, basic scheduling tools like cron jobs or [GitHub Actions](/key-tasks/data-warehousing/orchestration/github-action-cron/) can effectively orchestrate data pipelines. Many ingestion & transformation tools also come with built-in orchestration features, allowing you to schedule and monitor data loads without additional tooling. When orchestrating MotherDuck pipelines, follow these best practices: - Design idempotent jobs that can safely re-run without duplicating or corrupting data. - Implement proper error handling and retries to gracefully handle temporary failures. - Set up logging and monitoring to maintain visibility into pipeline health and performance. ## Connecting BI tools and data applications MotherDuck provides robust support for business intelligence and reporting through its cloud data warehouse capabilities. The platform enables organizations to build scalable analytics solutions by connecting their data warehouse to popular visualization and reporting tools. With isolated compute tenancy per user, analysts can run complex queries without impacting other users' performance. For connecting popular BI tools, MotherDuck offers several integration options. 
Tableau users can connect through the [cloud and server connectors](/integrations/bi-tools/tableau/), with support for both token-based and environment variable authentication methods. The platform works with both live and extracted connections, and Tableau Bridge enables cloud connectivity. [Microsoft Power BI](/integrations/bi-tools/powerbi/) integration is achieved through the DuckDB ODBC driver and Power Query connector, supporting both import and DirectQuery modes. Other supported BI tools include Omni, Metabase, Preset/Superset, and Rill, typically connecting through standard JDBC/ODBC interfaces. MotherDuck seamlessly integrates with data science and AI tools through its native APIs and connectors. Python users can leverage the DuckDB SDK and Pandas integration for data analysis workflows. The platform supports R for statistical computing, while AI applications can be built using LangChain or LlamaIndex integrations. Notebook tools like Hex and Jupyter provide both hosted and on-prem environments for data exploration. For building [custom data applications](/getting-started/customer-facing-analytics/), MotherDuck's unique architecture enables novel approaches through its WASM-powered 1.5-tier architecture. The platform runs DuckDB in the browser through WebAssembly, allowing for highly interactive visualizations with near-zero latency. Developers can use MotherDuck's APIs and SDKs in languages like Python and Go to create custom data applications that leverage both local and cloud-based data processing. ## Advanced topics & best practices ### Performance tuning and optimization in MotherDuck MotherDuck inherits DuckDB's powerful query optimization capabilities. You can analyze query performance using the `EXPLAIN` command to view execution plans and identify bottlenecks. While DuckDB doesn't use traditional indexes, it automatically creates statistics and metadata to optimize query execution with row groups. 
As a result, [sorting the data on insert](https://duckdb.org/2025/05/14/sorting-for-fast-selective-queries.html) is a very effective way to improve query performance. ### Data sharing and collaboration MotherDuck implements a data sharing model through SHARES, which provide read-only access to specific databases. To create a share, use the [`CREATE SHARE`](/sql-reference/motherduck-sql-reference/create-share/) command and specify the database you want to share. Recipients can then access the shared data through their own MotherDuck account while maintaining data isolation. ### Monitoring and logging MotherDuck usage DuckDB's meta-queries like `EXPLAIN ANALYZE` provide detailed query execution statistics. You can also use the platform's built-in profiling capabilities to monitor query performance and resource utilization, helping identify optimization opportunities and troubleshoot performance issues. [Recent queries](/sql-reference/motherduck-sql-reference/md_information_schema/recent_queries/) and [historical queries](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/) can be observed as well, to further optimize the warehouse load. ### Cost management While MotherDuck's pricing model is still evolving, you can optimize costs by efficiently managing compute resources. Consider implementing data lifecycle policies to archive or delete old data. Monitor query patterns to identify opportunities for optimization and avoid unnecessary data processing. ### Security best practices for your MotherDuck warehouse - Implement robust security practices by following MotherDuck's database-level security model. - Use token-based authentication for all connections and avoid sharing credentials. - When integrating with tools, leverage environment variables for secure credential management. - Regularly audit database access and maintain an inventory of active shares.
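The environment-variable recommendation above might look like this in a Python job. This is a small sketch under stated assumptions: the helper names `require_motherduck_token` and `redact` are hypothetical, and the only thing taken from the documentation is that the token is supplied through the `MOTHERDUCK_TOKEN` (or lowercase `motherduck_token`) environment variable rather than written into code.

```python
import os
from typing import Mapping, Optional


def require_motherduck_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the MotherDuck token from the environment, or fail fast.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    source = os.environ if env is None else env
    token = source.get("MOTHERDUCK_TOKEN") or source.get("motherduck_token")
    if not token:
        raise RuntimeError(
            "Set MOTHERDUCK_TOKEN in the environment; do not hard-code tokens."
        )
    return token


def redact(token: str) -> str:
    """Return a log-safe placeholder that never includes token characters."""
    return f"<token: {len(token)} chars>"
```

Failing at startup when the variable is missing keeps a misconfigured job from falling back to an interactive login, and logging only `redact(token)` avoids leaking the credential into run logs or artifacts.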
### Leveraging AI features within MotherDuck MotherDuck enhances DuckDB with AI-powered features to improve productivity. The platform includes a [SQL AI fixer](/getting-started/interfaces/motherduck-quick-tour/#fix-errors-and-edit-queries-with-ai) that helps identify and correct query syntax issues. The `prompt()` function enables natural language interactions with your data warehouse, allowing users to generate SQL queries from plain English descriptions. These are just a few of the AI capabilities that help make data analysis more accessible while maintaining the power and flexibility of SQL. ## Further guides: ## Included pages - [GitHub Actions](https://motherduck.com/docs/key-tasks/data-warehousing/orchestration/github-action-cron): Schedule MotherDuck SQL and dbt jobs with GitHub Actions as a lightweight cron-based orchestrator. - [PostgreSQL](https://motherduck.com/docs/key-tasks/data-warehousing/replication/postgres): Replicate PostgreSQL tables to MotherDuck using DuckDB and the PostgreSQL extension. - [Dagster](https://motherduck.com/docs/key-tasks/data-warehousing/orchestration/dagster): Orchestrate an incremental S3-to-MotherDuck data loading pipeline with Dagster and Python. - [SQL Server](https://motherduck.com/docs/key-tasks/data-warehousing/replication/sql-server): Replicate SQL Server tables to MotherDuck using Python and dataframes. - [Flat Files](https://motherduck.com/docs/key-tasks/data-warehousing/replication/flat-files): Load CSV, Parquet, and JSON files into MotherDuck from local storage or cloud sources. - [Excel and Google Sheets](https://motherduck.com/docs/key-tasks/data-warehousing/replication/spreadsheets): Load Excel and Google Sheets data into MotherDuck using the DuckDB CLI. ## Appendix ### Troubleshooting common issues When working with MotherDuck, you may encounter challenges around data loading, query performance, or connectivity. 
For data loading issues, refer to our [best practices for programmatic loading](/key-tasks/data-warehousing/) which covers optimizing batch sizes and file formats. For query performance, review our [dual execution capabilities](/concepts/architecture-and-capabilities/#dual-execution) to understand how MotherDuck optimizes query execution across local and cloud resources. For connectivity problems, check our [authentication guides](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck) and ensure you're following the recommended connection patterns. ### Useful SQL snippets for MotherDuck MotherDuck supports a wide range of SQL functionality inherited from DuckDB. For data ingestion, refer to our [PostgreSQL replication examples](/key-tasks/data-warehousing/replication/postgres) which demonstrate common patterns for loading data. For building customer facing analytics, check our [guide](/getting-started/customer-facing-analytics) which includes examples of data processing and visualization queries. The [DuckDB SQL documentation](https://duckdb.org/docs/sql/introduction.html) provides comprehensive reference for the SQL dialect. ### Links to further resources (MotherDuck docs, community) To deepen your understanding of data warehousing with MotherDuck, explore our [data warehousing concepts guide](/key-tasks/data-warehousing/) which covers architectural principles and best practices. For hands-on examples, the free [DuckDB in Action eBook](https://motherduck.com/duckdb-book-brief/) provides real-world scenarios and solutions. If you need help, don't hesitate to [contact our support team](https://motherduck.com/customer-support/) or explore our [ecosystem integrations](/integrations/) for additional tools and capabilities. Please do not hesitate to **[contact us](https://motherduck.com/customer-support/)** if you need help along your journey. 
--- Source: https://motherduck.com/docs/key-tasks/data-warehousing/orchestration/dagster --- sidebar_position: 2 title: Dagster description: Orchestrate an incremental S3-to-MotherDuck data loading pipeline with Dagster and Python. --- Use Dagster when you want asset lineage, schedules, retries, and run history around a Python data loading job. This guide builds a minimum viable Dagster asset that reads Parquet data from S3, loads rows newer than the last successful run, upserts them into MotherDuck, and stores a watermark for the next run. The example uses a public S3 Parquet file from the MotherDuck sample data bucket. Replace the S3 path and column mapping with your own bucket layout when you move from the demo to your pipeline. ## How the pipeline works ```mermaid graph LR S3[("S3 Parquet file")]:::yellow A["Dagster asset
taxi_trips"]:::watermelon W[("ingestion_watermarks")]:::yellow T[("taxi_trips")]:::yellow W --> A S3 --> A A --> T A --> W ``` The asset keeps the state in MotherDuck: - `taxi_trips` is the target table. - `ingestion_watermarks` stores the latest `pickup_at` value loaded by this pipeline. - Each run reads only rows where `tpep_pickup_datetime` is greater than the stored watermark. - The target table has a primary key, so reprocessing the same row updates the existing row instead of creating a duplicate. ## Prerequisites Before you start, ensure you have: - Python 3.10 or later. - `uv` for Python project and dependency management. - A MotherDuck access token in `MOTHERDUCK_TOKEN`. - A MotherDuck database name for the pipeline. The example creates the database if it doesn't exist. - For private S3 buckets, a MotherDuck S3 secret. See [Amazon S3 credentials](/integrations/cloud-storage/amazon-s3/) for setup. :::tip Use a dedicated MotherDuck service account for scheduled ingestion jobs. This keeps ingestion compute, permissions, and cost attribution separate from analyst and application workloads. See [Hypertenancy](/concepts/hypertenancy/) for the compute isolation model. ::: ## Create the Dagster project Create a small Python project and add Dagster with DuckDB: ```bash > uv init dagster-motherduck-s3 > cd dagster-motherduck-s3 > uv add dagster dagster-webserver duckdb ``` Create `definitions.py`: ```python import os import re import dagster as dg import duckdb S3_URI = os.getenv( "S3_URI", "s3://us-prd-motherduck-open-datasets/nyc_taxi/parquet/yellow_cab_nyc_2022_11.parquet", ) MOTHERDUCK_DATABASE = os.getenv("MOTHERDUCK_DATABASE", "dagster_s3_demo") PIPELINE_NAME = "dagster_s3_taxi_trips" # Optional cap for running the demo quickly. Leave unset for a real pipeline. 
INGESTION_END_TS = os.getenv("MOTHERDUCK_INGESTION_END_TS") PUBLIC_DEMO_SCOPE = "s3://us-prd-motherduck-open-datasets/" def database_identifier(name: str) -> str: if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name): raise ValueError("Use a database name with letters, numbers, and underscores.") return name def open_motherduck_connection() -> duckdb.DuckDBPyConnection: database = database_identifier(MOTHERDUCK_DATABASE) con = duckdb.connect("md:") con.execute(f"CREATE DATABASE IF NOT EXISTS {database}") con.execute(f"USE {database}") if S3_URI.startswith(PUBLIC_DEMO_SCOPE): con.execute(""" CREATE OR REPLACE TEMPORARY SECRET public_motherduck_open_data ( TYPE S3, PROVIDER config, REGION 'us-east-1', SCOPE 's3://us-prd-motherduck-open-datasets/' ) """) return con @dg.asset def taxi_trips(context: dg.AssetExecutionContext) -> dg.MaterializeResult: con = open_motherduck_connection() try: con.execute(""" CREATE TABLE IF NOT EXISTS taxi_trips ( trip_id VARCHAR PRIMARY KEY, pickup_at TIMESTAMP, dropoff_at TIMESTAMP, passenger_count DOUBLE, trip_distance DOUBLE, total_amount DOUBLE, source_file VARCHAR, loaded_at TIMESTAMP DEFAULT now() ) """) con.execute(""" CREATE TABLE IF NOT EXISTS ingestion_watermarks ( pipeline_name VARCHAR PRIMARY KEY, last_pickup_at TIMESTAMP ) """) con.execute(""" INSERT INTO ingestion_watermarks VALUES (?, TIMESTAMP '1970-01-01') ON CONFLICT (pipeline_name) DO NOTHING """, [PIPELINE_NAME]) last_pickup_at = con.execute( "SELECT last_pickup_at FROM ingestion_watermarks WHERE pipeline_name = ?", [PIPELINE_NAME], ).fetchone()[0] con.execute(""" CREATE OR REPLACE TEMP TABLE new_taxi_trips AS SELECT md5(concat_ws('|', VendorID::VARCHAR, tpep_pickup_datetime::VARCHAR, tpep_dropoff_datetime::VARCHAR, PULocationID::VARCHAR, DOLocationID::VARCHAR, total_amount::VARCHAR )) AS trip_id, tpep_pickup_datetime AS pickup_at, tpep_dropoff_datetime AS dropoff_at, passenger_count, trip_distance, total_amount, filename AS source_file, now() AS loaded_at FROM 
read_parquet(?, filename = true) WHERE tpep_pickup_datetime > ? AND (? IS NULL OR tpep_pickup_datetime < ?::TIMESTAMP) """, [S3_URI, last_pickup_at, INGESTION_END_TS, INGESTION_END_TS]) rows_loaded = con.execute("SELECT count(*) FROM new_taxi_trips").fetchone()[0] con.execute(""" INSERT INTO taxi_trips BY NAME SELECT * FROM new_taxi_trips ON CONFLICT (trip_id) DO UPDATE SET pickup_at = excluded.pickup_at, dropoff_at = excluded.dropoff_at, passenger_count = excluded.passenger_count, trip_distance = excluded.trip_distance, total_amount = excluded.total_amount, source_file = excluded.source_file, loaded_at = excluded.loaded_at """) max_pickup_at = con.execute( "SELECT max(pickup_at) FROM new_taxi_trips" ).fetchone()[0] if max_pickup_at is not None: con.execute( "UPDATE ingestion_watermarks SET last_pickup_at = ? WHERE pipeline_name = ?", [max_pickup_at, PIPELINE_NAME], ) total_rows = con.execute("SELECT count(*) FROM taxi_trips").fetchone()[0] context.log.info("Loaded %s rows into taxi_trips", rows_loaded) return dg.MaterializeResult( metadata={ "rows_loaded": rows_loaded, "total_rows": total_rows, "last_pickup_at": str(max_pickup_at or last_pickup_at), } ) finally: con.close() daily_s3_ingestion = dg.ScheduleDefinition( name="daily_s3_taxi_trips", cron_schedule="0 2 * * *", target=[taxi_trips], ) defs = dg.Definitions( assets=[taxi_trips], schedules=[daily_s3_ingestion], ) if __name__ == "__main__": result = dg.materialize([taxi_trips]) if not result.success: raise RuntimeError("Dagster materialization failed.") ``` ## Run the ingestion Set the MotherDuck token and database name: ```bash > export MOTHERDUCK_TOKEN="" > export MOTHERDUCK_DATABASE="dagster_s3_demo" ``` For the public demo file, you can cap the first run to one day of taxi trips so the example finishes quickly: ```bash > export MOTHERDUCK_INGESTION_END_TS="2022-11-02" ``` Run the asset once from Python: ```bash > uv run python definitions.py ``` Run the same command again. 
The second run should load `0` rows because the first run advanced the watermark. Verify the loaded rows in MotherDuck: ```sql SELECT count(*) FROM taxi_trips; SELECT pipeline_name, last_pickup_at FROM ingestion_watermarks; ``` When you use your own S3 data, remove `MOTHERDUCK_INGESTION_END_TS` and replace: - `S3_URI` with your `s3:////*.parquet` path. - The `SELECT` list in `new_taxi_trips` with your source columns. - The watermark column with a stable source timestamp, such as `updated_at` or `created_at`. - The primary key expression with the source system's durable row key. ## Run it in Dagster Start the Dagster UI from the same directory: ```bash > uv run dagster dev -f definitions.py ``` Open `http://localhost:3000`, select the `taxi_trips` asset, and materialize it. Dagster records the asset materialization, metadata, logs, and schedule definition. To use the schedule in a long-running Dagster deployment, keep the `daily_s3_taxi_trips` schedule enabled and run a Dagster daemon. For local one-off testing, `uv run python definitions.py` is enough. ## Production considerations This example is intentionally small. Before using the pattern in production: - Use a dedicated service account token with only the permissions needed for ingestion. - Store private bucket credentials as a MotherDuck S3 secret instead of embedding AWS keys in code. - Keep S3 files in Parquet and avoid very small files. See [S3 import best practices](/key-tasks/cloud-storage/s3-import-best-practices/). - Use a source-provided primary key for upserts. Hashing source fields is useful for demos but less stable than a real key. - Use a source timestamp that only moves forward for watermarking. If your source sends late-arriving records, add a small overlap window and deduplicate by primary key. 
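The overlap-window advice in the last bullet can be sketched as two small helpers. This is an illustration rather than part of the pipeline above: the function names and the 15-minute default are assumptions, and the right overlap depends on how late your source's records can arrive.

```python
from datetime import datetime, timedelta
from typing import Optional


def read_lower_bound(
    watermark: datetime, overlap: timedelta = timedelta(minutes=15)
) -> datetime:
    """Return the timestamp to filter from: the watermark minus an overlap.

    Rows inside the overlap are read again on the next run, so the target
    table needs a primary key and an upsert to avoid duplicates.
    """
    if overlap < timedelta(0):
        raise ValueError("overlap must not be negative")
    return watermark - overlap


def next_watermark(current: datetime, batch_max: Optional[datetime]) -> datetime:
    """Advance the watermark without ever moving it backwards.

    `batch_max` is the largest source timestamp in the batch, or None when
    the batch was empty.
    """
    if batch_max is None:
        return current
    return max(current, batch_max)
```

In the asset above, the value bound to `tpep_pickup_datetime > ?` would become `read_lower_bound(last_pickup_at)` instead of `last_pickup_at`, and the existing `ON CONFLICT (trip_id) DO UPDATE` clause would absorb the rows that are read twice. Using `next_watermark` keeps a batch made up only of re-read, older rows from moving the stored watermark backwards.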
## Related content - [Amazon S3 credentials](/integrations/cloud-storage/amazon-s3/) - [S3 import best practices](/key-tasks/cloud-storage/s3-import-best-practices/) - [Connecting to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/) - [Hypertenancy](/concepts/hypertenancy/) --- Source: https://motherduck.com/docs/key-tasks/data-warehousing/orchestration/github-action-cron --- sidebar_position: 1 title: GitHub Actions description: Schedule MotherDuck SQL and dbt jobs with GitHub Actions as a lightweight cron-based orchestrator. keywords: - cron - workflow_dispatch - duckdb cli - service account --- GitHub Actions works well as a lightweight orchestrator for simple MotherDuck jobs: nightly SQL scripts, small ELT steps, dbt builds, smoke tests, and periodic exports. It is not a full data orchestrator, but it is often enough when a pipeline has one or two steps and can tolerate GitHub's scheduler behavior. ## When to use this pattern | Use GitHub Actions when | Use a dedicated orchestrator when | |-------------------------|-----------------------------------| | The job has a small number of steps | Jobs have complex dependencies or branching | | A missed or delayed run can be retried manually | Every run needs strict service-level guarantees | | The pipeline can run from repository files | State, retries, and backfills need first-class tracking | | GitHub is already where you review pipeline changes | Multiple teams need a shared orchestration UI | For larger workflows, use a tool from the [MotherDuck orchestration ecosystem](https://motherduck.com/ecosystem/?category=Orchestration). ## Set up authentication Create a [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token), preferably from a service account dedicated to the pipeline. 
Store it as a GitHub repository secret named `MOTHERDUCK_TOKEN`: ```bash gh secret set MOTHERDUCK_TOKEN ``` Use the token as an environment variable in workflow steps. Avoid putting tokens directly into SQL files, command arguments, artifacts, or logs. ## Choose the trigger Most MotherDuck cron jobs should support both manual and scheduled runs with GitHub Actions [`workflow_dispatch`](https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax#onworkflow_dispatch) and [`schedule`](https://docs.github.com/en/actions/reference/workflows-and-actions/events-that-trigger-workflows#schedule) triggers: ```yaml on: workflow_dispatch: schedule: - cron: "17 2 * * *" ``` Keep these GitHub Actions scheduling details in mind: - Scheduled workflows run from the latest commit on the default branch. - Cron schedules use UTC by default. - The shortest supported interval is every 5 minutes. - Jobs scheduled at the top of the hour can be delayed or dropped during periods of high GitHub Actions load. Pick a non-zero minute such as `17` or `43`. - `workflow_dispatch` lets you test the same workflow manually and rerun failed jobs after a fix. ## Example: run a SQL file on a schedule This example runs a checked-in SQL script every night and on demand. 
It uses: - Least-privilege repository permissions - A timeout so failed jobs do not burn runner minutes indefinitely - A concurrency group so two runs do not write to the same target at once - The MotherDuck install script for a compatible DuckDB CLI Create `.github/workflows/motherduck-nightly-sql.yml`: ```yaml name: motherduck nightly sql on: workflow_dispatch: schedule: - cron: "17 2 * * *" permissions: contents: read concurrency: group: motherduck-nightly-sql cancel-in-progress: false jobs: run-sql: runs-on: ubuntu-24.04 timeout-minutes: 15 env: motherduck_token: ${{ secrets.MOTHERDUCK_TOKEN }} steps: - name: Check out repository uses: actions/checkout@v6 - name: Install DuckDB CLI run: | install_home="$RUNNER_TEMP/motherduck" mkdir -p "$install_home" curl -s https://install.motherduck.com | env -u motherduck_token HOME="$install_home" sh echo "$install_home/.duckdb/cli/latest" >> "$GITHUB_PATH" - name: Run nightly SQL run: duckdb "md:" < sql/nightly_orders.sql ``` Create `sql/nightly_orders.sql`: ```sql CREATE DATABASE IF NOT EXISTS analytics; USE analytics; CREATE SCHEMA IF NOT EXISTS orchestration; CREATE TABLE IF NOT EXISTS orchestration.github_action_runs ( run_id VARCHAR, workflow_name VARCHAR, run_started_at TIMESTAMP ); DELETE FROM orchestration.github_action_runs WHERE run_id = getenv('GITHUB_RUN_ID'); INSERT INTO orchestration.github_action_runs VALUES ( getenv('GITHUB_RUN_ID'), getenv('GITHUB_WORKFLOW'), current_timestamp ); ``` Replace `analytics` with the MotherDuck database your pipeline should write to. The example creates the database if it does not already exist so a new repository can run without extra setup. The GitHub secret is named `MOTHERDUCK_TOKEN`, while the workflow exposes it as `motherduck_token`. The DuckDB CLI can use that environment variable to connect to MotherDuck non-interactively in GitHub Actions. 
The install step uses `RUNNER_TEMP` as `HOME` and unsets `motherduck_token` for the installer process so the install script does not try to update the runner's shell profile or validate the connection before the SQL step runs. ## Example: run dbt on a schedule For dbt projects, keep the dbt profile in the repository and read the MotherDuck token from the GitHub secret. Create `.github/workflows/motherduck-dbt.yml`: ```yaml name: motherduck dbt on: workflow_dispatch: schedule: - cron: "43 3 * * *" permissions: contents: read concurrency: group: motherduck-dbt-prod cancel-in-progress: false jobs: dbt-build: runs-on: ubuntu-24.04 timeout-minutes: 30 env: MOTHERDUCK_TOKEN: ${{ secrets.MOTHERDUCK_TOKEN }} steps: - name: Check out repository uses: actions/checkout@v6 - name: Set up Python uses: actions/setup-python@v6 with: python-version: "3.12" cache: pip - name: Install dbt run: python -m pip install -r requirements.txt - name: Install dbt packages run: dbt deps - name: Build dbt project run: dbt build --profiles-dir .github/dbt --target prod ``` Create `requirements.txt`: ```text dbt-duckdb>=1.9,<2.0 ``` Create `.github/dbt/profiles.yml`: ```yaml motherduck: target: prod outputs: prod: type: duckdb path: "md:analytics?motherduck_token={{ env_var('MOTHERDUCK_TOKEN') }}" threads: 4 ``` In `dbt_project.yml`, set the same profile name: ```yaml profile: motherduck ``` ## Production checklist | Area | Recommendation | |------|----------------| | Authentication | Use a service account token stored as `MOTHERDUCK_TOKEN`. Rotate it on the same cadence as other production secrets. | | Permissions | Set `permissions: contents: read` unless the workflow must write to the repository or call GitHub APIs. | | Scheduling | Use non-zero cron minutes and keep `workflow_dispatch` enabled for manual retries. | | Concurrency | Use a `concurrency` group for jobs that write to the same tables. | | Idempotency | Make SQL safe to rerun. 
Prefer `CREATE TABLE IF NOT EXISTS`, `CREATE OR REPLACE TABLE`, `MERGE`, or delete-and-insert patterns keyed by the run or partition. | | Timeouts | Set `timeout-minutes` on every job. | | Dependencies | Pin dependencies in `requirements.txt` or an equivalent lock file. Use dependency caching for Python/dbt jobs. | | Environments | Use separate service accounts and databases for development, staging, and production. | | Observability | Write a run record to a small audit table and rely on GitHub Actions notifications for failures. | ## Related content - [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/) - [dbt with DuckDB and MotherDuck](/integrations/transformation/dbt/) - [DuckDB CLI](/getting-started/interfaces/connect-query-from-duckdb-cli/) - [Orchestration integrations](https://motherduck.com/ecosystem/?category=Orchestration) --- Source: https://motherduck.com/docs/key-tasks/data-warehousing/replication/flat-files --- sidebar_position: 10 title: Flat Files description: Load CSV, Parquet, and JSON files into MotherDuck from local storage or cloud sources. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import DownloadLink from '@site/src/components/DownloadLink'; # Replicating flat files to MotherDuck The goal of this guide is to show users simple examples of loading data from flat file sources into MotherDuck. Examples are shown for both the MotherDuck Web UI and the DuckDB CLI. To install the DuckDB CLI, [check out the instructions first.](/getting-started/interfaces/connect-query-from-duckdb-cli) ## CSV From the UI, follow these steps: 1. Navigate to the **Add Data** section. 2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB. 3. Execute the generated query which will create a table for you. 1. Modify the query as needed to suit the correct Database / Schema / Table name. 
In the CLI, you can load a CSV file using the `read_csv` function. For example: ### Local file ```sql CREATE TABLE my_table AS SELECT * FROM read_csv('path/to/local_file.csv'); ``` ### S3 file To load from S3, ensure your DuckDB instance is configured with [S3 secrets](/documentation/integrations/cloud-storage/amazon-s3.mdx). Then: ```sql CREATE TABLE my_table AS SELECT * FROM read_csv('s3://bucket-name/path-to-file.csv'); ``` ## JSON From the UI, follow these steps: 1. Navigate to the **Add Data** section. 2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB. 3. Execute the generated query which will create a table for you. 1. Modify the query as needed to suit the correct Database / Schema / Table name. In the CLI, use the `read_json` function to load JSON files. ### Local file ```sql CREATE TABLE my_table AS SELECT * FROM read_json('path/to/local_file.json'); ``` ### S3 file Make sure S3 support is enabled as described in the [S3 secrets documentation](/documentation/integrations/cloud-storage/amazon-s3.mdx). ```sql CREATE TABLE my_table AS SELECT * FROM read_json('s3://bucket-name/path-to-file.json'); ``` :::tip Provide a schema for large or deeply nested JSON When loading large JSON files, DuckDB scans the data to discover the schema during query planning. For deeply nested or complex JSON, this can add significant time. 
To speed things up, provide the schema directly with the `columns` parameter: ```sql CREATE TABLE my_table AS SELECT * FROM read_json( 'path/to/local_file.json', columns={ id: 'BIGINT', name: 'VARCHAR', amount: 'DECIMAL(10,2)' } ); ``` If you already have a table with the right schema, use `INSERT INTO` instead of `CREATE TABLE AS` — DuckDB skips schema discovery when the target schema is known: ```sql INSERT INTO my_table SELECT * FROM read_json('path/to/local_file.json'); ``` You can also limit how deep DuckDB looks into nested structures with `maximum_depth`, or reduce the number of sampled objects with `sample_size` (default: 20480). See the [DuckDB JSON documentation](https://duckdb.org/docs/stable/data/json/loading_json) for all available options. ::: ## Parquet From the UI, follow these steps: 1. Navigate to the **Add Data** section. 2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB. 3. Execute the generated query which will create a table for you. 1. Modify the query as needed to suit the correct Database / Schema / Table name. In the CLI, use the `read_parquet` function to load Parquet files. ### Local file ```sql CREATE TABLE my_table AS SELECT * FROM read_parquet('path/to/local_file.parquet'); ``` ### S3 file Ensure S3 support is enabled as described in the [S3 secrets documentation](/documentation/integrations/cloud-storage/amazon-s3.mdx). ```sql CREATE TABLE my_table AS SELECT * FROM read_parquet('s3://bucket-name/path-to-file.parquet'); ``` ## Handling more complex workflows Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which includes many options including some that offer native python. An overview of the MotherDuck Ecosystem is shown below. 
![Diagram](../../../img/md-diagram.svg) --- Source: https://motherduck.com/docs/key-tasks/data-warehousing/replication/postgres --- sidebar_position: 1 title: PostgreSQL description: Replicate PostgreSQL tables to MotherDuck using DuckDB and the PostgreSQL extension. --- This page shows SQL patterns for connecting DuckDB to PostgreSQL, connecting to MotherDuck, and writing data from PostgreSQL into MotherDuck. For more complex replication scenarios, use one of our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion). If you are looking for the [pg_duckdb extension](https://github.com/duckdb/pg_duckdb), see the [pg_duckdb explainer page](/concepts/pgduckdb). To skip the documentation and look at the entire script, expand the element below:
SQL script ```sql -- install the PostgreSQL extension in DuckDB INSTALL postgres; LOAD postgres; -- tune the local DuckDB client for a larger initial load SET threads = 4; SET memory_limit = '4GB'; SET pg_connection_limit = 4; SET pg_pages_per_task = 250; -- attach PostgreSQL as pg_db ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS pg_db (TYPE POSTGRES, READ_ONLY); -- connect to MotherDuck ATTACH 'md:'; USE my_db; -- copy a PostgreSQL table into MotherDuck CREATE OR REPLACE TABLE main.postgres_table AS SELECT * FROM pg_db.public.some_table ```
## Loading the PostgreSQL extension and authenticating :::info MotherDuck does not yet support the PostgreSQL and MySQL extensions, so you need to perform the following steps on your own computer or cloud computing resource. We are working on supporting the PostgreSQL extension on the server side so that this can happen within the MotherDuck app in the future with improved performance. ::: The first step is to install and load the PostgreSQL extension using the [DuckDB CLI](/getting-started/interfaces/connect-query-from-duckdb-cli): ```sql INSTALL postgres; LOAD postgres; ``` Once this is completed, you can connect to PostgreSQL by attaching it to your DuckDB session: ```sql ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS pg_db (TYPE POSTGRES, READ_ONLY); ``` More detailed information can be found on the [DuckDB documentation](https://duckdb.org/docs/extensions/postgres.html#connecting). For larger initial loads, tune the DuckDB client explicitly instead of relying on defaults: ```sql SET threads = 8; SET memory_limit = '8GB'; SET pg_connection_limit = 8; SET pg_pages_per_task = 250; ``` `pg_connection_limit` controls how many PostgreSQL connections DuckDB may open for the scan, while `pg_pages_per_task` controls how much table work is grouped into each scan task. ## Connecting to MotherDuck and inserting the table Once you are connected to your PostgreSQL database, you need to connect to MotherDuck. To learn more, see [Connecting to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck). ```sql ATTACH 'md:'; USE my_db; ``` Once you have authenticated, you can use `CREATE TABLE AS SELECT` to replicate data from PostgreSQL into MotherDuck. ```sql CREATE OR REPLACE TABLE main.postgres_table AS SELECT * FROM pg_db.public.some_table ``` Congratulations! You have now replicated data from PostgreSQL into MotherDuck. 
## Choosing the right PostgreSQL workflow ### Use DuckDB's PostgreSQL extension for client-side movement Use DuckDB's PostgreSQL extension when you want to copy a PostgreSQL table into MotherDuck for analytics, backfill a MotherDuck table from PostgreSQL, or export a DuckDB or MotherDuck result set back into PostgreSQL from a controlled DuckDB client. Keep the client close to both systems, use `READ_ONLY` for PostgreSQL sources, and chunk large writes when the destination is PostgreSQL so you do not overload an OLTP database. ### Use the Postgres endpoint for PostgreSQL-compatible clients Use the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) when an application, BI tool, or serverless runtime needs to connect to MotherDuck through the PostgreSQL wire protocol. It is the preferred path for PostgreSQL-compatible clients because it does not require installing or operating a PostgreSQL extension. ### Use pg_duckdb when the query must run inside PostgreSQL Use `pg_duckdb` only when you specifically need PostgreSQL itself to host the integration. This is useful when queries must run inside an existing PostgreSQL database, when PostgreSQL-local tables need to be joined with DuckDB or MotherDuck data from that PostgreSQL environment, or when a tool must connect to a PostgreSQL server that you control. For ongoing production replication from PostgreSQL into MotherDuck, prefer an ingestion or CDC partner. Those tools handle scheduling, retries, incremental state, schema changes, and operational monitoring better than a one-off SQL script. ## Best practices Here are a few tips to keep large PostgreSQL replication jobs predictable. ### Run DuckDB close to both systems The DuckDB client is the data mover in this workflow. Run it on a machine with a good network path to both PostgreSQL and MotherDuck, and avoid running large backfills on the same host as a production PostgreSQL instance when possible. 
### Tune scan parallelism explicitly Start with `threads` set to the available CPU count on the client and `memory_limit` set below total system memory. For larger tables, start with `pg_connection_limit` in the `4-8` range and `pg_pages_per_task` in the `250-1000` range, then tune after observing the source database. ::::warning[Watch Out] Increasing `pg_connection_limit` can increase pressure on the source PostgreSQL instance. If PostgreSQL memory or connection pressure climbs, reduce `pg_connection_limit` before reducing DuckDB `threads`. :::: ### Keep PostgreSQL sources read-only Use `READ_ONLY` when attaching PostgreSQL for an initial replication job. For long-lived scripts, use PostgreSQL environment variables, the PostgreSQL password file, or DuckDB secrets instead of embedding credentials directly in the connection string. ### Reduce each statement's working set The DuckDB side of this workflow is usually streaming, so out-of-memory risk is often driven by the source PostgreSQL instance and total host headroom rather than DuckDB buffering the full table. Project only the columns you need when source rows are wide, and replicate very large tables in smaller primary key or time ranges. ### Load in chunks For a very large initial backfill, create the target table once and then insert one range at a time. ```sql INSTALL postgres; LOAD postgres; SET threads = 4; SET memory_limit = '4GB'; SET pg_connection_limit = 4; SET pg_pages_per_task = 250; ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS pg_db (TYPE POSTGRES, READ_ONLY); ATTACH 'md:'; USE my_db; CREATE TABLE IF NOT EXISTS main.postgres_table AS SELECT * FROM pg_db.public.some_table WHERE 1 = 0; INSERT INTO main.postgres_table SELECT * FROM pg_db.public.some_table WHERE updated_at >= TIMESTAMP '2026-01-01' AND updated_at < TIMESTAMP '2026-02-01'; ``` Repeat the `INSERT` statement for each chunk until the backfill is complete. 
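Rather than writing each range by hand, the per-chunk statements can be generated. The following is a minimal Python sketch, assuming monthly chunks on the `updated_at` column and the table names used in the example above. The helper names `month_ranges` and `chunked_inserts` are hypothetical and are not part of DuckDB or MotherDuck; the script only prints SQL text and does not connect to either system.

```python
from datetime import date


def month_ranges(start: date, end: date):
    """Yield (chunk_start, chunk_end) pairs covering [start, end) month by month."""
    current = start
    while current < end:
        # Compute the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Clamp the last chunk so it never runs past the requested end.
        yield current, min(next_month, end)
        current = next_month


def chunked_inserts(start: date, end: date) -> list[str]:
    """Build one INSERT statement per monthly chunk (hypothetical helper)."""
    return [
        "INSERT INTO main.postgres_table "
        "SELECT * FROM pg_db.public.some_table "
        f"WHERE updated_at >= TIMESTAMP '{lo.isoformat()}' "
        f"AND updated_at < TIMESTAMP '{hi.isoformat()}';"
        for lo, hi in month_ranges(start, end)
    ]


if __name__ == "__main__":
    # Print the statements for a three-month backfill window.
    for statement in chunked_inserts(date(2026, 1, 1), date(2026, 4, 1)):
        print(statement)
```

Each range is half-open (`>=` on the lower bound, `<` on the upper bound), so consecutive chunks neither overlap nor leave gaps. The generated statements can be pasted into the DuckDB CLI or run one at a time from a script, which makes it easy to resume from the last completed chunk.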
## Handling more complex workflows Production use cases tend to be much more complex and include things like incremental builds and state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some of which offer native Python. An overview of the MotherDuck Ecosystem is shown below. ![Diagram](../../../img/md-diagram.svg) --- Source: https://motherduck.com/docs/key-tasks/data-warehousing/replication/spreadsheets --- sidebar_position: 20 title: Excel and Google Sheets description: Load Excel and Google Sheets data into MotherDuck using the DuckDB CLI. --- # Using Excel and Google Sheets Data in MotherDuck Key bits of data and side schedules often exist in spreadsheets like Excel and Google Sheets. It is nice to be able to easily add that data to your data warehouse and query it. This guide aims to show you how to perform this workflow using the DuckDB CLI for both [Excel](#microsoft-excel) and [Google Sheets](#google-sheets). :::tip To use these extensions, you will need to first install the DuckDB CLI. [Instructions can be found here.](/getting-started/interfaces/connect-query-from-duckdb-cli) ::: ## Microsoft Excel :::note The purpose of this guide is to show you how to _load_ data from Excel into MotherDuck. If you'd like to _retrieve_ MotherDuck data in Excel, you can [follow this guide](/integrations/bi-tools/excel/). ::: To read from an Excel spreadsheet, open the DuckDB CLI by typing `duckdb 'md:'` in your terminal. This will ask you for access to your MotherDuck account if you haven't already provided it. You can now read Excel files directly with a simple `SELECT * FROM 'movies.xlsx'`, which will automatically load the DuckDB Excel extension. For more control, you can use [the `read_xlsx` function](https://duckdb.org/docs/stable/core_extensions/excel) directly.
```sql SELECT * FROM read_xlsx('movies.xlsx', sheet = 'Action Movies'); ``` The previous query simply returns the data set to the terminal, but the query can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS). ```sql CREATE OR REPLACE TABLE my_db.main.my_movies AS -- use fully qualified table name SELECT * FROM "C:\users\documents\movies.xlsx"; ``` Of course, sometimes there is data in multiple tabs. In that case, you can use the `sheet` parameter of `read_xlsx` to pass the tab names, and depending on the context, even union multiple tabs into a single table. ```sql CREATE OR REPLACE TABLE my_db.main.my_movies AS -- use fully qualified table name SELECT * FROM read_xlsx('C:\users\documents\movies.xlsx', sheet = 'Action Movies') UNION ALL SELECT * FROM read_xlsx('C:\users\documents\movies.xlsx', sheet = 'Romance Movies'); ``` ## Google Sheets ::::info While the Excel extension is a core DuckDB extension, the Google Sheets extension is a community extension maintained by Evidence. :::: The first step to handle Google Sheets is to install the [duckdb-gsheets](https://duckdb-gsheets.com/) extension. That is done with these commands after starting the DuckDB CLI with `duckdb 'md:'`: ```sql INSTALL gsheets FROM community; LOAD gsheets; ``` Since Google Sheets is a hosted application, we need to use [DuckDB Secrets](https://duckdb.org/docs/configuration/secrets_manager.html) to handle authentication. This is as simple as: ```sql CREATE SECRET (TYPE gsheet); ``` :::note Using this workflow will require interactivity with a browser, so if you need to run it from a job (for example, Airflow or similar), consider setting up a [Google API access token](https://duckdb-gsheets.com/#getting-a-google-api-access-token). ::: In order to read from a Google Sheet, we need at minimum the sheet id, which is found in the URL, for example `https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit`.
The string between `d/` and `/edit` represents the spreadsheet id. It can therefore be queried with: ```sql SELECT * FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit'); ``` The previous query simply returns the data set to the terminal, but the query can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS). ```sql CREATE OR REPLACE TABLE my_db.main.my_table AS -- use fully qualified table name SELECT * FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit'); ``` For convenience, the spreadsheet id itself can be queried as well. ```sql SELECT * FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8'); ``` To query data from multiple tabs, the tab name can be passed as parameter using `sheet` to select the preferred tab. ```sql SELECT * FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8', sheet='Sheet2'); ``` For more detailed documentation, including writing to Google Sheets, review the [duckdb-gsheets documentation](https://duckdb-gsheets.com/#getting-a-google-api-access-token). ## Handling More Complex Workflows Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which includes many options including some that offer native python. An overview of the MotherDuck Ecosystem is shown below. ![Diagram](../../../img/md-diagram.svg) --- Source: https://motherduck.com/docs/key-tasks/data-warehousing/replication/sql-server --- sidebar_position: 2 title: SQL Server description: Replicate SQL Server tables to MotherDuck using Python and dataframes. 
--- # Replicating SQL Server tables to MotherDuck This page shows basic patterns for using Python to connect to SQL Server, read data into a dataframe, connect to MotherDuck, and write the data from the dataframe into MotherDuck. For more complex replication scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion). To skip the documentation and look at the entire script, expand the element below:
Python script ```py import pyodbc # Define your connection parameters server = 'ip_address' database = 'master' # or use your database name username = 'your_username' password = 'your_password' # consider using a secret manager or .env port = 1433 # default SQL Server port # Define the connection string for ODBC Driver 17 connection_string = ( f"DRIVER={{ODBC Driver 17 for SQL Server}};" f"SERVER={server},{port};" f"DATABASE={database};" f"UID={username};" f"PWD={password};" ) # Connect to SQL Server try: connection = pyodbc.connect(connection_string) print("Connection successful.") except pyodbc.Error as e: print(f"Error: {e}") finally: connection.close() import pandas as pd try: connection = pyodbc.connect(connection_string) query = "SELECT * FROM AdventureWorks2022.Production.BillOfMaterials" # Execute the query using pyodbc cursor = connection.cursor() cursor.execute(query) # Fetch the column names and data columns = [column[0] for column in cursor.description] data = cursor.fetchall() # Convert the data into a DataFrame df = pd.DataFrame.from_records(data, columns=columns) finally: connection.close() import duckdb motherduck_token = 'your_token' # Attach using the MOTHERDUCK_TOKEN duckdb.sql(f"ATTACH 'md:my_db?MOTHERDUCK_TOKEN={motherduck_token}'") # Create or replace table in the attached database duckdb.sql( """ CREATE OR REPLACE TABLE my_db.main.BillOfMaterials AS SELECT * FROM df """ ) ```
## SQL Server Authentication SQL Server supports [multiple methods of authentication](https://learn.microsoft.com/en-us/sql/relational-databases/security/choose-an-authentication-mode?view=sql-server-ver16). For the purpose of this example, we will use username/password authentication and [pyodbc](https://github.com/mkleehammer/pyodbc/), along with [ODBC Driver 17 for SQL Server](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server?view=sql-server-ver16). 'ODBC Driver 18 for SQL Server' is also available and includes support for some newer SQL Server features, but for the sake of compatibility, this example will use 17. Consider the following authentication example: ```py import pyodbc # Define your connection parameters server = 'ip_address' database = 'master' # or use your database name username = 'your_username' password = 'your_password' # consider using a secret manager or .env port = 1433 # default SQL Server port # Define the connection string for ODBC Driver 17 connection_string = ( f"DRIVER={{ODBC Driver 17 for SQL Server}};" f"SERVER={server},{port};" f"DATABASE={database};" f"UID={username};" f"PWD={password};" ) # Connect to SQL Server try: connection = pyodbc.connect(connection_string) print("Connection successful.") except pyodbc.Error as e: print(f"Error: {e}") finally: connection.close() ``` This sets your credentials, attempts to connect to your server with `pyodbc.connect`, and prints an error if the connection fails. ## Reading a SQL Server table into a dataframe Once you have authenticated, you can define arbitrary queries, execute them with a `pyodbc` cursor, and load the results into a dataframe with `pd.DataFrame.from_records`. For the purpose of this example, we are using SQL Server 2022 along with the AdventureWorks OLTP database. :::note While `pandas` is a great library, it is not particularly well-suited for very large tables.
To learn more about using buffers and alternative libraries, check out [Loading data with Python](/key-tasks/loading-data-into-motherduck/loading-data-md-python/). ::: ```py import pandas as pd try: connection = pyodbc.connect(connection_string) query = "SELECT * FROM AdventureWorks2022.Production.BillOfMaterials" # Execute the query using pyodbc cursor = connection.cursor() cursor.execute(query) # Fetch the column names and data columns = [column[0] for column in cursor.description] data = cursor.fetchall() # Convert the data into a DataFrame df = pd.DataFrame.from_records(data, columns=columns) finally: connection.close() ``` ## Inserting the table into MotherDuck Now that the data has been loaded into a dataframe object, we can connect to MotherDuck and insert the table. :::note You will need to [generate a token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#creating-an-access-token) in your MotherDuck account. For production use cases, make sure to use a secret manager and never commit your token to your codebase. ::: ```py import duckdb motherduck_token = 'your_token' # Attach using the MOTHERDUCK_TOKEN duckdb.sql(f"ATTACH 'md:my_db?MOTHERDUCK_TOKEN={motherduck_token}'") # Create or replace table in the attached database duckdb.sql( """ CREATE OR REPLACE TABLE my_db.main.BillOfMaterials AS SELECT * FROM df """ ) ``` This will create the table, or replace it if the table already exists. ## Handling More Complex Workflows Production use cases tend to be much more complex and include things like incremental builds and state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which include many options, some of which offer native Python. An overview of the MotherDuck Ecosystem is shown below.
![Diagram](../../../img/md-diagram.svg) --- Source: https://motherduck.com/docs/key-tasks/database-operations/basics-operations --- sidebar_position: 1 title: Basics database operations description: Create, list, and drop MotherDuck databases using SQL commands. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; While embedded DuckDB uses files on your local filesystem to represent databases, MotherDuck implements SQL syntax for creating, listing, and dropping databases. ## Create database ```sql -- [OR REPLACE] and [IF NOT EXISTS] are optional modifiers. CREATE [OR REPLACE | IF NOT EXISTS] DATABASE ; USE ; ``` Creating copies of databases in MotherDuck in this manner is a metadata-only operation that copies no data. Learn more in the [`CREATE DATABASE`](/sql-reference/motherduck-sql-reference/create-database/) overview documentation. ## Listing databases ```sql -- returns all connected local and remote databases SHOW DATABASES; -- returns current database SELECT current_database(); ``` Learn more in the [`SHOW ALL DATABASES`](/sql-reference/motherduck-sql-reference/show-databases/) overview documentation. ## Delete database ```sql USE ; DROP DATABASE ; ``` Example usage: ```sql > SHOW DATABASES; test01 -- Let's put two different t1 tables into two different databases > CREATE TABLE dbname.t1 AS (SELECT range AS r FROM range(12)); > SELECT * FROM t1; -- now for the other database > CREATE DATABASE test02; > CREATE TABLE test02.t1 AS (SELECT 'test02' AS dbname); -- show the databases we've created > SHOW DATABASES; test01 test02 ``` Learn more in the [`DROP DATABASE`](/sql-reference/motherduck-sql-reference/show-databases/) overview documentation. --- Source: https://motherduck.com/docs/key-tasks/database-operations/copying-databases --- sidebar_position: 10 title: Copying DuckDB Databases description: Duplicate databases between MotherDuck cloud and local DuckDB using COPY FROM DATABASE.
--- # Copying MotherDuck and DuckDB Databases The `COPY FROM DATABASE` statement creates an exact duplicate of an existing database, including both schema and data. This functionality enables the following operations: [Interact with MotherDuck Databases](#copy-a-motherduck-database-to-a-motherduck-database) - Copy between MotherDuck databases [Interact with Local Databases](#interacting-with-local-databases) - Import local database to MotherDuck - Export MotherDuck database to local filesystem - Copy between local databases The `COPY FROM DATABASE` command is implemented as a multiple statement macro, which is not supported in WebAssembly. As a result, simultaneous schema and data copying is not available in the MotherDuck Web UI. However, the Web UI supports copying schema only (`SCHEMA` option) or data only (`DATA` option). All functionality is available in other drivers, including the DuckDB CLI. :::caution No zero-copy clone `COPY FROM DATABASE` creates a *physical* copy of both the schema and the data. It **does not** use MotherDuck's zero-copy cloning, so the operation may take longer to run and will consume additional storage proportional to the size of the source database. ::: ## Syntax The syntax for `COPY FROM DATABASE` is: ```sql COPY FROM DATABASE TO [ (SCHEMA) | (DATA) ] ``` ### Parameters - ``: The name or path of the source database to copy from - ``: The name or path of the target database to create - `(SCHEMA)`: Optional parameter to copy only the database schema without data - `(DATA)`: Optional parameter to copy only the database data without schema ## Example Usage ### Copy a MotherDuck database to a MotherDuck database This is the same as [creating a new database from an existing one](/sql-reference/motherduck-sql-reference/create-database.md). ```sql COPY FROM DATABASE my_db TO my_db_copy; ``` ### Interacting with Local Databases These operations can be done with access to the local filesystem, i.e. inside the DuckDB CLI. 
#### Copy a local database to a MotherDuck database ```sql ATTACH 'local_database.db'; ATTACH 'md:'; CREATE DATABASE md_database; COPY FROM DATABASE local_database TO md_database; ``` #### Copy a MotherDuck database to a local database Copying a MotherDuck database to a local database requires some extra steps: attach MotherDuck, attach the local database file under an alias, and then copy into that alias. ```sql ATTACH 'md:'; ATTACH 'local_database.db' as local_db; COPY FROM DATABASE my_db TO local_db; ``` #### Copy a local database to a local database To copy a local database to a local database, please see the [DuckDB documentation](https://duckdb.org/docs/stable/sql/statements/copy.html#copy-from-database--to). ### Copying the Database Schema ```sql COPY FROM DATABASE my_db TO my_db_copy (SCHEMA); ``` This will copy the schema of the database, but not the data. ### Copying the Database Data ```sql COPY FROM DATABASE my_db TO my_db_copy (DATA); ``` This will copy the data of the database, but not the schema. --- Source: https://motherduck.com/docs/key-tasks/database-operations/database-operations --- title: Database operations description: Learn how to work with databases and MotherDuck --- ## Included pages - [Basics database operations](https://motherduck.com/docs/key-tasks/database-operations/basics-operations): Create, list, and drop MotherDuck databases using SQL commands. - [Specifying different databases](https://motherduck.com/docs/key-tasks/database-operations/specifying-different-databases): Reference tables across databases using fully qualified names with database.schema.table syntax. - [Switching the current database](https://motherduck.com/docs/key-tasks/database-operations/switching-the-current-database): Change the active database and schema context using USE statements.
- [Querying historical data with time travel](https://motherduck.com/docs/key-tasks/database-operations/time-travel): Use MotherDuck snapshots to query past database states, compare data across time periods, debug pipeline issues, reproduce reports, and create audit checkpoints. - [Copying DuckDB Databases](https://motherduck.com/docs/key-tasks/database-operations/copying-databases): Duplicate databases between MotherDuck cloud and local DuckDB using COPY FROM DATABASE. - [Detach and re-attach a MotherDuck database](https://motherduck.com/docs/key-tasks/database-operations/detach-and-reattach-motherduck-database): Temporarily disconnect from a MotherDuck database using DETACH and reconnect with ATTACH. --- Source: https://motherduck.com/docs/key-tasks/database-operations/detach-and-reattach-motherduck-database --- sidebar_position: 12 title: Detach and re-attach a MotherDuck database description: Temporarily disconnect from a MotherDuck database using DETACH and reconnect with ATTACH. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; After [creating a remote MotherDuck database](/sql-reference/motherduck-sql-reference/create-database.md), the [`DETACH` command](/sql-reference/motherduck-sql-reference/detach.md) may be used to detach it. This will prevent access and modifications to the database until it is re-attached using the [`ATTACH` command](/sql-reference/motherduck-sql-reference/attach.md). This pattern can be used to isolate queries and changes to a specific set of databases. Note that this is a convenience feature and not a security feature, as a MotherDuck database may be reattached at any time. Database shares behave slightly differently than non-shared databases, so if you want to `ATTACH` and `DETACH` shares, please have a look at how to [manage shared MotherDuck databases](/key-tasks/sharing-data/sharing-data.mdx). 
## Creating, detaching, and re-attaching a database This guide shows how to `CREATE`, `DETACH`, and `ATTACH` a database using the CLI and the UI. ```sql CREATE DATABASE my_new_md_database; DETACH my_new_md_database; ATTACH 'my_new_md_database'; -- OR ATTACH 'md:my_new_md_database'; ``` To create a database, add a new cell and enter a `CREATE DATABASE` statement followed by the name of the new database. Click the Run button. ![create_database](./img/create_database.png) Click on the menu of the database you would like to detach and select `Detach`. ![detach_database](./img/detach_database.png) The database will be moved to the "Detached Databases" section of the object explorer. ![detached_databases](./img/detached_databases.png) To re-attach, click on the menu of the database in the "Detached Databases" section and select `Attach`. ![attach_database](./img/attach_database.png) The database will be returned to the "My Databases" section. ![my_databases_post_attach](./img/my_databases_post_attach.png) ## Show All Databases To see all databases, both attached and detached, use the [`SHOW ALL DATABASES` command](/sql-reference/motherduck-sql-reference/show-databases.md).
```sql SHOW ALL DATABASES; ``` Example output: ```bash ┌──────────────────────────────────────────┬─────────────┬──────────────────┬─────────────────────────────────────────────────────────────────────────────────────────┐ │ alias │ is_attached │ type │ fully_qualified_name │ │ varchar │ boolean │ varchar │ varchar │ ├──────────────────────────────────────────┼─────────────┼──────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤ │ TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │ true │ motherduck │ md:TEST_DB_02d6fc2158094bd693b6f285dbd402f7 │ │ TEST_DB_62b53d968a4f4b6682ed117a7251b814 │ true │ motherduck │ md:TEST_DB_62b53d968a4f4b6682ed117a7251b814 │ │ base │ false │ motherduck │ md:base │ │ base2 │ true │ motherduck │ md:base2 │ │ db1 │ false │ motherduck │ md:db1 │ │ integration_test_001 │ false │ motherduck │ md:integration_test_001 │ │ my_db │ true │ motherduck │ md:my_db │ │ my_share_1 │ true │ motherduck share │ md:_share/integration_test_001/18d6dbdb-e130-4cdf-97c4-60782ed5972b │ │ sample_data │ false │ motherduck │ md:sample_data │ │ source_db │ true │ motherduck │ md:source_db │ │ test_db_115 │ false │ motherduck │ md:test_db_115 │ │ test_db_28d │ false │ motherduck │ md:test_db_28d │ │ test_db_cc9 │ false │ motherduck │ md:test_db_cc9 │ │ test_share │ true │ motherduck share │ md:_share/source_db/b990b424-2f9a-477a-b216-680a22c3f43f │ │ test_share_002 │ true │ motherduck share │ md:_share/integration_test_001/06cc5500-e49a-4f62-9203-105e89a4b8ae │ ├──────────────────────────────────────────┴─────────────┴──────────────────┴─────────────────────────────────────────────────────────────────────────────────────────┤ │ 15 rows (15 shown) 4 columns │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ``` --- Source: 
https://motherduck.com/docs/key-tasks/database-operations/specifying-different-databases --- sidebar_position: 2.2 title: Specifying different databases description: Reference tables across databases using fully qualified names with database.schema.table syntax. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; MotherDuck enables you to specify an active/current database and an active/current schema within that database. Queryable objects (e.g. tables) that belong to the current database are resolved with just `table`. MotherDuck will automatically search all schemas within the current database. If there are overlapping names within different schemas, objects can be qualified with `schema.table`. Queryable objects in your account outside of the active/current database are resolved with `database.table`. However, if a schema in the current database shares the same name as another database, the fully qualified name must be used: `database.schema.table` (otherwise an error is thrown to indicate the ambiguity). This applies both to databases that live in MotherDuck and to databases in your local DuckDB environment. For example: ```sql -- check your current database SELECT current_database(); dbname -- check your current schema SELECT current_schema(); main -- query a table mytable that exists in the current database dbname SELECT count(*) FROM mytable; 34 -- query a table mytable2 that exists in the database dbname2 SELECT count(*) FROM dbname2.mytable2; 41 -- query a table mytable3 that exists in schema2 -- note that the syntax is identical to the database name syntax above and -- MotherDuck will detect whether a database or schema is involved SELECT count(*) FROM schema2.mytable3; 42 -- query a table in another database when a schema exists with the same name in the current database -- (overlappingname is both a database name and a schema name) SELECT count(*) FROM overlappingname.myschemaname.mytable4; 43 ``` You can also reference local databases in the same MotherDuck queries.
This type of query is known as a [hybrid query](/key-tasks/running-hybrid-queries.md). To change the active database, schema, or database/schema combination, execute a `USE` command. See the documentation on [switching the current database](./switching-the-current-database.md) for details. --- Source: https://motherduck.com/docs/key-tasks/database-operations/switching-the-current-database --- sidebar_position: 3 title: Switching the current database description: Change the active database and schema context using USE statements. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; Below are examples of how to determine the current/active database and schema and switch between different databases and schemas: ```sql -- check your current database SELECT current_database(); dbname -- list all tables in the current database SHOW TABLES; table1 table2 -- list all databases SHOW DATABASES; dbname dbname2 -- switch to database named 'dbname2' USE dbname2; -- verify that you've successfully switched databases SELECT current_database(); dbname2 -- check your current schema SELECT current_schema(); main -- list all schemas across all databases SELECT * FROM duckdb_schemas(); ``` | oid | database_name | database_oid | schema_name | internal | sql | |------|---------------|--------------|--------------------|----------|------| | 986 | my_db | 989 | information_schema | true | NULL | | 974 | my_db | 989 | main | false | NULL | | 972 | my_db | 989 | my_schema | false | NULL | | 987 | my_db | 989 | pg_catalog | true | NULL | | 1508 | system | 0 | information_schema | true | NULL | | 0 | system | 0 | main | true | NULL | | 1509 | system | 0 | pg_catalog | true | NULL | | 1510 | temp | 1453 | information_schema | true | NULL | | 1454 | temp | 1453 | main | true | NULL | | 1511 | temp | 1453 | pg_catalog | true | NULL | ```sql -- switch to schema my_schema within the same database USE my_schema; -- verify that you've successfully switched schemas SELECT 
current_schema(); my_schema -- switch to database my_db and schema main USE my_db.main; -- verify that both the database and schema have been changed SELECT current_database(), current_schema(); ``` | current_database() | current_schema() | |--------------------|------------------| | my_db | main | --- Source: https://motherduck.com/docs/key-tasks/database-operations/time-travel --- sidebar_position: 6 title: Querying historical data with time travel description: Use MotherDuck snapshots to query past database states, compare data across time periods, debug pipeline issues, reproduce reports, and create audit checkpoints. --- MotherDuck's [snapshot system](/concepts/snapshots) automatically captures your database state whenever you insert, delete, or update rows in a table, or create a new table. This means you can query your database as it existed at any point within your [retention window](/concepts/snapshots#snapshot-retention). This is called **time travel**, though there is no flux capacitor involved. Unlike the traditional backup strategy of copy-paste and restore workflows, time travel lets you read historical data directly alongside your current data without modifying anything. This guide covers practical patterns for querying historical database states: - [**Compare data across time periods**](#comparing-data-across-time-periods): Diff today vs. yesterday, detect changed records, and spot anomalies - [**Debug data pipeline issues**](#debugging-data-pipeline-issues): Find exactly when and how bad data entered your system - [**Reproduce past reports**](#reproducing-past-reports): Re-run a query against the exact data a dashboard showed last week - [**Create audit checkpoints**](#creating-audit-checkpoints-with-named-snapshots): Preserve database state at key moments for compliance and regulatory needs :::info Prerequisites Time travel requires a paid plan with `snapshot_retention_days` > 0.
See [snapshot features per plan](/concepts/snapshots#snapshot-features-per-plan) for details. ::: ## Try it yourself: sample data setup The examples in this guide all use the same `shop_db` database. Run the following to create it and follow along. ```sql CREATE DATABASE IF NOT EXISTS shop_db; USE shop_db; -- Customers table CREATE OR REPLACE TABLE customers AS SELECT * FROM (VALUES (1, 'Alice Johnson', 'alice@example.com', 'US-West', '2025-11-01'::DATE), (2, 'Bob Smith', 'bob@example.com', 'US-East', '2025-11-15'::DATE), (3, 'Carol Williams', 'carol@example.com', 'EU-West', '2025-12-01'::DATE) ) AS t(customer_id, name, email, region, created_at); -- Orders table CREATE OR REPLACE TABLE orders AS SELECT * FROM (VALUES (101, 1, 250.00, '2026-01-15'::DATE, 'completed'), (102, 2, 89.99, '2026-01-16'::DATE, 'completed'), (103, 3, 450.00, '2026-01-20'::DATE, 'completed'), (104, 1, 125.50, '2026-02-01'::DATE, 'completed'), (105, 2, 67.25, '2026-02-10'::DATE, 'completed'), (106, 3, 215.75, '2026-02-14'::DATE, 'pending'), (107, 1, 175.00, '2026-02-15'::DATE, 'pending') ) AS t(order_id, customer_id, amount, order_date, status); ``` Now create a snapshot to mark this as a known-good baseline: ```sql CREATE SNAPSHOT baseline OF shop_db; ``` To simulate changes over time (for testing the examples below), apply some modifications and snapshot again: ```sql -- Simulate a data update: customer email change + new customer UPDATE customers SET email = 'alice.j@newdomain.com' WHERE customer_id = 1; INSERT INTO customers VALUES (6, 'Dave Miller', 'dave@example.com', 'US-East', '2026-02-16'); -- Simulate a pipeline issue: accidentally delete some orders DELETE FROM orders WHERE order_id IN (106, 107); -- Insert a new order INSERT INTO orders VALUES (108, 6, 95.00, '2026-02-16', 'pending'); CREATE SNAPSHOT after_changes OF shop_db; ``` You now have two named snapshots (`baseline` and `after_changes`) you can use with the patterns below. 
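Before moving on, it can help to confirm that both snapshots were recorded. The query below is a minimal sketch that reuses the `md_information_schema.database_snapshots` view and the columns queried elsewhere in this guide; it assumes the two snapshot names created above and has not been verified against a live account.

```sql
-- List the named snapshots created for shop_db in the setup steps above.
SELECT snapshot_id, snapshot_name, created_ts
FROM md_information_schema.database_snapshots
WHERE database_name = 'shop_db'
  AND snapshot_name IN ('baseline', 'after_changes')
ORDER BY created_ts;
```

If either name is missing from the result, re-run the corresponding `CREATE SNAPSHOT` statement before trying the patterns that follow.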
## Core pattern: clone a point-in-time snapshot The fundamental time travel pattern is to create a temporary database from a historical snapshot, then query it alongside your current data: ```sql -- Create a zero-copy clone of your database at a past point in time CREATE DATABASE shop_db_yesterday FROM shop_db ( SNAPSHOT_NAME 'baseline' ); -- Query the historical clone SELECT * FROM shop_db_yesterday.main.orders; ``` To avoid storing data unnecessarily, drop the clone when you are done with it. ```sql DROP DATABASE shop_db_yesterday; ``` The clone is a [zero-copy clone](/concepts/database-concepts/#motherduck-architectural-concepts), so no data is duplicated. The clone points to the same underlying storage objects. To see what snapshots are available and find the right timestamp, query: ```sql SELECT snapshot_id, created_ts, active_bytes FROM md_information_schema.database_snapshots WHERE database_name = 'shop_db' ORDER BY created_ts DESC LIMIT 10; ``` ## Comparing data across time periods Your operations team notices that order volume looks off this morning. Rather than waiting for a full data audit, you can instantly diff today's data against yesterday's snapshot to find new records, deleted rows, or unexpected changes, which is useful for anomaly detection, daily change tracking, and operational monitoring.
```sql -- Clone yesterday's state CREATE DATABASE shop_yesterday FROM shop_db ( SNAPSHOT_NAME 'baseline' -- or use a timebased reference SNAPSHOT_TIME '2026-02-15 00:00:00' ); -- Find new customers added since yesterday SELECT c.customer_id, c.name, c.created_at FROM shop_db.main.customers c ANTI JOIN shop_yesterday.main.customers y ON c.customer_id = y.customer_id; -- Compare daily order totals SELECT 'today' AS period, count(*) AS order_count, sum(amount) AS total_revenue FROM shop_db.main.orders WHERE order_date = CURRENT_DATE UNION ALL SELECT 'yesterday' AS period, count(*) AS order_count, sum(amount) AS total_revenue FROM shop_yesterday.main.orders WHERE order_date = CURRENT_DATE - INTERVAL 1 DAY; -- Detect changed records (e.g. email updates) SELECT c.customer_id, y.email AS old_email, c.email AS new_email FROM shop_db.main.customers c JOIN shop_yesterday.main.customers y ON c.customer_id = y.customer_id WHERE c.email != y.email; DROP DATABASE shop_yesterday; ``` ## Debugging data pipeline issues A dashboard that was showing correct numbers yesterday is now off. You suspect a pipeline run corrupted or dropped data, but you're not sure when it happened. Time travel lets you clone the database at a known-good point and compare it to the current state to find exactly which records disappeared, changed, or were introduced incorrectly. 
```sql -- List recent snapshots to narrow down the issue SELECT snapshot_id, created_ts, active_bytes FROM md_information_schema.database_snapshots WHERE database_name = 'shop_db' AND created_ts >= '2026-02-14 00:00:00' ORDER BY created_ts; ``` ```sql -- Clone the database at a known-good time CREATE DATABASE shop_before FROM shop_db ( SNAPSHOT_ID 'b1ecf2f3-4567-8901-b23f-45c67890b12' ); -- Compare row counts to spot unexpected changes SELECT 'before' AS state, count(*) AS row_count, count(DISTINCT customer_id) AS unique_customers FROM shop_before.main.orders UNION ALL SELECT 'current' AS state, count(*) AS row_count, count(DISTINCT customer_id) AS unique_customers FROM shop_db.main.orders; -- Find records that disappeared SELECT b.order_id, b.customer_id, b.amount, b.order_date FROM shop_before.main.orders b ANTI JOIN shop_db.main.orders c ON b.order_id = c.order_id; DROP DATABASE shop_before; ``` ## Reproducing past reports A stakeholder asks "why did last week's revenue report show different numbers?" Instead of guessing what data has changed since then, you can clone the exact database state from when the report ran and re-execute the same query. This is also useful for validating past analyses, debugging metric discrepancies, and ensuring reproducibility of historical results. 
```sql -- Recreate the database state from last Tuesday morning CREATE DATABASE shop_last_tuesday FROM shop_db ( SNAPSHOT_NAME 'baseline' -- or use a timebased reference SNAPSHOT_TIME '2026-02-15 00:00:00' ); -- Re-run the same report query against the historical state SELECT region, sum(amount) AS total_revenue, count(DISTINCT customer_id) AS active_customers FROM shop_last_tuesday.main.orders o JOIN shop_last_tuesday.main.customers c USING (customer_id) WHERE order_date BETWEEN '2026-02-01' AND '2026-02-09' GROUP BY region ORDER BY total_revenue DESC; DROP DATABASE shop_last_tuesday; ``` ## Creating audit checkpoints with named snapshots Regulatory audits, end-of-quarter financial reviews, and legal discovery often require proof of what data looked like at a specific moment. [Named snapshots](/concepts/snapshots#2-named-snapshots ) let you preserve the exact database state at key business milestones. Unlike automatic snapshots, named snapshots are not subject to garbage collection — they persist until you explicitly remove them. This feature is available on the Business plan. 
```sql -- Create a named snapshot at end-of-quarter close CREATE SNAPSHOT q1_2026_close OF shop_db; -- Months later, an auditor needs to verify the numbers CREATE DATABASE audit_q1 FROM shop_db ( SNAPSHOT_NAME 'q1_2026_close' ); -- Re-run the audit query against the exact data from that moment SELECT c.region, count(*) AS order_count, sum(o.amount) AS total_revenue FROM audit_q1.main.orders o JOIN audit_q1.main.customers c USING (customer_id) WHERE o.order_date BETWEEN '2026-01-01' AND '2026-03-31' GROUP BY c.region; DROP DATABASE audit_q1; ``` To manage your named snapshots: ```sql -- List all named snapshots SELECT snapshot_id, snapshot_name, database_name, created_ts FROM md_information_schema.database_snapshots WHERE snapshot_name IS NOT NULL; -- Rename a snapshot ALTER SNAPSHOT q1_2026_close SET snapshot_name = 'audit_fy2026_q1'; -- Remove a snapshot name (makes it subject to garbage collection) ALTER SNAPSHOT old_checkpoint SET snapshot_name = ''; ``` ## Best practices - **Clean up clones promptly.** Snapshot clones are zero-copy, but they may hold `historical_bytes` longer than necessary unless they are dropped. When the original database is deleted, the clone may still hold `retained_for_clone_bytes`. - **Use `SNAPSHOT_TIME` for exploration, `SNAPSHOT_ID` for precision, `SNAPSHOT_NAME` for reusability.** When narrowing down a time range, timestamps are convenient. Once you've identified the exact snapshot, switch to the ID to avoid ambiguity. See [restoring a database to a historical snapshot](/concepts/data-recovery#restoring-a-database-to-a-historical-snapshot). - **Set retention to match your needs.** Longer `snapshot_retention_days` gives you a wider time travel window but increases `historical_bytes` storage. See [snapshot retention](/concepts/snapshots#snapshot-retention). - **Use named snapshots for fixed checkpoints.** Automatic snapshots are garbage-collected after the retention window.
For audit or compliance points that need to persist, create a [named snapshot](/concepts/snapshots#2-named-snapshots). ## See also - [Database Snapshots](/concepts/snapshots) — Snapshot types, retention, and plan availability - [Data Recovery](/concepts/data-recovery) — Step-by-step restore workflows - [Storage Lifecycle](/concepts/storage-lifecycle) — How historical bytes affect your storage bill - [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) — Clone from a snapshot - [`ALTER DATABASE SET SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) — Restore a database in-place --- Source: https://motherduck.com/docs/key-tasks/how-to-guides --- title: How-to guides sidebar_class_name: how-to-guide-icon description: How-to guides --- ## Included pages - [AI and MotherDuck](https://motherduck.com/docs/category/ai-and-motherduck): Practical guides for using AI with MotherDuck. - [Authenticating and connecting to MotherDuck](https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck): Learn how to authenticate and connect to MotherDuck - [Data Warehousing How-to](https://motherduck.com/docs/key-tasks/data-warehousing): Data Warehousing How-to guides - [Database operations](https://motherduck.com/docs/key-tasks/database-operations): Learn how to work with databases and MotherDuck - [Interacting with cloud storage](https://motherduck.com/docs/key-tasks/cloud-storage): Learn how to work with databases and MotherDuck - [Loading Data into MotherDuck](https://motherduck.com/docs/key-tasks/loading-data-into-motherduck): Learn how to load data into MotherDuck from various sources - [Managing organizations](https://motherduck.com/docs/key-tasks/managing-organizations): Learn how to manage your organization with MotherDuck - [Running dual execution (or hybrid) queries](https://motherduck.com/docs/key-tasks/running-hybrid-queries): Query local and cloud data together using MotherDuck's dual execution 
hybrid query engine. - [Service accounts](https://motherduck.com/docs/key-tasks/service-accounts-guide): Learn how to create, configure, manage, and impersonate MotherDuck service accounts. - [Sharing data in MotherDuck](https://motherduck.com/docs/key-tasks/sharing-data): Learn how to securely share data in MotherDuck - [Build a customer-facing analytics app](https://motherduck.com/docs/key-tasks/customer-facing-analytics): Build customer-facing analytics applications with read scaling tokens and isolated tenant data. - [3-tier customer-facing analytics guide](https://motherduck.com/docs/key-tasks/customer-facing-analytics/3-tier-cfa-guide): Step-by-step guide to building a 3-tier customer-facing analytics application with MotherDuck. --- Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/considerations-for-loading-data --- sidebar_position: 0.5 title: Loading Data Best Practices description: Understanding trade-offs and performance implications when loading data into MotherDuck --- # Loading data best practices When loading data into MotherDuck, understanding the trade-offs between different approaches helps you make informed decisions that optimize for your specific use case. This guide explains the key considerations that impact performance, cost, and reliability. ## File format considerations The choice of file format significantly impacts loading performance: | | Parquet (recommended) | CSV | JSON | |---|---|---|---| | **Compression** | 5-10x better than CSV | Minimal | Moderate | | **Performance** | 5-10x more throughput | Slower, especially for large files | Slower than Parquet due to parsing overhead | | **Schema** | Self-describing with embedded metadata | Requires type inference or specification | Flexible but requires careful type handling. 
DuckDB scans data to discover the schema before running the query, which can add significant time for large or deeply nested files (see [tips for loading JSON](/key-tasks/data-warehousing/replication/flat-files/#json)) | | **Best for** | Production data loading, large datasets | Simple data exploration, small datasets | Semi-structured data, API responses | ## Avoid single-row INSERTs A common mistake is inserting data one row at a time using repeated `INSERT INTO ... VALUES (...)` statements. This pattern is significantly slower than bulk loading because each individual INSERT statement incurs network round-trip overhead to MotherDuck and prevents DuckDB from parallelizing the work. :::tip Do not use single-row `INSERT INTO ... VALUES` statements to load data into MotherDuck. Instead, use bulk approaches like `INSERT INTO ... SELECT` from files, `COPY`, or load data from DataFrames. See [Loading data into MotherDuck](/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck.mdx) for recommended methods. ::: If you're working with a client library (Python, Node.js, Java), avoid looping over rows and calling `execute("INSERT INTO ...")` for each one. Methods like `executemany` also send individual INSERT statements under the hood and are equally slow. Instead, write your data to a file (Parquet or CSV) and load it with `COPY` or `INSERT INTO ... SELECT`, or use a DataFrame-based approach where available. ## Performance optimization strategies ### Batch size DuckDB internally processes data in row groups of ~122,000 rows and parallelizes work across multiple row groups. This means batch size affects both memory usage and throughput: | Batch size | What happens | |---|---| | **1-100 rows** (single-row INSERTs) | Each statement has network and transaction overhead. Very slow — avoid this pattern entirely. | | **100K rows** | Fits in roughly one row group. Already a bulk operation and orders of magnitude faster than row-by-row. 
Good default chunk size when streaming from Python to manage memory. | | **1M+ rows** | Spans multiple row groups, so DuckDB parallelizes across threads. Best throughput for large loads. | :::tip When streaming data from a client library, load in chunks of at least **100K rows** to keep memory manageable while staying well above row-by-row overhead. For maximum throughput on large datasets, aim for **1M+ rows** per load operation to fully leverage DuckDB's parallelization. ::: Keep individual transactions under roughly one minute. If you have tens of millions of rows, break them into multiple loads rather than one very large transaction. ### Memory management Effective memory management is crucial for large data loads: **Data Type Optimization** - Use explicit schemas to avoid type inference overhead — this is especially important for JSON, where schema discovery can add minutes for large or deeply nested files - Choose appropriate data types (for example, TIMESTAMP for dates) - Avoid unnecessary type conversions **Sorting Strategy** - Sort data by frequently queried columns during loading - To re-sort existing tables, use `CREATE OR REPLACE` with the preferred sorting method - Improves query performance through better data locality - Consider the trade-off between loading speed and query performance ### Network and location considerations **Data Location** - MotherDuck is available on AWS in three regions: **US East (N. 
Virginia)** - `us-east-1`, **US West (Oregon)** - `us-west-2`, and **Europe (Frankfurt)** - `eu-central-1` - For optimal performance, consider locating source data in the same region as your MotherDuck Organization - Consider network latency when loading from remote sources **Cloud Storage Integration** - Direct integration with S3, R2, GCS, Azure Blob Storage - Use [cloud storage](/integrations/cloud-storage/) to leverage network speeds for better performance - Reduces local storage requirements - Consider setting [force_download=true](https://duckdb.org/docs/stable/configuration/overview) when querying files stored in remote storage to accelerate response times. This could be useful in scenarios where it makes sense to download the full file upfront instead of making many small requests. ## Duckling sizing **Duckling Selection** For data sets under 100 GB in size, use Jumbo Ducklings to load the data. For larger data sizes, use [Mega or Giga](/about-motherduck/billing/duckling-sizes/). ## Summary The key to successful data loading in MotherDuck is understanding the trade-offs between different approaches and optimizing for your specific use case. Focus on: 1. **Bulk loading** with at least 100K rows per chunk, and 1M+ for maximum throughput. 2. If you can control how they are written from sources, use **Parquet** for compression and speed 3. Write data into **S3** for speedy reads. 4. Use **larger Duckling sizes (Jumbo or bigger)** for loading bigger data sets. By following these guidelines and understanding the underlying principles, you can build efficient, reliable data loading pipelines that scale with your needs while managing costs effectively. --- Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-cloud-or-https --- sidebar_position: 2 title: From Cloud Storage or over HTTPS description: Load data into MotherDuck from S3, Azure, GCS, or public HTTPS URLs. 
--- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import useBaseUrl from '@docusaurus/useBaseUrl'; # From cloud storage or over HTTPS ## From public cloud storage MotherDuck supports several cloud storage providers, including [Amazon S3](/integrations/cloud-storage/amazon-s3.mdx), [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx), and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2). :::note MotherDuck is available on AWS in three regions: **US East (N. Virginia)** - `us-east-1`, **US West (Oregon)** - `us-west-2`, and **Europe (Frankfurt)** - `eu-central-1`. For an optimal experience, we strongly encourage you to locate your data in the same region as your MotherDuck Organization. ::: :::tip If you want to inspect storage paths from SQL before loading data, see [`MD_LIST_FILES()`](/sql-reference/motherduck-sql-reference/md-list-files). It supports S3 and Azure paths. For S3 bucket discovery by secret, see [`MD_LIST_BUCKETS_FOR_SECRET()`](/sql-reference/motherduck-sql-reference/md-list-buckets-for-secret). ::: The following example features Amazon S3. 1. In the left panel of the UI, click **Add data** 2. Select **From cloud storage**
3. For a publicly accessible bucket, skip creating a secret
4. Enter the S3 bucket path (e.g., `s3://motherduck-demo`) and select the files you want, or use Wildcard mode to choose files with a matching pattern 5. Preview the files and select the table names and destination database 6. Click **Create tables**
Connect to MotherDuck if you haven't already by doing the following: ```sql -- assuming the db my_db exists ATTACH 'md:my_db'; ``` ```sql -- CTAS a table from a publicly available demo dataset stored in s3 CREATE OR REPLACE TABLE pypi_small AS SELECT * FROM 's3://motherduck-demo/pypi.small.parquet'; -- JOIN the demo dataset against a larger table to find the most common duplicate urls -- Note you can directly refer to the url as a table! SELECT pypi_small.url, COUNT(*) FROM pypi_small JOIN 's3://motherduck-demo/pypi_downloads.parquet' AS s3_pypi ON pypi_small.url = s3_pypi.url GROUP BY pypi_small.url ORDER BY COUNT(*) DESC LIMIT 10; ```
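The UI's Wildcard mode has a rough SQL counterpart: DuckDB can expand a glob pattern in a storage path. The following is a minimal sketch, assuming a hypothetical public bucket path `s3://my-public-bucket/events/` whose Parquet files share one schema. The bucket, path, and `events` table name are placeholders, not part of the demo dataset.

```sql
-- Load every Parquet file under a prefix into one table.
-- 's3://my-public-bucket/events/' is a hypothetical path; replace it with your own.
-- filename = true adds a column recording which file each row came from.
CREATE OR REPLACE TABLE events AS
SELECT *
FROM read_parquet('s3://my-public-bucket/events/*.parquet', filename = true);

-- Check how many rows came from each source file.
SELECT filename, count(*) AS row_count
FROM events
GROUP BY filename
ORDER BY filename;
```

Keeping the `filename` column is optional, but it makes it easier to trace unexpected rows back to the file that produced them.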
## From a secure cloud storage provider MotherDuck supports several cloud storage providers, including [Amazon S3](/integrations/cloud-storage/amazon-s3.mdx), [Azure](/integrations/cloud-storage/azure-blob-storage.mdx), [Google Cloud](/integrations/cloud-storage/google-cloud-storage.mdx), and [Cloudflare R2](/integrations/cloud-storage/cloudflare-r2). To access them securely, you first must [create a secret](/sql-reference/motherduck-sql-reference/create-secret/). :::info When you load data from cloud storage while connected to MotherDuck, the query runs on MotherDuck's cloud execution engine, not your local machine. MotherDuck connects to your storage provider directly and can use any matching secret, including temporary secrets from your local DuckDB session. For more details, see [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret/#querying-with-secrets). ::: :::note For SQL-based object discovery, [`MD_LIST_FILES()`](/sql-reference/motherduck-sql-reference/md-list-files) supports only `s3://`, `azure://`, and `az://` paths. It does not accept `gcs://`, `gs://`, or `r2://` paths. ::: You can set cloud storage secrets directly from the UI under Settings —> Integrations —> Secrets, or with the "Add data" button in the left panel. First, create a secret for your cloud storage credentials: 1. Go to **Settings** → **Integrations** → **Secrets** ![The MotherDuck UI for adding a new secret](./img/loading_data__secrets_overview.png) 2. Click **Add secret** and select your cloud storage provider (S3, R2, GCS, Azure)
3. Enter your access key and secret for your service account in your cloud storage provider. 4. For S3 credentials, you can test and verify your connection before saving Once your secret is configured, load data from your secure bucket: 1. In the left panel of the notebook UI, click **Add data** 2. Select **From cloud storage** 3. Enter the bucket path and select the files you want, or use Wildcard mode to choose files with a matching pattern 4. Preview the files and select the table names and destination database 5. Click **Create tables** :::note When loading data from [Azure](/integrations/cloud-storage/azure-blob-storage) or [Hugging Face](https://duckdb.org/docs/extensions/httpfs/hugging_face), you must use Wildcard mode to select files. Browse mode is not supported for these providers. :::
To create a secret in MotherDuck using the CLI or SQL notebooks, you need to explicitly add the `IN MOTHERDUCK` clause. ```sql CREATE SECRET IN MOTHERDUCK ( TYPE S3, KEY_ID 'access_key', SECRET 'secret_key', REGION 'us-east-1', SCOPE 'my-bucket-path' ); -- Now you can query from a secure S3 bucket CREATE OR REPLACE TABLE mytable AS SELECT * FROM 's3://...'; ```
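After creating a secret, you may want to confirm that it is registered and scoped to the path you expect. This is a minimal sketch using DuckDB's `duckdb_secrets()` table function; it assumes the function is available in your client and that secrets stored in MotherDuck appear in its output, which you should verify in your own environment.

```sql
-- List the secrets visible to the current session, where each one is stored,
-- and the path prefix (scope) it applies to.
SELECT name, type, storage, scope
FROM duckdb_secrets();
```

A secret only applies to paths that match its `scope`, so a mismatch here is a common reason for access errors on an otherwise valid key.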
## Over HTTPS

MotherDuck supports loading data over HTTPS.

```sql
-- Reads the Central Park Squirrel Data
SELECT * FROM read_csv_auto('https://docs.google.com/spreadsheets/d/e/2PACX-1vQUZR6ikwZBRXWWQsFaUceEiYzJiVw4OQNGtwGBfcMfVatpCyfxxaWPdoKJIHlwNM-ow1oeW_2F-pO5/pub?gid=2035607922&single=true&output=csv');
```

## Related content

- [Troubleshooting AWS S3 Secrets](/docs/troubleshooting/aws-s3-secrets/)

---

Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-local-machine

---
sidebar_position: 0.9
title: From Your Local Machine
description: Moving data from local to MotherDuck through the UI or programmatically.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Single file

Using the CLI, you can connect to MotherDuck, create a database, and load a single local file (JSON, Parquet, CSV, etc.) to a MotherDuck table.

First, connect to MotherDuck using the `ATTACH` command.

```sql
ATTACH 'md:';
```

Create a cloud database (or point to any existing one) and load a local file into a table.

```sql
CREATE DATABASE test01;
USE test01;
CREATE OR REPLACE TABLE orders as SELECT * from 'orders.csv';
```

In the MotherDuck UI, you can add a JSON, CSV, or Parquet file directly using the **Add data** button in the top left of the UI. See the [Getting Started Tutorial](../../../getting-started/e2e-tutorial/part-2#loading-your-data) for details.

## Multiple files or database

To upload multiple files at once, or data in other formats supported by DuckDB, you can use the DuckDB CLI or any other supported [DuckDB client](https://duckdb.org/docs/data/multiple_files/overview.html). If all your files belong in a single table, you can use the [glob syntax to load all files into a single table](https://duckdb.org/docs/data/multiple_files/overview.html).
For example, to load all CSV files from a directory into a single table, you can use the following SQL command: ```sql ATTACH 'md:'; CREATE DATABASE test01; USE test01; CREATE OR REPLACE TABLE orders as SELECT * from 'dir/*.csv'; ``` If your files are in different formats or you want to load them into different tables, you can first load the files into different tables in a local DuckDB database and then copy the entire database into MotherDuck. To copy the entire local DuckDB database into MotherDuck, you can use the following SQL commands: ```sql ATTACH 'md:'; ``` ```sql ATTACH 'local.ddb'; CREATE DATABASE cloud_db from 'local.ddb'; ``` --- Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-postgres --- sidebar_position: 11 title: From a PostgreSQL or MySQL Database description: Learn to load a table from your PostgreSQL or MySQL database into MotherDuck. --- ## Using PostgreSQL or MySQL DuckDB extensions DuckDB's [PostgreSQL extension](https://duckdb.org/docs/extensions/postgres.html) and [MySQL extension](https://duckdb.org/docs/extensions/mysql.html) make it easy to connect to OLTP databases and copy data into MotherDuck from a DuckDB client running on your own machine or compute resource. In this guide we demonstrate the workflow with PostgreSQL. Consult the [DuckDB MySQL extension documentation](https://duckdb.org/docs/extensions/mysql) to adapt the same pattern for MySQL. :::info MotherDuck does not yet support the PostgreSQL and MySQL extensions, so you need to perform the following steps on your own computer or cloud computing resource. We are working on supporting the PostgreSQL extension on the server side so that this can happen within the MotherDuck app in the future with improved performance. ::: ### Prerequisites - **PostgreSQL Database Credentials**: Ensure you have access details to the PostgreSQL database, including host address, port, and user credentials. 
You can put the user credentials in the [PostgreSQL Password File](https://www.postgresql.org/docs/current/libpq-pgpass.html), [store them in environment variables](https://duckdb.org/docs/extensions/postgres.html#configuring-via-environment-variables), or pass them inline in the script below.
- **Network Connectivity**: Your machine must be able to connect to the target PostgreSQL database.
- **MotherDuck Credentials**: MotherDuck credentials should be set up. If not, follow the steps in [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md).
- **DuckDB**: Either the DuckDB command-line interface or Python + the DuckDB package should be installed and operational. See the [Getting Started tutorials](../../getting-started/getting-started.mdx) for instructions to install DuckDB.

### Steps

The following SQL script installs and loads DuckDB's PostgreSQL extension, tunes a few settings that matter for larger bulk loads, and copies one PostgreSQL table into the MotherDuck table `my_db.pg_data_schema.first_pg_table`. Fill in the placeholders `<dbname>`, `<host>`, `<user>`, `<password>`, `<schema>`, and `<table>` with the appropriate values and save the script to a file, for example `ingest_data_from_postgres.sql`.

```sql
INSTALL postgres;
LOAD postgres;

-- Tune the local DuckDB client for a larger initial load.
SET threads = 8;
SET memory_limit = '8GB';
SET pg_connection_limit = 8;
SET pg_pages_per_task = 250;

-- Connect to MotherDuck.
ATTACH 'md:';
USE my_db;

-- Optionally create a schema. By default MotherDuck uses the main schema.
CREATE SCHEMA IF NOT EXISTS pg_data_schema;

-- Ingest data from PostgreSQL to a MotherDuck table.
CREATE OR REPLACE TABLE pg_data_schema.first_pg_table AS
SELECT *
FROM postgres_scan(
    'dbname=<dbname> host=<host> port=5432 user=<user> password=<password> connect_timeout=10',
    '<schema>',
    '<table>'
);

-- Optional: verify the number of rows in the MotherDuck table.
SELECT count(1) FROM pg_data_schema.first_pg_table;
```

If you only want to smoke-test the connection first, add `LIMIT 1000` to the `SELECT` before running the full load.

### Best practices

Here are a few tips to keep larger PostgreSQL loads predictable.

#### Run DuckDB close to both systems

This workflow is client-side, so the DuckDB client becomes the data mover. Run DuckDB on a machine with a good network path to both PostgreSQL and MotherDuck, and use separate client compute when possible instead of competing with the production PostgreSQL instance for the same RAM.

#### Tune scan parallelism explicitly

Start with `SET threads = <n>` and `SET memory_limit = '<size>'`, then tune `pg_connection_limit` and `pg_pages_per_task` for your source table. For larger tables, start with `pg_connection_limit` in the `4-8` range and `pg_pages_per_task` in the `250-1000` range rather than relying on defaults.

::::warning[Watch Out]
Increasing `pg_connection_limit` can increase pressure on the source PostgreSQL instance. If PostgreSQL memory or connection pressure climbs, reduce `pg_connection_limit` before reducing DuckDB `threads`.
::::

#### Reduce each statement's working set

The DuckDB side of this workflow is typically streaming rather than loading the full source table into RAM. Out-of-memory risk is usually driven more by the source PostgreSQL instance and the host's overall headroom than by DuckDB itself. Select only the schema and columns you need, and attach PostgreSQL with `READ_ONLY` if you use `ATTACH` instead of `postgres_scan`.

#### Keep credentials out of long-lived scripts

Use PostgreSQL environment variables, the PostgreSQL password file, or DuckDB secrets instead of embedding credentials directly in production scripts.

#### Load in chunks

For very large tables, break the initial load into ranges and insert them one chunk at a time.
```sql
INSTALL postgres;
LOAD postgres;

SET threads = 8;
SET memory_limit = '8GB';
SET pg_connection_limit = 8;
SET pg_pages_per_task = 250;

ATTACH 'md:';
USE my_db;

CREATE SCHEMA IF NOT EXISTS pg_data_schema;

-- Create an empty target table with the source table's columns.
CREATE TABLE IF NOT EXISTS pg_data_schema.first_pg_table AS
SELECT *
FROM postgres_scan(
    'dbname=<dbname> host=<host> port=5432 user=<user> password=<password> connect_timeout=10',
    '<schema>',
    '<table>'
)
WHERE 1 = 0;

-- Load one time window.
INSERT INTO pg_data_schema.first_pg_table
SELECT *
FROM postgres_scan(
    'dbname=<dbname> host=<host> port=5432 user=<user> password=<password> connect_timeout=10',
    '<schema>',
    '<table>'
)
WHERE updated_at >= TIMESTAMP '2026-01-01'
  AND updated_at < TIMESTAMP '2026-02-01';
```

Repeat the `INSERT` statement for each key range or time window until the backfill is complete.

If you need recurring replication, change data capture (CDC), or production orchestration, prefer a dedicated ingestion partner over a one-off client-side script.

### Run with DuckDB CLI

After filling out the placeholders, you can either execute the statements line by line in the DuckDB CLI, or save the commands in a file, for example `ingest_data_from_postgres.sql`, and run:

```sh
duckdb < ingest_data_from_postgres.sql
```

### Run with Python

You can also execute it using Python with the DuckDB package.

```python
import duckdb

# Read the whole script and run it against the default DuckDB connection.
with open("ingest_data_from_postgres.sql", 'r') as f:
    s = f.read()

duckdb.sql(s)
```

After completing these steps, you should see the new table show up in the MotherDuck Web UI.

## Using MotherDuck ingestion partners

MotherDuck collaborates with various integration partners to facilitate data transfer in diverse ways, including change data capture (CDC), from your PostgreSQL or MySQL database to MotherDuck.

For example, you can refer to our [Estuary guide](https://motherduck.com/blog/streaming-data-to-motherduck/) that demonstrates how to stream data from Neon, a PostgreSQL-based database, to MotherDuck.

To explore the full range of solutions tailored to your needs, visit our [MotherDuck ecosystem partners page](https://motherduck.com/ecosystem/).

---

Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-into-motherduck

---
title: Loading Data into MotherDuck
description: Learn how to load data into MotherDuck from various sources
---

You can leverage MotherDuck's managed storage to persist your data. MotherDuck storage provides a high level of manageability and abstraction, optimizing your data for secure, durable, performant, and efficient use. There are several ways to load data into MotherDuck storage.
## Before You Start: Understanding Trade-offs Before choosing a loading method, it's important to understand the performance implications and trade-offs involved. Our [Considerations for Loading Data](./considerations-for-loading-data.mdx) guide explains: - **Batch vs. streaming approaches** and when to use each - **File format choices** and their impact on performance - **Optimal batch sizes** for different scenarios - **Cost implications** of different loading strategies - **Common performance pitfalls** and how to avoid them This understanding will help you make informed decisions that optimize for your specific use case. ## Included pages - [Loading Data Best Practices](https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/considerations-for-loading-data): Understanding trade-offs and performance implications when loading data into MotherDuck - [From Your Local Machine](https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-local-machine): Moving data from local to MotherDuck through the UI or programmatically. - [Loading data to MotherDuck with Python](https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-md-python): Efficient methods for loading data from Python using DataFrames, temporary files, or bulk inserts. - [From Cloud Storage or over HTTPS](https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-cloud-or-https): Load data into MotherDuck from S3, Azure, GCS, or public HTTPS URLs. - [Load a DuckDB database into MotherDuck](https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-duckdb-database): Upload a local DuckDB database file to MotherDuck cloud storage. - [From a PostgreSQL or MySQL Database](https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-from-postgres): Learn to load a table from your PostgreSQL or MySQL database into MotherDuck. 
- [Via the Postgres Endpoint](https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-via-postgres-endpoint): Best practices for loading data into MotherDuck efficiently when you are connected through the Postgres endpoint.

---

Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-md-python

---
sidebar_position: 1
title: Loading data to MotherDuck with Python
description: Efficient methods for loading data from Python using DataFrames, temporary files, or bulk inserts.
---

# Loading data to MotherDuck with Python

When you ingest data with Python, typically from an API or other source, you have three options to load it into MotherDuck:

1. **FAST:** Use a Pandas, Polars, or PyArrow dataframe as an in-memory buffer before bulk loading. This is the easiest approach because dataframe libraries are optimized for bulk insert.
2. **FAST:** Write to a temporary file and load it with a `COPY` command. This involves writing to disk, but the `COPY` command is faster than `INSERT` statements.
3. **SLOW:** Use `executemany` to perform several `INSERT` statements in a single transaction. Avoid this unless the data is very small (fewer than 500 rows).

:::tip
Whichever option you pick, we recommend loading data in chunks (typically `100k` rows to match row group size) to avoid memory issues and to keep each transaction small, ideally finishing within about a minute. You can further optimize data loading by reading our guidelines on [connections](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md) and [threading](/key-tasks/authenticating-and-connecting-to-motherduck/multithreading-and-parallelism/multithreading-and-parallelism-python.md).
:::

## 1. load data to MotherDuck with Pandas, Polars, or PyArrow

When using a dataframe library you can load data to MotherDuck in a single transaction.
DuckDB uses Apache Arrow as its internal data interchange format. This means **PyArrow and Polars** (which are Arrow-native) benefit from zero-copy data transfer, making them the most memory-efficient choice. **Pandas** with the default NumPy backend copies data during transfer, which doubles memory usage. If you use Pandas, consider using [Arrow-backed dtypes](https://pandas.pydata.org/docs/user_guide/pyarrow.html) (`dtype_backend="pyarrow"`) to avoid the extra copy.

```python
# Creating your table with PyArrow
import duckdb
import pyarrow as pa

data = {
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
}
arrow_table = pa.table(data)

con = duckdb.connect('md:')
con.sql('CREATE TABLE my_table AS SELECT * FROM arrow_table')
```

### Batching data

When you have a large dataset, it's recommended that you chunk your data and load it in batches. This helps you avoid memory issues and keeps each transaction from growing too large.

This example uses PyArrow and DuckDB in a class to:

1. Initialize a connection
2. Create a database and table if they do not already exist
3. Accept a PyArrow table to insert
4. Insert the data in chunks

```python
import duckdb
import os
import pyarrow as pa
import logging

# Setup basic configuration for logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


class ArrowTableLoadingBuffer:
    def __init__(
        self,
        duckdb_schema: str,
        pyarrow_schema: pa.Schema,
        database_name: str,
        table_name: str,
        destination="local",
        chunk_size: int = 100_000,  # Default chunk size
    ):
        self.duckdb_schema = duckdb_schema
        self.pyarrow_schema = pyarrow_schema
        self.database_name = database_name
        self.table_name = table_name
        self.total_inserted = 0
        self.conn = self.initialize_connection(destination, duckdb_schema)
        self.chunk_size = chunk_size

    def initialize_connection(self, destination, sql):
        if destination == "md":
            logging.info("Connecting to MotherDuck...")
            # Accept either spelling of the variable so the check matches the error message.
            token = os.environ.get("motherduck_token") or os.environ.get("MOTHERDUCK_TOKEN")
            if not token:
                raise ValueError(
                    "MotherDuck token is required. Set the environment variable 'MOTHERDUCK_TOKEN'."
                )
            conn = duckdb.connect("md:")
            logging.info(
                f"Creating database {self.database_name} if it doesn't exist"
            )
            conn.execute(f"CREATE DATABASE IF NOT EXISTS {self.database_name}")
            conn.execute(f"USE {self.database_name}")
        else:
            conn = duckdb.connect(database=f"{self.database_name}.db")
        conn.execute(sql)  # Execute schema setup on initialization
        return conn

    def insert(self, table: pa.Table):
        total_rows = table.num_rows
        for batch_start in range(0, total_rows, self.chunk_size):
            batch_end = min(batch_start + self.chunk_size, total_rows)
            chunk = table.slice(batch_start, batch_end - batch_start)
            self.insert_chunk(chunk)
            logging.info(f"Inserted chunk {batch_start} to {batch_end}")
        self.total_inserted += total_rows
        logging.info(f"Total inserted: {self.total_inserted} rows")

    def insert_chunk(self, chunk: pa.Table):
        # Expose the chunk to DuckDB as a temporary view, insert it, then drop the view.
        self.conn.register("buffer_table", chunk)
        insert_query = f"INSERT INTO {self.table_name} SELECT * FROM buffer_table"
        self.conn.execute(insert_query)
        self.conn.unregister("buffer_table")
```

Using the above class, you can load your data in chunks.

```python
import pyarrow as pa

# Define the explicit PyArrow schema
pyarrow_schema = pa.schema([
    ('id', pa.int32()),
    ('name', pa.string())
])

# Sample data to create a PyArrow table based on the schema
data = {
    'id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
}
arrow_table = pa.table(data, schema=pyarrow_schema)

# Define the DuckDB schema as a DDL statement
duckdb_schema = "CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name VARCHAR)"

# Initialize the loading buffer
loader = ArrowTableLoadingBuffer(
    duckdb_schema=duckdb_schema,
    pyarrow_schema=pyarrow_schema,
    database_name="my_db",  # The DuckDB database filename or MotherDuck database name
    table_name="my_table",  # The name of the table in DuckDB or MotherDuck
    destination="md",  # Set "md" for MotherDuck or "local" for a local DuckDB database
    chunk_size=2  # Example chunk size for illustration
)

# Load the data
loader.insert(arrow_table)
```

### Typing your dataset

When working with production pipelines, it's recommended to type your dataset explicitly to avoid issues with type inference. We recommend PyArrow for this because it is the easiest way to type a dataset, especially for complex data types.

In the above example, the schema is defined explicitly on both the PyArrow table and the DuckDB schema.

```python
# Initialize the loading buffer
loader = ArrowTableLoadingBuffer(
    duckdb_schema=duckdb_schema,  # prepare a DuckDB DDL statement
    pyarrow_schema=pyarrow_schema,  # define your PyArrow schema explicitly
    database_name="my_db",
    table_name="my_table",
    destination="md",
    chunk_size=2
)
```

## 2. write to a temporary file and load with `COPY`

When you have a large dataset, another method is to write your data to temporary files and load it to MotherDuck using a `COPY` command. This also works well if you have existing data in blob storage such as AWS S3, Google Cloud Storage, or Azure Blob Storage, as you will benefit from cloud network speed.
```python
import pyarrow as pa
import pyarrow.parquet as pq
import duckdb
import os

# Step 1: Define the schema and create a large PyArrow table
schema = pa.schema([
    ('id', pa.int32()),
    ('name', pa.string())
])

# Example data - multiply the data to simulate a large dataset
data = {
    'id': list(range(1, 1000001)),  # Simulating 1 million rows
    'name': ['Name_' + str(i) for i in range(1, 1000001)]
}

# Create the PyArrow table with the schema
large_table = pa.table(data, schema=schema)

# Step 2: Write the large PyArrow table to a Parquet file
parquet_file = "large_data.parquet"
pq.write_table(large_table, parquet_file)

# Step 3: Load the Parquet file into MotherDuck using the COPY command
conn = duckdb.connect("md:")  # Connect to MotherDuck
conn.execute("CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name VARCHAR)")

# Use the COPY command to load the Parquet file into MotherDuck
conn.execute(f"COPY my_table FROM '{os.path.abspath(parquet_file)}' (FORMAT 'parquet')")

print("Data successfully loaded into MotherDuck")
```

## 3. use `executemany` for small datasets

For small datasets (fewer than 500 rows), you can use the `executemany` method to insert data row by row in a single transaction. This approach is the slowest of the three options and should only be used when working with very small amounts of data.

```python
import duckdb

# Sample data as a list of tuples
data = [
    (1, 'Alice'),
    (2, 'Bob'),
    (3, 'Charlie'),
    (4, 'David'),
    (5, 'Eva')
]

con = duckdb.connect('md:')
con.execute('CREATE TABLE IF NOT EXISTS my_table (id INTEGER, name VARCHAR)')
con.executemany('INSERT INTO my_table VALUES (?, ?)', data)

print("Data successfully loaded into MotherDuck")
```

:::warning
The `executemany` method sends individual `INSERT` statements, which is significantly slower than the dataframe or `COPY` approaches. Use Option 1 or Option 2 for datasets larger than a few hundred rows.
::: --- Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-data-via-postgres-endpoint --- sidebar_position: 12 title: Via the Postgres Endpoint description: Best practices for loading data into MotherDuck efficiently when you are connected through the Postgres endpoint. --- # Loading data via the Postgres endpoint MotherDuck's Postgres endpoint is a good thin-client loading path when your application, BI tool, or serverless runtime already speaks PostgreSQL and you want to run SQL in MotherDuck without installing a DuckDB client. It is best suited to server-side loading from remote data sources. :::tip Best practice If your files already live in object storage or are available over HTTPS, use the Postgres endpoint to run `CREATE TABLE AS SELECT` or `INSERT INTO ... SELECT` and let MotherDuck read the files remotely. ::: If your data is on your laptop, application server disk, or in a local DuckDB file, a DuckDB client path is usually a better fit. In that case, either: - Upload the files to object storage first, then load them remotely through the Postgres endpoint. - Use a DuckDB client path instead, such as `duckdb`, Python DuckDB, or another DuckDB client connected to `md:`. ## Recommended patterns ### Load directly from cloud storage or HTTPS This is the preferred pattern for the Postgres endpoint. The examples below use public sample files so you can run them directly. ```sql CREATE OR REPLACE TABLE my_db.main.orders_raw AS SELECT * FROM read_parquet( 'https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet', MD_RUN = REMOTE ); ``` You can use the same approach with CSV or JSON: ```sql CREATE OR REPLACE TABLE my_db.main.weather_events AS SELECT * FROM read_csv( 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv', HEADER = true, AUTO_DETECT = true, MD_RUN = REMOTE ); ``` This keeps the work inside MotherDuck and avoids sending rows one statement at a time over the Postgres wire. 
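If an application generates these load statements, the pattern above can be sketched as a small helper that picks the reader function from the file extension. This is a hedged illustration only: `remote_ctas` and the `READERS` mapping are hypothetical names, and the helper just builds the statement text. Running it through a Postgres-compatible driver is left to your own connection code.

```python
import os
from urllib.parse import urlparse

# File extension -> reader function named on this page (mapping is illustrative).
READERS = {".parquet": "read_parquet", ".csv": "read_csv", ".json": "read_json"}


def remote_ctas(table: str, url: str) -> str:
    """Build a CREATE OR REPLACE TABLE ... AS SELECT that reads `url` remotely."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext not in READERS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    quoted_url = url.replace("'", "''")  # escape single quotes for the SQL literal
    return (
        f"CREATE OR REPLACE TABLE {table} AS\n"
        f"SELECT * FROM {READERS[ext]}('{quoted_url}', MD_RUN = REMOTE);"
    )
```

The table name is interpolated as-is, so it should come from trusted configuration rather than user input.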
### Load into a staging table, then transform

For repeatable pipelines, stage the raw data first and then publish into the final table.

```sql
CREATE SCHEMA IF NOT EXISTS my_db.ingest;

CREATE OR REPLACE TABLE my_db.ingest.orders_stage AS
SELECT *
FROM read_parquet(
    'https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet',
    MD_RUN = REMOTE
);

CREATE OR REPLACE TABLE my_db.main.orders_curated AS
SELECT
    o_orderkey AS order_id,
    o_custkey AS customer_id,
    o_orderdate::TIMESTAMP AS order_ts,
    o_totalprice::DOUBLE AS total_amount
FROM my_db.ingest.orders_stage;
```

This keeps ingestion and transformation separate, which makes validation, retries, and backfills easier.

### Batch rows if the data exists only in application memory

If your source data exists only in application memory, use multi-row `INSERT` statements instead of row-by-row inserts.

Recommended:

```sql
CREATE OR REPLACE TABLE my_db.main.orders_batch (
    id INTEGER,
    note VARCHAR,
    amount DOUBLE
);

INSERT INTO my_db.main.orders_batch VALUES
    (1, 'a', 10.0),
    (2, 'b', 20.0),
    (3, 'c', 30.0);
```

Less efficient:

```sql
INSERT INTO my_db.main.orders_batch VALUES (1, 'a', 10.0);
INSERT INTO my_db.main.orders_batch VALUES (2, 'b', 20.0);
INSERT INTO my_db.main.orders_batch VALUES (3, 'c', 30.0);
```

Single-row inserts create unnecessary round trips and are much slower for loading.

When loading rows from an application, prefer:

- fewer, larger batches
- append-only staging tables
- transactions that stay comfortably below a minute

## Use a DuckDB client path instead when

The Postgres endpoint is not currently intended for workflows that depend on local DuckDB-client capabilities.

Use a DuckDB client path instead when you need:

- local-file `COPY`
- `EXPORT DATABASE`
- `IMPORT DATABASE`
- `ATTACH ':memory:'`
- `ATTACH '/path/to/file.duckdb'`
- `CREATE DATABASE ... FROM '/path/to/file.duckdb'`
- `MD_RUN = LOCAL`
- `INSTALL` and `LOAD`

In practice, that means the Postgres endpoint is not the primary interface for:

- loading directly from local files
- attaching local or in-memory DuckDB databases
- extension-based workflows
- local execution paths such as `MD_RUN = LOCAL`

## Protected cloud storage

If you are loading from protected S3, GCS, R2, or Azure storage, make sure the required MotherDuck secret already exists.

Cloud-storage secret creation requires DuckDB extension support and is not currently supported through the Postgres endpoint. The recommended workflow is:

1. Create the secret using a DuckDB client path or another supported MotherDuck workflow.
2. Then use the Postgres endpoint to run the load query.

## Decision guide

| Situation | Best approach |
|---|---|
| Files already in S3, GCS, R2, Azure, or public HTTPS | Use `read_parquet`, `read_csv`, or `read_json` with `MD_RUN = REMOTE` over the Postgres endpoint |
| Data is local on your machine | Prefer a DuckDB client path, or upload the files to object storage first |
| Data exists only in app memory and volume is modest | Use explicit large multi-row `INSERT` batches over the Postgres endpoint |
| Very large local bulk load | Use a DuckDB client path instead |

## Summary

For the best mix of throughput and simplicity:

1. Write source files as Parquet when you can.
2. Put them in object storage close to your MotherDuck region.
3. Use the Postgres endpoint to run `CREATE TABLE AS SELECT` or `INSERT INTO ... SELECT` with `MD_RUN = REMOTE`.
4. Stage first, validate row counts and schemas, then publish into the final table.
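For the application-memory case described earlier on this page, the multi-row batching can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `multirow_inserts` and `_render` are hypothetical helpers, and the `%s` placeholder style assumes a psycopg-style PostgreSQL driver. The code only builds statements and parameter lists and never opens a connection.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def _render(table: str, batch: List[Sequence[object]]) -> Tuple[str, List[object]]:
    # One "(%s, %s, ...)" group per row; values travel separately as parameters.
    group = "(" + ", ".join(["%s"] * len(batch[0])) + ")"
    sql = f"INSERT INTO {table} VALUES " + ", ".join([group] * len(batch))
    params = [value for row in batch for value in row]
    return sql, params


def multirow_inserts(
    table: str,
    rows: Iterable[Sequence[object]],
    batch_size: int = 1000,
) -> Iterator[Tuple[str, List[object]]]:
    """Yield (sql, params) pairs, one multi-row INSERT per batch of rows."""
    batch: List[Sequence[object]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield _render(table, batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield _render(table, batch)
```

Each yielded pair can be passed to a driver's `execute(sql, params)` call. The default `batch_size` here is arbitrary, so tune it so that each transaction stays comfortably below a minute.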
## Related pages

- [Postgres Endpoint reference](/sql-reference/postgres-endpoint)
- [Loading data best practices](./considerations-for-loading-data.mdx)
- [From cloud storage or HTTPS](./loading-data-from-cloud-or-https.md)
- [From your local machine](./loading-data-from-local-machine.md)
- [Loading a DuckDB database](./loading-duckdb-database.md)
- [Connect from Python via Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/python)

---

Source: https://motherduck.com/docs/key-tasks/loading-data-into-motherduck/loading-duckdb-database

---
sidebar_position: 4
title: Load a DuckDB database into MotherDuck
description: Upload a local DuckDB database file to MotherDuck cloud storage.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

MotherDuck supports uploading local DuckDB databases to the cloud, as described in the [CREATE DATABASE](/sql-reference/motherduck-sql-reference/create-database.md) statement reference.

To create a remote database from the current active local database, execute the following command:

```sql
CREATE OR REPLACE DATABASE remote_database_name FROM CURRENT_DATABASE();
```

To upload an attached local DuckDB database, execute the following commands:

```sql
ATTACH '/path/to/local/database.ddb' AS local_db_name;
ATTACH 'md:';
CREATE OR REPLACE DATABASE remote_database_name FROM local_db_name;
```

To upload a DuckDB file on disk:

```sql
ATTACH 'md:';
CREATE OR REPLACE DATABASE remote_database_name FROM '/path/to/local/database.ddb';
```

Here's a full end-to-end example:

```sql
-- Let's generate some data based on the tpch extension (it is autoloaded).
-- This will create a couple of tables in the current database.
CALL dbgen(sf=0.1);

-- Connect to MotherDuck
ATTACH 'md:';
CREATE OR REPLACE DATABASE remote_tpch from CURRENT_DATABASE();
```

:::note
Uploading a database does not alter the context: you are still in the local context after the upload, and queries will run locally.
::: --- Source: https://motherduck.com/docs/key-tasks/managing-organizations/managing-organizations --- title: Managing organizations description: Learn how to manage your organization with MotherDuck --- import Versions from '@site/src/components/Versions'; An organization is a top-level entity in MotherDuck that lets you perform administrative functions, such as managing users, setting up billing, configuring sharing, and monitoring security. A MotherDuck user can only belong to a single organization at a time. Multi-organization membership support is planned for a future release. ::: ::: Organizations are helpful for: - Grouping users together for tracking usage and billing. - Sharing data with other users of the same organization. :::note MotherDuck is available on three AWS regions: - **US East (N. Virginia):** `us-east-1`, supporting DuckDB versions between and . - **US West (Oregon):** `us-west-2`, supporting DuckDB versions between and . - **Europe (Frankfurt):** `eu-central-1`, supporting DuckDB versions between and . You can choose the region in which to create your organization. Organizations can only exist within a single cloud region. ::: ## Creating an organization If you already have a MotherDuck account, an organization was already created for you by MotherDuck. If you are a new MotherDuck user, during sign-up you will be prompted to create a new organization. ![create_org](./img/create_org.png) :::note If another coworker at your company already has an organization, you can create your own organization to get started with MotherDuck right away, and then ask them to invite you to their organization later (see ["Joining an existing organization"](#joining-an-existing-organization) below). ::: ## Inviting users to your organization You can check if your teammates are in your organization by navigating to the MotherDuck Web UI -> **Settings** -> **Members**. There you can also invite your teammates to join your organization. 
You can invite both teammates without a MotherDuck account and existing MotherDuck users.

![members](./img/members.png)

Admins can control whether members are allowed to send invitations. When organization invites are disabled, only Admins can invite new users. This gives you tighter control over who has access to MotherDuck. You can configure this setting from the organization **Settings** page.

![invite policy](../authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/img/org-invite-policy.png)

:::tip
If your organization has [SSO enabled](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/) with [Just-in-Time (JIT) provisioning](/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/sso-setup/#just-in-time-jit-user-provisioning), users in your verified domains who authenticate through your identity provider can join the organization on first login without needing an invitation.
:::

## Joining an existing organization

If you'd like to join your teammates' existing MotherDuck organization, you must be invited by an Admin in that organization. Once an invite is generated, you will receive an email with a link to join the organization.

## Roles

Within an organization, a user can have an "Admin" or "Member" role. The first user in an organization is the Admin, and subsequent users have the Member role. Admin users can change the roles of other users in the organization or remove a user from the organization.

:::note
Sending invitations, changing between plans, and updating billing information require an Admin role.
:::

## Removing users

If a user leaves your team or no longer needs access, Admin users can remove them from the organization to restrict data access or clean up resources that are no longer used. This is done from the context menu in the [Members table](https://app.motherduck.com/settings/members).
:::warning
Because a user can only belong to one organization, removing them from the organization permanently deletes the user and all of their data. This action cannot be undone.
:::

## Limitations

- It is not possible to search for existing organizations to join. Please reach out to other MotherDuck users at your company or [contact us](../../troubleshooting/support.md) if you would like to find other existing users at your company.

---
Source: https://motherduck.com/docs/key-tasks/running-hybrid-queries
---
sidebar_position: 9
title: Running dual execution (or hybrid) queries
description: Query local and cloud data together using MotherDuck's dual execution hybrid query engine.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

MotherDuck can use local data and remote data in the same query.

**Example:**

Run the DuckDB CLI.

```bash
duckdb
```

Connect to MotherDuck. You may be prompted to sign in if you aren't already.

```sql
ATTACH 'md:';
```

Alternatively, skip the two CLI steps above and run the remaining statements in a MotherDuck notebook.

Create a local database in memory.

```sql
ATTACH ':memory:' AS local_db;

CREATE TABLE local_db.pricing AS
  FROM (VALUES ('A', 1.4), ('B', 1.12), ('C', 2.552), ('D', 5.23)) pricing(item, price);

FROM local_db.pricing;
```

```bash
┌─────────┬──────────────┐
│  item   │    price     │
│ varchar │ decimal(4,3) │
├─────────┼──────────────┤
│ A       │        1.400 │
│ B       │        1.120 │
│ C       │        2.552 │
│ D       │        5.230 │
└─────────┴──────────────┘
```

Create a remote database in MotherDuck.
```sql CREATE OR REPLACE DATABASE remote_db; CREATE TABLE remote_db.sales AS SELECT 'ABCD'[floor(random() * 3.999)::int + 1] AS item, current_date() - interval (random() * 100) days AS dt, floor(random() * 50)::int AS tally FROM generate_series(1000); FROM remote_db.sales LIMIT 10; ``` ```bash ┌─────────┬─────────────────────┬───────┐ │ item │ dt │ tally │ │ varchar │ timestamp │ int32 │ ├─────────┼─────────────────────┼───────┤ │ D │ 2024-11-29 00:00:00 │ 0 │ │ A │ 2024-10-04 00:00:00 │ 17 │ │ A │ 2024-10-13 00:00:00 │ 0 │ │ C │ 2024-11-05 00:00:00 │ 49 │ │ A │ 2024-09-30 00:00:00 │ 12 │ │ B │ 2024-09-27 00:00:00 │ 47 │ │ C │ 2024-11-23 00:00:00 │ 47 │ │ B │ 2024-09-18 00:00:00 │ 13 │ │ A │ 2024-11-18 00:00:00 │ 40 │ │ C │ 2024-09-18 00:00:00 │ 4 │ ├─────────┴─────────────────────┴───────┤ │ 10 rows 3 columns │ └───────────────────────────────────────┘ ``` Join the remote sales table to our local pricing data to get revenue by month. ```sql SELECT date_trunc('month', dt) AS mo, round(sum(price * tally),2) AS rev FROM remote_db.sales JOIN (FROM local_db.pricing WHERE price > 2) pricing ON sales.item = pricing.item GROUP BY mo ORDER BY mo; ``` ```bash ┌────────────┬───────────────┐ │ mo │ rev │ │ date │ decimal(38,2) │ ├────────────┼───────────────┤ │ 2024-09-01 │ 9241.39 │ │ 2024-10-01 │ 14226.12 │ │ 2024-11-01 │ 13136.55 │ │ 2024-12-01 │ 7783.26 │ └────────────┴───────────────┘ ``` To see what is running locally and remotely, you can use EXPLAIN: ```sql EXPLAIN SELECT date_trunc('month', dt) AS mo, round(sum(price * tally),2) AS rev FROM remote_db.sales JOIN (FROM local_db.pricing WHERE price > 2) pricing ON sales.item = pricing.item GROUP BY mo ORDER BY mo; ``` In each operator of the plan, `(L)` indicates local while `(R)` indicates remote. Data is transferred using sinks and sources. 
```bash ┌─────────────────────────────┐ │┌───────────────────────────┐│ ││ Physical Plan ││ │└───────────────────────────┘│ └─────────────────────────────┘ ┌───────────────────────────┐ │ DOWNLOAD_SOURCE (L) │ │ ──────────────────── │ │ bridge_id: 1 │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ BATCH_DOWNLOAD_SINK (R) │ │ ──────────────────── │ │ bridge_id: 1 │ │ parallel: true │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ ORDER_BY (R) │ │ ──────────────────── │ │ date_trunc('month', sales │ │ .dt) ASC │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION (R) │ │ ──────────────────── │ │ 0 │ │ rev │ │ │ │ ~125 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ HASH_GROUP_BY (R) │ │ ──────────────────── │ │ Groups: #0 │ │ Aggregates: sum(#1) │ │ │ │ ~125 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION (R) │ │ ──────────────────── │ │ mo │ │ (CAST(price AS DECIMAL(14 │ │ ,3)) * CAST(tally AS │ │ DECIMAL(14,0))) │ │ │ │ ~250 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION (R) │ │ ──────────────────── │ │ #0 │ │ #1 │ │ #2 │ │__internal_compress_string_│ │ utinyint(#3) │ │ #4 │ │ │ │ ~250 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ HASH_JOIN (R) │ │ ──────────────────── │ │ Join Type: INNER │ │ │ │ Conditions: ├──────────────┐ │ item = item │ │ │ │ │ │ ~250 Rows │ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ SEQ_SCAN (R) ││ UPLOAD_SOURCE (R) │ │ ──────────────────── ││ ──────────────────── │ │ sales ││ bridge_id: 2 │ │ ││ │ │ Projections: ││ │ │ item ││ │ │ dt ││ │ │ tally ││ │ │ ││ │ │ ~1001 Rows ││ │ └───────────────────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ BATCH_UPLOAD_SINK (L) │ │ ──────────────────── │ │ bridge_id: 2 │ │ parallel: true │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION (L) │ │ 
──────────────────── │ │ item │ │ price │ │ │ │ ~1 Rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ SEQ_SCAN (L) │ │ ──────────────────── │ │ pricing │ │ │ │ Projections: │ │ price │ │ item │ │ │ │ Filters: │ │ price>2.000 AND price IS │ │ NOT NULL │ │ │ │ ~1 Rows │ └───────────────────────────┘ ``` A dual execution (or hybrid) query can be run on any database format supported by DuckDB, including [sqlite](https://duckdb.org/docs/stable/core_extensions/sqlite), [postgres](https://duckdb.org/docs/stable/core_extensions/postgres.html) and many others. --- Source: https://motherduck.com/docs/key-tasks/service-accounts-guide/create-and-configure-service-accounts --- title: Create and configure service accounts description: Learn how to create service accounts, create access tokens, and configure Duckling resources. sidebar_position: 1 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; A service account is a non-human user identity for workloads that need to connect to MotherDuck without using a person's credentials. Use service accounts for backend services, scheduled pipelines, BI connections, embedded analytics, and customer-facing analytics workloads. Each service account has its own credentials and Duckling configuration. This gives the workload isolated compute and makes it easier to rotate credentials without disrupting human users. :::warning[Admin access required] Creating service accounts, creating service account tokens, and configuring service account Ducklings requires an organization Admin. REST API examples use a read/write access token generated by an Admin user. Pass the token in the `Authorization` header as `Bearer `. ::: ## Create a service account Choose a stable username for the service account. The username must be unique within your organization and can contain letters, numbers, and underscores. ![Service account creation form](../img/sa_ui.png) 1. 
In the MotherDuck UI, go to **Settings** > **Service Accounts**.
2. Click **Create service account**.
3. Enter a username for the service account.
4. Click **Create service account**.

To create the service account through the REST API instead, use the [`POST /v1/users`](/sql-reference/rest-api/users-create-service-account/) endpoint.

```bash
curl -X POST \
  https://api.motherduck.com/v1/users \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json" \
  -d '{
    "username": "analytics_service_account"
  }'
```

The response includes the service account `username`. Store this username in your provisioning system. The REST API doesn't provide an endpoint for listing all service accounts in an organization.

The same request in Python, using the `requests` library:

```python
import requests

response = requests.post(
    "https://api.motherduck.com/v1/users",
    headers={
        "Authorization": "Bearer ",
        "Content-Type": "application/json",
    },
    json={"username": "analytics_service_account"},
)
response.raise_for_status()
print(response.json()["username"])
```

## Create an access token

Create a token for the service account after you create the account. The token value is shown only once, so store it in a secret manager before closing the modal or discarding the API response.

![Service account details page](../img/sa_details.png)

1. In **Settings** > **Service Accounts**, open the service account details page.
2. Click **Create token**.
3. Enter a token name.
4. Choose the token type:
   - **Read/Write Token** for writes, administration, and general service workloads.
   - **Read Scaling Token** for read-heavy workloads that should use [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/).
5.
To set an expiration, select **Automatically expire this token** and choose a time-to-live.
6. Click **Create token**, then copy the token and store it securely.

To create the token through the REST API instead, use the [`POST /v1/users/{username}/tokens`](/sql-reference/rest-api/users-create-token/) endpoint with a known service account username.

```bash
curl -X POST \
  https://api.motherduck.com/v1/users/analytics_service_account/tokens \
  -H "Authorization: Bearer " \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analytics-service-token",
    "token_type": "read_write"
  }'
```

Set `token_type` to `read_scaling` when you need a [read scaling token](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/). To create an expiring token, include `ttl` as a number of seconds between `300` and `31536000`.

The same request in Python, using the `requests` library:

```python
import requests

response = requests.post(
    "https://api.motherduck.com/v1/users/analytics_service_account/tokens",
    headers={
        "Authorization": "Bearer ",
        "Content-Type": "application/json",
    },
    json={
        "name": "analytics-service-token",
        "token_type": "read_write",
    },
)
response.raise_for_status()
token = response.json()["token"]
print(token)
```

:::note
If you create a service account through the API and plan to use read scaling, connect as that service account with a read/write token before using read scaling tokens for that account.
:::

## Configure Ducklings

Configure Duckling resources for the service account based on the workload it runs. The read/write Duckling handles writes and general queries. The read scaling pool handles read-only connections that use read scaling tokens.
![Service account Duckling size settings](../img/sa_set_instance_size.png) 1. In **Settings** > **Service Accounts**, find the service account. 2. Use the **Read/Write Duckling** dropdown to choose the read/write Duckling size. 3. If you use [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/), choose the read scaling Duckling size and pool size. Use [`GET /v1/users/{username}/instances`](/sql-reference/rest-api/ducklings-get-duckling-config-for-user/) to inspect the current configuration before updating it. ```bash curl -X GET \ https://api.motherduck.com/v1/users/analytics_service_account/instances \ -H "Authorization: Bearer " ``` Then use [`PUT /v1/users/{username}/instances`](/sql-reference/rest-api/ducklings-set-duckling-config-for-user/) to update the service account's Ducklings. ```bash curl -X PUT \ https://api.motherduck.com/v1/users/analytics_service_account/instances \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ -d '{ "config": { "read_write": { "instance_size": "standard" }, "read_scaling": { "instance_size": "pulse", "flock_size": 4 } } }' ``` The update request requires both `read_write` and `read_scaling` configuration blocks. Use [`GET /v1/users/{username}/instances`](/sql-reference/rest-api/ducklings-get-duckling-config-for-user/) to inspect the current configuration before updating it. ```python import requests headers = {"Authorization": "Bearer "} current_config = requests.get( "https://api.motherduck.com/v1/users/analytics_service_account/instances", headers=headers, ) current_config.raise_for_status() print(current_config.json()) ``` Then use [`PUT /v1/users/{username}/instances`](/sql-reference/rest-api/ducklings-set-duckling-config-for-user/) to update the service account's Ducklings. 
```python import requests response = requests.put( "https://api.motherduck.com/v1/users/analytics_service_account/instances", headers={ "Authorization": "Bearer ", "Content-Type": "application/json", }, json={ "config": { "read_write": {"instance_size": "standard"}, "read_scaling": { "instance_size": "pulse", "flock_size": 4, }, } }, ) response.raise_for_status() print(response.json()) ``` The update request requires both `read_write` and `read_scaling` configuration blocks. ## Connect as the service account Use the service account token anywhere you would use a MotherDuck access token. For example, set `motherduck_token` in a DuckDB connection string or set `MOTHERDUCK_TOKEN` in your environment. See [Connecting to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/) for connection string examples. ## Related content - [Manage service accounts and tokens](/key-tasks/service-accounts-guide/manage-service-accounts-and-tokens/) - [Impersonate service accounts](/key-tasks/service-accounts-guide/impersonate-service-accounts/) - [MotherDuck REST API](/sql-reference/rest-api/motherduck-rest-api/) - [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) --- Source: https://motherduck.com/docs/key-tasks/service-accounts-guide/impersonate-service-accounts --- title: Impersonate service accounts description: Use UI impersonation to troubleshoot and inspect resources as a service account. sidebar_position: 2 --- Organization Admins can impersonate a service account in the MotherDuck UI. Impersonation is useful when you need to inspect resources, run one-off queries, or troubleshoot service account-specific behavior from that account's point of view. Impersonation is different from using a service account token. Tokens are for applications and automation. Impersonation is an interactive UI workflow for Admin users. :::warning[UI only] Service account impersonation is available only in the MotherDuck UI. 
DuckDB clients, the CLI, and the REST API don't support impersonation sessions. Use service account tokens for non-UI access. ::: ## Start an impersonation session ![Service account impersonation action](../img/sa_impersonate_option.png) 1. In the MotherDuck UI, go to **Settings** > **Service Accounts**. 2. Open the three-dot menu for the service account. 3. Click **Impersonate this account**. 4. The UI refreshes and signs you in as the service account. While impersonating, MotherDuck shows a banner with controls to refresh the session or return to your Admin account. ![Service account impersonation banner](../img/sa_impersonate_banner.png) Impersonation sessions expire after two hours. Refresh the browser tab to reset the expiry countdown. :::tip You can bookmark the URL while impersonating a service account. Opening the bookmark starts a new impersonation session for the same service account when you're signed in as an Admin user. ::: ## Use impersonation for troubleshooting Use impersonation when you need to: - Verify which databases, shares, secrets, and Dives the service account can access. - Run read-write actions as the service account from the MotherDuck UI. - Inspect query history and ongoing query activity for that service account. - Confirm that a service account-specific setup works before wiring it into an application. ## Use tokens for applications Applications and DuckDB clients should connect with a service account token instead of impersonation. Create a read/write token for workloads that need to write data or manage resources. Create a read scaling token for read-heavy workloads that should use [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/). 
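As a minimal sketch of the token-based path, the Python snippet below builds an `md:` connection string that carries a service account token and opens a DuckDB connection with it. It assumes the token is stored in the `MOTHERDUCK_TOKEN` environment variable and that the `duckdb` package is installed. The helper names `motherduck_uri` and `connect` and the database name `my_db` are hypothetical and not part of any MotherDuck API.

```python
import os


def motherduck_uri(database: str = "", token_env: str = "MOTHERDUCK_TOKEN") -> str:
    """Build an `md:` connection string that carries a service account token.

    Assumption: the token was placed in an environment variable beforehand.
    The variable name and this helper are illustrative only.
    """
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"{token_env} is not set")
    return f"md:{database}?motherduck_token={token}"


def connect(database: str = ""):
    """Open a DuckDB connection to MotherDuck as the service account."""
    # Imported lazily so motherduck_uri() works even without DuckDB installed.
    import duckdb

    return duckdb.connect(motherduck_uri(database))


# Example usage (needs the duckdb package and network access):
# con = connect("my_db")  # "my_db" is a placeholder database name
# print(con.sql("SELECT current_database()").fetchall())
```

Reading the token from the environment keeps it out of source control. The same pattern applies whether the variable holds a read/write token or a read scaling token.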
## Related content - [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) - [Manage service accounts and tokens](/key-tasks/service-accounts-guide/manage-service-accounts-and-tokens/) - [Connecting to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/) --- Source: https://motherduck.com/docs/key-tasks/service-accounts-guide/index --- title: Service accounts description: Learn how to create, configure, manage, and impersonate MotherDuck service accounts. --- Service accounts are non-human user identities for workloads that need to connect to MotherDuck without using a person's credentials. Use these guides to create service accounts, configure their Ducklings, manage tokens, and troubleshoot through UI impersonation. ## Included pages - [Create and configure service accounts](https://motherduck.com/docs/key-tasks/service-accounts-guide/create-and-configure-service-accounts): Learn how to create service accounts, create access tokens, and configure Duckling resources. - [Impersonate service accounts](https://motherduck.com/docs/key-tasks/service-accounts-guide/impersonate-service-accounts): Use UI impersonation to troubleshoot and inspect resources as a service account. - [Manage service accounts and tokens](https://motherduck.com/docs/key-tasks/service-accounts-guide/manage-service-accounts-and-tokens): Use the MotherDuck UI and REST API to view, delete, and rotate service account tokens. --- Source: https://motherduck.com/docs/key-tasks/service-accounts-guide/manage-service-accounts-and-tokens --- title: Manage service accounts and tokens description: Use the MotherDuck UI and REST API to view, delete, and rotate service account tokens. sidebar_position: 3 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; Use the MotherDuck UI for service account inventory and one-off administration. 
Use the REST API when your automation already knows the target service account username. :::warning[Admin access required] Managing service accounts and service account tokens requires an organization Admin. REST API examples use a read/write access token generated by an Admin user. ::: ## Check what each interface supports | Task | MotherDuck UI | REST API | |---|---|---| | List all service accounts in an organization | Yes | No | | Create a service account | Yes | Yes, with [`POST /v1/users`](/sql-reference/rest-api/users-create-service-account/) | | View tokens for a known service account | Yes | Yes, with [`GET /v1/users/{username}/tokens`](/sql-reference/rest-api/users-list-tokens/) | | Create a token for a known service account | Yes | Yes, with [`POST /v1/users/{username}/tokens`](/sql-reference/rest-api/users-create-token/) | | Revoke a known token | Yes | Yes, with [`DELETE /v1/users/{username}/tokens/{token_id}`](/sql-reference/rest-api/users-delete-token/) | | Delete a known service account | Yes | Yes, with [`DELETE /v1/users/{username}`](/sql-reference/rest-api/users-delete/) | | View or configure Ducklings for a known service account | Yes | Yes, with the [Duckling configuration endpoints](/sql-reference/rest-api/ducklings-get-duckling-config-for-user/) | | Impersonate a service account | Yes | No | The REST API doesn't provide an endpoint for listing all service accounts in an organization. If you provision service accounts through the API, store the returned usernames in your own system. ## View service accounts ![Service account management page](../img/sa_manage_details.png) 1. In the MotherDuck UI, go to **Settings** > **Service Accounts**. 2. Review the service account list. 3. Click a username to view that service account's details and tokens. 4. Use the Duckling size and pool size dropdowns to review compute configuration. The REST API doesn't provide a service account list endpoint. 
Use the UI to view organization-level service account inventory. For automated provisioning, persist the `username` returned by [`POST /v1/users`](/sql-reference/rest-api/users-create-service-account/) when you create each service account. ## View tokens for a service account The token list shows token metadata, including token ID, name, type, creation time, and expiration time. It doesn't return the token secret. 1. In **Settings** > **Service Accounts**, open the service account details page. 2. Review the token list. Use [`GET /v1/users/{username}/tokens`](/sql-reference/rest-api/users-list-tokens/) to list tokens for a known service account username. ```bash curl -X GET \ https://api.motherduck.com/v1/users/analytics_service_account/tokens \ -H "Authorization: Bearer " ``` Use [`GET /v1/users/{username}/tokens`](/sql-reference/rest-api/users-list-tokens/) to list tokens for a known service account username. ```python import pprint import requests response = requests.get( "https://api.motherduck.com/v1/users/analytics_service_account/tokens", headers={"Authorization": "Bearer "}, ) response.raise_for_status() pprint.pp(response.json()["tokens"]) ``` ## Rotate a service account token Rotate tokens by creating a replacement token before revoking the old token. 1. Create a replacement token for the service account. 2. Update your secret manager or application configuration to use the replacement token. 3. Deploy or restart clients that use the token. 4. Verify that the workload can connect to MotherDuck with the replacement token. 5. Revoke the old token. ## Revoke a token ![Service account token actions](../img/sa_revoke_token_option.png) 1. In **Settings** > **Service Accounts**, open the service account details page. 2. Open the token's three-dot menu. 3. Click **Revoke token**. 4. Confirm the revocation. Use [`DELETE /v1/users/{username}/tokens/{token_id}`](/sql-reference/rest-api/users-delete-token/) to revoke a known token. 
```bash curl -X DELETE \ "https://api.motherduck.com/v1/users/analytics_service_account/tokens/" \ -H "Authorization: Bearer " ``` Use [`DELETE /v1/users/{username}/tokens/{token_id}`](/sql-reference/rest-api/users-delete-token/) to revoke a known token. ```python import requests response = requests.delete( "https://api.motherduck.com/v1/users/analytics_service_account/tokens/", headers={"Authorization": "Bearer "}, ) response.raise_for_status() ``` ## Delete a service account Deleting a service account immediately revokes its tokens and permanently deletes data owned by that account. :::warning[This action can't be undone] Verify the service account username before deleting it. Data and users deleted through the API can't be recovered. ::: 1. In **Settings** > **Service Accounts**, find the service account. 2. Open the service account's three-dot menu. 3. Click **Delete account**. 4. Confirm the deletion. Use [`DELETE /v1/users/{username}`](/sql-reference/rest-api/users-delete/) to delete a known service account. ```bash curl -X DELETE \ https://api.motherduck.com/v1/users/analytics_service_account \ -H "Authorization: Bearer " ``` Use [`DELETE /v1/users/{username}`](/sql-reference/rest-api/users-delete/) to delete a known service account. 
```python
import requests

response = requests.delete(
    "https://api.motherduck.com/v1/users/analytics_service_account",
    headers={"Authorization": "Bearer "},
)
response.raise_for_status()
print(response.json()["username"])
```

## Related content

- [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/)
- [Impersonate service accounts](/key-tasks/service-accounts-guide/impersonate-service-accounts/)
- [MotherDuck REST API](/sql-reference/rest-api/motherduck-rest-api/)

---
Source: https://motherduck.com/docs/key-tasks/sharing-data/managing-shares
---
sidebar_position: 4
title: Managing shares
description: View share details, modify permissions, and manage shared database access.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Getting details about a share

You can learn more about a specific share by using the [`DESCRIBE SHARE`](/sql-reference/motherduck-sql-reference/describe-share.md) command. For example:

```sql
-- if you are the share owner, use the database name
DESCRIBE SHARE "duckshare";

-- if you are the share viewer, use the full url
DESCRIBE SHARE "md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6";
```

In the UI, you can roll over a share to see a tooltip that tells you the share owner, when it was last updated, and its access scope.

## Listing Shares

You can list the shares you have created with the [`LIST SHARES`](/sql-reference/motherduck-sql-reference/list-shares.md) statement. For example:

```sql
LIST SHARES;
```

In the UI:

1. You can see shares that you've created under "Shares I've created".
2. You can find **Discoverable** **Organization** shares that members of your Organization created under "Shared with me".

To view the URLs of shares created by others that you currently have attached, use the [`SHOW ALL DATABASES`](/sql-reference/motherduck-sql-reference/show-databases/) command.
The `fully_qualified_name` column gives you the share URL of the attached share.

## Deleting a share

Shares can be deleted with the [`DROP SHARE`](/sql-reference/motherduck-sql-reference/drop-share.md) or `DROP SHARE IF EXISTS` statement. Users who have [`ATTACH`](/sql-reference/motherduck-sql-reference/attach.md)-ed the share will lose access. For example:

```sql
DROP SHARE "share1";
```

In the UI:

1. Roll over the share you'd like to delete.
2. Click on the "trident" on the right side.
3. Select "Drop".
4. Confirm.

## Updating a share

Sharing a database creates a point-in-time snapshot of the database at the time it is shared. To publish changes, you need to explicitly run `UPDATE SHARE `. When you update a share from the same database, the share URL does not change.

```sql
UPDATE SHARE ;
```

In the following example, database 'mydb' was previously shared by creating a share 'myshare', and the database 'mydb' has been updated since. The owner of the database would like their colleagues to receive the new version of this database:

```sql
-- 'myshare' was previously created on the database 'mydb'
UPDATE SHARE "myshare";
```

If you lose your share URL, you can use the `LIST SHARES` command to list all your shares, or `DESCRIBE SHARE ` to get details about a specific share.

## Editing/Altering a share

You can change the configuration of shares you've created in the UI. The SQL operation `ALTER SHARE` is in the works.

1. Roll over the share you'd like to edit.
2. Click on the "trident" on the right side.
3. Select "Alter".
4. Change the share configuration as you see fit.
5. Confirm "Alter share".

**Error handling:** If you don't see the trident icon, you may not have permission to edit this share.

---
Source: https://motherduck.com/docs/key-tasks/sharing-data/sharing-data
---
title: Sharing data in MotherDuck
description: Learn how to securely share data in MotherDuck
---

:::note
Shares are **region-scoped** based on your Organization's cloud region.
Each MotherDuck Organization is scoped to a single cloud region that must be chosen at Org creation when signing up. MotherDuck is available on AWS in three regions: - **US East (N. Virginia):** `us-east-1` - **US West (Oregon):** `us-west-2` - **Europe (Frankfurt):** `eu-central-1` ::: You can securely share data in MotherDuck. MotherDuck's sharing model is specifically optimized for the following scenarios: - Sharing data with everyone in your Organization for easy discovery and low-friction access. Typical of small highly collaborative data teams. - Sharing data with specific accounts in your Organization. Popular with data application builders needing to isolate tenants. - Sharing data publicly with anyone with a MotherDuck account in the same cloud region as your Organization, including users outside your Organization. ## Included pages - [Sharing concepts and overview](https://motherduck.com/docs/key-tasks/sharing-data/sharing-overview): MotherDuck data sharing model concepts including read-only shares and scope options. - [Sharing data with your organization](https://motherduck.com/docs/key-tasks/sharing-data/sharing-within-org): Share databases with all members of your MotherDuck organization. - [Sharing data with specific users](https://motherduck.com/docs/key-tasks/sharing-data/sharing-with-users): Grant read access to specific users for multi-tenant applications and collaboration. - [Managing shares](https://motherduck.com/docs/key-tasks/sharing-data/managing-shares): View share details, modify permissions, and manage shared database access. - [Updating shares](https://motherduck.com/docs/key-tasks/sharing-data/updating-shares): Learn about data replication timing, checkpoints, and how to ensure your latest data is available in shares and read-only Ducklings. 
--- Source: https://motherduck.com/docs/key-tasks/sharing-data/sharing-overview --- sidebar_position: 1 title: Sharing concepts and overview description: MotherDuck data sharing model concepts including read-only shares and scope options. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; # Sharing data in MotherDuck MotherDuck's data sharing model has the following key characteristics: - Sharing is at the granularity of a MotherDuck database. - Sharing is read-only. - Sharing is done through **share** objects. - You can make shares discoverable and queryable by all users in your [Organization](../managing-organizations/managing-organizations.mdx). - You can create restricted shares, where access to each is managed with an [Access Control List (ACL)](./sharing-with-users.md). - Alternatively, you can use hidden share URLs to limit access to specific people in your organization you share the URL with. - You can also configure the URL of a hidden share to be accessible by anyone with a MotherDuck account in the same cloud region as your Organization. :::note Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is scoped to a single cloud region that must be chosen at Org creation when signing up. MotherDuck is available on AWS in three regions: - **US East (N. Virginia):** `us-east-1` - **US West (Oregon):** `us-west-2` - **Europe (Frankfurt):** `eu-central-1` ::: Sharing in MotherDuck works as follows: 1. The **data provider** shares their database in MotherDuck by creating a share. 2. The **data consumer** attaches said share, which creates a database clone in their workspace. The data consumer can now query this database. 3. The **data provider** periodically updates the share to push updates to the database to **data consumers**. ## Creating a share The first step in sharing databases in MotherDuck is to create a share, which can be done in both UI and SQL. 
Creating a share does not incur additional costs, and no actual data is copied or transferred. It is a zero-copy, metadata-only operation.

In the UI, click on the "trident" next to the database you'd like to share and select "share". Then:

![trident](./img/ui-share_new.png)

1. Optionally, choose a share name. The default is the database name.
2. Choose whether the share should be accessible to all users in your Organization, to specified users in your Organization, or to any MotherDuck user in the same cloud region who has the share link.
3. Choose whether the share should be updated automatically. The default is `MANUAL`.

In SQL, the following example creates a share from database "birds":

- The share is also named "birds".
- The share can only be accessed by accounts authenticated in your [Organization](../managing-organizations/managing-organizations.mdx).
- The share is discoverable, so users in your Organization can find it.

```sql
USE birds;
CREATE SHARE; -- Shorthand syntax. Share name is optional. By default, shares are Organization-scoped and Discoverable.
CREATE SHARE IF NOT EXISTS birds FROM birds (ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE, UPDATE MANUAL); -- This query is identical to the previous one but with explicit options.
```

Learn more about the [CREATE SHARE](/sql-reference/motherduck-sql-reference/create-share.md) SQL command.

### Organization shares

When creating a share, you can choose the scope of access to it:

- **Organization**. Only users authenticated in your Organization have access to this share.
- **Restricted**. Only the share owner and users specified with `GRANT` commands can access the share.
- **Unrestricted**. Any user signed into any MotherDuck organization in the same cloud region can access this share using the share URL.

### Discoverable shares

When creating a share, you can choose to make it **Discoverable**. All authenticated users in your Organization can find this share in the UI.
You can create **Discoverable** shares that are **Unrestricted**, but only members of your Organization can find this share in the UI. Non-members can still access this share using the share URL. ### Share URLs When you create a share, a URL for this share is generated: - If the share is **Discoverable**, members of your Organization can find this share without the share URL. Alternatively, they can use the URL directly. - If the share is **Hidden** (i.e. not Discoverable), other users will not be able to find the share URL. You will need to send this URL directly to the users with whom you want to share this data. ## Consuming shared data The **data consumer** needs to attach the share to their workspace, thereby creating a read-only zero-copy clone of the source database. This is a free, metadata-only operation. When you attach a share, it gets an alias that defaults to the source database name. If you already have a database with that name, the attach fails. Use `AS` to pick a different alias, or [detach](/key-tasks/database-operations/detach-and-reattach-motherduck-database/) the conflicting database first. See [share alias conflicts](/sql-reference/motherduck-sql-reference/attach/#share-alias-conflicts) for details. ### Views and fully-qualified table references If the shared database contains views, those views may reference tables using fully-qualified paths that include the original database name. For example, a view in a database called `org_dwh` might reference `org_dwh.main.sales`. When you attach the share, make sure the database alias matches the original database name. Otherwise, the views fail because they can't resolve the original database name in your namespace. ```sql -- The share was created from a database called "org_dwh". -- Views inside reference the tables as "org_dwh.main.<table name>".
-- This will cause view errors because the alias doesn't match: ATTACH 'md:_share/org_dwh/id_abc123' AS dwh; -- Use the original database name as the alias: ATTACH 'md:_share/org_dwh/id_abc123' AS org_dwh; ``` This applies to any object in the shared database that uses fully-qualified references, including views, macros, and stored procedures. ### Consuming discoverable shares If the **data provider** created a Discoverable share you have access to, you should be able to find this share in the UI. 1. Select the share you want under "Shared with me". 2. Optionally hover over the share to see the tooltip that tells you the share owner, when it was last updated, and the share access scope. 3. Click "attach". 4. You can query the resulting database. ### Consuming hidden shares If the **data provider** created a Hidden (i.e. non-Discoverable) share, they need to pass the share URL to the **data consumer**. The **data consumer**, in turn, needs to attach the share URL. ```sql ATTACH 'md:_share/ducks/0a9a026ec5a55946a9de39851087ed81' AS birds; -- attaches the share as database `birds` ``` ## Updating shared data If the **data provider** chose automatic updates when creating the share, the share will be updated periodically. If the share was created with `MANUAL` updates, the **data provider** needs to manually update the share. ```sql UPDATE SHARE birds; ``` Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md) and [data replication timing and checkpoints](./updating-shares.md). ## Consuming updated data By default, shares automatically update every minute. However, if you need the most up-to-date data sooner, the consumer can manually refresh the share after the producer executes `UPDATE SHARE`.
To manually refresh the data: ```sql REFRESH DATABASES; -- Refreshes all connected databases and shares REFRESH DATABASE my_share; -- Alternatively, refresh a specific database/share ``` Learn more about [REFRESH DATABASES](/sql-reference/motherduck-sql-reference/refresh-database.md). --- Source: https://motherduck.com/docs/key-tasks/sharing-data/sharing-with-users --- sidebar_position: 3 title: Sharing data with specific users description: Grant read access to specific users for multi-tenant applications and collaboration. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import useBaseUrl from '@docusaurus/useBaseUrl'; MotherDuck lets you securely share data with specific users. Common scenarios include: - Building data applications, in which each tenant should only have access to their own data. - Sharing sensitive data within your Organization. - Sharing data outside of your Organization. :::note Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is scoped to a single cloud region that must be chosen at Org creation when signing up. MotherDuck is available on AWS in three regions: - **US East (N. Virginia):** `us-east-1` - **US West (Oregon):** `us-west-2` - **Europe (Frankfurt):** `eu-central-1` ::: Sharing data with individuals is easy. MotherDuck supports two approaches: - Creating a share with **Restricted** access, limiting access to a list of specified users within your organization (known as an "ACL" or "Access Control List"). - Creating a **Hidden** share and providing individuals with the share URL. ## Creating a share with restricted access (ACL) **Overview** 1. **Data provider** creates a share with **Restricted** access. 2. **Data provider** _(share owner)_ specifies which **data consumers** _(users)_ can read from the share. 3. **Data consumer** **attaches** the share. 4. **Data provider** periodically updates the share to push new data to **data consumers**.
Anyone within your organization who is _not_ included in the list will **not** be able to access the share, even if they have a share link. Click on the "trident" next to the database you'd like to share. Select "Share". 1. Optionally name the share. 2. Under "Who has access" choose "Specified users with the share link". Search for and add the users within your Organization that should have access to read the share. 3. Choose whether the share should be [automatically updated or not](../sharing-overview/#updating-shared-data). Default is `MANUAL`. 4. Create the share. 5. For the specified users, the share will appear in their UI under 'Shared with me' and can be attached. ```sql use birds; CREATE SHARE birds FROM birds (ACCESS RESTRICTED); -- This query creates a share accessible only by organization users specified with GRANT commands GRANT READ ON SHARE birds TO duck1, duck2; -- Gives the users with usernames 'duck1' and 'duck2' access to the share 'birds' ``` The **data consumer** must `ATTACH` the restricted share before querying the share. See [consuming restricted shares](./#consuming-restricted-shares). :::note Restricted shares default to **Discoverable** visibility for users who have been granted access to the share. (Learn more about ["Discoverable shares"](../sharing-overview/#discoverable-shares)). ::: ### Consuming restricted shares The **data consumers** in your Organization with access to the restricted share can use the UI or SQL to **attach** the share and start querying it. 1. Select the restricted share you want to attach under "Shared with me". 2. Click "attach" and optionally name the resulting database. 3. You can query the resulting database. Run the `ATTACH` command to attach the share as a queryable database. This is a zero-cost metadata-only operation.
```sql ATTACH 'md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa'; -- Creates a zero-copy clone database called birds ``` Learn more about [ATTACH](/sql-reference/motherduck-sql-reference/attach.md). ### Modifying share access **Data providers** _(share owners)_ can modify which users within your Organization have access to the share. 1. Find the target share in the "Shares I've created" section of the Object Explorer, and choose the 'Alter' option from the context menu. 2. From here, you can add and remove users with access to the share. 3. You may also alter the share to use a different **access** scope. Learn more about [share access scopes](../sharing-overview/#organization-shares). ```sql GRANT READ ON SHARE birds TO duck3; -- Gives the user with username 'duck3' access to the share 'birds' REVOKE READ ON SHARE birds FROM penguin; -- Revokes access to the share 'birds' from the user with username 'penguin' ``` For more details on configuring access controls for restricted shares, see the [`GRANT READ ON SHARE` reference page](/sql-reference/motherduck-sql-reference/grant-access/). ## Creating hidden shares **Overview** 1. **Data provider** creates the share URL and passes this URL to the **data consumer**. 2. **Data consumer** **attaches** the share. 3. **Data provider** periodically updates the share to push new data to **data consumers**. To share a database, first create a Hidden share. No actual data is copied and no additional costs are incurred in this process. Click on the "trident" next to the database you'd like to share. Select "share". 1. Optionally name the share. 2. To share the data with MotherDuck users inside or outside of your Organization, choose the "Anyone with the share link" option.
This will enable anyone with the share link in the same cloud region to attach and query the share, including users outside your Organization. 3. Create the share. 4. Copy the resulting **ATTACH** command to your clipboard and send it to your **data consumers**. ```sql use birds; CREATE SHARE birds FROM birds (ACCESS UNRESTRICTED, VISIBILITY HIDDEN); -- This query creates a Hidden share accessible by anyone with the share link in the same cloud region, including users outside your Organization -- Returns the share URL: md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa ``` Save the returned share URL and pass it to **data consumers**. ### Consuming hidden shares The **data consumer** can use SQL to attach the share and start querying it. Run the `ATTACH` command to attach the share as a queryable database. This is a zero-cost metadata-only operation. ```sql ATTACH 'md:_share/birds/e9ads7-dfr32-41b4-a230-bsadgfdg32tfa'; -- Creates a zero-copy clone database called birds ``` Learn more about [ATTACH](/sql-reference/motherduck-sql-reference/attach.md). ## Updating shared data If the **data provider** chose automatic updates when creating the share, the share will be updated periodically. If the share was created with `MANUAL` updates, the **data provider** needs to manually update the share. ```sql UPDATE SHARE birds; ``` Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md) and [data replication timing and checkpoints](./updating-shares.md). --- Source: https://motherduck.com/docs/key-tasks/sharing-data/sharing-within-org --- sidebar_position: 2 title: Sharing data with your organization description: Share databases with all members of your MotherDuck organization. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; # Sharing data with your organization MotherDuck makes it easy for you to share data with all members of your Organization and make that data discoverable and queryable.
This is a common use case for small, highly collaborative data teams. 1. **Data provider** creates an **Organization** scoped, **Discoverable** share. 2. **Data consumers** find the share and **attach** it. 3. **Data provider** periodically updates the share to push new data to **data consumers**. :::note Shares are **region-scoped** based on your Organization's cloud region. Each MotherDuck Organization is scoped to a single cloud region that must be chosen at Org creation when signing up. MotherDuck is available on AWS in three regions: - **US East (N. Virginia):** `us-east-1` - **US West (Oregon):** `us-west-2` - **Europe (Frankfurt):** `eu-central-1` ::: ## 1. create an organization-scoped, discoverable share To share a database with your Organization, create a share. No actual data is copied and no additional costs are incurred in this process. ![trident](./img/ui-share_new.png) Click on the "trident" next to the database you'd like to share. Select "share". Then: 1. Optionally, choose a share name. Default will be the database name. 2. Choose whether the share should only be accessible by all users in your Organization, specified users in your Organization, or any MotherDuck user in the same cloud region who has access to the share link. 3. Choose whether the share should be automatically updated or not; the current default is `MANUAL`. ```sql use birds; CREATE SHARE; -- Shorthand syntax. Share name is optional. By default, shares are Organization-scoped and Discoverable. CREATE SHARE birds FROM birds (ACCESS ORGANIZATION, VISIBILITY DISCOVERABLE); -- This query is equivalent to the previous one, with the options written out explicitly. ``` ## 2. find and consume shares The **data consumer** in your Organization can use the UI to find the share, attach it, and start querying it! 1. Select the share you want under "Shared with me". 2. Click "attach" and optionally name the resulting database. 3. You can query the resulting database.
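
If the data provider passes along the share URL, the same step can also be done in SQL with `ATTACH`, in the way the other sharing pages show for share URLs. The following is a minimal sketch, assuming a share named `birds`; the `<share-id>` segment and the table name `main.sightings` are hypothetical placeholders, not values from this documentation.

```sql
-- Attach the Organization share under the alias "birds".
-- Replace <share-id> with the ID from the real share URL (placeholder here).
ATTACH 'md:_share/birds/<share-id>' AS birds;

-- Query a table in the attached, read-only database.
-- The table name is illustrative only.
SELECT count(*) FROM birds.main.sightings;
```

Using `AS` keeps the alias explicit, which helps when a database with the default name is already attached.
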
:::note The ability to list and discover Discoverable shares in SQL is coming shortly. ::: ## 3. update shared data If during creation of the share, the **data provider** chose to have the share updated automatically, the share will be updated periodically. If the share was created with `MANUAL` updates, the **data provider** needs to manually update the share. ```sql UPDATE SHARE birds; ``` Learn more about [UPDATE SHARE](/sql-reference/motherduck-sql-reference/update-share.md) and [data replication timing and checkpoints](./updating-shares.md). --- Source: https://motherduck.com/docs/key-tasks/sharing-data/updating-shares --- sidebar_position: 5 title: Updating shares description: Learn about data replication timing, checkpoints, and how to ensure your latest data is available in shares and read-only Ducklings. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; ## Data replication speed **Use this when you need to:** Understand how quickly data changes become available in shares and read-only Ducklings. **Prerequisites:** You should have shares or read-only Ducklings configured in your MotherDuck environment. **You'll know you're done when:** You understand the timing characteristics and can optimize data availability when needed. MotherDuck automatically replicates data to shares and read-only Ducklings with the following timing characteristics: ### Auto-updated shares For shares configured with auto-update enabled, MotherDuck polls for new data **once per minute**. When new data is detected, it becomes available in the share after the next checkpoint occurs. ### Checkpoints and data availability Data is written to shares whenever there is a checkpoint. Checkpoints occur automatically based on your database's configuration. Starting with DuckDB 1.5, checkpoints run in the background, so reads, writes, and deletes can continue while a checkpoint is in progress. 
For read scaling Ducklings, you can force a snapshot using [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot/) to make data available immediately: ```sql CREATE SNAPSHOT OF <database name>; ``` **Expected result:** A new read-only snapshot is created, ensuring read scaling connections can access the most up-to-date data. **Use case:** Run this when you need to ensure the latest data is available to read scaling Ducklings immediately. **Important:** This command will wait for any ongoing write queries to complete and prevent new ones from starting during snapshot creation. 1. Navigate to your database in the MotherDuck interface. 2. Look for snapshot options in the database management section. 3. Trigger a snapshot to ensure your latest data is available in read scaling Ducklings immediately. **Expected result:** Your latest data becomes immediately available in all read scaling Ducklings. ### Read-only Ducklings Data replication to read-only Ducklings within the same account follows the same timing as shares: data becomes available after checkpoints, with polling occurring once per minute for auto-updated configurations. ## Manual share updates **Use this when you need to:** Publish recent changes from your database to make them available in the share. **Prerequisites:** You must be the owner of the share and have made changes to the source database since the last share update. **You'll know you're done when:** The share reflects the latest version of your database and the last updated timestamp changes. Sharing a database creates a point-in-time snapshot of the database at the time it is shared. To publish changes, you need to explicitly run `UPDATE SHARE <share name>`. When updating a `SHARE` with the same database, the URL does not change.
```sql UPDATE SHARE <share name>; ``` **Example:** Database 'mydb' was previously shared by creating a share 'myshare', and the database 'mydb' has been updated since. The owner wants colleagues to receive the latest version: ```sql -- 'myshare' was previously created on the database 'mydb' UPDATE SHARE "myshare"; ``` **Expected result:** The share is updated with the latest data from the source database. **Recovery:** If you have lost your database share URL, you can use the `LIST SHARES` command to list all your shares or `DESCRIBE SHARE <share name>` to get specific details about a given share name. ## Refreshing shared data (consumer side) **Use this when you need to:** Get the most up-to-date data from a share or read scaling Duckling after the producer has made updates. **Prerequisites:** You must have attached a share or be connected to a read scaling Duckling. **You'll know you're done when:** Your local copy reflects the latest data from the producer. By default, shares and read scaling Ducklings _automatically sync every minute_. However, if you need the most up-to-date data sooner, you can manually refresh after the producer executes their update command. ### Complete workflow for maximum freshness For the freshest possible data, follow this two-step process: 1. **Producer side:** Either wait for normal checkpoints or force an update 2.
**Consumer side:** Run `REFRESH DATABASE` to pull the latest changes **Producer (writer connection):** ```sql -- Make your changes INSERT INTO my_db.my_table VALUES (...); -- Option 1: Wait for normal checkpoint (automatic) -- Data becomes available after the next checkpoint occurs -- Option 2: Force a snapshot to make data immediately available CREATE SNAPSHOT OF my_db; ``` **Consumer (read scaling connection):** ```sql -- Refresh to get the latest snapshot REFRESH DATABASES; -- Refreshes all connected databases and shares -- OR REFRESH DATABASE my_db; -- Refresh just one specific database ``` **Producer (share owner):** ```sql -- Make your changes INSERT INTO my_db.my_table VALUES (...); -- Option 1: Wait for normal checkpoint (automatic) -- Data becomes available after the next checkpoint occurs -- Option 2: Force a share update to make data immediately available UPDATE SHARE "myshare"; ``` **Consumer (share recipient):** ```sql -- Refresh to get the latest share data REFRESH DATABASES; -- Refreshes all connected databases and shares -- OR REFRESH DATABASE my_share; -- Refresh just one specific share ``` ### Understanding the refresh output When you run `REFRESH DATABASES`, you'll see output showing which databases were refreshed: ```sql REFRESH DATABASES; ┌─────────┬───────────────────┬──────────────────────────┬───────────┐ │ name │ type │ fully_qualified_name │ refreshed │ │ varchar │ varchar │ varchar │ boolean │ ├─────────┼───────────────────┼──────────────────────────┼───────────┤ │ my_db │ motherduck │ md:my_db │ false │ │ myshare │ motherduck share │ md:_share/myshare/uuid │ true │ └─────────┴───────────────────┴──────────────────────────┴───────────┘ ``` The `refreshed` column shows `true` for databases that were successfully refreshed with new data. Learn more about [`REFRESH DATABASE`](/sql-reference/motherduck-sql-reference/refresh-database.md). 
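
If a share keeps reporting `refreshed` as `false` when new data was expected, the share owner may want to check the producer side before the consumer retries. The following is a minimal sketch that only uses the `LIST SHARES`, `DESCRIBE SHARE`, and `UPDATE SHARE` commands named on this page, with the share name `myshare` from the earlier example. The columns these commands return are not shown here, since they are not described in this page.

```sql
-- Producer side: list the shares owned by the current user.
LIST SHARES;

-- Inspect one share, for example to recover its URL.
DESCRIBE SHARE "myshare";

-- Publish the current state of the source database to the share.
UPDATE SHARE "myshare";

-- Consumer side, afterwards: pull the newly published data.
REFRESH DATABASES;
```
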
--- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/aggregate-functions --- sidebar_position: 6 title: Aggregate functions description: DuckDB aggregate functions like SUM, COUNT, AVG, and statistical functions. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/configurations --- sidebar_position: 8 title: Configurations description: DuckDB configuration options for memory, threads, and query behavior. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/constraints --- sidebar_position: 9 title: Constraints description: Table constraints in DuckDB including PRIMARY KEY, UNIQUE, NOT NULL, and CHECK. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/data-types --- sidebar_position: 3 title: Data types description: Supported data types in DuckDB including numeric, string, date/time, and complex types. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; ## VARIANT type Starting with DuckDB 1.5, the `VARIANT` type provides a high-performance way to store and query semi-structured data. It is a strongly typed alternative to storing data as plain `JSON` or `VARCHAR` columns that delivers significantly faster reads and writes. ### How VARIANT works When data is stored as `VARIANT`, DuckDB automatically "shreds" (decomposes) the semi-structured values into their underlying typed columns in Parquet files. 
This means a column of mixed JSON objects is stored as efficiently typed columnar data rather than opaque strings, enabling: - **Columnar compression** on the underlying typed values - **Predicate pushdown** using row group statistics - **Faster reads** by scanning only the fields you reference in your query ### Using VARIANT ```sql -- Create a table with a VARIANT column CREATE TABLE events ( id INTEGER, payload VARIANT ); -- Insert JSON data -- it is automatically converted to VARIANT INSERT INTO events VALUES (1, '{"user": "alice", "action": "click", "ts": "2026-03-19T10:00:00Z"}'::VARIANT), (2, '{"user": "bob", "action": "purchase", "amount": 42.50}'::VARIANT); -- Query individual fields SELECT id, payload->>'user' AS user_name, payload->>'action' AS action FROM events; ``` ### VARIANT in DuckLake [DuckLake](/integrations/file-formats/ducklake/) tables support `VARIANT` columns starting with DuckLake 0.4. This combination is particularly effective for workloads with high-volume semi-structured data because DuckLake's Parquet-backed storage takes full advantage of VARIANT shredding. For complete details on the VARIANT type, see [VARIANT](https://duckdb.org/docs/stable/sql/data_types/variant) in the DuckDB documentation. --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-sql-reference --- title: DuckDB SQL description: DuckDB SQL Reference --- DuckDB provides a rich SQL dialect with powerful analytical capabilities. MotherDuck uses DuckDB's SQL engine, so all standard DuckDB syntax works seamlessly. This reference covers core SQL statements, data types, functions, window functions, and query syntax. For MotherDuck-specific extensions like cloud database management and sharing, see the [MotherDuck SQL](/sql-reference/motherduck-sql-reference) reference. :::tip DuckDB maintains comprehensive documentation at [duckdb.org/docs](https://duckdb.org/docs/stable/). The reference here focuses on the most commonly used features. 
::: ## Included pages - [DuckDB statements](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements): DuckDB SQL statements reference - [Query syntax](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/query-syntax): DuckDB query syntax including SELECT, FROM, WHERE, GROUP BY, ORDER BY, and other clauses. - [Data types](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/data-types): Supported data types in DuckDB including numeric, string, date/time, and complex types. - [Enum data type](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/enum): DuckDB enum data type for defining columns with a fixed set of string values. - [Expressions](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/expressions): DuckDB expression syntax including operators, CASE, subqueries, and type casts. - [Functions](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/functions): Built-in scalar functions in DuckDB for string manipulation, math, dates, and more. - [Aggregate functions](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/aggregate-functions): DuckDB aggregate functions like SUM, COUNT, AVG, and statistical functions. - [Window functions](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/window-functions): DuckDB window functions for ranking, running totals, and analytical queries. - [Configurations](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/configurations): DuckDB configuration options for memory, threads, and query behavior. - [Constraints](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/constraints): Table constraints in DuckDB including PRIMARY KEY, UNIQUE, NOT NULL, and CHECK. - [Information schema](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/information-schema): DuckDB information_schema views for querying database metadata. 
- [Metadata functions](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/metadata-functions): DuckDB functions for querying table and column metadata programmatically. - [PRAGMA statements](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/pragma-statements): PRAGMA statements for DuckDB configuration and metadata queries. - [SAMPLE](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/sample): SAMPLE clause for retrieving random subsets of query results in DuckDB. --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/alter-table --- title: ALTER TABLE description: ALTER TABLE statement for modifying table structure in DuckDB. --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/attach-detach --- title: ATTACH/DETACH description: "ATTACH and DETACH statements for connecting to external databases in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/call --- title: CALL description: "CALL statement for executing table functions in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/comment-on --- title: COMMENT ON description: "COMMENT ON statement for adding descriptions to database objects in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/copy --- title: COPY description: "COPY statement for importing and exporting data in DuckDB." 
--- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-index --- title: CREATE INDEX description: "Use CREATE INDEX to speed up point lookups and highly selective queries in MotherDuck." --- # CREATE INDEX The `CREATE INDEX` statement creates an [Adaptive Radix Tree (ART)](https://duckdb.org/docs/stable/sql/indexes) index on one or more columns. In MotherDuck, indexes speed up point lookups, range queries, and some highly selective joins. ## Syntax ```sql CREATE [UNIQUE] INDEX [IF NOT EXISTS] <index_name> ON <table_name> (<column_name> [, ...]); ``` ## When to use indexes Indexes work best for very selective queries that return a small fraction of the table's rows. For example: - **Point lookups** -- finding a single row by ID or key - **Highly selective range queries** -- filtering on a narrow range that matches less than ~0.1% of the data - **Selective joins** -- joining on indexed columns with high selectivity For broader analytical queries that scan large portions of a table, MotherDuck's columnar storage and zone maps already provide strong performance without indexes. ## Example ```sql -- Create a table and an index CREATE TABLE users (id INTEGER, name VARCHAR); INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'); CREATE INDEX idx_user_id ON users(id); -- Point lookup uses the index SELECT * FROM users WHERE id = 1; ``` You can verify that the index is being used with the [EXPLAIN](/sql-reference/motherduck-sql-reference/explain/) statement: ```sql EXPLAIN SELECT * FROM users WHERE id = 1; -- Shows INDEX_SCAN when the index is used ``` ## Constraints Indexes are also created automatically when you add a `UNIQUE` or `PRIMARY KEY` constraint. This lets you use features like [`INSERT ... ON CONFLICT`](https://duckdb.org/docs/stable/sql/statements/insert#on-conflict-clause) for upserts and deduplication.
```sql CREATE TABLE events ( event_id INTEGER PRIMARY KEY, event_name VARCHAR ); -- Upsert: insert or update on conflict INSERT INTO events VALUES (1, 'signup') ON CONFLICT (event_id) DO UPDATE SET event_name = excluded.event_name; ``` ## Trade-offs Indexes slow down `INSERT`, `UPDATE`, and `DELETE` operations because the index must be updated alongside the table data. If your workload is write-heavy and doesn't benefit from selective lookups, skip the index. ART indexes also need to fit in memory during creation, so they may not be practical for very large columns. For more details on DuckDB's index implementation, see the [DuckDB Indexes documentation](https://duckdb.org/docs/stable/sql/indexes). --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-macro --- title: CREATE MACRO description: "CREATE MACRO statement for defining reusable SQL expressions in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-table --- title: CREATE TABLE description: "CREATE TABLE statement for defining new tables in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/delete --- title: DELETE description: "DELETE statement for removing rows from DuckDB tables." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/drop --- title: DROP description: "DROP statement for removing tables, views, and other objects in DuckDB." 
--- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/duckdb-statements --- title: DuckDB statements description: DuckDB SQL statements reference --- Reference documentation for DuckDB SQL statements. These statements work in both local DuckDB and MotherDuck cloud environments. **Common operations:** - **Data manipulation**: `SELECT`, `INSERT`, `UPDATE`, `DELETE` - **Schema management**: `CREATE TABLE`, `ALTER TABLE`, `DROP` - **Data loading**: `COPY`, `EXPORT` - **Advanced queries**: `PIVOT`, `UNPIVOT` ## Included pages - [ALTER TABLE](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/alter-table): ALTER TABLE statement for modifying table structure in DuckDB. - [ATTACH/DETACH](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/attach-detach): ATTACH and DETACH statements for connecting to external databases in DuckDB. - [CALL](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/call): CALL statement for executing table functions in DuckDB. - [COMMENT ON](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/comment-on): COMMENT ON statement for adding descriptions to database objects in DuckDB. - [COPY](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/copy): COPY statement for importing and exporting data in DuckDB. - [CREATE INDEX](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-index): Use CREATE INDEX to speed up point lookups and highly selective queries in MotherDuck. - [CREATE MACRO](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-macro): CREATE MACRO statement for defining reusable SQL expressions in DuckDB. 
- [CREATE TABLE](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/create-table): CREATE TABLE statement for defining new tables in DuckDB. - [DELETE](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/delete): DELETE statement for removing rows from DuckDB tables. - [DROP](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/drop): DROP statement for removing tables, views, and other objects in DuckDB. - [EXPORT](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/export): EXPORT statement for exporting database contents to files in DuckDB. - [INSERT](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/insert): INSERT statement for adding rows to tables in DuckDB. - [PIVOT](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/pivot): PIVOT statement for transforming rows to columns in DuckDB. - [SELECT](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/select): SELECT statement syntax and options in DuckDB. - [SET/RESET](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/set-reset): SET and RESET statements for configuring DuckDB session options. - [UNPIVOT](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/unpivot): UNPIVOT statement for transforming columns to rows in DuckDB. - [UPDATE](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/update): UPDATE statement for modifying existing rows in DuckDB tables. - [USE](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/use): USE statement for changing the default database or schema in DuckDB. - [VACUUM](https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/vacuum): VACUUM statement for optimizing storage in DuckDB. 
--- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/export --- title: EXPORT description: "EXPORT statement for exporting database contents to files in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/insert --- title: INSERT description: "INSERT statement for adding rows to tables in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/pivot --- title: PIVOT description: "PIVOT statement for transforming rows to columns in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/select --- title: SELECT description: "SELECT statement syntax and options in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/set-reset --- title: SET/RESET description: "SET and RESET statements for configuring DuckDB session options." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/unpivot --- title: UNPIVOT description: "UNPIVOT statement for transforming columns to rows in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/update --- title: UPDATE description: "UPDATE statement for modifying existing rows in DuckDB tables." 
--- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/use --- title: USE description: "USE statement for changing the default database or schema in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/duckdb-statements/vacuum --- title: VACUUM description: "VACUUM statement for optimizing storage in DuckDB." --- import PartialExample from '../_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/enum --- sidebar_position: 3 title: Enum data type description: DuckDB enum data type for defining columns with a fixed set of string values. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/expressions --- sidebar_position: 3 title: Expressions description: DuckDB expression syntax including operators, CASE, subqueries, and type casts. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/functions --- sidebar_position: 5 title: Functions description: Built-in scalar functions in DuckDB for string manipulation, math, dates, and more. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/information-schema --- sidebar_position: 10 title: Information schema description: DuckDB information_schema views for querying database metadata. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; If you want to query information about your MotherDuck entities, take a look at [md_information_schema](/sql-reference/motherduck-sql-reference/md_information_schema/introduction). 
--- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/metadata-functions --- sidebar_position: 11 title: Metadata functions description: "DuckDB functions for querying table and column metadata programmatically." --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/pragma-statements --- sidebar_position: 12 title: PRAGMA statements description: "PRAGMA statements for DuckDB configuration and metadata queries." --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/query-syntax --- sidebar_position: 2 title: Query syntax description: DuckDB query syntax including SELECT, FROM, WHERE, GROUP BY, ORDER BY, and other clauses. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/sample --- sidebar_position: 13 title: SAMPLE description: "SAMPLE clause for retrieving random subsets of query results in DuckDB." --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/duckdb-sql-reference/window-functions --- sidebar_position: 7 title: Window functions description: DuckDB window functions for ranking, running totals, and analytical queries. --- import PartialExample from './_include-thing-for-parity-with-duckdb.mdx'; --- Source: https://motherduck.com/docs/sql-reference/mcp/ask-docs-question --- sidebar_position: 7 title: ask_docs_question description: Ask questions about DuckDB or MotherDuck documentation --- # ask_docs_question Ask a question about DuckDB or MotherDuck and get answers from official documentation. ## Description The `ask_docs_question` tool queries the official DuckDB and MotherDuck documentation to answer questions about SQL syntax, features, best practices, and more. 
This is useful when you need help with DuckDB-specific SQL syntax or MotherDuck features. The tool uses MotherDuck's documentation assistant to provide accurate answers based on official documentation sources. ## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `question` | string | Yes | Question about DuckDB or MotherDuck | ## Output Schema ```json { "success": boolean, "question": string, // Original question (on success) "answer": string, // Documentation-based answer (on success) "sources": string, // Source references (optional, on success) "error": string // Error message (on failure) } ``` ## Example Usage **Ask about DuckDB syntax:** ```text How do I use window functions in DuckDB? ``` The AI assistant will call the tool with: ```json { "question": "How do I use window functions in DuckDB?" } ``` **Ask about MotherDuck features:** ```text How do I create a share in MotherDuck? ``` ```json { "question": "How do I create a share in MotherDuck?" } ``` **Ask about data types:** ```text What's the difference between LIST and ARRAY types in DuckDB? ``` ```json { "question": "What's the difference between LIST and ARRAY types in DuckDB?" } ``` ## Success Response Example ```json { "success": true, "question": "How do I use window functions in DuckDB?", "answer": "Window functions in DuckDB allow you to perform calculations across a set of rows related to the current row. 
Here's how to use them:\n\n**Basic syntax:**\n```sql\nSELECT \n column,\n SUM(value) OVER (PARTITION BY category ORDER BY date) as running_total\nFROM table_name;\n```\n\n**Common window functions:**\n- `ROW_NUMBER()` - assigns unique row numbers\n- `RANK()` and `DENSE_RANK()` - ranking with/without gaps\n- `LAG()` and `LEAD()` - access previous/next rows\n- `FIRST_VALUE()` and `LAST_VALUE()` - first/last value in window\n\n**Using QUALIFY:**\nDuckDB supports the QUALIFY clause to filter window function results:\n```sql\nSELECT *\nFROM sales\nQUALIFY ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) = 1;\n```\n\nThis returns only the top sale per region.", "sources": "https://duckdb.org/docs/sql/window_functions" } ``` ## Tips for Good Questions - Be specific about what you want to know - Include context about what you're trying to accomplish - Mention specific functions or features if known --- Source: https://motherduck.com/docs/sql-reference/mcp/delete-dive --- sidebar_position: 15 title: delete_dive description: Permanently delete a Dive by ID feature_stage: preview --- Delete a [Dive](/docs/key-tasks/ai-and-motherduck/dives) by ID. This action is permanent and cannot be undone. ## Description The `delete_dive` tool permanently removes a Dive from your MotherDuck workspace. Once deleted, the Dive cannot be recovered. Use [`list_dives`](../list-dives) to find the Dive ID before deleting. 
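Because deletion is permanent, it can help to resolve the Dive ID unambiguously before calling `delete_dive`. The following Python sketch assumes a `list_dives` response shaped like the one documented on the `list_dives` page (a `dives` array with `id` and `title` fields). The helper name `find_dive_id` is hypothetical and is not part of the MCP server.

```python
def find_dive_id(dives: list[dict], title: str) -> str:
    """Return the ID of the single Dive whose title matches, else raise."""
    wanted = title.lower()
    matches = [d for d in dives if d["title"].lower() == wanted]
    if len(matches) != 1:
        # Refuse to guess: deleting the wrong Dive cannot be undone.
        raise LookupError(
            f"expected exactly one Dive titled {title!r}, found {len(matches)}"
        )
    return matches[0]["id"]


# Sample data mirroring the shape of a list_dives response (values are illustrative).
listing = {
    "success": True,
    "dives": [
        {"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "title": "Monthly Revenue Trends"},
        {"id": "b2c3d4e5-f6a7-8901-bcde-f12345678901", "title": "Customer Signups by Region"},
    ],
}

# Arguments for the delete_dive tool call.
delete_args = {"id": find_dive_id(listing["dives"], "monthly revenue trends")}
```

Raising on zero or multiple matches keeps the confirmation step with the user explicit instead of silently picking a Dive.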
## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | string | Yes | The unique identifier (UUID) of the Dive to delete | ## Output Schema ```json { "success": boolean, "message": string, // Status message (on success) "error": string // Error message (on failure) } ``` ## Example Usage **Delete a Dive:** ```text Delete the old revenue Dive I no longer need ``` The AI assistant will call `list_dives` to find the Dive, confirm with the user, then call `delete_dive`: ```json { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" } ``` ## Success Response Example ```json { "success": true, "message": "Dive 'a1b2c3d4-e5f6-7890-abcd-ef1234567890' deleted successfully." } ``` ## Error Response Example ```json { "success": false, "error": "Dive with id 'invalid-uuid' not found" } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/get-dive-guide --- sidebar_position: 8 title: get_dive_guide description: Load instructions for creating MotherDuck Dives feature_stage: preview --- Load instructions for creating MotherDuck [Dives](/docs/key-tasks/ai-and-motherduck/dives). Call this before creating or saving dives. ## Description The `get_dive_guide` tool returns comprehensive instructions on how to write MotherDuck Dives—interactive React data apps that query live MotherDuck data. It provides guidance on the [`useSQLQuery` hook](/sql-reference/motherduck-sql-reference/ai-functions/dives/use-sql-query), data type conversions, available libraries, and design system. The guide content is tailored to the AI client you are using. Call this tool before using [`save_dive`](../save-dive) or [`update_dive`](../update-dive) to ensure the generated code follows the correct format. :::note Dives are available on all MotherDuck plans at no additional charge. 
::: ## Input parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `client` | string | Yes | The AI client being used: `"claude"`, `"chatgpt"`, `"claude_code"`, or `"other"` | ### Client options | Client | Use case | |--------|----------| | `claude` | Claude (Anthropic) through Claude.ai or API | | `chatgpt` | ChatGPT (OpenAI) | | `claude_code` | Claude Code (terminal-based agent) | | `other` | Any other AI client or custom integration | ## Output schema ```json { "success": boolean, "guide": string, // Dive guide content (on success) or upgrade message "client": string, // The client that was used (on success) "reason": string, // "upgrade_required" (when plan doesn't support Dives) "plan": string, // Current plan name (when upgrade required) "error": string // Error message (on failure) } ``` On success, `guide` contains the client-specific instructions for building Dives. ## Example usage **Build a new Dive from Claude:** ```text Create a Dive showing monthly revenue trends for my sales database ``` The AI assistant will first call `get_dive_guide` to load the instructions: ```json { "client": "claude" } ``` **Build a Dive from ChatGPT:** ```text Create a Dive with a bar chart of customer signups by region ``` ```json { "client": "chatgpt" } ``` **Build a Dive from Claude Code:** ```text Create a Dive showing daily active users over the past 90 days ``` ```json { "client": "claude_code" } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/list-columns --- sidebar_position: 4 title: list_columns description: List columns of a table or view with types and comments --- # list_columns List all columns of a table or view with their types and comments. ## Description The `list_columns` tool returns detailed column information for a specified table or view, including data types, nullability, and any comments. This is useful for understanding table structure before writing queries. 
## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `table` | string | Yes | Table or view name | | `database` | string | Yes | Database name | | `schema` | string | No | Schema name (defaults to `main`) | ## Output Schema ```json { "success": boolean, "database": string, // Database name "schema": string, // Schema name "table": string, // Table or view name "objectType": "table" | "view", // Whether it's a table or view "columns": [ // List of columns (on success) { "name": string, // Column name "type": string, // Data type "nullable": boolean, // Whether nulls are allowed "comment": string | null // Column comment if set } ], "columnCount": number, // Number of columns "error": string // Error message (on failure) } ``` ## Example Usage **Get columns for a table:** ```text What columns does the customers table have in my_database? ``` The AI assistant will call the tool with: ```json { "table": "customers", "database": "my_database" } ``` **Get columns in a specific schema:** ```text Show me the schema of staging.raw_events in analytics_db ``` ```json { "table": "raw_events", "database": "analytics_db", "schema": "staging" } ``` ## Success Response Example ```json { "success": true, "database": "my_database", "schema": "main", "table": "customers", "objectType": "table", "columns": [ { "name": "id", "type": "INTEGER", "nullable": false, "comment": "Primary key" }, { "name": "email", "type": "VARCHAR", "nullable": false, "comment": "Customer email address" }, { "name": "name", "type": "VARCHAR", "nullable": true, "comment": "Full name" }, { "name": "created_at", "type": "TIMESTAMP", "nullable": false, "comment": null }, { "name": "metadata", "type": "JSON", "nullable": true, "comment": "Additional customer attributes" } ], "columnCount": 5 } ``` ## Error Response Example ```json { "success": false, "error": "Catalog Error: Table \"nonexistent_table\" does not exist" } ``` --- Source: 
https://motherduck.com/docs/sql-reference/mcp/list-databases --- sidebar_position: 1 title: list_databases description: List all databases in your MotherDuck account --- # list_databases List all databases in your MotherDuck account with their names and types. ## Description The `list_databases` tool returns all databases accessible to your MotherDuck account, including both owned databases and attached shared databases. This is useful for discovering what data is available before running queries. ## Input Parameters This tool takes no input parameters. ## Output Schema ```json { "success": boolean, "databases": [ // List of databases (on success) { "alias": string, // Database name/alias "is_attached": boolean, // Whether the database is currently attached "type": string // Database type (e.g., "motherduck", "memory") } ], "error": string // Error message (on failure) } ``` ## Example Usage **List available databases:** ```text What databases do I have access to? ``` The AI assistant will call the tool with no parameters. ## Success Response Example ```json { "success": true, "databases": [ { "alias": "my_db", "is_attached": true, "type": "motherduck" }, { "alias": "analytics", "is_attached": true, "type": "motherduck" }, { "alias": "shared_sales_data", "is_attached": true, "type": "motherduck" } ] } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/list-dives --- sidebar_position: 9 title: list_dives description: List all Dives in your MotherDuck workspace feature_stage: preview --- List all owned [Dives](/docs/key-tasks/ai-and-motherduck/dives) in MotherDuck. Dives are interactive React data apps that query live data. Returns metadata including `current_version` (the latest version number, 1-indexed) for each Dive. Use [`read_dive`](../read-dive) with the optional `version` parameter to retrieve a specific historical version. Optionally filter by keywords to search in title and description. 
## Description The `list_dives` tool returns a list of all Dives in your MotherDuck workspace. Each Dive includes its ID, title, description, owner, version history, and timestamps. Use this to discover existing Dives before reading, updating, or deleting them. ## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `keywords` | string | No | Keywords to filter dives by title or description (case-insensitive, all words must match) | ## Output Schema ```json { "success": boolean, "dives": [ // Array of dives (on success) { "id": string, // Unique identifier (UUID) "title": string, // Dive title "description": string, // Dive description "owner_name": string, // Name of the Dive owner "current_version": number, // Latest version number (1-indexed) "created_at": string, // ISO 8601 creation timestamp "updated_at": string // ISO 8601 last update timestamp } ], "count": number, // Number of dives returned "totalCount": number, // Total number of matching dives "truncated": boolean, // Whether the results were truncated "message": string, // Truncation message (when truncated) "error": string // Error message (on failure) } ``` ## Example Usage **List all Dives:** ```text What Dives do I have in my workspace? ``` The AI assistant will call the tool with no parameters. 
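The `keywords` semantics described in the input parameters (case-insensitive, all words must match) can be sketched client-side in Python. This is only an illustration of the stated rule, not the server's implementation: substring matching and searching title and description as one combined text are assumptions, and `matches_keywords` is a hypothetical helper.

```python
def matches_keywords(dive: dict, keywords: str) -> bool:
    """Return True when every keyword appears in the title or description."""
    # Assumption: title and description are searched together, by substring.
    haystack = f"{dive.get('title', '')} {dive.get('description', '')}".lower()
    return all(word in haystack for word in keywords.lower().split())


# Illustrative entries shaped like items in the `dives` array.
dives = [
    {"title": "Monthly Revenue Trends", "description": "Line chart showing revenue by month"},
    {"title": "Customer Signups by Region", "description": "Bar chart of customer signups"},
]

revenue_dives = [d["title"] for d in dives if matches_keywords(d, "revenue month")]
```

An empty `keywords` string matches every Dive in this sketch, which lines up with calling the tool with no parameters.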
**Filter Dives by keywords:**

```text
Show me my revenue-related Dives
```

The AI assistant will call the tool with keywords:

```json
{
  "keywords": "revenue"
}
```

**Find a specific Dive to update:**

```text
Show me my existing Dives so I can update the revenue dashboard
```

## Success Response Example

```json
{
  "success": true,
  "dives": [
    {
      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "title": "Monthly Revenue Trends",
      "description": "Line chart showing revenue by month with category breakdown",
      "owner_name": "alice",
      "current_version": 3,
      "created_at": "2025-01-15T10:30:00Z",
      "updated_at": "2025-01-20T14:45:00Z"
    },
    {
      "id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
      "title": "Customer Signups by Region",
      "description": "Bar chart of customer signups grouped by region",
      "owner_name": "bob",
      "current_version": 1,
      "created_at": "2025-01-18T09:00:00Z",
      "updated_at": "2025-01-18T09:00:00Z"
    }
  ],
  "count": 2,
  "totalCount": 2
}
```

---

Source: https://motherduck.com/docs/sql-reference/mcp/list-shares

---
sidebar_position: 2
title: list_shares
description: List database shares that have been shared with you
---

# list_shares

List all database [shares](/key-tasks/sharing-data/sharing-overview) that have been shared with you.

## Description

The `list_shares` tool returns all database shares that have been shared with you by other users. Each share includes its name and URL, which can be used to attach the share as a database using the `query` tool.

To attach a share, execute: `ATTACH '<share_url>' AS my_alias;`

To detach a share, execute: `DETACH my_alias;`

## Input Parameters

This tool takes no input parameters.

## Output Schema

```json
{
  "success": boolean,
  "shares": [          // List of shares (on success)
    {
      "name": string,  // Share name
      "url": string    // Share URL for attaching
    }
  ],
  "error": string      // Error message (on failure)
}
```

## Example Usage

**List available shares:**

```text
What shares have been shared with me?
```

The AI assistant will call the tool with no parameters.
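Turning an entry from the `shares` array into the attach and detach statements described above can be sketched in Python as follows. The helper names are hypothetical, and restricting the alias to a simple identifier is a conservative assumption for this sketch, not a documented MotherDuck rule.

```python
def attach_share_sql(share_url: str, alias: str) -> str:
    """Build an ATTACH statement for a share URL returned by list_shares."""
    if not alias.isidentifier():
        # Assumption: only plain identifiers are accepted as aliases here.
        raise ValueError(f"alias must be a simple identifier, got {alias!r}")
    escaped_url = share_url.replace("'", "''")  # escape single quotes for SQL
    return f"ATTACH '{escaped_url}' AS {alias}"


def detach_share_sql(alias: str) -> str:
    """Build the matching DETACH statement."""
    if not alias.isidentifier():
        raise ValueError(f"alias must be a simple identifier, got {alias!r}")
    return f"DETACH {alias}"


# Sample response in the documented list_shares shape (values are illustrative).
response = {
    "success": True,
    "shares": [{"name": "sales_data", "url": "md:_share/org123/sales_data"}],
}

statements = [attach_share_sql(s["url"], s["name"]) for s in response["shares"]]
```

Each resulting string can then be passed as the `sql` argument of the tool that runs the statement.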
**Attach a share after listing:** ```text Attach the sales_data share so I can query it ``` After getting the share URL from `list_shares`, the AI will use the `query` tool: ```json { "database": "my_db", "sql": "ATTACH 'md:_share/org123/sales_data' AS sales_data" } ``` ## Success Response Example ```json { "success": true, "shares": [ { "name": "sales_data", "url": "md:_share/org123/sales_data" }, { "name": "product_catalog", "url": "md:_share/org456/product_catalog" }, { "name": "analytics_benchmark", "url": "md:_share/org789/analytics_benchmark" } ] } ``` ## Empty Response Example When no shares have been shared with you: ```json { "success": true, "shares": [] } ``` ## Related - [Sharing Overview](/key-tasks/sharing-data/sharing-overview) - Learn about MotherDuck's data sharing capabilities - [Managing Shares](/key-tasks/sharing-data/managing-shares) - How to create and manage shares --- Source: https://motherduck.com/docs/sql-reference/mcp/list-tables --- sidebar_position: 3 title: list_tables description: List tables and views in a MotherDuck database --- # list_tables List all tables and views in a MotherDuck database with their comments. ## Description The `list_tables` tool returns all tables and views in a specified database, including their schema, type (table or view), and any comments that have been added. You can optionally filter by schema. 
## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `database` | string | Yes | Database name to list tables from | | `schema` | string | No | Schema name to filter by (defaults to all schemas) | ## Output Schema ```json { "success": boolean, "database": string, // Database name "schema": string, // Schema filter used ("all" if not specified) "tables": [ // List of tables and views (on success) { "schema": string, // Schema name "name": string, // Table or view name "type": "table" | "view", // Object type "comment": string | null // Table/view comment if set } ], "tableCount": number, // Number of tables "viewCount": number, // Number of views "error": string // Error message (on failure) } ``` ## Example Usage **List all tables in a database:** ```text Show me all tables in my_database ``` The AI assistant will call the tool with: ```json { "database": "my_database" } ``` **List tables in a specific schema:** ```text What tables are in the staging schema of analytics_db? 
``` ```json { "database": "analytics_db", "schema": "staging" } ``` ## Success Response Example ```json { "success": true, "database": "my_database", "schema": "all", "tables": [ { "schema": "main", "name": "customers", "type": "table", "comment": "Customer master data" }, { "schema": "main", "name": "orders", "type": "table", "comment": "Order transactions" }, { "schema": "main", "name": "monthly_sales", "type": "view", "comment": "Aggregated monthly sales view" }, { "schema": "staging", "name": "raw_events", "type": "table", "comment": null } ], "tableCount": 3, "viewCount": 1 } ``` ## Error Response Example ```json { "success": false, "error": "Catalog Error: Database \"nonexistent_db\" does not exist" } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/mcp --- sidebar_position: 0 title: MCP Server description: Connect AI assistants to MotherDuck using the remote (fully managed) or local (fully customizable) MCP server --- # MotherDuck MCP server MotherDuck offers a **remote MCP server** (fully managed, read-write) and a [**local MCP server**](#local-mcp-server) (fully customizable, self-hosted) that let AI assistants query and explore your MotherDuck databases using the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/). :::info Connection URL The remote MCP server is hosted at `https://api.motherduck.com/mcp`. Most clients connect through OAuth automatically; clients that need a manual configuration use this URL with an HTTP transport. ::: For step-by-step setup instructions for all supported clients (Claude, ChatGPT, Cursor, Claude Code, and others), see [Connect to the MotherDuck MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/). 
## Server capabilities

With the remote MCP server, your agent can:

- Execute read-only and read-write SQL against your databases
- Explore database schemas, tables, and columns
- Attach and detach [shares](/key-tasks/sharing-data/sharing-overview)
- Ask questions about DuckDB and MotherDuck documentation
- Create and manage [Dives](/key-tasks/ai-and-motherduck/dives) (interactive data visualizations)
- Render Dives inline in supported clients with the Dive Viewer MCP App, so you iterate against live data instead of a sample-data preview

For clients that [support MCP instructions](https://modelcontextprotocol.io/clients#feature-support-matrix), the remote MCP server provides detailed [query guidelines](https://app.motherduck.com/assets/docs/mcp_server_instructions.md) to help AI assistants write effective DuckDB SQL.

Learn more about [using the MotherDuck MCP server](/key-tasks/ai-and-motherduck/mcp-workflows).

### Regional availability

The remote MCP server is available in all MotherDuck regions. Requests are routed to the MCP server closest to where the client runs:

- **Desktop clients** (Cursor, Claude Code): Routed based on your physical location
- **Web-based agents** (Claude.ai, ChatGPT): Routed based on the agent provider's server location

Your data is always processed in your MotherDuck organization's region. However, query results transit through the remote MCP server. If you have strict data residency requirements, ensure your MCP client runs within your region.

### Restricting to read-only access

The remote MCP server exposes both read-only and read-write tools. To restrict your AI assistant to read-only access, see [Restricting to read-only access](/key-tasks/ai-and-motherduck/securing-read-only-access/).

## Local MCP server

For local DuckDB databases, custom configurations, or self-hosted scenarios, use the **local MCP server** ([mcp-server-motherduck](https://github.com/motherduckdb/mcp-server-motherduck)).
For a comparison of remote vs local and when to use each, see the [setup guide](/key-tasks/ai-and-motherduck/mcp-setup/#remote-vs-local-mcp-server). 📦 **Local MCP Server GitHub Repository** – Self-host the open-source MCP server for DuckDB and MotherDuck ## Related resources - [Connect to the MCP Server](/key-tasks/ai-and-motherduck/mcp-setup/) - Setup instructions for all supported AI clients - [MCP Workflows Guide](/key-tasks/ai-and-motherduck/mcp-workflows) - Tips and workflows for using the MotherDuck MCP server - [Building Analytics Agents](/key-tasks/ai-and-motherduck/building-analytics-agents) - Guide to building AI agents with MotherDuck - [MCP Specification (2025-06-18)](https://modelcontextprotocol.io/specification/2025-06-18) - Official protocol documentation ## Included pages - [list_databases](https://motherduck.com/docs/sql-reference/mcp/list-databases): List all databases in your MotherDuck account - [list_shares](https://motherduck.com/docs/sql-reference/mcp/list-shares): List database shares that have been shared with you - [list_tables](https://motherduck.com/docs/sql-reference/mcp/list-tables): List tables and views in a MotherDuck database - [list_columns](https://motherduck.com/docs/sql-reference/mcp/list-columns): List columns of a table or view with types and comments - [search_catalog](https://motherduck.com/docs/sql-reference/mcp/search-catalog): Fuzzy search across databases, schemas, tables, columns, and shares - [query](https://motherduck.com/docs/sql-reference/mcp/query): Execute SQL queries against MotherDuck databases - [query_rw](https://motherduck.com/docs/sql-reference/mcp/query-rw): Execute SQL queries that can modify data or schema in MotherDuck - [ask_docs_question](https://motherduck.com/docs/sql-reference/mcp/ask-docs-question): Ask questions about DuckDB or MotherDuck documentation - [get_dive_guide](https://motherduck.com/docs/sql-reference/mcp/get-dive-guide): Load instructions for creating MotherDuck Dives - 
[list_dives](https://motherduck.com/docs/sql-reference/mcp/list-dives): List all Dives in your MotherDuck workspace - [read_dive](https://motherduck.com/docs/sql-reference/mcp/read-dive): Read a specific Dive by ID, including its full component code - [save_dive](https://motherduck.com/docs/sql-reference/mcp/save-dive): Save a new Dive to your MotherDuck workspace - [update_dive](https://motherduck.com/docs/sql-reference/mcp/update-dive): Update an existing Dive's title, description, or content - [share_dive_data](https://motherduck.com/docs/sql-reference/mcp/share-dive-data): Share the data for a Dive with your organization - [delete_dive](https://motherduck.com/docs/sql-reference/mcp/delete-dive): Permanently delete a Dive by ID --- Source: https://motherduck.com/docs/sql-reference/mcp/query-rw --- sidebar_position: 6.5 title: query_rw description: Execute SQL queries that can modify data or schema in MotherDuck --- # query_rw Execute SQL queries that can modify data or schema in MotherDuck. ## Description The `query_rw` tool executes SQL against your MotherDuck databases, including operations that change data or schema. For cross-database queries, use fully qualified names: `database.schema.table` (or `database.table` for the main schema). ## Input parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `database` | string | No | Database context for the query. Required when the statement targets database objects. Optional for account-level operations. 
| | `sql` | string | Yes | DuckDB SQL statement to execute | ## Output schema Same as [`query`](/sql-reference/mcp/query/): ```json { "success": boolean, "columns": string[], // Column names (on success) "columnTypes": string[], // Column types (on success) "rows": any[][], // Query results (on success) "rowCount": number, // Number of rows returned (on success) "error": string, // Error message (on failure) "errorType": string // Error type (on failure) } ``` ## Limits - **Result limit:** Maximum 2,048 rows and 50,000 characters. Results exceeding these limits will be truncated with a truncation message. - **Query timeout:** 55 seconds. Queries exceeding this limit will be cancelled server-side and the tool will respond with an error message. ## Example usage **Insert rows:** ```text Insert a new customer 'Acme Corp' with id 100 into my_database.customers ``` ```json { "database": "my_database", "sql": "INSERT INTO customers (id, name) VALUES (100, 'Acme Corp')" } ``` **Update and delete:** ```text In my_database, set status to 'shipped' for all orders in the orders table where status is 'pending', then delete the old log entries from audit_log ``` The AI assistant can call `query_rw` with the appropriate UPDATE and DELETE statements (or multiple calls if the client requires one statement per call). **Create table:** ```text Create a table my_database.main.events with columns id (BIGINT), name (VARCHAR), created_at (TIMESTAMP) ``` ```json { "database": "my_database", "sql": "CREATE TABLE main.events (id BIGINT, name VARCHAR, created_at TIMESTAMP)" } ``` **Account-level operations (database optional):** ```text Create a new database called reporting ``` ```json { "sql": "CREATE DATABASE reporting" } ``` For account-level operations, omit `database` and pass only `sql`. :::tip Read-only access To restrict the MCP server so the AI can only read data, see [Restricting to read-only access](/key-tasks/ai-and-motherduck/securing-read-only-access/). 
::: --- Source: https://motherduck.com/docs/sql-reference/mcp/query --- sidebar_position: 6 title: query description: Execute SQL queries against MotherDuck databases --- # query Execute **read-only** SQL queries against MotherDuck databases. ## Description The `query` tool executes SQL queries against your MotherDuck databases. For cross-database queries, use fully qualified names: `database.schema.table` (or `database.table` for the main schema). `query` is for read-only SQL. Operations that modify data, schema, or account settings, or trigger side effects, are rejected. For SQL that can change data or schema, use [`query_rw`](/sql-reference/mcp/query-rw/). ## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `database` | string | Yes | Database name to query | | `sql` | string | Yes | DuckDB SQL query to execute | ## Output Schema ```json { "success": boolean, "columns": string[], // Column names (on success) "columnTypes": string[], // Column types (on success) "rows": any[][], // Query results (on success) "rowCount": number, // Number of rows returned (on success) "error": string, // Error message (on failure) "errorType": string // Error type (on failure) } ``` ## Limits - **Result limit:** Maximum 2,048 rows and 50,000 characters. Results exceeding these limits will be truncated with a truncation message. - **Query timeout:** 55 seconds, to stay within common client timeouts. Queries exceeding this limit will be cancelled server-side and the tool will respond with an error message. 
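The output schema and limits above can be sketched as a TypeScript type plus two small helpers. This is only an illustration for client-side code: `QueryResult`, `rowsToObjects`, and `mayBeTruncated` are hypothetical names, not part of the MotherDuck MCP server, and the truncation check is a heuristic because the exact format of the truncation message is not specified here.

```typescript
// Shape of a `query` response, transcribed from the output schema above.
// Fields are optional because they appear only on success or only on failure.
interface QueryResult {
  success: boolean;
  columns?: string[];
  columnTypes?: string[];
  rows?: unknown[][];
  rowCount?: number;
  error?: string;
  errorType?: string;
}

// Documented row cap for a single result.
const MAX_ROWS = 2048;

// Hypothetical helper: turn positional rows into objects keyed by column name.
// Throws when the tool reported a failure.
function rowsToObjects(result: QueryResult): Record<string, unknown>[] {
  if (!result.success) {
    throw new Error(`${result.errorType ?? "Error"}: ${result.error ?? "unknown failure"}`);
  }
  const columns = result.columns ?? [];
  return (result.rows ?? []).map((row) => {
    const record: Record<string, unknown> = {};
    columns.forEach((name, i) => {
      record[name] = row[i];
    });
    return record;
  });
}

// Heuristic (an assumption, not documented behavior): a successful result
// that reaches the row cap may have been cut off.
function mayBeTruncated(result: QueryResult): boolean {
  return result.success && (result.rowCount ?? 0) >= MAX_ROWS;
}
```

If `mayBeTruncated` returns `true`, adding a tighter `LIMIT` or an aggregation to the SQL is usually a safer follow-up than assuming the result is complete.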
## Example Usage **Simple query:** ```text Query the top 5 customers by total orders from my_database ``` The AI assistant will call the tool with: ```json { "database": "my_database", "sql": "SELECT customer_name, COUNT(*) as order_count FROM orders GROUP BY customer_name ORDER BY order_count DESC LIMIT 5" } ``` **Cross-database query:** ```text Join the users table from auth_db with orders from sales_db ``` ```json { "database": "auth_db", "sql": "SELECT u.name, o.order_id, o.amount FROM auth_db.main.users u JOIN sales_db.main.orders o ON u.id = o.user_id LIMIT 100" } ``` ## Success Response Example ```json { "success": true, "columns": ["customer_name", "order_count"], "columnTypes": ["VARCHAR", "BIGINT"], "rows": [ ["Acme Corp", 150], ["TechStart Inc", 89], ["Global Services", 72] ], "rowCount": 3 } ``` ## Error Response Example ```json { "success": false, "error": "Query is not read-only", "errorType": "ForbiddenQueryError" } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/read-dive --- sidebar_position: 10 title: read_dive description: Read a specific Dive by ID, including its full component code feature_stage: preview --- Read a specific [Dive](/docs/key-tasks/ai-and-motherduck/dives) by ID, including its full JSX/React component code. Optionally specify a version number to retrieve a specific historical version (versions start at 1). If no version is specified, the latest version is returned. ## Description The `read_dive` tool retrieves a Dive's complete details, including its title, description, timestamps, and the full React component source code. Use this to inspect an existing Dive before updating it, or to understand how a Dive is built. Use [`list_dives`](../list-dives) first to find the Dive ID and its `current_version`. 
## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | string | Yes | The unique identifier (UUID) of the Dive to read | | `version` | number | No | Version number to retrieve (1-indexed). Defaults to the latest version. | ## Output Schema ```json { "success": boolean, "dive": { // Dive object (on success) "id": string, // Unique identifier (UUID) "title": string, // Dive title "description": string, // Dive description "content": string, // Full JSX/React component code "current_version": number, // Current version number "created_at": string, // ISO 8601 creation timestamp "updated_at": string // ISO 8601 last update timestamp }, "error": string // Error message (on failure) } ``` ## Example Usage **Read a Dive to inspect its code:** ```text Show me the code for my revenue trends Dive ``` The AI assistant will call the tool with the Dive's ID: ```json { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" } ``` **Read a specific version of a Dive:** ```text Show me version 1 of my revenue trends Dive ``` ```json { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "version": 1 } ``` **Read a Dive before updating it:** ```text I want to modify my customer signups Dive—can you show me what it looks like? 
``` ## Success Response Example ```json { "success": true, "dive": { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "title": "Monthly Revenue Trends", "description": "Line chart showing revenue by month", "content": "import { useSQLQuery } from \"@motherduck/react-sql-query\";\n\nexport default function Dive() {\n const { data, isLoading } = useSQLQuery(`SELECT ...`);\n // ...\n}", "current_version": 3, "created_at": "2025-01-15T10:30:00Z", "updated_at": "2025-01-20T14:45:00Z" } } ``` ## Error Response Example ```json { "success": false, "error": "Dive with ID 'invalid-uuid' not found" } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/save-dive --- sidebar_position: 11 title: save_dive description: Save a new Dive to your MotherDuck workspace feature_stage: preview --- Save a new [Dive](/docs/key-tasks/ai-and-motherduck/dives) to MotherDuck. Returns a URL to the Dive in MotherDuck as a link that the user can click to view the Dive. ## Description The `save_dive` tool creates a new Dive in your MotherDuck workspace. It accepts a title, optional description, and the JSX/React component code. Before saving, the tool validates the code to check for common issues like invalid SQL queries or missing exports. After saving, the tool analyzes which databases the Dive queries. If any referenced databases are not yet shared with your organization, it prompts you to use [`share_dive_data`](../share-dive-data) so others in your organization can view the Dive. Call [`get_dive_guide`](../get-dive-guide) first to learn the required JSX/React format. 
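The save-then-share flow described above can be sketched as a small decision function. This is a minimal sketch, not MotherDuck's implementation: `SaveDiveResponse` lists only a subset of the response fields documented on this page, and `ToolCall` and `followUpCall` are hypothetical names for illustration.

```typescript
// Subset of the `save_dive` response fields documented on this page.
interface SaveDiveResponse {
  success: boolean;
  dive?: { id: string; title: string; description: string | null };
  dive_url?: string;
  unshared_databases?: string[];
  error?: string;
}

// Hypothetical description of the next MCP tool call to make.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Decide whether a `share_dive_data` call is worth proposing after a save.
// Returns null when the save failed or when nothing is left to share.
function followUpCall(response: SaveDiveResponse): ToolCall | null {
  if (!response.success || !response.dive) {
    return null;
  }
  const unshared = response.unshared_databases ?? [];
  if (unshared.length === 0) {
    return null;
  }
  return { tool: "share_dive_data", args: { diveId: response.dive.id } };
}
```

Because sharing changes who can see the data, a client would normally show the proposed call to the user and run it only after confirmation.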
## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `title` | string | Yes | The title of the Dive | | `description` | string | No | A brief description of the Dive | | `content` | string | Yes | The JSX/React component code for the Dive | ## Output Schema ```json { "success": boolean, "dive": { // Created dive info (on success) "id": string, // Unique identifier (UUID) "title": string, // Dive title "description": string | null // Dive description }, "dive_url": string, // URL to view the Dive (on success) "warnings": string[], // Validation warnings (if any) "database_warnings": string[], // Warnings from database analysis (if any) "unshared_databases": string[], // Database names not yet shared with the org (if any) "next_steps": string[], // Ordered instructions for the AI to follow after saving "error": string, // Error message (on failure) "validationErrors": [ // Validation errors (on failure) { "type": string, // Error type "message": string, // Error description "details": string // Additional details } ] } ``` ## Example Usage **Create a new Dive:** ```text Create a Dive showing monthly revenue trends for my analytics database ``` The AI assistant will first call [`get_dive_guide`](../get-dive-guide) to load the instructions, then call `save_dive`: ```json { "title": "Monthly Revenue Trends", "description": "Line chart showing revenue by month with year-over-year comparison", "content": "import { useSQLQuery } from \"@motherduck/react-sql-query\";\nimport { LineChart, Line, XAxis, YAxis, Tooltip, ResponsiveContainer } from \"recharts\";\n\nexport default function Dive() {\n const { data, isLoading, isError, error } = useSQLQuery(`\n SELECT DATE_TRUNC('month', order_date) as month, SUM(revenue) as revenue\n FROM analytics.sales\n GROUP BY 1 ORDER BY 1\n `);\n // ... 
component code\n}" } ``` ## Success Response Example ```json { "success": true, "dive": { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "title": "Monthly Revenue Trends", "description": "Line chart showing revenue by month with year-over-year comparison" }, "dive_url": "https://app.motherduck.com/dives/a1b2c3d4-e5f6-7890-abcd-ef1234567890", "unshared_databases": ["analytics"], "next_steps": [ "Regenerate the dive preview artifact with the updated banner...", "Show the dive to the user in chat as a markdown hyperlink: [Monthly Revenue Trends](https://app.motherduck.com/dives/a1b2c3d4-...)", "The dive references databases not yet shared with the organization: analytics. Ask the user if they want to share them." ] } ``` ## Validation Error Response Example ```json { "success": false, "error": "Dive validation failed", "validationErrors": [ { "type": "SQL_ERROR", "message": "Query validation failed: Table 'analytics.nonexistent_table' not found", "details": "SELECT * FROM analytics.nonexistent_table" } ], "hint": "Please fix the errors above and try again." } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/search-catalog --- sidebar_position: 5 title: search_catalog description: Fuzzy search across databases, schemas, tables, columns, and shares --- # search_catalog Search the catalog for databases, schemas, tables, columns, and shares using fuzzy matching. ## Description The `search_catalog` tool performs fuzzy search across your entire MotherDuck catalog. It finds matching objects by name using partial matching, supporting underscores, dots, and multi-word queries. This is useful for discovering available data when you don't know exact names. The search uses Jaro-Winkler similarity scoring and returns results ranked by relevance. Results are limited per category to provide a balanced view across different object types. 
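For intuition about the scoring, here is a textbook Jaro-Winkler implementation in TypeScript. It is a sketch of the general algorithm only. MotherDuck's actual search also handles partial matches, underscores, dots, and multi-word queries, none of which is modeled here. The function names and the `0.1` prefix weight (the conventional default) are assumptions.

```typescript
// Jaro similarity: rewards shared characters that sit close together
// and penalizes matched characters that appear in a different order.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;

  // Characters match only if they are within this distance of each other.
  const range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched: boolean[] = [];
  const bMatched: boolean[] = [];
  for (let i = 0; i < a.length; i++) aMatched.push(false);
  for (let j = 0; j < b.length; j++) bMatched.push(false);

  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - range);
    const hi = Math.min(b.length - 1, i + range);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a.charAt(i) === b.charAt(j)) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk both matched sequences in order and count positions that differ.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a.charAt(i) !== b.charAt(k)) outOfOrder++;
    k++;
  }
  const transpositions = outOfOrder / 2;

  return (matches / a.length + matches / b.length + (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
function jaroWinkler(a: string, b: string, prefixWeight: number = 0.1): number {
  const base = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a.charAt(prefix) === b.charAt(prefix)) prefix++;
  return base + prefix * prefixWeight * (1 - base);
}

// Hypothetical helper: order candidate names by similarity to the query.
function rankByName(query: string, names: string[]): string[] {
  return names.slice().sort((x, y) => jaroWinkler(query, y) - jaroWinkler(query, x));
}
```

Scores fall between 0 and 1, which is consistent with the `relevanceScore` range in the output schema. Plain Jaro-Winkler favors names that share a prefix with the query, so a production search needs extra token handling to surface names such as `monthly_sales` for the query `sales`.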
## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `query` | string | Yes | Search term to find in object names (supports partial matching, underscores, dots) | | `object_types` | string[] | No | Filter results to specific types: `"database"`, `"schema"`, `"table"`, `"column"`, `"share"` | ## Output Schema ```json { "success": boolean, "query": string, // Search query used "resultCount": number, // Total results found "results": [ // Search results (on success) { "type": "database" | "schema" | "table" | "column" | "share", "name": string, // Object name "fullyQualifiedName": string, // Full path (e.g., "db.schema.table.column") "database": string | null, // Database (null for shares) "schema": string | null, // Schema (null for databases/shares) "table": string | null, // Table (only for columns) "dataType": string | null, // Data type (columns) or URL (shares) "comment": string | null, // Object comment if set "relevanceScore": number // Match score 0-1 (higher is better) } ], "error": string, // Error message (on failure) "errorType": string // Error type (on failure) } ``` ## Result Limits Results are limited per object type to provide balanced coverage: - Shares: 10 results - Columns: 40 results - Tables: 30 results - Schemas: 20 results - Databases: 20 results Maximum total results: 100 ## Example Usage **Search for tables with "sales" in the name:** ```text Find all tables related to sales data ``` The AI assistant will call the tool with: ```json { "query": "sales" } ``` **Search only for columns:** ```text Find columns containing "email" ``` ```json { "query": "email", "object_types": ["column"] } ``` **Search with qualified name:** ```text Find anything matching analytics.events ``` ```json { "query": "analytics.events" } ``` ## Success Response Example ```json { "success": true, "query": "sales", "resultCount": 8, "results": [ { "type": "table", "name": "sales_data", "fullyQualifiedName": 
"analytics.main.sales_data", "database": "analytics", "schema": "main", "table": null, "dataType": null, "comment": "Daily sales transactions", "relevanceScore": 0.95 }, { "type": "table", "name": "monthly_sales", "fullyQualifiedName": "analytics.main.monthly_sales", "database": "analytics", "schema": "main", "table": null, "dataType": null, "comment": null, "relevanceScore": 0.89 }, { "type": "column", "name": "total_sales", "fullyQualifiedName": "analytics.main.revenue.total_sales", "database": "analytics", "schema": "main", "table": "revenue", "dataType": "DECIMAL(18,2)", "comment": "Total sales amount", "relevanceScore": 0.87 }, { "type": "share", "name": "regional_sales_share", "fullyQualifiedName": "regional_sales_share", "database": "regional_sales_share", "schema": null, "table": null, "dataType": "md:_share/org123/regional_sales_share", "comment": null, "relevanceScore": 0.82 } ] } ``` ## Error Response Example ```json { "success": false, "error": "Search query cannot be empty", "errorType": "ValidationError" } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/share-dive-data --- sidebar_position: 14 title: share_dive_data description: Share the data for a Dive with your organization feature_stage: preview --- Share the data for a [Dive](/docs/key-tasks/ai-and-motherduck/dives) with your organization. Creates org-scoped shares for owned databases used in the Dive, so others in the organization can view it. ## Description The `share_dive_data` tool makes a Dive's underlying data accessible to your organization. When a Dive queries databases that you own but haven't shared, other users in your organization won't be able to view the Dive. This tool creates shares for those databases and updates the Dive to reference the shared versions. The tool: 1. Verifies you own the Dive 2. Analyzes the Dive's SQL queries to find referenced databases 3. Creates org-scoped shares for any databases that aren't already shared 4. 
Updates the Dive to use the shared database references Use this after [`save_dive`](../save-dive) or [`update_dive`](../update-dive) when you want your team to be able to view a Dive that queries your private databases. ## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `diveId` | string | Yes | The unique identifier (UUID) of the Dive to share data for | ## Output Schema ```json { "success": boolean, "dive": { // Dive info (on success) "id": string, // Dive identifier "title": string, // Dive title "version": number // New version number after update }, "shares": [ // Shares created (on success) { "database": string, // Database name "shareName": string, // Share name "shareUrl": string, // Share URL for the database "created": boolean // Whether the share was newly created } ], "requiredDatabases": [ // All databases referenced by the Dive { "type": string, // "share" or "database" "path": string, // Share URL or database path "alias": string // Database alias name } ], "url": string, // URL to view the Dive (on success) "message": string, // Status message (on success) "warnings": string[], // Warnings from analysis or sharing (if any) "error": string // Error message (on failure) } ``` ## Example Usage **Share a Dive's data after saving:** ```text Share the data for my revenue Dive with the rest of my team ``` The AI assistant will call the tool with the Dive's ID: ```json { "diveId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" } ``` **Respond to a sharing prompt after save:** After calling [`save_dive`](../save-dive), the tool may suggest sharing unshared databases. 
The AI assistant will call `share_dive_data` to make the data accessible: ```json { "diveId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" } ``` ## Success Response Example ```json { "success": true, "dive": { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "title": "Monthly Revenue Trends", "version": 4 }, "shares": [ { "database": "analytics", "shareName": "analytics", "shareUrl": "md:_share/analytics/a1b2c3d4-...", "created": true } ], "requiredDatabases": [ { "type": "share", "path": "md:_share/analytics/a1b2c3d4-...", "alias": "analytics" } ], "url": "https://app.motherduck.com/dives/a1b2c3d4-e5f6-7890-abcd-ef1234567890", "message": "Created 1 share(s). Dive updated with share URLs." } ``` ## Nothing to Share Response Example When all referenced databases are already shared: ```json { "success": true, "message": "All referenced databases are already shared. No action needed.", "shares": [], "requiredDatabases": [ { "type": "share", "path": "md:_share/analytics/a1b2c3d4-...", "alias": "analytics" } ], "dive": { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "version": 3 } } ``` ## Error Response Example ```json { "success": false, "error": "You don't own this dive or it doesn't exist" } ``` --- Source: https://motherduck.com/docs/sql-reference/mcp/update-dive --- sidebar_position: 12 title: update_dive description: Update an existing Dive's title, description, or content feature_stage: preview --- Update an existing [Dive's](/docs/key-tasks/ai-and-motherduck/dives) title, description, or content. Returns a URL to the Dive in MotherDuck as a link the user can click to view the updated Dive. ## Description The `update_dive` tool modifies an existing Dive in your MotherDuck workspace. You can update the title, description, content (React component code), or any combination. At least one field must be provided. When updating content, the tool validates the new code before saving, just like [`save_dive`](../save-dive). 
It also analyzes which databases the Dive queries and reports any unshared databases. Use [`list_dives`](../list-dives) to find the Dive ID, and [`read_dive`](../read-dive) to inspect the current code before modifying it. ## Input Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | string | Yes | The unique identifier (UUID) of the Dive to update | | `title` | string | No | New title for the Dive | | `description` | string | No | New description for the Dive | | `content` | string | No | New JSX/React component code | At least one of `title`, `description`, or `content` must be provided. ## Output Schema ```json { "success": boolean, "dive": { // Updated dive info (on success) "id": string // Dive identifier }, "dive_url": string, // URL to view the Dive (on success) "warnings": string[], // Validation warnings (if any) "database_warnings": string[], // Warnings from database analysis (if any) "unshared_databases": string[], // Database names not yet shared with the org (if any) "next_steps": string[], // Ordered instructions for the AI to follow after updating "error": string, // Error message (on failure) "validationErrors": [ // Validation errors (on failure) { "type": string, "message": string, "details": string } ] } ``` ## Example Usage **Update a Dive's content:** ```text Add a region filter to my revenue trends Dive ``` The AI assistant will call `read_dive` to get the current code, modify it, then call `update_dive`: ```json { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "content": "import { useSQLQuery } from \"@motherduck/react-sql-query\";\n// ... 
updated component with region filter\n" } ``` **Update just the title and description:** ```text Rename my revenue Dive to "Q1 Revenue Dashboard" ``` ```json { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", "title": "Q1 Revenue Dashboard", "description": "Revenue trends filtered to Q1 2025" } ``` ## Success Response Example ```json { "success": true, "dive": { "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890" }, "dive_url": "https://app.motherduck.com/dives/a1b2c3d4-e5f6-7890-abcd-ef1234567890", "next_steps": [ "Regenerate the dive preview artifact with the updated banner...", "Show the dive to the user in chat as a markdown hyperlink using the dive title: [dive title](https://app.motherduck.com/dives/a1b2c3d4-...)" ] } ``` ## Error Response Example ```json { "success": false, "error": "At least one of title, description, or content must be provided" } ``` --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/ai-functions --- sidebar_position: 0 title: AI Functions description: MotherDuck AI SQL functions for text generation, embeddings, and SQL assistance. --- # AI Functions MotherDuck AI functions reference. These functions leverage AI models to perform various tasks including text generation, embeddings, and SQL assistance. For more practical guidance, see our [AI and MotherDuck](/category/ai-and-motherduck/) how-to guides. Costs can be found on the [Pricing Page](/about-motherduck/billing/pricing/#ai-function-pricing). Information about regional data processing of AI functions can be found at the bottom of the individual function pages. ## Available Functions ## Included pages - [SQL Assistant](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant) - [EMBEDDING](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/embedding): Generate vector embeddings for text using the EMBEDDING function for semantic search. 
- [PROMPT](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/prompt): Generate AI responses directly in SQL with the PROMPT function. - [Dives Functions](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives): SQL table functions for creating, reading, updating, and deleting MotherDuck Dives. --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/dives --- sidebar_position: 0 title: Dives Functions description: SQL table functions for creating, reading, updating, and deleting MotherDuck Dives. feature_stage: preview --- SQL table functions for managing [Dives](/key-tasks/ai-and-motherduck/dives), the interactive React data apps that query live MotherDuck data. These functions let you create, read, update, and delete Dives directly from SQL. Dives use the [`useSQLQuery` hook](use-sql-query) to query data from within their React components. :::note These functions are executed server-side on MotherDuck. They are not available on local-only DuckDB connections. ::: Create your first Dive with help from your AI tool of choice using our [MCP server](/key-tasks/ai-and-motherduck/mcp-setup/), or try a minimal working example that uses only SQL. ```sql SELECT * FROM MD_CREATE_DIVE( title = 'PokeDuck', content = ' import { useSQLQuery } from "@motherduck/react-sql-query"; export default function Dive() { const { data } = useSQLQuery( `SELECT PROMPT(''Suggest a duck type or pokemon and tell a fun fact about them'')`, { select: (rows) => Object.values(rows[0])[0] } ); return <div><h1>FUN FACT:</h1><p>{JSON.stringify(data)}</p></div>
; }' ); ``` ## Available functions ## Included pages - [MD_LIST_DIVES](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-list-dives): List all Dives in your MotherDuck account with pagination support. - [useSQLQuery hook](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/use-sql-query): React hook for querying MotherDuck data from within Dives. - [MD_GET_DIVE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-get-dive): Retrieve a Dive by ID including its full React component content. - [MD_CREATE_DIVE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-create-dive): Create a new Dive in your MotherDuck account. - [MD_UPDATE_DIVE_METADATA](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-update-dive-metadata): Update a Dive's title or description without creating a new version. - [MD_UPDATE_DIVE_CONTENT](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-update-dive-content): Update a Dive's React component code, creating a new version. - [MD_DELETE_DIVE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-delete-dive): Permanently delete a Dive by ID. - [MD_LIST_DIVE_VERSIONS](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-list-dive-versions): List all versions of a specific Dive with pagination support. - [MD_GET_DIVE_VERSION](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-get-dive-version): Retrieve a specific historical version of a Dive including its content. --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-create-dive --- sidebar_position: 3 title: MD_CREATE_DIVE description: Create a new Dive in your MotherDuck account. 
feature_stage: preview --- Creates a new [Dive](/key-tasks/ai-and-motherduck/dives) in your MotherDuck account. Returns the created Dive's metadata and initial version information. ## Syntax ```sql SELECT * FROM MD_CREATE_DIVE( title = 'My Dive', content = '', description = 'A brief description', api_version = 1 ); ``` ## Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `title` | `VARCHAR` | Yes | The title of the Dive | | `content` | `VARCHAR` | Yes | The JSX/React component code | | `description` | `VARCHAR` | No | A brief description of the Dive | | `api_version` | `UINTEGER` | No | API version for the Dive format. Defaults to `1`. | ## Return Columns | Column | Type | Description | |--------|------|-------------| | `id` | `UUID` | Unique identifier of the created Dive | | `title` | `VARCHAR` | Dive title | | `description` | `VARCHAR` | Dive description | | `owner_id` | `UUID` | UUID of the Dive owner | | `current_version` | `INTEGER` | Version number (1 for newly created Dives) | | `created_at` | `TIMESTAMP WITH TIME ZONE` | When the Dive was created | | `updated_at` | `TIMESTAMP WITH TIME ZONE` | When the Dive was last updated | | `owner_name` | `VARCHAR` | Name of the Dive owner | | `version_id` | `UUID` | UUID of the initial version | | `version_storage_url` | `VARCHAR` | Storage URL of the version content | | `version_description` | `VARCHAR` | Description for this version | | `version_created_at` | `TIMESTAMP WITH TIME ZONE` | When this version was created | | `version_api_version` | `UINTEGER` | API version used | ## Examples Create a Dive with title and content: ```sql SELECT id, title, current_version FROM MD_CREATE_DIVE( title = 'Revenue Trends', content = 'import { useSQLQuery } from "@motherduck/react-sql-query"; export default function Dive() { const { data } = useSQLQuery(`SELECT * FROM sales`); return <div>{JSON.stringify(data)}</div>
; }' ); ``` Create a Dive with a description: ```sql SELECT * FROM MD_CREATE_DIVE( title ='Monthly Revenue', description ='Line chart of revenue by month', content ='' ); ``` ## Related - [`MD_UPDATE_DIVE_CONTENT`](../md-update-dive-content) — Update a Dive's content (creates a new version) - [`MD_UPDATE_DIVE_METADATA`](../md-update-dive-metadata) — Update a Dive's title or description - [`save_dive` MCP tool](/sql-reference/mcp/save-dive) — AI assistant equivalent --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-delete-dive --- sidebar_position: 6 title: MD_DELETE_DIVE description: Permanently delete a Dive by ID. feature_stage: preview --- Permanently deletes a [Dive](/key-tasks/ai-and-motherduck/dives) and all its versions. This action cannot be undone. ## Syntax ```sql SELECT * FROM MD_DELETE_DIVE(id = 'your-dive-uuid'::UUID); ``` ## Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | `UUID` | Yes | The unique identifier of the Dive to delete | ## Return Columns | Column | Type | Description | |--------|------|-------------| | `success` | `BOOLEAN` | `true` if the Dive was deleted | ## Examples Delete a Dive: ```sql SELECT * FROM MD_DELETE_DIVE(id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID); ``` ## Errors Returns an error if the Dive does not exist. ## Related - [`MD_LIST_DIVES`](../md-list-dives) — List Dives to find the ID - [`delete_dive` MCP tool](/sql-reference/mcp/delete-dive) — AI assistant equivalent --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-get-dive-version --- sidebar_position: 8 title: MD_GET_DIVE_VERSION description: Retrieve a specific historical version of a Dive including its content. feature_stage: preview --- Retrieves a specific version of a [Dive](/key-tasks/ai-and-motherduck/dives), including the full React component source code. 
Version numbers are 0-based—the first version of a Dive is version `0`. ## Syntax ```sql SELECT * FROM MD_GET_DIVE_VERSION( id = 'your-dive-uuid'::UUID, version = 0 ); ``` ## Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | `UUID` | Yes | The unique identifier of the Dive | | `version` | `UINTEGER` | Yes | The version number to retrieve (0-based) | ## Return Columns | Column | Type | Description | |--------|------|-------------| | `id` | `UUID` | UUID of this version | | `version` | `UINTEGER` | Version number | | `storage_url` | `VARCHAR` | Storage URL of the version content | | `description` | `VARCHAR` | Version description or commit message | | `created_at` | `TIMESTAMP WITH TIME ZONE` | When this version was created | | `api_version` | `UINTEGER` | API version used to create this version | | `content` | `VARCHAR` | Full JSX/React component source code | ## Examples Read the original version of a Dive: ```sql SELECT content FROM MD_GET_DIVE_VERSION( id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID, version = 0 ); ``` Compare version metadata: ```sql SELECT version, description, created_at FROM MD_GET_DIVE_VERSION( id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID, version = 2 ); ``` ## Errors Returns an error if the Dive or the specified version does not exist. ## Related - [`MD_LIST_DIVE_VERSIONS`](../md-list-dive-versions) — List all versions to find version numbers - [`MD_GET_DIVE`](../md-get-dive) — Get the latest version of a Dive - [`read_dive` MCP tool](/sql-reference/mcp/read-dive) — AI assistant equivalent (supports `version` parameter) --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-get-dive --- sidebar_position: 2 title: MD_GET_DIVE description: Retrieve a Dive by ID including its full React component content. 
feature_stage: preview --- Retrieves a single [Dive](/key-tasks/ai-and-motherduck/dives) by ID, including the full React component source code for the current version. ## Syntax ```sql SELECT * FROM MD_GET_DIVE(id ='your-dive-uuid'::UUID); ``` ## Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | `UUID` | Yes | The unique identifier of the Dive | ## Return Columns | Column | Type | Description | |--------|------|-------------| | `id` | `UUID` | Unique identifier of the Dive | | `title` | `VARCHAR` | Dive title | | `description` | `VARCHAR` | Dive description | | `owner_id` | `UUID` | UUID of the Dive owner | | `current_version` | `INTEGER` | Latest version number (1-based) | | `created_at` | `TIMESTAMP WITH TIME ZONE` | When the Dive was created | | `updated_at` | `TIMESTAMP WITH TIME ZONE` | When the Dive was last updated | | `owner_name` | `VARCHAR` | Name of the Dive owner | | `version_id` | `UUID` | UUID of the current version | | `version_storage_url` | `VARCHAR` | Storage URL of the current version content | | `version_description` | `VARCHAR` | Description/commit message for the current version | | `version_created_at` | `TIMESTAMP WITH TIME ZONE` | When the current version was created | | `version_api_version` | `UINTEGER` | API version used to create this version | | `content` | `VARCHAR` | Full JSX/React component source code | ## Examples Read a Dive by ID: ```sql SELECT title, description, content FROM MD_GET_DIVE(id ='a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID); ``` Get the metadata without the content: ```sql SELECT id, title, owner_name, current_version, updated_at FROM MD_GET_DIVE(id ='a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID); ``` ## Errors Returns an error if the Dive does not exist. 
## Related - [`MD_GET_DIVE_VERSION`](../md-get-dive-version) — Retrieve a specific historical version - [`MD_LIST_DIVES`](../md-list-dives) — List all Dives to find IDs - [`read_dive` MCP tool](/sql-reference/mcp/read-dive) — AI assistant equivalent --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-list-dive-versions --- sidebar_position: 7 title: MD_LIST_DIVE_VERSIONS description: List all versions of a specific Dive with pagination support. feature_stage: preview --- Lists all versions of a specific [Dive](/key-tasks/ai-and-motherduck/dives) with pagination support. Each time a Dive's content is updated via [`MD_UPDATE_DIVE_CONTENT`](../md-update-dive-content), a new version is created. This function returns version metadata without the content. ## Syntax ```sql SELECT * FROM MD_LIST_DIVE_VERSIONS(id = 'your-dive-uuid'::UUID); SELECT * FROM MD_LIST_DIVE_VERSIONS( id = 'your-dive-uuid'::UUID, "limit" = 100, "offset" = 0 ); ``` ## Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | `UUID` | Yes | The unique identifier of the Dive | | `limit` | `UINTEGER` | No | Maximum number of versions to return | | `offset` | `UINTEGER` | No | Number of versions to skip | :::tip `limit` and `offset` are reserved SQL keywords and must be double-quoted when used as named parameters. 
::: ## Return Columns | Column | Type | Description | |--------|------|-------------| | `id` | `UUID` | UUID of this version | | `version` | `UINTEGER` | Version number (0-based) | | `storage_url` | `VARCHAR` | Storage URL of the version content | | `description` | `VARCHAR` | Version description or commit message | | `created_at` | `TIMESTAMP WITH TIME ZONE` | When this version was created | | `api_version` | `UINTEGER` | API version used to create this version | ## Examples List all versions of a Dive: ```sql SELECT version, description, created_at FROM MD_LIST_DIVE_VERSIONS(id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID); ``` Get the 5 most recent versions: ```sql SELECT version, description, created_at FROM MD_LIST_DIVE_VERSIONS(id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID) ORDER BY version DESC LIMIT 5; ``` ## Errors Returns an error if the Dive does not exist. ## Related - [`MD_GET_DIVE_VERSION`](../md-get-dive-version) — Retrieve a specific version including content - [`MD_UPDATE_DIVE_CONTENT`](../md-update-dive-content) — Create a new version --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-list-dives --- sidebar_position: 1 title: MD_LIST_DIVES description: List all Dives in your MotherDuck account with pagination support. feature_stage: preview --- Lists all [Dives](/key-tasks/ai-and-motherduck/dives) in your MotherDuck account. Returns metadata for each Dive without the component content. ## Syntax ```sql SELECT * FROM MD_LIST_DIVES(); SELECT * FROM MD_LIST_DIVES("limit" =100, "offset" =0); ``` ## Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `limit` | `UINTEGER` | No | Maximum number of Dives to return | | `offset` | `UINTEGER` | No | Number of Dives to skip | | `include_org_shares` | `BOOLEAN` | No | Include Dives shared with your organization. Defaults to `false`. 
| :::tip `limit` and `offset` are reserved SQL keywords and must be double-quoted when used as named parameters. ::: ## Return Columns | Column | Type | Description | |--------|------|-------------| | `id` | `UUID` | Unique identifier of the Dive | | `title` | `VARCHAR` | Dive title | | `description` | `VARCHAR` | Dive description | | `owner_id` | `UUID` | UUID of the Dive owner | | `current_version` | `INTEGER` | Latest version number (1-based) | | `created_at` | `TIMESTAMP WITH TIME ZONE` | When the Dive was created | | `updated_at` | `TIMESTAMP WITH TIME ZONE` | When the Dive was last updated | | `owner_name` | `VARCHAR` | Name of the Dive owner | ## Examples List all Dives: ```sql SELECT * FROM MD_LIST_DIVES(); ``` List the 10 most recently updated Dives: ```sql SELECT id, title, owner_name, updated_at FROM MD_LIST_DIVES() ORDER BY updated_at DESC LIMIT 10; ``` Paginate through Dives: ```sql SELECT * FROM MD_LIST_DIVES("limit" =20, "offset" =0); -- first page SELECT * FROM MD_LIST_DIVES("limit" =20, "offset" =20); -- second page ``` ## Related - [`MD_GET_DIVE`](../md-get-dive) — Read a Dive including its content - [`MD_LIST_DIVE_VERSIONS`](../md-list-dive-versions) — List version history for a Dive - [`list_dives` MCP tool](/sql-reference/mcp/list-dives) — AI assistant equivalent --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-update-dive-content --- sidebar_position: 5 title: MD_UPDATE_DIVE_CONTENT description: Update a Dive's React component code, creating a new version. feature_stage: preview --- Updates the content of an existing [Dive](/key-tasks/ai-and-motherduck/dives), creating a new version. Each call increments the version number. Previous versions remain accessible via [`MD_GET_DIVE_VERSION`](../md-get-dive-version). 
## Syntax ```sql SELECT * FROM MD_UPDATE_DIVE_CONTENT( id ='your-dive-uuid'::UUID, content ='' ); ``` ## Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | `UUID` | Yes | The unique identifier of the Dive to update | | `content` | `VARCHAR` | Yes | The new JSX/React component code | | `description` | `VARCHAR` | No | Version description or commit message | | `api_version` | `UINTEGER` | No | API version for the Dive format. Defaults to `1`. | ## Return Columns | Column | Type | Description | |--------|------|-------------| | `id` | `UUID` | UUID of the new version (not the Dive UUID) | | `version` | `UINTEGER` | New version number (0-based) | | `storage_url` | `VARCHAR` | Storage URL of the version content | | `description` | `VARCHAR` | Version description | | `created_at` | `TIMESTAMP WITH TIME ZONE` | When this version was created | | `api_version` | `UINTEGER` | API version used | ## Examples Update a Dive's content: ```sql SELECT version, created_at FROM MD_UPDATE_DIVE_CONTENT( id ='a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID, content ='' ); ``` Update with a version description: ```sql SELECT * FROM MD_UPDATE_DIVE_CONTENT( id ='a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID, content ='', description ='Added region filter' ); ``` ## Errors Returns an error if the Dive does not exist. ## Related - [`MD_UPDATE_DIVE_METADATA`](../md-update-dive-metadata) — Update title or description without creating a new version - [`MD_LIST_DIVE_VERSIONS`](../md-list-dive-versions) — List all versions of a Dive - [`update_dive` MCP tool](/sql-reference/mcp/update-dive) — AI assistant equivalent --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/md-update-dive-metadata --- sidebar_position: 4 title: MD_UPDATE_DIVE_METADATA description: Update a Dive's title or description without creating a new version. 
feature_stage: preview --- Updates the title and/or description of an existing [Dive](/key-tasks/ai-and-motherduck/dives). This does not create a new version—only the metadata is changed. ## Syntax ```sql SELECT * FROM MD_UPDATE_DIVE_METADATA( id ='your-dive-uuid'::UUID, title ='New Title', description ='New description' ); ``` ## Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `id` | `UUID` | Yes | The unique identifier of the Dive to update | | `title` | `VARCHAR` | No | New title for the Dive | | `description` | `VARCHAR` | No | New description for the Dive | At least one of `title` or `description` should be provided. ## Return Columns | Column | Type | Description | |--------|------|-------------| | `id` | `UUID` | Unique identifier of the Dive | | `title` | `VARCHAR` | Updated Dive title | | `description` | `VARCHAR` | Updated Dive description | | `owner_id` | `UUID` | UUID of the Dive owner | | `current_version` | `INTEGER` | Current version number (unchanged) | | `created_at` | `TIMESTAMP WITH TIME ZONE` | When the Dive was created | | `updated_at` | `TIMESTAMP WITH TIME ZONE` | When the Dive was last updated | | `owner_name` | `VARCHAR` | Name of the Dive owner | ## Examples Update a Dive's title: ```sql SELECT id, title, updated_at FROM MD_UPDATE_DIVE_METADATA( id ='a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID, title ='Q1 Revenue Dashboard' ); ``` Update both title and description: ```sql SELECT * FROM MD_UPDATE_DIVE_METADATA( id ='a1b2c3d4-e5f6-7890-abcd-ef1234567890'::UUID, title ='Q1 Revenue Dashboard', description ='Revenue breakdown by region for Q1 2025' ); ``` ## Errors Returns an error if the Dive does not exist. 
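
The rule that at least one of `title` or `description` should be provided can be checked before the call is issued. The following is a minimal Python sketch, assuming the statement is assembled as text on the client. `update_metadata_sql` and `sql_literal` are hypothetical helpers, and doubling single quotes is standard SQL string escaping rather than anything specific to MotherDuck.

```python
import uuid
from typing import Optional


def sql_literal(text: str) -> str:
    """Quote text as a SQL string literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def update_metadata_sql(dive_id: str,
                        title: Optional[str] = None,
                        description: Optional[str] = None) -> str:
    """Build an MD_UPDATE_DIVE_METADATA statement (hypothetical helper)."""
    if title is None and description is None:
        raise ValueError("provide at least one of title or description")
    # Validate and normalize the ID; raises ValueError if it is malformed.
    args = [f"id = '{uuid.UUID(dive_id)}'::UUID"]
    # Only the fields that were supplied are passed as named parameters.
    if title is not None:
        args.append(f"title = {sql_literal(title)}")
    if description is not None:
        args.append(f"description = {sql_literal(description)}")
    return "SELECT * FROM MD_UPDATE_DIVE_METADATA(" + ", ".join(args) + ");"


print(update_metadata_sql("a1b2c3d4-e5f6-7890-abcd-ef1234567890",
                          title="Q1 Revenue Dashboard"))
```

Omitting a field leaves it out of the call entirely, which matches the optional parameters in the table above.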
## Related - [`MD_UPDATE_DIVE_CONTENT`](../md-update-dive-content) — Update a Dive's content (creates a new version) - [`update_dive` MCP tool](/sql-reference/mcp/update-dive) — AI assistant equivalent --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/dives/use-sql-query --- sidebar_position: 1 title: useSQLQuery hook description: React hook for querying MotherDuck data from within Dives. feature_stage: preview --- The `useSQLQuery` hook is a React hook that runs SQL queries against MotherDuck from within a [Dive](/key-tasks/ai-and-motherduck/dives). It handles loading states and error reporting so your component can focus on rendering data. ## Import ```jsx import { useSQLQuery } from "@motherduck/react-sql-query"; ``` ## Syntax ```jsx const { data, isLoading, isError, error, exportAs } = useSQLQuery(sql, options); ``` ## Parameters ### `sql` | Type | Required | |------|----------| | `string` | Yes | A SQL query string to run against MotherDuck. The Dive runtime returns rows as objects. Use fully qualified, double-quoted table names in your queries (`"database"."schema"."table"`) to avoid issues when the Dive runs outside your current database context. ### `options` | Property | Type | Default | Description | |----------|------|---------|-------------| | `enabled` | `boolean` | `true` | Set to `false` to skip query execution. Useful when a query depends on user input that isn't available yet. | ## Return value | Property | Type | Description | |----------|------|-------------| | `data` | array or `undefined` | Query result as an array of row objects. `undefined` while loading. | | `isLoading` | `boolean` | `true` while the query is running | | `isError` | `boolean` | `true` if the query failed | | `error` | `Error` or `null` | Error object if the query failed | | `exportAs` | `(options) => Promise` | Exports this query as a file | :::warning `data` is the row array directly — there is no `data.rows` wrapper. 
Always guard against `undefined`:

```jsx
const rows = Array.isArray(data) ? data : [];
```

:::

## Numeric values

DuckDB returns `BIGINT`, `HUGEINT`, and `DECIMAL` as JavaScript `BigInt` or special objects, not `number`. These crash if rendered in JSX or used with `.toFixed()`. Define this helper at the top of every Dive and wrap all numeric values:

```jsx
const N = (v) => (v != null ? Number(v) : 0);
```

## Examples

### Basic query

```jsx
import { useSQLQuery } from "@motherduck/react-sql-query";

const N = (v) => (v != null ? Number(v) : 0);

export default function Dive() {
  const { data, isLoading } = useSQLQuery(`
    SELECT category, SUM(amount) AS total
    FROM "my_db"."main"."sales"
    GROUP BY ALL
  `);

  if (isLoading) return <div>Loading...</div>;

  const rows = Array.isArray(data) ? data : [];

  return (
    <ul>
      {rows.map((row) => (
        <li key={row.category}>
          {row.category}: ${N(row.total).toLocaleString()}
        </li>
      ))}
    </ul>
  );
}
```

### Conditional queries with `enabled`

Skip a query until a user selection is available:

```jsx
const [selected, setSelected] = useState(null);

const { data: details } = useSQLQuery(`
  SELECT * FROM "my_db"."main"."products"
  WHERE category = '${selected}'
`, { enabled: !!selected });
```

### Multiple independent queries

Each `useSQLQuery` call loads independently. Render the page layout immediately and show inline placeholders per section instead of a single loading spinner:

```jsx
const summary = useSQLQuery(`
  SELECT COUNT(*) AS total, SUM(revenue) AS revenue
  FROM "my_db"."main"."orders"
`);

const monthly = useSQLQuery(`
  SELECT strftime(date_trunc('month', order_date), '%Y-%m') AS month,
         SUM(revenue) AS revenue
  FROM "my_db"."main"."orders"
  GROUP BY 1
  ORDER BY 1
`);

return (
  <div>
    {summary.isLoading ? (
      <p>Loading...</p>
    ) : (
      <p>{N(summary.data?.[0]?.revenue)}</p>
    )}
    {/* The list below stands in for a chart or table of the monthly rows */}
    {monthly.isLoading ? (
      <p>Loading...</p>
    ) : (
      <ul>
        {(monthly.data ?? []).map((row) => (
          <li key={row.month}>{row.month}: {N(row.revenue)}</li>
        ))}
      </ul>
    )}
  </div>
);
```

### Export query results

Use `exportAs()` when you want to export the same SQL query used by a `useSQLQuery()` hook. Exports must start from a user action, such as a button click.

```jsx
import { useSQLQuery } from "@motherduck/react-sql-query";

export default function Dive() {
  const orders = useSQLQuery(`
    SELECT * FROM "my_db"."main"."orders"
    ORDER BY order_date DESC
  `);

  return (
    <button onClick={() => orders.exportAs({ format: "csv", filename: "orders" })}>
      Export CSV
    </button>
  );
}
```

Use `useExport()` when the export should run a different SQL query than the rows rendered on the page:

```jsx
import { useExport } from "@motherduck/react-sql-query";

export default function DiveExportButton() {
  const { exportQuery } = useExport();

  return (
    <button onClick={() => exportQuery(/* export SQL and options */)}>
      Export
    </button>
  );
}
```

Supported formats are `csv`, `json`, `parquet`, and `xlsx`. The `filename` value is a base name; MotherDuck adds the file extension.

You can pass DuckDB `COPY` writer options under `csv`, `json`, `parquet`, or `xlsx`:

```jsx
orders.exportAs({
  format: "xlsx",
  filename: "orders",
  xlsx: {
    sheet: "Orders",
    header: true,
  },
});
```

Exports run with DuckDB `COPY TO`, not from the rows already materialized in React. This means an export can contain more rows than the Dive renders. Export SQL must be read-only.

## Tips

- **Format dates in SQL, not JavaScript.** Use DuckDB's `strftime()` to format dates and timestamps as strings. DuckDB date types are returned as special objects that don't render in JSX.
- **Fill time series gaps in SQL.** Recharts does not interpolate missing time periods. Use `generate_series` with a `LEFT JOIN` to produce a continuous date spine.
- **Dollar-quoted strings.** When passing Dive content through SQL (for example, with [`MD_CREATE_DIVE`](../md-create-dive)), use [dollar-quoted string literals](https://duckdb.org/docs/stable/sql/data_types/literal_types#dollar-quoted-string-literals) to avoid escaping issues with nested quotes.
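
The dollar-quoting tip can be sketched in Python for cases where Dive content is assembled on the client. This is a minimal illustration, assuming the tagged form (`$tag$ ... $tag$`) described on the linked DuckDB page. `dollar_quote` is a hypothetical helper, and the tag name `dive` is arbitrary. The helper picks a delimiter that does not collide with the content, which matters because JSX template literals often contain `$` characters.

```python
import itertools


def dollar_quote(text: str) -> str:
    """Wrap text in a dollar-quoted literal using a delimiter that is safe for it."""
    # Candidate tags: "" (plain $$), then dive, dive1, dive2, ...
    tags = itertools.chain(["", "dive"], (f"dive{i}" for i in itertools.count(1)))
    for tag in tags:
        delimiter = f"${tag}$"
        # Safe only if the first occurrence of the delimiter in text + delimiter
        # is the closing one; this also covers text that ends in "$".
        if (text + delimiter).find(delimiter) == len(text):
            return delimiter + text + delimiter
    raise AssertionError("unreachable: the tag sequence never ends")


jsx = "const label = `$${N(row.total)}`;"
print(dollar_quote(jsx))
```

The returned literal can then be used as the `content` argument of a call such as `MD_UPDATE_DIVE_CONTENT`, with no further escaping of the quotes inside the component source.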
## Related - [Dives SQL functions](/sql-reference/motherduck-sql-reference/ai-functions/dives/) — Manage Dives with SQL - [Creating visualizations with Dives](/key-tasks/ai-and-motherduck/dives) — How-to guide - [Embedding Dives in your web application](/key-tasks/ai-and-motherduck/dives/embedding-dives) — Handle embedded Dive sessions and exports - [MCP Server](/sql-reference/mcp/) — Create Dives through AI assistants --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/embedding --- sidebar_position: 1 title: EMBEDDING description: Generate vector embeddings for text using the EMBEDDING function for semantic search. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ::::warning[Preview Feature] This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility. :::: ## Embedding function The `embedding` function lets you generate vector representations (embeddings) of text directly from SQL. These embeddings capture semantic meaning, enabling powerful [semantic search](/key-tasks/ai-and-motherduck/text-search-in-motherduck/#embedding-based-search) and other natural language processing tasks. The function uses OpenAI's models: `text-embedding-3-small` (default) with 512 dimensions or `text-embedding-3-large` with 1024 dimensions. Both models support single- and multi-row inputs, enabling batch processing. The maximum input size is limited to 2048 characters - larger inputs will be truncated. Consumption is measured in [AI Units](/about-motherduck/billing/pricing#ai-function-pricing). One AI Unit equates to approximately: - 60,000 embedding rows with `text-embedding-3-small` - 12,000 embedding rows with `text-embedding-3-large` These estimates assume an input size of 1,000 characters. 
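
The approximate figures above translate into a quick estimate of consumption. The following is a minimal Python sketch, assuming inputs of about 1,000 characters as stated. `estimate_ai_units` is a hypothetical helper, and the result is a rough guide rather than a billing calculation.

```python
# Approximate embedding rows covered by one AI Unit, taken from the list above.
ROWS_PER_AI_UNIT = {
    "text-embedding-3-small": 60_000,
    "text-embedding-3-large": 12_000,
}


def estimate_ai_units(rows: int, model: str = "text-embedding-3-small") -> float:
    """Estimate AI Units for embedding `rows` inputs of roughly 1,000 characters."""
    return rows / ROWS_PER_AI_UNIT[model]


for model in ROWS_PER_AI_UNIT:
    print(model, estimate_ai_units(150_000, model))
```

For 150,000 rows this gives 2.5 AI Units with `text-embedding-3-small` and 12.5 with `text-embedding-3-large`, which makes the fivefold cost difference between the two models concrete.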
### Syntax ```sql SELECT embedding(my_text_column) FROM my_table; -- returns FLOAT[512] column ``` ### Parameters The `embedding` function accepts parameters using named parameter syntax with the `:=` operator. | **Parameter** | **Required** | **Description** | |--------------------|--------------|--------------------------------------------------------------------------------------------------------------------------| | `text_input` | Yes | The text to be converted into an embedding vector | | `model` | No | Model type, either `'text-embedding-3-small'` (default) or `'text-embedding-3-large'` | ### Return types The `embedding` function returns different array sizes depending on the model used: - With `text-embedding-3-small`: Returns `FLOAT[512]` - With `text-embedding-3-large`: Returns `FLOAT[1024]` ### Examples #### Basic embedding generation ```sql -- Generate embeddings using the default model (text-embedding-3-small) SELECT embedding('This is a sample text') AS text_embedding; -- Generate embeddings using the larger model for higher dimensionality SELECT embedding('This is a sample text', model:='text-embedding-3-large') AS text_embedding; ``` #### Batch processing ```sql -- Generate embeddings for multiple rows at once SELECT title, embedding(overview) AS overview_embeddings FROM kaggle.movies LIMIT 10; ``` ### Use cases #### Creating an embedding database This example uses the sample movies dataset from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). 
```sql --- Create a new table with embeddings for the first 100 overview entries CREATE TABLE my_db.movies AS SELECT title, overview, embedding(overview) AS overview_embeddings FROM kaggle.movies LIMIT 100; ``` If write access to the source table is available, the embedding column can also be added in place: ```sql --- Update the existing table to add new column for embeddings ALTER TABLE my_db.movies ADD COLUMN overview_embeddings FLOAT[512]; --- Populate the column with embeddings UPDATE my_db.movies SET overview_embeddings = embedding(overview); ``` The movies table now contains a new column `overview_embeddings` with vector representations of each movie description: ```sql SELECT * FROM my_db.movies; ``` | **title** | **overview** | **overview_embeddings** | | ----------------- | ----------------- |----------------------------------------------------| | 'Toy Story 3' | 'Led by Woody, Andy's toys live happily in [...]' | [0.023089351132512093, -0.012809964828193188, ...] | | 'Jumanji' | 'When siblings Judy and Peter discover an [...]' | [-0.005538413766771555, 0.0799209326505661, ...] | | ... | ... | ... | #### Semantic similarity search The `array_cosine_similarity` function can be used to compute similarities between embeddings. This enables semantic search to retrieve entries that are conceptually / semantically similar to a query, even if they don't share the same keywords. ```sql -- Find movies similar to "Toy Story" based on semantic similarity SELECT title, overview, array_cosine_similarity( embedding('Led by Woody, Andy''s toys live happily [...]'), overview_embeddings ) AS similarity FROM kaggle.movies WHERE title != 'Toy Story' ORDER BY similarity DESC LIMIT 5; ``` | **title** | **overview** | **similarity** | |-----------------|-----------------|-----------------| |'Toy Story 3'|'Woody, Buzz, and the rest of Andy's toys haven't [...]'|0.7372807860374451| |'Toy Story 2'|'Andy heads off to Cowboy Camp, leaving his toys [...]'|0.7222828269004822| |... 
|... |... |

For advanced similarity search techniques including document chunking, hybrid search, and performance optimization, see the [Embedding-Based Search](/key-tasks/ai-and-motherduck/text-search-in-motherduck/#embedding-based-search) section in the Text Search guide.

#### Building a recommendation system

Embeddings can be used to build content-based recommendation systems:

```sql
-- Create a macro to recommend similar movies
CREATE OR REPLACE MACRO recommend_similar_movies(movie_title) AS TABLE (
    WITH target_embedding AS (
        SELECT embedding(overview) AS emb
        FROM sample_data.kaggle.movies
        WHERE title = movie_title
        LIMIT 1
    )
    SELECT
        m.title AS recommended_title,
        m.overview,
        array_cosine_similarity(t.emb, m.overview_embeddings) AS similarity
    FROM sample_data.kaggle.movies m, target_embedding t
    WHERE m.title != movie_title
    ORDER BY similarity DESC
    LIMIT 5
);

-- Use the macro to get recommendations
SELECT * FROM recommend_similar_movies('The Matrix');
```

#### Retrieval-augmented generation (RAG)

Embeddings are a key component in building [RAG](https://motherduck.com/blog/search-using-duckdb-part-2/) systems, which can be combined with the [`prompt` function](/sql-reference/motherduck-sql-reference/ai-functions/prompt/#retrieval-augmented-generation-rag) for powerful question-answering capabilities:

```sql
-- Create a reusable macro for question answering
CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
    SELECT
        question_text AS question,
        prompt(
            'User asks the following question:\n' || question_text || '\n\n' ||
            'Here is some additional information:\n' ||
            STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' ||
            'Please answer the question based only on the additional information provided.',
            model := 'gpt-4o'
        ) AS response
    FROM (
        SELECT title, overview
        FROM sample_data.kaggle.movies
        ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC
        LIMIT 3
    )
);

-- Use the macro to answer questions
SELECT
question, response FROM ask_question('Can you recommend some good sci-fi movies about AI?'); ``` ### Security considerations When passing free-text arguments from external sources to the embedding function (e.g., user questions in a RAG application), always use prepared statements to prevent SQL injection. ```python # Using prepared statements in Python user_query = "Led by Woody, Andy's toys live happily [...]" con.execute(""" SELECT title, overview, array_cosine_similarity(embedding(?), overview_embeddings) as similarity FROM kaggle.movies ORDER BY similarity DESC LIMIT 5""", [user_query]) ``` ### Error handling When usage limits have been reached or an unexpected error occurs while computing embeddings, the function will not fail the entire query but will return `NULL` values for the affected rows. To check if all embeddings were computed successfully: ```sql -- Check for NULL values in embedding column SELECT count(*) FROM my_db.movies WHERE overview_embeddings IS NULL AND overview IS NOT NULL; ``` Missing values can be filled in with a separate query: ```sql -- Fill in missing embedding values UPDATE my_db.movies SET overview_embeddings = embedding(overview) WHERE overview_embeddings IS NULL AND overview IS NOT NULL; ``` ### Performance considerations - **Batch Processing**: when processing multiple rows, consider using `LIMIT` to control the number of API calls. - **Model Selection**: use `text-embedding-3-small` for faster, less expensive embeddings when the highest precision isn't critical. - **Caching**: results are not cached between queries, so consider storing embeddings in tables for repeated use. - **Dimensionality**: higher dimensions (using `text-embedding-3-large`) provide more precise semantic representation but require more storage and computation time. ### Notes These capabilities are provided by MotherDuck's integration with Azure OpenAI and inputs to the embedding function will be processed by Azure OpenAI. 
For availability and usage limits, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). Usage limits are in place to safeguard your spend, not because of throughput limitations. MotherDuck has the capacity to handle high-volume embedding workloads and is always open to working alongside customers to support any type of workload and model requirements. If you need higher usage limits or have specific requirements, please see our [support page](/troubleshooting/support/). #### Regional processing Requests are processed based on the region of the MotherDuck organization according to the table below. Functions that are not available within the region (no checkmark) will be processed with global compute resources. | Function | Global | Europe | US West | |----------|--------|--------|---------| | `EMBEDDING` (`text-embedding-3-small`) | ✓ | ✓ | ✓ | | `EMBEDDING` (`text-embedding-3-large`) | ✓ | ✓ | ✓ | --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/prompt --- sidebar_position: 1 title: PROMPT description: Generate AI responses directly in SQL with the PROMPT function. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ::::warning[Preview Feature] This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility. :::: ## Prompt function The `prompt` function sends text to a Large Language Model (LLM) from SQL and returns the model's response. Use it to generate free-form text, extract typed values, or produce structured data. The function supports OpenAI's `gpt-5` series (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`), `gpt-4o-mini` (default), `gpt-4o`, and the `gpt-4.1` series. All models support single-row prompts and multi-row queries for batch processing. The `prompt` function runs once per row in the result set. 
A query like `SELECT prompt('Write a joke') FROM range(0, 10000)` calls the model 10,000 times, even though the prompt text looks like a single call. Cost scales with the number of rows the query evaluates. Consumption is measured in [AI Units](/about-motherduck/billing/pricing#ai-function-pricing). As a rough guide, one AI Unit covers approximately the following number of rows per model: - 480 rows with `gpt-4o` - 8,000 rows with `gpt-4o-mini` - 600 rows with `gpt-4.1` - 3,000 rows with `gpt-4.1-mini` - 12,000 rows with `gpt-4.1-nano` - 720 rows with `gpt-5` - 3,600 rows with `gpt-5-mini` - 18,000 rows with `gpt-5-nano` These estimates assume about 1,000 input characters and 250 output characters per row. Actual cost depends on token usage, so longer prompts or responses consume more AI Units per row. ## Syntax ```sql SELECT prompt('Write a poem about ducks'); -- returns a single-cell result with the response ``` ### Parameters | **Parameter** | **Required** | **Description** | |--------------------|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `prompt_text` | Yes | The text input to send to the model | | `model` | No | Model type: `'gpt-5'`, `'gpt-5-mini'`, `'gpt-5-nano'`, `'gpt-4o-mini'` (default), `'gpt-4o'`, `'gpt-4.1'`, `'gpt-4.1-mini'`, or `'gpt-4.1-nano'` | | `temperature` | No | Model temperature value between `0` and `1`, default: `0.1`. Lower values produce more deterministic outputs. **Not supported with GPT-5 models** (use `reasoning_effort` instead). | | `reasoning_effort` | No | Controls reasoning depth for GPT-5 models only. Valid values: `'minimal'` (default), `'low'`, `'medium'`, `'high'`. Higher effort may improve accuracy for complex tasks. **Only available for GPT-5 series models**. 
| | `return_type` | No | Specifies the exact SQL type to return (e.g., `'INTEGER'`, `'BOOLEAN'`, `'DATE'`, `'VARCHAR[]'`, `'STRUCT(name VARCHAR, age INTEGER)'`). Supports most DuckDB types including primitives, arrays, structs, and enums. Mutually exclusive with `struct` and `json_schema`. | | `struct` | No | Output schema as struct, e.g. `{summary: 'VARCHAR', persons: 'VARCHAR[]'}`. Will result in `STRUCT` output. Mutually exclusive with `return_type` and `json_schema`. | | `struct_descr` | No | Descriptions for struct fields that will be added to the model's context, e.g. `{summary: 'a 1 sentence summary of the text', persons: 'an array of all persons mentioned in the text'}` | | `json_schema` | No | A JSON schema that adheres to [OpenAI's structured output guide](https://developers.openai.com/api/docs/guides/structured-outputs). Provides more flexibility than the struct/struct_descr parameters. Will result in `JSON` output. Mutually exclusive with `return_type` and `struct`. | **Note**: The `return_type` and `struct` parameters support enum types for classification tasks. Define enum types first using `CREATE TYPE`, then reference them in the struct schema (e.g., `sentiment: 'sentiment_enum'` or `categories: 'category_enum[]'` for arrays). ### Return types The `prompt` function can return different data types depending on the parameters used: - Without structure parameters: Returns `VARCHAR` - With `return_type` parameter: Returns the exact SQL type specified (e.g., `INTEGER`, `BOOLEAN`, `DATE`, `VARCHAR[]`, `STRUCT(...)`) - With `struct` parameter: Returns a `STRUCT` with the specified schema - With `json_schema` parameter: Returns `JSON` **Note**: The `return_type`, `struct`, and `json_schema` parameters are mutually exclusive. Use only one at a time. 
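
The parameter rules above (one output option at a time, `temperature` for non-GPT-5 models, `reasoning_effort` for GPT-5 models) can be checked on the client before a query is sent. The following is a minimal Python sketch. `check_prompt_options` is a hypothetical helper that only encodes the rules stated in this reference, and it is not how MotherDuck itself validates calls.

```python
from typing import Any, List

GPT5_MODELS = {"gpt-5", "gpt-5-mini", "gpt-5-nano"}
REASONING_EFFORTS = {"minimal", "low", "medium", "high"}
OUTPUT_OPTIONS = ("return_type", "struct", "json_schema")


def check_prompt_options(model: str = "gpt-4o-mini", **options: Any) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    is_gpt5 = model in GPT5_MODELS

    # return_type, struct, and json_schema are mutually exclusive.
    chosen = [name for name in OUTPUT_OPTIONS if name in options]
    if len(chosen) > 1:
        problems.append("use only one of: " + ", ".join(chosen))

    # temperature applies to non-GPT-5 models and must lie between 0 and 1.
    if "temperature" in options:
        if is_gpt5:
            problems.append("temperature is not supported with GPT-5 models")
        elif not 0 <= options["temperature"] <= 1:
            problems.append("temperature must be between 0 and 1")

    # reasoning_effort applies to GPT-5 models only.
    if "reasoning_effort" in options:
        if not is_gpt5:
            problems.append("reasoning_effort requires a GPT-5 model")
        elif options["reasoning_effort"] not in REASONING_EFFORTS:
            problems.append("unknown reasoning_effort value")

    return problems


print(check_prompt_options("gpt-5", temperature=0.2))
print(check_prompt_options("gpt-4o", struct="...", json_schema="..."))
```

Returning a list instead of raising on the first problem lets a caller report every conflicting option at once.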
## Example usage ### Basic text generation ```sql -- Call gpt-4o-mini (default) to generate text SELECT prompt('Write a poem about ducks') AS response; -- Call gpt-4o with higher temperature for more creative outputs SELECT prompt('Write a poem about ducks', model:='gpt-4o', temperature:=1) AS response; ``` ### Structured output with struct ```sql -- Extract structured information from text using struct parameter SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.', struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'}, struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review; ``` This returns a `STRUCT` value that can be accessed with dot notation: ```sql SELECT zoo_review.summary, zoo_review.favourite_animals, zoo_review.star_rating FROM ( SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.', struct:={summary: 'VARCHAR', favourite_animals:'VARCHAR[]', star_rating:'INTEGER'}, struct_descr:={star_rating: 'visit rating on a scale from 1 (bad) to 5 (very good)'}) AS zoo_review ); ``` ### Structured output with JSON schema ```sql -- Extract structured information using JSON schema SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. 
The staff was friendly.', json_schema := '{ "name": "zoo_visit_review", "schema": { "type": "object", "properties": { "summary": { "type": "string" }, "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] }, "animals_seen": { "type": "array", "items": { "type": "string" } } }, "required": ["summary", "sentiment", "animals_seen"], "additionalProperties": false }, "strict": true }') AS json_review; ``` This returns a `JSON` value that, if saved, can be accessed using JSON extraction functions: ```sql SELECT json_extract_string(json_review, '$.summary') AS summary, json_extract_string(json_review, '$.sentiment') AS sentiment, json_extract(json_review, '$.animals_seen') AS animals_seen FROM ( SELECT prompt('My zoo visit was amazing, I saw elephants, tigers, and penguins. The staff was friendly.', json_schema := '{ ... }') AS json_review ); ``` ### Typed output with return type The `return_type` parameter lets you specify the exact SQL type for the model's response, providing strong typing for single-value extractions: ```sql -- Extract an integer from text SELECT prompt('The answer is 42', return_type := 'INTEGER') AS answer; -- Returns: 42 (as INTEGER type) -- Extract a boolean SELECT prompt('Is the sky blue?', return_type := 'BOOLEAN') AS is_blue; -- Returns: true (as BOOLEAN type) -- Extract a date SELECT prompt('When is January 15, 2025?', return_type := 'DATE') AS event_date; -- Returns: 2025-01-15 (as DATE type) -- Extract multiple structured fields SELECT prompt( 'John is 30 years old and lives in NYC', return_type := 'STRUCT(name VARCHAR, age INTEGER, city VARCHAR)' ) AS person_info; -- Returns: {'name': 'John', 'age': 30, 'city': 'NYC'} (as STRUCT type) -- Extract arrays SELECT prompt('List the days of the week', return_type := 'VARCHAR[]') AS weekdays; -- Returns: ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'] ``` The `return_type` parameter supports most DuckDB types including: - **Primitives**: 
`VARCHAR`, `INTEGER`, `BIGINT`, `DOUBLE`, `BOOLEAN`, `DATE`, `TIMESTAMP`, etc. - **Arrays**: `INTEGER[]`, `VARCHAR[]`, `DOUBLE[]`, etc. - **Structs**: `STRUCT(field1 TYPE1, field2 TYPE2, ...)` - **Enums**: Custom enum types created with `CREATE TYPE` ### GPT-5 reasoning effort The `reasoning_effort` parameter controls how much computational effort GPT-5 models spend on reasoning. This is only available for GPT-5 series models (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`): ```sql -- Use minimal reasoning (fastest, default) SELECT prompt('What is 2+2?', 'gpt-5-mini', reasoning_effort := 'minimal', return_type := 'INTEGER') AS result; -- Use low reasoning for simple tasks SELECT prompt('Count the letters in "hello"', 'gpt-5-nano', reasoning_effort := 'low', return_type := 'INTEGER') AS letter_count; -- Use medium reasoning for moderate complexity SELECT prompt('Calculate 5 factorial', 'gpt-5-mini', reasoning_effort := 'medium', return_type := 'INTEGER') AS factorial; -- Use high reasoning for complex tasks SELECT prompt('Solve this logic puzzle: ...', 'gpt-5', reasoning_effort := 'high') AS solution; ``` **Note**: The `reasoning_effort` parameter cannot be used with non-GPT-5 models, and `temperature` cannot be used with GPT-5 models. They are mutually exclusive ways of controlling model behavior. ## Use cases ### Text generation Use the prompt function to write a poem about ducks: ```sql --- Prompt LLM to write a poem about ducks SELECT prompt('Write a poem about ducks') AS response; ``` | **response** | |------------------------------------------------------------------------------------------------------------------| | 'Beneath the whispering willow trees, Where ripples dance with wayward breeze, A symphony of quacks arise [...]' | ### Summarization Use the prompt function to create a one-sentence summary of movie descriptions. 
The example is based on the sample movies dataset from [MotherDuck's sample data database](/docs/getting-started/interfaces/client-apis/python/query-data). ```sql --- Create a new table with summaries for the first 100 overview texts CREATE TABLE my_db.movies AS SELECT title, overview, prompt('Summarize this movie description in one sentence: ' || overview) AS summary FROM kaggle.movies LIMIT 100; ``` If write access to the source table is available, the summary column can also be added in place: ```sql --- Update the existing table to add new column for summaries ALTER TABLE my_db.movies ADD COLUMN summary VARCHAR; --- Populate the column with summaries UPDATE my_db.movies SET summary = prompt('Summarize this movie description in one sentence: ' || overview); ``` The movies table now contains a new column `summary` with one-sentence summaries of the movies: ```sql SELECT title, overview, summary FROM my_db.movies; ``` | **title** | **overview** | **summary** | |-----------|----------------------------------------------|------------------------------------------------------| | Toy Story | Led by Woody, Andy's toys live happily [...] | In "Toy Story," Woody's jealousy of the new [...] | | Jumanji | When siblings Judy and Peter discover [...] | In this thrilling adventure, siblings Judy and [...] | | ... | ... | ... | ### Structured data extraction Use the prompt function to extract structured data from text. The example is based on the same sample movies dataset from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). This time we aim to extract structured metadata from the movie's overview description. We are interested in the main characters mentioned in the descriptions, as well as the movie's genre and a rating of how much action the movie contains, given a scale of 1 (no action) to 5 (lot of action). For this, we make use of the `struct` and `struct_descr` parameters, which will result in structured output. 
```sql --- Update the existing table to add new column for structured metadata ALTER TABLE my_db.movies ADD COLUMN metadata STRUCT(main_characters VARCHAR[], genre VARCHAR, action INTEGER); --- Populate the column with structured information UPDATE my_db.movies SET metadata = prompt( overview, struct:={main_characters: 'VARCHAR[]', genre: 'VARCHAR', action: 'INTEGER'}, struct_descr:={ main_characters: 'an array of the main character names mentioned in the movie description', genre: 'the primary genre of the movie based on the description', action: 'rate on a scale from 1 (no action) to 5 (high action) how much action the movie contains' } ); ``` The resulting `metadata` field is a `STRUCT` that can be accessed as follows: ```sql SELECT title, overview, metadata.main_characters, metadata.genre, metadata.action FROM my_db.movies; ``` | **title** | **overview** | **metadata.main_characters** | **metadata.genre** | **action** | |-----------|----------------------------------------------|-------------------------------------------------------------------------|------------------------------|------------| | Toy Story | Led by Woody, Andy's toys live happily [...] | ['"Woody"', '"Buzz Lightyear"', '"Andy"', '"Mr. Potato Head"', '"Rex"'] | Animation, Adventure, Comedy | 3 | | Jumanji | When siblings Judy and Peter discover [...] | ['"Judy Shepherd"', '"Peter Shepherd"', '"Alan Parrish"'] | Adventure, Fantasy, Family | 4 | | ... | ... | ... | ... | ... | ### Classification with enums The `prompt` function supports enum types for classification tasks, ensuring consistent and constrained outputs. This is particularly useful for sentiment analysis, categorization, and other classification scenarios. 
#### Sentiment analysis ```sql -- Define an enum for sentiment classification CREATE TYPE sentiment_type AS ENUM ('positive', 'negative', 'neutral'); -- Classify customer reviews SELECT review_text, prompt( 'Classify the sentiment of this review: ' || review_text, struct := {sentiment: 'sentiment_type'} ).sentiment AS sentiment FROM ( VALUES ('The product is amazing, I love it!'), ('Terrible quality, waste of money.'), ('It works fine, nothing special.') ) AS reviews(review_text); ``` This returns: | **review_text** | **sentiment** | |-----------------|---------------| | The product is amazing, I love it! | positive | | Terrible quality, waste of money. | negative | | It works fine, nothing special. | neutral | #### Extracting multiple categories Use enum arrays to extract multiple instances of the same category from text: ```sql -- Define enums for different types of skills mentioned in text CREATE TYPE skill_type AS ENUM ('sql', 'python', 'javascript', 'react', 'aws', 'docker', 'git'); CREATE TYPE topic_type AS ENUM ('database', 'frontend', 'backend', 'devops', 'analytics', 'security'); -- Extract skills and topics from job descriptions SELECT description, prompt( 'Extract the technical skills and topics mentioned in this text: ' || description, struct := { skills: 'skill_type[]', topics: 'topic_type[]' } ) AS extracted FROM ( VALUES ('Looking for a developer with Python and SQL experience for database analytics work'), ('Frontend role using React and JavaScript, plus Git for version control'), ('DevOps engineer needed for AWS and Docker deployment automation') ) AS jobs(description); ``` This returns arrays of enum values: | **description** | **extracted.skills** | **extracted.topics** | |-----------------|---------------------|---------------------| | Looking for a developer with Python and SQL experience for database analytics work | ['python', 'sql'] | ['database', 'analytics'] | | Frontend role using React and JavaScript, plus Git for version control | 
['javascript', 'react', 'git'] | ['frontend'] | | DevOps engineer needed for AWS and Docker deployment automation | ['aws', 'docker'] | ['devops'] | ### Retrieval-augmented generation (RAG) The `prompt` function can be combined with [similarity search on embeddings](/sql-reference/motherduck-sql-reference/ai-functions/embedding/) to build a [RAG](https://motherduck.com/blog/search-using-duckdb-part-2/) pipeline. For advanced retrieval strategies including hybrid search, reranking, and HyDE, see the [Text Search guide](/key-tasks/ai-and-motherduck/text-search-in-motherduck/). ```sql -- Create a reusable macro for question answering CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE ( SELECT question_text AS question, prompt( 'User asks the following question:\n' || question_text || '\n\n' || 'Here is some additional information:\n' || STRING_AGG('Title: ' || title || '; Description: ' || overview, '\n') || '\n' || 'Please answer the question based only on the additional information provided.', model := 'gpt-4o' ) AS response FROM ( SELECT title, overview FROM kaggle.movies ORDER BY array_cosine_similarity(overview_embeddings, embedding(question_text)) DESC LIMIT 3 ) ); -- Use the macro to answer questions SELECT question, response FROM ask_question('Can you recommend some good sci-fi movies about AI?'); ``` This will result in the following output: | **question** | **response** | |-----------------------------------------------------|-----------------------------------------------------------------------------------| | Can you recommend some good sci-fi movies about AI? | Based on the information provided, here are some sci-fi movies about AI that you might enjoy: [...] | :::warning When passing free-text arguments from external sources to the prompt function (e.g., user questions in a RAG application), always use prepared statements to prevent SQL injection. 
:::

Using prepared statements in [Python](/docs/getting-started/interfaces/client-apis/python/query-data/):

```python
import duckdb

# Connect to MotherDuck (assumes a MotherDuck token is configured in the environment)
con = duckdb.connect("md:")

# First register the macro
con.execute("""
CREATE OR REPLACE TEMP MACRO ask_question(question_text) AS TABLE (
    -- Macro definition as above
);
""")

# Then use prepared statements for user input
user_query = "Can you recommend some good sci-fi movies about AI?"
result = con.execute("""
SELECT response FROM ask_question(?)
""", [user_query]).fetchall()[0]
print(result[0])
```

## Batch processing

The `prompt` function can process multiple rows in a single query:

```sql
--- Process multiple rows at once
SELECT title, prompt('Write a tagline for this movie: ' || overview) AS tagline
FROM kaggle.movies
LIMIT 10;
```

## Error handling

When usage limits have been reached or an unexpected error occurs while computing prompt responses, the function returns `NULL` for the affected rows instead of failing the entire query. To check whether all responses were computed successfully, look for `NULL` values in the resulting column. The queries below use the `summary` column of `my_db.movies` from the summarization example.

```sql
-- Check for NULL values in the summary column
SELECT count(*)
FROM my_db.movies
WHERE summary IS NULL AND overview IS NOT NULL;
```

Missing values can be filled in with a separate query:

```sql
-- Fill in missing prompt responses
UPDATE my_db.movies
SET summary = prompt('Summarize this movie description in one sentence: ' || overview)
WHERE summary IS NULL AND overview IS NOT NULL;
```

## Performance considerations

- **Batch processing**: When processing multiple rows, consider using `LIMIT` to control the number of API calls.
- **Model selection**: Use `gpt-4o-mini` for faster, less expensive responses when high accuracy isn't critical.
- **Caching**: Results are not cached between queries, so consider storing results in tables for repeated use.

## Notes

These capabilities are provided by MotherDuck's integration with Azure OpenAI. Inputs to the prompt function will be processed by Azure OpenAI.
For availability and usage limits, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). Usage limits are in place to safeguard your spend, not because of throughput limitations. MotherDuck has the capacity to handle high-volume embedding workloads and is always open to working alongside customers to support any type of workload and model requirements. If you need higher usage limits or have specific requirements, please see our [support page](/troubleshooting/support/). ### Regional processing Requests are processed based on the region of the MotherDuck organization according to the table below. Functions that are not available within the region (no checkmark) will be processed with global compute resources. | Function | Global | Europe | US West | |----------|--------|--------|---------| | `PROMPT` | ✓ | ✓ | ✓ | --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/index --- sidebar_position: 0 title: SQL Assistant --- # SQL assistant Built-in SQL functions that use AI to help you work with SQL. Generate SQL queries, execute read-only questions directly, fix errors, explain queries, and more. These functions can be useful building blocks for [AI-driven analytics solutions](/key-tasks/ai-and-motherduck/building-analytics-agents/) or used stand-alone on all MotherDuck surfaces (including the CLI). To use external tools like Claude Desktop or Cursor with MotherDuck, see the [MCP Server setup guide](/key-tasks/ai-and-motherduck/mcp-setup/) (or the [local MCP server](/key-tasks/ai-and-motherduck/mcp-setup/#remote-vs-local-mcp-server) for self-hosted, read-write use). ## Available functions ## Included pages - [PROMPT_QUERY](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-query): Answer natural language questions about your data using the PROMPT_QUERY function. 
- [PROMPT_SQL](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql): Generate SQL queries from natural language descriptions using the PROMPT_SQL function. - [PROMPT_EXPLAIN](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-explain): Get AI-generated explanations of SQL queries using the PROMPT_EXPLAIN function. - [PROMPT_FIX_LINE](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line): Fix SQL query errors line by line using the PROMPT_FIX_LINE function. - [PROMPT_FIXUP](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fixup): Automatically fix SQL query errors using the PROMPT_FIXUP function. - [PROMPT_SCHEMA](https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-schema): Describe database contents using the PROMPT_SCHEMA function for AI-generated schema summaries. ## Notes SQL assistant functions operate on your current database by evaluating the schemas and contents of the database. You can specify which tables and columns should be considered using the optional `include_tables` parameter. By default, all tables in the current database are considered. To point the SQL assistant functions at a specific database, execute the `USE database` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)). These capabilities are provided by MotherDuck's integration with Azure OpenAI. For availability and pricing, see [MotherDuck's Pricing Model](/about-motherduck/billing/pricing#motherduck-pricing-model). If you have further questions or specific requirements, please see our [support page](/troubleshooting/support/). ### Regional processing Requests are processed based on the region of the MotherDuck organization according to the table below. 
Functions that are not available within the region (no checkmark) will be processed with global compute resources.

| Function | Global | Europe | US West |
|----------|--------|--------|---------|
| SQL Assistant Functions | ✓ | ✓ | ✓ |

### Data usage

The data processed by MotherDuck's AI functionality is **not** used for model training.

---
Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-explain
---
sidebar_position: 0.9
title: PROMPT_EXPLAIN
description: Get AI-generated explanations of SQL queries using the PROMPT_EXPLAIN function.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';

## Explain a query

The `prompt_explain` table function allows MotherDuck AI to analyze and explain SQL queries in plain English. This feature helps you understand complex queries, verify that a query does what you intend, and learn SQL concepts through practical examples.

::::tip
This function is particularly useful for understanding queries written by others or for automatically documenting your own queries for future reference.
::::

### Syntax

```sql
CALL prompt_explain('<query>', [include_tables=['<table_name>', '<table_name>']]);
```

### Parameters

| **Parameter** | **Required** | **Description** |
|--------------------|--------------|-----------------|
| `query` | Yes | The SQL query to explain |
| `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database). Can also be a dictionary in the format `{'table_name': ['column1', 'column2']}` to specify which columns to include for each table. |

### Example usage

Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).
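When the call is assembled from application code, the query text has to be embedded as a SQL string literal, so any single quotes inside it must be doubled. The following is a minimal Python sketch of that assembly, assuming only standard SQL quoting and the list and dictionary forms of `include_tables` described in the parameter table. The helper names `sql_literal`, `render_include_tables`, and `build_prompt_explain_call` are hypothetical and not part of MotherDuck's API.

```python
def sql_literal(value: str) -> str:
    """Render a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_include_tables(include_tables) -> str:
    """Render a list of table names, or a {table: [columns]} dict, as a SQL literal."""
    if isinstance(include_tables, dict):
        entries = [
            f"{sql_literal(table)}: [{', '.join(sql_literal(c) for c in columns)}]"
            for table, columns in include_tables.items()
        ]
        return "{" + ", ".join(entries) + "}"
    return "[" + ", ".join(sql_literal(t) for t in include_tables) + "]"


def build_prompt_explain_call(query: str, include_tables=None) -> str:
    """Build a CALL prompt_explain(...) statement (hypothetical helper)."""
    call = f"CALL prompt_explain({sql_literal(query)}"
    if include_tables is not None:
        call += f", include_tables={render_include_tables(include_tables)}"
    return call + ");"


# Example: the quotes around '//' are doubled inside the generated literal
statement = build_prompt_explain_call(
    "SELECT SPLIT_PART(url, '//', 2) FROM hn.hacker_news",
    include_tables=["hn.hacker_news"],
)
print(statement)
```

The resulting string can then be passed to a database connection's `execute` method in the usual way.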
#### Explaining a complex query ```sql CALL prompt_explain(' SELECT COUNT(*) as domain_count, SUBSTRING(SPLIT_PART(url, ''//'', 2), 1, POSITION(''/'' IN SPLIT_PART(url, ''//'', 2)) - 1) as domain FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY domain_count DESC LIMIT 10; '); ``` **Output**: when you run a `prompt_explain` query, you'll receive a single-column table with a detailed explanation: | **explanation** | |-----------------| |The query retrieves the top 10 most frequent domains from the `url` field in the `hn.hacker_news` table. It counts the occurrences of each domain by extracting the domain part from the URL (after the '//' and before the next '/'), groups the results by domain, and orders them in descending order of their count. The result includes the count of occurrences (`domain_count`) and the domain name itself (`domain`). | #### Using dictionary format for include_tables You can specify which columns to include for each table using the dictionary format: ```sql CALL prompt_explain(' SELECT u.id, u.name, COUNT(s.id) AS story_count FROM hn.users u LEFT JOIN hn.stories s ON u.id = s.user_id GROUP BY u.id, u.name HAVING COUNT(s.id) > 5 ORDER BY story_count DESC LIMIT 20; ', include_tables={'hn.users': ['id', 'name'], 'hn.stories': ['id', 'user_id']}); ``` This approach allows you to focus the explanation on only the relevant columns, which can be helpful for tables with many columns. #### How it works The `prompt_explain` function processes your query in several steps: 1. **Parsing**: analyzes the SQL syntax to understand the query structure 2. **Schema analysis**: examines the referenced tables and columns to understand the data model 3. **Operation analysis**: identifies the operations being performed (filtering, joining, aggregating, etc.) 4. **Translation**: converts the technical SQL into a clear, human-readable explanation 5. 
**Context addition**: adds relevant context about the purpose and expected results of the query ### Best practices For the best results with `prompt_explain`: 1. **Provide complete queries**: include all parts of the query for the most accurate explanation 2. **Use table aliases consistently**: this helps the function understand table relationships 3. **Specify relevant tables**: use the `include_tables` parameter for large databases 4. **Review explanations**: verify that the explanation matches your understanding of the query 5. **Use for documentation**: save explanations as comments in your code for future reference --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fix-line --- sidebar_position: 0.9 title: PROMPT_FIX_LINE description: Fix SQL query errors line by line using the PROMPT_FIX_LINE function. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ## Fix your query line-by-line The `prompt_fix_line` table function allows MotherDuck AI to correct specific lines in your SQL queries that contain syntax or spelling errors. Unlike [`prompt_fixup`](../prompt-fixup), which rewrites the entire query, this function targets only the problematic lines, making it faster and more precise for localized errors. ::::tip This function is ideal for fixing minor syntax errors in large queries where you want to preserve most of the original query structure and formatting. 
::::

### Syntax

```sql
CALL prompt_fix_line('<query>', error='<error>', [include_tables=['<table_name>', '<table_name>']]);
```

### Parameters

| **Parameter** | **Required** | **Description** |
|--------------------|--------------|-----------------|
| `query` | Yes | The SQL query that needs correction |
| `error` | No | The error message from the SQL parser (helps identify the problematic line) |
| `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database) |

### Example usage

Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).

#### Fixing simple syntax errors

```sql
-- Fixing a misspelled keyword with error message
CALL prompt_fix_line('SEELECT COUNT(*) as domain_count FROM hn.hackers', error='
Parser Error: syntax error at or near "SEELECT"
LINE 1: SEELECT COUNT(*) as domain_count FROM h...
^'); -- Fixing a typo in a column name CALL prompt_fix_line('SELECT user_id, titlee, score FROM hn.stories LIMIT 10'); -- Fixing incorrect operator usage CALL prompt_fix_line('SELECT * FROM hn.stories WHERE score => 100'); ``` #### Fixing errors in multi-line queries ```sql -- Fixing a specific line in a complex query CALL prompt_fix_line('SELECT user_id, COUNT(*) AS post_count, AVG(scor) AS average_score FRUM hn.stories GROUP BY user_id ORDER BY post_count DESC LIMIT 10', error=' Parser Error: syntax error at or near "FRUM" LINE 5: FRUM hn.stories ^'); ``` ### Example output When you run a `prompt_fix_line` query, you'll receive a two-column table with the line number and corrected content: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 1 | SELECT COUNT(*) as domain_count FROM hn.hackers | For multi-line queries, only the problematic line is corrected: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 5 | FROM hn.stories | #### How it works The `prompt_fix_line` function processes your query in a targeted way: 1. **Error localization**: uses the error message (if provided) to identify the specific line with issues 2. **Context analysis**: examines surrounding lines to understand the query's structure and intent 3. **Targeted correction**: fixes only the problematic line while preserving the rest of the query 4. **Line replacement**: returns the corrected line with its line number for easy integration For example, when fixing a syntax error in a single line: ```sql CALL prompt_fix_line('SEELECT COUNT(*) as domain_count FROM hn.hackers', error=' Parser Error: syntax error at or near "SEELECT" LINE 1: SEELECT COUNT(*) as domain_count FROM h... 
^'); ``` The function will focus only on line 1, correcting the misspelled keyword: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 1 | SELECT COUNT(*) as domain_count FROM hn.hackers | For multi-line queries with an error on a specific line: ```sql CALL prompt_fix_line('SELECT user_id, COUNT(*) AS post_count, AVG(scor) AS average_score FRUM hn.stories GROUP BY user_id ORDER BY post_count DESC LIMIT 10', error=' Parser Error: syntax error at or near "FRUM" LINE 5: FRUM hn.stories ^'); ``` The function will only correct line 5, leaving the rest of the query untouched: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 5 | FROM hn.stories | This allows you to apply the fix by replacing just the problematic line in your original query, which is especially valuable for large, complex queries where a complete rewrite would be disruptive. When multiple errors exist, you would run `prompt_fix_line` multiple times, fixing one line at a time: ```sql -- First fix CALL prompt_fix_line('SELECT user_id, COUNT(*) AS post_count, AVG(scor) AS average_score FRUM hn.stories GROUP BY user_id ORDER BY post_count DESC LIMIT 10', error=' Parser Error: syntax error at or near "FRUM" LINE 5: FRUM hn.stories ^'); -- After applying the first fix, run again for the second error CALL prompt_fix_line('SELECT user_id, COUNT(*) AS post_count, AVG(scor) AS average_score FROM hn.stories GROUP BY user_id ORDER BY post_count DESC LIMIT 10', error=' Parser Error: column "scor" does not exist LINE 4: AVG(scor) AS average_score ^'); ``` The second call would return: | **line_number** | **line_content** | |-----------------|-------------------------------------------------| | 4 | AVG(score) AS average_score | Note: you need to run `prompt_fix_line` multiple times to fix all errors. ### Best practices For the best results with `prompt_fix_line`: 1. 
**Include the error message**: the parser error helps pinpoint the exact issue 2. **Preserve query structure**: use this function when you want to maintain most of your original query 3. **Fix one error at a time**: to address multiple errors, run `prompt_fix_line` multiple times 4. **Include context**: provide the complete query, not just the problematic line 5. **Be specific with table names**: use the `include_tables` parameter for large databases ### Limitations While `prompt_fix_line` is efficient, be aware of these limitations: - Only fixes syntax errors, not logical errors in query structure - Accurate error messages help identify the problematic line and improve output - May not be able to fix errors that span multiple lines - Cannot fix issues related to missing tables or columns in your database - Works best with standard SQL patterns and common table structures ### Troubleshooting If you're not getting the expected results: - Ensure you've included the complete error message - Check that the line numbers in the error message match your query - For complex errors, try using `prompt_fixup` instead - If multiple lines need fixing, address them one at a time - Verify that your database schema is accessible to the function --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-fixup --- sidebar_position: 0.9 title: PROMPT_FIXUP description: Automatically fix SQL query errors using the PROMPT_FIXUP function. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ## Fix up your query The `prompt_fixup` table function allows MotherDuck AI to correct and **completely rewrite** SQL queries that have logical or severe syntactical issues. This powerful feature analyzes your problematic query, identifies issues, and generates a corrected version that follows proper SQL syntax and semantics. 
::::tip
For minor syntax errors or typos in large queries, consider using the [`prompt_fix_line`](../prompt-fix-line) function instead, which is faster and more precise as it only rewrites the problematic line.
::::

### Syntax

```sql
CALL prompt_fixup('<query>', [include_tables=['<table_name>', '<table_name>']]);
```

### Parameters

| **Parameter** | **Required** | **Description** |
|--------------------|--------------|-----------------|
| `query` | Yes | The SQL query that needs correction |
| `include_tables` | No | Array of table names to consider for context (defaults to all tables in current database) |

### Example usage

Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets).

#### Fixing syntax errors

```sql
-- Fixing a misspelled keyword and an incorrect table name
CALL prompt_fixup('SEELECT COUNT(*) as domain_count FROM hn.hackers');

-- Fixing a misspelled clause (ODER BY)
CALL prompt_fixup('SELECT * FROM hn.stories WHERE score > 100 ODER BY score DESC');

-- Fixing missing clauses
CALL prompt_fixup('SELECT AVG(score) hn.hacker_news GROUP score > 10');
```

#### Fixing logical errors

```sql
-- Fixing incorrect join syntax
CALL prompt_fixup('SELECT u.name, s.title FROM hn.users u, hn.stories s WHERE u.id = s.user_id ORDER BY s.score');

-- Fixing aggregation issues
CALL prompt_fixup('SELECT user_id, AVG(score) FROM hn.stories GROUP BY score');

-- Fixing complex query structure
CALL prompt_fixup('SELECT COUNT(*) FROM hn.stories WHERE timestamp > "2020-01-01" AND timestamp < "2020-12-31" WITH score > 100');
```

### Example output

When you run a `prompt_fixup` query, you'll receive a single-column table with the corrected SQL:

| **query** |
|-----------------|
| SELECT COUNT(*) as domain_count FROM hn.hacker_news |

#### How it works

The `prompt_fixup` function processes your
query in several steps: 1. **Analysis**: examines your query to identify syntax errors, logical issues, and structural problems 2. **Schema validation**: checks your query against the database schema to ensure table and column references are valid 3. **Correction**: applies fixes based on the identified issues and your likely intent 4. **Rewriting**: generates a complete, corrected version of your query that maintains your original goal For example, when fixing this query with multiple issues: ```sql CALL prompt_fixup('SEELECT AVG(scor) FRUM hn.stories WERE timestamp > "2020-01-01" GRUP BY user_id'); ``` The function will: - Correct misspelled keywords (`SEELECT` → `SELECT`, `FRUM` → `FROM`, `WERE` → `WHERE`, `GRUP` → `GROUP`) - Fix column name typos (`scor` → `score`) - Ensure proper clause ordering and syntax Resulting in a properly formatted query: | **query** | |-----------------| | SELECT AVG(score) FROM hn.stories WHERE timestamp > '2020-01-01' GROUP BY user_id | For logical errors, the process is similar but focuses on semantic correctness: ```sql CALL prompt_fixup('SELECT user_id, AVG(score) FROM hn.stories GROUP BY score'); ``` Will be corrected to: | **query** | |-----------------| | SELECT user_id, AVG(score) FROM hn.stories GROUP BY user_id | The function recognized that grouping should be by `user_id` (the non-aggregated column) rather than by `score` (which is being averaged). ### Best practices For the best results with `prompt_fixup`: 1. **Include the entire query**: even if only part of it has issues 2. **Be specific with table names**: use the `include_tables` parameter for large databases 3. **Review the fixed query**: always check that the corrected query matches your intent 4. **Use for complex issues**: prefer this function for logical errors or major syntax problems 5. 
**Consider alternatives**: for simple typos, `prompt_fix_line` may be more efficient ### Limitations While `prompt_fixup` is powerful, be aware of these limitations: - May change query logic if the original intent isn't clear - Performance depends on the complexity of your query - Works best with standard SQL patterns and common table structures - May not preserve exact formatting or comments from the original query - Cannot fix issues related to missing tables or columns in your database ### Troubleshooting If you're not getting the expected results: - Check that you've included all relevant tables in the `include_tables` parameter - Ensure your database schema is accessible to the function - For very complex queries, try breaking them into smaller parts - If the fixed query doesn't match your intent, try providing more context in comments --- Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-query --- sidebar_position: 0.1 title: PROMPT_QUERY description: Answer natural language questions about your data using the PROMPT_QUERY function. --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import Admonition from '@theme/Admonition'; ## Answer questions about your data The `prompt_query` pragma allows you to ask questions about your data in natural language. This feature translates your plain English questions into SQL, executes the query, and returns the results. Under the hood, MotherDuck analyzes your database schema, generates appropriate SQL and executes the query on your behalf. This makes data exploration and analysis accessible to users of all technical levels. For comprehensive guidance on building analytics agents, including best practices and implementation patterns, see [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/). 
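Because the pragma takes the question as a SQL string literal, a question containing an apostrophe would end the literal early if passed through unescaped. A minimal Python sketch of building the statement safely, assuming only standard SQL quoting rules; `build_prompt_query` is a hypothetical helper name, not a MotherDuck function:

```python
def build_prompt_query(question: str) -> str:
    """Build a PRAGMA prompt_query statement from a free-text question.

    Single quotes are doubled so the question stays inside one SQL string literal.
    """
    escaped = question.replace("'", "''")
    return f"PRAGMA prompt_query('{escaped}');"


# Example: the apostrophe in "what's" is doubled in the generated statement
statement = build_prompt_query("what's the most shared domain on hacker_news?")
print(statement)
```

This only covers quoting. Questions from untrusted users still deserve the same care as any other external input.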
::::info
The `prompt_query` pragma is a read-only operation and does not allow queries that modify the database.
::::

### Syntax

```sql
PRAGMA prompt_query('<question>')
```

### Parameters

| **Parameter** | **Required** | **Description** |
|---------------|--------------|-----------------|
| `question` | Yes | The natural language question about your data |

### Example usage

Here are several examples using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news) from [MotherDuck's sample data database](/getting-started/sample-data-queries/datasets). `prompt_query` can be used to answer both simple and complex questions.

#### Basic questions

```sql
-- Find the most shared domains
PRAGMA prompt_query('what are the top domains being shared on hacker_news?');

-- Analyze posting patterns
PRAGMA prompt_query('what day of the week has the most posts?');

-- Identify trends
PRAGMA prompt_query('how has the number of posts changed over time?');
```

#### Complex questions

```sql
-- Multi-part analysis
PRAGMA prompt_query('what are the top 5 domains with the highest average score, and how many stories were posted from each?');

-- Time-based analysis
PRAGMA prompt_query('compare the average score of posts made during weekdays versus weekends');

-- Conditional filtering
PRAGMA prompt_query('which users have posted the most stories about artificial intelligence or machine learning?');
```

### Best practices

For the best results with `prompt_query`:

1. **Be specific**: clearly state what information you're looking for
2. **Provide context**: include relevant details about the data you want to analyze
3. **Use natural language**: phrase your questions as you would ask a data analyst
4. **Start simple**: begin with straightforward questions and build to more complex ones
5. **Refine iteratively**: if results aren't what you expected, try rephrasing your question

### Limitations

While `prompt_query` is powerful, be aware of these limitations:

- Only performs read operations (`SELECT` queries)
- Works best with well-structured data with clear column names
- Complex statistical analyses will likely require you (or an LLM) to write SQL
- Performance depends on the complexity of your question and database size
- May not understand highly domain-specific terminology unless you provide more context

### Troubleshooting

If you're not getting the expected results:

- Check that you're connected to the correct database
- Ensure your question is clear and specific
- Try rephrasing your question using different terms
- For complex analyses, break the question down into multiple simpler questions

---

Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-schema

---
sidebar_position: 0.9
title: PROMPT_SCHEMA
description: Describe database contents using the PROMPT_SCHEMA function for AI-generated schema summaries.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import Admonition from '@theme/Admonition';

## Describe contents of a database

The `prompt_schema` table function allows MotherDuck AI to analyze and describe the contents of your current database in plain English. This feature helps you understand the structure, purpose, and relationships between tables in your database without having to manually inspect each table's schema.

::::tip
This function is particularly useful when working with unfamiliar databases or when you need a high-level overview of a complex database structure.
::::

### Syntax

```sql
CALL prompt_schema([include_tables=['<table_name>', '<table_name>']]);
```

### Parameters

| **Parameter** | **Required** | **Description** |
|---------------|--------------|-----------------|
| `include_tables` | No | Array of table names to consider for analysis (defaults to all tables in current database) |

### Example usage

Here are several examples using MotherDuck's [sample data database](/getting-started/sample-data-queries/datasets).

#### Describing the entire database

```sql
CALL prompt_schema();
```

#### Example output

When you run a `prompt_schema` query, you'll receive a single-column table with a detailed description:

| **summary** |
|-----------------|
| The database contains tables related to ambient air quality data, Stack Overflow survey results, NYC taxi and service requests, rideshare data, movie information with embeddings, and Hacker News articles, capturing a wide range of information from environmental metrics to user-generated content and transportation data. |

#### Describing specific tables

```sql
CALL prompt_schema(include_tables=['hn.hacker_news', 'hn.stories']);
```

| **summary** |
|-----------------|
| The database contains information about Hacker News posts, including details such as the title, URL, content, author, score, time of posting, type of post, and various identifiers and status flags. |

#### How it works

The `prompt_schema` function processes your database in several steps:

1. **Schema extraction**: examines the structure of tables, including column names and data types
2. **Data sampling**: analyzes sample data to understand the content and purpose of each table
3. **Relationship detection**: identifies potential relationships between tables based on column names and values
4. **Domain recognition**: categorizes tables into domains or subject areas based on their content
5. **Summary generation**: creates a human-readable description of the database structure and purpose

### Best practices

For the best results with `prompt_schema`:

1. **Focus on relevant tables**: use the `include_tables` parameter to analyze specific parts of large databases
2. **Run on updated databases**: ensure your database is up-to-date for the most accurate description
3. **Use for documentation**: save the output as part of your database documentation
4. **Combine with other tools**: use alongside `DESCRIBE` and `SHOW` commands for complete understanding
5. **Share with team members**: use the output to help new team members understand the database structure

---

Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/ai-functions/sql-assistant/prompt-sql

---
sidebar_position: 0.8
title: PROMPT_SQL
description: Generate SQL queries from natural language descriptions using the PROMPT_SQL function.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Overview

The `prompt_sql` function allows you to generate SQL queries using natural language. Simply describe what you want to analyze in plain English, and MotherDuck AI will translate your request into a valid SQL query based on your database schema and content.

This function helps users who are less familiar with SQL syntax generate queries, and helps experienced SQL users save time when working with unfamiliar schemas.

For comprehensive guidance on building analytics agents, including best practices and implementation patterns, see [Building Analytics Agents with MotherDuck](/key-tasks/ai-and-motherduck/building-analytics-agents/).
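Because `prompt_sql` returns query text rather than results, one possible workflow is to generate the SQL, review it, and then run an edited version yourself. The sketch below assumes the sample Hacker News table `hn.hacker_news` is available in the current database. The second statement is a hand-written variant for illustration, not output from the function: it passes a capture-group index to DuckDB's `regexp_extract` so that only the host part of the URL is returned, and it adds a `LIMIT`.

```sql
-- Step 1: ask for SQL. The result is a single row containing query text.
CALL prompt_sql('what are the top domains being shared on hacker_news?');

-- Step 2: after reviewing the returned text, run your own edited version.
-- (Hand-written example; the generated query may look different.)
SELECT
    regexp_extract(url, 'https?://([^/]+)', 1) AS domain,  -- group 1 = host only
    COUNT(*) AS share_count
FROM hn.hacker_news
WHERE url IS NOT NULL
GROUP BY domain
ORDER BY share_count DESC
LIMIT 10;
```

Keeping the review step explicit makes it easier to catch details such as a missing filter or an unintended match before the query feeds a report.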
## Syntax

```sql
CALL prompt_sql('<natural language question>'[, include_tables=<array or map>]);
```

## Parameters

| Parameter | Type | Description | Required |
|-----------|------|-------------|----------|
| `natural language question` | STRING | Your query in plain English describing the data you want to analyze | Yes |
| `include_tables` | ARRAY or MAP | Specifies which tables and columns to consider for query generation. When not provided, all tables in the current database will be considered. | No |

### Include tables parameter

You can specify which tables and columns should be considered during SQL generation using the `include_tables` parameter. This is particularly useful when:

- You want to focus on specific tables in a large database
- You want to improve performance by reducing the schema analysis scope

The parameter accepts three formats:

1. **Array of table names**: include all columns from specified tables:

   ```sql
   include_tables=['table1', 'table2']
   ```

2. **Map of tables to columns**: include only specific columns from tables:

   ```sql
   include_tables={'table1': ['column1', 'column2'], 'table2': ['column3']}
   ```

3. **Map with column regex patterns**: include columns matching patterns:

   ```sql
   include_tables={'table1': ['column_prefix.*', 'exact_column']}
   ```

## Examples

### Basic example

Let's start with a simple example using MotherDuck's sample [Hacker News dataset](/getting-started/sample-data-queries/hacker-news):

```sql
CALL prompt_sql('what are the top domains being shared on hacker_news?');
```

Output:

| **query** |
|-----------------|
| SELECT regexp_extract(url, 'https?://([^/]+)') AS domain, COUNT(*) AS count FROM hn.hacker_news WHERE url IS NOT NULL GROUP BY domain ORDER BY count DESC; |

### Intermediate example

This example demonstrates how to generate a more complex query with filtering, aggregation, and time-based analysis:

```sql
CALL prompt_sql('Show me the average score of stories posted by each author who has posted at least 5 stories in 2022, sorted by average score');
```

Output:

| **query** |
|-----------------|
| SELECT "by", AVG(score) AS average_score FROM hn.hacker_news WHERE EXTRACT(YEAR FROM "timestamp") = 2022 GROUP BY "by" HAVING COUNT(id) >= 5 ORDER BY average_score; |

### Advanced example: analysis with specific columns

This example shows how to generate a query that focuses on specific columns:

```sql
CALL prompt_sql(
    'Find the top 10 users who submitted the most stories with the highest average scores in 2023',
    include_tables={
        'hn.hacker_news': ['id', 'by', 'score', 'timestamp', 'type', 'title']
    }
);
```

Output:

| **query** |
|-----------------|
| SELECT "by", AVG(score) AS avg_score, COUNT(*) AS story_count FROM hn.hacker_news WHERE "type" = 'story' AND EXTRACT(YEAR FROM "timestamp") = 2023 GROUP BY "by" ORDER BY story_count DESC, avg_score DESC LIMIT 10; |

### Expert example

This example demonstrates generating a complex query with subqueries, window functions, and complex logic:

```sql
CALL prompt_sql('For each month in 2022, show me the top 3 users who posted stories with the highest scores, and how their average score compares to the previous month');
```

Output:

| **query** |
|-----------------|
| WITH monthly_scores AS ( SELECT "by" AS user, DATE_TRUNC('month', "timestamp") AS month, AVG(score) AS avg_score FROM hn.hacker_news WHERE "type" = 'story' AND DATE_PART('year', "timestamp") = 2022 GROUP BY user, month ), ... |

### Failure example

This example shows that for some complex questions, the model might not generate a valid SQL query. In that case, the output is an error message:

```sql
CALL prompt_sql('Identify the most discussed technology topics in Hacker News stories from the past year based on title keywords, and show which days of the week have the highest engagement for each topic');
```

Output:

| **query** |
|-----------------|
| Invalid Input Error: The AI could not generate valid SQL. Try re-running the command or rephrasing your question. |

To generate a valid SQL query, you can try to break down the question into simpler parts.

## Best practices

1. **Be specific in your questions**: the more specific your natural language query, the more accurate the generated SQL will be.
2. **Start simple and iterate**: begin with basic queries and gradually add complexity as needed.
3. **Use the `include_tables` parameter**: when working with large databases, specify relevant tables to improve performance and accuracy.
4. **Review generated SQL**: always review the generated SQL before executing it, especially for complex queries.
5. **Understand your schema**: knowing your table structure helps you phrase questions that align with available data.
6. **Use domain-specific terminology**: include field names in your questions when possible.
7. **Provide context in your questions**: mention time periods, specific metrics, or business context to get more relevant results.

## Notes

- By default, all tables in the current database are considered. Use the `include_tables` parameter to narrow the scope.
- To target a specific database, first execute the `USE <database_name>` command ([learn more about switching databases](/key-tasks/database-operations/switching-the-current-database)).
- The quality of generated SQL depends on the clarity of your natural language question and the quality of your database schema (table and column names).
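The first two notes can be combined as in the following sketch. It assumes the sample data database is attached under the name `sample_data` (an assumption; use the name of your own database) and narrows the schema scope to a few columns of `hn.hacker_news` using the map form of `include_tables`.

```sql
-- Assumption: the sample data database is attached as sample_data.
USE sample_data;

-- Only the listed columns of hn.hacker_news are considered for SQL generation.
CALL prompt_sql(
    'how many stories were posted each year?',
    include_tables={'hn.hacker_news': ['id', 'type', 'timestamp']}
);
```

Limiting the scope this way trades some flexibility for a smaller schema to analyze, which the notes suggest can help both performance and accuracy on large databases.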
## Troubleshooting

If you encounter issues with the `prompt_sql` function, consider the following troubleshooting steps:

1. **Check your database schema**: ensure that the tables and columns you're querying are present in the current database.
2. **Be specific in your questions**: the more specific your natural language query, the more accurate the generated SQL will be.
3. **Use the `include_tables` parameter**: when working with large databases, specify relevant tables to improve performance and accuracy.

---

Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/alter-database-snapshot

---
sidebar_position: 1
title: ALTER DATABASE SET SNAPSHOT
description: Restore a database from a snapshot using ALTER DATABASE SET SNAPSHOT TO.
---

## Overview

`ALTER DATABASE ... SET SNAPSHOT TO` overwrites a target database with the contents of a selected snapshot. You can restore from a snapshot created by the same database or from another database you own. For background on snapshots and retention, see the [snapshots guide](/concepts/snapshots).

:::caution
This replaces the current contents of the target database. If you want to inspect a snapshot before overwriting, use `CREATE DATABASE ... FROM ...` to clone it first.
:::

## Syntax

```sql
ALTER DATABASE <target_database_name> SET SNAPSHOT TO (
    SNAPSHOT_ID '<uuid>' [, DATABASE_NAME '<source_database_name>']
  | SNAPSHOT_TIME '<timestamp>' [, DATABASE_NAME '<source_database_name>']
  | SNAPSHOT_NAME '<snapshot_name>'
);
```

## Options

| Option | Type | Description |
|--------|------|-------------|
| SNAPSHOT_ID | UUID | Restores to the snapshot with this ID. The source database is inferred unless `DATABASE_NAME` is provided. |
| SNAPSHOT_TIME | TIMESTAMP | Restores to the newest snapshot created at or before this timestamp. Uses the target database unless `DATABASE_NAME` is provided. |
| SNAPSHOT_NAME | STRING | Restores to a named snapshot. Only valid for snapshots created with `CREATE SNAPSHOT ...`. |
| DATABASE_NAME | STRING | Source database for `SNAPSHOT_ID` or `SNAPSHOT_TIME`. Not allowed with `SNAPSHOT_NAME`. |

## Notes

- Only one snapshot selector can be used per statement.
- `DATABASE_NAME` is only valid with `SNAPSHOT_ID` or `SNAPSHOT_TIME`. It is not allowed with `SNAPSHOT_NAME`.
- `SNAPSHOT_TIME` picks the newest snapshot created at or before the timestamp. Use UTC; the recommended format is `YYYY-MM-DD HH:MM:SS[.ffffff]`.
- Automatic and unnamed snapshots are only available if `snapshot_retention_days` is greater than 0. Named snapshots are retained until they are unnamed. See [`ALTER DATABASE`](/sql-reference/motherduck-sql-reference/alter-database).
- To list snapshots and their IDs, query [`MD_INFORMATION_SCHEMA.DATABASE_SNAPSHOTS`](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots).
- This statement applies to MotherDuck native storage databases. DuckLake databases do not support snapshot restore.

## Examples

Restore to a snapshot ID (source database inferred):

```sql
ALTER DATABASE my_db SET SNAPSHOT TO (SNAPSHOT_ID 'c204ce3b-f3fd-4677-8a05-e8680648cf27');
```

Restore to a snapshot ID from a specific source database (extra safety):

```sql
ALTER DATABASE my_db SET SNAPSHOT TO (
    DATABASE_NAME 'prod_db',
    SNAPSHOT_ID 'c204ce3b-f3fd-4677-8a05-e8680648cf27'
);
```

Restore to a snapshot by time from the same database:

```sql
ALTER DATABASE my_db SET SNAPSHOT TO (SNAPSHOT_TIME '2025-07-29 14:30:25');
```

Restore to a snapshot by time from another database:

```sql
ALTER DATABASE my_db SET SNAPSHOT TO (
    DATABASE_NAME 'prod_db',
    SNAPSHOT_TIME '2025-07-29 14:30:25'
);
```

Restore to a named snapshot:

```sql
ALTER DATABASE my_db SET SNAPSHOT TO (SNAPSHOT_NAME 'prod_backup');
```

---

Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/alter-database

---
sidebar_position: 1
title: ALTER DATABASE
description: Update storage-related settings on a MotherDuck database.
---

The `ALTER DATABASE` statement updates storage-related settings for an existing MotherDuck database.
## Syntax ```sql ALTER DATABASE SET
AS ...
```

Temporary tables can be created traditionally with column names and types, or with `CREATE TABLE ... AS SELECT` (CTAS).

### Shorthand Convention

The word `TEMP` can be used interchangeably with `TEMPORARY`.

## Example Usage

```sql
CREATE TEMPORARY TABLE flights AS FROM 'https://duckdb.org/data/flights.csv';
```

This creates a local table with data from the DuckDB `flights.csv` file.

## Notes

- Temporary tables in MotherDuck persist locally, not on the server. As such, local constraints should be considered when using them.
- Because they are bound to your session, temporary tables are no longer available once your session ends.

---

Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/undrop-database

---
sidebar_position: 1
title: UNDROP DATABASE
description: Restore a dropped MotherDuck database within its snapshot retention window.
---

The `UNDROP DATABASE` statement restores a previously dropped MotherDuck database and its snapshot history, as long as the drop is still within the database's snapshot retention window.

This statement applies to MotherDuck native storage databases (standard or transient). DuckLake databases do not support snapshots.

## Syntax

```sql
UNDROP DATABASE <database_name>
```

## Notes

- A dropped database can be undropped only while historical snapshots are still retained. The retention period is controlled by `SNAPSHOT_RETENTION_DAYS`. Use [`ALTER DATABASE`](/sql-reference/motherduck-sql-reference/alter-database) to change it.
- After undropping, the database is not automatically attached in your current session. Re-attach it with `ATTACH 'md:<database_name>'`, then `USE <database_name>` if needed.
- For retention defaults and plan limits, see the [data recovery guide](/concepts/data-recovery) and [storage lifecycle](/concepts/storage-lifecycle#storage-management).
## Examples

Drop a database, then undrop and attach it:

```sql
DROP DATABASE test_db;
UNDROP DATABASE test_db;
ATTACH 'md:test_db';
```

---

Source: https://motherduck.com/docs/sql-reference/motherduck-sql-reference/update-share

---
sidebar_position: 1
title: UPDATE SHARE
description: Manually update a share with a new database snapshot.
---

# UPDATE SHARE

Shares can be updated either manually or automatically by the share creator. All users of the share will automatically see share updates within 1 minute, containing both DDL (like `CREATE TABLE`) and DML (inserts, updates, or deletes) changes. These updates are transactionally consistent snapshots, that is, never partial database updates.

The share creator can have the share be automatically updated when the underlying database changes. This is done by specifying the `UPDATE AUTOMATIC` option during [share creation](create-share.md).

Alternatively, the share creator can manually update the share with a new point-in-time snapshot of the database. This is done by running the `UPDATE SHARE` command.

## Syntax

```sql
UPDATE SHARE <share_name>;
```

---

Source: https://motherduck.com/docs/sql-reference/postgres-endpoint

---
sidebar_position: 6
title: Postgres Endpoint
description: Connection parameters, SSL options, session settings, and limitations for the MotherDuck Postgres wire protocol endpoint
feature_stage: preview
---

MotherDuck's Postgres endpoint lets you query your databases using any client that speaks the [PostgreSQL wire protocol](https://www.postgresql.org/docs/current/protocol.html), without installing a DuckDB client library.

For a how-to guide on connecting, see [Connect through the Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint).
## Connection parameters

| Parameter | Value |
|-----------|-------|
| **Host** | `pg.<region>-aws.motherduck.com` (for example, `pg.us-east-1-aws.motherduck.com`) |
| **Port** | `5432` |
| **Database** | `md:` for your default database, or a specific database name |
| **User** | `postgres` |
| **Password** | Your [MotherDuck access token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck) |

## Connection string formats

```sh
# psql
PGPASSWORD=$MOTHERDUCK_TOKEN psql -h pg.us-east-1-aws.motherduck.com -p 5432 -U postgres "dbname=md: sslmode=verify-full sslrootcert=system"
```

```sh
# libpq URI
postgresql://postgres:$MOTHERDUCK_TOKEN@pg.us-east-1-aws.motherduck.com:5432/md:?sslmode=verify-full&sslrootcert=system
```

```sh
# DSN keyword/value
host=pg.us-east-1-aws.motherduck.com port=5432 dbname=md: user=postgres password=$MOTHERDUCK_TOKEN sslmode=verify-full sslrootcert=system
```

Use `md:` as the database name to connect to your default database. You can also specify a database by name, for example `sample_data`.

## SSL and certificate verification

The Postgres endpoint requires encrypted connections. For the best security, verify the server certificate.

### SSL modes

| Mode | Encryption | Server verification | Recommendation |
|------|-----------|-------------------|----------------|
| `verify-full` | Yes | Yes | Recommended for production |
| `require` | Yes | No | Fallback if certificate verification is not possible |

### Use the system certificate store (recommended)

Set `sslmode=verify-full` with `sslrootcert=system` to use your operating system's trusted root certificates. This is supported in libpq 16 and later, and in libraries that wrap libpq (like psycopg v3).

```sh
sslmode=verify-full sslrootcert=system
```

### Use a specific certificate file

If your client doesn't support `sslrootcert=system`, download the [ISRG Root X1](https://letsencrypt.org/certs/isrgrootx1.pem) certificate from Let's Encrypt and point your client to it:

```sh
sslmode=verify-full sslrootcert=/path/to/isrgrootx1.pem
```

### Library-specific SSL handling

Some libraries have their own SSL implementations that don't use libpq directly:

| Library | SSL behavior | Workaround |
|---------|-------------|------------|
| **psycopg (v3)** | Wraps libpq, so `sslrootcert=system` works | None needed |
| **psycopg2** | Bundles its own OpenSSL, so `sslrootcert=system` is not supported | Use `sslrootcert=certifi.where()` with the `certifi` package |
| **PostgreSQL JDBC** | Looks for `~/.postgresql/root.crt` by default | Set `sslfactory=org.postgresql.ssl.DefaultJavaSSLFactory` to use JVM truststore |
| **node-postgres (`pg`)** | Reads `sslrootcert` as a file path, so `system` causes `ENOENT` | Use config object: `ssl: { rejectUnauthorized: true }` |
| **Cloudflare Workers (`pg-cloudflare`)** | TLS is handled by the Workers runtime at the socket level; application-level verification settings are not exposed through the `pg` client | Use `?sslmode=require` in the connection string |

## Session options

You can pass DuckDB session options using the `PGOPTIONS` environment variable:

```bash
PGOPTIONS="--attach_mode=single --session_name=pg-using-options" psql -h pg.us-east-1-aws.motherduck.com -p 5432 -U postgres md:
```

| Option | Description |
|--------|-------------|
| `--attach_mode=single` | Only attach the specified database. Recommended when connecting from IDEs or BI tools to avoid seeing objects from other databases. See [Attach Modes](/key-tasks/authenticating-and-connecting-to-motherduck/attach-modes/). |

## Supported features and limitations

### DuckDB SQL, not PostgreSQL

The Postgres endpoint is a PostgreSQL-wire interface to MotherDuck. You write **DuckDB SQL**, not PostgreSQL SQL.

### Best suited for

- query execution against MotherDuck tables
- DDL and DML that run entirely inside MotherDuck
- metadata inspection
- server-side reads from remote storage

### Use a DuckDB client path instead when you need

- local-file workflows such as local-file `COPY`, `EXPORT DATABASE`, or `IMPORT DATABASE`
- local or in-memory attachments such as `ATTACH ':memory:'` or `ATTACH '/path/to/file.duckdb'`
- local execution paths such as `MD_RUN=LOCAL`
- extension-based workflows such as `INSTALL`, `LOAD`, or cloud-storage `CREATE SECRET`
- DuckDB-client session features such as `CREATE RESULT`

### Compatibility notes

- PostgreSQL-specific features such as `pg_*` functions, PostgreSQL indexes, sequences, and stored procedures are not supported.
- Transaction semantics follow the DuckDB model. Nested transactions are not supported.
- Some commands are further restricted in PG server mode. For example, `SET threads` and `CREATE TEMP TABLE` are not supported through the Postgres endpoint.

### Operational limitations

- **Configuration settings are restricted.** The Postgres endpoint connects in [SaaS mode](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-saas-mode), a MotherDuck mode that blocks most DuckDB configuration changes after connecting and disables installing or loading extensions. Avoid using `SET` statements in your client code.
- **IDE schema browsers may show extra objects.** Some IDEs display tables from all attached databases. Use `attach_mode=single` to scope the catalog to your target database.
- **Use connection pooling in production.** Each connection consumes server resources. For applications that need many connections, use a connection pooler (for example, [Cloudflare Hyperdrive](https://developers.cloudflare.com/hyperdrive/), PgBouncer) rather than rapidly opening and closing connections.
- **Third-party tool support is in early stages.** Check the [Integrations](/integrations/) page for tools that support the Postgres endpoint.

---

Source: https://motherduck.com/docs/sql-reference/rest-api/dashboards-create-embed-session

---
id: dashboards-create-embed-session
title: "Create a Dive embed session"
description: "Creates an embed session for the specified Dive."
sidebar_label: "Create a Dive embed session"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "post api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---

import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
import Translate from "@docusaurus/Translate";

Creates an embed session for the specified Dive.

Request

---

Source: https://motherduck.com/docs/sql-reference/rest-api/ducklings-get-active-accounts

---
id: ducklings-get-active-accounts
title: "Get active accounts"
description: "[Preview] Get active accounts in an organization along with active Ducklings per account. Requires 'Admin' role"
sidebar_label: "Get active accounts"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "get api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---

import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem";
import Heading from "@theme/Heading";
import Translate from "@docusaurus/Translate";

[Preview] Get active accounts in an organization along with active Ducklings per account. Requires 'Admin' role

---

Source: https://motherduck.com/docs/sql-reference/rest-api/ducklings-get-duckling-config-for-user

---
id: ducklings-get-duckling-config-for-user
title: "Get user Duckling configuration"
description: "Gets Duckling (instance) configuration for a user. Requires 'Admin' role."
sidebar_label: "Get user Duckling configuration"
hide_title: true
hide_table_of_contents: true
sidebar_class_name: "get api-method"
info_path: sql-reference/rest-api/motherduck-rest-api
custom_edit_url: null
---

import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint";
import ParamsDetails from "@theme/ParamsDetails";
import RequestSchema from "@theme/RequestSchema";
import StatusCodes from "@theme/StatusCodes";
import OperationTabs from "@theme/OperationTabs";
import TabItem from
"@theme/TabItem"; import Heading from "@theme/Heading"; import Translate from "@docusaurus/Translate"; Gets Duckling (instance) configuration for a user. Requires 'Admin' role. Request --- Source: https://motherduck.com/docs/sql-reference/rest-api/ducklings-set-duckling-config-for-user --- id: ducklings-set-duckling-config-for-user title: "Set user Duckling configuration" description: "Sets Duckling (instance) configuration for a user. Requires 'Admin' role" sidebar_label: "Set user Duckling configuration" hide_title: true hide_table_of_contents: true api: eJztmG1P4zgQx7+KlTcLUpeWXUCnvitQdJX2ym5o78VVVeUmbmtI7JztUNgo3/1mnIQ8NAdC9HRI7Js2D+PJzH9+cexJHBkxRQ2XYuQ7fcePvbuAi7X+rJm5zE8upFjx9ZVUU82U03F8pj3FIxwEQ26Y0aQwJQdcaEOFxw6JZ4fFmXeykopQEoOHI+Kyv2OumCafBn7IxSeiZMDAsaFr7fRnZRRwrfCnnXnH0cyLFTePYJQ4g9hspOI/aRbIbJ6CRUQVDZlhSlsbjhFG1GzAk4AbcIYh2MNmIniDbLnZcEHMhhGp1lQU7uHZ3oaF1OknjnmM0JE2CmKEOyF9+MbEGh7S/3J6+iq3acdRmRigvlExwxzwCtPmXPqP+Li6QccBXQ0TBm/RKAq4Z111bzU+LtmNUy5vmWcgzkhhsQ0HLeFuVp6X7RSj/mILqrOXbYtiLTT/ydqkYiIOscJRHGisAJr7VPlweBuHS4lqsjWFvzWHv3mK6crAl1uxgOJL4euKWw5CrC2SQBEP0fVZz9YjO/nt7KTXS2sazxoxzu1dyFB7FJH7f3JcBdK7azoEN8t6ctXczk7+c21qcc2bthUuGhLumOaowfXsjo6k0JmaX3o9/GtMKbEHL7xexQEpjJ29cf+L5188v5pnND5pQ/Wc+vZzBvP1/hANAX66bi1G/ekT+J4wpeDTWgyBcj3QMApYIzYrrv8ql9a+5m9wuXCHP6bDmwn641rHrFoiqhR9xE+2YaF+Q5bNyhSGSFY92IEg9qFErkgWDnxiqSFbplgxdfBlwOzqwxSZVZOa7XBQKmkFgO+x4aapZy7RATBx+IyEXMCDw2IF8fTQJK9FTdFOKUiDqkLomQXRkni8S+JI3AOuPrmAPIBBTgP97ohsi3EfZE7Hg+nk92t39Nfw8kOi2SJsiejxWxCtSVtltJ23Fla/7rI6FTRfvDP/3UFaC24fdF5du+ejy8vh+EOiWZWzZPLrW5gsBa0C2YCqhcSTXRLH0pArGYv3h2EZ2T4YHF9PFlfX0/HHnB6ftCwBPHkLgKWaVQCrLO3Qd9q2ehwBcErQgAzz7N4Xgo3w9sHhaDwZuuPBt8XN0P1z6C6Grnvtfkgm6+IWYJ6+bUHZLm/9s91ArkFqisYwjWI3MIqtztg+6zvd++Mu9rJ0NylaaGm37M5hc07dF323WAUwZGNMpPvdLo34UShBMYWdvSNPhk5a6ebdINVZFRs9vadaoqeiB4fnS0YVbAYxWnw33LJjNqxqUnS46nv+xo73aWO7u/vsteym/210dc/ba3eG4WIFbWI5Bn9YXbB/SlzYBpDB9xE4QyGz2h8f9Y56+IJEUpuQWlHyTuYNM7ahWnZfay3XJkZJObvstWmbVciwB9ONAsptU9PWP8nRmTn3x2Bo4YH/fqUDW+vubiBDtE6SJdVsqoI0xctQW
oUdXzi8p4rTpX3TYCrlGo8B1BWsPdkz2R7ksfuH5IUmb2syxYQkcDqCRW+MZ3B4xx6r/eQUZ5QN0AJgYnzZ7Yssis8TdFIO35nd004xYuB5LDLP2s4rL+n3Ke4dl3mzOMzmAUW32MWAXxuptKJk7S+8ljgBFes4mxEyl8gmrp8qL13+knWKA0yqVQsomLWYyDsmoGiFNAbPUZc0/QcB5muk sidebar_class_name: "put api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; import Translate from "@docusaurus/Translate"; ::::info This endpoint is used to configure user-specific settings, primarily for setting Duckling sizes for service accounts. For a complete walkthrough of service account management, see Create and configure service accounts. :::: ::::caution[Username Parameter] When configuring a service account, ensure the `username` in the path (`/v1/users/:username/instances`) is the specific username defined when creating the service account. The endpoint path uses `instances` for legacy reasons but configures Ducklings. :::: Sets Duckling (instance) configuration for a user. Requires 'Admin' role. ::::note Authentication for this endpoint requires an Admin token. This endpoint configures Duckling sizes and read scaling pool size. Use the token endpoint to create read scaling tokens. :::: Request --- Source: https://motherduck.com/docs/sql-reference/rest-api/motherduck-rest-api --- id: motherduck-rest-api title: "MotherDuck REST API" description: REST API reference for managing MotherDuck resources including databases, users, and access tokens. 
sidebar_label: Introduction sidebar_position: 0 hide_title: true custom_edit_url: null --- import ApiLogo from "@theme/ApiLogo"; import Admonition from '@theme/Admonition'; import Heading from "@theme/Heading"; import SchemaTabs from "@theme/SchemaTabs"; import TabItem from "@theme/TabItem"; import Export from "@theme/ApiExplorer/Export"; ::::warning[Preview Feature] The REST API methods are in 'Preview' and may change in the future. :::: To better support scenarios that need flexible or dynamic management of a MotherDuck Organization, we expose an OpenAPI endpoint. At the moment it enables limited management of users and tokens over HTTP, without requiring a running DuckDB + MotherDuck client. All methods are authenticated with a Read/Write token belonging to a user with the `Admin` role in your MotherDuck Organization; pass the token in the `Authorization` header with a value of `Bearer {TOKEN}`. ::::info[Service Account Management] You can use this REST API to programmatically manage service accounts, including their creation, token generation, and Duckling configuration. For a detailed walkthrough, see [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/). :::: If you would like to generate your own OpenAPI client, the spec file is located at https://api.motherduck.com/docs/specs ## Included pages - [Create a Dive embed session](https://motherduck.com/docs/sql-reference/rest-api/dashboards-create-embed-session): Creates an embed session for the specified Dive.
- [Create an access token for a user](https://motherduck.com/docs/sql-reference/rest-api/users-create-token): Create an access token for a user - [Create new user](https://motherduck.com/docs/sql-reference/rest-api/users-create-service-account): Create user is restricted to creating a user with a 'Member' role - [Delete a user](https://motherduck.com/docs/sql-reference/rest-api/users-delete): Permanently delete a user and all of their data. THIS CANNOT BE UNDONE - [Get active accounts](https://motherduck.com/docs/sql-reference/rest-api/ducklings-get-active-accounts): [Preview] Get active accounts in an organization along with active Ducklings per account. Requires 'Admin' role - [Get user Duckling configuration](https://motherduck.com/docs/sql-reference/rest-api/ducklings-get-duckling-config-for-user): Gets Duckling (instance) configuration for a user. Requires 'Admin' role. - [Invalidate a user access token](https://motherduck.com/docs/sql-reference/rest-api/users-delete-token): Invalidate a user access token - [List a user's access tokens](https://motherduck.com/docs/sql-reference/rest-api/users-list-tokens): List a user's access tokens - [Set user Duckling configuration](https://motherduck.com/docs/sql-reference/rest-api/ducklings-set-duckling-config-for-user): Sets Duckling (instance) configuration for a user. 
Requires 'Admin' role --- Source: https://motherduck.com/docs/sql-reference/rest-api/users-create-service-account --- id: users-create-service-account title: "Create new user" description: "Create user is restricted to creating a user with a 'Member' role" sidebar_label: "Create new user" hide_title: true hide_table_of_contents: true sidebar_class_name: "post api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from
"@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; import Translate from "@docusaurus/Translate"; ::::info This endpoint is used to create new users / service accounts. For a detailed guide, see [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/). :::: Create user is restricted to creating a user with a `Member` role. Request --- Source: https://motherduck.com/docs/sql-reference/rest-api/users-create-token --- id: users-create-token title: "Create an access token for a user" description: "Create an access token for a user" sidebar_label: "Create an access token for a user" hide_title: true hide_table_of_contents: true api: eJzlWN9v4jgQ/lesPO1KbIFtew+80ZbqkPZgL4V7OISQSVzwNrFzttMui/K/34ydkB/kdq+Ch0p9aRN7PJ75vm+MM3tPJkxRw6UYh97ASzVT+lOgGDVsJp+Y8DqeoRvtDRaewXftLTueZkGquNnB6N4bpmYrFf9hncDIMgOLhCoaMwPOrA2HCRgzW3AnYCLfyT52vJDpQPHErbcT5IWbLRfEbBmRakNF4R72DrYspt5g75ldgo60UVxsYCam378wsYFNBp+vr1/lNut4iv2TcsUABKNShjngCNPmRoY73K5u0PECKQwTBqdokkQ8sK663zRutz+OU66/scBAnIlCzA1n2s6aqGIk0njNFCbDBY/T2Btc9no2tfytf335Ww+H6tlZbgj7nnBHJoEsgSUpQo3JOcxbIOOigKzfBBCWWa8rt+h4MRMY0QKAoeHqBfSAXNoXHdAIbZYY5iNNI0CpapbV8F648JZZ5sZ1IoV24Hzu9fBfPdeHNAiY1o9pRApj73x0WNG3ZFuPgdoQiLOGqHnYtuZRqphi7mkKBv/JA0xY5tiKml9vPeMxIxW+mSawDHy4qg1XRr/KyQvVUAx8wwWNoh3JvbiSAMakiHYVf2spI0ZtzieroykDkx84gFUtm2oktW1zyVy1qeSGhsR3FXw+dcRAOt205tvAF04YppRUpFiCHNM4iVgjNiROhq9yae1r/oZ3K3/053z0MLNi1DplVRFQpShCByTE+oQsm4QVhsusGexQELspkY/EhQOHLjXkhSlWVC1fR4xAidjj2GZWTWpxJI8SSQsAHC6GmyaeOUQfQBMffwIhF6423W/KYdN9zkUN0U4JSENVBdALK0SrxP6xEsfiGQQfklvIAzTIaaTfnCLbYjyHMueT4Xz2+9Qf/z26e5fSbAG2lGj/FInWoK1qtF1vLVq9PNbqXND8OsfCNyfSWnDnUOf91L8Z392NJu9SmlU4S01enqLJEtCqIBuialHi1bESJ9KQe5mKtyfDMrJzaHAyna3up/PJ+zweD1iWArw6RYAlmlUBVrV0pL7rttvjGAQHn6cRGeXZvS0JNsI7hw7Hk9nInwy/rB5G/l8jfzXy/an/LjVZB7cQ5vVpF8p2eOs/2w3JNZR6hMqt/UIiFMCpfI1aDCjBdod1DwcvNn
YSaW+stgcz8LrP/a5t9XT3RR8m6+b9HWzvqOeic5OqCOy3xiR60O3ShF/EEhBWYRo8XQQy9rJKP+gBq8Cx3ugKHbhHT0UXx35PMqogUswOa8kvey6jEkPbI+kdWkcHcVe/QGv9BQAP+LBLc1L/sFHfQdTEh0s9GX4dgwNM04HZv+hd9HAhAhVTG3K+3f/BucbMIVnDvptuElFuv5ctlPucgoX33Id1lgT4P6i0w8o+2xZJA9P9fk01m6soy3AYIFLYe4PHZ6o4XVuFwxHGNT4D3Y9w52NHUR3OMO+Dn1fCR/KLdltrJsVBIPAYgMtmim/w+MR21c5ehpW8BVoAIozPTd+6KD7N0Em5/OhUzTrFiiHAnpif2i4rUv86tR9t67xvF7sCVPQFGwnw14YqLSpWqnZs70VUbFJXis4nahIvLtXuh1Nrp3jArFrBAMashe2hAmsFNnnDCCv6X01uY2M= sidebar_class_name: "post api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; import Translate from "@docusaurus/Translate"; Create an access token for a user :::note - **Token Creation Scope**: Through the API, you can only create tokens for: - Your own user account - Service accounts within your organization - Admins cannot create tokens for other non-service-account members through the API - **For service account token creation**, ensure you are using an **Admin token** for authentication. The token generated by this call will be the service account's own token for its operations. - For detailed guidance on service account token creation and best practices, see Create and configure service accounts. - If a service account is created through the admin API, connect to that service account with a read/write token before using read scaling tokens. - Each token is tied to a specific user. Use the exact `username` that was specified during service account creation in the path `/v1/users/:username/tokens`. - If the optional `ttl` parameter is not specified, the access token will remain valid indefinitely until revoked by an administrator. 
::: Request --- Source: https://motherduck.com/docs/sql-reference/rest-api/users-delete-token --- id: users-delete-token title: "Invalidate a user access token" description: "Invalidate a user access token" sidebar_label: "Invalidate a user access token" hide_title: true hide_table_of_contents: true sidebar_class_name: "delete api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs";
import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; import Translate from "@docusaurus/Translate"; Invalidate a user access token Request --- Source: https://motherduck.com/docs/sql-reference/rest-api/users-delete --- id: users-delete title: "Delete a user" description: "Permanently delete a user and all of their data. THIS CANNOT BE UNDONE" sidebar_label: "Delete a user" hide_title: true hide_table_of_contents: true sidebar_class_name: "delete api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from
"@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; import Translate from "@docusaurus/Translate"; Permanently delete a user and all of their data. THIS CANNOT BE UNDONE Request --- Source: https://motherduck.com/docs/sql-reference/rest-api/users-list-tokens --- id: users-list-tokens title: "List a user's access tokens" description: "List a user's access tokens" sidebar_label: "List a user's access tokens" hide_title: true hide_table_of_contents: true api: eJzlmE1v2zgQhv8Kocu2gDeO2+Tim1sruwZSu1XsPawRGLQ0sdlIpEpSTryG/vsOKcn6bIvAPgTIKQo1Mxy+85AW5+CIGCTVTPBJ4AydRIFUf4ZM6bl4BK6cnqPpRjnDpaOzgfueo8BPJNN7HD04o0RvhWT/2Rg4cp+iRUwljUBjLGvD8AWO6S2G4/gin8g+9pwAlC9ZnPnbF+SJ6S3jRG+BCLmhvAiPc/tbiKgzPDh6H5tASkvGN/gmos+3wDc4yfDD9fWLwqY9R8KPhElADbRMwKxBgooFV6DMZB8uL82fesy7xPdBqYckJIUx5uELroFrY07jOGS+naP/XRmfQ3sBYv0dfI2OsTS10CybMZe7tKNS0j2aMQ2R+r0/C7pEehAyotrokaBBWpSjaYgv4DlGPVZUd4Wp6zBnERCbL8m8FEE3jOFLoBqClVYvCvJEFdaHbRinYbgneZSsSjRYCR7uK/HWQoRAbRGt+yobb88HPIkMyDbIEwJsqmX/UT4Njc19WiNhaUSsraKaQW069Gy4FtsltW+uuvj5RAPioQsofT5uIiSSbjoFaAiOuwCkFJIULqboNIpDaORmKimCF4W09rV4o/HKc78t3Lu5iceUSuAEun+6ymYZCsP7tJnsiBM7KREPJEsHDwaqyRNIKPYzW4dAcM/YI8OurLqoZYuXUkkrAJ4imummnrlE75CJ97+QkPFss2bn3nHSQ16LmqK9UpAGVYXQSwuiJXHQJnHCd7gDAvIZ14EMMhqqV0dkV47nIHMxHS3mf8+8yb/u+E2i2SFsiejgFERr0lYZ7eatg9WPbVYXnOafHBC8OkhryZ2DzpuZ92kyHrvTN4lmVc6SyY+nMFkKWgWyAVUHiVdtEqdCkxuR8NeHYZnZORiczuarm9li+jaPx6OWJYBXpwBYqlkFsMpSi77rrq/HCQKHV6iQuPnqXheCjfTOweFkOne96eh2ded6/7jeyvW8mfcmmayLW4B5fdoHZbe89Z/tBnINUluq3DL83KXEXL7/wEuhvS4TXXQWIsAj13QdNmBrYhoEQ6e/G/RtG6J/KJoEaf/ogyO7oq2QyBDtt1rHatjv05hdRAKllUHiP174InLSSrPizuCflbvRsjgW3UQqWgz2ZglUgrQQGO2sZV6AL3aiMU5EPPwAJ6OvE/Q0mWULH1xcXlwaNGOhdETtLHnn49ea1PQ7ZqbhWffjkDJ7zbXrPuR6LZ3dAP2sYvh3WGmslB2bLWZhTA+HNVWwkGGammG8JEjTxcHHHZWMri2HeNAwZZ6xNA/4ZQatrI4njfPOy3l9T37TuOlcSbFdudms+EmYmP/w8RH21R5RavbbF
m/eWA2TX/Z6hNLFuuLYOvVM/Y+Q/eWae5L5ma82D7IS94oHE70zKVTOWti2GKpX5Gg1Ngmm6f8fcaOl sidebar_class_name: "get api-method" info_path: sql-reference/rest-api/motherduck-rest-api custom_edit_url: null --- import MethodEndpoint from "@theme/ApiExplorer/MethodEndpoint"; import ParamsDetails from "@theme/ParamsDetails"; import RequestSchema from "@theme/RequestSchema"; import StatusCodes from "@theme/StatusCodes"; import OperationTabs from "@theme/OperationTabs"; import TabItem from "@theme/TabItem"; import Heading from "@theme/Heading"; import Translate from "@docusaurus/Translate"; List a user's access tokens Request --- Source: https://motherduck.com/docs/sql-reference/sql-reference --- title: SQL reference sidebar_class_name: sql-reference-icon description: SQL reference for MotherDuck & DuckDB --- Complete SQL reference documentation for MotherDuck and DuckDB. This reference covers MotherDuck-specific SQL extensions, DuckDB's comprehensive SQL dialect, the Admin API for programmatic management, and the [remote MCP Server](/sql-reference/mcp/) for AI assistant integrations (and the [local MCP server](/sql-reference/mcp/#local-mcp-server) for self-hosted use). For practical examples and step-by-step instructions, see our [How-to Guides](/key-tasks/how-to-guides) and [Getting Started](/getting-started/) tutorials. ## Included pages - [MotherDuck REST API](https://motherduck.com/docs/sql-reference/rest-api/motherduck-rest-api): REST API reference for managing MotherDuck resources including databases, users, and access tokens. 
- [DuckDB SQL](https://motherduck.com/docs/sql-reference/duckdb-sql-reference): DuckDB SQL Reference - [MCP Server](https://motherduck.com/docs/sql-reference/mcp): Connect AI assistants to MotherDuck using the remote (fully managed) or local (fully customizable) MCP server - [MotherDuck SQL](https://motherduck.com/docs/sql-reference/motherduck-sql-reference): MotherDuck-specific SQL extensions and cloud database management - [Wasm Client](https://motherduck.com/docs/sql-reference/wasm-client): Connect browser applications to MotherDuck using the DuckDB WebAssembly client and Hybrid Query Execution. - [Postgres Endpoint](https://motherduck.com/docs/sql-reference/postgres-endpoint): Connection parameters, SSL options, session settings, and limitations for the MotherDuck Postgres wire protocol endpoint --- Source: https://motherduck.com/docs/sql-reference/wasm-client --- sidebar_position: 5 description: Connect browser applications to MotherDuck using the DuckDB WebAssembly client and Hybrid Query Execution. --- # MotherDuck Wasm client [MotherDuck](https://motherduck.com/) is a managed DuckDB-in-the-cloud service. [DuckDB Wasm](https://github.com/duckdb/duckdb-wasm) brings DuckDB to every browser thanks to WebAssembly. The MotherDuck Wasm Client library enables using MotherDuck through DuckDB Wasm in your own browser applications. ## Examples Example projects and live demos can be found in the [wasm-client GitHub repository](https://github.com/motherduckdb/wasm-client). ## Status The MotherDuck Wasm Client library is in an early stage of active development. Its structure and API may change considerably. We intend to align more closely with the DuckDB Wasm API in the future. ## DuckDB version support - The MotherDuck Wasm Client library uses the same version of DuckDB Wasm as the MotherDuck web UI. 
Since the DuckDB Wasm assets are fetched dynamically, and the MotherDuck web UI is updated weekly and adopts new DuckDB versions promptly, the DuckDB version used could change even without upgrading the MotherDuck Wasm Client library. Check `pragma version` to see which DuckDB version is in use. ## Installation `npm install @motherduck/wasm-client` ## Dependencies The MotherDuck Wasm Client library depends on `apache-arrow` as a peer dependency. If you use `npm` version 7 or later to install `@motherduck/wasm-client`, then `apache-arrow` will automatically be installed, if it is not already. If you already have `apache-arrow` installed, then `@motherduck/wasm-client` will use it, as long as it is a compatible version (`^14.0.x` at the time of this writing). Optionally, you can use a variant of `@motherduck/wasm-client` that bundles `apache-arrow` instead of relying on it as a peer dependency. Don't use this option if you are using `apache-arrow` elsewhere in your application, because different copies of this library don't work together. To use this version, change your imports to: ```ts import '@motherduck/wasm-client/with-arrow'; ``` instead of: ```ts import '@motherduck/wasm-client'; ``` ## Usage The MotherDuck Wasm Client library is written in TypeScript and exposes full TypeScript type definitions. These instructions assume you are using it from TypeScript. Once you have installed `@motherduck/wasm-client`, you can import the main class, `MDConnection`, as follows: ```ts import { MDConnection } from '@motherduck/wasm-client'; ``` ### Creating connections To create a `connection` to a MotherDuck-connected DuckDB instance, call the `create` static method: ```ts const connection = MDConnection.create({ mdToken: token }); ``` The `mdToken` parameter is required and should be set to a valid MotherDuck access token. You can create a MotherDuck access token in the MotherDuck UI. 
For more information, see [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck#authentication-using-an-access-token). The `create` call returns immediately, but starts the process of loading the DuckDB Wasm assets from `https://app.motherduck.com` and starting the DuckDB Wasm worker. This initialization process happens asynchronously. Any query evaluated before initialization is complete will be queued. To determine whether initialization is complete, call the `isInitialized` method, which returns a promise resolving to `true` when DuckDB Wasm is initialized: ```ts await connection.isInitialized(); ``` Multiple connections can be created. Connections share a DuckDB Wasm instance, so creating subsequent connections will not repeat the initialization process. Queries evaluated on different connections happen concurrently; queries evaluated on the same connection are queued sequentially. ### Evaluating queries To evaluate a query, call the `evaluateQuery` method on the `connection` object: ```ts try { const result = await connection.evaluateQuery(sql); console.log('query result', result); } catch (err) { console.log('query failed', err); } ``` The `evaluateQuery` method returns a [promise](https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/Promises) for the result. In an [async function](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/async_function), you can use the `await` syntax as above. Or, you can use the `then` and/or `catch` methods: ```ts connection.evaluateQuery(sql).then((result) => { console.log('query result', result); }).catch((reason) => { console.log('query failed', reason); }); ``` See [Results](#results) below for the structure of the result object. 
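
The `evaluateQuery` pattern above can be wrapped in a small helper that runs several statements on one connection and records each outcome instead of stopping at the first rejection. This is a minimal sketch, not part of the library: `QueryConnection`, `QueryOutcome`, and `runAll` are hypothetical names introduced for illustration, and the only library behavior assumed is that `evaluateQuery(sql)` returns a promise that rejects on failure, as described above.

```ts
// Minimal structural type for the one method this sketch relies on.
// The interface name is hypothetical; `evaluateQuery` is the method
// described in the text above.
interface QueryConnection {
  evaluateQuery(sql: string): Promise<unknown>;
}

// Outcome of one query: either the resolved result or the rejection reason.
type QueryOutcome =
  | { sql: string; ok: true; result: unknown }
  | { sql: string; ok: false; reason: unknown };

// Runs each statement in order on a single connection and records the
// outcome, so one failing query does not abort the rest of the batch.
async function runAll(
  connection: QueryConnection,
  statements: string[]
): Promise<QueryOutcome[]> {
  const outcomes: QueryOutcome[] = [];
  for (const sql of statements) {
    try {
      const result = await connection.evaluateQuery(sql);
      outcomes.push({ sql, ok: true, result });
    } catch (reason) {
      outcomes.push({ sql, ok: false, reason });
    }
  }
  return outcomes;
}
```

With the real client, a connection created through `MDConnection.create({ mdToken: token })` should be usable wherever this sketch expects a `QueryConnection`, assuming its `evaluateQuery` signature matches the one shown here.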
### Prepared statements To create a [prepared](https://duckdb.org/docs/api/c/prepared) [statement](https://duckdb.org/docs/api/wasm/query#prepared-statements) for later evaluation, use the `prepareQuery` method: ```ts const prepareResult = await connection.prepareQuery('SELECT v + ? FROM generate_series(0, 10000) AS t(v);'); ``` This returns an [AsyncPreparedStatement](https://shell.duckdb.org/docs/classes/index.AsyncPreparedStatement.html), which can be evaluated later using the `send` method: ```ts const arrowStream = await prepareResult.send(234); ``` Note: The `query` method of the AsyncPreparedStatement should not be used, because it can lead to deadlock when combined with the MotherDuck extension. To immediately evaluate a prepared statement, call the `evaluatePreparedStatement` method: ```ts const result = await connection.evaluatePreparedStatement('SELECT v + ? FROM generate_series(0, 10000) AS t(v);', [234]); ``` This returns a materialized result, as described in [Results](#results) below. ### Canceling queries To evaluate a query that can be canceled, use the `enqueueQuery` and `evaluateQueuedQuery` methods: ```ts const queryId = connection.enqueueQuery(sql); const result = await connection.evaluateQueuedQuery(queryId); ``` To cancel a query evaluated in this fashion, use the `cancelQuery` method, passing the `queryId` returned by `enqueueQuery`: ```ts const queryWasCanceled = await connection.cancelQuery(queryId); ``` The `cancelQuery` method returns a promise for a boolean indicating whether the query was successfully canceled. The result promise of a canceled query will be rejected with an error message. The `cancelQuery` method takes an optional second argument for controlling this message: ```ts const queryWasCanceled = await connection.cancelQuery(queryId, 'custom error message'); ``` ### Streaming results The query methods above return fully materialized results.
To evaluate a query and return a stream of results, use `evaluateStreamingQuery` or `evaluateStreamingPreparedStatement`: ```ts const result = await connection.evaluateStreamingQuery(sql); ``` See [Results](#results) below for the structure of the result object. ### Error handling The query result promises returned by `evaluateQuery`, `evaluatePreparedStatement`, `evaluateQueuedQuery`, and `evaluateStreamingQuery` will be rejected in the case of an error. For convenience, "safe" variants of these methods are provided that catch this error and always resolve to a value indicating success or failure. For example: ```ts const result = await connection.safeEvaluateQuery(sql); if (result.status === 'success') { console.log('rows', result.rows); } else { console.log('error', result.err); } ``` ### Results A successful query result may either be fully materialized, or it may contain a stream. Use the `type` property of the result object, which is either `'materialized'` or `'streaming'`, to distinguish these. #### Materialized results A materialized result contains a `data` property, which provides several methods for getting the results. The number of columns and rows in the result are available through the `columnCount` and `rowCount` properties of `data`. Column names and types can be retrieved using the `columnName(columnIndex)` and `columnType(columnIndex)` methods. Individual values can be accessed using the `value(columnIndex, rowIndex)` method. See below for details about the forms values can take. Several convenience methods also simplify common access patterns; see `singleValue()`, `columnNames()`, `deduplicatedColumnNames()`, and `toRows()`. The `toRows()` method is especially useful in many cases. It returns the result as an array of row objects. Each row object has one property per column, named after that column. (Multiple columns with the same name are deduplicated with suffixes.)
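
To make the relationship between the cell-level accessors and row objects concrete, the sketch below builds row objects by hand from `columnCount`, `rowCount`, `columnName(columnIndex)`, and `value(columnIndex, rowIndex)`. It is an illustration only: `ResultData` and `toRowObjects` are hypothetical names, the interface lists just the members named in the text, and unlike the library's `toRows()` this version does not deduplicate repeated column names.

```ts
// Structural subset of the `data` object described above. Only the members
// named in the text are declared; the interface name is hypothetical.
interface ResultData {
  columnCount: number;
  rowCount: number;
  columnName(columnIndex: number): string;
  value(columnIndex: number, rowIndex: number): unknown;
}

// Builds one plain object per row by walking the grid with
// value(columnIndex, rowIndex). This sketch does not deduplicate column
// names: a later column overwrites an earlier one with the same name.
function toRowObjects(data: ResultData): Array<Record<string, unknown>> {
  const rows: Array<Record<string, unknown>> = [];
  for (let r = 0; r < data.rowCount; r++) {
    const row: Record<string, unknown> = {};
    for (let c = 0; c < data.columnCount; c++) {
      row[data.columnName(c)] = data.value(c, r);
    }
    rows.push(row);
  }
  return rows;
}
```

In application code, calling `toRows()` directly is likely the better choice; the manual loop is mainly useful when only a few cells are needed or when a different row shape is wanted.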
The type of each column property of a row object depends on the type of the corresponding column in DuckDB. Many values are converted to a JavaScript primitive type, such as `boolean`, `number`, or `string`. Some numeric values too large to fit in a JavaScript `number` (e.g., a DuckDB [BIGINT](https://duckdb.org/docs/sql/data_types/numeric#integer-types)) are converted to a JavaScript `bigint`. Some DuckDB types, such as [DATE](https://duckdb.org/docs/sql/data_types/date), [TIME](https://duckdb.org/docs/sql/data_types/time), [TIMESTAMP](https://duckdb.org/docs/sql/data_types/timestamp), and [DECIMAL](https://duckdb.org/docs/sql/data_types/numeric#fixed-point-decimals), are converted to JavaScript objects implementing an interface specific to that type. Nested types such as DuckDB [LIST](https://duckdb.org/docs/sql/data_types/list), [MAP](https://duckdb.org/docs/sql/data_types/map), and [STRUCT](https://duckdb.org/docs/sql/data_types/struct) are also exposed through special JavaScript objects. These objects all implement `toString` to return a string representation. For primitive types, this representation is identical to DuckDB's string conversion (e.g., using [CAST](https://duckdb.org/docs/sql/expressions/cast.html) to VARCHAR). For nested types, the representation is equivalent to the syntax used to construct these types. They also have properties exposing the underlying value. For example, the object for a DuckDB TIME has a `microseconds` property (of type `bigint`). See the TypeScript type definitions for details. Note that these result types differ from those returned by DuckDB Wasm without the MotherDuck Wasm Client library. The MotherDuck Wasm Client library implements custom conversion logic to preserve the full range of some types. #### Streaming results A streaming result contains three ways to consume the results, `arrowStream`, `dataStream`, and `dataReader`.
The first two (`arrowStream` and `dataStream`) implement the async iterator protocol, and return items representing batches of rows, but return different kinds of batch objects. Batches correspond to DuckDB DataChunks, which are no more than 2048 rows. The third (`dataReader`) wraps `dataStream` and makes consuming multiple batches easier. The `dataStream` iterator returns a sequence of `data` objects, each of which implements the same interface as the `data` property of a materialized query result, described above. The `dataReader` implements the same `data` interface, but also adds useful methods such as `readAll` and `readUntil`, which can be used to read at least a given number of rows, possibly across multiple batches. The `arrowStream` property provides access to the underlying Arrow RecordBatch stream reader. This can be useful if you need the underlying Arrow representation. Also, this stream has convenience methods such as `readAll` to materialize all batches. Note, however, that Arrow sometimes performs a lossy conversion of the underlying data to JavaScript types for certain DuckDB types, especially dates, times, and decimals. Also, converting Arrow values to strings will not always match DuckDB's string conversion. Note that results of remote queries are not streamed end-to-end yet. Results of remote queries are fully materialized on the client upstream of this API. So the first batch will not be returned from this API until all results have been received by the client. End-to-end streaming of remote query results is on our roadmap. ### DuckDB Wasm API To access the underlying DuckDB Wasm instance, use the `getAsyncDuckDb` function. Note that this function returns (a Promise to) a singleton instance of DuckDB Wasm also used by the MotherDuck Wasm Client.
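
Returning to the streaming results described earlier: because `dataStream` implements the async iterator protocol, it can be consumed with a `for await` loop. The sketch below counts rows batch by batch and stops early once a limit is reached. `Batch` and `countRows` are hypothetical names, and the only assumption about each batch is the `rowCount` property that the `data` interface is described as exposing.

```ts
// Each batch from `dataStream` is described as implementing the same
// interface as the `data` property of a materialized result; only
// `rowCount` is needed here. The interface name is hypothetical.
interface Batch {
  rowCount: number;
}

// Iterates the stream with for-await and stops once at least `limit` rows
// have been seen, returning the number of rows counted so far.
async function countRows(
  stream: AsyncIterable<Batch>,
  limit: number = Infinity
): Promise<number> {
  let total = 0;
  for await (const batch of stream) {
    total += batch.rowCount;
    if (total >= limit) {
      break;
    }
  }
  return total;
}
```

With a streaming result, this would be called as `countRows(result.dataStream, 5000)`. Since batches hold at most 2048 rows, the count can overshoot the limit by up to one batch. The `readUntil` method on `dataReader` may be the more direct option when the rows themselves are needed.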
--- Source: https://motherduck.com/docs/troubleshooting/aws-s3-secrets --- sidebar_position: 5 title: Troubleshoot AWS S3 secrets description: Diagnose and fix AWS S3 credential issues including IAM policies, credential chains, and secret configuration. keywords: - AWS S3 - secrets - authentication - credentials - troubleshooting - IAM policy - credential chain ---

# Troubleshoot AWS S3 secrets

This page is for troubleshooting help with AWS S3 secrets in MotherDuck. For more information on creating a secret, see: [Create Secret](/documentation/sql-reference/motherduck-sql-reference/create-secret.md).

## Prerequisites

Before troubleshooting AWS S3 secrets, ensure you have:

- **Required**: [A valid MotherDuck Token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md#creating-an-access-token) with access to the target database
- **Required**: [AWS credentials](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-files.html) (access keys, SSO, or IAM role)
- **Optional**: [DuckDB](https://duckdb.org/docs/stable/clients/cli/overview.html) CLI (for troubleshooting purposes, though any DuckDB client will work)
- **Optional**: [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html) (for bucket access verification)

:::note
**AWS CLI PATH**: If you installed AWS CLI manually, you may need to add it to your system PATH. Package managers like Homebrew (macOS) typically add it to PATH automatically. Verify with `which aws` (macOS/Linux) or `where aws` (Windows) - if it returns a path, you're all set!
:::

## Verify secret access

### Check that the secret is configured

First, make sure you're connected to MotherDuck:

```sql
-- Connect to MotherDuck (replace 'your_db' with your database name)
ATTACH 'md:your_db';
```

Then type in the following:

```sql
.mode line
SELECT secret_string, storage FROM duckdb_secrets();
```

The output should look something like this.
Verify that the output string includes values for `key_id`, `region`, and `session_token`:

```text
secret_string = name=aws_sso;type=s3;provider=credential_chain;serializable=true;scope=s3://,s3n://,s3a://;endpoint=s3.amazonaws.com;key_id=;region=us-east-1;secret=;session_token=
```

:::note
If you see no results, it means no secrets are configured. You'll need to create a secret first using [CREATE SECRET](/documentation/sql-reference/motherduck-sql-reference/create-secret.md).
:::

If your output is missing a value for `key_id`, `region`, or `session_token`, you can recreate your secret by following the directions for [CREATE OR REPLACE SECRET](/documentation/sql-reference/motherduck-sql-reference/create-secret.md).

If the output looks correct, you can confirm you have access to your AWS bucket by running these commands **in your terminal** (not in DuckDB):

```bash
# Log into AWS by running:
aws sso login

# Check bucket access:
aws s3 ls
```

**Example Output:**

```text
                           PRE lambda-deployments/
                           PRE raw/
                           PRE ducklake/
2025-05-29 07:03:26   14695690 sample-data.csv
```

:::note
**Understanding the output**: `PRE` indicates folders/prefixes, while files show their size and modification date. If you only see `PRE` entries, your bucket contains organized data in folders. To explore deeper, use `aws s3 ls s3:////` or `aws s3 ls s3:/// --recursive` to see all files.
:::

## Configure permissions in AWS

This is an example of an IAM policy that will allow MotherDuck to access your S3 bucket. Note: if you use KMS keys, the IAM policy should also have `kms:Decrypt` in `AllowBucketListingAndLocation`.
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowBucketListingAndLocation",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::your_bucket_name"
            ]
        },
        {
            "Sid": "AllowObjectRead",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::your_bucket_name/*"
            ]
        }
    ]
}
```

## AWS credential chain

MotherDuck automatically finds your AWS credentials using AWS's credential chain. This is the recommended approach, as it uses short-lived credentials (typically valid for 1 hour), which are more secure and reduce the risk of credential leakage. For most users, it works seamlessly with your existing AWS setup.

### Most common: AWS SSO

If you use AWS SSO, first set up an SSO profile (if you haven't already):

```bash
aws configure sso
```

Then refresh your SSO token:

```bash
aws sso login --profile
```

Create a secret using the `sso` chain with your profile name:

```sql
CREATE OR REPLACE SECRET my_secret IN MOTHERDUCK (
    TYPE s3,
    PROVIDER credential_chain,
    CHAIN 'sso',
    PROFILE ''
);
```

:::note Secret validation
Starting with DuckDB v1.4.0, credentials are validated at secret creation time. If your credentials are not resolvable locally (for example, expired SSO tokens or missing `~/.aws/credentials`), the `CREATE SECRET` command will fail with a `Secret Validation Failure` error.

The recommended fix is to use the correct `CHAIN` and `PROFILE` for your credential type (see the SSO example above) and confirm your SSO session is active. If you need to bypass local validation, you can add `VALIDATION 'none'`, but keep in mind that this skips the local check that confirms your credentials are valid before storing them in MotherDuck.
:::

### Other credential types

The credential chain also works with:

- **Access keys** stored in `~/.aws/credentials`
- **IAM roles** (if running on EC2)
- **Environment variables**

### Advanced: role assumption

:::note
**Only needed for**: Cross-account access, elevated permissions, or when you need to assume a different role than your current profile.
:::

If you need to assume a specific IAM role, create a profile in `~/.aws/config`:

```ini
[profile my_motherduck_role]
role_arn = arn:aws:iam::your_account_id:role/your_role_name
source_profile = your_source_profile
```

Then create a secret that uses this profile:

```sql
CREATE SECRET my_s3_secret (
    TYPE S3,
    PROVIDER credential_chain,
    PROFILE 'my_motherduck_role',
    REGION 'us-east-1' -- Use your bucket's region if different
);
```

## Common challenges

### Scope

When using multiple secrets, the `SCOPE` parameter ensures MotherDuck knows which secret to use. You can validate which secret is being used with the `which_secret` function:

```sql
SELECT * FROM which_secret('s3://my-bucket/file.parquet', 's3');
```

### Periods in bucket name (url_style = path)

Because of SSL certificate verification requirements, S3 bucket names that contain dots (.) cannot be accessed using virtual-hosted style URLs. This is due to AWS's SSL wildcard certificate (`*.s3.amazonaws.com`) which only validates single-level subdomains.

If your bucket name contains dots, you have two options:

1. **Rename your bucket** to remove dots (e.g., use dashes instead)
2. **Use path-style URLs** by adding the `URL_STYLE 'path'` option to your secret:

   ```sql
   CREATE OR REPLACE SECRET my_secret (
       TYPE s3,
       URL_STYLE 'path',
       SCOPE 's3://my.bucket.with.dots'
   );
   ```

For more information, see [Amazon S3 Virtual Hosting documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html).
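
To illustrate why dots in a bucket name break virtual-hosted URLs, here is a minimal Python sketch of the wildcard rule described above. It assumes the usual TLS hostname-matching behaviour, in which `*` stands for exactly one left-most DNS label. The helper names `wildcard_matches` and `needs_path_style` are hypothetical and are not part of any MotherDuck, DuckDB, or AWS API.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Check a hostname against a certificate pattern such as '*.s3.amazonaws.com'.

    Assumption: '*' covers exactly one left-most label and never spans a dot.
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    # A single-label wildcard can only match a hostname with the same label count.
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The first label may be any non-empty string; the rest must match exactly.
        return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def needs_path_style(bucket: str, cert_pattern: str = "*.s3.amazonaws.com") -> bool:
    """Return True when the virtual-hosted hostname would fail wildcard matching."""
    virtual_host = f"{bucket}.s3.amazonaws.com"
    return not wildcard_matches(cert_pattern, virtual_host)


if __name__ == "__main__":
    for bucket in ("my-bucket", "my.bucket.with.dots"):
        print(bucket, "-> path style needed:", needs_path_style(bucket))
```

A bucket such as `my.bucket.with.dots` produces a hostname with extra labels, so the single-label wildcard cannot cover it. That is the case where `URL_STYLE 'path'` is the practical option.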
## What's Next After resolving your AWS S3 secret issues: - **[Query your S3 data](/key-tasks/cloud-storage/querying-s3-files.md)** - Learn how to query files stored in S3 - **[Load data into MotherDuck](/key-tasks/loading-data-into-motherduck/)** - Set up data loading workflows - **[Configure additional cloud storage](/integrations/cloud-storage/)** - Set up Azure, Google Cloud, or other providers - **[Share data with your team](/key-tasks/sharing-data/)** - Collaborate using MotherDuck's sharing features --- Source: https://motherduck.com/docs/troubleshooting/error_messages --- sidebar_position: 1 title: Error messages description: Common MotherDuck error messages and their solutions, including connection and configuration errors. --- ## Connection errors ### Disallowed connections with a different configuration If you create different connections with the same connection database path (such as `md:my_db`) but a different configuration dictionary, you may encounter the following error: ```text Connection Error: Can't open a connection to same database file with a different configuration than existing connections ``` This validation error prevents accidental retrieval of a previously cached database connection, and can happen only in DuckDB APIs that make use of a [database instance cache](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#multiple-connections-and-the-database-instance-cache). In file-based DuckDB, this can only happen when the previous connection is still in scope. With MotherDuck, the database instance cache is longer lived, so you may see this error even after the previous connections have been closed. #### How to recover For multiple connections that are used sequentially: * If the configuration does not need to differ, consider unifying it, which will allow the same underlying client-side database instance to be reused. 
* If the configuration differs intentionally, [set the database instance TTL to zero](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#setting-custom-database-instance-cache-time-ttl) and close the previous connections.

For multiple connections whose life cycles need to overlap, add a differentiating suffix to the connection string, so that these connections are no longer considered to be backed by the same database. A good differentiating string is the [`session_name`](/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck/#session-names). While it is meant to associate an individual end user with a dedicated backend when used with read scaling tokens, it can also be used to signal client-side intent for a distinct database instance when used with regular tokens.

--- Source: https://motherduck.com/docs/troubleshooting/faq --- sidebar_position: 1 title: FAQ description: Frequently asked questions about MotherDuck including DuckDB versions, connection methods, and common issues. keywords: - MotherDuck version - open vs attach - database connection - WAL file - database cache - compatibility ---

import Versions from '@site/src/components/Versions';

### What's the difference between .open md: & ATTACH 'md:' ?

`.open` initiates a new database connection (to a given database, or `my_db` by default) and accepts connection-string parameters such as `motherduck_token` or the [saas_mode](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/authenticating-to-motherduck.md#authentication-using-saas-mode) flag. If you have a local database attached, it will be detached when you use `.open`.

`ATTACH` keeps the current database connection and attaches one or more MotherDuck (cloud) databases to it. You'll need to use `USE` to select the database you want to query.

### How do I know which version of DuckDB I should be running ?
MotherDuck supports DuckDB . - In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through . - In **US West (Oregon) -** `us-west-2`, MotherDuck is compatible with client versions through . - In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck is compatible with client versions through . Please check that you have a compatible version of DuckDB running locally. ### How do I know which version of DuckDB am I running? You can use the `VERSION` pragma to find out which version of DuckDB you are running ```sql PRAGMA VERSION; ``` ### How do I know what's executed locally and what's executed remote ? If you run an [EXPLAIN](/sql-reference/motherduck-sql-reference/explain/) on your query, you will see the physical plan. Each operation is followed by either (L)= Local or (R)= Remote as shown in the query plan example below. More information can be found in the [documentation](/sql-reference/motherduck-sql-reference/explain/). ```sql EXPLAIN [Your Query] ``` ![explain-sample](./img/explain_sample.png) ### I connect to both MotherDuck and a local database, why is there an uncheckpointed WAL left behind? DuckDB keeps a [database instance cache](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#multiple-connections-and-the-database-instance-cache) for each unique connection path. Connecting to MotherDuck extends the lifetime of the database instance to a default of 15 minutes. If you observe a WAL file left behind for the local database after the process exits or run into the "File is already open" error when closing and reopening the connection, there are several workarounds: * Run `CHECKPOINT "local-database-name"` in the application code. 
* Run `DETACH "local-database-name"` in the application code.
* Disable the cache lifetime extension by setting `motherduck_dbinstance_inactivity_ttl` to `0s` (see [Setting Custom Database Instance Cache TTL](/documentation/key-tasks/authenticating-and-connecting-to-motherduck/connecting-to-motherduck.md#setting-custom-database-instance-cache-time-ttl)).

### Why am I not in the same Organization as my team?

If you sign up to MotherDuck directly, you will create your own Organization as a part of the sign up flow. To join your team's Organization, reach out to your team and request that they [invite you to their Organization](../key-tasks/managing-organizations/managing-organizations.mdx#inviting-users-to-your-organization). As an alternative, you may reach out to [MotherDuck support](./support.md) and we can search for other users within your domain.

### How do I use my team's shared databases?

Some database shares are scoped at the `ORGANIZATION` level. To use those shares, you must be in the same Organization as the person who created the share. In addition, some shares are marked as `DISCOVERABLE`. This allows members of the same Organization to find those shares through the UI. Follow the steps outlined in ["Why am I not in the same Organization as my team?"](#why-am-i-not-in-the-same-organization-as-my-team) to join your team!

### How do I delete my account?

You can delete your account and all associated information by following these steps:

1. Navigate to your personal Settings and select "Members" from the left sidebar
2. Click the three dots (⋮) next to your name
3. Select "Delete"
4. Confirm the account deletion

:::note
If you are the only member of your Organization, deleting your account will also delete the Organization.
:::

For additional assistance, please contact our [support team](./support.md).

### Why I am getting SSL errors when connecting to MotherDuck from a Docker image?
If you see SSL errors when trying to connect to MotherDuck from a Docker image, this is likely because the image does not have updated CA certificates. If the container was working and suddenly stopped, it is likely that the certificates in the image have expired. Please refer to [Docker's documentation](https://docs.docker.com/engine/network/ca-certs/) for best practices on updating CA certificates in Docker images. Some common errors you might see indicating an issue with your CA certificates include: * `Could not get default pem root certs.` * `Failed to create security handshaker.` * `Update handshaker factory failed.` ### Why don't COPY DATABASE statements work in the MotherDuck Web UI? The MotherDuck Web UI has limitations with certain SQL statements that are implemented as multiple statement macros: **COPY DATABASE statements** have limited support in the MotherDuck Web UI: * The full `COPY FROM DATABASE` command is not supported when copying both schema and data simultaneously * **Workaround**: Use the `COPY FROM DATABASE` command with specific options: * `COPY FROM DATABASE source_db TO target_db (SCHEMA)` - copies only the database structure * `COPY FROM DATABASE source_db TO target_db (DATA)` - copies only the database data For full functionality with these commands, use the DuckDB CLI or other supported drivers. More information about database copying can be found in the [database operations documentation](/documentation/key-tasks/database-operations/copying-databases.md). --- Source: https://motherduck.com/docs/troubleshooting/glossary --- sidebar_position: 10 title: Glossary description: Definitions of key terms and concepts used throughout the MotherDuck documentation. --- import glossary from '@site/glossary/glossary.json'; export const GlossarySection = ({ category, terms }) => { const filtered = terms .filter((t) => t.category === category) .sort((a, b) => a.term.localeCompare(b.term, undefined, { sensitivity: 'base' })); return (
{filtered.map((t) => ( ))}
TermDefinition
{t.term} {t.definition} {t.link && ( <>{' '}Learn more )}
); }; {glossary.categories.map((category) => (

{category}

))}

--- Source: https://motherduck.com/docs/troubleshooting/reinstall-md-extension --- sidebar_position: 3 title: Reinstall the MotherDuck extension description: Force reinstall the MotherDuck extension when experiencing connection or compatibility issues. ---

The MotherDuck extension is downloaded and loaded automatically when you connect to MotherDuck. However, you can force a reinstallation by running:

```sql
FORCE INSTALL motherduck;
```

Also make sure you are running a currently supported [version of DuckDB](../faq#how-do-i-know-which-version-of-duckdb-i-should-be-running-).

--- Source: https://motherduck.com/docs/troubleshooting/support --- sidebar_position: 7 title: Contact support description: Contact MotherDuck support via Slack community or email for questions not covered in the FAQ. ---

Have a question that isn't answered in our [FAQ](./faq.md)? Join the [MotherDuck Slack Community](https://slack.motherduck.com/) or contact us at [support@motherduck.com](mailto:support@motherduck.com?subject=Support+question).

--- Source: https://motherduck.com/docs/troubleshooting/troubleshooting-access-policy --- sidebar_position: 6 title: Data access policy for support troubleshooting description: Policy details for when MotherDuck support accesses your account data during troubleshooting. ---

To resolve certain kinds of MotherDuck issues, it can be helpful for us to access your MotherDuck account. For example, if a specific query on a specific dataset is triggering a bug, it may be necessary for us to access the data and SQL query, and possibly re-run a specific query, to reproduce the issue and diagnose the problem.

A MotherDuck employee may use our community Slack or email to request your permission to access your MotherDuck account while troubleshooting an issue.
If you give us permission to access your MotherDuck account for troubleshooting, here is what you need to know:

- Our goal is to understand the issue and resolve the problem. We will make every effort to minimize the amount of time we spend accessing your account and the amount of data we access. We will only access the data we need to investigate and troubleshoot the specific issue.
- Any access to your data will be strictly read-only.
- A MotherDuck employee may pull in other MotherDuck employees during the debugging process. By agreeing to allow us to access your account for troubleshooting an issue, other MotherDuck employees who are asked to help investigate the issue may also access your account, subject to the same terms of this policy, without requesting additional authorization from you.
- We will not share or disclose the data we access while troubleshooting the issue to any third party or non-MotherDuck employee.
- We may make temporary copies of your data while debugging the issue. Any such copies will be permanently deleted once the issue is resolved.
- We may use the data we access in your account to generate a redacted copy of the data to be used for creating a bug report or test.
- The permission you have granted to access your account lapses once this specific issue is resolved.

--- Source: https://motherduck.com/docs/troubleshooting/troubleshooting --- title: Troubleshooting sidebar_class_name: troubleshooting-icon description: Troubleshooting ---

--- Source: https://motherduck.com/docs/troubleshooting/uninstall --- sidebar_position: 2 title: Uninstall the MotherDuck extension description: Steps to completely remove the MotherDuck extension and related environment variables from your system. ---

### How do I uninstall MotherDuck?
* Remove `motherduck_*` from your environment variables (most likely only `motherduck_token`) [1]
* Remove any `motherduck*.duckdb_extension` file located in `~/.duckdb` [2]

[1] To view all your environment variables, you may use:

```bash
$ env | grep -i motherduck
```

To unset the variable in the current session:

```bash
$ unset motherduck_token
```

To unset the variable permanently, you may have to check your shell initialization files (`~/.bashrc`, `~/.zshrc`, etc.).

[2] Note that those files are generally under `~/.duckdb/extensions//`. For example, `~/.duckdb/extensions/v0.9.1/osx_arm64`. You may use this command to remove them:

```bash
$ find ~/.duckdb -name 'motherduck*.duckdb_extension' -exec rm {} \;
```

--- Source: https://motherduck.com/docs/troubleshooting/version-lifecycle-schedules --- sidebar_position: 8 title: MotherDuck version lifecycle schedules description: DuckDB and DuckLake version support schedules, end of life policies, and extended lifecycle support options. ---

import Versions from '@site/src/components/Versions';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

MotherDuck supports DuckDB versions according to a predictable lifecycle so you always know which version is safe to use. The lifecycle schedules below form a part of MotherDuck’s Support Policies. They include Major Releases and Minor Releases to support [DuckDB](/troubleshooting/faq/#how-do-i-know-which-version-of-duckdb-i-should-be-running-) and [DuckLake](/integrations/file-formats/ducklake/) versions and specify end of life dates for both.

## Supported versions

MotherDuck supports DuckDB .

- In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions through .
- In **US West (Oregon) -** `us-west-2`, MotherDuck is compatible with client versions through .
- In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck is compatible with client versions through .
MotherDuck strives to support DuckDB Major and Minor versions in alignment with the [DuckDB](https://duckdb.org/release_calendar) and [DuckLake](https://ducklake.select/release_calendar) release calendars. For new releases, MotherDuck provides updates to users through email and the [Community Slack](https://slack.motherduck.com/) if support for new versions will take more than 48 hours. Newly supported versions are announced in our [release notes](https://motherduck.com/docs/about-motherduck/release-notes/). When a version is available, we recommend that users [install and run the latest **MotherDuck-supported version**](https://motherduck.com/docs/getting-started/interfaces/connect-query-from-duckdb-cli/#install-with-bash) to take advantage of the most up-to-date features and functionality. ## Programmatic access Agents, scripts, and CI checks can read the supported version ranges as JSON: ```bash curl https://motherduck.com/docs/duckdb-versions.json ``` Response shape: ```json { "title": "MotherDuck supported DuckDB versions", "description": "Versions of DuckDB and DuckDB-based language clients supported by MotherDuck...", "$comment": "https://motherduck.com/docs/troubleshooting/version-lifecycle-schedules/", "duckdb": { "motherduck_regions": { "global": { "min": "1.4.0", "max": "1.5.2" }, "us-east-1": { "min": "1.4.0", "max": "1.5.2" }, "us-west-2": { "min": "1.4.1", "max": "1.5.2" }, "eu-central-1": { "min": "1.4.1", "max": "1.5.2" } } }, "language_clients": { "duckdb_jdbc": "1.5.1.0" } } ``` How to read it: - `duckdb.motherduck_regions.` is the inclusive `min`/`max` DuckDB version range that the named MotherDuck region accepts. Use this as the default compatibility window for any client. - `language_clients` lists individually-published drivers with a fixed published version. For any client not listed here (Python, Node.js, Go, and so on), use the matching region's range from `duckdb.motherduck_regions`. 
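
As an illustration of how a script or CI check might apply these ranges, here is a minimal Python sketch. It embeds a trimmed copy of the sample response shown above instead of calling the endpoint, and it assumes versions are plain dotted integers such as `1.4.1`. The helper names `parse_version` and `is_supported` are hypothetical and not part of any MotherDuck tooling.

```python
import json

# Trimmed copy of the sample response above; a real check would fetch the JSON instead.
SAMPLE = json.loads("""
{
  "duckdb": {
    "motherduck_regions": {
      "global": { "min": "1.4.0", "max": "1.5.2" },
      "us-east-1": { "min": "1.4.0", "max": "1.5.2" },
      "us-west-2": { "min": "1.4.1", "max": "1.5.2" },
      "eu-central-1": { "min": "1.4.1", "max": "1.5.2" }
    }
  }
}
""")


def parse_version(text: str) -> tuple:
    """Turn '1.4.1' into (1, 4, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def is_supported(doc: dict, region: str, version: str) -> bool:
    """Return True if version falls inside the inclusive min/max range for the region."""
    window = doc["duckdb"]["motherduck_regions"][region]
    low = parse_version(window["min"])
    high = parse_version(window["max"])
    return low <= parse_version(version) <= high


if __name__ == "__main__":
    for region, version in [("us-east-1", "1.4.0"), ("us-west-2", "1.4.0")]:
        print(region, version, "supported:", is_supported(SAMPLE, region, version))
```

Comparing integer tuples rather than raw strings avoids misordering versions such as `1.10.0` and `1.5.2`.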
The JSON updates whenever the support windows on this page change — the version numbers above and the `` components throughout the docs render from the same file. ### Latest supported version (single value) If you need only the latest supported DuckDB version as a plain string (no region info), MotherDuck publishes it at: ```bash curl https://api.motherduck.com/latest_supported_duckdb_version.txt ``` This is the same value the [`install.motherduck.com`](https://github.com/motherduckdb/mono/blob/main/scripts/install-motherduck-com/install.sh) installer reads. It tracks the **upper** (max) bound automatically as new DuckDB releases are qualified. The **lower** (min) bound — when a previous DuckDB version is deprecated — is decided manually as part of MotherDuck's [End of life policy](#end-of-life-eol-policy) and announced in the release notes. That decision is reflected by hand in this docs page and the JSON endpoint above, which is why the docs are the source of truth for full support ranges. ### Updating the data (for MotherDuck contributors) The source of truth is `static/duckdb-versions.json` in the [`motherduck-docs`](https://github.com/motherduckdb/motherduck-docs) repository. Update this file when: - A new DuckDB version is qualified for MotherDuck (bump `max` for each entry under `duckdb.motherduck_regions`). - A previous DuckDB version reaches End of Life and is removed from support (bump `min` for each entry under `duckdb.motherduck_regions`). - The published JDBC driver version changes (update `language_clients.duckdb_jdbc`). The same file feeds the `` and `` components rendered throughout these docs, so a single edit propagates everywhere. ## MotherDuck support lifecycle schedules The chart shows current support windows. The tables list all versions.
```mermaid
%%{init: { "theme": "base", "themeVariables": { "fontFamily": "var(--ifm-font-family-base)", "textColor": "#383838", "titleColor": "#383838", "primaryColor": "#16AA98", "primaryTextColor": "#383838", "primaryBorderColor": "#383838", "sectionBkgColor": "#F4EFEA", "sectionBkgColor2": "#F4EFEA", "altSectionBkgColor": "#F4EFEA", "gridColor": "#B8C3CA", "taskBkgColor": "#16AA98", "taskBorderColor": "#383838", "activeTaskBkgColor": "#16AA98", "activeTaskBorderColor": "#0C7D71", "doneTaskBkgColor": "#D7D7D7", "doneTaskBorderColor": "#8C8C8C", "critBkgColor": "#16AA98", "critBorderColor": "#0C7D71", "todayLineColor": "#FFDE02", "taskTextColor": "#383838", "taskTextDarkColor": "#383838", "taskTextOutsideColor": "#383838" }, "gantt": { "fontSize": 16, "sectionFontSize": 18, "barHeight": 30, "barGap": 14, "leftPadding": 116, "topPadding": 42, "gridLineStartPadding": 36 } }}%%
gantt
    dateFormat YYYY-MM-DD
    axisFormat %b '%y
    tickInterval 1month
    todayMarker stroke-width:3px,stroke:#FFDE02,opacity:0.9

    section DuckDB
    DuckDB 1.4.x LTS :active, 2025-10-09, 2026-09-30
    DuckDB 1.5.1 :active, 2026-03-24, 2027-03-31
    DuckDB 1.5.2 :crit, 2026-04-13, 2027-03-31

    section DuckLake
    DuckLake 0.4 :active, 2026-03-24, 2026-09-30
    DuckLake 1.0 :crit, 2026-04-13, 2027-03-31
```
The chart uses MotherDuck support announcement dates where available. The tables below retain version release dates and support end dates.

**DuckDB support schedule**

| DuckDB release | Supported DuckLake version (release date) | Release date | End of life date* |
|----------------|-------------------------------------------|--------------|-------------------|
| 1.5.2 | 1.0 (April 13, 2026) | April 13, 2026 | March 2027 |
| 1.5.1 | 0.4 (March 9, 2026) | March 23, 2026 | March 2027 |
| 1.5.0 | 0.4 (March 9, 2026) | March 9, 2026 | March 2027 |
| 1.4.4 | 0.3 (September 17, 2025) | January 27, 2026 | September 2026 |
| 1.4.3 | 0.3 (September 17, 2025) | December 9, 2025 | September 2026 |
| 1.4.2 | 0.3 (September 17, 2025) | November 12, 2025 | September 2026 |
| 1.4.1 | 0.3 (September 17, 2025) | October 7, 2025 | September 2026 |
| 1.4.0 | 0.3 (September 17, 2025) | September 16, 2025 | September 2026 |
| 1.3.2 | — | July 8, 2025 | March 2026 |
| 1.3.1 | — | June 16, 2025 | March 2026 |
| 1.3.0 | — | May 21, 2025 | March 2026 |
| 1.2.2 | — | April 8, 2025 | January 2026 |
| 1.2.1 | — | March 5, 2025 | January 2026 |
| 1.2.0 | — | February 5, 2025 | January 2026 |
| 1.1.3 | — | November 4, 2024 | July 2025 |
| 1.1.2 | — | October 14, 2024 | July 2025 |
| 1.1.1 | — | September 24, 2024 | July 2025 |
| 1.1.0 | — | September 9, 2024 | July 2025 |
| 1.0.0 | — | June 3, 2024 | July 2025 |

\* Beginning with DuckDB 1.3.0, MotherDuck supports each Minor Release until the date specified above.
**DuckLake support schedule**

| DuckLake release | Supported DuckDB version (release date) | Release date | End of life date** |
|------------------|-----------------------------------------|--------------|--------------------|
| 1.0 | 1.5.2 (April 13, 2026) | April 13, 2026 | January 2027 |
| 0.4 | 1.5.0-1.5.1 (March 2026) | March 9, 2026 | September 2026 |
| 0.3 | 1.4.x (September 2025-January 2026) | September 17, 2025 | March 2026 |
| 0.2 | 1.3.x (May-July 2025) | July 4, 2025 | September 2025 |
| 0.1 | 1.3.x (May-July 2025) | May 27, 2025 | July 2025 |

**Note:** DuckLake 1.0 support is aligned with the DuckDB 1.5.x support window shown above.
## End of life (EOL) policy

When a new Minor version becomes available, the previous one enters Extended Support. While we don't offer support for new features, critical fixes may still be backported for the greater of:

- **6 months** after the version’s release, or
- **4 months** after the next Minor version is released

When a Minor version reaches its End of Life (EOL):

- Connections using that DuckDB version are blocked, requiring MotherDuck users to upgrade
- Ahead of scheduled EOL dates, MotherDuck provides in-app UI warnings, email communications, and targeted outreach to users about impacted versions slated for deprecation

DuckLake versions follow the published compatibility schedule above and require a supported DuckDB and DuckLake combination.

## MotherDuck extended lifecycle support add-on

MotherDuck offers an **Extended Lifecycle Support Add-On** that extends ongoing technical support for a Minor DuckDB version after it reaches its EOL date, giving customers the flexibility to upgrade at a later date. For more information, please [get in touch with our team](https://motherduck.com/contact-us/product-expert/).

💁 If you have additional questions about our version lifecycle, feel free to connect with us directly in our [Community Slack support channel](https://slack.motherduck.com/) or send a note to support@motherduck.com.

--- Source: https://motherduck.com/docs/troubleshooting/windows-certs --- sidebar_position: 4 title: Install Let's Encrypt certificates on Windows description: Fix Let's Encrypt certificate trust issues on Windows that cause HTTP 400/500 connection errors. ---

In some circumstances, you may see an error such as `Http response at 400 or 500 level, http status code: 0`. On Windows machines, this is usually because the [Let's Encrypt](https://letsencrypt.org/) certificate is not trusted.
To fix this, follow the steps below:

* Download this file: https://letsencrypt.org/certs/isrgrootx1.der
* Open it (double-click the file): ![Certificate window](images/open-certificate.png)
* Click "Install Certificate" and follow the instructions: ![Import certificate](images/certificate-import.png)

Then try connecting again.

If it still doesn't work, check that the certificate was installed correctly by opening certmgr (typing "`cert`" in the search box should show it):

![Manage user certificates](images/manage-user-certs.png)

The certificate should be listed under `Trusted Root Certification Authorities\Certificates`:

![Certificates manager](images/certmgr.png)