DuckDB in the Cloud with MotherDuck
This is a summary of a book chapter from DuckDB in Action, published by Manning. Download the full book for free to read the complete chapter.

7.1 Introduction to MotherDuck
MotherDuck is a collaborative serverless analytics platform designed to extend the capabilities of DuckDB into the cloud. It allows users to query and analyze data stored in cloud databases and object storage (like S3) using a standard browser or any DuckDB API.
By adopting a serverless model, MotherDuck eliminates the need for users to provision servers, configure clusters, or manage database instances. Instead, the service handles the infrastructure, allowing data engineers and analysts to focus purely on SQL logic and data insights. You can find the documentation for the service at motherduck.com/docs/.
7.1.1 How it works
MotherDuck operates on a unique hybrid architecture. When a user interacts with the platform—whether through the web UI, the Command Line Interface (CLI), or language integrations like Python—they are using a specialized version of DuckDB.
- The Web UI: Runs a WebAssembly (WASM) compiled version of DuckDB directly in the browser, handling local caching and lightweight operations.
- The Extension: When using the CLI or Python, a MotherDuck extension is automatically loaded when the md: or motherduck: protocol is detected.
- Hybrid Execution: The query engine intelligently analyzes the query to determine whether tables are local or remote. It then routes the execution accordingly, sending heavy processing to the cloud or fetching data to join locally.
The architecture consists of a Service Layer (for auth and monitoring), Ducklings (serverless compute instances), a Catalog, and optimized Storage.
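As a sketch of what hybrid execution enables, a single query can mix a local file with a cloud-resident table; the engine decides where each part runs. All table, column, and file names below are hypothetical:

```sql
-- sites.csv is a local file; cloud_db.readings is a table stored in
-- MotherDuck. The engine can read the CSV locally and push the heavy
-- aggregation over readings to the cloud.
SELECT s.site_name, avg(r.output_kwh) AS avg_output
FROM read_csv_auto('sites.csv') AS s
JOIN cloud_db.readings AS r ON r.site_id = s.site_id
GROUP BY s.site_name;
```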
7.1.2 Why use MotherDuck?
MotherDuck addresses the "long tail" of data warehouse users—those who do not require petabyte-scale processing but need more power and collaboration than a local file allows.
- Efficient Scaling: It is ideal for datasets in the gigabyte range (e.g., monitoring energy production sites), which are too large for a spreadsheet but too small to justify a complex distributed system.
- Data Lake Querying: It acts as a query engine for heterogeneous data sources, allowing users to join cold data stored in Parquet or Iceberg formats on S3 with "hot" data stored directly in MotherDuck.
- Serverless Backend: It serves as a backend for data apps and dashboards, offloading analytics queries from transactional databases.
7.2 Getting started with MotherDuck
To begin, users navigate to the MotherDuck website and sign up. The platform supports seamless onboarding via GitHub or Google accounts. Once logged in, users are presented with the MotherDuck UI, which displays databases and schemas in a navigable tree structure alongside a query interface.
7.2.1 Using MotherDuck through the UI
The web-based MotherDuck UI (app.motherduck.com) serves as the central hub for managing remote databases and secrets.
- Notebook Experience: The UI offers a Jupyter-Notebook-like interface with SQL auto-complete.
- Local Caching: Because the UI runs a WASM version of DuckDB, query results are cached locally in the browser, enabling instant sorting, pivoting, and filtering without re-running the query in the cloud.
- Data Visualization: The interface includes built-in histograms and bar charts within the data grid to visualize distributions immediately.
- File Support: Users can upload CSV and Parquet files directly through the UI for immediate analysis.
- Shell Alternative: If you don't want a MotherDuck account, you can use the purely local web shell at shell.duckdb.org, though it cannot persist data or connect to MotherDuck.
7.2.2 Connecting to MotherDuck with DuckDB via token-based authentication
DuckDB integrates with MotherDuck using a token-based authentication flow.
- Initiating Connection: Running .open md: in the CLI triggers the authentication process.
- Browser Auth: If not logged in, the CLI prompts the user to visit an authentication URL (e.g., https://auth.motherduck.com/activate) and enter a confirmation code.
- Token Management: Once authenticated, a token is retrieved. For automation, this token can be stored in an environment variable (motherduck_token) or passed directly in the connection string: duckdb.connect('md:?motherduck_token=...').
By default, users connect to a database named my_db, but can switch using the standard USE command.
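A minimal Python sketch of the token-based flow, assuming the duckdb package with the MotherDuck extension is installed and a token is stored in the motherduck_token environment variable (the helper function and database name are illustrative, not part of MotherDuck's API):

```python
import os

def motherduck_uri(token: str, database: str = "my_db") -> str:
    """Build a MotherDuck connection string with an inline token.

    The md: prefix routes DuckDB to MotherDuck; the database name
    defaults to my_db, matching the service default.
    """
    return f"md:{database}?motherduck_token={token}"

# Only attempt a real connection when a token is configured, so the
# script also runs in environments without MotherDuck access.
token = os.environ.get("motherduck_token")
if token:
    import duckdb  # requires the duckdb package
    con = duckdb.connect(motherduck_uri(token))
    con.sql("USE my_db")
```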
7.3 Making the best possible use of MotherDuck
The MotherDuck extension enhances DuckDB with specific cloud capabilities. To maximize the platform's potential, users should master features related to data migration, sharing, S3 integration, and hybrid execution controls. These features allow for a workflow where data is developed locally and seamlessly deployed or shared in the cloud.
7.3.1 Uploading databases to MotherDuck
Migrating a local DuckDB database to the cloud is straightforward.
- The Command: Use CREATE DATABASE "remote_name" FROM 'local_file.duckdb'.
- Important Rule: The local filename and the remote database name must be different to avoid catalog errors (e.g., Catalog Error: Database... has already been created).
- Performance: For very large databases, it may be faster to export data to Parquet files, upload them to cloud storage, and ingest them into MotherDuck from there, rather than uploading the database file directly from a local machine.
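The two migration paths above can be sketched as follows; file names, table names, and the bucket are placeholders:

```sql
-- Direct upload (remote name must differ from the local file name):
CREATE DATABASE "energy_cloud" FROM 'energy.duckdb';

-- Alternative for very large databases: stage the data as Parquet in
-- cloud storage, then ingest it from the cloud side.
COPY (SELECT * FROM readings)
  TO 's3://my-bucket/readings.parquet' (FORMAT parquet);
CREATE TABLE readings AS
  SELECT * FROM 's3://my-bucket/readings.parquet';
```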
7.3.2 Creating databases in MotherDuck
Users can build schemas from scratch directly in the cloud using standard SQL.
- Creation: CREATE DATABASE "my-test"; establishes a new remote database.
- Verification: SHOW DATABASES; or .databases lists all available remote instances.
- Context: The USE "my-test"; command switches the current session to the new database, allowing queries to reference tables without prefixes.
- Direct Ingress: Creating databases directly in MotherDuck is highly efficient when the source data already resides in the cloud (e.g., S3 buckets), as it reduces data ingestion latency compared to downloading to a local machine first.
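Putting the steps above together (the table name and S3 path are hypothetical):

```sql
CREATE DATABASE "my-test";   -- new remote database
SHOW DATABASES;              -- verify it exists
USE "my-test";               -- switch the session context

-- Ingest directly from cloud storage, avoiding a local round trip:
CREATE TABLE sites AS
  SELECT * FROM 's3://my-bucket/sites.parquet';
```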
7.3.3 Sharing databases
One of MotherDuck's most powerful features is the ability to share read-only snapshots of databases. To make your data available to others, you can use the CREATE SHARE statement.
- Create Share: The CREATE SHARE statement generates a unique link (md:_share/...).
- Attach Share: Collaborators use the ATTACH 'link' AS db_name command to access the data, views, and schemas exactly as they existed at the snapshot moment.
- Updates: Shares are not live by default; the UPDATE SHARE command must be run to propagate changes to consumers.
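A sketch of the sharing workflow; the share name and alias are hypothetical, and the generated link is abbreviated:

```sql
-- Publisher side: create a read-only snapshot of a database.
CREATE SHARE energy_share FROM "my-test";
-- returns a link of the form md:_share/...

-- Consumer side: attach the snapshot under a local alias.
ATTACH 'md:_share/...' AS energy_ro;

-- Publisher side, after the underlying data has changed:
UPDATE SHARE energy_share;
```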
7.3.4 Managing S3 secrets and loading data from S3 buckets
To query private data stored in Amazon S3, MotherDuck manages authentication credentials securely.
- Secret Creation: Users can run CREATE OR REPLACE SECRET to store S3 access keys, secret keys, and region settings.
- Persistent Storage: Unlike temporary sessions, MotherDuck stores these secrets securely in the service layer, making them available across different sessions and interfaces (Web UI, Python, CLI).
- Usage: Once configured, users can query data using the s3:// protocol just as they would a local file.
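A sketch of the secret workflow using DuckDB's CREATE SECRET syntax; the secret name, credentials, and bucket path are placeholders:

```sql
-- Store S3 credentials; MotherDuck persists them in the service layer.
CREATE OR REPLACE SECRET my_s3_secret (
    TYPE S3,
    KEY_ID 'AKIA...',
    SECRET '...',
    REGION 'us-east-1'
);

-- Afterwards, private objects are queryable like local files:
SELECT count(*) FROM 's3://my-private-bucket/events.parquet';
```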
7.3.5 Optimizing data ingestion and MotherDuck usage
MotherDuck provides granular control over where a query is executed to optimize for cost and performance via the MD_RUN parameter in functions like read_csv_auto.
- MD_RUN=LOCAL: Forces execution on the local machine (saves cloud compute costs).
- MD_RUN=REMOTE: Executes the function in the MotherDuck cloud (faster for data already in the cloud).
- MD_RUN=AUTO: The default setting; intelligently executes S3/HTTP requests remotely and local file requests locally.
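The parameter is passed alongside the usual arguments of functions like read_csv_auto; the file paths here are hypothetical:

```sql
-- Force remote execution for a file that already lives in S3:
SELECT * FROM read_csv_auto('s3://my-bucket/data.csv', MD_RUN=REMOTE);

-- Keep execution on the local machine for a local file:
SELECT * FROM read_csv_auto('data.csv', MD_RUN=LOCAL);
```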
Storage Pricing: MotherDuck distinguishes between Cold Storage (persistent object storage) and Hot Storage (fast, memory-like storage for active querying). Efficiently managing hot storage limits is key to controlling costs. Keep in mind that pricing for additional compute or storage applies to cloud usage, not your local execution.
7.3.6 Querying your data with AI
MotherDuck includes a generative AI feature to lower the barrier to entry for SQL.
- Natural Language to SQL: Using the pragma prompt_query('Question') directive, users can ask questions in plain English. The system sends the schema and prompt to an LLM, which generates and executes the corresponding SQL query.
- SQL Fixer: In early 2024, MotherDuck introduced FixIt, a fast SQL error fixer for the UI. It can auto-correct syntactically incorrect SQL or logic errors (like a missing GROUP BY clause).
- Schema Description: The prompt_schema() stored procedure provides a natural language summary of the tables and relationships in the database.
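The two statements above can be sketched as follows; the question is an invented example:

```sql
-- Ask a question in plain English; the generated SQL runs automatically.
pragma prompt_query('Which site produced the most energy in 2023?');

-- Get a natural language summary of the current database's schema.
CALL prompt_schema();
```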
7.3.7 Integrations
MotherDuck fits into the "Modern Duck Stack," supporting a wide array of data tools:
- Ingestion: Integrates with tools like Airbyte and Fivetran.
- Orchestration: Works with Airflow, Dagster, and dbt.
- BI & Visualization: Compatible with Superset, Tableau, and Evidence.
- Data Science: Acts as a retrieval component for RAG (Retrieval Augmented Generation) pipelines in LLM applications.
Because MotherDuck speaks the DuckDB protocol, virtually any tool with a DuckDB driver can connect to MotherDuck simply by modifying the connection string.
Summary
- Serverless Platform: MotherDuck removes infrastructure management, enabling SQL analytics directly from the browser.
- Hybrid Architecture: It seamlessly integrates local and remote execution via the md: protocol.
- Collaboration: It introduces easy database sharing via snapshots (CREATE SHARE).
- Performance: Hybrid execution optimizes speed by processing data where it resides (local vs. cloud).
- AI capabilities: Built-in LLM features allow users to query data using natural language and automatically fix SQL errors.


