# MotherDuck Documentation - Core Concepts

> Scoped full Markdown content for Core Concepts. For other areas, start from https://motherduck.com/docs/llms.txt instead of loading unrelated documentation.

## Agent guidance

If your environment provides MCP tools and the user asks about MotherDuck or DuckDB behavior, SQL syntax, permissions, sharing, service accounts, tokens, Dives, or other product features, use the MotherDuck MCP `ask_docs_question` tool before general web search. It answers from official DuckDB and MotherDuck documentation.

For broad context, prefer the most specific scoped `llms-full.txt` file listed in https://motherduck.com/docs/llms.txt before loading the root `llms-full.txt`. The root file contains the complete public documentation corpus and is intended for bulk indexing or large-context workflows.

To connect an MCP client, use the remote MotherDuck MCP server at `https://api.motherduck.com/mcp`. Setup instructions: https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-setup. Tool reference: https://motherduck.com/docs/sql-reference/mcp/ask-docs-question.

---
Source: https://motherduck.com/docs/concepts/architecture-and-capabilities

# Architecture and capabilities
> MotherDuck's serverless architecture combining cloud scale with DuckDB's efficiency through Dual Execution.
MotherDuck is a serverless cloud analytics service with a unique architecture that combines the power and scale of the cloud with the efficiency and convenience of DuckDB.

MotherDuck's key components are:

- The MotherDuck cloud service
- MotherDuck's DuckDB SDK
- Dual Execution
- The MotherDuck web UI

### The MotherDuck cloud service

The MotherDuck cloud service lets you store structured data, query that data with SQL, and share it with others. A key MotherDuck product principle is ease of use.

**Serverless execution model**&mdash;You don't need to configure or spin up instances, clusters, or warehouses. You write and submit SQL. MotherDuck takes care of the rest. Under the hood, MotherDuck runs DuckDB and speaks DuckDB's SQL dialect.

**Managed storage**&mdash;you can load data into MotherDuck storage to be queried or shared. MotherDuck storage is durable, secure, and automatically optimized for best performance. MotherDuck storage is surfaced to you through the **catalog** and logical primitives database, schema, table, view, and so on. In addition, MotherDuck can query data outside of MotherDuck storage&mdash;as data on Amazon S3, through HTTPS endpoints, on your laptop, and more.

**The service layer**&mdash;MotherDuck provides key capabilities like secure identity, authorization, administration, and monitoring.

:::note
MotherDuck is available on three AWS regions:
- **US East (N. Virginia):** `us-east-1`, supporting DuckDB versions between 1.4.0 and 1.5.3.
- **US West (Oregon):** `us-west-2`, supporting DuckDB versions between 1.4.1 and 1.5.3.
- **Europe (Frankfurt):** `eu-central-1`, supporting DuckDB versions between 1.4.1 and 1.5.3.

You can choose in which region to create your organization, and organizations can only exist within a single cloud region.

We are working on expanding to other regions and cloud providers.
:::

### MotherDuck's DuckDB SDK

If you're using DuckDB in Python or CLI, you can connect to MotherDuck with a single line of code, `ATTACH 'md:';`. After you run this command, your DuckDB instance becomes supercharged by MotherDuck. MotherDuck's Dual Execution is enabled, and your DuckDB instance gets additional capabilities like sharing, secrets storage, better interoperability with S3, and cloud persistence.

### Dual execution

When connected together, DuckDB and MotherDuck form a different type of distributed system. The two nodes work in concert so you can query data wherever it lives, in the most efficient way possible. This query execution model, called **Dual Execution** (formerly known as Hybrid Execution), automatically routes the various stages of queries execution to the most opportune locations, including highly arbitrary scenarios:

- If a SQL query queries data on your laptop, MotherDuck routes the query to your local DuckDB instance
- If a SQL query queries data in MotherDuck or cloud storage (S3, GCS, Azure, R2), MotherDuck routes that query to MotherDuck's cloud engine, which connects to your storage provider directly. MotherDuck can use both cloud-stored and local secrets to authenticate. See [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret/) for details.
- If a SQL query executes a join between data on your laptop and data in MotherDuck, MotherDuck finds the best way to efficiently join the two

![Image](./../img/hybrid_query.png)

### The MotherDuck web UI

You can use MotherDuck's web UI to analyze and share data and to perform administrative tasks. MotherDuck's UI consists of a lightweight notebook, a SQL IDE, and a data catalog. Uniquely, MotherDuck caches query results in a highly interactive query results panel, letting you sort, filter, and even pivot data quickly.

## Summary of capabilities

With MotherDuck you can:

- Use serverless DuckDB in the cloud to store data and execute DuckDB SQL
- Load data into MotherDuck from your personal computer, https, or S3
- Join datasets on your computer with datasets in MotherDuck or in S3
- Copy DuckDB databases between local and MotherDuck locations
- Materialize query results into local or MotherDuck locations, or S3
- Work with data in MotherDuck's notebook UI, standard DuckDB CLI, or standard DuckDB Python package
- Share databases with your teammates
- Securely save S3 credentials in MotherDuck

Additionally, MotherDuck supports connectivity to third party tools through:

- JDBC
- Go
- sqlalchemy

## Considerations and limitations

MotherDuck does not yet support the full range of SQL of DuckDB. We are continuously working on improving coverage of DuckDB in MotherDuck. If you need specific features enabled, please let us know.

Below is the list of DuckDB features that MotherDuck does not yet support:

- Custom Python / Native user defined functions.
- Server-side attach of postgres, sqlite, etc.
- Custom or community extensions.

---

Source: https://motherduck.com/docs/concepts/database-concepts

# Database Concepts
> MotherDuck Database Concepts
## MotherDuck Architectural Concepts

:::note
MotherDuck is a cloud-native data warehouse, built on top of DuckDB, a fast in-process analytical database. It inherits some features from DuckDB that present opportunities to think differently about data warehousing methods in order to achieve high levels of performance and simplify the experience.
:::

 - **Isolated Compute Tenancy**: Each user is allocated their own "Duckling," which is an isolated piece of compute that sits on top of the MotherDuck storage layer. MotherDuck is designed this way to lessen contention between users, which is a common challenge with other data warehouses. Each Duckling had under 100ms of cold start time as MotherDuck keeps Ducklings on warm standby.

- **Aggressively Serverless**: Unlike conventional data warehouses, DuckDB automatically parallelizes the work that you send to it. The implication of this is that scheduling multiple queries at-a-time does not meaningfully increase throughput, as DuckDB has already parallelized the workload across all available resources.

- **Database level security model**: It has a simplified access model - users either have access to an entire database, or nothing at all. As a result, users will interact with data frequently at the database level. This is unusual when compared to other databases, which often treat multiple database files as single concepts from an interactivity perspective.

- **Database Sharing**: MotherDuck separates storage and compute, which means that one user cannot see another's writes into a database until that database is updated to that user. As such, it has its own concept called ["SHARES"](/key-tasks/sharing-data/sharing-overview/) within Organizations, which are zero-copy clones of the main database for read-only use, enabling high scalability of analytics workloads.

- **Dual Execution**: Every MotherDuck client is also a DuckDB engine, so you can efficiently query local data and (JOIN, UNION) with data that's stored in your MotherDuck data warehouse.  [The query planner automatically decides](/concepts/architecture-and-capabilities#dual-execution) the best place to execute each part of your query.

---

Source: https://motherduck.com/docs/concepts/hypertenancy

# Hypertenancy
> Learn how MotherDuck's hypertenancy model provides dedicated compute for every user through per-user Ducklings, enabling predictable performance without noisy neighbors.
MotherDuck implements a unique tenancy model called **hypertenancy**: every user or service account gets their own dedicated DuckDB compute instance, called a Duckling. Unlike traditional data warehouses where all users share a single cluster, hypertenancy provides full compute isolation at the individual user level.

## The problem with traditional multi-tenancy

Traditional data warehouses and OLAP systems use a shared-compute model:

```mermaid
graph TB
    subgraph Users["All Users"]
        U1{{"User A"}}:::green
        U2{{"User B"}}:::green
        U3{{"User C"}}:::green
    end

    subgraph Warehouse["Shared Data Warehouse"]
        Cluster["Single Compute Cluster"]:::yellow
    end

    U1 --> Cluster
    U2 --> Cluster
    U3 --> Cluster
```

This shared model creates several challenges:

- **Noisy neighbors**: One user's expensive query affects everyone else's performance
- **Resource contention**: Concurrency limits apply across all users
- **Unpredictable performance**: Query times vary based on overall system load
- **Overprovisioning**: Resources must be sized for peak aggregate load, sitting idle most of the time
- **Difficult cost attribution**: Hard to track compute costs per user or customer

## How Hypertenancy works

With hypertenancy, MotherDuck provisions a separate Duckling for each user:

```mermaid
graph TB
    subgraph Users["All Users"]
        U1{{"User A"}}:::green
        U2{{"User B"}}:::green
        U3{{"User C"}}:::green
    end

    subgraph MotherDuck["MotherDuck"]
        D1["Duckling A"]:::yellow
        D2["Duckling B"]:::yellow
        D3["Duckling C"]:::yellow
    end

    U1 --> D1
    U2 --> D2
    U3 --> D3
```

Each Duckling is a complete DuckDB instance with dedicated CPU, memory, and fast SSD spill space. This architecture delivers:

- **Perfect isolation**: No noisy neighbors—one user's workload never impacts another
- **Predictable performance**: Dedicated resources mean consistent query times
- **Independent scaling**: Each user's compute can be sized to their specific needs
- **Per-user billing**: Compute costs directly attributable to individual users
- **Fast cold starts**: Ducklings start in approximately 1 second

## Scaling with Hypertenancy

Hypertenancy supports both vertical and horizontal scaling, letting you match compute resources to actual demand.

### Vertical scaling: Duckling sizes

Each user's Duckling can be configured to different sizes based on their workload requirements:

| Duckling Size | Best For |
|---------------|----------|
| **Pulse** | Ad-hoc queries, read-heavy workloads, high-concurrency analytics |
| **Standard** | Core analytical workflows, ETL/ELT pipelines |
| **Jumbo** | Large-scale batch processing, complex joins |
| **Mega** | Demanding jobs with high data volumes |
| **Giga** | Largest and toughest batch workloads |

You can adjust Duckling size per user through the [MotherDuck UI](/about-motherduck/billing/duckling-sizes/#changing-duckling-sizes) or [REST API](/sql-reference/rest-api/ducklings-set-duckling-config-for-user/).

For example, in a customer-facing analytics scenario, you might provision:
- **Pulse** Ducklings for most customers running standard dashboards
- **Standard** or **Jumbo** Ducklings for enterprise customers with heavier workloads
- **Mega** or **Giga** Ducklings for batch data loading jobs

### Horizontal scaling: read scaling

When a single user needs to handle many concurrent queries—such as a service account powering a customer-facing application—you can enable [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/). Read scaling provisions additional read-only Ducklings that share the same data but distribute query load:

```mermaid
graph TB
    subgraph App["Application Users"]
        E1{{"End User 1"}}:::green
        E2{{"End User 2"}}:::green
        E3{{"End User 3"}}:::green
        E4{{"End User 4"}}:::green
    end
    S1[Service Account]:::watermelon

    subgraph MotherDuck["MotherDuck (Customer X)"]
        RW["Read-Write Duckling<br/>(Data Loading)"]
        R1["Read Scaling Duckling 1"]
        R2["Read Scaling Duckling 2"]
    end
    S1 --> RW
    E1 --> R1
    E2 --> R1
    E3 --> R2
    E4 --> R2

```

Read scaling lets you serve hundreds or thousands of concurrent end users through a single service account while maintaining predictable performance.

## Hypertenancy use cases

### Customer-facing analytics

Hypertenancy is particularly powerful for [customer-facing analytics](/getting-started/customer-facing-analytics/). Each of your customers can have their own service account with isolated Ducklings:

- **Data isolation**: Each customer's data stays in their own database
- **Compute isolation**: One customer's workload never impacts another
- **Cache isolation**: Each customer's Duckling maintains its own cache, so cached query results and data remain private and predictable
- **Independent sizing**: Scale resources per customer based on their tier or needs
- **Predictable costs**: Bill customers accurately based on their actual compute usage

For a hands-on guide to building customer-facing analytics with per-customer service accounts, see the [Builder's Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/).

### Development and production pipelines

Service accounts enable clean separation between deployment environments. Each environment gets its own isolated compute:

| Environment | Service Account | Duckling Size | Purpose |
|-------------|-----------------|---------------|---------|
| Local/Dev | `dev-pipeline` | Pulse | Interactive development and testing |
| Staging | `staging-pipeline` | Standard | Pre-production validation |
| Production | `prod-pipeline` | Standard/Jumbo/... | Production workloads |

This separation ensures:
- Development experiments never impact production performance
- Each environment has appropriately sized compute
- Clear cost attribution per environment
- Easy rollback by switching service account credentials

### Data warehouse and data pipeline workloads

For data pipelines, you can assign dedicated service accounts to different stages of your data workflow. If you're using dbt you can run dbt models with different duckling sizes.

| Pipeline Stage | Service Account | Duckling Size | Workload Pattern |
|----------------|-----------------|---------------|------------------|
| Ingestion | `ingest-service` | Jumbo/Mega | Bulk data loading, high I/O |
| Transformation | `transform-service-standard` / `transform-service-jumbo` /  | Standard/Jumbo | dbt models, ETL jobs |
| Reporting | `reporting-service` | Pulse (read scaling) | Dashboard queries, read-heavy |

This pattern provides:
- **Workload isolation**: Heavy batch ingestion jobs won't slow down interactive reporting queries
- **Right-sized compute**: Each stage gets the Duckling size optimized for its workload
- **Cost visibility**: Track compute costs per pipeline stage
- **Independent scheduling**: Run ingestion during off-peak hours without affecting daytime analysts

### Analytics and data science

For internal analytics teams, hypertenancy means analysts and data scientists each get their own compute. A data scientist running a complex ML feature extraction job won't slow down an analyst building a quick dashboard.

## Why single-node beats distributed for per-user compute

Traditional distributed data warehouses use clusters with multiple nodes that coordinate to execute queries. This architecture introduces:

- Network latency between nodes
- Coordination overhead
- Data shuffling costs

For queries that operate on one user's data at a time (the common pattern in hypertenancy), single-node execution on a Duckling eliminates this overhead entirely. The result is often faster query performance and lower costs compared to distributed systems, especially for interactive analytics workloads.

DuckDB's efficient columnar execution, combined with MotherDuck's fast storage architecture, means queries can handle datasets larger than memory with minimal performance impact.

## Related content

- **Learn about Duckling sizes**: [Duckling Sizes](/about-motherduck/billing/duckling-sizes/)
- **Configure read scaling**: [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- **Build customer-facing analytics**: [Customer-Facing Analytics Overview](/getting-started/customer-facing-analytics/)
- **Set up per-customer service accounts**: [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/)

---

Source: https://motherduck.com/docs/concepts/resource-management

# Resource management
> Understand MotherDuck's resource hierarchy, from organizations, accounts, tokens, and secrets down to databases and tables, and how each level provides compute isolation, data isolation, and access control.
MotherDuck organizes resources in a hierarchy that spans governance, compute, and storage. Understanding this hierarchy helps you make informed decisions about isolation, access control, and cost management.

## The resource hierarchy

MotherDuck resources are organized into three layers: governance (who can do what), compute (where queries run), and storage (where data lives).

### Governance

Organizations contain accounts (for both users and machine-to-machine/services), and accounts hold tokens and secrets. Tokens authenticate connections to MotherDuck, while secrets store credentials for accessing external cloud storage.

```mermaid
flowchart LR
    Org["Organization"]:::yellow
    Org --> User
    Org --> SA

    subgraph " "
        User{{"User account"}}:::green
        SA{{"Service account"}}:::green
    end

    User --> Creds
    SA --> Creds

    subgraph Creds["Credentials (per account)"]
        Token1["R/W token"]:::sky
        Token2["Read scaling token"]:::sky
        Secret["Secrets"]:::sky
    end
```

### Compute

Each account gets a dedicated R/W Duckling. Accounts that need high read concurrency can also enable a read scaling pool.

```mermaid
flowchart LR
    Token1["R/W token"]:::sky
    Token2["Read scaling token"]:::sky
    RW["R/W Duckling"]:::yellow
    RS["Read scaling pool"]:::yellow
    RSI1["Read scaling Duckling"]:::yellow
    RSI2["Read scaling Duckling"]:::yellow
    Token1 --> RW
    Token2 --> RS
    RS --> RSI1
    RS --> RSI2
```

### Storage

Databases follow the standard DuckDB hierarchy. Shares provide user, organization or public access to databases not created by that account.

```mermaid
flowchart LR
    RW["R/W Duckling"]:::yellow
    DB[("Database")]:::yellow
    Share[("Share<br/>(read-only clone)")]:::sky
    Schema["Schema"]:::green
    Table["Table / View"]:::green
    RW --> DB
    DB --> Schema
    Schema --> Table
    DB -. "CREATE SHARE" .-> Share
```

Ducklings can also read from and write to external cloud storage. A [secret](#secrets) provides the credentials, and the Duckling connects to the storage provider directly:

```mermaid
flowchart LR
    RW["R/W Duckling"]:::yellow
    Secret["Secret"]:::sky
    S3[("S3 / GCS / R2 / Azure")]:::green

    RW -->|"SELECT FROM 's3://...'"| S3
    Secret -. "authenticates" .-> S3
```

## What each level means

### Organization

An organization is the top-level container in MotherDuck. It defines:

- A **billing boundary**: all compute and storage costs roll up to the organization
- A **region**: each organization lives in a single region (for example, `us-east-1`, `us-west-2`, or `eu-central-1`)
- **Admin controls**: organization admins manage users, service accounts, and SSO configuration

Every MotherDuck user belongs to exactly one organization.

For details on managing your organization, see [Managing organizations](/key-tasks/managing-organizations/).

### Accounts: users and service accounts

MotherDuck has two types of accounts:

- **User accounts** represent individual people who sign in interactively
- **Service accounts** represent applications, pipelines, or automated processes

Both types function the same way from a resource perspective: each account gets its own dedicated [Read-Write Duckling and read scaling pool](#ducklings) and owns its own databases. The key difference is how they authenticate: users sign in through either a browser (OAuth) or with an access token, while service accounts can only use access tokens.

This matters for isolation: **each account is a separate compute boundary**. Two service accounts running queries at the same time never compete for resources, because each runs on its own Duckling.

::::tip
Organization admins can [impersonate a service account](/key-tasks/service-accounts-guide/impersonate-service-accounts/) through the MotherDuck UI to view its resources, run queries, or troubleshoot issues as that account.
::::

For details on creating service accounts, see [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/).

### Access tokens

Tokens are the credentials that authenticate a connection to MotherDuck. Each token is scoped to a specific account and determines how the connection is routed:

| Token type | Routes to | Use case |
|---|---|---|
| **R/W token** | The account's R/W Duckling | Data loading, writes, interactive queries |
| **Read scaling token** | The account's read scaling pool | High-concurrency read workloads |

Multiple connections using the same R/W token share the same Duckling. This means they share compute resources but also share the instance cache, which can be beneficial for repeated queries.

For details on authentication, see [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/).

### Secrets

Secrets store cloud storage credentials (for S3, GCS, Azure, R2, and Hugging Face) in MotherDuck so your Ducklings can read from and write to external storage. Secrets are:

- **Encrypted**: stored fully encrypted in MotherDuck
- **Account-scoped**: each secret belongs to the account that created it and is not visible to other accounts in the organization
- **Scope-matched**: when multiple secrets exist for the same storage type, MotherDuck picks in alphabetical order.

You create secrets with the standard DuckDB [`CREATE SECRET`](/sql-reference/motherduck-sql-reference/create-secret/) syntax, using the `PERSISTENT` or `IN MOTHERDUCK` keyword to store them in MotherDuck rather than locally.

Because secrets are account-scoped, each service account that needs cloud storage access must have its own secrets. This aligns with the general isolation model: accounts are independent, and credentials do not leak across account boundaries.

### Ducklings

A Duckling is a dedicated DuckDB compute instance. Every account gets its own R/W Duckling, providing [hypertenancy](/concepts/hypertenancy/) to guarantee full compute isolation at the individual account level.

Ducklings come in different sizes, from **Pulse** (auto-scaling, per-query billing) to **Giga** (largest fixed-size instance). You choose the size based on workload requirements. See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for details.

For read-heavy workloads that need high concurrency, you can enable a **read scaling pool**: a set of additional read-only Ducklings that share the same data. Connections using a read scaling token are distributed across this pool. See [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for details.

### Databases, schemas, and tables

Databases follow the standard DuckDB hierarchy: a database contains schemas, and schemas contain tables and views. In MotherDuck:

- Every database is **owned by the account** that created it
- Access is **all-or-nothing at the database level**: an account either has full access to a database or no access at all
- You can grant read-only access to others through [shares](/key-tasks/sharing-data/sharing-overview/)

MotherDuck also supports [Ducklake](/concepts/ducklake/), an open table format that stores data in your own object storage while MotherDuck manages the catalog.

## Isolation boundaries

Different boundaries in the resource hierarchy provide different types of isolation. Use this table to understand what separates what:

| Boundary | Compute isolation | Data isolation | Access control | Secret isolation |
|---|---|---|---|---|
| Different organizations | Full | Full | Full | Full |
| Different accounts (same org) | Full (separate Ducklings) | Per-database (owned separately) | Per-database (through shares) | Full (secrets are account-scoped) |
| Different tokens (same account) | None (same R/W Duckling) | None (same databases) | None (same permissions) | None (same secrets) |
| Read scaling pool | Read-only isolation (separate Ducklings) | Shared (read-only replicas) | Token-scoped | Shared (same account secrets) |
| Different databases | N/A | Full | Share-based | N/A |

Key takeaways:

- **Accounts are the primary isolation boundary.** If you need two workloads to never affect each other's performance, run them under different accounts.
- **Tokens do not provide isolation.** Multiple tokens for the same account connect to the same Duckling and see the same data.
- **Shares provide data access without compute sharing.** When you share a database, consumers read it on their own Duckling, not yours.
- **Secrets follow account boundaries.** Each account manages its own cloud storage credentials. Secrets created by one account are never visible to another.

## Common patterns

### Isolate ETL from analysts

Create separate service accounts for your data pipeline and your analysts. Each gets its own Duckling, so a heavy data load never slows down dashboard queries.

```mermaid
flowchart LR
    subgraph Org["Organization"]
        ETL{{"etl-pipeline"}}:::green
        Analyst{{"analyst-team"}}:::green
    end

    subgraph Compute["Compute"]
        D1["Jumbo Duckling"]:::yellow
        D2["Pulse Duckling"]:::yellow
        D3["Pulse Duckling"]:::yellow
        D4["Pulse Duckling"]:::yellow
    end

    ETL --> D1
    Analyst --> D2
    Analyst --> D3
    Analyst --> D4
    D1 --> DB[("Shared database<br/>(via share)")]:::sky
    D2 --> DB
    D3 --> DB
    D4 --> DB
```

The ETL service account owns the database and writes to it on a large Duckling. The analyst-team account uses a [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) pool of Pulse Ducklings to handle concurrent dashboard queries. Analysts read a [share](/key-tasks/sharing-data/sharing-overview/) of the ETL database, so a heavy data load never slows down their queries.

### Separate dev and prod environments

Use different service accounts per environment. Each has isolated compute and its own databases:

```mermaid
flowchart LR
    subgraph Org["Organization"]
        Dev{{"dev-pipeline"}}:::green
        Staging{{"staging-pipeline"}}:::green
        Prod{{"prod-pipeline"}}:::green
    end

    subgraph Compute["Compute"]
        D1["Pulse Duckling"]:::yellow
        D2["Standard Duckling"]:::yellow
        D3["Jumbo Duckling"]:::yellow
    end

    Dev --> D1
    Staging --> D2
    Prod --> D3
    D1 --> DB1[("dev-db")]:::sky
    D2 --> DB2[("staging-db")]:::sky
    D3 --> DB3[("prod-db")]:::sky
```

Right-size each environment: Pulse for development (on-demand), Standard for staging validation, and Jumbo for production workloads.

### Customer-facing analytics (3-tier)

For B2B applications that embed analytics, use a service account per customer. Your backend mediates access, and each customer gets isolated compute and data:

```mermaid
flowchart LR
    subgraph App["Your application (auth, sessions, routing)"]
        FE["Frontend UI"]:::green
        BE["Backend API"]:::green
    end

    subgraph Org["Organization"]
        SA1{{"customer-a"}}:::green
        SA2{{"customer-b"}}:::green
    end

    subgraph Compute["Compute"]
        D1["Jumbo"]:::yellow
        RS["Read scaling pool"]:::yellow
        D2["Pulse Duckling"]:::yellow
        D3["Pulse Duckling"]:::yellow
    end

    FE -->|"user request"| BE
    BE -->|"read token"| SA1
    BE -->|"read token"| SA2
    SA1 --> D1
    SA2 --> RS
    RS --> D2
    RS --> D3
    D1 --> DB1[("customer-a-db")]:::sky
    D2 --> DB2[("customer-b-db")]:::sky
    D3 --> DB2
```

Your application handles user authentication and session management, then routes queries to the right customer's service account using stored read tokens. Each customer's service account owns its own database and Duckling. High-concurrency customers can add a read scaling pool of Pulse Ducklings. For the full walkthrough, see the [3-tier customer-facing analytics guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/).

### Embedded analytics with DuckDB WASM

For lightweight, interactive analytics embedded directly in a web page, you can skip the backend tier entirely. The browser runs DuckDB WASM and connects to MotherDuck with a read-only token:

```mermaid
flowchart LR
    Browser["Browser + DuckDB WASM"]:::green

    subgraph Org["Organization"]
        SA{{"embed-account"}}:::green
    end

    subgraph Compute["Compute"]
        D1["Duckling"]:::yellow
    end

    Browser -->|"read token"| SA
    SA --> D1
    D1 --> DB[("analytics-db")]:::sky
```

DuckDB WASM runs queries client-side or routes them to MotherDuck depending on the query. This is how [Dives](/key-tasks/ai-and-motherduck/dives/) work: each embedded Dive connects to MotherDuck through a session token and queries live data directly from the browser, with no backend needed.

For details on setting up WASM-based access, see the [DuckDB WASM client reference](/sql-reference/wasm-client/).

### Give read access to another team

Use shares to grant read-only access without sharing compute:

1. Create a share of the database: `CREATE SHARE my_share FROM my_database`
2. Grant access to the other team's account: `GRANT READ ON SHARE my_share TO 'other_user'`

The other team reads the shared database on their own Duckling. Your compute is not affected.

### Understand compute costs

Because each account runs on its own Duckling, compute costs are directly attributable:

- **Per-account billing**: you can see exactly how much compute each service account or user consumes
- **Right-sizing**: assign different [Duckling sizes](/about-motherduck/billing/duckling-sizes/) based on workload needs
- **Pulse for variable workloads**: use Pulse Ducklings for ad-hoc or bursty workloads to pay per query instead of per hour

## What's next

MotherDuck is adding workspaces and role-based access control (RBAC) to provide finer-grained access control within organizations. These features build on the resource hierarchy described here. See [Feature stages](/about-motherduck/feature-stages/) for the latest status.

## Related content

- [Architecture and capabilities](/concepts/architecture-and-capabilities/)
- [Hypertenancy](/concepts/hypertenancy/)
- [Duckling sizes](/about-motherduck/billing/duckling-sizes/)
- [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/)
- [Sharing data](/key-tasks/sharing-data/sharing-overview/)
- [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret/)

---

Source: https://motherduck.com/docs/concepts/database-snapshots

# Database Snapshots
> Understand how snapshots work in MotherDuck, including retention, restore, snapshot management, and plan availability
## What are snapshots?

Snapshots capture the complete state of a database at a specific point in time. MotherDuck creates **historical snapshots** in the background for attached databases (databases that are connected to MotherDuck and available for querying), enabling [data recovery](/concepts/data-recovery) features such as restore and [undrop](/sql-reference/motherduck-sql-reference/undrop-database). Historical snapshots come in two forms.

:::note DuckLake databases
DuckLake databases have their own snapshot and maintenance system. Snapshot lifecycle management for DuckLake is handled through [auto maintenance](/concepts/ducklake#auto-maintenance), not the native storage snapshot system described on this page. See [DuckLake snapshot retention](/concepts/ducklake#snapshot-retention) for details on configuring `SNAPSHOT_RETENTION_DAYS` and `AUTO_MAINTENANCE` for DuckLake databases.
:::

### 1. automatic snapshots {#1-automatic-snapshots}

Automatic snapshots are created continuously in the background by MotherDuck whenever data changes. For paid plans every new database has automatic snapshots configured by default. For paid plans you can also set or adjust your database's snapshot retention window with:

```sql
ALTER DATABASE example_database SET SNAPSHOT_RETENTION_DAYS = 4;
```

Automatic snapshots:

- Are created whenever data in the database changes or explicitly with `CREATE SNAPSHOT OF <database_name>;`
- Are retained as [`historical_bytes`](/concepts/storage-lifecycle) according to the database's `snapshot_retention_days` setting
- Can be queried using [`md_information_schema.database_snapshots`](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots)
- Are automatically removed by garbage collection when they fall outside the retention window

### 2. named snapshots {#2-named-snapshots}
Named snapshots have to be explicitly created with a name using [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot).

```sql
CREATE SNAPSHOT my_backup OF example_database
```

These persist indefinitely until the name is removed. Named snapshots are **not** subject to automatic garbage collection and are only available on the Business plan.

Named snapshots differ from automatic snapshots:

- They are **not garbage-collected** by snapshot retention
- They persist even if the source database they are associated with is deleted
- They can be referenced directly by name when restoring or cloning a database
- Snapshot names must be unique per user
- They can only be deleted by removing the name, after which they are picked up by garbage collection

Named snapshots are intended for **long-lived backups** and are the recommended mechanism for durable recovery points.

Named snapshots can be used with the [`ALTER DATABASE SET SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) command, as well as the [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) command to specify the snapshot you want to use.

## Restoring a database
You can [restore a database](/docs/sql-reference/motherduck-sql-reference/create-database/#source-database-options) from a snapshot by specifying the snapshot name, `snapshot_id` or a timestamp. When using a timestamp the latest snapshot at or before that time will be selected.

```sql
CREATE DATABASE example_db_from_snap FROM example_db (SNAPSHOT_NAME 'snap');

CREATE DATABASE example_db_from_id FROM example_db (SNAPSHOT_ID '4bfbd992-e586-48ab-9176-8dfb2d2c30b4');

CREATE DATABASE example_db_from_ts FROM example_db (SNAPSHOT_TIME '2026-01-01 00:00:01.234567');
```

## Snapshot features per plan

### Native storage databases

| Plan | Automatic snapshot retention default | Configurable retention period | Named snapshots | Point-in-time restore | [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) database |
|------|----------------------------|-------------------------------|-----------------|----------------------|-------------------|
| **Business** | 7 days | 0-90 days | Yes | Yes | Yes |
| **Lite (paid)** | 1 day | 1 day | No | Yes | Yes |
| **Lite (free)** | 0 days | N/A | N/A | N/A | N/A |

### DuckLake databases

DuckLake databases manage snapshots through [auto maintenance](/concepts/ducklake#auto-maintenance) rather than the native storage snapshot system.

| Database type | Auto maintenance default | Snapshot retention default | Configurable retention |
|------|----------------------------|-------------------------------|-----------------|
| **Fully managed** | Enabled | Infinite (`NULL`) | Yes, with `SNAPSHOT_RETENTION_DAYS` |
| **BYOB** | Disabled | Infinite (`NULL`) | Yes, after enabling `AUTO_MAINTENANCE` |

## Snapshot retention

The `snapshot_retention_days` database setting controls how long historical snapshots are retained for [data recovery](/concepts/data-recovery). This setting determines how much data is stored as [`historical_bytes`](/concepts/storage-lifecycle) in your storage footprint.

- **`0` days:** No historical snapshots are accessible; automatic snapshots are immediately eligible for garbage collection
- **`1+` days:** Automatic snapshots created within the retention window can be accessed and restored

Users can modify snapshot retention at any time using [`ALTER DATABASE`](/sql-reference/motherduck-sql-reference/alter-database):

```sql
ALTER DATABASE my_database SET SNAPSHOT_RETENTION_DAYS = 4;
```

To see your database's current snapshot retention, use [`md_information_schema.databases`](/sql-reference/motherduck-sql-reference/md_information_schema/databases) and look for the `historical_snapshot_retention` field.

::::note
Snapshot retention days are inherited when cloning a database.
::::

::::important
Increasing `snapshot_retention_days` does not restore previously deleted snapshots. Once the garbage collection process removes a snapshot, it cannot be recovered through this setting.
::::

## Working with named snapshots

Named snapshots are subject to naming rules.

- Snapshot names must be 1–255 characters long
- Names are unique per user across all databases
- If a name includes special characters (such as `.` or `/`), wrap it in double quotes
- If you create two named snapshots in a row without any new writes, the second can fail because the latest snapshot already has a name

### Renaming a named snapshot

Users can change the name of an existing named snapshot using the [`ALTER SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-snapshot) command:

```sql
ALTER SNAPSHOT <original_snapshot_name> SET snapshot_name = '<new_snapshot_name>';
```

### Deleting (un-naming) a named snapshot

To remove a name from a snapshot, run the following command:

```sql
ALTER SNAPSHOT <original_snapshot_name> SET snapshot_name = '';
```

Once unnamed, the snapshot will become subject to the database's `snapshot_retention_days` policy and will be deleted automatically when it falls outside the retention window.

## Historical snapshots and failsafe bytes

It's important to understand the distinction between historical snapshots and failsafe data:

- **Historical snapshots** are point-in-time copies of your database that you can restore yourself using SQL commands. They are stored as `historical_bytes` and controlled by your `snapshot_retention_days` setting.
- **Failsafe data** is a system-managed backup that MotherDuck retains for disaster recovery. It is stored as `failsafe_bytes` and can only be restored by contacting MotherDuck support.

| | Historical Snapshots: `historical_bytes` | Failsafe Data: `failsafe_bytes` |
|---|---|---|
| **Purpose** | User-initiated data recovery and point-in-time restore | System-level disaster recovery backup |
| **Controlled by** | `snapshot_retention_days` setting | MotherDuck system (7 days for standard databases, 1 day for transient) |
| **Recovery method** | Self-service through [`ALTER DATABASE SET SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) or [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) | Requires contacting [MotherDuck support](https://motherduck.com/contact-us/support/) |
| **Visibility** | Queryable through [`md_information_schema.database_snapshots`](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots) | Not directly visible to users |
| **Storage billing** | Billed as `historical_bytes` | Billed as `failsafe_bytes` |

For more details on storage lifecycle stages, see [Storage Lifecycle and Management](/concepts/storage-lifecycle).

## Best practices

- Use **named snapshots** for long-lived backups you may need to restore far into the future
- If you frequently overwrite your data, use a short snapshot retention window (1-7 days) to avoid storing multiple copies of the same data
- Failsafe restores should be thought of as a precautionary, last-minute measure in exception scenarios only; we recommend using historical snapshots for routine recovery needs
- Do **not** use [transient databases](/concepts/storage-lifecycle#transient-databases) for critical or hard-to-reconstruct data

## Related content

- [Data Recovery](/concepts/data-recovery) — Step-by-step guide to restoring databases from snapshots
- [Storage Lifecycle and Management](/concepts/storage-lifecycle) — Understanding storage stages and billing
- [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot) — SQL reference for creating snapshots
- [`DATABASE_SNAPSHOTS` view](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots) — Query snapshot history and metadata
- [`CREATE DATABASE` from a snapshot](/docs/sql-reference/motherduck-sql-reference/create-database/#source-database-options)

---

Source: https://motherduck.com/docs/concepts/pgduckdb

# pg_duckdb Extension
> Use pg_duckdb to run DuckDB analytics within PostgreSQL and connect to MotherDuck.
[pg_duckdb](https://github.com/duckdb/pg_duckdb) is an open-source Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres.

Use `pg_duckdb` when you specifically need DuckDB or MotherDuck access from inside a PostgreSQL server. If you only need to connect to MotherDuck from a PostgreSQL-compatible client, or to use common connection pooling libraries, use the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) instead.

## Main features

- SELECT queries executed by the DuckDB engine can directly read Postgres tables
- Read and write support for object storage (AWS S3, Cloudflare R2, or Google GCS)
- Read and write support for data stored in MotherDuck

For more information about functionality and installation, check out the [repository's README](https://github.com/duckdb/pg_duckdb/blob/main/README.md).

## Connect with MotherDuck

To enable this support you first need to [generate an access token][md-access-token] and then add the following line to your `postgresql.conf` file:

```ini
duckdb.motherduck_token = 'your_access_token'
```

NOTE: If you don't want to store the token in your `postgresql.conf` file can also store the token in the `motherduck_token` environment variable and then explicitly enable MotherDuck support in your `postgresql.conf` file:

```ini
duckdb.motherduck_enabled = true
```

If you installed `pg_duckdb` in a different Postgres database than the default one named `postgres`, then you also need to add the following line to your `postgresql.conf` file:

```ini
duckdb.motherduck_postgres_database = 'your_database_name'
```

After doing this (and possibly restarting Postgres). You can then create tables in the MotherDuck database by using the `duckdb` [Table Access Method][tam] like this:

```sql
CREATE TABLE orders(id bigint, item text, price NUMERIC(10, 2)) USING duckdb;
CREATE TABLE users_md_copy USING duckdb AS SELECT * FROM users;
```

[tam]: https://www.postgresql.org/docs/current/tableam.html

Any tables that you already had in MotherDuck are automatically available in Postgres. Since DuckDB and MotherDuck allow accessing multiple databases from a single connection and Postgres does not, we map database+schema in DuckDB to a schema name in Postgres.

This is done in the following way:
1. Each schema in your default MotherDuck database is merged with the Postgres schema that has the same name.
2. Except for the `main` DuckDB schema in your default database, which is merged with the Postgres `public` schema.
3. Tables in other databases are put into dedicated DuckDB-only schemas. These schemas are of the form `ddb$<duckdb_db_name>$<duckdb_schema_name>` (including the literal `$` characters).
4. Except for the `main` schema in those other databases. That schema should be accessed using the shorter name `ddb$<db_name>` instead.

An example of each of these cases is shown below:

```sql
INSERT INTO my_table VALUES (1, 'abc'); -- inserts into my_db.main.my_table
INSERT INTO your_schema.tab1 VALUES (1, 'abc'); -- inserts into my_db.your_schema.tab1
SELECT COUNT(*) FROM ddb$my_shared_db.aggregated_order_data; -- reads from my_shared_db.main.aggregated_order_data
SELECT COUNT(*) FROM ddb$sample_data$hn.hacker_news; -- reads from sample_data.hn.hacker_news
```

[md]: https://motherduck.com/
[md-access-token]: /key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token

---

Source: https://motherduck.com/docs/concepts/storage-lifecycle

# Storage Lifecycle and Management
> Understand how MotherDuck manages data storage across different lifecycle stages and how this affects your billing and data management strategies.
Understanding MotherDuck's storage lifecycle is crucial for optimizing costs and managing data effectively. Unlike traditional databases where deleted data is immediately freed, MotherDuck implements a multi-stage storage system that ensures data safety while providing cost transparency. This system is particularly important for organizations that share data, use zero-copy cloning, or need to understand their storage footprint for billing purposes.

The storage lifecycle applies to both native storage databases and [DuckLake](/concepts/ducklake) databases, with some differences in lifecycle stages and management. See [storage management](#storage-management) for retention defaults by database type.

## Storage lifecycle overview

The following diagram shows the storage lifecycle for native storage databases.

```mermaid
graph LR;
    A[Active Bytes]-->|bytes deleted or updated|B[Historical Bytes];
    B-->|shares dropped|C[Retained for Clone Bytes];
    B-->|historical retention period passes, or snapshots become unnamed|D[Failsafe Bytes];
    C-->|bytes deleted or updated by cloned databases|D[Failsafe Bytes];
    D-->|7 day retention|E[Deleted];
```

There are 5 distinct stages of the storage lifecycle:

1. **Active bytes**: Actively referenced bytes of the database. These bytes are accessible by directly querying the database.
2. **Historical bytes**: Non-active bytes referenced by historical [snapshots](/concepts/snapshots) or shares of this database. Used for time travel and self-service restore.
3. **Retained for clone bytes**: Bytes referenced by other databases (through zero-copy clone) that are no longer referenced by this database as active or historical bytes. This stage applies to native storage databases only.
4. **Failsafe bytes**: Bytes no longer referenced by any database or share, retained for a period as a last-resort, best-effort recovery service. Recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed to be complete. Don't rely on failsafe bytes as part of a backup plan.
5. **Deleted**: Bytes are fully removed from the system and no longer accessible.

MotherDuck runs a periodic job that reclassifies data to the proper storage lifecycle stage. For DuckLake databases, [auto maintenance](/concepts/ducklake#auto-maintenance) handles file cleanup and snapshot expiration.

Data can only flow through the storage lifecycle unidirectionally, from left to right.

The following conditions can trigger data to be reclassified to a new stage:

| Trigger | State transition |
|---------|------------------|
| Data is deleted or updated in the database | Active → Historical |
| All shares referencing the data are dropped or updated, and all historic [snapshots](/concepts/snapshots) referencing the data are deleted | Historical → Retained for Clone or Failsafe |
| Data is deleted from all zero-copy-cloned databases | Retained for Clone → Failsafe |
| Failsafe retention period passes (7 days for standard, 1 day for transient) | Failsafe → Deleted |

An organization is billed based on the average of active, historical, retained for clone, and failsafe bytes across all of their databases over the billing period.

Refer to the [data recovery](/concepts/data-recovery) overview for more details on how to manage historical snapshots.

### How this affects your data strategy

Understanding the storage lifecycle helps you make informed decisions about:

- **Data deletion strategies**: When you delete data, it doesn't immediately reduce your bill due to the retention stages
- **Sharing considerations**: Shared data remains in historical bytes until shares are updated or dropped
- **Cloning decisions**: [Zero-copy clones](/docs/sql-reference/motherduck-sql-reference/create-database/) can keep data in retained for clone bytes even after deletion from the source
- **Cost optimization**: Different lifecycle stages have different cost implications and management strategies

For more information on data sharing, see [Sharing Data](/key-tasks/sharing-data/sharing-overview). For details on zero-copy cloning, refer to [MotherDuck Architectural Concepts](/concepts/database-concepts/#motherduck-architectural-concepts).

## Storage management

Storage retention behavior depends on the database type: standard, transient, or DuckLake.

`SNAPSHOT_RETENTION_DAYS` controls how many days historical snapshots are retained for data recovery and time travel (see [Data Recovery](/concepts/data-recovery)). The recommended minimum is at least 1 day, so you can recover your data if you accidentally drop or overwrite it.

To see the historical retention and transient status of your databases, use the [`md_information_schema.databases`](/sql-reference/motherduck-sql-reference/md_information_schema/databases) view.

Lite starts in free-tier mode with no historical retention until usage limits are reached, after which Lite defaults apply.

### Standard databases

| Plan     | Failsafe period | Default historical retention | Min historical retention | Max historical retention |
|----------|-------------------------------------|------------------------------|------------------------------|------------------------------|
| **Business** | 7 days                              | 7 days                        | 0 days                       | 90 days                      |
| **Lite (paid)**     | 7 days                              | 1 day                        | 1 day                        | 1 day                        |
| **Lite (free)**    | 7 days                              | 0 days                        | 0 days                       | 0 days                       |

Historical retention enables point-in-time restore for your data. Business plan users can configure retention up to 90 days for extended data recovery capabilities.

### Transient databases

For use cases that don't require the default failsafe retention period (7 days), a native storage database can be set as `TRANSIENT` [at database creation](/sql-reference/motherduck-sql-reference/create-database/#database-options) to enforce a 1 day failsafe minimum. This setting can only be defined at database creation and **is not** modifiable.

| Plan     | Failsafe period | Default historical retention | Min historical retention | Max historical retention |
|----------|----------------------------------|--------------------------------------------------|--------------------------------------------------|--------------------------------------------------|
| **Business** | 1 day                             | 1 day                                           | 0 days                                           | 90 days                                          |
| **Lite (paid)**    | 1 day                             | 1 day                                            | 1 day                                            | 1 day                                            |
| **Lite (free)**    | 1 day                             | 0 days                                           | 0 days                                           | 0 days                                           |

Transient databases enforce a 1-day minimum lifetime for data, which  shows up in your bill as failsafe bytes.

Transient databases can be helpful for the following datasets:

* Datasets that are the intermediate output of a job (write once, read once)
* Datasets that can be reconstructed from an external data source

### DuckLake databases

[DuckLake](/concepts/ducklake) databases follow the same lifecycle stages as native storage databases (active, historical, failsafe, deleted), except there is no "retained for clone" stage since DuckLake does not support zero-copy cloning.

| Setting | Fully managed DuckLake | BYOB DuckLake |
|---------|----------------------|---------------|
| **Failsafe period** | 7 days | 7 days |
| **Default snapshot retention** | Infinite (`NULL`) | Infinite (`NULL`) |
| **Auto maintenance** | Enabled by default | Disabled by default |
| **Configurable retention** | Yes, with `SNAPSHOT_RETENTION_DAYS` | Yes, after enabling `AUTO_MAINTENANCE` |

DuckLake storage optimization and snapshot expiration are handled by [auto maintenance](/concepts/ducklake#auto-maintenance) rather than the native storage garbage collector. When `SNAPSHOT_RETENTION_DAYS` is set to `NULL` (the default), snapshots are retained indefinitely.

To configure snapshot retention for a DuckLake database:

```sql
ALTER DATABASE my_ducklake SET SNAPSHOT_RETENTION_DAYS = 7;
```

For more details on DuckLake storage management, see the [DuckLake storage lifecycle](/concepts/ducklake#storage-lifecycle) section.

## Backup strategies

If your data can't be recreated from source, plan an explicit backup strategy. Failsafe bytes are a last-resort recovery mechanism, not a backup plan: recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed.

The storage lifecycle gives you several mechanisms that you can rely on for backups:

- **Automatic snapshots** for time travel and short-term restore, retained as `historical_bytes` according to `SNAPSHOT_RETENTION_DAYS`. Retention defaults and limits depend on your plan (see [Standard databases](#standard-databases)).
- **Named snapshots** (Business plan) for long-lived backups that persist until you explicitly remove them. See [database snapshots](/concepts/snapshots#2-named-snapshots) for details.
- **Zero-copy clones** through [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) for isolated copies without duplicating storage costs.

[Transient databases](#transient-databases) skip the default 7-day failsafe retention and are appropriate for data that can be recreated from a job or external source.

For recovery procedures, see [data recovery](/concepts/data-recovery).

## Breaking down storage usage

:::note Admin only
Storage breakdown information is only available to users with the Admin role.
:::

To understand your organization's storage bill, you have two entry points:

### SQL

Query the [`STORAGE_INFO` and `STORAGE_INFO_HISTORY` views](/sql-reference/motherduck-sql-reference/md_information_schema/storage_info) in [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction) for a breakdown by lifecycle stage, as either a current snapshot or up to 30 days of history.

```sql
-- Get current storage information for all databases
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO;
```

### UI

Open the [databases page](https://app.motherduck.com/settings/databases) in settings to see total storage across all databases and a per-database breakdown. Click a row to view lifecycle stages for that database.

### _Active bytes_ are higher than expected
Consider whether you need all of the data stored in that database. Some common ways to decrease active bytes are to delete the data or optimize sorting and data types.

### _Historical bytes_ are higher than expected
You should look into either outstanding manually updated shares referencing this database in the organization or your historical database snapshots. Outstanding manually updated shares may keep historical data referenced (which prevent it from being deleted). Your historical byte footprint will decrease as the shares are updated (`UPDATE SHARE`) or dropped. You can find all shares that reference some database by using the [OWNED_SHARES](/sql-reference/motherduck-sql-reference/md_information_schema/owned_shares) view in the [MD_INFORMATION_SCHEMA](/sql-reference/motherduck-sql-reference/md_information_schema/introduction).

Otherwise you can consider reducing the `SNAPSHOT_RETENTION_DAYS` on your database to reduce the number of historical snapshots you retain. Note that this will reduce the window of time that you can restore data from. See [data recovery](/concepts/data-recovery) for more details on how to plan and setup a proper data recovery protocol for your organization.

### _Retained for clone bytes_ are higher than expected
Consider whether there are other databases that were zero-copy cloned from this database that are still referencing deleted data. This footprint will decrease as you delete the cloned data from these other databases.

### _Failsafe bytes_ are higher than expected
Failsafe bytes result from deleting data. This footprint should drop if this was a one-time deletion of data. If failsafe bytes remain consistently high - it is likely that you are overwriting or updating data too frequently. Common workloads that tend to delete a lot of data (through overwrites or updates) are: create or replace tables, truncate and insert, updates, and deletes. Avoiding these workload patterns can reduce your failsafe footprint. You can also consider using a [`TRANSIENT` database](#transient-databases), if it supports your use case, to reduce failsafe bytes to [1 day](https://motherduck.com/docs/concepts/storage-lifecycle/#transient-databases).

If you need help understanding or reducing your storage bill, reach out to [MotherDuck support](https://motherduck.com/contact-us/support/).

---

Source: https://motherduck.com/docs/concepts/data-recovery

# Data Recovery
> Understand MotherDuck's data recovery mechanisms
## Overview

MotherDuck provides [historical snapshots](/concepts/snapshots) to support point-in-time backup/restore mechanisms on Lite and Business plans. On the Lite plan, databases only keep the active snapshot (no historical retention) until usage limits are reached, after which Lite snapshot retention and [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) behavior apply. This page covers an example workflow with [named snapshots](/concepts/snapshots#2-named-snapshots) and outlines how to restore a database to a historical snapshot within the [snapshot retention](/concepts/snapshots#snapshot-retention) window (`snapshot_retention_days`). Refer to the [Database Snapshots](/concepts/snapshots) page for more details.

:::note DuckLake databases
DuckLake databases manage snapshots through [auto maintenance](/concepts/ducklake#auto-maintenance). Snapshot retention for DuckLake defaults to infinite (`NULL`) and is configured with `SNAPSHOT_RETENTION_DAYS` through [`ALTER DATABASE`](/sql-reference/motherduck-sql-reference/alter-database). See [DuckLake snapshot retention](/concepts/ducklake#snapshot-retention) for details.
:::

### Snapshot options per plan (native storage)

| Plan | Snapshot Retention Default | Configurable Retention Period | Named Snapshots | Point-in-Time Restore | [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) Database |
|------|----------------------------|-------------------------------|-----------------|----------------------|-------------------|
| **Business** | 7 days | 0–90 days | Yes | Yes | Yes |
| **Lite (paid)** | 1 day | 1 day | No | Yes | Yes |
| **Lite (free)** | 0 days | N/A | N/A | N/A | N/A |

Snapshots can be used to restore a new database to the snapshot using [`CREATE DATABASE`](/sql-reference/motherduck-sql-reference/create-database) or to [`ALTER`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) an existing database to reflect the contents of a specific snapshot.

- **[Automatic snapshots](/concepts/snapshots#1-automatic-snapshots)** are retained for a set period of time according to `snapshot_retention_days` after they are no longer the active snapshot for a database.
- **[Named snapshots](/concepts/snapshots#2-named-snapshots)** are created explicitly and persist until unnamed. They are not subject to automatic garbage collection.

A new database:

```sql
CREATE DATABASE <new_db> FROM <current_db> (
    SNAPSHOT_TIME ... | SNAPSHOT_NAME ... | SNAPSHOT_ID ...
)
```

An existing database:

```sql
ALTER DATABASE <existing_db> SET SNAPSHOT TO (
    SNAPSHOT_TIME ... | SNAPSHOT_NAME ... | SNAPSHOT_ID ...
)
```

Snapshots can also be used to recover a dropped database:

```sql
UNDROP DATABASE <db_name>
```

Refer to [undrop database](/sql-reference/motherduck-sql-reference/undrop-database) page for details.

Example:

```sql
-- You cannot drop the currently active database
USE some_other_db;
DROP DATABASE recovery_demo;

UNDROP DATABASE recovery_demo;
```

Refer to the [named snapshots](/sql-reference/motherduck-sql-reference/create-snapshot) page for an example.

## Restoring your database to a named snapshot

```sql
CREATE DATABASE example_db;

USE example_db;
CREATE TABLE one AS SELECT  1;
CREATE SNAPSHOT one OF example_db;
CREATE TABLE two AS SELECT  2;
CREATE SNAPSHOT two OF example_db;
CREATE TABLE three AS SELECT 3;
CREATE SNAPSHOT three OF example_db;

-- Accidentally drop data!
DROP TABLE two;

-- Restore a previous snapshot of the DB and check it's what you want
CREATE DATABASE example_restore FROM example_db (SNAPSHOT_NAME 'three');

-- The snapshot looks correct!
SELECT * FROM example_restore.two;

-- Restore the database to the old valid snapshot
ALTER DATABASE example_db SET SNAPSHOT TO (SNAPSHOT_NAME 'three');

-- We have successfully restored our data!
SELECT * FROM two;
```

## Restoring a database to a historical snapshot

To find all snapshots corresponding to your database, run the following queries.

To see the history of snapshots for a given database:

```sql
SELECT * FROM MD_INFORMATION_SCHEMA.DATABASE_SNAPSHOTS WHERE database_name = '<your_database_name>' ORDER BY created_ts DESC;
```

If you have a rough idea of the time range you want to restore your database to, you can filter the above query by `created_ts`,

```sql
SELECT snapshot_id, created_ts, active_bytes FROM MD_INFORMATION_SCHEMA.DATABASE_SNAPSHOTS
WHERE database_name = '<your_database_name>'
and created_ts >= '2024-12-02 20:00:00'
and created_ts <= '2024-12-02 20:05:00'
ORDER BY created_ts DESC
```

The results should look something like this:

| snapshot_id | created_ts | active_bytes|
|-------------|-----------|----------|
| `73034f48-e832-40d6-a30f-9055eb302a2e` | `2024-12-02 20:03:30` | `2191330` |
| `c204ce3b-f3fd-4677-8a05-e8680648cf27` | `2024-12-02 20:02:05` | `2183991` |
| `63395025-b139-4c6f-8fc2-7b8c0feff748` | `2024-12-02 20:01:55` | `1847296` |

Example (restore an existing database by ID):

```sql
ALTER DATABASE your_database_name
SET SNAPSHOT TO (SNAPSHOT_ID '<snapshot_id_from_query>');
```

Both automated and named snapshots can be used to restore to a desired state that was captured. Users can either restore a new or existing database to a specific snapshot.

```sql
-- For a new database
CREATE DATABASE restored_database FROM your_database_name (SNAPSHOT_ID 'c204ce3b-f3fd-4677-8a05-e8680648cf27');
```

After running the above command, users can run queries on `restored_database` and use the state of the database from a prior point-in-time.

Once users have the exact snapshot they are interested in restoring, we recommend finding the `snapshot_id` (instead of using `snapshot_time`) and using the command:

```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_ID 'c204ce3b-f3fd-4677-8a05-e8680648cf27');
```

**Note:** Running a `SET SNAPSHOT TO` command that specifies a timestamp that doesn't exist in `md_information_schema.database_snapshots` will select the most recent snapshot created at or before the specified timestamp. In our example, snapshot `63395025-b139-4c6f-8fc2-7b8c0feff748` would be selected because it is the only snapshot in the information schema table that was created before `'2024-12-02 20:02:04'`

```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_TIME '2024-12-02 20:02:04');
```

However, if you run:

```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_TIME '2024-12-02 20:02:05');
```

then snapshot `c204ce3b-f3fd-4677-8a05-e8680648cf27` will be selected because there is an exact timestamp match.

## See also

- [Database Snapshots](/concepts/snapshots) — Understanding snapshot types, retention, and best practices
- [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot) — SQL reference for creating snapshots
- [`UNDROP DATABASE`](/sql-reference/motherduck-sql-reference/undrop-database) — Recovering dropped databases

---

Source: https://motherduck.com/docs/concepts/flights

# Flights
> How Flights run scheduled Python next to your MotherDuck data, and when to reach for one instead of SQL, a Dive, or an external orchestrator.
A **Flight** is a Python program that MotherDuck schedules and runs, with direct access to your databases. It can do whatever Python can: call external APIs, use any PyPI package, process files or run custom logic.

## Where Flights fit in MotherDuck

MotherDuck runs two compute planes against the same storage:

```mermaid
flowchart LR
    User{{"You"}}:::green

    subgraph MotherDuck["MotherDuck"]
        Runtime["Python runtime<br/>(per Flight run)"]:::yellow
        Duckling["Duckling<br/>(SQL compute)"]:::yellow
    end

    DB[("Your databases")]:::yellow

    User -->|"interactive SQL"| Duckling
    User -->|"create / schedule"| Runtime
    Runtime -->|"md:"| Duckling
    Duckling --> DB
```

**Ducklings** run your SQL. They're per-user (see [Hypertenancy](/concepts/hypertenancy)) and start in about a second.

The **Python runtime** runs your scheduled Python. You provide the Flight's source code, and each run gets its own isolated runtime that executes the `main()` entrypoint and exits. A run starts from the Flight's schedule or an `MD_RUN_FLIGHT` call.

A Flight reaches your data the same way any DuckDB client does: the Python runtime opens an `md:` connection, which routes through a Duckling that executes the SQL against your databases. MotherDuck injects an access token into the Flight's environment so the connection works without you embedding credentials.

## The relationship to SQL

Flights and SQL talk to each other in both directions.

**From Python to SQL.** A Flight uses the DuckDB Python client to run any query you can express in SQL:

```python
import duckdb

def main():
    con = duckdb.connect("md:")
    con.execute("INSERT INTO sales.daily_totals SELECT * FROM read_parquet('s3://incoming/today.parquet')")
```

Anything in the SQL reference is available here: aggregations, MotherDuck table functions, attach commands, and so on.

**From SQL to Flights.** Flights also have a SQL surface. `MD_CREATE_FLIGHT`, `MD_FLIGHTS`, `MD_FLIGHT_RUNS`, and the rest of the [Flights SQL functions](/sql-reference/motherduck-sql-reference) let you create, schedule, list, and monitor Flights from anywhere you can run SQL: a DuckDB CLI, your BI tool, or another Flight.

**When to pick which.** A reasonable rule of thumb:

- **Stay in SQL** for a one-off query, or one you already schedule elsewhere.
- **Use a Flight** when you want MotherDuck to run the query on a schedule, when the job should retry and keep a run history, or when the work goes beyond SQL: calling an external service, installing a Python package, reading a file format SQL can't, or chaining steps that don't fit one query.

## What you can build with Flights

- **Ingest.** Pull data from external sources (Postgres, BigQuery, Snowflake, S3, APIs) and write it to MotherDuck tables. [dlt](https://dlthub.com/) is the recommended ingest library: it gives you a declarative pipeline with schema evolution, incremental loading, and a MotherDuck destination.
- **Transform.** Read MotherDuck data, run Python-heavy transformations (machine-learning features, geospatial work), and write the result back. When the transformation is graph-shaped, [dbt](https://docs.getdbt.com/) with the `dbt-duckdb` adapter is the recommended approach.
- **AI enrichment.** Run LLM and embedding work over your data on a schedule. Call MotherDuck's [`prompt`](/sql-reference/motherduck-sql-reference/ai-functions/prompt/) and [`embedding`](/sql-reference/motherduck-sql-reference/ai-functions/embedding/) functions from SQL to classify, summarize, or extract structured fields row by row, or use a Python AI library in `main()` to generate embeddings, score records, or call a model API and write the results back to a table.
- **Export and delivery.** Write query results to object storage as Parquet or CSV — a scheduled drop in S3, Google Cloud Storage, or Azure Blob Storage for a partner or a downstream pipeline to pick up.
- **Reverse ETL and notifications.** Read from MotherDuck and push to a downstream operational system: post a daily summary to Slack, sync an aggregate to a SaaS tool, update a record in a CRM.

## Creating a Flight

You can create a Flight with the [MotherDuck MCP server](/key-tasks/ai-and-motherduck/mcp-setup/): describe the job in natural language, and the agent writes the Python, sets the schedule, and runs it for you with the MCP's Flight tools (`create_flight`, `edit_flight_source`, `run_flight`, and the rest).

You can also create and manage Flights directly with SQL through [`MD_CREATE_FLIGHT`](/sql-reference/motherduck-sql-reference/flights/) and the other Flight table functions, from a DuckDB CLI, your BI tool, or another Flight.

## Integrating with the tools you already use

- **dlt** for ingest. Generate or hand-write a dlt pipeline that uses MotherDuck as the destination, then deploy it as a Flight.
- **DuckDB Community Extensions**. Use [community extensions](https://duckdb.org/community_extensions/list_of_extensions) you trust that are not available on MotherDuck by default. Pull data from [BigQuery](https://duckdb.org/community_extensions/extensions/bigquery) or [Snowflake](https://duckdb.org/community_extensions/extensions/snowflake), [read YAML files](https://duckdb.org/community_extensions/extensions/yaml) and much more.
- **Anything on PyPI.** Pull in any pip-installable package through `requirements.txt`.
- **MCP and AI agents.** Agents create, edit, and run Flights through the [MotherDuck MCP Server](/sql-reference/mcp/). This is the marquee creation path: describe what you want in natural language, the agent writes the Python and wires up the schedule.

## Beyond Python

A Flight is a Linux process that runs your Python program, with shell access through `subprocess`. That opens up two patterns worth knowing.

### Install and call system binaries

Use `subprocess` to run `apt-get` and invoke command-line tools (git, ffmpeg, Playwright, anything available as a Debian package) from your Flight.

```python
import subprocess

def main():
    subprocess.run(["apt-get", "install", "-y", "git"], check=True)
    subprocess.run(["git", "clone", "https://github.com/example/repo"], check=True)
```

### Run a local DuckDB with community extensions

A Flight can open its own in-process DuckDB connection alongside the `md:` connection, which lets you load DuckDB [community extensions](https://duckdb.org/community_extensions/) that aren't supported on MotherDuck's server-side runtime. For example, the [`bigquery`](https://duckdb.org/community_extensions/extensions/bigquery) and [`snowflake`](https://duckdb.org/community_extensions/extensions/snowflake) extensions let a Flight read directly from another warehouse and write the result to MotherDuck.

```python
import duckdb

def main():
    local = duckdb.connect()  # local in-process DuckDB
    local.execute("INSTALL bigquery FROM community; LOAD bigquery;")
    local.execute("ATTACH 'project=my-project' AS bq (TYPE bigquery, READ_ONLY)")

    # read from BigQuery into the Flight, then write the result to MotherDuck
    events = local.sql("SELECT * FROM bq.analytics.events").df()

    md = duckdb.connect("md:")
    md.execute("INSERT INTO raw.events SELECT * FROM events")
```

Keep heavy compute outside the Flight runtime. A Flight is sized for orchestration and light processing, not for crunching large tables in memory. Let the source warehouse (BigQuery, Snowflake) and MotherDuck handle the heavy lifting, and let the Flight move data between them.

## How Flights are scoped

- **A Flight runs as a triggered job.** A run starts from its schedule or a `MD_RUN_FLIGHT` call, executes `main()` to completion, and exits; a single run can be long or short.
- **One Flight is one process.** A run isn't a managed worker pool with a built-in queue or distributed state. You can fan out by having a Flight call `MD_RUN_FLIGHT` to trigger others, or parallelize within a run with a Python thread pool — the coordination logic is yours to write. Flights are at their best for MotherDuck-centric work where the data, schedule, and compute live in one place; for workflows that span many external systems, a dedicated orchestrator like Airflow or Prefect is still the better tool.
- **Runs are unattended.** You read a run's output afterward from its [logs](/sql-reference/motherduck-sql-reference/flights/md-flight-logs) rather than stepping through it live. To explore data interactively, run queries in the SQL editor instead.

## Related resources

- [Flights SQL reference](/sql-reference/motherduck-sql-reference/flights/) — `MD_CREATE_FLIGHT`, `MD_RUN_FLIGHT`, and the rest of the Flight table functions.
- [Hypertenancy](/concepts/hypertenancy) — why Ducklings give every user dedicated SQL compute.
- [Creating visualizations with Dives](/key-tasks/ai-and-motherduck/dives) — interactive React apps over MotherDuck data.
- [MotherDuck MCP Server](/sql-reference/mcp/) — the MCP surface for AI agents that create and manage Flights.

---

Source: https://motherduck.com/docs/concepts/scaling-patterns

# Workload scaling patterns
> Choose the right compute size, scaling approach, and connection model for your MotherDuck workload using a decision flowchart and workload-to-pattern matrix.
MotherDuck gives you several levers to scale your workloads. The right combination depends on your concurrency needs, query characteristics, and whether your workload is read-heavy or write-heavy. This page helps you match your workload to the right scaling pattern.

## How MotherDuck scales per workload

MotherDuck scales workloads through compute units called [Ducklings](/concepts/hypertenancy/). Each user or service account gets a dedicated Duckling and read scaling pool, and you can adjust three levers to match your workload:

| Scaling lever | What it does | When to use it |
|---|---|---|
| **Vertical scaling** | Resize your Duckling ([Pulse through Giga](/about-motherduck/billing/duckling-sizes)) | Queries need more CPU or memory|
| **Horizontal scaling** | Add read-only Ducklings through [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) | Many concurrent users running read queries |
| **Workload isolation** | Create separate [service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) | Teams or pipelines that should not share compute, for example to prevent a large data ingestion impacting the queries of analysts. |

These levers are complementary. For example, you might use a Jumbo Duckling (vertical) for data loading and use a pool of Pulse Ducklings with read scaling (horizontal) for your dashboard users. You can connect through any [supported interface](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/), including the native DuckDB SDK, the Postgres endpoint, or DuckDB WASM.

:::tip[Not sure what you need?]
Follow the [decision flowchart](#decision-flowchart) at the bottom of this page to find the right scaling pattern for your workload.
:::

## Understanding the scaling levers

### Vertical scaling: Duckling sizes

When a single query needs more resources, move to a larger Duckling. Larger Ducklings have more CPU, memory, and extra SSD space to be used whenever queries use up too much memory. This helps with:

- Complex joins and aggregations
- Large data loading jobs
- Queries that process more data than fits in memory

Duckling sizes range from **Pulse** (lightweight, on-demand billing) to **Giga** (maximum resources for the heaviest batch jobs). See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for the full comparison.

**When to size up**: If queries are slow and you see high values for `BYTES_SPILLED_TO_DISK` or `WAIT_TIME` in your [query history](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/), your Duckling may need more memory.

### Horizontal scaling: Read scaling

When you need to serve many concurrent read queries, [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) adds read-only Duckling replicas behind your account. Key concepts:

- **Pool size**: The default pool is 4 Ducklings, configurable up to 16 (soft limit). [Contact support](https://motherduck.com/contact-us/support/) for higher limits.
- **Eventual consistency**: Read replicas lag a few minutes behind the primary. Use [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot/) and [`REFRESH DATABASES`](/sql-reference/motherduck-sql-reference/refresh-database/) if you need tighter synchronization.
- **One Duckling per user**: For the best performance, aim for one Duckling per concurrent user. This takes full advantage of DuckDB's single-node architecture.

### Session affinity and routing

By default, read scaling distributes connections across the pool in round-robin fashion. When the number of connections exceeds your pool size, new connections share existing Ducklings.

For workloads where users run unique queries, use [`session_name`](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#session-affinity-with-session-name) to route a user's connections to the same Duckling. This improves performance because:

- **Cache locality**: DuckDB caches data per instance. Routing the same user to the same Duckling means their subsequent queries benefit from a warm cache.
- **Consistent view**: Queries within a session see a consistent snapshot of the data.
- **Better isolation**: Concurrent users do not compete for the same Duckling's resources.

Set `session_name` to a user ID, session ID, or any stable identifier to benefit from caching where possible. You can [set the Duckling cooldown period](/docs/about-motherduck/billing/duckling-sizes/#configuring-the-cooldown-period) to match your use case and keep the Duckling alive with its cache.

**When to use `session_name`**: Use it when users run unique, personalized queries, for example in customer-facing analytics or multi-tenant dashboards. Skip it when all users run the same queries (such as a shared reporting dashboard), since a shared connection pool already routes efficiently.

### Workload isolation: Service accounts

[Service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) give you full compute and data isolation between workloads. Each service account gets its own Duckling and read scaling pool, which makes it easier to track usage and billing.

Use separate service accounts when:

- Different teams should never share compute (for example, production vs. development)
- A data loading pipeline should not compete with queries from analysts
- You need your customers to be able to write back to a database in a [customer-facing analytics](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/) setup, or have a separate read scaling pool for each customer.

If you need a visual interface to manage these service accounts, the UI lets you impersonate a service account and adjust settings and run queries as the service account.

## Quick reference: workload patterns

Use this matrix to find the recommended pattern for common workloads. Each row represents a typical use case, with inputs describing the workload and outputs recommending a configuration.

| Use case | Users | Concurrency | R/W | Overlap | Weight | Duckling size | Scaling approach |
|---|---|---|---|---|---|---|---|
| *Ad-hoc analyst* | 👤 | Sequential | R/W | — | ⚡/🏋️ | Pulse / Standard+ | Default (single Duckling) |
| *dbt or ELT pipeline* | 👤 | Concurrent | W | — | 🏋️ | Jumbo / Mega | Default (single Duckling) |
| *Scheduled ingestion job* | 👤 | Sequential | W | — | 🏋️ | Jumbo+ | Default + dedicated service account |
| *BI dashboard (Omni, Hex, Metabase)* | 👥👥 | Concurrent | R | High | ⚡ | Pulse / Standard | Read scaling (shared pool) |
| *Embedded analytics* | 👥👥 | Concurrent | R | Low | ⚡/🏋️ | Pulse / Standard+ | Read scaling + `session_name` |
| *Customer-facing app (3-tier)* | 👥👥 | Concurrent | R | Low | ⚡ | Standard | Read scaling + `session_name` |
| *Serverless function (Lambda, Workers)* | 👥👥 | Concurrent | R | Varies | ⚡ | Standard | Read scaling |
| *Multi-team production* | 👥👥 | Concurrent | R/W | Low | ⚡/🏋️ | Per team | Separate service accounts |

### Reading the matrix

**Input columns** describe your workload:

- **Users**: How many people or clients connect: 👤 single, 👥👥 many
- **Concurrency**: Whether queries run one at a time (sequential) or in parallel (concurrent)
- **R/W**: Whether the workload reads (R), writes (W), or both (R/W)
- **Overlap**: Whether different users tend to run the same queries (high) or unique queries (low)
- **Weight**: Whether queries are light (⚡ sub-second) or heavy (🏋️ seconds to minutes)

**Output columns** recommend a configuration:

- **Duckling size**: Which [Duckling size](/about-motherduck/billing/duckling-sizes/) to use
- **Scaling approach**: Which horizontal scaling method to apply

## Choosing an interface

Your choice of interface does not change the scaling levers available to you, but it does affect session management and connection behavior.

| Interface | Best for | Session management |
|---|---|---|
| Native SDK (Python, Node.js, Java) | Client applications, scripts, dbt | Instance cache, `session_name` |
| [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/) | Serverless functions, BI tools, environments without DuckDB | Per-connection |
| DuckDB WASM | Browser-based applications | Client-side compute |

## Cost considerations

Scaling decisions affect your compute costs:

- **Vertical scaling** increases the per-second cost of your Duckling. Larger Ducklings cost more but finish heavy queries faster.
- **Horizontal scaling** adds Ducklings proportional to active sessions, not total users. Idle Ducklings shut down after the configured [cooldown period](/about-motherduck/billing/duckling-sizes/).
- **Pulse Ducklings** use per-query billing (minimum 1 compute-unit second), making them cost-effective for sporadic, lightweight workloads.
- **Cooldown tuning** balances cost against cache warmth. A longer cooldown keeps the cache warm for returning users but costs more during idle periods.

See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) and [pricing](/about-motherduck/billing/pricing/) for the full cost breakdown.

## Decision flowchart

If you are not sure where to start, follow this flowchart:

```mermaid
flowchart TD
    WriteQ{"Write-heavy?<br/>(data loading, ETL)"}:::yellow

    WriteQ -->|Yes| WeightQ{"Heavy queries?<br/>(complex joins, large loads)"}:::yellow
    WriteQ -->|No| ConcQ{"How many concurrent<br/>read users?"}:::yellow

    WeightQ -->|Yes| SizeUp["Size up your Duckling<br/>(Jumbo / Mega / Giga)"]:::green
    WeightQ -->|No| StdDuckling["Standard Duckling"]:::green

    SizeUp --> Isolation{"Need compute isolation<br/>between workloads?"}:::yellow
    StdDuckling --> Isolation
    Isolation -->|Yes| SvcAcct["Separate service accounts"]:::green
    Isolation -->|No| SingleAcct["Single service account"]:::green

    ConcQ -->|"1-5"| Default["Default Duckling,<br/>size for your heaviest query"]:::green
    ConcQ -->|"5-50"| OverlapQ{"Do users read the<br/>same data?"}:::yellow
    ConcQ -->|"50+"| OverlapQ2{"Do users read the<br/>same data?"}:::yellow

    OverlapQ -->|"Yes, mostly shared"| SharedPool["Read scaling<br/>(shared connection pool)"]:::green
    OverlapQ -->|"No, differs per user"| SessionHint["Read scaling<br/>+ session_name"]:::green

    OverlapQ2 -->|"Yes, mostly shared"| HighConcShared["Read scaling at max<br/>connection pool size"]:::green
    OverlapQ2 -->|"No, differs per user"| HighConcUnique["Read scaling at max<br/>connection pool size<br/>+ session_name"]:::green
```

## Related content

- [Hypertenancy](/concepts/hypertenancy/): how MotherDuck's per-user compute model works
- [Duckling sizes](/about-motherduck/billing/duckling-sizes/): compare sizes and configure your Ducklings
- [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/): set up read-only Duckling pools
- [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/): create isolated compute for teams and pipelines
- [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/): connect through the PostgreSQL wire protocol
- [Customer-facing analytics](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/): build multi-tenant analytics with per-customer isolation

---

Source: https://motherduck.com/docs/concepts/object-name-resolution

# Object name resolution


> Fully qualified naming conventions and database resolution rules in MotherDuck.

## Fully qualified naming convention

Fully qualified names (FQN) in MotherDuck are of the form `<database>.<schema>.<object>`. Fully qualified naming convention allows you to query objects in MotherDuck regardless of context. Queryable objects can be tables and views.

For example:

```sql
SELECT * FROM mydatabase.myschema.mytable;
```

Fully qualified naming convention is useful when you want your SQL to execute reliably across multiple interfaces, by various users, or in programmatic scripts.

## Relative naming convention

For convenience, MotherDuck enables you to omit database or schema when querying objects.

When **database is omitted**, MotherDuck will attempt to resolve the query by using the current database:

```sql
SELECT * FROM myschema.mytable;
```

When **both database and schema are omitted**, MotherDuck will first attempt to find the object in the current schema. Thereafter, it will attempt to find the object in other schemas in the current database. If the object name is ambiguous - for example if multiple tables with the same name exist in the database - MotherDuck will return an error:

```sql
SELECT * FROM mytable;
```

You may also choose to **omit just the schema**. MotherDuck will first search the current schema, and thereafter will search for the object across all other schemas in the specified database:

```sql
SELECT * FROM mydatabase.mytable;
```

---

Source: https://motherduck.com/docs/concepts/duckdb-extensions

# DuckDB extensions in MotherDuck


> Supported DuckDB extensions for the MotherDuck cloud service, Web UI, and CLI.

MotherDuck supports a wide array of DuckDB extensions to enhance your analytics workflows. Support varies depending on whether you are using the DuckDB CLI, the MotherDuck cloud service (server-side), or the MotherDuck Web UI.

## Extension support

### MotherDuck Web UI
The MotherDuck Web UI supports a subset of extensions optimized for interactive analytics and data exploration directly in your browser. Some extensions can be loaded in the Web UI but are not supported server side (i.e., they are invoked and ran only in the browser).

### MotherDuck Cloud (server-side)
MotherDuck's cloud service supports a curated set of extensions for optimized, secure, and scalable query execution. These extensions are available for all queries running against the MotherDuck service.

### DuckDB CLI
When connected to MotherDuck through the local DuckDB CLI, **all** DuckDB extensions are available. These extensions are loaded locally, giving you access to the entire DuckDB ecosystem for development and testing.

## Extension support matrix

The following table summarizes the current support for DuckDB extensions across MotherDuck environments, as it relates to execution context - extensions supported only server-side will only use server-side compute, where as extensions also supported in the Web UI will use local compute as well.

The environments are **MD Web UI**, located at https://app.motherduck.com, **MD Cloud**, which runs on MotherDuck infrastructure when you connect using `md:`, and **DuckDB UI / CLI** which run on local environments where the DuckDB client is installed.

| Extension            | MD UI* | MD Cloud | DuckDB UI / CLI |
|----------------------|--------|----------|-----------------|
| autocomplete         | ✅      | ❌        | ✅               |
| avro                 | ✅      | ✅        | ✅               |
| aws                  | ❌      | ❌        | ✅               |
| azure                | ❌      | ✅        | ✅               |
| delta                | ❌      | ✅        | ✅               |
| ducklake             | ✅      | ✅        | ✅               |
| encodings            | ❌      | ✅        | ✅               |
| excel                | ✅      | ✅        | ✅               |
| fts                  | ✅      | ✅        | ✅               |
| httpfs               | ✅      | ✅        | ✅               |
| h3                   | ✅      | ✅        | ✅               |
| iceberg              | ❌      | ✅        | ✅               |
| icu                  | ✅      | ✅        | ✅               |
| inet                 | ✅      | ✅        | ✅               |
| jemalloc             | ❌      | ❌        | ✅               |
| json                 | ✅      | ✅        | ✅               |
| mysql                | ❌      | ❌        | ✅               |
| parquet              | ✅      | ✅        | ✅               |
| postgres             | ❌      | ❌        | ✅               |
| spatial              | ✅      | ✅        | ✅               |
| sqlite               | ✅      | ❌        | ✅               |
| tpcds                | ✅      | ✅        | ✅               |
| tpch                 | ✅      | ✅        | ✅               |
| ui                   | ❌      | ❌        | ✅               |
| vss                  | ❌      | ❌        | ✅               |
| community extensions | ❌      | ❌        | ✅               |

:::note
*Not all features of extensions in the MotherDuck UI (Wasm) are supported.
:::

:::note
For some extensions (such as `h3`), you should load it before loading the `motherduck` extension if you want to use it
on local data without routing the query to MotherDuck.

```sql
-- Install and load the h3 extension before MotherDuck
INSTALL h3 FROM community;
LOAD h3;
LOAD motherduck;
ATTACH 'md:';
```

:::

Extensions listed as supported by DuckDB UI / CLI, such as `aws`, `postgres`, and `vss`, can also be used through a local DuckDB instance connected to MotherDuck.

## Future development

MotherDuck's extension support is continuously evolving. The team regularly evaluates and adds support for new extensions based on user demand and technical feasibility. If you need specific extensions enabled, please reach out to the MotherDuck team.

---

Source: https://motherduck.com/docs/concepts/ducklake

# DuckLake


> Understanding DuckLake - A high-performance open table format for petabyte-scale analytics

::::info
MotherDuck supports DuckDB 1.5.3. In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions 1.4.0 through 1.5.3. In **US West (Oregon) -** `us-west-2`, MotherDuck supports client versions 1.4.1 through 1.5.3. In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions 1.4.1 through 1.5.3.
::::

DuckLake is an open table format for large-scale analytics that provides data management capabilities similar to Apache Iceberg and Delta Lake. It organizes data into partitions based on column values like date or region for efficient querying, with actual data files stored on object storage systems. DuckLake innovates by storing metadata in database tables rather than files, enabling faster lookups through database indexes and more efficient partition pruning using SQL queries, while the columnar data itself resides on scalable object storage infrastructure.

MotherDuck provides support for managed DuckLake, enabling you to back MotherDuck databases with a DuckLake catalog and storage for petabyte-scale data workloads.

:::tip
Looking for **code examples?** Check out the [integration guide](/integrations/file-formats/ducklake/) to see how easy it is to start using DuckLake with MotherDuck.
:::

## Key characteristics

**Database-backed metadata**: DuckLake stores table metadata in a transactional database (PostgreSQL, MySQL) rather than files, providing:
- Faster metadata lookups through database indexes
- Efficient filtering of data by skipping irrelevant partitions using SQL WHERE clauses
- Simplified writes without the performance of manifest file merging

**Multi-table transactions**: Unlike other lake formats that operate on individual tables, DuckLake supports ACID transactions across multiple related tables, better reflecting how organizations think about databases as collections of inter-related tables.

**Simplified architecture**: No additional catalog server required—just a standard transactional database that most organizations already have expertise managing.

## DuckLake vs. other lake formats

### Performance differences

Table formats like Apache Iceberg and Delta Lake store metadata in file-based structures. Read and write operations must traverse these file-based metadata structures, which can create latency that increases with scale.

**File-based metadata challenges**:
- Sequential file scanning for metadata discovery
- Complex manifest file merging for writes
- Limited query optimization due to metadata access patterns
- Catalog server complexity for coordination

**DuckLake approach**:
- Database indexes provide faster metadata lookups
- Transactional writes reduce manifest merging overhead
- SQL-based partition pruning and query optimization
- Standard database operations for metadata management

### Scale and capability comparison

| Capability | DuckLake | Iceberg/Delta Lake |
| ---------- | -------- | ------------------ |
| **Data Scale** | Petabytes | Petabytes |
| **Metadata Storage** | Database tables with indexed access | File-based structures requiring sequential traversal |
| **Metadata Performance** | Database index lookups | Additional catalog required |
| **Write Operations** | Database transactions | Manifest file merging |
| **Multi-table Operations** | Full ACID transactions across tables | Limited cross-table coordination |
| **Infrastructure Requirements** | Standard transactional databases | Separate catalog servers |
| **Schema Evolution** | Coordinated multi-table schema evolution | Individual table-level changes |

## Use cases and applications

### When to choose DuckLake as your open table format

DuckLake is particularly well-suited for:

**Large-scale analytics**: Organizations with petabytes of historical data, high-volume event streams, or analytics requirements that exceed traditional data warehouse storage or processing capabilities.

**Multi-table workloads**: Applications requiring coordinated schema evolution, cross-table constraints, or transactional consistency across related tables.

**Metadata-intensive workloads**: Scenarios where file-based metadata access patterns may impact query performance.

**Reduced infrastructure complexity**: Organizations seeking lake-scale capabilities with fewer separate catalog servers and metadata management components.

### Storage comparison: MotherDuck native vs DuckLake storage

For loading data, MotherDuck and DuckLake perform very similarly.

However, when reading data, MotherDuck native storage format is 2x-10x faster than DuckLake, for both cold & hot runs.

### Migration considerations

**From data warehouses**: DuckLake provides a scaling option when warehouse storage limits or costs become constraining, while maintaining SQL interfaces and compatibility.

**From other lake formats**: DuckLake may provide performance improvements for metadata-intensive workloads, though migration requires consideration of existing tooling and processes.

**Hybrid architectures**: Organizations can use MotherDuck for traditional data warehouse workloads while graduating specific databases to DuckLake as scale requirements increase.

## Performance characteristics

### Metadata operations

DuckLake's database-backed metadata provides different performance characteristics:

- **Partition discovery**: Index-based vs. file scanning
- **Schema evolution**: Transactional vs. eventual consistency
- **Query planning**: Index-based vs. file traversal
- **Concurrent access**: Database locks vs. file coordination

## Data inlining

DuckLake supports data inlining, an optimization that stores small data changes directly in the metadata catalog rather than creating individual Parquet files. This feature is particularly valuable for high-frequency, small-batch inserts common in streaming and transactional workloads.

Starting with DuckLake 0.4, **deletion inlining** extends this concept to delete operations -- small deletes are stored in the metadata catalog rather than creating separate deletion files.

For implementation details and examples, see the [DuckLake integration guide](/integrations/file-formats/ducklake/#data-inlining).

## Storage lifecycle

DuckLake databases follow most of the same [storage lifecycle stages](/concepts/storage-lifecycle) as native storage databases:

1. **Active bytes**: Data that is part of the current state of the database
2. **Historical bytes**: Data retained by snapshots that is no longer part of the active state
3. **Failsafe bytes**: Data retained as system backups after snapshots expire (7-day retention)
4. **Deleted**: Data fully removed from the system

Unlike native storage databases, DuckLake does not have a "retained for clone" stage because DuckLake does not support zero-copy cloning.

Storage optimization and snapshot expiration are handled by [auto maintenance](#auto-maintenance) rather than the native storage garbage collector. For retention defaults and plan-specific details, see [Storage lifecycle and management](/concepts/storage-lifecycle#ducklake-databases).

## Auto maintenance

MotherDuck runs background maintenance on DuckLake databases to optimize storage layout and manage data lifecycle. Maintenance runs periodically on the duckling that owns the database while it is active.

### Defaults and configuration

| Database type | Default | Description |
| ------------- | ------- | ----------- |
| Fully managed DuckLake | Enabled | Maintenance runs automatically; opt out with `ALTER DATABASE SET AUTO_MAINTENANCE = FALSE` |
| BYOB (Bring Your Own Bucket) | Disabled | Opt in with `ALTER DATABASE SET AUTO_MAINTENANCE = TRUE` to enable maintenance |

To disable auto maintenance:

```sql
ALTER DATABASE <database_name> SET AUTO_MAINTENANCE = FALSE;
```

To enable auto maintenance (for example, for BYOB databases):

```sql
ALTER DATABASE <database_name> SET AUTO_MAINTENANCE = TRUE;
```

### Maintenance operations

Auto maintenance runs two phases for each database. You can also run these operations manually using the [DuckLake maintenance functions](https://ducklake.select/docs/stable/duckdb/maintenance/recommended_maintenance).

#### File layout optimization

These operations keep query performance high by organizing data files:

| Operation | Description |
| --------- | ----------- |
| **[Flush inlined data](https://ducklake.select/docs/stable/duckdb/advanced_features/data_inlining#flushing-inlined-data)** | Converts small inlined data stored in the metadata catalog to Parquet files. This may produce small files, which the merge operation consolidates in the same maintenance round. |
| **[Merge small files](https://ducklake.select/docs/stable/duckdb/maintenance/merge_adjacent_files)** | Combines adjacent small Parquet files into larger files, reducing the number of files scanned during queries. |
| **[Rewrite data files](https://ducklake.select/docs/stable/duckdb/maintenance/rewrite_data_files)** | Rewrites data files that have accumulated deleted rows to reclaim space and remove delete overhead. |

Merge and rewrite operate on disjoint sets of files: merge handles files without deletes, while rewrite handles files with deletes.

#### Snapshot lifecycle management

These operations manage time travel snapshots and clean up files that are no longer needed:

| Operation | Description |
| --------- | ----------- |
| **[Expire snapshots](https://ducklake.select/docs/stable/duckdb/maintenance/expire_snapshots)** | Removes snapshots older than the configured retention period and queues their associated files for deletion. |
| **[Clean up old files](https://ducklake.select/docs/stable/duckdb/maintenance/cleanup_of_files)** | Physically deletes files that have been queued for deletion by expire, merge, or rewrite operations. Files are kept for at least 12 hours after queuing, allowing in-flight queries to finish. |

### Snapshot retention

Snapshot expiration is controlled by the `SNAPSHOT_RETENTION_DAYS` database option. By default, this is set to `NULL` (infinite retention), meaning snapshots are never automatically expired. You must explicitly configure a retention period to enable automatic snapshot expiration.

To set a retention period:

```sql
ALTER DATABASE <database_name> SET SNAPSHOT_RETENTION_DAYS = 7;
```

To revert to infinite retention:

```sql
ALTER DATABASE <database_name> SET SNAPSHOT_RETENTION_DAYS = NULL;
```

:::note
When `SNAPSHOT_RETENTION_DAYS` is `NULL`, the expire snapshots operation is skipped entirely. No snapshot data is expired unless you explicitly set a retention period. The file cleanup operation still runs to delete files queued by merge and rewrite, which are always safe to remove regardless of retention settings.
:::

### Write conflicts

The merge and rewrite operations modify table metadata, which can occasionally conflict with concurrent write transactions on the same table. If a conflict occurs, the maintenance operation is skipped for that table and retried in the next round. While these conflicts are rare, you can disable auto maintenance for the affected database if you experience elevated transaction conflicts.

### Where maintenance runs

Maintenance runs on the duckling that owns the database. It executes in the background while the duckling is active. If the duckling shuts down, any in-progress maintenance operations stop gracefully. Maintenance resumes when the duckling starts again.

## Future capabilities

MotherDuck continues expanding DuckLake support with planned features including:

**External catalog integration**: Access to customer-managed DuckLake catalogs hosted in cloud databases

**Local storage access**: Direct access to MotherDuck-managed storage from local DuckDB instances for hybrid workloads

**Enhanced Iceberg support**: Continued improvements to Iceberg integration alongside DuckLake development

## Architecture implications

### Catalog database requirements

DuckLake catalogs require a transactional database with:
- ACID transaction support
- Concurrent read/write access
- Standard SQL interface
- Backup and recovery capabilities

Thankfully, this is all supported as part of MotherDuck without adding an additional catalog, although in self-hosted scenarios, an alternative database like Postgres, MySQL, or SQLite can be used.

### Storage considerations

DuckLake data storage follows similar patterns to other lake formats:
- Columnar file formats (Parquet)
- Partitioned directory structures
- Object storage compatibility
- Compression and encoding optimizations

---

Source: https://motherduck.com/docs/concepts/concepts

# Concepts
> Concepts
This section contains a collection of high level views of concepts & features.

## Included pages

- [Architecture and capabilities](https://motherduck.com/docs/concepts/architecture-and-capabilities): MotherDuck's serverless architecture combining cloud scale with DuckDB's efficiency through Dual Execution.
- [Database Concepts](https://motherduck.com/docs/concepts/database-concepts): MotherDuck Database Concepts
- [Hypertenancy](https://motherduck.com/docs/concepts/hypertenancy): Learn how MotherDuck's hypertenancy model provides dedicated compute for every user through per-user Ducklings, enabling predictable performance without noisy neighbors.
- [Resource management](https://motherduck.com/docs/concepts/resource-management): Understand MotherDuck's resource hierarchy, from organizations, accounts, tokens, and secrets down to databases and tables, and how each level provides compute isolation, data isolation, and access control.
- [Database Snapshots](https://motherduck.com/docs/concepts/database-snapshots): Understand how snapshots work in MotherDuck, including retention, restore, snapshot management, and plan availability
- [pg_duckdb Extension](https://motherduck.com/docs/concepts/pgduckdb): Use pg_duckdb to run DuckDB analytics within PostgreSQL and connect to MotherDuck.
- [Storage Lifecycle and Management](https://motherduck.com/docs/concepts/storage-lifecycle): Understand how MotherDuck manages data storage across different lifecycle stages and how this affects your billing and data management strategies.
- [Data Recovery](https://motherduck.com/docs/concepts/data-recovery): Understand MotherDuck's data recovery mechanisms
- [Flights](https://motherduck.com/docs/concepts/flights): How Flights run scheduled Python next to your MotherDuck data, and when to reach for one instead of SQL, a Dive, or an external orchestrator.
- [Workload scaling patterns](https://motherduck.com/docs/concepts/scaling-patterns): Choose the right compute size, scaling approach, and connection model for your MotherDuck workload using a decision flowchart and workload-to-pattern matrix.
- [Object name resolution](https://motherduck.com/docs/concepts/object-name-resolution): Fully qualified naming conventions and database resolution rules in MotherDuck.
- [DuckDB Extensions in MotherDuck](https://motherduck.com/docs/concepts/duckdb-extensions): Supported DuckDB extensions for the MotherDuck cloud service, Web UI, and CLI.
- [DuckLake](https://motherduck.com/docs/concepts/ducklake): Understanding DuckLake - A high-performance open table format for petabyte-scale analytics
- [Results](https://motherduck.com/docs/concepts/results): Results

---

Source: https://motherduck.com/docs/concepts/results

# Results
> Results
**RESULT** provides asynchronous query execution with a transparent cache. Create a RESULT to run a SELECT in the
background, then query it like a table while controlling its lifecycle (pause, resume, cancel, drop). You can think of
a result as a view with an attached cache that is used whenever possible to speed up queries.
Results are stored in memory and will only remain visible until your client-side DuckDB session is restarted.

For the SQL syntax reference, see [`RESULT`](/sql-reference/motherduck-sql-reference/result).

## Core concepts

### What is a RESULT?

```sql
CREATE RESULT <result_name> AS <select_query>;
FROM <result_name> SELECT ...;
```

A RESULT is a named relation in your DuckDB database that:

- Runs the provided `SELECT` in the background (creation is non-blocking)
- Caches rows produced by that statement as it runs
- Provides lifecycle management (pause, resume, cancel, drop)
- Can be queried like a regular table
- Maintains execution state and progress information

### Result states

Results can be in one of three states:

- **BUILDING**: Query is actively running and appending rows to the cache
- **PAUSED**: Query execution is temporarily paused
- **DONE**: Query execution has completed, which can occur for three reasons:
    1. Query finished successfully
    2. Query was preemptively stopped (e.g., aborted by the user)
    3. Query encountered an error

## Interacting with results

### Creating results

When you create a RESULT, the provided `SELECT` starts running in the background. You can query the result like a normal
table at any time.
results, you can query the result just like you would query a normal table.

```sql
-- Basic syntax
CREATE RESULT <result_name> AS <select_query>;

-- With conflict resolution
CREATE RESULT IF NOT EXISTS <result_name> AS <select_query>;
CREATE OR REPLACE RESULT <result_name> AS <select_query>;

-- Accessing the result
FROM <result_name> LIMIT <n>;
```

### Accessing results

You can query a result like a table. The relation appears quickly after creation, although the background `SELECT` may still
be running.
query creating the result has completed successfully. This occurs very quickly and does not mean that the `SELECT`
statement associated with the result has completed running.

```sql
FROM <result_name> LIMIT <n>;
```

There is **no guarantee** the cache is complete when you query a result. Depending on the state of the `RESULT` and your query, the system
may read from the cache, wait for additional rows, or bypass the cache and re-run the original `SELECT`.

The decision tree below shows how the `FROM my_result LIMIT 100` accessing the RESULT `my_result` behaves.

```mermaid
flowchart TD
    start(("FROM my_result LIMIT 100")):::circle -->|Completed successfully| cache(((Read from cache))):::circle
    start -->|"RESULT is not running (PAUSED/DONE with error)"| enough
    start -->|RESULT is BUILDING| enough_building
    enough_building{"Has enough data?<br> (cache > 100)"}:::green -->|Yes| cache
    enough_building -->|No| access_limit
    access_limit{"access limit < 500,000<br>(100 < 500,000)"}:::green -->|Yes| delay
    access_limit -->|No| rerun
    delay(Wait for 100 rows in cache<br> or result complete) --> cache
    enough{"Has enough data?<br> (cache > 100) OR DONE without error?"}:::green -->|Yes| cache
    enough -->|No| rerun(((Re-run query))):::circle
```

### Lifecycle management

On creation, new results start in the **BUILDING** state. While building, you can **PAUSE**, **RESUME**, **CANCEL**, or **DROP** the result.
Pause suspends execution, resume continues from where it stopped. Cancel stops the job permanently and it cannot
be resumed. Canceled results can still be queried, but they will not append any new rows to the cache.
When a result is dropped, it is permanently deleted and can no longer be queried. Dropping a result also removes its associated cache.

```mermaid
stateDiagram-v2
    [*] --> BUILDING: Result Created
    BUILDING --> PAUSED: PAUSE RESULT
    PAUSED --> BUILDING: RESUME RESULT
    BUILDING --> DONE: SELECT statement completes
    BUILDING --> DONE: CANCEL RESULT
    PAUSED --> DONE: CANCEL RESULT
    note right of BUILDING
        Query is actively running
    end note
    note right of PAUSED
        Query execution paused. Can be resumed.
    end note
    note right of DONE
        Execution finished: completed, error, or canceled.
    end note
    note left of DONE
        PAUSE/RESUME will error when in DONE state
    end note
```

#### Pause result

```sql
PAUSE RESULT <result_name>;
PAUSE RESULT IF EXISTS <result_name>;
```

#### Resume result

```sql
RESUME RESULT <result_name>;
RESUME RESULT IF EXISTS <result_name>;
```

#### Cancel result

```sql
CANCEL RESULT <result_name>;
CANCEL RESULT IF EXISTS <result_name>;
```

#### Drop result

```sql
DROP RESULT <result_name>;
DROP RESULT IF EXISTS <result_name>;
```

### Introspecting results

Use `SHOW ALL RESULTS` to list all your results alongside their status and progress. The returned table also includes:

1. `name`: The name of the result
2. `error`: Any error message associated with the result (is empty if no error occurred)
3. `status`: The current status of the result (BUILDING, PAUSED, DONE)
4. `row_count`: The number of rows in the result cache. This grows as the result builds and is not
   stable within the same transaction (it will increase as the result is being built).

```sql
SHOW ALL RESULTS;

--| name  | error                                                               | status   | row_count |
--|-------|---------------------------------------------------------------------|----------|-----------|
--| foo   | (empty)                                                             | DONE     | 100,000   |
--| bar   | INTERRUPT Error: The RESULT "bar" has been manually canceled.       | DONE     | 10,000    |
--| hello | (empty)                                                             | PAUSED   | 1,000     |
--| world | (empty)                                                             | BUILDING | 100       |
```

If you want to order the results, filter them or limit the output you can use the `MD_SHOW_RESULTS` table function:

```sql
FROM MD_SHOW_RESULTS() WHERE name = 'foo';

--| name | error   | status | row_count |
--|------|---------|--------|-----------|
--| foo  | (empty) | DONE   | 100,000   |
```

## Best practices

- Use `LIMIT` when you need only a small sample so that `RESULT` can serve them quickly from the cache.
- Prefer deterministic `SELECT` statements for predictable caching and reuse.
- Pause or cancel long-running results you do not need immediately and remember to drop them when no longer in use.

## Notes and limitations

- `RESULT` accepts `SELECT` statements only.
- The cache may be partial while the result is building. Queries may wait briefly, use the cache, or re-run the `SELECT`.
- A canceled result cannot be resumed.
- Results are stored in memory and will not persist across client restarts.

## See also
- [Building data applications with MotherDuck](https://motherduck.com/blog/building-data-applications-with-motherduck/)
- [MotherDuck wasm npm package](https://www.npmjs.com/package/@motherduck/wasm-client?activeTab=readme)
- [MotherDuck wasm example repository](https://github.com/motherduckdb/wasm-client)