# MotherDuck Documentation - Core Concepts
> Scoped full Markdown content for Core Concepts. For other areas, start from https://motherduck.com/docs/llms.txt instead of loading unrelated documentation.
## Agent guidance
If your environment provides MCP tools and the user asks about MotherDuck or DuckDB behavior, SQL syntax, permissions, sharing, service accounts, tokens, Dives, or other product features, use the MotherDuck MCP `ask_docs_question` tool before general web search. It answers from official DuckDB and MotherDuck documentation.
For broad context, prefer the most specific scoped `llms-full.txt` file listed in https://motherduck.com/docs/llms.txt before loading the root `llms-full.txt`. The root file contains the complete public documentation corpus and is intended for bulk indexing or large-context workflows.
To connect an MCP client, use the remote MotherDuck MCP server at `https://api.motherduck.com/mcp`. Setup instructions: https://motherduck.com/docs/key-tasks/ai-and-motherduck/mcp-setup. Tool reference: https://motherduck.com/docs/sql-reference/mcp/ask-docs-question.
---
Source: https://motherduck.com/docs/concepts/architecture-and-capabilities
# Architecture and capabilities
> MotherDuck's serverless architecture combining cloud scale with DuckDB's efficiency through Dual Execution.
MotherDuck is a serverless cloud analytics service with a unique architecture that combines the power and scale of the cloud with the efficiency and convenience of DuckDB.
MotherDuck's key components are:
- The MotherDuck cloud service
- MotherDuck's DuckDB SDK
- Dual Execution
- The MotherDuck web UI
### The MotherDuck cloud service
The MotherDuck cloud service lets you store structured data, query that data with SQL, and share it with others. A key MotherDuck product principle is ease of use.
**Serverless execution model**—You don't need to configure or spin up instances, clusters, or warehouses. You write and submit SQL. MotherDuck takes care of the rest. Under the hood, MotherDuck runs DuckDB and speaks DuckDB's SQL dialect.
**Managed storage**—you can load data into MotherDuck storage to be queried or shared. MotherDuck storage is durable, secure, and automatically optimized for best performance. MotherDuck storage is surfaced to you through the **catalog** and logical primitives database, schema, table, view, and so on. In addition, MotherDuck can query data outside of MotherDuck storage—as data on Amazon S3, through HTTPS endpoints, on your laptop, and more.
**The service layer**—MotherDuck provides key capabilities like secure identity, authorization, administration, and monitoring.
:::note
MotherDuck is available on three AWS regions:
- **US East (N. Virginia):** `us-east-1`, supporting DuckDB versions between 1.4.0 and 1.5.3.
- **US West (Oregon):** `us-west-2`, supporting DuckDB versions between 1.4.1 and 1.5.3.
- **Europe (Frankfurt):** `eu-central-1`, supporting DuckDB versions between 1.4.1 and 1.5.3.
You can choose in which region to create your organization, and organizations can only exist within a single cloud region.
We are working on expanding to other regions and cloud providers.
:::
### MotherDuck's DuckDB SDK
If you're using DuckDB in Python or CLI, you can connect to MotherDuck with a single line of code, `ATTACH 'md:';`. After you run this command, your DuckDB instance becomes supercharged by MotherDuck. MotherDuck's Dual Execution is enabled, and your DuckDB instance gets additional capabilities like sharing, secrets storage, better interoperability with S3, and cloud persistence.
### Dual execution
When connected together, DuckDB and MotherDuck form a different type of distributed system. The two nodes work in concert so you can query data wherever it lives, in the most efficient way possible. This query execution model, called **Dual Execution** (formerly known as Hybrid Execution), automatically routes the various stages of queries execution to the most opportune locations, including highly arbitrary scenarios:
- If a SQL query queries data on your laptop, MotherDuck routes the query to your local DuckDB instance
- If a SQL query queries data in MotherDuck or cloud storage (S3, GCS, Azure, R2), MotherDuck routes that query to MotherDuck's cloud engine, which connects to your storage provider directly. MotherDuck can use both cloud-stored and local secrets to authenticate. See [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret/) for details.
- If a SQL query executes a join between data on your laptop and data in MotherDuck, MotherDuck finds the best way to efficiently join the two

### The MotherDuck web UI
You can use MotherDuck's web UI to analyze and share data and to perform administrative tasks. MotherDuck's UI consists of a lightweight notebook, a SQL IDE, and a data catalog. Uniquely, MotherDuck caches query results in a highly interactive query results panel, letting you sort, filter, and even pivot data quickly.
## Summary of capabilities
With MotherDuck you can:
- Use serverless DuckDB in the cloud to store data and execute DuckDB SQL
- Load data into MotherDuck from your personal computer, https, or S3
- Join datasets on your computer with datasets in MotherDuck or in S3
- Copy DuckDB databases between local and MotherDuck locations
- Materialize query results into local or MotherDuck locations, or S3
- Work with data in MotherDuck's notebook UI, standard DuckDB CLI, or standard DuckDB Python package
- Share databases with your teammates
- Securely save S3 credentials in MotherDuck
Additionally, MotherDuck supports connectivity to third party tools through:
- JDBC
- Go
- sqlalchemy
## Considerations and limitations
MotherDuck does not yet support the full range of SQL of DuckDB. We are continuously working on improving coverage of DuckDB in MotherDuck. If you need specific features enabled, please let us know.
Below is the list of DuckDB features that MotherDuck does not yet support:
- Custom Python / Native user defined functions.
- Server-side attach of postgres, sqlite, etc.
- Custom or community extensions.
---
Source: https://motherduck.com/docs/concepts/database-concepts
# Database Concepts
> MotherDuck Database Concepts
## MotherDuck Architectural Concepts
:::note
MotherDuck is a cloud-native data warehouse, built on top of DuckDB, a fast in-process analytical database. It inherits some features from DuckDB that present opportunities to think differently about data warehousing methods in order to achieve high levels of performance and simplify the experience.
:::
- **Isolated Compute Tenancy**: Each user is allocated their own "Duckling," which is an isolated piece of compute that sits on top of the MotherDuck storage layer. MotherDuck is designed this way to lessen contention between users, which is a common challenge with other data warehouses. Each Duckling had under 100ms of cold start time as MotherDuck keeps Ducklings on warm standby.
- **Aggressively Serverless**: Unlike conventional data warehouses, DuckDB automatically parallelizes the work that you send to it. The implication of this is that scheduling multiple queries at-a-time does not meaningfully increase throughput, as DuckDB has already parallelized the workload across all available resources.
- **Database level security model**: It has a simplified access model - users either have access to an entire database, or nothing at all. As a result, users will interact with data frequently at the database level. This is unusual when compared to other databases, which often treat multiple database files as single concepts from an interactivity perspective.
- **Database Sharing**: MotherDuck separates storage and compute, which means that one user cannot see another's writes into a database until that database is updated to that user. As such, it has its own concept called ["SHARES"](/key-tasks/sharing-data/sharing-overview/) within Organizations, which are zero-copy clones of the main database for read-only use, enabling high scalability of analytics workloads.
- **Dual Execution**: Every MotherDuck client is also a DuckDB engine, so you can efficiently query local data and (JOIN, UNION) with data that's stored in your MotherDuck data warehouse. [The query planner automatically decides](/concepts/architecture-and-capabilities#dual-execution) the best place to execute each part of your query.
---
Source: https://motherduck.com/docs/concepts/hypertenancy
# Hypertenancy
> Learn how MotherDuck's hypertenancy model provides dedicated compute for every user through per-user Ducklings, enabling predictable performance without noisy neighbors.
MotherDuck implements a unique tenancy model called **hypertenancy**: every user or service account gets their own dedicated DuckDB compute instance, called a Duckling. Unlike traditional data warehouses where all users share a single cluster, hypertenancy provides full compute isolation at the individual user level.
## The problem with traditional multi-tenancy
Traditional data warehouses and OLAP systems use a shared-compute model:
```mermaid
graph TB
subgraph Users["All Users"]
U1{{"User A"}}:::green
U2{{"User B"}}:::green
U3{{"User C"}}:::green
end
subgraph Warehouse["Shared Data Warehouse"]
Cluster["Single Compute Cluster"]:::yellow
end
U1 --> Cluster
U2 --> Cluster
U3 --> Cluster
```
This shared model creates several challenges:
- **Noisy neighbors**: One user's expensive query affects everyone else's performance
- **Resource contention**: Concurrency limits apply across all users
- **Unpredictable performance**: Query times vary based on overall system load
- **Overprovisioning**: Resources must be sized for peak aggregate load, sitting idle most of the time
- **Difficult cost attribution**: Hard to track compute costs per user or customer
## How Hypertenancy works
With hypertenancy, MotherDuck provisions a separate Duckling for each user:
```mermaid
graph TB
subgraph Users["All Users"]
U1{{"User A"}}:::green
U2{{"User B"}}:::green
U3{{"User C"}}:::green
end
subgraph MotherDuck["MotherDuck"]
D1["Duckling A"]:::yellow
D2["Duckling B"]:::yellow
D3["Duckling C"]:::yellow
end
U1 --> D1
U2 --> D2
U3 --> D3
```
Each Duckling is a complete DuckDB instance with dedicated CPU, memory, and fast SSD spill space. This architecture delivers:
- **Perfect isolation**: No noisy neighbors—one user's workload never impacts another
- **Predictable performance**: Dedicated resources mean consistent query times
- **Independent scaling**: Each user's compute can be sized to their specific needs
- **Per-user billing**: Compute costs directly attributable to individual users
- **Fast cold starts**: Ducklings start in approximately 1 second
## Scaling with Hypertenancy
Hypertenancy supports both vertical and horizontal scaling, letting you match compute resources to actual demand.
### Vertical scaling: Duckling sizes
Each user's Duckling can be configured to different sizes based on their workload requirements:
| Duckling Size | Best For |
|---------------|----------|
| **Pulse** | Ad-hoc queries, read-heavy workloads, high-concurrency analytics |
| **Standard** | Core analytical workflows, ETL/ELT pipelines |
| **Jumbo** | Large-scale batch processing, complex joins |
| **Mega** | Demanding jobs with high data volumes |
| **Giga** | Largest and toughest batch workloads |
You can adjust Duckling size per user through the [MotherDuck UI](/about-motherduck/billing/duckling-sizes/#changing-duckling-sizes) or [REST API](/sql-reference/rest-api/ducklings-set-duckling-config-for-user/).
For example, in a customer-facing analytics scenario, you might provision:
- **Pulse** Ducklings for most customers running standard dashboards
- **Standard** or **Jumbo** Ducklings for enterprise customers with heavier workloads
- **Mega** or **Giga** Ducklings for batch data loading jobs
### Horizontal scaling: read scaling
When a single user needs to handle many concurrent queries—such as a service account powering a customer-facing application—you can enable [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/). Read scaling provisions additional read-only Ducklings that share the same data but distribute query load:
```mermaid
graph TB
subgraph App["Application Users"]
E1{{"End User 1"}}:::green
E2{{"End User 2"}}:::green
E3{{"End User 3"}}:::green
E4{{"End User 4"}}:::green
end
S1[Service Account]:::watermelon
subgraph MotherDuck["MotherDuck (Customer X)"]
RW["Read-Write Duckling (Data Loading)"]
R1["Read Scaling Duckling 1"]
R2["Read Scaling Duckling 2"]
end
S1 --> RW
E1 --> R1
E2 --> R1
E3 --> R2
E4 --> R2
```
Read scaling lets you serve hundreds or thousands of concurrent end users through a single service account while maintaining predictable performance.
## Hypertenancy use cases
### Customer-facing analytics
Hypertenancy is particularly powerful for [customer-facing analytics](/getting-started/customer-facing-analytics/). Each of your customers can have their own service account with isolated Ducklings:
- **Data isolation**: Each customer's data stays in their own database
- **Compute isolation**: One customer's workload never impacts another
- **Cache isolation**: Each customer's Duckling maintains its own cache, so cached query results and data remain private and predictable
- **Independent sizing**: Scale resources per customer based on their tier or needs
- **Predictable costs**: Bill customers accurately based on their actual compute usage
For a hands-on guide to building customer-facing analytics with per-customer service accounts, see the [Builder's Guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/).
### Development and production pipelines
Service accounts enable clean separation between deployment environments. Each environment gets its own isolated compute:
| Environment | Service Account | Duckling Size | Purpose |
|-------------|-----------------|---------------|---------|
| Local/Dev | `dev-pipeline` | Pulse | Interactive development and testing |
| Staging | `staging-pipeline` | Standard | Pre-production validation |
| Production | `prod-pipeline` | Standard/Jumbo/... | Production workloads |
This separation ensures:
- Development experiments never impact production performance
- Each environment has appropriately sized compute
- Clear cost attribution per environment
- Easy rollback by switching service account credentials
### Data warehouse and data pipeline workloads
For data pipelines, you can assign dedicated service accounts to different stages of your data workflow. If you're using dbt you can run dbt models with different duckling sizes.
| Pipeline Stage | Service Account | Duckling Size | Workload Pattern |
|----------------|-----------------|---------------|------------------|
| Ingestion | `ingest-service` | Jumbo/Mega | Bulk data loading, high I/O |
| Transformation | `transform-service-standard` / `transform-service-jumbo` / | Standard/Jumbo | dbt models, ETL jobs |
| Reporting | `reporting-service` | Pulse (read scaling) | Dashboard queries, read-heavy |
This pattern provides:
- **Workload isolation**: Heavy batch ingestion jobs won't slow down interactive reporting queries
- **Right-sized compute**: Each stage gets the Duckling size optimized for its workload
- **Cost visibility**: Track compute costs per pipeline stage
- **Independent scheduling**: Run ingestion during off-peak hours without affecting daytime analysts
### Analytics and data science
For internal analytics teams, hypertenancy means analysts and data scientists each get their own compute. A data scientist running a complex ML feature extraction job won't slow down an analyst building a quick dashboard.
## Why single-node beats distributed for per-user compute
Traditional distributed data warehouses use clusters with multiple nodes that coordinate to execute queries. This architecture introduces:
- Network latency between nodes
- Coordination overhead
- Data shuffling costs
For queries that operate on one user's data at a time (the common pattern in hypertenancy), single-node execution on a Duckling eliminates this overhead entirely. The result is often faster query performance and lower costs compared to distributed systems, especially for interactive analytics workloads.
DuckDB's efficient columnar execution, combined with MotherDuck's fast storage architecture, means queries can handle datasets larger than memory with minimal performance impact.
## Related content
- **Learn about Duckling sizes**: [Duckling Sizes](/about-motherduck/billing/duckling-sizes/)
- **Configure read scaling**: [Read Scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- **Build customer-facing analytics**: [Customer-Facing Analytics Overview](/getting-started/customer-facing-analytics/)
- **Set up per-customer service accounts**: [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/)
---
Source: https://motherduck.com/docs/concepts/resource-management
# Resource management
> Understand MotherDuck's resource hierarchy, from organizations, accounts, tokens, and secrets down to databases and tables, and how each level provides compute isolation, data isolation, and access control.
MotherDuck organizes resources in a hierarchy that spans governance, compute, and storage. Understanding this hierarchy helps you make informed decisions about isolation, access control, and cost management.
## The resource hierarchy
MotherDuck resources are organized into three layers: governance (who can do what), compute (where queries run), and storage (where data lives).
### Governance
Organizations contain accounts (for both users and machine-to-machine/services), and accounts hold tokens and secrets. Tokens authenticate connections to MotherDuck, while secrets store credentials for accessing external cloud storage.
```mermaid
flowchart LR
Org["Organization"]:::yellow
Org --> User
Org --> SA
subgraph " "
User{{"User account"}}:::green
SA{{"Service account"}}:::green
end
User --> Creds
SA --> Creds
subgraph Creds["Credentials (per account)"]
Token1["R/W token"]:::sky
Token2["Read scaling token"]:::sky
Secret["Secrets"]:::sky
end
```
### Compute
Each account gets a dedicated R/W Duckling. Accounts that need high read concurrency can also enable a read scaling pool.
```mermaid
flowchart LR
Token1["R/W token"]:::sky
Token2["Read scaling token"]:::sky
RW["R/W Duckling"]:::yellow
RS["Read scaling pool"]:::yellow
RSI1["Read scaling Duckling"]:::yellow
RSI2["Read scaling Duckling"]:::yellow
Token1 --> RW
Token2 --> RS
RS --> RSI1
RS --> RSI2
```
### Storage
Databases follow the standard DuckDB hierarchy. Shares provide user, organization or public access to databases not created by that account.
```mermaid
flowchart LR
RW["R/W Duckling"]:::yellow
DB[("Database")]:::yellow
Share[("Share (read-only clone)")]:::sky
Schema["Schema"]:::green
Table["Table / View"]:::green
RW --> DB
DB --> Schema
Schema --> Table
DB -. "CREATE SHARE" .-> Share
```
Ducklings can also read from and write to external cloud storage. A [secret](#secrets) provides the credentials, and the Duckling connects to the storage provider directly:
```mermaid
flowchart LR
RW["R/W Duckling"]:::yellow
Secret["Secret"]:::sky
S3[("S3 / GCS / R2 / Azure")]:::green
RW -->|"SELECT FROM 's3://...'"| S3
Secret -. "authenticates" .-> S3
```
## What each level means
### Organization
An organization is the top-level container in MotherDuck. It defines:
- A **billing boundary**: all compute and storage costs roll up to the organization
- A **region**: each organization lives in a single region (for example, `us-east-1`, `us-west-2`, or `eu-central-1`)
- **Admin controls**: organization admins manage users, service accounts, and SSO configuration
Every MotherDuck user belongs to exactly one organization.
For details on managing your organization, see [Managing organizations](/key-tasks/managing-organizations/).
### Accounts: users and service accounts
MotherDuck has two types of accounts:
- **User accounts** represent individual people who sign in interactively
- **Service accounts** represent applications, pipelines, or automated processes
Both types function the same way from a resource perspective: each account gets its own dedicated [Read-Write Duckling and read scaling pool](#ducklings) and owns its own databases. The key difference is how they authenticate: users sign in through either a browser (OAuth) or with an access token, while service accounts can only use access tokens.
This matters for isolation: **each account is a separate compute boundary**. Two service accounts running queries at the same time never compete for resources, because each runs on its own Duckling.
::::tip
Organization admins can [impersonate a service account](/key-tasks/service-accounts-guide/impersonate-service-accounts/) through the MotherDuck UI to view its resources, run queries, or troubleshoot issues as that account.
::::
For details on creating service accounts, see [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/).
### Access tokens
Tokens are the credentials that authenticate a connection to MotherDuck. Each token is scoped to a specific account and determines how the connection is routed:
| Token type | Routes to | Use case |
|---|---|---|
| **R/W token** | The account's R/W Duckling | Data loading, writes, interactive queries |
| **Read scaling token** | The account's read scaling pool | High-concurrency read workloads |
Multiple connections using the same R/W token share the same Duckling. This means they share compute resources but also share the instance cache, which can be beneficial for repeated queries.
For details on authentication, see [Authenticating to MotherDuck](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/).
### Secrets
Secrets store cloud storage credentials (for S3, GCS, Azure, R2, and Hugging Face) in MotherDuck so your Ducklings can read from and write to external storage. Secrets are:
- **Encrypted**: stored fully encrypted in MotherDuck
- **Account-scoped**: each secret belongs to the account that created it and is not visible to other accounts in the organization
- **Scope-matched**: when multiple secrets exist for the same storage type, MotherDuck picks in alphabetical order.
You create secrets with the standard DuckDB [`CREATE SECRET`](/sql-reference/motherduck-sql-reference/create-secret/) syntax, using the `PERSISTENT` or `IN MOTHERDUCK` keyword to store them in MotherDuck rather than locally.
Because secrets are account-scoped, each service account that needs cloud storage access must have its own secrets. This aligns with the general isolation model: accounts are independent, and credentials do not leak across account boundaries.
### Ducklings
A Duckling is a dedicated DuckDB compute instance. Every account gets its own R/W Duckling, providing [hypertenancy](/concepts/hypertenancy/) to guarantee full compute isolation at the individual account level.
Ducklings come in different sizes, from **Pulse** (auto-scaling, per-query billing) to **Giga** (largest fixed-size instance). You choose the size based on workload requirements. See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for details.
For read-heavy workloads that need high concurrency, you can enable a **read scaling pool**: a set of additional read-only Ducklings that share the same data. Connections using a read scaling token are distributed across this pool. See [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) for details.
### Databases, schemas, and tables
Databases follow the standard DuckDB hierarchy: a database contains schemas, and schemas contain tables and views. In MotherDuck:
- Every database is **owned by the account** that created it
- Access is **all-or-nothing at the database level**: an account either has full access to a database or no access at all
- You can grant read-only access to others through [shares](/key-tasks/sharing-data/sharing-overview/)
MotherDuck also supports [Ducklake](/concepts/ducklake/), an open table format that stores data in your own object storage while MotherDuck manages the catalog.
## Isolation boundaries
Different boundaries in the resource hierarchy provide different types of isolation. Use this table to understand what separates what:
| Boundary | Compute isolation | Data isolation | Access control | Secret isolation |
|---|---|---|---|---|
| Different organizations | Full | Full | Full | Full |
| Different accounts (same org) | Full (separate Ducklings) | Per-database (owned separately) | Per-database (through shares) | Full (secrets are account-scoped) |
| Different tokens (same account) | None (same R/W Duckling) | None (same databases) | None (same permissions) | None (same secrets) |
| Read scaling pool | Read-only isolation (separate Ducklings) | Shared (read-only replicas) | Token-scoped | Shared (same account secrets) |
| Different databases | N/A | Full | Share-based | N/A |
Key takeaways:
- **Accounts are the primary isolation boundary.** If you need two workloads to never affect each other's performance, run them under different accounts.
- **Tokens do not provide isolation.** Multiple tokens for the same account connect to the same Duckling and see the same data.
- **Shares provide data access without compute sharing.** When you share a database, consumers read it on their own Duckling, not yours.
- **Secrets follow account boundaries.** Each account manages its own cloud storage credentials. Secrets created by one account are never visible to another.
## Common patterns
### Isolate ETL from analysts
Create separate service accounts for your data pipeline and your analysts. Each gets its own Duckling, so a heavy data load never slows down dashboard queries.
```mermaid
flowchart LR
subgraph Org["Organization"]
ETL{{"etl-pipeline"}}:::green
Analyst{{"analyst-team"}}:::green
end
subgraph Compute["Compute"]
D1["Jumbo Duckling"]:::yellow
D2["Pulse Duckling"]:::yellow
D3["Pulse Duckling"]:::yellow
D4["Pulse Duckling"]:::yellow
end
ETL --> D1
Analyst --> D2
Analyst --> D3
Analyst --> D4
D1 --> DB[("Shared database (via share)")]:::sky
D2 --> DB
D3 --> DB
D4 --> DB
```
The ETL service account owns the database and writes to it on a large Duckling. The analyst-team account uses a [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) pool of Pulse Ducklings to handle concurrent dashboard queries. Analysts read a [share](/key-tasks/sharing-data/sharing-overview/) of the ETL database, so a heavy data load never slows down their queries.
### Separate dev and prod environments
Use different service accounts per environment. Each has isolated compute and its own databases:
```mermaid
flowchart LR
subgraph Org["Organization"]
Dev{{"dev-pipeline"}}:::green
Staging{{"staging-pipeline"}}:::green
Prod{{"prod-pipeline"}}:::green
end
subgraph Compute["Compute"]
D1["Pulse Duckling"]:::yellow
D2["Standard Duckling"]:::yellow
D3["Jumbo Duckling"]:::yellow
end
Dev --> D1
Staging --> D2
Prod --> D3
D1 --> DB1[("dev-db")]:::sky
D2 --> DB2[("staging-db")]:::sky
D3 --> DB3[("prod-db")]:::sky
```
Right-size each environment: Pulse for development (on-demand), Standard for staging validation, and Jumbo for production workloads.
### Customer-facing analytics (3-tier)
For B2B applications that embed analytics, use a service account per customer. Your backend mediates access, and each customer gets isolated compute and data:
```mermaid
flowchart LR
subgraph App["Your application (auth, sessions, routing)"]
FE["Frontend UI"]:::green
BE["Backend API"]:::green
end
subgraph Org["Organization"]
SA1{{"customer-a"}}:::green
SA2{{"customer-b"}}:::green
end
subgraph Compute["Compute"]
D1["Jumbo"]:::yellow
RS["Read scaling pool"]:::yellow
D2["Pulse Duckling"]:::yellow
D3["Pulse Duckling"]:::yellow
end
FE -->|"user request"| BE
BE -->|"read token"| SA1
BE -->|"read token"| SA2
SA1 --> D1
SA2 --> RS
RS --> D2
RS --> D3
D1 --> DB1[("customer-a-db")]:::sky
D2 --> DB2[("customer-b-db")]:::sky
D3 --> DB2
```
Your application handles user authentication and session management, then routes queries to the right customer's service account using stored read tokens. Each customer's service account owns its own database and Duckling. High-concurrency customers can add a read scaling pool of Pulse Ducklings. For the full walkthrough, see the [3-tier customer-facing analytics guide](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/).
### Embedded analytics with DuckDB WASM
For lightweight, interactive analytics embedded directly in a web page, you can skip the backend tier entirely. The browser runs DuckDB WASM and connects to MotherDuck with a read-only token:
```mermaid
flowchart LR
Browser["Browser + DuckDB WASM"]:::green
subgraph Org["Organization"]
SA{{"embed-account"}}:::green
end
subgraph Compute["Compute"]
D1["Duckling"]:::yellow
end
Browser -->|"read token"| SA
SA --> D1
D1 --> DB[("analytics-db")]:::sky
```
DuckDB WASM runs queries client-side or routes them to MotherDuck depending on the query. This is how [Dives](/key-tasks/ai-and-motherduck/dives/) work: each embedded Dive connects to MotherDuck through a session token and queries live data directly from the browser, with no backend needed.
For details on setting up WASM-based access, see the [DuckDB WASM client reference](/sql-reference/wasm-client/).
### Give read access to another team
Use shares to grant read-only access without sharing compute:
1. Create a share of the database: `CREATE SHARE my_share FROM my_database`
2. Grant access to the other team's account: `GRANT READ ON SHARE my_share TO 'other_user'`
The other team reads the shared database on their own Duckling. Your compute is not affected.
### Understand compute costs
Because each account runs on its own Duckling, compute costs are directly attributable:
- **Per-account billing**: you can see exactly how much compute each service account or user consumes
- **Right-sizing**: assign different [Duckling sizes](/about-motherduck/billing/duckling-sizes/) based on workload needs
- **Pulse for variable workloads**: use Pulse Ducklings for ad-hoc or bursty workloads to pay per query instead of per hour
## What's next
MotherDuck is adding workspaces and role-based access control (RBAC) to provide finer-grained access control within organizations. These features build on the resource hierarchy described here. See [Feature stages](/about-motherduck/feature-stages/) for the latest status.
## Related content
- [Architecture and capabilities](/concepts/architecture-and-capabilities/)
- [Hypertenancy](/concepts/hypertenancy/)
- [Duckling sizes](/about-motherduck/billing/duckling-sizes/)
- [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/)
- [Sharing data](/key-tasks/sharing-data/sharing-overview/)
- [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/)
- [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret/)
---
Source: https://motherduck.com/docs/concepts/database-snapshots
# Database Snapshots
> Understand how snapshots work in MotherDuck, including retention, restore, snapshot management, and plan availability
## What are snapshots?
Snapshots capture the complete state of a database at a specific point in time. MotherDuck creates **historical snapshots** in the background for attached databases (databases that are connected to MotherDuck and available for querying), enabling [data recovery](/concepts/data-recovery) features such as restore and [undrop](/sql-reference/motherduck-sql-reference/undrop-database). Historical snapshots come in two forms.
:::note DuckLake databases
DuckLake databases have their own snapshot and maintenance system. Snapshot lifecycle management for DuckLake is handled through [auto maintenance](/concepts/ducklake#auto-maintenance), not the native storage snapshot system described on this page. See [DuckLake snapshot retention](/concepts/ducklake#snapshot-retention) for details on configuring `SNAPSHOT_RETENTION_DAYS` and `AUTO_MAINTENANCE` for DuckLake databases.
:::
### 1. automatic snapshots {#1-automatic-snapshots}
Automatic snapshots are created continuously in the background by MotherDuck whenever data changes. For paid plans every new database has automatic snapshots configured by default. For paid plans you can also set or adjust your database's snapshot retention window with:
```sql
ALTER DATABASE example_database SET SNAPSHOT_RETENTION_DAYS = 4;
```
Automatic snapshots:
- Are created whenever data in the database changes or explicitly with `CREATE SNAPSHOT OF ;`
- Are retained as [`historical_bytes`](/concepts/storage-lifecycle) according to the database's `snapshot_retention_days` setting
- Can be queried using [`md_information_schema.database_snapshots`](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots)
- Are automatically removed by garbage collection when they fall outside the retention window
### 2. named snapshots {#2-named-snapshots}
Named snapshots have to be explicitly created with a name using [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot).
```sql
CREATE SNAPSHOT my_backup OF example_database
```
These persist indefinitely until the name is removed. Named snapshots are **not** subject to automatic garbage collection and are only available on the Business plan.
Named snapshots differ from automatic snapshots:
- They are **not garbage-collected** by snapshot retention
- They persist even if the source database they are associated with is deleted
- They can be referenced directly by name when restoring or cloning a database
- Snapshot names must be unique per user
- They can only be deleted by removing the name, after which they are picked up by garbage collection
Named snapshots are intended for **long-lived backups** and are the recommended mechanism for durable recovery points.
Named snapshots can be used with the [`ALTER DATABASE SET SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) command, as well as the [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) command to specify the snapshot you want to use.
## Restoring a database
You can [restore a database](/docs/sql-reference/motherduck-sql-reference/create-database/#source-database-options) from a snapshot by specifying the snapshot name, `snapshot_id` or a timestamp. When using a timestamp the latest snapshot at or before that time will be selected.
```sql
CREATE DATABASE example_db_from_snap FROM example_db (SNAPSHOT_NAME 'snap');
CREATE DATABASE example_db_from_id FROM example_db (SNAPSHOT_ID '4bfbd992-e586-48ab-9176-8dfb2d2c30b4');
CREATE DATABASE example_db_from_ts FROM example_db (SNAPSHOT_TIME '2026-01-01 00:00:01.234567');
```
## Snapshot features per plan
### Native storage databases
| Plan | Automatic snapshot retention default | Configurable retention period | Named snapshots | Point-in-time restore | [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) database |
|------|----------------------------|-------------------------------|-----------------|----------------------|-------------------|
| **Business** | 7 days | 0-90 days | Yes | Yes | Yes |
| **Lite (paid)** | 1 day | 1 day | No | Yes | Yes |
| **Lite (free)** | 0 days | N/A | N/A | N/A | N/A |
### DuckLake databases
DuckLake databases manage snapshots through [auto maintenance](/concepts/ducklake#auto-maintenance) rather than the native storage snapshot system.
| Database type | Auto maintenance default | Snapshot retention default | Configurable retention |
|------|----------------------------|-------------------------------|-----------------|
| **Fully managed** | Enabled | Infinite (`NULL`) | Yes, with `SNAPSHOT_RETENTION_DAYS` |
| **BYOB** | Disabled | Infinite (`NULL`) | Yes, after enabling `AUTO_MAINTENANCE` |
## Snapshot retention
The `snapshot_retention_days` database setting controls how long historical snapshots are retained for [data recovery](/concepts/data-recovery). This setting determines how much data is stored as [`historical_bytes`](/concepts/storage-lifecycle) in your storage footprint.
- **`0` days:** No historical snapshots are accessible; automatic snapshots are immediately eligible for garbage collection
- **`1+` days:** Automatic snapshots created within the retention window can be accessed and restored
Users can modify snapshot retention at any time using [`ALTER DATABASE`](/sql-reference/motherduck-sql-reference/alter-database):
```sql
ALTER DATABASE my_database SET SNAPSHOT_RETENTION_DAYS = 4;
```
To see your database's current snapshot retention, use [`md_information_schema.databases`](/sql-reference/motherduck-sql-reference/md_information_schema/databases) and look for the `historical_snapshot_retention` field.
::::note
Snapshot retention days are inherited when cloning a database.
::::
::::important
Increasing `snapshot_retention_days` does not restore previously deleted snapshots. Once the garbage collection process removes a snapshot, it cannot be recovered through this setting.
::::
## Working with named snapshots
Named snapshots are subject to naming rules.
- Snapshot names must be 1–255 characters long
- Names are unique per user across all databases
- If a name includes special characters (such as `.` or `/`), wrap it in double quotes
- If you create two named snapshots in a row without any new writes, the second can fail because the latest snapshot already has a name
### Renaming a named snapshot
Users can change the name of an existing named snapshot using the [`ALTER SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-snapshot) command:
```sql
ALTER SNAPSHOT SET snapshot_name = '';
```
### Deleting (un-naming) a named snapshot
To remove a name from a snapshot, run the following command:
```sql
ALTER SNAPSHOT SET snapshot_name = '';
```
Once unnamed, the snapshot will become subject to the database's `snapshot_retention_days` policy and will be deleted automatically when it falls outside the retention window.
## Historical snapshots and failsafe bytes
It's important to understand the distinction between historical snapshots and failsafe data:
- **Historical snapshots** are point-in-time copies of your database that you can restore yourself using SQL commands. They are stored as `historical_bytes` and controlled by your `snapshot_retention_days` setting.
- **Failsafe data** is a system-managed backup that MotherDuck retains for disaster recovery. It is stored as `failsafe_bytes` and can only be restored by contacting MotherDuck support.
| | Historical Snapshots: `historical_bytes` | Failsafe Data: `failsafe_bytes` |
|---|---|---|
| **Purpose** | User-initiated data recovery and point-in-time restore | System-level disaster recovery backup |
| **Controlled by** | `snapshot_retention_days` setting | MotherDuck system (7 days for standard databases, 1 day for transient) |
| **Recovery method** | Self-service through [`ALTER DATABASE SET SNAPSHOT`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) or [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) | Requires contacting [MotherDuck support](https://motherduck.com/contact-us/support/) |
| **Visibility** | Queryable through [`md_information_schema.database_snapshots`](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots) | Not directly visible to users |
| **Storage billing** | Billed as `historical_bytes` | Billed as `failsafe_bytes` |
For more details on storage lifecycle stages, see [Storage Lifecycle and Management](/concepts/storage-lifecycle).
## Best practices
- Use **named snapshots** for long-lived backups you may need to restore far into the future
- If you frequently overwrite your data, use a short snapshot retention window (1-7 days) to avoid storing multiple copies of the same data
- Failsafe restores should be thought of as a precautionary, last-minute measure in exception scenarios only; we recommend using historical snapshots for routine recovery needs
- Do **not** use [transient databases](/concepts/storage-lifecycle#transient-databases) for critical or hard-to-reconstruct data
## Related content
- [Data Recovery](/concepts/data-recovery) — Step-by-step guide to restoring databases from snapshots
- [Storage Lifecycle and Management](/concepts/storage-lifecycle) — Understanding storage stages and billing
- [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot) — SQL reference for creating snapshots
- [`DATABASE_SNAPSHOTS` view](/sql-reference/motherduck-sql-reference/md_information_schema/database_snapshots) — Query snapshot history and metadata
- [`CREATE DATABASE` from a snapshot](/docs/sql-reference/motherduck-sql-reference/create-database/#source-database-options)
---
Source: https://motherduck.com/docs/concepts/pgduckdb
# pg_duckdb Extension
> Use pg_duckdb to run DuckDB analytics within PostgreSQL and connect to MotherDuck.
[pg_duckdb](https://github.com/duckdb/pg_duckdb) is an open-source Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres.
Use `pg_duckdb` when you specifically need DuckDB or MotherDuck access from inside a PostgreSQL server. If you only need to connect to MotherDuck from a PostgreSQL-compatible client, or to use common connection pooling libraries, use the [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint) instead.
## Main features
- SELECT queries executed by the DuckDB engine can directly read Postgres tables
- Read and write support for object storage (AWS S3, Cloudflare R2, or Google GCS)
- Read and write support for data stored in MotherDuck
For more information about functionality and installation, check out the [repository's README](https://github.com/duckdb/pg_duckdb/blob/main/README.md).
## Connect with MotherDuck
To enable this support you first need to [generate an access token][md-access-token] and then add the following line to your `postgresql.conf` file:
```ini
duckdb.motherduck_token = 'your_access_token'
```
NOTE: If you don't want to store the token in your `postgresql.conf` file can also store the token in the `motherduck_token` environment variable and then explicitly enable MotherDuck support in your `postgresql.conf` file:
```ini
duckdb.motherduck_enabled = true
```
If you installed `pg_duckdb` in a different Postgres database than the default one named `postgres`, then you also need to add the following line to your `postgresql.conf` file:
```ini
duckdb.motherduck_postgres_database = 'your_database_name'
```
After doing this (and possibly restarting Postgres). You can then create tables in the MotherDuck database by using the `duckdb` [Table Access Method][tam] like this:
```sql
CREATE TABLE orders(id bigint, item text, price NUMERIC(10, 2)) USING duckdb;
CREATE TABLE users_md_copy USING duckdb AS SELECT * FROM users;
```
[tam]: https://www.postgresql.org/docs/current/tableam.html
Any tables that you already had in MotherDuck are automatically available in Postgres. Since DuckDB and MotherDuck allow accessing multiple databases from a single connection and Postgres does not, we map database+schema in DuckDB to a schema name in Postgres.
This is done in the following way:
1. Each schema in your default MotherDuck database is merged with the Postgres schema that has the same name.
2. Except for the `main` DuckDB schema in your default database, which is merged with the Postgres `public` schema.
3. Tables in other databases are put into dedicated DuckDB-only schemas. These schemas are of the form `ddb$$` (including the literal `$` characters).
4. Except for the `main` schema in those other databases. That schema should be accessed using the shorter name `ddb$` instead.
An example of each of these cases is shown below:
```sql
INSERT INTO my_table VALUES (1, 'abc'); -- inserts into my_db.main.my_table
INSERT INTO your_schema.tab1 VALUES (1, 'abc'); -- inserts into my_db.your_schema.tab1
SELECT COUNT(*) FROM ddb$my_shared_db.aggregated_order_data; -- reads from my_shared_db.main.aggregated_order_data
SELECT COUNT(*) FROM ddb$sample_data$hn.hacker_news; -- reads from sample_data.hn.hacker_news
```
[md]: https://motherduck.com/
[md-access-token]: /key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token
---
Source: https://motherduck.com/docs/concepts/storage-lifecycle
# Storage Lifecycle and Management
> Understand how MotherDuck manages data storage across different lifecycle stages and how this affects your billing and data management strategies.
Understanding MotherDuck's storage lifecycle is crucial for optimizing costs and managing data effectively. Unlike traditional databases where deleted data is immediately freed, MotherDuck implements a multi-stage storage system that ensures data safety while providing cost transparency. This system is particularly important for organizations that share data, use zero-copy cloning, or need to understand their storage footprint for billing purposes.
The storage lifecycle applies to both native storage databases and [DuckLake](/concepts/ducklake) databases, with some differences in lifecycle stages and management. See [storage management](#storage-management) for retention defaults by database type.
## Storage lifecycle overview
The following diagram shows the storage lifecycle for native storage databases.
```mermaid
graph LR;
A[Active Bytes]-->|bytes deleted or updated|B[Historical Bytes];
B-->|shares dropped|C[Retained for Clone Bytes];
B-->|historical retention period passes, or snapshots become unnamed|D[Failsafe Bytes];
C-->|bytes deleted or updated by cloned databases|D[Failsafe Bytes];
D-->|7 day retention|E[Deleted];
```
There are 5 distinct stages of the storage lifecycle:
1. **Active bytes**: Actively referenced bytes of the database. These bytes are accessible by directly querying the database.
2. **Historical bytes**: Non-active bytes referenced by historical [snapshots](/concepts/snapshots) or shares of this database. Used for time travel and self-service restore.
3. **Retained for clone bytes**: Bytes referenced by other databases (through zero-copy clone) that are no longer referenced by this database as active or historical bytes. This stage applies to native storage databases only.
4. **Failsafe bytes**: Bytes no longer referenced by any database or share, retained for a period as a last-resort, best-effort recovery service. Recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed to be complete. Don't rely on failsafe bytes as part of a backup plan.
5. **Deleted**: Bytes are fully removed from the system and no longer accessible.
MotherDuck runs a periodic job that reclassifies data to the proper storage lifecycle stage. For DuckLake databases, [auto maintenance](/concepts/ducklake#auto-maintenance) handles file cleanup and snapshot expiration.
Data can only flow through the storage lifecycle unidirectionally, from left to right.
The following conditions can trigger data to be reclassified to a new stage:
| Trigger | State transition |
|---------|------------------|
| Data is deleted or updated in the database | Active → Historical |
| All shares referencing the data are dropped or updated, and all historic [snapshots](/concepts/snapshots) referencing the data are deleted | Historical → Retained for Clone or Failsafe |
| Data is deleted from all zero-copy-cloned databases | Retained for Clone → Failsafe |
| Failsafe retention period passes (7 days for standard, 1 day for transient) | Failsafe → Deleted |
An organization is billed based on the average of active, historical, retained for clone, and failsafe bytes across all of their databases over the billing period.
Refer to the [data recovery](/concepts/data-recovery) overview for more details on how to manage historical snapshots.
### How this affects your data strategy
Understanding the storage lifecycle helps you make informed decisions about:
- **Data deletion strategies**: When you delete data, it doesn't immediately reduce your bill due to the retention stages
- **Sharing considerations**: Shared data remains in historical bytes until shares are updated or dropped
- **Cloning decisions**: [Zero-copy clones](/docs/sql-reference/motherduck-sql-reference/create-database/) can keep data in retained for clone bytes even after deletion from the source
- **Cost optimization**: Different lifecycle stages have different cost implications and management strategies
For more information on data sharing, see [Sharing Data](/key-tasks/sharing-data/sharing-overview). For details on zero-copy cloning, refer to [MotherDuck Architectural Concepts](/concepts/database-concepts/#motherduck-architectural-concepts).
## Storage management
Storage retention behavior depends on the database type: standard, transient, or DuckLake.
`SNAPSHOT_RETENTION_DAYS` controls how many days historical snapshots are retained for data recovery and time travel (see [Data Recovery](/concepts/data-recovery)). The recommended minimum is at least 1 day, so you can recover your data if you accidentally drop or overwrite it.
To see the historical retention and transient status of your databases, use the [`md_information_schema.databases`](/sql-reference/motherduck-sql-reference/md_information_schema/databases) view.
Lite starts in free-tier mode with no historical retention until usage limits are reached, after which Lite defaults apply.
### Standard databases
| Plan | Failsafe period | Default historical retention | Min historical retention | Max historical retention |
|----------|-------------------------------------|------------------------------|------------------------------|------------------------------|
| **Business** | 7 days | 7 days | 0 days | 90 days |
| **Lite (paid)** | 7 days | 1 day | 1 day | 1 day |
| **Lite (free)** | 7 days | 0 days | 0 days | 0 days |
Historical retention enables point-in-time restore for your data. Business plan users can configure retention up to 90 days for extended data recovery capabilities.
### Transient databases
For use cases that don't require the default failsafe retention period (7 days), a native storage database can be set as `TRANSIENT` [at database creation](/sql-reference/motherduck-sql-reference/create-database/#database-options) to enforce a 1 day failsafe minimum. This setting can only be defined at database creation and **is not** modifiable.
| Plan | Failsafe period | Default historical retention | Min historical retention | Max historical retention |
|----------|----------------------------------|--------------------------------------------------|--------------------------------------------------|--------------------------------------------------|
| **Business** | 1 day | 1 day | 0 days | 90 days |
| **Lite (paid)** | 1 day | 1 day | 1 day | 1 day |
| **Lite (free)** | 1 day | 0 days | 0 days | 0 days |
Transient databases enforce a 1-day minimum lifetime for data, which shows up in your bill as failsafe bytes.
Transient databases can be helpful for the following datasets:
* Datasets that are the intermediate output of a job (write once, read once)
* Datasets that can be reconstructed from an external data source
### DuckLake databases
[DuckLake](/concepts/ducklake) databases follow the same lifecycle stages as native storage databases (active, historical, failsafe, deleted), except there is no "retained for clone" stage since DuckLake does not support zero-copy cloning.
| Setting | Fully managed DuckLake | BYOB DuckLake |
|---------|----------------------|---------------|
| **Failsafe period** | 7 days | 7 days |
| **Default snapshot retention** | Infinite (`NULL`) | Infinite (`NULL`) |
| **Auto maintenance** | Enabled by default | Disabled by default |
| **Configurable retention** | Yes, with `SNAPSHOT_RETENTION_DAYS` | Yes, after enabling `AUTO_MAINTENANCE` |
DuckLake storage optimization and snapshot expiration are handled by [auto maintenance](/concepts/ducklake#auto-maintenance) rather than the native storage garbage collector. When `SNAPSHOT_RETENTION_DAYS` is set to `NULL` (the default), snapshots are retained indefinitely.
To configure snapshot retention for a DuckLake database:
```sql
ALTER DATABASE my_ducklake SET SNAPSHOT_RETENTION_DAYS = 7;
```
For more details on DuckLake storage management, see the [DuckLake storage lifecycle](/concepts/ducklake#storage-lifecycle) section.
## Backup strategies
If your data can't be recreated from source, plan an explicit backup strategy. Failsafe bytes are a last-resort recovery mechanism, not a backup plan: recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed.
The storage lifecycle gives you several mechanisms that you can rely on for backups:
- **Automatic snapshots** for time travel and short-term restore, retained as `historical_bytes` according to `SNAPSHOT_RETENTION_DAYS`. Retention defaults and limits depend on your plan (see [Standard databases](#standard-databases)).
- **Named snapshots** (Business plan) for long-lived backups that persist until you explicitly remove them. See [database snapshots](/concepts/snapshots#2-named-snapshots) for details.
- **Zero-copy clones** through [`CREATE DATABASE FROM`](/sql-reference/motherduck-sql-reference/create-database) for isolated copies without duplicating storage costs.
[Transient databases](#transient-databases) skip the default 7-day failsafe retention and are appropriate for data that can be recreated from a job or external source.
For recovery procedures, see [data recovery](/concepts/data-recovery).
## Breaking down storage usage
:::note Admin only
Storage breakdown information is only available to users with the Admin role.
:::
To understand your organization's storage bill, you have two entry points:
### SQL
Query the [`STORAGE_INFO` and `STORAGE_INFO_HISTORY` views](/sql-reference/motherduck-sql-reference/md_information_schema/storage_info) in [`MD_INFORMATION_SCHEMA`](/sql-reference/motherduck-sql-reference/md_information_schema/introduction) for a breakdown by lifecycle stage, as either a current snapshot or up to 30 days of history.
```sql
-- Get current storage information for all databases
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO;
```
### UI
Open the [databases page](https://app.motherduck.com/settings/databases) in settings to see total storage across all databases and a per-database breakdown. Click a row to view lifecycle stages for that database.
### _Active bytes_ are higher than expected
Consider whether you need all of the data stored in that database. Some common ways to decrease active bytes are to delete the data or optimize sorting and data types.
### _Historical bytes_ are higher than expected
You should look into either outstanding manually updated shares referencing this database in the organization or your historical database snapshots. Outstanding manually updated shares may keep historical data referenced (which prevent it from being deleted). Your historical byte footprint will decrease as the shares are updated (`UPDATE SHARE`) or dropped. You can find all shares that reference some database by using the [OWNED_SHARES](/sql-reference/motherduck-sql-reference/md_information_schema/owned_shares) view in the [MD_INFORMATION_SCHEMA](/sql-reference/motherduck-sql-reference/md_information_schema/introduction).
Otherwise you can consider reducing the `SNAPSHOT_RETENTION_DAYS` on your database to reduce the number of historical snapshots you retain. Note that this will reduce the window of time that you can restore data from. See [data recovery](/concepts/data-recovery) for more details on how to plan and setup a proper data recovery protocol for your organization.
### _Retained for clone bytes_ are higher than expected
Consider whether there are other databases that were zero-copy cloned from this database that are still referencing deleted data. This footprint will decrease as you delete the cloned data from these other databases.
### _Failsafe bytes_ are higher than expected
Failsafe bytes result from deleting data. This footprint should drop if this was a one-time deletion of data. If failsafe bytes remain consistently high - it is likely that you are overwriting or updating data too frequently. Common workloads that tend to delete a lot of data (through overwrites or updates) are: create or replace tables, truncate and insert, updates, and deletes. Avoiding these workload patterns can reduce your failsafe footprint. You can also consider using a [`TRANSIENT` database](#transient-databases), if it supports your use case, to reduce failsafe bytes to [1 day](https://motherduck.com/docs/concepts/storage-lifecycle/#transient-databases).
If you need help understanding or reducing your storage bill, reach out to [MotherDuck support](https://motherduck.com/contact-us/support/).
---
Source: https://motherduck.com/docs/concepts/data-recovery
# Data Recovery
> Understand MotherDuck's data recovery mechanisms
## Overview
MotherDuck provides [historical snapshots](/concepts/snapshots) to support point-in-time backup/restore mechanisms on Lite and Business plans. On the Lite plan, databases only keep the active snapshot (no historical retention) until usage limits are reached, after which Lite snapshot retention and [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) behavior apply. This page covers an example workflow with [named snapshots](/concepts/snapshots#2-named-snapshots) and outlines how to restore a database to a historical snapshot within the [snapshot retention](/concepts/snapshots#snapshot-retention) window (`snapshot_retention_days`). Refer to the [Database Snapshots](/concepts/snapshots) page for more details.
:::note DuckLake databases
DuckLake databases manage snapshots through [auto maintenance](/concepts/ducklake#auto-maintenance). Snapshot retention for DuckLake defaults to infinite (`NULL`) and is configured with `SNAPSHOT_RETENTION_DAYS` through [`ALTER DATABASE`](/sql-reference/motherduck-sql-reference/alter-database). See [DuckLake snapshot retention](/concepts/ducklake#snapshot-retention) for details.
:::
### Snapshot options per plan (native storage)
| Plan | Snapshot Retention Default | Configurable Retention Period | Named Snapshots | Point-in-Time Restore | [`UNDROP`](/sql-reference/motherduck-sql-reference/undrop-database) Database |
|------|----------------------------|-------------------------------|-----------------|----------------------|-------------------|
| **Business** | 7 days | 0–90 days | Yes | Yes | Yes |
| **Lite (paid)** | 1 day | 1 day | No | Yes | Yes |
| **Lite (free)** | 0 days | N/A | N/A | N/A | N/A |
Snapshots can be used to restore a new database to the snapshot using [`CREATE DATABASE`](/sql-reference/motherduck-sql-reference/create-database) or to [`ALTER`](/sql-reference/motherduck-sql-reference/alter-database-snapshot) an existing database to reflect the contents of a specific snapshot.
- **[Automatic snapshots](/concepts/snapshots#1-automatic-snapshots)** are retained for a set period of time according to `snapshot_retention_days` after they are no longer the active snapshot for a database.
- **[Named snapshots](/concepts/snapshots#2-named-snapshots)** are created explicitly and persist until unnamed. They are not subject to automatic garbage collection.
A new database:
```sql
CREATE DATABASE FROM (
SNAPSHOT_TIME ... | SNAPSHOT_NAME ... | SNAPSHOT_ID ...
)
```
An existing database:
```sql
ALTER DATABASE SET SNAPSHOT TO (
SNAPSHOT_TIME ... | SNAPSHOT_NAME ... | SNAPSHOT_ID ...
)
```
Snapshots can also be used to recover a dropped database:
```sql
UNDROP DATABASE
```
Refer to [undrop database](/sql-reference/motherduck-sql-reference/undrop-database) page for details.
Example:
```sql
-- You cannot drop the currently active database
USE some_other_db;
DROP DATABASE recovery_demo;
UNDROP DATABASE recovery_demo;
```
Refer to the [named snapshots](/sql-reference/motherduck-sql-reference/create-snapshot) page for an example.
## Restoring your database to a named snapshot
```sql
CREATE DATABASE example_db;
USE example_db;
CREATE TABLE one AS SELECT 1;
CREATE SNAPSHOT one OF example_db;
CREATE TABLE two AS SELECT 2;
CREATE SNAPSHOT two OF example_db;
CREATE TABLE three AS SELECT 3;
CREATE SNAPSHOT three OF example_db;
-- Accidentally drop data!
DROP TABLE two;
-- Restore a previous snapshot of the DB and check it's what you want
CREATE DATABASE example_restore FROM example_db (SNAPSHOT_NAME 'three');
-- The snapshot looks correct!
SELECT * FROM example_restore.two;
-- Restore the database to the old valid snapshot
ALTER DATABASE example_db SET SNAPSHOT TO (SNAPSHOT_NAME 'three');
-- We have successfully restored our data!
SELECT * FROM two;
```
## Restoring a database to a historical snapshot
To find all snapshots corresponding to your database, run the following queries.
To see the history of snapshots for a given database:
```sql
SELECT * FROM MD_INFORMATION_SCHEMA.DATABASE_SNAPSHOTS WHERE database_name = '' ORDER BY created_ts DESC;
```
If you have a rough idea of the time range you want to restore your database to, you can filter the above query by `created_ts`,
```sql
SELECT snapshot_id, created_ts, active_bytes FROM MD_INFORMATION_SCHEMA.DATABASE_SNAPSHOTS
WHERE database_name = ''
and created_ts >= '2024-12-02 20:00:00'
and created_ts <= '2024-12-02 20:05:00'
ORDER BY created_ts DESC
```
The results should look something like this:
| snapshot_id | created_ts | active_bytes|
|-------------|-----------|----------|
| `73034f48-e832-40d6-a30f-9055eb302a2e` | `2024-12-02 20:03:30` | `2191330` |
| `c204ce3b-f3fd-4677-8a05-e8680648cf27` | `2024-12-02 20:02:05` | `2183991` |
| `63395025-b139-4c6f-8fc2-7b8c0feff748` | `2024-12-02 20:01:55` | `1847296` |
Example (restore an existing database by ID):
```sql
ALTER DATABASE your_database_name
SET SNAPSHOT TO (SNAPSHOT_ID '');
```
Both automated and named snapshots can be used to restore to a desired state that was captured. Users can either restore a new or existing database to a specific snapshot.
```sql
-- For a new database
CREATE DATABASE restored_database FROM your_database_name (SNAPSHOT_ID 'c204ce3b-f3fd-4677-8a05-e8680648cf27');
```
After running the above command, users can run queries on `restored_database` and use the state of the database from a prior point-in-time.
Once users have the exact snapshot they are interested in restoring, we recommend finding the `snapshot_id` (instead of using `snapshot_time`) and using the command:
```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_ID 'c204ce3b-f3fd-4677-8a05-e8680648cf27');
```
**Note:** Running a `SET SNAPSHOT TO` command that specifies a timestamp that doesn't exist in `md_information_schema.database_snapshots` will select the most recent snapshot created at or before the specified timestamp. In our example, snapshot `63395025-b139-4c6f-8fc2-7b8c0feff748` would be selected because it is the only snapshot in the information schema table that was created before `'2024-12-02 20:02:04'`
```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_TIME '2024-12-02 20:02:04');
```
However, if you run:
```sql
ALTER DATABASE your_database_name SET SNAPSHOT TO (SNAPSHOT_TIME '2024-12-02 20:02:05');
```
then snapshot `c204ce3b-f3fd-4677-8a05-e8680648cf27` will be selected because there is an exact timestamp match.
## See also
- [Database Snapshots](/concepts/snapshots) — Understanding snapshot types, retention, and best practices
- [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot) — SQL reference for creating snapshots
- [`UNDROP DATABASE`](/sql-reference/motherduck-sql-reference/undrop-database) — Recovering dropped databases
---
Source: https://motherduck.com/docs/concepts/flights
# Flights
> How Flights run scheduled Python next to your MotherDuck data, and when to reach for one instead of SQL, a Dive, or an external orchestrator.
A **Flight** is a Python program that MotherDuck schedules and runs, with direct access to your databases. It can do whatever Python can: call external APIs, use any PyPI package, process files or run custom logic.
## Where Flights fit in MotherDuck
MotherDuck runs two compute planes against the same storage:
```mermaid
flowchart LR
User{{"You"}}:::green
subgraph MotherDuck["MotherDuck"]
Runtime["Python runtime (per Flight run)"]:::yellow
Duckling["Duckling (SQL compute)"]:::yellow
end
DB[("Your databases")]:::yellow
User -->|"interactive SQL"| Duckling
User -->|"create / schedule"| Runtime
Runtime -->|"md:"| Duckling
Duckling --> DB
```
**Ducklings** run your SQL. They're per-user (see [Hypertenancy](/concepts/hypertenancy)) and start in about a second.
The **Python runtime** runs your scheduled Python. You provide the Flight's source code, and each run gets its own isolated runtime that executes the `main()` entrypoint and exits. A run starts from the Flight's schedule or an `MD_RUN_FLIGHT` call.
A Flight reaches your data the same way any DuckDB client does: the Python runtime opens an `md:` connection, which routes through a Duckling that executes the SQL against your databases. MotherDuck injects an access token into the Flight's environment so the connection works without you embedding credentials.
## The relationship to SQL
Flights and SQL talk to each other in both directions.
**From Python to SQL.** A Flight uses the DuckDB Python client to run any query you can express in SQL:
```python
import duckdb
def main():
con = duckdb.connect("md:")
con.execute("INSERT INTO sales.daily_totals SELECT * FROM read_parquet('s3://incoming/today.parquet')")
```
Anything in the SQL reference is available here: aggregations, MotherDuck table functions, attach commands, and so on.
**From SQL to Flights.** Flights also have a SQL surface. `MD_CREATE_FLIGHT`, `MD_FLIGHTS`, `MD_FLIGHT_RUNS`, and the rest of the [Flights SQL functions](/sql-reference/motherduck-sql-reference) let you create, schedule, list, and monitor Flights from anywhere you can run SQL: a DuckDB CLI, your BI tool, or another Flight.
**When to pick which.** A reasonable rule of thumb:
- **Stay in SQL** for a one-off query, or one you already schedule elsewhere.
- **Use a Flight** when you want MotherDuck to run the query on a schedule, when the job should retry and keep a run history, or when the work goes beyond SQL: calling an external service, installing a Python package, reading a file format SQL can't, or chaining steps that don't fit one query.
## What you can build with Flights
- **Ingest.** Pull data from external sources (Postgres, BigQuery, Snowflake, S3, APIs) and write it to MotherDuck tables. [dlt](https://dlthub.com/) is the recommended ingest library: it gives you a declarative pipeline with schema evolution, incremental loading, and a MotherDuck destination.
- **Transform.** Read MotherDuck data, run Python-heavy transformations (machine-learning features, geospatial work), and write the result back. When the transformation is graph-shaped, [dbt](https://docs.getdbt.com/) with the `dbt-duckdb` adapter is the recommended approach.
- **AI enrichment.** Run LLM and embedding work over your data on a schedule. Call MotherDuck's [`prompt`](/sql-reference/motherduck-sql-reference/ai-functions/prompt/) and [`embedding`](/sql-reference/motherduck-sql-reference/ai-functions/embedding/) functions from SQL to classify, summarize, or extract structured fields row by row, or use a Python AI library in `main()` to generate embeddings, score records, or call a model API and write the results back to a table.
- **Export and delivery.** Write query results to object storage as Parquet or CSV — a scheduled drop in S3, Google Cloud Storage, or Azure Blob Storage for a partner or a downstream pipeline to pick up.
- **Reverse ETL and notifications.** Read from MotherDuck and push to a downstream operational system: post a daily summary to Slack, sync an aggregate to a SaaS tool, update a record in a CRM.
## Creating a Flight
You can create a Flight with the [MotherDuck MCP server](/key-tasks/ai-and-motherduck/mcp-setup/): describe the job in natural language, and the agent writes the Python, sets the schedule, and runs it for you with the MCP's Flight tools (`create_flight`, `edit_flight_source`, `run_flight`, and the rest).
You can also create and manage Flights directly with SQL through [`MD_CREATE_FLIGHT`](/sql-reference/motherduck-sql-reference/flights/) and the other Flight table functions, from a DuckDB CLI, your BI tool, or another Flight.
## Integrating with the tools you already use
- **dlt** for ingest. Generate or hand-write a dlt pipeline that uses MotherDuck as the destination, then deploy it as a Flight.
- **DuckDB Community Extensions**. Use [community extensions](https://duckdb.org/community_extensions/list_of_extensions) you trust that are not available on MotherDuck by default. Pull data from [BigQuery](https://duckdb.org/community_extensions/extensions/bigquery) or [Snowflake](https://duckdb.org/community_extensions/extensions/snowflake), [read YAML files](https://duckdb.org/community_extensions/extensions/yaml) and much more.
- **Anything on PyPI.** Pull in any pip-installable package through `requirements.txt`.
- **MCP and AI agents.** Agents create, edit, and run Flights through the [MotherDuck MCP Server](/sql-reference/mcp/). This is the marquee creation path: describe what you want in natural language, the agent writes the Python and wires up the schedule.
## Beyond Python
A Flight is a Linux process that runs your Python program, with shell access through `subprocess`. That opens up two patterns worth knowing.
### Install and call system binaries
Use `subprocess` to run `apt-get` and invoke command-line tools (git, ffmpeg, Playwright, anything available as a Debian package) from your Flight.
```python
import subprocess
def main():
subprocess.run(["apt-get", "install", "-y", "git"], check=True)
subprocess.run(["git", "clone", "https://github.com/example/repo"], check=True)
```
### Run a local DuckDB with community extensions
A Flight can open its own in-process DuckDB connection alongside the `md:` connection, which lets you load DuckDB [community extensions](https://duckdb.org/community_extensions/) that aren't supported on MotherDuck's server-side runtime. For example, the [`bigquery`](https://duckdb.org/community_extensions/extensions/bigquery) and [`snowflake`](https://duckdb.org/community_extensions/extensions/snowflake) extensions let a Flight read directly from another warehouse and write the result to MotherDuck.
```python
import duckdb
def main():
local = duckdb.connect() # local in-process DuckDB
local.execute("INSTALL bigquery FROM community; LOAD bigquery;")
local.execute("ATTACH 'project=my-project' AS bq (TYPE bigquery, READ_ONLY)")
# read from BigQuery into the Flight, then write the result to MotherDuck
events = local.sql("SELECT * FROM bq.analytics.events").df()
md = duckdb.connect("md:")
md.execute("INSERT INTO raw.events SELECT * FROM events")
```
Keep heavy compute outside the Flight runtime. A Flight is sized for orchestration and light processing, not for crunching large tables in memory. Let the source warehouse (BigQuery, Snowflake) and MotherDuck handle the heavy lifting, and let the Flight move data between them.
## How Flights are scoped
- **A Flight runs as a triggered job.** A run starts from its schedule or a `MD_RUN_FLIGHT` call, executes `main()` to completion, and exits; a single run can be long or short.
- **One Flight is one process.** A run isn't a managed worker pool with a built-in queue or distributed state. You can fan out by having a Flight call `MD_RUN_FLIGHT` to trigger others, or parallelize within a run with a Python thread pool — the coordination logic is yours to write. Flights are at their best for MotherDuck-centric work where the data, schedule, and compute live in one place; for workflows that span many external systems, a dedicated orchestrator like Airflow or Prefect is still the better tool.
- **Runs are unattended.** You read a run's output afterward from its [logs](/sql-reference/motherduck-sql-reference/flights/md-flight-logs) rather than stepping through it live. To explore data interactively, run queries in the SQL editor instead.
## Related resources
- [Flights SQL reference](/sql-reference/motherduck-sql-reference/flights/) — `MD_CREATE_FLIGHT`, `MD_RUN_FLIGHT`, and the rest of the Flight table functions.
- [Hypertenancy](/concepts/hypertenancy) — why Ducklings give every user dedicated SQL compute.
- [Creating visualizations with Dives](/key-tasks/ai-and-motherduck/dives) — interactive React apps over MotherDuck data.
- [MotherDuck MCP Server](/sql-reference/mcp/) — the MCP surface for AI agents that create and manage Flights.
---
Source: https://motherduck.com/docs/concepts/scaling-patterns
# Workload scaling patterns
> Choose the right compute size, scaling approach, and connection model for your MotherDuck workload using a decision flowchart and workload-to-pattern matrix.
MotherDuck gives you several levers to scale your workloads. The right combination depends on your concurrency needs, query characteristics, and whether your workload is read-heavy or write-heavy. This page helps you match your workload to the right scaling pattern.
## How MotherDuck scales per workload
MotherDuck scales workloads through compute units called [Ducklings](/concepts/hypertenancy/). Each user or service account gets a dedicated Duckling and read scaling pool, and you can adjust three levers to match your workload:
| Scaling lever | What it does | When to use it |
|---|---|---|
| **Vertical scaling** | Resize your Duckling ([Pulse through Giga](/about-motherduck/billing/duckling-sizes)) | Queries need more CPU or memory|
| **Horizontal scaling** | Add read-only Ducklings through [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) | Many concurrent users running read queries |
| **Workload isolation** | Create separate [service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) | Teams or pipelines that should not share compute, for example to prevent a large data ingestion impacting the queries of analysts. |
These levers are complementary. For example, you might use a Jumbo Duckling (vertical) for data loading and use a pool of Pulse Ducklings with read scaling (horizontal) for your dashboard users. You can connect through any [supported interface](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/), including the native DuckDB SDK, the Postgres endpoint, or DuckDB WASM.
:::tip[Not sure what you need?]
Follow the [decision flowchart](#decision-flowchart) at the bottom of this page to find the right scaling pattern for your workload.
:::
## Understanding the scaling levers
### Vertical scaling: Duckling sizes
When a single query needs more resources, move to a larger Duckling. Larger Ducklings have more CPU, memory, and extra SSD space to be used whenever queries use up too much memory. This helps with:
- Complex joins and aggregations
- Large data loading jobs
- Queries that process more data than fits in memory
Duckling sizes range from **Pulse** (lightweight, on-demand billing) to **Giga** (maximum resources for the heaviest batch jobs). See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) for the full comparison.
**When to size up**: If queries are slow and you see high values for `BYTES_SPILLED_TO_DISK` or `WAIT_TIME` in your [query history](/sql-reference/motherduck-sql-reference/md_information_schema/query_history/), your Duckling may need more memory.
### Horizontal scaling: Read scaling
When you need to serve many concurrent read queries, [read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/) adds read-only Duckling replicas behind your account. Key concepts:
- **Pool size**: The default pool is 4 Ducklings, configurable up to 16 (soft limit). [Contact support](https://motherduck.com/contact-us/support/) for higher limits.
- **Eventual consistency**: Read replicas lag a few minutes behind the primary. Use [`CREATE SNAPSHOT`](/sql-reference/motherduck-sql-reference/create-snapshot/) and [`REFRESH DATABASES`](/sql-reference/motherduck-sql-reference/refresh-database/) if you need tighter synchronization.
- **One Duckling per user**: For the best performance, aim for one Duckling per concurrent user. This takes full advantage of DuckDB's single-node architecture.
### Session affinity and routing
By default, read scaling distributes connections across the pool in round-robin fashion. When the number of connections exceeds your pool size, new connections share existing Ducklings.
For workloads where users run unique queries, use [`session_name`](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/#session-affinity-with-session-name) to route a user's connections to the same Duckling. This improves performance because:
- **Cache locality**: DuckDB caches data per instance. Routing the same user to the same Duckling means their subsequent queries benefit from a warm cache.
- **Consistent view**: Queries within a session see a consistent snapshot of the data.
- **Better isolation**: Concurrent users do not compete for the same Duckling's resources.
Set `session_name` to a user ID, session ID, or any stable identifier to benefit from caching where possible. You can [set the Duckling cooldown period](/docs/about-motherduck/billing/duckling-sizes/#configuring-the-cooldown-period) to match your use case and keep the Duckling alive with its cache.
**When to use `session_name`**: Use it when users run unique, personalized queries, for example in customer-facing analytics or multi-tenant dashboards. Skip it when all users run the same queries (such as a shared reporting dashboard), since a shared connection pool already routes efficiently.
### Workload isolation: Service accounts
[Service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/) give you full compute and data isolation between workloads. Each service account gets its own Duckling and read scaling pool, which makes it easier to track usage and billing.
Use separate service accounts when:
- Different teams should never share compute (for example, production vs. development)
- A data loading pipeline should not compete with queries from analysts
- You need your customers to be able to write back to a database in a [customer-facing analytics](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/) setup, or have a separate read scaling pool for each customer.
If you need a visual interface to manage these service accounts, the UI lets you impersonate a service account and adjust settings and run queries as the service account.
## Quick reference: workload patterns
Use this matrix to find the recommended pattern for common workloads. Each row represents a typical use case, with inputs describing the workload and outputs recommending a configuration.
| Use case | Users | Concurrency | R/W | Overlap | Weight | Duckling size | Scaling approach |
|---|---|---|---|---|---|---|---|
| *Ad-hoc analyst* | 👤 | Sequential | R/W | — | ⚡/🏋️ | Pulse / Standard+ | Default (single Duckling) |
| *dbt or ELT pipeline* | 👤 | Concurrent | W | — | 🏋️ | Jumbo / Mega | Default (single Duckling) |
| *Scheduled ingestion job* | 👤 | Sequential | W | — | 🏋️ | Jumbo+ | Default + dedicated service account |
| *BI dashboard (Omni, Hex, Metabase)* | 👥👥 | Concurrent | R | High | ⚡ | Pulse / Standard | Read scaling (shared pool) |
| *Embedded analytics* | 👥👥 | Concurrent | R | Low | ⚡/🏋️ | Pulse / Standard+ | Read scaling + `session_name` |
| *Customer-facing app (3-tier)* | 👥👥 | Concurrent | R | Low | ⚡ | Standard | Read scaling + `session_name` |
| *Serverless function (Lambda, Workers)* | 👥👥 | Concurrent | R | Varies | ⚡ | Standard | Read scaling |
| *Multi-team production* | 👥👥 | Concurrent | R/W | Low | ⚡/🏋️ | Per team | Separate service accounts |
### Reading the matrix
**Input columns** describe your workload:
- **Users**: How many people or clients connect: 👤 single, 👥👥 many
- **Concurrency**: Whether queries run one at a time (sequential) or in parallel (concurrent)
- **R/W**: Whether the workload reads (R), writes (W), or both (R/W)
- **Overlap**: Whether different users tend to run the same queries (high) or unique queries (low)
- **Weight**: Whether queries are light (⚡ sub-second) or heavy (🏋️ seconds to minutes)
**Output columns** recommend a configuration:
- **Duckling size**: Which [Duckling size](/about-motherduck/billing/duckling-sizes/) to use
- **Scaling approach**: Which horizontal scaling method to apply
## Choosing an interface
Your choice of interface does not change the scaling levers available to you, but it does affect session management and connection behavior.
| Interface | Best for | Session management |
|---|---|---|
| Native SDK (Python, Node.js, Java) | Client applications, scripts, dbt | Instance cache, `session_name` |
| [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/) | Serverless functions, BI tools, environments without DuckDB | Per-connection |
| DuckDB WASM | Browser-based applications | Client-side compute |
## Cost considerations
Scaling decisions affect your compute costs:
- **Vertical scaling** increases the per-second cost of your Duckling. Larger Ducklings cost more but finish heavy queries faster.
- **Horizontal scaling** adds Ducklings proportional to active sessions, not total users. Idle Ducklings shut down after the configured [cooldown period](/about-motherduck/billing/duckling-sizes/).
- **Pulse Ducklings** use per-query billing (minimum 1 compute-unit second), making them cost-effective for sporadic, lightweight workloads.
- **Cooldown tuning** balances cost against cache warmth. A longer cooldown keeps the cache warm for returning users but costs more during idle periods.
See [Duckling sizes](/about-motherduck/billing/duckling-sizes/) and [pricing](/about-motherduck/billing/pricing/) for the full cost breakdown.
## Decision flowchart
If you are not sure where to start, follow this flowchart:
```mermaid
flowchart TD
WriteQ{"Write-heavy? (data loading, ETL)"}:::yellow
WriteQ -->|Yes| WeightQ{"Heavy queries? (complex joins, large loads)"}:::yellow
WriteQ -->|No| ConcQ{"How many concurrent read users?"}:::yellow
WeightQ -->|Yes| SizeUp["Size up your Duckling (Jumbo / Mega / Giga)"]:::green
WeightQ -->|No| StdDuckling["Standard Duckling"]:::green
SizeUp --> Isolation{"Need compute isolation between workloads?"}:::yellow
StdDuckling --> Isolation
Isolation -->|Yes| SvcAcct["Separate service accounts"]:::green
Isolation -->|No| SingleAcct["Single service account"]:::green
ConcQ -->|"1-5"| Default["Default Duckling, size for your heaviest query"]:::green
ConcQ -->|"5-50"| OverlapQ{"Do users read the same data?"}:::yellow
ConcQ -->|"50+"| OverlapQ2{"Do users read the same data?"}:::yellow
OverlapQ -->|"Yes, mostly shared"| SharedPool["Read scaling (shared connection pool)"]:::green
OverlapQ -->|"No, differs per user"| SessionHint["Read scaling + session_name"]:::green
OverlapQ2 -->|"Yes, mostly shared"| HighConcShared["Read scaling at max connection pool size"]:::green
OverlapQ2 -->|"No, differs per user"| HighConcUnique["Read scaling at max connection pool size + session_name"]:::green
```
## Related content
- [Hypertenancy](/concepts/hypertenancy/): how MotherDuck's per-user compute model works
- [Duckling sizes](/about-motherduck/billing/duckling-sizes/): compare sizes and configure your Ducklings
- [Read scaling](/key-tasks/authenticating-and-connecting-to-motherduck/read-scaling/): set up read-only Duckling pools
- [Create and configure service accounts](/key-tasks/service-accounts-guide/create-and-configure-service-accounts/): create isolated compute for teams and pipelines
- [Postgres endpoint](/key-tasks/authenticating-and-connecting-to-motherduck/postgres-endpoint/): connect through the PostgreSQL wire protocol
- [Customer-facing analytics](/key-tasks/customer-facing-analytics/3-tier-cfa-guide/): build multi-tenant analytics with per-customer isolation
---
Source: https://motherduck.com/docs/concepts/object-name-resolution
# Object name resolution
> Fully qualified naming conventions and database resolution rules in MotherDuck.
## Fully qualified naming convention
Fully qualified names (FQN) in MotherDuck are of the form `..