Workload scaling patterns
MotherDuck gives you several levers to scale your workloads. The right combination depends on your concurrency needs, query characteristics, and whether your workload is read-heavy or write-heavy. This page helps you match your workload to the right scaling pattern.
How MotherDuck scales per workload
MotherDuck scales workloads through compute units called Ducklings. Each user or service account gets a dedicated Duckling and read scaling flock, and you can adjust three levers to match your workload:
| Scaling lever | What it does | When to use it |
|---|---|---|
| Vertical scaling | Resize your Duckling (Pulse through Giga) | Queries need more CPU or memory |
| Horizontal scaling | Add read-only Ducklings through read scaling | Many concurrent users running read queries |
| Workload isolation | Create separate service accounts | Teams or pipelines that should not share compute, for example to prevent a large data ingestion impacting the queries of analysts. |
These levers are complementary. For example, you might use a Jumbo Duckling (vertical) for data loading and use a flock of pulse Ducklings with read scaling (horizontal) for your dashboard users. You can connect through any supported interface, including the native DuckDB SDK, the Postgres endpoint, or DuckDB WASM.
Follow the decision flowchart at the bottom of this page to find the right scaling pattern for your workload.
Understanding the scaling levers
Vertical scaling: Duckling sizes
When a single query needs more resources, move to a larger Duckling. Larger Ducklings have more CPU, memory, and extra SSD space to be used whenever queries use up too much memory. This helps with:
- Complex joins and aggregations
- Large data loading jobs
- Queries that process more data than fits in memory
Duckling sizes range from Pulse (lightweight, on-demand billing) to Giga (maximum resources for the heaviest batch jobs). See Duckling sizes for the full comparison.
When to size up: If queries are slow and you see high values for BYTES_SPILLED_TO_DISK or WAIT_TIME in your query history, your Duckling may need more memory.
Horizontal scaling: Read scaling
When you need to serve many concurrent read queries, read scaling adds read-only Duckling replicas behind your account. Key concepts:
- Pool size: The default pool is 4 Ducklings, configurable up to 16 (soft limit). Contact support for higher limits.
- Eventual consistency: Read replicas lag a few minutes behind the primary. Use
CREATE SNAPSHOTandREFRESH DATABASESif you need tighter synchronization. - One Duckling per user: For the best performance, aim for one Duckling per concurrent user. This takes full advantage of DuckDB's single-node architecture.
Session affinity and routing
session_namesession_nameBy default, read scaling distributes connections across the pool in round-robin fashion. When the number of connections exceeds your pool size, new connections share existing Ducklings.
For workloads where users run unique queries, use session_name to route a user's connections to the same Duckling. This improves performance because:
- Cache locality: DuckDB caches data per instance. Routing the same user to the same Duckling means their subsequent queries benefit from a warm cache.
- Consistent view: Queries within a session see a consistent snapshot of the data.
- Better isolation: Concurrent users do not compete for the same Duckling's resources.
Set session_name to a user ID, session ID, or any stable identifier to benefit from caching where possible. You can set the Duckling cooldown period to match your use case and keep the Duckling alive with its cache.
When to use session_name: Use it when users run unique, personalized queries, for example in customer-facing analytics or multi-tenant dashboards. Skip it when all users run the same queries (such as a shared reporting dashboard), since a shared connection pool already routes efficiently.
Workload isolation: Service accounts
Service accounts give you full compute and data isolation between workloads. Each service account gets its own Duckling and read scaling pool, which makes it easier to track usage and billing.
Use separate service accounts when:
- Different teams should never share compute (for example, production vs. development)
- A data loading pipeline should not compete with queries from analysts
- You need your customers to be able to write back to a database in a customer-facing analytics setup, or have a separate read scaling flock for each customer.
If you need a visual interface to manage these service accounts, the UI lets you impersonate a service account and adjust settings and run queries as the service account.
Quick reference: workload patterns
Use this matrix to find the recommended pattern for common workloads. Each row represents a typical use case, with inputs describing the workload and outputs recommending a configuration.
| Use case | Users | Concurrency | R/W | Overlap | Weight | Duckling size | Scaling approach |
|---|---|---|---|---|---|---|---|
| Ad-hoc analyst | 👤 | Sequential | R/W | — | ⚡/🏋️ | Pulse / Standard+ | Default (single Duckling) |
| dbt or ELT pipeline | 👤 | Concurrent | W | — | 🏋️ | Jumbo / Mega | Default (single Duckling) |
| Scheduled ingestion job | 👤 | Sequential | W | — | 🏋️ | Jumbo+ | Default + dedicated service account |
| BI dashboard (Omni, Hex, Metabase) | 👥👥 | Concurrent | R | High | ⚡ | Pulse / Standard | Read scaling (shared pool) |
| Embedded analytics | 👥👥 | Concurrent | R | Low | ⚡/🏋️ | Pulse / Standard+ | Read scaling + session_name |
| Customer-facing app (3-tier) | 👥👥 | Concurrent | R | Low | ⚡ | Standard | Read scaling + session_name |
| Serverless function (Lambda, Workers) | 👥👥 | Concurrent | R | Varies | ⚡ | Standard | Read scaling |
| Multi-team production | 👥👥 | Concurrent | R/W | Low | ⚡/🏋️ | Per team | Separate service accounts |
Reading the matrix
Input columns describe your workload:
- Users: How many people or clients connect: 👤 single, 👥👥 many
- Concurrency: Whether queries run one at a time (sequential) or in parallel (concurrent)
- R/W: Whether the workload reads (R), writes (W), or both (R/W)
- Overlap: Whether different users tend to run the same queries (high) or unique queries (low)
- Weight: Whether queries are light (⚡ sub-second) or heavy (🏋️ seconds to minutes)
Output columns recommend a configuration:
- Duckling size: Which Duckling size to use
- Scaling approach: Which horizontal scaling method to apply
Choosing an interface
Your choice of interface does not change the scaling levers available to you, but it does affect session management and connection behavior.
| Interface | Best for | Session management |
|---|---|---|
| Native SDK (Python, Node.js, Java) | Client applications, scripts, dbt | Instance cache, session_name |
| Postgres endpoint | Serverless functions, BI tools, environments without DuckDB | Per-connection |
| DuckDB WASM | Browser-based applications | Client-side compute |
Cost considerations
Scaling decisions affect your compute costs:
- Vertical scaling increases the per-second cost of your Duckling. Larger Ducklings cost more but finish heavy queries faster.
- Horizontal scaling adds Ducklings proportional to active sessions, not total users. Idle Ducklings shut down after the configured cooldown period.
- Pulse Ducklings use per-query billing (minimum 1 compute-unit second), making them cost-effective for sporadic, lightweight workloads.
- Cooldown tuning balances cost against cache warmth. A longer cooldown keeps the cache warm for returning users but costs more during idle periods.
See Duckling sizes and pricing for the full cost breakdown.
Decision flowchart
If you are not sure where to start, follow this flowchart:
Related content
- Hypertenancy: how MotherDuck's per-user compute model works
- Duckling sizes: compare sizes and configure your Ducklings
- Read scaling: set up read-only Duckling pools
- Create and configure service accounts: create isolated compute for teams and pipelines
- Postgres endpoint: connect through the PostgreSQL wire protocol
- Customer-facing analytics: build multi-tenant analytics with per-customer isolation