Read Scaling
Connecting read-heavy applications or BI tools with many concurrent users through a single MotherDuck account can sometimes lead to performance bottlenecks. By default, all connections using the same account share a single cloud DuckDB instance (a "duckling"). To spread the load, you can use a Read Scaling Token. This token lets your application transparently access additional compute resources (starting with 16 replicas by default, adjustable upon request) to smoothly handle concurrent reads. The result? A smoother experience for your users and faster read performance for your applications!
Read scaling tokens allow MotherDuck accounts to spin up multiple read-only replicas of your database that can be read concurrently. When a client connects using a read scaling token, it's directed to one of these read scaling replicas, each powered by its own dedicated duckling. As more users connect via read scaling tokens, your flock of ducklings expands, aiming to give each user their own dedicated duckling, up to the configured limit (defaulting to 16, but adjustable). If you exceed this limit, new connections will share existing ducklings, maintaining user-to-duckling affinity where possible.
Understanding read scaling tokens
MotherDuck offers a special type of personal access token called a Read Scaling Token. This token enables scalable read operations while restricting write capabilities. Here's how it works:
Creating a read scaling token
You can generate a read scaling token through the MotherDuck UI. When generating an access token, simply select "Read Scaling Token" as the token type.
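Once generated, the token is supplied through the standard MotherDuck connection string. Here is a minimal sketch in Python; the token value is a hypothetical placeholder, so substitute the one you copied from the MotherDuck UI:

```python
# Build a MotherDuck connection string that authenticates with a
# read scaling token. The token below is a hypothetical placeholder.
READ_SCALING_TOKEN = "eyJhbGciOi..."  # copy your real token from the UI

def md_connection_string(database: str, token: str) -> str:
    # MotherDuck accepts the token via the motherduck_token parameter.
    return f"md:{database}?motherduck_token={token}"

conn_str = md_connection_string("my_db", READ_SCALING_TOKEN)
# To open the connection (requires the duckdb package):
#   import duckdb
#   con = duckdb.connect(conn_str)
```

Because the token is read scaling, every connection opened this way is routed to a read-only replica rather than the read-write duckling.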
Permissions and limitations
A read scaling token grants permission for read operations (like querying tables) but restricts write operations, including:
- Updating tables
- Creating new databases
- Attaching or detaching databases
How a DuckDB client uses read scaling tokens
When a DuckDB client connects to MotherDuck with a read scaling token:
- It is assigned to one of the read scaling replicas for the user account.
- This replica is separate from the standard read-write "duckling" used for connections without a scaling token.
These replicas are eventually consistent. This means read operations might see data that slightly lags behind the very latest writes made to the primary instance. This trade-off enables scalability but doesn't guarantee real-time data synchronization across all replicas.
Scaling and Replicas
- Your flock of read scaling replicas starts with a maximum default size of 16 replicas, but this limit can be adjusted by contacting support.
- Each replica operates independently to handle concurrent read operations efficiently.
Session affinity with session_hint
To ensure users consistently connect to the same replica (improving caching and consistency), the DuckDB connection string supports the session_hint parameter:
- Clients providing the same session_hint value are directed to the same replica.
- This parameter can be set to the ID of a user session, a user ID, or a hashed value for privacy.
By leveraging read scaling tokens and session_hint, you can efficiently scale read operations and group user sessions for optimal performance.
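For example, a client could derive the hint by hashing a user ID so the raw identifier never leaves the application. This is a sketch: the hash function and the truncation length are illustrative assumptions, not requirements of the parameter.

```python
import hashlib

def session_hint_for(user_id: str) -> str:
    # Hash the user ID so the raw identifier is not sent to the service.
    # Truncating to 16 hex characters is an illustrative choice.
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]

# All queries from the same user carry the same hint, so they are
# directed to the same replica.
hint = session_hint_for("user-42")
conn_str = f"md:my_db?motherduck_token=<read scaling token>&session_hint={hint}"
```

Deriving the hint deterministically from a stable user identifier is what preserves user-to-duckling affinity across separate connections.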
Instance caching with dbinstance_inactivity_ttl
DuckDB integrations (Python, JDBC, R, etc.) use an instance cache to keep connections to the same database instance alive for a short period after use. This mechanism now includes a configurable Time-to-Live (TTL):
- It improves read scaling by helping maintain session affinity even across separate queries or short connection gaps.
- You can customize the TTL (in seconds) in the connection string:
md:?dbinstance_inactivity_ttl=<seconds>
This caching behavior boosts the effectiveness of session_hint, making it more likely that frequent queries from the same client land on the same duckling, even with short breaks between connections.
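Putting the parameters together, a connection string that combines a read scaling token, a session hint, and a custom TTL can be assembled like this (a sketch; the token value is a placeholder, and urlencode is used purely for safe escaping):

```python
from urllib.parse import urlencode

def md_url(database: str, token: str, session_hint: str, ttl_seconds: int) -> str:
    # dbinstance_inactivity_ttl takes a value in seconds,
    # matching the connection-string syntax shown above.
    params = urlencode({
        "motherduck_token": token,
        "session_hint": session_hint,
        "dbinstance_inactivity_ttl": ttl_seconds,
    })
    return f"md:{database}?{params}"

url = md_url("my_db", "token-placeholder", "a1b2c3", 300)
```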
Ensuring Data Freshness
In read scaling mode, you can typically expect replicas to sync changes from the primary read-write instance within a few minutes. For most use cases, this default eventual consistency is perfectly fine.
However, if your application requires stricter synchronization, you can manually trigger updates. Create snapshots on the writer connection and then refresh the database on the read-scaling connections.
Creating a snapshot of a database will interrupt any ongoing queries interacting with that database.
Example Workflow
```sql
-- Writer connection: make changes and create a snapshot
CREATE TABLE my_db.my_table AS ...;
CREATE SNAPSHOT OF my_db;

-- Read-scaling connection: refresh to see the snapshot
REFRESH DATABASES;      -- refreshes all connected databases and shares
REFRESH DATABASE my_db; -- alternatively, refresh just one specific database
```
This approach guarantees that readers see the most recent snapshot.
Learn more about REFRESH DATABASES and CREATE SNAPSHOT.
Best Practices
Here are a few tips to get the most out of MotherDuck's read scaling capabilities:
Optimize your flock size
Aim for one duckling per concurrent user for the best experience, as this leverages DuckDB's single-instance efficiency. However, you can configure a maximum flock size based on expected concurrency and cost considerations, allowing some users to share ducklings if needed. The default limit is 16 replicas, but this is a soft limit – contact MotherDuck support if you need more.
Leverage local processing where possible
Consider using DuckDB WASM to run client instances directly in the browser. Shifting computation client-side utilizes local resources effectively and can sometimes scale better for very large user numbers than relying solely on cloud ducklings.
Stick users to their ducklings
Use the session_hint parameter (ideally derived from a unique user identifier) to ensure all queries from a specific end user hit the same duckling replica. This improves caching effectiveness, provides a more consistent view of data across queries for that user, and offers better isolation between concurrent users.