MotherDuck Internals: How does read scaling work?
In MotherDuck, every user gets their own DuckDB instance in the cloud, we call it a Duckling. The Duckling starts almost instantly (generally less than 200 ms), can auto-scale, and shuts down when it isn’t being used. What’s more, you only pay when it is running queries. If you have hundreds of users in your organization, each one has a Duckling. The diagram below shows how Alice and Bob, both from the Foo.com organization, both connect to MotherDuck. Each one gets routed to their own Duckling, and they can both access Foo.com’s data warehouse independently, using independent compute.

The tricky part comes when Alice and Bob use a BI tool that funnels all of their requests to the data warehouse looking like a single user (bi-user@foo.com in this case). All of the users in that organization look like they are coming from that single user, so they get routed to the same duckling.

With Read Scaling, we will automatically spin up additional Ducklings as needed to handle read-only workloads. Those Ducklings are a clone of the original Duckling, and have access to the same data. Subsequent queries are load distributed between the DuckDB instances. In order to get more consistent performance, we will attempt to route an end user to the same duckling for all of their queries. You can see how this works in the diagram below.

Even though MotherDuck can start new instances very quickly, there is benefit to having a Duckling already up and running with data pre-cached. If queries get sent to random replicas, performance might not be great if you do back-to-back queries against the same data. To solve this problem, MotherDuck routes queries based on your client-side identity. If you create a database connection and reuse that connection, you will continue to talk to the same Duckling. This lets you take advantage of cache locality since the DuckDB buffer pool will have relevant parts of your data already in memory.
There is a further optimization that can be used by applications that fan out to multiple end users and manage connection pools. If multiple connections map to the same user, or connections get recycled frequently, applications can provide a session_hint parameter when connecting to MotherDuck. The session_hint can be a hash of the user id or session id, and it allows a stable mapping between end users and Ducklings. This can ensure that a user’s workload will get consistent performance.
Other data warehouse vendors have something similar; you can configure Snowflake to spin up additional warehouses when you have high concurrency. However, this is a pretty heavyweight solution; MotherDuck instances are much lighter weight, since they can be started and stopped in milliseconds with minimal performance overhead.