Storage Lifecycle and Management
Understanding MotherDuck's storage lifecycle is crucial for optimizing costs and managing data effectively. Unlike traditional databases where deleted data is immediately freed, MotherDuck implements a sophisticated multi-stage storage system that ensures data safety while providing cost transparency. This system is particularly important for organizations that share data, use zero-copy cloning, or need to understand their storage footprint for billing purposes.
Storage Lifecycle Overview
The following documents MotherDuck's storage lifecycle.
There are 4 distinct stages of the storage lifecycle:
Active bytes: Actively referenced bytes of the database. These bytes are accessible by directly querying the database
- Historical bytes: Non-active bytes referenced by a historical snapshot or share of this database
- Kept for cloned bytes: Bytes referenced by other databases (via zero-copy clone) that are no longer referenced by this database as active or historical bytes
- Failsafe bytes: Bytes that are no longer referenced by any database or share that are retained for some period of time as system backups
MotherDuck will run a periodic job that will reclassify data to the proper storage lifecycle stage.
Data can only flow through the storage lifecycle unidirectionally, from left to right.
The following conditions can trigger data to be reclassified to a new stage:
- Active bytes: When the data is deleted from the database
- Historical bytes: When all shares referencing the data are dropped or updated, and all historic snapshots referencing the data are deleted
- Kept for cloned bytes: When the data is deleted from all zero-copy-cloned databases
- Failsafe bytes: After the failsafe retention period for Standard databases (7 days) or Transient databases (1 day) has passed
An organization is billed based on the average of active, historical, kept for cloned, and failsafe bytes across all of their databases over the billing period.
Refer to the data recovery overview for more details on how to manage historical snapshots.
How This Affects Your Data Strategy
Understanding the storage lifecycle helps you make informed decisions about:
- Data deletion strategies: When you delete data, it doesn't immediately reduce your bill due to the retention stages
- Sharing considerations: Shared data remains in historical bytes until shares are updated or dropped
- Cloning decisions: Zero-copy clones can keep data in kept for cloned bytes even after deletion from the source
- Cost optimization: Different lifecycle stages have different cost implications and management strategies
For more information on data sharing, see Sharing Data. For details on zero-copy cloning, refer to MotherDuck Architectural Concepts.
Storage Management
MotherDuck databases support two options for configuring storage retention.
The first option is SNAPSHOT_RETENTION_DAYS. This setting allows a user to specify how many days historical snapshots should be retained for data recovery and time travel (see Data Recovery). The recommended minimum is at least 1 day, such that you can easily recover your data if you accidentally drop or overwrite it. The max historical retention period you can set depends on your current plan (see below).
The second option is setting a database to be TRANSIENT. This setting controls the failsafe period of the database, and has different historical retention period limits. TRANSIENT databases are only available to Lite and Business plans.
To see the historical retention and transient status of your databases, use the md_information_schema.databases view.
Lite starts in free-tier mode with no historical retention until usage limits are reached, after which Lite defaults apply.
Standard Databases:
| Plan | Failsafe Period: failsafe_bytes | Default Historical Retention: historical_bytes | Minimum Historical Retention: historical_bytes | Maximum Historical Retention: historical_bytes |
|---|---|---|---|---|
| Business | 7 days | 1 day | 0 days | 90 days |
| Lite (paid) | 7 days | 1 day | 1 day | 1 day |
| Lite (free) | 7 days | 0 days | 0 days | 0 days |
Historical retention enables point-in-time restore for your data. Business plan users can configure retention up to 90 days for extended data recovery capabilities.
Transient Databases
For use cases that don't require the default failsafe retention period (7 days), a MotherDuck database can be set as TRANSIENT at database creation to enforce a 1 day failsafe minimum. This setting can only be defined at database creation and is not modifiable.
| Plan | Failsafe Period: failsafe_bytes | Default Historical Retention: historical_bytes | Minimum Historical Retention: historical_bytes | Maximum Historical Retention: historical_bytes |
|---|---|---|---|---|
| Business | 1 day | 1 day | 0 days | 90 days |
| Lite (paid) | 1 day | 1 day | 1 day | 1 day |
| Lite (free) | 1 day | 0 days | 0 days | 0 days |
Transient databases will not retain data in failsafe storage. Transient databases will however still enforce a 1 day minimum lifetime for data (this will show up in your bill as failsafe).
Transient databases can be helpful for the following datasets:
- Datasets that are the intermediate output of a job (write once, read once)
- Datasets that can be easily reconstructed from an external data source
Breaking Down Storage Usage
To better understand your organization's storage bill, start with the STORAGE_INFO view in the MD_INFORMATION_SCHEMA. This function returns an overview of the storage footprint by lifecycle stage for the databases in your organization.
If Active bytes are higher than expected, consider whether you need all of the data stored in that database. Some common ways to decrease active bytes are to delete the data or optimize sorting and data types.
If Historical bytes are higher than expected, you should look into either outstanding manually updated shares referencing this database in the organization or your historical database snapshots. Outstanding manually updated shares may keep historical data referenced (which prevent it from being deleted). Your historical byte footprint will decrease as the shares are updated (UPDATE SHARE) or dropped. You can find all shares that reference some database by using the OWNED_SHARES view in the MD_INFORMATION_SCHEMA.
Otherwise you can consider reducing the SNAPSHOT_RETENTION_DAYS on your database to reduce the number of historical snapshots you retain. Note that this will reduce the window of time that you can restore data from. See data recovery for more details on how to plan and setup a proper data recovery protocol for your organization.
If Kept for cloned bytes are higher than expected, consider whether there are other databases that were zero-copy cloned from this database that are still referencing deleted data. This footprint will decrease as you delete the cloned data from these other databases.
Failsafe bytes result from deleting data. This footprint should drop if this was a one-time deletion of data. If failsafe bytes remain consistently high - it is likely that you are overwriting or updating data too frequently. Common workloads that tend to delete a lot of data (via overwrites or updates) are: create or replace tables, truncate and insert, updates, and deletes. Avoiding these workload patterns can reduce your failsafe footprint. You can also consider using a TRANSIENT database, which won't have failsafe bytes, if they are applicable for your use case.
If you have the Admin role, you can view your organization's storage breakdown on the databases page.
If you need help understanding or reducing your storage bill, please reach out to MotherDuck support.