Storage Lifecycle and Management
Understanding MotherDuck's storage lifecycle is crucial for optimizing costs and managing data effectively. Unlike traditional databases where deleted data is immediately freed, MotherDuck implements a sophisticated multi-stage storage system that ensures data safety while providing cost transparency. This system is particularly important for organizations that share data, use zero-copy cloning, or need to understand their storage footprint for billing purposes.
Storage Lifecycle Overview
MotherDuck's storage lifecycle has four distinct stages:
- Active bytes: Actively referenced bytes of the database. These bytes are accessible by directly querying the database.
- Historical bytes: Non-active bytes referenced by a share of this database.
- Kept for cloned bytes: Bytes referenced by other databases (via zero-copy clone) that are no longer referenced by this database as active or historical bytes.
- Failsafe bytes: Bytes that are no longer referenced by any database or share and are retained for some period of time as system backups.
MotherDuck runs a periodic job that reclassifies data into the appropriate storage lifecycle stage.
Data moves through the storage lifecycle in one direction only, following the order of the stages listed above; it never returns to an earlier stage.
Data moves out of each stage under the following conditions:
- Active bytes: when the data is deleted from the database
- Historical bytes: when all shares referencing the data are dropped or updated
- Kept for cloned bytes: when the data is deleted from all zero-copy-cloned databases
- Failsafe bytes: after the failsafe retention period (7 days)
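The stage definitions and the one-way flow can be sketched as a small classifier. This is an illustrative model only, not MotherDuck's implementation; the names `Stage`, `classify`, and `can_move` are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle stages, numbered in the order they are listed above."""
    ACTIVE = 0
    HISTORICAL = 1
    KEPT_FOR_CLONED = 2
    FAILSAFE = 3


def classify(in_database: bool, in_share: bool, in_clone: bool) -> Stage:
    """Pick a stage based on what still references the bytes.

    Hypothetical helper: the three flags stand for "referenced by the
    database itself", "referenced by a share of it", and "referenced by a
    zero-copy clone".
    """
    if in_database:
        return Stage.ACTIVE
    if in_share:
        return Stage.HISTORICAL
    if in_clone:
        return Stage.KEPT_FOR_CLONED
    # Unreferenced bytes are retained as failsafe until the retention period ends.
    return Stage.FAILSAFE


def can_move(current: Stage, target: Stage) -> bool:
    """Data only moves forward through the lifecycle, never backward."""
    return target > current
```

For example, bytes deleted from a database but still referenced by a share classify as `Stage.HISTORICAL`, and `can_move(Stage.FAILSAFE, Stage.ACTIVE)` is `False`.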
An organization is billed for the sum of active, historical, kept for cloned, and failsafe bytes across all of its databases.
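As a worked illustration of that sum, the sketch below adds the four stages for each database and then totals them across an organization. The `DatabaseStorage` class, its field names, and the sample figures are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class DatabaseStorage:
    """Per-database byte counts for each lifecycle stage (hypothetical shape)."""
    name: str
    active_bytes: int
    historical_bytes: int
    kept_for_cloned_bytes: int
    failsafe_bytes: int

    def billed_bytes(self) -> int:
        # All four stages count toward the bill, not just active bytes.
        return (
            self.active_bytes
            + self.historical_bytes
            + self.kept_for_cloned_bytes
            + self.failsafe_bytes
        )


def org_billed_bytes(databases: list[DatabaseStorage]) -> int:
    """Total billed bytes across every database in the organization."""
    return sum(db.billed_bytes() for db in databases)


# Illustrative numbers only.
databases = [
    DatabaseStorage("analytics", 500, 120, 0, 80),
    DatabaseStorage("staging", 200, 0, 50, 30),
]
total = org_billed_bytes(databases)  # 700 + 280 = 980
```

In this example only 700 of the 980 billed units are active bytes; the rest comes from data that has been deleted but is still held in a later stage.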
How This Affects Your Data Strategy
Understanding the storage lifecycle helps you make informed decisions about:
- Data deletion strategies: Deleting data does not immediately reduce your bill, because the deleted bytes still pass through the later retention stages
- Sharing considerations: Shared data remains in historical bytes until shares are updated or dropped
- Cloning decisions: Zero-copy clones can keep data in kept for cloned bytes even after deletion from the source
- Cost optimization: Different lifecycle stages have different cost implications and management strategies
For more information on data sharing, see Sharing Data. For details on zero-copy cloning, refer to MotherDuck Architectural Concepts.
Storage Management
To better understand your organization's storage bill, start with the STORAGE_INFO view in the MD_INFORMATION_SCHEMA. This view provides an overview of the storage footprint of the databases in your organization, broken down by storage lifecycle stage.
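A minimal sketch of reading this view from Python and finding which stage dominates each database follows. It assumes the DuckDB Python client connected with `duckdb.connect("md:")` and a configured MotherDuck token, and it assumes the view exposes columns named `database_name`, `active_bytes`, `historical_bytes`, `kept_for_cloned_bytes`, and `failsafe_bytes`; check the actual column names in your account before relying on it.

```python
STORAGE_QUERY = "SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO"

# Assumed column names, one per lifecycle stage.
STAGE_COLUMNS = (
    "active_bytes",
    "historical_bytes",
    "kept_for_cloned_bytes",
    "failsafe_bytes",
)


def fetch_storage_rows(con):
    """Run the query on an open connection and return rows as dicts."""
    cursor = con.execute(STORAGE_QUERY)
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


def largest_stage(row):
    """Return the name of the stage column holding the most bytes."""
    sizes = {col: row.get(col) or 0 for col in STAGE_COLUMNS}
    return max(sizes, key=sizes.get)


def report():
    """Print the dominant stage per database (needs MotherDuck access)."""
    import duckdb  # imported here so the helpers above work without it

    con = duckdb.connect("md:")
    for row in fetch_storage_rows(con):
        print(row.get("database_name"), largest_stage(row))
```

Knowing which stage dominates a database tells you which of the following paragraphs applies to it.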
If Active bytes are higher than expected, consider whether you need all of the data stored in that database. Common ways to decrease active bytes are to delete data you no longer need or to optimize sorting and data types.
If Historical bytes are higher than expected, consider whether any manually updated shares in the organization still reference this database. This footprint will decrease as those shares are updated (UPDATE SHARE) or dropped. You can find all shares that reference a given database by using the OWNED_SHARES view in the MD_INFORMATION_SCHEMA.
If Kept for cloned bytes are higher than expected, consider whether there are other databases that were zero-copy cloned from this database that are still referencing deleted data. This footprint will decrease as you delete the cloned data from these other databases.
Failsafe bytes result from deleting data. After a one-time deletion, this footprint should drop once the failsafe retention period has passed. If failsafe bytes remain consistently high, you are likely overwriting or updating data too frequently. Common workloads that tend to delete a lot of data (via overwrites or updates) are: create or replace tables, truncate and insert, updates, and deletes. Avoiding these workload patterns can reduce your failsafe footprint.
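To see why frequent overwrites keep failsafe bytes high, the sketch below simulates a rolling 7-day retention window. It is a simplification under stated assumptions: replaced bytes are not referenced by any share or clone, so they go straight to failsafe, and the timing of the periodic reclassification job is ignored. The function name `simulate_failsafe` is hypothetical.

```python
from collections import deque

FAILSAFE_RETENTION_DAYS = 7  # retention period stated in the lifecycle description


def simulate_failsafe(daily_replaced_bytes, retention_days=FAILSAFE_RETENTION_DAYS):
    """Return the failsafe footprint at the end of each day.

    daily_replaced_bytes: bytes deleted or overwritten on each day.
    Bytes stay in failsafe for `retention_days` days, then drop out.
    """
    window = deque(maxlen=retention_days)  # oldest day falls off automatically
    footprint = []
    for replaced in daily_replaced_bytes:
        window.append(replaced)
        footprint.append(sum(window))
    return footprint


# Replacing a 10-unit table every day: footprint climbs, then plateaus at 7x.
daily_overwrite = simulate_failsafe([10] * 10)
# → [10, 20, 30, 40, 50, 60, 70, 70, 70, 70]

# A one-time deletion of 100 units: footprint clears after the retention window.
one_time = simulate_failsafe([100] + [0] * 9)
# → [100, 100, 100, 100, 100, 100, 100, 0, 0, 0]
```

Under these assumptions, a table that is fully replaced every day holds roughly seven copies of itself in failsafe at steady state, while a one-time deletion stops contributing after the retention period.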
If you need help understanding or reducing your storage bill, please reach out to MotherDuck support.