Storage Lifecycle and Management
Understanding MotherDuck's storage lifecycle is crucial for optimizing costs and managing data effectively. Unlike traditional databases where deleted data is immediately freed, MotherDuck implements a multi-stage storage system that ensures data safety while providing cost transparency. This system is particularly important for organizations that share data, use zero-copy cloning, or need to understand their storage footprint for billing purposes.
The storage lifecycle applies to both native storage databases and DuckLake databases, with some differences in lifecycle stages and management. See storage management for retention defaults by database type.
Storage lifecycle overview
The following diagram shows the storage lifecycle for native storage databases.
The storage lifecycle has five distinct stages:
- Active bytes: Actively referenced bytes of the database. These bytes are accessible by directly querying the database.
- Historical bytes: Non-active bytes referenced by historical snapshots or shares of this database. Used for time travel and self-service restore.
- Retained for clone bytes: Bytes referenced by other databases (through zero-copy clone) that are no longer referenced by this database as active or historical bytes. This stage applies to native storage databases only.
- Failsafe bytes: Bytes no longer referenced by any database or share, retained for a period as a last-resort, best-effort recovery service. Recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed to be complete. Don't rely on failsafe bytes as part of a backup plan.
- Deleted: Bytes are fully removed from the system and no longer accessible.
MotherDuck runs a periodic job that reclassifies data to the proper storage lifecycle stage. For DuckLake databases, auto maintenance handles file cleanup and snapshot expiration.
Data flows through the storage lifecycle in one direction only, in the order the stages are listed above. It never moves back to an earlier stage.
The following conditions can trigger data to be reclassified to a new stage:
| Trigger | State transition |
|---|---|
| Data is deleted or updated in the database | Active → Historical |
| All shares referencing the data are dropped or updated, and all historical snapshots referencing the data are deleted | Historical → Retained for Clone or Failsafe |
| Data is deleted from all zero-copy-cloned databases | Retained for Clone → Failsafe |
| Failsafe retention period passes (7 days for standard, 1 day for transient) | Failsafe → Deleted |
An organization is billed based on the average of active, historical, retained for clone, and failsafe bytes across all of their databases over the billing period.
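As a rough illustration of the billing rule above, the following sketch averages the four billable stages over a hypothetical four-day period. The sample values, the daily sampling granularity, and the compressed retention timeline are all assumptions made for the example (this page doesn't state how often usage is sampled). The query uses only inline values, so it can be run in any DuckDB session.

```sql
-- Hypothetical daily storage samples for one database (all values in GB).
-- Day 2: 40 GB is deleted and moves from active to historical.
-- Day 3: the same 40 GB has moved on to failsafe.
-- Day 4: the 40 GB has been fully deleted.
WITH daily_samples(sample_day, active_gb, historical_gb, retained_for_clone_gb, failsafe_gb) AS (
    VALUES
        (DATE '2025-01-01', 100,  0, 0,  0),
        (DATE '2025-01-02',  60, 40, 0,  0),
        (DATE '2025-01-03',  60,  0, 0, 40),
        (DATE '2025-01-04',  60,  0, 0,  0)
)
SELECT
    -- Billed storage is the average of all four stages combined
    avg(active_gb + historical_gb + retained_for_clone_gb + failsafe_gb) AS avg_billed_gb,
    -- For comparison: what the average would be if only active bytes counted
    avg(active_gb) AS avg_active_gb
FROM daily_samples;
```

In this toy timeline the deleted 40 GB is still billed on days 2 and 3, so the billed average is 90 GB, compared with the 70 GB that active bytes alone would suggest.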
Refer to the data recovery overview for more details on how to manage historical snapshots.
How this affects your data strategy
Understanding the storage lifecycle helps you make informed decisions about:
- Data deletion strategies: Deleting data doesn't immediately reduce your bill, because the data still passes through the retention stages
- Sharing considerations: Shared data remains in historical bytes until shares are updated or dropped
- Cloning decisions: Zero-copy clones can keep data in retained for clone bytes even after deletion from the source
- Cost optimization: Different lifecycle stages have different cost implications and management strategies
For more information on data sharing, see Sharing Data. For details on zero-copy cloning, refer to MotherDuck Architectural Concepts.
Storage management
Storage retention behavior depends on the database type: standard, transient, or DuckLake.
SNAPSHOT_RETENTION_DAYS controls how many days historical snapshots are retained for data recovery and time travel (see Data Recovery). A retention of at least 1 day is recommended, so you can recover your data if you accidentally drop or overwrite it.
To see the historical retention and transient status of your databases, use the md_information_schema.databases view.
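For example, a minimal query against that view might look like the following. It selects every column rather than naming specific ones, since the exact column names for retention and transient status aren't given on this page.

```sql
-- List all databases with their metadata,
-- including historical retention and transient status
SELECT * FROM md_information_schema.databases;
```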
Lite starts in free-tier mode with no historical retention until usage limits are reached, after which Lite defaults apply.
Standard databases
| Plan | Failsafe period | Default historical retention | Min historical retention | Max historical retention |
|---|---|---|---|---|
| Business | 7 days | 7 days | 0 days | 90 days |
| Lite (paid) | 7 days | 1 day | 1 day | 1 day |
| Lite (free) | 7 days | 0 days | 0 days | 0 days |
Historical retention enables point-in-time restore for your data. Business plan users can configure retention up to 90 days for extended data recovery capabilities.
Transient databases
For use cases that don't require the default 7-day failsafe retention period, a native storage database can be created as TRANSIENT, which enforces a 1-day failsafe period. This setting can only be specified at database creation and can't be changed afterward.
| Plan | Failsafe period | Default historical retention | Min historical retention | Max historical retention |
|---|---|---|---|---|
| Business | 1 day | 1 day | 0 days | 90 days |
| Lite (paid) | 1 day | 1 day | 1 day | 1 day |
| Lite (free) | 1 day | 0 days | 0 days | 0 days |
Transient databases enforce a 1-day minimum lifetime for data, which shows up in your bill as failsafe bytes.
Transient databases can be helpful for the following datasets:
- Datasets that are the intermediate output of a job (write once, read once)
- Datasets that can be reconstructed from an external data source
DuckLake databases
DuckLake databases follow the same lifecycle stages as native storage databases (active, historical, failsafe, deleted), except there is no "retained for clone" stage since DuckLake does not support zero-copy cloning.
| Setting | Fully managed DuckLake | BYOB DuckLake |
|---|---|---|
| Failsafe period | 7 days | 7 days |
| Default snapshot retention | Infinite (NULL) | Infinite (NULL) |
| Auto maintenance | Enabled by default | Disabled by default |
| Configurable retention | Yes, with SNAPSHOT_RETENTION_DAYS | Yes, after enabling AUTO_MAINTENANCE |
DuckLake storage optimization and snapshot expiration are handled by auto maintenance rather than the native storage garbage collector. When SNAPSHOT_RETENTION_DAYS is set to NULL (the default), snapshots are retained indefinitely.
To configure snapshot retention for a DuckLake database:
```sql
ALTER DATABASE my_ducklake SET SNAPSHOT_RETENTION_DAYS = 7;
```
For more details on DuckLake storage management, see the DuckLake storage lifecycle section.
Backup strategies
If your data can't be recreated from source, plan an explicit backup strategy. Failsafe bytes are a last-resort recovery mechanism, not a backup plan: recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed.
The storage lifecycle gives you several mechanisms that you can rely on for backups:
- Automatic snapshots for time travel and short-term restore, retained as historical_bytes according to SNAPSHOT_RETENTION_DAYS. Retention defaults and limits depend on your plan (see Standard databases).
- Named snapshots (Business plan) for long-lived backups that persist until you explicitly remove them. See database snapshots for details.
- Zero-copy clones through CREATE DATABASE FROM for isolated copies without duplicating storage costs.
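As a minimal sketch of the zero-copy clone option, assuming a source database named my_db (both database names below are hypothetical):

```sql
-- Create an isolated copy of my_db without duplicating its storage
CREATE DATABASE my_db_backup FROM my_db;
```

Keep in mind that data later deleted from the source but still referenced by the clone is counted as retained for clone bytes, as described in the lifecycle overview.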
Transient databases skip the default 7-day failsafe retention and are appropriate for data that can be recreated from a job or external source.
For recovery procedures, see data recovery.
Breaking down storage usage
Storage breakdown information is only available to users with the Admin role.
To understand your organization's storage bill, you have two entry points: SQL and the UI.
With SQL, query the STORAGE_INFO and STORAGE_INFO_HISTORY views in MD_INFORMATION_SCHEMA for a breakdown by lifecycle stage, as either a current snapshot or up to 30 days of history.
```sql
-- Get current storage information for all databases
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO;
```
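To see how the breakdown changes over time, or to rank databases by total footprint, queries along these lines may help. The first two statements use only names mentioned on this page. The third assumes column names (database_name, active_bytes, historical_bytes, kept_for_cloned_bytes, failsafe_bytes) that aren't confirmed here, so check them against the DESCRIBE output before relying on it.

```sql
-- Up to 30 days of storage history for all databases
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO_HISTORY;

-- Inspect the columns the current-snapshot view actually exposes
DESCRIBE SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO;

-- ASSUMED column names: adjust to match the DESCRIBE output above
SELECT
    database_name,
    active_bytes,
    historical_bytes,
    kept_for_cloned_bytes,
    failsafe_bytes,
    active_bytes + historical_bytes + kept_for_cloned_bytes + failsafe_bytes AS total_bytes
FROM MD_INFORMATION_SCHEMA.STORAGE_INFO
ORDER BY total_bytes DESC;
```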
In the UI, open the databases page in settings to see total storage across all databases and a per-database breakdown. Click a row to view lifecycle stages for that database.
Active bytes are higher than expected
Consider whether you need all of the data stored in that database. Common ways to decrease active bytes are deleting data you no longer need and optimizing sorting and data types.
Historical bytes are higher than expected
Check for two things: outstanding manually updated shares in the organization that reference this database, and your historical database snapshots. A manually updated share can keep historical data referenced, which prevents it from being deleted. Your historical byte footprint decreases as those shares are updated (UPDATE SHARE) or dropped. To find the shares that reference a database, use the OWNED_SHARES view in MD_INFORMATION_SCHEMA.
Otherwise, consider reducing SNAPSHOT_RETENTION_DAYS on your database to retain fewer historical snapshots. Note that this shortens the window of time from which you can restore data. See data recovery for more details on how to plan and set up a data recovery protocol for your organization.
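The two remedies above can be sketched as follows. The share and database names are hypothetical, and the ALTER DATABASE form is assumed to apply to native storage databases in the same way this page shows it for DuckLake databases.

```sql
-- Find shares you own that may still reference old data
SELECT * FROM MD_INFORMATION_SCHEMA.OWNED_SHARES;

-- Refresh a manually updated share so it no longer references
-- older data (hypothetical share name)
UPDATE SHARE my_share;

-- Shorten the historical retention window to 1 day
-- (hypothetical database name; syntax assumed from the DuckLake example)
ALTER DATABASE my_db SET SNAPSHOT_RETENTION_DAYS = 1;
```

Shortening retention trades a smaller historical footprint for a shorter restore window, so pick a value that still covers how long it typically takes you to notice a mistake.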
Retained for clone bytes are higher than expected
Consider whether there are other databases that were zero-copy cloned from this database that are still referencing deleted data. This footprint will decrease as you delete the cloned data from these other databases.
Failsafe bytes are higher than expected
Failsafe bytes result from deleting data. After a one-time deletion, this footprint should drop once the failsafe period passes. If failsafe bytes remain consistently high, you are likely overwriting or updating data frequently. Common workloads that delete a lot of data (through overwrites or updates) are: create or replace table, truncate and insert, updates, and deletes. Avoiding these workload patterns can reduce your failsafe footprint. You can also consider using a TRANSIENT database, if it supports your use case, to reduce the failsafe period to 1 day.
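As an illustration of the workload patterns mentioned above, the following sketch contrasts a full rebuild with an incremental append. Table and column names are hypothetical, and how much failsafe storage each pattern produces depends on your data, so treat this as a pattern to evaluate rather than a guaranteed saving.

```sql
-- Hypothetical staging and target tables
CREATE TABLE staging_events (event_id INTEGER, payload VARCHAR);
CREATE TABLE events (event_id INTEGER, payload VARCHAR);
INSERT INTO staging_events VALUES (1, 'a'), (2, 'b');

-- Full rebuild: replaces the whole table on every run, so the previous
-- contents all leave the active stage each time
CREATE OR REPLACE TABLE events AS
SELECT * FROM staging_events;

-- Incremental append: adds only rows that aren't present yet,
-- leaving existing data untouched
INSERT INTO events
SELECT s.*
FROM staging_events AS s
WHERE NOT EXISTS (
    SELECT 1 FROM events AS e WHERE e.event_id = s.event_id
);
```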
If you need help understanding or reducing your storage bill, reach out to MotherDuck support.