Storage Lifecycle and Management

Understanding MotherDuck's storage lifecycle is crucial for optimizing costs and managing data effectively. Unlike traditional databases where deleted data is immediately freed, MotherDuck implements a multi-stage storage system that ensures data safety while providing cost transparency. This system is particularly important for organizations that share data, use zero-copy cloning, or need to understand their storage footprint for billing purposes.

The storage lifecycle applies to both native storage databases and DuckLake databases, with some differences in lifecycle stages and management. See storage management for retention defaults by database type.

Storage lifecycle overview

The following diagram shows the storage lifecycle for native storage databases.

There are 5 distinct stages of the storage lifecycle:

  1. Active bytes: Actively referenced bytes of the database. These bytes are accessible by directly querying the database.
  2. Historical bytes: Non-active bytes referenced by historical snapshots or shares of this database. Used for time travel and self-service restore.
  3. Retained for clone bytes: Bytes referenced by other databases (through zero-copy clone) that are no longer referenced by this database as active or historical bytes. This stage applies to native storage databases only.
  4. Failsafe bytes: Bytes no longer referenced by any database or share, retained for a period as a last-resort, best-effort recovery service. Recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed to be complete. Don't rely on failsafe bytes as part of a backup plan.
  5. Deleted: Bytes are fully removed from the system and no longer accessible.

MotherDuck runs a periodic job that reclassifies data to the proper storage lifecycle stage. For DuckLake databases, auto maintenance handles file cleanup and snapshot expiration.

Data moves through the storage lifecycle in one direction only, from left to right in the diagram. Bytes never return to an earlier stage.

The following conditions can trigger data to be reclassified to a new stage:

| Trigger | State transition |
| --- | --- |
| Data is deleted or updated in the database | Active → Historical |
| All shares referencing the data are dropped or updated, and all historic snapshots referencing the data are deleted | Historical → Retained for Clone or Failsafe |
| Data is deleted from all zero-copy-cloned databases | Retained for Clone → Failsafe |
| Failsafe retention period passes (7 days for standard, 1 day for transient) | Failsafe → Deleted |
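As a rough illustration of these transitions, the sketch below computes the earliest dates on which a deleted block of data could change stage. The deletion date and the two retention values are hypothetical inputs (7 and 7 match the Business plan defaults for a standard database), and the calculation assumes no share or zero-copy clone still references the data. Because reclassification is done by a periodic job, actual transitions may happen later than these dates.

```sql
-- Hypothetical inputs: when the data was deleted and the retention settings in effect.
WITH params AS (
    SELECT DATE '2025-03-01' AS deleted_on,
           7 AS snapshot_retention_days,  -- historical retention (assumed)
           7 AS failsafe_days             -- failsafe period for a standard database
)
SELECT
    deleted_on                                          AS becomes_historical,
    deleted_on + snapshot_retention_days                 AS earliest_failsafe,
    deleted_on + snapshot_retention_days + failsafe_days AS earliest_deleted
FROM params;
```

With these inputs, the data is billed as historical bytes from March 1, as failsafe bytes from March 8 at the earliest, and stops being billed on March 15 at the earliest.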

An organization is billed based on the average of active, historical, retained for clone, and failsafe bytes across all of their databases over the billing period.
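To see how averaging affects a bill, consider a sketch with made-up numbers: an organization whose combined active, historical, retained for clone, and failsafe bytes total 100 GB for the first 10 days of a 30-day billing period and 40 GB for the remaining 20 days. The daily granularity is an assumption made for illustration; the actual sampling interval isn't stated here.

```sql
-- Hypothetical daily totals (in GB) across all lifecycle stages and databases.
SELECT avg(total_gb) AS avg_billed_gb
FROM (
    SELECT 100 AS total_gb FROM range(10)   -- days 1-10
    UNION ALL
    SELECT 40 AS total_gb FROM range(20)    -- days 11-30
);
```

The average is (10 × 100 + 20 × 40) / 30 = 60 GB, so a footprint that shrinks mid-period is reflected in the bill in proportion to how long each level was held.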

Refer to the data recovery overview for more details on how to manage historical snapshots.

How this affects your data strategy

Understanding the storage lifecycle helps you make informed decisions about:

  • Data deletion strategies: Deleting data doesn't immediately reduce your bill, because the deleted bytes still pass through the retention stages
  • Sharing considerations: Shared data remains in historical bytes until shares are updated or dropped
  • Cloning decisions: Zero-copy clones can keep data in retained for clone bytes even after deletion from the source
  • Cost optimization: Different lifecycle stages have different cost implications and management strategies

For more information on data sharing, see Sharing Data. For details on zero-copy cloning, refer to MotherDuck Architectural Concepts.

Storage management

Storage retention behavior depends on the database type: standard, transient, or DuckLake.

SNAPSHOT_RETENTION_DAYS controls how many days historical snapshots are retained for data recovery and time travel (see Data Recovery). We recommend a retention of at least 1 day, so you can recover your data if you accidentally drop or overwrite it.

To see the historical retention and transient status of your databases, use the md_information_schema.databases view.
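A minimal query against that view might look like the following. The column names aren't listed in this section, so the sketch selects everything rather than guessing at them.

```sql
-- List all databases with their metadata, including historical retention and transient status.
SELECT * FROM md_information_schema.databases;
```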

The Lite plan starts in free-tier mode with no historical retention. Once usage limits are reached, the Lite (paid) defaults apply.

Standard databases

| Plan | Failsafe period | Default historical retention | Min historical retention | Max historical retention |
| --- | --- | --- | --- | --- |
| Business | 7 days | 7 days | 0 days | 90 days |
| Lite (paid) | 7 days | 1 day | 1 day | 1 day |
| Lite (free) | 7 days | 0 days | 0 days | 0 days |

Historical retention enables point-in-time restore for your data. Business plan users can configure retention up to 90 days for extended data recovery capabilities.

Transient databases

For use cases that don't require the default 7-day failsafe retention period, a native storage database can be created as TRANSIENT, which shortens the failsafe period to the 1-day minimum. This setting can only be specified at database creation and can't be changed afterward.

| Plan | Failsafe period | Default historical retention | Min historical retention | Max historical retention |
| --- | --- | --- | --- | --- |
| Business | 1 day | 1 day | 0 days | 90 days |
| Lite (paid) | 1 day | 1 day | 1 day | 1 day |
| Lite (free) | 1 day | 0 days | 0 days | 0 days |

Transient databases enforce a 1-day minimum lifetime for data, which shows up in your bill as failsafe bytes.

Transient databases can be helpful for the following datasets:

  • Datasets that are the intermediate output of a job (write once, read once)
  • Datasets that can be reconstructed from an external data source

DuckLake databases

DuckLake databases follow the same lifecycle stages as native storage databases (active, historical, failsafe, deleted), except there is no "retained for clone" stage since DuckLake does not support zero-copy cloning.

| Setting | Fully managed DuckLake | BYOB DuckLake |
| --- | --- | --- |
| Failsafe period | 7 days | 7 days |
| Default snapshot retention | Infinite (NULL) | Infinite (NULL) |
| Auto maintenance | Enabled by default | Disabled by default |
| Configurable retention | Yes, with SNAPSHOT_RETENTION_DAYS | Yes, after enabling AUTO_MAINTENANCE |

DuckLake storage optimization and snapshot expiration are handled by auto maintenance rather than the native storage garbage collector. When SNAPSHOT_RETENTION_DAYS is set to NULL (the default), snapshots are retained indefinitely.

To configure snapshot retention for a DuckLake database:

ALTER DATABASE my_ducklake SET SNAPSHOT_RETENTION_DAYS = 7;

For more details on DuckLake storage management, see the DuckLake storage lifecycle section.

Backup strategies

If your data can't be recreated from source, plan an explicit backup strategy. Failsafe bytes are a last-resort recovery mechanism, not a backup plan: recovery requires contacting MotherDuck support, can take hours to days, and isn't guaranteed.

The storage lifecycle gives you several mechanisms that you can rely on for backups:

  • Automatic snapshots for time travel and short-term restore, retained as historical_bytes according to SNAPSHOT_RETENTION_DAYS. Retention defaults and limits depend on your plan (see Standard databases).
  • Named snapshots (Business plan) for long-lived backups that persist until you explicitly remove them. See database snapshots for details.
  • Zero-copy clones through CREATE DATABASE FROM for isolated copies without duplicating storage costs.
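As a sketch of the zero-copy clone option listed above, the statement below creates an isolated copy of a database. The names `my_db` and `my_db_backup` are placeholders, not names from this guide.

```sql
-- Create a zero-copy clone of my_db (hypothetical names).
-- Bytes shared between the two databases are not duplicated in storage.
CREATE DATABASE my_db_backup FROM my_db;
```

Keep in mind that data later deleted from the source but still referenced by the clone moves to retained for clone bytes, so it continues to count toward your storage footprint.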

Transient databases shorten failsafe retention from the default 7 days to 1 day and are appropriate for data that can be recreated from a job or external source.

For recovery procedures, see data recovery.

Breaking down storage usage

Admin only

Storage breakdown information is only available to users with the Admin role.

To understand your organization's storage bill, you have two entry points:

Query the STORAGE_INFO and STORAGE_INFO_HISTORY views in MD_INFORMATION_SCHEMA for a breakdown by lifecycle stage, as either a current snapshot or up to 30 days of history.

-- Get current storage information for all databases
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO;
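The history view can be queried the same way to see how the breakdown changed over time. The sketch selects all columns because the view's column names aren't listed in this section.

```sql
-- Get historical storage information (up to 30 days) for all databases
SELECT * FROM MD_INFORMATION_SCHEMA.STORAGE_INFO_HISTORY;
```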

Active bytes are higher than expected

Consider whether you need all of the data stored in that database. Some common ways to decrease active bytes are to delete the data or optimize sorting and data types.

Historical bytes are higher than expected

Check for two causes: manually updated shares in your organization that reference this database, and your historical database snapshots. A manually updated share that hasn't been refreshed can keep historical data referenced, which prevents it from being deleted. Your historical byte footprint decreases as these shares are updated (UPDATE SHARE) or dropped. To find all shares that reference a given database, use the OWNED_SHARES view in MD_INFORMATION_SCHEMA.
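A possible workflow, sketched under the assumption that a share named `my_share` (a placeholder) turns out to be the one holding old data:

```sql
-- List the shares you own; column names aren't listed here, so select everything.
SELECT * FROM MD_INFORMATION_SCHEMA.OWNED_SHARES;

-- Refresh a manually updated share so it stops referencing older data
-- (my_share is a hypothetical name).
UPDATE SHARE my_share;
```

If the share is no longer needed, dropping it has the same effect on the historical footprint.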

You can also reduce SNAPSHOT_RETENTION_DAYS on your database to retain fewer historical snapshots. Note that this shortens the window of time from which you can restore data. See data recovery for more details on how to plan and set up a data recovery protocol for your organization.
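For example, the statement below lowers retention to the recommended 1-day minimum. It reuses the ALTER DATABASE form shown earlier for DuckLake databases; `my_db` is a placeholder name, and the value must fall within the limits for your plan.

```sql
-- Keep only 1 day of historical snapshots on my_db (hypothetical database name).
ALTER DATABASE my_db SET SNAPSHOT_RETENTION_DAYS = 1;
```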

Retained for clone bytes are higher than expected

Consider whether there are other databases that were zero-copy cloned from this database that are still referencing deleted data. This footprint will decrease as you delete the cloned data from these other databases.

Failsafe bytes are higher than expected

Failsafe bytes result from deleting data. After a one-time deletion, this footprint should drop once the failsafe retention period passes. If failsafe bytes remain consistently high, you are likely overwriting or updating data too frequently. Common workloads that delete a lot of data through overwrites or updates are CREATE OR REPLACE TABLE, truncate and insert, updates, and deletes. Avoiding these workload patterns can reduce your failsafe footprint. If it supports your use case, you can also use a TRANSIENT database to reduce failsafe retention to 1 day.

If you need help understanding or reducing your storage bill, reach out to MotherDuck support.