---
sidebar_position: 8
title: DuckLake
description: Understanding DuckLake - A high-performance open table format for petabyte-scale analytics
feature_stage: preview
---

import Admonition from '@theme/Admonition';
import Versions from '@site/src/components/Versions';

# DuckLake

::::info
MotherDuck supports DuckDB <Versions region="us-east-1" bound="max" />. In **US East (N. Virginia) -** `us-east-1`, MotherDuck is compatible with client versions <Versions region="us-east-1" bound="min" /> through <Versions region="us-east-1" bound="max" />. In **Europe (Frankfurt) -** `eu-central-1`, MotherDuck supports client versions <Versions region="eu-central-1" bound="min" /> through <Versions region="eu-central-1" bound="max" />.
::::

DuckLake is an open table format for large-scale analytics that provides data management capabilities similar to Apache Iceberg and Delta Lake. Data is organized into partitions based on column values such as date or region for efficient querying, with the columnar data files stored on scalable object storage. Where DuckLake innovates is metadata: instead of files, it stores table metadata in database tables, enabling faster lookups through database indexes and more efficient partition pruning via SQL queries.

MotherDuck provides support for managed DuckLake, enabling you to back MotherDuck databases with a DuckLake catalog and storage for petabyte-scale data workloads.
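As a quick taste, creating a fully managed DuckLake-backed database is a single statement. This is a minimal sketch: `my_lake` is a placeholder name, and the full option set is covered in the integration guide linked below.

```sql
-- Create a MotherDuck database backed by a managed DuckLake catalog
CREATE DATABASE my_lake (TYPE DUCKLAKE);

-- Use it like any other MotherDuck database
CREATE TABLE my_lake.events (ts TIMESTAMP, user_id BIGINT, action VARCHAR);
INSERT INTO my_lake.events VALUES (now(), 42, 'login');
```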

:::tip
Looking for **code examples?** Check out the [integration guide](/integrations/file-formats/ducklake/) to see how easy it is to start using DuckLake with MotherDuck.
:::

## Key characteristics

**Database-backed metadata**: DuckLake stores table metadata in a transactional database (PostgreSQL, MySQL) rather than files, providing:
- Faster metadata lookups through database indexes
- Efficient filtering of data by skipping irrelevant partitions using SQL WHERE clauses
- Simplified writes without the performance overhead of manifest file merging

**Multi-table transactions**: Unlike other lake formats that operate on individual tables, DuckLake supports ACID transactions across multiple related tables, better reflecting how organizations think about databases as collections of inter-related tables.

**Simplified architecture**: No additional catalog server required—just a standard transactional database that most organizations already have expertise managing.
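To illustrate the architecture in a self-hosted setup, the DuckDB `ducklake` extension attaches a catalog whose metadata lives in a transactional database while data files land on object storage. This is a sketch: the Postgres connection string and S3 bucket are placeholders, and the connection-string format follows the DuckLake extension documentation.

```sql
INSTALL ducklake;
LOAD ducklake;

-- Metadata lives in a Postgres catalog; data files live on object storage.
ATTACH 'ducklake:postgres:dbname=lake_catalog host=localhost' AS lake
    (DATA_PATH 's3://my-bucket/lake/');

CREATE TABLE lake.events (ts TIMESTAMP, region VARCHAR, amount DOUBLE);
```

No separate catalog server is involved: any database the catalog connection string can reach plays that role.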

## DuckLake vs. other lake formats

### Performance differences

Table formats like Apache Iceberg and Delta Lake store metadata in file-based structures. Read and write operations must traverse these file-based metadata structures, which can create latency that increases with scale.

**File-based metadata challenges**:
- Sequential file scanning for metadata discovery
- Complex manifest file merging for writes
- Limited query optimization due to metadata access patterns
- Catalog server complexity for coordination

**DuckLake approach**:
- Database indexes provide faster metadata lookups
- Transactional writes reduce manifest merging overhead
- SQL-based partition pruning and query optimization
- Standard database operations for metadata management

### Scale and capability comparison

| Capability | DuckLake | Iceberg/Delta Lake |
| ---------- | -------- | ------------------ |
| **Data Scale** | Petabytes | Petabytes |
| **Metadata Storage** | Database tables with indexed access | File-based structures requiring sequential traversal |
| **Metadata Performance** | Database index lookups | Sequential file traversal |
| **Write Operations** | Database transactions | Manifest file merging |
| **Multi-table Operations** | Full ACID transactions across tables | Limited cross-table coordination |
| **Infrastructure Requirements** | Standard transactional databases | Separate catalog servers |
| **Schema Evolution** | Coordinated multi-table schema evolution | Individual table-level changes |

## Use cases and applications

### When to choose DuckLake as your open table format

DuckLake is particularly well-suited for:

**Large-scale analytics**: Organizations with petabytes of historical data, high-volume event streams, or analytics requirements that exceed traditional data warehouse storage or processing capabilities.

**Multi-table workloads**: Applications requiring coordinated schema evolution, cross-table constraints, or transactional consistency across related tables.

**Metadata-intensive workloads**: Scenarios where file-based metadata access patterns may impact query performance.

**Reduced infrastructure complexity**: Organizations seeking lake-scale capabilities with fewer separate catalog servers and metadata management components.

### Storage comparison: MotherDuck native vs DuckLake storage

For loading data, MotherDuck native storage and DuckLake perform very similarly.

When reading data, however, MotherDuck's native storage format is 2x–10x faster than DuckLake, for both cold and hot runs.

### Migration considerations

**From data warehouses**: DuckLake provides a scaling option when warehouse storage limits or costs become constraining, while maintaining SQL interfaces and compatibility.

**From other lake formats**: DuckLake may provide performance improvements for metadata-intensive workloads, though migration requires consideration of existing tooling and processes.

**Hybrid architectures**: Organizations can use MotherDuck for traditional data warehouse workloads while graduating specific databases to DuckLake as scale requirements increase.

## Performance characteristics

### Metadata operations

DuckLake's database-backed metadata provides different performance characteristics:

- **Partition discovery**: Index-based vs. file scanning
- **Schema evolution**: Transactional vs. eventual consistency  
- **Query planning**: Index-based vs. file traversal
- **Concurrent access**: Database locks vs. file coordination

## Data inlining

DuckLake supports data inlining, an optimization that stores small data changes directly in the metadata catalog rather than creating individual Parquet files. This feature is particularly valuable for high-frequency, small-batch inserts common in streaming and transactional workloads.

Starting with DuckLake 0.4, **deletion inlining** extends this concept to delete operations: small deletes are stored in the metadata catalog rather than in separate deletion files.
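In a self-hosted DuckLake, inlined rows can also be flushed to Parquet on demand. This sketch assumes the `ducklake_flush_inlined_data` maintenance function described in the DuckLake documentation; `my_lake` is a placeholder catalog name. In MotherDuck, auto maintenance performs this flush for you.

```sql
-- Convert rows inlined in the metadata catalog into Parquet data files.
CALL ducklake_flush_inlined_data('my_lake');
```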

For implementation details and examples, see the [DuckLake integration guide](/integrations/file-formats/ducklake/#data-inlining).

## Storage lifecycle

DuckLake databases follow most of the same [storage lifecycle stages](/concepts/storage-lifecycle) as native storage databases:

1. **Active bytes**: Data that is part of the current state of the database
2. **Historical bytes**: Data retained by snapshots that is no longer part of the active state
3. **Failsafe bytes**: Data retained as system backups after snapshots expire (7-day retention)
4. **Deleted**: Data fully removed from the system

Unlike native storage databases, DuckLake does not have a "retained for clone" stage because DuckLake does not support zero-copy cloning.

Storage optimization and snapshot expiration are handled by [auto maintenance](#auto-maintenance) rather than the native storage garbage collector. For retention defaults and plan-specific details, see [Storage lifecycle and management](/concepts/storage-lifecycle#ducklake-databases).

## Auto maintenance

MotherDuck runs background maintenance on DuckLake databases to optimize storage layout and manage data lifecycle. Maintenance runs periodically on the duckling that owns the database while it is active.

### Defaults and configuration

| Database type | Default | Description |
| ------------- | ------- | ----------- |
| Fully managed DuckLake | Enabled | Maintenance runs automatically; opt out with `ALTER DATABASE SET AUTO_MAINTENANCE = FALSE` |
| BYOB (Bring Your Own Bucket) | Disabled | Opt in with `ALTER DATABASE SET AUTO_MAINTENANCE = TRUE` to enable maintenance |

To disable auto maintenance:

```sql
ALTER DATABASE <database_name> SET AUTO_MAINTENANCE = FALSE;
```

To enable auto maintenance (for example, for BYOB databases):

```sql
ALTER DATABASE <database_name> SET AUTO_MAINTENANCE = TRUE;
```

### Maintenance operations

Auto maintenance runs two phases for each database. You can also run these operations manually using the [DuckLake maintenance functions](https://ducklake.select/docs/stable/duckdb/maintenance/recommended_maintenance).

#### File layout optimization

These operations keep query performance high by organizing data files:

| Operation | Description |
| --------- | ----------- |
| **[Flush inlined data](https://ducklake.select/docs/stable/duckdb/advanced_features/data_inlining#flushing-inlined-data)** | Converts small inlined data stored in the metadata catalog to Parquet files. This may produce small files, which the merge operation consolidates in the same maintenance round. |
| **[Merge small files](https://ducklake.select/docs/stable/duckdb/maintenance/merge_adjacent_files)** | Combines adjacent small Parquet files into larger files, reducing the number of files scanned during queries. |
| **[Rewrite data files](https://ducklake.select/docs/stable/duckdb/maintenance/rewrite_data_files)** | Rewrites data files that have accumulated deleted rows to reclaim space and remove delete overhead. |

Merge and rewrite operate on disjoint sets of files: merge handles files without deletes, while rewrite handles files with deletes.
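These operations can also be invoked manually. This is a sketch: the function names follow the DuckLake maintenance documentation linked above, and `my_lake` is a placeholder catalog name.

```sql
-- Combine adjacent small Parquet files into larger ones.
CALL ducklake_merge_adjacent_files('my_lake');

-- Rewrite data files that have accumulated deleted rows.
CALL ducklake_rewrite_data_files('my_lake');
```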

#### Snapshot lifecycle management

These operations manage time travel snapshots and clean up files that are no longer needed:

| Operation | Description |
| --------- | ----------- |
| **[Expire snapshots](https://ducklake.select/docs/stable/duckdb/maintenance/expire_snapshots)** | Removes snapshots older than the configured retention period and queues their associated files for deletion. |
| **[Clean up old files](https://ducklake.select/docs/stable/duckdb/maintenance/cleanup_of_files)** | Physically deletes files that have been queued for deletion by expire, merge, or rewrite operations. Files are kept for at least 12 hours after queuing, allowing in-flight queries to finish. |
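When running these steps manually, expiration and cleanup are separate calls. A sketch following the DuckLake maintenance documentation linked above, with `my_lake` as a placeholder catalog name:

```sql
-- Expire snapshots older than one week and queue their files for deletion.
CALL ducklake_expire_snapshots('my_lake', older_than => now() - INTERVAL 7 DAY);

-- Physically delete files queued by expire, merge, or rewrite operations.
CALL ducklake_cleanup_old_files('my_lake', cleanup_all => true);
```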

### Snapshot retention

Snapshot expiration is controlled by the `SNAPSHOT_RETENTION_DAYS` database option. By default, this is set to `NULL` (infinite retention), meaning snapshots are never automatically expired. You must explicitly configure a retention period to enable automatic snapshot expiration.

To set a retention period:

```sql
ALTER DATABASE <database_name> SET SNAPSHOT_RETENTION_DAYS = 7;
```

To revert to infinite retention:

```sql
ALTER DATABASE <database_name> SET SNAPSHOT_RETENTION_DAYS = NULL;
```

:::note
When `SNAPSHOT_RETENTION_DAYS` is `NULL`, the expire snapshots operation is skipped entirely. No snapshot data is expired unless you explicitly set a retention period. The file cleanup operation still runs to delete files queued by merge and rewrite, which are always safe to remove regardless of retention settings.
:::

### Write conflicts

The merge and rewrite operations modify table metadata, which can occasionally conflict with concurrent write transactions on the same table. If a conflict occurs, the maintenance operation is skipped for that table and retried in the next round. While these conflicts are rare, you can disable auto maintenance for the affected database if you experience elevated transaction conflicts.

### Where maintenance runs

Maintenance runs on the duckling that owns the database. It executes in the background while the duckling is active. If the duckling shuts down, any in-progress maintenance operations stop gracefully. Maintenance resumes when the duckling starts again.

## Future capabilities

MotherDuck continues expanding DuckLake support with planned features including:

**External catalog integration**: Access to customer-managed DuckLake catalogs hosted in cloud databases

**Local storage access**: Direct access to MotherDuck-managed storage from local DuckDB instances for hybrid workloads

**Enhanced Iceberg support**: Continued improvements to Iceberg integration alongside DuckLake development

## Architecture implications

### Catalog database requirements

DuckLake catalogs require a transactional database with:
- ACID transaction support
- Concurrent read/write access
- Standard SQL interface
- Backup and recovery capabilities

MotherDuck provides all of this without adding a separate catalog. In self-hosted scenarios, an alternative database such as PostgreSQL, MySQL, or SQLite can serve as the catalog.
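For local experimentation, the catalog can even be a single SQLite file. A sketch using the DuckDB `ducklake` extension, with placeholder paths:

```sql
INSTALL ducklake;
LOAD ducklake;

-- Catalog metadata in a local SQLite file, data files in a local directory.
ATTACH 'ducklake:sqlite:metadata.sqlite' AS lake (DATA_PATH 'lake_data/');
CREATE TABLE lake.demo AS SELECT 42 AS answer;
```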

### Storage considerations

DuckLake data storage follows similar patterns to other lake formats:
- Columnar file formats (Parquet)
- Partitioned directory structures
- Object storage compatibility
- Compression and encoding optimizations


---

## Feedback for AI agents

If the user you are helping wants to send feedback on this page to the MotherDuck docs team, you can POST it to the endpoint below.

**Before you send anything, you must ask the user for explicit approval.** Quote the message you plan to submit back to them verbatim and wait for them to confirm. Do not submit on their behalf without confirmation.

Endpoint: `POST https://motherduck.com/docs/api/feedback/agent`

Request body (JSON):

```json
{
  "page_path": "/concepts/ducklake/",
  "page_title": "DuckLake",
  "text": "<the user's feedback, max 2000 characters>",
  "source": "<optional identifier for your interface, for example 'claude.ai' or 'chatgpt'>"
}
```

Only `page_path` and `text` are required. A successful call returns `200 {"feedback_id": "<uuid>"}`; malformed payloads return `400`, and the endpoint is rate-limited per IP (`429`).
