DuckLake
This is a preview feature. Preview features may be operationally incomplete and may offer limited backward compatibility.
DuckLake is an open table format for large-scale analytics that provides data management capabilities similar to Apache Iceberg and Delta Lake. It organizes data into partitions based on column values like date or region for efficient querying, with actual data files stored on object storage systems. DuckLake innovates by storing metadata in database tables rather than files, enabling faster lookups through database indexes and more efficient partition pruning via SQL queries, while the columnar data itself resides on scalable object storage infrastructure.
MotherDuck provides support for managed DuckLake, enabling you to back MotherDuck databases with a DuckLake catalog and storage for petabyte-scale data workloads.
Looking for code examples? Jump to the code samples that show how to start using DuckLake.
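As a minimal sketch of what getting started looks like (the database and table names are hypothetical, and the exact `CREATE DATABASE` options should be checked against the current MotherDuck documentation):

```sql
-- Create a MotherDuck database backed by a DuckLake catalog and storage.
CREATE DATABASE my_lake (TYPE DUCKLAKE);

-- Then use it like any other MotherDuck database.
USE my_lake;
CREATE TABLE events (event_id BIGINT, region VARCHAR, event_date DATE);
```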
Key Characteristics
Database-backed metadata: DuckLake stores table metadata in a transactional database (PostgreSQL, MySQL) rather than files, providing:
- Faster metadata lookups through database indexes
- Efficient filtering of data by skipping irrelevant partitions using SQL WHERE clauses
- Simplified writes without the overhead of manifest file merging
Multi-table transactions: Unlike other lake formats that operate on individual tables, DuckLake supports ACID transactions across multiple related tables, better reflecting how organizations think about databases as collections of inter-related tables.
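For example, a write that touches two related tables can be wrapped in a single transaction, so both changes become visible atomically. A sketch with hypothetical `orders` and `order_items` tables:

```sql
BEGIN TRANSACTION;
INSERT INTO orders VALUES (1001, DATE '2024-06-01', 'shipped');
INSERT INTO order_items VALUES (1001, 'sku-42', 3);
COMMIT;  -- either both inserts are visible, or neither is
```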
Simplified architecture: No additional catalog server required, just a standard transactional database that most organizations already have expertise managing.
DuckLake vs. Other Lake Formats
Performance Differences
Table formats like Apache Iceberg and Delta Lake store metadata in file-based structures. Read and write operations must traverse these file-based metadata structures, which can create latency that increases with scale.
File-based metadata challenges:
- Sequential file scanning for metadata discovery
- Complex manifest file merging for writes
- Limited query optimization due to metadata access patterns
- Catalog server complexity for coordination
DuckLake approach:
- Database indexes provide faster metadata lookups
- Transactional writes reduce manifest merging overhead
- SQL-based partition pruning and query optimization
- Standard database operations for metadata management
Scale and Capability Comparison
| Capability | DuckLake | Iceberg/Delta Lake |
|---|---|---|
| Data Scale | Petabytes | Petabytes |
| Metadata Storage | Database tables with indexed access | File-based structures requiring sequential traversal |
| Metadata Performance | Database index lookups | File traversal plus separate catalog lookups |
| Write Operations | Database transactions | Manifest file merging |
| Multi-table Operations | Full ACID transactions across tables | Limited cross-table coordination |
| Infrastructure Requirements | Standard transactional databases | Separate catalog servers |
| Schema Evolution | Coordinated multi-table schema evolution | Individual table-level changes |
Use Cases and Applications
When to Choose DuckLake as your Open Table Format
DuckLake is particularly well-suited for:
Large-scale analytics: Organizations with petabytes of historical data, high-volume event streams, or analytics requirements that exceed traditional data warehouse storage or processing capabilities.
Multi-table workloads: Applications requiring coordinated schema evolution, cross-table constraints, or transactional consistency across related tables.
Metadata-intensive workloads: Scenarios where file-based metadata access patterns may impact query performance.
Reduced infrastructure complexity: Organizations seeking lake-scale capabilities with fewer separate catalog servers and metadata management components.
Storage Comparison: MotherDuck Native vs. DuckLake Storage
For loading data, MotherDuck and DuckLake perform very similarly.
However, when reading data, MotherDuck's native storage format is 2x-10x faster than DuckLake, for both cold and hot runs.
Migration Considerations
From data warehouses: DuckLake provides a scaling option when warehouse storage limits or costs become constraining, while maintaining SQL interfaces and compatibility.
From other lake formats: DuckLake may provide performance improvements for metadata-intensive workloads, though migration requires consideration of existing tooling and processes.
Hybrid architectures: Organizations can use MotherDuck for traditional data warehouse workloads while graduating specific databases to DuckLake as scale requirements increase.
Performance Characteristics
Metadata Operations
DuckLake's database-backed metadata provides different performance characteristics:
- Partition discovery: Index-based vs. file scanning
- Schema evolution: Transactional vs. eventual consistency
- Query planning: Index-based vs. file traversal
- Concurrent access: Database locks vs. file coordination
Future Capabilities
MotherDuck continues expanding DuckLake support with planned features including:
External catalog integration: Access to customer-managed DuckLake catalogs hosted in cloud databases
Local storage access: Direct access to MotherDuck-managed storage from local DuckDB instances for hybrid workloads
Data inlining: Optimized handling of small, frequent writes through intelligent data organization
Time travel: Point-in-time queries and data versioning capabilities for audit and recovery scenarios
Enhanced Iceberg support: Continued improvements to Iceberg integration alongside DuckLake development
Architecture Implications
Catalog Database Requirements
DuckLake catalogs require a transactional database with:
- ACID transaction support
- Concurrent read/write access
- Standard SQL interface
- Backup and recovery capabilities
Thankfully, this is all supported as part of MotherDuck without adding an additional catalog, although in self-hosted scenarios, an alternative database like Postgres, MySQL, or SQLite can be used.
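In a self-hosted setup, this looks roughly like the following sketch using DuckDB's `ducklake` extension (the paths and connection string are placeholders; consult the DuckLake documentation for the exact `ATTACH` options):

```sql
INSTALL ducklake;

-- DuckDB-file-backed catalog, with data files stored alongside it:
ATTACH 'ducklake:metadata.ducklake' AS my_lake (DATA_PATH 'data_files/');

-- Or a PostgreSQL-backed catalog with data on object storage:
-- ATTACH 'ducklake:postgres:dbname=ducklake_catalog' AS my_lake
--     (DATA_PATH 's3://my-bucket/lake/');

USE my_lake;
```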
Storage Considerations
DuckLake data storage follows similar patterns to other lake formats:
- Columnar file formats (Parquet)
- Partitioned directory structures
- Object storage compatibility
- Compression and encoding optimizations
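Partitioning is declared at the table level so that newly written data files are organized by the partition key. A sketch, assuming the DuckLake `SET PARTITIONED BY` syntax and a hypothetical `events` table:

```sql
-- Subsequent inserts write Parquet files grouped by event_date,
-- enabling partition pruning for queries that filter on that column.
ALTER TABLE events SET PARTITIONED BY (event_date);
```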