Free Guide · Open Table Formats

Stop wrestling with thousands of metadata files

The Essential Guide to DuckLake

This 39-page guide covers DuckLake architecture, a hands-on tutorial, complete SQL reference, production operations, migration paths from Iceberg and Delta Lake, security implementation, and cost analysis — everything you need to evaluate and adopt DuckLake.

Fill out the form to download the complete guide.

Why data teams are switching to DuckLake

Traditional open table formats store metadata as thousands of small files scattered across object storage. Every catalog operation — listing tables, checking history, resolving conflicts — requires file-system round trips. Compaction jobs run for hours. Schema changes cascade into rewrites.

DuckLake takes a fundamentally different approach: store metadata in a SQL database, not in files.

SQL-powered metadata

Stored in Postgres, MySQL, SQLite, or DuckDB — not scattered files.

ACID-compliant

Multi-table transactions and time travel built in from day one.

No vendor lock-in

Open format, built to work with any engine.

10–100x faster

Metadata operations that leave file-based catalogs in the dust.

Simple to set up

Create a DuckLake from scratch in under a minute.
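To give a flavor of that setup, here is a minimal sketch using the DuckLake extension for DuckDB. The catalog file name and table are illustrative placeholders; the guide's tutorial walks through each step in detail.

```sql
-- Load the DuckLake extension inside DuckDB
INSTALL ducklake;
LOAD ducklake;

-- Create (or attach) a DuckLake whose metadata lives in a local DuckDB catalog file
ATTACH 'ducklake:my_catalog.ducklake' AS my_lake;
USE my_lake;

-- Tables behave like ordinary SQL tables; data lands in Parquet files
CREATE TABLE events (id INTEGER, ts TIMESTAMP, payload VARCHAR);
INSERT INTO events VALUES (1, now(), 'hello ducklake');
```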

Inside the guide

39 pages of technical depth for data engineers and platform teams evaluating open table formats.

Architecture

How DuckLake Works

Why SQL-backed metadata changes everything. Deep dive into the architecture, plus an honest feature comparison against Iceberg and Delta Lake.

Tutorial

Build Your First DuckLake

Step-by-step hands-on walkthrough — create a DuckLake, ingest data, query it, and use snapshots and time travel.
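The snapshot and time-travel steps look roughly like this. The `ducklake_snapshots` function and `AT` clause follow the DuckLake documentation at the time of writing; treat this as a sketch, since exact names can vary by version.

```sql
-- List the snapshots recorded in the catalog
SELECT * FROM ducklake_snapshots('my_lake');

-- Query a table as of an earlier snapshot version
SELECT * FROM my_lake.events AT (VERSION => 1);

-- ...or as of a point in time
SELECT * FROM my_lake.events AT (TIMESTAMP => now() - INTERVAL 1 HOUR);
```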

Reference

SQL Reference & API

Complete command reference, metadata functions, time travel syntax, and language bindings for Python, Node.js, and more.

Operations

Performance & Production

Partitioning strategies, compaction, concurrency patterns, health monitoring, backup/recovery, and catalog maintenance.

Pipelines

Integrations & Migration

ETL/ELT patterns with dbt, Airflow, and Dagster. Migration paths from Iceberg, Delta Lake, RDBMS, and raw Parquet files.

Security

Security & Compliance

Zero-trust Parquet encryption, catalog-level access control, GDPR and HIPAA considerations, and full audit trail configuration.

DuckLake vs Iceberg vs Delta Lake

A preview of the feature comparison from the guide:

| Feature | DuckLake | Apache Iceberg | Delta Lake |
| --- | --- | --- | --- |
| Metadata storage | SQL database (Postgres, MySQL, DuckDB) | Files in object storage (JSON, Avro) | Files in object storage (JSON, Parquet) |
| Transaction scope | Multi-table, database-level ACID | Single-table ACID | Single-table ACID |
| Small file handling | Data inlining + SQL-based compaction | Periodic compaction jobs | Periodic compaction jobs (OPTIMIZE) |
| Schema evolution | Transactional DDL via SQL ALTER TABLE | Atomic metadata pointer updates | Atomic commits to transaction log |
| Query planning | Single SQL query to catalog | Multi-hop file reads (metadata → manifest list → manifests) | Read transaction log for file list |

The full guide includes additional comparison dimensions: concurrency models, primary dependencies, and more.
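Because the catalog is a SQL database, a single transaction can span several tables and commit atomically. A hedged sketch of what that looks like (the `orders` and `inventory` tables are hypothetical):

```sql
-- One atomic commit across two DuckLake tables
BEGIN TRANSACTION;
INSERT INTO my_lake.orders VALUES (1001, 'widget', 3);
UPDATE my_lake.inventory SET stock = stock - 3 WHERE sku = 'widget';
COMMIT;  -- both changes become visible together, or neither does
```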

Is this guide for you?

✓ You're evaluating open table formats for a new or existing data lakehouse

✓ You're running Iceberg or Delta Lake and hitting metadata or compaction pain

✓ You want a hands-on tutorial, not just a whitepaper

✓ You need to make a case to your team with real comparison data

✓ You're a data engineer or platform lead building a modern analytics stack

✓ You want production guidance — security, monitoring, and migration paths

Frequently asked questions

What is DuckLake?

DuckLake is an open table format that moves all metadata — for both the catalog and individual tables — into a standard SQL database (Postgres, MySQL, SQLite, or DuckDB), while data stays in Parquet files on object storage. Every operation becomes a SQL transaction against the catalog database, leveraging its native ACID guarantees for true multi-table transactions.
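In practice, that split between SQL metadata and Parquet data looks like this when attaching from DuckDB. The connection string and bucket are placeholders; the `DATA_PATH` option follows the DuckLake documentation.

```sql
-- Metadata lives in Postgres; data files land as Parquet on S3
ATTACH 'ducklake:postgres:dbname=lake_catalog host=localhost' AS lake
    (DATA_PATH 's3://my-bucket/lake/');
USE lake;
```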

How does DuckLake compare to Apache Iceberg and Delta Lake?

Iceberg and Delta Lake store metadata as files in object storage (JSON/Avro for Iceberg, JSON/Parquet for Delta), requiring periodic compaction and file-system round trips for catalog operations. DuckLake stores metadata in a SQL database, enabling index-based partition discovery, transactional schema evolution via SQL ALTER TABLE, and database-level concurrency control. The guide includes a detailed feature comparison table covering transaction scope, concurrency models, small file handling, schema evolution, and more.

Is DuckLake open source?

Yes. DuckLake is released under the MIT license. The format specification, DuckDB extension, and all tooling are fully open source. Data is stored as standard Parquet files readable by a wide variety of query engines and tools.

Can I use DuckLake with my existing data stack?

Yes. DuckLake works with any SQL-compatible metadata database and any major object storage (S3, GCS, Azure Blob, Cloudflare R2). The guide covers integration patterns for dbt, Airflow, Dagster, Tableau, Power BI, and streaming pipelines via Kafka. It also covers migration paths from Iceberg, Delta Lake, traditional RDBMS, and raw Parquet files.

Who is this guide for?

Data engineers, platform teams, and technical leaders evaluating open table formats. It ranges from foundational architecture concepts and a hands-on tutorial through to production operations, security implementation, and cost analysis versus Snowflake, BigQuery, and Databricks.

What is MotherDuck?

MotherDuck is a serverless cloud analytics platform built on DuckDB. It provides managed DuckLake hosting with integrated authentication, automatic credential brokering for object storage, and AWS PrivateLink for enterprise network security. You can try it free at motherduck.com.

Does MotherDuck offer managed DuckLakes?

Yes. MotherDuck manages the DuckLake catalog database and metadata operations for you, so you get 10–100x faster metadata lookups and sub-second query performance at petabyte scale without running your own catalog infrastructure. You can bring your own S3-compatible storage for data files, or let MotherDuck manage that too. Start with MotherDuck’s standard storage for typical workloads, then seamlessly scale to DuckLake-backed databases as your data grows — same SQL interface either way. Managed DuckLakes are currently available in preview.

Download the guide

Fill out the form to get instant access to The Essential Guide to DuckLake.

Download the Guide

Thanks for requesting the guide - you'll be taken there shortly!