DuckLake: The Definitive Guide
Table formats shouldn't require a PhD in file management
A comprehensive O'Reilly guide to the open table format that replaces file-based metadata with SQL databases for a faster, easier lakehouse. Free early access for data engineers and platform teams.
Chapter 1 Now Available. We'll send you additional chapters of the early release as they become available.

Why DuckLake
Traditional open table formats store metadata as thousands of small files scattered across object storage. Every catalog operation requires file-system round trips. Compaction jobs run for hours.
SQL-powered metadata
Stored in Postgres, SQLite, or DuckDB — not scattered files.
10–100x faster
Catalog queries in milliseconds, not hundreds of milliseconds.
ACID-compliant
Multi-table transactions and time travel built in from day one.
Engine-agnostic spec
MIT-licensed, with DuckDB, Spark, and DataFusion implementations.
Iceberg-compatible
Read Iceberg tables directly. Migrate metadata without moving data.
Simple to set up
Three SQL commands to create your first DuckLake.
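To make the idea of SQL-backed metadata concrete, here is a toy sketch using Python's built-in sqlite3 module. It is not DuckLake's actual catalog schema: the `snapshot` and `data_file` tables, their columns, and the `commit` and `files_at` helpers are all hypothetical names chosen for illustration. The sketch only shows the general pattern: file listings live in ordinary SQL tables, a change touching several tables lands in one database transaction, and time travel is a filtered query.

```python
import sqlite3

# Toy catalog: table and column names are hypothetical, NOT DuckLake's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY, note TEXT);
    CREATE TABLE data_file (
        table_name     TEXT,
        path           TEXT,
        begin_snapshot INTEGER,  -- first snapshot in which the file is visible
        end_snapshot   INTEGER   -- first snapshot in which it is gone (NULL = still live)
    );
""")


def commit(note, added=(), removed=()):
    """Record one snapshot and all of its file changes in a single transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        snap = conn.execute(
            "INSERT INTO snapshot (note) VALUES (?)", (note,)
        ).lastrowid
        conn.executemany(
            "INSERT INTO data_file VALUES (?, ?, ?, NULL)",
            [(table, path, snap) for table, path in added],
        )
        conn.executemany(
            "UPDATE data_file SET end_snapshot = ? "
            "WHERE path = ? AND end_snapshot IS NULL",
            [(snap, path) for path in removed],
        )
    return snap


def files_at(table_name, snap):
    """Time travel: list the files visible to table_name as of snapshot snap."""
    rows = conn.execute(
        "SELECT path FROM data_file "
        "WHERE table_name = ? AND begin_snapshot <= ? "
        "AND (end_snapshot IS NULL OR end_snapshot > ?) "
        "ORDER BY path",
        (table_name, snap, snap),
    )
    return [row[0] for row in rows]


# One commit touches two tables; a later commit rewrites one of them.
s1 = commit("load", added=[("orders", "orders/a.parquet"),
                           ("customers", "customers/a.parquet")])
s2 = commit("rewrite orders", added=[("orders", "orders/b.parquet")],
            removed=["orders/a.parquet"])

print(files_at("orders", s1))  # file list as of the first snapshot
print(files_at("orders", s2))  # file list after the rewrite
```

In this sketch, reading an older version of a table is one indexed query against the catalog rather than a walk through metadata files in object storage, which is the contrast the feature list above draws.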
What's inside the guide
Everything you need to evaluate, adopt, and operate DuckLake.
Part 01
Architecture & Design
Why SQL-backed metadata changes everything. Deep dive into DuckLake's architecture, how catalog operations hit 10–100x speed improvements, and the design decisions that make it possible.
Part 02
Compared: Iceberg & Delta Lake
An honest, technical comparison. Where DuckLake excels, where incumbent formats still have strengths, and how to think about the trade-offs for your data platform.
Part 03
Migration & Getting Started
Practical guidance for adopting DuckLake. Iceberg interop patterns, migration strategies, getting started with DuckDB, and integrating with your existing data stack.
Not another vendor whitepaper
O'Reilly doesn't put their name on marketing material. This is the same editorial standard behind their definitive guides to Kafka, Spark, and Kubernetes — peer-reviewed technical content built for practitioners, not prospects.
The guide covers DuckLake's strengths and its limitations. It compares DuckLake honestly against Iceberg and Delta Lake. It provides real migration paths, not handwaving. If you're evaluating table formats for your data platform, this is the unbiased technical resource you need.
What makes this different
- ✓ Deep technical content from DuckLake contributors
- ✓ The quality content you expect from O'Reilly
- ✓ Covers competing formats honestly
- ✓ Practical migration paths & interop patterns
Frequently asked questions
When will DuckLake: The Definitive Guide be available?
It’s available now in Early Release. Sign up and we’ll send you individual chapters as they’re released.
Is it free?
Yes, the Early Release is completely free, though you’ll get some emails from us.
What topics does the guide cover?
The guide covers DuckLake’s architecture and design philosophy, performance characteristics, honest comparisons with Iceberg and Delta Lake, migration strategies, Iceberg interop patterns, and practical getting-started guidance.
Do I need to use MotherDuck?
No. DuckLake is an open source project (MIT license) that works with DuckDB. The guide covers DuckLake as a technology — MotherDuck is one way to use it, but the content applies regardless of your deployment choice.
Who wrote the guide?
The book is written by Matt Martin and Alex Monahan. They bring extensive data engineering experience at Fortune 100 companies and deep knowledge of the DuckDB and DuckLake ecosystems.
The guide is produced in collaboration with O’Reilly Media’s editorial team. It meets the same peer-reviewed standards as O’Reilly’s other definitive technology guides.
Get the guide before anyone else
Enter your email and we'll send you Early Release chapters as they are ready.
Download the Early Release
