Pre-Release · O'Reilly

The Definitive Guide to DuckLake

Table formats shouldn't require a PhD in file management

A comprehensive O'Reilly guide to the open table format that replaces file-based metadata with a SQL database for a faster, simpler lakehouse. Free early access for data engineers and platform teams.

Enter your email. We'll notify you the moment the guide is available.

Why DuckLake

Traditional open table formats store metadata as thousands of small files scattered across object storage. Every catalog operation requires file-system round trips. Compaction jobs run for hours.

SQL-powered metadata

Stored in Postgres, SQLite, or DuckDB — not scattered files.

10–100x faster

Catalog queries in milliseconds, not hundreds of milliseconds.

ACID-compliant

Multi-table transactions and time travel built in from day one.

Engine-agnostic spec

MIT licensed with DuckDB, Spark, and DataFusion implementations.

Iceberg-compatible

Read Iceberg tables directly. Migrate metadata without moving data.

Simple to set up

Three SQL commands to create your first DuckLake.
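
To make the "metadata in a SQL database" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (snapshot, data_file) and the helpers commit and files_at are hypothetical names invented for illustration; this is not DuckLake's actual catalog layout or API. It only shows how one database transaction can cover several tables at once, and how time travel can reduce to a filtered query.

```python
import sqlite3

# Hypothetical, simplified catalog schema for illustration only.
# This is NOT DuckLake's real metadata layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY, note TEXT);
    CREATE TABLE data_file (
        table_name     TEXT NOT NULL,
        path           TEXT NOT NULL,
        begin_snapshot INTEGER NOT NULL,
        end_snapshot   INTEGER          -- NULL while the file is still live
    );
""")


def commit(note, added=(), removed=()):
    """Record one snapshot that adds/removes files across any number of tables."""
    with conn:  # one transaction: commits on success, rolls back on error
        snap = conn.execute(
            "INSERT INTO snapshot (note) VALUES (?)", (note,)
        ).lastrowid
        conn.executemany(
            "INSERT INTO data_file VALUES (?, ?, ?, NULL)",
            [(table, path, snap) for table, path in added],
        )
        conn.executemany(
            "UPDATE data_file SET end_snapshot = ? "
            "WHERE path = ? AND end_snapshot IS NULL",
            [(snap, path) for path in removed],
        )
    return snap


def files_at(table_name, snap):
    """Time travel: list the files visible to table_name as of snapshot snap."""
    rows = conn.execute(
        """SELECT path FROM data_file
           WHERE table_name = ? AND begin_snapshot <= ?
             AND (end_snapshot IS NULL OR end_snapshot > ?)
           ORDER BY path""",
        (table_name, snap, snap),
    )
    return [row[0] for row in rows]


# Snapshot 1 touches two tables in a single transaction.
s1 = commit(
    "initial load",
    added=[("orders", "orders/a.parquet"), ("customers", "customers/a.parquet")],
)
# Snapshot 2 replaces the orders file; the old file stays readable at s1.
s2 = commit(
    "rewrite orders",
    added=[("orders", "orders/b.parquet")],
    removed=["orders/a.parquet"],
)

print(files_at("orders", s1))
print(files_at("orders", s2))
```

Listing the files for a table at a given snapshot is a single indexed query here, rather than a walk over metadata files in object storage, which is the contrast the points above draw.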

What's inside the guide

Everything you need to evaluate, adopt, and operate DuckLake.

Part 01

Architecture & Design

Why SQL-backed metadata changes everything. Deep dive into DuckLake's architecture, how catalog operations hit 10–100x speed improvements, and the design decisions that make it possible.

Part 02

Compared: Iceberg & Delta Lake

An honest, technical comparison. Where DuckLake excels, where incumbent formats still have strengths, and how to think about the trade-offs for your data platform.

Part 03

Migration & Getting Started

Practical guidance for adopting DuckLake. Iceberg interop patterns, migration strategies, getting started with DuckDB, and integrating with your existing data stack.

Not another vendor whitepaper

O'Reilly doesn't put its name on marketing material. This is the same editorial standard behind its definitive guides to Kafka, Spark, and Kubernetes: peer-reviewed technical content built for practitioners, not prospects.

The guide covers DuckLake's strengths and its limitations. It compares DuckLake honestly against Iceberg and Delta Lake. It provides real migration paths, not handwaving. If you're evaluating table formats for your data platform, this is the unbiased technical resource you need.

What makes this different

  • Deep technical content from DuckLake contributors
  • The quality content you expect from O'Reilly
  • Honest coverage of competing formats
  • Practical migration paths and interop patterns

Meet the authors

Written by practitioners who build with DuckDB and DuckLake every day.

Matt Martin

Staff Engineer, State Farm

Matt is a data engineering professional with over 20 years of experience designing and delivering scalable data solutions. His background ranges from legacy systems like DB2 and SQL Server to modern cloud platforms and large-scale data processing. He previously served as a Senior Manager of Data Engineering at Home Depot, and currently works on complex cloud data integration challenges at State Farm. Outside of work, Matt enjoys home renovation projects and indoor rowing. He lives with his wife and three young children.

Alex Monahan

Developer Advocate, MotherDuck

Alex is a developer advocate at MotherDuck and a DuckLake contributor. Previously a customer software engineer at MotherDuck and a blogger for DuckDB Labs, Alex spent nine years at Intel moving from industrial engineer to data scientist before discovering DuckDB in 2020 and diving deeper into Duck-themed databases ever since. Beyond work, you'll find Alex jumping on trampolines and going on adventures with his daughter Adeline and wife Christy.

Frequently asked questions

When will the guide be available?

The guide is currently in pre-release with O'Reilly. Sign up to be notified the moment early access opens. We expect the first couple of chapters in the next few weeks. We'll notify you as additional chapters become available, and again when the first edition is fully published.

Is it free?

Yes. The early-access edition is completely free. No paywall, no credit card, no trial signup required.

What topics does the guide cover?

The guide covers DuckLake's architecture and design philosophy, performance characteristics, honest comparisons with Iceberg/Delta Lake, migration strategies, Iceberg interop patterns, and practical getting-started guidance with DuckDB.

Do I need to use MotherDuck?

No. DuckLake is an open source project (MIT license) that works with DuckDB. The guide covers DuckLake as a technology; MotherDuck is one way to use it, but the content applies regardless of your deployment choice. If you do use MotherDuck, you can create a DuckLake with just two SQL commands in under five minutes.

Get the guide before anyone else

Enter your email and we'll send you early access the moment it's ready.
