Early Release · O'Reilly

DuckLake - The Definitive Guide

Table formats shouldn't require a PhD in file management

A comprehensive O'Reilly guide to the open table format that replaces file-based metadata with a SQL database for a faster, simpler lakehouse. Free early access for data engineers and platform teams.

Chapter 1 is now available. We'll send you additional Early Release chapters as they become available.

Why DuckLake

Traditional open table formats store metadata as thousands of small files scattered across object storage. Every catalog operation requires file-system round trips. Compaction jobs run for hours.

SQL-powered metadata

Stored in Postgres, SQLite, or DuckDB — not scattered files.

10–100x faster

Catalog queries in milliseconds, not hundreds of milliseconds.

ACID-compliant

Multi-table transactions and time travel built in from day one.

Engine-agnostic spec

MIT-licensed, with DuckDB, Spark, and DataFusion implementations.

Iceberg-compatible

Read Iceberg tables directly. Migrate metadata without moving data.

Simple to set up

Three SQL commands to create your first DuckLake.
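
To make the idea of SQL-backed metadata concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It is a toy catalog, not DuckLake's actual schema or API: the table names (snapshot, data_file), the columns, and the helpers commit and files_at are all hypothetical. It shows one transaction registering file changes across two tables, and a single SQL query answering a time-travel question.

```python
import sqlite3

# Toy catalog: NOT DuckLake's real schema, only the general idea of
# keeping table-format metadata in SQL tables instead of metadata files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY);
    CREATE TABLE data_file (
        table_name     TEXT NOT NULL,
        path           TEXT NOT NULL,
        begin_snapshot INTEGER NOT NULL,  -- first snapshot that sees the file
        end_snapshot   INTEGER            -- NULL while the file is still live
    );
""")


def commit(added, removed=()):
    """Register file changes for any number of tables in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        snap = conn.execute("INSERT INTO snapshot DEFAULT VALUES").lastrowid
        for table, path in removed:
            conn.execute(
                "UPDATE data_file SET end_snapshot = ? "
                "WHERE table_name = ? AND path = ? AND end_snapshot IS NULL",
                (snap, table, path),
            )
        conn.executemany(
            "INSERT INTO data_file VALUES (?, ?, ?, NULL)",
            [(table, path, snap) for table, path in added],
        )
    return snap


def files_at(table, snap):
    """Time travel: list the files visible to `table` as of snapshot `snap`."""
    rows = conn.execute(
        "SELECT path FROM data_file "
        "WHERE table_name = ? AND begin_snapshot <= ? "
        "AND (end_snapshot IS NULL OR end_snapshot > ?) "
        "ORDER BY path",
        (table, snap, snap),
    )
    return [row[0] for row in rows]


# Snapshot 1 touches two tables atomically; snapshot 2 replaces one file.
s1 = commit([("orders", "orders-1.parquet"), ("customers", "customers-1.parquet")])
s2 = commit([("orders", "orders-2.parquet")], removed=[("orders", "orders-1.parquet")])
```

Under these assumptions, files_at("orders", s1) returns the original file, while files_at("orders", s2) returns only its replacement. Both answers come from one indexed SQL lookup, with no object-store listing involved.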

What's inside the guide

Everything you need to evaluate, adopt, and operate DuckLake.

Part 01

Architecture & Design

Why SQL-backed metadata changes everything. Deep dive into DuckLake's architecture, how catalog operations hit 10–100x speed improvements, and the design decisions that make it possible.

Part 02

Compared: Iceberg & Delta Lake

An honest, technical comparison. Where DuckLake excels, where incumbent formats still have strengths, and how to think about the trade-offs for your data platform.

Part 03

Migration & Getting Started

Practical guidance for adopting DuckLake. Iceberg interop patterns, migration strategies, getting started with DuckDB, and integrating with your existing data stack.

Not another vendor whitepaper

O'Reilly doesn't put its name on marketing material. This guide is held to the same editorial standard as its definitive guides to Kafka, Spark, and Kubernetes: peer-reviewed technical content built for practitioners, not prospects.

The guide covers DuckLake's strengths and its limitations. It compares DuckLake honestly against Iceberg and Delta Lake, and it provides real migration paths, not handwaving. If you're evaluating table formats for your data platform, this is the unbiased technical resource you need.

What makes this different

  • Deep technical content from DuckLake contributors
  • The quality content you expect from O'Reilly
  • Covers competing formats honestly
  • Practical migration paths & interop patterns

Meet the authors

Written by practitioners who build with DuckDB and DuckLake every day.

Matt Martin

Staff Engineer, State Farm

Matt is a data engineering professional with over 20 years of experience designing and delivering scalable data solutions. His background ranges from legacy systems like DB2 and SQL Server to modern cloud platforms and large-scale data processing. He previously served as a Senior Manager of Data Engineering at Home Depot, and currently works on complex cloud data integration challenges at State Farm. Outside of work, Matt enjoys home renovation projects and indoor rowing. He lives with his wife and three young children.

Alex Monahan

Developer Advocate, MotherDuck

Alex is a developer advocate at MotherDuck and a DuckLake contributor. Previously a customer software engineer at MotherDuck and a blogger for DuckDB Labs, Alex spent nine years at Intel moving from industrial engineer to data scientist before discovering DuckDB in 2020 and diving deeper into Duck-themed databases ever since. Beyond work, you'll find Alex jumping on trampolines and going on adventures with his daughter Adeline and wife Christy.

Get the guide before anyone else

Enter your email and we'll send you Early Release chapters as they become available.

Download the Early Release
