DuckLake: Making BIG DATA feel small (Coalesce 2025)
2025/10/14
TL;DR: DuckLake is a new open lakehouse format that combines the simplicity of a database catalog with the scalability of open data formats, eliminating the "big data tax" and enabling a full lakehouse setup in just 5 minutes.
The Big Data Tax
Current cloud data warehouses were designed in 2012 when hardware was much weaker. Their distributed architecture comes with penalties:
- Latency: Small queries take longer than they should due to coordination overhead
- Cost: Network shuffling between nodes isn't free
- Complexity: Scheduling, planning, and routing across nodes adds operational burden
The key insight: "Big compute is dead" (though it's not as catchy as "big data is dead"). 99% of queries (P99) touch under 256GB of data, which is well within the capability of a single node.
DuckDB: Pushing Single-Node Performance
- In-process: Runs inside Python, Node, Go, Rust, and 15+ languages
- Lightweight: 20MB binary, zero dependencies, installs in seconds
- Fast: #1 on ClickBench, beating ClickHouse, Snowflake, Redshift, and BigQuery
DuckLake vs Iceberg Architecture
| Iceberg | DuckLake |
|---|---|
| Multiple metadata layers (manifests, metadata files, catalog) | Single transactional database holds all metadata |
| Metadata overhead grows with commits | Database scales efficiently |
| Complex setup | 5-minute setup |
| Requires Java ecosystem | Pure SQL, any language that wraps DuckDB |
Key insight: DuckLake takes the same approach to metadata as Snowflake (which uses FoundationDB) and BigQuery (which uses Spanner): a transactional database holds all of it.
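To make the "metadata in a transactional database" idea concrete, here is a toy sketch. It uses SQLite from the Python standard library as a stand-in for the catalog database, and the table and column names (`snapshot`, `data_file`, `added_in`) are hypothetical, not DuckLake's actual schema. The point it illustrates is that a commit is a single database transaction, so a snapshot and its file list either both appear or neither does.

```python
import sqlite3


def create_catalog() -> sqlite3.Connection:
    """Create an in-memory toy catalog (hypothetical schema, not DuckLake's)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY, note TEXT);
        CREATE TABLE data_file (
            path TEXT PRIMARY KEY,
            added_in INTEGER NOT NULL REFERENCES snapshot(snapshot_id)
        );
        """
    )
    return conn


def commit_files(conn: sqlite3.Connection, note: str, paths: list[str]) -> int:
    """Record a new snapshot and its data files in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.execute("INSERT INTO snapshot (note) VALUES (?)", (note,))
        snapshot_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO data_file (path, added_in) VALUES (?, ?)",
            [(p, snapshot_id) for p in paths],
        )
    return snapshot_id


def files_as_of(conn: sqlite3.Connection, snapshot_id: int) -> list[str]:
    """List the files visible at a given snapshot (simple time travel)."""
    rows = conn.execute(
        "SELECT path FROM data_file WHERE added_in <= ? ORDER BY path",
        (snapshot_id,),
    )
    return [r[0] for r in rows]
```

Calling `commit_files` twice and then `files_as_of` with the first snapshot id returns only the files from the first commit. A commit that fails partway, for example on a duplicate path, leaves no trace in either table.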
5-Minute Lakehouse Demo with dbt
The demo shows setting up a complete lakehouse using:
- A dbt profile configured to use DuckDB with the DuckLake extension
- Postgres as the metadata catalog backend
- Local or cloud storage for the actual data files
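The same three pieces can be wired together outside dbt. The sketch below builds the SQL statements that, as far as I understand the DuckLake documentation, attach a lakehouse with a Postgres catalog; it is not the profile from the talk. The connection string, the `my_lake` alias, and the helper names are assumptions for illustration, and the `ATTACH` syntax should be checked against the DuckLake version in use.

```python
def ducklake_setup_sql(
    catalog_dsn: str, data_path: str, alias: str = "my_lake"
) -> list[str]:
    """Build the statements to attach a DuckLake with a Postgres catalog.

    catalog_dsn, data_path, and alias are illustrative inputs; the ATTACH
    syntax reflects my reading of the DuckLake docs and may differ by version.
    """
    if "'" in catalog_dsn or "'" in data_path:
        raise ValueError("single quotes are not allowed in the DSN or data path")
    if not alias.isidentifier():
        raise ValueError("alias must be a plain identifier")
    return [
        "INSTALL ducklake",
        "INSTALL postgres",
        f"ATTACH 'ducklake:postgres:{catalog_dsn}' AS {alias} "
        f"(DATA_PATH '{data_path}')",
        f"USE {alias}",
    ]


def run_statements(statements: list[str]):
    """Execute the statements with DuckDB (third-party: pip install duckdb)."""
    import duckdb  # imported here so the builder above works without it

    con = duckdb.connect()
    for statement in statements:
        con.execute(statement)
    return con
```

For a local experiment, `ducklake_setup_sql("dbname=lake_catalog host=localhost", "data_files/")` yields four statements that can be passed to `run_statements`, assuming a reachable Postgres database with that (hypothetical) name.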
Maintenance operations include merging small files, expiring old snapshots, and cleaning up expired files—all callable through dbt run-operation.
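As a rough sketch of what those three operations look like in SQL, the helper below generates one statement per step, in the order given above. The `ducklake_merge_adjacent_files`, `ducklake_expire_snapshots`, and `ducklake_cleanup_old_files` calls are my understanding of the DuckLake maintenance functions and should be verified against the current documentation; the `my_lake` alias and the retention default are assumptions, and the dbt macros that would wrap these calls are not shown because the talk summary does not name them.

```python
def maintenance_sql(alias: str = "my_lake", keep: str = "7 days") -> list[str]:
    """Build DuckLake maintenance statements: compact, expire, then clean up.

    Function names follow my reading of the DuckLake docs (verify before use);
    alias and keep are illustrative defaults.
    """
    if not alias.isidentifier():
        raise ValueError("alias must be a plain identifier")
    if "'" in keep:
        raise ValueError("single quotes are not allowed in the interval")
    return [
        # 1. Merge small adjacent data files into larger ones.
        f"CALL ducklake_merge_adjacent_files('{alias}')",
        # 2. Expire snapshots older than the retention window.
        f"CALL ducklake_expire_snapshots('{alias}', "
        f"older_than => now() - INTERVAL '{keep}')",
        # 3. Delete files that no remaining snapshot references.
        f"CALL ducklake_cleanup_old_files('{alias}', cleanup_all => true)",
    ]
```

The order matters: files only become eligible for cleanup once the snapshots that reference them have been expired.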
Production Considerations
- Cloud compute: Serverless preferred for simplicity
- Large instances: Sometimes you need beefy compute for repartitioning or full scans
- Access control: Lock down your lakehouse
- Caching: Lakehouse files are immutable—perfect for caching
- Scheduled maintenance: Automate file compaction and snapshot expiration
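The caching point follows from immutability: if a file at a given path never changes, a cached copy never needs invalidating. The sketch below is a minimal in-memory illustration of that idea, not how DuckDB or MotherDuck cache files; the class name and the `fetch` callback are hypothetical, and a real cache would bound its size and likely live on local disk.

```python
from typing import Callable


class ImmutableFileCache:
    """Cache file contents by path, with no invalidation logic.

    Safe only because lakehouse data files are never modified in place;
    new data arrives as new files under new paths.
    """

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch  # e.g. a function that downloads from object storage
        self._store: dict[str, bytes] = {}
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path not in self._store:
            self.misses += 1
            self._store[path] = self._fetch(path)
        return self._store[path]

    def evict(self, path: str) -> None:
        """Drop an entry, e.g. after maintenance deletes the underlying file."""
        self._store.pop(path, None)
```

The only time an entry needs to go is when the file itself is removed by scheduled maintenance, which is why `evict` exists but no refresh method does.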
MotherDuck: Ducklings of Unusual Size
| Size | Specs | Use Case |
|---|---|---|
| Standard | Various | Day-to-day queries |
| Mega | 64 cores, 256GB RAM | Heavy transformations |
| Giga | 192 cores, 1.5TB RAM | Most problems fit here |
Real-World Migration
A customer replaced a 5-server distributed cluster (largest AWS instances) running Iceberg with one serverless DuckLake on MotherDuck.
- Migration: Metadata-only (no data copying)
- Iceberg import: Supported for bringing in existing Iceberg data
- Iceberg export: Also supported for interoperability
Key Takeaways
- 10-100x data scale with existing SQL/dbt skills—no new stack or team required
- Instant import from Iceberg—leverage existing data investments
- Local dev parity: Same lakehouse runs on laptop and in production
- Future: Spark connector in development for multi-engine support