DuckDB Ecosystem: November 2025

2025/11/12 - 9 min read

Hey, friend 👋

I hope you're doing well. I'm Simon, and I am excited to share another monthly newsletter with highlights and the latest updates about DuckDB, delivered straight to your inbox.

In this November issue, I compiled eight updates and news highlights (the usual 10 links) from DuckDB's ecosystem. This month, we've got updates including block-based caching for remote files, DuckLake's simplified lakehouse architecture, and powerful new extensions for DNS lookups and ML inference. In addition to a comprehensive analysis of the extension ecosystem, there is a fascinating experiment that stores an entire movie as relational data.

If you have feedback, news, or any insights, they are always welcome. 👉🏻 duckdbnews@motherduck.com.

Andreas Kretz

Andreas Kretz is a data engineering educator and the founder of Learn Data Engineering Academy. Before transitioning to education, he spent over 10 years at Bosch as a Data Engineering and Data Labs Team Lead. He's the creator of The Data Engineering Cookbook, an open-source GitHub resource that has become a widely used reference for data platform architecture fundamentals.

Andreas just released DuckDB for Data Engineers: From Local to Cloud with MotherDuck, a hands-on course teaching hybrid data workflows using DuckDB locally and MotherDuck in the cloud. The course covers everything from local setup to building production analytics with DuckLake's lakehouse format.

Thanks, Andreas, for making data engineering education accessible and for bringing DuckDB and MotherDuck to your community!

You can find more of Andreas's work at andreaskretz.com.

DuckDB QuackStore Extension

TL;DR: The QuackStore DuckDB extension introduces block-based caching for remote files, speeding up recurring queries by keeping frequently accessed data local.

The extension implements persistent, block-based caching (1 MB blocks) with LRU eviction, meaning only actively used file segments are stored, making it highly efficient for large remote files. Corrupted blocks are detected automatically and re-fetched as needed. Setup is a matter of installing the extension and setting two options, as sketched below.
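A minimal setup sketch, assuming QuackStore is installable from the DuckDB community extension repository (the two SET GLOBAL options come from the project's docs):

INSTALL quackstore FROM community;
LOAD quackstore;
SET GLOBAL quackstore_cache_path = '/path/to/cache.bin';  -- persistent cache file on local disk
SET GLOBAL quackstore_cache_enabled = true;               -- turn on block-based caching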

On a mobile connection, reading 25 million rows with select count(*) FROM read_csv('https://noaa-ghcn-pds.s3.amazonaws.com/csv.gz/by_year/2025.csv.gz'); took me 49.366 seconds. After the first run had populated the cache, the same count through select count(*) FROM read_csv('quackstore://https://noaa-ghcn-pds.s3.amazonaws.com/csv.gz/by_year/2025.csv.gz'); finished in 3.304 seconds. Even the heavy SUMMARIZE command against the cached file took only 4.1 seconds on its first try, with no additional caching. I can imagine this being hugely powerful for apps that need fast query responses. Remember: caching is one of the hardest problems out there, as a cache is constantly going stale and has to anticipate future queries.

📝 Striim built a similar solution, using DuckDB as an OLAP cache with a PostgreSQL extension as part of their streaming platform. Read more in Beyond Materialized Views by John Kutay.

duckdb-dns: DNS (Reverse) Lookup Extension for DuckDB

TL;DR: The DNS extension, a pure Rust implementation, integrates powerful DNS lookup and reverse DNS capabilities into DuckDB, featuring dynamic configuration for resolvers, caching, and concurrency.

You can simply run SELECT dns_lookup('motherduck.com'); after installing the extension to get the IP. Tobias's extension leverages the DuckDB C Extension API to provide scalar functions like dns_lookup() and dns_lookup_all() for various record types (A, AAAA, CNAME, MX, TXT, etc.), alongside reverse_dns_lookup(). Configuration options cover switching DNS providers (e.g., 'google', 'cloudflare'), concurrency limits (default 50), cache size, and more. The result is efficient, in-database network data resolution.
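Two more hedged examples: the function names come from the extension, but the record-type argument for dns_lookup_all() is my assumption, so check the README for the exact signatures:

SELECT reverse_dns_lookup('8.8.8.8');           -- resolve an IP back to a hostname
SELECT dns_lookup_all('motherduck.com', 'TXT'); -- assumed: all records of a given type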

Tobias, a very active community member, also created another extension called sql-workbench-embedded for embedding DuckDB queries directly into your website, whether a plain HTML site or a React or Vue application. I tested it immediately in my static Hugo-based second brain, and it works great.

Why Python Developers Need DuckDB (And Not Just Another DataFrame Library)

TL;DR: Mehdi explains why Python developers should reach for DuckDB's full database capabilities rather than yet another standalone DataFrame library.

Mehdi emphasizes the in-process nature and comprehensive database features: native ACID transactions, data integrity with automatic rollbacks, and robust persistence. Perhaps the biggest advantage is DuckDB's language agnosticism: support for JavaScript (WebAssembly), Java, and Rust enables consistent access across diverse environments. Its "friendly SQL" syntax is another plus, as sketched below.
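Two small friendly-SQL examples, both standard DuckDB syntax (the table names are made up):

SELECT * EXCLUDE (password, ssn) FROM users;                 -- drop sensitive columns without listing the rest
SELECT city, avg(fare) AS avg_fare FROM trips GROUP BY ALL;  -- group by all non-aggregated columns automatically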

For Python users, zero-copy integration with Pandas and Polars through Apache Arrow allows querying DataFrames directly with SQL, which makes incremental adoption easy. DuckDB provides an integrated solution, blending database power with DataFrame simplicity.

Infera: A DuckDB extension for in-database inference

TL;DR: Infera is a new DuckDB extension, developed in Rust, that integrates machine learning inference directly into SQL queries using ONNX models via the Tract inference engine.

This capability allows data practitioners to run predictions without moving data out of the database, streamlining ML workflows. After installing and loading the extension, you load a model with infera_load_model() and run predictions with infera_predict(), as sketched below.
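A hedged sketch of the flow: the function names come from the project, while the model name, URL, and feature columns are placeholders:

SELECT infera_load_model('churn', 'https://example.com/models/churn.onnx');  -- register an ONNX model
SELECT infera_predict('churn', tenure, monthly_charges) AS churn_score       -- batch inference over table columns
FROM customers;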

Hassan notes that this approach adds ML inference as a first-class citizen in SQL, supporting both local and remote models, and handling single or multi-value outputs efficiently on table columns or raw tensor data. Check out the short terminal video.

DuckDB Extensions Analysis

TL;DR: DuckDB's extension ecosystem is expanding rapidly, with 127 extensions providing everything from core data format support to advanced community-driven integrations, a pace that can be hard to keep up with.

This automated analysis report, which also tracks the extensions covered in this newsletter, helps you stay up to date with each extension's latest activity and most important properties. It's still a work in progress, but the latest analysis already maps out the landscape: 24 core and 103 community extensions, with significant recent activity (19 very active, 66 recently active). The range of implementation languages is impressive, spanning C++, Rust, Python, and even shell scripts, demonstrating a flexible and extensible architecture.

Michael, the creator, also shares a little more in his blog post about Navigating DuckDB Extension Updates: Lessons from the Field. The code is available on GitHub.

DuckLake: Learning from Cloud Data Warehouses to Build a Robust “Lakehouse” (Jordan Tigani)

TL;DR: In this video, Jordan presented how DuckLake solves lakehouse challenges by storing metadata in a database rather than chained files, enabling ACID transactions while simplifying deployment. DuckLake is an open source implementation of an architectural pattern proven at scale inside both BigQuery and Snowflake.

Jordan discussed the convergent evolution of lakehouses toward cloud data warehouse architectures, arguing that "tables are a better interface than files" and "databases are a better place to store metadata than object stores." He contrasted Iceberg's multi-layered approach (REST catalog → metadata.json → manifest lists → manifest files) with DuckLake's direct SQL storage. File pruning becomes a simple query following a similar approach to BigQuery's internal Spanner queries.

For writes, DuckLake buffers small writes in catalog tables for immediate querying, avoiding Iceberg's tiny file problem. DuckLake is an open standard that can be implemented by other analytical engines. As one example, a minimal Spark connector requires just 34 lines (proof of concept).
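For a feel of the developer experience, here is a minimal local sketch with the ducklake extension; the catalog and data paths are illustrative:

INSTALL ducklake;
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 'data_files/');  -- metadata in SQL, data as Parquet
CREATE TABLE lake.events (id INTEGER, payload VARCHAR);
INSERT INTO lake.events VALUES (1, 'hello lakehouse');                  -- a transactional write, no manifest-file chains
SELECT * FROM lake.events;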

Relational Charades: Turning Movies into Tables

TL;DR: DuckDB can store and process video data by representing frames as relational tables.

In this experiment, Hannes explored turning the 1963 film "Charade" into a DuckDB table. The full movie, at 720x392 resolution, resulted in a table of 47 billion rows, stored in approximately 200 GB using DuckDB's native format and lightweight compression. The transformation showcases two DuckDB features: replacement scans, which let queries read NumPy arrays (the R, G, B components) directly, and POSITIONAL JOIN for efficient bulk INSERT operations per frame (a toy example follows below). Hannes demonstrated that SUMMARIZE on this massive table completes in around 20 minutes on a MacBook, and a DISTINCT r, g, b query, benefiting from DuckDB's larger-than-memory aggregate hash table, finishes in about 2 minutes.
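As a toy illustration of POSITIONAL JOIN, which pairs rows by position instead of a join key, here the three single-column tables are made up and stand in for the per-frame R, G, B arrays:

CREATE TABLE r AS SELECT unnest([101, 102]) AS red;
CREATE TABLE g AS SELECT unnest([110, 111]) AS green;
CREATE TABLE b AS SELECT unnest([120, 121]) AS blue;
SELECT * FROM r POSITIONAL JOIN g POSITIONAL JOIN b;  -- row 1 pairs with row 1, row 2 with row 2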

This illustrates DuckDB's capability to manage and analyze extremely large, non-traditional datasets efficiently on local hardware in an entertaining and unusual way 🙂.

Free Tutorial - Data Engineering with DuckDB & MotherDuck

TL;DR: The free course introduces data engineers and analysts to building versatile data workflows using DuckDB for local processing and MotherDuck for scalable cloud analytics, emphasizing hybrid execution and the new DuckLake format.

This course, by Andreas, gets you started with the fundamentals of DuckDB and MotherDuck. Andreas explains how to set up DuckDB locally, demonstrating querying CSV/Parquet files and building persistent databases via the CLI or UI. The course then transitions to MotherDuck, detailing connection methods like ATTACH for cloud query execution and exploring performance differences between local and cloud compute for analytical queries.

Andreas shows how to scale by connecting Python to MotherDuck for remote execution, and how to combine local and cloud datasets in a single "dual execution" or "hybrid workflow" query (sketched below). Find the course on YouTube and follow the playlist, or go to Udemy. The code examples can be found on GitHub.
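A minimal sketch of such a hybrid query; the table and database names are made up, and ATTACH 'md:' assumes a motherduck_token is configured:

ATTACH 'md:';                                   -- connect to MotherDuck
SELECT o.order_id, c.segment
FROM local_orders AS o                          -- table in the local DuckDB database
JOIN my_db.customers AS c USING (customer_id);  -- table stored in MotherDuck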

Upcoming Events

Empowering Data Teams: Smarter AI Workflows with Hex + MotherDuck

November 13 - Online: 9:00 AM PT

AI isn't here to take over data work; it's here to make it better. Join Hex + MotherDuck for a practical look at how modern data teams are designing AI workflows that prioritize context, iteration, and real-world impact.

Data-based: Going Beyond the Dataframe

November 20 - Online: 9:30 AM PT

Learn how to go from local data exploration to scalable cloud analytics without changing your workflow. In this live demo, we'll show how MotherDuck builds on DuckDB to give data scientists a fast, flexible, and Python-friendly way to analyze data: no infrastructure setup, no petabyte-scale headaches.

Subscribe to DuckDB Newsletter

