This Month in the DuckDB Ecosystem: January 2026
2026/01/17 - 8 min read
Hey, friend 👋
I hope you're doing well. First of all, a happy new year to you and your family. I'm Simon, and I am excited to share the first monthly newsletter of this new year, featuring highlights and the latest updates on DuckDB, delivered straight to your inbox.
In this January issue, I gathered the usual 10 links, with insights from many new extensions: TypeScript macros, Iceberg write and update operations, and FIX logs parsed from the comfort of DuckDB SQL queries. There's also Jordan's take on working with AI agents, and Tera as a templating framework for writing more complex SQL queries.
Featured Community Member
Top DuckDB Links this Month
Writes in DuckDB-Iceberg
TL;DR: The DuckDB-Iceberg extension now offers full DML support for Iceberg v2 tables, including INSERT, UPDATE, and DELETE operations, along with improved metadata and transactional consistency.
In case you missed it: the DuckDB v1.4.2 release, which landed just before the new year, expands the DuckDB-Iceberg extension with UPDATE and DELETE support, complementing the existing CREATE TABLE and INSERT functionality, all through standard SQL syntax. A handy detail: the first read of a table caches its snapshot information, so subsequent reads within the same transaction skip repeated fetches, improving performance by reducing REST Catalog calls. Time travel queries are available via AT (VERSION => snapshot_id), and detailed metadata introspection comes through the iceberg_metadata() functions.
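If you want to kick the tires, the new DML and time-travel surface looks roughly like this. A minimal sketch: the catalog alias, schema, table, and snapshot ID are all placeholders, not from the release notes.

```sql
-- Assumes an Iceberg REST Catalog attached as "ic" with a table db.events
-- (both names are hypothetical).
UPDATE ic.db.events SET status = 'done' WHERE id = 42;  -- new in v1.4.2
DELETE FROM ic.db.events WHERE status = 'stale';        -- new in v1.4.2

-- Time travel to a specific snapshot:
SELECT * FROM ic.db.events AT (VERSION => 1234567890);
```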
Iceberg in the Browser
TL;DR: DuckDB-Wasm now supports the Iceberg extension, enabling serverless, in-browser interaction with Iceberg REST Catalogs, including read and write capabilities, without requiring local installations or managed infrastructure.
This capability was achieved by redesigning HTTP interaction in the core DuckDB codebase, implementing a JavaScript network-stack wrapper in DuckDB-Wasm, and routing all networking in the DuckDB-Iceberg extension through this uniform HTTP interface. As Carlo explains, users can query Iceberg directly from a browser tab, starting with ATTACH 'warehouse' AS db (TYPE ICEBERG, ENDPOINT_URL 'https://your-iceberg-endpoint'); for setup. Computation happens locally in the browser, so sensitive data such as credentials stays client-side, offering a true zero-setup, infrastructure-free experience.
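For reference, a browser session could look like this. The ATTACH statement is from the post; the follow-up queries and all schema/table names are placeholders for illustration.

```sql
-- Attach an Iceberg REST Catalog from the browser (endpoint is a placeholder):
ATTACH 'warehouse' AS db (TYPE ICEBERG, ENDPOINT_URL 'https://your-iceberg-endpoint');

-- Then browse and query as usual; the schema and table names are hypothetical:
SHOW ALL TABLES;
SELECT count(*) FROM db.analytics.trips;
```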
TypeScript scripts as DuckDB Table Functions
TL;DR: This post describes a method for integrating TypeScript scripts as DuckDB table functions, enabling direct SQL querying of REST APIs, GraphQL endpoints, and web pages by leveraging DuckDB's shellfs and arrow extensions with Bun.
Tobias illustrates an integration pattern in which DuckDB's shellfs extension reads output from shell commands, and the arrow extension parses it into the Apache Arrow IPC format.
The pattern hinges on a table macro, CREATE OR REPLACE MACRO bun(script, args := '') AS TABLE SELECT * FROM read_arrow('bun ' || script || ' ' || args || ' |');, which lets you run TypeScript scripts as table functions. The scripts, executed by Bun, use the json-to-arrow-ipc npm package to convert fetched JSON into DuckDB-compatible Arrow IPC and stream it to stdout. This setup may help with dynamic data ingestion from diverse external sources directly into DuckDB for ad hoc SQL analysis; a fuller sketch follows below.
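Putting the pieces together, a minimal end-to-end setup could look like this. The macro is from the post; the script name and its arguments are hypothetical (the post's actual examples query REST and GraphQL APIs).

```sql
-- Community extensions the pattern relies on:
INSTALL shellfs FROM community;
INSTALL arrow FROM community;
LOAD shellfs;
LOAD arrow;

-- The macro from the post: the trailing '|' tells shellfs to read
-- from the command's stdout.
CREATE OR REPLACE MACRO bun(script, args := '') AS TABLE
    SELECT * FROM read_arrow('bun ' || script || ' ' || args || ' |');

-- Hypothetical usage: fetch_users.ts fetches JSON and streams Arrow IPC to stdout.
SELECT * FROM bun('fetch_users.ts', args := '--limit 100');
```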
See also the related Python approach in Python Scripts as DuckDB Table Functions.
Processing 1 TB with DuckDB in less than 30 seconds
TL;DR: A benchmark demonstrating DuckDB's capability to process 1TB datasets faster than previously assumed, especially when leveraging MotherDuck's cloud infrastructure and optimized data structures.
Matt's experiments involved processing a 1TB dataset on a local M2 Pro laptop, executing a common aggregation query (GROUP BY, SUM, and COUNT) in an average of 1 minute, 29 seconds. Moving to MotherDuck's "Mega" compute, the same query on an unsorted 1TB dataset averaged under 17 seconds. A key optimization was loading the dataset pre-sorted by the grouping column, which cut the benchmark time by roughly 30%.
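For a sense of the workload, the query shape is roughly the following; the table and column names are illustrative, not Matt's exact schema.

```sql
-- Illustrative GROUP BY / SUM / COUNT aggregation in the spirit of the benchmark
-- (all names are hypothetical):
SELECT station,
       count(*)         AS readings,
       sum(measurement) AS total_measurement
FROM readings_1tb
GROUP BY station;
```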
1TB of Parquet files. Single Node Benchmark. (DuckDB style)
TL;DR: Daniel benchmarked DuckDB processing a 1TB Parquet dataset directly from S3 on a single-node, 64GB RAM machine, completing a full-column analytical query in under 20 minutes while utilizing approximately 48GB of RAM.
Similar to the 1 TB article above, this benchmark used DuckDB's Python client on a Linode instance, configuring SET memory_limit='50GB'; and SET temp_directory='tmp'; to enable disk spilling, plus SET threads = 16;. Data was accessed directly from S3 over HTTPS in Parquet format. Daniel also tried Daft, which took around 30 minutes. The code can be found on GitHub.
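Put together, the session setup looks something like this. The settings are from the post; the S3 path and columns are placeholders, not Daniel's exact script.

```sql
-- Settings from the post:
SET memory_limit = '50GB';
SET temp_directory = 'tmp';  -- lets DuckDB spill to disk when memory runs out
SET threads = 16;

-- Illustrative full-column scan over S3-hosted Parquet (path and columns are hypothetical):
SELECT l_returnflag, count(*) AS cnt, sum(l_extendedprice) AS total
FROM read_parquet('s3://my-bucket/lineitem/**/*.parquet')
GROUP BY l_returnflag;
```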
Quack-Cluster: A Serverless Distributed SQL Query Engine with DuckDB and Ray
TL;DR: Quack-Cluster is a serverless distributed SQL query engine combining DuckDB's analytical power with Ray for scalable parallel processing of object storage data.
Kristian developed Quack-Cluster, which uses FastAPI and SQLGlot as a Coordinator to parse incoming SQL queries and generate distributed execution plans for a Ray cluster. Each Ray Worker, implemented as a Ray Actor, embeds a DuckDB instance to process assigned query fragments on data subsets, utilizing Apache Arrow for efficient partial result aggregation.
The engine natively reads Parquet and CSV files directly from S3, GCS, or local filesystems, supporting glob patterns like "s3://my-data/2025/**/*.parquet". Quack-Cluster supports distributed JOIN operations and DuckDB's window functions for MPP-style analytics.
QuackFIX: FIX log extension for DuckDB
TL;DR: The QuackFIX DuckDB extension provides native SQL capabilities for parsing and querying FIX protocol log files, leveraging DuckDB's in-memory columnar engine for efficient analysis.
In case you're unfamiliar, FIX log files are records of messages exchanged using the FIX (Financial Information eXchange) protocol, the standard messaging format used in electronic trading across financial markets. Since these logs need constant processing, this extension lets you ingest raw FIX messages directly and convert them into a structured, queryable format within DuckDB. It supports custom FIX dialects via XML dictionaries (e.g., fix_fields('dialects/FIX44.xml')) to interpret venue-specific tags without external pre-processing, reducing "glue code."
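The article only shows the dialect-loading call, so treat this as a sketch rather than the full API; whether fix_fields() stands alone or feeds into a separate reader function isn't confirmed here.

```sql
-- From the article: load a FIX dialect definition from an XML dictionary,
-- so venue-specific tags resolve to named fields. The invocation style
-- shown here is an assumption.
SELECT * FROM fix_fields('dialects/FIX44.xml');
```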
Building an answering machine
TL;DR: MotherDuck's new MCP server empowers AI agents like Claude and ChatGPT to perform sophisticated data analytics via natural language, effectively abstracting SQL complexities through an iterative, agentic approach.
This article by Jordan shows how the MotherDuck Answering Machine, implemented as an MCP server, exposes an endpoint (https://api.motherduck.com/mcp) that lets Large Language Models interact with your data warehouse.
Jordan initially expressed skepticism, stating he "thought it was going to be interesting for small, well-curated datasets, but not super useful for real-world data." However, the system's core innovation lies in its "agentic" methodology: rather than attempting a single SQL query, the LLM acts as an iterative data analyst, probing the schema, running diagnostic queries, inspecting results for anomalies, and refining its understanding. If you haven't tried agents yet, this is an easy way to start.
Onager: A DuckDB extension for graph data analytics
TL;DR: The Onager DuckDB extension provides over 40 graph analytics functions implemented in Rust, leveraging the Graphina library, and offers multi-threaded algorithm execution via simple SQL table functions.
The Onager extension expands DuckDB's analytical toolkit with robust graph algorithms. Compared to the existing DuckPGQ extension, which focuses on graph querying, Onager focuses primarily on graph analytics.
Onager delivers a suite of pre-built functions for tasks like centrality measurement (e.g., onager_ctr_pagerank), community detection (onager_cmm_louvain), and shortest pathfinding (onager_pth_dijkstra). The core implementation is in Rust, using the Graphina graph library, and supports both directed and undirected graphs, as well as weighted and unweighted graphs, with multi-threaded execution through a uniform API. This helps us as data engineers perform advanced graph analytics directly within DuckDB.
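The function names above come from the extension's documentation; the argument shape below is purely an assumption to show the call style, not the verified signature.

```sql
-- Hypothetical call shape: PageRank over an edge list.
-- The exact parameters Onager expects are an assumption here.
SELECT * FROM onager_ctr_pagerank('SELECT src, dst FROM edges');
```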
Tera DuckDB Extension – Query.Farm
TL;DR: The new Tera DuckDB extension integrates a powerful templating engine directly into SQL for dynamic text generation.
The Tera extension brings the Rust-based Tera templating engine into DuckDB SQL queries for generating dynamic reports, configuration files, and even dynamic SQL, much like what Jinja does for SQL in dbt.
The primary function, tera_render(template, context, ...options), accepts a template string or filename, a JSON context, and options. The extension supports the whole Tera templating language, including variables, filters, and control structures.
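Based on that signature, a minimal call might look like the following; the template and context here are made up for illustration.

```sql
-- Minimal sketch of tera_render(template, context, ...options):
-- render a template string against a JSON context.
SELECT tera_render(
    'Hello, {{ name }}! You have {{ count }} new messages.',
    '{"name": "DuckDB", "count": 3}'::JSON
) AS rendered;
```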
Upcoming Events
Data Day Texas
Jan. 24-25. Austin, US
Data Day Texas is one of the largest independent data-centric events held within 1,000 miles of Texas. MotherDuck is the official Database Track Sponsor! Join us Jan. 24-25.
dltHub ❤️ Marimo ❤️ MotherDuck
Thursday, January 29, Amsterdam
dltHub, Marimo, and MotherDuck are having a child: a joint community event in Amsterdam.
DuckDB Developer Meeting #1
Pakhuis de Zwijger, Amsterdam. Jan 30, 4:00 PM GMT+1
The first-ever DuckDB Developer Meeting, organized by DuckDB Labs, will feature talks from DuckDB developers and is aimed at developers who build DuckDB extensions or complex applications on top of DuckDB.
Subscribe to DuckDB Newsletter