HEY, FRIEND 👋
I hope you're doing well. I'm Simon, and I am happy to share another monthly newsletter with highlights and the latest updates about DuckDB, delivered straight to your inbox.
In this February issue, I gathered the usual 10 updates and news highlights from DuckDB's ecosystem. Please enjoy this month's update featuring an MSSQL extension, DuckDB inside MySQL, and the Ghostty terminal emulator running in the browser. On top of that, the latest Small Data SF talks are online, Vortex support has landed, and much more.
If you have feedback, news, or any insights, they are always welcome. 👉🏻 duckdbnews@motherduck.com.
Featured Community Member

Stephanie Wang
Stephanie is a Staff Software Engineer at MongoDB, passionate about the intersection of core data infrastructure and entrepreneurship.
She built the mongo community extension for DuckDB, which lets you run SQL queries directly against MongoDB collections — no data export or ETL required. The extension supports both standalone MongoDB instances and Atlas clusters, with automatic schema inference and filter pushdown for efficient querying. Just INSTALL mongo FROM community; and you're ready to run analytical SQL on your MongoDB data!
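If you want to give it a spin, here is a minimal sketch. The INSTALL/LOAD lines follow the standard community-extension flow; the mongo_scan call, connection string, and database/collection names are hypothetical placeholders, so check the extension's README for the exact API.

INSTALL mongo FROM community;
LOAD mongo;
-- Hypothetical query: the function name, connection string, and
-- database/collection names are placeholders, not the confirmed API.
SELECT customer, total
FROM mongo_scan('mongodb://localhost:27017', 'shop', 'orders')
WHERE status = 'shipped';  -- filter pushdown keeps this efficient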
Check out her LinkedIn to connect with her.
Top DuckDB Links this Month
Bringing Microsoft SQL Server to DuckDB: A Native TDS Extension
TL;DR: Vladimir has released the mssql-extension, a DuckDB community extension providing native TDS protocol communication with Microsoft SQL Server, eliminating external drivers.
This extension offers zero external dependencies, full TLS/SSL support, connection pooling, and projection pushdown for optimized queries directly at the SQL Server level, enabling efficient data fetching like SELECT * FROM sqlserver.dbo.customers WHERE status = 'active'. It translates standard DuckDB DDL to T-SQL, supports INSERT with automatic batching, and allows direct COPY operations from MSSQL to local DuckDB tables or formats like Parquet.
In Part 2, Vladimir delivers UPDATE/DELETE support, transaction semantics, CTAS (CREATE TABLE AS SELECT), and a native TDS BulkLoadBCP implementation hitting ~1.2M rows/sec, all without ODBC or JDBC. Code is on GitHub.
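Here is a rough sketch of a session, assuming the ATTACH-style flow that DuckDB database scanners typically use; the connection string, alias, and object names are placeholders, so consult the extension's README for the exact syntax.

INSTALL mssql FROM community;
LOAD mssql;
-- Hypothetical connection string; the alias makes three-part names
-- like sqlserver.dbo.customers resolvable from DuckDB.
ATTACH 'Server=localhost,1433;Database=sales;UID=reader;PWD=secret' AS sqlserver (TYPE mssql);
-- Projection and filter pushdown happen on the SQL Server side.
SELECT id, name FROM sqlserver.dbo.customers WHERE status = 'active';
-- Copy a remote table straight into a local Parquet file.
COPY (SELECT * FROM sqlserver.dbo.customers) TO 'customers.parquet' (FORMAT parquet);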
AliSQL is a MySQL branch originating from Alibaba Group
TL;DR: AliSQL integrates DuckDB as a columnar OLAP engine and introduces native vector search via HNSW, providing performance boosts for analytical and AI/ML workloads within a MySQL-compatible environment.
The core advancement is treating DuckDB as an ENGINE=DuckDB storage engine, enabling a reported 200x speedup on analytical queries compared to InnoDB. It also features a native Vector Index (VIDX) using HNSW with up to 16,383 dimensions, supporting ANN search with VECTOR(N) data types and distance functions like COSINE_DISTANCE. All of this works seamlessly with existing MySQL tools.
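A sketch of what this could look like, pieced together from the features named above; the schemas, the toy 3-dimensional vector, and the exact DDL are my assumptions rather than AliSQL's documented syntax.

-- Columnar table handled by the embedded DuckDB engine (hypothetical schema)
CREATE TABLE sales_fact (
  order_id BIGINT,
  region VARCHAR(32),
  amount DECIMAL(12, 2)
) ENGINE=DuckDB;

-- Vector column plus an ANN search backed by the HNSW vector index
CREATE TABLE docs (
  id BIGINT PRIMARY KEY,
  emb VECTOR(3)
);
SELECT id
FROM docs
ORDER BY COSINE_DISTANCE(emb, '[0.1, 0.2, 0.3]')
LIMIT 10;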
A browser-based SQL Terminal for DuckDB, powered by the Ghostty terminal emulator
TL;DR: A browser-based SQL REPL for DuckDB, using WebAssembly and Ghostty for terminal emulation.
It runs DuckDB via WASM in a Web Worker with ghostty-web, supporting syntax highlighting, multi-line input, and persistent command history. Try SELECT (random() * 100)::INTEGER AS value FROM generate_series(1, 200); and then add .chart to see what happens 🙂 (powered by uPlot). It also supports OPFS for persistent storage, direct CSV/Parquet loading, and experimental AI-powered SQL generation via .ai commands. Check the User Guide and Code for more info.
Announcing Vortex Support in DuckDB
TL;DR: DuckDB now officially supports Vortex, a newish columnar file format, via a core extension, demonstrating significant performance gains over Parquet in TPC-H benchmarks.
Vortex is an extensible, open-source columnar format with lightweight compression. Its key innovation, as Guillermo explained, is the ability to run compute functions on compressed data, filtering within storage segments without full decompression. This "late materialization" strategy leverages FastLanes encoding and defers decompression to the CPU or GPU. The duckdb-vortex extension integrates seamlessly via read_vortex() and COPY ... (FORMAT vortex).
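The read_vortex() and COPY ... (FORMAT vortex) entry points come from the announcement; the install step and file names below are my assumptions for a quick first try.

INSTALL vortex;
LOAD vortex;
-- Convert an existing Parquet file to Vortex (hypothetical paths)
COPY (SELECT * FROM read_parquet('lineitem.parquet'))
  TO 'lineitem.vortex' (FORMAT vortex);
-- Query it back; filters can run on still-compressed segments
SELECT count(*) FROM read_vortex('lineitem.vortex') WHERE l_quantity > 30;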
Last night a DB saved my life
TL;DR: Jonathan shows how DuckDB + Parquet replaces Pandas, Dask, and ad-hoc Postgres for 'large-but-not-big' data processing.
These two blog posts document how DuckDB changed Jonathan's way of working. Part 1 covers the serverless architecture, executing queries in-memory or directly over files. Part 2 focuses on how Parquet streamlines workflows by enabling efficient incremental data management and improving performance for complex joins on large datasets compared to Python/Pandas.
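The core trick is querying files in place instead of loading them first. A generic illustration (not taken from the posts; the file and column names are made up):

-- No import step: scan a directory of Parquet files directly
SELECT region, sum(amount) AS revenue
FROM read_parquet('sales/*.parquet')
GROUP BY region
ORDER BY revenue DESC;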
Why DuckDB is my first choice for data processing
TL;DR: Robin highlights DuckDB's performance, "friendly SQL" enhancements, and integration capabilities for modern data processing.
This article drew lots of attention on Hacker News thanks to DuckDB's versatile data ingestion, embeddability, performance for analytical workloads, and SQL as a stable interface.
It shows DuckDB running 100-1,000 times faster than OLTP databases on analytical queries, making it ideal for CI/CD and rapid development, and highlights key SQL features such as EXCLUDE, COLUMNS('emp_(.*)') AS '\1' for regex-based column selection and renaming, QUALIFY, and function chaining (e.g., first_name.lower().trim()) that significantly improve ergonomics; a few of them are combined in the sketch below.
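A small sketch of those features working together, against a hypothetical employees table:

-- Regex-based selection and renaming: emp_name, emp_dept -> name, dept
SELECT COLUMNS('emp_(.*)') AS '\1' FROM employees;

-- EXCLUDE, function chaining, and QUALIFY in one query:
-- the latest hire per department, without the salary column
SELECT * EXCLUDE (emp_salary),
       emp_name.lower().trim() AS clean_name
FROM employees
QUALIFY row_number() OVER (PARTITION BY emp_dept ORDER BY hired_at DESC) = 1;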
Small Data SF 2025 - Talks are out (Videos)
TL;DR: Small Data SF returned to San Francisco in November last year, and 16 talks are now available. Speakers include Jordan Tigani, Joe Reis, Holden Karau, and Glauber Costa.
Jordan kicked things off arguing that most data infrastructure is designed backwards. Other highlights: George Fraser on how Fivetran built a distributed DuckDB system for Iceberg and Delta lakes, Glauber Costa on rewriting SQLite in Rust, Adi Polak arguing the real problem was never about big data, and Holden Karau on "When Not to Use Spark?".
Also check out Ryan's reflections on Jordan's keynote: Stop Paying the Complexity Tax.
DuckDB vs. Polars: Performance & Memory on Parquet Data
TL;DR: Benchmarking DuckDB and Polars on up to 2 TB of Parquet data reveals distinct memory strategies and the critical impact of file layout.
Niklas's stress-testing shows DuckDB's peak memory stays below 2.5 GB even on 2 TB datasets, thanks to its strict buffer manager. Default Polars, leveraging mmap, showed up to 20 GB peak memory for large files (though reclaimable by the OS).
Maybe most surprising: partitioning a 140 GB dataset into 72 smaller files cut DuckDB's peak memory by 8x (to 160 MB) and Polars' by 4x (to 4.3 GB). File organization impacts memory more than the choice of engine itself.
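To experiment with the layout effect yourself, DuckDB's partitioned writes are an easy way to split one big dataset into many smaller files; the table and partition column here are placeholders:

-- Write one logical dataset as many smaller Parquet files, one per day
COPY big_table TO 'dataset' (FORMAT parquet, PARTITION_BY (day));
-- Read the partitioned layout back with Hive-style path inference
SELECT count(*) FROM read_parquet('dataset/*/*.parquet', hive_partitioning = true);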
Streaming Patterns with DuckDB
TL;DR: Guillermo outlines how DuckDB can handle streaming analytics through materialized views, lakehouse integration, and streaming engines.
He starts with three architectural patterns, noting that DuckDB excels in the "Materialized View Pattern" even without native support. This pattern involves a "Delta Processor" using periodic MERGE INTO statements. For lakehouse architectures, DuckLake's Data Inlining and Data Change Feed enhance performance for high-throughput inserts by avoiding small files and unnecessary scans.
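A minimal sketch of such a delta processor, assuming hypothetical staging and target tables; DuckDB's MERGE INTO folds new rows into the aggregate in a single statement:

-- Run periodically: merge freshly arrived events into the materialized aggregate
MERGE INTO user_stats AS t
USING (
  SELECT user_id, count(*) AS new_events
  FROM staging_events
  GROUP BY user_id
) AS s
ON t.user_id = s.user_id
WHEN MATCHED THEN UPDATE SET events = t.events + s.new_events
WHEN NOT MATCHED THEN INSERT (user_id, events) VALUES (s.user_id, s.new_events);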
DuckDB also integrates with Spark Streaming via JDBC, and the tributary community extension can directly query Kafka topics.
DuckDB: The Swiss Army Knife For Data Engineers
TL;DR: Alejandro shows how DuckDB replaces pandas, Spark, and Airflow for 80% of use cases, highlighting its ability to query 50 GB files on 8 GB laptops thanks to streaming, larger-than-memory execution.
Technical implementations include direct ETL from S3, cross-database joins using extensions, and direct querying of APIs or Google Sheets.
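For example, a compact S3-to-Parquet ETL sketch; the bucket, credential setup, and table names are placeholders:

INSTALL httpfs;
LOAD httpfs;
-- One way to authenticate; the credential_chain provider ships with the aws extension
CREATE SECRET (TYPE s3, PROVIDER credential_chain);
-- Stream a large S3 dataset through an aggregation; only the result lands locally
COPY (
  SELECT customer_id, sum(amount) AS total
  FROM read_parquet('s3://my-bucket/orders/*.parquet')
  GROUP BY customer_id
) TO 'totals.parquet' (FORMAT parquet);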
Upcoming Events
Streams, Queries & Quacks: A Data Meetup with Estuary & MotherDuck
2026-02-17. h: 18:00. New York City, USA
We'll walk through turning Slack conversations into a queryable knowledge base using real-time data pipelines. Jonathan Wihl from Estuary will show how streaming data flows into MotherDuck without batch processing, and Jacob Matson from MotherDuck will cover metadata enrichment techniques that improve text-to-SQL accuracy — going beyond traditional foreign keys to surface hidden relationships in your data. Expect working demos, not slides.
SF Apache DataFusion Meetup
2026-02-19. h: 17:30. San Francisco, CA
Five talks on Apache DataFusion and what's happening around it in data infrastructure. Xiangpeng Hao from UW-Madison covers pushdown caching with LiquidCache. Bev Turnbaugh from MotherDuck presents DuckLake and what a next-generation lakehouse looks like. Zac Farrell from Hotdata pairs DataFusion with DuckLake, LakeSail's Shehab Amin demos native Iceberg and Delta Lake support, and Embucket's founders walk through DataFusion infrastructure internals. Hosted at Chroma in SF with food and networking from 5:30.
From Ad Hoc Questions to Real-Time Answers
2026-02-19. h: 09:00. Online
Data teams field the same types of questions over and over — "what were last month's numbers," "break this down by region," "compare Q3 to Q4." This webinar looks at where LLMs, natural language, and AI agents fit into SQL-first workflows. We'll cover how MotherDuck turns ad hoc requests into repeatable, real-time answers without replacing SQL or adding complexity to your stack.
Building an AI Chatbot for your SaaS app in 1 day
2026-02-26. h: 09:00. Online
This webinar walks through building a conversational AI layer for a SaaS product using the MotherDuck MCP server. We'll cover the full loop: connecting an LLM to live user data with read-only scoped access, building a streaming chat backend that handles sequential tool calls, adding custom visualization tools inside the conversation, and isolating customer data with MotherDuck's hypertenancy architecture. You'll leave with a reusable pattern you can drop into your own product.
Start using MotherDuck now!