MotherDuck Research

Advancing the future of data + AI.

Publications

MotherDuck: DuckDB in the Cloud and in the Client

DuckDB, Hybrid Query Processing, Serverless Analytics

We describe and demo MotherDuck, a new service that connects DuckDB to the cloud. MotherDuck introduces hybrid query processing: the ability to execute queries partly on the client and partly in the cloud.

Results on Results: Building New Results from Cached Partial Results

Caching, Data Lineage, DuckDB, MotherDuck

An intelligent recovery framework for real-time SQL previews. When composing cached partial results yields too few rows, a data-lineage heuristic selects the cheapest upstream dependency to re-fetch, enabling fluid exploratory analysis in MotherDuck's hybrid architecture.
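
The recovery step can be illustrated with a small sketch. It assumes a hypothetical lineage record, `Dependency`, that tracks for each upstream input how many rows are cached, how many exist, and an assumed per-row fetch cost; the hypothetical `choose_refetch` then picks the incomplete dependency whose missing rows are cheapest to fetch. The cost model here is an illustrative assumption, not the heuristic from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dependency:
    """Hypothetical upstream input of a derived result, as recorded by lineage."""
    name: str
    cached_rows: int      # rows of this input currently held in the client cache
    total_rows: int       # rows the input has when fully fetched
    cost_per_row: float   # assumed cost of fetching one more row of this input

    def refetch_cost(self) -> float:
        # Cost of fetching only the rows still missing from the cache.
        return (self.total_rows - self.cached_rows) * self.cost_per_row


def choose_refetch(result_rows: int, rows_needed: int,
                   deps: list[Dependency]) -> Dependency | None:
    """Return the cheapest incomplete dependency to re-fetch, or None if the
    composed result already has enough rows or nothing is left to fetch."""
    if result_rows >= rows_needed:
        return None
    incomplete = [d for d in deps if d.cached_rows < d.total_rows]
    if not incomplete:
        return None  # every input is fully cached, so the short result is exact
    return min(incomplete, key=lambda d: d.refetch_cost())


deps = [
    Dependency("orders", cached_rows=1_000, total_rows=50_000, cost_per_row=0.01),
    Dependency("customers", cached_rows=200, total_rows=1_000, cost_per_row=0.05),
]
print(choose_refetch(result_rows=3, rows_needed=10, deps=deps).name)  # customers
```

Under these assumed numbers, completing `customers` costs 40 units versus 490 for `orders`, so the sketch re-fetches `customers` first.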

Cost-Based Hybrid Query Optimization in MotherDuck

Cost-Based Optimization, DuckDB, Hybrid Query Processing, Query Optimization

This thesis tackles a core challenge in MotherDuck's hybrid execution model: deciding which parts of a query should run locally vs. in the cloud. The result is a cost-based optimizer that delivers up to 17x speedups on long-running analytical queries.
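
To make cost-based placement concrete, here is a minimal sketch for a linear pipeline of operators. It assumes each operator carries hypothetical cost estimates for running on the client and in the cloud plus an estimated output size, charges an assumed `cost_per_byte` whenever data crosses the client/cloud boundary, and requires the final result to end up on the client. A small dynamic program then picks a location per operator. The thesis optimizer works over full query plans with its own cost model, which this sketch does not reproduce.

```python
from __future__ import annotations

from dataclasses import dataclass

LOCAL, CLOUD = "local", "cloud"


@dataclass
class Op:
    name: str
    local_cost: float  # assumed cost estimate to run on the client
    cloud_cost: float  # assumed cost estimate to run in the cloud
    out_bytes: int     # assumed size of the operator's output


def place_pipeline(ops: list[Op], source_loc: str, source_bytes: int,
                   cost_per_byte: float) -> tuple[float, list[tuple[str, str]]]:
    def move(src: str, dst: str, nbytes: int) -> float:
        # Transfer is free when data stays on the same side.
        return 0.0 if src == dst else nbytes * cost_per_byte

    # best[loc] = cheapest (cost, plan) whose latest output is located at loc.
    best = {source_loc: (0.0, [])}
    in_bytes = source_bytes
    for op in ops:
        nxt = {}
        for loc, run_cost in ((LOCAL, op.local_cost), (CLOUD, op.cloud_cost)):
            nxt[loc] = min(
                (cost + move(prev, loc, in_bytes) + run_cost, plan + [(op.name, loc)])
                for prev, (cost, plan) in best.items()
            )
        best, in_bytes = nxt, op.out_bytes
    # The final result is consumed on the client, so ship it there if needed.
    return min(
        (cost + move(loc, LOCAL, in_bytes), plan) for loc, (cost, plan) in best.items()
    )


ops = [Op("filter", 50, 10, 1_000), Op("aggregate", 1, 2, 100)]
print(place_pipeline(ops, CLOUD, 1_000_000, 0.001))
```

With a large cloud-resident input and a selective filter, the sketch keeps the filter in the cloud, ships only its small output, and runs the cheap aggregate on the client.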

Query-Log-Informed Schema Descriptions and their Impact on Text-to-SQL

DuckDB, MotherDuck, Schema Documentation, Text-to-SQL

Automatically generating schema documentation from historical query logs to improve LLM-powered Text-to-SQL. Evaluated on both the BIRD benchmark and MotherDuck's production data warehouse, query-pattern descriptions improve SQL generation accuracy by up to 16% on real-world data.
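
One simple way to mine a query log for documentation is to count recurring equality predicates between columns and turn the frequent ones into sentences. The sketch below does this with a deliberately naive regular expression over fully qualified `table.column` references; `describe_join_patterns` and its `min_count` threshold are hypothetical, and the paper's actual pipeline is not reproduced here.

```python
from __future__ import annotations

import re
from collections import Counter

# Naive pattern for an equality between two qualified columns, such as
# "orders.customer_id = customers.id". It ignores table aliases and also
# matches such predicates in WHERE clauses, not only in JOIN ... ON.
EQ_PRED = re.compile(r"\b(\w+\.\w+)\s*=\s*(\w+\.\w+)\b")


def describe_join_patterns(query_log: list[str], min_count: int = 2) -> list[str]:
    counts: Counter = Counter()
    for sql in query_log:
        # Count each unordered column pair at most once per logged query.
        pairs = {tuple(sorted(match)) for match in EQ_PRED.findall(sql)}
        counts.update(pairs)
    return [
        f"{a} is frequently equated with {b} (seen in {n} logged queries)"
        for (a, b), n in counts.most_common()
        if n >= min_count
    ]
```

Sentences like these could be appended to a table's schema description before it is passed to a Text-to-SQL model.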

Declarative Caching in MotherDuck

Data Apps, DuckDB, Hybrid Query Processing, Query Optimization

This thesis introduces Accelerated Approximate Views (AAVs), a new SQL-level caching mechanism for MotherDuck's hybrid execution model. By partially materializing query results on the client, AAVs reduce latency for interactive data exploration in both native and WebAssembly environments.
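
The general idea of answering interactive previews from partially materialized results can be sketched as a client-side prefix cache. This is not the AAV mechanism itself: `PrefixCache`, `fetch_remote`, and `max_rows` are hypothetical names, and the cache simply keeps the first rows of each query result so that later previews with a small enough limit are answered locally, without a cloud round trip.

```python
from __future__ import annotations

from typing import Callable


class PrefixCache:
    """Hypothetical client-side cache holding a prefix of each query's result."""

    def __init__(self, fetch_remote: Callable[[str, int], list], max_rows: int = 1000):
        self.fetch_remote = fetch_remote  # assumed: runs sql remotely, returns at most n rows
        self.max_rows = max_rows          # minimum prefix size to materialize per query
        self._store: dict[str, tuple[list, bool]] = {}
        self.remote_calls = 0

    def preview(self, sql: str, limit: int) -> list:
        key = " ".join(sql.split())  # naive normalization: collapse whitespace only
        entry = self._store.get(key)
        if entry is not None:
            rows, complete = entry
            # Serve locally if the cached prefix is long enough or is the whole result.
            if limit <= len(rows) or complete:
                return rows[:limit]
        n = max(limit, self.max_rows)
        rows = self.fetch_remote(sql, n)
        self.remote_calls += 1
        # Getting fewer than n rows back means the entire result is now cached.
        self._store[key] = (rows, len(rows) < n)
        return rows[:limit]
```

Fetching at least `max_rows` rows on a miss trades one larger transfer for many later previews that never leave the client.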

Towards Efficient Data Wrangling with LLMs using Code Generation

Code Generation, Data Transformation, Data Wrangling, Entity Matching, LLM, MotherDuck

Instead of applying an LLM to every row, generate code once and run it on millions of rows. This yields up to a 37-point F1 improvement on data transformations at a fraction of the cost.
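
The contrast between per-row LLM calls and one-time code generation can be sketched in a few lines. The sketch assumes a hypothetical `llm` callable (stubbed here by `fake_llm`, which always returns the same answer) that writes a `transform` function from input/output examples. The generated function is checked against those examples and then applied to every row locally, so the number of LLM calls stays at one regardless of table size. Executing generated code should be sandboxed in practice; this sketch does not do that.

```python
from __future__ import annotations

from typing import Callable


def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call: returns code it could plausibly generate
    # for the date-reformatting examples in the prompt.
    return (
        "def transform(value):\n"
        "    y, m, d = value.split('-')\n"
        "    return f'{m}/{d}/{y}'\n"
    )


def synthesize_transform(examples: list[tuple[str, str]],
                         llm: Callable[[str], str]) -> Callable[[str], str]:
    prompt = "Write a Python function transform(value) that maps:\n" + "\n".join(
        f"{src!r} -> {dst!r}" for src, dst in examples
    )
    namespace: dict = {}
    exec(llm(prompt), namespace)  # one LLM call; runs untrusted code (sandbox in practice)
    transform = namespace["transform"]
    # Reject generated code that does not reproduce the given examples.
    if not all(transform(src) == dst for src, dst in examples):
        raise ValueError("generated code does not match the examples")
    return transform


# Generate once, then apply to as many rows as needed.
transform = synthesize_transform([("2024-01-31", "01/31/2024")], fake_llm)
column = ["1999-12-05", "2023-07-14"]
print([transform(v) for v in column])  # ['12/05/1999', '07/14/2023']
```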