YouTube Interview Short

The Surprising Birth Of DuckDB ft. Co-creator Hannes Mühleisen

2023/03/06

Featuring:

Hannes Mühleisen

TL;DR: DuckDB co-creator Hannes Mühleisen shares how the project started from conversations with the R community, who loved data but hated databases, and how those conversations led to a new category: in-process analytical databases.

The Origin Story

At CWI (the Dutch national research institute for mathematics and computer science), Hannes worked in the Database Architectures group, researching how data systems should be built. A surprising discovery changed everything.

"They Weren't Using Databases at All"

When talking to the R community (data practitioners doing serious analytical work):

They stored data in CSV files
Some had hand-rolled data frame engines
Everything was slow and limited
They really didn't like databases

Hannes was confused: "I really love databases. Why aren't they using our stuff?"

The Two Problems

Problem 1: Client Protocols Are Slow

Traditional database client protocols (built in the '80s, never updated):

Row-based
Heavy serialization
Moving data between database and R shell was painfully slow

Fix: They improved data transfer speeds. R users said: "Better, but..."
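The cost of a row-based protocol can be sketched with a toy comparison. This is only an illustration under assumed conditions, not the wire format of any real database protocol: the two-column table, the function names, and the fixed-width encoding are all hypothetical. The row-based path packs and unpacks one record at a time, while the column-based path moves each column as a single contiguous buffer.

```python
import struct
from array import array

# Hypothetical table: two numeric columns of equal length.
ids = list(range(1000))
values = [i * 0.5 for i in ids]

def encode_rows(ids, values):
    """Row-based: pack and emit one (id, value) record at a time."""
    return b"".join(struct.pack("<qd", i, v) for i, v in zip(ids, values))

def decode_rows(buf):
    """Row-based: unpack record by record, then split into columns."""
    records = list(struct.iter_unpack("<qd", buf))
    return [r[0] for r in records], [r[1] for r in records]

def encode_columns(ids, values):
    """Column-based: each column becomes one contiguous typed buffer."""
    return array("q", ids).tobytes(), array("d", values).tobytes()

def decode_columns(id_buf, value_buf):
    """Column-based: reinterpret each buffer with a single bulk call."""
    id_col, value_col = array("q"), array("d")
    id_col.frombytes(id_buf)
    value_col.frombytes(value_buf)
    return id_col.tolist(), value_col.tolist()

# Both paths carry the same data; they differ in how much per-value work
# the sender and receiver have to do.
row_ids, row_values = decode_rows(encode_rows(ids, values))
col_ids, col_values = decode_columns(*encode_columns(ids, values))
```

Both decoders return the same columns. The difference is that the row path makes one pack or unpack call per record, which is the kind of per-value overhead that hurts when an analytical client such as an R session wants whole columns at once.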

Problem 2: Server Management is a Nightmare

Client-server setup issues:

Works on your laptop, fails when sharing scripts
Configuration headaches
"We still hate managing database servers"

Insight: What about SQLite's approach (an in-process library with no server), but for analytics?
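The in-process model itself can be illustrated with SQLite through Python's standard `sqlite3` module. This is a minimal sketch of the "library, no server" idea only; the table and column names are made up for the example, and SQLite stands in here for the deployment model, not for an analytical engine.

```python
import sqlite3

# In-process: the database engine is a library loaded into this Python process.
# There is no server to start, no port to open, and no credentials to configure.
conn = sqlite3.connect(":memory:")

conn.execute("CREATE TABLE measurements (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)

# An analytical-style query (aggregate per group), answered inside the process.
rows = conn.execute(
    "SELECT site, AVG(value) FROM measurements GROUP BY site ORDER BY site"
).fetchall()
print(rows)  # [('a', 2.0), ('b', 10.0)]
conn.close()
```

A script like this runs the same way on any machine with Python installed, which is the property the R users were missing when scripts depended on a separately managed database server.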

A New Category is Born

"It turns out we invented a whole new class of database systems: the in-process analytics thing."

SQLite is amazing, but it is designed for transactional workloads, not analytical ones. An in-process database built for analytics did not exist yet.

The Prototype: MonetDB Lite

First attempt: Hack an existing system (MonetDB) into an in-process analytical database.

Result: Worked well enough to get people excited, but hit architectural issues:

Can't just exit the process on errors (you're inside someone else's program)
Can't use APIs that change global state (working directory, locale)
"Really trivial stuff, but it adds up"
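The two constraints listed above can be sketched in a few lines. The `EmbeddedEngine` class and `EngineError` type below are hypothetical and are not the API of MonetDB Lite or DuckDB; they only show what "behaving as a guest in someone else's process" might look like: errors are raised to the host instead of terminating the process, and paths are resolved explicitly instead of changing the working directory.

```python
import os
import tempfile

class EngineError(Exception):
    """Hypothetical error type raised by the embedded engine."""

class EmbeddedEngine:
    """Hypothetical in-process engine used only for illustration."""

    def __init__(self, data_dir):
        # Keep the base directory as state on the object instead of calling
        # os.chdir(), which would change the working directory for the host.
        self.data_dir = os.path.abspath(data_dir)

    def resolve(self, name):
        # Build absolute paths explicitly rather than relying on the cwd.
        return os.path.join(self.data_dir, name)

    def load(self, name):
        path = self.resolve(name)
        if not os.path.exists(path):
            # A standalone server could log and exit here; a library has to
            # hand the error back to the host program instead.
            raise EngineError(f"no such table file: {path}")
        with open(path, "rb") as f:
            return f.read()

cwd_before = os.getcwd()
engine = EmbeddedEngine(tempfile.mkdtemp())
try:
    engine.load("missing.tbl")
    recovered = False
except EngineError:
    recovered = True  # the host keeps running and decides what to do next
cwd_after = os.getcwd()
```

The host program survives the failed load and its working directory is untouched. Retrofitting this discipline onto a codebase written as a standalone server is the sort of small but pervasive work the quote refers to.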

Starting From Scratch

Hannes and Mark Raasveldt made a bold decision: build a completely new database system.

The risk:

"Monumental undertaking"
"Not successful 99% of the time"
"Not great for your career as a researcher—you disappear for 5 years not writing papers"

The advantage: They already had product-market fit from the prototype. They knew what people wanted.

The Grind

"Mark and I basically did nothing else—evenings, weekends—just hacking on DuckDB for a couple of years."

SQL is complex. Supporting all the operators, edge cases, and details took massive effort.

Open Source: 2019

DuckDB was open-sourced in 2019 and received immediate positive feedback. The design resonated because it came from real user problems, not theoretical research.

The Lesson

"My definition of success is not to write papers—it's to have impact. In data systems research, you have impact by making systems."

The R community's willingness to try "crazy stuff that was barely working" and give honest feedback was essential. This kind of practitioner collaboration is rare in academia, but it is why DuckDB exists.

