YouTube · Interview · Short

The Surprising Birth Of DuckDB ft. Co-creator Hannes Mühleisen

2023/03/06

TL;DR: DuckDB co-creator Hannes Mühleisen shares how the project started from conversations with the R community—who loved data but hated databases—leading to the invention of a new category: in-process analytical databases.

The Origin Story

At CWI (Centrum Wiskunde & Informatica, the Dutch national research institute for mathematics and computer science), Hannes worked in the Database Architectures group, researching how data systems should be built. A surprising discovery changed everything.

"They Weren't Using Databases at All"

Talking to the R community (data practitioners doing serious analytical work), Hannes found that:

  • They stored data in CSV files
  • Some had hand-rolled data frame engines
  • Everything was slow and limited
  • They really didn't like databases

Hannes was confused: "I really love databases. Why aren't they using our stuff?"

The Two Problems

Problem 1: Client Protocols Are Slow

Traditional database client protocols (built in the '80s, never updated):

  • Row-based
  • Heavy serialization
  • Moving data between database and R shell was painfully slow

Fix: They improved data transfer speeds. R users said: "Better, but..."
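
To make the row-based point concrete, here is a minimal Python sketch of the two layouts. It is illustrative only: the sample data and the function names (encode_row_wise, encode_column_wise, decode_column_wise) are hypothetical, and neither encoding is the wire format of any real database protocol.

```python
import struct
from array import array

# Hypothetical sample data: two columns of a small table.
ids = [1, 2, 3, 4]
prices = [9.5, 3.25, 7.0, 1.75]


def encode_row_wise(ids, prices):
    """Pack one (id, price) record at a time, as a row-based protocol would."""
    out = bytearray()
    for i, p in zip(ids, prices):
        out += struct.pack("<qd", i, p)  # one pack call per row
    return bytes(out)


def encode_column_wise(ids, prices):
    """Ship each column as one contiguous buffer of same-typed values."""
    return array("q", ids).tobytes() + array("d", prices).tobytes()


def decode_column_wise(buf, n):
    """Rebuild both columns from the buffer produced by encode_column_wise."""
    id_col = array("q")
    id_col.frombytes(buf[: 8 * n])
    price_col = array("d")
    price_col.frombytes(buf[8 * n:])
    return list(id_col), list(price_col)


row_bytes = encode_row_wise(ids, prices)
col_bytes = encode_column_wise(ids, prices)
decoded = decode_column_wise(col_bytes, len(ids))
```

Both encodings carry the same number of bytes here. The difference is in the work per value: the row-wise path touches every field of every row individually, while the column-wise path hands over whole buffers that an analytical client can consume directly.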

Problem 2: Server Management Is a Nightmare

Client-server setup issues:

  • Works on your laptop, fails when sharing scripts
  • Configuration headaches
  • "We still hate managing database servers"

Insight: What about SQLite's approach—an in-process library, no server—but for analytics?
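
What "in-process, no server" means in practice can be seen with SQLite itself, using Python's built-in sqlite3 module. This is a minimal sketch with a made-up sales table; it shows the deployment model the insight refers to, not DuckDB's own API.

```python
import sqlite3

# The database engine runs inside this process: there is no server to start,
# no port to open, and no configuration file to share alongside the script.
con = sqlite3.connect(":memory:")

# Hypothetical example table and data.
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)

# An analytical-style query: aggregate over the whole table.
totals = con.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)

con.close()
```

Because the script carries its database with it, sending it to a colleague does not require them to install or configure anything beyond the language runtime.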

A New Category Is Born

"It turns out we invented a whole new class of database systems: the in-process analytics thing."

SQLite is amazing but designed for transactional workloads, not analytical ones. This category didn't exist.

The Prototype: MonetDB Lite

First attempt: Hack an existing system (MonetDB) into an in-process analytical database.

Result: Worked well enough to get people excited, but hit architectural issues:

  • Can't just exit the process on errors (you're inside someone else's program)
  • Can't use APIs that change global state (working directory, locale)
  • "Really trivial stuff, but it adds up"
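
The two constraints above can be sketched in Python. This is a hypothetical illustration of the general rule for embedded libraries, not code from MonetDB Lite or DuckDB; EngineError and load_table are invented names.

```python
import os


class EngineError(Exception):
    """Raised to the host program instead of terminating its process."""


def load_table(path, base_dir):
    """Read a text file of rows, behaving as a well-mannered embedded library."""
    # Resolve the path against an explicit base directory rather than calling
    # os.chdir(), which would change the working directory for the whole host.
    full_path = os.path.join(base_dir, path)
    if not os.path.exists(full_path):
        # A standalone server could log and exit here. A library must not,
        # because exiting would kill the user's R or Python session.
        raise EngineError(f"table file not found: {full_path}")
    with open(full_path, encoding="utf-8") as f:
        return f.read().splitlines()
```

The host program decides what to do with EngineError, and its working directory is the same after the call as before it.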

Starting From Scratch

Hannes and Mark Raasveldt made a bold decision: build a completely new database system.

The risk:

  • "Monumental undertaking"
  • "Not successful 99% of the time"
  • "Not great for your career as a researcher—you disappear for 5 years not writing papers"

The advantage: They already had product-market fit from the prototype. They knew what people wanted.

The Grind

"Mark and I basically did nothing else—evenings, weekends—just hacking on DuckDB for a couple of years."

SQL is complex. Supporting all the operators, edge cases, and details took massive effort.

Open Source: 2019

The open-source release drew immediate positive feedback. The design resonated because it came from real user problems, not theoretical research.

The Lesson

"My definition of success is not to write papers—it's to have impact. In data systems research, you have impact by making systems."

The R community's willingness to try "crazy stuff that was barely working" and give honest feedback was essential. This kind of practitioner collaboration is rare in academia—but it's why DuckDB exists.
