The Surprising Birth Of DuckDB ft. Co-creator Hannes Mühleisen
2023/03/06
TL;DR: DuckDB co-creator Hannes Mühleisen shares how the project started from conversations with the R community, who loved data but hated databases, leading to the invention of a new category: in-process analytical databases.
The Origin Story
At CWI (Centrum Wiskunde & Informatica, the Dutch national research institute for mathematics and computer science), Hannes worked in the Database Architectures group, researching how data systems should be built. A surprising discovery changed everything.
"They Weren't Using Databases at All"
When talking to the R community (data practitioners doing serious analytical work), Hannes found that:
- They stored data in CSV files
- Some had hand-rolled data frame engines
- Everything was slow and limited
- They really didn't like databases
Hannes was confused: "I really love databases. Why aren't they using our stuff?"
The Two Problems
Problem 1: Client Protocols Are Slow
Traditional database client protocols (built in the '80s, never updated):
- Row-based
- Heavy serialization
- Moving data between database and R shell was painfully slow
Fix: They improved data transfer speeds. R users said: "Better, but..."
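To make "row-based" and "heavy serialization" concrete, here is a minimal sketch in Python. Both wire formats are simplified assumptions made up for illustration, not any real database protocol, and the function names and sample table are hypothetical. The row-based encoder converts and frames every value on its own, while the column-based one ships each column as a single packed array that the client can rebuild with bulk copies.

```python
import struct
from array import array

# Hypothetical table: 3 rows with an integer id and a float measurement.
rows = [(1, 0.5), (2, 1.5), (3, 2.5)]

def encode_row_based(rows):
    """Serialize one row at a time: a length-prefixed message per row,
    with each value rendered as text (an assumed, simplified format)."""
    out = bytearray()
    for row in rows:
        fields = [str(v).encode("ascii") for v in row]
        body = b"".join(struct.pack("<I", len(f)) + f for f in fields)
        out += struct.pack("<I", len(body)) + body
    return bytes(out)

def encode_column_based(rows):
    """Serialize one column at a time as a packed binary array.
    Uses native byte order, so sender and receiver must match."""
    ids = array("q", (r[0] for r in rows))     # 64-bit signed integers
    values = array("d", (r[1] for r in rows))  # 64-bit floats
    header = struct.pack("<I", len(rows))      # row count
    return header + ids.tobytes() + values.tobytes()

def decode_column_based(buf):
    """Rebuild both columns with two bulk copies, no per-value parsing."""
    (n,) = struct.unpack_from("<I", buf, 0)
    ids = array("q")
    ids.frombytes(buf[4:4 + 8 * n])
    values = array("d")
    values.frombytes(buf[4 + 8 * n:4 + 16 * n])
    return list(ids), list(values)

row_payload = encode_row_based(rows)
column_payload = encode_column_based(rows)
decoded = decode_column_based(column_payload)
```

The per-value work in `encode_row_based` grows with every cell in the result, which is the kind of overhead that hurts when an analyst pulls a large table into an R or Python session.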
Problem 2: Server Management is a Nightmare
Client-server setup issues:
- Works on your laptop, fails when sharing scripts
- Configuration headaches
- "We still hate managing database servers"
Insight: What about SQLite's approach—an in-process library, no server—but for analytics?
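The in-process model that this insight borrows can be seen with SQLite itself. The sketch below uses Python's standard-library `sqlite3` module, so it shows SQLite rather than DuckDB, and the table and column names are hypothetical. The point is what is absent: no server to start, no host or port, no credentials.

```python
import sqlite3

# The database engine runs inside this Python process.
# ":memory:" keeps the data in RAM; a file path would persist it.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE measurements (site TEXT, value REAL)")
con.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)

# An analytical-style query (aggregate per group), answered in-process.
result = con.execute(
    "SELECT site, AVG(value) FROM measurements GROUP BY site ORDER BY site"
).fetchall()
con.close()
```

A script written this way carries its database with it, which addresses the "works on your laptop, fails when sharing scripts" complaint.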
A New Category is Born
"It turns out we invented a whole new class of database systems: the in-process analytics thing."
SQLite is amazing, but it is designed for transactional workloads, not analytical ones. An in-process database built for analytics did not exist yet.
The Prototype: MonetDB Lite
First attempt: Hack an existing system (MonetDB) into an in-process analytical database.
Result: Worked well enough to get people excited, but hit architectural issues:
- Can't just exit the process on errors (you're inside someone else's program)
- Can't use APIs that change global state (working directory, locale)
- "Really trivial stuff, but it adds up"
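The two constraints above can be sketched in Python. This is an illustration of the general idea, not MonetDB Lite or DuckDB code: `EngineError` and `load_table` are hypothetical names. A standalone server could abort on a fatal error or change its own working directory; a library shares the process with its host, so it reports errors to the caller and leaves global state alone.

```python
import os

class EngineError(Exception):
    """Hypothetical error type an embedded engine reports to its host."""

def load_table(path):
    """Read a file without calling sys.exit() or os.chdir()."""
    # Resolve the path explicitly instead of changing the working directory,
    # so the host program's global state stays untouched.
    full_path = os.path.abspath(path)
    try:
        with open(full_path, "r", encoding="utf-8") as f:
            return f.read().splitlines()
    except OSError as exc:
        # Hand the failure to the caller rather than killing the process.
        raise EngineError(f"cannot read {full_path}: {exc}") from exc

cwd_before = os.getcwd()
try:
    load_table("definitely-missing-table-4f2a.csv")
    outcome = "loaded"
except EngineError:
    # The host program is still running and can decide what to do next.
    outcome = "host recovered"
cwd_after = os.getcwd()
```

Retrofitting this discipline onto a codebase written as a server means auditing every error path and every call that touches process-wide state, which is why the small issues add up.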
Starting From Scratch
Hannes and Mark Raasveldt made a bold decision: build a completely new database system.
The risk:
- "Monumental undertaking"
- "Not successful 99% of the time"
- "Not great for your career as a researcher—you disappear for 5 years not writing papers"
The advantage: They already had product-market fit from the prototype. They knew what people wanted.
The Grind
"Mark and I basically did nothing else—evenings, weekends—just hacking on DuckDB for a couple of years."
SQL is complex. Supporting all the operators, edge cases, and details took massive effort.
Open Source: 2019
The release drew immediate positive feedback. The design resonated because it came from real user problems, not theoretical research.
The Lesson
"My definition of success is not to write papers—it's to have impact. In data systems research, you have impact by making systems."
The R community's willingness to try "crazy stuff that was barely working" and give honest feedback was essential. This kind of practitioner collaboration is rare in academia—but it's why DuckDB exists.