DuckDB vs Pandas vs Polars: Performance Comparison for Python

2023/06/01

TL;DR: DuckDB, Pandas, and Polars aren't enemies—they're complementary tools that integrate seamlessly via Apache Arrow. In benchmarks on 33M rows, DuckDB was fastest, Polars came second (with lazy evaluation), and Pandas ran out of memory.

The Three Frameworks Compared

Feature	DuckDB	Pandas	Polars
Type	In-process OLAP database	Data frame library	Data frame library
Backend	C++	NumPy (optional Arrow backend since 2.0)	Rust
Query style	SQL (+ relational API)	DataFrame methods	DataFrame + SQL
Lazy evaluation	Via SQL optimization	No	Yes
Larger-than-memory	Yes	No	Yes (lazy mode)

Why DuckDB in a Python Workflow?

"But why do I need a database in my Python environment?"

DuckDB is an in-process database—it runs in the same process as your Python app. Install with pip and you're ready to go.

Key advantages:

Blazingly fast: Vectorized columnar query execution
Extensions: JSON, Parquet, S3, spatial data, all loaded from within DuckDB with no extra pip packages
Single-file format: ACID-compliant database file
Arrow integration: Zero-copy data sharing with Pandas/Polars

Installation Size Matters

Comparing site-packages folder sizes:

DuckDB: Smallest footprint, extensions loaded on-demand
Polars: Lightweight compared to Pandas
Pandas: Largest, most dependencies
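
One rough way to reproduce this kind of comparison on your own machine is to sum the sizes of the files each installed distribution records. This is a sketch, not the method used in the video: the helper name installed_size_bytes is made up for illustration, and it counts only a package's own files, not the dependencies it pulls in, so it will understate the footprint of packages with many dependencies.

```python
import os
from importlib import metadata


def installed_size_bytes(dist_name):
    """Sum the on-disk size of the files recorded for an installed distribution.

    Returns None if the distribution is not installed or records no files.
    Dependencies are NOT included in the total.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return None
    if files is None:
        return None
    total = 0
    for entry in files:
        path = entry.locate()
        if os.path.isfile(path):
            total += os.path.getsize(path)
    return total


for name in ("duckdb", "polars", "pandas"):
    size = installed_size_bytes(name)
    label = "not installed" if size is None else f"{size / 1e6:.1f} MB"
    print(f"{name}: {label}")
```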

"Less dependencies, less code, less problems."

Syntax Comparison

The same operation (extract domain, group by, count) can be written in:

DuckDB SQL: Using regexp_extract and GROUP BY
DuckDB Relational API: Chained method calls in Python
Polars: DataFrame method chaining with str.extract

Each has its own syntax style, but all achieve the same result.

Benchmark: 33M Rows of Hacker News Data

Task: Read Parquet, extract the domain from each URL, group by domain, count rows, and write the result to S3

Framework	Result
DuckDB	Fastest
Polars (lazy)	Second
Pandas	Out of memory

Important: Polars required lazy evaluation (LazyFrame) to avoid memory blowup. You need to know the framework's optimization features to use it correctly.

The Apache Arrow Advantage

All three frameworks support Arrow, enabling:

Zero-copy conversion between DuckDB, Pandas, and Polars
Query Pandas with SQL via DuckDB
Mix and match the best tool for each step

DuckDB provides methods to convert results directly to Pandas DataFrames or Polars DataFrames, and can query existing DataFrames with SQL.

Versatility

DuckDB: CLI, Python, Rust, Java, Swift (mobile!)
Polars: Python, Rust, new CLI
Pandas: Python only, but massive visualization ecosystem (Seaborn, Plotly, etc.)

Should You Use DuckDB?

"It depends on your use case... but DuckDB can easily be installed with just pip install. It adds little overhead to your development. You should do it."

TL;DR: Leverage the best of all worlds—they work together.
