DuckDB vs Pandas vs Polars For Python devs

2023/06/01

TL;DR: DuckDB, Pandas, and Polars aren't enemies—they're complementary tools that integrate seamlessly via Apache Arrow. In benchmarks on 33M rows, DuckDB was fastest, Polars came second (with lazy evaluation), and Pandas ran out of memory.

The Three Frameworks Compared

| Feature | DuckDB | Pandas | Polars |
| --- | --- | --- | --- |
| Type | In-process OLAP database | Data frame library | Data frame library |
| Backend | C++ | NumPy → Arrow (2.0+) | Rust |
| Query style | SQL (+ relational API) | DataFrame methods | DataFrame + SQL |
| Lazy evaluation | Via SQL optimization | No | Yes |
| Larger-than-memory | Yes | No | Yes (lazy mode) |

Why DuckDB in a Python Workflow?

"But why do I need a database in my Python environment?"

DuckDB is an in-process database—it runs in the same process as your Python app. Install with pip and you're ready to go.

Key advantages:

  • Blazingly fast: Vectorized columnar query execution
  • Built-in extensions: JSON, Parquet, S3, spatial data—no extra pip packages
  • Single-file format: ACID-compliant database file
  • Arrow integration: Zero-copy data sharing with Pandas/Polars

Installation Size Matters

Comparing site-packages folder sizes:

  • DuckDB: Smallest footprint, extensions loaded on-demand
  • Polars: Lightweight compared to Pandas
  • Pandas: Largest, most dependencies

"Less dependencies, less code, less problems."

Syntax Comparison

The same operation (extract the domain from each URL, group by domain, count the rows) can be written in:

  • DuckDB SQL: Using regexp_extract and GROUP BY
  • DuckDB Relational API: Chained method calls in Python
  • Polars: DataFrame method chaining with str.extract

Each has its own syntax style, but all achieve the same result.

Benchmark: 33M Rows of Hacker News Data

Task: Read Parquet, extract domains, group by, count, write to S3

| Framework | Result |
| --- | --- |
| DuckDB | Fastest |
| Polars (lazy) | Second |
| Pandas | Out of memory |

Important: Polars required lazy evaluation (LazyFrame) to avoid memory blowup. You need to know the framework's optimization features to use it correctly.

The Apache Arrow Advantage

All three frameworks support Arrow, enabling:

  • Zero-copy conversion between DuckDB, Pandas, and Polars
  • Query Pandas with SQL via DuckDB
  • Mix and match the best tool for each step

DuckDB provides methods to convert results directly to Pandas DataFrames or Polars DataFrames, and can query existing DataFrames with SQL.

Versatility

  • DuckDB: CLI, Python, Rust, Java, Swift (mobile!)
  • Polars: Python, Rust, new CLI
  • Pandas: Python only, but with a massive visualization ecosystem (Seaborn, Plotly, etc.)

Should You Use DuckDB?

"It depends on your use case... but DuckDB can easily be installed with just pip install. It adds little overhead to your development. You should do it."

TL;DR: Leverage the best of all worlds—they work together.
