TL;DR: DuckDB, Pandas, and Polars aren't enemies—they're complementary tools that integrate seamlessly via Apache Arrow. In benchmarks on 33M rows, DuckDB was fastest, Polars came second (with lazy evaluation), and Pandas ran out of memory.
The Three Frameworks Compared
| Feature | DuckDB | Pandas | Polars |
|---|---|---|---|
| Type | In-process OLAP database | Data frame library | Data frame library |
| Backend | C++ | NumPy (optional PyArrow backend since 2.0) | Rust |
| Query style | SQL (+ relational API) | DataFrame methods | DataFrame + SQL |
| Lazy evaluation | Via SQL optimization | No | Yes |
| Larger-than-memory | Yes | No | Yes (lazy mode) |
Why DuckDB in a Python Workflow?
"But why do I need a database in my Python environment?"
DuckDB is an in-process database—it runs in the same process as your Python app. Install with pip and you're ready to go.
Key advantages:
- Blazingly fast: Vectorized columnar query execution
- Extensions: JSON, Parquet, S3, and spatial support either ship with DuckDB or install on demand, with no extra pip packages
- Single-file format: ACID-compliant database file
- Arrow integration: Zero-copy data sharing with Pandas/Polars
Installation Size Matters
Comparing site-packages folder sizes:
- DuckDB: Smallest footprint, extensions loaded on-demand
- Polars: Lightweight compared to Pandas
- Pandas: Largest, most dependencies
"Less dependencies, less code, less problems."
Syntax Comparison
The same operation (extract domain, group by, count) can be written in:
- DuckDB SQL: Using regexp_extract and GROUP BY
- DuckDB Relational API: Chained method calls in Python
- Polars: DataFrame method chaining with str.extract
Each has its own syntax style, but all achieve the same result.
Benchmark: 33M Rows of Hacker News Data
Task: Read Parquet, extract domains, group by, count, write to S3
| Framework | Result |
|---|---|
| DuckDB | Fastest |
| Polars (lazy) | Second |
| Pandas | Out of memory |
Important: Polars required lazy evaluation (LazyFrame) to avoid memory blowup. You need to know the framework's optimization features to use it correctly.
The Apache Arrow Advantage
All three frameworks support Arrow, enabling:
- Zero-copy conversion between DuckDB, Pandas, and Polars
- Query Pandas with SQL via DuckDB
- Mix and match the best tool for each step
DuckDB provides methods to convert results directly to Pandas DataFrames or Polars DataFrames, and can query existing DataFrames with SQL.
Versatility
- DuckDB: CLI, Python, Rust, Java, Swift (mobile!)
- Polars: Python, Rust, new CLI
- Pandas: Python only, but massive visualization ecosystem (Seaborn, Plotly, etc.)
Should You Use DuckDB?
"It depends on your use case... but DuckDB can easily be installed with just pip install. It adds little overhead to your development. You should do it."
TL;DR: Leverage the best of all worlds—they work together.


