pg_duckdb: Postgres analytics just got faster with DuckDB
2024/10/25
PostgreSQL is on a roll. It was named DBMS of the Year in 2023 and continues to be the most popular database among developers, according to the 2024 Stack Overflow survey. It's robust, reliable, and fantastic for transactional workloads (OLTP), the bread and butter of most applications.
Many of us start our analytics journey right inside Postgres. Our application data is already there, so it's the easiest place to start asking questions. You might want to know, "How much revenue did we generate in a specific country last quarter?" or, "How many of our customers have names that rhyme with 'duck'?"
But as you ask more complex questions across larger datasets, you inevitably hit a wall. Postgres, a row-oriented database optimized for transactions, can become painfully slow for these analytical (OLAP) queries. The common solution involves setting up a separate data warehouse and a complex ETL pipeline to move data, which adds overhead and delays insights.
This is where DuckDB shines. As a fast, open-source, in-process OLAP database, it’s built from the ground up for analytics. Its popularity is growing exponentially for good reason.
So, here's the central question: What if you could get the analytical power of DuckDB without ever leaving your favorite PostgreSQL client?
Meet pg_duckdb
pg_duckdb is a new PostgreSQL extension that integrates DuckDB’s analytical engine directly into Postgres. It's an open-source project born from a partnership between Hydra and MotherDuck, designed to bring the best of both worlds together.
Today, you can. Let's get your psql client warmed up.
Part 1: Accelerate Local Analytics by 500x
Instead of moving your data, pg_duckdb brings the query engine to your data, giving you a massive performance boost right where you work.
Hands-On Demo: The 500x Performance Gain
Let's see it in action. The easiest way to get started is with the official Docker image, which comes with everything pre-configured.
# Run the container with a permissive password setting for this demo
# In production, you should manage secrets properly!
docker run -d --name pgduckdb -e POSTGRES_HOST_AUTH_METHOD=trust ghcr.io/hydradatabase/pg_duckdb:main
Once the container is running, you can connect directly with psql:
docker exec -it pgduckdb psql -U postgres
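Once you're in, it's worth a quick sanity check that the extension is actually available before running anything heavy. A minimal check, assuming the extension is registered under the name pg_duckdb:
-- Confirm the pg_duckdb extension is installed and see its version
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_duckdb';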
For this demo, we've pre-loaded a 1GB TPC-DS dataset, a standard benchmark for analytical systems. First, let's run a complex analytical query using the native PostgreSQL engine. We'll use \timing to measure how long it takes.
-- psql
\timing
-- TPC-DS Query 1
SELECT
c.c_customer_id AS customer_id,
c.c_first_name AS customer_first_name,
c.c_last_name AS customer_last_name,
-- ... (rest of the complex query)
FROM
customer c,
customer_address ca,
customer_demographics cd
WHERE
ca.ca_city = 'Hopewell'
-- ... (more joins and conditions)
GROUP BY
c.c_customer_id,
c.c_first_name,
c.c_last_name;
Running this query on our instance took 1 minute and 29 seconds.
Time: 89000.123 ms (01:29.000)
Now, let's unleash the duck. With pg_duckdb, all we have to do is enable the DuckDB engine with a simple SET command and run the exact same query.
-- Enable DuckDB execution for this session
SET duckdb.force_execution = true;
-- Run the exact same TPC-DS query again
SELECT
c.c_customer_id AS customer_id,
-- ... (same query as before)
The result? 137 milliseconds.
Time: 137.000 ms
That's not a typo. We went from nearly a minute and a half to just over a tenth of a second, a speedup of well over 500x. This is the power of DuckDB's columnar engine, which is purpose-built for analytical queries and reads only the columns a query actually needs. This incredible boost is achieved with zero changes to how your data is stored; it's still a regular PostgreSQL table, but you're swapping in a more efficient engine for the job.
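If you're curious whether a statement is really being handled by DuckDB, EXPLAIN is a quick way to check: with the engine enabled, the plan should show a DuckDB custom scan node instead of the usual Postgres scans and joins. A minimal sketch against the same TPC-DS tables (the exact node name can vary between pg_duckdb versions):
-- With duckdb.force_execution enabled, expect a DuckDB custom scan node in the plan
EXPLAIN SELECT ca.ca_city, count(*)
FROM customer c
JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk
GROUP BY ca.ca_city;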
Part 2: Turn Postgres into a Data Lake Gateway
The power of pg_duckdb goes far beyond just speeding up queries on local tables. It brings the entire DuckDB extension ecosystem into PostgreSQL, turning your database into a true data hub. This allows you to query Parquet files on S3, read from Apache Iceberg tables, and more, all from within psql.
For instance, you can query Parquet files directly from your data lake. DuckDB's read_parquet() function works seamlessly, as shown in this query that reads a public dataset from S3 to find top TV shows.
-- Remember to keep duckdb.force_execution = true;
SELECT "Title", "Days In Top 10"
FROM read_parquet('s3://us-west-2.opendata.source.coop/netflix/daily_top_10/day=*/country_name=United States/*.parquet')
WHERE "Days In Top 10" > 200
ORDER BY "Days In Top 10" DESC;
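The Netflix dataset above is public, so no credentials are required. For private buckets, pg_duckdb lets you register S3 credentials from SQL via its duckdb.secrets table; the sketch below uses placeholder values, and the exact column names may differ between pg_duckdb versions, so check the docs for the release you're running:
-- Register placeholder S3 credentials so DuckDB can read private buckets
INSERT INTO duckdb.secrets (type, key_id, secret, region)
VALUES ('S3', 'my_access_key_id', 'my_secret_access_key', 'us-west-2');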
You can even connect to modern table formats like Apache Iceberg by installing the necessary DuckDB extension on the fly.
-- Install the Iceberg extension; pg_duckdb makes installed extensions available to subsequent queries
SELECT duckdb.install_extension('iceberg');
-- Query an Iceberg table stored on S3
SELECT *
FROM iceberg_scan('s3://my-iceberg-bucket/warehouse/db/table')
LIMIT 10;
It's also a two-way street. You can use pg_duckdb to export data from PostgreSQL back to your data lake. The standard COPY command, when used with the DuckDB engine, can write to Parquet on S3.
-- Export a Postgres table to Parquet on S3
COPY (SELECT * FROM my_postgres_table)
TO 's3://my-backup-bucket/my_table.parquet'
WITH (FORMAT 'parquet');
This opens up powerful new workflows, like backing up large tables, exporting data for your data team, or importing valuable datasets from the lake directly into Postgres to support your applications.
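As a sketch of that last workflow, importing from the lake can be a single statement: materialize the result of a read_parquet() query into an ordinary Postgres table. The table name here is just an example, and the bucket path is the public dataset from earlier:
-- Materialize data-lake rows into a regular Postgres table
CREATE TABLE netflix_top10 AS
SELECT "Title", "Days In Top 10"
FROM read_parquet('s3://us-west-2.opendata.source.coop/netflix/daily_top_10/day=*/country_name=United States/*.parquet')
WHERE "Days In Top 10" > 200;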
Part 3: Scale Analytics in the Cloud with MotherDuck
Running large analytical queries, even with DuckDB's speed, can still consume significant CPU and memory. On a production PostgreSQL instance that's also handling application transactions, this can create resource contention that slows down your application.
This is where MotherDuck comes in. MotherDuck is a serverless analytics platform powered by DuckDB. With pg_duckdb, you can seamlessly offload heavy analytical workloads to MotherDuck's cloud compute, protecting your production database without ever leaving your Postgres environment.
Connecting Postgres to MotherDuck
Connecting is simple. First, sign up for a free MotherDuck account and get a service token from the UI.
Next, you need to add this token to your postgresql.conf file.
# postgresql.conf
# Add this line with your token
duckdb.motherduck_token = 'md_my_super_secret_token'
After adding the token, restart your PostgreSQL instance.
Security Best Practice: Hardcoding secrets is not ideal. For a more secure setup, pg_duckdb can also read the token from the motherduck_token environment variable of the PostgreSQL server process, so the token never has to appear in postgresql.conf at all.
Hybrid Queries: The Best of Both Worlds
Once connected, you can query MotherDuck directly from psql. MotherDuck includes a shared database, sample_data, which you can query immediately. Let's count the mentions of "DuckDB" in Hacker News titles from 2022.
-- This query runs on MotherDuck's cloud infrastructure
SELECT
date_trunc('month', "timestamp") AS month,
count(*) AS mentions
FROM sample_data.hacker_news.stories_2022
WHERE
lower(title) LIKE '%duckdb%'
GROUP BY 1
ORDER BY 1;
The true power lies in moving data effortlessly between your local Postgres instance and MotherDuck.
1. Pulling analytical results from MotherDuck into a local PostgreSQL table:
-- Create a local Postgres table from a MotherDuck query result
CREATE TABLE local_duckdb_mentions AS
SELECT
date_trunc('month', "timestamp") AS month,
count(*) AS mentions
FROM sample_data.hacker_news.stories_2022
WHERE
lower(title) LIKE '%duckdb%'
GROUP BY 1;
2. Pushing a local PostgreSQL table up to MotherDuck:
-- Create a table in MotherDuck from a local Postgres table
CREATE TABLE my_motherduck_backup.public.customer_archive
USING duckdb AS
SELECT * FROM public.customer;
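In both directions, the moved data behaves like any other table. For example, the locally created table from step 1 is an ordinary Postgres table that any client or tool can read, with no DuckDB engine involved:
-- Plain Postgres query against the locally materialized result
SELECT month, mentions
FROM local_duckdb_mentions
ORDER BY month;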
This seamless, bi-directional data movement gives you ultimate flexibility, all without leaving the comfort of your psql prompt.
Conclusion: The Best of Both Worlds
With pg_duckdb, you truly get the power of the duck in the elephant's hands. You can accelerate local analytics by orders of magnitude without changing your data storage, query your data lake (S3, Iceberg) directly from your operational database, and seamlessly scale your analytics by offloading heavy work to MotherDuck's serverless platform.
pg_duckdb is currently in Beta, and we're excited about what comes next. The success of DuckDB is all about simplicity, and we're thrilled to bring that simplicity directly to PostgreSQL users in their existing database.
Get Started Today
- Try it now: The fastest way to start is with the Docker container.
- Check out the code: The project is open source on GitHub.
- Share your feedback: We're actively developing the roadmap. Please open an issue with feature requests or feedback!
In the meantime, keep quacking and keep coding.