Why Your Postgres Analytics Are So Slow (And 3 Ways to Fix It)
2025/07/18
Why Postgres Struggles with Analytical Queries
Postgres has become ubiquitous in modern applications, powering backend APIs and web applications with exceptional performance for transactional workloads. Its row-oriented storage and MVCC (Multi-Version Concurrency Control) design excel at handling classic database operations like inserting users, updating orders, and deleting expired sessions.
However, when it comes to analytical questions—such as calculating revenue by region over six months—Postgres shows its limitations. These types of queries require scanning large amounts of data across multiple columns, which is where column-oriented analytical database systems prove much more suitable.
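For instance, a query like the one below (over a hypothetical orders table) has to scan a large share of the table across several columns, which is exactly the access pattern row-oriented storage handles poorly:

```sql
-- Hypothetical schema: orders(order_id, region, amount, order_date).
-- Revenue by region over the last six months: a wide scan plus aggregation,
-- the kind of query a columnar engine is built for.
SELECT region, SUM(amount) AS revenue
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY region
ORDER BY revenue DESC;
```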
Three Solutions to Accelerate Postgres Analytics
Solution 1: DuckDB Postgres Extension
The DuckDB Postgres extension (previously called the Postgres scanner) offers the simplest path to improved analytics performance. This approach uses the Postgres wire protocol directly and works with any DuckDB client—Python, Java, Golang, or others.
Setup Process:
- Create a secret of type Postgres
- Use the ATTACH command to connect your Postgres database
- Query Postgres tables as regular tables (see the sketch after this list)
- Optionally connect to MotherDuck for cloud-based analytics
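A minimal sketch of that workflow in the DuckDB CLI; the connection details, table names, and the MotherDuck database name are placeholders:

```sql
-- Load the Postgres extension (recent DuckDB versions can autoload it).
INSTALL postgres;
LOAD postgres;

-- Store connection details as a Postgres-type secret (values are placeholders).
CREATE SECRET pg_secret (
    TYPE postgres,
    HOST 'localhost',
    PORT 5432,
    DATABASE 'appdb',
    USER 'analytics_ro',
    PASSWORD 'secret'
);

-- Attach the Postgres database; DuckDB picks up the Postgres secret created above.
ATTACH '' AS pg (TYPE postgres);

-- Query Postgres tables as if they were regular DuckDB tables.
SELECT region, SUM(amount) AS revenue
FROM pg.public.orders
GROUP BY region;

-- Optionally attach MotherDuck (expects a motherduck_token in the environment)
-- and copy data over with a plain CREATE TABLE AS SELECT.
ATTACH 'md:';
CREATE TABLE my_md_db.orders_copy AS SELECT * FROM pg.public.orders;
```

From there, INSERT INTO ... SELECT works the same way if you prefer to move data over in batches.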
Pros:
- Simple installation with just a DuckDB client
- Works locally or in the cloud
- Great for ad-hoc analysis on smaller datasets
- No modifications to Postgres setup required
- Only needs username and password
Cons:
- Network bottlenecks from data transfer between Postgres and DuckDB
- Limited push-down capabilities for complex operations
- Puts pressure on Postgres server (can be mitigated with replicas)
- Some compute happens on Postgres, some on DuckDB
Solution 2: PG DuckDB - Embedded Analytics
PG DuckDB takes a different approach by embedding DuckDB directly inside Postgres as an extension. This allows querying Postgres tables using the embedded DuckDB engine while accessing all DuckDB features like reading Parquet files and Iceberg tables over S3.
Key Features:
- Query within the Postgres environment
- Create tables using the DuckDB engine with the USING duckdb syntax
- Move data between Postgres and MotherDuck seamlessly (see the sketch after this list)
- Export to object storage using COPY commands
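A rough sketch from psql, assuming pg_duckdb is already installed on the server; table names, the S3 path, and the exact MotherDuck call are placeholders that may differ between pg_duckdb versions:

```sql
-- Enable the extension (only possible where custom extensions are allowed).
CREATE EXTENSION IF NOT EXISTS pg_duckdb;

-- Optionally force analytical queries through the embedded DuckDB engine
-- (GUC name per pg_duckdb's docs; adjust to your version).
SET duckdb.force_execution = true;

-- A table stored and scanned by the DuckDB engine inside Postgres.
CREATE TABLE analytics_orders (
    order_id   bigint,
    region     text,
    amount     numeric,
    order_date date
) USING duckdb;

-- Plain Postgres tables can still be queried, now executed by DuckDB.
SELECT region, SUM(amount) AS revenue
FROM public.orders
GROUP BY region;

-- Export query results to object storage via COPY (bucket path is a placeholder).
COPY (SELECT * FROM public.orders)
TO 's3://my-bucket/orders.parquet' (FORMAT parquet);

-- To offload to MotherDuck, enable it first (e.g. CALL duckdb.enable_motherduck();
-- the exact invocation depends on your pg_duckdb release, so check its docs).
```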
Pros:
- No need to leave Postgres environment
- Significant speed improvements for analytical queries
- Easy data offloading to object storage or MotherDuck
- Excellent performance with external Parquet files
- No network overhead for local Postgres data
Cons:
- Requires custom extension installation
- Most managed Postgres services don't allow custom extensions
- DuckDB aggressively uses CPU and memory, potentially impacting transactional workloads
- Requires careful monitoring to prevent resource contention
Solution 3: CDC (Change Data Capture)
Change Data Capture provides a real-time synchronization approach, keeping your analytical database perfectly synced with your transactional database. Popular implementations include Debezium with Kafka or vendors like Airbyte.
How it works:
- Configure Postgres for logical replication (see the sketch after this list)
- Install and configure CDC tool (Debezium/Kafka or vendor solution)
- CDC reads the WAL (Write-Ahead Log) and streams changes
- Changes propagate to DuckDB or MotherDuck
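A minimal sketch of the Postgres side of such a pipeline; the publication, slot, and table names are placeholders, and the CDC tool itself (Debezium/Kafka or a vendor) is configured separately and often creates its own slot:

```sql
-- Turn on logical replication (requires superuser and a server restart).
ALTER SYSTEM SET wal_level = 'logical';

-- Publish the tables whose changes should be captured.
CREATE PUBLICATION analytics_pub FOR TABLE public.orders, public.customers;

-- Create a replication slot the CDC tool will read the WAL from;
-- Debezium can manage its own slot, so this step may be handled for you.
SELECT pg_create_logical_replication_slot('analytics_slot', 'pgoutput');
```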
Pros:
- Near real-time synchronization (changes appear within seconds)
- Complete workload separation
- Automatic schema evolution
- No impact on transactional queries
Cons:
- High operational complexity
- Requires setup and maintenance of additional infrastructure
- Increased disk I/O and storage requirements from WAL
- Risk of processing lag during high-volume periods
Choosing the Right Solution
The choice between these three approaches depends on your specific requirements:
- DuckDB Postgres Extension: Best for ad-hoc analysis and smaller data movements
- PG DuckDB: Ideal when you can install extensions and want embedded analytics
- CDC Pipeline: Perfect for real-time requirements and complete workload isolation
Operational Best Practices
Regardless of the chosen approach, following these best practices ensures optimal performance:
- Use Replicas: Always run analytical workloads on read replicas, especially with PG DuckDB
- Monitor Resources: Track CPU, memory, I/O, and network usage closely
- Profile Queries: Understand bottlenecks and leverage optimization capabilities (for example with EXPLAIN ANALYZE, as sketched below)
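For example, Postgres's EXPLAIN ANALYZE shows where time actually goes in a slow analytical query (the query below is a placeholder); DuckDB offers an equivalent EXPLAIN ANALYZE on its side:

```sql
-- Inspect the real execution plan, timings, and buffer usage of an analytical query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT region, SUM(amount) AS revenue
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY region;
```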
Each solution offers unique advantages for accelerating Postgres analytics. The key is matching the approach to your specific use case, infrastructure constraints, and performance requirements.
Transcript
0:00 Everybody is using Postgres nowadays. You see it everywhere: backend APIs, web apps, you name it. It's fantastic for what it is designed for: transactional workloads.
0:10 You know, the classic insert this user, update that order, delete those expired sessions. Postgres handles this beautifully with its row-oriented storage and MVCC design. But here is where things get interesting. As soon as you start asking analytical questions, you know, the fun stuff like what's our revenue by region over the last six months? Well, there is an
0:34 elephant in the room. I mean, literally. So why is it so slow? That's where typical analytical database systems, which are column-oriented, are much more suitable. They're basically built for this stuff. But even if you picked the right OLAP database, how do you move smoothly from your Postgres to your shiny new analytical database? Plot twist: the
0:57 answer is actually also sometimes DuckDB. So in this video, I'll explore three ways to connect DuckDB, or MotherDuck, your DuckDB in the cloud, with Postgres to offload your analytical queries. So first, the DuckDB Postgres extension, for when you just want to get started and be done.
1:16 So DuckDB has various extensions, and it has a Postgres extension, also previously called the Postgres scanner, and this is the simplest path. It uses the Postgres wire protocol directly and it's literally plug and play. You get any DuckDB client, Python, Java, Golang, whatever floats your boat, or rather floats your duck, and you're ready to go.
1:39 For this demo I'll show you using the DuckDB CLI because, well, I'm not scared of a dark terminal. So first we create a secret of type Postgres, and then we use the ATTACH command. The ATTACH command basically enables you to attach any database, a DuckDB database or, here, a Postgres database, and because we created our secret safely just before, DuckDB
2:04 will pick up the first Postgres secret available, and that's basically it: two commands and now you can start querying your Postgres tables as if they were regular tables. To connect to MotherDuck you just add another ATTACH, provided that you have a MotherDuck token available. And now you can easily also move your data from your Postgres tables to MotherDuck
2:28 using simple CREATE TABLE AS SELECT
2:32 statements, or INSERT queries if you want to move your data in batches. It's basically the same strategy whether you move your data to a local DuckDB or to MotherDuck in the cloud. So let's cover the pros and cons quickly. It's really simple.
2:47 You just install a DuckDB client and you are ready to go. DuckDB can run locally or in your cloud, your choice.
2:54 It's great for ad-hoc analysis on smaller data sets or moving data in smaller batches. You don't need to touch your Postgres setup. You just need a user and a password and that's enough.
3:05 Regarding the cons, well, network bottleneck: data has to travel over the wire between your Postgres and your DuckDB client. There is limited pushdown.
3:15 While it supports some projection and filter pushdown, complex operations that can't be pushed down to the Postgres side are going to happen on the DuckDB side, potentially requiring more data transfer than necessary. You also put pressure on Postgres depending on your query. Some compute happens on Postgres and some on DuckDB. You can mitigate this
3:39 by using a Postgres replica. So the Postgres extension is great for moving the data first and then doing your analytics using DuckDB. Second option: PG DuckDB.
3:49 DuckDB inside Postgres. Now this is where things get spicy. PG DuckDB is a Postgres extension. So rather than having DuckDB as a separate client, it's embedded directly in Postgres. Now you can query Postgres tables using the embedded DuckDB engine. But you also get access to all the DuckDB features like reading Parquet files, Iceberg tables over S3, and more. And finally, you can
4:14 easily connect again to MotherDuck to offload your compute in the cloud. For this demo, I'm running a local Postgres container that already has the pg_duckdb extension installed because, yes, I like to live dangerously. As you can see, we already pass our MotherDuck token as an environment variable, assuming it already exists on your machine, and then you can connect to the psql client
4:38 using a docker exec command. Then, if you want to enable MotherDuck, you just call the DuckDB enable MotherDuck function, and now you're ready to create tables using the DuckDB engine with the USING duckdb keywords, still query existing Postgres tables using the DuckDB engine, and move data around or even export to object storage using the
5:02 COPY command, for example. So what are the pros of this? Well, you don't even need to leave the Postgres environment. You install the extension and you're good to go. You can run the same analytical queries and get insane speedups. And offloading data to object storage or MotherDuck is straightforward. You can query external
5:23 files like Parquet, and that performs really well since they are already in columnar format. You don't have network overhead for data already present in Postgres because the DuckDB engine is embedded there. The cons, of course: you need a custom extension, and most of the time managed Postgres services won't let you install custom extensions.
5:44 If you are using a managed Postgres service, you are also fighting for resources: DuckDB is designed to aggressively use available CPU and memory, which can impact your transactional workloads. Again, you should use a replica anyway. Operational monitoring is needed: you need to monitor resource usage carefully so that your analytical queries are not competing with your transactional queries. The third option is CDC, change data capture,
6:08 aka the real-time everything approach. So change data capture is the classic set-it-and-forget-it method for keeping your analytical database in perfect sync with your upstream transactional database. One well-known, mature approach is using Debezium in combination with Kafka, but that requires quite an overhead. You also have vendors like Airbyte that can help you
6:32 manage a CDC pipeline between Postgres and your target OLAP database, here DuckDB or MotherDuck. Conceptually, you configure Postgres for logical replication, then you install and configure Debezium with Kafka, or your CDC vendor, and the CDC tool reads the WAL and streams changes to your DuckDB directly or to your Kafka topic. So what are the pros and cons of this approach?
6:59 You have near real-time synchronization, so changes appear in DuckDB within seconds. Complete workload separation: analytics doesn't impact your transactional database at all. You have automatic schema evolution: changes to your Postgres schema can be propagated directly to your target sink. And the cons, of course: high operational complexity.
7:23 You need to set up, monitor, and maintain Kafka and Debezium, or set up another vendor,
7:29 which of course is going to be easier to maintain in terms of operational complexity. There is the WAL impact: it increases disk I/O and storage requirements on your Postgres servers. And finally, there is processing lag risk: the CDC process can fall behind during high-volume write periods. So when to pick which solution? Well, here is a recap table of the three options we just
7:51 saw: the DuckDB Postgres extension, the PG DuckDB extension, and setting up a CDC pipeline. Whatever path you choose, remember that with great power comes great responsibility. So don't forget about operational best practices. Spoiler alert: use replicas, for God's sake. For PG DuckDB especially, a dedicated read replica is highly recommended. Monitor your resources: keep a close eye on CPU, memory, I/O, and network usage. And
8:20 finally, profile your queries. This is for understanding where bottlenecks lie and leveraging optimization capabilities. Let me know in the comments which approach you've chosen for which use case, and if you're already using any of these methods, I'd love to hear how it's working out for you. In the meantime, take care of your data pipelines and I'll see you in the next one.