Why Your Postgres Analytics Are So Slow (And 3 Ways to Fix It)
2025/07/18
Why Postgres Struggles with Analytical Queries
Postgres has become ubiquitous in modern applications, powering backend APIs and web applications with exceptional performance for transactional workloads. Its row-oriented storage and MVCC (Multi-Version Concurrency Control) design excel at handling classic database operations like inserting users, updating orders, and deleting expired sessions.
However, when it comes to analytical questions—such as calculating revenue by region over six months—Postgres shows its limitations. These types of queries require scanning large amounts of data across multiple columns, which is where column-oriented analytical database systems prove much more suitable.
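For instance, a query like the one below (over a hypothetical orders table) has to scan a large share of the table across several columns, which is exactly the access pattern row-oriented storage handles poorly:

```sql
-- Hypothetical schema: orders(order_id, region, amount, order_date).
-- Revenue by region over the last six months: a wide scan plus aggregation,
-- the kind of query a columnar engine is built for.
SELECT region, SUM(amount) AS revenue
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY region
ORDER BY revenue DESC;
```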
Three Solutions to Accelerate Postgres Analytics
Solution 1: DuckDB Postgres Extension
The DuckDB Postgres extension (previously called the Postgres scanner) offers the simplest path to improved analytics performance. This approach uses the Postgres wire protocol directly and works with any DuckDB client—Python, Java, Golang, or others.
Setup Process:
- Create a secret of type Postgres
- Use the ATTACH command to connect your Postgres database
- Query Postgres tables as regular tables (see the sketch after this list)
- Optionally connect to MotherDuck for cloud-based analytics
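A minimal sketch of that workflow in the DuckDB CLI; the connection details, table names, and the MotherDuck database name are placeholders:

```sql
-- Load the Postgres extension (recent DuckDB versions can autoload it).
INSTALL postgres;
LOAD postgres;

-- Store connection details as a Postgres-type secret (values are placeholders).
CREATE SECRET pg_secret (
    TYPE postgres,
    HOST 'localhost',
    PORT 5432,
    DATABASE 'appdb',
    USER 'analytics_ro',
    PASSWORD 'secret'
);

-- Attach the Postgres database; DuckDB picks up the Postgres secret created above.
ATTACH '' AS pg (TYPE postgres);

-- Query Postgres tables as if they were regular DuckDB tables.
SELECT region, SUM(amount) AS revenue
FROM pg.public.orders
GROUP BY region;

-- Optionally attach MotherDuck (expects a motherduck_token in the environment)
-- and copy data over with a plain CREATE TABLE AS SELECT.
ATTACH 'md:';
CREATE TABLE my_md_db.orders_copy AS SELECT * FROM pg.public.orders;
```

From there, INSERT INTO ... SELECT works the same way if you prefer to move data over in batches.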
Pros:
- Simple installation with just a DuckDB client
- Works locally or in the cloud
- Great for ad-hoc analysis on smaller datasets
- No modifications to Postgres setup required
- Only needs username and password
Cons:
- Network bottlenecks from data transfer between Postgres and DuckDB
- Limited push-down capabilities for complex operations
- Puts pressure on Postgres server (can be mitigated with replicas)
- Some compute happens on Postgres, some on DuckDB
Solution 2: PG DuckDB - Embedded Analytics
PG DuckDB takes a different approach by embedding DuckDB directly inside Postgres as an extension. This allows querying Postgres tables using the embedded DuckDB engine while accessing all DuckDB features like reading Parquet files and Iceberg tables over S3.
Key Features:
- Query within the Postgres environment
- Create tables using the DuckDB engine with the USING duckdb syntax
- Move data between Postgres and MotherDuck seamlessly (see the sketch after this list)
- Export to object storage using COPY commands
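A rough sketch from psql, assuming pg_duckdb is already installed on the server; table names, the S3 path, and the exact MotherDuck call are placeholders that may differ between pg_duckdb versions:

```sql
-- Enable the extension (only possible where custom extensions are allowed).
CREATE EXTENSION IF NOT EXISTS pg_duckdb;

-- Optionally force analytical queries through the embedded DuckDB engine
-- (GUC name per pg_duckdb's docs; adjust to your version).
SET duckdb.force_execution = true;

-- A table stored and scanned by the DuckDB engine inside Postgres.
CREATE TABLE analytics_orders (
    order_id   bigint,
    region     text,
    amount     numeric,
    order_date date
) USING duckdb;

-- Plain Postgres tables can still be queried, now executed by DuckDB.
SELECT region, SUM(amount) AS revenue
FROM public.orders
GROUP BY region;

-- Export query results to object storage via COPY (bucket path is a placeholder).
COPY (SELECT * FROM public.orders)
TO 's3://my-bucket/orders.parquet' (FORMAT parquet);

-- To offload to MotherDuck, enable it first (e.g. CALL duckdb.enable_motherduck();
-- the exact invocation depends on your pg_duckdb release, so check its docs).
```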
Pros:
- No need to leave Postgres environment
- Significant speed improvements for analytical queries
- Easy data offloading to object storage or MotherDuck
- Excellent performance with external Parquet files
- No network overhead for local Postgres data
Cons:
- Requires custom extension installation
- Most managed Postgres services don't allow custom extensions
- DuckDB aggressively uses CPU and memory, potentially impacting transactional workloads
- Requires careful monitoring to prevent resource contention
Solution 3: CDC (Change Data Capture)
Change Data Capture provides a real-time synchronization approach, keeping your analytical database perfectly synced with your transactional database. Popular implementations include Debezium with Kafka or vendors like Airbyte.
How it works:
- Configure Postgres for logical replication (see the sketch after this list)
- Install and configure CDC tool (Debezium/Kafka or vendor solution)
- CDC reads the WAL (Write-Ahead Log) and streams changes
- Changes propagate to DuckDB or MotherDuck
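A minimal sketch of the Postgres side of such a pipeline; the publication, slot, and table names are placeholders, and the CDC tool itself (Debezium/Kafka or a vendor) is configured separately and often creates its own slot:

```sql
-- Turn on logical replication (requires superuser and a server restart).
ALTER SYSTEM SET wal_level = 'logical';

-- Publish the tables whose changes should be captured.
CREATE PUBLICATION analytics_pub FOR TABLE public.orders, public.customers;

-- Create a replication slot the CDC tool will read the WAL from;
-- Debezium can manage its own slot, so this step may be handled for you.
SELECT pg_create_logical_replication_slot('analytics_slot', 'pgoutput');
```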
Pros:
- Near real-time synchronization (changes appear within seconds)
- Complete workload separation
- Automatic schema evolution
- No impact on transactional queries
Cons:
- High operational complexity
- Requires setup and maintenance of additional infrastructure
- Increased disk I/O and storage requirements from WAL
- Risk of processing lag during high-volume periods
Choosing the Right Solution
The choice between these three approaches depends on your specific requirements:
- DuckDB Postgres Extension: Best for ad-hoc analysis and smaller data movements
- PG DuckDB: Ideal when you can install extensions and want embedded analytics
- CDC Pipeline: Perfect for real-time requirements and complete workload isolation
Operational Best Practices
Regardless of the chosen approach, following these best practices ensures optimal performance:
- Use Replicas: Always run analytical workloads on read replicas, especially with PG DuckDB
- Monitor Resources: Track CPU, memory, I/O, and network usage closely
- Profile Queries: Understand bottlenecks and leverage optimization capabilities (for example with EXPLAIN ANALYZE, as sketched below)
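For example, Postgres's EXPLAIN ANALYZE shows where time actually goes in a slow analytical query (the query below is a placeholder); DuckDB offers an equivalent EXPLAIN ANALYZE on its side:

```sql
-- Inspect the real execution plan, timings, and buffer usage of an analytical query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT region, SUM(amount) AS revenue
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY region;
```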
Each solution offers unique advantages for accelerating Postgres analytics. The key is matching the approach to your specific use case, infrastructure constraints, and performance requirements.
Transcript
0:00 Everybody is using Postgres nowadays. You see it everywhere: backend APIs, web apps, you name it. It's fantastic for what it is designed for: transactional workloads.
0:10 You know, the classic insert this user, update that order, delete those expired sessions. Postgres handles this beautifully with its row-oriented storage and MVCC design. But here is where things get interesting. As soon as you start asking analytical questions, you know, the fun stuff like what's our revenue by region over the last six months? Well, there is an
0:34 elephant in the room. I mean, literally. So why is it so slow? That's where typical analytical database systems, which are column-oriented, are much more suitable. They're basically built for this stuff. But even if you picked the right OLAP database, how do you move smoothly from your Postgres to your shiny new analytical database? Plot twist: the
0:57 answer is actually also sometimes DuckDB. So in this video, I'll explore three ways to connect DuckDB, or MotherDuck, your DuckDB in the cloud, with Postgres to offload your analytical queries. So first, the DuckDB Postgres extension, for when you just want to get started and be done.
1:16 So DuckDB has various extensions, and it has a Postgres extension, also previously called the Postgres scanner, and this is the simplest path. It uses the Postgres wire protocol directly and it's literally plug and play. You get any DuckDB client, Python, Java, Golang, whatever floats your boat, or rather floats your duck, and you're ready to go.
1:39 For this demo I'll show you using the DuckDB CLI because, well, I'm not scared of a dark terminal. So first we create a secret of type Postgres, and then we use the ATTACH command. The ATTACH command basically enables you to attach any database, a DuckDB database or, here, a Postgres database, and because we created our secret safely just before, DuckDB
2:04 will pick up the first Postgres secret available, and that's basically it: two commands and now you can start querying your Postgres tables as if they were regular tables. To connect to MotherDuck you just add another ATTACH, provided that you have a MotherDuck token available. And now you can easily also move your data from your Postgres tables to MotherDuck
2:28 using simple CREATE TABLE AS SELECT
2:32 statements, or INSERT queries if you want to move your data in batches. It's basically the same strategy whether you move your data to a local DuckDB or to MotherDuck in the cloud. So let's cover the pros and cons quickly. It's really simple.
2:47 You just install a DuckDB client and you are ready to go. DuckDB can run locally or in your cloud, your choice.
2:54 It's great for ad-hoc analysis on smaller data sets or moving data in smaller batches. You don't need to touch your Postgres setup. You just need a user and a password and that's enough.
3:05 Regarding the cons, well, network bottleneck: data has to travel over the wire between your Postgres and your DuckDB client. There is limited pushdown.
3:15 While it supports some projection and filter pushdown, complex operations that can't be pushed down to the Postgres side are going to happen on the DuckDB side, potentially requiring more data transfer than necessary. You also put pressure on Postgres depending on your query. Some compute happens on Postgres and some on DuckDB. You can mitigate this
3:39 by using a Postgres replica. So the Postgres extension is great for moving the data first and then doing your analytics using DuckDB. Second option: PG DuckDB.
3:49 DuckDB inside Postgres. Now this is where things get spicy. PG DuckDB is a Postgres extension. So rather than having DuckDB as a separate client, it's embedded directly in Postgres. Now you can query Postgres tables using the embedded DuckDB engine. But you also get access to all the DuckDB features like reading Parquet files, Iceberg tables over S3, and more. And finally, you can
4:14 easily connect again to MotherDuck to offload your compute in the cloud. For this demo, I'm running a local Postgres container that already has the pg_duckdb extension installed because, yes, I like to live dangerously. As you can see, we already pass our MotherDuck token as an environment variable, assuming it already exists on your machine, and then you can connect to the psql client
4:38 using a docker exec command. Then, if you want to enable MotherDuck, you just call the DuckDB enable MotherDuck function, and now you're ready to create tables using the DuckDB engine with the USING duckdb keywords, still query existing Postgres tables using the DuckDB engine, and move data around or even export to object storage using the
5:02 COPY command, for example. So what are the pros of this? Well, you don't even need to leave the Postgres environment. You install the extension and you're good to go. You can run the same analytical queries and get insane speedups. And offloading data to object storage or MotherDuck is straightforward. You can query external
5:23 files like Parquet, and that performs really well since they are already in columnar format. You don't have network overhead for data already present in Postgres because the DuckDB engine is embedded there. The cons, of course: you need a custom extension, and most of the time managed Postgres services won't let you install custom extensions.
5:44 If you are using a managed Postgres service, you are also fighting for resources: DuckDB is designed to aggressively use available CPU and memory, which can impact your transactional workloads. Again, you should use a replica anyway. Operational monitoring is needed: you need to monitor resource usage carefully so that your analytical queries are not competing with your transactional queries. The third option is CDC, change data capture,
6:08 aka the real-time everything approach. So change data capture is the classic set-it-and-forget-it method for keeping your analytical database in perfect sync with your upstream transactional database. One well-known, mature approach is using Debezium in combination with Kafka, but that requires quite an overhead. You also have vendors like Airbyte that can help you
6:32 manage a CDC pipeline between Postgres and your target OLAP database, here DuckDB or MotherDuck. Conceptually, you configure Postgres for logical replication, then you install and configure Debezium with Kafka, or your CDC vendor, and the CDC tool reads the WAL and streams changes to your DuckDB directly or to your Kafka topic. So what are the pros and cons of this approach?
6:59 You have near real-time synchronization, so changes appear in DuckDB within seconds. Complete workload separation: analytics doesn't impact your transactional database at all. You have automatic schema evolution: changes to your Postgres schema can be propagated directly to your target sink. And the cons, of course: high operational complexity.
7:23 You need to set up, monitor, and maintain Kafka and Debezium, or set up another vendor,
7:29 which of course is going to be easier to maintain in terms of operational complexity. There is the WAL impact: it increases disk I/O and storage requirements on your Postgres servers. And finally, there is processing lag risk: the CDC process can fall behind during high-volume write periods. So when to pick which solution? Well, here is a recap table of the three options we just
7:51 saw: the DuckDB Postgres extension, the PG DuckDB extension, and setting up a CDC pipeline. Whatever path you choose, remember that with great power comes great responsibility. So don't forget about operational best practices. Spoiler alert: use replicas, for God's sake. For PG DuckDB especially, a dedicated read replica is highly recommended. Monitor your resources: keep a close eye on CPU, memory, I/O, and network usage. And
8:20 finally, profile your queries. This is for understanding where bottlenecks lie and leveraging optimization capabilities. Let me know in the comments which approach you've chosen for which use case, and if you're already using any of these methods, I'd love to hear how it's working out for you. In the meantime, take care of your data pipelines and I'll see you in the next one.