From a PostgreSQL or MySQL Database
Using PostgreSQL or MySQL DuckDB extensions
DuckDB's PostgreSQL extension and MySQL extension make it easy to connect to OLTP databases and copy data into MotherDuck from a DuckDB client running on your own machine or compute resource. In this guide we demonstrate the workflow with PostgreSQL. Consult the DuckDB MySQL extension documentation to adapt the same pattern for MySQL.
MotherDuck does not yet support the PostgreSQL and MySQL extensions, so you need to perform the following steps on your own computer or cloud computing resource. We are working on supporting the PostgreSQL extension on the server side so that this can happen within the MotherDuck app in the future with improved performance.
Prerequisites
- PostgreSQL Database Credentials: Ensure you have access details to the PostgreSQL database, including host address, port, and user credentials. You can put the user credentials in the PostgreSQL Password File, store them in environment variables, or pass them inline in the script below.
- Network Connectivity: Your machine must be able to connect to the target PostgreSQL database.
- MotherDuck Credentials: MotherDuck credentials should be set up. If not, follow the steps in Authenticating to MotherDuck.
- DuckDB: Either the DuckDB command-line interface or Python + the DuckDB package should be installed and operational. See the Getting Started tutorials for instructions to install DuckDB.
Steps
The following SQL script installs and loads DuckDB's PostgreSQL extension, tunes a few settings that matter for larger bulk loads and copies one PostgreSQL table into the MotherDuck table my_db.pg_data_schema.first_pg_table.
Fill in the placeholders <dbname>, <host_address>, <username>, <password>, <schema>, and <table> with the appropriate values and save the script to a file, for example ingest_data_from_postgres.sql.
INSTALL postgres;
LOAD postgres;
-- Tune the local DuckDB client for a larger initial load.
SET threads = 8;
SET memory_limit = '8GB';
SET pg_connection_limit = 8;
SET pg_pages_per_task = 250;
-- Connect to MotherDuck.
ATTACH 'md:';
USE my_db;
-- Optionally create a schema. By default MotherDuck uses the main schema.
CREATE SCHEMA IF NOT EXISTS pg_data_schema;
-- Ingest data from PostgreSQL to a MotherDuck table.
CREATE OR REPLACE TABLE pg_data_schema.first_pg_table AS
SELECT *
FROM postgres_scan(
'dbname=<dbname> host=<host_address> port=5432 user=<username> password=<password> connect_timeout=10',
'<schema>',
'<table>'
);
-- Optional: verify the number of rows in the MotherDuck table.
SELECT count(1) FROM pg_data_schema.first_pg_table;
If you only want to smoke-test the connection first, add LIMIT 1000 to the SELECT before running the full load.
Best practices
Here are a few tips to keep larger PostgreSQL loads predictable.
Run DuckDB close to both systems
This workflow is client-side, so the DuckDB client becomes the data mover. Run DuckDB on a machine with a good network path to both PostgreSQL and MotherDuck, and use separate client compute when possible instead of competing with the production PostgreSQL instance for the same RAM.
Tune scan parallelism explicitly
Start with SET threads = <available_cpu_count> and SET memory_limit = '<available_memory>', then tune pg_connection_limit and pg_pages_per_task for your source table. For larger tables, start with pg_connection_limit in the 4-8 range and pg_pages_per_task in the 250-1000 range rather than relying on defaults.
Increasing pg_connection_limit can increase pressure on the source PostgreSQL instance. If PostgreSQL memory or connection pressure climbs, reduce pg_connection_limit before reducing DuckDB threads.
Reduce each statement's working set
The DuckDB side of this workflow is typically streaming rather than loading the full source table into RAM. Out-of-memory risk is usually driven more by the source PostgreSQL instance and the host's overall headroom than by DuckDB itself. Select only the schema and columns you need, and attach PostgreSQL with READ_ONLY if you use ATTACH instead of postgres_scan.
Keep credentials out of long-lived scripts
Use PostgreSQL environment variables, the PostgreSQL password file, or DuckDB secrets instead of embedding credentials directly in production scripts.
Load in chunks
For very large tables, break the initial load into ranges and insert them one chunk at a time.
INSTALL postgres;
LOAD postgres;
SET threads = 8;
SET memory_limit = '8GB';
SET pg_connection_limit = 8;
SET pg_pages_per_task = 250;
ATTACH 'md:';
USE my_db;
CREATE SCHEMA IF NOT EXISTS pg_data_schema;
CREATE TABLE IF NOT EXISTS pg_data_schema.first_pg_table AS
SELECT *
FROM postgres_scan(
'dbname=<dbname> host=<host_address> port=5432 user=<username> password=<password> connect_timeout=10',
'<schema>',
'<table>'
)
WHERE 1 = 0;
INSERT INTO pg_data_schema.first_pg_table
SELECT *
FROM postgres_scan(
'dbname=<dbname> host=<host_address> port=5432 user=<username> password=<password> connect_timeout=10',
'<schema>',
'<table>'
)
WHERE updated_at >= TIMESTAMP '2026-01-01'
AND updated_at < TIMESTAMP '2026-02-01';
Repeat the INSERT statement for each key range or time window until the backfill is complete.
If you need recurring replication, change data capture (CDC), or production orchestration, prefer a dedicated ingestion partner over a one-off client-side script.
Run with DuckDB CLI
After filling out the placeholders, you can either execute the statements line by line in the DuckDB CLI, or save the commands in a file, for example ingest_data_from_postgres.sql, and run:
> duckdb < ingest_data_from_postgres.sql
Run with Python
You can also execute it using Python with the DuckDB package.
import duckdb
with open("ingest_data_from_postgres.sql", 'r') as f:
s = f.read()
duckdb.sql(s)
After completing these steps, you should see the new table show up in the MotherDuck Web UI.
Using MotherDuck ingestion partners
MotherDuck collaborates with various integration partners to facilitate data transfer in diverse ways—including change data capture (CDC)—from your PostgreSQL or MySQL database to MotherDuck. For example, you can refer to our Estuary guide that demonstrates how to stream data from Neon, a PostgreSQL-based database, to MotherDuck. To explore the full range of solutions tailored to your needs, visit our MotherDuck ecosystem partners page.