Loading data via the Postgres endpoint
MotherDuck's Postgres endpoint is a good thin-client loading path when your application, BI tool, or serverless runtime already speaks PostgreSQL and you want to run SQL in MotherDuck without installing a DuckDB client.
It is best suited to server-side loading from remote data sources.
If your files already live in object storage or are available over HTTPS, use the Postgres endpoint to run `CREATE TABLE AS SELECT` or `INSERT INTO ... SELECT` and let MotherDuck read the files remotely.
If your data is on your laptop, application server disk, or in a local DuckDB file, a DuckDB client path is usually a better fit. In that case, either:
- Upload the files to object storage first, then load them remotely through the Postgres endpoint.
- Use a DuckDB client path instead, such as the `duckdb` CLI, the DuckDB Python client, or another DuckDB client connected to `md:`.
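As a minimal sketch of the DuckDB client path, the following loads a local Parquet file from a local DuckDB session; the file path and table name are placeholders, and it assumes a motherduck_token is available in your environment:

-- Run in a local DuckDB client, not over the Postgres endpoint.
ATTACH 'md:';  -- connect to MotherDuck; assumes motherduck_token is set in the environment

-- Placeholder local path and target table.
CREATE OR REPLACE TABLE my_db.main.orders_local AS
SELECT * FROM read_parquet('/path/to/local/orders.parquet');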
Recommended patterns
Load directly from cloud storage or HTTPS
This is the preferred pattern for the Postgres endpoint.
The examples below use public sample files so you can run them directly.
CREATE OR REPLACE TABLE my_db.main.orders_raw AS
SELECT *
FROM read_parquet(
'https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet',
MD_RUN = REMOTE
);
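A quick sanity check over the same Postgres connection confirms the load landed as expected:

SELECT
    count(*)        AS row_count,
    min(o_orderdate) AS first_order,
    max(o_orderdate) AS last_order
FROM my_db.main.orders_raw;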
You can use the same approach with CSV or JSON:
CREATE OR REPLACE TABLE my_db.main.weather_events AS
SELECT *
FROM read_csv(
'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv',
HEADER = true,
AUTO_DETECT = true,
MD_RUN = REMOTE
);
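A JSON load follows the same shape. The URL below is a placeholder; substitute a JSON source you can reach:

CREATE OR REPLACE TABLE my_db.main.events_raw AS
SELECT *
FROM read_json(
    'https://example.com/events.json',  -- placeholder URL
    AUTO_DETECT = true,
    MD_RUN = REMOTE
);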
This keeps the work inside MotherDuck and avoids sending rows one statement at a time over the Postgres wire.
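The same pattern works for incremental appends. As a sketch, reusing the orders_raw table from above and treating o_orderkey as a high-water mark:

INSERT INTO my_db.main.orders_raw
SELECT *
FROM read_parquet(
    'https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet',
    MD_RUN = REMOTE
)
WHERE o_orderkey > (SELECT coalesce(max(o_orderkey), 0) FROM my_db.main.orders_raw);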
Load into a staging table, then transform
For repeatable pipelines, stage the raw data first and then publish into the final table.
CREATE SCHEMA IF NOT EXISTS my_db.ingest;
CREATE OR REPLACE TABLE my_db.ingest.orders_stage AS
SELECT *
FROM read_parquet(
'https://shell.duckdb.org/data/tpch/0_01/parquet/orders.parquet',
MD_RUN = REMOTE
);
CREATE OR REPLACE TABLE my_db.main.orders_curated AS
SELECT
o_orderkey AS order_id,
o_custkey AS customer_id,
o_orderdate::TIMESTAMP AS order_ts,
o_totalprice::DOUBLE AS total_amount
FROM my_db.ingest.orders_stage;
This keeps ingestion and transformation separate, which makes validation, retries, and backfills easier.
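For example, a lightweight validation step between staging and publishing might compare row counts and check key columns; this is a sketch to extend with whatever checks your pipeline needs:

SELECT
    (SELECT count(*) FROM my_db.ingest.orders_stage)  AS staged_rows,
    (SELECT count(*) FROM my_db.main.orders_curated)  AS curated_rows,
    (SELECT count(*) FROM my_db.main.orders_curated
     WHERE order_id IS NULL)                          AS null_keys;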
Batch rows if the data exists only in application memory
If your source data exists only in application memory, use multi-row INSERT statements instead of row-by-row inserts.
Recommended:
CREATE OR REPLACE TABLE my_db.main.orders_batch (
id INTEGER,
note VARCHAR,
amount DOUBLE
);
INSERT INTO my_db.main.orders_batch VALUES
(1, 'a', 10.0),
(2, 'b', 20.0),
(3, 'c', 30.0);
Less efficient:
INSERT INTO my_db.main.orders_batch VALUES (1, 'a', 10.0);
INSERT INTO my_db.main.orders_batch VALUES (2, 'b', 20.0);
INSERT INTO my_db.main.orders_batch VALUES (3, 'c', 30.0);
Single-row inserts create unnecessary round trips and are much slower for loading.
When loading rows from an application, prefer:
- fewer, larger batches
- append-only staging tables
- transactions that stay comfortably below a minute, as sketched below
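A minimal sketch of that shape, continuing the orders_batch example above (the batch sizes here are illustrative; real loads would use far larger batches per statement):

BEGIN TRANSACTION;
INSERT INTO my_db.main.orders_batch VALUES
    (4, 'd', 40.0),
    (5, 'e', 50.0);
INSERT INTO my_db.main.orders_batch VALUES
    (6, 'f', 60.0),
    (7, 'g', 70.0);
COMMIT;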
Use a DuckDB client path instead when
The Postgres endpoint is not currently intended for workflows that depend on local DuckDB-client capabilities. Use a DuckDB client path instead when you need:
- local-file `COPY`
- `EXPORT DATABASE` and `IMPORT DATABASE`
- `ATTACH ':memory:'` and `ATTACH '/path/to/file.duckdb'`
- `CREATE DATABASE ... FROM '/path/to/file.duckdb'`
- `MD_RUN = LOCAL`
- `INSTALL` and `LOAD`
In practice, that means the Postgres endpoint is not the primary interface for:
- loading directly from local files
- attaching local or in-memory DuckDB databases
- extension-based workflows
- local execution paths such as `MD_RUN = LOCAL`
Protected cloud storage
If you are loading from protected S3, GCS, R2, or Azure storage, make sure the required MotherDuck secret already exists.
Cloud-storage secret creation requires DuckDB extension support and is not currently supported through the Postgres endpoint. The recommended workflow is:
- Create the secret using a DuckDB client path or another supported MotherDuck workflow.
- Then use the Postgres endpoint to run the load query.
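As a sketch of that two-step workflow, assuming an S3 source with placeholder credentials and bucket names, and assuming DuckDB's standard CREATE SECRET syntax for the secret-creation step:

-- Step 1: from a DuckDB client connected to md: (not the Postgres endpoint).
CREATE SECRET my_s3_secret (
    TYPE S3,
    KEY_ID 'AKIA...',   -- placeholder credentials
    SECRET '...',
    REGION 'us-east-1'
);

-- Step 2: over the Postgres endpoint, run the load.
CREATE OR REPLACE TABLE my_db.main.private_orders AS
SELECT *
FROM read_parquet('s3://my-private-bucket/orders/*.parquet', MD_RUN = REMOTE);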
Decision guide
| Situation | Best approach |
|---|---|
| Files already in S3, GCS, R2, Azure, or public HTTPS | Use `read_parquet`, `read_csv`, or `read_json` with `MD_RUN = REMOTE` over the Postgres endpoint |
| Data is local on your machine | Prefer a DuckDB client path, or upload the files to object storage first |
| Data exists only in app memory and volume is modest | Use explicit large multi-row INSERT batches over the Postgres endpoint |
| Very large local bulk load | Use a DuckDB client path instead |
Summary
For the best mix of throughput and simplicity:
- Write source files as Parquet when you can.
- Put them in object storage close to your MotherDuck region.
- Use the Postgres endpoint to run `CREATE TABLE AS SELECT` or `INSERT INTO ... SELECT` with `MD_RUN = REMOTE`.
- Stage first, validate row counts and schemas, then publish into the final table.