Skip to main content

Loading Data Into MotherDuck

You can leverage MotherDuck’s managed storage to persist your data. MotherDuck storage provides a high level of manageability and abstraction, optimizing your data for secure, durable, performant, and efficient use. There are several ways to load data into MotherDuck storage.

Load data into MotherDuck from your local machine

You can load files from your local machine into MotherDuck. For example:

Connect to MotherDuck if you haven't already by doing the following:

.open md:     
CREATE DATABASE test01;  
INSERT INTO test01 SELECT * FROM 'customers.csv';
use test01;
CREATE OR REPLACE TABLE orders as SELECT * from 'orders.csv';

You can also upload DuckDB files from your local machine into MotherDuck.

ATTACH 'localdb.db';
CREATE DATABASE cloud_db from 'localdb';
CREATE DATABASE cloud_db2 from 'localdb2.db';

Load data from public cloud storage or HTTPS

MotherDuck supports loading data over HTTPS and a number of cloud storage providers, such as Amazon S3 and Azure. These data transfers are initiated from MotherDuck servers for faster access to data.

note

MotherDuck is currently hosted in Amazon AWS region us-east-1. We strongly encourage you locate your data in this availability zone for working with MotherDuck.

The following example features Amazon S3.

Connect to MotherDuck if you haven't already by doing the following:

-- assume db test01 exists
.open motherduck:test01;
-- CTAS a table from a publicly available demo dataset stored in s3
CREATE OR REPLACE TABLE pypi_small AS
SELECT * FROM 's3://motherduck-demo/pypi.small.parquet';

-- JOIN the demo dataset against a larger table to find the most common duplicate urls
-- Note you can directly refer to the url as a table!
SELECT pypi_small.url, COUNT(*)
FROM pypi_small
JOIN 's3://motherduck-demo/pypi_downloads.parquet' AS s3_pypi
ON pypi_small.url = s3_pypi.url
GROUP BY pypi_small.url
ORDER BY COUNT(*) DESC
LIMIT 10;

Load data from a secure cloud storage provider

MotherDuck supports secure access to several cloud storage providers. MotherDuck also supports securely storing credentials for easier ongoing access to cloud storage. The following example uses Amazon S3.

Connect to MotherDuck if you haven't already by doing the following:

-- assume db test01 exists
.open motherduck:test01;
CREATE [OR REPLACE] SECRET (TYPE S3, S3_ACCESS_KEY_ID 'access-key', S3_SECRET_ACCESS_KEY 'secret-key', S3_REGION 'us-east-1');

-- Now you can query from a secure S3 bucket
CREATE OR REPLACE TABLE mytable AS SELECT * FROM 's3://...';

Load a DuckDB database into MotherDuck

MotherDuck supports persisting local DuckDB databases in the cloud. To upload the current/active local database, please run the following command:

CREATE DATABASE remote_database FROM CURRENT_DATABASE();   

To upload a different local named database, please execute the following command:

CREATE DATABASE remote_database FROM '<local database name>';

Example of usage :

-- assuming we already created a database called test02
-- let's begin by making a local database
.open tpch.ddb
install tpch;
LOAD tpch;
CALL dbgen(sf=0.1);
show tables;
┌──────────┐
│ name │
varchar
├──────────┤
│ customer │
│ lineitem │
...
└──────────┘

LOAD motherduck; -- we need the extension loaded for the following command
-- note test02 must already exist in motherduck
CREATE DATABASE test02 from CURRENT_DATABASE();
┌─────────┐
│ Success │
boolean
├─────────┤
0 rows
└─────────┘

-- note that uploading database does not alter context
-- you are still in the local context and below tpch query will run locally

-- query #20
SELECT s_name, s_address FROM supplier, nation
WHERE s_suppkey IN (
SELECT ps_suppkey FROM partsupp
WHERE ps_partkey IN
( SELECT p_partkey FROM part WHERE p_name LIKE 'forest%')
AND ps_availqty > (
SELECT 0.5 * sum(l_quantity) FROM lineitem
WHERE l_partkey = ps_partkey
AND l_suppkey = ps_suppkey
AND l_shipdate >= CAST('1994-01-01' AS date)
AND l_shipdate < CAST('1995-01-01' AS date)
)
) AND s_nationkey = n_nationkey AND n_name = 'CANADA'
ORDER BY s_name;