Skip to main content

From Cloud Storage or over HTTPS

From Public Cloud Storage

MotherDuck supports loading data from a number of cloud storage providers, such as Amazon S3 and Azure. These data transfers are initiated from MotherDuck servers for faster access to data.

note

MotherDuck is currently hosted in Amazon AWS region us-east-1. We strongly encourage you locate your data in this availability zone for working with MotherDuck.

The following example features Amazon S3.

Connect to MotherDuck if you haven't already by doing the following:

-- assume db my_db exists
ATTACH 'motherduck:my_db';
-- CTAS a table from a publicly available demo dataset stored in s3
CREATE OR REPLACE TABLE pypi_small AS
SELECT * FROM 's3://motherduck-demo/pypi.small.parquet';

-- JOIN the demo dataset against a larger table to find the most common duplicate urls
-- Note you can directly refer to the url as a table!
SELECT pypi_small.url, COUNT(*)
FROM pypi_small
JOIN 's3://motherduck-demo/pypi_downloads.parquet' AS s3_pypi
ON pypi_small.url = s3_pypi.url
GROUP BY pypi_small.url
ORDER BY COUNT(*) DESC
LIMIT 10;

From a Secure Cloud Storage Provider

MotherDuck supports secure access to several cloud storage providers. MotherDuck also supports securely storing credentials for easier ongoing access to cloud storage. The following example uses Amazon S3.

CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 'access_key',
SECRET 'secret_key',
REGION 'us-east-1'
);

-- Now you can query from a secure S3 bucket
CREATE OR REPLACE TABLE mytable AS SELECT * FROM 's3://...';

Over HTTPS

MotherDuck supports loading data over HTTPS.

-- Reads the Central Park Squirrel Data
SELECT * FROM read_csv_auto('https://docs.google.com/spreadsheets/d/e/2PACX-1vQUZR6ikwZBRXWWQsFaUceEiYzJiVw4OQNGtwGBfcMfVatpCyfxxaWPdoKJIHlwNM-ow1oeW_2F-pO5/pub?gid=2035607922&single=true&output=csv');