From Cloud Storage or over HTTPS
From Public Cloud Storage
MotherDuck supports several cloud storage providers, including Azure, Google Cloud and Cloudflare R2.
note
MotherDuck is currently hosted in Amazon AWS region us-east-1
. We strongly encourage you locate your data in this availability zone for working with MotherDuck.
The following example features Amazon S3.
- CLI
Connect to MotherDuck if you haven't already by doing the following:
-- assuming the db my_db exists
ATTACH 'md:my_db';
-- CTAS a table from a publicly available demo dataset stored in s3
CREATE OR REPLACE TABLE pypi_small AS
SELECT * FROM 's3://motherduck-demo/pypi.small.parquet';
-- JOIN the demo dataset against a larger table to find the most common duplicate urls
-- Note you can directly refer to the url as a table!
SELECT pypi_small.url, COUNT(*)
FROM pypi_small
JOIN 's3://motherduck-demo/pypi_downloads.parquet' AS s3_pypi
ON pypi_small.url = s3_pypi.url
GROUP BY pypi_small.url
ORDER BY COUNT(*) DESC
LIMIT 10;
From a Secure Cloud Storage Provider
MotherDuck supports several cloud storage providers, including Azure, Google Cloud and Cloudflare R2.
- CLI
CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 'access_key',
SECRET 'secret_key',
REGION 'us-east-1'
);
-- Now you can query from a secure S3 bucket
CREATE OR REPLACE TABLE mytable AS SELECT * FROM 's3://...';
Over HTTPS
MotherDuck supports loading data over HTTPS.
- CLI
-- Reads the Central Park Squirrel Data
SELECT * FROM read_csv_auto('https://docs.google.com/spreadsheets/d/e/2PACX-1vQUZR6ikwZBRXWWQsFaUceEiYzJiVw4OQNGtwGBfcMfVatpCyfxxaWPdoKJIHlwNM-ow1oeW_2F-pO5/pub?gid=2035607922&single=true&output=csv');