From Cloud Storage or over HTTPS
From Public Cloud Storage
MotherDuck supports loading data from a number of cloud storage providers, such as Amazon S3 and Azure. These data transfers are initiated from MotherDuck servers for faster access to data.
note
MotherDuck is currently hosted in Amazon AWS region us-east-1
. We strongly encourage you locate your data in this availability zone for working with MotherDuck.
The following example features Amazon S3.
- CLI
Connect to MotherDuck if you haven't already by doing the following:
-- assume db my_db exists
ATTACH 'motherduck:my_db';
-- CTAS a table from a publicly available demo dataset stored in s3
CREATE OR REPLACE TABLE pypi_small AS
SELECT * FROM 's3://motherduck-demo/pypi.small.parquet';
-- JOIN the demo dataset against a larger table to find the most common duplicate urls
-- Note you can directly refer to the url as a table!
SELECT pypi_small.url, COUNT(*)
FROM pypi_small
JOIN 's3://motherduck-demo/pypi_downloads.parquet' AS s3_pypi
ON pypi_small.url = s3_pypi.url
GROUP BY pypi_small.url
ORDER BY COUNT(*) DESC
LIMIT 10;
From a Secure Cloud Storage Provider
MotherDuck supports secure access to several cloud storage providers. MotherDuck also supports securely storing credentials for easier ongoing access to cloud storage. The following example uses Amazon S3.
- CLI
CREATE SECRET IN MOTHERDUCK (
TYPE S3,
KEY_ID 'access_key',
SECRET 'secret_key',
REGION 'us-east-1'
);
-- Now you can query from a secure S3 bucket
CREATE OR REPLACE TABLE mytable AS SELECT * FROM 's3://...';
Over HTTPS
MotherDuck supports loading data over HTTPS.
- CLI
-- Reads the Central Park Squirrel Data
SELECT * FROM read_csv_auto('https://docs.google.com/spreadsheets/d/e/2PACX-1vQUZR6ikwZBRXWWQsFaUceEiYzJiVw4OQNGtwGBfcMfVatpCyfxxaWPdoKJIHlwNM-ow1oeW_2F-pO5/pub?gid=2035607922&single=true&output=csv');