Access Managed DuckLakes from your Own Cloud (or Laptop)
If you supply your own cloud storage bucket, you can bring your own compute (BYOC) to your DuckLake. Today, this lets you configure DuckDB to use the DuckLake metadata catalog on MotherDuck while reading and writing data directly in your own cloud storage (say, from your AWS Lambda jobs!).
In the DuckDB CLI (as an example), create a secret that provides access to your DATA_PATH:
CREATE PERSISTENT SECRET my_secret (
TYPE S3,
KEY_ID 'my_s3_access_key',
SECRET 'my_s3_secret_key',
REGION 'my-bucket-region'
);
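To sanity-check that the secret was created, you can list DuckDB's stored secrets (sensitive values are redacted in the output):
SELECT * FROM duckdb_secrets();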
Next, attach the DuckLake to your DuckDB session:
ATTACH 'ducklake:md:__ducklake_metadata_<database_name>' AS <alias>;
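As a concrete (hypothetical) example, if your MotherDuck database were named my_db, the statement might look like this, with my_lake as a freely chosen alias:
ATTACH 'ducklake:md:__ducklake_metadata_my_db' AS my_lake;
SHOW DATABASES; -- confirm the DuckLake appears among the attached databases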
Now, you can run USE <alias>; to make your DuckLake the default database for your DuckDB session, or simply reference <alias> in your queries. The following statement copies a Parquet file from a MotherDuck-owned S3 bucket into your DuckLake as a new table:
CREATE TABLE <alias>.air_quality AS
SELECT * FROM 's3://us-prd-motherduck-open-datasets/who_ambient_air_quality/parquet/who_ambient_air_quality_database_version_2024.parquet';
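Once the table is created, you can query it like any other table. For example, reusing the hypothetical my_lake alias from above:
USE my_lake;                       -- make the DuckLake the session default
SELECT count(*) FROM air_quality;  -- the data itself lives in your own bucket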
This capability of DuckLakes gets much more interesting when additional data processing frameworks implement support for the DuckLake specification. Support for using DuckLake with Apache Spark is in development.
How do I use my own compute with a fully-managed DuckLake?
Right now, bringing your own compute means also bringing your own cloud storage bucket.
Support for using your own compute with a fully-managed DuckLake will be available soon. Although the storage buckets in that scenario will continue to be owned and managed by MotherDuck, we'll provide signed URLs that clients can use to access them.
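To sketch what that could look like (an assumption about the eventual interface, not the final design): DuckDB's httpfs extension can already read Parquet files over plain HTTPS, so a signed URL could in principle be queried directly. The URL below is a made-up placeholder:
INSTALL httpfs;
LOAD httpfs;
-- Hypothetical signed URL issued by MotherDuck
SELECT * FROM read_parquet('https://managed-bucket.example.com/data.parquet?X-Amz-Signature=...');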