Skip to main content

dbt with DuckDB and MotherDuck

Data Build Tool (dbt) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouses by defining SQL in model files. It bring the composability of programming languages to SQL while automating the mechanics of updating tables.

dbt-duckdb is the adapter which allows dbt to use DuckDB and MotherDuck. The adapter also supports DuckDB extensions and any of the additional DuckDB configuration options.

Installation

Since dbt is a Python library, it can be installed through pip:

pip3 install dbt-duckdb will install both dbt and duckdb.

Configuration for Local DuckDB

This configuration allows you to connect to S3 and perform read/write operations on Parquet files using an AWS access key and secret.

profiles.yml

default:
outputs:
dev:
type: duckdb
path: /tmp/dbt.duckdb
threads: 1
extensions:
- httpfs
- parquet
settings:
s3_region: my-aws-region
s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
target: dev

You can find more information about these connections profiles in the dbt documentation.

Configuration for MotherDuck

The only change needed for motherduck is the path: setting.

default:
outputs:
dev:
type: duckdb
path: "md:my_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}"
threads: 1
extensions:
- httpfs
- parquet
settings:
s3_region: my-aws-region
s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
target: dev

This assumes that you have setup MOTHERDUCK_TOKEN as an environment variable. To know more about how to persist your authentication credentials, read Authenticating to MotherDuck using an access token. If you don't set the motherduck_token in your path, you will be prompted to authenticate to MotherDuck when running your dbt run command.

auth_md

Follow the instructions and it will export the service account variable for the current dbt run process.

DuckDB will parallelize a single write query as much as possible, so running one at a time is the most efficient use of compute resources. For most use cases, we recommend setting threads: 1 for the best performance.

Extra resources

Take a look at our video guide on DuckDB and dbt provided below, along with the corresponding demo tutorial on GitHub.