Skip to main content

dbt

Data Build Tool (dbt) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouses more effectively. By applying practices borrowed from software development, like version control, testing, and deployment, dbt empowers users to create reliable, efficient, and scalable data pipelines using SQL.

To work with DuckDB/MotherDuck within dbt, it's the same process as working with any other cloud data warehouse. You will need what dbt refers to as an "adapter".

dbt-duckdb is an adapter that works with DuckDB and MotherDuck. The adapter supports also DuckDB extensions and any of the additional DuckDB configuration options.

Installation

Since dbt is a Python library, it can be installed through pip. By using the command pip3 install dbt-duckdb, you will have all dependencies included, dbt and duckdb.

Configuration for DuckDB & MotherDuck

In order to effectively use DuckDB with dbt, certain connection profiles are required. You can find more information about these connections profiles in the dbt documentation.

In a nutshell, you would need a profiles.yaml file that looks like below. The one below for instance, allows you to connect to S3 and perform read/write operations on Parquet files using an AWS access key and secret.

default:
outputs:
dev:
type: duckdb
path: /tmp/dbt.duckdb
extensions:
- httpfs
- parquet
settings:
s3_region: my-aws-region
s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
target: dev

If you need to work with MotherDuck, the process is straightforward. You simply need to alter the path to md:<mydatabase>?motherduck_token= as shown in the following profiles.yaml example:

default:
outputs:
dev:
type: duckdb
path: "md:my_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}"
extensions:
- httpfs
- parquet
settings:
s3_region: my-aws-region
s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
target: dev

This assumes that you have setup MOTHERDUCK_TOKEN as an environment variable. To know more about how to persist your authentication credentials, read Authenticating to MotherDuck using a service token.

If you don't set the motherduck_token in your path, you will be prompted to authenticate to MotherDuck when running your dbt run command.

auth_md

Follow the instructions and it will export the service account variable for the current dbt run process.

Extra resources

Take a look at our video guide on DuckDB and dbt provided below, along with the corresponding demo tutorial on GitHub.