dbt
Data Build Tool (dbt) is an open-source command-line tool that enables data analysts and engineers to transform data in their warehouses more effectively. By applying practices borrowed from software development, like version control, testing, and deployment, dbt empowers users to create reliable, efficient, and scalable data pipelines using SQL.
To work with DuckDB/MotherDuck within dbt, it's the same process as working with any other cloud data warehouse. You will need what dbt refers to as an "adapter".
dbt-duckdb is an adapter that works with DuckDB and MotherDuck. The adapter supports also DuckDB extensions and any of the additional DuckDB configuration options.
Installation
Since dbt is a Python library, it can be installed through pip. By using the command pip3 install dbt-duckdb
, you will have all dependencies included, dbt
and duckdb
.
Configuration for DuckDB & MotherDuck
In order to effectively use DuckDB with dbt, certain connection profiles are required. You can find more information about these connections profiles in the dbt documentation.
In a nutshell, you would need a profiles.yaml
file that looks like below. The one below for instance, allows you to connect to S3 and perform read/write operations on Parquet files using an AWS access key and secret.
default:
outputs:
dev:
type: duckdb
path: /tmp/dbt.duckdb
extensions:
- httpfs
- parquet
settings:
s3_region: my-aws-region
s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
target: dev
If you need to work with MotherDuck, the process is straightforward.
You simply need to alter the path to md:<mydatabase>?motherduck_token=
as shown in the following profiles.yaml
example:
default:
outputs:
dev:
type: duckdb
path: "md:my_db?motherduck_token={{env_var('MOTHERDUCK_TOKEN')}}"
extensions:
- httpfs
- parquet
settings:
s3_region: my-aws-region
s3_access_key_id: "{{ env_var('S3_ACCESS_KEY_ID') }}"
s3_secret_access_key: "{{ env_var('S3_SECRET_ACCESS_KEY') }}"
target: dev
This assumes that you have setup MOTHERDUCK_TOKEN
as an environment variable. To know more about how to persist your authentication credentials, read Authenticating to MotherDuck using a service token.
If you don't set the motherduck_token
in your path, you will be prompted to authenticate to MotherDuck when running your dbt run
command.
Follow the instructions and it will export the service account variable for the current dbt run
process.
Extra resources
Take a look at our video guide on DuckDB and dbt provided below, along with the corresponding demo tutorial on GitHub.