---
sidebar_position: 20
title: dbt cloud with MotherDuck via pg_duckdb
description: For dbt cloud users, pg_duckdb can be used as a shim for MotherDuck
sidebar_label: dbt cloud
---

[dbt cloud](https://www.getdbt.com/product/dbt-cloud) is a managed service for dbt core. To use MotherDuck with dbt cloud, deploy a Postgres proxy with [pg_duckdb](/concepts/pgduckdb) installed.

## Getting Started

You will need the following items to get started:
1. A Postgres instance with pg_duckdb installed.
2. A [MotherDuck token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token).
3. A dbt cloud account.

## Configuring pg_duckdb

The full documentation for pg_duckdb can be found on [GitHub](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md), but a simple way to set it up is using Docker on EC2.

In our testing, we used an m7g.xlarge, which is a 4-core, 16GB instance. Since Postgres acts only as a proxy for MotherDuck, it needs just enough working space to stream results back to dbt. Even smaller instances, e.g. a1.large, may suffice, although they have not been tested thoroughly. The memory limits set below assume a 16GB instance.

Once you have added your MotherDuck token and Postgres password to your environment, you can execute the `docker run` command below:

```sh
docker run -d \
  --name pgduckdb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
  -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
  -v ~/pgduckdb_data_v17:/var/lib/postgresql/data \
  --restart unless-stopped \
  --memory=12288m \
  pgduckdb/pgduckdb:17-main
```
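
After the container starts, you can optionally confirm that Postgres is accepting connections (this is the same check the setup script below performs):

```sh
# Smoke test: pg_isready exits successfully once Postgres accepts connections.
docker exec pgduckdb pg_isready -U postgres
```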

:::note
The default configuration of Postgres is sub-optimal for m7g.xlarge. Consider making the following changes to the `postgresql.conf` file.

```ini
# Memory configuration optimized for AWS m7g.xlarge with more conservative settings
work_mem = '32MB'                    # Per-operation memory for sorts, joins, etc.
maintenance_work_mem = '512MB'       # Memory for maintenance operations
shared_buffers = '2GB'               # ~12.5% of RAM for shared buffer cache
effective_cache_size = '6GB'         # Conservative estimate of OS cache
max_connections = 100                # Reduced maximum concurrent connections
```

:::
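
One way to apply these settings to a running container, assuming the `pgduckdb` container name from the `docker run` command above, is to append them to `postgresql.conf` and restart:

```sh
# Append the tuned settings to the container's postgresql.conf, then restart
# Postgres so they take effect. Adjust values to match your instance size.
docker exec -i pgduckdb bash -c "cat >> /var/lib/postgresql/data/postgresql.conf" <<'EOT'
work_mem = '32MB'
maintenance_work_mem = '512MB'
shared_buffers = '2GB'
effective_cache_size = '6GB'
max_connections = 100
EOT
docker restart pgduckdb
```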

### Upgrading to newer builds of pg_duckdb
New pg_duckdb containers are built on every release. Since the container runs under Docker, the pg_duckdb server can be stopped, pruned, and rebuilt with the `docker run` command above. We recommend scripting this rebuild on a regular cadence, with Terraform or a similar tool managing the maintenance process.
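
The rebuild cycle can be sketched as follows, using the container and image names from the `docker run` command above:

```sh
# Stop and remove the existing container, prune unused images,
# then pull the latest build before re-running the docker run command above.
docker stop pgduckdb
docker rm pgduckdb
docker image prune -f
docker pull pgduckdb/pgduckdb:17-main
```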

An example shell script can be seen below:

<details>
<summary>Shell script</summary>

```sh
#!/bin/bash

# Script to install Docker and run pg_duckdb with MotherDuck on AWS EC2
# Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh

# Error handling function
handle_error() {
  local line_no=$1
  local exit_code=$2
  echo "ERROR: An error occurred at line ${line_no}, exit code ${exit_code}"
  exit ${exit_code}
}

# Set up error trap
trap 'handle_error ${LINENO} $?' ERR

# Detect OS
if grep -q 'Amazon Linux release 2023' /etc/os-release; then
  OS_VERSION="Amazon Linux 2023"
elif grep -q 'Amazon Linux release 2' /etc/os-release; then
  OS_VERSION="Amazon Linux 2"
elif grep -q 'Ubuntu' /etc/os-release; then
  OS_VERSION="Ubuntu"
else
  OS_VERSION="Linux"
fi

echo "Starting setup for pg_duckdb with MotherDuck on $OS_VERSION..."

# Check that required environment variables are set
if [ -z "$POSTGRES_PASSWORD" ]; then
  echo "ERROR: POSTGRES_PASSWORD environment variable is not set."
  echo "Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh"
  exit 1
fi

if [ -z "$MOTHERDUCK_TOKEN" ]; then
  echo "ERROR: MOTHERDUCK_TOKEN environment variable is not set."
  echo "Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh"
  exit 1
fi

# Update package lists - continue even if some repositories fail
echo "Updating package lists..."
if [[ "$OS_VERSION" == "Ubuntu" ]]; then
  sudo apt-get update -y || true
elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
  sudo dnf update -y || true
else
  sudo yum update -y || true
fi

# Check if Docker is already installed
if command -v docker &>/dev/null; then
  echo "Docker is already installed, skipping installation."
else
  # Install prerequisites based on OS
  echo "Installing prerequisites..."
  if [[ "$OS_VERSION" == "Ubuntu" ]]; then
    sudo apt-get install -y \
      apt-transport-https \
      ca-certificates \
      curl \
      gnupg \
      lsb-release
  elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
    # Use --allowerasing to handle curl package conflicts
    sudo dnf install -y --allowerasing \
      device-mapper-persistent-data \
      lvm2 \
      ca-certificates
  else
    sudo yum install -y \
      device-mapper-persistent-data \
      lvm2 \
      ca-certificates
  fi

  # Install Docker based on OS
  echo "Installing Docker..."
  if [[ "$OS_VERSION" == "Ubuntu" ]]; then
    # Add Docker's official GPG key
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    # Set up the repository
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    # Update and install
    sudo apt-get update -y
    sudo apt-get install -y docker-ce docker-ce-cli containerd.io
  elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
    # Amazon Linux 2023 - use the standard package
    sudo dnf install -y docker
  elif [[ "$OS_VERSION" == "Amazon Linux 2" ]]; then
    # Amazon Linux 2 - use extras
    sudo amazon-linux-extras install -y docker
  else
    # Fallback
    sudo yum install -y docker
  fi

  # Verify Docker was installed
  if ! command -v docker &>/dev/null; then
    echo "ERROR: Docker installation failed."
    exit 1
  fi
fi

# Start Docker service
echo "Starting Docker service..."
sudo systemctl start docker || sudo service docker start
sudo systemctl enable docker || sudo chkconfig docker on

# Add current user to docker group to avoid using sudo with docker commands
echo "Adding current user to docker group..."
sudo usermod -aG docker "$USER"

# Create a new data directory for PostgreSQL 17
echo "Creating new data directory for PostgreSQL 17..."
mkdir -p ~/pgduckdb_data_v17

# Fix permissions on the data directory
echo "Setting correct permissions on data directory..."
sudo chown -R 999:999 ~/pgduckdb_data_v17  # 999 is the standard UID for the postgres user in Docker
sudo chmod 700 ~/pgduckdb_data_v17

# Check architecture
ARCH=$(uname -m)
echo "Detected architecture: $ARCH"

if [[ "$ARCH" == "aarch64" || "$ARCH" == "arm64" ]]; then
  echo "Using ARM64 architecture (Graviton)..."
else
  echo "Using x86_64 architecture..."
fi

# Check if the container already exists and remove it if necessary
if sudo docker ps -a | grep -q pgduckdb; then
  echo "Found existing pgduckdb container. Removing it..."
  sudo docker stop pgduckdb || true
  sudo docker rm pgduckdb || true
fi

# Pull the Docker image
echo "Pulling Docker image..."
sudo docker pull pgduckdb/pgduckdb:17-main

# Check available system memory
echo "Checking system memory..."
TOTAL_MEM_KB=$(grep MemTotal /proc/meminfo | awk '{print $2}')
TOTAL_MEM_MB=$((TOTAL_MEM_KB / 1024))
echo "Total system memory: ${TOTAL_MEM_MB}MB"

# Calculate 75% of system memory for the Docker container limit
DOCKER_MEM_LIMIT=$((TOTAL_MEM_MB * 75 / 100))
echo "Setting Docker container memory limit to: ${DOCKER_MEM_LIMIT}MB"

# Run the Docker container with the memory limit
echo "Starting PostgreSQL container..."
sudo docker run -d \
  --name pgduckdb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
  -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
  -v ~/pgduckdb_data_v17:/var/lib/postgresql/data \
  --restart unless-stopped \
  --memory=${DOCKER_MEM_LIMIT}m \
  pgduckdb/pgduckdb:17-main

# Wait for PostgreSQL to start
echo "Waiting for PostgreSQL to start..."
sleep 10

# Configure PostgreSQL
echo "Configuring PostgreSQL and DuckDB..."

# Append settings to the main PostgreSQL configuration file
echo "Appending settings to PostgreSQL configuration file..."
sudo docker exec -i pgduckdb bash -c "cat >> /var/lib/postgresql/data/postgresql.conf << 'EOT'

# DuckDB integration settings
duckdb.motherduck_enabled = true

# Memory configuration optimized for AWS m7g.xlarge with more conservative settings
work_mem = '32MB'                    # Per-operation memory for sorts, joins, etc.
maintenance_work_mem = '512MB'       # Memory for maintenance operations
shared_buffers = '2GB'               # ~12.5% of RAM for shared buffer cache
effective_cache_size = '6GB'         # Conservative estimate of OS cache
max_connections = 100                # Reduced maximum concurrent connections

# Detailed query logging
log_min_duration_statement = 0        # Log all queries
log_statement = 'all'                 # Log all SQL statements
log_duration = on                     # Log duration of each SQL statement
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u ' # Prefix format
EOT"

# Restart PostgreSQL to apply all configuration settings
echo "Restarting PostgreSQL container to apply all configuration settings..."
sudo docker restart pgduckdb

# Wait for PostgreSQL to restart
echo "Waiting for PostgreSQL container to restart..."
sleep 10

# Verify PostgreSQL is running with the new settings
echo "Verifying PostgreSQL configuration..."
sudo docker exec -i pgduckdb psql -U postgres << EOF
-- Check if PostgreSQL is running
SELECT version();
EOF

# Create monitoring script
echo "Creating monitoring script..."
cat > ~/monitor_pg.sh << 'EOF'
#!/bin/bash
echo "=== PostgreSQL Container Status ==="
docker ps -a -f name=pgduckdb

echo -e "\n=== Resource Usage ==="
docker stats --no-stream pgduckdb

echo -e "\n=== Recent Logs ==="
docker logs --tail 10 pgduckdb

echo -e "\n=== Connection Test ==="
if docker exec pgduckdb pg_isready -U postgres; then
  echo "PostgreSQL is accepting connections."
else
  echo "PostgreSQL is not accepting connections."
fi
EOF
chmod +x ~/monitor_pg.sh

# Create startup script
echo "Creating startup script..."
cat > ~/start_pg.sh << 'EOF'
#!/bin/bash
echo "Starting PostgreSQL container..."
docker start pgduckdb
echo "Container status:"
docker ps -a -f name=pgduckdb
EOF
chmod +x ~/start_pg.sh

# Check if the container is running or restarting
echo "Checking container status..."
CONTAINER_STATUS=$(sudo docker inspect -f '{{.State.Status}}' pgduckdb 2>/dev/null || echo "not_found")

if [[ "$CONTAINER_STATUS" == "restarting" ]]; then
  echo "WARNING: Container is restarting. Checking logs for errors..."
  sudo docker logs pgduckdb
  echo "Try reducing the memory settings in the PostgreSQL configuration if the container keeps restarting."
  echo "You can manually adjust settings by connecting to the container once it's stable."
elif [[ "$CONTAINER_STATUS" != "running" && "$CONTAINER_STATUS" != "not_found" ]]; then
  echo "WARNING: Container is not running (status: $CONTAINER_STATUS). Checking logs for errors..."
  sudo docker logs pgduckdb
fi

# Final status check
echo "=== Setup Complete ==="
echo "PostgreSQL with DuckDB is now running."
echo "Container status:"
sudo docker ps -a -f name=pgduckdb

echo -e "\n=== Connection Information ==="
echo "Host: localhost"
echo "Port: 5432"
echo "User: postgres"
echo "Password: [The password you provided]"
echo "Database: postgres"

echo -e "\n=== Useful Commands ==="
echo "Monitor status: ./monitor_pg.sh"
echo "Start after reboot: ./start_pg.sh"
echo "Connect to PostgreSQL: docker exec -it pgduckdb psql -U postgres"
echo "View logs: docker logs pgduckdb"

echo -e "\n=== Note ==="
echo "You may need to log out and log back in for the docker group changes to take effect."
echo "After that, you can run docker commands without sudo."
```

</details>

## dbt cloud configuration

dbt cloud is configured as a standard Postgres connection, with a few key details.

1. You will need to create a schema in MotherDuck for each user, as well as for production, since creating new schemas in MotherDuck through pg_duckdb is not supported.
2. You will need to set an environment variable `DBT_SCHEMA` that uses the pg_duckdb schema format, `ddb$[database]$[schema]`, since Postgres only supports a single database per instance. Set it for each user as well as for production, and reference it with `{{ env_var('DBT_SCHEMA') }}`.
3. The recommended thread count follows our dbt-core recommendation of 4 threads.
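
For example, for a hypothetical developer schema `dbt_alice` in a MotherDuck database `my_db` (both names are placeholders), the schema is created in MotherDuck ahead of time and the environment variable takes the pg_duckdb form:

```sh
# The schema must already exist in MotherDuck, e.g. created via the DuckDB CLI:
#   duckdb "md:my_db" -c "CREATE SCHEMA IF NOT EXISTS dbt_alice"
# dbt cloud then targets that schema through pg_duckdb's naming scheme:
DBT_SCHEMA='ddb$my_db$dbt_alice'   # ddb$[database]$[schema]
echo "$DBT_SCHEMA"
```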

If dbt is configured incorrectly, data may be written to Postgres, which is much slower than MotherDuck. In that case, the easiest fix is to rebuild the Docker container as described above, to ensure that no data accidentally ends up in Postgres.

## Usage notes

There are a few unusual things to know about using dbt cloud with pg_duckdb.

1. You write Postgres-dialect SQL that is executed against DuckDB. As such, there are some idiosyncrasies that are neither Postgres nor DuckDB, but a secret, third thing (pg_duckdb SQL). The details are described in the [pg_duckdb documentation](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md).
2. Views are stored only in Postgres, without any artifacts in MotherDuck. They can therefore be used for interim data, but not for final datasets consumed by end users. Changing the materialization type from view to table, or table to view, is a hybrid MotherDuck & Postgres transaction and is unsupported.
3. Running on multiple threads can occasionally cause deadlocks with the pg_duckdb catalog maintenance service. This can be resolved with `dbt retry` in your production pipeline runs.
4. DuckDB types are more specific than Postgres types, so model builds using numeric types may throw errors that can be resolved with explicit typing.
5. From time to time the Postgres catalog can get out of sync and show tables that do not exist in MotherDuck. To resolve this, create the missing object in MotherDuck, e.g. `CREATE TABLE my_schema.model_name AS SELECT 1;`, which will unblock your dbt model.
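
A sketch of that catalog fix, assuming the DuckDB CLI and placeholder names `my_db`, `my_schema`, and `model_name`:

```sh
# Create the object directly in MotherDuck so the stale Postgres catalog
# entry resolves and the blocked dbt model can build.
duckdb "md:my_db" -c "CREATE TABLE IF NOT EXISTS my_schema.model_name AS SELECT 1;"
```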
