---
sidebar_position: 20
title: dbt cloud with MotherDuck via pg_duckdb
description: For dbt cloud users, pg_duckdb can be used as a shim for MotherDuck
sidebar_label: dbt cloud
---

[dbt cloud](https://www.getdbt.com/product/dbt-cloud) is a managed service for dbt core. To use MotherDuck with dbt cloud, deploy a Postgres proxy with [pg_duckdb](/concepts/pgduckdb) installed.

## Getting Started

You will need the following items to get started:
1. A Postgres instance with pg_duckdb installed.
2. A [MotherDuck token](/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token).
3. A dbt cloud account.

## Configuring pg_duckdb

The full documentation for pg_duckdb can be found on [GitHub](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md), but a simple way to set it up is using Docker on EC2.

In our testing, we used an m7g.xlarge, which is a 4-core, 16GB instance. Since Postgres acts only as a proxy for MotherDuck, it needs just enough working space to stream results back to dbt. Even smaller instances, e.g. a1.large, may suffice, although they have not been tested thoroughly. The memory limits set below assume a 16GB instance.

Once you have added your MotherDuck token and Postgres password to your environment, you can execute the `docker run` command below:

```sh
docker run -d \
  --name pgduckdb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
  -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
  -v ~/pgduckdb_data_v17:/var/lib/postgresql/data \
  --restart unless-stopped \
  --memory=12288m \
  pgduckdb/pgduckdb:17-main
```
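
After the container starts, you can optionally confirm that Postgres is accepting connections (this is the same check the setup script below performs):

```sh
# Smoke test: pg_isready exits successfully once Postgres accepts connections.
docker exec pgduckdb pg_isready -U postgres
```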

:::note
The default configuration of Postgres is sub-optimal for m7g.xlarge. Consider making the following changes to the `postgresql.conf` file.

```ini
# Memory configuration optimized for AWS m7g.xlarge with more conservative settings
work_mem = '32MB'                    # Per-operation memory for sorts, joins, etc.
maintenance_work_mem = '512MB'       # Memory for maintenance operations
shared_buffers = '2GB'               # ~12.5% of RAM for shared buffer cache
effective_cache_size = '6GB'         # Conservative estimate of OS cache
max_connections = 100                # Reduced maximum concurrent connections
```

:::
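
One way to apply these settings to a running container, assuming the `pgduckdb` container name from the `docker run` command above, is to append them to `postgresql.conf` and restart:

```sh
# Append the tuned settings to the container's postgresql.conf, then restart
# Postgres so they take effect. Adjust values to match your instance size.
docker exec -i pgduckdb bash -c "cat >> /var/lib/postgresql/data/postgresql.conf" <<'EOT'
work_mem = '32MB'
maintenance_work_mem = '512MB'
shared_buffers = '2GB'
effective_cache_size = '6GB'
max_connections = 100
EOT
docker restart pgduckdb
```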

### Upgrading to newer builds of pg_duckdb
New pg_duckdb containers are built on every release. Since the container runs under Docker, the pg_duckdb server can be stopped, pruned, and rebuilt with the `docker run` command above. We recommend scripting this rebuild on a regular cadence, with Terraform or a similar tool managing the maintenance process.
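
The rebuild cycle can be sketched as follows, using the container and image names from the `docker run` command above:

```sh
# Stop and remove the existing container, prune unused images,
# then pull the latest build before re-running the docker run command above.
docker stop pgduckdb
docker rm pgduckdb
docker image prune -f
docker pull pgduckdb/pgduckdb:17-main
```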

An example shell script can be seen below:

<details>
<summary>Shell script</summary>

```sh
#!/bin/bash

# Script to install Docker and run pg_duckdb with MotherDuck on AWS EC2
# Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh

# Error handling function
handle_error() {
  local line_no=$1
  local exit_code=$2
  echo "ERROR: An error occurred at line ${line_no}, exit code ${exit_code}"
  exit ${exit_code}
}

# Set up error trap
trap 'handle_error ${LINENO} $?' ERR

# Detect OS
if grep -q 'Amazon Linux release 2023' /etc/os-release; then
  OS_VERSION="Amazon Linux 2023"
elif grep -q 'Amazon Linux release 2' /etc/os-release; then
  OS_VERSION="Amazon Linux 2"
elif grep -q 'Ubuntu' /etc/os-release; then
  OS_VERSION="Ubuntu"
else
  OS_VERSION="Linux"
fi

echo "Starting setup for pg_duckdb with MotherDuck on $OS_VERSION..."

# Check that required environment variables are set
if [ -z "$POSTGRES_PASSWORD" ]; then
  echo "ERROR: POSTGRES_PASSWORD environment variable is not set."
  echo "Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh"
  exit 1
fi

if [ -z "$MOTHERDUCK_TOKEN" ]; then
  echo "ERROR: MOTHERDUCK_TOKEN environment variable is not set."
  echo "Usage: POSTGRES_PASSWORD=your_secure_password MOTHERDUCK_TOKEN=your_md_token ./setup_pgduckdb.sh"
  exit 1
fi

# Update package lists - continue even if some repositories fail
echo "Updating package lists..."
if [[ "$OS_VERSION" == "Ubuntu" ]]; then
  sudo apt-get update -y || true
elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
  sudo dnf update -y || true
else
  sudo yum update -y || true
fi

# Check if Docker is already installed
if command -v docker &>/dev/null; then
  echo "Docker is already installed, skipping installation."
else
  # Install prerequisites based on OS
  echo "Installing prerequisites..."
  if [[ "$OS_VERSION" == "Ubuntu" ]]; then
    sudo apt-get install -y \
      apt-transport-https \
      ca-certificates \
      curl \
      gnupg \
      lsb-release
  elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
    # Use --allowerasing to handle curl package conflicts
    sudo dnf install -y --allowerasing \
      device-mapper-persistent-data \
      lvm2 \
      ca-certificates
  else
    sudo yum install -y \
      device-mapper-persistent-data \
      lvm2 \
      ca-certificates
  fi

  # Install Docker based on OS
  echo "Installing Docker..."
  if [[ "$OS_VERSION" == "Ubuntu" ]]; then
    # Add Docker's official GPG key
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
    # Set up the repository
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    # Update and install
    sudo apt-get update -y
    sudo apt-get install -y docker-ce docker-ce-cli containerd.io
  elif [[ "$OS_VERSION" == "Amazon Linux 2023" ]]; then
    # Amazon Linux 2023 - use the standard package
    sudo dnf install -y docker
  elif [[ "$OS_VERSION" == "Amazon Linux 2" ]]; then
    # Amazon Linux 2 - use extras
    sudo amazon-linux-extras install -y docker
  else
    # Fallback
    sudo yum install -y docker
  fi

  # Verify Docker was installed
  if ! command -v docker &>/dev/null; then
    echo "ERROR: Docker installation failed."
    exit 1
  fi
fi

# Start Docker service
echo "Starting Docker service..."
sudo systemctl start docker || sudo service docker start
sudo systemctl enable docker || sudo chkconfig docker on

# Add current user to docker group to avoid using sudo with docker commands
echo "Adding current user to docker group..."
sudo usermod -aG docker "$USER"

# Create a new data directory for PostgreSQL 17
echo "Creating new data directory for PostgreSQL 17..."
mkdir -p ~/pgduckdb_data_v17

# Fix permissions on the data directory
echo "Setting correct permissions on data directory..."
sudo chown -R 999:999 ~/pgduckdb_data_v17  # 999 is the standard UID for the postgres user in Docker
sudo chmod 700 ~/pgduckdb_data_v17

# Check architecture
ARCH=$(uname -m)
echo "Detected architecture: $ARCH"

if [[ "$ARCH" == "aarch64" || "$ARCH" == "arm64" ]]; then
  echo "Using ARM64 architecture (Graviton)..."
else
  echo "Using x86_64 architecture..."
fi

# Check if the container already exists and remove it if necessary
if sudo docker ps -a | grep -q pgduckdb; then
  echo "Found existing pgduckdb container. Removing it..."
  sudo docker stop pgduckdb || true
  sudo docker rm pgduckdb || true
fi

# Pull the Docker image
echo "Pulling Docker image..."
sudo docker pull pgduckdb/pgduckdb:17-main

# Check available system memory
echo "Checking system memory..."
TOTAL_MEM_KB=$(grep MemTotal /proc/meminfo | awk '{print $2}')
TOTAL_MEM_MB=$((TOTAL_MEM_KB / 1024))
echo "Total system memory: ${TOTAL_MEM_MB}MB"

# Calculate 75% of system memory for the Docker container limit
DOCKER_MEM_LIMIT=$((TOTAL_MEM_MB * 75 / 100))
echo "Setting Docker container memory limit to: ${DOCKER_MEM_LIMIT}MB"

# Run the Docker container with the memory limit
echo "Starting PostgreSQL container..."
sudo docker run -d \
  --name pgduckdb \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD="$POSTGRES_PASSWORD" \
  -e MOTHERDUCK_TOKEN="$MOTHERDUCK_TOKEN" \
  -v ~/pgduckdb_data_v17:/var/lib/postgresql/data \
  --restart unless-stopped \
  --memory=${DOCKER_MEM_LIMIT}m \
  pgduckdb/pgduckdb:17-main

# Wait for PostgreSQL to start
echo "Waiting for PostgreSQL to start..."
sleep 10

# Configure PostgreSQL
echo "Configuring PostgreSQL and DuckDB..."

# Append settings to the main PostgreSQL configuration file
echo "Appending settings to PostgreSQL configuration file..."
sudo docker exec -i pgduckdb bash -c "cat >> /var/lib/postgresql/data/postgresql.conf << 'EOT'

# DuckDB integration settings
duckdb.motherduck_enabled = true

# Memory configuration optimized for AWS m7g.xlarge with more conservative settings
work_mem = '32MB'                    # Per-operation memory for sorts, joins, etc.
maintenance_work_mem = '512MB'       # Memory for maintenance operations
shared_buffers = '2GB'               # ~12.5% of RAM for shared buffer cache
effective_cache_size = '6GB'         # Conservative estimate of OS cache
max_connections = 100                # Reduced maximum concurrent connections

# Detailed query logging
log_min_duration_statement = 0        # Log all queries
log_statement = 'all'                 # Log all SQL statements
log_duration = on                     # Log duration of each SQL statement
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u ' # Prefix format
EOT"

# Restart PostgreSQL to apply all configuration settings
echo "Restarting PostgreSQL container to apply all configuration settings..."
sudo docker restart pgduckdb

# Wait for PostgreSQL to restart
echo "Waiting for PostgreSQL container to restart..."
sleep 10

# Verify PostgreSQL is running with the new settings
echo "Verifying PostgreSQL configuration..."
sudo docker exec -i pgduckdb psql -U postgres << EOF
-- Check if PostgreSQL is running
SELECT version();
EOF

# Create monitoring script
echo "Creating monitoring script..."
cat > ~/monitor_pg.sh << 'EOF'
#!/bin/bash
echo "=== PostgreSQL Container Status ==="
docker ps -a -f name=pgduckdb

echo -e "\n=== Resource Usage ==="
docker stats --no-stream pgduckdb

echo -e "\n=== Recent Logs ==="
docker logs --tail 10 pgduckdb

echo -e "\n=== Connection Test ==="
if docker exec pgduckdb pg_isready -U postgres; then
  echo "PostgreSQL is accepting connections."
else
  echo "PostgreSQL is not accepting connections."
fi
EOF
chmod +x ~/monitor_pg.sh

# Create startup script
echo "Creating startup script..."
cat > ~/start_pg.sh << 'EOF'
#!/bin/bash
echo "Starting PostgreSQL container..."
docker start pgduckdb
echo "Container status:"
docker ps -a -f name=pgduckdb
EOF
chmod +x ~/start_pg.sh

# Check if the container is running or restarting
echo "Checking container status..."
CONTAINER_STATUS=$(sudo docker inspect -f '{{.State.Status}}' pgduckdb 2>/dev/null || echo "not_found")

if [[ "$CONTAINER_STATUS" == "restarting" ]]; then
  echo "WARNING: Container is restarting. Checking logs for errors..."
  sudo docker logs pgduckdb
  echo "Try reducing the memory settings in the PostgreSQL configuration if the container keeps restarting."
  echo "You can manually adjust settings by connecting to the container once it's stable."
elif [[ "$CONTAINER_STATUS" != "running" && "$CONTAINER_STATUS" != "not_found" ]]; then
  echo "WARNING: Container is not running (status: $CONTAINER_STATUS). Checking logs for errors..."
  sudo docker logs pgduckdb
fi

# Final status check
echo "=== Setup Complete ==="
echo "PostgreSQL with DuckDB is now running."
echo "Container status:"
sudo docker ps -a -f name=pgduckdb

echo -e "\n=== Connection Information ==="
echo "Host: localhost"
echo "Port: 5432"
echo "User: postgres"
echo "Password: [The password you provided]"
echo "Database: postgres"

echo -e "\n=== Useful Commands ==="
echo "Monitor status: ./monitor_pg.sh"
echo "Start after reboot: ./start_pg.sh"
echo "Connect to PostgreSQL: docker exec -it pgduckdb psql -U postgres"
echo "View logs: docker logs pgduckdb"

echo -e "\n=== Note ==="
echo "You may need to log out and log back in for the docker group changes to take effect."
echo "After that, you can run docker commands without sudo."
```

</details>

## dbt cloud configuration

dbt cloud is configured as a standard Postgres connection, with a few key details.

1. You will need to create a schema in MotherDuck for each user, as well as for production, since creating new schemas in MotherDuck through pg_duckdb is not supported.
2. You will need to set an environment variable `DBT_SCHEMA` that uses the pg_duckdb schema format, `ddb$[database]$[schema]`, since Postgres only supports a single database per instance. Set it for each user as well as for production, and reference it with `{{ env_var('DBT_SCHEMA') }}`.
3. The recommended thread count follows our dbt-core recommendation of 4 threads.
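
For example, for a hypothetical developer schema `dbt_alice` in a MotherDuck database `my_db` (both names are placeholders), the schema is created in MotherDuck ahead of time and the environment variable takes the pg_duckdb form:

```sh
# The schema must already exist in MotherDuck, e.g. created via the DuckDB CLI:
#   duckdb "md:my_db" -c "CREATE SCHEMA IF NOT EXISTS dbt_alice"
# dbt cloud then targets that schema through pg_duckdb's naming scheme:
DBT_SCHEMA='ddb$my_db$dbt_alice'   # ddb$[database]$[schema]
echo "$DBT_SCHEMA"
```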

If dbt is configured incorrectly, data may be written to Postgres, which is much slower than MotherDuck. In that case, the easiest fix is to rebuild the Docker container as described above, to ensure that no data accidentally ends up in Postgres.

## Usage notes

There are a few unusual things to know about using dbt cloud with pg_duckdb.

1. You write Postgres-dialect SQL that is executed against DuckDB. As such, there are some idiosyncrasies that are neither Postgres nor DuckDB, but a secret, third thing (pg_duckdb SQL). The details are described in the [pg_duckdb documentation](https://github.com/duckdb/pg_duckdb/blob/main/docs/README.md).
2. Views are stored only in Postgres, without any artifacts in MotherDuck. They can therefore be used for interim data, but not for final datasets consumed by end users. Changing the materialization type from view to table, or table to view, is a hybrid MotherDuck & Postgres transaction and is unsupported.
3. Running on multiple threads can occasionally cause deadlocks with the pg_duckdb catalog maintenance service. This can be resolved with `dbt retry` in your production pipeline runs.
4. DuckDB types are more specific than Postgres types, so model builds using numeric types may throw errors that can be resolved with explicit typing.
5. From time to time the Postgres catalog can get out of sync and show tables that do not exist in MotherDuck. To resolve this, create the missing object in MotherDuck, e.g. `CREATE TABLE my_schema.model_name AS SELECT 1;`, which will unblock your dbt model.
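
A sketch of that catalog fix, assuming the DuckDB CLI and placeholder names `my_db`, `my_schema`, and `model_name`:

```sh
# Create the object directly in MotherDuck so the stale Postgres catalog
# entry resolves and the blocked dbt model can build.
duckdb "md:my_db" -c "CREATE TABLE IF NOT EXISTS my_schema.model_name AS SELECT 1;"
```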
