New: The AI Analytics Eval Field GuideGet the Free Playbook

Skip to main content

GitHub Actions

GitHub Actions works well as a lightweight orchestrator for simple MotherDuck jobs: nightly SQL scripts, small ELT steps, dbt builds, smoke tests, and periodic exports. It is not a full data orchestrator, but it is often enough when a pipeline has one or two steps and can tolerate GitHub's scheduler behavior.

When to use this pattern

Use GitHub Actions whenUse a dedicated orchestrator when
The job has a small number of stepsJobs have complex dependencies or branching
A missed or delayed run can be retried manuallyEvery run needs strict service-level guarantees
The pipeline can run from repository filesState, retries, and backfills need first-class tracking
GitHub is already where you review pipeline changesMultiple teams need a shared orchestration UI

For larger workflows, use a tool from the MotherDuck orchestration ecosystem.

Set up authentication

Create a MotherDuck access token, preferably from a service account dedicated to the pipeline. Store it as a GitHub repository secret named MOTHERDUCK_TOKEN:

gh secret set MOTHERDUCK_TOKEN

Use the token as an environment variable in workflow steps. Avoid putting tokens directly into SQL files, command arguments, artifacts, or logs.

Choose the trigger

Most MotherDuck cron jobs should support both manual and scheduled runs with GitHub Actions workflow_dispatch and schedule triggers:

on:
workflow_dispatch:
schedule:
- cron: "17 2 * * *"

Keep these GitHub Actions scheduling details in mind:

  • Scheduled workflows run from the latest commit on the default branch.
  • Cron schedules use UTC by default.
  • The shortest supported interval is every 5 minutes.
  • Jobs scheduled at the top of the hour can be delayed or dropped during periods of high GitHub Actions load. Pick a non-zero minute such as 17 or 43.
  • workflow_dispatch lets you test the same workflow manually and rerun failed jobs after a fix.

Example: run a SQL file on a schedule

This example runs a checked-in SQL script every night and on demand. It uses:

  • Least-privilege repository permissions
  • A timeout so failed jobs do not burn runner minutes indefinitely
  • A concurrency group so two runs do not write to the same target at once
  • The MotherDuck install script for a compatible DuckDB CLI

Create .github/workflows/motherduck-nightly-sql.yml:

name: motherduck nightly sql

on:
workflow_dispatch:
schedule:
- cron: "17 2 * * *"

permissions:
contents: read

concurrency:
group: motherduck-nightly-sql
cancel-in-progress: false

jobs:
run-sql:
runs-on: ubuntu-24.04
timeout-minutes: 15
env:
motherduck_token: ${{ secrets.MOTHERDUCK_TOKEN }}
steps:
- name: Check out repository
uses: actions/checkout@v6

- name: Install DuckDB CLI
run: |
install_home="$RUNNER_TEMP/motherduck"
mkdir -p "$install_home"
curl -s https://install.motherduck.com | env -u motherduck_token HOME="$install_home" sh
echo "$install_home/.duckdb/cli/latest" >> "$GITHUB_PATH"

- name: Run nightly SQL
run: duckdb "md:" < sql/nightly_orders.sql

Create sql/nightly_orders.sql:

CREATE DATABASE IF NOT EXISTS analytics;
USE analytics;

CREATE SCHEMA IF NOT EXISTS orchestration;

CREATE TABLE IF NOT EXISTS orchestration.github_action_runs (
run_id VARCHAR,
workflow_name VARCHAR,
run_started_at TIMESTAMP
);

DELETE FROM orchestration.github_action_runs
WHERE run_id = getenv('GITHUB_RUN_ID');

INSERT INTO orchestration.github_action_runs
VALUES (
getenv('GITHUB_RUN_ID'),
getenv('GITHUB_WORKFLOW'),
current_timestamp
);

Replace analytics with the MotherDuck database your pipeline should write to. The example creates the database if it does not already exist so a new repository can run without extra setup.

The GitHub secret is named MOTHERDUCK_TOKEN, while the workflow exposes it as motherduck_token. The DuckDB CLI can use that environment variable to connect to MotherDuck non-interactively in GitHub Actions.

The install step uses RUNNER_TEMP as HOME and unsets motherduck_token for the installer process so the install script does not try to update the runner's shell profile or validate the connection before the SQL step runs.

Example: run dbt on a schedule

For dbt projects, keep the dbt profile in the repository and read the MotherDuck token from the GitHub secret.

Create .github/workflows/motherduck-dbt.yml:

name: motherduck dbt

on:
workflow_dispatch:
schedule:
- cron: "43 3 * * *"

permissions:
contents: read

concurrency:
group: motherduck-dbt-prod
cancel-in-progress: false

jobs:
dbt-build:
runs-on: ubuntu-24.04
timeout-minutes: 30
env:
MOTHERDUCK_TOKEN: ${{ secrets.MOTHERDUCK_TOKEN }}
steps:
- name: Check out repository
uses: actions/checkout@v6

- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.12"
cache: pip

- name: Install dbt
run: python -m pip install -r requirements.txt

- name: Install dbt packages
run: dbt deps

- name: Build dbt project
run: dbt build --profiles-dir .github/dbt --target prod

Create requirements.txt:

dbt-duckdb>=1.9,<2.0

Create .github/dbt/profiles.yml:

motherduck:
target: prod
outputs:
prod:
type: duckdb
path: "md:analytics?motherduck_token={{ env_var('MOTHERDUCK_TOKEN') }}"
threads: 4

In dbt_project.yml, set the same profile name:

profile: motherduck

Production checklist

AreaRecommendation
AuthenticationUse a service account token stored as MOTHERDUCK_TOKEN. Rotate it on the same cadence as other production secrets.
PermissionsSet permissions: contents: read unless the workflow must write to the repository or call GitHub APIs.
SchedulingUse non-zero cron minutes and keep workflow_dispatch enabled for manual retries.
ConcurrencyUse a concurrency group for jobs that write to the same tables.
IdempotencyMake SQL safe to rerun. Prefer CREATE TABLE IF NOT EXISTS, CREATE OR REPLACE TABLE, MERGE, or delete-and-insert patterns keyed by the run or partition.
TimeoutsSet timeout-minutes on every job.
DependenciesPin dependencies in requirements.txt or an equivalent lock file. Use dependency caching for Python/dbt jobs.
EnvironmentsUse separate service accounts and databases for development, staging, and production.
ObservabilityWrite a run record to a small audit table and rely on GitHub Actions notifications for failures.