GitHub Actions
GitHub Actions works well as a lightweight orchestrator for simple MotherDuck jobs: nightly SQL scripts, small ELT steps, dbt builds, smoke tests, and periodic exports. It is not a full data orchestrator, but it is often enough when a pipeline has one or two steps and can tolerate GitHub's scheduler behavior.
When to use this pattern
| Use GitHub Actions when | Use a dedicated orchestrator when |
|---|---|
| The job has a small number of steps | Jobs have complex dependencies or branching |
| A missed or delayed run can be retried manually | Every run needs strict service-level guarantees |
| The pipeline can run from repository files | State, retries, and backfills need first-class tracking |
| GitHub is already where you review pipeline changes | Multiple teams need a shared orchestration UI |
For larger workflows, use a tool from the MotherDuck orchestration ecosystem.
Set up authentication
Create a MotherDuck access token, preferably from a service account dedicated to the pipeline. Store it as a GitHub repository secret named MOTHERDUCK_TOKEN:
gh secret set MOTHERDUCK_TOKEN
Use the token as an environment variable in workflow steps. Avoid putting tokens directly into SQL files, command arguments, artifacts, or logs.
Choose the trigger
Most MotherDuck cron jobs should support both manual and scheduled runs with GitHub Actions workflow_dispatch and schedule triggers:
on:
workflow_dispatch:
schedule:
- cron: "17 2 * * *"
Keep these GitHub Actions scheduling details in mind:
- Scheduled workflows run from the latest commit on the default branch.
- Cron schedules use UTC by default.
- The shortest supported interval is every 5 minutes.
- Jobs scheduled at the top of the hour can be delayed or dropped during periods of high GitHub Actions load. Pick a non-zero minute such as
17or43. workflow_dispatchlets you test the same workflow manually and rerun failed jobs after a fix.
Example: run a SQL file on a schedule
This example runs a checked-in SQL script every night and on demand. It uses:
- Least-privilege repository permissions
- A timeout so failed jobs do not burn runner minutes indefinitely
- A concurrency group so two runs do not write to the same target at once
- The MotherDuck install script for a compatible DuckDB CLI
Create .github/workflows/motherduck-nightly-sql.yml:
name: motherduck nightly sql
on:
workflow_dispatch:
schedule:
- cron: "17 2 * * *"
permissions:
contents: read
concurrency:
group: motherduck-nightly-sql
cancel-in-progress: false
jobs:
run-sql:
runs-on: ubuntu-24.04
timeout-minutes: 15
env:
motherduck_token: ${{ secrets.MOTHERDUCK_TOKEN }}
steps:
- name: Check out repository
uses: actions/checkout@v6
- name: Install DuckDB CLI
run: |
install_home="$RUNNER_TEMP/motherduck"
mkdir -p "$install_home"
curl -s https://install.motherduck.com | env -u motherduck_token HOME="$install_home" sh
echo "$install_home/.duckdb/cli/latest" >> "$GITHUB_PATH"
- name: Run nightly SQL
run: duckdb "md:" < sql/nightly_orders.sql
Create sql/nightly_orders.sql:
CREATE DATABASE IF NOT EXISTS analytics;
USE analytics;
CREATE SCHEMA IF NOT EXISTS orchestration;
CREATE TABLE IF NOT EXISTS orchestration.github_action_runs (
run_id VARCHAR,
workflow_name VARCHAR,
run_started_at TIMESTAMP
);
DELETE FROM orchestration.github_action_runs
WHERE run_id = getenv('GITHUB_RUN_ID');
INSERT INTO orchestration.github_action_runs
VALUES (
getenv('GITHUB_RUN_ID'),
getenv('GITHUB_WORKFLOW'),
current_timestamp
);
Replace analytics with the MotherDuck database your pipeline should write to. The example creates the database if it does not already exist so a new repository can run without extra setup.
The GitHub secret is named MOTHERDUCK_TOKEN, while the workflow exposes it as motherduck_token. The DuckDB CLI can use that environment variable to connect to MotherDuck non-interactively in GitHub Actions.
The install step uses RUNNER_TEMP as HOME and unsets motherduck_token for the installer process so the install script does not try to update the runner's shell profile or validate the connection before the SQL step runs.
Example: run dbt on a schedule
For dbt projects, keep the dbt profile in the repository and read the MotherDuck token from the GitHub secret.
Create .github/workflows/motherduck-dbt.yml:
name: motherduck dbt
on:
workflow_dispatch:
schedule:
- cron: "43 3 * * *"
permissions:
contents: read
concurrency:
group: motherduck-dbt-prod
cancel-in-progress: false
jobs:
dbt-build:
runs-on: ubuntu-24.04
timeout-minutes: 30
env:
MOTHERDUCK_TOKEN: ${{ secrets.MOTHERDUCK_TOKEN }}
steps:
- name: Check out repository
uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: "3.12"
cache: pip
- name: Install dbt
run: python -m pip install -r requirements.txt
- name: Install dbt packages
run: dbt deps
- name: Build dbt project
run: dbt build --profiles-dir .github/dbt --target prod
Create requirements.txt:
dbt-duckdb>=1.9,<2.0
Create .github/dbt/profiles.yml:
motherduck:
target: prod
outputs:
prod:
type: duckdb
path: "md:analytics?motherduck_token={{ env_var('MOTHERDUCK_TOKEN') }}"
threads: 4
In dbt_project.yml, set the same profile name:
profile: motherduck
Production checklist
| Area | Recommendation |
|---|---|
| Authentication | Use a service account token stored as MOTHERDUCK_TOKEN. Rotate it on the same cadence as other production secrets. |
| Permissions | Set permissions: contents: read unless the workflow must write to the repository or call GitHub APIs. |
| Scheduling | Use non-zero cron minutes and keep workflow_dispatch enabled for manual retries. |
| Concurrency | Use a concurrency group for jobs that write to the same tables. |
| Idempotency | Make SQL safe to rerun. Prefer CREATE TABLE IF NOT EXISTS, CREATE OR REPLACE TABLE, MERGE, or delete-and-insert patterns keyed by the run or partition. |
| Timeouts | Set timeout-minutes on every job. |
| Dependencies | Pin dependencies in requirements.txt or an equivalent lock file. Use dependency caching for Python/dbt jobs. |
| Environments | Use separate service accounts and databases for development, staging, and production. |
| Observability | Write a run record to a small audit table and rely on GitHub Actions notifications for failures. |