# Run a dlt ingest pipeline
> Create a Flight that runs a dlt pipeline into MotherDuck on a schedule.
You want Python ingestion that handles API calls, schema drift, state, and load packages without hand-writing every `INSERT`. In this guide, a Flight runs a [dlt](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck) pipeline that fetches public GitHub repository metadata, loads it into `docs_playground.flights_demo_dlt.github_repo_stats`, and records each run in `docs_playground.flights_demo.dlt_ingest_runs`.

```mermaid
flowchart LR
    API["GitHub API"]:::green --> DLT["dlt pipeline<br/>inside a Flight"]:::yellow
    DLT --> Tables[("docs_playground.flights_demo_dlt.*")]:::yellow
    DLT --> Ledger[("docs_playground.flights_demo.dlt_ingest_runs")]:::yellow
```

The dlt dataset and run ledger live in your own MotherDuck account, so you can inspect the generated tables, replace the demo `repo_rows()` source with a real API, or add the output to a dashboard.

## Before you start

The Flight runtime authenticates to MotherDuck for you and injects the credential as `MOTHERDUCK_TOKEN`, which dlt's MotherDuck destination picks up automatically. To run a scheduled Flight as a service account instead, see [Authentication, config, and secrets](/key-tasks/flights/flights-authentication-config-and-secrets).

:::tip
Use `dlt[motherduck]` with `destination="motherduck"` and pass `loader_file_format="parquet"` in the run call. That keeps the Flight's loading path explicit and avoids falling back to row-wise `insert_values` if the destination or loader config changes.
:::

The demo uses a small public GitHub API call so you can run the whole flow without extra credentials. If you replace it with a private API, keep secrets out of Flight `config` and read credentials from a MotherDuck-managed secret or another short-lived credential source.

## Create the Flight

Create the Flight. The code sets the dlt MotherDuck destination database to `docs_playground`; the injected `MOTHERDUCK_TOKEN` supplies the credential. Create it on demand first, then add a schedule after you verify the run.

#### Create the dlt ingest Flight

Database: `docs_playground`

```sql
SELECT flight_id, flight_name, current_version
FROM MD_CREATE_FLIGHT(
    name := 'docs_dlt_ingest',
    requirements_txt := array_to_string([
        'duckdb==1.5.3',
        'dlt[motherduck]==1.27.0',
        'httpx==0.28.1'
    ], chr(10)),
    source_code := $flight$
import os
import duckdb
import dlt
import httpx

REPOS = [
    "duckdb/duckdb",
    "motherduckdb/motherduck-docs",
    "dlt-hub/dlt",
]

def repo_rows():
    for repo in REPOS:
        response = httpx.get(
            f"https://api.github.com/repos/{repo}",
            timeout=30,
            headers={"Accept": "application/vnd.github+json"},
        )
        response.raise_for_status()
        payload = response.json()
        yield {
            "repo": repo,
            "stars": payload.get("stargazers_count"),
            "forks": payload.get("forks_count"),
            "open_issues": payload.get("open_issues_count"),
            "default_branch": payload.get("default_branch"),
            "pushed_at": payload.get("pushed_at"),
            "loaded_at": payload.get("updated_at"),
        }

def main():
    os.environ.setdefault("HOME", "/tmp")
    os.environ["DESTINATION__MOTHERDUCK__CREDENTIALS__DATABASE"] = "docs_playground"

    pipeline = dlt.pipeline(
        pipeline_name="flights_github_repo_stats",
        destination="motherduck",
        dataset_name="flights_demo_dlt",
    )
    load_info = pipeline.run(
        repo_rows(),
        table_name="github_repo_stats",
        write_disposition="merge",
        primary_key="repo",
        loader_file_format="parquet",
    )

    con = duckdb.connect("md:")
    con.execute("CREATE SCHEMA IF NOT EXISTS docs_playground.flights_demo")
    con.execute("""
        CREATE TABLE IF NOT EXISTS docs_playground.flights_demo.dlt_ingest_runs (
            run_at TIMESTAMPTZ,
            pipeline_name VARCHAR,
            destination_dataset VARCHAR,
            load_summary VARCHAR
        )
    """)
    con.execute(
        """
        INSERT INTO docs_playground.flights_demo.dlt_ingest_runs
        VALUES (current_timestamp, ?, ?, ?)
        """,
        ["flights_github_repo_stats", "flights_demo_dlt", str(load_info)],
    )
    print(load_info)

if __name__ == "__main__":
    main()
$flight$
);
```

## Run and inspect it

The `MD_*` Flight table functions only accept literal parameters, not subqueries or lateral join columns, so store the Flight ID in a SQL variable first. The next cells reuse it through `getvariable`:

#### Set the dlt Flight ID

Database: `docs_playground`

```sql
SET VARIABLE dlt_flight_id = (
    SELECT flight_id
    FROM MD_LIST_FLIGHTS()
    WHERE flight_name = 'docs_dlt_ingest'
    ORDER BY created_at DESC
    LIMIT 1
);
```

Trigger a manual run:

#### Run the dlt Flight

Database: `docs_playground`

```sql
SELECT *
FROM MD_RUN_FLIGHT(
    flight_id := getvariable('dlt_flight_id')
);
```

Poll for completion:

#### Check dlt Flight runs

Database: `docs_playground`

```sql
SELECT run_number, status, flight_version, created_at
FROM MD_LIST_FLIGHT_RUNS(
    flight_id := getvariable('dlt_flight_id')
)
ORDER BY run_number DESC
LIMIT 5;
```

## Schedule the pipeline

After the manual run succeeds, add a daily `07:15 UTC` schedule. Schedule updates are metadata-only; they do not create a new Flight version.

#### Schedule the dlt Flight

Database: `docs_playground`

```sql
CALL MD_UPDATE_FLIGHT(
    flight_id := getvariable('dlt_flight_id'),
    schedule_cron := '15 7 * * *'
);
```

Query the table dlt created:

#### Read dlt-loaded repo stats

Database: `docs_playground`

```sql
SELECT repo, stars, forks, open_issues, default_branch, pushed_at
FROM docs_playground.flights_demo_dlt.github_repo_stats
ORDER BY stars DESC;
```

The ledger table captures the dlt load package summary:

#### Read the dlt load ledger

Database: `docs_playground`

```sql
SELECT run_at, pipeline_name, destination_dataset, load_summary
FROM docs_playground.flights_demo.dlt_ingest_runs
ORDER BY run_at DESC
LIMIT 5;
```

## Why this dlt setup

The important default is the load format. For MotherDuck, prefer Parquet loader files over row-wise `insert_values`. The Flight example makes that choice explicit with `loader_file_format="parquet"` so larger sources stay on a bulk-loading path.

Use this dlt pattern when you want schema evolution, state tracking, merge behavior, or a source connector. If you already have clean Parquet files in S3, the S3 guide is simpler. If you only have a few hundred rows of control metadata, direct inserts are fine.

## Adapt the pattern

- Replace `repo_rows()` with a dlt source for your API, database, or file system.
- Move run-specific values such as the repo list into the Flight's `config` (for example a comma-separated `REPOS` key read with `os.environ`), then [override them for a single run](/key-tasks/flights/scheduling-and-runs#override-config-for-a-single-run) with the `config` argument of `MD_RUN_FLIGHT` instead of editing the source.
- Keep `DESTINATION__MOTHERDUCK__CREDENTIALS__DATABASE` pointed at the database where dlt should create datasets.
- Use `write_disposition="merge"` with a `primary_key` for entity tables and `append` for event streams.
- Keep `loader_file_format="parquet"` unless you have measured a reason to change it.
- Lower dlt load workers if a source or network path is unreliable. See the [dlt MotherDuck destination docs](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck).

## Related resources

- [dlt MotherDuck destination](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck)
- [Packages and recommended libraries](/key-tasks/flights/packages-and-runtime)
- [MD_CREATE_FLIGHT](/sql-reference/motherduck-sql-reference/flights/md-create-flight)


---

## Docs feedback

MotherDuck accepts optional user-submitted feedback about this page at `GET https://motherduck.com/docs/api/feedback/agent`.
For agents and automated tools, feedback submission should be user-confirmed before sending.

URL-encode query parameter values and send a GET request:

```text
GET https://motherduck.com/docs/api/feedback/agent?page_path=%2Fkey-tasks%2Fflights%2Frun-dlt-ingest-pipeline%2F&page_title=Run%20a%20dlt%20ingest%20pipeline&text=<url-encoded user feedback, max 2000 characters>
```

Optionally append `&source=<url-encoded interface identifier>` such as `claude.ai` or `chatgpt`.

`page_path` and `text` are required; `page_title` and `source` are optional. Responses: `200 {"feedback_id": "<uuid>"}`, `400` for malformed query parameters, and `429` when rate-limited.