# Packages and recommended libraries
> Manage Python dependencies, choose Flight loading patterns, and use dlt and dbt for ingest and transformation.
A Flight runs your Python with the packages you list in `requirements.txt`. This page covers how to declare dependencies, how to choose a loading pattern for Flight ingestion, and the two libraries we recommend for the most common workloads.

## requirements.txt is plain pip syntax

Pass package specifications one per line, the same as a regular pip requirements file:

```text
duckdb==1.5.3
dlt==1.27.0
httpx==0.28.1
pandas==2.2.3
```

You can use any version specifier pip supports: `==`, `>=`, `~=`, extras (`some-package[extra]`), and so on.

The one dependency worth special attention is **DuckDB**: pin it to the version MotherDuck's server ships. Find that version in the [MotherDuck release notes](/about-motherduck/release-notes), or run a quick query against MotherDuck:

#### Check the MotherDuck DuckDB version

Database: `docs_playground`

```sql
SELECT version();
```

## The runtime environment

Before `main()` runs, the runtime installs the packages from `requirements.txt` into the Flight's Python environment. A few properties of that environment are worth knowing up front:

- **Declare every dependency in `requirements.txt`.** Dependencies are installed once, before `main()` starts; there's no interactive `pip` step inside the run. To run a tool's command-line interface (dbt, dlt), call its console script with `subprocess` — for example `subprocess.run(["dbt", "build"], check=True)`. The console scripts are on `PATH` after install, so you don't need `python -m`.
- **System binaries aren't preinstalled.** The runtime is a base Debian image. Tools like `git`, `ffmpeg`, or Playwright aren't present until you install them with `apt-get` at the start of `main()` (see [Beyond Python](/concepts/flights#beyond-python)). To pull source from a repository without `git`, install it first, or fetch an archive over HTTP from the host's API.

### Runtime limits

A Flight is sized for orchestration and basic processing, not for crunching large tables in the runtime memory. Two limits commonly bite first:

- **Definition size.** The Flight's `source_code` is capped at 200 KB, and `requirements.txt` at 20 KB. Don't embed reference data or large fixtures in the source — load them from object storage or an external URL at run time instead.
- **Memory.** The runtime has a fixed memory ceiling of 16 GB. Heavy in-memory work can be OOM-killed, often with little in the log. Keep heavy compute in SQL so MotherDuck does the work, process in bounded chunks, and when running dbt lower `--threads` to cap peak memory. See [Monitoring and debugging](/key-tasks/flights/monitoring-and-debugging#common-failure-patterns) for the OOM symptom and fix.

:::warning
`CAST(timestamptz AS VARCHAR)` renders in the **session time zone**. The same row hashed on a laptop (local time zone) and in a Flight (UTC) produces different strings, so md5 or row-hash recipes built on string-cast timestamps disagree across environments and can trigger a false full re-import. Pin the session time zone (`SET TimeZone = 'UTC';`) wherever determinism matters, or hash an epoch value (`epoch_ms(ts)`) instead of a string cast.
:::

## Choose a loading pattern

Flights often start with Python variables: API responses, scraped rows, JSON objects, or files written under `/tmp`. The slow path is to send one row at a time to MotherDuck. Pick a bulk pattern before the data grows.

| Source shape | Use this pattern | Why |
|---|---|---|
| A few hundred control rows | Direct `INSERT` or `executemany` is acceptable. | The code stays simple and the round-trip overhead is small enough. |
| API pages already in Python memory | Build batches with PyArrow, Polars, or Pandas, then `INSERT INTO ... SELECT` from the registered table. | Keeps the load as a bulk operation. PyArrow and Polars give better type control than plain Python objects. |
| Larger scrape or API pull without cloud storage | Write CSV, Parquet, or a local DuckDB file under `/tmp`, then load in chunks. | Keeps memory bounded. Parquet is typed and compressed; CSV is easy when you control both write and read. Clean up `/tmp` at the end of the run. |
| Files already in S3, or data you want to replay and backfill | Write Parquet to S3 and load with `read_parquet()` or `INSERT INTO ... SELECT`. | Best fit for large, partitioned, or shared datasets. It requires cloud credentials, but gives you durable staging and easier retries. |
| Schema-evolving API or app data | Use `dlt[motherduck]` and make the loader format explicit with `loader_file_format="parquet"`. | dlt handles state, schema evolution, and merge logic while avoiding row-wise remote inserts. |

As a rough rule, direct inserts are only for tiny control tables. For Flight ingestion, aim to flush batches rather than individual rows. Batches in the 10-100 MB range are usually easier to reason about than one huge load, and they leave room for retries, logging, and memory headroom.

:::tip
If you already have files in object storage, keep them there and let MotherDuck read them. If the data exists only inside the Flight process, batch it locally first; only write to S3 when you need durable staging, replay, backfills, or larger parallel reads.
:::

## Recommended libraries

Two libraries cover most of what teams build with Flights.

### dlt for ingest

[dlt](https://dlthub.com/) is the recommended Python library for moving data **into** MotherDuck. It handles schema evolution, incremental loading, retries, and state tracking, and it ships a MotherDuck destination out of the box.

```text
duckdb==1.5.3
dlt[motherduck]==1.27.0
```

A minimal ingest from a REST API into MotherDuck:

```python
import dlt
import httpx

def main():
    pipeline = dlt.pipeline(
        pipeline_name="github_stars",
        destination="motherduck",
        dataset_name="github",
    )

    response = httpx.get("https://api.github.com/repos/duckdb/duckdb", timeout=30)
    response.raise_for_status()
    pipeline.run(
        [response.json()],
        table_name="repo_stats",
        loader_file_format="parquet",
    )

if __name__ == "__main__":
    main()
```

Use the MotherDuck destination, not the generic DuckDB destination pointed at `md:`, for remote MotherDuck loads. The MotherDuck destination uses Parquet and `COPY` for data loading; the generic DuckDB destination has different defaults. Passing `loader_file_format="parquet"` in Flight examples makes the intended loading path explicit. See the [dlt MotherDuck destination docs](https://dlthub.com/docs/dlt-ecosystem/destinations/motherduck) for the full setup.

### dbt for transformation

[dbt](https://docs.getdbt.com/) with the [`dbt-duckdb`](https://github.com/duckdb/dbt-duckdb) adapter is the recommended way to run transformation graphs against MotherDuck data.

```text
duckdb==1.5.3
dbt-duckdb==1.10.1
```

Run a dbt project from a Flight:

```python
import os
import subprocess

def main():
    cwd = os.path.dirname(os.path.abspath(__file__))
    subprocess.run(["dbt", "build", "--target", "prod"], cwd=cwd, check=True)

if __name__ == "__main__":
    main()
```

If your dbt project pulls in dbt packages from git (for example, `dbt-utils` declared in `packages.yml`), install `git` at the start of `main()` before calling `dbt deps`:

```python
import subprocess

def main():
    subprocess.run(["apt-get", "update"], check=True)
    subprocess.run(["apt-get", "install", "-y", "git"], check=True)
    subprocess.run(["dbt", "deps"], check=True)
    subprocess.run(["dbt", "build"], check=True)

if __name__ == "__main__":
    main()
```


---

## Docs feedback

MotherDuck accepts optional user-submitted feedback about this page at `GET https://motherduck.com/docs/api/feedback/agent`.
For agents and automated tools, feedback submission should be user-confirmed before sending.

URL-encode query parameter values and send a GET request:

```text
GET https://motherduck.com/docs/api/feedback/agent?page_path=%2Fkey-tasks%2Fflights%2Fpackages-and-runtime%2F&page_title=Packages%20and%20recommended%20libraries&text=<url-encoded user feedback, max 2000 characters>
```

Optionally append `&source=<url-encoded interface identifier>` such as `claude.ai` or `chatgpt`.

`page_path` and `text` are required; `page_title` and `source` are optional. Responses: `200 {"feedback_id": "<uuid>"}`, `400` for malformed query parameters, and `429` when rate-limited.