# Ingest S3 Parquet files on a schedule
> Create a Flight that refreshes a MotherDuck table from Parquet files in S3 on a cron schedule.
You have Parquet files in S3 and want a MotherDuck table to refresh on a schedule. This guide builds the smallest version that works: the Flight's `config` holds a source Parquet URL (or S3 glob) and a destination table, and each run rebuilds that table from the source.

The Flight reads two config values, so you change the source or the destination without editing the Python:

| Config key | Example | Purpose |
|---|---|---|
| `SOURCE` | `s3://my-bucket/events/**/*.parquet` | A Parquet URL or S3 glob to read. |
| `DESTINATION_TABLE` | `docs_playground.main.yellow_taxi` | Fully qualified destination table to rebuild. |

For an incremental version that prunes to a single Hive partition and keeps a run ledger, use the cookbook [Ingest partitioned S3 Parquet on a schedule](/cookbook/flight-scheduled-s3-ingest/).

## Before you start

The Flight runtime authenticates to MotherDuck for you: `duckdb.connect("md:")` inside the Flight picks up your identity automatically, so there is nothing to configure. To run a scheduled Flight as a service account instead, see [Authentication, config, and secrets](/key-tasks/flights/flights-authentication-config-and-secrets).

:::info
For private S3 paths, create a MotherDuck S3 secret first. The Flight's `read_parquet()` call uses the matching secret when it's available to the user or service account the Flight runs as.
:::

Preview the public source file before scheduling anything:

#### Preview the public S3 Parquet file

Database: `docs_playground`

```sql
SELECT passenger_count, trip_distance, total_amount, tpep_pickup_datetime
FROM read_parquet('s3://us-prd-motherduck-open-datasets/nyc_taxi/parquet/yellow_cab_nyc_2022_11.parquet')
LIMIT 10;
```

## Create the Flight

The `config` map keeps the source URI and destination table outside the Python source. Create the Flight without a schedule so you can run it once by hand first.

#### Create the S3 Parquet ingest Flight

Database: `docs_playground`

```sql
SELECT flight_id, flight_name, current_version
FROM MD_CREATE_FLIGHT(
    name := 's3_parquet_ingest_docs_demo',
    config := MAP {
        'SOURCE': 's3://us-prd-motherduck-open-datasets/nyc_taxi/parquet/yellow_cab_nyc_2022_11.parquet',
        'DESTINATION_TABLE': 'docs_playground.main.yellow_taxi'
    },
    requirements_txt := 'duckdb==1.5.3',
    source_code := $flight$
import os
import duckdb

def main():
    source = os.environ["SOURCE"]
    destination_table = os.environ["DESTINATION_TABLE"]

    con = duckdb.connect("md:")
    con.execute(f"CREATE OR REPLACE TABLE {destination_table} AS SELECT * FROM read_parquet('{source}')")
    row_count = con.execute(f"SELECT count(*) FROM {destination_table}").fetchone()[0]
    print(f"loaded {row_count} rows into {destination_table} from {source}")

if __name__ == "__main__":
    main()
$flight$
);
```

One Flight is enough: because `config` can be overridden per run, you reuse the same Flight for a one-off backfill instead of creating a uniquely named Flight per source. See [Run a one-off backfill](#run-a-one-off-backfill) below.

## Run it

Trigger a manual run before you trust the schedule. The `MD_*` Flight table functions only accept literal parameters, not subqueries, so store the Flight ID in a SQL variable first. The next cells reuse it through `getvariable`:

#### Set the S3 ingest Flight ID

Database: `docs_playground`

```sql
SET VARIABLE s3_ingest_flight_id = (
    SELECT flight_id
    FROM MD_LIST_FLIGHTS()
    WHERE flight_name = 's3_parquet_ingest_docs_demo'
    ORDER BY created_at DESC
    LIMIT 1
);
```

#### Run the S3 ingest Flight

Database: `docs_playground`

```sql
SELECT *
FROM MD_RUN_FLIGHT(
    flight_id := getvariable('s3_ingest_flight_id')
);
```

Runs are asynchronous. Poll the run history until the latest run reaches a terminal status:

#### Check the run status

Database: `docs_playground`

```sql
SELECT run_number, status, flight_version, created_at
FROM MD_LIST_FLIGHT_RUNS(
    flight_id := getvariable('s3_ingest_flight_id')
)
ORDER BY run_number DESC
LIMIT 5;
```

When the run succeeds, the table is ready for SQL:

#### Query the refreshed table

Database: `docs_playground`

```sql
SELECT CAST(tpep_pickup_datetime AS DATE) AS pickup_date,
       count(*) AS trips,
       round(avg(total_amount), 2) AS avg_total
FROM docs_playground.main.yellow_taxi
GROUP BY ALL
ORDER BY pickup_date
LIMIT 15;
```

## Schedule the refresh

After the manual run succeeds, add a daily `06:30 UTC` schedule. Schedule updates are metadata-only and don't create a new Flight version.

#### Schedule the S3 ingest Flight

Database: `docs_playground`

```sql
CALL MD_UPDATE_FLIGHT(
    flight_id := getvariable('s3_ingest_flight_id'),
    schedule_cron := '30 6 * * *'
);
```

## Run a one-off backfill

To load a different file or write to a different table for a single run, override `config` on `MD_RUN_FLIGHT`. You can only override keys already defined on the Flight, and the override applies to that run alone:

#### Backfill a different month with a per-run override

Database: `docs_playground`

```sql
CALL MD_RUN_FLIGHT(
    flight_id := getvariable('s3_ingest_flight_id'),
    config := MAP {
        'SOURCE': 's3://us-prd-motherduck-open-datasets/nyc_taxi/parquet/yellow_cab_nyc_2022_10.parquet'
    }
);
```

## Adapt the pattern

- Point `SOURCE` at your own bucket with a glob, for example `s3://your-bucket/events/**/*.parquet`.
- Change `DESTINATION_TABLE` to any fully qualified table you can write to.
- Replace the `SELECT *` with a projection or aggregation to shape the data as it lands.
- When the source is partitioned and only the latest partition changes, switch to the incremental cookbook below so each run reads one partition instead of the whole dataset.

## Related resources

- [Ingest partitioned S3 Parquet on a schedule](/cookbook/flight-scheduled-s3-ingest/) — the incremental, partition-pruned Flight template.
- [Authentication, config, and secrets](/key-tasks/flights/flights-authentication-config-and-secrets) — service accounts, secrets, and per-run config overrides.
- [Querying S3 files](/key-tasks/cloud-storage/querying-s3-files)
- [CREATE SECRET](/sql-reference/motherduck-sql-reference/create-secret)
- [MD_CREATE_FLIGHT](/sql-reference/motherduck-sql-reference/flights/md-create-flight)


---

## Docs feedback

MotherDuck accepts optional user-submitted feedback about this page at `GET https://motherduck.com/docs/api/feedback/agent`.
For agents and automated tools, feedback submission should be user-confirmed before sending.

URL-encode query parameter values and send a GET request:

```text
GET https://motherduck.com/docs/api/feedback/agent?page_path=%2Fkey-tasks%2Fflights%2Fingest-s3-parquet-files-on-a-schedule%2F&page_title=Ingest%20S3%20Parquet%20files%20on%20a%20schedule&text=<url-encoded user feedback, max 2000 characters>
```

Optionally append `&source=<url-encoded interface identifier>` such as `claude.ai` or `chatgpt`.

`page_path` and `text` are required; `page_title` and `source` are optional. Responses: `200 {"feedback_id": "<uuid>"}`, `400` for malformed query parameters, and `429` when rate-limited.