Introducing Flights: agent-native data pipelines in MotherDuckJoin the livestream

Skip to main content

Replicating flat files to MotherDuck

The goal of this guide is to show users simple examples of loading data from flat file sources into MotherDuck. Examples are shown for both the MotherDuck Web UI and the DuckDB CLI. To install the DuckDB CLI, check out the instructions first.

CSV

From the UI, follow these steps:

  1. Navigate to the Add Data section.
  2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
  3. Execute the generated query which will create a table for you.
    1. Modify the query as needed to suit the correct Database / Schema / Table name.

JSON

From the UI, follow these steps:

  1. Navigate to the Add Data section.
  2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
  3. Execute the generated query which will create a table for you.
    1. Modify the query as needed to suit the correct Database / Schema / Table name.
Provide a schema for large or deeply nested JSON

When loading large JSON files, DuckDB scans the data to discover the schema during query planning. For deeply nested or complex JSON, this can add significant time.

To speed things up, provide the schema directly with the columns parameter:

CREATE TABLE my_table AS
SELECT * FROM read_json(
'path/to/local_file.json',
columns={
id: 'BIGINT',
name: 'VARCHAR',
amount: 'DECIMAL(10,2)'
}
);

If you already have a table with the right schema, use INSERT INTO instead of CREATE TABLE AS — DuckDB skips schema discovery when the target schema is known:

INSERT INTO my_table
SELECT * FROM read_json('path/to/local_file.json');

You can also limit how deep DuckDB looks into nested structures with maximum_depth, or reduce the number of sampled objects with sample_size (default: 20480). See the DuckDB JSON documentation for all available options.

Parquet

From the UI, follow these steps:

  1. Navigate to the Add Data section.
  2. Select the file. This file will be uploaded into your browser so that it can be queried by DuckDB.
  3. Execute the generated query which will create a table for you.
    1. Modify the query as needed to suit the correct Database / Schema / Table name.

Handling more complex workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our ingestion partners, which includes many options including some that offer native python. An overview of the MotherDuck Ecosystem is shown below.

Diagram