Skip to main content

BigQuery

BigQuery is Google Cloud's fully-managed, serverless data warehouse that enables SQL queries using the processing power of Google's infrastructure.

BigQuery is Google Cloud's fully-managed, serverless data warehouse that lets you run SQL queries on the processing power of Google's infrastructure.

To load data from BigQuery into MotherDuck, use the duckdb-bigquery community extension. It reads through the BigQuery Storage Read API with parallel streams, filter pushdown, and Arrow compression — and loads results straight into DuckDB or MotherDuck without any glue code.

Prerequisites

  • DuckDB installed (using the CLI or Python).
  • Access to a GCP project with BigQuery enabled.
  • Valid Google Cloud credentials, provided through one of:
    • the GOOGLE_APPLICATION_CREDENTIALS environment variable, or
    • gcloud auth application-default login.

Minimum required IAM roles:

  • BigQuery Data Editor
  • BigQuery Job User

Loading data from BigQuery into MotherDuck

The following examples use the DuckDB CLI, but you can use any DuckDB or MotherDuck client.

Install and load the extension

INSTALL bigquery FROM community;
LOAD bigquery;

Attach a BigQuery project

To read data from your project, attach it like you would attach a DuckDB database:

ATTACH 'project=my-gcp-project' AS bq (TYPE bigquery, READ_ONLY);

To read from a public dataset, use the following syntax:

ATTACH 'project=bigquery-public-data dataset=pypi billing_project=my-gcp-project'
AS bq_public (TYPE bigquery, READ_ONLY);

Query a table

Once attached, you can query BigQuery tables directly using standard SQL syntax:

SELECT * FROM bq.dataset_name.table_name LIMIT 10;

Behind the scenes, this uses bigquery_scan. The extension also exposes two functions you can call directly:

bigquery_scan — for direct reads from a single table:

SELECT * FROM bigquery_scan('my_gcp_project.my_dataset.my_table');

bigquery_query — for custom GoogleSQL, including views and external tables that the Storage Read API can't access on its own:

SELECT * FROM bigquery_query(
'my_gcp_project',
'SELECT * FROM `my_gcp_project.my_dataset.my_table` WHERE column = "value"'
);

Both functions share the same Arrow scan engine. For very large reads, you can enable parallel read streams by relaxing DuckDB's default ordering guarantee:

SET preserve_insertion_order = FALSE;

Load data into MotherDuck

Verify the motherduck_token environment variable is set, then attach MotherDuck:

ATTACH 'md:';

Use CREATE TABLE ... AS to create a new table, or INSERT INTO ... SELECT to append data to an existing one:

CREATE DATABASE IF NOT EXISTS pypi_playground;
USE pypi_playground;

CREATE TABLE IF NOT EXISTS duckdb_sample AS
SELECT *
FROM bq_public.pypi.file_downloads
WHERE project = 'duckdb'
AND timestamp = TIMESTAMP '2025-05-26 00:00:00'
LIMIT 100;