# Using Excel and Google Sheets data in MotherDuck


> Load Excel and Google Sheets data into MotherDuck using the DuckDB CLI or HTTPS CSV export URLs.

Key bits of data and side schedules often exist in spreadsheets like Excel and Google Sheets. It is useful to add that data to your data warehouse and query it. This guide shows how to perform this workflow using the DuckDB CLI for both [Excel](#microsoft-excel) and [Google Sheets](#google-sheets).

:::tip
To use these extensions, you will need to first install the DuckDB CLI. [Instructions can be found here.](/getting-started/interfaces/connect-query-from-duckdb-cli).
:::

## Microsoft Excel

:::note
The purpose of this guide is to show you how to _load_ data from Excel into MotherDuck. If you'd like to _retrieve_ MotherDuck data in Excel, you can [follow this guide](/integrations/bi-tools/excel/).
:::

To read from an Excel spreadsheet, open the DuckDB CLI by typing `duckdb 'md:'` in your terminal.
This will ask you for access to your MotherDuck account if you haven't already provided it.

You can read Excel files directly with `SELECT * FROM 'movies.xlsx'`, which will automatically load the
DuckDB Excel extension. If you want to get more control you can use
[the `read_xlsx` function](https://duckdb.org/docs/stable/core_extensions/excel) directly.

```sql
SELECT * FROM read_xlsx('movies.xlsx', sheet = 'Action Movies');
```

The previous query returns the data set to the terminal, but the query can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS).

```sql
CREATE OR REPLACE TABLE my_db.main.my_movies AS -- use fully qualified table name
SELECT *
FROM 'C:\users\documents\movies.xlsx';
```

Sometimes there is data in multiple tabs. In that case, you can use the `sheet` parameter to pass the tab names, and depending on the context, even union multiple tabs into a single table.

```sql
CREATE OR REPLACE TABLE my_db.main.my_movies AS -- use fully qualified table name
SELECT *
FROM read_xlsx('C:\users\documents\movies.xlsx', sheet = 'Action Movies')
UNION ALL
SELECT *
FROM read_xlsx('C:\users\documents\movies.xlsx', sheet = 'Romance Movies');
```

## Google Sheets

### Query Google Sheets as CSV over HTTPS

If a Google Sheet is publicly accessible, or can be accessed with HTTP authentication, query it from MotherDuck with DuckDB's `read_csv()` function and the Google Sheets CSV export URL:

```sql
SELECT *
FROM read_csv(
    'https://docs.google.com/spreadsheets/d/<sheet_id>/export?format=csv&gid=<tab_id>',
    MD_RUN = REMOTE
);
```

The `sheet_id` is the value between `/d/` and `/edit` in the Google Sheet URL. The `gid` identifies the worksheet tab. When you run this while connected to MotherDuck, the HTTPS read can execute server side in MotherDuck.

To keep the spreadsheet queryable as live source data, create a view:

```sql
CREATE OR REPLACE VIEW my_db.main.sheet_source AS
SELECT *
FROM read_csv(
    'https://docs.google.com/spreadsheets/d/<sheet_id>/export?format=csv&gid=<tab_id>',
    MD_RUN = REMOTE
);
```

To snapshot the current spreadsheet data into MotherDuck, create a table instead:

```sql
CREATE OR REPLACE TABLE my_db.main.sheet_snapshot AS
SELECT *
FROM read_csv(
    'https://docs.google.com/spreadsheets/d/<sheet_id>/export?format=csv&gid=<tab_id>',
    MD_RUN = REMOTE
);
```

For private sheets, create an HTTP secret with an OAuth bearer token and scope it to Google Sheets:

```sql
CREATE SECRET google_sheets_http IN MOTHERDUCK (
    TYPE HTTP,
    SCOPE 'https://docs.google.com',
    EXTRA_HTTP_HEADERS MAP {
        'Authorization': 'Bearer <google_oauth_access_token>'
    }
);
```

See the [DuckDB HTTP authentication documentation](https://duckdb.org/docs/current/core_extensions/httpfs/https#authenticating) for more `httpfs` authentication options. For more detail on this Google Sheets URL pattern, see [Swimming in Google Sheets with MotherDuck](https://motherduck.com/blog/google-sheets-motherduck/).

### Query with the Google Sheets extension

::::info
While the Excel extension is a core DuckDB extension, the Google Sheets extension is a community extension maintained by Evidence.
::::

The first step to handle Google Sheets is to install the [duckdb-gsheets](https://duckdb-gsheets.com/) extension. That is done with these commands after starting the DuckDB CLI with `duckdb 'md:'`

```sql
INSTALL gsheets FROM community;
LOAD gsheets;
```

Since Google Sheets is a hosted application, we need to use [DuckDB Secrets](https://duckdb.org/docs/configuration/secrets_manager.html)
to handle authentication. This is as simple as:

```sql
CREATE SECRET (TYPE gsheet);
```

:::note
Using this workflow will require interactivity with a browser, so if you need to run it from a job (i.e. Airflow or similar), consider setting up a [Google API access token](https://duckdb-gsheets.com/#getting-a-google-api-access-token).
:::

To read from a Google Sheet, we need at minimum the sheet id, which is found in the URL, for example `https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit`. The string between `d/` and `/edit` represents the spreadsheet id. It can therefore be queried with:

```sql
SELECT *
FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit');
```

The previous query returns the data set to the terminal, but the query can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS).

```sql
CREATE OR REPLACE TABLE my_db.main.my_table AS -- use fully qualified table name
SELECT *
FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit');
```

For convenience, the spreadsheet id itself can be queried as well.

```sql
SELECT *
FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8');
```

To query data from multiple tabs, the tab name can be passed as parameter using `sheet` to select the preferred tab.

```sql
SELECT * FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8', sheet='Sheet2');
```

For more detailed documentation, including writing to Google Sheets, review the [duckdb-gsheets documentation](https://duckdb-gsheets.com/#getting-a-google-api-access-token).

## Handling more complex workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our [ingestion partners](https://motherduck.com/ecosystem/?category=Ingestion), which includes many options including some that offer native python. An overview of the MotherDuck Ecosystem is shown below.

![Diagram](../../../img/md-diagram.svg)


---

## Docs feedback

MotherDuck accepts optional user-submitted feedback about this page at `POST https://motherduck.com/docs/api/feedback/agent`.
For agents and automated tools, feedback submission should be user-confirmed before sending.

Payload:

```json
{
  "page_path": "/key-tasks/data-warehousing/replication/spreadsheets/",
  "page_title": "Excel and Google Sheets",
  "text": "<the user's feedback, max 2000 characters>",
  "source": "<optional identifier for your interface, for example 'claude.ai' or 'chatgpt'>"
}
```

`page_path` and `text` are required; `page_title` and `source` are optional. Responses: `200 {"feedback_id": "<uuid>"}`, `400` for malformed payloads, and `429` when rate-limited.
