We Built a Machine for Getting AnswersTry the MotherDuck MCP

Skip to main content

Using Excel and Google Sheets Data in MotherDuck

Key bits of data and side schedules often exist in spreadsheets like Excel and Google Sheets. It is nice to be able to easily add that data to your data warehouse and query it. This guide aims to show you how to perform this workflow using the DuckDB CLI for both Excel and Google Sheets.

tip

In order use these extensions, you will need to first install the DuckDB CLI. Instructions can be found here..

Microsoft Excel

note

The purpose of this guide is to show you how to load data from Excel into MotherDuck. If you'd like to retrieve MotherDuck data in Excel, you can follow this guide.

To read from an Excel spreadsheet, open the DuckDB CLI by typing duckdb 'md:' in your terminal. This will ask you for access to your MotherDuck account if you haven't already provided it.

You can now read Excel files directly with a simple SELECT * FROM 'movies.xslx' which will automatically load the DuckDB Excel extension. If you want to get more control you can use the read_xlsx function directly.

SELECT * FROM read_xlsx('movies.xlsx', sheet = 'Action Movies');

The previous query simply returns the data set to the terminal, but the query can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS).

CREATE OR REPLACE TABLE my_db.main.my_movies AS -- use fully qualified table name
SELECT *
FROM "C:\users\documents\movies.xlsx";

Of course, sometimes there is data in multiple tabs. In that case, you can use the sheet parameter to pass the tab names, and depending on the context, even union multiple tabs into a single table.

CREATE OR REPLACE TABLE my_db.main.my_movies AS -- use fully qualified table name
SELECT *
FROM st_read("C:\users\documents\movies.xlsx", sheet = 'Action Movies')
UNION ALL
SELECT *
FROM st_read("C:\users\documents\movies.xlsx", sheet = 'Romance Movies');

Google Sheets

info

While the Excel extension is a core DuckDB extension, the Google Sheets extension is a community extension maintained by Evidence.

The first step to handle Google Sheets is to install the duckdb-gsheets extension. That is done with these commands after starting the DuckDB CLI with duckdb 'md:'

INSTALL gsheets FROM community;
LOAD gsheets;

Since Google Sheets is a hosted application, we need to use DuckDB Secrets to handle authentication. This is as simple as:

CREATE SECRET (TYPE gsheet);
note

Using this workflow will require interactivity with a browser, so if you need to run it from a job (i.e. Airflow or similar), consider setting up a Google API access token.

In order to read from a Google Sheet, we need at minimum the sheet id, which is found in the URL, for example https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit. The string between d/ and /edit represents the spreadsheet id. It can therefore be queried with:

SELECT *
FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit');

The previous query simply returns the data set to the terminal, but the query can be modified to write the data into MotherDuck with "Create Table As Select" (CTAS).

CREATE OR REPLACE TABLE my_db.main.my_table AS -- use fully qualified table name
SELECT *
FROM read_gsheet('https://docs.google.com/spreadsheets/d/11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8/edit');

For convenience, the spreadsheet id itself can be queried as well.

SELECT * 
FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8');

To query data from multiple tabs, the tab name can be passed as parameter using sheet to select the preferred tab.

SELECT * FROM read_gsheet('11QdEasMWbETbFVxry-SsD8jVcdYIT1zBQszcF84MdE8', sheet='Sheet2');

For more detailed documentation, including writing to Google Sheets, review the duckdb-gsheets documentation.

Handling More Complex Workflows

Production use cases tend to be much more complex and include things like incremental builds & state management. In those scenarios, please take a look at our ingestion partners, which includes many options including some that offer native python. An overview of the MotherDuck Ecosystem is shown below.

Diagram