
2023/10/02 - Michael Hunger
Exploring StackOverflow with DuckDB on MotherDuck (Part 2)
Exploring StackOverflow with DuckDB on MotherDuck (Part 2)
So, you've got some data in Azure blob storage, and you want to run some queries? You can do that with DuckDB or MotherDuck! DuckDB is a lightweight app you install on your computer and execute queries by typing in commands at your terminal. MotherDuck is essentially DuckDB in the cloud, with a UI running in your browser. There's nothing to install. The good news is that both now support querying data stored on Azure!
Whether you use DuckDB or MotherDuck, you need your Azure connection string to authenticate to the Azure platform. You can find your connection string in the Azure portal. Under Resources, click your storage container. Under Security + networking, click Access keys. If this is your first time using access keys, you may need to generate a new one. Next, click the Show button to reveal your connection string. Select the entire connection string and copy it to your clipboard.

As mentioned on this page in the Azure portal, keeping your connection string secure is very important. Learn more about Azure connection strings.
The DuckDB command-line interface (CLI) application supports Azure queries through an optional extension. To use this extension, you must start the application, install the extension, and configure your Azure connection string.
If you haven't already, download and install DuckDB on your computer. Open your terminal or command prompt, and start the DuckDB application.
Copy code
./duckdb
Now that you have DuckDB running, you must install and activate the Azure extension. You can do this in the DuckDB CLI with the following commands.
Copy code
INSTALL azure; LOAD azure;
With the Azure extension loaded, you can configure the extension to use your Azure connection string. Use the following SET command, replacing <your_connection_string> with the value copied from the Azure portal.
Copy code
SET azure_storage_connection_string = '<your_connection_string>';
You are now ready to query data files stored in your Azure container!
Here is the syntax for querying a file in Azure Blob storage.
Copy code
FROM 'azure://[container]/[file-name-or-file-pattern]'
For example, to query a file named survey_results.csv in a container named my_container, the SQL may look like the following.
Copy code
SELECT count(*) FROM 'azure://my_container/survey_results.csv';
You can also query across multiple files with a file-matching pattern. For example, if you have separate files for each month of the year named year-month-sales.csv, you could query across the entire year using the following.
Copy code
SELECT count(*) FROM 'azure://my_container/2023-*-sales.csv';
Combining the new Azure extension and the HTTPS extension, it's possible to query across multiple storage providers, should the need arise. For example, you may have historical data stored in Amazon S3 and more recent data stored in Azure and need to query across both.
Copy code
-- Load and configure Azure
INSTALL azure;
LOAD azure;
SET azure_storage_connection_string = 'your-connection-string';
-- Load and configure Amazon S3
INSTALL httpfs;
LOAD httpfs;
SET s3_access_key_id='your-access-key-id';
SET s3_secret_access_key='your-secret-access-key';
SET s3_region='your-region';
SELECT t1.*
FROM (
SELECT * FROM 's3://my-s3-bucket/sales-history.csv'
UNION ALL
SELECT * FROM 'azure://my-container/ytd-sales.csv'
) t1
ORDER BY "Gross Amt" DESC
LIMIT 10;
MotherDuck is a powerful, serverless analytics tool that enables you to run queries directly from your browser. And, there are fewer steps to configure MotherDuck to query Azure.
MotherDuck provides a secure and convenient way to store your Azure connection string so you can query Azure whenever you need. To save your Azure connection string in MotherDuck, log in to your MotherDuck account and complete the following steps.

Next, create a new cell in your MotherDuck notebook. Then, write a SQL query to access your Azure storage account. For example, if you have a file saved in my_container named ytd-sales.csv, you might try the following.
Copy code
SELECT *
FROM 'azure://my_container/ytd-sales.csv'
ORDER BY "Gross Sales"
LIMIT 10;
You are ready to duck and roll with MotherDuck and Azure!
With the Azure extension for DuckDB, you can now query data in secure Azure Blob storage, including CSV, JSON, parquet, Apache Iceberg, and others. To learn more, you may be interested in the following.

2023/10/02 - Michael Hunger
Exploring StackOverflow with DuckDB on MotherDuck (Part 2)
2023/10/30 - Marcos Ortiz
DuckDB news: Enhanced CSV reader with dialect detection. sqlfmt formatter hits 1.5M downloads. Harlequin terminal IDE. Spatial data management course launches.