Example Datasets
We have prepared a set of example datasets to help you get started with MotherDuck.
sample_data
The sample_data database is automatically attached to every MotherDuck account regardless of your region. You can start querying the following tables right away:
| schema.table | Description |
|---|---|
| who.ambient_air_quality | Historical air quality data from the World Health Organization |
| nyc.taxi | Taxi ride data from November 2020 |
| nyc.rideshare | Ride share trips (Lyft, Uber, etc.) in NYC |
| nyc.service_requests | Requests to NYC's 311 complaint hotline through phone and web |
| hn.hacker_news | Sample of comments from Hacker News |
| kaggle.movies | Movie titles and overviews with pre-computed embeddings from Kaggle |
| stackoverflow_survey.survey_results | Survey results from 2017 to 2024 |
| stackoverflow_survey.survey_schemas | Survey schemas (questions from the survey) from 2017 to 2024 |
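As a quick sketch of how these tables can be addressed, the queries below use the fully qualified database.schema.table form. They deliberately avoid naming any columns, since the column names are not listed on this page; running DESCRIBE first shows the actual schema.

```sql
-- Inspect the columns of a table before querying it
DESCRIBE sample_data.nyc.taxi;

-- Preview a few rows without assuming any column names
SELECT * FROM sample_data.nyc.taxi LIMIT 5;

-- Count the rows in another sample table
SELECT count(*) AS row_count FROM sample_data.hn.hacker_news;
```

The same pattern should work for any schema.table pair in the table above.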
Additional datasets
The following datasets are available as separate shared databases. See each dataset's page for instructions on how to attach it.
Note (aws-us-east-1 region only): These additional databases are available only to accounts in the aws-us-east-1 region.
| Dataset | Description |
|---|---|
| StackOverflow | Full StackOverflow data dump up to May 2023 |
| PyPi / DuckDB Stats | Python package download data for the duckdb package, refreshed weekly |
| Hacker News (full) | Full Hacker News dataset from 2016 to 2025 |
| Foursquare | Global dataset of over 100 million points of interest (POIs) with location and business information |
FAQ
How do I re-attach the sample_data database?
The sample_data database is attached automatically, but if you have accidentally removed it, you can re-attach it with:
ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
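To check that the database is attached, one option is to list the attached databases and run a small query against one of its tables. This is a minimal sketch that assumes the alias sample_data from the ATTACH statement above:

```sql
-- sample_data should appear in the list of attached databases
SHOW DATABASES;

-- A simple query confirms the tables are reachable
SELECT count(*) AS row_count FROM sample_data.who.ambient_air_quality;
```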