---
title: Example Datasets
description: A collection of open datasets and queries to get you started with DuckDB and MotherDuck
---


We have prepared a series of datasets for you to [dive](/key-tasks/ai-and-motherduck/dives/) into MotherDuck!

## sample_data

The `sample_data` database is automatically attached to every MotherDuck account regardless of your region. You can start querying the following tables right away:

| `schema.table`                                                     | Description                                                                                                    |
|--------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
| [`who.ambient_air_quality`](air-quality.md)                        | Historical air quality data from the World Health Organization.                                                |
| [`nyc.taxi`](nyc-311-data.md)                                      | Taxi ride data from November 2020                                                                              |
| [`nyc.rideshare`](nyc-311-data.md)                                 | Ride share trips (Lyft, Uber, etc.) in NYC                                                                     |
| [`nyc.service_requests`](nyc-311-data.md)                          | Requests to NYC's 311 complaint hotline via phone and web                                                      |
| [`hn.hacker_news`](hacker-news.md)                                 | Sample of comments from [Hacker News](https://news.ycombinator.com/)                                          |
| [`kaggle.movies`](kaggle-movies.md)                                | Movie titles and overviews with pre-computed embeddings from [Kaggle](https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset) |
| [`stackoverflow_survey.survey_results`](stackoverflow-survey.md)   | Survey results from 2017 to 2024                                                                               |
| [`stackoverflow_survey.survey_schemas`](stackoverflow-survey.md)   | Survey schemas (questions from the survey) from 2017 to 2024                                                   |
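
As a first look at one of these tables, something like the following sketch should work. The three-part `database.schema.table` name is built from the database and table names listed above. No column names are assumed, so the queries only describe the table and preview a few rows.

```sql
-- Show the column names and types of the NYC taxi table
DESCRIBE sample_data.nyc.taxi;

-- Preview a handful of rows
SELECT *
FROM sample_data.nyc.taxi
LIMIT 5;
```

The same pattern applies to any other table in the list: swap in its `schema.table` name after `sample_data.`.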

## Additional datasets

The following datasets are available as separate shared databases. See each dataset's page for instructions on how to attach them.

:::note `aws-us-east-1` region only
These additional databases are only available for accounts in the `aws-us-east-1` region.
:::

| Dataset                                    | Description                                                                           |
|--------------------------------------------|---------------------------------------------------------------------------------------|
| [StackOverflow](stackoverflow.md)          | Full StackOverflow data dump up to May 2023                                           |
| [PyPI / DuckDB Stats](pypi.md)             | Python package download data for the `duckdb` package, refreshed weekly               |
| [Hacker News (full)](hacker-news.md)       | Full [Hacker News](https://news.ycombinator.com/) dataset from 2016 to 2025           |
| [Foursquare](foursquare.md)               | Global dataset of over 100 million points of interest (POIs) with location and business information |

## FAQ

### How do I re-attach the sample_data database?

The `sample_data` database is attached automatically, but if you have accidentally removed it, you can re-attach it with:

```sql
ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
```
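
To check that the database is attached again, one option is to list the attached databases and look for `sample_data` in the result. This is a minimal sketch using DuckDB's `SHOW DATABASES` statement, followed by a small query against one of the tables listed earlier on this page.

```sql
-- List attached databases; sample_data should appear in the output
SHOW DATABASES;

-- Confirm a table is reachable again
SELECT count(*) AS row_count
FROM sample_data.nyc.taxi;
```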
