---
sidebar_position: 5
title: StackOverflow Data
description: Sample data from StackOverflow to use with DuckDB and MotherDuck to understand SQL-based data analytics.
---

## About the dataset

[Stack Overflow](https://stackoverflow.com/) is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions.

The dataset includes a complete **data dump up to May 2023**, covering posts, comments, users, badges, and related metrics.

You can read more about the dataset in our blog series [part 1](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-1/) and [part 2](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-2/).

## How to query the dataset

As this dataset is quite large, it's not part of the `sample_data` database. Instead, you can find it as a dedicated shared database.

:::note `aws-us-east-1` region only
This database is only available for accounts in the `aws-us-east-1` region.
:::

To attach it to your workspace, you can use the following command:

```sql
ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
```

## Schema
### Badges

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| UserId | BIGINT | YES | | | |
| Name | VARCHAR | YES | | | |
| Date | TIMESTAMP | YES | | | |
| Class | BIGINT | YES | | | |
| TagBased | BOOLEAN | YES | | | |

### Comments

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostId | BIGINT | YES | | | |
| Score | BIGINT | YES | | | |
| Text | VARCHAR | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| UserId | BIGINT | YES | | | |
| ContentLicense | VARCHAR | YES | | | |

### Post Links

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| PostId | BIGINT | YES | | | |
| RelatedPostId | BIGINT | YES | | | |
| LinkTypeId | BIGINT | YES | | | |

### Posts

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostTypeId | BIGINT | YES | | | |
| AcceptedAnswerId | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| Score | BIGINT | YES | | | |
| ViewCount | BIGINT | YES | | | |
| Body | VARCHAR | YES | | | |
| OwnerUserId | BIGINT | YES | | | |
| LastEditorUserId | BIGINT | YES | | | |
| LastEditorDisplayName | VARCHAR | YES | | | |
| LastEditDate | TIMESTAMP | YES | | | |
| LastActivityDate | TIMESTAMP | YES | | | |
| Title | VARCHAR | YES | | | |
| Tags | VARCHAR | YES | | | |
| AnswerCount | BIGINT | YES | | | |
| CommentCount | BIGINT | YES | | | |
| FavoriteCount | BIGINT | YES | | | |
| CommunityOwnedDate | TIMESTAMP | YES | | | |
| ContentLicense | VARCHAR | YES | | | |

### Tags

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| TagName | VARCHAR | YES | | | |
| Count | BIGINT | YES | | | |
| ExcerptPostId | BIGINT | YES | | | |
| WikiPostId | BIGINT | YES | | | |

### Votes

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostId | BIGINT | YES | | | |
| VoteTypeId | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |

### Users

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| Reputation | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| DisplayName | VARCHAR | YES | | | |
| LastAccessDate | TIMESTAMP | YES | | | |
| AboutMe | VARCHAR | YES | | | |
| Views | BIGINT | YES | | | |
| UpVotes | BIGINT | YES | | | |
| DownVotes | BIGINT | YES | | | |


## Examples queries
The following queries assume that the current database connected is `stackoverflow`. Run `use stackoverflow` to switch to it.

### List the top 5 posts that received the most votes

```sql
SELECT posts.Title, COUNT(votes.Id) AS VoteCount 
FROM posts 
JOIN votes ON posts.Id = votes.PostId 
GROUP BY posts.Title 
ORDER BY VoteCount DESC 
LIMIT 5;
```

### Find the top 5 posts with the highest view count:

```sql
SELECT Title, ViewCount 
FROM posts 
ORDER BY ViewCount DESC 
LIMIT 5;
```
