---
sidebar_position: 5
title: StackOverflow Data
description: Sample data from StackOverflow to use with DuckDB and MotherDuck to understand SQL-based data analytics.
---

import EmbeddedDive from '@site/src/components/EmbeddedDive';

## Explore the data

Interactive dashboard built on the full Stack Overflow archive. Use it as a starting point for your own [Dives](/key-tasks/ai-and-motherduck/dives/).

<EmbeddedDive
  diveId="eb4c2b4e-5b0c-4c13-833c-6d97989ea746"
  title="Stack Overflow Archive"
  height="700px"
/>

## About the dataset

[Stack Overflow](https://stackoverflow.com/) is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions.

The dataset includes a complete **data dump up to May 2023**, covering posts, comments, users, badges, and related metrics.

You can read more about the dataset in our blog series [part 1](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-1/) and [part 2](https://motherduck.com/blog/exploring-stackoverflow-with-duckdb-on-motherduck-2/).

## How to query the dataset

As this dataset is quite large, it's not part of the `sample_data` database. Instead, you can find it as a dedicated shared database.

:::note `aws-us-east-1` region only
This database is only available for accounts in the `aws-us-east-1` region.
:::

To attach it to your workspace, you can use the following command:

```sql
ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
```

## Schema
### Badges

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| UserId | BIGINT | YES | | | |
| Name | VARCHAR | YES | | | |
| Date | TIMESTAMP | YES | | | |
| Class | BIGINT | YES | | | |
| TagBased | BOOLEAN | YES | | | |

### Comments

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostId | BIGINT | YES | | | |
| Score | BIGINT | YES | | | |
| Text | VARCHAR | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| UserId | BIGINT | YES | | | |
| ContentLicense | VARCHAR | YES | | | |

### Post links

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| PostId | BIGINT | YES | | | |
| RelatedPostId | BIGINT | YES | | | |
| LinkTypeId | BIGINT | YES | | | |

### Posts

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostTypeId | BIGINT | YES | | | |
| AcceptedAnswerId | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| Score | BIGINT | YES | | | |
| ViewCount | BIGINT | YES | | | |
| Body | VARCHAR | YES | | | |
| OwnerUserId | BIGINT | YES | | | |
| LastEditorUserId | BIGINT | YES | | | |
| LastEditorDisplayName | VARCHAR | YES | | | |
| LastEditDate | TIMESTAMP | YES | | | |
| LastActivityDate | TIMESTAMP | YES | | | |
| Title | VARCHAR | YES | | | |
| Tags | VARCHAR | YES | | | |
| AnswerCount | BIGINT | YES | | | |
| CommentCount | BIGINT | YES | | | |
| FavoriteCount | BIGINT | YES | | | |
| CommunityOwnedDate | TIMESTAMP | YES | | | |
| ContentLicense | VARCHAR | YES | | | |

### Tags

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| TagName | VARCHAR | YES | | | |
| Count | BIGINT | YES | | | |
| ExcerptPostId | BIGINT | YES | | | |
| WikiPostId | BIGINT | YES | | | |

### Votes

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| PostId | BIGINT | YES | | | |
| VoteTypeId | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |

### Users

| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| Id | BIGINT | YES | | | |
| Reputation | BIGINT | YES | | | |
| CreationDate | TIMESTAMP | YES | | | |
| DisplayName | VARCHAR | YES | | | |
| LastAccessDate | TIMESTAMP | YES | | | |
| AboutMe | VARCHAR | YES | | | |
| Views | BIGINT | YES | | | |
| UpVotes | BIGINT | YES | | | |
| DownVotes | BIGINT | YES | | | |


## Examples queries
The following queries assume that the current database connected is `stackoverflow`. Run `use stackoverflow` to switch to it.

### List the top 5 posts that received the most votes

```sql
SELECT posts.Title, COUNT(votes.Id) AS VoteCount 
FROM posts 
JOIN votes ON posts.Id = votes.PostId 
GROUP BY posts.Title 
ORDER BY VoteCount DESC 
LIMIT 5;
```

### Find the top 5 posts with the highest view count:

```sql
SELECT Title, ViewCount 
FROM posts 
ORDER BY ViewCount DESC 
LIMIT 5;
```


---

## Docs feedback

MotherDuck accepts optional user-submitted feedback about this page at `POST https://motherduck.com/docs/api/feedback/agent`.
For agents and automated tools, feedback submission should be user-confirmed before sending.

Payload:

```json
{
  "page_path": "/getting-started/sample-data-queries/stackoverflow/",
  "page_title": "StackOverflow Data",
  "text": "<the user's feedback, max 2000 characters>",
  "source": "<optional identifier for your interface, for example 'claude.ai' or 'chatgpt'>"
}
```

`page_path` and `text` are required; `page_title` and `source` are optional. Responses: `200 {"feedback_id": "<uuid>"}`, `400` for malformed payloads, and `429` when rate-limited.
