About the dataset
Stack Overflow is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions.
The dataset includes a complete data dump up to May 2023, covering posts, comments, users, badges, and related metrics.
You can read more about the dataset in our blog series part 1 and part 2.
How to query the dataset
As this dataset is quite large, it's not part of the sample_data
database. Instead, you can find it as a dedicated shared database.
To attach it to your workspace, you can use the following command:
ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
Schema
Badges
column_name | column_type | null | key | default | extra |
---|
Id | BIGINT | YES | | | |
UserId | BIGINT | YES | | | |
Name | VARCHAR | YES | | | |
Date | TIMESTAMP | YES | | | |
Class | BIGINT | YES | | | |
TagBased | BOOLEAN | YES | | | |
column_name | column_type | null | key | default | extra |
---|
Id | BIGINT | YES | | | |
PostId | BIGINT | YES | | | |
Score | BIGINT | YES | | | |
Text | VARCHAR | YES | | | |
CreationDate | TIMESTAMP | YES | | | |
UserId | BIGINT | YES | | | |
ContentLicense | VARCHAR | YES | | | |
Post Links
column_name | column_type | null | key | default | extra |
---|
Id | BIGINT | YES | | | |
CreationDate | TIMESTAMP | YES | | | |
PostId | BIGINT | YES | | | |
RelatedPostId | BIGINT | YES | | | |
LinkTypeId | BIGINT | YES | | | |
Posts
column_name | column_type | null | key | default | extra |
---|
Id | BIGINT | YES | | | |
PostTypeId | BIGINT | YES | | | |
AcceptedAnswerId | BIGINT | YES | | | |
CreationDate | TIMESTAMP | YES | | | |
Score | BIGINT | YES | | | |
ViewCount | BIGINT | YES | | | |
Body | VARCHAR | YES | | | |
OwnerUserId | BIGINT | YES | | | |
LastEditorUserId | BIGINT | YES | | | |
LastEditorDisplayName | VARCHAR | YES | | | |
LastEditDate | TIMESTAMP | YES | | | |
LastActivityDate | TIMESTAMP | YES | | | |
Title | VARCHAR | YES | | | |
Tags | VARCHAR | YES | | | |
AnswerCount | BIGINT | YES | | | |
CommentCount | BIGINT | YES | | | |
FavoriteCount | BIGINT | YES | | | |
CommunityOwnedDate | TIMESTAMP | YES | | | |
ContentLicense | VARCHAR | YES | | | |
column_name | column_type | null | key | default | extra |
---|
Id | BIGINT | YES | | | |
TagName | VARCHAR | YES | | | |
Count | BIGINT | YES | | | |
ExcerptPostId | BIGINT | YES | | | |
WikiPostId | BIGINT | YES | | | |
Votes
column_name | column_type | null | key | default | extra |
---|
Id | BIGINT | YES | | | |
PostId | BIGINT | YES | | | |
VoteTypeId | BIGINT | YES | | | |
CreationDate | TIMESTAMP | YES | | | |
Users
column_name | column_type | null | key | default | extra |
---|
Id | BIGINT | YES | | | |
Reputation | BIGINT | YES | | | |
CreationDate | TIMESTAMP | YES | | | |
DisplayName | VARCHAR | YES | | | |
LastAccessDate | TIMESTAMP | YES | | | |
AboutMe | VARCHAR | YES | | | |
Views | BIGINT | YES | | | |
UpVotes | BIGINT | YES | | | |
DownVotes | BIGINT | YES | | | |
Examples queries
The following queries assume that the current database connected is stackoverflow
. Run use stackoverflow
to switch to it.
List the top 5 posts that received the most votes