StackOverflow Data
About the dataset
Stack Overflow is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions.
The dataset includes a complete data dump up to May 2023, covering posts, comments, users, badges, and related metrics.
How to query the dataset
As this dataset is quite large, it's not part of the sample_data
database. Instead, you can find it as a dedicated shared database.
To attach it to your workspace, you can use the following command:
ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
Schema
Badges
column_name | column_type | null | key | default | extra |
---|---|---|---|---|---|
Id | BIGINT | YES | |||
UserId | BIGINT | YES | |||
Name | VARCHAR | YES | |||
Date | TIMESTAMP | YES | |||
Class | BIGINT | YES | |||
TagBased | BOOLEAN | YES |
Comments
column_name | column_type | null | key | default | extra |
---|---|---|---|---|---|
Id | BIGINT | YES | |||
PostId | BIGINT | YES | |||
Score | BIGINT | YES | |||
Text | VARCHAR | YES | |||
CreationDate | TIMESTAMP | YES | |||
UserId | BIGINT | YES | |||
ContentLicense | VARCHAR | YES |
Post Links
column_name | column_type | null | key | default | extra |
---|---|---|---|---|---|
Id | BIGINT | YES | |||
CreationDate | TIMESTAMP | YES | |||
PostId | BIGINT | YES | |||
RelatedPostId | BIGINT | YES | |||
LinkTypeId | BIGINT | YES |
Posts
column_name | column_type | null | key | default | extra |
---|---|---|---|---|---|
Id | BIGINT | YES | |||
PostTypeId | BIGINT | YES | |||
AcceptedAnswerId | BIGINT | YES | |||
CreationDate | TIMESTAMP | YES | |||
Score | BIGINT | YES | |||
ViewCount | BIGINT | YES | |||
Body | VARCHAR | YES | |||
OwnerUserId | BIGINT | YES | |||
LastEditorUserId | BIGINT | YES | |||
LastEditorDisplayName | VARCHAR | YES | |||
LastEditDate | TIMESTAMP | YES | |||
LastActivityDate | TIMESTAMP | YES | |||
Title | VARCHAR | YES | |||
Tags | VARCHAR | YES | |||
AnswerCount | BIGINT | YES | |||
CommentCount | BIGINT | YES | |||
FavoriteCount | BIGINT | YES | |||
CommunityOwnedDate | TIMESTAMP | YES | |||
ContentLicense | VARCHAR | YES |
Tags
column_name | column_type | null | key | default | extra |
---|---|---|---|---|---|
Id | BIGINT | YES | |||
TagName | VARCHAR | YES | |||
Count | BIGINT | YES | |||
ExcerptPostId | BIGINT | YES | |||
WikiPostId | BIGINT | YES |
Votes
column_name | column_type | null | key | default | extra |
---|---|---|---|---|---|
Id | BIGINT | YES | |||
PostId | BIGINT | YES | |||
VoteTypeId | BIGINT | YES | |||
CreationDate | TIMESTAMP | YES |
Users
column_name | column_type | null | key | default | extra |
---|---|---|---|---|---|
Id | BIGINT | YES | |||
Reputation | BIGINT | YES | |||
CreationDate | TIMESTAMP | YES | |||
DisplayName | VARCHAR | YES | |||
LastAccessDate | TIMESTAMP | YES | |||
AboutMe | VARCHAR | YES | |||
Views | BIGINT | YES | |||
UpVotes | BIGINT | YES | |||
DownVotes | BIGINT | YES |
Examples queries
The following queries assume that the current database connected is stackoverflow
. Run use stackoverflow
to switch to it.
List the top 5 posts that received the most votes
SELECT posts.Title, COUNT(votes.Id) AS VoteCount
FROM posts
JOIN votes ON posts.Id = votes.PostId
GROUP BY posts.Title
ORDER BY VoteCount DESC
LIMIT 5;
Find the top 5 posts with the highest view count:
SELECT Title, ViewCount
FROM posts
ORDER BY ViewCount DESC
LIMIT 5;