StackOverflow Data

About the dataset

Stack Overflow is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions.

The dataset includes a complete data dump up to May 2023, covering posts, comments, users, badges, and related metrics.

You can read more about the dataset in our blog series part 1 and part 2.

How to query the dataset

Because this dataset is large, it is not part of the sample_data database. It is available as a dedicated shared database instead. To attach it to your workspace, run the following command:

ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
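Once the share is attached, it can help to confirm what it contains before writing queries. The following is a minimal sketch using standard DuckDB statements. It assumes the share was attached under the alias stackoverflow, as in the command above, and that a table named posts exists (that name is taken from the example queries further down).

```sql
-- Make the attached share the current database
USE stackoverflow;

-- List the tables available in the share
SHOW TABLES;

-- Inspect one table; the output columns (column_name, column_type,
-- null, key, default, extra) match the schema listings in this page
DESCRIBE posts;
```

SHOW TABLES is the safest way to find the exact table names, since the section headings in the schema are display names rather than identifiers.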

Schema

Badges

column_name  column_type  null  key  default  extra
Id           BIGINT       YES
UserId       BIGINT       YES
Name         VARCHAR      YES
Date         TIMESTAMP    YES
Class        BIGINT       YES
TagBased     BOOLEAN      YES

Comments

column_name     column_type  null  key  default  extra
Id              BIGINT       YES
PostId          BIGINT       YES
Score           BIGINT       YES
Text            VARCHAR      YES
CreationDate    TIMESTAMP    YES
UserId          BIGINT       YES
ContentLicense  VARCHAR      YES

Post Links

column_name    column_type  null  key  default  extra
Id             BIGINT       YES
CreationDate   TIMESTAMP    YES
PostId         BIGINT       YES
RelatedPostId  BIGINT       YES
LinkTypeId     BIGINT       YES

Posts

column_name            column_type  null  key  default  extra
Id                     BIGINT       YES
PostTypeId             BIGINT       YES
AcceptedAnswerId       BIGINT       YES
CreationDate           TIMESTAMP    YES
Score                  BIGINT       YES
ViewCount              BIGINT       YES
Body                   VARCHAR      YES
OwnerUserId            BIGINT       YES
LastEditorUserId       BIGINT       YES
LastEditorDisplayName  VARCHAR      YES
LastEditDate           TIMESTAMP    YES
LastActivityDate       TIMESTAMP    YES
Title                  VARCHAR      YES
Tags                   VARCHAR      YES
AnswerCount            BIGINT       YES
CommentCount           BIGINT       YES
FavoriteCount          BIGINT       YES
CommunityOwnedDate     TIMESTAMP    YES
ContentLicense         VARCHAR      YES

Tags

column_name    column_type  null  key  default  extra
Id             BIGINT       YES
TagName        VARCHAR      YES
Count          BIGINT       YES
ExcerptPostId  BIGINT       YES
WikiPostId     BIGINT       YES

Votes

column_name   column_type  null  key  default  extra
Id            BIGINT       YES
PostId        BIGINT       YES
VoteTypeId    BIGINT       YES
CreationDate  TIMESTAMP    YES

Users

column_name     column_type  null  key  default  extra
Id              BIGINT       YES
Reputation      BIGINT       YES
CreationDate    TIMESTAMP    YES
DisplayName     VARCHAR      YES
LastAccessDate  TIMESTAMP    YES
AboutMe         VARCHAR      YES
Views           BIGINT       YES
UpVotes         BIGINT       YES
DownVotes       BIGINT       YES

Example queries

The following queries assume that stackoverflow is the current database. Run USE stackoverflow; to switch to it.

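List the top 5 posts that received the most votes:

-- Group by Id as well as Title: titles are not unique, and answers have no title,
-- so grouping by Title alone would merge unrelated posts into one row.
SELECT posts.Id, posts.Title, COUNT(votes.Id) AS VoteCount
FROM posts
JOIN votes ON posts.Id = votes.PostId
GROUP BY posts.Id, posts.Title
ORDER BY VoteCount DESC
LIMIT 5;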
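
Note that the query above counts every row in votes, whatever its type. As a sketch, and assuming the Stack Exchange data dump convention that VoteTypeId = 2 marks an upvote (the schema above does not document the vote type values, so verify this against the data first), upvotes alone could be counted like this:

```sql
-- Assumption: VoteTypeId = 2 means "upvote", following the public
-- Stack Exchange data dump convention; this page does not confirm it.
SELECT posts.Id, posts.Title, COUNT(votes.Id) AS UpvoteCount
FROM posts
JOIN votes ON posts.Id = votes.PostId
WHERE votes.VoteTypeId = 2
GROUP BY posts.Id, posts.Title
ORDER BY UpvoteCount DESC
LIMIT 5;
```

Running SELECT VoteTypeId, COUNT(*) FROM votes GROUP BY VoteTypeId; first shows which vote types are actually present.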

Find the top 5 posts with the highest view count:

SELECT Title, ViewCount 
FROM posts
ORDER BY ViewCount DESC
LIMIT 5;
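
The Users and Posts tables can be combined through Posts.OwnerUserId. The following sketch lists the 5 users who wrote the most answers. It assumes that the Users table is exposed as users (by analogy with posts and votes) and that PostTypeId = 2 identifies answers, which is the Stack Exchange data dump convention rather than something this page states.

```sql
-- Assumptions: the table is named users, and PostTypeId = 2 means "answer".
SELECT users.DisplayName, COUNT(*) AS NumAnswers
FROM posts
JOIN users ON posts.OwnerUserId = users.Id
WHERE posts.PostTypeId = 2
GROUP BY users.Id, users.DisplayName  -- display names are not unique, so group by Id too
ORDER BY NumAnswers DESC
LIMIT 5;
```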