StackOverflow Data

Explore the data

Interactive dashboard built on the full Stack Overflow archive. Use it as a starting point for your own Dives.

About the dataset

Stack Overflow is a website dedicated to providing professional and enthusiast programmers a platform to learn and share knowledge. It features questions and answers on a wide range of topics in computer programming and is renowned for its community-driven approach. Users can ask questions, provide answers, vote on questions and answers, and earn reputation points and badges for their contributions.

The dataset includes a complete data dump up to May 2023, covering posts, comments, users, badges, and related metrics.

You can read more about the dataset in our blog series part 1 and part 2.

How to query the dataset

Because this dataset is large, it is not part of the sample_data database. Instead, it is available as a dedicated shared database.

aws-us-east-1 region only

This database is only available for accounts in the aws-us-east-1 region.

To attach it to your workspace, you can use the following command:

ATTACH 'md:_share/stackoverflow/6c318917-6888-425a-bea1-5860c29947e5' AS stackoverflow;
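To check that the share was attached, a minimal sketch using DuckDB's standard SHOW statements (the exact list of databases and tables returned depends on your account):

```sql
-- List attached databases; stackoverflow should appear after the ATTACH.
SHOW DATABASES;

-- Make it the current database, then list the tables it contains.
USE stackoverflow;
SHOW TABLES;
```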

Example queries

The following queries assume that stackoverflow is the current database. Run USE stackoverflow; to switch to it.

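List the top 5 posts that received the most votes:

-- Group by Id as well as Title so that posts sharing a title,
-- or posts with no title, are not merged into one row.
SELECT posts.Id, posts.Title, COUNT(votes.Id) AS VoteCount
FROM posts
JOIN votes ON posts.Id = votes.PostId
GROUP BY posts.Id, posts.Title
ORDER BY VoteCount DESC
LIMIT 5;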
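The query above counts every row in votes regardless of its type. The following sketch counts only upvotes, assuming that VoteTypeId = 2 denotes an upvote, as in the public Stack Exchange data dump schema. That mapping is not stated on this page, so verify it against the data before relying on the result.

```sql
-- Assumption: VoteTypeId = 2 means "upvote" (Stack Exchange dump convention).
SELECT posts.Id, posts.Title, COUNT(votes.Id) AS UpVoteCount
FROM posts
JOIN votes ON posts.Id = votes.PostId
WHERE votes.VoteTypeId = 2
GROUP BY posts.Id, posts.Title
ORDER BY UpVoteCount DESC
LIMIT 5;
```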

Find the top 5 posts with the highest view count:

SELECT Title, ViewCount
FROM posts
ORDER BY ViewCount DESC
LIMIT 5;
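As a further illustration in the same style, the following sketch counts questions per year. It assumes that PostTypeId = 1 marks a question, as in the public Stack Exchange data dump schema. The alias CreationYear is chosen here for illustration.

```sql
-- Assumption: PostTypeId = 1 identifies questions.
SELECT year(CreationDate) AS CreationYear, COUNT(*) AS QuestionCount
FROM posts
WHERE PostTypeId = 1
GROUP BY CreationYear
ORDER BY CreationYear;
```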

Schema
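
The column listings in this section have the same shape as the output of DuckDB's DESCRIBE statement (column_name, column_type, null, key, default, extra). To reproduce one against the attached database, something like the following should work, shown here for the posts table:

```sql
-- Show column names and types for one table in the attached share.
DESCRIBE stackoverflow.posts;
```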

Badges

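| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|-----|---------|-------|
| Id          | BIGINT      | YES  |     |         |       |
| UserId      | BIGINT      | YES  |     |         |       |
| Name        | VARCHAR     | YES  |     |         |       |
| Date        | TIMESTAMP   | YES  |     |         |       |
| Class       | BIGINT      | YES  |     |         |       |
| TagBased    | BOOLEAN     | YES  |     |         |       |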
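
A short sketch of how these columns can be used: listing the most frequently awarded badges. The table name badges is an assumption, since this page confirms only posts and votes by name. Running SHOW TABLES will confirm the actual name.

```sql
-- Assumption: the table is named badges.
SELECT Name, COUNT(*) AS TimesAwarded
FROM badges
GROUP BY Name
ORDER BY TimesAwarded DESC
LIMIT 10;
```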

Comments

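| column_name    | column_type | null | key | default | extra |
|----------------|-------------|------|-----|---------|-------|
| Id             | BIGINT      | YES  |     |         |       |
| PostId         | BIGINT      | YES  |     |         |       |
| Score          | BIGINT      | YES  |     |         |       |
| Text           | VARCHAR     | YES  |     |         |       |
| CreationDate   | TIMESTAMP   | YES  |     |         |       |
| UserId         | BIGINT      | YES  |     |         |       |
| ContentLicense | VARCHAR     | YES  |     |         |       |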

Post links

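| column_name   | column_type | null | key | default | extra |
|---------------|-------------|------|-----|---------|-------|
| Id            | BIGINT      | YES  |     |         |       |
| CreationDate  | TIMESTAMP   | YES  |     |         |       |
| PostId        | BIGINT      | YES  |     |         |       |
| RelatedPostId | BIGINT      | YES  |     |         |       |
| LinkTypeId    | BIGINT      | YES  |     |         |       |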

Posts

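| column_name           | column_type | null | key | default | extra |
|-----------------------|-------------|------|-----|---------|-------|
| Id                    | BIGINT      | YES  |     |         |       |
| PostTypeId            | BIGINT      | YES  |     |         |       |
| AcceptedAnswerId      | BIGINT      | YES  |     |         |       |
| CreationDate          | TIMESTAMP   | YES  |     |         |       |
| Score                 | BIGINT      | YES  |     |         |       |
| ViewCount             | BIGINT      | YES  |     |         |       |
| Body                  | VARCHAR     | YES  |     |         |       |
| OwnerUserId           | BIGINT      | YES  |     |         |       |
| LastEditorUserId      | BIGINT      | YES  |     |         |       |
| LastEditorDisplayName | VARCHAR     | YES  |     |         |       |
| LastEditDate          | TIMESTAMP   | YES  |     |         |       |
| LastActivityDate      | TIMESTAMP   | YES  |     |         |       |
| Title                 | VARCHAR     | YES  |     |         |       |
| Tags                  | VARCHAR     | YES  |     |         |       |
| AnswerCount           | BIGINT      | YES  |     |         |       |
| CommentCount          | BIGINT      | YES  |     |         |       |
| FavoriteCount         | BIGINT      | YES  |     |         |       |
| CommunityOwnedDate    | TIMESTAMP   | YES  |     |         |       |
| ContentLicense        | VARCHAR     | YES  |     |         |       |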

Tags

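| column_name   | column_type | null | key | default | extra |
|---------------|-------------|------|-----|---------|-------|
| Id            | BIGINT      | YES  |     |         |       |
| TagName       | VARCHAR     | YES  |     |         |       |
| Count         | BIGINT      | YES  |     |         |       |
| ExcerptPostId | BIGINT      | YES  |     |         |       |
| WikiPostId    | BIGINT      | YES  |     |         |       |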
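
As a sketch, the Count column can be used to rank tags by how often they are used. The table name tags is an assumption, as this page does not state it. The column name is quoted to keep it visually distinct from the COUNT aggregate function.

```sql
-- Assumption: the table is named tags.
SELECT TagName, "Count" AS UsageCount
FROM tags
ORDER BY "Count" DESC
LIMIT 10;
```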

Votes

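| column_name  | column_type | null | key | default | extra |
|--------------|-------------|------|-----|---------|-------|
| Id           | BIGINT      | YES  |     |         |       |
| PostId       | BIGINT      | YES  |     |         |       |
| VoteTypeId   | BIGINT      | YES  |     |         |       |
| CreationDate | TIMESTAMP   | YES  |     |         |       |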

Users

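| column_name    | column_type | null | key | default | extra |
|----------------|-------------|------|-----|---------|-------|
| Id             | BIGINT      | YES  |     |         |       |
| Reputation     | BIGINT      | YES  |     |         |       |
| CreationDate   | TIMESTAMP   | YES  |     |         |       |
| DisplayName    | VARCHAR     | YES  |     |         |       |
| LastAccessDate | TIMESTAMP   | YES  |     |         |       |
| AboutMe        | VARCHAR     | YES  |     |         |       |
| Views          | BIGINT      | YES  |     |         |       |
| UpVotes        | BIGINT      | YES  |     |         |       |
| DownVotes      | BIGINT      | YES  |     |         |       |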
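
A minimal sketch of a query over these columns: the five users with the highest reputation. The table name users is an assumption, as this page does not state it.

```sql
-- Assumption: the table is named users.
SELECT DisplayName, Reputation
FROM users
ORDER BY Reputation DESC
LIMIT 5;
```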