Stack Overflow Survey Data
About the dataset
Each year, Stack Overflow conducts a survey of developers to understand the trends in the developer community. The survey covers a wide range of topics, including programming languages, frameworks, databases, and platforms, as well as developer demographics, education, and career satisfaction.
Since 2017, Stack Overflow has provided a consistent schema and data format for the survey data, which makes it a good dataset for analyzing trends in the developer community over the years.
The source data is a series of CSV files that have been merged into a single schema with two tables for easy querying.
How to query the dataset
This dataset is available as part of the sample_data database, which is automatically attached to every new user's workspace.
To re-attach the database, you can use the following command:
ATTACH 'md:_share/sample_data/23b0d623-1361-421d-ae77-62d701d471e6' AS sample_data;
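To confirm that the attach worked, one option is to list the tables in the stackoverflow_survey schema. This is a minimal sketch that assumes the standard DuckDB information_schema views are available in your session. It should return the two tables described in the Schema section.

```sql
-- List the tables exposed by the stackoverflow_survey schema
-- of the attached sample_data database.
SELECT table_name
FROM information_schema.tables
WHERE table_catalog = 'sample_data'
  AND table_schema = 'stackoverflow_survey'
ORDER BY table_name;
```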
Schema
stackoverflow_survey.survey_results
This table contains all the survey results from 2017 to 2024. Each column represents a question from the survey. Because the questions change from year to year, not every column is populated for every year, and the table is quite wide.
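Before writing larger queries, it can help to look at the shape of the table. The sketch below counts rows per survey year and then lists the columns. It assumes that the year column is stored as text, as the example queries on this page suggest.

```sql
-- Number of survey responses per year.
SELECT
    year,
    COUNT(*) AS respondents
FROM sample_data.stackoverflow_survey.survey_results
GROUP BY year
ORDER BY year;

-- Column names and types of the results table.
DESCRIBE sample_data.stackoverflow_survey.survey_results;
```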
stackoverflow_survey.survey_schema
This table contains the schema of the survey results. qname is the name of the question, which is also the column name in the survey_results table. question is the full question text.
Column Name | Column Type |
---|---|
qname | VARCHAR |
question | VARCHAR |
qid | VARCHAR |
force_resp | VARCHAR |
type | VARCHAR |
selector | VARCHAR |
year | VARCHAR |
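Because qname matches a column name in survey_results, the schema table can be used to look up the wording of the question behind a column. The sketch below uses RemoteWork, a column that appears in the example queries on this page. Whether a given qname is present for every year is an assumption you should check against the data.

```sql
-- Look up the full question text behind the RemoteWork column,
-- one row per survey year in which the question appears.
SELECT
    year,
    qname,
    question
FROM sample_data.stackoverflow_survey.survey_schema
WHERE qname = 'RemoteWork'
ORDER BY year;
```

Replacing 'RemoteWork' with another column name gives the same lookup for that question.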
Example queries
List the most popular programming languages in 2024
SELECT
    language,
    COUNT(*) AS count
FROM (
    -- LanguageHaveWorkedWith is a ';'-separated list; emit one row per language
    SELECT UNNEST(STRING_SPLIT(LanguageHaveWorkedWith, ';')) AS language
    FROM sample_data.stackoverflow_survey.survey_results
    WHERE year = '2024'
) AS languages
GROUP BY language
ORDER BY count DESC;
Top 10 Countries with the Most Respondents in 2024
SELECT
    Country,
    COUNT(*) AS Respondents
FROM sample_data.stackoverflow_survey.survey_results
WHERE year = '2024'
GROUP BY Country
ORDER BY Respondents DESC
LIMIT 10;
Correlation Between Remote Work and Job Satisfaction in 2024
SELECT
    RemoteWork,
    AVG(CAST(JobSat AS DOUBLE)) AS AvgJobSatisfaction,
    COUNT(*) AS RespondentCount
FROM sample_data.stackoverflow_survey.survey_results
-- Exclude 'NA' and the text labels so that only values castable to DOUBLE remain
WHERE JobSat NOT IN ('NA',
                     'Slightly satisfied',
                     'Neither satisfied nor dissatisfied',
                     'Very dissatisfied',
                     'Very satisfied',
                     'Slightly dissatisfied')
  AND RemoteWork NOT IN ('NA')
  AND year = '2024'
-- GROUP BY ALL groups by every non-aggregated column, here RemoteWork
GROUP BY ALL;
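As an alternative to listing every non-numeric label by hand, a variant of the same query could rely on DuckDB's TRY_CAST, which returns NULL instead of raising an error when a value cannot be converted. This is a sketch of a swapped-in technique, not the query above. It assumes that every JobSat value worth averaging parses as a number.

```sql
-- Same aggregation, keeping only rows whose JobSat parses as a number.
SELECT
    RemoteWork,
    AVG(TRY_CAST(JobSat AS DOUBLE)) AS AvgJobSatisfaction,
    COUNT(*) AS RespondentCount
FROM sample_data.stackoverflow_survey.survey_results
WHERE TRY_CAST(JobSat AS DOUBLE) IS NOT NULL
  AND RemoteWork <> 'NA'
  AND year = '2024'
GROUP BY ALL
ORDER BY AvgJobSatisfaction DESC;
```

The explicit NOT IN list documents exactly which labels are dropped, while the TRY_CAST filter also skips any unexpected non-numeric value.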