Hacker News
Explore the data
Interactive dashboard built on the Hacker News sample dataset. Use it as a starting point for your own Dives.
About the dataset
Hacker News is a social news website focusing on computer science and entrepreneurship. It is run by Y Combinator, a startup accelerator, and it's known for its minimalist interface. Users can post stories (such as links to articles), comment on them, and vote them up or down, affecting their visibility.
There are two ways to access the dataset:
- Through the
sample_datadatabase, which contains a sample of the data (from January 2022 to November 2022). This database is automatically attached to every MotherDuck account. - Through the
hacker_newsdatabase, which contains the full dataset (from 2016 to 2025).
To attach the full hacker_news database, you can use the following command:
aws-us-east-1 region onlyThe hacker_news database is only available for accounts in the aws-us-east-1 region.
ATTACH 'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b' AS hacker_news;
To read from the sample_data database, please refer to attach the sample datasets database
Example queries
Most shared websites
This query returns the top domains being shared on Hacker News.
SELECT regexp_extract(url, 'http[s]?://([^/]+)/', 1) AS domain, count(*) AS count FROM sample_data.hn.hacker_news WHERE url IS NOT NULL AND regexp_extract(url, 'http[s]?://([^/]+)/', 1) != '' GROUP BY domain ORDER BY count DESC LIMIT 20;
Most commented stories each month
This query calculates the total number of comments for each story and identifies the most commented story of each month.
WITH
ranked_stories AS (
SELECT
title,
'https://news.ycombinator.com/item?id=' || id AS hn_url,
descendants AS nb_comments,
YEAR(timestamp) AS year,
MONTH(timestamp) AS month,
ROW_NUMBER() OVER (
PARTITION BY
YEAR(timestamp),
MONTH(timestamp)
ORDER BY
descendants DESC
) AS rn
FROM
sample_data.hn.hacker_news
WHERE
type = 'story'
)
SELECT
year,
month,
title,
hn_url,
nb_comments
FROM
ranked_stories
WHERE
rn = 1
ORDER BY
year,
month;Most monthly voted stories
This query determines the most voted story for each month.
WITH
ranked_stories AS (
SELECT
title,
'https://news.ycombinator.com/item?id=' || id AS hn_url,
score,
YEAR(timestamp) AS year,
MONTH(timestamp) AS month,
ROW_NUMBER() OVER (
PARTITION BY
YEAR(timestamp),
MONTH(timestamp)
ORDER BY
score DESC
) AS rn
FROM
sample_data.hn.hacker_news
WHERE
type = 'story'
)
SELECT
year,
month,
title,
hn_url,
score
FROM
ranked_stories
WHERE
rn = 1
ORDER BY
year,
month;Keyword analysis
This query counts the monthly mentions a the keyword (here duckdb) in the title or text of Hacker News posts, organized by year and month.
SELECT
YEAR(timestamp) AS year,
MONTH(timestamp) AS month,
COUNT(*) AS keyword_mentions
FROM
sample_data.hn.hacker_news
WHERE
(
title LIKE '%duckdb%'
OR text LIKE '%duckdb%'
)
GROUP BY
year,
month
ORDER BY
year ASC,
month ASC;Schema
| column_name | column_type | null | key | default | extra |
|---|---|---|---|---|---|
| title | VARCHAR | YES | |||
| url | VARCHAR | YES | |||
| text | VARCHAR | YES | |||
| dead | BOOLEAN | YES | |||
| by | VARCHAR | YES | |||
| score | BIGINT | YES | |||
| time | BIGINT | YES | |||
| timestamp | TIMESTAMP | YES | |||
| type | VARCHAR | YES | |||
| id | BIGINT | YES | |||
| parent | BIGINT | YES | |||
| descendants | BIGINT | YES | |||
| ranking | BIGINT | YES | |||
| deleted | BOOLEAN | YES |