---
sidebar_position: 2
title: Hacker News
description: Sample data from Hacker News stories to use for SQL querying of DuckDB and MotherDuck databases.
---


## About the dataset

[Hacker News](https://news.ycombinator.com/) is a social news website focusing on computer science and entrepreneurship. It is run by Y Combinator, a startup accelerator, and it's known for its minimalist interface. Users can post stories (such as links to articles), comment on them, and vote them up or down, affecting their visibility.

There are two ways to access the dataset:

- Through the `sample_data` database, which contains a sample of the data (from **January 2022** to **November 2022**). This database is automatically attached to every MotherDuck account.
- Through the `hacker_news` database, which contains the full dataset (from **2016** to **2025**).

To attach the full `hacker_news` database, you can use the following command:

:::note `aws-us-east-1` region only
The `hacker_news` database is only available for accounts in the `aws-us-east-1` region.
:::

```sql
ATTACH 'md:_share/hacker_news/de11a0e3-9d68-48d2-ac44-40e07a1d496b' AS hacker_news;
```

## Schema

| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|-----|---------|-------|
| title       | VARCHAR     | YES  |     |         |       |
| url         | VARCHAR     | YES  |     |         |       |
| text        | VARCHAR     | YES  |     |         |       |
| dead        | BOOLEAN     | YES  |     |         |       |
| by          | VARCHAR     | YES  |     |         |       |
| score       | BIGINT      | YES  |     |         |       |
| time        | BIGINT      | YES  |     |         |       |
| timestamp   | TIMESTAMP   | YES  |     |         |       |
| type        | VARCHAR     | YES  |     |         |       |
| id          | BIGINT      | YES  |     |         |       |
| parent      | BIGINT      | YES  |     |         |       |
| descendants | BIGINT      | YES  |     |         |       |
| ranking     | BIGINT      | YES  |     |         |       |
| deleted     | BOOLEAN     | YES  |     |         |       |

To read from the `sample_data` database, please refer to [attach the sample datasets database](./datasets.mdx)

## Example queries

### Most shared websites
This query returns the top domains being shared on Hacker News.

```sql
SELECT
    regexp_extract(url, 'http[s]?://([^/]+)/', 1) AS domain,
    count(*) AS count
FROM sample_data.hn.hacker_news
WHERE url IS NOT NULL AND regexp_extract(url, 'http[s]?://([^/]+)/', 1) != ''
GROUP BY domain
ORDER BY count DESC
LIMIT 20;
```

### Most Commented Stories Each Month
This query calculates the total number of comments for each story and identifies the most commented story of each month.

```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        descendants AS nb_comments,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER()
            OVER (
                PARTITION BY YEAR(timestamp), MONTH(timestamp) 
                ORDER BY descendants DESC
            )
        AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)

SELECT
    year,
    month,
    title,
    hn_url,
    nb_comments
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```


### Most monthly voted stories
This query determines the most voted story for each month.

```sql
WITH ranked_stories AS (
    SELECT
        title,
        'https://news.ycombinator.com/item?id=' || id AS hn_url,
        score,
        YEAR(timestamp) AS year,
        MONTH(timestamp) AS month,
        ROW_NUMBER()
            OVER (PARTITION BY YEAR(timestamp), MONTH(timestamp) ORDER BY score DESC)
        AS rn
    FROM sample_data.hn.hacker_news
    WHERE type = 'story'
)

SELECT
    year,
    month,
    title,
    hn_url,
    score
FROM ranked_stories
WHERE rn = 1
ORDER BY year, month;
```

### Keyword analysis
This query counts the monthly mentions a the keyword (here `duckdb`) in the title or text of Hacker News posts, organized by year and month.

```sql
SELECT
    YEAR(timestamp) AS year,
    MONTH(timestamp) AS month,
    COUNT(*) AS keyword_mentions
FROM sample_data.hn.hacker_news
WHERE
    (title LIKE '%duckdb%' OR text LIKE '%duckdb%')
GROUP BY year, month
ORDER BY year ASC, month ASC;
```
