We Built Our Employees a Wrapped—Using SQL and MotherDuck

2025/12/29 - 6 min read

BY
Hannah Rifkin

Spotify Wrapped gets 200 million people to voluntarily share their listening data every December. It's basically an end-of-year performance deck, except people actually want to look at it. One stat at a time, with a ranking at the end.

It's also, technically, not that complicated:

[Image: a tweet from Ana's parody account joking that Spotify Wrapped is just simple SQL queries]

To ring in the new year, I decided to build a Wrapped for our team at MotherDuck.

I'm Hannah, a Customer Engineer at MotherDuck. We had the data, so I built it. Once the draft circulated, people had opinions—mostly about why their metric should rank higher.

Below: the metrics we tracked, the SQL behind them, and the duck personas we assigned. About an hour of work total.

The leaderboard

I ran 1.15 million queries this year—3x more than second place. Before you're too impressed: a lot of those were probably me re-running the same broken query until it worked. I also shared 54 databases with the team, more than anyone else. Did I include that metric because I knew I'd just shared 20 snapshots of the same database? Absolutely.

Elena (Ecosystems Engineering) had the longest streak: 182 consecutive days of query activity. Six months, no gaps. That's either impressive dedication or a sign we should check on her.

Alex (DevRel) created 2,176 databases. On average, that’s six databases a day.

Gaby (Support Engineer) processed 217 terabytes—more than double anyone else. Probably debugging someone else's problems.

Leonardo (Customer Engineer) logged 211 active days while running 681K queries. 👀

How I built it

I pointed our MotherDuck MCP server at the internal data warehouse and quickly got oriented to the data available in our MotherDuck account.

The data was already there—tables tracking queries, databases, and shares. The work was:

  1. Union two data sources (one covered Jan–Apr, one covered May–Dec)
  2. Calculate percentiles, streaks, and totals per user
  3. Filter out test accounts and service accounts
  4. Assign archetypes based on thresholds
  5. Materialize it into a table

Filtering out the noise

The trickiest part was filtering. First pass had an internal service account at #1 with 115 million queries. Impressive streak, but not an employee success story. With the MCP, I could iterate quickly—run a query, see who surfaced, add another exclusion, repeat until the list was actually employees.

```sql
internal_users AS (
    SELECT id::VARCHAR as user_id
    FROM current_users
    WHERE is_motherduck = true
      AND is_motherduck_test = false
      AND (is_service_account = false OR is_service_account IS NULL)
      AND email NOT LIKE '%+%'                 -- Exclude test aliases
      AND email NOT LIKE '%\_sa@%' ESCAPE '\'  -- Exclude service accounts
)
```

NOTE: Service accounts, test aliases, and automated processes can easily dominate your metrics. Always filter these out before running any "top users" analysis.

Total queries

We pull this from our aggregated daily stats tables. One row per user with everything we need: total queries, terabytes processed, hours of execution time, and active days. The MIN date gives us each user's first activity—useful for identifying new members of the team.

```sql
user_totals AS (
    SELECT
        q.user_id,
        SUM(q.queries) as total_queries,
        ROUND(SUM(q.total_io_bytes_gb) / 1000, 2) as tb_processed,
        ROUND(SUM(q.execution_time_seconds) / 3600, 1) as total_hours,
        COUNT(DISTINCT q.dt) as active_days,
        MIN(q.dt) as first_active_date,
        MAX(q.dt) as last_active_date
    FROM all_query_stats q
    INNER JOIN internal_users r ON q.user_id = r.user_id
    GROUP BY ALL
)
```

The percentile

This is what gets screenshotted:

```sql
percentiles AS (
    SELECT
        user_id,
        total_queries,
        ROUND(100 - (PERCENT_RANK() OVER (ORDER BY total_queries) * 100), 1) as top_percentile
    FROM user_totals
)
```

PERCENT_RANK() returns a value between 0 and 1. We flip it with 100 - so that lower numbers mean higher rank. Top 1% means you're ahead of 99% of people.
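The flip is easy to sanity-check in plain Python. This is a hypothetical helper, not part of the actual pipeline—it just mirrors the PERCENT_RANK math, where a rank of `(rank - 1) / (rows - 1)` is scaled to 100 and inverted so the busiest user lands at 0.0:

```python
def top_percentile(totals, value):
    """Mimic 100 - PERCENT_RANK() * 100: lower result = higher rank.

    PERCENT_RANK = (rank - 1) / (rows - 1) over an ascending sort.
    Ties take the first matching position, which is a simplification.
    """
    ordered = sorted(totals)
    rank = ordered.index(value) + 1          # 1-based rank, ascending
    pr = (rank - 1) / (len(ordered) - 1)     # 0.0 .. 1.0
    return round(100 - pr * 100, 1)          # flip: 0.0 = top of the pile

queries = [1_000, 5_000, 40_000, 380_000, 1_150_000]
print(top_percentile(queries, 1_150_000))  # 0.0  -> "top 0%", best rank
print(top_percentile(queries, 1_000))      # 100.0 -> bottom of the list
```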

The streak

Longest consecutive days of activity.

TIP: Consecutive dates, when you subtract a row number, map to the same value. Group by that, count the rows, and you've got streak lengths. This pattern works for any "longest consecutive X" problem.

```sql
streaks AS (
    WITH daily_activity AS (
        SELECT DISTINCT q.user_id, q.dt as activity_date
        FROM all_query_stats q
        INNER JOIN internal_users r ON q.user_id = r.user_id
        WHERE q.queries > 0
    ),
    streak_groups AS (
        SELECT
            user_id,
            activity_date,
            activity_date - (ROW_NUMBER() OVER (
                PARTITION BY user_id ORDER BY activity_date
            ))::INT as streak_group
        FROM daily_activity
    ),
    streak_lengths AS (
        SELECT user_id, streak_group, COUNT(*) as streak_length
        FROM streak_groups
        GROUP BY user_id, streak_group
    )
    SELECT user_id, MAX(streak_length) as longest_streak
    FROM streak_lengths
    GROUP BY ALL
)
```
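The same gaps-and-islands trick can be sketched in a few lines of Python—a toy illustration, not the production query. Each activity date minus its row number collapses to one constant per unbroken run, so grouping on that constant and counting gives the streak lengths:

```python
from datetime import date, timedelta
from itertools import groupby

def longest_streak(days):
    """Gaps-and-islands in plain Python: consecutive dates minus their
    row number map to the same key, so each streak becomes one group."""
    ordered = sorted(set(days))
    # key = activity_date - row_number (in days); constant within a streak
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered)]
    return max(sum(1 for _ in grp) for _, grp in groupby(keyed, key=lambda kv: kv[0]))

days = [date(2025, 1, 1), date(2025, 1, 2), date(2025, 1, 3),
        date(2025, 1, 10), date(2025, 1, 11)]
print(longest_streak(days))  # 3 -> Jan 1-3 is the longest run
```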

The builder stats

databases_created shows who's spinning up new projects. databases_shared shows who's collaborating. We also split out DuckLake databases specifically. I was curious to learn which employees have already adopted DuckLake into their workflows.

```sql
databases_created AS (
    SELECT
        d.owner_id as user_id,
        COUNT(*) as databases_created,
        COUNT(*) FILTER (WHERE d.md_database_type = 'ducklake') as ducklake_dbs_created
    FROM current_databases d
    INNER JOIN internal_users r ON d.owner_id = r.user_id
    WHERE d.created_ts >= '2025-01-01'
      AND d.created_ts < '2026-01-01'
      AND d.owner_type = 'user'
    GROUP BY ALL
),
databases_shared AS (
    SELECT
        s.owner_id::VARCHAR as user_id,
        COUNT(*) as databases_shared
    FROM current_shares s
    INNER JOIN internal_users r ON s.owner_id::VARCHAR = r.user_id
    WHERE s.created_ts >= '2025-01-01'
      AND s.created_ts < '2026-01-01'
    GROUP BY ALL
)
```

The archetypes

Spotify has "Audio Auras." We have duck personas. The order matters: CASE statements evaluate top to bottom, so Elite Duck (top 1%) takes priority over Streak Master. Someone could qualify for multiple archetypes, so the ordering assigns the most impressive one first.

```sql
CASE
    WHEN p.top_percentile <= 1 THEN 'Elite Duck 🏆'
    WHEN p.top_percentile <= 5 THEN 'Power User ⚡'
    WHEN p.top_percentile <= 10 THEN 'Super Quacker 🦆'
    WHEN COALESCE(st.longest_streak, 0) > 30 THEN 'Streak Master 🔥'
    WHEN t.active_days > 200 THEN 'Steady Builder 🏗️'
    WHEN COALESCE(ds.databases_shared, 0) >= 3 THEN 'Sharing Champion 🤝'
    WHEN COALESCE(dc.databases_created, 0) > 50 THEN 'Database Architect 🏛️'
    WHEN t.tb_processed > 100 THEN 'Data Cruncher 📊'
    ELSE 'Rising Duck 🌟'
END as archetype
```
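To see the first-match-wins behavior in isolation, here is the same rule table as a Python sketch (a hypothetical helper, using the thresholds from the CASE above; unlike SQL's CASE, Python evaluates all the conditions eagerly, but the result is the same):

```python
def archetype(top_percentile, longest_streak=0, active_days=0,
              databases_shared=0, databases_created=0, tb_processed=0):
    """First match wins, like a SQL CASE evaluated top to bottom."""
    rules = [
        (top_percentile <= 1,    'Elite Duck 🏆'),
        (top_percentile <= 5,    'Power User ⚡'),
        (top_percentile <= 10,   'Super Quacker 🦆'),
        (longest_streak > 30,    'Streak Master 🔥'),
        (active_days > 200,      'Steady Builder 🏗️'),
        (databases_shared >= 3,  'Sharing Champion 🤝'),
        (databases_created > 50, 'Database Architect 🏛️'),
        (tb_processed > 100,     'Data Cruncher 📊'),
    ]
    return next((label for hit, label in rules if hit), 'Rising Duck 🌟')

# A top-1% user with a 182-day streak still gets Elite Duck, not Streak Master
print(archetype(0.5, longest_streak=182))  # Elite Duck 🏆
```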

Elena is a Streak Master. Alex is a Database Architect. And yes, I gave myself Elite Duck. My family was thrilled to hear about the new title.

The final query

The full query joins the CTEs and writes to a table. Materializing means we can share it directly without giving people access to the raw data—and without re-running the aggregation every time someone wants to see their stats. For an internal project, this doesn't matter much. If we were serving a Wrapped to customers, it would.

```sql
CREATE OR REPLACE TABLE wrapped_2025_internal AS
WITH
    internal_users AS (...),
    all_query_stats AS (...),
    user_totals AS (...),
    percentiles AS (...),
    databases_created AS (...),
    databases_shared AS (...),
    streaks AS (...)
SELECT
    t.user_id,
    t.total_queries,
    t.tb_processed,
    t.total_hours,
    t.active_days,
    t.first_active_date,
    t.last_active_date,
    p.top_percentile,
    COALESCE(dc.databases_created, 0) as databases_created,
    COALESCE(dc.ducklake_dbs_created, 0) as ducklake_dbs_created,
    COALESCE(ds.databases_shared, 0) as databases_shared,
    COALESCE(st.longest_streak, 0) as longest_streak,
    -- archetype CASE statement (shown above)
FROM user_totals t
JOIN percentiles p ON t.user_id = p.user_id
LEFT JOIN databases_created dc ON t.user_id = dc.user_id
LEFT JOIN databases_shared ds ON t.user_id = ds.user_id
LEFT JOIN streaks st ON t.user_id = st.user_id
```

Building your own "Wrapped"

A few things that helped:

  • Start with what you can actually query. We had daily stats tables. Work with what exists before building new instrumentation.
  • Filter aggressively. Our first pass had a service account at #1 with 115 million queries. Impressive, but not the user type we were targeting.
  • Make it personal. People care about their own numbers. A company-wide total is interesting; "you're in the top 5%" is shareable.

End of year is a good time to count things. Might as well make it fun. Happy holidays from the MotherDuck team. May your queries be fast and your streaks unbroken.

