
2024/12/17 - Jordan Tigani, Boaz Leskes
Separating Storage and Compute in DuckDB
Why separate storage and compute in DuckDB, how we do it in MotherDuck to enable sharing and future work.
- 5 min read
BY- 5 min read
BYData analytics requires sophisticated tools that can perform complex calculations while maintaining detailed row-level insights. Window functions are a powerful technique to meet this challenge, and MotherDuck provides an easy-to-use experience for implementing these analytical queries.
Window functions operate on a defined subset of rows within a result set. Unlike traditional aggregate functions that compress data into a single summary value, window functions enable calculations across a group of rows while preserving individual row details. They are particularly useful for:
MotherDuck offers a cloud-native approach to data analysis, providing a multiplayer environment for running complex queries. Its architecture supports seamless window function implementations, allowing data folks to perform sophisticated analytical tasks quickly and easily from the convenience of their browser or CLI.
Key advantages of MotherDuck include:
Let's create a sample dataset to illustrate window functions in MotherDuck:
Copy code
CREATE TABLE sales (
sales_date DATE,
product TEXT,
region TEXT,
sales_amount DOUBLE
);
INSERT INTO sales VALUES
('2023-01-01', 'Product A', 'East', 200),
('2023-01-02', 'Product A', 'East', 250),
('2023-01-03', 'Product A', 'East', 300),
('2023-01-01', 'Product B', 'West', 400),
('2023-01-02', 'Product B', 'West', 450),
('2023-01-03', 'Product B', 'West', 500);
In MotherDuck, the OVER clause defines the window for calculations by addressing three key aspects:
A general template for window functions looks like this:
Copy code
-- for illustrative purposes, not executable SQL
function_name(expression) OVER (
PARTITION BY column_name
ORDER BY column_name
ROWS/RANGE BETWEEN start_point AND end_point
)
row_number(): Assigning Unique IdentifiersAssigns a unique sequential number to rows within a partition:
Copy code
SELECT
sales_date,
product,
region,
sales_amount,
row_number() OVER (PARTITION BY region ORDER BY sales_date) AS row_id
FROM sales;
Query Result:
Copy code
sales_date | product | region | sales_amount | row_id
-----------+---------+--------+--------------+-------
2023-01-01| Product A| East | 200 | 1
2023-01-02| Product A| East | 250 | 2
2023-01-03| Product A| East | 300 | 3
2023-01-01| Product B| West | 400 | 1
2023-01-02| Product B| West | 450 | 2
2023-01-03| Product B| West | 500 | 3
rank() and dense_rank(): Establishing OrderThese functions determine a value's rank within a partition, with different approaches to handling ties:
Copy code
SELECT
product,
region,
sales_amount,
rank() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS sales_rank,
dense_rank() OVER (PARTITION BY region ORDER BY sales_amount DESC) AS dense_sales_rank
FROM sales;
Query Result:
Copy code
product | region | sales_amount | sales_rank | dense_sales_rank
----------+--------+--------------+------------+-----------------
Product A | East | 300 | 1 | 1
Product A | East | 250 | 2 | 2
Product A | East | 200 | 3 | 3
Product B | West | 500 | 1 | 1
Product B | West | 450 | 2 | 2
Product B | West | 400 | 3 | 3
lag() and lead(): Analyzing Adjacent RowsAccess values from preceding or following rows within a partition:
Copy code
SELECT
sales_date,
product,
region,
sales_amount,
lag(sales_amount, 1) OVER (PARTITION BY region ORDER BY sales_date) AS previous_day_sales
FROM sales
ORDER BY product, sales_date;
Query Result:
Copy code
sales_date | product | region | sales_amount | previous_day_sales
------------+---------+--------+--------------+--------------------
2023-01-01 | Product A | East | 200 | NULL
2023-01-02 | Product A | East | 250 | 200
2023-01-03 | Product A | East | 300 | 250
2023-01-01 | Product B | West | 400 | NULL
2023-01-02 | Product B | West | 450 | 400
2023-01-03 | Product B | West | 500 | 450
Calculate averages over a sliding window of rows:
Copy code
SELECT
sales_date,
product,
region,
sales_amount,
avg(sales_amount) OVER (
PARTITION BY region
ORDER BY sales_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS moving_avg
FROM sales
ORDER BY product, sales_date;
Query Result:
Copy code
sales_date | product | region | sales_amount | moving_avg
------------+---------+--------+--------------+------------
2023-01-01 | Product A | East | 200 | 225
2023-01-02 | Product A | East | 250 | 250
2023-01-03 | Product A | East | 300 | 275
2023-01-01 | Product B | West | 400 | 425
2023-01-02 | Product B | West | 450 | 450
2023-01-03 | Product B | West | 500 | 475
MotherDuck supports additional window function techniques:
MotherDuck provides a powerful platform for implementing window functions, enabling data professionals to perform sophisticated analytical queries with ease. By offering these flexible, easy-to-use analytics capabilities, MotherDuck supports seamless and fast insight generation for even the most complex queries.
As data complexity continues to grow, platforms like MotherDuck demonstrate the importance of these kinds of analytical tools in transforming raw data into meaningful insights.

2024/12/17 - Jordan Tigani, Boaz Leskes
Why separate storage and compute in DuckDB, how we do it in MotherDuck to enable sharing and future work.

2024/12/18 - Mehdi Ouazza
Learn how analytical database can help your web app, with MotherDuck and the native integration with Vercel