aggregation
Back to DuckDB Data Engineering Glossary
Overview
Aggregation refers to the process of combining multiple individual data values into a single summary value. In data analysis, aggregation helps compress detailed data into meaningful statistics that provide insights about groups of records rather than individual rows.
Common SQL Aggregations
The most frequently used SQL aggregation functions include:
COUNT()
which tallies the number of rows or valuesSUM()
which adds numeric values togetherAVG()
which calculates the arithmetic meanMIN()
andMAX()
which find extreme valuesSTDDEV()
andVARIANCE()
which compute statistical measures of spread
DuckDB Examples
DuckDB supports standard SQL aggregations plus additional statistical functions. Here are some examples:
Basic aggregation:
Copy code
SELECT
department,
COUNT(*) as employee_count,
AVG(salary) as avg_salary,
MAX(hire_date) as most_recent_hire
FROM employees
GROUP BY department;
Multiple aggregations on the same column:
Copy code
SELECT
MIN(price) as lowest_price,
MAX(price) as highest_price,
AVG(price) as avg_price,
STDDEV(price) as price_stddev
FROM products;
DuckDB also supports statistical aggregations like CORRELATION
, ENTROPY
, and SKEWNESS
:
Copy code
SELECT
CORRELATION(price, quantity) as price_qty_correlation,
ENTROPY(category) as category_entropy,
SKEWNESS(price) as price_skewness
FROM sales;
Window Function Aggregation
Aggregations can also be used as window functions in DuckDB to compute running totals and moving averages:
Copy code
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) as running_total,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as seven_day_average
FROM daily_sales;
Filtered Aggregation
DuckDB supports filtering rows within aggregation functions using the FILTER
clause:
Copy code
SELECT
COUNT(*) as total_orders,
COUNT(*) FILTER (WHERE status = 'completed') as completed_orders,
SUM(amount) FILTER (WHERE status = 'completed') as completed_amount
FROM orders;