*New* The MotherDuck Native Integration is Live on Vercel Marketplace for Embedded Analytics and Data AppsLearn more

aggregation

Back to DuckDB Data Engineering Glossary

Overview

Aggregation refers to the process of combining multiple individual data values into a single summary value. In data analysis, aggregation helps compress detailed data into meaningful statistics that provide insights about groups of records rather than individual rows.

Common SQL Aggregations

The most frequently used SQL aggregation functions include:

  • COUNT() which tallies the number of rows or values
  • SUM() which adds numeric values together
  • AVG() which calculates the arithmetic mean
  • MIN() and MAX() which find extreme values
  • STDDEV() and VARIANCE() which compute statistical measures of spread

DuckDB Examples

DuckDB supports standard SQL aggregations plus additional statistical functions. Here are some examples:

Basic aggregation:

Copy code

SELECT department, COUNT(*) as employee_count, AVG(salary) as avg_salary, MAX(hire_date) as most_recent_hire FROM employees GROUP BY department;

Multiple aggregations on the same column:

Copy code

SELECT MIN(price) as lowest_price, MAX(price) as highest_price, AVG(price) as avg_price, STDDEV(price) as price_stddev FROM products;

DuckDB also supports statistical aggregations like CORRELATION, ENTROPY, and SKEWNESS:

Copy code

SELECT CORRELATION(price, quantity) as price_qty_correlation, ENTROPY(category) as category_entropy, SKEWNESS(price) as price_skewness FROM sales;

Window Function Aggregation

Aggregations can also be used as window functions in DuckDB to compute running totals and moving averages:

Copy code

SELECT date, amount, SUM(amount) OVER (ORDER BY date) as running_total, AVG(amount) OVER ( ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) as seven_day_average FROM daily_sales;

Filtered Aggregation

DuckDB supports filtering rows within aggregation functions using the FILTER clause:

Copy code

SELECT COUNT(*) as total_orders, COUNT(*) FILTER (WHERE status = 'completed') as completed_orders, SUM(amount) FILTER (WHERE status = 'completed') as completed_amount FROM orders;